7. ASL LiveSign: Real-Time Sign Language Translation
Spring 2025, MIT.
"Language is a skin: I rub my language against the other." - Roland Barthes, A Lover's Discourse
American Sign Language (ASL) is an essential communication tool for deaf and hard-of-hearing individuals, but significant barriers persist in interactions between ASL users and the wider community. Existing translation systems frequently struggle with accurate, real-time interpretation due to limitations in gesture recognition and contextual understanding. ASL LiveSign addresses these limitations by integrating state-of-the-art multimodal technologies, providing a practical, real-time translation solution. Its development is motivated by the urgent need for improved accessibility and inclusion, particularly in educational, professional, and daily social contexts.
Initial tests with generic pre-trained models showed insufficient gesture classification accuracy. To address this, I fine-tuned the TimeSformer-base model, originally trained on Kinetics-400, on an ASL-focused dataset I collected from YouTube, which significantly improved gesture recognition performance. This step also surfaced a new problem: some gestures, such as "study" and "school," look very similar but carry different meanings, so the classifier can effectively pick one at random. This is a key reason for adding an MLLM as the final summarization stage: with continuous contextual understanding of the whole sentence, it can correct such misclassifications at the end of the pipeline. Additionally, I carefully tuned MediaPipe's segmentation parameters to better handle gesture variability.
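To make the fine-tuning step concrete, the sketch below swaps the 400-way Kinetics head of the public facebook/timesformer-base-finetuned-k400 checkpoint for an ASL gloss classifier using the Hugging Face transformers API. The class count, learning rate, and input shapes are illustrative placeholders, not the exact values used in LiveSign.

```python
# Minimal fine-tuning sketch; hyperparameters and class count are placeholders.
import torch
from transformers import TimesformerForVideoClassification

NUM_GLOSSES = 100  # assumed number of ASL gloss classes in the YouTube-derived dataset

# Load the Kinetics-400 checkpoint and re-initialize its classification head
# for the ASL vocabulary.
model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",
    num_labels=NUM_GLOSSES,
    ignore_mismatched_sizes=True,  # the original 400-way head no longer matches
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(clips: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step.

    clips:  (batch, 8, 3, 224, 224) frames sampled from each gesture clip
    labels: (batch,) integer gloss ids
    """
    outputs = model(pixel_values=clips, labels=labels)  # loss computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```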
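The disambiguation idea can be sketched as follows: hand the recognized gloss sequence to a large model and let sentence-level context resolve confusable glosses. The prompt wording and the gpt-4o model name here are illustrative assumptions, not the exact MLLM setup used in LiveSign.

```python
# Hypothetical sketch: resolve ambiguous glosses with sentence-level context.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_glosses(glosses: list[str]) -> str:
    """Turn a sequence of recognized ASL glosses into fluent English,
    correcting glosses the classifier may have confused (e.g. STUDY vs SCHOOL)."""
    prompt = (
        "The following ASL glosses were recognized in order: "
        + " ".join(glosses)
        + ". Some visually similar glosses may be misrecognized. "
        "Produce the most plausible English sentence, correcting any gloss "
        "that does not fit the overall context."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example: if the classifier wavered between STUDY and SCHOOL for the second gloss,
# summarize_glosses(["I", "SCHOOL", "MATH", "TONIGHT"]) might yield
# "I am studying math tonight."
```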
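On the MediaPipe side, the sketch below shows the kind of parameters involved in that tuning, using the Holistic solution's detection/tracking confidence thresholds and segmentation option; the specific values are assumptions for illustration rather than the final settings.

```python
# Illustrative MediaPipe Holistic configuration; the threshold values are assumptions.
import cv2
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(
    model_complexity=1,            # trade accuracy against latency for real-time use
    enable_segmentation=True,      # person segmentation mask to suppress background
    min_detection_confidence=0.6,  # stricter detection to reduce spurious hands
    min_tracking_confidence=0.5,   # looser tracking so fast gestures are not dropped
)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV delivers BGR frames.
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.left_hand_landmarks or results.right_hand_landmarks:
        pass  # hand landmarks feed the gesture classifier upstream
cap.release()
holistic.close()
```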