Accomplishments
Comparative Study of Vision Transformers for ASD Detection in Toddlers Using Facial Features
- Abstract
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by difficulties in social interaction and communication. Early detection in children is critical for timely intervention, and recent research has explored the potential of facial expressions, eye contact, and other facial features for identifying ASD. Traditional clinical methods are effective but expensive, whereas deep learning approaches, particularly Vision Transformers (ViTs), have demonstrated significant promise in enhancing ASD detection through image classification. In this study, a custom classifier is introduced to improve the standard ViT model’s accuracy in classifying ASD-related features. By employing transfer learning and fine-tuning pre-trained Vision Transformers such as Simple ViT, DeiT, and CvT, a comparative analysis is conducted between these models and the ViT model with the proposed classifier. The ViT model with the proposed classifier shows superior performance, achieving an accuracy of 93% and a precision of 95.74% on test data. Additionally, the study evaluates the aforementioned models on a custom dataset of Indian children with confirmed ASD diagnoses, suggesting that racial or ethnic factors do not bias ASD detection. This claim is supported by the high accuracy achieved by each of the models.