A hybrid CNN–ViT framework for skin disease classification via feature extraction and selection

Category

Articles

Authors

Jagannath Nirmal & Ninad Mehendale

Publisher

Springer

Publishing Date

01-Sep-2025

Pages

27151–27177

DOI

https://doi.org/10.1007/s00521-025-11664-x

Abstract

Automated classification of skin diseases using deep learning and machine learning is becoming increasingly prominent in the field of medical imaging. This study explores a hybrid framework that combines convolutional neural networks (CNNs), vision transformers (ViTs), and traditional machine learning classifiers to distinguish between three common skin conditions: Psoriasis, Eczema, and Tinea Ringworm. A curated dataset of 2,750 clinical images sourced from DermNet and the Atlas of Dermatology was used, with two dataset splits: 70:15:15 and 80:10:10 for training, validation, and testing, respectively. Feature extraction was performed using five deep learning backbones—EfficientNet-B0, ResNet50, ConvNeXt-Tiny, Swin-Tiny, and DeiT-Small. To reduce dimensionality and enhance relevance, five feature selection techniques were applied: Boruta, Chi-Square, ANOVA, recursive feature elimination (RFE), and Kruskal–Wallis (KW). The selected features were then used to train five classical machine learning classifiers: support vector machine (SVM), logistic regression (LR), k-nearest neighbors (KNN), random forest (RF), and multilayer perceptron (MLP). All experiments were repeated with three random seeds (42, 99, and 123) to ensure robustness. Among the combinations, ConvNeXt-Tiny features selected via Boruta and classified using SVM delivered the best hybrid result, achieving an accuracy of 79.78%, a notable improvement over the baseline of 74.7% ± 0.0% on the 80:10:10 split. Additionally, an end-to-end classification approach was evaluated using four backbone models—ResNet50, EfficientNet-B0, ConvNeXt-Tiny, and Swin-Tiny—also across three runs with seeds 42, 99, and 123. Among these, the Swin-Tiny model achieved the highest performance, with an accuracy of 82.1% ± 1.3%. These findings demonstrate the strength of hybrid approaches that leverage both deep learning for feature extraction and classical machine learning for classification, alongside the competitive edge of end-to
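The hybrid pipeline described in the abstract (extract features, select a subset, train a classical classifier, repeat over seeds 42, 99, and 123) can be illustrated with a minimal sketch. This is not the authors' code. It assumes synthetic Gaussian features in place of real backbone embeddings, uses ANOVA F-scoring (one of the five selection methods named) and KNN (one of the five classifiers named), and reports a sample standard deviation, which may differ from how the paper computed its ± values. All function names (`make_features`, `anova_f`, `select_top_k`, `knn_predict`, `run_once`, `summarize`) are hypothetical.

```python
import random
import statistics
from collections import Counter

SEEDS = (42, 99, 123)  # the three seeds named in the abstract


def make_features(seed, n_per_class=60, n_features=20, n_informative=3):
    """Synthetic stand-in for backbone embeddings (assumption, not real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):  # three classes, as in the study
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 3.0 * label  # only the first few columns carry signal
            X.append(row)
            y.append(label)
    return X, y


def anova_f(column, labels):
    """One-way ANOVA F-statistic of a single feature against the class labels."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = statistics.fmean(column)
    ss_between = ss_within = 0.0
    for values in groups.values():
        mean = statistics.fmean(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(column) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def select_top_k(X, y, k):
    """Indices of the k features with the highest F-scores, in ascending order."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def run_once(seed, split=(80, 10, 10), k_features=3):
    """One seeded run: shuffle, split, select on train only, score on test."""
    X, y = make_features(seed)
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    n_train = len(order) * split[0] // 100
    n_val = len(order) * split[1] // 100  # validation part is unused in this sketch
    train_idx, test_idx = order[:n_train], order[n_train + n_val:]
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    selected = select_top_k(train_X, train_y, k_features)

    def project(row):
        return [row[j] for j in selected]

    train_proj = [project(row) for row in train_X]
    correct = sum(
        knn_predict(train_proj, train_y, project(X[i])) == y[i] for i in test_idx
    )
    return selected, correct / len(test_idx)


def summarize(accuracies):
    """Mean and sample standard deviation across seeded runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


if __name__ == "__main__":
    accuracies = [run_once(seed)[1] for seed in SEEDS]
    mean_acc, std_acc = summarize(accuracies)
    print(f"accuracy over seeds {SEEDS}: {mean_acc:.1%} ± {std_acc:.1%}")
```

Selecting features on the training portion only, as `run_once` does, keeps test labels out of the selection step. Swapping `split=(80, 10, 10)` for `split=(70, 15, 15)` mirrors the second split mentioned in the abstract.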

Related Items

NINAD MEHENDALE. (2018).

Clogging-free continuous operation with whole blood in a radial pillar device (RAPID). Biomedical Microdevices, 20(1): 75. doi: https://doi.org/10.1007/s10544-018-0319-z

NINAD MEHENDALE. (2019).

Intervention of microfluidics in biofuel and bioenergy sectors: Technological considerations and future prospects. Renewable and Sustainable Energy Reviews, 101(3): 548-558. doi: https://doi.org/10.1016/j.rser.2018.11.040
