Evaluating and Optimizing CNN–Transformer Architectures for Musculoskeletal Disease Classification
Keywords:
Deep Learning, Dataset Scaling, Computer Vision, Neural Network Architecture
Abstract
This study examines the impact of dataset dimensionality on deep learning performance in musculoskeletal disease detection, focusing on osteoporosis and rheumatoid arthritis. Using over 200,000 annotated X-ray, DXA, and MRI images, the performance of Vision Transformer (ViT), ConvNeXt, and Swin Transformer models was systematically evaluated in terms of scalability, robustness, and multi-modal integration. Results demonstrate that increasing dataset scale significantly enhances model generalization, with Swin Transformer achieving the best performance (AUC = 0.94, p < 0.001). These findings underscore the critical role of self-attention mechanisms and model scaling strategies in medical image classification, providing new benchmarks for dataset requirements and guiding the development of more reliable AI-driven diagnostic systems. Furthermore, the study emphasizes the necessity of large, diverse datasets to mitigate overfitting and improve real-world applicability. It also highlights the potential of hybrid architectures for integrating multi-source medical data. Overall, this research contributes to advancing explainable and scalable AI solutions for musculoskeletal imaging in clinical practice.
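For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how an area under the ROC curve can be computed for a binary classifier. It is not the authors' evaluation code: the function name `roc_auc`, the labels, and the scores are hypothetical and chosen only for illustration. It uses the standard rank interpretation of AUC, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    # Compare every positive score against every negative score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = disease present) and model scores, for illustration only.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a small illustration; for a dataset of the size described in the abstract, a sort-based implementation from an established library would be the more practical choice.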
License
Copyright (c) 2025 Moulay Youssef Ichahane, Noureddine Assad

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright on any article published in the International Journal of Computer Engineering and Data Science (IJCEDS) is retained by the author(s). All articles are published under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits any non-commercial use, distribution, and reproduction in any medium, provided that the original work is properly cited.
License Agreement
By submitting and publishing their work in IJCEDS, the authors:
- Grant IJCEDS the non-exclusive right to publish the article and to identify IJCEDS as the original publisher.
- Authorize any third party to use, share, and reproduce the article for non-commercial purposes, provided that appropriate credit is given to the original authors and source, and a link to the license is included.