Vision Transformer Enhanced by Contrastive Learning: A Self-Supervised Strategy for Pulmonary Tuberculosis Diagnosis
Tuberculosis (TB) diagnosis from chest X-ray (CXR) images poses a significant challenge in radiology because of inherent data imbalance and subtle lesion heterogeneity. These factors cause traditional deep learning models, such as standard CNNs and conventional Vision Transformers (ViT), to generalize poorly and to show inadequate sensitivity (recall) for the minority TB class. We address this gap with an enhanced ViT architecture that is pre-trained with Self-Supervised Learning (SSL) via the SimCLR framework and then fine-tuned with an Adaptive Weighted Focal Loss. Our primary objective was to develop a generalizable model that minimizes false negatives without sacrificing overall precision, thereby establishing a new performance benchmark for automated TB detection. The methodology separates feature learning, performed through SSL pre-training on unlabeled data to produce robust, domain-invariant features, from classification optimization; during fine-tuning, the Adaptive Weighted Focal Loss counters the gradient dominance of the majority class. We validated the approach with K-Fold Cross-Validation. The final ViT SSL Weighted model achieved a peak internal accuracy of 0.9861 and an AUPRC of 0.9781. Crucially, it remained stable under external testing on the TBX11K dataset, attaining an AUPRC of 0.9795 and a recall of 0.9527. This minimal variance confirms the reproducibility of the learned features and their robustness to institutional variation. The high recall translates directly into improved diagnostic decision-making by substantially lowering the clinical risk of a missed TB diagnosis. This study establishes an effective, stable, and generalizable SSL-based ViT framework, offering a scalable solution for public health efforts in resource-constrained settings.
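The two components named in the abstract can be illustrated with hedged sketches. First, a minimal PyTorch sketch of a class-weighted focal loss for the fine-tuning stage. The abstract does not specify how the paper's "Adaptive" weights are computed, so this version derives them from inverse class frequency, a common convention, and uses the focusing parameter gamma = 2.0 from the original focal loss formulation; all names and values here are assumptions, not the authors' implementation.

```python
# Minimal sketch of a class-weighted focal loss (PyTorch).
# NOTE: the paper's "Adaptive Weighted Focal Loss" is not fully
# specified in the abstract; inverse-frequency weights and gamma=2.0
# are assumptions standing in for the authors' adaptive scheme.
import torch
import torch.nn.functional as F

class WeightedFocalLoss(torch.nn.Module):
    def __init__(self, class_counts, gamma: float = 2.0):
        super().__init__()
        counts = torch.as_tensor(class_counts, dtype=torch.float32)
        # Inverse-frequency weights: rarer classes (e.g., TB) get larger weights.
        self.register_buffer("alpha", counts.sum() / (len(counts) * counts))
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Per-sample cross entropy, weighted by the true class's alpha.
        ce = F.cross_entropy(logits, targets, weight=self.alpha, reduction="none")
        # p_t: the model's predicted probability for the true class.
        pt = torch.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
        # (1 - p_t)^gamma down-weights easy examples so hard minority-class
        # examples contribute more to the gradient.
        return ((1.0 - pt) ** self.gamma * ce).mean()
```

Second, a minimal sketch of the NT-Xent objective that SimCLR optimizes during the SSL pre-training stage; the temperature of 0.5 is an assumed value, and the embeddings `z1`, `z2` stand in for the projection-head outputs of two augmented views of the same batch of CXR images.

```python
# Minimal NT-Xent (SimCLR) contrastive loss sketch; temperature is an
# assumed hyperparameter, as the abstract gives no training details.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: embeddings of two augmented views of the same images, shape [B, D]."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2B, D], unit norm
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    batch = z1.size(0)
    # The positive for row i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(batch, 2 * batch), torch.arange(batch)])
    return F.cross_entropy(sim, targets.to(sim.device))
```

In a SimCLR-style setup, each unlabeled CXR image is augmented twice, both views are encoded, and the loss pulls the paired views together while pushing apart all other images in the batch; after pre-training, the projection head is discarded and the ViT backbone is fine-tuned with a loss such as the focal loss sketched above.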
Copyright (c) 2025 Widia Marlina, Umar Zaky (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.