Vision Transformer Enhanced by Contrastive Learning: A Self-Supervised Strategy for Pulmonary Tuberculosis Diagnosis 

Vision Transformer, Self-Supervised Learning, ViT, SimCLR, Tuberculosis

Diagnosing tuberculosis (TB) from chest X-ray (CXR) images is a significant challenge in radiology because of inherent class imbalance and subtle, heterogeneous lesions. These factors cause traditional deep learning models, such as standard CNNs and conventional Vision Transformers (ViT), to generalize poorly and to show inadequate sensitivity (recall) for the minority TB class. We address this gap with an enhanced ViT architecture that is pre-trained with Self-Supervised Learning (SSL) via the SimCLR framework and then fine-tuned with an Adaptive Weighted Focal Loss. Our primary objective was to develop a generalizable model that minimizes false negatives without sacrificing overall precision, thereby establishing a new performance benchmark for automated TB detection. The methodology conceptually separates feature learning from classification optimization: SSL pre-training on unlabeled data produces robust, domain-invariant features, and the Adaptive Weighted Focal Loss is applied during fine-tuning to counter the dominance of majority-class gradients. We validated this approach using K-Fold Cross-Validation. The final ViT SSL Weighted model achieved a peak internal accuracy of 0.9861 and an AUPRC of 0.9781. Crucially, it generalized stably when tested externally on the TBX11K dataset, reaching an AUPRC of 0.9795 and a recall of 0.9527. The small difference between internal and external AUPRC (0.9781 versus 0.9795) supports the reproducibility and robustness of the learned features against institutional variation. High recall means fewer missed TB cases, which lowers the clinical risk associated with a false-negative diagnosis and supports diagnostic decision-making. This study establishes an effective, stable, and generalizable SSL-based ViT framework, offering a scalable solution for public health efforts in resource-constrained settings.
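
The abstract names two loss functions but does not give their formulas. The following is a minimal sketch of what they might look like, not the authors' implementation. It assumes the standard NT-Xent objective from SimCLR for pre-training and a class-weighted binary focal loss for fine-tuning. The inverse-frequency weighting used here as the "adaptive" part is an assumption, and all function names (`nt_xent_loss`, `inverse_frequency_weights`, `weighted_focal_loss`) are hypothetical. Plain Python is used for readability; a real training loop would use a tensor library with automatic differentiation.

```python
import math


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss over N positive pairs (z_a[i], z_b[i]) of embedding vectors."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalize(v) for v in z_a + z_b]  # 2N unit-length embeddings
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other augmented view of sample i
        # Cosine similarity to every embedding, scaled by the temperature.
        sims = [sum(p * q for p, q in zip(z[i], z[j])) / temperature
                for j in range(2 * n)]
        # The denominator covers every embedding except i itself.
        denom = sum(math.exp(sims[j]) for j in range(2 * n) if j != i)
        total += math.log(denom) - sims[pos]
    return total / (2 * n)


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: weight_c = n / (num_classes * count_c)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_focal_loss(probs, labels, class_weights, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; probs[i] is the predicted probability of class 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p        # probability given to the true class
        p_t = min(max(p_t, eps), 1.0 - eps)   # clamp to avoid log(0)
        # (1 - p_t)^gamma shrinks the contribution of well-classified examples.
        total += -class_weights[y] * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)


if __name__ == "__main__":
    labels = [0, 0, 0, 1]  # toy imbalanced batch: 1 marks the minority TB class
    weights = inverse_frequency_weights(labels)
    print(weights)
    print(weighted_focal_loss([0.1, 0.2, 0.3, 0.4], labels, weights))
    print(nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

With `gamma=0` and unit class weights, `weighted_focal_loss` reduces to ordinary binary cross-entropy. Raising `gamma` down-weights easy examples, and the class weights increase the contribution of the rarer class, which is one way to counter majority-class gradient dominance.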