Retinal disease screening systems require high diagnostic accuracy together with computational efficiency to be deployable in resource-constrained clinical settings. Although large vision transformer models have shown promising results on retinal image analysis tasks, their computational demands limit their applicability. This paper presents a comparative study of two lightweight Vision Transformer (ViT) models, Swin-Tiny and ViT-Small, for multi-class retinal disease classification from color fundus images. A curated dataset of 4,217 fundus images from four classes, namely Normal, Diabetic Retinopathy (DR), Age-related Macular Degeneration (AMD), and Branch Retinal Vein Occlusion (BRVO), is used for the experimental study. Both models are subjected to identical preprocessing, training, and evaluation protocols to ensure a fair comparison. Performance is assessed with standard classification metrics (accuracy, precision, recall, and F1-score) alongside model complexity measures (parameter count and computational cost). The experiments show that both models achieve competitive performance, with differing trade-offs between classification accuracy and computational complexity.
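As a concrete illustration of the model-complexity comparison described above, the sketch below instantiates both architectures with a four-class head and reports their parameter counts. The use of the timm library, the specific checkpoint names, and the 224x224 input size are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the complexity comparison, assuming the timm library;
# checkpoint names and input size are illustrative, not from the paper.
import timm
import torch

NUM_CLASSES = 4  # Normal, DR, AMD, BRVO
dummy = torch.randn(1, 3, 224, 224)  # one 224x224 RGB fundus image

for name in ("swin_tiny_patch4_window7_224", "vit_small_patch16_224"):
    model = timm.create_model(name, pretrained=False, num_classes=NUM_CLASSES)
    model.eval()
    n_params = sum(p.numel() for p in model.parameters())
    with torch.no_grad():
        logits = model(dummy)  # sanity-check the forward pass and head size
    print(f"{name}: {n_params / 1e6:.1f}M params, logits shape {tuple(logits.shape)}")
```

Both models land in the tens of millions of parameters, an order of magnitude below large ViT variants, which is what makes this class of model attractive for the resource-constrained setting the abstract targets.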