Table 1: Comprehensive Literature Survey Summary

Each entry lists: Title | Authors | Year & Venue | Technology / Method | Key Observations & Results | Strengths & Limitations.
Title: Plant leaf disease detection using vision transformers for precision agriculture
Authors: Murugavalli S, Gopi R
Year & Venue: 2025, Scientific Reports
Technology / Method: PLA-ViT: a Vision Transformer with multi-head self-attention, data augmentation, bilateral filtering, and transfer learning.
Key Observations & Results: Accuracy: 98.7%. Dataset: New Plant Diseases Dataset (Kaggle).
Strengths & Limitations: Strengths: outperforms CNN baselines; efficient inference (12 ms). Limitations: attention blocks are weaker on certain tasks; multi-label classification is not yet supported.

Title: Multispectral Plant Disease Detection with Vision Transformer–Convolutional Neural Network Hybrid Approaches
Authors: De Silva M, Brown D
Year & Venue: 2023, Sensors
Technology / Method: Hybrid CNN-ViT comparison: CNNs (Xception, ResNet) versus ViT-B16 on multispectral images.
Key Observations & Results: Accuracy: 83.3%; ViT-B16 achieved the highest F1-score. Dataset: custom balanced multispectral dataset (2,652 images).
Strengths & Limitations: Strengths: comprehensive model comparison; contributes a new balanced multispectral dataset. Limitations: the small dataset limited the performance of larger models; misclassifications occurred between species with similar leaf shapes.

Title: A hybrid framework for plant leaf disease detection and classification using convolutional neural networks and vision transformer
Authors: Aboelenin S, et al.
Year & Venue: 2025, Complex & Intelligent Systems
Technology / Method: Hybrid CNN ensemble + ViT: an ensemble of VGG16, Inception-V3, and DenseNet201 combined with a ViT for local features.
Key Observations & Results: Accuracy: 99.24% (Apple), 98% (Corn). Dataset: PlantVillage (Apple and Corn subsets).
Strengths & Limitations: Strengths: high accuracy by combining the strengths of multiple CNNs and a ViT. Limitations: evaluated only on lab-based PlantVillage data; real-world performance is unverified.

Title: A Deep Features Extraction Model Based on the Transfer Learning Model and Vision Transformer "TLMViT" for Plant Disease Classification
Authors: Tabbakh A, Barpanda S
Year & Venue: 2023, IEEE Access
Technology / Method: TLMViT: a sequential hybrid model that uses a pre-trained CNN for feature extraction followed by a ViT for classification.
Key Observations & Results: Accuracy: not explicitly stated; demonstrates the effectiveness of using a ViT for deep-feature processing. Dataset: not specified, likely a standard plant disease dataset.
Strengths & Limitations: Strengths: demonstrates a clear and effective hybrid architecture. Limitations: the sequential hybrid approach is becoming a common pattern, potentially limiting novelty.

Title: Basil plant leaf disease detection using amalgam based deep learning models
Authors: Mane D, et al.
Year & Venue: 2024, Journal of Autonomous Intelligence
Technology / Method: Hybrid CNN + SVM: a CNN extracts features, and a Support Vector Machine (SVM) with an RBF kernel performs the classification.
Key Observations & Results: Accuracy: 95.02%. Dataset: custom-created dataset of 803 basil leaf images across 5 classes.
Strengths & Limitations: Strengths: addresses the lack of a standard dataset by creating a new one; the hybrid model outperforms a standalone CNN. Limitations: small dataset; uses a classical classifier (SVM) instead of a more modern Transformer head.

Title: Enhanced leaf disease detection: UNet for segmentation and optimized EfficientNet for disease classification
Authors: Kotwal J, et al.
Year & Venue: 2024, Software Impacts
Technology / Method: UNet + optimized EfficientNet: UNet segments the diseased region, followed by an optimized EfficientNet (AD-ENet) for classification.
Key Observations & Results: Accuracy: 99.91%; Precision: 99.87%; Recall: 99.81%. Dataset: PlantVillage and a custom Indian soybean dataset.
Strengths & Limitations: Strengths: an explicit segmentation step improves accuracy; the optimization addresses overfitting and gradient issues. Limitations: high performance is reported on datasets that may not fully represent in-the-wild complexity.

Title: EMSAM: enhanced multi-scale segment anything model for leaf disease segmentation
Authors: Li J, et al.
Year & Venue: 2025, Frontiers in Plant Science
Technology / Method: EMSAM: a hybrid ViT-CNN architecture based on the Segment Anything Model (SAM) for joint segmentation and classification; fuses global (ViT) and local (CNN) features.
Key Observations & Results: Accuracy: 87.86%; Dice: 79.25%; IoU: 69.87%. Dataset: a new annotated subset of PlantVillage (PSD, 5,200 images).
Strengths & Limitations: Strengths: state-of-the-art segmentation performance; robust across disease severities; establishes a new benchmark. Limitations: higher computational cost; relies on PlantVillage data, limiting real-world generalization.

Title: Leveraging deep learning for plant disease and pest detection: a comprehensive review and future directions
Authors: Shoaib M, et al.
Year & Venue: 2025, Frontiers in Plant Science
Technology / Method: Review paper: surveys deep learning models (CNNs, FCNs, U-Nets, Mask R-CNN) for plant disease classification, detection, and segmentation.
Key Observations & Results: Notes that classification models often exceed 95% accuracy and segmentation models exceed 90% precision on benchmark datasets.
Strengths & Limitations: Strengths: comprehensive overview of the field. Limitations: highlights open challenges such as data scarcity, environmental variability, and the lab-to-field gap.

Title: Vision Transformer with Mixture of Experts for addressing the lab-to-field gap in plant disease classification
Authors: Salman Z, et al.
Year & Venue: 2025, Frontiers in Plant Science
Technology / Method: ViT + Mixture of Experts (MoE): a ViT backbone combined with an MoE framework in which specialized experts are trained for different data aspects (e.g., imaging conditions).
Key Observations & Results: 20% accuracy improvement over the baseline ViT; 68% accuracy on cross-domain (PlantVillage to PlantDoc) evaluation. Datasets: PlantVillage, PlantDoc.
Strengths & Limitations: Strengths: directly tackles the lab-to-field generalization problem; MoE improves adaptability and robustness. Limitations: increased model complexity.

Title: Tulsi Leaf Disease Detection Using AI & Machine Learning
Authors: Not specified
Year & Venue: Not specified; Aislyn project page
Technology / Method: Standard CNNs: compares InceptionV3, ResNet50, and VGG16 for classification.
Key Observations & Results: Accuracy: not specified; the stated goal is to select the most accurate model. Dataset: TulsiDoc dataset from Mendeley (1,000 samples).
Strengths & Limitations: Strengths: addresses the specific domain of Tulsi leaves. Limitations: not a peer-reviewed research paper; uses foundational CNN models without novel contributions.
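A pattern recurring across the surveyed hybrids (PLA-ViT, TLMViT, EMSAM, the Aboelenin et al. ensemble) is a transformer encoder applying multi-head self-attention over patch embeddings produced from the leaf image. The sketch below is a minimal NumPy illustration of that self-attention step only; all shapes, weights, and names are illustrative assumptions, not the implementation of any surveyed paper (which use learned parameters inside full ViT stacks).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, num_heads, rng):
    """Toy multi-head self-attention over a (seq_len, dim) token matrix.

    In a ViT, `tokens` would be patch embeddings of a leaf image
    (e.g. 196 patches for a 224x224 input); here random projections
    stand in for the learned Q/K/V weight matrices.
    """
    seq_len, dim = tokens.shape
    assert dim % num_heads == 0
    head_dim = dim // num_heads
    Wq, Wk, Wv = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    out = np.empty_like(tokens)
    for h in range(num_heads):
        s = slice(h * head_dim, (h + 1) * head_dim)
        # Scaled dot-product attention within this head's channel slice.
        scores = Q[:, s] @ K[:, s].T / np.sqrt(head_dim)   # (seq, seq)
        out[:, s] = softmax(scores) @ V[:, s]              # attention-weighted mix of values
    return out

rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 32))   # 16 hypothetical patch embeddings, dim 32
attended = multi_head_self_attention(patches, num_heads=4, rng=rng)
print(attended.shape)  # (16, 32)
```

Each output token is a weighted mixture of every input token, which is why ViT-based models in the table capture global leaf context that purely convolutional backbones must build up layer by layer.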