| Plant leaf disease detection using vision transformers for precision agriculture |
Murugavalli S, Gopi R |
2025, Scientific Reports |
PLA-ViT: Vision Transformer with multi-head self-attention, data augmentation, bilateral filtering, transfer learning |
Accuracy: 98.7%. Dataset: New Plant Diseases Dataset (Kaggle) |
Strengths: Outperforms CNN baselines; efficient inference (12 ms). Limitations: Attention blocks underperform on certain tasks; multi-label classification is left as future work.
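The core operation behind PLA-ViT and the other ViT-based models in this table is multi-head self-attention over patch embeddings. The following is a minimal numpy sketch of that mechanism, not PLA-ViT's actual implementation; dimensions and weight shapes are illustrative assumptions.

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product multi-head self-attention over patch embeddings.

    x: (seq_len, d_model); each weight matrix: (d_model, d_model).
    Illustrative only -- real ViTs add bias terms, residuals, and LayerNorm.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project, then split the embedding dimension into heads: (heads, seq, d_head).
    def split(w):
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(w_q), split(w_k), split(w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)              # row-wise softmax
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o                                      # output projection
```

Each head attends over the full patch sequence, which is what gives ViTs the global receptive field that the CNN baselines lack.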
| Multispectral Plant Disease Detection with Vision Transformer–Convolutional Neural Network Hybrid Approaches |
De Silva M, Brown D |
2023, Sensors |
Hybrid CNN-ViT: Compares CNNs (Xception, ResNet) against ViT-B16 on multispectral images
Accuracy: 83.3%. F1-Score: Highest for ViT-B16. Dataset: Custom balanced multispectral (2652 images) |
Strengths: Comprehensive comparison of models; creates a new balanced multispectral dataset. Limitations: Small dataset size limited performance of larger models; misclassifications between species with similar leaf shapes. |
| A hybrid framework for plant leaf disease detection and classification using convolutional neural networks and vision transformer |
Aboelenin S, et al. |
2025, Complex & Intelligent Systems |
Hybrid CNN + Ensemble + ViT: Ensemble of VGG16, Inception-V3, and DenseNet201 for local features, combined with a ViT for global features
Accuracy: 99.24% (Apple), 98% (Corn). Dataset: PlantVillage (Apple and Corn subsets). |
Strengths: High accuracy by combining strengths of multiple CNNs and a ViT. Limitations: Evaluated on lab-based PlantVillage data; real-world performance is unverified. |
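The ensemble step in hybrid frameworks like this one is commonly implemented as soft voting: each model's class-probability vector is averaged before taking the argmax. A minimal sketch (the paper's exact fusion rule and weights are not specified here, so uniform weighting is an assumption):

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Fuse per-model class probabilities by (weighted) averaging.

    prob_list: list of (n_classes,) probability vectors, one per model.
    weights: optional per-model weights; defaults to a uniform average.
    """
    probs = np.stack(prob_list)                 # (n_models, n_classes)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    fused = weights @ probs                     # weighted average per class
    return fused / fused.sum()                  # renormalize
```

Soft voting lets the ViT's global view compensate for individual CNN errors, which is the intuition behind the accuracy gains reported above.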
| A Deep Features Extraction Model Based on the Transfer Learning Model and Vision Transformer "TLMViT" for Plant Disease Classification |
Tabbakh A, Barpanda S |
2023, IEEE Access |
TLMViT: A sequential hybrid model using a pre-trained CNN for feature extraction followed by a ViT for classification. |
Accuracy: Not stated in this summary; the paper demonstrates the effectiveness of ViT-based processing of deep CNN features. Dataset: Not specified here, likely a standard plant disease dataset.
Strengths: Demonstrates a clear and effective hybrid architecture. Limitations: The sequential hybrid approach is becoming a common pattern, potentially limiting novelty. |
| Basil plant leaf disease detection using amalgam based deep learning models |
Mane D, et al. |
2024, Journal of Autonomous Intelligence |
Hybrid CNN+SVM: A CNN is used for feature extraction, and a Support Vector Machine (SVM) with an RBF kernel performs the classification. |
Accuracy: 95.02%. Dataset: Custom-created dataset of 803 basil leaf images across 5 classes. |
Strengths: Addresses lack of a standard dataset by creating a new one; hybrid model outperforms standalone CNN. Limitations: Small dataset size; uses a classical classifier (SVM) instead of a more modern Transformer head. |
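The classification head in this hybrid is a classical RBF-kernel SVM fitted on CNN feature vectors. A minimal scikit-learn sketch of that second stage, with random vectors standing in for the CNN embeddings (the paper's hyperparameters are not given here, so `C` and `gamma` defaults are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

def train_feature_svm(features, labels, C=1.0, gamma="scale"):
    """Fit an RBF-kernel SVM on feature vectors.

    features: (n_samples, n_features) array -- in the paper's pipeline these
    would be CNN embeddings; labels: (n_samples,) class ids.
    """
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    clf.fit(features, labels)
    return clf
```

The RBF kernel gives a non-linear decision boundary in feature space, which is why this hybrid can outperform the standalone CNN's linear softmax head on a small dataset.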
| Enhanced leaf disease detection: UNet for segmentation and optimized EfficientNet for disease classification |
Kotwal J, et al. |
2024, Software Impacts |
UNet + Optimized EfficientNet: UNet segments the disease region, followed by an optimized EfficientNet (AD-ENet) for classification. |
Accuracy: 99.91%. Precision: 99.87%. Recall: 99.81%. Dataset: PlantVillage and a custom Indian Soybean dataset. |
Strengths: Explicit segmentation step improves accuracy; optimization addresses overfitting and gradient issues. Limitations: High performance is on datasets that may not fully represent 'in-the-wild' complexity. |
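The segment-then-classify pipeline hinges on one simple step: the UNet's binary mask is used to suppress background pixels before the classifier runs. A hedged numpy sketch of that hand-off (the paper's actual preprocessing may differ, e.g. cropping to the mask's bounding box instead):

```python
import numpy as np

def mask_leaf_region(image, mask):
    """Zero out background pixels using a binary segmentation mask.

    image: (H, W, C) float array; mask: (H, W) of 0/1 values (e.g. a UNet
    output after thresholding). The masked image is what the classifier sees.
    """
    return image * mask[..., None]   # broadcast mask across channels
```

Removing background clutter this way is what lets the downstream EfficientNet focus on disease texture rather than scene context.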
| EMSAM: enhanced multi-scale segment anything model for leaf disease segmentation |
Li J, et al. |
2025, Frontiers in Plant Science |
EMSAM: Hybrid ViT-CNN architecture based on Segment Anything Model (SAM) for joint segmentation and classification. Fuses global features (ViT) and local features (CNN). |
Accuracy: 87.86%. Dice: 79.25%. IoU: 69.87%. Dataset: A new annotated subset of PlantVillage (PSD, 5200 images). |
Strengths: State-of-the-art segmentation performance; robust across different disease severities; establishes a new benchmark. Limitations: Higher computational cost; relies on PlantVillage data, limiting real-world generalization. |
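The Dice and IoU figures reported for EMSAM are standard overlap metrics on binary masks; for reference, they can be computed as follows (a generic sketch, not EMSAM's evaluation code):

```python
import numpy as np

def dice_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou
```

Dice weights the intersection twice, so it is always at least as large as IoU on the same prediction, consistent with the 79.25% Dice vs 69.87% IoU gap above.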
| Leveraging deep learning for plant disease and pest detection: a comprehensive review and future directions |
Shoaib M, et al. |
2025, Frontiers in Plant Science |
Review Paper: Surveys deep learning models (CNNs, FCNs, U-Nets, Mask R-CNN) for plant disease classification, detection, and segmentation. |
Accuracy: Notes that classification models often exceed 95% and segmentation models exceed 90% precision on benchmark datasets. |
Strengths: Comprehensive overview of the field. Limitations: Highlights key challenges like data scarcity, environmental variability, and the lab-to-field gap. |
| Vision Transformer with Mixture of Experts for addressing the lab-to-field gap in plant disease classification |
Salman Z, et al.
2025, Frontiers in Plant Science |
ViT + Mixture of Experts (MoE): A ViT backbone combined with an MoE framework where specialized experts are trained for different data aspects (e.g., imaging conditions). |
Accuracy: 20% improvement over baseline ViT; 68% accuracy on cross-domain (PlantVillage to PlantDoc) evaluation. Dataset: PlantVillage, PlantDoc. |
Strengths: Directly tackles the lab-to-field generalization problem; MoE enhances adaptability and robustness. Limitations: Increased model complexity. |
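The MoE idea is that a learned gate routes each input toward the experts best suited to it (here, different imaging conditions). A dense-gating numpy sketch, not the paper's architecture; expert functions and gate shapes are illustrative assumptions:

```python
import numpy as np

def moe_forward(x, experts, gate_w):
    """Dense mixture-of-experts: a softmax gate weights every expert's output.

    x: (d_in,) input feature vector; experts: list of callables
    (d_in,) -> (d_out,); gate_w: (d_in, n_experts) gating weights.
    """
    logits = x @ gate_w
    g = np.exp(logits - logits.max())
    g /= g.sum()                                  # softmax gate
    outputs = np.stack([f(x) for f in experts])   # (n_experts, d_out)
    return g @ outputs                            # gated combination
```

Because the gate is input-dependent, a field image and a lab image can be handled by different expert mixtures, which is the mechanism behind the cross-domain robustness claimed above.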
| Tulsi Leaf Disease Detection Using AI & Machine Learning |
Not specified |
Not specified, Aislyn Project Page |
Standard CNNs: Compares InceptionV3, ResNet50, and VGG16 for classification. |
Accuracy: Not reported; the stated goal is to select the most accurate of the three models. Dataset: TulsiDoc dataset from Mendeley (1000 samples).
Strengths: Addresses the specific, under-studied domain of Tulsi (holy basil) leaves. Limitations: Not a peer-reviewed research paper; uses foundational CNN models without novel contributions.