Abstract: To improve the identification of fruit tree pests so that preventive measures can be taken promptly, this study focused on six major pests that pose a significant threat to fruit trees and proposed an improved lightweight MobileViT recognition model. The model targets three problems of pest recognition in natural environments: complex backgrounds, small pest targets that are difficult to detect, and high feature similarity between different pest categories. In the improved model, the partial convolution (PConv) module replaced some of the standard convolution modules in the original MobileViT module. The feature fusion strategy within the MobileViT module was also modified to concatenate the input features, local expressive features, and global expressive features. The MV2 module in the tenth layer and the MobileViT module in the eleventh layer were removed and replaced with an improved atrous spatial pyramid pooling (ASPP) module that produces multi-scale fused features. Furthermore, the model used the SiLU activation function instead of ReLU6. Finally, the model was trained with transfer learning based on the ImageNet dataset. The experimental results showed that recognition accuracy on images of the six categories of fruit tree pests reached 93.77%, with 8.40×10⁵ parameters. Compared with the original MobileViT model, recognition accuracy improved by 7.5 percentage points while the parameter count decreased by 33.86%. Compared with CNN models commonly used for pest recognition, namely AlexNet, ResNet 50, MobileNetV2, and ShuffleNetV2, the proposed model's accuracy was higher by 8.25, 4.78, 7.27, and 7.41 percentage points, respectively, and its parameter count was lower by 6.03×10⁷, 2.48×10⁷, 2.66×10⁶, and 5.30×10⁵, respectively.
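The parameter savings from PConv can be illustrated with a short weight count. This sketch assumes the FasterNet-style partial convolution, in which a regular convolution is applied to only a fraction of the input channels (a ratio of 1/4 is assumed here) while the remaining channels pass through unchanged. The channel width of 64, the 3×3 kernel, and the helper names are hypothetical and are not the settings used in this model.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def pconv_params(channels: int, k: int, ratio: float = 0.25) -> int:
    """Weight count of a partial convolution (assumed FasterNet-style):
    only the first `ratio` of the channels is convolved; the rest are
    passed through untouched and contribute no weights."""
    c_p = int(channels * ratio)
    return conv_params(c_p, c_p, k)


if __name__ == "__main__":
    c, k = 64, 3  # hypothetical channel width and kernel size
    full = conv_params(c, c, k)
    partial = pconv_params(c, k)
    print(full, partial, partial / full)  # 36864 2304 0.0625
```

With a ratio of 1/4, the convolution touches 1/4 of the input and output channels, so its weight count falls to (1/4)² = 1/16 of a full convolution of the same width.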
Compared with Transformer recognition models such as ViT and Swin Transformer, the accuracy was higher by 19.03 and 9.8 percentage points, respectively, and the parameter count was lower by 8.56×10⁷ and 2.75×10⁷, respectively. The model is therefore suitable for deployment in resource-constrained environments such as mobile terminals, and it can support the effective identification of small target pests on fruit trees against complex backgrounds.
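As a consistency check on these comparisons, the baseline figures they imply can be back-computed from the reported values. This sketch only rearranges numbers stated in this abstract: baseline accuracy is 93.77% minus the reported gap, and baseline parameters are 8.40×10⁵ plus the reported reduction. The original MobileViT parameter count is recovered from the 33.86% reduction. The outputs are derived estimates subject to rounding in the reported figures. They are not results reported by the study, and the helper names are hypothetical.

```python
PROPOSED_ACC = 93.77       # % accuracy, as reported
PROPOSED_PARAMS = 8.40e5   # parameter count, as reported

# (accuracy gap in percentage points, parameter reduction), as reported
REPORTED_GAPS = {
    "AlexNet": (8.25, 6.03e7),
    "ResNet 50": (4.78, 2.48e7),
    "MobileNetV2": (7.27, 2.66e6),
    "ShuffleNetV2": (7.41, 5.30e5),
    "ViT": (19.03, 8.56e7),
    "Swin Transformer": (9.8, 2.75e7),
}


def implied_baseline(acc_gap: float, param_drop: float) -> tuple[float, float]:
    """Back-compute a baseline's accuracy and parameter count from the deltas."""
    return round(PROPOSED_ACC - acc_gap, 2), PROPOSED_PARAMS + param_drop


def implied_original_params(reduction: float = 0.3386) -> float:
    """Original MobileViT parameter count implied by the relative reduction."""
    return PROPOSED_PARAMS / (1.0 - reduction)


if __name__ == "__main__":
    for name, (gap, drop) in REPORTED_GAPS.items():
        acc, params = implied_baseline(gap, drop)
        print(f"{name}: ~{acc:.2f}% accuracy, ~{params:.3g} parameters")
    print(f"Original MobileViT: ~{round(PROPOSED_ACC - 7.5, 2)}% accuracy, "
          f"~{implied_original_params():.3g} parameters")
```

Because the reported deltas are rounded, the back-computed baselines are approximate; they serve only to check that the stated gaps are mutually consistent.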