Abstract
Agricultural pests and diseases cause large economic losses, which rapid recognition and timely treatment can minimize. Most existing image databases are produced in laboratories, where shooting is expensive and the image backgrounds differ greatly from the real farmland environment. Moreover, although existing recognition systems can locate entities, they cannot provide semantically interpretable discriminative evidence, which makes it difficult for them to distinguish entities with very similar appearances. Fortunately, professional agricultural control documents contain text descriptions that clearly distinguish similar entities. In this paper, a textual-visual database for agricultural pests and diseases, named APD-229, is constructed. The goal of APD-229 is to learn, from the control documents, prior knowledge that distinguishes similar entities, and to use that knowledge to guide an image recognition system in fine-grained classification. The database contains two sub-databases, a pest set and a disease set, which together comprise 121,213 images and 8,209 text descriptions across 229 categories. Extensive experiments on APD-229 show that in the single-modal image classification task, the accuracy is 75.15% on the pest set and 61.23% on the disease set, while in the multi-modal image classification task the accuracy is 78.74% and 71.67%, respectively. Compared with the single-modal experiments, the multi-modal accuracy is improved by 4.78% and 17% in relative terms, respectively. APD-229 is publicly available at https://github.com/SDUST-MMML/APD-229.
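The reported gains of 4.78% and 17% appear to be relative improvements over the single-modal accuracy rather than differences in percentage points. The short check below is only a sketch of that arithmetic using the four accuracies quoted above; the function name `relative_gain` and the `results` dictionary are illustrative and are not part of the APD-229 release.

```python
def relative_gain(single: float, multi: float) -> float:
    """Relative improvement of multi-modal over single-modal accuracy, in percent."""
    return (multi - single) / single * 100.0


# Accuracies (in %) quoted in the abstract: (single-modal, multi-modal).
results = {
    "pest": (75.15, 78.74),
    "disease": (61.23, 71.67),
}

for name, (single, multi) in results.items():
    absolute = multi - single  # difference in percentage points
    relative = relative_gain(single, multi)
    print(f"{name}: +{absolute:.2f} points absolute, +{relative:.2f}% relative")
```

Under this reading, the absolute gains are about 3.59 and 10.44 percentage points, which correspond to the relative gains of roughly 4.78% and 17% stated in the abstract.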