Research article - Peer-reviewed, 2022
Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil
Monteiro, Leonardo A.; Ramos, Rafael M.; Battisti, Rafael; Soares, Johnny R.; Oliveira, Julianne C.; Figueiredo, Gleyce K. D. A.; Lamparelli, Rubens A. C.; Nendel, Claas; Lana, Marcos Alberto
Abstract
Large-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions of Brazil, and evaluated the reliability of the "best" data-driven model as a tool for early prediction of soybean yields in an independent year. Our methodology explicitly describes a general approach for combining publicly available databases and building data-driven models (multiple linear regression, MLR; random forests, RF; and support vector machines, SVM) to predict yields at large scales using gridded weather and soil data. We filtered out counties with missing or suspicious yield records, resulting in a crop yield database containing 3450 records (23 years × 150 "high-quality" counties). RF and SVM had similar results in the calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed the potential of data-driven models to predict soybean yields at large scales in Brazil around one month before harvest (i.e. 90 DAS). Using a well-trained RF model to predict crop yield for a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha⁻¹, representing a relative error (rRMSE) between 9.2 and 41.5%. Although we demonstrated robust data-driven models for yield prediction at large scales in Brazil, there is still room for improving their accuracy. Potential strategies to improve model accuracy include adding explanatory variables related to the crop (e.g. growing degree-days, flowering dates), the environment (e.g. remotely sensed vegetation indices, number of dry and heat days during the cycle), and outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology).
Keywords
Large-scale analysis; Machine learning approaches; Public databases; Geospatial and temporal variability; Climatic and soil variables
Published in
International Journal of Plant Production
2022, volume: 16, number: 4, pages: 691-703
Publisher: SPRINGER
Authors' information
Monteiro, Leonardo A.
Food and Agriculture Organization of the United Nations (FAO)
Monteiro, Leonardo A.
University of Kentucky
Ramos, Rafael M.
University UNIEURO
Battisti, Rafael
Universidade Federal de Goias
Soares, Johnny R.
Universidade Estadual de Campinas
Oliveira, Julianne C.
Chalmers University of Technology
Figueiredo, Gleyce K. D. A.
Universidade Estadual de Campinas
Lamparelli, Rubens A. C.
Ctr Energy Planning NIPE
Nendel, Claas
Leibniz Zentrum fur Agrarlandschaftsforschung (ZALF)
Nendel, Claas
University of Potsdam
Nendel, Claas
Czech Academy of Sciences
Lana, Marcos Alberto
Swedish University of Agricultural Sciences, Department of Crop Production Ecology
Sustainable Development Goals
SDG2 Zero hunger
UKÄ Subject classification
Agricultural Science
Publication Identifiers
DOI: https://doi.org/10.1007/s42106-022-00209-0
URI (permanent link to this page)
https://res.slu.se/id/publ/119035