Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil

Monteiro, Leonardo A.; Ramos, Rafael M.; Battisti, Rafael; Soares, Johnny R.; Oliveira, Julianne C.; Figueiredo, Gleyce K. D. A.; Lamparelli, Rubens A. C.; Nendel, Claas; Lana, Marcos Alberto

doi:10.1007/s42106-022-00209-0

Research article, 2022, peer reviewed

Abstract

Large-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions of Brazil, and evaluated the reliability of the "best" data-driven model as a tool for early prediction of soybean yields for an independent year. Our methodology explicitly describes a general approach for combining publicly available databases and building data-driven models (multiple linear regression, MLR; random forests, RF; and support vector machines, SVM) to predict yields at large scales using gridded weather and soil data. We filtered out counties with missing or suspicious yield records, resulting in a crop yield database of 3450 records (23 years × 150 "high-quality" counties). RF and SVM performed similarly in the calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed the potential of data-driven models for predicting soybean yields at large scales in Brazil around one month before harvest (i.e. at 90 DAS). When a well-trained RF model was used to predict yields for a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha⁻¹, corresponding to a relative RMSE (rRMSE) of 9.2 to 41.5%. Although we obtained robust data-driven models for yield prediction at large scales in Brazil, there is still room for improving their accuracy. The inclusion of explanatory variables related to the crop (e.g. growing degree-days, flowering dates), the environment (e.g. remotely sensed vegetation indices, number of dry and heat days during the cycle) and outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology) are potential strategies to improve model accuracy.
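The abstract reports two quantitative steps: reducing the yield database to a complete county-by-year panel (23 years × 150 counties = 3450 records) and scoring predictions with RMSE and rRMSE. The sketch below illustrates both under stated assumptions; it is not the authors' code. The helper `keep_complete_counties` and its input layout are hypothetical and only handle missing records (the study's criteria for "suspicious" records are not given here). `rrmse` assumes the common definition of RMSE divided by the mean observed yield, expressed in percent.

```python
import math
from typing import Dict, Optional, Sequence


def keep_complete_counties(
    yields: Dict[str, Dict[int, Optional[float]]], years: Sequence[int]
) -> Dict[str, Dict[int, float]]:
    """Keep only counties with a non-missing yield for every year.

    Hypothetical helper: handles missing records only; the study's
    criteria for "suspicious" records are not reproduced here.
    """
    kept: Dict[str, Dict[int, float]] = {}
    for county, series in yields.items():
        values = [series.get(year) for year in years]  # None if absent
        if all(v is not None for v in values):
            kept[county] = {year: float(v) for year, v in zip(years, values)}
    return kept


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the inputs (e.g. kg ha-1)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Relative RMSE in percent, normalised by the mean observed yield (assumed)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


if __name__ == "__main__":
    panel = {
        "county_a": {2000: 3000.0, 2001: 3200.0},
        "county_b": {2000: 2800.0, 2001: None},  # missing year, so dropped
    }
    print(sorted(keep_complete_counties(panel, [2000, 2001])))  # ['county_a']
    obs = [3000.0, 3200.0, 2800.0]
    pred = [3100.0, 3000.0, 2900.0]
    print(round(rmse(obs, pred), 1), round(rrmse(obs, pred), 2))  # 141.4 4.71
```

Because rRMSE divides by the mean observed yield, the same absolute RMSE translates into a larger relative error in lower-yielding years, which is one way the abstract's RMSE and rRMSE ranges can relate to each other.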

Keywords

Large-scale analysis; Machine learning approaches; Public databases; Geospatial and temporal variability; Climatic and soil variables

Published in

International Journal Of Plant Production
2022, volume: 16, number: 4, pages: 691-703
Publisher: SPRINGER

SLU Authors

Oliveira, Julianne
- Chalmers University of Technology
Lana, Marcos
- Department of Crop Production Ecology, Swedish University of Agricultural Sciences

Global goals (SDG)

SDG2 Zero hunger

UKÄ Subject classification

Agricultural Science

Permanent link to this page (URI)

https://res.slu.se/id/publ/119035