SLU publication database (SLUpub)

Research article | 2014 | Peer reviewed | Open access

Improving the prediction performance of a large tropical vis-NIR spectroscopic soil library from Brazil by clustering into smaller subsets or use of data mining calibration techniques

Araújo, S. R.; Wetterlind, Johanna; Demattê, J. A. M.; Stenberg, Bo


Effective agricultural planning requires basic soil information. In recent decades, visible and near-infrared (vis-NIR) diffuse reflectance spectroscopy has been shown to be a viable alternative for rapidly analysing soil properties. We studied 7172 samples of seven different soil types collected from several regions of Brazil, varying in organic matter (OM) (0.2-10.3%) and clay content (0.2-99.0%). The aim was to explore whether the performance of vis-NIR data in predicting OM and clay content in this library could be enhanced by dividing it into smaller sub-libraries on the basis of their vis-NIR spectra. We applied partial least squares regression (PLSR) models to the sub-libraries and compared the results with PLSR and two non-linear calibration techniques, boosted regression trees (BT) and support vector machines (SVM), applied to the whole library. The whole-library calibrations for clay performed well (modelling efficiency, ME > 0.82; root mean squared error, RMSE < 10.9%), reflecting the direct spectral response of this property in the vis-NIR range. Calibrations for OM were reasonably good (ME > 0.60; RMSE < 0.55%), especially in view of the very small variation in this property. The best results, however, were obtained by dividing the large library into smaller subsets according to variation in the mean-normalized or first-derivative spectra. This grouped the global data set into clusters that were more uniform in mineralogy, regardless of geographical origin, and improved predictive performance. The best clustering method reduced the validation RMSE to 8.6% clay and 0.47% OM, corresponding to reductions of 21% and 15%, respectively, compared with whole-library PLSR. For the whole library, SVM performed almost equally well, reducing RMSE to 8.9% clay and 0.48% OM.
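The abstract gives no code, so the following is only a minimal, standard-library Python sketch of the general sub-library idea under stated assumptions, not the authors' implementation. We read "mean-normalized" as dividing each spectrum by its own mean and take the first derivative as a plain band-to-band difference; the abstract does not name the clustering algorithm, so k-means with farthest-first seeding is our assumption, as is routing a new sample to the sub-library with the nearest centroid. All function names (`mean_normalize`, `first_derivative`, `preprocess`, `kmeans`, `nearest`, `rmse`, `modelling_efficiency`, `relative_reduction`) are hypothetical. The per-cluster PLSR fit is left out; a library implementation such as scikit-learn's `PLSRegression` would normally fill that step. ME is assumed to take the common Nash-Sutcliffe form, 1 - SSE/SST.

```python
import math
from typing import List, Sequence, Tuple

Spectrum = List[float]


def mean_normalize(spectrum: Sequence[float]) -> Spectrum:
    """Assumed reading of 'mean-normalized': divide each band by the spectrum's mean."""
    m = sum(spectrum) / len(spectrum)
    return [v / m for v in spectrum]


def first_derivative(spectrum: Sequence[float]) -> Spectrum:
    """Finite difference between adjacent bands (no smoothing applied)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def preprocess(spectrum: Sequence[float], method: str) -> Spectrum:
    """Apply one of the two (hypothetical) preprocessing options before clustering."""
    if method == "mean":
        return mean_normalize(spectrum)
    if method == "derivative":
        return first_derivative(spectrum)
    raise ValueError(f"unknown method: {method!r}")


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: List[Spectrum]) -> int:
    """Index of the closest centroid, i.e. the sub-library a sample is routed to."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(point, centroids[j]))


def kmeans(points: List[Spectrum], k: int, max_iter: int = 100) -> Tuple[List[int], List[Spectrum]]:
    """Plain k-means with deterministic farthest-first initialisation (an assumed choice)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of samples")
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the sample farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def modelling_efficiency(obs: Sequence[float], pred: Sequence[float]) -> float:
    """1 - SSE/SST: 1 is a perfect fit, 0 is no better than predicting the mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional drop from a baseline error to an improved error."""
    return (baseline - improved) / baseline
```

As an arithmetic check on the abstract, `relative_reduction(10.9, 8.6)` is about 0.21 and `relative_reduction(0.55, 0.47)` about 0.15, so the reported 21% and 15% reductions are consistent with whole-library PLSR validation RMSEs of roughly 10.9% clay and 0.55% OM. Those baseline values are back-calculated here from the abstract's figures, not quoted from it. Mean normalization in this sketch removes overall brightness differences between samples, so clusters form on spectral shape rather than albedo.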

Published in

European Journal of Soil Science
2014, Volume 65, Number 5, pages 718-729