
Research article, 2024, Peer reviewed, Open access

Exploring Data Augmentation Algorithm to Improve Genomic Prediction of Top-Ranking Cultivars

Montesinos-Lopez, Osval A.; Sivakumar, Arvinth; Huerta Prado, Gloria Isabel; Salinas-Ruiz, Josafhat; Agbona, Afolabi; Ortiz Reyes, Axel Efrain; Alnowibet, Khalid; Ortiz, Rodomiro; Montesinos-Lopez, Abelardo; Crossa, Jose


Genomic selection (GS) is a groundbreaking statistical machine learning method for advancing plant and animal breeding. Nonetheless, its practical implementation remains challenging due to numerous factors affecting its predictive performance. This research explores the potential of data augmentation to enhance prediction accuracy across entire datasets and specifically within the top 20% of the testing set. Our findings indicate that, overall, the data augmentation method (method A), when compared to the conventional model (method C) and assessed using Mean Arctangent Absolute Prediction Error (MAAPE) and normalized root mean square error (NRMSE), did not improve the prediction accuracy for the unobserved cultivars. However, significant improvements in prediction accuracy (evidenced by reduced prediction error) were observed when data augmentation was applied exclusively to the top 20% of the testing set. Specifically, reductions in MAAPE_20 and NRMSE_20 by 52.86% and 41.05%, respectively, were noted across various datasets. Further investigation is needed to refine data augmentation techniques for effective use in genomic prediction.
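As a rough illustration of how the two error metrics named above could be computed on the top 20% of a testing set, here is a minimal sketch. It rests on several assumptions not stated in the abstract: MAAPE is taken in its usual form, the mean of arctan(|(y - ŷ)/y|); NRMSE is normalized by the mean of the observed values; and the "top 20%" is selected by ranking the observed values. The function names and toy data are hypothetical and are not taken from the article.

```python
import math

def maape(y_true, y_pred):
    """Mean of arctan(|(y - y_hat) / y|), in radians (range 0 to pi/2)."""
    terms = []
    for y, p in zip(y_true, y_pred):
        if y == 0:
            # The ratio is unbounded when y is 0; arctan tends to pi/2
            # unless the prediction is exact.
            terms.append(0.0 if p == y else math.pi / 2)
        else:
            terms.append(math.atan(abs((y - p) / y)))
    return sum(terms) / len(terms)

def nrmse(y_true, y_pred):
    """RMSE divided by the mean observed value (assumed normalization)."""
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (sum(y_true) / len(y_true))

def top_fraction(y_true, y_pred, fraction=0.2):
    """Keep the pairs whose observed value ranks in the top `fraction`."""
    k = max(1, math.ceil(fraction * len(y_true)))
    order = sorted(range(len(y_true)), key=lambda i: y_true[i], reverse=True)[:k]
    return [y_true[i] for i in order], [y_pred[i] for i in order]

# Toy testing set (hypothetical values, for illustration only).
y_obs = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 4.0, 3.0, 2.0, 1.0]
y_hat = [5.5, 5.0, 7.0, 7.0, 8.0, 8.0, 4.0, 3.5, 2.5, 1.0]

top_obs, top_hat = top_fraction(y_obs, y_hat)
maape_20 = maape(top_obs, top_hat)
nrmse_20 = nrmse(top_obs, top_hat)
print(f"MAAPE_20 = {maape_20:.4f}, NRMSE_20 = {nrmse_20:.4f}")
```

Both metrics are errors, so lower values mean better predictions; a comparison between two methods would evaluate these functions on each method's predictions for the same testing set.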


machine learning models for genomic prediction and selection; plant breeding; data augmentation

Published in

2024, Volume: 17, number: 6, article number: 260

    UKÄ Subject classification

    Agricultural Science
    Genetics and Breeding
