SLU publication database (SLUpub)

Research article, 2019, Peer reviewed, Open access

Evaluating complex relationships between ecological indicators and environmental factors in the Baltic Sea: A machine learning approach

Lehikoinen, Annukka; Olsson, Jens; Bergstrom, Lena; Bergstrom, Ulf; Bryhn, Andreas; Fredriksson, Ronny; Uusitalo, Laura


The state of marine ecosystems is increasingly evaluated using indicators. Indicator assessment results need to be interpreted in the context of the whole ecosystem in order to understand the key factors determining the status of the assessed environmental components. Data available from the system's different components are, however, often heterogeneous: they may represent different spatial and temporal scales, and different parameters can be measured with different accuracy. This makes it difficult to evaluate the relationship between these variables and the status of the environment as measured by indicators. We studied whether probabilistic, machine learning-based classifiers could provide a means of assessing the relationships between multiple environmental factors and ecological indicators. This paper demonstrates the use of Bayesian network classifiers (with the Tree-augmented Naive Bayes classifier, TAN, as the specific case example), together with structural learning from data and the Entropy Minimization Discretization (IEMD) algorithm, to study environment-indicator relationships within coastal fish communities in the Baltic Sea. Using two Baltic-wide indicators of coastal fish community status and a heterogeneous set of potentially influential natural and anthropogenic variables, we explore and discuss the potential of the approach. Given pre-defined cutting points for the indicators, such as the indicators' classification thresholds, the method makes it possible to identify relevant variables and estimate their relative importance. This information could be used in environmental management to demonstrate at which threshold value the state of an indicator is likely to respond to a pressure or a combination of pressures. In contrast to many other multivariate statistical methodologies, the presented approach can handle missing data as well as data of varying types, from fully quantitative to presence-absence, in the same analysis.
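
To illustrate the general idea behind entropy-based discretization mentioned in the abstract, the following is a minimal Python sketch that finds a single cut point on a continuous environmental variable so that the resulting two bins are as pure as possible with respect to an indicator class. It is not the study's IEMD implementation, whose details are not given here; the function names `entropy` and `best_cut_point`, the example values, and the class labels "good" and "poor" are all hypothetical. Skipping records with a missing value is likewise a simplification made for this sketch only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Find the binary split of `values` that minimizes weighted class entropy.

    Returns (cut, weighted_entropy), or None when no split is possible.
    Records whose value is None are skipped (illustrative assumption).
    """
    pairs = sorted((v, lab) for v, lab in zip(values, labels) if v is not None)
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if best is None or h < best[1]:
            best = ((lo + hi) / 2, h)  # midpoint as candidate threshold
    return best


# Hypothetical pressure values and indicator status classes.
pressure = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
status = ["good", "good", "good", "poor", "poor", "poor"]
cut, h = best_cut_point(pressure, status)
print(cut, h)
```

A full discretization procedure would typically apply such a split recursively within each bin and stop according to some criterion; that part is omitted here.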


Bayesian network classifiers; Tree-augmented Naive Bayes; Entropy Minimization Discretization; Coastal fish communities; Baltic Sea

Published in

Ecological Indicators
2019, Volume: 101, pages: 117-125