Logistic regression for clustered data from environmental monitoring programs

Ekström, M.; Esseen, P.-A.; Westerlund, B.; Grafström, A.; Jonsson, B. G.; Ståhl, G.

doi:10.1016/j.ecoinf.2017.10.006

Abstract

Large-scale surveys, such as national forest inventories and vegetation monitoring programs, usually have complex sampling designs that include geographical stratification and units organized in clusters. When models are developed using data from such programs, a key question is whether or not to utilize design information when analyzing the relationship between a response variable and a set of covariates. Standard statistical regression methods often fail to account for complex sampling designs, which may lead to severely biased estimators of model coefficients. Furthermore, ignoring that data are spatially correlated within clusters may underestimate the standard errors of regression coefficient estimates, with a risk for drawing wrong conclusions. We first review general approaches that account for complex sampling designs, e.g. methods using probability weighting, and stress the need to explore the effects of the sampling design when applying logistic regression models. We then use Monte Carlo simulation to compare the performance of the standard logistic regression model with two approaches to model correlated binary responses, i.e. cluster-specific and population-averaged logistic regression models. As an example, we analyze the occurrence of epiphytic hair lichens in the genus Bryoria; an indicator of forest ecosystem integrity. Based on data from the National Forest Inventory (NFI) for the period 1993-2014 we generated a data set on hair lichen occurrence on > 100,000 Picea abies trees distributed throughout Sweden. The NFI data included ten covariates representing forest structure and climate variables potentially affecting lichen occurrence. Our analyses show the importance of taking complex sampling designs and correlated binary responses into account in logistic regression modeling to avoid the risk of obtaining notably biased parameter estimators and standard errors, and erroneous interpretations about factors affecting e.g. hair lichen occurrence. 
We recommend comparisons of unweighted and weighted logistic regression analyses as an essential step in development of models based on data from large-scale surveys.

Keywords

Bryoria; Cluster-specific model; Complex sampling design; Correlated data; Logistic regression; National forest inventory; Population-averaged model

Published in

Ecological Informatics
2018, volume: 43, pages: 165-173
Publisher: ELSEVIER SCIENCE BV

SLU Authors

Ekström, Magnus
- Umeå University
Westerlund, Bertil
- Department of Forest Resource Management, Swedish University of Agricultural Sciences
Grafström, Anton
- Department of Forest Resource Management, Swedish University of Agricultural Sciences
Ståhl, Göran
- Department of Forest Resource Management, Swedish University of Agricultural Sciences

UKÄ Subject classification

Forest Science
Probability Theory and Statistics

Publication identifier

DOI: https://doi.org/10.1016/j.ecoinf.2017.10.006

Permanent link to this page (URI)

https://res.slu.se/id/publ/94070
