Morrison, David
- Department of Animal Biosciences, Swedish University of Agricultural Sciences
Review article, 2014, Peer reviewed
Exploratory data analysis (EDA), involving both graphical displays and numerical summaries of data, is intended to evaluate the characteristics of the data as well as to provide a form of data mining. For multivariate data, the best-known visual summaries include discriminant analysis, ordination, and clustering, particularly metric ordinations such as principal components analysis. However, these techniques have limiting mathematical assumptions that are not always realistic. Recently, network techniques have been developed in the biological field of phylogenetics that address some of these limitations. They are now widely used in biology under the name phylogenetic networks, but they are actually of general applicability to any multivariate dataset. Phylogenetic networks are fast and relatively easy to calculate, which makes them ideal as a tool for EDA. This review provides an overview of the field, with particular reference to the use of what are called splits graphs. There are several types of splits graph, which summarize the multivariate data in different ways. Example analyses are presented based on the neighbor-net graph, which seems to be the most generally useful of the available algorithms. This should encourage the more widespread use of these networks whenever a summary of a multivariate dataset is required. © 2014 John Wiley & Sons, Ltd.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
2014, volume: 4, number: 4, pages: 296-312
Publisher: WILEY PERIODICALS, INC
Information Systems
https://res.slu.se/id/publ/68284