SLU publication database (SLUpub)

Review article · 2014 · Peer reviewed

Phylogenetic networks: a new form of multivariate data summary for data mining and exploratory data analysis

Morrison, David

Abstract

Exploratory data analysis (EDA), involving both graphical displays and numerical summaries of data, is intended to evaluate the characteristics of the data as well as to provide a form of data mining. For multivariate data, the best-known visual summaries include discriminant analysis, ordination, and clustering, particularly metric ordinations such as principal components analysis. However, these techniques have limiting mathematical assumptions that are not always realistic. Recently, network techniques have been developed in the biological field of phylogenetics that address some of these limitations. They are now widely used in biology under the name phylogenetic networks, but they are actually of general applicability to any multivariate dataset. Phylogenetic networks are fast and relatively easy to calculate, which makes them ideal as a tool for EDA. This review provides an overview of the field, with particular reference to the use of what are called splits graphs. There are several types of splits graph, which summarize the multivariate data in different ways. Example analyses are presented based on the neighbor-net graph, which seems to be the most generally useful of the available algorithms. This should encourage the more widespread use of these networks whenever a summary of a multivariate dataset is required. © 2014 John Wiley & Sons, Ltd.

Published in

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
2014, volume: 4, number: 4, pages: 296-312
Publisher: Wiley Periodicals, Inc.

UKÄ Subject classification

Information Systems

Publication identifier

  • DOI: https://doi.org/10.1002/widm.1130

Permanent link to this page (URI)

https://res.slu.se/id/publ/68284