SLU publication database (SLUpub)

Report 2004

Towards the Optimal Feature Selection in High-Dimensional Bayesian Network Classifiers

Pavlenko, T; Hall, M; von Rosen

Abstract

Incorporating subset selection into a classification method often carries a number of advantages, especially when operating in the domain of high-dimensional features. In this paper, we focus on Bayesian network (BN) classifiers and formalize feature selection from the perspective of improving classification accuracy. To explore the effect of high dimensionality, we apply growing dimension asymptotics, meaning that the number of training examples is relatively small compared to the number of feature nodes. In order to ascertain which set of features is indeed relevant for a classification task, we introduce a distance-based scoring measure reflecting how well the set separates different classes. This score is then employed for feature selection, using the weighted form of the BN classifier. The idea is to view weights as inclusion-exclusion factors which eliminate the sets of features whose separation score does not exceed a given threshold. We establish the asymptotically optimal threshold and demonstrate that the proposed selection technique improves classification accuracy under different a priori assumptions concerning the separation strength.
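
The thresholding idea described in the abstract can be illustrated with a minimal sketch. This is not the report's method. It assumes two classes, and it uses a simple squared standardized mean distance per feature block as a stand-in for the report's separation score. The function names `separation_scores` and `inclusion_weights`, the `blocks` argument, and the `threshold` value are hypothetical; in particular, the threshold is supplied by the user and is not the asymptotically optimal one established in the report.

```python
import numpy as np

def separation_scores(X0, X1, blocks):
    """Score each feature block by how far apart the two class means are.

    Assumption: the score is the sum over the block's features of the squared
    mean difference divided by the pooled variance. This is an illustrative
    stand-in, not the scoring measure defined in the report.
    """
    n0, n1 = len(X0), len(X1)
    scores = []
    for idx in blocks:
        diff = X0[:, idx].mean(axis=0) - X1[:, idx].mean(axis=0)
        # Pooled per-feature sample variance across the two classes.
        pooled = ((n0 - 1) * X0[:, idx].var(axis=0, ddof=1)
                  + (n1 - 1) * X1[:, idx].var(axis=0, ddof=1)) / (n0 + n1 - 2)
        scores.append(float(np.sum(diff ** 2 / pooled)))
    return np.array(scores)

def inclusion_weights(scores, threshold):
    """Weight 1 keeps a block, weight 0 excludes it (hypothetical threshold)."""
    return (scores > threshold).astype(float)

# Toy data: four features grouped into two blocks; only the first block
# differs between the classes.
X0 = np.array([[0, 0, 0, 0],
               [1, 1, 1, 1],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)
X1 = X0 + np.array([5.0, 5.0, 0.0, 0.0])
blocks = [[0, 1], [2, 3]]

scores = separation_scores(X0, X1, blocks)
weights = inclusion_weights(scores, threshold=1.0)
```

In this toy setting the first block receives a large score and weight 1, while the second block has identical class means, scores zero, and is excluded.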

Keywords

Bayesian network; augmentation; separation strength; growing dimension asymptotics; weighted classifier; subset selection; limiting error probability

Published in

Research report (Centre of Biostochastics)
2004, number: 1
Publisher: Centre of Biostochastics