SLU publication database (SLUpub)

Abstract

Incorporating subset selection into a classification method often carries a number of advantages, especially when operating in the domain of high-dimensional features. In this paper, we focus on Bayesian network (BN) classifiers and formalize feature selection from the perspective of improving classification accuracy. To explore the effect of high dimensionality, we apply growing dimension asymptotics, meaning that the number of training examples is relatively small compared to the number of feature nodes. To ascertain which sets of features are indeed relevant for a classification task, we introduce a distance-based scoring measure reflecting how well a set separates the different classes. This score is then employed for feature selection, using the weighted form of the BN classifier. The idea is to view the weights as inclusion-exclusion factors that eliminate the sets of features whose separation score does not exceed a given threshold. We establish the asymptotically optimal threshold and demonstrate that the proposed selection technique improves classification accuracy under different a priori assumptions concerning the separation strength.
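The thresholding idea can be illustrated with a small sketch. It is not the paper's method: it assumes two classes with equal priors, Gaussian class-conditional densities with a diagonal pooled covariance, and features grouped into disjoint hypothetical `blocks` standing in for the feature sets. The separation score used here (squared standardized distance between estimated class means within a block) is one plausible distance-based score, and `threshold=1.5` is an arbitrary illustrative value, not the asymptotically optimal threshold derived in the paper. All function names are hypothetical.

```python
import numpy as np

def fit_gaussian(X, y):
    """Estimate class means and a pooled per-feature variance (diagonal covariance assumption)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    var = ((X0 - mu0) ** 2).sum(axis=0) + ((X1 - mu1) ** 2).sum(axis=0)
    var /= len(X) - 2  # pooled within-class variance
    return mu0, mu1, var

def separation_scores(mu0, mu1, var, blocks):
    # Hypothetical distance-based score: squared standardized mean difference summed over a block
    return np.array([np.sum((mu1[b] - mu0[b]) ** 2 / var[b]) for b in blocks])

def block_weights(scores, threshold):
    # Weights act as inclusion-exclusion factors: 1 keeps a block, 0 eliminates it
    return (scores > threshold).astype(float)

def predict(X, mu0, mu1, var, blocks, weights):
    # Weighted sum of per-block Gaussian log-likelihood ratios log p1(x)/p0(x); equal priors assumed
    llr = np.zeros(len(X))
    for b, w in zip(blocks, weights):
        d0 = ((X[:, b] - mu0[b]) ** 2 / var[b]).sum(axis=1)
        d1 = ((X[:, b] - mu1[b]) ** 2 / var[b]).sum(axis=1)
        llr += w * 0.5 * (d0 - d1)
    return (llr > 0).astype(int)

# Synthetic example: 200 features in 100 blocks of 2, only the first 5 blocks informative,
# with fewer training examples (100) than features (200).
rng = np.random.default_rng(0)
p, n_per_class = 200, 50
shift = np.zeros(p)
shift[:10] = 1.5

def sample(n):
    X0 = rng.standard_normal((n, p))
    X1 = rng.standard_normal((n, p)) + shift
    return np.vstack([X0, X1]), np.repeat([0, 1], n)

X, y = sample(n_per_class)
blocks = [list(range(2 * k, 2 * k + 2)) for k in range(p // 2)]
mu0, mu1, var = fit_gaussian(X, y)
scores = separation_scores(mu0, mu1, var, blocks)
w = block_weights(scores, threshold=1.5)

X_test, y_test = sample(500)
acc = (predict(X_test, mu0, mu1, var, blocks, w) == y_test).mean()
print("kept blocks:", np.flatnonzero(w), "test accuracy:", acc)
```

Setting a block's weight to zero removes its entire log-likelihood-ratio term, which is how the weights eliminate feature sets rather than merely down-weighting them. In this sketch the threshold is a free parameter; choosing it well as the dimension grows is exactly the question the paper addresses asymptotically.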

Keywords

Bayesian network; augmentation; separation strength; growing dimension asymptotics; weighted classifier; subset selection; limiting error probability

Published in

Research report (Centre of Biostochastics)
2004, number: 1
Publisher: Centre of Biostochastics

SLU authors

UKÄ research subject

Agricultural Science

Permanent link to this page (URI)

https://res.slu.se/id/publ/5211