Publication details

The Problem of Fragile Feature Subset Preference in Feature Selection Methods and A Proposal of Algorithmic Workaround

Conference Paper (international conference)

Somol Petr, Grim Jiří, Pudil Pavel

serial: Proc. 2010 Int. Conf. on Pattern Recognition, p. 4396-4399

action: 20th International Conference on Pattern Recognition, (Istanbul, TR, 23.08.2010-26.08.2010)

research: CEZ:AV0Z10750506

project(s): 2C06019, GA MŠk, GA102/07/1594, GA ČR, GA102/08/0593, GA ČR, 1M0572, GA MŠk

keywords: feature selection, machine learning, over-fitting, classification, feature weights, weighted features, feature acquisition cost

abstract (eng):

We point out a problem inherent in the optimization scheme of many popular feature selection methods. It follows from the implicit assumption that a higher feature selection criterion value always indicates a more preferable subset, even if the difference in value is marginal. This assumption ignores the reliability issues of particular feature preferences, overfitting, and feature acquisition cost. We propose an algorithmic extension, applicable to many standard feature selection methods, that allows better control over feature subset preference. We show experimentally that the proposed mechanism can reduce the size of selected subsets and improve classifier generalization.
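
The abstract does not spell out the proposed mechanism, so the following is only a minimal illustrative sketch of the general idea it describes: when criterion values differ marginally, the higher value need not win. The function name `pick_subset`, the `tolerance` parameter, and the rule "prefer the smallest subset among near-best candidates" are assumptions made for illustration and should not be read as the paper's actual algorithm.

```python
from typing import FrozenSet, Iterable, Tuple

Candidate = Tuple[FrozenSet[int], float]  # (feature indices, criterion value)


def pick_subset(candidates: Iterable[Candidate], tolerance: float) -> Candidate:
    """Pick the smallest subset whose criterion value is within
    `tolerance` of the best value seen (hypothetical preference rule)."""
    scored = list(candidates)
    if not scored:
        raise ValueError("no candidate subsets given")

    best_value = max(value for _, value in scored)

    # Candidates whose criterion value is only marginally below the best.
    near_best = [(s, v) for s, v in scored if best_value - v <= tolerance]

    # Prefer fewer features; break ties by higher criterion value.
    return min(near_best, key=lambda item: (len(item[0]), -item[1]))


# Made-up criterion values, for illustration only.
candidates = [
    (frozenset({0, 1, 2, 3}), 0.912),
    (frozenset({0, 2}), 0.910),
    (frozenset({1}), 0.800),
]

strict = pick_subset(candidates, tolerance=0.0)     # plain "highest value wins"
relaxed = pick_subset(candidates, tolerance=0.005)  # marginal gaps are ignored
print(sorted(strict[0]), sorted(relaxed[0]))
```

With `tolerance=0.0` the rule reduces to the usual behaviour of always taking the highest criterion value; a positive tolerance trades a marginal loss in criterion value for a smaller subset, which is the kind of control over subset preference the abstract refers to.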