Publication details | Department of Pattern Recognition

Conference Paper (international conference)

On the Over-Fitting Problem of Complex Feature Selection Methods

Somol Petr, Novovičová Jana, Pudil Pavel

: Proc. 5th International Computer Engineering Conference - A better Information Society Through the e@, p. 12-17

: 5th International Computer Engineering Conference - A better Information Society Through the e@, (Cairo, EG, 27.12.2009-28.12.2009)

: CEZ:AV0Z10750506

: 2C06019, GA MŠk, 1M0572, GA MŠk, GA102/07/1594, GA ČR, GA102/08/0593, GA ČR

: feature selection, overfitting, overselection

: http://library.utia.cas.cz/separaty/2010/RO/somol-on the over-fitting problem of complex feature selection methods.pdf

(eng): A question much discussed recently in machine learning is how well modern feature selection methods actually perform. Feature selection has been a highly active research area in recent years because of its potential to improve both the performance and the economy of automatic decision systems in various application fields, including medicine, image analysis, remote sensing, and economics. The number of available methods and methodologies has grown rapidly, with new methods promising important improvements. Yet many authors have recently called this development into question, claiming that simpler, older tools are actually better than complex modern ones, which, despite their promises, are said to fail in real-world applications. We investigate this question, show several illustrative examples, and draw conclusions and recommendations regarding the performance that can be expected of feature selection methods.

(cze): One of the current topics in machine learning is the question of the actual performance of modern feature selection methods. Feature selection has been at the center of attention in recent years for its ability not only to improve accuracy but also to reduce the cost of automatic decision systems in various fields, such as medicine, image data processing, and economics. The number of available methods and methodologies keeps growing, with new methods promising further improvements. This development has nevertheless recently been questioned by a number of authors, who point to the observation that older methods actually behave better in real applications. In this paper we examine this question and, using several illustrative examples, give recommendations regarding the expected performance of various methods.

: BD