(source codes available below)
(to be further extended)
It should be stressed that the following is only a brief summary of our experience with the applicability of a limited number of sub-optimal feature selection methods. The family of existing methods (and methodologies) is constantly growing; for more information, please follow the references below.
Consider a traditional feature selection task: we have training data of C classes with approximately S samples per class, each sample represented by F numerical features, and we have some reasonable feature subset evaluation criterion. Suppose we do not face too many complications such as mixed feature types (numerical, nominal), extreme differences in class sizes, incomplete data, etc. Real-world problems usually involve many such complications, but in this explanation we only want to compare search strategies as such.
Be warned immediately against naive feature ordering according to individual criterion values. Very few real datasets are simple enough for this to work. You need methods that reveal at least some inter-feature relations.
The next simplest choice is sequential forward selection (SFS) or sequential backward selection (SBS), which have properties that make them the preferred choice of many practitioners: 1) they are extremely simple to implement, 2) they are very fast, consisting of only F core steps (although each step involves at most F criterion tests), 3) they give solutions for all subset sizes in one run. SFS is usually faster than SBS; SBS may better reflect feature relations in larger subsets.
The problem with SFS and SBS is their single-track search. They tend to "go straight to the closest local optimum" of the search space. Technically speaking, they do not address the feature nesting problem: once a feature is added/removed, it remains so forever, even if the context changes and the feature could become important/useless inside some newly assembled subset. On most real datasets both SFS and SBS are beaten considerably by more sophisticated methods.
Significantly better results can be expected from the floating search methods (SFFS, SBFS). They make use
of backtracking and are capable of correcting wrong inclusion/removal decisions. Floating search has become widely popular
because of its good performance/speed ratio, although some critics claim it is too slow. In our experience one can
expect roughly 10x longer computational time compared to SFS/SBS. This should not be taken as an accurate estimate,
because the overall speed depends primarily on the dimensionality of the problem. With dozens of features it usually does not
matter whether the search takes 10 seconds (SFS) or 2 minutes (SFFS). With thousands of features all sequential methods may
become unusable, as even the simplest sequential forward selection may take days due to the overwhelming number of possible
feature combinations to be tested. The original floating search description may be found in:
Adaptive floating search (ASFFS, ASBFS) is an extension of the classical floating search. Note, however, that adaptive floating search can seem rather complicated and does not necessarily bring much: according to our long-term experience, using adaptive instead of classical floating search yields results occasionally better by 1-5%, but at the cost of several times longer computational time. By contrast, the classical floating search has proved to be a great tool, especially when compared to older methods like Plus-L-Minus-R or the basic sequential forward and backward selection.
Apart from floating search, we can recommend something even more flexible if you intend to achieve the best
possible results and are ready to experiment a bit: the Oscillating Search (OS), introduced in the following paper:
According to our current state of experience, we would recommend:
1) There is a freely available C source code of the floating search methods (SFFS, SBFS, ASFFS, ASBFS) at the Elsevier server.
It is an "electronic annex" of our paper:
2) Oscillating search source code is available in a similar form here. The procedure looks technically quite complicated,
but only because of the number of implemented optional features, not the core algorithm as such. It should not be
a problem to adjust it for use with your own criterion procedures. For first experiments we recommend staying with the simplest
(sequential) form of OS, low Delta values (2-3), and some simple initialization (SFS or random).
The codes come from experimental versions of the Feature Selection Toolbox, a tool being developed in our department. The Feature Selection Toolbox as a whole is not currently available for general use.