Feature subset selection via an improved discretization-based particle swarm optimization
- Authors: Yu Zhou, Jiping Lin, Hainan Guo
- Accepted: 7 October 2020
- Published in: Applied Soft Computing Journal (ASOC)
- paper link
Abstract
High-dimensional data analysis has attracted increasing attention in machine learning and data mining. Irrelevant and redundant features often seriously degrade classification accuracy. Feature selection (FS), which aims to improve predictive accuracy by selecting a subset of features, therefore plays a very important role. In this paper, we propose an improved discretization-based particle swarm optimization (PSO) method for FS. The method first applies a moderate pre-screening process to reduce the number of features. It then builds a ranking-based cut-point table that stores, for each feature, multiple cut-points sorted by an entropy-based cut-point priority. To find the combination of cut-points that best distinguishes the data samples, a simple yet efficient encoding and decoding approach in PSO selects a flexible number of cut-points per feature. Moreover, a probability-guided local search strategy searches for better combinations of cut-points in order to obtain a promising feature subset. Comprehensive simulation results on 19 benchmark datasets demonstrate the effectiveness of the individual improved strategies in the proposed method and its advantages over several state-of-the-art PSO-based competitors.
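
To make the idea of a ranking-based cut-point table more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the exact priority measure or the encoding, so the sketch assumes information gain as the entropy-based priority and assumes that each particle dimension encodes how many of a feature's top-ranked cut-points to use, with zero meaning the feature is discarded. The function names `ranked_cut_points`, `decode`, and `discretize` are hypothetical, and the pre-screening and local search steps are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def ranked_cut_points(values, labels):
    """Candidate cut-points for one feature, highest information gain first.

    Assumption: candidates are midpoints between adjacent distinct values,
    and the priority is plain information gain.
    """
    pairs = list(zip(values, labels))
    n = len(pairs)
    distinct = sorted(set(values))
    base = entropy(labels)
    scored = []
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2.0
        left = [y for v, y in pairs if v <= cut]
        right = [y for v, y in pairs if v > cut]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        scored.append((gain, cut))
    # Sort by descending gain; break ties by the smaller cut-point.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [cut for _, cut in scored]


def decode(position, table):
    """Map a particle position to {feature_index: sorted selected cut-points}.

    Assumption: position[i] is rounded and clamped to 0..len(table[i]);
    a value of 0 means feature i is not selected.
    """
    selected = {}
    for i, (p, cuts) in enumerate(zip(position, table)):
        k = max(0, min(len(cuts), int(round(p))))
        if k > 0:
            selected[i] = sorted(cuts[:k])
    return selected


def discretize(value, cuts):
    """Index of the interval that value falls into, given sorted cut-points."""
    return sum(value > c for c in cuts)


# Toy data: one informative feature and one constant feature.
table = [
    ranked_cut_points([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]),
    ranked_cut_points([5.0, 5.0, 5.0, 5.0], [0, 0, 1, 1]),
]
subset = decode([1.2, 0.3], table)
```

In this toy run the first feature keeps only its best cut-point, while the constant feature has no candidate cut-points and is dropped. Under this assumed encoding, a single vector per particle determines both which features are selected and how finely each selected feature is discretized.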