A Nascent Technique for Cultivated Feature Selection using Evolutionary Computation Algorithms

Authors

  • Sachin Minocha, Department of Computer Science and Engineering, Sant Longowal Institute of Engineering and Technology, Punjab, India
  • Birmohan Singh, Department of Computer Science and Engineering, Sant Longowal Institute of Engineering and Technology, Punjab, India

DOI:

https://doi.org/10.48048/tis.2022.4588

Keywords:

Feature selection, Evolutionary computation, Metaheuristic algorithm, Binary whale optimization algorithm, Preprocessing

Abstract

Evolutionary computation algorithms have become a recent trend in feature selection because they produce more efficient results than traditional algorithms. However, the performance of an evolutionary computation algorithm depends entirely on its parameter settings (such as population size and number of generations), and existing parameter-setting techniques have been developed for specific algorithms. This paper designs a generic pre-processing technique that removes both redundant and irrelevant features from the initial feature set. The technique uses Kendall's tau to find non-redundant features and the Kruskal-Wallis test to find relevant features. If Kendall's tau is unable to select any non-redundant feature, mutual information gain is used to select significant features, followed by the Kruskal-Wallis test to select relevant ones. The technique has been analyzed with 4 evolutionary computation algorithms (Modified Binary Particle Swarm Optimization, Non-dominated Sorting Genetic Algorithm, Binary Grey Wolf Optimization, and Binary Whale Optimization Algorithm) over 6 datasets, using accuracy, the number of features in the selected subset, sensitivity, and specificity. The technique improves the accuracy, sensitivity, and specificity of each algorithm by 1 % on average, with a 10 % reduction in the selected feature subset, which demonstrates its significance.
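
The described pipeline lends itself to a compact prototype. Below is a minimal sketch, not the authors' implementation (the paper publishes no code): the function name preprocess_features, the redundancy threshold tau_threshold, the significance level alpha, and the mean-MI fallback cutoff are all assumptions made for illustration. A feature is treated as non-redundant when its absolute Kendall's tau with every other feature stays below the threshold; mutual information gain with the class serves as the fallback ranking; and the Kruskal-Wallis test keeps features whose distributions differ significantly across classes.

    # Minimal sketch of the described preprocessing; the threshold values and
    # the mean-MI fallback cutoff are illustrative assumptions, not the paper's.
    import numpy as np
    from scipy.stats import kendalltau, kruskal
    from sklearn.feature_selection import mutual_info_classif

    def preprocess_features(X, y, tau_threshold=0.9, alpha=0.05):
        """Return indices of non-redundant, class-relevant features."""
        n = X.shape[1]
        # Step 1: Kendall's tau redundancy filter -- a feature survives only
        # if it is not strongly rank-correlated with any other feature.
        keep = [
            j for j in range(n)
            if all(
                abs(kendalltau(X[:, j], X[:, k])[0]) < tau_threshold
                for k in range(n) if k != j
            )
        ]
        # Fallback: if no feature survives, rank features by mutual
        # information gain with the class and keep those above the mean score.
        if not keep:
            mi = mutual_info_classif(X, y, random_state=0)
            keep = [j for j in range(n) if mi[j] > mi.mean()]
        # Step 2: Kruskal-Wallis relevance filter -- keep features whose
        # values differ significantly across the class groups.
        return [
            j for j in keep
            if kruskal(*(X[y == c, j] for c in np.unique(y))).pvalue < alpha
        ]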

HIGHLIGHTS

  • This work removes redundant features using Kendall's tau and mutual information gain, reducing the initial population for any Evolutionary Computation (EC) algorithm
  • The relevant features that are highly correlated with the class are added back to maintain an optimal initial population (a usage sketch follows this list)
  • Performance validation has been carried out with four EC algorithms to show the significance of the work
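
As a hypothetical usage illustration (reusing the preprocess_features sketch above on synthetic data; the dataset parameters and the population size of 30 are arbitrary choices, not taken from the paper), the surviving features define the search space over which any binary EC algorithm draws its initial population:

    # Hypothetical usage: preprocess, then seed a binary EC initial
    # population over the surviving features only.
    import numpy as np
    from sklearn.datasets import make_classification

    # Synthetic data: 20 features, several deliberately redundant/irrelevant.
    X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                               n_redundant=10, random_state=0)

    selected = preprocess_features(X, y)  # sketch defined above
    print(f"{X.shape[1]} features reduced to {len(selected)}")

    # Each individual is a bit vector over the reduced subset; bit j toggles
    # original feature selected[j], so the EC search never revisits features
    # discarded by the preprocessing.
    rng = np.random.default_rng(0)
    population = rng.integers(0, 2, size=(30, len(selected)))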




Published

2022-06-08

How to Cite

Minocha, S., & Singh, B. (2022). A Nascent Technique for Cultivated Feature Selection using Evolutionary Computation Algorithms. Trends in Sciences, 19(12), 4588. https://doi.org/10.48048/tis.2022.4588