Filter-Based Feature Selection Using Information Theory and Binary Cuckoo Optimisation Algorithm

Document Type : Research Paper


1 School of Computer Sciences, Universiti Sains Malaysia, 11800 Pulau Pinang, Malaysia; Department of Computer Sciences, Federal College of Education (Technical), Gombe, Nigeria

2 Assistant Professor, School of Computer Sciences, Universiti Sains Malaysia, 11800 Pulau Pinang, Malaysia

3 School of Computer Sciences, Universiti Sains Malaysia, 11800 Pulau Pinang, Malaysia


Dimensionality reduction is a data mining process used to reduce the noise and complexity of features in a dataset. Feature selection (FS), one of the most widely used dimensionality reduction techniques, removes unwanted features from datasets. FS methods are either wrapper-based or filter-based. Wrappers select feature subsets with better classification performance but are computationally expensive. Filters, in contrast, are computationally fast but ignore interactions among the selected features, which in turn degrades the classification performance of the chosen subsets. This study combines two concepts from information theory, mutual information (MI) and entropy (E), with the binary cuckoo optimisation algorithm (BCOA), yielding two filter methods: BCOA-MI and BCOA-E. The aim is to improve classification performance, that is, to reduce both the error rate and the computational cost, on eight datasets of varying complexity. A support vector machine classifier was used to compute the error rate of BCOA-MI and BCOA-E on each dataset. The results show that BCOA-E selects fewer features and achieves lower error rates, whereas BCOA-MI is computationally faster but selects more features. Comparison with methods reported in the literature shows that the proposed BCOA-MI and BCOA-E perform better in terms of accuracy, number of selected features, and execution time on most of the datasets.
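The two information-theoretic measures behind BCOA-MI and BCOA-E can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the toy data, function names, and the use of discrete (binned) features are assumptions for illustration. In the actual methods, scores of this kind serve as the filter fitness that BCOA maximises over binary feature subsets.

```python
import numpy as np

def entropy(x):
    # Shannon entropy H(X) = -sum_i p_i * log2(p_i), estimated
    # from the empirical distribution of the discrete values in x.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y); the joint variable is formed
    # by pairing each value of x with the corresponding value of y.
    joint = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    return entropy(x) + entropy(y) - entropy(joint)

# Toy discrete dataset (hypothetical): feature 0 tracks the class
# label exactly, feature 1 is independent noise.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]])
y = np.array([0, 0, 1, 1, 0, 1])

# Filter score per feature: relevance to the class label.
scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
```

Here the informative feature scores the full label entropy (1 bit), while the noise feature scores near zero, which is exactly the ranking a filter criterion needs before any classifier is consulted.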

