Feature Selection Using a Genetic Algorithm and Fuzzy Logic in Anti-Human Immunodeficiency Virus Prediction for Drug Discovery

Document Type: Research Paper


1 Researcher, Laboratory Processes and Environment, Faculty of Sciences and Technology, University Hassan II Casablanca, Mohammedia, Morocco.

2 Associate Professor, Information System Department, Taibah University, Al-Madinah Al-Monawarah, Saudi Arabia.

3 Full Professor, LIM Laboratory, Computer Science Department, Faculty of Sciences and Technology, University Hassan II Casablanca, Mohammedia, Morocco.


This paper presents an approach that combines a genetic algorithm (GA) with a fuzzy inference system (FIS) to select descriptors in a quantitative structure-activity relationship (QSAR) classification and prediction problem. Unlike traditional GA-based techniques, the proposed approach uses the FIS to evaluate each individual in the GA population, so the fitness function is defined as the error rate of the combined GA and FIS. The approach was implemented and tested on a data set of anti-human immunodeficiency virus (HIV) molecules with experimentally measured activity values. The resulting statistical parameters are q² (leave-many-out cross-validation) = 0.59 and r (correlation coefficient) = 0.98. These results indicate that the approach can identify a subset of descriptors with high predictive capacity, and they support its effectiveness and robustness.
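The idea of letting a fuzzy model score each GA individual can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the evaluator is a stand-in zero-order Sugeno-type model with one Gaussian rule per training molecule, the error measure is a leave-one-out RMSE, the GA operators (binary tournament, one-point crossover, bit-flip mutation, elitism) are common defaults, and the data are synthetic. The names `fuzzy_loo_error` and `ga_select` are hypothetical.

```python
import numpy as np

def fuzzy_loo_error(X, y, mask, sigma=1.0):
    """Leave-one-out RMSE of a stand-in zero-order Sugeno-type fuzzy model.

    Assumption: each training molecule acts as one rule with a Gaussian
    membership function; the prediction is the firing-strength-weighted
    average of the rule consequents (the other molecules' activities).
    """
    if not mask.any():
        return np.inf  # an empty descriptor subset cannot predict anything
    Xs = X[:, mask]
    # Squared distances between all pairs of molecules in the selected subspace
    d2 = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))   # firing strength of each rule
    np.fill_diagonal(w, 0.0)               # a molecule never predicts itself
    pred = (w @ y) / (w.sum(axis=1) + 1e-12)
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """GA over boolean descriptor masks; fitness is the fuzzy model's error."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]                    # one-point crossover needs n_feat >= 2
    pop = rng.random((pop_size, n_feat)) < 0.5
    best_mask, best_err = pop[0].copy(), np.inf
    for _ in range(generations):
        errs = np.array([fuzzy_loo_error(X, y, ind) for ind in pop])
        i = int(errs.argmin())
        if errs[i] < best_err:
            best_mask, best_err = pop[i].copy(), float(errs[i])
        children = [best_mask.copy()]      # elitism: keep the best mask so far
        while len(children) < pop_size:
            parents = []
            for _ in range(2):             # binary tournament selection
                a, b = rng.integers(pop_size, size=2)
                parents.append(pop[a] if errs[a] <= errs[b] else pop[b])
            cut = rng.integers(1, n_feat)  # one-point crossover
            child = np.concatenate([parents[0][:cut], parents[1][cut:]])
            child ^= rng.random(n_feat) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return best_mask, best_err

# Synthetic example: only descriptors 0 and 3 carry signal (illustrative data).
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(40, 6))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 3] + 0.05 * demo_rng.normal(size=40)
mask_demo, err_demo = ga_select(X_demo, y_demo)
print("selected descriptors:", np.flatnonzero(mask_demo), "LOO RMSE:", err_demo)
```

Because the evaluator is passed the descriptor mask directly, it could be swapped for any other fuzzy model or error measure without changing the GA loop.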

