Predicting Heart Disease Using Automated Machine Learning Based on Genetic Algorithms

Document Type: Research Paper

Authors

1 Prof., Department of Industrial Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran.

2 Ph.D. Candidate, Department of Industrial Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran.

Abstract

This study applies automated machine learning (AutoML) driven by genetic algorithms to improve heart disease prediction. Heart disease remains the leading cause of death worldwide, which makes effective and timely diagnosis essential. Most current diagnostic and assessment processes are lengthy and expensive and rely heavily on clinical expertise. Machine learning, which draws its value from examining large datasets to recognize patterns, has emerged as a potential solution because it can detect patterns beyond those a human observer could recognize unaided. Genetic algorithms suit this problem as well: they mimic natural evolution to search for high-quality machine-learning models, feature subsets, and parameter settings. This study examines how genetic algorithms, working within AutoML frameworks, can improve the accuracy of heart disease prediction. Finding the best combination of attributes and the optimal parameters is time-consuming, so automating this step allows for more accurate and faster predictions with less manual work. The AutoML framework used in this research is TPOT, which uses genetic algorithms to search for well-performing machine-learning pipelines. The main finding is that combining AutoML with genetic algorithms significantly improves the quality of heart disease predictions compared with traditional assessment approaches, reaching an accuracy of 93.8%. This approach can enhance diagnostic accuracy and enable earlier diagnosis, thereby reducing the likelihood of misdiagnosis or ineffective treatment and ultimately lowering associated costs.
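To make the evolutionary search idea concrete, the following is a minimal sketch of a genetic algorithm that selects a feature subset. It is not TPOT and not the study's implementation: the names `evolve`, `toy_fitness`, and `INFORMATIVE` are hypothetical, and the fitness function is a toy stand-in for what would, in practice, be something like the cross-validated accuracy of a pipeline trained on the selected features.

```python
import random

# Hypothetical indices of "useful" features; purely illustrative.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    """Toy stand-in for model accuracy: reward useful features, penalize subset size."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


def evolve(fitness, n_genes, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve bit-string chromosomes (1 = feature kept); n_genes must be >= 2."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Selection: the fitter of two randomly drawn individuals becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_genes - 1)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    mask = evolve(toy_fitness, n_genes=8)
    print("selected feature mask:", mask, "fitness:", toy_fitness(mask))
```

Because the best individual is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next. An AutoML tool applies the same selection, crossover, and mutation loop to whole pipelines (preprocessing steps, models, and their parameters) rather than to a simple bit mask.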

