Heart Disease Prediction using an Ensemble Learning Method: A Study at King Abdullah Hospital in Bisha, Saudi Arabia

Authors

  • Ghalia A. Alshehri, Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
  • Hajar M. Alharbi, Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
  • Husain H. Jabbad, Department of Surgery, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia

DOI:

https://doi.org/10.6000/1929-6029.2025.14.52

Keywords:

Machine learning, Ensemble learning, Classification, Disease prediction, Heart disease

Abstract

The detection of diseases is essential to improving healthcare outcomes and saving lives. Thanks to technological advancements in medicine, machine learning has become a valuable tool for predicting future patient health outcomes. Despite the abundance of available patient data, accurately predicting cardiac disease remains challenging. In response, we developed an ensemble learning approach (ELA) that combines three machine learning (ML) techniques. Our ELA predicts cardiac disease more accurately than any of its individual classification algorithms. We tested our model on a regional dataset collected from King Abdullah Hospital in Bisha, Saudi Arabia. The best results were obtained with an ELA comprising logistic regression (LR), extra trees (ET), and a support vector machine (SVM) with a radial basis function (RBF) kernel: 70 true positives (TP), 72 true negatives (TN), 6 false positives (FP), and 8 false negatives (FN), with an accuracy of 0.9113, sensitivity of 0.8839, specificity of 0.95, positive predictive value (PPV) of 0.9389, negative predictive value (NPV) of 0.8878, AUC of 0.9569, F1 of 0.9133, Kappa of 0.8220, and MCC of 0.8277. With our ELA, medical professionals can detect cardiac disease and provide timely interventions to prevent potentially life-threatening health issues.
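
As a rough illustration of the kind of ensemble described above, the following sketch combines LR, ET, and an RBF-kernel SVM with scikit-learn and reports the same family of metrics. It is not the authors' implementation: the abstract does not state how the three models are combined, so soft voting is an assumption here, as are the synthetic data (the hospital dataset is not reproduced), the train/test split, and every hyperparameter.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (cohen_kappa_score, confusion_matrix, f1_score,
                             matthews_corrcoef, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the hospital dataset is not public here.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Soft voting is an assumed combination rule; hyperparameters are illustrative.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
        ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
        ("svm", make_pipeline(StandardScaler(),
                              SVC(kernel="rbf", probability=True,
                                  random_state=0))),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)
y_prob = ensemble.predict_proba(X_test)[:, 1]  # probability of class 1

# For binary labels {0, 1}, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
metrics = {
    "accuracy": (tp + tn) / (tp + tn + fp + fn),
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
    "f1": f1_score(y_test, y_pred),
    "kappa": cohen_kappa_score(y_test, y_pred),
    "mcc": matthews_corrcoef(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The scaling steps are wrapped in pipelines so that LR and the SVM see standardized features while ET, which is insensitive to feature scale, uses the raw inputs. Numbers printed by this sketch come from synthetic data and are not comparable to the results reported in the abstract.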

Published

2025-09-16

How to Cite

Alshehri, G. A., Alharbi, H. M., & Jabbad, H. H. (2025). Heart Disease Prediction using an Ensemble Learning Method: A Study at King Abdullah Hospital in Bisha, Saudi Arabia. International Journal of Statistics in Medical Research, 14, 549–561. https://doi.org/10.6000/1929-6029.2025.14.52

Section

Special Issue: Trends in Artificial Intelligence and Machine Learning in Healthcare