Transforming Breast Cancer Prediction: Advanced Machine Learning Models for Accurate Prediction and Personalized Care
DOI:
https://doi.org/10.6000/1929-6029.2025.14.54
Keywords:
Breast Cancer, Machine Learning, Random Forest, AUC-ROC, Predictive Modeling
Abstract
Background: Breast cancer is the most common malignancy among women worldwide, underscoring the importance of early detection and accurate prognostication. Machine learning (ML) has emerged as a promising approach, offering powerful tools for analyzing complex datasets in breast cancer prediction and diagnosis.
Objective: This study evaluates the predictive performance of diverse ML algorithms for breast cancer classification using publicly available datasets, focusing on accuracy, interpretability, and generalizability.
Methods: The dataset included clinical and demographic variables such as age, menopausal status, tumor size, and lymph node involvement. Data preprocessing addressed missing values and class imbalance, with the Synthetic Minority Oversampling Technique (SMOTE) applied to improve sensitivity for the minority class. Feature engineering involved interaction terms and scaling of numerical variables. Multiple ML models—Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors (KNN), and Neural Networks—were trained and evaluated. Performance was measured using sensitivity, F1-score, and AUC-ROC. Model interpretability was enhanced with SHapley Additive exPlanations (SHAP).
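The oversampling step named in the Methods can be illustrated with a minimal, standard-library sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority neighbors. This is not the authors' code; the function name `smote`, its parameters, and the toy data are hypothetical. In practice a library implementation would normally be used, and oversampling is typically applied to the training split only (the abstract does not say how the study handled this).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority points.

    minority: list of numeric feature vectors from the minority class.
    k: number of nearest minority neighbors to choose from (hypothetical default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of `base`, excluding itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        # Pick a random point on the segment between `base` and `other`.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class with two illustrative features (values are made up).
minority_samples = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_samples = smote(minority_samples, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region already occupied by the minority class rather than being exact duplicates.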
Results: Random Forest achieved the best performance with an AUC-ROC of 0.9751, followed by SVM (0.9344), Neural Networks (0.9254), Gradient Boosting (0.9242), and Logistic Regression (0.9005). Ensemble models showed higher accuracy and generalizability, particularly on external validation. Tumor size and lymph node involvement emerged as key predictors. SMOTE improved sensitivity across models.
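For readers unfamiliar with the AUC-ROC metric reported above, the following standard-library sketch computes it from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are hypothetical and unrelated to the study's data.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read. The pairwise loop is quadratic in the number of samples, so it is meant for illustration rather than large datasets.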
Conclusion: This study demonstrates the potential of ML in breast cancer prediction, emphasizing the effectiveness of ensemble methods and interpretability tools. Future work should focus on integrating ML into clinical practice for earlier detection and personalized treatment.
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.