Transforming Breast Cancer Prediction: Advanced Machine Learning Models for Accurate Prediction and Personalized Care

Usha Adiga; Sampara Vasishta; Alfred J. Augustine; Kasala Farzia; Eddula Venkataravikanth; Lokesh Ravi

doi:10.6000/1929-6029.2025.14.54

Authors

Usha Adiga, Department of Biochemistry, Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
Sampara Vasishta, Department of Biochemistry, Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
Alfred J. Augustine, Department of Surgery, Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
Kasala Farzia, Department of Biochemistry, Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
Eddula Venkataravikanth, Department of Dermatology (DVL), Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
Lokesh Ravi, Centre for Digital Health & Precision Medicine, The Apollo University, Chittoor, Andhra Pradesh, 517127, India

Keywords:

Breast Cancer, Machine Learning, Random Forest, AUC-ROC, Predictive Modeling

Abstract

Background: Breast cancer is the most common malignancy among women worldwide, underscoring the importance of early detection and accurate prognostication. Machine learning (ML) has emerged as a promising approach, offering powerful tools for analyzing complex datasets in breast cancer prediction and diagnosis.

Objective: This study evaluates the predictive performance of diverse ML algorithms for breast cancer classification using publicly available datasets, focusing on accuracy, interpretability, and generalizability.

Methods: The dataset included clinical and demographic variables such as age, menopausal status, tumor size, and lymph node involvement. Data preprocessing addressed missing values and class imbalance, with the Synthetic Minority Oversampling Technique (SMOTE) applied to improve sensitivity for the minority class. Feature engineering involved interaction terms and scaling of numerical variables. Multiple ML models—Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors (KNN), and Neural Networks—were trained and evaluated. Performance was measured using sensitivity, F1-score, and AUC-ROC. Model interpretability was enhanced with SHapley Additive exPlanations (SHAP).
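The oversampling step described in the Methods can be illustrated with a minimal sketch of the SMOTE interpolation idea: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbors. This is not the authors' implementation; the function name `smote`, the toy feature values, and the choice of `k` are assumptions made for illustration. In practice, oversampling is normally applied to the training split only, so that synthetic points do not leak into evaluation data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbors.

    Hypothetical helper for illustration only, not the study's code.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Assumed toy data: [tumor size, number of involved lymph nodes]
minority = [[32.0, 4.0], [41.0, 6.0], [28.0, 3.0], [36.0, 5.0]]
new_points = smote(minority, n_new=6, k=2)
for point in new_points:
    print([round(v, 2) for v in point])
```

Because every synthetic point is a convex combination of two real minority samples, it always lies within the range already spanned by the minority class.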

Results: Random Forest achieved the highest AUC-ROC (0.9751). SVM (0.9344), Neural Networks (0.9254), and Gradient Boosting (0.9242) followed, and Logistic Regression reached 0.9005. Ensemble models showed higher accuracy and generalizability, particularly on external validation. Tumor size and lymph node involvement emerged as key predictors. SMOTE improved sensitivity across models.
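For readers less familiar with the reported metrics, the sketch below shows one way to compute sensitivity, F1-score, and AUC-ROC from first principles. AUC-ROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not data from the study, and the function names are hypothetical.

```python
def sensitivity_f1(y_true, y_pred):
    """Sensitivity (recall) and F1-score for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, f1


def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Assumed toy labels and model scores, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, f1 = sensitivity_f1(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(sens, f1, auc)  # 0.75 0.75 0.9375
```

Sensitivity and F1 depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why the two kinds of metric can order models differently.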

Conclusion: This study demonstrates the potential of ML in breast cancer prediction, emphasizing the effectiveness of ensemble methods and interpretability tools. Future work should focus on integrating ML into clinical practice for earlier detection and personalized treatment.
