Sample Size and Statistical Power Calculation in Multivariable Analyses: Development and Implementation of "SampleSizeMulti" Packages in R

Authors

  • Víctor J. Vera-Ponce Instituto de Investigación de Enfermedades Tropicales, Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú and Facultad de Medicina (FAMED), Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú
  • Fiorella E. Zuzunaga-Montoya Universidad Continental, Lima, Perú
  • Nataly M. Sanchez-Tamay Instituto de Investigación de Enfermedades Tropicales, Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú and Facultad de Medicina (FAMED), Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú
  • Luisa E.M. Vásquez-Romero Instituto de Investigación de Enfermedades Tropicales, Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú
  • Joan A. Loayza-Castro Instituto de Investigación de Enfermedades Tropicales, Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú
  • Christian H. Huaman-Vega Instituto de Investigación de Enfermedades Tropicales, Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú and Escuela Profesional de Psicología, Facultad de Ciencias de la Salud (FACISA), Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú https://orcid.org/0000-0003-2333-2254
  • Rafael Tapia-Limonchi Instituto de Investigación de Enfermedades Tropicales, Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú and Facultad de Medicina (FAMED), Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú
  • Carmen I.G. De Carrillo Instituto de Investigación de Enfermedades Tropicales, Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú and Facultad de Medicina (FAMED), Universidad Nacional Toribio Rodríguez de Mendoza de Amazonas (UNTRM), Amazonas, Perú https://orcid.org/0000-0002-4711-7201

DOI:

https://doi.org/10.6000/1929-6029.2024.13.24

Keywords:

Sample Size, Statistical Inference, Regression Analysis, Epidemiological Methods, Software Design, Research Design, Correlation Coefficient (source: MeSH)

Abstract

This paper presents advanced methodological approaches and practical tools for sample size calculation in epidemiological studies involving multivariable analyses. Traditional sample size calculation methods often fail to account for the complexity of modern statistical analyses, particularly regarding the correlation between covariates in multivariable models.

We introduce a series of R packages (SampleSizeMulti) designed to address these limitations. These packages offer two distinct calculation approaches: one based on the multiple correlation coefficient between covariates (rho-based method) and another utilizing standard errors from previous studies (SE-based method). These complementary approaches provide comprehensive solutions for different association measures commonly used in epidemiological research: prevalence ratios, odds ratios, risk ratios, and hazard ratios.
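
The abstract does not state the formula behind the rho-based method. As a minimal sketch, assuming the widely used variance inflation adjustment in which an unadjusted sample size is divided by (1 - rho^2), the effect of covariate correlation can be illustrated as follows. The function name and the input values are hypothetical and are not part of the SampleSizeMulti API.

```python
from math import ceil


def inflate_for_covariate_correlation(n_unadjusted: int, rho: float) -> int:
    """Inflate an unadjusted sample size for correlation among covariates.

    Assumption: the variance inflation factor 1 / (1 - rho^2) is applied,
    where rho is the multiple correlation coefficient between the exposure
    and the remaining covariates. The packages described in the paper may
    implement the adjustment differently.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    # Dividing by (1 - rho^2) is the same as multiplying by the VIF.
    return ceil(n_unadjusted / (1.0 - rho ** 2))


if __name__ == "__main__":
    # Hypothetical unadjusted requirement of 200 participants.
    for rho in (0.0, 0.3, 0.5, 0.7):
        print(rho, inflate_for_covariate_correlation(200, rho))
```

Under this assumption the required sample size grows slowly for small rho and steeply as rho approaches 1, which is why ignoring covariate correlation can leave a multivariable analysis underpowered.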

The rho-based method explicitly incorporates the multiple correlation coefficient between covariates, which can substantially change the sample size required in multivariable analyses. The SE-based method uses the standard errors implied by the confidence intervals of previous studies, offering an alternative when correlation estimates are unavailable but published results exist. Both approaches also integrate logistical considerations, including rejection rates, eligibility criteria, and expected losses to follow-up, so that researchers obtain realistic estimates of recruitment requirements and timelines.
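
To make the SE-based idea concrete, the sketch below recovers a standard error from a published confidence interval for a ratio measure and rescales a previous study's sample size. It rests on two assumptions that the abstract does not confirm: the interval is a symmetric Wald interval on the log scale, and the standard error shrinks in proportion to 1/sqrt(n). All function names and numbers are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Standard error of a log ratio, recovered from its published CI.

    Assumes a Wald interval that is symmetric on the log scale.
    """
    if not 0.0 < lower < upper:
        raise ValueError("need 0 < lower < upper")
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return (log(upper) - log(lower)) / (2.0 * z)


def n_from_previous_se(effect: float, se_prev: float, n_prev: int,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size needed to detect `effect` (a ratio) with a two-sided test.

    Assumes SE scales as 1/sqrt(n), so n_new = n_prev * (se_prev / se_target)^2.
    """
    if effect <= 0.0 or effect == 1.0:
        raise ValueError("effect must be a positive ratio different from 1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    # Largest SE that still yields the requested power for this effect.
    se_target = abs(log(effect)) / (z_alpha + z_beta)
    return ceil(n_prev * (se_prev / se_target) ** 2)


if __name__ == "__main__":
    # Hypothetical previous study: 95% CI 1.2 to 3.0 with n = 400.
    se = se_from_ci(1.2, 3.0)
    print(round(se, 4), n_from_previous_se(effect=1.5, se_prev=se, n_prev=400))
```

Because the previous study's adjusted standard error already reflects whatever correlation existed among its covariates, this route needs no separate estimate of rho.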

Seven detailed case studies covering various epidemiological study designs and analytical scenarios demonstrate the practical application of these methods. These examples illustrate how correlation values, standard errors, and logistical factors influence sample size calculations and study planning.
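
The influence of logistical factors can be illustrated with simple arithmetic. The sketch below assumes that losses to follow-up, refusals, and ineligibility act as independent multiplicative attrition steps and that screening proceeds at a constant daily rate; the packages may model these differently. The function name, parameters, and values are hypothetical.

```python
from math import ceil


def recruitment_plan(n_required: int, loss_rate: float, rejection_rate: float,
                     eligibility_rate: float, screened_per_day: float) -> dict:
    """Work backwards from the analysable sample to the screening effort.

    Assumption: each attrition step is applied multiplicatively, rounding
    up at every stage so the plan never falls short.
    """
    for name, value in (("loss_rate", loss_rate),
                        ("rejection_rate", rejection_rate)):
        if not 0.0 <= value < 1.0:
            raise ValueError(f"{name} must lie in [0, 1)")
    if not 0.0 < eligibility_rate <= 1.0:
        raise ValueError("eligibility_rate must lie in (0, 1]")
    if screened_per_day <= 0:
        raise ValueError("screened_per_day must be positive")

    # Participants to enrol so that n_required remain after losses.
    n_enrolled = ceil(n_required / (1.0 - loss_rate))
    # Eligible people to invite so that n_enrolled accept.
    n_invited = ceil(n_enrolled / (1.0 - rejection_rate))
    # People to screen so that n_invited turn out to be eligible.
    n_screened = ceil(n_invited / eligibility_rate)
    days = ceil(n_screened / screened_per_day)
    return {"enrolled": n_enrolled, "invited": n_invited,
            "screened": n_screened, "days": days}


if __name__ == "__main__":
    # Hypothetical inputs: 267 analysable participants, 10% loss to
    # follow-up, 20% refusal, 50% eligibility, 20 people screened per day.
    print(recruitment_plan(267, 0.10, 0.20, 0.50, 20))
```

Even moderate attrition at each stage compounds: in this hypothetical case the number screened is nearly three times the number needed for analysis.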

The implementation in R ensures accessibility and reproducibility, while the incorporation of logistical planning tools bridges the gap between theoretical calculations and practical research requirements. These methods represent a significant advancement in study design methodology, potentially improving the quality and efficiency of epidemiological research by ensuring adequate statistical power while optimizing resource utilization.


Published

2024-11-25

How to Cite

Vera-Ponce, V. J., Zuzunaga-Montoya, F. E., Sanchez-Tamay, N. M., Vásquez-Romero, L. E., Loayza-Castro, J. A., Huaman-Vega, C. H., Tapia-Limonchi, R., & De Carrillo, C. I. (2024). Sample Size and Statistical Power Calculation in Multivariable Analyses: Development and Implementation of "SampleSizeMulti" Packages in R. International Journal of Statistics in Medical Research, 13, 259–274. https://doi.org/10.6000/1929-6029.2024.13.24

Section

General Articles
