Sample Size Calculator

Sample Size Estimation in Clinical Research: from Randomized Controlled Trials to Observational Studies

Wang, X. and Ji, X., 2020. Sample size estimation in clinical research: from randomized controlled trials to observational studies. Chest, 158(1), pp.S12-S20.

Wang, X. and Ji, X., 2020. Sample size formulas for different study designs: supplement document for sample size estimation in clinical research.

Reference Example

Chow S-C, Shao J, Wang H, Lokhnygina Y. Sample Size Calculations in Clinical Research. Third ed: Chapman and Hall/CRC; 2017.

Type I error rate, \(\alpha\)

Power, \(1-\beta\)

Ratio of case to control, \(k\)

Allowable difference, \(d=\mu_T-\mu_C\)

Expected population standard deviation, \(\text{SD}\)

\(\delta (>0)\)

Drop rate (%, 0 ~ 99)

Reference Example

Chow S-C, Shao J, Wang H, Lokhnygina Y. Sample Size Calculations in Clinical Research. Third ed: Chapman and Hall/CRC; 2017.

Type I error rate, \(\alpha\)

Power, \(1-\beta\)

Ratio of case to control, \(k\)

Drop rate (%, 0 ~ 99)

Proportion / Odds ratio

\(p_T\)

\(p_C\)

Margin on risk difference scale (\(\delta \geq 0\))

Odds ratio

Margin for log-scale odds ratio (\(\delta > 0\))

\(p_C\)

Reference Example

Schoenfeld D. The Asymptotic Properties of Nonparametric-Tests for Comparing Survival Distributions. Biometrika. 1981;68(1):316-319.

Schoenfeld D. Sample-Size Formula for the Proportional-Hazards Regression-Model. Biometrics. 1983;39(2):499-503.

Type I error rate, \(\alpha\)

Power, \(1-\beta\)

Ratio of case to control, \(k\)

Margin for log-scale hazard ratio (\(\delta > 0\))

Known probabilities of event during the trial

Estimate probabilities of event during the trial using exponential model

\(\pi_T\)

\(\pi_C\)

Accrual time period, \(T_a\)

Follow-up time period, \(T_b\)

Hazard for the control group, \(\lambda_C\)

Reference Example

Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. Third ed: John Wiley & Sons; 2013.

A case-control study of the relationship between smoking and CHD is planned. A sample of men with newly diagnosed CHD will be compared for smoking status with a sample of controls, assuming an equal number of cases and controls (i.e., \(k = 1\)). Previous surveys have shown that around 40% of males without CHD are smokers (i.e., \(p_0 = 0.4\)). To achieve 90% power (i.e., \(1-\beta = 0.9\)) at the 5% level of significance (i.e., \(\alpha = 0.05\)), the sample size needed to detect an odds ratio of 1.5 (i.e., \(OR = 1.5\), or equivalently \(p_1 = 0.5\)) is \(519\) cases and \(519\) controls, or \(538\) cases and \(538\) controls when the continuity correction is applied.
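The figures in this example can be reproduced with the normal-approximation formula for comparing two proportions, with and without the continuity correction. The sketch below is illustrative and is not taken from the calculator's source: the function name is hypothetical, and it assumes that \(k\) means cases per control and that rounding up is applied only at the end.

```python
from math import ceil, sqrt
from statistics import NormalDist


def unmatched_case_control_n(p0, odds_ratio, alpha=0.05, power=0.9,
                             k=1.0, two_sided=True):
    """Return ((cases, controls), (cases_cc, controls_cc)).

    Hypothetical helper: k is assumed to mean cases per control, and the
    second pair applies the continuity correction.
    """
    # Exposure prevalence among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    diff = abs(p1 - p0)
    # Pooled exposure prevalence, weighting cases by k.
    p_bar = (k * p1 + p0) / (k + 1)
    # Uncorrected number of controls, kept unrounded for the correction step.
    m = (z_alpha * sqrt((k + 1) * p_bar * (1 - p_bar))
         + z_beta * sqrt(k * p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / (k * diff ** 2)
    # Continuity-corrected number of controls.
    m_cc = m / 4 * (1 + sqrt(1 + 2 * (k + 1) / (m * k * diff))) ** 2
    plain = (ceil(k * m), ceil(m))
    corrected = (ceil(k * m_cc), ceil(m_cc))
    return plain, corrected


print(unmatched_case_control_n(p0=0.4, odds_ratio=1.5))
# ((519, 519), (538, 538))
```

Applying the correction to the unrounded value matters here: correcting the already rounded 519 would give 539 rather than 538.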

Two-sided (Unchecking the checkbox will perform the sample size estimation for a one-sided test.)

Type I error rate, \(\alpha\)

Power, \(1-\beta\)

Ratio of case to control, \(k\)

Proportion / Odds ratio

\(p_0\)

\(p_1\)

Reference Example

Dupont WD. Power calculations for matched case-control studies. Biometrics. 1988;44(4):1157-1168.

Suppose a researcher conducts a matched case-control study to assess whether bladder cancer may be associated with past exposure to cigarette smoking. Cases will be patients with bladder cancer and controls will be patients hospitalised for injury. One case will be matched to one control (i.e., \(k = 1\)), and the correlation between case and control exposures for matched pairs is estimated to be 0.01 (low, i.e., \(r = 0.01\)). It is assumed that 20% of controls will be smokers or past smokers (i.e., \(p_0 = 0.2\)), and the researcher wishes to detect an odds ratio of 2 (i.e., \(OR = 2\) or \(p_1 = 0.67\)) with 90% power (i.e., \(1-\beta = 0.9\)). The sample size needed for cases and controls is \(16\) and \(16\), respectively.

Two-sided (Unchecking the checkbox will perform the sample size estimation for a one-sided test.)

Type I error rate, \(\alpha\)

Power, \(1-\beta\)

Ratio of case to control, \(k\)

\(r\)

Proportion / Odds ratio

\(p_0\)

\(p_1\)

Reference Example

Woodward M. Formulae for sample size, power and minimum detectable relative risk in medical studies. Journal of the Royal Statistical Society: Series D (The Statistician). 1992;41(2):185-196.

Suppose that the primary interest lies in comparing systolic blood pressure between two cities. Assume that simple random sampling from among 40-44-year-old men is to be used in each city, with twice as many men sampled from City 1 as from City 2, so that \(k=2\). Systolic blood pressure is to be compared using a one-sided 5% significance test (i.e., \(\alpha = 0.05\)). The medical investigators wish to be 95% sure of detecting a difference when the average blood pressure in City 1 exceeds that in City 2 by 3 mm Hg (i.e., \(1-\beta=0.95\) and \(m_1 = 3\), \(m_2 = 0\)). From published literature (Smith et al. 1989), the standard deviation of systolic blood pressure is likely to be 15.6 mm Hg (i.e., \(SD=15.6\)). The sample size required is \(878\) for City 1 and \(439\) for City 2.
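The arithmetic behind this example can be sketched with the usual normal-approximation formula for comparing two means with unequal allocation, \(n_2 = (1 + 1/k)(z_{\alpha} + z_{\beta})^2 \, SD^2 / (m_1 - m_2)^2\) and \(n_1 = k\,n_2\). The function name below is hypothetical, and rounding the smaller group up before scaling by \(k\) is an assumption that happens to reproduce the quoted figures.

```python
from math import ceil
from statistics import NormalDist


def two_means_n(m1, m2, sd, alpha=0.05, power=0.95, k=2.0, two_sided=False):
    """Return (n1, n2), where n1 = k * n2 (hypothetical helper)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    # Size of the second (smaller) sample, rounded up.
    n2 = ceil((1 + 1 / k) * (z_alpha + z_beta) ** 2 * sd ** 2 / (m1 - m2) ** 2)
    # The first sample is k times as large.
    n1 = ceil(k * n2)
    return n1, n2


print(two_means_n(m1=3, m2=0, sd=15.6))
# (878, 439)
```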

Two-sided (Unchecking the checkbox will perform the sample size estimation for a one-sided test.)

Type I error rate, \(\alpha\)

Power, \(1-\beta\)

Ratio of first samples to second samples, \(k\)

\(m_1\)

\(m_2\)

Expected population standard deviation, \(\text{SD}\)

Reference Example

Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. Third ed: John Wiley & Sons; 2013.

Woodward M. Formulae for sample size, power and minimum detectable relative risk in medical studies. Journal of the Royal Statistical Society: Series D (The Statistician). 1992;41(2):185-196.

Suppose the estimated prevalence of smoking is higher among male students (around 50%, i.e., \(p_1 = 0.5\)) than among female students (around 35%, i.e., \(p_2 = 0.35\)). To be 80% certain (i.e., \(1-\beta=0.8\)) of detecting a prevalence ratio of \(RR = 0.50 / 0.35 \approx 1.429\) at the 0.05 level of significance (i.e., \(\alpha =0.05\)) with equal numbers of males and females recruited (i.e., \(k = 1\)), the study needs to enroll \(170\) males and \(170\) females.
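As a rough check on this example, the two-sided, two-proportion formula without continuity correction gives the same result. The sketch below uses a hypothetical function name and assumes that \(k\) is the ratio of the first sample to the second; it is not necessarily how the calculator handles unequal allocation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportions_n(p1, p2, alpha=0.05, power=0.8, k=1.0, two_sided=True):
    """Return (n1, n2), with n1 = k * n2 assumed (hypothetical helper)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    # Pooled proportion under the null hypothesis, weighted by group size.
    p_bar = (k * p1 + p2) / (k + 1)
    # Unrounded size of the second sample.
    n2 = (z_alpha * sqrt((k + 1) * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + k * p2 * (1 - p2))) ** 2 / (k * (p1 - p2) ** 2)
    return ceil(k * n2), ceil(n2)


print(two_proportions_n(p1=0.5, p2=0.35))
# (170, 170)
```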

Two-sided (Unchecking the checkbox will perform the sample size estimation for a one-sided test.)

Type I error rate, \(\alpha\)

Power, \(1-\beta\)

Ratio of first samples to second samples, \(k\)

Proportion / Relative risk

\(p_1\)

\(p_2\)

Reference Example

Cochran WG. Sampling Techniques. John Wiley & Sons; 1977.

Kotrlik, J. W., & Higgins, C. C. (2001). Organizational research: Determining appropriate sample size in survey research. Information Technology, Learning, and Performance Journal, 19(1), 43.

Suppose the researcher treats a seven-point (\(7\)) scaled survey item as continuous data. For this continuous variable, the acceptable level of error is 3% of the scale (i.e., \(d = 7 \times 0.03 = 0.21\)), and the estimated standard deviation of the scale is 1.167 (i.e., \(SD = 1.167\)). At the 5% Type I error rate (i.e., \(\alpha = 0.05\)), the required sample size for the survey is \(119\).
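This result follows from Cochran's formula for estimating a mean, \(n = (z_{\alpha/2} \cdot SD / d)^2\), rounded up. A minimal sketch, with a hypothetical function name and no finite-population correction:

```python
from math import ceil
from statistics import NormalDist


def survey_mean_n(sd, d, alpha=0.05):
    """Sample size to estimate a mean within +/- d (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sd / d) ** 2)


# Acceptable error of 3% on a seven-point scale: d = 7 * 0.03 = 0.21.
print(survey_mean_n(sd=1.167, d=7 * 0.03))
# 119
```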

Type I error rate, \(\alpha\)

Standard deviation of outcome, \(SD\)

Absolute error or precision, \(d\)

Reference Example

Cochran WG. Sampling Techniques. John Wiley & Sons; 1977.

Suppose that, for a proportion, the acceptable level of error is 5% (i.e., \(d = 0.05\)) and the expected proportion in the population is 0.5 (i.e., \(p = 0.5\)). At the 5% Type I error rate (i.e., \(\alpha = 0.05\)), the required sample size for the survey is \(385\).
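The same figure comes out of Cochran's formula for a proportion, \(n = z_{\alpha/2}^2 \, p(1-p) / d^2\), rounded up. A minimal sketch, with a hypothetical function name and no finite-population correction:

```python
from math import ceil
from statistics import NormalDist


def survey_proportion_n(p, d, alpha=0.05):
    """Sample size to estimate a proportion within +/- d (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


print(survey_proportion_n(p=0.5, d=0.05))
# 385
```

Because \(p(1-p)\) peaks at \(p = 0.5\), this input gives the most conservative (largest) sample size for a given \(d\).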

Type I error rate, \(\alpha\)

Expected proportion in population, \(p\)

Absolute error or precision, \(d\)

Reference

Riley R D, Ensor J, Snell K I E, Harrell F E, Martin G P, Reitsma J B et al. (2020). Calculating the sample size required for developing a clinical prediction model. BMJ, m441. doi: 10.1136/bmj.m441

Expected value of the (Cox-Snell) R-squared of the new model

Number of candidate predictor parameters for potential inclusion in the new model

Level of shrinkage desired at internal validation after developing the new model

Overall outcome proportion (for a prognostic model) or overall prevalence (for a diagnostic model)

Use C-statistic in conjunction with the expected prevalence to approximate the Cox-Snell R-squared (input of R-squared will be ignored)

C-statistic reported in an existing prediction model study

Reference

Riley R D, Ensor J, Snell K I E, Harrell F E, Martin G P, Reitsma J B et al. (2020). Calculating the sample size required for developing a clinical prediction model. BMJ, m441. doi: 10.1136/bmj.m441

Expected value of the (Cox-Snell) R-squared of the new model

Number of candidate predictor parameters for potential inclusion in the new model

Level of shrinkage desired at internal validation after developing the new model

Overall event rate in the population of interest

Timepoint of interest for prediction in follow-up

Average (mean) follow-up time anticipated for individuals

Reference

Riley R D, Ensor J, Snell K I E, Harrell F E, Martin G P, Reitsma J B et al. (2020). Calculating the sample size required for developing a clinical prediction model. BMJ, m441. doi: 10.1136/bmj.m441

Expected value of the (Cox-Snell) R-squared of the new model

Number of candidate predictor parameters for potential inclusion in the new model

Level of shrinkage desired at internal validation after developing the new model

Average outcome value in the population of interest

Standard deviation (SD) of outcome values in the population

Multiplicative margin of error (MMOE) acceptable for calculation of the intercept

Reference

Lu, Grace, "Sample Size Formulas For Estimating Areas Under the Receiver Operating Characteristic Curves With Precision and Assurance" (2021). Electronic Thesis and Dissertation Repository. 8045. https://ir.lib.uwo.ca/etd/8045

Area under ROC curve

Null hypothesis AUC value

Prevalence (ratio of positive cases / total sample size)

Type I error rate, \(\alpha\)

Power, \(1-\beta\)