In this lecture we will do some hands-on examples of power and sample size calculations in survival analysis using R.

Note: This lecture is designed based on several resources. For details see the end-notes1.


1 Introduction

Earlier we talked about the p-value from log-rank test to check whether two groups differ with respect to survival/hazard.

  • One reason such tests are useful is that they provide an objective criteria (statistical significance) around which to plan out a study:

    • How many subjects do we need?

    • How long will the study take to complete?

  • In survival analysis, we need to specify information regarding the censoring mechanism and the particular survival distributions in the null and alternative hypotheses.

    • First, one needs either to specify what parametric survival model to use, or that the test will be semi-parametric, e.g., the log-rank test. This allows for determining the number of deaths (or events) required to meet the power and other design specifications.

    • Second, one must also provide an estimate of the number of patients that need to be entered into the trial to produce the required number of deaths.

  • We shall assume that the patients enter a trial over a certain accrual period of length \(a\), and then followed for an additional period of time \(f\) known as the follow-up time. Patients still alive at the end of follow-up are censored.

We will describe sample size methods for single arm clinical trials and two arm clinical trials.

1.1 Exponential Approximation

  • Let us assume we have constant hazards (i.e., exponential distributions) for the sake of simplicity.

  • Other work in literature has indicated that the power/sample size obtained from assuming constant hazards is fairly close to the empirical power of the log-rank test, provided that the ratio between the two hazard functions is constant.

  • Typically in a power analysis, we are simply trying to find the approximate number of subjects required by the study, and many approximations/guesses are involved, so using formulas based on the exponential distribution is often good enough.

  • No Censoring (Special Case): If \(T_i \stackrel{iid}{\sim} Exp(\lambda) = Gamma(1, \lambda)\) for \(i=1,\ldots,d\), then \(L(\lambda) = \lambda^d e^{-\lambda V}\), where \(V = \sum\limits_i t_i \sim Gamma(d, \lambda)\).

  • Now let us assume that we have an initial sample size \(n\) and follow \(d\) subjects to failure, then \[T_{(1)} \sim Exp(n\lambda),\] \[T_{(2)} - T_{(1)} \sim Exp((n - 1)\lambda),\] \[\vdots\] \[T_{(j)} - T_{(j-1)} \sim Exp((n - j + 1)\lambda),\] for \(j = 1, \ldots, d\), with \(T_{(0)} = 0\). Here \(t_{(j)}\) are the order statistics.

  • Alternatively, let \(U_j = (n - j + 1)(T_{(j)} - T_{(j-1)})\), then \(U_j \stackrel{iid}{\sim} Exp(\lambda)\) and \(L(\lambda) = \lambda^d e^{-\lambda V}\), where \(V = \sum\limits_j u_j \sim Gamma(d, \lambda)\).

  • The exponential distribution, therefore, has the somewhat remarkable property that we arrive at the exact same inference:

    • if we follow \(d\) subjects until all have failed, or

    • if we follow some larger number \(n\) until \(d\) have failed.

  • Note that because the exact distribution of \(V\) is known and easy to work with, it is possible to carry out exact power and sample size calculations. However, one can obtain much simpler, closed-form expressions through a normal approximation.

  • The exponential distribution has mean \(1/\lambda\) and variance \(1/\lambda^2\). By the central limit theorem, we have \(\bar{X} \sim N(\frac{1}{\lambda},\frac{1}{n\lambda^2})\).

    • This result, however, is not particularly satisfactory due to the \(\lambda\) term in the variance, which means we will have to solve a nonlinear equation to determine power/sample size.
  • Consider instead the variance-stabilizing transformation \(g(x) = \log(x)\). By the delta method, we have \(log \bar{X} \sim N(\log(\lambda),\frac{1}{n})\).

    • In addition to the convenience of linearity, variance-stabilizing transformations also typically lead to more accurate normal approximations.
  • If \(X_i \stackrel{iid}{\sim} Exp(\lambda_1)\) and \(Y_i \stackrel{iid}{\sim} Exp(\lambda_2)\) with \(X_i\) and \(Y_i\) being independent, then we have \(\log\big(\frac{\bar{Y}}{\bar{X}}\big) \sim N\big(\Delta,\frac{1}{n_1}+\frac{1}{n_2}\big)\), where \(\Delta = \frac{\lambda_1}{\lambda_2}\) is the hazard ratio.


2 Single-Arm Study

  • Assume that the survival times follow an exponential distribution with hazard \(h(t) = \lambda\) and survival distribution \(S(t) = e^{-\lambda t}\).

  • We want to test \(H_0: \lambda = \lambda_0\) versus \(H_A: \lambda = \lambda_A\).

    • The null hypothesis mean, \(\mu_0 = 1/\lambda_0\), would correspond to the mean survival one has observed in the past for the standard therapy, and the alternative hazard, \(\mu_A = 1/\lambda_A\) is a (presumably) larger mean survival that we aim to find with a new, experimental therapy.

    • Thus, the treatment ratio we would like to detect is \(\Delta = \frac{\mu_A}{\mu_0} = \frac{\lambda_0}{\lambda_A}\).

  • Our ultimate goal is to determine how many patients we need to detect a certain hazard ratio \(\Delta\) with a specified power and significance level.

  • In practice, the null and alternative hypotheses may be expressed in terms of either median survival or survival probability at a specified time \(t\). A median survival may be converted into a hazard rate by re-expressing \(0.5 = e^{-\lambda t}\) as \(\lambda = \log(2)/t\).

  • Let us use the reparameterization \(\theta = \log(\mu) = -\log(\lambda)\).

  • Let \(d = \sum \delta_i\) and \(V = \sum t_i\) are the number of deaths and the total patient-time, respectively.

  • The MLE of \(\theta\) is given as \(\hat{\theta} = \log(V/d)\) and \(Var(\hat{\theta}) = 1/d\).

2.1 Log-Mean Method

We will use \(\hat{\theta}\) as our test statistic, and reject \(H_0\) in favor of \(H_A\) if \(\hat{\theta} > k\) for some constant \(k\).

  • The significance level of the test, or Type I error rate, is \(\alpha = P(\hat{\theta}>k | \theta = \theta_0)\).

    • If \(Z = \frac{\hat{\theta}-\theta}{1/\sqrt{d}}\), then we have \(\alpha = P(Z>\frac{k-\theta_0}{1/\sqrt{d}})\).

    • Let \(\Phi(z_\alpha) = 1-\alpha\), then \(z_\alpha = \frac{k-\theta_0}{1/\sqrt{d}}\) and hence \(k = \theta_0 + \frac{z_\alpha}{\sqrt{d}}\).

  • The power of the test is given by \[ 1-\beta = P(\hat{\theta} > k | \theta = \theta_A) = P\Big(Z>\frac{k-\theta_A}{1/\sqrt{d}}\Big)\]

  • Solving for \(d\) we have \[z_{1-\beta} = -z_\beta = \sqrt{d}(k-\theta_A) = \sqrt{d}\Big(\theta_0 + \frac{z_\alpha}{\sqrt{d}}-\theta_A\Big)\] \[ \Rightarrow d = \frac{(z_\beta+z_\alpha)^2}{(\theta_A+\theta_0)^2} = \frac{(z_\beta+z_\alpha)^2}{(\log \Delta)^2}.\]

Following is the function to compute \(d\) as described above:

expLogMeanDeaths = function(Delta, alpha, pwr){
  z.alpha = qnorm(alpha, lower.tail=F)
  z.beta = qnorm(1-pwr, lower.tail=F)
  num = (z.alpha + z.beta)^2
  denom = (log(Delta))^2
  dd = num/denom
  dd
}

This gives us the number of deaths needed to achieve the specified power, but not the total number of patients.

2.1.1 Example 1

Suppose that we are designing a Phase II oncology trial where we plan a \(5\%\) level (one-sided) test, and we need \(80\%\) power to detect a hazard ratio of \(1.5\).

We can find the required number of deaths as follows:

# Log-mean based approach
expLogMeanDeaths(Delta = 1.5, alpha = 0.05, pwr = 0.8)
## [1] 37.60635

That is, we would need \(\approx 38\) deaths according to the log mean method (function expLogMeanDeaths).

  • This derivation of log-mean method is based on the parameter estimate normalized by its standard deviation, which produces the Wald test statistic.

  • One aspect of this test is that different paramterizations lead to somewhat different formulas for the sample size. This dependence on parameterization may be largely avoided by working directly with the likelihood ratio statistic.

2.2 Likelihood-Ratio Based Approach

  • For fixed \(d\), \(V = \sum t_i \sim Gamma(d, \lambda)\) and it is known2 that \[W = \frac{2d\lambda}{\hat{\lambda}} \sim \chi^2_{2d},\] although this result is approximate for general censoring patterns.
  • Under \(H_0: \lambda = \lambda_0\), we need to find a constant \(k\) such that \(\alpha = P(1/\hat{\lambda} > k |\lambda = \lambda_0) = P(W > 2dk\lambda_0)\). Thus we have \(\chi^2_{2d,\alpha} = 2dk\lambda_0\) and hence \(k = \frac{\chi^2_{2d,\alpha}}{2d\lambda_0}\).

  • The power of the test is given by \[1-\beta = P(1/\hat{\lambda > k | \lambda = \lambda_A} = P(W > 2dk\lambda_A)).\]

  • We have \(\chi^2_{2d,1-\beta} = 2dk\lambda_A \Rightarrow \chi^2_{2d,1-\beta} = \frac{\chi^2_{2d,\alpha}\lambda_A}{\lambda_0}\), hence \(\Delta = \frac{\lambda_0}{\lambda_A} = \frac{\chi^2_{2d,\alpha}}{\chi^2_{2d,1-\beta}}\). For specified \(\alpha\), power \(1-\beta\), and ratio \(\Delta\), we may solve this for the required number of deaths, \(d\).

\(\Delta\) can be computed using the following function:

expLikeRatio = function(d, alpha, pwr){
  num = qchisq(alpha, df=(2*d), lower.tail=F)
  denom = qchisq(pwr, df=(2*d), lower.tail=F)
  Delta = num/denom
  Delta
}
  • To get the number of deaths \(d\) for a specified \(\Delta\), we define a new function \(LR(d) = \frac{\chi^2_{2d,\alpha}}{\chi^2_{2d,1-\beta}} - \Delta\).

The solution for \(LR(d) = 0\) is the required number of deaths and is computed as:

expLRdeaths = function(Delta, alpha, pwr){
  LR = function(d, alpha, pwr, Delta){
    expLikeRatio(d, alpha, pwr) - Delta
  }
  # Find the root for the function LR(d)
  result = uniroot(f = LR, lower = 1, upper = 1000,
                   alpha = alpha, pwr = pwr, Delta = Delta)
  result$root
}

2.2.1 Example 1 (cont.)

We wanted to design a Phase II oncology trial where we plan a \(5\%\) level (one-sided) test, and we need \(80\%\) power to detect a hazard ratio of \(1.5\).

We can find the required number of deaths based on likelihood ratio based approach as follows:

# Likelihood ratio based approach
expLRdeaths(Delta = 1.5, alpha = 0.05, pwr = 0.8)
## [1] 36.33916

That is, we would need \(\approx 37\) deaths according to the likelihood ratio method (function expLikeRatio).

2.3 Probability of Event (Death)

We need to provide an estimate of the proportion \(\pi\) of patients who will die by the time of analysis.

  • If all patients entered at the same time, we would simply have \(\pi = 1-S_\lambda(t)\), where \(t\) is the follow-up time.

  • However, patients actually enter over an accrual period of length \(a\) and then, after accrual to the trial has ended, they are followed for an additional time \(f\).

    • So a patient who enters at time \(t = 0\) will have failure probability \(\pi(0) = 1-S_\lambda(a+f)\) as this patient will have the maximum possible follow-up time \(a+f\).

    • Similarly, for any patient who enters at a time \(t \in [0,a]\), the failure probability \(\pi(t) = 1-S_\lambda(a+f-t)\).

  • Assuming that the patients enter uniformly between times \(0\) and \(a\), the probability of death can be computed as \[ \pi = \int\limits_0^a \frac{1}{a} [1-S_\lambda(a+f-t)] dt.\]

  • Assuming \(S_\lambda(t) = e^{-\lambda t}\), we have \[ \pi = 1 - \frac{1}{a\lambda}[e^{-\lambda f} - e^{-\lambda (a+f)}].\]

The following function uses the above expression to compute the probability of death:

prob.death = function(lambda, accrual, followup){
  term1 = exp(-lambda*followup) - exp(-lambda*(accrual + followup))
  probDeath = 1 - (term1/(accrual*lambda))
  probDeath
}

2.3.1 Example 1 (cont.)

We wanted to design a Phase II oncology trial where we plan a \(5\%\) level (one-sided) test, and we need \(80\%\) power to detect a hazard ratio of \(1.5\).

  • Suppose that \(\lambda_0 = 0.15\), then we have \(\lambda_A = \lambda_0/\Delta = 0.1\). Assume accrual period \(a = 2\) years and follow-up time \(f = 3\) years.

The probability of death under \(H_A: \lambda = 0.1\) is computed as:

# Probability of death
pd = prob.death(lambda=0.10, accrual=2, followup=3)
pd
## [1] 0.3285622

From earlier parts of Example 1, we know that we required \(38\) deaths. Now with the probability of death as \(0.3285622\), we need approximately \(38/0.3285622 = 115.7 \approx 116\) number of patients.


3 Two-Arm Study

3.1 Comparing Two Exponential Survival Distributions

Suppose that we want to test \(H_0: S_0 \geq S_1\) versus \(H_A: S_0 < S_1\) for all \(t\), where \(S_0\) and \(S_1\) are exponential survival distributions with hazards \(\lambda_0\) and \(\lambda_1\) for the control and experimental regimens, respectively.

  • Consider the hazard ratio \(\Delta = \lambda_0/\lambda_1\) and presume that \(\lambda_1\) is smaller than \(\lambda_0\). Our test can now be viewed as \(H_0: \Delta = 1\) versus \(H_A: \Delta > 1\).

  • Let \(p\) denote the proportion of patients randomized to the control group. If \(n\) denotes the total number of patients in the trial, and we have \(n_0 = np\) and \(n_1 = n(1-p)\) control and experimental patients, respectively.

  • When the trial has been completed, we will observe \(d_0\) and \(d_1\) deaths in the control and experimental groups, and total patient times of \(V_0 = \sum t_{0i}\) and \(V_1 = \sum t_{1i}\), respectively.

  • The MLEs of the hazards are given as \(\hat{\lambda}_0 = d_0/V_0\) and \(\hat{\lambda}_1 = d_1/V_1\).

  • Let \(\delta = \log \Delta = \log \lambda_0 - \log \lambda_1\), then \[\sigma^2 = var(\hat{\delta}) = \frac{1}{E(d_0)} + \frac{1}{E(d_1)} = \frac{1}{n_o\pi_0} + \frac{1}{n_1 \pi_1} = \frac{1}{np(1-p)} \times \frac{p\pi_0 + (1-p)\pi_1}{\pi_0 \pi_1},\] where \(\pi_0\) and \(\pi_1\) are the probabilities of death in the control and treatment groups, respectively.

  • Let \(\tilde{\pi} = \big( \frac{p\pi_0 + (1-p)\pi_1}{\pi_0 \pi_1} \big)^{-1} = \big( \frac{p}{\pi_1} + \frac{1-p}{\pi_0} \big)^{-1}\), then \(\tilde{\pi}\) is a weighted harmonic mean of \(\pi_0\) and \(\pi_1\), and thus may be viewed as an average probability of death across the control and treatment groups.

3.1.1 Reformulating the Test

Now the test can be written in terms of \(\delta\) as \(H_0: \delta = 0\) versus \(H_A: \delta > 0\).

  • We need a constant \(k\) such that \(\alpha = P(\hat{\delta}>k | \delta = 0) = P(Z > k/\sigma)\).

    • Hence we have \(k = z_\alpha \sigma\).
  • The power of the test is given as \[1-\beta = P(\hat{\delta}>k|\delta = \delta_A) = P\Big(Z > \frac{k-\delta_A}{\sigma}\Big)\].

  • Since \(z_{1-\beta} = -z_\beta = \frac{k-\delta_A}{\sigma}\) and \(\sigma = \frac{\delta}{z_\alpha+z_\beta}\), we now have \(\sigma^2 = \frac{1}{np(1-p)}\tilde{\pi}^{-1} = \frac{\delta^2}{(z_\alpha+z_\beta)^2}\) and hence \[n = \frac{(z_\alpha+z_\beta)^2}{\delta^2p(1-p)\tilde{\pi}}.\]

  • This form of the number of patients required can be interpreted as the number of deaths \(d = \frac{(z_\alpha+z_\beta)^2}{\delta^2p(1-p)}\) divided by the probability of death \(\tilde{\pi}\).

3.2 Comparing Two Survival Distributions Using the Log-Rank Test

  • With this test, which is based on the proportional hazards assumption, the ratio \(\Delta = \lambda_0(t)/\lambda_1(t)\) is constant, but the baseline hazard \(\lambda_0(t)\) is unspecified.

  • We use the log-rank statistic \(U_0 = \sum d_{0i} - \sum e_{0i}\), where \(e_{0i} = E(d_{0i}) = n_{0i}d_i/n_i\); and its variance which for the \(i^{th}\) failure time is given as \(var(U_0) = \sum v_{0i}\), where \(v_{0i} = var(d_{0i}) = \frac{n_{0i}n_{1i}d_i(n_i - d_i)}{n_i^2(n_i-1)}\).

  • Assuming the number of deaths at each failure time is small compared to the number at risk, and that the proportion \(p \approx n_{0i}/n_i\) of subjects assigned to the control group is constant over time, we have the following approximation, \[v_{0i} \approx \frac{n_{0i}n_{1i}d_i(n_i - d_i)}{n_i^2(n_i-1)} \approx \frac{n_{0i}n_{1i}d_i}{n_i^2} \approx p(1-p)d_i.\]

  • Hence the variance of log-rank statistic is \(V_0 = var(U_0) \approx p(1-p)\sum d_i = p(1-p)d.\)

  • From the standard calculations, we see that the number of events needed to detect a treatment difference \(\delta = \log \Delta\) with power \(1-\beta\) and a two-sided level-\(\alpha\) log-rank test is given as \[d = \frac{(z_{\alpha/2} + z_\beta)^2}{p(1-p)\delta^2}.\]

The number of events that we require can be computed using the following function:

TwoArmDeaths = function(Delta, p, alpha, pwr){
  z.alpha = qnorm(alpha, lower.tail=F)
  z.beta = qnorm(1-pwr, lower.tail=F)
  num = (z.alpha + z.beta)^2
  denom = p*(1-p)*(log(Delta))^2
  dd = num/denom
  dd
}

3.3 Probability of Death from a Non-Parametric Survival Curve Estimate

  • If a survival distribution estimate is available for the control group, say, from an earlier trial, then we can use that, along with the proportional hazards assumption, to estimate a probability of death without assuming that the survival distribution is exponential.

  • Typically, the survival function for the control group of a randomized trial is a Kaplan-Meier estimate \(\hat{S}_0(t)\) obtained from a prior study.

  • If we need to detect a hazard ratio \(\Delta\), assuming proportional hazards, the survival function under alternative hypothesis would be \(\hat{S}_1(t) = [\hat{S}_0(t)]^{1/\Delta}\).

  • To compute the expected number of deaths with accrual and followup times \(a\) and \(f\), we use the weighted mean survival, \(\hat{S}(t) = p\hat{S}_0(t) + (1-p)\hat{S}_1(t)\), where \(p\) is the proportion of subjects randomly assigned to the control group.

  • We now want to evaluate \[\pi = \int\limits_0^a \frac{1}{a} [1-\hat{S}(a+f-t)] dt = 1-\frac{1}{a} \int\limits_f^{a+f} \hat{S}(t) dt\].

  • A good approximation for \(\pi\) could be done by using trapezoidal rule as \(\pi_t = 1-\frac{1}{4} \{ \hat{S}(a+f) + 2\hat{S}(\frac{a}{2} + f) + \hat{S}(f) \}\). An alternative approximation approach is by Simpson’s rule which approximates \(\pi\) as \(\pi_s = 1-\frac{1}{6} \{ \hat{S}(a+f) + 4\hat{S}(\frac{a}{2} + f) + \hat{S}(f) \}\)

A function to compute the probability of death using Simpson’s rule is as follows:

pDeath.Simp = function(accrual, followup, S){
  # “S” is a vector with three elements corresponding to the 
  # survival probabilities at times “followup”, “followup + 0.5*accrual”,
  # and “followup + accrual”. 
  # Use Simpson’s rule to approximate the probability of death
  # assuming uniform accrual
  probDeath = 1 - (1/6)*(S[1] + 4*S[2] + S[3])
  probDeath
}
  • However, the most accurate method is to evaluate the integral numerically.

  • Let the ordered failure times be denoted as \(t_{(1)}, t_{(2)},\ldots, t_{(n)}\). The numerical approximation is given as \[\pi_r = \sum\limits_{t_{(i)}:f<t_{(i)}\leq a+f} [\hat{S}(a+f-t_{(i)})\times (t_{(i)}-t_{(i-1)})].\]

3.3.1 Example 2

We want to compute the probability of death from a non-parametric estimate of the survival curve from the lung data.

  • Assume that the accrual time is \(a = 366\) days and the follow-up time is \(f = 180\) days.

The probability of death may be computed as:

library(survival)
# Fit survival curve
sfit = survfit(Surv(time, status)~1, data=lung)
times.obs = sfit$time # observed event times
  
accrual = 366 # accrual time (in days)
followup = 180 # follow-up time (in days)

# identify times t such that f < t < a+f
obs.ind = which({times.obs >= followup} & {times.obs <= accrual + followup})
times.use = c(followup, times.obs[obs.ind])

# retrieve the survival fitted values
surv.use = summary(sfit, times=times.use)$surv

# compute the probability of death
times.diff = diff(c(times.use, accrual + followup))
pi.rec = 1 - (1/accrual)*sum(times.diff*surv.use)
pi.rec
## [1] 0.549486
  • However we estimate the probability of death \(\pi\), the number of patients needed for the trial will be estimated by \(n=d/\pi\), where \(d\) is the required number of deaths.

3.3.2 Example 3

Suppose we need to design a two-arm randomized clinical trial to test the effect of a new therapy in patients with advanced gastric cancer.

  • For the control arm survival distribution we use the survival time from the lung data, and we wish to have \(80\%\) power to detect an alternative hazard ratio \(\Delta = 2\) (for an alternative experimental therapy) using a \(2.5\%\) level one-sided log-rank test.

The number of deaths is as follows:

d = TwoArmDeaths(Delta=2, p=0.5, alpha=0.025, pwr=0.8)
d
## [1] 65.34566

Hence we need 66 events total.

  • Assume that the accrual time is \(a = 366\) days and the follow-up time is \(f = 180\) days.
Delta = 2

# surv.use variable from the previous example
# surival under alternate hypothesis
surv.alt = surv.use^(1/Delta)

# weighted mean survival with weight 0.5
surv.avg = 0.5*surv.use + 0.5*surv.alt

# times diff variable computed from the previous example
# accrual = 366 and follow-up = 180
# probability of death
pi.exact = 1 - (1/accrual)*sum(times.diff*surv.avg)
pi.exact
## [1] 0.4429933

The number of events required is \(65.3456593\) and the probability of event is \(0.4429933\). Hence, the number of patients to be enrolled is \(65.3456593/0.4429933 \approx 147.5\).

  • Thus, we would need to enroll \(n=148\) patients, \(n_1=n_2=74\) patients in each arm, to meet the design specifications.

  • If the full survival curve is unavailable, we may still estimate the sample size by specifying the null and alternative survival distributions in terms of median survival.

    • We can use \(\lambda = \log(2)/t_m\), where \(t_m\) is the median survival time.
    • This approximation will be reasonable if the hazard at the median survival time is near the median of an exponential distribution.
    • For the current example, we have \(t_m = 274\) days and \(\lambda_0 = \log(2)/t_m = 0.0025\). Thus, the hazard for the alternative hypothesis is \(\lambda_1 = 0.0025/2 = 0.0013\).
    • As discussed earlier, the required number of events under the exponential assumption is \(66\), the same as we need using a log-rank test.
    • The probability of death is obtained using the function prob.death defined earlier as
pi0 = prob.death(lambda=0.0025, accrual=366, followup=180)
pi1 = prob.death(lambda=0.0013, accrual=366, followup=180)
pi.harmonicMean = 1/(0.5/pi0 + 0.5/pi1)
pi.harmonicMean
## [1] 0.4526801

The number of events required is \(65.3456593\) and the probability of event is \(0.4526801\). Hence, the number of patients to be enrolled is \(65.3456593/0.4526801 \approx 144.4\), or \(146\) in total.

3.3.3 Example 4: Randomized Study of Patients with Metastatic Colorectal Cancer

Assume that we have a Kaplan-Meier plot of overall survival probabilities of \(161\) patients with metastatic colon cancer who had undergone metastasectomy (surgical removal of cancerous growths that have spread from the colon)3.

  • Survival probabilities at \(24, 36\), and \(48\) months (\(2, 3\), and \(4\) years) are, respectively, \(0.76\), \(0.59\), and \(0.49\).

Suppose that we plan a randomized phase III study comparing a new therapy to placebo for these patients.

  • We plan to carry out a \(0.025\) level log-rank test, and wish to have \(85\%\) power to detect an increase in the three-year survival probability from \(0.59\) (the current untreated rate) to \(0.75\).

How many patients do we need?

  • First, we find the hazard ratio by solving \(0.59 = 0.75^\Delta \Rightarrow \Delta = \log(0.59)/\log(0.75) = 1.834\).

Then the number of deaths required is \(98\), and is computed as

TwoArmDeaths(Delta=1.834, p=0.5, alpha=0.025, pwr=0.85)
## [1] 97.63333

The probabilities of death in the control arm, treatment arm, and the average, are computed as:

# define the parameters as follows
accrual = 2
followup = 2
S.0 = c(0.76, 0.59, 0.49)
psi = 1.834
S.1 = (S.0)^(1/psi)
S.both = 0.5*(S.0 + S.1)

pDeathControl = pDeath.Simp(accrual=2, followup=2, S=S.0)
pDeathControl
## [1] 0.3983333
pDeathTreatment = pDeath.Simp(accrual=2, followup=2, S=S.1)
pDeathTreatment
## [1] 0.2435429
pDeathAll = 0.5*(pDeathControl + pDeathTreatment)
pDeathAll
## [1] 0.3209381

Thus, the total number of patients necessary to produce and expected value of \(97.63\) deaths is \(97.63/0.3209381 = 305\), or \(306/2 = 153\) patients per arm.


4 Other Software

There are several software available for power and sample size calculation.



  1. This notes is adapted from Power and sample size calculations and Applied survival analysis using R.

  2. Cox, D.R., 2018. Analysis of survival data. Chapman and Hall/CRC.

  3. Morse, M.A., Niedzwiecki, D., Marshall, J.L., Garrett, C., Chang, D.Z., Aklilu, M., Crocenzi, T.S., Cole, D.J., Dessureault, S., Hobeika, A.C. and Osada, T., 2013. A randomized phase II study of immunization with dendritic cells modified with poxvectors encoding CEA and MUC1 compared with the same poxvectors plus GM-CSF for resected metastatic colorectal cancer. Annals of surgery, 258(6).

  4. Find statistical considerations for a study where the outcome is a time to failure: http://hedwig.mgh.harvard.edu/sample_size/time_to_event/para_time.html