The Biometrics Weekly

ICH E9(R1) at Five: The Vocabulary Stuck, the Hard Problems Didn't

A cluster of new publications and multi-agency forums marks the end of E9(R1)'s foundational phase — and the start of reckoning with where stated estimands and actual analyses quietly diverge.

  • Estimands & ICH E9(R1)
  • Regulatory

The ICH E9(R1) addendum was finalized in 2019 with the ambition of forcing sponsors to state, precisely, what question a trial is designed to answer before they start designing it. Five years on, an anniversary stocktake in Statistics in Biopharmaceutical Research delivers a verdict that will be familiar to anyone who has reviewed a late-phase statistical analysis plan (SAP) recently: the framework’s vocabulary has been absorbed widely, but its discipline has been absorbed unevenly. Some teams have genuinely restructured how they think about trial design. Others have added an estimand section to the protocol and moved on.

The anniversary paper identifies oncology treatment switching, rare disease, and health-related quality of life (HRQoL) as the persistent challenge areas: the settings where the gap between a carefully worded estimand and what the analysis actually estimates is largest and most consequential.

A Statistics in Medicine paper on competing events illustrates the sharpest version of this gap: using inverse probability of censoring weighting (IPCW) to target a controlled direct effect when the actual clinical question is a separable direct effect can yield estimates that differ not just in magnitude but in sign. Simulations and empirical data from an estrogen therapy trial show how far that divergence can run. The estimand is stated; the estimator quietly answers a different question.
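A toy calculation shows how a sign flip between competing-events estimands can arise. This is a minimal sketch and not the paper's analysis: it assumes constant cause-specific hazards with invented values, and it contrasts the controlled direct effect (risk if the competing event were eliminated, which is what an analysis treating the competing event as censoring targets under an independence assumption) with the total effect on cumulative incidence. The separable direct effect discussed in the paper needs a decomposition of treatment into components and is not computed here.

```python
import math

def cumulative_incidence(h_event, h_compete, t):
    """Risk of the event of interest by time t when the competing event can occur
    (constant cause-specific hazards)."""
    total = h_event + h_compete
    return h_event / total * (1.0 - math.exp(-total * t))

def risk_without_competing_event(h_event, t):
    """Risk by time t in a hypothetical world with the competing event eliminated.
    Assumes the cause-specific hazard of the event of interest is unchanged by
    that elimination (an independence assumption)."""
    return 1.0 - math.exp(-h_event * t)

# Hypothetical hazards per year, invented for illustration only.
# Treatment slightly lowers the hazard of the event of interest
# but sharply lowers the hazard of the competing event.
control = {"h_event": 0.10, "h_compete": 0.30}
treated = {"h_event": 0.09, "h_compete": 0.05}
horizon = 5.0  # years

# Total effect: difference in cumulative incidence, competing event left in place.
total_effect = (
    cumulative_incidence(treated["h_event"], treated["h_compete"], horizon)
    - cumulative_incidence(control["h_event"], control["h_compete"], horizon)
)

# Controlled direct effect: difference in risk with the competing event eliminated.
controlled_direct_effect = (
    risk_without_competing_event(treated["h_event"], horizon)
    - risk_without_competing_event(control["h_event"], horizon)
)

print(f"total effect (risk difference):             {total_effect:+.4f}")
print(f"controlled direct effect (risk difference): {controlled_direct_effect:+.4f}")
```

Under these made-up hazards the total effect is positive and the controlled direct effect is negative: treated patients survive the competing event long enough to experience more of the event of interest, even though their hazard for it is lower. Which number is "the result" depends entirely on which estimand the protocol states.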

A systematic review in Journal of Clinical Epidemiology adds an uncomfortable angle on the two-trial paradigm: estimands and outcomes change frequently between Trial 1 and Trial 2 in Phase 3 programs, undermining the evidentiary independence assumption the two-trial rule rests on. That finding matters more, not less, as the field moves toward single-pivotal-trial approvals.

The deeper implication of the five-year review isn’t about new methods — it’s about enforcement: ensuring the estimand in the protocol is the quantity the SAP targets, the analysis estimates, and the censoring rules actually support.

Protocol read: Five years in, the estimand framework’s successes and its failures both come down to whether the statement in the protocol is what the SAP, the analysis, and the censoring rules actually target. The vocabulary did the easy work; the alignment is still the work.

What to do now:

  • Audit existing SAPs in oncology switching, rare-disease, and HRQoL programs for estimand–estimator drift; treat the protocol’s estimand statement as the testable specification, not as boilerplate.
  • Pressure-test IPCW choices in competing-events settings against the separable-direct-effect estimand explicitly — the divergence between targets can flip the sign of the result.
  • For single-pivotal-trial designs, plan estimand stability across what would have been Trial 1 and Trial 2; the two-trial paradigm was doing more replication work than its formal status suggested.
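
One way to make the first item concrete is to record the estimand as structured data and compare the protocol's version with the one the SAP actually implements. The sketch below is an illustration, not an established tool: the `Estimand` class, the `estimand_drift` helper, and all example values are hypothetical. The five fields follow the estimand attributes described in ICH E9(R1) (population, treatment, variable, intercurrent-event handling, population-level summary).

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Estimand:
    """Hypothetical record of the five E9(R1) estimand attributes."""
    population: str
    treatment: str
    variable: str
    # Tuple of (intercurrent event, strategy) pairs, kept in a fixed order.
    intercurrent_event_strategies: tuple
    summary_measure: str

def estimand_drift(protocol, sap):
    """Return {attribute: (protocol_value, sap_value)} for each attribute that differs."""
    return {
        f.name: (getattr(protocol, f.name), getattr(sap, f.name))
        for f in fields(protocol)
        if getattr(protocol, f.name) != getattr(sap, f.name)
    }

# Invented example: the protocol states a hypothetical strategy for treatment
# switching, while the SAP's primary analysis follows a treatment-policy strategy.
protocol_estimand = Estimand(
    population="adults with advanced disease, previously untreated",
    treatment="investigational drug vs. standard of care",
    variable="overall survival",
    intercurrent_event_strategies=(("switch to investigational drug", "hypothetical"),),
    summary_measure="hazard ratio",
)
sap_estimand = Estimand(
    population="adults with advanced disease, previously untreated",
    treatment="investigational drug vs. standard of care",
    variable="overall survival",
    intercurrent_event_strategies=(("switch to investigational drug", "treatment policy"),),
    summary_measure="hazard ratio",
)

drift = estimand_drift(protocol_estimand, sap_estimand)
for attribute, (stated, implemented) in drift.items():
    print(f"{attribute}: protocol={stated!r} vs SAP={implemented!r}")
```

A string comparison like this only catches drift that has been written down; whether the censoring rules and estimator genuinely support the stated strategy still needs statistical review.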