Five Years On, the Estimand Framework Meets Its Legacy Conventions
The 2019 addendum's anniversary stocktaking arrives alongside two Project SignifiCanT forums and a censoring commentary that move the hard problem from defining intercurrent events to dismantling the conventions that contradict them.
- Estimands & ICH E9(R1)
- Regulatory
- Methodology Frontier
The five-year anniversary paper on ICH E9(R1), published in Statistics in Biopharmaceutical Research, arrives at a moment when the field has largely stopped arguing about whether to write an estimand section and started arguing about what the rest of the SAP has to change to make that section honest. The anniversary framing is “where are we with implementation and where to go next.” The honest answer, judging by the wave of papers landing alongside it, is: the easy part is done, and the legacy analytical conventions are next.
Two regulator forums under FDA OCE’s Project SignifiCanT illustrate where the centre of gravity has moved. The June 26, 2025 forum on mid-trial standard-of-care (SOC) changes, convened with regulators from FDA, Health Canada, MHRA, PEI, and TGA, with EMA, PMDA, ANVISA, HSA, and MOH-Israel observing, drew an explicit estimand-level line between Case A “add-on” designs (SOC in both arms) and Case B “SOC-only-in-control” designs. In Case B, the forum noted, pooled and stratified analyses across pre- and post-SOC-change periods are “generally implausible due to incompatible assumptions,” and non-concurrent data introduce time-trend bias that no amount of stratification dissolves. The academic consensus leaned toward trial restart over adaptive control-arm changes in many scenarios. That is a meaningful shift: the methods literature on subsequent-therapy adjustment (MSMs, IPW, multiple imputation, and now causal ML extensions) has been treated as the answer for years; the regulators in the room treated it as a fallback.
The September 30, 2025 forum on Duration of Response (DoR) made the same move on a different endpoint. Restricted Mean DoR (RMDoR), a multi-state, ITT-compatible composite of binary response and time-in-response, was the leading candidate to replace responder-only DoR, which has selection bias baked in by construction. The panel was explicit that RMDoR has to be anchored in E9(R1) before the method is selected, and equally explicit that it is the wrong tool for cytostatic agents with low response rates. Useful guardrails for an endpoint that until recently was reported with little methodological self-consciousness at all.
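The contrast between responder-only DoR and a restricted-mean version can be shown with a minimal sketch on invented data. It is not the forum's estimator: it assumes complete follow-up (no censoring), counts zero time in response for non-responders, and truncates at a hypothetical horizon `TAU`. All function names and patient tuples are illustrative.

```python
from typing import List, Optional, Tuple

# (response_start, response_end) in months; (None, None) = never responded.
# All values are invented for illustration.
Patient = Tuple[Optional[float], Optional[float]]


def time_in_response(p: Patient, tau: float) -> float:
    """Months spent in response up to the horizon tau (0 for non-responders)."""
    start, end = p
    if start is None or end is None:
        return 0.0
    return max(0.0, min(end, tau) - min(start, tau))


def restricted_mean_dor(arm: List[Patient], tau: float) -> float:
    """Mean time in response over ALL randomized patients (ITT-style denominator)."""
    return sum(time_in_response(p, tau) for p in arm) / len(arm)


def responder_only_mean_dor(arm: List[Patient], tau: float) -> float:
    """Mean time in response among responders only (post-randomization subset)."""
    responders = [p for p in arm if p[0] is not None]
    return sum(time_in_response(p, tau) for p in responders) / len(responders)


TAU = 12.0
# Arm A: few responders, long responses. Arm B: many responders, shorter responses.
arm_a: List[Patient] = [(2.0, 12.0)] * 2 + [(None, None)] * 8
arm_b: List[Patient] = [(2.0, 8.0)] * 6 + [(None, None)] * 4

for name, arm in (("A", arm_a), ("B", arm_b)):
    print(name,
          "responder-only:", responder_only_mean_dor(arm, TAU),
          "restricted mean:", round(restricted_mean_dor(arm, TAU), 2))
```

In this toy dataset the responder-only summary favours arm A (10 vs. 6 months) while the all-randomized summary favours arm B (3.6 vs. 2.0 months), which is the kind of reversal that conditioning on a post-randomization subset can produce. A real analysis would need a censoring-aware estimator.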
Where the conventions push back
The substantive shift this cycle is best captured by a focused commentary in the same SBR issue, “Time-to-Event Estimands in Oncology Trials: What’s Censoring Got To Do, Got To Do With It?” (Vol 18 Iss 1, pp. 187–192). The argument is short and uncomfortable: censoring at new anti-cancer therapy, and standard Kaplan-Meier (K-M) lost-to-follow-up rules, are themselves substantive estimand choices, and they have not been reconciled with the intercurrent-event strategies E9(R1) requires sponsors to specify. Censoring is not a technical default. It is a hypothetical-strategy assumption smuggled in under a different name, and it has been smuggled into PFS, OS, EFS, and DoR SAPs for two decades.
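A small sketch can show why a censoring rule is an estimand choice and not a technicality. The code below applies two rules to the same invented PFS records: one ignores new anti-cancer therapy (treatment-policy style), the other censors at its start (the conventional rule, which behaves like a hypothetical strategy). The data, the helper names, and the plain-Python Kaplan-Meier routine are all assumptions made for illustration, not the commentary's analysis.

```python
from typing import List, Optional, Tuple

# (pfs_time, progressed, new_therapy_time) in months; invented data.
Record = Tuple[float, bool, Optional[float]]


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e)
        leaving = sum(1 for tt in times if tt == t)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censorings both leave the risk set
    return curve


def surv_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup of S(t) from a KM curve."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def apply_rule(records: List[Record], censor_at_new_therapy: bool):
    """Derive (time, event) pairs under the chosen censoring rule."""
    times, events = [], []
    for pfs_time, progressed, new_tx in records:
        if censor_at_new_therapy and new_tx is not None and new_tx < pfs_time:
            times.append(new_tx)
            events.append(False)  # progression after new therapy is discarded
        else:
            times.append(pfs_time)
            events.append(progressed)
    return times, events


records: List[Record] = [
    (3, True, None), (5, True, 2), (6, True, None),
    (8, True, 4), (10, False, None), (12, True, 7),
]

policy_curve = kaplan_meier(*apply_rule(records, censor_at_new_therapy=False))
censor_curve = kaplan_meier(*apply_rule(records, censor_at_new_therapy=True))

print("S(8), new therapy ignored :", round(surv_at(policy_curve, 8), 3))
print("S(8), censored at therapy :", round(surv_at(censor_curve, 8), 3))
```

On these six invented patients the estimated progression-free probability at month 8 is about 0.33 when new therapy is ignored and about 0.53 when patients are censored at its start. Neither number is wrong; they answer different questions, and only one of them may match the strategy declared in the estimand section.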
A Statistics in Medicine paper on competing events sharpens the point with a simulation: the controlled direct effect targeted by IPCW “censoring by competing events” and the separable direct effect can diverge substantially in magnitude — and can have opposite signs — under the same data-generating process, illustrated on a randomized estrogen-therapy prostate cancer trial. If your causal question is mechanistic and your estimator targets a counterfactual world without competing events, you may be answering a question no clinician asked.
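The paper's contrast between controlled and separable direct effects needs a causal model that is beyond a short example. A simpler, descriptive analogue is sketched here under loud assumptions: invented data, no administrative censoring, and hypothetical function names. It compares one minus Kaplan-Meier with competing events censored (which targets a world where the competing event cannot occur) against the Aalen-Johansen cumulative incidence (which keeps the competing event in the picture).

```python
from typing import List, Tuple

# (time, cause): 0 = censored, 1 = event of interest, 2 = competing event.
# Invented data for illustration only.
Record = Tuple[float, int]


def one_minus_km(records: List[Record], cause: int, horizon: float) -> float:
    """1 - KM for `cause`, treating every other outcome as censoring."""
    surv = 1.0
    at_risk = len(records)
    for t in sorted({tt for tt, _ in records}):
        if t > horizon:
            break
        d = sum(1 for tt, c in records if tt == t and c == cause)
        leaving = sum(1 for tt, _ in records if tt == t)
        if d:
            surv *= 1.0 - d / at_risk
        at_risk -= leaving
    return 1.0 - surv


def aalen_johansen_cif(records: List[Record], cause: int, horizon: float) -> float:
    """Cumulative incidence of `cause` in the presence of competing events."""
    event_free = 1.0  # probability of no event of any cause just before t
    cif = 0.0
    at_risk = len(records)
    for t in sorted({tt for tt, _ in records}):
        if t > horizon:
            break
        d_cause = sum(1 for tt, c in records if tt == t and c == cause)
        d_any = sum(1 for tt, c in records if tt == t and c != 0)
        leaving = sum(1 for tt, _ in records if tt == t)
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
        at_risk -= leaving
    return cif


records: List[Record] = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 2), (6, 1)]

print("1 - KM (competing events censored):", round(one_minus_km(records, 1, 6), 3))
print("Aalen-Johansen cumulative incidence:", round(aalen_johansen_cif(records, 1, 6), 3))
```

With three of six patients experiencing the event of interest, the Aalen-Johansen estimate at month 6 is 0.5, while the censor-the-competing-event estimate reaches 1.0. The gap is the price of estimating risk in a population in which nobody can have the competing event.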
What this means for the next confirmatory programme
The two-trial paradigm gets its own empirical reckoning in a Journal of Clinical Epidemiology systematic review quantifying how often estimands and outcomes shift between Trial 1 and Trial 2, and how regulators handle the discrepancies. If the estimand changes, the two trials are not testing the same question, and the Type I error logic of requiring two positive trials weakens in ways the framework was never asked to absorb. Combine that with the ALTA-1L HRQoL paper showing that censoring-at-discontinuation in PRO analyses leans on a noninformative-censoring assumption “often violated” when discontinuation is driven by progression or toxicity, and the pattern is consistent: the framework has matured faster than the conventions sitting underneath it.
None of this is a regulatory earthquake. It is, more accurately, the slow recognition that the estimand section and the analysis section of the SAP have been in quiet contradiction for some time, and the contradictions are now being named in print.
Protocol read: The framework’s first five years cleared the conceptual ground; the next five will be spent reconciling censoring rules, two-trial pre-specification, and oncology endpoint conventions with the estimands sponsors already claim in labels — and the regulators who attended these forums are no longer treating analytical workarounds as the default answer.
What to do now:
- Audit oncology SAP censoring rules (subsequent therapy, loss to follow-up, competing events) against the declared intercurrent-event strategy; flag any where censoring is doing hypothetical-strategy work without saying so.
- Hold RMDoR off the primary-endpoint slate outside high-response-rate cytotoxic settings until Project SignifiCanT output lands in guidance-adjacent form.
- For trials with mid-development SOC risk, pre-specify the Case A vs. Case B contingency and the trial-restart trigger before the question is forced by an external approval.
- Before locking Trial 2 of a two-trial programme, document any estimand or endpoint deltas from Trial 1 explicitly — regulators now have a systematic review quantifying how common this is.
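The Trial 1 versus Trial 2 check in the last item can be made mechanical. The sketch below records the five E9(R1) estimand attributes per trial and lists every attribute that differs. The `Estimand` class, the field names, and the example values are hypothetical; a real programme would draw them from the protocol and SAP.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Estimand:
    """The five E9(R1) attributes; values here are free-text placeholders."""
    treatment: str
    population: str
    variable: str
    intercurrent_events: Dict[str, str]  # event -> handling strategy
    summary_measure: str


def estimand_deltas(a: Estimand, b: Estimand) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {attribute: (value_in_a, value_in_b)} for every attribute that differs."""
    da, db = asdict(a), asdict(b)
    deltas: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for key in da:
        if key == "intercurrent_events":
            # Compare event by event so a missing strategy shows up as None.
            for event in sorted(set(da[key]) | set(db[key])):
                va, vb = da[key].get(event), db[key].get(event)
                if va != vb:
                    deltas[f"intercurrent_events.{event}"] = (va, vb)
        elif da[key] != db[key]:
            deltas[key] = (da[key], db[key])
    return deltas


# Invented example: the two trials handle new anti-cancer therapy differently.
trial_1 = Estimand(
    treatment="Drug X + SOC vs. SOC",
    population="Previously untreated, stage IV",
    variable="PFS by blinded central review",
    intercurrent_events={"new anti-cancer therapy": "hypothetical",
                         "treatment discontinuation": "treatment policy"},
    summary_measure="hazard ratio",
)
trial_2 = Estimand(
    treatment="Drug X + SOC vs. SOC",
    population="Previously untreated, stage IV",
    variable="PFS by blinded central review",
    intercurrent_events={"new anti-cancer therapy": "treatment policy",
                         "treatment discontinuation": "treatment policy"},
    summary_measure="hazard ratio",
)

deltas = estimand_deltas(trial_1, trial_2)
for attribute, (v1, v2) in deltas.items():
    print(f"{attribute}: Trial 1 = {v1!r}, Trial 2 = {v2!r}")
```

Keeping the intercurrent-event strategies as a per-event mapping is a deliberate choice in this sketch: it surfaces a changed censoring convention as a named delta instead of leaving it buried in SAP prose.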