Five years after ICH E9(R1) was finalised, the evidence is in: sponsors who integrated estimand thinking into protocol design and SAP development early are producing sharper, more defensible submissions. Sponsors who bolted on an “estimands section” to a trial designed by other means are producing confusion. A five-year anniversary paper in Statistics in Biopharmaceutical Research makes that split explicit, cataloguing where implementation has been substantive and where it remains performative — and mapping the research agenda for the next phase.

The gaps are neither small nor obscure. Three are worth flagging immediately.

The oncology treatment-switching problem remains methodologically contested. The hypothetical strategy — “what would OS have looked like without subsequent therapy?” — is the natural estimand for many oncology programs, but choosing the method to implement it is still largely an act of faith. A simulation study in Statistics in Biopharmaceutical Research compares methods head-to-head under controlled conditions, finding meaningful performance differences across RPSFT, IPCW, and two-stage estimation. A companion multiple imputation approach from the Journal of Biopharmaceutical Statistics and a marginal structural model application both add to the toolkit while reinforcing the same inconvenient truth: method choice is not a statistical footnote. In hematologic oncology specifically, a head-to-head comparison on real trial data makes the regulatory stakes visible — different methods yield materially different OS estimates, and EMA and FDA have not converged on a preferred approach. The implication for SAP authors is not to pick the “best” method but to pre-specify a primary and a set of sensitivity analyses with transparent assumptions, and to be ready to defend the choice.
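
To make concrete why the assumptions behind each method matter, here is a minimal sketch of the counterfactual-time transformation at the core of RPSFT. It covers only that one step: estimating psi by g-estimation and the recensoring step are both omitted. The function name and the example numbers are illustrative assumptions, not taken from any of the papers above.

```python
import math

def counterfactual_time(t_off, t_on, psi):
    """Untreated survival time implied by the RPSFT model.

    t_off: observed time spent off the experimental treatment
    t_on:  observed time spent on it (including time after switching)
    psi:   log acceleration factor; psi < 0 means treatment extends survival
    """
    return t_off + math.exp(psi) * t_on

# Hypothetical control-arm patient: switched at month 2, died at month 5.
# With psi = log(0.5), each month on treatment counts as half a month untreated.
u = counterfactual_time(t_off=2.0, t_on=3.0, psi=math.log(0.5))
print(round(u, 2))
```

The whole adjustment rests on a single psi applying to every patient whenever treatment starts. That common-treatment-effect assumption belongs in the SAP next to the method itself.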

Censoring rules in time-to-event endpoints have not caught up with estimand thinking. A focused commentary in Statistics in Biopharmaceutical Research — with a title that earns its pop-culture reference — argues that legacy censoring conventions in oncology are frequently incoherent with the declared intercurrent event strategy. Censoring at new anti-cancer therapy initiation is not the same thing as a hypothetical strategy implemented correctly; it is a convention that approximates it, sometimes badly. Teams should audit their current SAP censoring rules against their declared estimand strategies before submission, not after a query.

Competing events are treated as censoring far too casually. A Statistics in Medicine paper on controlled versus separable direct effects delivers what is essentially a methodological warning: in the presence of competing events, the common practice of censoring-by-competing-event targets a controlled direct effect — estimating what would happen in a counterfactual world where the competing event cannot occur. That is not the same question as what most trials are actually trying to answer, which involves the separable direct effect of a treatment component acting only through the pathway of interest. The two estimands can diverge substantially — including sign reversal in simulations — which is the kind of finding that should arrive in a SAP review, not a post-submission query.
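
A small numerical sketch shows how far the answer moves once competing events are censored. It compares one minus Kaplan-Meier, with competing events treated as censoring, against the Aalen-Johansen cumulative incidence on made-up data. It does not implement a separable-effects estimator. It illustrates only that the censoring convention answers a different question from the observed-world risk. The function name, event coding, and data are hypothetical.

```python
def risk_curves(times, events):
    """Return (1 - KM, Aalen-Johansen CIF) at the last observed time.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    KM treats competing events as censoring; AJ keeps them as events.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    km_surv = 1.0    # survival from the event of interest, competing censored
    all_surv = 1.0   # all-cause event-free survival just before current time
    cif = 0.0        # cumulative incidence of the event of interest
    i = 0
    while i < len(data):
        t = data[i][0]
        d1 = d2 = dc = 0
        while i < len(data) and data[i][0] == t:  # collect ties at time t
            code = data[i][1]
            d1 += code == 1
            d2 += code == 2
            dc += code == 0
            i += 1
        cif += all_surv * d1 / n_at_risk
        km_surv *= 1 - d1 / n_at_risk
        all_surv *= 1 - (d1 + d2) / n_at_risk
        n_at_risk -= d1 + d2 + dc
    return 1 - km_surv, cif

# Six hypothetical patients: two competing events, three events of interest.
km_risk, aj_risk = risk_curves([1, 2, 3, 4, 5, 6], [2, 1, 2, 1, 1, 0])
print(round(km_risk, 3), round(aj_risk, 3))
```

On these six patients the censoring-based risk is noticeably higher than the cumulative incidence, because it describes a world in which the two competing events could not have happened.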

What the surrounding methodology says

The E9(R1) ecosystem is expanding on several fronts at once. On vaccine trials, a Journal of Biopharmaceutical Statistics paper works through the hypothetical and principal stratum strategies for vaccine-specific intercurrent events (prior infection, concurrent vaccine use, non-adherence), filling a gap the original addendum left underspecified. On gene therapy long-term follow-up (LTFU), another Journal of Biopharmaceutical Statistics paper maps all five estimand attributes onto LTFU trial designs where regulatory follow-up windows run 5–15 years and the original pivotal estimand is frequently not carried forward coherently.

The covariate adjustment story is similarly active. The 17th UPenn Statistical Issues in Clinical Trials conference devoted its full day to covariate adjustment, with morning and afternoon panel proceedings both published ahead of print in Clinical Trials. A full conference day on a single topic is a clear signal of where the field thinks attention is overdue. A Clinical Trials overview and a theory-to-practice synthesis provide the conceptual scaffolding, while efficiency results for model-robust direct regression adjustment with binary outcomes and IPTW for survival endpoints provide the analytic substance. The practical message is simple and should by now be non-negotiable: pre-specified covariate adjustment is expected, not optional, and the methods to implement it rigorously across endpoint types now exist.
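
As a rough illustration of what pre-specified adjustment does, the sketch below fits a separate linear working model in each arm and standardises both fits to the pooled baseline covariate distribution. It assumes a single covariate and a linear model, so it is simpler than the binary-outcome and survival methods cited above. The function name and toy data are hypothetical.

```python
from statistics import mean

def adjusted_difference(y, a, x):
    """Covariate-adjusted difference in means (arm 1 minus arm 0).

    Fits a simple linear regression of y on x within each arm, then
    predicts each arm's mean outcome at the pooled covariate mean.
    """
    x_all = mean(x)
    arm_mean = {}
    for arm in (0, 1):
        xs = [xi for xi, ai in zip(x, a) if ai == arm]
        ys = [yi for yi, ai in zip(y, a) if ai == arm]
        xb, yb = mean(xs), mean(ys)
        sxx = sum((xi - xb) ** 2 for xi in xs)
        sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys))
        slope = sxy / sxx
        # Shift the arm mean to the covariate value shared by both arms.
        arm_mean[arm] = yb + slope * (x_all - xb)
    return arm_mean[1] - arm_mean[0]

# Toy data generated as y = 2 + 3*a + x, with a chance baseline imbalance in x.
x = [0, 1, 2, 3, 4, 5]
a = [1, 0, 1, 0, 1, 0]
y = [2 + 3 * ai + xi for ai, xi in zip(a, x)]
unadjusted = mean(y[0::2]) - mean(y[1::2])
print(unadjusted, adjusted_difference(y, a, x))
```

In this toy example the unadjusted contrast understates the true effect of 3 because the treated arm happens to have lower baseline values. The adjusted estimate recovers it.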

The Bayesian front is also producing substantive output. A narrative review of Bayesian methods in confirmatory trials in Clinical Trials maps the regulatory landscape for pivotal Bayesian submissions, while a cluster of papers addresses the practical problem that keeps such submissions tethered to the negotiating table: how to borrow from historical or external controls without inflating type I error. SPx dynamic borrowing using aggregate data, power priors with constrained borrowing, historical-bias power priors with empirical Bayes, and tuning-parameter guidance for robust mixture priors all address different facets of the same regulatory bottleneck. Borrowing is attractive, prior-data conflict is real, and regulators remain unconvinced unless the operating characteristics are demonstrated under heterogeneity scenarios they find plausible. None of these papers resolves the regulatory uncertainty, but collectively they reduce the methodological distance between a good idea and a defensible submission.
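
For readers who have not worked with borrowing directly, the simplest case is a fixed-weight power prior for a normal mean with known standard deviation. The sketch below assumes a flat initial prior and a fixed discount a0. The dynamic and constrained variants in the papers above choose the weight in more elaborate ways that are not reproduced here. All names and numbers are illustrative.

```python
def power_prior_posterior(ybar, n, ybar0, n0, sigma, a0):
    """Posterior mean and SD for a normal mean under a fixed power prior.

    ybar, n   : current-trial control mean and sample size
    ybar0, n0 : historical control mean and sample size
    sigma     : outcome standard deviation, assumed known
    a0        : discount in [0, 1]; 0 = no borrowing, 1 = full pooling
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    eff_n0 = a0 * n0  # historical data count as a0 * n0 patients
    post_mean = (eff_n0 * ybar0 + n * ybar) / (eff_n0 + n)
    post_sd = sigma / (eff_n0 + n) ** 0.5
    return post_mean, post_sd

# Hypothetical prior-data conflict: historical mean 0.0, current mean 1.0.
for a0 in (0.0, 0.5, 1.0):
    m, s = power_prior_posterior(1.0, 100, 0.0, 100, 1.0, a0)
    print(a0, round(m, 3), round(s, 3))
```

As a0 increases, the posterior mean is pulled toward the conflicting historical value while the posterior SD shrinks. More bias with more apparent precision is the combination that operating-characteristic simulations are meant to expose.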

On composite endpoints, win ratio work is now prolific enough to constitute its own sub-cluster. The non-monotonicity paradox, where treatment can improve every individual component yet yield a win ratio below 1, deserves to be on every cardiovascular trial statistician’s desk. The noncollapsibility of win statistics has direct implications for subgroup analyses and covariate-adjusted interpretations. Formal covariate adjustment for win odds is now developed, with a small-sample type I error caveat worth noting. Teams pre-specifying win ratio analyses for regulatory submissions should be running these diagnostics before locking SAPs.
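
For orientation, an unmatched win ratio on a two-level hierarchy (death, then hospitalisation) can be sketched as below. The data layout, function names, and patients are hypothetical. Confidence intervals and the covariate adjustment discussed above are not covered, and the ratio is undefined when there are no losses.

```python
from itertools import product

def compare_pair(t, c):
    """Return +1 if the treated patient wins, -1 if control wins, 0 if tied.

    Each patient is (death_time, hosp_time, follow_up); None means the
    event was not observed. Events count only within shared follow-up.
    """
    horizon = min(t[2], c[2])
    for idx in (0, 1):  # 0 = death (higher priority), 1 = hospitalisation
        te = t[idx] if t[idx] is not None and t[idx] <= horizon else None
        ce = c[idx] if c[idx] is not None and c[idx] <= horizon else None
        if te is None and ce is None:
            continue  # neither had this event: move to the next level
        if te is None:
            return 1
        if ce is None:
            return -1
        if te != ce:
            return 1 if te > ce else -1  # the later event wins
    return 0

def win_ratio(treated, control):
    """Count wins and losses over all treated-control pairs."""
    wins = losses = 0
    for t, c in product(treated, control):
        result = compare_pair(t, c)
        wins += result == 1
        losses += result == -1
    return wins, losses, wins / losses

treated = [(None, 5, 12), (10, None, 12)]
control = [(6, 2, 12), (None, None, 12)]
print(win_ratio(treated, control))
```

Because the result depends on every pairwise comparison and on the shared follow-up horizon, small changes in the hierarchy or the censoring pattern can shift it. That is part of why the diagnostics above belong in the SAP.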

What to watch

The E9(R1) five-year paper is worth reading in full — not as a celebration but as a gap analysis. The next regulatory cycle will likely involve more agency-level scrutiny of whether estimands declared in protocols are actually implemented coherently in SAPs and TLF shells, not just stated. The causal inference papers in this cluster — on competing events, separable effects, IPCW assumptions — suggest that the field is converging on a harder-edged definition of what “implementing the hypothetical strategy” actually requires. Teams that have treated the estimands section as a documentation exercise will find that position increasingly difficult to defend as reviewers develop better pattern recognition for the difference between a stated strategy and an executed one.