The Biometrics Weekly

Holm, Hochberg, and Rank-Based Co-Primary Methods Each Gain Formal Extensions in One Cycle

Multiple FWER-controlling procedures have landed simultaneously — filling genuine gaps in the co-primary toolkit while raising a shared question regulators haven't yet answered.

  • Multiplicity & multiple endpoints
  • Regulatory
  • Methodology Frontier

Four methodological papers arrived in the same publication cycle, each extending a different corner of the multiplicity toolkit for trials with multiple or co-primary endpoints. That kind of clustering is either coincidence or a sign that the field has been sitting on the same unmet needs long enough for several groups to address them at once. Either way, biometrics teams designing protocols now have more options — and more pre-specification obligations — than they did a year ago.

The most immediately useful entry is the k-of-n Holm extension (JBS, Vol. 36 Issue 3, May 2026, pp. 419–437), which formalises a “reject at least k out of n” multiple testing procedure with proven strong FWER control, an optimality property (uniformly most powerful within the valid k-of-n class), and simultaneous confidence regions. Classical Holm targets rejection of at least one hypothesis; max-p demands all of them; this sits between those poles and maps directly onto how sponsors actually define trial success when two of three co-primaries must be significant. The validity proofs and confidence regions make a submission argument tractable, though neither FDA’s 2022 Multiple Endpoints guidance nor EMA’s Points to Consider explicitly endorses the “reject at least k” formulation — meaning the first sponsor to use it will be doing some regulatory groundwork.
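For readers who want the two poles in concrete form, here is a minimal Python sketch of classical Holm step-down and the max-p rule. It does not implement the paper's k-of-n procedure, whose details are not reproduced here. The function names and p-values are hypothetical, chosen only for illustration.

```python
from typing import List


def holm_rejections(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Classical Holm step-down; returns one reject flag per hypothesis."""
    n = len(pvalues)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    reject = [False] * n
    for step, idx in enumerate(order):
        # The threshold relaxes from alpha/n towards alpha as rejections accumulate.
        if pvalues[idx] <= alpha / (n - step):
            reject[idx] = True
        else:
            break  # Stop at the first non-rejection; the rest are retained.
    return reject


def at_least_one_success(pvalues: List[float], alpha: float = 0.05) -> bool:
    """'At least one' pole: success if Holm rejects any hypothesis."""
    return any(holm_rejections(pvalues, alpha))


def max_p_success(pvalues: List[float], alpha: float = 0.05) -> bool:
    """'All of them' pole: every endpoint must be significant at full alpha."""
    return max(pvalues) <= alpha


# Hypothetical p-values for three co-primary endpoints.
pvals = [0.004, 0.020, 0.300]
print(holm_rejections(pvals))       # [True, True, False]
print(at_least_one_success(pvals))  # True
print(max_p_success(pvals))         # False
```

In this made-up example two of the three endpoints clear Holm, yet neither classical success rule expresses "two of three" directly: the at-least-one rule is satisfied by a single rejection, and max-p fails outright.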

A companion JBS paper generalises further, proposing “free” stepwise extensions that span the continuum between Holm and max-p and claiming equal or improved power in co-primary settings without an additional alpha penalty. The “for free” claim warrants a close read of the conditions under which it holds, but the framework unifying existing procedures is genuinely useful for SAP justification.

The trimmed weighted Hochberg paper (SBR, Vol. 18 Issue 1, pp. 117–128) excludes certain intersection hypotheses from closed testing to increase power without sacrificing FWER, and integrates sample size optimisation formulas directly. That combination — refined testing procedure paired with design-stage calibration — is practically useful. The catch is the trimming assumptions: excluding intersection hypotheses is only defensible if the conditions justifying the exclusion are verifiable and pre-specified. Reviewers will probe this, and the answer needs to be in the SAP before data lock, not after.
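To make "intersection hypotheses" concrete, the sketch below runs a full, untrimmed closed test in Python. It assumes unweighted Simes local tests, the construction to which Hochberg's step-up procedure is usually described as a conservative shortcut. The paper's weighting scheme and trimming rule are not reproduced, and the endpoint names and p-values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def simes_rejects(pvalues: List[float], alpha: float) -> bool:
    """Unweighted Simes test of one intersection hypothesis."""
    m = len(pvalues)
    ordered = sorted(pvalues)
    # Reject if the j-th smallest p-value is at most j * alpha / m for some j.
    return any(p <= (j + 1) * alpha / m for j, p in enumerate(ordered))


def closed_test(pvalues: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Reject an endpoint only if every intersection containing it is rejected."""
    names = list(pvalues)
    local: Dict[Tuple[str, ...], bool] = {}
    # Enumerate the full closure: every non-empty subset of endpoints.
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            local[subset] = simes_rejects([pvalues[n] for n in subset], alpha)
    return {
        name: all(rej for subset, rej in local.items() if name in subset)
        for name in names
    }


# Hypothetical p-values for three endpoints.
print(closed_test({"A": 0.010, "B": 0.030, "C": 0.040}))
# {'A': True, 'B': True, 'C': True}
print(closed_test({"A": 0.010, "B": 0.030, "C": 0.200}))
# {'A': True, 'B': False, 'C': False}
```

In the second example the intersection of B and C is not rejected, which blocks both endpoints. Removing an intersection from the family changes which endpoints can be rejected, which is why the conditions justifying any exclusion have to be stated and defended up front.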

The most structurally novel entry is the rank-based co-primary framework for mixed-scale endpoints — continuous, ordinal, binary, and time-to-event outcomes unified under a single distribution-free inferential structure. This fills a genuine gap in CNS and rare disease design, where co-primaries routinely fail to share a measurement scale. The open question regulatory reviewers will eventually surface: how do ranks interact with intercurrent event handling under ICH E9(R1)? Censored or discontinued observations receive ranks that depend on assumptions sitting squarely in estimand territory — and that question is not resolved in the paper.
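The sensitivity of ranks to intercurrent-event handling can be shown with a toy Python calculation. This is not the paper's framework: it computes a simple Mann-Whitney-type relative effect on hypothetical scores (higher is better) under two common conventions for participants who discontinue, namely dropping them or assigning them the worst rank. The data and both conventions are assumptions made for illustration.

```python
from typing import List

WORST = float("-inf")  # placeholder score that ranks below every observed value


def relative_effect(treat: List[float], control: List[float]) -> float:
    """Estimate P(T > C) + 0.5 * P(T = C) over all treatment-control pairs."""
    score = 0.0
    for t in treat:
        for c in control:
            if t > c:
                score += 1.0
            elif t == c:
                score += 0.5  # ties split evenly, as with midranks
    return score / (len(treat) * len(control))


# Hypothetical data: two treated participants discontinued before assessment.
treat_observed = [6, 8, 9]
n_treat_discontinued = 2
control_observed = [4, 5, 7, 8]

# Convention 1: analyse only participants with an observed outcome.
complete_case = relative_effect(treat_observed, control_observed)

# Convention 2: discontinuation counts as the worst possible outcome.
treat_worst_rank = treat_observed + [WORST] * n_treat_discontinued
worst_rank = relative_effect(treat_worst_rank, control_observed)

print(round(complete_case, 3))  # 0.792
print(round(worst_rank, 3))     # 0.475
```

The same trial data yield a relative effect above 0.5 under one convention and below 0.5 under the other, so the ranking rule for discontinued participants is effectively part of the estimand definition.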

The collective implication is longer-term rather than immediate: none of these procedures has visible regulatory precedent, and adoption will require sponsors to build the evidentiary argument themselves. The toolkit expanded; the acceptance gap did not close.

Protocol read: The multiplicity toolkit just got materially better — but with no agency precedent for any of these procedures, the first sponsor to use them does the regulatory groundwork on top of the statistical work.

What to do now:

  • For trials where “reject at least k of n” matches how trial success is actually defined, evaluate the k-of-n Holm extension and plan an early alignment conversation with the relevant agency (a pre-IND meeting, in FDA terms).
  • If using the trimmed weighted Hochberg procedure, pre-specify the trimming conditions in the SAP in verifiable terms; reviewers will probe the justification, and post-hoc rationalisation will not survive.
  • Use the rank-based co-primary framework only when intercurrent-event handling under ICH E9(R1) is also fully specified for rank-based estimands; the paper does not resolve that question.