What happens when clinical trials assume disease progression follows a straight line, and it doesn't?
If you have ALS, your decline isn't a straight line. Some people deteriorate slowly and steadily. Others drop fast from the start. A third group holds stable for months, then crashes. Doctors know this. Researchers know this. But nearly every clinical trial in ALS history has analyzed the data as if everyone declines at the same constant rate.
This study asks a simple question: how much does that assumption cost? We ran 500 simulated trials for every combination of treatment scenario and sample size, across four treatment scenarios, and compared three statistical methods: the two standard ones used in real trials, and an "oracle" method that knows which type of patient is which.
The answer is sobering. When a drug only works for one subgroup of patients (which is biologically plausible, perhaps even likely), standard methods need four times as many patients to detect the effect. The oracle method, which accounts for patient heterogeneity, finds it with 100 patients per arm. Standard methods need 400.
That's not a statistical curiosity. That's the difference between a trial that enrolls 200 people and one that enrolls 800. In a disease where every patient matters and recruitment is agonizingly slow, that gap could mean the difference between a drug that gets approved and one that gets abandoned.
How much statistical power do ALS clinical trials lose by assuming linear ALSFRS-R decline when the true progression is nonlinear with latent subgroups?
The ALSFRS-R (ALS Functional Rating Scale-Revised) is the primary outcome measure in virtually every ALS trial. It's a 48-point scale tracking physical function across 12 items. The standard analysis assumes each patient's score declines at a roughly constant rate over time. But a growing body of literature (Gomeni et al., 2014; van Eijk et al., 2025; Gordon et al., 2010) shows this assumption is wrong.
Data-Generating Process. We simulated patient trajectories using three latent classes derived from the literature:
- Slow progressors follow a decelerating curve (slope = 0.3 pts/month, quadratic term = 0.01).
- Fast progressors decline rapidly with slight acceleration (slope = 1.2 pts/month).
- Stable-then-crash patients hold near baseline for ~9 months, then drop precipitously (2.0 pts/month post-crash).

These parameters draw from Gomeni et al.'s two-cluster PRO-ACT analysis, van Eijk et al.'s nonlinear decline findings (N=7,030), and Gordon et al.'s demonstration that quadratic fits outperform linear ones.
Each simulated patient also has random intercept (SD = 3.0) and random slope (SD = 0.15) effects, plus residual noise (SD = 2.0). We modeled informative dropout (patients with lower scores are more likely to drop out) using a logistic function, reflecting real-world trial attrition in ALS.
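A minimal sketch of this data-generating process in the study's Python/NumPy stack follows. The class slopes, random-effect SDs, noise SD, and crash timing come from the description above; the baseline score, the fast-class quadratic term, and the dropout coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, as in the study

def simulate_patient(cls, months=np.arange(0, 13)):
    """Simulate one ALSFRS-R trajectory for a given latent class."""
    intercept = 40 + rng.normal(0, 3.0)   # random intercept, SD = 3.0 (baseline 40 assumed)
    slope_re = rng.normal(0, 0.15)        # random slope, SD = 0.15

    if cls == "slow":                     # decelerating decline
        y = intercept - (0.3 + slope_re) * months + 0.01 * months**2
    elif cls == "fast":                   # rapid, slightly accelerating (quad term assumed)
        y = intercept - (1.2 + slope_re) * months - 0.02 * months**2
    else:                                 # stable-then-crash
        post = np.clip(months - 9.0, 0, None)      # crash at ~9 months
        y = intercept - (2.0 + slope_re) * post    # 2.0 pts/month post-crash

    y = y + rng.normal(0, 2.0, size=len(months))   # residual noise, SD = 2.0

    # Informative dropout: lower scores raise dropout odds (coefficients assumed)
    p_drop = 1.0 / (1.0 + np.exp((y - 20.0) / 5.0))
    dropped = rng.random(len(months)) < p_drop
    y[np.cumsum(dropped) > 0] = np.nan             # observations missing after dropout
    return months, y
```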
Treatment Scenarios. We tested four scenarios: (1) a null scenario with no treatment effect, to check Type I error; (2) uniform slowing, where the drug slows decline equally for all patients; (3) a class-specific effect, where the drug works in only one latent class; and (4) trajectory modification, where the drug reshapes the curve of decline rather than uniformly slowing it.
Sample sizes: 100, 200, 400, and 800 patients per arm. Each configuration was simulated 500 times.
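Power for each cell is then just the rejection rate across simulated trials. A sketch, where `simulate_trial` and `analyze` are hypothetical stand-ins for the generator above and one of the three analysis methods described next:

```python
def estimate_power(n_per_arm, simulate_trial, analyze, n_sims=500, alpha=0.05):
    """Fraction of simulated trials whose treatment test rejects at alpha."""
    rejections = 0
    for _ in range(n_sims):
        trial = simulate_trial(n_per_arm)  # one simulated two-arm trial
        p_value = analyze(trial)           # p-value for the treatment effect
        rejections += p_value < alpha
    return rejections / n_sims
```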
Linear Mixed Model (LMM). The workhorse of ALS trials. Fits y ~ time × treatment with random intercepts and slopes per subject. Assumes the treatment effect manifests as a constant change in slope: the linearity assumption.
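A sketch of this model in statsmodels, using toy long-format data (the toy effect sizes and sample size are illustrative, not the study's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
months = np.arange(0, 13, 3)

# Toy long-format trial: treated patients decline 0.2 pts/month slower (illustrative)
df = pd.DataFrame([
    {"subject": i, "treatment": i % 2, "time": t,
     "y": 40 - (0.8 - 0.2 * (i % 2)) * t + rng.normal(0, 2)}
    for i in range(60) for t in months
])

# Random intercept and slope per subject; the time:treatment interaction
# is the constant change in slope that this method tests for.
model = smf.mixedlm("y ~ time * treatment", df,
                    groups=df["subject"], re_formula="~time")
result = model.fit()
print(result.pvalues["time:treatment"])
```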
ANCOVA on Change from Baseline. Another common approach. Takes the change from baseline to month 12 and adjusts for the baseline score. Simpler, but it discards intermediate timepoints and is sensitive to dropout patterns.
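The same comparison as an ANCOVA sketch, reusing the toy `df` from the LMM example:

```python
# Collapse to one row per patient: month-12 change, adjusted for baseline
base = df[df["time"] == 0].set_index("subject")["y"]
final = df[df["time"] == 12].set_index("subject")["y"]
treat = df.drop_duplicates("subject").set_index("subject")["treatment"]

ancova = pd.DataFrame({"baseline": base, "change": final - base, "treatment": treat})
fit = smf.ols("change ~ baseline + treatment", data=ancova).fit()
print(fit.pvalues["treatment"])  # every intermediate timepoint is discarded
```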
Class-Aware Oracle. A hypothetical ideal: fits separate LMMs within each known latent class, then combines evidence across classes using Fisher's method. This is the ceiling: what you'd get if you could perfectly identify each patient's trajectory class at enrollment. No real trial can do this (yet), but it tells us how much power is left on the table.
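A sketch of the oracle on the same toy data, assuming a `latent_class` column that only an oracle could supply; Fisher's method is available as `scipy.stats.combine_pvalues`:

```python
from scipy.stats import combine_pvalues

# Fit the LMM separately within each (perfectly known) latent class,
# then pool the per-class interaction p-values with Fisher's method.
pvals = []
for cls, sub in df.groupby("latent_class"):  # oracle-only labels
    m = smf.mixedlm("y ~ time * treatment", sub,
                    groups=sub["subject"], re_formula="~time").fit()
    pvals.append(m.pvalues["time:treatment"])

stat, p_combined = combine_pvalues(pvals, method="fisher")
print(p_combined)
```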
The detailed power tables for each scenario follow.
Null Scenario (No Treatment Effect). All three methods maintain Type I error near the nominal 5%, so the simulation is well-calibrated.
| N per arm | LMM Type I Error | ANCOVA Type I Error | Oracle Type I Error |
|---|---|---|---|
| 100 | 0.048 | 0.048 | 0.050 |
| 200 | 0.054 | 0.052 | 0.040 |
| 400 | 0.054 | 0.036 | 0.048 |
| 800 | 0.038 | 0.036 | 0.054 |
Uniform Slowing. When the drug works for everyone, LMM performs well. But even here, the oracle reaches near-perfect power at N=100 while LMM needs N=200.
| N per arm | LMM Power | ANCOVA Power | Oracle Power |
|---|---|---|---|
| 100 | 0.708 | 0.368 | 0.996 |
| 200 | 0.920 | 0.638 | 1.000 |
| 400 | 0.994 | 0.878 | 1.000 |
| 800 | 1.000 | 0.988 | 1.000 |
Class-Specific Effect. This is where the cost of linearity is most visible. The oracle finds the effect at N=100. Standard LMM needs N=400: four times as many patients.
| N per arm | LMM Power | ANCOVA Power | Oracle Power |
|---|---|---|---|
| 100 | 0.360 | 0.280 | 0.984 |
| 200 | 0.610 | 0.550 | 1.000 |
| 400 | 0.854 | 0.816 | 1.000 |
| 800 | 0.994 | 0.994 | 1.000 |
Trajectory Modification. A drug that reshapes the trajectory rather than uniformly slowing it. Standard methods struggle to detect this fundamentally nonlinear effect.
| N per arm | LMM Power | ANCOVA Power | Oracle Power |
|---|---|---|---|
| 100 | 0.370 | 0.306 | 0.994 |
| 200 | 0.634 | 0.554 | 1.000 |
| 400 | 0.872 | 0.806 | 0.998 |
| 800 | 0.996 | 0.982 | 1.000 |
When a drug works only for one patient subgroup, standard analysis methods need 400 patients per arm to reach 80% power. A class-aware oracle method needs just 100. That's 600 additional patients enrolled in a trial that didn't need to be that large.
Finding 1: Type I error is well-controlled. Under the null scenario, all three methods produce false positive rates near 5%, confirming the simulation is valid and the methods are calibrated.
Finding 2: The oracle dominates everywhere. Class-aware analysis reaches >98% power at N=100/arm across all treatment scenarios. It's not just better; it's in a different league.
Finding 3: ANCOVA consistently underperforms LMM. By collapsing to a single timepoint comparison, ANCOVA loses information. In the uniform scenario at N=100, LMM has 71% power while ANCOVA has just 37%.
Finding 4: The penalty is worst for realistic scenarios. The class-specific and trajectory-modification scenarios, arguably the most biologically plausible, show the largest gaps between standard and oracle methods.
ALS clinical trials are expensive, slow, and heartbreaking when they fail. Over 50 drugs have shown promise in preclinical models and failed in Phase II/III trials. The standard explanation is that the drugs didn't work. But this simulation suggests another possibility: some of those drugs might have worked, in some patients, and we couldn't see it.
If even a fraction of failed ALS trials suffered from the power loss demonstrated here, the implications are significant. Drugs that were abandoned might deserve re-examination. Future trials could be designed with stratification approaches that account for trajectory heterogeneity.
The oracle method is hypothetical โ you can't perfectly classify patients in a real trial. But you don't need perfection. Growth mixture models, machine learning classifiers, and even simple baseline slope estimates could recover a substantial portion of this lost power. The gap between "perfect classification" and "no classification" is so enormous that even imperfect classification should help.
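As a concrete example of the simplest option, the standard pre-baseline progression rate, often written ΔFS = (48 - baseline ALSFRS-R) / months since symptom onset, can be turned into a rough class label; the thresholds below are illustrative assumptions, not fitted values.

```python
def rough_trajectory_class(baseline_alsfrs_r, months_since_onset):
    """Crude latent-class proxy from the pre-baseline progression rate."""
    delta_fs = (48 - baseline_alsfrs_r) / months_since_onset
    if delta_fs < 0.5:    # illustrative cutoff for slow progressors
        return "slow"
    if delta_fs > 1.1:    # illustrative cutoff for fast progressors
        return "fast"
    return "intermediate"
```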
This isn't about one statistical method being better than another. It's about recognizing that ALS patients are not all the same, and designing trials that account for that heterogeneity rather than averaging over it.
The full simulation code is open and will be published alongside our PRO-ACT analysis. The study is fully reproducible: 500 simulations per cell, fixed random seed (42), Python with NumPy, statsmodels, and SciPy.
PRO-ACT validation. This simulation used literature-derived parameters. The next step is to fit these trajectory classes to real patient data from the PRO-ACT database (~10,000 ALS patient records) and verify whether the heterogeneity patterns hold.
Growth mixture modeling. We're developing a practical classification approach that could be deployed in real trials: not an omniscient oracle, but a feasible approximation that recovers meaningful statistical power.
Retrospective re-analysis. If the PRO-ACT validation confirms trajectory heterogeneity, the logical next question becomes: were any of the 50+ failed ALS drugs actually effective in a subgroup?