What happens when clinical trials assume disease progression follows a straight line, and it doesn't?
If you have ALS, your decline isn't a straight line. Some people deteriorate slowly and steadily. Others drop fast from the start. A third group holds stable for months, then crashes. Doctors know this. Researchers know this. But nearly every clinical trial in ALS history has analyzed the data as if everyone declines at the same constant rate.
This study asks a simple question: how much does that assumption cost? We ran 500 simulated trials for every combination of treatment scenario and sample size, across four treatment scenarios, and compared three statistical methods: the two standard ones used in real trials, and an "oracle" method that knows which type of patient is which.
The answer is sobering. When a drug only works for one subgroup of patients (which is biologically plausible, perhaps even likely), standard methods need four times as many patients to detect the effect. The oracle method, which accounts for patient heterogeneity, finds it with 100 patients per arm. Standard methods need 400.
That's not a statistical curiosity. That's the difference between a trial that enrolls 200 people and one that enrolls 800. In a disease where every patient matters and recruitment is agonizingly slow, that gap could mean the difference between a drug that gets approved and one that gets abandoned.
How much statistical power do ALS clinical trials lose by assuming linear ALSFRS-R decline when the true progression is nonlinear with latent subgroups?
The ALSFRS-R (ALS Functional Rating Scale-Revised) is the primary outcome measure in virtually every ALS trial. It's a 48-point scale tracking physical function across 12 items. The standard analysis assumes each patient's score declines at a roughly constant rate over time. But a growing body of literature (Gomeni et al., 2014; van Eijk et al., 2025; Gordon et al., 2010) shows this assumption is wrong.
Data-Generating Process. We simulated patient trajectories using three latent classes derived from the literature:
- Slow progressors follow a decelerating curve (slope = 0.3 pts/month, quadratic term = 0.01).
- Fast progressors decline rapidly with slight acceleration (slope = 1.2 pts/month).
- Stable-then-crash patients hold near baseline for ~9 months, then drop precipitously (2.0 pts/month post-crash).

These parameters draw from Gomeni et al.'s two-cluster PRO-ACT analysis, van Eijk et al.'s nonlinear decline findings (N=7,030), and Gordon et al.'s demonstration that quadratic fits outperform linear ones.
Each simulated patient also has random intercept (SD = 3.0) and random slope (SD = 0.15) effects, plus residual noise (SD = 2.0). We modeled informative dropout (patients with lower scores are more likely to drop out) using a logistic function, reflecting real-world trial attrition in ALS.
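A minimal sketch of this data-generating process in the study's Python/NumPy stack follows. The class slopes, random-effect SDs, noise SD, and crash timing come from the description above; the baseline score, the fast-class quadratic term, and the dropout coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, as in the study

def simulate_patient(cls, months=np.arange(0, 13)):
    """Simulate one ALSFRS-R trajectory for a given latent class."""
    intercept = 40 + rng.normal(0, 3.0)   # random intercept, SD = 3.0 (baseline 40 assumed)
    slope_re = rng.normal(0, 0.15)        # random slope, SD = 0.15

    if cls == "slow":                     # decelerating decline
        y = intercept - (0.3 + slope_re) * months + 0.01 * months**2
    elif cls == "fast":                   # rapid, slightly accelerating (quad term assumed)
        y = intercept - (1.2 + slope_re) * months - 0.02 * months**2
    else:                                 # stable-then-crash
        post = np.clip(months - 9.0, 0, None)      # crash at ~9 months
        y = intercept - (2.0 + slope_re) * post    # 2.0 pts/month post-crash

    y = y + rng.normal(0, 2.0, size=len(months))   # residual noise, SD = 2.0

    # Informative dropout: lower scores raise dropout odds (coefficients assumed)
    p_drop = 1.0 / (1.0 + np.exp((y - 20.0) / 5.0))
    dropped = rng.random(len(months)) < p_drop
    y[np.cumsum(dropped) > 0] = np.nan             # observations missing after dropout
    return months, y
```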
Treatment Scenarios. We tested four scenarios: (1) a null scenario with no treatment effect, to check Type I error; (2) uniform slowing, where the drug slows decline equally for all patients; (3) a class-specific effect, where the drug works in only one latent class; and (4) trajectory modification, where the drug reshapes the curve of decline rather than uniformly slowing it.
Sample sizes: 100, 200, 400, and 800 patients per arm. Each configuration was simulated 500 times.
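Power for each cell is then just the rejection rate across simulated trials. A sketch, where `simulate_trial` and `analyze` are hypothetical stand-ins for the generator above and one of the three analysis methods described next:

```python
def estimate_power(n_per_arm, simulate_trial, analyze, n_sims=500, alpha=0.05):
    """Fraction of simulated trials whose treatment test rejects at alpha."""
    rejections = 0
    for _ in range(n_sims):
        trial = simulate_trial(n_per_arm)  # one simulated two-arm trial
        p_value = analyze(trial)           # p-value for the treatment effect
        rejections += p_value < alpha
    return rejections / n_sims
```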
Linear Mixed Model (LMM). The workhorse of ALS trials. Fits y ~ time × treatment with random intercepts and slopes per subject. Assumes the treatment effect manifests as a constant change in slope: the linearity assumption.
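A sketch of this model in statsmodels, using toy long-format data (the toy effect sizes and sample size are illustrative, not the study's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
months = np.arange(0, 13, 3)

# Toy long-format trial: treated patients decline 0.2 pts/month slower (illustrative)
df = pd.DataFrame([
    {"subject": i, "treatment": i % 2, "time": t,
     "y": 40 - (0.8 - 0.2 * (i % 2)) * t + rng.normal(0, 2)}
    for i in range(60) for t in months
])

# Random intercept and slope per subject; the time:treatment interaction
# is the constant change in slope that this method tests for.
model = smf.mixedlm("y ~ time * treatment", df,
                    groups=df["subject"], re_formula="~time")
result = model.fit()
print(result.pvalues["time:treatment"])
```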
ANCOVA on Change from Baseline. Another common approach. Takes the change from baseline to month 12 and adjusts for the baseline score. Simpler, but it discards intermediate timepoints and is sensitive to dropout patterns.
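The same comparison as an ANCOVA sketch, reusing the toy `df` from the LMM example:

```python
# Collapse to one row per patient: month-12 change, adjusted for baseline
base = df[df["time"] == 0].set_index("subject")["y"]
final = df[df["time"] == 12].set_index("subject")["y"]
treat = df.drop_duplicates("subject").set_index("subject")["treatment"]

ancova = pd.DataFrame({"baseline": base, "change": final - base, "treatment": treat})
fit = smf.ols("change ~ baseline + treatment", data=ancova).fit()
print(fit.pvalues["treatment"])  # every intermediate timepoint is discarded
```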
Class-Aware Oracle. A hypothetical ideal: fits separate LMMs within each known latent class, then combines evidence across classes using Fisher's method. This is the ceiling: what you'd get if you could perfectly identify each patient's trajectory class at enrollment. No real trial can do this (yet), but it tells us how much power is left on the table.
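A sketch of the oracle on the same toy data, assuming a `latent_class` column that only an oracle could supply; Fisher's method is available as `scipy.stats.combine_pvalues`:

```python
from scipy.stats import combine_pvalues

# Fit the LMM separately within each (perfectly known) latent class,
# then pool the per-class interaction p-values with Fisher's method.
pvals = []
for cls, sub in df.groupby("latent_class"):  # oracle-only labels
    m = smf.mixedlm("y ~ time * treatment", sub,
                    groups=sub["subject"], re_formula="~time").fit()
    pvals.append(m.pvalues["time:treatment"])

stat, p_combined = combine_pvalues(pvals, method="fisher")
print(p_combined)
```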
The detailed power tables for each scenario follow.
Null Scenario (No Treatment Effect). All three methods maintain Type I error near the nominal 5%, so the simulation is well-calibrated.
| N per arm | LMM Type I Error | ANCOVA Type I Error | Oracle Type I Error |
|---|---|---|---|
| 100 | 0.048 | 0.048 | 0.050 |
| 200 | 0.054 | 0.052 | 0.040 |
| 400 | 0.054 | 0.036 | 0.048 |
| 800 | 0.038 | 0.036 | 0.054 |
Uniform Slowing. When the drug works for everyone, LMM performs well. But even here, the oracle reaches near-perfect power at N=100 while LMM needs N=200.
| N per arm | LMM Power | ANCOVA Power | Oracle Power |
|---|---|---|---|
| 100 | 0.708 | 0.368 | 0.996 |
| 200 | 0.920 | 0.638 | 1.000 |
| 400 | 0.994 | 0.878 | 1.000 |
| 800 | 1.000 | 0.988 | 1.000 |
Class-Specific Effect. This is where the cost of linearity is most visible. The oracle finds the effect at N=100. Standard LMM needs N=400: four times as many patients.
| N per arm | LMM Power | ANCOVA Power | Oracle Power |
|---|---|---|---|
| 100 | 0.360 | 0.280 | 0.984 |
| 200 | 0.610 | 0.550 | 1.000 |
| 400 | 0.854 | 0.816 | 1.000 |
| 800 | 0.994 | 0.994 | 1.000 |
Trajectory Modification. A drug that reshapes the trajectory rather than uniformly slowing it. Standard methods struggle to detect this fundamentally nonlinear effect.
| N per arm | LMM Power | ANCOVA Power | Oracle Power |
|---|---|---|---|
| 100 | 0.370 | 0.306 | 0.994 |
| 200 | 0.634 | 0.554 | 1.000 |
| 400 | 0.872 | 0.806 | 0.998 |
| 800 | 0.996 | 0.982 | 1.000 |
When a drug works only for one patient subgroup, standard analysis methods need 400 patients per arm to reach 80% power. A class-aware oracle method needs just 100. That's 600 additional patients enrolled in a trial that didn't need to be that large.
Finding 1: Type I error is well-controlled. Under the null scenario, all three methods produce false positive rates near 5%, confirming the simulation is valid and the methods are calibrated.
Finding 2: The oracle dominates everywhere. Class-aware analysis reaches >98% power at N=100/arm across all treatment scenarios. It's not just better; it's in a different league.
Finding 3: ANCOVA consistently underperforms LMM. By collapsing to a single timepoint comparison, ANCOVA loses information. In the uniform scenario at N=100, LMM has 71% power while ANCOVA has just 37%.
Finding 4: The penalty is worst for realistic scenarios. The class-specific and trajectory-modification scenarios, arguably the most biologically plausible, show the largest gaps between standard and oracle methods.
ALS clinical trials are expensive, slow, and heartbreaking when they fail. Over 50 drugs have shown promise in preclinical models and failed in Phase II/III trials. The standard explanation is that the drugs didn't work. But this simulation suggests another possibility: some of those drugs might have worked, in some patients, and we couldn't see it.
If even a fraction of failed ALS trials suffered from the power loss demonstrated here, the implications are significant. Drugs that were abandoned might deserve re-examination. Future trials could be designed with stratification approaches that account for trajectory heterogeneity.
The oracle method is hypothetical โ you can't perfectly classify patients in a real trial. But you don't need perfection. Growth mixture models, machine learning classifiers, and even simple baseline slope estimates could recover a substantial portion of this lost power. The gap between "perfect classification" and "no classification" is so enormous that even imperfect classification should help.
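As a concrete example of the simplest option, the standard pre-baseline progression rate, often written ΔFS = (48 - baseline ALSFRS-R) / months since symptom onset, can be turned into a rough class label; the thresholds below are illustrative assumptions, not fitted values.

```python
def rough_trajectory_class(baseline_alsfrs_r, months_since_onset):
    """Crude latent-class proxy from the pre-baseline progression rate."""
    delta_fs = (48 - baseline_alsfrs_r) / months_since_onset
    if delta_fs < 0.5:    # illustrative cutoff for slow progressors
        return "slow"
    if delta_fs > 1.1:    # illustrative cutoff for fast progressors
        return "fast"
    return "intermediate"
```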
This isn't about one statistical method being better than another. It's about recognizing that ALS patients are not all the same, and designing trials that account for that heterogeneity rather than averaging over it.
The full simulation code is open and will be published alongside our PRO-ACT analysis. The study is fully reproducible: 500 simulations per cell, fixed random seed (42), Python with NumPy, statsmodels, and SciPy.
PRO-ACT validation. This simulation used literature-derived parameters. The next step is to fit these trajectory classes to real patient data from the PRO-ACT database (~10,000 ALS patient records) and verify whether the heterogeneity patterns hold.
Growth mixture modeling. We're developing a practical classification approach that could be deployed in real trials: not an omniscient oracle, but a feasible approximation that recovers meaningful statistical power.
Retrospective re-analysis. If the PRO-ACT validation confirms trajectory heterogeneity, the logical next question becomes: were any of the 50+ failed ALS drugs actually effective in a subgroup?