Does ICL fix BIC's tendency to over-select K? 1,200 simulations across two scenarios and three sample sizes reveal the real culprit: treatment-induced class splitting, not information criterion choice.
EXP-002 found that BIC consistently over-selected K, picking K=4 when the true number of trajectory classes was 3. Board Room Session 005 locked ICL as the replacement. But does ICL actually fix the problem?
It doesn't. Under the null (no treatment effect), both BIC and ICL recover the true K=3 perfectly: 100% of the time, at every sample size. The model selection machinery works. But under class-specific treatment (50% slowing in slow progressors), both criteria struggle badly, with only 38–44% correct K recovery and K=4 as the most common selection.
The key insight: the treatment isn't fooling the model; it's creating a real 4th trajectory. When slow progressors receive treatment, their trajectories diverge from untreated slow progressors. The mixture model correctly detects this as a distinct class. The fix isn't a better information criterion. It's fitting LCMM on pooled data without treatment covariates, then estimating treatment effects within the discovered classes.
In EXP-002, the two-stage LCMM pipeline recovered most of the oracle's power advantage, but BIC consistently selected K=4 instead of the true K=3. Session 005 replaced BIC with ICL (BIC plus an entropy penalty that rewards well-separated classes), reasoning that the entropy correction would prevent spurious class splitting.
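For reference, one common way to write the ICL is the BIC-based approximation below. This is a sketch that assumes the pipeline uses this standard form; the write-up itself only describes ICL as BIC plus an entropy penalty. The symbols are introduced here for illustration: $\hat{L}_K$ is the maximized likelihood of the K-class model, $p_K$ its number of free parameters, $n$ the sample size, and $\hat{\tau}_{ik}$ the posterior probability that subject $i$ belongs to class $k$.

```latex
\mathrm{BIC}(K) = -2 \log \hat{L}_K + p_K \log n

\mathrm{ICL}(K) \approx \mathrm{BIC}(K) + 2\,\mathrm{EN}(K),
\qquad
\mathrm{EN}(K) = -\sum_{i=1}^{n} \sum_{k=1}^{K} \hat{\tau}_{ik} \log \hat{\tau}_{ik}
```

In this form lower is better for both criteria, and the entropy term is zero when every subject is assigned to a class with certainty, so ICL and BIC agree whenever the classes are cleanly separated.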
This experiment tests that hypothesis directly: does ICL outperform BIC for K-selection, and if not, what's actually driving the over-selection?
Data-Generating Process. Same three-class ALS trajectory model: slow, fast, and stable-then-crash progressors. N per arm of 100, 200, and 400. 200 simulations per scenario per sample size.
K-selection: Fit LCMM for K=1 through K=5. Select K by BIC (lowest) and ICL (lowest). Quality filters: minimum class size >5%, mean posterior probability >0.70. EM: 50 iterations, 3 random restarts, linear trajectories.
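The selection rule above can be sketched as follows. This is a minimal illustration, not the experiment's code. It assumes each fitted model is summarized by its BIC and a matrix of posterior class probabilities, that ICL is computed as BIC plus twice the classification entropy, and that the "mean posterior probability" filter refers to the average of each subject's highest posterior. The names `entropy`, `passes_quality_filters`, and `select_k` are hypothetical.

```python
import math


def entropy(posteriors):
    """Classification entropy: -sum over subjects and classes of tau * log(tau)."""
    return -sum(p * math.log(p) for row in posteriors for p in row if p > 0)


def passes_quality_filters(posteriors, min_class_frac=0.05, min_mean_posterior=0.70):
    """Apply both filters: smallest class > 5% and mean top posterior > 0.70."""
    n = len(posteriors)
    k = len(posteriors[0])
    # Assign each subject to the class with the highest posterior probability.
    assigned = [max(range(k), key=lambda j: row[j]) for row in posteriors]
    counts = [assigned.count(j) for j in range(k)]
    if min(counts) / n <= min_class_frac:
        return False
    # Assumption: the 0.70 threshold applies to the overall mean of top posteriors.
    mean_top = sum(max(row) for row in posteriors) / n
    return mean_top > min_mean_posterior


def select_k(fits):
    """fits maps K -> {"bic": float, "posteriors": list of rows}.

    Returns (K chosen by BIC, K chosen by ICL) among models passing the filters.
    """
    eligible = {k: f for k, f in fits.items() if passes_quality_filters(f["posteriors"])}
    if not eligible:
        raise ValueError("no candidate model passed the quality filters")
    k_bic = min(eligible, key=lambda k: eligible[k]["bic"])
    k_icl = min(
        eligible,
        key=lambda k: eligible[k]["bic"] + 2 * entropy(eligible[k]["posteriors"]),
    )
    return k_bic, k_icl
```

Applying the filters before comparing criteria means a model with a tiny or poorly separated class can never win on BIC or ICL alone, which is one plausible reading of how the filters interact with selection.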
Two scenarios: Null (no treatment effect) and class-specific treatment (50% slowing in slow progressors, the same as in EXP-001 through EXP-003).
Under null conditions, both BIC and ICL recover the true K=3 perfectly. Under class-specific treatment, both collapse to ~38–44% correct recovery. ICL provides no meaningful advantage. The treatment effect itself creates a 4th trajectory that the model correctly detects.
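To make the two scenarios concrete, here is a rough simulation sketch. Every numeric value in it (class proportions, slopes, crash month, baseline score, visit schedule, noise level) is a hypothetical placeholder, not a parameter from the experiment. Only the three class names and the 50% slowing in treated slow progressors come from the description above.

```python
import random

# Hypothetical class parameters (NOT from the experiment): mixing proportion,
# monthly slope, and, for the crash class, the month at which decline begins.
CLASSES = {
    "slow": {"prop": 0.4, "slope": -0.5},
    "fast": {"prop": 0.3, "slope": -1.5},
    "stable_then_crash": {"prop": 0.3, "slope": -2.0, "crash_month": 6},
}


def simulate_patient(rng, treated, scenario, months=(0, 3, 6, 9, 12),
                     baseline=40.0, noise_sd=1.0):
    """Draw a latent class, then generate one noisy score per visit."""
    names = list(CLASSES)
    cls = rng.choices(names, weights=[CLASSES[c]["prop"] for c in names])[0]
    slope = CLASSES[cls]["slope"]
    # Class-specific scenario: 50% slowing, only for treated slow progressors.
    if scenario == "class_specific" and treated and cls == "slow":
        slope *= 0.5
    crash = CLASSES[cls].get("crash_month", 0)
    scores = [
        baseline + slope * max(0, t - crash) + rng.gauss(0, noise_sd)
        for t in months
    ]
    return {"class": cls, "treated": treated, "scores": scores}


def simulate_trial(n_per_arm, scenario, seed=0):
    """Simulate a two-arm trial: n_per_arm controls, then n_per_arm treated."""
    rng = random.Random(seed)
    return [
        simulate_patient(rng, treated, scenario)
        for treated in (False, True)
        for _ in range(n_per_arm)
    ]
```

Under `scenario="null"` the two arms share the same three trajectory shapes. Under `scenario="class_specific"` the treated slow progressors follow a flatter line than the untreated ones, which is the extra trajectory pattern the findings describe.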
Perfect K recovery across all sample sizes. Both criteria work flawlessly when there's no treatment-induced splitting.
| N/arm | BIC selects K=3 | ICL selects K=3 | BIC modal K | ICL modal K |
|---|---|---|---|---|
| 100 | 100% | 100% | 3 | 3 |
| 200 | 100% | 100% | 3 | 3 |
| 400 | 100% | 100% | 3 | 3 |
Both BIC and ICL struggle. K=4 is the modal selection because the treatment creates a real 4th trajectory pattern. ICL offers marginal improvement at N=100 but no benefit at larger sample sizes.
| N/arm | BIC selects K=3 | ICL selects K=3 | BIC modal K | ICL modal K |
|---|---|---|---|---|
| 100 | 38% | 44% | 4 | 4 |
| 200 | 39% | 39% | 4 | 4 |
| 400 | 44% | 44% | 4 | 4 |
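As a rough gauge of how much Monte Carlo noise sits behind the N=100 row, the binomial standard error of a recovery rate estimated from 200 simulations can be computed as below. This is a back-of-the-envelope sketch that treats each simulation as an independent success or failure. Because BIC and ICL were presumably scored on the same simulated datasets, a proper comparison would be paired, so treat the numbers as indicative only. The helper name `mc_standard_error` is hypothetical.

```python
import math


def mc_standard_error(p, n_sims):
    """Binomial standard error of a proportion p estimated from n_sims runs."""
    return math.sqrt(p * (1 - p) / n_sims)


se_bic = mc_standard_error(0.38, 200)  # BIC recovery rate at N=100
se_icl = mc_standard_error(0.44, 200)  # ICL recovery rate at N=100
print(f"BIC: 38% +/- {100 * se_bic:.1f} points")
print(f"ICL: 44% +/- {100 * se_icl:.1f} points")
```

Each rate carries a standard error of roughly 3.4 to 3.5 percentage points, so the 6-point gap at N=100 is less than two standard errors of either estimate.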
Finding 1: Under null conditions, both BIC and ICL recover K=3 perfectly. 100% recovery rate at all sample sizes (N=100, 200, 400 per arm). The EM algorithm, quality filters, and model selection machinery all work correctly. The three-class structure is identifiable.
Finding 2: Under class-specific treatment, both criteria fail similarly. K=3 recovery drops to 38–44%. K=4 is the most common selection. ICL's entropy penalty provides marginal benefit at N=100 (44% vs 38%) but zero benefit at larger N. The problem is not criterion sensitivity; the treatment creates a genuine 4th trajectory pattern.
Finding 3: The K=4 selection is not an error. This is the critical insight. When slow progressors receive treatment, their trajectories diverge from untreated slow progressors. The mixture model correctly identifies this as a distinct class. It's doing exactly what it should: detecting real structure in the data. The "over-selection" in EXP-002 was the model responding faithfully to treatment-induced heterogeneity.
Finding 4: ICL is not the fix. Session 005 replaced BIC with ICL on the premise that entropy correction would prevent spurious splitting. It doesn't, because the splitting isn't spurious. A 4-class model that captures a real treatment × class interaction can have classes just as well separated, and therefore entropy just as low, as a 3-class model.
Fit LCMM on pooled data without treatment covariates (or on the placebo arm only). Discover the latent trajectory classes first, agnostic to treatment assignment. Then estimate treatment effects within each discovered class. This avoids confounding class structure with treatment response.
The two-stage LCMM pipeline from EXP-002 needs a structural correction. Class enumeration must happen on pooled data (both arms combined) without treatment as a covariate. Otherwise, the model will detect the treatment effect as additional trajectory heterogeneity and inflate K.
This is not a flaw in BIC, ICL, or the EM algorithm. It's a design principle: discover structure first, then test hypotheses within that structure. If you let the hypothesis (treatment) contaminate the discovery step (class enumeration), you get circular reasoning, where the treatment effect bootstraps itself into the class structure.
The practical implication for the ALS pipeline: Step 1 fits LCMM on all patients ignoring treatment assignment. Step 2 assigns patients to trajectory classes. Step 3 tests treatment effects within each class. This clean separation is what makes the two-stage approach valid.
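Step 3 can be sketched as follows, assuming Steps 1 and 2 have already produced a class label for each patient. This is a simplified illustration, not the pipeline's implementation: it fits a per-patient least-squares slope and compares mean slopes between arms within each class, whereas a real analysis would more likely use a mixed model and account for classification uncertainty. The field names (`assigned_class`, `treated`, `times`, `scores`) and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_slope(times, scores):
    """Ordinary least-squares slope of scores regressed on times."""
    t_bar, y_bar = mean(times), mean(scores)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def within_class_effects(patients):
    """Return {class: mean treated slope minus mean control slope}.

    Each patient is a dict with 'assigned_class' (from the treatment-agnostic
    fit), 'treated' (bool), 'times', and 'scores'. Classes missing either arm
    are skipped.
    """
    slopes = defaultdict(lambda: {True: [], False: []})
    for p in patients:
        slope = fit_slope(p["times"], p["scores"])
        slopes[p["assigned_class"]][p["treated"]].append(slope)
    return {
        cls: mean(arms[True]) - mean(arms[False])
        for cls, arms in slopes.items()
        if arms[True] and arms[False]
    }
```

Because the class labels come from a fit that never saw treatment assignment, a positive difference in a class (slower decline in treated patients) reflects the within-class treatment effect rather than a class created by the treatment itself.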
Builds on: EXP-002: The Oracle Haircut, which identified K over-selection as a problem, and Board Room Session 005, which locked ICL as the replacement for BIC.
Answers: Does ICL fix BIC's K over-selection? No. The over-selection is treatment-induced class splitting, not an information criterion flaw.
Pipeline update: LCMM class enumeration should be performed on pooled data without treatment covariates, then treatment effects estimated within discovered classes.