W13-2026 Out-of-Sample Prediction Validation

Mean Reversion Dominates W13 Out-of-Sample Test

First out-of-sample prediction evaluation. Mean reversion led at 85.7% in an upward-biased week. KG interface tuning begins next cycle.

Companies Evaluated

18/21

Went Up

85.7%

Mean Reversion Accuracy

4.8%

KG Augmented Accuracy

Weeks Cumulative Data

Total Predictions Scored

W13 predictions were locked on April 2 at 06:53 UTC. W13 actuals were scored the same day. This is the first true out-of-sample evaluation — predictions committed before actuals were known, then validated against real outcomes. This is iteration 1 of the SBPI prediction pipeline.

Mean reversion, the simplest statistical heuristic (scores below tier midpoint go up, above go down), led all methods at 85.7% directional accuracy in a week where 18 of 21 companies moved upward. The KG-augmented method ran with default (untuned) parameters and predicted "stable" across the board — its first tuning pass deploys next cycle.

The prediction pipeline improves through multiple inputs running simultaneously: better parameters, more weeks of data, new research methods, and refinement of the scoring methodology itself. The Optuna optimizer has already found a 12-parameter configuration that lifts KG accuracy from 4.8% to 69.9% on training data. That configuration gets wired in for W14.

Full Evaluation

W13 Prediction Scorecard

Every prediction from every method, scored against actuals.

Company	W12	W13	Delta	Dir	Persist	Momentum	Mean Rev	KG Aug
DramaBox	82.75	82.75	0.0	STABLE	✓ stable	✗ up	✗ up	✓ stable
ReelShort	82.0	81.2	-0.8	DOWN	✗ stable	✓ down	✗ up	✗ stable
Disney	76.55	77.1	+0.55	UP	✗ stable	✓ up	✓ up	✗ stable
iQiYi	65.7	67.3	+1.6	UP	✗ stable	✗ stable	✓ up	✗ stable
JioHotstar	62.25	65.4	+3.15	UP	✗ stable	✓ up	✓ up	✗ stable
Google/100Z	63.65	62.95	-0.7	DOWN	✗ stable	✗ stable	✗ up	✗ stable
Holywater / My Drama	61.65	61.95	+0.3	UP	✗ stable	✗ stable	✓ up	✗ stable
Netflix	60.8	60.95	+0.15	UP	✗ stable	✗ down	✓ up	✗ stable
GoodShort	58.8	60.2	+1.4	UP	✗ stable	✓ up	✓ up	✗ stable
CandyJar	58.65	59.55	+0.9	UP	✗ stable	✗ stable	✓ up	✗ stable
ShortMax	56.65	57.85	+1.2	UP	✗ stable	✗ stable	✓ up	✗ stable
Lifetime/A&E	55.45	56.9	+1.45	UP	✗ stable	✓ up	✓ up	✗ stable
Amazon	50.2	54.25	+4.05	UP	✗ stable	✗ down	✓ up	✗ stable
Viu	48.15	48.95	+0.8	UP	✗ stable	✗ stable	✓ up	✗ stable
GammaTime	46.15	48.5	+2.35	UP	✗ stable	✗ stable	✓ up	✗ stable
COL/BeLive	44.55	47.25	+2.7	UP	✗ stable	✓ up	✓ up	✗ stable
VERZA TV	32.3	32.7	+0.4	UP	✗ stable	✗ stable	✓ up	✗ stable
RTP	26.3	27.95	+1.65	UP	✗ stable	✗ stable	✓ up	✗ stable
Both Worlds	21.5	24.15	+2.65	UP	✗ stable	✗ stable	✓ up	✗ stable
KLIP	22.35	23.6	+1.25	UP	✗ stable	✗ stable	✓ up	✗ stable
Mansa	19.35	21.2	+1.85	UP	✗ stable	✗ stable	✓ up	✗ stable
TOTALS					1/21 4.8%	6/21 28.6%	18/21 85.7%	1/21 4.8%

Reading the Scorecard

Persist = last week's direction continues. Momentum = W11→W12 trend extrapolated. Mean Rev = scores below tier midpoint predicted to rise. KG Aug = knowledge graph triples weighted into prediction. Green ✓ = correct directional call. Red ✗ = wrong.

Method Comparison

Accuracy by Prediction Method

Two evaluation windows plus cumulative totals.

W12→W13 (This Evaluation — 21 Predictions)

Persistence

4.8%

Momentum

28.6%

Mean Reversion

85.7%

KG Augmented

4.8%

W11→W12 (Previous Evaluation — 17 Predictions)

Persistence

23.5%

Momentum

23.5%

Mean Reversion

47.1%

KG Augmented

23.5%

Cumulative (38 Total Predictions)

Persistence

5.3%

Momentum

26.3%

Mean Reversion

68.4%

KG Augmented

5.3%

Mean Reversion Is Inflated by Market Conditions

Mean reversion's 85.7% is inflated by market conditions. W13 had the strongest upward bias of any measured week (18/21 positive). Mean reversion predicted "up" for all 21 because every score was below the tier midpoint. This reflects market state, not model quality. Validation requires a mixed or down week.

Baseline Established

Mean reversion's 68.4% cumulative accuracy is the baseline to beat. The optimized KG configuration (69.9% on training data) deploys next cycle.

Notable Outcomes

Biggest Calls

Five predictions with the most diagnostic value for method selection.

Largest Mover — Correctly Called by Mean Reversion Only

Amazon +4.05 — The Fatafat Surprise

+4.05

Amazon launched its first dedicated micro-drama service (Fatafat via MX Player) on March 23. Free ad-supported, 150+ show slate, celebrity campaign. Went from "Platform Giant (Absent)" to "Platform Giant (Entering)." Content score +6, Narrative +7.

Only mean reversion predicted upward movement. Momentum predicted -3.2 based on the W11→W12 decline. Categorical strategy shifts (new product launch) invalidate trend-following.

Only Decline Correctly Called — Momentum

ReelShort -0.8 — The Slow Erosion Continues

-0.8

Only naive momentum correctly called ReelShort's continued decline. Head of Production still unreplaced. Absent from HRTS panel while DramaBox took the stage. COL Group parent pivoting to infrastructure. The W12 talent exodus signal persists.

Momentum outperforms when trends persist. ReelShort shows the clearest sustained-decline pattern in the dataset (3 consecutive weeks down).

Event-Driven Catalyst — Triple Announcement

iQiYi +1.6 — Triple Announcement

+1.6

HK listing, $100M buyback, Nadou Pro AI launch. Stock jumped 10%. Mean reversion called it; momentum and KG both predicted stable.

Event-driven catalysts (HK listing, buyback, AI launch) were not captured by momentum or KG methods. This signal class requires dedicated event-impact modeling.

Universally Missed — All Methods Wrong

Google/100 Zeros -0.7 — Post-Announcement Decay

-0.7

Every method got this wrong except persistence/KG (which called stable, not down). The March 12 announcement press cycle faded. No premieres, no concrete dates. Natural decay from announcement high.

Post-announcement decay is a documented pattern (press cycle fades without concrete execution milestones). Adding event-impact scoring to the pipeline would address this gap.

Strongest Consensus — Momentum + Mean Reversion

JioHotstar +3.15 — IPL Delivers

+3.15

IPL opening weekend hit 515M combined reach (+26% YoY). Tadka platform rollout with 100 microdramas in 7 languages. Momentum and mean reversion both called it correctly.

Momentum and mean reversion both predicted UP correctly. This is the only company where both methods agreed on direction in W13.

W13 Rankings

Competitive Landscape

Updated tier positions after W13 scoring. 18 of 21 companies moved up.

Tier 1 — Dominant (80+)

DramaBox 82.75(=)— held ceiling

ReelShort 81.2(-0.8)— eroding

Tier 2 — Strong (55–80)

Disney 77.1(+0.55)— Verts executing

iQiYi 67.3(+1.6)— triple announcement

JioHotstar 65.4(+3.15)— IPL delivers

Google/100Z 62.95(-0.7)— post-announcement decay

Holywater / My Drama 61.95(+0.3)— pre-launch

Netflix 60.95(+0.15)— still absent

GoodShort 60.2(+1.4)— 66M downloads, most efficient producer

CandyJar 59.55(+0.9)— rebrand complete

ShortMax 57.85(+1.2)— India expansion

Lifetime/A&E 56.9(+1.45)— strongest legacy pivot

Tier 3 — Emerging (40–55)

Amazon 54.25(+4.05)— Fatafat launch!

Viu 48.95(+0.8)— Pear Perfect viral

GammaTime 48.5(+2.35)— Forensic Files IP

COL/BeLive 47.25(+2.7)— Microdrama in a Box

Tier 4 — Vulnerable (<40)

VERZA TV 32.7(+0.4)

RTP 27.95(+1.65)— Portugal first-mover

Both Worlds 24.15(+2.65)— US-Africa co-prod

KLIP 23.6(+1.25)— India launch

Mansa 21.2(+1.85)— 7 countries

Tier Transitions

Boundary Watch

Amazon Approaching Tier 2

Amazon (54.25) moved from mid-Tier 3 toward the Tier 2 boundary at 55. One more week of Fatafat momentum could push it into the Strong tier. This would be the first platform giant to enter Tier 2 from below since tracking began.

GammaTime Climbing

GammaTime (48.5, +2.35) is approaching the Tier 3 midpoint. The Forensic Files IP deal gives it a content pipeline that most Tier 3 companies lack. If it maintains this trajectory, it reaches the Tier 2 boundary in 3–4 weeks.

Both Worlds — Largest Tier 4 Gain

Both Worlds (+2.65) had the largest gain among Tier 4 companies. The US-Africa co-production model provides geographic market access that other Tier 4 competitors lack. Still 16 points below the Tier 3 boundary.

Iteration 1

Prediction Pipeline — Iteration 1 Results

Two evaluation cycles complete. Baseline performance established for all four methods.

KG-Augmented: Default Config Context

The KG-augmented method ran with default (untuned) parameters for its first two evaluation cycles. It predicted "stable" for all 21 companies because the direction_threshold was set conservatively at 0.5. This is the starting configuration — the first tuning pass deploys next cycle. The knowledge graph features (momentum, anomalies, tier proximity, divergence) all exist in the graph but the weights connecting them to the decision layer had not been optimized.

The Optuna TPE optimizer found a 12-parameter configuration that lifts directional accuracy from 4.8% to 69.9% on training data. This configuration enables anomaly detection, raises the direction threshold to 1.295, and weights divergence and tier proximity signals into the prediction.

Configuration

Default vs. Optimized Parameters

The optimized config gets deployed for W14. Key parameter changes below.

Default Config

Optimized Config

direction_threshold

0.500

1.295 (+159%)

mean_reversion_rate

0.100

0.257 (+157%)

anomaly_contributes

false

true

divergence_weight

0.000

0.180

tier_proximity_weight

0.000

0.096

confidence_base

0.600

0.443

KG Accuracy: Default vs Optimized vs Benchmark

Default KG

4.8%

Optimized KG (train)

69.9%

Mean Rev (OOS)

85.7%

How the Pipeline Improves

Multiple Inputs, Simultaneous Iteration

The prediction pipeline is not a single-method experiment. It improves through multiple inputs running in parallel:

Better parameters. The optimized configuration deploys next cycle. As more weeks of evaluation data accumulate, the optimizer retrains on a larger dataset, producing progressively better parameter fits.

More data. Each week adds ~500 triples to the knowledge graph and one more evaluation point for every method. SPARQL queries get richer. Patterns that require 6–8 weeks of longitudinal data become detectable.

New research methods. Event impact analysis (event_impact_analyzer.py) and news signal processing are not yet in the nightly pipeline. Events like Amazon Fatafat and iQiYi's triple announcement are the signals that simple statistical methods miss. Integrating event detection adds a method class that no current approach covers.

Scoring methodology refinement. The SBPI dimension weights (DP 25%, CS 20%, NO 20%, CoS 20%, MI 15%) are fixed estimates. With enough weeks of data, TPE optimization of these weights could improve the underlying scoring itself, which improves everything downstream.

Directional predictions are one input into per-brand intelligence products. Additional weeks of data, event signals, and optimized parameters each contribute independently to accuracy gains.

Action Items

Next Steps

Improving predictions and brand intelligence, week over week.

Critical

Deploy Optimized Config

Wire the Optuna TPE-tuned 12-parameter configuration from best-config.json into the live prediction engine for W14. This activates anomaly detection, divergence weighting, and tier proximity signals.

High

Load W13 Data into Oxigraph

Expand the triple store from 1,672 to ~2,200+ triples. More longitudinal data improves every method — SPARQL queries get richer, patterns become more detectable. Run sbpi_to_rdf.py with W13 state data.

High

Generate W14 Predictions

All methods produce predictions for next week. The optimized KG config runs alongside existing methods. Each week builds the evaluation dataset. Lock predictions before W14 actuals drop.

Medium

Add Event Impact Signals

The event_impact_analyzer.py script exists but is not in the nightly pipeline yet. News events (Amazon Fatafat launch, iQiYi triple announcement) are the signals that simple statistical methods miss. Integrating event detection improves the intelligence product across all methods.

Medium

Experiment 3: Dimension Weight Learning

Now have 4 weeks of data. The current fixed dimension weights (DP 25%, CS 20%, NO 20%, CoS 20%, MI 15%) may not be optimal. TPE optimization of these weights could improve the underlying scoring, which improves everything downstream.

Medium

Brand Intelligence Cards

Generate per-company intelligence briefs from the SPARQL insight digests. These briefs combine directional predictions, dimension anomalies, and event signals into per-brand competitive position summaries.