CERES

Accuracy Metrics

0 predictions · grading from Jun 2026

Brier Score

Pending

Target < 0.10 ⏳ Grading from Jun 2026

SI Coverage (90%)

Pending

Target > 88% ⏳ Grading from Jun 2026

Brier Skill Score

Pending

Target > 0 ⏳ Grading from Jun 2026

Total Predictions

Target ≥ 43/week ✗ Missed

Note: 0 predictions issued and awaiting grading. First grading window opens June 7, 2026 (March 9 run + 90 days). Brier decomposition computed automatically when ≥10 predictions are graded.
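As an illustration of the Brier decomposition mentioned in the note, here is a minimal sketch using the standard Murphy decomposition (Brier score = reliability − resolution + uncertainty). The function name, the grouping of forecasts by exact value, and the `min_graded` parameter are assumptions made for this example; only the ≥10 threshold comes from the note, and this is not necessarily how CERES computes it.

```python
from collections import defaultdict

MIN_GRADED = 10  # threshold taken from the note; everything else is illustrative


def brier_decomposition(forecasts, outcomes, min_graded=MIN_GRADED):
    """Return (reliability, resolution, uncertainty), or None if too few are graded.

    Forecasts are grouped by their exact value, so the identity
    Brier score = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    if n != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if n < min_graded:
        return None

    base_rate = sum(outcomes) / n

    # Collect the binary outcomes observed for each distinct forecast value.
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)

    reliability = 0.0
    resolution = 0.0
    for p, obs in groups.items():
        observed_rate = sum(obs) / len(obs)
        reliability += len(obs) * (p - observed_rate) ** 2
        resolution += len(obs) * (observed_rate - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty
```

Lower reliability and higher resolution are both better; the uncertainty term depends only on the observed base rate, not on the forecasts.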

Pending Verification

Predictions Awaiting Grading Window

Public Prediction Ledger

Graded Predictions: Forward Validation

NO GRADED PREDICTIONS YET

First grading window opens June 7, 2026 (March 9 run + 90 days). IPC outcome grading will occur automatically when OCHA/IPC publish the classification for each monitored region. This ledger updates every Monday.
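The grading-window arithmetic above (run date + 90 days) can be checked with a short sketch. The helper names are hypothetical, and the run year is assumed to be 2026, consistent with the stated June 7, 2026 window.

```python
from datetime import date, timedelta

GRADING_LAG_DAYS = 90  # "run + 90 days", as stated above


def grading_date(run_date: date, lag_days: int = GRADING_LAG_DAYS) -> date:
    """Date on which a prediction issued on run_date becomes gradable."""
    return run_date + timedelta(days=lag_days)


def is_gradable(run_date: date, today: date, lag_days: int = GRADING_LAG_DAYS) -> bool:
    """True once the grading window for this run has opened."""
    return today >= grading_date(run_date, lag_days)


# Assumed run date: March 9, 2026.
first_window = grading_date(date(2026, 3, 9))
print(first_window.isoformat())
```

Under that assumption, `first_window` is June 7, 2026, which matches the date given in the text.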

Calibration — Awaiting Prospective Data

87 IPC Records · 31 Countries · 4 Back-validation Cases

Model initialised against 87 IPC transition records (2011–2023, 31 countries). 4 data-complete back-validation cases.

Reliability Diagram: Predicted vs. Observed Probability

Perfect calibration lies on the diagonal. Points above = underconfident; below = overconfident.

Legend: Well-calibrated (±10%) · Outside tolerance · Perfect calibration

Calibration by Predicted Probability Bin

Grey = ideal calibration · Amber = CERES observed rate · (n) = predictions in bin
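One way such a per-bin calibration table could be produced is sketched below. The ten equal-width bins, the function name, and the returned field names are assumptions for illustration; the ±10% tolerance is the one shown in the chart legend.

```python
def calibration_table(forecasts, outcomes, n_bins=10, tolerance=0.10):
    """Group forecasts into equal-width probability bins and compare rates.

    Returns one row per non-empty bin with the bin count (n), the mean
    forecast probability, the observed event rate, and a flag for whether
    the two agree to within the tolerance.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, o))

    rows = []
    for members in bins:
        if not members:
            continue  # empty bins are omitted rather than reported as zero
        n = len(members)
        mean_forecast = sum(p for p, _ in members) / n
        observed_rate = sum(o for _, o in members) / n
        rows.append({
            "n": n,
            "mean_forecast": mean_forecast,
            "observed_rate": observed_rate,
            "within_tolerance": abs(observed_rate - mean_forecast) <= tolerance,
        })
    return rows
```

With few graded predictions per bin the observed rate is very noisy, which is one reason to report (n) alongside each bin.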

Validation Dataset Breakdown

IPC transition records	87 country-seasons
Countries represented	31
Time period	2011–2023
Phase 4–5 events	18
Back-validation cases	4 (data-complete only)
Perturbation draws	n=2,000 per prediction
Interval type	Input-perturbation 90%
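To illustrate what an input-perturbation 90% interval with n=2,000 draws might look like, here is a minimal sketch. The toy logistic model, the multiplicative Gaussian noise, and the 10% noise scale are all assumptions for this example; the page does not specify how CERES perturbs its inputs.

```python
import math
import random
import statistics

N_DRAWS = 2000  # draws per prediction, as listed in the table above


def toy_model(inputs):
    """Hypothetical stand-in for the forecast model: logistic of the input sum."""
    z = sum(inputs.values())
    return 1.0 / (1.0 + math.exp(-z))


def perturbation_interval(model, inputs, rel_noise=0.10, n_draws=N_DRAWS, seed=0):
    """Return the (5th, 95th) percentiles of model output under perturbed inputs.

    Each input is multiplied by (1 + Gaussian noise). This noise model is an
    assumption of the sketch, not a documented CERES choice.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        perturbed = {k: v * (1.0 + rng.gauss(0.0, rel_noise)) for k, v in inputs.items()}
        draws.append(model(perturbed))
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(draws, n=20)
    return cuts[0], cuts[-1]


low, high = perturbation_interval(toy_model, {"a": 0.5, "b": -1.0})
```

An interval built this way reflects sensitivity to input noise only; it is not a statistical confidence interval, which is consistent with the "sensitivity interval" label used on this page.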

Pre-Registered Calibration Protocol

What We Commit to Measuring

Table 1 from the CERES preprint. These metrics were pre-registered before any prospective outcome data was collected. No metrics will be selectively reported — all graded predictions remain permanently visible. Minimum sample sizes are fixed; targets cannot be revised retroactively.

Metric	Definition	Min. N	Target date	Status
Brier Score	Mean (P̂₃ − O₃)²	100 predictions	Jun 2026	⏳ Pending
Brier Skill Score	1 − BS / BS_climatology	100 predictions	Jun 2026	⏳ Pending
TIER-1 Precision	True TIER-1 / all TIER-1 issued	30 TIER-1 alerts	Sep 2026	⏳ Pending
TIER-1 Recall	True TIER-1 / all Phase 4+ events	10 Phase 4+ events	Sep 2026	⏳ Pending
Sensitivity interval coverage	Fraction outcomes in 90% interval	200 predictions	Sep 2026	⏳ Pending
CRPS (ordered categorical)	Full distribution vs IPC phase	500 predictions	Mar 2027	⏳ Pending
Reliability diagram	Forecast prob. vs empirical frequency	500 predictions	Mar 2027	⏳ Pending

Pre-registered in Pedersen (2026), Table 1. Protocol locked prior to accumulation of prospective outcome data.
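The definitions in the table can be written out as a short sketch. It assumes binary outcome indicators for the Brier metrics, uses the sample base rate as the climatology reference, and uses the ranked probability score as the discrete form of CRPS over IPC phases 1 to 5. These choices and all function names are assumptions for illustration, not the preprint's implementation.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, base_rate=None):
    """1 - BS / BS_climatology; climatology is a constant base-rate forecast.

    If base_rate is not supplied, the sample base rate is used (an assumption).
    Undefined when all outcomes are identical, since BS_climatology is then 0.
    """
    if base_rate is None:
        base_rate = sum(outcomes) / len(outcomes)
    bs_clim = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / bs_clim


def precision_recall(alerts, events):
    """Precision = true alerts / alerts issued; recall = true alerts / events."""
    true_alerts = sum(1 for a, e in zip(alerts, events) if a and e)
    n_alerts = sum(1 for a in alerts if a)
    n_events = sum(1 for e in events if e)
    precision = true_alerts / n_alerts if n_alerts else None
    recall = true_alerts / n_events if n_events else None
    return precision, recall


def ranked_probability_score(phase_probs, observed_phase):
    """Discrete CRPS: squared gaps between cumulative forecast and outcome.

    phase_probs lists probabilities for phases 1..K in order;
    observed_phase is the realised phase (1-based).
    """
    cumulative = 0.0
    score = 0.0
    for phase, p in enumerate(phase_probs, start=1):
        cumulative += p
        observed_cumulative = 1.0 if observed_phase <= phase else 0.0
        score += (cumulative - observed_cumulative) ** 2
    return score
```

A Brier Skill Score above 0 means the forecasts beat the constant base-rate reference, which is what the "Target > 0" tile expresses.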

The CERES Transparency Commitment

Every prediction CERES issues is permanently recorded in this ledger with a timestamp, probability estimate, confidence interval, and T+90-day grading date. We do not remove predictions that prove incorrect. We analyse and publish the reasons for forecast errors. The accuracy record here is the complete record; there is no curated subset. This is the foundation of institutional trust.