Global Situational Awareness Dashboard

AI Capabilities (ECI)

Epoch Capabilities Index — composite score across 40+ benchmarks

METR Task Horizon

How long can AI agents reliably work on a task? (hours)

Hardware Cost Trend

TFLOPS × Memory GB per $1K at launch. Bubble size = VRAM.

Model Release Timeline

Frontier and open-weights releases

Recent Agent Releases

Ecosystem Events

Recent Geopolitical Events

Recent Regulatory Actions

All Model Releases

Epoch Capabilities Index (ECI)

Composite benchmark score across 40+ evaluations, by model accessibility. Source: Epoch AI

METR Task Horizon

50th-percentile time horizon for autonomous task completion. Source: METR

AA Intelligence Index — Cost-Adjusted

Composite AAII index − α·log₁₀(cost to run GPQA in USD), with α ≈ 14.5 refit on the Pareto frontier. The cost axis uses each model's near-launch input/output prices from Gundlach et al., so release-date ordering carries real temporal signal (AAII's own XLSX cost column is re-baked at current prices and does not). The capability axis is AAII's current-suite index, so read the chart as: "for this level of capability as measured today, what did access cost at release?"
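
The adjustment above is a one-line formula, sketched here in Python. The function name and the example inputs are hypothetical; only the formula and the approximate value of α come from the description, and the refit of α on the Pareto frontier is not reproduced.

```python
import math

# Slope quoted in the chart description (alpha ≈ 14.5); treat it as approximate.
ALPHA = 14.5


def cost_adjusted_index(aaii_score: float, gpqa_cost_usd: float, alpha: float = ALPHA) -> float:
    """Return aaii_score - alpha * log10(gpqa_cost_usd)."""
    if gpqa_cost_usd <= 0:
        raise ValueError("gpqa_cost_usd must be positive")
    return aaii_score - alpha * math.log10(gpqa_cost_usd)


# Hypothetical inputs: an index of 50 and a GPQA run that cost $10.
print(cost_adjusted_index(50.0, 10.0))  # 35.5
```

With α near 14.5, a tenfold increase in the cost of the GPQA run lowers the adjusted score by about 14.5 points.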

Capability per dollar over time (GPQA × near-launch price)

GPQA Diamond − 15·log₁₀(cost to run GPQA). Cost uses input/output prices observed at or near each model's first sample on AAII, not current prices, so release-date ordering reflects real capability-per-dollar progress. Data ends Oct 2025: extending past the Gundlach snapshot requires per-model GPQA token counts that are not in AAII's free API; running our own GPQA evals would close the gap. Source: Gundlach et al., "The Price of Progress".

AA Intelligence Index — Compute-Adjusted

AAII composite index − α·log₁₀(GPQA compute FLOPs), where FLOPs ≈ 2·active_params·tokens. Unlike the cost-adjusted chart, this axis is provider-independent: it measures capability against the actual compute cost of running the model at the edge, cutting through subsidised API pricing. Restricted to open-weights models, where active-parameter counts are reliable; most closed MoE models are excluded because their sizes are only rumored. Source: Gundlach et al.
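
A minimal Python sketch of the FLOPs estimate and the resulting adjustment. The function names and example numbers are hypothetical, and α is left as a required argument because the description does not give its value for this chart.

```python
import math


def inference_flops(active_params: float, tokens: float) -> float:
    """Approximate inference compute: 2 * active parameters * tokens processed."""
    return 2.0 * active_params * tokens


def compute_adjusted_index(aaii_score: float, active_params: float,
                           tokens: float, alpha: float) -> float:
    """Return aaii_score - alpha * log10(estimated GPQA compute in FLOPs)."""
    flops = inference_flops(active_params, tokens)
    if flops <= 0:
        raise ValueError("active_params and tokens must be positive")
    return aaii_score - alpha * math.log10(flops)


# Hypothetical model: 5e9 active parameters, 1e6 tokens over the GPQA run.
print(inference_flops(5e9, 1e6))
```

For a mixture-of-experts model, `active_params` is the number of parameters used per token, not the total parameter count, which is why the chart needs reliable active-parameter data.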

Compute per Dollar (TFLOPS/$1K)

Inference compute per $1K at launch. Bubble size = VRAM.

Bandwidth per Dollar (GB/s/$1K)

Memory bandwidth per $1K at launch. Bubble size = VRAM.

Composite: TFLOPS × Memory / $1K

Combined compute × memory capacity per $1K. Captures both throughput and working set size.
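
The three hardware metrics (TFLOPS/$1K, GB/s/$1K, and the composite) can be sketched as follows. The `Accelerator` record, its field names, and the example figures are hypothetical; only the three ratios come from the chart descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    tflops: float            # inference compute, TFLOPS
    memory_gb: float         # VRAM, GB
    bandwidth_gbs: float     # memory bandwidth, GB/s
    launch_price_usd: float  # price at launch, USD


def per_1k(value: float, launch_price_usd: float) -> float:
    """Normalise a quantity by launch price expressed in thousands of USD."""
    if launch_price_usd <= 0:
        raise ValueError("launch_price_usd must be positive")
    return value / (launch_price_usd / 1000.0)


def hardware_metrics(hw: Accelerator) -> dict[str, float]:
    """Compute the three per-$1K metrics for one accelerator."""
    return {
        "tflops_per_1k": per_1k(hw.tflops, hw.launch_price_usd),
        "bandwidth_per_1k": per_1k(hw.bandwidth_gbs, hw.launch_price_usd),
        "composite_per_1k": per_1k(hw.tflops * hw.memory_gb, hw.launch_price_usd),
    }


# Made-up card: 100 TFLOPS, 24 GB, 1000 GB/s, $2,000 at launch.
print(hardware_metrics(Accelerator("demo", 100.0, 24.0, 1000.0, 2000.0)))
```

Because the composite multiplies TFLOPS by memory capacity, doubling either one at the same price doubles the score.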

Hardware Releases

Agent Frameworks & Systems

Category emerged ~2025. Tracking autonomous AI system capabilities.

Geopolitical Events

Regulatory Actions

Ecosystem Events

Tuned Pre-tuning

ECI Capability Frontier

Best model score at each date (any accessibility) with 5-year projection. Open-weights shown as green overlay. Bands: 50/80/95% prediction intervals.
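
"Best model score at each date" amounts to a running maximum over releases ordered by date. A minimal sketch, assuming releases arrive as (date, score) pairs; the function name and sample data are hypothetical, and the projection and prediction intervals are not covered.

```python
from datetime import date
from typing import Iterable


def capability_frontier(releases: Iterable[tuple[date, float]]) -> list[tuple[date, float]]:
    """Return the (date, best score so far) points where the frontier improves."""
    frontier: list[tuple[date, float]] = []
    best = float("-inf")
    for release_date, score in sorted(releases):
        if score > best:
            best = score
            if frontier and frontier[-1][0] == release_date:
                # Two record-setting releases on the same day: keep the higher one.
                frontier[-1] = (release_date, best)
            else:
                frontier.append((release_date, best))
    return frontier


# Made-up scores; the 2024-06-01 release does not beat the earlier record.
sample = [
    (date(2024, 3, 1), 120.0),
    (date(2023, 1, 1), 100.0),
    (date(2024, 6, 1), 115.0),
    (date(2025, 1, 1), 140.0),
]
for point in capability_frontier(sample):
    print(point)
```

Running the same function on open-weights releases alone would give the series for the green overlay.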

METR Task Horizon

Frontier autonomous task completion horizon with 5-year projection (log scale). Open-weights overlay in green.

TFLOPS × Memory / $1K

Composite compute × memory per $1K trend with 5-year projection (log scale).

Trend Summary

Indicators

Structured forecasting questions derived from trend projections. Updated daily.

Calibration

Backtest results: how well do prediction intervals match actual outcomes when tested on held-out data?
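
The question above can be checked by counting how often held-out actuals fall inside their prediction intervals and comparing that rate with the nominal level. A minimal sketch, with a hypothetical function name and made-up data:

```python
def interval_coverage(actuals: list[float], lowers: list[float], uppers: list[float]) -> float:
    """Fraction of actual values that fall inside their [lower, upper] interval."""
    if not (len(actuals) == len(lowers) == len(uppers)):
        raise ValueError("inputs must have the same length")
    if not actuals:
        raise ValueError("need at least one observation")
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)


# Made-up backtest: the second actual falls outside its interval.
actuals = [1.0, 5.0, 10.0, 20.0]
lowers = [0.0, 6.0, 8.0, 15.0]
uppers = [2.0, 7.0, 12.0, 25.0]
print(interval_coverage(actuals, lowers, uppers))  # 0.75
```

For well-calibrated 80% intervals the coverage should land close to 0.80 over a large enough backtest; values well below that suggest the intervals are too narrow.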

Market Context

Prediction market signals relevant to AI capability trends. Loaded from the Markets tab data.

Track Record

Scored predictions from trend-based forecasting. Includes retrodicted predictions (fit on past data, scored against actuals) and live predictions. Brier score: lower is better (0 = perfect, 0.25 = uninformed, i.e. always predicting 50%).
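
For binary outcomes, the Brier score is the mean squared difference between the forecast probability and the 0/1 outcome. A minimal sketch with a hypothetical function name:

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and binary outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one prediction")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Always answering 50% scores 0.25 regardless of what happens.
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
# Confident and correct forecasts score 0.
print(brier_score([1.0, 0.0], [1, 0]))  # 0.0
```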

Market Forecasting

LLM price forecasts with trading simulation. Amber dots show implied E[price] at each forecast cutoff. Green/red shading shows simulated positions.

Prediction Market Signals

Biggest probability movers from Polymarket and Manifold. Sparklines show the last 90 calendar days.
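
One plausible way to rank movers is by the absolute change in probability between the start and end of the window. How the dashboard actually ranks them is not stated, so this is an assumption; the function name, input shape, and market names are hypothetical.

```python
def biggest_movers(history: dict[str, list[float]], top_n: int = 5) -> list[tuple[str, float]]:
    """Rank markets by absolute probability change over the window.

    `history` maps a market name to its probabilities, oldest first.
    Markets with fewer than two data points are skipped.
    """
    changes = []
    for name, probs in history.items():
        if len(probs) < 2:
            continue
        changes.append((name, probs[-1] - probs[0]))
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:top_n]


# Made-up markets and probabilities.
history = {
    "market-a": [0.20, 0.35, 0.50],
    "market-b": [0.90, 0.60, 0.40],
    "market-c": [0.50, 0.52],
    "market-d": [0.30],
}
for name, change in biggest_movers(history, top_n=2):
    print(f"{name}: {change:+.2f}")
```

The signed change is kept in the output so a sharp fall ranks alongside a sharp rise while the direction stays visible.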

All Tracked Markets

AI-related prediction markets with historical price data. Click column headers to sort.

Activity Log

Collector runs, curator updates, and data changes

System Status

Infrastructure health, external API state, last collector runs.