Deepbox AI Benchmark v1.0.0

Leaderboard

DeepboxBench now ships 452 executable tasks across all 13 Deepbox scopes and 4 difficulty tiers. Once the full bank is rerun, frontier models are expected to score 85% or higher, solid models between 40% and 85%, and weak models below 40%.
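The expected score bands can be sketched as a small lookup. This is a minimal illustration, not part of the benchmark tooling: the function name `score_band` and the band labels are hypothetical. It also assumes that a score of exactly 85% counts as frontier and exactly 40% counts as solid, following the "85%+" and "below 40%" wording.

```python
def score_band(score_percent: float) -> str:
    """Map a percentage score (0-100) to one of the three expected bands.

    Hypothetical helper: the band labels and boundary handling are
    assumptions drawn from the thresholds quoted in the text.
    """
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score_percent must be between 0 and 100")
    if score_percent >= 85.0:   # "85%+" is read as inclusive of 85
        return "frontier"
    if score_percent >= 40.0:   # "below 40%" is weak, so 40 itself is solid
        return "solid"
    return "weak"


if __name__ == "__main__":
    for score in (92.5, 85.0, 61.0, 40.0, 12.3):
        print(f"{score:5.1f}% -> {score_band(score)}")
```

The bands are rough expectations rather than measured results, so the cutoffs here may shift once published runs exist.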

Model (0)	DeepScore	Core	Tensor	Random	Tables	Data	Prep	Stats	Linalg	Eval	ML	Neural	Optim	Viz
No benchmark runs are published for the current task catalog yet. Re-run models against the v1.0.0 bank to populate the leaderboard.