Deepbox AI Benchmark v1.0.0
Leaderboard
DeepboxBench now ships 452 executable tasks across all 13 Deepbox scopes and 4 difficulty tiers. Once the full bank is rerun, frontier models are expected to score 85% or higher, solid models between 40% and 85%, and weak models below 40%.
| Model (0) | DeepScore | Core | Tensor | Random | Tables | Data | Prep | Stats | Linalg | Eval | ML | Neural | Optim | Viz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No benchmark runs are published for the current task catalog yet. Re-run models against the v1.0.0 bank to populate the leaderboard. | | | | | | | | | | | | | | |
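
The expected score bands quoted above can be written as a small lookup. This is only an illustrative sketch, not part of DeepboxBench: the function name `score_band` and the band labels are hypothetical, DeepScore is assumed to be a percentage from 0 to 100, and a score of exactly 85 is assumed to count as frontier, since the text does not say which side of that boundary it falls on.

```python
def score_band(deep_score: float) -> str:
    """Map a DeepScore percentage (assumed 0-100) to its expected band.

    Hypothetical helper: thresholds follow the leaderboard text
    (85%+, 40-85%, below 40%); boundary handling is an assumption.
    """
    if not 0.0 <= deep_score <= 100.0:
        raise ValueError(f"DeepScore must be in [0, 100], got {deep_score}")
    if deep_score >= 85.0:  # assumption: exactly 85 counts as frontier
        return "frontier"
    if deep_score >= 40.0:  # "below 40%" is weak, so 40 itself is solid
        return "solid"
    return "weak"


if __name__ == "__main__":
    for score in (92.5, 85.0, 61.0, 40.0, 12.3):
        print(f"{score:5.1f} -> {score_band(score)}")
```

Once runs are published, a helper like this could be used to label each leaderboard row by its DeepScore.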