Deepbox AI Benchmark v1.0.0

Leaderboard

DeepboxBench now ships 452 executable tasks across all 13 Deepbox scopes and 4 difficulty tiers. Once the full bank is rerun, frontier models are expected to score 85% or higher, solid models between 40% and 85%, and weak models below 40%.
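The expected score bands above can be read as a simple threshold rule. The sketch below is illustrative only: the function name `expected_band` and the band labels are hypothetical, and the handling of the exact boundaries (85% counted as frontier, 40% counted as solid) is an assumption, since the description does not say which side the cutoffs fall on.

```python
def expected_band(score_pct: float) -> str:
    """Map an overall percentage score to one of the three expected bands.

    Assumption: scores of exactly 85 count as "frontier" and scores of
    exactly 40 count as "solid"; the source does not specify the boundaries.
    """
    if not 0.0 <= score_pct <= 100.0:
        raise ValueError("score_pct must be between 0 and 100")
    if score_pct >= 85.0:
        return "frontier"
    if score_pct >= 40.0:
        return "solid"
    return "weak"


if __name__ == "__main__":
    # Hypothetical scores, used only to exercise each band.
    for score in (92.5, 85.0, 61.0, 40.0, 12.0):
        print(f"{score:5.1f}% -> {expected_band(score)}")
```

Keeping the cutoffs in one function means that a later change to the band definitions only touches a single place.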

Model (0) | DeepScore | Core | Tensor | Random | Tables | Data | Prep | Stats | Linalg | Eval | ML | Neural | Optim | Viz
No benchmark runs are published for the current task catalog yet. Re-run models against the v1.0.0 bank to populate the leaderboard.