AI Benchmarks

The AI industry spent last week congratulating itself. Jensen Huang declared on Lex Fridman’s podcast that “we’ve achieved AGI.” Arm named its new data center chip the “AGI CPU.” Sam Altman said they’ve “basically built AGI.” Then François Chollet’s ARC Prize Foundation dropped ARC-AGI-3, and the results were humiliating. Google’s Gemini 3.1 Pro — the best performer — scored 0.37%. GPT-5.4 managed 0.26%. Claude Opus 4.6 hit 0.25%. Grok-4.20 scored a clean, round zero. ...