Berkeley Researchers Gamed Top AI Agent Benchmarks in Days