Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster

What it is
Picture Karpathy's Autoresearch—the LLM agent that designs and runs ML experiments—but instead of a laptop, it controls a cloud GPU cluster. SkyPilot gave the agent infrastructure APIs: spawn VMs, launch distributed training jobs, collect logs. The agent treats compute like a function call, not a resource you manually provision.
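To make "compute as a function call" concrete, here is a minimal sketch of the kind of tool wrapper an agent might call. The names (`train_remote`, `JobResult`) are hypothetical, and the body is a stub standing in for a real infrastructure API such as SkyPilot's Python API, not a reproduction of it.

```python
from dataclasses import dataclass

@dataclass
class JobResult:
    """What the agent gets back from one remote experiment."""
    exit_code: int
    logs: str

def train_remote(command: str, gpus: int = 1) -> JobResult:
    """Hypothetical tool exposed to the agent: provision GPUs, run
    `command`, stream logs back, tear everything down.

    A real implementation would delegate to an infrastructure API
    (e.g. SkyPilot's Python API) instead of returning this stub.
    """
    return JobResult(exit_code=0, logs=f"ran {command!r} on {gpus} GPU(s)")

# From the agent's point of view, a cluster experiment is just a call:
result = train_remote("python train.py --lr 3e-4", gpus=8)
print(result.logs)
```

The point of the wrapper is that the agent never sees VMs, SSH, or YAML; it sees one function whose return value it can reason over like any other tool output.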
Why it matters
Most AI agent demos hit a wall: they can write code but can't run serious experiments. This shows the next layer—agents that orchestrate infrastructure. If you're building agentic workflows that need compute (training, simulation, rendering), you'll need tooling that exposes infrastructure as clean APIs, not YAML configs and kubectl commands.
Key details
- Built on SkyPilot's managed spot instances and auto-recovery features
- The agent uses SkyPilot's Python API to launch jobs, monitor status, and retrieve results programmatically
- Handles distributed training setup (multi-node PyTorch) without manual cluster configuration
- Automatically recovers from spot-instance preemptions mid-experiment
- Open-source experiment: the SkyPilot team published the setup details and agent modifications
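The preemption-recovery behavior in the bullets above boils down to a launch-poll-relaunch control loop. The sketch below is illustrative only: `launch_job` and `poll_job` are hypothetical stubs that simulate a flaky spot job, standing in for real SkyPilot launch/status calls.

```python
import random

# Fixed seed so the simulated preemptions are reproducible.
random.seed(0)

def launch_job(name: str, attempt: int) -> dict:
    """Stub: pretend to launch a spot-instance training job, return a handle."""
    return {"name": name, "attempt": attempt}

def poll_job(handle: dict) -> str:
    """Stub: pretend to poll the job until it finishes or is preempted."""
    # Simulate roughly a 50% chance of preemption per attempt.
    return "succeeded" if random.random() < 0.5 else "preempted"

def run_with_recovery(name: str, max_attempts: int = 5) -> int:
    """Relaunch the job after each preemption, up to max_attempts.

    A real agent would also restore the latest checkpoint before
    relaunching, so preemptions cost little training progress.
    """
    for attempt in range(1, max_attempts + 1):
        handle = launch_job(name, attempt)
        if poll_job(handle) == "succeeded":
            return attempt
    raise RuntimeError(f"{name} still preempted after {max_attempts} attempts")

attempts_needed = run_with_recovery("resnet-sweep-3")
print(f"finished after {attempts_needed} attempt(s)")
```

Managed-job systems like SkyPilot's implement this loop (plus checkpoint restore and re-provisioning on a different spot pool) on the user's behalf, which is what lets an agent treat a preemptible cluster as if it were reliable.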
Worth watching
4:20 The AI That Researches Itself: Inside Karpathy's Autoresearch
NewTechWorld
Gives a comprehensive overview of Karpathy's Autoresearch concept and architecture; useful background before exploring scaling scenarios.
5:12 AutoResearch on MacBook Pro (Apple M2 Pro): Running Automated AI Research on Consumer Hardware
Alex Hitt, The Great Discovery Pro
Demonstrates a working automated-research setup on consumer hardware; its lessons on resource constraints and optimization carry over directly to GPU-cluster scaling.