March 20, 2026 | Agents, Infrastructure, Open Source, Research

SkyPilot Scales Karpathy's Autoresearch: AI Agent Runs 910 Experiments on 16 GPUs Autonomously

SkyPilot published results on March 18 showing what happens when an AI coding agent is given access to a GPU cluster instead of a single machine. The experiment scales up Andrej Karpathy's Autoresearch project, in which an AI agent autonomously improves a neural network training script by editing the code, running experiments, and iterating on the results.
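The edit, run, iterate cycle can be pictured as a greedy keep-if-better loop. The following is a minimal sketch, not Autoresearch's actual code: it assumes each experiment yields a single validation score where lower is better, that improving edits are kept and build on each other, and that failing edits are discarded. `autoresearch_loop`, `run`, and `toy_run` are hypothetical names; in the real project the agent itself proposes the edits and launches the training runs.

```python
from typing import Callable, Iterable, List, Tuple


def autoresearch_loop(
    candidates: Iterable[str],
    run: Callable[[List[str]], float],
    baseline: float,
) -> Tuple[float, List[str]]:
    """Greedy keep-if-better loop over candidate code edits (hypothetical sketch).

    `run` stands in for "apply these edits to the training script, train, and
    return the validation score"; lower scores are better.
    """
    best = baseline
    kept: List[str] = []
    for change in candidates:
        # Evaluate the candidate on top of every edit kept so far.
        score = run(kept + [change])
        if score < best:
            # Improvement: keep the edit and raise the bar for the next one.
            best = score
            kept.append(change)
        # Otherwise the edit is discarded and the next candidate is tried.
    return best, kept


# Toy scorer (hypothetical): each edit named "good..." lowers the score by 0.01.
def toy_run(edits: List[str]) -> float:
    return 1.003 - 0.01 * sum(e.startswith("good") for e in edits)


best, kept = autoresearch_loop(["good-lr", "bad-depth", "good-warmup"], toy_run, 1.003)
print(kept)  # ['good-lr', 'good-warmup']
```

Evaluating each candidate on top of the already-kept edits keeps improvements cumulative, which matches the idea of iterating on results rather than testing every change in isolation.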

The results are striking: Claude Code completed 910 experiments in 8 hours across 16 GPUs (about 114 per hour), compared with roughly 10 per hour on a single GPU. The validation score, measured in bits per byte (lower is better), fell from 1.003 to 0.974, a 2.87% improvement, and the parallel setup reached equivalent results 9x faster than a sequential baseline.
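A quick back-of-the-envelope check on the throughput numbers, using only the figures reported in the article. Note that the raw throughput multiple it yields (about 11x) is a different quantity from the 9x speedup: the latter measures time to reach an equivalent validation score, not experiments per hour, so the two need not match.

```python
# Inputs are the figures reported in the article; everything else is derived.
experiments = 910
hours = 8
single_gpu_rate = 10  # "roughly 10 per hour" on a single GPU

cluster_rate = experiments / hours                 # experiments per hour on 16 GPUs
throughput_ratio = cluster_rate / single_gpu_rate  # raw throughput multiple

print(f"{cluster_rate:.2f} experiments/hour, {throughput_ratio:.1f}x single-GPU throughput")
# prints: 113.75 experiments/hour, 11.4x single-GPU throughput
```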

Most notably, the agent showed emergent behavior: without being instructed to, it discovered that it had access to two GPU types (H100s and H200s) and developed a two-tier validation strategy. Throughout the run, the agent provisions clusters, submits parallel experiments as YAML task configurations, monitors their results, and commits the winning changes on its own.
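The article does not spell out how the two tiers work, so the following is only one plausible shape for such a strategy: score every candidate with a cheap, possibly noisy screening run, then re-check the few best on a second, more trusted tier before committing. `two_tier_select`, `screen`, `confirm`, and the `top_k` cutoff are hypothetical, and nothing here should be read as the agent's actual procedure or as a claim about which GPU type served which tier.

```python
from typing import Callable, Dict, Optional, Sequence


def two_tier_select(
    candidates: Sequence[str],
    screen: Callable[[str], float],
    confirm: Callable[[str], float],
    baseline: float,
    top_k: int = 2,
) -> Optional[str]:
    """Hypothetical two-stage selection; lower scores are better."""
    # Tier 1: score every candidate with the cheap screening run.
    finalists = sorted(candidates, key=screen)[:top_k]
    # Tier 2: re-score only the finalists on the confirmation tier.
    confirmed: Dict[str, float] = {c: confirm(c) for c in finalists}
    if not confirmed:
        return None
    best = min(confirmed, key=confirmed.__getitem__)
    # Commit only if the confirmed score still beats the current baseline.
    return best if confirmed[best] < baseline else None


screen_scores = {"a": 0.98, "b": 0.99, "c": 0.97, "d": 1.01}
confirm_scores = {"a": 0.975, "b": 0.99, "c": 0.985, "d": 1.01}
print(two_tier_select(list(screen_scores), screen_scores.__getitem__,
                      confirm_scores.__getitem__, baseline=1.003))  # a
```

In this toy run, candidate `c` looks best on the screening tier but is overtaken by `a` on confirmation, which is the kind of noise a second tier is meant to catch.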

SkyPilot is an open-source framework that lets users, in this case an AI agent, launch and manage GPU clusters across Kubernetes, AWS, GCP, and Azure. The full setup, including the agent instructions and YAML templates, is available in the examples/autoresearch directory of https://github.com/skypilot-org/skypilot.
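SkyPilot tasks are described in YAML, with standard fields such as `resources`, `workdir`, `envs`, and `run`. The sketch below shows one way an agent might stamp out a task file per experiment. It is an illustration under stated assumptions, not the repository's template: the `train.py` entry point, the `EXPERIMENT_ID` variable, and `render_tasks` are hypothetical, and the real files in examples/autoresearch may look quite different.

```python
from typing import Dict, Iterable

# Hypothetical template: `resources`, `workdir`, `envs`, and `run` are standard
# SkyPilot task fields, but `train.py` and EXPERIMENT_ID are placeholders.
TASK_TEMPLATE = """\
name: {name}

resources:
  accelerators: {accelerator}:1

workdir: .

envs:
  EXPERIMENT_ID: {name}

run: |
  python train.py
"""


def render_tasks(names: Iterable[str], accelerator: str = "H100") -> Dict[str, str]:
    """Return a mapping of YAML filename -> SkyPilot task text, one per experiment."""
    return {
        f"{name}.yaml": TASK_TEMPLATE.format(name=name, accelerator=accelerator)
        for name in names
    }


tasks = render_tasks([f"exp-{i:03d}" for i in range(3)], accelerator="H200")
print(sorted(tasks))  # ['exp-000.yaml', 'exp-001.yaml', 'exp-002.yaml']
print(tasks["exp-001.yaml"])
```

Each rendered file could then be saved and launched on its own cluster with `sky launch -c <cluster-name> exp-001.yaml`, or submitted as a managed job with `sky jobs launch exp-001.yaml`; one file per experiment is what makes fanning out across many GPUs straightforward.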

The experiment highlights a key infrastructure need in the agentic ecosystem: agents that can provision their own compute and scale their own experiments without human intervention.

Blog post: https://blog.skypilot.co/scaling-autoresearch/