nCompass: A GPU Kernel Agent That Outperforms NVIDIA's Own Code
Writing GPU kernels is one of those tasks that separates senior systems engineers from everyone else. It takes weeks of profiling, architecture-specific tuning, and deep knowledge of memory hierarchies. nCompass, a YC W24 startup, has shipped an AI agent that it says does the same work in a day, and on a Hopper GEMM kernel the result beat NVIDIA's own CUTLASS library.
The pitch is concrete: the agent analyzes GPU system performance from CPU-GPU interactions down to individual kernels, then pairs with Cursor or Claude Code to automate both the reasoning and the code implementation. Using nCompass, the team implemented a Hopper GEMM kernel that outperformed NVIDIA's CUTLASS GEMMs by 3%. By the company's account, work that used to take months now takes a day.
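To make a figure like "3% faster" concrete, here is a minimal sketch of how GEMM throughput and relative speedup are usually computed from measured kernel times. The article does not say how nCompass measured its result, so the function names, the matrix size, and the timings below are all hypothetical, and the definition of speedup (baseline time divided by candidate time, minus one) is an assumption.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for C = A @ B, with A of shape (m, k) and B of shape (k, n)."""
    # Each of the m * n output elements needs k multiplies and k adds.
    return 2 * m * n * k


def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOP/s for one GEMM that took `seconds` to run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e12


def relative_speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Fractional speedup of the candidate over the baseline (0.03 means 3% faster)."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / candidate_seconds - 1.0


# Hypothetical timings, for illustration only (not nCompass or CUTLASS measurements).
size = 4096
baseline_s = 1.03e-3   # assumed baseline kernel time
candidate_s = 1.00e-3  # assumed tuned kernel time

print(f"baseline:  {gemm_tflops(size, size, size, baseline_s):.1f} TFLOP/s")
print(f"candidate: {gemm_tflops(size, size, size, candidate_s):.1f} TFLOP/s")
print(f"speedup:   {relative_speedup(baseline_s, candidate_s):.1%}")
```

A difference this small is easy to lose in run-to-run noise, so in practice the timings would be averaged over many iterations after a warm-up.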
The product is a VS Code extension that integrates directly into your existing workflow. It includes system trace diffing, collaboration features, and the core performance analysis agent, and it is free to use. There is no IDE to switch to and no new workflow to learn: you stay in Cursor or Claude Code, and the agent handles the GPU optimization work alongside you.
This fills a real gap in the agentic ecosystem. We've seen plenty of coding agents that write application-level code, but almost none that can reason about hardware-level performance. GPU kernel optimization is becoming a critical bottleneck as AI inference costs dominate budgets. GEMMs account for much of the compute in inference workloads, so shaving 3% off them adds up to serious money at scale.
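How much money depends on how much of the workload is actually GEMM. A rough sketch using Amdahl's law follows; the GEMM share of runtime and the annual GPU budget are hypothetical numbers chosen for illustration, not figures from nCompass.

```python
def savings_fraction(gemm_share: float, kernel_speedup: float) -> float:
    """Fraction of total GPU time saved when only the GEMM portion gets faster.

    gemm_share:     fraction of total runtime spent in GEMMs (0.0 to 1.0)
    kernel_speedup: speedup factor of the GEMM kernel (1.03 means 3% faster)
    """
    if not 0.0 <= gemm_share <= 1.0:
        raise ValueError("gemm_share must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # Amdahl's law: the non-GEMM part is unchanged, the GEMM part shrinks by the speedup.
    new_total = (1.0 - gemm_share) + gemm_share / kernel_speedup
    return 1.0 - new_total


# Hypothetical inputs, for illustration only.
annual_gpu_budget_usd = 10_000_000  # assumed yearly GPU spend
gemm_share = 0.6                    # assumed share of runtime spent in GEMMs

fraction = savings_fraction(gemm_share, 1.03)
print(f"time saved:   {fraction:.2%}")
print(f"yearly value: ${annual_gpu_budget_usd * fraction:,.0f}")
```

Under these assumptions, the end-to-end saving is smaller than the headline 3%, because only the GEMM portion of the runtime speeds up.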
https://www.ncompass.tech/