Grep Beats Vector Search for Coding Agents
A paper titled "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search" landed on arXiv May 14 (2605.15184). It runs 116 questions from LongMemEval across four production harnesses (Chronos, Claude Code, Codex, Gemini CLI) and reports a direct finding: grep generally yields higher accuracy than vector retrieval. Result strength depends on harness and tool-calling style but the headline holds.
This is the published version of an argument that has been growing inside coding-agent teams for months. Anthropic shipped Claude Code with grep as a first-class tool and no vector store. Codex moved the same direction. Cursor still pushes embeddings hard. The paper's second experiment shows how performance degrades when relevant passages are buried inside distracting conversation history, which is a fair model of long agentic sessions.
If you have a vector database in your coding-agent pipeline because that was the 2023 default, this is the paper to forward to whoever signs off on the bill. Embedding store cost, freshness lag, and indexing complexity stop justifying themselves when grep with a half-decent harness wins on accuracy.
Paper at arxiv.org/abs/2605.15184. Pairs with the broader "context engineering > RAG" thesis that Karpathy, Anthropic, and the Claude Code team have been arguing in public for the past quarter.
← Back to all articles
This is the published version of an argument that has been growing inside coding-agent teams for months. Anthropic shipped Claude Code with grep as a first-class tool and no vector store. Codex moved the same direction. Cursor still pushes embeddings hard. The paper's second experiment shows how performance degrades when relevant passages are buried inside distracting conversation history, which is a fair model of long agentic sessions.
If you have a vector database in your coding-agent pipeline because that was the 2023 default, this is the paper to forward to whoever signs off on the bill. Embedding store cost, freshness lag, and indexing complexity stop justifying themselves when grep with a half-decent harness wins on accuracy.
Paper at arxiv.org/abs/2605.15184. Pairs with the broader "context engineering > RAG" thesis that Karpathy, Anthropic, and the Claude Code team have been arguing in public for the past quarter.
Comments