DCI-Agent Says Drop the Vector DB and Let Agents Use Grep
A new arXiv paper from TIGER-Lab, "Beyond Semantic Similarity," argues that the entire embeddings-and-vector-index retrieval stack is unnecessary for agentic search. Just give the agent grep, file reads, and a shell. Let it search the corpus directly.
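A minimal sketch of what "give the agent grep, file reads, and a shell" can look like as a tool layer. It assumes the corpus is a directory of plain-text files. The names `grep_corpus` and `read_lines` are hypothetical and do not come from the DCI-Agent-Lite repository. In an agent, the model would call these as tools and choose its next search from what comes back; no embeddings or index are built.

```python
import re
import tempfile
from pathlib import Path


def grep_corpus(root: str, pattern: str, max_hits: int = 20) -> list[tuple[str, int, str]]:
    """Hypothetical tool: case-insensitive regex search over every .txt file under root.

    Returns (relative path, 1-based line number, line text) tuples, roughly like `grep -rni`.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    hits.append((str(path.relative_to(root)), lineno, line.rstrip("\n")))
                    if len(hits) >= max_hits:  # cap output so the agent's context stays small
                        return hits
    return hits


def read_lines(root: str, rel_path: str, start: int, end: int) -> str:
    """Hypothetical tool: return lines start..end (1-based, inclusive) of one file."""
    lines = (Path(root) / rel_path).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[max(start, 1) - 1:end])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        Path(root, "a.txt").write_text("Alpha intro.\nThe index is built nightly.\nEnd.\n", encoding="utf-8")
        Path(root, "b.txt").write_text("Nothing relevant here.\n", encoding="utf-8")
        hits = grep_corpus(root, r"\bindex\b")
        print(hits)  # [('a.txt', 2, 'The index is built nightly.')]
        path, lineno, _ = hits[0]
        # Read a small window around the hit, as an agent would before deciding its next step.
        print(read_lines(root, path, lineno - 1, lineno + 1))
```

A shell tool would typically wrap `subprocess.run` in a sandbox; it is left out here to keep the sketch short.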
The numbers earned the Hugging Face Daily Papers #2 spot. +30.7% on multi-hop QA. +21.5% on IR ranking. +11% on BrowseComp-Plus. The approach beat strong sparse, dense, and reranking baselines on BRIGHT and BEIR. Code is released at github.com/DCI-Agent/DCI-Agent-Lite.
The 19-author list is a who's-who: Yejin Choi, Jimmy Lin, Jiawei Han, Wenhu Chen. When the senior IR community signs on to a paper that says "the index is the wrong abstraction," that's a real shift, not a curiosity.
This is the third major paper in two weeks arguing the same thing. PageIndex (May 8) said RAG should be a hierarchical document tree the model navigates, not a vector lookup. Chroma's Context-1 said the same. Tongyi DeepResearch and LongSeeker made the same case for search agents. Now DCI-Agent puts a sharper edge on it: drop indexing entirely and give the model shell tools.
The structural read: "embedding model + vector store + reranker" is a 2022 architecture that the field is publicly retiring. The wedge is that frontier models now reason well enough that retrieval-as-reasoning beats retrieval-as-similarity-search at the task level. Watch the next 60 days for the vector-database vendors (Pinecone, Weaviate, Qdrant) to publish defenses. Source: https://arxiv.org/abs/2605.05242
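To make "retrieval-as-reasoning" concrete, here is a toy two-hop lookup over an in-memory corpus. The corpus, the names in it, and the function `birthplace_of_director` are invented for illustration. The point is that the second search string does not exist until the first result has been read, which a single similarity lookup on the original question cannot do. The regex that extracts the bridge entity stands in for the model's reasoning; in an agent, the LLM would decide the next query.

```python
import re
from typing import Optional

# Hypothetical toy corpus: doc id -> text.
CORPUS = {
    "d1": "The film Night Harbor was directed by Mara Vell in 1998.",
    "d2": "Mara Vell was born in Tallinn and later moved to Berlin.",
    "d3": "Night Harbor premiered at a small festival.",
}


def grep(pattern: str) -> list[tuple[str, str]]:
    """Return (doc_id, text) pairs whose text matches the regex, in corpus order."""
    regex = re.compile(pattern)
    return [(doc_id, text) for doc_id, text in CORPUS.items() if regex.search(text)]


def birthplace_of_director(film: str) -> tuple[Optional[str], list[str]]:
    """Two-hop search: film -> director -> birthplace. Returns (answer, trace)."""
    trace = []
    # Hop 1: search for the film and extract who directed it.
    director = None
    for doc_id, text in grep(re.escape(film)):
        m = re.search(r"directed by ([A-Z][a-z]+ [A-Z][a-z]+)", text)
        if m:
            director = m.group(1)
            trace.append(f"{doc_id}: director = {director}")
            break
    if director is None:
        return None, trace
    # Hop 2: this query is built from what hop 1 found.
    for doc_id, text in grep(re.escape(director) + r" was born in"):
        m = re.search(r"born in ([A-Z][a-z]+)", text)
        if m:
            trace.append(f"{doc_id}: birthplace = {m.group(1)}")
            return m.group(1), trace
    return None, trace


print(birthplace_of_director("Night Harbor"))
# ('Tallinn', ['d1: director = Mara Vell', 'd2: birthplace = Tallinn'])
```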