PageIndex hit 29.5K stars selling the same idea everyone keeps re-discovering — vectors are not retrieval
VectifyAI's PageIndex is trending on GitHub today, gaining 953 stars/day for a total of 29.5K. The pitch: stop using vector databases for retrieval. Build a hierarchical document tree that the LLM navigates through reasoning, not similarity search. They call it vectorless RAG, or reasoning-based retrieval.
The mechanism is basic enough that you wonder why this took until 2026 to become canonical. PageIndex turns a long document into a tree shaped like a table of contents. When the agent needs information, it walks the tree by reading section headers and reasoning about which subtree probably has what it needs. No embeddings, no chunking, no cosine similarity. The output is traceable — you get exact page references, not "vector ID 482739" — and naturally chunked by document semantics rather than by 512-token windows that cut sentences in half.
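The core loop is small enough to sketch in a few dozen lines. Here is a minimal illustration of reasoning-based tree retrieval, not PageIndex's actual API: the `Node` structure, the `navigate` loop, and the `choose` stub are all assumptions, with a keyword matcher standing in for the LLM reasoning step that would pick a subtree in practice.

```python
# Illustrative sketch of tree-walk retrieval (assumed structure,
# not PageIndex's real API). An LLM call would replace `choose`.
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    pages: tuple            # (start_page, end_page) -- the traceable output
    children: list = field(default_factory=list)

def navigate(root, choose):
    """Walk the tree top-down: at each level, `choose` (an LLM
    reasoning step in practice, a stub here) picks the child whose
    section header best matches the query. Returns the leaf section
    and its exact page range -- no embeddings, no cosine similarity."""
    node = root
    while node.children:
        node = choose(node.children)
    return node, node.pages

# Toy document tree shaped like a table of contents.
doc = Node("10-K Filing", (1, 120), children=[
    Node("Risk Factors", (5, 30)),
    Node("Financial Statements", (31, 90), children=[
        Node("Balance Sheet", (31, 40)),
        Node("Income Statement", (41, 55)),
    ]),
])

# Stub reasoner: trivial keyword match in place of an LLM judgment
# about which subtree probably answers "what was net revenue?".
def choose(children):
    for c in children:
        if "Income" in c.title or "Financial" in c.title:
            return c
    return children[0]

leaf, pages = navigate(doc, choose)
print(leaf.title, pages)
```

The payoff is in the return value: the answer comes with an exact section title and page range, so the citation is a real location in the document rather than an opaque chunk ID.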
The customer proof is the part that closes the case. VectifyAI's own Mafin 2.5 system, built on PageIndex, hit 98.7% accuracy on FinanceBench. That's not a benchmark vendors run to advertise their own tools — it's a hard finance Q&A benchmark that has been embarrassing vector RAG for two years. The phrase the README keeps repeating is: similarity is not relevance, and relevance requires reasoning.
Pair this with Chroma's Context-1 from earlier this year, Tongyi DeepResearch's context engineering, the LongSeeker paper from this Tuesday, and the OpenSearch-VL recipe — there are now five separate teams in 2026 publicly arguing that the embedding-search-vector-store pipeline is a 2022 architecture being kept alive by cloud-database vendors who already sold it. PageIndex is the one with 29.5K stars saying the quiet part loud.
Repo: https://github.com/VectifyAI/PageIndex