Retrieval is where most RAG systems quietly break. Traditional pipelines rely on vector similarity—embedding queries and document chunks into the same space and fetching the “closest” matches. But similarity is a weak proxy for what we actually need: relevance grounded in reasoning.
In long, professional documents—like financial reports, research papers, or legal texts—the right answer often isn’t in the most semantically similar paragraph. It requires navigating structure, understanding context, and performing multi-step reasoning across sections. This is exactly where vector-based RAG starts to fall apart.
PageIndex is designed to solve this gap by rethinking retrieval from first principles. Instead of chunking documents and searching via embeddings, it builds a hierarchical table-of-contents-style tree index and uses LLMs to reason over that structure—much like a human expert scanning sections, drilling down, and connecting ideas. This enables a vectorless, reasoning-driven retrieval process that is more interpretable, traceable, and aligned with how knowledge is actually extracted from complex documents.
By replacing similarity search with structured exploration and tree-based reasoning, PageIndex delivers significantly higher retrieval accuracy—demonstrated by its strong performance on benchmarks like FinanceBench—making it particularly effective for domains that demand precision and deep understanding. In this article, we’ll use PageIndex to index the seminal Transformer paper — “Attention Is All You Need” — and run two cross-cutting queries against it without a single vector or embedding. Instead of chunking the PDF and retrieving by similarity, PageIndex builds a hierarchical tree of the document’s sections, then uses GPT-5.4 to reason over node summaries and identify exactly which sections contain the answer — before reading a single word of full text.
Setting up the dependencies

For this tutorial, you will need PageIndex and OpenAI API keys. You can get them from https://dash.pageindex.ai/api-keys and https://platform.openai.com/api-keys respectively.

```
pip install pageindex openai requests
```

```python
from pageindex import PageIndexClient
import pageindex.utils as utils
import os
from getpass import getpass

PAGEINDEX_API_KEY = getpass('Enter PageIndex API Key: ')
pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)
```

We import the OpenAI client and configure it with an API key to enable access to LLMs.
Then, we define an asynchronous helper function that sends prompts to the model and returns the generated response.

```python
import openai

OPENAI_API_KEY = getpass('Enter OpenAI API Key: ')

async def call_llm(prompt, model="gpt-5.4", temperature=0):
    client = openai.AsyncOpenAI(api_key=OPENAI_API_KEY)
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content.strip()
```

Building the PageIndex Tree

In this step, we download the Transformer paper directly from arXiv and submit it to PageIndex, which processes the PDF and builds a hierarchical tree of its sections: each node stores a title, a summary, and the full section text.
Once the tree is ready, we print it out to inspect the structure PageIndex has inferred: every chapter, subsection, and nested heading becomes a node in the tree, preserving the document's natural organization exactly as the authors intended it.

```python
# ─────────────────────────────────────────────
# Step 1: Build the PageIndex Tree
# ─────────────────────────────────────────────
# 1.1 Download the Transformer paper and submit it
import os, requests

pdf_url = "https://arxiv.org/pdf/1706.03762.pdf"
pdf_path = os.path.join("data", pdf_url.split("/")[-1])
os.makedirs("data", exist_ok=True)

print("Downloading 'Attention Is All You Need'...")
response = requests.get(pdf_url)
with open(pdf_path, "wb") as f:
    f.write(response.content)
print(f"Saved to {pdf_path}")

doc_id = pi_client.submit_document(pdf_path)["doc_id"]
print(f"Document submitted. doc_id: {doc_id}")

# 1.2 Retrieve the tree (poll until ready)
import time

print("\nWaiting for PageIndex tree to be ready", end="")
while not pi_client.is_retrieval_ready(doc_id):
    print(".", end="", flush=True)
    time.sleep(5)

tree = pi_client.get_tree(doc_id, node_summary=True)["result"]
print("\n\nDocument Tree Structure:")
utils.print_tree(tree)
```

Reasoning-Based Retrieval

With the tree built, we now run a query that is intentionally cross-cutting: one that cannot be answered by a single section of the paper.
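Before querying, it can be useful to walk the returned tree yourself rather than relying on the built-in printer. The sketch below is a minimal illustration that assumes each node carries a `node_id`, a `title`, and a `nodes` list of children; these field names are assumptions about the tree schema for demonstration purposes.

```python
def flatten_tree(node, depth=0, out=None):
    """Walk the tree depth-first, collecting (depth, node_id, title) in document order."""
    if out is None:
        out = []
    out.append((depth, node.get("node_id"), node.get("title")))
    for child in node.get("nodes", []):  # recurse into child sections, if any
        flatten_tree(child, depth + 1, out)
    return out

# Example with a tiny hand-made tree mimicking the assumed schema:
sample = {
    "node_id": "0000", "title": "Attention Is All You Need",
    "nodes": [
        {"node_id": "0001", "title": "Introduction", "nodes": []},
        {"node_id": "0002", "title": "Model Architecture", "nodes": [
            {"node_id": "0003", "title": "Encoder and Decoder Stacks", "nodes": []},
        ]},
    ],
}
for depth, nid, title in flatten_tree(sample):
    print("  " * depth + f"[{nid}] {title}")
```

This gives a flat, indented outline of the document that is easy to eyeball or log before moving on to retrieval.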
We strip the full text from each node, leaving only titles and summaries, and pass the entire tree structure to GPT-5.4. The model then reasons over these summaries to identify every node likely to contain a relevant answer, returning both its step-by-step thinking and a list of matched node IDs.
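The stripping and prompting steps can be sketched as follows. This is a minimal illustration, not PageIndex's official retrieval prompt: it assumes each node carries `node_id`, `title`, `summary`, `text`, and child `nodes` fields.

```python
import json

def strip_full_text(node):
    """Return a copy of the tree keeping only node IDs, titles, and summaries.
    Field names are assumptions about the tree schema, made for illustration."""
    slim = {k: node[k] for k in ("node_id", "title", "summary") if k in node}
    if node.get("nodes"):
        slim["nodes"] = [strip_full_text(child) for child in node["nodes"]]
    return slim

def build_retrieval_prompt(tree, query):
    """Assemble a reasoning prompt asking the LLM to pick relevant nodes
    (a hand-rolled prompt, shown only to make the retrieval step concrete)."""
    return (
        "You are given the table-of-contents tree of a document. Each node "
        "has a node_id, a title, and a summary.\n\n"
        f"Question: {query}\n\n"
        f"Tree:\n{json.dumps(strip_full_text(tree), indent=2)}\n\n"
        'Reply in JSON: {"thinking": "<your reasoning>", "node_list": ["<node_id>", ...]}'
    )
```

The resulting string can then be passed to the `call_llm` helper defined earlier, and the reply parsed with `json.loads` to recover the model's reasoning and the matched node IDs.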
