Ross AI is a Recursive Language Model engineered by our lab for a law firm that works with massive legal documents daily. It eliminates context rot and context-window limitations, making entire 10,000-page document bodies fully navigable, queryable, and analyzable without information loss.
Built for a law firm

The firm worked with contracts, regulatory filings, case law, and compliance documents that routinely ran into hundreds of pages. It had tried commercial LLM tools to assist with review, summarization, and clause extraction. The results were promising on short documents, and unreliable on everything else.
The root cause is a fundamental architectural limitation. Standard large language models operate within a fixed context window — a ceiling on how much text the model can "see" at any given time. When a document exceeds that window, the model either truncates it, chunks it into fragments and processes each one in isolation, or attempts to summarize its way down to a manageable size. In every case, information is lost. Context from page twelve stops informing the interpretation of page three hundred. Cross-references break. Nuance degrades. This phenomenon — where model coherence and factual reliability decay as the input grows — is known as context rot.
For a law firm, this isn't a minor inconvenience. A missed cross-reference between a liability clause on page forty and an indemnification exception on page two hundred can mean a material misreading of a contract. The firm needed a system that could hold an entire document in mind — every page, every clause, every cross-reference — and reason over it as a unified whole, not as a bag of fragments.
Ross AI is a Recursive Language Model — an inference architecture where the language model can decompose and recursively interact with input context of unbounded length. Unlike standard LLM approaches that treat the entire document as a single input and hope the context window holds, Ross AI operates as a thin orchestration layer around a base language model that can spawn recursive LM calls for intermediate computation. The root model never directly sees the entire context. Instead, the full document body is stored in an external environment, and the model decides how to break it down — peeking at sections, partitioning content, searching through it, and launching recursive sub-queries over relevant portions.
The system is context-centric rather than problem-centric. When a query is issued, only the query itself is provided to the root language model. The potentially massive document context — tens of thousands of pages — lives in a computational environment that the model can interact with programmatically. The root model examines the structure of the context, identifies what portions are relevant, and spawns child LM calls that each operate over manageable sub-contexts. The outputs of those recursive calls flow back to the root model, which synthesizes them into a final answer.
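The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Ross AI's implementation: `call_llm` is a hypothetical stand-in for a real model API (stubbed here so the control flow runs), and `DocumentEnvironment` is an illustrative name for the external store that keeps the document out of the root model's context.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LM call; a real system would hit a model API."""
    # Stub: report how much context this single call actually saw.
    return f"answer({len(prompt)} chars seen)"


class DocumentEnvironment:
    """Holds the full document body outside the root model's context."""

    def __init__(self, text: str, page_size: int = 2000):
        self.pages = [text[i:i + page_size] for i in range(0, len(text), page_size)]

    def peek(self, n_pages: int = 1) -> str:
        """Return only the opening pages, for structure discovery."""
        return "".join(self.pages[:n_pages])

    def slice(self, start: int, end: int) -> str:
        """Return a bounded sub-context for a recursive child call."""
        return "".join(self.pages[start:end])


def answer_query(query: str, env: DocumentEnvironment) -> str:
    # The root model sees only the query plus small, truncated peeks,
    # never the full document.
    structure_hint = env.peek(1)[:200]
    _plan = call_llm(f"Plan decomposition for: {query}\nPreview: {structure_hint}")
    # Spawn one recursive child call per page; each child sees a bounded slice.
    partials = [
        call_llm(f"{query}\n---\n{env.slice(i, i + 1)}")
        for i in range(len(env.pages))
    ]
    # The root synthesizes the child outputs into a final answer.
    return call_llm(f"Synthesize for '{query}': " + " | ".join(partials))
```

Note that no single `call_llm` invocation ever receives more than one page plus the query, regardless of how long the document is.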
What makes this architecture powerful for legal work is the emergent reasoning strategies it develops autonomously. The model learns to peek at initial sections to understand document structure before committing to a deeper analysis strategy. It greps through the full context using keyword and regex searches to narrow down relevant clauses — far more precise than semantic similarity retrieval. It partitions large documents into chunks and maps recursive calls across them to extract answers. And it summarizes intermediate results to keep its own working context clean while reasoning over a document that may span millions of tokens.
The firm's lawyers can query Ross AI against an entire document body — asking questions like "Does this contract contain any clause that limits our liability in the event of a data breach, and if so, does any other clause elsewhere modify or override that limitation?" — and receive answers that are grounded in the full text, with exact citations to the relevant pages and clauses. The input-output interface is identical to a standard LLM call — the recursive complexity is entirely abstracted away.
The system also supports comparative analysis across multiple documents — surfacing inconsistencies, conflicting terms, or missing provisions between a draft and a reference document, even when both documents are tens of thousands of pages long.
Ross AI's core architecture is built on the Recursive Language Model paradigm — a fundamentally different approach to long-context reasoning than standard retrieval-augmented generation or extended context windows.
The foundation is an environment-based execution layer. Rather than passing the full document into a language model's context window, Ross AI stores the entire document body in a computational environment — a sandboxed runtime that the root language model can interact with programmatically. The root model writes and executes code blocks against this environment, receives truncated outputs within its own context, and launches recursive LM calls as function invocations. This keeps the root model's context window uncluttered while giving it access to documents of effectively unlimited length. A document of ten million tokens is no different architecturally from a document of ten thousand.
The recursive decomposition layer is where the architecture diverges most sharply from conventional approaches. When a query arrives, the root model does not rely on a pre-built vector index or static chunking strategy. Instead, it autonomously decides how to break down the problem. It begins by peeking — examining the initial sections of the document to understand its structure, identify section boundaries, and determine a decomposition strategy. It then uses programmatic search — regex matching, keyword extraction, and structural pattern recognition — to locate relevant portions of the context. This is fundamentally more precise than embedding-based semantic retrieval, which often misses exact clause references, defined terms, and cross-reference chains that are critical in legal documents.
The partition-and-map execution engine handles the core reasoning. Once the root model identifies relevant document regions, it partitions them into sub-contexts and spawns recursive LM calls — child language model invocations that each receive a focused slice of the document along with the original query. Each child call reasons over its sub-context independently and returns a structured result. The root model then synthesizes these intermediate outputs, resolving conflicts, identifying gaps, and — if necessary — launching additional recursive calls to fill in missing context. This recursive loop continues until the system converges on an answer that is fully grounded in the source material.
For tasks that require programmatic precision — such as tracking changes across contract versions, comparing clause language between documents, or generating structured clause inventories — the execution environment allows the model to write and run code directly rather than attempting to reason over the output linguistically. Diff tracking, table extraction, and cross-reference mapping are all handled computationally, with the language model orchestrating the process rather than brute-forcing it through token prediction.
The entire pipeline is augmented with a citation grounding module that maps every claim in the model's output back to specific page numbers, clause identifiers, and exact passages in the source document — ensuring that the firm's lawyers can verify every output against the original text without searching for it.
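The grounding step itself can be as simple as locating a verbatim quote and translating its offset into a page number. A sketch under illustrative assumptions: `PAGE_CHARS` stands in for whatever pagination scheme the real pipeline uses, and `ground_citation` is a hypothetical helper name.

```python
PAGE_CHARS = 100  # hypothetical fixed page length for the sketch


def ground_citation(document: str, quote: str):
    """Return (page, offset) for a verbatim quote, or None if absent."""
    pos = document.find(quote)
    if pos == -1:
        return None  # the claim is not verbatim-grounded in the source
    return (pos // PAGE_CHARS + 1, pos)


doc = ("x" * 250) + "the indemnity survives termination" + ("x" * 80)
cite = ground_citation(doc, "the indemnity survives termination")
```

A claim that fails to ground (`None`) is flagged rather than emitted, which is what lets lawyers verify every output against the original text.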