RLM (Recursive Language Models)
Analyze 10M+ token codebases with fixed cost ~$0.47. No context overflow.
The Problem
Even modern LLMs with large context windows suffer from "context rot": performance degrades as prompts grow very long. Sending 1M tokens directly to the LLM is expensive and produces worse answers.
The RLM Solution
RLM treats the document as an external environment rather than as model input. The document is loaded into a Python REPL, and the root model navigates it programmatically, issuing sub-LLM calls to analyze individual chunks.
RLM Flow (4 API turns to the root model):

```
┌─────────────────────────────────────────────────────────┐
│ Turn 1: Load document     → rlm_load                     │
│ Turn 2: Explore structure → python_repl                  │
│ Turn 3: Execute subcalls  → 20x rlm_subcall (parallel)   │
│ Turn 4: Synthesize        → final answer                 │
└─────────────────────────────────────────────────────────┘
```

Cost components:

- Root model (Opus 4.5): $0.30 (4 turns, ~26K accumulated context)
- Subcalls (Haiku 4.5): $0.17 (20 × $0.008)
- Total: ~$0.47 per RLM invocation

⚡ KEY INSIGHT: RLM cost is FIXED (~$0.47) regardless of document size!

- 100K tokens: $0.47
- 500K tokens: $0.47
- 1M tokens: $0.47
- 5M tokens: $0.47

The document is loaded into the Python REPL, not sent to the LLM.
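The flow can be read as a small orchestration loop. The sketch below is illustrative only: `rlm_answer`, `call_root_model`, and `call_sub_model` are hypothetical stand-ins, since the real flow is driven by TAU's tools across four API turns rather than by a single Python function.

```python
from concurrent.futures import ThreadPoolExecutor


def call_sub_model(prompt: str) -> str:
    """Placeholder for the cheap sub-model (e.g. Haiku-class) API call."""
    raise NotImplementedError


def call_root_model(prompt: str) -> str:
    """Placeholder for the root-model (e.g. Opus-class) API call."""
    raise NotImplementedError


def rlm_answer(path: str, question: str, chunk_size: int = 50_000) -> str:
    # Turn 1: load the document locally; the text itself is never sent to a model
    with open(path, encoding="utf-8") as f:
        P = f.read()

    # Turn 2: explore and partition the document with ordinary Python
    chunks = [P[i:i + chunk_size] for i in range(0, len(P), chunk_size)]

    # Turn 3: fan out cheap sub-model calls over the chunks, in parallel
    def analyze(chunk: str) -> str:
        return call_sub_model(f"Question: {question}\nExcerpt:\n{chunk}")

    with ThreadPoolExecutor(max_workers=20) as pool:
        findings = list(pool.map(analyze, chunks))

    # Turn 4: the root model sees only the compact findings, never the document
    return call_root_model(
        f"Question: {question}\nCombine these findings into a final answer:\n"
        + "\n".join(findings)
    )
```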
Cost Comparison
Traditional Approach
| Document size | Cost |
|---|---|
| 500K tokens | $2.56 |
| 1M tokens | $5.06 |
| 5M tokens | n/a (exceeds context) |
Cost scales linearly. Quality degrades with size.
RLM Approach
| Document size | Cost |
|---|---|
| 500K tokens | $0.47 |
| 1M tokens | $0.47 |
| 5M tokens | $0.47 |
Fixed cost. No context limits.
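As a back-of-the-envelope check, the two tables reduce to a simple scaling law: the traditional cost grows roughly linearly with token count, while the RLM cost is constant. The fitted prices below are implied by the rows above, not official pricing.

```python
# Illustrative cost model implied by the two tables (not official pricing).
def traditional_cost(tokens: int) -> float:
    return 0.06 + 5.00 * tokens / 1_000_000  # fitted to the $2.56 / $5.06 rows

def rlm_cost(tokens: int) -> float:
    return 0.47  # flat per-invocation cost, independent of document size

for n in (500_000, 1_000_000, 5_000_000):
    # Note: 5M tokens exceeds the context window in practice, so that
    # traditional figure is hypothetical.
    print(f"{n:>9,} tokens: traditional ${traditional_cost(n):.2f} vs RLM ${rlm_cost(n):.2f}")
```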
How It Works
1. Load — The document is loaded into a sandboxed Python REPL as the variable `P`
2. Plan — Opus writes Python code to chunk and navigate the document (see the sketch below)
3. Analyze — Haiku 4.5 processes each chunk in parallel (up to 50 subcalls)
4. Synthesize — Opus combines the findings into a final answer
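Step 2 is ordinary code: the root model writes Python against the loaded variable `P`. The snippet below is a hypothetical example of such planning code; the per-file header convention and the chunk size are assumptions, not TAU guarantees.

```python
# Hypothetical planning code the root model might write inside the REPL.
# P is the document that rlm_load placed into the session (assumed here to be
# one large string; the real loader's behaviour may differ).
import re

# Try to split on per-file headers such as "=== path/to/file.rs ===".
# This header convention is an assumption, not something TAU guarantees.
parts = re.split(r"^=== (.+?) ===$", P, flags=re.MULTILINE)

if len(parts) > 1:
    # re.split with one capture group yields [prefix, name1, body1, name2, body2, ...]
    chunks = [f"// {name}\n{body}" for name, body in zip(parts[1::2], parts[2::2])]
else:
    # Fall back to fixed-size chunks sized for the sub-model's context window.
    chunk_size = 40_000
    chunks = [P[i:i + chunk_size] for i in range(0, len(P), chunk_size)]

print(f"{len(chunks)} chunks prepared for rlm_subcall")
```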
RLM Tools
| Tool | Description |
|---|---|
| rlm_load | Load file/URL into Python REPL as variable P |
| rlm_subcall | Call sub-LLM to analyze a chunk (max 50 per request) |
| rlm_search | Search within loaded document |
| python_repl | Execute Python code in the persistent REPL |
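Because `rlm_subcall` is capped at 50 calls per request, a document that splits into more chunks has to be merged first. A minimal sketch of one way to do that; the `batch_chunks` helper is hypothetical, and `MAX_SUBCALLS` mirrors the `TAU_RLM_MAX_SUBCALLS` setting.

```python
# Merge neighbouring chunks so that no more than MAX_SUBCALLS sub-LLM calls
# are needed. Hypothetical helper, not part of TAU's API.
MAX_SUBCALLS = 50

def batch_chunks(chunks: list[str], max_calls: int = MAX_SUBCALLS) -> list[str]:
    if len(chunks) <= max_calls:
        return chunks
    per_batch = -(-len(chunks) // max_calls)  # ceiling division
    return ["\n".join(chunks[i:i + per_batch]) for i in range(0, len(chunks), per_batch)]

batched = batch_chunks([f"chunk {i}" for i in range(120)])
assert len(batched) <= MAX_SUBCALLS  # 120 chunks -> 40 merged batches
```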
Configuration
```bash
# Auto-activation threshold (tokens)
export TAU_RLM_CONTEXT_THRESHOLD=50000

# Max subcalls per RLM request
export TAU_RLM_MAX_SUBCALLS=50

# Sub-model for chunk analysis (default: auto-detected cheap model)
export TAU_RLM_SUB_MODEL=claude-haiku-4-5   # Anthropic
export TAU_RLM_SUB_MODEL=gpt-4o-mini        # OpenAI
export TAU_RLM_SUB_MODEL=llama3.2:3b        # Ollama (free)

# Sub-model provider
export TAU_RLM_SUB_PROVIDER=anthropic       # or openai, ollama
```
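The threshold variable controls when RLM activates automatically. A minimal sketch of that decision, assuming a crude 4-characters-per-token estimate (TAU's actual estimator may differ):

```python
import os

# Rough sketch of the auto-activation check driven by TAU_RLM_CONTEXT_THRESHOLD.
THRESHOLD = int(os.environ.get("TAU_RLM_CONTEXT_THRESHOLD", "50000"))

def should_use_rlm(text: str) -> bool:
    estimated_tokens = len(text) // 4  # crude heuristic: ~4 characters per token
    return estimated_tokens > THRESHOLD
```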
Example Usage

```
# In TAU, just ask about large files:
> Analyze the security issues in this 500K token codebase

# TAU automatically:
# 1. Detects that the file size exceeds the threshold
# 2. Activates RLM mode
# 3. Loads the document into the Python REPL
# 4. Orchestrates subcalls for analysis
# 5. Synthesizes the findings

# Or manually trigger RLM:
> /rlm load ./large-codebase.rs
> Find all SQL injection vulnerabilities
```

Best Use Cases
- Large codebases — Analyze entire repositories (500K+ LOC)
- Legal documents — Review 200+ page M&A contracts
- Security audits — Find vulnerabilities across massive codebases
- Data analysis — Process large JSON/CSV datasets
- Documentation — Summarize extensive technical docs
Important Notes
- RLM requires a Python REPL — currently macOS only
- The full document stays local in the REPL and is never sent to the root LLM
- Only chunked excerpts reach the sub-LLM (see the sketch below)
- Works on existing models (no retraining needed)
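In practice, "only chunked excerpts reach the sub-LLM" means each subcall prompt contains one excerpt plus the question, never the whole document. A hypothetical prompt builder, purely for illustration:

```python
# Hypothetical shape of a single sub-LLM prompt: one excerpt and the question
# are included, never the full document.
def build_subcall_prompt(question: str, chunk: str, index: int, total: int) -> str:
    return (
        f"You are analyzing excerpt {index + 1} of {total} from a larger document.\n"
        f"Question: {question}\n"
        f"Excerpt:\n{chunk}\n"
        "Report only findings supported by this excerpt."
    )
```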