RLM (Recursive Language Models)
Analyze 10M+ token codebases with fixed cost ~$0.47. No context overflow.
The Problem
Even modern LLMs with large context windows suffer from "context rot": performance degrades as prompts grow very long. Sending 1M tokens directly to the LLM is expensive and produces worse answers.
The RLM Solution
RLM treats the document as an external environment rather than as model input. The document is loaded into a Python REPL, and the root model navigates it programmatically, issuing sub-LLM calls to analyze individual chunks.
RLM Flow (4 API turns to the root model):

```
┌─────────────────────────────────────────────────────────┐
│ Turn 1: Load document     → rlm_load                     │
│ Turn 2: Explore structure → python_repl                  │
│ Turn 3: Execute subcalls  → 20x rlm_subcall (parallel)   │
│ Turn 4: Synthesize        → final answer                 │
└─────────────────────────────────────────────────────────┘
```

Cost components:

- Root model (Opus 4.5): $0.30 (4 turns, ~26K accumulated context)
- Subcalls (Haiku 4.5): $0.17 (20 × $0.008)
- Total: ~$0.47 per RLM invocation

⚡ KEY INSIGHT: RLM cost is FIXED (~$0.47) regardless of document size!

- 100K tokens: $0.47
- 500K tokens: $0.47
- 1M tokens: $0.47
- 5M tokens: $0.47

The document is loaded into the Python REPL, not sent to the LLM.
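The flow can be read as a small orchestration loop. The sketch below is illustrative only: `rlm_answer`, `call_root_model`, and `call_sub_model` are hypothetical stand-ins, since the real flow is driven by TAU's tools across four API turns rather than by a single Python function.

```python
from concurrent.futures import ThreadPoolExecutor


def call_sub_model(prompt: str) -> str:
    """Placeholder for the cheap sub-model (e.g. Haiku-class) API call."""
    raise NotImplementedError


def call_root_model(prompt: str) -> str:
    """Placeholder for the root-model (e.g. Opus-class) API call."""
    raise NotImplementedError


def rlm_answer(path: str, question: str, chunk_size: int = 50_000) -> str:
    # Turn 1: load the document locally; the text itself is never sent to a model
    with open(path, encoding="utf-8") as f:
        P = f.read()

    # Turn 2: explore and partition the document with ordinary Python
    chunks = [P[i:i + chunk_size] for i in range(0, len(P), chunk_size)]

    # Turn 3: fan out cheap sub-model calls over the chunks, in parallel
    def analyze(chunk: str) -> str:
        return call_sub_model(f"Question: {question}\nExcerpt:\n{chunk}")

    with ThreadPoolExecutor(max_workers=20) as pool:
        findings = list(pool.map(analyze, chunks))

    # Turn 4: the root model sees only the compact findings, never the document
    return call_root_model(
        f"Question: {question}\nCombine these findings into a final answer:\n"
        + "\n".join(findings)
    )
```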
Cost Comparison
Traditional Approach
| Document size | Cost |
|---|---|
| 500K tokens | $2.56 |
| 1M tokens | $5.06 |
| 5M tokens | n/a (exceeds context) |
Cost scales linearly. Quality degrades with size.
RLM Approach
| Document size | Cost |
|---|---|
| 500K tokens | $0.47 |
| 1M tokens | $0.47 |
| 5M tokens | $0.47 |
Fixed cost. No context limits.
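As a back-of-the-envelope check, the two tables reduce to a simple scaling law: the traditional cost grows roughly linearly with token count, while the RLM cost is constant. The fitted prices below are implied by the rows above, not official pricing.

```python
# Illustrative cost model implied by the two tables (not official pricing).
def traditional_cost(tokens: int) -> float:
    return 0.06 + 5.00 * tokens / 1_000_000  # fitted to the $2.56 / $5.06 rows

def rlm_cost(tokens: int) -> float:
    return 0.47  # flat per-invocation cost, independent of document size

for n in (500_000, 1_000_000, 5_000_000):
    # Note: 5M tokens exceeds the context window in practice, so that
    # traditional figure is hypothetical.
    print(f"{n:>9,} tokens: traditional ${traditional_cost(n):.2f} vs RLM ${rlm_cost(n):.2f}")
```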
How It Works
1. Load — The document is loaded into a sandboxed Python REPL as the variable `P`
2. Plan — Opus writes Python code to chunk and navigate the document (see the sketch below)
3. Analyze — Haiku 4.5 processes each chunk in parallel (up to 50 subcalls)
4. Synthesize — Opus combines the findings into a final answer
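Step 2 is ordinary code: the root model writes Python against the loaded variable `P`. The snippet below is a hypothetical example of such planning code; the per-file header convention and the chunk size are assumptions, not TAU guarantees.

```python
# Hypothetical planning code the root model might write inside the REPL.
# P is the document that rlm_load placed into the session (assumed here to be
# one large string; the real loader's behaviour may differ).
import re

# Try to split on per-file headers such as "=== path/to/file.rs ===".
# This header convention is an assumption, not something TAU guarantees.
parts = re.split(r"^=== (.+?) ===$", P, flags=re.MULTILINE)

if len(parts) > 1:
    # re.split with one capture group yields [prefix, name1, body1, name2, body2, ...]
    chunks = [f"// {name}\n{body}" for name, body in zip(parts[1::2], parts[2::2])]
else:
    # Fall back to fixed-size chunks sized for the sub-model's context window.
    chunk_size = 40_000
    chunks = [P[i:i + chunk_size] for i in range(0, len(P), chunk_size)]

print(f"{len(chunks)} chunks prepared for rlm_subcall")
```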
RLM Tools
| Tool | Description |
|---|---|
| rlm_load | Load file/URL into Python REPL as variable P |
| rlm_subcall | Call sub-LLM to analyze a chunk (max 50 per request) |
| rlm_search | Search within loaded document |
| python_repl | Execute Python code in the persistent REPL |
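Because `rlm_subcall` is capped at 50 calls per request, a document that splits into more chunks has to be merged first. A minimal sketch of one way to do that; the `batch_chunks` helper is hypothetical, and `MAX_SUBCALLS` mirrors the `TAU_RLM_MAX_SUBCALLS` setting.

```python
# Merge neighbouring chunks so that no more than MAX_SUBCALLS sub-LLM calls
# are needed. Hypothetical helper, not part of TAU's API.
MAX_SUBCALLS = 50

def batch_chunks(chunks: list[str], max_calls: int = MAX_SUBCALLS) -> list[str]:
    if len(chunks) <= max_calls:
        return chunks
    per_batch = -(-len(chunks) // max_calls)  # ceiling division
    return ["\n".join(chunks[i:i + per_batch]) for i in range(0, len(chunks), per_batch)]

batched = batch_chunks([f"chunk {i}" for i in range(120)])
assert len(batched) <= MAX_SUBCALLS  # 120 chunks -> 40 merged batches
```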
Configuration
```bash
# Auto-activation threshold (tokens)
export TAU_RLM_CONTEXT_THRESHOLD=50000

# Max subcalls per RLM request
export TAU_RLM_MAX_SUBCALLS=50

# Sub-model for chunk analysis (default: auto-detected cheap model)
export TAU_RLM_SUB_MODEL=claude-haiku-4-5   # Anthropic
export TAU_RLM_SUB_MODEL=gpt-4o-mini        # OpenAI
export TAU_RLM_SUB_MODEL=llama3.2:3b        # Ollama (free)

# Sub-model provider
export TAU_RLM_SUB_PROVIDER=anthropic       # or openai, ollama
```
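The threshold variable controls when RLM activates automatically. A minimal sketch of that decision, assuming a crude 4-characters-per-token estimate (TAU's actual estimator may differ):

```python
import os

# Rough sketch of the auto-activation check driven by TAU_RLM_CONTEXT_THRESHOLD.
THRESHOLD = int(os.environ.get("TAU_RLM_CONTEXT_THRESHOLD", "50000"))

def should_use_rlm(text: str) -> bool:
    estimated_tokens = len(text) // 4  # crude heuristic: ~4 characters per token
    return estimated_tokens > THRESHOLD
```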
Example Usage

```
# In TAU, just ask about large files:
> Analyze the security issues in this 500K token codebase

# TAU automatically:
# 1. Detects that the file size exceeds the threshold
# 2. Activates RLM mode
# 3. Loads the document into the Python REPL
# 4. Orchestrates subcalls for analysis
# 5. Synthesizes the findings

# Or manually trigger RLM:
> /rlm load ./large-codebase.rs
> Find all SQL injection vulnerabilities
```

Best Use Cases
- Large codebases — Analyze entire repositories (500K+ LOC)
- Legal documents — Review 200+ page M&A contracts
- Security audits — Find vulnerabilities across massive codebases
- Data analysis — Process large JSON/CSV datasets
- Documentation — Summarize extensive technical docs
Important Notes
- RLM requires a Python REPL — currently macOS only
- The full document stays local in the REPL and is never sent to the root LLM
- Only chunked excerpts reach the sub-LLM (see the sketch below)
- Works on existing models (no retraining needed)
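In practice, "only chunked excerpts reach the sub-LLM" means each subcall prompt contains one excerpt plus the question, never the whole document. A hypothetical prompt builder, purely for illustration:

```python
# Hypothetical shape of a single sub-LLM prompt: one excerpt and the question
# are included, never the full document.
def build_subcall_prompt(question: str, chunk: str, index: int, total: int) -> str:
    return (
        f"You are analyzing excerpt {index + 1} of {total} from a larger document.\n"
        f"Question: {question}\n"
        f"Excerpt:\n{chunk}\n"
        "Report only findings supported by this excerpt."
    )
```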