Why I Built a Tool That Reads Codebases Like an Architect
Understanding a large codebase is one of the hardest problems in software engineering. You can read individual files, trace a few imports, grep for patterns — but the architecture? The circular dependencies hiding three layers deep? The hub module that half the codebase depends on? All of that stays invisible until something breaks.
LLMs don’t solve this. Even with million-token context windows, models suffer context degradation — retrieval accuracy drops, relationship tracking fails, and the model starts confusing which module depends on which. Kubernetes has over 12,000 files. You can’t dump that into a prompt and ask “what’s wrong?”
This is why I built RLM-Codelens — a tool that combines deterministic graph analysis with Recursive Language Models to deliver architecture intelligence at scale.
Architecture Is a Graph Problem
My first instinct was to throw an LLM at it. Everyone’s first instinct.
It doesn’t work — because architecture is a graph problem, not a text problem.
Architecture isn’t what’s inside each file. It’s the relationships between files. Module A imports Module B, which imports Module C, which imports Module A again. That cycle is invisible if you’re reading files individually. It only appears when you see the whole graph.
So I flipped the approach: build the graph first, reason second.
How RLM-Codelens Works
The core is not an LLM. It’s NetworkX.
Every codebase is a directed graph. Modules are nodes. Import statements are edges. Once you have that graph, graph theory does the heavy lifting:
```mermaid
graph LR
    A["Cycle Detection"] --- B["Hub Analysis"]
    B --- C["Layer Classification"]
    C --- D["Anti-Pattern Detection"]
```
Cycle detection finds circular dependencies — import chains that loop back on themselves. These create tight coupling and make refactoring dangerous.
Hub analysis identifies modules with high fan-in — files that many other modules depend on. Changing a hub has a large blast radius.
Layer classification assigns modules to architectural layers and flags violations — like a utility module importing from the application layer. The dependency arrow points the wrong way.
Anti-pattern detection finds structural problems that emerge from relationships between files, not from individual files in isolation.
None of this requires an LLM. No API keys, no cost, no hallucination risk. Just parsing and math.
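As a rough sketch of what "just parsing and math" looks like, here is NetworkX finding a cycle and a hub in a toy dependency graph. The module names are made up for illustration; this is not the tool's actual code.

```python
import networkx as nx

# Toy dependency graph: nodes are modules, edges are import relationships
# (module names are hypothetical)
g = nx.DiGraph()
g.add_edges_from([
    ("api", "services"), ("services", "models"),
    ("models", "services"),                 # a 2-node cycle: models <-> services
    ("api", "utils"), ("services", "utils"), ("models", "utils"),
])

# Cycle detection: import chains that loop back on themselves
cycles = list(nx.simple_cycles(g))
print(cycles)        # one cycle containing 'services' and 'models'

# Hub analysis: high fan-in means many modules depend on this one
fan_in = sorted(g.in_degree(), key=lambda kv: kv[1], reverse=True)
print(fan_in[0])     # ('utils', 3): changing utils has the largest blast radius
```

Everything here is deterministic: run it twice on the same graph and you get the same cycles and the same hubs.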
Multi-Language Parsing
Getting the graph right means getting the parsing right. A missed import is a missing edge — and a missing edge means invisible dependencies.
For Python, I use the built-in `ast` module — actual abstract syntax tree parsing, not regex. This correctly handles relative imports, `from x import y`, conditional imports inside `TYPE_CHECKING` guards, and `__init__.py` re-exports.
```python
# All of these produce edges in the dependency graph
import os                               # standard import
from pathlib import Path                # from-import
from ..utils import helper              # relative import
from typing import TYPE_CHECKING        # conditional import guard

if TYPE_CHECKING:
    from models import User             # type-checking-only import
```
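A minimal sketch of how `ast`-based import extraction can work — illustrative only, not the tool's actual scanner, but it shows why walking the tree catches imports that regex misses:

```python
import ast

def extract_imports(source: str) -> list[str]:
    """Collect imported module names from Python source via the ast module."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):              # walk also reaches imports nested in `if` blocks
        if isinstance(node, ast.Import):
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.level > 0 marks relative imports (`from ..utils import helper`)
            found.append("." * node.level + (node.module or ""))
    return found

src = (
    "import os\n"
    "from pathlib import Path\n"
    "from ..utils import helper\n"
    "from typing import TYPE_CHECKING\n"
    "if TYPE_CHECKING:\n"
    "    from models import User\n"
)
print(extract_imports(src))
# ['os', 'pathlib', '..utils', 'typing', 'models']
```

Note that the type-checking-only import still shows up: `ast.walk` descends into the `if` body, so the edge is never lost.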
For Go, Java, JavaScript/TypeScript, Rust, and C/C++, I use tree-sitter grammars that auto-install:
```shell
# Python works out of the box. Other languages:
uv sync --extra go --extra java --extra rust
```
Tree-sitter produces concrete syntax trees that handle the edge cases that break regex — macros in C, generics in Java, template literals in TypeScript.
What It Finds in Real Codebases
I tested RLM-Codelens against large, well-known open-source projects:
| Repository | Language | Files | LOC | Import Edges | Cycles | Anti-Patterns |
|---|---|---|---|---|---|---|
| Kubernetes | Go | 12,235 | 3.4M | 77,373 | 182 | 1,860 |
| gRPC | C/C++/Python | 7,163 | 1.2M | 35 | 0 | 1 |
| K8s Java Client | Java | 3,017 | 2.1M | 6,447 | 14 | 267 |
| vLLM | Python | 2,594 | 804K | 12,013 | 24 | 341 |
| rlm-codelens | Python | 23 | 6,824 | 17 | 0 | 3 |
The contrast is revealing. gRPC — 35 edges across 7,163 files, zero cycles. That’s strict architectural discipline. The C/C++ header-based inclusion model enforces this naturally.
Kubernetes — 77,373 edges, 182 cycles, 1,860 anti-patterns. Not because it’s badly written, but because a decade of continuous development by thousands of contributors creates structural debt that’s invisible without tooling. Nobody introduced those cycles on purpose. They accumulated one PR at a time.
And RLM-Codelens analyzing itself — 23 files, 17 edges, zero cycles, 3 anti-patterns. Clean enough.
Where Static Analysis Hits Its Ceiling
The numbers tell you where the problems are. They don’t tell you why they exist or how to fix them.
182 cycles in Kubernetes — which ones are dangerous and which are benign? Which anti-patterns are real debt and which are intentional tradeoffs? Graph algorithms find structure. They can’t explain intent.
Standard LLM approaches fail here too. The architecture analysis output for a large codebase can itself exceed a model’s context window.
This is where Recursive Language Models come in.
Recursive Language Models
RLM (Zhang, Kraska & Khattab, MIT) is a paradigm that replaces the standard `llm.completion(prompt)` pattern. Instead of cramming the full context into the prompt, the context is offloaded as a variable in a REPL environment. The model never sees the full context — it writes code to examine, decompose, and process it.
```mermaid
graph LR
    subgraph Standard["Standard LLM"]
        direction LR
        S1["Full context\n+ query"] --> S2["Model"] --> S3["Answer"]
    end
    subgraph RLM["Recursive LM"]
        direction LR
        R1["Query"] --> R2["Root LM"]
        R2 -->|"peek\ngrep"| R3["Context\nin REPL"]
        R2 -->|"spawn"| R4["Child A"]
        R2 -->|"spawn"| R5["Child B"]
        R4 --> R6["Merge"]
        R5 --> R6
        R6 --> R7["Answer"]
    end
```
The root model can peek at subsets of the data, grep for patterns, partition the context into chunks, and spawn isolated child LM calls on each subset. The decomposition strategy is decided by the model at inference time — not hardcoded by the developer.
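To make the shape of that loop concrete, here is a toy sketch with a stubbed model call. `llm` is a placeholder, not a real API, and the fixed chunking stands in for decisions a real RLM makes at inference time:

```python
# Sketch of the recursive pattern with a stubbed model call.
# `llm` is a placeholder; in a real system it would be an API request.
def llm(prompt: str, context: str = "") -> str:
    return f"summary of {len(context)} chars for: {prompt}"

def rlm_answer(query: str, context: str, chunk_size: int = 1000) -> str:
    # The root model never receives the full context; it lives in the
    # "REPL" as a variable. The hardcoded partitioning below is purely
    # illustrative: in RLM, the model itself chooses how to peek, grep,
    # and split the data.
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]

    # Spawn an isolated child call per chunk
    child_answers = [llm(query, context=chunk) for chunk in chunks]

    # A final call merges the children's findings into one answer
    return llm(f"merge these findings for: {query}", context="\n".join(child_answers))

print(rlm_answer("which modules form cycles?", "x" * 3500))
```

The point of the shape: no single call ever holds the whole input, so context degradation never gets a chance to set in.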
Why This Fits Codebase Analysis
Architecture analysis data is exactly the kind of input that breaks standard approaches:
- Too large — analysis output for a repo like Kubernetes exceeds context limits
- Hierarchical — cycles cluster by subsystem, anti-patterns cluster by layer
- Variable structure — a monolith needs different analysis than a microservice architecture
When you pass `--deep` to RLM-Codelens, the tool feeds verified structural findings into an RLM call. The RLM enriches the static analysis with root cause explanations, refactoring recommendations, and priority assessments. It reasons over proven structure — it doesn't hallucinate it.
Cost Control
RLM calls spawn API requests, and recursive calls can multiply cost. RLM-Codelens has built-in budget enforcement:
```shell
# Static analysis only — free, deterministic
rlmc analyze-architecture --repo /path/to/repo

# RLM enrichment — budget cap (default $10, here lowered to $5)
rlmc analyze-architecture --repo /path/to/repo --deep --budget 5.0

# Free local analysis with Ollama
rlmc analyze-architecture --repo /path/to/repo --ollama --model deepseek-r1:latest
```
The cost tracker monitors every API call — including recursive sub-calls — and halts if the budget is reached. An environment-level hard cap of $50 provides a ceiling regardless of the `--budget` flag. All API keys are redacted in logs.
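The enforcement logic can be pictured as something like this — a hypothetical sketch, not the tool's actual implementation:

```python
# Hypothetical budget-enforcement sketch (class and constant names are made up)
HARD_CAP = 50.0  # environment-level ceiling, regardless of --budget

class BudgetExceeded(RuntimeError):
    pass

class CostTracker:
    def __init__(self, budget: float):
        self.budget = min(budget, HARD_CAP)   # --budget can never exceed the hard cap
        self.spent = 0.0

    def record(self, cost: float) -> None:
        """Called after every API call, including recursive sub-calls."""
        self.spent += cost
        if self.spent >= self.budget:
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.budget:.2f}")

tracker = CostTracker(budget=5.0)
tracker.record(1.20)      # fine
tracker.record(2.50)      # fine: $3.70 total
try:
    tracker.record(1.40)  # $5.10 total: halts here
except BudgetExceeded as e:
    print(e)
```

Because recursive children report through the same tracker, a runaway decomposition can't silently multiply cost past the cap.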
My recommendation: static analysis for continuous monitoring, RLM enrichment for architecture reviews or before major refactors.
The Output
A full pipeline generates four artifacts:
- `scan.json` — module structure (files, imports, functions, classes)
- `arch.json` — architecture analysis (cycles, hubs, layers, anti-patterns)
- `viz.html` — interactive D3.js dependency graph (zoom, pan, click-to-inspect)
- `report.html` — structured HTML architecture report
```shell
# Full pipeline
./run_analysis.sh /path/to/your/repo myproject

# Generates: myproject_scan.json, myproject_arch.json,
#            myproject_viz.html, myproject_report.html
```
Layer Classification
```mermaid
graph TB
    P["Presentation\nCLI, API, UI"] --> App["Application\nBusiness logic"]
    App --> Dom["Domain\nCore models"]
    Dom --> Infra["Infrastructure\nDB, Network, I/O"]
    Infra --> Util["Utility\nHelpers, config"]
    Util -.->|"Layer Violation"| App
    style Util fill:#fef3c7,stroke:#d97706
    style App fill:#fef3c7,stroke:#d97706
```
Modules are assigned to architectural layers based on their position in the dependency graph. A layer violation — like a utility module importing from the application layer — is automatically flagged. The dependency arrow points upward when it should only point down.
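The violation check itself reduces to comparing layer ranks across each edge. A hypothetical sketch with made-up module names, not the tool's actual classifier:

```python
# Hypothetical layer-violation check (ranks and module names are illustrative)
LAYER_RANK = {"presentation": 0, "application": 1, "domain": 2,
              "infrastructure": 3, "utility": 4}

def find_violations(edges, module_layer):
    """Dependencies should point down the stack (toward utility).
    An edge from a lower layer back up to a higher one is a violation."""
    return [
        (src, dst) for src, dst in edges
        if LAYER_RANK[module_layer[src]] > LAYER_RANK[module_layer[dst]]
    ]

edges = [("cli", "services"), ("services", "helpers"), ("helpers", "services")]
layers = {"cli": "presentation", "services": "application", "helpers": "utility"}

print(find_violations(edges, layers))
# [('helpers', 'services')]: a utility module importing from the application layer
```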
Try It
```shell
git clone https://github.com/knijesh/rlm-codelens.git
cd rlm-codelens
uv sync --extra dev

# Analyze any repo — no API keys needed
uv run rlmc analyze-architecture --repo /path/to/your/repo
```
Or use the Python API:
```python
from rlm_codelens import RepositoryScanner, CodebaseGraphAnalyzer

scanner = RepositoryScanner("/path/to/repo")
structure = scanner.scan()
print(f"{structure.total_files} files, {structure.total_lines:,} LOC")

analyzer = CodebaseGraphAnalyzer(structure)
analysis = analyzer.analyze()
print(f"{len(analysis.cycles)} cycles")
print(f"{len(analysis.anti_patterns)} anti-patterns")
analysis.save("architecture.json")
```
The tool is open source under MIT. The most interesting insights usually come from the codebases you think are already clean.
Feel free to fork it and play around with it. If it’s useful, please ⭐ the repository.