Why I Built a Tool That Reads Codebases Like an Architect
Understanding a large codebase is one of the hardest problems in software engineering. You can read individual files, trace a few imports, grep for patterns — but the architecture? The circular dependencies hiding three layers deep? The hub module that half the codebase depends on? All of that stays invisible until something breaks.
LLMs don’t solve this. Even with million-token context windows, models suffer context degradation — retrieval accuracy drops, relationship tracking fails, and the model starts confusing which module depends on which. Kubernetes has over 12,000 files. You can’t dump that into a prompt and ask “what’s wrong?”
This is why I built RLM-Codelens — a tool that combines deterministic graph analysis with Recursive Language Models to deliver architecture intelligence at scale.
Architecture Is a Graph Problem
My first instinct was to throw an LLM at it. Everyone’s first instinct.
It doesn’t work — because architecture is a graph problem, not a text problem.
Architecture isn’t what’s inside each file. It’s the relationships between files. Module A imports Module B, which imports Module C, which imports Module A again. That cycle is invisible if you’re reading files individually. It only appears when you see the whole graph.
So I flipped the approach: build the graph first, reason second.
How RLM-Codelens Works
The core is not an LLM. It’s NetworkX.
Every codebase is a directed graph. Modules are nodes. Import statements are edges. Once you have that graph, graph theory does the heavy lifting:
```mermaid
graph LR
    A["Cycle Detection"] --- B["Hub Analysis"]
    B --- C["Layer Classification"]
    C --- D["Anti-Pattern Detection"]
```
Cycle detection finds circular dependencies — import chains that loop back on themselves. These create tight coupling and make refactoring dangerous.
Hub analysis identifies modules with high fan-in — files that many other modules depend on. Changing a hub has a large blast radius.
Layer classification assigns modules to architectural layers and flags violations — like a utility module importing from the application layer. The dependency arrow points the wrong way.
Anti-pattern detection finds structural problems that emerge from relationships between files, not from individual files in isolation.
None of this requires an LLM. No API keys, no cost, no hallucination risk. Just parsing and math.
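As a rough sketch of what "just parsing and math" looks like, here is NetworkX finding a cycle and a hub in a toy dependency graph. The module names are made up for illustration; this is not the tool's actual code.

```python
import networkx as nx

# Toy dependency graph: nodes are modules, edges are import relationships
# (module names are hypothetical)
g = nx.DiGraph()
g.add_edges_from([
    ("api", "services"), ("services", "models"),
    ("models", "services"),                 # a 2-node cycle: models <-> services
    ("api", "utils"), ("services", "utils"), ("models", "utils"),
])

# Cycle detection: import chains that loop back on themselves
cycles = list(nx.simple_cycles(g))
print(cycles)        # one cycle containing 'services' and 'models'

# Hub analysis: high fan-in means many modules depend on this one
fan_in = sorted(g.in_degree(), key=lambda kv: kv[1], reverse=True)
print(fan_in[0])     # ('utils', 3): changing utils has the largest blast radius
```

Everything here is deterministic: run it twice on the same graph and you get the same cycles and the same hubs.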
Multi-Language Parsing
Getting the graph right means getting the parsing right. A missed import is a missing edge — and a missing edge means invisible dependencies.
For Python, I use the built-in `ast` module — actual abstract syntax tree parsing, not regex. This correctly handles relative imports, `from x import y`, conditional imports inside `TYPE_CHECKING` guards, and `__init__.py` re-exports.
```python
# All of these produce edges in the dependency graph
import os                               # standard import
from pathlib import Path                # from-import
from ..utils import helper              # relative import
from typing import TYPE_CHECKING        # conditional import guard

if TYPE_CHECKING:
    from models import User             # type-checking-only import
```
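A minimal sketch of how `ast`-based import extraction can work — illustrative only, not the tool's actual scanner, but it shows why walking the tree catches imports that regex misses:

```python
import ast

def extract_imports(source: str) -> list[str]:
    """Collect imported module names from Python source via the ast module."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):              # walk also reaches imports nested in `if` blocks
        if isinstance(node, ast.Import):
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.level > 0 marks relative imports (`from ..utils import helper`)
            found.append("." * node.level + (node.module or ""))
    return found

src = (
    "import os\n"
    "from pathlib import Path\n"
    "from ..utils import helper\n"
    "from typing import TYPE_CHECKING\n"
    "if TYPE_CHECKING:\n"
    "    from models import User\n"
)
print(extract_imports(src))
# ['os', 'pathlib', '..utils', 'typing', 'models']
```

Note that the type-checking-only import still shows up: `ast.walk` descends into the `if` body, so the edge is never lost.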
For Go, Java, JavaScript/TypeScript, Rust, and C/C++, I use tree-sitter grammars that auto-install:
```shell
# Python works out of the box. Other languages:
uv sync --extra go --extra java --extra rust
```
Tree-sitter produces concrete syntax trees that handle the edge cases that break regex — macros in C, generics in Java, template literals in TypeScript.
What It Finds in Real Codebases
I tested RLM-Codelens against large, well-known open-source projects:
| Repository | Language | Files | LOC | Import Edges | Cycles | Anti-Patterns |
|---|---|---|---|---|---|---|
| Kubernetes | Go | 12,235 | 3.4M | 77,373 | 182 | 1,860 |
| gRPC | C/C++/Python | 7,163 | 1.2M | 35 | 0 | 1 |
| K8s Java Client | Java | 3,017 | 2.1M | 6,447 | 14 | 267 |
| vLLM | Python | 2,594 | 804K | 12,013 | 24 | 341 |
| rlm-codelens | Python | 23 | 6,824 | 17 | 0 | 3 |
The contrast is revealing. gRPC — 35 edges across 7,163 files, zero cycles. That’s strict architectural discipline. The C/C++ header-based inclusion model enforces this naturally.
Kubernetes — 77,373 edges, 182 cycles, 1,860 anti-patterns. Not because it’s badly written, but because a decade of continuous development by thousands of contributors creates structural debt that’s invisible without tooling. Nobody introduced those cycles on purpose. They accumulated one PR at a time.
And RLM-Codelens analyzing itself — 23 files, 17 edges, zero cycles, 3 anti-patterns. Clean enough.
Where Static Analysis Hits Its Ceiling
The numbers tell you where the problems are. They don’t tell you why they exist or how to fix them.
182 cycles in Kubernetes — which ones are dangerous and which are benign? Which anti-patterns are real debt and which are intentional tradeoffs? Graph algorithms find structure. They can’t explain intent.
Standard LLM approaches fail here too. The architecture analysis output for a large codebase can itself exceed a model’s context window.
This is where Recursive Language Models come in.
Recursive Language Models
RLM (Zhang, Kraska & Khattab, MIT) is a paradigm that replaces the standard `llm.completion(prompt)` pattern. Instead of cramming the full context into the prompt, the context is offloaded as a variable in a REPL environment. The model never sees the full context — it writes code to examine, decompose, and process it.
```mermaid
graph LR
    subgraph Standard["Standard LLM"]
        direction LR
        S1["Full context\n+ query"] --> S2["Model"] --> S3["Answer"]
    end
    subgraph RLM["Recursive LM"]
        direction LR
        R1["Query"] --> R2["Root LM"]
        R2 -->|"peek\ngrep"| R3["Context\nin REPL"]
        R2 -->|"spawn"| R4["Child A"]
        R2 -->|"spawn"| R5["Child B"]
        R4 --> R6["Merge"]
        R5 --> R6
        R6 --> R7["Answer"]
    end
```
The root model can peek at subsets of the data, grep for patterns, partition the context into chunks, and spawn isolated child LM calls on each subset. The decomposition strategy is decided by the model at inference time — not hardcoded by the developer.
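To make the shape of that loop concrete, here is a toy sketch with a stubbed model call. `llm` is a placeholder, not a real API, and the fixed chunking stands in for decisions a real RLM makes at inference time:

```python
# Sketch of the recursive pattern with a stubbed model call.
# `llm` is a placeholder; in a real system it would be an API request.
def llm(prompt: str, context: str = "") -> str:
    return f"summary of {len(context)} chars for: {prompt}"

def rlm_answer(query: str, context: str, chunk_size: int = 1000) -> str:
    # The root model never receives the full context; it lives in the
    # "REPL" as a variable. The hardcoded partitioning below is purely
    # illustrative: in RLM, the model itself chooses how to peek, grep,
    # and split the data.
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]

    # Spawn an isolated child call per chunk
    child_answers = [llm(query, context=chunk) for chunk in chunks]

    # A final call merges the children's findings into one answer
    return llm(f"merge these findings for: {query}", context="\n".join(child_answers))

print(rlm_answer("which modules form cycles?", "x" * 3500))
```

The point of the shape: no single call ever holds the whole input, so context degradation never gets a chance to set in.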
Why This Fits Codebase Analysis
Architecture analysis data is exactly the kind of input that breaks standard approaches:
- Too large — analysis output for a repo like Kubernetes exceeds context limits
- Hierarchical — cycles cluster by subsystem, anti-patterns cluster by layer
- Variable structure — a monolith needs different analysis than a microservice architecture
When you pass `--deep` to RLM-Codelens, the tool feeds verified structural findings into an RLM call. The RLM enriches the static analysis with root cause explanations, refactoring recommendations, and priority assessments. It reasons over proven structure — it doesn't hallucinate it.
Cost Control
RLM calls spawn API requests, and recursive calls can multiply cost. RLM-Codelens has built-in budget enforcement:
```shell
# Static analysis only — free, deterministic
rlmc analyze-architecture --repo /path/to/repo

# RLM enrichment — budget cap (default $10, here lowered to $5)
rlmc analyze-architecture --repo /path/to/repo --deep --budget 5.0

# Free local analysis with Ollama
rlmc analyze-architecture --repo /path/to/repo --ollama --model deepseek-r1:latest
```
The cost tracker monitors every API call — including recursive sub-calls — and halts if the budget is reached. An environment-level hard cap of $50 provides a ceiling regardless of the `--budget` flag. All API keys are redacted in logs.
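The enforcement logic can be pictured as something like this — a hypothetical sketch, not the tool's actual implementation:

```python
# Hypothetical budget-enforcement sketch (class and constant names are made up)
HARD_CAP = 50.0  # environment-level ceiling, regardless of --budget

class BudgetExceeded(RuntimeError):
    pass

class CostTracker:
    def __init__(self, budget: float):
        self.budget = min(budget, HARD_CAP)   # --budget can never exceed the hard cap
        self.spent = 0.0

    def record(self, cost: float) -> None:
        """Called after every API call, including recursive sub-calls."""
        self.spent += cost
        if self.spent >= self.budget:
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.budget:.2f}")

tracker = CostTracker(budget=5.0)
tracker.record(1.20)      # fine
tracker.record(2.50)      # fine: $3.70 total
try:
    tracker.record(1.40)  # $5.10 total: halts here
except BudgetExceeded as e:
    print(e)
```

Because recursive children report through the same tracker, a runaway decomposition can't silently multiply cost past the cap.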
My recommendation: static analysis for continuous monitoring, RLM enrichment for architecture reviews or before major refactors.
The Output
A full pipeline generates four artifacts:
- `scan.json` — module structure (files, imports, functions, classes)
- `arch.json` — architecture analysis (cycles, hubs, layers, anti-patterns)
- `viz.html` — interactive D3.js dependency graph (zoom, pan, click-to-inspect)
- `report.html` — structured HTML architecture report
```shell
# Full pipeline
./run_analysis.sh /path/to/your/repo myproject

# Generates: myproject_scan.json, myproject_arch.json,
#            myproject_viz.html, myproject_report.html
```
Layer Classification
```mermaid
graph TB
    P["Presentation\nCLI, API, UI"] --> App["Application\nBusiness logic"]
    App --> Dom["Domain\nCore models"]
    Dom --> Infra["Infrastructure\nDB, Network, I/O"]
    Infra --> Util["Utility\nHelpers, config"]
    Util -.->|"Layer Violation"| App
    style Util fill:#fef3c7,stroke:#d97706
    style App fill:#fef3c7,stroke:#d97706
```
Modules are assigned to architectural layers based on their position in the dependency graph. A layer violation — like a utility module importing from the application layer — is automatically flagged. The dependency arrow points upward when it should only point down.
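The violation check itself reduces to comparing layer ranks across each edge. A hypothetical sketch with made-up module names, not the tool's actual classifier:

```python
# Hypothetical layer-violation check (ranks and module names are illustrative)
LAYER_RANK = {"presentation": 0, "application": 1, "domain": 2,
              "infrastructure": 3, "utility": 4}

def find_violations(edges, module_layer):
    """Dependencies should point down the stack (toward utility).
    An edge from a lower layer back up to a higher one is a violation."""
    return [
        (src, dst) for src, dst in edges
        if LAYER_RANK[module_layer[src]] > LAYER_RANK[module_layer[dst]]
    ]

edges = [("cli", "services"), ("services", "helpers"), ("helpers", "services")]
layers = {"cli": "presentation", "services": "application", "helpers": "utility"}

print(find_violations(edges, layers))
# [('helpers', 'services')]: a utility module importing from the application layer
```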
Try It
```shell
git clone https://github.com/knijesh/rlm-codelens.git
cd rlm-codelens
uv sync --extra dev

# Analyze any repo — no API keys needed
uv run rlmc analyze-architecture --repo /path/to/your/repo
```
Or use the Python API:
```python
from rlm_codelens import RepositoryScanner, CodebaseGraphAnalyzer

scanner = RepositoryScanner("/path/to/repo")
structure = scanner.scan()
print(f"{structure.total_files} files, {structure.total_lines:,} LOC")

analyzer = CodebaseGraphAnalyzer(structure)
analysis = analyzer.analyze()
print(f"{len(analysis.cycles)} cycles")
print(f"{len(analysis.anti_patterns)} anti-patterns")
analysis.save("architecture.json")
```
The tool is open source under MIT. The most interesting insights usually come from the codebases you think are already clean.
Feel free to fork it and play around with it. If it’s useful, please ⭐ the repository.