I already wrote about why I built CodeClone and why I care about baseline-aware
code health:
I built a baseline-aware Python code health tool for CI and AI-assisted coding
This post is about what changed in 2.0.0b3.
The short version: this is the first release where CodeClone feels less like a Python structural analysis CLI and more like a serious MCP surface for AI coding agents.
Not by building a second engine.
Not by adding AI-specific heuristics to the core.
But by exposing the same deterministic, baseline-aware pipeline through a read-only MCP layer that agents can actually use.
Why MCP mattered for CodeClone
Once you start using coding agents seriously, the hard part is not "can the model write code?"
The harder questions are:
- what changed structurally?
- is this debt new or already accepted in baseline?
- is this production risk or just test noise?
- should this block CI?
- what is the safest next refactor target?
That is the gap I wanted CodeClone to close.
What shipped in 2.0.0b3
The headline is an optional MCP server:
pip install --pre "codeclone[mcp]"
codeclone-mcp --transport stdio
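For an MCP client that launches stdio servers from a JSON config file, registration can look like the sketch below. This follows the common `mcpServers` convention used by clients such as Claude Desktop; the exact keys depend on your client, so treat the shape as an assumption:

```json
{
  "mcpServers": {
    "codeclone": {
      "command": "codeclone-mcp",
      "args": ["--transport", "stdio"]
    }
  }
}
```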
Since b3 is still a beta, the --pre flag matters here.
But the useful part is the workflow around it.
b3 adds three things that matter together:
- a read-only MCP surface for agents and IDE clients
- review-oriented workflows: changed-files analysis, run comparison, gate preview, and PR summaries
- tighter surrounding surfaces: stronger SARIF, better HTML navigation, and directory hotspots
There is also a packaging change worth mentioning:
- CodeClone source code is now under MPL-2.0; documentation stays under MIT
What makes this MCP layer different
I think there are a lot of tools now that can expose "some analysis" over MCP.
What I wanted from CodeClone was stricter than that.
1. Canonical-report-first
The MCP layer is not a second truth path.
It reads the same canonical report model as the CLI, HTML, and SARIF surfaces.
That means an agent is not looking at an "AI view" that quietly disagrees with what CI or the report says.
2. Read-only
This was non-negotiable for me.
CodeClone MCP does not mutate:
- source files
- baselines
- repository state
- on-disk report artifacts
The only mutable part is session-local review state, and that stays in memory only.
3. Budget-aware by design
This is the part I ended up caring about more than I expected.
A lot of MCP tools are technically useful, but easy to use badly. An agent can burn a lot of tokens just by listing too much too early.
CodeClone MCP is intentionally shaped so that the cheapest useful path is also the default, most obvious path.
It is not only bounded in payload shape; it actively guides agents toward low-cost, high-signal workflows.
The workflow I wanted agents to follow
The right first pass is not "dump all findings."
In practice, the first useful question is rarely “show me everything.”
It is usually “where should I look first?”
In tool terms, that looks like:
analyze_repository or analyze_changed_paths
→ get_run_summary or get_production_triage
→ list_hotspots or focused check_*
→ get_finding
→ get_remediation
That sounds simple, but it matters a lot.
It means:
- cheap overview first
- narrow triage second
- deep detail only when it is actually needed
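The triage-first shape above can be sketched in plain Python. The function names mirror the MCP tool names, but the report fields and payload shapes here are invented purely for illustration:

```python
# Hypothetical sketch of the triage-first workflow.
# Tool names echo CodeClone MCP; data shapes are made up.

def get_run_summary(report):
    # Cheap overview: counts only, no finding bodies.
    return {"findings": len(report["findings"]),
            "health_score": report["health_score"]}

def list_hotspots(report, limit=3):
    # Narrow triage: top-N finding ids ranked by priority.
    ranked = sorted(report["findings"], key=lambda f: -f["priority"])
    return [f["id"] for f in ranked[:limit]]

def get_finding(report, finding_id):
    # Deep detail: full body for a single finding, fetched on demand.
    return next(f for f in report["findings"] if f["id"] == finding_id)

report = {
    "health_score": 34,
    "findings": [
        {"id": "F1", "priority": 9, "detail": "clone group across 4 files"},
        {"id": "F2", "priority": 3, "detail": "dead helper function"},
        {"id": "F3", "priority": 7, "detail": "oversized class"},
    ],
}

print(get_run_summary(report))        # overview first
top = list_hotspots(report, limit=2)  # triage second
print(top)
print(get_finding(report, top[0]))    # detail only when needed
```

The point of the sketch is the ordering: each step returns strictly less than a full dump, and the expensive call happens once, at the end.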
That is a better fit for LLMs, and honestly a better fit for humans too.
Real token cost on a dirty repository
I wanted to check whether this was just a nice theory, so I tested it on one of my own messier private Python repos.
It is still in an early development stage, is not public yet, and from CodeClone's point of view it currently has a lot of structural debt.
It works, but "works" and "structurally healthy" are obviously not the same thing.
In one local run, that looked like this:
- 449 Python files
- 108,939 lines
- 2,729 functions
- 1,048 classes
- 659 findings
- health score 34 (F)
Then I compared two MCP paths.
Broad first-pass flow
A more naive "ask for a lot of things" first pass came out to about:
- 10,566 tokens
Guided first-pass flow
Following the new MCP guidance:
analyze_repository → get_production_triage → list_hotspots → get_finding → get_remediation
The same first-pass workflow came out to about:
- 2,535 tokens
That is roughly a 76% reduction in token cost for a useful first pass.
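The arithmetic behind that figure, as a quick sanity check:

```python
# Token counts from the two first-pass runs above.
broad, guided = 10_566, 2_535
reduction = 1 - guided / broad
print(f"{reduction:.0%}")
```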
The payloads did not magically become tiny; the main change was that the MCP surface now guided the client toward a narrower first-pass workflow.
That result mattered to me because it changed how I think about MCP quality.
For agent tooling, payload size is only half the story.
The other half is whether the server nudges the agent toward the right path.
Why this matters for PR review
In practice, the most valuable agent loop is usually not “analyze the whole repository forever,” but “review what changed, compare it to baseline, and decide whether anything should block the merge.”
The day-to-day loop is closer to:
- code changed
- tests passed
- now check whether the structure got better or worse
That is why b3 puts a lot of weight on changed-scope review.
With CodeClone MCP, an agent can now ask things like:
- what findings touch the files changed in this branch?
- are these findings new relative to baseline?
- what is the highest-priority structural issue here?
- would this fail CI?
- can I produce a short PR-ready summary?
That is a much better review loop than a giant flat findings dump.
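A sketch of what that decision looks like on the client side, assuming the agent has already pulled findings, baseline identities, and the changed-file list over MCP. All field names here are invented for illustration:

```python
# Hypothetical changed-scope review gate; data shapes are made up.

def review_changed_files(findings, baseline_ids, changed_files):
    # Only look at findings that touch files changed in this branch.
    changed = [f for f in findings if f["file"] in changed_files]
    # Debt already accepted in baseline does not block the merge.
    new = [f for f in changed if f["id"] not in baseline_ids]
    # Only new, error-severity findings are merge-blocking.
    blocking = [f for f in new if f["severity"] == "error"]
    return {
        "touched": len(changed),
        "new_vs_baseline": len(new),
        "should_block_ci": bool(blocking),
    }

findings = [
    {"id": "F1", "file": "app/core.py", "severity": "error"},
    {"id": "F2", "file": "app/core.py", "severity": "warning"},
    {"id": "F3", "file": "tests/test_core.py", "severity": "error"},
]
baseline_ids = {"F2"}            # already accepted debt
changed_files = {"app/core.py"}  # this branch touches one file

print(review_changed_files(findings, baseline_ids, changed_files))
```

Note how the test-file finding never enters the loop at all: changed scope filters it out before severity is even considered.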
What the MCP surface is good at now
The shape I like most is:
- full repository analysis when you need canonical truth
- changed-files analysis when you need review focus
- compact triage first
- single-finding drill-down second
- markdown PR summary at the end
In practice, that keeps prompts simple.
For example:
Changed-files review
Use CodeClone MCP to review the files changed in this branch.
Show me only findings that touch changed files, rank them by priority, and tell me whether anything here should block CI.
Safe refactor pick
Use CodeClone MCP to find one high-priority structural issue that looks safe to refactor. Explain why it is a good first target and what refactor shape you would use. Do not change code yet.
AI-generated code check
I added a lot of code with an AI agent.
Use CodeClone MCP to check for structural drift: new clone groups, duplicated branches, dead code, or design hotspots. Prioritize what is new relative to baseline.
That is the kind of MCP ergonomics I was aiming for: prompts stay fairly client-agnostic, and the server gives the agent a disciplined path.
b3 is not only about MCP
Even though MCP is the headline, I did not want it to be isolated from the rest of the product.
2.0.0b3 also tightens the surrounding surfaces:
- canonical report schema 2.2
- cache schema 2.3
- canonical design-finding thresholds recorded in report metadata
- Hotspots by Directory in the HTML overview
- stronger SARIF identities for code-scanning workflows
- Composite GitHub Action v2 for CI and PR automation
That matters because I want all of these surfaces to agree:
- CLI for CI
- MCP for agents
- HTML for navigation
- SARIF for platform workflows
The product truth I am taking from this release
The biggest lesson from b3 is that a good MCP server is not just a pile of tools.
It is a control surface.
For CodeClone, that now means:
- deterministic
- canonical-report-first
- read-only
- budget-aware
- triage-first
- agent-guiding
That is the direction I want to keep pushing.
Not "AI magic."
Better control loops.
Try it (don't forget to use --pre)
- GitHub: orenlab/codeclone
- Docs: orenlab.github.io/codeclone
- MCP guide: orenlab.github.io/codeclone/mcp/
- PyPI: pypi.org/project/codeclone
If you are already building with MCP clients, I would especially love feedback on one question:
what would make PR review through an MCP tool genuinely useful for your team?