I already wrote about why I built CodeClone and why I care about baseline-aware
code health:
I built a baseline-aware Python code health tool for CI and AI-assisted coding
This post is about what changed in 2.0.0b3.
The short version: this is the first release where CodeClone feels less like a Python structural analysis CLI and more like a serious MCP surface for AI coding agents.
Not by building a second engine.
Not by adding AI-specific heuristics to the core.
But by exposing the same deterministic, baseline-aware pipeline through a read-only MCP layer that agents can actually use.
Why MCP mattered for CodeClone
Once you start using coding agents seriously, the hard part is not "can the model write code?"
The harder questions are:
- what changed structurally?
- is this debt new or already accepted in baseline?
- is this production risk or just test noise?
- should this block CI?
- what is the safest next refactor target?
That is the gap I wanted CodeClone to close.
What shipped in 2.0.0b3
The headline is an optional MCP server:
pip install --pre "codeclone[mcp]"
codeclone-mcp --transport stdio
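For an MCP client that launches stdio servers from a JSON config file, registration can look like the sketch below. This follows the common `mcpServers` convention used by clients such as Claude Desktop; the exact keys depend on your client, so treat the shape as an assumption:

```json
{
  "mcpServers": {
    "codeclone": {
      "command": "codeclone-mcp",
      "args": ["--transport", "stdio"]
    }
  }
}
```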
Since b3 is still a beta, the --pre flag matters here.
But the useful part is the workflow around it.
b3 adds three things that matter together:
- a read-only MCP surface for agents and IDE clients
- review-oriented workflows: changed-files analysis, run comparison, gate preview, and PR summaries
- tighter surrounding surfaces: stronger SARIF, better HTML navigation, and directory hotspots
There is also a packaging change worth mentioning:
- CodeClone source code is now under MPL-2.0; documentation stays under MIT
What makes this MCP layer different
I think there are a lot of tools now that can expose "some analysis" over MCP.
What I wanted from CodeClone was stricter than that.
1. Canonical-report-first
The MCP layer is not a second truth path.
It reads the same canonical report model as the CLI, HTML, and SARIF surfaces.
That means an agent is not looking at an "AI view" that quietly disagrees with what CI or the report says.
2. Read-only
This was non-negotiable for me.
CodeClone MCP does not mutate:
- source files
- baselines
- repository state
- on-disk report artifacts
The only mutable part is session-local review state, and that stays in memory only.
3. Budget-aware by design
This is the part I ended up caring about more than I expected.
A lot of MCP tools are technically useful, but easy to use badly. An agent can burn a lot of tokens just by listing too much too early.
CodeClone MCP is intentionally shaped so that the cheapest useful path is also the default, most obvious path.
It is not only bounded in payload shape; it actively guides agents toward low-cost, high-signal workflows.
The workflow I wanted agents to follow
The right first pass is not "dump all findings."
In practice, the first useful question is rarely “show me everything.”
It is usually “where should I look first?”
In tool terms, that looks like:
analyze_repository or analyze_changed_paths
→ get_run_summary or get_production_triage
→ list_hotspots or focused check_*
→ get_finding
→ get_remediation
That sounds simple, but it matters a lot.
It means:
- cheap overview first
- narrow triage second
- deep detail only when it is actually needed
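The triage-first shape above can be sketched in plain Python. The function names mirror the MCP tool names, but the report fields and payload shapes here are invented purely for illustration:

```python
# Hypothetical sketch of the triage-first workflow.
# Tool names echo CodeClone MCP; data shapes are made up.

def get_run_summary(report):
    # Cheap overview: counts only, no finding bodies.
    return {"findings": len(report["findings"]),
            "health_score": report["health_score"]}

def list_hotspots(report, limit=3):
    # Narrow triage: top-N finding ids ranked by priority.
    ranked = sorted(report["findings"], key=lambda f: -f["priority"])
    return [f["id"] for f in ranked[:limit]]

def get_finding(report, finding_id):
    # Deep detail: full body for a single finding, fetched on demand.
    return next(f for f in report["findings"] if f["id"] == finding_id)

report = {
    "health_score": 34,
    "findings": [
        {"id": "F1", "priority": 9, "detail": "clone group across 4 files"},
        {"id": "F2", "priority": 3, "detail": "dead helper function"},
        {"id": "F3", "priority": 7, "detail": "oversized class"},
    ],
}

print(get_run_summary(report))        # overview first
top = list_hotspots(report, limit=2)  # triage second
print(top)
print(get_finding(report, top[0]))    # detail only when needed
```

The point of the sketch is the ordering: each step returns strictly less than a full dump, and the expensive call happens once, at the end.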
That is a better fit for LLMs, and honestly a better fit for humans too.
Real token cost on a dirty repository
I wanted to check whether this was just a nice theory, so I tested it on one of my own messier private Python repos.
It is still in an early development stage, is not public yet, and from CodeClone's point of view it currently has a lot of structural debt.
It works, but "works" and "structurally healthy" are obviously not the same thing.
In one local run, that looked like this:
- 449 Python files
- 108,939 lines
- 2,729 functions
- 1,048 classes
- 659 findings
- health score 34 (F)
Then I compared two MCP paths.
Broad first-pass flow
A more naive "ask for a lot of things" first pass came out to about:
- 10,566 tokens
Guided first-pass flow
Following the new MCP guidance:
analyze_repository → get_production_triage → list_hotspots → get_finding → get_remediation
The same first-pass workflow came out to about:
- 2,535 tokens
That is roughly a 76% reduction in token cost for a useful first pass.
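The arithmetic behind that figure, as a quick sanity check:

```python
# Token counts from the two first-pass runs above.
broad, guided = 10_566, 2_535
reduction = 1 - guided / broad
print(f"{reduction:.0%}")
```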
The payloads did not magically become tiny; the main change was that the MCP surface now guided the client toward a narrower first-pass workflow.
That result mattered to me because it changed how I think about MCP quality.
For agent tooling, payload size is only half the story.
The other half is whether the server nudges the agent toward the right path.
Why this matters for PR review
In practice, the most valuable agent loop is usually not “analyze the whole repository forever,” but “review what changed, compare it to baseline, and decide whether anything should block the merge.”
The day-to-day loop is closer to:
- code changed
- tests passed
- now check whether the structure got better or worse
That is why b3 puts a lot of weight on changed-scope review.
With CodeClone MCP, an agent can now ask things like:
- what findings touch the files changed in this branch?
- are these findings new relative to baseline?
- what is the highest-priority structural issue here?
- would this fail CI?
- can I produce a short PR-ready summary?
That is a much better review loop than a giant flat findings dump.
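A sketch of what that decision looks like on the client side, assuming the agent has already pulled findings, baseline identities, and the changed-file list over MCP. All field names here are invented for illustration:

```python
# Hypothetical changed-scope review gate; data shapes are made up.

def review_changed_files(findings, baseline_ids, changed_files):
    # Only look at findings that touch files changed in this branch.
    changed = [f for f in findings if f["file"] in changed_files]
    # Debt already accepted in baseline does not block the merge.
    new = [f for f in changed if f["id"] not in baseline_ids]
    # Only new, error-severity findings are merge-blocking.
    blocking = [f for f in new if f["severity"] == "error"]
    return {
        "touched": len(changed),
        "new_vs_baseline": len(new),
        "should_block_ci": bool(blocking),
    }

findings = [
    {"id": "F1", "file": "app/core.py", "severity": "error"},
    {"id": "F2", "file": "app/core.py", "severity": "warning"},
    {"id": "F3", "file": "tests/test_core.py", "severity": "error"},
]
baseline_ids = {"F2"}            # already accepted debt
changed_files = {"app/core.py"}  # this branch touches one file

print(review_changed_files(findings, baseline_ids, changed_files))
```

Note how the test-file finding never enters the loop at all: changed scope filters it out before severity is even considered.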
What the MCP surface is good at now
The shape I like most is:
- full repository analysis when you need canonical truth
- changed-files analysis when you need review focus
- compact triage first
- single-finding drill-down second
- markdown PR summary at the end
In practice, that keeps prompts simple.
For example:
Changed-files review
Use CodeClone MCP to review the files changed in this branch.
Show me only findings that touch changed files, rank them by priority, and tell me whether anything here should block CI.
Safe refactor pick
Use CodeClone MCP to find one high-priority structural issue that looks safe to refactor. Explain why it is a good first target and what refactor shape you would use. Do not change code yet.
AI-generated code check
I added a lot of code with an AI agent.
Use CodeClone MCP to check for structural drift: new clone groups, duplicated branches, dead code, or design hotspots. Prioritize what is new relative to baseline.
That is the kind of MCP ergonomics I was aiming for: prompts stay fairly client-agnostic, and the server gives the agent a disciplined path.
b3 is not only about MCP
Even though MCP is the headline, I did not want it to be isolated from the rest of the product.
2.0.0b3 also tightens the surrounding surfaces:
- canonical report schema 2.2
- cache schema 2.3
- canonical design-finding thresholds recorded in report metadata
- Hotspots by Directory in the HTML overview
- stronger SARIF identities for code-scanning workflows
- Composite GitHub Action v2 for CI and PR automation
That matters because I want all of these surfaces to agree:
- CLI for CI
- MCP for agents
- HTML for navigation
- SARIF for platform workflows
The product truth I am taking from this release
The biggest lesson from b3 is that a good MCP server is not just a pile of tools.
It is a control surface.
For CodeClone, that now means:
- deterministic
- canonical-report-first
- read-only
- budget-aware
- triage-first
- agent-guiding
That is the direction I want to keep pushing.
Not "AI magic."
Better control loops.
Try it (don't forget to use --pre)
- GitHub: orenlab/codeclone
- Docs: orenlab.github.io/codeclone
- MCP guide: orenlab.github.io/codeclone/mcp/
- PyPI: pypi.org/project/codeclone
If you are already building with MCP clients, I would especially love feedback on one question:
what would make PR review through an MCP tool genuinely useful for your team?