## The Problem: Agents Grading Their Own Homework
If you’re running LLM agents in production — whether with LangChain, CrewAI, or custom pipelines — you’ve probably built some kind of output validation. Maybe a second LLM call checks the first one’s work. Maybe you parse for structural issues.
Here’s what I kept finding: LLM-based self-review has a systematic leniency bias. When you prompt an LLM to review output from another LLM (or itself), it overwhelmingly approves. The reviewer and generator share similar blind spots — they fail in correlated ways.
This matters when your agent writes code that gets deployed, generates customer-facing content, or makes decisions affecting downstream systems.
## The Approach: Adversarial Review with Dual Consensus
AgentDesk provides two interfaces for adding adversarial review:
- MCP Server (open source, MIT) — review-only. Pass in any content, get structured quality feedback. Runs locally with your own API key.
- Hosted REST API — generate + review + auto-fix in one call. Submit a prompt, get reviewed output back.
Both use the same core approach:
- Two independent reviewers evaluate the output, each prompted adversarially (their job is to find problems, not confirm quality)
- Dual consensus — both must agree on pass/fail
- Substantive quality validation — a deterministic (non-LLM) layer that requires every checklist item to include specific evidence quoted from the output. Missing citations → automatic fail.
- Scored 0-100 with structured feedback on every issue found
It’s BYOK — you supply your own LLM API key. AgentDesk handles orchestration only.
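The dual-consensus rule is simple to state: both reviewers must pass, and either one can veto. A minimal sketch of that merge logic — the `Review` shape and `merge` function here are illustrative, not AgentDesk's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Review:
    verdict: str  # "PASS" or "FAIL"
    score: int    # 0-100

def merge(a: Review, b: Review) -> Review:
    # Both reviewers must pass; either one can veto the whole result.
    verdict = "PASS" if a.verdict == b.verdict == "PASS" else "FAIL"
    # Scores are averaged regardless of the verdict.
    return Review(verdict=verdict, score=(a.score + b.score) // 2)
```

The point of the veto semantics is that a single lenient reviewer can't rescue bad output — both independently prompted reviewers have to miss a flaw for it to slip through.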
## When to Use This
- CI pipeline for generated code — gate merges on review score
- Content QA for chatbot outputs — catch hallucinations before they reach users
- Data extraction validation — verify structured output completeness
- Multi-agent workflow checkpoints — validate intermediate outputs between agents
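For the CI use case, the gate itself can be a few lines: parse the review result and block the merge below a threshold. A sketch, assuming the review response fields (`verdict`, `score`, `issues`) shown in the MCP example in this post — the threshold and function name are mine:

```python
def gate(review: dict, min_score: int = 70) -> bool:
    """Return True if the merge should be allowed through."""
    if review["verdict"] == "FAIL":
        return False
    # Block on any critical-severity issue, even if the score clears the bar.
    if any(i["severity"] == "critical" for i in review.get("issues", [])):
        return False
    return review["score"] >= min_score
```

In a CI step you would wire this to the exit code, e.g. `sys.exit(0 if gate(review) else 1)`, so a failing review fails the build.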
## Option 1: MCP Server (Review Only)
Install in one command and use from Claude Code, Claude Desktop, or any MCP client:
```shell
claude mcp add agentdesk-mcp -- npx @ezark-publish/agentdesk-mcp
```
The review_output tool takes content you’ve already generated and reviews it:
```json
{
  "output": "Your agent generated content goes here...",
  "review_type": "code",
  "criteria": "Check for security vulnerabilities and race conditions"
}
```
Response:
```json
{
  "verdict": "FAIL",
  "score": 38,
  "issues": [
    {
      "severity": "critical",
      "category": "concurrency",
      "description": "Race condition in refill logic under concurrent access",
      "suggestion": "Use atomic CAS operation or mutex lock"
    }
  ],
  "checklist": [
    {
      "item": "Thread safety",
      "status": "fail",
      "evidence": "lines 23-31: non-atomic read-modify-write on token count"
    }
  ],
  "summary": "Critical concurrency issues found. Not safe for production use."
}
```
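Because the response is structured, downstream tooling can consume it directly rather than parsing prose. For example, a short report renderer over the `verdict`, `score`, `summary`, and `issues` fields (the `summarize` helper is mine, not part of the MCP server):

```python
def summarize(review: dict) -> str:
    """Render a review response as a compact plain-text report."""
    lines = [f"{review['verdict']} ({review['score']}/100): {review['summary']}"]
    # One line per issue: [severity/category] description
    for issue in review.get("issues", []):
        lines.append(f"  [{issue['severity']}/{issue['category']}] {issue['description']}")
    return "\n".join(lines)
```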
For higher confidence, review_dual runs two independent reviewers and merges their verdicts. Either reviewer can veto a pass.
## Option 2: REST API (Generate + Review + Auto-Fix)
For non-MCP integration, the hosted API generates output, reviews it, and auto-fixes (up to 2 iterations) in a single call:
```shell
curl -X POST https://agentdesk.usedevtools.com/api/v1/tasks \
  -H "Authorization: Bearer agd_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a token bucket rate limiter in Python",
    "api_key": "sk-ant-your-anthropic-key",
    "review": true,
    "review_type": "code",
    "dual_review": true
  }'
```
The same request from Python, polling until the task completes:

```python
import time

import requests

headers = {"Authorization": "Bearer agd_your_key"}

# Submit the task: generate + dual adversarial review in one call
resp = requests.post(
    "https://agentdesk.usedevtools.com/api/v1/tasks",
    headers=headers,
    json={
        "prompt": "Write a token bucket rate limiter in Python",
        "api_key": "sk-ant-your-anthropic-key",
        "review": True,
        "review_type": "code",
        "dual_review": True,
    },
)
task = resp.json()

# Poll until the task reaches a terminal state
while True:
    result = requests.get(
        f"https://agentdesk.usedevtools.com/api/v1/tasks/{task['id']}",
        headers=headers,
    ).json()
    if result["status"] in ("completed", "failed"):
        break
    time.sleep(2)

print(result["review"]["verdict"])  # PASS / FAIL / CONDITIONAL_PASS
print(result["review"]["score"])    # 0-100
```
## How It Works Internally
- Reviewer A gets the output with an adversarial system prompt — find every flaw: factual errors, logical gaps, missing requirements.
- Reviewer B independently evaluates from a different angle — completeness, edge cases, whether the output actually addresses the task.
- Substantive quality check (deterministic, not LLM-based): every checklist item must include a specific quote from the output as evidence. Missing citations → item force-failed. If >30% of items lack evidence, entire review capped at score 50.
- Consensus engine combines reviews. Both must pass. Scores averaged. Disagreements flagged.
Each reviewer gets a fresh API call with only the output to review and a distinct system prompt — no shared conversation history.
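The deterministic layer is easy to reason about precisely because it isn't an LLM. A sketch of the evidence rule exactly as described above — force-fail uncited items, cap the score at 50 if more than 30% of items lack evidence. The function name is mine, and a real implementation would also verify that the cited evidence actually refers to the reviewed output:

```python
def enforce_evidence(checklist: list[dict], score: int) -> tuple[list[dict], int]:
    """Apply the substantive quality check to a merged review."""
    missing = 0
    for item in checklist:
        if not item.get("evidence", "").strip():
            item["status"] = "fail"  # missing citation -> automatic item fail
            missing += 1
    # Too many unevidenced claims -> cap the whole review at 50
    if checklist and missing / len(checklist) > 0.30:
        score = min(score, 50)
    return checklist, score
```

This is the piece that catches a reviewer that rubber-stamps ("looks good, all items pass") without pointing at anything concrete in the output.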
## How This Compares
| Approach | Correlation with generator | Cost | Runtime check? |
|---|---|---|---|
| Self-review (same model) | High | 1 LLM call | Yes |
| Chain-of-verification | Medium | 2-3 LLM calls | Yes |
| AgentDesk adversarial | Low | 2-3 LLM calls | Yes |
| Offline eval (Braintrust, DeepEval) | N/A | Varies | No |
| Human review | None | $$$ + slow | Partially |
## Pricing
- Free: 20 tasks/month (BYOK)
- Starter: $29/mo — 500 tasks
- Pro: $79/mo — 5,000 tasks + dual review + workflows
- Team: $199/mo — 50,000 tasks
The MCP server is free and open source — you only pay for your own LLM API calls.
Get started in 30 seconds:
- MCP: `claude mcp add agentdesk-mcp -- npx @ezark-publish/agentdesk-mcp`
- REST API: sign up for a free API key
- GitHub: github.com/Rih0z/agentdesk-mcp
If you’re building with AI agents, I’d like to hear what’s working for you on quality control. Drop a comment or open an issue on GitHub.