Most MCP security analysis posts start with a few hundred servers. Some reach 1,800.
We indexed 5,154.
CraftedTrust is an independent trust registry for the MCP server ecosystem. We've been scanning, scoring, and cataloging every MCP server we can find — npm packages, GitHub repos, and live endpoints. As of today, we've built what we believe is the largest trust-scored dataset of MCP servers in existence.
Here's what we found.
## The Numbers
| Metric | Count |
|---|---|
| Total MCP servers indexed | 5,154 |
| Live-verified (actual handshake + deep probe) | 118 |
| Static-analyzed (npm metadata + repo signals) | 5,027 |
| Unique vulnerability findings | 62 |
| High-severity vulnerabilities | 23 |
| Published security advisories | 5 |
| Active coordinated disclosures | 9 |
| Security checks in our model | 60 |
That last number matters. Our scanner, Touchstone, runs 60 automated security checks across 8 domains every time we assess a server. This isn't a surface-level metadata scrape — it's protocol-level interrogation.
## Trust Score Distribution
Every server gets a trust score from 0 to 100, computed across 12 CoSAI-aligned factors. Here's how the 118 live-verified servers break down:
```text
Trusted   (80-100)  ████████████████████████              46 servers (39%)
Moderate  (60-79)   ████████████████████████████████████  70 servers (59%)
Caution   (40-59)   █                                      1 server  (<1%)
Warning   (20-39)   █                                      1 server  (<1%)
Dangerous (0-19)                                           0 servers
```
Average live trust score: 76/100.
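Under the hood, the banding is a straightforward threshold mapping. A minimal sketch (function name assumed; thresholds taken from the chart above):

```python
def trust_band(score: int) -> str:
    """Map a 0-100 trust score to the band names used in the distribution chart."""
    if score >= 80:
        return "Trusted"
    if score >= 60:
        return "Moderate"
    if score >= 40:
        return "Caution"
    if score >= 20:
        return "Warning"
    return "Dangerous"
```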
The good news: 98.3% of live-scanned servers score 60 or above. The MCP ecosystem isn't a wasteland.
The bad news: static-analyzed npm packages tell a different story. Their average score is 54/100 — a full 22 points lower than live servers. Many packages have no README, no license, stale dependencies, and no security policy. They're published and forgotten.
The full distribution for the 5,027 static packages skews heavily toward the middle — lots of C-grade servers that work, but haven't earned trust.
## Top 5 Vulnerability Patterns
Our 60-check Touchstone scanner categorizes findings across 8 security domains. Here's where MCP servers are failing most often:
### 1. Supply Chain Gaps — 44 findings (71% of all findings)
This is the dominant problem. Most MCP servers on npm have:
- No provenance attestation. No sigstore, no build attestation linking the published package to its source repo. Anyone could have published it.
- Single-maintainer risk. One compromised npm account = full supply chain takeover of every downstream agent using that tool.
- No package integrity verification. The `package.json` says one thing; the published tarball says another.
We found two packages impersonating well-known tools — one claiming to be a Notion MCP server, another a Gmail server — with zero cryptographic proof linking them to the official source. Both are now in coordinated disclosure.
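A crude version of the impersonation check is string similarity against a list of known-good names. A sketch using Python's standard `difflib`, assuming a hypothetical curated list (the real detector also weighs maintainer identity and provenance, which similarity alone can't capture):

```python
from difflib import SequenceMatcher

# Illustrative allow-list; a real registry would maintain a much larger one.
OFFICIAL = ["notion-mcp-server", "server-gmail-mcp"]

def typosquat_candidates(name: str, official=OFFICIAL, threshold: float = 0.85):
    """Return official package names suspiciously similar to (but not equal to) `name`."""
    return [o for o in official
            if o != name and SequenceMatcher(None, name, o).ratio() >= threshold]
```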
### 2. Infrastructure Misconfiguration — 6 findings
Servers binding to 0.0.0.0 with no authentication. Missing rate limits. Missing CORS configuration. Stack traces in error responses. These aren't exotic vulnerabilities — they're deployment hygiene that nobody checked because there's no standard saying you should.
### 3. Authentication Weaknesses — 6 findings
MCP doesn't mandate authentication. Many servers don't implement it. Of those that do, we found missing PKCE enforcement on OAuth flows, overly broad token scopes, and tokens that never expire. One server accepted any bearer token without validation.
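The missing-PKCE case can often be caught by inspecting the authorization URL a server constructs. A minimal sketch (function name assumed; it checks only that an S256 code challenge is present, one of several properties an OAuth 2.1 flow must satisfy):

```python
from urllib.parse import urlparse, parse_qs

def pkce_enforced(authorize_url: str) -> bool:
    """Check that an OAuth authorization URL carries a PKCE S256 code challenge."""
    params = parse_qs(urlparse(authorize_url).query)
    return (params.get("code_challenge", [""])[0] != ""
            and params.get("code_challenge_method", [""])[0] == "S256")
```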
### 4. Data Security Issues — 4 findings
Credential patterns appearing in tool descriptions. API keys in error messages. PII in tool responses with no data classification or filtering. When your AI agent calls a tool and the response includes your AWS secret key in a stack trace, that's not a feature.
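Findings like these are typically caught with pattern matching over tool descriptions and responses. A simplified sketch with a few illustrative regexes (a production scanner needs a far broader, tuned pattern set):

```python
import re

# Illustrative patterns only; real scanners use many more, with entropy checks.
CREDENTIAL_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token":   re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}"),
    "private_key":    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan_for_credentials(text: str) -> list[str]:
    """Return names of credential patterns found in tool output or descriptions."""
    return [name for name, pat in CREDENTIAL_PATTERNS.items() if pat.search(text)]
```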
### 5. Input Validation Failures — 1 confirmed, more under disclosure
SSRF vectors through unrestricted URL parameters. Command injection through tool parameters that get passed to shell commands. Path traversal in filesystem tools. The confirmed finding: a browser automation server that let you navigate to http://169.254.169.254 (AWS metadata endpoint) with zero validation.
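The metadata-endpoint case illustrates why URL parameters need validation before a tool fetches them. A minimal SSRF guard for literal-IP hosts, using Python's standard `ipaddress` module (a real check must also resolve hostnames to IPs and re-validate after every redirect):

```python
import ipaddress
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs whose host is a private, loopback, link-local, or reserved
    address (which covers the cloud metadata endpoint 169.254.169.254)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Hostname, not a literal IP: must be resolved and re-checked before use.
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved)
```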
## What We Actually Check: 60 Checks, 8 Domains
Most scanning tools run a handful of surface-level checks. Here's the full scope of what Touchstone evaluates:
| Domain | Checks | What We're Looking For |
|---|---|---|
| Authentication & Authorization | 9 | OAuth 2.1, PKCE, token storage, scope analysis, session fixation, RFC 8707 |
| Tool Security | 10 | Prompt injection in descriptions, parameter injection, rug-pull detection via tool hash tracking, shadowing, permission over-privilege |
| Input Validation | 9 | SSRF (private IPs, cloud metadata), command injection, SQL injection, path traversal, DNS rebinding, URL scheme abuse |
| Data Security | 6 | Credential patterns, PII exposure, secrets in errors/logs, cross-server data leakage |
| Supply Chain | 8 | npm provenance, CVE matching, typosquat detection, maintainer reputation, dependency confusion, source-to-package matching |
| Infrastructure | 8 | Network binding, TLS enforcement, rate limiting, CORS, error handling, HTTP security headers, DNS rebinding protection |
| Runtime | 5 | Guardrail bypass, response size limits, timeout enforcement, concurrency handling, kill switch presence |
| A2A Agent Cards | 5 | Prompt injection in agent cards, obfuscated content, identity spoofing, HTTP-only serving, excessive capability claims |
Severity breakdown across all 60 checks: 13 critical, 25 high, 17 medium, 1 low.
Every single finding is mapped to CWE identifiers and scored using AIVSS (AI Vulnerability Scoring System) — a weighted formula that accounts for AI-specific factors like autonomy level, decision criticality, and cascading potential that CVSS alone can't capture.
## Static Analysis vs. Live Verification — We Do Both
This is where most tools diverge. Some scan npm metadata. Some probe live endpoints. We do both, and we weight them differently.
### Static Analysis (5,027 packages)
For every npm package with MCP-related keywords, we score 7 factors:
- Maintenance recency — When was it last published?
- Dependency health — How many deps? Any known CVEs?
- Popularity — Weekly downloads as a signal (not a guarantee)
- Documentation — README quality, description, MCP keyword presence
- Repository activity — GitHub stars, recent commits
- License clarity — Recognized OSS license present?
- Security policy — SECURITY.md exists?
This catches the long tail: abandoned packages, documentation-free tools, and typosquats that never run on a live server but still get npm installed into production.
### Live Verification (118 servers)
For servers with a reachable endpoint, we go deeper:
- MCP handshake — Full JSON-RPC `initialize` exchange
- Tool discovery — List every tool, resource, and prompt
- Schema analysis — Validate parameter types, required fields, injection patterns
- Deep probes — Actually call tools with test inputs, check error handling, validate TLS, test protocol compliance
- Hash tracking — SHA-256 hash every tool's description and schema. Compare across scans. Detect rug pulls (a server that changes its tools after initial review).
- Network analysis — Check for undeclared outbound connections, suspicious TLDs
- 12-factor scoring — The full trust model (see below)
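The hash-tracking step above can be sketched as canonical-JSON hashing plus a diff between scans (function names assumed):

```python
import hashlib
import json

def tool_fingerprint(tool: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a tool's description and schema."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def detect_rug_pull(previous: dict, current: dict) -> list[str]:
    """Names of tools whose fingerprint changed (or vanished) since the last scan."""
    curr = {name: tool_fingerprint(t) for name, t in current.items()}
    return sorted(name for name, tool in previous.items()
                  if curr.get(name) != tool_fingerprint(tool))
```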
When both exist, the combined score is weighted 60% live / 40% static. Live behavior is more trustworthy than metadata claims.
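As a sketch, the blend is a plain weighted average (shown here with the ecosystem averages from earlier as inputs):

```python
def combined_trust_score(live: float, static: float) -> int:
    """Blend live and static scores with the 60/40 weighting described above."""
    return round(0.6 * live + 0.4 * static)
```

For example, a server averaging 76 live and 54 static lands at 67 combined.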
## The 12-Factor Trust Model
Every live-scanned server is scored across 12 factors, organized into 5 groups. Total: 100 points.
Here's a real breakdown — our own MCP server at mcp.craftedtrust.com, which scores 81/100 (Grade B, Trusted):
```text
                        Score  Max  Rating
─── Authentication & Access ────────────────
Identity & Auth           10    10  ██████████  Pass
Permission Scope           7     8  ████████▒   Pass
─── Server Security ────────────────────────
Transport Security         8     8  ████████    Pass
Network Behavior          10    10  ██████████  Pass
Protocol Compliance        8     8  ████████    Pass
─── Tool Safety ────────────────────────────
Declaration Accuracy       8     8  ████████    Pass
Tool Integrity            10    10  ██████████  Pass
Input Validation           7     8  ████████▒   Pass
─── Supply Chain ───────────────────────────
Supply Chain               5     8  ██████▒▒    Warn
Code Transparency          0     6  ▒▒▒▒▒▒      Fail
Publisher Trust            0     8  ▒▒▒▒▒▒▒▒    Fail
─── Data Handling ──────────────────────────
Data Protection            8     8  ████████    Pass
                 TOTAL:   81   100
```
Notice the pattern: security fundamentals are strong (identity, transport, tool integrity all maxed out), but supply chain trust signals are weak. No open-source repo, no publisher verification. This is the most common profile we see — servers that work correctly and securely, but can't prove provenance.
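The total in the table can be reproduced directly from the per-factor values (factor names and points copied from the breakdown above):

```python
# (score, max) per factor, taken from the mcp.craftedtrust.com breakdown.
FACTORS = {
    "Identity & Auth": (10, 10),      "Permission Scope": (7, 8),
    "Transport Security": (8, 8),     "Network Behavior": (10, 10),
    "Protocol Compliance": (8, 8),    "Declaration Accuracy": (8, 8),
    "Tool Integrity": (10, 10),       "Input Validation": (7, 8),
    "Supply Chain": (5, 8),           "Code Transparency": (0, 6),
    "Publisher Trust": (0, 8),        "Data Protection": (8, 8),
}

total = sum(score for score, _ in FACTORS.values())    # earned points
maximum = sum(mx for _, mx in FACTORS.values())        # possible points
```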
Each factor maps to a CoSAI Agentic AI Security Framework category. We also generate mappings for:
- OWASP MCP Top 10 — Tool Poisoning, Excessive Permissions, Insecure Credential Storage, and 7 more
- OWASP Agentic Security Initiatives (ASI) Top 10 — Agent Tool Misuse, Supply Chain Compromise, Goal Hijacking, and 7 more
- MITRE ATLAS — AI Agent Context Poisoning, ML Supply Chain Compromise
- NIST AI RMF — Govern, Map, Measure, Manage functions
- EU AI Act — Articles 9 (Risk Management) and 15 (Accuracy, Robustness, Cybersecurity)
Five compliance frameworks. Every finding. Every server. We haven't seen another MCP scanner that does this.
## Published Advisories
Touchstone's vulnerability research has already produced 5 published advisories and 9 active disclosures under our 90-day coordinated disclosure process. Two examples:
**Arbitrary JavaScript Execution in chrome-local-mcp (Critical)** — The eval endpoint passes user-supplied JavaScript directly to Puppeteer's `page.evaluate()` with zero restrictions. Persistent browser profiles retain login credentials. A prompt injection attack could steal every saved credential in the browser.
**Supply Chain Impersonation (High)** — We found third-party npm packages republishing popular MCP servers (`notion-mcp-server`, `server-gmail-mcp`) without any cryptographic provenance linking them to the original source. If you installed the wrong one, a single maintainer controls your Notion workspace or Gmail inbox.
All advisories: touchstone.craftedtrust.com
## What This Means for the Ecosystem
The MCP ecosystem is growing fast. 5,154 servers and counting. The trust distribution tells a clear story:
**Live servers are mostly fine.** 98% score Moderate or Trusted. The protocol works. Most developers building MCP servers are doing reasonable security work.

**The npm long tail is the risk.** Average score of 54 vs. 76 for live servers. Thousands of packages with no provenance, no maintainer accountability, no security policy. Your AI agent's `npm install` is the attack surface.

**Supply chain is the #1 vulnerability category.** 71% of all findings. This isn't an MCP-specific problem, but MCP amplifies it — because every tool your agent calls is an implicit trust decision made at machine speed.

**Nobody is checking compliance.** We map every finding to 5 frameworks because enterprises will need this. EU AI Act Article 9 requires a risk management system. NIST AI RMF requires assessment and measurement. If your MCP servers aren't scored, you can't prove compliance.
## Try It Yourself
Search any MCP server at touchstone.craftedtrust.com and see its trust score, 12-factor breakdown, and compliance mappings. Or paste a server URL and scan it free — no account required.
The registry, scanner, and API are live. The data is public. Trust, but verify.
CraftedTrust is built by Cyber Craft Solutions. We're building the trust infrastructure for the AI agent ecosystem — from scanning MCP servers to cryptographic audit trails. If you're building with MCP and care about security, we'd like to hear from you.