Most MCP security analysis posts start with a few hundred servers. Some reach 1,800.
We indexed 5,154.
CraftedTrust is an independent trust registry for the MCP server ecosystem. We've been scanning, scoring, and cataloging every MCP server we can find — npm packages, GitHub repos, and live endpoints. As of today, we've built what we believe is the largest trust-scored dataset of MCP servers in existence.
Here's what we found.
## The Numbers
| Metric | Count |
|---|---|
| Total MCP servers indexed | 5,154 |
| Live-verified (actual handshake + deep probe) | 118 |
| Static-analyzed (npm metadata + repo signals) | 5,027 |
| Unique vulnerability findings | 62 |
| High-severity vulnerabilities | 23 |
| Published security advisories | 5 |
| Active coordinated disclosures | 9 |
| Security checks in our model | 60 |
That last number matters. Our scanner, Touchstone, runs 60 automated security checks across 8 domains every time we assess a server. This isn't a surface-level metadata scrape — it's protocol-level interrogation.
## Trust Score Distribution
Every server gets a trust score from 0 to 100, computed across 12 CoSAI-aligned factors. Here's how the 118 live-verified servers break down:
```text
Trusted   (80-100)  ████████████████████████              46 servers (39%)
Moderate  (60-79)   ████████████████████████████████████  70 servers (59%)
Caution   (40-59)   █                                      1 server  (<1%)
Warning   (20-39)   █                                      1 server  (<1%)
Dangerous (0-19)                                           0 servers
```
Average live trust score: 76/100.
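Under the hood, the banding is a straightforward threshold mapping. A minimal sketch (function name assumed; thresholds taken from the chart above):

```python
def trust_band(score: int) -> str:
    """Map a 0-100 trust score to the band names used in the distribution chart."""
    if score >= 80:
        return "Trusted"
    if score >= 60:
        return "Moderate"
    if score >= 40:
        return "Caution"
    if score >= 20:
        return "Warning"
    return "Dangerous"
```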
The good news: 98.3% of live-scanned servers score 60 or above. The MCP ecosystem isn't a wasteland.
The bad news: static-analyzed npm packages tell a different story. Their average score is 54/100 — a full 22 points lower than live servers. Many packages have no README, no license, stale dependencies, and no security policy. They're published and forgotten.
The full distribution for the 5,027 static packages skews heavily toward the middle — lots of C-grade servers that work, but haven't earned trust.
## Top 5 Vulnerability Patterns
Our 60-check Touchstone scanner categorizes findings across 8 security domains. Here's where MCP servers are failing most often:
### 1. Supply Chain Gaps — 44 findings (71% of all findings)
This is the dominant problem. Most MCP servers on npm have:
- No provenance attestation. No sigstore, no build attestation linking the published package to its source repo. Anyone could have published it.
- Single-maintainer risk. One compromised npm account = full supply chain takeover of every downstream agent using that tool.
- No package integrity verification. The `package.json` says one thing; the published tarball says another.
We found two packages impersonating well-known tools — one claiming to be a Notion MCP server, another a Gmail server — with zero cryptographic proof linking them to the official source. Both are now in coordinated disclosure.
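A crude version of the impersonation check is string similarity against a list of known-good names. A sketch using Python's standard `difflib`, assuming a hypothetical curated list (the real detector also weighs maintainer identity and provenance, which similarity alone can't capture):

```python
from difflib import SequenceMatcher

# Illustrative allow-list; a real registry would maintain a much larger one.
OFFICIAL = ["notion-mcp-server", "server-gmail-mcp"]

def typosquat_candidates(name: str, official=OFFICIAL, threshold: float = 0.85):
    """Return official package names suspiciously similar to (but not equal to) `name`."""
    return [o for o in official
            if o != name and SequenceMatcher(None, name, o).ratio() >= threshold]
```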
### 2. Infrastructure Misconfiguration — 6 findings
Servers binding to 0.0.0.0 with no authentication. Missing rate limits. Missing CORS configuration. Stack traces in error responses. These aren't exotic vulnerabilities — they're deployment hygiene that nobody checked because there's no standard saying you should.
### 3. Authentication Weaknesses — 6 findings
MCP doesn't mandate authentication. Many servers don't implement it. Of those that do, we found missing PKCE enforcement on OAuth flows, overly broad token scopes, and tokens that never expire. One server accepted any bearer token without validation.
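The missing-PKCE case can often be caught by inspecting the authorization URL a server constructs. A minimal sketch (function name assumed; it checks only that an S256 code challenge is present, one of several properties an OAuth 2.1 flow must satisfy):

```python
from urllib.parse import urlparse, parse_qs

def pkce_enforced(authorize_url: str) -> bool:
    """Check that an OAuth authorization URL carries a PKCE S256 code challenge."""
    params = parse_qs(urlparse(authorize_url).query)
    return (params.get("code_challenge", [""])[0] != ""
            and params.get("code_challenge_method", [""])[0] == "S256")
```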
### 4. Data Security Issues — 4 findings
Credential patterns appearing in tool descriptions. API keys in error messages. PII in tool responses with no data classification or filtering. When your AI agent calls a tool and the response includes your AWS secret key in a stack trace, that's not a feature.
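Findings like these are typically caught with pattern matching over tool descriptions and responses. A simplified sketch with a few illustrative regexes (a production scanner needs a far broader, tuned pattern set):

```python
import re

# Illustrative patterns only; real scanners use many more, with entropy checks.
CREDENTIAL_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token":   re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}"),
    "private_key":    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan_for_credentials(text: str) -> list[str]:
    """Return names of credential patterns found in tool output or descriptions."""
    return [name for name, pat in CREDENTIAL_PATTERNS.items() if pat.search(text)]
```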
### 5. Input Validation Failures — 1 confirmed, more under disclosure
SSRF vectors through unrestricted URL parameters. Command injection through tool parameters that get passed to shell commands. Path traversal in filesystem tools. The confirmed finding: a browser automation server that let you navigate to http://169.254.169.254 (AWS metadata endpoint) with zero validation.
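The metadata-endpoint case illustrates why URL parameters need validation before a tool fetches them. A minimal SSRF guard for literal-IP hosts, using Python's standard `ipaddress` module (a real check must also resolve hostnames to IPs and re-validate after every redirect):

```python
import ipaddress
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs whose host is a private, loopback, link-local, or reserved
    address (which covers the cloud metadata endpoint 169.254.169.254)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Hostname, not a literal IP: must be resolved and re-checked before use.
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved)
```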
## What We Actually Check: 60 Checks, 8 Domains
Most scanning tools run a handful of surface-level checks. Here's the full scope of what Touchstone evaluates:
| Domain | Checks | What We're Looking For |
|---|---|---|
| Authentication & Authorization | 9 | OAuth 2.1, PKCE, token storage, scope analysis, session fixation, RFC 8707 |
| Tool Security | 10 | Prompt injection in descriptions, parameter injection, rug-pull detection via tool hash tracking, shadowing, permission over-privilege |
| Input Validation | 9 | SSRF (private IPs, cloud metadata), command injection, SQL injection, path traversal, DNS rebinding, URL scheme abuse |
| Data Security | 6 | Credential patterns, PII exposure, secrets in errors/logs, cross-server data leakage |
| Supply Chain | 8 | npm provenance, CVE matching, typosquat detection, maintainer reputation, dependency confusion, source-to-package matching |
| Infrastructure | 8 | Network binding, TLS enforcement, rate limiting, CORS, error handling, HTTP security headers, DNS rebinding protection |
| Runtime | 5 | Guardrail bypass, response size limits, timeout enforcement, concurrency handling, kill switch presence |
| A2A Agent Cards | 5 | Prompt injection in agent cards, obfuscated content, identity spoofing, HTTP-only serving, excessive capability claims |
Severity breakdown across all 60 checks: 13 critical, 25 high, 17 medium, 1 low.
Every single finding is mapped to CWE identifiers and scored using AIVSS (AI Vulnerability Scoring System) — a weighted formula that accounts for AI-specific factors like autonomy level, decision criticality, and cascading potential that CVSS alone can't capture.
## Static Analysis vs. Live Verification — We Do Both
This is where most tools diverge. Some scan npm metadata. Some probe live endpoints. We do both, and we weight them differently.
### Static Analysis (5,027 packages)
For every npm package with MCP-related keywords, we score 7 factors:
- Maintenance recency — When was it last published?
- Dependency health — How many deps? Any known CVEs?
- Popularity — Weekly downloads as a signal (not a guarantee)
- Documentation — README quality, description, MCP keyword presence
- Repository activity — GitHub stars, recent commits
- License clarity — Recognized OSS license present?
- Security policy — SECURITY.md exists?
This catches the long tail: abandoned packages, documentation-free tools, and typosquats that never run on a live server but still get npm installed into production.
### Live Verification (118 servers)
For servers with a reachable endpoint, we go deeper:
- MCP handshake — Full JSON-RPC `initialize` exchange
- Tool discovery — List every tool, resource, and prompt
- Schema analysis — Validate parameter types, required fields, injection patterns
- Deep probes — Actually call tools with test inputs, check error handling, validate TLS, test protocol compliance
- Hash tracking — SHA-256 hash every tool's description and schema. Compare across scans. Detect rug pulls (a server that changes its tools after initial review).
- Network analysis — Check for undeclared outbound connections, suspicious TLDs
- 12-factor scoring — The full trust model (see below)
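The hash-tracking step above can be sketched as canonical-JSON hashing plus a diff between scans (function names assumed):

```python
import hashlib
import json

def tool_fingerprint(tool: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a tool's description and schema."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def detect_rug_pull(previous: dict, current: dict) -> list[str]:
    """Names of tools whose fingerprint changed (or vanished) since the last scan."""
    curr = {name: tool_fingerprint(t) for name, t in current.items()}
    return sorted(name for name, tool in previous.items()
                  if curr.get(name) != tool_fingerprint(tool))
```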
When both exist, the combined score is weighted 60% live / 40% static. Live behavior is more trustworthy than metadata claims.
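As a sketch, the blend is a plain weighted average (shown here with the ecosystem averages from earlier as inputs):

```python
def combined_trust_score(live: float, static: float) -> int:
    """Blend live and static scores with the 60/40 weighting described above."""
    return round(0.6 * live + 0.4 * static)
```

For example, a server averaging 76 live and 54 static lands at 67 combined.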
## The 12-Factor Trust Model
Every live-scanned server is scored across 12 factors, organized into 5 groups. Total: 100 points.
Here's a real breakdown — our own MCP server at mcp.craftedtrust.com, which scores 81/100 (Grade B, Trusted):
```text
                        Score  Max  Rating
─── Authentication & Access ────────────────
Identity & Auth           10    10  ██████████  Pass
Permission Scope           7     8  ████████▒   Pass
─── Server Security ────────────────────────
Transport Security         8     8  ████████    Pass
Network Behavior          10    10  ██████████  Pass
Protocol Compliance        8     8  ████████    Pass
─── Tool Safety ────────────────────────────
Declaration Accuracy       8     8  ████████    Pass
Tool Integrity            10    10  ██████████  Pass
Input Validation           7     8  ████████▒   Pass
─── Supply Chain ───────────────────────────
Supply Chain               5     8  ██████▒▒    Warn
Code Transparency          0     6  ▒▒▒▒▒▒      Fail
Publisher Trust            0     8  ▒▒▒▒▒▒▒▒    Fail
─── Data Handling ──────────────────────────
Data Protection            8     8  ████████    Pass
                 TOTAL:   81   100
```
Notice the pattern: security fundamentals are strong (identity, transport, tool integrity all maxed out), but supply chain trust signals are weak. No open-source repo, no publisher verification. This is the most common profile we see — servers that work correctly and securely, but can't prove provenance.
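The total in the table can be reproduced directly from the per-factor values (factor names and points copied from the breakdown above):

```python
# (score, max) per factor, taken from the mcp.craftedtrust.com breakdown.
FACTORS = {
    "Identity & Auth": (10, 10),      "Permission Scope": (7, 8),
    "Transport Security": (8, 8),     "Network Behavior": (10, 10),
    "Protocol Compliance": (8, 8),    "Declaration Accuracy": (8, 8),
    "Tool Integrity": (10, 10),       "Input Validation": (7, 8),
    "Supply Chain": (5, 8),           "Code Transparency": (0, 6),
    "Publisher Trust": (0, 8),        "Data Protection": (8, 8),
}

total = sum(score for score, _ in FACTORS.values())    # earned points
maximum = sum(mx for _, mx in FACTORS.values())        # possible points
```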
Each factor maps to a CoSAI Agentic AI Security Framework category. We also generate mappings for:
- OWASP MCP Top 10 — Tool Poisoning, Excessive Permissions, Insecure Credential Storage, and 7 more
- OWASP Agentic Security Initiatives (ASI) Top 10 — Agent Tool Misuse, Supply Chain Compromise, Goal Hijacking, and 7 more
- MITRE ATLAS — AI Agent Context Poisoning, ML Supply Chain Compromise
- NIST AI RMF — Govern, Map, Measure, Manage functions
- EU AI Act — Articles 9 (Risk Management) and 15 (Accuracy, Robustness, Cybersecurity)
Five compliance frameworks. Every finding. Every server. We haven't seen another MCP scanner that does this.
## Published Advisories
Touchstone's vulnerability research has already produced 5 published advisories and 9 active disclosures under our 90-day coordinated disclosure process. Two examples:
**Arbitrary JavaScript Execution in chrome-local-mcp (Critical)** — The eval endpoint passes user-supplied JavaScript directly to Puppeteer's `page.evaluate()` with zero restrictions. Persistent browser profiles retain login credentials. A prompt injection attack could steal every saved credential in the browser.
**Supply Chain Impersonation (High)** — We found third-party npm packages republishing popular MCP servers (`notion-mcp-server`, `server-gmail-mcp`) without any cryptographic provenance linking them to the original source. If you installed the wrong one, a single maintainer controls your Notion workspace or Gmail inbox.
All advisories: touchstone.craftedtrust.com
## What This Means for the Ecosystem
The MCP ecosystem is growing fast. 5,154 servers and counting. The trust distribution tells a clear story:
**Live servers are mostly fine.** 98% score Moderate or Trusted. The protocol works. Most developers building MCP servers are doing reasonable security work.

**The npm long tail is the risk.** Average score of 54 vs. 76 for live servers. Thousands of packages with no provenance, no maintainer accountability, no security policy. Your AI agent's `npm install` is the attack surface.

**Supply chain is the #1 vulnerability category.** 71% of all findings. This isn't an MCP-specific problem, but MCP amplifies it — because every tool your agent calls is an implicit trust decision made at machine speed.

**Nobody is checking compliance.** We map every finding to 5 frameworks because enterprises will need this. EU AI Act Article 9 requires a risk management system. NIST AI RMF requires assessment and measurement. If your MCP servers aren't scored, you can't prove compliance.
## Try It Yourself
Search any MCP server at touchstone.craftedtrust.com and see its trust score, 12-factor breakdown, and compliance mappings. Or paste a server URL and scan it free — no account required.
The registry, scanner, and API are live. The data is public. Trust, but verify.
CraftedTrust is built by Cyber Craft Solutions. We're building the trust infrastructure for the AI agent ecosystem — from scanning MCP servers to cryptographic audit trails. If you're building with MCP and care about security, we'd like to hear from you.