If you're running LLMs in production, you've probably evaluated LiteLLM. It's the most popular gateway out there - 100+ providers, massive community, used by companies like Stripe and Netflix.
I built VoidLLM with a different set of priorities. Here's an honest comparison - including where LiteLLM is ahead.
## Why I built something different
We were running self-hosted models in Kubernetes, hitting vLLM directly. No proxy, network policies were the only access control. It worked until we needed to know which team was burning through GPU hours.
LiteLLM was the obvious first choice, but the Python runtime, startup time, and dependency tree felt heavy for what we needed. We also had a hard GDPR requirement - no prompt content could be stored anywhere.
So we built VoidLLM in Go.
## What VoidLLM does differently
Privacy by architecture. There's no "disable content logging" toggle - because there's no content logging code. The proxy reads the model field from the request body, streams bytes between client and upstream, and forgets. Usage events track who, which model, how many tokens - nothing else.
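That stateless path is simple enough to sketch. VoidLLM itself is written in Go; the Python below (with hypothetical names like `make_usage_event`) only illustrates the rule: parse the body once for the `model` field, keep identity and token counts, keep nothing else.

```python
import json

def make_usage_event(request_body: bytes, who: str,
                     tokens_in: int, tokens_out: int) -> dict:
    """Build the only record the proxy keeps: who, which model, token counts.

    The prompt and response bytes are streamed between client and upstream;
    the body is parsed solely to read the `model` field and is never stored.
    """
    model = json.loads(request_body).get("model", "unknown")
    return {"who": who, "model": model,
            "tokens_in": tokens_in, "tokens_out": tokens_out}
```

Note what is absent: there is no field for messages or completions, so a leak would require adding code, not flipping a config flag.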
Single binary. One Go binary (~25MB) with the admin UI embedded. No Python, no pip, no virtualenv. Download, configure, run.
Performance. Under 500 microseconds of proxy overhead at 2000 RPS. We benchmarked this with Vegeta at sustained load on a 12-core machine.
Built-in admin UI. Key management, usage tracking, model configuration, playground, team management - all embedded in the binary. Not a separate service.
MCP Gateway. VoidLLM doubles as an MCP gateway - register external MCP servers, proxy tool calls with scoped access control. Plus Code Mode: AI agents write JavaScript that orchestrates multiple MCP tool calls in a single WASM-sandboxed execution.
RBAC. Org/team/user/key hierarchy with four roles. Rate limits, token budgets, and model access control at every level. Most-restrictive-wins inheritance.
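Most-restrictive-wins reduces to a minimum across the hierarchy. A minimal sketch, assuming a limit of `None` means "not set at this level":

```python
def effective_limit(org, team, user, key):
    """Most-restrictive-wins inheritance: the effective rate limit (or
    token budget) is the smallest limit set anywhere in the chain.
    Levels with no limit configured (None) are ignored."""
    limits = [l for l in (org, team, user, key) if l is not None]
    return min(limits) if limits else None
```

So a key capped at 200 requests/min stays at 200 even if its team allows 500 and the org allows 1000.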
Load balancing. Multi-deployment models with round-robin, least-latency, weighted, and priority routing. Automatic failover with per-deployment circuit breakers.
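The failover behavior can be sketched for the round-robin case. This is an illustrative Python model, not VoidLLM's Go implementation: deployments whose circuit breaker is open are skipped, and routing fails only when every deployment is down.

```python
class RoundRobinRouter:
    """Round-robin over a model's deployments, skipping any whose
    per-deployment circuit breaker is currently open."""

    def __init__(self, deployments):
        self.deployments = deployments
        self.open_circuits = set()  # deployments marked unhealthy
        self._i = 0

    def pick(self):
        for _ in range(len(self.deployments)):
            d = self.deployments[self._i % len(self.deployments)]
            self._i += 1
            if d not in self.open_circuits:
                return d
        raise RuntimeError("all deployments unavailable")
```

The other strategies (least-latency, weighted, priority) change only the ordering of candidates; the skip-open-circuits failover step is the same.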
## Where LiteLLM is better
I'll be honest about this.
Provider coverage. LiteLLM supports 100+ providers. VoidLLM supports 6 (OpenAI, Anthropic, Azure, Ollama, vLLM, custom). If you need native Bedrock, VertexAI, or Cohere integration, LiteLLM has us beat.
Community. Thousands of users, extensive docs, large contributor base. VoidLLM is new. Our docs are solid but our community is just getting started.
Python SDK. If your stack is Python-native and you want a library you can import directly, LiteLLM's SDK is a natural fit. VoidLLM is a standalone proxy - you point your SDK at it.
Observability integrations. LiteLLM connects to Langfuse, Lunary, MLflow for request-level observability. VoidLLM deliberately avoids content-level logging - that's the privacy trade-off.
## Quick comparison
| | VoidLLM | LiteLLM |
|---|---|---|
| Language | Go | Python |
| Proxy overhead | < 500us P50 | ~8ms P95 |
| Providers | 6 | 100+ |
| Content logging | Never (by design) | Optional |
| Deployment | Single binary | Python runtime + deps |
| Admin UI | Embedded | Separate service |
| MCP Gateway | Built-in + Code Mode | Recent addition |
| RBAC | Org/team/user/key | Virtual keys |
| Load balancing | 4 strategies + failover | Retry/fallback |
| License | BSL 1.1 | MIT |
## Switching is easy
Both are OpenAI-compatible. Switching from LiteLLM to VoidLLM (or back) is a base URL change:
```python
from openai import OpenAI

# Before (LiteLLM)
client = OpenAI(base_url="http://litellm:4000/v1", api_key="sk-...")

# After (VoidLLM)
client = OpenAI(base_url="http://voidllm:8080/v1", api_key="vl_uk_...")
```
Your application code stays the same.
## Bottom line
If you need broad provider coverage (LiteLLM's 100+ vs our 6) and a Python SDK with a large ecosystem, use LiteLLM.
If you care about privacy by design, want zero operational overhead (one binary, SQLite default), need sub-millisecond proxy performance, or want an MCP gateway built in - take a look at VoidLLM.
Different problems, different trade-offs. Pick what fits.