DEV Community

Pranay Batta
How to Connect Non-Anthropic Models to Claude Code with Bifrost AI Gateway

I tested five different LLM gateways to route non-Anthropic models through Claude Code. Bifrost was the fastest by a wide margin: 11 microseconds of overhead per request, roughly 50x faster than the Python-based alternatives I benchmarked.

Here is exactly how I set it up, what worked, and where each feature matters.

TL;DR: Bifrost is an open-source Go gateway that exposes an Anthropic-compatible endpoint, letting you route Claude Code requests to GPT-4o, Gemini, Bedrock, or any supported provider by changing one environment variable. You get multi-provider failover, budget controls, and semantic caching at 11 microseconds of overhead per request.

This post assumes you are familiar with Claude Code and have used at least one LLM API.

Bifrost on GitHub -- open-source, written in Go, handles 5,000 RPS on a single instance.

The Problem

Claude Code locks you into api.anthropic.com. No native way to swap providers. You cannot route to GPT-4o, Gemini, or Bedrock models without building your own proxy or switching tools entirely.

I needed:

  • GPT-4o for certain coding tasks
  • Gemini 2.5 Pro for long context
  • Automatic failover when a provider goes down
  • One place to track costs across all models

Building a custom proxy was not worth the maintenance burden. So I went looking for something production-ready.

What Bifrost Does

Bifrost exposes an Anthropic-compatible endpoint at /anthropic. Claude Code sends standard Anthropic-format requests. Bifrost translates and routes them to whatever provider you configure -- OpenAI, Bedrock, Vertex AI, Gemini, others.

It is a drop-in replacement. Change one URL. No SDK modifications. No wrapper code.

Claude Code -> Bifrost (/anthropic) -> Any LLM Provider

The Anthropic SDK integration page has the full compatibility details.
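To make the translation step concrete, here is a minimal sketch of what converting an Anthropic Messages request into OpenAI chat format looks like. This is illustrative only and not Bifrost's actual code -- the real translation layer also handles tool use, streaming, and content blocks -- and the function name is my own:

```python
def anthropic_to_openai(payload: dict) -> dict:
    """Translate an Anthropic Messages API request into OpenAI chat format.

    Hypothetical sketch: Anthropic puts the system prompt in a top-level
    "system" field, while OpenAI expects it as the first chat message.
    """
    messages = []
    if "system" in payload:
        messages.append({"role": "system", "content": payload["system"]})
    messages.extend(
        {"role": m["role"], "content": m["content"]} for m in payload["messages"]
    )
    return {
        "model": payload["model"],  # remapped by the gateway's provider config
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
    }
```

The gateway performs the reverse translation on the response, so Claude Code never sees anything but Anthropic-shaped JSON.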

Setup: 3 Minutes

Step 1: Install

npx -y @maximhq/bifrost

That starts the gateway locally. Full setup instructions here.

Step 2: Configure a Provider

Create bifrost.yaml. This routes everything to GPT-4o:

accounts:
  - id: "my-account"
    providers:
      - id: "openai-primary"
        type: "openai"
        api_key: "${OPENAI_API_KEY}"
        model: "gpt-4o"
        weight: 100

See the provider configuration docs for all supported providers and options.

Step 3: Point Claude Code at Bifrost

export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic

Done. Claude Code now sends requests through Bifrost, which translates them to OpenAI format and forwards to GPT-4o. Zero code changes.

Multi-Provider Routing

This is where it gets interesting. I configured weighted routing across two providers:

accounts:
  - id: "my-account"
    providers:
      - id: "openai-primary"
        type: "openai"
        api_key: "${OPENAI_API_KEY}"
        model: "gpt-4o"
        weight: 80
      - id: "anthropic-fallback"
        type: "anthropic"
        api_key: "${ANTHROPIC_API_KEY}"
        model: "claude-sonnet-4-20250514"
        weight: 20

80% of traffic goes to GPT-4o. 20% to Claude. Useful when you want to compare output quality across models in real usage.
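The weighted selection itself is a standard technique. A minimal sketch of how an 80/20 split works, assuming probability proportional to weight (Bifrost's internal algorithm may differ):

```python
import random

def pick_provider(providers: list[dict], rng: random.Random) -> dict:
    """Pick one provider with probability proportional to its weight."""
    weights = [p["weight"] for p in providers]
    return rng.choices(providers, weights=weights, k=1)[0]

# Mirrors the two-provider config above; ids are from the YAML example.
providers = [
    {"id": "openai-primary", "weight": 80},
    {"id": "anthropic-fallback", "weight": 20},
]
```

Over many requests, roughly 80% land on `openai-primary` and 20% on `anthropic-fallback`.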

Automatic Failover

This was the feature that sold me. Failover configuration took five minutes. If GPT-4o goes down, Bifrost tries the next provider automatically:

accounts:
  - id: "my-account"
    failover:
      enabled: true
    providers:
      - id: "openai-primary"
        type: "openai"
        api_key: "${OPENAI_API_KEY}"
        model: "gpt-4o"
        priority: 1
      - id: "gemini-secondary"
        type: "gemini"
        api_key: "${GEMINI_API_KEY}"
        model: "gemini-2.5-pro"
        priority: 2
      - id: "anthropic-tertiary"
        type: "anthropic"
        api_key: "${ANTHROPIC_API_KEY}"
        model: "claude-sonnet-4-20250514"
        priority: 3

If OpenAI fails, Bifrost tries Gemini; if Gemini fails, it falls back to Anthropic. My Claude Code session never breaks, and I need no retry logic on my side.
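The priority-ordered failover behavior described above can be sketched in a few lines. This is a toy version under my own naming -- Bifrost's real failover also handles retries, timeouts, and provider health checks:

```python
def call_with_failover(providers: list[dict], request: dict):
    """Try providers in ascending priority order; return the first success."""
    errors = []
    for p in sorted(providers, key=lambda p: p["priority"]):
        try:
            return p["call"](request)
        except Exception as e:  # a real gateway would match specific error types
            errors.append((p["id"], e))
    raise RuntimeError(f"all providers failed: {errors}")
```

The caller (Claude Code, in this case) only ever sees a failure if every provider in the chain is down.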

Bedrock and Vertex AI

I also tested with AWS Bedrock and Vertex AI. Same pattern:

providers:
  - id: "bedrock-claude"
    type: "bedrock"
    region: "us-east-1"
    model: "anthropic.claude-sonnet-4-20250514-v2:0"
    priority: 1
  - id: "vertex-gemini"
    type: "vertex"
    project_id: "my-gcp-project"
    region: "us-central1"
    model: "gemini-2.5-pro"
    priority: 2

Same Anthropic-compatible endpoint. Claude Code does not know which provider is behind Bifrost. That is the point.

Features Worth Mentioning

Routing alone is useful. But once all requests flow through one gateway, you get access to several other capabilities I found genuinely practical.

Budget Enforcement

Bifrost has a four-tier budget hierarchy: Customer, Team, Virtual Key, Provider Config. I set team-level limits:

budgets:
  - level: "team"
    id: "engineering"
    limit: 500
    period: "monthly"

When a budget runs out, requests are blocked. No surprise bills. The governance docs cover the full hierarchy.
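The core idea of a tiered budget is simple: a request must fit within every tier it belongs to (customer, team, virtual key), or it is rejected before the provider is called. A minimal sketch, with names of my own choosing rather than Bifrost's internals:

```python
from dataclasses import dataclass

@dataclass
class Budget:
    limit: float        # e.g. dollars per period
    spent: float = 0.0

def charge_all(tiers: list["Budget"], cost: float) -> bool:
    """Admit the request only if the cost fits every tier, then record it."""
    if any(t.spent + cost > t.limit for t in tiers):
        return False    # blocked -- no tier is charged
    for t in tiers:
        t.spent += cost
    return True
```

Checking all tiers before charging any of them keeps the counters consistent when a request is rejected.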

Semantic Caching

This cut my costs noticeably. Bifrost supports dual-layer caching: exact hash matching plus semantic similarity. If I have already asked a similar question, it returns the cached response instead of hitting the provider.

Supported vector stores: Weaviate, Redis, Qdrant.
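To show what "dual-layer" means in practice, here is a toy cache combining an exact hash lookup with a cosine-similarity fallback. The embeddings here are hypothetical precomputed vectors; a real setup would call an embedding model and one of the vector stores above, and the class name is mine, not Bifrost's:

```python
import hashlib
import math

class DualLayerCache:
    """Layer 1: exact sha256 match on the prompt. Layer 2: nearest
    embedding above a similarity threshold. Illustrative sketch only."""

    def __init__(self, threshold: float = 0.9):
        self.exact = {}       # sha256(prompt) -> response
        self.semantic = []    # (embedding, response) pairs
        self.threshold = threshold

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    def put(self, prompt: str, embedding: list[float], response: str) -> None:
        self.exact[self._key(prompt)] = response
        self.semantic.append((embedding, response))

    def get(self, prompt: str, embedding: list[float]):
        hit = self.exact.get(self._key(prompt))        # layer 1: exact
        if hit is not None:
            return hit
        best = max(
            self.semantic,
            key=lambda e: self._cosine(e[0], embedding),
            default=None,
        )
        if best and self._cosine(best[0], embedding) >= self.threshold:
            return best[1]                              # layer 2: semantic
        return None
```

A rephrased prompt misses the hash layer but can still hit the semantic layer, which is where the cost savings come from.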

Observability

Every request gets logged with latency, tokens, cost, and provider information. The observability layer gives you full visibility into what is happening across all your providers.

MCP Support

Bifrost also works as an MCP server. I tested Code Mode -- it reduced tokens by over 50% and latency by 40-50%. Agent Mode is available for more complex workflows. Useful if you are connecting to Claude Desktop or other MCP-compatible clients.

Benchmarks

I ran my own tests and the numbers matched what is documented. 11 microseconds overhead. 5,000 RPS on a single instance. The Go implementation makes a real difference compared to Python gateways I tested.

The benchmarking guide explains how to reproduce these numbers yourself.

Trade-offs and Limitations

Worth being upfront about the downsides:

  • Relatively new project. Bifrost does not have the years of battle-testing that older proxies have. The community is growing but smaller than established alternatives.
  • Self-hosted only. The open-source version has no managed cloud offering. You run and maintain the infrastructure yourself.
  • Extra operational overhead. You are running a separate process between Claude Code and your LLM provider. That is one more thing to monitor, update, and debug compared to direct API calls.
  • Provider coverage is expanding but not exhaustive. Some niche providers or model variants may not be supported yet. Check the docs before committing.

Quick Recap

  1. Install: npx -y @maximhq/bifrost
  2. Configure providers in bifrost.yaml
  3. Set ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
  4. Use Claude Code normally. Bifrost routes to whatever model you configured.

I tried building custom proxies before. I tried other gateways. This is the fastest option I found, and the setup takes minutes, not hours.

GitHub repo | Docs | Website

If you run into issues or want a specific provider supported, open an issue on the repo.

