LiteLLM logo

LiteLLM

by BerriAI
GenAI & Agents

LiteLLM is an open-source proxy gateway that exposes a single OpenAI-compatible API for calls to 100+ LLM providers — OpenAI, Anthropic, Google (Gemini), AWS (Bedrock/SageMaker), Hugging Face, Cohere, and many more. Point your applications at a single endpoint and switch between models without touching client code. The proxy adds per-team rate limiting, virtual API keys, model fallbacks, cost tracking, and observability out of the box.

Key features

Unified OpenAI-compatible API

Call any of 100+ supported LLM providers through one OpenAI-style endpoint; clients work unchanged.

Per-team keys, budgets, and rate limits

Issue virtual API keys to teams or applications with independent budgets, rate limits, and model allowlists.

Model fallbacks and routing

Define ordered fallback chains across providers so traffic shifts automatically when a provider is degraded or rate-limited.

Cost tracking and observability

Per-key spend dashboards, Prometheus metrics, and request logs to Langfuse / Helicone / S3 for analysis.

Self-managed in your cluster

Runs as a stateless proxy Deployment with an optional Postgres-backed virtual-key store; scale horizontally as load grows.


Pricing

Additional Nebius infrastructure costs may apply. Use the Nebius Pricing Page to estimate your infrastructure costs.

Self-managed

LiteLLM on Kubernetes

Deploy the LiteLLM proxy gateway in your Kubernetes cluster to mediate calls to 100+ LLM providers through a single OpenAI-compatible API.

Free
Charged for resources
Setup time10+ minutes
ScalingManual
MaintenanceSelf-managed (cluster)
Deploy
White-glove

Deploy with a solutions architect

Some applications are easier with a hand on the wheel. Talk to an architect who has deployed this in production.

  • Architecture review & sizing
  • Hands-on deploy session
  • 30 days of follow-up support
Talk to an expert

Security & compliance

Run LiteLLM on infrastructure built for AI workloads

Reliable AI infrastructure backed by top-tier NVIDIA GPUs, purpose-built for demanding inference workloads. Multiple deployment methods — virtual machines for full hardware control, Kubernetes for scalable cluster deployments, and managed serverless applications for teams that want inference running without infrastructure overhead

Learn about Nebius AI Cloud

Security & compliance, out of the box

Nebius meets a broad set of security and compliance standards. Fine-grained IAM controls, audit logs, and encrypted storage are available out of the box — so teams can meet security requirements without additional tooling.

Explore the Trust center

Support

Application support

Provided by BerriAI. See the documentation and project links above.

Infrastructure support

Provided by Nebius for the underlying cloud infrastructure.