vLLM logo

vLLM

by vLLM Project
Serving

vLLM provides an optimized inference runtime for open-source language models with PagedAttention, continuous batching, and an OpenAI-compatible API, allowing teams to tune engine parameters for latency, throughput, and memory efficiency across different model and GPU configurations.

Key features

Optimized performance

vLLM delivers state-of-the-art high-throughput serving.

OpenAI-compatible server

Drop-in replacement for OpenAI-style clients and SDKs.

Runtime parameterization

Expose key engine and server args to tune latency, throughput, and memory.

Flexible model support

Deploy popular Hugging Face models for chat, completions, and more.


Pricing

Additional Nebius infrastructure costs may apply. Use the Nebius Pricing Page to estimate your infrastructure costs.

Self-managed

vLLM on VM

Root access & custom setup. Maximum performance tuning. Direct hardware control.

Free
Charged for resources
Setup time2-5 minutes
ScalingManual
MaintenanceSelf-managed
Deploy
Self-managed

vLLM on Kubernetes

Run on your own Kubernetes for horizontal scaling and upgrades as you grow.

Free
Charged for resources
Setup time20+ minutes
ScalingAuto
MaintenanceSelf-managed (cluster)
Deploy

Security & compliance

Run vLLM on infrastructure built for AI workloads

Reliable AI infrastructure backed by top-tier NVIDIA GPUs, purpose-built for demanding inference workloads. Multiple deployment methods — virtual machines for full hardware control, Kubernetes for scalable cluster deployments, and managed serverless applications for teams that want inference running without infrastructure overhead

Learn about Nebius AI Cloud

Security & compliance, out of the box

Nebius meets a broad set of security and compliance standards. Fine-grained IAM controls, audit logs, and encrypted storage are available out of the box — so teams can meet security requirements without additional tooling.

Explore the Trust center

Support

Application support

Provided by vLLM Project. See the documentation and project links above.

Infrastructure support

Provided by Nebius for the underlying cloud infrastructure.