Inference Service by Nebius AI Studio

Use hosted open-source models and achieve faster, cheaper and more accurate inference results than with proprietary APIs.

$100 in free inference creditsnew

Sign up today and receive $100* in free inference credits: special offer is valid until January 15, 2025, for new customers only.

Save 3x on input tokens

You only pay for what you use, ensuring you meet your budget goals, making Inference Service perfect for RAG and contextual scenarios.

Achieve ultra-low latency

Our highly optimized serving pipeline guarantees a fast time to first token in Europe, where our data center is located, and beyond.

Verified model quality

We perform a set of tests to ensure high accuracy with a diverse range of open-source models.

Choose speed or economy

We offer you a choice between fast flavor for quicker results at a higher cost, or base flavor for slower but more economical processing.

No MLOps experience required

Benefit from simplicity with our production-ready infrastructure that’s already set up and ready to use.

Benchmark-backed performance and cost efficiency

time to first token in Europe than competitors

than GPT-4o with comparable quality on Llama-405B

input tokens price for Meta-Llama-405B

Top open-source models available

Meta
Llama-3.1-8B-instruct
A small yet powerful language model with results better than GPT-3.5 and many larger models.

128k context

LLama 3.1 License

Meta
Llama-3.1-405B-instruct
The largest and most powerful open model, comparable to GPT-4 and Claude 3.5 Sonnet.

128k context

LLama 3.1 License

Mistral
Mistral-Nemo-Instruct-2407
Outperforming larger models of its generation, this model shows the potential of compact architectures.

128k context

Apache 2.0 License

Mistral
Mixtral-8x22B-Instruct-v0.1
A Mixture-of-Experts (MoE) model ready for coding and math. One of its focuses is multilingual capabilities.

65k context

Apache 2.0 License

Ai2
OLMo-7B-Instruct
A fully open-source model with all training data and processes published.

2k context

Apache 2.0 License

Microsoft
Phi-3-mini-4k-instruct
Trained on synthetic and high-quality web-sourced data, this model shows strength in reasoning and long context.

4k context

MIT License

DeepSeek
DeepSeek-Coder-V2-Lite-Instruct
A lightweight and fast version of the most powerful model for coding questions.

128k context

DeepSeek license

Nebius
And much more...
Take a look at our Playground to see the models available today — we're continuously adding new and diverse models to expand our offerings.

A simple and friendly UI for a smooth user experience

Sign up and start testing, comparing and running AI models in your applications.

Full screen image

Familiar API at your fingertips

import openai
import os

client = openai.OpenAI(
    api_key=os.environ.get("NEBIUS_API_KEY"),
    base_url='https://api.studio.nebius.ai/v1'
)

completion = client.chat.completions.create(
    messages=[{
        'role': 'user',
        'content': 'What is the answer to all questions?'
    }],
    model='meta-llama/Meta-Llama-3.1-8B-Instruct-fast'
)

Optimize costs with our flexible pricing

$100 welcome credit

Sign up today and receive $100* in free inference credits to try our product through the Playground, or to spend on your inference workloads through the API.

Playground

The Nebius AI Studio provides a model playground: a web interface to try out different AI models available in Nebius AI Studio without writing any code.

Two flavors

Choose between fast and base flavors to suit your project needs. Fast flavor delivers quicker results for time-sensitive tasks, while base flavor offers economical processing for larger workloads.

Check out available models and prices

Model
Flavor
Input token (1M)
Output token (1M)
llama-3.1-8b-instruct
fast
$0.03
$0.09
base
$0.02
$0.06
llama-3.1-70B-instruct
fast
$0.25
$0.75
base
$0.13
$0.40
llama-3.1-405b-instruct
fast
-
-
base
$1
$3
mistral-nemo-instruct-2407
fast
$0.08
$0.24
base
$0.04
$0.12
mixtral-8x7B-instruct-v0.1
fast
$0.15
$0.45
base
$0.08
$0.24
mixtral-8x22b-instruct-v0.1
fast
$0.70
$2.1
base
$0.40
$1.2
OLMo-7B-Instruct
fast
-
-
base
$0.08
$0.24
Qwen2.5-Coder-7B
fast
$0.03
$0.09
base
$0.01
$0.03
Qwen2.5-Coder-7B-Instruct
fast
$0.03
$0.09
base
$0.01
$0.03
phi-3-mini-4k-instruct
fast
$0.13
$0.4
base
$0.04
$0.12
deepseek-coder-v2-lite-instruct
fast
$0.08
$0.24
base
$0.04
$0.12

Q&A about Inference Service

Can I use your service for large production workloads?

Absolutely, our service is designed specifically for large production workloads.

Welcome to Nebius AI Studio

Nebius AI Studio is a new product from Nebius designed to help foundation model users and app builders simplify the process of creating applications using these models. Our first release, Inference Service, provides endpoints for the most popular AI models.

* — Valid until January 15, 2025, for new customers only. Limited to one offer per customer. Terms and conditions apply. Offer may be subject to change.