Nebius AI Studio Q1 2025 roundup: Fine-tuning, new models and major expansions

We’ve had a very productive first quarter of 2025 at Nebius AI Studio, shipping major features, adding powerful new models and significantly expanding our infrastructure. This comprehensive roundup covers everything we’ve launched and where you can find us next.

Fine-tuning: Now available for everyone

After an extensive beta period, we’re thrilled to announce that fine-tuning is now generally available on Nebius AI Studio. This powerful capability allows you to transform generic AI models into specialized solutions that work precisely for your unique needs.

With our fine-tuning system, you can:

  • Choose from 30+ leading open-source models, including Llama 3, Mistral and Qwen.

  • Utilize both LoRA (parameter-efficient fine-tuning) and full fine-tuning approaches.

  • Deploy models like Llama 3.3-70B in seconds.

  • Download model checkpoints or deploy instantly on our infrastructure.

Fine-tuned models deliver significant advantages over generic models: they are more compact, responsive and domain-specialized, understanding your industry terminology without complex prompting. The end result is improved accuracy, reduced costs and more consistent outputs.

Our developer-friendly workflow lets you move from model selection to fine-tuning and deployment in minutes using our OpenAI-compatible API. You can download model checkpoints for local use, or deploy models like Llama 3.3-70B, Meta-Llama-3.1-8B and Qwen2.5-72B on our infrastructure in just one click.
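The workflow above can be sketched against the OpenAI-compatible API. Everything here is illustrative: the base URL, the model identifier, the training-file ID and the `lora` hyperparameter field are assumptions, so verify the exact request shape against the Studio documentation before using it.

```python
# Minimal sketch of launching a LoRA fine-tuning job through an
# OpenAI-compatible API. Base URL, endpoint shape and the "lora"
# hyperparameter are assumptions for illustration only.
import json

NEBIUS_BASE_URL = "https://api.studio.nebius.ai/v1"  # assumed base URL


def build_finetune_request(model: str, training_file_id: str,
                           use_lora: bool = True) -> dict:
    """Assemble the JSON body for a fine-tuning job request."""
    body = {
        "model": model,
        "training_file": training_file_id,
    }
    if use_lora:
        # Hypothetical flag selecting LoRA instead of full fine-tuning.
        body["hyperparameters"] = {"lora": True}
    return body


request = build_finetune_request("meta-llama/Llama-3.3-70B-Instruct", "file-abc123")
print(json.dumps(request, indent=2))

# To submit it with the official openai client (uncomment with a real key):
# from openai import OpenAI
# client = OpenAI(base_url=NEBIUS_BASE_URL, api_key="YOUR_NEBIUS_API_KEY")
# job = client.fine_tuning.jobs.create(**request)
```

Keeping the request body as plain data makes it easy to log, diff and reuse the same configuration between checkpoint downloads and hosted deployments.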

What can you build? Our customers are already creating domain-specific assistants that follow company guidelines, specialized content generators that match their brand voice and knowledge systems for numerous industries — all with reduced hallucinations and improved accuracy.

New models: Expanding our catalog

We’ve significantly expanded our model lineup, with several powerful new additions:

  • Google’s Gemma 3 27B: A powerful multimodal model with a 128K token context window.

  • DeepSeek-V3-0324: The highest-scoring non-reasoning model and a milestone for open-source AI.

  • DeepSeek R1 and R1 (Fast): DeepSeek’s latest reasoning model, with exceptional capabilities, available in base and fast variants.

  • DeepSeek R1-Distill-Llama-70B: A distilled version optimized for efficient inference while maintaining impressive capabilities.

  • NousResearch/Hermes-3-Llama-405B: A powerful model for handling complex tasks with state-of-the-art performance.

  • Alibaba’s Qwen QwQ-32B: Outstanding reasoning capabilities, available in base and fast variants.

Each of these models is available through our straightforward API and Playground, with the same competitive pricing structure our users have come to expect.
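Calling any of these models follows the familiar OpenAI chat-completions shape. In this sketch the base URL and model identifier are assumptions; copy the real values from your Studio dashboard.

```python
# A minimal chat-completion request against an OpenAI-compatible API.
# Model id and base URL below are illustrative assumptions.
def build_chat_request(model: str, user_prompt: str,
                       temperature: float = 0.6) -> dict:
    """Assemble an OpenAI-style chat.completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": temperature,
    }


request = build_chat_request("deepseek-ai/DeepSeek-V3-0324",
                             "Summarize LoRA fine-tuning in one sentence.")

# Sending it with the official openai client (uncomment with a real key):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.studio.nebius.ai/v1",
#                 api_key="YOUR_NEBIUS_API_KEY")
# completion = client.chat.completions.create(**request)
# print(completion.choices[0].message.content)
```

Because the request body is standard, swapping between the models listed above is a one-field change.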

Introducing prompt presets in Playground

We’ve added a powerful new feature: prompt presets. This functionality allows you to:

  • Save your perfect prompt and model configurations directly in the Playground.

  • Build a personal prompt library, with one-click access to your best prompts.

  • Share your winning setups with team members.

  • Compare saved prompts side-by-side across different models.

From ideation to production, this feature keeps your AI workflows organized and efficient, helping you spend less time rewriting and more time building.

Significantly increased rate limits

Production applications need reliable throughput, and we’ve raised our limits to ensure your applications can scale:

LLM models

  • Llama 3.3-70B: Increased to 3 million tokens per minute.

  • DeepSeek-V3: Increased to 1 million tokens per minute, with 3,000 requests per minute.

Image generation

  • Flux models: Increased to 100 images per minute.

  • Stable Diffusion XL: Increased to 50 images per minute.

For enterprise customers with even higher needs, we scale to 100M+ tokens per minute on request. Our goal is simple: ensure you never hit artificial ceilings that block your growth. You can check your rate limits here.

New integrations with developer tools

We’ve set up several major integrations that make Nebius AI Studio more accessible within your existing workflows:

  • Hugging Face: We’re now an official inference provider on the Hugging Face Hub! This integration allows you to access our models directly through the Hugging Face UI or via their client SDKs (Python and JavaScript). Switch between providers with a single line of code.

  • Helicone: Get enhanced observability and logging for your Nebius AI Studio usage through this popular monitoring platform.

  • LlamaIndex: Build RAG applications and more with our models through the LlamaIndex framework, making it easier to create advanced search and retrieval systems.

  • Postman: A single source of truth for all your API workflows.
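The Hugging Face integration really does come down to one keyword argument. This sketch assumes a recent `huggingface_hub` release with inference-provider support; the model ID and token placeholder are illustrative.

```python
# Switching inference providers on the Hugging Face Hub is a single
# keyword argument to InferenceClient. Token and model id are placeholders.
def client_kwargs(provider: str, api_key: str) -> dict:
    """Keyword arguments for huggingface_hub.InferenceClient; the only
    thing that changes between providers is the `provider` field."""
    return {"provider": provider, "api_key": api_key}


kwargs = client_kwargs("nebius", "hf_your_token_here")

# Using it with the huggingface_hub client (uncomment with a real token):
# from huggingface_hub import InferenceClient
# client = InferenceClient(**kwargs)
# out = client.chat_completion(
#     model="meta-llama/Llama-3.3-70B-Instruct",
#     messages=[{"role": "user", "content": "Hello!"}],
# )
# print(out.choices[0].message.content)
```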

Text-to-image generation: From concept to visual in seconds

Earlier in Q1, we launched our text-to-image generation service, bringing premium image creation capabilities to Nebius AI Studio. This service delivers high-quality, production-ready images at a fraction of typical costs.

Key capabilities include:

  • High-resolution outputs up to 2000×2000 pixels (4 megapixels).

  • Ultra-fast generation in as little as 1.8 seconds per image.

  • Simple, predictable costs starting from just $0.0013 per image.

  • Advanced parameter controls, including seed management and negative prompts.

  • OpenAI-compatible API for seamless integration.

We offer multiple models to suit different needs:

  • Flux Schnell: Optimized for speed (1.8 seconds per image), perfect for rapid prototyping.

  • Flux Dev: Premium quality for production-ready content.

  • Stable Diffusion XL: Enhanced creative control for professional creative work.

Whether you’re generating marketing assets, prototyping UI designs or creating e-commerce product imagery, our text-to-image service scales to meet your needs. Start generating images here.

Infrastructure expansion

To support our growing user base and ensure optimal performance worldwide, we’re significantly expanding our infrastructure:

  • New Jersey region: A facility of up to 300 MW, opening this summer through a partnership with DataOne, will enhance our US-based compute capacity.

  • Iceland expansion: A new colocation facility with Verne is going live this month, powered solely by sustainable energy. This expansion enhances our European infrastructure, while minimizing environmental impact.

These additions complement our existing regions to provide global coverage with low-latency access.

Meet us at upcoming events

We love connecting with our community in person! Here’s where we’ve been and where you can find us next:

Recent events

  • NVIDIA GTC 2025 in San Jose

  • AI Dev 25 in San Francisco (March 14, 2025)

Upcoming events

  • GITEX Asia x AI Everything in Singapore (April 23-25)

  • LlamaCon (April 29)

Want to meet our team in person? Check our full events calendar and come say hello! We’d love to hear about what you’re building and discuss how Nebius can support your AI journey.

What’s coming next

Our product roadmap for Q2 2025 is packed with exciting developments:

  • Image-to-image transformations

  • Advanced editing capabilities

  • Enhanced model fine-tuning options

  • Improved control features

  • Continuous model optimizations for better quality and performance

We’re committed to building the most powerful, flexible and developer-friendly AI platform on the market. These Q1 updates represent a major step toward that vision, with much more to come.

Ready to explore these new features? Sign in to your AI Studio dashboard or contact our team if you have any questions.

Explore Nebius AI Studio

Explore Nebius AI Cloud
