Nebius AI Studio expands with vision models, new language models, embeddings and LoRA
As the holiday season approaches and we all juggle year-end projects, we’re excited to bring you a gift of expanded capabilities on Nebius AI Studio. These updates — including advanced vision models, a broader range of language models, powerful embeddings and LoRA hosting — are designed to help you jumpstart the new year with stronger, more versatile AI tools.
You can leverage all the features listed below without worrying about scalability or rate limits. Our infrastructure automatically scales with your business growth, ensuring seamless performance from prototype to production.
Vision models: bringing sight to your AI
This update introduces vision-language models that enable your applications to understand and interact with visual content. From image captioning to product recognition, these models let you add a new dimension to your AI capabilities — perfect for holiday product catalogs or seasonal marketing campaigns.
New vision models include:
- Qwen2-VL-72B-Instruct: High performance for complex visual tasks.
- Qwen2-VL-7B-Instruct: Lightweight yet versatile.
- LLaVA-v1.5-13b and LLaVA-v1.5-7b: Specialized in image captioning and visual Q&A.
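To make the vision capability concrete, here is a minimal sketch of how an image-plus-question request might be assembled for an OpenAI-style chat endpoint. The payload shape follows the common OpenAI chat format; the exact fields Nebius AI Studio expects, and the full model identifier, are assumptions to verify against the docs.

```python
import json

def build_vision_request(image_url: str, question: str,
                         model: str = "Qwen/Qwen2-VL-72B-Instruct") -> dict:
    """Build a chat-completions payload pairing an image with a question.

    The content-part structure ("text" / "image_url") mirrors the OpenAI
    chat format; treat it as an assumption, not the documented schema.
    """
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_vision_request("https://example.com/product.jpg",
                               "Write a one-sentence product caption.")
print(json.dumps(payload, indent=2))
```

The same payload works for captioning, visual Q&A, or product recognition — only the text prompt changes.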
Expanded language model portfolio
We’ve broadened our language model offerings to meet diverse use cases — from complex reasoning to multilingual scenarios. Whether you’re refining product recommendations before the holiday rush or planning next year’s multilingual marketing campaigns, there’s a model that fits your needs.
New language models include:
- Meta Llama-3.3-70B-Instruct: Access Meta’s latest state-of-the-art model featuring a 131K context window and support for 8 languages including English, German, French, Italian, Spanish, and Thai. Perfect for enterprise applications requiring top-tier performance.
- dolphin-2.9.2-mixtral-8x22b: Built for conversational AI and coding tasks, featuring a 64K context window. Excels at following detailed instructions and handling complex dialogues. Perfect for chatbots and coding assistants.
- Phi-3.5-MoE-instruct: Advanced mixture-of-experts model balancing high performance with efficient resource usage. Ideal for production applications requiring reliable performance.
- Phi-3.5-mini-instruct: Compact version delivering quick responses for applications where speed is essential.
- Qwen2.5-1.5B-Instruct: Fast, lightweight model perfect for testing and rapid prototyping.
- Llama 3 Series (Med42-8B, 3.2 1B, 3.2 3B): Range of models for different needs, from specialized medical applications to general NLP tasks.
All models are available through our unified API, with both base and fast inference options to match your performance needs.
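Because the API is unified, switching between any of the models above is just a change of the `model` string. The sketch below builds a chat-completions HTTP request with only the standard library; the base URL is an assumption (check the AI Studio documentation for the real endpoint), and the request is constructed but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.studio.nebius.ai/v1"  # assumed endpoint; verify in the docs

def chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("YOUR_API_KEY", "meta-llama/Llama-3.3-70B-Instruct",
                   "Summarize our Q4 product launch in two sentences.")
# urllib.request.urlopen(req) would send it once a real key is supplied.
```

In practice you would more likely use an OpenAI-compatible client library, but the raw request shows there is nothing model-specific to configure.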
New embedding models for RAG
We’re adding three embedding models to strengthen your retrieval-augmented generation pipelines. Perfect for building knowledge bases, advanced semantic search engines or contextual chatbots — just in time to organize year-end data reviews.
New embeddings:
- BGE-ICL: A leading open-source embedding model.
- e5-mistral-7b-instruct: Instruction-tuned for richer context.
- bge-multilingual-gemma2: Multilingual support for global applications.
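In a RAG pipeline, these embeddings typically feed a similarity search: embed the query, embed the documents, then rank documents by cosine similarity. The toy sketch below uses mock 3-dimensional vectors in place of real model embeddings to show the ranking step.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Mock embeddings standing in for vectors returned by, e.g., BGE-ICL.
docs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.8, 0.2, 0.1]]
print(rank([1.0, 0.0, 0.0], docs))  # → [0, 2, 1]: nearest documents first
```

Real embeddings have hundreds or thousands of dimensions, and at scale you would use a vector database rather than a linear scan, but the ranking logic is the same.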
LoRA hosting: a simpler, usage-based approach
We’re taking a different approach compared to other providers: pure usage-based pricing, no fixed costs and zero overhead — with no dedicated instances, monthly commitments or infrastructure management.
Here’s how it works:
- Bring your LoRA weights: Simply upload your pre-trained LoRA model to Nebius AI Studio.
- Start running inference: No environment setup or scaling decisions needed.
- Pay only for what you use: Our per-token pricing ensures you don’t pay when your models sit idle.
This is ideal for teams that have already fine-tuned their models elsewhere and now need a hassle-free, cost-effective way to deploy and run them — especially during high-demand periods, without paying for unused capacity.
How to get started with LoRA:
- Go to the LoRA section in the AI Studio UI.
- Click “Request Hosting.”
- Submit your model details.
- Our team will handle the rest.
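Once hosted, a LoRA adapter should be addressable like any other model identifier in the OpenAI-compatible API, so moving traffic from the base model to your fine-tune is a one-field change. This is a sketch under that assumption; "my-org/my-lora-adapter" is a hypothetical identifier, not a real one.

```python
import json

def completion_body(model: str, prompt: str) -> str:
    """Serialize a minimal OpenAI-style chat-completions request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

base = completion_body("meta-llama/Llama-3.3-70B-Instruct", "Draft a support reply.")
lora = completion_body("my-org/my-lora-adapter", "Draft a support reply.")
# Only the "model" field differs between the two requests.
```

With per-token pricing, both requests are billed the same way: by tokens processed, with nothing charged while the adapter sits idle.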
Scalability without limits
Scale your AI operations confidently with Nebius AI Studio’s enterprise-grade infrastructure:
— True unlimited scale: Our baseline capacity handles hundreds of millions of tokens per minute — and that’s just the beginning. Need more? We’ll scale our infrastructure to match your requirements, no matter how high. No artificial rate limits, just pure scaling potential.
— Massive batch processing: Process up to 5 million requests per file, with support for files up to 10GB in size, and handle 500 concurrent files per user. Works best for:
- Large-scale data analysis
- Bulk content generation
- Dataset processing
- Document analysis at scale
— Consistent performance: Whether you’re running a prototype or processing billions of requests in production, enjoy reliable response times and consistent model performance.
— Flexible deployment: Choose between base and fast inference options to optimize for either cost efficiency or maximum throughput based on your needs.
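For the batch-processing path, requests are typically prepared as a JSONL file with one request object per line. The sketch below mirrors the OpenAI batch file format; whether Nebius expects exactly this schema is an assumption, so confirm against the batch documentation before relying on it.

```python
import json

def batch_lines(prompts, model="meta-llama/Llama-3.3-70B-Instruct"):
    """Yield one JSON-encoded request per prompt, OpenAI-batch style (assumed schema)."""
    for i, prompt in enumerate(prompts):
        yield json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {"model": model,
                     "messages": [{"role": "user", "content": prompt}]},
        })

# Write a small batch file; real files can hold up to 5 million requests.
with open("batch.jsonl", "w") as f:
    for line in batch_lines(["Summarize doc A", "Summarize doc B"]):
        f.write(line + "\n")
```

The `custom_id` lets you match each result back to its source request when the batch output is returned.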
Use cases and applications
Computer vision
- Image captioning and interpretation: Enhance accessibility, improve product listings, or streamline media cataloging.
- Visual question answering: Enable intuitive image-based search and interaction.
- Content moderation and compliance: Automatically identify and filter out inappropriate content.
Advanced language processing
- Domain-specific adaptation: Tailor models to industry-specific terminology, knowledge bases, and workflows.
- Multilingual applications: Serve global audiences by supporting a broad range of languages in search, chat, and content generation.
- Complex reasoning and decision support: Strengthen research, analysis, and strategic planning through more nuanced and context-aware responses.
RAG implementations
- Document retrieval and summarization: Quickly surface the right information from large repositories for faster decision-making.
- Knowledge base enhancement: Improve internal knowledge systems with context-aware, content-rich responses.
- Semantic search: Deliver more relevant search results by understanding user intent, not just keywords.
Getting started
- Log in: Access Nebius AI Studio with your account (or create one).
- Explore: Experiment in our playground with new models and embeddings.
- Integrate: Use our comprehensive OpenAI-compatible API and updated documentation.
All new capabilities are available now in production. Our transparent, token-based pricing ensures cost-effective scaling — key as you plan your budget for the coming year. Visit our pricing page or contact us for more info.