NVIDIA Nemotron Nano 2 VL in Nebius AI Studio: powering agentic multimodal AI

October 28, 2025

3 mins to read

Nebius AI Studio is now Nebius Token Factory: same platform, new name, more power for running AI at scale.

We’re pleased to announce that Nebius AI Studio now hosts NVIDIA Nemotron Nano 2 VL, a compact, production-ready multimodal reasoning model engineered for real-world document intelligence and video understanding.

Built on the innovative NVIDIA hybrid Mamba-Transformer architecture, Nemotron Nano 2 VL delivers high accuracy and efficiency, making advanced vision-language intelligence accessible without the cost or latency of oversized models.

With this release, developers can now experience even more flexibility in deploying and scaling multimodal applications directly through Nebius AI Studio’s inference platform via OpenAI-compatible API.

Open, efficient and specialized AI

Nemotron Nano 2 VL is part of the broader NVIDIA Nemotron family of open models, datasets and recipes that empower developers to build trustworthy, domain-specific AI systems.

By combining open weights, permissively licensed data and reproducible training recipes, the model provides transparency and flexibility for building enterprise-grade multimodal assistants and pipelines.

Click to expand

Model highlights

High accuracy for vision and document tasks — Excellent for OCR, chart reasoning, dense image captioning and video comprehension.
Hybrid Mamba-Transformer design — Increases throughput to process multi-image workloads faster, reducing inference cost.
Efficient video sampling (EVS) — Processes more video at lower inference cost.
Open and customizable — Open weights, datasets and recipes for complete transparency and model adaptation.

What developers can build

With Nemotron Nano 2 VL on Nebius AI Studio, teams can integrate multimodal reasoning into products in just a few API calls.

Developers are already using it to:

Build document-intelligent assistants that can read dashboards, forms and diagrams with contextual understanding.
Develop video summarization and search tools that extract scenes, captions and insights from unstructured footage.
Automate image and media curation pipelines for ad placements and e-commerce catalogs.

Each of these use cases benefits from Nemotron Nano 2 VL’s efficiency and low latency, NVIDIA accelerated compute and Nebius’s enterprise-grade infrastructure for fast, cost-effective deployment.

Deploy with Nebius AI Studio

Nebius AI Studio offers a high-performance, OpenAI-compatible inference platform optimized for running Nemotron Nano 2 VL in production environments.

Developers can:

Run inference with zero data retention.
Scale effortlessly with usage-based pricing and dedicated endpoints with autoscaling.
Access the model directly through the Nebius AI Studio API or the Playground.

Together, NVIDIA Nemotron Nano 2 VL and Nebius AI Studio give developers a powerful foundation for building agentic and multimodal AI systems that are open, efficient and ready for production.

Explore Nebius AI Cloud

Docs

Explore Nebius Token Factory

Docs and support

Dylan Bristot

Head of Product Marketing, Token Factory

Contents

Open, efficient and specialized AI
Model highlights
What developers can build
Deploy with Nebius AI Studio

Let’s explore one of the key features that makes the new NVIDIA GB200 NVL72 stand out: the fifth generation NVIDIA NVLink™ scale-up fabric. We’ll discuss how it redefines infrastructure by moving beyond the traditional 8-GPU NVLink. You’ll see a practical example of how to take advantage of this capability. Finally, we’ll examine a real-world use case: pre-training the Nemotron-4 340B LLM.

Nebius achieves NVIDIA Exemplar Status on NVIDIA H200 GPUs for training workloads

We’re proud to announce that Nebius is one of the first NVIDIA Cloud Partners to achieve NVIDIA Exemplar Status on NVIDIA H200 GPUs for training workloads. This recognition validates that Nebius meets NVIDIA’s rigorous standards for performance, resiliency, and scalability — addressing one of the most pressing challenges in AI infrastructure: ensuring consistent workload performance and predictable cost across clouds.

Build a multi-agent AI customer support system

This guide walks you through building a production-ready, multi-agent AI system by using the Google ADK and A2A, powered by Nebius AI Studio models. With sentiment detection, RAG-powered answers and escalation handling, you can automate customer queries end-to-end.

NVIDIA Nemotron Nano 2 VL in Nebius AI Studio: powering agentic multimodal AI

Open, efficient and specialized AI

Model highlights

What developers can build

Deploy with Nebius AI Studio

Explore Nebius AI Cloud

Explore Nebius Token Factory

See also

Leveraging high-speed, rack-scale GPU interconnect with NVIDIA GB200 NVL72

Nebius achieves NVIDIA Exemplar Status on NVIDIA H200 GPUs for training workloads

Build a multi-agent AI customer support system

NVIDIA Nemotron Nano 2 VL in Nebius AI Studio: powering agentic multimodal AI

Open, efficient and specialized AIOpen, efficient and specialized AI

Model highlightsModel highlights

What developers can buildWhat developers can build

Deploy with Nebius AI StudioDeploy with Nebius AI Studio

Explore Nebius AI Cloud

Explore Nebius Token Factory

See also

Leveraging high-speed, rack-scale GPU interconnect with NVIDIA GB200 NVL72

Nebius achieves NVIDIA Exemplar Status on NVIDIA H200 GPUs for training workloads

Build a multi-agent AI customer support system

Open, efficient and specialized AI

Model highlights

What developers can build

Deploy with Nebius AI Studio