NVIDIA Nemotron 3 Super now available on Nebius Token Factory

March 11, 2026

4 mins to read

NVIDIA Nemotron 3 Super is now available on Nebius Token Factory.

Nemotron 3 Super is a 120B parameter hybrid MoE model optimized for multi-agent applications and complex reasoning workflows. With 12B active parameters per inference step and up to 1M token context length, it is designed for long-horizon planning, tool calling and high-accuracy instruction following.

Built for agentic systems

Nemotron 3 Super combines a hybrid Transformer–Mamba architecture with mixture-of-experts routing to improve compute efficiency while maintaining strong reasoning performance.

Key characteristics:

120B parameters, 12B active;
Up to 1M token context;
Multi-Token Prediction for faster long-form generation;
Open weights, open datasets and open training recipes;
Text-in, text-out model inference.

The model targets production use cases such as:

Software development workflows, including code generation and analysis;
Deep research agents for long-horizon planning and reasoning;
Financial document processing;
Cybersecurity triage and threat intelligent analysis.

Run NVIDIA Nemotron 3 Super in production

On Nebius Token Factory, Nemotron 3 Super can be deployed via:

Dedicated GPU endpoints with guaranteed performance;
Autoscaling throughput for production workloads;
OpenAI-compatible API integration;
EU or US regional deployment options;
Optional zero-retention inference.

Token Factory enables teams to move from model access to production deployment without managing GPU clusters or inference infrastructure.

Get started

Nemotron 3 Super is available today in the Nebius Token Factory console.

Deploy via API or test in the Playground. Build AI engineered for your product.

Explore Nebius Token Factory

Docs and support

Explore Nebius AI Cloud

Docs

Nebius team

Contents

Built for agentic systems
Run NVIDIA Nemotron 3 Super in production
Get started

We are introducing Dedicated Endpoints and a Custom Weights Hub in Nebius Token Factory. You can now choose GPU type, define GPUs per replica, set scaling limits, select region and deploy your own model weights to isolated endpoints. Deployment becomes a defined, controllable part of your production architecture.

NVIDIA Nemotron Nano 2 VL in Nebius AI Studio: powering agentic multimodal AI

We’re pleased to announce that Nebius AI Studio now hosts NVIDIA Nemotron Nano 2 VL, a compact, production-ready multimodal reasoning model engineered for real-world document intelligence and video understanding.

Introducing self-service NVIDIA Blackwell GPUs in Nebius AI Cloud

NVIDIA HGX B200 instances are now publicly available as self-service AI clusters in Nebius AI Cloud. This means anyone can access NVIDIA Blackwell — the latest generation of NVIDIA’s accelerated computing platform — with just a few clicks and a credit card.

NVIDIA Nemotron 3 Super now available on Nebius Token Factory

Built for agentic systems

Run NVIDIA Nemotron 3 Super in production

Get started

Explore Nebius Token Factory

Explore Nebius AI Cloud

See also

Introducing Dedicated Endpoints and Custom Weights Hub in Nebius Token Factory

NVIDIA Nemotron Nano 2 VL in Nebius AI Studio: powering agentic multimodal AI

Introducing self-service NVIDIA Blackwell GPUs in Nebius AI Cloud

NVIDIA Nemotron 3 Super now available on Nebius Token Factory

Built for agentic systemsBuilt for agentic systems

Run NVIDIA Nemotron 3 Super in productionRun NVIDIA Nemotron 3 Super in production

Get startedGet started

Explore Nebius Token Factory

Explore Nebius AI Cloud

See also

Introducing Dedicated Endpoints and Custom Weights Hub in Nebius Token Factory

NVIDIA Nemotron Nano 2 VL in Nebius AI Studio: powering agentic multimodal AI

Introducing self-service NVIDIA Blackwell GPUs in Nebius AI Cloud

Built for agentic systems

Run NVIDIA Nemotron 3 Super in production

Get started