NVIDIA Nemotron 3 Super now available on Nebius Token Factory

NVIDIA Nemotron 3 Super is now available on Nebius Token Factory.

Nemotron 3 Super is a 120B parameter hybrid MoE model optimized for multi-agent applications and complex reasoning workflows. With 12B active parameters per inference step and up to 1M token context length, it is designed for long-horizon planning, tool calling and high-accuracy instruction following.

Built for agentic systems

Nemotron 3 Super combines a hybrid Transformer–Mamba architecture with mixture-of-experts routing to improve compute efficiency while maintaining strong reasoning performance.

Key characteristics:

  • 120B parameters, 12B active;
  • Up to 1M token context;
  • Multi-Token Prediction for faster long-form generation;
  • Open weights, open datasets and open training recipes;
  • Text-in, text-out model inference.

The model targets production use cases such as:

  • Software development workflows, including code generation and analysis;
  • Deep research agents for long-horizon planning and reasoning;
  • Financial document processing;
  • Cybersecurity triage and threat intelligent analysis.

Run NVIDIA Nemotron 3 Super in production

On Nebius Token Factory, Nemotron 3 Super can be deployed via:

  • Dedicated GPU endpoints with guaranteed performance;
  • Autoscaling throughput for production workloads;
  • OpenAI-compatible API integration;
  • EU or US regional deployment options;
  • Optional zero-retention inference.

Token Factory enables teams to move from model access to production deployment without managing GPU clusters or inference infrastructure.

Get started

Nemotron 3 Super is available today in the Nebius Token Factory console.

Deploy via API or test in the Playground. Build AI engineered for your product.

Explore Nebius Token Factory

Explore Nebius AI Cloud

Sign in to save this post