
NVIDIA Nemotron 3 Super now available on Nebius Token Factory
NVIDIA Nemotron 3 Super now available on Nebius Token Factory
NVIDIA Nemotron 3 Super
Nemotron 3 Super is a 120B parameter hybrid MoE model optimized for multi-agent applications and complex reasoning workflows. With 12B active parameters per inference step and up to 1M token context length, it is designed for long-horizon planning, tool calling and high-accuracy instruction following.
Built for agentic systems
Nemotron 3 Super combines a hybrid Transformer–Mamba architecture with mixture-of-experts routing to improve compute efficiency while maintaining strong reasoning performance.
Key characteristics:
- 120B parameters, 12B active;
- Up to 1M token context;
- Multi-Token Prediction for faster long-form generation;
- Open weights, open datasets and open training recipes;
- Text-in, text-out model inference.
The model targets production use cases such as:
- Software development workflows, including code generation and analysis;
- Deep research agents for long-horizon planning and reasoning;
- Financial document processing;
- Cybersecurity triage and threat intelligent analysis.
Run NVIDIA Nemotron 3 Super in production
On Nebius Token Factory, Nemotron 3 Super can be deployed via:
- Dedicated GPU endpoints with guaranteed performance;
- Autoscaling throughput for production workloads;
- OpenAI-compatible API integration;
- EU or US regional deployment options;
- Optional zero-retention inference.
Token Factory enables teams to move from model access to production deployment without managing GPU clusters or inference infrastructure.
Get started
Nemotron 3 Super is available today in the Nebius Token Factory console.
Deploy via API or test in the Playground



