
NVIDIA HGX B200 on Nebius AI Cloud
Train and serve large-scale AI models on the NVIDIA Blackwell platform, deployed in production on Nebius AI Cloud.
What makes HGX B200 on Nebius different
GPU performance, optimized in-house
Our GPU clusters are optimized across every layer of the stack — from in-house designed servers to a software layer validated against NVIDIA benchmarks — so your AI workloads run at the highest achievable performance.
AI without operational overhead
We take care of the infrastructure, at any scale. Run containerized experiments with Serverless Jobs or deploy large-scale training clusters on Slurm, without touching drivers, networking, or cluster configuration.
Security and reliability by design
Every Nebius cluster comes with auto-healing that detects hardware failures and recovers from them with minimal interruption. Our platform is also built to industry security and compliance standards to keep your data and workloads secure.
What teams run on NVIDIA HGX B200
Large-scale LLM training
Train frontier language models on a platform designed for the demands of large-scale, memory-intensive workloads. HGX B200 combines 1.4 TB of HBM3e memory per node with fifth-generation NVIDIA NVLink™, making it a capable platform for dense model training at scale.
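To see why frontier-scale training still spans many nodes at 1.4 TB per node, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 16 bytes of model state per parameter for mixed-precision training with the Adam optimizer (BF16 weights and gradients, FP32 master weights, and two FP32 optimizer moments). It ignores activations and framework overhead, so real requirements are higher, and the model sizes are hypothetical.

```python
import math

# Node memory from the HGX B200 specification (1.4 TB HBM3e per node).
NODE_HBM_BYTES = 1.4e12

# Assumed rule of thumb for mixed-precision Adam training:
# 2 (BF16 weights) + 2 (BF16 gradients) + 4 (FP32 master weights)
# + 4 + 4 (FP32 Adam first and second moments) = 16 bytes per parameter.
BYTES_PER_PARAM = 16


def model_state_bytes(num_params: float) -> float:
    """Bytes for weights, gradients, and optimizer state (no activations)."""
    return num_params * BYTES_PER_PARAM


def min_nodes(num_params: float, node_bytes: float = NODE_HBM_BYTES) -> int:
    """Lower bound on nodes needed if model state is fully sharded across
    all GPUs (FSDP/ZeRO-3 style) and nothing else occupies memory."""
    return math.ceil(model_state_bytes(num_params) / node_bytes)


if __name__ == "__main__":
    # Largest model whose model state alone fits in one node's HBM.
    print(f"Max params per node: {NODE_HBM_BYTES / BYTES_PER_PARAM / 1e9:.1f}B")
    for params in (70e9, 405e9):  # hypothetical dense model sizes
        tb = model_state_bytes(params) / 1e12
        print(f"{params / 1e9:.0f}B params: {tb:.2f} TB of state, >= {min_nodes(params)} node(s)")
```

Because activations are excluded, treat these node counts as a floor; activation memory usually pushes real deployments onto more nodes or toward techniques such as activation checkpointing.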
MoE model training and inference
Build and train Mixture-of-Experts models efficiently: high memory capacity and a fast intra-node interconnect improve utilization across expert layers. HGX B200 handles both the training and serving stages of MoE architectures with ample memory headroom.
High-throughput inference
Serve production AI workloads at scale with the throughput that modern inference demands. HGX B200’s FP4 Tensor Cores accelerate low-precision inference, and its large memory footprint leaves room for long context lengths and high request concurrency.
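As a rough illustration of how lower-precision weights free memory for context, the sketch below compares weight storage at BF16, FP8, and FP4 for a hypothetical 405B-parameter model. It then counts how many tokens of KV cache fit in what remains of the node's 1.4 TB. The model shape (126 layers, 8 KV heads, head dimension 128) and the FP8 KV cache are assumptions. The estimate also ignores activations, quantization scale factors, and runtime overhead.

```python
"""Back-of-the-envelope weight and KV-cache budget on one HGX B200 node.

All model figures are hypothetical; real deployments also need memory for
activations, quantization scale factors, and runtime overhead.
"""

NODE_HBM_BYTES = 1.4e12  # 1.4 TB HBM3e per node, from the spec table

# Storage cost per parameter at common inference precisions (scale factors ignored).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_bytes(num_params: float, precision: str) -> float:
    """Memory needed to hold the model weights at the given precision."""
    return num_params * BYTES_PER_PARAM[precision]


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, kv_bytes: float) -> float:
    """KV-cache size for one token: one key and one value vector per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * kv_bytes


def max_cached_tokens(num_params: float, precision: str, per_token: float,
                      node_bytes: float = NODE_HBM_BYTES) -> int:
    """Tokens of KV cache that fit in the memory left after loading the weights."""
    free = node_bytes - weight_bytes(num_params, precision)
    return max(0, int(free // per_token))


if __name__ == "__main__":
    params = 405e9  # hypothetical model size
    # Hypothetical grouped-query-attention shape with an FP8 (1-byte) KV cache.
    per_tok = kv_bytes_per_token(layers=126, kv_heads=8, head_dim=128, kv_bytes=1.0)
    for prec in ("bf16", "fp8", "fp4"):
        w_gb = weight_bytes(params, prec) / 1e9
        toks = max_cached_tokens(params, prec, per_tok)
        print(f"{prec}: weights {w_gb:.1f} GB, room for ~{toks:,} cached tokens")
```

The cached-token budget is shared by every concurrent request, so halving weight precision translates directly into longer contexts or more simultaneous sessions under these assumptions.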
NVIDIA HGX B200 specifications
Form factor: 8× NVIDIA Blackwell GPUs
FP4 Tensor Core (sparse | dense): 144 PFLOPS | 72 PFLOPS
FP8/FP6 Tensor Core (sparse): 72 PFLOPS
INT8 Tensor Core (sparse): 72 POPS
FP16/BF16 Tensor Core (sparse): 36 PFLOPS
TF32 Tensor Core (sparse): 18 PFLOPS
FP32: 600 TFLOPS
FP64/FP64 Tensor Core: 296 TFLOPS
Total memory: 1.4 TB HBM3e
NVIDIA NVLink: Fifth generation
NVIDIA NVLink Switch: NVLink 5 Switch
NVLink GPU-to-GPU bandwidth: 1.8 TB/s
Total NVLink bandwidth: 14.4 TB/s
Networking bandwidth: 0.8 TB/s
Source: nvidia.com/en-us/data-center/hgx — NVIDIA Blackwell tab.
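Because the table reports node-level totals, it can help to divide them across the eight GPUs. The sketch below does that arithmetic: 14.4 TB/s divided by 8 reproduces the 1.8 TB/s NVLink GPU-to-GPU figure, and the rounded 1.4 TB memory total works out to roughly 175 GB per GPU. Converting the sparse-only rows to dense estimates assumes that the 2:1 ratio shown in the FP4 row also holds for the other precisions.

```python
GPUS_PER_NODE = 8  # form factor: 8x NVIDIA Blackwell GPUs

# Node-level totals copied from the specification table.
NODE_TOTALS = {
    "HBM3e memory (TB)": 1.4,
    "Total NVLink bandwidth (TB/s)": 14.4,
    "FP4 Tensor Core, sparse (PFLOPS)": 144.0,
    "FP4 Tensor Core, dense (PFLOPS)": 72.0,
}

# Rows the table reports only as sparse figures.
SPARSE_ONLY = {
    "FP8/FP6 Tensor Core (PFLOPS)": 72.0,
    "FP16/BF16 Tensor Core (PFLOPS)": 36.0,
    "TF32 Tensor Core (PFLOPS)": 18.0,
}


def per_gpu(node_total: float, gpus: int = GPUS_PER_NODE) -> float:
    """Split a node-level total evenly across the node's GPUs."""
    return node_total / gpus


def dense_from_sparse(sparse: float, ratio: float = 2.0) -> float:
    """Estimate a dense figure from a sparse one, assuming the 2:1
    sparse-to-dense ratio shown in the FP4 row (144 vs 72 PFLOPS)."""
    return sparse / ratio


if __name__ == "__main__":
    for name, total in NODE_TOTALS.items():
        print(f"{name}: {total:g} per node, {per_gpu(total):g} per GPU")
    for name, sparse in SPARSE_ONLY.items():
        print(f"{name}: {sparse:g} sparse, ~{dense_from_sparse(sparse):g} dense (est.) per node")
```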
Supercomputer-class performance for your workloads
NVIDIA HGX B200 systems on Nebius are connected via NVIDIA Quantum-2 InfiniBand at 400 Gb/s per GPU. This high-speed interconnect supports massively parallel computation across thousands of GPUs, so your distributed workloads can use the full potential of NVIDIA accelerated compute without being bottlenecked by the network fabric.
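The per-GPU InfiniBand rate can be turned into node-level and collective-communication estimates. The sketch below assumes one 400 Gb/s link per GPU and treats the spec table's 0.8 networking figure as a bidirectional per-node total in terabytes per second; that reading is an interpretation, not something the source states. The ring all-reduce estimate uses the standard bandwidth-only cost of 2(N-1)/N × message size ÷ link bandwidth. It ignores latency, NVLink-level reduction inside the node, and in-network reduction, and the 10 GB gradient size and GPU counts are hypothetical.

```python
GPUS_PER_NODE = 8
IB_GBIT_PER_GPU = 400  # NVIDIA Quantum-2 InfiniBand rate per GPU


def node_bandwidth_tbit(bidirectional: bool = False) -> float:
    """Aggregate InfiniBand bandwidth of one 8-GPU node, in Tb/s."""
    directions = 2 if bidirectional else 1
    return IB_GBIT_PER_GPU * GPUS_PER_NODE * directions / 1000


def tbit_to_tbyte(tbit: float) -> float:
    """Convert terabits to terabytes (8 bits per byte)."""
    return tbit / 8


def ring_allreduce_seconds(message_bytes: float, participants: int,
                           link_bytes_per_s: float) -> float:
    """Bandwidth-only lower bound for a ring all-reduce: each participant
    sends (and receives) 2 * (N - 1) / N of the message over its link."""
    if participants < 2:
        return 0.0
    return 2 * (participants - 1) / participants * message_bytes / link_bytes_per_s


if __name__ == "__main__":
    one_way = node_bandwidth_tbit()
    both = node_bandwidth_tbit(bidirectional=True)
    print(f"Per node, one direction: {one_way:.1f} Tb/s ({tbit_to_tbyte(one_way):.1f} TB/s)")
    print(f"Per node, both directions: {both:.1f} Tb/s ({tbit_to_tbyte(both):.1f} TB/s)")

    link = IB_GBIT_PER_GPU * 1e9 / 8  # 400 Gb/s = 50 GB/s per GPU link
    grad_bytes = 10e9  # hypothetical 10 GB of gradients per step
    for n in (16, 128, 1024):  # hypothetical numbers of participating GPUs
        t = ring_allreduce_seconds(grad_bytes, n, link)
        print(f"{n:>5} GPUs: >= {t:.2f} s per all-reduce")
```

In this bandwidth-only model, the per-step time barely grows with N because 2(N-1)/N approaches 2; at large scale, latency and topology, which the sketch leaves out, become the main factors.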
NVIDIA Exemplar Cloud Validation
Nebius is a Reference Platform NVIDIA Cloud Partner and holds NVIDIA Exemplar Cloud validation across multiple GPU generations, from NVIDIA H200 to NVIDIA GB300 NVL72. Exemplar Cloud validation is awarded to providers that demonstrate real-world training performance against NVIDIA’s benchmarking standards, not just peak specifications.

Frequently Asked Questions
What is NVIDIA HGX B200?
The NVIDIA HGX B200 is an 8-GPU server platform built on the NVIDIA Blackwell architecture. It provides 1.4 TB of HBM3e memory per node, fifth-generation NVLink interconnects, and FP4 Tensor Core support, and is designed for large-scale AI training and high-throughput inference workloads.
Get started with NVIDIA HGX B200 on Nebius
Launch your first instance in minutes or talk to our team to find the right setup for your workload.