
NVIDIA HGX H200 on Nebius AI Cloud
Run large language models at full precision and serve memory-intensive AI workloads on NVIDIA’s Hopper platform with 141 GB of memory per GPU, deployed in production on Nebius AI Cloud.
What makes HGX H200 on Nebius different
GPU performance, optimized in-house
Our GPU clusters are optimized across every layer of the stack — from in-house designed servers to a software layer validated against NVIDIA benchmarks — so your AI workloads run at the highest achievable performance.
AI without operational overhead
We take care of the infrastructure, at any scale. Run containerized experiments with Serverless Jobs or deploy large-scale inference clusters on Slurm, without touching drivers, networking, or cluster configuration.
Security and reliability by design
Every Nebius cluster comes with auto-healing that detects hardware failures and recovers from them with minimal interruption. The platform is also built to industry security and compliance standards, to keep your data and workloads protected.
What teams run on NVIDIA HGX H200
Large model inference
Serve large language models at full precision without quantization trade-offs. With 141 GB of HBM3e memory per GPU, HGX H200 fits 70B- to 405B-parameter models on a single node, enabling high-concurrency inference with long context windows.
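As a rough illustration of the single-node claim, the sketch below estimates the memory needed for model weights alone and compares it with the 8 × 141 GB available in one HGX H200 node. It assumes that "full precision" means 16-bit weights (BF16/FP16), and it ignores KV cache, activations, and framework overhead, so treat the result as a lower bound rather than a sizing guarantee. The function names are hypothetical and not part of any Nebius or NVIDIA API.

```python
# Weights-only sizing check for one 8-GPU HGX H200 node (a sketch, not a sizing tool).
GPUS_PER_NODE = 8
GPU_MEMORY_GB = 141  # per-GPU HBM3e capacity, from the spec table

# Assumption: "full precision" for LLM serving means 16-bit weights (BF16/FP16).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * bytes per param / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_in_node(params_billion: float, dtype: str) -> bool:
    """True if the weights fit in the combined memory of one node."""
    return weight_memory_gb(params_billion, dtype) <= GPUS_PER_NODE * GPU_MEMORY_GB


if __name__ == "__main__":
    node_gb = GPUS_PER_NODE * GPU_MEMORY_GB
    for size in (70, 405):
        for dtype in ("bf16", "fp32"):
            need = weight_memory_gb(size, dtype)
            print(f"{size}B {dtype}: {need:.0f} GB of {node_gb} GB, "
                  f"fits={fits_in_node(size, dtype)}")
```

Under these assumptions, a 405B model in BF16 needs about 810 GB for weights, which leaves roughly 300 GB of the node's 1,128 GB for KV cache and activations. The same model in FP32 would not fit on one node.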
LLM training and fine-tuning
Train and fine-tune large language models without the memory constraints that force teams into reduced-precision workflows. The memory capacity of HGX H200 accommodates larger models and longer sequences in both pre-training and fine-tuning jobs, making it a reliable platform across the full training lifecycle.
Offline and batch inference
Process large volumes of AI requests in batch mode, where latency per request matters less than total throughput and cost. With 141 GB of memory per GPU, HGX H200 runs full-precision 70B–405B models on batch workloads without quantization, making it a strong fit for academic research, drug discovery, and life sciences applications that process large datasets offline.
NVIDIA HGX H200 specifications
| Specification | HGX H200 |
| --- | --- |
| Form factor | 8× NVIDIA Hopper GPUs |
| FP8 Tensor Core (sparse) | 32 PFLOPS |
| FP16/BF16 Tensor Core (sparse) | 16 PFLOPS |
| TF32 Tensor Core (sparse) | 8 PFLOPS |
| FP32 | 67 TFLOPS per GPU |
| FP64 / FP64 Tensor Core | 34 / 67 TFLOPS per GPU |
| INT8 Tensor Core (sparse) | 32 POPS |
| GPU memory (per GPU / total) | 141 GB / 1.1 TB HBM3e |
| Memory bandwidth | 4.8 TB/s per GPU |
| NVIDIA NVLink | Fourth generation |
| NVLink GPU-to-GPU bandwidth | 900 GB/s |
| Total NVLink bandwidth | 7.2 TB/s |
| Networking bandwidth | 0.4 Tb/s per GPU |
Source: nvidia.com/en-us/data-center/h200.
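The system-level figures in the specifications follow from the per-GPU numbers multiplied by the eight GPUs in the node. The short sketch below reproduces that arithmetic. The per-GPU inputs come from the specifications above, the 400 Gb/s network figure is the per-GPU InfiniBand rate quoted on this page, and the function name is hypothetical.

```python
# Derive node-level totals from per-GPU figures (illustrative arithmetic only).
GPUS_PER_NODE = 8


def node_totals(mem_gb: float = 141, nvlink_gb_s: float = 900,
                mem_bw_tb_s: float = 4.8, net_gbit_s: float = 400) -> dict:
    """Return aggregate per-node figures for an 8-GPU system."""
    return {
        # 8 x 141 GB = 1,128 GB, quoted as 1.1 TB
        "memory_tb": GPUS_PER_NODE * mem_gb / 1000,
        # 8 x 900 GB/s = 7.2 TB/s total NVLink bandwidth
        "nvlink_tb_s": GPUS_PER_NODE * nvlink_gb_s / 1000,
        # 8 x 4.8 TB/s aggregate HBM3e bandwidth
        "memory_bw_tb_s": GPUS_PER_NODE * mem_bw_tb_s,
        # 8 x 400 Gb/s = 3.2 Tb/s of network bandwidth per node
        "network_tbit_s": GPUS_PER_NODE * net_gbit_s / 1000,
        # the same network figure in bytes: 3.2 Tb/s / 8 = 0.4 TB/s
        "network_tbyte_s": GPUS_PER_NODE * net_gbit_s / 8 / 1000,
    }


if __name__ == "__main__":
    for name, value in node_totals().items():
        print(f"{name}: {value:.3f}")
```

Note the unit difference on the network line: 400 Gb/s (gigabits) per GPU corresponds to 50 GB/s (gigabytes) per GPU.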
Supercomputer-class performance for your workloads
NVIDIA HGX H200 systems on Nebius are connected via NVIDIA Quantum-2 InfiniBand at 400 Gb/s per GPU. This high-speed interconnect supports massively parallel computation across thousands of GPUs, so your distributed workloads can scale without the network fabric becoming the bottleneck.
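To get a feel for what 400 Gb/s per GPU means for distributed training, the sketch below computes an idealized lower bound on the time to all-reduce a buffer with the standard ring algorithm, in which each GPU sends 2 × (N − 1) / N times the buffer size. This is a back-of-the-envelope model, not a measurement of Nebius clusters: it assumes every GPU sustains the full link rate and ignores latency, protocol overhead, intra-node NVLink, and compute/communication overlap. The function name and the example buffer size are hypothetical.

```python
# Idealized bandwidth-only estimate for a ring all-reduce (a sketch under stated assumptions).


def ring_allreduce_seconds(buffer_bytes: float, num_gpus: int,
                           link_gbit_s: float = 400.0) -> float:
    """Lower-bound time to all-reduce `buffer_bytes` across `num_gpus` GPUs.

    Assumes each GPU sustains `link_gbit_s` (gigabits per second) and that
    bandwidth is the only cost.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    bytes_per_second = link_gbit_s * 1e9 / 8  # 400 Gb/s -> 50 GB/s
    # Ring all-reduce: reduce-scatter plus all-gather, each moving (N - 1) / N of the buffer.
    traffic_per_gpu = 2 * (num_gpus - 1) / num_gpus * buffer_bytes
    return traffic_per_gpu / bytes_per_second


if __name__ == "__main__":
    # Example: 140 GB of BF16 gradients for a 70B-parameter model (assumed size).
    for n in (8, 64, 1024):
        print(f"{n} GPUs: {ring_allreduce_seconds(140e9, n):.2f} s")
```

Because the per-GPU traffic approaches twice the buffer size as N grows, the estimate stays nearly flat as GPUs are added. That is why per-GPU link bandwidth, rather than cluster size, dominates this simple model.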
NVIDIA Exemplar Cloud Validation
Nebius is a Reference Platform NVIDIA Cloud Partner and holds NVIDIA Exemplar Cloud status across multiple GPU generations — from NVIDIA H200 to NVIDIA GB300 NVL72. For the HGX H200 specifically, Nebius achieved NVIDIA Exemplar Cloud validation for training workloads, demonstrating real-world performance against NVIDIA’s benchmarking standards on this platform.

Frequently Asked Questions
What is NVIDIA HGX H200?
The NVIDIA HGX H200 is an 8-GPU server platform built on NVIDIA’s Hopper architecture. It is the first GPU platform with HBM3e memory, delivering 141 GB per GPU and 4.8 TB/s of memory bandwidth, designed for large model inference and memory-intensive AI workloads.
Get started with NVIDIA HGX H200 on Nebius
Launch your first instance in minutes or talk to our team to find the right setup for your workload.