NVIDIA HGX H200 on Nebius AI Cloud

Run large language models at full precision and serve memory-intensive AI workloads on NVIDIA’s Hopper platform with 141 GB of memory per GPU, deployed in production on Nebius AI Cloud.

What makes HGX H200 on Nebius different

GPU performance, optimized in-house

Our GPU clusters are tuned at every layer of the stack, from servers designed in-house to a software layer validated against NVIDIA benchmarks, so your AI workloads run as close to the hardware's rated performance as possible.

AI without operational overhead

We take care of the infrastructure, at any scale. Run containerized experiments with Serverless Jobs or deploy large-scale inference clusters on Slurm, without touching drivers, networking, or cluster configuration.

Security and reliability by design

Every Nebius cluster comes with auto-healing that detects hardware failures and recovers from them with minimal interruption. Our platform is also built to industry security and compliance standards to keep your data and workloads protected.

What teams run on NVIDIA HGX H200

Large model inference

Serve large language models at full precision without quantization trade-offs. With 141 GB of HBM3e memory per GPU, or about 1.1 TB across the eight GPUs in a node, HGX H200 fits 70B- to 405B-parameter models on a single node, enabling high-concurrency inference with long context windows.
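The single-node claim can be checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that "full precision" means 16-bit weights (BF16/FP16, 2 bytes per parameter), and it counts the weights alone, not the KV cache, activations, or framework overhead. The function names are hypothetical helpers, not part of any Nebius or NVIDIA API.

```python
# Rough check: do a model's weights fit in one 8-GPU HGX H200 node?
GPU_MEMORY_GB = 141   # per-GPU HBM3e capacity, as quoted on this page
GPUS_PER_NODE = 8     # HGX H200 form factor
BYTES_PER_PARAM = 2   # assumption: "full precision" means BF16/FP16 weights


def weight_footprint_gb(params_billions, bytes_per_param=BYTES_PER_PARAM):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def fits_in_node(params_billions):
    """True if the weights alone fit in the node's combined GPU memory."""
    node_memory_gb = GPU_MEMORY_GB * GPUS_PER_NODE
    return weight_footprint_gb(params_billions) <= node_memory_gb


for size in (70, 405):
    need = weight_footprint_gb(size)
    left = GPU_MEMORY_GB * GPUS_PER_NODE - need
    print(f"{size}B: {need} GB of weights, {left} GB left for KV cache and activations")
```

Under these assumptions a 70B model needs about 140 GB for weights and a 405B model about 810 GB, against 1,128 GB of combined GPU memory per node. The remaining headroom is what long context windows and high concurrency draw on, so real capacity planning should also budget for the KV cache.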

LLM training and fine-tuning

Train and fine-tune large language models without the memory constraints that force teams into reduced-precision workflows. The memory capacity of HGX H200 accommodates larger models and longer sequences in both pre-training and fine-tuning jobs, making it a reliable platform across the full training lifecycle.

Offline and batch inference

Process large volumes of AI requests in batch mode, where latency per request matters less than total throughput and cost. With 141 GB of memory per GPU, HGX H200 runs full-precision 70B–405B models on batch workloads without quantization, making it a strong fit for academic research, drug discovery, and life sciences applications that process large datasets offline.

NVIDIA HGX H200 specifications

Specification: HGX H200
Form factor: 8× NVIDIA Hopper GPUs
FP8 Tensor Core (sparse): 32 PFLOPS
FP16/BF16 Tensor Core (sparse): 16 PFLOPS
TF32 Tensor Core (sparse): 8 PFLOPS
FP32: 67 TFLOPS per GPU
FP64/FP64 Tensor Core: 67 TFLOPS per GPU
INT8 Tensor Core (sparse): 32 POPS
GPU memory (per GPU | total): 141 GB | 1.1 TB HBM3e
Memory bandwidth: 4.8 TB/s per GPU
NVIDIA NVLink: Fourth generation
NVLink GPU-to-GPU bandwidth: 900 GB/s
Total NVLink bandwidth: 7.2 TB/s
Networking bandwidth: 400 Gb/s per GPU (3.2 Tb/s, or 0.4 TB/s, per 8-GPU system)

Source: nvidia.com/en-us/data-center/h200.
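The per-GPU and per-system figures on this page are related by simple arithmetic, which is also where bits and bytes are easy to confuse. The sketch below scales the per-GPU numbers quoted on this page (141 GB of memory, 900 GB/s of NVLink bandwidth, and the 400 Gb/s InfiniBand link per GPU) to one 8-GPU system. The dictionary keys and the function name are hypothetical, chosen only for this illustration.

```python
# Per-GPU figures quoted on this page.
GPUS_PER_NODE = 8
PER_GPU = {
    "memory_gb": 141,           # HBM3e capacity
    "nvlink_gb_per_s": 900,     # NVLink GPU-to-GPU bandwidth (gigabytes/s)
    "network_gbit_per_s": 400,  # InfiniBand link rate (gigabits/s)
}


def node_totals(per_gpu, gpus=GPUS_PER_NODE):
    """Scale per-GPU figures to one node, using decimal units (1 TB = 1000 GB)."""
    network_tbit = per_gpu["network_gbit_per_s"] * gpus / 1000
    return {
        "memory_tb": per_gpu["memory_gb"] * gpus / 1000,
        "nvlink_tb_per_s": per_gpu["nvlink_gb_per_s"] * gpus / 1000,
        "network_tbit_per_s": network_tbit,
        "network_tbyte_per_s": network_tbit / 8,  # 8 bits per byte
    }


for name, value in node_totals(PER_GPU).items():
    print(f"{name}: {value:.3g}")
```

This gives roughly 1.13 TB of memory (rounded to 1.1 TB in the table), 7.2 TB/s of total NVLink bandwidth, and 3.2 Tb/s of networking bandwidth per system, which is the same as 0.4 TB/s once bits are converted to bytes.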

Supercomputer-class performance for your workloads

NVIDIA HGX H200 systems on Nebius are connected via NVIDIA Quantum-2 InfiniBand at 400 Gb/s per GPU. This interconnect supports parallel computation across thousands of GPUs, so your distributed workloads scale without being bottlenecked by the network fabric.
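To get a feel for what 400 Gb/s per GPU means for distributed training, the sketch below estimates how long one gradient synchronization would take under the textbook ring all-reduce cost model, in which each GPU sends about 2 × (N − 1) / N times the payload over its own link. This is an idealized lower bound under stated assumptions, not a description of how Nebius clusters or NVIDIA's communication libraries actually schedule traffic: it ignores latency, protocol overhead, the faster NVLink paths inside a node, and overlap with compute. The function name and the example payload are hypothetical.

```python
def ring_allreduce_seconds(payload_gb, num_gpus, link_gbit_per_s=400.0):
    """Idealized ring all-reduce time in seconds for `payload_gb` gigabytes per GPU.

    Assumes every GPU drives its own link at full line rate for the whole
    operation and that bandwidth, not latency, dominates.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize with a single GPU
    link_gb_per_s = link_gbit_per_s / 8  # gigabits/s -> gigabytes/s
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_per_s


# Hypothetical example: 140 GB of 16-bit gradients (a 70B-parameter model)
# synchronized across 16 GPUs.
print(f"{ring_allreduce_seconds(140, 16):.2f} s")
```

In practice, training frameworks shard gradients, overlap communication with computation, and pick collective algorithms automatically, so measured times differ from this estimate. The model is mainly useful for seeing how synchronization cost scales with payload size and link speed.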

NVIDIA Exemplar Cloud Validation

Nebius is a Reference Platform NVIDIA Cloud Partner and holds NVIDIA Exemplar Cloud status across multiple GPU generations — from NVIDIA H200 to NVIDIA GB300 NVL72. For the HGX H200 specifically, Nebius achieved NVIDIA Exemplar Cloud validation for training workloads, demonstrating real-world performance against NVIDIA’s benchmarking standards on this platform.

Frequently Asked Questions

What is NVIDIA HGX H200?

The NVIDIA HGX H200 is an 8-GPU server platform built on NVIDIA’s Hopper architecture. It is the first GPU platform with HBM3e memory, delivering 141 GB of memory and 4.8 TB/s of memory bandwidth per GPU, and it is designed for large model inference and memory-intensive AI workloads.

Get started with NVIDIA HGX H200 on Nebius

Launch your first instance in minutes or talk to our team to find the right setup for your workload.