NVIDIA HGX H100 on Nebius AI Cloud

Run AI inference, fine-tuning, and large-scale training cost-efficiently on the proven NVIDIA Hopper platform, deployed in production on Nebius AI Cloud.

What makes HGX H100 on Nebius different

GPU performance, optimized in-house

Our GPU clusters are optimized at every layer of the stack, from servers designed in-house to a software layer validated against NVIDIA benchmarks, so your AI workloads run at the highest achievable performance.

AI without operational overhead

We take care of the infrastructure, at any scale. Run containerized experiments with Serverless Jobs or deploy large-scale training clusters on Slurm, without touching drivers, networking, or cluster configuration.

Security and reliability by design

Every Nebius cluster comes with auto-healing that detects hardware failures and recovers from them with minimal interruption. Our platform is also built on industry security and compliance standards to keep your data and workloads secure.

What teams run on NVIDIA HGX H100

Cost-efficient LLM inference

Serve large language models at scale without overpaying for compute. HGX H100 delivers strong FP8 inference throughput at a lower cost per GPU-hour than newer generations, making it a practical choice for teams that need to balance cost efficiency with performance.

LLM training for dense models

Train dense language models up to 70B parameters on a platform with a proven track record at cluster scale. HGX H100 is a reliable choice for non-MoE architectures, with years of optimized training recipes and well-understood behavior across large multi-node configurations.
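As a rough sizing sketch for the 70B figure above: with mixed-precision Adam, a common rule of thumb is about 16 bytes of model state per parameter (2 for FP16/BF16 weights, 2 for gradients, 12 for FP32 master weights and optimizer moments). The helper below applies that assumption to estimate how many 8-GPU nodes are needed just to hold model states. Activations, communication buffers, and fragmentation are ignored, and the function names are illustrative rather than part of any Nebius or NVIDIA API.

```python
import math

GPU_MEMORY_GB = 80   # per-GPU HBM3 capacity of HGX H100
GPUS_PER_NODE = 8    # one HGX H100 system

# Assumption: mixed-precision Adam rule of thumb, in bytes per parameter.
# 2 (weights) + 2 (gradients) + 12 (FP32 master weights and two Adam moments)
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for weights, gradients, and optimizer states, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_nodes_for_states(num_params: float) -> int:
    """Lower bound on nodes needed if model states are fully sharded across GPUs."""
    node_gb = GPU_MEMORY_GB * GPUS_PER_NODE
    return math.ceil(model_state_gb(num_params) / node_gb)


if __name__ == "__main__":
    params = 70e9  # a 70B-parameter dense model
    print(f"model states: {model_state_gb(params):.0f} GB")
    print(f"minimum nodes (states only): {min_nodes_for_states(params)}")
```

Under these assumptions, a 70B model needs about 1,120 GB for model states alone, so training spans at least two nodes before activations are counted. Real jobs typically need more headroom.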

Offline and batch inference

Run high-volume AI inference in batch mode, where per-request latency matters less than overall throughput and cost per result. The cost efficiency of HGX H100 makes it well suited to offline workloads in academic research, life sciences, and drug discovery, where large datasets are processed in scheduled jobs rather than real-time pipelines.
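The trade-off described above, throughput and cost per result rather than latency, can be made concrete with a small calculation. This is a minimal sketch: the price and throughput values are hypothetical placeholders, not Nebius pricing or measured H100 performance, and the function name is illustrative.

```python
def cost_per_million_results(price_per_gpu_hour: float,
                             results_per_second_per_gpu: float) -> float:
    """Cost of producing one million results on a single GPU."""
    if results_per_second_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    results_per_hour = results_per_second_per_gpu * 3600
    return price_per_gpu_hour / results_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    price = 2.00        # currency units per GPU-hour
    throughput = 50.0   # results per second per GPU in batch mode
    cost = cost_per_million_results(price, throughput)
    print(f"{cost:.2f} per million results")
```

Because cost per result scales inversely with throughput, doubling batch throughput at a fixed GPU-hour price halves the cost of each result, which is why batch workloads are usually tuned for throughput first.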

NVIDIA HGX H100 specifications

Specification                      HGX H100 (8-GPU system unless noted)
Form factor                        8× NVIDIA Hopper GPUs
FP8 Tensor Core (sparse)           32 PFLOPS
FP16/BF16 Tensor Core (sparse)     16 PFLOPS
TF32 Tensor Core (sparse)          8 PFLOPS
FP32                               67 TFLOPS per GPU
FP64 | FP64 Tensor Core            34 | 67 TFLOPS per GPU
INT8 Tensor Core (sparse)          32 POPS
GPU memory (per GPU | total)       80 GB | 640 GB HBM3
Memory bandwidth                   3.35 TB/s per GPU
NVIDIA NVLink™                     Fourth generation
NVLink GPU-to-GPU bandwidth        900 GB/s
Total NVLink bandwidth             7.2 TB/s
Networking bandwidth               0.4 Tb/s (400 Gb/s) per GPU

Source: nvidia.com/en-us/data-center/h100.
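The system-level figures in the specifications follow from the per-GPU values multiplied across eight GPUs. The short check below reproduces that arithmetic as a sketch. The per-GPU inputs are the values stated on this page, the dictionary keys and function name are illustrative, and the aggregate InfiniBand figure is derived here rather than quoted from NVIDIA.

```python
GPUS = 8  # GPUs in one HGX H100 system

# Per-GPU values as stated on this page.
PER_GPU = {
    "memory_gb": 80,               # HBM3 capacity
    "nvlink_gb_per_s": 900,        # NVLink GPU-to-GPU bandwidth
    "infiniband_gbit_per_s": 400,  # InfiniBand bandwidth per GPU
}


def system_totals(gpus: int = GPUS, spec: dict = PER_GPU) -> dict:
    """Aggregate per-GPU figures to one 8-GPU system."""
    ib_gbit = gpus * spec["infiniband_gbit_per_s"]
    return {
        "memory_gb": gpus * spec["memory_gb"],
        "nvlink_tb_per_s": gpus * spec["nvlink_gb_per_s"] / 1000,
        "infiniband_tbit_per_s": ib_gbit / 1000,
        # 8 bits per byte: terabits per second to terabytes per second
        "infiniband_tbyte_per_s": ib_gbit / 1000 / 8,
    }


if __name__ == "__main__":
    for name, value in system_totals().items():
        print(f"{name}: {value}")
```

The totals come out to 640 GB of HBM3 and 7.2 TB/s of NVLink bandwidth, matching the specifications. The networking figure of 0.4 Tb/s equals 400 Gb/s, the per-GPU InfiniBand rate, so one node aggregates to 3.2 Tb/s, or 0.4 TB/s.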

Supercomputer-class performance for your workloads

NVIDIA HGX H100 systems on Nebius are connected via NVIDIA Quantum-2 InfiniBand at 400 Gb/s per GPU. This high-speed interconnect supports parallel computation across thousands of GPUs, so your distributed workloads can scale without being bottlenecked by the network fabric.
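To see why per-GPU network bandwidth matters for scaling, consider a bandwidth-only estimate of a ring all-reduce, the collective commonly used to synchronize gradients. In a ring of N GPUs, each GPU sends and receives about 2(N-1)/N times the message size. The sketch below applies that textbook model to the 400 Gb/s link rate. It ignores latency, protocol overhead, and the hierarchical NVLink plus InfiniBand algorithms that real communication libraries use, so treat the result as a rough lower bound. The function name is illustrative.

```python
def ring_allreduce_seconds(message_bytes: float,
                           num_gpus: int,
                           link_gbit_per_s: float = 400.0) -> float:
    """Bandwidth-only time estimate for a ring all-reduce.

    Each GPU transfers 2 * (N - 1) / N * message_bytes over its link.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return 0.0  # nothing to exchange
    link_bytes_per_s = link_gbit_per_s * 1e9 / 8  # 400 Gb/s -> 50 GB/s
    traffic_bytes = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return traffic_bytes / link_bytes_per_s


if __name__ == "__main__":
    gradient_bytes = 10e9  # hypothetical 10 GB gradient buffer
    for n in (8, 64, 1024):
        t = ring_allreduce_seconds(gradient_bytes, n)
        print(f"{n:>5} GPUs: {t:.3f} s")
```

In this model the per-GPU traffic approaches twice the message size as N grows, so the estimated time levels off instead of growing with cluster size, provided every GPU keeps its full link bandwidth.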

NVIDIA Exemplar Cloud Validation

Nebius is a Reference Platform NVIDIA Cloud Partner and holds NVIDIA Exemplar Cloud validation across multiple GPU generations — from NVIDIA H200 to NVIDIA GB300 NVL72. Exemplar Cloud is awarded to providers that demonstrate real-world training performance against NVIDIA’s benchmarking standards, not just peak specifications.

Frequently Asked Questions

What is NVIDIA HGX H100?

NVIDIA HGX H100 is an 8-GPU server platform built on NVIDIA’s Hopper architecture. It provides 80 GB of HBM3 memory per GPU, fourth-generation NVLink, and FP8 Tensor Core support, and it is a proven, widely adopted platform for AI training, fine-tuning, and inference workloads.

Get started with NVIDIA HGX H100 on Nebius

Launch your first instance in minutes or talk to our team to find the right setup for your workload.