
NVIDIA HGX H100 on Nebius AI Cloud
Run AI inference, fine-tuning, and large-scale training cost-efficiently on the proven NVIDIA Hopper platform, deployed in production on Nebius AI Cloud.
What makes HGX H100 on Nebius different
GPU performance, optimized in-house
Our GPU clusters are optimized across every layer of the stack — from in-house designed servers to a software layer validated against NVIDIA benchmarks — so your AI workloads run at the highest achievable performance.
AI without operational overhead
We take care of the infrastructure, at any scale. Run containerized experiments with Serverless Jobs or deploy large-scale training clusters on Slurm, without touching drivers, networking, or cluster configuration.
Security and reliability by design
Every Nebius cluster comes with auto-healing that detects hardware failures and recovers from them with minimal interruption. Our platform is also built to industry security and compliance standards, so your data and workloads stay protected.
What teams run on NVIDIA HGX H100
Cost-efficient LLM inference
Serve large language models at scale without overpaying for compute. HGX H100 delivers strong FP8 inference throughput at a lower cost per GPU-hour than newer generations, making it the practical choice for teams that prioritize cost efficiency alongside performance.
LLM training for dense models
Train dense language models up to 70B parameters on a platform with a proven track record at cluster scale. HGX H100 is a reliable choice for non-MoE architectures, with years of optimized training recipes and well-understood behavior across large multi-node configurations.
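As a rough sizing sketch for the 70B figure above, the snippet below estimates memory for model weights alone and for mixed-precision Adam training state. The 16 bytes per parameter figure is a common rule of thumb (BF16 weights and gradients plus FP32 master weights and two Adam moments), not a Nebius or NVIDIA number. Activations and framework overhead are ignored, and the function names are hypothetical.

```python
import math

GPU_MEMORY_GB = 80   # HBM3 capacity per H100 GPU
GPUS_PER_NODE = 8    # GPUs in one HGX H100 system


def model_state_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory in GB (10^9 bytes) for a model stored at the given bytes per parameter."""
    return params_billion * bytes_per_param


def min_gpus(memory_gb: float) -> int:
    """Smallest number of 80 GB GPUs whose combined memory holds memory_gb."""
    return math.ceil(memory_gb / GPU_MEMORY_GB)


def min_nodes(memory_gb: float) -> int:
    """Smallest number of 8-GPU HGX H100 systems that holds memory_gb."""
    return math.ceil(min_gpus(memory_gb) / GPUS_PER_NODE)


if __name__ == "__main__":
    # Assumed bytes per parameter: 1 (FP8 weights), 2 (BF16 weights),
    # 16 (rule-of-thumb mixed-precision Adam training state).
    for label, bytes_per_param in [("FP8 weights", 1), ("BF16 weights", 2), ("Adam training state", 16)]:
        gb = model_state_gb(70, bytes_per_param)
        print(f"70B {label}: {gb:.0f} GB, at least {min_gpus(gb)} GPU(s), {min_nodes(gb)} node(s)")
```

Under these assumptions, the training state of a 70B dense model alone exceeds the 640 GB of a single system, which is why training at this size is typically sharded across multiple nodes.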
Offline and batch inference
Run high-volume AI inference in batch mode, where per-request latency matters less than overall throughput and cost per result. The cost efficiency of HGX H100 makes it well suited to offline workloads in academic research, life sciences, and drug discovery, where large datasets are processed in scheduled jobs rather than real-time pipelines.
NVIDIA HGX H100 specifications
| Specification | HGX H100 |
| --- | --- |
| Form factor | 8× NVIDIA Hopper GPUs |
| FP8 Tensor Core (sparse) | 32 PFLOPS |
| FP16/BF16 Tensor Core (sparse) | 16 PFLOPS |
| TF32 Tensor Core (sparse) | 8 PFLOPS |
| FP32 | 67 TFLOPS per GPU |
| FP64 / FP64 Tensor Core | 34 / 67 TFLOPS per GPU |
| INT8 Tensor Core (sparse) | 32 POPS |
| GPU memory (per GPU / total) | 80 GB / 640 GB HBM3 |
| Memory bandwidth | 3.35 TB/s per GPU |
| NVIDIA NVLink™ | Fourth generation |
| NVLink GPU-to-GPU bandwidth | 900 GB/s |
| Total NVLink bandwidth | 7.2 TB/s |
| Networking bandwidth | 0.4 Tb/s (400 Gb/s) |
Source: nvidia.com/en-us/data-center/h100.
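The system-level memory and NVLink figures in the specifications above follow directly from the per-GPU values. A minimal sketch of that arithmetic, using only numbers stated on this page (the function and constant names are illustrative):

```python
GPUS_PER_SYSTEM = 8          # HGX H100 form factor
MEMORY_PER_GPU_GB = 80       # HBM3 per GPU
NVLINK_PER_GPU_GB_S = 900    # NVLink GPU-to-GPU bandwidth, GB/s


def system_totals(gpus: int = GPUS_PER_SYSTEM) -> dict:
    """Aggregate per-GPU figures into per-system totals."""
    return {
        "memory_gb": gpus * MEMORY_PER_GPU_GB,
        "nvlink_tb_s": gpus * NVLINK_PER_GPU_GB_S / 1000,
    }


if __name__ == "__main__":
    totals = system_totals()
    print(f"Total GPU memory: {totals['memory_gb']} GB")
    print(f"Total NVLink bandwidth: {totals['nvlink_tb_s']} TB/s")
```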
Supercomputer-class performance for your workloads
NVIDIA HGX H100 systems on Nebius are connected via NVIDIA Quantum-2 InfiniBand at 400 Gb/s per GPU. This high-speed interconnect enables massively parallel computation across thousands of GPUs, so your distributed workloads can use the full potential of NVIDIA accelerated compute and scale without being bottlenecked by the network fabric.
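To put the interconnect figure in context, the sketch below converts the 400 Gb/s per-GPU InfiniBand rate into bytes per second, sums it per 8-GPU system, and compares it with the 900 GB/s NVLink bandwidth inside a system. All inputs come from this page. The helper names are hypothetical, and these are peak link rates, not measured throughput.

```python
GPUS_PER_SYSTEM = 8
INFINIBAND_PER_GPU_GBIT_S = 400   # Quantum-2 InfiniBand, Gb/s per GPU
NVLINK_PER_GPU_GBYTE_S = 900      # NVLink GPU-to-GPU bandwidth, GB/s


def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbit_per_s / 8


def node_infiniband_tbit_s(gpus: int = GPUS_PER_SYSTEM) -> float:
    """Aggregate InfiniBand bandwidth of one system in Tb/s."""
    return gpus * INFINIBAND_PER_GPU_GBIT_S / 1000


def nvlink_to_infiniband_ratio() -> float:
    """How many times faster the intra-node NVLink rate is than the per-GPU InfiniBand rate."""
    return NVLINK_PER_GPU_GBYTE_S / gbit_to_gbyte(INFINIBAND_PER_GPU_GBIT_S)


if __name__ == "__main__":
    print(f"InfiniBand per GPU: {gbit_to_gbyte(INFINIBAND_PER_GPU_GBIT_S):.0f} GB/s")
    print(f"InfiniBand per system: {node_infiniband_tbit_s():.1f} Tb/s")
    print(f"NVLink vs InfiniBand per GPU: {nvlink_to_infiniband_ratio():.0f}x")
```

The gap between the two rates is one reason distributed training setups usually keep the most communication-heavy parallelism inside a node and use the InfiniBand fabric for traffic between nodes.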
NVIDIA Exemplar Cloud Validation
Nebius is a Reference Platform NVIDIA Cloud Partner and holds NVIDIA Exemplar Cloud validation across multiple GPU generations — from NVIDIA H200 to NVIDIA GB300 NVL72. Exemplar Cloud is awarded to providers that demonstrate real-world training performance against NVIDIA’s benchmarking standards, not just peak specifications.

Frequently Asked Questions
What is NVIDIA HGX H100?
The NVIDIA HGX H100 is an 8-GPU server platform built on NVIDIA’s Hopper architecture. It provides 80 GB of HBM3 memory per GPU, fourth-generation NVLink, and FP8 Tensor Core support. It is a proven and widely adopted platform for AI training, fine-tuning, and inference workloads.
Get started with NVIDIA HGX H100 on Nebius
Launch your first instance in minutes or talk to our team to find the right setup for your workload.