Post-training for production: Making open models reliable at scale

Moving open models to production requires more than accuracy; it demands predictable latency, stable behavior under load, and cost control. In this live session, we’ll walk through how leading teams use post-training as the missing layer between a promising checkpoint and a production-ready system.

Fill out the form to get the recording

What you will learn

  • How speculative decoding controls tail latency (P90/P99) in long-context and high-concurrency workloads
  • Why generic draft models fail and how to post-train custom draft models on production data
  • How fine-tuning, distillation, quantization, and speculative decoding stabilize inference at scale
  • How to deploy custom speculator pipelines via the Nebius Token Factory API without infrastructure changes
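To make the core idea concrete before the session: in speculative decoding, a small draft model proposes several tokens ahead, and the large target model verifies them in a single pass, so accepted tokens cost one expensive forward instead of several. The sketch below is a toy illustration with stand-in functions (`draft_model` and `target_model` are hypothetical placeholders, not part of any real API); production systems use actual language models and batched verification.

```python
# Toy sketch of speculative decoding. The "models" here are hypothetical
# stand-ins that follow a simple deterministic rule; real deployments pair
# a small draft LM with a large target LM.

def draft_model(prefix, k=4):
    """Cheap draft model (toy): greedily proposes the next k tokens."""
    out = []
    for _ in range(k):
        nxt = (prefix[-1] + 1) % 10 if prefix else 0  # toy next-token rule
        out.append(nxt)
        prefix = prefix + [nxt]
    return out

def target_model(prefix):
    """Expensive target model (toy): returns its preferred next token."""
    return (prefix[-1] + 1) % 10 if prefix else 0  # same toy rule

def speculative_step(prefix, k=4):
    """Propose k draft tokens, then keep the longest verified run.

    The target model can check all k positions in one batched pass, which
    is what shrinks tail latency: several tokens are emitted per target
    forward when the draft model agrees with the target."""
    proposal = draft_model(prefix, k)
    accepted = []
    for tok in proposal:
        if target_model(prefix + accepted) == tok:
            accepted.append(tok)
        else:
            # First mismatch: fall back to the target model's own token.
            accepted.append(target_model(prefix + accepted))
            break
    return accepted

print(speculative_step([3]))  # toy models agree, so all 4 drafts are accepted
```

This also shows why draft-model quality matters, one of the session topics: when the draft model disagrees with the target (as a generic draft model often does on domain-specific traffic), fewer tokens are accepted per verification pass and the speedup evaporates, which is the motivation for post-training custom draft models on production data.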

Our hosts

Dylan Bristot

Product Marketing Manager

Mashrur Haider

Technical Product Manager

Sujee Maniyam

Developer Advocate

Try Nebius AI Cloud console today

Get immediate access to NVIDIA® GPUs, along with CPU resources, storage and additional services through our user-friendly self-service console.