Post-training for production: Making open models reliable at scale

Moving open models to production requires more than accuracy; it demands predictable latency, stable behavior under load, and cost control. In this live session, we’ll walk through how leading teams use post-training as the missing layer between a promising checkpoint and a production-ready system.

Fill out the form to get the recording

What you will learn

  • How speculative decoding controls tail latency (P90/P99) in long-context and high-concurrency workloads
  • Why generic draft models fail and how to post-train custom draft models on production data
  • How fine-tuning, distillation, quantization, and speculative decoding stabilize inference at scale
  • How to deploy custom speculator pipelines via the Nebius Token Factory API without infrastructure changes
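To make the core idea concrete before the session: in speculative decoding, a small draft model proposes several tokens ahead, and the large target model verifies them in a single pass, so accepted tokens cost one expensive forward instead of several. The sketch below is a toy illustration with stand-in functions (`draft_model` and `target_model` are hypothetical placeholders, not part of any real API); production systems use actual language models and batched verification.

```python
# Toy sketch of speculative decoding. The "models" here are hypothetical
# stand-ins that follow a simple deterministic rule; real deployments pair
# a small draft LM with a large target LM.

def draft_model(prefix, k=4):
    """Cheap draft model (toy): greedily proposes the next k tokens."""
    out = []
    for _ in range(k):
        nxt = (prefix[-1] + 1) % 10 if prefix else 0  # toy next-token rule
        out.append(nxt)
        prefix = prefix + [nxt]
    return out

def target_model(prefix):
    """Expensive target model (toy): returns its preferred next token."""
    return (prefix[-1] + 1) % 10 if prefix else 0  # same toy rule

def speculative_step(prefix, k=4):
    """Propose k draft tokens, then keep the longest verified run.

    The target model can check all k positions in one batched pass, which
    is what shrinks tail latency: several tokens are emitted per target
    forward when the draft model agrees with the target."""
    proposal = draft_model(prefix, k)
    accepted = []
    for tok in proposal:
        if target_model(prefix + accepted) == tok:
            accepted.append(tok)
        else:
            # First mismatch: fall back to the target model's own token.
            accepted.append(target_model(prefix + accepted))
            break
    return accepted

print(speculative_step([3]))  # toy models agree, so all 4 drafts are accepted
```

This also shows why draft-model quality matters, one of the session topics: when the draft model disagrees with the target (as a generic draft model often does on domain-specific traffic), fewer tokens are accepted per verification pass and the speedup evaporates, which is the motivation for post-training custom draft models on production data.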

Our hosts

Dylan Bristot

Product Marketing Manager

Mashrur Haider

Technical Product Manager

Sujee Maniyam

Developer Advocate

Try Nebius AI Cloud console today

Get immediate access to NVIDIA® GPUs, along with CPU resources, storage and additional services through our user-friendly self-service console.