Post-training for production: Making open models reliable at scale
Moving open models to production requires more than accuracy; it demands predictable latency, stable behavior under load, and cost control. In this live session, we’ll walk through how leading teams use post-training as the missing layer between a promising checkpoint and a production-ready system.
What participants will learn
- How speculative decoding controls tail latency (P90/P99) in long-context and high-concurrency workloads
- Why generic draft models fail and how to post-train custom draft models on production data
- How fine-tuning, distillation, quantization, and speculative decoding stabilize inference at scale
- Deploying custom speculator pipelines via the Nebius Token Factory API without infrastructure changes
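To make the speculative-decoding idea above concrete, here is a minimal toy sketch of the greedy variant: a cheap draft model proposes a few tokens ahead, and the target model keeps the longest prefix it agrees with. All names here (`draft_next`, `target_next`, `speculative_decode`) and the toy token rules are illustrative assumptions, not any real model or Nebius API.

```python
# Minimal sketch of greedy speculative decoding over toy "models".
# Tokens are just integers; the "models" are deterministic rules.

def draft_next(ctx):
    # Toy draft model: fast but imperfect next-token rule.
    return (ctx[-1] + 1) % 10

def target_next(ctx):
    # Toy target model: the authoritative next-token rule.
    # It disagrees with the draft whenever the last token is 7.
    last = ctx[-1]
    return 0 if last == 7 else (last + 1) % 10

def speculative_decode(ctx, n_tokens, k=4):
    """Generate n_tokens: each round, the draft proposes k tokens and
    the target keeps the longest agreeing prefix plus one correction."""
    out = list(ctx)
    while len(out) - len(ctx) < n_tokens:
        # 1) Draft proposes k tokens autoregressively (cheap).
        proposal = []
        cur = list(out)
        for _ in range(k):
            t = draft_next(cur)
            proposal.append(t)
            cur.append(t)
        # 2) Target verifies the proposals. In a real system this is
        #    one batched forward pass over all k positions, which is
        #    where the tail-latency win comes from.
        cur = list(out)
        for t in proposal:
            expected = target_next(cur)
            if expected == t:
                cur.append(t)          # accepted draft token
            else:
                cur.append(expected)   # correction; end this round
                break
        out = cur
    return out[len(ctx):len(ctx) + n_tokens]

print(speculative_decode([1], 10))
# → [2, 3, 4, 5, 6, 7, 0, 1, 2, 3]
```

The acceptance rate of the draft's proposals is what governs the speedup, which is why the session argues for post-training draft models on production data rather than using generic ones.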
Who should attend
ML engineers, platform engineers, and technical leaders running or deploying large open models in production. Ideal for teams who understand training but need practical guidance on making models behave reliably and predictably at scale.
Register to receive an invitation and a recording