
Introducing DevPods, Jobs and Endpoints: Easy compute access with serverless AI
If you work with AI infrastructure, you know how much effort it can take to get production-ready compute up and running. Modern AI stacks span multiple layers — servers, accelerators, drivers, libraries, virtualization, orchestration, networking — and aligning all of them into a stable environment often requires weeks of work from an experienced DevOps team.
At the same time, many AI use cases don’t need a permanently provisioned cluster. Data scientists want to run experiments, ML engineers want to test training scripts and validate model inference behind real endpoints — all without waiting for long-lived infrastructure to be designed, provisioned and validated.
At Nebius, we’re focused on making advanced AI compute accessible and practical for a broad range of users. As part of this direction, we’re introducing DevPods, Jobs and Endpoints — new services that represent our first steps toward container-based serverless compute for AI workloads.
These services are designed to hide most of the underlying infrastructure complexity, allowing ML engineers and data scientists to focus on model development, experimentation and evaluation. Compute resources are provisioned on demand, while cluster management, GPU drivers and networking are handled by the platform.
How serverless benefits AI practitioners
Originally, the serverless concept referred to Function-as-a-Service (FaaS) platforms such as AWS Lambda or Google Cloud Functions, where users execute short-lived functions on demand without provisioning or managing servers, virtual machines or clusters. Over time, this model evolved to support containerized applications, expanding serverless to a broader range of workloads.
Despite differences in implementation, most serverless platforms share several common characteristics:
- On-demand workload lifecycle — Compute resources are provisioned in response to triggers and released when the workload completes.
- Pay-per-use pricing — Users pay only for the time during which compute resources are actively allocated, thereby reducing the costs associated with idle capacity.
- Infrastructure invisible to users — The cloud provider is responsible for provisioning, scaling and maintaining the underlying infrastructure.
In the world of AI, container-based serverless platforms have become a practical middle ground, enabling teams to accelerate different stages of the model development pipeline — from faster experimentation to scalable and cost-conscious model inference.
With this approach, data scientists and ML engineers can access AI compute almost immediately, without waiting for a DevOps team to provision clusters, install GPU drivers or configure networking for public endpoints.
Figure 1. Serverless compute compared to other models of AI compute
To get started, practitioners package their training code or model artifacts into a container and submit it to the serverless platform. The infrastructure below that layer is already provisioned, tested and managed by the provider, allowing teams to focus on running and evaluating their workloads rather than on environment setup (Figure 1).
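As a purely illustrative sketch, packaging a training workload often comes down to a short Dockerfile like the one below; the base image, dependencies and `train.py` entrypoint are assumptions for illustration, not Nebius-specific requirements.

```dockerfile
# Illustrative training image; every name here is a placeholder,
# not a platform requirement.
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04

# Install Python and the dependencies the workload needs
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir torch

# Copy the training code into the image
WORKDIR /app
COPY train.py .

# The command the container runs when the platform starts it
CMD ["python3", "train.py"]
```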
In an environment where accelerators are scarce and expensive, this lets ML teams run AI workloads without reserving large GPU clusters in advance or carrying the overhead of idle resources.
DevPods, Jobs and Endpoints: Develop, run and serve with zero hassle
DevPods, Jobs and Endpoints are three new services in Nebius AI Cloud that implement a container-based serverless approach to AI compute. Each service is fully managed on our side and designed to give users a simple way to run AI and ML workloads without provisioning or maintaining long-lived infrastructure.
To start working with any of these services, users provide their own container image or specify the path to a public one, then define basic configuration parameters. This includes selecting a GPU type available in the region and, if needed, mounting an object storage bucket or file system for datasets and artifacts. The rest of the infrastructure lifecycle is handled by the platform.
Jobs and Endpoints are currently available in public preview through the Nebius web console and CLI. DevPods are in private preview and are expected to become generally available in the next three months.
DevPods
DevPods are interactive development environments powered by GPUs or CPUs, designed for coding, debugging and exploratory work. They serve as a developer playground for interactive coding in Jupyter or VS Code, exploratory data analysis and visualization, prototyping model ideas, and debugging running jobs or inference tasks.
The primary use case for DevPods is to quickly provision an interactive environment for human-in-the-loop development. These environments are not production-facing and therefore do not require production-grade stability or security, but they do require fast startup and rapid shutdown once work is complete.
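Once a DevPod is up, a quick sanity check of the attached accelerator might look like the snippet below (assuming PyTorch is included in your container image; DevPods themselves don't mandate any particular framework):

```python
import torch

# Confirm the environment can see the attached GPU
print(torch.cuda.is_available())            # True if a CUDA device is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # reports the GPU model
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).sum().item())             # trivial matmul to confirm compute works
```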
An alternative approach would be to spin up a VM and install a development stack each time you need compute for prototyping. This process is tedious, requires infrastructure engineering effort and can incur unnecessary costs when GPUs remain idle during setup (see Figure 2).
Figure 2. Comparison of VM-based and serverless workflows: serverless reduces setup overhead and idle GPU time
Jobs
Jobs provide a simple orchestration mechanism for running finite workloads on allocated GPU or CPU-only resources. The service is well-suited for use cases such as batch data processing, model training experiments or scientific simulations.
In its current form, Jobs are optimized for single-node workloads and simplified execution flows. As a result, the service is not intended for large-scale or highly complex distributed training scenarios.
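To make this concrete, a single-node job's entrypoint usually just reads from a mounted path and writes artifacts back before the compute is released. Here is a minimal sketch under the assumption that a bucket is mounted at `/mnt/data`; the paths and file names are illustrative:

```python
import json
import pathlib

# Illustrative mount points; the actual paths are whatever you configure
# when attaching a bucket or file system to the job.
DATA_DIR = pathlib.Path("/mnt/data")
OUT_DIR = DATA_DIR / "artifacts"

def main() -> None:
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    # Stand-in for real work: count the records in each input file
    stats = {p.name: sum(1 for _ in p.open()) for p in DATA_DIR.glob("*.jsonl")}
    # Persist results to the mount so they outlive the job's compute
    (OUT_DIR / "stats.json").write_text(json.dumps(stats, indent=2))

if __name__ == "__main__":
    main()
```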
Endpoints
Endpoints provide a serverless compute engine with pre-configured web endpoints, enabling users to deploy custom models and make them accessible via HTTP within minutes. In the current version, the service is particularly well-suited for pre-production deployments and testing scenarios, where teams want to evaluate model behavior under realistic conditions.
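In practice, the container behind an endpoint just needs to run an HTTP server. A minimal FastAPI sketch is shown below; the route, port and request schema are assumptions for illustration, not a contract the service imposes:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    # Stand-in for real inference, e.g. running a loaded model checkpoint
    return {"echo": req.text, "length": len(req.text)}

# Typically started inside the container with something like:
#   uvicorn server:app --host 0.0.0.0 --port 8000
```

Once deployed, the model is reachable at the public URL the service assigns, so teams can exercise it with ordinary HTTP clients.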
The longer-term roadmap for Endpoints focuses on auto-scaling and a full-fledged serving model that combines simplicity with greater control over the inference runtime and configuration.
These serverless tools expand the range of options for seamless ML development on the Nebius cloud platform, letting ML engineers move through their pipelines without stalls or extra costs: they can start prototyping quickly with DevPods and Jobs, scale up training with orchestrators such as Managed Soperator, and evaluate production models with Endpoints.
Current limitations and upcoming changes
The Jobs and Endpoints services are being launched in public preview to give users early access to a container-based serverless experience for AI workloads. DevPods are expected to become publicly accessible in mid-April. At the same time, we recognize that the current functionality represents an initial stage, and that several core capabilities typically associated with mature serverless platforms are still under development.
The table below outlines the functionality available today, along with a set of core capabilities that are planned as part of the general availability (GA) roadmap. These timelines reflect our current plans and may evolve as development progresses.
| Capability | Q1 2026 | Q2–Q3 2026 | Q4+ 2026 |
|---|---|---|---|
| Startup latency | Slow | Moderate | Optimized |
| Observability | Logging, Basic monitoring | Logging, Advanced monitoring | Logging, Advanced monitoring |
| Monitoring | Basic | Improved | Improved |
| Health checks | Basic | Advanced | Advanced |
| Serverless for reserved capacity | — | Yes | Yes |
| Single-node cluster support | Yes | Yes | Yes |
| Multi-node cluster support | — | Jobs | Jobs, Endpoints |
| Multi-region scheduling | — | Jobs | Jobs, Endpoints |
| Autoscaling | — | — | Endpoints |
Elastic compute on a robust platform
The serverless services at Nebius are a natural extension of how an AI infrastructure cloud evolves over time, building on a mature and well-established underlying platform. As the platform develops, it becomes possible to expose compute in more flexible and elastic forms that better match how AI workloads are consumed.
While DevPods, Jobs and Endpoints are still evolving, the underlying platform is built to support reliable, high-performance and elastic AI compute as additional features are introduced. In particular:
- We maintain control over the infrastructure lifecycle, including server design, firmware and cluster-level monitoring for proactive hardware health management.
- The platform is designed to support secure and compliant compute environments, and aligns with common industry standards and regulatory requirements.
- Virtualization is implemented with minimal overhead and delivers a bare-metal level of performance, as demonstrated by public benchmark results, including MLPerf submissions.
This foundation provides a strong basis for expanding serverless capabilities over time.
How to get started
The new serverless tools are available in self-service mode through the Nebius web console and via the CLI, making it easy to start running AI workloads without upfront reservations or long-term commitments.
To get started, sign in to the Nebius console, navigate to AI Services in the left-hand menu and select the service you need. From there, choose from the available GPUs in your region and configure the required runtime settings. If needed, you can also mount object storage or a file system to provide access to data and store results.
DevPods, Jobs and Endpoints use pay-as-you-go pricing based on on-demand compute rates, and charges apply only while workloads are running.
For detailed configuration options and examples, refer to the documentation.
Interested in Serverless AI and want to learn more from Nebius experts? Register for our upcoming virtual events:
- April 9 (Thursday) — Serverless for AI in 2026
- April 16 (Thursday) — Serverless AI for developers
- April 23 (Thursday) — Serverless AI for Life Sciences
- May 5 (Tuesday) — Serverless AI for Robotics and Physical AI




