Async workloads and batch inference at scale
Process millions of requests asynchronously with our high-throughput Batch API.
Asynchronous API
Submit entire inference jobs at once and retrieve results typically within 24 hours, freeing your systems from real-time processing constraints.
Cost optimization
Get fast model quality at base model prices. Batch requests to fast models are charged at the economical base model rate, allowing you to maximize your AI budget.
Process up to 10 GB in one go
Handle large-scale data effortlessly. Our Batch API supports requests up to 10 GB, helping you maintain efficient, high-volume processing and avoid rate limits.
Batch vs. normal API: processing 1M requests
Example: processing 1 million inference requests with Meta/Llama-3.1-70B-Instruct (3 million TPM, 1,200 RPM).
Batch API
- Single JSONL file (up to 10 GB) containing 1 million requests, processed asynchronously
- Processing time: ~24 hours
- No rate-limit consumption
- Set and forget — no monitoring needed
- Fast model variants at base model prices
Normal API
- 1 million individual API calls
- Processing time: at least 13.9 hours, assuming the maximum throughput of 1,200 RPM (see the calculation after this list)
- Requires complex retry and queue logic
- Consumes your rate limits
- Priced above the base model rate
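The 13.9-hour floor for the normal API falls straight out of the rate limit; a quick back-of-the-envelope check in Python:

```python
# Lower bound on wall-clock time at the 1,200 RPM rate limit,
# ignoring retries, backoff and token-per-minute throttling.
requests = 1_000_000
rpm = 1_200
hours = requests / rpm / 60
print(f"{hours:.1f} hours")  # -> 13.9 hours
```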
Technical capabilities
Process up to 5 million requests per file
Handle millions of individual inference operations in a single batch (limit can be raised on request)
Support for files up to 10 GB
Submit extensive datasets in one operation, without splitting or chunking.
Run up to 500 concurrent batches
Scale your processing across multiple parallel jobs for maximum throughput (limit can be raised on request)
How it works
Prepare your JSONL file
Create a file where each line represents a request with a custom ID, method, URL and body.
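A minimal sketch of preparing that file in Python. The field names (custom_id, method, url, body) follow the description above; the model ID, prompts, and endpoint path are illustrative assumptions:

```python
import json

# Each JSONL line is one request: a custom ID, HTTP method, target URL and body.
prompts = ["Summarize this article: ...", "Translate to French: ..."]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",    # your own ID, echoed back in the results
            "method": "POST",
            "url": "/v1/chat/completions",  # assumed chat-completions endpoint
            "body": {
                "model": "meta-llama/Llama-3.1-70B-Instruct",  # illustrative model ID
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```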
Upload your file
Upload the file through our API with a single call.
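Assuming an OpenAI-compatible SDK (the base URL and API key below are placeholders), the upload could look like this:

```python
from openai import OpenAI

# Placeholder credentials; substitute your provider's base URL and key.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Upload the JSONL file and mark it as batch input.
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)
print(batch_file.id)
```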
Create a batch
Specify the target endpoint and a completion window of 24 hours.
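Continuing the same sketch, creating the batch ties the uploaded file to an endpoint and window (parameter names assume the OpenAI-compatible convention):

```python
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",  # must match the url field in the JSONL lines
    completion_window="24h",          # the 24-hour window described above
)
print(batch.id, batch.status)
```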

Monitor progress and download results
Track batch status through our API and download the results once processing finishes.
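A polling loop in the same sketch; the status names and output_file_id field assume the OpenAI-compatible convention:

```python
import time

# Poll until the batch reaches a terminal state.
while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

# Download the results file once processing has finished.
if batch.status == "completed":
    result = client.files.content(batch.output_file_id)
    with open("batch_results.jsonl", "wb") as f:
        f.write(result.read())
```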
Pricing is simple
Batch inference is automatically billed at 50% of the base real-time model price, rounded up to the nearest cent.
Example: if a model's base price is $0.13 input and $0.40 output, Batch inference is $0.07 input and $0.20 output, respectively.
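The rounding rule above is easy to reproduce; a small Python sketch using exact decimal arithmetic:

```python
from decimal import Decimal, ROUND_CEILING

def batch_price(base_price: str) -> Decimal:
    """Half the base real-time price, rounded up to the nearest cent."""
    return (Decimal(base_price) / 2).quantize(Decimal("0.01"), rounding=ROUND_CEILING)

print(batch_price("0.13"))  # 0.07  (half of 0.13 is 0.065, rounded up)
print(batch_price("0.40"))  # 0.20
```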
Questions and answers
What is a Batch API?
A Batch API lets you submit large sets of data or multiple tasks at once, process them asynchronously, and retrieve all the results together. This approach reduces network overhead, improves efficiency, and streamlines the handling of extensive workloads.