
Nebius and LangChain partner to power production-grade AI agents on open models
Agents are production workloads now
The shift from chatbots to autonomous agents has changed what AI infrastructure needs to support. A single agent interaction can involve a planning step, multiple sub-agent calls, tool use, memory retrieval and retries, with each generating inference requests. Where a chatbot makes one LLM call per user message, an agent workflow can make dozens.
LangChain’s ecosystem has become the default choice for teams building agentic workflows. But production agents raise an infrastructure question the framework layer doesn’t answer on its own: where do these agents actually run at scale, with the reliability, throughput and model flexibility that production demands?
Nebius Token Factory and LangChain Deep Agents: the integration
This is where Token Factory comes in. It supports 30+ open-source models compatible with LangChain Deep Agents — including Llama, Qwen, DeepSeek and NVIDIA Nemotron — with an OpenAI-compatible API, dedicated endpoints, autoscaling and a 99.9% uptime SLA.
Here is how the stack fits together:
- Deep Agents is LangChain’s open-source agent framework, built on LangGraph to handle orchestration: planning, sub-agents and tool use
- LangSmith provides tracing and evaluation
- Token Factory provides the inference backend powering every LLM call the agent makes
- Tavily, an agentic search API, provides real-time web search and content extraction, giving agents grounded access to live information
This integration brings full observability across the agent stack without additional instrumentation. Teams can point existing agent workloads at Token Factory through a single configuration change: swap the base URL and the model string. Agents, LangSmith traces, evaluations and orchestration logic all remain untouched.
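Because the API is OpenAI-compatible, the swap amounts to changing two strings. The sketch below builds the request shape to make that concrete; the base URL (a `.example` placeholder) and the model string are assumptions for illustration, not the real Token Factory endpoint, which comes from your console:

```python
import json
import os

# Placeholder values: the real base URL and model string come from your
# Token Factory configuration (both are assumptions here, not real endpoints).
BASE_URL = "https://api.tokenfactory.nebius.example/v1"
MODEL = "meta-llama/Llama-3.3-70B-Instruct"


def chat_completions_request(messages: list, model: str = MODEL,
                             base_url: str = BASE_URL) -> dict:
    """Build an OpenAI-compatible chat completions request.

    Only the URL and the model string differ from any other
    OpenAI-compatible backend; everything else stays the same.
    """
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {os.environ.get('NEBIUS_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }


req = chat_completions_request([{"role": "user", "content": "Plan the next step."}])
print(req["url"])
```

Any client that accepts a custom base URL can consume this shape unchanged, which is why the rest of the agent stack does not need to know the backend moved.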
Optimizing across every layer
Deep Agents’ architecture supports per-sub-agent model routing, so teams can assign different models to different agent roles: a larger model for planning, a faster model for execution, a lightweight model for tool-calling. This allows cost and quality optimization at each layer rather than running one model for everything.
Token Factory’s growing model catalog makes this practical: teams select from the models that fit each role, and swap freely as new models become available with a simple config change.
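One way to express per-role routing is a simple role-to-model table with a fallback. The role names and model strings below are illustrative assumptions, not a prescribed Deep Agents configuration:

```python
# Illustrative role -> model routing table. The model strings are examples of
# open models; they are assumptions, not a verified Token Factory catalog.
MODEL_BY_ROLE = {
    "planner": "deepseek-ai/DeepSeek-R1",            # larger model for planning
    "executor": "meta-llama/Llama-3.1-8B-Instruct",  # faster model for execution
    "tool_caller": "Qwen/Qwen2.5-7B-Instruct",       # lightweight tool-calling model
}


def model_for(role: str) -> str:
    """Resolve the model string for a sub-agent role, falling back to the
    executor model for roles not listed in the table."""
    return MODEL_BY_ROLE.get(role, MODEL_BY_ROLE["executor"])


print(model_for("planner"))
```

With this layout, swapping in a newly released model for one role is a one-line change to the table, and no orchestration code is touched.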
The langchain-nebius package
Getting started
Install the langchain-nebius package and point your agent at your Token Factory endpoint.
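A typical setup might look like the following; the environment-variable name NEBIUS_API_KEY is an assumption for illustration:

```shell
# Install the LangChain integration package for Nebius
pip install langchain-nebius

# Make your Token Factory API key available to the client
# (the variable name NEBIUS_API_KEY is an assumption)
export NEBIUS_API_KEY="..."
```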
Want to see it live? Join Nebius and LangChain for a webinar on May 21.