Setting up a RAG-powered content generation platform with Nebius AI Studio and Qdrant
September 23, 2025
11 mins to read
Writer’s block is real, and content teams face it more than anybody else. It is not just hard to write “social media hits” every day, but also difficult to keep up with what’s trending and align it with your company’s messaging. This is where AI can help. While there seems to be an AI SaaS solution for everything, a content generator platform tailored to a company’s specific needs was missing…until now. In this article, you will learn how to build a content generation platform by using Nebius AI Studio and Qdrant. By the end, you will be able to build one for yourself, and you can also run the platform locally by following the steps in the readme.
When users upload company data or documents from the frontend, the request hits the backend API, which processes the content: it converts the text into vector embeddings by using the AI Studio API and stores those vectors in Qdrant Cloud, which powers the semantic search.
Later, when a user wants content generated, the system queries Qdrant to fetch relevant context based on semantic similarity. That context is then passed to the Llama-3.3-70B-Instruct model by using the AI Studio API, which generates personalized content suggestions based on it.
Here’s a diagram to show how all this ties together:
Before jumping into the code, let’s understand why we chose these tools.
Nebius AI Studio has pretty much everything you want if you’re building for production: crazy-fast throughput (we’re talking up to 10 million tokens per minute) and 400k+ TPM on most models, support for multiple models like DeepSeek, Qwen and Llama, and solid features like fine-tuning support tailored to enterprise use cases. This lets you generate content fast and reliably, without having to worry about infrastructure.
Qdrant Cloud, meanwhile, is a vector database built for semantic search: it retrieves relevant context from the uploaded documents and links, which is what makes the generated content contextually aware.
For the RAG pipeline to work as expected, you need to upload context documents. These can be a link or a DOCX, TXT or MD file. Once you do that from the frontend, as shown in the image below, the pipeline kicks in: together with Nebius AI Studio, it embeds the content and generates contextually aware output.
In the next steps, we will see how the raw documents are turned into searchable vectors that help in generating social media posts, articles and demo ideas.
Each step will be illustrated with code examples, but if you want to see the entire code, you can find it on GitHub.
Next, we convert these text chunks into vector embeddings by using Nebius’ Embedding model: Qwen3-Embedding-8B. Vector embeddings are numerical representations of text that capture semantic meaning, allowing our system to understand the relationships between different pieces of content.
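Here’s a minimal sketch of that step, assuming Nebius’ OpenAI-compatible API via the official openai npm package. The base URL, environment variable name and exact model ID are assumptions, so check your AI Studio dashboard for the real values:

const OpenAI = require('openai');

// Nebius AI Studio exposes an OpenAI-compatible API; the base URL below is
// an assumption — use the one shown in your AI Studio dashboard.
const client = new OpenAI({
  baseURL: 'https://api.studio.nebius.ai/v1/',
  apiKey: process.env.NEBIUS_API_KEY, // hypothetical env var name
});

async function embedChunks(chunks) {
  const response = await client.embeddings.create({
    model: 'Qwen/Qwen3-Embedding-8B', // assumed model ID on Nebius
    input: chunks,                    // an array of text chunks
  });
  // One vector per chunk, in the same order as the input
  return response.data.map((item) => item.embedding);
}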
Each vector is stored with its associated metadata (like the original text, source document and timestamp) in what Qdrant calls a “point.” This allows us to not only find similar vectors, but also retrieve the original context when needed for content generation.
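As a sketch, storing those points with the official @qdrant/js-client-rest client might look like this; the collection name and payload fields are illustrative, not necessarily the repo’s exact schema:

const crypto = require('crypto');
const { QdrantClient } = require('@qdrant/js-client-rest');

const qdrant = new QdrantClient({
  url: process.env.QDRANT_URL,        // your Qdrant Cloud cluster URL
  apiKey: process.env.QDRANT_API_KEY, // hypothetical env var names
});

async function storeChunks(chunks, vectors, source) {
  // Each point pairs a vector with its metadata payload
  await qdrant.upsert('company_documents', { // assumed collection name
    wait: true,
    points: chunks.map((text, i) => ({
      id: crypto.randomUUID(), // Qdrant accepts UUIDs or unsigned ints as IDs
      vector: vectors[i],
      payload: {
        text,                                 // the original chunk text
        source,                               // e.g. file name or URL
        uploadedAt: new Date().toISOString(), // timestamp metadata
      },
    })),
  });
}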
When a user requests content generation, we first search for relevant context. This is where the RAG approach kicks in: instead of generating content from scratch, we find the most relevant information from the user’s uploaded documents and company data to provide context-aware suggestions.
Now, we use Nebius’ Llama-3.3-70B-Instruct model to generate content with the retrieved context. This is where semantic search works with AI generation to create personalized content suggestions instead of suggesting generic content.
// backend/src/services/nebiusService.js
async generateContentSuggestions(companyData, contentType, goals, contextData = []) {
  let contextInfo = '';
  if (contextData && contextData.length > 0) {
    contextInfo = `\n\nRelevant Context from Uploaded Documents:\n`;
    contextData.forEach((doc, index) => {
      contextInfo += `${index + 1}. Source: ${doc.source || 'Document'}\n`;
      contextInfo += `   Content: ${doc.text.substring(0, 200)}${doc.text.length > 200 ? '...' : ''}\n\n`;
    });
  }

  const prompt = `Based on the following company information and uploaded documents, suggest 3 ${contentType} ideas:

Company Data: ${JSON.stringify(companyData)}
Company Goals: ${goals}
${contextInfo}
Please provide structured suggestions with titles, descriptions, and key points.`;

  return await this.generateText(prompt, 'meta-llama/Llama-3.3-70B-Instruct', 1500);
}
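A hypothetical call might look like this (assuming the service is exported as nebiusService, and with contextData being the list of hits returned by the semantic search described next; the company data shown is just an example):

// Inside an async request handler
const suggestions = await nebiusService.generateContentSuggestions(
  { name: 'Acme Inc.', industry: 'Developer tools' }, // example company data
  'social media posts',
  'Grow developer awareness in Q4',
  contextData // documents retrieved from Qdrant (see the next section)
);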
The search functionality is what makes our content contextual. Here’s how the semantic search works:
Vector similarity search
Unlike traditional keyword-based search, which looks for exact word matches, semantic search understands the meaning behind the text. When a user requests content generation, their query is converted into a vector and we find the most semantically similar documents in our database.
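Here’s a minimal sketch of that retrieval step, reusing the embedChunks helper and Qdrant client from the earlier sketches; the parameter values mirror the explanation that follows:

async function findRelevantContext(query) {
  // Embed the user's query with the same model used for the documents
  const [queryVector] = await embedChunks([query]);
  const results = await qdrant.search('company_documents', {
    vector: queryVector,
    limit: 5,             // keep the context focused and manageable
    score_threshold: 0.7, // only return sufficiently similar results
    with_payload: true,   // include the stored text and metadata
  });
  // Hand the original text and source back for prompt construction
  return results.map((hit) => ({
    text: hit.payload.text,
    source: hit.payload.source,
    score: hit.score,
  }));
}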
The score_threshold parameter only returns results that are relevant (similarity score above 0.7), while the limit parameter keeps the context focused and manageable. This means if someone asks for “social media content about our new product,” the system will find documents about product launches, marketing strategies or customer testimonials, even if those documents don’t contain the exact words “social media” or “new product.”
To break it down, there are two main processes working in the background:
Document upload: When users upload documents, the tool first extracts the text content, then breaks it into manageable chunks with overlapping sections to maintain context (see the sketch after this list). These text chunks are then converted into vector embeddings by using Nebius’ Qwen3-Embedding-8B model and finally stored in Qdrant Cloud for fast semantic search.
Content generation: When a user requests content generation, it starts by converting their query into a vector embedding, then performs a semantic search in Qdrant Cloud to find the most relevant documents. The retrieved context is then passed to AI Studio, which generates personalized content suggestions based on both the company data and the relevant document context.
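For illustration, a minimal chunking helper with fixed-size chunks and a fixed overlap might look like this; the repo’s exact chunk sizes and strategy may differ:

function chunkText(text, chunkSize = 1000, overlap = 200) {
  const chunks = [];
  // Step forward by (chunkSize - overlap) so consecutive chunks share context
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}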
The benefit of this architecture is that it scales well. As you add more documents, the system gets smarter about your company’s context, leading to better content suggestions over time.
We just walked through how to build a content generation platform by using AI Studio and Qdrant Cloud, from uploading documents to generating contextual, company-specific content with a RAG setup.
With AI Studio handling high-performance generation and Qdrant powering semantic search, you get a solid foundation that’s scalable, efficient and grounded in your actual data, without needing to worry about managing complex infra.