Adopting AI shouldn’t mean risking data security. For organisations handling sensitive knowledge, every query is a matter of trust. What if your teams could ask any internal question — and get an accurate, policy-compliant answer in seconds?
As McKinsey reports, nearly 70% of enterprises already use AI across their operations. Yet only a fraction of organisations deploy generative models securely within their internal environments. At Codelab, we designed a solution that makes this possible — a secure LLM-based assistant that truly understands, protects, and delivers enterprise knowledge.
Enterprise context and security requirements
We faced these challenges ourselves in internal knowledge management: extracting information quickly from an ocean of documents, then analysing it and drawing conclusions. With a vast repository of sensitive company data and a growing need for efficient, secure, and accurate responses, we sought a cutting-edge solution to enhance our operations.
Challenge: Secure and scalable knowledge retrieval
Our existing internal knowledge-sharing systems were struggling to meet the following demands:
- Efficient Access to Sensitive Data: Employees in different roles, especially those working directly on projects and internal processes, required quick and secure access to critical information stored in the company’s internal repositories.
- Scalability and Accuracy: The system needed to handle a growing volume of queries while maintaining high accuracy and relevance in responses.
- Fallback for Unavailable Data: When internal data was insufficient, the system needed to provide reliable external insights without compromising security.
The real breakthrough wasn’t getting the model to generate answers — it was turning messy internal knowledge into something structured, searchable, and secure.
Patryk Skwarcan
Project Manager
Solution Architecture: LLM + RAG + AWS Deployment
Codelab developed and deployed a Virtual Assistant powered by LLaMA and OpenAI models with a Retrieval-Augmented Generation (RAG) architecture, hosted securely on AWS. The solution was designed with the following key features:
- Secure Data Access:
- The LLaMA-based assistant was integrated with the company’s sensitive data repositories via a secure VPN connection.
- An additional access-control mechanism ensured that only authorized users could retrieve specific data, in line with the company’s strict security policies.
- Retrieval-Augmented Generation (RAG):
- The assistant used RAG to retrieve relevant information from the company’s internal documents before generating responses, ensuring that answers were accurate, contextually relevant, and grounded in the company’s proprietary knowledge.
- AWS Hosting:
- The entire solution was hosted on AWS, leveraging its scalability, reliability, and security features.
- The solution could also switch between CPU and GPU compute on AWS depending on the customer’s needs: fast-paced responses in the style of a live chat conversation, or deeper reasoning at slower response times.
- The application is also ready to work fully offline using a Llama model; alternatively, it can run on OpenAI models.
- User-Friendly Interface:
- The assistant was accessible through a simple and intuitive chat interface, enabling employees and customers to interact effortlessly.
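The retrieve-before-generate flow described above can be sketched in miniature. Everything here is an illustrative stand-in — the toy documents, the term-overlap scorer, and the stubbed generation step are not the production implementation, which ran LLaMA/OpenAI against indexed repositories reached over a VPN:

```python
import math
from collections import Counter

# Toy in-memory index standing in for the company's internal repositories.
DOCUMENTS = {
    "vpn-policy": "Remote access requires the corporate VPN and MFA.",
    "leave-policy": "Employees accrue 26 days of paid leave per year.",
    "expense-policy": "Travel expenses must be approved before booking.",
}

def tokenize(text: str) -> list[str]:
    return [t.strip(".,").lower() for t in text.split()]

def score(query: str, doc: str) -> float:
    """Simple term-overlap score; production used embeddings and hybrid search."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    overlap = sum((q & d).values())
    return overlap / math.sqrt(len(tokenize(doc)) or 1)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents against the query and return the top-k passages."""
    ranked = sorted(DOCUMENTS, key=lambda d: score(query, DOCUMENTS[d]), reverse=True)
    return [DOCUMENTS[d] for d in ranked[:k]]

def answer(query: str) -> str:
    """Retrieve context first, then generate: the model only sees grounded text."""
    context = "\n".join(retrieve(query))
    # Placeholder for the actual LLM call (offline Llama or the OpenAI API).
    return f"Based on internal documents:\n{context}"

print(answer("How many days of paid leave do employees get?"))
```

The key property is the ordering: retrieval happens before generation, so the model's answer is constrained to company-approved context rather than its open-ended training knowledge.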
Measurable business results
▪️Response Time Reduction (94%): Median response time fell from 5 minutes to 20 seconds.
▪️Annual Time Savings (2,500 Hours): Automating repetitive queries reduced manual workload by approx. 2,500 hours per year (equivalent to 1.3 FTE).
▪️Annual Cost Savings (€75,000): The reduced manual workload translated into approximately €75,000 in savings per year.
▪️Automation Rate (40–70%): The AI assistant autonomously handled 40–70% of recurring queries.
▪️Human Escalation Rate (<10%): Fewer than 10% of total queries required human intervention.
▪️Knowledge Retrieval Precision (≥80%): Precision achieved in LLM-based internal document search.
Why this architecture works in high-security environments
Codelab implemented a secure and scalable Virtual Assistant based on LLaMA, Retrieval-Augmented Generation (RAG), and AWS infrastructure, designed to operate within a controlled enterprise environment. The system addresses the identified knowledge-retrieval challenges and provides a foundation for further internal AI capability development.
A significant portion of the engineering effort focused on data mining and document preparation. The RAG layer accounted for less than 30% of the total implementation work, with the majority dedicated to data structuring, normalization, and indexing optimization. The system also exposes a secured API layer via FastAPI for controlled internal integration.
Technical stack
- LLaMA Model & OpenAI: For natural language understanding and generation.
- Retrieval-Augmented Generation (RAG): For combining internal document retrieval with generative AI capabilities.
- AWS: For secure and scalable hosting.
- Beyond the typical RAG implementation, we also built non-trivial features such as query rephrasing, chat-history-aware retrieval, and hybrid search.
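To illustrate the hybrid-search idea, the toy example below blends a lexical overlap score with cosine similarity over hand-made stand-in "embeddings". The documents, vectors, and the fixed blending weight are assumptions for demonstration; a real deployment would use an actual embedding model and tuned weights:

```python
import math

DOCS = {
    "doc-a": "reset your password in the account settings portal",
    "doc-b": "quarterly financial results and revenue report",
}

# Toy 2-d "embedding" table, hand-assigned for the demo.
EMB = {
    "doc-a": (0.9, 0.1),
    "doc-b": (0.1, 0.9),
    "how do I change my login password": (0.8, 0.2),
}

def cosine(a: tuple, b: tuple) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def lexical(query: str, doc: str) -> float:
    """Fraction of query terms that appear verbatim in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def hybrid_rank(query: str, alpha: float = 0.5) -> str:
    """Blend keyword and semantic scores, then return the best document id."""
    scores = {
        doc_id: alpha * lexical(query, text) + (1 - alpha) * cosine(EMB[query], EMB[doc_id])
        for doc_id, text in DOCS.items()
    }
    return max(scores, key=scores.get)

print(hybrid_rank("how do I change my login password"))
```

The value of the hybrid approach shows in queries like this one: "change" and "login" never appear in doc-a, so pure keyword search scores it weakly, but the semantic component still ranks it first.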
Scaling secure enterprise AI: From internal assistant to production-ready AI systems
The deployment of this internal system demonstrates how AI-driven solutions can operate within high-security enterprise environments. Codelab continues to develop secure AI architectures and cloud-based systems to expand internal capabilities and support production-ready enterprise deployments.