Cog-RAG: Cognitive Retrieval-Augmented Generation
Cog-RAG is Queria's proprietary architecture that transforms document search from a mechanical process into a cognitive one. Instead of merely finding documents similar to a question, the system understands, plans, reasons and verifies before answering.
Starting with v3.5.0, the entire Cog-RAG cycle is implemented as a DSL canvas (purpose = CHAT): a graph pipeline of typed nodes that each tenant can customize without writing code. The same pattern applies to ingestion (purpose = INGESTION) and to OAuth-based external integrations (purpose = SERVICE).
See also
- Chat DSL (Canvas) -- how to configure the chat pipeline for your company
- Ingestion DSL -- role-aware ingestion pipeline
- OAuth & Service Canvas -- authenticated integrations
The traditional RAG problem
A classical RAG system follows three steps:
- Receives the user question
- Searches the most similar documents in the vector database
- Passes the found documents to the language model to generate the answer
This approach has clear limits. It does not handle complex questions requiring information from different sources. It does not adapt the search strategy to question complexity. It does not verify whether results are really pertinent. It is unable to reason across multiple documents to produce a synthesis. It has no memory of who is talking or what has already been asked.
From RAG to agentic Cog-RAG with memory
Queria's architecture evolves traditional RAG along three axes:
- Cognitive (Cog) -- the system understands intent, plans the strategy, decomposes queries and verifies grounding before answering.
- Agentic -- multi-step ReAct pattern: the engine runs a chain of observation, reasoning and action (tool calls, retrieval, synthesis), repeated until it converges or reaches the iteration limit.
- With memory -- every user has a persistent Memory Subject that retains facts, preferences and context across different conversations. The bot remembers who you are, what you're working on, what you've already seen.
The three axes are implemented as a customizable DSL canvas. What follows describes the default "system" flow, which is the starting point for every tenant and can be modified without a backend release.
How agentic Cog-RAG works
Cog-RAG introduces a full cognitive cycle between question and answer, memory-aware and agentic:
- Memory recall: the question is combined with the user's Memory Subject and the conversation history to build the context, which stays available to every action in the agentic loop.
- [1] Query analysis and understanding, using the user context.
- [2] Search strategy planning.
- [3] Decomposition into sub-queries, if needed.
- [4] Agentic ReAct loop, repeated until convergence or the iteration limit:
  - Action: tool calls (retrieval, external_tool, ...).
  - Observation: tool results.
  - Reasoning: the critic evaluates whether the evidence is sufficient; if not, a new action follows.
- [5] Semantic re-ranking.
- [6] Quality and grounding check by the critic. If the evidence is insufficient, the flow returns to the agentic loop with widened scope.
- [7] Synthesis with deep reasoning.
- [8] Memory update with the persistent facts that emerged.
- Output: answer with citations.
Each step is a canvas node, handled by specialized components that collaborate autonomously. The critic, a dedicated LLM node, evaluates at each iteration whether the gathered evidence is sufficient and accurate; if not, the system relaunches the loop with widened scope. The cycle ends only when the critic approves the evidence or the configured iteration limit is reached.
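The loop in step [4], with the critic deciding whether to stop or widen the scope, can be sketched as a bounded iteration. This is a minimal illustration under stated assumptions, not Queria's implementation: `Tool`, `choose_action`, `critic`, `Verdict` and `MAX_ITERATIONS` are hypothetical stand-ins for canvas nodes and the configured iteration limit.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a tool canvas node: query in, evidence snippets out.
Tool = Callable[[str], list[str]]

MAX_ITERATIONS = 4  # hypothetical value for the configured iteration limit


@dataclass
class Verdict:
    sufficient: bool
    instruction: str = ""  # targeted hint for the next iteration


@dataclass
class LoopResult:
    evidence: list[str] = field(default_factory=list)
    iterations: int = 0
    converged: bool = False


def react_loop(question: str,
               choose_action: Callable[[str, list[str], str], tuple[str, str]],
               tools: dict[str, Tool],
               critic: Callable[[str, list[str]], Verdict]) -> LoopResult:
    result = LoopResult()
    instruction = ""
    while result.iterations < MAX_ITERATIONS:
        result.iterations += 1
        # Action: pick a tool and a query from the evidence so far and the critic's hint.
        tool_name, query = choose_action(question, result.evidence, instruction)
        # Observation: run the tool and keep its results.
        result.evidence.extend(tools[tool_name](query))
        # Reasoning: the critic decides whether the evidence is sufficient.
        verdict = critic(question, result.evidence)
        if verdict.sufficient:
            result.converged = True
            break
        instruction = verdict.instruction  # widen the scope on the next iteration
    return result


# Usage with toy tools: the critic only accepts once regulatory evidence is present.
def choose_action(question: str, evidence: list[str], instruction: str) -> tuple[str, str]:
    return ("regulatory", question) if "regulatory" in instruction else ("kb", question)


def critic(question: str, evidence: list[str]) -> Verdict:
    if any(e.startswith("reg:") for e in evidence):
        return Verdict(True)
    return Verdict(False, "also search regulatory sources")


tools = {"kb": lambda q: [f"kb: {q}"], "regulatory": lambda q: [f"reg: {q}"]}
res = react_loop("GDPR retention period", choose_action, tools, critic)
```

In the usage example the first iteration searches the knowledge base, the critic bounces with "also search regulatory sources", and the second iteration converges. A critic that never approves stops the loop at `MAX_ITERATIONS`.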
Persistent per-user memory
A distinctive trait of Cog-RAG v3.5.0 is the persistent memory the system maintains for each user, across different conversations and channels (web, widget, WhatsApp).
Memory Subject
Each user has a Memory Subject -- a DB structure that collects:
| Category | Content | Examples |
|---|---|---|
| Professional profile | Role, responsibility area, interest domain | "Compliance manager", "works on GDPR contracts" |
| Preferences | Answer style, language, detail level | "wants short answers", "prefers Italian" |
| Operational context | What they're working on, deadlines, open projects | "analyzing 2025 balance sheet", "deadline Dec 31" |
| References | Documents, customers, suppliers frequently mentioned | "often works on supplier Rossi srl" |
The Memory Subject is isolated per (userId, companyId): no user's data is exposed to another user, even within the same company.
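Isolation per (userId, companyId) can be pictured as a store whose key is the pair itself, so a lookup for one user can never return another user's facts. This is a minimal in-memory sketch: `MemorySubject`, `MemoryStore` and the category keys are hypothetical names mirroring the table above, while the real system persists this data in its database.

```python
from dataclasses import dataclass, field

# Hypothetical keys mirroring the four categories in the table above.
CATEGORIES = ("profile", "preferences", "operational_context", "references")


@dataclass
class MemorySubject:
    user_id: str
    company_id: str
    facts: dict[str, list[str]] = field(
        default_factory=lambda: {c: [] for c in CATEGORIES})


class MemoryStore:
    def __init__(self) -> None:
        # Keyed by the (user_id, company_id) pair: no key is shared between users.
        self._subjects: dict[tuple[str, str], MemorySubject] = {}

    def get(self, user_id: str, company_id: str) -> MemorySubject:
        key = (user_id, company_id)
        if key not in self._subjects:
            self._subjects[key] = MemorySubject(user_id, company_id)
        return self._subjects[key]

    def remember(self, user_id: str, company_id: str, category: str, fact: str) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.get(user_id, company_id).facts[category].append(fact)


store = MemoryStore()
store.remember("anna", "acme", "preferences", "wants short answers")
store.remember("marco", "acme", "preferences", "prefers Italian")
```

Two users of the same company ("acme") get separate subjects because the key includes the user ID.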
Three memory levels
The three levels, from shortest- to longest-lived:
- Intra-conversation memory (last N messages, or turns): the assistant remembers exactly what was said in previous turns of the same chat. You can refer to "point 3 of the previous answer" without repeating it.
- Conversation history (summaries of previous sessions): compact summaries of past conversations that can be consulted on the fly, useful for resuming a topic opened days ago.
- Memory Subject (persistent facts, preferences, user context): key facts extracted automatically or explicitly flagged. They persist until the user removes them from their memory profile.
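One way to picture how the three levels feed the model is a function that assembles them into a single context string, ordered from most persistent to most recent. The default value of N, the section labels and `build_context` itself are hypothetical; the source does not specify how the levels are formatted in the prompt.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def build_context(turns: list[Turn], summaries: list[str],
                  facts: list[str], last_n: int = 6) -> str:
    sections = []
    if facts:  # Memory Subject: persistent facts
        sections.append("User memory:\n" + "\n".join(f"- {f}" for f in facts))
    if summaries:  # conversation history: summaries of previous sessions
        sections.append("Previous sessions:\n" + "\n".join(f"- {s}" for s in summaries))
    # Intra-conversation memory: only the last N turns (guard: turns[-0:] would be all of them).
    recent = turns[-last_n:] if last_n > 0 else []
    if recent:
        sections.append("Current conversation:\n" +
                        "\n".join(f"{t.role}: {t.text}" for t in recent))
    return "\n\n".join(sections)


turns = [Turn("user", f"q{i}") for i in range(10)]
ctx = build_context(turns, ["analyzed 2025 balance sheet"], ["prefers Italian"], last_n=2)
```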
Memory update
A memory_writer node at the end of the canvas extracts candidate information from completed turns for promotion into the Memory Subject. Promotion is always transparent: the user sees an "I memorized that..." notice and can refuse the promotion, or review their memory profile from the settings.
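The promotion step can be sketched as two phases: candidates are proposed, every promotion is announced, and facts the user refuses never reach the Memory Subject. `propose_candidates` uses a naive trigger-phrase rule purely for illustration; the memory_writer's actual extraction logic is not described in the source, and all names here are hypothetical.

```python
from typing import Callable

# Hypothetical trigger phrases: a naive rule standing in for the real extraction logic.
TRIGGERS = ("i work on", "i prefer", "my deadline is")


def propose_candidates(user_turns: list[str]) -> list[str]:
    return [t for t in user_turns if t.lower().startswith(TRIGGERS)]


def promote(candidates: list[str], memory: list[str],
            user_refuses: Callable[[str], bool]) -> list[str]:
    notices = []
    for fact in candidates:
        # Every promotion is announced to the user.
        notices.append(f"I memorized that: {fact}")
        # Refused facts are dropped; duplicates are not stored twice.
        if not user_refuses(fact) and fact not in memory:
            memory.append(fact)
    return notices


memory: list[str] = []
cands = propose_candidates(["I prefer short answers", "What is VAT?", "My deadline is Dec 31"])
notices = promote(cands, memory, user_refuses=lambda f: "deadline" in f)
```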
Memory privacy
- User side: the Profile > Memory page shows all memorized facts, each of which can be edited or deleted individually.
- Admin side: SYSTEM_ADMIN can view a company's aggregated memory for audit purposes only, and never individual users' contents without an explicit mandate.
- User deletion: when a user is deleted, all associated Memory Subjects are removed by cascade.
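Cascade removal is a standard relational pattern. The sketch below uses SQLite with a hypothetical two-table schema (`users`, `memory_facts`) to show the behavior; the source does not document Queria's actual schema or database engine. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` on each connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    company_id TEXT NOT NULL
);
CREATE TABLE memory_facts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    company_id TEXT NOT NULL,
    fact TEXT NOT NULL
);
""")
conn.execute("INSERT INTO users (id, company_id) VALUES (1, 'acme'), (2, 'acme')")
conn.executemany(
    "INSERT INTO memory_facts (user_id, company_id, fact) VALUES (?, ?, ?)",
    [(1, "acme", "prefers Italian"),
     (1, "acme", "works on GDPR contracts"),
     (2, "acme", "wants short answers")],
)
# Deleting the user removes all of their memory rows in the same statement.
conn.execute("DELETE FROM users WHERE id = 1")
remaining = conn.execute("SELECT user_id, fact FROM memory_facts").fetchall()
```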
For the user-side guide see Memory & Context.
The two-brain system
At the heart of Cog-RAG operate two AI models with complementary roles:
Planner: the fast brain
The Planner is a fast, lightweight model optimized for immediate decisions. It handles:
- Intent classification: understands what the user is really asking
- Routing: decides which pipeline to activate (simple search, decomposition, comparison)
- Query decomposition: breaks down complex questions into manageable sub-questions
- Complexity evaluation: estimates question difficulty to calibrate search parameters
- Utility and support: handles auxiliary operations like reformulation and quick summaries
The Planner responds in milliseconds and does not require heavy compute.
Writer: the deep brain
The Writer is a powerful model with advanced reasoning. It handles:
- Multi-document synthesis: combines information from dozens of sources into a coherent answer
- Complex reasoning: tackles questions requiring inferences, comparisons, analysis
- High-quality generation: produces professional, structured and accurate text
- Explicit thinking: uses an internal reasoning process before formulating the answer
The Writer is invoked only when its capabilities are needed, preserving overall system efficiency.
The collaboration
The Planner decides what to do and how to do it; the Writer executes with depth. This separation yields fast response times for simple questions (handled almost entirely by the Planner) and high-quality answers for complex ones (where the Writer invests time in reasoning).
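The division of labour can be sketched as a router in which the complexity estimate decides whether the Writer is invoked at all. The cue words, the word-count threshold and the `route` labels are hypothetical; in Cog-RAG the classification is performed by the Planner model, not by keyword rules.

```python
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"
    AGGREGATIVE = "aggregative"


# Hypothetical cue words standing in for the Planner model's intent classification.
AGGREGATIVE_CUES = ("how many", "statistics", "summarize all", "total")
COMPLEX_CUES = ("compare", "differences", "why", "and what")


def estimate_complexity(question: str) -> Complexity:
    q = question.lower()
    if any(c in q for c in AGGREGATIVE_CUES):
        return Complexity.AGGREGATIVE
    if any(c in q for c in COMPLEX_CUES):
        return Complexity.COMPLEX
    # Hypothetical length threshold separating simple from moderate questions.
    return Complexity.MODERATE if len(q.split()) > 12 else Complexity.SIMPLE


def route(question: str) -> str:
    # Simple questions stay with the fast Planner; everything else goes to the Writer.
    return "planner" if estimate_complexity(question) is Complexity.SIMPLE else "writer"
```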
The Critic
The agentic pattern adds a third role: the Critic. It is a dedicated LLM instance that, after each Writer action, evaluates the result along explicit dimensions (evidence sufficiency, internal contradictions, alignment with the original question, sub-query coverage). If the result fails one or more criteria, the critic sends it back (a "bounce") to the agentic loop with targeted instructions (e.g. "also search regulatory sources", "deepen aspect X").
The Critic is toggleable via the CHAT_CRITIC_ENABLED kill-switch. In latency-critical scenarios it can be disabled; in mission-critical scenarios (legal, healthcare, tax) it is always on.
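A sketch of how a critic verdict over the four dimensions above could be combined with the kill-switch. Reading `CHAT_CRITIC_ENABLED` from the process environment, defaulting it to on, the accepted boolean spellings and the dimension keys are all assumptions; the source only names the switch. The per-dimension pass/fail values would come from the critic LLM and are represented here by a plain dict.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical keys for the four evaluation dimensions.
DIMENSIONS = ("evidence_sufficiency", "no_contradictions",
              "question_alignment", "subquery_coverage")


@dataclass
class CriticReport:
    passed: bool
    failed: list[str]
    instructions: list[str]


def critic_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    # Assumed default: on unless explicitly disabled.
    return env.get("CHAT_CRITIC_ENABLED", "true").strip().lower() in ("1", "true", "yes", "on")


def evaluate(scores: Mapping[str, bool], hints: Mapping[str, str],
             env: Optional[Mapping[str, str]] = None) -> CriticReport:
    if not critic_enabled(env):
        # Kill-switch off: accept the Writer output without a bounce.
        return CriticReport(True, [], [])
    failed = [d for d in DIMENSIONS if not scores.get(d, False)]
    # Targeted instructions for the next iteration of the agentic loop.
    instructions = [hints[d] for d in failed if d in hints]
    return CriticReport(not failed, failed, instructions)
```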
Agentic pattern and tool use
In agentic mode, the Writer does more than synthesize: it is an agent that decides which tools to invoke to reach the answer. The available canvas tools are:
| Tool type | Examples |
|---|---|
| Internal retrieval | Search on tenant Qdrant collections (KB, user docs, sector) |
| External sources | Legal Sources, Food Sources, Chem Sources, Pharma Sources, AE Sources |
| OAuth external tools | Slack post_message, Stripe customer.lookup, GCal create_event |
| Open data | Geocoding (Nominatim), POI (Overpass), weather (Open-Meteo), encyclopedia (Wikipedia) |
| AI Constructor | Pre-packaged sector pipelines (Tourism; Tax and Healthcare on the roadmap) |
| Memory lookup | Querying the current user's Memory Subject |
Each tool is a canvas node. The agent iteratively chooses which tool to invoke, observing previous results and reasoning about the remaining gap between what it has found and the question. The whole sequence is traced and visible in the reasoning panel.
Query orchestration
Not all questions are equal. Cog-RAG classifies every query and chooses the most fitting orchestration strategy.
Simple queries
For direct questions with a clear expected answer, the system runs a direct search and generates the response. No decomposition, no superfluous steps.
Example: "When does the contract with supplier X expire?"
Sequential decomposition
When sub-questions depend on each other, they are executed in sequence. The answer to one feeds the next.
Example: "Who are the designated heirs in the will and what shares go to each?" First identify the heirs, then look up the shares for each.
Parallel decomposition
When sub-questions are independent, they are executed in parallel to maximize speed.
Example: "Compare the contractual terms of supplier A and supplier B." The two supplier searches run simultaneously.
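The difference between sequential and parallel decomposition is about execution order and can be sketched with `asyncio`: dependent sub-queries are awaited one after another so each result can feed the next query, while independent ones run concurrently with `asyncio.gather`, which returns results in input order. The `search` coroutine is a hypothetical stand-in for a retrieval node.

```python
import asyncio


async def search(query: str) -> str:
    # Hypothetical retrieval node; the sleep simulates I/O latency.
    await asyncio.sleep(0.01)
    return f"results for '{query}'"


async def sequential(first: str, follow_up: str) -> list[str]:
    # The second query depends on the first answer, so they run in order.
    a = await search(first)
    b = await search(follow_up.format(prev=a))
    return [a, b]


async def parallel(queries: list[str]) -> list[str]:
    # Independent sub-queries run concurrently; gather preserves input order.
    return list(await asyncio.gather(*(search(q) for q in queries)))


seq = asyncio.run(sequential("heirs in the will", "shares for {prev}"))
par = asyncio.run(parallel(["terms of supplier A", "terms of supplier B"]))
```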
Hierarchical decomposition
For exploratory questions, the system starts from the general and progressively deepens.
Example: "What are the main issues that emerged in last year's audit reports?" First a broad search, then targeted deep dives on the surfaced themes.
Comparative decomposition
For structured comparisons, the system gathers information from both sides and produces a side-by-side analysis.
Example: "What are the differences between the current insurance policy and the proposed one?"
Adaptive search
Search parameters are automatically calibrated based on the query's estimated complexity:
| Complexity | Documents searched | Minimum threshold | Reranking | Diversification |
|---|---|---|---|---|
| Simple | Few, targeted | High | Yes | Low |
| Moderate | Medium quantity | Medium | Yes | Medium |
| Complex | Wide quantity | Low | Yes | High |
| Aggregative | Maximum coverage | Very low | No | Maximum |
Aggregative queries (statistics, summaries of large sets) need a different approach: maximum coverage with high diversification to avoid redundant results.
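The table can be expressed as a lookup from complexity class to search parameters. The table gives only qualitative levels, so every number below is a hypothetical placeholder chosen solely to preserve the ordering in the table, not a documented default.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"
    AGGREGATIVE = "aggregative"


@dataclass(frozen=True)
class SearchParams:
    top_k: int        # documents searched
    min_score: float  # minimum similarity threshold
    rerank: bool      # apply semantic re-ranking
    diversity: float  # 0 = no diversification, 1 = maximum


# Placeholder values: only their relative order mirrors the table.
SEARCH_PROFILES = {
    Complexity.SIMPLE: SearchParams(top_k=5, min_score=0.75, rerank=True, diversity=0.1),
    Complexity.MODERATE: SearchParams(top_k=15, min_score=0.60, rerank=True, diversity=0.4),
    Complexity.COMPLEX: SearchParams(top_k=40, min_score=0.45, rerank=True, diversity=0.7),
    Complexity.AGGREGATIVE: SearchParams(top_k=200, min_score=0.25, rerank=False, diversity=1.0),
}
```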
Hybrid search
Every search combines two complementary approaches:
Semantic search: compares the meaning of the question with the meaning of each document using 1024-dimensional vector embeddings. It excels at finding pertinent documents even when the wording differs.
Lexical search (BM25): compares keywords. It excels at finding documents containing specific terms (codes, proper nouns, article numbers).
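For reference, a compact implementation of Okapi BM25 scoring. The regex tokenizer, the common defaults k1 = 1.5 and b = 0.75, and the Lucene-style IDF (with a +1 that keeps it positive) are assumptions; the source does not say which BM25 variant or parameters Queria uses.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_freqs = [Counter(toks) for toks in self.doc_tokens]
        self.n = len(docs)
        self.avgdl = sum(len(t) for t in self.doc_tokens) / self.n
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for toks in self.doc_tokens for term in set(toks))

    def idf(self, term: str) -> float:
        n_t = self.df.get(term, 0)
        return math.log((self.n - n_t + 0.5) / (n_t + 0.5) + 1)

    def score(self, query: str, i: int) -> float:
        freqs, dl = self.doc_freqs[i], len(self.doc_tokens[i])
        total = 0.0
        for term in tokenize(query):
            f = freqs.get(term, 0)
            if f:
                # Term-frequency saturation (k1) and length normalization (b).
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> list[int]:
        scores = [self.score(query, i) for i in range(self.n)]
        return sorted(range(self.n), key=lambda i: scores[i], reverse=True)


docs = ["Art. 28 GDPR processor obligations",
        "Supplier Rossi srl contract renewal",
        "GDPR data retention policy"]
bm = BM25(docs)
```

Exact terms such as "Rossi srl" or "28" score only in the documents that contain them, which is why BM25 complements the semantic search on codes and proper nouns.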
Results from the two searches are combined via Reciprocal Rank Fusion (RRF), which merges the two result lists using each document's rank position in each list rather than its raw score, so the different scales of semantic and BM25 scores never need to be reconciled.
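A minimal sketch of standard RRF: each document receives the sum of 1 / (k + rank) over the lists in which it appears, with 1-based ranks. The constant k = 60 is the value commonly used in the literature; whether Queria uses this constant or weights the two lists is not stated in the source.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs (best first) into one ranking."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the position matters; raw retriever scores are never compared.
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


semantic = ["doc_a", "doc_b", "doc_c"]
lexical = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([semantic, lexical])
```

Here doc_a (ranks 1 and 2) edges out doc_c (ranks 3 and 1), and documents found by only one search (doc_b, doc_d) fall behind those found by both.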
Multi-source integration
Cog-RAG is not limited to company documents. The system integrates transparently:
- Company documents: files uploaded by the organization
- Knowledge Base: the curated and permanent knowledge base
- Certified external sources: specialized databases in legal, food, chemical and pharmaceutical domains
All sources participate in the same search and reranking process. The user receives a unified answer with citations that clearly identify the origin of each piece of information through colored badges, one color per source type.
Transparent reasoning
A distinctive trait of Cog-RAG is the transparency of the reasoning process. The Writer uses an explicit thinking mode: before formulating the answer, it produces an internal reasoning trace that analyzes sources, evaluates relevance, identifies contradictions and plans the structure of the answer.
This reasoning is visible to the user through the dedicated panel in the interface. The user can verify how the system arrived at a certain conclusion, which sources it considered relevant and why, and where it found information gaps.
Reasoning transparency is fundamental in enterprise contexts where decisions based on system answers must be verifiable and justifiable.
Customizability via DSL canvas
Unlike monolithic RAG architectures, Cog-RAG v3.5.0 is entirely built on DSL canvases. This means:
- Every tenant can duplicate the system pipeline and modify it -- add tools, change node order, exclude the critic, add PII sanitization steps before prompt submission.
- Every topic can have a dedicated pipeline (e.g. the "Compliance" topic uses critic + mandatory regulatory sources, the "Marketing" topic uses a freer pipeline).
- Canvases are versioned: snapshots, rollback, immutable audit of past versions.
- Pipeline changes don't require backend releases.
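Snapshots, rollback and an immutable audit trail fit an append-only version history, where a rollback publishes a new version with old content instead of rewriting the past. This is a minimal sketch under that assumption; `Canvas`, `Snapshot` and the node names are hypothetical and do not reflect the actual canvas schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen: a published snapshot cannot be modified
class Snapshot:
    version: int
    nodes: tuple[str, ...]  # node types in pipeline order
    note: str


@dataclass
class Canvas:
    purpose: str
    history: list[Snapshot] = field(default_factory=list)

    @property
    def current(self) -> Snapshot:
        return self.history[-1]

    def publish(self, nodes: list[str], note: str) -> Snapshot:
        snap = Snapshot(len(self.history) + 1, tuple(nodes), note)
        self.history.append(snap)  # append-only: past versions are never changed
        return snap

    def rollback(self, version: int) -> Snapshot:
        target = next((s for s in self.history if s.version == version), None)
        if target is None:
            raise ValueError(f"no version {version}")
        # Rolling back publishes a new version, so the audit trail keeps every step.
        return self.publish(list(target.nodes), f"rollback to v{version}")


chat = Canvas("CHAT")
chat.publish(["memory_recall", "planner", "react_loop", "critic", "writer", "memory_writer"],
             "system pipeline")
chat.publish(["memory_recall", "pii_sanitizer", "planner", "react_loop", "writer", "memory_writer"],
             "add PII step, drop critic")
chat.rollback(1)
```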
For the canvas editor, the available nodes and examples, see the Canvas Agent Builder section.
v3.5.0 summary
| Dimension | Implementation |
|---|---|
| Cognitive | Planner + Writer + Critic, adaptive decomposition, hybrid search |
| Agentic | Multi-step ReAct pattern, dynamic tool use, critic-guided convergence |
| Memory | Persistent Memory Subject per (user, company), three levels (intra-conversation, history, profile) |
| DSL-native | Typed canvases for CHAT / INGESTION / SERVICE / WIDGET with distinct purposes |
| Multi-source | Tenant documents + KB + 5 external sources + open data + OAuth tools |
| Multi-channel | Web app, embedded widget, WhatsApp via Twilio (V1) |
| Transparency | Visible reasoning panel, immutable canvas snapshot audit |
| Privacy | Multi-tenant isolation, user-local memory, on-prem AI (DGX Spark) |