Cog-RAG: Cognitive Retrieval-Augmented Generation

Cog-RAG is Queria's proprietary architecture that transforms document search from a mechanical process into a cognitive one. Instead of merely finding documents similar to a question, the system understands, plans, reasons and verifies before answering.

As of v3.5.0, the entire Cog-RAG cycle is implemented as a DSL canvas (purpose = CHAT): a graph pipeline composed of typed nodes that the tenant can customize without writing code. The same pattern applies to ingestion (purpose = INGESTION) and to external integrations (purpose = SERVICE, used for OAuth).

The traditional RAG problem

A classical RAG system follows three steps:

Receive the user's question
Search the vector database for the documents most similar to it
Pass the retrieved documents to the language model to generate the answer

This approach has clear limits. It does not handle complex questions requiring information from different sources. It does not adapt the search strategy to question complexity. It does not verify whether results are really pertinent. It is unable to reason across multiple documents to produce a synthesis. It has no memory of who is talking or what has already been asked.
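
For reference, the three-step baseline described above fits in a few lines. This is a toy illustration, not Queria's code: `embed`, `cosine` and `classical_rag` are hypothetical names, the bag-of-words `embed` stands in for a real embedding model, and `generate` is a placeholder for the language model call.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words term count.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classical_rag(question, documents, generate, top_k=2):
    query_vector = embed(question)                       # step 1: receive the question
    ranked = sorted(                                     # step 2: similarity search
        documents,
        key=lambda doc: cosine(query_vector, embed(doc)),
        reverse=True,
    )
    return generate(question, ranked[:top_k])            # step 3: generate from the hits
```

Nothing in this flow plans, decomposes, verifies or remembers, which is exactly the gap the rest of this page addresses.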

From RAG to agentic Cog-RAG with memory

Queria's architecture evolves traditional RAG along three axes:

Cognitive (Cog) -- the system understands intent, plans the strategy, decomposes queries and verifies grounding before answering.
Agentic -- multi-step ReAct pattern: the engine runs a chain of observation, reasoning and action (tool calls, retrieval, synthesis), repeated until convergence or until the iteration limit is reached.
With memory -- every user has a persistent Memory Subject that retains facts, preferences and context across different conversations. The bot remembers who you are, what you're working on, what you've already seen.

The three axes are implemented as a customizable DSL canvas. What follows describes the "system" flow: the starting point for every tenant, which can be modified without a backend release.

How agentic Cog-RAG works

Cog-RAG introduces a full cognitive cycle between question and answer, memory-aware and agentic:

                                  [Memory recall]
                                        |
                                        v
Question  ---->  [Memory subject + conversation history] ----> context
   |                                                                 |
   v                                                                 |
[1] Query analysis and understanding (with user context)            |
   |                                                                 |
   v                                                                 |
[2] Search strategy planning                                        |
   |                                                                 |
   v                                                                 |
[3] Decomposition into sub-queries (if needed)                      |
   |                                                                 |
   v                                                                 |
[4] *Agentic ReAct loop*                                            |
   |   |                                                             |
   |   v                                                             |
   |  Action: tool calls (retrieval, external_tool, ...) <-----------+
   |   |
   |   v
   |  Observation: tool results
   |   |
   |   v
   |  Reasoning: critic evaluates sufficiency
   |   |
   |   v
   |  (converged?) --no--> new action
   |   |
   |   yes
   |
   v
[5] Semantic re-ranking
   |
   v
[6] Quality and grounding check (critic)
   |          |
   |     (insufficient)
   |          |
   |          v
   |     Return to agentic loop
   |     with widened scope
   |
   v
[7] Synthesis with deep reasoning
   |
   v
[8] Memory update (persistent facts emerged)
   |
   v
Answer with citations

Each step is a canvas node, handled by specialized components that collaborate autonomously. The critic -- a dedicated LLM node -- evaluates at each iteration whether the gathered evidence is sufficient and accurate; if not, the system relaunches the loop with widened scope. The loop ends only when the critic is satisfied or the configured iteration limit is reached.
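
The control flow of the agentic loop and the critic bounce can be sketched as follows. This is a minimal sketch under stated assumptions: `LoopState`, `run_agentic_loop`, the `act` and `critic` callables and the integer `scope` are hypothetical stand-ins for canvas nodes, and the default limit of 3 iterations is illustrative, not a documented value.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    question: str
    evidence: list = field(default_factory=list)
    scope: int = 1        # hypothetical measure of how wide the search is
    iterations: int = 0


def run_agentic_loop(question, act, critic, max_iterations=3):
    """Repeat action -> observation -> reasoning until the critic accepts
    the evidence or the iteration limit is reached."""
    state = LoopState(question)
    while state.iterations < max_iterations:
        state.iterations += 1
        observations = act(state)           # action: tool calls; returns tool results
        state.evidence.extend(observations) # observation: keep what the tools returned
        if critic(state):                   # reasoning: is the evidence sufficient?
            return state, True
        state.scope += 1                    # not converged: retry with widened scope
    return state, False
```

The second return value tells the caller whether the loop converged or simply ran out of iterations, so the synthesis step can decide how cautiously to answer.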

Persistent per-user memory

A distinctive trait of Cog-RAG v3.5.0 is the persistent memory the system maintains for each user, across different conversations and channels (web, widget, WhatsApp).

Memory Subject

Each user has a Memory Subject -- a DB structure that collects:

Category	Content	Examples
Professional profile	Role, responsibility area, interest domain	"Compliance manager", "works on GDPR contracts"
Preferences	Answer style, language, detail level	"wants short answers", "prefers Italian"
Operational context	What they're working on, deadlines, open projects	"analyzing 2025 balance sheet", "deadline Dec 31"
References	Documents, customers, suppliers frequently mentioned	"often works on supplier Rossi srl"

The Memory Subject is isolated per (userId, companyId): one user's data never flows to another user, not even within the same company.
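
One way to picture this isolation is a store keyed by the (userId, companyId) pair. The sketch below is an in-memory illustration only: `MemorySubject`, `MemoryStore` and the category identifiers are hypothetical names derived from the table above, not the actual database schema.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the four categories listed in the table above.
CATEGORIES = ("professional_profile", "preferences", "operational_context", "references")


@dataclass
class MemorySubject:
    user_id: str
    company_id: str
    facts: dict = field(default_factory=lambda: {c: [] for c in CATEGORIES})


class MemoryStore:
    def __init__(self):
        # The composite key is what enforces isolation between users and companies.
        self._subjects = {}

    def get(self, user_id: str, company_id: str) -> MemorySubject:
        key = (user_id, company_id)
        if key not in self._subjects:
            self._subjects[key] = MemorySubject(user_id, company_id)
        return self._subjects[key]

    def remember(self, user_id: str, company_id: str, category: str, fact: str) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.get(user_id, company_id).facts[category].append(fact)
```

Because every read goes through the composite key, a colleague in the same company and the same user in a different company each get their own empty subject.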

Three memory levels

+-----------------------------+
|  Intra-conversation memory  |  last N messages (turns)
+-----------------------------+
              |
              v
+-----------------------------+
|  Conversation history       |  summaries of previous sessions
+-----------------------------+
              |
              v
+-----------------------------+
|  Memory Subject (persistent)|  facts, preferences, user context
+-----------------------------+

Intra-conversation: the assistant remembers exactly what was said in previous turns of the same chat. You can refer to "point 3 of the previous answer" without repeating it.
Conversation history: compact summaries of past conversations that the assistant can consult on the fly. Useful for resuming a topic opened days ago.
Memory Subject: key facts extracted automatically or explicitly flagged. They persist until the user removes them from their memory profile.

Memory update

A memory_writer node at the end of the canvas extracts, from completed turns, candidate information to promote into the Memory Subject. Promotion is always transparent: the user sees an "I memorized that..." notice and can refuse the promotion, or review their memory profile from settings.
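
The promote-then-refuse flow can be sketched as below. This is an assumption-laden illustration: `Candidate`, `promote` and `refuse` are hypothetical names, the memory is a plain dict, and how the real memory_writer node extracts candidates from a turn is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    category: str
    fact: str


def promote(memory: dict, candidates: list) -> list:
    """Store new candidate facts and return the notices shown to the user."""
    notices = []
    for candidate in candidates:
        bucket = memory.setdefault(candidate.category, [])
        if candidate.fact not in bucket:      # skip facts that are already stored
            bucket.append(candidate.fact)
            notices.append(f"I memorized that {candidate.fact}")
    return notices


def refuse(memory: dict, candidate: Candidate) -> bool:
    """Undo a promotion the user rejected; returns True if something was removed."""
    bucket = memory.get(candidate.category, [])
    if candidate.fact in bucket:
        bucket.remove(candidate.fact)
        return True
    return False
```

Returning one notice per stored fact keeps the promotion visible, and `refuse` gives the user a direct way to reverse it.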

Memory privacy

User side: the Profile > Memory page lists all memorized facts, each of which can be edited or deleted individually.
Admin side: SYSTEM_ADMIN can view a company's aggregated memory for audit purposes only, and never per-user contents without an explicit mandate.
User deletion: when a user is deleted, all associated Memory Subjects are removed by cascade.

For the user-side guide see Memory & Context.

The two-brain system

At the heart of Cog-RAG operate two AI models with complementary roles:

Planner: the fast brain

The Planner is a fast and light model, optimized for immediate decisions. It handles:

Intent classification: understands what the user is really asking
Routing: decides which pipeline to activate (simple search, decomposition, comparison)
Query decomposition: breaks down complex questions into manageable sub-questions
Complexity evaluation: estimates question difficulty to calibrate search parameters
Utility and support: handles auxiliary operations like reformulation and quick summaries

The Planner operates in milliseconds and doesn't engage heavy compute.

Writer: the deep brain

The Writer is a powerful model with advanced reasoning. It handles:

Multi-document synthesis: combines information from dozens of sources into a coherent answer
Complex reasoning: tackles questions requiring inferences, comparisons, analysis
High-quality generation: produces professional, structured and accurate text
Explicit thinking: uses an internal reasoning process before formulating the answer

The Writer is engaged only when its power is needed, preserving overall system efficiency.

The collaboration

The Planner decides what to do and how to do it. The Writer executes with depth. This separation lets you achieve fast response times for simple questions (handled almost entirely by the Planner) and high-quality answers for complex ones (where the Writer invests time in reasoning).

The Critic

The agentic pattern adds a third role: the Critic. It is a dedicated LLM instance that, after each Writer action, evaluates the result along explicit dimensions (evidence sufficiency, internal contradictions, alignment with the original question, sub-query coverage). If the result fails one or more criteria, the Critic sends it back to the agentic loop with targeted instructions (e.g. "also search regulatory sources", "deepen aspect X").

The Critic is toggleable via the CHAT_CRITIC_ENABLED kill-switch. In latency-critical scenarios it can be disabled; in mission-critical scenarios (legal, healthcare, tax) it is always on.
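
A rough sketch of the Critic's verdict and of the kill-switch follows. Only the four dimensions and the CHAT_CRITIC_ENABLED name come from this page; the numeric scores, the 0.7 threshold, the instruction texts, the default of "true" and the idea of reading the flag from a plain `settings` dict are all assumptions made for illustration.

```python
# Dimension -> instruction returned to the agentic loop when that dimension fails.
# The instruction texts are hypothetical.
DIMENSIONS = {
    "evidence_sufficiency": "widen the search to gather more evidence",
    "no_contradictions": "resolve the conflicting sources",
    "question_alignment": "refocus on the original question",
    "sub_query_coverage": "cover the unanswered sub-queries",
}


def critic_enabled(settings: dict) -> bool:
    # How the flag is actually stored is not specified; a dict is assumed here.
    return str(settings.get("CHAT_CRITIC_ENABLED", "true")).lower() == "true"


def review(scores: dict, settings: dict, threshold: float = 0.7) -> dict:
    """Return a verdict; with the critic disabled, everything passes unchecked."""
    if not critic_enabled(settings):
        return {"passed": True, "instructions": []}
    instructions = [
        hint for dimension, hint in DIMENSIONS.items()
        if scores.get(dimension, 0.0) < threshold   # a missing score counts as a failure
    ]
    return {"passed": not instructions, "instructions": instructions}
```

Treating a missing score as a failure is a deliberately conservative choice for this sketch: the Critic bounces the result rather than letting an unevaluated dimension through.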

Agentic pattern and tool use

In agentic mode, the Writer is not just synthesizing: it is an agent that decides which tools to invoke to reach the answer. The canvas tools available are:

Tool type	Examples
Internal retrieval	Search on tenant Qdrant collections (KB, user docs, sector)
External sources	Legal Sources, Food Sources, Chem Sources, Pharma Sources, AE Sources
OAuth external tools	Slack post_message, Stripe customer.lookup, GCal create_event
Open data	Geocoding (Nominatim), POI (Overpass), weather (Open-Meteo), encyclopedia (Wikipedia)
AI Constructor	Pre-packaged sector pipelines (Tourism, on roadmap Tax, Healthcare)
Memory lookup	Querying the current user's Memory Subject

Each tool is a canvas node. At each iteration the agent chooses which tool to invoke by observing previous results and reasoning about what is still missing to answer the question. The whole sequence is traced and visible in the reasoning panel.
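
Tool invocation with tracing can be pictured as a small registry. This is a sketch, not the canvas runtime: `ToolRegistry`, its methods and the trace record layout are hypothetical, and the registered tool is a stub.

```python
class ToolRegistry:
    """Maps tool names to callables and records every call for later inspection."""

    def __init__(self):
        self._tools = {}
        self.trace = []   # one entry per invocation, in call order

    def register(self, name, fn):
        self._tools[name] = fn

    def invoke(self, name, **kwargs):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        result = self._tools[name](**kwargs)
        # Recording arguments and results is what makes the sequence auditable.
        self.trace.append({"tool": name, "args": kwargs, "result": result})
        return result


registry = ToolRegistry()
# Stub standing in for a Memory lookup tool.
registry.register("memory_lookup", lambda user_id: ["prefers Italian"])
facts = registry.invoke("memory_lookup", user_id="u1")
```

After the call, `registry.trace` holds a single record naming the tool, its arguments and its result, which is the kind of information a reasoning panel would display.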

Query orchestration

Not all questions are equal. Cog-RAG classifies every query and chooses the most fitting orchestration strategy.

Simple queries

For direct questions with a clear expected answer, the system runs a direct search and generates the response. No decomposition, no superfluous steps.

Example: "When does the contract with supplier X expire?"

Sequential decomposition

When sub-questions depend on each other, they are executed in sequence. The answer to one feeds the next.

Example: "Who are the designated heirs in the will and what shares go to each?" First identify the heirs, then look up the shares for each.
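
The dependency between sub-questions can be sketched as a chain in which each query template receives the previous answer. `run_sequential`, the `{previous}` placeholder and the lookup table are hypothetical, and the answers in the table are made-up sample data.

```python
from typing import Callable


def run_sequential(templates: list, search: Callable[[str], str]) -> list:
    """Run sub-queries in order, feeding each answer into the next template."""
    answers = []
    previous = ""
    for template in templates:
        query = template.format(previous=previous)  # inject the prior answer
        previous = search(query)
        answers.append(previous)
    return answers


# Made-up lookup table standing in for retrieval over the will.
sample_answers = {
    "heirs in the will": "Anna and Marco",
    "shares for Anna and Marco": "Anna 60%, Marco 40%",
}
result = run_sequential(
    ["heirs in the will", "shares for {previous}"],
    sample_answers.__getitem__,
)
# result == ["Anna and Marco", "Anna 60%, Marco 40%"]
```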

Parallel decomposition

When sub-questions are independent, they are executed in parallel to maximize speed.

Example: "Compare the contractual terms of supplier A and supplier B." The two supplier searches run simultaneously.
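
Independent sub-queries map naturally onto concurrent tasks. The sketch below uses Python's `asyncio.gather`; `run_parallel` and `fake_search` are hypothetical names, and the search is a stub rather than a real retrieval call.

```python
import asyncio


async def run_parallel(sub_queries, search):
    """Launch all sub-queries at once; results come back in input order."""
    return await asyncio.gather(*(search(query) for query in sub_queries))


async def fake_search(query: str) -> str:
    # Stub for an I/O-bound retrieval call.
    await asyncio.sleep(0)
    return f"terms for {query}"


results = asyncio.run(run_parallel(["supplier A", "supplier B"], fake_search))
```

`asyncio.gather` preserves the order of its inputs, so the answer for supplier A always comes first regardless of which search finishes earlier.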

Hierarchical decomposition

For exploratory questions, the system starts from the general and progressively deepens.

Example: "What are the main issues that emerged in last year's audit reports?" First a broad search, then targeted deep dives on the surfaced themes.

Comparative decomposition

For structured comparisons, the system gathers information from both sides and produces a side-by-side analysis.

Example: "What are the differences between the current insurance policy and the proposed one?"

Adaptive search

Search parameters are automatically calibrated based on the query's estimated complexity:

Complexity	Documents searched	Minimum threshold	Reranking	Diversification
Simple	Few, targeted	High	Yes	Low
Moderate	Medium quantity	Medium	Yes	Medium
Complex	Wide quantity	Low	Yes	High
Aggregative	Maximum coverage	Very low	No	Maximum

Aggregative queries (statistics, summaries of large sets) need a different approach: maximum coverage with high diversification to avoid redundant results.
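
The table above translates naturally into a parameter lookup. Every number below is a made-up placeholder chosen only to respect the ordering in the table (more documents, lower thresholds and more diversification as complexity grows); `SearchParams`, `PARAMS` and `params_for` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    top_k: int          # documents searched
    min_score: float    # minimum relevance threshold
    rerank: bool        # whether semantic reranking runs
    diversity: float    # 0.0 = no diversification, 1.0 = maximum


# Illustrative values only; the real calibration is not documented here.
PARAMS = {
    "simple": SearchParams(top_k=5, min_score=0.75, rerank=True, diversity=0.1),
    "moderate": SearchParams(top_k=15, min_score=0.60, rerank=True, diversity=0.3),
    "complex": SearchParams(top_k=40, min_score=0.45, rerank=True, diversity=0.6),
    "aggregative": SearchParams(top_k=100, min_score=0.30, rerank=False, diversity=1.0),
}


def params_for(complexity: str) -> SearchParams:
    """Look up search parameters for a complexity class (case-insensitive)."""
    return PARAMS[complexity.lower()]
```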

Hybrid search

Every search combines two complementary approaches:

Semantic search: compares the meaning of the question with the meaning of the documents via 1024-dimensional vectors. Excels at finding pertinent documents even when the wording differs.
Lexical search (BM25): compares keywords. Excels at finding documents with specific terms (codes, proper nouns, article numbers).

Results from the two searches are combined via Reciprocal Rank Fusion (RRF), an algorithm that merges the two ranked lists using each document's rank position rather than its raw score, so that neither approach dominates the final ranking.
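
In RRF each document receives the score sum of 1 / (k + rank) over the lists in which it appears, where rank starts at 1. The sketch below is a generic implementation of that formula, not Queria's code: `reciprocal_rank_fusion` is a hypothetical name, and k = 60 is the constant commonly used in the literature; the value used in production is not stated on this page.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the position matters; raw similarity or BM25 scores are ignored.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic = ["a", "b", "c"]   # ranking from vector search
lexical = ["b", "d", "a"]    # ranking from BM25
fused = reciprocal_rank_fusion([semantic, lexical])
# fused == ["b", "a", "d", "c"]
```

Documents "a" and "b" appear in both lists and therefore outrank "c" and "d", which each appear in only one; "b" edges out "a" because its two positions (2nd and 1st) are better than those of "a" (1st and 3rd).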

Multi-source integration

Cog-RAG is not limited to company documents. The system integrates transparently:

Company documents: files uploaded by the organization
Knowledge Base: the curated and permanent knowledge base
Certified external sources: specialized databases in legal, food, chemical and pharmaceutical domains

All sources participate in the same search and reranking process. The user receives a unified answer whose citations clearly identify the origin of each piece of information through colored badges that differ by source type.

Transparent reasoning

A distinctive trait of Cog-RAG is the transparency of the reasoning process. The Writer uses an explicit thinking mode: before formulating the answer, it generates an internal reasoning trace that analyzes sources, evaluates relevance, identifies contradictions and plans the answer structure.

This reasoning is visible to the user through the dedicated panel in the interface. The user can verify how the system arrived at a certain conclusion, which sources it considered relevant and why, and where it found information gaps.

Reasoning transparency is fundamental in enterprise contexts where decisions based on system answers must be verifiable and justifiable.

Customizability via DSL canvas

Unlike monolithic RAG architectures, Cog-RAG v3.5.0 is entirely built on DSL canvases. This means:

Every tenant can duplicate the system pipeline and modify it -- add tools, change node order, exclude the critic, add PII sanitization steps before prompt submission.
Every topic can have a dedicated pipeline (e.g. the "Compliance" topic uses critic + mandatory regulatory sources, the "Marketing" topic uses a freer pipeline).
Canvases are versioned: snapshots, rollback, immutable audit of past versions.
Pipeline changes don't require backend releases.
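
To make the idea of a typed, versioned graph concrete, here is a minimal sketch of a canvas that a tenant duplicates with the critic excluded. It is not the actual DSL: `Canvas`, `execution_order`, `without`, the node names and the integer version counter are all assumptions; only the four purposes are taken from this page. Python's standard `graphlib` provides the ordering.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter

PURPOSES = {"CHAT", "INGESTION", "SERVICE", "WIDGET"}


@dataclass(frozen=True)          # frozen: a stored version is never mutated
class Canvas:
    purpose: str
    version: int
    nodes: tuple
    edges: tuple                 # (source, destination) pairs

    def __post_init__(self):
        if self.purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {self.purpose}")

    def execution_order(self) -> list:
        """Return the nodes so that every node runs after its predecessors."""
        predecessors = {node: set() for node in self.nodes}
        for source, destination in self.edges:
            predecessors[destination].add(source)
        return list(TopologicalSorter(predecessors).static_order())

    def without(self, node: str) -> "Canvas":
        """Return a new version with `node` removed and its neighbours reconnected."""
        incoming = [s for s, d in self.edges if d == node]
        outgoing = [d for s, d in self.edges if s == node]
        kept = [(s, d) for s, d in self.edges if node not in (s, d)]
        bridged = [(s, d) for s in incoming for d in outgoing]
        remaining = tuple(n for n in self.nodes if n != node)
        return Canvas(self.purpose, self.version + 1, remaining, tuple(kept + bridged))


system = Canvas(
    purpose="CHAT",
    version=1,
    nodes=("planner", "retrieval", "critic", "writer"),
    edges=(("planner", "retrieval"), ("retrieval", "critic"), ("critic", "writer")),
)
custom = system.without("critic")   # tenant copy; `system` itself is unchanged
```

Because each change yields a new immutable object, the earlier version stays available for rollback or audit, which mirrors the snapshot behaviour described in the list above.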

For the canvas editor, available nodes and examples see the Canvas Agent Builder section.

v3.5.0 summary

Dimension	Implementation
Cognitive	Planner + Writer + Critic, adaptive decomposition, hybrid search
Agentic	Multi-step ReAct pattern, dynamic tool use, critic-guided convergence
Memory	Persistent Memory Subject per (user, company), three levels (intra-conversation, history, profile)
DSL-native	Typed canvases for CHAT / INGESTION / SERVICE / WIDGET with distinct purposes
Multi-source	Tenant documents + KB + 5 external sources + open data + OAuth tools
Multi-channel	Web app, embedded widget, WhatsApp via Twilio (V1)
Transparency	Visible reasoning panel, immutable canvas snapshot audit
Privacy	Multi-tenant isolation, user-local memory, on-prem AI (DGX Spark)
