Document Management

Queria offers multiple ways to import documents into the platform. Once uploaded, documents are automatically processed and made available for AI search. This guide covers all upload options, supported formats and management tools.

Canvas-native ingestion pipeline (v3.5.0+)

Starting with v3.5.0, the pipeline that processes documents (parsing, role classification, chunking, vectorization) is defined as a DSL canvas that the administrator can customize for the company. See Ingestion DSL for technical details. For users the upload experience is unchanged: drag in the file and the AI does the rest.

Upload methods

The Documents page presents several tabs for the various import methods.

Upload File tab

The most direct method to upload documents from your computer.

Drag & drop: drag one or more files directly into the upload area.
Click to select: click the upload area to open the OS file picker.
Multiple upload: you can select and upload multiple files at once.

Each file is uploaded with its own progress bar. When an upload completes, processing starts automatically.

Cloud Storage tab

Import documents from your business cloud services:

Google Drive: connect your Google account and browse folders and files. You can select individual files or whole folders.
OneDrive: access files in your personal or business Microsoft OneDrive.
SharePoint: navigate through your organization's SharePoint sites and import documents from shared libraries.

Once a cloud account is connected, you can enable auto-sync: Queria will periodically check the selected folders and automatically import new files or updated versions.

Company Network tab

Import documents directly from your organization's network folders:

SMB/CIFS shares: specify the network path (e.g. \\server\share\folder) and access credentials.
VPN access: if your network requires a VPN, Queria supports tunnel configuration to reach internal resources.
Scheduled sync: configure a schedule for automatic network folder synchronization. The system detects new files and changes transparently.

URL tab

Import the content of a web page by providing its address:

Enter the full URL of the page.
Queria downloads the content, analyzes it and makes it available for search.
Useful for importing articles, online documentation or informative pages.

JSONL tab

For bulk import of pre-structured data:

Upload a file in JSONL format (JSON Lines) where each row represents a document with its metadata.
Ideal for migrations from other systems or programmatic imports.
Each record can include title, content, custom metadata and topic assignment.
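
As a rough illustration, the sketch below builds a small JSONL payload with Python's standard library and parses it back as a sanity check before upload. The key names (`title`, `content`, `metadata`, `topic`) are assumptions that mirror the list above; the exact schema Queria expects is not specified in this guide, so verify it before a real import.

```python
import json

# Hypothetical records: the key names are illustrative, not a confirmed Queria schema.
records = [
    {
        "title": "Travel policy",
        "content": "Employees must book travel at least 14 days in advance.",
        "metadata": {"department": "HR", "year": 2025},
        "topic": "Policies",
    },
    {
        "title": "Price list Q1",
        "content": "Alpha: 120 euros. Beta: 80 euros.",
        "metadata": {"department": "Sales"},
        "topic": "Sales",
    },
]


def to_jsonl(rows: list[dict]) -> str:
    """Serialize each record as one JSON object per line."""
    # ensure_ascii=False keeps accented characters readable in the output.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL text back into records, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

Round-tripping the payload through `parse_jsonl` before uploading is a cheap way to catch a malformed row early.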

Supported formats

Format	Extension	Notes
PDF	`.pdf`	Full support, including scanned PDFs (via OCR)
Word	`.docx`	Text, tables, embedded images
Excel	`.xlsx`	Multiple sheets, structured tables
PowerPoint	`.pptx`	Text from slides
Text	`.txt`	Plain text
Markdown	`.md`	Formatting preserved
CSV	`.csv`	Tabular data
JSON	`.json`	Structured data
OpenDocument	`.odt`	Open format
Images	`.jpg`, `.png`, `.tiff`	Automatic processing with OCR

Processing states

Every document goes through a series of states during processing:

State	Icon	Description
Uploaded	Empty circle	The file was received by the system, waiting for processing.
Processing	Animated spinner	The system is analyzing the document: text extraction, OCR (if needed), chunking, vector embedding generation.
Ready	Green check	The document was successfully processed and is available for AI search.
Error	Red triangle	An issue occurred during processing. You can view error details and attempt re-processing.
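
The states above can be read as a small state machine. The sketch below is one possible model, not Queria's internal implementation: the transition table is an assumption inferred from the descriptions (for example, that re-processing moves a Ready or Error document back to Processing).

```python
from enum import Enum


class DocState(Enum):
    UPLOADED = "Uploaded"
    PROCESSING = "Processing"
    READY = "Ready"
    ERROR = "Error"


# Assumed transitions, inferred from the state descriptions in this guide.
ALLOWED = {
    DocState.UPLOADED: {DocState.PROCESSING},
    DocState.PROCESSING: {DocState.READY, DocState.ERROR},
    DocState.READY: {DocState.PROCESSING},   # re-process to refresh segments
    DocState.ERROR: {DocState.PROCESSING},   # re-process after a failure
}


def advance(current: DocState, target: DocState) -> DocState:
    """Return the target state if the transition is allowed, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```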

Processing times

A 10-20 page document is processed in roughly 30 seconds. Longer documents or those with many images may take a few minutes.

Processing monitor

The Monitor section lets you follow the processing progress of all documents in real time:

See how many documents are queued, processing or completed.
Check the progress percentage for each document.
Quickly identify any errors and access details for resolution.

Document actions

For each document in the list you can perform several operations:

View: open a preview of the original document or download it.
See segments: check how the system split the document into chunks for search. Useful to verify that processing was correct.
Re-process: restart processing of the document. Useful if processing ended in error or you want to regenerate the segments.
Archive: move the document to the archive. It will no longer be included in searches but remains available for consultation. You can restore it at any time.
Delete: remove the document (soft delete). It can be recovered from the trash.
Delete permanently: remove the document and all its segments irreversibly (admins only).

Notice

Permanent deletion is irreversible. The document will be removed completely from the system and from searches.
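
To make the difference between archiving, soft deletion and permanent deletion concrete, here is a toy in-memory model. The class and method names are hypothetical and do not correspond to a Queria API; the sketch only encodes the behavior described above.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    status: str = "active"  # one of: "active", "archived", "trashed"


class Library:
    """Toy model of the document actions; not Queria's actual API."""

    def __init__(self) -> None:
        self.docs: dict[str, Document] = {}

    def add(self, name: str) -> Document:
        self.docs[name] = Document(name)
        return self.docs[name]

    def archive(self, name: str) -> None:
        # Archived documents stay available but leave the search index.
        self.docs[name].status = "archived"

    def delete(self, name: str) -> None:
        # Soft delete: the document moves to the trash and can be restored.
        self.docs[name].status = "trashed"

    def restore(self, name: str) -> None:
        self.docs[name].status = "active"

    def delete_permanently(self, name: str, is_admin: bool) -> None:
        if not is_admin:
            raise PermissionError("permanent deletion is reserved for admins")
        del self.docs[name]  # irreversible: nothing is left to restore

    def searchable(self) -> list[str]:
        return sorted(d.name for d in self.docs.values() if d.status == "active")
```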

Document role

Not all documents should be treated the same way. A company policy and a price list speak different languages: the policy is read by concepts and paragraphs, the price list is consulted by rows and values. A judgment must be cited precisely (article, paragraph, dispositive); an FAQ is recalled as a concrete example.

That's why Queria classifies each document into one or more roles while processing it. The role determines how the document is segmented, indexed and then recalled in answer to a question. Classification is fully automatic: you just upload the file. The assigned role is shown in the document sheet and in chat citations.

The five roles are:

TRUTH — Authoritative knowledge

Documents that tell "how things are": operational manuals, product documentation, interpreted regulations, company policies, white papers, technical books.

How it's read: segmented by paragraphs and sections, preserving narrative context.
What you get in chat: discursive answers that weave information from multiple paragraphs, with [N] citations to the original passage.
When it's used: most of your company documents end up here. It's the "default" role in the absence of specific cues.

FORMAT — Templates and forms

Documents that show a structure to reproduce rather than information to consult: contract templates, blank forms, schemas.

How it's read: the system extracts the structure (fields, sections, placeholders) separating it from the generic surrounding text.
What you get in chat: they are not usually cited in conversational answers. They are instead the "fuel" of Document Generation: when you ask "generate a rental contract", the system starts from the right FORMAT and fills it with data extracted from other documents.
When it's used: wherever there's a standard model to fill repeatedly.

RULES — Binding rules

The most delicate role. These are prescriptive documents that establish obligations, sanctions, deadlines, applicable regulations: judgments, decrees, EU regulations, law articles, orders, administrative decisions, Italian Revenue Agency circulars, internal regulations with binding effect.

How it's read: the system recognizes the article, paragraph and letter structure typical of Italian and EU legal language. Each article is kept as an indivisible unit. For judgments, the maxim (the established rule) and the dispositive (what the judge decided) are indexed separately.
What you get in chat:
- Precise citations: not a generic "the contract", but art. 5 par. 2 of Legislative Decree 231/2001 or Cass. Lab. Sec. n. 12345/2023.
- RULES priority when the question concerns obligations or compliance: if you ask "can I dismiss a sick employee?", the system prefers law articles and judgments over an internal HR circular.
- Automatic validity filter: repealed norms are excluded by default (you can explicitly request them for historical searches).
When it's used: in law firms, tax advisory, compliance and HR, or any other context where "cite the source" is not a detail but a requirement.
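
The automatic validity filter can be pictured as a default-off switch on repealed norms. The sketch below is only an illustration of that behavior: the `Norm` type, the `repealed` flag and the `include_repealed` parameter are hypothetical names, and the citations are placeholders rather than real provisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    citation: str
    repealed: bool = False


def filter_valid(norms: list[Norm], include_repealed: bool = False) -> list[Norm]:
    """Drop repealed norms unless the caller explicitly asks for them."""
    return [n for n in norms if include_repealed or not n.repealed]


# Placeholder citations, for illustration only.
norms = [
    Norm("Example Act, art. 1"),
    Norm("Example Act, art. 2", repealed=True),
    Norm("Example Regulation, art. 7"),
]

current_only = filter_valid(norms)                     # default search
historical = filter_valid(norms, include_repealed=True)  # historical search
```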

Double role TRUTH + RULES

A judgment has two souls: the reasoning (why the judge decided as they did, which is TRUTH) and the maxim/dispositive (the rule that follows, which is RULES). The system assigns both roles and indexes the two aspects separately. So when you ask "why did the Supreme Court decide this way?" you get the reasoning; when you ask "what does it establish on point X?" you get the precise maxim.

OPERATIONAL — Structured data

Documents that make sense "per row" rather than as free text: price lists, supplier and customer master data, product sheets, balance sheets in tabular format, KPIs, time sheets, reconciliations.

How it's read: one row = one autonomous unit. Values are preserved (e.g. product = Alpha, price = 120 euros, availability = in stock).
What you get in chat: the system recognizes aggregate questions ("what's the total 2025 revenue?", "how many products under 100 euros do we have?") and answers with calculations on the structured data, not a narrative summary. For point lookups ("what's the price of product Alpha?") the answer cites the exact row.
When it's used: wherever a datum's value depends on its position in the table and not just on the text.
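
The difference between an aggregate question and a point lookup can be sketched on a tiny price list. This is an illustration with made-up rows (only product Alpha and its values come from the example above), and the helper names are hypothetical; it does not show how Queria computes its answers.

```python
# Each row is an autonomous unit with its values preserved.
rows = [
    {"product": "Alpha", "price": 120, "availability": "in stock"},
    {"product": "Beta", "price": 80, "availability": "in stock"},
    {"product": "Gamma", "price": 95, "availability": "out of stock"},
]


def count_under(table: list[dict], limit: float) -> int:
    """Aggregate question: how many products cost less than `limit`?"""
    return sum(1 for row in table if row["price"] < limit)


def price_of(table: list[dict], product: str) -> tuple[int, float]:
    """Point lookup: return the 1-based row number and the price, so the row can be cited."""
    for number, row in enumerate(table, start=1):
        if row["product"] == product:
            return number, row["price"]
    raise KeyError(product)
```

A narrative summary of the same table could not answer either question reliably, which is why the row structure is kept.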

EXAMPLES — Cases and scenarios

Demonstrative documents that show how to apply a concept, procedure or rule: case studies, application scenarios, company FAQs, solved exercises, support knowledge bases.

How it's read: a Q&A pair or a complete scenario is kept as a unit. The integrity of the individual case is preserved.
What you get in chat: the assistant enriches the answer with a concrete example when the question allows it ("I have a situation similar to..."). EXAMPLES citations are visually distinct, so you know you're reading an example case and not an absolute rule.
When it's used: customer support, onboarding, training materials, internal helpdesk knowledge base.

Multi-role documents

Many real-world documents are mixed. Examples:

Document	Applied roles	Why
Judgment with damage calculation table	RULES + OPERATIONAL	The maxim is RULES, the values table is OPERATIONAL
Operational manual with FAQ appendix	TRUTH + EXAMPLES	The body is TRUTH, the FAQs are EXAMPLES
Company policy with attached form	TRUTH + FORMAT	The text describes the rule (TRUTH), the attached form is FORMAT
Price list with general terms on top	OPERATIONAL + TRUTH	Prices are OPERATIONAL, narrative terms are TRUTH

The system automatically detects coexisting roles and creates dedicated segments for each, so the same source can be cited differently depending on the question.

How to check the assigned role

Open the document sheet from the Documents page.
In the details panel you'll see a Role field with one or more colored badges.
Click a badge to see an explanation of the role and the chunks the system generated for it.

When you get an answer in chat, the citations also show an icon corresponding to the source role (e.g. a scale for RULES, a table for OPERATIONAL). It's a quick way to understand where each piece of information comes from.

What to do if the classification is wrong

Automatic classification works well in the vast majority of cases, but it can fail, especially for very industry-specific documents. You have two paths:

Report it to the administrator: they can force the role manually on that document or define a rule whereby all documents in a certain topic (or with a certain name/path pattern) are always classified into a specific role. See Wizard, Bulk and Path-rules for operational details.
Override on the fly during upload: using the upload Wizard you can confirm or change the suggested role before launching processing. The system also shows a confidence score (e.g. "RULES -- 87%") so you understand how sure it is of its choice.
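
A rule that pins a role to a name or path pattern could look something like the sketch below. The glob syntax, the rule list and the `role_for` helper are assumptions made for illustration; the actual rule format is described in Wizard, Bulk and Path-rules.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (glob pattern, forced role). The first match wins.
PATH_RULES = [
    ("contracts/templates/*", "FORMAT"),
    ("*/price_list_*.xlsx", "OPERATIONAL"),
    ("legal/*", "RULES"),
]


def role_for(path: str, rules=PATH_RULES, default=None):
    """Return the forced role for a path, or `default` to fall back to automatic classification."""
    for pattern, role in rules:
        # fnmatchcase is case-sensitive and behaves the same on every platform.
        if fnmatchcase(path, pattern):
            return role
    return default
```

Returning `None` for unmatched paths leaves room for the automatic classifier to decide, which matches the idea that rules are exceptions rather than the default.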

Why it matters

The role is not a mere label. It determines how the AI will use that document to answer your questions. A price list mistakenly classified as TRUTH will produce narrative summaries when you want calculations; a judgment classified as TRUTH will lack precise article-and-paragraph citations. When results look "close but not on target", the issue is often there.

Standard documents vs Knowledge Base

Queria distinguishes between two document types:

Standard documents

Documents uploaded in the usual way. They are part of the company archive and are available for search according to permissions and assigned topics.

Knowledge Base documents

Documents marked as part of the company Knowledge Base. They have special characteristics:

They are permanent and always prioritized in searches.
They represent the curated and authoritative knowledge of the organization.
They are accessible to all users with the appropriate permissions.

To dig deeper, see the Knowledge Base guide.

Automatic OCR

When you upload a scanned document (image PDF) or an image file (JPG, PNG, TIFF), Queria automatically triggers Optical Character Recognition (OCR):

The system automatically detects whether the document contains real text or is an image.
The OCR engine extracts text from images, including tables, which are rendered as Markdown to preserve their structure.
The extracted text is then processed normally for search.
An AI post-OCR correction improves the recognized text quality, fixing broken words, missing spaces and common errors.
No manual action needed: the process is completely transparent.

Cloud and Network synchronization

Cloud sync

After connecting a cloud service (Google Drive, OneDrive, SharePoint):

Select the folders to monitor.
Configure the sync frequency.
The system periodically checks for new files or updated versions.
New documents are automatically imported and processed.

Company network sync

For network folders:

Configure path and access credentials.
Set the schedule (hourly, daily, weekly).
Queria accesses the folder at the set times and imports updates.
Modified documents are re-processed automatically.
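
One simple way to think about a sync pass is as a comparison between two snapshots of the monitored folder. The sketch below assumes each snapshot maps a file path to some fingerprint (a modification time or a hash); how Queria actually detects changes is not described in this guide, and `diff_snapshots` is a hypothetical helper.

```python
def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare two {path: fingerprint} snapshots.

    Returns (new_paths, modified_paths), each sorted for stable output.
    """
    new = sorted(path for path in current if path not in previous)
    modified = sorted(
        path for path in current
        if path in previous and current[path] != previous[path]
    )
    return new, modified


# Illustrative snapshots taken at two consecutive sync runs.
last_run = {"policies/travel.docx": "v1", "prices/q1.xlsx": "v1"}
this_run = {"policies/travel.docx": "v2", "prices/q1.xlsx": "v1", "prices/q2.xlsx": "v1"}

new_files, changed_files = diff_snapshots(last_run, this_run)
```

In this model, new files would be imported and changed files re-processed, while unchanged files are skipped.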

Deduplication

Queria automatically prevents duplicates: each file is identified by a unique hash. If you upload a file already present in the same organization, the system returns the existing document without creating a duplicate.
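
Hash-based deduplication scoped to an organization can be sketched as follows. The choice of SHA-256 over the file bytes and the `DocumentStore` class are assumptions for illustration; the guide does not state which hash Queria uses or what it is computed over.

```python
import hashlib


class DocumentStore:
    """Toy store that returns the existing document when the same bytes are uploaded twice."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], int] = {}
        self._next_id = 1

    def upload(self, org: str, data: bytes) -> tuple[int, bool]:
        """Return (document_id, created). `created` is False for a duplicate."""
        # The key includes the organization, so deduplication never crosses tenants.
        key = (org, hashlib.sha256(data).hexdigest())
        if key in self._by_key:
            return self._by_key[key], False
        doc_id = self._next_id
        self._next_id += 1
        self._by_key[key] = doc_id
        return doc_id, True
```

Uploading the same bytes under a different file name still counts as a duplicate in this sketch, because only the content is hashed.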

Organization best practices

Assign topics at upload time: categorizing documents right away improves subsequent search quality. For Editors, assigning at least one topic is mandatory. If you have only one topic assigned, it's selected automatically.
Use descriptive file names: Queria also uses the file name as search metadata. "Contract_Rossi_2025.pdf" is more useful than "doc1.pdf".
Prefer text formats: when possible, upload documents with real text (DOCX, text PDF) rather than scans. Search quality will be better.
Check documents in error: review the monitor regularly to spot and resolve processing issues quickly.
Use the Knowledge Base for key documents: manuals, procedures, policies and other reference documents should be in the KB.
Leverage automatic sync: for folders that update frequently, automatic sync avoids repeated manual uploads.
Archive instead of delete: archived documents can be restored. Permanent deletion is irreversible.
Trash: Editors can view and restore deleted documents in their own topics. Permanent deletion and trash emptying are reserved for Admins.

Queria v3.5.0 -- Role-aware document ingestion (canvas DSL)
