What content formats are acceptable for use in the knowledge base?
- .doc
- .docx
- .pptx
- .txt
- .md
- .pdf
- URLs (must not be password-protected)
What does Realtime retrieval mean?
The agent searches your domain live during each chat, so no retraining is required. Uploaded files work differently: they are indexed once at the time of upload, and that index is queried when a user asks something relevant.
Are there any best practices when uploading documents to the knowledge base?
- Split large PDFs (over 50 pages) for faster indexing.
- Wait for the file status to read Processed before closing the window or navigating away. Do not switch tabs during upload.
- Do not close or refresh the editor while files are uploading. If your session expires during an upload, supported file types may show an unsupported format error. If this happens, refresh the page and re-upload.
- URLs added as knowledge sources must be publicly accessible. Password-protected, geo-restricted, or Cloudflare-protected pages may not index correctly. See Website Analysis Limitations for details.
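The tip about splitting large PDFs can be planned with a few lines of Python. This is only a minimal sketch, not a product feature: the function name plan_page_ranges is hypothetical, and using 50 pages as the chunk size is an assumption based on the threshold in the tip above. It only calculates which pages go into which part; the actual splitting would be done with whatever PDF tool you already use.

```python
def plan_page_ranges(total_pages, max_pages=50):
    """Return 1-based, inclusive (first, last) page ranges.

    Each range covers at most max_pages pages. The default of 50 is an
    assumption taken from the "over 50 pages" upload tip.
    """
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


# A 120-page PDF would be split into three parts.
print(plan_page_ranges(120))  # [(1, 50), (51, 100), (101, 120)]
```

Each resulting part can then be uploaded as its own file and checked for the Processed status.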
Are .txt and .md files really supported? I keep getting an error.
Yes, both .txt and .md are supported. If you see an unsupported format error for these types, your editing session has most likely expired. Refresh the page and try uploading again.
Note: .md files work best as knowledge base sources when they contain factual reference content. If your .md file reads more like instructions or a persona description, consider pasting that content into the Persona Prompt editor instead, where it will have a more direct influence on agent behaviour.
My agent is giving incomplete or wrong answers even though I uploaded the right documents. What should I check?
The most common cause of unreliable answers from uploaded documents is the structure of the source file. The knowledge base uses AI embeddings to understand and retrieve content, and some document layouts carry very little semantic meaning for this process.
Avoid uploading these types of documents directly:
- Printouts of spreadsheets or tabular data (for example, a PDF exported from Excel with rows and columns of event sessions, product listings, or schedules). Tabular layouts do not carry enough sentence-level meaning for the embedding model to reliably answer questions like "List all sessions by speaker X" or "What time is the lunch break?", so the agent may skip items or give wrong answers.
- Scanned documents or image-based PDFs - these contain no readable text and will not index correctly.
- Heavily formatted documents with complex layouts, multiple columns, or dense tables - these often produce fragmented or out-of-order indexed content.
What to do instead:
- Convert tabular data into narrative, sentence-form text. For example, instead of a table row reading "Florian Rotberg | Session 3 | 14:00 | Hall B", write: "Florian Rotberg is presenting Session 3 at 14:00 in Hall B." This form is far more reliably retrieved.
- For schedules and structured data with many entries, consider breaking the content into separate smaller documents by category (e.g. one document per speaker, or one document per day) rather than one large combined file.
- Add a dedicated FAQ for the most critical questions that must always be answered correctly - FAQ entries take priority over knowledge base retrieval.
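If the source data lives in a spreadsheet, the conversion and splitting steps above can be scripted. The following is a minimal Python sketch, not part of the product. It assumes a CSV export with hypothetical column names (speaker, session, time, hall), turns each row into one sentence, and groups the sentences by speaker so that each group can be saved as its own small document.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; adjust the template to match your own export.
SENTENCE = "{speaker} is presenting {session} at {time} in {hall}."


def rows_to_sentences(csv_text):
    """Turn each CSV row into a (speaker, sentence) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["speaker"], SENTENCE.format(**row)) for row in reader]


def group_by_speaker(pairs):
    """Collect sentences per speaker: one group per future document."""
    docs = defaultdict(list)
    for speaker, sentence in pairs:
        docs[speaker].append(sentence)
    return dict(docs)


sample = """speaker,session,time,hall
Florian Rotberg,Session 3,14:00,Hall B
"""

docs = group_by_speaker(rows_to_sentences(sample))
print(docs["Florian Rotberg"][0])
```

Each group can then be written to a separate .txt or .md file and uploaded. Grouping by day instead of by speaker would only require changing the key used in group_by_speaker.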
My Spaces dashboard showed all my Spaces as gone after an upload error. What happened?
This is a session display issue, not data loss. Refresh the page and your Spaces will reappear.