Knowledge Bases

The Knowledge Base is a core feature of Assistants that enables RAG (Retrieval-Augmented Generation): you upload and manage documents that your Assistant can reference during conversations. When users chat with your Assistant, the system automatically searches the uploaded documents for relevant information and supplies it to the AI model as context for its response.

What is RAG (Retrieval-Augmented Generation)?

RAG is an AI technique that enhances language model responses by retrieving relevant information from external knowledge sources. Instead of relying solely on the AI model's training data, RAG allows your Assistant to:

Access current information from your uploaded documents
Provide specific, accurate answers based on your content
Reference company policies, procedures, or data that wasn't in the AI's training
Stay up-to-date with your latest information without retraining

Knowledge Base Interface

[Screenshot placeholder: Knowledge Base tab overview]

The Knowledge Base tab provides a comprehensive file management interface with the following sections:

Header Section

Knowledge Base title with file count badge
Search bar for finding specific files
Add Files button for uploading new documents

File Management Area

File table displaying all uploaded documents
Status indicators showing processing progress
Action buttons for file inspection and management
Drag & drop zone for easy file uploads

Supported File Formats

The Knowledge Base supports a wide variety of document formats:

Document Formats

PDF (.pdf) - Adobe Portable Document Format
Microsoft Word (.doc, .docx) - Word documents
Rich Text Format (.rtf) - Rich text documents
Plain Text (.txt) - Simple text files

Web and Markup Formats

HTML (.html, .htm) - Web pages and structured content
Markdown (.md) - Markdown formatted text
XML (.xml) - Structured markup documents

Data Formats

CSV (.csv) - Comma-separated values
JSON (.json) - JavaScript Object Notation
Excel (.xls, .xlsx) - Spreadsheet data

Presentation Formats

PowerPoint (.ppt, .pptx) - Presentation slides

Programming and Code

JavaScript (.js) - JavaScript source code
CSS (.css) - Stylesheets
Various code files - Source code in different languages
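
As a rough illustration of file type validation, the sketch below checks a file name against the extensions listed above. The function name `is_supported` and the constant are hypothetical, and the "various code files" entry is not enumerated in this guide, so only `.js` and `.css` are included.

```python
from pathlib import Path

# Extensions copied from the format lists in this section.
# "Various code files" is not enumerated, so it is left out (assumption).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".rtf", ".txt",   # documents
    ".html", ".htm", ".md", ".xml",            # web and markup
    ".csv", ".json", ".xls", ".xlsx",          # data
    ".ppt", ".pptx",                           # presentations
    ".js", ".css",                             # code
}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the documented list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["Handbook.PDF", "archive.zip", "notes.final.md"]:
        print(name, is_supported(name))
```

The comparison is case-insensitive and only looks at the last extension, so `notes.final.md` is treated as Markdown.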

File Upload Process

Method 1: Drag & Drop Upload

[Screenshot placeholder: Drag and drop interface]

Drag files from your computer onto any area of the Knowledge Base tab
Visual feedback appears showing the drop zone
Release files to begin the upload process
Progress tracking shows upload and processing status

The drag & drop interface features:

Visual indicators when files are being dragged
Animated drop zone with clear instructions
Multi-file support for bulk uploads
File type validation with helpful error messages

Method 2: File Browser Upload

[Screenshot placeholder: Add Files button and file picker]

Click the "Add Files" button
File picker opens allowing you to browse your computer
Select single or multiple files using standard file selection
Confirm selection to begin upload process

Upload Stages

Each uploaded file goes through a four-stage processing pipeline:

Stage 1: Getting Upload URL (25% Progress)

Secure URL generation for S3 storage
File validation and size checking
Preparation for cloud upload

Stage 2: Uploading to S3 (50% Progress)

Direct upload to secure cloud storage
Progress tracking for large files
Error handling for upload failures

Stage 3: Processing (75% Progress)

File registration with the system
Text extraction from documents
Initial processing preparation

Stage 4: Text Extraction and Embedding (100% Progress)

Content extraction from various file formats
Text chunking for optimal retrieval
AI embedding generation for similarity search
Indexing for fast retrieval

File Status Types

Processing States

Available

Status: Ready for use in conversations
Indicator: Green dot
Description: File has been fully processed and indexed

Extracting Text

Status: Converting document content to searchable text
Indicator: Yellow spinner
Description: System is extracting readable text from the document

Embedding

Status: Generating AI embeddings for similarity search
Indicator: Blue spinner
Description: Creating vector representations for intelligent retrieval

Processing

Status: General processing state
Indicator: Gray spinner
Description: File is being prepared for use

Error

Status: Processing failed
Indicator: Red indicator
Description: File could not be processed successfully

Uploading

Status: File transfer in progress
Indicator: Upload progress bar
Description: File is being uploaded to the system
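
If you track these states in your own tooling, they could be modeled as below. This is a minimal sketch: the enum values mirror the status names above, but the `FileStatus` type and the `summarize` helper are hypothetical and not part of the product.

```python
from enum import Enum


class FileStatus(Enum):
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    EXTRACTING_TEXT = "Extracting Text"
    EMBEDDING = "Embedding"
    AVAILABLE = "Available"
    ERROR = "Error"


def summarize(statuses: dict[str, FileStatus]) -> dict[str, list[str]]:
    """Split file names into ready, failed, and still-in-progress groups."""
    summary: dict[str, list[str]] = {"ready": [], "failed": [], "pending": []}
    for name, status in statuses.items():
        if status is FileStatus.AVAILABLE:
            summary["ready"].append(name)
        elif status is FileStatus.ERROR:
            summary["failed"].append(name)
        else:
            # Uploading, Processing, Extracting Text, and Embedding are all
            # treated as "not finished yet".
            summary["pending"].append(name)
    return summary


if __name__ == "__main__":
    print(summarize({
        "policy.pdf": FileStatus.AVAILABLE,
        "faq.md": FileStatus.EMBEDDING,
        "scan.pdf": FileStatus.ERROR,
    }))
```

Only Available and Error are treated as final here; every other state means the file is still moving through the pipeline.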

File Management Features

File Information Display

[Screenshot placeholder: File table with details]

Each file in the Knowledge Base displays:

Basic Information

File name with clear, readable display
File size in human-readable format (KB, MB, GB)
File type with format description
Upload date with full timestamp
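
The human-readable size format can be sketched as follows. This assumes 1024-based units and one decimal place, neither of which is specified in this guide; `human_size` is a hypothetical helper.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count as B, KB, MB, or GB (1024-based, an assumption)."""
    units = ["B", "KB", "MB", "GB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at GB, the largest unit listed above.
        if size < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(size)} B"
            return f"{size:.1f} {unit}"
        size /= 1024
    raise AssertionError("unreachable: the loop always returns")


if __name__ == "__main__":
    for n in [512, 1536, 5 * 1024 * 1024]:
        print(n, "->", human_size(n))
```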

Status Information

Current processing status with visual indicators
Progress indicators for files being processed
Error messages for failed processing

File Actions

File Inspection

[Screenshot placeholder: File detail modal]

Click on any file to open the detailed inspection modal:

File Details Tab:

Complete file information including metadata
Processing status with detailed explanations
Upload timestamp and file history
File type and size information

Content Preview:

Extracted text preview (when available)
Download links for original files
Content statistics (word count, etc.)

File Management Actions

View/Inspect File:

Access detailed file information
Preview extracted text content
Download original file
Review processing status

Remove File:

Permanently delete from Knowledge Base
Confirmation dialog prevents accidental deletion
Immediate removal from Assistant's knowledge

Search and Organization

File Search

[Screenshot placeholder: Search functionality]

The search feature allows you to:

Find files by name using partial matching
Filter the list in real time as you type
Search without regard to letter case
Clear the search to return to the full list
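
The search behavior described above (partial, case-insensitive matching, with an empty query showing everything) can be sketched like this. `filter_files` is a hypothetical name, and the real interface may match differently.

```python
def filter_files(names: list[str], query: str) -> list[str]:
    """Return the names that contain the query, ignoring case."""
    needle = query.strip().casefold()
    if not needle:
        # A cleared search returns the full list.
        return list(names)
    return [name for name in names if needle in name.casefold()]


if __name__ == "__main__":
    files = ["HR-Policy-2024.pdf", "pricing.csv", "Onboarding-Policy.docx"]
    print(filter_files(files, "policy"))
```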

File Sorting

Files are automatically ordered as follows:

Most recent uploads appear first
Processing status (uploading files shown at top)
Alphabetical sorting within status groups
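
One possible reading of these ordering rules is sketched below: files that are still uploading come first, then newer uploads before older ones, with names breaking ties alphabetically. The exact precedence is not specified in this guide, so treat the sort key and the `FileEntry` type as assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileEntry:
    name: str
    status: str
    uploaded_at: datetime


def sort_files(files: list[FileEntry]) -> list[FileEntry]:
    """Order files: uploading first, then newest first, then by name."""
    return sorted(
        files,
        key=lambda f: (
            f.status != "Uploading",        # False sorts before True
            -f.uploaded_at.timestamp(),     # newer uploads first
            f.name.casefold(),              # alphabetical tie-breaker
        ),
    )


if __name__ == "__main__":
    entries = [
        FileEntry("old.txt", "Available", datetime(2024, 1, 1)),
        FileEntry("new.csv", "Uploading", datetime(2024, 1, 3)),
        FileEntry("a.pdf", "Available", datetime(2024, 1, 2)),
    ]
    print([f.name for f in sort_files(entries)])
```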

Processing Pipeline Details

Text Extraction Process

The system automatically extracts text from various file formats:

PDF Processing

Text layer extraction from native PDF text
OCR capabilities for scanned documents
Table and structure preservation where possible
Metadata extraction (title, author, etc.)

Document Processing

Microsoft Office document text extraction
Formatting preservation for better context
Header and structure recognition
Embedded content handling

Web Content Processing

HTML parsing with content extraction
Markdown rendering and formatting
Link and reference preservation
Clean text output without markup noise

Chunking Strategy

Documents are intelligently split into chunks for optimal retrieval:

Chunking Parameters

Chunk size: Typically 1000-2000 characters
Overlap: 10-20% overlap between chunks
Boundary respect: Splits at sentence or paragraph boundaries
Context preservation: Maintains logical content groupings
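
To make these parameters concrete, here is a minimal sketch of a sentence-aware chunker with overlap. It is not the system's actual implementation: the function name, the regex-based sentence split, and the default values (picked from the ranges above) are all assumptions.

```python
import re


def chunk_text(text: str, chunk_size: int = 1500, overlap_ratio: float = 0.15) -> list[str]:
    """Split text into chunks of roughly chunk_size characters.

    Chunks end at sentence boundaries, and trailing sentences from one chunk
    are repeated at the start of the next, up to chunk_size * overlap_ratio
    characters. A single sentence longer than chunk_size is kept whole.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    overlap_chars = int(chunk_size * overlap_ratio)
    chunks: list[str] = []
    current: list[str] = []  # sentences in the chunk being built

    for sentence in sentences:
        candidate_len = len(" ".join(current + [sentence]))
        if current and candidate_len > chunk_size:
            chunks.append(" ".join(current))
            # Carry whole trailing sentences forward as the overlap.
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > overlap_chars:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(sentence)

    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "Alpha one. Beta two. Gamma three. Delta four."
    for chunk in chunk_text(sample, chunk_size=25, overlap_ratio=0.5):
        print(repr(chunk))
```

Because the overlap is made of whole sentences, the repeated portion can be shorter than the configured ratio, but a chunk never starts mid-sentence.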

Chunking Benefits

Relevant retrieval: Find specific sections without full document
Context preservation: Maintain meaning across chunk boundaries
Performance optimization: Faster search and retrieval
Token efficiency: Fit relevant content within AI context limits

Embedding Generation

Each text chunk gets converted to AI embeddings:

Embedding Process

Vector representation of text meaning
Semantic similarity for intelligent matching
Multi-dimensional vectors for nuanced understanding
Optimized for search with high-quality embeddings

Search Capabilities

Semantic search: Find content by meaning, not just keywords
Relevance ranking: Return most relevant chunks first
Context matching: Understand user intent and query context
Cross-document search: Find related content across all files
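
Semantic search over embeddings is commonly done by comparing vectors with cosine similarity. This guide does not state which similarity measure the system uses, so the sketch below is illustrative only: the tiny two-dimensional vectors stand in for real embeddings, and both function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec: list[float], chunk_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (chunk_id, score) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    chunks = {"refunds": [1.0, 0.0], "shipping": [0.0, 1.0], "returns": [1.0, 1.0]}
    for chunk_id, score in rank_chunks([1.0, 0.0], chunks):
        print(f"{chunk_id}: {score:.3f}")
```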

Knowledge Base Integration with Chat

Automatic RAG Integration

When users chat with your Assistant, the Knowledge Base automatically:

Query Processing

Analyze user question for intent and keywords
Search knowledge base using semantic similarity
Rank relevant chunks by relevance score
Select top matches for inclusion in AI context

Context Inclusion

Relevant chunks automatically added to AI prompt
Source attribution maintained for transparency
Context optimization to fit within token limits
Quality filtering to ensure relevance
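
The context inclusion step can be sketched as follows: take the highest-scoring chunks, label each with its source file, and stop when a size budget is reached. Character count is used here as a stand-in for tokens, and the `ScoredChunk` type, the `[Source: ...]` label format, and the default budget are assumptions, not the system's actual prompt format.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    source: str   # file the chunk came from
    text: str
    score: float  # relevance score, higher is better


def build_context(chunks: list[ScoredChunk], max_chars: int = 4000) -> str:
    """Join the best chunks, with source labels, within a character budget."""
    parts: list[str] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        entry = f"[Source: {chunk.source}]\n{chunk.text}"
        if used + len(entry) > max_chars:
            break  # budget reached; lower-ranked chunks are dropped
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)


if __name__ == "__main__":
    found = [
        ScoredChunk("b.md", "Shipping is free.", 0.5),
        ScoredChunk("a.pdf", "Refunds take 5 days.", 0.9),
    ]
    print(build_context(found))
```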

RAG Configuration

System-Level Settings

Knowledge Base behavior is controlled by the underlying System's resource settings:

Embedding enabled: Whether to use vector embeddings
Chunk size: Size of text chunks for processing
Number of results: How many relevant chunks to include
Similarity threshold: Minimum relevance score for inclusion
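
The effect of the number-of-results and similarity-threshold settings can be illustrated with a small sketch. The field names and default values below are hypothetical, since this guide does not give the real setting keys or defaults, and the sketch does not model what happens when embeddings are disabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagSettings:
    # Hypothetical names and defaults mirroring the four settings above.
    embedding_enabled: bool = True
    chunk_size: int = 1500
    num_results: int = 5
    similarity_threshold: float = 0.7


def select_chunks(scored: list[tuple[str, float]], settings: RagSettings) -> list[tuple[str, float]]:
    """Keep chunks at or above the threshold, best first, up to num_results."""
    kept = [pair for pair in scored if pair[1] >= settings.similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[: settings.num_results]


if __name__ == "__main__":
    scores = [("c1", 0.91), ("c2", 0.65), ("c3", 0.78), ("c4", 0.83)]
    print(select_chunks(scores, RagSettings(num_results=2)))
```

Raising the threshold trades recall for precision: fewer chunks reach the model, but those that do are more likely to be on topic.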

Assistant-Level Settings

RAG enabled: Toggle Knowledge Base usage for the Assistant
Document scope: Which uploaded files to include in searches

Best Practices

File Organization

Naming Conventions

Descriptive names: Use clear, searchable file names
Version control: Include version numbers or dates when applicable
Category prefixes: Group related files with consistent naming
Avoid special characters: Stick to alphanumeric characters and hyphens
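
If you rename files in bulk before uploading, a small helper like the one below can apply the "alphanumeric characters and hyphens" convention. This is an optional sketch with a hypothetical function name; the Knowledge Base does not require it.

```python
import re


def suggest_name(filename: str) -> str:
    """Replace runs of non-alphanumeric characters in the stem with hyphens."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension present: treat the whole name as the stem.
        stem, ext = filename, ""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-")
    return f"{cleaned}.{ext.lower()}" if ext else cleaned


if __name__ == "__main__":
    for name in ["Q3 Report (final).PDF", "policy_v2.docx"]:
        print(name, "->", suggest_name(name))
```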

Content Quality

Well-structured documents: Use headings, lists, and clear formatting
Complete information: Ensure documents contain comprehensive information
Current content: Keep uploaded documents up-to-date
Relevant scope: Only upload content relevant to your Assistant's purpose

Upload Strategy

File Selection

Quality over quantity: Focus on high-quality, relevant documents
Comprehensive coverage: Include all necessary reference materials
Avoid duplicates: Remove redundant or outdated files
Size considerations: Balance completeness with processing efficiency

Batch Processing

Plan uploads: Organize files before uploading
Monitor processing: Watch for errors during bulk uploads
Verify completion: Ensure all files reach "Available" status
Test integration: Verify Knowledge Base works in chat after uploads

Maintenance and Updates

Regular Review

Content audits: Periodically review uploaded files for relevance
Remove outdated content: Delete files that are no longer accurate
Update information: Replace old files with newer versions
Monitor usage: Track which documents are being referenced

Performance Optimization

File size management: Keep individual files to reasonable sizes
Format optimization: Use text-based formats when possible
Content structure: Well-organized documents improve retrieval
Regular cleanup: Remove unused or irrelevant files

Troubleshooting

Common Upload Issues

File Too Large

Solution: Break large files into smaller, focused documents
Alternative: Extract key sections into separate files
Optimization: Remove unnecessary images or formatting

Unsupported Format

Solution: Convert to supported format (PDF, DOCX, TXT, etc.)
Tools: Use online converters or office software
Alternative: Copy content to plain text file

Processing Stuck

Check: Wait for processing to complete (can take several minutes)
Refresh: Reload the page to check current status
Retry: Remove and re-upload if processing fails

Content Quality Issues

Poor Search Results

Review content: Ensure documents contain relevant information
Improve structure: Use clear headings and organization
Add context: Include explanatory text and examples
Check keywords: Ensure important terms are present in documents

Missing Information

Verify upload: Confirm all necessary files were uploaded successfully
Check processing: Ensure files show "Available" status
Review content: Verify extracted text contains expected information
Test queries: Use specific questions to test knowledge retrieval

Security and Privacy

Data Protection

Encrypted storage: All uploaded files are stored securely
Access control: Files are only accessible within your Assistant
Private by default: Knowledge Base content is not shared between Assistants
Secure deletion: Removed files are permanently deleted from storage

Content Considerations

Sensitive information: Be mindful of confidential data in uploads
Access permissions: Ensure you have rights to upload and use content
Data retention: Understand that uploaded content is stored until manually removed
Compliance: Consider regulatory requirements for your industry

This documentation provides a comprehensive guide to using the Assistant Knowledge Base feature effectively. The Knowledge Base transforms your Assistant from a general-purpose AI into a specialized expert with access to your specific information and domain knowledge.
