
The Knowledge Base is a core feature of Assistants that enables RAG (Retrieval-Augmented Generation): you upload and manage documents that your Assistant can reference during conversations. When users chat with your Assistant, the system automatically searches the uploaded documents for relevant passages and supplies them to the AI model as context for its response.

What is RAG (Retrieval-Augmented Generation)?

RAG is an AI technique that enhances language model responses by retrieving relevant information from external knowledge sources. Instead of relying solely on the AI model's training data, RAG allows your Assistant to:

  • Access current information from your uploaded documents
  • Provide specific, accurate answers based on your content
  • Reference company policies, procedures, or data that were not part of the AI model's training data
  • Stay up-to-date with your latest information without retraining

Knowledge Base Interface

[Screenshot placeholder: Knowledge Base tab overview]

The Knowledge Base tab provides a comprehensive file management interface with the following sections:

Header Section

  • Knowledge Base title with file count badge
  • Search bar for finding specific files
  • Add Files button for uploading new documents

File Management Area

  • File table displaying all uploaded documents
  • Status indicators showing processing progress
  • Action buttons for file inspection and management
  • Drag & drop zone for easy file uploads

Supported File Formats

The Knowledge Base supports a wide variety of document formats:

Document Formats

  • PDF (.pdf) - Adobe Portable Document Format
  • Microsoft Word (.doc, .docx) - Word documents
  • Rich Text Format (.rtf) - Rich text documents
  • Plain Text (.txt) - Simple text files

Web and Markup Formats

  • HTML (.html, .htm) - Web pages and structured content
  • Markdown (.md) - Markdown formatted text
  • XML (.xml) - Structured markup documents

Data Formats

  • CSV (.csv) - Comma-separated values
  • JSON (.json) - JavaScript Object Notation
  • Excel (.xls, .xlsx) - Spreadsheet data

Presentation Formats

  • PowerPoint (.ppt, .pptx) - Presentation slides

Programming and Code

  • JavaScript (.js) - JavaScript source code
  • CSS (.css) - Stylesheets
  • Various code files - Source code in different languages
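
The extension list above can be turned into a simple pre-upload check. The following is a minimal sketch, assuming the extensions named in this section are the complete allow-list; the names `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_by_support` are hypothetical and not part of the product. The "various code files" entry is not enumerated here, so only `.js` and `.css` are included.

```python
from pathlib import Path

# Extensions copied from the format list above. Treating this set as the
# full allow-list is an assumption made for illustration only.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".rtf", ".txt",   # documents
    ".html", ".htm", ".md", ".xml",            # web and markup
    ".csv", ".json", ".xls", ".xlsx",          # data
    ".ppt", ".pptx",                           # presentations
    ".js", ".css",                             # code
}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Partition filenames into (supported, unsupported) lists, keeping order."""
    supported = [f for f in filenames if is_supported(f)]
    unsupported = [f for f in filenames if not is_supported(f)]
    return supported, unsupported
```

Running `split_by_support` over a folder listing before a bulk upload shows which files would need conversion first.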

File Upload Process

Method 1: Drag & Drop Upload

[Screenshot placeholder: Drag and drop interface]

  1. Drag files from your computer onto any area of the Knowledge Base tab
  2. Visual feedback appears showing the drop zone
  3. Release files to begin the upload process
  4. Progress tracking shows upload and processing status

The drag & drop interface features:

  • Visual indicators when files are being dragged
  • Animated drop zone with clear instructions
  • Multi-file support for bulk uploads
  • File type validation with helpful error messages

Method 2: File Browser Upload

[Screenshot placeholder: Add Files button and file picker]

  1. Click the "Add Files" button
  2. File picker opens allowing you to browse your computer
  3. Select single or multiple files using standard file selection
  4. Confirm selection to begin upload process

Upload Stages

Each uploaded file goes through a four-stage processing pipeline:

Stage 1: Getting Upload URL (25% Progress)

  • Secure URL generation for S3 storage
  • File validation and size checking
  • Preparation for cloud upload

Stage 2: Uploading to S3 (50% Progress)

  • Direct upload to secure cloud storage
  • Progress tracking for large files
  • Error handling for upload failures

Stage 3: Processing (75% Progress)

  • File registration with the system
  • Text extraction from documents
  • Initial processing preparation

Stage 4: Text Extraction and Embedding (100% Progress)

  • Content extraction from various file formats
  • Text chunking for optimal retrieval
  • AI embedding generation for similarity search
  • Indexing for fast retrieval
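
The four stages listed above and their progress values can be captured in a small lookup table. This is an illustrative sketch only: the names `PIPELINE` and `describe_stage` are hypothetical, and the table simply restates the stage names and percentages from this section.

```python
# (stage name, progress percentage) pairs, in pipeline order.
# Values restate the stage headings above; nothing here calls a real API.
PIPELINE = (
    ("Getting Upload URL", 25),
    ("Uploading to S3", 50),
    ("Processing", 75),
    ("Text Extraction and Embedding", 100),
)


def describe_stage(stage_number: int) -> str:
    """Format one stage (1-based) as 'Stage N: name (P%)'."""
    if not 1 <= stage_number <= len(PIPELINE):
        raise ValueError(f"stage_number must be between 1 and {len(PIPELINE)}")
    name, percent = PIPELINE[stage_number - 1]
    return f"Stage {stage_number}: {name} ({percent}%)"
```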

File Status Types

Processing States

Available

  • Status: Ready for use in conversations
  • Indicator: Green dot
  • Description: File has been fully processed and indexed

Extracting Text

  • Status: Converting document content to searchable text
  • Indicator: Yellow spinner
  • Description: System is extracting readable text from the document

Embedding

  • Status: Generating AI embeddings for similarity search
  • Indicator: Blue spinner
  • Description: Creating vector representations for intelligent retrieval

Processing

  • Status: General processing state
  • Indicator: Gray spinner
  • Description: File is being prepared for use

Error

  • Status: Processing failed
  • Indicator: Red indicator
  • Description: File could not be processed successfully

Uploading

  • Status: File transfer in progress
  • Indicator: Upload progress bar
  • Description: File is being uploaded to the system
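
The status types above map naturally onto an enumeration. The sketch below is a hypothetical model, not the product's data type: `FileStatus`, `INDICATORS`, and the two helper functions are assumed names. It also assumes that only files in the Available state are used in conversations, which matches the description of that state but is not stated as a rule in this section.

```python
from enum import Enum


class FileStatus(Enum):
    """File states as named in this section."""
    UPLOADING = "Uploading"
    PROCESSING = "Processing"
    EXTRACTING_TEXT = "Extracting Text"
    EMBEDDING = "Embedding"
    AVAILABLE = "Available"
    ERROR = "Error"


# Visual indicator shown for each state, as described above.
INDICATORS = {
    FileStatus.AVAILABLE: "green dot",
    FileStatus.EXTRACTING_TEXT: "yellow spinner",
    FileStatus.EMBEDDING: "blue spinner",
    FileStatus.PROCESSING: "gray spinner",
    FileStatus.ERROR: "red indicator",
    FileStatus.UPLOADING: "upload progress bar",
}


def is_ready(status: FileStatus) -> bool:
    """Assumption: only Available files are ready for use in conversations."""
    return status is FileStatus.AVAILABLE


def is_in_progress(status: FileStatus) -> bool:
    """True for every state that is neither finished nor failed."""
    return status not in (FileStatus.AVAILABLE, FileStatus.ERROR)
```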

File Management Features

File Information Display

[Screenshot placeholder: File table with details]

Each file in the Knowledge Base displays:

Basic Information

  • File name with clear, readable display
  • File size in human-readable format (KB, MB, GB)
  • File type with format description
  • Upload date with full timestamp

Status Information

  • Current processing status with visual indicators
  • Progress indicators for files being processed
  • Error messages for failed processing

File Actions

File Inspection

[Screenshot placeholder: File detail modal]

Click on any file to open the detailed inspection modal:

File Details Tab:

  • Complete file information including metadata
  • Processing status with detailed explanations
  • Upload timestamp and file history
  • File type and size information

Content Preview:

  • Extracted text preview (when available)
  • Download links for original files
  • Content statistics (word count, etc.)

File Management Actions

View/Inspect File:

  • Access detailed file information
  • Preview extracted text content
  • Download original file
  • Review processing status

Remove File:

  • Permanently delete from Knowledge Base
  • Confirmation dialog prevents accidental deletion
  • Immediate removal from Assistant's knowledge

Search and Organization

[Screenshot placeholder: Search functionality]

The search feature allows you to:

  • Find files by name using partial matching
  • Filter the list in real time as you type
  • Search without regard to letter case
  • Clear the search to return to the full list
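
The search behavior described above (partial, case-insensitive matching on file names, with an empty query restoring the full list) can be sketched in a few lines. The function name `filter_files` is hypothetical, and this is only a model of the behavior, not the interface's implementation.

```python
def filter_files(filenames, query: str):
    """Return the names containing `query`, ignoring case.

    An empty or whitespace-only query returns the full list, which
    mirrors clearing the search box.
    """
    needle = query.strip().lower()
    if not needle:
        return list(filenames)
    return [name for name in filenames if needle in name.lower()]
```

Calling `filter_files` again on each keystroke gives the real-time filtering effect.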

File Sorting

Files are ordered automatically according to these rules:

  • Most recent uploads appear first
  • Processing status (uploading files shown at top)
  • Alphabetical sorting within status groups
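
One way to combine the three ordering rules above is a chain of stable sorts. The precedence used here (uploading files first, then most recent upload, then alphabetical as the final tie-breaker) is an assumption, since this section does not say how the rules interact; `FileRow` and `sort_rows` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileRow:
    name: str
    status: str
    uploaded_at: datetime


def sort_rows(rows):
    """Order rows: uploading first, then newest first, then by name.

    Python's sort is stable, so sorting by the least significant key
    first and the most significant key last yields a multi-key order.
    """
    ordered = sorted(rows, key=lambda r: r.name.lower())
    ordered = sorted(ordered, key=lambda r: r.uploaded_at, reverse=True)
    # False sorts before True, so rows with status "Uploading" come first.
    return sorted(ordered, key=lambda r: r.status != "Uploading")
```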

Processing Pipeline Details

Text Extraction Process

The system automatically extracts text from various file formats:

PDF Processing

  • Text layer extraction from native PDF text
  • OCR capabilities for scanned documents
  • Table and structure preservation where possible
  • Metadata extraction (title, author, etc.)

Document Processing

  • Microsoft Office document text extraction
  • Formatting preservation for better context
  • Header and structure recognition
  • Embedded content handling

Web Content Processing

  • HTML parsing with content extraction
  • Markdown rendering and formatting
  • Link and reference preservation
  • Clean text output without markup noise

Chunking Strategy

Documents are split into chunks at natural boundaries so that retrieval can return focused passages rather than whole files:

Chunking Parameters

  • Chunk size: Typically 1000-2000 characters
  • Overlap: 10-20% overlap between chunks
  • Boundary respect: Splits at sentence or paragraph boundaries
  • Context preservation: Maintains logical content groupings
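
The parameters above can be sketched as a greedy, sentence-based splitter. This is a minimal illustration and not the product's actual chunker: the function name `chunk_text`, the defaults (1500 characters, 15% overlap), and the regex-based sentence split are all assumptions chosen to fall within the ranges listed.

```python
import re


def chunk_text(text: str, chunk_size: int = 1500,
               overlap_ratio: float = 0.15) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Chunks end on sentence boundaries. Trailing sentences of each chunk
    are repeated at the start of the next one, up to roughly
    `chunk_size * overlap_ratio` characters. A single sentence longer
    than `chunk_size` is kept whole rather than cut mid-sentence.
    """
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be > 0 and overlap_ratio in [0, 1)")
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    overlap_budget = int(chunk_size * overlap_ratio)
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences into the next chunk as overlap.
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > overlap_budget:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentences instead of fixed character offsets is what the "boundary respect" parameter describes; the overlap keeps a sentence that straddles two topics retrievable from either chunk.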

Chunking Benefits

  • Relevant retrieval: Find specific sections without full document
  • Context preservation: Maintain meaning across chunk boundaries
  • Performance optimization: Faster search and retrieval
  • Token efficiency: Fit relevant content within AI context limits

Embedding Generation

Each text chunk is converted to an embedding vector:

Embedding Process

  • Vector representation of text meaning
  • Semantic similarity for intelligent matching
  • Multi-dimensional vectors for nuanced understanding
  • Optimized for search with high-quality embeddings

Search Capabilities

  • Semantic search: Find content by meaning, not just keywords
  • Relevance ranking: Return most relevant chunks first
  • Context matching: Understand user intent and query context
  • Cross-document search: Find related content across all files
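
To make semantic search and relevance ranking concrete, the sketch below ranks chunks by cosine similarity between a query vector and each chunk vector. The use of cosine similarity is an assumption, since this section does not name the similarity metric, and the function names and three-dimensional toy vectors are hypothetical. Real embeddings come from an embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs: dict[str, list[float]]):
    """Return (chunk_id, score) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example with hand-written vectors standing in for real embeddings.
toy_chunks = {
    "refund-policy": [0.9, 0.1, 0.0],
    "office-hours": [0.0, 0.2, 0.9],
    "returns-faq": [0.7, 0.3, 0.1],
}
ranking = rank_chunks([1.0, 0.0, 0.0], toy_chunks)
```

Because all chunks from all files are scored together, a ranking like this covers cross-document search without extra work.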

Knowledge Base Integration with Chat

Automatic RAG Integration

When users chat with your Assistant, the Knowledge Base automatically performs the following steps:

Query Processing

  1. Analyze user question for intent and keywords
  2. Search knowledge base using semantic similarity
  3. Rank relevant chunks by relevance score
  4. Select top matches for inclusion in AI context

Context Inclusion

  1. Relevant chunks automatically added to AI prompt
  2. Source attribution maintained for transparency
  3. Context optimization to fit within token limits
  4. Quality filtering to ensure relevance
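
The selection and context-inclusion steps above can be sketched as one function that filters scored chunks, keeps the top matches, and assembles them with source labels under a size budget. Everything named here (`ScoredChunk`, `build_context`, the `[Source: ...]` label format, and the default values) is hypothetical. A character budget stands in for a token limit, since real token counts depend on the model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    source: str   # file the chunk came from, kept for attribution
    text: str
    score: float  # relevance score from the search step


def build_context(chunks, top_k: int = 3, min_score: float = 0.5,
                  max_chars: int = 2000) -> str:
    """Assemble a context string from the most relevant chunks."""
    # Quality filter, then rank by score and keep the top_k matches.
    selected = sorted(
        (c for c in chunks if c.score >= min_score),
        key=lambda c: c.score,
        reverse=True,
    )[:top_k]
    parts = []
    used = 0
    for chunk in selected:
        entry = f"[Source: {chunk.source}]\n{chunk.text}"
        cost = len(entry) + (2 if parts else 0)  # 2 for the blank-line separator
        if used + cost > max_chars:
            break  # budget exhausted; lower-ranked chunks are dropped
        parts.append(entry)
        used += cost
    return "\n\n".join(parts)
```

The returned string would then be placed in the prompt ahead of the user's question.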

RAG Configuration

System-Level Settings

Knowledge Base behavior is controlled by the underlying System's resource settings:

  • Embedding enabled: Whether to use vector embeddings
  • Chunk size: Size of text chunks for processing
  • Number of results: How many relevant chunks to include
  • Similarity threshold: Minimum relevance score for inclusion

Assistant-Level Settings

  • RAG enabled: Toggle Knowledge Base usage for the Assistant
  • Document scope: Which uploaded files to include in searches
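
The two levels of settings above could be modeled as follows. This is a hypothetical sketch for reasoning about the options, not the product's configuration schema: the class names, field names, default values, and the convention that an empty `document_scope` means "all files" are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRagSettings:
    """System-level settings; defaults are illustrative placeholders."""
    embedding_enabled: bool = True
    chunk_size: int = 1500              # characters per chunk
    num_results: int = 5                # chunks included per query
    similarity_threshold: float = 0.7   # minimum relevance score

    def __post_init__(self):
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if self.num_results <= 0:
            raise ValueError("num_results must be positive")
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be between 0 and 1")


@dataclass(frozen=True)
class AssistantRagSettings:
    """Assistant-level settings."""
    rag_enabled: bool = True
    # Assumed convention: an empty tuple means every uploaded file is in scope.
    document_scope: tuple[str, ...] = ()

    def in_scope(self, filename: str) -> bool:
        return not self.document_scope or filename in self.document_scope
```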

Best Practices

File Organization

Naming Conventions

  1. Descriptive names: Use clear, searchable file names
  2. Version control: Include version numbers or dates when applicable
  3. Category prefixes: Group related files with consistent naming
  4. Avoid special characters: Stick to alphanumeric characters and hyphens
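
The naming conventions above can be applied automatically before upload. The helper below is a hypothetical sketch: `normalize_filename` is an assumed name, and lowercasing the result is a stylistic choice not required by this section. It replaces every run of characters other than letters and digits with a single hyphen and leaves the extension in place.

```python
import re
from pathlib import Path


def normalize_filename(name: str) -> str:
    """Reduce a file name to lowercase alphanumerics and hyphens."""
    path = Path(name)
    # Collapse each run of non-alphanumeric characters into one hyphen.
    stem = re.sub(r"[^A-Za-z0-9]+", "-", path.stem).strip("-").lower()
    # Fall back to a placeholder if nothing usable remains in the stem.
    return f"{stem or 'file'}{path.suffix.lower()}"
```

For example, `normalize_filename("HR Policy (v2) FINAL!.PDF")` yields `hr-policy-v2-final.pdf`.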

Content Quality

  1. Well-structured documents: Use headings, lists, and clear formatting
  2. Complete information: Ensure documents contain comprehensive information
  3. Current content: Keep uploaded documents up-to-date
  4. Relevant scope: Only upload content relevant to your Assistant's purpose

Upload Strategy

File Selection

  1. Quality over quantity: Focus on high-quality, relevant documents
  2. Comprehensive coverage: Include all necessary reference materials
  3. Avoid duplicates: Remove redundant or outdated files
  4. Size considerations: Balance completeness with processing efficiency

Batch Processing

  1. Plan uploads: Organize files before uploading
  2. Monitor processing: Watch for errors during bulk uploads
  3. Verify completion: Ensure all files reach "Available" status
  4. Test integration: Verify Knowledge Base works in chat after uploads

Maintenance and Updates

Regular Review

  1. Content audits: Periodically review uploaded files for relevance
  2. Remove outdated content: Delete files that are no longer accurate
  3. Update information: Replace old files with newer versions
  4. Monitor usage: Track which documents are being referenced

Performance Optimization

  1. File size management: Keep individual files to reasonable sizes
  2. Format optimization: Use text-based formats when possible
  3. Content structure: Well-organized documents improve retrieval
  4. Regular cleanup: Remove unused or irrelevant files

Troubleshooting

Common Upload Issues

File Too Large

  • Solution: Break large files into smaller, focused documents
  • Alternative: Extract key sections into separate files
  • Optimization: Remove unnecessary images or formatting

Unsupported Format

  • Solution: Convert to supported format (PDF, DOCX, TXT, etc.)
  • Tools: Use online converters or office software
  • Alternative: Copy content to plain text file

Processing Stuck

  • Check: Wait for processing to complete (can take several minutes)
  • Refresh: Reload the page to check current status
  • Retry: Remove and re-upload if processing fails

Content Quality Issues

Poor Search Results

  • Review content: Ensure documents contain relevant information
  • Improve structure: Use clear headings and organization
  • Add context: Include explanatory text and examples
  • Check keywords: Ensure important terms are present in documents

Missing Information

  • Verify upload: Confirm all necessary files were uploaded successfully
  • Check processing: Ensure files show "Available" status
  • Review content: Verify extracted text contains expected information
  • Test queries: Use specific questions to test knowledge retrieval

Security and Privacy

Data Protection

  • Encrypted storage: All uploaded files are stored securely
  • Access control: Files are only accessible within your Assistant
  • Private by default: Knowledge Base content is not shared between Assistants
  • Secure deletion: Removed files are permanently deleted from storage

Content Considerations

  • Sensitive information: Be mindful of confidential data in uploads
  • Access permissions: Ensure you have rights to upload and use content
  • Data retention: Understand that uploaded content is stored until manually removed
  • Compliance: Consider regulatory requirements for your industry

This documentation provides a comprehensive guide to using the Assistant Knowledge Base feature effectively. The Knowledge Base transforms your Assistant from a general-purpose AI into a specialized expert with access to your specific information and domain knowledge.