The Knowledge Base is a core feature of Assistants. It enables RAG (Retrieval-Augmented Generation) by letting you upload and manage documents that your Assistant can reference during conversations. When users chat with your Assistant, the system automatically searches the uploaded documents for relevant information and supplies it to the AI as context for its responses.
What is RAG (Retrieval-Augmented Generation)?
RAG is an AI technique that enhances language model responses by retrieving relevant information from external knowledge sources. Instead of relying solely on the AI model's training data, RAG allows your Assistant to:
- Access current information from your uploaded documents
- Provide specific, accurate answers based on your content
- Reference company policies, procedures, or data that was not part of the AI's training data
- Stay up-to-date with your latest information without retraining
Knowledge Base Interface
[Screenshot placeholder: Knowledge Base tab overview]
The Knowledge Base tab provides a comprehensive file management interface with the following sections:
Header Section
- Knowledge Base title with file count badge
- Search bar for finding specific files
- Add Files button for uploading new documents
File Management Area
- File table displaying all uploaded documents
- Status indicators showing processing progress
- Action buttons for file inspection and management
- Drag & drop zone for easy file uploads
Supported File Formats
The Knowledge Base supports a wide variety of document formats:
Document Formats
- PDF (.pdf) - Adobe Portable Document Format
- Microsoft Word (.doc, .docx) - Word documents
- Rich Text Format (.rtf) - Rich text documents
- Plain Text (.txt) - Simple text files
Web and Markup Formats
- HTML (.html, .htm) - Web pages and structured content
- Markdown (.md) - Markdown formatted text
- XML (.xml) - Structured markup documents
Data Formats
- CSV (.csv) - Comma-separated values
- JSON (.json) - JavaScript Object Notation
- Excel (.xls, .xlsx) - Spreadsheet data
Presentation Formats
- PowerPoint (.ppt, .pptx) - Presentation slides
Programming and Code
- JavaScript (.js) - JavaScript source code
- CSS (.css) - Stylesheets
- Various code files - Source code in different languages
File Upload Process
Method 1: Drag & Drop Upload
[Screenshot placeholder: Drag and drop interface]
- Drag files from your computer onto any area of the Knowledge Base tab
- Visual feedback appears showing the drop zone
- Release files to begin the upload process
- Progress tracking shows upload and processing status
The drag & drop interface features:
- Visual indicators when files are being dragged
- Animated drop zone with clear instructions
- Multi-file support for bulk uploads
- File type validation with helpful error messages
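File type validation of this kind can be sketched as an extension check against the formats listed under Supported File Formats. This is a minimal illustration, not the product's implementation: the function name and error wording are hypothetical, and because the "Various code files" entry does not enumerate its extensions, only the explicitly listed ones are included here.

```python
from pathlib import Path

# Extensions named explicitly in the Supported File Formats list.
# Assumption: other code file extensions are not enumerated, so they are omitted.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".rtf", ".txt",
    ".html", ".htm", ".md", ".xml",
    ".csv", ".json", ".xls", ".xlsx",
    ".ppt", ".pptx",
    ".js", ".css",
}


def validate_file(name):
    """Return None if the file type is supported, else an error message (hypothetical wording)."""
    ext = Path(name).suffix.lower()  # case-insensitive, so "Report.PDF" is accepted
    if ext in SUPPORTED_EXTENSIONS:
        return None
    shown = ext or "no extension"
    return f"'{name}' has an unsupported file type ({shown})."
```

Returning a message rather than raising an exception makes it easy to validate a multi-file drop and report every rejected file at once.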
Method 2: File Browser Upload
[Screenshot placeholder: Add Files button and file picker]
- Click the "Add Files" button
- File picker opens allowing you to browse your computer
- Select single or multiple files using standard file selection
- Confirm selection to begin upload process
Upload Stages
Each uploaded file goes through a four-stage processing pipeline:
Stage 1: Getting Upload URL (25% Progress)
- Secure URL generation for S3 storage
- File validation and size checking
- Preparation for cloud upload
Stage 2: Uploading to S3 (50% Progress)
- Direct upload to secure cloud storage
- Progress tracking for large files
- Error handling for upload failures
Stage 3: Processing (75% Progress)
- File registration with the system
- Text extraction from documents
- Initial processing preparation
Stage 4: Text Extraction and Embedding (100% Progress)
- Content extraction from various file formats
- Text chunking for optimal retrieval
- AI embedding generation for similarity search
- Indexing for fast retrieval
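The stages and progress values listed above can be represented as a simple lookup. The enum and function names below are hypothetical; only the stage names and percentages come from the list.

```python
from enum import Enum


class UploadStage(Enum):
    """Hypothetical identifiers for the four stages listed above."""
    GETTING_UPLOAD_URL = 1
    UPLOADING_TO_S3 = 2
    PROCESSING = 3
    EXTRACTING_AND_EMBEDDING = 4


# Progress percentage shown for each stage, as listed in the stage headings.
STAGE_PROGRESS = {
    UploadStage.GETTING_UPLOAD_URL: 25,
    UploadStage.UPLOADING_TO_S3: 50,
    UploadStage.PROCESSING: 75,
    UploadStage.EXTRACTING_AND_EMBEDDING: 100,
}


def progress_for(stage):
    """Return the progress percentage associated with a stage."""
    return STAGE_PROGRESS[stage]
```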
File Status Types
Processing States
Available
- Status: Ready for use in conversations
- Indicator: Green dot
- Description: File has been fully processed and indexed
Extracting Text
- Status: Converting document content to searchable text
- Indicator: Yellow spinner
- Description: System is extracting readable text from the document
Embedding
- Status: Generating AI embeddings for similarity search
- Indicator: Blue spinner
- Description: Creating vector representations for intelligent retrieval
Processing
- Status: General processing state
- Indicator: Gray spinner
- Description: File is being prepared for use
Error
- Status: Processing failed
- Indicator: Red indicator
- Description: File could not be processed successfully
Uploading
- Status: File transfer in progress
- Indicator: Upload progress bar
- Description: File is being uploaded to the system
File Management Features
File Information Display
[Screenshot placeholder: File table with details]
Each file in the Knowledge Base displays:
Basic Information
- File name with clear, readable display
- File size in human-readable format (KB, MB, GB)
- File type with format description
- Upload date with full timestamp
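As an illustration of the human-readable size format, the sketch below converts a byte count into B, KB, MB, or GB. It assumes 1024-based units and one decimal place; the interface's actual rounding and unit base are not specified here, and the function name is hypothetical.

```python
def format_size(num_bytes):
    """Format a byte count as B, KB, MB, or GB (assumes 1 KB = 1024 bytes)."""
    units = ["B", "KB", "MB", "GB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at GB, which is the largest unit used here.
        if size < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{size:.0f} {unit}"
            return f"{size:.1f} {unit}"
        size /= 1024
```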
Status Information
- Current processing status with visual indicators
- Progress indicators for files being processed
- Error messages for failed processing
File Actions
File Inspection
[Screenshot placeholder: File detail modal]
Click on any file to open the detailed inspection modal:
File Details Tab:
- Complete file information including metadata
- Processing status with detailed explanations
- Upload timestamp and file history
- File type and size information
Content Preview:
- Extracted text preview (when available)
- Download links for original files
- Content statistics (word count, etc.)
File Management Actions
View/Inspect File:
- Access detailed file information
- Preview extracted text content
- Download original file
- Review processing status
Remove File:
- Permanently delete from Knowledge Base
- Confirmation dialog prevents accidental deletion
- Immediate removal from Assistant's knowledge
Search and Organization
File Search
[Screenshot placeholder: Search functionality]
The search feature allows you to:
- Find files by name using partial matching
- Real-time filtering as you type
- Case-insensitive search for ease of use
- Clear search to return to full list
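The search behavior described above (partial matching, case-insensitive, empty query restores the full list) can be sketched as a small filter function. The function name is hypothetical and this is only an illustration of the behavior, not the interface's actual code.

```python
def filter_files(file_names, query):
    """Return the names that contain the query, ignoring case.

    An empty or whitespace-only query returns the full list,
    which mirrors clearing the search box.
    """
    needle = query.strip().casefold()
    if not needle:
        return list(file_names)
    return [name for name in file_names if needle in name.casefold()]
```

For example, searching for "POLICY" would match both "Policy.pdf" and "HR-policy-v2.docx".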
File Sorting
Files are automatically sorted by:
- Processing status: files that are still uploading are shown at the top
- Upload date: the most recent uploads appear first
- Name: alphabetical order within status groups
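One possible reading of these sorting rules is: uploading files first, then newest uploads first, with the file name as an alphabetical tie-breaker. The sketch below implements that reading using Python's stable sort; the `KBFile` type, its field names, and the `"uploading"` status string are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class KBFile:
    """Hypothetical file record used only for this example."""
    name: str
    status: str
    uploaded_at: datetime


def sort_files(files):
    """Order files: uploading first, then newest first, then by name.

    sorted() is stable, so sorting by the least significant key first
    and the most significant key last yields the combined ordering.
    """
    ordered = sorted(files, key=lambda f: f.name.casefold())
    ordered = sorted(ordered, key=lambda f: f.uploaded_at, reverse=True)
    return sorted(ordered, key=lambda f: f.status != "uploading")
```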
Processing Pipeline Details
Text Extraction Process
The system automatically extracts text from various file formats:
PDF Processing
- Text layer extraction from native PDF text
- OCR capabilities for scanned documents
- Table and structure preservation where possible
- Metadata extraction (title, author, etc.)
Document Processing
- Microsoft Office document text extraction
- Formatting preservation for better context
- Header and structure recognition
- Embedded content handling
Web Content Processing
- HTML parsing with content extraction
- Markdown rendering and formatting
- Link and reference preservation
- Clean text output without markup noise
Chunking Strategy
Documents are intelligently split into chunks for optimal retrieval:
Chunking Parameters
- Chunk size: Typically 1000-2000 characters
- Overlap: 10-20% overlap between chunks
- Boundary respect: Splits at sentence or paragraph boundaries
- Context preservation: Maintains logical content groupings
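To make these parameters concrete, here is a minimal sketch of sentence-boundary chunking with overlap. It is not the system's actual chunker: the function name, the regex-based sentence splitter, and the default values (1500 characters, 15% overlap, chosen from within the ranges above) are assumptions for illustration.

```python
import re


def chunk_text(text, chunk_size=1500, overlap_ratio=0.15):
    """Pack whole sentences into chunks of at most chunk_size characters.

    Trailing sentences of each chunk are repeated at the start of the next
    chunk, up to overlap_ratio * chunk_size characters. A single sentence
    longer than chunk_size becomes its own oversized chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    overlap_budget = int(chunk_size * overlap_ratio)
    chunks = []
    current = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > chunk_size:
            chunks.append(" ".join(current))
            # Collect trailing sentences that fit in the overlap budget.
            carried = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > overlap_budget:
                    break
                carried.insert(0, prev)
            # Drop carried sentences if they would push the new chunk over the limit.
            while carried and len(" ".join(carried + [sentence])) > chunk_size:
                carried.pop(0)
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting only at sentence boundaries keeps each chunk readable on its own, and the repeated trailing sentences give the retriever some shared context between neighboring chunks.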
Chunking Benefits
- Relevant retrieval: Find specific sections without full document
- Context preservation: Maintain meaning across chunk boundaries
- Performance optimization: Faster search and retrieval
- Token efficiency: Fit relevant content within AI context limits
Embedding Generation
Each text chunk is converted into an embedding vector:
Embedding Process
- Vector representation of text meaning
- Semantic similarity for intelligent matching
- Multi-dimensional vectors for nuanced understanding
- Optimized for search with high-quality embeddings
Search Capabilities
- Semantic search: Find content by meaning, not just keywords
- Relevance ranking: Return most relevant chunks first
- Context matching: Understand user intent and query context
- Cross-document search: Find related content across all files
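Semantic search with relevance ranking is commonly implemented by comparing a query embedding against each chunk embedding. The sketch below uses cosine similarity, which is an assumption: the similarity measure actually used by the Knowledge Base is not stated here. The vectors are tiny hand-written examples, not real embeddings, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs):
    """Return (chunk_id, score) pairs sorted from most to least similar.

    chunk_vecs maps a chunk identifier to its embedding vector.
    """
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because chunks from every file share the same vector space, ranking over all of them at once is what makes cross-document search possible.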
Knowledge Base Integration with Chat
Automatic RAG Integration
When users chat with your Assistant, the Knowledge Base automatically:
Query Processing
- Analyze user question for intent and keywords
- Search knowledge base using semantic similarity
- Rank relevant chunks by relevance score
- Select top matches for inclusion in AI context
Context Inclusion
- Relevant chunks automatically added to AI prompt
- Source attribution maintained for transparency
- Context optimization to fit within token limits
- Quality filtering to ensure relevance
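The selection and context-inclusion steps above can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: the function names, the `[Source: ...]` label format, the default values, and the use of a character budget as a stand-in for a token limit are all hypothetical.

```python
def select_context(scored_chunks, top_k=3, min_score=0.5, max_chars=4000):
    """Pick the best-scoring chunks that pass the threshold and fit the budget.

    scored_chunks is a list of (source, text, score) tuples.
    Returns a list of (source, text) pairs in descending score order.
    """
    ranked = sorted(scored_chunks, key=lambda c: c[2], reverse=True)
    selected = []
    used = 0
    for source, text, score in ranked:
        # Stop once scores fall below the threshold or enough chunks are chosen.
        if score < min_score or len(selected) == top_k:
            break
        # Skip a chunk that would exceed the budget; a smaller one may still fit.
        if used + len(text) > max_chars:
            continue
        selected.append((source, text))
        used += len(text)
    return selected


def build_context(selected):
    """Join the selected chunks into one block, labeling each with its source file."""
    return "\n\n".join(f"[Source: {source}]\n{text}" for source, text in selected)
```

Keeping the source name next to each chunk is one simple way to preserve attribution when the text is added to the prompt.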
RAG Configuration
System-Level Settings
Knowledge Base behavior is controlled by the underlying System's resource settings:
- Embedding enabled: Whether to use vector embeddings
- Chunk size: Size of text chunks for processing
- Number of results: How many relevant chunks to include
- Similarity threshold: Minimum relevance score for inclusion
Assistant-Level Settings
- RAG enabled: Toggle Knowledge Base usage for the Assistant
- Document scope: Which uploaded files to include in searches
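The two levels of settings can be pictured as two small configuration objects. The class names, field names, and default values below are placeholders chosen for illustration; they are not the product's actual schema or defaults.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SystemRagSettings:
    """System-level settings (hypothetical field names and placeholder defaults)."""
    embedding_enabled: bool = True
    chunk_size: int = 1500
    num_results: int = 5
    similarity_threshold: float = 0.7


@dataclass
class AssistantRagSettings:
    """Assistant-level settings; document_scope=None is assumed to mean all files."""
    rag_enabled: bool = True
    document_scope: Optional[List[str]] = None


def files_in_scope(all_files, assistant):
    """Return the files a search should cover for this Assistant."""
    if not assistant.rag_enabled:
        return []
    if assistant.document_scope is None:
        return list(all_files)
    return [name for name in all_files if name in assistant.document_scope]
```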
Best Practices
File Organization
Naming Conventions
- Descriptive names: Use clear, searchable file names
- Version control: Include version numbers or dates when applicable
- Category prefixes: Group related files with consistent naming
- Avoid special characters: Stick to alphanumeric characters and hyphens
Content Quality
- Well-structured documents: Use headings, lists, and clear formatting
- Complete information: Ensure documents contain comprehensive information
- Current content: Keep uploaded documents up-to-date
- Relevant scope: Only upload content relevant to your Assistant's purpose
Upload Strategy
File Selection
- Quality over quantity: Focus on high-quality, relevant documents
- Comprehensive coverage: Include all necessary reference materials
- Avoid duplicates: Remove redundant or outdated files
- Size considerations: Balance completeness with processing efficiency
Batch Processing
- Plan uploads: Organize files before uploading
- Monitor processing: Watch for errors during bulk uploads
- Verify completion: Ensure all files reach "Available" status
- Test integration: Verify Knowledge Base works in chat after uploads
Maintenance and Updates
Regular Review
- Content audits: Periodically review uploaded files for relevance
- Remove outdated content: Delete files that are no longer accurate
- Update information: Replace old files with newer versions
- Monitor usage: Track which documents are being referenced
Performance Optimization
- File size management: Keep individual files to reasonable sizes
- Format optimization: Use text-based formats when possible
- Content structure: Well-organized documents improve retrieval
- Regular cleanup: Remove unused or irrelevant files
Troubleshooting
Common Upload Issues
File Too Large
- Solution: Break large files into smaller, focused documents
- Alternative: Extract key sections into separate files
- Optimization: Remove unnecessary images or formatting
Unsupported Format
- Solution: Convert to supported format (PDF, DOCX, TXT, etc.)
- Tools: Use online converters or office software
- Alternative: Copy content to plain text file
Processing Stuck
- Wait: Allow processing to finish; it can take several minutes
- Refresh: Reload the page to check current status
- Retry: Remove and re-upload if processing fails
Content Quality Issues
Poor Search Results
- Review content: Ensure documents contain relevant information
- Improve structure: Use clear headings and organization
- Add context: Include explanatory text and examples
- Check keywords: Ensure important terms are present in documents
Missing Information
- Verify upload: Confirm all necessary files were uploaded successfully
- Check processing: Ensure files show "Available" status
- Review content: Verify extracted text contains expected information
- Test queries: Use specific questions to test knowledge retrieval
Security and Privacy
Data Protection
- Encrypted storage: All uploaded files are stored in encrypted form
- Access control: Files are only accessible within your Assistant
- Private by default: Knowledge Base content is not shared between Assistants
- Secure deletion: Removed files are permanently deleted from storage
Content Considerations
- Sensitive information: Be mindful of confidential data in uploads
- Access permissions: Ensure you have rights to upload and use content
- Data retention: Understand that uploaded content is stored until manually removed
- Compliance: Consider regulatory requirements for your industry
The Knowledge Base transforms your Assistant from a general-purpose AI into a specialized expert with access to your specific information and domain knowledge.