
What Can You Upload to Train Your AI Assistant? (Complete File Guide)

A comprehensive guide to file formats, best practices, and optimization tips for training your AI assistant's knowledge base.

Assisters Team · January 3, 2026 · 7 min read

Your AI assistant is only as good as its knowledge base. This guide covers what you need to know about uploading content: supported file formats, best practices, and optimization tips.

Supported File Formats

Documents

PDF Files (.pdf)

  • Standard text PDFs: Fully supported
  • Scanned PDFs: Supported via OCR (optical character recognition)
  • Image-heavy PDFs: Embedded text is extracted, and images are run through OCR to recover any text they contain
  • Max size: 10MB per file

Word Documents (.doc, .docx)

  • Full formatting preserved for context
  • Headers and structure maintained
  • Tables converted to readable format
  • Max size: 10MB per file

Text Files (.txt)

  • Plain text, UTF-8 encoding recommended
  • Great for FAQs and simple content
  • No formatting overhead
  • Max size: 10MB per file

Markdown Files (.md)

  • Structure preserved (headers, lists)
  • Code blocks included
  • Ideal for technical documentation
  • Max size: 10MB per file

Spreadsheets

CSV Files (.csv)

  • Rows converted to readable entries
  • Great for product catalogs, FAQs, data tables
  • First row treated as headers
  • Max size: 10MB per file
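To make the row conversion concrete, here is a minimal sketch of how a CSV with a header row could be turned into readable entries. The function name `rows_to_entries` and the "header: value" output format are assumptions for illustration; the platform's actual conversion may look different.

```python
import csv
import io


def rows_to_entries(csv_text: str) -> list:
    """Turn each data row into a 'header: value' line, using the first row as headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = []
    for row in reader:
        # Skip empty cells so each entry stays compact.
        parts = [f"{key}: {value}" for key, value in row.items() if value]
        entries.append("; ".join(parts))
    return entries


# Sample row based on the pricing example used later in this guide.
sample = "plan,price\nProfessional,$99/month\n"
print(rows_to_entries(sample))
```

Previewing your CSV this way is a quick check that the header names are descriptive enough to make sense when read next to each value.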

Excel Files (.xlsx)

  • First sheet processed by default
  • Tables and data extracted
  • Formulas converted to values
  • Max size: 10MB per file

Images

Image Files (.png, .jpg, .jpeg)

  • OCR extracts visible text
  • Great for scanned documents, screenshots, infographics
  • Handwritten text partially supported
  • Max size: 5MB per file
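The format and size rules above can be checked locally before you upload. The following is a rough sketch, not an official tool: the helper name `check_upload` is hypothetical, and it assumes "MB" means 1024 × 1024 bytes, which the guide does not specify.

```python
from pathlib import Path
from typing import Optional

MB = 1024 * 1024  # assumption: the guide does not say whether MB is binary or decimal

# Limits as listed in this guide: 10MB for documents and spreadsheets, 5MB for images.
MAX_BYTES = {
    ".pdf": 10 * MB, ".doc": 10 * MB, ".docx": 10 * MB,
    ".txt": 10 * MB, ".md": 10 * MB,
    ".csv": 10 * MB, ".xlsx": 10 * MB,
    ".png": 5 * MB, ".jpg": 5 * MB, ".jpeg": 5 * MB,
}


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return a description of the problem, or None if the file looks uploadable."""
    ext = Path(filename).suffix.lower()
    limit = MAX_BYTES.get(ext)
    if limit is None:
        return f"unsupported format: {ext or '(no extension)'}"
    if size_bytes > limit:
        return f"too large: {size_bytes} bytes exceeds the {limit // MB}MB limit for {ext}"
    return None


print(check_upload("return-policy-2026.pdf", 2 * MB))
print(check_upload("screenshot.png", 6 * MB))
```

For real files, `Path(filename).stat().st_size` gives the size in bytes to pass in.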

What Makes Good Training Content?

High-Quality Content Characteristics

Specific and Detailed

  • Bad: "Our product helps with productivity"
  • Good: "Our task management feature saves users an average of 2.5 hours per week by automating recurring task creation"

Well-Structured

  • Use clear headings
  • Organize by topic
  • Maintain consistent formatting

Comprehensive

  • Cover common questions
  • Include edge cases
  • Provide context and nuance

Current

  • Remove outdated information
  • Update with recent changes
  • Date-stamp time-sensitive content

Content Types That Work Well

FAQs and Q&A Pairs

  • Explicit question-answer format
  • Covers common user queries
  • Easy for AI to match and retrieve

How-To Guides

  • Step-by-step instructions
  • Clear procedures
  • Troubleshooting steps

Reference Documentation

  • Product specifications
  • Policy documents
  • Technical details

Case Studies

  • Real examples
  • Outcomes and lessons
  • Context and nuance

Frameworks and Methodologies

  • Your unique approaches
  • Decision-making processes
  • Best practices

Content to Avoid

What Not to Upload

Duplicate Content

  • Multiple versions of the same document confuse the AI
  • Keep one authoritative version
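One way to catch exact duplicates before uploading is to compare file contents by hash. This is a minimal sketch with a hypothetical helper name, `find_duplicates`. It only finds byte-identical files, so a lightly edited copy of the same document will not be flagged and still needs a manual review.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder: str) -> list:
    """Group files under `folder` whose bytes are identical (same SHA-256 digest)."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(str(path))
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates("knowledge-base/")` on a local folder (the folder name here is just an example) returns a list of groups, each containing the paths of files with identical contents.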

Contradictory Information

  • Old and new policies together cause confusion
  • Archive outdated content separately

Sensitive Data

  • Personal information (unless necessary and compliant)
  • Credentials or passwords
  • Internal-only confidential information
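A simple pattern scan can flag some obvious sensitive data before a file goes into the knowledge base. The sketch below is illustrative only: the patterns and the function name `scan_for_sensitive` are assumptions, they will miss many cases and can produce false positives, and they are no substitute for a proper compliance review.

```python
import re

# Illustrative patterns only; real sensitive-data detection needs far more coverage.
PATTERNS = {
    "email address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "possible credential": re.compile(
        r"(?i)\b(password|passwd|api[_-]?key|secret|token)\b\s*[:=]\s*\S+"
    ),
}


def scan_for_sensitive(text: str) -> list:
    """Return (line_number, label) pairs for lines that match one of the patterns."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


sample = "Contact jane@example.com\npassword: hunter2\nAll good here"
print(scan_for_sensitive(sample))
```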

Raw Data Without Context

  • Numbers without explanation
  • Lists without descriptions
  • Data that requires interpretation

Organization Best Practices

Naming Conventions

Use descriptive file names:

  • Good: "return-policy-2026.pdf"
  • Bad: "doc1.pdf"

Content Categories

Organize your content into logical groups:

  • Product information
  • Support and troubleshooting
  • Policies and procedures
  • FAQs by topic
  • Case studies and examples

Versioning

When updating content:

  • Replace old files with updated versions
  • Don't keep multiple versions active
  • Note significant changes in your content

Optimizing for Retrieval

Your content is chunked and embedded for retrieval. The practices below help that process:

Use Clear Headings

The AI uses headings to understand document structure:

  # Product Overview
  ## Features
  ### Feature 1: Task Management
  ...
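To see why headings matter, here is a sketch of heading-based chunking. It assumes Markdown-style `#` headings and a hypothetical function name, `chunk_by_headings`. The platform's real chunker is not documented in this guide, so treat this as one plausible approach rather than a description of it.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list:
    """Split text at headings and label each chunk with its heading trail."""
    chunks = []
    path = []  # current heading trail, e.g. ["Product Overview", "Features"]
    body = []  # lines collected under the current heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous heading
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Product Overview\nIntro text.\n## Features\n### Feature 1: Task Management\nAutomates recurring tasks.\n"
for chunk in chunk_by_headings(doc):
    print(chunk)
```

Because each chunk carries its full heading trail, a passage retrieved on its own still says which product and feature it belongs to.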

Include Context

Don't assume knowledge:

  • Bad: "It costs $99"
  • Good: "The Professional plan costs $99/month and includes..."

Be Explicit

When information relates to other topics:

  • Bad: "See above"
  • Good: "As mentioned in the pricing section, the Professional plan..."
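Positional references such as "see above" lose their meaning once a document is split into chunks, so it can help to search for them before uploading. This is a small sketch; the phrase list and the name `find_vague_references` are assumptions that you would adapt to your own writing habits.

```python
import re

# Phrases that point at a position in the document rather than naming the topic.
VAGUE = re.compile(
    r"\b(see above|see below|as above|mentioned earlier|the former|the latter)\b",
    re.IGNORECASE,
)


def find_vague_references(text: str) -> list:
    """Return (line_number, phrase) pairs for positional references in the text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in VAGUE.finditer(line):
            hits.append((number, match.group(0).lower()))
    return hits


sample = (
    "Pricing: see above.\n"
    "As mentioned in the pricing section, the Professional plan costs $99/month."
)
print(find_vague_references(sample))
```

The second sample line is not flagged because it names the section it refers to, which is the explicit style recommended above.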

Format for Scanning

Use lists and bullet points:

  • Easier to process
  • Better retrieval accuracy
  • More scannable responses

Processing and Limits

How Content Gets Processed

  • Upload: You upload files to your knowledge base
  • Extraction: Text is extracted from all files
  • Chunking: Content is split into semantic chunks
  • Embedding: Chunks are converted to vector embeddings
  • Indexing: Embeddings are stored for fast retrieval
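The chunking, embedding, indexing, and retrieval steps can be illustrated with a toy example. This is a sketch under heavy assumptions: it uses word counts in place of a real embedding model, splits on blank lines in place of semantic chunking, and all function names are hypothetical. It shows the shape of the pipeline, not how the platform implements it.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real systems use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(document: str) -> list:
    """Chunk on blank lines, then store each chunk next to its vector."""
    chunks = [part.strip() for part in re.split(r"\n\s*\n", document) if part.strip()]
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list, question: str) -> str:
    """Return the chunk whose vector is most similar to the question's vector."""
    query = embed(question)
    return max(index, key=lambda item: cosine(query, item[1]))[0]


# Sample text reuses the example sentences from earlier in this guide.
doc = (
    "The Professional plan costs $99/month.\n\n"
    "Our task management feature saves users an average of 2.5 hours per week "
    "by automating recurring task creation."
)
index = build_index(doc)
print(retrieve(index, "How much does the Professional plan cost?"))
```

Even in this toy version, the chunk that names "the Professional plan" explicitly is the one that matches the question, which is why specific, self-contained wording improves retrieval.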

Current Limits

  • Storage: 10MB free; additional storage is charged to your wallet
  • File size: 10MB per file maximum (5MB for images)
  • File count: No hard limit (storage-based)
  • Processing time: Typically 1-5 minutes, depending on content
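To estimate how much of your storage falls outside the free allowance, you can total your file sizes locally. This sketch assumes 1MB is 1024 × 1024 bytes and uses a hypothetical helper, `storage_summary`. It does not estimate cost, because this guide does not state a price for additional storage.

```python
MB = 1024 * 1024  # assumption: binary megabytes
FREE_STORAGE_BYTES = 10 * MB  # free allowance listed above


def storage_summary(file_sizes: dict) -> dict:
    """Total the given file sizes and report how many bytes exceed the free allowance."""
    total = sum(file_sizes.values())
    return {
        "total_bytes": total,
        "billable_bytes": max(0, total - FREE_STORAGE_BYTES),
    }


# File names and sizes here are made up for the example.
print(storage_summary({"return-policy-2026.pdf": 6 * MB, "faq.docx": 7 * MB}))
```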

Storage Management

Monitor your usage in Creator Studio:

  • View total storage used
  • See breakdown by assistant
  • Delete or replace old files as needed

Updating Your Knowledge Base

When to Update

  • Product or service changes
  • New frequently asked questions emerge
  • Policies or procedures change
  • You develop new insights or methodologies

How to Update

  • Navigate to your assistant's Knowledge Base
  • Upload new or replacement files
  • Delete outdated content
  • Reprocessing happens automatically

Testing After Updates

After significant updates:

  • Ask questions about new content
  • Verify old content still works
  • Check for any conflicts or confusion

Troubleshooting

Common Issues

"My assistant doesn't know about content I uploaded"

  • Check if processing completed
  • Verify the content is in a supported format
  • Test with exact phrases from your document

"Responses seem outdated"

  • Replace old files with current versions
  • Check for duplicate files with old information
  • Ensure the latest upload processed successfully

"OCR isn't capturing text correctly"

  • Use higher-resolution images
  • Ensure text is clear and legible
  • Consider retyping critical content

"File upload fails"

  • Check file size (under 10MB, or under 5MB for images)
  • Verify file format is supported
  • Try re-saving in a different format

Checklist for Great Knowledge Bases

  • All files under 10MB (images under 5MB)
  • No duplicate content
  • Information is current
  • Clear headings and structure
  • Explicit context provided
  • No sensitive data included
  • FAQ format where appropriate
  • Tested after upload

Your knowledge base is your AI assistant's brain. Invest in quality content, and it will pay dividends in better user experiences and more revenue.
