Files & PDFs

Upload documents, PDFs, and other files to your knowledge base. Supported formats include Word, Excel, PowerPoint, plain text, and Markdown.

Upload documents directly to Ask0 to include manuals, guides, PDFs, presentations, and other file-based content in your knowledge base. This source type suits internal documentation, product manuals, and any content that is not available on the web.

Supported File Types

Documents

PDF (.pdf) - Full text extraction with layout preservation
Word (.docx, .doc) - Microsoft Word documents
Text (.txt, .md, .markdown) - Plain text and Markdown
RTF (.rtf) - Rich Text Format

Spreadsheets

Excel (.xlsx, .xls) - Tables and data
CSV (.csv) - Comma-separated values
Google Sheets (via export)

Presentations

PowerPoint (.pptx, .ppt) - Slide content
Google Slides (via export)

Code & Config

JSON (.json) - Configuration files
YAML (.yaml, .yml) - Config and documentation
XML (.xml) - Structured data

Uploading Files

Navigate to Files Source

In your Ask0 dashboard, go to Sources → Add Source → Files & PDFs

Upload Your Files

Choose an upload method:

Drag & Drop: Drag files directly into the upload area
Browse: Click to select files from your computer
Bulk Upload: Upload ZIP archives (automatically extracted)
URL Import: Provide direct download links

Upload Limits:
  Max File Size: 50 MB per file
  Max Total Size: 500 MB per upload
  Max Files: 100 files per batch
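
Checking a batch against these limits before uploading avoids a rejected request. The following is a minimal sketch, not part of the Ask0 SDK: the `checkBatch` helper and the `{ name, size }` file shape are hypothetical, and it assumes 1 MB means 1,048,576 bytes.

```javascript
// Limits taken from the Upload Limits list above.
const MB = 1024 * 1024; // assumption: 1 MB = 1,048,576 bytes
const LIMITS = { maxFileBytes: 50 * MB, maxTotalBytes: 500 * MB, maxFiles: 100 };

// files: array of { name, size } where size is in bytes (hypothetical shape).
// Returns a list of human-readable problems; an empty list means the batch fits.
function checkBatch(files, limits = LIMITS) {
  const problems = [];
  if (files.length > limits.maxFiles) {
    problems.push(`Too many files: ${files.length} > ${limits.maxFiles}`);
  }
  let total = 0;
  for (const file of files) {
    total += file.size;
    if (file.size > limits.maxFileBytes) {
      problems.push(`${file.name} exceeds the per-file limit`);
    }
  }
  if (total > limits.maxTotalBytes) {
    problems.push('Batch exceeds the total size limit');
  }
  return problems;
}
```

An empty result means the batch can be sent as is; otherwise split it or drop the oversized files first.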

Configure Processing

Set processing options:

Processing Options:
  Extract Text: Yes
  OCR for Images: Yes (for scanned PDFs)
  Extract Metadata: Yes
  Language Detection: Automatic
  Split Large Documents: Yes
  Chunk Size: 1000 tokens
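
To give a feel for what splitting by chunk size means, here is a rough sketch. It is not how Ask0 chunks documents internally: the `chunkText` helper is hypothetical, and it approximates a token as a whitespace-separated word, which a real tokenizer will count differently.

```javascript
// Splits text into chunks of at most maxTokens "tokens".
// Assumption: a token is approximated by a whitespace-separated word.
function chunkText(text, maxTokens = 1000) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(' '));
  }
  return chunks;
}

// Example with a tiny chunk size for readability.
const chunks = chunkText('one two three four five', 2);
// chunks is ['one two', 'three four', 'five']
```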

Organize Files

Add metadata and organization:

File Organization:
  Category: Product Manuals
  Tags: [user-guide, v2.0, technical]
  Language: English
  Version: 2.0
  Valid Until: 2024-12-31
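
A "Valid Until" date is only useful if something checks it. The sketch below shows one way to list documents that have passed their date; the `expiredDocs` helper and the `{ name, validUntil }` shape are assumptions for illustration, not Ask0 fields.

```javascript
// docs: array of { name, validUntil } with validUntil as "YYYY-MM-DD"
// (hypothetical shape). ISO dates in this format sort correctly as strings.
function expiredDocs(docs, today) {
  return docs.filter((doc) => doc.validUntil < today).map((doc) => doc.name);
}
```

A document is treated as valid through the end of its "Valid Until" day, so it is reported only once `today` is later than that date.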

File Processing

PDF Processing

Advanced PDF handling:

PDF Options:
  Text Extraction: Native
  OCR Engine: Tesseract
  OCR Languages: [en, es, fr, de]
  Extract Images: Yes
  Extract Tables: As Markdown
  Preserve Formatting: Semantic only
  Handle Passwords: Prompt for password

Handling Different PDF Types

Text-based PDFs:

Direct text extraction
Preserves structure and formatting
Fast processing

Scanned PDFs:

OCR processing required
Slower than direct extraction; accuracy depends on scan quality
Multiple language support

Hybrid PDFs:

Combines both methods
Optimizes for best quality
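
The three PDF types can be told apart by how much text native extraction returns per page. This is a simplified sketch of that idea under stated assumptions: the `classifyPdf` helper, the `{ textChars }` page shape, and the 20-character cut-off are all hypothetical, and Ask0's actual detection may work differently.

```javascript
// pages: array of { textChars } = number of characters returned by native
// text extraction for each page (hypothetical input shape).
// minChars is an assumed cut-off below which a page is treated as an image scan.
function classifyPdf(pages, minChars = 20) {
  if (pages.length === 0) return 'unknown';
  const textPages = pages.filter((p) => p.textChars >= minChars).length;
  if (textPages === pages.length) return 'text-based';
  if (textPages === 0) return 'scanned';
  return 'hybrid';
}
```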

Document Structure

Preserve document organization:

Structure Preservation:
  Headers: Extract as hierarchy
  Table of Contents: Generate if missing
  Page Numbers: Include in references
  Footnotes: Append to relevant sections
  Lists: Maintain formatting
  Tables: Convert to markdown
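
As an illustration of the "convert to markdown" step for tables, the sketch below turns rows of cells into a Markdown table. The `toMarkdownTable` helper is hypothetical and assumes the first row is the header; it is not the converter Ask0 uses.

```javascript
// rows: array of arrays of cell values; the first row is treated as the header.
function toMarkdownTable(rows) {
  if (rows.length === 0) return '';
  // Escape pipes so cell text cannot break the column layout.
  const cell = (value) => String(value).replace(/\|/g, '\\|');
  const line = (cells) => `| ${cells.map(cell).join(' | ')} |`;
  const [header, ...body] = rows;
  const divider = `| ${header.map(() => '---').join(' | ')} |`;
  return [line(header), divider, ...body.map(line)].join('\n');
}
```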

Bulk Operations

Batch Upload

Upload multiple files efficiently:

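// Via the API. Assumes `ask0` is an initialized client and that
// fileBuffer1 and fileBuffer2 hold the file contents.
const files = [
  { name: 'manual.pdf', content: fileBuffer1 },
  { name: 'guide.docx', content: fileBuffer2 }
];

// Upload both files in one request with a shared category and tags
const uploaded = await ask0.sources.uploadFiles({
  files,
  category: 'Documentation',
  tags: ['manual', 'guide']
});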

Folder Structure Import

Maintain folder organization:

uploads/
├── products/
│   ├── product-a-manual.pdf
│   └── product-b-manual.pdf
├── guides/
│   ├── quick-start.docx
│   └── advanced-guide.pdf
└── policies/
    ├── terms.pdf
    └── privacy.pdf

Ask0 preserves this structure for better organization.
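
If you want to reuse the folder layout as metadata, a path can be split into a category and a file name before upload. This is a sketch under an assumption that is not stated in the source: that the first folder under the `uploads` root is the category. The `describePath` helper is hypothetical.

```javascript
// Derives a category and file name from a relative upload path.
// Assumption: the first folder under the root ("uploads") is the category.
function describePath(path, root = 'uploads') {
  const parts = path.split('/').filter(Boolean);
  if (parts[0] === root) parts.shift();
  const fileName = parts.pop();
  return { category: parts[0] ?? null, folders: parts, fileName };
}

const info = describePath('uploads/products/product-a-manual.pdf');
// info.category is 'products', info.fileName is 'product-a-manual.pdf'
```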

File Management

Version Control

Manage document versions:

Document: Product Manual
Versions:
  - v2.0 (Current)
    Uploaded: 2024-01-15
    File: manual_v2.pdf
  - v1.9 (Previous)
    Uploaded: 2023-12-01
    File: manual_v1.9.pdf
    Status: Archived
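
When several versions of a document exist, picking the current one comes down to comparing version labels numerically rather than as text (so that v1.10 ranks above v1.9). The sketch below shows one way to do that; the helpers and the `{ version, file }` shape are hypothetical and assume labels like "v2.0".

```javascript
// Compares version labels such as "v2.0" and "v1.9" numerically, part by part.
function compareVersions(a, b) {
  const parse = (v) => v.replace(/^v/i, '').split('.').map(Number);
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// versions: array of { version, file } (hypothetical shape).
// Returns the entry with the highest version number.
function currentVersion(versions) {
  return [...versions].sort((x, y) => compareVersions(y.version, x.version))[0];
}
```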

Update Strategies

Options for updating files:

Replace: Overwrite existing file
Version: Keep both versions
Append: Add to existing content
Merge: Intelligent content merging

Metadata Management

Rich metadata for better retrieval:

File Metadata:
  title: "User Manual v2.0"
  author: "Technical Writing Team"
  created_date: "2024-01-01"
  modified_date: "2024-01-15"
  department: "Product"
  access_level: "Public"
  keywords: ["setup", "installation", "configuration"]
  related_files: ["quick-start.pdf", "api-guide.pdf"]

Organization Best Practices

Categorization

Organize files effectively:

Categories:
  Technical Documentation:
    - API References
    - Integration Guides
    - Architecture Docs
  User Documentation:
    - Getting Started
    - User Manuals
    - Tutorials
  Legal & Compliance:
    - Terms of Service
    - Privacy Policy
    - Compliance Docs

Naming Conventions

Use consistent naming:

Good Examples:
  ✅ product-manual-v2.0-2024.pdf
  ✅ api-reference-rest-v3.pdf
  ✅ user-guide-mobile-app.docx

Bad Examples:
  ❌ manual.pdf
  ❌ doc1.docx
  ❌ final-final-v2.pdf
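
A naming convention is easier to keep when it is checked automatically. The sketch below encodes rules inferred from the examples above; the rules themselves, the filler-word list, and the `isDescriptiveName` helper are assumptions, so adjust them to your own convention.

```javascript
// Assumed rules, inferred from the examples above:
//  - lowercase letters, digits, and dots, in hyphen-separated segments
//  - at least two segments before the extension
//  - no filler words such as "final" or "copy"
const FILLER_WORDS = new Set(['final', 'copy', 'new']);

function isDescriptiveName(fileName) {
  const match = /^([a-z0-9.]+(?:-[a-z0-9.]+)+)\.([a-z0-9]+)$/.exec(fileName);
  if (!match) return false;
  return !match[1].split('-').some((segment) => FILLER_WORDS.has(segment));
}
```

With these rules, `product-manual-v2.0-2024.pdf` passes, while `manual.pdf`, `doc1.docx`, and `final-final-v2.pdf` are rejected.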

Tagging Strategy

Effective tagging for retrieval:

Tag Categories:
  Type: [manual, guide, reference, policy]
  Product: [product-a, product-b, platform]
  Audience: [developer, user, admin]
  Version: [v1, v2, latest]
  Status: [current, deprecated, draft]
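
A fixed tag vocabulary only helps retrieval if uploads stick to it. As a small sketch, the check below reports tags that fall outside the categories listed above; the `unknownTags` helper is hypothetical and the vocabulary is copied from this page's example.

```javascript
// Allowed values per tag category, copied from the list above.
const TAG_CATEGORIES = {
  type: ['manual', 'guide', 'reference', 'policy'],
  product: ['product-a', 'product-b', 'platform'],
  audience: ['developer', 'user', 'admin'],
  version: ['v1', 'v2', 'latest'],
  status: ['current', 'deprecated', 'draft'],
};

// Returns the tags that do not belong to any category.
function unknownTags(tags, categories = TAG_CATEGORIES) {
  const allowed = new Set(Object.values(categories).flat());
  return tags.filter((tag) => !allowed.has(tag));
}
```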

Processing Options

OCR Configuration

For scanned documents:

OCR Settings:
  Enable: Auto-detect
  Languages: [English, Spanish, French]
  Quality: High
  Preprocessing:
    - Deskew
    - Denoise
    - Contrast enhancement
  Confidence Threshold: 80%
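
To illustrate what a confidence threshold does, the sketch below separates OCR output into accepted words and words that need review. The `{ text, confidence }` shape and the `splitByConfidence` helper are assumptions; OCR engines report confidence in different ways.

```javascript
// words: array of { text, confidence } with confidence in the range 0-100
// (hypothetical shape).
function splitByConfidence(words, threshold = 80) {
  const accepted = [];
  const needsReview = [];
  for (const word of words) {
    (word.confidence >= threshold ? accepted : needsReview).push(word.text);
  }
  return { accepted, needsReview };
}
```

A word exactly at the threshold is accepted; anything below it is set aside for manual review.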

Content Extraction

Control what's extracted:

Extraction Rules:
  Include:
    - Main content
    - Headers/footers
    - Captions
    - Alt text
  Exclude:
    - Page numbers
    - Watermarks
    - Advertisements
  Special Handling:
    - Code blocks: Preserve formatting
    - Tables: Convert to markdown
    - Images: Extract alt text

Quality Assurance

Validation

Automatic quality checks:

Quality Checks:
  Text Extraction: Verify completeness
  Language Detection: Confirm accuracy
  Encoding: UTF-8 validation
  Structure: Heading hierarchy
  Links: Check for broken references
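
Two of these checks are simple enough to sketch in a few lines: UTF-8 validation and a heading-hierarchy check that flags skipped levels. Both helpers are hypothetical illustrations, not Ask0's implementation; the first relies on the standard `TextDecoder` with `fatal: true`, which throws on malformed input.

```javascript
// Returns true when the bytes decode as valid UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// levels: heading levels in document order, e.g. [1, 2, 3, 2].
// Returns the indexes where a heading skips a level (for example 1 -> 3).
function skippedHeadingLevels(levels) {
  const skipped = [];
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] > levels[i - 1] + 1) skipped.push(i);
  }
  return skipped;
}
```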

Manual Review

Review uploaded content:

Preview extracted text
Verify structure preservation
Check metadata accuracy
Validate search results

Common Issues:

Poor OCR Quality

Ensure high-resolution scans (300 DPI minimum)
Use clean, well-lit scans
Avoid handwritten text

Missing Content

Check PDF security settings
Verify the text is selectable (not an image)
Review extraction logs

Large File Processing

Split very large documents
Compress images before upload
Use batch processing for multiple files

Security & Privacy

Data Handling

How files are processed:

Security Measures:
  Encryption: At rest and in transit
  Storage: Secure cloud storage
  Access Control: Role-based permissions
  Retention: Configurable retention policies
  Deletion: Permanent removal option

Sensitive Content

Handle confidential documents:

Sensitive Content:
  Redaction: Auto-detect and redact PII
  Access Restrictions: Limit to specific users
  Audit Logging: Track all access
  Compliance: GDPR, HIPAA compliant
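
As a very rough illustration of redaction, the sketch below masks email addresses before text is stored. It is deliberately narrow: real PII detection covers many more patterns (names, phone numbers, IDs), and the `redactPii` helper and `[EMAIL]` placeholder are assumptions, not Ask0 behavior.

```javascript
// Replaces email addresses with a placeholder. Other kinds of PII are
// not handled by this sketch.
function redactPii(text) {
  return text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]');
}
```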

API Integration

Manage files programmatically:

// Upload file
const file = await ask0.sources.uploadFile({
  name: 'manual.pdf',
  content: fileBuffer,
  contentType: 'application/pdf',
  metadata: {
    category: 'Product Manual',
    tags: ['v2.0', 'user-guide'],
    language: 'en'
  }
});

// Update file metadata (fileId is the ID of a previously uploaded file)
await ask0.sources.updateFile(fileId, {
  metadata: {
    version: '2.1',
    validUntil: '2025-01-01'
  }
});

// List files
const files = await ask0.sources.listFiles({
  category: 'Product Manual',
  tags: ['current']
});

// Delete file
await ask0.sources.deleteFile(fileId);

Performance Tips

Optimization

Improve processing speed:

Pre-process files: Optimize PDFs before upload
Batch uploads: Upload multiple files together
Compress images: Reduce file size
Use appropriate formats: Prefer text-based PDFs

Monitoring

Track file source performance:

Metrics:
  Processing Time: Average per file type
  Extraction Quality: Success rate
  Storage Usage: By category
  Query Performance: Retrieval speed
  User Feedback: Per document ratings
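
If you export processing records, metrics such as average processing time and success rate per file type are straightforward to compute. The sketch below assumes a hypothetical record shape of `{ type, ms, ok }`; Ask0's own reporting format may differ.

```javascript
// records: array of { type, ms, ok } (hypothetical shape), one per processed file.
// Returns { [type]: { averageMs, successRate } }.
function summarize(records) {
  const byType = {};
  for (const { type, ms, ok } of records) {
    if (!byType[type]) byType[type] = { count: 0, totalMs: 0, succeeded: 0 };
    const entry = byType[type];
    entry.count += 1;
    entry.totalMs += ms;
    if (ok) entry.succeeded += 1;
  }
  return Object.fromEntries(
    Object.entries(byType).map(([type, e]) => [
      type,
      { averageMs: e.totalMs / e.count, successRate: e.succeeded / e.count },
    ])
  );
}
```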

Next Steps

Configure Web Sources

Add website content

Add Custom Knowledge

Create Q&As and FAQs

Manage Sources

Organize all your sources
