Files & PDFs

Upload documents, PDFs, and other files to your knowledge base. Supported formats include Word, Excel, PowerPoint, plain text, and Markdown.

Upload documents directly to Ask0 to include manuals, guides, PDFs, presentations, and other file-based content in your knowledge base. This source type is perfect for internal documentation, product manuals, and any content not available on the web.

Supported File Types

Documents

  • PDF (.pdf) - Full text extraction with layout preservation
  • Word (.docx, .doc) - Microsoft Word documents
  • Text (.txt, .md, .markdown) - Plain text and Markdown
  • RTF (.rtf) - Rich Text Format

Spreadsheets

  • Excel (.xlsx, .xls) - Tables and data
  • CSV (.csv) - Comma-separated values
  • Google Sheets (via export)

Presentations

  • PowerPoint (.pptx, .ppt) - Slide content
  • Google Slides (via export)

Code & Config

  • JSON (.json) - Configuration files
  • YAML (.yaml, .yml) - Config and documentation
  • XML (.xml) - Structured data

Uploading Files

In your Ask0 dashboard, go to Sources > Add Source > Files & PDFs.

Upload Your Files

Choose an upload method:

  • Drag & Drop: Drag files directly into the upload area
  • Browse: Click to select files from your computer
  • Bulk Upload: Upload ZIP archives (automatically extracted)
  • URL Import: Provide direct download links

Upload Limits:
  Max File Size: 50 MB per file
  Max Total Size: 500 MB per upload
  Max Files: 100 files per batch
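
The limits above can be checked on the client before an upload starts. The following is a minimal sketch, not part of the Ask0 SDK: `checkUploadBatch` and its return shape are hypothetical, and it assumes 1 MB means 1024 × 1024 bytes.

```javascript
// Limits taken from the list above. Assumption: 1 MB = 1024 * 1024 bytes.
const MB = 1024 * 1024;
const LIMITS = { maxFileBytes: 50 * MB, maxTotalBytes: 500 * MB, maxFiles: 100 };

// Hypothetical helper: returns a list of problems (empty if the batch is within limits).
// `files` is an array of { name, size } where size is in bytes.
function checkUploadBatch(files) {
  const problems = [];
  if (files.length > LIMITS.maxFiles) {
    problems.push(`Too many files: ${files.length} (max ${LIMITS.maxFiles})`);
  }
  let totalBytes = 0;
  for (const file of files) {
    totalBytes += file.size;
    if (file.size > LIMITS.maxFileBytes) {
      problems.push(`${file.name} exceeds 50 MB`);
    }
  }
  if (totalBytes > LIMITS.maxTotalBytes) {
    problems.push('Batch exceeds 500 MB in total');
  }
  return problems;
}
```

Running a check like this first avoids starting a large transfer that would be rejected anyway.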

Configure Processing

Set processing options:

Processing Options:
  Extract Text: Yes
  OCR for Images: Yes (for scanned PDFs)
  Extract Metadata: Yes
  Language Detection: Automatic
  Split Large Documents: Yes
  Chunk Size: 1000 tokens
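
Splitting with a chunk size of 1000 tokens can be pictured with the sketch below. It approximates a token as a whitespace-separated word; the tokenizer and splitting rules Ask0 actually uses are not described here, and `splitIntoChunks` is a hypothetical helper.

```javascript
// Hypothetical helper: split text into chunks of at most `chunkSize` "tokens".
// Assumption: a token is approximated by a whitespace-separated word.
function splitIntoChunks(text, chunkSize = 1000) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += chunkSize) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
  }
  return chunks;
}
```

A fixed-size split like this ignores headings and paragraph boundaries, so treat it only as a way to estimate how many chunks a document will produce.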

Organize Files

Add metadata and organization:

File Organization:
  Category: Product Manuals
  Tags: [user-guide, v2.0, technical]
  Language: English
  Version: 2.0
  Valid Until: 2024-12-31

File Processing

PDF Processing

Advanced PDF handling:

PDF Options:
  Text Extraction: Native
  OCR Engine: Tesseract
  OCR Languages: [en, es, fr, de]
  Extract Images: Yes
  Extract Tables: As Markdown
  Preserve Formatting: Semantic only
  Handle Passwords: Prompt for password

Handling Different PDF Types

Text-based PDFs:

  • Direct text extraction
  • Preserves structure and formatting
  • Fast processing

Scanned PDFs:

  • OCR processing required
  • Slower than direct extraction; accuracy depends on scan quality
  • Multiple language support

Hybrid PDFs:

  • Combines both methods
  • Optimizes for best quality

Document Structure

Preserve document organization:

Structure Preservation:
  Headers: Extract as hierarchy
  Table of Contents: Generate if missing
  Page Numbers: Include in references
  Footnotes: Append to relevant sections
  Lists: Maintain formatting
  Tables: Convert to markdown
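
Converting a table to Markdown amounts to writing a header row, a divider row, and one line per data row. The helper below is a hypothetical illustration of that output format, not the converter Ask0 uses.

```javascript
// Hypothetical helper: render rows (first row = header) as a Markdown table.
function toMarkdownTable(rows) {
  // Escape pipes so cell content cannot break the column layout.
  const escapeCell = (cell) => String(cell).replace(/\|/g, '\\|');
  const line = (cells) => `| ${cells.map(escapeCell).join(' | ')} |`;
  const [header, ...body] = rows;
  const divider = `| ${header.map(() => '---').join(' | ')} |`;
  return [line(header), divider, ...body.map(line)].join('\n');
}
```

For example, `toMarkdownTable([['Part', 'Qty'], ['Bolt', 4]])` produces a three-line table with `Part` and `Qty` as column headings.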

Bulk Operations

Batch Upload

Upload multiple files efficiently:

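// Via API (assumes `ask0` is an already-initialized Ask0 client)
import { readFile } from 'node:fs/promises';

// Read each file into a Buffer before uploading
const files = [
  { name: 'manual.pdf', content: await readFile('manual.pdf') },
  { name: 'guide.docx', content: await readFile('guide.docx') }
];

// Upload the batch with a shared category and tags
const uploaded = await ask0.sources.uploadFiles({
  files,
  category: 'Documentation',
  tags: ['manual', 'guide']
});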

Folder Structure Import

Maintain folder organization:

uploads/
├── products/
│   ├── product-a-manual.pdf
│   └── product-b-manual.pdf
├── guides/
│   ├── quick-start.docx
│   └── advanced-guide.pdf
└── policies/
    ├── terms.pdf
    └── privacy.pdf

Ask0 preserves this structure for better organization.
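
If you want the folder layout to drive metadata as well, a path can be split into a file name and its parent folders. The sketch below is hypothetical: `metadataFromPath` is not an Ask0 function, and using the first folder below the upload root as the category is an assumption.

```javascript
// Hypothetical helper: derive metadata from a relative path such as
// 'uploads/products/product-a-manual.pdf'.
// Assumption: the first folder below the upload root is used as the category.
function metadataFromPath(path, root = 'uploads') {
  const parts = path.split('/').filter(Boolean);
  if (parts[0] === root) parts.shift(); // drop the upload root itself
  const name = parts.pop();             // the last segment is the file name
  return { name, category: parts[0] ?? null, folders: parts };
}
```

With the tree above, `uploads/products/product-a-manual.pdf` maps to the category `products`.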

File Management

Version Control

Manage document versions:

Document: Product Manual
Versions:
  - v2.0 (Current)
    Uploaded: 2024-01-15
    File: manual_v2.pdf
  - v1.9 (Previous)
    Uploaded: 2023-12-01
    File: manual_v1.9.pdf
    Status: Archived

Update Strategies

Options for updating files:

  1. Replace: Overwrite existing file
  2. Version: Keep both versions
  3. Append: Add to existing content
  4. Merge: Intelligent content merging
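
The difference between the first three strategies is easiest to see on a small in-memory model. This is an illustrative sketch only: `applyUpdate` and the `{ versions: [...] }` shape are hypothetical, and Merge is left out because its rules are not described here.

```javascript
// Hypothetical in-memory model: `doc` is { versions: [{ label, content }] }, newest last.
// Returns a new object and leaves `doc` unchanged.
function applyUpdate(doc, strategy, incoming) {
  const current = doc.versions[doc.versions.length - 1];
  switch (strategy) {
    case 'replace': // overwrite the current version
      return { versions: [...doc.versions.slice(0, -1), incoming] };
    case 'version': // keep the old version and add the new one
      return { versions: [...doc.versions, incoming] };
    case 'append': // add the new content to the end of the current version
      return {
        versions: [
          ...doc.versions.slice(0, -1),
          { label: current.label, content: current.content + '\n' + incoming.content }
        ]
      };
    default:
      // 'merge' is not modeled because its behavior is not specified above
      throw new Error(`Unsupported strategy: ${strategy}`);
  }
}
```

Replace and Append both leave a single current version, so choose Version when readers may still need the earlier text.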

Metadata Management

Rich metadata for better retrieval:

File Metadata:
  title: "User Manual v2.0"
  author: "Technical Writing Team"
  created_date: "2024-01-01"
  modified_date: "2024-01-15"
  department: "Product"
  access_level: "Public"
  keywords: ["setup", "installation", "configuration"]
  related_files: ["quick-start.pdf", "api-guide.pdf"]

Organization Best Practices

Categorization

Organize files effectively:

Categories:
  Technical Documentation:
    - API References
    - Integration Guides
    - Architecture Docs
  User Documentation:
    - Getting Started
    - User Manuals
    - Tutorials
  Legal & Compliance:
    - Terms of Service
    - Privacy Policy
    - Compliance Docs

Naming Conventions

Use consistent naming:

Good Examples:
  ✅ product-manual-v2.0-2024.pdf
  ✅ api-reference-rest-v3.pdf
  ✅ user-guide-mobile-app.docx

Bad Examples:
  ❌ manual.pdf
  ❌ doc1.docx
  ❌ final-final-v2.pdf
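
A naming convention is easier to keep when it is checked automatically. The rules below are inferred from the examples above and are an assumption, not an Ask0 requirement: lowercase kebab-case, at least three segments, and no filler words. `isDescriptiveName` and `FILLER_WORDS` are hypothetical names.

```javascript
// Hypothetical rules inferred from the examples above:
// lowercase kebab-case, at least three segments, no filler words.
const FILLER_WORDS = new Set(['final', 'copy', 'new']);

function isDescriptiveName(fileName) {
  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return false; // no extension, or nothing before it
  const segments = fileName.slice(0, dot).split('-');
  return (
    segments.length >= 3 &&
    segments.every((s) => /^[a-z0-9.]+$/.test(s) && !FILLER_WORDS.has(s))
  );
}
```

Under these rules all three good examples pass and all three bad examples fail; adjust the segment count and word list to match your own convention.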

Tagging Strategy

Effective tagging for retrieval:

Tag Categories:
  Type: [manual, guide, reference, policy]
  Product: [product-a, product-b, platform]
  Audience: [developer, user, admin]
  Version: [v1, v2, latest]
  Status: [current, deprecated, draft]
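
Tags are only useful for retrieval if they stay within a known vocabulary. The sketch below checks a file's tags against the categories listed above; `TAG_VOCABULARY` and `unknownTags` are hypothetical names, not part of the Ask0 API.

```javascript
// Vocabulary copied from the tag categories above.
const TAG_VOCABULARY = {
  type: ['manual', 'guide', 'reference', 'policy'],
  product: ['product-a', 'product-b', 'platform'],
  audience: ['developer', 'user', 'admin'],
  version: ['v1', 'v2', 'latest'],
  status: ['current', 'deprecated', 'draft']
};

// Hypothetical helper: returns the tags that are not in any category.
function unknownTags(tags) {
  const known = new Set(Object.values(TAG_VOCABULARY).flat());
  return tags.filter((tag) => !known.has(tag));
}
```

An empty result means every tag belongs to the vocabulary; anything returned is a typo or a tag that should be added to a category first.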

Processing Options

OCR Configuration

For scanned documents:

OCR Settings:
  Enable: Auto-detect
  Languages: [English, Spanish, French]
  Quality: High
  Preprocessing:
    - Deskew
    - Denoise
    - Contrast enhancement
  Confidence Threshold: 80%
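
One plausible reading of the confidence threshold is that recognized words scoring below it are discarded. The sketch below illustrates that reading only; how Ask0 applies the threshold is not documented here, and the `{ text, confidence }` word shape and `filterByConfidence` are assumptions.

```javascript
// Hypothetical shape: each OCR word is { text, confidence } with confidence in 0..100.
// Keeps words at or above the threshold and reports how many were dropped.
function filterByConfidence(words, threshold = 80) {
  const kept = words.filter((w) => w.confidence >= threshold);
  return {
    text: kept.map((w) => w.text).join(' '),
    dropped: words.length - kept.length
  };
}
```

A high dropped count on a document is a sign that the scan should be redone at a higher resolution rather than that the threshold should be lowered.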

Content Extraction

Control what's extracted:

Extraction Rules:
  Include:
    - Main content
    - Headers/footers
    - Captions
    - Alt text
  Exclude:
    - Page numbers
    - Watermarks
    - Advertisements
  Special Handling:
    - Code blocks: Preserve formatting
    - Tables: Convert to markdown
    - Images: Extract alt text

Quality Assurance

Validation

Automatic quality checks:

Quality Checks:
  Text Extraction: Verify completeness
  Language Detection: Confirm accuracy
  Encoding: UTF-8 validation
  Structure: Heading hierarchy
  Links: Check for broken references

Manual Review

Review uploaded content:

  1. Preview extracted text
  2. Verify structure preservation
  3. Check metadata accuracy
  4. Validate search results

Common Issues:

Poor OCR Quality

  • Ensure high-resolution scans (300 DPI minimum)
  • Use clean, well-lit scans
  • Avoid handwritten text

Missing Content

  • Check PDF security settings
  • Verify that the text is selectable (not an image)
  • Review extraction logs

Large File Processing

  • Split very large documents
  • Compress images before upload
  • Use batch processing for multiple files

Security & Privacy

Data Handling

How files are processed:

Security Measures:
  Encryption: At rest and in transit
  Storage: Secure cloud storage
  Access Control: Role-based permissions
  Retention: Configurable retention policies
  Deletion: Permanent removal option

Sensitive Content

Handle confidential documents:

Sensitive Content:
  Redaction: Auto-detect and redact PII
  Access Restrictions: Limit to specific users
  Audit Logging: Track all access
  Compliance: GDPR, HIPAA compliant
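
To give a sense of what redaction does to extracted text, the sketch below masks email addresses with a regular expression. It is illustrative only: real PII detection covers far more than emails, the pattern is deliberately simple, and `redactEmails` is a hypothetical helper rather than Ask0's redaction feature.

```javascript
// Deliberately simple pattern; it will miss unusual but valid addresses.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: replace every email address with a placeholder.
function redactEmails(text) {
  return text.replace(EMAIL_PATTERN, '[REDACTED EMAIL]');
}
```

Reviewing a sample of redacted output by hand is still worthwhile, since pattern-based detection produces both misses and false positives.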

API Integration

Manage files programmatically:

// Assumes `ask0` is an already-initialized Ask0 client, `fileBuffer` holds the
// file contents, and `fileId` is the ID of a previously uploaded file.

// Upload a single file with its MIME type and metadata
const file = await ask0.sources.uploadFile({
  name: 'manual.pdf',
  content: fileBuffer,
  contentType: 'application/pdf',
  metadata: {
    category: 'Product Manual',
    tags: ['v2.0', 'user-guide'],
    language: 'en'
  }
});

// Update file metadata
await ask0.sources.updateFile(fileId, {
  metadata: {
    version: '2.1',
    validUntil: '2025-01-01'
  }
});

// List files, filtered by category and tags
const files = await ask0.sources.listFiles({
  category: 'Product Manual',
  tags: ['current']
});

// Delete file
await ask0.sources.deleteFile(fileId);

Performance Tips

Optimization

Improve processing speed:

  1. Pre-process files: Optimize PDFs before upload
  2. Batch uploads: Upload multiple files together
  3. Compress images: Reduce file size
  4. Use appropriate formats: Prefer text-based PDFs

Monitoring

Track file source performance:

Metrics:
  Processing Time: Average per file type
  Extraction Quality: Success rate
  Storage Usage: By category
  Query Performance: Retrieval speed
  User Feedback: Per document ratings
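
If you keep your own processing log, the first two metrics can be computed with a small aggregation. This is a sketch under assumptions: the `{ type, ms, ok }` record shape and `summarizeProcessing` are hypothetical and do not describe how Ask0 reports metrics.

```javascript
// Hypothetical record shape: { type: 'pdf', ms: 1200, ok: true }
// Returns average processing time and success rate per file type.
function summarizeProcessing(records) {
  const totals = new Map();
  for (const { type, ms, ok } of records) {
    const entry = totals.get(type) || { count: 0, totalMs: 0, succeeded: 0 };
    entry.count += 1;
    entry.totalMs += ms;
    if (ok) entry.succeeded += 1;
    totals.set(type, entry);
  }
  const summary = {};
  for (const [type, e] of totals) {
    summary[type] = {
      averageMs: e.totalMs / e.count,
      successRate: e.succeeded / e.count
    };
  }
  return summary;
}
```

Comparing averages across file types shows where pre-processing, such as preferring text-based PDFs over scans, would save the most time.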
