Files & PDFs
Upload documents, PDFs, and other files to your knowledge base. Supports Word, Excel, PowerPoint, text files, and Markdown documents.
Upload documents directly to Ask0 to include manuals, guides, PDFs, presentations, and other file-based content in your knowledge base. This source type is perfect for internal documentation, product manuals, and any content not available on the web.
Supported File Types
Documents
- PDF (.pdf) - Full text extraction with layout preservation
- Word (.docx, .doc) - Microsoft Word documents
- Text (.txt, .md, .markdown) - Plain text and Markdown
- RTF (.rtf) - Rich Text Format
Spreadsheets
- Excel (.xlsx, .xls) - Tables and data
- CSV (.csv) - Comma-separated values
- Google Sheets (via export)
Presentations
- PowerPoint (.pptx, .ppt) - Slide content
- Google Slides (via export)
Code & Config
- JSON (.json) - Configuration files
- YAML (.yaml, .yml) - Config and documentation
- XML (.xml) - Structured data
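Before uploading, it can help to screen files against the extensions listed above. The following is a minimal sketch of such a check; `SUPPORTED_EXTENSIONS` and `isSupportedFile` are hypothetical names used for illustration and are not part of the Ask0 API. Google Sheets and Google Slides are omitted because they need to be exported to one of the listed formats first.

```javascript
// Extensions copied from the "Supported File Types" list above.
const SUPPORTED_EXTENSIONS = new Set([
  // Documents
  'pdf', 'docx', 'doc', 'txt', 'md', 'markdown', 'rtf',
  // Spreadsheets
  'xlsx', 'xls', 'csv',
  // Presentations
  'pptx', 'ppt',
  // Code & config
  'json', 'yaml', 'yml', 'xml',
]);

// Hypothetical helper: true when the file name ends in a listed extension.
function isSupportedFile(fileName) {
  const dot = fileName.lastIndexOf('.');
  // Reject names with no extension, dotfiles such as ".yaml", and trailing dots.
  if (dot <= 0 || dot === fileName.length - 1) return false;
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase());
}

// Example: keep only the files worth sending.
const candidates = ['manual.pdf', 'photo.png', 'guide.DOCX'];
const uploadable = candidates.filter(isSupportedFile);
```

The check looks only at the file name, so it is a convenience filter rather than a guarantee that a file's contents will process correctly.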
Uploading Files
Navigate to Files Source
In your Ask0 dashboard, go to Sources → Add Source → Files & PDFs
Upload Your Files
Choose upload method:
- Drag & Drop: Drag files directly into the upload area
- Browse: Click to select files from your computer
- Bulk Upload: Upload ZIP archives (automatically extracted)
- URL Import: Provide direct download links
Upload Limits:
  Max File Size: 50 MB per file
  Max Total Size: 500 MB per upload
  Max Files: 100 files per batch
Configure Processing
Set processing options:
Processing Options:
  Extract Text: Yes
  OCR for Images: Yes (for scanned PDFs)
  Extract Metadata: Yes
  Language Detection: Automatic
  Split Large Documents: Yes
  Chunk Size: 1000 tokens
Organize Files
Add metadata and organization:
File Organization:
  Category: Product Manuals
  Tags: [user-guide, v2.0, technical]
  Language: English
  Version: 2.0
  Valid Until: 2024-12-31
File Processing
PDF Processing
Advanced PDF handling:
PDF Options:
  Text Extraction: Native
  OCR Engine: Tesseract
  OCR Languages: [en, es, fr, de]
  Extract Images: Yes
  Extract Tables: As Markdown
  Preserve Formatting: Semantic only
  Handle Passwords: Prompt for password
Handling Different PDF Types
Text-based PDFs:
- Direct text extraction
- Preserves structure and formatting
- Fast processing
Scanned PDFs:
- OCR processing required
- Slower than direct extraction; accuracy depends on scan quality
- Multiple language support
Hybrid PDFs:
- Combines both methods
- Optimizes for best quality
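The three PDF types above can be told apart by whether each page carries a selectable text layer. The sketch below illustrates that decision; the page shape `{ hasTextLayer }` and the function names are assumptions made for this example, and Ask0's internal detection logic is not documented here.

```javascript
// Hypothetical input: one entry per page, e.g. { hasTextLayer: true }.
function classifyPdf(pages) {
  if (pages.length === 0) throw new Error('PDF has no pages');
  const pagesWithText = pages.filter((page) => page.hasTextLayer).length;
  if (pagesWithText === pages.length) return 'text-based'; // direct extraction only
  if (pagesWithText === 0) return 'scanned';               // OCR for every page
  return 'hybrid';                                         // mix of both methods
}

// Choose an extraction method per page: direct where text exists, OCR otherwise.
function planExtraction(pages) {
  return pages.map((page, index) => ({
    page: index + 1,
    method: page.hasTextLayer ? 'direct' : 'ocr',
  }));
}

const samplePages = [{ hasTextLayer: true }, { hasTextLayer: false }];
const pdfType = classifyPdf(samplePages);  // 'hybrid'
const plan = planExtraction(samplePages);  // page 1 direct, page 2 OCR
```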
Document Structure
Preserve document organization:
Structure Preservation:
  Headers: Extract as hierarchy
  Table of Contents: Generate if missing
  Page Numbers: Include in references
  Footnotes: Append to relevant sections
  Lists: Maintain formatting
  Tables: Convert to Markdown
Bulk Operations
Batch Upload
Upload multiple files efficiently:
// Via API
// fileBuffer1 and fileBuffer2 hold the raw contents of each file.
const files = [
  { name: 'manual.pdf', content: fileBuffer1 },
  { name: 'guide.docx', content: fileBuffer2 }
];

// The category and tags apply to every file in the batch.
const uploaded = await ask0.sources.uploadFiles({
  files: files,
  category: 'Documentation',
  tags: ['manual', 'guide']
});
Folder Structure Import
Maintain folder organization:
uploads/
├── products/
│   ├── product-a-manual.pdf
│   └── product-b-manual.pdf
├── guides/
│   ├── quick-start.docx
│   └── advanced-guide.pdf
└── policies/
    ├── terms.pdf
    └── privacy.pdf
Ask0 preserves this structure for better organization.
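If you upload through the API rather than the dashboard, one way to carry a folder layout along is to derive metadata from each file's path. The following is a rough sketch under that assumption; `describeUpload` is a hypothetical helper, and how Ask0 stores folder structure internally is not specified here.

```javascript
// Hypothetical helper: split a path relative to the upload root into
// a file name, its folder path, and a category taken from the top-level folder.
function describeUpload(relativePath) {
  const parts = relativePath.split('/').filter((part) => part.length > 0);
  if (parts.length === 0) throw new Error('Empty path');
  const name = parts[parts.length - 1];
  const folders = parts.slice(0, -1);
  return {
    name,
    folder: folders.join('/'),                        // '' for files at the root
    category: folders.length > 0 ? folders[0] : null, // top-level folder, if any
  };
}

const described = [
  'products/product-a-manual.pdf',
  'guides/quick-start.docx',
  'policies/terms.pdf',
].map(describeUpload);
```

The derived `category` value could then be passed in the `category` field used by the batch upload example earlier in this section.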
File Management
Version Control
Manage document versions:
Document: Product Manual
Versions:
  - v2.0 (Current)
    Uploaded: 2024-01-15
    File: manual_v2.pdf
  - v1.9 (Previous)
    Uploaded: 2023-12-01
    File: manual_v1.9.pdf
    Status: Archived
Update Strategies
Options for updating files:
- Replace: Overwrite existing file
- Version: Keep both versions
- Append: Add to existing content
- Merge: Intelligent content merging
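To make the difference between these strategies concrete, here is a small in-memory model of how Replace, Version, and Append change a stored document. It is an illustration only: the `{ content, history }` shape and the `applyUpdate` function are assumptions for this sketch, not Ask0's storage model, and Merge is left out because its behavior is not specified above.

```javascript
// Hypothetical document shape: { content: string, history: string[] }.
function applyUpdate(doc, incoming, strategy) {
  switch (strategy) {
    case 'replace':
      // Overwrite: the old content is discarded.
      return { content: incoming, history: doc.history };
    case 'version':
      // Keep both: the old content moves into the history.
      return { content: incoming, history: [...doc.history, doc.content] };
    case 'append':
      // Add the new content after the existing content.
      return { content: doc.content + '\n' + incoming, history: doc.history };
    default:
      throw new Error('Strategy not covered by this sketch: ' + strategy);
  }
}

const original = { content: 'Manual v1.9', history: [] };
const replaced = applyUpdate(original, 'Manual v2.0', 'replace');
const versioned = applyUpdate(original, 'Manual v2.0', 'version');
const appended = applyUpdate(original, 'Appendix A', 'append');
```

Each call returns a new object, so `original` stays unchanged and the three results can be compared side by side.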
Metadata Management
Rich metadata for better retrieval:
File Metadata:
  title: "User Manual v2.0"
  author: "Technical Writing Team"
  created_date: "2024-01-01"
  modified_date: "2024-01-15"
  department: "Product"
  access_level: "Public"
  keywords: ["setup", "installation", "configuration"]
  related_files: ["quick-start.pdf", "api-guide.pdf"]
Organization Best Practices
Categorization
Organize files effectively:
Categories:
  Technical Documentation:
    - API References
    - Integration Guides
    - Architecture Docs
  User Documentation:
    - Getting Started
    - User Manuals
    - Tutorials
  Legal & Compliance:
    - Terms of Service
    - Privacy Policy
    - Compliance Docs
Naming Conventions
Use consistent naming:
Good Examples:
  ✅ product-manual-v2.0-2024.pdf
  ✅ api-reference-rest-v3.pdf
  ✅ user-guide-mobile-app.docx
Bad Examples:
  ❌ manual.pdf
  ❌ doc1.docx
  ❌ final-final-v2.pdf
Tagging Strategy
Effective tagging for retrieval:
Tag Categories:
  Type: [manual, guide, reference, policy]
  Product: [product-a, product-b, platform]
  Audience: [developer, user, admin]
  Version: [v1, v2, latest]
  Status: [current, deprecated, draft]
Processing Options
OCR Configuration
For scanned documents:
OCR Settings:
  Enable: Auto-detect
  Languages: [English, Spanish, French]
  Quality: High
  Preprocessing:
    - Deskew
    - Denoise
    - Contrast enhancement
  Confidence Threshold: 80%
Content Extraction
Control what's extracted:
Extraction Rules:
  Include:
    - Main content
    - Headers/footers
    - Captions
    - Alt text
  Exclude:
    - Page numbers
    - Watermarks
    - Advertisements
  Special Handling:
    - Code blocks: Preserve formatting
    - Tables: Convert to Markdown
    - Images: Extract alt text
Quality Assurance
Validation
Automatic quality checks:
Quality Checks:
  Text Extraction: Verify completeness
  Language Detection: Confirm accuracy
  Encoding: UTF-8 validation
  Structure: Heading hierarchy
  Links: Check for broken references
Manual Review
Review uploaded content:
- Preview extracted text
- Verify structure preservation
- Check metadata accuracy
- Validate search results
Common Issues:
Poor OCR Quality
- Ensure high-resolution scans (300 DPI minimum)
- Use clean, well-lit scans
- Avoid handwritten text
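The 300 DPI guideline can be checked from a scan's pixel dimensions and the physical page size, since DPI is pixels divided by inches. A minimal sketch, assuming you already know both values; the function names are hypothetical:

```javascript
const MIN_OCR_DPI = 300; // minimum recommended above

// Effective resolution of a scanned page along its width.
function effectiveDpi(pixelWidth, pageWidthInches) {
  if (pageWidthInches <= 0) throw new Error('Page width must be positive');
  return pixelWidth / pageWidthInches;
}

function meetsOcrResolution(pixelWidth, pageWidthInches) {
  return effectiveDpi(pixelWidth, pageWidthInches) >= MIN_OCR_DPI;
}

// A US Letter page (8.5 in wide) scanned at 2550 px wide is exactly 300 DPI.
const letterOk = meetsOcrResolution(2550, 8.5);  // true
const letterLow = meetsOcrResolution(1275, 8.5); // false (150 DPI)
```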
Missing Content
- Check PDF security settings
- Verify text is selectable (not image)
- Review extraction logs
Large File Processing
- Split very large documents
- Compress images before upload
- Use batch processing for multiple files
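When preparing many files, it can be useful to group them into batches that stay within the upload limits stated earlier (50 MB per file, 500 MB per upload, 100 files per batch). The sketch below shows one greedy way to do that. `planBatches` is a hypothetical helper, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the limits above do not say which definition applies.

```javascript
const MB = 1024 * 1024; // assumption: binary megabytes

// Limits taken from the "Upload Limits" values in this document.
const LIMITS = {
  maxFileBytes: 50 * MB,
  maxBatchBytes: 500 * MB,
  maxFilesPerBatch: 100,
};

// Greedily group files ({ name, size } in bytes) into batches within the limits.
// Files larger than the per-file limit are returned separately as rejected.
function planBatches(files, limits = LIMITS) {
  const batches = [];
  const rejected = [];
  let current = [];
  let currentBytes = 0;

  for (const file of files) {
    if (file.size > limits.maxFileBytes) {
      rejected.push(file);
      continue;
    }
    const batchFull =
      current.length >= limits.maxFilesPerBatch ||
      currentBytes + file.size > limits.maxBatchBytes;
    if (batchFull && current.length > 0) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(file);
    currentBytes += file.size;
  }
  if (current.length > 0) batches.push(current);
  return { batches, rejected };
}
```

Rejected files are candidates for the other two tips: splitting the document or compressing its images before trying again.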
Security & Privacy
Data Handling
How files are processed:
Security Measures:
  Encryption: At rest and in transit
  Storage: Secure cloud storage
  Access Control: Role-based permissions
  Retention: Configurable retention policies
  Deletion: Permanent removal option
Sensitive Content
Handle confidential documents:
Sensitive Content:
  Redaction: Auto-detect and redact PII
  Access Restrictions: Limit to specific users
  Audit Logging: Track all access
  Compliance: GDPR, HIPAA compliant
API Integration
Manage files programmatically:
// Upload file (fileBuffer holds the raw file contents)
const file = await ask0.sources.uploadFile({
  name: 'manual.pdf',
  content: fileBuffer,
  contentType: 'application/pdf',
  metadata: {
    category: 'Product Manual',
    tags: ['v2.0', 'user-guide'],
    language: 'en'
  }
});

// Update file metadata (fileId identifies a previously uploaded file)
await ask0.sources.updateFile(fileId, {
  metadata: {
    version: '2.1',
    validUntil: '2025-01-01'
  }
});

// List files
const files = await ask0.sources.listFiles({
  category: 'Product Manual',
  tags: ['current']
});

// Delete file
await ask0.sources.deleteFile(fileId);
Performance Tips
Optimization
Improve processing speed:
- Pre-process files: Optimize PDFs before upload
- Batch uploads: Upload multiple files together
- Compress images: Reduce file size
- Use appropriate formats: Prefer text-based PDFs
Monitoring
Track file source performance:
Metrics:
  Processing Time: Average per file type
  Extraction Quality: Success rate
  Storage Usage: By category
  Query Performance: Retrieval speed
  User Feedback: Per-document ratings
Next Steps
Notion Integration
Connect your Notion workspace as a knowledge source. Sync pages, databases, and documentation from Notion to your AI assistant knowledge base.
Knowledge API
Programmatically ingest and manage knowledge through our API (Coming Soon). REST endpoints for content ingestion and knowledge base management.