Sources
Configure knowledge sources to train your Ask0 AI assistant. Connect docs, GitHub, Notion, Discord, files, and custom content sources.
Sources are the foundation of your AI assistant's knowledge. They define what content your assistant can access to answer user questions. Ask0 supports multiple source types and provides powerful tools to manage and optimize your knowledge base.
What are Sources?
Sources are collections of content that Ask0 indexes so your AI assistant can draw on them. When users ask questions, the assistant searches through these sources to find relevant information and generate accurate responses.
Key Capabilities
- Automatic Crawling: Web sources are crawled automatically on your schedule
- Real-time Updates: Keep knowledge current with incremental updates
- Multiple Formats: Support for websites, documents, APIs, and more
- Smart Processing: Intelligent chunking and indexing for optimal retrieval
- Source Attribution: Responses show which sources were used
Available Source Types
Web Sources
Crawl and index websites, documentation sites, and blogs
Custom Knowledge
Add specific Q&As, FAQs, and custom content directly
GitHub
Use GitHub issues, discussions, and wikis as knowledge
Discord
Index Discord channels and threads for community knowledge
Notion
Connect your Notion workspace as a knowledge source
Files & PDFs
Upload documents, PDFs, and other files
How Sources Work
1. Content Discovery
When you add a source, Ask0 discovers available content:
- Web crawler follows links and sitemaps
- API connectors fetch available resources
- File uploads are processed immediately
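As a rough illustration of the link-following step, the sketch below resolves the links found on a page, keeps only those on the same site, and removes duplicates. The `discoverLinks` helper is hypothetical and is not part of Ask0; the actual crawler's rules are not documented here.

```javascript
// Hypothetical helper: resolve hrefs against the page URL, keep same-origin
// links only, and drop duplicates that differ only by #fragment.
function discoverLinks(pageUrl, hrefs) {
  const base = new URL(pageUrl);
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative links
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.origin !== base.origin) continue; // stay on the same site
    url.hash = ""; // fragments point at the same document
    seen.add(url.href);
  }
  return [...seen];
}

const links = discoverLinks("https://docs.example.com/guides/", [
  "intro",
  "/api/auth#tokens",
  "/api/auth",
  "https://other.example.org/page",
]);
console.log(links);
// [ 'https://docs.example.com/guides/intro', 'https://docs.example.com/api/auth' ]
```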
2. Processing & Indexing
Content is processed for optimal retrieval:
- Text Extraction: Convert various formats to searchable text
- Chunking: Split content into semantic segments
- Embedding: Generate vector representations for semantic search
- Metadata: Extract and store titles, URLs, timestamps
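To make the chunking step concrete, here is a minimal sketch that splits text into fixed-size, overlapping windows. This is a simpler technique than the semantic chunking described above, and it counts words as a stand-in for tokens. The `chunkText` function and its parameters are illustrative assumptions, not Ask0 settings.

```javascript
// Assumption: words stand in for tokens; a real pipeline would use the
// tokenizer that matches its embedding model.
function chunkText(text, maxWords = 200, overlap = 20) {
  if (overlap >= maxWords) {
    throw new Error("overlap must be smaller than maxWords");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  // Advance by (maxWords - overlap) so consecutive chunks share some context.
  for (let start = 0; start < words.length; start += maxWords - overlap) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
    if (start + maxWords >= words.length) break; // this chunk reached the end
  }
  return chunks;
}

const sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9";
console.log(chunkText(sample, 4, 1));
// [ 'w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9' ]
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of indexing some text twice.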
3. Knowledge Retrieval
When users ask questions:
- The query is converted to an embedding
- Semantic search finds the most relevant chunks
- Context is assembled from multiple sources
- The AI generates a response with source attribution
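The retrieval steps above can be sketched as a cosine-similarity ranking over chunk vectors. The three-dimensional vectors below are made-up toy values (real embeddings have hundreds of dimensions), and `cosineSimilarity` and `topK` are hypothetical names used only for illustration.

```javascript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and keep the k best matches.
function topK(queryVec, chunks, k) {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(queryVec, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy index: each chunk keeps its source id so answers can cite it.
const indexedChunks = [
  { source: "docs", text: "Install with npm", vector: [1, 0, 0] },
  { source: "github", text: "Known install issue", vector: [0.8, 0.6, 0] },
  { source: "community", text: "Billing question", vector: [0, 0, 1] },
];

const results = topK([1, 0, 0], indexedChunks, 2);
console.log(results.map((r) => r.source)); // [ 'docs', 'github' ]
```

Because each result carries its `source` field, the chunks passed to the model as context can be listed as citations in the final answer.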
Managing Sources
Adding Sources
- Navigate to Sources in your project dashboard
- Click "Add Source" and select the type
- Configure source-specific settings:
  - URLs and patterns for web sources
  - Authentication for API sources
  - Upload files for document sources
- Set refresh schedule (if applicable)
- Click "Save & Start Indexing"
Source Configuration
Each source type has specific configuration options:
Web Source Example:
Start URL: https://docs.example.com
Include Patterns:
  - /guides/*
  - /api/*
  - /tutorials/*
Exclude Patterns:
  - /archive/*
  - /internal/*
Max Pages: 1000
Refresh: Daily at 2 AM
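One plausible way to read the include and exclude patterns is sketched below. It assumes that `*` matches any run of characters (including `/`) and that exclusions take precedence over inclusions; Ask0's actual matching rules may differ. `globToRegExp` and `shouldCrawl` are hypothetical helpers.

```javascript
// Assumption: `*` is the only wildcard and matches any characters.
function globToRegExp(pattern) {
  // Escape regex metacharacters except `*`, then expand `*` to `.*`.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function shouldCrawl(url, { include, exclude }) {
  const path = new URL(url).pathname;
  const matches = (patterns) =>
    patterns.some((p) => globToRegExp(p).test(path));
  if (matches(exclude)) return false; // assumed: exclusions win
  return include.length === 0 || matches(include);
}

const rules = {
  include: ["/guides/*", "/api/*", "/tutorials/*"],
  exclude: ["/archive/*", "/internal/*"],
};

console.log(shouldCrawl("https://docs.example.com/guides/intro", rules)); // true
console.log(shouldCrawl("https://docs.example.com/archive/2019", rules)); // false
console.log(shouldCrawl("https://docs.example.com/pricing", rules)); // false
```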
GitHub Source Example:
Repository: myorg/myrepo
Include:
  - Issues
  - Discussions
  - Wiki
Labels: [documentation, faq]
State: Open and Closed
Refresh: Every 6 hours
Monitoring Source Health
Track source performance in the dashboard:
- Status: Active, Crawling, Error, Paused
- Last Updated: When content was last indexed
- Document Count: Number of indexed documents
- Usage: How often the source is referenced
- Performance: Response quality metrics
Pro Tip: Regularly review source analytics to identify which sources provide the most value and which might need optimization or removal.
Best Practices
1. Source Selection
- Quality over Quantity: Better to have fewer, high-quality sources
- Avoid Duplication: Don't index the same content multiple times
- Keep Current: Remove outdated or deprecated sources
- Test Coverage: Ensure sources cover common user questions
2. Web Crawling
- Use Patterns: Define clear include/exclude patterns
- Respect Limits: Don't overload servers with aggressive crawling
- Monitor Errors: Check crawler logs for blocked or failed pages
- Optimize Frequency: Balance freshness with resource usage
3. Content Quality
- Structure Matters: Well-structured content indexes better
- Clear Headings: Use descriptive headings and sections
- Avoid Noise: Exclude navigation, footers, and boilerplate
- Update Regularly: Keep source content current and accurate
4. Performance Optimization
- Chunk Size: Configure appropriate chunk sizes (typically 500-1000 tokens)
- Embedding Model: Choose model based on language requirements
- Index Limits: Set reasonable limits to manage costs
- Prune Unused: Remove sources with low usage
Advanced Features
Source Priorities
Set priorities to prefer certain sources:
{
  "sources": [
    { "id": "docs", "priority": 1.0 },      // Primary documentation
    { "id": "github", "priority": 0.8 },    // GitHub issues
    { "id": "community", "priority": 0.6 }  // Community forums
  ]
}
Custom Metadata
Add metadata to improve retrieval:
{
  "document": "installation.md",
  "metadata": {
    "category": "setup",
    "difficulty": "beginner",
    "version": "2.0",
    "tags": ["installation", "quickstart", "setup"]
  }
}
Conditional Sources
Enable sources based on context:
{
  "source": "internal-docs",
  "conditions": {
    "user_type": "employee",
    "region": "US"
  }
}
Source Analytics
Monitor how sources contribute to answers:
Key Metrics
- Citation Rate: How often the source is referenced
- Relevance Score: Quality of matches from this source
- Coverage: Percentage of questions answered using this source
- Feedback: User ratings for answers from this source
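For a sense of how such metrics could be derived, the sketch below computes coverage and average feedback for one source from a log of answered questions. The log shape (`citedSources`, `rating`) and the `sourceMetrics` function are assumptions made for illustration; they do not describe Ask0's internal data model or its exact formulas.

```javascript
// Hypothetical answer log: one entry per answered question, listing the
// source ids cited in the answer and an optional 1-5 user rating.
function sourceMetrics(answerLog, sourceId) {
  const total = answerLog.length;
  const citing = answerLog.filter((a) => a.citedSources.includes(sourceId));
  const rated = citing.filter((a) => typeof a.rating === "number");
  return {
    // Share of all answers that used this source.
    coverage: total === 0 ? 0 : citing.length / total,
    // Mean rating of answers that used this source, or null if none were rated.
    averageRating:
      rated.length === 0
        ? null
        : rated.reduce((sum, a) => sum + a.rating, 0) / rated.length,
  };
}

const answerLog = [
  { citedSources: ["docs"], rating: 5 },
  { citedSources: ["docs", "github"], rating: 4 },
  { citedSources: ["community"] },
  { citedSources: ["github"], rating: 2 },
];

console.log(sourceMetrics(answerLog, "docs"));
// { coverage: 0.5, averageRating: 4.5 }
```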
Optimization Insights
- Identify gaps in knowledge coverage
- Find sources that need updating
- Discover redundant or conflicting information
- Track source performance over time
Troubleshooting
Common Issues:
Crawler can't access content
- Check robots.txt and crawler permissions
- Verify authentication if required
- Ensure URLs are publicly accessible
Poor answer quality
- Review source content quality
- Check for outdated information
- Adjust chunking settings
- Consider adding more specific sources
Slow indexing
- Reduce crawl frequency
- Limit maximum pages
- Optimize include/exclude patterns
- Check network connectivity
API Access
Manage sources programmatically:
// Assumes `ask0` is an already-initialized Ask0 client instance.

// Add a web source
const source = await ask0.sources.create({
  type: 'web',
  name: 'Documentation',
  config: {
    startUrl: 'https://docs.example.com',
    includePatterns: ['/api/*', '/guides/*'],
    maxPages: 500
  }
});

// Trigger manual refresh
await ask0.sources.refresh(source.id);

// Get source statistics
const stats = await ask0.sources.getStats(source.id);