Sources

Configure knowledge sources to train your Ask0 AI assistant. Connect docs, GitHub, Notion, Discord, files, and custom content sources.

Sources are the foundation of your AI assistant's knowledge. They define what content your assistant can access to answer user questions. Ask0 supports multiple source types and provides tools to schedule, monitor, and tune them.

What are Sources?

Sources are collections of content that Ask0 indexes for your AI assistant. When a user asks a question, the assistant searches these sources for relevant information and uses it to generate a response.

Key Capabilities

  • Automatic Crawling: Web sources are crawled automatically on your schedule
  • Real-time Updates: Keep knowledge current with incremental updates
  • Multiple Formats: Support for websites, documents, APIs, and more
  • Smart Processing: Intelligent chunking and indexing for optimal retrieval
  • Source Attribution: Responses show which sources were used

How Sources Work

1. Content Discovery

When you add a source, Ask0 discovers available content:

  • Web crawler follows links and sitemaps
  • API connectors fetch available resources
  • File uploads are processed immediately
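
The link-following part of discovery can be pictured as a breadth-first traversal. The sketch below is illustrative only: it walks a hypothetical in-memory link graph instead of fetching real pages, and `discoverPages`, `LinkGraph`, and the sample paths are assumptions rather than part of Ask0.

```typescript
// Hypothetical in-memory "site": each path maps to the links found on that page.
type LinkGraph = Map<string, string[]>;

// Breadth-first discovery: visit pages level by level, skip pages already
// seen, and stop once maxPages pages have been discovered.
function discoverPages(site: LinkGraph, startUrl: string, maxPages: number): string[] {
  const discovered: string[] = [];
  const seen = new Set<string>([startUrl]);
  const queue: string[] = [startUrl];

  while (queue.length > 0 && discovered.length < maxPages) {
    const url = queue.shift() as string;
    discovered.push(url);
    for (const link of site.get(url) ?? []) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return discovered;
}

const site: LinkGraph = new Map<string, string[]>([
  ["/", ["/guides", "/api"]],
  ["/guides", ["/guides/install", "/"]],
  ["/api", ["/api/sources"]],
  ["/guides/install", []],
  ["/api/sources", ["/guides"]],
]);

// With a limit of 4, the fifth page (/api/sources) is never visited.
const pages = discoverPages(site, "/", 4);
console.log(pages);
```

The `seen` set is what keeps circular links (here, `/guides` linking back to `/`) from causing repeat visits.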

2. Processing & Indexing

Content is processed for optimal retrieval:

  • Text Extraction: Convert various formats to searchable text
  • Chunking: Split content into semantic segments
  • Embedding: Generate vector representations for semantic search
  • Metadata: Extract and store titles, URLs, timestamps
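
To make the chunking step concrete, here is a minimal sketch that packs whole paragraphs into chunks under a size limit. It is not Ask0's actual algorithm: the paragraph-based strategy, the word count as a stand-in for tokens, and the names `chunkByParagraph` and `Chunk` are all assumptions for illustration.

```typescript
interface Chunk {
  text: string;
  wordCount: number;
}

// Count whitespace-separated words as a rough stand-in for tokens.
function countWords(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Pack whole paragraphs into chunks, starting a new chunk whenever adding
// the next paragraph would push the current chunk past maxWords.
function chunkByParagraph(content: string, maxWords: number): Chunk[] {
  const paragraphs = content
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = [];
  let current: string[] = [];
  let currentWords = 0;

  for (const paragraph of paragraphs) {
    const words = countWords(paragraph);
    if (current.length > 0 && currentWords + words > maxWords) {
      chunks.push({ text: current.join("\n\n"), wordCount: currentWords });
      current = [];
      currentWords = 0;
    }
    current.push(paragraph);
    currentWords += words;
  }
  if (current.length > 0) {
    chunks.push({ text: current.join("\n\n"), wordCount: currentWords });
  }
  return chunks;
}

const sampleText =
  "Install the CLI first.\n\nThen run the setup command.\n\nFinally, verify the installation works.";
const chunks = chunkByParagraph(sampleText, 10);
console.log(chunks.map((c) => c.wordCount));
```

Splitting on paragraph boundaries keeps related sentences together. Note that in this sketch a single paragraph longer than `maxWords` still becomes one oversized chunk.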

3. Knowledge Retrieval

When users ask questions:

  • The query is converted to an embedding
  • Semantic search finds the most relevant chunks
  • Context is assembled from matching chunks across sources
  • The AI generates a response with source attribution
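
The semantic search step can be sketched as ranking chunks by cosine similarity to the query embedding. This is a toy example under stated assumptions: the three-dimensional vectors are made up, real embeddings come from an embedding model, and `topK` and `IndexedChunk` are hypothetical names rather than Ask0 APIs.

```typescript
interface IndexedChunk {
  source: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: the dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and keep the k best matches.
function topK(queryEmbedding: number[], chunkIndex: IndexedChunk[], k: number) {
  return chunkIndex
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Made-up embeddings for illustration only.
const chunkIndex: IndexedChunk[] = [
  { source: "community", text: "Forum thread on billing", embedding: [0, 1, 0] },
  { source: "docs", text: "How to install the widget", embedding: [0.9, 0.1, 0] },
  { source: "github", text: "Issue about install errors", embedding: [0.5, 0.5, 0] },
];

const results = topK([1, 0, 0], chunkIndex, 2);
console.log(results.map((r) => `${r.chunk.source}: ${r.score.toFixed(3)}`));
```

The `source` field carried on each chunk is what makes attribution possible: whichever chunks end up in the context can be traced back to where they came from.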

Managing Sources

Adding Sources

  1. Navigate to Sources in your project dashboard
  2. Click "Add Source" and select the type
  3. Configure source-specific settings:
    • URLs and patterns for web sources
    • Authentication for API sources
    • Upload files for document sources
  4. Set refresh schedule (if applicable)
  5. Click "Save & Start Indexing"

Source Configuration

Each source type has specific configuration options:

Web Source Example:
  Start URL: https://docs.example.com
  Include Patterns:
    - /guides/*
    - /api/*
    - /tutorials/*
  Exclude Patterns:
    - /archive/*
    - /internal/*
  Max Pages: 1000
  Refresh: Daily at 2 AM
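
One plausible reading of the include and exclude patterns above is sketched below. The exact matching rules are not specified here, so treat these as assumptions: `*` matches any run of characters, a page must match at least one include pattern, and an exclude match always wins. `globToRegExp` and `shouldCrawl` are hypothetical helpers.

```typescript
// Convert a simple glob (only "*" is special) into an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// A path is crawled when it matches an include pattern and no exclude pattern.
function shouldCrawl(path: string, include: string[], exclude: string[]): boolean {
  const matchesAny = (patterns: string[]): boolean =>
    patterns.some((p) => globToRegExp(p).test(path));
  return matchesAny(include) && !matchesAny(exclude);
}

const includePatterns = ["/guides/*", "/api/*", "/tutorials/*"];
const excludePatterns = ["/archive/*", "/internal/*"];

console.log(shouldCrawl("/guides/setup", includePatterns, excludePatterns));
console.log(shouldCrawl("/archive/2020", includePatterns, excludePatterns));
console.log(shouldCrawl("/blog/launch", includePatterns, excludePatterns));
```

Under these assumptions only `/guides/setup` is crawled: `/archive/2020` is excluded, and `/blog/launch` matches no include pattern.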

GitHub Source Example:
  Repository: myorg/myrepo
  Include:
    - Issues
    - Discussions
    - Wiki
  Labels: [documentation, faq]
  State: Open and Closed
  Refresh: Every 6 hours

Monitoring Source Health

Track source performance in the dashboard:

  • Status: Active, Crawling, Error, Paused
  • Last Updated: When content was last indexed
  • Document Count: Number of indexed documents
  • Usage: How often the source is referenced
  • Performance: Response quality metrics
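
If you export these fields for your own reporting, a simple health check might look like the sketch below. The record shape, the `needsAttention` helper, and the 48-hour staleness threshold are assumptions for illustration, not dashboard behavior.

```typescript
type SourceStatus = "Active" | "Crawling" | "Error" | "Paused";

interface SourceHealth {
  name: string;
  status: SourceStatus;
  lastUpdated: Date;
  documentCount: number;
}

// Flag sources that are in an error state, have indexed nothing,
// or have not been updated within maxAgeHours.
function needsAttention(sources: SourceHealth[], now: Date, maxAgeHours: number): string[] {
  const maxAgeMs = maxAgeHours * 60 * 60 * 1000;
  return sources
    .filter(
      (s) =>
        s.status === "Error" ||
        s.documentCount === 0 ||
        now.getTime() - s.lastUpdated.getTime() > maxAgeMs
    )
    .map((s) => s.name);
}

const now = new Date("2024-01-10T00:00:00Z");
const healthReport: SourceHealth[] = [
  { name: "docs", status: "Active", lastUpdated: new Date("2024-01-09T12:00:00Z"), documentCount: 120 },
  { name: "github", status: "Error", lastUpdated: new Date("2024-01-09T18:00:00Z"), documentCount: 45 },
  { name: "community", status: "Active", lastUpdated: new Date("2024-01-01T00:00:00Z"), documentCount: 300 },
];

const flagged = needsAttention(healthReport, now, 48);
console.log(flagged);
```

Here `github` is flagged for its error status and `community` for being nine days stale, while `docs` passes.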

Pro Tip: Regularly review source analytics to identify which sources provide the most value and which might need optimization or removal.

Best Practices

1. Source Selection

  • Quality over Quantity: Prefer a few high-quality sources over many low-quality ones
  • Avoid Duplication: Don't index the same content multiple times
  • Keep Current: Remove outdated or deprecated sources
  • Test Coverage: Ensure sources cover common user questions

2. Web Crawling

  • Use Patterns: Define clear include/exclude patterns
  • Respect Limits: Don't overload servers with aggressive crawling
  • Monitor Errors: Check crawler logs for blocked or failed pages
  • Optimize Frequency: Balance freshness with resource usage

3. Content Quality

  • Structure Matters: Well-structured content indexes better
  • Clear Headings: Use descriptive headings and sections
  • Avoid Noise: Exclude navigation, footers, and boilerplate
  • Update Regularly: Keep source content current and accurate

4. Performance Optimization

  • Chunk Size: Configure appropriate chunk sizes (typically 500-1000 tokens)
  • Embedding Model: Choose model based on language requirements
  • Index Limits: Set reasonable limits to manage costs
  • Prune Unused: Remove sources with low usage

Advanced Features

Source Priorities

Set priorities to prefer certain sources. In this example, the primary documentation ranks above GitHub issues, which rank above community forums:

{
  "sources": [
    { "id": "docs", "priority": 1.0 },
    { "id": "github", "priority": 0.8 },
    { "id": "community", "priority": 0.6 }
  ]
}
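
How priorities feed into ranking is not spelled out here. One simple possibility, shown purely as an assumption, is to multiply each match's similarity score by its source's priority before sorting. `rankByPriority` and the sample scores are hypothetical.

```typescript
interface Match {
  sourceId: string;
  similarity: number; // raw semantic similarity, between 0 and 1
}

// Priorities mirroring the configuration above.
const priorities: Record<string, number> = { docs: 1.0, github: 0.8, community: 0.6 };

// Weight each match by its source priority (defaulting to 1.0), then sort best-first.
function rankByPriority(matches: Match[], weights: Record<string, number>) {
  return matches
    .map((m) => ({ ...m, weighted: m.similarity * (weights[m.sourceId] ?? 1.0) }))
    .sort((a, b) => b.weighted - a.weighted);
}

const matches: Match[] = [
  { sourceId: "community", similarity: 0.9 },
  { sourceId: "docs", similarity: 0.7 },
  { sourceId: "github", similarity: 0.8 },
];

const ranked = rankByPriority(matches, priorities);
console.log(ranked.map((m) => `${m.sourceId}: ${m.weighted.toFixed(2)}`));
```

With these numbers the community match has the highest raw similarity but ends up last after weighting, which is the kind of preference the priority setting is meant to express.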

Custom Metadata

Add metadata to improve retrieval:

{
  "document": "installation.md",
  "metadata": {
    "category": "setup",
    "difficulty": "beginner",
    "version": "2.0",
    "tags": ["installation", "quickstart", "setup"]
  }
}
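
Metadata like this is typically useful for narrowing candidates before or after semantic search. The sketch below filters documents by category, version, and tag. The filter shape and `filterByMetadata` are assumptions for illustration, and the extra sample documents are made up.

```typescript
interface DocumentMetadata {
  category: string;
  difficulty: string;
  version: string;
  tags: string[];
}

interface IndexedDocument {
  document: string;
  metadata: DocumentMetadata;
}

interface MetadataFilter {
  category?: string;
  version?: string;
  tag?: string;
}

// Keep documents that satisfy every field present in the filter.
function filterByMetadata(docs: IndexedDocument[], filter: MetadataFilter): IndexedDocument[] {
  return docs.filter(
    ({ metadata }) =>
      (filter.category === undefined || metadata.category === filter.category) &&
      (filter.version === undefined || metadata.version === filter.version) &&
      (filter.tag === undefined || metadata.tags.includes(filter.tag))
  );
}

const library: IndexedDocument[] = [
  {
    document: "installation.md",
    metadata: { category: "setup", difficulty: "beginner", version: "2.0", tags: ["installation", "quickstart", "setup"] },
  },
  {
    document: "legacy-install.md",
    metadata: { category: "setup", difficulty: "beginner", version: "1.0", tags: ["installation"] },
  },
  {
    document: "webhooks.md",
    metadata: { category: "integrations", difficulty: "advanced", version: "2.0", tags: ["webhooks"] },
  },
];

const currentSetupDocs = filterByMetadata(library, { category: "setup", version: "2.0" });
console.log(currentSetupDocs.map((d) => d.document));
```

Filtering on `version` in this way keeps answers from mixing instructions for different releases.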

Conditional Sources

Enable sources based on context:

{
  "source": "internal-docs",
  "conditions": {
    "user_type": "employee",
    "region": "US"
  }
}
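
A condition block like the one above could be evaluated against the current user's context as follows. This sketch assumes that all conditions must match (AND semantics) and that a source without conditions is always enabled. `isSourceEnabled` and the context shape are hypothetical.

```typescript
type UserContext = Record<string, string>;

interface ConditionalSource {
  source: string;
  conditions?: Record<string, string>;
}

// A source is enabled when every condition key has the expected value in the context.
function isSourceEnabled(entry: ConditionalSource, userContext: UserContext): boolean {
  const conditions: Record<string, string> = entry.conditions ?? {};
  return Object.keys(conditions).every((key) => userContext[key] === conditions[key]);
}

const internalDocs: ConditionalSource = {
  source: "internal-docs",
  conditions: { user_type: "employee", region: "US" },
};
const publicDocs: ConditionalSource = { source: "public-docs" };

console.log(isSourceEnabled(internalDocs, { user_type: "employee", region: "US" }));
console.log(isSourceEnabled(internalDocs, { user_type: "customer", region: "US" }));
console.log(isSourceEnabled(publicDocs, { user_type: "customer", region: "EU" }));
```

Under these assumptions, a US employee sees `internal-docs`, a customer does not, and `public-docs` is available to everyone.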

Source Analytics

Monitor how sources contribute to answers:

Key Metrics

  • Citation Rate: How often the source is referenced
  • Relevance Score: Quality of matches from this source
  • Coverage: Percentage of questions answered using this source
  • Feedback: User ratings for answers from this source
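
As a rough illustration of how such metrics might be derived from an answer log, the sketch below computes citation count, coverage, and average rating for one source. The log format, the 1 to 5 rating scale, and the exact formulas are assumptions. Ask0's own definitions may differ.

```typescript
interface AnswerRecord {
  question: string;
  citedSources: string[];
  rating?: number; // optional user rating, assumed to be on a 1-5 scale
}

interface SourceMetrics {
  citations: number; // number of answers that cited the source
  coverage: number; // fraction of all questions answered using the source
  averageRating: number | null; // mean rating of rated answers citing the source
}

function computeSourceMetrics(answerLog: AnswerRecord[], sourceId: string): SourceMetrics {
  const citing = answerLog.filter((r) => r.citedSources.includes(sourceId));
  const ratings = citing
    .map((r) => r.rating)
    .filter((r): r is number => r !== undefined);
  const averageRating =
    ratings.length > 0 ? ratings.reduce((sum, r) => sum + r, 0) / ratings.length : null;

  return {
    citations: citing.length,
    coverage: answerLog.length > 0 ? citing.length / answerLog.length : 0,
    averageRating,
  };
}

const answerLog: AnswerRecord[] = [
  { question: "How do I install?", citedSources: ["docs"], rating: 5 },
  { question: "Is there a known bug?", citedSources: ["github", "docs"], rating: 4 },
  { question: "Any community plugins?", citedSources: ["community"] },
  { question: "How do I configure?", citedSources: ["docs"] },
];

const docsMetrics = computeSourceMetrics(answerLog, "docs");
console.log(docsMetrics);
```

In this sample, `docs` is cited in three of four answers (coverage 0.75) and averages 4.5 across its two rated answers.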

Optimization Insights

  • Identify gaps in knowledge coverage
  • Find sources that need updating
  • Discover redundant or conflicting information
  • Track source performance over time

Troubleshooting

Common Issues:

Crawler can't access content

  • Check robots.txt and crawler permissions
  • Verify authentication if required
  • Ensure URLs are reachable by the crawler (publicly accessible, or covered by the authentication you configured)

Poor answer quality

  • Review source content quality
  • Check for outdated information
  • Adjust chunking settings
  • Consider adding more specific sources

Slow indexing

  • Reduce crawl frequency
  • Limit maximum pages
  • Optimize include/exclude patterns
  • Check network connectivity

API Access

Manage sources programmatically:

// Assumes `ask0` is an already-initialized Ask0 client instance

// Add a web source
const source = await ask0.sources.create({
  type: 'web',
  name: 'Documentation',
  config: {
    startUrl: 'https://docs.example.com',
    includePatterns: ['/api/*', '/guides/*'],
    maxPages: 500
  }
});

// Trigger manual refresh
await ask0.sources.refresh(source.id);

// Get source statistics
const stats = await ask0.sources.getStats(source.id);
