Quick Start Guide¶
Get incremental updates running in 5 minutes!
Prerequisites¶
- Flexible GraphRAG backend running
- PostgreSQL database available
- At least one data source configured (filesystem, S3, Google Drive, etc.)
Step 1: Database Setup (2 minutes)¶
Option A: Use Existing PostgreSQL¶
If you have the postgres-pgvector service running (from main Flexible GraphRAG setup):
- Create a new database in the same PostgreSQL instance:
- Database name:
flexible_graphrag_incremental -
Port: Same as your PostgreSQL (usually 5433)
-
Run the schema:
- Use the
schema.sqlfile in theincremental_updatesfolder - Execute it against the new database
Option B: Docker Compose (Recommended)¶
The postgres-pgvector.yaml service provides everything:
- PostgreSQL on port 5433
- pgAdmin at http://localhost:5050
- Ensure it's uncommented in
docker/docker-compose.yaml - Start services:
docker-compose up -d - Access pgAdmin and create database
flexible_graphrag_incremental - Run
schema.sqlvia pgAdmin query tool
Step 2: Configure Environment (1 minute)¶
Add to your .env file:
# PostgreSQL connection for incremental updates
POSTGRES_INCREMENTAL_URL=postgresql://postgres:password@localhost:5433/flexible_graphrag_incremental
# Enable incremental updates
ENABLE_INCREMENTAL_UPDATES=true
Note: Replace postgres:password with your actual credentials.
Step 3: Enable Sync via UI (2 minutes)¶
For New Data Source¶
- Open the web UI (http://localhost:5000)
- Go to Processing tab → Click data source type (e.g., "Filesystem", "S3", "Google Drive")
- Fill in source details:
- Name: A descriptive name
- Source-specific fields (path, bucket, folder ID, etc.)
- Enable Sync section:
- ✅ Enable automatic sync
- Choose sync interval (default: 300 seconds)
- Optional: Enable "Skip Graph" for faster syncs
- Click Process or Add Source
For Existing Data Source¶
If you already have documents ingested and want to enable sync:
- Re-process through the UI with "Enable Sync" checkbox selected
- This will:
- Ingest documents (if not already present)
- Create
datasource_configentry - Create
document_staterecords - Start monitoring for changes
Step 4: Verify It's Working (1 minute)¶
Check Backend Logs¶
You should see:
INFO: Incremental Update System starting...
INFO: Found 1 active data source(s) configured for auto-sync
INFO: Started 1 auto-sync updater(s)
INFO: Monitoring for configuration changes...
Test Change Detection¶
For Filesystem:
1. Add a new .txt file to your monitored folder
2. Watch logs - should see: EVENT: CREATE detected for filename.txt
3. Check UI - document count should increase
For S3: 1. Upload a file to your monitored S3 bucket 2. If SQS configured: Change detected in 1-5 seconds 3. If polling only: Change detected within sync interval (5 minutes default)
For Google Drive:
1. Upload a file to your monitored folder
2. Within 60 seconds (default polling), should see: EVENT: CREATE detected for filename.txt
Verify in Database¶
Using pgAdmin or psql:
-- Check datasource configuration
SELECT source_name, source_type, sync_status, last_sync_completed_at
FROM datasource_config;
-- Check tracked documents
SELECT doc_id, source_path, ordinal
FROM document_state
ORDER BY ordinal DESC
LIMIT 10;
What Happens Next?¶
The system is now monitoring your data source and will automatically:
- Detect Changes
- CREATE: New files added
- UPDATE: Existing files modified
-
DELETE: Files removed
-
Process Changes
- Load document content
- Process with DocumentProcessor (Docling/LlamaParse)
- Generate embeddings
- Update vector store (Qdrant)
- Update search index (Elasticsearch)
-
Update knowledge graph (Neo4j) if enabled
-
Track State
- Store document metadata in
document_state - Update sync status in
datasource_config - Skip reprocessing if only timestamp changed (content hash optimization)
Common Issues¶
No Changes Detected¶
Problem: Added file but nothing happens
Solutions:
- Check is_active = true in datasource_config table
- Verify backend logs show "Started N auto-sync updater(s)"
- For event-driven sources (S3, Filesystem), check event stream is enabled
- Try manual sync: POST /api/incremental/sync/{config_id}
Documents Not Showing Up¶
Problem: Change detected but document not searchable
Solutions:
- Check backend logs for processing errors
- Verify vector store (Qdrant) and search index (Elasticsearch) are running
- Ensure data source credentials are correct
- Check document_state table - should have entry for the file
Duplicate Processing¶
Problem: Same document processed multiple times
Solutions:
- Check only one datasource config exists for the location
- Verify source_id is populated in document_state
- Ensure detector initialization completed (check "Populated known_file_ids" in logs)
Next Steps¶
- Setup Guide - Detailed configuration options
- API Reference - REST API for programmatic control
- S3 Setup - Configure S3 with SQS event notifications for real-time updates
- Diagnostic Queries: Use
diagnostic_queries.sqlto inspect system state - Manual Sync: Trigger on-demand sync via API or will be available in UI
Advanced Configuration¶
Adjust Sync Interval¶
Lower interval = faster change detection (but more frequent scans):
UPDATE datasource_config
SET refresh_interval_seconds = 60 -- Check every 60 seconds
WHERE config_id = 'your-config-id';
Skip Knowledge Graph¶
For faster syncs, skip graph extraction:
Then restart the backend.
Enable Event Stream¶
For sources that support it (S3 with SQS, Filesystem, Alfresco):
This enables real-time change detection instead of periodic polling.
Monitoring¶
Check Sync Status¶
SELECT
source_name,
sync_status, -- 'idle', 'syncing', 'error'
last_sync_completed_at,
last_sync_error
FROM datasource_config;
View Recent Changes¶
SELECT
doc_id,
source_path,
ordinal,
content_hash,
created_at
FROM document_state
ORDER BY ordinal DESC
LIMIT 20;
Count Documents by Source¶
SELECT
dc.source_name,
COUNT(*) as doc_count
FROM document_state ds
JOIN datasource_config dc ON ds.config_id = dc.config_id
GROUP BY dc.source_name;
Support¶
For issues or questions:
1. Check backend logs for error messages
2. Review diagnostic_queries.sql for troubleshooting queries
3. See Setup Guide for detailed configuration
4. Check GitHub issues for known problems and solutions