PostgreSQL Setup Guide¶

This guide covers PostgreSQL usage in Flexible GraphRAG, including the pgvector vector store option, incremental updates state management, and pgAdmin configuration.

Table of Contents¶

Overview
Two PostgreSQL Instances
Automatic Setup on First Start
pgVector Vector Store
Incremental Updates State Management
pgAdmin Access
Manual Database Operations
Troubleshooting

Overview¶

Flexible GraphRAG can use PostgreSQL for two distinct purposes:

Vector Storage - Using the pgvector extension for embedding similarity search (optional, alternative to Qdrant/Milvus/etc)
Incremental Updates - State management for tracking document changes and synchronization status (optional feature)

Both can run in the same PostgreSQL container but use separate databases to keep concerns separated.

Two PostgreSQL Instances¶

Flexible GraphRAG Docker setup includes two separate PostgreSQL containers:

1. Flexible GraphRAG PostgreSQL (Port 5433)¶

Container: flexible-graphrag-postgres
Image: pgvector/pgvector:pg16
Port: 5433 (to avoid conflict with Alfresco)

Purpose: Dual-purpose database for Flexible GraphRAG - Vector storage with pgvector extension - Incremental updates state management

Databases: - flexible_graphrag - Vector embeddings (with pgvector extension) - flexible_graphrag_incremental - State management tables

2. Alfresco PostgreSQL (Port 5432)¶

Container: flexible-graphrag-postgres-1
Image: postgres:16.5
Port: 5432 (standard PostgreSQL port)

Purpose: Alfresco content management system only - Dedicated database for Alfresco repository - Separate instance to avoid conflicts

Why Separate? - Different PostgreSQL images (pgvector vs standard) - Different port assignments - Isolated resource allocation - Independent lifecycle management

Automatic Setup on First Start¶

When you run docker compose up for the first time, the PostgreSQL container automatically initializes:

Initialization Process¶

The container executes scripts from /docker-entrypoint-initdb.d/ in alphabetical order:

docker/postgres-init/
├── 01-init-pgvector.sql           ← Creates pgvector extension
├── 02-init-incremental.sql        ← Creates incremental database
└── 03-init-incremental-schema.sh  ← Creates state management tables

What Gets Created Automatically¶

1. Vector Database (`flexible_graphrag`)¶

-- Created by: 01-init-pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;
-- Result: pgvector extension enabled for similarity search

2. Incremental Database (`flexible_graphrag_incremental`)¶

-- Created by: 02-init-incremental.sql
CREATE DATABASE flexible_graphrag_incremental;
-- Result: Separate database for state tracking

3. State Management Tables¶

-- Created by: 03-init-incremental-schema.sh
CREATE TABLE datasource_config (
    config_id TEXT PRIMARY KEY,
    source_type TEXT,
    connection_params JSONB,
    ...
);

CREATE TABLE document_state (
    doc_id TEXT PRIMARY KEY,
    content_hash TEXT,
    modified_timestamp TIMESTAMPTZ,  -- Proper timestamp type
    vector_synced_at TIMESTAMPTZ,
    search_synced_at TIMESTAMPTZ,
    graph_synced_at TIMESTAMPTZ,
    ...
);

One-Time Initialization¶

Scripts only run when data directory is empty (first startup)
Subsequent restarts skip initialization (data already exists)
To re-initialize: Remove the flexible-graphrag_postgres_data volume

# Reset and reinitialize
docker compose -p flexible-graphrag down postgres-pgvector
docker volume rm flexible-graphrag_postgres_data
docker compose -p flexible-graphrag up -d postgres-pgvector

pgVector Vector Store¶

Overview¶

The pgvector extension enables PostgreSQL to store and search vector embeddings efficiently, making it a viable alternative to specialized vector databases.

When to Use PostgreSQL as Vector Store¶

✅ Good for: - Small to medium datasets (< 10 million vectors) - When you want fewer moving parts (one database for everything) - Development and testing environments - When you already have PostgreSQL expertise

❌ Consider alternatives for: - Very large datasets (> 10 million vectors) - High-performance production workloads requiring sub-millisecond latency - Distributed vector search across multiple nodes

Configuration¶

1. Enable PostgreSQL Vector Store¶

Edit flexible-graphrag/.env:

# Set vector database to postgres
VECTOR_DB=postgres

# Configure connection
VECTOR_DB_CONFIG={"host": "localhost", "port": 5433, "database": "flexible_graphrag", "username": "postgres", "password": "password", "table_name": "hybrid_search_vectors"}

2. Connection Parameters¶

Parameter	Value	Notes
`host`	`localhost`	Or Docker service name `postgres-pgvector` from within Docker
`port`	`5433`	Flexible GraphRAG PostgreSQL (not 5432 which is Alfresco)
`database`	`flexible_graphrag`	Database with pgvector extension
`username`	`postgres`	Default admin user
`password`	`password`	Default password (change in production!)
`table_name`	`hybrid_search_vectors`	Table for vector embeddings

3. Verify pgVector Extension¶

# Connect to database
docker exec -it flexible-graphrag-postgres psql -U postgres -d flexible_graphrag

# Check extension
flexible_graphrag=# \dx
                                      List of installed extensions
  Name   | Version |   Schema   |                        Description
---------+---------+------------+-----------------------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector  | 0.8.0   | public     | vector data type and ivfflat and hnsw access methods

How It Works¶

Document Ingestion: Embeddings are generated and stored in the hybrid_search_vectors table
Vector Search: Uses pgvector's indexing (HNSW or IVFFlat) for nearest neighbor search
Hybrid Search: Combined with Elasticsearch/OpenSearch for BM25 keyword search

Performance Tuning¶

For production workloads, consider:

-- Create HNSW index for faster similarity search
CREATE INDEX ON hybrid_search_vectors 
USING hnsw (embedding vector_cosine_ops);

-- Tune PostgreSQL memory settings
shared_buffers = 4GB
effective_cache_size = 12GB

Incremental Updates State Management¶

Overview¶

The incremental updates system tracks document changes and synchronization status across multiple data sources, enabling automatic re-ingestion when content changes.

Architecture¶

PostgreSQL (flexible_graphrag_incremental database)
├── datasource_config     → Configuration for each monitored source
└── document_state        → Per-document processing state

Configuration¶

1. Enable Incremental Updates¶

Edit flexible-graphrag/.env:

# Enable incremental updates
ENABLE_INCREMENTAL_UPDATES=true

# PostgreSQL connection for state tracking
POSTGRES_INCREMENTAL_URL=postgresql://postgres:password@localhost:5433/flexible_graphrag_incremental

2. Connection String Format¶

postgresql://[username]:[password]@[host]:[port]/[database]

username: postgres (default)
password: password (default, change in production!)
host: localhost (or postgres-pgvector from Docker)
port: 5433 (Flexible GraphRAG PostgreSQL)
database: flexible_graphrag_incremental (state management)

Database Schema¶

`datasource_config` Table¶

Stores configuration for each monitored data source:

CREATE TABLE datasource_config (
    config_id TEXT PRIMARY KEY,              -- Unique identifier
    project_id TEXT NOT NULL,                -- Project grouping
    source_type TEXT NOT NULL,               -- filesystem, s3, alfresco, etc.
    source_name TEXT NOT NULL,               -- Human-readable name
    connection_params JSONB NOT NULL,        -- Source-specific config
    refresh_interval_seconds INTEGER,        -- Polling interval
    enable_change_stream BOOLEAN,            -- Real-time monitoring
    skip_graph BOOLEAN,                      -- Skip graph extraction
    is_active BOOLEAN,                       -- Enable/disable source
    sync_status TEXT,                        -- idle, syncing, error
    last_sync_ordinal BIGINT,                -- Last processed timestamp
    last_sync_completed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

`document_state` Table¶

Tracks processing state for each document:

CREATE TABLE document_state (
    doc_id TEXT PRIMARY KEY,                 -- config_id:source_path
    config_id TEXT NOT NULL,                 -- Links to datasource_config
    source_path TEXT NOT NULL,               -- File path or identifier
    source_id TEXT,                          -- Source-specific ID
    ordinal BIGINT NOT NULL,                 -- Microsecond timestamp
    content_hash TEXT NOT NULL,              -- SHA-256 for change detection
    modified_timestamp TIMESTAMPTZ,          -- Source modification time
    vector_synced_at TIMESTAMPTZ,            -- Vector index sync time
    search_synced_at TIMESTAMPTZ,            -- Search index sync time
    graph_synced_at TIMESTAMPTZ,             -- Graph index sync time
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

Change Detection Flow¶

Detector monitors source (filesystem, S3, Alfresco, etc.)
Change detected (new/modified/deleted file)
State check - Query document_state for existing hash
Content comparison - Compare hash to detect actual changes
Sync tracking - Update vector_synced_at, search_synced_at, graph_synced_at
Re-ingestion - Only process changed documents

Supported Data Sources¶

Source	Type	Real-time Support
Filesystem	Local files	✅ Watchdog (inotify)
Amazon S3	Cloud storage	⏱️ Polling
Alfresco	CMS	✅ ActiveMQ events
Google Drive	Cloud storage	✅ Push notifications
OneDrive	Cloud storage	✅ Microsoft Graph webhooks
SharePoint	CMS	✅ Microsoft Graph webhooks
Azure Blob	Cloud storage	⏱️ Polling
Google Cloud Storage	Cloud storage	⏱️ Polling
Box	Cloud storage	⏱️ Polling

Timestamp Migration¶

The schema uses proper TIMESTAMPTZ (timestamp with timezone) for all timestamp columns. If upgrading from an older version with TEXT timestamps:

# Apply migration
docker exec -i flexible-graphrag-postgres psql -U postgres -d flexible_graphrag_incremental < flexible-graphrag/incremental_updates/migration_002_modified_timestamp_timestamptz.sql

pgAdmin Access¶

Automatic Server Registration¶

The PostgreSQL server is automatically registered in pgAdmin when you first start the containers. No manual "Add Server" configuration needed!

Accessing pgAdmin¶

Open pgAdmin: http://localhost:5050
Login (one-time per session):
Email: admin@flexible-graphrag.com
Password: admin
Connect to Server:
Server appears automatically in left sidebar: "Flexible GraphRAG PostgreSQL"
Click on it
Enter password: password
✅ Check "Save password" - Won't ask again

Navigate databases:

Flexible GraphRAG PostgreSQL
├── Databases
│   ├── flexible_graphrag              ← Vector store
│   │   └── hybrid_search_vectors      ← Embeddings table
│   └── flexible_graphrag_incremental  ← State management
│       ├── datasource_config
│       └── document_state

How Auto-Registration Works¶

Configuration files are mounted into the pgAdmin container:

# docker/includes/postgres-pgvector.yaml
volumes:
  - ../pgadmin-config/servers.json:/pgadmin4/servers.json  # Pre-register server
  - ../pgadmin-config/pgpass:/pgadmin4/pgpass              # Store credentials

servers.json defines the server:

{
  "Servers": {
    "1": {
      "Name": "Flexible GraphRAG PostgreSQL",
      "Host": "postgres-pgvector",
      "Port": 5432,
      "MaintenanceDB": "postgres",
      "Username": "postgres"
    }
  }
}

Credentials¶

pgAdmin Login: - Email: admin@flexible-graphrag.com - Password: admin

PostgreSQL Connection: - Username: postgres - Password: password

⚠️ Production: Change these default credentials!

Persistent Configuration¶

All pgAdmin settings are stored in the flexible-graphrag_pgadmin_data Docker volume: - Saved passwords - Query history - Preferences - Custom configurations

To reset pgAdmin:

docker volume rm flexible-graphrag_pgadmin_data
docker compose -p flexible-graphrag up -d pgadmin

Manual Database Operations¶

Connect via Command Line¶

# Connect to vector database
docker exec -it flexible-graphrag-postgres psql -U postgres -d flexible_graphraf

# Connect to incremental database
docker exec -it flexible-graphrag-postgres psql -U postgres -d flexible_graphrag_incremental

# List all databases
docker exec flexible-graphrag-postgres psql -U postgres -c "\l"

# List tables in incremental database
docker exec flexible-graphrag-postgres psql -U postgres -d flexible_graphrag_incremental -c "\dt"

View Table Structure¶

# Document state table schema
docker exec flexible-graphrag-postgres psql -U postgres -d flexible_graphrag_incremental -c "\d document_state"

# Vector table schema
docker exec flexible-graphrag-postgres psql -U postgres -d flexible_graphraf -c "\d hybrid_search_vectors"

Query Examples¶

Check Incremental Updates Status¶

-- Count documents per data source
SELECT config_id, COUNT(*) as doc_count
FROM document_state
GROUP BY config_id;

-- Recently synced documents
SELECT doc_id, source_path, vector_synced_at, search_synced_at, graph_synced_at
FROM document_state
WHERE vector_synced_at > NOW() - INTERVAL '1 hour'
ORDER BY vector_synced_at DESC;

-- Documents pending sync
SELECT doc_id, source_path
FROM document_state
WHERE vector_synced_at IS NULL OR search_synced_at IS NULL OR graph_synced_at IS NULL;

Check Vector Store¶

-- Count embeddings
SELECT COUNT(*) FROM hybrid_search_vectors;

-- Sample embeddings
SELECT id, LEFT(text, 50) as text_preview, 
       array_length(embedding, 1) as embedding_dim
FROM hybrid_search_vectors
LIMIT 5;

Backup and Restore¶

Backup¶

# Backup vector database
docker exec flexible-graphrag-postgres pg_dump -U postgres flexible_graphraf > backup_vectors.sql

# Backup incremental database
docker exec flexible-graphrag-postgres pg_dump -U postgres flexible_graphrag_incremental > backup_incremental.sql

# Backup both
docker exec flexible-graphrag-postgres pg_dumpall -U postgres > backup_all.sql

Restore¶

# Restore vector database
docker exec -i flexible-graphrag-postgres psql -U postgres -d flexible_graphraf < backup_vectors.sql

# Restore incremental database
docker exec -i flexible-graphrag-postgres psql -U postgres -d flexible_graphrag_incremental < backup_incremental.sql

Troubleshooting¶

PostgreSQL Container Won't Start¶

Check logs:

docker logs flexible-graphrag-postgres

Common issues: - Port 5433 already in use → Check with netstat -an | findstr :5433 - Volume permissions → On Linux, check ownership of mounted directories - Memory limits → Increase Docker memory allocation

Databases Not Created Automatically¶

Symptoms: Empty database, tables don't exist

Cause: Volume already exists from previous installation

Solution:

# Remove volume and restart
docker compose -p flexible-graphrag down postgres-pgvector
docker volume rm flexible-graphrag_postgres_data
docker compose -p flexible-graphrag up -d postgres-pgvector

pgVector Extension Missing¶

Check:

docker exec flexible-graphrag-postgres psql -U postgres -d flexible_graphraf -c "\dx"

If missing:

-- Manually install
CREATE EXTENSION IF NOT EXISTS vector;

pgAdmin Can't Connect¶

Symptoms: "Could not connect to server"

Solutions:

Check PostgreSQL is running:
```
docker ps | grep postgres
```

Verify network connectivity:

docker exec flexible-graphrag-pgadmin ping postgres-pgvector

Check credentials:
Username: postgres
Password: password
Database: postgres (maintenance DB)

Reset pgAdmin volume:

docker volume rm flexible-graphrag_pgadmin_data
docker compose -p flexible-graphrag up -d pgadmin

Incremental Updates Not Working¶

Check connection:

# From .env file
echo $POSTGRES_INCREMENTAL_URL

# Test connection
docker exec flexible-graphrag-postgres psql postgresql://postgres:password@localhost:5432/flexible_graphrag_incremental -c "SELECT COUNT(*) FROM datasource_config;"

Verify schema:

docker exec flexible-graphrag-postgres psql -U postgres -d flexible_graphrag_incremental -c "\dt"

Expected output:

               List of relations
 Schema |       Name        | Type  |  Owner   
--------+-------------------+-------+----------
 public | datasource_config | table | postgres
 public | document_state    | table | postgres

Wrong Database/Port¶

Remember: - Flexible GraphRAG PostgreSQL: Port 5433, Image pgvector/pgvector:pg16 - Container: flexible-graphrag-postgres - Vector DB: flexible_graphraf - Incremental DB: flexible_graphrag_incremental

Alfresco PostgreSQL: Port 5432, Image postgres:16.5
Container: flexible-graphrag-postgres-1
Alfresco DB: alfresco

Don't mix them up! They are completely separate instances.

Security Considerations¶

Default Credentials¶

⚠️ Change these in production!

# PostgreSQL
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password

# pgAdmin
PGADMIN_DEFAULT_EMAIL=admin@flexible-graphrag.com
PGADMIN_DEFAULT_PASSWORD=admin

Production Recommendations¶

Use strong passwords
Enable SSL/TLS for PostgreSQL connections
Restrict network access (firewall rules)
Regular backups (automated)
Monitor access logs
Update regularly (security patches)

Additional Resources¶

PostgreSQL Documentation: https://www.postgresql.org/docs/
pgvector GitHub: https://github.com/pgvector/pgvector
pgAdmin Documentation: https://www.pgadmin.org/docs/
Flexible GraphRAG Docs: docs/ directory

For questions or issues, check the main README or open a GitHub issue.

PostgreSQL Setup Guide¶

Table of Contents¶

Overview¶

Two PostgreSQL Instances¶

1. Flexible GraphRAG PostgreSQL (Port 5433)¶

2. Alfresco PostgreSQL (Port 5432)¶

Automatic Setup on First Start¶

Initialization Process¶

What Gets Created Automatically¶

1. Vector Database (flexible_graphrag)¶

2. Incremental Database (flexible_graphrag_incremental)¶

3. State Management Tables¶

One-Time Initialization¶

pgVector Vector Store¶

Overview¶

When to Use PostgreSQL as Vector Store¶

Configuration¶

1. Enable PostgreSQL Vector Store¶

2. Connection Parameters¶

3. Verify pgVector Extension¶

How It Works¶

Performance Tuning¶

Incremental Updates State Management¶

Overview¶

Architecture¶

Configuration¶

1. Enable Incremental Updates¶

2. Connection String Format¶

Database Schema¶

datasource_config Table¶

document_state Table¶

Change Detection Flow¶

Supported Data Sources¶

Timestamp Migration¶

pgAdmin Access¶

Automatic Server Registration¶

Accessing pgAdmin¶

How Auto-Registration Works¶

Credentials¶

Persistent Configuration¶

Manual Database Operations¶

Connect via Command Line¶

View Table Structure¶

Query Examples¶

Check Incremental Updates Status¶

Check Vector Store¶

Backup and Restore¶

Backup¶

Restore¶

Troubleshooting¶

PostgreSQL Container Won't Start¶

Databases Not Created Automatically¶

pgVector Extension Missing¶

pgAdmin Can't Connect¶

Incremental Updates Not Working¶

Wrong Database/Port¶

Security Considerations¶

Default Credentials¶

Production Recommendations¶

Additional Resources¶

1. Vector Database (`flexible_graphrag`)¶

2. Incremental Database (`flexible_graphrag_incremental`)¶

`datasource_config` Table¶

`document_state` Table¶