Vector Dimension Compatibility Guide¶

This document explains critical vector dimension compatibility issues when switching between different embedding models in Flexible GraphRAG.

⚠️ Critical Issue: Vector Dimension Incompatibility¶

When switching between different LLM providers or embedding models, you MUST delete existing vector indexes because different models produce embeddings with different dimensions.

Why This Matters¶

Vector databases create indexes optimized for specific dimensions. When you change embedding models, the new embeddings won't fit the existing index structure, causing errors like:

Dimension mismatch error
Vector size incompatible with index
Index dimension does not match embedding dimension
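
The failure can be reproduced without any database. The following is a minimal sketch, not code from Flexible GraphRAG: `check_dimension` is a hypothetical helper that compares the length of an embedding with the dimension an index was created for.

```python
from typing import List


def check_dimension(embedding: List[float], index_dim: int) -> None:
    """Raise ValueError when an embedding cannot fit an index of size index_dim."""
    if len(embedding) != index_dim:
        raise ValueError(
            f"Dimension mismatch: index expects {index_dim}, got {len(embedding)}"
        )


# An index built for 1536-dimensional OpenAI embeddings ...
index_dim = 1536
# ... cannot accept a 384-dimensional vector such as all-minilm produces.
ollama_vector = [0.0] * 384

try:
    check_dimension(ollama_vector, index_dim)
    mismatch_message = ""
except ValueError as exc:
    mismatch_message = str(exc)
    print(mismatch_message)
```

Real vector databases perform an equivalent check on insert or query, which is why the old index has to be removed rather than reused.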

📊 Embedding Dimensions by Provider¶

OpenAI¶

text-embedding-3-large: 3072 dimensions
text-embedding-3-small: 1536 dimensions (default)
text-embedding-ada-002: 1536 dimensions

Ollama¶

all-minilm: 384 dimensions (default)
nomic-embed-text: 768 dimensions
mxbai-embed-large: 1024 dimensions

Azure OpenAI¶

Same as OpenAI models: 1536 or 3072 dimensions

Other Providers¶

Default fallback: 1536 dimensions
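
The values above can be kept in a small lookup table to check whether a planned model switch changes the vector dimension. This is an illustrative sketch: `EMBEDDING_DIMENSIONS`, `expected_dimension`, and `dimension_changes` are hypothetical names, not part of Flexible GraphRAG.

```python
# Dimensions copied from the lists above; extend the table for other models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
    "all-minilm": 384,
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
}
DEFAULT_DIMENSION = 1536  # documented fallback for other providers


def expected_dimension(model_name: str) -> int:
    """Return the documented dimension for a model, or the fallback."""
    # Ollama model names may carry a tag such as "all-minilm:latest".
    base_name = model_name.split(":", 1)[0].strip().lower()
    return EMBEDDING_DIMENSIONS.get(base_name, DEFAULT_DIMENSION)


def dimension_changes(old_model: str, new_model: str) -> bool:
    """True when switching models changes the vector dimension."""
    return expected_dimension(old_model) != expected_dimension(new_model)
```

For example, `dimension_changes("text-embedding-3-small", "all-minilm")` is true (1536 versus 384), so the existing index would be rejected by the new embeddings.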

🗂️ Vector Database Cleanup Instructions¶

🎯 Best Databases for Easy Vector Deletion¶

When frequently switching between embedding models (OpenAI ↔ Ollama), choose databases with user-friendly deletion:

Database	Deletion Method	Difficulty	Dashboard
Qdrant ✅	One-click collection deletion	⭐ Easy	Web UI
Milvus ✅	Professional drop operations	⭐⭐ Moderate	Attu Dashboard
Weaviate ✅	Schema-based deletion	⭐⭐ Moderate	Console
Chroma ⚠️	HTTP mode: API deletion, Local mode: File cleanup	⭐⭐ Moderate	Swagger API (HTTP)
LanceDB ⚠️	File/table deletion	⭐⭐ Moderate	Viewer + Files
PostgreSQL ❌	SQL commands required	⭐⭐⭐ Advanced	pgAdmin
Pinecone ⚠️	Cloud console only	⭐⭐ Moderate	Web Console

💡 Recommendation: Use Qdrant or Milvus for the easiest vector cleanup when switching embedding models.

Qdrant (Recommended for Easy Deletion)¶

Using Qdrant Dashboard:

1. Open Qdrant Dashboard: http://localhost:6333/dashboard
2. Go to "Collections" tab
3. Find hybrid_search_vector (or your collection name) in the collections list
4. Click the 3 dots (⋮) menu next to the collection
5. Select "Delete"
6. Confirm the deletion
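
Before deleting, it can help to confirm which dimension the existing collection was created with. The sketch below assumes the Qdrant REST endpoint `GET /collections/{name}` and the response layout `result.config.params.vectors.size` for a collection with a single unnamed vector; the helper names and the sample payload are illustrative, not captured from a real server.

```python
import json
import urllib.request
from typing import Optional


def vector_size_from_info(info: dict) -> Optional[int]:
    """Extract the vector size from a Qdrant collection-info response.

    Handles the single unnamed-vector layout; returns None for anything else
    (for example collections that use named vectors).
    """
    vectors = info.get("result", {}).get("config", {}).get("params", {}).get("vectors")
    if isinstance(vectors, dict) and isinstance(vectors.get("size"), int):
        return vectors["size"]
    return None


def fetch_vector_size(base_url: str, collection: str) -> Optional[int]:
    """Ask a running Qdrant instance for the collection's vector size."""
    with urllib.request.urlopen(f"{base_url}/collections/{collection}") as response:
        return vector_size_from_info(json.load(response))


# Trimmed, illustrative response body.
sample_info = {
    "result": {"config": {"params": {"vectors": {"size": 1536, "distance": "Cosine"}}}},
    "status": "ok",
}
print(vector_size_from_info(sample_info))
```

Against a running instance, `fetch_vector_size("http://localhost:6333", "hybrid_search_vector")` would return the stored size, which can be compared with the dimension of the new embedding model.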

Neo4j¶

Using Neo4j Browser:

1. Open Neo4j Browser: http://localhost:7474 (or your Neo4j port)
2. Log in with your credentials
3. Drop the vector index:
   - Run: SHOW INDEXES
   - Run: DROP INDEX hybrid_search_vector IF EXISTS
   - Run: SHOW INDEXES to verify cleanup

Elasticsearch¶

Using Kibana Dashboard:

1. Open Kibana: http://localhost:5601 (if Kibana is running)
2. Choose "Management" from the main menu
3. Click "Index Management"
4. Select hybrid_search_vector from the indices list
5. Choose "Manage index" (blue button)
6. Choose "Delete index"
7. Confirm the deletion

Alternative - Using Elasticsearch REST API:

# Delete the vector index via curl
curl -X DELETE "http://localhost:9200/hybrid_search_vector"

OpenSearch¶

Using OpenSearch Dashboards:

1. Open OpenSearch Dashboards: http://localhost:5601 (if running) or http://localhost:9201/_dashboards
2. Go to "Index Management" (in the main menu or under "Management")
3. Click on "Indices" tab
4. Find hybrid_search_vector in the indices list
5. Click the checkbox next to the index
6. Click "Actions" → "Delete"
7. Confirm the deletion by typing "delete"

Alternative - Using OpenSearch REST API:

# Delete the vector index via curl
curl -X DELETE "http://localhost:9201/hybrid_search_vector"
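
The Elasticsearch and OpenSearch curl calls can also be issued from Python with only the standard library. This is a sketch under the assumption that the cluster is reachable without authentication, as in the curl examples; `build_delete_request` and `delete_index` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def build_delete_request(base_url: str, index_name: str) -> urllib.request.Request:
    """Build the same DELETE request that the curl commands send."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/{index_name}", method="DELETE")


def delete_index(base_url: str, index_name: str) -> bool:
    """Delete an index; return False if it did not exist (HTTP 404)."""
    try:
        with urllib.request.urlopen(build_delete_request(base_url, index_name)):
            return True
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise


# Example calls, using the ports from this guide:
#   delete_index("http://localhost:9200", "hybrid_search_vector")  # Elasticsearch
#   delete_index("http://localhost:9201", "hybrid_search_vector")  # OpenSearch
```

Treating a 404 as "already gone" makes the cleanup safe to repeat, which is convenient when switching models more than once.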

Chroma (File System or HTTP API Cleanup)¶

Chroma supports two deployment modes with different cleanup approaches:

Local Mode (PersistentClient) - File System Cleanup:

# Delete Chroma directory (contains all vector data)
rm -rf ./chroma_db

# Or on Windows
rmdir /s /q .\chroma_db

# Or on Windows PowerShell
Remove-Item -Path .\chroma_db -Recurse -Force

# Verify cleanup
ls -la  # Should not show chroma_db directory

HTTP Mode (HttpClient) - Using curl or Swagger API:

# List all collections
curl "http://localhost:8001/api/v2/tenants/default_tenant/databases/default_database/collections"

# Delete specific collection
curl -X DELETE "http://localhost:8001/api/v2/tenants/default_tenant/databases/default_database/collections/hybrid_search"

Via Swagger UI (http://localhost:8001/docs):

1. Find the DELETE endpoint for collections
2. Enter tenant: default_tenant
3. Enter database: default_database
4. Enter collection: hybrid_search
5. Execute

Alternative - Using Python API (for both modes):

import chromadb

# For Local Mode (PersistentClient)
client = chromadb.PersistentClient(path="./chroma_db")

# For HTTP Mode (HttpClient)
# client = chromadb.HttpClient(host="localhost", port=8001)

# Delete collection
client.delete_collection("hybrid_search")

# Verify
print(client.list_collections())  # Should not include hybrid_search

Milvus (Professional Dashboard)¶

Via Milvus Attu Dashboard (http://localhost:3003):

1. Open Attu Dashboard at http://localhost:3003
2. Navigate to Collections page
3. Find your collection (typically hybrid_search)
4. Click the "Drop" button next to the collection
5. Confirm the deletion by typing the collection name
6. Click "Drop Collection" to confirm

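Alternative - Using the Milvus REST API:

# Drop the collection over HTTP (the endpoint path and payload differ between
# Milvus versions, so check the REST reference for the version you run)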
curl -X DELETE "http://localhost:19530/v1/collection" \
  -H "Content-Type: application/json" \
  -d '{"collection_name": "hybrid_search"}'

Weaviate (Schema Management)¶

Via Weaviate Console (http://localhost:8081/console):

1. Open Weaviate Console at http://localhost:8081/console
2. Navigate to Schema section
3. Find your class (typically HybridSearch or Documents)
4. Click "Delete Class" button
5. Confirm deletion - this removes all vectors in the class

Alternative - Using Weaviate API:

# Delete entire class (removes all vectors)
curl -X DELETE "http://localhost:8081/v1/schema/HybridSearch"

PostgreSQL+pgvector (SQL-Based)¶

Via pgAdmin (http://localhost:5050):

1. Open pgAdmin at http://localhost:5050
2. Log in with admin@flexible-graphrag.com / admin
3. Connect to PostgreSQL server (postgres:5432)
4. Navigate to Tables in the database
5. Find your vector table (e.g., hybrid_search_vectors)
6. Right-click → Delete/Drop → Cascade
7. Confirm deletion

Alternative - Using SQL Commands:

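-- Delete all vectors from the table.
-- Note: if the column was created with a fixed dimension, for example vector(1536),
-- deleting rows keeps that dimension and is not enough when the new model differs.
DELETE FROM hybrid_search_vectors;

-- Or drop the entire table so it can be recreated with the new dimension
DROP TABLE IF EXISTS hybrid_search_vectors CASCADE;

-- Verify cleanup: list tables to confirm deletion.
-- \dt is a psql meta-command and must be on its own line, without a trailing comment.
\dt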

Reference: n8n Community - Deleting pgvector content

Pinecone (Cloud Console)¶

Via Pinecone Console (https://app.pinecone.io):

1. Log in to Pinecone Console at https://app.pinecone.io
2. Navigate to Indexes page from left navigation
3. Find your index (typically hybrid-search)
4. Click the three vertical dots (•••) to the right of index name
5. Select "Delete" from dropdown menu
6. Confirm deletion in the dialog box

⚠️ Warning: This is permanent and irreversible!

Note: Pinecone is a managed service - no local deletion needed.

LanceDB (File-Based Cleanup)¶

Via LanceDB Viewer (http://localhost:3005):

1. Open LanceDB Viewer at http://localhost:3005
2. Navigate to Tables section
3. Find your table (typically hybrid_search)
4. Click "Delete Table" button
5. Confirm deletion

Alternative - File System Cleanup:

# Delete LanceDB directory (contains all vector data)
rm -rf ./lancedb

# Or on Windows
rmdir /s /q .\lancedb

# Verify cleanup
ls -la  # Should not show lancedb directory
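
The file-based cleanups for Chroma (local mode) and LanceDB differ only by shell. A portable alternative is sketched below with Python's standard library; `remove_vector_dir` is a hypothetical helper, and the directory names are the ones used in this guide.

```python
import shutil
from pathlib import Path


def remove_vector_dir(path: str) -> bool:
    """Delete a local vector store directory; return True if something was removed."""
    target = Path(path)
    if not target.is_dir():
        # Nothing to clean up (already deleted or never created).
        return False
    shutil.rmtree(target)
    return True


# Example calls (run while the application is stopped):
#   remove_vector_dir("./lancedb")
#   remove_vector_dir("./chroma_db")
```

The boolean return value doubles as the verification step: a second call on the same path returns False because the directory is gone.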

🔄 Safe Migration Process¶

When switching embedding models, follow this process:

1. Backup Important Data (Optional)¶

# Export any important data before deletion
# (Implementation depends on your database)

2. Update Configuration¶

# Edit your .env file
LLM_PROVIDER=ollama  # Changing from openai to ollama
EMBEDDING_MODEL=all-minilm  # 384 dimensions

3. Clean Vector Database¶

Choose the appropriate cleanup method from above based on your vector database.

4. Restart Services¶

# Restart your application
cd flexible-graphrag
uv run start.py

5. Re-ingest Documents¶

# Re-process your documents with the new embedding model
curl -X POST "http://localhost:8000/api/ingest" \
  -H "Content-Type: application/json" \
  -d '{"data_source": "filesystem", "paths": ["./your_documents"]}'

🚨 Common Error Messages¶

Qdrant¶

Vector dimension mismatch: expected 1536, got 384

Neo4j¶

Vector index dimension (1536) does not match embedding dimension (384)

Elasticsearch/OpenSearch¶

mapper_parsing_exception: dimension mismatch
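
When these messages appear in application logs, the two dimensions can be pulled out automatically to confirm which model produced the index. This is a sketch that assumes the exact message shapes shown above; real messages vary by database version, and `parse_dimension_mismatch` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# One pattern per message shape shown above; each captures
# (index dimension, embedding dimension) in that order.
_PATTERNS = [
    re.compile(r"expected (\d+), got (\d+)"),
    re.compile(r"index dimension \((\d+)\) does not match embedding dimension \((\d+)\)"),
]


def parse_dimension_mismatch(message: str) -> Optional[Tuple[int, int]]:
    """Return (index_dim, embedding_dim) if the message reports both values."""
    for pattern in _PATTERNS:
        match = pattern.search(message)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


print(parse_dimension_mismatch("Vector dimension mismatch: expected 1536, got 384"))
```

Messages that carry no numbers, such as the Elasticsearch/OpenSearch example, yield None, in which case the index mapping has to be inspected directly.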

📋 Configuration Detection¶

The system automatically detects embedding dimensions in flexible-graphrag/factories.py:

def get_embedding_dimension(llm_provider: LLMProvider, llm_config: Dict[str, Any]) -> int:
    if llm_provider == LLMProvider.OPENAI:
        return 1536  # or 3072 for large models
    elif llm_provider == LLMProvider.OLLAMA:
        return 384  # default for all-minilm
    # ... other providers

The dimension is automatically applied to vector database configurations in config.py:

"embed_dim": 1536 if self.llm_provider == LLMProvider.OPENAI else 384

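A provider-level default can be wrong for a non-default model, for example nomic-embed-text (768 dimensions) under Ollama. The sketch below shows one way an explicit EMBEDDING_DIMENSION setting could take precedence over the provider default. It is an assumption about a sensible resolution order, not the project's actual logic; `resolve_embed_dim`, `PROVIDER_DEFAULTS`, and the choice of openai when LLM_PROVIDER is unset are all hypothetical.

```python
from typing import Mapping

# Provider defaults mirror the snippets above.
PROVIDER_DEFAULTS = {"openai": 1536, "ollama": 384}
FALLBACK_DIMENSION = 1536  # documented fallback for other providers


def resolve_embed_dim(env: Mapping[str, str]) -> int:
    """Prefer an explicit EMBEDDING_DIMENSION, else use the provider default."""
    explicit = env.get("EMBEDDING_DIMENSION", "").strip()
    if explicit:
        dim = int(explicit)  # raises ValueError for non-numeric input
        if dim <= 0:
            raise ValueError("EMBEDDING_DIMENSION must be a positive integer")
        return dim
    # Assumption: treat a missing LLM_PROVIDER as openai.
    provider = env.get("LLM_PROVIDER", "openai").strip().lower()
    return PROVIDER_DEFAULTS.get(provider, FALLBACK_DIMENSION)
```

With `{"LLM_PROVIDER": "ollama", "EMBEDDING_DIMENSION": "768"}` the function returns 768, while `{"LLM_PROVIDER": "ollama"}` alone returns the 384 default. Passing `os.environ` works as well, since it is a mapping of strings.
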
Ollama + Ladybug + vector store¶

When using Ollama embeddings with Ladybug (GRAPH_DB=ladybug) and a separate VECTOR_DB (for example Qdrant), use one embedding model end-to-end and set EMBEDDING_DIMENSION to match (for example 384 for all-minilm, 768 for nomic-embed-text). If you change embedding models or dimensions, clear the vector index data and remove or recreate the Ladybug .lbug file before re-ingesting.

Ladybug can store vectors on chunk nodes when LADYBUG_USE_VECTOR_INDEX=true; those vectors must use the same embedding model and dimension as your configured VECTOR_DB.

Best Practices¶

Plan Your Embedding Model: Choose your embedding model before ingesting large document collections
Test with Small Data: Verify compatibility with a small test dataset first
Document Your Configuration: Keep track of which embedding model you're using
Backup Strategy: Consider backup procedures if you need to preserve processed data
Environment Separation: Use different databases/collections for different embedding models
Consistent Naming: Use explicit collection/database names to avoid defaults mismatches
Ollama + Ladybug: Align embedding dimensions across Ladybug and VECTOR_DB before large ingests

Verification¶

After switching models and cleaning databases, verify the setup:

# Test with a small document
curl -X POST "http://localhost:8000/api/test-sample" \
  -H "Content-Type: application/json" \
  -d '{}'

# Check system status
curl "http://localhost:8000/api/status"

📚 Related Documentation¶

Overview - Full system setup
Neo4j Cleanup - Detailed Neo4j cleanup procedures
Docker Setup - Container-based deployment
Configuration Guide - Environment configuration
