# LLM Testing Results

Comprehensive testing results for different LLM and embedding provider combinations with GraphRAG.

## Test Summary
| LLM Provider / Model | Graph Extraction | Search with Graph | AI Query with Graph | Notes |
|---|---|---|---|---|
| OpenAI (gpt-4.1-mini recommended, gpt-4o-mini, etc.) | ✅ Full entities/relationships | ✅ Works | ✅ Works | gpt-4.1-mini recommended — zero 0-entity failures, 3-4x faster than gpt-4o-mini on multi-chunk docs (2026-03-20) |
| Azure OpenAI (gpt-4o-mini) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Same as OpenAI, enterprise hosting |
| Google Gemini (gemini-3-flash-preview, gemini-3.1-pro-preview) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Fixed 2026-03-15: AFC disabled + pydantic_program_mode=FUNCTION required. Note: gemini-3-pro-preview deprecated |
| Google Vertex AI (gemini-2.5-flash) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Fixed 2026-03-15: Same fixes as Gemini. Note: gemini-3-flash-preview not available on Vertex AI |
| Anthropic Claude (sonnet-4-5, haiku-4-5) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Fixed 2026-01-09: Event loop fix restored extraction |
| Ollama (gpt-oss:20b recommended — see section below) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Use gpt-oss:20b only; llama3.1:8b/llama3.2:3b not recommended |
| Groq (openai/gpt-oss-20b, llama-3.3-70b-versatile) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Fixed 2026-03-17: LlamaIndex assigned wrong context_window (3900) for Groq models causing finish_reason: length. Now fixed with correct context/max_tokens. Uses DynamicLLMPathExtractor. |
| Fireworks AI (gpt-oss-120b and others) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Fixed 2026-03-17: max_tokens defaulted to 256 causing silent truncation. Now set to 32768. Uses DynamicLLMPathExtractor. |
| Amazon Bedrock (claude-sonnet, gpt-oss, deepseek-r1, llama3) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Auto-switches to DynamicLLMPathExtractor; use_ontology=True + DISABLE_PROPERTIES=true confirmed working (33 nodes/59 rels, both docs) |
| OpenAI-Like (any OpenAI-compatible API — Ollama /v1, vLLM Docker) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Added 2026-03-19: Auto-switches to DynamicLLMPathExtractor. Use for vLLM Docker (localhost:8002) or Ollama /v1 endpoint. |
| vLLM (Docker container, any HuggingFace model) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Added 2026-03-19: Use LLM_PROVIDER=openai_like pointing at vLLM Docker port 8002. LLM_PROVIDER=vllm (Python package) Linux/macOS only, untested. |
| LiteLLM (proxy for 100+ providers) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Added 2026-03-19: Cloud models (gpt-4o-mini) use SchemaLLMPathExtractor; Ollama-backed models auto-switch to DynamicLLMPathExtractor. Proxy runs in WSL2 on Windows. |
| OpenRouter (unified API, 200+ models) | ✅ Full entities/relationships | ✅ Works | ✅ Works | Added 2026-03-19: Auto-switches to DynamicLLMPathExtractor (OpenRouter extends OpenAILike, tool-calling incompatible). Requires context_window=128000 + max_tokens=16384. |
## Graph Database Compatibility

All graph databases (Neo4j, Ladybug, FalkorDB, ArcadeDB, MemGraph, etc.) show identical behavior patterns based on the LLM provider used, not the database choice:

- OpenAI/Azure OpenAI: Full graph extraction with entities and relationships works on all databases
- Google Gemini: Full graph extraction works with `use_ontology=True` (AFC disabled + `pydantic_program_mode=FUNCTION` fixes applied 2026-03-15)
- Google Vertex AI: Full graph extraction works — benchmarks run with `use_ontology=False`, needs retest with `use_ontology=True`
- Anthropic Claude (direct): Full graph extraction works — benchmarks run with `use_ontology=False`, needs retest with `use_ontology=True`
- Bedrock: `DynamicLLMPathExtractor` confirmed working after LlamaIndex bug fixes — `use_ontology=True` + `DISABLE_PROPERTIES=true` gives best results (33 nodes/59 rels both docs)
- Groq: `DynamicLLMPathExtractor` confirmed working after context_window fix (2026-03-17) — 60 nodes/137 rels both docs in 14s
- Fireworks: `DynamicLLMPathExtractor` — max_tokens fix applied (2026-03-17); retest pending
## Google Gemini Models

### Tested Models

- ✅ `gemini-3-flash-preview` — Full graph building, search and query work (fast, recommended)
- ✅ `gemini-3.1-pro-preview` — Full graph building, search and query work (pro quality)
- ✅ `gemini-2.5-flash` — Full graph building, search and query work
- ⚠️ `gemini-3-pro-preview` — Full graph building, search and query work — deprecated, use `gemini-3.1-pro-preview`

### Embedding

- ✅ `gemini-embedding-2-preview` (768 dims) — current recommended Google embedding model
- ❌ `text-embedding-004` — deprecated 2026-01-14
### Known Issues

- ✅ Fixed 2026-01-08: Graph search/query async event loop conflicts resolved
- ✅ Fixed 2026-03-14: AFC (Automatic Function Calling) disabled in `factories.py` — AFC was intercepting `SchemaLLMPathExtractor` tool calls before Gemini could respond, causing 0 entities/relationships to be extracted. Log signature: `AFC is enabled with max remote calls: 10.` followed immediately by `Total: 0 entities, 0 relations`.
- ✅ Fixed 2026-03-15: `pydantic_program_mode=FUNCTION` now explicitly passed to `GoogleGenAI` — previously omitted, causing `DEFAULT` (JSON schema) mode, which silently returned 0 entities in 5ms when `use_ontology=True`. Both fixes together are required for full entity/relationship extraction.
- Gemini creates full knowledge graphs with proper entities and relationships on all graph databases
- Search and AI query now work correctly with graph enabled
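The AFC failure has a distinctive log signature (quoted above). A small diagnostic helper can scan a log for it; the function name and two-line heuristic are ours for illustration, not part of flexible-graphrag:

```python
# Hypothetical log check for the AFC-interception failure mode: the AFC
# banner immediately followed by a zero-entity total suggests the bug that
# was fixed 2026-03-14 (AFC intercepting extractor tool calls).

def looks_like_afc_interception(log_lines: list[str]) -> bool:
    for prev, cur in zip(log_lines, log_lines[1:]):
        if ("AFC is enabled with max remote calls" in prev
                and "Total: 0 entities, 0 relations" in cur):
            return True
    return False
```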
### Benchmark Results — gemini-3-flash-preview, SchemaLLMPathExtractor, use_ontology=True
Hardware: AMD 9950X3D, NVIDIA RTX 5090, 128GB RAM, Windows 11
Storage mode: ingestion_storage_mode=both (Neo4j + GraphDB in single pass)
Embedding: gemini-embedding-2-preview (768 dims)
Ontology: company_classes.ttl, company_properties.ttl, common_ontology.ttl
| Doc | Neo4j Nodes | Neo4j Rels | GraphDB Rows | Time |
|---|---|---|---|---|
| cmispress.txt | 22 | 41 | 209 | 24.80s |
| Both docs total | 58 | 123 | 551 | +47.34s (company-ontology-test.txt) |
Performance breakdown (company-ontology-test.txt): Pipeline: 1.06s, Vector: 0.24s, Graph: 46.00s
### Sample Q&A Result
Query: "Who works at Acme?" — Returns names + roles + departments:
Sarah Chen (Senior Software Engineer, Engineering), Marcus Rivera (Software Engineer, Engineering), Priya Patel (Software Engineer, Engineering), James Okafor (Head of Product, Product Management), Linda Torres (Account Executive, Sales)
## Google Vertex AI Models

### Tested Models

- ✅ `gemini-2.5-flash` — Full graph building, search and query work (confirmed 2026-03-15)
- ⚠️ `gemini-3-flash-preview` — Not available on Vertex AI (use `gemini-2.5-flash` instead)
### Configuration Notes

- Uses the modern `google-genai` package with the `GoogleGenAI` and `GoogleGenAIEmbedding` classes
- Requires the `vertexai_config` parameter with project and location
- Cleaned-up implementation — removed the deprecated `llama-index-llms-vertex` and `llama-index-embeddings-vertex` packages
- Environment variables (`GOOGLE_GENAI_USE_VERTEXAI`, `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`) are set automatically by the code
### Known Issues

- ✅ Fixed 2026-01-08: Graph search/query async event loop conflicts resolved (same fix as Gemini)
- ✅ Fixed 2026-03-15: AFC disabled + `pydantic_program_mode=FUNCTION` explicitly passed — same root cause as Gemini (see Gemini Known Issues)
- Vertex AI creates full knowledge graphs with proper entities and relationships on all graph databases
- Search and AI query now work correctly with graph enabled
### Benchmark Results — gemini-2.5-flash, SchemaLLMPathExtractor, use_ontology=False ⚠️ needs retest with use_ontology=True
Hardware: AMD 9950X3D, NVIDIA RTX 5090, 128GB RAM, Windows 11
Storage mode: ingestion_storage_mode=both (Neo4j + GraphDB in single pass)
Embedding: gemini-embedding-2-preview via Vertex AI (768 dims)
Ontology: company_classes.ttl, company_properties.ttl, common_ontology.ttl (loaded in store but use_ontology=False — ontology not used for extraction)
| Doc | Neo4j Nodes | Neo4j Rels | GraphDB Rows | Time |
|---|---|---|---|---|
| cmispress.txt | 26 | 45 | 245 | 26.01s |
| Both docs total | 61 | 124 | 582 | +34.65s (company-ontology-test.txt) |
Performance breakdown (company-ontology-test.txt): Pipeline: 1.07s, Vector: 0.52s, Graph: 33.02s
### Sample Q&A Result
Query: "Who works at Acme?" — Returns names only (vs. Gemini direct which returns names + roles + departments)
## Anthropic Claude Models

### Tested Models

- ✅ `claude-sonnet-4-5-20250929` — Full graph building with entities/relationships; search and AI query work
- ✅ `claude-haiku-4-5-20251001` — Full graph building with entities/relationships; search and AI query work
### Known Issues

- ✅ Fixed 2026-01-09: Event loop fix restored full entity/relationship extraction (was previously showing "chunk nodes only")
- Both models now create proper entities and relationships with the async event loop fix
### Benchmark Results — claude-sonnet-4-5-20250929, SchemaLLMPathExtractor, use_ontology=False ⚠️ needs retest with use_ontology=True
Hardware: AMD 9950X3D, NVIDIA RTX 5090, 128GB RAM, Windows 11
Storage mode: ingestion_storage_mode=both (Neo4j + GraphDB in single pass)
Embedding: text-embedding-3-small (OpenAI, 1536 dims)
Ontology: company_classes.ttl, company_properties.ttl, common_ontology.ttl (loaded in store but use_ontology=False — ontology not used for extraction)
| Doc | Neo4j Nodes | Neo4j Rels | GraphDB Rows | Time |
|---|---|---|---|---|
| cmispress.txt | 26 | 45 | 245 | 21.00s |
| Both docs total | 53 | 116 | 503 | +13.76s (company-ontology-test.txt) |
### Sample Q&A Result
Query: "Who works at Acme?" — Returns names + roles + departments:
Sarah Chen (Senior Software Engineer, Engineering), Marcus Rivera (Software Engineer, Engineering), Priya Patel (Software Engineer, Engineering), James Okafor (Head of Product, Product Management), Linda Torres (Account Executive, Sales)
## Groq Models

### Tested Models

- ✅ `openai/gpt-oss-20b` — Full graph extraction (60 nodes, 137 rels, both docs) — Fixed 2026-03-17
- ✅ `llama-3.3-70b-versatile` — Full graph extraction — same fix applies
- ✅ `openai/gpt-oss-120b` — Expected to work (same fix; untested)
- ✅ `llama-3.1-8b-instant` — Expected to work (same fix; untested)
### Benchmark — llama-3.3-70b-versatile, DynamicLLMPathExtractor, use_ontology=False (2026-03-17)
Test conditions: DynamicLLMPathExtractor (auto-switched from schema), auto extraction mode, text-embedding-3-small (1536-dim), Neo4j only, no ontology, no properties.
| Document | Time | Extracted | Neo4j display | Chunks |
|---|---|---|---|---|
| cmispress.txt (2480 chars) | 6.60s | 40 entities, 20 rels | 24 nodes, 43 rels | 1 |
| company-ontology-test.txt (5618 chars, 2nd doc) | 7.56s | 104 entities, 52 rels | +36 nodes, +94 rels | 2 |
| Both docs combined (Neo4j) | 14.16s | — | 60 nodes, 137 rels | 3 |
Log source: flexible-graphrag-api-20260317-030532.log
### Benchmark — openai/gpt-oss-20b, DynamicLLMPathExtractor, use_ontology=True, properties on, function mode (2026-03-17)
Test conditions: DynamicLLMPathExtractor, function extraction mode, text-embedding-3-small, Neo4j + GraphDB, ontology: 16 entity types / 22 relation types / 13 entity props / 4 relation props.
| Document | Time | Extracted | Neo4j display | Chunks |
|---|---|---|---|---|
| cmispress.txt (2480 chars) | 14.50s | 40 entities, 20 rels | 23 nodes, 42 rels | 1 |
| company-ontology-test.txt (5618 chars, 2nd doc) | 15.84s | 40 entities, 20 rels (14+26, capped) | +20 nodes, +39 rels | 2 |
| Both docs combined (Neo4j) | 30.34s | — | 43 nodes, 81 rels | 3 |
Log source: flexible-graphrag-api-20260317-053953.log
Note: Both runs had DISABLE_PROPERTIES=true. The ontology+properties run used gpt-oss-20b (different model) vs llama-3.3-70b-versatile in the no-ontology run, so the drop (43 vs 60 nodes, 81 vs 137 rels) reflects a combination of model difference, ontology constraints, and properties overhead. Next test: gpt-oss-20b with use_ontology=True + DISABLE_PROPERTIES=true to isolate the model vs ontology factors.
### Configuration Notes

- Ultra-fast LPU (Language Processing Unit) architecture for low-latency inference
- All Groq models have a 131,072-token context window — LlamaIndex doesn't know Groq-specific models and defaults to 3900 tokens. flexible-graphrag overrides this automatically.
- Auto-switches to `DynamicLLMPathExtractor` — `SchemaLLMPathExtractor` is broken in LlamaIndex 0.14.x (tool_choice conflict). `DynamicLLMPathExtractor` now works after the context_window fix.
- `is_function_calling_model=False` is set on the extractor's LLM instance — `DynamicLLMPathExtractor` uses `apredict()` (plain text), not tool calls
- Search and AI query work correctly with graph enabled
### Known Issues

- ✅ Fixed 2026-03-17: `DynamicLLMPathExtractor` returned 0 entities. Root cause: LlamaIndex's `openai_modelname_to_contextsize()` does not know Groq models → falls back to `context_window=3900` → `max_tokens` is calculated as `3900 - input_tokens ≈ 0` → model hits the token limit immediately → `finish_reason: length` → empty `content` → `apredict()` returns `''`. Fixed by passing the correct `context_window` and `max_tokens` per model in `factories.py`.
- ❌ Still broken: `SchemaLLMPathExtractor` — `OpenAI.astructured_predict` injects `tool_choice="required"`, which `FunctionCallingProgram.acall` strips → model returns plain text → silent 0 entities. Auto-switches to Dynamic.
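The context-window arithmetic behind the fixed bug is easy to see in isolation. This is an illustrative sketch using the numbers from the root-cause analysis above; the helper function is ours, not LlamaIndex's:

```python
# With the wrong 3900-token default, almost nothing is left for the
# completion after a typical extraction prompt, so the model stops
# immediately with finish_reason: length and an empty response.

def available_completion_tokens(context_window: int, input_tokens: int) -> int:
    return max(context_window - input_tokens, 0)

# Assumed typical extraction prompt for one chunk: ~3800 input tokens.
wrong = available_completion_tokens(3900, 3800)    # near-zero budget
fixed = available_completion_tokens(131072, 3800)  # ample room for output
```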
## Fireworks AI Models

### Tested Models (serverless + function calling)

- ✅ `accounts/fireworks/models/gpt-oss-120b` — Full graph extraction confirmed — Fixed 2026-03-17
- ✅ `accounts/fireworks/models/deepseek-v3p2` — Expected to work (same fix; untested post-fix)
- ✅ `accounts/fireworks/models/minimax-m2p5` — Expected to work (same fix; untested post-fix)
- ✅ `accounts/fireworks/models/kimi-k2p5` — Expected to work (same fix; untested post-fix)
- ✅ `accounts/fireworks/models/qwen3-vl-30b-a3b-thinking` — Expected to work (same fix; untested post-fix)
### Benchmark — accounts/fireworks/models/gpt-oss-120b, DynamicLLMPathExtractor, use_ontology=True, properties on, function mode (2026-03-17)
Test conditions: DynamicLLMPathExtractor (auto-switched), function extraction mode, text-embedding-3-small, Neo4j + GraphDB, ontology: 16 entity types / 22 relation types / 13 entity props / 4 relation props, streaming mode (_FireworksStreaming), max_tokens=16384.
| Document | Time | Neo4j display | Chunks |
|---|---|---|---|
| cmispress.txt (2480 chars) | 21.34s | 22 nodes, 41 rels | 1 |
| company-ontology-test.txt (5618 chars, 2nd doc) | 34.49s | +25 nodes, +60 rels | 2 (parallel) |
| Both docs combined (Neo4j) | 55.83s | 47 nodes, 101 rels | 3 |
Log source: flexible-graphrag-api-20260317-075958.log
Notes:
- All chunks extract fully (40/20 and 30/15 entities/rels per chunk); finish_reason=stop throughout
- Run-to-run variance of ~20-30% in final Neo4j counts is normal LLM non-determinism at temperature=0.1
### Configuration Notes

- Supports a wide model selection (Meta, Qwen, Mistral AI, DeepSeek, OpenAI GPT-OSS, Kimi, GLM, MiniMax)
- Does NOT support the `timeout` parameter (overrides `__init__` without including a timeout parameter)
- Auto-switches to `DynamicLLMPathExtractor` — `SchemaLLMPathExtractor` throws code exceptions
- All tested models support function calling on Fireworks serverless
- `_custom_is_function_calling=False` is set on the extractor's LLM instance so `apredict()` receives plain-text responses instead of tool calls
- All models: 131,072-token context window; the non-streaming API caps `max_tokens` at 4096
### Known Issues

- ✅ Fixed 2026-03-17: the `max_tokens=256` default caused silent truncation → 0 entities. Fixed by a `_FireworksStreaming` subclass overriding `_achat` to use `stream=True` (bypasses the non-streaming 4096 cap). Default `max_tokens=16384`.
- ❌ Still broken: `SchemaLLMPathExtractor` — throws code exceptions (tool schema incompatibility). Auto-switches to Dynamic.
## Amazon Bedrock Models

### Tested Models
Full Graph Extraction:
- ✅ us.anthropic.claude-sonnet-4-5-20250929-v1:0 - Full graph with entities and relationships
- ✅ openai.gpt-oss-20b-1:0 - Full graph with entities and relationships
- ✅ openai.gpt-oss-120b-1:0 - Full graph with entities and relationships
- ✅ us.deepseek.r1-v1:0 - Full graph with entities and relationships
- ✅ us.meta.llama3-3-70b-instruct-v1:0 - Full graph with entities and relationships
Needs Retesting (Previously Crashed with SchemaLLMPathExtractor):
- ⚠️ amazon.nova-pro-v1:0 - Previously: ModelErrorException: invalid ToolUse sequence (needs retest with SimpleLLMPathExtractor)
- ⚠️ us.amazon.nova-premier-v1:0 - Previously: ModelErrorException: invalid ToolUse sequence (needs retest with SimpleLLMPathExtractor)
- ⚠️ us.meta.llama3-1-70b-instruct-v1:0 - Previously: ValidationException: toolConfig.toolChoice.any not supported (needs retest with SimpleLLMPathExtractor)
- ⚠️ us.meta.llama3-2-90b-instruct-v1:0 - Previously: ValidationException: toolConfig.toolChoice.any not supported (needs retest with SimpleLLMPathExtractor)
- ⚠️ us.meta.llama4-maverick-17b-instruct-v1:0 - Previously: ValidationException: toolConfig.toolChoice.any not supported (needs retest with SimpleLLMPathExtractor)
- ⚠️ us.meta.llama4-scout-17b-instruct-v1:0 - Previously: ValidationException: toolConfig.toolChoice.any not supported (needs retest with SimpleLLMPathExtractor)
### Configuration Notes

- Successfully switched from the deprecated `llama-index-llms-bedrock` package to the modern `llama-index-llms-bedrock-converse` package
- Authentication: uses AWS IAM admin credentials (NOT Bedrock API keys)
- Embeddings: `amazon.titan-embed-text-v2:0` (1024 dims) works; `text-embedding-3-small` (OpenAI, 1536 dims) is also usable with a Bedrock LLM
- Cross-region inference profiles: most models require the "us." prefix (e.g., `us.anthropic.claude-*`, `us.meta.llama*`, `us.deepseek.*`, `us.amazon.nova-premier-*`)
- No prefix needed: OpenAI GPT-OSS models and `amazon.nova-pro-v1:0` use standard model IDs without the "us." prefix
- Auto-switches to `DynamicLLMPathExtractor` — the Bedrock Converse API rejects `SchemaLLMPathExtractor`'s tool schema (`toolConfig.toolChoice.any` not supported). `pydantic_program_mode=FUNCTION` is set but insufficient — it's a Converse API constraint.
- No switching for simple: if you configure `KG_EXTRACTOR_TYPE=simple`, your choice is used as-is
### Known Issues

- ✅ Fixed 2026-01-21: Auto-switch `schema` → `dynamic` for Bedrock (correct — Converse API limitation)
- ✅ Fixed 2026-03-16: `DynamicLLMPathExtractor` now works correctly — two LlamaIndex bugs fixed: (1) props `None` → `[]` coercion causing `_apredict_with_props` to be called with empty prompt sections; (2) `SafeFormatter` not resolving `{{` → `{` in the prompt template, causing the model to receive malformed JSON examples. Both fixed in `_make_dynamic_extractor()`.
- Search and AI query work correctly with graph enabled
### Benchmark Results — us.anthropic.claude-sonnet-4-5-20250929-v1:0, Dynamic extractor
Hardware: AMD 9950X3D, NVIDIA RTX 5090, 128GB RAM, Windows 11
Storage mode: ingestion_storage_mode=both (Neo4j + GraphDB in single pass)
Run 1 — Embedding: amazon.titan-embed-text-v2:0 (Bedrock, 1024 dims), use_ontology=False
| Doc | Neo4j Nodes | Neo4j Rels | GraphDB Rows | Time |
|---|---|---|---|---|
| cmispress.txt | 13 | 22 | — | 13.25s |
Run 2 — Embedding: text-embedding-3-small (OpenAI, 1536 dims), use_ontology=False
| Doc | Neo4j Nodes | Neo4j Rels | GraphDB Rows | Time |
|---|---|---|---|---|
| cmispress.txt | 12 | 21 | 109 | 8.81s |
| Both docs total | 38 | 68 | 346 | +10.17s (company-ontology-test.txt) |
Note (Run 2): use_ontology=False with SCHEMA_NAME=default still uses LlamaIndex's built-in internal node/relation type schema for DynamicLLMPathExtractor — it provides entity and relation type guidance (e.g. PERSON, ORGANIZATION, LOCATION, WORKS_FOR etc.) from LlamaIndex's SAMPLE_SCHEMA defaults. This is why Run 2 gives high counts — it has type guidance without any ontology file overhead.
Run 3 — Embedding: text-embedding-3-small, use_ontology=False, DISABLE_PROPERTIES=true (2026-03-16, after LlamaIndex bug fixes)
| Doc | Neo4j Nodes | Neo4j Rels | GraphDB Rows | Time |
|---|---|---|---|---|
| cmispress.txt | 10 | 17 | 89 | ~6s |
Note (Run 3): Lower than Run 2 — this run had a single doc without the second doc's additive Neo4j dedup benefit. Double-brace prompt fix confirmed working. Log: flexible-graphrag-api-20260316-013826.log
Run 4 — use_ontology=True, DISABLE_PROPERTIES=true, text-embedding-3-small (2026-03-16, after LlamaIndex bug fixes)
| Doc | Extracted Entities | Extracted Rels | Neo4j Nodes | Neo4j Rels | GraphDB Rows | Time |
|---|---|---|---|---|---|---|
| cmispress.txt | 18 | 9 | 11 | 19 | 99 | ~5s |
| company-ontology-test.txt | 34 (2 chunks) | 17 | +22 | +40 | 197 | ~6s |
| Both docs total | 52 | 26 | 33 | 59 | 296 | — |
Note (Run 4): use_ontology=True + DISABLE_PROPERTIES=true gives 33 nodes/59 rels — slightly lower than use_ontology=False Run 2 (38/68) because the company ontology types are more specific/restrictive than LlamaIndex's broad built-in schema (PERSON, ORG, LOCATION etc.). The ontology gives domain-correct typed labels (EMPLOYEE, COMPANY, DOCUMENT) while the built-in schema gives higher raw counts with generic labels. Trade-off: ontology = more accurate types, built-in schema = higher raw counts. Log: flexible-graphrag-api-20260316-020217.log
Run 5 — use_ontology=True, DISABLE_PROPERTIES=false (properties enabled), text-embedding-3-small (2026-03-16)
| Doc | Extracted Entities | Extracted Rels | Neo4j Nodes | Neo4j Rels | GraphDB Rows | Time |
|---|---|---|---|---|---|---|
| cmispress.txt | 10 | 5 | 8 | 12 | 68 | ~9s |
| company-ontology-test.txt | 10 (2 chunks) | 5 | +8 | +12 | 71 | ~7s |
| Both docs total | 20 | 10 | 16 | 24 | 139 | — |
Note (Run 5): Enabling ontology properties (DISABLE_PROPERTIES=false) gives worse results than disabling them (16 nodes/24 rels vs. 33 nodes/59 rels in Run 4). The _apredict_with_props prompt adds property constraints that reduce the model's extraction breadth — Claude focuses on filling property fields rather than finding all entity/relation pairs. Recommendation: use DISABLE_PROPERTIES=true with use_ontology=True for best Bedrock results. Log: flexible-graphrag-api-20260316-031613.log
### Bedrock Recommended Configuration

```
USE_ONTOLOGY=true
DISABLE_PROPERTIES=true
BEDROCK_MODEL=us.anthropic.claude-sonnet-4-5-20250929-v1:0
EMBEDDING_KIND=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
```
## Graph-Only Mode Testing (No separate Vector / Search databases)
Testing with OpenAI LLM, graph database only (no Elasticsearch, no Qdrant):
| Graph Database | Graph Creation | Search | AI Query | Notes |
|---|---|---|---|---|
| Neo4j | ✅ Full graph | ✅ Works (high scores, structured results) | ✅ Works | Recommended |
| FalkorDB | ✅ Full graph | ✅ Works (high scores, structured results) | ✅ Works | Fully functional |
| ArcadeDB | ✅ Full graph | ❌ No results | ❌ Empty response | Needs vector support update |
Key Findings:

- Neo4j and FalkorDB work perfectly in graph-only mode with OpenAI, providing high-confidence search results (0.791 scores) with extracted relationship triplets and source text
- ArcadeDB creates full graphs but cannot perform graph-based retrieval without external vector/search databases
- TODO: Update the ArcadeDB LlamaIndex PropertyGraphStore integration to leverage ArcadeDB's native vector support capabilities
## Multi-Database Graph Compatibility
Testing with OpenAI + full database stack (vector + search + graph):
| LLM Provider | Neo4j | FalkorDB | ArcadeDB | Result |
|---|---|---|---|---|
| OpenAI | ✅ Full graph + search | ✅ Full graph + search | ✅ Full graph + search | All work identically |
| Azure OpenAI | ✅ Full graph + search | ✅ Full graph + search | ✅ Full graph + search | All work identically |
| Groq (llama-3.3-70b-versatile) | ✅ Full graph + search | ✅ Full graph + search | ✅ Full graph + search | Fixed 2026-03-17: context_window + max_tokens |
| Gemini | ✅ Graph + Search | ✅ Graph + Search | ✅ Graph + Search | Fixed 2026-01-08 |
| Vertex AI | ✅ Graph + Search | ✅ Graph + Search | ✅ Graph + Search | Fixed 2026-01-08 |
Conclusion: Graph database choice doesn't affect LLM compatibility; the pattern is determined by the LLM provider's integration with LlamaIndex PropertyGraph.
## OpenAI Models

### Tested Models

- ✅ `gpt-4.1-mini` — Recommended — Full graph extraction, fastest and most consistent of all tested providers (2026-03-20)
- ✅ `gpt-4o-mini` — Full graph extraction with entities/relationships; search and AI query work
- ✅ `gpt-4o` — Full graph extraction (untested in benchmark but expected equivalent to gpt-4o-mini)
### gpt-4.1-mini — Detailed Benchmark (2026-03-20) ⭐ Best OpenAI result
Test conditions: SchemaLLMPathExtractor + function mode (tool calling), nomic-embed-text via Ollama openai_like embedding (768-dim), use_ontology=True. Ontology: 3 files → 16 entity types, 22 relation types. Hardware: AMD 9950X3D, NVIDIA RTX 5090, 128 GB RAM, Windows 11.
| Document | Time | Pipeline | Graph | Node 0 entities | Node 1 entities | Neo4j Nodes | Neo4j Rels |
|---|---|---|---|---|---|---|---|
| cmispress.txt (2480 chars) | 17.76s | 2.25s | 14.59s | 40 | — | 21 | 40 |
| company-ontology-test.txt (5618 chars, 2nd doc) | 20.36s | 2.52s | 17.56s | 40 | 40 | +26 | +74 |
| Both docs combined (Neo4j) | 38s | — | — | — | — | 47 | 114 |
Log source: flexible-graphrag-api-20260320-012112.log
Why gpt-4.1-mini outperforms gpt-4o-mini:

- Zero 0-entity failures — both chunks of company-ontology-test.txt returned exactly 40 entities each. gpt-4o-mini regularly returns 0 entities on the large 3782-char chunk (non-determinism with the complex tool-call schema)
- 3-4x faster on the 2-chunk doc — 20s vs gpt-4o-mini's 47-93s range
- Better tool-call schema following for large ontology prompts (13 entity props + 4 relation props)
### gpt-4o-mini — Detailed Benchmark (2026-03-14)
Test conditions: SchemaLLMPathExtractor + function mode (tool calling), text-embedding-3-small (1536-dim), OLLAMA_CONTEXT_LENGTH n/a, ingestion_storage_mode=both — Neo4j and GraphDB RDF store written in the same single ingestion pass. Ontology: 3 files → 16 entity types, 22 relation types. Hardware: AMD 9950X3D, NVIDIA RTX 5090, 128 GB RAM, Windows 11.
| Document | Time | Pipeline | Graph | Extracted | RDF built | GraphDB stored | Neo4j display | Chunks |
|---|---|---|---|---|---|---|---|---|
| company-ontology-test.txt (5618 chars) | 39.68s | 5.03s | 33.76s | 74 ent (34+40), 37 rel (17+20) | 283 triples (+10 annot.) | ~67 | 26 nodes, 71 rels | 2 |
| cmispress.txt (2480 chars, 2nd doc) | 17.35s | 1.15s | 15.96s | 38 ent, 19 rel | 171 triples (+1 annot.) | ~51 | +18 nodes, +36 rels | 1 |
| Both docs combined (Neo4j) | 57s | — | — | — | — | — | 44 nodes, 107 rels | 3 |
| Both docs combined (GraphDB) | — | — | — | — | — | ~118 stored / 481 rows | — | — |
Log source: flexible-graphrag-api-20260314-124435.log
### Configuration Notes

- Embedding: `text-embedding-3-small` (1536 dimensions) — OpenAI default
- `SchemaLLMPathExtractor` with `strict=False` (schema + open) — uses the ontology but allows additional types
- Notable: gpt-4o-mini extracted LATITUDE/LONGITUDE properties on location entities (e.g. Austin: 30.2672, -97.7431)
### Search & Q&A Test Results (2026-03-14, same session, both docs ingested)
| Mode | Query | Time | Results | Notes |
|---|---|---|---|---|
| Hybrid search | "who was first with cmis" | 1.85s | 3 (deduped from 4) | Consistent across 2 runs |
| Hybrid search | "who was first with cmis" | 1.46s | 3 (deduped from 4) | 2nd run, same results |
| Q&A | "who was first with cmis" | 2.64s | 162-char answer | Answer generated successfully |
| Q&A | "who works at acme" | 2.69s | 109-char answer | Answer generated successfully |
| Hybrid search | "who works at acme" | 0.82s | 2 (deduped from 3) | 1 result filtered by score |
Search and Q&A both confirmed working with OpenAI gpt-4o-mini + Neo4j graph + Elasticsearch + Qdrant stack.
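The "deduped from 4" and "filtered by score" notes in the table imply a two-step post-processing of search hits. A hedged sketch of that idea — the dict shape, field names, and threshold here are our assumptions, not flexible-graphrag's actual API:

```python
# Illustrative result post-processing: keep the best-scoring hit per source,
# drop hits below a minimum score. Field names and threshold are invented.

def postprocess(hits: list[dict], min_score: float = 0.5) -> list[dict]:
    seen, out = set(), []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["score"] < min_score or hit["source"] in seen:
            continue  # score-filtered, or duplicate of a better hit
        seen.add(hit["source"])
        out.append(hit)
    return out
```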
## Azure OpenAI

### Tested Models

- ✅ `gpt-4o-mini` — Full GraphRAG functionality (graph building with entities/relationships, search, and AI query all work)
### gpt-4o-mini — Detailed Benchmark (2026-03-14)
Test conditions: SchemaLLMPathExtractor + function mode (tool calling), AzureOpenAIEmbedding text-embedding-3-small (1536-dim), ingestion_storage_mode=both — Neo4j and GraphDB RDF store written in the same single ingestion pass. Ontology: 3 files → 16 entity types, 22 relation types. Hardware: AMD 9950X3D, NVIDIA RTX 5090, 128 GB RAM, Windows 11.
| Document | Time | Pipeline | Graph | Extracted | RDF built | GraphDB stored | Neo4j display | Chunks |
|---|---|---|---|---|---|---|---|---|
| company-ontology-test.txt (5618 chars) | 15.04s | 2.06s | 12.12s | 74 ent (36+38), 37 rel (18+19) | 244 triples (+5 annot.) | ~59 | 23 nodes, 64 rels | 2 |
| cmispress.txt (2480 chars, 2nd doc) | 128.79s | 0.34s | 128.20s | 38 ent, 19 rel | 144 triples (+1 annot.) | ~48 | +15 nodes, +33 rels | 1 |
| Both docs combined (Neo4j) | 144s | — | — | — | — | — | 38 nodes, 97 rels | 3 |
| Both docs combined (GraphDB) | — | — | — | — | — | ~107 stored / 401 rows | — | — |
Log source: flexible-graphrag-api-20260314-134207.log
Note on cmispress.txt 128s: The graph phase took 128.20s despite only extracting 1 chunk. Log line 439 shows Retrying request to /chat/completions in 0.422565 seconds — Azure OpenAI rate-limited or had a transient error mid-extraction, causing ~2 minutes of retry waiting. The actual extraction (38 entities, 19 rels) and graph insert (1.1s) are fast; this is Azure throttling, not a pipeline issue.
Also noted: Azure OpenAI embedding_kind warning — `No embedding_kind specified, defaulting to 1536`. Set `EMBEDDING_KIND=azure` in `.env` to suppress it.
### Search & Q&A Test Results (2026-03-14, same session, both docs ingested)
| Mode | Query | Time | Results | Notes |
|---|---|---|---|---|
| Hybrid search | "who was first with cmis" | 0.95s | 3 (deduped from 4) | |
| Q&A | "who was first with cmis" | 1.88s | 184-char answer | |
| Q&A | "who works for acme" | 1.68s | 119-char answer | |
| Hybrid search | "who works for acme" | 0.55s | 3 (deduped from 4, 1 score-filtered) | |
Search and Q&A confirmed working with Azure OpenAI gpt-4o-mini.
### Configuration Notes

- Requires `EMBEDDING_KIND=azure` to use Azure-hosted embeddings and suppress the dimension warning
- Can also use `EMBEDDING_KIND=openai` with a separate OpenAI API key
## Ollama Models

### Tested Models

| Model | Status | Notes |
|---|---|---|
| `gpt-oss:20b` | ✅ RECOMMENDED | Best speed/quality balance — use this |
| `llama3.1:8b` | ❌ NOT RECOMMENDED | Very slow; second doc in same session hangs |
| `llama3.2:3b` | ❌ NOT RECOMMENDED | Stalls at 8192 context on GPU; falls back to 4096 — low entity count |
| `sciphi/triplex` | ❌ Not usable | Extracts 0 entities/relationships, extremely slow |
### gpt-oss:20b — Detailed Benchmarks (2026-03-14)
Common conditions: gpt-oss:20b, nomic-embed-text embeddings, OLLAMA_CONTEXT_LENGTH=8192, OLLAMA_NUM_PARALLEL=4, ingestion_storage_mode=both — Neo4j and GraphDB RDF store written in the same single ingestion pass. Ontology: 3 files (company_classes.ttl, company_properties.ttl, common_ontology.ttl) → 16 entity types, 22 relation types. Hardware: AMD 9950X3D, NVIDIA RTX 5090, 128 GB RAM, Windows 11.
#### KG_EXTRACTOR_TYPE=function (SchemaLLMPathExtractor, function/tool calling mode)
| Document | Time | Pipeline | Graph | Neo4j Nodes | Neo4j Rels | GraphDB RDF | Chunks |
|---|---|---|---|---|---|---|---|
| company-ontology-test.txt (5618 chars) | 45s | — | — | 35 | 96 | — | 2 |
| cmispress.txt (2nd doc, same session) | 13s | — | — | +32 | +46 | — | 1 |
| Both docs combined | 58s | — | — | 67 | 142 | 675 rows | 3 |
#### KG_EXTRACTOR_TYPE=dynamic (DynamicLLMPathExtractor, ontology-guided)
| Document | Time | Pipeline | Graph | Extracted entities | Extracted rels | RDF triples built | GraphDB stored | Chunks |
|---|---|---|---|---|---|---|---|---|
| company-ontology-test.txt (5618 chars) | 65.86s | 2.53s | 62.12s | 54 (14+40) | 27 (7+20) | 180 | ~23 | 2 |
| cmispress.txt (2480 chars, 2nd doc) | 45.53s | 2.03s | 43.25s | 40 | 20 | 178 (+4 annot.) | ~55 | 1 |
| Both docs combined (Neo4j) | 111s | — | — | — | — | — | — | 3 |
| Both docs combined (Neo4j display) | — | — | — | 38 nodes | 88 rels | — | — | — |
| Both docs combined (GraphDB) | — | — | — | — | — | — | 366 rows | — |
Log source: flexible-graphrag-api-20260314-103941.log
Extractor comparison — company-ontology-test.txt alone:
| Extractor | Time | Neo4j nodes | Neo4j rels | GraphDB rows |
|---|---|---|---|---|
| `function` (SchemaLLMPathExtractor) | 45s | 35 | 96 | — |
| `dynamic` (DynamicLLMPathExtractor) | 66s | 19 | 50 | 180 |
`function` mode is faster and produces more Neo4j relationships. `dynamic` mode is slower but uses ontology guidance for entity typing and also produces RDF annotation triples.
llama3.1:8b — Why Not Recommended¶
- ~240s per document (5x slower than `gpt-oss:20b`) on the same hardware (AMD 9950X3D, RTX 5090, 128 GB RAM)
- Second document in the same session frequently hangs (likely `OLLAMA_NUM_PARALLEL=4` contention)
- Higher raw entity count than `gpt-oss:20b` in some tests, but the speed/reliability tradeoff is not worth it
llama3.2:3b — Why Not Recommended¶
- `OLLAMA_CONTEXT_LENGTH=8192`: stalls indefinitely on GPU
- `OLLAMA_CONTEXT_LENGTH=4096`: works but produces low entity counts; got more entities from `company-ontology-test.txt` than `cmispress.txt` (the shorter doc extracted less)
- Too small for complex SchemaLLMPathExtractor prompts with ontology schemas
- Second doc in same session hangs
Configuration Notes¶
- Local deployment — no API cost, full data privacy
- Both `SchemaLLMPathExtractor` (function mode) and `DynamicLLMPathExtractor` (dynamic mode) work correctly with `gpt-oss:20b`
- Embedding: use `nomic-embed-text` (8192-token context); avoid `all-minilm` (256-token limit silently truncates)
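To see why `all-minilm`'s 256-token limit bites on typical chunks, a rough word-based token estimate is enough. This is a hypothetical heuristic (roughly 0.75 words per token for English), not the embedding model's actual tokenizer:

```python
def rough_token_count(text: str) -> int:
    """Crude token estimate: ~0.75 words per token for English prose.

    A heuristic for a quick sanity check only, not a real tokenizer.
    """
    return int(len(text.split()) / 0.75)


def fits_embedding_context(text: str, token_limit: int) -> bool:
    """Return True if the text likely fits within the model's token limit."""
    return rough_token_count(text) <= token_limit


# all-minilm silently truncates past 256 tokens; nomic-embed-text allows 8192.
chunk = "word " * 400  # ~533 estimated tokens, a plausible chunk size
print(fits_embedding_context(chunk, 256))   # all-minilm limit: False
print(fits_embedding_context(chunk, 8192))  # nomic-embed-text limit: True
```

A typical multi-hundred-word chunk therefore blows past `all-minilm` while fitting comfortably inside `nomic-embed-text`'s context.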
Recommended Ollama Config¶
OLLAMA_MODEL=gpt-oss:20b
OLLAMA_CONTEXT_LENGTH=8192
OLLAMA_KEEP_ALIVE=1440m
OLLAMA_MAX_LOADED_MODELS=2
OLLAMA_NUM_PARALLEL=4
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
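The recommended settings above can be read back with defaults in one place. This is an illustrative sketch (the helper name and defaults-in-code are assumptions, not the project's actual loader):

```python
import os


def ollama_settings(env=os.environ):
    """Read the recommended Ollama settings, falling back to the
    documented defaults when an env var is unset."""
    return {
        "model": env.get("OLLAMA_MODEL", "gpt-oss:20b"),
        "context_length": int(env.get("OLLAMA_CONTEXT_LENGTH", "8192")),
        "keep_alive": env.get("OLLAMA_KEEP_ALIVE", "1440m"),
        "max_loaded_models": int(env.get("OLLAMA_MAX_LOADED_MODELS", "2")),
        "num_parallel": int(env.get("OLLAMA_NUM_PARALLEL", "4")),
        "embedding_model": env.get("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
    }


cfg = ollama_settings({})  # empty env → documented defaults
# llama3.2:3b's 4096 fallback is too small for schema extraction prompts
assert cfg["context_length"] >= 8192
print(cfg["model"])
```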
Known Issues¶
- ✅ Fixed 2026-01-09: Async event loop fix restored full Ollama functionality
- `sciphi/triplex` still fails to extract entities/relationships despite being designed for KG extraction
- `llama3.1:8b` and `llama3.2:3b` are technically functional but not recommended — see table above
OpenAI-Like Models¶
Overview¶
Use LLM_PROVIDER=openai_like for any OpenAI-compatible HTTP API — vLLM Docker, Ollama /v1 endpoint, LM Studio, LocalAI, Llamafile, etc.
Configuration¶
LLM_PROVIDER=openai_like
OPENAI_LIKE_API_BASE=http://localhost:8002/v1 # vLLM Docker
# or
OPENAI_LIKE_API_BASE=http://localhost:11434/v1 # Ollama /v1 endpoint
OPENAI_LIKE_MODEL=Qwen/Qwen2.5-7B-Instruct # or gpt-oss:20b for Ollama
OPENAI_LIKE_API_KEY=local
Known Issues / Fixes¶
- ✅ Fixed 2026-03-19: Added to `switch_to_dynamic_providers` — SchemaLLMPathExtractor returns 0 entities (tool_choice conflict, same as Groq). Auto-switches to DynamicLLMPathExtractor.
Benchmark — gpt-oss:20b via Ollama /v1, DynamicLLMPathExtractor, use_ontology=True (2026-03-19)¶
Embedding: text-embedding-3-small (OpenAI, 1536 dims)
| Doc | Neo4j Nodes | Neo4j Rels | Time |
|---|---|---|---|
| `cmispress.txt` | 22 | 41 | — |
| Both docs total | 39 | 81 | — |
Log source: flexible-graphrag-api-20260317-092343.log
vLLM Docker Benchmark — Qwen/Qwen2.5-7B-Instruct, DynamicLLMPathExtractor, use_ontology=True (2026-03-18)¶
Setup: vLLM Docker on port 8002, LLM_PROVIDER=openai_like, WSL2 memory=48GB
Embedding: text-embedding-3-small (OpenAI, 1536 dims)
| Doc | Neo4j Nodes | Neo4j Rels | Time |
|---|---|---|---|
| `cmispress.txt` | 31 | 48 | 20.33s |
| Both docs total | 88 | 217 | 46.44s |
Log source: flexible-graphrag-api-20260318-194621.log
Note: WSL2 .wslconfig with memory=48GB required — default 8GB caused KV cache exhaustion and 0 entities on one chunk.
vLLM¶
Overview¶
Two options:
1. Docker (all platforms, recommended): Add docker/includes/vllm.yaml to docker-compose, use LLM_PROVIDER=openai_like pointing at port 8002
2. Python package (Linux/macOS only, untested): LLM_PROVIDER=vllm with uv pip install vllm
Configuration (Docker — recommended)¶
LLM_PROVIDER=openai_like
OPENAI_LIKE_API_BASE=http://localhost:8002/v1
OPENAI_LIKE_MODEL=Qwen/Qwen2.5-7B-Instruct
See docs/LLM/LLM-EMBEDDING-CONFIG.md Section 18 for full Docker setup.
Notes¶
- vLLM Docker port: 8002 (host) → 8000 (container)
- Default model: `Qwen/Qwen2.5-7B-Instruct` (configurable via `VLLM_MODEL` env var)
- Windows: requires WSL2 with sufficient memory allocation (`memory=48GB` in `.wslconfig` for 128 GB RAM systems)
- See `docs/ADVANCED/DOCKER-RESOURCE-CONFIGURATION.md` for the WSL2/macOS/Linux sizing guide
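The WSL2 memory allocation mentioned above lives in `.wslconfig` in the Windows user profile directory (`%UserProfile%\.wslconfig`); a minimal fragment for a 128 GB machine, matching the setting used in the benchmarks here, looks like:

```ini
[wsl2]
memory=48GB
```

Run `wsl --shutdown` after editing so the new limit takes effect on the next WSL2 start.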
LiteLLM¶
Overview¶
LLM_PROVIDER=litellm — routes through LiteLLM proxy (100+ providers). On Windows, run the proxy in WSL2 to bypass security software (Norton etc.) that blocks litellm.exe.
Configuration¶
LLM_PROVIDER=litellm
LITELLM_MODEL=gpt-4o-mini # or ollama/gpt-oss:20b
LITELLM_API_BASE=http://localhost:4000/v1
LITELLM_API_KEY=local
LITELLM_TIMEOUT=300.0 # critical — prevents silent hang on 2-chunk docs
Proxy startup (WSL2 Ubuntu):
OPENAI_API_KEY="sk-..." litellm --config /mnt/c/newdev3/flexible-graphrag/flexible-graphrag/litellm_config.yaml --port 4000
Known Issues / Fixes¶
- ✅ Fixed 2026-03-19: `LITELLM_API_BASE` must include the `/v1` suffix
- ✅ Fixed 2026-03-19: `LITELLM_EMBEDDING_API_BASE` must include the `/v1` suffix
- ✅ Fixed 2026-03-19: `litellm.drop_params=True` set in `factories.py` — strips `parallel_tool_calls` for Ollama-backed models
- ✅ Fixed 2026-03-19: `LITELLM_TIMEOUT=300.0` — default timeout caused silent hang on 2-chunk docs (proxy drops response without error)
- ✅ Fixed 2026-03-19: Ollama-backed models (`ollama/*` prefix) auto-switch to DynamicLLMPathExtractor
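Both `/v1`-suffix fixes above amount to the same normalization. A sketch of a hypothetical guard (not the project's actual code) that makes any base URL safe to use:

```python
def ensure_v1(base_url: str) -> str:
    """Normalize an OpenAI-compatible base URL so it ends with /v1.

    Illustrates the /v1-suffix requirement for LITELLM_API_BASE and
    LITELLM_EMBEDDING_API_BASE; helper name is hypothetical.
    """
    base = base_url.rstrip("/")
    return base if base.endswith("/v1") else base + "/v1"


print(ensure_v1("http://localhost:4000"))      # http://localhost:4000/v1
print(ensure_v1("http://localhost:4000/v1"))   # http://localhost:4000/v1
print(ensure_v1("http://localhost:4000/v1/"))  # http://localhost:4000/v1
```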
Extractor Routing¶
- Cloud models (gpt-4o-mini, etc.): SchemaLLMPathExtractor — LiteLLM is a FunctionCallingLLM subclass
- Ollama-backed models (`LITELLM_MODEL=ollama/gpt-oss:20b`): DynamicLLMPathExtractor (auto-switched via `llm.model.startswith("ollama/")` check)
Benchmark — gpt-4o-mini via proxy, SchemaLLMPathExtractor, use_ontology=True (2026-03-19)¶
Embedding: text-embedding-3-small via LiteLLM proxy (1536 dims)
| Doc | Neo4j Nodes | Neo4j Rels | Time |
|---|---|---|---|
| `cmispress.txt` | 18–21 | 33–39 | 16–38s |
| Both docs total | 38–48 | 87–115 | 47–93s |
Note: Range reflects two runs with different embedding configs (Run 1: OpenAI direct embedding; Run 2: LiteLLM proxy embedding — identical results, proxy adds no overhead)
Benchmark — gpt-oss:20b via Ollama, DynamicLLMPathExtractor, use_ontology=True (2026-03-19)¶
Embedding: text-embedding-3-small (OpenAI direct, 1536 dims)
| Doc | Neo4j Nodes | Neo4j Rels | Time |
|---|---|---|---|
| `cmispress.txt` | 30 | 53 | 30.53s |
| Both docs total | 47 | 93 | +78.59s |
Note: Slower than direct LLM_PROVIDER=ollama (109s total via LiteLLM vs 58s direct) due to LiteLLM in-process overhead. Use LLM_PROVIDER=ollama directly for Ollama models.
LiteLLM Sweet Spot¶
- Best use case: Cloud providers (OpenAI, Anthropic, etc.) routed through WSL2 proxy to bypass Windows security software
- Not recommended: Ollama LLM + Ollama embedding both via LiteLLM — contention for Ollama's generation slot causes hangs
OpenRouter¶
Overview¶
LLM_PROVIDER=openrouter — unified cloud API routing to OpenAI, Anthropic, Meta, Mistral, and 200+ models.
Configuration¶
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=openai/gpt-4o-mini
OPENROUTER_CONTEXT_WINDOW=128000 # required — default too small → truncated output → 6 entities
OPENROUTER_MAX_TOKENS=16384 # required
Known Issues / Fixes¶
- ✅ Fixed 2026-03-19: Added to `switch_to_dynamic_providers` — OpenRouter extends OpenAILike with `is_function_calling_model=False`; SchemaLLMPathExtractor returns 0 entities instantly (genuine tool-calling incompatibility, confirmed after credits were added)
- ✅ Fixed 2026-03-19: `context_window=128000` + `max_tokens=16384` added to `config.py` + `factories.py` — the default was too small, causing output truncation (6 entities in ~6s)
- Account must have credits: https://openrouter.ai/settings/credits
Benchmark — openai/gpt-4o-mini, DynamicLLMPathExtractor, use_ontology=True (2026-03-19)¶
Embedding: text-embedding-3-small (OpenAI direct, 1536 dims)
| Doc | Neo4j Nodes | Neo4j Rels | Time |
|---|---|---|---|
| `cmispress.txt` | 19 | 35 | 21.08s |
| Both docs total | 46 | 106 | +39.61s |
Log source: flexible-graphrag-api-20260319-031556.log
Note: Fast cloud routing — comparable to LiteLLM gpt-4o-mini (48/115) with similar speed profile.
Successfully tested combinations:
- ✅ Google Gemini LLM + Google embeddings - Recommended for Gemini users
- ✅ Google Gemini LLM + OpenAI embeddings (2 separate API keys)
- ✅ Azure OpenAI LLM + Azure embeddings (same endpoint, uses EMBEDDING_MODEL as deployment name)
- ✅ Azure OpenAI LLM + OpenAI embeddings (fetches OPENAI_API_KEY from environment)
- ✅ OpenAI LLM + Azure embeddings (requires EMBEDDING_MODEL or AZURE_EMBEDDING_DEPLOYMENT for deployment name)
- ✅ OpenAI LLM + OpenAI embeddings (same API key)
- ✅ OpenAI LLM + Google embeddings (mixed provider, tested and working)
- ✅ Ollama LLM + Ollama embeddings (local deployment)
- ✅ LiteLLM LLM (gpt-4o-mini via proxy) + LiteLLM embeddings (text-embedding-3-small via proxy) — Added 2026-03-19; both route through WSL2 LiteLLM proxy. Requires /v1 suffix on LITELLM_EMBEDDING_API_BASE.
- ✅ OpenAI LLM (direct) + LiteLLM embeddings (text-embedding-3-small via proxy) — Added 2026-03-19; proxy adds no meaningful overhead for embeddings vs direct OpenAI.
- ✅ OpenAI-Like LLM (vLLM Docker / Ollama /v1) + OpenAI embeddings — standard mix, works fine
- ✅ OpenAI LLM (gpt-4o-mini, gpt-4.1-mini) + OpenAI-Like embeddings (nomic-embed-text via Ollama /v1) — Confirmed 2026-03-20; best cost/privacy combo: cloud LLM + free local embeddings. Set EMBEDDING_DIMENSION=768.
- ✅ Any LLM provider + OpenAI-Like embeddings — EMBEDDING_KIND=openai_like is provider-agnostic; works alongside any LLM
- ⚠️ LiteLLM LLM (ollama/gpt-oss:20b direct) + LiteLLM embeddings (ollama/nomic-embed-text direct) — NOT RECOMMENDED: both compete for Ollama's single generation slot; LLM extraction hangs. Use LLM_PROVIDER=ollama directly instead.
OpenAI-Like Embedding Notes (2026-03-19/20)¶
- `EMBEDDING_KIND=openai_like` uses the `OpenAILikeEmbedding` class — any server with a `/v1/embeddings` endpoint
- Tested: `nomic-embed-text` via Ollama at `http://localhost:11434/v1` (768 dims) — confirmed working
- `EMBEDDING_DIMENSION=768` must be set explicitly (no auto-detection for openai_like endpoints)
- Works with any LLM provider — tested with OpenAI gpt-4o-mini and gpt-4.1-mini
- Config:
- Log shows `OpenAILikeEmbedding` in the IngestionPipeline transformations, confirming the correct class
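Since openai_like endpoints do no dimension auto-detection, a startup sanity check on one probe embedding can catch a wrong `EMBEDDING_DIMENSION` before any vectors reach the store. A minimal sketch (the helper is hypothetical, not part of the project):

```python
def check_embedding_dimension(vector, expected_dim: int) -> None:
    """Fail fast if a probe embedding does not match EMBEDDING_DIMENSION."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dims, expected {expected_dim}; "
            "check EMBEDDING_DIMENSION and the embedding model."
        )


# nomic-embed-text returns 768-dim vectors, so EMBEDDING_DIMENSION=768
probe = [0.0] * 768  # stand-in for a real embedding response
check_embedding_dimension(probe, 768)  # passes silently

try:
    # e.g. a store configured for text-embedding-3-small's 1536 dims
    check_embedding_dimension(probe, 1536)
except ValueError as err:
    print(err)
```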
LiteLLM Embedding Notes (2026-03-19)¶
- `EMBEDDING_KIND=litellm` routes embeddings through the LiteLLM proxy
- `LITELLM_EMBEDDING_API_BASE` must include the `/v1` suffix (e.g. `http://localhost:4000/v1`)
- For Ollama embeddings direct (no proxy): `LITELLM_EMBEDDING_API_BASE=http://localhost:11434` — but avoid combining with Ollama LLM via LiteLLM (contention)
- `text-embedding-3-small` and `ollama/nomic-embed-text` must be listed in the `litellm_config.yaml` model_list when using proxy routing
- Proxy adds zero measurable overhead vs direct `EMBEDDING_KIND=openai` for the same model
Recommendations¶
For Full GraphRAG (entities + relationships):¶
- ✅ OpenAI (`gpt-4.1-mini` recommended, gpt-4o-mini, gpt-4o) - Best overall performance, works with all graph databases, SchemaLLMPathExtractor (default). `gpt-4.1-mini` is fastest and most consistent for multi-chunk docs (zero 0-entity failures, 3-4x faster than gpt-4o-mini).
- ✅ Azure OpenAI (gpt-4o-mini) - Enterprise deployment option, same capabilities as OpenAI, SchemaLLMPathExtractor (default)
- ✅ Google Gemini (gemini-3-flash-preview, gemini-3.1-pro-preview) - Full graph extraction, SchemaLLMPathExtractor (default)
- ✅ Google Vertex AI (gemini-2.5-flash) - Same as Gemini, SchemaLLMPathExtractor (default)
- ✅ Anthropic Claude (sonnet-4-5, haiku-4-5) - Full graph extraction, SchemaLLMPathExtractor (default)
- ✅ Ollama (`gpt-oss:20b` only) - Local deployment, `SchemaLLMPathExtractor` + function mode; `llama3.1:8b`/`llama3.2:3b` not recommended
- ✅ Groq (gpt-oss-20b, llama-3.3-70b-versatile) - Fast LPU inference, DynamicLLMPathExtractor (auto-switched; SchemaLLMPathExtractor broken due to tool_choice conflict). Fixed 2026-03-17.
- ✅ Fireworks AI (gpt-oss-120b, deepseek-v3, etc.) - Full graph extraction, DynamicLLMPathExtractor (streaming mode required). Fixed 2026-03-17.
- ✅ Amazon Bedrock (claude-sonnet, gpt-oss, deepseek-r1, llama3) - AWS integration, DynamicLLMPathExtractor; `use_ontology=True` + `DISABLE_PROPERTIES=true` recommended
- ✅ OpenAI-Like (vLLM Docker port 8002, Ollama /v1, any OpenAI-compatible API) - DynamicLLMPathExtractor (auto-switched). Added 2026-03-19.
- ✅ LiteLLM (proxy, 100+ providers) - Cloud models use SchemaLLMPathExtractor; Ollama-backed models auto-switch to Dynamic. Proxy runs in WSL2 on Windows. Added 2026-03-19.
- ✅ OpenRouter (cloud unified API, 200+ models) - DynamicLLMPathExtractor (auto-switched). Requires `OPENROUTER_CONTEXT_WINDOW=128000` + `OPENROUTER_MAX_TOKENS=16384`. Added 2026-03-19.
Note: The extractor type can be configured via the `KG_EXTRACTOR_TYPE` environment variable (options: `schema`, `simple`, `dynamic`). Due to LlamaIndex integration issues, the configuration is auto-overridden for certain providers: Bedrock, Groq, Fireworks, OpenAI-Like, and OpenRouter all auto-switch schema→dynamic. LiteLLM auto-switches only when `LITELLM_MODEL` starts with `ollama/`.
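The override rules in the note above can be summarized as a small routing function. This is an illustrative sketch of the documented behavior only, not the project's actual `switch_to_dynamic_providers` implementation:

```python
# Providers where SchemaLLMPathExtractor is broken and schema always
# becomes dynamic, per the note above.
ALWAYS_DYNAMIC = {"bedrock", "groq", "fireworks", "openai_like", "openrouter"}


def effective_extractor(provider: str, model: str, requested: str = "schema") -> str:
    """Apply the documented schema→dynamic auto-overrides."""
    if requested != "schema":
        return requested  # simple/dynamic are honored as configured
    if provider in ALWAYS_DYNAMIC:
        return "dynamic"
    if provider == "litellm" and model.startswith("ollama/"):
        return "dynamic"  # Ollama-backed LiteLLM models only
    return "schema"


print(effective_extractor("openai", "gpt-4.1-mini"))         # schema
print(effective_extractor("groq", "openai/gpt-oss-20b"))     # dynamic
print(effective_extractor("litellm", "ollama/gpt-oss:20b"))  # dynamic
print(effective_extractor("litellm", "gpt-4o-mini"))         # schema
```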
For Graph Building + Vector/Search:¶
All providers listed above work correctly with full database stacks (vector + search + graph)
For Graph-Only Mode (No separate Vector / Search databases):¶
- ✅ Neo4j + OpenAI - Fully functional graph-based retrieval
- ✅ FalkorDB + OpenAI - Fully functional graph-based retrieval
- ✅ ArcadeDB + OpenAI - Fully functional graph-based retrieval
Extractor Information:¶
- SchemaLLMPathExtractor (default): Uses predefined schema with strict entity/relationship types. Default for OpenAI, Azure, Gemini, Vertex, Claude, Ollama, and LiteLLM (cloud models).
- DynamicLLMPathExtractor: Starts with an ontology but allows the LLM to expand it. Auto-used for Groq, Fireworks, Bedrock, OpenAI-Like, OpenRouter, and LiteLLM (ollama/* models). Also configurable via `KG_EXTRACTOR_TYPE=dynamic`.
- SimpleLLMPathExtractor: Allows the LLM to discover entities/relationships without a schema. Available via `KG_EXTRACTOR_TYPE=simple`; not auto-assigned to any provider currently.
LLM Extraction Mode¶
LLM_EXTRACTION_MODE controls how the LLM is called during extraction (function, json_schema, auto). Default is function (tool calling) — recommended for all providers. See docs/LLM/LLM-EMBEDDING-CONFIG.md Advanced Configurations section for full details.
- `function` (default): tool/function calling — most reliable, avoids OpenAI structured output bugs
- `json_schema`: JSON schema / structured output mode
- `auto`: maps to DEFAULT internally (same as json_schema)
Note: providers that auto-switch to DynamicLLMPathExtractor (Groq, Fireworks, Bedrock, OpenAI-Like, OpenRouter) are unaffected — Dynamic uses `apredict()` (plain text), not tool calls.
Local Models (Ollama):¶
- ✅ Only recommended model: `gpt-oss:20b` (2026-03-14 benchmark, function mode, both stores)
- 45s for `company-ontology-test.txt` → 35 Neo4j nodes, 96 rels
- 13s for `cmispress.txt` (2nd doc) → cumulative 67 nodes, 142 rels, 675 GraphDB RDF rows
- Use `SchemaLLMPathExtractor` + `nomic-embed-text` + `OLLAMA_CONTEXT_LENGTH=8192`
- `ingestion_storage_mode=both` confirmed: Neo4j + RDF store in one pass
- ❌ Not recommended:
    - `llama3.1:8b` — ~240s/doc (5x slower), second doc hangs — not worth it
    - `llama3.2:3b` — stalls at 8192 context on GPU; falls back to 4096, low entity count — not usable for schema extraction
    - `sciphi/triplex` — extracts 0 entities/relationships, extremely slow