LangChain Graph Database Integration¶
This document describes the integration of LangChain's graph database support with flexible-graphrag, enabling enhanced natural language querying of RDF and property graph stores as part of hybrid retrieval.
Overview¶
The LangChain integration provides:
- Natural Language to Query Translation: Convert natural language questions to SPARQL or Cypher automatically
- Hybrid Retrieval: Combine LangChain graph retrievers with existing vector, BM25, and property graph retrievers via QueryFusionRetriever
- Same LLM Configuration: Uses whichever LLM is already configured; no separate LLM setup required
- Schema-Guided Generation: Live predicate/type schema fetched from the store at startup; missing namespace prefixes auto-injected
Architecture¶
```
                +---------------------------------+
                |  User Query (Natural Language)  |
                +----------------+----------------+
                                 |
                +----------------+----------------+
                |      QueryFusionRetriever       |
                |          (LlamaIndex)           |
                +---+-------+--------+--------+---+
                    |       |        |        |
  +-----------+ +------+ +-----------+ +--------------+
  |  Vector   | | BM25 | | Property  | |  RDF Graph   |
  | Retriever | |      | | Graph     | |  Retriever   |
  |           | |      | | Retriever | |  (LangChain) |
  +-----------+ +------+ +-----------+ +------+-------+
                                              |
                                +-------------+-----------+
                                |   LangChain QA Chain    |
                                |  (NL → SPARQL/Cypher)   |
                                +-------------+-----------+
                                              |
                                   +----------+----------+
                                   |   Graph Database    |
                                   |  (GraphDB, Fuseki,  |
                                   |  Oxigraph, Neo4j)   |
                                   +---------------------+
```
Supported Databases¶
RDF Stores (SPARQL)¶
1. Ontotext GraphDB¶
- Status: ✅ Fully Implemented and Tested
- Deployment: Docker
- Features:
  - Schema introspection with live predicate/type fetching
  - Missing PREFIX declarations auto-injected
  - Enterprise features (sharding, clustering, OWL reasoning)
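The PREFIX auto-injection can be sketched as a post-processing pass over the generated SPARQL. The prefix table and function name below are illustrative stand-ins for whatever mapping the adapter actually maintains:

```python
import re

# Illustrative prefix table; the adapter's real mapping may differ.
KNOWN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def inject_missing_prefixes(sparql: str) -> str:
    """Prepend PREFIX declarations for any known prefix used but not declared."""
    declared = set(re.findall(r"(?im)^\s*PREFIX\s+(\w+):", sparql))
    # Crude scan for prefixed names like foo:bar (full IRIs in <...> don't match)
    used = set(re.findall(r"\b(\w+):\w", sparql))
    missing = (used - declared) & KNOWN_PREFIXES.keys()
    header = "".join(f"PREFIX {p}: <{KNOWN_PREFIXES[p]}>\n" for p in sorted(missing))
    return header + sparql
```

A query that uses `foaf:name` without declaring `foaf` gets the declaration prepended; an already-declared prefix is left alone.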
Configuration:

```
USE_LANGCHAIN_RDF=true
RDF_STORE_TYPE=graphdb
GRAPHDB_BASE_URL=http://localhost:7200
GRAPHDB_REPOSITORY=flexible-graphrag
GRAPHDB_USERNAME=admin
GRAPHDB_PASSWORD=admin
```
Docker Setup:
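A minimal Compose service for local GraphDB, assuming the `ontotext/graphdb` image; the tag and data path are assumptions to verify against Ontotext's Docker documentation:

```yaml
services:
  graphdb:
    image: ontotext/graphdb:10.6.2  # tag is an assumption; pin your tested version
    ports:
      - "7200:7200"
    volumes:
      - graphdb-data:/opt/graphdb/home  # GraphDB's documented data directory
volumes:
  graphdb-data:
```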
2. Apache Jena Fuseki¶
- Status: ✅ Fully Implemented and Tested
- Deployment: Docker
- Features:
  - Full SPARQL 1.1 and SPARQL Update support
  - RDF 1.2 annotations (legacy `<< >>` Turtle-star syntax on export)
  - HTTP Basic Auth
Configuration:

```
USE_LANGCHAIN_RDF=true
RDF_STORE_TYPE=fuseki
FUSEKI_ENABLED=true
FUSEKI_BASE_URL=http://localhost:3030
FUSEKI_DATASET=flexible-graphrag
```
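With these settings, the adapter targets Fuseki's default per-dataset services (`/{dataset}/query`, `/{dataset}/update`, `/{dataset}/data`). A small sketch for building a GET query URL against them, handy for smoke-testing outside the app; the helper name is illustrative:

```python
from urllib.parse import urlencode

FUSEKI_BASE_URL = "http://localhost:3030"
FUSEKI_DATASET = "flexible-graphrag"

def fuseki_query_url(sparql, base_url=FUSEKI_BASE_URL, dataset=FUSEKI_DATASET):
    """GET-style query URL for Fuseki's default /{dataset}/query service."""
    return f"{base_url.rstrip('/')}/{dataset}/query?" + urlencode({"query": sparql})

url = fuseki_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
```

Pasting the resulting URL into a browser (or `curl`) is a quick way to confirm the dataset is reachable and populated.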
3. Oxigraph¶
- Status: ✅ Fully Implemented and Tested
- Deployment: Docker (lightweight, good for local dev)
- Features:
  - RDF 1.2 blank-node reifier syntax on export
  - SPARQL endpoint at `/query`; upload via `/store` (N-Quads, not Turtle)
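Since `/store` expects N-Quads rather than Turtle, N-Triples output can be adapted directly: N-Triples is already valid N-Quads in the default graph, and targeting a named graph just inserts its IRI before each statement terminator. A sketch, with the HTTP upload itself left to your client of choice:

```python
def ntriples_to_nquads(ntriples, graph_iri=None):
    """N-Triples is valid N-Quads (default graph). To target a named graph,
    insert the graph IRI before the terminating ' .' of each statement."""
    if graph_iri is None:
        return ntriples
    out = []
    for line in ntriples.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # keep blank lines and comments as-is
            continue
        body = stripped.rstrip(" .\t")  # drop the statement terminator
        out.append(f"{body} <{graph_iri}> .")
    return "\n".join(out)
```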
Configuration:

```
USE_LANGCHAIN_RDF=true
RDF_STORE_TYPE=oxigraph
OXIGRAPH_ENABLED=true
OXIGRAPH_URL=http://localhost:7878
```
4. Amazon Neptune (RDF/SPARQL)¶
- Status: ⚠️ Implemented, untested
- Deployment: AWS Cloud
- Features:
  - IAM authentication support
  - VPC endpoint access
Configuration:

```
USE_LANGCHAIN_RDF=true
RDF_STORE_TYPE=neptune_rdf
NEPTUNE_HOST=my-cluster.cluster-xyz.us-east-1.neptune.amazonaws.com
NEPTUNE_PORT=8182
NEPTUNE_REGION=us-east-1
NEPTUNE_USE_IAM_AUTH=true
NEPTUNE_USE_HTTPS=true
```
Property Graphs¶
1. Neo4j¶
- Status: ✅ Retrieval working
- Query Language: Cypher
- Notes: Uses existing Neo4j connection from main config; no separate setup
2. ArangoDB, Apache AGE, Azure Cosmos DB Gremlin, Google Cloud Spanner Graph¶
- Status: 🔲 Placeholder stubs — not yet implemented
Usage¶
Basic Setup¶
1. Enable LangChain RDF retrieval in `.env` (`USE_LANGCHAIN_RDF=true`).
2. Configure the store (Fuseki example) using the settings shown in the configuration sections above.
3. Ingest documents via the normal API. If `INGESTION_STORAGE_MODE` is `both` or `rdf_only`, triples are written to the RDF store automatically.
4. Query: hybrid search automatically includes the RDF retriever.
Advanced: Custom QA Chain¶
```python
from rdf.langchain_adapters.graphdb_langchain_adapter import GraphDBLangChainAdapter
from langchain_openai import ChatOpenAI

# Point the adapter at the GraphDB repository and (optionally) an ontology file
adapter = GraphDBLangChainAdapter({
    "base_url": "http://localhost:7200",
    "repository": "flexible-graphrag",
    "username": "admin",
    "password": "admin",
    "ontology_file": "./rdf/schemas/company_ontology.ttl",
})

# Any LangChain chat model works; the QA chain handles NL -> SPARQL
llm = ChatOpenAI(model="gpt-4o-mini")
qa_chain = adapter.create_qa_chain(llm)

result = qa_chain({"query": "What departments exist at TechCorp?"})
print(result["result"])            # natural-language answer
print(result["generated_sparql"])  # the SPARQL produced for the question
```
Configuration Reference¶
Environment Variables¶
| Variable | Default | Description |
|---|---|---|
| `USE_LANGCHAIN_RDF` | `false` | Enable LangChain RDF retrieval |
| `RDF_STORE_TYPE` | - | Store type: `graphdb`, `fuseki`, `oxigraph`, or `neptune_rdf` |
| `RDF_RETRIEVAL_TOP_K` | `5` | Number of results from RDF retriever |
| `RDF_RETRIEVAL_WEIGHT` | `0.3` | Weight in fusion (0.0-1.0) |
| `GRAPHDB_BASE_URL` | `http://localhost:7200` | GraphDB endpoint |
| `GRAPHDB_REPOSITORY` | `flexible-graphrag` | GraphDB repository name |
| `GRAPHDB_USERNAME` | `admin` | GraphDB username |
| `GRAPHDB_PASSWORD` | `admin` | GraphDB password |
| `FUSEKI_ENABLED` | `false` | Enable Fuseki store |
| `FUSEKI_BASE_URL` | `http://localhost:3030` | Fuseki endpoint |
| `FUSEKI_DATASET` | `flexible-graphrag` | Fuseki dataset name |
| `OXIGRAPH_ENABLED` | `false` | Enable Oxigraph store |
| `OXIGRAPH_URL` | `http://localhost:7878` | Oxigraph HTTP endpoint |
| `NEPTUNE_HOST` | - | Neptune cluster endpoint |
| `NEPTUNE_PORT` | `8182` | Neptune port |
| `NEPTUNE_REGION` | `us-east-1` | AWS region |
| `NEPTUNE_USE_IAM_AUTH` | `false` | Use IAM authentication |
| `NEPTUNE_USE_HTTPS` | `true` | Use HTTPS |
Hybrid Retriever Weights¶
- Low emphasis (0.1-0.2): Primarily vector/text search with light graph augmentation
- Medium emphasis (0.3-0.5): Balanced hybrid approach (recommended)
- High emphasis (0.6-0.8): Graph-centric with vector/text support
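The effect of these weights can be illustrated with a toy version of relative-score fusion: min-max normalize each retriever's scores, scale by the retriever's weight, and sum per result id. This is a simplification for intuition, not LlamaIndex's exact implementation:

```python
def fuse(results_per_retriever, weights):
    """Weighted relative-score fusion over (doc_id, score) result lists."""
    fused = {}
    for results, weight in zip(results_per_retriever, weights):
        if not results:
            continue
        scores = [score for _, score in results]
        lo, hi = min(scores), max(scores)
        for doc_id, score in results:
            # Normalize so retrievers with different score scales are comparable
            norm = (score - lo) / (hi - lo) if hi > lo else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * norm
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

vector = [("chunk-a", 0.9), ("chunk-b", 0.5)]   # cosine-similarity scores
rdf = [("chunk-b", 12.0), ("chunk-c", 4.0)]     # raw graph-match scores
```

Raising the RDF weight shifts the fused ranking toward graph hits without discarding the vector evidence.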
Architecture Decisions¶
Why LangChain for Retrieval?¶
- Natural Language Query Translation: QA chains provide NL→SPARQL/Cypher translation with schema awareness
- Error Correction: Iterative refinement of generated queries based on database errors
- Separation of Concerns: LlamaIndex handles ingestion/embedding; LangChain handles NL query generation
Why Not LangChain for Ingestion?¶
- Performance: RDFLib local construction → bulk REST upload is faster than SPARQL INSERT
- Flexibility: LlamaIndex abstractions support multiple databases with the same code
- Incremental Updates: LlamaIndex supports clean incremental updates with `ref_doc_id` tracking
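To make the performance argument concrete: serializing locally turns N triples into one bulk HTTP upload instead of N SPARQL INSERT round-trips. A toy sketch for IRI-only triples (real ingestion also handles literals and escaping):

```python
def triples_to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI tuples into one N-Triples
    payload, suitable for a single bulk POST to the store's upload endpoint."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)

payload = triples_to_ntriples([
    ("http://ex/acme", "http://ex/hasDept", "http://ex/engineering"),
    ("http://ex/acme", "http://ex/hasDept", "http://ex/sales"),
])
# One request ships the whole payload, versus one INSERT round-trip per triple.
```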
Troubleshooting¶
GraphDB Connection Issues¶
Error: `ConnectionError: Failed to connect to GraphDB`
- Check Docker: `docker ps | grep graphdb`
- Check logs: `docker logs flexible-graphrag-graphdb-1`
- Verify endpoint: `curl http://localhost:7200/repositories`
- Check credentials in `.env`
No RDF Results in Hybrid Search¶
- Verify `USE_LANGCHAIN_RDF=true` in `.env`
- Check that the RDF store has data: `python scripts/rdf_cleanup.py list-docs`
- Increase `RDF_RETRIEVAL_TOP_K` to 10+
- Check startup logs for RDF retriever initialization errors
Pydantic Schema Errors¶
Error: `Unable to generate pydantic-core schema`
- Ensure `strict=False` in `OntologyGuidedExtractor`
- Check the LlamaIndex version: `pip list | grep llama-index-core`
- Update if needed: `pip install -U llama-index-core`