Framework Integration (LlamaIndex and LangChain)
This document describes how LlamaIndex and LangChain integrate with flexible-graphrag as full peer frameworks — covering graph, vector, search, chunking, KG extraction, and hybrid retrieval.
Overview
LangChain is not just a retrieval add-on; it is a first-class backend for every pipeline stage. Each stage can run on either LlamaIndex or LangChain independently:
| Stage | LlamaIndex | LangChain |
|---|---|---|
| Chunking | SentenceSplitter | RecursiveCharacterTextSplitter (+ 5 other splitters) |
| KG Extraction | SchemaLLMPathExtractor / DynamicLLMPathExtractor | LLMGraphTransformer |
| Property Graph ingestion | PropertyGraphIndex | add_graph_documents() via LC graph store |
| Property Graph retrieval | VectorContextRetriever | TextToGraphQueryRetriever (Cypher/AQL/GSQL/SurrealQL) |
| Vector ingestion | VectorStoreIndex | LC vector store add_documents() |
| Vector retrieval | VectorIndexRetriever | LangChainVectorStoreRetriever |
| Search ingestion | ES/OpenSearch LI index | LC add_documents() |
| Search retrieval | LI BM25 / ES retriever | LC BM25 / ES / OpenSearch retriever |
| Hybrid fusion | QueryFusionRetriever | EnsembleRetriever (RRF) |
Framework Configuration
# -- Pipeline stage pickers ---------------------------
CHUNKER_BACKEND=llamaindex # llamaindex | langchain
GRAPH_BACKEND=llamaindex # llamaindex | langchain (auto=langchain for LC-only stores)
VECTOR_BACKEND=llamaindex # llamaindex | langchain
SEARCH_BACKEND=llamaindex # llamaindex | langchain
KG_EXTRACTOR_BACKEND=llamaindex # llamaindex | langchain
RETRIEVAL_FUSION=llamaindex # llamaindex | langchain (EnsembleRetriever/RRF)
# -- LC-specific splitter (when CHUNKER_BACKEND=langchain) --
LC_SPLITTER_TYPE=recursive # recursive | character | token | markdown | python | sentence_transformers
GRAPH_BACKEND is auto-selected to langchain when PG_GRAPH_DB is set to a LangChain-only store (ArangoDB, Apache AGE, HugeGraph, SurrealDB, TigerGraph, Cosmos Gremlin, Spanner).
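The auto-selection rule above can be sketched as a small lookup. This is a minimal illustration, not the project's code: `resolve_graph_backend` and `LC_ONLY_GRAPH_STORES` are hypothetical names, and the store keys are copied from the PG_GRAPH_DB column of the LangChain-only table later in this document.

```python
# Hypothetical sketch of the GRAPH_BACKEND auto-selection rule.
# Store keys are the PG_GRAPH_DB values listed for LangChain-only stores.
LC_ONLY_GRAPH_STORES = frozenset({
    "arangodb", "apache_age", "cosmos_gremlin", "hugegraph",
    "surrealdb", "tigergraph", "spanner",
})


def resolve_graph_backend(pg_graph_db: str, configured: str = "llamaindex") -> str:
    """Return the effective backend: LangChain-only stores override the setting."""
    if pg_graph_db.strip().lower() in LC_ONLY_GRAPH_STORES:
        return "langchain"
    return configured


print(resolve_graph_backend("arangodb"))            # forced to langchain
print(resolve_graph_backend("neo4j"))               # keeps the configured default
print(resolve_graph_backend("neo4j", "langchain"))  # explicit choice is respected
```

Stores that work with both frameworks keep whatever GRAPH_BACKEND is configured, so the override only ever moves a setting toward langchain.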
Full Architecture Diagram
                                    User Query
                                         |
                       +-----------------+-----------------+
                       |         Retrieval Fusion          |
                       |        (LI QueryFusion or         |
                       |       LC EnsembleRetriever)       |
                       +-----------------+-----------------+
                                         |
      +-------------+-------------+------+------+-------------+-------------+
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
| Vector    | | Search    | | Prop.     | | RDF Graph | | Neighbor- | | PG Vec.   |
| Retr.     | | BM25 /    | | Graph     | | SPARQL    | | hood      | | (Entity   |
| LI / LC   | | ES / OS   | | LI / LC   | | (LC)      | | Cypher    | | Vector)   |
|           | |           | | (text-to- | |           | | (LC)      | | (LC)      |
|           | |           | | query)    | |           | |           | |           |
+-----+-----+ +-----------+ +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
      |                           |             |             |             |
      |                           |             |             +------+------+
      |                           |             |                    |
  Vector DB                   Property      RDF Store              Neo4j
  (Qdrant,                    Graph DB      (GraphDB,          (entity-level
  Milvus,                     (Neo4j,       Fuseki,            __Entity__
  Weaviate,                   ArangoDB,     Oxigraph,          vector index)
  etc.)                       AGE, etc.)    Neptune RDF)

+----------------------------+
| Ingest Pipeline |
+----------------------------+
| LI readers (always) |
| -> Chunker (LI or LC) |
| -> Embedder (LI or LC) |
| -> KG Extractor (LI or LC)|
| -> Vector store update |
| -> Search index update |
| -> Property graph update |
| -> RDF store update |
+----------------------------+
Retriever Layer Architecture
Retrievers follow a two-layer prefix convention:
| Prefix | Layer | Base class | Purpose |
|---|---|---|---|
| lc_ | 0 (pure LC) | langchain_core.BaseRetriever | Passed directly to EnsembleRetriever |
| li_ | 1 (LI wrapper) | llama_index.BaseRetriever | Used with QueryFusionRetriever; .as_lc_retriever() for hybrid |
Bridge classes in langchain/retriever_bridge.py:
- LItoLCRetriever — wraps an LI retriever so LC ensemble can call it
- LCBackedLIRetriever — wraps an LC retriever into LI interface for QueryFusionRetriever
Source labeling: every result carries the database it came from (e.g. "company-ontology.txt | Qdrant vector", "company-ontology.txt | Ontotext GraphDB rdf graph"). LoggingRetriever._postprocess() / LCLoggingRetriever._tag_docs() inject _retriever_label into node metadata; query_engine.py builds the display string.
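As a rough illustration of the source-labeling flow, the sketch below tags result metadata and then builds the display string. It is a simplified stand-in, not the actual `LoggingRetriever` or `query_engine.py` code: the function names and the `file_name` metadata key are assumptions, and only `_retriever_label` comes from the description above.

```python
# Hypothetical sketch of source labeling. Results are modelled as plain
# metadata dicts; the real code works on LI nodes and LC documents.
def tag_results(results: list[dict], retriever_label: str) -> list[dict]:
    """Return copies of the result metadata with _retriever_label injected."""
    return [{**metadata, "_retriever_label": retriever_label} for metadata in results]


def build_source_label(metadata: dict) -> str:
    """Build the 'file | store' display string from tagged metadata."""
    file_name = metadata.get("file_name", "unknown source")  # assumed key name
    label = metadata.get("_retriever_label")
    return f"{file_name} | {label}" if label else file_name


tagged = tag_results([{"file_name": "company-ontology.txt"}], "Qdrant vector")
print(build_source_label(tagged[0]))
```

Keeping the label in metadata rather than in the text means the fusion step can merge results from several stores without losing where each one came from.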
Property Graph Databases
LangChain-Only Stores
These stores only work with GRAPH_BACKEND=langchain (auto-selected when PG_GRAPH_DB is set to one of these):
| PG_GRAPH_DB | Database | Docker Port | Query Language | Key Notes |
|---|---|---|---|---|
| arangodb | ArangoDB | 8529 | AQL | ArangoGraph + GraphCypherQAChain |
| apache_age | Apache AGE | 5434 | Cypher | _AGEGraphFixed dollar-quoting bypass; _extract_return_aliases() |
| cosmos_gremlin | Azure Cosmos DB / TinkerPop | 8182 | Gremlin | GremlinGraph |
| hugegraph | Apache HugeGraph | 8082 | openCypher | Custom add_graph_documents; _safe_id replaces spaces; Hubble UI at 8085 |
| surrealdb | SurrealDB | 8010 | SurrealQL | add_graph_documents injects name/type; async client; Surrealist UI at 8011 |
| tigergraph | TigerGraph Community | 9002 | GSQL | _ensure_graph() auto-create; nlqs_host bypass; GraphStudio at 14240 |
| spanner | Google Cloud Spanner | 9010/9020 | Spanner Graph (Cypher) | Emulator supported; SPANNER_EMULATOR_HOST=localhost:9010 |
Supported with Both LlamaIndex and LangChain
These work with either GRAPH_BACKEND=llamaindex or GRAPH_BACKEND=langchain:
| PG_GRAPH_DB | LI Ingestion | LC Retrieval | Notes |
|---|---|---|---|
| neo4j | PropertyGraphIndex | GraphCypherQAChain | Shared bolt connection |
| arcadedb | ArcadeDBPropertyGraphStore | Custom Cypher adapter | Remote + embedded modes |
| falkordb | FalkorDBPropertyGraphStore | FalkorDBGraph | |
| memgraph | MemgraphPropertyGraphStore | MemgraphGraph | |
| nebula | NebulaPropertyGraphStore | NebulaGraph | Dynamic schema patch for arbitrary props |
| neptune | NeptuneDatabase | NeptuneGraph | AWS cloud |
| neptune_analytics | NeptuneAnalytics | NeptuneAnalyticsGraph | AWS cloud |
| ladybug | LadybugPropertyGraphStore | LadybugGraph (via langchain-ladybug) | Embedded |
LangChain PG Retrieval Components
When GRAPH_BACKEND=langchain, three retriever types are available:
- TextToGraphQueryRetriever (li_graph_qa_retriever.py / lc_graph_retriever.py): NL -> Cypher/AQL/GSQL/SurrealQL via LLM; uses per-store custom prompt templates
- GraphNeighborhoodRetriever (li_neighborhood_retriever.py): walks graph neighbors from seed entities; document text chunks score 2.0, entity stubs score 1.0
- GraphEntityVectorRetriever (langchain_retriever_wrapper.py): semantic entity lookup via Neo4j vector index (Neo4j only)
Routing rules:
- TextToGraphQueryRetriever is auto-enabled for all LC non-vector stores (ArangoDB, AGE, HugeGraph, SurrealDB, TigerGraph, Cosmos Gremlin, Spanner) and for Neo4j when LANGCHAIN_PG_VECTOR_SEARCH=false (default). It is suppressed for Neo4j when LANGCHAIN_PG_VECTOR_SEARCH=true; set USE_LC_TEXT_TO_GRAPH=true to re-enable it alongside vector retrieval.
- GraphNeighborhoodRetriever is Neo4j only and auto-enabled when LANGCHAIN_PG_VECTOR_SEARCH=true.
- GraphEntityVectorRetriever is Neo4j only and off by default. Set LANGCHAIN_PG_VECTOR_SEARCH=true to enable it; doing so also auto-enables USE_PG_NEIGHBORHOOD.
USE_LC_TEXT_TO_GRAPH=true # only needed for Neo4j when LANGCHAIN_PG_VECTOR_SEARCH=true
USE_PG_NEIGHBORHOOD=true # default true; auto-enabled when LANGCHAIN_PG_VECTOR_SEARCH=true
LANGCHAIN_PG_VECTOR_SEARCH=false # GraphEntityVectorRetriever (Neo4j only, default false)
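The routing rules above can be read as a small decision function. The sketch below is one interpretation, not the project's implementation: `select_pg_retrievers` is a hypothetical name, and it assumes the neighborhood retriever also runs for Neo4j whenever USE_PG_NEIGHBORHOOD is true, even without vector search.

```python
# Hypothetical sketch of the LangChain property-graph retriever routing.
def select_pg_retrievers(
    pg_graph_db: str,
    vector_search: bool = False,  # LANGCHAIN_PG_VECTOR_SEARCH
    text_to_graph: bool = False,  # USE_LC_TEXT_TO_GRAPH
    neighborhood: bool = True,    # USE_PG_NEIGHBORHOOD
) -> list[str]:
    """Return the retriever class names that would be active."""
    if pg_graph_db != "neo4j":
        # Every other LC store relies on text-to-query retrieval only.
        return ["TextToGraphQueryRetriever"]

    selected = []
    if vector_search:
        selected.append("GraphEntityVectorRetriever")
    if neighborhood or vector_search:
        # Assumption: vector search forces the neighborhood walk on.
        selected.append("GraphNeighborhoodRetriever")
    if not vector_search or text_to_graph:
        # Suppressed under vector search unless explicitly re-enabled.
        selected.append("TextToGraphQueryRetriever")
    return selected


print(select_pg_retrievers("arangodb"))
print(select_pg_retrievers("neo4j", vector_search=True))
print(select_pg_retrievers("neo4j", vector_search=True, text_to_graph=True))
```

Under this reading, USE_LC_TEXT_TO_GRAPH only changes the outcome in the Neo4j plus vector-search case, which matches the comment on that variable.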
RDF / SPARQL Stores
RDF stores are always accessed via LangChain SPARQL chains (controlled by RDF_GRAPH_DB):
| RDF_GRAPH_DB | Port | Adapter | Chain Type |
|---|---|---|---|
| graphdb | 7200 | GraphDBLangChainAdapter | _GraphDBQAChain (custom OntotextGraphDBQAChain subclass) |
| fuseki | 3030 | FusekiLangChainAdapter | _GenericSparqlQAChain (GraphSparqlQAChain subclass) |
| oxigraph | 7878 | OxigraphLangChainAdapter | _GenericSparqlQAChain |
| neptune_rdf | 8182 | NeptuneRDFAdapter | _GenericSparqlQAChain |
All adapters share:
- _ensure_sparql_prefixes() — auto-injects missing PREFIX declarations (kg:, onto:, company:, common:, rdfs:, rdf:, xsd:, owl:)
- Live schema introspection at startup (predicates + types fetched via SELECT DISTINCT ?p / ?t)
- SPARQL broad-fallback retry: on 0 rows, extracts shortest entity keyword and retries bi-directional UNION
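To make the prefix auto-injection concrete, here is a minimal sketch of how missing PREFIX declarations could be prepended to a generated query. It is not the adapters' `_ensure_sparql_prefixes()` itself: the function name and regexes are illustrative, the four W3C IRIs are the standard ones, and the `kg:`, `onto:`, `company:` and `common:` IRIs are placeholders because the project's namespaces are not given here.

```python
import re

PREFIX_IRIS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    # Placeholder IRIs (assumptions): substitute the real project namespaces.
    "kg": "http://example.org/kg/",
    "onto": "http://example.org/onto/",
    "company": "http://example.org/company/",
    "common": "http://example.org/common/",
}


def ensure_sparql_prefixes(query: str, prefixes: dict = PREFIX_IRIS) -> str:
    """Prepend a PREFIX line for every known prefix that is used but not declared."""
    declared = set(re.findall(r"(?im)^\s*PREFIX\s+([A-Za-z][\w-]*)\s*:", query))
    missing = []
    for name, iri in prefixes.items():
        # A use looks like "rdfs:label"; the lookbehind avoids matching
        # the tail of a longer name or part of an IRI.
        used = re.search(rf"(?<![\w:/]){re.escape(name)}:[A-Za-z_]", query)
        if used and name not in declared:
            missing.append(f"PREFIX {name}: <{iri}>")
    return "\n".join(missing + [query]) if missing else query


print(ensure_sparql_prefixes("SELECT ?s WHERE { ?s rdfs:label ?l ; rdf:type owl:Class }"))
```

Injecting only the prefixes a query actually uses keeps the rewritten query short and leaves LLM-generated queries that already declare their prefixes untouched.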
Vector Stores
Set VECTOR_DB to pick the store; set VECTOR_BACKEND=langchain to use LC adapters:
| VECTOR_DB | LI Adapter | LC Adapter | Notes |
|---|---|---|---|
| qdrant | QdrantVectorStore | QdrantVectorStore (LC) | Default recommended |
| elasticsearch | ElasticsearchStore | ElasticsearchStore (LC) | |
| opensearch | OpensearchVectorClient | LC OpenSearch adapter | |
| milvus | MilvusVectorStore | Milvus (LC) | gRPC host/port; auto_id=True |
| weaviate | WeaviateVectorStore | WeaviateVectorStore (LC) | Sync client in FastAPI; Filter.by_property delete |
| chroma | ChromaVectorStore | Chroma (LC) | HTTP client or persist mode |
| pinecone | PineconeVectorStore | PineconeVectorStore (LC) | Cloud index |
| postgres | PGVectorStore | PGVector (LC) | langchain_pg_collection/embedding tables |
| lancedb | LanceDBVectorStore | LanceDB (LC) | uri/table_name new API; inspect.signature detection |
| neo4j | Neo4jVectorStore | Neo4jVector (LC) | Embedded in graph store |
Per-store config: {TYPE}_VECTOR_DB_CONFIG={"host":...} (e.g. QDRANT_VECTOR_DB_CONFIG, MILVUS_VECTOR_DB_CONFIG)
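The per-store variable naming pattern can be sketched as follows. This is an illustrative helper, not the project's loader: `load_store_config` is a hypothetical name, and the sample host and port values are made up for the example.

```python
import json
import os


def load_store_config(store_type: str, kind: str = "VECTOR", env=None) -> dict:
    """Read {TYPE}_{KIND}_DB_CONFIG and parse it as a JSON object."""
    env = os.environ if env is None else env
    var_name = f"{store_type.upper()}_{kind.upper()}_DB_CONFIG"
    raw = env.get(var_name, "")
    if not raw.strip():
        return {}  # variable unset: fall back to the adapter's defaults
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError(f"{var_name} must contain a JSON object")
    return config


# Sample environment for illustration only.
sample_env = {"QDRANT_VECTOR_DB_CONFIG": '{"host": "localhost", "port": 6333}'}
print(load_store_config("qdrant", env=sample_env))
print(load_store_config("milvus", env=sample_env))
```

Passing `kind="SEARCH"` would resolve the matching `{TYPE}_SEARCH_DB_CONFIG` variable in the same way.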
Search / BM25 Stores
Set SEARCH_DB; set SEARCH_BACKEND=langchain for LC adapters:
| SEARCH_DB | LI | LC |
|---|---|---|
| bm25 | LI BM25Retriever | LC BM25Retriever (in-memory) |
| elasticsearch | LI ES client | ElasticsearchStore BM25 |
| opensearch | LI OS client | LC OpenSearch BM25 |
Per-store config: {TYPE}_SEARCH_DB_CONFIG={"host":...}
Chunker Backends
CHUNKER_BACKEND=llamaindex # SentenceSplitter (default)
CHUNKER_BACKEND=langchain # LC text splitter (LC_SPLITTER_TYPE selects which)
LC_SPLITTER_TYPE=recursive # RecursiveCharacterTextSplitter
LC_SPLITTER_TYPE=character # CharacterTextSplitter
LC_SPLITTER_TYPE=token # TokenTextSplitter
LC_SPLITTER_TYPE=markdown # MarkdownTextSplitter
LC_SPLITTER_TYPE=python # PythonCodeTextSplitter
LC_SPLITTER_TYPE=sentence_transformers # HuggingFace sentence-transformers splitter
LC chunks are stashed as system._last_lc_chunks and passed directly to LC vector/search stores — no re-embedding.
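One way to picture the LC_SPLITTER_TYPE switch is a lookup from the setting to a splitter class in the `langchain_text_splitters` package. The sketch below is an assumption about how such a mapping could look, not the project's code: the helper names are hypothetical, and mapping `sentence_transformers` to `SentenceTransformersTokenTextSplitter` is a guess based on the description above.

```python
import importlib

# Setting value -> class name in langchain_text_splitters.
SPLITTER_CLASS_NAMES = {
    "recursive": "RecursiveCharacterTextSplitter",
    "character": "CharacterTextSplitter",
    "token": "TokenTextSplitter",
    "markdown": "MarkdownTextSplitter",
    "python": "PythonCodeTextSplitter",
    "sentence_transformers": "SentenceTransformersTokenTextSplitter",  # assumed
}


def splitter_class_name(splitter_type: str = "recursive") -> str:
    """Resolve an LC_SPLITTER_TYPE value to a class name."""
    try:
        return SPLITTER_CLASS_NAMES[splitter_type.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown LC_SPLITTER_TYPE: {splitter_type!r}") from None


def load_splitter_class(splitter_type: str = "recursive"):
    """Import the class lazily so the package is only needed when it is used."""
    module = importlib.import_module("langchain_text_splitters")
    return getattr(module, splitter_class_name(splitter_type))


print(splitter_class_name("markdown"))
```

The lazy import keeps LlamaIndex-only deployments from needing the LangChain splitter package at startup.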
KG Extraction Backends
KG_EXTRACTOR_BACKEND=llamaindex # SchemaLLMPathExtractor (default) / DynamicLLMPathExtractor
KG_EXTRACTOR_BACKEND=langchain # LLMGraphTransformer
If KG_EXTRACTOR_BACKEND=langchain and GRAPH_BACKEND=langchain, LC GraphDocument objects are written directly to the LC graph store via add_graph_documents(). If KG_EXTRACTOR_BACKEND=llamaindex and GRAPH_BACKEND=langchain, the LI triplets are converted to GraphDocument via aingest_li_to_lc_graph().
skip_graph Parameter
Pass skip_graph=true on a per-ingest call to skip KG extraction and all graph store writes (both property graph and RDF) for that document only. Vector and full-text stores are still updated. Available from the UI, REST API (POST /api/ingest, /api/ingest-text, /api/test-sample), MCP tools (ingest_documents, ingest_text, test_with_sample), and the Python API (backend.ingest_documents(skip_graph=True), backend.ingest_text(skip_graph=True)). Also persisted per-datasource in the incremental sync config.
To disable graph extraction globally on every ingest: set ENABLE_KNOWLEDGE_GRAPH=false in .env. Previously ingested graph data is not deleted in either case — hybrid search and AI Q&A continue to return results from earlier extractions.
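For the REST path, a request with the flag set might be built as in the sketch below. Only the `POST /api/ingest` endpoint and the `skip_graph` name come from this document; the base URL, the idea that the flag travels as a JSON body field, and the `build_ingest_request` helper are assumptions, and the rest of the payload is left to the caller because its schema is not described here.

```python
import json
import urllib.request


def build_ingest_request(base_url: str, payload: dict, skip_graph: bool = False):
    """Build (but do not send) a POST /api/ingest request with skip_graph set."""
    body = dict(payload)             # copy so the caller's dict is not mutated
    body["skip_graph"] = skip_graph  # assumed to be a JSON body field
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/ingest",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local server address; adjust to the actual deployment.
request = build_ingest_request("http://localhost:8000", {}, skip_graph=True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request with `urllib.request.urlopen(request)` would then ingest into the vector and full-text stores while leaving both graph stores untouched for that call.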
Retrieval Fusion
RETRIEVAL_FUSION=llamaindex # QueryFusionRetriever, mode=relative_score (default)
RETRIEVAL_FUSION=langchain # EnsembleRetriever (RRF); only activates when ALL retrievers are LC-backed
EnsembleRetriever lives in langchain_classic.retrievers.ensemble. The pipeline falls back silently to QueryFusionRetriever if any LI-native retriever is present.
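For readers unfamiliar with RRF, the sketch below shows weighted reciprocal rank fusion in plain Python together with the all-LC activation rule described above. It is a standalone illustration of the technique, not the EnsembleRetriever source: the function names are hypothetical, the constant `c=60` is the value commonly used for RRF, and retrievers are modelled simply by the `"lc"` / `"li"` layer strings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, weights=None, c=60):
    """Fuse ranked lists of document ids; each hit adds weight / (c + rank)."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("one weight is required per ranked list")
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)


def choose_fusion(requested: str, retriever_layers: list[str]) -> str:
    """EnsembleRetriever only when requested and every retriever is LC-backed."""
    all_lc = bool(retriever_layers) and all(layer == "lc" for layer in retriever_layers)
    if requested == "langchain" and all_lc:
        return "EnsembleRetriever"
    return "QueryFusionRetriever"


print(reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]]))
print(choose_fusion("langchain", ["lc", "li"]))
```

Because RRF uses only ranks, it can combine retrievers whose raw scores are on different scales, which is the main practical difference from the relative_score mode of QueryFusionRetriever.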
Note: The local flexible-graphrag/langchain/ package folder shadows the pip-installed langchain package. Always use langchain_classic / langchain_core for the real LangChain packages inside the codebase.
Configuration Reference: Framework Env Vars
| Variable | Default | Values / Notes |
|---|---|---|
| CHUNKER_BACKEND | llamaindex | llamaindex, langchain |
| LC_SPLITTER_TYPE | recursive | recursive, character, token, markdown, python, sentence_transformers |
| GRAPH_BACKEND | llamaindex | llamaindex, langchain (auto for LC-only stores) |
| VECTOR_BACKEND | llamaindex | llamaindex, langchain |
| SEARCH_BACKEND | llamaindex | llamaindex, langchain |
| KG_EXTRACTOR_BACKEND | llamaindex | llamaindex, langchain |
| RETRIEVAL_FUSION | llamaindex | llamaindex, langchain |
| USE_LC_TEXT_TO_GRAPH | false | For Neo4j + LANGCHAIN_PG_VECTOR_SEARCH=true only: re-add TextToGraphQueryRetriever alongside vector+neighborhood. Auto-enabled for all other LC stores and for Neo4j with LANGCHAIN_PG_VECTOR_SEARCH=false. |
| USE_PG_NEIGHBORHOOD | true | GraphNeighborhoodRetriever: k-hop walk (Neo4j only); auto-enabled when LANGCHAIN_PG_VECTOR_SEARCH=true |
| LANGCHAIN_PG_VECTOR_SEARCH | false | GraphEntityVectorRetriever: entity vector seeding (Neo4j only); auto-enables USE_PG_NEIGHBORHOOD; suppresses text-to-query unless USE_LC_TEXT_TO_GRAPH=true |
| USE_SYNONYM_EXPLODER | false | Expand query with LLM-generated synonyms (opt-in) |
| SYNONYM_EXPLODER_SCOPE | none | Comma-separated retriever tags to apply synonym expansion |
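The backend pickers in this reference lend themselves to a small validation step at startup. The sketch below is one possible shape for it, assuming values are compared case-insensitively: `load_framework_config` and `BACKEND_VARS` are hypothetical names, while the variable names, defaults and allowed values are taken from the table above.

```python
import os

# Variables that accept exactly "llamaindex" or "langchain".
BACKEND_VARS = (
    "CHUNKER_BACKEND", "GRAPH_BACKEND", "VECTOR_BACKEND",
    "SEARCH_BACKEND", "KG_EXTRACTOR_BACKEND", "RETRIEVAL_FUSION",
)
ALLOWED_BACKENDS = ("llamaindex", "langchain")


def load_framework_config(env=None) -> dict:
    """Read the backend pickers, apply defaults, and reject unknown values."""
    env = os.environ if env is None else env
    config = {}
    for name in BACKEND_VARS:
        # Assumption: values are normalised to lower case before checking.
        value = env.get(name, "llamaindex").strip().lower()
        if value not in ALLOWED_BACKENDS:
            raise ValueError(f"{name}={value!r}; expected one of {ALLOWED_BACKENDS}")
        config[name] = value
    return config


print(load_framework_config({"VECTOR_BACKEND": "langchain"}))
```

Failing fast on a misspelled backend name avoids silently running a stage on the default framework.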