LangChain Graph Database Integration

This document describes the integration of LangChain's graph database support into flexible-graphrag, which adds natural language querying of RDF and property graph stores to hybrid retrieval.

Overview

The LangChain integration provides:

  1. Natural Language to Query Translation: Convert natural language questions to SPARQL or Cypher automatically
  2. Hybrid Retrieval: Combine LangChain graph retrievers with existing vector, BM25, and property graph retrievers via QueryFusionRetriever
  3. Same LLM Configuration: Uses whichever LLM is already configured — no separate LLM setup required
  4. Schema-Guided Generation: Live predicate/type schema fetched from the store at startup; missing namespace prefixes auto-injected
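Item 4's prefix auto-injection can be illustrated with a minimal sketch. The `KNOWN_PREFIXES` map and function name here are hypothetical; the real adapters fetch prefixes from the store's schema at startup:

```python
import re

# Hypothetical prefix map; the real adapters build this from the live store schema.
KNOWN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/",
}

def inject_missing_prefixes(sparql: str) -> str:
    """Prepend PREFIX declarations for any known prefix used but not declared."""
    declared = set(re.findall(r"PREFIX\s+(\w+):", sparql, re.IGNORECASE))
    used = set(re.findall(r"\b(\w+):\w", sparql)) - declared
    header = "".join(
        f"PREFIX {p}: <{KNOWN_PREFIXES[p]}>\n"
        for p in sorted(used) if p in KNOWN_PREFIXES
    )
    return header + sparql
```

An LLM-generated query such as `SELECT ?s WHERE { ?s rdf:type ex:Person }` would come back with `PREFIX rdf:` and `PREFIX ex:` lines prepended; queries that already declare their prefixes pass through unchanged.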

Architecture

+-------------------------------------------------------------+
|                  User Query (Natural Language)              |
+------------------------+------------------------------------+
                         |
         +---------------+---------------+
         |   QueryFusionRetriever        |
         |   (LlamaIndex)                |
         +---+--------+--------+-----+---+
             |        |        |     |
             |        |        |     +--------------+
             |        |        |                    |
     +-------+--+  +--+----+ ++---------+  +-------+----------+
     | Vector   |  | BM25  | |Property  |  | RDF Graph        |
     |Retriever |  |       | |Graph     |  | Retriever        |
     |          |  |       | |Retriever |  | (LangChain)      |
     +----------+  +-------+ +----------+  +------+-----------+
                                                   |
                                    +--------------+----------+
                                    | LangChain QA Chain      |
                                    | (NL → SPARQL/Cypher)    |
                                    +----------+--------------+
                                               |
                                    +----------+----------+
                                    | Graph Database      |
                                    | (GraphDB, Fuseki,   |
                                    |  Oxigraph, Neo4j)   |
                                    +---------------------+
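The fan-out above can be sketched in miniature with plain Python stubs standing in for the real retrievers (all names here are illustrative, not the actual flexible-graphrag or LlamaIndex classes, and the fusion step is reduced to score summation):

```python
from typing import Callable

# Stub retrievers: each maps a query to (doc_id, score) pairs.
Retriever = Callable[[str], list]

def vector_retriever(q: str): return [("doc1", 0.9), ("doc2", 0.6)]
def bm25_retriever(q: str): return [("doc2", 0.8), ("doc3", 0.5)]
def rdf_retriever(q: str):
    # In the real system this branch runs the LangChain QA chain:
    # NL -> SPARQL -> graph database -> verbalized results.
    return [("doc3", 0.7)]

def fuse(query: str, retrievers: list) -> list:
    """Pool results from all retrievers, summing scores per document."""
    scores = {}
    for retrieve in retrievers:
        for doc_id, score in retrieve(query):
            scores[doc_id] = scores.get(doc_id, 0.0) + score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = fuse("Who works at Acme?", [vector_retriever, bm25_retriever, rdf_retriever])
```

Note how a document surfaced by two retrievers ("doc2") outranks one with a single strong hit; the real QueryFusionRetriever applies weighted, normalized fusion rather than a raw sum.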

Supported Databases

RDF Stores (SPARQL)

1. Ontotext GraphDB

  • Status: ✅ Fully Implemented and Tested
  • Deployment: Docker
  • Features:
      • Schema introspection with live predicate/type fetching
      • Missing PREFIX declarations auto-injected
      • Enterprise features (sharding, clustering, OWL reasoning)

Configuration:

USE_LANGCHAIN_RDF=true
RDF_STORE_TYPE=graphdb
GRAPHDB_BASE_URL=http://localhost:7200
GRAPHDB_REPOSITORY=flexible-graphrag
GRAPHDB_USERNAME=admin
GRAPHDB_PASSWORD=admin

Docker Setup:

cd docker
docker-compose up -d graphdb
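Once the container is up, connectivity can be sanity-checked from Python against the same /repositories endpoint used in the Troubleshooting section below (a minimal helper, no auth or error detail handling):

```python
import urllib.request

def repositories_url(base_url: str) -> str:
    """Build the GraphDB repository-listing endpoint from the base URL."""
    return base_url.rstrip("/") + "/repositories"

def check_graphdb(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the GraphDB REST endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(repositories_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# Usage (requires a running instance):
# check_graphdb("http://localhost:7200")
```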

2. Apache Jena Fuseki

  • Status: ✅ Fully Implemented and Tested
  • Deployment: Docker
  • Features:
      • Full SPARQL 1.1 + SPARQL Update support
      • RDF 1.2 annotations (legacy << >> Turtle-star syntax on export)
      • HTTP Basic Auth

Configuration:

USE_LANGCHAIN_RDF=true
RDF_STORE_TYPE=fuseki
FUSEKI_ENABLED=true
FUSEKI_BASE_URL=http://localhost:3030
FUSEKI_DATASET=flexible-graphrag

3. Oxigraph

  • Status: ✅ Fully Implemented and Tested
  • Deployment: Docker (lightweight, good for local dev)
  • Features:
      • RDF 1.2 blank-node reifier syntax on export
      • SPARQL endpoint at /query; upload via /store (N-Quads, not Turtle)

Configuration:

USE_LANGCHAIN_RDF=true
RDF_STORE_TYPE=oxigraph
OXIGRAPH_ENABLED=true
OXIGRAPH_URL=http://localhost:7878
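Because /store expects N-Quads rather than Turtle, an upload is just a POST with the right content type. A minimal sketch of the request construction (no error handling, and the function name is illustrative):

```python
import urllib.request

def oxigraph_upload_request(base_url: str, nquads: str) -> urllib.request.Request:
    """Build the POST that loads an N-Quads payload into Oxigraph's /store endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/store",
        data=nquads.encode("utf-8"),
        headers={"Content-Type": "application/n-quads"},
        method="POST",
    )

# Usage (requires a running instance):
# urllib.request.urlopen(oxigraph_upload_request("http://localhost:7878", payload))
```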

4. Amazon Neptune (RDF/SPARQL)

  • Status: ⚠️ Implemented, untested
  • Deployment: AWS Cloud
  • Features:
      • IAM authentication support
      • VPC endpoint access

Configuration:

USE_LANGCHAIN_RDF=true
RDF_STORE_TYPE=neptune_rdf
NEPTUNE_HOST=my-cluster.cluster-xyz.us-east-1.neptune.amazonaws.com
NEPTUNE_PORT=8182
NEPTUNE_REGION=us-east-1
NEPTUNE_USE_IAM_AUTH=true
NEPTUNE_USE_HTTPS=true

Property Graphs

1. Neo4j

  • Status: ✅ Retrieval working
  • Query Language: Cypher
  • Notes: Uses existing Neo4j connection from main config; no separate setup

2. ArangoDB, Apache AGE, Azure Cosmos DB Gremlin, Google Cloud Spanner Graph

  • Status: 🔲 Placeholder stubs — not yet implemented

Usage

Basic Setup

  1. Enable LangChain RDF retrieval in .env:

    USE_LANGCHAIN_RDF=true
    RDF_STORE_TYPE=graphdb  # or fuseki, oxigraph, neptune_rdf
    RDF_RETRIEVAL_TOP_K=5
    RDF_RETRIEVAL_WEIGHT=0.3
    

  2. Configure the store (Fuseki example):

    FUSEKI_ENABLED=true
    FUSEKI_BASE_URL=http://localhost:3030
    FUSEKI_DATASET=flexible-graphrag
    

  3. Ingest documents via the normal API — if INGESTION_STORAGE_MODE=both or rdf_only, triples are written to the RDF store automatically.

  4. Query — hybrid search automatically includes the RDF retriever:

    response = system.query("Who works at Acme?")
    

Advanced: Custom QA Chain

from rdf.langchain_adapters.graphdb_langchain_adapter import GraphDBLangChainAdapter
from langchain_openai import ChatOpenAI

adapter = GraphDBLangChainAdapter({
    "base_url": "http://localhost:7200",
    "repository": "flexible-graphrag",
    "username": "admin",
    "password": "admin",
    "ontology_file": "./rdf/schemas/company_ontology.ttl"
})

llm = ChatOpenAI(model="gpt-4o-mini")
qa_chain = adapter.create_qa_chain(llm)

result = qa_chain({"query": "What departments exist at TechCorp?"})
print(result["result"])
print(result["generated_sparql"])

Configuration Reference

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| USE_LANGCHAIN_RDF | false | Enable LangChain RDF retrieval |
| RDF_STORE_TYPE | - | Store type: graphdb, fuseki, oxigraph, or neptune_rdf |
| RDF_RETRIEVAL_TOP_K | 5 | Number of results from RDF retriever |
| RDF_RETRIEVAL_WEIGHT | 0.3 | Weight in fusion (0.0-1.0) |
| GRAPHDB_BASE_URL | http://localhost:7200 | GraphDB endpoint |
| GRAPHDB_REPOSITORY | flexible-graphrag | GraphDB repository name |
| GRAPHDB_USERNAME | admin | GraphDB username |
| GRAPHDB_PASSWORD | admin | GraphDB password |
| FUSEKI_ENABLED | false | Enable Fuseki store |
| FUSEKI_BASE_URL | http://localhost:3030 | Fuseki endpoint |
| FUSEKI_DATASET | flexible-graphrag | Fuseki dataset name |
| OXIGRAPH_ENABLED | false | Enable Oxigraph store |
| OXIGRAPH_URL | http://localhost:7878 | Oxigraph HTTP endpoint |
| NEPTUNE_HOST | - | Neptune cluster endpoint |
| NEPTUNE_PORT | 8182 | Neptune port |
| NEPTUNE_REGION | us-east-1 | AWS region |
| NEPTUNE_USE_IAM_AUTH | false | Use IAM authentication |
| NEPTUNE_USE_HTTPS | true | Use HTTPS |
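A loader honoring the documented defaults might look like the sketch below (the dataclass and function names are illustrative, not the project's actual config code):

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class RdfRetrievalConfig:
    enabled: bool
    store_type: Optional[str]
    top_k: int
    weight: float

def load_rdf_config(env=os.environ) -> RdfRetrievalConfig:
    """Read the RDF retrieval settings, applying the defaults from the table above."""
    return RdfRetrievalConfig(
        enabled=env.get("USE_LANGCHAIN_RDF", "false").lower() == "true",
        store_type=env.get("RDF_STORE_TYPE"),  # no default: must be set explicitly
        top_k=int(env.get("RDF_RETRIEVAL_TOP_K", "5")),
        weight=float(env.get("RDF_RETRIEVAL_WEIGHT", "0.3")),
    )
```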

Hybrid Retriever Weights

  • Low emphasis (0.1-0.2): Primarily vector/text search with light graph augmentation
  • Medium emphasis (0.3-0.5): Balanced hybrid approach (recommended)
  • High emphasis (0.6-0.8): Graph-centric with vector/text support
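The effect of the weight can be shown with simple arithmetic. This is a two-retriever simplification: the actual fusion inside QueryFusionRetriever normalizes and fuses ranked lists, but the weight shifts the outcome in the same spirit:

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Combine per-retriever scores for one document using the given weights."""
    return sum(weights[name] * s for name, s in scores.items())

weights = {"vector": 0.7, "rdf": 0.3}  # medium graph emphasis
doc_a = {"vector": 0.9, "rdf": 0.1}    # strong text match, weak graph match
doc_b = {"vector": 0.4, "rdf": 0.9}    # weak text match, strong graph match
# With rdf weight 0.3, doc_a (0.66) still outranks doc_b (0.55);
# raising the rdf weight to 0.6 flips the order (0.42 vs 0.70).
```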

Architecture Decisions

Why LangChain for Retrieval?

  1. Natural Language Query Translation: QA chains provide NL→SPARQL/Cypher translation with schema awareness
  2. Error Correction: Iterative refinement of generated queries based on database errors
  3. Separation of Concerns: LlamaIndex handles ingestion/embedding; LangChain handles NL query generation

Why Not LangChain for Ingestion?

  1. Performance: Building the graph locally with RDFLib and bulk-uploading it via REST is faster than issuing per-triple SPARQL INSERT statements
  2. Flexibility: LlamaIndex abstractions support multiple databases with the same code
  3. Incremental Updates: LlamaIndex supports clean incremental updates with ref_doc_id tracking
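The performance point can be made concrete: local construction produces one payload shipped in a single HTTP request, while update-based ingestion costs one round trip per statement. An illustrative stdlib sketch (real ingestion uses RDFLib, not manual string building):

```python
def as_bulk_payload(triples) -> str:
    """One N-Triples document: the whole batch goes up in a single HTTP request."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)

def as_insert_statements(triples) -> list:
    """One SPARQL Update per triple: one round trip each."""
    return [f"INSERT DATA {{ <{s}> <{p}> <{o}> . }}" for s, p, o in triples]

triples = [(f"http://example.org/e{i}", "http://example.org/p", "http://example.org/o")
           for i in range(1000)]
# 1 request for the bulk payload vs. 1000 requests for the INSERTs.
```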

Troubleshooting

GraphDB Connection Issues

Error: ConnectionError: Failed to connect to GraphDB

  1. Check Docker: docker ps | grep graphdb
  2. Check logs: docker logs flexible-graphrag-graphdb-1
  3. Verify endpoint: curl http://localhost:7200/repositories
  4. Check credentials in .env

No Results from RDF Retriever

  1. Verify USE_LANGCHAIN_RDF=true in .env
  2. Check that the RDF store has data: python scripts/rdf_cleanup.py list-docs
  3. Increase RDF_RETRIEVAL_TOP_K to 10 or more
  4. Check startup logs for RDF retriever initialization errors

Pydantic Schema Errors

Error: Unable to generate pydantic-core schema

  1. Ensure strict=False in OntologyGuidedExtractor
  2. Check LlamaIndex version: pip list | grep llama-index-core
  3. Update if needed: pip install -U llama-index-core

References