Framework Integration (LlamaIndex and LangChain)

This document describes how LlamaIndex and LangChain integrate with flexible-graphrag as full peer frameworks — covering graph, vector, search, chunking, KG extraction, and hybrid retrieval.

Overview

LangChain is not just a retrieval add-on; it is a first-class backend for every pipeline stage. Each stage can run on either LlamaIndex or LangChain independently:

| Stage | LlamaIndex | LangChain |
|---|---|---|
| Chunking | SentenceSplitter | RecursiveCharacterTextSplitter (+ 5 other splitters) |
| KG Extraction | SchemaLLMPathExtractor / DynamicLLMPathExtractor | LLMGraphTransformer |
| Property Graph ingestion | PropertyGraphIndex | add_graph_documents() via LC graph store |
| Property Graph retrieval | VectorContextRetriever | TextToGraphQueryRetriever (Cypher/AQL/GSQL/SurrealQL) |
| Vector ingestion | VectorStoreIndex | LC vector store add_documents() |
| Vector retrieval | VectorIndexRetriever | LangChainVectorStoreRetriever |
| Search ingestion | ES/OpenSearch LI index | LC add_documents() |
| Search retrieval | LI BM25 / ES retriever | LC BM25 / ES / OpenSearch retriever |
| Hybrid fusion | QueryFusionRetriever | EnsembleRetriever (RRF) |

Framework Configuration

# -- Pipeline stage pickers ---------------------------
CHUNKER_BACKEND=llamaindex        # llamaindex | langchain
GRAPH_BACKEND=llamaindex          # llamaindex | langchain  (auto=langchain for LC-only stores)
VECTOR_BACKEND=llamaindex         # llamaindex | langchain
SEARCH_BACKEND=llamaindex         # llamaindex | langchain
KG_EXTRACTOR_BACKEND=llamaindex   # llamaindex | langchain
RETRIEVAL_FUSION=llamaindex       # llamaindex | langchain (EnsembleRetriever/RRF)

# -- LC-specific splitter (when CHUNKER_BACKEND=langchain) --
LC_SPLITTER_TYPE=recursive        # recursive | character | token | markdown | python | sentence_transformers

GRAPH_BACKEND is auto-selected to langchain when PG_GRAPH_DB is set to a LangChain-only store (ArangoDB, Apache AGE, HugeGraph, SurrealDB, TigerGraph, Cosmos Gremlin, Spanner).
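
The auto-selection rule above can be sketched in a few lines of Python. This is only an illustration: `resolve_graph_backend` and `LC_ONLY_GRAPH_STORES` are hypothetical names, and the sketch assumes that a LangChain-only store overrides an explicit GRAPH_BACKEND=llamaindex setting.

```python
import os

# Stores the document lists as LangChain-only. The keys mirror the
# PG_GRAPH_DB values used elsewhere in this document.
LC_ONLY_GRAPH_STORES = {
    "arangodb", "apache_age", "hugegraph", "surrealdb",
    "tigergraph", "cosmos_gremlin", "spanner",
}


def resolve_graph_backend(env=None):
    """Hypothetical helper: return 'llamaindex' or 'langchain' for the graph stage."""
    env = os.environ if env is None else env
    store = env.get("PG_GRAPH_DB", "").strip().lower()
    if store in LC_ONLY_GRAPH_STORES:
        # Assumption: LC-only stores force the LangChain backend,
        # whatever GRAPH_BACKEND says.
        return "langchain"
    backend = env.get("GRAPH_BACKEND", "llamaindex").strip().lower()
    if backend not in ("llamaindex", "langchain"):
        raise ValueError(f"Unsupported GRAPH_BACKEND: {backend!r}")
    return backend
```

Passing a plain dict instead of `os.environ` makes the rule easy to check: `resolve_graph_backend({"PG_GRAPH_DB": "arangodb"})` yields `"langchain"` even though GRAPH_BACKEND is unset.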

Full Architecture Diagram

                              User Query
                                   |
                   +---------------+-----------------+
                   |        Retrieval Fusion         |
                   |   (LI QueryFusion or            |
                   |    LC EnsembleRetriever)        |
                   +-+--+--+--+--+--+--+-------------+
                     |  |  |  |  |  |
     +---------------+  |  |  |  |  +------------------+
     |          +--------+  |  |  +----------+          |
     |          |           |  +------+      |          |
     |          |           |         |      |          |
  +--+------++--+---++------+--++-----+--++--+-----++---+------+
  | Vector  || Search|| Prop.   ||  RDF   || Neigh- || PG Vec. |
  | Retr.   ||  BM25/|| Graph   ||  Graph || borhood|| (Entity |
  | LI / LC ||  ES/OS|| LI / LC ||  SPARQL||  Cypher||  Vector)|
  +----+----++-------+| (text-  ||  (LC)  ||  (LC)  ||  (LC)   |
       |               | to-     |+--------++--------++--------+
       |               | query)  |     |          |        |
  Vector DB            +----+----+     |          +--------+
  (Qdrant, Milvus,          |      RDF Store       Neo4j
   Weaviate, etc.)     Property   (GraphDB,      (entity-level
                       Graph DB    Fuseki,        __Entity__
                       (Neo4j,     Oxigraph,      vector index)
                        ArangoDB,  Neptune RDF)
                        AGE, etc.)

                  +----------------------------+
                  |     Ingest Pipeline        |
                  +----------------------------+
                  |  LI readers (always)       |
                  |  -> Chunker (LI or LC)     |
                  |  -> Embedder (LI or LC)    |
                  |  -> KG Extractor (LI or LC)|
                  |  -> Vector store update    |
                  |  -> Search index update    |
                  |  -> Property graph update  |
                  |  -> RDF store update       |
                  +----------------------------+

Retriever Layer Architecture

Retrievers follow a two-layer prefix convention:

| Prefix | Layer | Base class | Purpose |
|---|---|---|---|
| lc_ | 0 (pure LC) | langchain_core.BaseRetriever | Passed directly to EnsembleRetriever |
| li_ | 1 (LI wrapper) | llama_index.BaseRetriever | Used with QueryFusionRetriever; .as_lc_retriever() for hybrid |

Bridge classes in langchain/retriever_bridge.py:

- LItoLCRetriever: wraps an LI retriever so the LC ensemble can call it
- LCBackedLIRetriever: wraps an LC retriever in the LI interface for QueryFusionRetriever

Source labeling: every result carries the database it came from (e.g. "company-ontology.txt | Qdrant vector", "company-ontology.txt | Ontotext GraphDB rdf graph"). LoggingRetriever._postprocess() / LCLoggingRetriever._tag_docs() inject _retriever_label into node metadata; query_engine.py builds the display string.
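
The source-labeling flow can be illustrated with a small, library-free sketch. It is not the project's code: results are modeled as plain dicts, `tag_results` and `build_source_label` are hypothetical names, and the `file_name` metadata key is an assumption. Only the `_retriever_label` key and the "file | store" display format come from the description above.

```python
def tag_results(results, retriever_label):
    """Attach the retriever label to each result's metadata (hypothetical shape)."""
    for result in results:
        result.setdefault("metadata", {})["_retriever_label"] = retriever_label
    return results


def build_source_label(result):
    """Build a display string such as 'company-ontology.txt | Qdrant vector'."""
    metadata = result.get("metadata", {})
    # 'file_name' is an assumed metadata key, used here for illustration only.
    file_name = metadata.get("file_name", "unknown source")
    label = metadata.get("_retriever_label")
    return f"{file_name} | {label}" if label else file_name


hits = tag_results(
    [{"text": "example chunk", "metadata": {"file_name": "company-ontology.txt"}}],
    "Qdrant vector",
)
print(build_source_label(hits[0]))
```

Tagging at the retriever and formatting at the query engine keeps the label available to both fusion backends, since it travels in the metadata rather than in the retriever object.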

Property Graph Databases

LangChain-Only Stores

These stores only work with GRAPH_BACKEND=langchain (auto-selected when PG_GRAPH_DB is set to one of these):

| PG_GRAPH_DB | Database | Docker Port | Query Language | Key Notes |
|---|---|---|---|---|
| arangodb | ArangoDB | 8529 | AQL | ArangoGraph + GraphCypherQAChain |
| apache_age | Apache AGE | 5434 | Cypher | _AGEGraphFixed dollar-quoting bypass; _extract_return_aliases() |
| cosmos_gremlin | Azure Cosmos DB / TinkerPop | 8182 | Gremlin | GremlinGraph |
| hugegraph | Apache HugeGraph | 8082 | openCypher | Custom add_graph_documents; _safe_id replaces spaces; Hubble UI at 8085 |
| surrealdb | SurrealDB | 8010 | SurrealQL | add_graph_documents injects name/type; async client; Surrealist UI at 8011 |
| tigergraph | TigerGraph Community | 9002 | GSQL | _ensure_graph() auto-create; nlqs_host bypass; GraphStudio at 14240 |
| spanner | Google Cloud Spanner | 9010/9020 | Spanner Graph (Cypher) | Emulator supported; SPANNER_EMULATOR_HOST=localhost:9010 |

Supported with Both LlamaIndex and LangChain

These work with either GRAPH_BACKEND=llamaindex or GRAPH_BACKEND=langchain:

| PG_GRAPH_DB | LI Ingestion | LC Retrieval | Notes |
|---|---|---|---|
| neo4j | PropertyGraphIndex | GraphCypherQAChain | Shared bolt connection |
| arcadedb | ArcadeDBPropertyGraphStore | Custom Cypher adapter | Remote + embedded modes |
| falkordb | FalkorDBPropertyGraphStore | FalkorDBGraph | |
| memgraph | MemgraphPropertyGraphStore | MemgraphGraph | |
| nebula | NebulaPropertyGraphStore | NebulaGraph | Dynamic schema patch for arbitrary props |
| neptune | NeptuneDatabase | NeptuneGraph | AWS cloud |
| neptune_analytics | NeptuneAnalytics | NeptuneAnalyticsGraph | AWS cloud |
| ladybug | LadybugPropertyGraphStore | LadybugGraph (via langchain-ladybug) | Embedded |

LangChain PG Retrieval Components

When GRAPH_BACKEND=langchain, three retriever types are available:

  1. TextToGraphQueryRetriever (li_graph_qa_retriever.py / lc_graph_retriever.py) — NL -> Cypher/AQL/GSQL/SurrealQL via LLM; uses per-store custom prompt templates
  2. GraphNeighborhoodRetriever (li_neighborhood_retriever.py) — walks graph neighbors from seed entities; document text chunks score 2.0, entity stubs score 1.0
  3. GraphEntityVectorRetriever (langchain_retriever_wrapper.py) — semantic entity lookup via Neo4j vector index (Neo4j only)

Routing rules:

- TextToGraphQueryRetriever is auto-enabled for all LC non-vector stores (ArangoDB, AGE, HugeGraph, SurrealDB, TigerGraph, Cosmos Gremlin, Spanner) and for Neo4j when LANGCHAIN_PG_VECTOR_SEARCH=false (the default). It is suppressed for Neo4j when LANGCHAIN_PG_VECTOR_SEARCH=true; set USE_LC_TEXT_TO_GRAPH=true to re-enable it alongside vector retrieval.
- GraphNeighborhoodRetriever is Neo4j only and is auto-enabled when LANGCHAIN_PG_VECTOR_SEARCH=true.
- GraphEntityVectorRetriever is Neo4j only and off by default. Setting LANGCHAIN_PG_VECTOR_SEARCH=true turns it on and also auto-enables USE_PG_NEIGHBORHOOD.

USE_LC_TEXT_TO_GRAPH=true         # only needed for Neo4j when LANGCHAIN_PG_VECTOR_SEARCH=true
USE_PG_NEIGHBORHOOD=true          # default true; auto-enabled when LANGCHAIN_PG_VECTOR_SEARCH=true
LANGCHAIN_PG_VECTOR_SEARCH=false  # GraphEntityVectorRetriever (Neo4j only, default false)
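
One way to read the routing rules is as a small decision function. The sketch below is an interpretation rather than the project's implementation: `select_pg_retrievers` is a hypothetical name, and it assumes the neighborhood retriever is active on Neo4j whenever either USE_PG_NEIGHBORHOOD or LANGCHAIN_PG_VECTOR_SEARCH is true.

```python
def _flag(env, name, default):
    """Parse a boolean env-style value ('true'/'false', case-insensitive)."""
    return str(env.get(name, default)).strip().lower() == "true"


def select_pg_retrievers(pg_graph_db, env):
    """Hypothetical helper: which LC property-graph retrievers are active."""
    vector_search = _flag(env, "LANGCHAIN_PG_VECTOR_SEARCH", "false")
    if pg_graph_db != "neo4j":
        # Non-Neo4j stores get text-to-query only; the other two
        # retrievers are documented as Neo4j only.
        return {"text_to_graph": True, "neighborhood": False, "entity_vector": False}
    return {
        # Suppressed by vector search unless explicitly re-enabled.
        "text_to_graph": (not vector_search) or _flag(env, "USE_LC_TEXT_TO_GRAPH", "false"),
        # Assumption: the flag's own default (true) or vector search enables it.
        "neighborhood": vector_search or _flag(env, "USE_PG_NEIGHBORHOOD", "true"),
        "entity_vector": vector_search,
    }
```

For example, `select_pg_retrievers("neo4j", {"LANGCHAIN_PG_VECTOR_SEARCH": "true"})` turns on the entity-vector and neighborhood retrievers and drops text-to-query, while adding `"USE_LC_TEXT_TO_GRAPH": "true"` brings text-to-query back.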

RDF / SPARQL Stores

RDF stores are always accessed via LangChain SPARQL chains (controlled by RDF_GRAPH_DB):

| RDF_GRAPH_DB | Port | Adapter | Chain Type |
|---|---|---|---|
| graphdb | 7200 | GraphDBLangChainAdapter | _GraphDBQAChain (custom OntotextGraphDBQAChain subclass) |
| fuseki | 3030 | FusekiLangChainAdapter | _GenericSparqlQAChain (GraphSparqlQAChain subclass) |
| oxigraph | 7878 | OxigraphLangChainAdapter | _GenericSparqlQAChain |
| neptune_rdf | 8182 | NeptuneRDFAdapter | _GenericSparqlQAChain |

All adapters share:

- _ensure_sparql_prefixes(): auto-injects missing PREFIX declarations (kg:, onto:, company:, common:, rdfs:, rdf:, xsd:, owl:)
- Live schema introspection at startup (predicates + types fetched via SELECT DISTINCT ?p / ?t)
- SPARQL broad-fallback retry: on 0 rows, extracts the shortest entity keyword and retries with a bi-directional UNION
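
Prefix injection of this kind could look roughly like the following. This is a simplified sketch, not the adapters' actual `_ensure_sparql_prefixes()`: the function name, the regex-based detection, and the `kg:` IRI are assumptions. Only the four W3C namespace IRIs are standard values; the project-specific IRIs are not given in this document.

```python
import re

# Standard W3C namespaces plus one placeholder. The real IRIs for kg:,
# onto:, company: and common: are project-specific and not shown here.
KNOWN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "kg": "http://example.org/kg/",  # placeholder IRI (assumption)
}


def ensure_sparql_prefixes(query, prefixes=KNOWN_PREFIXES):
    """Prepend PREFIX lines for known prefixes that are used but not declared."""
    declared = set(re.findall(r"(?i)\bPREFIX\s+([A-Za-z][\w-]*)\s*:", query))
    missing = []
    for name, iri in prefixes.items():
        # A prefixed name looks like 'rdfs:label'; the lookbehind avoids
        # matching inside longer names or IRIs.
        used = re.search(rf"(?<![\w:/#]){re.escape(name)}:[A-Za-z_]", query)
        if used and name not in declared:
            missing.append(f"PREFIX {name}: <{iri}>")
    return "\n".join(missing + [query]) if missing else query


query = "SELECT ?s WHERE { ?s rdfs:label ?l ; rdf:type kg:Company }"
print(ensure_sparql_prefixes(query))
```

Running the function on its own output leaves the query unchanged, because every used prefix is already declared by then.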

Vector Stores

Set VECTOR_DB to pick the store; set VECTOR_BACKEND=langchain to use LC adapters:

| VECTOR_DB | LI Adapter | LC Adapter | Notes |
|---|---|---|---|
| qdrant | QdrantVectorStore | QdrantVectorStore (LC) | Default recommended |
| elasticsearch | ElasticsearchStore | ElasticsearchStore (LC) | |
| opensearch | OpensearchVectorClient | LC OpenSearch adapter | |
| milvus | MilvusVectorStore | Milvus (LC) | gRPC host/port; auto_id=True |
| weaviate | WeaviateVectorStore | WeaviateVectorStore (LC) | Sync client in FastAPI; Filter.by_property delete |
| chroma | ChromaVectorStore | Chroma (LC) | HTTP client or persist mode |
| pinecone | PineconeVectorStore | PineconeVectorStore (LC) | Cloud index |
| postgres | PGVectorStore | PGVector (LC) | langchain_pg_collection/embedding tables |
| lancedb | LanceDBVectorStore | LanceDB (LC) | uri/table_name new API; inspect.signature detection |
| neo4j | Neo4jVectorStore | Neo4jVector (LC) | Embedded in graph store |

Per-store config: {TYPE}_VECTOR_DB_CONFIG={"host":...} (e.g. QDRANT_VECTOR_DB_CONFIG, MILVUS_VECTOR_DB_CONFIG)
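
Because the per-store config value is a JSON object, reading it could be sketched as below. `load_store_config` is a hypothetical helper, and the host and port values in the example are illustrative rather than documented defaults.

```python
import json
import os


def load_store_config(store_type, kind="VECTOR", env=None):
    """Hypothetical helper: parse {TYPE}_{KIND}_DB_CONFIG as a JSON object."""
    env = os.environ if env is None else env
    var_name = f"{store_type.upper()}_{kind}_DB_CONFIG"
    raw = env.get(var_name, "").strip()
    if not raw:
        # Unset or empty variable: fall back to an empty config.
        return {}
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError(f"{var_name} must be a JSON object")
    return config


# Illustrative values only; a plain dict stands in for os.environ.
example_env = {"QDRANT_VECTOR_DB_CONFIG": '{"host": "localhost", "port": 6333}'}
print(load_store_config("qdrant", env=example_env))
```

The same helper would cover the search-store variables by passing `kind="SEARCH"`, which builds names such as ELASTICSEARCH_SEARCH_DB_CONFIG.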

Search / BM25 Stores

Set SEARCH_DB; set SEARCH_BACKEND=langchain for LC adapters:

| SEARCH_DB | LI | LC |
|---|---|---|
| bm25 | LI BM25Retriever | LC BM25Retriever (in-memory) |
| elasticsearch | LI ES client | ElasticsearchStore BM25 |
| opensearch | LI OS client | LC OpenSearch BM25 |

Per-store config: {TYPE}_SEARCH_DB_CONFIG={"host":...}

Chunker Backends

CHUNKER_BACKEND=llamaindex   # SentenceSplitter (default)
CHUNKER_BACKEND=langchain    # LC text splitter (LC_SPLITTER_TYPE selects which)

LC_SPLITTER_TYPE=recursive   # RecursiveCharacterTextSplitter
LC_SPLITTER_TYPE=character   # CharacterTextSplitter
LC_SPLITTER_TYPE=token       # TokenTextSplitter
LC_SPLITTER_TYPE=markdown    # MarkdownTextSplitter
LC_SPLITTER_TYPE=python      # PythonCodeTextSplitter
LC_SPLITTER_TYPE=sentence_transformers  # HuggingFace sentence-transformers splitter

LC chunks are stashed as system._last_lc_chunks and passed directly to LC vector/search stores — no re-embedding.

KG Extraction Backends

KG_EXTRACTOR_BACKEND=llamaindex   # SchemaLLMPathExtractor (default) / DynamicLLMPathExtractor
KG_EXTRACTOR_BACKEND=langchain    # LLMGraphTransformer

If KG_EXTRACTOR_BACKEND=langchain and GRAPH_BACKEND=langchain, LC GraphDocument objects are written directly to the LC graph store via add_graph_documents(). If KG_EXTRACTOR_BACKEND=llamaindex and GRAPH_BACKEND=langchain, the LI triplets are converted to GraphDocument via aingest_li_to_lc_graph().

skip_graph Parameter

Pass skip_graph=true on a per-ingest call to skip KG extraction and all graph store writes (both property graph and RDF) for that document only. Vector and full-text stores are still updated. Available from the UI, REST API (POST /api/ingest, /api/ingest-text, /api/test-sample), MCP tools (ingest_documents, ingest_text, test_with_sample), and the Python API (backend.ingest_documents(skip_graph=True), backend.ingest_text(skip_graph=True)). Also persisted per-datasource in the incremental sync config.

To disable graph extraction globally on every ingest: set ENABLE_KNOWLEDGE_GRAPH=false in .env. Previously ingested graph data is not deleted in either case — hybrid search and AI Q&A continue to return results from earlier extractions.
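
The interaction between the per-call flag and the global switch can be summarized in a short sketch. The function names are hypothetical, and the sketch assumes ENABLE_KNOWLEDGE_GRAPH defaults to true when unset, which this document does not state.

```python
import os


def graph_extraction_enabled(skip_graph=False, env=None):
    """Hypothetical helper: should this ingest call run KG extraction?"""
    env = os.environ if env is None else env
    # Assumption: the global switch defaults to true when it is not set.
    globally_on = env.get("ENABLE_KNOWLEDGE_GRAPH", "true").strip().lower() == "true"
    return globally_on and not skip_graph


def stores_to_update(skip_graph=False, env=None):
    """List the store families written by one ingest call."""
    targets = ["vector", "search"]  # always updated
    if graph_extraction_enabled(skip_graph, env):
        targets += ["property_graph", "rdf"]
    return targets
```

Neither path deletes anything: the functions only decide what is written for the current call, which matches the statement that earlier graph data stays queryable.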

Retrieval Fusion

RETRIEVAL_FUSION=llamaindex   # QueryFusionRetriever, mode=relative_score (default)
RETRIEVAL_FUSION=langchain    # EnsembleRetriever (RRF); only activates when ALL retrievers are LC-backed

EnsembleRetriever is imported from langchain_classic.retrievers.ensemble. If any LI-native retriever is present, fusion silently falls back to QueryFusionRetriever.
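
For readers unfamiliar with RRF, the sketch below shows weighted reciprocal rank fusion in plain Python, together with the fallback rule. It is an illustration of the technique, not the EnsembleRetriever source: the function names are hypothetical, and the constant c=60 is the value commonly used for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, weights=None, c=60):
    """Fuse ranked lists of document ids; score = sum(weight / (rank + c))."""
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties keep first-seen order.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def choose_fusion(requested, retrievers_are_lc_backed):
    """Hypothetical helper: LC fusion only when every retriever is LC-backed."""
    if requested == "langchain" and all(retrievers_are_lc_backed):
        return "langchain"
    return "llamaindex"


vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

RRF uses only ranks, so it needs no score normalization across stores. The relative_score mode on the LlamaIndex side instead works from the retrievers' scores.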

Note: The local flexible-graphrag/langchain/ package folder shadows the pip-installed langchain package. Always use langchain_classic / langchain_core for the real LangChain packages inside the codebase.

Configuration Reference — Framework Env Vars

| Variable | Default | Values / Description |
|---|---|---|
| CHUNKER_BACKEND | llamaindex | llamaindex, langchain |
| LC_SPLITTER_TYPE | recursive | recursive, character, token, markdown, python, sentence_transformers |
| GRAPH_BACKEND | llamaindex | llamaindex, langchain (auto for LC-only stores) |
| VECTOR_BACKEND | llamaindex | llamaindex, langchain |
| SEARCH_BACKEND | llamaindex | llamaindex, langchain |
| KG_EXTRACTOR_BACKEND | llamaindex | llamaindex, langchain |
| RETRIEVAL_FUSION | llamaindex | llamaindex, langchain |
| USE_LC_TEXT_TO_GRAPH | false | For Neo4j + LANGCHAIN_PG_VECTOR_SEARCH=true only: re-add TextToGraphQueryRetriever alongside vector+neighborhood. Auto-enabled for all other LC stores and for Neo4j with LANGCHAIN_PG_VECTOR_SEARCH=false. |
| USE_PG_NEIGHBORHOOD | true | GraphNeighborhoodRetriever: k-hop walk (Neo4j only); auto-enabled when LANGCHAIN_PG_VECTOR_SEARCH=true |
| LANGCHAIN_PG_VECTOR_SEARCH | false | GraphEntityVectorRetriever: entity vector seeding (Neo4j only); auto-enables USE_PG_NEIGHBORHOOD; suppresses text-to-query unless USE_LC_TEXT_TO_GRAPH=true |
| USE_SYNONYM_EXPLODER | false | Expand query with LLM-generated synonyms (opt-in) |
| SYNONYM_EXPLODER_SCOPE | none | Comma-separated retriever tags to apply synonym expansion |
