Ingestion and Storage Modes¶
Overview¶
After the LLM extracts entities and relations from your documents, Flexible GraphRAG can store the resulting knowledge graph in:
- Property Graph (Neo4j, FalkorDB, ArcadeDB, Memgraph, …) — LPG model, Cypher queries
- RDF Graph Store (Fuseki, GraphDB, Oxigraph) — triple store, SPARQL queries
- Both simultaneously — set both
PG_GRAPH_DBandRDF_GRAPH_DBto non-nonevalues
Configuration¶
Use the two picker vars to control where extracted knowledge graph data is written:
PG_GRAPH_DB=neo4j # neo4j | arcadedb | falkordb | ... | none
RDF_GRAPH_DB=fuseki # fuseki | graphdb | oxigraph | none
# Base namespace for entity instance URIs in RDF output
RDF_BASE_NAMESPACE=https://integratedsemantics.org/flexible-graphrag/kg/
Storage Modes¶
Property Graph Only (Default)¶
Set RDF_GRAPH_DB=none. Stores extracted entities and relations in the configured property graph store only.
Data flow:
RDF Graph Store Only¶
Set PG_GRAPH_DB=none. Extracted entities and relations are converted to RDF and pushed directly to the selected RDF store. No property graph is written.
PG_GRAPH_DB=none
RDF_GRAPH_DB=fuseki
FUSEKI_BASE_URL=http://localhost:3030
FUSEKI_DATASET=flexible-graphrag
Data flow:
Use case: Semantic web / linked data workflows where you only need SPARQL access.
Both Simultaneously¶
Set both PG_GRAPH_DB and RDF_GRAPH_DB to non-none values. Extraction runs once; the same nodes are written to both destinations.
PG_GRAPH_DB=neo4j
RDF_GRAPH_DB=fuseki
FUSEKI_BASE_URL=http://localhost:3030
FUSEKI_DATASET=flexible-graphrag
Data flow:
┌→ PropertyGraphIndex → Neo4j (Cypher)
Documents → LLM Extraction ──┤
└→ KGToRDFConverter → Fuseki / GraphDB / Oxigraph (SPARQL)
Use case: Best of both worlds — fast Cypher traversal + SPARQL reasoning in a triple store.
Skipping Graph Extraction¶
Both the property graph and RDF graph stores can be bypassed at ingest time, while leaving previously extracted data intact for search and Q&A.
Per-document: skip_graph¶
Pass skip_graph=true on a per-ingest call to skip KG extraction (and therefore skip both PG and RDF writes) for that document only. Available from:
| Entry point | How |
|---|---|
| UI | "Skip graph extraction" checkbox on the ingest form |
| REST API | POST /api/ingest, POST /api/ingest-text, POST /api/test-sample — skip_graph in body |
| MCP tools | ingest_documents, ingest_text, test_with_sample — skip_graph parameter |
| Python API | backend.ingest_documents(skip_graph=True), backend.ingest_text(skip_graph=True) |
| Python API | system.ingest_documents(skip_graph=True), system.ingest_text(skip_graph=True) |
| Datasource config | skip_graph field persisted in DB — applies to every auto-sync cycle for that datasource (filesystem, S3, GCS, Azure Blob, Google Drive, OneDrive, Alfresco, Box) |
When skip_graph=true:
- LLM KG extraction is skipped entirely (faster ingest)
- No new triples are written to the RDF graph store
- No new nodes/edges are written to the property graph store
- Vector and full-text (BM25/search) stores are still updated normally
Global: ENABLE_KNOWLEDGE_GRAPH=false¶
Set in .env to disable KG extraction for all ingestion globally:
Equivalent to passing skip_graph=true on every ingest call — KG extraction and graph writes are skipped for both PG and RDF stores.
Previously extracted data is preserved¶
In both cases, previously ingested graph data is not deleted. Hybrid Search and AI Query / Chat will still return results from graph extractions that happened before skip_graph was used or ENABLE_KNOWLEDGE_GRAPH was disabled. To remove stale graph data, use:
scripts/rdf_cleanup.py clear-doc <ref_doc_id>— remove one document's RDF triplesscripts/rdf_cleanup.py clear-all— wipe the entire RDF named graphscripts/cleanup.py— clear the property graph store
RDF Representation¶
Entity → RDF triples¶
# rdf:type from entity label
:alice_johnson rdf:type onto:EMPLOYEE .
# rdfs:label from entity name (original casing preserved)
:alice_johnson rdfs:label "Alice Johnson" .
# Datatype properties
:alice_johnson onto:age "30"^^xsd:integer .
:alice_johnson onto:email "alice@example.com" .
Relation → RDF 1.2 Annotation¶
LPG relations are typed edges that carry properties:
This maps to RDF 1.2 inline annotation syntax (Turtle 1.2, W3C Recommendation), which is supported natively by all three stores:
# RDF 1.2 annotation syntax (default: RDF_ANNOTATION_SYNTAX=rdf_1.2)
:alice_johnson onto:works_for :techcorp
{| onto:since "2020"^^xsd:string ;
onto:role "Engineer"^^xsd:string |} .
The {| |} block desugars to an anonymous reifier node linked via rdf:reifies:
[] rdf:reifies <<( :alice_johnson onto:works_for :techcorp )>> ;
onto:since "2020"^^xsd:string ;
onto:role "Engineer"^^xsd:string .
Annotation syntax options (set via RDF_ANNOTATION_SYNTAX):
| Value | Syntax | Notes |
|---|---|---|
rdf_1.2 (default) |
{| prop value |} inline annotation |
RDF 1.2 Turtle standard, recommended |
rdf_star |
<< s p o >> prop value . lines |
Legacy RDF-star (pre-RDF-1.2 proposal), same store support |
flat |
onto:rel__prop value triples |
Plain SPARQL 1.1, works with any triple store |
Store support:
| Store | rdf_1.2 | rdf_star | flat |
|---|---|---|---|
| Apache Fuseki (Jena 5) | ✅ | ✅ | ✅ |
| Ontotext GraphDB 10+ | ✅ | ✅ | ✅ |
| Oxigraph 0.4+ | ✅ | ✅ | ✅ |
| Any SPARQL 1.1 store | ❌ | ❌ | ✅ |
URI Generation¶
Entity and predicate URIs are generated from names using a slug strategy:
| Input | URI |
|---|---|
Entity name "Alice Johnson" |
<https://integratedsemantics.org/flexible-graphrag/kg/alice_johnson> |
Entity label "EMPLOYEE" |
<https://integratedsemantics.org/flexible-graphrag/ontology#employee> |
Relation label "WORKS_FOR" |
<https://integratedsemantics.org/flexible-graphrag/ontology#works_for> |
Property name "since" |
<https://integratedsemantics.org/flexible-graphrag/ontology#since> |
When USE_ONTOLOGY=true, the ontology IRI is used as the ONTO_NS so extracted
types align with the loaded OWL ontology.
RDF Store Connection Configuration¶
Configure connection details for the store selected by RDF_GRAPH_DB:
# Apache Fuseki
RDF_GRAPH_DB=fuseki
FUSEKI_BASE_URL=http://localhost:3030
FUSEKI_DATASET=flexible-graphrag
FUSEKI_USERNAME=admin
FUSEKI_PASSWORD=admin
# Ontotext GraphDB
RDF_GRAPH_DB=graphdb
GRAPHDB_BASE_URL=http://localhost:7200
GRAPHDB_REPOSITORY=flexible-graphrag
GRAPHDB_USERNAME=admin
GRAPHDB_PASSWORD=admin
# Oxigraph (HTTP mode)
RDF_GRAPH_DB=oxigraph
OXIGRAPH_URL=http://localhost:7878
See env-sample.txt (RDF / Graph Store Configuration section) for the full reference.
Terminology¶
| Term | Meaning |
|---|---|
| Hybrid Search | Vector + Fulltext + Graph retrieval (main search feature) |
| Property Graph | LPG store: Neo4j, FalkorDB, ArcadeDB, Memgraph, etc. |
| RDF Graph Store | Triple store: Fuseki, GraphDB, Oxigraph |
| Ontology | OWL/RDF schema defining entity/relation types |
| Knowledge Graph | Extracted entity/relation instances from documents |
| RDF 1.2 | W3C Recommendation (2024) that standardizes triple terms, rdf:reifies, and the {| |} annotation shorthand |
| Triple term | <<( s p o )>> — a reference to an RDF proposition used with rdf:reifies |
| Annotation syntax | {| prop value |} inline shorthand; desugars to a blank-node reifier via rdf:reifies |
| RDF-star | Pre-RDF-1.2 proposal for embedded triples (<< s p o >> assertion form); superseded by RDF 1.2 |