MCP Tools Reference
The MCP server provides 9 tools for document ingestion, search, and AI Q&A.
Available Tools
| Tool | What it does |
|---|---|
| get_system_status() | Verify setup and database connections |
| ingest_documents(data_source, skip_graph, ...) | Process documents from any of the 13 data sources |
| ingest_text(content, source_name, skip_graph) | Ingest and analyze specific text content; skip_graph=True skips KG extraction |
| search_documents(query, top_k) | Hybrid search: find relevant document excerpts |
| query_documents(query, top_k) | AI-powered Q&A over your document corpus |
| test_with_sample(skip_graph) | Quick system verification with built-in sample content; skip_graph=True for vector-only |
| check_processing_status(id) | Track long-running async ingestion tasks |
| get_python_info() | Python environment diagnostics |
| health_check() | Verify backend API connectivity |
ingest_documents() Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| data_source | str | "filesystem" | Source type; see the Data Source JSON Config Strings table below |
| paths | str | None | filesystem only: file/directory paths as a JSON array ["p1","p2"], a comma-separated list, or a single path |
| skip_graph | bool | false | Skip KG extraction and graph store writes; chunk + embed + vector/search only |
| enable_sync | bool | false | Enable automatic change detection and incremental updates for the source |
| <source>_config | str | None | JSON config string for non-filesystem sources (e.g. alfresco_config, s3_config); see table below |
Note
filesystem uses the paths argument, not a JSON config string. All other sources pass their
connection details as a JSON string in the corresponding <source>_config argument.
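
The three accepted forms of the paths string (JSON array, comma-separated list, single path) can be illustrated with a small parser. This is a minimal sketch of how such a string might be normalized, not the server's actual implementation; the function name normalize_paths is hypothetical, and the real parsing rules may differ.

```python
import json


def normalize_paths(paths: str) -> list[str]:
    """Turn a paths string into a list of path strings.

    Hypothetical helper for illustration; the server's own parsing may differ.
    """
    text = paths.strip()
    # Form 1: JSON array, e.g. '["p1","p2"]'
    if text.startswith("["):
        return [str(p) for p in json.loads(text)]
    # Form 2: comma-separated list, e.g. 'p1,p2'
    if "," in text:
        return [part.strip() for part in text.split(",") if part.strip()]
    # Form 3: a single path (empty input yields an empty list)
    return [text] if text else []
```

Note that this sketch would split a single path that happens to contain a comma; passing such paths as a JSON array avoids the ambiguity.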
Data Source JSON Config Strings
| data_source | Config Argument | JSON Fields |
|---|---|---|
| filesystem | paths (not JSON) | File/directory path(s); no config string needed |
| alfresco | alfresco_config | {"base_url": "...", "username": "...", "password": "...", "paths": [...], "nodeDetails": {...}} |
| cmis | cmis_config | {"cmis_url": "...", "username": "...", "password": "...", "paths": [...]} |
| s3 | s3_config | {"bucket": "...", "aws_access_key_id": "...", "aws_secret_access_key": "...", "region": "..."} |
| azure_blob | azure_blob_config | {"connection_string": "...", "container_name": "..."} |
| gcs | gcs_config | {"bucket_name": "...", "credentials_path": "..."} |
| onedrive | onedrive_config | {"client_id": "...", "client_secret": "...", "tenant_id": "..."} |
| google_drive | google_drive_config | {"credentials_path": "...", "folder_id": "..."} |
| sharepoint | sharepoint_config | {"client_id": "...", "client_secret": "...", "tenant_id": "...", "site_url": "..."} |
| box | box_config | {"client_id": "...", "client_secret": "...", "folder_id": "..."} |
| web | web_config | {"urls": ["https://...", "https://..."]} |
| wikipedia | wikipedia_config | {"titles": ["Article Title", "..."]} |
| youtube | youtube_config | {"urls": ["https://youtube.com/watch?v=...", "..."]} |
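
Because each non-filesystem source expects its settings as a JSON string rather than a nested object, it can help to build the config as a Python dict and serialize it once. The sketch below assembles the arguments for an ingest_documents() call under that assumption. The helper build_ingest_args is hypothetical and not part of the MCP server, and all credential and bucket values are placeholders.

```python
import json
from typing import Any, Dict, Optional


def build_ingest_args(
    data_source: str,
    config: Optional[Dict[str, Any]] = None,
    paths: Optional[str] = None,
    skip_graph: bool = False,
) -> Dict[str, Any]:
    """Assemble the argument dict for an ingest_documents() call.

    Hypothetical helper for illustration only.
    """
    args: Dict[str, Any] = {"data_source": data_source, "skip_graph": skip_graph}
    if data_source == "filesystem":
        # filesystem takes plain paths, never a JSON config string
        if paths is None:
            raise ValueError("filesystem requires the paths argument")
        args["paths"] = paths
    else:
        if config is None:
            raise ValueError(f"{data_source} requires a config dict")
        # Argument name follows the <source>_config pattern from the table
        args[f"{data_source}_config"] = json.dumps(config)
    return args


# Placeholder values throughout; substitute real settings before use.
s3_args = build_ingest_args(
    "s3",
    config={
        "bucket": "my-bucket",
        "aws_access_key_id": "PLACEHOLDER_KEY_ID",
        "aws_secret_access_key": "PLACEHOLDER_SECRET",
        "region": "us-east-1",
    },
)
web_args = build_ingest_args("web", config={"urls": ["https://example.com"]})
fs_args = build_ingest_args("filesystem", paths="/docs", skip_graph=True)
```

Serializing with json.dumps avoids hand-written quoting mistakes in the config string, which are easy to make when the JSON is embedded in another tool call.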
skip_graph: All Ingest Tools
skip_graph=True is available on all three ingest tools:
ingest_documents(data_source="filesystem", paths="/docs", skip_graph=True)
ingest_text(content="...", source_name="doc.txt", skip_graph=True)
test_with_sample(skip_graph=True)
When set, each document is chunked, embedded, and stored in the vector and search indexes, but KG extraction and writes to the property graph and RDF stores are skipped. This is useful for fast bulk ingestion when graph queries are not needed.
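
The effect of the flag can be pictured as a pipeline in which the graph stages are optional. The following is a purely illustrative sketch: the stage names and their ordering are assumptions made for this example, not the server's internal structure.

```python
def ingestion_stages(skip_graph: bool = False) -> list[str]:
    """Return the stages that run for one document (illustrative names only)."""
    stages = ["chunk", "embed", "write_vector_index", "write_search_index"]
    if not skip_graph:
        # Graph work only runs when skip_graph is left at its default
        stages += ["extract_knowledge_graph", "write_graph_store"]
    return stages
```

With skip_graph=True the returned list stops after the vector and search index writes, which mirrors the vector-only behavior described above.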