MCP Tools Reference

The MCP server provides 9 tools for document ingestion, search, and AI Q&A.

Available Tools

Tool	What it does
`get_system_status()`	Verify setup and database connections
`ingest_documents(data_source, skip_graph, ...)`	Process documents from any of the 13 data sources
`ingest_text(content, source_name, skip_graph)`	Ingest and analyze specific text content; `skip_graph=True` skips KG extraction
`search_documents(query, top_k)`	Hybrid search — find relevant document excerpts
`query_documents(query, top_k)`	AI-powered Q&A over your document corpus
`test_with_sample(skip_graph)`	Quick system verification with built-in sample content; `skip_graph=True` for vector-only
`check_processing_status(id)`	Track long-running async ingestion tasks
`get_python_info()`	Python environment diagnostics
`health_check()`	Verify backend API connectivity

`ingest_documents()` — Arguments

Argument	Type	Default	Description
`data_source`	`str`	`"filesystem"`	Source type — see Data Source JSON Config Strings table below
`paths`	`str`	`None`	`filesystem` only — file/directory paths; JSON array `["p1","p2"]`, comma-separated, or single path
`skip_graph`	`bool`	`False`	Skip KG extraction and graph store writes; only chunking, embedding, and vector/search indexing run
`enable_sync`	`bool`	`False`	Enable automatic change detection and incremental updates for the source
`<source>_config`	`str`	`None`	JSON config string for non-filesystem sources (e.g. `alfresco_config`, `s3_config`) — see table below

Note

`filesystem` uses the `paths` argument, not a JSON config string. All other sources pass their connection details as a JSON string in the corresponding `<source>_config` argument.
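The three string forms accepted by `paths` (JSON array, comma-separated, single path) can be illustrated with a small normalizer. This is only a sketch: `normalize_paths` is a hypothetical client-side helper, not part of the MCP server, and the server's own parsing may differ (for example, for paths that contain commas).

```python
import json


def normalize_paths(paths: str) -> list[str]:
    """Split a `paths` string into individual paths (hypothetical helper)."""
    text = paths.strip()
    if not text:
        return []
    # Form 1: JSON array, e.g. '["p1","p2"]'
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            return [str(p) for p in parsed]
    # Forms 2 and 3: comma-separated list, or a single path
    return [part.strip() for part in text.split(",") if part.strip()]


print(normalize_paths('["/docs/a.pdf", "/docs/b"]'))
print(normalize_paths("/docs/a.pdf, /docs/b"))
print(normalize_paths("/docs"))
```

All three calls produce a plain list of path strings, which makes it easy to check what will be sent before calling `ingest_documents`.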

Data Source JSON Config Strings

`data_source`	Config Argument	JSON Fields
`filesystem`	`paths` (not JSON)	File/directory path(s) — no config string needed
`alfresco`	`alfresco_config`	`{"base_url": "...", "username": "...", "password": "...", "paths": [...], "nodeDetails": {...}}`
`cmis`	`cmis_config`	`{"cmis_url": "...", "username": "...", "password": "...", "paths": [...]}`
`s3`	`s3_config`	`{"bucket": "...", "aws_access_key_id": "...", "aws_secret_access_key": "...", "region": "..."}`
`azure_blob`	`azure_blob_config`	`{"connection_string": "...", "container_name": "..."}`
`gcs`	`gcs_config`	`{"bucket_name": "...", "credentials_path": "..."}`
`onedrive`	`onedrive_config`	`{"client_id": "...", "client_secret": "...", "tenant_id": "..."}`
`google_drive`	`google_drive_config`	`{"credentials_path": "...", "folder_id": "..."}`
`sharepoint`	`sharepoint_config`	`{"client_id": "...", "client_secret": "...", "tenant_id": "...", "site_url": "..."}`
`box`	`box_config`	`{"client_id": "...", "client_secret": "...", "folder_id": "..."}`
`web`	`web_config`	`{"urls": ["https://...", "https://..."]}`
`wikipedia`	`wikipedia_config`	`{"titles": ["Article Title", "..."]}`
`youtube`	`youtube_config`	`{"urls": ["https://youtube.com/watch?v=...", "..."]}`
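Because every non-filesystem source expects its settings as a JSON string rather than a nested object, it can help to build the config as a dictionary and serialize it just before the call. The sketch below assumes a hypothetical helper, `build_ingest_args`, that only assembles the keyword arguments; it does not call the server, and the URL is a placeholder.

```python
import json
from typing import Optional


def build_ingest_args(data_source: str,
                      config: Optional[dict] = None,
                      paths: Optional[str] = None,
                      skip_graph: bool = False) -> dict:
    """Assemble keyword arguments for ingest_documents (hypothetical helper)."""
    args = {"data_source": data_source, "skip_graph": skip_graph}
    if data_source == "filesystem":
        # filesystem takes plain path(s), not a JSON config string
        if paths is None:
            raise ValueError("filesystem requires the paths argument")
        args["paths"] = paths
    else:
        if config is None:
            raise ValueError(f"{data_source} requires a config dict")
        # Connection details travel as a JSON string in <source>_config
        args[f"{data_source}_config"] = json.dumps(config)
    return args


# Placeholder URL, for illustration only
web_args = build_ingest_args("web", config={"urls": ["https://example.com/docs"]})
print(web_args)
```

Deriving the argument name from `data_source` mirrors the `<source>_config` naming pattern in the table above, so the same helper covers `s3_config`, `alfresco_config`, and the rest.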

`skip_graph` — All Ingest Tools

`skip_graph=True` is available on all three ingest tools:

ingest_documents(data_source="filesystem", paths="/docs", skip_graph=True)
ingest_text(content="...", source_name="doc.txt", skip_graph=True)
test_with_sample(skip_graph=True)

When set, the document is chunked, embedded, and stored in the vector and search indexes, but knowledge graph (KG) extraction and writes to the property graph / RDF store are skipped. This is useful for fast bulk ingest when graph queries are not needed.