Parser Output Files Documentation¶

Overview¶

When SAVE_PARSING_OUTPUT=true is configured, the document processor saves intermediate parsing results to ./parsing_output/ for inspection.

Output Files¶

LlamaParse¶

<filename>_llamaparse_output.md - Markdown output (multiple chunks automatically combined)
<filename>_llamaparse_output.txt - Plaintext version (markdown formatting stripped)
<filename>_llamaparse_metadata.json - Processing metadata

Docling¶

<filename>_docling_markdown.md - Markdown format (preserves tables)
<filename>_docling_plaintext.txt - Plain text format (better for entity extraction)
<filename>_docling_metadata.json - Processing metadata
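
To check that a run produced everything it should, the naming scheme above can be expressed as a small lookup. This is a minimal sketch, not part of the document processor: the helper names `expected_outputs` and `missing_outputs` are hypothetical, and it assumes `<filename>` is passed in exactly as the processor uses it (the source does not say whether the original extension is kept).

```python
from pathlib import Path

# Suffixes copied from the file lists above; everything else is illustrative.
OUTPUT_SUFFIXES = {
    "llamaparse": [
        "_llamaparse_output.md",
        "_llamaparse_output.txt",
        "_llamaparse_metadata.json",
    ],
    "docling": [
        "_docling_markdown.md",
        "_docling_plaintext.txt",
        "_docling_metadata.json",
    ],
}


def expected_outputs(filename: str, parser: str,
                     output_dir: str = "./parsing_output") -> list[Path]:
    """Return the paths a parser is expected to write for `filename`."""
    return [Path(output_dir) / f"{filename}{suffix}"
            for suffix in OUTPUT_SUFFIXES[parser]]


def missing_outputs(filename: str, parser: str,
                    output_dir: str = "./parsing_output") -> list[Path]:
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(filename, parser, output_dir)
            if not p.exists()]
```

Calling `missing_outputs("report", "docling")` after a run returns an empty list when all three Docling files are present.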

Configuration¶

# Enable saving both formats to disk
SAVE_PARSING_OUTPUT=true

# Control what format gets sent for knowledge graph extraction (optional)
PARSER_FORMAT_FOR_EXTRACTION=auto  # Default: markdown if tables, else plaintext
#PARSER_FORMAT_FOR_EXTRACTION=markdown  # Always use markdown
#PARSER_FORMAT_FOR_EXTRACTION=plaintext  # Always use plaintext
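
For reference, the two settings could be read and validated along these lines. This is a sketch under stated assumptions rather than the processor's actual loader: the function name `load_parser_settings` is hypothetical, `SAVE_PARSING_OUTPUT` is assumed to default to false when unset, and rejecting unknown format values is an illustrative choice.

```python
import os

VALID_FORMATS = {"auto", "markdown", "plaintext"}


def load_parser_settings(env=None) -> dict:
    """Read the two parser settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Assumption: anything other than "true" (case-insensitive) disables saving.
    save_output = env.get("SAVE_PARSING_OUTPUT", "false").strip().lower() == "true"

    # "auto" is the documented default for the extraction format.
    fmt = env.get("PARSER_FORMAT_FOR_EXTRACTION", "auto").strip().lower()
    if fmt not in VALID_FORMATS:
        raise ValueError(
            f"PARSER_FORMAT_FOR_EXTRACTION must be one of {sorted(VALID_FORMATS)}, got {fmt!r}"
        )

    return {"save_parsing_output": save_output, "format_for_extraction": fmt}
```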

Automatic Behavior¶

The system automatically:

- Saves both formats (markdown and plaintext) to disk for inspection
- Combines LlamaParse chunks from the same PDF into one file
- Detects and logs parser errors
- Detects LaTeX/math expressions that may cause preview issues

For knowledge graph extraction:

- auto (default): documents with tables → markdown format; documents without tables → plaintext format
- markdown: always sends markdown (preserves structure, better for tables)
- plaintext: always sends plaintext (better for entity extraction in text-heavy docs)
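
The selection rule can be sketched as follows. How the processor actually detects tables is not documented here, so the delimiter-row regex below is an assumed heuristic, and `has_markdown_table` and `select_extraction_text` are hypothetical names used only for illustration.

```python
import re

# Assumed heuristic: a markdown table contains a delimiter row such as
# |---|---| or | :--- | ---: |. A bare horizontal rule (---) does not match
# because at least one pipe between dash runs is required.
TABLE_DELIMITER_ROW = re.compile(
    r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)+\|?\s*$",
    re.MULTILINE,
)


def has_markdown_table(markdown: str) -> bool:
    """Return True if the text appears to contain a markdown table."""
    return TABLE_DELIMITER_ROW.search(markdown) is not None


def select_extraction_text(markdown: str, plaintext: str, mode: str = "auto") -> str:
    """Pick the text to send for extraction according to `mode`."""
    if mode == "markdown":
        return markdown
    if mode == "plaintext":
        return plaintext
    if mode == "auto":
        return markdown if has_markdown_table(markdown) else plaintext
    raise ValueError(f"unknown mode: {mode!r}")
```

With `mode="auto"`, a document whose markdown contains a table row like `|---|---|` is sent as markdown, and one without is sent as plaintext.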

KaTeX Preview Errors¶

If the VS Code or Cursor markdown preview shows errors such as ParseError: KaTeX parse error, these are rendering errors, not parsing errors. The content of the .md file is correct; the preview renderer is failing on table syntax or math expressions. You can ignore these errors or open the file in a different markdown viewer.
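
To see which parts of a saved .md file the preview is likely choking on, the file can be scanned for dollar-delimited math spans. This is a rough sketch, not the processor's own detection logic: `find_math_spans` is a hypothetical helper, and the pattern also flags ordinary currency text such as "$5 and $10", so treat the hits as candidates only.

```python
import re

# Matches $$...$$ blocks (possibly multi-line) or single-line $...$ spans.
MATH_SPAN = re.compile(r"\$\$.+?\$\$|\$[^$\n]+\$", re.DOTALL)


def find_math_spans(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, span) for each dollar-delimited span found."""
    hits = []
    for match in MATH_SPAN.finditer(markdown):
        # Line numbers are 1-based and refer to where the span starts.
        line_no = markdown.count("\n", 0, match.start()) + 1
        hits.append((line_no, match.group(0)))
    return hits
```

Reading a saved file with `Path(...).read_text(encoding="utf-8")` and passing the result to `find_math_spans` gives the line numbers to inspect in the preview.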