Update Docling integration to v2.66.0 and optimize Docling parser with latest API #12382
What problem does this PR solve?
Docling requires Python 3.12, but the Dockerfile installs 3.11, which causes compatibility issues. In addition, deepdoc/parser/docling_parser.py uses outdated API patterns and does not take advantage of the newer configuration options that improve accuracy and error handling.
Type of change
Changes
Dockerfile
- uv python install 3.11 → 3.12
pyproject.toml
- docling>=2.60.0,<3.0.0
deepdoc/parser/docling_parser.py (276 insertions, 70 deletions)
Modern Docling API integration:
- PdfPipelineOptions with TableFormerMode.ACCURATE for better table extraction
- EasyOcrOptions configuration for OCR
- PyPdfiumDocumentBackend support
Enhanced error handling:
- exc_info=True)
Security hardening:
Example of improved configuration:
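The snippet that belonged here did not survive in this description, so what follows is only a minimal sketch of how the options listed above could be wired together. It assumes Docling's v2 module layout and is not the code from this PR. The names `docling_available`, `build_pdf_converter`, and the `ocr_langs` parameter are hypothetical, introduced only for illustration.

```python
import importlib.util


def docling_available() -> bool:
    """Return True if the docling package can be imported in this environment."""
    return importlib.util.find_spec("docling") is not None


def build_pdf_converter(ocr_langs=("en",)):
    """Build a DocumentConverter for PDFs (hypothetical helper, not the PR's code).

    Docling is imported lazily so this module still loads where it is not installed.
    """
    from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        EasyOcrOptions,
        PdfPipelineOptions,
        TableFormerMode,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = True
    options.do_table_structure = True
    # ACCURATE trades speed for better table structure recovery.
    options.table_structure_options.mode = TableFormerMode.ACCURATE
    # Assumption: English-only OCR unless the caller passes other languages.
    options.ocr_options = EasyOcrOptions(lang=list(ocr_langs))

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=options,
                backend=PyPdfiumDocumentBackend,
            )
        }
    )
```

With Docling installed, `build_pdf_converter().convert("report.pdf").document.export_to_markdown()` would be one way to try it out; `report.pdf` is a placeholder path.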
All changes maintain backward compatibility. Existing DoclingParser() usage is unchanged.
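For reviewers unfamiliar with the `exc_info=True` fragment in the error-handling notes: it is the standard `logging` keyword that attaches the active traceback to a log record. The sketch below shows the general pattern only. The `safe_convert` wrapper and the logger name are hypothetical and are not taken from docling_parser.py.

```python
import logging

logger = logging.getLogger("docling_parser_sketch")


def safe_convert(convert, source):
    """Call convert(source); on failure, log the full traceback and return None."""
    try:
        return convert(source)
    except Exception:
        # exc_info=True adds the current exception's traceback to the log record.
        logger.error("Docling conversion failed for %s", source, exc_info=True)
        return None
```

Returning None keeps the caller in control of fallback behavior, while the traceback stays in the logs instead of being swallowed.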