feat: Add ability to disable layout processing#3168

Closed
maxdswain wants to merge 3 commits into docling-project:main from maxdswain:disable-layout-model

@maxdswain
Contributor

Overview
Add the ability to disable layout processing of documents via a do_layout option in PdfPipelineOptions.

Description of Changes
A do_layout option is added to PdfPipelineOptions and passed to every class that inherits from BaseLayoutModel, as well as to the factories in StandardPdfPipeline and LegacyStandardPdfPipeline. When making these changes I followed how do_ocr was implemented.
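
As a rough illustration of the pattern described above, the sketch below shows a flag on an options object being forwarded by a factory to a stage that passes pages through untouched when disabled. It is not Docling code: `PipelineOptions`, `Page`, `LayoutStage`, `build_layout_stage`, and `run` are hypothetical stand-ins, and only the names `do_layout` and `do_ocr` come from this PR.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class PipelineOptions:
    # Hypothetical stand-in for PdfPipelineOptions; only the flags matter here.
    do_ocr: bool = True
    do_layout: bool = True


@dataclass
class Page:
    # Hypothetical stand-in for a page moving through the pipeline.
    page_no: int
    clusters: List[str] = field(default_factory=list)


class LayoutStage:
    """Illustrative layout stage gated by an `enabled` flag."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled

    def __call__(self, pages: Iterable[Page]) -> Iterator[Page]:
        for page in pages:
            if self.enabled:
                # Placeholder for a real layout prediction.
                page.clusters.append("text")
            # When disabled, the page is yielded unchanged.
            yield page


def build_layout_stage(options: PipelineOptions) -> LayoutStage:
    # The factory forwards the option to the stage it builds.
    return LayoutStage(enabled=options.do_layout)


def run(options: PipelineOptions, n_pages: int = 2) -> List[Page]:
    stage = build_layout_stage(options)
    return list(stage(Page(page_no=i) for i in range(1, n_pages + 1)))
```

In this sketch, `run(PipelineOptions(do_layout=False))` returns pages whose `clusters` lists are empty, so any later stage that relies on layout clusters would have nothing to work with.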

Issue resolved by this Pull Request:
Resolves #3011

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

@github-actions
Contributor

DCO Check Passed

Thanks @maxdswain, all your commits are properly signed off. 🎉

@mergify

mergify bot commented Mar 22, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@dosubot

dosubot bot commented Mar 22, 2026

Related Documentation

2 document(s) may need updating based on files changed in this PR:

Docling

How to properly enable enable_remote_services in Docling Serve (CPU image) to use an external OpenAI-compatible API for picture description and formula enrichment, and what is the correct config format?
View Suggested Changes
@@ -89,5 +89,5 @@
 
 - Set `UVICORN_WORKERS=1` (required at 8 GB RAM)
 - Set `DOCLING_NUM_THREADS=4` and `OMP_NUM_THREADS=4`
-- Disable unused features in requests (`do_ocr=false`, `do_table_structure=false`, etc.) unless explicitly needed
+- Disable unused features in requests (`do_ocr=false`, `do_table_structure=false`, `do_layout=false`, etc.) unless explicitly needed
 - Use external APIs for formula and picture enrichment instead of local models


What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?
View Suggested Changes
@@ -8,6 +8,7 @@
     - `force_ocr`: Replace existing text with OCR-generated text
     - `ocr_engine`, `ocr_lang`: OCR engine and language options
     - `image_export_mode`: `placeholder`, `embedded`, `referenced`
+    - `do_layout` (default True): Enable document layout analysis to detect and classify page regions such as text blocks, headings, figures, tables, and other structural elements. Required for accurate content segmentation and reading-order reconstruction. Can be disabled to skip layout processing for faster performance when layout information is not needed.
     - `do_table_structure`, `table_mode`, `table_cell_matching`: Table extraction options (see Table Structure Models section below for details on TableFormer V1 and V2)
     - `do_code_enrichment`, `do_formula_enrichment`: Code/formula recognition
     - `vlm_pipeline_preset`, `vlm_pipeline_custom_config`, `picture_description_preset`, `picture_description_custom_config`, `code_formula_preset`, `code_formula_custom_config`: New model inference engine and preset options for VLM, picture description, and code/formula extraction
@@ -59,7 +60,7 @@
 
 ### PDF (continued)
 
-- **Pipeline Option Overrides**: The Python API allows you to override pipeline options at conversion time for a given format using the `format_options` argument. Only `do_*` flags (such as `do_ocr`, `do_table_structure`, `do_code_enrichment`, `do_formula_enrichment`, etc.) can be changed, and only from `True` to `False`. All other options must remain identical to those used at pipeline initialization. Attempting to enable a do_* flag or change other fields will result in an error. This enables per-call disabling of enrichment features without reinitializing the pipeline.
+- **Pipeline Option Overrides**: The Python API allows you to override pipeline options at conversion time for a given format using the `format_options` argument. Only `do_*` flags (such as `do_ocr`, `do_layout`, `do_table_structure`, `do_code_enrichment`, `do_formula_enrichment`, etc.) can be changed, and only from `True` to `False`. All other options must remain identical to those used at pipeline initialization. Attempting to enable a do_* flag or change other fields will result in an error. This enables per-call disabling of enrichment features without reinitializing the pipeline.
 - **Exporting Scanned/Image-Based PDFs**: When processing scanned or image-based PDFs with `force_full_page_ocr=True`, the layout model classifies full-page scans as `PictureItem` and OCR text is stored as children of those picture nodes. To export this OCR text via `export_to_markdown()` or `export_to_text()`, you must set the `traverse_pictures=True` parameter. Without this parameter, export functions will return empty results even though OCR text exists in the document.
 
 ```python


@cau-git
Member

cau-git commented Mar 23, 2026

@maxdswain while this can technically be done the way you propose, the result would simply be zero output, which is obviously not useful. The layout model is critical for discovering the structure that the subsequent pipeline stages require to process a document at all. A proper solution would be to implement an ultra-cheap, no-AI layout detection algorithm, which can be plugged into the layout model factory we already provide.
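
To make the idea of an ultra-cheap, no-AI layout detector concrete, here is a minimal sketch that groups text cells into blocks purely by vertical spacing. It assumes a single-column page with a top-left coordinate origin. `TextCell`, `Block`, `heuristic_layout`, and `gap_factor` are hypothetical names for illustration; they are not part of Docling, and wiring such a detector into the layout model factory is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextCell:
    # Hypothetical text cell: bounding box in page coordinates, top-left origin.
    text: str
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class Block:
    # Hypothetical layout region covering one or more text cells.
    label: str
    left: float
    top: float
    right: float
    bottom: float
    cells: List[TextCell]


def heuristic_layout(cells: List[TextCell], gap_factor: float = 0.6) -> List[Block]:
    """Group cells into 'text' blocks wherever the vertical gap is small."""
    blocks: List[Block] = []
    for cell in sorted(cells, key=lambda c: (c.top, c.left)):
        height = cell.bottom - cell.top
        last = blocks[-1] if blocks else None
        if last is not None and cell.top - last.bottom <= gap_factor * height:
            # Close enough to the previous block: grow it to include this cell.
            last.left = min(last.left, cell.left)
            last.right = max(last.right, cell.right)
            last.bottom = max(last.bottom, cell.bottom)
            last.cells.append(cell)
        else:
            # Large gap (or first cell): start a new block.
            blocks.append(
                Block("text", cell.left, cell.top, cell.right, cell.bottom, [cell])
            )
    return blocks
```

A heuristic like this labels everything as `text`, so it cannot distinguish headings, tables, or figures; it only gives downstream stages some regions to work on at very low cost.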

@maxdswain
Contributor Author

> @maxdswain while this can technically be done the way you propose, the result would simply be zero output, which is obviously not useful. The layout model is critical for discovering the structure that the subsequent pipeline stages require to process a document at all. A proper solution would be to implement an ultra-cheap, no-AI layout detection algorithm, which can be plugged into the layout model factory we already provide.

My bad then, I’ll close this PR if this is the wrong approach. Thanks for the feedback!

maxdswain closed this Mar 23, 2026
Development

Successfully merging this pull request may close these issues.

[Bee] Docling disable layout model to support express mode