feat: Add ability to disable layout processing#3168
maxdswain wants to merge 3 commits into docling-project:main
Signed-off-by: Max Swain <[email protected]>
✅ DCO Check Passed
Thanks @maxdswain, all your commits are properly signed off. 🎉
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
@maxdswain While this can technically be done the way you propose, the consequence will simply be zero output, which is not useful. The layout model is critical for discovering the structure that the subsequent pipeline stages require to process a document at all. A proper solution would be to implement an ultra-cheap, no-AI layout detection algorithm that can be plugged into the layout model factory we already provide.
My bad then, I'll close this PR if this is the wrong approach. Thanks for the feedback!
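For illustration only, here is a minimal sketch of what an ultra-cheap, no-AI layout heuristic could look like. It groups text cells into blocks wherever the vertical gap between them is small. The names `Box`, `merge`, and `heuristic_layout` are hypothetical and are not part of docling; the sketch assumes single-column pages with a top-left coordinate origin, and it does not show how such a class would be registered with the layout model factory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with a top-left origin (hypothetical helper type)."""
    l: float  # left
    t: float  # top
    r: float  # right
    b: float  # bottom


def merge(a: Box, b: Box) -> Box:
    """Return the smallest box that covers both inputs."""
    return Box(min(a.l, b.l), min(a.t, b.t), max(a.r, b.r), max(a.b, b.b))


def heuristic_layout(cells: list[Box], gap_factor: float = 1.0) -> list[Box]:
    """Group text cells into blocks using vertical gaps only.

    A cell joins the current block when the gap between the block's bottom
    edge and the cell's top edge is at most gap_factor times the cell height;
    otherwise it starts a new block.
    """
    blocks: list[Box] = []
    for cell in sorted(cells, key=lambda c: (c.t, c.l)):
        height = cell.b - cell.t
        if blocks and cell.t - blocks[-1].b <= gap_factor * height:
            blocks[-1] = merge(blocks[-1], cell)
        else:
            blocks.append(cell)
    return blocks


cells = [
    Box(10, 10, 200, 20),  # first line of paragraph A
    Box(10, 22, 180, 32),  # second line of paragraph A (2-unit gap)
    Box(10, 60, 190, 70),  # paragraph B (28-unit gap)
]
blocks = heuristic_layout(cells)
```

With these inputs the first two cells merge into one block and the third stays separate. A heuristic like this cannot tell headings from tables or handle multi-column pages, so it would only be a cheap fallback, not a replacement for the layout model.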
Overview
Add ability to disable layout processing of documents with the do_layout option in PdfPipelineOptions.
Description of Changes
The do_layout option is added to PdfPipelineOptions and fed into all of the classes that inherit from BaseLayoutModel, as well as the factories in StandardPdfPipeline and LegacyStandardPdfPipeline. When making these changes I followed how do_ocr was implemented.
Issue resolved by this Pull Request:
Resolves #3011
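The flag pattern described above, and the reason the review flagged it as producing empty output, can be sketched roughly as follows. This is not the code from this PR: `PipelineOptionsSketch`, `Page`, `LayoutModelSketch`, `assemble`, and `run` are hypothetical stand-ins, and the sketch assumes, per the review comment, that downstream stages build their output only from layout clusters.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineOptionsSketch:
    """Hypothetical stand-in for PdfPipelineOptions, not the real class."""
    do_ocr: bool = True
    do_layout: bool = True  # the flag proposed in this PR


@dataclass
class Page:
    """Hypothetical page: raw text cells plus clusters found by layout."""
    cells: list[str]
    clusters: list[str] = field(default_factory=list)


class LayoutModelSketch:
    """Stand-in for a BaseLayoutModel subclass gated by an enabled flag."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled

    def __call__(self, page: Page) -> Page:
        if not self.enabled:
            return page  # pass the page through without any clusters
        page.clusters = [f"text-block:{cell}" for cell in page.cells]
        return page


def assemble(page: Page) -> list[str]:
    """Assumed downstream stage that reads only the layout clusters."""
    return list(page.clusters)


def run(options: PipelineOptionsSketch, cells: list[str]) -> list[str]:
    layout = LayoutModelSketch(enabled=options.do_layout)
    return assemble(layout(Page(cells=list(cells))))


with_layout = run(PipelineOptionsSketch(), ["Hello", "World"])
without_layout = run(PipelineOptionsSketch(do_layout=False), ["Hello", "World"])
```

Under that assumption, `without_layout` is an empty list even though the page still has text cells, which is the "zero output" behavior the review describes.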
Checklist: