[Bug]: Output larger than input #1620

@iolpltuisciaoln

Describe the bug

❯ uname -a
Linux alpha 6.17.13-200.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 18 22:18:24 UTC 2025 x86_64 GNU/Linux
❯ ocrmypdf --version
16.7.0
❯ pdfimages -list 1906_29_p1.pdf

page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2480  3754  rgb     3   8  image  yes       28  0   300   300 10.9M  41%
   1     1 image    2216   233  rgb     3   8  image  yes       36  0   300   300  793K  52%
   1     2 image     374   550  rgb     3   8  image  yes       32  0   300   300  230K  38%
❯ ocrmypdf -O1 --skip-big 0 --tesseract-timeout 0 \
  --skip-text \
  --output-type pdf \
  1906_29_p1.pdf 1906_29_p1_out.pdf
    1 skipping all processing on this page                                                                                                            _pipeline.py:335
Image processing      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Postprocessing...                                                                                                                                           ocr.py:144
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Image optimization did not improve the file - optimizations will not be used                                                                           optimize.py:728
Image optimization ratio: 1.00 savings: -0.0%                                                                                                         _pipeline.py:994
Total file size ratio: 1.00 savings: 0.0%                                                                                                             _pipeline.py:997
The output file size is 1.38× larger than the input file.                                                                                           _validation.py:357
No reason for this increase is known.  Please report this issue.
❯ pdfimages -list 1906_29_p1_out.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2480  3754  rgb     3   8  image  yes       16  0   300   300 15.1M  57%
   1     1 image    2216   233  rgb     3   8  image  yes       17  0   300   300 1088K  72%
   1     2 image     374   550  rgb     3   8  image  yes       18  0   300   300  307K  51%
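
For reference, the per-image growth can be worked out from the two `pdfimages -list` tables above. The sketch below is only arithmetic on the sizes printed there; the helper names `parse_size` and `growth` are made up for illustration, and it assumes pdfimages reports sizes in 1024-based units. Under that assumption, the combined growth of the three image streams comes out at roughly the same 1.38× that OCRmyPDF reports for the whole file, which suggests (but does not prove) that the size increase comes from the image streams being stored less compactly rather than from added content.

```python
# Sizes copied verbatim from the "size" column of the two pdfimages tables.
# Assumption: pdfimages uses 1024-based units (K = 1024 bytes, M = 1024**2).
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a pdfimages size string such as '10.9M' or '793K' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def growth(before: list[str], after: list[str]) -> list[float]:
    """Return the output/input size ratio for each image, in table order."""
    return [parse_size(a) / parse_size(b) for b, a in zip(before, after)]


before = ["10.9M", "793K", "230K"]   # 1906_29_p1.pdf
after = ["15.1M", "1088K", "307K"]   # 1906_29_p1_out.pdf

for num, ratio in enumerate(growth(before, after)):
    print(f"image {num}: {ratio:.2f}x")

# Combined ratio over all three image streams.
total = sum(map(parse_size, after)) / sum(map(parse_size, before))
print(f"all images: {total:.2f}x")
```

The `ratio` column in the tables moves the same way (41% to 57%, 52% to 72%, 38% to 51%), while width, height, color space and bit depth are unchanged.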

1906_29_p1.pdf

How did you download and install the software?

dnf install ocrmypdf

OCRmyPDF version

16.7.0
