[Bug]: Ghostscript rasterizing fails on seemingly empty page of document #1612

@FloWi

Describe the bug

Hi!

I'm coming from the paperless-ngx world and learned that your package is an integral part of it. Thanks a lot for your effort, as it helps make my life easier!

I tried to import an invoice from my energy provider that contains an empty page, which causes ocrmypdf to fail. Since the original contains private data, I extracted the empty page into its own PDF with the command below and attached the result here.

qpdf --empty --pages broken.pdf 2 -- broken-only-empty-page.pdf

It looks like the issue comes from Ghostscript, but I've seen other issues in this repo with the same error, so I hope I'm in the right place.

I don't know if it helps, but the original file was created with M/TEXT version 6.11.0.619.

Let me know if you need more information!
Cheers,
Florian

Steps to reproduce

ocrmypdf broken-only-empty-page.pdf -v1 out.pdf
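
To check whether the failure is independent of OCRmyPDF, the Ghostscript invocation recorded in the log output can be run on its own. The following is a minimal sketch, assuming `gs` is on the PATH and the attached `broken-only-empty-page.pdf` is in the current directory. The helper names `build_gs_args` and `try_rasterize` are hypothetical; the flags are copied from the logged command. The second resolution (300 dpi) is an arbitrary comparison value that is not part of the report, and whether resolution has any bearing on the `setscreen` error is untested here.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_args(pdf_path: str, png_path: str, dpi: float) -> list[str]:
    """Build the Ghostscript command line seen in the log (hypothetical helper)."""
    return [
        "gs",
        "-dSAFER",
        "-dBATCH",
        "-dNOPAUSE",
        "-dInterpolateControl=-1",
        "-sDEVICE=png16m",
        "-dFirstPage=1",
        "-dLastPage=1",
        f"-r{dpi:f}x{dpi:f}",  # 1.209525 in the failing run
        "-dPDFSTOPONERROR",
        "-o",
        png_path,
        "-sstdout=%stderr",
        "-dAutoRotatePages=/None",
        "-f",
        pdf_path,
    ]


def try_rasterize(pdf_path: str, png_path: str, dpi: float) -> subprocess.CompletedProcess:
    """Run Ghostscript without raising, so the exit status and stderr can be inspected."""
    return subprocess.run(
        build_gs_args(pdf_path, png_path, dpi),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    pdf = Path("broken-only-empty-page.pdf")
    if shutil.which("gs") is None or not pdf.exists():
        print("gs or the sample PDF is missing; nothing to run")
    else:
        # First the resolution from the log, then an assumed comparison value.
        for n, dpi in enumerate((1.209525, 300.0), start=1):
            result = try_rasterize(str(pdf), f"page-try{n}.png", dpi)
            print(f"{dpi:f} dpi -> exit status {result.returncode}")
            print(result.stderr.strip())
```

If the first run exits with a non-zero status outside OCRmyPDF as well, that would suggest the problem sits in Ghostscript or in the file itself rather than in the OCRmyPDF pipeline.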

Files

broken-only-empty-page.pdf

How did you download and install the software?

Homebrew

OCRmyPDF version

16.13.0

Relevant log output

ocrmypdf 16.13.0                                                   __main__.py:59
Running: ['tesseract', '--version']                               __init__.py:133
Found tesseract 5.5.2                                             __init__.py:345
Running: ['tesseract', '--version']                               __init__.py:133
Running: ['tesseract', '--version']                               __init__.py:133
Running: ['gs', '--version']                                      __init__.py:133
Found gs 10.6.0                                                   __init__.py:345
Running: ['gs', '--version']                                      __init__.py:133
Ghostscript 10.6.x contains JPEG encoding errors that may       ghostscript.py:80
corrupt images. OCRmyPDF will attempt to mitigate, but this
version is strongly not recommended. Please upgrade to a newer
version. As of 2025-12, 10.6.0 is the latest version of
Ghostscript.
Running: ['tesseract', '--list-langs']                            __init__.py:133
stdout/stderr = List of available languages in                     __init__.py:73
"/opt/homebrew/share/tessdata/" (163):
afr
amh
ara
asm
aze
aze_cyrl
bel
ben
bod
bos
bre
bul
cat
ceb
ces
chi_sim
chi_sim_vert
chi_tra
chi_tra_vert
chr
cos
cym
dan
deu
div
dzo
ell
eng
enm
epo
equ
est
eus
fao
fas
fil
fin
fra
frk
frm
fry
gla
gle
glg
grc
guj
hat
heb
hin
hrv
hun
hye
iku
ind
isl
ita
ita_old
jav
jpn
jpn_vert
kan
kat
kat_old
kaz
khm
kir
kmr
kor
kor_vert
lao
lat
lav
lit
ltz
mal
mar
mkd
mlt
mon
mri
msa
mya
nep
nld
nor
oci
ori
osd
pan
pol
por
pus
que
ron
rus
san
script/Arabic
script/Armenian
script/Bengali
script/Canadian_Aboriginal
script/Cherokee
script/Cyrillic
script/Devanagari
script/Ethiopic
script/Fraktur
script/Georgian
script/Greek
script/Gujarati
script/Gurmukhi
script/HanS
script/HanS_vert
script/HanT
script/HanT_vert
script/Hangul
script/Hangul_vert
script/Hebrew
script/Japanese
script/Japanese_vert
script/Kannada
script/Khmer
script/Lao
script/Latin
script/Malayalam
script/Myanmar
script/Oriya
script/Sinhala
script/Syriac
script/Tamil
script/Telugu
script/Thaana
script/Thai
script/Tibetan
script/Vietnamese
sin
slk
slv
snd
snum
spa
spa_old
sqi
srp
srp_latn
sun
swa
swe
syr
tam
tat
tel
tgk
tha
tir
ton
tur
uig
ukr
urd
uzb
uzb_cyrl
vie
yid
yor

pikepdf mmap enabled                                               helpers.py:328
os.symlink(broken-only-empty-page.pdf,                             helpers.py:179
/var/folders/cd/78438jq56gvbwk3gqk95_2y80000gn/T/ocrmypdf.io.3mg0y
afw/origin)
Gathering info with 1 thread workers                                  info.py:816
pikepdf mmap enabled                                               helpers.py:328
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Using Tesseract OpenMP thread limit 3                        tesseract_ocr.py:199
pikepdf mmap enabled                                               helpers.py:328
    1 Rasterize with png16m, rotation 0, mediabox [0.0, 0.0,     _pipeline.py:553
595.224, 842.04]
    1 Running: ['gs', '-dSAFER', '-dBATCH', '-dNOPAUSE',          __init__.py:133
'-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1',
'-dLastPage=1', '-r1.209525x1.209525', '-dPDFSTOPONERROR', '-o',
'/var/folders/cd/78438jq56gvbwk3gqk95_2y80000gn/T/ocrmypdf.io.3mg
0yafw/000001_rasterize.png', '-sstdout=%stderr',
'-dAutoRotatePages=/None', '-f',
'/var/folders/cd/78438jq56gvbwk3gqk95_2y80000gn/T/ocrmypdf.io.3mg
0yafw/origin.pdf']
    1 stderr = GPL Ghostscript 10.06.0 (2025-09-09)                __init__.py:75
Copyright (C) 2025 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO
WARRANTY:
see the file COPYING for details.
Unrecoverable error: rangecheck in setscreen
Operand stack:
    0.0755906  0  --nostringval--

    1 GPL Ghostscript 10.06.0 (2025-09-09)                     ghostscript.py:142
Copyright (C) 2025 Artifex Software, Inc.  All rights
reserved.
This software is supplied under the GNU AGPLv3 and comes with
NO WARRANTY:
see the file COPYING for details.
Unrecoverable error: rangecheck in setscreen
Operand stack:
    0.0755906  0  --nostringval--

OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/1 -:--:--
ExitCodeException                                                  _common.py:271
Traceback (most recent call last):
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_exec/ghostscript.py", line 140, in
rasterize_pdf
    p = run(args_gs, stdout=PIPE, stderr=PIPE, check=True)
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/subprocess/__init__.py", line 62, in run
    proc = subprocess_run(args, env=env, check=check, **kwargs)
  File
"/opt/homebrew/Cellar/python@3.14/3.14.2/Frameworks/Python.framewo
rk/Versions/3.14/lib/python3.14/subprocess.py", line 577, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['gs', '-dSAFER',
'-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1',
'-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1',
'-r1.209525x1.209525', '-dPDFSTOPONERROR', '-o',
'/var/folders/cd/78438jq56gvbwk3gqk95_2y80000gn/T/ocrmypdf.io.3mg0
yafw/000001_rasterize.png', '-sstdout=%stderr',
'-dAutoRotatePages=/None', '-f',
'/var/folders/cd/78438jq56gvbwk3gqk95_2y80000gn/T/ocrmypdf.io.3mg0
yafw/origin.pdf']' returned non-zero exit status 255.

The above exception was the direct cause of the following
exception:

Traceback (most recent call last):
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_pipelines/_common.py", line 261, in
cli_exception_handler
    return fn(options, plugin_manager)
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_pipelines/ocr.py", line 181, in
_run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_pipelines/ocr.py", line 117, in
exec_concurrent
    executor(
    ~~~~~~~~^
        use_threads=options.use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<10 lines>...
        task_finished=update_page,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_concurrent.py", line 78, in __call__
    self._execute(
    ~~~~~~~~~~~~~^
        use_threads=use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        task_finished=task_finished,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/builtin_plugins/concurrency.py", line 162, in
_execute
    result = future.result()
  File
"/opt/homebrew/Cellar/python@3.14/3.14.2/Frameworks/Python.framewo
rk/Versions/3.14/lib/python3.14/concurrent/futures/_base.py", line
443, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File
"/opt/homebrew/Cellar/python@3.14/3.14.2/Frameworks/Python.framewo
rk/Versions/3.14/lib/python3.14/concurrent/futures/_base.py", line
395, in __get_result
    raise self._exception
  File
"/opt/homebrew/Cellar/python@3.14/3.14.2/Frameworks/Python.framewo
rk/Versions/3.14/lib/python3.14/concurrent/futures/thread.py",
line 86, in run
    result = ctx.run(self.task)
  File
"/opt/homebrew/Cellar/python@3.14/3.14.2/Frameworks/Python.framewo
rk/Versions/3.14/lib/python3.14/concurrent/futures/thread.py",
line 73, in run
    return fn(*args, **kwargs)
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_pipelines/ocr.py", line 78, in
_exec_page_sync
    ocr_image_out, pdf_page_from_image_out, orientation_correction
= process_page(

   ~~~~~~~~~~~~^
        page_context
        ^^^^^^^^^^^^
    )
    ^
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_pipelines/_common.py", line 417, in
process_page
    ocr_image, preprocess_out = make_intermediate_images(
                                ~~~~~~~~~~~~~~~~~~~~~~~~^
        page_context, orientation_correction
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_pipelines/_common.py", line 353, in
make_intermediate_images
    rasterize_out = rasterize(
        page_context.origin,
    ...<2 lines>...
        remove_vectors=False,
    )
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_pipeline.py", line 559, in rasterize
    page_context.plugin_manager.hook.rasterize_pdf_page(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        input_file=input_file,
        ^^^^^^^^^^^^^^^^^^^^^^
    ...<7 lines>...
        stop_on_soft_error=not
page_context.options.continue_on_soft_render_error,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^
    )
    ^
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/pluggy/_hooks.py", line 512, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(),
kwargs, firstresult)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs,
firstresult)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/pluggy/_callers.py", line 167, in _multicall
    raise exception
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/pluggy/_callers.py", line 121, in _multicall
    res = hook_impl.function(*args)
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 114, in
rasterize_pdf_page
    ghostscript.rasterize_pdf(
    ~~~~~~~~~~~~~~~~~~~~~~~~~^
        input_file,
        ^^^^^^^^^^^
    ...<7 lines>...
        stop_on_error=stop_on_soft_error,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File
"/opt/homebrew/Cellar/ocrmypdf/16.13.0_1/libexec/lib/python3.14/si
te-packages/ocrmypdf/_exec/ghostscript.py", line 144, in
rasterize_pdf
    raise SubprocessOutputError("Ghostscript rasterizing failed")
from e
ocrmypdf.exceptions.SubprocessOutputError: Ghostscript rasterizing
failed
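
The Ghostscript part of the log above boils down to one line, `Unrecoverable error: rangecheck in setscreen`, plus the operand stack. When comparing this report with other issues that show a similar error, it may help to pull those fields out of the captured stderr. The sketch below is only an illustration: `GsError` and `parse_gs_error` are hypothetical names, and the regular expression is based solely on the message format visible in this log, not on any documented Ghostscript output format.

```python
import re
from typing import NamedTuple, Optional


class GsError(NamedTuple):
    name: str            # PostScript error name, e.g. "rangecheck"
    operator: str        # operator that raised it, e.g. "setscreen"
    operands: list[str]  # operand stack tokens as printed, possibly empty


# Pattern assumed from the log in this report.
_ERROR_RE = re.compile(r"Unrecoverable error: (\w+) in (\S+)")


def parse_gs_error(stderr: str) -> Optional[GsError]:
    """Return the first 'Unrecoverable error' found in stderr, or None."""
    lines = stderr.splitlines()
    for i, line in enumerate(lines):
        match = _ERROR_RE.search(line)
        if not match:
            continue
        operands: list[str] = []
        # The operand stack, if printed, is the line after "Operand stack:".
        for j in range(i + 1, len(lines) - 1):
            if lines[j].strip() == "Operand stack:":
                operands = lines[j + 1].split()
                break
        return GsError(match.group(1), match.group(2), operands)
    return None


if __name__ == "__main__":
    # Excerpt copied from the log in this report.
    sample = (
        "Unrecoverable error: rangecheck in setscreen\n"
        "Operand stack:\n"
        "    0.0755906  0  --nostringval--\n"
    )
    print(parse_gs_error(sample))
```

The parser stops at the first match because the same error appears twice in the log: once as raw stderr and once re-logged from `ghostscript.py`.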

Labels

triage: Issue needs triage
