Hacking Digitally Signed PDF Files

Interesting paper: “Shadow Attacks: Hiding and Replacing Content in Signed PDFs”:

Abstract: Digitally signed PDFs are used in contracts and invoices to guarantee the authenticity and integrity of their content. A user opening a signed PDF expects to see a warning in case of any modification. In 2019, Mladenov et al. revealed various parsing vulnerabilities in PDF viewer implementations. They showed attacks that could modify PDF documents without invalidating the signature. As a consequence, affected vendors of PDF viewers implemented countermeasures preventing all attacks.

This paper introduces a novel class of attacks, which we call shadow attacks. The shadow attacks circumvent all existing countermeasures and break the integrity protection of digitally signed PDFs. Compared to previous attacks, the shadow attacks do not abuse implementation issues in a PDF viewer. In contrast, shadow attacks use the enormous flexibility provided by the PDF specification so that shadow documents remain standard-compliant. Since shadow attacks abuse only legitimate features, they are hard to mitigate.

Our results reveal that 16 (including Adobe Acrobat and Foxit Reader) of the 29 PDF viewers tested were vulnerable to shadow attacks. We introduce our tool PDF-Attacker which can automatically generate shadow attacks. In addition, we implemented PDF-Detector to prevent shadow documents from being signed or forensically detect exploits after being applied to signed PDFs.

EDITED TO ADD (3/12): This was written about last summer.

Posted on March 8, 2021 at 6:10 AM • 28 Comments

Comments

David Rudling March 8, 2021 7:34 AM

The paper states in the summary that:-
“The PDF specification defines a compromise between
usability and security by softening the rules regarding the
integrity protection of digitally signed documents.”

If that is correct then there presumably needs to be a new SECURE-PDF specification which hardens the security rules even at the expense of some usability restrictions and documents compliant only with that specification should be acceptable when authenticity and integrity are essential.

Mike March 8, 2021 8:57 AM

So, you really need to start with a Digitally Signed PDF,
and then Zip, Encrypt and digitally sign the Encrypted file?

Digital Signing alone is not sufficient?
And then when it’s sitting on your desktop, it can’t be attacked?

James March 8, 2021 9:12 AM

This seems so simple. Why isn’t a secure PDF exactly the same process as a PGP signature? Hash the document & encrypt the hash with your private key.
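
The hash-then-sign idea can be sketched in a few lines. This is a toy illustration only: it assumes textbook RSA with no padding and two small Mersenne primes picked for convenience, and the names `sign` and `verify` are made up for this example. It says nothing about how a PDF viewer decides which bytes the signature covers, which is the part this thread is really about.

```python
import hashlib

# Toy key built from two Mersenne primes. Far too small and too
# structured for real use; chosen only so the example is self-contained.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_as_int(document: bytes) -> int:
    """Hash the document and reduce the digest into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """'Encrypt the hash with the private key' (textbook RSA, no padding)."""
    return pow(digest_as_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Recover the hash with the public key and compare it with a fresh hash."""
    return pow(signature, E, N) == digest_as_int(document)


contract = b"Pay Alice 100 dollars."
sig = sign(contract)
print(verify(contract, sig))                     # the signed bytes verify
print(verify(b"Pay Mallory 100 dollars.", sig))  # changed bytes should not
```

A real implementation would use a vetted cryptography library with proper padding rather than anything resembling this.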

Denton Scratch March 8, 2021 9:23 AM

I have long distrusted PDF documents.

Adobe have a long history of designing and developing insecure products. The PDF format is extremely feature-rich (i.e. it presents a large attack surface). It includes a Turing-complete programming language, with which I am not familiar.

The single use-case for which I think PDF is an appropriate choice is a document containing content-elements such as formatted text and images, which must be presented as formatted in a particular way to carry the intended meaning. That is a special requirement; it’s needed for advertising brochures, printing masters, and a few other arty targets. Most PDF documents do not have that requirement; text/plain or HTML would serve as well or better.

You don’t need PDF for:

  • Most academic papers
  • Online forms
  • Business reports

In some industries, it seems that PDF is the default format for publishing information; if you’re not using PDF, you have to explain why. That is getting your Arsinole over your Elbling.

There’s a reason why a lot of my inbound spam contains a PDF attachment.

Denton Scratch March 8, 2021 9:28 AM

I found the linked article rather annoying; the repeated explanation of what they mean by a “shadow document” meant that I was at page 5 before I began to read how the attacks work. By page 9, we’re on to a description of the tools they’ve made, so I stopped reading.

The meat of the article could have been written on a single page.

TimH March 8, 2021 9:46 AM

@James: because secure PDFs aren’t. I’ve used Elcomsoft APDFPR for years to remove passwords from PDFs with security settings that prevent merging and page extraction. Takes a fraction of a second. However, PDFs with a password to open require brute force (at least in the version I have).

Evan Easton March 8, 2021 9:50 AM

@Denton Scratch, for sure. I thought the same.

That said, the Table II on p11 showing vendor products, their original vulnerability to each type of modification, and whether they had patched already might serve as a proxy metric for quality for those selecting pdf products for the first time.

Mr. C March 8, 2021 10:32 AM

Yikes.

I’d always considered PDF signatures a joke since there’s no infrastructure in place to make the signatory-to-public-key relationship verifiable or non-deniable. (Such a thing could exist, but in practice it does not. At least I’ve never seen a PDF signed with a public key that I had any meaningful way to verify/prove the ownership of.)

Anyway, this paper. Holy Jeezus H. Christ! This incremental updates idea is just beyond stupid. What a mess. I can’t imagine how anyone could have thought allowing post-signature changes in this manner could possibly be secure. This strikes me as likely unfixable. Even after these particular attacks are foreclosed, I’d wager new attacks in the same general class will be findable unless and until the incremental updates thing is scrapped altogether.

Tatütata March 8, 2021 10:33 AM

Surprised? PDF is IMO one hideous kludge.

The format began nearly thirty years ago with good enough intentions, building on the success of Adobe’s PostScript printer control language, with which it shares many concepts, but without the programming primitives (loops, tests, variables, etc.). The distilled pages consisted only of the visible basic graphics operators (text, lines, surfaces, and geometric constructs, bitmaps) resulting from the program execution.

But the structure got clunkier with every new version and feature, and an initial goal of human readability went overboard quite quickly.

One case in point: Javascript in PDFs. Ewwwwwww… They might as well have kept PostScript after all.

If it weren’t for inertia and the sheer installed base, other formats could take over, such as SVG. Even some vendor specific alternatives like Microsoft’s XPS seem to have merits over PDF. But then there is the proliferation of E-book standards, which fill a need quite close to PDF.

To have PDF, which tries to be all things to all people, as a basis for defining legally binding documents, with all its complexities (interpreter implementation, content encryption, printing and text select permissions, font management, versioning, etc.) doesn’t sound like such a wise idea, as eloquently demonstrated in the paper.

In the meantime, I keep several PDF readers handy, in case one crashes while attempting to regurgitate a given document…

The paper only mentions 3 Linux applications, of which only one is a stand-alone reader (okular), and the other two are editors. There are several more choices, including Xreader (Mint), evince (gnome) and mupdf, as well as Linux versions of Acrobat and Foxit.

willk2day March 8, 2021 1:46 PM

I have opened protected PDF files in Libre Office. The protections go away and I can do what I want with them.
Is it possible to infect the file and save it with a digital signature using the Libre Office add signature process?
I am not savvy enough to check/test that proposition.

Just curious.

Clive Robinson March 8, 2021 1:53 PM

@ Tatütata,

Surprised? PDF is IMO one hideous kludge.

So restrained 😉

… as a basis for defining legally binding documents, with all its complexities … doesn’t sound like such a wise idea, as eloquently demonstrated in the paper.

As I’m fond of advising people when it comes to the likes of “Electronic Submission” DON’T. Which I’ve said here a few times as,

PAPER, Paper, NEVER data!

You’d have thought people would have “cottoned on” by now.

@ Denton Scratch, Tatütata,

It includes a Turing-complete programming language, with which I am not familiar.

It’s derived from “PostScript” the language and is a concatenative dual stack based language with strong data typing and Reverse Polish notation, which would be quite familiar to anyone who programs in Forth.
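
For readers who have not met Reverse Polish notation, the stack discipline described above can be sketched with a tiny evaluator. This is a hypothetical toy written in Python, not PostScript or Forth: it has a single operand stack, and only the operator names `add`, `sub`, `mul`, `dup` and `exch` are borrowed from PostScript. Everything else is made up for illustration.

```python
def run(program):
    """Evaluate a whitespace-separated postfix program on an operand stack."""
    stack = []
    for token in program.split():
        if token == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "sub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif token == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "dup":    # duplicate the top of the stack
            stack.append(stack[-1])
        elif token == "exch":   # swap the top two entries
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:                   # anything else is treated as a number
            stack.append(float(token))
    return stack


print(run("3 4 add 5 mul"))   # (3 + 4) * 5
print(run("7 dup mul"))       # 7 squared
print(run("2 10 exch sub"))   # exch swaps the operands, so this is 10 - 2
```

Operands are pushed first and each operator consumes what is already on the stack, which is why no parentheses or precedence rules are needed.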

As such it’s an interesting language and packs a very great deal into very little RAM, and executes surprisingly quickly for an interpreter. It’s also both very extensible and thus very expressive, both of which have downsides when it comes to security… For one thing you can easily write an interpreter to run in the interpreter with all that implies. In fact if you want to you can write a baby multitasking OS in it so you could have an interpreter and a command shell running together… All great fun till somebody sends one to you and your computer or printer spawns a hidden command interpreter in RAM and it starts following others’ hostile instructions…

Oh and in an office how often does the “group” or “Dept” printer actually get turned off?

What a lot of people do not realise is that anyone on the network can get access to the PostScript interpreter and kind of treat it like a CLI environment. So you can just whizz a command down to it to crash out somebody’s “print file” that might be blocking the printer etc.

So a hostile interpreter, being in RAM on a printer, can hang around for a long time doing its “nasty”. You could in theory leave a “deadman’s switch” on the office printer just waiting patiently for you not to send a heartbeat, then whammy, there goes something strategic to the organisation, and then it just deletes itself from the printer RAM.

This all takes me back a third of a century, which is quite scary in and of itself…

Simone March 8, 2021 2:58 PM

Denton Scratch:

You don’t need PDF for: Most academic papers

What are the alternatives? Before PDF, people used PostScript (an actual Turing-complete programming language), usually gzipped. HTML is no real alternative, because its support for mathematical expressions is weak. One could use images or perhaps Javascript (…) for those. But then there are multiple files, and you’ll need to archive them and explain to people how to extract them such that they’ll work (never rename the directory or move it away from the HTML file; God help you if it started out with a bullshit name like 1.234.pdf—I’m looking at you, IACR). CHM and MHTML never caught on, and don’t work with all browsers. Plaintext is obviously no good for anything with non-trivial maths.

But, academic papers don’t need “secure” nor interactive PDFs. If people turn off Javascript, interactive forms, signature checking, etc. in their PDF readers, it won’t cause them any trouble. Definitely never use Adobe products for anything. While awful for in-browser use, basic PDF is pretty decent overall, and is easy to save “cleanly”—usually being free of web-page chuffa like navigation bars and ads.

I’m surprised by this (unsupported) statement from their paper: “A user opening a signed PDF expects to see a warning in case of any modification.” Really? Has anyone here ever opened a PDF file and known it should have been signed and the viewer would have warned if modified? I’ve skimmed the PDF specs, written code to deal with the files, hex-edited and otherwise munged them, and don’t think I knew they had a built-in signature capability. If I ever knew it, I’d forgotten, and I never knew how to check the signatures or how the key management works. I expect 99% of people to know less about PDF than I do. So, my gut feeling is that if you strip a signature from a PDF, roughly nobody will notice (the same number of people who’d have noticed a valid signature); and if you embed an image of a cursive name, the majority will consider it “signed”. That’s what companies do in their financial reports.

MK March 8, 2021 3:14 PM

Without diving into the depths of PDF, it should be pointed out that this attack requires the attacker to have access to the PDF before it is sent for signature, and requires that the signer of the PDF does not ‘lock’ the document after signing; locking would invalidate any incremental saves. If this is to be a MITM attack, the originator of the PDF (whom I assume to be without malice) could “Certify” the document, which is a digital signature itself, thereby preventing manipulation before signing by an end user.

A suspicious recipient could open the signature panel, select the last signature, and “view signed version”, which would ignore all incremental saves after that signature.

That’s not to say PDF is perfect, but I think it is fit for use. Unfortunately it is like a roach motel: features go in but can’t get out.
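
The interaction between a signature and later incremental saves can be illustrated with a deliberately simplified model. The sketch below assumes a signature that covers only the first `signed_len` bytes of the file and treats each revision as an opaque byte string; real PDFs record the covered range in the signature dictionary’s `/ByteRange` entry and have far more structure than this. The helper name and the sample bytes are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 of a byte string, standing in for the signed hash."""
    return hashlib.sha256(data).hexdigest()


# Revision 1: what the signer saw and signed (placeholder bytes, not a real PDF).
signed_revision = b"%PDF-1.7\n...objects as signed...\n%%EOF\n"
signed_len = len(signed_revision)
signed_digest = digest(signed_revision)

# Revision 2: an incremental save appended after signing.
incremental_update = b"...replacement objects...\n%%EOF\n"
file_on_disk = signed_revision + incremental_update

# The signed byte range is untouched, so a check over that range still passes...
assert digest(file_on_disk[:signed_len]) == signed_digest
# ...even though the file as a whole is no longer what was signed.
assert digest(file_on_disk) != signed_digest

# "View signed version" amounts to discarding everything after the signed range.
signed_view = file_on_disk[:signed_len]
print(signed_view == signed_revision)

# Crude heuristic: bytes beyond the signed range mean later revisions exist.
print(len(file_on_disk) > signed_len)
```

In this model the hash check alone cannot tell a harmless later revision from a malicious one; the viewer has to judge what the appended bytes actually change.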

SpaceLifeForm March 8, 2021 3:25 PM

@ ALL

My irony meter just died.

Do you all not see that the article Bruce linked to is a PDF?

Rj March 8, 2021 7:00 PM

All this flap about pdf and its postscript heritage and postscript’s heritage in Forth brings back good memories. I did a great deal of work in Forth in the 1980’s and some in the early 90’s. Forth put a lot of bread on my table.

It’s like a stripped down Lisp without the parentheses. In fact Forth syntax is just Lisp syntax standing on its head naked! If you take a Lisp sexpr and rotate it 180 degrees about a perpendicular bisector of the sexpr that is orthogonal to the two dimensions the sexpr is written in and remove the parens, you have an equivalent Forth expression.

Charles Moore once told me that he took a Lisp course from McCarthy himself (the inventor of Lisp) of all people before he invented Forth. Forth’s real beauty is one of the nice features of Lisp: it is both the language and the macro-language (code that runs at compile time to output source code) of itself.

Forth was designed to run fast on bare metal on tiny microcontrollers. The IBM 1130 that ran early Forth was a 12 bit machine that was the ancestor of the beloved DEC PDP-8. It was a very good fit for things like the 8080, 8051, 6502, 68HC11, etc. There was no garbage collector.

It was very easy to port to a new processor. In the mid 90’s, I was involved in a project to implement a pair of 32 Forth optimized CPU’s on a single custom ASIC. It took me about a week to get a cross assembler and Forth interpreter running on the new hardware under simulation before the hardware was even ready.

lurker March 8, 2021 10:56 PM

@Rj: Forth was designed to run fast on bare metal on tiny microcontrollers.

The closest I ever got to the bare metal was handing in a coding sheet for the card punchers to play with. A course on “modern” computer programming marred me for life.
Until I found a copy of PostScript Language Reference Manual in a remainder bin, and really enjoyed it. Hacked a few logos in raw .ps, repaired Word’s occasional idiocies, etc. Telnet into the printer, ah those were the days…

@Simone: and if you embed an image of a cursive name, the majority will consider it “signed”.

Yup, I did that this morning, on instructions from the architect. I kept a printout to satisfy myself and the lawyers.

MK March 9, 2021 12:54 AM

@lurker: Digitally signed files can’t be modified without invalidating the signature. The file bytes are hashed and signed. qpdf could read a signed file, but the resultant file would not have a valid signature.

Denton Scratch March 9, 2021 1:44 AM

@Simone:

“HTML is no real alternative, because its support for mathematical expressions is weak.”

MathML?

Simone March 9, 2021 5:50 PM

@Denton Scratch

MathML?

Wikipedia says neither Chrome nor Safari supports it. Google suggest using MathJax—i.e. Javascript, a Turing-complete language—to embed math, which is what we were trying to avoid. I imagine a lot of people will want to embed graphs too. One could perhaps embed all these things as base64-encoded images via “data:” URIs. But there’s really not much fundamentally wrong with PDF for the purpose, if we ignore all the new fancy stuff.
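
The `data:` URI idea mentioned above can be sketched as follows. The function name `to_data_uri` and the placeholder bytes are hypothetical; a real page would pass the contents of an actual image file.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str = "image/png") -> str:
    """Return a data: URI that embeds the payload inline as base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes standing in for a rendered equation or graph.
fake_image = b"\x89PNG placeholder"
html = f'<img alt="equation" src="{to_data_uri(fake_image)}">'
print(html)
```

The result is a single self-contained HTML file, at the cost of roughly a third more bytes per image from the base64 encoding.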

@MK

Digitally signed files can’t be modified without invalidating the signature.

One could remove the signature, or even re-sign with another key. Whether that “invalidates” the original signature is a matter of semantics, and is moot if the recipients don’t notice.

Stefan Claas March 10, 2021 1:12 PM

Well, this seems to be an old attack described already on ZDNet in early 2020 and when downloading the mentioned tools, from GitHub, and checking a shadow signed .pdf, stating ‘you are fired’ my free Adobe Reader says that the .pdf contains an invalid signature, as expected, because Adobe fixed the issues long ago.

calvin March 10, 2021 7:11 PM

@Denton Scratch
Adobe have a long history of designing and developing insecure products.

Amen to that. The uncrowned King of those being Adobe Flash.

a'anon March 12, 2021 3:33 PM

>> Adobe have a long history of designing and developing insecure products.
>
> Amen to that. The uncrowned King of those being Adobe Flash.

Flash was developed by Macromedia. Adobe bought Macromedia in 2005.

Stefan Claas March 17, 2021 3:33 PM

For the interested reader I created yesterday an eIDAS conform and certified .pdf document, containing my age pub key, so that age users know the pub key belongs to me. Because eIDAS Digital Signatures are EU Standard I invite everybody to take a look at my .pdf and give a crack a try.

https://github.com/sac001/my_public_age_key

Kaushik December 29, 2022 4:55 AM

Such attacks aside, what would be the best way to detect text tampering of a PDF? The metadata isn’t changed in most cases and there’s no obvious way to detect if text has been edited!
