Microsoft RC4 Flaw

One of the most important rules of stream ciphers is to never use the same keystream to encrypt two different documents. If someone does, you can break the encryption by XORing the two ciphertext streams together. The keystream drops out, and you end up with plaintext XORed with plaintext—and you can easily recover the two plaintexts using letter frequency analysis and other basic techniques.
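A minimal sketch of why keystream reuse is fatal, using a textbook RC4 implementation. The key and both plaintexts are made up for illustration, and nothing here reflects how Word or Excel actually derive their keys.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n bytes of the RC4 keystream for `key`."""
    # Key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def rc4_encrypt(key: bytes, plaintext: bytes) -> bytes:
    return xor_bytes(plaintext, rc4_keystream(key, len(plaintext)))


key = b"hypothetical key"                 # assumed, for illustration only
p1 = b"Meet me at the usual place."
p2 = b"The budget is due on Friday"

# Same key, no IV: both documents get the same keystream.
c1 = rc4_encrypt(key, p1)
c2 = rc4_encrypt(key, p2)

# The keystream cancels: ciphertext XOR ciphertext == plaintext XOR plaintext.
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)

# An attacker who knows (or guesses) one plaintext gets the other for free.
assert xor_bytes(xor_bytes(c1, c2), p1) == p2
```

The attacker never touches the key. Everything is computed from the two ciphertexts plus a guess at one plaintext.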

It’s an amateur crypto mistake. The easy way to prevent this attack is to use a unique initialization vector (IV) in addition to the key whenever you encrypt a document.
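One way to picture the fix, as a hedged sketch: draw a fresh random IV on every save and mix it into the RC4 key. The derivation used here (SHA-256 over IV plus key, truncated to 128 bits), the 16-byte IV, and the function names are assumptions chosen for illustration, not the scheme Office uses.

```python
import hashlib
import os

IV_LEN = 16  # assumed IV size for this sketch


def rc4_keystream(key: bytes, n: int) -> bytes:
    """Textbook RC4: key scheduling followed by n bytes of output."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def document_key(key: bytes, iv: bytes) -> bytes:
    # Hypothetical derivation: hash the IV together with the long-term key.
    return hashlib.sha256(iv + key).digest()[:16]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    iv = os.urandom(IV_LEN)  # fresh IV on every save
    stream = rc4_keystream(document_key(key, iv), len(plaintext))
    return iv + xor_bytes(plaintext, stream)  # the IV is stored in the clear


def decrypt(key: bytes, blob: bytes) -> bytes:
    iv, body = blob[:IV_LEN], blob[IV_LEN:]
    stream = rc4_keystream(document_key(key, iv), len(body))
    return xor_bytes(body, stream)


key = b"hypothetical key"
v1 = b"Draft 1: offer them 40,000."
v2 = b"Draft 2: offer them 55,000."
blob1, blob2 = encrypt(key, v1), encrypt(key, v2)

assert decrypt(key, blob1) == v1 and decrypt(key, blob2) == v2
# Different IVs give different keystreams, so the XOR trick no longer works.
assert xor_bytes(blob1[IV_LEN:], blob2[IV_LEN:]) != xor_bytes(v1, v2)
```

The IV does not need to be secret. It only needs to be different for every encryption under the same key.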

Microsoft uses the RC4 stream cipher in both Word and Excel. And they make this mistake. Hongjun Wu has details (link is a PDF).

In this report, we point out a serious security flaw in Microsoft Word and Excel. The stream cipher RC4 [9] with key length up to 128 bits is used in Microsoft Word and Excel to protect the documents. But when an encrypted document gets modified and saved, the initialization vector remains the same and thus the same keystream generated from RC4 is applied to encrypt the different versions of that document. The consequence is disastrous since a lot of information of the document could be recovered easily.

This isn’t new. Microsoft made the same mistake in 1999 with RC4 in WinNT Syskey. Five years later, Microsoft has the same flaw in other products.

Posted on January 18, 2005 at 9:00 AM • 23 Comments

Comments

MA January 18, 2005 10:20 AM

I think this is not a mistake but a “feature”, and they do it on purpose to allow the CIA/NSA/etc. to easily crack the encryption while giving the user a false sense of security.

Arik January 18, 2005 10:51 AM

Had I wanted to create a file format that can easily be decoded by the security agencies, I would have used a covert channel rather than encryption that’s flawed.

A word file is so big and contains so much redundant data that hiding the decryption key inside it, one bit at a time, seems much more plausible than using flawed encryption on purpose.

Moreover, the agency would be out of luck if they desperately need to decode a document and they only have a single one (the attack Bruce described requires at least two documents).

— Arik

Scott Laird January 18, 2005 12:06 PM

This really isn’t new with Microsoft–Windows 95 used the same basic scheme to encrypt its local passwords. One of the encrypted records was the username in uppercase, padded out to 20 bytes with nulls. Since each record in the file reused the same RC4 stream, the first 20 bytes of each password record could be recovered with pencil and paper.
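The pencil-and-paper attack described in this comment can be sketched in a few lines. The record contents and lengths below are invented, and random bytes stand in for the RC4 output, since the attack only depends on the same keystream being reused across records.

```python
import os

RECORD_PREFIX = 20  # the username record is padded to 20 bytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the reused RC4 keystream (unknown to the attacker).
keystream = os.urandom(32)

# Hypothetical records: the username in uppercase padded with nulls,
# and a password record encrypted with the very same keystream.
username_record = b"ALICE".ljust(RECORD_PREFIX, b"\x00")
password_record = b"correct horse battery staple"

enc_username = xor_bytes(username_record, keystream)
enc_password = xor_bytes(password_record, keystream)

# The attacker knows the username, so XORing it with its ciphertext
# reveals the first 20 bytes of keystream.
recovered_stream = xor_bytes(enc_username, username_record)

# Those bytes decrypt the first 20 bytes of every other record.
recovered_prefix = xor_bytes(enc_password, recovered_stream)
assert recovered_prefix == password_record[:RECORD_PREFIX]
```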

See http://seclists.org/lists/bugtraq/1995/Dec/0004.html for more details.

The scuttlebutt at the time suggested that Microsoft couldn’t quite understand what was wrong with the way they were using RC4; obviously they still haven’t figured it out.

Davi Ottenheimer January 18, 2005 12:10 PM

RC4, even with IV, is still weak if implemented incorrectly. That was the mistake made with Wired Equivalent Privacy (WEP) for wireless security.

Just as you recommend, the WEP authors included initialization vectors to avoid encrypting two plaintexts with the same keystream. Unfortunately, they implemented only a 24-bit field for the IV, and all mobile devices were set up with a shared key. So it did not take long for WEP devices to demonstrate a weak, incorrect implementation of encryption using RC4, even with an IV.
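To get a feel for how small a 24-bit IV space is, here is a rough birthday-bound estimate. It assumes IVs are chosen uniformly at random, which is an assumption of this sketch and not something the comment states. The packet counts are arbitrary examples.

```python
import math


def iv_collision_probability(packets: int, iv_bits: int = 24) -> float:
    """Approximate P(at least two of `packets` share an IV), random IVs."""
    space = 2 ** iv_bits
    # Birthday approximation: 1 - exp(-n(n-1) / (2N))
    return -math.expm1(-packets * (packets - 1) / (2 * space))


for n in (1_000, 5_000, 20_000):
    p = iv_collision_probability(n)
    print(f"{n:>6} packets: P(IV reuse) ~ {p:.3f}")
```

Under that assumption, a repeated IV (and therefore a repeated keystream, since the key is shared) becomes more likely than not after only a few thousand packets.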

Peter Gutmann January 19, 2005 10:21 AM

They’ve been doing this since at least 1993 with WfW password encryption. The last time I looked at this (in 1998, for my Godzilla crypto tutorial) they’d managed to get RC4 wrong every single time they were known to have used it. So this isn’t really a new problem, it’s just a bug-compatible continuation of a problem that’s been around for more than a decade.

Clay Webster January 21, 2005 12:24 PM

Microsoft hires recent college grads who will continue to make these same novice mistakes over and over… regardless of the technical specialty.

W.T. Ichiyasu January 21, 2005 11:32 PM

I have never trusted or relied on Microsoft’s native “document-level” encryption in any incarnation of Office, nor have I ever recommended that any of my clients even attempt to rely on the same to protect the confidentiality of their documents’ contents. (I’m none too pleased with “EFS,” either, but don’t even get me started on that…)

As a result of consciously never using non-MS file-level encryption, I haven’t bothered to keep abreast of all of the third-party tools that once existed to “recover” (decrypt) “natively encrypted” Office documents, with no prior knowledge of the original encryption keys. Some of those tools were baloney, but some really did work as advertised (which is why I became persuaded that native Office encryption was baloney).

Did all of those Office-cracker tools suddenly disappear???… Did any of them exploit the botched implementation of RC4???…

Ted Sumner January 21, 2005 11:54 PM

Mr. Ichiyasu raises a crucial point. The crackers I used on Outlook, Excel and Word were simply brute force. It seems unlikely that the naive implementation by Microsoft is the root of the weakness.

M. Carson January 22, 2005 8:50 AM

This really, really isn’t new. The user account creation tool in the original Microsoft version of Xenix, back in the early ’80’s, used the same IV (“salt”) for encrypting all passwords. The passwd command (for changing passwords) didn’t have this flaw, but often initial passwords were never changed. This made it substantially simpler to break large numbers of passwords on Xenix systems.

pir2 January 23, 2005 11:36 PM

Why would MS try harder? With such a monopolistic situation (rather communist when you think about it!), they would be stupid to invest hard cash delivering value to the customer. We will, anyway, all end up queuing at the same bakery, for lack of a better choice. Old debate though…

Jean-Guy Poudrier January 24, 2005 8:49 AM

Send me the link as soon as possible
for the correction of the bug in Word and Excel documents.
Thanks

Mike Nichols January 26, 2005 11:43 AM

From my perspective, this type of less than perfect encryption will keep all but the “afflicted” snoop away from the information.
For the seriously determined and resourced, virtually no amount of encryption is absolute because it still involves an algorithm of some complexity.
The only true protection is to prevent unnecessary access to the file in the first place.

Vesselin Bontchev January 26, 2005 12:47 PM

The point is really moot, folks.

First, a correction – Word does not use RC4 with a 128-bit key; it uses RC4 with a 40-bit key which is derived from a 128-bit hash (MD5, if I remember correctly) of the user-provided password. (Since version 97, that is. Earlier versions of Word used a “toy” cipher – a trivial Vigenere variant.)

Given the speed of computers nowadays, and given that RC4 is a relatively fast cipher, a 40-bit-long key for it is ridiculously insecure anyway – no matter how implemented.

Even if the flaw you’re discussing here didn’t exist, one could still brute-force the cipher in a few days on a reasonably fast PC (much faster if a cluster of PCs is used). There are even cracking programs for Word documents floating around that do just that. They don’t recover the password; they just exhaust the full (relatively small) 40-bit keyspace and decrypt the document.
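The "few days" figure is easy to sanity-check with back-of-the-envelope arithmetic. The key-testing rates below are hypothetical. Only the 40-bit keyspace comes from the comment.

```python
KEYSPACE = 2 ** 40          # every possible 40-bit RC4 key
SECONDS_PER_DAY = 86_400


def days_to_exhaust(keys_per_second: float) -> float:
    """Worst-case days needed to try every 40-bit key at the given rate."""
    return KEYSPACE / keys_per_second / SECONDS_PER_DAY


# Hypothetical rates: one PC, and a small cluster of 20 such PCs.
for label, rate in (("single PC", 5e6), ("20-PC cluster", 1e8)):
    print(f"{label:>14}: {days_to_exhaust(rate):.2f} days worst case")
```

On average the right key turns up after searching half the keyspace, so the expected time is half the worst case.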

chris January 28, 2005 4:06 AM

When you say ‘using letter frequency analysis and other basic techniques’, Bruce, is this something that can be done in hours or days on a standard p3/p4?

Clive Robinson February 2, 2005 1:57 PM

A brute-force search is probably not even required: a 40-bit key is small enough to form a “catalogue”, especially if the hash does not produce all of the 40-bit keys.

On his retirement, Bob Morris (NSA chief scientist) made comments about modern computer files and their structures. Most of the Microsoft file formats start with a sequence of many bytes that is effectively a magic number. Look the encrypted version up in the sorted catalogue and the key would drop out. The estimated search time depends on the search time of the memory holding the catalogue, but could easily be down to sub-millisecond times…

Therefore all Microsoft encrypted documents sent on the internet could be decoded pretty much in real time, if there was the desire to harvest them.

The maths shows that 2^40 is approximately 10^12, and a 130 Gbyte hard drive holds around 0.125E12 bytes, so the catalogue could be fitted onto 40 IDE hard drives that cost less than 100 USD each. A 5 terabyte RAID array can be purchased off the internet quite cheaply, so even a home user could build the catalogue at home.
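The storage estimate can be reproduced roughly as follows. The 5 bytes per entry is an assumption of this sketch: it is just enough to hold a 40-bit key, and a catalogue that also stored the encrypted magic number next to each key would need more. The drive size is taken as 130 × 10^9 bytes.

```python
import math

ENTRIES = 2 ** 40             # one catalogue entry per possible 40-bit key
BYTES_PER_ENTRY = 5           # assumed: the 40-bit key only
DRIVE_BYTES = 130 * 10 ** 9   # a "130 Gbyte" drive, decimal gigabytes

total_bytes = ENTRIES * BYTES_PER_ENTRY
drives_needed = math.ceil(total_bytes / DRIVE_BYTES)

print(f"entries:       {ENTRIES:.3e}")
print(f"catalogue:     {total_bytes / 1e12:.2f} TB")
print(f"drives needed: {drives_needed}")
```

Under these assumptions the result lands in the same ballpark as the comment's figures of about 40 drives and a 5 terabyte array.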

Rafael Sevilla February 3, 2005 2:23 AM

IIRC, the exact same mistake was made in MS-PPTP, as Schneier’s own analysis of the protocol pointed out. I suppose nobody at Microsoft ever bothered to read Applied Cryptography…

Anonymous October 3, 2006 10:45 AM

If you use a different password to encrypt each document then they cannot be subjected to this attack.
True or false?

john milton December 25, 2006 9:16 AM

False, as explained above: since there is “known content” in the unencrypted file regardless of what password was used, a snoop tries all 2^40 possible keys until they find one that displays that content.
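A toy version of that search, scaled down so it runs in well under a second: the keyspace is shrunk from 40 bits to 12, and the "known content" is a made-up header string. Both are assumptions for illustration. The real attack has the same structure with a much longer loop.

```python
from typing import Optional


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; encryption and decryption are the same operation."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)


KEY_BITS = 12                    # toy keyspace; the comment is about 2^40
KNOWN_HEADER = b"KNOWN-HEADER"   # hypothetical fixed bytes at file start


def key_bytes(k: int) -> bytes:
    # Encode the candidate as a 5-byte (40-bit) key; only the low bits vary.
    return k.to_bytes(5, "big")


def brute_force(ciphertext: bytes) -> Optional[int]:
    """Try every key until the decrypted prefix matches the known header."""
    prefix = ciphertext[: len(KNOWN_HEADER)]
    for k in range(2 ** KEY_BITS):
        if rc4(key_bytes(k), prefix) == KNOWN_HEADER:
            return k
    return None


secret = 0xABC  # the key the attacker does not know
ciphertext = rc4(key_bytes(secret), KNOWN_HEADER + b" ...rest of the document")

recovered = brute_force(ciphertext)
assert recovered == secret
print(rc4(key_bytes(recovered), ciphertext))
```

The search never looks at the password, only at the derived key, which is why choosing a different password per document does not help against it.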

Alex June 3, 2007 5:41 PM

Hi all,
a curiosity about RC4.
Suppose that I generate a random 128-bit key and encrypt a text. After decrypting it, I generate a new 128-bit key before encrypting again.
I suppose that this approach bypasses the RC4 flaws, doesn’t it?

Matt August 4, 2007 5:41 PM

I was wondering if there was a more secure way of protecting sheets. I have a very advanced template that requires users to input data on a sheet. There are separate sheets in the workbook that calculate the output and display it.
Mainly, I wish to prevent the “brain” of the sheet from being stolen, while still allowing the users to access the inputs.

This security hole is very disconcerting, and I wish to correct this issue.
