We’re saddened to write that we’ve let some of our users down by accidentally leaking their email addresses and birth dates via a bug in the web pages of musicbrainz.org. This caused some users to receive unwanted spam emails.
However, we would like to emphasize that no passwords, passwords hashes or any other bits of private user information other than email addresses and birth dates were leaked.
If you have never added or edited an annotation on MusicBrainz, then your email address and birth date were never leaked and you can ignore this — your data has not leaked.
What happened
About two weeks ago a MusicBrainz editor contacted us to say that their email address that was in use only at MusicBrainz had received spam. The user changed the email address to a very distinct email address in order to rule out a spammer guessing the updated email address. But it happened again, and the user received email to the unguessable email address.
At this point we began an audit of the MusicBrainz server codebase in an attempt to find out where the leak was, patch it as soon as possible, and discover who was affected by it.
What we found
On 2019-04-26 we released a new version of the MusicBrainz server and in this version we added email addresses to the list of editor data we pass to our server to build MusicBrainz pages. The goal of this was to display them in admin-facing pages to, ironically, be able to fight spammers who were using MusicBrainz as a spamming tool. We also added the editor’s birth date, to be able to congratulate them on their birthday. Neither of these cases should have ever been a problem, since the private data should only be used on pages built and sent from our own server (where the data cannot be seen by anyone else), and any editor info sent to the users’ browser goes through a “sanitizing” process eliminating all this private information.
After some digging, we discovered that due to a bug we had overlooked in the code that stripped this data, the addresses and dates had started being sent to the browser whenever an entity page with an annotation was requested. The email address and birth date of the last person to have edited an annotation in MusicBrainz (any annotations, attached to any of our entities) was leaked on the page for the entities in question. This data was contained in a massive block of JSON data in the page source and was never shown on the web page for humans to see, which is why this issue went undetected for so long.
Who was affected
We looked at all editors who wrote any annotations that were displayed between the date the problematic code was released and the date the bug was fixed. This can mean either the annotation was written during this time period, or it was written before that but (being the latest version of the annotation for the entity) it was still displayed during this time period. This gave us a total of 17,644 editors whose data was at some point visible from the JSON block in at least one entity’s source code. We sadly do not have a way to know for sure how many of the affected were actually ever found and stored by spammers, since we attempt to block botnets as much as possible. As such, we simply have no way of knowing who was really affected by this leak — only who might have been.
What we’ve done
Once we detected the issue on November 22, we immediately put out a hotfix to all production (and beta) servers plugging the leak. The hotfix acted to sanitize the editor data by removing email addresses and birth dates from the JSON. We also deployed two additional changes that should help prevent similar issues from occurring, by avoiding sending sensitive editor data to our template renderer altogether. See all changes from the git tag v-2020-11-22-hotfix
.
We are planning to improve our testing infrastructure to detect exposure of editor data — this will become a routine part of our continuous integration process. We are also going to ensure that any pull request dealing with editor data goes through a strict testing checklist.
How did spammers get these email addresses?
You might be wondering how such an obscure leak in a web page can end up in spammers finding and using your email — you’re not alone.
Our sites are under near constant traffic from seemingly random internet bots fetching thousands of our pages in a day, with no apparent goal. All of our metadata is available for download, so why would someone download pages from us at random?
Well, we now know — web pages can contain a whole host of random data that shouldn’t be there. Email addresses, birth dates and such are just the starting point — there have been websites that have leaked credit card numbers and even login passwords, possibly compromising the integrity of user accounts.
In this case it appears that a botnet kept downloading pages from musicbrainz.org and driving the load on our servers up. We’ve been trying to block botnets ever since they’ve come into existence, but this is a laborious task that is never complete.
It appears that spammers used the botnet to scour the internet for private data such as emails to then send out lovely spam emails to all compromised users.
Summary
We would like to wholeheartedly apologize for this data leak. We take data privacy seriously and we aim to have high standards about privacy and data security. We find ourselves frustrated by the endless data leaks that happen on the Internet on a seemingly continuous basis and work hard to avoid committing these mistakes in our domain. However, we’re also human and we do make mistakes periodically. As explained above, we’re working to improve our systems and processes in order to prevent this from happening again.
We hope that you accept our most sincere apologies for this leak.
Robert Kaye, Michael Wiencek, Nicolás Tamargo and Yvan Rivierre