Amazon Just Broke the Internet

A little after 1pm on Tuesday, countless websites and web services ground to a halt following a reported widespread outage of Amazon Web Services (AWS).

Everything from Slack to Quora to your own beloved Gizmodo saw major disruptions. Before Down Detector itself went down, the site showed outages on the tier 1 network Level3 in most major population centers in the United States.

It’s unclear what’s causing the problem, but AWS did say on its website that it’s experiencing “increased error rates.” More specifically, the company said in an alert:

We’ve identified the issue as high error rates with S3 in US-EAST-1, which is also impacting applications and services dependent on S3. We are actively working on remediating the issue.

Amazon S3 refers to the company’s Simple Storage Service that helps countless websites stay up and running. Because so many services depend on Amazon’s cloud storage, a single outage can cripple America’s internet in a matter of minutes.

https://gizmodo.com/how-one-little-amazon-error-can-destroy-the-internet-1792828399

The situation undeniably draws comparison to the DDoS attack that affected Dyn’s systems late last year, bringing most of America’s internet to its knees. While it’s unclear if hackers are behind this AWS outage, lots of work days are being ruined for people who depend on the internet to do their jobs.

We’ve reached out to Amazon for more details about the outage and will update this post when we hear back.

Update 3:37 PM: Looks like Amazon is making progress!

For S3, we believe we understand root cause and are working hard at repairing. Future updates across all services will be on dashboard.

— Amazon Web Services (@awscloud) February 28, 2017

Update 3:45 PM: We now know that the widespread outage was caused by a failure at AWS’ Northern Virginia facility. It’s AWS’ oldest farm and also the most commonly borked. The Atlantic did a nice story on it earlier this year.

Our own services here were affected by the outage, and others affected that we’ve seen include Slack, Trello, JWPlayer, SocialFlow, Chartbeat, and Imgur. What’s not working for you?

Update 5:12 PM: The latest from Amazon:

S3 object retrieval, listing and deletion are fully recovered now. We are still working to recover normal operations for adding new objects to S3.

That sounds like more progress, but the catastrophe still isn’t quite over.

Update 6:14 PM: And, finally, Amazon has given the all clear:

As of 1:49 PM PST, we are fully recovered for operations for adding new objects in S3, which was our last operation showing a high error rate. The Amazon S3 service is operating normally.

The cause of the outage remains unknown. Nevertheless, we’ll remember this Mardi Gras for years to come.