A “relatively small addition of capacity” to the Amazon Kinesis real-time data processing service triggered a widespread Amazon Web Services outage last week, the company said in a detailed technical analysis over the weekend.
The addition of the new capacity “caused all of the servers in the fleet to exceed the maximum number of threads allowed by an operating system configuration,” Amazon said, describing a cascade of resulting problems that took down thousands of sites and services.
The outage impacted online services from big tech companies such as Adobe, Roku, Twilio, Flickr, and Autodesk, among others. New York City’s Metropolitan Transportation Authority and the Washington Post, which is owned by Amazon CEO Jeff Bezos, were also affected.
It was an especially ill-timed incident for Amazon, coming just days before its annual AWS re:Invent cloud conference. Reliability has