Uncaught bug in Fastly software triggered global service outage, company says
Tuesday’s disruption of numerous popular websites that rely on US cloud computing firm Fastly has been traced to a software bug, which was introduced in a recent update and later triggered by a customer’s configuration change.
The websites of multiple news outlets, the British government, and services like Amazon and Spotify were among those affected by the hour-long outage on Tuesday. Fastly, whose servers were the source of the problem, says it traced the issue to a specific software bug, which its quality control engineers had failed to identify and fix ahead of a May update.
The bug was triggered by a customer, who was not identified. The user made a “valid” configuration change on Tuesday, starting a chain reaction that “caused 85 percent of our network to return errors,” Fastly Vice President Nick Rockwell said in a blog post.
“Even though there were specific conditions that triggered this outage, we should have anticipated it. We provide mission critical services, and we treat any action that can cause service issues with the utmost sensitivity and priority,” the executive said.
“We apologize to our customers and those who rely on them for the outage and sincerely thank the community for its support,” he added.
Fastly said it noticed the issue a minute after it showed up, and managed to restore 95% of its network within 49 minutes. A permanent software patch fixing the problem was ready for deployment around five hours later. Rockwell promised to conduct a full analysis of the situation and figure out “why we didn’t detect the bug during our software quality assurance and testing processes.”
The outage caused by the Fastly glitch was one of several large-scale incidents of its kind in recent years. In February 2017, a human error made by an Amazon employee during a debugging process led to a cascading server shutdown and disrupted its AWS services for hours. In July 2020, a large portion of Cloudflare services went down for about 30 minutes due to a configuration error in a segment of the backbone network connecting Newark and Chicago.