CenturyLink suffered a widespread network outage across the U.S. Sunday morning, which had a ripple effect across sites including Cloudflare, Hulu and Amazon.
ThousandEyes, a monitoring company that Cisco is in the process of buying, said in an email to FierceTelecom Sunday afternoon that it detected a large-scale outage on CenturyLink's Level 3 backbone starting at 6 a.m.
"Level 3 is used as a transit provider for many app providers, so it had a significant cascading impact on global connectivity to thousands of services," ThousandEyes said in the email.
CenturyLink's network provides core IP, voice, video, and content delivery for a large number of carriers across North America, Latin America, Europe, and parts of Asia. CenturyLink said the IP outage was fixed just after 11 a.m. and that all of the impacted services had been restored.
“Today we saw a widespread internet outage online that impacted multiple providers,” Cloudflare CTO John Graham-Cumming said in a statement. “This was not a Cloudflare-specific outage. Level 3/CenturyLink was responsible for an outage that affected many internet services, including Cloudflare. Cloudflare’s automated systems detected the problem and routed around them, but the extent of the problem required manual intervention as well.”
Cloudflare provided a detailed account of CenturyLink's outage in a Sunday blog post, which said that traffic dropped to "near-zero during the incident." Cloudflare saw a 3.5% drop in global traffic during the outage, nearly all of it due to a nearly complete outage of CenturyLink’s ISP service across the United States, which would indicate that the outage was among the largest in some time.
While it worked to fix the outage, CenturyLink/Level 3 requested that other backbone providers disable their peering with its backbone and ignore traffic coming in from its network, which led to a loss of connectivity for customers.
According to a CenturyLink status page, the outage originated from CenturyLink's CA3 data center in Mississauga, a city in Ontario, Canada.
Matthew Prince, co-founder and CEO of Cloudflare, said in his blog that there were a significant number of BGP updates on CenturyLink's network Sunday morning. Prince speculated that a Flowspec update by CenturyLink was the root cause of the outage.
"These updates show the instability of BGP routes inside the CenturyLink/Level(3) backbone," Prince said. "The question is what would have caused this instability. The CenturyLink/Level(3) status update offers some hints and points at a Flowspec update as the root cause.
"So what is Flowspec? Flowspec is an extension to BGP, which allows firewall rules to be easily distributed across a network, or even between networks, using BGP. Flowspec is a powerful tool. It allows you to efficiently push rules across an entire network almost instantly. It is great when you are trying to quickly respond to something like an attack, but it can be dangerous if you make a mistake."
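To make Prince's description concrete, here is a minimal Python sketch of the idea behind Flowspec: a single match-and-action rule is pushed to every router at once, so an overly broad rule takes effect everywhere at once, too. This is a toy model under stated assumptions, not real BGP and not CenturyLink's configuration; the names FlowRule, Router and distribute, the router names, and the example addresses (taken from documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical, simplified stand-in for a Flowspec rule."""
    dst_prefix: str            # destination prefix to match, e.g. "203.0.113.7/32"
    protocol: Optional[str]    # "tcp", "udp", or None to match any protocol
    dst_port: Optional[int]    # destination port, or None to match any port
    action: str                # "discard" or "accept"

    def matches(self, dst_ip: str, protocol: str, dst_port: int) -> bool:
        if ip_address(dst_ip) not in ip_network(self.dst_prefix):
            return False
        if self.protocol is not None and self.protocol != protocol:
            return False
        if self.dst_port is not None and self.dst_port != dst_port:
            return False
        return True


class Router:
    """Toy router that applies installed rules in order; default is accept."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rules: List[FlowRule] = []

    def install(self, rule: FlowRule) -> None:
        self.rules.append(rule)

    def forward(self, dst_ip: str, protocol: str, dst_port: int) -> str:
        for rule in self.rules:
            if rule.matches(dst_ip, protocol, dst_port):
                return rule.action
        return "accept"


def distribute(rule: FlowRule, routers: List[Router]) -> None:
    """Stand-in for BGP propagation: every router receives the same rule."""
    for router in routers:
        router.install(rule)


# A narrow rule, as might be used against an attack on one host.
routers = [Router("edge-1"), Router("edge-2"), Router("core-1")]
distribute(FlowRule("203.0.113.7/32", "udp", 53, "discard"), routers)
print([r.forward("203.0.113.7", "udp", 53) for r in routers])
print([r.forward("198.51.100.1", "tcp", 443) for r in routers])

# A mistaken, overly broad rule reaches every router just as quickly.
distribute(FlowRule("0.0.0.0/0", None, None, "discard"), routers)
print([r.forward("198.51.100.1", "tcp", 443) for r in routers])
```

In this sketch the narrow rule drops only the targeted traffic, while the broad rule causes every router to discard all IPv4 traffic. Real Flowspec rules support many more match fields and actions, and the sketch makes no claim about what rule, if any, was involved in Sunday's outage.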
CenturyLink didn't provide any additional details on the IP outage Sunday afternoon.
"On August 30, customers in several global markets were impacted by an IP outage across the network," a CenturyLink spokesman said in an email Sunday afternoon to FierceTelecom. "We can now confirm that all services have been restored."
CenturyLink, which bought Level 3 three years ago for $34 billion, said the IP outage affected its content delivery networks (CDNs), according to a story by CNN.
The network outage also caused IPv4 problems in Europe. "IPv4 peering with CenturyLink AS3356 has now been re-enabled globally as they report the incident has been resolved," according to a Twitter post by Telia Carrier, which has one of the world's largest backbones.