Level 3 technician's misstep causes largest outage ever reported

Level 3 Communications' campus in Broomfield, Colo.
CenturyLink purchased Level 3 in 2016. (Level 3)

On Oct. 4, 2016, phone service on Level 3’s network was blocked for nearly an hour and a half across the nation. Level 3 shortly thereafter copped to “a configuration error,” but said little more publicly. The company got more specific with its customers, revealing that a Level 3 technician had made a clerical error. The specific mechanism has just been made public by the Federal Communications Commission.

Yesterday the FCC’s Public Safety and Homeland Security Bureau posted its report (PDF) on the Level 3 outage. The bureau administers the Network Outage Reporting System (NORS) and conducts investigations into service disruptions. Its summary of the error points to a decidedly pedestrian source:

“As part of its regular network maintenance practices, which involve network changes once or twice a day, a technician made changes to Level 3’s network management software, which manages soft switches and gateways. Specifically, the outage occurred while the technician was conducting routine anti-fraud operations in Level 3’s vendor-supplied network management software. The anti-fraud operations were intended to block calls originating from telephone numbers that are not native to Level 3’s network that are suspected of association with malicious activity. The technician left empty a field that would normally contain a target telephone number. The network management software interpreted the empty field as a 'wildcard,' meaning that the software understood the blank field as an instruction to block all calls, instead of as a null entry. This caused the switch to block calls from every number in Level 3’s non-native telephone number database.”
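The failure mode the FCC describes, a blank field treated as "match everything" rather than as a missing value, can be illustrated with a minimal sketch. This is not Level 3's or its vendor's code; the function names, the prefix-matching rule and the sample numbers are all assumptions made up for illustration.

```python
from typing import Iterable, List

# Made-up sample entries standing in for a non-native number database.
NON_NATIVE = ["3035550101", "2125550102", "4155550103"]


def naive_block(target: str, numbers: Iterable[str]) -> List[str]:
    """Hypothetical matcher: return the numbers that would be blocked."""
    # str.startswith("") is True for every string, so a blank target
    # silently behaves like a wildcard and selects the whole list.
    return [n for n in numbers if n.startswith(target)]


def guarded_block(target: str, numbers: Iterable[str]) -> List[str]:
    """Same lookup, but a blank field is rejected as a null entry."""
    cleaned = target.strip()
    if not cleaned:
        raise ValueError("target telephone number is required")
    return [n for n in numbers if n.startswith(cleaned)]


if __name__ == "__main__":
    print(naive_block("303", NON_NATIVE))  # one matching number
    print(naive_block("", NON_NATIVE))     # every number in the list
    try:
        guarded_block("", NON_NATIVE)
    except ValueError as err:
        print("rejected:", err)
```

In the guarded version, the blank field produces an error at entry time instead of a block on every number, which is the kind of input validation that would have stopped the change before it reached the switch.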

The vendor who supplied the network management software is not identified in the report. Cisco is a supplier (PDF) of network management systems to Level 3. 

Level 3 was aware it had a problem within four minutes, the FCC report said. The problem was difficult to diagnose, however, because no one at Level 3 was aware of the consequences of leaving that particular field empty, nor had anyone at the company previously seen the system behave the way it was behaving.

The outage affected approximately 29.4 million interconnected VoIP users and approximately 2.3 million wireless users. The full tally of calls that failed to go through exceeded 111 million. “This nationwide outage was the largest ever reported in NORS,” the FCC said.

The FCC report said Level 3 subsequently adopted measures to prevent a recurrence of the problem. Those measures, the FCC drily noted, are in accord with best practices that the Communications Security, Reliability and Interoperability Council had adopted five years earlier.

The Level 3 network disruption was on Oct. 4. On Oct. 21, the U.S. experienced what at the time was one of the worst disruptions of the internet ever, one that made many prominent websites grind nearly to a halt for hours, particularly on the East Coast but elsewhere as well. The cause was a series of distributed denial-of-service (DDoS) attacks, together the largest to that date, all in an onslaught against Dyn.

Some immediately accused Level 3 of being the culprit behind that outage as well. In fact, the company had nothing to do with the attack on Dyn, but the accusations did expose how Level 3’s reputation had been compromised.

But then, Level 3 hasn’t helped resuscitate its image much either. About a year later, on Nov. 7, 2017, Level 3 experienced yet another outage, this one of backhaul systems that caused a disruption in service to customers of Comcast, Charter, Cox Communications and Verizon, among other service providers.

CenturyLink announced its agreement to buy Level 3 at the end of October 2016.