In a blog post about the global outage, Facebook's Santosh Janardhan wrote that Facebook, Instagram, and WhatsApp went dark "not by malicious activity but because of an error of our own making."
The problem arose as engineers were working on Facebook's global backbone network: the computers, routers, and software housed in data centers around the globe, along with the fiber-optic cables that connect them.
During one of these routine maintenance jobs, Janardhan said Tuesday, engineers issued a command intended to assess the availability of global backbone capacity. The command unintentionally took down all connections in the backbone network, effectively disconnecting Facebook's data centers worldwide.
Facebook's systems are designed to catch mistakes like this, but Janardhan explained that a bug in an internal audit tool prevented it from blocking the command.
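Janardhan did not publish details of the tooling, but the failure mode can be pictured as a pre-flight gate that is supposed to reject destructive commands before they run. Below is a minimal sketch of that idea; names such as is_destructive and run_audited are hypothetical, not Facebook's actual code.

```python
class AuditRejection(Exception):
    """Raised when the audit gate refuses to run a command."""

def is_destructive(command: str) -> bool:
    # Hypothetical check: flag commands that would tear down backbone links.
    risky_keywords = ("shutdown", "withdraw", "disable")
    return any(word in command for word in risky_keywords)

def run_audited(command: str, execute) -> None:
    """Execute `command` only if the audit check approves it."""
    if is_destructive(command):
        raise AuditRejection(f"blocked potentially destructive command: {command!r}")
    execute(command)
```

Per Janardhan's account, a bug in Facebook's equivalent of this kind of gate let the capacity-assessment command through, severing every backbone connection at once.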
That change caused a second problem, one that made it impossible for users to reach Facebook's servers even though the servers themselves remained operational.
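Facebook's own postmortem attributed this second failure to its DNS servers, which withdraw their BGP route announcements whenever they lose contact with the data centers; when the backbone went down, every DNS server failed that health check at the same time and took itself off the internet. A simplified sketch of that behavior follows, using illustrative stand-ins (backbone_reachable, announce_routes, withdraw_routes) rather than Facebook's real interfaces.

```python
import time

def backbone_reachable() -> bool:
    # Stub: in production this would probe internal data-center endpoints.
    return True

def announce_routes() -> None:
    print("BGP: announcing DNS server routes")   # stand-in for a BGP speaker

def withdraw_routes() -> None:
    print("BGP: withdrawing DNS server routes")  # takes this node off the internet

def dns_health_loop(interval_s: float = 5.0) -> None:
    """Keep BGP announcements up only while the backbone looks healthy."""
    while True:
        if backbone_reachable():
            announce_routes()
        else:
            # Meant to pull one unhealthy edge node out of rotation; during
            # the outage every node hit this branch at once, so the routes
            # to Facebook's DNS disappeared globally.
            withdraw_routes()
        time.sleep(interval_s)
```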
Janardhan said engineers were dispatched to fix the problem on site, but that took time because of the extra layers of physical security: the data centers are hard to get into, and once inside, the hardware and routers are difficult to modify even with physical access.
After connectivity was restored, services were gradually brought back to prevent traffic surges that could lead to more crashes.
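A common way to do this kind of staged restoration is to ramp admitted traffic in small increments, pausing after each step to confirm systems stay healthy before opening the next. Here is a minimal sketch under that assumption; set_admit_fraction and healthy are placeholders, since Facebook has not said exactly how it throttled the recovery.

```python
import time

def staged_restore(set_admit_fraction, healthy, step: float = 0.1,
                   settle_s: float = 60.0) -> None:
    """Ramp admitted traffic from 0% to 100%, backing off if health degrades."""
    fraction = 0.0
    while fraction < 1.0:
        fraction = min(1.0, fraction + step)
        set_admit_fraction(fraction)  # e.g. a load-balancer weight
        time.sleep(settle_s)          # let caches warm and queues drain
        if not healthy():
            # Retreat one step instead of letting the surge cause new crashes.
            fraction = max(0.0, fraction - step)
            set_admit_fraction(fraction)
```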
Even if the faulty maintenance update that brought down Facebook's backbone network was an "unforeseen exception," the company could have avoided the scenario in which its servers went completely offline, cutting off access to the very tools needed to fix the problem, said Angelique Medina of Cisco Systems' ThousandEyes, which monitors internet outages.
The big question, Medina said, is why so many internal tools and systems shared a single point of failure. Facebook's services would still have gone offline when the network failed, but engineers could have fixed the problem much sooner had they kept access to their internal network.