It’s not unusual for websites to experience an outage from time to time, and there are varying reasons why it can happen. It is uncommon, though, for a Facebook outage to affect pretty much every Facebook user around the globe. And it wasn’t just Facebook that created the online frenzy yesterday: Facebook’s other properties, Instagram and WhatsApp, were also affected. So what happened?
Facebook is a large company that generally has its act together when it comes to dealing with disturbances in its operations. Redundancy is a big part of making a Facebook outage nearly impossible. But yesterday was a different story, as Facebook scrambled for about 8 hours to restore its services to normal operation.
There are also reports that this Facebook outage wiped billions off Mark Zuckerberg’s net worth and affected his position on the list of the world’s richest people; I’m sure we’re all sad about that.
There was plenty of speculation, and no shortage of conspiracy theories, about what caused this Facebook outage. Now that it’s over, the company and others have diagnosed the problem. In a nutshell, an update to Facebook’s BGP (Border Gateway Protocol) configuration went wrong, which in turn broke its DNS (Domain Name System).
Now, we realize many of our readers are not familiar with this tech-speak, so we’re going to try to keep it simple and link to other sources if you want to go more in-depth on the Facebook outage.
What is DNS?: The Domain Name System (DNS) is the phonebook of the Internet. Humans access information online through domain names, like nytimes.com or espn.com. Web browsers interact through Internet Protocol (IP) addresses. DNS translates domain names to IP addresses so browsers can load Internet resources. (Cloudflare)
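If you like to see ideas in code, here is a minimal sketch of the phonebook idea in Python. The names and addresses in `PHONEBOOK` are made-up placeholders (the addresses come from ranges reserved for documentation), and `toy_lookup` is our own illustration, not how any real DNS server works. The `real_lookup` function uses Python's standard `socket.getaddrinfo` call to ask your computer's actual resolver.

```python
import socket

# Toy "phonebook": hypothetical names mapped to addresses from
# documentation-only ranges. None of these are real servers.
PHONEBOOK = {
    "news.example": "192.0.2.10",
    "sports.example": "198.51.100.20",
}


def toy_lookup(name):
    """Return the address for a name, or None if the phonebook has no entry."""
    # Domain names are case-insensitive and may carry a trailing dot.
    return PHONEBOOK.get(name.lower().rstrip("."))


def real_lookup(name):
    """Ask the operating system's resolver. Needs a working network."""
    try:
        infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return None  # the name could not be translated into an address
    # Each result ends with a socket address whose first field is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `toy_lookup("news.example")` returns the placeholder address, while a name that is missing from the phonebook returns `None`. That second case is roughly what a browser runs into when DNS cannot answer.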
What is BGP?: BGP stands for Border Gateway Protocol. It’s a mechanism to exchange routing information between autonomous systems (AS) on the Internet. The big routers that make the Internet work have huge, constantly updated lists of the possible routes that can be used to deliver every network packet to their final destinations. Without BGP, the Internet routers wouldn’t know what to do, and the Internet wouldn’t work.
The Internet is literally a network of networks, and it’s bound together by BGP. BGP allows one network (say Facebook) to advertise its presence to other networks that form the Internet. As we write Facebook is not advertising its presence, ISPs and other networks can’t find Facebook’s network and so it is unavailable.
The individual networks each have an ASN: an Autonomous System Number. An Autonomous System (AS) is an individual network with a unified internal routing policy. An AS can originate prefixes (say that they control a group of IP addresses), as well as transit prefixes (say they know how to reach specific groups of IP addresses). (Cloudflare)
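For the curious, the advertising idea can be sketched as a tiny routing table in Python. This is a toy under loud assumptions: `ToyRouter` is our own invention, the prefix and AS numbers are placeholders taken from ranges reserved for documentation, and real BGP involves sessions between routers, path attributes, and policy that are all left out here.

```python
import ipaddress


class ToyRouter:
    """Keeps a table of advertised prefixes and the AS that announced each."""

    def __init__(self):
        self.routes = {}  # ip_network -> origin AS number

    def advertise(self, prefix, asn):
        """Record that an AS says it controls this group of addresses."""
        self.routes[ipaddress.ip_network(prefix)] = asn

    def withdraw(self, prefix):
        """Forget a prefix, as if the network stopped advertising it."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the AS for an address, preferring the most specific prefix."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no known route: the destination is unreachable
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


router = ToyRouter()
router.advertise("203.0.113.0/24", 64500)  # hypothetical network announces itself
found_before = router.lookup("203.0.113.53")
router.withdraw("203.0.113.0/24")          # the announcement disappears
found_after = router.lookup("203.0.113.53")
```

Here `found_before` is the placeholder AS number and `found_after` is `None`: once a prefix is no longer advertised, other networks have no way to find a path to it.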
So, the Facebook outage was caused by configuration changes on Facebook’s backbone routers, which interrupted the communication channels used to deliver the site to the Internet. Those changes affected its BGP routing, which in turn affected DNS and basically broke it all.
Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt. (Facebook)
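To see why a routing problem can end up looking like a DNS problem, here is a rough Python sketch of the chain of events. Everything in it is hypothetical: the name `social.example`, the addresses, and the two-step `resolve` function are our simplification, not Facebook's actual setup. The point is only that a name lookup first has to reach a name server, and reaching it depends on a route being advertised.

```python
import ipaddress

# All names, addresses, and prefixes below are hypothetical placeholders.
NAME_SERVER = {"social.example": "203.0.113.53"}  # where to ask about the name
RECORDS = {"social.example": "203.0.113.80"}      # what that server would answer


def can_reach(address, advertised_prefixes):
    """True if some advertised prefix covers the address."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(p) for p in advertised_prefixes)


def resolve(name, advertised_prefixes):
    """Step 1: reach the name server. Step 2: get the record from it."""
    server = NAME_SERVER.get(name)
    if server is None or not can_reach(server, advertised_prefixes):
        return None  # the lookup fails before the site is ever contacted
    return RECORDS.get(name)


normal_day = resolve("social.example", ["203.0.113.0/24"])
outage_day = resolve("social.example", [])  # no prefixes advertised
```

With the prefix advertised, `normal_day` holds the placeholder address. With nothing advertised, `outage_day` is `None`, even though the records themselves never changed.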
The full story is far more complex than how we’ve laid it out here, but the gist is easy to understand. There are plenty of resources, such as the Cloudflare links, that can walk you through the technical details of the entire Facebook outage. Feel free to hit those up if you want more information.
What do you think of this Facebook outage? Please share your thoughts on any of the social media pages listed below. You can also comment on our MeWe page by joining the MeWe social network.
Last Updated on October 5, 2021.