The 8 worst outages of 2021: AWS, Google Cloud, Fastly, and more

For cloud-powered apps and websites, which is basically all of them, 2021 was an embarrassing year.

Cloud service disruptions are nothing new. But the shift to working from home in 2020 exposed a large number of vulnerabilities, as carriers, cable and fiber companies, and every popular app under the sun experienced temporary, catastrophic crashes. It put an unprecedented burden on the cloud infrastructure that supports your favorite streaming and productivity sites, and these outages were the inevitable result.

You might have hoped for a significant improvement in 2021. Instead, the year proved that the internet is a house of cards that collapses if the wrong foundational piece gives way. Whether out of frugality or poor planning, many websites put all of their data and traffic eggs in one cloud basket; a single node failure can take down some of the highest-traffic sites, which we would have hoped had better contingency plans.

This year, we saw our favorite messaging apps, smart homes, gaming networks, productivity suites, and social media sites crash at one point or another. On top of that, the Amazon Web Services (AWS) and Facebook outages proved how much of our daily lives depends on the cloud, from smart home tech to package deliveries.

Looking back at the worst outages of 2021, we can only hope that things improve in 2022. But unless cloud infrastructure companies and content delivery networks (CDNs) change how they do things, and unless companies start adding offline functionality, there is no reason to assume that cloud-dependent tech will become any more reliable.

1. AWS outage causes deliveries, cameras and cat feeders to stop

The AWS outage this past December is probably still fresh in your memory. Amazon Web Services reportedly runs about 33% of cloud infrastructure services, so when AWS went down on December 7, it threatened to take about a third of cloud services with it.

According to the AWS team, the internal AWS network used for monitoring, internal DNS, and authorization services somehow triggered “a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks.” Because this internal network ties into AWS servers around the globe, sites worldwide saw traffic delays or went down entirely for approximately 7 hours before engineers repaired the internal network.

In the middle of the holiday shopping season, the routing and address apps used by Amazon delivery drivers malfunctioned, preventing them from completing deliveries. Shoppers were also unable to place orders on Amazon, which means the company missed out on almost a full day of revenue. Amazon’s first-party services (Alexa, Ring cameras, Prime Video, and Music) all went down, leaving smart video doorbells and baby monitors temporarily worthless. Popular third-party apps such as Disney+, Venmo, and iRobot also crashed because of their choice of cloud provider.

According to CNBC, the AWS outage even affected final exams at universities, because some exam services rely on the cloud to work. Even some “smart” automatic cat feeders stopped feeding cats that day.

After the outage, Android Central readers said they were more wary of cloud-based smart home tech than before. Experts believe Amazon needs to build offline controls into its smart home tech, but they also think this is unlikely to happen, because the cloud lets the company sell cheap, underpowered devices that cannot operate without it.

2. The Metaverse falls apart

If we are talking about the most chaotic outage of 2021, we have to mention Facebook. Just before its name change to Meta, Facebook accidentally took down its own cloud services because of “configuration changes on the backbone routers that coordinate network traffic between our data centers,” which cascaded and shut down all of its online services. As a result, no one worldwide could access any Meta service, including the company’s own employees.

Although Meta’s cloud servers only power its own businesses such as Facebook, Instagram, and WhatsApp, the outage still affected other companies. Users could not get into any website that relies on Facebook login, and shopping sites and games that depend on Meta servers or tokens went down as well.

And, of course, the Facebook outage broke the company’s own cloud-dependent hardware. Because of the Facebook account requirement, Quest 2 owners could no longer access their game libraries, and Ray-Ban Stories smart glasses lost their smarts. We argued at the time that Facebook needs to add offline support to its tech in the future.

Above all, the 6-hour WhatsApp outage proved to be the company’s worst fiasco. For the millions of people who use the app as their main way of communicating with family, even part of a day without it is too much. Telegram reportedly gained 70 million new users after the outage. That does not necessarily mean WhatsApp lost that many users, but it must have seen a large exodus that it may never win back.

WhatsApp, Facebook, and Instagram suffered a similar outage in April 2021, but it lasted only 45 minutes.

3. Fastly disconnects

When something works, you don’t notice it. That is why so many people had never heard of the Fastly content delivery network (CDN) before its June outage, which took down some of the most popular websites.

CDNs cache content to speed up load times and reduce the bandwidth load on hosting servers, which is why so many companies rely on them. They keep copies of that content in locations around the world and serve it at high speed, so load times stay low no matter where the user lives.
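To make the caching idea concrete, here is a minimal sketch of a single edge location, assuming a simple time-to-live (TTL) policy. The names `EdgeCache`, `fetch_from_origin`, and `origin_hits` are made up for illustration and do not reflect how Fastly or any other CDN is actually implemented.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EdgeCache:
    """Toy model of one CDN edge location (hypothetical, not a vendor API)."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_seconds: float = 60.0):
        self._fetch = fetch_from_origin      # slow call to the hosting server
        self._ttl = ttl_seconds              # how long a cached copy stays fresh
        self._store: Dict[str, Tuple[float, str]] = {}
        self.origin_hits = 0                 # how often the origin was contacted

    def get(self, path: str, now: Optional[float] = None) -> str:
        # `now` can be injected so the behavior is easy to check without sleeping.
        now = time.monotonic() if now is None else now
        cached = self._store.get(path)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]                 # fresh copy: served from the edge
        body = self._fetch(path)             # missing or stale: go to the origin
        self.origin_hits += 1
        self._store[path] = (now, body)
        return body


def example_origin(path: str) -> str:
    """Stand-in for the real hosting server."""
    return f"<html>content for {path}</html>"


if __name__ == "__main__":
    edge = EdgeCache(example_origin, ttl_seconds=60.0)
    for t in (0.0, 10.0, 61.0):
        edge.get("/home", now=t)
    print("origin requests:", edge.origin_hits)
```

In this sketch, the second request is answered from the edge copy and only the expired third request goes back to the origin, which is where the bandwidth savings come from. It also hints at the downside: every request passes through the edge, so if the edge breaks, the site looks broken even when the origin is healthy.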

In Fastly’s case, though, a bad service configuration “triggered disruptions across our POPs globally,” which hurt the sites that rely on its edge computing. Specifically, websites such as Amazon, Twitter, Reddit, Google, CNN, The Guardian, and the New York Times all went down at the same time in early June. Fastly restored “95%” of service within 49 minutes, making this a widespread but relatively brief outage compared with the others on this list.

4. The PS5’s chaotic year included four PSN outages

If you managed to buy a PS5 this year, you probably had trouble accessing your library or playing multiplayer games at some point in 2021. Sony and the CDN provider Akamai Technologies dealt with outages several times throughout the year.

The longest and most serious PSN outage ran from late February into early March, leaving some PS5 and PS4 players intermittently unable to access their game libraries for several days.

Three more outages followed in the months after that, which suggests that Sony has underlying network issues to resolve. In each case, players around the world ran into maintenance-related error messages when they tried to reach online services, and the outages lasted anywhere from 1 to 5 hours.

Many of the best PS5 games require a constant online connection or revolve around multiplayer. If PSN goes down for days at a time again in 2022, it will certainly make Sony’s loyal fans unhappy.

5. Google can’t help its smart home customers

The first major outage of 2021 came in February, when Google Assistant suddenly lost its memory. If you tried to ask your Nest or Google Home speaker a question, you were told that “the device has not been set up,” despite all evidence to the contrary. That made it impossible to control the Google Home devices linked to your account, from smart lights to Nest security tech. The Google Assistant app on Android also had trouble answering questions.

The problem seemed to affect every Google Home user that night, and people turned to Reddit and support forums for help. Google did fix it the same night, a few hours after the problem became widely known, but it is not clear when it actually started.

6. Wink’s smart home winks out

Most of the worst outages in 2021 affected a wide range of sites for a relatively short time. The real prize for the worst outage of the year, however, goes to Wink hubs, which were down for 10 days. Because of their new reliance on cloud services, the hubs could no longer control Zigbee or Z-Wave products, making them almost worthless.

As an apology, Wink offered a 25% discount on its subscription cost. As far as we know, though, it never really explained the cause of the problem, saying only that it would optimize the Wink backend and its API once service was back up. Many customers took the outage as a sign that it was time to abandon Wink entirely.

7. The Android Exposure Notification System stops functioning

With contact tracing and COVID-19 exposure prevention, any delay in learning your status can lead to further spread and illness. So it was not a good look for Google when the NHS COVID-19 app failed because of a problem with Google’s back-end Android Exposure Notification System.

People who wanted to check their status were met with an indefinite “loading” screen. Google announced that it was investigating about 12 hours after the first error reports, and it then took another 5-6 hours to fix the bug. Add to that the creepy “phantom notification” glitch in 2020 (a false alert that the user had been exposed to COVID-19 would pop up, then vanish before it could be tapped), and by then people had plenty of reasons not to trust the app.

8. AWS outages return

After the major AWS outage on December 7, a second AWS outage hit on December 15, caused by problems at Amazon Web Services facilities in Oregon and Northern California. This one took out Twitch, DoorDash, Xbox Live, PSN, Ring, Disney+, and T-Mobile.

Then a third AWS outage on December 22 took down Fortnite, Hulu, Quora, Slack, and Imgur. In this case, a power outage at East Coast facilities caused the problems. That makes three outages in three weeks. The last two lasted only about an hour each, though that was certainly long enough to cause trouble.

Will outages decrease or increase in 2022?

These incidents highlight how fragile our current cloud-dependent systems are. Because so much of our internet usage is concentrated in a handful of apps and services, most of which run on a few major cloud infrastructure providers, a single crisis can cripple our productivity or render our expensive tech useless.

So can we hope for fewer outages next year?

To reduce outages, we need more investment in cloud infrastructure. The recent infrastructure bill allocates billions of dollars to improving high-speed rural broadband access and civilian cybersecurity, but most of the worst outages of 2021 came from corporate errors, not hostile actors. So we may have to count on (or pressure) companies to invest more in cloud infrastructure themselves.

Gartner currently predicts that companies will spend US$482 billion on cloud services in 2022, an increase of 21.7%. At the very least, that should be a step in the right direction.
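For a sense of scale, the two figures above imply a starting point for 2021. The short calculation below is only back-of-the-envelope arithmetic on the quoted numbers, not a separately sourced figure, and the function name `implied_baseline` is made up for illustration.

```python
def implied_baseline(new_total: float, growth_percent: float) -> float:
    """Return the prior-year total implied by a new total and its growth rate."""
    return new_total / (1 + growth_percent / 100)


# Figures quoted above: US$482 billion forecast for 2022, up 21.7%.
baseline_2021 = implied_baseline(482.0, 21.7)
increase = 482.0 - baseline_2021

if __name__ == "__main__":
    print(f"Implied 2021 spend: about US${baseline_2021:.0f} billion")
    print(f"Implied increase:   about US${increase:.0f} billion")
```

In other words, the forecast works out to roughly US$86 billion in additional cloud spending on top of an implied base of about US$396 billion.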

It’s important to note that many of the most serious outages were caused by a company’s internal monitoring network or came from a third-party CDN, not the main servers. A system designed to monitor and prevent outages can shut the whole system down under the wrong conditions, in which case human error has disproportionate consequences. And although CDNs are essential for delivering traffic as fast as possible, they do add another step where things can go wrong.

Investment does not matter if a single node, server, or data center can topple the whole system. To reduce major outages in 2022, we need companies to structure their data better so that backups can kick in quickly until the problem node is repaired. We are in much better shape than we were two years ago, but we still have a long way to go before outages become less of a fixture.
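The “backups kick in quickly” idea can be sketched as a simple failover loop: try the primary, and fall through to each backup until one answers. This is a minimal illustration under that assumption, not a description of how any provider mentioned here handles redundancy; `call_with_failover` and `AllBackendsFailed` are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllBackendsFailed(Exception):
    """Raised when neither the primary nor any backup could serve the request."""


def call_with_failover(backends: Sequence[Callable[[], T]]) -> T:
    """Try each backend in order and return the first successful result."""
    errors: List[Exception] = []
    for backend in backends:
        try:
            return backend()
        except Exception as exc:  # deliberately broad in this sketch
            errors.append(exc)    # remember the failure, move to the next backup
    raise AllBackendsFailed(errors)


def primary_region() -> str:
    """Stand-in for a failed primary node."""
    raise ConnectionError("primary region unreachable")


def backup_region() -> str:
    """Stand-in for a healthy backup."""
    return "served by backup region"


if __name__ == "__main__":
    print(call_with_failover([primary_region, backup_region]))
```

The catch, as this year’s outages showed, is that the backups only help if they do not share the failing piece: a second region that depends on the same internal network or the same CDN configuration fails right along with the first.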