AWS cloud services back after widespread outage hits SA

Popular internet services, ranging from streaming platforms to messaging services to banking, were offline for hours.


Amazon says its systems are back online after connectivity issues, but reports of problems with its cloud computing services unit Amazon Web Services (AWS) continue.

Popular internet services ranging from streaming platforms to messaging services to banking were offline for hours on Monday due to an outage in Amazon’s crucial cloud network, illustrating the extent to which internet life depends on the tech titan.

South Africa

The disruption affected streaming platforms, including Amazon’s Prime Video service and Disney+, as well as Perplexity AI, the Fortnite game, Airbnb, Roblox, Snapchat and Duolingo.

Downdetector showed that in South Africa, users of AWS, Standard Bank, Canva and Zoom also reported outages.

Back online

Amazon said on a status page that the system at issue was back to “pre-event levels” and that it expected it would take time to work through the data backlog caused by the problem.

Reports of problems with AWS plummeted on Downdetector but did not disappear entirely.

A huge spike in disruption logged at Downdetector early on Monday was followed by an even bigger jump some nine hours later, with the internet trouble tracker posting that it had received more than 11 million reports in total.

Update

In an update, Amazon said “mitigations were applied to resolve launch failures”, linking a “load balancer health” issue to the problem at Amazon Web Services (AWS), according to AFP.

Its maintenance site said engineers scrambled to fix a DNS issue once they became aware of “increased error rates” hitting multiple services. The issue was resolved, but it caused a huge backlog of stalled requests that had to be worked through.

Outage

More than 10 hours later, AWS was still working to get the cloud computing system running smoothly.

The outage was the largest internet disruption since last year’s CrowdStrike malfunction hobbled technology systems in hospitals, banks and airports, highlighting the vulnerability of the world’s interconnected technologies.

Earlier, AWS said the root cause of the outage was an underlying subsystem that monitors the health of its network load balancers, which are used to distribute traffic across several servers.

AWS handles nearly a third of the planet’s cloud infrastructure market, powering millions of apps and websites around the world.
