A Year After the CrowdStrike Outage: What Changed?

The Global IT Outage of 2024 and Its Lasting Impact

One year ago, a faulty software update from a cybersecurity firm triggered a massive global disruption. On July 19, 2024, CrowdStrike released an update to its Falcon software, which runs on Microsoft Windows systems to monitor for cyber threats. What was meant to be a routine update instead triggered the infamous “Blue Screen of Death” (BSOD) on approximately 8.5 million Windows devices. The event became one of the most significant IT outages in history, affecting hospitals, airlines, banks, and government offices worldwide.

The financial repercussions were severe, with estimated losses of around $10 billion (€8.59 billion) for CrowdStrike’s clients. Steve Sands, a fellow of the Chartered Institute for IT, said there were no clear warning signs leading up to the incident. Many organizations had not prepared for such a scenario, which left them vulnerable when the outage occurred.

Lessons Learned and Ongoing Challenges

Despite the lessons learned from the 2024 incident, recent events suggest that the cybersecurity community has not fully adapted. Eileen Haggerty, vice president of product and solutions at cloud security company NETSCOUT, noted that outages at banks and major service providers this year indicate that many companies are still unprepared for similar disruptions.

For example, a cloud outage in June disrupted Cloudflare, Google Cloud, and Spotify, while changes to Microsoft’s Authenticator app caused problems for Outlook and Gmail users in July. A software flaw at SentinelOne also disrupted critical networks. These incidents underscore the need for continuous vigilance and proactive measures.

Haggerty emphasized the importance of round-the-clock monitoring of IT environments. She recommended that IT teams run synthetic tests, which simulate real user traffic so that problems surface before critical functions fail. These tests can reveal potential issues early, allowing companies to address them before they escalate.
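As a rough illustration of what such a synthetic test can look like, the Python sketch below runs a scripted transaction on demand and judges it on both correctness and response time. It is a minimal example under stated assumptions, not a description of any vendor's product: the names `run_synthetic_check`, `CheckResult`, `login_probe`, and `payment_probe` are hypothetical, and the probes are stand-ins for real scripted requests against a service.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    latency_ms: float
    detail: str = ""


def run_synthetic_check(
    name: str, probe: Callable[[], bool], max_latency_ms: float
) -> CheckResult:
    """Run one scripted transaction and judge it on correctness and latency."""
    start = time.perf_counter()
    try:
        passed = probe()
        detail = "" if passed else "probe returned an unexpected result"
    except Exception as exc:
        # A probe that crashes counts as a failed check, not a failed monitor.
        passed = False
        detail = f"probe raised {type(exc).__name__}: {exc}"
    latency_ms = (time.perf_counter() - start) * 1000.0

    # A correct but slow response is still treated as a failure.
    if passed and latency_ms > max_latency_ms:
        passed = False
        detail = f"too slow: {latency_ms:.0f} ms > {max_latency_ms:.0f} ms"
    return CheckResult(name, passed, latency_ms, detail)


# Hypothetical probes standing in for real scripted transactions.
def login_probe() -> bool:
    return True


def payment_probe() -> bool:
    raise ConnectionError("upstream gateway unreachable")


if __name__ == "__main__":
    for name, probe in [("login", login_probe), ("payment", payment_probe)]:
        result = run_synthetic_check(name, probe, max_latency_ms=500.0)
        status = "OK" if result.ok else "FAIL"
        print(f"{status:4} {result.name:8} {result.latency_ms:7.1f} ms  {result.detail}")
```

In practice a scheduler would run checks like these around the clock and raise an alert on the first failure, which is what lets a team see a broken dependency before real users do.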

Microsoft acknowledged that synthetic monitoring is not foolproof but said it can improve response times once an issue is identified. Haggerty also suggested that, after an outage, companies create detailed repositories of information about the incident. These records should include plans for resilience and recovery, as well as an evaluation of dependencies on external companies.

Building Resilience: A Long-Term Approach

Sands stressed that building resilience should be a priority from the outset. It is difficult to implement after the fact, he said. While many companies have updated their incident response plans, Sands believes that the long-term impact of the outage may have been forgotten by some.

Nathalie Devillier, an expert at the EU’s European Cybersecurity Competence Centre, previously called for European cloud and IT security providers to be based within the continent. She argued that relying on foreign technology solutions could create vulnerabilities, as seen in the 2024 incident.

CrowdStrike’s Response and Future Measures

In the aftermath of the outage, CrowdStrike took several steps to prevent future disruptions. The company introduced a self-recovery mode that detects crash loops and automatically moves affected systems into safe mode. It also developed a new interface that gives customers more flexibility in testing system updates. For instance, customers can now set different deployment schedules for test systems and critical infrastructure, avoiding simultaneous updates that could cause instability.
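The idea behind crash-loop detection can be sketched in a few lines of Python. This is an illustrative guess at the general technique, not CrowdStrike's implementation: the class name `CrashLoopDetector`, the threshold of three crashes, and the ten-minute window are all assumptions chosen for the example.

```python
from collections import deque


class CrashLoopDetector:
    """Flag a crash loop when too many crashes fall inside a sliding time window."""

    def __init__(self, max_crashes: int = 3, window_seconds: float = 600.0) -> None:
        self.max_crashes = max_crashes        # assumed threshold
        self.window_seconds = window_seconds  # assumed window length
        self._crashes: deque = deque()

    def record_crash(self, timestamp: float) -> bool:
        """Record a crash; return True if the system should start in safe mode."""
        self._crashes.append(timestamp)
        # Forget crashes that are older than the window.
        while self._crashes and timestamp - self._crashes[0] > self.window_seconds:
            self._crashes.popleft()
        return len(self._crashes) >= self.max_crashes


if __name__ == "__main__":
    detector = CrashLoopDetector()
    for t in (0.0, 120.0, 240.0):  # three crashes within four minutes
        if detector.record_crash(t):
            print(f"crash loop detected at t={t:.0f}s, switching to safe mode")
```

Isolated crashes that are hours apart never reach the threshold, so only a machine that fails repeatedly right after booting is diverted to safe mode.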

Additionally, a content pinning feature lets customers lock specific versions of their content and choose when and how updates are applied. CrowdStrike has also established a Digital Operations Center, which it says will provide deeper visibility and faster response times for the millions of computers using its technology globally.
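To make the combination of staggered schedules and version pinning concrete, here is a small Python sketch of how an update policy of this kind might be evaluated per group of machines. Everything in it is hypothetical: the `HostGroup` fields, the `version_to_apply` function, the integer version numbers, and the delay values are assumptions for illustration and do not reflect the vendor's actual interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostGroup:
    name: str
    delay_days: int                       # how long to wait after a release
    pinned_version: Optional[int] = None  # if set, stay on exactly this version


def version_to_apply(
    group: HostGroup, current: int, latest: int, days_since_release: int
) -> int:
    """Decide which content version a group of machines should run today."""
    if group.pinned_version is not None:
        return group.pinned_version       # pinning overrides any schedule
    if days_since_release >= group.delay_days:
        return latest                     # the waiting period has passed
    return current                        # hold back until the delay expires


if __name__ == "__main__":
    groups = [
        HostGroup("test-lab", delay_days=0),
        HostGroup("critical-infra", delay_days=7),
        HostGroup("legacy-pos", delay_days=0, pinned_version=41),
    ]
    for group in groups:
        chosen = version_to_apply(group, current=41, latest=42, days_since_release=2)
        print(f"{group.name}: version {chosen}")
```

With these example values, the test group takes the new version immediately while critical infrastructure waits a week, so a faulty update would be caught on machines where a crash is tolerable.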

The company conducts regular reviews of its code, quality processes, and operational procedures. CEO George Kurtz said that while the 2024 incident was a defining moment, it was the subsequent actions that truly mattered. He stated that CrowdStrike is now “grounded in resilience, transparency, and relentless execution.”

The Road Ahead

Despite these improvements, Sands believes it may be challenging to completely avoid another large-scale outage. Computers and networks are inherently complex, with numerous dependencies. However, he remains optimistic that resilience can be improved through better architecture, design, and preparation for detection, response, and recovery. As the digital landscape continues to evolve, ongoing vigilance and adaptability will be crucial for preventing future disruptions.