So what is CrowdStrike, and how did it trigger one of the largest global information technology (I.T.) outages in history? CrowdStrike is a United States cybersecurity firm that detects and blocks hacking threats. Its main product, Falcon, is a cloud-based endpoint protection platform that uses artificial intelligence to detect intrusions across networks and endpoints. Endpoints are physical devices that connect to a network, such as mobile phones and desktop computers. The platform is used by many Fortune 500 companies because it can detect advanced malware and stop attacks before they cause harm.
On July 18, 2024, a faulty CrowdStrike update caused a major I.T. outage whose effects were widely felt the following day. It crashed Microsoft Windows computers that were running CrowdStrike's sensor and received the update. Within hours, computers supporting hospital, business, technology and airport operations had ground to a halt.
The crash was caused by a logic flaw in a sensor configuration update. CrowdStrike updates its sensor regularly and frequently so that it can protect customers against new threats. This update was intended to improve how the sensor evaluates interprocess communication on Microsoft Windows, but a logic error went undetected before release and caused both the sensor and Windows to crash. By the time CrowdStrike noticed the error and fixed the flaw, many systems had already received the update. The operating system is responsible for low-level functions such as file systems and memory management, and the CrowdStrike sensor is tightly integrated into the Windows core so that it can monitor activity at that level. Because of this deep integration, the sensor's failure brought down Windows itself and triggered the “blue screen of death” (BSOD).
According to Microsoft, about 8.5 million Windows devices were affected worldwide, fewer than 1% of all Windows machines. Although that share is small, many of the affected systems supported critical operations, including those of airlines and airports, public transportation, healthcare, financial services and media organizations.
Thousands of flights were grounded, and more than 10,000 flights worldwide were delayed or canceled. In the U.S., the most heavily affected airlines were Delta Air Lines, United Airlines and American Airlines. Baggage systems were also disrupted, leaving passengers scrambling to rebook flights and track down their belongings.
Public transportation was also affected in cities including Chicago, Minneapolis, New York City and Washington, D.C., where large crowds of commuters tried to board trains while scheduling systems were down.
The outage severely disrupted appointment systems at hospitals and healthcare clinics around the world, causing delays and cancellations. According to USA Today, several states, including Alaska and Indiana, also reported that 911 emergency services were unable to respond to emergency calls. Financial institutions and online banking services were affected as well, with some customers reporting that expected payments and checks had not arrived. Even broadcasters such as Sky News were taken off the air.
CrowdStrike identified the problem and deployed a fix within about an hour and a half, but recovery for affected businesses and organizations was complex and slow. A BSOD is often associated with a damaged, unfixable device, yet in most cases it can be resolved simply by restarting the computer. This time, however, the faulty update caused Windows to crash again during startup, so a standard reboot was not an option. I.T. administrators had to manually boot each affected system into Safe Mode or the Windows Recovery Environment and delete the faulty update. This process is both time- and labor-intensive, especially for organizations with many affected systems.
It is estimated that organizations could need several months to fully recover all of their systems, and the outage raises a broader question: how can we prepare for a crisis like this in the future? According to TechTarget, the best preparations are testing all updates before deploying them to production systems, developing manual workarounds and preparing disaster recovery plans. The fact that a single CrowdStrike update could disrupt so much of the world shows how heavily society relies on modern technology and highlights the broader risk of large-scale technological outages.