CrowdStrike Blames Quality Control Bug for Update That Caused Global Windows Outage

Cybersecurity company promises to improve how it handles errors and future updates.
A Windows error message caused by the CrowdStrike software update is displayed on a screen in a bus shelter in Washington on July 22, 2024. Justin Sullivan/Getty Images
Bill Pan

CrowdStrike, the cybersecurity company at the center of massive global IT outages, blamed the meltdown on a bug in quality control software that allowed bad data in an update to be sent to millions of computers running Microsoft Windows.

About 8.5 million Windows machines across the globe crashed on July 19, forcing airports to ground flights, taking TV broadcasts off the air, and disrupting banks, hospitals, and the London Stock Exchange, among others. Some affected businesses, notably Delta Air Lines, are still struggling to recover.

CrowdStrike routinely sends out configuration updates for its Falcon Sensor product, a software suite that monitors and protects the user’s computer against threats and attacks.

Those updates are delivered in two different ways. One is called “Sensor Content,” which directly updates CrowdStrike’s Falcon Sensor and runs at the highest level of access to system resources. Separately, there is “Rapid Response Content” that updates how that sensor behaves to detect malware, allowing for fast response to changing threats.

However, a Rapid Response Content update that went out on the morning of July 19 included a broken file and slipped through CrowdStrike’s quality-control software.

In its post-incident review published on July 24, CrowdStrike stated, “Due to a bug in the Content Validator, one of the two [updates] passed validation despite containing problematic content data.”

The incident review further indicates that while CrowdStrike performs both automated and manual testing on Sensor Content, it places “trust in the checks performed in the Content Validator” for Rapid Response Content, a process that had run smoothly until this point.

Because CrowdStrike assumed the Rapid Response Content rollout would not cause problems, the Falcon Sensor loaded the faulty update. This caused an out-of-bounds memory read, an error that occurs when a program tries to read data from memory outside the bounds of what it is allowed to access. The read triggered an exception that “could not be gracefully handled, resulting in a Windows operating system crash,” according to the review.
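To illustrate the kind of error described, here is a minimal sketch in Python of a bounds check before a read. It is not CrowdStrike's code: the names `read_field` and `safe_to_load` are hypothetical, and Python itself raises an `IndexError` rather than reading stray memory, so the sketch only shows the idea of rejecting a read that falls outside the data actually supplied.

```python
from typing import Optional, Sequence


def read_field(fields: Sequence[str], index: int) -> Optional[str]:
    """Return the field at `index`, or None if the index is out of bounds.

    Hypothetical helper: checks the index against the real length of the
    data before reading, instead of trusting the caller.
    """
    if 0 <= index < len(fields):
        return fields[index]
    return None


def safe_to_load(fields: Sequence[str], required_indexes: Sequence[int]) -> bool:
    """Hypothetical pre-load check: every index the content refers to
    must exist in the supplied data, otherwise the content is rejected."""
    return all(read_field(fields, i) is not None for i in required_indexes)


if __name__ == "__main__":
    supplied = ["alpha", "beta", "gamma"]      # three fields are available
    print(read_field(supplied, 1))             # within bounds
    print(read_field(supplied, 3))             # one past the end, rejected
    print(safe_to_load(supplied, [0, 1, 3]))   # content asks for a missing field
```

In software that runs with the highest level of system access, a read like the rejected one above has no such safety net unless the program adds the check itself, which is why a failure there can bring down the whole operating system.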

CrowdStrike, a California-based company, lost one-fifth of its stock value in the wake of the disaster. The firm promised to reform the way that it issues critical content updates.

Specifically, the company said it is planning to implement a “staggered deployment strategy” for future updates, first sending them out to just a handful of machines before a global rollout. This method is known in the industry as a “canary deployment.”
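A staggered rollout of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CrowdStrike's process: the function `staged_rollout`, the `deploy` callback, the stage sizes (1%, 10%, 100%) and the failure threshold are all hypothetical.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], bool],
    stage_percents: Sequence[int] = (1, 10, 100),  # assumed stage sizes
    max_failure_rate: float = 0.0,                 # assumed threshold
) -> List[str]:
    """Deploy to growing slices of `hosts`, halting after any stage whose
    failure rate exceeds `max_failure_rate`.

    `deploy(host)` returns True if the host is healthy after the update.
    Returns the list of hosts that received the update.
    """
    updated: List[str] = []
    done = 0
    for percent in stage_percents:
        # Cumulative number of hosts that should be updated after this stage.
        target = max(1, len(hosts) * percent // 100)
        batch = hosts[done:target]
        if not batch:
            continue
        failures = 0
        for host in batch:
            updated.append(host)
            if not deploy(host):
                failures += 1
        done = target
        if failures / len(batch) > max_failure_rate:
            break  # stop before the update reaches the remaining hosts
    return updated


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(100)]
    # A faulty update that breaks every machine stops at the canary stage.
    print(len(staged_rollout(fleet, lambda host: False)))
    # A healthy update proceeds through every stage.
    print(len(staged_rollout(fleet, lambda host: True)))
```

With these assumed settings, a faulty update would reach only the first small slice of machines before the rollout halts, rather than the entire fleet at once.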

CrowdStrike will also “enhance existing error handling in the Content Interpreter,” which is part of the Falcon Sensor.

CrowdStrike also promised to have people test its Rapid Response Content, add extra validation checks to the Content Validator, and give customers the option to decide when and where these updates are deployed.

“Nothing is more important to me than the trust and confidence that our customers and partners have put into CrowdStrike,” George Kurtz, the company’s founder and chief executive, said in a statement following the outages.

“As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again.”