As companies and government agencies around the world scramble to restore their computer systems following last week’s global outage from a faulty software update, questions are being raised about whether proper protocols for updates were followed.
Simultaneously, technology analysts are raising concerns about the extent of the United States’ increasing dependence on an oligopoly of cloud computing firms.
CrowdStrike, the cybersecurity company behind the update, has issued multiple apologies since the event and pledged to resolve the issues, many of which cannot be fixed through system-wide updates and instead require fixes on individual computers.
“The confidence we built in drips over the years was lost in buckets within hours, and it was a gut punch. But this pales in comparison to the pain we’ve caused our customers and our partners.”
Cybersecurity experts have raised questions about whether CrowdStrike may have circumvented best-practice procedures when it circulated the July 19 update.
“The cautionary tale, to me, is the basics—for patches, updates, and on critical business systems, take the 10 minutes to test them,” Robert Thomas, owner of cybersecurity company 180A Consulting and a former Defense Department staffer, told The Epoch Times.
“You take one minute and you download the patch; you take another minute, you install the patch on a test system; one more minute, you reboot the system, and then you run tests against your business-critical software applications.”
The Center for Internet Security (CIS) and the National Institute of Standards and Technology (NIST) have created standard protocols regarding how software updates should be conducted. Had they been followed, Mr. Thomas said, the flaws in the update should have become apparent before it was circulated to users.
“Software updates, by best practice/protocol, should go through numerous stages of testing prior to touching a customer,” Tom Marsland, training and project manager of Cloud Range and author of “Unveiling the NIST Risk Management Framework,” told The Epoch Times.
“This would include automated unit testing on the code, security reviews, and testing inside of the CrowdStrike team [and] only once those actions are completed should a patch be rolled out to customers,” Mr. Marsland said.
In addition, updates should be rolled out initially to a smaller group of customers and then expanded, rather than sent out broadly all at once, he said.
“In the case of the CrowdStrike update on Friday, it does not appear those practices were followed,” Mr. Marsland said.
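The sequence Mr. Marsland describes, in which a patch must clear each stage before it reaches customers, can be sketched as a simple release gate. This is only an illustration under stated assumptions: the stage names are taken from his quote, while the `Stage` class, the `run_release_gate` function, and the placeholder checks are hypothetical and do not represent CrowdStrike's actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One pre-release check (hypothetical structure for illustration)."""
    name: str
    check: Callable[[], bool]  # returns True when the stage passes


def run_release_gate(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (ok_to_release, names_of_stages_that_passed).
    """
    passed: List[str] = []
    for stage in stages:
        if not stage.check():
            # A failed stage blocks the release; later stages never run.
            return False, passed
        passed.append(stage.name)
    return True, passed


if __name__ == "__main__":
    # Placeholder checks: the third stage is made to fail on purpose.
    stages = [
        Stage("automated unit tests", lambda: True),
        Stage("security review", lambda: True),
        Stage("internal testing", lambda: False),
    ]
    ok, passed = run_release_gate(stages)
    print("release allowed:", ok)
    print("stages passed:", passed)
```

In a gate like this, the update is released only when every stage reports success, so a defect caught in internal testing never reaches a customer machine.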
The ‘Cascading Effects’ of the Faulty Update
According to an assessment by the CIS, the effects of the faulty update became apparent just after midnight Eastern time on July 19, when computers operating on Microsoft’s Windows software that implemented updates from CrowdStrike’s Falcon security software went down. The update circulated for about an hour and a half until the flaw was discovered and the update was “reverted,” according to the CIS.
“CrowdStrike has since issued a workaround that requires manual remediation for each affected device,” the CIS stated.
“They’re saying that this isn’t a cybersecurity attack, but it had the same net result as a cybersecurity attack,” Rex Lee, a security adviser to companies and governments, told NTD News, an Epoch Times affiliate. “We’re talking about government agencies, we’re talking about Fortune 500 businesses, airlines ... the cascading effects of this are unbelievable.
“If you look at the critical infrastructure that’s being affected, this is actually going to cause harm and people may be dying as a result of this, because first responders are being affected, hospitals are being affected. We won’t know the total damage from all this, but it’s going to go down in history as the largest mistake and/or outage in the history of the internet.”
The shift by companies and government agencies to cloud computing has been rapid and continues to accelerate.
“Cloud has become essentially indispensable,” Sid Nag, vice president analyst at Gartner, stated in the report.
But last week’s outage has highlighted how vulnerable companies and society have become now that cloud computing services are controlled by a small number of providers.
“A decade ago, businesses were uncertain whether the expansion of cloud computing by tech giants like Google, Microsoft, and Amazon was just a passing trend or a lasting shift,” Mr. Von Watzdorf stated in the report. “Today, companies worldwide have embraced the cloud in droves, recognising it as a vital component of successful digital transformation.
“However, the concentration of services with three dominant providers has created new risks, which are relevant to the re/insurance industry.”
Societal and National Security Risks
Government agencies are also assessing the risks of cloud computing and tech consolidation. On the day of the outage, a senior White House official stated that “the White House has been convening agencies to assess impacts to the U.S. government’s operations and entities around the country.”
Amid the rush to shift operations onto the cloud, the CrowdStrike outage will likely spur users to reassess the extent of their dependence on one or a few service providers, and their ability to weather errors by providers.
“We’re reaching the point where over-centralization makes us less ‘healable,’ and less resilient,” Mr. Thomas said. “We’re losing our resiliency as a nation.”
After the CrowdStrike outage, companies and governments are now seeing the risks, as well as the benefits.
“There are absolutely societal and national security risks from putting all of your eggs in a single vendor basket, and I think those were clearly indicated in the past 72 hours when we grounded most flights nationwide,” Mr. Marsland said.
“The benefits of the cloud versus the risks is something each organization must answer for themselves,” Mr. Marsland said. “For organizations seeking a broader customer base, the benefits absolutely do outweigh the risks—but these organizations can afford to hire experts in cloud security.”
On a personal level, individuals who store their data in the cloud also face risks.
Security risks include personal data being exposed “through a security breach or incompetence on the part of the cloud service provider,” the report states, as well as the sharing of personal information with other businesses, government agencies, or employees of the cloud service provider, and malware or phishing attacks that could gain access to users’ information.
The privacy policies of cloud service vendors “all reveal that the vendor, regardless of any claims to the contrary or use of encryption, has the ability to decrypt and access any stored data whenever they deem it necessary,” the report states.
For companies now seeking to get their computer systems back online, new risks have emerged from hackers looking to seize the opportunity the outage opened for them.
CrowdStrike reportedly represents about 15 percent of the cybersecurity market, catering to larger organizations and second only to Microsoft, which has an approximate 40 percent market share, according to Gartner. CrowdStrike’s share price has tumbled by more than 25 percent since the outage.
Speculation has focused on the firm’s ability to weather the current crisis, retain customers, and continue to grow its business. But beyond that, CrowdStrike may be facing substantial bills from its clientele.
Specifically, the company said it is planning to implement a “staggered deployment strategy” for future updates, first sending them out to just a handful of machines before a global rollout. This method is known in the industry as a “canary deployment.”
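A staggered or canary deployment of this kind can be sketched in a few lines. The example below is a minimal illustration, not CrowdStrike's implementation: the function name `staged_rollout`, the ring sizes (1 percent, 10 percent, then everyone), and the zero-tolerance failure threshold are all assumptions chosen for clarity.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    machines: Sequence[str],
    apply_update: Callable[[str], bool],
    ring_fractions: Sequence[float] = (0.01, 0.10, 1.0),  # assumed ring sizes
    max_failure_rate: float = 0.0,  # assumed: any failure halts the rollout
) -> List[str]:
    """Update machines ring by ring, halting if a ring fails too often.

    apply_update(machine) returns True on success. The function returns
    the machines that were updated successfully.
    """
    updated: List[str] = []
    done = 0  # number of machines already attempted
    for fraction in ring_fractions:
        # Each ring extends coverage to a cumulative fraction of the fleet,
        # and always includes at least one new machine.
        target = max(done + 1, int(len(machines) * fraction))
        target = min(target, len(machines))
        ring = machines[done:target]
        if not ring:
            break
        failures = 0
        for machine in ring:
            if apply_update(machine):
                updated.append(machine)
            else:
                failures += 1
        done = target
        if failures / len(ring) > max_failure_rate:
            # Stop here: the remaining machines never receive the update.
            break
    return updated


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(200)]
    attempted: List[str] = []

    def faulty_update(machine: str) -> bool:
        attempted.append(machine)
        return False  # simulate an update that crashes every machine

    staged_rollout(fleet, faulty_update)
    print("machines touched by faulty update:", len(attempted))
```

The design point is that a faulty update is stopped at the first small ring, so only a handful of machines are affected instead of the whole fleet.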
CrowdStrike will also “enhance existing error handling in the Content Interpreter,” which is part of the Falcon Sensor.
CrowdStrike also promised to use humans to test its Rapid Response Content, add extra validation checks to the content validator, and give customers the option to decide when and where these updates are deployed.