The Matthew Chapman Podcast

The Day the Digital World Stood Still: Lessons from the Largest IT Outage in History

July 26, 2024 Matthew Chapman Episode 3

What if a single update could bring the world to a standstill? On July 19th, that hypothetical nightmare became a reality when a CrowdStrike sys file update led to the largest IT outage in history. In this episode, we unravel the catastrophic chain of events that left 8.5 million Windows systems crippled and incurred an estimated $15 billion in global damages. From grounded flights to disrupted healthcare services, the fallout was immense and far-reaching. But that wasn’t the end—the chaos also opened the door for opportunistic hackers, posing as CrowdStrike support, to exploit the situation further.

Join us as we dissect the lessons learned from this unprecedented incident. We’ll explore the critical importance of rigorous testing, robust backup plans, and zero trust technologies in maintaining system integrity. Companies must prioritize compliance and vendor vetting to mitigate risks. Through CrowdStrike’s transparent response, we see a roadmap for handling such crises. As consumers of cybersecurity, the onus is on us to demand more stringent development and testing processes from our providers. Tune in to get the insights you need to stay secure and informed in an ever-evolving threat landscape.

Transcript

Matthew Chapman:

I'm back. It's been a long time, I think it's actually been almost 10 months, so I'm going to make a promise now that I'm going to start doing these more often. I just wanted to check in a little bit, since the CrowdStrike thing has been such a big deal. So, first off, I'm not going to rag on CrowdStrike, so if you're looking for that, you can go away now. These are just some of my thoughts. As we all know, it happened on July 19th. An update to a sys file took down about eight and a half million Windows systems around the world, and it's being called the largest IT outage in history. Insurers are estimating that the outage will cost US Fortune 500 companies alone about $5.4 billion, and estimates for the worldwide cost are now reaching somewhere around $15 billion.

Matthew Chapman:

The blue screen of death required a reboot into safe mode to remove the offending file. That increased downtime because of the reboots, especially for machines that had to get through BitLocker recovery first. Roughly 10,000 flights were delayed, canceled, or affected in some way. There were healthcare disruptions at clinics and hospitals, and some states even reported outages of 911 services. Banks and media outlets suffered outages as well. And, of course, hackers are taking advantage of the situation, sending phishing emails claiming to be from CrowdStrike, here to support and help you, and some are even going so far as calling customers and posing as CrowdStrike support. Some are even selling scripts that they claim will automate the process of recovering your system, when actually they're installing malware and backdoors.

Matthew Chapman:

So what are the lessons? Have a test and release plan, and, I love this phrase, I've always loved this phrase: trust but verify. Have a backup and recovery plan, but don't just have one, test it regularly, at least twice a year. I tended to do quarterly tests when I was doing things like CISO work. Consider deploying a zero trust technology to help keep changes from happening to critical systems. Here are some of my other thoughts on this.

Matthew Chapman:

Companies, depending on the industry and sector, are required to follow compliance regulations. The vendors they choose for tools and services either have to help them be compliant, or be compliant themselves, or both. As someone who has been around this industry long enough to see most cybersecurity companies suffer some level of issue impacting customers, I'll say this: testing, and more testing, before pressing the button to release needs to be the standard. Regression testing. Eat your own dog food: test it, roll it out to yourself and your own company first, until you're satisfied and due diligence has been done.

Matthew Chapman:

Then do a staggered rollout to the customer base, not all at once. Follow the pipeline for testing, not just for core code, but also for your drivers, your patches, and any changes to the known-good state. Any change needs to be tested. That's the lesson. Good change management needs to be followed, and software development processes need to be followed more stringently. We all know that, even with all that, mistakes or anomalies will occur, and in its response CrowdStrike has been very forthcoming, showing nothing but transparency so far, which is needed and respected in the industry. As customers and consumers, we now need to ask how the code is developed and tested, not just how it works and keeps us secure and in compliance. We need to do more than just check the boxes. So there are my thoughts on CrowdStrike, and I hope all of you are having a better day today. It's been a little while now, so hopefully things are starting to get a little better out there, and I'll see you next time.
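
For anyone who wants a concrete picture of the dogfooding and staggered rollout idea discussed above, here is a minimal sketch in Python. It assumes a hypothetical fleet split into cohorts; the names Cohort, deploy_to, health_check, and rollout are illustrative only and are not CrowdStrike's or any vendor's actual deployment tooling.

```python
# A minimal sketch of a staggered (canary-style) rollout gate.
# All names here (Cohort, deploy_to, health_check, rollout) are hypothetical
# illustrations of the idea discussed above, not any vendor's actual tooling.
import time
from dataclasses import dataclass


@dataclass
class Cohort:
    name: str
    hosts: list[str]


def deploy_to(hosts: list[str]) -> None:
    """Stand-in for pushing the update to a set of hosts."""
    print(f"  pushed update to {len(hosts)} host(s)")


def health_check(hosts: list[str]) -> bool:
    """Stand-in for checking crash/boot telemetry; always healthy in this sketch."""
    return all(True for _ in hosts)


def rollout(update_id: str, cohorts: list[Cohort], soak_seconds: float) -> None:
    """Push an update one cohort at a time, halting at the first failed health check."""
    for cohort in cohorts:
        print(f"Deploying {update_id} to cohort '{cohort.name}'")
        deploy_to(cohort.hosts)
        time.sleep(soak_seconds)  # let telemetry accumulate before widening the blast radius
        if not health_check(cohort.hosts):
            print(f"Halting rollout: cohort '{cohort.name}' failed health checks")
            return
    print(f"{update_id} fully rolled out")


if __name__ == "__main__":
    cohorts = [
        Cohort("internal-dogfood", ["build-01", "build-02"]),  # eat your own dog food first
        Cohort("canary", ["cust-a-01"]),
        Cohort("general", ["cust-b-01", "cust-b-02", "cust-c-01"]),
    ]
    rollout("content-update-example", cohorts, soak_seconds=1.0)
```

The point of the design is simply that the internal cohort goes first and every cohort has to pass a health check before the blast radius widens to the next one.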