The CrowdStrike outage on July 19, 2024, is a stark reminder of the critical role DevOps practices play in deploying updates while maintaining the security and reliability of applications and systems. While the underlying software defect was the immediate cause, the broader issue lies in the deployment process that allowed a severe flaw to impact a global customer base.
Key Takeaway: To maintain trust and reliability in today’s complex software, security, cloud and data center ecosystem, we must prioritize robust, measured deployment strategies.
The ability to rapidly deploy software updates across large, diverse environments is necessary for many software offerings. But updates done improperly or without diligent governance can have unintended and, in rare cases, catastrophic consequences. The CrowdStrike incident highlights the importance of adopting more sophisticated deployment strategies, such as staggered, A/B, canary and phased rollouts, to minimize the blast radius of any potential defect. By initially releasing an update to a small subset of systems, organizations can validate changes in real-world conditions and catch issues before they affect the entire user base. The update can then be released in stages to a progressively larger share of customers. This approach significantly reduces the risk of widespread disruption and helps ensure that problems are contained and addressed quickly.
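To make the idea of a phased rollout concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how CrowdStrike or any other vendor deploys updates: the function name `staged_rollout`, the `deploy` and `is_healthy` callbacks, the stage fractions and the failure threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    completed: bool           # True if every stage passed its health check
    updated_hosts: List[str]  # hosts that received the update before stopping
    halted_at_stage: int      # index of the failing stage, or -1 if none failed


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Deploy to a growing share of hosts, halting when a stage looks unhealthy.

    stage_fractions are cumulative shares of the fleet, e.g. [0.01, 0.10, 0.50, 1.0].
    deploy and is_healthy are hypothetical callbacks supplied by the caller.
    """
    updated: List[str] = []
    for stage_index, fraction in enumerate(stage_fractions):
        # Cumulative number of hosts that should be updated once this stage ends.
        target_count = min(len(hosts), max(1, round(len(hosts) * fraction)))
        batch = hosts[len(updated):target_count]
        if not batch:
            continue  # this stage adds no new hosts

        for host in batch:
            deploy(host)
        updated.extend(batch)

        # Check only the hosts touched in this stage before widening the rollout.
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            return RolloutResult(False, updated, stage_index)

    return RolloutResult(True, updated, -1)
```

With a fleet of 100 hosts and stages of 1%, 10%, 50% and 100%, a defect that first shows up on a host in the second stage stops the rollout after 10 hosts have been updated, leaving the other 90 untouched. A real pipeline would also need a wait period between stages, rollback, and health signals richer than a single boolean.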
Additional Resources
- Research Note: The CrowdStrike Outage – A Detailed Post-Mortem
- Research Note: CrowdStrike IT Outage: Critical Global Impact and Implications for Cybersecurity
Let’s move beyond this single CrowdStrike incident and focus on the lessons we need to learn to avoid repeating it. As software delivery becomes faster and more automated, the need for meticulous, well-orchestrated deployment processes has never been greater. It’s not just about preventing the next outage; it’s about building a culture of reliability and accountability in every step and layer of the software supply chain. The urgency of this need cannot be overstated.
Customers of software vendors bear a crucial responsibility in ensuring their systems remain secure and reliable. They should not rely on automatic updates without oversight, as doing so can leave their environments vulnerable to unforeseen issues. Organizations need to implement their own vetting and validation processes, testing updates in controlled environments before widespread deployment. This added layer of scrutiny helps catch potential problems early and prevents the disruptions that come from unquestioningly trusting every update. In this way, customers play an active role in maintaining the integrity of their systems and minimizing risks.
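On the customer side, one simple form of vetting is a promotion gate: an update reaches production only after it has passed the organization's own checks in a staging environment and has run there for a minimum soak period. The sketch below assumes this kind of gate. The names `StagingRecord` and `approved_for_production` and the 24-hour default are hypothetical, and the sketch presumes the vendor's product lets customers defer updates at all.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StagingRecord:
    update_id: str
    installed_at: datetime  # when the update reached the staging environment
    checks_passed: bool     # result of the organization's own validation suite


def approved_for_production(
    record: Optional[StagingRecord],
    now: datetime,
    min_soak: timedelta = timedelta(hours=24),  # assumed policy, not a standard
) -> bool:
    """Return True only if the update was vetted in staging for long enough."""
    if record is None:
        return False  # never seen in staging: do not promote
    if not record.checks_passed:
        return False  # failed the organization's own validation
    return now - record.installed_at >= min_soak
```

The right soak period is a judgment call. Security content often needs to ship quickly, so an organization might use a short window for detection updates and a longer one for updates that change agent or kernel-level code.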
This is our call to action: To maintain trust and reliability in today’s complex software, security, cloud and data center ecosystem, we must prioritize robust, measured deployment strategies.
Mitch Ashley is Chief Technology Advisor with The Futurum Group and CTO of Techstrong Group’s tech media platforms covering DevOps, cybersecurity, AI, cloud native, cloud infrastructure, platforms and ITSM.
Mitch’s analyst research is available on FuturumGroup.com and TechstrongResearch.com.