Beat the Plan: Probabilistic Strategies for Successful Software Delivery at Scale

Key Takeaways

  • Probabilistic Thinking: Though it does not come naturally, seeing the world through the lens of probability helps us achieve positive outcomes in an uncertain world.
  • Mindset Shift: Embracing probabilistic approaches allows teams to manage uncertainty and control risk, rather than relying on flawed cause-and-effect assumptions.
  • System Design: Developing adaptive systems that adjust how we work as conditions change is more effective than creating exhaustive upfront plans.
  • Beyond Planning: By appreciating systems, thinking in bets, and focusing on outcomes, we can move past the need for detailed plans.
  • Controlling Volatility: Leaders can use various methods to make safe bets in a controlled environment, while responding and reacting to new developments as they emerge.

Software delivery at a very large scale is extremely complex. We are not just pulling stories from a backlog in a scrum of scrums; we coordinate the collective effort of hundreds or even thousands of engineers across dozens of teams and multiple organizations to deliver a single, integrated system. These are globally distributed products and services built by the world’s largest companies and provided to the most demanding customers. With so many moving parts, each requirement, team, integration, and interaction point becomes a nexus of possible failure modes. In such a complex environment, the question looms: How can we turn the odds in our favor when faced with complexity so vast that failures are inevitable? Surprisingly, the answer may lie in the strategies employed in an unlikely place - Las Vegas.

Games of Chance

In 1960, MIT mathematician Edward Thorp visited Las Vegas with a skeptic’s eye. He noticed how other players deluded themselves with the prospect of finding betting patterns that could give them an edge on the felt. In his autobiography A Man for All Markets, he wrote, "Mathematicians had proved no system of varying bets could blunt the casino’s edge. Generations of gamblers had been seeking the impossible. Players are confused about the inevitability of losing in the long term". This failure to grasp the mathematical futility of their pursuit reveals a fundamental human bias.

The logic of probability defies our natural intuition. We live in a world of apparent cause-and-effect relationships - touch the stove, you get burned. This daily experience of determinism and certainty shapes our thinking, creating a cognitive trap when we encounter complex, probabilistic systems. We instinctively search for deterministic patterns where none exist, making incorrect associations that obscure the true statistical relationships beneath the surface.

Yet our hero Thorp was no foolish gambler. His mathematical training told him there might be a way to bend the rules - not by finding misguided patterns, but by understanding the deep structure of the underlying probability of the game. While others failed with betting systems, Thorp saw something hidden underneath blackjack: a dynamic system where the probability landscape shifted with each card played. Armed with a crucial advantage - a computer - he began a multi-year mission to map the odds of the game. Using an IBM 704, he methodically calculated the odds for millions of card combinations, analyzing how removing each card altered the probability space.

The result was revolutionary: a system for navigating the probability curve. Thorp could track how the odds shifted between dealer and player by keeping a running count of played cards. When low cards were depleted, leaving a deck rich in high cards, the odds turned in the player’s favor - signaling when to increase bets. His 1962 book Beat the Dealer revealed this system to the world, spawning the famous MIT Blackjack Team and embedding card counting in popular culture. More importantly, it demonstrated how systematic thinking about probability could transform uncertainty from an obstacle into an opportunity.
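The mechanics of a running count are simple enough to sketch in a few lines. The snippet below illustrates the Hi-Lo count, a later simplification in the spirit of Thorp’s original Ten Count system - the tags and the example are illustrative, not Thorp’s published numbers:

```python
# Minimal sketch of a Hi-Lo style running count (a later simplification in
# the spirit of Thorp's original Ten Count; shown for illustration only).
def hi_lo_value(card: int) -> int:
    """Tag one card by rank: 2-10 by value, 11 for an ace."""
    if 2 <= card <= 6:
        return +1   # a dealt low card leaves the deck richer in high cards
    if 7 <= card <= 9:
        return 0    # neutral
    return -1       # ten-value cards and aces

def running_count(cards_seen) -> int:
    """The count rises as the remaining deck fills with high cards."""
    return sum(hi_lo_value(c) for c in cards_seen)

seen = [2, 5, 10, 3, 6, 11, 4]   # cards played so far; 11 represents an ace
print(running_count(seen))       # -> 3: the odds are shifting to the player
```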

Planning in Software

Fast forward to today, and our world is defined by computers. In very large software delivery projects, we’re tackling incredibly complex problems using vast teams, aiming to deliver high-value outcomes on a global scale. Our intuition here is precisely what you would expect from us mere humans - the same humans who misunderstand probability: we apply a system of management based on causality. With this, we fall into the trap of classical logic: the idea that we can precisely lay out a path and then follow it correctly over many months and even years. Somehow, this seems to make sense. Fictional though it may be, the story we tell feels real.

To that end, we have tried to apply the same management systems designed for delivering real-world outcomes - factories, bridges, skyscrapers - to software. These approaches suit the physical world and its classical mechanics, trying to create predictability through determinism. They operate from the premise that work can be exhaustively described, so the focus is on centralized planning, tracking, and reporting.

But plans like these need to be correct to be successful - they make big bets on a long series of events that all need to succeed. And when they don’t? Conventional deterministic wisdom tells us we can press the system into submission - throw people, money, and more planning at the problem until something that works eventually emerges. Unfortunately, the premise is flawed, and the plan continues to execute exactly as it should: into failure. Since this kind of significant up-front correctness is not something we can count on in software, these approaches have consistently failed.

In software, we don’t get predictability through causality. Large batches, risk aversion, phased processes, over-planning, big bang deliveries - these are the mistakes we make when we try to wrestle certainty from the hand of randomness. Rather than be like the misguided gambler, we must resist the pull of deterministic thinking.

Beat the Odds

Just as Thorp used probabilistic thinking and system design to beat the odds in blackjack, software development teams can improve delivery outcomes by adopting probabilistic approaches and designing adaptive systems that manage uncertainty and control volatility. With this mindset, we can move beyond traditional planning methods and develop systems that guide us through complexity and uncertainty, much like Thorp did at the blackjack table.

Thorp’s blackjack system is a fantastic example of thinking in probabilities to create a system that achieves consistent outcomes in a non-deterministic way. Using a card counting system won’t win you every hand, but you will consistently walk away a winner over many hours. We cannot predict what will happen throughout the play; no chart shows when the winning starts and the losing stops. Instead, the player focuses on the system, using it to read the table, adapting the play to the changing odds, and creating the circumstances for long-term success within persistent uncertainty. As long as the player uses the system correctly and doesn’t get distracted by everything happening around them, they will achieve success.

What Thorp managed to do was control volatility over time. In probability terms, volatility is the amount of variance we see in a system over time. The more volatile a system is, the wider the range of outcomes we should expect. By reducing bets when the odds are bad and increasing them when the odds are good, card counting uses a boundary condition on the probability curve to detect when volatility increases and when it falls. In other words, we restrict our actions when we are in the danger zone and open them up when the surface area of positive outcomes increases. The idea that volatility is a fixed element of a system, leaving us merely subject to its whims, is a common intellectual fallacy. With card counting, Thorp found a way to play with the risk curve and capitalize on its changes over time. And so can we.
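To see why scaling exposure with the odds matters, consider a toy Monte Carlo sketch (illustrative numbers only, not real blackjack probabilities). A flat bettor and a count-style bettor face the same sequence of hands, each carrying a small, shifting edge; only the bettor who sizes bets to the odds has a positive expectation:

```python
import random
import statistics

# Toy model: each hand carries a small player edge, drawn at random.
# Expected profit on a hand is bet * edge, since P(win) = 0.5 + edge / 2.
def play_session(size_bet, hands=1000, seed=None):
    rng = random.Random(seed)
    bankroll = 0.0
    for _ in range(hands):
        edge = rng.uniform(-0.02, 0.02)        # the "count" shifting over time
        bet = size_bet(edge)
        win = rng.random() < 0.5 + edge / 2.0
        bankroll += bet if win else -bet
    return bankroll

flat = lambda edge: 1.0                          # same bet, good odds or bad
counted = lambda edge: 4.0 if edge > 0 else 0.5  # press good odds, duck bad

flat_runs = [play_session(flat, seed=i) for i in range(500)]
counted_runs = [play_session(counted, seed=i) for i in range(500)]
print(f"flat bettor:    mean {statistics.mean(flat_runs):+.1f}")     # ~0
print(f"counted bettor: mean {statistics.mean(counted_runs):+.1f}")  # positive
```

Neither bettor predicts any individual hand; the counted bettor simply restricts exposure when conditions are bad and expands it when they are good.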

Beat the Plan

In enterprise software delivery, we, too, are trying to win a long series of bets. We play from a deck of visions, missions, roadmaps, priorities, and many small decisions as we try to harness the potential of an uncertain world. Thorp faced the volatility of card distributions and used a statistical system to gain an edge. Likewise, in software, we face volatility in project requirements, implementation details, customer expectations, and market directions, but just as Thorp used his card counting system, we can use our systems to navigate the uncertainty.

We create systems to guide us through a series of events over a long period, allowing us to go beyond the limits of our ability to plan work. To do better than a plan - to beat the plan - we need systems. Where plans might fail us, systems can save us. How might we use thinking in probabilities to design better systems? I see three key areas:

  • We need to have an appreciation for systems, a phrase I am borrowing from W. Edwards Deming’s System of Profound Knowledge. Effective systems protect us from probable failures; it is our systems that give us the edge. Quality frameworks, continuous integration, deployment pipelines, and monitoring systems guide our decision-making. They are designed to let us make decisions safely while providing feedback that helps us understand our risk and adjust accordingly. We want to elevate our perspective by looking at the network of components that work together, considering the relationships and interactions between the parts of a system, between systems, and with external factors. This is where probabilistic thinking becomes effective.
  • The next idea is to embrace a concept explored by former World Series of Poker champion Annie Duke in her book Thinking in Bets. She defines a bet as a "decision about an uncertain future", and what’s crucial to understand is that correct decisions don’t guarantee a result. Every poker player knows the "bad beat", where a well-played hand loses to sheer luck. To that end, Duke notes that the world’s greatest poker players spend an unusual amount of time analyzing their decision-making process to understand whether they correctly applied their system, regardless of whether they win or lose. This wisdom is incredibly pertinent to us in software, where we first need to view our decisions as bets and, second, place more value on reflecting on how we place them. We leave many parts of our work as tacit knowledge and only do retrospectives at a system level when things go wrong. We can build more resilient organizations by surfacing implicit systems and regularly examining how well we follow them rather than just reacting to successes and failures (see the sketch after this list).
  • Over time, what matters isn’t individual decisions but our ability to achieve successful outcomes through consistent system-following behavior. Just as the card counter doesn’t need to win every hand to walk away ahead, a well-designed software delivery system should absorb the impact of failed experiments and wrong turns while keeping us moving toward our goals. Individual decisions - whether to refactor a service, which technology to adopt, how to structure a team, or which feature to build - may be right or wrong, but the wisdom of probability tells us that it is the cumulative outcome that matters. This is why the two points above are crucial: they help us understand whether we succeeded or failed and whether our systems are effectively guiding us toward desired outcomes despite inevitable setbacks. By setting our targets on outcomes rather than individual decisions, we create space for controlled experimentation and learning, allowing our systems to evolve and improve even as they keep us on track.
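One lightweight way to put the betting mindset into practice is a decision journal that records process quality separately from outcome. The sketch below is a hypothetical minimal structure - the fields and the example entry are illustrative, not a prescribed format:

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical minimal decision journal: review whether the system was
# followed, separately from whether the bet happened to pay off.
@dataclass
class Bet:
    decision: str
    rationale: str
    followed_system: bool              # did we apply our agreed process?
    outcome_good: bool | None = None   # filled in later, at review time
    made_on: date = field(default_factory=date.today)

journal: list[Bet] = []
journal.append(Bet(
    decision="Adopt library X for the export feature",
    rationale="Spike succeeded; team has prior experience with it",
    followed_system=True,
))

# Later, in a retrospective: look for well-played hands that lost
# ("bad beats") and badly played hands that won (luck).
journal[0].outcome_good = False
for bet in journal:
    if bet.followed_system and bet.outcome_good is False:
        print("Bad beat - keep the system:", bet.decision)
    elif not bet.followed_system and bet.outcome_good:
        print("Got lucky - examine the system:", bet.decision)
```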

These three principles - systems thinking, betting mindset, and outcomes focus - reframe how we should approach planning in software delivery. Rather than creating perfect plans, we need to build systems that can guide us through uncertainty. A plan may tell us where we want to go, but our systems determine whether we’ll get there. This brings us to the crucial question: How do we design systems that can navigate the complexities of large-scale software delivery while managing the inevitable volatility we’ll encounter along the way?

Plan the System

We can transform software delivery uncertainty into predictable outcomes just as Thorp transformed gambling uncertainty into a calculable advantage. But where Thorp dealt with the relatively constrained world of card probabilities, we face a far more complex challenge. Our deck contains countless variables with hidden interactions and long histories that come together and fall apart in different ways. Team dynamics, technical dependencies, market changes, customer needs, and organizational issues all interact and influence each other in ways that frustrate traditional planning methods. How might we leverage systems of software delivery to control this volatility?

The key lies in establishing boundaries that control volatility while maintaining the flexibility to adapt. Through years of experience and evolution, I’ve identified five critical elements that work together to create such systems:

  • Agility - decoupling predictability from causality
  • Continuous Methods - using continuous methods to solve discrete problems
  • Prioritization - working on areas with the highest volatility first
  • Constraints - imposing enabling restrictions and limitations
  • Culture - promoting healthy values and behaviors

All of these come together to help us design and execute systems that manage our bets and drive us toward targeted outcomes. Let’s dig into each area.

1. Agility - The first thing we need to do is decouple predictability from causality. We can achieve predictable outcomes without knowing all the details, creating paths that carry little risk. Where Thorp managed the volatility of his game by adjusting bets according to his card counting system, in software development, we can manage project volatility by working in short cycles and adjusting in light of what our systems of work are telling us. We use these systems to take our "count", take it more often, assess our situation, and make small bets that succeed or fail without creating excessive risk. For example, we can reallocate resources to high-priority or high-risk areas as the project evolves or adjust requirements based on our learning. Agile, DevOps, and Lean methodologies form a system that gives us a regular "read" of the table, one we use to recalibrate where needed so that we eventually land inside the range of desired outcomes - which is why they are so effective for us. With this in mind, you don’t need to cargo cult practices (imitating them without understanding) into your organization - you can evaluate them in this light.
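The statistical value of small bets is easy to demonstrate. In the toy simulation below (hypothetical numbers: each unit of scope independently "lands" with probability 0.6), a single big-bang release and fifty small increments have identical expected value but radically different volatility:

```python
import random
import statistics

# Toy model, illustrative numbers only: each unit of scope independently
# delivers value with probability 0.6. Only the batch size differs.
def big_bang(rng):       # one 50-unit bet: all value ships together or not
    return 50 if rng.random() < 0.6 else 0

def small_batches(rng):  # fifty 1-unit bets, each succeeding or failing alone
    return sum(1 for _ in range(50) if rng.random() < 0.6)

rng = random.Random(42)
big = [big_bang(rng) for _ in range(10_000)]
small = [small_batches(rng) for _ in range(10_000)]
for name, runs in [("big bang", big), ("small batches", small)]:
    print(f"{name:13s} mean={statistics.mean(runs):5.1f} "
          f"stdev={statistics.stdev(runs):5.1f}")
# Same expected value; the small-batch strategy is far less volatile.
```

The small-batch strategy does not predict which individual items will land; it simply makes the aggregate outcome predictable - predictability without causality.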

2. Continuous Methods - The second element is continuous practices. We reduce the risk of our short-term decisions by situating them in systems of overarching engineering practices. This is the domain of Lean, where we convert discrete events into continuous practices so that each bet is placed under the same conditions: TDD, CI/CD, DevOps, DevSecOps, automated governance, and so on. Just as Thorp updated his card count with every hand, we run our own operations with every iteration, perhaps with every code commit, ensuring that playing a hand does not change our risk assessment equation. If we allow failing tests, security issues, defects, and the like to build up, we will no longer play our hands under the same conditions. With creative thinking, we can find ways to convert all discrete elements of software delivery into continuous functions in our engineering systems.
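As a concrete illustration, here is a hypothetical per-commit quality gate in the spirit of these practices. Every check runs on every change, so conditions never drift between bets; the commands are placeholders for whatever tooling your organization actually uses:

```python
import subprocess
import sys

# Hypothetical per-commit gate: every check runs on every change, so each
# "hand" is played under the same conditions. Swap in your real tooling.
GATES = [
    ("unit tests",      ["pytest", "-q"]),
    ("static analysis", ["ruff", "check", "."]),
    ("dependency scan", ["pip-audit"]),
]

def run_gates() -> bool:
    for name, cmd in GATES:
        if subprocess.run(cmd).returncode != 0:
            print(f"Gate failed: {name} - the change does not merge.")
            return False
        print(f"Gate passed: {name}")
    return True

if __name__ == "__main__":
    sys.exit(0 if run_gates() else 1)
```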

3. Prioritization - Next, how can we get an "unfair" advantage in our work? By stacking the deck! This is one area where we diverge from the gambler - we get to cheat. We can choose how we structure, scope, and sequence our work to create the least amount of risk for ourselves, prioritizing the work according to where the most volatility lies. For example, integrations always present a high degree of risk, but if we focus on them first, we can continuously integrate subsystem changes throughout the delivery process. This means we get to play the game knowing that the bad cards have already been dealt! Further, prioritization should be implemented as a continuous system, not as a big upfront plan. As we progress through our work, we gain insight into whether the rest of the deck is stacked for or against us and continuously re-evaluate our position. To achieve this, we need to involve the experts closest to the problem in the feedback loops of our work delivery systems. Having a generic, common-sense understanding of how the world works is not enough. We need the representation of technical experts with deep knowledge to guide difficult decisions and develop creative solutions that achieve an optimal balance of risk and reward.
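Here is a sketch of what risk-first sequencing might look like as a continuous function; the scoring scale and backlog items are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical sketch: score each item by how volatile (uncertain) it is,
# and play the risky cards first, while bad news is still cheap to act on.
@dataclass
class WorkItem:
    name: str
    volatility: int  # 1 = well understood .. 5 = many unknowns

backlog = [
    WorkItem("Polish settings UI",         volatility=1),
    WorkItem("Integrate payment provider", volatility=5),
    WorkItem("Migrate legacy user data",   volatility=4),
    WorkItem("Add CSV export",             volatility=2),
]

# Re-sort continuously as estimates change, not once in an upfront plan.
for item in sorted(backlog, key=lambda w: w.volatility, reverse=True):
    print(f"{item.volatility}  {item.name}")
```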

4. Constraints - The fourth element is the role of constraints: when and how to apply constraints to achieve better outcomes from the system. This differs from classical/deterministic systems, where the constraints are imposed upon the system from outside (the cost of materials, the laws of physics, etc.), and the system operates against them. In synthetic/non-deterministic systems, the constraints are ours to impose; they are system elements we can experiment with, adding and removing them as we evolve toward the right mixture of velocity and quality. We deliberately introduce helpful governing and enabling constraints into the system to contend with volatility. In blackjack, Thorp had only a few quantitatively defined boundary conditions to guide his system, but in software organizations, we have a wide variety of these boundaries to play with. For example, we can introduce technical constraints, scheduling constraints, organizational constraints, resourcing constraints, and more. We can use these to create forces that shape the direction of our work, implementing them in a wide variety of ways with varying degrees of strictness.
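A work-in-progress limit is a classic example of an enabling constraint we can impose and tune. This hypothetical sketch shows the essential behavior - the limit itself is a dial we experiment with, not a law handed down from outside:

```python
# Hypothetical enabling constraint: a work-in-progress (WIP) limit we can
# tighten or relax as an experiment, observing the effect on flow.
class WipLimitedBoard:
    def __init__(self, wip_limit: int):
        self.wip_limit = wip_limit
        self.in_progress: set[str] = set()

    def start(self, item: str) -> bool:
        if len(self.in_progress) >= self.wip_limit:
            print(f"Constraint hit: finish something before starting {item!r}")
            return False
        self.in_progress.add(item)
        return True

    def finish(self, item: str) -> None:
        self.in_progress.discard(item)

board = WipLimitedBoard(wip_limit=2)
board.start("feature A")
board.start("feature B")
board.start("feature C")  # rejected until A or B finishes
```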

5. Culture - Finally, we have the values and behaviors that form our culture. Thorp’s card-counting system requires commitment, focus, and discipline to succeed. In the same way, we need cultural norms that shape and guide participants’ behavior in our systems. To compensate for the uncertainty and knowledge gaps inherent in building complex distributed systems, we must promote values like learning, transparency, citizenship, and shared responsibility. And like Thorp, we need the discipline and commitment to follow the systems we put in place. A key goal is to motivate people to make decisions that are best for the overall system, not for themselves, because it is, in fact, the system that brings people success.

Conclusion

The logic of probability does not come naturally - our intuition clings to what we know well: a world of cause-and-effect relationships, reproducible and mechanical, in which one thing reliably leads to another. Touch the stove, and you get burned. The conditioning of classical logic in our daily lives becomes a trap when we move into the world of probabilities. We are prone to making incorrect associations and false inferences that obscure the underlying statistical relationships.

The journey from the casino floors of Las Vegas to the server rooms of Silicon Valley may seem a long one, but the underlying principles remain the same. Thorp showed us that by understanding volatility and designing systems to manage it, we can consistently win in a game designed for us to lose. By learning to "count cards" in our field - to read the shifting probabilities and adjust our strategies accordingly - we can turn the tables in our favor.

The key to mastering software development at scale lies not in our ability to predict the future but in our capacity to play with uncertainty. When we embrace the probabilistic nature of our work, we can unlock levels of efficiency and predictability that have eluded us. The question that concerns us is not how to make better plans but how to design systems that beat the plan - systems that survive, and even capitalize on, uncertainty. This mindset shift will be one of the most crucial skills in the fascinating endeavor of large-scale software delivery.
