Collusion Study
Robert J. Ullman
Advised by Professor Andrew Appel
Princeton University
Department of Computer Science
Date of Submission: April 30th, 2015
Abstract
We create modular software to construct and solve general hold’em poker games. Because
most poker variants, like Texas Hold’em, are prohibitively complicated to analyze directly, we
instead use the flexibility of our software to look at a variety of stripped-down hold’em-like
games. In these simplified games, we answer questions about how game rules, player position,
and betting history affect a player’s strategy to give insight into optimal strategies for more
complicated poker variants. We simulate these variants with collusion and find that collusion
leads to an overall increase in aggression by colluding players. We find that although strong
collusion leads to a significant advantage for colluding players, weak collusion does not.
1 Introduction
Unlike most casino games, poker is a game of skill. Talented players gain an advantage by ap-
proximating the likelihood of having the winning hand. Sometimes, players can be certain that
they will win; other times, this approximation requires careful consideration of the entire betting
history. Approximating win probability helps a player decide if she should bet, and if so how much,
or fold. Yet, even assuming a player can perfectly compute the probability she will win a given
hand — a calculation that can be done exactly with enough time — her optimal betting strategy
remains unclear because her expected winnings depend on the strategy of her opponents. Given the
uncertainty of opposing strategies, a Nash equilibrium strategy is favorable because it hedges one's risk against whatever strategies the opponents choose to play.
Another way to gain an advantage is to cheat and gain private information as to what cards are
left in the deck or what cards the other players have. In many instances, cheating only helps a single
player (the cheater) at the expense of all other players. As such, cheating is disincentivized by all
players (players are careful to hide their cards, join only reputable and fair games, etc.). Yet, cheating
can be incentivized if some group of players can do better — either in aggregate or individually —
by working together. Players who strongly collude share private information, like the cards they
were dealt, for mutual gain. Strong collusion can be policed in in-person poker games by monitoring
players to prevent sharing information, but such measures are ineffective in online poker games where
sharing info outside the game is easy. As such, online poker communities rely primarily on reactive
measures that use statistical analysis on past hands to detect collusion. This often works by looking
for abnormal patterns in playing history but can also be done with more nuanced approaches that
do not assume any particular behavior pattern [10]. Another form of collusion is weak collusion
where players decide to work together without sharing information outside the rules of the game.
Weak collusion strategies rely primarily on beneficial coordinated betting. Also, weak collusion
is easy to implement and can be particularly effective against poor opponents. Strong collusion
produces larger expected winnings than weak collusion, but weak collusion is harder to detect, even with statistical analysis of past hands.
In this paper, we characterize the Nash equilibrium strategy in two simple hold’em variants. In
particular, we quantify the effects of game rules, number of players, player positions, and betting
history on the equilibrium strategy. We also consider these variants with both strong and weak
collusion to understand the effects of collusion in hold’em-like games. We explain the poker ab-
stractions and software implementation that we created to facilitate studying hold’em-like poker
variants.
2 Background

Game theory, a field pioneered by John Von Neumann in the 1940s, studies strategic competitive
decision making. A game is specified by offering a player a list of potential strategies and indicating
the payoff for each strategy conditional on all the other players' strategies. Specifying a game in
this way, called a game’s normal form, lends itself easily to a matrix representation. For example,
consider a two-player game where the first player has n strategies and the second player has m
strategies. We can define the first player's payoffs by an n × m matrix A, where A_{ij} is her payoff when she plays strategy i and the second player plays strategy j. Similarly, we can define an m × n matrix B to be the payoffs for player two. A game is zero-sum if the payoffs for all players sum to zero for every combination of strategies.
A game in normal form can be thought of as being played in a single turn where all players
simultaneously choose their strategies and then consult the payoff matrix for their wins or losses.
Some games are more intuitively expressed in a turn-based form in which players alternate making
moves. This is known as the extensive form of a game and is often expressed as a tree. In such
a tree, nonleaf nodes represent a player’s turn, edges represent possible moves on a given turn, and
leaves represent game payoffs. Many games (e.g. most card games) have situational variation caused
by neither player (e.g. the cards you are dealt), which can be conveniently represented by assigning
some nodes to a nature player with outgoing edges representing the different game variations that
can happen at that point. One can always convert between the matrix normal form and tree
extensive forms of a game, but the matrix form is less concise because the number of strategies is
exponential in the number of tree nodes for each player (figure 1).
Players can choose to play any distribution over their given strategies to form a mixed strategy.
The special case where a player chooses to play only one such strategy is called a pure strategy.
Figure 1: An example two-player game (P1 versus P2) in both normal form and extensive form. The node labeled C is played by the nature (chance) player with the edges leaving it labeled by the probability of each game variation. The payoffs in the normal form are the expected values of playing each pure strategy against each pure strategy of the opponent.

Von Neumann proved that in two-player zero-sum games there exists a value V and mixed strategies
for both players such that V is the best payoff the first player can achieve if the second player fixes his
strategy and −V is the best payoff the second player can achieve if the first player fixes her strategy.
In other words, there exist strategies for each player such that no player has any incentive to change
their strategy. Von Neumann’s result, called the minimax theorem, was later generalized by John
Nash who proved that in any game there exist mixed strategies for all players such that no player has any incentive to change her strategy individually. A set of such mixed strategies is called a Nash equilibrium.
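For reference, both results can be written compactly in the matrix notation above; the statements below are the standard textbook forms (with $\Delta_n$ denoting the set of probability distributions over $n$ pure strategies), not anything specific to this thesis.

\[
  \max_{x \in \Delta_n} \min_{y \in \Delta_m} x^T A y \;=\; \min_{y \in \Delta_m} \max_{x \in \Delta_n} x^T A y \;=\; V
  \qquad \text{(minimax theorem)},
\]
\[
  u_i(s_i^{*}, s_{-i}^{*}) \;\ge\; u_i(s_i, s_{-i}^{*}) \quad \text{for every player } i \text{ and every mixed strategy } s_i
  \qquad \text{(Nash equilibrium)}.
\]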
Finding a Nash equilibrium in games is often done with linear programs (LPs). For example, take
a two-player game with payoff matrix A. Given strategy distribution vectors x and y for players
1 and 2 respectively, the payoff for player 1 is $x^T A y$. As such, for a fixed strategy $y$, the best response for player 1 is given by the linear program

\[
  \max_{x} \; x^T A y \quad \text{subject to} \quad \mathbf{1} \cdot x = 1, \;\; x \ge 0.
\]

The analogous LP with payoff matrix B = −A^T produces the second player's best response strategy.
Combining the constraints of both LPs yields an LP that finds Nash equilibrium strategies. This
can be generalized to a linear complementarity problem (LCP) for nonzero-sum games [9]. Although
linear programs can be solved in polynomial time in the size of the matrix, the size of the payoff
matrix grows quickly with game complexity. The number of entries in the payoff matrix for the game
is the product of the number of pure strategies for each player. The number of pure strategies for
each player is the product of the number of actions they have at each node in the tree of the game’s
extensive form — a number exponential in the number of nodes for each player. This exponential
blowup in matrix size is in some part due to information redundancy introduced in representing a
game in normal form. Koller et al. showed that this redundancy can be eliminated in two-player
games with perfect recall [9]. At a high level, this is accomplished by representing strategies not as
a distribution over pure strategies, but instead as a distribution over game tree states. Representing
strategies in this way reduces the payoff matrix to a sparse matrix whose dimensions are the number of decision points for each player (the same space as the game tree itself).
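As a concrete reference point, the kind of LP described above can be written, for the zero-sum case, in the standard maximin form shown below; the sequence-form LP of Koller et al. [9] has the same shape but replaces $x$ with realization weights over game-tree sequences. An optimal $x$ is an equilibrium strategy for the first player and $v$ is the game value.

\[
\begin{aligned}
  \max_{x,\,v} \quad & v \\
  \text{subject to} \quad & A^T x \ge v\,\mathbf{1}, \\
  & \mathbf{1} \cdot x = 1, \quad x \ge 0.
\end{aligned}
\]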
Linear programs are favored in tractable use cases because the LP produces the exact equilib-
rium. Yet, the construction of the LP is tricky to generalize from two-player to multi-player games.
In particular, how to eliminate the exponential space blowup of a game’s normal form is unknown
in the general case. Furthermore, in most applied use cases an approximate Nash equilibrium is
sufficient. As such, iterative algorithms, like gradient descent and fictitious play [3], allow for
solutions with adjustable precision that run much faster than the LP for the same result. We use smoothed fictitious play for this reason.
Smoothed fictitious play is an iterative algorithm to approximate a Nash equilibrium. Let $s_i^t$ be the strategy for player i at time t and let $s_{-i}^t$ be the strategies for all players besides i at time t. Then,
\[
  s_i^{t} = \left(1 - \frac{1}{t}\right) s_i^{t-1} + \frac{1}{t}\, r_i^{t-1},
\]
where $r_i^t$ is the best response to $s_{-i}^t$. The fictitious play algorithm terminates when the maximum regret of any player at time t is below some specified threshold. The regret of a player i at time t is the difference between the expected payoff of playing $s_i^t$ and of playing $r_i^t$, fixing all other strategies
at that time. In other words, this difference quantifies the regret of player i of stopping fictitious
play at time t and not playing her actual best response to everyone else’s strategies. If all players
have regret less than some value $\epsilon$, then the strategy profile is called an $\epsilon$-equilibrium. Fictitious play does not always converge — in particular, algorithm 1 does not always terminate for small $\epsilon$ — but when it does it does so to a Nash equilibrium [4]. The intuition here is that if there exists some $t$ such that $s_p^t$ equals $r_p^t$ for all $p$, then no players will ever be motivated to change their strategy.
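In symbols, the regret and the $\epsilon$-equilibrium condition described above are (our notation, with $u_i$ denoting player $i$'s expected payoff):

\[
  \mathrm{regret}_i(t) \;=\; u_i\big(r_i^t,\, s_{-i}^t\big) - u_i\big(s_i^t,\, s_{-i}^t\big),
  \qquad
  \max_i \mathrm{regret}_i(t) \le \epsilon \;\Longrightarrow\; (s_1^t, \dots, s_k^t) \text{ is an } \epsilon\text{-equilibrium}.
\]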
Algorithm 1 FictitiousPlay()
  t ← 0
  s^0 ← initStrategy()
  do
    t ← t + 1
    regret ← 0
    for each player p do
      r ← bestResponse(s^{t-1}_{-p})
      s^t_p ← (1 − 1/t) · s^{t-1}_p + (1/t) · r
      regret ← max(regret, payoff(r) − payoff(s^t_p))
    end for
  while regret ≥ ε
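As a concrete illustration of the mixing step in Algorithm 1, the Java sketch below applies the update $s^t = (1 - 1/t)\,s^{t-1} + (1/t)\,r$ per information set and per action. The nested Map is a simplified stand-in for a strategy object, not the Strategy class of our implementation.

import java.util.Map;

/** One smoothed fictitious play update, applied per information set and action. */
final class SmoothedUpdate {
    static void update(Map<String, Map<String, Double>> average,      // running-average strategy s^{t-1}, updated in place
                       Map<String, Map<String, Double>> bestResponse, // pure best response r (probability 1 on one action)
                       int t) {                                       // iteration number, t >= 1
        double w = 1.0 / t;
        for (Map.Entry<String, Map<String, Double>> infoSet : average.entrySet()) {
            Map<String, Double> br = bestResponse.get(infoSet.getKey());
            for (Map.Entry<String, Double> action : infoSet.getValue().entrySet()) {
                // Pure best response: 1.0 for the chosen action, 0.0 otherwise.
                double rProb = (br == null) ? 0.0 : br.getOrDefault(action.getKey(), 0.0);
                action.setValue((1.0 - w) * action.getValue() + w * rProb);
            }
        }
    }
}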
Because smoothed fictitious play takes a running average, it is resistant to change as the number
of iterations increases. As such, even when it eventually converges, fictitious play tends to cycle
around an equilibrium strategy (figure 2). Intuitively this is because a change in some dimension
of the best response is dampened in its effect on the current equilibrium approximation. When
fictitious play converges, the rate of convergence can be as slow as exponential in the size of the game.
In most papers that make use of a best response function, the function is used as an oracle because
the particular implementation is unimportant. As such, the existing literature often falls short in
giving a detailed explanation of any particular best response function. Although we find straight-
forward LP and LCP formulations of best response functions in the literature, these constructions
are only applicable to two-player games [9]. We describe a simple procedure to compute a best
response in any extensive form game with perfect recall. Our contribution here is not to present a
particularly clever or optimized best response function, but instead to describe the straightforward construction that we use.
Regret Reduction by Iterating Fictitious Play
Figure 2: The player regret in Mercer Hold’em at each iteration of fictitious play. Notice the
dampened cyclic behavior.
For a given player, the best response to a set of fixed strategies for all other players is always to
select the action that maximizes that player’s expected payoff. For this reason, the best response
strategy is always a pure strategy. We define an information set I for a player p0 to be a set of
nodes that are indistinguishable from the perspective of p0 .1 For notational convenience, we define
an information set function I(n) to be the information set of a tree node n. Take any information
set I for player p0 and consider the set of available actions A for all n ∈ I. For action a ∈ A, we
define P(a) to be the probability that the current player chooses a. For node n ∈ I, we define F(n)
to be the frequency with which n occurs in the game. Finally, we define a child function C(n, a) to
be the node in the game tree found by starting at n and taking action a. For an information set I,
action a available at I, and payoff function V_p, we define the weighted value of taking action a:

\[
  W_I(a) \;\equiv\; \sum_{n' \in I} F(C(n', a)) \times V_p(C(n', a)).
\]

Footnote 1: Nodes are indistinguishable when they have the same history except for private information (e.g. hole cards for other players).
We recursively define the payoff function for player p at a node n ∈ I:

\[
  V_p(n) =
  \begin{cases}
    \sum_{a \in A} P(a)\, W_I(a) & : p \ne p_0 \\
    \max_{a \in A} W_I(a) & : p = p_0
  \end{cases}
\]
The best response strategy s for player p, defined on any of her information sets I, is the pure strategy that always takes the maximizing action $\arg\max_{a \in A} W_I(a)$.
The best response strategy and the payoffs at each node can be calculated together, where the
maximizing action used in calculating the payoff for player p is also the action selected as player p’s
pure strategy.
The frequency and payoff functions F and V respectively are fundamentally different in nature.
The frequency function for a node n is computed by taking the product of the probabilities of
selecting each action along the path from the root to n, while the payoff function for a node n is
computed by working backwards from all the leaves in the subtree rooted at n. An effective way
to visualize (and implement) finding the best response is to propagate frequency information from
the root towards the leaves and propagate payoff information backwards from the leaves towards
the root until all nodes are explored. Information sets pose a potential problem because computing
the payoff at a given node involves jumping around the tree and consulting the payoff of all nodes
that share the same information set. As such, how to propagate frequency/payoff information along
the game tree and how to deal with the potential of infinite loops when computing node payoffs
is often unclear. To avoid the confusion with propagating frequencies and payoffs, these values
are not calculated incrementally but instead are calculated on the fly with caching (dynamic
programming). The problem with infinite loops is avoided entirely because poker (or at least all
variants we study) is a game of perfect recall: for two tree nodes $n, n'$, where $n$ is an ancestor of $n'$, we have that no node $v' \in I(n')$ is an ancestor of any $v \in I(n)$. In other words, every player can remember all of her own past actions and observations.
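Concretely, the frequency described above is just a product of edge probabilities along the path from the root (our notation; $P_m(a)$ denotes the chance probability or the acting player's strategy probability of choosing action $a$ at node $m$):

\[
  F(n) \;=\; \prod_{(m,\,a)\,\in\,\mathrm{path}(\mathrm{root},\,n)} P_m(a),
\]

while $V_p$ flows in the opposite direction, from the leaves of the subtree rooted at $n$ back up to $n$.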
3 Game Representation
3.1 Poker
Poker serves as a classic example of a zero-sum game. There are many variants, but all follow
the same basic premise: each player has some hole (private) cards and participates in one or more
betting rounds as shared cards and/or more hole cards are dealt. In the end, the player with the
best hand — as determined by the rules of the poker variant — wins the accumulated bets. In
cash poker, people bet money (or chip equivalents) and thus aim to maximize expected winnings.
In tournament poker, a variant that has become popular in televised poker, players bet chips that
have no cash equivalent. Instead, people are awarded cash prizes based on when in the tournament
they lose all their chips. The key difference between these two types of poker is that cash poker
has a local win condition to win as many hands as possible, while tournament poker has a global
win condition to end up with all the chips. As such, playing tournament poker requires taking into
account all players’ chip stacks in addition to the traditional concerns of cash poker (hole cards, shared cards, betting history, etc.).
In tournament poker, because the win condition involves bankrupting opposing players, aggres-
sive bets that force players all in (to call with all their chips) are more common. In fact, the
fold/shove model, a simplified model for tournament poker where players decide to fold or go all
in as their first bet, is actually a reasonably good approximation of a Nash equilibrium. Although
this model greatly reduces the complexity of the game, how to value chip stacks — the likelihood a
player will win given all players’ chip stacks — is still unclear. The poker community often uses the
Independent Chip Model (ICM) heuristic to approximate chip stack cash value. Research has been
done to improve upon ICM using a double iterative approach where Nash equilibrium strategies
computed using ICM are used to update chip stack valuations [5].
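For reference, the standard ICM heuristic (a well-known formula, not something introduced in this thesis) converts chip stacks $c_1, \dots, c_k$ into finish probabilities and values a stack as the prize-weighted sum of those probabilities:

\[
  P(i \text{ finishes 1st}) = \frac{c_i}{\sum_j c_j}, \qquad
  P(i \text{ finishes 2nd}) = \sum_{j \ne i} \frac{c_j}{\sum_k c_k} \cdot \frac{c_i}{\sum_{k \ne j} c_k},
\]

and so on recursively for lower places; the ICM cash value of player $i$'s stack is $\sum_m P(i \text{ finishes } m\text{-th}) \cdot \mathrm{prize}(m)$.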
In essence, tournament poker is a more complex version of cash poker where the expected value
of the game comes not only from the strategy in any single hand, but also from the chip stacks
of all players. This complexity does not necessarily result in more complex strategies, but instead
frequently leads to a simpler strategy as seen with the fold/shove model. We restrict our study to cash poker.
3.2 The Abstractions
We represent poker games as a game tree in extensive form. For this purpose, we developed a set
of abstractions for specifying poker games that hide the tedious underlying tree construction and
wrote a program implementing these abstractions that can be configured to a wide variety of poker-
like games. The abstractions have three main pieces: a deck, a list of game rounds, and a payoff
evaluator. The deck abstraction specifies the ranks and suits available in the game. Each game
round has three phases: anteing, dealing, and betting. The anteing phase requires paying the ante,
a forced bet for each player. Antes provide incentive for players not to fold during the upcoming
round because doing so would forfeit their ante. Although such an incentive is usually only required
for the first round because in later rounds remaining players have already made (unforced) bets, our
particular abstraction allows antes at every round. In some variants of poker, blinds — forced bets
for some subset of the players that contribute towards the betting phase — are used as alternatives
to antes. We chose to study games without blinds, so they are not currently implemented in the
abstraction but could be easily added. Each dealing phase gives each player zero or more additional
hole cards and gives all players zero or more additional shared cards. Each betting phase is specified
by a list of allowed bets and maximum number of bets per person. Players take turns selecting one
of the allowed bets or folding until either all but one player has folded or everyone has called/checked
(bet no more than the previous person). The list of allowed bets is not a perfect model for no-limit
games where players can conceivably bet anything, but in this case it can serve as an approximation.
The payoff evaluator specifies how to distribute the pot to each player at terminal nodes. This
incorporates the winning condition of the poker variant and usually involves determining the best
possible hand(s) out of the shared and hole cards and rewarding those players (who have not already
folded) with the best hand(s). In Texas Hold’em, there are nine types of hands2 , and players make the
best five-card hand of five shared cards and two hole cards. Most poker variants involve making
five-card hands, so implementing one five-card evaluator is usually sufficient. For our research, we
make use of the flexibility of our abstraction to play poker-like games with a four-card evaluator.
In addition, one can use the evaluator abstraction to explore game variants with untraditional win
conditions. For example, our evaluator abstraction expresses weak collusion by sharing colluding
players’ payoffs.

Footnote 2: The Texas Hold’em hand types are high card, one pair, two pair, three of a kind, straight, flush, full house, four of a kind, and straight flush. Ties are broken within each hand type by card rank.
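To make the evaluator abstraction concrete, the sketch below scores rank-only four-card hands using the hand ordering of footnote 4 (high card, one pair, two pair, three of a kind, four of a kind). It is an illustrative stand-in that assumes ranks below 16, not the Evaluator used in our implementation.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Rank-only four-card hand evaluator sketch: higher score beats lower score. */
final class FourCardRankEvaluator {
    static long score(int[] ranks) {               // e.g. the four ranks 2..6 in Mercer Hold'em
        int[] count = new int[16];
        for (int r : ranks) count[r]++;

        // Group distinct ranks and sort by (multiplicity, rank) descending so the
        // paired/tripled rank dominates the kickers when breaking ties.
        List<int[]> groups = new ArrayList<>();    // each entry: {multiplicity, rank}
        for (int r = 0; r < 16; r++) if (count[r] > 0) groups.add(new int[]{count[r], r});
        groups.sort(Comparator.<int[]>comparingInt(g -> g[0])
                              .thenComparingInt(g -> g[1])
                              .reversed());

        int pairs = 0, trips = 0, quads = 0;
        for (int[] g : groups) {
            if (g[0] == 4) quads++;
            else if (g[0] == 3) trips++;
            else if (g[0] == 2) pairs++;
        }
        // 4 = four of a kind, 3 = trips, 2 = two pair, 1 = one pair, 0 = high card.
        int category = quads > 0 ? 4 : trips > 0 ? 3 : pairs == 2 ? 2 : pairs == 1 ? 1 : 0;

        // Tiebreak: concatenate the grouped ranks (strongest group first), padded to a
        // fixed width so hands with different group counts stay comparable.
        long tiebreak = 0;
        for (int[] g : groups) tiebreak = tiebreak * 16 + g[1];
        for (int pad = groups.size(); pad < 4; pad++) tiebreak *= 16;
        return ((long) category << 16) | tiebreak;
    }
}

For example, score(new int[]{3, 3, 4, 2}) exceeds score(new int[]{2, 2, 6, 5}), since a pair of 3s beats a pair of 2s regardless of kickers.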
The game tree is built with three different types of nodes: action nodes whose outgoing edges
represent different player betting choices, deal nodes whose outgoing edges represent different as-
signments of hole or shared cards, and payoff nodes that are leaves and store the payoff results
for each player. This specification leads to a straightforward algorithm to build a game tree where
for each round one adds a series of nodes that correspond to all possible deals with all possible
combinations of player bets. At the end of the game — either caused by all but one player folding
or finishing all rounds of betting — the payoff evaluator is used to generate the payoff nodes.
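The three node types above can be pictured with the simplified Java stand-ins below; the classes in our implementation differ, but the tree structure they encode is the same.

import java.util.ArrayList;
import java.util.List;

abstract class GameNode {
    final List<GameNode> children = new ArrayList<>();
}

/** A betting decision: one outgoing edge per allowed bet (or fold). */
final class ActionNode extends GameNode {
    int player;                 // index of the player acting at this node
    List<String> actions;       // edge labels, e.g. "fold", "bet 4"
    String infoSetLabel;        // groups nodes indistinguishable to `player`
}

/** A chance event: one outgoing edge per possible assignment of hole or shared cards. */
final class DealNode extends GameNode {
    double[] dealProbabilities; // probability of each child, in child order
}

/** A leaf storing the result of the payoff evaluator. */
final class PayoffNode extends GameNode {
    double[] payoffs;           // net payoff for each player, indexed by player
}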
From any node in the game tree, one can follow the path back to the tree root to determine
the entire game state. This game state includes all public information, like shared cards and bets,
and private information, like hole cards. As such, only an omniscient player knows her location
in the game tree in a game with private information. Players of the game must play the same
strategy (the same distribution over their possible actions) at each action node where the game
history is indistinguishable to them. To enforce this constraint, each action node is part of a set,
called its information set, that contains all other nodes indistinguishable from this node from the acting player’s perspective.
In general, generating the game tree from our abstractions does not tie us to any particular
underlying implementation and allows us to perform many general optimizations to increase ficti-
tious play performance. Precomputing the game tree before iterating fictitious play allows the tree
and information sets to be cached, which simplifies and speeds up calculating the best response.
In addition, terminal payoffs, which do not change between iterations of fictitious play, are cached,
which leads to big performance gains because the payoff evaluator need not be run more than once
for each game. Furthermore, the game tree can often be simplified after initial construction to
save space and time. Simple, but effective, optimizations involve skipping deal and action nodes
for folded players and truncating the game tree at nodes where all but one player has folded. Be-
cause the dealing and betting phases of each round consist of many deal and action nodes each, a
more involved optimization involves compressing this information into fewer nodes. These types of
compression optimizations offer the most potential gains because, if applied early in the tree, they reduce the size of every subtree below them.

Example Poker Game Tree Construction

Figure 3: A visualization of a game tree our program can create. Because the trees branch rapidly, only one full path is shown. In the path shown, the following events happen in order: the first player is dealt a 3 and 4, the second player is dealt a 2 and 5, the first player bets 4 chips pre-flop, the second player calls, there are a 3 and 5 on the flop, the first player bets 8 chips, the second player calls, and then the game ends. In this case, the second player would win the accumulated pot (24 chips plus antes) because she has the best 4-card hand.
We wrote a Java program for constructing and solving hold’em-like poker games (https://github.com/rjullman/poker-nasheq). The PokerGame abstraction defines the hold’em variant and is de-
fined by a Deck, an Evaluator, and a specified number of players. The Deck defines the set
of cards used in the game, and the Evaluator determines the share of the pot for each player
given her hole cards, shared cards, and betting history. For our research, we implemented a
custom Evaluator. The PokerGame.buildGameTree() method constructs the GameTree object used by
the Strategies.nes(GameTree gt, double tolerance) utility method to compute a Nash equi-
librium strategy using fictitious play. The result of a fictitious play calculation is a Strategy object
indexed by InfoSets, the information set abstraction. Each InfoSet is programmatically generated
by buildGameTree() and assigned a human-readable label indicating the game history visible at that information set.
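A hypothetical end-to-end use of these pieces is sketched below. Only buildGameTree() and Strategies.nes(GameTree, double) are named above; the constructors and the MercerDeck and FourCardEvaluator class names are assumptions and may not match the repository.

// Hypothetical wiring of the abstractions described above (fragment, not a full program).
Deck deck = new MercerDeck();                       // 20-card deck: four copies each of ranks 2-6 (assumed class)
Evaluator evaluator = new FourCardEvaluator();      // custom four-card payoff evaluator (assumed class)
PokerGame game = new PokerGame(deck, evaluator, 3); // three-player variant (assumed constructor signature)

GameTree tree = game.buildGameTree();               // named in the text
Strategy equilibrium = Strategies.nes(tree, 0.01);  // 0.01-equilibrium via fictitious play (named in the text)
// The resulting Strategy is indexed by InfoSet, so a player's mixed action at any
// decision point can be looked up by that information set's label.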
The resulting game tree can be quite large and, in the current implementation, must fit in mem-
ory. Three-player Multiround Mercer Hold’em, the stripped-down poker variant that we consider
in this paper, has around 5 million nodes and took over 4 gigabytes of memory to store. In general,
the game trees generated (figure 3) grow exponentially with the complexity of the game. For exam-
ple, adding a check bet to Multiround Mercer Hold’em made the resulting game tree overflow the
(self-imposed) 32-gigabyte memory cap on our shared-resource super computer.3 On that same su-
per computer, finding a 0.01-equilibrium in Multiround Mercer Hold’em took roughly 10 hours and
3600 iterations of fictitious play. Under collusion, the equilibrium strategy took longer to converge,
requiring up to roughly 24 hours of computation (and 8500 iterations of fictitious play) to reach a
0.01-equilibrium.
4 Hold’em Variants
Texas Hold’em is a particular variant of poker played at many casinos. Representing two-player
Texas Hold’em in its extensive form requires $O(10^{18})$ nodes, or $O(10^{17})$ nodes leveraging suit and
rank symmetries in the deck [2]. Since computing a Nash equilibrium for a game of this size is
intractable, the general technique for solving Texas Hold’em (and other complex poker variants) is
to find the Nash equilibrium for a simplified poker abstraction and then scale up the abstracted
The primary source of game complexity comes from the combinatorial blowup of the number
of possible hands in Texas Hold’em. To reduce this complexity, people often try to group hands of
relatively equal situational value and handle them together. Gilpin and Sandholm designed an algo-
rithm, GameShrink, to simplify extensive form games that they then applied to Texas Hold’em [6,7].
Whelan used a supervised machine learning approach to cluster hands of similar value [12]. Game
complexity can be further reduced by discretizing the traditional betting model [8] or limiting the number of bets allowed per round.

In 2015, Bowling et al. weakly solved two-player (unsimplified) limit Texas Hold’em by approximating
the Nash equilibrium with an iterative algorithm [11]. They improved upon a counterfactual regret
minimization (CFR) algorithm [13] to mitigate intractability problems but still resorted to brute
force to reach a solution. Their program ran on 4800 processor cores and required over 68 days of computation.
Bet Percentages by Holding
25 34 35
1st Player 66 65 100
2nd Player 0 0 97
Table 1: The betting and folding percentages for the nontrivial hands in two-player SMH.
We will study variants of a hold’em poker game called Mercer Hold’em [1]. The game is played with
a 20 card deck with 4 identical copies (no suits) of each of 5 cards (2 through 6). Mercer Hold’em
has two rounds. In the first round, the private round, there is a 1 chip ante, two hole cards,
and betting. In the second round, the public round, two shared cards are revealed (the flop)
and there is (optionally) more betting. Afterwards, the best four-card hand wins.4 The variants
of Mercer Hold’em are determined by the constraints on the betting in both rounds. The simplest
variant, Standard Mercer Hold’em (SMH), allows each player only to bet 4 chips or to fold in the private round, with no betting in the public round.
We choose to study Mercer Hold’em because it has nontrivial Nash equilibrium strategies while
also having a low computational complexity.5 Making use of the expressiveness of our game tree
abstraction and the relative simplicity of Mercer Hold’em, we are able to study Nash equilibrium
strategies for a large variety of game variants. Noting how slight variations in game rules affect
Mercer Hold’em strategies provides insight into how poker variants affect game play even in more complicated games like Texas Hold’em.
In two-player SMH (2P-SMH), most hands can easily be classified as either good hands —
players always bet with 22, 26, 33, 36, 44, 45, 46, 55, 56, or 66 — or bad hands — players never
bet with 23 or 24 — regardless of the betting history. Concisely, we can describe the good hands
as the pairs and high cards and the bad hands as the low unpaired cards. The remaining three
possible hands 25, 34, and 35 have a nontrivial playing strategy at equilibrium (table 1). Notice
that for these hands the first player bets aggressively (by bluffing), while the second player plays
defensively. This makes sense because only the first player sends signals to the second player, so the second player cannot bluff. Furthermore, we find that the game has an expected value of 0.1 to the second player (and −0.1 to the first player), which makes sense because the second player acts with more information.

Footnote 4: In increasing value order, the hands in Mercer Hold’em are high card, one pair, two pair, three of a kind, and four of a kind.

Footnote 5: The simplest variant, two-player SMH, has approximately 10^4 nodes in the game tree, but even the more complex variants considered have at most 10^9 nodes.

First Bet Percentages by Holding
            22    23   24   25    26    34   35    36    45
1st Player  98.1  0.2  0.2  12.4  27.1  0.3  66.5  100   100
2nd Player  46.2  0.2  0.3  0.5   1.0   0.5  0.9   34.6  46.2

Table 2: The equilibrium strategy for the first bet in 2P-MMH. Only the hole cards with nontrivial equilibrium strategies are given (the other hands mimic 2P-SMH).
We also examine a slightly more complicated variant of Mercer Hold’em we call Multiround
Mercer Hold’em (MMH). Just as in SMH, MMH allows two actions from the players: fold or
bet 4 chips. In both the private and public rounds, there are 2 rounds of betting. For example, in
the two-player game there can be at most 8 bets for a total pot size of 34 chips (there are 2 ante
chips) at the end of the game. Notice that a player in position i gets a chance to raise only if all the
players before her fold (otherwise her bets are calling the previous player). The multiple bets per
round and the multiple rounds of betting in MMH simulate the features of a more complex poker
game.
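For concreteness, the 34-chip maximum pot in two-player MMH breaks down as:

\[
  \underbrace{2 \text{ players} \times 2 \text{ game rounds} \times 2 \text{ bets per round} \times 4 \text{ chips}}_{32 \text{ chips}}
  \;+\; 2 \text{ ante chips} \;=\; 34 \text{ chips}.
\]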
The equilibrium strategy in 2P-MMH largely mirrors that of 2P-SMH, which makes sense because intuitively the information gained from the betting history is secondary to the
information gained by a player’s hole cards. The restricted betting in SMH makes SMH a good
first order approximation to 2P-MMH and in general all Mercer Hold’em variants. For this reason,
the places where 2P-MMH differs from 2P-SMH are of particular interest as these are places where
the flop is significant. The first player’s first bet, which is made with no betting history (in both
SMH and MMH) has a different equilibrium strategy in MMH than in SMH (tables 1 and 2). In
particular, the first player no longer always folds bad hands — namely 23 and 24 — but instead
bluffs a small percentage of the time. This makes sense because these weak hands can become
strong given particular flops (e.g. 22) and, because there is a round of betting after the flop, players
are able to capitalize on initially weak hands by betting post-flop. For much the same reason, it is
no longer favorable for the first player to bluff with hole cards 34; the continued betting would force
them to over-commit to the bluff. If players are bluffing, driving opposing players out of the game
as soon as possible (i.e. by excessive betting) is in their best interest, but in this case the 4-chip betting limit prevents such strategies — the opponent is not dissuaded from calling.

First Post-Flop Bet Percentages Given 45 Hole Cards
            22    23    26    33    36    46    56    66
1st Player  20.5  93.9  62.9  47.7  24.7  100   100   53.6
2nd Player  30.0  22.7  34.9  7.6   46.1  60.9  98.1  33.8

Table 3: The equilibrium strategy for the first bet in the public round in 2P-MMH given that the player has 45 hole cards. Only the flops with nontrivial equilibrium strategies are given. The other flops are always bet on at equilibrium.
Another distinguishing part of MMH is how the first player bets initially in the public round.
For example, if the first player is holding 45, a hand that is always bet on initially, the hand is bet
on much less aggressively post-flop (table 3). In particular, the flops that are bet on less aggressively
when holding 45 are similar in that they do not produce a pair (or better) with 45. As such, these
flops potentially provide victory to initially weaker opposing hands (flop cards that make initially
weak hands strong are called outs). As expected, the exact opposite is true for initially weak hands
like 34. These initially weak hands are sometimes bluffed on in the private round, but are bet on
in the public round only if the flop produces almost certain victory. As a general rule — applied to
all holdings — the first player should bet a large percentage of the time on flops that produce good
hands with their holdings, while also frequently bluffing on hands that do not. In the latter case,
the player continues to bet despite only having a mediocre hand because it is not worth forfeiting the chips she has already committed to the pot.
Although the second player bets more defensively than the first player in general, as in 2P-SMH,
the second player bets much less defensively in the public round of 2P-MMH (table 3). Additionally,
the overall defensive play of the second player leads to some quirks in the equilibrium strategy. In
particular, assuming a 45 holding and a 23 flop, the first player bets over 90% of the time. This
is bizarre given that a 23 flop produces no good hands with 45 and that a 22 flop, a flop with
seemingly fewer outs than a 23 flop, is bet on only around 20% of the time. Yet, we can justify this
equilibrium strategy in retrospect. The second player is unlikely to continue playing any initially
poor hand that would combine well with a 23 flop. The difference between the 23 and 22 flop is
more subtle but hinges on the fact that the second player can be convinced to fold when holding a
2 only in the former case because triple 2’s is a nearly unbeatable hand.
We find that there is an increased positional advantage in 2P-MMH compared to 2P-SMH. The
expected value of the game is 0.4 to the second player (−0.4 to the first player), four times the
expected payoff in 2P-SMH. Unlike in 2P-SMH, why the second player is expected to win over the
first player is unclear. Although the second player gets to act with more information initially, the
first player has an opportunity to respond to these actions with even more information. In MMH, it
turns out that the exact balance of the game (e.g. ratio of ante to bets, number of players, betting
options, etc.) makes it much better to play second, but there is no reason for this to be true in general.
The added complexity of three-player SMH (3P-SMH) changes the structure of the equilibrium
strategies. Overall, the inclusion of a third player causes people to bet more defensively because
with everything else equal they are more likely to lose than in the two-player case. As in 2P-
SMH, one still always folds on unpaired low cards, always bets on pairs, and nearly always bets on
unpaired high cards. The difference in betting on unpaired high cards is that at equilibrium the
third player folds these hands when the first two players bet. Basically, the increased competition
is enough to drive out the third player. The equilibrium strategy profile is also decisive. In every
situation players either bet or fold at least 90% of the time. That said, there are more situations
— 15 in 3P-SMH, versus 3 in 2P-SMH — where the equilibrium strategy has some, albeit small,
amount of bluffing. This increase in decisive play and small-probability bluffing is characteristic
of games with more public information because players, in particular later-positioned players, can
more accurately predict each other’s cards. The expected value of this game is −0.29, 0.008, 0.282
for the first, second, and third players respectively. Once again, the expected value of the game
increases by position because later players still have strictly more information.
The game of three-player MMH (3P-MMH) admits strategies that are much harder to
characterize. Still, 33, 44, 55, 56, and 66 are strong hands and are bet on almost always. The rest
of the traditionally good hands, namely 22, 26, 36, 45, and 46, as well as most of the weaker hands,
have more nuanced play strategies that depend significantly on betting history, player position, and the flop.
Second Player’s First Bet Percentage
22 23 24 25 26 34 35 36 45 46
1st Player Folds 84.7 1.9 2.4 4.0 100 4.0 62.6 100 100 100
1st Player Bets 37.4 1.3 1.9 3.0 5.6 3.0 5.1 12.6 12.1 53.0
Table 4: The equilibrium strategy for the second player’s first bet in 3P-MMH. The hands not
shown (33, 44, 55, 56, and 66) are always bet on.
Consider how players bet when no one has folded before their turn. On their first bet, earlier
players bet more aggressively on mid-value hands, while weaker hands are played equally if not
less aggressively (table 5). The general decrease in aggression by position is expected because later
players must have overall better hands to compete with betting players. For this reason, the increase
in aggression for hands 25 and 36 is unexpected. This behavior can be partially explained by the
fact that there is still one more betting opportunity for each player. As such, there is some bluffing
incentive that increases with pot size. The holding 22 shows the most mysterious behavior: not
only are the betting strategies for 22 nonmonotone by position, but the differences in strategies are
significant. Because this hand has high variance — post-flop it is either almost certainly the best or
almost certainly the worst — perhaps the opposing effects of competition and bluffing are magnified
and result in the observed equilibrium strategies. Yet, this trend among the strategies could also
indicate that there is some significant factor for which we have still not accounted.
The second player’s play style is dramatically more aggressive if the first player folds than if the
first player bets (table 4). This directly follows the intuition that with fewer players remaining in the
game, weaker hands will tend to win the pot. When the first player folds, the game is approximated
by 2P-MMH. The differences in strategies for the remaining two players comes from the fact that
the hole cards for the first player, which probably form a weak hand, are no longer in the deck.
This has the effect of making bluffing weaker hands (like 23, 24, 25, and 34) and playing mid-value hands more attractive.
The third player playing slightly differently when the first player folds than when the second
player folds is somewhat surprising. Yet, because the third player knows the equilibrium strategies
for the other players, she can use their bets along with their betting positions to tailor a response.
In the case of 3P-MMH, we have already discovered that the second player will bet on weaker hands
if the first player has already folded. Knowing this, the third player is comfortable playing more aggressively when the first player folds than when the second player folds (table 6). In practice, when playing more complicated poker variants like Texas Hold’em, positional information is often ignored because it adds considerable complexity to a player’s strategy but has only a small influence on expected winnings.

First Bet Percentage
            22    23   24   25   26   34   35   36    45    46
1st Player  76.1  1.3  1.9  3.0  5.1  3.0  4.6  36.3  33.6  100
2nd Player  37.4  1.3  1.9  3.0  5.6  3.0  5.1  12.6  12.1  53.0
3rd Player  100   1.3  1.9  3.5  6.2  3.0  6.2  11.6  11.6  24.5

Table 5: The equilibrium strategy for the first bet in 3P-MMH assuming no one has yet folded. The hands not shown (33, 44, 55, 56, and 66) are always bet on.
The public round equilibrium betting in 3P-MMH is easy to characterize because post-flop
players know the value of their hands and know exactly what potential holdings of their opponents
to which they would lose. As a general rule, a player finds herself in one of three situations: she has
only a high card, in which case she bets (bluffs) just a small percent of the time; she has the best
(or nearly best) hand, in which case she bets (nearly) always; or neither of these situations apply,
and she bluffs that she has the best hand between 10 and 20 percent of the time. There are only a
few scattered situations where the equilibrium strategy does not follow this rule.
5 Collusion
Players collude to increase their total winnings. As such, we motivate collusion in our existing
poker variants by placing some sharing constraint on the payoffs for each colluding player to align
their payoff goals. We aim to observe how this affects the equilibrium strategies and payoffs for all players. To be clear, colluding players are not altruistic towards each other but simply want to maximize their combined expected winnings.

Table 6: The equilibrium strategy for the third player’s first bet in 3P-MMH. The hands not shown (33, 44, 55, 56, and 66) are always bet on.
In our research we initially considered the pot sharing constraint where colluding players split
their sum total share of the pot evenly. Applying this constraint to 3P-MMH, we find that colluding
players (regardless of playing position) do much worse than if they had not colluded. In particular,
if the second and third players (who, without collusion, both make positive expected value on the
game) collude, they both make negative expected value on the game. At first glance this seems
bizarre, but this is the result of pot sharing being a poor motivating constraint. Under the pot
sharing constraint, it makes sense that the colluding players bet and fold together. Otherwise, the
betting player contributes more to the pot, effectively donating his money to the other colluding
player if he wins, while also accepting more of the risk of losing. Yet, the players are also not
motivated to bet together because even if they can force the other player to fold, they only stand to
win half as much as before. The colluding players can potentially earn winnings in some situations,
but the expected value of the game is an overall loss for them. In particular, this is seen in 3P-
SMH where if the second and third players are colluding then they have no chance to push the
noncolluding player out. As such, when players in the second and third position collude, they have
no hands worth playing and always choose to fold (forfeiting their ante). Although the problem
is not as extreme when other players collude, the central problem with pot sharing is the same:
colluding players are overly motivated to fold. A better collusion constraint — which we use when
simulating games with collusion – is the win sharing constraint where colluding players split
the pot evenly after reclaiming their contribution to the pot. This allows colluding players to bet
separately because they accept no more betting risk than they do without collusion.
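To make the two constraints concrete, let $w_i$ be colluder $i$'s share of the pot at a terminal node, $c_i$ her total contribution (ante plus bets), and $S$ the set of colluding players; the formulas below are our reading of the descriptions above rather than expressions taken from elsewhere.

\[
  \text{pot sharing:} \quad \mathrm{payoff}_i = \frac{1}{|S|} \sum_{j \in S} w_j \;-\; c_i,
  \qquad
  \text{win sharing:} \quad \mathrm{payoff}_i = \frac{1}{|S|} \sum_{j \in S} \left( w_j - c_j \right).
\]

Under win sharing each colluder first reclaims her own contribution, so betting more than her partner no longer transfers money to him, which removes the extra risk that made pot sharing so fold-happy.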
Strong Collusion
In strong collusion, the colluding players look at each others’ hands. We first consider strong
collusion in 3P-SMH. This game broadly follows the strategy of standard (no collusion) 3P-SMH
except, as a general rule, the colluding players do not bet when their partner has a better hand
than they do. This makes sense because both colluding players betting doubles their betting risk.
Although normally colluding players do not both bet on a given hand, there are a few notable
exceptions to this characterization that highlight the effects of strong collusion. In 3P-SMH, the
only time both colluding players bet on a hand is when they both have good hands with different
win conditions on the flop. The betting from the colluding side serves to bully the noncolluding
player into folding. Additionally, both colluding players remaining in the game serves to swing the
odds of winning in their favor, especially when both have good hands. As such, in this case, putting
more money in the pot for the same potential winnings is worth the risk for the colluding players
because it steeply increases the chance they will win. Perhaps the best example of this is when
the colluding players have hands that provide maximal coverage of the out cards — for example,
hole cards 36 and 45. Although neither 36 nor 45 is the best possible holding, the flop will almost certainly give one of the colluding players at least a pair.
There are also places where neither colluding player bets despite one or both of them having
hands they would have bet on without collusion. This happens when the colluding players are
holding each other’s out cards. As such, the flop is less likely to have the cards they need to win.
This happens most often when both colluding players have the same exact holding — for example
34. Although 34 is a relatively good hand on its own, both colluding players fold knowing that
getting a 3 or 4 on the flop is incredibly unlikely. Additionally, even if the flop has a 3 or 4, it
may also produce a stronger hand for the noncolluding player (say if the second flop card is a 5 or
6). This outcome is minimized when the colluding players both have the higher ranked hands —
namely 36, 44, 45, 46, 55, 56, or 66. In these cases, the colluding players revert to having only one
player bet.
Bluffing can be made more effective under strong collusion. In many cases where both colluding
players have weak hands, one of the colluding players will bet anyway. Whereas such a bet would
not be feasible without collusion, under collusion the betting player is able to overrepresent her
hand knowing her partner will fold. Take for example the hand 24, one of the weakest hands in
SMH. This holding is nearly always folded by the first colluding player except when the second
colluding player holds 23 or 35. Of these two cases, the 23 case is a bluff (the 35 case, as we have
already seen, ensures flop coverage). Although the other colluding player folds immediately, she
is critically important to the bluff because she makes the game appear more competitive to the
noncolluding player.
Strong collusion in 3P-MMH has many of the same general strategies as in 3P-SMH. The added
rounds of betting in MMH allow colluding players to exploit their advantage more fully. For example,
flop coverage betting is more widespread: the first colluding player will now frequently bet with a
holding of 24 if her colluding partner is holding 35, 36, or 56. Because MMH has higher stakes than
SMH, betting on these types of hands is worthwhile because colluding players are nearly certain
to win the final pot. In general, collusion strategies that take advantage of aligning hands become more prevalent in MMH.
Bluffing, on the other hand, works differently in MMH than in SMH. Although many of the same
hands are bluffed, they are bluffed less frequently. For example, while colluding players holding 24
and 23 bluff around 80% of the time in SMH, they only bluff around 20% of the time in MMH.
Because there are more rounds of betting in MMH, the noncolluding player is more willing to call
earlier bets to gain more information from her opponents’ betting. This works in the noncolluding
player’s favor because the colluding players are forced to commit more chips to the bluff. The
noncolluding player is in general more willing to stay in the game longer, so bluffing with bad cards
does not pay off for the colluding players. The bluffing strategies in MMH are much more like
bluffing strategies in more complicated poker variants than those in SMH — poor hands are bluffed, but only sparingly.
As one would expect, strong collusion increases the expected value of the game for the colluding
players. This increase is tiny6 in 3P-SMH because the game is too short for colluding players to
have a big advantage, but in 3P-MMH the gains are significant. We find that the collusion and
noncollusion Nash equilibria are different for all players. In particular, the noncolluding player has
to select between her noncollusion strategy and collusion strategy often without knowing which is
optimal in her current situation. Given players A, B, and C, there are three types of collusion:
Public, where A and B are colluding and C responds optimally; Private, where A and B are
colluding and C responds with the noncollusion equilibrium strategy; and Suspicious, where A
and B are actually not colluding and C responds with the collusion equilibrium strategy. We find
that even under public collusion the noncolluding player loses over 17.5% of an ante each game
(table 7). The losses are even greater for the noncolluding player under private collusion.
Footnote 6: We found expected gains on the order of 10^−4.
Collusion Payoffs in 3P-MMH
               Strong Collusion                  Weak Collusion (×10^−3)
           Public    Private   Suspicious   Public    Private   Suspicious
Player A   −0.178    −0.195    −0.0316      −0.996    −0.748    −0.448
Player B    0.0889    0.0977    0.0127       0.498     0.374     0.325
Player C    0.0889    0.0977    0.0189       0.498     0.374     0.122
Table 7: The expected payoffs of players in 3P-MMH given that players A and B are potentially
colluding. All payoffs have an error of 0.00577, which is notably orders of magnitude more than the
expected payoffs under weak collusion. We assume that the players rotate dealer each round (from
A to B to C, then back to A and so on).
Weak Collusion
In weak collusion, the colluding players share no private information. Due to the simplicity of
3P-SMH, there are only a few differences from the noncollusion equilibrium strategy. The only
significant change in strategy for the first bet is that the second colluding player does not bet on 34
if the first colluding player also bets. In 3P-MMH, the first colluding player bets on 22, 26, 35, 36,
and 45 more frequently. On the other hand, the second colluding player bets on 22 more frequently
regardless of the first colluding player’s action and bets on 45 and 46 less frequently but only if
the first colluding player bets. As such, we find that weak collusion in MMH produces all around
more aggressive play by the colluding players except that the second colluding player plays less aggressively on certain strong hands once her partner has bet.
Under weak collusion, colluding players attempt a strategy that allows them to work together
without explicitly sharing any private information. We observe the extremes principle: colluding
players bluff frequently on low-value and high-value hands but bet less frequently on mid-value
hands. Generally, for colluding players, betting helps to drive out the noncolluding player, while
folding reduces the shared risk by decreasing the colluding players’ aggregate contribution to the
pot. SMH and MMH innately limit betting risk by having only one betting option, so betting in
these variants is overall more aggressive relative to variants with open betting.
To better understand the intuition behind the extremes principle, first consider generally how
to bet with each type of hand outside the scope of SMH and MMH. Given a weak hand, players
either want to fold and accept the loss of their ante or to try to reclaim the pot by bluffing and
betting high. Given a good hand, players should either bet high to drive players out (if they think
the hand will depreciate in value) or bet low to encourage players to call (if they think the hand
will appreciate in value). Betting with mid-value hands is more complicated but intuitively involves
a balanced bet that aims to goad low-value hands into calling while mitigating losses against high-value hands.
Under weak collusion, mid-value hands are bet on less frequently because it is hard for colluding
players to assess the joint value of their holdings. In particular, as already seen in strong collusion,
two mid-value hands may depreciate both hands’ value by eliminating possible out cards in the flop.
That said, Mercer Hold’em is played with a small deck and only four-card hands, so most games
are won with one pair or two pair. As such, MH tends to polarize hands and leave only a few hands
to be considered potentially mid-value. If any hands were to be considered mid-value, they would
be 26, 34, 35, and 36 — those being the statistically weakest of the strong hands — but these hands
are still highly favored. In the end, whether these weaker high-value hands are bet on more or less
frequently because of collusion is influenced by other factors (e.g. MH variant and player position).
Finally, because our variants of MH polarize the betting — players either bet 4 chips or fold —
generally high-value hands are bet on frequently, while low-value hands are bluffed on somewhat
infrequently. Collusion has the effect of making bluffing better — a colluding player bluffs with less
risk — so it makes sense that colluding players want to bluff more than if they had not colluded.
Because we are computing equilibrium strategies, we have assumed public collusion7 . As such,
we can observe how the equilibrium strategy for the noncolluding player changes under collusion.
In 3P-SMH we find no significant changes8 to the noncolluding player’s strategy. This makes sense
because the equilibrium in SMH is polarizing — players fold absolutely or bet absolutely in most
cases — and as such it seems reasonable that collusion is unable to shift the equilibrium. On
the other hand, in 3P-MMH we expect, and observe, changes in strategy with collusion (table 8).
We observe a slight, but significant, increase in aggression from the noncolluding player. This is
probably a result of the noncolluding player trying to adjust her strategy against the colluding players’ increase in bluffing.9
Although weak collusion is at least as good as not colluding, whether it is actually an improve-
ment is unclear. We found no significant difference in collusion payoffs because the payoffs are within the margin of error of our fictitious play calculation (table 7). These results are not that surprising because poker is, for lack of a better word, strongly zero-sum: players that decide to collude are still competing with each other over the pot. As such, under public weak collusion, the colluding players gain no private information — not even the fact that they are colluding — and as such it is reasonable to predict that they would be unable to curry an advantage. Yet, because we observed that under public collusion all players modify their strategy, namely there exists a different Nash equilibrium than in a noncollusion situation, the colluding players can do slightly better under private collusion because the noncolluding player is probably not playing her best response strategy.

Footnote 7: In the case of our fictitious play simulations, the noncolluding player determines collusion by observing her opponents’ strategies over time and responding to them.

Footnote 8: There are changes, but they are within the margin of error of our fictitious play calculation.

Footnote 9: Reminiscent of the mantra “the best defense is a good offense.”

Third Player’s First Bet Percentages by Holding
                          22    23   24   25   26    34   35   36    45    46
No Col  No Folding        100   1.3  1.9  3.5  6.2   3.0  6.2  11.6  11.6  24.5
        1st Player Folds  70.7  2.4  3.0  5.1  12.6  4.6  9.4  100   80.9  100
        2nd Player Folds  20.7  2.4  3.0  4.6  9.9   4.6  8.3  16.9  16.9  76.6
Col     No Folding        100   2.0  2.0  3.7  6.5   3.1  6.5  12.7  12.7  36.4
        1st Player Folds  30.2  2.5  3.1  5.9  13.8  5.4  9.9  100   68.6  100
        2nd Player Folds  46.6  2.5  3.1  5.4  12.1  4.8  9.3  28.5  41.5  100

Table 8: The equilibrium strategy for the third player’s first bet in 3P-MMH with and without weak collusion by the first two players. Notice that the third player tends to bet more aggressively under collusion except for on the hands 22 and 45. Also note that the hole card holdings not shown in the table are always bet on.
6 Conclusion

Although SMH and MMH are toy Hold’em variants, their trends in Nash equilibrium provide
heuristics that extend to the equilibrium strategies in more complicated poker games. Our results
from SMH and MMH characterize how game rules, player position, and betting history affect a
player’s optimal betting scheme. We find that poker variants with fewer betting opportunities
polarize the betting strategies towards a fold/shove model. Adding more rounds of betting makes
the equilibrium strategy more complicated such that all hands, even low-value hands, may be bet
on. The potential of future betting changes how players bet in earlier rounds even when they have
the same private and public information. We also find, regardless of variant, that players who bet
earlier tend to bluff more aggressively because their bet will affect more players’ choices. Along
these lines, the effectiveness of bluffing in general decreases later in the game, especially in games
where limited betting forces later bets to be a diminishing part of the total pot size. Poker games
with more players demand a more defensive betting strategy because bluffs are more likely to be called.
Collusion shifts the equilibrium strategies. In particular, colluding players aim to take an advan-
tage by coordinating their bets. Under strong collusion, colluding players bet more aggressively on
weak hands. Furthermore, strongly colluding players bet more on complementary hands that have
different win conditions, while they bet less frequently on hands that have the same win conditions.
Even when some of the colluding players fold, the remaining colluding players still gain a significant
advantage by knowing additional cards that will not appear later. In 3P-MMH, strongly colluding
players win over 17.5% of an ante (in expectation) each game even if their opponent knows they
are colluding. Under weak collusion, players coordinate their betting by playing extreme hands. In
other words, colluding players bet more aggressively on low-value and high-value hands, while they
bet more defensively on mid-value hands. As expected, we find that weakly colluding players have
no significant advantage when they collude publicly but can gain a slight advantage if they collude
in secret.
To study a variety of poker variants we built an abstraction to facilitate constructing poker game
trees and using fictitious play to find a Nash equilibrium. The implementation of this abstraction
was sufficient for considering SMH and MMH but suffers from complexity issues for bigger games.
In particular, we initially experimented with poker variants more complex than MMH, but many
of these exceeded the memory available to construct the game tree. Furthermore, some
game variants that did fit in memory took a prohibitively long time to simulate. Modifying the
best response algorithm to work in parallel is a stepping stone to make the implementation scale to
larger simulations.
There are many places to go from here. To start, it would be helpful to study poker variants
with more betting options — a facet of full-scale poker that we ignored. Perhaps our simplified
betting scheme explains why we observed player position correlating with expected payoff. Namely,
it intuitively makes sense that discretized betting would adversely affect earlier players. In addition,
it is worth exploring poker variants with more players. Not only does this more closely model large
table online games, but it may also identify relevant collusion trends. In particular, we could
empirically determine the changing effectiveness of collusion as the number of colluding players
increases. Other ideas include considering variants that use full fifty-two-card decks, have five-card
hands, or have more shared cards. Looking at any or all of these variants would help to further generalize our findings toward full-scale poker.
References
[1] Andrew W. Appel. Game theory analysis of hold ’em poker. Unpublished, July 2010.
[3] G. W. Brown. Iterative solutions of games by fictitious play. In Activity Analysis of Production
and Allocation, pages 374–376. Wiley, 1951.
[4] Drew Fudenberg and David K. Levine. The theory of learning in games, volume 2. MIT Press, 1998.
[5] Sam Ganzfried and Tuomas W. Sandholm. Computing equilibria in multiplayer stochastic
games of imperfect information. In IJCAI, 2009.
[6] Andrew Gilpin and Tuomas Sandholm. A texas hold’em poker player based on automated
abstraction and real-time equilibrium computation. In Proceedings of the Fifth International
Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS ’06, pages 1453–
1454, New York, NY, USA, 2006. ACM.
[7] Andrew Gilpin and Tuomas Sandholm. Lossless abstraction of imperfect information games.
J. ACM, 54(5), October 2007.
[8] Andrew Gilpin, Tuomas Sandholm, and Troels Bjerre Sørensen. A heads-up no-limit texas
hold’em poker player: Discretized betting models and automatically generated equilibrium-
finding programs. In Proceedings of the 7th International Joint Conference on Autonomous
Agents and Multiagent Systems - Volume 2, AAMAS ’08, pages 911–918, Richland, SC, 2008.
International Foundation for Autonomous Agents and Multiagent Systems.
[9] Daphne Koller, Nimrod Megiddo, and Bernhard von Stengel. Fast algorithms for finding randomized strategies in game trees. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC), 1994.
[10] Parisa Mazrooei, Christopher Archibald, and Michael Bowling. Automating collusion detection
in sequential games. In Marie desJardins and Michael L. Littman, editors, AAAI. AAAI Press,
2013.
[11] Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold’em poker is solved. Science, 347:145–149, 2015.
[12] Mark Whelan. Supervised machine learning for strategy-based abstraction with applications
to poker. 2014.
[13] Martin Zinkevich, Michael Bowling, Michael Johanson, and Carmelo Piccione. Regret mini-
mization in games with incomplete information. Technical report, 2007.