# Applying Expectimax to Yahtzee

January 17, 2020

## Introduction

One of the games that my family likes to play when we’re waiting around for something to come out of the oven is Yahtzee. Yahtzee is a dice game where players take turns trying to complete a set of roll combinations with five dice. Each turn, you get three rolls: an initial roll and two rerolls, from which you can hold back any dice you wish to save. The highest scoring roll is a Yahtzee, which is a five of a kind. Other scoring combinations include the Small and Large Straight, of four and five in a row respectively, and the Full House, which is two of one and three of another, as in poker.

{{< figure src="/img/yahtzee/score_sheet.png" caption="Figure 1: The full score sheet for a game of Yahtzee" width="450px" >}}

Despite being a simple luck-based game, Yahtzee has some basic strategic elements, which beginners quickly pick up. For example, the top section of the score sheet introduces a nonlinearity into the scoring: if your top-section total reaches 63 points, which corresponds to averaging three-of-a-kind or better across the six numbers, you receive a bonus of 35 points. This makes completing the top section in good form a priority during most games, since most of the lower section can be completed more or less opportunistically.

If, on your turn, you fail to complete any of the combinations, you can elect to score the sum of your dice in your Chance row, or, if that is used up, take a zero in any other row. Taking a zero is often used to buy an extra turn to complete the top section for the bonus points. It's sometimes wise to score your Yahtzee row for zero points in the hopes of rolling three sixes the next turn.

Yahtzee seems like a game that a program should be able to play capably. All the probabilities involved are easy to calculate, and the state space is not *that* large, assuming we can reduce the \(6^5 = 7776\) distinct rolls by exploiting symmetry. A decent Yahtzee-playing agent should exhibit at least some of the strategies described above.
More specifically, we want to build a bot that can:

- Roll coherently: save dice in a way that points to a goal combination for the turn.
- Score coherently: pick the highest-scoring row among options of roughly equal probability.
- Plan ahead, to the extent the game supports it. For example, scoring `55556` as the Four of a Kind for 26 points is probably not worth it compared to racking up 20 points on your 5s and getting a good cushion for the top section bonus points.
- Tactically retreat: taking 0 on the Yahtzee row is correct commonly enough that we should expect to see this behavior from the bot, too.

## Expectimax

Our Yahtzee playing bot is going to use a classic algorithm: Expectimax. Expectimax is a tree search algorithm, which means that it treats the game as a tree of possible playouts. Each state of the game is a node of the tree, and the job of the algorithm is to determine the expected value to the player of every game state that it explores. By exploring the tree to the fullest extent of our ability, our bot will take into account as many outcomes as possible, and compute an accurate estimate of its expected score. Then, it will simply make the decision that gives the highest expected score.

Games like chess and checkers are deterministic, in the sense that there is no randomness involved. In these games, the value of a given state of the game is fully determined, assuming that both players play “perfectly”. For a random game like Yahtzee, we have to account for the fact that a given decision can have multiple results, depending on how the dice fall. Therefore, expectimax treats game states as falling into two different classes:

- Deterministic nodes, where it is up to the player to make a decision. The value of a deterministic node is equal to the maximum value of the possible future states, since the player will choose the highest-scoring decision available.
- Chance nodes, where the next outcome is determined randomly. The algorithm gives these nodes a value equal to the *average* of the possible next outcomes.

The expectimax algorithm is a recursive search over the game tree, using these two node types as a template. It is common to apply a maximum depth to the game tree, past which the algorithm terminates its search and applies a heuristic evaluation to the game state. The pseudocode for expectimax, then, looks something like this:

```
def expectimax(state, depth):
    if depth == MAX_DEPTH:
        return heuristic_evaluation(state)
    child_scores = [expectimax(next_state, depth + 1) for next_state in state.children]
    if state.is_deterministic:
        return max(child_scores)
    else:
        return mean(child_scores)
```

In our Yahtzee implementation, all chance nodes represent rolls of the dice. A deterministic node occurs immediately *after* each roll, and requires the agent to make a choice about whether to hold the dice or reroll. A chance node occurs immediately after each decision, and is evaluated randomly. In the figure below, each roll leads to a deterministic node for each possible outcome, represented with boxes; and each decision node leads to a chance node for every possible decision at that point, represented with ovals.

{{< figure src="/img/yahtzee/tree.png" >}}

With the structure of the game broken down into states and the two different kinds of transitions between them, deterministic and random, we are ready to write a first version of our future Yahtzee world champion bot. We'll be using Rust.

## A First Attempt

As a first attempt to apply the expectimax algorithm to Yahtzee, let’s define a pared-down version of the game which only involves the upper section. That is, the score sheet consists of six rows, each of which can be optionally filled in with a score. When all six of the rows are filled, the game is done:

```
struct Scoresheet {
    ones: Option<u8>,
    twos: Option<u8>,
    threes: Option<u8>,
    fours: Option<u8>,
    fives: Option<u8>,
    sixes: Option<u8>,
}
```

The entire game state comprises a score sheet plus the current turn. A turn is simply a `Roll`, marked as the first, second, or third chance of that turn. We'll model `Chance` as an enum, and `Roll` as an array of five dice. Since dice only go up to six, we can use the `u8` primitive type:

```
enum Chance {
    First,
    Second,
    Third,
}

struct Roll([u8; 5], Chance);
```

We’re almost able to define our `Game` struct, which will encapsulate the entire state of a game of Yahtzee. The last thing we need is a way to represent which dice are held out from a roll. Our implementation will distinguish between the state of a game immediately *before* a roll, when the set of dice to hold has been decided, and the state of a game immediately *after* a roll, when no decision to reroll or score has yet been made. The reasoning is that, per the above high-level description of expectimax, chance nodes are treated differently from deterministic nodes. The state immediately before a roll is a chance node, while the state immediately after is a deterministic decision point.

The last two structs we need, then, are as follows:

```
struct Hold([bool; 5]);

struct Game {
    scores: Scoresheet,
    roll: Roll,
    hold: Option<Hold>,
}
```

With these domain models in place, we can implement a rudimentary version of the expectimax algorithm. In the interest of brevity, I have omitted some glue code and combinatorial functions which yield the set of all possible decisions or rolls. The core of our implementation is very similar to the above pseudocode.

### Expecting the Maximum

Our expectimax function accepts a game state, and an integer telling it how far down the game tree this invocation is. If the game is done, or if we have reached the maximum depth, then we refrain from recursing further, and instead hand the job off to our heuristic evaluation function.

```
fn expectimax(game: &Game, depth: u32) -> f64 {
    if depth == MAX_DEPTH || game.completed() {
        return heuristic_evaluation(game);
    }
```

Our heuristic evaluation assumes a likely score of `2.5 * n` for each empty row `n`, plus the points already scored on filled rows. Therefore, the heuristic evaluation converges to the actual score as the game progresses. There is substantial room for improvement here, but it is sufficient to motivate the algorithm to try for 3 or more dice on each row, which is the baseline for a successful top section in a full game.

Here is the main body of the `expectimax` function. Whether a game state is deterministic or random is determined by whether there is an active hold of the dice: if there is, we must be about to roll the rest of the dice.

```
    match game.hold {
        // We just rolled, and have to decide whether to reroll or to score our dice
        None => {
            let all_options = game.options();
            let (_, _, expectation) = best_option(game, depth + 1, all_options);
            expectation
        },
        // We made a decision to reroll, and are looking for the expected value,
        // averaged over all possible roll outcomes.
        Some(hold) => {
            let rerolls = game.roll.rerolls(&hold);
            let stats: StatsCounter = rerolls.into_iter()
                .map(|roll| expectimax(&game.with_roll(roll), depth + 1))
                .collect();
            stats.avg()
        },
    }
}
```

Note that we are making use of a cool feature of Rust: iterator adapters. Our `StatsCounter` struct can be constructed from an iterator of floats. It automatically computes both the average and variance of the stream by accumulating a count, a sum, and a sum of squares.

### Weighing Options

Along with the `expectimax` function, a core piece of the algorithm is the function which determines the best option from among several. This has been factored out of `expectimax` because it's also used to actually drive the game play. As we would expect, it calls back into `expectimax` to determine the best possible play. To serve both callers, it returns a triple of `(decision, game, score)`. Of these, `expectimax` only cares about the score.

```
/// Return the best option from a list of decisions, along with the game
/// resulting from that decision, and its expectimax score.
fn best_option(
    game: &Game,
    depth: u32,
    options: Vec<Decision>,
) -> (Decision, Game, f64) {
    assert!(!options.is_empty(), "Empty list of options");
    options.into_iter()
        .map(|decision| {
            let resulting_game = game.resulting(decision);
            let expectation = expectimax(&resulting_game, depth);
            (decision, resulting_game, expectation)
        })
        .max_by(|(_, _, e1), (_, _, e2)| {
            e1.partial_cmp(e2).expect("Tried to compare a NaN")
        })
        .unwrap()
}
```

### Playing the Game

Once we have the `expectimax` and `best_option` functions defined, actually running the game is just a few lines of code. While the game is not yet complete, we execute the best available decision. The outer loop needs to print out a description of the decision taken, and update the actual game state to the result of that decision:

```
pub fn play_game(max_depth: u32) -> Game {
    let mut rng: SmallRng = SmallRng::from_entropy();
    let mut game = Game::new(&mut rng);
    while !game.done() {
        // Make the best available decision
        let options = game.options();
        let (best, resulting_game, _) = best_option(&game, 0, options);
        // Print out the decision
        println!("{}", best.describe(&game));
        game = resulting_game;
        if game.hold.is_some() {
            // Make a roll
            game = game.roll(&mut rng);
        }
    }
    game
}
```

## A Promising Start

A typical game with this first implementation looks like this:

```
⚅⚅⚄⚂⚃
Rerolling: ⚅⚅⚄⚂⚃
Rerolling: ⚅⚅⚄⚅⚂
Scoring sixes ⚅⚅⚃⚅⚄
| 1s | 2s | 3s | 4s | 5s | 6s |
| _ | _ | _ | _ | _ | 18 |
Rerolling: ⚁⚄⚃⚅⚂
Rerolling: ⚅⚄⚅⚀⚁
Scoring ones ⚁⚂⚄⚀⚃
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | _ | _ | _ | _ | 18 |
Rerolling: ⚄⚃⚁⚃⚅
Rerolling: ⚃⚃⚀⚃⚁
Scoring fours ⚃⚃⚄⚃⚀
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | _ | _ | 12 | _ | 18 |
Rerolling: ⚂⚂⚂⚃⚃
Rerolling: ⚂⚂⚂⚂⚅
Scoring threes ⚂⚂⚂⚂⚅
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | _ | 12 | 12 | _ | 18 |
Rerolling: ⚁⚅⚅⚁⚁
Rerolling: ⚁⚄⚀⚁⚁
Scoring twos ⚁⚄⚅⚁⚁
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | 6 | 12 | 12 | _ | 18 |
Rerolling: ⚄⚂⚀⚃⚄
Rerolling: ⚄⚅⚂⚅⚄
Scoring fives ⚄⚅⚅⚀⚄
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | 6 | 12 | 12 | 10 | 18 |
```

We certainly haven’t mastered the game. But our bot already demonstrates a very basic grasp of tactics. In the very first turn of this example game, it holds the pair of sixes, and attempts to roll some more. In the second turn, it first holds the five, and when it fails to roll another, switches to holding the 1 in order to minimize the damage.

While this is somewhat promising, we’ve actually already run up against the limits of this data model. At a search depth of 3, our expectimax implementation is examining upwards of 250 million leaf nodes. Granted, this takes just a few seconds per turn, but the growth is exponential in the tree depth. In order to do better, we need to do two things:

- Take advantage of the symmetries inherent in the game to eliminate some branches of the tree, or
- Memoize tree evaluations, so that we aren’t wasting work traversing the same branches more than once.

Looked at from a certain point of view, memoization is just a different way of taking advantage of symmetry. Paths through the tree which arrive at the same state will be deduplicated with a good memoization strategy. But comparing and deduplicating two game states has a nonzero runtime cost. It would be even better if we could keep the two symmetric paths from both appearing at all, by being more clever about the way we generate game states.

In the next post, we’ll improve our domain modeling to take better advantage of the information that the game of Yahtzee actually uses, and extend the depth of our tree search much farther.