DnD Stat rolls and order statistics

Here's how Dungeons and Dragons stat scores are chosen - you roll four six-sided dice ("4d6"), drop the lowest score, and take the sum of the other three. What's the expected value of this sum? Let's think about it.

### Finding the expected value of a DnD stat roll

Let $X_1, X_2, X_3, X_4$ be random variables representing the number on each of the four dice rolled. We know that $\mathbb{E}[X_i] = 3.5$, and expectation is linear, so what we want should simply be $\mathbb{E}[X_1 + X_2 + X_3] = 10.5$, right?

Unfortunately, this is wrong, because we aren't taking _any_ three of the dice and adding them, we are instead taking the _top three_. This is not the same as adding $X_1, X_2$ and $X_3$ because those aren't necessarily the top three numbers rolled on the four dice.

To actually answer this question, let's define some new random variables, $Y_i$. To create these, we first roll the dice, and then put them in ascending order, assigning them to the $Y$s in order. So in our case, $Y_1 \le Y_2 \le Y_3 \le Y_4$, and the quantity that we want to find is $\mathbb{E}[Y_2 + Y_3 + Y_4]$. This is not the same as adding up just $X_1, X_2$ and $X_3$ because the $Y_i$s are no longer identically distributed (the distribution of $Y_4$, the maximum of four dice rolls, is going to be different from the distribution of $Y_1$, the minimum, for example).

Still, we can simplify this using two observations.

First, since the $Y_i$s are just a permutation of the $X_i$s, $Y_1 + Y_2 + Y_3 + Y_4 = X_1 + X_2 + X_3 + X_4$ and

$\mathbb{E}[Y_1 + Y_2 + Y_3 + Y_4] = \mathbb{E}[X_1 + X_2 + X_3 + X_4] = 14$.

Secondly, since expectation is linear even for dependent variables,

$\mathbb{E}[Y_2 + Y_3 + Y_4] = \mathbb{E}[Y_1 + Y_2 + Y_3 + Y_4] - \mathbb E[Y_1] = 14 - \mathbb E[Y_1]$.

So now, we just need to find the distribution of the minimum of the four dice rolls, $Y_1$. To do this, let's start by figuring out the probability that the minimum dice roll is a certain value.
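Before working out that distribution by hand, the two observations above can be checked by brute force. The following is only a minimal sketch, not part of the derivation: it enumerates all $6^4$ equally likely outcomes and compares exact expectations. The names `outcomes`, `expected`, `e_first_three`, `e_total`, `e_min` and `e_top_three` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 6^4 = 1296 equally likely outcomes of rolling four six-sided dice.
outcomes = list(product(range(1, 7), repeat=4))


def expected(f):
    """Exact expectation of f(outcome), with every outcome equally likely."""
    return sum(Fraction(f(o)) for o in outcomes) / len(outcomes)


e_first_three = expected(lambda o: o[0] + o[1] + o[2])  # E[X_1 + X_2 + X_3]
e_total = expected(sum)                                  # E[X_1 + X_2 + X_3 + X_4]
e_min = expected(min)                                    # E[Y_1]
e_top_three = expected(lambda o: sum(o) - min(o))        # E[Y_2 + Y_3 + Y_4]

print(e_first_three)                    # 21/2
print(e_total)                          # 14
print(e_top_three == e_total - e_min)   # True
print(e_top_three > e_first_three)      # True
```

Using `Fraction` keeps everything exact, so the equality check is not affected by floating-point rounding. The last line confirms that the naive answer of $10.5$ underestimates the top-three sum.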
Let's start by finding the probability of $Y_1$ being 6. In our sample space, there are $6^4 = 1296$ possible outcomes (since each of the four dice has six possible outcomes), and in only _one_ of those outcomes - 6,6,6,6 - is the minimum value equal to 6. So $\mathbb P[Y_1 = 6] = \frac1{1296}$.

Now what's $\mathbb P[Y_1 = 5]$? Well, for the minimum value to be five, we exclude all outcomes in which any die rolls below a five. This means that there are only two valid rolls for each die, 5 and 6. This gives us $2^4 = 16$ outcomes, but we also have to make sure that at least one of the dice rolled is a five. So we exclude the outcome 6,6,6,6 to get 15 outcomes.

$\therefore \mathbb P[Y_1 = 5] = \frac{15}{1296}$.

Now let's think about the general case, $\mathbb P[Y_1 = m]$ (where $m = 1, 2, \dots, 6$). For the minimum value to be $m$, none of the dice should roll below $m$. Thus the only "choices" that each die has are $m$ and above, which are $7-m$ choices in total, giving $(7-m)^4$ outcomes. But we have to exclude all the outcomes in which _all_ dice rolled more than $m$. We get this through a similar analysis; now all dice have "choices" $m+1$ and above, giving us $6-m$ choices each and $(6-m)^4$ outcomes. So the total number of outcomes in which the minimum is exactly $m$ is $(7-m)^4 - (6-m)^4$,

$\therefore \mathbb P[Y_1 = m] = \frac{(7-m)^4 - (6-m)^4}{6^4}$

**Sanity check.** Do our probabilities even sum to 1?

$\sum_{i=1}^6 \mathbb P[Y_1 = i] = \sum_{i=1}^6 \frac{(7-i)^4 - (6-i)^4}{6^4} = \sum_{i=1}^6 \frac{(7-i)^4}{6^4} - \sum_{i=1}^6 \frac{(6-i)^4}{6^4}$

This is a telescoping sum, but to avoid any ambiguities (and also because I think it's a nice trick that isn't often discussed when talking about telescoping sums), we can reindex both sums.
The numerators in the first sum go from $6^4$ to $1^4$, and the numerators in the second sum go from $5^4$ to $0^4$, so we can write this as

$\sum_{i=1}^6 \frac{i^4}{6^4} - \sum_{i=1}^5 \frac{i^4}{6^4} = \frac{6^4}{6^4} + \sum_{i=1}^5 \frac{i^4}{6^4} - \sum_{i=1}^5 \frac{i^4}{6^4} = 1$

_Whew!_ Finally onto finding the expected value. By definition, $\mathbb E[Y_1] = \sum_{i=1}^6 i\cdot\mathbb P[Y_1 = i]$. To simplify this further we use basically the same techniques as before, including re-indexing:

$\begin{align}
\mathbb E[Y_1] &= \sum_{i=1}^6 i\cdot \frac{(7-i)^4 - (6-i)^4}{6^4} \\
&= \sum_{i=1}^6 i\cdot \frac{(7-i)^4}{6^4} - \sum_{i=1}^6 i\cdot \frac{(6-i)^4}{6^4} \\
&= \sum_{i=1}^6 (7-i)\cdot \frac{i^4}{6^4} - \sum_{i=1}^5 (6-i)\cdot \frac{i^4}{6^4} \;\; \text{(re-indexing)}\\
&= 1\cdot \frac{6^4}{6^4} + \sum_{i=1}^5 (7-i)\cdot \frac{i^4}{6^4} - \sum_{i=1}^5 (6-i)\cdot \frac{i^4}{6^4} \\
&= 1 + \sum_{i=1}^5 ((7-i)-(6-i))\frac{i^4}{6^4} \\
&= 1+ \sum_{i=1}^5 \frac{i^4}{6^4} \\
&= 1 + \frac{979}{1296} \approx 1.755
\end{align}$

This gives us that the expected value of a stat roll is $14 - 1.755 = 12.245$.

Some observations: the expected value of the minimum here is just slightly above half the expected value of a single roll ($3.5$). Also, we don't need to do all this work again to find the expected value of the maximum - if we replace each roll $X$ by $7-X$, we get a one-to-one correspondence in which the maximum and the minimum are switched, so the expected value of the maximum is $\mathbb E[Y_4] = 7 - \mathbb E[Y_1] \approx 5.245$.

### Order Statistics

Generalizing from this, the **k-th order statistic** of a sample is defined as the k-th smallest value in that sample. Our variables from above, $Y_1, Y_2, Y_3, Y_4$, are the 1st, 2nd, 3rd and 4th order statistics of the distribution of $X$ when sampled four times.
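To make the definition concrete, here is a small sketch that sorts every possible outcome of the four dice and averages each sorted position, giving the exact expected value of each order statistic. The function name `order_statistic_expectations` and its parameters `sides` and `n` are chosen just for this example, and it is plain brute-force enumeration rather than a closed-form method.

```python
from fractions import Fraction
from itertools import product


def order_statistic_expectations(sides=6, n=4):
    """Exact E[Y_k] for k = 1..n, by enumerating all sides**n outcomes."""
    totals = [0] * n
    count = 0
    for outcome in product(range(1, sides + 1), repeat=n):
        # sorted(outcome)[k] is the (k+1)-th smallest value, i.e. Y_{k+1}.
        for k, value in enumerate(sorted(outcome)):
            totals[k] += value
        count += 1
    return [Fraction(t, count) for t in totals]


e = order_statistic_expectations()
print(e[0])         # 2275/1296, i.e. 1 + 979/1296
print(e[0] + e[3])  # 7
print(sum(e))       # 14
```

The first value matches $\mathbb E[Y_1] = 1 + \frac{979}{1296}$ from above, the second reflects the $X \mapsto 7-X$ symmetry between the minimum and the maximum, and the third is the total $\mathbb E[X_1 + X_2 + X_3 + X_4] = 14$.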
To adhere to conventional notation, I'll now use $X$ to refer to the random variable itself, and $X_{(k)}$ to refer to the random variable representing the k-th order statistic of $X$ (when sampled $n$ times).

Let's think about the case where $X$ is a variable that uniformly takes values from $1$ to $r$, and it is sampled $n$ times. This gives us $n$ order statistics $X_{(1)}, \dots, X_{(n)}$. We want to find the probability that $X_{(k)} = m$. This means that $k-1$ values are _at most_ as large as $m$, at least 1 value is _equal_ to $m$, and $n-k$ values are _at least_ as large as $m$.

### Arbitrary Discrete Random Variables

### Continuous Random Variables

And that was it for this exploration of order statistics! I stumbled upon them while thinking about the DnD dice roll problem, and found them interesting enough to look a bit deeper.

There's another related problem which I'll write about in a future post. It didn't arise through DnD, but just for fun, let me phrase it in DnD-adjacent terms - say you have to roll a 20-sided die ten times, and the game is to choose the highest roll of the die _as it is rolled_. That is, you have to stop the game when you think you already have the highest roll, before the rest of the rolls are made. I think it would be interesting to explore this question, and order statistics can help for related but harder questions like choosing the _median_ of the rolls, or doing any of these on a continuous distribution with no upper limit.