[OT] Generating distribution of N dice rolls
Cabal Powel
cabalpowel at gmail.com
Mon Nov 21 10:44:48 UTC 2022
On Thursday, 10 November 2022 at 02:10:32 UTC, H. S. Teoh wrote:
> This is technically OT, but I thought I'd pick the smart brains
> here for my project, which happens to be written in D. ;-)
>
> Basically, I want to write a function that takes 2 uint
> arguments k and N, and simulates rolling N k-sided dice and
> counting how many 1's, 2's, 3's, ... k's were rolled. Something
> like this:
>
> uint[k] diceDistrib(uint k)(uint N)
> in(k > 0)
> in(N > 0)
> out(r; r[].sum == N)
> {
>     uint[k] result;
>     foreach (i; 0 .. N) {
>         result[uniform(0, k)]++;
>     }
>     return result;
> }
>
> The above code works and does what I want, but since N may be
> large, I'd like to refactor the code to loop over k instead of
> N. I.e., instead of actually rolling N dice and tallying the
> results, the function would generate the elements of the output
> array directly, such that the distribution of the array
> elements follow the same probabilities as the above code.
>
> Note that in all cases, the output array must sum to N; it is
> not enough to merely simulate the roll distribution
> probabilistically.
>
> Any ideas? (Or links if this is a well-studied problem with a
> known
> solution.)
>
> <ObDReference> I love how D's new contract syntax makes it so
> conducive to expressing programming problem requirements. ;-)
> </ObDReference>
>
>
> T
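As an aside on the quoted question itself: the output array being described is a draw from a multinomial distribution with N trials and k equally likely outcomes, and it can be sampled without looping over all N rolls (e.g. by drawing each count from a binomial on the rolls not yet assigned). A hedged Python sketch, using NumPy's multinomial sampler rather than D (a D translation could follow the same per-count binomial recurrence):

```python
import numpy as np

def dice_distrib(k: int, N: int) -> np.ndarray:
    """Counts of each of the k faces after N rolls of a fair k-sided die,
    drawn directly from the multinomial distribution (no O(N) loop)."""
    assert k > 0 and N > 0
    counts = np.random.multinomial(N, [1.0 / k] * k)
    assert counts.sum() == N  # mirrors the out-contract in the quoted D code
    return counts

print(dice_distrib(6, 1_000_000))
```

The counts always sum to exactly N, which is the hard requirement the question states; a naive per-face binomial with fixed p = 1/k would only satisfy it in expectation.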
If you have ever played the game of Catan you'll quickly realise
that when rolling two dice a sum of 7 comes up very often! This
raises a question:
What is the true probability of rolling a sum of 7 with two
6-sided dice? Moreover, what is the probability of rolling a sum
of any number with n 6-sided dice?
Here is what I did to find the answers.
Experiments (rolling dice)
Let's first run a few dice experiments (i.e. simply roll some
dice). Luckily, we can use computers to produce millions of
dice-roll simulations in a few minutes instead of rolling the dice
ourselves for years.
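As a concrete illustration (my own sketch, not the original experiment code), such an experiment might look like this in Python:

```python
import random
from collections import Counter

def roll_sums(n_dice: int, trials: int) -> Counter:
    """Roll n_dice fair 6-sided dice `trials` times and tally the sums."""
    tally = Counter()
    for _ in range(trials):
        tally[sum(random.randint(1, 6) for _ in range(n_dice))] += 1
    return tally

# Estimate P(sum == 7) with two dice; the exact value is 6/36 ≈ 0.1667.
tally = roll_sums(2, 100_000)
print(tally[7] / 100_000)
```

Normalising the tallies by the number of trials gives the empirical probability distribution plotted in Figure 1.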
Just by eye-balling the experimental data in Figure 1 we can see
a familiar shape emerging as the number of dice increases, a bell
curve, also known as a normal or Gaussian distribution. We can
work with this, and try to fit a Gaussian to the experimental
data. A Gaussian distribution is mathematically expressed as

    P(x) = 1/(σ√(2π)) · e^(−(x − μ)²/(2σ²))

The two parameters μ and σ correspond to the mean and the
standard deviation of the probability distribution; they define
the central position and the width of the bell curve.
But what are the best values of these parameters to fit our
n-dice experimental data? Well, we can infer the most likely
parameter values via statistical analysis. We can see from Figure
1 that the probability distributions are symmetric. Using this
symmetry we can estimate the means of the experimental data by
simply locating the maximum position of each distribution.
Figure 2 below is a plot of these maximum positions for an
increasing number of dice. It can be seen from the figure that a
linear correlation exists between the mean μ and the number of
dice n with a line of best fit of μ=3.5n.
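The fitted μ = 3.5n agrees with theory: a fair 6-sided die has mean (1 + 2 + … + 6)/6 = 3.5, and expectations add across independent dice. A quick check (my own, not from the post):

```python
# Mean of one fair 6-sided die; the mean of the sum of n dice is n times this.
single_mean = sum(range(1, 7)) / 6
print(single_mean)  # 3.5
for n in (1, 4, 17):
    print(n, single_mean * n)  # μ(n) = 3.5 n
```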
Now with values for the means we can use the method of least
squares to find the values for the standard deviation σ that
correspond to the best fitting Gaussians to the experimental data.
Method of least squares (identifying the most likely σ values)
The method of least squares defines a metric for comparing how
similar two sets of data are. This metric is known as the mean
squared error (MSE), which is mathematically expressed as

    MSE = (1/n) · Σᵢ (Xᵢ − P(xᵢ))²

The MSE is the mean squared difference between the experimental
data (Xᵢ) and the Gaussian fit (P(xᵢ)) for a given σ value, where
n here denotes the number of bins in the histogram data and i
denotes each bin. Finding the σ value that minimises the MSE
therefore corresponds to minimising the difference between the
Gaussian fit and the experimental data, i.e. the best fit. Figure
3 shows the minimisation process for identifying the most likely
parameter values for four dice (which were μ = 14 and σ = 3.36).
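The minimisation itself can be sketched as a simple grid search over σ (my own reconstruction; for brevity it fits against the exact four-dice distribution rather than an experimental histogram):

```python
import math
from collections import Counter
from itertools import product

def exact_sum_dist(n_dice: int) -> dict:
    """Exact P(sum = s) for n_dice fair 6-sided dice, by full enumeration."""
    tally = Counter(sum(r) for r in product(range(1, 7), repeat=n_dice))
    total = 6 ** n_dice
    return {s: c / total for s, c in tally.items()}

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mse(dist, mu, sigma):
    return sum((p - gaussian(x, mu, sigma)) ** 2 for x, p in dist.items()) / len(dist)

dist = exact_sum_dist(4)   # four dice: sums 4..24, mean 14
mu = 14.0
# Grid search: pick the sigma in [2.00, 4.99] that minimises the MSE.
best_sigma = min((s / 100 for s in range(200, 500)),
                 key=lambda sg: mse(dist, mu, sg))
print(best_sigma)
```

The minimising σ lands close to the post's fitted value of 3.36 (and close to the theoretical √(4 · 35/12) ≈ 3.42).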
Using the mean squared error to infer the most likely σ value for
a range of dice counts, we can plot σ as a function of the number
of dice n. Figure 4 shows the plot of σ(n); the experimental data
traces out a √n-shaped curve. Again using the method of least
squares, we can find the best-fitting √n curve to the
experimental data, which gives σ(n) = 1.75√n.
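The fitted 1.75√n is close to what theory predicts: a fair 6-sided die has variance (6² − 1)/12 = 35/12, and variances add across independent dice, so the sum of n dice has standard deviation √(35n/12) ≈ 1.708√n. A quick check of the single-die value (my own, not from the post):

```python
import math

faces = range(1, 7)
mean = sum(faces) / 6                              # 3.5
var = sum((f - mean) ** 2 for f in faces) / 6      # 35/12 ≈ 2.9167
print(var, math.sqrt(var))                         # sigma ≈ 1.7078
```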
So, given n dice we can now use μ(n) = 3.5n and σ(n) = 1.75√n to
predict the full probability distribution for any arbitrary
number of dice n. Figures 5 and 6 below show these fits for
n = 1 to n = 17.
We have so far shown that with a fairly quick empirical effort we
can predict the data remarkably well using Gaussians. However,
despite the success in fitting the larger-n cases, the Gaussian
fit is still only an approximation. This is why the fit is worse
for smaller n, and why the first two cases (1 and 2 dice) are not
captured very well by the Gaussian approximation. Let's see if we
can derive the exact solution for the probability of a given
total sum for n dice!
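One way to get that exact solution (a sketch of my own, not taken from the post) is repeated convolution: the pmf of the sum of n dice is the pmf of n − 1 dice convolved with the single-die pmf. Using exact rational arithmetic:

```python
from collections import defaultdict
from fractions import Fraction

def sum_pmf(n_dice: int, sides: int = 6) -> dict:
    """Exact pmf of the sum of n_dice fair dice, by repeated convolution."""
    pmf = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = defaultdict(Fraction)
        for s, p in pmf.items():
            for face in range(1, sides + 1):
                nxt[s + face] += p / sides
        pmf = dict(nxt)
    return pmf

two = sum_pmf(2)
print(two[7])  # 1/6 — the Catan observation: 7 is the most likely sum
```

This runs in O(n² · sides) time rather than enumerating all 6ⁿ outcomes, and it reproduces the exact answer to the opening question: P(sum = 7 with two dice) = 6/36 = 1/6.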
Hope this will help you.
More information about the Digitalmars-d mailing list