[OT] Generating distribution of N dice rolls
Cabal Powel
cabalpowel at gmail.com
Mon Nov 21 10:44:48 UTC 2022
On Thursday, 10 November 2022 at 02:10:32 UTC, H. S. Teoh wrote:
> This is technically OT, but I thought I'd pick the smart brains
> here for my project, which happens to be written in D. ;-)
>
> Basically, I want to write a function that takes 2 uint
> arguments k and N, and simulates rolling N k-sided dice and
> counting how many 1's, 2's, 3's, ... k's were rolled. Something
> like this:
>
> uint[k] diceDistrib(uint k)(uint N)
> in(k > 0)
> in(N > 0)
> out(r; r[].sum == N)
> {
>     uint[k] result;
>     foreach (i; 0 .. N) {
>         result[uniform(0, k)]++;
>     }
>     return result;
> }
>
> The above code works and does what I want, but since N may be
> large, I'd like to refactor the code to loop over k instead of
> N. I.e., instead of actually rolling N dice and tallying the
> results, the function would generate the elements of the output
> array directly, such that the distribution of the array
> elements follow the same probabilities as the above code.
>
> Note that in all cases, the output array must sum to N; it is
> not enough to merely simulate the roll distribution
> probabilistically.
>
> Any ideas? (Or links if this is a well-studied problem with a
> known
> solution.)
>
> <ObDReference> I love how D's new contract syntax makes it so
> conducive to expressing programming problem requirements. ;-)
> </ObDReference>
>
>
> T
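As an aside on the quoted question itself: the output array being described is a draw from a multinomial distribution with N trials and k equally likely outcomes, and it can be sampled without looping over all N rolls (e.g. by drawing each count from a binomial on the rolls not yet assigned). A hedged Python sketch, using NumPy's multinomial sampler rather than D (a D translation could follow the same per-count binomial recurrence):

```python
import numpy as np

def dice_distrib(k: int, N: int) -> np.ndarray:
    """Counts of each of the k faces after N rolls of a fair k-sided die,
    drawn directly from the multinomial distribution (no O(N) loop)."""
    assert k > 0 and N > 0
    counts = np.random.multinomial(N, [1.0 / k] * k)
    assert counts.sum() == N  # mirrors the out-contract in the quoted D code
    return counts

print(dice_distrib(6, 1_000_000))
```

The counts always sum to exactly N, which is the hard requirement the question states; a naive per-face binomial with fixed p = 1/k would only satisfy it in expectation.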
If you have ever played the game of Catan you'll quickly realise
that when rolling two dice a sum of 7 comes up very often! This
raises a question:
What is the true probability of rolling a sum of 7 with two
6-sided dice? Moreover, what is the probability of rolling a sum
of any number with n 6-sided dice?
Here is what I did to find the answers.
Experiments (rolling dice)
Let's first run a few dice experiments (i.e. simply roll some
dice). Luckily, we can use computers to produce millions of
dice-roll simulations in a few minutes instead of rolling the dice
ourselves for years.
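As a concrete illustration (my own sketch, not the original experiment code), such an experiment might look like this in Python:

```python
import random
from collections import Counter

def roll_sums(n_dice: int, trials: int) -> Counter:
    """Roll n_dice fair 6-sided dice `trials` times and tally the sums."""
    tally = Counter()
    for _ in range(trials):
        tally[sum(random.randint(1, 6) for _ in range(n_dice))] += 1
    return tally

# Estimate P(sum == 7) with two dice; the exact value is 6/36 ≈ 0.1667.
tally = roll_sums(2, 100_000)
print(tally[7] / 100_000)
```

Normalising the tallies by the number of trials gives the empirical probability distribution plotted in Figure 1.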
Just by eye-balling the experimental data in Figure 1 we can see
a familiar shape emerging as the number of dice increases, a bell
curve, also known as a normal or Gaussian distribution. We can
work with this, and try to fit a Gaussian to the experimental
data. A Gaussian distribution is mathematically expressed as

    P(x) = 1/(σ√(2π)) · e^(−(x − μ)²/(2σ²))

The two parameters μ and σ correspond to the mean and the
standard deviation of the probability distribution; they define
the central position and the width of the bell curve.
But what are the best values of these parameters to fit our
n-dice experimental data? Well, we can infer the most likely
parameter values via statistical analysis. We can see from Figure
1 that the probability distributions are symmetric. Using this
symmetry we can estimate the means of the experimental data by
simply locating the maximum position of each distribution.
Figure 2 below is a plot of these maximum positions for an
increasing number of dice. It can be seen from the figure that a
linear correlation exists between the mean μ and the number of
dice n with a line of best fit of μ=3.5n.
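The fitted μ = 3.5n agrees with theory: a fair 6-sided die has mean (1 + 2 + … + 6)/6 = 3.5, and expectations add across independent dice. A quick check (my own, not from the post):

```python
# Mean of one fair 6-sided die; the mean of the sum of n dice is n times this.
single_mean = sum(range(1, 7)) / 6
print(single_mean)  # 3.5
for n in (1, 4, 17):
    print(n, single_mean * n)  # μ(n) = 3.5 n
```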
Now with values for the means we can use the method of least
squares to find the values for the standard deviation σ that
correspond to the best fitting Gaussians to the experimental data.
Method of least squares (identifying the most likely σ values)
The method of least squares defines a metric for comparing how
similar two sets of data are. This metric is known as the mean
squared error (MSE), which is mathematically expressed as

    MSE = (1/n) · Σᵢ (Xᵢ − P(xᵢ))²

The MSE is the mean squared difference between the experimental
data (Xᵢ) and the Gaussian fit (P(xᵢ)) for a given σ value, where
n here denotes the number of bins in the histogram data and i
denotes each bin. Finding the σ value that minimises the MSE
therefore corresponds to minimising the difference between the
Gaussian fit and the experimental data, i.e. the best fit. Figure
3 shows the minimisation process for identifying the most likely
parameter values for four dice (which were μ = 14 and σ = 3.36).
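The minimisation itself can be sketched as a simple grid search over σ (my own reconstruction; for brevity it fits against the exact four-dice distribution rather than an experimental histogram):

```python
import math
from collections import Counter
from itertools import product

def exact_sum_dist(n_dice: int) -> dict:
    """Exact P(sum = s) for n_dice fair 6-sided dice, by full enumeration."""
    tally = Counter(sum(r) for r in product(range(1, 7), repeat=n_dice))
    total = 6 ** n_dice
    return {s: c / total for s, c in tally.items()}

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mse(dist, mu, sigma):
    return sum((p - gaussian(x, mu, sigma)) ** 2 for x, p in dist.items()) / len(dist)

dist = exact_sum_dist(4)   # four dice: sums 4..24, mean 14
mu = 14.0
# Grid search: pick the sigma in [2.00, 4.99] that minimises the MSE.
best_sigma = min((s / 100 for s in range(200, 500)),
                 key=lambda sg: mse(dist, mu, sg))
print(best_sigma)
```

The minimising σ lands close to the post's fitted value of 3.36 (and close to the theoretical √(4 · 35/12) ≈ 3.42).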
Using the mean squared error to infer the most likely σ value for
a range of dice counts, we can plot σ as a function of the number
of dice n. Figure 4 shows the plot of σ(n); the experimental data
traces out a √n-shaped curve. Again using the method of least
squares, we can find the best-fitting √n curve to the
experimental data, which gives σ(n) = 1.75√n.
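The fitted 1.75√n is close to what theory predicts: a fair 6-sided die has variance (6² − 1)/12 = 35/12, and variances add across independent dice, so the sum of n dice has standard deviation √(35n/12) ≈ 1.708√n. A quick check of the single-die value (my own, not from the post):

```python
import math

faces = range(1, 7)
mean = sum(faces) / 6                              # 3.5
var = sum((f - mean) ** 2 for f in faces) / 6      # 35/12 ≈ 2.9167
print(var, math.sqrt(var))                         # sigma ≈ 1.7078
```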
So, given n dice we can now use μ(n) = 3.5n and σ(n) = 1.75√n to
predict the full probability distribution for any arbitrary
number of dice n. Figures 5 and 6 below show these fits for
n = 1 to n = 17.
We have so far shown that with a fairly quick empirical effort we
can predict the data remarkably well using Gaussians. However,
despite the success in fitting the larger-n cases, the Gaussian
fit is still only an approximation. This is why the fit is worse
for smaller n, and why the first two cases (1 and 2 dice) are not
captured very well by the Gaussian approximation. Let's see if we
can derive the exact solution for the probability of a given
total sum for n dice!
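One way to get that exact solution (a sketch of my own, not taken from the post) is repeated convolution: the pmf of the sum of n dice is the pmf of n − 1 dice convolved with the single-die pmf. Using exact rational arithmetic:

```python
from collections import defaultdict
from fractions import Fraction

def sum_pmf(n_dice: int, sides: int = 6) -> dict:
    """Exact pmf of the sum of n_dice fair dice, by repeated convolution."""
    pmf = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = defaultdict(Fraction)
        for s, p in pmf.items():
            for face in range(1, sides + 1):
                nxt[s + face] += p / sides
        pmf = dict(nxt)
    return pmf

two = sum_pmf(2)
print(two[7])  # 1/6 — the Catan observation: 7 is the most likely sum
```

This runs in O(n² · sides) time rather than enumerating all 6ⁿ outcomes, and it reproduces the exact answer to the opening question: P(sum = 7 with two dice) = 6/36 = 1/6.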
Hope this will help you.
More information about the Digitalmars-d mailing list