Saturday, February 28, 2009

Lies, Damned Lies, and the Law of Large Number Statistics.

After reading Adam's excellent article about math-hammer, I got wondering: when does "noise" stop being the dominant component of dice rolling? Or put another way: how many dice do I have to roll to expect math-hammer calculations to come true? Damned statistics get me interested every time. (Yes, I'm a geek, as my wife keeps telling me. To be fair, my wife helped out with this posting as well - she likes a good puzzle as much as I do.) All of the following assumes fair dice are used (no practised rolling etc.).

Aside: eventually, I want to figure out when you know you're playing against practised rolling just by using statistics... I'm a long way from figuring that out yet.

What's the problem here? Well, as Adam pointed out more eloquently than I can, if we're rolling small numbers of dice, then we can't really expect the statistical average to be the outcome. Let me see if I can put that another way. If we rolled an infinite(!) number of d6's, then the average (statistical mean) would be expected to be 3.5. If I've only got 2 dice, then the mean might not be anywhere near 3.5. Mathematically speaking, this issue is really an exploration and application of the law of large numbers and the central limit theorem.

Law of large numbers.
For the interested, Wikipedia provides an excellent summary of this idea. In plain language, it states that the more dice you roll, the more your average will tend toward the expected long-term average.

But my question is: how many dice do I have to roll? To answer this, I must set a margin of tolerance that I'd be happy with. I'm going to (somewhat arbitrarily) set this at 5%. This means that once my average is within 5% of the long term (infinite number of dice) average of 3.5, I'll have found my answer. Since 5% of 3.5 is 0.175, I'm looking for my average to fall in the range of 3.675 to 3.325.

There's clearly an immediate problem. Suppose I roll a 3 on my first die and then a 4 on my second one. Perfect - my average is now 3.5 and within 5% of the expected long term average (also 3.5), but that is pure luck rather than convergence. So, I'm only going to stop rolling those dice when my average is within 5% of 3.5 for three consecutive rolls. This is my convergence condition assumption.

So, I was going to write an elegant compiled program to realize this idea, but my wife told me that Excel would do it. I caved in and used Excel. If you want to reproduce it, type the following into an Excel spreadsheet:
column A: =INT(6*RAND())+1
(This is a pseudo-random whole number from 1 to 6; drag to a large row number, say 1000 rows worth of this!).
column B: this column is going to be the same as the row number - just type it in a few times and drag downward. This is the iteration number (or how many times we've rolled the d6).
column C: square C2 is =A1+A2. square C3 is =C2+A3. square C4 is =C3+A4. Drag down from here (but not from square C2). This gives the running total of the rolls in squares A1 through A?. (where ?=row number).
column D: square D2 is =C2/B2. square D3 is =C3/B3. ...and so forth. This gives the running average. You're basically looking for this column being between 3.675 and 3.325 for three consecutive rows. (i.e. The convergence condition is met).

When column D is between 3.675 and 3.325 for three consecutive rows, we stop looking and take a note of column B, the row number (or iteration number).

I got 143. This means that it took 143 d6's to get my running mean to within the range 3.675 to 3.325 for three consecutive mean calculations.
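For anyone who would rather not drag spreadsheet columns around, here is a minimal Python sketch of the same procedure. It is not what I actually ran (that was the spreadsheet); the function name, the max_rolls safety cap and the optional rng argument are my own assumptions for illustration.

```python
import random

def rolls_until_converged(tolerance=0.05, streak_needed=3,
                          max_rolls=100_000, rng=None):
    """Roll a fair d6 until the running mean has been within `tolerance`
    of 3.5 for `streak_needed` consecutive rolls.

    Returns the number of rolls used, or None if max_rolls is reached
    (max_rolls is a hypothetical safety cap, not part of the spreadsheet).
    """
    rng = rng or random.Random()
    low = 3.5 * (1 - tolerance)    # about 3.325 for a 5% tolerance
    high = 3.5 * (1 + tolerance)   # about 3.675 for a 5% tolerance
    total = 0    # plays the role of column C (running total)
    streak = 0   # consecutive rolls with the running mean in range
    for n in range(1, max_rolls + 1):   # n plays the role of column B
        total += rng.randint(1, 6)      # column A: one d6 roll
        mean = total / n                # column D: running mean
        if low <= mean <= high:
            streak += 1
        else:
            streak = 0
        if streak == streak_needed:
            return n
    return None

print(rolls_until_converged())
```

Each call gives a different answer, of course, which is exactly the problem with quoting a single result.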

That is only 1 result, however. So, I've re-run this 100 times (which is where a compiled language would've helped over Excel). That gives a mean of 56.5 rolls with a standard deviation of 69.8 (56.5 +/- 69.8); median=38.

In doing so, there's a little issue: 3 consecutive means in the correct range might not indicate statistical convergence to 3.5. It might go out of that range once more with more die rolls.

So, I re-ran the re-run 100 times but looked for TEN consecutive means in the correct range (a more stringent convergence condition). That gave a mean of 83.8 +/- 64.9; median=68. That's a big standard deviation.
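Repeating the experiment by hand is the tedious part, so here is a rough Python sketch of the 100-trial re-runs for both the three-in-a-row and ten-in-a-row conditions. Again, this is an illustration and not the spreadsheet I used; the names summarize and rolls_until_converged, the seed argument and the max_rolls cap are assumptions of mine. Your numbers will differ from mine from run to run, so don't expect an exact match.

```python
import random
import statistics

def rolls_until_converged(streak_needed, rng, tolerance=0.05, max_rolls=100_000):
    """Number of d6 rolls until the running mean has stayed within
    `tolerance` of 3.5 for `streak_needed` consecutive rolls."""
    low, high = 3.5 * (1 - tolerance), 3.5 * (1 + tolerance)
    total = streak = 0
    for n in range(1, max_rolls + 1):
        total += rng.randint(1, 6)
        streak = streak + 1 if low <= total / n <= high else 0
        if streak == streak_needed:
            return n
    return None  # hypothetical safety cap reached

def summarize(streak_needed, trials=100, seed=None):
    """Repeat the experiment `trials` times and return
    (mean, sample standard deviation, median) of the roll counts."""
    rng = random.Random(seed)
    counts = [rolls_until_converged(streak_needed, rng) for _ in range(trials)]
    counts = [c for c in counts if c is not None]  # drop runs that hit the cap
    return statistics.mean(counts), statistics.stdev(counts), statistics.median(counts)

for streak in (3, 10):
    mean, sd, median = summarize(streak)
    print(f"{streak} consecutive: mean={mean:.1f} +/- {sd:.1f}; median={median}")
```

With only 100 trials the mean and standard deviation themselves bounce around quite a bit; bumping trials up to a few thousand should give steadier figures.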

Of course, I could also alter that 5% level to a 1% level... but that'd take a lot longer. I could go on about improvements to the methodology for a while but I'm not going to bore you.
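To get a feel for how much longer, here is a back-of-envelope sketch using the central limit theorem instead of my consecutive-rolls rule. The standard deviation of a single fair d6 is sqrt(35/12), about 1.71, so the mean of n dice has a standard error of roughly 1.71/sqrt(n). If (my assumption) we ask for about 95% of such means to land inside the tolerance band, i.e. 1.96 standard errors, we can solve for n. The function name dice_needed and the choice of 1.96 are mine.

```python
import math

def dice_needed(tolerance, z=1.96):
    """Approximate number of d6 whose mean lands within `tolerance`
    (as a fraction of 3.5) about 95% of the time, via the central
    limit theorem. z=1.96 is an assumed confidence multiplier."""
    sigma = math.sqrt(35 / 12)       # standard deviation of one fair d6
    half_width = 3.5 * tolerance     # e.g. 0.175 for a 5% tolerance
    return math.ceil((z * sigma / half_width) ** 2)

print(dice_needed(0.05))  # 366
print(dice_needed(0.01))  # 9147
```

This answers a different and stricter question than the spreadsheet does (how many dice before a single batch is reliably in range, rather than when the running mean first sits in range for a few rolls), so the numbers are not directly comparable with mine. The useful bit is the scaling: tightening the tolerance from 5% to 1% needs about 25 times as many dice.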

Had enough, or just skipped to this section? Let me put it this way: if you roll lots of dice, your average will tend toward 3.5. If you only roll a few, your previous meticulous calculations of how many wounds you're expecting to cause on that critical assault will not work out like you might think. Apocalypse anyone?


Andy said...

Whoa! I think I just burst a brain cell, I knew I should have waited until I was fully awake before reading about Math :-)
Seriously though, even though i'm new to 40k i've already found myself rolling dice to see how a certain character may work out in a game.

Adam Hunter said...

Thanks for the article plug.

Mathhammer is an interesting topic, because due to high Natural Rolling and Practiced Rolling it tends to become irrelevant.

However, when you're making a tough tactical decision about where to spread your firepower, it's worth basing things on mathhammer so you are aware of the odds with regards to making your decisions.

Normally I go on common sense. But sometimes when you don't have much firepower to spare and a lot to kill, statistics can really help when prioritising.

RonSaikowski said...

Mine are "twin-linked," will that matter?

jabberjabber said...

Hi Guys, thanks for the comments!

Andy: sorry! I didn't mean to bust your brain cells early in the morning! :)

Adam: I agree with you 100% on all your points.

Ron: Good question! I'm feeling another posting coming along...


