[OT] The horizon of a stream

Nigel Sandever nigelsandever at btconnect.com
Sun Oct 26 05:59:38 PDT 2008


On Sun, 26 Oct 2008 03:39:50 -0400, bearophile <bearophileHUGS at lycos.com> wrote:
> Nigel Sandever:
> 
> >I did try that (using md5), but the penalty in Perl was horrible,<
> 
> This is a D newsgroup, so use D, it allows you to manage bits more efficiently.
> 

Sorry. No disrespect meant to D. I always prototype in Perl and then convert to 
C or D if I need performance. I'm just more familiar with Perl.

> 
> >I used (a slightly modified version of) 2of12inf available from<
> 
> That's a quite complex file, so I suggest something simpler, as this after a cleaning of the non ASCII words:
> http://www.norvig.com/big.txt

I don't see what's "complex" about a one-word-per-line, 81536-line dictionary
file.

Nor how having everyone clean up Conan Doyle would be any simpler.

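If you have Perl, you can produce a suitable test file from any one-word-per-line
dictionary with the command line below. It slurps the whole dictionary (-0777),
splits it into words on newlines (-F), and prints 4e8 words picked uniformly at
random, with replacement, one per line (-l12):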

    perl -l12n0777aF'/\n/' -ne'print $F[rand @F] for 1..4e8' yourdict >thedata

With the 2of12inf dictionary file, 4e8 words produce a file a little under 4GB in
around 10 minutes. YMWV: the output size is roughly the word count times the
average line length (newline included) of your local dictionary.

Of course, the files won't be identical to mine or anyone else's, but given their
random nature, the results will be broadly comparable.
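For anyone without Perl, a rough Python sketch of what the one-liner does follows. It is not the original author's code. It assumes the dictionary can be read as raw bytes and split on line breaks (`bytes.splitlines`, so CRLF files work too), and it draws words uniformly with replacement via Python's `random` module instead of Perl's `rand`. The names `make_test_file`, `chunk` and `seed` are hypothetical. One possible advantage is the optional `seed`: with the same seed and Python version, two people would get byte-identical files rather than merely comparable ones.

```python
import random


def make_test_file(dict_path, out_path, n_words, seed=None, chunk=1_000_000):
    """Write n_words words, each drawn uniformly at random (with replacement)
    from a one-word-per-line dictionary, to out_path, one word per line.

    Hypothetical helper sketching what the Perl one-liner does; the names
    and the chunk size are illustrative assumptions, not from the original.
    """
    rng = random.Random(seed)  # same seed -> same sequence of words

    # Slurp the whole dictionary and split it into words on line breaks,
    # roughly like Perl's -0777 combined with -F/\n/.
    with open(dict_path, "rb") as f:
        words = f.read().splitlines()
    if not words:
        raise ValueError("dictionary is empty")

    with open(out_path, "wb") as out:
        remaining = n_words
        while remaining > 0:
            k = min(chunk, remaining)
            # Draw k words at once and write them newline-terminated,
            # avoiding one write() call per word.
            batch = rng.choices(words, k=k)
            out.write(b"\n".join(batch) + b"\n")
            remaining -= k
```

Calling `make_test_file("yourdict", "thedata", 400_000_000)` mirrors the `4e8` in the Perl command. Writing in chunks keeps memory bounded while avoiding per-word call overhead; the chunk size does not change which words are drawn.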

My D foo is too rusty to try and write that in D, especially for a D audience :)
I'm sure one of you guys can knock that up in the blink of an eye.


> 
> Bye,
> bearophile






More information about the Digitalmars-d mailing list