[OT] All your medians are belong to me
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Mon Nov 21 09:39:40 PST 2016
Hey folks, I'm working on a paper on fast median computation, and
https://issues.dlang.org/show_bug.cgi?id=16517 came to mind. I see the
Google ngram corpus lists occurrences of n-grams per year. Is the data
aggregated across all years available somewhere? I'd like to compute, e.g.,
"the word (1-gram) with the median frequency across all English books"
so I don't need the frequencies per year, only totals.
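
To make the computation concrete, here's a rough D sketch, assuming the
aggregated totals already sit in an associative array from 1-gram to total
count (the words and counts below are invented, not real corpus figures):

import std.algorithm.iteration : map;
import std.algorithm.sorting : topN;
import std.array : array;
import std.stdio : writeln;
import std.typecons : tuple;

void main()
{
    // Invented aggregated totals: 1-gram -> count summed over all years.
    auto totals = [
        "the": 22_038_615UL, "of": 12_545_825UL, "median": 31_271UL,
        "quickselect": 214UL, "corpus": 5_874UL
    ];

    // Pair each word with its total so we can select by count.
    auto pairs = totals.byKeyValue
                       .map!(kv => tuple(kv.key, kv.value))
                       .array;

    // topN reorders pairs so that pairs[mid] holds the element with the
    // median count (a selection, not a full sort of the vocabulary).
    immutable mid = pairs.length / 2;
    topN!((a, b) => a[1] < b[1])(pairs, mid);

    writeln("median-frequency 1-gram: ", pairs[mid][0],
            " (", pairs[mid][1], " occurrences)");
}

Phobos' topN already does the selection without sorting the whole
vocabulary; any corpus in word -> total form would plug straight into this.
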
Of course I can download the entire corpus and then do some processing,
but that would take a long time.
Also, if you can think of any large corpus that would be pertinent for
median computation, please let me know!
Thanks,
Andrei