Ceci n'est pas une char

Jari-Matti Mäkelä jmjmak at utu.fi.invalid
Fri Apr 7 02:41:54 PDT 2006


Thomas Kuehne wrote:
> Jari-Matti wrote:
>>> That's very true. A "normal" hard drive reads 60 MB/s. So,
>>> reading a 4 MB file takes at least 66 ms and a 1 MB UTF-8-file (only
>>> ASCII-characters) is read in 17 ms (well, I'm a bit optimistic here :).
>>> A modern processor executes 3 000 000 000 operations in a
>>> second. Going through the UTF-8 stream takes 1 000 000 * 10 (perhaps?)
>>> operations and thus costs 3 ms. So it's actually faster to read UTF-8.
> 
> 1) your sample: English (consider Chinese)
> 2) magic word: seek

Yes, I know. This was just an optimistic tongue-in-cheek analysis :)
A real-world example would naturally have a lot of non-ASCII characters
too, but the point is that reading huge loads of uncompressed UTF-32
data will usually be slower than reading UTF-8, even if we are also
checking for text corruption. I wonder if it's any faster to read
UTF-32 files from a transparently compressed reiser4 drive?
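
For anyone who wants to rerun the back-of-the-envelope numbers quoted
above, here is a minimal Python sketch. Everything in it is an
assumption taken from that estimate: 60 MB/s of disk throughput,
3 000 000 000 operations per second, and a guessed 10 operations per
byte for walking the UTF-8 stream. The function names are made up for
illustration, no CPU cost is counted for UTF-32, and seek time is
ignored entirely.

```python
# Rough estimate only; every constant is an assumption from the quoted text.
DISK_BYTES_PER_S = 60_000_000     # "normal" hard drive, 60 MB/s
CPU_OPS_PER_S = 3_000_000_000     # optimistic: 3 * 10^9 operations per second
OPS_PER_BYTE = 10                 # guessed cost of scanning one UTF-8 byte


def read_ms(n_bytes):
    """Milliseconds needed to stream n_bytes from disk (no seeks)."""
    return n_bytes / DISK_BYTES_PER_S * 1000


def scan_ms(n_bytes):
    """Milliseconds of CPU time to walk n_bytes of UTF-8."""
    return n_bytes * OPS_PER_BYTE / CPU_OPS_PER_S * 1000


def utf8_ms(n_chars, bytes_per_char):
    """Read plus scan time for text averaging bytes_per_char in UTF-8."""
    n_bytes = n_chars * bytes_per_char
    return read_ms(n_bytes) + scan_ms(n_bytes)


def utf32_ms(n_chars):
    """Read time for the same text stored as UTF-32 (4 bytes per char)."""
    return read_ms(4 * n_chars)


if __name__ == "__main__":
    chars = 1_000_000
    print(f"UTF-8, ASCII only:     {utf8_ms(chars, 1):.1f} ms")
    print(f"UTF-8, 3 bytes/char:   {utf8_ms(chars, 3):.1f} ms")
    print(f"UTF-32:                {utf32_ms(chars):.1f} ms")
```

The 3 bytes/char row is there for the "consider Chinese" objection:
most CJK characters take three bytes in UTF-8, so the gap narrows a lot
under these same assumptions. None of this is a measurement, of course.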

-- 
Jari-Matti



More information about the Digitalmars-d mailing list