Wide character support in D
Nick Sabalausky
a at a.a
Tue Jun 8 15:33:31 PDT 2010
"Nick Sabalausky" <a at a.a> wrote in message
news:humfrk$2gk$1 at digitalmars.com...
> "Rainer Deyke" <rainerd at eldwood.com> wrote in message
> news:humes8$s8$1 at digitalmars.com...
>> On 6/8/2010 13:57, bearophile wrote:
>>> I hope we'll soon have computers with 200+ GB of RAM where using
>>> strings that use less than 32-bit chars is in most cases a premature
>>> optimization (like today is often a silly optimization to use arrays
>>> of 16-bit ints instead of 32-bit or 64-bit ints. Only special
>>> situations found with the profiler can justify the use of arrays of
>>> shorts in a low level language).
>>
>> Off-topic, but I don't need a profiler to tell me that my 1024x1024x1024
>> arrays should use shorts instead of ints. And even when 200GB becomes
>> common, I'd still rather not waste that memory by using twice as much
>> space as I have to just because I can.
>>
>>
>
> I think he was just musing that it would be nice to be able to ignore
> multiple encodings and multiple-code-units, and get back to something much
> closer to the blissful simplicity of ASCII. On that particular point, I
> concur ;)
>
Keep in mind, too, that for an English-language app (and there are plenty),
even using ASCII still wastes space, since you usually only need the 26
letters, 10 digits, a few whitespace characters, and a handful of
punctuation. You could probably fit that in 6 bits per character, less if
you're ballsy enough to use Huffman encoding internally. Yeah, there are
twice as many letters if you count uppercase/lowercase, but random casing is
rare, so there are tricks you can use to stick with just 26 plus maybe a few
special control characters. But, of course, nobody actually does any of that
because, with the amount of memory we have and the amount already used by
other parts of a program, the savings wouldn't be worth the bother.
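For what it's worth, the 6-bit idea is easy to sketch. Here's a rough
illustration (in Python, just for clarity -- the alphabet choice and the
`pack`/`unpack` helpers are made up for this example): 26 lowercase
letters, 10 digits, space, and period make 38 symbols, comfortably under
the 64 codes that 6 bits give you.

```python
# Hypothetical 6-bit text packing: 38 symbols < 64 codes (6 bits).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 ."
CODE = {ch: i for i, ch in enumerate(ALPHABET)}

def pack(text):
    """Pack text into bytes at 6 bits per character."""
    bits, nbits = 0, 0
    out = bytearray()
    for ch in text:
        bits = (bits << 6) | CODE[ch]
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)  # pad the final byte
    return bytes(out)

def unpack(data, length):
    """Recover `length` characters from packed bytes."""
    bits, nbits = 0, 0
    chars = []
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 6 and len(chars) < length:
            nbits -= 6
            chars.append(ALPHABET[(bits >> nbits) & 0x3F])
    return "".join(chars)

msg = "hello world 42."
packed = pack(msg)
# 15 chars * 6 bits = 90 bits -> 12 bytes instead of 15 for ASCII
assert len(packed) == 12
assert unpack(packed, len(msg)) == msg
```

That's a 25% saving over ASCII at the cost of bit-twiddling on every
access -- which is exactly why, as I said, nobody bothers.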
But I agree with your point too. Just saying.