First Impressions

Fri Sep 29 17:03:02 PDT 2006

Johan Granberg wrote:
> Georg Wrede wrote:
> 
>> Wrong.
>>
>> And that's precisely what I meant about the Daddy holding bike 
>> allegory a few messages back.
>>
>> The current system seems to work "by magic". So, if you do go to 
>> China, it'll "just work".
>>
>> At this point you _should_ not believe me. :-) But it still works.
>>
>> ---
> 
> 
> But isn't this a needless source of confusion, one that could be 
> eliminated by defining char as "big enough to hold a Unicode code point", 
> or by something else that eliminates the possibility of incorrectly 
> splitting UTF sequences?
> 
> I will have to try using char[] with non-ASCII characters, though; I have 
> been using dchar for that up till now.

You might begin by pasting this and compiling it:

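// Save this file as UTF-8: D reads source files as Unicode text, so
// identifiers and string literals may contain non-ASCII letters.
import std.stdio;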

void main()
{
	int öylätti;
	int ШеФФ;

	öylätti = 37;
	ШеФФ = 19;

	writefln("Köyhyys 1 on %d ja nöyrä 2 on %d, että näin.", öylätti, ШеФФ);
}

It will compile and run just fine. The source file is read into DMD as 
a single big string, then goes through lexing (including comment 
removal), parsing, compiling and optimizing, and finally the variable 
names find their way into the executable. All this even though the 
front end simply handles the source as plain 8-bit char arrays 
throughout.

(You may then notice that the Windows command prompt window renders the 
output wrong, but that is only because the Windows console itself does 
not handle UTF-8 properly by default.)

The next thing you might do is write a grep program: one that takes a 
file as input and writes out the lines that match. Write the program as 
if you had never heard of this discussion. Then feed it the Kalevala in 
Finnish, or Mao's Little Red Book in Chinese. It should still work.
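A minimal sketch of such a grep, assuming present-day D (D2 with current Phobos, not the 2006 library); the helper name matches and the "grep PATTERN FILE" argument layout are my own illustrative choices. Note there is no UTF-specific code anywhere: it is a plain substring search on char arrays. That is safe because UTF-8 is self-synchronizing, so a valid UTF-8 pattern can only match a valid UTF-8 line starting and ending on character boundaries.

```d
import std.algorithm.searching : canFind;
import std.stdio : File, stderr, writeln;

// Hypothetical helper: plain substring test on UTF-8 text.
// Nothing here knows or cares about UTF.
bool matches(const(char)[] line, const(char)[] pattern)
{
    return line.canFind(pattern);
}

void main(string[] args)
{
    if (args.length != 3)
    {
        stderr.writeln("usage: grep PATTERN FILE");
        return;
    }
    // byLine yields each line as a char[] (without the trailing newline).
    foreach (line; File(args[2]).byLine())
    {
        if (matches(line, args[1]))
            writeln(line);
    }
}
```

Compiled with, say, dmd grep.d, it would be run as grep Väinämöinen kalevala.txt (the file name is only an example).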

As long as you don't start tampering with the individual octets in 
strings (say, slicing a char[] at an arbitrary byte offset), you should 
be just fine. Don't think about UTF and you'll prosper.
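To show what "tampering with the individual octets" looks like in practice, here is a small sketch, again assuming present-day D; isValidUtf8 is a hypothetical helper of mine built on std.utf.validate. A char[]'s .length counts UTF-8 code units (octets), not characters, so slicing at an arbitrary index can cut a character in half. A foreach with a dchar loop variable, on the other hand, decodes whole code points for you.

```d
import std.stdio : write, writeln;
import std.utf : count, validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    string s = "nöyrä";

    writeln(s.length);   // 7: .length counts UTF-8 code units (octets)
    writeln(count(s));   // 5: std.utf.count counts code points

    // Safe: foreach with a dchar loop variable decodes whole characters.
    foreach (dchar c; s)
        write(c, ' ');
    writeln();

    // Tampering: bytes 0..2 are 'n' plus only the first octet of 'ö'.
    writeln(isValidUtf8(s[0 .. 2]));   // false
    writeln(isValidUtf8(s[0 .. 3]));   // true: "nö" is complete
}
```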