Let's stop parser Hell

Jacob Carlborg doob at me.com
Wed Aug 1 11:24:22 PDT 2012


On 2012-08-01 19:50, Philippe Sigaud wrote:
> On Wed, Aug 1, 2012 at 5:45 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
>
>> "ウェブサイト"
>> "\u30A6\u30A7\u30D6\u30B5\u30A4\u30C8"
>>
>> The encoding of the source file is irrelevant.
>
> do you mean I can do:
>
> string field = "ウェブサイト";
>
> ?
>
> Geez, just tested it, it works. even writeln(field) correctly output
> the japanese chars and dmd doesn't choke on it.
> Bang, back to state 0: I don't get how D strings work.

Unicode defines three encodings: UTF-8, UTF-16 and UTF-32. All of them 
can represent every character in the Unicode standard. What differs is 
the size of the code unit (one, two or four bytes) and therefore how 
many bytes a single character occupies in the string. For example:

string str = "ö";

Since "string" is UTF-8 in D, the above character will take up two 
bytes in the string. On the other hand, this won't work:

char c = 'ö';

The reason is that the above character needs two bytes in UTF-8, but a 
"char" can only hold one byte (a single UTF-8 code unit). Therefore you 
need to store the character in a type where it fits, i.e. "wchar" or 
"dchar". Or you can use a string, which can hold as many bytes as 
needed.
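
To make that concrete, here is a small sketch of my own (not taken from 
the thread). It spells the characters with \u escapes so the result does 
not depend on how the source file is saved, and the variable names are 
just for illustration:

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // The same character (U+00F6) in the three string types.
    string  s8  = "\u00F6";   // UTF-8:  code units are 1 byte (char)
    wstring s16 = "\u00F6"w;  // UTF-16: code units are 2 bytes (wchar)
    dstring s32 = "\u00F6"d;  // UTF-32: code units are 4 bytes (dchar)

    // .length counts code units, not characters.
    writeln(s8.length);   // 2
    writeln(s16.length);  // 1
    writeln(s32.length);  // 1

    // A single char has no room for both UTF-8 code units, so the
    // compiler is expected to reject this line:
    // char c = '\u00F6';
    wchar w = '\u00F6';   // fits in one UTF-16 code unit
    dchar d = '\u00F6';   // fits in one UTF-32 code unit
    writeln(w == d);      // true

    // The Japanese word from the quoted mail: 6 characters,
    // each of which needs 3 bytes in UTF-8.
    string site = "\u30A6\u30A7\u30D6\u30B5\u30A4\u30C8";
    writeln(site.length);      // 18
    writeln(site.walkLength);  // 6
}
```

So writeln(field) works simply because the string holds the UTF-8 bytes 
and they are passed through unchanged; whether they display correctly is 
then up to the terminal.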

Don't know if that makes it clearer.

-- 
/Jacob Carlborg

