char and string with umlauts

Jim Danley jimdanley2 at gmail.com
Sat Oct 22 01:51:52 PDT 2011


My thanks to everyone who responded.  I learned something new, which is 
always a good thing, plus my program now works correctly!

Take care,

Jim

----- Original Message ----- 
From: "Jonathan M Davis" <jmdavisProg at gmx.com>
To: "digitalmars.D.learn" <digitalmars-d-learn at puremagic.com>
Sent: Thursday, October 20, 2011 8:19 PM
Subject: Re: char and string with umlauts


> On Thursday, October 20, 2011 09:48 Jim Danley wrote:
>> I have been a programmer for many years and started using D about one year
>> back. Suddenly, I find myself in unfamiliar territory. I need to use
>> Finnish umlauts in chars and strings, but they are not part of my usual
>> American ASCII character set.
>>
>> Can anyone point me in the right direction? I am getting "Invalid UTF-8
>> sequence" errors.
>
> I'd have to see code to really say much about what you're doing. But char is
> a UTF-8 code unit, wchar is a UTF-16 code unit, and dchar is a UTF-32 code
> unit. For UTF-8 and UTF-16, it can take multiple code units to make a single
> code point, and a code point is typically what you would consider to be a
> character (it's actually possible for one code point to alter another - e.g.
> add an accent or superscript to it - so a true character would be what is
> called a grapheme, but for the most part, you don't need to worry about that;
> at the moment, D doesn't do anything special to support graphemes). So, when
> you're operating on characters in D, you want to operate on dchars, not chars
> or wchars, because they're not necessarily complete characters. That's why
> range-based functions treat all strings as ranges of dchar, even if they're
> arrays of char or wchar (e.g. front returns a dchar, not a char or wchar).
> It's also why when iterating over a string with foreach, you want to specify
> the iteration type. e.g.
>
> foreach(dchar c; str)
>
> not
>
> foreach(c; str)
>
> The second form infers the element type (char for a string) and so iterates
> over the individual code units, which really isn't what you want.
> Basically, you pretty much never want to operate on an individual char or
> wchar. Always make sure that you operate on dchars when operating on
> individual characters.
>
> - Jonathan M Davis
> 
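
The difference between the two foreach forms described above can be sketched
as follows. This is a minimal illustration, not code from the thread: the
helper names countCodeUnits and countCodePoints and the sample word are
hypothetical, and the umlauts are written as \u escapes on the assumption
that the encoding of the source file cannot be relied on.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: counts iterations when walking by code unit (char).
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)   // one iteration per UTF-8 code unit
        ++n;
    return n;
}

// Hypothetical helper: counts iterations when walking by code point (dchar).
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)  // foreach decodes the UTF-8 sequence on the fly
        ++n;
    return n;
}

void main()
{
    // Sample word "hyvää"; \u00E4 is 'ä', which takes two UTF-8 code units.
    string word = "hyv\u00E4\u00E4";

    writeln(countCodeUnits(word));   // prints 7
    writeln(countCodePoints(word));  // prints 5
    writeln(word.length);            // prints 7 (length counts code units)
    writeln(word.walkLength);        // prints 5 (range view counts code points)
}
```

If the umlauts are typed directly into string literals, the source file
itself has to be saved as UTF-8; a file saved in another encoding is one
plausible cause of an "Invalid UTF-8 sequence" error, though that is a guess
without seeing the original code.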


