Today's programming challenge - How's your Range-Fu ?

Chris via Digitalmars-d digitalmars-d at puremagic.com
Sat Apr 18 04:52:50 PDT 2015


On Saturday, 18 April 2015 at 11:35:47 UTC, Jacob Carlborg wrote:
> On 2015-04-18 12:27, Walter Bright wrote:
>
>> That doesn't make sense to me, because the umlauts and the
>> accented e all have Unicode code point assignments.
>
> This code snippet demonstrates the problem:
>
> import std.stdio;
>
> void main ()
> {
>     dstring a = "e\u0301";
>     dstring b = "é";
>     assert(a != b);
>     assert(a.length == 2);
>     assert(b.length == 1);
>     writeln(a, " ", b); // writefln would treat `a` as a format string
> }
>
> If you run the above code all asserts should pass. If your 
> system correctly supports Unicode (works on OS X 10.10) the two 
> printed characters should look exactly the same.
>
> \u0301 is the "combining acute accent" [1].
>
> [1] http://www.fileformat.info/info/unicode/char/0301/index.htm

Yep, this was the cause of some bugs I had in my program. The
thing is, you never know whether a text is composed or
decomposed, so you have to be prepared for "é" to have length 1
or 2. On OS X these characters are decomposed by default, so if
you pipe an "é" (length=1) through the system, it automatically
becomes "e\u0301" (length=2). The same goes for file names on
OS X. I've had to find a workaround for this more than once.
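One common workaround (a sketch, not necessarily the one used in the program above) is to normalize both strings to the same Unicode normalization form before comparing them. When you need a user-visible length, count grapheme clusters instead of code points. This minimal example assumes a Phobos whose std.uni provides `normalize` and `byGrapheme`. The helper names `canonicallyEqual` and `graphemeLength` are hypothetical.

```d
import std.stdio : writeln;
import std.uni : normalize, NFC, NFD, byGrapheme;
import std.range : walkLength;

/// Hypothetical helper: compare after converting both sides to NFC,
/// so decomposed "e\u0301" and precomposed "\u00E9" compare equal.
bool canonicallyEqual(dstring a, dstring b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

/// Hypothetical helper: count user-perceived characters
/// (grapheme clusters) rather than code points.
size_t graphemeLength(dstring s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // Escapes are used so the result does not depend on how the
    // source file itself happens to be normalized.
    dstring decomposed = "e\u0301";  // 'e' + COMBINING ACUTE ACCENT
    dstring precomposed = "\u00E9";  // LATIN SMALL LETTER E WITH ACUTE

    assert(decomposed != precomposed);  // raw code points differ
    assert(canonicallyEqual(decomposed, precomposed));

    // Code-point lengths differ, grapheme counts agree.
    assert(decomposed.length == 2 && precomposed.length == 1);
    assert(graphemeLength(decomposed) == 1);
    assert(graphemeLength(precomposed) == 1);

    // NFD turns the precomposed form into the decomposed one.
    assert(normalize!NFD(precomposed) == decomposed);

    writeln("all checks passed");
}
```

NFC is an arbitrary choice here. Comparing in NFD works equally well, as long as both sides are normalized to the same form before the comparison.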

