Today's programming challenge - How's your Range-Fu ?

Sat Apr 18 09:01:21 PDT 2015

On 4/18/15 4:35 AM, Jacob Carlborg wrote:
> On 2015-04-18 12:27, Walter Bright wrote:
>
>> That doesn't make sense to me, because the umlauts and the accented e
>> all have Unicode code point assignments.
>
> This code snippet demonstrates the problem:
>
> import std.stdio;
>
> void main ()
> {
>      dstring a = "e\u0301";
>      dstring b = "é";
>      assert(a != b);
>      assert(a.length == 2);
>      assert(b.length == 1);
>      writeln(a, " ", b);
> }
>
> If you run the above code, all asserts should pass. If your system
> correctly supports Unicode (it works on OS X 10.10), the two printed
> characters should look exactly the same.
>
> \u0301 is the "combining acute accent" [1].
>
> [1] http://www.fileformat.info/info/unicode/char/0301/index.htm

Isn't this commonly solved with a normalization pass? We should have a
normalizeUTF() that can be inserted into a pipeline. Then the rest of
Phobos doesn't need to worry about these combining characters. -- Andrei
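
As a rough illustration of the normalization pass suggested above, here is a minimal sketch using std.uni.normalize, which Phobos provides. The normalizeUTF() named in the post is a proposed name, not an existing function. The helper canonicallyEqual is hypothetical and exists only for this example. It assumes that comparing the NFC forms of both strings is the behavior wanted.

```d
import std.stdio : writeln;
import std.uni : normalize, NFC, NFD;

// Hypothetical helper (not part of Phobos): compares two strings
// after bringing both to Normalization Form C.
bool canonicallyEqual(dstring lhs, dstring rhs)
{
    return normalize!NFC(lhs) == normalize!NFC(rhs);
}

void main()
{
    dstring a = "e\u0301"d; // 'e' followed by U+0301, the combining acute accent
    dstring b = "\u00E9"d;  // U+00E9, the precomposed form

    // The raw code point sequences differ, as in the quoted snippet.
    assert(a != b);
    assert(a.length == 2);
    assert(b.length == 1);

    // NFC composes the pair into the single precomposed code point.
    assert(normalize!NFC(a) == b);

    // NFD goes the other way and decomposes the precomposed form.
    assert(normalize!NFD(b) == a);

    // After a normalization pass the two strings compare equal.
    assert(canonicallyEqual(a, b));

    writeln(a, " ", b);
}
```

Note that normalize works on whole arrays rather than lazily on ranges, so in a pipeline it would sit at the point where the text is available as a string. Sequences that have no precomposed form stay multi-code-point even after NFC, so normalization alone does not guarantee one code point per visible character.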