Unicode handling comparison

Simen Kjærås simen.kjaras at gmail.com
Wed Nov 27 12:13:07 PST 2013


On 27.11.2013 19:07, Andrei Alexandrescu wrote:
> On 11/27/13 7:43 AM, Jakob Ovrum wrote:
>> On that note, I tried to use std.uni to write a simple example of how to
>> correctly handle this in D, but it became apparent that std.uni should
>> expose something like `byGrapheme` which lazily transforms a range of
>> code points to a range of graphemes (probably needs a `byCodePoint` to
>> do the converse too). The two extant grapheme functions,
>> `decodeGrapheme` and `graphemeStride`, are *awful* for string
>> manipulation (granted, they are probably perfect for text rendering).
>
> Yah, byGrapheme would be a great addition.

It shouldn't be hard to make, either:

import std.uni : Grapheme, decodeGrapheme;
import std.traits : isSomeString;
import std.array : empty;

struct ByGrapheme(T) if (isSomeString!T) {
     Grapheme _front;
     bool _empty;
     T _range;

     this(T value) {
         _range = value;
         // Prime _front with the first grapheme, or mark the range empty.
         popFront();
     }

     @property
     Grapheme front() {
         assert(!empty);
         return _front;
     }

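     void popFront() {
         assert(!empty);
         // Test the length directly: inside this struct the member property
         // `empty` can get in the way of the UFCS call `_range.empty`.
         _empty = _range.length == 0;
         if (!_empty) {
             // decodeGrapheme takes the string by ref and consumes one grapheme.
             _front = decodeGrapheme(_range);
         }
     }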

     @property
     bool empty() {
         return _empty;
     }
}

auto byGrapheme(T)(T value) if (isSomeString!T) {
     return ByGrapheme!T(value);
}

void main() {
     import std.stdio;
     string s = "তঃঅ৩৵பஂஅபூ௩ᐁᑦᕵᙧᚠᚳᛦᛰ¥¼Ññ";
     writeln(s.byGrapheme);
}
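
To show why iterating by grapheme matters, here is a minimal, separate sketch that counts the same string three ways: UTF-8 code units, code points, and graphemes. The helper name `countGraphemes` is made up for illustration; it only relies on `std.uni.decodeGrapheme` consuming one grapheme from a string passed by ref, and on `std.range.walkLength` iterating a string by code point.

```d
import std.range : walkLength;
import std.uni : decodeGrapheme;

// Hypothetical helper: counts user-perceived characters by decoding
// one grapheme cluster at a time until the string is exhausted.
size_t countGraphemes(string s) {
    size_t n = 0;
    while (s.length != 0) {
        decodeGrapheme(s); // advances s past one grapheme cluster
        ++n;
    }
    return n;
}

void main() {
    import std.stdio : writeln;
    // 'e' followed by U+0308 COMBINING DIAERESIS renders as one character.
    string s = "noe\u0308l";
    writeln(s.length);          // 6 UTF-8 code units
    writeln(s.walkLength);      // 5 code points
    writeln(countGraphemes(s)); // 4 graphemes
}
```

The three numbers differ because the combining mark takes two code units and one code point, but is part of the same grapheme as the preceding 'e'.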


-- 
   Simen

