Should this work?

Marco Leise Marco.Leise at gmx.de
Thu Jan 9 08:30:18 PST 2014


On Thu, 09 Jan 2014 15:20:13 +0000,
"John Colvin" <john.loughran.colvin at gmail.com> wrote:

> On Thursday, 9 January 2014 at 14:34:43 UTC, John Colvin wrote:
> > On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> >> This works fine:
> >>  string x = find("Hello", 'H');
> >>
> >> This doesn't:
> >>  string y = find(retro("Hello"), 'H');
> >>  > Error: cannot implicitly convert expression 
> >> (find(retro("Hello"), 'H'))
> >> of type Result!() to string
> >
> > In order to return the result as a string it would require an 
> > allocation. You have to request that allocation (and associated 
> > eager evaluation) explicitly
> >
> > string y = "Hello".retro.find('H').to!string;
> >
> >
> > However, I think to get the expected result from unicode you 
> > need
> >
> > string y = "Hello".byGrapheme.retro.find('H').to!string;
> >
> > but I might be wrong.
> 
> Oh. I see you actually wanted strrchr behaviour. That's different.

The point about graphemes is a good one. D's string functions
still stop halfway: from UTF-8 you can iterate UTF-32 code
points, but grapheme clusters are what users actually perceive
as characters. I.e. the basic need to iterate Unicode
_characters_ is not covered by the default string iteration!
I cannot even come up with use cases for working with code
points and think they are a conceptual black hole, something
carried over from a time when grapheme clusters didn't exist.

When you search for 'A', 'Ä' shows up if it is built from an
'A' followed by the combining diaeresis (the "two dots",
U+0308). Such an 'Ä' also has a walk length of 2.
This isn't an issue as long as we use strings from languages
that are traditionally well supported with single code point
characters.
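
A minimal sketch of the effect in D, assuming the decomposed
spelling of 'Ä' (the constant name `decomposed` is made up for
illustration). It only relies on canFind, walkLength and
std.uni.byGrapheme:

```d
import std.algorithm : canFind;
import std.range : walkLength;
import std.uni : byGrapheme;

// "Ä" spelled as 'A' followed by U+0308 COMBINING DIAERESIS.
// The name is hypothetical, chosen only for this example.
enum decomposed = "A\u0308";

void main()
{
    // Strings are iterated by code point, so the bare base letter matches.
    assert(decomposed.canFind('A'));

    // Three UTF-8 code units, two code points, one grapheme cluster.
    assert(decomposed.length == 3);
    assert(decomposed.walkLength == 2);
    assert(decomposed.byGrapheme.walkLength == 1);
}
```

With the precomposed form "\u00C4" the search for 'A' would not
match and the walk length would be 1, which is why the problem
stays hidden for many texts.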

Basically the element type when iterating over a string would
have to be another string of arbitrary length, since you could
attach any number of combining diacritical marks to a
letter. See?: e͜͟͡͞
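
As a rough sketch of what iterating with such variable-length
elements looks like today, std.uni.byGrapheme yields Grapheme
values that each hold one or more code points. The helper
`clusterSizes` below is hypothetical and only exists for this
example:

```d
import std.algorithm : equal;
import std.uni : byGrapheme;

// Hypothetical helper: number of code points in each grapheme cluster of s.
size_t[] clusterSizes(string s)
{
    size_t[] sizes;
    foreach (g; s.byGrapheme)
        sizes ~= g.length; // Grapheme.length counts code points
    return sizes;
}

void main()
{
    // 'e' carrying three combining marks, followed by a plain 'x'.
    assert(clusterSizes("e\u0301\u0308\u0323x").equal([4, 1]));

    // Precomposed 'Ä' (U+00C4): one code point, one cluster.
    assert(clusterSizes("\u00C4").equal([1]));
}
```

The first cluster has four code points here, and nothing stops
a text from stacking more marks on it.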

-- 
Marco


