Should this work?

Adam D. Ruppe destructionator at gmail.com
Thu Jan 9 17:16:12 PST 2014


On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
> So is it 'correct'?

Yes, with the caveat that it might match inside a combining 
sequence (like H followed by a combining accent code point). 
That's what byGrapheme is about: treating those sequences as one 
unit.

But meh, do you really care about that?

indexOf does correctly handle the UTF formats and returns an 
index suitable for slicing (or -1).

import std.string; // for indexOf

auto idx = "cool".indexOf("o"); // code unit index, or -1
if(idx == -1)
   throw new Exception("not found");

auto before = "cool"[0 .. idx];    // "c"
auto after = "cool"[idx + 1 .. $]; // "ol"


Code like that will always yield valid UTF strings. Again, it 
*might* split a base character from a combining mark that follows 
it, but it *will* correctly handle multi-byte code points... so 
probably good enough for 99% of use cases.
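
To make that concrete, here is a rough sketch. The sample strings 
are made up for illustration (they aren't from the thread), and 
it assumes a Phobos recent enough to have std.uni.byGrapheme:

```d
import std.range : walkLength;
import std.string : indexOf;
import std.uni : byGrapheme;
import std.utf : validate;

void main()
{
    // 'é' (U+00E9) is one code point but two UTF-8 code units.
    string s = "héllo wörld";
    auto idx = s.indexOf("w");
    assert(idx == 7); // counted in code units, not characters

    auto before = s[0 .. idx];
    auto after = s[idx + 1 .. $];
    validate(before); // throws if the slice is not valid UTF-8
    validate(after);
    assert(before == "héllo ");
    assert(after == "örld");

    // 'e' followed by U+0301 (combining acute accent):
    // two code points, one user-perceived character.
    string t = "e\u0301";
    assert(t.length == 3);                // UTF-8 code units
    assert(t.walkLength == 2);            // code points
    assert(t.byGrapheme.walkLength == 1); // graphemes

    // Searching for the accent alone finds it, and slicing there
    // still gives valid UTF-8, but the 'e' loses its accent.
    auto cut = t.indexOf("\u0301");
    assert(cut == 1);
    validate(t[0 .. cut]);
    assert(t[0 .. cut] == "e");
}
```

The slices are always valid UTF; the only thing that can go wrong 
is the accent getting separated from its base letter, which is the 
case byGrapheme exists for.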

> Looks like bytes, but then it talks

It counts code units: bytes (chars) on string, and wchars on 
wstring; it is whatever unit is correct for slicing the type you 
pass it.
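
A small sketch of what that means in practice (the sample text is 
my own, picked only because it has one non-ASCII letter in it):

```d
import std.string : indexOf;

void main()
{
    // Same text in UTF-8, UTF-16 and UTF-32.
    string  s8  = "naïve test";
    wstring s16 = "naïve test"w;
    dstring s32 = "naïve test"d;

    // 'ï' takes two code units in UTF-8 but one in UTF-16/32,
    // so the same match is reported at different indices.
    auto i8  = s8.indexOf("t");
    auto i16 = s16.indexOf("t"w);
    auto i32 = s32.indexOf("t"d);
    assert(i8 == 7);
    assert(i16 == 6);
    assert(i32 == 6);

    // Each index slices its own string type correctly.
    assert(s8[i8 .. $] == "test");
    assert(s16[i16 .. $] == "test"w);
    assert(s32[i32 .. $] == "test"d);
}
```

So the number is only meaningful for the string you got it from; 
don't carry an index from a string over to a wstring.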

> The D docs are pretty terrible, they don't do much to help you 
> find what you're looking for.

I mostly agree (and this is partially why I started writing 
http://dpldocs.info/, but I never finished it, so it isn't much 
better). I don't notice it so much because I already know where 
to look for most things, but regardless, I agree it is a pain for 
anything new.

