[phobos] UTF-8 string slicing

Jonathan M Davis jmdavisProg at gmx.com
Thu Aug 18 19:53:37 PDT 2011


On Thursday, August 18, 2011 13:21:29 unDEFER wrote:
> Hello!
> 
> D language specification says that it supports UTF-8 strings, but I can't
> find how to slice UTF-8 string by character index, not by bytes numbers.
> Why there is no simple slice function in std.utf like attached code?
> 
> Thank you in advance.

Hmmm. Such a function isn't entirely a bad idea, but it also makes me a bit 
nervous. Slicing is efficient. The slice function that you suggest is not. I 
mean, it's efficient enough for what it's doing, but it has to walk the string 
one code point at a time, so it's O(n) rather than O(1) like slicing is, and 
having a function called slice could be a bit misleading.
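
To make the cost difference concrete, here is a minimal sketch of slicing by 
code point index. The helper name sliceByCodePoint is hypothetical (it is not 
the attached code and not part of Phobos); it only assumes std.utf.toUTFindex, 
which converts a code point index into a code unit index by walking the string.

```d
import std.utf : toUTFindex;

// Hypothetical helper, not part of Phobos: slice by code point index.
// Finding the byte offsets means decoding from the start, so this is O(n).
string sliceByCodePoint(string str, size_t first, size_t last)
{
    assert(first <= last);
    // Byte offset of code point number `first`.
    immutable from = toUTFindex(str, first);
    // Continue from there instead of rescanning the whole string.
    immutable to = from + toUTFindex(str[from .. $], last - first);
    // The final slice is the ordinary O(1) slice by code units.
    return str[from .. to];
}

void main()
{
    string s = "привет, мир";
    assert(s.length == 20);                        // length counts UTF-8 code units
    assert(s[0 .. 12] == "привет");                // built-in slicing uses byte offsets
    assert(sliceByCodePoint(s, 0, 6) == "привет"); // same text by code point index
    assert(sliceByCodePoint(s, 8, 11) == "мир");
}
```

The result is still a string, but every call pays for the scan up to lastIndex.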

Once drop has been merged in, you'll be able to do this

auto s = takeExactly(drop(str, firstIndex), lastIndex - firstIndex);

to get the same effect. It may be worth adding such a function though. 
Certainly

auto s = slice(str, firstIndex, lastIndex);

is cleaner. If we add it though, then we should probably give it a different 
name. Maybe sliceByElementType? That does seem a bit long though, if accurate. 
We'd probably put it in std.range though rather than std.utf, since it could 
be useful for any range which isn't actually sliceable. And then there's the 
question of whether it would be better to make it lazy. The result wouldn't 
actually be a string anymore, but it would be more efficient for all of the 
cases where you don't actually end up using the whole slice.
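
As a rough sketch of what the generic, lazy variant might look like, assuming 
it is built on drop and takeExactly from std.range: the name 
sliceByElementType is only the one floated above, and no such function exists 
in Phobos. Note that drop still pops the leading elements up front; only the 
taking part is lazy.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.conv : to;
import std.range : drop, takeExactly;
import std.range.primitives : isInputRange;

// Hypothetical function, not part of Phobos. Works on any input range,
// including ranges that do not support slicing themselves.
auto sliceByElementType(R)(R range, size_t first, size_t last)
if (isInputRange!R)
{
    assert(first <= last);
    // drop skips `first` elements; takeExactly lazily yields the next ones.
    return takeExactly(drop(range, first), last - first);
}

void main()
{
    auto lazySlice = sliceByElementType("привет, мир", 8, 11);

    // For a string the result is a range of dchar, not a string.
    static assert(!is(typeof(lazySlice) == string));
    assert(equal(lazySlice, "мир"));

    // Getting a string back requires an explicit, eager conversion.
    string eager = lazySlice.array.to!string;
    assert(eager == "мир");
}
```

Callers who need a real string pay for the conversion at the end; callers who 
only iterate over part of the result never decode the rest.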

You can make a pull request for it if you want to, and the best way to handle 
it - as well as whether we actually want such a function - can be discussed in 
the pull request. I do think that some thought is going to have to go into 
what behavior we really want such a function to have though (as well as the 
best name for it).

- Jonathan M Davis

