[phobos] UTF-8 string slicing
Jonathan M Davis
jmdavisProg at gmx.com
Thu Aug 18 19:53:37 PDT 2011
On Thursday, August 18, 2011 13:21:29 unDEFER wrote:
> D language specification says that it supports UTF-8 strings, but I can't
> find how to slice UTF-8 string by character index, not by bytes numbers.
> Why there is no simple slice function in std.utf like attached code?
> Thank you in advance.
Hmmm. Such a function isn't entirely a bad idea, but it also makes me a bit
nervous. Slicing is efficient. The slice function that you suggest is not. I
mean, it's efficient enough for what it's doing, but it's not O(1) like slicing
is, so having a slice function could be a bit misleading.
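To make the cost difference concrete, here is a minimal sketch. It assumes std.utf.toUTFindex is available for translating a code point index into a code unit index; the sample string and the index values are made up for illustration.

```d
import std.utf : toUTFindex;

void main()
{
    // "héllo wörld", with the non-ASCII letters written as escapes
    string str = "h\u00E9llo w\u00F6rld";

    // Plain slicing indexes UTF-8 code units (bytes) and is O(1).
    assert(str[0 .. 1] == "h");

    // Translating code point indices to code unit indices has to walk
    // the string from the front, so it is O(n).
    size_t lo = toUTFindex(str, 1); // 1
    size_t hi = toUTFindex(str, 4); // 5, because the e-acute takes two code units
    assert(lo == 1 && hi == 5);

    // The result is still an ordinary string slice.
    assert(str[lo .. hi] == "\u00E9ll");
}
```

The final slice itself is cheap; all of the cost is in finding the two byte offsets.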
Once drop has been merged in, you'll be able to do this
auto s = takeExactly(drop(str, firstIndex), lastIndex - firstIndex);
to get the same effect. It may be worth adding such a function though.
auto s = slice(str, firstIndex, lastIndex);
is cleaner. If we add it though, then we should probably give it a different
name. Maybe sliceByElementType? That does seem a bit long though, if accurate.
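As a rough illustration of the drop/takeExactly combination mentioned above, the following sketch assumes a Phobos version in which std.range.drop is available. The sample string and indices are invented for the example.

```d
import std.algorithm : equal;
import std.array : array;
import std.conv : to;
import std.range : drop, takeExactly;

void main()
{
    string str = "h\u00E9llo w\u00F6rld"; // "héllo wörld"
    size_t firstIndex = 1, lastIndex = 4;

    // drop skips firstIndex code points; takeExactly wraps the next
    // (lastIndex - firstIndex) code points in a range.
    auto s = takeExactly(drop(str, firstIndex), lastIndex - firstIndex);
    assert(equal(s, "\u00E9ll"));

    // For a narrow string the result is a range of dchar, not a string.
    static assert(!is(typeof(s) == string));

    // One way to get a string back is to copy the elements out.
    string copy = to!string(array(s));
    assert(copy == "\u00E9ll");
}
```

Copying back to a string costs an allocation, which is part of why the choice of return type matters.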
We'd probably put it in std.range though rather than std.utf, since it could
be useful for any range which isn't actually sliceable. And then there's the
question of whether it would be better to make it lazy. It would make it so
that it wasn't actually a string anymore, but it would make it more efficient
for all of the cases where you don't actually end up using the whole slice.
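A lazy, range-generic version might look roughly like the sketch below. The helper is hypothetical: it is not a Phobos function, the name is simply the one floated above, and the signature and the assert are assumptions made for illustration.

```d
import std.algorithm : equal, filter;
import std.range : drop, iota, isInputRange, takeExactly;

// Hypothetical helper, not part of Phobos.
auto sliceByElementType(R)(R range, size_t first, size_t last)
if (isInputRange!R)
{
    assert(first <= last, "first must not exceed last");
    // drop pops the first `first` elements up front; takeExactly then
    // hands out the remaining elements lazily, one at a time.
    return takeExactly(drop(range, first), last - first);
}

void main()
{
    // A filtered range has no slicing operator of its own.
    auto evens = filter!(n => n % 2 == 0)(iota(0, 20));
    assert(equal(sliceByElementType(evens, 2, 5), [4, 6, 8]));

    // On a string it counts code points, not UTF-8 code units.
    assert(equal(sliceByElementType("h\u00E9llo", 1, 4), "\u00E9ll"));
}
```

Because nothing is copied, a caller who only looks at the first element or two never pays for the rest of the slice. The trade-off is that the result is no longer a string.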
You can make a pull request for it if you want to, and the best way to handle
it - as well as whether we actually want such a function - can be discussed in
the pull request. I do think that some thought is going to have to go into
what behavior we really want such a function to have though (as well as the
best name for it).
- Jonathan M Davis