How to get a substring?
Jonathan M Davis
jmdavisProg at gmx.com
Sat Oct 26 19:27:56 PDT 2013
On Saturday, October 26, 2013 15:17:33 Ali Çehreli wrote:
> On 10/26/2013 02:25 PM, Namespace wrote:
> > On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
> >> Dumb Newbie Question: I've searched through the library reference, but
> >> I haven't figured out how to extract a substring from a string. I'd
> >> like something like string.substring("Hello", 0, 2) to return "Hel",
> >> for example. What method am I looking for? Thanks!
> >
> > Use slices:
> >
> > string msg = "Hello";
> > string sub = msg[0 .. 2];
>
> Yes but that works only if the string is known to contain only ASCII
> codes. (Otherwise, a string is a collection of UTF-8 code units.)
>
> I could not find a subString() function either but it turns out to be
> trivial to implement with Phobos:
>
> import std.range;
> import std.algorithm;
>
> auto subRange(R)(R s, size_t beg, size_t end)
> {
>     return s.dropExactly(beg).take(end - beg);
> }
>
> unittest
> {
>     assert("abcçdef".subRange(2, 4).equal("cç"));
> }
>
> void main()
> {}
>
> That function produces a lazy range. To convert it eagerly to a string:
>
> import std.conv;
>
> string subString(string s, size_t beg, size_t end)
> {
>     return s.subRange(beg, end).text;
> }
>
> unittest
> {
>     assert("Hello".subString(0, 2) == "He");
> }
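
As a quick illustration of the code-unit caveat quoted above, here is a minimal sketch (my own example, not from the quoted post) of what goes wrong when plain indices are used on a non-ASCII string. It should build with something like dmd -unittest -main -run on a file containing it.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : UTFException, validate;

unittest
{
    auto str = "abcçdef";

    // .length counts UTF-8 code units; walkLength decodes and counts code points.
    assert(str.length == 8);
    assert(str.walkLength == 7);

    // Indices 2 .. 4 cover 'c' plus only the first byte of 'ç',
    // so the resulting slice is not valid UTF-8.
    auto broken = str[2 .. 4];
    assertThrown!UTFException(validate(broken));

    // Including both bytes of 'ç' gives a valid slice.
    assert(str[2 .. 5] == "cç");
}
```
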
There's also std.utf.toUTFindex, which allows you to do

    auto str = "Hello";
    assert(str[0 .. str.toUTFindex(2)] == "He");

but you have to be careful with it when using anything other than 0 for the
first index, because you don't want it to have to traverse the string from the
start multiple times. With your Unicode example you're forced to do something
like

    auto str = "abcçdef";
    immutable first = str.toUTFindex(2);
    immutable second = str[first .. $].toUTFindex(2) + first;
    assert(str[first .. second] == "cç");

It also has the advantage of the final result being a string without having to
do any conversions. So, subString should probably be defined as

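    inout(C)[] subString(C)(inout(C)[] str, size_t i, size_t j)
        if(isSomeChar!C)  // isSomeChar comes from std.traits
    {
        import std.utf;

        // i and j are code point positions; convert them to code unit offsets.
        immutable first = str.toUTFindex(i);
        // Count the remaining j - i code points from first, not from the start.
        immutable second = str[first .. $].toUTFindex(j - i) + first;
        return str[first .. second];
    }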
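
To check the index arithmetic, here is a small self-contained sketch of the same idea restricted to plain string. The name sliceCodePoints is hypothetical (chosen so it does not clash with subString), and it assumes i <= j and that both are valid code point positions in str.

```d
import std.utf : toUTFindex;

// Hypothetical helper: i and j are code point positions with i <= j.
string sliceCodePoints(string str, size_t i, size_t j)
{
    // Code unit offset of the i-th code point.
    immutable first = str.toUTFindex(i);
    // Walk only the remaining j - i code points, starting at first.
    immutable second = str[first .. $].toUTFindex(j - i) + first;
    return str[first .. second];
}

unittest
{
    assert(sliceCodePoints("Hello", 0, 2) == "He");
    assert(sliceCodePoints("abcçdef", 2, 4) == "cç");
    assert(sliceCodePoints("abcçdef", 3, 3) == "");

    // The result is a slice of the original, not a copy.
    auto s = "abcçdef";
    assert(sliceCodePoints(s, 2, 4).ptr == s.ptr + 2);
}
```

The last assertion is the point of doing it this way: no new string is allocated, the result just refers to part of the input.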
Using drop/dropExactly with take/takeExactly makes more sense when you want to
iterate over the characters but don't need a string (especially if you're not
necessarily going to iterate over them all), but if you really want a string,
then finding the right index for the slice and then slicing is arguably better.
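
A rough side-by-side of the two approaches, as a sketch only (the variable names are mine): the lazy version yields a range of dchar that is decoded on demand, while the slicing version yields a string that still points into the original.

```d
import std.algorithm : equal;
import std.range : dropExactly, take, ElementType;
import std.utf : toUTFindex;

unittest
{
    auto str = "abcçdef";

    // Lazy: nothing is copied; code points are decoded as you iterate.
    auto lazyPart = str.dropExactly(2).take(2);
    static assert(!is(typeof(lazyPart) == string));
    static assert(is(ElementType!(typeof(lazyPart)) == dchar));
    assert(lazyPart.equal("cç"));

    // Slicing: find the code unit offsets once, then slice.
    immutable first = str.toUTFindex(2);
    immutable second = str[first .. $].toUTFindex(2) + first;
    string sliced = str[first .. second];
    assert(sliced == "cç");
    assert(sliced.length == 3); // 'c' is 1 code unit, 'ç' is 2 in UTF-8
}
```
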
- Jonathan M Davis