How to get a substring?

Jonathan M Davis jmdavisProg at gmx.com
Sat Oct 26 19:27:56 PDT 2013


On Saturday, October 26, 2013 15:17:33 Ali Çehreli wrote:
> On 10/26/2013 02:25 PM, Namespace wrote:
> > On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
> >> Dumb Newbie Question: I've searched through the library reference, but
> >> I haven't figured out how to extract a substring from a string. I'd
> >> like something like string.substring("Hello", 0, 2) to return "Hel",
> >> for example. What method am I looking for? Thanks!
> > 
> > Use slices:
> > 
> > string msg = "Hello";
> > string sub = msg[0 .. 2];
> 
> Yes but that works only if the string is known to contain only ASCII
> codes. (Otherwise, a string is a collection of UTF-8 code units.)
> 
> I could not find a subString() function either but it turns out to be
> trivial to implement with Phobos:
> 
> import std.range;
> import std.algorithm;
> 
> auto subRange(R)(R s, size_t beg, size_t end)
> {
>      return s.dropExactly(beg).take(end - beg);
> }
> 
> unittest
> {
>      assert("abcçdef".subRange(2, 4).equal("cç"));
> }
> 
> void main()
> {}
> 
> That function produces a lazy range. To convert it eagerly to a string:
> 
> import std.conv;
> 
> string subString(string s, size_t beg, size_t end)
> {
>      return s.subRange(beg, end).text;
> }
> 
> unittest
> {
>      assert("Hello".subString(0, 2) == "He");
> }

There's also std.utf.toUTFindex, which allows you to do

    auto str = "Hello";
    assert(str[0 .. str.toUTFindex(2)] == "He");

but you have to be careful with it when using anything other than 0 for the 
first index, because each call walks the string from the start of whatever it 
is given, and you don't want to traverse the same code units more than once. 
With your Unicode example, you're forced to do something like

    auto str = "abcçdef";
    immutable first = str.toUTFindex(2);
    immutable second = str[first .. $].toUTFindex(2) + first;
    assert(str[first .. second] == "cç");
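
To make the "traverse multiple times" point concrete, here is a minimal sketch 
comparing the two ways of finding the end index. The variable names naive and 
second are only illustrative. Both give the same code-unit index, but the first 
form rescans the prefix that was already walked to compute first:

```d
import std.utf : toUTFindex;

void main()
{
    auto str = "abcçdef";

    // Code-unit index of code point 2 ('c').
    immutable first = str.toUTFindex(2);

    // Naive: walks the string again from index 0 up to code point 4.
    immutable naive = str.toUTFindex(4);

    // Incremental: walks only the two code points after first.
    immutable second = str[first .. $].toUTFindex(2) + first;

    assert(naive == second);
    assert(str[first .. second] == "cç");
}
```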

Slicing this way also has the advantage that the final result is a string, 
without having to do any conversions. So, subString should probably be defined as

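    import std.traits : isSomeChar;

    inout(C)[] subString(C)(inout(C)[] str, size_t i, size_t j)
        if(isSomeChar!C)
    {
        import std.utf;
        // Code-unit index of code point i.
        immutable first = str.toUTFindex(i);
        // Code-unit index of code point j, scanning only from first onward.
        immutable second = str[first .. $].toUTFindex(j - i) + first;
        return str[first .. second];
    }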
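
As a quick check, here is a self-contained sketch of that idea with a few 
assertions. It assumes the second lookup should advance by j - i code points, 
that i <= j, and that both indices are within the string (there is no bounds 
checking here). The tests in main are just illustrations, not anything from 
Phobos:

```d
import std.traits : isSomeChar;
import std.utf : toUTFindex;

// Returns code points [i, j) of str as a slice of the original array.
// Assumes i <= j and that both are valid code point indices for str.
inout(C)[] subString(C)(inout(C)[] str, size_t i, size_t j)
    if (isSomeChar!C)
{
    immutable first = str.toUTFindex(i);
    immutable second = str[first .. $].toUTFindex(j - i) + first;
    return str[first .. second];
}

void main()
{
    assert("Hello".subString(0, 2) == "He");
    assert("abcçdef".subString(2, 4) == "cç");

    // The same template covers UTF-16 and UTF-32 strings.
    assert("abcçdef"w.subString(2, 4) == "cç"w);
    assert("abcçdef"d.subString(2, 4) == "cç"d);

    // The result is a slice of the input, so nothing is copied.
    char[] buf = "abcçdef".dup;
    char[] sub = buf.subString(2, 4);
    assert(sub.ptr is buf.ptr + 2);
}
```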

Using drop/dropExactly with take/takeExactly makes more sense when you want to 
iterate over the characters but don't need a string (especially if you're not 
necessarily going to iterate over them all), but if you really want a string, 
then finding the right index for the slice and then slicing is arguably better.
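
For contrast, here is a small sketch of the lazy approach, reusing the subRange 
name from your post (with drop rather than dropExactly, which is my choice for 
the sketch). Nothing is allocated, and the loop can stop early without decoding 
the rest of the range:

```d
import std.algorithm : equal;
import std.range : drop, take;

// Lazily yields code points [beg, end) of s; no new string is built.
auto subRange(R)(R s, size_t beg, size_t end)
{
    return s.drop(beg).take(end - beg);
}

void main()
{
    auto r = "abcçdef".subRange(2, 6);
    assert(r.equal("cçde"));

    // Stop at the first non-ASCII code point; the remaining
    // elements of the range are never decoded.
    dchar found = 0;
    foreach (dchar c; r)
    {
        if (c > 0x7F)
        {
            found = c;
            break;
        }
    }
    assert(found == 'ç');
}
```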

- Jonathan M Davis

