Python-like slicing and handling UTF-8 strings as a bonus

monarch_dodra monarchdodra at gmail.com
Sun Dec 30 01:49:10 PST 2012


On Sunday, 30 December 2012 at 00:02:17 UTC, FG wrote:
> On 2012-12-30 00:03, Peter Alexander wrote:
>> On Saturday, 29 December 2012 at 22:25:35 UTC, FG wrote:
>>> Forgive me if such a function already exists -- I couldn't 
>>> find it.
>>
>> std.range has drop and take, which work on code points, not 
>> code units. They
>> also handle over-dropping or over-taking gracefully. For 
>> example:
>>
>> string s = "okrągły stół";
>> writeln(s.drop(8).take(3)); // "stó"
>> writeln(s.drop(8).take(100)); // "stół"
>> writeln(s.drop(100).take(100)); // ""
>>
>
> Ah, so this is the way of doing it. Thanks.
>
>
>> It doesn't support negative indexing.
>
> At least dropping off the back is also possible, e.g. for s[2..$-5]:
>
>     writeln(s.retro.drop(5).retro.drop(2)); // "rągły"
>
>     (or with dropBack, without retro, if available)

dropBack is available IFF retro is available (AFAIK both just 
require a bidirectional range).
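A small sketch comparing the two spellings on FG's string. It assumes a Phobos recent enough to ship dropBack in std.range next to drop and retro; the helper names trimViaRetro and trimViaDropBack are hypothetical. Both versions pop code points, so they agree on non-ASCII input and both return a plain string.

//----
import std.range : drop, dropBack, retro;
import std.stdio : writeln;

// Hypothetical helpers: drop `front` code points from the start and `back`
// code points from the end, i.e. the code-point version of s[front .. $-back].
string trimViaRetro(string s, size_t front, size_t back)
{
    // retro(retro(x)) hands back the original string, so this is a string.
    return s.retro.drop(back).retro.drop(front);
}

string trimViaDropBack(string s, size_t front, size_t back)
{
    // Same thing without the double retro.
    return s.dropBack(back).drop(front);
}

void main()
{
    string s = "okrągły stół";
    writeln(trimViaRetro(s, 2, 5));    // rągły
    writeln(trimViaDropBack(s, 2, 5)); // rągły
}
//----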

> I have no idea how to do s[$-4..$-2] though.

But as a general rule, building a range out of only the first (or 
last) elements of a non-RA range runs into the limitation that 
ranges can "only shrink". Strings are a special case: as ranges 
they are neither random-access nor sliceable, yet you can still 
index and slice them (by code unit)...
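To make the "index and slice by code unit" point concrete, here is a minimal sketch on the same Polish string. It only uses the built-in array operations plus std.range.walkLength; the counts in the comments follow from 'ą', 'ł' and 'ó' each taking two bytes in UTF-8.

//----
import std.range : walkLength;
import std.stdio : writeln;

enum string s = "okrągły stół";

void main()
{
    // .length, indexing and slicing all work on UTF-8 code units (bytes)...
    writeln(s.length);     // 16
    // ...while range functions such as walkLength decode code points.
    writeln(s.walkLength); // 12

    // 'ą' takes two code units, so the slice ending after it stops at 5.
    // Slicing at 4 instead would cut 'ą' in half and yield invalid UTF-8.
    writeln(s[0 .. 5]);    // okrą
}
//----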

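Anyway, you can always get creative with length:

//----
s = "hello world";
// dropBack pops code points, but .length counts code units, which is
// exactly the offset needed to slice the original string: s[$-4..$-2].
auto sub = s[s.dropBack(4).length .. s.dropBack(2).length]; // "or"
//----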

In this particular example it is a bit suboptimal (the tail of 
the string is walked twice), but quite frankly, I'd assume 
readability trumps performance for this kind of code, and it is 
what I'd use in my final code.
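The same length trick generalizes to the Python-style s[from:to] from the subject line, with negative indices counting from the end. This is a sketch, not a Phobos function: the name pySlice is hypothetical, indices are in code points, and out-of-range indices are clamped because drop/dropBack stop at an empty range rather than failing.

//----
import std.range : drop, dropBack;
import std.stdio : writeln;

// Hypothetical helper: Python-style s[from:to] counted in code points.
// Negative indices count from the end; out-of-range ones are clamped.
string pySlice(string s, long from, long to)
{
    // Convert a code-point index into a code-unit offset into s.
    size_t offset(long i)
    {
        return i < 0 ? s.dropBack(cast(size_t) -i).length
                     : s.length - s.drop(cast(size_t) i).length;
    }

    immutable a = offset(from);
    immutable b = offset(to);
    // Like Python, an empty result when from is not before to.
    return a < b ? s[a .. b] : s[0 .. 0];
}

void main()
{
    string s = "okrągły stół";
    writeln(pySlice(s, 2, -5));  // rągły
    writeln(pySlice(s, -4, -2)); // st
    writeln(pySlice(s, 8, 100)); // stół
}
//----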

One last thing: keep in mind that drop/take are linear on 
strings, since they decode one code point at a time. If you are 
handling Unicode, then everything is linear anyway, so I'm not 
saying these functions are slow or anything; just don't forget 
they aren't the O(1) operations you'd get with ASCII.
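If the same text is sliced many times, one option (a different technique from the drop/take approach above) is to pay the linear decoding cost once by converting to dstring, where each element is a whole code point. A sketch, assuming the extra memory (4 bytes per code point) is acceptable; note that code points are still not user-perceived characters when combining marks are involved.

//----
import std.conv : to;
import std.stdio : writeln;

void main()
{
    string s = "okrągły stół";

    // One linear pass decodes the whole string into UTF-32 code points...
    dstring d = s.to!dstring;

    // ...after which indexing and slicing are O(1) per code point,
    // so s[$-4..$-2] in code points can be written directly.
    writeln(d.length);                // 12
    writeln(d[$ - 4 .. $ - 2]);       // st
    writeln(d[2 .. $ - 5].to!string); // rągły (converted back to UTF-8)
}
//----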


More information about the Digitalmars-d mailing list