Splitting Ranges using Lambda Predicates

Wed Jun 11 01:58:56 PDT 2014

On Tuesday, 10 June 2014 at 22:31:37 UTC, Nordlöw wrote:
>> Either way, it shouldn't be too hard to implement. Base it off 
>> "splitter!pred", which is actually quite trivial. AFAIK, your
>
> What do you mean by basing it off splitter!pred - should I 
> start with some existing splitter algorithm in Phobos or start 
> from scratch?
>
> Thx.

I meant mostly copy-pasting it and modifying it to your needs. 
For example, I adapted it into the code below. For simplicity, I 
stripped support for infinite and forward-only ranges. The only 
functions I actually modified were "findTerminator", so that it 
searches for what I want, and "popFront".

//----
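import std.algorithm : find;
import std.functional : not, unaryFun;
import std.range;
import std.stdio : writeln;
import std.traits : isSomeString;
import std.uni : isUpper;

auto slicer(alias isTerminator, Range)(Range input)
if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range)
     && is(typeof(unaryFun!isTerminator(input.front))))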
{
     return SlicerResult!(unaryFun!isTerminator, Range)(input);
}

private struct SlicerResult(alias isTerminator, Range)
{
     alias notTerminator = not!isTerminator;

     private Range _input;
     private size_t _end = 0;

     private void findTerminator()
     {
         // Skip any leading terminators, then stop at the next one.
         auto r = _input.save.find!(not!isTerminator).find!isTerminator();
         _end = _input.length - r.length;
     }

     this(Range input)
     {
         _input = input;

         if (!_input.empty)
             findTerminator();
         else
             _end = size_t.max;
     }

     static if (isInfinite!Range)
         enum bool empty = false;  // Propagate infiniteness.
     else
         @property bool empty()
         {
             return _end == size_t.max;
         }

     @property auto front()
     {
         return _input[0 .. _end];
     }

     void popFront()
     {
         _input = _input[_end .. _input.length];
         if (_input.empty)
         {
             _end = size_t.max;
             return;
         }
         findTerminator();
     }

     @property typeof(this) save()
     {
         auto ret = this;
         ret._input = _input.save;
         return ret;
     }
}
//----

This splits just before the first element for which pred is true, 
provided it is preceded by at least one element for which pred is 
false:

//----
void main()
{
     "SomeGreatVariableName"  .slicer!isUpper.writeln();
     "someGGGreatVariableName".slicer!isUpper.writeln();
     "".slicer!isUpper.writeln();
     "a".slicer!isUpper.writeln();
     "A".slicer!isUpper.writeln();
}
//----
["Some", "Great", "Variable", "Name"]
["some", "GGGreat", "Variable", "Name"]
[]
["a"]
["A"]
//----

This may or may not be what you wanted, depending on how you want 
to split "GGGreat". If you wanted it to simply split to the left 
of every capital letter, then you can change findTerminator to:

     private void findTerminator()
     {
         auto r = _input.save.dropOne.find!isTerminator;
         _end = _input.length - r.length;
     }

And you'd get:
["some", "G", "G", "Great", "Variable", "Name"]
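
To see why the two versions differ, it helps to look at the offset 
each search computes on the same input. A minimal sketch, using only 
find and dropOne; the helper names endSkippingRun and endAtNextUpper 
are made up for illustration, and isUpper is assumed to come from 
std.uni:

```d
import std.algorithm : find;
import std.functional : not;
import std.range : dropOne;
import std.stdio : writeln;
import std.uni : isUpper;

// Hypothetical helper mirroring the first findTerminator:
// skip any leading terminators, then stop at the next terminator.
size_t endSkippingRun(string s)
{
    return s.length - s.find!(not!isUpper).find!isUpper.length;
}

// Hypothetical helper mirroring the dropOne variant:
// skip exactly one element, then stop at the next terminator.
size_t endAtNextUpper(string s)
{
    return s.length - s.dropOne.find!isUpper.length;
}

void main()
{
    auto s = "GGGreatVariableName";
    writeln(s[0 .. endSkippingRun(s)]); // GGGreat
    writeln(s[0 .. endAtNextUpper(s)]); // G
}
```

The first search swallows the whole run of capitals before looking 
for the next one; the second only ever steps over a single element, 
so each capital starts its own slice.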

*******************************************
*******************************************
*******************************************

In any case, yeah, it shouldn't be too hard to shape it into what 
you want. A more involved solution to this problem could be to 
simply pass a "searchFun" predicate, in which case you'd be able 
to split not just according to any "unary predicate", but 
according to an entire "range search strategy":

//----
auto slicer(alias searchFun, Range)(Range input)
if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range)
     && is(typeof(searchFun(input))))
{
     return SlicerResult!(searchFun, Range)(input);
}

private struct SlicerResult(alias searchFun, Range)
{
     private Range _input;
     private size_t _end = 0;

     private void findTerminator()
     {
         auto r = searchFun(_input.save);
         _end = _input.length - r.length;
     }

     ...
//----

And then:

     "SomeGGGGreatVariableName".slicer!((s)=>s.find!isLower.find!isUpper).writeln();
     "someGGGreatVariableName".slicer!((s)=>s.dropOne.find!isUpper).writeln();

["Some", "GGGGreat", "Variable", "Name"]
["some", "G", "G", "Great", "Variable", "Name"]
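
For reference, here is roughly what a complete, compilable version of 
the searchFun variant might look like. This is only a sketch: the 
members not shown above are assumed to be the same as in the first 
SlicerResult (minus the infinite-range branch), and isLower/isUpper 
are assumed to come from std.uni.

```d
import std.algorithm : equal, find;
import std.range;
import std.stdio : writeln;
import std.traits : isSomeString;
import std.uni : isLower, isUpper;

auto slicer(alias searchFun, Range)(Range input)
if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range)
    && is(typeof(searchFun(input))))
{
    return SlicerResult!(searchFun, Range)(input);
}

private struct SlicerResult(alias searchFun, Range)
{
    private Range _input;
    private size_t _end = 0;

    // searchFun returns the tail starting at the next split point,
    // so the current slice is everything in front of that tail.
    private void findTerminator()
    {
        auto r = searchFun(_input.save);
        _end = _input.length - r.length;
    }

    this(Range input)
    {
        _input = input;
        if (!_input.empty)
            findTerminator();
        else
            _end = size_t.max; // sentinel: nothing left to slice
    }

    @property bool empty() { return _end == size_t.max; }

    @property auto front() { return _input[0 .. _end]; }

    void popFront()
    {
        _input = _input[_end .. _input.length];
        if (_input.empty)
        {
            _end = size_t.max;
            return;
        }
        findTerminator();
    }

    @property typeof(this) save()
    {
        auto ret = this;
        ret._input = _input.save;
        return ret;
    }
}

void main()
{
    auto a = "SomeGGGGreatVariableName".slicer!(s => s.find!isLower.find!isUpper);
    auto b = "someGGGreatVariableName".slicer!(s => s.dropOne.find!isUpper);
    writeln(a);
    writeln(b);
    assert(b.equal(["some", "G", "G", "Great", "Variable", "Name"]));
}
```

One thing to watch with this design: searchFun has to skip at least 
one element before it stops. If it can return its argument unchanged 
(for example a plain s => s.find!isUpper on input that starts with a 
capital), _end stays 0, front is empty, and popFront never advances.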

Just ideas.