A better way to write this function? (style question)

Mon Dec 30 14:51:37 PST 2013

On Monday, 30 December 2013 at 22:38:43 UTC, Brad Anderson wrote:
> On Monday, 30 December 2013 at 22:30:02 UTC, John Colvin wrote:
>> On Monday, 30 December 2013 at 22:17:21 UTC, John Colvin wrote:
>>> On Monday, 30 December 2013 at 21:40:58 UTC, Thomas Gann 
>>> wrote:
>>>> I've written a Markov bot in D, and I have function whose 
>>>> job it is to take an input string, convert all newline 
>>>> characters to spaces and all uppercase letters to lowercase, 
>>>> and then return an array of words that are generated by 
>>>> splitting the string up by whitespace. Here is the function 
>>>> is question:
>>>>
>>>> string[] split_sentence(string input)
>>>> {
>>>>  string line;
>>>>
>>>>  foreach(c; input)
>>>>  {
>>>>      if(c == '\n' || c == '\r')
>>>>          line ~= ' ';
>>>>
>>>>      else
>>>>          line ~= c.toLower();
>>>>  }
>>>>
>>>>  return line.splitter(' ').filter!(a => a.length).array;
>>>> }
>>>>
>>>> Obviously, one issue is that because the string is 
>>>> immutable, I can't modify it directly, and so I actually 
>>>> build an entirely new string in place. I would have just 
>>>> made a mutable duplicate of the input and modify that, but 
>>>> then I would get errors returning, because it expects 
>>>> string[] and not char[][]. Is there a more elegant way to do 
>>>> what I'm doing?
>>>
>>>
>>> A few points:
>>>
>>> by declaring a new string and appending to it you are risking 
>>> a lot of allocations. Either use std.array.appender or 
>>> allocate the array with the correct size at the beginning.
>>>
>>> using .array on the end of the ufcs chain is yet another 
>>> allocation. It can be avoided using std.algorithm.copy to 
>>> copy the result back in to 'line'
>>>
>>> In my opinion the whole API would be better as range-based:
>>>
>>> auto splitSentence(R)(R input)
>>>   if(isInputRange!R)
>>> {
>>>   return input
>>>          .map!(c => (c == "\n"[0] || c == "\r"[0]) ? ' ' : 
>>> c.toLower)
>>>          .splitter!(' ')
>>>          .filter!(a => !(a.empty));
>>> }
>>
>> sorry, ignore that attempt, it's woefully broken...
>
> Re: weird literal syntax, you didn't happen to be using dpaste 
> to test and have trouble with character literals, did you?  
> Because I did and thought I was going insane until realized 
> DPaste was broken.

oohhh so that was what that was. Anyway, here's what I have so 
far:

import std.range : isInputRange;

auto splitSentence(R)(R input)
     if(isInputRange!R)
{
     import std.algorithm : map, splitter, filter;
     import std.uni : toLower;
     import std.range : ElementType;

     dchar preProc(ElementType!R c)
     {
         return (c == '\n' || c == '\r') ? ' ' : c.toLower;
     }

     return input
         .map!preProc
         .splitter!(c => c == ' ')
         .filter!(a => !(a.empty));
}

I have to have the function instead of a lamda in order to get 
the return type correct. I guess I could cast or use std.conv.to 
instead.

Anyway, the big problem I've hit is that AFAICT std.algorithm 
makes a complete mess of unicode and i can't find a byCodeUnit 
range anywhere in order to make it correct.