A better way to write this function? (style question)

John Colvin john.loughran.colvin at gmail.com
Mon Dec 30 14:30:01 PST 2013


On Monday, 30 December 2013 at 22:17:21 UTC, John Colvin wrote:
> On Monday, 30 December 2013 at 21:40:58 UTC, Thomas Gann wrote:
>> I've written a Markov bot in D, and I have function whose job 
>> it is to take an input string, convert all newline characters 
>> to spaces and all uppercase letters to lowercase, and then 
>> return an array of words that are generated by splitting the 
>> string up by whitespace. Here is the function is question:
>>
>> string[] split_sentence(string input)
>> {
>>    string line;
>>
>>    foreach(c; input)
>>    {
>>        if(c == '\n' || c == '\r')
>>            line ~= ' ';
>>
>>        else
>>            line ~= c.toLower();
>>    }
>>
>>    return line.splitter(' ').filter!(a => a.length).array;
>> }
>>
>> Obviously, one issue is that because the string is immutable, 
>> I can't modify it directly, and so I actually build an 
>> entirely new string in place. I would have just made a mutable 
>> duplicate of the input and modify that, but then I would get 
>> errors returning, because it expects string[] and not 
>> char[][]. Is there a more elegant way to do what I'm doing?
>
>
> A few points:
>
> by declaring a new string and appending to it you are risking a 
> lot of allocations. Either use std.array.appender or allocate 
> the array with the correct size at the beginning.
>
> using .array on the end of the ufcs chain is yet another 
> allocation. It can be avoided using std.algorithm.copy to copy 
> the result back in to 'line'
>
> In my opinion the whole API would be better as range-based:
>
> auto splitSentence(R)(R input)
>     if(isInputRange!R)
> {
>     return input
>            .map!(c => (c == "\n"[0] || c == "\r"[0]) ? ' ' : 
> c.toLower)
>            .splitter!(' ')
>            .filter!(a => !(a.empty));
> }

sorry, ignore that attempt, it's woefully broken...


More information about the Digitalmars-d-learn mailing list