A better way to write this function? (style question)
John Colvin
john.loughran.colvin at gmail.com
Mon Dec 30 14:51:37 PST 2013
On Monday, 30 December 2013 at 22:38:43 UTC, Brad Anderson wrote:
> On Monday, 30 December 2013 at 22:30:02 UTC, John Colvin wrote:
>> On Monday, 30 December 2013 at 22:17:21 UTC, John Colvin wrote:
>>> On Monday, 30 December 2013 at 21:40:58 UTC, Thomas Gann
>>> wrote:
>>>> I've written a Markov bot in D, and I have function whose
>>>> job it is to take an input string, convert all newline
>>>> characters to spaces and all uppercase letters to lowercase,
>>>> and then return an array of words that are generated by
>>>> splitting the string up by whitespace. Here is the function
>>>> is question:
>>>>
>>>> string[] split_sentence(string input)
>>>> {
>>>> string line;
>>>>
>>>> foreach(c; input)
>>>> {
>>>> if(c == '\n' || c == '\r')
>>>> line ~= ' ';
>>>>
>>>> else
>>>> line ~= c.toLower();
>>>> }
>>>>
>>>> return line.splitter(' ').filter!(a => a.length).array;
>>>> }
>>>>
>>>> Obviously, one issue is that because the string is
>>>> immutable, I can't modify it directly, and so I actually
>>>> build an entirely new string in place. I would have just
>>>> made a mutable duplicate of the input and modify that, but
>>>> then I would get errors returning, because it expects
>>>> string[] and not char[][]. Is there a more elegant way to do
>>>> what I'm doing?
>>>
>>>
>>> A few points:
>>>
>>> by declaring a new string and appending to it you are risking
>>> a lot of allocations. Either use std.array.appender or
>>> allocate the array with the correct size at the beginning.
>>>
>>> using .array on the end of the ufcs chain is yet another
>>> allocation. It can be avoided using std.algorithm.copy to
>>> copy the result back in to 'line'
>>>
>>> In my opinion the whole API would be better as range-based:
>>>
>>> auto splitSentence(R)(R input)
>>> if(isInputRange!R)
>>> {
>>> return input
>>> .map!(c => (c == "\n"[0] || c == "\r"[0]) ? ' ' :
>>> c.toLower)
>>> .splitter!(' ')
>>> .filter!(a => !(a.empty));
>>> }
>>
>> sorry, ignore that attempt, it's woefully broken...
>
> Re: weird literal syntax, you didn't happen to be using dpaste
> to test and have trouble with character literals, did you?
> Because I did and thought I was going insane until realized
> DPaste was broken.
oohhh so that was what that was. Anyway, here's what I have so
far:
import std.range : isInputRange;
auto splitSentence(R)(R input)
if(isInputRange!R)
{
import std.algorithm : map, splitter, filter;
import std.uni : toLower;
import std.range : ElementType;
dchar preProc(ElementType!R c)
{
return (c == '\n' || c == '\r') ? ' ' : c.toLower;
}
return input
.map!preProc
.splitter!(c => c == ' ')
.filter!(a => !(a.empty));
}
I have to have the function instead of a lamda in order to get
the return type correct. I guess I could cast or use std.conv.to
instead.
Anyway, the big problem I've hit is that AFAICT std.algorithm
makes a complete mess of unicode and i can't find a byCodeUnit
range anywhere in order to make it correct.
More information about the Digitalmars-d-learn
mailing list