improving the join function

Wed Oct 13 12:03:37 PDT 2010

On Mon, 11 Oct 2010 20:33:27 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> I'm looking at http://d.puremagic.com/issues/show_bug.cgi?id=3313 and  
> that got me looking at std.string.join, which currently has the sig:
>
> string join(in string[] words, string sep);
>
> A narrow fix:
>
> Char[] join(Char)(in Char[][] words, in Char[] sep)
> if (isSomeChar!Char);
>
> I think it's reasonable to assume that people would want to join things  
> that aren't necessarily arrays of characters, so T could be pretty much  
> any type. An obvious step towards generalization is:
>
> T[] join(T)(in T[][] items, T[] sep);

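This doesn't quite work if T is not a value type.  items is const, so each
element copied into the mutable T[] result has to convert from const(T) to
T, which is not allowed when T has mutable indirections (actually, I think
it currently compiles, but only because there are bugs in the compiler).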
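
A minimal D sketch of the T[] join(T)(in T[][] items, T[] sep) generalization makes the value-type caveat concrete. This is an illustration only, under two assumptions: the loop body is made up for this example (it is not the Phobos implementation), and sep is taken as `in`, as in the narrow fix, so that T is deduced the same way from both arguments.

```d
import std.stdio : writeln;

// Sketch of T[] join(T)(in T[][] items, in T[] sep).
// items and sep are const, so every element copied into the result must
// convert from const(T) back to a mutable T. That works for value types such
// as int or char, but not for classes or other types with mutable indirections.
T[] join(T)(in T[][] items, in T[] sep)
{
    T[] result;
    foreach (i, item; items)
    {
        if (i > 0)
            result ~= sep; // separator only between items, never after the last
        result ~= item;
    }
    return result;
}

void main()
{
    writeln(join([[1, 2], [3]], [0])); // prints [1, 2, 0, 3]
}
```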

>
> But join doesn't really need random access for words - really, an input  
> range should suffice. So a generally useful join, almost worth putting  
> in std.algorithm, would be:
>
> ElementType!R1[] join(R1, R2)(R1 items, R2 sep)
> if (isInputRange!R1 && isForwardRange!R2
>      && is(ElementType!R2 : ElementType!R1));
>
> Notice how the separator must be a forward range because it gets spanned  
> multiple times, whereas the items need only be an input range as they  
> are spanned once. This is at the same time a very general and very  
> precise interface.

I think this is fine.  Note that the signature says nothing about the
constness of items, so it is legal for this function to modify the
original data in items.

Not that I think it's a bad thing, but it does lose some guarantees as  
compared to the original join.  inout can't be used here because it  
doesn't work as a template parameter.
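
A minimal D sketch of the range-based version, under stated assumptions: the name rangeJoin is hypothetical (chosen to avoid clashing with Phobos), and items is assumed to be a range of ranges such as int[][]. For that reason the result element type is taken as ElementType!(ElementType!R1), one level below the ElementType!R1 used in the proposed signature, which for int[][] would be int[] itself. The loop walks items exactly once and calls sep.save before each use, which is why sep must be a forward range while items need not be.

```d
import std.array : appender;
import std.range.primitives : ElementType, isForwardRange, isInputRange, save;
import std.stdio : writeln;
import std.traits : Unqual;

// Hypothetical name rangeJoin, to avoid clashing with Phobos' std.array.join.
// items: any input range of ranges, walked exactly once.
// sep:   a forward range, because it is walked once per gap between items.
auto rangeJoin(R1, R2)(R1 items, R2 sep)
if (isInputRange!R1 && isInputRange!(ElementType!R1) && isForwardRange!R2
    && is(ElementType!R2 : ElementType!(ElementType!R1)))
{
    // Result elements are the elements of each item, one level below
    // ElementType!R1 (which is the item type itself).
    alias E = Unqual!(ElementType!(ElementType!R1));
    auto result = appender!(E[])();
    bool first = true;
    foreach (item; items)
    {
        if (!first)
            result.put(sep.save); // fresh copy, so sep can be walked again
        first = false;
        result.put(item);
    }
    return result.data;
}

void main()
{
    writeln(rangeJoin([[1, 2], [3], [4, 5]], [0])); // prints [1, 2, 0, 3, 0, 4, 5]
}
```

For narrow strings the element type of the range is dchar, so this sketch would return a dchar[] rather than a string; a real version would need to special-case that.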

> One thing is still bothering me: the array output type. Why would the  
> "default" output range be an array? What can be done to make join() at  
> the same time a general function and also one that works for strings the  
> way the old join did? For example, if I want to join things into an  
> already-existing buffer, or if I want to write them straight to a file,  
> there's no way to do so without having an array allocation in the loop.  
> I have a couple of ideas but I wouldn't want to bias yours.

Well, one could have a version of join that takes an output range.  It
would have to return the output range itself instead of the *result* of
the output range, so the caller can still get at what was written (for
example, an Appender's data).  In that case, the standard join, which
returns an array, can be implemented on top of it:

ElementType!R1[] join(R1, R2)(R1 items, R2 sep) ...
{
    return join(items, sep, appender!(ElementType!R1[])()).data;
}
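
A sketch of the output-range idea in D. The names joinInto and join are hypothetical, and the code assumes items is a range of ranges (for example int[][] or string[]). joinInto returns the sink so the caller can get at what was written: an Appender's data, or, when a plain slice is used as the sink, the unfilled remainder of an already-existing buffer. The array-returning join is then a one-liner on top of it, and passing stdout.lockingTextWriter() as the sink writes straight to a file without building an intermediate array.

```d
import std.array : appender;
import std.range.primitives : ElementType, isForwardRange, isInputRange,
    isOutputRange, put, save;
import std.stdio : stdout, writeln;
import std.traits : Unqual;

// Hypothetical output-range version: writes the items, separated by sep, into
// sink and returns the sink itself rather than what it has accumulated.
Sink joinInto(R1, R2, Sink)(R1 items, R2 sep, Sink sink)
if (isInputRange!R1 && isInputRange!(ElementType!R1) && isForwardRange!R2
    && isOutputRange!(Sink, ElementType!(ElementType!R1))
    && isOutputRange!(Sink, ElementType!R2))
{
    bool first = true;
    foreach (item; items)
    {
        if (!first)
            put(sink, sep.save); // fresh copy of sep for every gap
        first = false;
        put(sink, item);
    }
    return sink;
}

// Array-returning version, implemented on top of the output-range version.
auto join(R1, R2)(R1 items, R2 sep)
if (isInputRange!R1 && isInputRange!(ElementType!R1) && isForwardRange!R2)
{
    alias E = Unqual!(ElementType!(ElementType!R1));
    return joinInto(items, sep, appender!(E[])()).data;
}

void main()
{
    writeln(join([[1, 2], [3]], [0]));                         // prints [1, 2, 0, 3]
    joinInto(["foo", "bar"], ", ", stdout.lockingTextWriter()); // straight to stdout
    writeln();
}
```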

> I also have a question from people who dislike Phobos. Was there a point
> in the changes of signature above where you threw up your hands,
> thinking, "do the darn string version already and cut all that crap!"?

It's not a problem with Phobos, it's a problem with documentation.  There
is a fundamental issue with documenting complex templates: it makes
function signatures very difficult to understand.  The doc generator can
and should simplify things, and I think at some point we should address
this.  In other words, the documented signature should be transformed into
a form where it's easy to see that it is the same as
string join(string[], string).
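
For what it's worth, the generic template and the old narrow signature do line up once instantiated. Current Phobos ships a range-based std.array.join, and a wrapper with the old string-only signature can simply forward to it; the wrapper name joinStrings is hypothetical and used only for illustration.

```d
import std.array : join;
import std.stdio : writeln;

// Hypothetical wrapper with the old std.string-style signature. It compiles
// because, for string arguments, the generic std.array.join returns a string.
string joinStrings(string[] words, string sep)
{
    return join(words, sep);
}

void main()
{
    writeln(joinStrings(["a", "b", "c"], ", ")); // prints a, b, c
    writeln(join([[1, 2], [3]], [0]));            // same template on int arrays
}
```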

-Steve