A Discussion of Tuple Syntax

Wyatt wyatt.epp at gmail.com
Mon Aug 19 09:53:05 PDT 2013


Note: I'm leading off with a reply to bearophile transplanted 
here to stop making OT noise in John's thread about TypeTuple.

On Friday, 16 August 2013 at 23:23:59 UTC, bearophile wrote:
>
> It's short, clear, has a precedent with q{}.

Wait, what is q{}?  That's something in D?  What does that even 
do?  I can infer that q{} is probably some manner of scoping or 
grouping _something_ somehow, but I have to dig into lexical and 
manually search for q{ to find out it's [neither of the things I 
expected].  In my view, this right here is really just a 
fundamental problem with single-character prefixes and I feel 
that's something we should endeavour to avoid, if possible.
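
For anyone else who has to look it up: q{} is a token string, i.e. a 
string literal whose contents must lex as valid D tokens, mostly used 
to feed a string mixin.  A minimal sketch (the name `code` and the 
statement inside it are made up purely for illustration):

```d
import std.stdio : writeln;

// q{...} is a token string: an ordinary string literal whose body must
// consist of valid D tokens.  It does not scope or group anything.
// (`code` is a made-up name for this sketch.)
enum code = q{int x = 40 + 2;};

void main()
{
    // The literal is just a string.
    static assert(is(typeof(code) == string));
    writeln(code);

    // The usual reason to write one: hand the text to a string mixin,
    // which here declares x in the enclosing scope.
    mixin(code);
    assert(x == 42);
}
```

Nothing about the letter q hints at any of this, which is the 
discoverability problem in a nutshell.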

> I don't like it a lot, but it's way better than not having
> language support for tuples.
>
On this, I think we all agree.

>> I'd prefer just using parentheses, but I think there were 
>> readability problems that caused the DIP to end up with:
>
> More than just readability problems. They were discussed when 
> Kenji presented the DIP 32 in this forum. Timon found a 
> significant problem with the {} syntax.
>
To be clear, I'm not talking about braces, {}; I'm talking about 
parentheses, ().  I read over that whole DIP32 thread a couple 
times, and didn't see any rationale offered for why the likely 
"cleanest" version "can't be used".  It wasn't even brought up 
(unless I've missed something subtle).  In the second thread, 
linked in the OP here, they were glossed over again.  Now, I 
fully believe there's a very good reason that's been written 
somewhere, but I _would_ like to know what that is, preferably 
documented somewhere less ephemeral and difficult to search than 
the newsgroup (such as in DIP32).  The closest I've seen so far 
is the pull request where Walter and Andrei expressed that it 
should be considered further.

On Friday, 16 August 2013 at 21:07:52 UTC, Meta wrote:
> - #(a, b) is unambiguous and would probably be the easiest 
> option. I don't think it looks too bad, but some people might 
> find it ugly and noisy
>
The octothorpe _is_ much better than the t simply in terms of 
readability.  Even more than with q{} or t{}, though, I have 
concerns about its ability to be found with an ordinary search 
engine by an ordinary user.  Have you tried looking for 
documentation on weird operators with a search engine lately?  
They don't exactly take to it well. :/ (cf. Perl's <=>)

Addressing the other suggestion I saw that cropped up, I 
personally find the two-character "bananas" to be impressively 
ugly.  I considered suggesting some permutation on that same 
idea, but after toying with a few examples I find it ends up 
looking awful and I think it's honestly annoying to type them in 
any form.  I don't even like how the Unicode version of that one 
looks; for doubling up, I think ⟦ ⟧ or ⟪ ⟫ are easier on the 
eyes.

It's times like these that I wish the standard PC keyboard had 
something like guillemets « », or corner brackets 「 」 (big fan of 
these) in addition to everything else. (Or even that we could use 
< > for bracing, though at this point I don't think I could 
easily condone that move for D).

I feel weird admitting this, but if we can't use some manner of 
bare brace, I think I'd rather have tup(), tup[], tup{} (or even 
tuple() et al) as a prefix over any single character.

Another stray thought: is there room for a little box of syntax 
chocolate so that e.g. tuple(), [||], and ⟦ ⟧ are all valid?  I 
don't know if we have a precedent like that off the top of my 
head and I'm pretty sure I don't like it, but I thought I'd at 
least mention it.

> - There was no consensus on the pattern matching syntax for 
> unpacking. For example, #(a, _) = #(1, 2) only introduces one 
> binding, "a", into the surrounding scope. The question is, what 
> character should go in the place of "_" to signify that a value 
> should not be bound? Some suggestions were #(a, $), #(a, @), 
> #(a, ?). I personally think #(a, ?) or #(a, *) would be best, 
> but all that's  really necessary is a symbol that cannot also 
> be an identifier.
>
Can't make it a single underscore? Question mark works best then, 
IMO.  It isn't as burdened with meanings elsewhere (sure there's 
ternary and possibly-match in regex, but...have I forgotten 
something?)

>     Also up for debate was nested patterns, e.g., #(1, 2, #(3, 
> 4, #(5, 6))). I don't think there was a consensus on unpacking 
> and pattern matching for this situation. One idea I saw that 
> looked good:
>
Ah, I was wondering about the case of a tuple of tuples.  It's 
not mentioned in the DIP that I saw, so I assumed it was allowed, 
but explicit mention is probably warranted.

>         * Use "..." to pattern match on the tail of an 
> expressions, so take the above tuple. The pattern #(1, ?, ...) 
> would match the two nested sub-tuples. Or, say, #(1, 2, 3) 
> could be matched by #(1, 2, 3), #(1, ?, 3), #(1, ...), etc. You 
> obviously can't refer to "..." as a variable, so it also 
> becomes a useful way of saying "don't care" for multiple items, 
> e.g., #(a, ...) -> only bind the first item in the tuple. We

#(a, ...) looks to me like it would make a 2-tuple 
containing a and a tuple of "everything else", because of the 
ellipsis' use in templated code.  I think this is a little 
unclear, so instead I'd prefer #(a, ? ...) (or whatever ends up 
used for the discard character) to make it explicit.
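
To make the two readings concrete, here is a rough sketch using 
today's std.typecons tuples rather than any of the proposed syntax.  
`headRest` is a hypothetical helper written for this example, not 
something that exists in Phobos:

```d
import std.typecons : tuple;

// Hypothetical helper: split a library tuple into its first element and
// a tuple of everything else, i.e. the "2-tuple" reading of #(a, ...).
auto headRest(T)(T t)
{
    return tuple(t[0], tuple(t.expand[1 .. $]));
}

void main()
{
    auto t = tuple(1, 2, 3);

    // Reading 1: bind the first element and discard the tail.
    auto a = t[0];
    assert(a == 1);

    // Reading 2: keep the tail around as its own tuple.
    auto hr = headRest(t);
    assert(hr[0] == 1);
    assert(hr[1] == tuple(2, 3));
}
```

Both are reasonable things to want, which is why I'd rather the 
discard be spelled out.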

> Assuming the "..." syntax for unpacking, it would be useful to 
> name the captured tail. For example, you could unpack #(1, 3, 
> #(4, 6)) into #(a, b, x...), where a = 1, b = 3, x = #(4, 6). 
> Similarly, #(head, rest...) results in head = 1, rest = #(2, 
> #(4, 6)). I think this would be very useful.
>
As a bonus, explicit discard means a simple comma omission is 
less likely to completely change the meaning of the statement.  
Compare:
#(a, b, ...)   //bind the first two elements, discard the rest.
#(a, b ...)    //bind the first element to a and everything else to b
#(a, b, ? ...) //same as the first
#(a, b ? ...)  //syntax error

Granted, there's this case:
#(a, ?, ...)
...but that seems like it would be less common just based on how 
people conventionally order their data structures.

Thought: Is there sufficient worth in having different tokens for 
discarding a single element vs. a range? e.g.
#(a, ?, c, * ...) //bind first and third elements; discard the rest
// I'm not attached to the asterisk there.
// +, #, or @ would also make some amount of sense to me.

> - Concatenating tuples with ~. This is nice to have, but not 
> particularly important.
>
What does concatenating a tuple actually do?  That is:
auto a = #(1,2) ~ 3; //Result: a == #(1,2,3), right?
auto b = a ~ #(4,5); //Is b == #(1,2,3,#(4,5)) or is b == #(1,2,3,4,5)?
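
The two candidate answers can be spelled out with std.typecons as it 
stands.  This is only a sketch of the question, not a claim about 
what ~ should do; `concatFlat` and `concatNested` are hypothetical 
names invented for the example:

```d
import std.typecons : tuple;

// Hypothetical: "flattening" concatenation, splicing both operands'
// elements into one tuple.
auto concatFlat(A, B)(A a, B b)
{
    return tuple(a.expand, b.expand);
}

// Hypothetical: "nesting" concatenation, appending the right-hand
// tuple as a single element.
auto concatNested(A, B)(A a, B b)
{
    return tuple(a.expand, b);
}

void main()
{
    auto a = tuple(1, 2, 3);
    auto tail = tuple(4, 5);

    auto flat = concatFlat(a, tail);
    static assert(typeof(flat).Types.length == 5);
    assert(flat == tuple(1, 2, 3, 4, 5));

    auto nested = concatNested(a, tail);
    static assert(typeof(nested).Types.length == 4);
    assert(nested[3] == tuple(4, 5));
}
```

Whichever one ~ ends up meaning, the other is probably still worth 
being able to write without much ceremony.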

> This is the third or fourth time that I know of that tuple 
> syntax has come up, and as of yet, nothing has been done about 
> it. I'd really like to get the ball rolling on this, as I think 
> a good syntax for these tuple operations would do D a world of 
> good. I'm not a compiler hacker, unfortunately, so I can't 
> implement it myself as proof of concept... However, I hope that 
> discussing it and working out all the kinks will help pave the 
> way for an actual implementation.

Great! After this, let's fix properties. ;)

-Wyatt

