A Discussion of Tuple Syntax
Wyatt
wyatt.epp at gmail.com
Mon Aug 19 09:53:05 PDT 2013
Note: I'm leading off with a reply to bearophile transplanted
here to stop making OT noise in John's thread about TypeTuple.
On Friday, 16 August 2013 at 23:23:59 UTC, bearophile wrote:
>
> It's short, clear, has a precedent with q{}.
Wait, what is q{}? That's something in D? What does that even
do? I can infer that q{} is probably some manner of scoping or
grouping _something_ somehow, but I have to dig into the lexical
spec and manually search for q{ to find out it's neither of the
things I expected. In my view, this right here is really just a
fundamental problem with single-character prefixes and I feel
that's something we should endeavour to avoid, if possible.
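
For reference, a minimal sketch of what q{} does in current D: it is a
token string, i.e. an ordinary string literal whose contents must lex
as valid D tokens, most often fed to mixin. The names snippet and
evaluate are made up for this illustration.

```d
import std.stdio;

// q{...} yields a plain string; the braces only delimit the literal.
enum snippet = q{int answer = 6 * 7;};

// Hypothetical helper: compiles the string as code via a string mixin.
int evaluate()
{
    mixin(snippet); // declares `answer` in this scope
    return answer;
}

void main()
{
    // The token string really is just a string.
    static assert(is(typeof(snippet) == string));
    writeln(snippet);
    writeln(evaluate());
}
```

So the prefix marks a string whose contents happen to look like code,
which is neither scoping nor grouping in the usual sense.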
> I don't like it a lot, but it's way better than not having
> language support for tuples.
>
On this, I think we all agree.
>> I'd prefer just using parentheses, but I think there were
>> readability problems that caused the DIP to end up with:
>
> More than just readability problems. They were discussed when
> Kenji presented the DIP 32 in this forum. Timon found a
> significant problem with the {} syntax.
>
To be clear, I'm not talking about braces, {}; I'm talking about
parentheses, (). I read over that whole DIP32 thread a couple
times, and didn't see any rationale offered for why the likely
"cleanest" version "can't be used". It wasn't even brought up
(unless I've missed something subtle). In the second thread,
linked in the OP here, they were glossed over again. Now, I
fully believe there's a very good reason that's been written
somewhere, but I _would_ like to know what that is, preferably
documented somewhere less ephemeral and difficult to search than
the newsgroup (such as in DIP32). The closest I've seen so far
is the pull request where Walter and Andrei expressed that it
should be considered further.
On Friday, 16 August 2013 at 21:07:52 UTC, Meta wrote:
> - #(a, b) is unambiguous and would probably be the easiest
> option. I don't think it looks too bad, but some people might
> find it ugly and noisy
>
The octothorpe _is_ much better than the t simply in terms of
readability. Even more than with q{} or t{}, though, I have
concerns about its ability to be found with an ordinary search
engine by an ordinary user. Have you tried looking for
documentation on weird operators with a search engine lately?
They don't exactly take to it well. :/ (cf. Perl's <=>)
Addressing the other suggestion I saw that cropped up, I
personally find the two-character "bananas" to be impressively
ugly. I considered suggesting some permutation on that same
idea, but after toying with a few examples I find it ends up
looking awful and I think it's honestly annoying to type them in
any form. I don't even like how the Unicode version of that one
looks; for doubling up, I think ⟦ ⟧ or ⟪ ⟫ are easier on the
eyes.
It's times like these that I wish the standard PC keyboard had
something like guillemets « », or corner brackets 「 」 (big fan of
these) in addition to everything else. (Or even that we could use
< > for bracing, though at this point I don't think I could
easily condone that move for D).
I feel weird admitting this, but if we can't use some manner of
bare brace, I think I'd rather have tup(), tup[], tup{} (or even
tuple() et al) as a prefix over any single character.
Another stray thought: is there room for a little box of syntax
chocolate so that e.g. tuple(), [||], and ⟦ ⟧ are all valid? I
don't know if we have a precedent like that off the top of my
head and I'm pretty sure I don't like it, but I thought I'd at
least mention it.
> - There was no consensus on the pattern matching syntax for
> unpacking. For example, #(a, _) = #(1, 2) only introduces one
> binding, "a", into the surrounding scope. The question is, what
> character should go in the place of "_" to signify that a value
> should not be bound? Some suggestions were #(a, $), #(a, @),
> #(a, ?). I personally think #(a, ?) or #(a, *) would be best,
> but all that's really necessary is a symbol that cannot also
> be an identifier.
>
Can't make it a single underscore? Question mark works best then,
IMO. It isn't as burdened with meanings elsewhere (sure there's
ternary and possibly-match in regex, but...have I forgotten
something?)
> Also up for debate was nested patterns, e.g., #(1, 2, #(3,
> 4, #(5, 6))). I don't think there was a consensus on unpacking
> and pattern matching for this situation. One idea I saw that
> looked good:
>
Ah, I was wondering about the case of a tuple of tuples. It's
not mentioned in the DIP that I saw, so I assumed it was allowed,
but explicit mention is probably warranted.
> * Use "..." to pattern match on the tail of an
> expressions, so take the above tuple. The pattern #(1, ?, ...)
> would match the two nested sub-tuples. Or, say, #(1, 2, 3)
> could be matched by #(1, 2, 3), #(1, ?, 3), #(1, ...), etc. You
> obviously can't refer to "..." as a variable, so it also
> becomes a useful way of saying "don't care" for multiple items,
> e.g., #(a, ...) -> only bind the first item in the tuple. We
#(a, ...) looks to me like it would make a 2-tuple
containing a and a tuple of "everything else", because of the
ellipsis' use in templated code. I think this is a little
unclear, so instead I'd prefer #(a, ? ...) (or whatever ends up
used for the discard character) to make it explicit.
> Assuming the "..." syntax for unpacking, it would be useful to
> name the captured tail. For example, you could unpack #(1, 3,
> #(4, 6)) into #(a, b, x...), where a = 1, b = 3, x = #(4, 6).
> Similarly, #(head, rest...) results in head = 1, rest = #(2,
> #(4, 6)). I think this would be very useful.
>
As a bonus, explicit discard means a simple comma omission is
less likely to completely change the meaning of the statement.
Compare:
#(a, b, ...)   //bind the first two elements, discard the rest.
#(a, b ...)    //bind the first element to a and everything else to b
#(a, b, ? ...) //same as the first
#(a, b ? ...)  //syntax error
Granted, there's this case:
#(a, ?, ...)
...but that seems like it would be less common just based on how
people conventionally order their data structures.
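
For comparison, a rough sketch of how the #(head, rest...) split can
be emulated today with the library Tuple from std.typecons. This is
not the proposed syntax; splitHead is a hypothetical helper name used
only for this illustration.

```d
import std.stdio;
import std.typecons;

// Hypothetical helper: returns tuple(head, rest), where rest is a new
// Tuple built from every field after the first.
auto splitHead(T)(T t)
{
    // t.expand is the field sequence; slicing it keeps fields 1..end.
    return tuple(t[0], tuple(t.expand[1 .. T.Types.length]));
}

void main()
{
    auto parts = splitHead(tuple(1, 3, tuple(4, 6)));
    auto head = parts[0];
    auto rest = parts[1];

    assert(head == 1);
    assert(rest == tuple(3, tuple(4, 6)));
    writeln(head, " ", rest);
}
```

Discarding is implicit here (simply don't name the part you don't
want), which is exactly the verbosity a pattern syntax would remove.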
Thought: Is there sufficient worth in having different tokens for
discarding a single element vs. a range? e.g.
#(a, ?, c, * ...) //bind first and third elements; discard the rest
// I'm not attached to the asterisk there.
// +, #, or @ would also make some amount of sense to me.
> - Concatenating tuples with ~. This is nice to have, but not
> particularly important.
>
What does concatenating a tuple actually do? That is:
auto a = #(1,2) ~ 3; //Result: a == #(1,2,3), right?
auto b = a ~ #(4,5); //Is b == #(1,2,3,#(4,5)) or is b == #(1,2,3,4,5)?
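
The two readings can be spelled out with the existing library Tuple
from std.typecons. This is only a sketch of the distinction, not the
proposed ~ operator; joinNested and joinFlat are hypothetical names
made up for this illustration.

```d
import std.stdio;
import std.typecons;

// Reading 1: the right-hand tuple becomes a single nested element.
auto joinNested(A, B)(A a, B b)
{
    return tuple(a.expand, b);
}

// Reading 2: the right-hand tuple's fields are spliced in flat.
auto joinFlat(A, B)(A a, B b)
{
    return tuple(a.expand, b.expand);
}

void main()
{
    auto a = tuple(1, 2, 3);

    auto nested = joinNested(a, tuple(4, 5));
    auto flat = joinFlat(a, tuple(4, 5));

    // Four fields (the last one a tuple) versus five plain fields.
    static assert(typeof(nested).Types.length == 4);
    static assert(typeof(flat).Types.length == 5);

    assert(nested == tuple(1, 2, 3, tuple(4, 5)));
    assert(flat == tuple(1, 2, 3, 4, 5));
    writeln(nested, " vs ", flat);
}
```

Whichever reading ~ gets, the other remains expressible; the question
is only which one deserves the short spelling.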
> This is the third or fourth time that I know of that tuple
> syntax has come up, and as of yet, nothing has been done about
> it. I'd really like to get the ball rolling on this, as I think
> a good syntax for these tuple operations would do D a world of
> good. I'm not a compiler hacker, unfortunately, so I can't
> implement it myself as proof of concept... However, I hope that
> discussing it and working out all the kinks will help pave the
> way for an actual implementation.
Great! After this, let's fix properties. ;)
-Wyatt