Arbitrary identifiers - syntax
Cecil Ward
cecil at cecilward.com
Tue Jul 4 07:55:51 UTC 2023
This is an intentionally vague post about an idea without a clear
solution, so this is not a concrete proposal, but is intended to
solicit suggestions and ideas.
In mathematics or physics, you might have variables such as t and
t′; the second character of the latter is U+2032 (prime),
and there’s also a similar glyph at U+02B9. I posted a while back
about the use of Unicode, and in that I was thinking about text
in various non-English human languages. The docs say that D
identifiers such as variable names are chosen from a subset of
Unicode defined by an appendix of C99. This gives a massive list
of acceptable characters in umpteen writing systems and human
languages. How does D deal with that in the lexer? Enormous table
lookup? I would be interested to know, compiler authors.
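
How DMD's lexer actually handles this is not stated here. One common technique for this kind of check is a sorted table of code-point ranges searched by bisection, so the cost stays logarithmic even for a very long list. The following is a minimal sketch in Python rather than D, just for brevity; the names IDENT_RANGES and in_table are made up for illustration, and the ranges are a tiny hand-picked excerpt, not the real C99 table.

```python
from bisect import bisect_right

# Illustrative excerpt only: NOT the real C99 annex table.
# Each entry is an inclusive (low, high) code-point range, sorted by low.
IDENT_RANGES = [
    (0x0030, 0x0039),  # 0-9
    (0x0041, 0x005A),  # A-Z
    (0x005F, 0x005F),  # _
    (0x0061, 0x007A),  # a-z
    (0x00C0, 0x00D6),  # some Latin-1 letters
    (0x00D8, 0x00F6),  # more Latin-1 letters
    (0x03B1, 0x03C9),  # Greek lower case alpha..omega
]
_STARTS = [lo for lo, _ in IDENT_RANGES]

def in_table(ch):
    """Return True if ch falls inside one of the ranges (binary search)."""
    cp = ord(ch)
    # Index of the last range whose low bound is <= cp, or -1 if none.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= IDENT_RANGES[i][1]
```

With this excerpt, in_table("t") and in_table("α") are true, while in_table("\u2032") (the prime) is false because no range covers it.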
However, in maths many symbols, such as my earlier example,
contain characters that are not legal in identifiers, because
Unicode classifies them as punctuation or some similar
non-identifier category. How can we make D maths-friendly? Yes, we
can and do write things like t_prime, but it doesn’t look great,
and it’s long-winded. Yes, I hear you about the ease-of-use
problems with Unicode, but that was discussed before and belongs
to the earlier thread. Is there a way of allowing (almost)
‘arbitrary’ content in identifiers in D’s grammar? Think of the
kind of syntax that uses double quoting, as in "my file.ext", for
otherwise unacceptable filenames such as this example, which has a
space in it.
Is it at all possible that a future D might have a mechanism like
that to accommodate arbitrary identifiers for maths? Maybe even a
kind of extensible lexer? Perhaps that is way too hard, and an
easier but less attractive solution like the bracketing could be
found. But whatever is suggested would have to be compact, neat
and minimal, so that D statements and expressions could clearly
resemble the mathematical equations.
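
To make the bracketing idea concrete, here is a toy tokenizer sketch, written in Python only for brevity. The #"..." delimiter is an arbitrary placeholder chosen for this illustration; it is not existing D syntax and not a proposal, and the token set is deliberately tiny. The names TOKEN and tokenize are likewise hypothetical.

```python
import re

# Hypothetical syntax (an assumption for this sketch): #"..." wraps an
# otherwise illegal identifier. Whitespace and quotes are not allowed inside.
TOKEN = re.compile(r'''
    \s*(?:
        \#"(?P<quoted>[^"\s]+)"        # bracketed identifier
      | (?P<plain>[A-Za-z_]\w*)        # ordinary identifier
      | (?P<number>\d+(?:\.\d+)?)      # integer or decimal literal
      | (?P<op>[-+*/=();])             # a few single-character operators
    )''', re.VERBOSE)

def tokenize(src):
    """Split src into (kind, text) pairs; both identifier forms yield 'ident'."""
    src = src.rstrip()
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        text = m.group(kind)
        out.append(("ident" if kind in ("quoted", "plain") else kind, text))
        pos = m.end()
    return out
```

Under these assumptions, tokenize('#"t′" = t - v*x;') yields the identifier t′ as a single token, whereas a bare t′ is rejected at the prime. The obvious cost is the extra delimiter noise, which works against the "compact, neat and minimal" requirement.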
I thought about all the imaginative literal string syntax that we
already have, where a lot of work was done to make literal
strings more workable in various use-cases.
I’d be very interested to hear suggestions as to how we could do
special relativity with t, t′, and then t″. It may simply be too
hard to do it cleverly. I’m thinking about making D the most
maths-friendly language. Let’s displace Fortran ;-). (We would
need to make complex numbers friendlier for that though, maybe
with more of the syntactic sugar brought back, but that’s another
story.) I think it would possibly be a good idea to restrict
‘arbitrary’ characters to a certain subset, not allowing
absolutely any Unicode character: no whitespace, no control
characters, and no existing D tokens such as ‘=’. Maybe we should
disallow all punctuation characters that are already ‘taken’ in D,
that is, already in use in the existing lexer’s grammar, but I’m
unsure about that. What to do about ‘-’ (hyphen-minus)? It is
allowed in identifiers in some languages, such as XSLT, and used
there a lot. Perhaps ban it because of the confusion with minus
for subtraction. I don’t know. It doesn’t seem to be used in
physics, for that same reason.
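
The restriction described above could be expressed as a small predicate. This is a sketch in Python (not D) under stated assumptions: the TAKEN set is a rough guess at the ASCII punctuation D already uses, not something derived from D's grammar, and allowed_extra is a made-up name. It relies on Unicode general categories, where Z* covers separators (including the space) and C* covers control, format and unassigned code points.

```python
import unicodedata

# Assumption: a rough guess at ASCII punctuation already 'taken' by D.
# Hyphen-minus is included, i.e. this sketch takes the "ban it" option.
TAKEN = set("=+-*/%&|^~!<>?:;,.()[]{}@#$\"'`\\")

def allowed_extra(ch):
    """Return True if ch could be permitted as an 'arbitrary' identifier char."""
    cat = unicodedata.category(ch)
    if cat[0] in "ZC":   # separators (spaces) and control/format/unassigned
        return False
    return ch not in TAKEN
```

With this rule the prime U+2032 and the double prime U+2033 (both general category Po, "other punctuation") are accepted, while space, newline, ‘=’ and ‘-’ are rejected.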
Thoughts?