Building C++ modules
H. S. Teoh
hsteoh at quickfur.ath.cx
Tue Aug 13 16:32:43 UTC 2019
On Tue, Aug 13, 2019 at 08:21:56AM -0700, H. S. Teoh via Digitalmars-d wrote:
> On Tue, Aug 13, 2019 at 11:19:16AM +0200, Jacob Carlborg via Digitalmars-d wrote:
[...]
> > I don't know how this is implemented in a C++ compiler but can't the
> > lexer use a more abstract token that includes both the usage for
> > templates and for comparison operators? The parser can then figure
> > out exactly what it is.
>
> It's not so simple. The problem is that in C++, the *structure* of
> the parse tree changes depending on previous declarations. I.e., the
> lexical structure is not context-free.
[...]
Not to mention, since C++11 it's not just the parse tree that changes;
even the tokenization changes. I.e.:
    fun<gun<A, B>>(c, d);
can be tokenized as either:
    fun < gun < A , B >> ( c , d ) ;
(i.e., '>>' is the right-shift operator), or:
    fun < gun < A , B > > ( c , d ) ;
(i.e., '>>' is *two* closing template argument list delimiters).
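To make the two readings of '>>' concrete, here is a minimal sketch that should compile as C++11 or later. The names Box, Wrap, Nested and Shifted are made up for illustration and are not from the thread.

```cpp
// Hypothetical helper templates, for illustration only.
template <int N> struct Box { static constexpr int value = N; };
template <typename T> struct Wrap { using type = T; };

// Here '>>' is read as two closing template argument list delimiters.
using Nested = Wrap<Box<4>>;

// Here '>>' is meant as a right shift, so it has to be parenthesized.
// Without the parentheses, the first '>' would close Box's argument list.
using Shifted = Box<(16 >> 2)>;

int nested_value()  { return Nested::type::value; }
int shifted_value() { return Shifted::value; }
```

The same two characters end up meaning different things in the two alias declarations, and only the surrounding template context tells them apart.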
There is simply no way you can write a straightforward, context-free
lexer for C++. Such a thing simply doesn't exist, because C++ must be
parsed before it can be lexed. The lexer has to somehow know when '>>'
should be lexed as two tokens, or when it should be lexed as a single
token. The only way it can know this is if the parser informs it what
parse tree it's currently expecting. But that means the parser has to
be running *before* the lexer has finished lexing the input.
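The feedback from parser to lexer can be sketched roughly as follows. This is a toy, not how any real C++ compiler is structured: ToyLexer and tokenize are hypothetical names, and the "parser" is reduced to a counter of open template argument lists that is passed to the lexer on every call.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy lexer: identifiers and single-character punctuation, plus '>>'.
struct ToyLexer {
    std::string src;
    std::size_t pos = 0;

    explicit ToyLexer(std::string s) : src(std::move(s)) {}

    static bool is_ident(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }

    // template_depth comes from the caller (standing in for the parser):
    // it is the number of template argument lists currently open.
    // Returns an empty string at end of input.
    std::string next(int template_depth) {
        while (pos < src.size() &&
               std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
        if (pos >= src.size()) return "";
        if (is_ident(src[pos])) {
            std::size_t start = pos;
            while (pos < src.size() && is_ident(src[pos])) ++pos;
            return src.substr(start, pos - start);
        }
        // Only fuse '>>' into one token when no template list is open.
        if (src[pos] == '>' && pos + 1 < src.size() &&
            src[pos + 1] == '>' && template_depth == 0) {
            pos += 2;
            return ">>";
        }
        return std::string(1, src[pos++]);
    }
};

// Stand-in for the parser. angle_opens_template is an assumption that
// replaces real semantic analysis: it says whether '<' starts a template
// argument list in this input.
std::vector<std::string> tokenize(const std::string& src,
                                  bool angle_opens_template) {
    ToyLexer lx(src);
    std::vector<std::string> out;
    int depth = 0;
    for (std::string t = lx.next(depth); !t.empty(); t = lx.next(depth)) {
        if (angle_opens_template && t == "<") ++depth;
        else if (t == ">" && depth > 0) --depth;
        out.push_back(t);
    }
    return out;
}
```

Calling tokenize("fun<gun<A, B>>(c, d);", true) splits '>>' into two '>' tokens, while passing false leaves it as a single '>>' token. The point is that the lexer cannot decide on its own; the depth has to come from whoever is building the parse tree.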
Furthermore, how does the parser know when it's expecting a template
argument list? From my previous example, you see that even when an
input statement looks like a template function call, it may not actually
be one. Which means *semantic analysis* has to have already begun (at
least partially), enough to recognize certain identifiers as templates,
with a feedback loop to the parser, which in turn has a feedback loop to
the lexer so that it knows whether '>>' should be two tokens or one.
You can't get around this inherent complexity without becoming
non-compliant with the C++ spec.
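As a small illustration of why semantic information is needed, the following sketch puts the same expression text, f<g>(h), under two different sets of declarations. The namespaces and names are invented for this example.

```cpp
namespace as_template {
    // f is a function template, so f<g>(h) is a template function call.
    template <int N> int f(int x) { return N + x; }
    constexpr int g = 10;
    int h = 1;
    int run() { return f<g>(h); }
}

namespace as_comparison {
    // f, g and h are plain ints, so f<g>(h) parses as (f < g) > (h).
    int f = 5, g = 3, h = 1;
    int run() { return f<g>(h); }
}

int call_as_template()   { return as_template::run(); }
int call_as_comparison() { return as_comparison::run(); }
```

The bodies of the two run functions are character-for-character identical, yet the first is a call with a template argument and the second is a chain of two comparisons. A compiler may warn about the chained comparison, but it is valid C++.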
So you see, the seemingly insignificant choice of <> as template
argument list delimiters has far-reaching consequences. In retrospect,
it was a bad design decision. '<' and '>' should have been left alone
as comparison operators only, not overloaded with a completely unrelated
meaning that leads to all sorts of pathological ambiguities and needless
parser complexity.
T
--
Why ask rhetorical questions? -- JC