Building C++ modules

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Aug 13 16:32:43 UTC 2019


On Tue, Aug 13, 2019 at 08:21:56AM -0700, H. S. Teoh via Digitalmars-d wrote:
> On Tue, Aug 13, 2019 at 11:19:16AM +0200, Jacob Carlborg via Digitalmars-d wrote:
[...]
> > I don't know how this is implemented in a C++ compiler but can't the
> > lexer use a more abstract token that includes both the usage for
> > templates and for comparison operators? The parser can then figure
> > out exactly what it is.
> 
> It's not so simple.  The problem is that in C++, the *structure* of
> the parse tree changes depending on previous declarations. I.e., the
> lexical structure is not context-free.
[...]

Not to mention that since C++11, it's not just the parse tree that
changes; even the tokenization changes.  I.e.:

	fun<gun<A, B>>(c, d);

can be tokenized as either:

	fun < gun < A , B >> ( c , d ) ;

(i.e., '>>' is the right-shift operator), or:

	fun < gun < A , B > > ( c , d ) ;

(i.e., '>>' is *two* closing template argument list delimiters).
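
To make the two readings concrete, here is a minimal sketch in which the
very same statement compiles under both of them, depending only on how
the names were declared beforehand.  The namespaces as_templates and
as_ints and the two run() functions are made up for illustration, and
the code assumes a C++11 or later compiler.  The second reading will
likely draw warnings about unused comparison results.

```cpp
// Reading 1: fun and gun name templates, so '>>' closes two
// template argument lists.
namespace as_templates {
    struct A {};
    struct B {};
    template <typename T, typename U> struct gun {};
    template <typename T> int fun(int c, int d) { return c + d; }

    int run() {
        int c = 1, d = 2;
        return fun<gun<A, B>>(c, d);  // calls fun<gun<A, B>>(c, d)
    }
}

// Reading 2: every name is a plain int, so '<' is less-than and
// '>>' is the right-shift operator.
namespace as_ints {
    int run() {
        int fun = 1, gun = 2, A = 3, B = 64, c = 5, d = 2;
        // Parsed as: ((fun < gun) < A) , (B >> (c, d))
        // The inner (c, d) is a comma expression yielding d, so the
        // whole expression evaluates to B >> d.
        return fun<gun<A, B>>(c, d);
    }
}
```

With the values assumed above, as_templates::run() returns 3 and
as_ints::run() returns 16 (that is, 64 >> 2), even though the return
statements are character-for-character identical.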


There is simply no way to write a straightforward, context-free lexer
for C++, because C++ must be (at least partially) parsed before it can
be lexed.  The lexer has to somehow know whether '>>' should be lexed
as a single token or as two tokens.  The only way it can know this is
if the parser tells it what construct it's currently expecting.  But
that means the parser has to be running *before* the lexer has finished
lexing the input.
Furthermore, how does the parser know when it's expecting a template
argument list?  From my previous example, you see that even when an
input statement looks like a template function call, it may not actually
be one.  Which means *semantic analysis* has to have already begun (at
least partially), enough to recognize certain identifiers as templates,
with a feedback loop to the parser, which in turn has a feedback loop to
the lexer so that it knows whether '>>' should be two tokens or one.
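
As a rough illustration of that feedback loop, here is a toy lexer whose
output depends on a flag it cannot compute by itself.  The function lex()
and its split_shift parameter are hypothetical: split_shift stands in
for whatever the parser and semantic analysis would report about the
current context.  This is only a sketch of the dependency, not how any
particular compiler is implemented; a real one could, for instance, lex
'>>' as one token and let the parser split it later.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy lexer: identifiers/numbers become one token each, everything
// else is a single-character token, except that ">>" becomes one
// token unless the caller says we are closing template argument lists.
std::vector<std::string> lex(const std::string& src, bool split_shift) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) {
            ++i;  // skip whitespace
            continue;
        }
        if (is_word(src[i])) {
            std::size_t start = i;
            while (i < src.size() && is_word(src[i])) ++i;
            tokens.push_back(src.substr(start, i - start));
            continue;
        }
        // The context-dependent part: the lexer alone cannot decide this.
        if (!split_shift && src[i] == '>' &&
            i + 1 < src.size() && src[i + 1] == '>') {
            tokens.push_back(">>");
            i += 2;
            continue;
        }
        tokens.push_back(std::string(1, src[i]));
        ++i;
    }
    return tokens;
}
```

Calling lex("fun<gun<A, B>>(c, d);", false) yields a single ">>" token,
while passing true yields two separate ">" tokens.  The point is that
the correct value of the flag is only known after the declarations of
fun and gun have been analyzed.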

You can't get around this inherent complexity without becoming
non-compliant with the C++ spec.

So you see, the seemingly insignificant choice of <> as template
argument list delimiters has far-reaching consequences.  In retrospect,
it was a bad design decision.  '<' and '>' should have been left alone
as comparison operators only, not overloaded with a completely unrelated
meaning that leads to all sorts of pathological ambiguities and needless
parser complexity.


T

-- 
Why ask rhetorical questions? -- JC

