String Literal Docs

Ellery Newcomer ellery-newcomer at utulsa.edu
Mon Jun 21 13:20:16 PDT 2010


On 06/21/2010 02:21 PM, Alix Pexton wrote:
> On 20/06/2010 22:46, Alix Pexton wrote:
>> On 20/06/2010 21:37, Ellery Newcomer wrote:
>>> On 06/20/2010 03:01 PM, Alix Pexton wrote:
>>>> On 19/06/2010 21:12, Alix Pexton wrote:
>>>>> I've been sketching some grammar diagrams for D2.0, a little like those
>>>>> on JSON.org, and of course I didn't get far before I ran into something
>>>>> odd.
>>>>>
>>>>
>>>> I think I will take the plunge and base my diagrams on the source of
>>>> DMD. After looking at the code in lexer.c, it does not seem as far
>>>> beyond my rusty old C++ parsing skills as I had expected! Massive credit
>>>> to Walter for having a codebase that is as mature as DMD without it
>>>> turning into a labyrinth of preprocessor macros and cryptic
>>>> "comefrom"s.
>>>>
>>>> This will mean however that my little project may take a little longer,
>>>> sigh...
>>>>
>>>> A...
>>>
>>> Do share. I've always been too lazy to read lexer.c, and from this
>>> discussion, it sounds like there are a few spots where my own lexer
>>> grammar is incorrect (or at least differs from dmd).
>>>
>>
>> of course ^^
>>
>> A...
>
> Well, I think I have got my head around lexer.c now, and its various
> peculiarities, like "000377." being a valid float (although not
> according to my shiny new, limited edition copy of tDPL (fig2.2 p35)^^).

Oh wow. That's a sweet little diagram. Those dots are hard to see though.

>
> The weirdness occurs because some of the corner cases are handled not
> by the neat little state machine that validates reals, but in the
> scanner at the point where it recognises a number beginning with a zero.
> The productions in lex.html represent the range of inputs that are
> accepted by the state machine without taking into account that the
> scanner rejects the sequence "._" (which makes sense as that is the
> identifier "_" in the outer scope).
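
The "._" rule described above can be pictured with a toy sketch. This is not a port of lexer.c and not a full D literal grammar; the function name split_leading_number and the simplified digit pattern are assumptions made up for illustration. It models only the one behaviour mentioned: a trailing "." is absorbed into the number unless the next character is "_".

```python
import re

# Toy model only: NOT DMD's lexer.c. It illustrates the single rule from the
# quoted paragraph, namely that a '.' followed by '_' is not absorbed into a
# numeric literal. Suffixes, exponents, hex and binary forms are all ignored.
_DIGITS = re.compile(r"[0-9][0-9_]*")


def split_leading_number(src):
    """Return (literal, rest) for the decimal literal at the start of src.

    Hypothetical helper, named for this sketch only.
    """
    m = _DIGITS.match(src)
    if m is None:
        return "", src
    end = m.end()
    # A trailing '.' joins the literal unless it is immediately followed by '_'.
    if src.startswith(".", end) and not src.startswith("._", end):
        end += 1
        frac = _DIGITS.match(src, end)
        if frac is not None:
            end = frac.end()
    return src[:end], src[end:]


if __name__ == "__main__":
    for text in ("000377.", "000377._", "1.5_0 + x"):
        print(repr(text), "->", split_leading_number(text))
```

In this sketch "000377." is taken whole, while for "000377._" the dot is left behind for the next token, which matches the behaviour described for the scanner.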

To hell with lexer.c. I'm not changing anything.

>
> Andrei's analysis in tDPL also points out that 0xp0 is a valid hexfloat,
> but a strict reading of lex.html would not allow it.
>
> Overall the diagram for hexfloat is much simpler than the one for
> decimalfloat, which I think will have to be split into 3 ><
>
> A...
>
> PS, octal must die!

I'll settle for modified syntax 0c123. But yeah.

Are your diagrams solely concerned with the lexer? Because I have a 
(messy) parser grammar which I'm a bit more confident about if you're 
interested.

