[dox] Fixing the lexical rule for BinaryInteger
Andre Artus
andre.artus at gmail.com
Sat Aug 17 16:39:28 PDT 2013
> [...]
>>> H. S. Teoh wrote:
>>> I was just using BNF to show that it's possible to specify
>>> the behaviour precisely. And also that it's rather
>>> convoluted just for something as intuitively straightforward
>>> as an integer literal. Which is a likely reason why the
>>> current specs are a bit blurry about what should/shouldn't be
>>> allowed.
>> Andre Artus wrote:
>> I don't think I've seen lexemes defined using (a variant of)
>> BNF before; most often a form of regular expression is used.
>> One could cut down and clarify the page describing the lexical
>> syntax significantly by employing simple regular expressions.
> H. S. Teoh wrote:
> You're right, I think the D specs page on literals using BNF is
> a bit of an overkill. Maybe it should be rewritten using
> regexen.
> It would be easier to understand, for one thing.
I would not mind doing this, I'll see what Walter says.
It would also be quite easy to generate syntax diagrams from a
reg-expr.
> [...]
>>> H. S. Teoh wrote:
>>> I know that, but I'm saying that hardly *any* code would
>>> break if we made DMD reject things like this. I don't think
>>> anybody in their right mind would write code like that.
>>> (Unless they were competing in the IODCC... :-P)
>>
>> I agree that the compiler should probably break that code. I
>> believe some breaking changes are good when they help the
>> programmer fix potential bugs. But I am also someone who
>> compiles with "Treat warnings as errors".
> H. S. Teoh wrote:
> Walter is someone who believes that compilers should only have
> errors, not warnings. :)
That can go both ways, but I suspect you mean that in the good
way.
> [...]
>>>> Andre Artus wrote:
>>>> It's not a problem implementing the rule, I am more concerned
>>>> with documenting it in a clear and unambiguous way so that
>>>> people building tools from it can get it right. BNF isn't
>>>> always the easiest way to do so, but it's what's being used.
>>
>>> H. S. Teoh wrote:
>>> Well, you could bug Walter about what *should* be accepted,
>>
>> I'm not sure how to go about that.
> H. S. Teoh wrote:
> Email him and ask? :)
I'll try that.
>>> H. S. Teoh wrote:
>>> and if he agrees to restrict it to having _ only between two
>>> digits, then you'd file a bug against DMD.
>>
>> Well if we could get a ruling on this then we could include
>> HexadecimalInteger in the ruling as it has similar behaviour
>> in DMD.
>>
>> The current specification for DecimalInteger also allows a
>> trailing sequence of underscores. It also does not include the
>> sign as part of the token value.
> H. S. Teoh wrote:
> Yeah that sounds like a bug in the specs.
Yes, I believe so. The same issues appear under "Floating Point
Literals". Should be easy to fix.
>> Possible regex alternatives (note I do not include the sign,
>> as per current spec).
>>
>> (0|[1-9]([_]*[0-9])*)
>>
>> or arguably better
>> (0|[1-9]([_]?[0-9])*)
> [...]
>
> I think it should be:
>
> (0|[1-9]([0-9]*(_[0-9]+)*)?)
>
> That is, either it's a 0, or a single digit from 1-9, or 1-9
> followed by (zero or more digits 0-9 followed by zero or more
> (underscore followed by one or more digits 0-9)). This enforces
> only a single underscore between digits, and no
> preceding/trailing underscores. So it would exclude things like
> 12_____34, which is just as ridiculous as 123___, and only
> allow 12_34.
I concur with your assessment.
I believe my second reg-ex is functionally equivalent to the one
you propose (test results below). Although I would concede that
yours may be easier to grok.
The following match my second regex (assuming the input is
whitespace-delimited):
1
1_1
1_2_3_4_5_6_7_8_9_0
1234_45_15
1234567_8_90
123456789_0
1_234567890
12_34567890
123_4567890
1234_567890
12345_67890
123456_7890
1234567_890
12345678_90
123456789_0
123_45_6_789012345_67890
Whereas these do not (the signed ones fail because the sign is
not part of the token):
_1
1_
_1_
1______1
-12_34
-1234
123_45_6__789012345_67890
1234567890_
_1234567890_
_1234567890
1234567890_
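
For anyone who wants to reproduce the comparison, here is a minimal
Python sketch. It is not the script that produced the results above,
and the names LOOSE, STRICT_A, STRICT_B and accepts() are made up for
illustration. It assumes each whitespace-delimited token must match
in full, which is what re.fullmatch checks.

```python
import re

# The three candidate patterns from this thread (sign excluded, as per the
# current spec). The constant names are hypothetical, chosen for this sketch.
LOOSE = re.compile(r"0|[1-9]([_]*[0-9])*")              # runs of underscores allowed
STRICT_A = re.compile(r"0|[1-9]([_]?[0-9])*")           # at most one underscore between digits
STRICT_B = re.compile(r"0|[1-9]([0-9]*(_[0-9]+)*)?")    # the alternative proposed in the reply

SHOULD_MATCH = """
1 1_1 1_2_3_4_5_6_7_8_9_0 1234_45_15 1234567_8_90 123456789_0
1_234567890 12_34567890 123_4567890 1234_567890 12345_67890
123456_7890 1234567_890 12345678_90 123_45_6_789012345_67890
""".split()

SHOULD_FAIL = """
_1 1_ _1_ 1______1 -12_34 -1234 123_45_6__789012345_67890
1234567890_ _1234567890_ _1234567890
""".split()


def accepts(pattern, token):
    """Return True if the whole token matches the pattern."""
    return pattern.fullmatch(token) is not None


def disagreements(tokens):
    """Tokens on which the two strict patterns give different answers."""
    return [t for t in tokens if accepts(STRICT_A, t) != accepts(STRICT_B, t)]


if __name__ == "__main__":
    for token in SHOULD_MATCH + SHOULD_FAIL:
        print(token, accepts(LOOSE, token), accepts(STRICT_A, token),
              accepts(STRICT_B, token))
    print("strict patterns disagree on:", disagreements(SHOULD_MATCH + SHOULD_FAIL))
```

Agreement on these samples is only evidence, not a proof, that the two
strict patterns are equivalent. The looser first variant differs from
both only on tokens with consecutive underscores, such as 1______1.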