[dox] Fixing the lexical rule for BinaryInteger

Andre Artus andre.artus at gmail.com
Sat Aug 17 16:39:28 PDT 2013


> [...]
>>> H. S. Teoh wrote:
>>> I was just using BNF to show that it's possible to specify 
>>> the behaviour precisely.  And also that it's rather 
>>> convoluted just for something as intuitively straightforward 
>>> as an integer literal. Which is a likely reason why the 
>>> current specs are a bit blurry about what should/shouldn't be 
>>> allowed.

>> Andre Artus wrote:
>> I don't think I've seen lexemes defined using (a variant of) 
>> BNF before; most often some form of regular expression is used. 
>> One could cut down and clarify the page describing the lexical 
>> syntax significantly by employing simple regular expressions.

> H. S. Teoh wrote:
> You're right, I think the D specs page on literals using BNF is 
> a bit of an overkill. Maybe it should be rewritten using 
> regexen.
> It would be easier to understand, for one thing.

I would not mind doing this; I'll see what Walter says.

It would also be quite easy to generate syntax diagrams from a 
reg-expr.

> [...]
>>> H. S. Teoh wrote:
>>> I know that, but I'm saying that hardly *any* code would 
>>> break if
>>> we made DMD reject things like this. I don't think anybody in
>>> their right mind would write code like that. (Unless they were
>>> competing in the IODCC... :-P)
>> 
>> I agree that the compiler should probably break that code; I 
>> believe some breaking changes are good when they help the 
>> programmer fix potential bugs. But I am also someone who 
>> compiles with "Treat warnings as errors".

> H. S. Teoh wrote:
> Walter is someone who believes that compilers should only have 
> errors, not warnings. :)

That can go both ways, but I suspect you mean that in the good 
way.


> [...]
>>>> Andre Artus wrote:
>>>> It's not a problem implementing the rule, I am more concerned
>>>> with documenting it in a clear and unambiguous way so that
>>>> people building tools from it can get it right. BNF isn't 
>>>> always the easiest way to do so, but it's what is being used.
>> 
>>> H. S. Teoh wrote:
>>> Well, you could bug Walter about what *should* be accepted,
>> 
>> I'm not sure how to go about that.

> H. S. Teoh wrote:
> Email him and ask? :)

I'll try that.

>>> H. S. Teoh wrote:
>>> and if he agrees to restrict it to having _ only between two
>>> digits, then you'd file a bug against DMD.
>> 
>> Well if we could get a ruling on this then we could include
>> HexadecimalInteger in the ruling as it has similar behaviour 
>> in DMD.
>> 
>> The current specification for DecimalInteger also allows a 
>> trailing sequence of underscores. It also does not include the 
>> sign as part of the token value.

> H. S. Teoh wrote:
> Yeah that sounds like a bug in the specs.

Yes, I believe so. The same issues appear under "Floating Point 
Literals". They should be easy to fix.

>> Possible regex alternatives (note I do not include the sign, 
>> as per current spec).
>> 
>> (0|[1-9]([_]*[0-9])*)
>> 
>> or arguably better
>> (0|[1-9]([_]?[0-9])*)
> [...]
>
> I think it should be:
>
> 	(0|[1-9]([0-9]*(_[0-9]+)*)?)
>
> That is, either it's a 0, or a single digit from 1-9, or 1-9 
> followed by (zero or more digits 0-9 followed by zero or more 
> (underscore followed by one or more digits 0-9)). This enforces 
> only a single underscore between digits, and no 
> preceding/trailing underscores. So it would exclude things like 
> 12_____34, which is just as ridiculous as 123___, and only 
> allow 12_34.

I concur with your assessment.
I believe my second reg-ex is functionally equivalent to the one 
you propose (test results below). Although I would concede that 
yours may be easier to grok.
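
A bounded brute-force check is one way to sanity-test that claim. The 
sketch below is only an illustration: the choice of Python, the helper 
names (accepts, disagreements), the alphabet "019_" and the length 
limit of 6 are all my own assumptions, not anything taken from DMD or 
the spec. It compares the three patterns quoted above on every string 
over that small alphabet.

```python
import re
from itertools import product

# Patterns copied from the thread; fullmatch() anchors them to the whole token.
ANDRE_FIRST = re.compile(r"(0|[1-9]([_]*[0-9])*)")
ANDRE_SECOND = re.compile(r"(0|[1-9]([_]?[0-9])*)")
TEOH = re.compile(r"(0|[1-9]([0-9]*(_[0-9]+)*)?)")


def accepts(pattern, token):
    """True if the pattern matches the entire token."""
    return pattern.fullmatch(token) is not None


def disagreements(p, q, alphabet="019_", max_len=6):
    """Return every string over `alphabet` (up to `max_len` characters)
    that one pattern accepts and the other rejects."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            token = "".join(chars)
            if accepts(p, token) != accepts(q, token):
                found.append(token)
    return found


second_vs_teoh = disagreements(ANDRE_SECOND, TEOH)
first_vs_teoh = disagreements(ANDRE_FIRST, TEOH)

print("second regex vs. proposed:", len(second_vs_teoh), "disagreements")
print("first regex vs. proposed: ", len(first_vs_teoh), "disagreements")
```

An empty list for the second comparison supports the equivalence only 
within the tested bound; it is not a proof. The first comparison should 
report strings with runs of underscores, such as 1__1, which the [_]* 
variant accepts and the other two reject.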


The following match my second regex (assuming whitespace-delimited tokens):

1
1_1
1_2_3_4_5_6_7_8_9_0
1234_45_15
1234567_8_90
123456789_0
1_234567890
12_34567890
123_4567890
1234_567890
12345_67890
123456_7890
1234567_890
12345678_90
123456789_0
123_45_6_789012345_67890

Whereas these do not

_1
1_
_1_
1______1
-12_34
-1234
123_45_6__789012345_67890
1234567890_
_1234567890_
_1234567890
1234567890_
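
For anyone who wants to reproduce these results, here is a minimal 
sketch of how the two lists can be checked. It assumes Python's re 
module and treats each whitespace-delimited token as a whole-string 
match; the names DECIMAL_INTEGER, SHOULD_MATCH, SHOULD_NOT_MATCH and 
is_decimal_integer are made up for this example.

```python
import re

# The second regex from this post, applied to a complete token.
DECIMAL_INTEGER = re.compile(r"(0|[1-9]([_]?[0-9])*)")

SHOULD_MATCH = """
1 1_1 1_2_3_4_5_6_7_8_9_0 1234_45_15 1234567_8_90 123456789_0
1_234567890 12_34567890 123_4567890 1234_567890 12345_67890
123456_7890 1234567_890 12345678_90 123456789_0
123_45_6_789012345_67890
""".split()

SHOULD_NOT_MATCH = """
_1 1_ _1_ 1______1 -12_34 -1234 123_45_6__789012345_67890
1234567890_ _1234567890_ _1234567890 1234567890_
""".split()


def is_decimal_integer(token):
    """True if the whole token is accepted by the regex."""
    return DECIMAL_INTEGER.fullmatch(token) is not None


# Collect any token whose result differs from the list it appears in.
unexpected_rejects = [t for t in SHOULD_MATCH if not is_decimal_integer(t)]
unexpected_accepts = [t for t in SHOULD_NOT_MATCH if is_decimal_integer(t)]

print("unexpected rejects:", unexpected_rejects)
print("unexpected accepts:", unexpected_accepts)
```

Both lists should come back empty. The signed inputs (-12_34, -1234) 
are rejected because the sign is not part of the token, as per the 
current spec.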

