[dox] Fixing the lexical rule for BinaryInteger

H. S. Teoh hsteoh at quickfur.ath.cx
Sat Aug 17 15:26:25 PDT 2013


On Sat, Aug 17, 2013 at 11:29:03PM +0200, Andre Artus wrote:
[...]
> >H. S. Teoh wrote:
> >I was just using BNF to show that it's possible to specify the
> >behaviour precisely.  And also that it's rather convoluted just for
> >something as intuitively straightforward as an integer literal. Which
> >is a likely reason why the current specs are a bit blurry about what
> >should/shouldn't be allowed.
> 
> I don't think I've seen lexemes defined using (a variant of) BNF
> before; most often some form of regular expression is used. One could
> cut down and clarify the page describing the lexical syntax
> significantly by employing simple regular expressions.

You're right, I think using BNF on the D specs page on literals is a bit
of overkill. Maybe it should be rewritten using regexen. It would be
easier to understand, for one thing.


[...]
> >H. S. Teoh wrote:
> >I know that, but I'm saying that hardly *any* code would break if
> >we made DMD reject things like this. I don't think anybody in
> >their right mind would write code like that. (Unless they were
> >competing in the IODCC... :-P)
> 
> I agree that the compiler should probably break that code; I believe
> some breaking changes are good when they help the programmer fix
> potential bugs. But I am also someone who compiles with "Treat
> warnings as errors".

Walter is someone who believes that compilers should only have errors,
not warnings. :)


[...]
> >>Andre Artus wrote:
> >>It's not a problem implementing the rule, I am more concerned
> >>with documenting it in a clear and unambiguous way so that
> >>people building tools from it can get it right. BNF isn't always
> >>the easiest way to do so, but it's what's being used.
> 
> >H. S. Teoh wrote:
> >Well, you could bug Walter about what *should* be accepted,
> 
> I'm not sure how to go about that.

Email him and ask? :)


> >H. S. Teoh wrote:
> >and if he agrees to restrict it to having _ only between two
> >digits, then you'd file a bug against DMD.
> 
> Well if we could get a ruling on this then we could include
> HexadecimalInteger in the ruling as it has similar behaviour in DMD.
> 
> The current specification for DecimalInteger also allows a trailing
> sequence of underscores. It also does not include the sign as part
> of the token value.

Yeah that sounds like a bug in the specs.


> Possible regex alternatives (note I do not include the sign, as per
> current spec).
> 
> (0|[1-9]([_]*[0-9])*)
> 
> or arguably better
> (0|[1-9]([_]?[0-9])*)
[...]

I think it should be:

	(0|[1-9]([0-9]*(_[0-9]+)*)?)

That is, either it's a 0, or a single digit from 1-9, or 1-9 followed by
(zero or more digits 0-9 followed by zero or more (underscore followed
by one or more digits 0-9)). This allows at most a single underscore
between digits, and no leading/trailing underscores. So it would
exclude things like 12_____34, which is just as ridiculous as 123___,
and only allow 12_34.
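
A quick way to compare the three candidate patterns from this thread is
to run them over a handful of literals. The following is only a sketch
in Python, assuming its `re` syntax treats these patterns the same way a
lexer generator would; the dictionary keys, the sample list, and the
`accepted` helper are names made up for illustration and come from
neither DMD nor the spec.

```python
import re

# Patterns copied verbatim from the thread; the keys are hypothetical labels.
PATTERNS = {
    "any_underscores": r"(0|[1-9]([_]*[0-9])*)",
    "optional_underscore": r"(0|[1-9]([_]?[0-9])*)",
    "grouped_digits": r"(0|[1-9]([0-9]*(_[0-9]+)*)?)",
}

# Hypothetical sample literals, including the odd cases discussed above.
SAMPLES = ["0", "7", "1234", "12_34", "1_2_3",
           "12_____34", "123___", "_123", "0123"]


def accepted(pattern, samples=SAMPLES):
    """Return the samples that the pattern matches in full."""
    rx = re.compile(pattern)
    return [s for s in samples if rx.fullmatch(s)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(f"{name:20} {accepted(pattern)}")
```

On these samples, the first pattern still accepts 12_____34, while the
other two accept the same set: 0, 7, 1234, 12_34 and 1_2_3. All three
reject 123___, _123 and 0123.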


T

-- 
Blunt statements really don't have a point.

