[dox] Fixing the lexical rule for BinaryInteger

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Aug 16 17:50:24 PDT 2013


On Sat, Aug 17, 2013 at 01:03:35AM +0200, Brian Schott wrote:
> On Friday, 16 August 2013 at 22:43:13 UTC, Andre Artus wrote:
[...]
> >2. Your BinaryInteger and HexadecimalInteger only allow for one of
> >the following (reduced) cases:
> >
> >0b1__ : works
> >0b_1_ : fails
> >0b__1 : fails
> 
> It's my opinion that the compiler should reject all of these because
> I think of the underscore as a separator between digits, but I'm
> constantly fighting the "spec, dmd, and idiom all disagree" issue.
[...]

I remember reading this part of the spec on dlang.org, and I wonder if
it was worded the way it is just for simplicity, because specifying
something like "_ must appear between digits" takes some more
complicated BNF rules, which may seem like overkill for a single literal.

But sometimes it is good to be precise, if we want to enforce "proper"
conventions for underscores:

<binaryLiteral> ::= "0b" <binaryDigits> <underscoreBinaryDigits>

<binaryDigits> ::= <binaryDigit> <binaryDigits>
		| <binaryDigit>

<underscoreBinaryDigits> ::= ""
		| "_" <binaryDigits>
		| "_" <binaryDigits> <underscoreBinaryDigits>

<binaryDigit> ::= "0"
		| "1"

This BNF spec forces "_" to appear only between two binary digits, and
never more than a single _ in a row. You can also make your parser pick
up only <binaryDigit> when performing semantic analysis on binary
literals, so the other stuff is ignored and only serves to enforce syntax.
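
As a rough sketch of what the strict grammar accepts, here is the same
rule written as a regular expression in Python (not D, and not what dmd
does; the names STRICT_BINARY and binary_value are made up for
illustration). The value is computed from the digits alone, with the
underscores skipped:

```python
import re

# Hypothetical name. Regex equivalent of the strict BNF:
# "0b", one or more digits, then zero or more groups of "_" plus digits.
STRICT_BINARY = re.compile(r"0b[01]+(?:_[01]+)*")

def binary_value(literal):
    """Return the value of a strict binary literal, or None if it is rejected."""
    if STRICT_BINARY.fullmatch(literal) is None:
        return None
    # Only the binary digits carry meaning; underscores just enforce syntax.
    digits = [c for c in literal[2:] if c in "01"]
    return int("".join(digits), 2)
```

With this, binary_value("0b1010_1010") gives 170, while "0b1__",
"0b_1_" and "0b__1" are all rejected.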

I'd be surprised if there's any D code out there that doesn't fit this
spec, to be honest.

But if you want to accept "strange" literals like 0b__1__, you could do
something like:

<binaryLiteral> ::= "0b" <underscoreBinaryDigits> <binaryDigit> <underscoreBinaryDigits>

<underscoreBinaryDigits> ::= "_"
		| "_" <underscoreBinaryDigits>
		| <binaryDigit>
		| <binaryDigit> <underscoreBinaryDigits>
		| ""

<binaryDigit> ::= "0"
		| "1"

The odd form of the rule for <binaryLiteral> is to ensure that there's
at least one binary digit in the string, whereas
<underscoreBinaryDigits> is just a wildcard anything-goes rule that
takes any combination of 0, 1, and _, including the empty string.
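
For comparison, a sketch of the permissive rule as a Python regular
expression (again, LOOSE_BINARY and is_loose_binary are hypothetical
names, not anything from the spec or dmd). The single mandatory digit in
the middle plays the same role as the lone <binaryDigit> in
<binaryLiteral>:

```python
import re

# Hypothetical name. Regex equivalent of the permissive BNF:
# any mix of 0, 1 and _ after "0b", as long as at least one digit appears.
LOOSE_BINARY = re.compile(r"0b[01_]*[01][01_]*")

def is_loose_binary(literal):
    """Return True if the literal fits the anything-goes grammar."""
    return LOOSE_BINARY.fullmatch(literal) is not None
```

Here is_loose_binary("0b__1__") is True, but "0b___" and a bare "0b"
are still rejected because they contain no digit.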


T

-- 
That's not a bug; that's a feature!

