Bye bye, fast compilation times
Walter Bright
newshound2 at digitalmars.com
Thu Feb 8 02:04:43 UTC 2018
On 2/7/2018 1:07 PM, Nathan S. wrote:
> On Tuesday, 6 February 2018 at 22:29:07 UTC, Walter Bright wrote:
>> nobody uses regex for lexer in a compiler.
>
> Some years ago I was surprised when I saw this in Clojure's source code. It
> appears to still be there today:
>
> https://github.com/clojure/clojure/blob/1215ba346ffea3fe48def6ec70542e3300b6f9ed/src/jvm/clojure/lang/LispReader.java#L66-L73
>
>
> ---
> static Pattern symbolPat =
> Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");
> //static Pattern varPat = Pattern.compile("([\\D&&[^:\\.]][^:\\.]*):([\\D&&[^:\\.]][^:\\.]*)");
> //static Pattern intPat = Pattern.compile("[-+]?[0-9]+\\.?");
> static Pattern intPat =
>     Pattern.compile(
>         "([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?");
> static Pattern ratioPat = Pattern.compile("([-+]?[0-9]+)/([0-9]+)");
> static Pattern floatPat =
> Pattern.compile("([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?");
> ---
Yes, I'm sure somebody does it. But once the regex has produced a match, you
have to scan the matched text again to turn it into a number, which makes for
slow lexing. And if the regex doesn't produce a match, you get a generic error
message rather than something specific like "character 'A' is not allowed in a
numeric literal". (Generic error messages are one of the downsides of using
tools like lex and yacc.)
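To make those two costs concrete, here is a minimal Java sketch (Java only to match the quoted LispReader code) contrasting a regex-then-convert lexer with a hand-written single-pass one. It assumes a deliberately simplified literal grammar: an optional sign followed by either decimal digits or 0x-prefixed hex digits, much smaller than Clojure's intPat. The class and method names (NumberLexerDemo, regexLex, handLex, asciiDigit) are hypothetical and not taken from Clojure or any real compiler.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: two ways to lex a signed integer literal (decimal or 0x-hex).
final class NumberLexerDemo {

    // Regex approach: optional sign, then 0x-prefixed hex digits or decimal digits.
    static final Pattern NUM = Pattern.compile("([-+]?)(?:0[xX]([0-9A-Fa-f]+)|([0-9]+))");

    static long regexLex(String s) {
        Matcher m = NUM.matcher(s);
        if (!m.matches()) {
            // All we know is that the whole pattern failed, not where or why.
            throw new IllegalArgumentException("invalid number: " + s);
        }
        // Second pass: Long.parseLong re-scans digits the regex already examined.
        boolean hex = m.group(2) != null;
        String digits = hex ? m.group(2) : m.group(3);
        return Long.parseLong(m.group(1) + digits, hex ? 16 : 10);
    }

    // Hand-written approach: one pass that validates and converts at the same time.
    static long handLex(String s) {
        int i = 0;
        boolean negative = false;
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            negative = s.charAt(i) == '-';
            i++;
        }
        int radix = 10;
        if (i + 1 < s.length() && s.charAt(i) == '0'
                && (s.charAt(i + 1) == 'x' || s.charAt(i + 1) == 'X')) {
            radix = 16;
            i += 2;
        }
        if (i == s.length()) {
            throw new IllegalArgumentException("numeric literal has no digits");
        }
        String kind = radix == 16 ? "hexadecimal" : "decimal";
        // Accumulate as a negative number so Long.MIN_VALUE is representable;
        // the *Exact methods throw ArithmeticException on overflow.
        long value = 0;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            int d = asciiDigit(c);
            if (d < 0 || d >= radix) {
                // The offending character is known right here, so say so.
                throw new IllegalArgumentException(
                        "character '" + c + "' is not allowed in a " + kind + " numeric literal");
            }
            value = Math.subtractExact(Math.multiplyExact(value, radix), d);
        }
        return negative ? value : Math.negateExact(value);
    }

    // Value of an ASCII hex digit, or -1 (Character.digit would also accept non-ASCII digits).
    static int asciiDigit(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(regexLex("0x1F")); // 31
        System.out.println(handLex("0x1F"));  // 31
        try {
            regexLex("12A4");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // invalid number: 12A4
        }
        try {
            handLex("12A4");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // character 'A' is not allowed in a decimal numeric literal
        }
    }
}
```

Both functions accept the same inputs, but on "12A4" the regex version can only report that the text didn't match, while the single-pass version names the bad character. The single-pass version also never revisits a character: each one is checked and folded into the value as it is read.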
More information about the Digitalmars-d mailing list