Bye bye, fast compilation times

Walter Bright newshound2 at digitalmars.com
Thu Feb 8 02:04:43 UTC 2018


On 2/7/2018 1:07 PM, Nathan S. wrote:
> On Tuesday, 6 February 2018 at 22:29:07 UTC, Walter Bright wrote:
>> nobody uses regex for lexer in a compiler.
> 
> Some years ago I was surprised when I saw this in Clojure's source code. It 
> appears to still be there today:
> 
> https://github.com/clojure/clojure/blob/1215ba346ffea3fe48def6ec70542e3300b6f9ed/src/jvm/clojure/lang/LispReader.java#L66-L73 
> 
> 
> ---
> static Pattern symbolPat = Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");
> //static Pattern varPat = Pattern.compile("([\\D&&[^:\\.]][^:\\.]*):([\\D&&[^:\\.]][^:\\.]*)");
> //static Pattern intPat = Pattern.compile("[-+]?[0-9]+\\.?");
> static Pattern intPat =
>         Pattern.compile(
>                 "([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?");
> static Pattern ratioPat = Pattern.compile("([-+]?[0-9]+)/([0-9]+)");
> static Pattern floatPat = Pattern.compile("([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?");
> ---


Yes, I'm sure somebody does it. But once the regex has produced a match, you 
have to scan the matched text a second time to turn it into a number, which 
makes for slow lexing. And if the regex doesn't produce a match, you get a 
generic error message rather than something specific like "character 'A' is not 
allowed in a numeric literal".

(Generic error messages are one of the downsides of using tools like lex and yacc.)
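
To make the contrast concrete, here is a minimal sketch of a hand-written scanner that builds the numeric value while it walks the characters, so no second pass is needed, and that can name the offending character when it fails. The class NumberScanner and the method scanDecimal are hypothetical names chosen for illustration; they come from neither Clojure nor any D compiler. The sketch handles only unsigned decimal literals and ignores overflow.

```java
public class NumberScanner {
    /**
     * Hypothetical helper: scans an unsigned decimal literal that begins at
     * src[start] and returns its value. Overflow is not checked in this sketch.
     */
    static long scanDecimal(String src, int start) {
        long value = 0;
        int i = start;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c >= '0' && c <= '9') {
                // Accumulate the value during the scan: no second pass.
                value = value * 10 + (c - '0');
            } else if (Character.isLetter(c)) {
                // The scanner knows exactly which character is wrong.
                throw new IllegalArgumentException(
                    "character '" + c + "' is not allowed in a numeric literal");
            } else {
                break; // any other character ends the literal
            }
            i++;
        }
        if (i == start) {
            throw new IllegalArgumentException("expected a digit at offset " + start);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(scanDecimal("1234;", 0)); // prints 1234
        try {
            scanDecimal("12A4", 0);
        } catch (IllegalArgumentException e) {
            // prints: character 'A' is not allowed in a numeric literal
            System.out.println(e.getMessage());
        }
    }
}
```

A regex-based reader would instead match the text first and then hand the matched string to a separate conversion routine, and a failed match by itself says nothing about which character was at fault.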

