Lexer in D

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Mar 2 09:33:08 PST 2013


02-Mar-2013 18:53, Namespace пишет:
>> That would be slower as canFind is linear search.
>> Simpler way is use first letter and length to select a bucket of
>> keywords.
>
> I think I get it.
> Did you mean something like this?
>

No-no and no.

Just translate list of keywords/types to a bunch of switch-cases that 
will give you near optimal performance. Use CTFE to construct that 
string of D code.

In fact I proposed to do just 2 levels of switches: first switch by 
length then by first char.

Even simple if you don't want to go for CTFE code-generation is to make 
plain array of array. The length of array is exactly 256 i.e.

Even this should have decent speed:

string[][256] keywords = [
null,
null,
...
//at index 'i'
['int', 'if', 'invariant', 'idouble', 'ifloat' ...]
//array of strings starting with char for each
...
];

bool isKeyword(string value)
{
	string[] toSearch = keywords[value[0]];
	return toSearch.canFind(value);
}

Better yet only store [1..$] parts of keywords and search for value[1..$].

Even faster is to make the same for each possible length of keyword.

> [code]
> immutable ubyte[char] typeMap;
> immutable ubyte[char] keywordMap;
>
> static this() {
>      typeMap = [
>          'd' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10, 'r' :
> 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26
>      ];
>
>      keywordMap = [
>          '_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26, 'f' :
> 30, 'g' : 36, 'i' : 37, 'l' : 45,
>          'm' : 46, 'n' : 49, 'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't'
> : 69, 'u' : 77, 'v' : 79, 'w' : 81
>      ];
> }
>
> bool isKeyword(string value) pure nothrow {
>      if (value[0] !in keywordMap) return false;
>
>      foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
>          if (kword == value) return true;
>          if (kword[0] != value[0]) return false;
>      }
>
>      return false;
> }
>
> bool isType(string value) pure nothrow {
>      if (value[0] !in typeMap) return false;
>
>      foreach (string type; types[typeMap[value[0]] .. $]) {
>          if (type == value) return true;
>          if (type[0] != value[0]) return false;
>      }
>
>      return false;
> }
> [/code]
>
> I'm now by ~230 msecs.


-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list