struct vs class for a simple token in my D lexer

Dmitry Olshansky dmitry.olsh at gmail.com
Mon May 14 10:05:16 PDT 2012


On 14.05.2012 19:10, Roman D. Boiko wrote:
> (Subj.) I'm in doubt which to choose for my case, but this is a generic
> question.
>
> http://forum.dlang.org/post/odcrgqxoldrktdtarskf@forum.dlang.org
>
> Cross-posting here. I would appreciate any feedback. (Whether to reply
> in this or that thread is up to you.) Thanks
Oops, sorry I meant to post to NG only :) Repost:

Clearly you are putting too much pressure on Token.

In my mind it should be real simple:

struct Token{
     uint col, line;
     uint flags; // info about the token; serves as both a type tag and
                 // a flag set. Indicates the proper type once the token
                 // is cooked (like "31.415926" -> 3.1415926e1), i.e.
                 // values are calculated.
     union {
         string chars;
         float  f_val;
         double d_val;
         uint   uint_val;
         long   long_val;
         ulong  ulong_val;
         //... anything else you may need (8 bytes are plenty)
     } // even then you may use up to 12 bytes
     //total size == 24 or 20
}

Where:
     Each raw token starts with chars == a slice of the characters in 
the text (or, if the source is not UTF-8, a copy of the source text). 
Except for keywords and operators.
     Cooking is the process of calculating constant values and such 
(say, populating the symbol table while putting the symbol id into the 
token instead of leaving the string slice). Do it on the fly or after 
the whole source is scanned - let the user choose.
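As a rough sketch of the cooking step, assuming a trimmed-down Token with just the fields used here (the flag value and helper name are mine, purely for illustration):

```d
import std.conv : to;

enum DoubleLit = 1u; // hypothetical flag value for a cooked double

struct Token
{
    uint col, line;
    uint flags;
    union
    {
        string chars; // raw: slice of the source text
        double d_val; // cooked: computed constant value
    }
}

// Cook a raw floating-point literal in place: parse the slice,
// then overwrite the union with the computed value.
void cookDouble(ref Token t)
{
    double v = to!double(t.chars); // read chars before clobbering it
    t.d_val = v;
    t.flags = DoubleLit;
}
```

After cooking, the token carries the numeric value instead of the slice, so later passes never re-parse the text.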

Value types have the nice property of being really fast; I suggest you 
do at least some synthetic tests before going with ref-based stuff. 
Pushing 4 words is cheap, indirection never is. Classes also have a 
hidden mutex _monitor_ field, so using 'class' can be best described 
as suicide.

Yet you may go with a freelist of tokens (leaving them as structs). 
It's an old and proven way.
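A minimal freelist along those lines might look like the sketch below (the names are mine; tokens stay plain structs, embedded in list nodes that get recycled instead of reallocated):

```d
struct Token { uint col, line, flags; }

struct TokenNode
{
    Token tok;
    TokenNode* next;
}

struct FreeList
{
    TokenNode* head;

    // Reuse a node if one is available, otherwise allocate a new one.
    TokenNode* acquire()
    {
        if (head is null)
            return new TokenNode;
        auto n = head;
        head = n.next;
        n.next = null;
        return n;
    }

    // Return a node to the list for later reuse.
    void release(TokenNode* n)
    {
        n.next = head;
        head = n;
    }
}
```

Releasing and re-acquiring hands back the same node, so steady-state lexing allocates almost nothing.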


About row/col - if this stuff is encoded into the finite automaton 
(and you surely want something like a DFA) it comes out almost at no 
cost. The only disadvantage is complicating the DFA tables with some 
irregularities of the "true Unicode line ending sequences". It's more 
of a nuisance than a real problem though.
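To illustrate the irregularities: here is one way to step line/column counters over the line terminators D's lexer recognizes (\n, \r, \r\n, U+2028, U+2029). A table-driven DFA would fold these cases into its transitions rather than branch like this:

```d
// Advance (line, col) over one code point, given one code point of
// lookahead. \r\n must count as a single line ending, so the \r of a
// \r\n pair does nothing and the following \n does the increment.
void advance(dchar c, dchar lookahead, ref uint line, ref uint col)
{
    if (c == '\n' || c == '\u2028' || c == '\u2029'
        || (c == '\r' && lookahead != '\n'))
    {
        ++line;
        col = 0;
    }
    else if (c != '\r')
        ++col;
}
```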

P.S. If you are really bent on performance, I suggest running a 
separate Aho-Corasick-style matcher for keywords; it would greatly 
simplify (= speed up) the DFA structure if keywords are not hardcoded 
into the automaton.
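The idea being: the DFA lexes every identifier with one generic rule, and a separate lookup decides keyword-ness afterwards. A plain associative array stands in below for the Aho-Corasick matcher (the flag values are made up):

```d
// Hypothetical flag values for a few sample keywords.
immutable uint[string] keywordFlag;

shared static this()
{
    keywordFlag = ["if": 10u, "else": 11u, "while": 12u];
}

// Classify an already-scanned identifier: keyword flag, or 0 for a
// plain identifier.
uint classify(string ident)
{
    if (auto p = ident in keywordFlag)
        return *p;
    return 0;
}
```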


-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list