struct vs class for a simple token in my D lexer

Mon May 14 10:27:20 PDT 2012

On Monday, 14 May 2012 at 17:05:17 UTC, Dmitry Olshansky wrote:
> On 14.05.2012 19:10, Roman D. Boiko wrote:
>> (Subj.) I'm in doubt which to choose for my case, but this is 
>> a generic
>> question.
>>
>> http://forum.dlang.org/post/odcrgqxoldrktdtarskf@forum.dlang.org
>>
>> Cross-posting here. I would appreciate any feedback. (Whether 
>> to reply
>> in this or that thread is up to you.) Thanks
> Oops, sorry I meant to post to NG only :) Repost:
>
> Clearly you are putting too much pressure on Token.
>
> In my mind it should be real simple:
>
> struct Token{
>     uint col, line;
>     uint flags; // info about the token; serves as both type tag
>                 // and flag set. Indicates the proper type once the
>                 // token is cooked (like "31.415926" -> 3.1415926e1),
>                 // i.e. once values are calculated
>     union {
>         string    chars;
>         float     f_val;
>         double    d_val;
>         uint      uint_val;
>         long      long_val;
>         ulong     ulong_val;
>         //... anything else you may need (8 bytes are plenty)
>     }//even then you may use up to 12 bytes
>     //total size == 24 or 20
> }
>
> Where:
>     Each raw token starts out with chars == a slice of the 
> characters in the text (or, if the source is not UTF-8, a copy 
> of the source), except for keywords and operators.
>     Cooking is the process of calculating constant values and 
> such (say, populating the symbol table would put a symbol id 
> into the token instead of leaving the string slice). Do it on 
> the fly or after the whole source - let the user choose.
>
> Value types have the nice property of being really fast. I 
> suggest you do at least some synthetic tests before going with 
> ref-based stuff. Pushing 4 words is cheap, indirection never is. 
> Classes also have a hidden mutex _monitor_ field, so using 
> 'class' can be best described as suicide.
>
> Yet you may go with a freelist of tokens (leaving them as 
> structs). It's an old and proven way.
>
>
> About row/col - if this stuff is encoded into the finite 
> automaton (and you surely want to do something like a DFA) it 
> comes out at almost no cost.
> The only disadvantage is complicating the DFA tables with some 
> irregularities of "true Unicode line ending sequences". It's 
> more of a nuisance than a real problem though.
>
> P.S. If you are really bent on performance, I suggest running a 
> separate Aho-Corasick-style thing for keywords; it would 
> greatly simplify (= speed up) the DFA structure if keywords are 
> not hardcoded into the automaton.

Thanks Dmitry, I think I'll need to contact you privately about 
some details later.
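
For anyone following the thread, here is a rough sketch of how the value-type token, the "cooking" step and the freelist could fit together. It is only an illustration under my own assumptions, not Dmitry's code: the names TokFlags, cookFloat, TokenNode and TokenFreeList are hypothetical, the union is trimmed to three members, and only one kind of cooking (string slice to double) is shown.

```d
import std.conv : to;

// Hypothetical flag values; the real set would cover every token kind.
enum TokFlags : uint { raw = 0, cookedDouble = 1 }

// Value-type token, trimmed down from the struct quoted above.
struct Token
{
    uint col, line;
    uint flags;          // type tag and flag set in one field
    union
    {
        string chars;    // raw slice of the source text
        double d_val;    // value after cooking a floating-point literal
        ulong  ulong_val;
    }
}

// Cooking (assumed helper): replace the raw slice with its computed value.
void cookFloat(ref Token t)
{
    string text = t.chars;        // copy the slice before the union is overwritten
    t.d_val = text.to!double;
    t.flags = TokFlags.cookedDouble;
}

// Freelist node: tokens stay structs, nodes are recycled instead of freed.
struct TokenNode
{
    Token tok;
    TokenNode* next;
}

struct TokenFreeList
{
    private TokenNode* head;

    // Reuse a released node if one is available, otherwise allocate.
    TokenNode* acquire()
    {
        if (head is null)
            return new TokenNode;
        auto n = head;
        head = n.next;
        *n = TokenNode.init;      // reset the recycled node
        return n;
    }

    // Push the node back onto the list for later reuse.
    void release(TokenNode* n)
    {
        n.next = head;
        head = n;
    }
}
```

Note that a D string is a pointer plus a length, so on a 64-bit target the union is 16 bytes rather than 8; the 20/24-byte totals quoted above correspond to a 32-bit build.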