struct vs class for a simple token in my d lexer
Roman D. Boiko
rb at d-coding.com
Mon May 14 10:27:20 PDT 2012
On Monday, 14 May 2012 at 17:05:17 UTC, Dmitry Olshansky wrote:
> On 14.05.2012 19:10, Roman D. Boiko wrote:
>> (Subj.) I'm in doubt which to choose for my case, but this is a generic question.
>>
>> http://forum.dlang.org/post/odcrgqxoldrktdtarskf@forum.dlang.org
>>
>> Cross-posting here. I would appreciate any feedback. (Whether to reply in this or that thread is up to you.) Thanks.
> Oops, sorry, I meant to post to the NG only :) Repost:
>
> Clearly you are putting too much pressure on Token.
>
> In my mind it should be really simple:
>
> struct Token {
>     uint col, line;
>     uint flags; // info about the token; serves as both type tag and flag set;
>                 // indicates the proper type once the token has been cooked
>                 // (like "31.415926" -> 3.1415926e1), i.e. values are calculated
>     union {
>         string chars;
>         float f_val;
>         double d_val;
>         uint uint_val;
>         long long_val;
>         ulong ulong_val;
>         // ... anything else you may need (8 bytes are plenty)
>     } // even then you may use up to 12 bytes
>     // total size == 24 or 20 on 32-bit targets; on 64-bit a string
>     // slice alone is 16 bytes, which brings the total to 32
> }
>
> Where:
> Each raw token initially has chars == a slice of the characters in the
> text (or, if the source is not UTF-8, a copy of the source), except for
> keywords and operators.
> Cooking is the process of calculating constant values and such (say,
> populating the symbol table while putting the symbol id into the token
> instead of leaving the string slice). Do it on the fly or after the
> whole source has been lexed; let the user choose.
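A minimal D sketch of the cooking step for a float literal, assuming a hypothetical flag layout (low byte = token kind, one bit meaning "already cooked") and a Token trimmed down to the members the sketch needs. The names kindMask, kindFloat, flagCooked and cook are illustrative, not part of any existing lexer.

```d
import std.conv : to;

// Hypothetical flag layout: low byte = token kind, bit 8 = "already cooked".
enum : uint {
    kindMask   = 0x00FF,
    kindFloat  = 0x0001,
    flagCooked = 0x0100,
}

// Trimmed-down Token: only the members this sketch needs.
struct Token {
    uint col, line;
    uint flags;
    union {
        string chars;  // raw token: slice of the source text
        double d_val;  // cooked float literal
    }
}

// Cook one token in place: parse the literal text, overwrite the slice with
// the computed value and mark the token as cooked. Tokens that are not float
// literals, or are already cooked, are left untouched.
void cook(ref Token t) {
    if ((t.flags & kindMask) != kindFloat || (t.flags & flagCooked))
        return;
    immutable double v = t.chars.to!double; // read chars before the union is overwritten
    t.d_val = v;
    t.flags |= flagCooked;
}
```

Cooking "on the fly" would mean calling cook on each token as the lexer emits it; cooking "after the whole source" would be a single foreach over the finished token array. Because chars and d_val share storage, the raw text is gone once a token is cooked.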
>
> Value types have the nice property of being really fast; I suggest you
> do at least some synthetic tests before going with ref-based stuff.
> Pushing 4 words is cheap; indirection never is.
> Classes also have a hidden _monitor_ field (the mutex behind
> synchronized) on top of the vtable pointer, so using 'class' can best
> be described as suicide.
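To make the overhead concrete, here is a small D sketch comparing the same three fields declared as a struct and as a class. TokS, TokC, makeStructs and makeClasses are hypothetical names; the checks rely only on .sizeof and __traits(classInstanceSize), and the exact class size depends on the target.

```d
// Same three fields, once as a value type and once as a class.
struct TokS { uint col, line, flags; }
class TokC { uint col, line, flags; }

// A struct is just its fields: 3 * 4 bytes, no hidden header.
static assert(TokS.sizeof == 3 * uint.sizeof);
// A class variable is only a reference (one pointer) to a heap object...
static assert(TokC.sizeof == (void*).sizeof);
// ...and that object starts with a hidden header: vtable pointer + monitor.
static assert(__traits(classInstanceSize, TokC) >= TokS.sizeof + 2 * (void*).sizeof);

// An array of structs is one contiguous allocation.
TokS[] makeStructs(size_t n) { return new TokS[n]; }

// An array of class references needs one allocation per token, and every
// field access goes through an extra pointer dereference.
TokC[] makeClasses(size_t n) {
    auto a = new TokC[n];
    foreach (ref c; a)
        c = new TokC;
    return a;
}
```

This only shows the layout cost; the actual speed difference is what the synthetic tests suggested above would measure.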
>
> Alternatively, you may go with a freelist of tokens (leaving them as
> structs). It's an old and proven approach.
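A minimal D sketch of such a freelist, assuming tokens are handed out as pointers to struct Token and returned once the parser is done with them. TokenFreeList, acquire and release are hypothetical names; assumeSafeAppend is the druntime function that lets a shrunk slice be appended to in place.

```d
// Simplified token for this sketch.
struct Token {
    uint col, line, flags;
    string chars;
}

// Hypothetical freelist: released tokens are kept and reused instead of
// being allocated again.
struct TokenFreeList {
    private Token*[] free;

    Token* acquire() {
        if (free.length) {
            Token* t = free[$ - 1];
            free.length -= 1;         // pop
            free.assumeSafeAppend();  // let the next release reuse this slot in place
            *t = Token.init;          // hand out a clean token
            return t;
        }
        return new Token;             // free list empty: fall back to the GC
    }

    void release(Token* t) {
        free ~= t;
    }
}
```

The tokens themselves stay plain structs; only their storage is recycled, so the per-token allocation cost is paid once.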
>
>
> About row/col: if this stuff is encoded into the finite automaton
> (and you surely want to do something like a DFA), it comes out at
> almost no cost.
> The only disadvantage is complicating the DFA tables with some
> irregularities of "true Unicode line ending sequences". It's more of
> a nuisance than a real problem, though.
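The line-ending irregularities can be seen in a hand-written D sketch of row/col tracking. It is a plain loop rather than DFA tables, assuming the end-of-line set from the D lexical spec (\n, \r, \r\n, U+2028, U+2029) and counting columns in code points. Pos and track are hypothetical names.

```d
import std.utf : decode;

struct Pos { uint line = 1, col = 1; }

// Hypothetical helper: walk the text and return the position just past its
// end. \r\n counts as a single line break; columns count code points.
Pos track(string text) {
    Pos p;
    size_t i = 0;
    while (i < text.length) {
        dchar c = decode(text, i);   // decodes one code point and advances i
        if (c == '\r') {
            if (i < text.length && text[i] == '\n')
                ++i;                 // swallow the \n of \r\n
            ++p.line;
            p.col = 1;
        } else if (c == '\n' || c == '\u2028' || c == '\u2029') {
            ++p.line;
            p.col = 1;
        } else {
            ++p.col;
        }
    }
    return p;
}
```

The \r\n lookahead and the multi-byte separators are exactly the cases that add extra states to a table-driven automaton.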
>
> P.S. If you're really bent on performance, I suggest running a separate
> Aho-Corasick-style matcher for keywords; it would greatly simplify
> (= speed up) the DFA structure if keywords are not hardcoded into the
> automaton.
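One way to keep keywords out of the main automaton, sketched in D: the DFA only recognizes a generic identifier, and its text is then classified by a separate keyword trie. The trie is the goto structure Aho-Corasick is built on; failure links are deliberately omitted here because a keyword must match the whole identifier, so this is a simplification, not full Aho-Corasick. KeywordTrie and lookup are hypothetical names.

```d
// Separate keyword classifier (hypothetical names).
struct KeywordTrie {
    private size_t[char][] edges; // edges[node][c] == index of the child node
    private int[] kind;           // keyword id ending at this node, or -1

    this(string[] keywords) {
        edges.length = 1;         // node 0 is the root
        kind = [-1];
        foreach (id, kw; keywords) {
            size_t node = 0;
            foreach (char c; kw) {
                if (auto child = c in edges[node]) {
                    node = *child;
                } else {
                    edges[node][c] = edges.length; // index of the new node
                    node = edges.length;
                    edges.length += 1;
                    kind ~= -1;
                }
            }
            kind[node] = cast(int) id;
        }
    }

    /// Keyword id for `ident`, or -1 if it is an ordinary identifier.
    int lookup(const(char)[] ident) const {
        size_t node = 0;
        foreach (char c; ident) {
            auto child = c in edges[node];
            if (child is null)
                return -1;
            node = *child;
        }
        return kind[node];
    }
}
```

With this split, adding or removing a keyword only changes the trie, while the identifier states of the DFA stay the same.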
Thanks Dmitry, I think I'll need to contact you privately about
some details later.
More information about the Digitalmars-d-learn mailing list