struct vs class for a simple token in my d lexer

Tue May 15 02:33:26 PDT 2012

On 05/14/12 17:10, Roman D. Boiko wrote:
> (Subj.) I'm in doubt which to choose for my case, but this is a generic question.
> 
> http://forum.dlang.org/post/odcrgqxoldrktdtarskf@forum.dlang.org
> 
> Cross-posting here. I would appreciate any feedback. (Whether to reply in this or that thread is up to you.) Thanks
> 

If you have to ask such a question, then the answer doesn't matter.

Having said that, and w/o looking at the code, just based on the info
in this thread - a few thoughts, which may or may not be helpful:

- Use an array of pointers to struct tokens. IOW 'Token*[] tokens;'.
  Same use-syntax. Far fewer large contiguous allocations. One indirection
  per token lookup, but that's no worse than using classes, and w/o the
  class overhead.
- Don't let yourself be talked into adding unnecessary bloat, like storing
  location inside Tokens. If calculating it on demand turns out to be costly
  you can always store the location in a separate array (your array-of-structs
  scheme would make the lookup trivial, array-of-*-structs would make this
  slightly more complex, but if the info really is rarely needed, then it
  doesn't matter)
- You can compress the string, if the 16 bytes is too much. Limiting the source
  unit size and max token length to 4G should be acceptable; then just have
  'struct TokenStr { uint off, len; auto get() { return source[off..off+len]; } }' etc,
  which will reduce the memory footprint, while adding very little overhead. It's
  not like these strings will be used repeatedly. Don't forget to keep a reference
  to 'source' around, so that it isn't collected by the GC.
  (I'd even go for a 24:8 split, then you probably need to handle string literals
  longer than 255 bytes specially, but it's likely worth it. If somebody produces
  a 16M+ D source module he deserves the build failure. ;) )
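
A rough sketch of the compressed-string idea, to make the sizes concrete. A D
slice is a pointer plus a length, i.e. 16 bytes on a 64-bit target; the 32:32
variant is 8 bytes and the 24:8 variant is 4. The names 'PackedTokenStr',
'offset' and 'length' are made up for illustration, and 'source' is passed in
as a parameter here rather than kept in a global, which is just one way to
do it:

```d
// 32:32 variant: offset and length into the source buffer.
struct TokenStr
{
    uint off;   // byte offset of the token in the source text
    uint len;   // token length in bytes

    // Slice the token text out of the source on demand.
    string get(string source) const
    {
        return source[off .. off + len];
    }
}

// 24:8 variant (hypothetical name): 24-bit offset and 8-bit length
// packed into one uint. Tokens longer than 255 bytes need special
// handling elsewhere; that part is not shown.
struct PackedTokenStr
{
    uint bits;

    this(uint offset, uint length)
    {
        assert(offset < (1u << 24) && length < (1u << 8));
        bits = (offset << 8) | length;
    }

    uint off() const { return bits >> 8; }
    uint len() const { return bits & 0xFF; }

    string get(string source) const
    {
        return source[off .. off + len];
    }
}

static assert(string.sizeof == 2 * size_t.sizeof); // pointer + length
static assert(TokenStr.sizeof == 8);
static assert(PackedTokenStr.sizeof == 4);

unittest
{
    string source = "int x = 42;";   // must stay referenced while tokens live
    assert(TokenStr(4, 1).get(source) == "x");
    assert(PackedTokenStr(8, 2).get(source) == "42");
}
```

Something like 'dmd -unittest -main -run file.d' should exercise it.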

I'm assuming the Token structs don't need destructors.
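
For the first point above, a minimal sketch of the array-of-pointers layout,
assuming plain-data tokens with no destructor. The fields of 'Token' and the
helper 'addToken' are invented for the example and are not taken from the
lexer in question:

```d
// Hypothetical minimal token; the real one will have different fields.
struct Token
{
    ushort kind;
    uint off, len;
}

// Only this array of pointers needs to be contiguous; each Token is
// its own small GC allocation.
Token*[] tokens;

void addToken(ushort kind, uint off, uint len)
{
    auto t = new Token;
    t.kind = kind;
    t.off = off;
    t.len = len;
    tokens ~= t;
}

unittest
{
    tokens = null;
    addToken(1, 0, 3);
    addToken(2, 4, 1);

    // Same use-syntax as with a Token[]: '.' auto-dereferences the pointer.
    assert(tokens.length == 2);
    assert(tokens[0].kind == 1);
    assert(tokens[1].off == 4);
}
```

Growing 'tokens' then only ever reallocates pointer-sized elements, not whole
Token structs.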

artur