Request for comments: std.d.lexer

FG home at fgda.pl
Mon Jan 28 04:12:09 PST 2013


On 2013-01-28 08:47, Jacob Carlborg wrote:
> If we're talking about dynamic allocation you can make sure you're just using
> value types. A token could look like:
>
> struct Token
> {
>      TokenKind kind;
>      string value;
>      uint index;
> }
>
> For the "value" you could just slice the buffer. But I think this will prevent
> the whole buffer from being collected.


I was also thinking about using slices to limit string allocation. So far the
combined size of source files in D projects is small enough that it wouldn't
hurt to mmap the files and slice them. It is possible, though, that someone
would create a huge file, even if only to see this program crash. :)
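
A minimal sketch of the slicing idea, under some assumptions: the scanner below
is a toy that only splits on ASCII spaces, sliceTokens and sliceTokensFromFile
are hypothetical names, and Token.value is typed const(char)[] rather than
string because a mapped file is not immutable memory. The only library call
used is std.mmfile.MmFile with its opSlice.

```d
import std.mmfile : MmFile;

enum TokenKind { identifier, other }

struct Token
{
    TokenKind kind;
    const(char)[] value; // slice of the source buffer, nothing is copied
    size_t index;        // byte offset of the token in the buffer
}

// Hypothetical toy scanner: splits on ASCII spaces only and tags
// everything as an identifier. A real lexer would classify tokens.
Token[] sliceTokens(const(char)[] src)
{
    Token[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        if (src[i] == ' ') { ++i; continue; }
        immutable start = i;
        while (i < src.length && src[i] != ' ') ++i;
        tokens ~= Token(TokenKind.identifier, src[start .. i], start);
    }
    return tokens;
}

// Scans a memory-mapped file. The MmFile has to outlive the tokens,
// because every Token.value points into the mapping.
Token[] sliceTokensFromFile(MmFile mmf)
{
    return sliceTokens(cast(const(char)[]) mmf[]);
}
```

Every token here keeps the whole buffer (or mapping) alive, which is the
collection concern mentioned in the quoted message.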

In that case something else may be useful. Allocate special arrays for holding
value strings, for example 256 kB per array. Token.value will be a slice of such
an array. Additionally, have a lookup trie to help reuse repeating values: if a
string is already in an array, the trie leaf will store its position (slice) and
the token will only have to copy that info. If the lookup doesn't turn up the
string, the string will be appended to the current array using Appender or, if
it doesn't fit, a new array will be created.
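
A rough sketch of such a pool, with two substitutions named plainly: a built-in
associative array stands in for the trie, and a manual fill offset stands in
for Appender, so that slices handed out earlier never move. StringPool and
intern are hypothetical names, and giving an oversized string an array of its
own is my assumption, not something settled above.

```d
struct StringPool
{
    enum size_t chunkSize = 256 * 1024; // 256 kB per array

    private char[][] chunks; // every array allocated so far
    private size_t used;     // bytes already filled in the last array
    // Associative array standing in for the lookup trie (assumption):
    // maps string contents to the slice already stored in a chunk.
    private const(char)[][const(char)[]] seen;

    // Returns a slice of pool memory with the same contents as s,
    // reusing the stored copy if s was interned before.
    const(char)[] intern(const(char)[] s)
    {
        if (auto hit = s in seen)
            return *hit;

        if (chunks.length == 0 || used + s.length > chunks[$ - 1].length)
        {
            // Does not fit: start a new array. A string longer than
            // chunkSize gets an array sized just for it.
            chunks ~= new char[](s.length > chunkSize ? s.length : chunkSize);
            used = 0;
        }

        char[] dest = chunks[$ - 1][used .. used + s.length];
        dest[] = s[];      // copy the characters into the pool
        used += s.length;
        seen[dest] = dest; // key and value both point at pool memory
        return dest;
    }
}
```

A lexer would then set Token.value to pool.intern(text), so the source buffer
can be released once lexing is done, and repeated identifiers share one copy.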

