New to D

Steven Schveighoffer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Oct 27 06:43:26 PDT 2016


On 10/27/16 2:40 AM, Era Scarecrow wrote:
> On Tuesday, 25 October 2016 at 14:40:17 UTC, Steven Schveighoffer wrote:
>> I will note, that in addition to the other comments, this is going to
>> result in corruption. Simply put, the buffer that 'line' uses is
>> reused for each line. So the string data used inside the associative
>> array is going to change. This will result in not finding words
>> already added when using the 'word in dictionary' check.
>>
>> You need to use dictionary[word.idup] = newId; This will duplicate the
>> line into a GC string that will live as long as the AA uses it.
>
>  If there's a case where you have immutable data AND can reference it
> (say... mmap files?) then referencing the string would work rather than
> having to duplicate it.

It depends on the size of the file and how many duplicate words you
expect. I'm assuming the number of distinct words is limited, so you will
allocate far less data than the whole file by duplicating words on
demand. In addition, you may incur penalties for accessing the string
directly from the file: the OS may have evicted that page and would have
to re-read it from the file itself.
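
To illustrate duplicating on demand, here is a minimal sketch. The
function name buildDictionary and the path parameter are made up for
this example, and it assumes words are separated by whitespace. Only a
word that is not yet in the AA gets copied with .idup, so repeated words
cost no extra allocation.

```d
import std.algorithm : splitter;
import std.stdio : File;

// Hypothetical helper: gives each distinct word in the file a numeric id.
size_t[string] buildDictionary(string path)
{
    size_t[string] dictionary;

    // byLine reuses one internal buffer, so `line` (a char[]) is only
    // valid until the next iteration.
    foreach (line; File(path).byLine)
    {
        // splitter with no separator splits on runs of whitespace.
        foreach (word; line.splitter)
        {
            // Lookup with a char[] slice is fine; nothing is stored here.
            if (word in dictionary)
                continue;

            // Copy only new words into GC-owned immutable memory.
            immutable id = dictionary.length;
            dictionary[word.idup] = id;
        }
    }
    return dictionary;
}
```

With this approach the number of allocations tracks the number of
distinct words, not the size of the file.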

You could also read the entire file into a single immutable string and
use slices of that string as the keys, with no per-word duplication.
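
A rough sketch of the whole-file variant, again with made-up names
(indexWords is hypothetical) and assuming whitespace-separated words.
Because the text is already an immutable string, each word is a slice of
it and can be used as an AA key directly.

```d
import std.algorithm : splitter;

// Hypothetical helper: keys are slices of `text`, so no word is copied.
size_t[string] indexWords(string text)
{
    size_t[string] dictionary;

    // Each `word` is a string slice pointing into `text`.
    foreach (word; text.splitter)
    {
        if (word in dictionary)
            continue;

        immutable id = dictionary.length;
        dictionary[word] = id;   // no .idup needed, the data is immutable
    }
    return dictionary;
}
```

You would call it as indexWords(readText("words.txt")), with readText
from std.file and a file name of your choosing. The trade-off is that
the whole file stays in memory for as long as any key refers to it.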

-Steve

