dmd 1.060 and 2.045 release

Mon May 10 06:22:21 PDT 2010

Steven Schveighoffer wrote:
> On Sun, 09 May 2010 02:11:21 -0400, Lionello Lunesu 
> <lio at lunesu.remove.com> wrote:
> 
> 
>> I'm in the middle of moving from one city to another so don't wait for
>> me. I have attached the D version of the code in the wikipedia article
>> (including the patch for transpositions.)
>>
>> It's not straightforward to drop this in (apart from it being in D),
>> since speller.c creates all variations on a string (=inefficient) and
>> uses a callback function to check if a variation is a valid symbol.
>>
>> I'll have more time to look at this next week.
> 
> Several others have privately brought up this problem to Walter.  He 
> does not want to change how the symbol lookup tables work, and there is 
> no way to iterate them.  Therefore, without a way to iterate current 
> symbols, this is the only way the algo can be written.
> 
> However, according to reports on the latest beta, he's sped up the 
> lookup times for symbols significantly enough to perceptually reduce the 
> problem.
> 
> Because of the nature of the algorithm, the longer the invalid symbol, 
> the slower the algorithm.  It would be interesting to see a comparison 
> between the current beta code and code that does a full iteration with 
> very long symbols.  I don't know if anyone wants to look at modifying 
> the symbol lookup data structures to allow iteration, it may be 
> perceptually insignificant, and unimportant for most developers.
> 
> A quick test on a long symbol name:
> 
> 
> void 
> s023456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567891() 
> {}
> void main()
> {
>      
> s123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890(); 
> 
> }
> 
> 
> 
> And on the various compilers:
> 
> 2.043: 0.052s, does not suggest
> 2.045: 5.4s, suggests the alternative
> beta: 0.8s, does not suggest
> 
> 2.043 only does spell checking with a Levenshtein distance of 1, 2.045 
> does 2, but is extremely slow.  The beta does a distance of 2, but only 
> if the errors are close together (Walter added this as an optimization 
> to remove one factor from the runtime complexity).
> 
> So clearly, there is still room for improvement, I think with a proper 
> symbol iteration, we could get the time down to be as quick as 2.043 or 
> faster *and* provide the ability to check for a complete LD of 2 where 
> the errors are not close together.  We might be able to even push the LD 
> to 3 or 4.
> 
> I've thought about the optimization of errors close together being 
> checked, and I think the counter case is the case which takes the 
> longest -- a long symbol.  Typically people use camelCase to denote 
> symbols, what if two of the words in the symbol were misspelled by one 
> character (or a capitalization was forgotten)?
> 
> It may not be an issue, the spell checker is simply a nice hint, but 
> isn't essential to determine errors.
> 
> -Steve

I can't imagine why something as simple as iterating a symbol lookup 
table can't be implemented.

(in fact I iterate that lookup table in Descent to show 
autocompletions... I don't generate every possible string in the world 
and see if it's a symbol, and if so, suggest it as an autocompletion... 
hmmm...)