dmd 1.060 and 2.045 release
Ary Borenszweig
ary at esperanto.org.ar
Mon May 10 06:22:21 PDT 2010
Steven Schveighoffer wrote:
> On Sun, 09 May 2010 02:11:21 -0400, Lionello Lunesu
> <lio at lunesu.remove.com> wrote:
>
>
>> I'm in the middle of moving from one city to another so don't wait for
>> me. I have attached the D version of the code in the wikipedia article
>> (including the patch for transpositions.)
>>
>> It's not straightforward to drop this in (apart from it being in D),
>> since speller.c creates all variations on a string (=inefficient) and
>> uses a callback function to check if a variation is a valid symbol.
>>
>> I'll have more time to look at this next week.
>
> Several others have privately brought up this problem to Walter. He
> does not want to change how the symbol lookup tables work, and there is
> no way to iterate them. Therefore, without a way to iterate current
> symbols, this is the only way the algo can be written.
>
> However, according to reports on the latest beta, he's sped up the
> lookup times for symbols significantly enough to perceptually reduce the
> problem.
>
> Because of the nature of the algorithm, the longer the invalid symbol,
> the slower the algorithm. It would be interesting to see a comparison
> between the current beta code and code that does a full iteration with
> very long symbols. I don't know if anyone wants to look at modifying
> the symbol lookup data structures to allow iteration, it may be
> perceptually insignificant, and unimportant for most developers.
>
> A quick test on a long symbol name:
>
>
> void
> s023456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567891()
> {}
> void main()
> {
>
> s123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890();
>
> }
>
>
>
> And on the various compilers:
>
> 2.043: 0.052s, does not suggest
> 2.045: 5.4s, suggests the alternative
> beta: 0.8s, does not suggest
>
> 2.043 only does spell checking with a Levenshtein distance of 1, 2.045
> does 2, but is extremely slow. The beta does a distance of 2, but only
> if the errors are close together (Walter added this as an optimization
> to remove one factor from the runtime complexity).
>
> So clearly, there is still room for improvement, I think with a proper
> symbol iteration, we could get the time down to be as quick as 2.043 or
> faster *and* provide the ability to check for a complete LD of 2 where
> the errors are not close together. We might be able to even push the LD
> to 3 or 4.
>
> I've thought about the optimization of errors close together being
> checked, and I think the counter case is the case which takes the
> longest -- a long symbol. Typically people use camelCase to denote
> symbols, what if two of the words in the symbol were misspelled by one
> character (or a capitalization was forgotten)?
>
> It may not be an issue, the spell checker is simply a nice hint, but
> isn't essential to determine errors.
>
> -Steve
I can't imagine why something as simple as iterating a symbol lookup
table can't be implemented.
(in fact I iterate that lookup table in Descent to show
autocompletions... I don't generate every possible string in the world
and see if it's a symbol, and if so, suggest it as an autocompletion...
hmmm...)
More information about the Digitalmars-d-announce
mailing list