Port of Python's difflib.SequenceMatcher class

Michael Butscher mbutscher at gmx.de
Wed Dec 6 14:21:35 PST 2006


Pragma wrote:
> Michael Butscher wrote:
> > Hi, 
> > 
> > a D port (version 0.175) of Python's difflib.SequenceMatcher class to 
> > generate diffs is available at
> > 
> >   http://www.mbutscher.de/snippets/difflib_d20061202.zip
> > 
> > It might need some cleaning up yet but the translated doctests pass 
> > (except one I couldn't make compile in D, but "in theory" it passes as 
> > well).
> > 
> > Comments, critique?
> 
> I agree with Walter that you should throw this up on a page somewhere. 

At least I have mentioned it on the page

  http://www.mbutscher.de/software.html

as a "snippet" (it isn't much more, I think).



> I'm curious, but rarely have time to sift through sourcecode unless I'm 
> in need of something specific - I develop using SVN 99% of the time, 
> which does .diff output for me already.

I will need it later for a project written in Python (a kind of personal 
wiki without a server), where it will be used to store different 
versions of a wiki page.

When the time comes, I will add a small C interface for a DLL whose 
main job is to create some sort of binary diff of two arbitrary byte 
blocks and to apply that diff to the first block to recreate the second.
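
As a rough illustration of that idea on the Python side, a diff of two byte 
blocks can be built from SequenceMatcher's opcodes and applied again. This 
is only a sketch: the names make_delta and apply_delta and the instruction 
format are made up for this example and are not the planned DLL interface.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Return a list of instructions that rebuild `new` from `old`."""
    # autojunk=False keeps the popularity heuristic from skewing matches
    # on long blocks with many repeated byte values.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged range: only remember where to copy it from.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in the new block are stored literally.
            delta.append(("data", new[j1:j2]))
        # "delete" needs no instruction: those old bytes are just skipped.
    return delta


def apply_delta(old: bytes, delta) -> bytes:
    """Apply the instructions from make_delta() to `old`."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

The delta only refers to ranges of the first block, so the first block plus 
the delta is enough to recreate the second one.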


> But I *am* curious about how the porting went, what the pitfalls were, 
> and how you worked around Python idioms and tuple types.

- The frequently used "self" was simply translated to "this", so the 
code looks a bit odd in D, e.g.:


    void set_seq2(ST b)
    {
        if (b is this.b)
            return;
        this.b = b;
        this.matching_blocks = null;
        this.opcodes = null;
        this.fullbcount = null;
        this.chain_b();
    }
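
For context, the cached fields are reset here because the Python class 
precomputes information about the second sequence and reuses it. A typical 
use, sketched below, sets the second sequence once and then swaps the first 
one repeatedly. The helper name best_match is made up for this example.

```python
from difflib import SequenceMatcher


def best_match(candidates, target):
    """Return (candidate, ratio) for the candidate most similar to target."""
    matcher = SequenceMatcher(None)
    # The index for the second sequence is built once here ...
    matcher.set_seq2(target)
    best, best_ratio = None, -1.0
    for candidate in candidates:
        # ... and reused for every first sequence tried.
        matcher.set_seq1(candidate)
        ratio = matcher.ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best, best_ratio
```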


- One thing I really missed in D was the get() method of Python 
dictionaries, which takes a default argument. Therefore I created inner 
functions like

        IndexType j2lenget(IndexType i, IndexType def)
        {
            IndexType* result = i in j2len;
            if (result)
                return *result;
            else
                return def;
        }

Probably this can be done more elegantly, but I personally think that
get() should be a standard method of AAs.
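
For comparison, this is the Python behaviour the helper above imitates. 
The function get_or_default is only a spelled-out equivalent written for 
this example, and the sample values in j2len are invented.

```python
def get_or_default(table, key, default):
    """Spelled-out equivalent of table.get(key, default)."""
    if key in table:
        return table[key]
    return default


# Invented sample data: match lengths keyed by an index into sequence b.
j2len = {3: 2, 7: 1}

# A present key returns its stored value, a missing key returns the default.
known = j2len.get(3, 0)
missing = j2len.get(4, 0)

# Typical use: extend the match that ended at the previous index, if any.
extended = j2len.get(7, 0) + 1
```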



- The class used only two types of tuples which had clear purposes, so 
they were translated into structs without much harm.
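
To show what such fixed-shape tuples look like on the Python side, here is 
a sketch using the results of get_matching_blocks() and get_opcodes(). 
That these are the two tuple kinds meant above is an assumption, and the 
names Block and Opcode are invented; they only mirror the struct idea.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical struct-like form of a matching block (a, b, size)."""
    a: int
    b: int
    size: int


class Opcode(NamedTuple):
    """Hypothetical struct-like form of an opcode (tag, i1, i2, j1, j2)."""
    tag: str
    i1: int
    i2: int
    j1: int
    j2: int


def describe(a, b):
    """Return the matching blocks and opcodes as named records."""
    matcher = SequenceMatcher(None, a, b)
    blocks = [Block(*t) for t in matcher.get_matching_blocks()]
    opcodes = [Opcode(*t) for t in matcher.get_opcodes()]
    return blocks, opcodes
```

Since every tuple of a kind has the same fields with the same meaning, 
giving the fields names loses nothing.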



> Also, I'm 
> wondering if the D version brings any extra perks like better 
> performance, or less/clearer code?

I have not done any benchmarks yet; I simply assume that D is much 
faster.


The D code is a bit longer and IMHO a bit less readable than Python, 
but I'm much more used to Python than D.


Michael



More information about the Digitalmars-d-announce mailing list