AA key conversion woes

Michael Rynn michaelrynn at optusnet.com.au
Thu Apr 19 01:46:14 PDT 2012


On Wed, 18 Apr 2012 15:06:11 +0200, SomeDude wrote:

> On Tuesday, 17 April 2012 at 18:51:07 UTC, H. S. Teoh wrote:
>> On Tue, Apr 17, 2012 at 08:44:42PM +0200, Andrej Mitrovic wrote:
>>> On 4/17/12, H. S. Teoh <hsteoh at quickfur.ath.cx> wrote:
>>> > But even then, I'm considering if .keys should return a mutable
>>> > array if the key is a value type
>>> 
>>> How about having .keys for mutable and .ikeys for immutable returns?
>>
>> That's one possibility. Then .keys can come with the caveat that it may
>> involve hidden performance costs in the form of internal .dup's. But
>> I'm not sure this is a good idea, because there are too many
>> string-keyed AA's in existing code that will now suddenly become very
>> slow because they use .keys instead of .ikeys, and the strings are
>> getting .dup'd all over.
>>
>>
>> T
> 
> Hidden duplications are not acceptable in my opinion.


First of all, distinguish between back-end implementation of AA, and 
template interface(s), which may or may not, add restrictions and "extra" 
information and functions, to please one's personal idealogical D purity 
and immutability.

Backend rt.aaA,  currently is generic pointer / TypeInfo based, which is 
good for performance and reducing code bloat. Current version does not 
handle D types as well as it could. Actual BB struct, only holds TypeInfo 
reference for key type, which has to be initialised on adding first key.

By generally returning a pointer to node value slot, aaA allows the 
compiler to supply the missing value TypeInfo work, which works for a 
good part of the functionality (but not on key removal/node deletion, for 
instance).

aaA does not handle postblit for keys correctly.
Structs with postblit and/or complex destructors are not properly done 
either.

This can be seen by using a reference counted struct. Normal D code 
handles the postblit, destructors properly, (meaning I haven't tested 
really enough code).  Entanble them in AA as key or value, and counting 
will not work as expected. I know I'm eccentric, so please ignore the 
question of why reference counting.  Immutable keys would ban postblit 
and destructors for struct.

However it is possible to have aaA implementation, with only a small cost,
that handles struct postblit and destructors correctly, and refcounting 
(as an example, please), comes out correctly.


I have working code in folder rt, viewable at  
http://bazaar.launchpad.net/~michael-rynn-500/d2-xml/d2-xml-dev/files/
head:/rt/

There is an IAA, defining a low level interface (really, real interface).

Maybe dmd could learn to call interface directly instead of indirect C 
functions in aaA.  However the overhead is very slight.

Not possible to have aaA integrated with this without using template, as 
dmd might not supply initilization TypeInfo for value type. Integration 
with current aaA C functions almost worked, but not quite, after I found 
I needed to initialise value TypeInfo.

Need to declare  with the aaI.AssociativeArray template.

The template below will support non-immutable keys for lookups for string 
types, but not for replace or insert of keys. This is why inX method is 
const void* key, believe it or not, so it is easily persuaded to take 
string or char[]. delX should be const as well, may have to fix this.


In order to demonstrate IAA is more or less complete and working 
interface for an aaA, I have made 2 different implementations, which use 
IAA


http://bazaar.launchpad.net/~michael-rynn-500/d2-xml/d2-xml-dev/view/
head:/rt/aaSLink.d, a direct rip off, with above mentioned few warts 
erased, of the current rt.aaA implementation.

and

http://bazaar.launchpad.net/~michael-rynn-500/d2-xml/d2-xml-dev/view/
head:/rt/aaArrayMap.d

which uses entirely arrays, (even a double linked list of integer indexes 
is an array).

I'm telling another lie here, it manages a partial implementation of 
untyped arrays, rt.unarray, which uses TypeInfo to manage memory content 
for keys and values, including postblit annoyances, to 
implement the genericity implied by IAA.


Its a little bit slower, which shows a bit with over a million keys, but 
acceptable.  arrayMap maintains insertion order, as long as no key 
deletions followed by insertions.


Now that there is 2 , or maybe more potential implementations, need a 
factory method, to create a default, if no deliberate initialisation.

Thats currently done in rt.aadefault,
which can use a version flag, to initialise the global factory.

alias IAA function(TypeInfo keyti,TypeInfo valueti) createDefaultAA; 

shared createDefaultAA gAAFactory;

interface IKeyRange {
	void popFront();
	AAKeyRef front();
	bool empty();
}

interface IValueRange {
	void popFront();
	AAValueRef front();
	bool empty();
}


interface IAA {
	/// equals checks same implementation.
	bool equals(IAA other, TypeInfo_AssociativeArray rawTi);

	/// setup types.  Won't work
	/// void init(TypeInfo_AssociativeArray aati);

	/// setup types
	void init(TypeInfo kti,TypeInfo vti);


	/// implement initialisation from a literal va_list. Won't work.
	///void init(TypeInfo_AssociativeArray ti, size_t length, ...);


	/// implement initialisation from a literal. Won't work.
	/// void init(TypeInfo_AssociativeArray ti, void[] keys, void[] 
values);
	///		return number of stored key-value pairs as a 
property
	size_t	length();

	/// append all the values from array index pairs, matching key 
and value types.
	/// return increase in size
	void append(void[] keys, void[] values);
	/// append or replace single key value pair. return 0 if 
successful (for apply2 delegate)
	//int append(void* key, void* value);

	dg2_t getAppendDg(); // return a int append(void* key, void* 
value);

	/// reference counting , or not
	void release();
	void addref();

	/// rehash.  Return IAA
	IAA rehash(); 

	/// Range interface for keys
	IKeyRange		byKey();

	/// Range interface for values
	IValueRange		byValue();

	/// remove key value pair
	bool delX(void* pkey);

	/// get, and return if location was created or was existing
	void* getX(void* pkey, out bool isNew);

	/// Return existing or null
	void* inX(const void* pkey);

	/// return unique copy of array of values
	ArrayRet_t values();

	/// return unique copy of array of keys
	ArrayRet_t keys();
	
	/** At least length 2, 0 = length of hash map. 1 = count of nodes 
which are empty.
	2 = count of nodes with chain length 1,  and so on, to length of 
array.
	*/
	uint[] statistics();
	/// return new instance with copy of each key-value pair
	//IAA dup();

	/// Delegate for each key
	int apply(dg_t dg);

	/// Delegate for each key and value
	int apply2(dg2_t dg);

	/// Typeinfo for key.
	TypeInfo keyType();

	/// Typeinfo for value.
	TypeInfo valueType();

	/// remove all pairs
	void clear();

	/// Get new interface with same configuration, but empty
	IAA emptyClone();

}

// how D will not see it
struct AA
{
    IAA a;
}


alias IAA function(TypeInfo keyti,TypeInfo valueti) createDefaultAA; 

shared createDefaultAA gAAFactory;


struct AssociativeArray(Key, Value)
{
private:
    IAA p;  
public:
	
	alias AssociativeArray!(Key,Value)	MyAAType;

	/// Support non-immutable lookups for immutable string types.
	/// TODO: make this more concise and broader?
	static if (isSomeString!Key)
    {
        static if (is(Key==string))
        {
            alias const(char)[] CKey;
        }
        else static if (is(Key==wstring))
        {
            alias const(wchar)[] CKey;
        }
        else static if (is(Key==dstring))
        {
            alias const(dchar)[] CKey;
        }
    }
    else {
		alias Key CKey;
    }


    @property size_t length() { return (p !is null) ? p.length : 0; }

	void init(createDefaultAA factory)
	{
		if (p)
		{
			p.release();
			p = null;
		}
		p = factory(typeid(Key), typeid(Value));
	}
	/// Post-Blits 
	this(this)
	{
		if (p)
		{
			p.addref();
		}
	}
	~this()
	{
		if (p)
		{
			p.release();
			p = null;
		}
	}
    MyAAType rehash() @property
    {
        p = p.rehash();
        return MyAAType(p);
    }

    Value[] values() @property
    {
		return (p !is null) ? *cast(Value[]*) p.values() : null;
    }

    Key[] keys() @property
    {
 		return (p !is null) ? *cast(Key[]*) p.keys() : null;
    }

    int opApply(scope int delegate(ref Key, ref Value) dg)
    {
		return p !is null ? p.apply2(cast(_dg2_t)dg) : 0;
    }

    int opApply(scope int delegate(ref Value) dg)
    {
        return p !is null ? p.apply(cast(_dg_t)dg) : 0;
    }

	void opAssign(Value[Key] op)
	{
		if (p is null)
			init(gAAFactory);
		auto dg = p.getAppendDg();
		foreach(k,v ; op)
		{
			dg(&k,&v);
		}
	}


	void opCatAssign(Value[Key] op)
	{
		if (p is null)
			init(gAAFactory);
		auto dg = p.getAppendDg();
		foreach(k,v ; op)
		{
			dg(&k,&v);
		}
	}
	void opCatAssign(MyAAType appendMe)
	{
		auto q = appendMe.p;
		if (q is null)
			// nothing to do
			return;
		if (p is null)
			p = q.emptyClone();
		q.apply2(p.getAppendDg());
	}

	Value opIndex(CKey key)
	{
		auto pvalue = (p !is null) ? p.inX(&key) : null;
		if (pvalue is null)
			throw AAError.error(AAError.OPINDEX_FAIL);
		return *(cast(Value*) pvalue);
	}
	Value opIndex(CKey key, lazy Value defaultValue)
	{
		auto pvalue = (p !is null) ? p.inX(&key) : null;
		return p ? *(cast(Value*) pvalue) : defaultValue;
	}

    Value get(CKey key, lazy Value defaultValue)
    {
        auto pval = (p !is null) ? p.inX(&key) : null;
        return pval ? *(cast(Value*) pval) : defaultValue;
    }

	Value* opIn_r(CKey key)
	{
		return ( p !is null) ? cast(Value*) p.inX(&key) : null;
	}
	
	// passed value is stored if it didn't exist, returned if it did.
	bool getOrPut(Key key, ref Value val)
	{	
		if (p is null)
			init(gAAFactory);
		bool existed = void;
		auto spot = cast(Value*) p.getX(&key, existed);
		if (!existed)
			*spot = val;
		else
			val = *spot;
		return existed;
	}

	void put(Key key, Value val)
	{
		if (p is null)
			init(gAAFactory);
		bool existed = void;
		auto spot = cast(Value*) p.getX(&key, existed);
		*spot = val;
	}

	uint[] chainLengths()
	{
		if (p is null)
			return [0,0];
		return p.statistics();
	}

	void opIndexAssign(Value val, Key key)
	{
		if (p is null)
			init(gAAFactory);
		bool existed = void;
		auto spot = cast(Value*) p.getX(&key, existed);
		*spot = val;
	}
	
	
	MyAAType dup()
	{
		MyAAType result;
		if (p !is null)
		{
			auto q = p.emptyClone();
			p.apply2(q.getAppendDg());
			result.p = q;
		}
		return result;
	}

	void clear()
	{
		if (p !is null)
		{
			p.clear();
		}
	}

	bool remove(CKey key)
	{
		return (p !is null) ? p.delX(&key) : false;
	}

    @property auto byKey()
    {
        return (p !is null) ?  p.byKey() : null;
    }

    @property auto byValue()
    {
		return (p !is null) ?  p.byValue() : null;
    }
}



More information about the Digitalmars-d mailing list