Associative array printing problems

Wed Jun 22 11:45:12 PDT 2011

Printing associative arrays in a decent way is not a luxury; it's a basic skill I expect D's writeln to have.

This is reduced from a program related to the coding Kata Nineteen, Word Chains. The problem asks you to create the longest chain of words. This function creates an associative array where the keys are the start chars of the words, and the values are sets of words. For simplicity, in D2 I have implemented the string sets as bool[string].

import std.stdio, std.string;

bool[string][char] foo(string[] names) {
    typeof(return) result;
    foreach (name; names)
        result[name[0]][name] = true;
    return result;
}

auto names = "mary patricia linda barbara elizabeth jennifer
    maria susan margaret dorothy lisa nancy karen betty helen
    sandra donna carol ruth sharon michelle laura sarah
    kimberly deborah jessica shirley cynthia angela melissa
    brenda amy anna rebecca virginia kathleen pamela";

void main() {
    writeln(foo(names.split()));
}

The original complete program didn't need to print this result, but I introduced a bug there, so I had to print result to debug it, as I do in this reduced program. This is the printout:

p:patricia:true pamela:true l:linda:true lisa:true laura:true d:dorothy:true donna:true deborah:true h:helen:true m:melissa:true margaret:true michelle:true maria:true mary:true e:elizabeth:true a:angela:true anna:true amy:true b:barbara:true betty:true brenda:true j:jessica:true jennifer:true n:nancy:true r:ruth:true rebecca:true v:virginia:true s:sharon:true susan:true shirley:true sandra:true sarah:true k:kathleen:true karen:true kimberly:true c:cynthia:true carol:true

For me this is very bad: I am not able to read it well. The nested associative arrays are printed with nothing that delimits the inner arrays, so I can't tell where one of them ends and the next key begins.

To better show what I mean, this is a Python 2.6 translation (here I have used true sets, which are built in, but the situation doesn't change a lot):

from collections import defaultdict

def foo(names):
    result = defaultdict(set)
    for name in names:
        result[name[0]].add(name)
    return result

names = """mary patricia linda barbara elizabeth jennifer
    maria susan margaret dorothy lisa nancy karen betty helen
    sandra donna carol ruth sharon michelle laura sarah
    kimberly deborah jessica shirley cynthia angela melissa
    brenda amy anna rebecca virginia kathleen pamela"""

print foo(names.split())

Its textual output allows me to tell the sub-dictionaries apart:

defaultdict(<type 'set'>, {'a': set(['amy', 'anna', 'angela']), 'c': set(['carol', 'cynthia']), 'b': set(['barbara', 'betty', 'brenda']), 'e': set(['elizabeth']), 'd': set(['dorothy', 'donna', 'deborah']), 'h': set(['helen']), 'k': set(['kathleen', 'kimberly', 'karen']), 'j': set(['jessica', 'jennifer']), 'm': set(['margaret', 'melissa', 'michelle', 'mary', 'maria']), 'l': set(['laura', 'linda', 'lisa']), 'n': set(['nancy']), 'p': set(['pamela', 'patricia']), 's': set(['sarah', 'sharon', 'sandra', 'shirley', 'susan']), 'r': set(['ruth', 'rebecca']), 'v': set(['virginia'])})

Using pprint (pretty print) from the Python standard library, the output improves:

from pprint import pprint
pprint(dict(foo(names.split())))

{'a': set(['amy', 'angela', 'anna']),
 'b': set(['barbara', 'betty', 'brenda']),
 'c': set(['carol', 'cynthia']),
 'd': set(['deborah', 'donna', 'dorothy']),
 'e': set(['elizabeth']),
 'h': set(['helen']),
 'j': set(['jennifer', 'jessica']),
 'k': set(['karen', 'kathleen', 'kimberly']),
 'l': set(['laura', 'linda', 'lisa']),
 'm': set(['margaret', 'maria', 'mary', 'melissa', 'michelle']),
 'n': set(['nancy']),
 'p': set(['pamela', 'patricia']),
 'r': set(['rebecca', 'ruth']),
 's': set(['sandra', 'sarah', 'sharon', 'shirley', 'susan']),
 'v': set(['virginia'])}

A format like this would be even better:

{'a': {"amy", "angela", "anna"},
 'b': {"barbara", "betty", "brenda"},
 'c': {"carol", "cynthia"},
 'd': {"deborah", "donna", "dorothy"},
 'e': {"elizabeth"},
 'h': {"helen"},
 'j': {"jennifer", "jessica"},
 'k': {"karen", "kathleen", "kimberly"},
 'l': {"laura", "linda", "lisa"},
 'm': {"margaret", "maria", "mary", "melissa", "michelle"},
 'n': {"nancy"},
 'p': {"pamela", "patricia"},
 'r': {"rebecca", "ruth"},
 's': {"sandra", "sarah", "sharon", "shirley", "susan"},
 'v': {"virginia"}
}
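
As a rough sketch only, this is one way the layout above could be produced in Python from a dict of sets. The name pretty_sets is hypothetical (it is not part of pprint or of any library), and I am assuming that all the keys are single chars and all the set items are plain words without quotes inside them. It runs on both Python 2 and Python 3:

```python
def pretty_sets(groups):
    """Render a dict of sets with one key per line (hypothetical helper).

    Keys are shown as chars inside '' and set items as strings inside "".
    Keys and items are sorted, so the printout is stable between runs.
    No escaping is done: this assumes words without quote characters.
    """
    lines = []
    for key in sorted(groups):
        items = ", ".join('"%s"' % word for word in sorted(groups[key]))
        lines.append("'%s': {%s}" % (key, items))
    # Every line after the first is indented by one space, so the keys
    # line up under the opening brace; the closing brace gets its own line.
    return "{" + ",\n ".join(lines) + "\n}"
```

With the foo() that builds sets, print(pretty_sets(foo(names.split()))) should give a printout in that style. Sorting is a choice of this sketch, since a debugging printout is easier to compare when its order doesn't depend on the hashing.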

If you want a more apples-to-apples comparison, this is Python code that uses the same data structure as the D code:

from collections import defaultdict

def foo(names):
    result = defaultdict(dict)
    for name in names:
        result[name[0]][name] = True
    return result

names = """mary patricia linda barbara elizabeth jennifer
    maria susan margaret dorothy lisa nancy karen betty helen
    sandra donna carol ruth sharon michelle laura sarah
    kimberly deborah jessica shirley cynthia angela melissa
    brenda amy anna rebecca virginia kathleen pamela"""

print foo(names.split())

Its textual output:

defaultdict(<type 'dict'>, {'a': {'amy': True, 'anna': True, 'angela': True}, 'c': {'carol': True, 'cynthia': True}, 'b': {'barbara': True, 'betty': True, 'brenda': True}, 'e': {'elizabeth': True}, 'd': {'dorothy': True, 'donna': True, 'deborah': True}, 'h': {'helen': True}, 'k': {'kathleen': True, 'kimberly': True, 'karen': True}, 'j': {'jessica': True, 'jennifer': True}, 'm': {'margaret': True, 'melissa': True, 'michelle': True, 'mary': True, 'maria': True}, 'l': {'laura': True, 'linda': True, 'lisa': True}, 'n': {'nancy': True}, 'p': {'pamela': True, 'patricia': True}, 's': {'sarah': True, 'sharon': True, 'sandra': True, 'shirley': True, 'susan': True}, 'r': {'ruth': True, 'rebecca': True}, 'v': {'virginia': True}})

Using pprint:

{'a': {'amy': True, 'angela': True, 'anna': True},
 'b': {'barbara': True, 'betty': True, 'brenda': True},
 'c': {'carol': True, 'cynthia': True},
 'd': {'deborah': True, 'donna': True, 'dorothy': True},
 'e': {'elizabeth': True},
 'h': {'helen': True},
 'j': {'jennifer': True, 'jessica': True},
 'k': {'karen': True, 'kathleen': True, 'kimberly': True},
 'l': {'laura': True, 'linda': True, 'lisa': True},
 'm': {'margaret': True,
       'maria': True,
       'mary': True,
       'melissa': True,
       'michelle': True},
 'n': {'nancy': True},
 'p': {'pamela': True, 'patricia': True},
 'r': {'rebecca': True, 'ruth': True},
 's': {'sandra': True,
       'sarah': True,
       'sharon': True,
       'shirley': True,
       'susan': True},
 'v': {'virginia': True}}

Even without pprint, the printout of the defaultdict is usable for my debugging because it allows me to tell the sub-dictionaries apart. Another help comes from the "" and '' used around chars and strings present inside collections.

A prettyPrint() function in Phobos would help, but first in D2 I'd like writeln() to print that D data structure more or less like this:

['a': ["amy": true, "anna": true, "angela": true], 'c': ["carol": true, "cynthia": true], 'b': ["barbara": true, "betty": true, "brenda": true], 'e': ["elizabeth": true], 'd': ["dorothy": true, "donna": true, "deborah": true], 'h': ["helen": true], 'k': ["kathleen": true, "kimberly": true, "karen": true], 'j': ["jessica": true, "jennifer": true], 'm': ["margaret": true, "melissa": true, "michelle": true, "mary": true, "maria": true], 'l': ["laura": true, "linda": true, "lisa": true], 'n': ["nancy": true], 'p': ["pamela": true, "patricia": true], 's': ["sarah": true, "sharon": true, "sandra": true, "shirley": true, "susan": true], 'r': ["ruth": true, "rebecca": true], 'v': ["virginia": true]]

This allows me to use the printout for debugging, especially when I reduce the number of names for debugging purposes:

['a': ["amy": true, "anna": true], 'c': ["carol": true], 'b': ["barbara": true, "betty": true], 'd': ["dorothy": true], 'k': ["kathleen": true, "karen": true], 's': ["sandra": true, "shirley": true], 'v': ["virginia": true]]
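
To make the wanted format concrete, here is a small prototype in Python that renders nested dicts in the style shown above. It is only a sketch and not Phobos code: the name d_repr is hypothetical, a one-char string is assumed to stand for a D char, and no escaping of quotes is done. Unlike the printouts above it sorts the keys instead of following the hash order:

```python
def d_repr(value):
    """Render a value in the D-literal-like style wanted above (sketch)."""
    if isinstance(value, dict):
        # Associative arrays: [key: value, key: value], sorted by key.
        pairs = ["%s: %s" % (d_repr(k), d_repr(v))
                 for k, v in sorted(value.items())]
        return "[" + ", ".join(pairs) + "]"
    if isinstance(value, bool):
        # D spells its booleans in lower case.
        return "true" if value else "false"
    if isinstance(value, str):
        # Assumption: a one-char string plays the role of a D char and
        # gets '', every other string gets "". Nothing is escaped.
        quote = "'" if len(value) == 1 else '"'
        return quote + value + quote
    return repr(value)
```

With the dict-of-dicts version of foo(), print(d_repr(foo(names.split()))) gives a single-line printout where every inner array is delimited by its own [ ].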

Bye,
bearophile