D BUGS Part 2

bearophile bearophileHUGS at lycos.com
Thu Jan 1 12:23:19 PST 2009


The first part of this post was posted around October 2 2008, and shows a list of general bugs I have found in DMD/D.

This post lists several problems/bugs/limits I have found in the write/writefln of D1.

Most of these problems are solved in my implementation of the put/putr functions, which you can find in my dlibs, inside the module 'string'. Note that these functions have two known bugs (and some limitations); I'll fix one of them ASAP. So here I'll talk mostly about what my code does. Note that often there's no "correct" representation, but having a fixed default one is much better than nothing.

The names of the functions, put/putr, are very short to type, their purpose is easy to understand, and there's very little risk of typos. So I think they are the best choice of names. I have used them for almost a year now.

The purpose of these functions is to print data and data structures on the console. Such printing is mostly for the programmer, especially during debugging, or for logging. So the print functions have to be:
- Fast, possibly as fast as printf(). At the moment put/putr are slower than writef/writefln, and writef/writefln are quite a bit slower than printf(). Ideally writef/writefln can become as fast as printf(). I'd like put/putr to become faster, but it may require a lot of work. The slowness of put/putr is generally less important, because where a lot of data has to be printed, printf() can often be used.
- Unambiguous: the printed data must clearly show the type and content of the data.
- Complete: all built-ins must have a good representation.
- Elegant and clean: to speed up reading and to have good logs.
- A representation: when possible it can be useful to have an alternative way to represent something more precisely. Python tells apart the two usages: str() and its dual __str__ return a readable textual representation, while repr() and __repr__ often return a textual representation that, put inside the code, generates the same object that has been printed. In dlibs I have found it useful to create the same pair of functions (but in D objects have only one toString(), so there's no support for a toRepr or something similar).

So in the string module of the dlibs you can find put/putr and str/repr functions. The second pair returns a textual representation and the first pair prints.
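
To make the str/repr distinction concrete, here is a minimal sketch written for current D (D2), not the dlibs code. The names strSketch and reprSketch are invented for this example, and only `\` and `"` are escaped. It shows the main rule: a top-level string is left as it is, while items inside an array are stringified with the repr-like function.

```d
import std.array : replace;
import std.conv : to;
import std.traits : isArray;

/// repr-like form (hypothetical name): strings are quoted,
/// and only `\` and `"` are escaped in this sketch.
string reprSketch(T)(T value) {
    static if (is(T : const(char)[])) {
        string s = value[].to!string;
        return `"` ~ s.replace(`\`, `\\`).replace(`"`, `\"`) ~ `"`;
    } else
        return strSketch(value);
}

/// str-like form (hypothetical name): a top-level string is returned unchanged,
/// the items of an array are stringified with reprSketch.
string strSketch(T)(T value) {
    static if (is(T : const(char)[]))
        return value[].to!string;
    else static if (isArray!T) {
        string result = "[";
        foreach (i, item; value) {
            if (i > 0) result ~= ", ";
            result ~= reprSketch(item);
        }
        return result ~ "]";
    } else
        return value.to!string;
}

unittest {
    assert(strSketch("hello") == "hello");
    assert(strSketch(["hello"]) == `["hello"]`);
    assert(strSketch([1, 2, 3]) == "[1, 2, 3]");
}
```

The two templates call each other, so nested arrays need no extra code.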

Some notes on how put/putr work:
- Strings inside arrays/AAs are printed with "" around them and with special chars escaped, because sub-items in collections are stringified using repr and not str.
- Structs without toString(): their fields are printed between <>.
- Delegates are printed between <>.
- Objects: just their qualified name is printed (this may be changed in the future).
- Unions: the first field is printed (using .tupleof[0]) between {}.
- Pointers are printed as hex integers.
- Printing AAs that contain static arrays (as keys or values) may require a lot of memory. This is a bug of D itself.
- Interfaces are printed as: interface:modulepath.Interfacename.
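
The struct and union rules in the list above can be sketched with .tupleof. This is an illustration written for current D (D2), under the assumption that plain to!string is good enough for the fields; aggregateSketch, Pair and IntOrChar are made-up names, and the real put/putr code may work differently.

```d
import std.conv : to;

struct Pair { int x, y; }
union IntOrChar { int x; char c; }

/// Hypothetical helper: the fields of a struct between <>,
/// or the first field of a union between {}.
string aggregateSketch(T)(T value)
if (is(T == struct) || is(T == union)) {
    static if (is(T == union)) {
        // Only the first field is shown, as in the list above.
        return "{" ~ to!string(value.tupleof[0]) ~ "}";
    } else {
        string result = "<";
        // This foreach over .tupleof is unrolled at compile time.
        foreach (i, field; value.tupleof) {
            static if (i > 0) result ~= ", ";
            result ~= to!string(field);
        }
        return result ~ ">";
    }
}

unittest {
    assert(aggregateSketch(Pair(7, 8)) == "<7, 8>");
    IntOrChar u;
    u.x = 100;
    assert(aggregateSketch(u) == "{100}");
}
```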

Limitations:
- Printing very large associative arrays with these functions can be 2+ times slower than writefln.
- It can't print an enum.
- It doesn't work with dstring and wstring yet (to be fixed).
- It can't print structs/classes that have private attributes and don't define toString().

Bug: this situation, with a self-nested array of Box, isn't handled yet:
----------------
import std.boxer: box, Box;
import d.func: putr;
void main() {
  auto a = new Box[1];
  a[0] = box(a);
  putr(a); // Error: Stack Overflow
}
----------------

Bug: some cases of unions/structs aren't printed correctly yet:
----------------
union XY1 {
  struct { int x, y; }
  long xy;
}

struct XY2 {
  union {
    struct { int x, y; }
    long xy;
  }
}

putr(XY1(10, 20)); // Out: {10}
putr(XY2(10, 20)); // Out: <10, 20, 85899345930>

------------------------


And now, after showing the bugs/problems of put/putr, I can show what they do well.

This shows how strings and collections are printed:
- Escape characters are printed with a \ before them.
- Arrays are printed with spaces between items to improve readability.
- AAs are printed with a space after the comma and after the colon for the same purpose.
- Strings stringified with repr() are printed inside "", this helps a LOT to tell apart strings from everything else.

assert(str("hello", ' ', ["hello"], ' ', ['a', 'b']) == "hello [\"hello\"] ab");
assert(str([1, 2, 3]) == "[1, 2, 3]");
assert(str(["a", "b", "ca"]) == "[\"a\", \"b\", \"ca\"]");
string[][] ax = [["Ab", "c"], ["D", "ef"]];
assert(str(ax) == "[[\"Ab\", \"c\"], [\"D\", \"ef\"]]");
string[int] aa = [1:"aa", 2:"ba", 3:"bb"];
assert(str(aa, ' ', 3.154887e-3) == "[1: \"aa\", 2: \"ba\", 3: \"bb\"] 0.00315488");

Structs that don't define a toString() have a default representation, with their fields between <>. This is very useful:

struct S1 { int x;}
assert( str(S1(7)) == "<7>");
struct S2 { int x, y;}
assert( str(S2(7, 8)) == "<7, 8>");
struct S3 { int x; float y; int z;}
assert( str(S3(2, 7.1, 8)) == "<2, 7.1, 8>");

S1[] a1 = [S1(10), S1(20), S1(30)];
assert( str(a1) == "[<10>, <20>, <30>]" );

S2[] a2 = [S2(10,5), S2(20,6), S2(30,7)];
assert( str(a2) == "[<10, 5>, <20, 6>, <30, 7>]" );

S3[] a3 = [S3(10,5.5,1), S3(20,6.5,2), S3(30,7.5,3)];
assert( str(a3) == "[<10, 5.5, 1>, <20, 6.5, 2>, <30, 7.5, 3>]" );

But toString comes first, when defined:

struct S4 {
    int x;
    string toString() {
        return "S4<" ~ format(x) ~ ">";
    }
}
assert( str(S4(125)) == "S4<125>");

Unions too are pretty-printed by default, between {}:

union U { int x; char c; float f; }
U u;
u.x = 100;
assert(str(u) == "{100}");

All non-printable chars, like \t, have an escaped representation (\symbol or \hex) when a string is stringified with repr, for example inside an array:

assert(str("\"", ' ', ["\""]) == `" ["\""]`);
assert(str("ab\tc") == "ab\tc");
assert(str(["ab\tc"]) == "[\"ab\\tc\"]");
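
A possible way to do this escaping, sketched for current D (D2). The function name escapeSketch and the exact \xNN form for the hex case are my assumptions for this example; the format dlibs really uses for \hex may be different.

```d
import std.format : format;

/// Hypothetical helper: escapes the chars of a string, using a \symbol
/// where one exists and \xNN for the other control chars.
string escapeSketch(string s) {
    string result;
    foreach (char c; s) {
        switch (c) {
            case '\t': result ~= `\t`; break;
            case '\n': result ~= `\n`; break;
            case '\r': result ~= `\r`; break;
            case '"':  result ~= `\"`; break;
            case '\\': result ~= `\\`; break;
            default:
                if (c < 0x20 || c == 0x7F)
                    result ~= format(`\x%02X`, cast(ubyte) c);
                else
                    result ~= c;   // printable ASCII and UTF-8 bytes pass through
                break;
        }
    }
    return result;
}

unittest {
    assert(escapeSketch("ab\tc") == `ab\tc`);
    assert(escapeSketch("\x01") == `\x01`);
}
```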

Everything nests, of course:

// more tests with structs and classes
struct S { int[3] d; }

auto ay = new S;
ay.d[] = [1, 2, 3];
string str_ay = str(ay);
assert(str_ay.length <= (size_t.sizeof * 2));
foreach(c; str_ay)
    assert( isalnum(c) );
assert(str(*ay) == "<[1, 2, 3]>");

S b;
b.d[] = [1, 1, 1];
assert(str(b) == "<[1, 1, 1]>");

Classes use their toString() when it's defined, otherwise just their qualified name is printed:

class C1 {
    int[3] d;
    string toString() {return "C1" ~ format(d);}
}
auto c1 = new C1;
c1.d[] = [3, 2, 1];
assert(str(c1) == "C1[3,2,1]");

class C2 { int[3] d; }
auto c2 = new C2;
c2.d[] = [3, 2, 1];
assert( str(c2).startsWith("d.string.") );
assert( str(c2).endsWith(".C2") );

You can tell apart static arrays of chars from strings:

assert( str(["ab", "ba"]) == `["ab", "ba"]`);
assert( format(["ab":12, "ba":5]) == "[[a,b]:12,[b,a]:5]" );

string[int] aa2 = [12:"ab", 5:"ba"];
assert(str(aa2) == `[5: "ba", 12: "ab"]`);

char[2][int] aa3 = [12:"ab", 5:"ba"];
assert(str(aa2) == `[5: "ba", 12: "ab"]`);

assert(str([12:"ab", 5:"ba"]) == `[5: "ba", 12: "ab"]`);

assert(str(["ab":12, "ba":5]) == `["ab": 12, "ba": 5]`);

assert(str(["ab":"AB", "ba":"BA"]) == `["ab": "AB", "ba": "BA"]`);

assert(str(['a':'d','b':'e']) == `['a': 'd', 'b': 'e']`);

Empty associative arrays have a special representation:

assert(str(new int[][0]) == "[]");
char[int] aa_empty;
assert(str(aa_empty) == "AA!(int, char)");

aa3 = null;
assert(str(aa3) == "AA!(int, char[2])");
assert(str(aa_empty) == "AA!(int, char)");
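
The empty case can be produced from the key and value types alone. Here is a small sketch for current D (D2); aaSketch is a made-up name, the keys are sorted only to make the output of this example deterministic (I am not claiming here that put/putr sort them), and plain to!string is used for keys and values, so strings are not quoted as they are by str.

```d
import std.algorithm.sorting : sort;
import std.conv : to;

/// Hypothetical helper: an empty AA shows its types,
/// a non-empty one shows "key: value" pairs.
string aaSketch(K, V)(V[K] aa) {
    if (aa.length == 0)
        return "AA!(" ~ K.stringof ~ ", " ~ V.stringof ~ ")";
    auto keys = aa.keys;
    sort(keys);   // assumption: sorted only for a stable output
    string result = "[";
    foreach (i, key; keys) {
        if (i > 0) result ~= ", ";
        result ~= key.to!string ~ ": " ~ aa[key].to!string;
    }
    return result ~ "]";
}

unittest {
    char[int] empty;
    assert(aaSketch(empty) == "AA!(int, char)");
    assert(aaSketch([12: 1, 5: 2]) == "[5: 2, 12: 1]");
}
```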

More about classes:

// classes
class Cl0 { int a; }
Cl0 cl0;
assert(str(cl0) == "null");

auto cl0b = new Cl0();
cl0b.a = 10;
assert(str(cl0b).startsWith("d.string."));
assert(str(cl0b).endsWith(".Cl0"));

class Cl1 { int a; Cl1 cl; }
Cl1 cl1;
assert(str(cl1) == "null");

auto cl1b = new Cl1();
cl1b.a = 10;
assert(str(cl1b).startsWith("d.string."));
assert(str(cl1b).endsWith(".Cl1"));

Null objects:

class Cl2 {
    int a;
    Cl2 cl;
    string toString() { return "C2[" ~ str(a) ~ " " ~ str(cl) ~ "]"; }
}
Cl2 cl2;
assert(str(cl2) == "null");

auto cl2b = new Cl2();
cl2b.a = 20;
assert(str(cl2b) == "C2[20 null]");
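
The null and qualified-name cases for classes can be sketched like this in current D (D2). objectSketch is a made-up name, and this sketch ignores user-defined toString() methods, which the real str honours.

```d
import std.algorithm.searching : endsWith;

class Cl0 { int a; }

/// Hypothetical helper: "null" for a null reference,
/// otherwise the qualified name of the dynamic class.
string objectSketch(Object obj) {
    if (obj is null)
        return "null";
    return typeid(obj).name;
}

unittest {
    Cl0 missing;   // class references are null by default
    assert(objectSketch(missing) == "null");
    assert(objectSketch(new Cl0).endsWith(".Cl0"));
}
```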


Floating point and complex numbers are printed WAY better than with writef (try to do the same with it). Many bugs are fixed here; notice the trailing zeros that help tell apart FP values from ints. Hopefully all these special cases are managed in the correct way:

assert(str(cast(float)-5) == "-5.0");
assert(str(cast(double)-5) == "-5.0");
assert(str(cast(real)-5) == "-5.0");

assert(str(cast(ifloat)-5i) == "-5.0i");
assert(str(cast(idouble)-5i) == "-5.0i");
assert(str(cast(ireal)-5i) == "-5.0i");

assert(str(cast(cfloat)53.25+55i) == "53.25+55.0i");
assert(str(cast(cdouble)53.25+55i) == "53.25+55.0i");
assert(str(cast(creal)53.25+55i) == "53.25+55.0i");

assert(str(cast(cfloat)-53-55i) == "-53.0-55.0i");
assert(str(cast(cdouble)-53-55i) == "-53.0-55.0i");
assert(str(cast(creal)-53-55i) == "-53.0-55.0i");

assert(str(cast(cfloat)-7.25-0i) == "-7.25+0.0i");
assert(str(cast(cdouble)-7.25-0i) == "-7.25+0.0i");
assert(str(cast(creal)-7.25-0i) == "-7.25+0.0i");

assert(str(cast(cfloat)-7-0i) == "-7.0+0.0i");
assert(str(cast(cdouble)-7-0i) == "-7.0+0.0i");
assert(str(cast(creal)-7-0i) == "-7.0+0.0i");

assert(str(7.00001) == "7.00001");
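
The trailing-zero rule for real numbers can be sketched as follows in current D (D2). floatSketch is a made-up name and "%g" is my choice of base format, not necessarily what dlibs uses. Complex types are left out, because the built-in ones are deprecated in D2.

```d
import std.format : format;

/// Hypothetical helper: formats with %g, then adds ".0" when the result
/// could be mistaken for an integer.
string floatSketch(double x) {
    string s = format("%g", x);
    foreach (char c; s) {
        // A '.' or an exponent already marks the value as FP;
        // 'n' and 'i' cover "nan" and "inf".
        if (c == '.' || c == 'e' || c == 'n' || c == 'i')
            return s;
    }
    return s ~ ".0";
}

unittest {
    assert(floatSketch(-5) == "-5.0");
    assert(floatSketch(53.25) == "53.25");
}
```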

Typedef-ed vars are managed correctly:

// typedef test
typedef int T;
T t = 10;
assert(str(t) == "10");

typedef C1 TC1;
auto tc1 = new TC1;
tc1.d[] = [3, 2, 1];
assert(str(tc1) == "C1[3,2,1]");

tests with void*:

void* void_ptr;
assert( str(void_ptr) == "null" );
void*[] void_star_arr;
assert( str(void_star_arr) == "[]" );
void_star_arr = [null, null, null];
assert( str(void_star_arr) == "[null, null, null]" );


More tricks:

assert(str([`"hello"`]) == `["\"hello\""]`);
assert(str(["`hello`"]) == "[\"`hello`\"]");
assert(str('a', 'b') == "ab");
assert(str(['a', 'b']) == "ab"); // because an array of char is the same as a string
assert(str('\'', '\'') == "''");
assert(str(['\'', '\'']) == "''"); // because an array of char is the same as a string


functions and delegates:

auto d1 = (int i, char c) { return -i; };

assert(str(d1).startsWith("<int delegate(int,char): "));
assert(str(d1).endsWith(">"));

assert(str(d1.funcptr).startsWith("<int function(int,char): "));
assert(str(d1.funcptr).endsWith(">"));


interfaces:

interface I1 {
    void foo(int i);
}
string str_I1 = str(I1.init);
assert( str_I1.startsWith("interface:") );
assert( str_I1.endsWith(".I1") );

As you can see this fixes about 15-20 bugs/troubles with writef/writefln.

Bye,
bearophile
