Few things II

Tue Aug 7 11:52:27 PDT 2007

bearophile wrote:
> 2d) Regarding the printing I think it's not nice that:
> writefln(["1", "2"])
> prints the same thing as:
> writefln([1, 2])
> So maybe writefln can use "" when printing strings inside arrays/AAs, to differentiate them.

In that case, writefln(1) should not print the same thing as 
writefln("1") either. But I agree that it would be useful to have a 
function that prints strings with "". Preferably, it should be able to 
convert any expression of basic types into a string representation that, 
when parsed by the compiler as a literal, yields an identical value.
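
A minimal sketch of such a function, written against present-day D 
rather than the DMD discussed here. The name repr is made up for 
illustration and is not a Phobos function. It quotes strings, recurses 
into arrays, and puts ", " between elements:

```d
import std.conv : to;

/// Hypothetical helper: renders a value roughly the way it would be
/// written as a D literal (strings quoted, arrays bracketed).
string repr(T)(T value)
{
    static if (is(T : const(char)[]))
    {
        // Quote the string and escape embedded quotes and backslashes.
        string result = "\"";
        foreach (char ch; value)
        {
            if (ch == '"' || ch == '\\')
                result ~= '\\';
            result ~= ch;
        }
        result ~= '"';
        return result;
    }
    else static if (is(T : E[], E))
    {
        // Any other array: format each element recursively.
        string result = "[";
        foreach (i, element; value)
        {
            if (i)
                result ~= ", ";
            result ~= repr(element);
        }
        return result ~ "]";
    }
    else
    {
        // Basic types: fall back to the standard conversion.
        return to!string(value);
    }
}

void main()
{
    assert(repr(1) == "1");
    assert(repr("1") == `"1"`);
    assert(repr(["1", "2"]) == `["1", "2"]`);
    assert(repr([1, 2]) == "[1, 2]");
}
```

Only quotes and backslashes are escaped here, so control characters 
would not round-trip through the compiler yet.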

> 2e) I also think writefln when in printing arrays/AAs can put spaces between the commas.
> I think [el1, el2, el3] and [1:"a", 2:"b"] are more readable than [el1,el2,el3] and [1:a,2:b].

Ideally, a formatter should allow you to define any format you want.

> 2f) I think this is a writefln inconsistency:
> import std.stdio;
> void main() {
>   char[3] a = "abc";
>   writefln([a, a]); // Prints: [abc,abc]
>   writefln([a: 1]); // Prints: [[a,b,c]:1]
> }

This is not a writefln inconsistency as much as it is a DMD inconsistency.

Note that typeof([a, a]) == char[][2],
while     typeof([a: 1]) == int[char[3]].

This feels like a bug, but somehow, I have a feeling it is intentional 
to allow string array literals such as:

["a","ab","abc"]

without giving a stupid compiler error about type incompatibility.

IMHO, it would be much better if array literals instead picked the best 
common type, allowing expressions such as: [1, 1.5, 2, 2.5]
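
To illustrate what "best common type" could mean, here is a rough 
sketch in present-day D. It assumes std.traits.CommonType from current 
Phobos, which the compilers discussed in this thread did not ship. The 
helper name arrayOf is hypothetical:

```d
import std.traits : CommonType;

/// Hypothetical helper: builds an array whose element type is the
/// common type of all arguments.
CommonType!Args[] arrayOf(Args...)(Args args)
{
    CommonType!Args[] result;
    foreach (arg; args)     // unrolled at compile time, one step per argument
        result ~= arg;      // each argument converts implicitly to the common type
    return result;
}

void main()
{
    auto values = arrayOf(1, 1.5, 2, 2.5);
    static assert(is(typeof(values) == double[]));
    assert(values == [1.0, 1.5, 2.0, 2.5]);
}
```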

> 3) Given an array x1, like a char[], it seems to me that x1.dup is a bit slower compared to:
> char[] x2 = new char[x1.length];
> x2[] = x1;
> I hope this can be fixed (or maybe someone can tell me why).

.dup should definitely not be any slower (unless you have a GC compiled 
with DbC that does a memcmp after each .dup). Do you have a benchmark 
showing this?

> 6) I suggest to add a third optional to the AAs, the numerical progressive index, it seems easy to add, the foreach already supports more than two variables:
> foreach(index, key, value; someAA) {...}
> 
> But I don't like much the fact that the foreach with one parameter scans the values:
> foreach(value; someAA) {...}
> because most times you have to iterate on the keys (in Python too the default iteration on a dict is on its keys). With the key you can find the value, while I believe with the value you can't find the key.

I agree with Sean here. The current behavior is the most consistent one.

> 
> I'd like a way to find the key if you have a pointer to the actual value contained inside the AA. (I have seen you can cast the pointer to the value to a struct and you can start to walk back with that, but I have not succeed yet to find the key in a reliable way, you may suggest it to me. Maybe a function to do this can be added to Phobos.) I can use it to create a "ordered associative array" data structure (using the built-in AAs) that keeps a linked list of the AA pairs, according to the original insertion order inside the AA. To do that linked list of values you may need to find the key of a given value pointer (or a given key-value pair).

Something like this should work with current DMD and derivations that 
keep the memory layout of AAs:

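template getKeyFromPtr(Key) {
   Key getKeyFromPtr(Val)(Val *ptr) {
     // Assumes the key is stored directly before the value, padded up
     // to a multiple of size_t.sizeof.
     auto offset = (Key.sizeof + size_t.sizeof - 1) & ~(size_t.sizeof - 1);
     return *cast(Key *)(cast(void*)ptr - offset);
   }
}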

(Not really tested though.)

> 9) Bit fields of structs: D is a practical language, so I think it can support them too, to allow people to translate C => D code more simply (D has already chosen similar practical compromises regarding C for other things).

Using string mixins and some CTFE, it is not that hard to implement a 
bit field replacement.
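
A rough sketch of the idea in present-day D. The helper bitField and 
the struct Flags are made-up names for illustration. The function runs 
at compile time and returns the source of a getter/setter pair, which 
the struct then mixes in:

```d
import std.conv : to;

/// Hypothetical CTFE helper: returns D source for a getter and a setter
/// that keep `width` bits at bit position `shift` inside the field `store`.
/// Assumes width < 32 and shift + width <= 32.
string bitField(string name, string store, uint shift, uint width)
{
    string mask = "((1u << " ~ to!string(width) ~ ") - 1)";
    string sh = to!string(shift);
    return
        "uint " ~ name ~ "() const { return (" ~ store ~ " >> " ~ sh
            ~ ") & " ~ mask ~ "; }\n" ~
        "void " ~ name ~ "(uint v) { " ~ store ~ " = (" ~ store
            ~ " & ~(" ~ mask ~ " << " ~ sh ~ ")) | ((v & " ~ mask
            ~ ") << " ~ sh ~ "); }\n";
}

struct Flags
{
    uint bits;                              // backing storage
    mixin(bitField("mode", "bits", 0, 3));  // bits 0..2
    mixin(bitField("level", "bits", 3, 5)); // bits 3..7
}

void main()
{
    Flags f;
    f.mode = 5;
    f.level = 17;
    assert(f.mode == 5);
    assert(f.level == 17);
    assert(f.bits == (5 | (17 << 3)));

    f.mode = 15;            // only the low 3 bits are kept
    assert(f.mode == 7);
    assert(f.level == 17);  // neighbouring field is untouched
}
```

Later Phobos versions provide std.bitmanip.bitfields, which is built on 
the same string mixin approach.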

> 10) I think the compiler has to accept a line like this too, converting the strings as necessary:
> char[][int] a = [1:"abba", 2:"hello"];

I agree. My solution would be to make string literals dynamic arrays 
instead of static. It doesn't make much sense to put the length into the 
type. "a" and "ab" should be of the same type.

> 11) I have seen that D AAs are quite efficient, but probably there are very good hash implementations around. So "superfasthash" or a modified version of the super refined and optimized Python/Perl hashes can be useful to replace the currently used by DMD. (Python source code is fully open source but code licenses may be a problem anyway, I don't know).

I usually make my own containers when the built in AA doesn't cut it, 
such as:

Map!(K,V), OrderedMap!(K,V), DiskStoredMap!(K,V), and so on.

The only big problem is D's lack of a reference return type.

> 12) I have seen D allows to pass anonymous functions like:
> (int x){return x*x;}
> 
> C# V.3.0 uses a more synthetic syntax:
> x => x*x
> 
> That syntax isn't bad, but I don't know how the compiler automatically manages the types in such situations, it can be a bit complex.

It is not bad, but it can never evaluate to a single function. It would 
have to evaluate to a template alias or similar. But I too wish D had a 
more convenient syntax for single expression delegate literals, such as:

(int x) => x*x

or something.

> Beside them I like list generators from Python, but their syntax isn't much compatibile with D syntax, so I don't know how they can be written.

They could probably be written something like:

int x; // dummy
List(x*x, x, range(10))

or

List((int x) { return x*x; }, range(10));

or (as I have implemented them):

map(range(10), (int x) { return x*x; });

range(10).map((int x) { return x*x; });

or equivalently as a lazy expression:

a = range(1_000_000_000_000_000L).mapLazy((int x) { return x*x; });
writefln(a[555_555_555_555]);
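
For the lazy variant, the idea is simply to store the function and 
apply it on indexing. A minimal sketch in present-day D follows. It is 
not my actual implementation: LazyMap is a hypothetical type, and this 
mapLazy takes the length directly instead of a range:

```d
/// Hypothetical lazy sequence over the indices 0 .. length.
/// Nothing is computed until an element is requested.
struct LazyMap
{
    long length;
    long delegate(long) fun;

    long opIndex(long index)
    {
        assert(index >= 0 && index < length, "index out of bounds");
        return fun(index);  // evaluated on demand, never stored
    }
}

/// Hypothetical constructor function for LazyMap.
LazyMap mapLazy(long length, long delegate(long) fun)
{
    return LazyMap(length, fun);
}

void main()
{
    // No trillion-element array is ever allocated.
    auto squares = mapLazy(1_000_000_000_000L, (long x) => x * x);
    assert(squares[3] == 9);
    assert(squares[1_000_000] == 1_000_000_000_000L);
}
```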

> 18) I don't like still that string literals are immutable while the others not. I suggest to make the string literals mutable as the other ones (automatic dup during program creation). This will avoid some of my bugs (and it can probably avoid problems in porting code Win <=>Linux that uses strings) and help me avoid some tricks to de-constant strings when they are constant. In few situations you may need to create a program that avoids heap activity, so in those situations you may need the normal string literals anyway, in such situations you may add a certain letter after the string literal (like c?) that tells the compile to not perform the automatic dup.

Most strings will never be changed, so this would cause lots of 
unnecessary dups. D 2.0's constant strings (pronounced constant but 
spelled invariant) are a better solution. Maybe an automatic dup from 
constant strings into char[] would be a good thing.

> 19) I have created many very general function templates like map, imap, map, zip, sum, max, min, filter, ifilter, all, any, etc, that accept arrays, AAs and iterable classes. It may be useful to add a bit of functional-style programming to D, inside a Phobos module. Such functions may be fit for the 90% of the code where max running speed isn't necessary. They shorten the code, reduce bugs, speed up programming, make the program simpler to read, etc, and being hi-level they may even end helping the compiler produce efficient code (as foreach sometimes does).

I have implemented heaps of such functions too and I believe most people 
have implemented at least a few of those.

I made a proposal to standardize a number of such functions a long time 
ago. It is a shame there is no standard regarding such functions, if 
only because it would make it so much easier to publish short snippets 
of code without having to constantly define all the basic functions.

Luckily, Tango has some really well implemented and solid functions in 
tango.core.Array. A collection that can only grow.

BTW, I really like your naming convention for in place versions (as I 
assume they are). I might steal that right away :). Or does i mean those 
are iterator functions?

> 20) After few test of mine I think the default way dmd computes struct hashes has to be improved (or removed), to make it compute the hash looking inside the dynamic arrays contained into the struct too. Otherwise it silently gives the "wrong" results.

This is just another example that treating structs as plain old data may 
be too simplistic a view. Another example is comparing two structs for 
equality. One could argue that two structs should be equal if all 
members are equal. That is not how it works today.
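
Until the defaults change, a struct can define both operations itself. 
A small sketch in present-day D syntax, where Record and its hash 
formula are arbitrary choices for illustration:

```d
/// Hypothetical struct whose equality and hash look inside the array
/// instead of at the array's pointer and length.
struct Record
{
    int[] data;

    bool opEquals(const Record other) const
    {
        return data == other.data;  // compares array contents
    }

    size_t toHash() const nothrow @safe
    {
        size_t hash = 17;
        foreach (value; data)
            hash = hash * 31 + value;   // simple multiplicative hash
        return hash;
    }
}

void main()
{
    auto a = Record([1, 2, 3]);
    auto b = Record([1, 2, 3].dup);     // same contents, different memory
    assert(a.data.ptr !is b.data.ptr);
    assert(a == b);

    int[Record] counts;
    counts[a] = 1;
    assert(b in counts);                // found via content-based hash
}
```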

> 21) In this newsgroup I've seen that people like a lot the ability of doing:
> txt.split()
> But I don't like it much because it's a syntactical sugar that masks the fact that those split are functions (that pollute the namespace) and they aren't true methods (object functions). That namespace pollution may cause problems, if you have already defined elsewhere a function with the same name and similar signature.

I disagree. First, the namespace "pollution" is no problem in D. You 
can never accidentally call an unwanted overload of a function from 
another module. D will issue an ambiguity error when the same identifier 
matches instances in several different modules.

Secondly, I have really grown to like the way of chaining functions on 
arrays. The function call order and the data flow is read left to right. 
An example from my own code:

references ~= refs
	.splitIter()
	.filterIter((char[] c) { return c.length > 2; })
	.map((char[] c) { return c[1..$-1]; });

Writing this as:

references ~= map(filterIter(refs.splitIter(),
                             (char[] c) { return c.length > 2; }),
                  (char[] c) { return c[1..$-1]; });

is both much harder to read and parse (to me at least) and hard to 
indent in any meaningful way.
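
For comparison, the same left-to-right chain can be sketched with 
functions from present-day Phobos (std.algorithm and std.array). These 
are not the splitIter/filterIter helpers from my own code, and the 
input string is invented for the example:

```d
import std.algorithm : filter, map, splitter;
import std.array : array;

void main()
{
    string refs = "<a> <b1> x <c22>";   // made-up sample input
    string[] references;

    references ~= refs
        .splitter(' ')                  // lazily split on spaces
        .filter!(c => c.length > 2)     // drop tokens too short to be <...>
        .map!(c => c[1 .. $ - 1])       // strip the surrounding brackets
        .array;                         // evaluate into a string[]

    assert(references == ["a", "b1", "c22"]);
}
```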

-- 
Oskar