array of randomly generated names

Sat Oct 16 03:42:26 PDT 2010

On Saturday 16 October 2010 02:24:11 spir wrote:
> > > 1. In the  inner loop generating name, I have found neither a way to
> > > feed directly ints into name, nor a way to cast ints to chars using
> > > to! (also found no chr()). So, I had to list letters. But this
> > > wouldn't work with a wide range of unicode chars... How to build name
> > > directly from random ints?
> > 
> > I would expect to!dchar(num) to work where num is an integral value
> > [...]. If, for some reason, to!dchar(num) does not work, then you can
> > simply cast it cast(dchar)(num), much as that's less desirable.
> 
> Read somewhere, I guess, that cast(type) is deprecated in favor of
> to!(type).

Usually, it's better to use to!(), and it should work, but there are cases where 
cast() is the better choice. It certainly isn't deprecated.
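
As a minimal sketch of building a name directly from random ints (assuming 
lowercase ASCII letters are enough; randomName is a made-up helper name, not 
something from Phobos), the cast form looks like this:

```d
import std.random : uniform;
import std.stdio : writeln;

// Hypothetical helper: builds a name of `length` random lowercase letters.
dchar[] randomName(size_t length)
{
    auto name = new dchar[](length);
    foreach (ref c; name)
    {
        // uniform(0, 26) yields an int in [0, 26); offset it from 'a'
        // and convert the resulting code point to a dchar.
        c = cast(dchar)('a' + uniform(0, 26));
    }
    return name;
}

void main()
{
    writeln(randomName(8)); // eight random letters, different on each run
}
```

The same pattern should extend to any range of code points by changing the 
offset and the bounds passed to uniform, as long as the result is a valid 
code point.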

> > [...]
> > 
> > > 2. I was surprised to get all names equal... Seems that "names[i] =
> > > name" actually copies a ref to the name. Is there another way to
> > > produce a copy than "names[i][] = name"?
> > 
> > You really should take a look at
> > http://www.digitalmars.com/d/2.0/arrays.html . Static arrays are value
> > types, but dynamic arrays are reference types.
> 
> My bad! I read this page, but did not realise that this applies to
> (flexible) arrays of *char, like the ones needed for text processing.

Arrays are arrays are arrays. Whether they are mutable, const, or immutable 
isn't particularly relevant except insofar as you want to alter them. string is 
immutable(char)[], so the array reference is mutable, but its elements are not.
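
A small sketch of what that means in practice (the variable names are only for 
illustration):

```d
void main()
{
    string s = "hello";

    // The reference is mutable: s can be pointed at different data.
    s = "world";
    assert(s == "world");

    // The elements are not: writing through s does not compile.
    static assert(!__traits(compiles, s[0] = 'W'));

    // A mutable copy can be changed freely without touching s.
    char[] m = s.dup;
    m[0] = 'W';
    assert(m == "World" && s == "world");
}
```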

> > You can even slice them without making any copies. e.g.
> > 
> > string a = "hello world";
> > string b = a[1 .. 7]; //it's a slice
> > assert(b == "ello w");
> > 
> > No copying is taking place there.
> 
> Oh, yes. You mean since the string data is immutable, there is no risk in
> sharing strings? Does this really mean (like lisp lists, for instance),
> that b really shares elements (chars) with a? (So that, if they were
> mutable instead, changing chars in b would then change a?)

Variables are thread-local by default, so sharing strings isn't normally an 
issue anyway, but yes, the immutability of strings does make them properly 
shareable between threads.

And if you mean shareable between functions or objects, then immutable makes it 
so that the elements cannot be altered even if the reference is passed to 
another function. With a mutable array, unless you specifically pass it by 
reference (that is, with the ref modifier), the function that it's passed to 
cannot alter the original array reference (so your original array can't be 
shrunk or enlarged or whatnot), but it can alter the elements. And if the 
function does anything which causes its copy of the array to reallocate memory 
(like appending to it), then that copy will no longer refer to the same memory 
as the original array, and altering it won't affect the original array.
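
To make that concrete, here is a sketch with a mutable char[] (the three helper 
functions are hypothetical and exist only for this example). It also shows that 
a slice of a mutable array really does share its elements with the original:

```d
// Hypothetical helpers for illustration.
void upcaseFirst(char[] arr)     { arr[0] = 'H'; } // writes to shared elements
void appendByValue(char[] arr)   { arr ~= '!'; }   // changes only the local slice
void appendByRef(ref char[] arr) { arr ~= '!'; }   // changes the caller's slice

void main()
{
    char[] a = "hello".dup;

    // A slice refers to the same memory, so writes are visible through both.
    char[] b = a[1 .. 4];
    b[0] = 'E';
    assert(a == "hEllo");

    upcaseFirst(a);
    assert(a == "HEllo");  // the elements were altered

    appendByValue(a);
    assert(a == "HEllo");  // the length seen by the caller is unchanged

    appendByRef(a);
    assert(a == "HEllo!"); // ref lets the callee change the caller's slice
}
```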

This stackoverflow question has some good info on arrays and what will and won't 
cause them to reallocate:
http://stackoverflow.com/questions/3416657/is-it-bad-practice-to-alter-dynamic-arrays-that-have-references-to-them/3417778#3417778

> > If you want a copy an array, you use dup (or
> > idup if you want an immutable copy). e.g.
> > 
> > string a = "hello world";
> > string b = a.idup; //It's an immutable copy.
> > 
> > Or, if you want to copy an array into an array, you'd do
> > 
> > string a = "hello world";
> > char[] b = new char[](a.length);
> > b[] = a[]; //it's a copy.
> > assert(b == "hello world");
> 
> Right. I'll use dup, seems more self-commenting for me.
> By the way, is it possible to alias funcs/methods like types? (I'll
> try...). To me, "copy" would be far more obvious than "dup" ;-)

dup is standard. If you're going to be coding much in D, you're pretty much 
going to have to put up with it, or other people are going to have a harder 
time reading your code.
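
For reference, a short sketch of the difference between assigning an array 
reference and the copying options mentioned above (the names are made up for 
the example):

```d
void main()
{
    char[] name = "anna".dup;

    char[][] names = new char[][](2);
    names[0] = name;     // copies the reference: both see the same chars
    names[1] = name.dup; // allocates a separate mutable copy

    name[0] = 'A';
    assert(names[0] == "Anna"); // followed the change
    assert(names[1] == "anna"); // unaffected

    string frozen = name.idup;  // immutable copy
    name[0] = 'a';
    assert(frozen == "Anna");

    // Element-wise copy into an existing array of the same length.
    auto target = new char[](name.length);
    target[] = name[];
    assert(target == "anna" && target.ptr !is name.ptr);
}
```

Assigning names[i] = name inside a loop that keeps reusing name is what makes 
every entry end up equal; names[i] = name.dup avoids that.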

> > > 3. As you see, I individually set the length of each names[i] in the
> > > outer loop. (This, only to be able to copy, else the compiler
> > > complains about unequals lengths.) How can I set the length of all
> > > elements of names once and for all?
> > 
> > You're dealing with a multi-dimensional array. The inner array is empty
> > until you set it, so of course it won't work to index it until it's been
> > set. If you want to set the whole thing at once, then do
> > 
> > auto names = new dchar[][](numNames, nameLength);
> 
> Great, thank you. That's what I was looking for. I'm a bit lost with all
> possible syntaxes to perform similar things (wouldn't have thought at
> using "new" on an array). Is "auto" here used because it looks stupid to
> repeat the type, which is necessary on right side? Also, let's say names
> are dstrings (by casting once built) instead of dchar[]. Is it still
> possible to dimension names at startup? I mean, how to tell D the size of
> elements (names)?

It's quite common to use auto pretty much all the time when declaring variables 
in D. It avoids having to type the type twice, and it makes it far easier to 
change the types of variables later.

As for new, you're dealing with dynamic arrays, so of course you'd use new. 
Their memory is on the heap. It's just that when you concatenate or append to 
an array or alter its length, the array itself deals with reallocating memory.

auto s = new string[](5, 7);

would declare a dynamic array of strings of length 5 where every string in the 
array is of length 7 (with each character in the array default initialized to 
char.init).
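
A quick sketch of the same thing with dchar[] names (the sizes are arbitrary 
example values):

```d
void main()
{
    enum numNames = 5;     // example sizes, chosen arbitrarily
    size_t nameLength = 7; // runtime values work here too

    auto names = new dchar[][](numNames, nameLength);

    assert(names.length == numNames);
    foreach (name; names)
    {
        assert(name.length == nameLength);
        foreach (c; name)
            assert(c == dchar.init); // default-initialized elements
    }

    // Each inner array has its own elements, so filling one
    // leaves the others untouched.
    names[0][] = 'a';
    assert(names[0][0] == 'a');
    assert(names[1][0] == dchar.init);
}
```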

> > Now, personally, I would argue that you really should be using string as
> > much as possible (or dstring when you have to) and avoid mutable arrays
> > of char, wchar, or dchar.
> 
> Right, I will try to reverse my point of view and follow your advice as
> much as possible. Thus, work basically with *string and use *char[] only
> for the really text processing parts of code.

I'm not sure why you're using * here. There are no pointers involved with 
arrays. You _can_ get at the pointer to an array by using its ptr property, but 
that's pretty much just for passing arrays to C code.

> > That being the case, I'd advise doing this
> > 
> > auto names = new string[](numNames);
> > 
> > then use dchar[] in the for loop (maybe even make it a static one to
> > avoid the memory allocation) and the use to!string() to create a string
> > from it and put it in the list of names. e.g.
> > 
> > dchar[nameLength] name;
> > //...
> > names[i] = to!string(name[]); (since it's a static array in this case,
> > you have to slice it to pass it to to!()).
> 
> Hum, I'm not sure this works because nameLength is a variable; or does it?
> (I'll try) It looks strange to me to define a static array from a variable
> length ;-) Is the memory allocation issue really relevant, since if it's
> not allocated for name, then it must be for names[i]? Also, would idup
> work here, instead of (pseudo-)slicing? (I'll try this, too)

Actually, it turns out that you can't create a static array with a size which is 
set at runtime. I thought that you could. You can in C++ with gcc, so it's 
certainly possible to make the language do that, but apparently you can't in D. 
I don't use static arrays much, so I forgot that.

The reason that you have to slice name when passing it to to!() is because to!() 
won't take static arrays. Slicing a static array gives you a reference to the 
static array's memory, and that slice is a dynamic array.

In this case, using to!() allows you to convert from dchar[] to string. Also, 
it's generally better to use to!() rather than idup when converting strings 
because to!() won't do an idup if it doesn't have to (like if you passed it a 
string rather than a char[]), and of course idup won't convert from one 
character type to another.
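
Here is a sketch of that approach with a compile-time length (nameLength and 
the letters filled in are just example values):

```d
import std.conv : to;

void main()
{
    enum nameLength = 4;    // must be known at compile time
    dchar[nameLength] name; // static array, so no heap allocation here

    // Fill the buffer with 'a', 'b', 'c', 'd'.
    foreach (i, ref c; name)
        c = cast(dchar)('a' + i);

    // Slice the static array and convert dchar[] to string (UTF-8).
    string s = to!string(name[]);
    assert(s == "abcd");

    // Something that is already a string comes back as-is, without a copy,
    // whereas idup always allocates a new one.
    string t = "already a string";
    assert(to!string(t) is t);
    assert(t.idup !is t);
}
```

The memory for the result still has to be allocated, but the scratch buffer 
used to build each name does not.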

> > > 4. Is there a kind of map(), or a syntax like list comprehension, to
> > > generate array content from a formula? (This would here replace both
> > > loops.)
> > 
> > Not that I'm aware of. Though, if you could define a range (IIRC it would
> > need to be an input range) which generates the next element in the array
> > when popFront() is called, then you could use std.array.array() to
> > create an array from such a range.
> 
> Right, later I'll see what D ranges are (only read evocation of them as of
> now). I suspect they are more or less what is often called iterators in
> other languages (or cursors in Eiffel).

Ranges are _not_ iterators. They're similar, and in their most basic form, 
they're a pair of iterators, but they're far more powerful than iterators. This 
article would be a good place to start: 
http://www.informit.com/articles/article.aspx?p=1407357
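
As a rough sketch of generating array contents from a formula with ranges: this 
assumes the map, iota, and array functions from std.algorithm, std.range, and 
std.array as they exist in current Phobos, which may differ from what your 
installed version provides. randomName and randomNames are hypothetical helper 
names.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : to;
import std.random : uniform;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: one name of `length` random lowercase letters.
string randomName(size_t length)
{
    return iota(length)                                // 0, 1, ..., length - 1
        .map!(i => cast(dchar)('a' + uniform(0, 26)))  // one letter per index
        .array                                         // evaluate into a dchar[]
        .to!string;                                    // convert to UTF-8
}

// Hypothetical helper: `count` names, each of the given length.
string[] randomNames(size_t count, size_t length)
{
    return iota(count)
        .map!(i => randomName(length))
        .array;
}

void main()
{
    writeln(randomNames(3, 6)); // three random six-letter names
}
```

The map calls are lazy; nothing is generated until array() walks the range, 
which is what replaces the two explicit loops.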

> > Also, as a side note, I wouldn't advise using an alias for char[] if you
> > intend other people to be reading your code. It's just going to confuse
> > people.
> 
> Right. I agree with you for a general alias (like my Text). But it's also
> considered good practice (maybe not in the D community) to give specific
> type names to specific meanings (interface), even when the type itself is
> not changed (implementation). This makes for better self-commenting code.
> For instance, "alias dchar[] ProductCode" (if computed) or "alias dstring
> ProductCode" (if read).

Generally, I think that that's frowned upon. You can do it, and it is done some 
of the time, but generally, the prevailing wisdom is that using aliases like 
this just leads to confusion because people don't know what the type is and 
assume that it's a struct or a class and have to look it up to figure out what's 
going on. If you have a long, nasty template type or somesuch, then such an 
alias could be quite useful, to be sure, but I think that you'll find that 
something like alias char[] Text; will generally be frowned upon in the D 
community.
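
Part of the reason is that an alias does not create a new type; it is only 
another name for the same one. A small sketch (ProductCode is the example name 
from your message, and codeLength is a made-up function):

```d
alias ProductCode = dstring; // same type under another name

// Hypothetical function that knows nothing about ProductCode.
size_t codeLength(dstring code)
{
    return code.length;
}

void main()
{
    static assert(is(ProductCode == dstring));

    ProductCode pc = "AB-123"d;
    assert(codeLength(pc) == 6); // accepted without any conversion
}
```

So the alias documents intent, but the compiler won't stop anyone from passing 
an arbitrary dstring where a ProductCode is expected.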

As for dchar not being able to hold all code points in a single code unit, it's 
UTF-32 with whatever comes with that, and it was my understanding that it can 
hold all characters in a single code unit. Now, even if that isn't true, the 
best way to handle individual characters in strings is to use dchar and dstring. 
Typically, however, most code in D is going to treat a string as a single entity 
and not attempt to manipulate individual characters. If you'll notice, even 
something like std.string.indexOf() takes a string, not a character (be it char, 
wchar, or dchar). So, you don't generally have to worry about how many code 
units there are per code point. You just manipulate strings.
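
For the cases where it does matter, a small sketch of the difference between 
code units and code points (countCodePoints is a hypothetical helper written 
for this example):

```d
import std.conv : to;

// Hypothetical helper: counts code points by letting foreach decode UTF-8.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s) // decodes one code point per iteration
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo";   // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);      // length counts code units
    assert(countCodePoints(s) == 5);

    dstring d = to!dstring(s);  // UTF-32: one code unit per code point
    assert(d.length == 5);
    assert(d[1] == '\u00E9');
}
```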

- Jonathan M Davis