First Impressions

Anders F Björklund afb at algonet.se
Fri Sep 29 00:48:06 PDT 2006


Geoff Carlton wrote:

> A simple alias of char[] to string would simplify the first glance code.
>   string x;    // yep, a string
>   main (string[]) // an array of strings
>   string[string] m; // map of string to string
> 
> I believe single functions get pulled in as member functions?  e.g.
> find(string) can be used as string.find()?  If so, it means that all the
> string functionality can be added and then used naturally as member
> functions on this "string" (which is really just the plain old char[] in
> disguise).

The problem with "char[]" is that it hides the fact that "char" is UTF-8
while at the same time exposing the fact that it is stored as an array.

You can "improve" readability with aliases, say by declaring
utf8_t -> char and string -> utf8_t[], but you still need to understand
Unicode and arrays in order to use it outside of the provided methods.
I think "hides the implementation" was the biggest argument against it?

http://www.prowiki.org/wiki4d/wiki.cgi?UnicodeIssues
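
To make the point about "char" being UTF-8 concrete, here is a minimal
sketch. It assumes present-day D and Phobos (std.utf), and the helper
names codeUnits and codePoints are made up for illustration. It shows
that .length on a char[] counts UTF-8 code units, not characters:

```d
import std.stdio : writeln;
import std.utf : toUTF32;

// Number of UTF-8 code units: what .length reports for a char[].
// (Hypothetical helper, only here to name the concept.)
size_t codeUnits(const(char)[] s)
{
    return s.length;
}

// Number of code points, found by decoding to UTF-32 first.
// (Hypothetical helper; simple rather than efficient.)
size_t codePoints(const(char)[] s)
{
    return toUTF32(s).length;
}

void main()
{
    // U+00F6 (o with diaeresis) takes two code units in UTF-8.
    const(char)[] name = "Bj\u00F6rklund";
    writeln(codeUnits(name));  // 10
    writeln(codePoints(name)); // 9
}
```

An alias changes the name of the type, but indexing and slicing still
work on code units, so the two counts above still have to be understood.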

> This is a small thing, but I think it would help in terms of the mindset
> of strings being a first class primitive, and clear up simple "hello
> world" examples at the same time.  Put simply, every modern language has
> a first class string primitive type, except D - at least in terms of
> nomenclature.

I made the big mistake of thinking it would be a good thing to be able to
switch between "ANSI" and "UNICODE" builds (of wxD), and so did it like:

version(UNICODE)
     alias wchar_t[] string; // wchar[] on Windows, dchar[] on Unix
else // version(ANSI)
     alias char[] string;

Still trying to sort out all the code problems with that idea, as there
is a ton of toUTF8 and other conversions to make strings work together.
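
For readers who have not seen them, the conversions in question look
roughly like the sketch below. This assumes present-day D, where
string, wstring and dstring are built-in aliases, and uses std.utf from
Phobos; roundTrips is a hypothetical helper and this is not code taken
from wxD.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

// Hypothetical helper: converts to UTF-16 and UTF-32 and back again,
// and reports whether the original UTF-8 text survives both trips.
bool roundTrips(string s)
{
    wstring utf16 = toUTF16(s); // what a 16-bit wchar_t API expects
    dstring utf32 = toUTF32(s); // what a 32-bit wchar_t API expects
    return toUTF8(utf16) == s && toUTF8(utf32) == s;
}

void main()
{
    string utf8 = "na\u00EFve"; // 5 code points, 6 UTF-8 code units
    assert(utf8.length == 6);
    assert(toUTF16(utf8).length == 5);
    assert(toUTF32(utf8).length == 5);
    assert(roundTrips(utf8));
}
```

Every place where two differently aliased "string" types meet needs a
call like one of these, which is where the clutter comes from.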


In retrospect it would have been much easier to stick with char[] and
do the conversion from UTF-8 to the local encoding on the C++ side
(since there were no guarantees that the "char" and "wchar_t" types in
C++ used UTF encodings, even if they did so in Unix/GTK+ for instance).
Avoiding the (minor) performance cost of the UTF-8 <-> UTF-32
conversions was not worth the hassle of handling it on the D side, IMHO.

So I agree with the "alias char[] string;" and the string[string] args.
It's going to be used as wx.common.string, for instance, in the wxD library.

--anders


