georg.wrede at nospam.org
Tue Oct 3 02:44:53 PDT 2006
Kevin Bealer wrote:
> Georg Wrede wrote:
>> Walter Bright wrote:
>>> True, and some have called for renaming char to utf8. While that
>>> would be technically more correct (as toUTF would be, too), it just
>>> looks awful.
>> Let's just say it would be a first step in lessening the confusion
>> _we_ create in newcomers' heads.
> I would kind of agree with this, but I think it's a two-edged knife.
> If we say 'char[]' then users don't know it's a string until they read
> the 'why D arrays are great' page (which they should read, but...)
> If we say 'string' then we hide the fact that [] can be applied and that
> other array-like operations can work.
> For instance, from a Java perspective:
> char[] : Users don't know that it's "String"; users see it as low-level.
> Some will try to write things like 'find()' by hand since they
> will figure arrays are low level and not expect this to exist.
> string : Users will think it's immutable, special; they will ask "how do
> I get one of the characters out of a string", "how do I convert
> string to char[]?", and other things that would be obvious
> without the alias.
Well, with string, folks would at least be inclined to search for the
library function to do it.
Overall, having string instead of char[] should result in folks learning
and doing more with D _before_ they get tangled with UTF issues. (I
guess getting tangled with UTF is unavoidable.) But the later folks
stumble on this, the better they can handle it. If it happens too
soon, then they will just run away from D.
But substituting string for char[] in D is not enough. More than half
the issue is the wording in the docs.
Another thing intimately connected with this is whether we should have
char or utf8 (string or no string, this is an important thing anyway).
I understand that "char" is one of the words that a seasoned
programmer's fingers know by heart. So it would feel simply disgusting
to have to learn (and bother) to write "utf8" which I admit is a lot
more work to type. (Seriously.)
Now, "string" is easy for the fingers, and then you get to skip "[]",
which makes it all a little more palatable.
Having string would let us have the underlying type be utf8, which
really emphasizes and calls your attention to the fact that it's not
byte-by-byte stuff we have there.
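
To make the "not byte-by-byte" point concrete, here is a minimal sketch. It assumes a present-day D compiler, where string is an alias for immutable(char)[], and a UTF-8 encoded source file. The helper countCodePoints is a hypothetical name made up for this illustration, not a library function.

```d
import std.stdio;

// Hypothetical helper: counts code points by letting foreach decode
// the UTF-8 code units of the array into dchar values.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "häh";             // 'ä' occupies two UTF-8 code units

    writeln(s.length);            // 4: length counts code units, not characters
    writeln(countCodePoints(s));  // 3: decoded code points
    writeln(s[0 .. 1]);           // h: slicing works as on any array, on code units
}
```

Indexing and slicing stay ordinary array operations, which is the part a "string" name tends to hide, while the length mismatch shows why the element type is a UTF-8 code unit rather than a character.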