toString vs. toUtf8

Sun Dec 16 12:41:17 PST 2007

Sean Kelly wrote:
>
> I don't suppose there is anyone who does a lot of internationalization
> programming who can comment on the utility of one convention vs. the
> other?  I would love to hear some more practical concerns regarding the
> naming convention for these functions.

If D specified language-wide (and at least optionally
*enforced*) that the various "character" array types are
really strings in a known encoding, not just arrays of the
underlying storage types, no naming convention would be
needed to convey the meaning.  The simpler name could then be
used safely, because the type system would imply the encoding.

(I can't help but think that this is one more reason why
string types should *not* be built-in arrays, even if they
are known to the compiler, but I think my chances of
persuading Walter that string==array is a mistake are
three quarters of ten percent of none at all.)

In the absence of a language-enforced/mandated encoding,
it's up to the library to force programmers to consider
these issues; in that case, names making the encoding
clear (even names as ugly as toUtf8) are better than a
more readable, more generic but less intention-conveying
name like toString.
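
To illustrate the point about names, here is a minimal sketch in
C++ (chosen only for illustration; it is not D library code).
The function name latin1ToUtf8 is hypothetical.  Both the input
and the result are plain byte strings, so the name is the only
place the encodings are recorded:

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Nothing in the types says which encoding a std::string holds,
// so the function name has to carry that information.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII bytes are identical in Latin-1 and UTF-8.
            utf8.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF take two bytes in UTF-8.
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

Had this been called toString, a caller could not tell from the
call site which encoding goes in or comes out.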

Most of the code I see (in C, C++, Java and more) is far
too sloppy about knowing which encoding is used for a
given string.  Unicode is now mature enough to be a sensible
default encoding for programming languages.  Ideally
I'd make the encoding something akin to a template parameter,
so that the compiler's type-checking could help out -- but
I digress into language design (as is inevitable when high
level facilities like strings are made part of the language
rather than being "just" standard library features).
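
A rough sketch of "encoding as something akin to a template
parameter", again in C++ purely for illustration.  Every name
here (Text, Latin1, Utf8, convert, countCodePoints) is made up
for the example and is not taken from any D or C++ library:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical encoding tags; they exist only to distinguish types.
struct Latin1 {};
struct Utf8 {};

// A byte string whose encoding is part of its type.
template <typename Encoding>
class Text {
public:
    explicit Text(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// A generic name is safe here: the parameter and return types
// state the source and target encodings.
Text<Utf8> convert(const Text<Latin1>& in) {
    std::string out;
    for (unsigned char c : in.bytes()) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return Text<Utf8>(std::move(out));
}

// Accepts only UTF-8 text; counts bytes that are not
// continuation bytes (10xxxxxx).
std::size_t countCodePoints(const Text<Utf8>& t) {
    std::size_t n = 0;
    for (unsigned char c : t.bytes()) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Mixing encodings is rejected at compile time: there is no
// implicit conversion between differently tagged strings.
static_assert(!std::is_convertible<Text<Latin1>, Text<Utf8>>::value,
              "Latin-1 text must be converted explicitly");
```

With this in place, passing a Text<Latin1> to countCodePoints
does not compile, so the type checker catches the mistake that
a name like toUtf8 can only warn about.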

-- James