better string

Jonathan M Davis via Digitalmars-d digitalmars-d at puremagic.com
Wed Jun 7 17:56:39 PDT 2017


On Wednesday, June 07, 2017 10:58:06 Mike B Johnson via Digitalmars-d wrote:
> Why not alias string so that one can easily switch from the old
> string or wstring, etc?
>
> e.g., rename string internally to sstring or whatever.
>
> then globally define
>
> alias string = sstring;
>
> Which can be over realiased to wstring to affect the whole program
>
> alias string = wstring;
>
> Or use a command line to set it or whatever makes you happy.
>
> I'm in the progress of converting a large source code database to
> use the above technique so we can move to using wstring... it is
> not fun. Most code that works with a string should with any
> string encoding, so it shouldn't matter. Making D string
> agnostic(after all, the only main different in 99% of programs is
> the space they take up).
>
> If you are worried about it causing subtle bugs, then don't...
> because those same bugs would occur if one manually had to switch.
>
> By designing techniques to use strings that are agnostic of there
> internal representation should save a lot of headache. For those
> few cases that it matters, simple static analysis works fine.

The official solution for handling multiple string types is to templatize
code and operate on ranges of characters.
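
A minimal sketch of what templatizing on ranges of characters can look like.
countWhitespace is a hypothetical helper named only for this illustration;
the constraint and the use of std.utf.byDchar are one reasonable way to do
it, not the only one:

```d
import std.range.primitives : ElementEncodingType, isInputRange;
import std.traits : isSomeChar;
import std.uni : isWhite;
import std.utf : byDchar;

// Hypothetical helper: counts whitespace code points in any range of
// characters, whatever its code unit type (char, wchar, or dchar).
size_t countWhitespace(R)(R chars)
    if (isInputRange!R && isSomeChar!(ElementEncodingType!R))
{
    size_t count = 0;
    // byDchar decodes the input to code points, so the loop body does not
    // care which encoding the caller used.
    foreach (dchar c; chars.byDchar)
    {
        if (isWhite(c))
            ++count;
    }
    return count;
}

void main()
{
    assert(countWhitespace("one two three") == 2);   // string  (UTF-8)
    assert(countWhitespace("one two three"w) == 2);  // wstring (UTF-16)
    assert(countWhitespace("one two three"d) == 2);  // dstring (UTF-32)
}
```

The same function body serves all three string types, which is the point of
templatizing rather than swapping an alias.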

Regardless, string is just an alias (for immutable(char)[]). All of the
problems that you're running into relate to the fact that all built-in D
facilities use UTF-8 when they have to choose a character type. Most would
agree that if you have to pick, UTF-8 is the better choice. And it doesn't
make sense for something
like .stringof or toString to vary in string type, because D doesn't
overload based on return type, and making those change based on a compiler
flag would make D libraries incompatible with one another if they're not
built exactly the same way. In addition, we'd get yet more problems akin to
what happens with size_t when someone always builds their code on 32-bit or
always on 64-bit and never on the other. Not many types in D vary based on
platform, but the ones that do tend to result in bugs due to folks not
building and testing their code on enough platforms.
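
To make the alias point concrete, here is a small sketch using compile-time
checks only. Nothing in it is specific to any one program; it just states
what the built-in aliases and facilities resolve to:

```d
// string, wstring, and dstring are plain aliases for arrays of immutable
// code units.
static assert(is(string == immutable(char)[]));
static assert(is(wstring == immutable(wchar)[]));
static assert(is(dstring == immutable(dchar)[]));

// Built-in facilities commit to string (UTF-8) as their result type.
static assert(is(typeof(int.stringof) == string));
static assert(is(typeof((new Object).toString()) == string));

void main()
{
    assert(int.stringof == "int");
}
```

Because the result type of .stringof and toString is fixed, code that wants
another encoding has to convert explicitly.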

In D, it is generally considered best practice to use UTF-8 everywhere in
your code except in places where you need to use UTF-16 or UTF-32. For a lot
of programs, that means using UTF-8 everywhere and letting the standard
library functions handle system APIs for things like file access, since
Windows uses UTF-16 for many of its APIs. If you're using the Windows API
directly, that then means doing the conversion yourself with functions like
toUTFz, but most programs don't have to worry about that, and it's still
considered best practice for those that do to convert to UTF-16 when they
have to but to use UTF-8 as much as possible.
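
A rough sketch of converting at the boundary with std.utf.toUTFz. toWideZ is
a hypothetical wrapper named for this example, and no actual Windows call is
made so that the snippet runs on any platform:

```d
import std.utf : toUTFz;

// Hypothetical wrapper: keeps UTF-8 inside the program and produces a
// zero-terminated UTF-16 string only at the point where an API needs one.
const(wchar)* toWideZ(string s)
{
    return toUTFz!(const(wchar)*)(s);
}

void main()
{
    const(wchar)* p = toWideZ("hi");
    assert(p[0] == 'h' && p[1] == 'i');
    assert(p[2] == 0); // terminating zero added by toUTFz
    // On Windows, p could now be handed to a wide-character (W) API.
}
```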

If you want to use UTF-16 everywhere throughout your program, then you
certainly can, and many of the standard library facilities will work just
fine that way, because they're templatized and deal with the differences in
character types, but the language and runtime use UTF-8 when they have to
make a choice, and most any library you're going to find for D is going to
use UTF-8 in its API when it's not templated code. I don't think that you're
going to find much support for the idea that you can change all of the
string types in a program with a compiler switch.

D provides solid facilities for converting between different UTF character
encodings, and templates allow you to write code that is encoding-agnostic,
but doing something like Windows' TCHAR is a whole other kettle of fish.
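
As an illustration of the conversion facilities, a short sketch using
std.conv.to (the sample text is arbitrary):

```d
import std.conv : to;

void main()
{
    string  s = "h\u00E9llo";   // UTF-8
    wstring w = s.to!wstring;   // UTF-16
    dstring d = s.to!dstring;   // UTF-32

    // .length counts code units, so it differs by encoding:
    assert(s.length == 6);      // U+00E9 takes two UTF-8 code units
    assert(w.length == 5);
    assert(d.length == 5);

    // Converting back gives the original text.
    assert(w.to!string == s);
    assert(d.to!string == s);
}
```

The differing lengths are a reminder that code indexing by code unit is not
automatically encoding-agnostic, even though the text round-trips cleanly.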

D's general approach is to make it so that the types do not vary from
platform to platform. There are a few cases where it's done to get at the
full address space (size_t) or to get full access to the hardware's
capabilities (real) - or simply because there is no way around it (e.g.
pointers are going to be 32 bits on 32-bit systems and 64 bits on 64-bit
systems) - but in general, the idea has been to make the types vary based on
the platform as little as reasonably possible, and nowhere do the built-in
types vary based on compiler flags. And I would not expect that to change.
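
A small sketch of that distinction, assuming a 32-bit or 64-bit target of
the kind D compilers normally support. The fixed sizes come from the
language; the last part depends on the platform:

```d
// Fixed by the language on every platform:
static assert(int.sizeof == 4);
static assert(long.sizeof == 8);
static assert(char.sizeof == 1 && wchar.sizeof == 2 && dchar.sizeof == 4);

// Varies with the target platform (assumption: only 32-bit and 64-bit
// targets are considered here):
version (D_LP64)
    static assert(size_t.sizeof == 8 && (void*).sizeof == 8);
else
    static assert(size_t.sizeof == 4 && (void*).sizeof == 4);

void main()
{
    import std.stdio : writefln;

    // The printed values depend on the platform and hardware.
    writefln("size_t: %s bytes, real: %s bytes", size_t.sizeof, real.sizeof);
}
```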

But if you feel strongly about it, you can certainly create a DIP and try to
get your proposed changes into the language:

https://github.com/dlang/DIPs

- Jonathan M Davis


