Auto-casting in range based functions?

Jonathan M Davis jmdavisProg at gmx.com
Sun May 13 13:41:22 PDT 2012


On Sunday, May 13, 2012 19:49:00 Andrew Stanton wrote:
> I have been playing around with D as a scripting tool and have
> been running into the following issue:
> 
> -----------------------------------
> import std.algorithm;
> 
> struct Delim {
>      char delim;
>      this(char d) {
>          delim = d;
>      }
> }
> 
> void main() {
>      char[] d = ['a', 'b', 'c'];
>      auto delims = map!Delim(d);
> }
> 
> /*
> Compiling gives me the following error:
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error:
> constructor test.Delim.this (char d) is not callable using
> argument types (dchar)
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot
> implicitly convert expression ('\U0000ffff') of type dchar to char
> 
> */
> 
> -----------------------------------
> 
> As someone who most of the time doesn't need to handle unicode,
> is there a way I can convince these functions to not upcast char
> to dchar?  I can't think of a way to make the code more explicit
> in its typing.

_All_ string types are considered ranges of dchar and treated as such. That 
means that narrow strings (i.e. arrays of char or wchar) are not random-access 
ranges and have no length property as far as range-based functions are 
concerned. So, you can _never_ have char[] treated as a range of char by any 
Phobos functions. char[] is UTF-8 by definition, and range-based functions in 
Phobos operate on code points, not code units.
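
To make that concrete, here is a minimal sketch of what the range traits 
report for narrow strings. It assumes a Phobos where std.range exposes 
ElementType, hasLength, isRandomAccessRange and walkLength; codePoints is a 
hypothetical helper written only for this illustration.

```d
import std.range : ElementType, hasLength, isRandomAccessRange, walkLength;
import std.stdio : writeln;

// Range traits see narrow strings as ranges of dchar, without length
// and without random access.
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(wchar[]) == dchar));
static assert(!hasLength!(char[]));
static assert(!isRandomAccessRange!(char[]));

// dchar[] already holds one code point per element, so it keeps both.
static assert(hasLength!(dchar[]));
static assert(isRandomAccessRange!(dchar[]));

// Hypothetical helper: counts code points by walking the range.
size_t codePoints(const(char)[] s)
{
    return s.walkLength;
}

void main()
{
    string s = "h\u00E9llo";  // U+00E9 takes two UTF-8 code units
    writeln(s.length);        // array length, counted in code units
    writeln(codePoints(s));   // range length, counted in code points
}
```

The array's own .length still exists, of course. It is only the range 
interface that hides it, because a count of code units is not a count of 
elements once the elements are code points.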

If you want a char[] to be treated as a range of individual code units rather 
than code points, then you're going to have to use ubyte[] instead, e.g.

char[] d = ['a', 'b', 'c'];
auto delims = map!Delim(cast(ubyte[])d);
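
As a fuller sketch of the same idea, the version below passes each byte 
through an explicit cast back to char, so it does not depend on any implicit 
ubyte-to-char conversion. toDelims is a hypothetical helper name, and Delim 
is the struct from the original question.

```d
import std.algorithm : map;
import std.range : ElementType;

struct Delim
{
    char delim;

    this(char d)
    {
        delim = d;
    }
}

// Hypothetical helper: treats the array as raw bytes so that no
// decoding to dchar takes place.
auto toDelims(char[] d)
{
    auto bytes = cast(ubyte[]) d;
    static assert(is(ElementType!(typeof(bytes)) == ubyte));
    return bytes.map!(b => Delim(cast(char) b));
}

void main()
{
    char[] d = ['a', 'b', 'c'];
    auto delims = toDelims(d);

    // ubyte[] is a random-access range with length, and map preserves that.
    assert(delims.length == 3);
    assert(delims[1].delim == 'b');
}
```

Newer Phobos releases also provide std.string.representation and 
std.utf.byCodeUnit, which give a code-unit view without the explicit cast.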

Now, personally, I would argue that you should just use dchar, not char, 
because regardless of what you are or aren't doing with Unicode right now, the 
odds are that you'll end up processing Unicode at some point, and if you're in 
the habit of using char, you're going to get all kinds of bugs. So, if you 
just did

import std.algorithm;

struct Delim
{
    dchar delim;

    this(dchar d)
    {
        delim = d;
    }
}

void main()
{
    char[] d = ['a', 'b', 'c'];
    auto delims = map!Delim(d);
}

then it should work just fine. And if you really need a char instead of dchar 
for some reason, you can always just use std.conv.to - to!char(value) - which 
will then throw if you're trying to convert a code point that won't fit in a 
char.
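
A minimal sketch of that conversion, assuming std.conv.to reports the failure 
with a ConvException (or a subclass of it); narrow is a hypothetical helper 
name.

```d
import std.conv : to, ConvException;
import std.exception : assertThrown;

// Hypothetical helper: checked narrowing from a code point to a char.
char narrow(dchar c)
{
    return to!char(c);
}

void main()
{
    // An ASCII code point fits in a single char.
    assert(narrow('a') == 'a');

    // U+20AC (the euro sign) does not fit in a char, so to!char throws.
    assertThrown!ConvException(narrow('\u20AC'));
}
```

As far as I know, the check is purely on the numeric value, so a code point 
between U+0080 and U+00FF converts without an exception even though the 
resulting char is not valid UTF-8 on its own.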

In general, any code which declares a standalone variable of type char or 
wchar, rather than using those types as elements of an array, is a red flag 
which indicates a likely bug or bad design. In specific circumstances, you may 
need to do so, but in general, it's just asking for bugs. And you're going to 
have to be fighting Phobos all the time if you try to use ranges of code units 
rather than ranges of code points.
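
To illustrate the kind of bug meant here, the sketch below shows what a lone 
char ends up holding when the text is not pure ASCII. firstUnit and 
firstPoint are hypothetical helper names.

```d
import std.range : front;

// Hypothetical helper: the first code unit, stored in a lone char.
char firstUnit(string s)
{
    return s[0];
}

// Hypothetical helper: the first code point, as the range interface sees it.
dchar firstPoint(string s)
{
    return s.front;
}

void main()
{
    string s = "\u00E9t\u00E9";  // U+00E9 is encoded as 0xC3 0xA9 in UTF-8

    // The lone char holds only the lead byte of a two-byte sequence.
    assert(firstUnit(s) == 0xC3);

    // The dchar holds the whole character.
    assert(firstPoint(s) == '\u00E9');
}
```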

- Jonathan M Davis

