Auto-casting in range based functions?
Jonathan M Davis
jmdavisProg at gmx.com
Sun May 13 13:41:22 PDT 2012
On Sunday, May 13, 2012 19:49:00 Andrew Stanton wrote:
> I have been playing around with D as a scripting tool and have
> been running into the following issue:
>
> -----------------------------------
> import std.algorithm;
>
> struct Delim {
> char delim;
> this(char d) {
> delim = d;
> }
> }
>
> void main() {
> char[] d = ['a', 'b', 'c'];
> auto delims = map!Delim(d);
> }
>
> /*
> Compiling gives me the following error:
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error:
> constructor test.Delim.this (char d) is not callable using
> argument types (dchar)
> /usr/include/d/dmd/phobos/std/algorithm.d(382): Error: cannot
> implicitly convert expression ('\U0000ffff') of type dchar to char
>
> */
>
> -----------------------------------
>
> As someone who most of the time doesn't need to handle unicode,
> is there a way I can convince these functions to not upcast char
> to dchar? I can't think of a way to make the code more explicit
> in its typing.
_All_ string types are considered ranges of dchar and treated as such. That
means that narrow strings (e.g. arrays of char or wchar) are not random-access
ranges and have no length property as far as range-based functions are
concerned. So, you can _never_ have char[] treated as a range of char by any
Phobos functions. char[] is UTF-8 by definition, and range-based functions in
Phobos operate on code points, not code units.
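To make that concrete, here is a small sketch using the range traits from std.range. countCodePoints is a hypothetical helper named only for this illustration; the traits themselves are the standard Phobos ones.

```d
import std.range : ElementType, hasLength, isRandomAccessRange, walkLength;

// Phobos range traits see char[] as a range of dchar (code points).
static assert(is(ElementType!(char[]) == dchar));
static assert(!hasLength!(char[]));
static assert(!isRandomAccessRange!(char[]));

// An array of ubyte, by contrast, is an ordinary random-access range.
static assert(is(ElementType!(ubyte[]) == ubyte));
static assert(hasLength!(ubyte[]));
static assert(isRandomAccessRange!(ubyte[]));

/// Hypothetical helper: counts code points by walking the range.
size_t countCodePoints(const(char)[] s)
{
    return s.walkLength;
}

void main()
{
    auto s = "héllo";
    assert(s.length == 6);           // array length counts UTF-8 code units
    assert(countCodePoints(s) == 5); // range iteration counts code points
}
```

The array itself still has a length property, of course. It is only the range-based view of it that does not.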
If you want a char[] to be treated as a range of code units rather than code
points, then you're going to have to cast it to ubyte[] instead, e.g.

    char[] d = ['a', 'b', 'c'];
    auto delims = map!Delim(cast(ubyte[])d);
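A fuller sketch of the same idea, assuming a Phobos release that provides std.string.representation (it does the char[] to ubyte[] reinterpretation without a hand-written cast). toDelims is a hypothetical helper, and the explicit cast(char) in the lambda is there so that the example does not depend on any implicit ubyte to char conversion.

```d
import std.algorithm : map;
import std.string : representation;

struct Delim
{
    char delim;

    this(char d)
    {
        delim = d;
    }
}

/// Hypothetical helper: maps each UTF-8 code unit of d to a Delim.
auto toDelims(char[] d)
{
    // representation() views the char[] as ubyte[] without copying,
    // so map sees a range of ubyte rather than a range of dchar.
    return d.representation.map!(b => Delim(cast(char) b));
}

void main()
{
    char[] d = ['a', 'b', 'c'];
    auto delims = toDelims(d);

    assert(delims.length == 3);     // ubyte[] keeps its length
    assert(delims[1].delim == 'b'); // and its random access
}
```

Note that this only makes sense if every element really is a single code unit that you want to treat on its own, which is true for ASCII and not for anything else.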
Now, personally, I would argue that you should just use dchar, not char,
because regardless of what you are or aren't doing with Unicode right now, the
odds are that you'll end up processing Unicode at some point, and if you're in
the habit of using char, you're going to get all kinds of bugs. So, if you
just did
    struct Delim
    {
        dchar delim;

        this(dchar d)
        {
            delim = d;
        }
    }

    void main()
    {
        char[] d = ['a', 'b', 'c'];
        auto delims = map!Delim(d);
    }
then it should work just fine. And if you really need a char instead of dchar
for some reason, you can always just use std.conv.to - to!char(value) - which
will then throw if you're trying to convert a code point that won't fit in a
char.
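As a rough sketch of what that looks like in practice (tryNarrow is a hypothetical helper, and this assumes the failure is reported as a std.conv.ConvException or a subclass of it):

```d
import std.conv : to, ConvException;

/// Hypothetical helper: narrows a code point to char.
/// Returns false instead of throwing when the value does not fit.
bool tryNarrow(dchar c, out char result)
{
    try
    {
        result = to!char(c);
        return true;
    }
    catch (ConvException)
    {
        return false;
    }
}

void main()
{
    char c;
    assert(tryNarrow('a', c) && c == 'a'); // ASCII fits in a char
    assert(!tryNarrow('\u20AC', c));       // U+20AC (euro sign) does not
}
```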
In general, any code which declares a standalone variable of type char or
wchar, rather than using those types only as array elements, is a red flag
which indicates a likely bug or bad design. In specific circumstances, you may
need to do so, but in general, it's just asking for bugs. And you're going to
be fighting Phobos all the time if you try to use ranges of code units rather
than ranges of code points.
- Jonathan M Davis