[Issue 18105] std.conv.parse!wchar cannot take a string

d-bugmail at puremagic.com
Wed Dec 20 13:56:54 UTC 2017


https://issues.dlang.org/show_bug.cgi?id=18105

--- Comment #6 from Steven Schveighoffer <schveiguy at yahoo.com> ---
(In reply to dechcaudron+dlang.issue.tracking from comment #5)
> While I agree the solution is not trivial,
> if we do allow parse!char(string), we should allow parse!wchar(string) in
> the same fashion.

The reason you can parse a string into chars is that it's actually possible: I
can consume one char off a string and return it, no problem. You can't do the
same with wchar, because there's no way to advance the string "partially" into
a code point.
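
For example, a minimal sketch of that difference (the wrapping main and imports
are just there to make it self-contained):

import std.conv : parse;

void main()
{
    string s = "hello";
    char c = parse!char(s);   // consumes exactly one UTF-8 code unit
    assert(c == 'h');
    assert(s == "ello");      // the range advanced by exactly one char
    // There is no wchar equivalent: the chars making up the next UTF-16
    // code unit vary in number, so the string can't be left pointing
    // "partway into" a code point.
}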

> When using 'char', there is no guarantee that the returned
> value will be a valid UTF-8 code point.

No, but a char is not necessarily a UTF code point; it's a UTF-8 code unit.
There is no direct translation of N chars to 1 wchar, so there's no way to
advance the range properly.
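
To put numbers on that (a small sketch using std.utf.codeLength; the specific
characters are just illustrative):

import std.utf : codeLength;

void main()
{
    // Code units per code point in UTF-8 (char) vs. UTF-16 (wchar):
    assert(codeLength!char('h') == 1 && codeLength!wchar('h') == 1);
    assert(codeLength!char('é') == 2 && codeLength!wchar('é') == 1);
    assert(codeLength!char('\U0001F600') == 4 && codeLength!wchar('\U0001F600') == 2);
}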

> It just gets the next 'char' code
> unit from the string, and so should 'wchar' IMHO. That is, taking the next 2
> 'char' from the string.

It is NOT the same thing to take 2 chars and stuff them into a wchar. This is
not only incorrect, it's pretty much useless.
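
A quick sketch of why ('é' is just a convenient character that takes two UTF-8
code units):

void main()
{
    string s = "é";   // stored as the 2 UTF-8 code units 0xC3, 0xA9
    // "Stuffing" those two chars into one wchar yields 0xC3A9, which is not
    // the UTF-16 encoding of 'é' (that is the single code unit 0x00E9).
    wchar stuffed = cast(wchar)((s[0] << 8) | s[1]);
    assert(stuffed == 0xC3A9);
    assert(stuffed != 'é');
}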

I'm not sure of your use case, but I think you want one of 2 things:

1. std.utf.byUTF!wchar:

foreach(c; "hello".byUTF!wchar)
{
   static assert(is(typeof(c) == wchar));
   writeln(c); // writes 'h', 'e', 'l', 'l', 'o' on separate lines.
}

This will properly encode surrogate pairs.
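
For example (a sketch using an arbitrary character outside the BMP):

import std.array : array;
import std.utf : byUTF;

void main()
{
    // U+1F600 doesn't fit in one UTF-16 code unit, so byUTF!wchar yields
    // a surrogate pair:
    auto units = "\U0001F600".byUTF!wchar.array;
    assert(units.length == 2);
    assert(units[0] >= 0xD800 && units[0] <= 0xDBFF); // high surrogate
    assert(units[1] >= 0xDC00 && units[1] <= 0xDFFF); // low surrogate
}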

-------

2. cast(ushort[]) myString;

This will look at the string in 16-bit chunks, but these aren't valid
characters.
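
For example (a sketch; the exact chunk values depend on endianness):

void main()
{
    string s = "hello!";                   // 6 code units -> 3 ushort chunks
    // const avoids accidentally writing through the immutable string data.
    auto chunks = cast(const(ushort)[]) s; // reinterprets, doesn't transcode
    assert(chunks.length == 3);
    // On a little-endian machine chunks[0] is ('e' << 8) | 'h' == 0x6568,
    // which isn't a meaningful character in any encoding.
}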

--

