numericValue for (unicode) characters
monarch_dodra
monarchdodra at gmail.com
Fri Jan 4 09:48:27 PST 2013
On Friday, 4 January 2013 at 13:18:48 UTC, Dmitry Olshansky wrote:
> 04-Jan-2013 15:58, Jonathan M Davis wrote:
>> On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:
>>> So... do we agree on
>>> ascii: int - not found => -1
>>> uni: double - not found => nan
>>
>> I'm not a fan of the ASCII version returning -1, but I don't really
>> have a better suggestion. I suppose that you could throw instead,
>> but I don't know if that's a good idea or not. It _would_ be more
>> consistent with our other conversion functions however.
>>
>> - Jonathan M Davis
>
> I find low-level stuff that throws to be overly awkward to deal
> with (not to mention performance problems).
>
> Hm... I've found a brilliant primitive, Expected!T, that could
> be of great help with the error-code vs. exceptions problem. See
> Andrei's recent talk, which went live not long ago:
>
> http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C
>
> Time to put the analogous stuff into Phobos?
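To make the error-code vs. exceptions trade-off concrete, here is a rough Python analog of the Expected idea: a result holds either a value or the exception that prevented computing it, and the caller decides at the call site whether to throw or to take a sentinel. This is a sketch, not Phobos code: the class `Expected` and its methods `from_call`, `valid`, `get` and `get_or` are hypothetical names only loosely modeled on the talk, and Python's `unicodedata.numeric` (which raises `ValueError` when a character has no numeric value) stands in for a D `numericValue`.

```python
import math
import unicodedata
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Expected(Generic[T]):
    """Hypothetical analog of Expected<T>: either a value or the exception
    that prevented computing it. Not a Phobos or stdlib API."""

    def __init__(self, value: Optional[T] = None,
                 error: Optional[Exception] = None) -> None:
        self._value = value
        self._error = error

    @classmethod
    def from_call(cls, fn: Callable[[], T]) -> "Expected[T]":
        # Run fn once, capturing either its result or the exception it raised.
        try:
            return cls(value=fn())
        except Exception as exc:
            return cls(error=exc)

    def valid(self) -> bool:
        # True when a value was produced.
        return self._error is None

    def get(self) -> T:
        # Deferred throwing: re-raise only if the caller insists on the value.
        if self._error is not None:
            raise self._error
        return self._value

    def get_or(self, default: T) -> T:
        # Error-code style access: substitute a sentinel instead of raising.
        return self._value if self._error is None else default


def numeric_value(c: str) -> Expected[float]:
    # unicodedata.numeric raises ValueError when c has no numeric value.
    return Expected.from_call(lambda: unicodedata.numeric(c))


# Usage: the same result supports both styles.
half = numeric_value("\u00bd")  # VULGAR FRACTION ONE HALF
letter = numeric_value("a")
print(half.get())                            # 0.5
print(math.isnan(letter.get_or(math.nan)))   # True
```

The point of the design is that the low-level function never decides for the caller: code that wants speed or a NaN/-1 convention uses `get_or`, and code that wants an exception uses `get`.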
I finished an implementation:
https://github.com/D-Programming-Language/phobos/pull/1052
It is not "pull ready", so we can still discuss it.
I raised a couple of issues in the pull, which I'll copy here:
//----
I did run into a couple of issues, namely that I'm not getting
100% equivalence between chars that are numeric and chars that
have a numeric value... Is this normal?
* There are a fair number of chars that have a numeric value but
aren't isNumber. I think they might be new in Unicode 6.1.0, but
I'm not sure. I decided it was best to have them return nan
instead of having inconsistent behavior.
* There are a couple of characters in tableLo that have numeric
values. These aren't considered by isNumber either. I think this
might be a bug, though.
* There are 4 "non-number numeric" characters in "CUNEIFORM
NUMERIC SIGN". These return wild values; in particular, two of
them return -1. I *think* these should actually return nan for
us, because (AFAIK) -1 is just a stand-in for invalid :/
Maybe we should just return -1 on invalid Unicode? Or maybe it's
just my input file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
It doesn't have a separate field for isNumber/numericValue, so it
is forced to write a stand-in number. Maybe these four chars
should return nan?
//----
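The "return nan when the numeric value and isNumber disagree" policy described above can be sketched in Python, using the standard `unicodedata` module as a stand-in for the std.uni tables. Assumptions: isNumber is taken to mean general category Nd, Nl or No; `uni_is_number`, `uni_numeric_value` and `numeric_but_not_number` are hypothetical names; and Python ships a different Unicode database version than 6.1.0, so the exact set of inconsistent characters it finds will differ from what the pull request sees.

```python
import math
import sys
import unicodedata


def uni_is_number(c: str) -> bool:
    # isNumber analog: general category Nd, Nl or No.
    return unicodedata.category(c).startswith("N")


def uni_numeric_value(c: str) -> float:
    """Numeric value of a single character, or NaN when it has none.

    Characters that carry a numeric value in the database but are not in a
    Number category also return NaN, so the result always agrees with
    uni_is_number.
    """
    if not uni_is_number(c):
        return math.nan
    return unicodedata.numeric(c, math.nan)


def numeric_but_not_number() -> list:
    """Code points that have a numeric value but are not isNumber.

    These are the 'inconsistent' characters that uni_numeric_value maps to NaN.
    """
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.numeric(ch, None) is not None and not uni_is_number(ch):
            found.append(ch)
    return found


if __name__ == "__main__":
    odd = numeric_but_not_number()
    print(len(odd), "characters have a numeric value but are not isNumber")
    for ch in odd[:5]:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
              f"{unicodedata.name(ch, '?')} -> {unicodedata.numeric(ch)}")
```

Running the scan against whatever table a D implementation is generated from is a cheap way to list exactly which characters the nan policy affects.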
Oh yeah, I also added isNumber to std.ascii. It feels wrong not
to have it if we have numericValue.
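In ASCII the only characters with a numeric value are '0' through '9', so an ASCII isNumber coincides with isDigit, and the "not found => -1" convention agreed on earlier in the thread is a one-liner on top of it. A minimal Python sketch; `ascii_is_number` and `ascii_numeric_value` are hypothetical names, not the std.ascii signatures.

```python
def ascii_is_number(c: str) -> bool:
    # In ASCII, only '0'..'9' have a numeric value.
    return len(c) == 1 and "0" <= c <= "9"


def ascii_numeric_value(c: str) -> int:
    # Sentinel convention from the thread: -1 means "no numeric value".
    return ord(c) - ord("0") if ascii_is_number(c) else -1


print(ascii_numeric_value("7"))  # 7
print(ascii_numeric_value("x"))  # -1
```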