numericValue for (unicode) characters

Fri Jan 4 09:48:27 PST 2013

On Friday, 4 January 2013 at 13:18:48 UTC, Dmitry Olshansky wrote:
> 04-Jan-2013 15:58, Jonathan M Davis wrote:
>> On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:
>>> So... do we agree on
>>> ascii: int - not found => -1
>>> uni: double - not found => nan
>>
>> I'm not a fan of the ASCII version returning -1, but I don't 
>> really have a
>> better suggestion. I suppose that you could throw instead, but 
>> I don't know if
>> that's a good idea or not. It _would_ be more consistent with 
>> our other
>> conversion functions however.
>>
>> - Jonathan M Davis
>
> I find low-level stuff that throws to be overly awkward to deal 
> with (not to mention performance problems).
>
> Hm... I've found a brilliant primitive, Expected!T, that could 
> be of great help with the error codes vs. exceptions problem. See 
> Andrei's recent talk that went live not long ago:
>
> http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C
>
> Time to put the analogous stuff into Phobos?

I finished an implementation:

https://github.com/D-Programming-Language/phobos/pull/1052

It is not "pull ready", so we can still discuss it.

I raised a couple of issues in the pull, which I'll copy here:

//----
I did run into a couple of issues, namely that I'm not getting 
100% equivalence between chars that are numeric and chars that 
have a numeric value... Is this normal?

* There are a fair number of chars that have a numeric value but 
aren't isNumber. I think they might be new in Unicode 6.1.0, but 
I'm not sure. I decided it was best to have them return nan 
rather than behave inconsistently.
* There are a couple of characters in tableLo that have numeric 
values. isNumber doesn't consider these either. I think this 
might be a bug, though.
* There are 4 "non-number numeric" characters in "CUNEIFORM 
NUMERIC SIGN". These return odd values, and two of them in 
particular return -1. I *think* these should actually return nan 
for us because, AFAIK, -1 is just a placeholder for invalid :/

Maybe we should just return -1 on invalid Unicode? Or maybe it's 
just my input file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
It doesn't have a separate field for isNumber/numericValue, so it 
is forced to write a placeholder number. Maybe these four chars 
should return nan?
//----
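
To make the input file question concrete, here is a minimal sketch of how the numeric value can be read out of one UnicodeData.txt line. It is not the code in the pull. The name numericField is made up for illustration, and I'm assuming the usual semicolon-separated layout, where field 2 (0-based) is the general category and field 8 is the numeric value, written either as a whole number or as a fraction like 1/2:

```d
import std.array : split;
import std.conv : to;
import std.math : isNaN;

/// Hypothetical helper (not part of Phobos or the pull):
/// numeric value of one UnicodeData.txt line, or nan if it has none.
double numericField(string line)
{
    auto fields = line.split(";");

    // Assumed layout: field 8 (0-based) holds the numeric value.
    // An empty field means the character has no numeric value.
    if (fields.length <= 8 || fields[8].length == 0)
        return double.nan;

    // The value is either a whole number ("5", "-1") or a fraction ("1/2").
    auto parts = fields[8].split("/");
    immutable numerator = parts[0].to!double;
    return parts.length == 2 ? numerator / parts[1].to!double : numerator;
}

void main()
{
    assert(numericField("0035;DIGIT FIVE;Nd;0;EN;;5;5;5;N;;;;;") == 5);
    assert(numericField("00BD;VULGAR FRACTION ONE HALF;No;0;ON;"
        ~ "<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;") == 0.5);
    assert(isNaN(numericField("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")));
}
```

Read this way, a -1 in that field comes back as an ordinary value, so mapping it to nan would have to be a deliberate special case on our side.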

Oh yeah, I also added isNumber to std.ascii. Feels wrong to not 
have it if we have numericValue.
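
For anyone who hasn't opened the pull, the convention agreed on above (int and -1 for ASCII, double and nan for Unicode) comes down to roughly the following. This is only a sketch: all three function names are hypothetical, and the switch is a stand-in for the real lookup table with just three fraction characters filled in:

```d
import std.math : isNaN;

/// Hypothetical ASCII variant: the digit's value, or -1 if c is not '0'..'9'.
int asciiNumericValue(dchar c) @safe pure nothrow
{
    if (c >= '0' && c <= '9')
        return cast(int)(c - '0');
    return -1;
}

/// Hypothetical ASCII isNumber, defined so that it always agrees
/// with asciiNumericValue.
bool asciiIsNumber(dchar c) @safe pure nothrow
{
    return asciiNumericValue(c) != -1;
}

/// Hypothetical Unicode variant: the value as a double, or nan if none.
/// The switch stands in for the real table and covers three chars only.
double uniNumericValue(dchar c) @safe pure nothrow
{
    immutable ascii = asciiNumericValue(c);
    if (ascii != -1)
        return ascii;

    switch (c)
    {
        case '\u00BC': return 0.25; // VULGAR FRACTION ONE QUARTER
        case '\u00BD': return 0.5;  // VULGAR FRACTION ONE HALF
        case '\u00BE': return 0.75; // VULGAR FRACTION THREE QUARTERS
        default:       return double.nan;
    }
}

void main()
{
    assert(asciiNumericValue('7') == 7);
    assert(asciiNumericValue('x') == -1);
    assert(asciiIsNumber('0') && !asciiIsNumber('x'));
    assert(uniNumericValue('\u00BD') == 0.5);
    assert(isNaN(uniNumericValue('x')));
}
```

The Unicode version has to return double because of the fractions, which is also why nan is the natural "not found" there, while the ASCII version can stay with int.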