numericValue for (unicode) characters

monarch_dodra monarchdodra at gmail.com
Wed Jan 2 06:48:44 PST 2013


There is an ER that would allow converting characters to numbers:
http://d.puremagic.com/issues/show_bug.cgi?id=5543

For example: '1' => 1
Or, unicode considered: 'Ⅶ' => 7

Long story short, it was decided that it wasn't std.conv.to's job 
to do this conversion, but rather, there should be a function 
called "numericValue" inside std.uni and std.ascii that would do 
this job.

What remains is to define how these functions should work. Things 
to keep in mind:
- ASCII to int should be fast.
- unicode numeric values span from -0.5 to 1.0e12.
- unicode numeric values can be fractional.
- Most unicode numeric values can be EXACTLY represented in a 
double; a few fractions, such as 1/3 ('⅓'), can only be approximated.
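
As a quick sanity check of the range quoted above, here is a minimal 
sketch. It only looks at the two endpoints (-0.5 and 1.0e12); the 
variable names are mine and purely illustrative.

```d
void main()
{
    double lowest  = -0.5;    // smallest numeric value quoted above
    double highest = 1.0e12;  // largest numeric value quoted above

    // -0.5 is a power of two, so it is stored exactly.
    assert(lowest * 2 == -1.0);

    // 1.0e12 is an integer below 2^53, so a double stores it exactly too.
    static assert(1_000_000_000_000L < (1L << double.mant_dig));
    assert(cast(long) highest == 1_000_000_000_000L);

    // A 32-bit int, on the other hand, cannot reach the upper end.
    static assert(int.max < 1_000_000_000_000L);
}
```

So an int is too small for the top of the range and cannot express 
the fractional bottom, which is what pushes std.uni towards double.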

Given these observations, I'd like to propose these:

//------------------------------
//std.ascii.numericValue
/** Given an ascii character, returns that character's
     numeric value if it is numeric ($(D isNumeric)),
     and -1 otherwise
  */
pure @safe nothrow
int numericValue(dchar c);
//------------------------------
//std.uni.numericValue
/** Given a unicode character, returns that character's
     numeric value if it is numeric ($(D isNumeric)),
     and throws an exception otherwise
  */
pure @safe
double numericValue(dchar c);
//------------------------------
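
To make the std.ascii half concrete, here is a rough sketch of how 
it could be implemented and used. The name asciiNumericValue is a 
placeholder I picked so it doesn't clash with anything; it is not 
an existing Phobos function, and I'm assuming that only '0'..'9' 
count as numeric in ASCII.

```d
// Hypothetical sketch of the proposed std.ascii.numericValue.
pure @safe nothrow
int asciiNumericValue(dchar c)
{
    // '0'..'9' are contiguous in ASCII, so one subtraction is enough.
    return (c >= '0' && c <= '9') ? cast(int)(c - '0') : -1;
}

void main()
{
    assert(asciiNumericValue('1') == 1);
    assert(asciiNumericValue('9') == 9);
    assert(asciiNumericValue('a') == -1);
    assert(asciiNumericValue('Ⅶ') == -1); // numeric in unicode, not in ASCII

    // Typical use: the -1 sentinel doubles as the "stop" condition.
    int total = 0;
    foreach (dchar c; "42abc")
    {
        immutable v = asciiNumericValue(c);
        if (v < 0)
            break;
        total = total * 10 + v;
    }
    assert(total == 42);
}
```

The caller needs only a single "v < 0" test, with no try/catch in 
the hot path.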

The rationale for this:
std.ascii: I think returning -1 as a magic number should help 
keep the code faster and less cluttered than exceptions would. 
Returning an int is the obvious choice for numbers that span -1 
to 9.

std.uni: double is the only type that can hold the whole range of 
unicode's numeric values.
This time, uni throws exceptions. This is for two reasons:
1. Choosing a magic number is difficult and error-prone. Correct 
code would have to look like: "if (std.uni.numericValue(c) > 
-0.7) {...}"
2. When dealing with unicode, an exception is probably cleaner, 
and its overhead is not as critical as with ascii.
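
For comparison, here is a sketch of what the throwing std.uni 
variant might look like from the caller's side. The function name 
uniNumericValue and its four-entry lookup are stand-ins for 
illustration only; a real implementation would need the full 
unicode tables.

```d
import std.exception : assertThrown;

// Hypothetical sketch of the proposed std.uni.numericValue.
// Only a handful of characters are covered here.
pure @safe
double uniNumericValue(dchar c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    switch (c)
    {
        case '\u2166': return 7;     // 'Ⅶ' ROMAN NUMERAL SEVEN
        case '\u00BD': return 0.5;   // '½' VULGAR FRACTION ONE HALF
        case '\u0F33': return -0.5;  // TIBETAN DIGIT HALF ZERO
        default:
            throw new Exception("character has no numeric value");
    }
}

void main()
{
    assert(uniNumericValue('1') == 1);
    assert(uniNumericValue('Ⅶ') == 7);
    assert(uniNumericValue('½') == 0.5);

    // -0.5 is a legitimate result, so a "< 0" test can't mean failure.
    assert(uniNumericValue('\u0F33') == -0.5);

    // Non-numeric input is reported through the exception instead.
    assertThrown!Exception(uniNumericValue('a'));
}
```

Because -0.5 is a valid return value, any sentinel would have to 
sit below it, which is exactly the awkward "> -0.7" test from 
point 1.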

***********************************************
Thoughts?

I wanted to get this ER moved forward. I don't think 
uni.numericValue will be finished soon, but I would like 
std.ascii's done sooner rather than later.

