numericValue for (unicode) characters

monarch_dodra monarchdodra at gmail.com
Fri Jan 4 14:48:39 PST 2013


On Friday, 4 January 2013 at 22:00:02 UTC, Dmitry Olshansky wrote:
> 05-Jan-2013 00:51, monarch_dodra wrote:
>> Anyways, those 4 CUNEIFORM aside, what do you make of the
>> entries in Lo:
>> http://unicode.org/cldr/utility/character.jsp?a=F96B
>> These appear to be numeric, but aren't inside Nd/No/Nl. They
>> should return true to isNumber, no?
>
> Hmmm. Take a look here:
> http://unicode.org/cldr/utility/properties.jsp
>
> There is a section called Numeric that has 3 properties,
> and then there is a General section.
> The General has Category which in turn has 'Number' category.
>
> Bottom line is that I believe that std.uni isXXX queries the 
> category of a symbol and not some other property. Let any 
> mishaps in between properties and general category be 
> consortium's headache.
>>
>> Maybe isNumber's "documented behavior" is wrong?
>
> Problem is I can't come up with a good description of some 
> other behavior. Maybe this one [^[:Numeric_Type=None:]]
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5E%5B%3ANumeric_Type%3DNone%3A%5D%5D&g=

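Sounds like the root of the problem is that isNumber tests the 
general category (Nd/Nl/No), which is not the same set as 
Numeric_Type in [Decimal, Digit, Numeric].

Ergo, there is no guaranteed correspondence between isNumber and 
numericValue: a character can have a numeric value and still fail 
isNumber.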
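
To make the mismatch concrete, here is a minimal sketch. It uses 
Python's standard unicodedata module purely as an illustration, not 
std.uni; the helper names is_number_by_category and 
numeric_value_or_none are made up for this example, and the exact 
results depend on the Unicode version your Python ships with.

```python
import unicodedata

def is_number_by_category(ch):
    # Hypothetical helper: true when the general category is Nd, Nl or No.
    return unicodedata.category(ch).startswith("N")

def numeric_value_or_none(ch):
    # Hypothetical helper: the character's numeric value, or None if it has none.
    return unicodedata.numeric(ch, None)

# A digit, a fraction, a Roman numeral, two CJK characters, and a letter.
samples = ["5", "\u00BD", "\u2167", "\u4E09", "\uF96B", "A"]

for ch in samples:
    print(
        "U+%04X" % ord(ch),
        unicodedata.category(ch),
        is_number_by_category(ch),
        numeric_value_or_none(ch),
    )
```

Characters such as U+4E09 are reported with category Lo, so a 
category-based test rejects them even though a numeric value is 
available.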

Feels like there is a lot missing from std.uni, but at the same 
time, Unicode is really huge.

At the very least, I think we should have a Category enum, along 
with a (getter) "category" function.
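
Roughly what I have in mind, sketched in Python rather than D since 
its unicodedata module already exposes the general category as a 
two-letter string. The Category enum and the category function 
below are hypothetical names for illustration, not an existing or 
proposed std.uni API.

```python
import unicodedata
from enum import Enum

# Hypothetical enum: one member per Unicode general category abbreviation.
Category = Enum(
    "Category",
    "Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No "
    "Pc Pd Ps Pe Pi Pf Po Sm Sc Sk So "
    "Zs Zl Zp Cc Cf Cs Co Cn",
)

def category(ch):
    # Look up the enum member by the abbreviation unicodedata reports.
    return Category[unicodedata.category(ch)]

def is_number(ch):
    # A category-based isNumber: Nd, Nl or No.
    return category(ch) in (Category.Nd, Category.Nl, Category.No)
```

With a getter like this, the isXXX queries become one-liners over 
the enum, and callers who need finer distinctions can inspect the 
category themselves.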

I was just saying to jmdavis in the pull that std.ascii had 
"isDigit", but that std.uni didn't. In truth, both also lack 
isDecimal and isNumeric.

There would just be a bit of ambiguity now between the broad 
"isNumeric", and "all the chars that have a numeric value"... :/
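
For what it's worth, the three Numeric_Type values nest: every 
Decimal character also has a digit value, and every Digit character 
also has a numeric value. A minimal sketch of that classification, 
again in Python as an illustration only; numeric_type is a 
hypothetical helper, derived from which unicodedata lookups succeed.

```python
import unicodedata

def numeric_type(ch):
    # Hypothetical helper: derive Numeric_Type from the narrowest lookup
    # that succeeds (decimal, then digit, then numeric).
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

# Plain digit, superscript two, vulgar fraction one half, a letter.
for ch in ["7", "\u00B2", "\u00BD", "A"]:
    print("U+%04X" % ord(ch), numeric_type(ch))
```

Under that reading, isDecimal, isDigit and a value-based isNumeric 
would each be a test against one level of this nesting, which is a 
different question from the category-based isNumber.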

Damn. Unicode is complicated.

Anyways, taking my weekend break.

