[Issue 5543] to!int to see a char as a single-char string

d-bugmail at puremagic.com d-bugmail at puremagic.com
Fri Dec 21 07:17:56 PST 2012


http://d.puremagic.com/issues/show_bug.cgi?id=5543


Dmitry Olshansky <dmitry.olsh at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh at gmail.com


--- Comment #5 from Dmitry Olshansky <dmitry.olsh at gmail.com> 2012-12-21 07:17:53 PST ---
>Java even implements
> one taking chars, and another taking int (dchar)

That's because Java folks originally had only 16-bit chars. Full code points
are now passed around as 'int'.

> http://msdn.microsoft.com/en-us/library/system.char.getnumericvalue.aspx
> http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html
> 
> I'd say we should just add:
> std.ascii.getNumericValue
> std.uni.getNumericValue
> (or plain numericValue)
> 

Agreed, and the name should be numericValue.

> I already wrote the ascii version (easy as pie), and support for the [Nd]
> group, using a binary search, followed by an offset from the lower bound.
> 
> [Nl] and [Po] require a straight up mapping of codepoint to value, but I'm
> still writing the parser that extracts the data from the raw UCD
> (http://www.unicode.org/Public/6.2.0/ucdxml/).
> 
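The [Nd] lookup described above works because Unicode encodes decimal digits in contiguous runs of ten, from ZERO to NINE. Below is a minimal C++ sketch of that "binary search, then offset from the lower bound" idea. It assumes a sorted table holding the DIGIT ZERO of each run. Only a hand-picked excerpt is shown; a real table would be generated from the UCD. The name ndValue is hypothetical.

```cpp
#include <algorithm>
#include <array>

// Excerpt of [Nd] run starts (each entry is a DIGIT ZERO), sorted ascending.
// A complete table would be generated from the UCD.
constexpr std::array<char32_t, 5> kNdZeros = {
    0x0030,  // DIGIT ZERO
    0x0660,  // ARABIC-INDIC DIGIT ZERO
    0x06F0,  // EXTENDED ARABIC-INDIC DIGIT ZERO
    0x0966,  // DEVANAGARI DIGIT ZERO
    0xFF10,  // FULLWIDTH DIGIT ZERO
};

// Returns 0..9 if c falls inside one of the runs above, otherwise -1.
int ndValue(char32_t c) {
    // First run start strictly greater than c; the candidate run is the one before it.
    auto it = std::upper_bound(kNdZeros.begin(), kNdZeros.end(), c);
    if (it == kNdZeros.begin()) return -1;  // c lies below the first run
    char32_t zero = *(it - 1);               // lower bound of the candidate run
    return (c - zero < 10) ? static_cast<int>(c - zero) : -1;
}
```

For example, U+0665 (ARABIC-INDIC DIGIT FIVE) lands in the run starting at U+0660 and yields 5, while 'A' is more than nine past U+0030 and yields -1.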

I'm wrapping up a revamp of std.uni that makes it a piece of cake to create
character sets. Maps are also converted into multi-stage tables, which are
faster than a binary search over a large set. I'd suggest waiting a bit on it
(so as not to duplicate work) and introducing only the std.ascii version, as
the most useful one.
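The multi-stage tables mentioned above replace a search with direct indexing. The simplest instance is a two-stage table. The high bits of a code point select a block, the low 8 bits index into that block, and identical blocks are stored only once. The C++ sketch below illustrates that generic technique and is not the std.uni design. The class name TwoStageTable and the 256-entry block size are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical two-stage lookup table. stage1_ maps the high bits of a code
// point (c / 256) to a block number. stage2_ stores 256 values per distinct
// block, so repeated blocks (e.g. all "-1") are kept only once.
class TwoStageTable {
public:
    // flat[c] is the value for code point c (-1 = not numeric);
    // flat.size() is assumed to be a multiple of kBlock.
    explicit TwoStageTable(const std::vector<std::int8_t>& flat) {
        std::map<std::vector<std::int8_t>, std::uint16_t> seen;  // block contents -> block number
        for (std::size_t hi = 0; hi < flat.size() / kBlock; ++hi) {
            std::vector<std::int8_t> block(flat.begin() + hi * kBlock,
                                           flat.begin() + (hi + 1) * kBlock);
            auto [it, inserted] =
                seen.emplace(block, static_cast<std::uint16_t>(seen.size()));
            if (inserted) stage2_.insert(stage2_.end(), block.begin(), block.end());
            stage1_.push_back(it->second);
        }
    }

    // Two array reads instead of a search; code points past the table give -1.
    int lookup(char32_t c) const {
        std::size_t hi = c / kBlock;
        if (hi >= stage1_.size()) return -1;
        return stage2_[stage1_[hi] * kBlock + c % kBlock];
    }

    // Number of distinct blocks actually stored in stage 2.
    std::size_t storedBlocks() const { return stage2_.size() / kBlock; }

private:
    static constexpr std::size_t kBlock = 256;
    std::vector<std::uint16_t> stage1_;
    std::vector<std::int8_t> stage2_;
};
```

Most 256-code-point blocks contain no digits at all, so they collapse into one shared block. Real implementations typically add more stages or pick block sizes from the data.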

The polishing, fixing and testing against ICU is happening here:
https://github.com/blackwhale/gsoc-bench-2012

> The file is too large for std.xml to handle, so it's back to C++ for me :/
> 
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

Same data, minus the useless XML. The fields are described about halfway
through this document:
http://www.unicode.org/reports/tr44/
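Each UnicodeData.txt line is a list of semicolon-separated fields. Field 0 is the code point in hex, field 2 is the General_Category, and field 8 is the Numeric_Value, which may be a fraction such as "1/5". The minimal C++ sketch below reads one line and keeps only those three fields. The struct UcdNumeric and the function parseUnicodeDataLine are hypothetical names.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct UcdNumeric {
    char32_t codePoint;
    std::string category;      // field 2: General_Category, e.g. "Nd", "Nl", "No"
    std::string numericValue;  // field 8: Numeric_Value as written, e.g. "5" or "1/5"
};

// Parses one line of UnicodeData.txt; returns nothing if field 8 is empty.
std::optional<UcdNumeric> parseUnicodeDataLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    for (std::string f; std::getline(in, f, ';');) fields.push_back(f);
    if (fields.size() < 9 || fields[8].empty()) return std::nullopt;
    char32_t cp = static_cast<char32_t>(std::stoul(fields[0], nullptr, 16));
    return UcdNumeric{cp, fields[2], fields[8]};
}
```

Feeding every line of the file through this function and keeping the non-empty results gives the raw code point to value mapping that the [Nl] and fraction characters need.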

> The only questions I have are:
> Return value: int or double?

It should be a rational, to accurately represent things like the "1/5"
character ;) I suspect some simple custom type would do (e.g. 2 shorts packed
in one struct).
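Here is a minimal C++ sketch of such a custom rational type. It parses the Numeric_Value strings found in UCD field 8, such as "5", "1/5" or "-1/2". The comment above suggests two shorts. This sketch widens them to 32 bits because some values exceed a 16-bit range, for example 100000 for U+2188 ROMAN NUMERAL ONE HUNDRED THOUSAND. Whether 32 bits cover every value in a given UCD version should be checked against the data. The name NumericValue and the use of den == 0 as a "not numeric" marker are assumptions of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical compact rational for per-character numeric values.
struct NumericValue {
    std::int32_t num = 0;
    std::int32_t den = 0;  // den == 0 marks "not numeric" (assumption of this sketch)

    // Parses a field-8 string such as "5", "1/5" or "-1/2";
    // an empty string yields the "not numeric" value.
    static NumericValue parse(const std::string& s) {
        if (s.empty()) return {};
        std::size_t slash = s.find('/');
        if (slash == std::string::npos)
            return {static_cast<std::int32_t>(std::stol(s)), 1};
        return {static_cast<std::int32_t>(std::stol(s.substr(0, slash))),
                static_cast<std::int32_t>(std::stol(s.substr(slash + 1)))};
    }

    bool isNumeric() const { return den != 0; }
    double toDouble() const { return static_cast<double>(num) / den; }
};
```

Keeping numerator and denominator separate avoids the rounding that a double return type would introduce for values like 1/3. Callers who want a double can still call toDouble().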

> Input is not numeric: -1 or exception?

-1 is fine, I think, since this is rather low level (per character), and it's
not at all convenient to throw (and then catch).
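Here is a minimal C++ sketch of the ASCII case with the -1 convention agreed above. asciiNumericValue is a hypothetical stand-in for the proposed std.ascii.numericValue, not its actual implementation. Anything outside '0'..'9', including non-ASCII digits, maps to -1.

```cpp
// Hypothetical stand-in for the proposed std.ascii.numericValue:
// '0'..'9' map to 0..9; every other code point maps to -1.
int asciiNumericValue(char32_t c) {
    return (c >= U'0' && c <= U'9') ? static_cast<int>(c - U'0') : -1;
}
```

A caller can test `v >= 0` on the result directly in a per-character loop, with no try/catch around each call.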



More information about the Digitalmars-d-bugs mailing list