[Issue 5543] to!int to see a char as a single-char string
d-bugmail at puremagic.com
Fri Dec 21 07:17:56 PST 2012
http://d.puremagic.com/issues/show_bug.cgi?id=5543
Dmitry Olshansky <dmitry.olsh at gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh at gmail.com
--- Comment #5 from Dmitry Olshansky <dmitry.olsh at gmail.com> 2012-12-21 07:17:53 PST ---
> Java even implements
> one taking chars, and another taking int (dchar)
That's because Java originally had only 16-bit chars. Now true code points
are passed around in the form of 'int'.
> http://msdn.microsoft.com/en-us/library/system.char.getnumericvalue.aspx
> http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html
>
> I'd say we should just add:
> std.ascii.getNumericValue
> std.uni.getNumericValue
> (or plain numericValue)
>
Agreed, and the name should be numericValue.
> I already wrote the ascii version (easy as pie), and support for the [Nd]
> group, using a binary search, followed by an offset from the lower bound.
>
> [Nl] and [Po] require a straight up mapping of codepoint to value, but I'm
> still writing the parser that extract the data for the raw UCD
> (http://www.unicode.org/Public/6.2.0/ucdxml/).
>
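The [Nd] lookup described in the quoted text (binary search, then an offset from the lower bound) could look roughly like the sketch below. The table `ndZeros` and the function name `ndValue` are hypothetical, and the table lists only a handful of digit runs for illustration; it is not the code from this thread.

```d
// Hypothetical, deliberately incomplete table: the code point of digit zero
// for a few [Nd] runs, sorted ascending. Each run is ten consecutive digits.
immutable dchar[] ndZeros = [0x0030, 0x0660, 0x06F0, 0x0966, 0xFF10];

/// Returns 0..9 if c is a decimal digit in the sample table, -1 otherwise.
int ndValue(dchar c)
{
    size_t lo = 0, hi = ndZeros.length;
    // Binary search for the number of run starts that are <= c.
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (ndZeros[mid] <= c)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;                           // c precedes every run
    immutable offset = c - ndZeros[lo - 1];  // distance from the run's zero
    return offset < 10 ? cast(int) offset : -1;
}
```

Because every [Nd] run is contiguous, only the run starts need to be stored, which keeps the searched table small.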
I'm wrapping up a revamp of std.uni that makes it a piece of cake to create
character sets. Maps are converted to multi-stage tables, which are faster
than a binary search on a large set. I'd suggest waiting a bit on it (so as not
to duplicate work) and introducing only the std.ascii version, as the most useful.
The polishing, fixing and testing against ICU is going on here:
https://github.com/blackwhale/gsoc-bench-2012
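To illustrate why a multi-stage table can beat a binary search, here is a minimal two-stage sketch. It is not the std.uni code from the repository above; the names `TwoStage` and `buildTable`, the page size of 256, and the use of `byte` with -1 for "no value" are all assumptions made for the example.

```d
enum pageSize = 256;                   // code points per page (assumed)
enum pageCount = 0x110000 / pageSize;  // blocks covering all of Unicode

struct TwoStage
{
    ushort[] stage1;  // page number for each block of pageSize code points
    byte[] stage2;    // concatenated pages of values, -1 means "no value"

    /// Lookup is two array indexings, independent of how many entries exist.
    int opIndex(dchar c) const
    {
        if (c > 0x10FFFF)
            return -1;
        return stage2[stage1[c / pageSize] * pageSize + c % pageSize];
    }
}

/// Builds the table from a sparse code point -> value map.
TwoStage buildTable(const byte[dchar] values)
{
    TwoStage t;
    t.stage1 = new ushort[](pageCount);  // all zero: every block uses page 0
    t.stage2 = new byte[](pageSize);
    t.stage2[] = -1;                     // page 0 is the shared empty page
    foreach (c, v; values)
    {
        immutable block = c / pageSize;
        if (t.stage1[block] == 0)        // first value seen in this block
        {
            t.stage1[block] = cast(ushort)(t.stage2.length / pageSize);
            t.stage2.length = t.stage2.length + pageSize;
            t.stage2[$ - pageSize .. $] = -1;
        }
        t.stage2[t.stage1[block] * pageSize + c % pageSize] = v;
    }
    return t;
}
```

Blocks without any entries all share the single empty page, so the memory cost grows with the number of occupied blocks rather than with the size of the code space.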
> The file is too large for std.xml to handle, so it's back to C++ for me :/
>
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
Same thing but without the useless XML trash. The description of the fields is
somewhere in the middle of this document:
http://www.unicode.org/reports/tr44/
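As a rough sketch of how little parsing UnicodeData.txt needs: each line is a list of semicolon-separated fields, with the code point in field 0, the general category in field 2 and the numeric value in field 8 (counting from zero). The struct `NumericEntry` and the function `parseLine` are hypothetical names for this example.

```d
import std.array : split;
import std.conv : to;

struct NumericEntry
{
    dchar codepoint;
    string category;  // general category, e.g. "Nd", "Nl", "No"
    string value;     // raw numeric field, e.g. "5" or "1/2"
}

/// Returns true and fills `e` if the line carries a numeric value.
bool parseLine(string line, out NumericEntry e)
{
    auto f = line.split(";");          // empty fields are preserved
    if (f.length < 9 || f[8].length == 0)
        return false;                  // not a numeric character
    e.codepoint = cast(dchar) f[0].to!uint(16);
    e.category = f[2];
    e.value = f[8];
    return true;
}
```

The value is kept as a string here because the field may hold a fraction such as "1/2", which is exactly what the return-type question below is about.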
> The only questions I have is:
> Return value: int or double?
Should be rational to accurately represent things like the "1/5" character ;)
I suspect some simple custom type could do (2 shorts packed in one struct,
etc.).
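A minimal sketch of such a type, assuming the "two shorts in one struct" layout suggested above. The names `NumericValue` and `parseNumeric` are made up for the example, and the parser expects input of the form "n" or "n/d" as found in the numeric field of UnicodeData.txt.

```d
import std.algorithm : findSplit;
import std.conv : to;

struct NumericValue
{
    short num;       // numerator, may be negative
    short den = 1;   // denominator, assumed to be positive

    bool isInteger() const { return den == 1; }
    double toDouble() const { return cast(double) num / den; }
}

static assert(NumericValue.sizeof == 4);  // two shorts, no padding

/// Parses "n" or "n/d" into a NumericValue.
NumericValue parseNumeric(string field)
{
    auto parts = field.findSplit("/");
    NumericValue v;
    v.num = parts[0].to!short;
    if (parts[1].length)                  // a '/' was present
        v.den = parts[2].to!short;
    return v;
}
```

One caveat with this layout: a few characters have values beyond the range of short (for example U+2188 ROMAN NUMERAL ONE HUNDRED THOUSAND), so a real implementation might need a wider numerator.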
> Input is not numeric: -1 or exception?
-1 is fine, I think, as this is rather low level (per character) and it's not at
all convenient to throw (and then catch).
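For the std.ascii version, a sketch with the -1 convention could be as small as the following. The name `numericValue` follows the suggestion earlier in this comment; `digitSum` is a hypothetical caller added only to show that per-character use needs no try/catch.

```d
/// Returns 0..9 for '0'..'9' and -1 for any other character.
int numericValue(dchar c) pure nothrow @safe
{
    if (c >= '0' && c <= '9')
        return cast(int)(c - '0');
    return -1;
}

/// Sums the ASCII digits in s and silently skips everything else.
int digitSum(string s) pure @safe
{
    int sum = 0;
    foreach (dchar c; s)
    {
        immutable v = numericValue(c);
        if (v >= 0)         // -1 signals "not numeric"
            sum += v;
    }
    return sum;
}
```

With a sentinel the function can stay nothrow, and a caller that wants an exception can still add one on top.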