[Issue 5543] to!int to see a char as a single-char string

d-bugmail at puremagic.com d-bugmail at puremagic.com
Fri Dec 21 07:53:44 PST 2012


http://d.puremagic.com/issues/show_bug.cgi?id=5543



--- Comment #10 from monarchdodra at gmail.com 2012-12-21 07:53:36 PST ---
(In reply to comment #5)
> 
> I'm wrapping up a revamp of std.uni that makes it a piece of cake to create
> character sets. And maps are converted to multi-staged tables that are faster
> than binary search on a large set. I'd suggest waiting a bit on it (so as to not
> duplicate work) and introducing only the std.ascii version as the most useful.
> 
> The ongoing polishing, fixing and testing against ICU is going on here:
> https://github.com/blackwhale/gsoc-bench-2012

OK. The thing I was having trouble with, though, is that the existing binary
search returns a bool, whereas I need the actual entry, so that I can do
"value - entry[0]", e.g.:

//----
    static immutable dchar[2][] table1 = [
    [ 0x0030,  0x0039], //
    [ 0x0660,  0x0669], //ARABIC-INDIC
    [ 0x06F0,  0x06F9], //EXTENDED ARABIC-INDIC

...
//---
That's because all the entries in [Nd] are consecutive numerals starting at 0.
I can also cram a select couple of entries from [Nl] and [Po] that also use
this scheme.

So if I have the code point 0x0665 (the ARABIC-INDIC numeral '5'), I'd want to
find [ 0x0660,  0x0669], and then "return 0x0665 - 0x0660", which gives 5.

Well, I don't need the entire pair, but at least the lhs of the pair.

If you could keep that in mind during your re-write. Or not. Just throwing it
out there.
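
To make the request concrete, here is a minimal sketch of a range lookup that
hands back the offset from the start of the matching pair instead of a bool.
The names digitRanges and tryDigitValue are made up for illustration, the table
is only the three-entry excerpt shown above, and the binary search is written
by hand rather than taken from std.uni:

```d
// Hypothetical excerpt of a [Nd] table: each entry is [first, last] for a
// run of ten consecutive digits, sorted by code point.
static immutable dchar[2][] digitRanges = [
    [0x0030, 0x0039], // ASCII digits
    [0x0660, 0x0669], // ARABIC-INDIC
    [0x06F0, 0x06F9], // EXTENDED ARABIC-INDIC
];

// Binary search over the ranges. On a hit, stores "c - entry[0]" in value
// and returns true; on a miss, returns false and leaves value at 0.
bool tryDigitValue(dchar c, out int value) pure nothrow @safe
{
    size_t lo = 0, hi = digitRanges.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (c < digitRanges[mid][0])
            hi = mid;           // c lies before this range
        else if (c > digitRanges[mid][1])
            lo = mid + 1;       // c lies after this range
        else
        {
            value = cast(int)(c - digitRanges[mid][0]);
            return true;
        }
    }
    return false;
}

unittest
{
    int v;
    assert(tryDigitValue('\u0665', v) && v == 5); // 0x0665 - 0x0660
    assert(!tryDigitValue('a', v));
}
```

Only the lhs of the pair is needed for the subtraction; the rhs is used solely
to decide whether the character falls inside the range.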

For all other entries in [Nl] and [Po], I'd have:
    static immutable dchar[2][] table2 = [
    [ 0x216D,  100], //ROMAN NUMERAL ONE HUNDRED

So that's just a basic dictionary. But I don't think you can statically
allocate an AA (associative array). So yeah, just throwing that in your
direction too.
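
One possible stand-in for a static AA is a sorted array of (code point, value)
pairs searched by key. The sketch below assumes that approach; NumericEntry,
valueTable and lookupValue are hypothetical names, and the table holds only a
handful of Roman numeral entries from [Nl]:

```d
// One dictionary entry: a code point and its integer numeric value.
struct NumericEntry
{
    dchar code;
    int value;
}

// Hypothetical excerpt, sorted by code point so it can be binary searched.
static immutable NumericEntry[] valueTable = [
    NumericEntry(0x216C, 50),   // ROMAN NUMERAL FIFTY
    NumericEntry(0x216D, 100),  // ROMAN NUMERAL ONE HUNDRED
    NumericEntry(0x216E, 500),  // ROMAN NUMERAL FIVE HUNDRED
    NumericEntry(0x216F, 1000), // ROMAN NUMERAL ONE THOUSAND
];

// Returns true and stores the mapped value if c is in the table.
bool lookupValue(dchar c, out int value) pure nothrow @safe
{
    size_t lo = 0, hi = valueTable.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (valueTable[mid].code < c)
            lo = mid + 1;
        else if (valueTable[mid].code > c)
            hi = mid;
        else
        {
            value = valueTable[mid].value;
            return true;
        }
    }
    return false;
}

unittest
{
    int v;
    assert(lookupValue(0x216D, v) && v == 100);
    assert(!lookupValue('x', v));
}
```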

> > The file is too large for std.xml to handle, so it's back to C++ for me :/
> > 
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
> 
> Same thing but no useless XML trash. Description of fields is somewhere in the
> middle of this document 
> http://www.unicode.org/reports/tr44/

Nice, TY.

> > The only questions I have is:
> > Return value: int or double?
> 
> Should be rational to accurately represent things like the "1/5" character ;)
> I do suspect some simple custom type could do (2 shorts packed in one struct
> etc.).
> 
> > Input is not numeric: -1 or exception?
> 
> -1 is fine I think, as this is rather low level (per character) and it's not
> at all convenient to throw (and then catch).

The only issue I have with returning -1 is that it is a magic value. The fact
that no Unicode character has a numeric value of -1 is pure coincidence, and
not by design. In particular, any attempt to write
"if (numericValue(c) < 0) fail" would also be wrong, because:
http://unicode.org/cldr/utility/character.jsp?a=0F33
The TIBETAN DIGIT HALF ZERO returns -0.5

Do we *really* want to standardize the syntax of "if (numericValue(c) < -0.7)"
?
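
For comparison, one way to sidestep the magic value altogether is to report
failure separately from the value. This is only a sketch under assumptions:
Rational follows the "2 shorts packed in one struct" idea quoted above,
tryNumericValue is a hypothetical name (not an existing Phobos function), and
the switch covers just three sample characters:

```d
// Two shorts packed in one struct, as suggested in the quoted comment.
struct Rational
{
    short num;
    short den = 1;

    double toDouble() const pure nothrow @safe
    {
        return cast(double) num / den;
    }
}

// Returns false for non-numeric input, so no numeric value is reserved
// as an error marker and negative values stay representable.
bool tryNumericValue(dchar c, out Rational value) pure nothrow @safe
{
    switch (c)
    {
        case 0x00BD: // VULGAR FRACTION ONE HALF
            value = Rational(1, 2);
            return true;
        case 0x0F33: // TIBETAN DIGIT HALF ZERO
            value = Rational(-1, 2);
            return true;
        case 0x2155: // VULGAR FRACTION ONE FIFTH
            value = Rational(1, 5);
            return true;
        default:
            return false;
    }
}

unittest
{
    Rational r;
    assert(tryNumericValue(0x0F33, r) && r.toDouble() == -0.5);
    assert(!tryNumericValue('x', r));
}
```

The cost is a clumsier call site than a plain return value, which is the
convenience argument made for -1 in the first place.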

...

Damn you unicode!
