ICU D Wrapper
Trent Forkert via Digitalmars-d
digitalmars-d at puremagic.com
Sat Dec 13 09:28:24 PST 2014
On Saturday, 13 December 2014 at 15:44:59 UTC, Sean Kelly wrote:
> On Friday, 12 December 2014 at 17:57:41 UTC, Trent Forkert
> wrote:
>>
>> I've looked into writing a binding for ICU recently, but
>> ultimately decided to abandon that idea in favor of writing a
>> replacement for it in D.
>
> Wow... really? You're actually going to write transcoders for
> all available encodings? Plus the conversion and parsing tools,
> plus expand our calendar functionality to handle the things it
> doesn't do now, plus... I mean I'd love it, but the scope of
> the project can be measured in tens of man-years.

Running down the icu4c API listing:
* Basic Types and Constants - only as needed
* Strings and character iteration - just use D strings and std.string
* Unicode character properties and names - I think std.uni
handles this
* Sets of Unicode Code Points and Strings - ditto
* Codepage conversion - ignoring, at least for now. See below.
* Unicode text compression - again, I think std.uni handles this
* Locales - yes
* Resource Bundles - will offer equivalent functionality, just
not identical
* Normalization - std.uni
* Calendars - see below
* Date and time formatting - yes
* Message formatting - yes
* Number formatting / spell-out - yes
* Transliteration - yes, but may be delayed until after initial
release
* Bidirectional Algorithm - not at first; is this in std.uni?
* Arabic shaping - not at first; is this in std.uni?
* Collation - I'm delaying this until after the initial release
to get it out faster
* String searching - depends on Collation
* Index characters - depends on Collation
* Text Boundary analysis - depends on Collation
* Regular Expression - use std.regex
* StringPrep - not initially; is this in std.uni?
* IDNA - not initially; is this in Phobos?
* Identifier spoofing and confusability - not initially
* Layout engine - delayed; it looks like ICU is removing this and
pointing to another library
* Universal Time Scale - see below
* ICU I/O - use Phobos
There are very few things above that are not possible to generate
from CLDR data. Of those, most are RFC-defined algorithms,
several of which I believe are already part of Phobos.
If I add codepage conversion, it will likely be in terms of iconv
on POSIX and MultiByteToWideChar and friends on Windows.
Alternatively, I could "borrow" the IBM CDRA/UCM data the way I'm
getting almost everything else from CLDR data.
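
For illustration only, here is a rough sketch of what the POSIX half of that
could look like. It is written in C because iconv is a C API (a D version
would declare the same functions via extern(C)). The helper name to_utf8 and
its signature are made up for this example and are not part of any existing
library; the Windows path via MultiByteToWideChar is not shown.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): converts src_len bytes of src,
 * encoded in from_charset, to NUL-terminated UTF-8 in dst.
 * Returns the number of UTF-8 bytes written (excluding the NUL),
 * or -1 on any failure (unknown charset, invalid input, dst too small). */
long to_utf8(const char *from_charset, const char *src, size_t src_len,
             char *dst, size_t dst_cap)
{
    if (dst_cap == 0)
        return -1;

    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;        /* iconv's interface wants char **  */
    size_t in_left = src_len;
    char *out = dst;
    size_t out_left = dst_cap - 1; /* keep one byte for the terminator */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    if (rc != (size_t)-1) {
        /* Flush any pending shift state (a no-op for stateless encodings). */
        rc = iconv(cd, NULL, NULL, &out, &out_left);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;

    *out = '\0';
    return (long)(out - dst);
}
```

Calling to_utf8("ISO-8859-1", "caf\xE9", 4, buf, sizeof buf) should leave the
UTF-8 bytes for "café" in buf, assuming the platform's iconv knows the
ISO-8859-1 charset name.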
Support of other calendar systems is up in the air at the moment.
I had thought CLDR contained what I needed, but it looks like it
might not. It has locale-specific formatting and display info for
calendars, and mappings to when other calendars' eras begin in
terms of the Gregorian calendar, but I don't see further
breakdown of information. So, initially it looks like I'll only
be supporting the Gregorian calendar, but I may add the others in the
future.
It is a lot of work, yes, but the Unicode Consortium already does
a significant chunk of it with CLDR.
- Trent