New std.uni: ready for more beating
H. S. Teoh
hsteoh at quickfur.ath.cx
Sat Feb 23 09:14:00 PST 2013
On Wed, Jan 30, 2013 at 01:52:20AM +0400, Dmitry Olshansky wrote:
> Recap:
> During a couple of rounds of informal review, the new std.uni had its
> docs happily destroyed and later rewritten based on the feedback.
>
> Notable changes:
>
> - Fixed a couple of latent bugs (ouch!)
>
> - unicode.xyz helper was redesigned to have a clear path for
> extension to properties other than binary ones. For instance, to get
> all code points with Hangul syllable type L (leading Jamo):
>
> auto leadingJamo = unicode.hangulSyllableType("L");
>
> - Squeezed an extra 31KB of slack out of the object-file size (32 bits,
> more on 64). Now all of the packed tables occupy around 350KB (32 bits).
> If you happen to know some tricks to reduce object-file size (and in
> turn the executable size), please chime in.
>
> Code & benchmark: https://github.com/blackwhale/gsoc-bench-2012
>
> Docs: http://blackwhale.github.com/phobos/uni.html
> (looks far better without the JS jump-table)
>
> It's a standalone module at the moment. To use it in place of the
> current std.uni, replace 'std.uni' with 'uni' in your programs and
> compare the results. Make sure that both the uni and unicode_tables
> modules are linked in; rdmd can take care of this dependency.
>
> P.S. Time to go for the formal review?
[...]
Alright, I decided to just jump in and re-review std.uni. I *really*
want to see this in Phobos, the sooner the better.
Here are some comments:
- In the first part of the docs, Terminology section, under "Code unit":
I think you mistyped a ddoc macro; it should be ($(D char)) instead of
(($D char)).
- lineSep, paraSep: are these fixed values? It would be nice to indicate
what their values are.
- UnicodeDecomposition: it would be nice to document the values in this
enum.
- normalize(): I think your code example has a duplicated line (the NFKC
example appears twice).
- allowedIn(): How about an example where a character is *not* allowed
in a normalization form?
- InversionList.opBinary: I still prefer ^ instead of ~ for symmetric
difference. In D, ~ means "append", and it's very confusing when x~y
means symmetric difference instead of append.
- unicode.opDispatch: it would be nice to provide links to the official
Unicode documentation that lists all blocks/scripts, as a reference.
- combiningClass: maybe provide a link to the official Unicode docs that
list the combining class values?
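To make the ~ versus ^ point concrete, here is a small self-contained sketch in plain D. It deliberately does not touch std.uni at all (so it assumes nothing about the new module's API); it only shows what the two operators already mean on built-in types: ~ concatenates arrays and keeps duplicates, while ^ on integers used as bit sets is exactly symmetric difference.

```d
import std.stdio : writeln;

void main()
{
    // On arrays, ~ concatenates: nothing is removed, duplicates are kept.
    int[] x = [1, 2];
    int[] y = [2, 3];
    int[] appended = x ~ y;
    writeln(appended); // [1, 2, 2, 3]

    // On integers used as bit sets, ^ (XOR) is already symmetric difference:
    // only bits set in exactly one operand survive.
    uint a = 0b0110; // the set {1, 2}
    uint b = 0b1100; // the set {2, 3}
    uint symDiff = a ^ b;
    writeln(symDiff == 0b1010); // true, i.e. the set {1, 3}
}
```

So a reader who sees x ~ y on two code point sets will naturally expect something append-like, whereas x ^ y matches the XOR intuition that set-of-bits users already have.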
OK, a lot of these are just nitpicks... but overall, this new std.uni
looks very good. Looking forward to it being merged into Phobos!
T
--
Marketing: the art of convincing people to pay for what they didn't need
before which you can't deliver after.