New std.uni: ready for more beating

H. S. Teoh hsteoh at quickfur.ath.cx
Sat Feb 23 09:14:00 PST 2013


On Wed, Jan 30, 2013 at 01:52:20AM +0400, Dmitry Olshansky wrote:
> Recap:
> During a couple of rounds of the informal review new std.uni had its
> docs happily destroyed, and later re-written based on the feedback.
> 
> Notable changes:
> 
> - Fixed a couple of latent bugs (ouch!)
> 
> - unicode.xyz helper was redesigned to have a clear path for
> extension to properties other than binary ones. For instance, to get
> all code points with Hangul syllable type L (leading Jamo):
> 
> auto leadingJamo = unicode.hangulSyllableType("L");
> 
> - Squeezed extra 31Kb slack from object-file size (32 bits, more on
> 64). Now all of the packed tables occupy around 350Kb (32bits) and
> If you happen to know some tricks to reduce object file size (and in
> turn the executable size), please chime in.
> 
> Code & benchmark: https://github.com/blackwhale/gsoc-bench-2012
> 
> Docs: http://blackwhale.github.com/phobos/uni.html
> (looks far better without the JS jump-table)
> 
> It's a standalone module at the moment. To use in place of current
> std.uni replace 'std.uni'->'uni' in your programs and compare the
> results. Make sure that both uni and unicode_tables modules are
> linked in, rdmd can take care of this dependency.
> 
> P.S. Time to go for the formal review?
[...]

Alright, I decided to just jump in and re-review std.uni. I *really*
want to see this in Phobos, the sooner the better.

Here are some comments:

- In the first part of the docs, Terminology section, under "Code unit":
  I think you mistyped a ddoc macro; it should be ($(D char)) instead of
  (($D char)).

- lineSep, paraSep: are these fixed values? It would be nice to indicate
  what their values are.

- UnicodeDecomposition: it would be nice to document the values in this
  enum.

- normalize(): I think your code example has a duplicated line (NFKC
  example appears twice).

- allowedIn(): How about an example where a character is *not* allowed
  in a normalization form?

- InversionList.opBinary: I still prefer ^ instead of ~ for symmetric
  difference. In D, ~ means "append", and it's very confusing when x~y
  means symmetric difference instead of append.

- unicode.opDispatch: it would be nice to provide links to official
  Unicode documentation that lists all blocks/scripts, as a reference.

- combiningClass: maybe provide a link to official Unicode docs that
  list combining class values?
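
To make the ~ vs. ^ point concrete, here is a rough sketch of the two
meanings side by side. It assumes the module is imported as std.uni (use
'uni' for the standalone version), that the set type is exposed under the
name CodepointSet with a constructor taking [a, b) interval pairs, and that
opIndex tests membership; onlyInOne is a hypothetical helper made up purely
for illustration.

```d
import std.uni : CodepointSet; // assumption: module and type names as described above

// Hypothetical helper: code points that are in exactly one of the two sets.
CodepointSet onlyInOne(CodepointSet a, CodepointSet b)
{
    return a ~ b; // on sets, ~ is symmetric difference
}

void main()
{
    // On arrays, ~ is concatenation: duplicates are kept.
    int[] x = [1, 2, 3];
    int[] y = [3, 4];
    assert(x ~ y == [1, 2, 3, 3, 4]);

    // On sets, the same operator drops whatever the operands share.
    auto a = CodepointSet('a', 'n');      // 'a' .. 'm'
    auto b = CodepointSet('h', 'z' + 1);  // 'h' .. 'z'
    auto c = onlyInOne(a, b);

    assert(c['a']);   // only in a
    assert(c['z']);   // only in b
    assert(!c['h']);  // in both, so excluded
}
```

Reading x ~ y and a ~ b next to each other, nothing at the call site tells
you that the second one throws elements away, which is why I'd rather see ^
there.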


OK, a lot of this is just nitpicks... but overall, this new std.uni
looks very good. Looking forward to it being merged into Phobos!


T

-- 
Marketing: the art of convincing people to pay for what they didn't need
before which you can't deliver after.

