Update #1 on new std.uni

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Jan 17 10:48:25 PST 2013


On Wed, Jan 16, 2013 at 02:48:30PM +0400, Dmitry Olshansky wrote:
> 11-Jan-2013 23:31, Dmitry Olshansky wrote:
> >
> >The code, including extra tests and a benchmark is here:
> >https://github.com/blackwhale/gsoc-bench-2012
> >
> >And documentation:
> >http://blackwhale.github.com/phobos/uni.html
> >
> 
> First of all, @safe pure and nothrow is back. Let me know if
> something is still not.
> 
> OK, I've made an extra pass through docs with these things in mind:
> - getting the introduction & terminology part right
> - more explanations and details where applicable
>  (let me know if that's too much / too little / wrong)
> - hiding away the truly generic (and not easy to use) Trie from
> documentation
> - old deprecated stuff is hidden from docs to discourage its use
[...]

Looks much better now!

Some nitpicks:

- Under Overview:

	[4th paragraph] "It's recognized that an application may need
	further enhancements and extensions. It could be the need for
	less commonly known algorithms or tailoring existing ones for
	regional-specific needs. To help users with building any extra
	functionality beyond the core primitives the module provides:"

  The grammar nazi in me thinks a better wording might be (changes
  delimited by {{}}):

	"It's recognized that an application may need further
	enhancements and extensions{{, such as}} less{{-}}commonly known
	algorithms{{,}} or tailoring existing ones for
	{{region}}-specific needs. To help users with building any extra
	functionality beyond the core primitives{{,}} the module
	provides:"

  The second item in the subsequent list:

	A way to construct optimal packed multi-stage tables also known
	as a special case of Trie. {{The functions}} codepointTrie,
	codepointSetTrie construct custom tries that map dchar to value.
	The end result is {{a}} fast and predictable Ο(1) lookup that
	powers functions like isAlpha {{and}} combiningClass{{,}} but
	for user-defined data sets.

  The last item in the list:

	Access to the commonly{{-}}used predefined sets of code points.
	The commonly{{-}}defined one{{s}} can be observed in the CLDR
	utility, on {{the}} page property index. {{S}}upported ones
	include Script, Block and General Category. See unicode for easy
	{{}} compile-time checked queries.
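
For what it's worth, here is a minimal sketch of how the last two list
items fit together: take a predefined set, then build a custom trie from
it. It uses the names mentioned in the docs (unicode, codepointSetTrie);
the choice of the Cyrillic set and the 13/8 bit split are my own picks
for illustration, and the exact signatures in the draft module may
differ from what I assume here.

```d
import std.uni : codepointSetTrie, unicode;

void main()
{
    // Predefined set, looked up by name with a compile-time check.
    auto cyrillic = unicode.Cyrillic;

    // Two-stage table: 13 index bits + 8 page bits = 21 bits of a code point.
    auto trie = codepointSetTrie!(13, 8)(cyrillic);

    dchar ya = '\u042F'; // CYRILLIC CAPITAL LETTER YA
    dchar z = 'z';

    assert(trie[ya]);  // member of the set
    assert(!trie[z]);  // Latin letter, not a member
}
```

The set is only needed while building; after that every query is a
plain indexing operation on the trie.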

- Under Terminology:

	[[3rd paragraph]] "The minimal bit combination that can represent
	a unit of encoded text for processing or interchange. Depending
	on the encoding this could be: 8-bit code units in the UTF-8
	(($D char)), [...]"

  I think you transposed the "$(" here. :)

  The last sentence in this section appears to be truncated. Maybe a
  runaway DDoc macro somewhere earlier?

- Under Construction of lookup tables, the grammar nazi says:

	[[1st sentence]] "{{The}} Unicode standard describes a set of
	algorithms that {{}} depend on having {{the}} ability to quickly
	look{{ }}up various properties of a code point. Given the {{}}
	codespace of about 1 million code points, it is not a trivial
	task to {{provide}} a space{{-}}efficient solution for the {{}}
	multitude of properties."

	[[2nd paragraph]] "[...] Hash-tables {{have}} enormous memory
	footprint and binary search over intervals is not fast enough
	for some heavy-duty algorithms."

	[[3rd paragraph]] "{{(P }}The recommended solution (see Unicode
	Implementation Guidelines) {{}} is using multi-stage tables{{,}}
	that is{{,}} {{an instance}} of Trie with integer keys and {{a}}
	fixed number of stages. For the {{remainder}} of {{this}}
	section {{it will be}} called {{a}} fixed trie. The following
	describes a particular implementation that is aimed for the
	speed of access at the expense of ideal size savings."

	[[4th paragraph]] "[...] Split {{the}} number of bits in a key
	(code point, 21 bits) {{into}} 2 components (e.g. 15 and 8). The
	first is the number of bits in the index of {{the}} trie and the
	other is {{the}} number of bits {{in each}} page of {{the}}
	trie. The layout of {{the}} trie is then an array of size
	2^^bits-of-index followed {{by}} an array of memory chunks of size
	2^^bits-of-page/size-of-element."

	[[5th paragraph]] "[...] {{The}} slots of {{the}} index all have
	to contain {{the same[?] number of pages}}. The lookup is then
	just a couple of operations - slice {{the}} upper bits, {{then}}
	look{{ }}up {{the}} index for these{{.}} The pseudo-code is:"

	[[Following the code example]] "[...] Where if the elemsPerPage
	is a power of 2 the whole process is a handful of simple
	instructions and 2 array reads.  {{Subsequent}} levels of
	{{the}} trie are introduced by recursing {{}} this notion - the
	index array is treated as values. The number of bits in {{the}}
	index is then again split into 2 parts, with pages over
	'current-index' and {{the}} new 'upper-index'."

	[[Next paragraph]] "For completeness the level 1 trie is simply
	an array. {{The}} current implementation takes advantage of
	bit-packing values when the range is known to be limited in {{}}
	advance (such as bool){{.}} {{S}}ee also BitPacked for enforcing
	it manually.  [...]"

	[[Last paragraph]] "The process of construction of a trie is
	more involved and is hidden from the user in a form of
	{{convenience}} functions: codepointTrie, codepointSetTrie and
	even more convenient toTrie. In general a set or built-in AA
	with dchar type can be turned into a trie. The trie object in
	this module is {{}} read-only (immutable){{;}} it's effectively
	frozen after construction."
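
To check my own reading of the section, here is a rough hand-rolled
sketch of the two-level layout it describes: an index array followed by
pages, with the lookup being a shift, a mask and two array reads. This
is not the module's implementation. The names (TwoLevelTable, build) are
hypothetical, values are plain bool with no bit-packing, and I use a
13/8 split because those add up to the 21 bits of a code point.

```d
enum pageBits = 8;                  // low bits: offset inside a page
enum indexBits = 13;                // high bits: slot in the index
enum elemsPerPage = 1 << pageBits;  // a power of 2, so masking works

struct TwoLevelTable
{
    ushort[] index; // 2^^indexBits slots, each holding a page number
    bool[] pages;   // pages stored back to back, elemsPerPage values each

    bool opIndex(dchar ch) const pure nothrow @safe
    {
        immutable size_t hi = ch >> pageBits;          // upper bits pick the page
        immutable size_t lo = ch & (elemsPerPage - 1); // lower bits pick the element
        return pages[index[hi] * elemsPerPage + lo];   // two array reads
    }
}

TwoLevelTable build(bool function(dchar) pred)
{
    TwoLevelTable t;
    t.index = new ushort[1 << indexBits];
    t.pages = new bool[elemsPerPage]; // page 0: shared all-false page

    foreach (hi; 0 .. 1 << indexBits)
    {
        auto page = new bool[elemsPerPage];
        bool any = false;
        foreach (lo; 0 .. elemsPerPage)
        {
            immutable uint cp = (hi << pageBits) | lo;
            if (cp <= 0x10FFFF && pred(cast(dchar) cp))
            {
                page[lo] = true;
                any = true;
            }
        }
        if (any) // only non-empty pages take up space of their own
        {
            t.index[hi] = cast(ushort)(t.pages.length / elemsPerPage);
            t.pages ~= page;
        }
    }
    return t;
}

void main()
{
    auto lower = build((dchar c) => c >= 'a' && c <= 'z');
    assert(lower['q']);
    assert(!lower['Q']);
    assert(!lower['\U0001F600']);
    // One shared empty page plus the single page containing 'a'..'z'.
    assert(lower.pages.length / elemsPerPage == 2);
}
```

The space saving comes from every empty index slot pointing at the same
page 0; a real implementation would also share identical non-empty
pages, which this sketch skips.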


The grammar nazi has run out of steam, so no more grammar nitpicks for
now. ;-) But there are still the following questions:

- Why is isControl() not pure nothrow?

- Why are the isX() functions @system? I would have expected they should
  be at least @trusted? (Or are there technical problems / compiler bugs
  preventing this?)
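
A quick way to test both questions is to call the predicates from a
function that carries the attributes itself; if any callee is missing
one, the compiler rejects the caller. The probe function below is a
hypothetical helper of mine, not part of the module, and I would expect
it to fail to compile against the current draft if isControl is indeed
not pure nothrow.

```d
import std.uni : isAlpha, isControl;

// Compiles only if both predicates are callable from
// @safe pure nothrow code.
bool probe(dchar c) @safe pure nothrow
{
    return isAlpha(c) || isControl(c);
}

void main()
{
    assert(probe('a'));   // alphabetic
    assert(probe('\n'));  // control character
    assert(!probe('1'));  // digit: neither alphabetic nor control
}
```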

That's all for now. I hope you don't mind me allowing the grammar nazi
to take over for a bit. I want Phobos documentation to be professional
quality. :)


T

-- 
The trouble with TCP jokes is that it's like hearing the same joke over and over.

