Update #1 on new std.uni

Dmitry Olshansky dmitry.olsh at gmail.com
Thu Jan 17 11:31:25 PST 2013


17-Jan-2013 22:48, H. S. Teoh wrote:
> On Wed, Jan 16, 2013 at 02:48:30PM +0400, Dmitry Olshansky wrote:
>> 11-Jan-2013 23:31, Dmitry Olshansky wrote:
>>>
>>> The code, including extra tests and a benchmark is here:
>>> https://github.com/blackwhale/gsoc-bench-2012
>>>
>>> And documentation:
>>> http://blackwhale.github.com/phobos/uni.html
>>>
>>
>> First of all, @safe, pure and nothrow are back. Let me know if
>> anything still lacks them.
>>
>> OK, I've made an extra pass through docs with these things in mind:
>> - getting the introduction & terminology part right
>> - more explanations and details where applicable
>>   (let me know if that's too much / too little / wrong)
>> - hiding away the truly generic (and not easy to use) Trie from
>> documentation
>> - old deprecated stuff is hidden from docs to discourage its use
> [...]
>
> Looks much better now!
>
> Some nitpicks:

Great ;)

> - Under Overview:
>
> 	[4th paragraph] "It's recognized that an application may need
> 	further enhancements and extensions. It could be the need for
> 	less commonly known algorithms or tailoring existing ones for
> 	regional-specific needs. To help users with building any extra
> 	functionality beyond the core primitives the module provides:"
>
>    The grammar nazi in me thinks a better wording might be (changes
>    delimited by {{}}):
>
> 	"It's recognized that an application may need further
> 	enhancements and extensions{{, such as}} less{{-}}commonly known
> 	algorithms{{,}} or tailoring existing ones for
> 	{{region}}-specific needs. To help users with building any extra
> 	functionality beyond the core primitives{{,}} the module
> 	provides:"
>
>    The second item in the subsequent list:
>
> 	A way to construct optimal packed multi-stage tables also known
> 	as a special case of Trie. {{The functions}} codepointTrie,
> 	codepointSetTrie construct custom tries that map dchar to value.
> 	The end result is {{a}} fast and predictable Ο(1) lookup that
> 	powers functions like isAlpha {{and}} combiningClass{{,}} but
> 	for user-defined data sets.
>
>    The last item in the list:
>
> 	Access to the commonly{{-}}used predefined sets of code points.
> 	The commonly{{-}}defined one{{s}} can be observed in the CLDR
> 	utility, on {{the}} page property index. {{S}}upported ones
> 	include Script, Block and General Category. See unicode for easy
> 	{{}} compile-time checked queries.
>
> - Under Terminology:
>
> 	[[3rd paragraph]] "The minimal bit combination that can represent
> 	a unit of encoded text for processing or interchange. Depending
> 	on the encoding this could be: 8-bit code units in the UTF-8
> 	(($D char)), [...]"
>
>    I think you transposed the "$(" here. :)
>

Looks like one commit wasn't pushed :(
I'll peruse your wording though.

>    The last sentence in this section appears to be truncated. Maybe a
>    runaway DDoc macro somewhere earlier?
>
> - Under Construction of lookup tables, the grammar nazi says:
>
> 	[[1st sentence]] "{{The}} Unicode standard describes a set of
> 	algorithms that {{}} depend on having {{the}} ability to quickly
> 	look{{ }}up various properties of a code point. Given the
> 	codespace of about 1 million code points, it is not a trivial
> 	task to provide a space{{-}}efficient solution for the {{}}
> 	multitude of properties."
>
> 	[[2nd paragraph]] "[...] Hash-tables {{have}} enormous memory
> 	footprint and binary search over intervals is not fast enough
> 	for some heavy-duty algorithms."
>
> 	[[3rd paragraph]] "{{(P }}The recommended solution (see Unicode
> 	Implementation Guidelines) {{}} is using multi-stage tables{{,}}
> 	that is{{,}} {{an instance}} of Trie with integer keys and {{a}}
> 	fixed number of stages. For the {{remainder}} of {{this}}
> 	section {{it will be}} called {{a}} fixed trie. The following
> 	describes a particular implementation that is aimed for the
> 	speed of access at the expense of ideal size savings."
>
> 	[[4th paragraph]] "[...] Split {{the}} number of bits in a key
> 	(code point, 21 bits) {{into}} 2 components (e.g. 15 and 8). The
> 	first is the number of bits in the index of {{the}} trie and the
> 	other is {{the}} number of bits {{in each}} page of {{the}}
> 	trie. The layout of trie is then an array of size
> 	2^^bits-of-index followed by an array of memory chunks of size
> 	2^^bits-of-page/size-of-element."
>
> 	[[5th paragraph]] "[...] {{The}} slots of {{the}} index all have
> 	to contain {{the same[?] number of pages}}. The lookup is then
> 	just a couple of operations - slice {{the}} upper bits, {{then}}
> 	look{{ }}up {{the}} index for these{{.}} The pseudo-code is:"
>
> 	[[Following the code example]] "[...] Where if the elemsPerPage
> 	is a power of 2 the whole process is a handful of simple
> 	instructions and 2 array reads.  {{Subsequent}} levels of
> 	{{the}} trie are introduced by recursing {{}} this notion - the
> 	index array is treated as values. The number of bits in {{the}}
> 	index is then again split into 2 parts, with pages over
> 	'current-index' and {{the}} new 'upper-index'."
>
> 	[[Next paragraph]] "For completeness the level 1 trie is simply
> 	an array. {{The}} current implementation takes advantage of
> 	bit-packing values when the range is known to be limited in {{}}
> 	advance (such as bool){{.}} {{S}}ee also BitPacked for enforcing
> 	it manually.  [...]"
>
> 	[[Last paragraph]] "The process of construction of a trie is
> 	more involved and is hidden from the user in a form of
> 	{{convenience}} functions: codepointTrie, codepointSetTrie and
> 	even more convenient toTrie. In general a set or built-in AA
> 	with dchar type can be turned into a trie. The trie object in
> 	this module is {{}} read-only (immutable){{;}} it's effectively
> 	frozen after construction."
>
>
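
For readers following the quoted description, here is a minimal sketch of a
two-stage table with the lookup it describes (shift for the index slot, mask
for the offset in the page). It is not the std.uni implementation: the names
TwoStageTable and buildTwoStage, the 13/8 split of the 21-bit key, and the
naive page deduplication are all assumptions made for illustration.

```d
// Hypothetical two-stage table; NOT the std.uni Trie, just an illustration.
struct TwoStageTable
{
    enum pageBits = 8;              // low 8 bits select an element in a page
    enum pageSize = 1 << pageBits;  // 256 elements per page (assumed split: 13 + 8 = 21 bits)

    ushort[] index;  // one slot per 256-key block: number of the page to use
    ubyte[] pages;   // all distinct pages, stored back to back

    // Lookup: one shift, one mask and two array reads.
    ubyte opIndex(dchar key) const @safe pure nothrow
    {
        immutable page = index[key >> pageBits];
        return pages[page * pageSize + (key & (pageSize - 1))];
    }
}

// Builds the table from a flat array with one value per code point.
// Identical pages are stored once; the linear search keeps the sketch short.
TwoStageTable buildTwoStage(const(ubyte)[] flat) @safe pure nothrow
{
    enum n = TwoStageTable.pageSize;
    assert(flat.length % n == 0);
    TwoStageTable t;
    for (size_t start = 0; start < flat.length; start += n)
    {
        const chunk = flat[start .. start + n];
        immutable pageCount = t.pages.length / n;
        size_t slot = pageCount;  // default: append as a new page
        foreach (p; 0 .. pageCount)
        {
            if (t.pages[p * n .. (p + 1) * n] == chunk)
            {
                slot = p;  // reuse an identical page
                break;
            }
        }
        if (slot == pageCount)
            t.pages ~= chunk;
        t.index ~= cast(ushort) slot;
    }
    return t;
}

void main()
{
    // Toy property over the whole codespace: 1 for ASCII digits, 0 otherwise.
    auto flat = new ubyte[0x110000];
    flat['0' .. '9' + 1] = 1;

    auto t = buildTwoStage(flat);
    assert(t['7'] == 1);
    assert(t['a'] == 0);
    assert(t['\U0010FFFF'] == 0);
}
```

With such a sparse property only two distinct pages survive (the one holding
the digits and a shared all-zero page), which is where the space saving over
a flat array comes from.
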
> The grammar nazi has run out of steam, so no more grammar nitpicks for
> now. ;-) But there are still the following questions:
>
> - Why is isControl() not pure nothrow?
>

Missed this one.

> - Why are the isX() functions @system? I would have expected they should
>    be at least @trusted? (Or are there technical problems / compiler bugs
>    preventing this?)
>

Hmm, I'm seeing this in my sources:
bool isAlpha(dchar c) @safe pure nothrow
{...}

The DDoc however shows @system.

A compiler bug?
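
One way to cross-check this independently of the DDoc output is to call the
function from code that carries the same attributes; the compiler rejects the
call if the callee is not @safe pure nothrow. A minimal sketch, where
checkIsAlpha is a hypothetical helper name and std.uni is assumed to be the
module under test:

```d
import std.uni : isAlpha;

// Compiles only if isAlpha is callable from @safe pure nothrow code.
bool checkIsAlpha(dchar c) @safe pure nothrow
{
    return isAlpha(c);
}

void main()
{
    assert(checkIsAlpha('a'));
    assert(!checkIsAlpha('1'));
}
```

If this builds, the attributes are really there and only the generated
documentation is off.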

> That's all for now. I hope you don't mind me allowing the grammar nazi
> to take over for a bit. I want Phobos documentation to be professional
> quality. :)
>

Sure, thanks.

>
> T
>


-- 
Dmitry Olshansky

