Update #1 on new std.uni
Dmitry Olshansky
dmitry.olsh at gmail.com
Thu Jan 17 11:31:25 PST 2013
17-Jan-2013 22:48, H. S. Teoh wrote:
> On Wed, Jan 16, 2013 at 02:48:30PM +0400, Dmitry Olshansky wrote:
>> 11-Jan-2013 23:31, Dmitry Olshansky wrote:
>>>
>>> The code, including extra tests and a benchmark, is here:
>>> https://github.com/blackwhale/gsoc-bench-2012
>>>
>>> And documentation:
>>> http://blackwhale.github.com/phobos/uni.html
>>>
>>
>> First of all, @safe, pure and nothrow are back. Let me know if
>> anything is still missing them.
>>
>> OK, I've made an extra pass through docs with these things in mind:
>> - getting the introduction & terminology part right
>> - more explanations and details where applicable
>> (let me know if that's too much / too little / wrong)
>> - hiding away the truly generic (and not easy to use) Trie from
>> documentation
>> - old deprecated stuff is hidden from docs to discourage its use
> [...]
>
> Looks much better now!
>
> Some nitpicks:
Great ;)
> - Under Overview:
>
> [4th paragraph] "It's recognized that an application may need
> further enhancements and extensions. It could be the need for
> less commonly known algorithms or tailoring existing ones for
> regional-specific needs. To help users with building any extra
> functionality beyond the core primitives the module provides:"
>
> The grammar nazi in me thinks a better wording might be (changes
> delimited by {{}}):
>
> "It's recognized that an application may need further
> enhancements and extensions{{, such as}} less{{-}}commonly known
> algorithms{{,}} or tailoring existing ones for
> {{region}}-specific needs. To help users with building any extra
> functionality beyond the core primitives{{,}} the module
> provides:"
>
> The second item in the subsequent list:
>
> A way to construct optimal packed multi-stage tables also known
> as a special case of Trie. {{The functions}} codepointTrie,
> codepointSetTrie construct custom tries that map dchar to value.
> The end result is {{a}} fast and predictable Ο(1) lookup that
> powers functions like isAlpha {{and}} combiningClass{{,}} but
> for user-defined data sets.
>
> The last item in the list:
>
> Access to the commonly{{-}}used predefined sets of code points.
> The commonly{{-}}defined one{{s}} can be observed in the CLDR
> utility, on {{the}} page property index. {{S}}upported ones
> include Script, Block and General Category. See unicode for easy
> {{}} compile-time checked queries.
>
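For illustration, the pieces named in these two list items might be combined
as below. This is a sketch against std.uni as it later shipped in Phobos; the
draft under review may differ, and the Cyrillic script and the 13/8 split of
the 21 key bits are arbitrary choices.

```d
import std.uni;

void main()
{
    // Compile-time checked query for a predefined set of code points.
    auto cyrillic = unicode.Cyrillic;

    // Two-stage trie over that set; the stage sizes must add up to 21 bits.
    auto trie = codepointSetTrie!(13, 8)(cyrillic);

    assert(trie['\u0416']);   // U+0416 CYRILLIC CAPITAL LETTER ZHE
    assert(!trie['a']);       // Latin letters are not in the set
}
```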
> - Under Terminology:
>
> [[3rd paragraph]] "The minimal bit combination that can represent
> a unit of encoded text for processing or interchange. Depending
> on the encoding this could be: 8-bit code units in the UTF-8
> (($D char)), [...]"
>
> I think you transposed the "$(" here. :)
>
Looks like one commit wasn't pushed :(
I'll peruse your wording though.
> The last sentence in this section appears to be truncated. Maybe a
> runaway DDoc macro somewhere earlier?
>
> - Under Construction of lookup tables, the grammar nazi says:
>
> [[1st sentence]] "{{The}} Unicode standard describes a set of
> algorithms that {{}} depend on having {{the}} ability to quickly
> look{{ }}up various properties of a code point. Given the
> codespace of about 1 million code points, it is not a trivial
> task to provide a space{{-}}efficient solution for the {{}}
> multitude of properties."
>
> [[2nd paragraph]] "[...] Hash-tables {{have}} enormous memory
> footprint and binary search over intervals is not fast enough
> for some heavy-duty algorithms."
>
> [[3rd paragraph]] "{{(P }}The recommended solution (see Unicode
> Implementation Guidelines) {{}} is using multi-stage tables{{,}}
> that is{{,}} {{an instance}} of Trie with integer keys and {{a}}
> fixed number of stages. For the {{remainder}} of {{this}}
> section {{it will be}} called {{a}} fixed trie. The following
> describes a particular implementation that is aimed for the
> speed of access at the expense of ideal size savings."
>
> [[4th paragraph]] "[...] Split {{the}} number of bits in a key
> (code point, 21 bits) {{into}} 2 components (e.g. 13 and 8). The
> first is the number of bits in the index of {{the}} trie and the
> other is {{the}} number of bits {{in each}} page of {{the}}
> trie. The layout of trie is then an array of size
> 2^^bits-of-index followed by an array of memory chunks of size
> 2^^bits-of-page/size-of-element."
>
> [[5th paragraph]] "[...] {{The}} slots of {{the}} index all have
> to contain {{the same[?] number of pages}}. The lookup is then
> just a couple of operations - slice {{the}} upper bits, {{then}}
> look{{ }}up {{the}} index for these{{.}} The pseudo-code is:"
>
> [[Following the code example]] "[...] Where if the elemsPerPage
> is a power of 2 the whole process is a handful of simple
> instructions and 2 array reads. {{Subsequent}} levels of
> {{the}} trie are introduced by recursing {{}} this notion - the
> index array is treated as values. The number of bits in {{the}}
> index is then again split into 2 parts, with pages over
> 'current-index' and {{the}} new 'upper-index'."
>
> [[Next paragraph]] "For completeness the level 1 trie is simply
> an array. {{The}} current implementation takes advantage of
> bit-packing values when the range is known to be limited in {{}}
> advance (such as bool){{.}} {{S}}ee also BitPacked for enforcing
> it manually. [...]"
>
> [[Last paragraph]] "The process of construction of a trie is
> more involved and is hidden from the user in a form of
> {{convenience}} functions: codepointTrie, codepointSetTrie and
> even more convenient toTrie. In general a set or built-in AA
> with dchar type can be turned into a trie. The trie object in
> this module is {{}} read-only (immutable){{;}} it's effectively
> frozen after construction."
>
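The two-level lookup described above can be sketched roughly as follows. This
is only an illustration, not the std.uni implementation: TwoStageTable, build,
pageBits and the 13/8 split (which sums to the 21 bits of a code point) are
all hypothetical choices, and values are stored as unpacked bool.

```d
import std.uni : isAlpha;

enum pageBits  = 8;                       // low bits: offset inside a page
enum pageSize  = 1 << pageBits;           // 256 values per page
enum indexSize = 0x110000 >> pageBits;    // one index slot per 256 code points

// Hypothetical two-stage table, not the Trie type from std.uni.
struct TwoStageTable
{
    ushort[] index;  // index[c >> pageBits] is the page number for c
    bool[]   pages;  // distinct pages stored back to back

    // Two array reads plus a shift and a mask. Expects c <= 0x10FFFF.
    bool opIndex(dchar c) const @safe pure nothrow
    {
        immutable page = index[c >> pageBits];
        return pages[page * pageSize + (c & (pageSize - 1))];
    }
}

// Builds the table from a predicate, sharing identical pages.
TwoStageTable build(scope bool delegate(dchar) pred)
{
    TwoStageTable t;
    t.index = new ushort[indexSize];
    ushort[immutable(bool)[]] seen;       // page contents -> page number
    foreach (block; 0 .. indexSize)
    {
        auto page = new bool[pageSize];
        foreach (i; 0 .. pageSize)
            page[i] = pred(cast(dchar)(block * pageSize + i));
        auto key = page.idup;
        if (auto p = key in seen)
            t.index[block] = *p;          // reuse an identical earlier page
        else
        {
            immutable n = cast(ushort)(t.pages.length / pageSize);
            seen[key] = n;
            t.pages ~= page;
            t.index[block] = n;
        }
    }
    return t;
}

void main()
{
    auto table = build((dchar c) => isAlpha(c));
    assert(table['a'] && !table['1']);
    assert(table.pages.length < 0x110000); // identical pages were shared
}
```

Because pageSize is a power of 2, the multiply and the mask reduce to a shift
and an AND, which matches the "handful of simple instructions and 2 array
reads" in the quoted text.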
>
> The grammar nazi has run out of steam, so no more grammar nitpicks for
> now. ;-) But there are still the following questions:
>
> - Why is isControl() not pure nothrow?
>
Missed this one.
> - Why are the isX() functions @system? I would have expected they should
> be at least @trusted? (Or are there technical problems / compiler bugs
> preventing this?)
>
M-hm I'm seeing this in my sources:
bool isAlpha(dchar c) @safe pure nothrow
{...}
The DDoc however shows @system.
A compiler bug?
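One way to tell whether the source or the generated page is right is to call
the function from code that carries the same attributes. A minimal sketch,
assuming the isAlpha in question is the one importable from std.uni; check is
a hypothetical helper:

```d
import std.uni : isAlpha;

// Compiles only if isAlpha really is @safe, pure and nothrow, because a
// function with these attributes cannot call one that lacks them.
bool check(dchar c) @safe pure nothrow
{
    return isAlpha(c);
}

void main()
{
    assert(check('a'));
    assert(!check('1'));
}
```

If this compiles, the attributes are present in the source and the @system
shown in the docs comes from the documentation build.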
> That's all for now. I hope you don't mind me allowing the grammar nazi
> to take over for a bit. I want Phobos documentation to be professional
> quality. :)
>
Sure, thanks.
>
> T
>
--
Dmitry Olshansky