Formal Review of std.uni
Dmitry Olshansky
dmitry.olsh at gmail.com
Sun May 12 12:27:59 PDT 2013
30-Apr-2013 23:17, Jonathan M Davis wrote:
> On Tuesday, April 30, 2013 15:13:14 Dmitry Olshansky wrote:
>> Unicode --> can't be done on character by character basis
>
> Sure it can. It operates on dchar.
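Getting back to this.
No, it can't. I'd hate to break the illusion, but one key term here is
Unicode case folding, where a single code point may fold to several.
Another is the combining character sequence, where several code points
form one user-perceived character.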
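To illustrate the case folding point, here is a minimal sketch in Python
(not D, and not the std.uni implementation). The helper names are
hypothetical; the only library behaviour relied on is that Python's
str.casefold() applies full Unicode case folding.

```python
def fold_per_code_point(s):
    """Hypothetical ASCII-style folding: every code point maps to exactly
    one code point, so any mapping that would expand is skipped."""
    out = []
    for ch in s:
        low = ch.lower()
        out.append(low if len(low) == 1 else ch)
    return "".join(out)


def equal_per_code_point(a, b):
    # Compare after folding one code point at a time.
    return fold_per_code_point(a) == fold_per_code_point(b)


def equal_full_folding(a, b):
    # Full case folding may turn one code point into several (U+00DF -> "ss").
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(equal_per_code_point("Stra\u00dfe", "STRASSE"))
    print(equal_full_folding("Stra\u00dfe", "STRASSE"))
```

The per-code-point comparison treats the two spellings as different,
while full folding treats them as equal, because U+00DF folds to two
code points.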
> So, with how it's been, std.uni would only be operating on dchars, and putting
> a function in there which operated on strings wouldn't make any sense. Maybe
> that doesn't work if you've done a bunch of grapheme stuff, and things will
> have to be adjusted, but it would be a definite shift to put anything in
> std.uni which operated on strings, and I think that it would need some definite
> justification (and there's a good chance that I'd be inclined to argue that it
> should still go in std.string, possibly using some internal modules if
> necessary).
The justification is that we'd rather have exactly one module dealing
with a bunch of Unicode data arranged into intricate tables.
Strictly speaking, I'd abolish every Unicode-related algorithm in
std.string, since it's almost certainly doing it wrong anyway (I've
checked only 2, and both are broken).
There is no sign of any Unicode standard being used, just the
fallacious logic: replace byte with dchar and use the same algorithm as
with ASCII. That won't work.
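A small sketch of why swapping the element type is not enough, again in
Python rather than D. String reversal is used purely as an example of an
ASCII-era algorithm; it is not one of the std.string functions checked
above. The clusters() helper is hypothetical and deliberately simplified:
it only attaches combining marks to the preceding base code point, whereas
real grapheme segmentation follows the rules of UAX #29.

```python
import unicodedata


def clusters(s):
    """Group each base code point with the combining marks that follow it.
    Simplified assumption: a non-zero canonical combining class means
    'belongs to the previous cluster'."""
    out = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def reverse_by_code_point(s):
    # The ASCII algorithm with the element type widened to a code point.
    return s[::-1]


def reverse_by_cluster(s):
    # Reverse whole clusters so each mark stays on its own base.
    return "".join(reversed(clusters(s)))


if __name__ == "__main__":
    s = "ae\u0301"  # 'a', then 'e' followed by COMBINING ACUTE ACCENT
    print(ascii(reverse_by_code_point(s)))
    print(ascii(reverse_by_cluster(s)))
```

Reversing by code point detaches the accent from its base letter;
reversing by cluster keeps the sequence intact.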
>
> But clearly I need to take the time to take a look at what you've actually
> done (I keep meaning to but haven't gotten around to it yet). It had been my
> impression that what you were doing was primarily a matter of improving the
> implementation, but it sounds like you're doing something beyond that.
Take a peek at icmp and sicmp in the new std.uni.
The current fork of Phobos is here:
https://github.com/blackwhale/phobos/tree/new-std-uni
Eventually we'd have to do a bit more in the same direction, e.g. title
casing, splitting by word boundary, etc. (all of these need fixing in
std.string).
Also, all of the core tools are now in the open: CodepointSet, and
generating Tries from sets and associative arrays.
--
Dmitry Olshansky