COW vs. in-place.

Thomas Kuehne thomas-dloop at kuehne.cn
Wed Aug 2 13:01:11 PDT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dawid Ciężarkiewicz wrote on 2006-08-01:
> Lionello Lunesu wrote:
>
>> 
>> "Dave" <Dave_member at pathlink.com> wrote in message
>> news:ealack$bjg$1 at digitaldaemon.com...
>>>
>>> What if selected functions in phobos were modified to take an optional
>>> parameter that specified COW or in-place? The default for each would be
>>> whatever they do now.
>>>
>>> For example, toupper and tolower?
>>>
>>> How many times have we seen something like this:
>>>
>>> str = toupper(str); // or equivalent in another language.
>> 
>> With str being a UTF-8 string, I don't think you can guarantee that it CAN
>> be made uppercase in-place. It seems quite possible to me that some
>> uppercase Unicode characters are larger than their lowercase versions,
>> possibly crossing a UTF-8 byte-count boundary. But there are other string
>> functions that don't have this problem.
>
> This _is_ a problem.
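A minimal sketch of Dave's proposed optional parameter, written in Python rather than D for brevity. The names `to_upper` and `in_place` are hypothetical and not part of Phobos. It assumes the in-place path may be taken only when the uppercased text occupies exactly the same number of UTF-8 bytes, which is the condition Lionello raises. Note that Python's `str.upper` applies full Unicode case mapping, so it can also change the code point count (for example "ß" becomes "SS").

```python
def to_upper(buf: bytearray, in_place: bool = False) -> bytearray:
    """Hypothetical toupper with an optional in-place mode.

    buf holds UTF-8 text. With in_place=False (copy-on-write style) a new
    buffer is always returned and buf is untouched. With in_place=True, buf
    is overwritten only if the uppercased text has the same UTF-8 byte
    length; otherwise a new buffer is returned and buf is left as it was.
    """
    upper = buf.decode("utf-8").upper().encode("utf-8")
    if in_place and len(upper) == len(buf):
        buf[:] = upper  # equal length: reuse the caller's storage
        return buf
    return bytearray(upper)  # copy: requested, or byte length changed


# ASCII keeps its length, so the caller's buffer is modified in place.
s = bytearray("abc", "utf-8")
assert to_upper(s, in_place=True) is s and s == bytearray(b"ABC")

# U+017F LATIN SMALL LETTER LONG S (2 bytes) uppercases to "S" (1 byte),
# so the in-place request falls back to returning a copy.
t = bytearray("\u017f", "utf-8")
u = to_upper(t, in_place=True)
assert u is not t and u == bytearray(b"S")
```

Because the in-place request can fall back to a copy, callers still have to use the return value, so the `str = toupper(str)` idiom does not go away; the flag only saves an allocation in the common equal-length case.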

http://www.unicode.org/reports/tr21/

From ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt:
# The data supports both implementations that require simple case
# foldings (where string lengths don't change), and implementations
# that allow full case folding (where string lengths may grow).

Simple case folding keeps the code point count constant; the UTF-8
fragment (code unit) count, however, is a problem. In the current data
(5.0.0, 2006-03-03, 08:22:43 GMT) there are 9 + 2 cases where the
fragment count changes:

# 017F; C; 0073; # LATIN SMALL LETTER LONG S
# 023A; C; 2C65; # LATIN CAPITAL LETTER A WITH STROKE
# 023E; C; 2C66; # LATIN CAPITAL LETTER T WITH DIAGONAL STROKE
# 1FBE; C; 03B9; # GREEK PROSGEGRAMMENI
# 2126; C; 03C9; # OHM SIGN
# 212A; C; 006B; # KELVIN SIGN
# 212B; C; 00E5; # ANGSTROM SIGN
# 2C62; C; 026B; # LATIN CAPITAL LETTER L WITH MIDDLE TILDE
# 2C64; C; 027D; # LATIN CAPITAL LETTER R WITH TAIL

Only used for Turkic languages (tr, az):
# 0049; T; 0131; # LATIN CAPITAL LETTER I
# 0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
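The byte-count claim can be checked mechanically. This Python sketch hard-codes only the pairs quoted above (it does not parse CaseFolding.txt, and the `MAPPINGS` table is an illustrative name), then compares the UTF-8 encoded length of each source and target code point. It confirms that the listed pairs change length; it does not by itself prove they are the only ones.

```python
# Simple case-folding pairs quoted above from CaseFolding.txt
# (Unicode 5.0.0): (source code point, folded code point, status).
MAPPINGS = [
    (0x017F, 0x0073, "C"),  # LATIN SMALL LETTER LONG S
    (0x023A, 0x2C65, "C"),  # LATIN CAPITAL LETTER A WITH STROKE
    (0x023E, 0x2C66, "C"),  # LATIN CAPITAL LETTER T WITH DIAGONAL STROKE
    (0x1FBE, 0x03B9, "C"),  # GREEK PROSGEGRAMMENI
    (0x2126, 0x03C9, "C"),  # OHM SIGN
    (0x212A, 0x006B, "C"),  # KELVIN SIGN
    (0x212B, 0x00E5, "C"),  # ANGSTROM SIGN
    (0x2C62, 0x026B, "C"),  # LATIN CAPITAL LETTER L WITH MIDDLE TILDE
    (0x2C64, 0x027D, "C"),  # LATIN CAPITAL LETTER R WITH TAIL
    (0x0049, 0x0131, "T"),  # LATIN CAPITAL LETTER I (Turkic only)
    (0x0130, 0x0069, "T"),  # LATIN CAPITAL LETTER I WITH DOT ABOVE (Turkic only)
]


def utf8_len(cp: int) -> int:
    """Number of UTF-8 code units (bytes) needed to encode one code point."""
    return len(chr(cp).encode("utf-8"))


def length_changes(mappings):
    """Return (source, target, old_len, new_len) for every pair whose
    UTF-8 byte length differs, i.e. cannot be folded in place."""
    result = []
    for src, dst, _status in mappings:
        old, new = utf8_len(src), utf8_len(dst)
        if old != new:
            result.append((src, dst, old, new))
    return result


if __name__ == "__main__":
    for src, dst, old, new in length_changes(MAPPINGS):
        print(f"U+{src:04X} -> U+{dst:04X}: {old} -> {new} bytes")
```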

Thomas


-----BEGIN PGP SIGNATURE-----

iD8DBQFE0RG3LK5blCcjpWoRAtjwAJ4wHpa36MrLRwlmBFs86gDdJyLHaQCfRNFI
6Ejb+99BzV5dl2QW9giF8Qg=
=h/xz
-----END PGP SIGNATURE-----



More information about the Digitalmars-d mailing list