[Issue 10203] std.string.toUpperInPlace is... not in place

d-bugmail at puremagic.com
Thu May 30 02:13:54 PDT 2013


http://d.puremagic.com/issues/show_bug.cgi?id=10203



--- Comment #2 from monarchdodra at gmail.com 2013-05-30 02:13:50 PDT ---
(In reply to comment #1)
> Fixed here
> 
> https://github.com/D-Programming-Language/phobos/pull/1322

It turns out, upon investigation, that my fix is incorrect. More problematic,
though, the review revealed cases where toUpperInPlace simply *cannot* work in
place: there are characters whose UTF-8 representation has a different length
in upper case than in lower case.

This means that the lowercase representation can be longer than the original
string.
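To make the length mismatch concrete, here is a minimal D sketch, assuming current Phobos, that measures how many UTF-8 bytes a single code point gains or loses under `std.uni.toLower`/`toUpper` (which apply the simple one-to-one case mappings to a `dchar`), using `std.utf.codeLength!char` for the encoded size. The helpers `lowerLengthDelta` and `upperLengthDelta` are hypothetical, and the characters other than 'İ' are illustrative picks, not taken from the review.

```d
import std.uni : toLower, toUpper;
import std.utf : codeLength;

// Hypothetical helpers (not part of Phobos): the change in UTF-8 length,
// in bytes, when one code point goes through the simple (1:1) case mapping.
int lowerLengthDelta(dchar c)
{
    return cast(int) codeLength!char(toLower(c)) - cast(int) codeLength!char(c);
}

int upperLengthDelta(dchar c)
{
    return cast(int) codeLength!char(toUpper(c)) - cast(int) codeLength!char(c);
}

unittest
{
    assert(lowerLengthDelta('\u0130') == -1); // 'İ' (2 bytes) -> 'i' (1 byte)
    assert(lowerLengthDelta('\u023A') == 1);  // 'Ⱥ' (2 bytes) -> U+2C65 (3 bytes)
    assert(upperLengthDelta('\u0250') == 1);  // 'ɐ' (2 bytes) -> U+2C6F (3 bytes)
    assert(upperLengthDelta('\u0131') == -1); // 'ı' (2 bytes) -> 'I' (1 byte)
}
```

'Ⱥ' and 'ɐ' are the troublesome direction: the converted character needs one byte more than the original occupies, so the buffer has no room for it.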

How could this be done in place?

It also reveals another problem: if the resulting string is shorter, then
slices that alias the same data will now reference garbage values. For
example:

'İ' (U+0130, two bytes in UTF-8) becomes 'i' (one byte).

So when I take the string "İa":
char[] a = "\xC4\xB0\x61".dup; // mutable copy: literals are immutable
auto b = a;
toLowerInPlace(a);

//Now:
//a == "\x69\xB0"
//b == "\x69\xB0\x61" Oops: Trailing code unit :/

Or, say I have "aİa" and want to lowercase only the "İ" in place:
char[] a = "\x61\xC4\xB0\x61".dup;
auto b = a[1 .. 3];
toLowerInPlace(b);

//Now:
//b == "\x69" //OK
//a == "\x61\x69\xB0\x61" //Wait, what is that B0 doing here?
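Both effects can be reproduced with a deliberately naive in-place lowercasing routine. The sketch is a hypothetical `naiveLowerInPlace`, not Phobos' actual implementation (from 2013 or today): it overwrites the buffer code point by code point and then shrinks only the slice it was handed, so any other slice of the same memory keeps the stale bytes. It also has to give up whenever a lowercase form would overwrite input it has not read yet.

```d
import std.uni : toLower;
import std.utf : decode, encode;

// Hypothetical naive in-place lowercasing, NOT Phobos' implementation.
// Each code point is decoded, lowercased with the simple (1:1) mapping and
// re-encoded over the same buffer; `s` is then shrunk to the bytes written.
// Returns false if a lowercase form would overwrite unread input; in that
// case `s` may already be partially rewritten.
bool naiveLowerInPlace(ref char[] s)
{
    size_t readPos = 0;  // next byte to decode
    size_t writePos = 0; // next byte to overwrite
    while (readPos < s.length)
    {
        immutable dchar c = decode(s, readPos); // advances readPos
        char[4] buf;
        immutable size_t n = encode(buf, toLower(c));
        if (writePos + n > readPos)
            return false; // the result grew: no room left in the buffer
        s[writePos .. writePos + n] = buf[0 .. n];
        writePos += n;
    }
    s = s[0 .. writePos]; // only this slice shrinks; other views keep old bytes
    return true;
}

unittest
{
    char[] a = "\x61\xC4\xB0\x61".dup; // "aİa"
    char[] b = a[1 .. 3];              // "İ"
    assert(naiveLowerInPlace(b));
    assert(b == "i");
    // `a` still sees the stale continuation byte 0xB0.
    const ubyte[] expected = [0x61, 0x69, 0xB0, 0x61];
    assert(cast(const(ubyte)[]) a == expected);

    char[] c = "\u023A".dup; // 'Ⱥ' lowercases to 3 bytes
    assert(!naiveLowerInPlace(c));
}
```

Because writes never overtake reads, a character that shrinks early in the string can make room for one that grows later ("İȺ" succeeds), but a string starting with 'Ⱥ' cannot be handled without reallocating.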

--------
It seems to me that, overall, toLowerInPlace is a broken function: it cannot
honor the specification it promises, and its behavior violates the principle
of least surprise.

I think it should either be tagged with a massive red "unsafe" warning, or be
deprecated.
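For contrast, a conversion that returns a new string has neither problem, because it never writes through a slice that someone else may hold. A minimal sketch, assuming current Phobos, where `std.uni.toLower` applied to a string returns its result as a separate string whenever anything changes:

```d
import std.uni : toLower;

unittest
{
    string a = "\u023A";   // 'Ⱥ', 2 UTF-8 bytes
    string b = a;          // a second view of the same data
    string c = toLower(a); // U+2C65, 3 UTF-8 bytes, in newly allocated memory
    assert(c == "\u2C65" && c.length == 3);
    assert(a == "\u023A" && b is a); // the original and its alias are untouched
}
```

The cost is an allocation per call, which is exactly what an in-place variant promises to avoid and, as shown above, cannot always deliver.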



More information about the Digitalmars-d-bugs mailing list