Making all strings UTF ranges has some risk of WTF
Ali Çehreli
acehreli at yahoo.com
Wed Feb 3 22:58:42 PST 2010
Andrei Alexandrescu wrote:
> It's no secret that string et al. are not a magic recipe for writing
> correct Unicode code. However, things are pretty good and could be
> further improved by operating the following changes in std.array and
> std.range:
>
> - make front() and back() for UTF-8 and UTF-16 automatically decode the
> first and last Unicode character
They would yield dchar, right? Wouldn't that cause trouble in templated
code?
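
To make the concern concrete, here is a small sketch. It assumes a Phobos in which the proposed decoding is in effect (current releases behave this way); the variable name is only for illustration. Templated code that takes its element type from indexing would see a different type than code that takes it from the range interface:

```d
import std.range : ElementType;

void main()
{
    string s;

    // Indexing a string yields a single UTF-8 code unit...
    static assert(is(typeof(s[0]) == immutable(char)));

    // ...while the range element type is a decoded code point.
    static assert(is(ElementType!string == dchar));
    static assert(is(ElementType!wstring == dchar));

    // dstring is not affected: one code unit is one code point.
    static assert(is(ElementType!dstring == immutable(dchar)));
}
```

A template that declares a buffer as ElementType!R[] would therefore end up with dchar[] rather than char[] for a string argument.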
> - make popFront() and popBack() skip one entire Unicode character
> (instead of just one code unit)
That's perfectly fine, because the opposite operations do "encode":
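    string s = "ağ";          // 'a' is 1 UTF-8 code unit, 'ğ' is 2
    assert(s.length == 3);    // length counts code units
    s ~= 'ş';                 // appending encodes 'ş' as 2 code units
    assert(s.length == 5);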
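
The removing side could look like the following sketch, again assuming a Phobos where popFront() skips a whole character as proposed (which is what current releases do):

```d
import std.range : empty, front, popFront;

void main()
{
    string s = "ağ";

    assert(s.front == 'a');
    s.popFront();              // removes 'a': 1 code unit
    assert(s.length == 2);

    assert(s.front == 'ğ');    // decoded from 2 code units
    s.popFront();              // removes 'ğ': 2 code units
    assert(s.empty);
}
```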
> - alter isRandomAccessRange to return false for UTF-8 and UTF-16 strings
Ok.
> - change hasLength to return false for UTF-8 and UTF-16 strings
I don't understand that one. Strings do have lengths; it is just that
adding or removing a character does not necessarily change length by 1
for those types. I don't think it's a big deal, because the language
already behaves that way for them. dstring does not have that problem
and could be used when a by-1 change is desired.
> (b) Operate the change and mention that in range algorithms you should
> check hasLength and only then use "length" under the assumption that it
> really means "elements count".
The change sounds ok and hasLength should yield true. Or... can it
return an enum { no, kind_of, yes } ;)
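
The "kind_of" case can be seen in a short sketch that compares the two counts. It uses std.range.walkLength to count elements by iteration and assumes a Phobos where string iteration decodes, as proposed:

```d
import std.range : walkLength;

void main()
{
    string s = "ağ";
    assert(s.length == 3);        // code units
    assert(s.walkLength == 2);    // decoded characters

    dstring d = "ağ"d;
    assert(d.length == 2);        // for dstring the two counts agree
    assert(d.walkLength == 2);
}
```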
The current utf.decode takes the index by reference and advances it by
the number of code units it consumed. Could popFront() do something
similar?
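
For reference, a minimal sketch of that decode behavior, using std.utf.decode with a string and an index passed by reference:

```d
import std.utf : decode;

void main()
{
    string s = "ağ";
    size_t i = 0;

    dchar c = decode(s, i);    // reads 'a', advances i by 1
    assert(c == 'a' && i == 1);

    c = decode(s, i);          // reads 'ğ', advances i by 2
    assert(c == 'ğ' && i == 3);
}
```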
I think that's it: front() and popFront() are separated for cohesion.
What is causing trouble here is the separation of the "by-N" step size
from popFront(). You are concerned that the user will assume that
popFront() reduces length by 1. I think that is the problem here.
How about something like:
    // returns the amount that the next popFront() will reduce length
    int nextStep();
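
A rough sketch of how that might work as a free function for string. Only the name nextStep comes from the idea above; the signature and the use of std.utf.stride are my assumptions, and popFront() is assumed to skip a whole character as proposed:

```d
import std.range : popFront;
import std.utf : stride;

// Hypothetical helper: the number of code units that the next
// popFront() would remove, or 0 for an empty string.
size_t nextStep(string s)
{
    return s.length == 0 ? 0 : stride(s, 0);
}

void main()
{
    string s = "ağ";

    assert(nextStep(s) == 1);    // 'a' occupies 1 code unit
    s.popFront();

    assert(nextStep(s) == 2);    // 'ğ' occupies 2 code units
    s.popFront();

    assert(nextStep(s) == 0);    // nothing left to pop
}
```

An algorithm that tracks length could then subtract nextStep() before each popFront() instead of assuming 1.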
Ali