Making all strings UTF ranges has some risk of WTF

Ali Çehreli acehreli at yahoo.com
Wed Feb 3 22:58:42 PST 2010


Andrei Alexandrescu wrote:
 > It's no secret that string et al. are not a magic recipe for writing
 > correct Unicode code. However, things are pretty good and could be
 > further improved by operating the following changes in std.array and
 > std.range:
 >
 > - make front() and back() for UTF-8 and UTF-16 automatically decode the
 > first and last Unicode character

They would yield dchar, right? Wouldn't that cause trouble in templated 
code?
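
To make the concern concrete, here is a minimal sketch. It assumes front() decodes as proposed (which, as far as I know, is how later Phobos releases behave). The helper first() is hypothetical and only stands in for "some templated code". Indexing and front then disagree about the element type:

```d
import std.range.primitives : front, ElementType;

// Hypothetical generic helper: returns the first element of any range.
auto first(R)(R r)
{
    return r.front;
}

void main()
{
    string s = "ağ";

    // Indexing yields a UTF-8 code unit...
    static assert(is(typeof(s[0]) == immutable(char)));

    // ...while front (assuming it decodes) yields a whole code point.
    static assert(is(typeof(first(s)) == dchar));
    static assert(is(ElementType!string == dchar));

    assert(first(s) == 'a');
}
```

Templated code that declares a variable of type typeof(r[0]) and then assigns r.front to it would no longer compile for string under that assumption.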

 > - make popFront() and popBack() skip one entire Unicode character
 > (instead of just one code unit)

That's perfectly fine, because the opposite operations do "encode":

     string s = "ağ";
     assert(s.length == 3);
     s ~= 'ş';
     assert(s.length == 5);
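
The removing side can be sketched the same way, again assuming popFront() and popBack() skip a whole code point as proposed. The helper trimBoth() is a hypothetical name used only for illustration:

```d
import std.range.primitives : popFront, popBack;

// Hypothetical helper: drops one code point from each end.
string trimBoth(string s)
{
    s.popFront();
    s.popBack();
    return s;
}

void main()
{
    string s = "ğaş";            // 2 + 1 + 2 code units
    assert(s.length == 5);

    s.popFront();                // drops 'ğ': two code units
    assert(s == "aş" && s.length == 3);

    s.popBack();                 // drops 'ş': two code units
    assert(s == "a" && s.length == 1);

    assert(trimBoth("ğaş") == "a");
}
```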

 > - alter isRandomAccessRange to return false for UTF-8 and UTF-16 strings

Ok.

 > - change hasLength to return false for UTF-8 and UTF-16 strings

I don't understand that one. Strings have lengths; it is just that adding 
or removing a character does not change the length by exactly 1 for those 
types. I don't think that's a big deal, because the language already 
behaves that way for them. dstring does not have that problem and could 
be used when a by-1 change is desired.
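
A small sketch of the difference, assuming a decoding front so that walking the string counts code points. walkLength is from std.range; the literals are only examples:

```d
import std.range : walkLength;

void main()
{
    string  s = "ağ";    // UTF-8: 'a' is 1 code unit, 'ğ' is 2
    dstring d = "ağ"d;   // UTF-32: one code unit per code point

    assert(s.length == 3);       // code units
    assert(s.walkLength == 2);   // code points, counted by iterating
    assert(d.length == 2);       // for dstring the two agree
}
```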

 > (b) Operate the change and mention that in range algorithms you should
 > check hasLength and only then use "length" under the assumption that it
 > really means "elements count".

The change sounds ok and hasLength should yield true. Or... can it 
return an enum { no, kind_of, yes } ;)

The current std.utf.decode takes the index by reference and advances it 
by the number of code units it consumed. Could popFront() do something 
similar?
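
For reference, this is the decode behavior I mean, as a minimal sketch (the string is only an example):

```d
import std.utf : decode;

void main()
{
    string s = "ağ";
    size_t i = 0;

    dchar c = decode(s, i);   // reads 'a' and advances i by 1
    assert(c == 'a' && i == 1);

    c = decode(s, i);         // reads 'ğ' and advances i by 2
    assert(c == 'ğ' && i == 3);
}
```

The caller learns the step size from the change in the index, without asking for it separately.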

I think that's it: front() and popFront() are separated for cohesion. 
What is causing trouble here is that the "by-N" step size is separated 
from popFront().

Your concern is that the user will assume popFront() reduces the length 
by 1. I think that is the problem here.

How about something like:

   // returns the amount by which the next popFront() will reduce length
   int nextStep();
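
A rough sketch of how that could look for string. This is an assumption on my part, not an existing Phobos function: nextStep() is written here as a free function built on std.utf.stride, returns size_t rather than int, and assumes popFront() skips a whole code point:

```d
import std.range.primitives : empty, popFront;
import std.utf : stride;

// Hypothetical: number of code units the next popFront() would remove.
size_t nextStep(string s)
{
    return s.empty ? 0 : stride(s, 0);
}

void main()
{
    string s = "ağ";
    while (!s.empty)
    {
        immutable before = s.length;
        immutable step = nextStep(s);
        s.popFront();
        // The length shrinks by exactly the announced step.
        assert(before - s.length == step);
    }
}
```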

Ali


