Making all strings UTF ranges has some risk of WTF
    Ali Çehreli 
    acehreli at yahoo.com
       
    Wed Feb  3 22:58:42 PST 2010
    
    
  
Andrei Alexandrescu wrote:
 > It's no secret that string et al. are not a magic recipe for writing
 > correct Unicode code. However, things are pretty good and could be
 > further improved by operating the following changes in std.array and
 > std.range:
 >
 > - make front() and back() for UTF-8 and UTF-16 automatically decode the
 > first and last Unicode character
They would yield dchar, right? Wouldn't that cause trouble in templated 
code?
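To make the templated-code worry concrete, here is a minimal sketch. It assumes the proposed behaviour is in place, which is what std.range reports for narrow strings in present-day Phobos; firstElement is a hypothetical helper made up for illustration, not a library function.

```d
import std.range : ElementType, front;

// Hypothetical generic helper: copies the first element of any range.
auto firstElement(R)(R r)
{
    ElementType!R e = r.front;
    return e;
}

void main()
{
    string s = "ağ";
    // Indexing yields a code unit, but the range interface yields a code point.
    static assert(is(typeof(s[0]) == immutable(char)));
    static assert(is(ElementType!string == dchar));
    static assert(is(typeof(firstElement(s)) == dchar));
    assert(firstElement(s) == 'a');
}
```

Generic code that assumes the element type equals typeof(r[0]) would see char in one place and dchar in the other.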
 > - make popFront() and popBack() skip one entire Unicode character
 > (instead of just one code unit)
That's perfectly fine, because the opposite operation, appending, already 
"encodes":
     string s = "ağ";
     assert(s.length == 3);   // 'a' is 1 code unit, 'ğ' is 2
     s ~= 'ş';                // appending encodes 'ş' as 2 more code units
     assert(s.length == 5);
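For symmetry, a sketch of the removing side. It assumes popFront() behaves as proposed, which is how std.range treats strings in present-day Phobos:

```d
import std.range : empty, front, popFront;

void main()
{
    string s = "ağ";
    assert(s.length == 3);   // 1 code unit for 'a', 2 for 'ğ'

    s.popFront();            // drops 'a'
    assert(s.length == 2);
    assert(s.front == 'ğ');  // decoded as a whole code point

    s.popFront();            // drops both code units of 'ğ'
    assert(s.empty);
}
```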
 > - alter isRandomAccessRange to return false for UTF-8 and UTF-16 strings
Ok.
 > - change hasLength to return false for UTF-8 and UTF-16 strings
I don't understand that one. Strings do have lengths; it is just that adding 
or removing a character does not necessarily change the length by 1 for 
those types. I don't think that's a big deal, because the language already 
behaves that way for them. dstring does not have that problem and could be 
used when a by-1 change is desired.
 > (b) Operate the change and mention that in range algorithms you should
 > check hasLength and only then use "length" under the assumption that it
 > really means "elements count".
The change sounds ok, but hasLength should still yield true. Or... could it 
return an enum { no, kind_of, yes } ;)
The current utf.decode takes the index by reference and advances it by the 
number of code units it consumed. Could popFront() do something similar?
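For reference, this is roughly how that looks with std.utf.decode (the variable names are just for illustration):

```d
import std.utf : decode;

void main()
{
    string s = "ağ";
    size_t i = 0;

    dchar c = decode(s, i);      // i is passed by ref and advanced
    assert(c == 'a' && i == 1);

    c = decode(s, i);
    assert(c == 'ğ' && i == 3);  // 'ğ' consumed two code units
    assert(i == s.length);
}
```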
I think that's it: front() and popFront() are separated for cohesion. 
What is causing trouble here is that the "by-N" amount is separated from 
popFront(). You are concerned that the user will assume that popFront() 
reduces length by 1. I think that is the problem here.
How about something like:
   // returns the amount that the next popFront() will reduce length
   int nextStep();
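A rough sketch of what I mean, written as a free function over string. nextStep is hypothetical; I am assuming it could be built on std.utf.stride, and I return size_t here only to match length. It also assumes a non-empty string and the proposed popFront() behaviour:

```d
import std.range : popFront;
import std.utf : stride;

// Hypothetical: number of code units the next popFront() would remove.
// Assumes s is not empty.
size_t nextStep(string s)
{
    return stride(s, 0);
}

void main()
{
    string s = "ğa";
    immutable before = s.length;
    immutable step = nextStep(s);
    s.popFront();
    assert(step == 2);
    assert(before - s.length == step);
}
```

That way the caller can ask for the amount instead of assuming it is 1.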
Ali
    
    