Why the hell doesn't foreach decode strings

Peter Alexander peter.alexander.au at gmail.com
Thu Oct 20 14:49:04 PDT 2011


On 20/10/11 8:37 PM, Martin Nowak wrote:
> It just took me over one hour to find out the unthinkable.
> foreach(c; str) will deduce c to immutable(char) and doesn't care about
> unicode.
> Now there is so many unicode transcoding happening in the language that
> it starts to get annoying,
> but the most basic string iteration doesn't support it by default?

D has got itself into a tricky situation in this regard. Doing it either 
way introduces an unintuitive mess.

The way it is now, you get the problem that you just described: foreach 
walks the UTF-8 code units (chars) and is unaware of Unicode code points.
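
To make the current behaviour concrete, here is a minimal sketch. The helper 
names countCodeUnits and countCodePoints are made up for illustration; the 
only language feature assumed is that giving the loop variable an explicit 
dchar type asks foreach to decode the string.

```d
import std.stdio;

// Hypothetical helper: counts iterations when the element type is deduced.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (c; s)        // c is deduced as immutable(char): one UTF-8 code unit
        ++n;
    return n;
}

// Hypothetical helper: counts iterations when decoding is requested explicitly.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)  // explicit dchar: foreach decodes UTF-8 on the fly
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units

    assert(countCodeUnits(s) == 6);
    assert(countCodePoints(s) == 5);
    writeln(countCodeUnits(s), " code units, ", countCodePoints(s), " code points");
}
```

So decoding is available, but only when you ask for it; the deduced type is 
what trips people up.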

If you changed it to decode Unicode by default, then indices won't match up:

immutable(int)[] a = ...
foreach (i, x; a)          // index first, then element
     assert(x == a[i]); // ok

immutable(char)[] b = ...
foreach (i, x; b)          // x would be a decoded code point
     assert(x == b[i]); // not necessarily!

Also, the loop won't necessarily iterate b.length times, because a 
multi-byte character spans several elements of b but produces only one 
iteration. There are inconsistencies all over the place.
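
You can already see both effects today by requesting decoding explicitly. 
The sketch below is only an illustration of that explicit form, not of any 
proposed default; decodedOffsets is a made-up helper name. With an index 
variable and a dchar element, the index is the offset of the first code unit 
of each decoded character.

```d
import std.stdio;

// Hypothetical helper: records the index foreach reports for each decoded character.
size_t[] decodedOffsets(string s)
{
    size_t[] offsets;
    foreach (i, dchar x; s)   // i is the offset of the first code unit of x
        offsets ~= i;
    return offsets;
}

void main()
{
    string b = "a\u00E9z";    // 4 UTF-8 code units, 3 code points

    auto offsets = decodedOffsets(b);
    assert(b.length == 4);
    assert(offsets.length == 3);      // fewer iterations than b.length
    assert(offsets == [0, 1, 3]);     // index 2 is skipped

    foreach (i, dchar x; b)
        writeln(i, ": x == b[i] is ", x == b[i]); // false for the two-byte character
}
```

The indices are still valid positions in b, but they are no longer 
consecutive, and b[i] is only the first code unit of the character, not the 
character itself.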

The whole mess is caused by conflating the idea of an array with a 
variable-length encoding that happens to use an array for storage. I 
don't believe there is any clean and tidy way to fix the problem without 
breaking compatibility.

