Why the hell doesn't foreach decode strings

Martin Nowak dawg at dawgfoto.de
Thu Oct 20 16:57:06 PDT 2011


On Thu, 20 Oct 2011 21:58:20 +0200, Jonathan M Davis <jmdavisProg at gmx.com> wrote:

> On Thursday, October 20, 2011 21:37:56 Martin Nowak wrote:
>> It just took me over an hour to find out the unthinkable:
>> foreach(c; str) will infer c as immutable(char) and doesn't care about
>> Unicode. There is so much Unicode transcoding happening in the language
>> that it starts to get annoying, but the most basic string iteration
>> doesn't support it by default?
>
> Walter won't change it, because it would silently change too much code.
> Now, I'm willing to bet that in 99.9999999% of cases, it would _fix_ the
> code rather than break it, but still, he won't do it. However, the
> behavior _is_ completely consistent with the rest of the language, since
> it's the range-based stuff which decodes arrays of chars or wchars as
> characters. And it _would_ be inconsistent with all other uses of foreach
> for arrays of char or wchar to be iterated over as ranges of dchar. But
> still, it's a bug waiting to happen which doesn't really benefit anyone.
>
> I've suggested that there should be a warning when code uses a foreach
> over an array of char or wchar without specifying the iteration type
> (http://d.puremagic.com/issues/show_bug.cgi?id=4483). That way, you can
> specify char or wchar if you really want it, but anyone who forgets to
> explicitly use dchar (or doesn't realize that they should) is warned.
> But that hasn't been implemented as of yet, and I don't believe that
> Walter has voiced his opinion on it.
>
> - Jonathan M Davis
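
A small sketch of the behavior discussed above, assuming a current D compiler and Phobos. The helper names countDefault, countDchar and rangeLength are hypothetical and exist only for illustration. It compares a plain foreach, a foreach with an explicit dchar iteration type, and Phobos' range primitives on "a∞", where ∞ (U+221E) takes three UTF-8 code units.

```d
import std.range.primitives : front, walkLength;

// Hypothetical helper: plain foreach, element type inferred.
size_t countDefault(string s)
{
    size_t n;
    foreach (c; s)
    {
        // Inferred as immutable(char): one iteration per UTF-8 code unit.
        static assert(is(typeof(c) == immutable(char)));
        ++n;
    }
    return n;
}

// Hypothetical helper: explicit dchar iteration type.
size_t countDchar(string s)
{
    size_t n;
    foreach (dchar c; s) // foreach decodes one code point per iteration
        ++n;
    return n;
}

// Hypothetical helper: the range primitives auto-decode narrow strings.
size_t rangeLength(string s)
{
    static assert(is(typeof(s.front) == dchar));
    return s.walkLength; // counts code points, not code units
}
```

With this, countDefault("a∞") is 4 while countDchar("a∞") and rangeLength("a∞") are both 2, which is the inconsistency between plain foreach and the range-based code that the thread is about.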

At least it was your ∞ that revealed my bug.

Incidentally, this has given me a nice idea.
You need to combine the foreach loop 'bug' with the ability to alter the
index variable (http://d.puremagic.com/issues/show_bug.cgi?id=6652).
Then you can construct a terrifically fast, still correct, UTF-8 decoder.

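    // s: the UTF-8 string to decode; outp: an output range of dchar.
    foreach (i, c; s)
    {
        if (c < 0x80)
            outp.put(c); // ASCII: one code unit is one character
        else
        {
            // std.utf.decode reads the whole multi-byte sequence and advances
            // i past it; step back one so the loop's ++i lands on the next one.
            outp.put(std.utf.decode(s, i));
            --i;
        }
    }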



But you'd better write foreach(ref i, char c; s), so that both the
index modification and the char element type are explicit.
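A self-contained sketch of the decoder written the recommended way, assuming current Phobos, where std.utf.decode(s, i) takes the index by ref and advances it past the decoded character. The function name toUtf32 is hypothetical. The ref index is what lets the decrement after decode steer the loop, and the ASCII fast path skips decode entirely for bytes below 0x80.

```d
import std.array : appender;
import std.utf : decode;

// Hypothetical helper: decode a UTF-8 string to UTF-32 with an ASCII fast path.
dstring toUtf32(string s)
{
    auto outp = appender!dstring();
    // ref index: writes to i change where the next iteration starts.
    foreach (ref size_t i, char c; s)
    {
        if (c < 0x80)
        {
            // ASCII: the single code unit is already a complete code point.
            outp.put(cast(dchar) c);
        }
        else
        {
            // decode consumes the whole multi-byte sequence starting at s[i]
            // and leaves i just past it; undo one step because the loop
            // increments i again before the next iteration.
            outp.put(decode(s, i));
            --i;
        }
    }
    return outp.data;
}
```

Invalid UTF-8 is not silently skipped here: decode throws a std.utf.UTFException, so the fast path does not trade away correctness.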


More information about the Digitalmars-d mailing list