Why foreach(c; someString) must yield dchar

Thu Aug 19 12:18:03 PDT 2010

Jonathan M Davis Wrote:

> No, it doesn't hurt to have the iteration type larger than the actual type, but 
> you're not going to have overflow.

Getting overflow is trivial: take a byte and add 256.
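A minimal sketch of this, assuming the loop body does arithmetic on the foreach variable: the same `+ 256` either wraps or not, depending on the type declared for that variable. The names `sumNarrow` and `sumWide` are made up for this illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: the loop variable keeps the element type (byte).
int sumNarrow(const byte[] data)
{
    int total = 0;
    foreach (byte x; data)
    {
        // x + 256 is computed as int, then truncated back to 8 bits,
        // so the assignment leaves x unchanged.
        x = cast(byte)(x + 256);
        total += x;
    }
    return total;
}

// Hypothetical helper: the loop variable is widened to int.
int sumWide(const byte[] data)
{
    int total = 0;
    foreach (int x; data)
    {
        x = x + 256; // fits in int, no truncation
        total += x;
    }
    return total;
}

void main()
{
    byte[] data = [1, 2, 3];
    writeln(sumNarrow(data)); // 6
    writeln(sumWide(data));   // 774
}
```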

> could have had overflow putting it in, but when you're taking it out, you know 
> that it fits because it was already in there. You could have overflow issues with 
> math or whatnot inside the body of your loop if you're assigning to the foreach 
> variable, but that has nothing to do with what you're getting out of the loop. 

As long as what you get out of the loop doesn't depend on the element type. Didn't you demonstrate how such a dependency can be introduced?
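With a narrow string the dependency is easy to see: what comes out of the loop changes with the type chosen for the loop variable. A small sketch, where `countIterations` is a name invented for this example:

```d
import std.stdio : writeln;

// Hypothetical helper: counts loop iterations for a given loop variable type.
size_t countIterations(T)(string s)
{
    size_t n = 0;
    foreach (T c; s)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    writeln(countIterations!char(s));  // 6: one step per code unit
    writeln(countIterations!dchar(s)); // 5: one step per decoded code point
}
```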

> It's fine with me to use narrow strings. Much as I'd love to avoid a lot of these 
> issues, dstrings take up too much memory if you're going to be doing a lot of 
> string processing.

If you're going to use a lot of memory, there probably won't be much difference between strings and dstrings; you'll use a lot of memory in both cases. And don't forget that a UTF-8 character takes up to 4 bytes.
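To put rough numbers on this, here is a sketch that measures the character data of the same text in each encoding. `payloadBytes` is a hypothetical helper, and it counts only the code units, not any allocator overhead.

```d
import std.stdio : writeln;

// Hypothetical helper: bytes occupied by the code units of a string of any width.
size_t payloadBytes(S)(S s)
{
    return s.length * typeof(s[0]).sizeof;
}

void main()
{
    // ASCII text: 1 byte per character in UTF-8, 4 in UTF-32.
    writeln(payloadBytes("hello"));  // 5
    writeln(payloadBytes("hello"d)); // 20

    // U+1D11E (musical G clef) needs 4 UTF-8 code units.
    writeln(payloadBytes("\U0001D11E"));  // 4
    writeln(payloadBytes("\U0001D11E"w)); // 4 (two 2-byte code units)
    writeln(payloadBytes("\U0001D11E"d)); // 4
}
```

So the 4x gap holds for ASCII-heavy text and shrinks as the text moves away from ASCII.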

> problem is that the default behavior is the abnormal (and therefore almost 
> certainly buggy) behavior. Generally D tries to make the normal behavior the 
> behavior that is less likely to cause bugs.

Type system hacks are likely to cause bugs.

> Very few people are actually going to 
> want to deal with code points. They want characters. The result is that it 
> becomes very easy to make mistakes with strings if you ever try and manipulate 
> them character-by-character.

If you care about people and want to force them to use dchar ranges, you can do it with the library: make it refuse narrow strings. As long as the library is unusable with narrow strings, people will have to do something about it, say, use a wrapper like the one proposed in this thread (but providing a forward dchar range interface).
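A rough sketch of what that could look like, not a definitive design. `ByDchar` and `countCodePoints` are hypothetical names; the sketch assumes `std.utf.decode`, `std.utf.stride`, `std.traits.isNarrowString` and the range traits from `std.range.primitives` as found in current Phobos.

```d
import std.range.primitives : isForwardRange, isInputRange, ElementType,
                              empty, popFront;
import std.traits : isNarrowString;
import std.utf : decode, stride;

// Hypothetical wrapper: exposes a UTF-8 string as a forward range of dchar.
struct ByDchar
{
    private string s;

    @property bool empty() { return s.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(s, i); // decodes one code point starting at index 0
    }

    // Drop the code units of the first encoded code point.
    void popFront() { s = s[stride(s, 0) .. $]; }

    @property ByDchar save() { return this; }
}

static assert(isForwardRange!ByDchar);

// Hypothetical library function that refuses narrow strings outright.
size_t countCodePoints(R)(R r)
if (isInputRange!R && is(ElementType!R : dchar) && !isNarrowString!R)
{
    size_t n = 0;
    for (; !r.empty; r.popFront())
        ++n;
    return n;
}

void main()
{
    // A bare string is rejected at compile time...
    static assert(!__traits(compiles, countCodePoints("h\u00E9llo")));
    // ...while the wrapper and a dstring are accepted.
    assert(countCodePoints(ByDchar("h\u00E9llo")) == 5);
    assert(countCodePoints("h\u00E9llo"d) == 5);
}
```

The constraint does the refusing; the wrapper is the "something" people would then reach for.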

> It makes perfect sense for general arrays. It makes perfect sense if you don't 
> really care about the contents of the array for your algorithm (that is, whether 
> they're code points or characters or just bytes in memory doesn't matter for 
> what you're doing). However, if you're actually processing characters, it makes 
> no sense at all. This mess with foreach and strings is one of the big reasons 
> why foreach tends to be avoided in std.algorithm.

The problem here is that integers are not much different from characters in this regard.

> and given the fact that the string module deals almost exclusively with 
> string rather than wstring or dstring, it really doesn't make sense to use 
> dstrings in the general case.

This is my point: you can do it with the library; if you can't, fix the library.

> Not to mention, the Linux I/O stuff uses UTF-8, and 
> the Windows I/O stuff uses UTF-16, so dstring is less efficient for dealing with 
> I/O.

Every string type is inefficient here, but a wrapper comparable to NSString can fix it for you.

> Perhaps what we need is some way to distinguish between the exact element type 
> on an array and the conceptual element type. So, for most arrays, they'd both be 
> whatever the element type of the array is, but for strings the exact element 
> type would be char, wchar, or dchar while the conceptual type would be dchar. 

Conceptually, a number is an infinite sequence of digits with a decimal point. What do you plan to do about this?
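For reference, the exact-versus-conceptual split from the quoted paragraph can be written down as compile-time checks. This sketch assumes the `ElementType` and `ElementEncodingType` traits from `std.range.primitives` in current Phobos; whether they match what was being proposed in this thread is an assumption.

```d
import std.range.primitives : ElementType, ElementEncodingType;

// Exact element type: what indexing the array gives you.
static assert(is(ElementEncodingType!string  == immutable(char)));
static assert(is(ElementEncodingType!wstring == immutable(wchar)));

// "Conceptual" element type: what the range primitives yield.
static assert(is(ElementType!string  == dchar));
static assert(is(ElementType!wstring == dchar));

// For ordinary arrays the two coincide.
static assert(is(ElementType!(int[]) == int));
static assert(is(ElementEncodingType!(int[]) == int));

void main() {}
```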