string is rarely useful as a function argument

Jonathan M Davis jmdavisProg at gmx.com
Fri Dec 30 19:30:50 PST 2011


On Friday, December 30, 2011 20:55:42 Timon Gehr wrote:
> 1. They don't notice. Then it is not a problem, because they are
> obviously only using ASCII characters and it is perfectly reasonable to
> assume that code units and characters are the same thing.

The problem is that, in a lot of cases, what's more likely to happen is that 
they use it wrong and don't notice, because they only test with ASCII, 
_but_ they have bugs all over the place, because their code is 
actually used with Unicode in the field.

Yes, diligent programmers will generally find such problems, but with the 
current scheme, it's _so_ easy to use length when you shouldn't that it's 
pretty much guaranteed to happen. I'm not sure that Andrei's 
suggestion is the best one at this point, but I sure wouldn't be against it 
being introduced. It wouldn't entirely fix the problem by any means, but 
programmers would then have to work harder at screwing it up, and so there 
would be fewer mistakes.
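
To make the length pitfall concrete, here's a minimal sketch. The helper 
names codeUnits and codePoints are made up for illustration, and it assumes 
Phobos' std.range.walkLength, which decodes a string as it walks it. Note 
that code points still aren't necessarily what a user would call characters.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: number of UTF-8 code units (what .length reports).
size_t codeUnits(string s)
{
    return s.length;
}

// Hypothetical helper: number of code points, found by decoding the string.
size_t codePoints(string s)
{
    return s.walkLength;
}

void main()
{
    string ascii = "hello";
    string accented = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    // With ASCII-only test data the two counts agree, so the bug stays hidden.
    writeln(codeUnits(ascii), " ", codePoints(ascii));       // 5 5

    // With non-ASCII data they diverge.
    writeln(codeUnits(accented), " ", codePoints(accented)); // 6 5
}
```

Code that was only ever tested with the first string would look correct 
while silently miscounting the second.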

Arguably, the first issue with D strings is that we have char. In most 
languages, char is supposed to be a character, so many programmers will code 
with that expectation. If we had something like utf8unit, utf16unit, and 
utf32unit (arguably very bad, albeit descriptive, names) and no char, then it 
would force programmers to become semi-educated about the issues. There's no 
way that that's changing at this point though.
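
As a rough illustration of why char isn't a character (a sketch of my own, 
with variable names picked just for the example), here's the same single 
code point stored as UTF-8, UTF-16, and UTF-32:

```d
import std.stdio : writeln;

void main()
{
    // Each of D's character types is one code unit of a given width.
    static assert(char.sizeof == 1);  // UTF-8 code unit
    static assert(wchar.sizeof == 2); // UTF-16 code unit
    static assert(dchar.sizeof == 4); // UTF-32 code unit

    // One code point, U+1F600, in the three encodings.
    string  s8  = "\U0001F600";
    wstring s16 = "\U0001F600"w;
    dstring s32 = "\U0001F600"d;

    // length counts code units, so the answer depends on the encoding.
    writeln(s8.length, " ", s16.length, " ", s32.length); // 4 2 1

    // A single char holds only the first UTF-8 code unit, not the character.
    char first = s8[0];
    assert(first == 0xF0);
}
```

Only with dchar does one element correspond to one code point here.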

- Jonathan M Davis

