string is rarely useful as a function argument

Jonathan M Davis jmdavisProg at gmx.com
Wed Dec 28 22:48:29 PST 2011


On Thursday, December 29, 2011 07:33:28 Jakob Ovrum wrote:
> I don't think this is a problem you can solve without educating
> people. They will need to know a thing or two about how UTF works
> to know the performance implications of many of the "safe" ways
> to handle UTF strings. Further, for much use of Unicode strings
> in D you can't get away with not knowing anything anyway because
> D only abstracts up to code points, not graphemes. Imagine trying
> to explain to the unknowing programmer what is going on when an
> algorithm function broke his grapheme and he doesn't know the
> first thing about Unicode.
> 
> I'm not claiming to be an expert myself, but I believe D offers
> Unicode the right way as it is.

Ultimately, the programmer _does_ need to understand Unicode properly if 
they're going to write code which is both correct and efficient. However, if the 
easy way to use strings in D is correct, even if it's not as efficient as we'd 
like, then at least code will tend to be correct in its use of Unicode. And 
then if the programmer wants their string processing to be more efficient, 
they need to actually learn how Unicode works so that they can code for it more 
efficiently.

The issue, however, is that it's currently _way_ too easy to use strings 
completely incorrectly and operate on code units as if they were characters. A 
_lot_ of programmers will be using string and char[] as if a char were a 
character, and that's going to create a lot of bugs. Making it harder to 
operate on a char[] or string as if it were an array of characters will 
seriously reduce such bugs and on some level will force people to become 
better educated about Unicode.
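
To make the code unit vs. code point distinction concrete, here is a minimal 
sketch. The helper names countCodeUnits and countCodePoints are made up for 
illustration and are not part of Phobos; the only library call assumed is 
std.range.walkLength, which decodes a string as it walks it.

```d
import std.range : walkLength;

// Hypothetical helpers, named here only for illustration.
size_t countCodeUnits(string s) { return s.length; }       // UTF-8 code units
size_t countCodePoints(string s) { return s.walkLength; }  // decoded code points

void main()
{
    // U+00E9 (e with acute accent) takes two UTF-8 code units.
    string s = "h\u00E9llo";

    assert(countCodeUnits(s) == 6);
    assert(countCodePoints(s) == 5);

    // Indexing yields a code unit, not a character:
    // s[1] is only the first byte of the two-byte sequence.
    assert(s[1] == 0xC3);

    // foreach with dchar decodes; foreach with char does not.
    size_t units, points;
    foreach (char c; s) ++units;
    foreach (dchar c; s) ++points;
    assert(units == 6 && points == 5);
}
```

Code that treats s.length as "the number of characters" or s[1] as "the 
second character" compiles without complaint, which is exactly the kind of 
bug I'm talking about.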

No, it doesn't completely solve the problem, since then we're operating at the 
code point level rather than the grapheme level, but it's still a _lot_ better 
than operating on the code unit level as is likely to happen now.
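
As a rough sketch of why the code point level still isn't the whole story, 
consider a combining sequence. This assumes a Phobos version that provides 
std.uni.byGrapheme; the helper names are again hypothetical.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helpers, named here only for illustration.
size_t countCodePoints(string s) { return s.walkLength; }
size_t countGraphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // a single user-perceived character made of two code points.
    string s = "e\u0301";

    assert(s.length == 3);            // code units: 1 for 'e', 2 for U+0301
    assert(countCodePoints(s) == 2);  // code points
    assert(countGraphemes(s) == 1);   // graphemes
}
```

An algorithm working on code points can still split the 'e' from its accent, 
but it can no longer split a code point in half the way code unit indexing 
can.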

- Jonathan M Davis

