Why the hell doesn't foreach decode strings
Norbert Nemec
Norbert at Nemec-online.de
Mon Oct 24 12:23:14 PDT 2011
On 21.10.2011 06:06, Jonathan M Davis wrote:
> It's this very problem that leads some people to argue that string should be
> its own type which holds an array of code units (which can be accessed when
> needed) rather than doing what we do now where we try and treat a string as
> both an array of chars and a range of dchars. The result is schizophrenic.
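(For context, a minimal sketch of the split Jonathan describes, as I
understand current D/Phobos behaviour: foreach walks code units unless
you explicitly ask for dchar, while the range primitives always decode.)

    import std.range : front, walkLength;
    import std.stdio : writeln;

    void main()
    {
        string s = "weiß";                // 'ß' occupies two UTF-8 code units

        // By default, foreach over a string iterates code units:
        size_t units;
        foreach (c; s) ++units;           // c is immutable(char)
        writeln(units);                   // 5

        // Asking for dchar makes foreach decode on the fly:
        size_t points;
        foreach (dchar c; s) ++points;
        writeln(points);                  // 4

        // The range primitives, on the other hand, always decode:
        writeln(s.front);                 // 'w' as a dchar
        writeln(s.walkLength);            // 4 - code points, not code units
    }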
Indeed - expressing strings as arrays of characters will always fall
short of the Unicode concept in some way. A truly Unicode-compliant
language has to handle strings as opaque objects that do not have any
encoding. There are a number of operations that can be performed on
these objects (concatenation, comparison, searching, etc.). Any kind
of defined memory representation can only be obtained by an explicit
encoding operation.
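As a rough sketch of what I mean in D terms (Text, payload and encode
are names invented here for illustration, not a proposal for Phobos):

    import std.string : representation;
    import std.utf : toUTF16;

    // Hypothetical opaque text type: it happens to store UTF-8 internally,
    // but that is an implementation detail the user never touches directly.
    struct Text
    {
        private string payload;

        // Operations defined on the abstract sequence of code points:
        Text opBinary(string op : "~")(Text rhs) const
        {
            return Text(payload ~ rhs.payload);
        }

        bool opEquals(Text rhs) const
        {
            return payload == rhs.payload;
        }

        // The only way to obtain a concrete memory representation:
        immutable(ubyte)[] encode(string enc = "utf-8")() const
        {
            static if (enc == "utf-8")
                return payload.representation;
            else static if (enc == "utf-16")
                return cast(immutable(ubyte)[]) payload.toUTF16;
            else
                static assert(0, "unsupported encoding: " ~ enc);
        }
    }

    void main()
    {
        auto greeting = Text("grüß") ~ Text(" dich");
        auto bytes = greeting.encode!"utf-8"();   // explicit encoding step
        assert(bytes.length == 11);               // ü and ß take two bytes each
    }

The point is not this particular layout, but that bytes never leak out
implicitly: every concrete representation has to pass through an
explicit encode step, and everything else works on the abstract string
value.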
Python3, for example, took a fundamental step by introducing exactly
this distinction. At first it seems silly, having to think about
encodings so often when writing trivial code. After a short while,
though, the strict conceptual separation between unencoded "strings"
and encoded "arrays of something" really helps avoid ugly problems.
Sure, for a performance-critical language the issue becomes a lot
trickier. I still think such a separation is possible, and ultimately
it is the only way to solve the tricky problems that will otherwise
always crop up somewhere.