First Impressions
Walter Bright
newshound at digitalmars.com
Fri Sep 29 23:11:37 PDT 2006
Derek Parnell wrote:
> I'm pretty sure that the phobos routines for search and replace only work
> for ASCII text. For example, std.string.find(japanesetext, "a") will nearly
> always fail to deliver the correct result. It finds the first occurrence of
> the byte value for the letter 'a' which may well be inside a Japanese
> character.
That cannot happen, because every byte of a UTF-8 multibyte sequence
*always* has the high bit set, and 'a' does not. That's one of the
things that sets UTF-8 apart from other multibyte formats. You might be
thinking of the older Shift-JIS multibyte encoding, which did suffer
from such problems.
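
To make the point concrete, here is a rough sketch (in Python rather
than D, since the property belongs to the encoding and not to any
library). The sample text and variable names are made up for
illustration. It checks that a byte-wise search for 'a' in UTF-8 text
can only land on a real 'a', and that the same is not true of
Shift-JIS:

```python
# Hypothetical sample text, not taken from the original thread.
japanese = "日本語のテキスト"
utf8 = (japanese + "a").encode("utf-8")

# In UTF-8, every byte of a multibyte sequence is >= 0x80 (high bit set).
assert all(b >= 0x80 for b in japanese.encode("utf-8"))

# So a byte-wise search for 'a' (0x61) can only match a real 'a'.
utf8_hits = [i for i, b in enumerate(utf8) if b == ord("a")]

# Shift-JIS trail bytes may fall in the ASCII range, so some katakana
# contain the byte 0x61 even though they contain no letter 'a'.
katakana = [chr(cp) for cp in range(0x30A1, 0x30F7)]
sjis_false_hits = [ch for ch in katakana
                   if b"a" in ch.encode("shift_jis", errors="ignore")]

print(utf8_hits, sjis_false_hits)
```

The only UTF-8 hit is the offset of the trailing 'a', while the
Shift-JIS list is non-empty, which is the kind of false match described
above.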
> It looks for byte-subsets rather than character sub-sets.
I don't think it's broken, but if it is, those are bugs, not fundamental
problems with char[], and should be filed in bugzilla.
> It may very well be pointless for your way of thinking, but your language
> is also for people who may not necessarily think in the same manner as
> yourself. I, for example, think there is a point to having my code read
> like it's dealing with strings rather than arrays of characters. I suspect
> I'm not alone. We could all write the alias in all our code, but you could
> also be helpful and do it for us - like you did with bit/bool.
I'm concerned about just adding more names that don't add real value. As
I wrote in a private email discussion about C++ typedefs, they should
only be used when:
1) they provide an abstraction, on the presumption that the
underlying type may change
2) they serve a self-documenting purpose
(1) certainly doesn't apply to string. (2) may, but char[] has no use
other than that of being a string, as a char[] is always a string and a
string is always a char[]. So I don't think string fits (2).
And lastly, there's the inevitable confusion. People learning the
language will see char[] and string, and wonder which should be used
when. I can't think of any consistent understandable rule for that. So
it just winds up being wishy-washy. Adding more names into the global
space (which is what names in object.d are) should be done extremely
conservatively.
If someone wants to use the string alias as their personal or company
style, I have no issue with that, as other people *do* think differently
than me (which is abundantly clear here!).