First Impressions

Walter Bright newshound at digitalmars.com
Fri Sep 29 23:11:37 PDT 2006


Derek Parnell wrote:
> I'm pretty sure that the phobos routines for search and replace only work
> for ASCII text. For example, std.string.find(japanesetext, "a") will nearly
> always fail to deliver the correct result. It finds the first occurrence of
> the byte value for the letter 'a' which may well be inside a Japanese
> character.

That cannot happen, because every byte of a UTF-8 multibyte sequence 
*always* has the high bit set, and 'a' does not. That's one of the things 
that sets UTF-8 apart from other multibyte formats. You might be thinking 
of the older Shift-JIS multibyte encoding, which did suffer from such 
problems.
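A quick way to check the high-bit claim: the Python sketch that follows encodes some Japanese text as UTF-8, confirms that every byte of each multibyte character is 0x80 or above, and then does a naive byte search for 'a'. It is written in Python rather than D, and the helper names `find_ascii_byte` and `multibyte_bytes_have_high_bit` are hypothetical illustrations, not Phobos functions.

```python
# Sketch (Python, not D/Phobos): every byte of a UTF-8 multibyte character
# has the high bit (0x80) set, so an ASCII byte such as b"a" (0x61) can only
# match a real ASCII character, never the inside of a multibyte sequence.

def find_ascii_byte(data: bytes, ch: str) -> int:
    """Naive byte-wise search, similar in spirit to a byte-oriented find."""
    if len(ch) != 1 or ord(ch) >= 0x80:
        raise ValueError("needle must be a single ASCII character")
    return data.find(ch.encode("ascii"))  # -1 if not present

def multibyte_bytes_have_high_bit(text: str) -> bool:
    """True if every byte of every non-ASCII character's encoding is >= 0x80."""
    for c in text:
        encoded = c.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True

japanese = "日本語のaテキスト"
data = japanese.encode("utf-8")
idx = find_ascii_byte(data, "a")
# Decoding the bytes before the match gives the character offset of the 'a'.
print(idx, len(data[:idx].decode("utf-8")), multibyte_bytes_have_high_bit(japanese))
```

Here `idx` is 12, because each of the four preceding characters takes three bytes, and decoding that prefix yields a character offset of 4: the byte search lands on the real 'a' and nowhere inside a Japanese character.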

> It looks for byte-subsets rather than character sub-sets.

I don't think it's broken, but if it is, those are bugs, not fundamental 
problems with char[], and should be filed in bugzilla.
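The same property covers non-ASCII search strings too. UTF-8 continuation bytes have the form 10xxxxxx, while ASCII and lead bytes have the form 0xxxxxxx or 11xxxxxx, so a byte-wise match of a complete UTF-8 needle can only start, and therefore end, on a character boundary. The Python sketch that follows (an illustration, not D code; `byte_replace` is a hypothetical helper) does search-and-replace purely on bytes and decodes the result.

```python
# Sketch (Python, not D/Phobos): byte-level replace on UTF-8 data.
# A complete UTF-8 needle begins with an ASCII or lead byte, never with a
# continuation byte (10xxxxxx), so every match starts on a character
# boundary; since the needle holds whole characters, it ends on one too.

def byte_replace(haystack: str, needle: str, replacement: str) -> str:
    """Replace by operating on raw UTF-8 bytes, then decode the result."""
    raw = haystack.encode("utf-8").replace(
        needle.encode("utf-8"), replacement.encode("utf-8"))
    # decode() would raise UnicodeDecodeError if a match had split a character.
    return raw.decode("utf-8")

text = "日本語のテキストと日本"
print(byte_replace(text, "日本", "にほん"))
```

The byte-level result is identical to a character-level `str.replace`, which is why a bytewise find or replace over `char[]` need not be wrong for UTF-8 text.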

> It may very well be pointless for your way of thinking, but your language
> is also for people who may not necessarily think in the same manner as
> yourself. I, for example, think there is a point to having my code read
> like it's dealing with strings rather than arrays of characters. I suspect
> I'm not alone. We could all write the alias in all our code, but you could
> also be helpful and do it for us - like you did with bit/bool.

I'm concerned about just adding more names that don't add real value. As 
I wrote in a private email discussion about C++ typedefs, they should 
only be used when:

1) they provide an abstraction against the presumption that the 
underlying type may change

2) they serve a self-documentation purpose

(1) certainly doesn't apply to string. (2) may, but char[] has no use 
other than that of being a string, as a char[] is always a string and a 
string is always a char[]. So I don't think string fits (2).

And lastly, there's the inevitable confusion. People learning the 
language will see char[] and string, and wonder which should be used 
when. I can't think of any consistent understandable rule for that. So 
it just winds up being wishy-washy. Adding more names to the global 
namespace (which is where names in object.d go) should be done extremely 
conservatively.

If someone wants to use the string alias as their personal or company 
style, I have no issue with that, as other people *do* think differently 
than me (which is abundantly clear here!).
