First Impressions

Derek Parnell derek at psyc.ward
Sat Sep 30 18:13:06 PDT 2006


On Fri, 29 Sep 2006 23:11:37 -0700, Walter Bright wrote:

> Derek Parnell wrote:
>> I'm pretty sure that the phobos routines for search and replace only work
>> for ASCII text. For example, std.string.find(japanesetext, "a") will nearly
>> always fail to deliver the correct result. It finds the first occurrence of
>> the byte value for the letter 'a' which may well be inside a Japanese
>> character.
> 
> That cannot happen, because multibyte sequences *always* have the high 
> bit set, and 'a' does not. That's one of the things that sets UTF-8 
> apart from other multibyte formats. You might be thinking of the older 
> Shift-JIS multibyte encoding, which did suffer from such problems.

Thanks. That has cleared up some misconceptions and preconceptions that I
had about UTF encoding. I can trim some of my home-grown routines now and
reduce the number of times that I (think I) need dchar[] ;-)
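
A minimal sketch of the property Walter describes, written in Python rather
than D so it can be run as-is. The helper byte_find is a hypothetical
stand-in for a byte-level search, not the actual Phobos routine, and the
sample character (katakana U+30C2) is chosen only because its Shift-JIS
trail byte happens to be 0x61, the same value as ASCII 'a'.

```python
# Sketch only: byte_find is a hypothetical stand-in for a byte-level search,
# not the actual std.string.find implementation.
def byte_find(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of the first match, or -1 if there is none."""
    return haystack.find(needle)

text = "\u30c2a"  # katakana DI (U+30C2) followed by ASCII 'a'

utf8 = text.encode("utf-8")
sjis = text.encode("shift_jis")

# In UTF-8, every byte of a multibyte sequence has the high bit set,
# so none of them can equal an ASCII byte such as 0x61 ('a').
assert all(b & 0x80 for b in "\u30c2".encode("utf-8"))

utf8_hit = byte_find(utf8, b"a")  # the real 'a', after the 3-byte character
sjis_hit = byte_find(sjis, b"a")  # the trail byte of the 2-byte character

print(utf8_hit, sjis_hit)
```

With the UTF-8 bytes the search reports offset 3, the genuine 'a'. With the
Shift-JIS bytes it reports offset 1, which is the second byte of the
Japanese character: the false match from the original complaint.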


>> It may very well be pointless for your way of thinking, but your language
>> is also for people who may not necessarily think in the same manner as
>> yourself. I, for example, think there is a point to having my code read
>> like it's dealing with strings rather than arrays of characters. I suspect
>> I'm not alone. We could all write the alias in all our code, but you could
>> also be helpful and do it for us - like you did with bit/bool.
> 
> I'm concerned about just adding more names that don't add real value. As 
> I wrote in a private email discussion about C++ typedefs, they should 
> only be used when:
> 
> 1) they provide an abstraction against the presumption that the 
> underlying type may change
> 
> 2) they provide a self-documentation purpose
> 
> (1) certainly doesn't apply to string. 

No argument there.

>  (2) may, but char[] has no use 
> other than that of being a string, as a char[] is always a string and a 
> string is always a char[]. So I don't think string fits (2).
 
This is a little more debatable, but not worth generating hostility. 

A string of text contains characters whose position in the string is
significant - there are semantics to be applied to the entire text. It is
quite possible to conceive of an application in which the characters in the
char[] array have no importance attached to their relative position within
the array *when compared to neighboring characters*. The order of
characters in text is significant, but not necessarily so in an arbitrary
character array. 

Conceptually a string is different from a char[], even though they are
implemented using the same technology.
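
One way to picture the distinction, as a rough sketch in Python rather than
D. The two sample words and the variable names are purely illustrative
assumptions: the same characters compare differently depending on whether
their order is treated as meaningful (text) or ignored (a bag of
characters).

```python
from collections import Counter

a = "listen"
b = "silent"

# Treated as text, position matters, so these are different strings.
same_as_text = (a == b)

# Treated as an unordered collection of characters, they are identical.
same_as_bag = (Counter(a) == Counter(b))

print(same_as_text, same_as_bag)
```

The storage is the same in both comparisons; only the meaning attached to
the order of the elements differs.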

> And lastly, there's the inevitable confusion. People learning the 
> language will see char[] and string, and wonder which should be used 
> when. I can't think of any consistent understandable rule for that. So 
> it just winds up being wishy-washy. Adding more names into the global 
> space (which is what names in object.d are) should be done extremely 
> conservatively.

And yet we have "toString" and not "toCharArray" or "toUTF"!
 
And we still have the "printf" in object.d too! 

> If someone wants to use the string alias as their personal or company 
> style, I have no issue with that, as other people *do* think differently 
> than me (which is abundantly clear here!).

I'll revert Build to string again as it is a lot easier to read. It started
out that way but I converted it to char[] to appease you (why I thought you
needed appeasing is lost on me now, though). :-)

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"



More information about the Digitalmars-d mailing list