First Impressions
Chad J
"gamerChad\" at spamIsBad gmail.com
Fri Sep 29 11:21:36 PDT 2006
BCS wrote:
> Johan Granberg wrote:
>
>>
>>
>> I completely agree, char should hold a character independently of
>> encoding and NOT a code unit or something else. I think it would be
>> beneficial to D in the long term if chars were done right (meaning
>> that they can store any character) how it is implemented is not
>> important and i believe performance is not a problem here, so ease of
>> use and correctness would be appreciated.
>
>
> Why isn't performance a problem?
>
> If you are saying that this won't cause performance hits in run times or
> memory space, I might be able to buy it, but I'm not yet convinced.
>
> If you are saying that causing a performance hit in run times or memory
> space is not a problem... in that case I think you are dead wrong and
> you will not convince me otherwise.
>
> In my opinion, any compiled language should allow fairly direct access
> to the most efficient practical means of doing something*. If I didn't
> care about speed and memory I would use some sort of scripting language.
>
> A good set of libs should make most of this moot. Leave the char as is
> and define a typedef struct or whatever that provides the added
> functionality that you want.
>
> * OTOH a language should not mandate code to be efficient at the expense
> of ease of coding.
I will go ahead and say that the current state of char[] is incorrect.
That is, if you write a program that manipulates char[] strings and then
run it in China, you will be disappointed with the results. It won't
matter how fast the program runs, because bad stuff will happen, like
entire strings becoming unreadable to the user.
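To make the failure concrete, here is a minimal sketch, assuming a
current Phobos (std.utf.validate). The helpers naiveReverse and
isValidUtf8 are hypothetical names made up for this illustration. It
reverses a string one char (one UTF-8 code unit) at a time, the way
naive char[] code tends to, and then checks whether the result is still
well-formed UTF-8.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: reverses a string one code unit at a time,
// ignoring the fact that a character may span several chars.
char[] naiveReverse(const(char)[] s)
{
    auto result = new char[s.length];
    foreach (i, c; s)                 // c is a single UTF-8 code unit
        result[s.length - 1 - i] = c;
    return result;
}

// Hypothetical helper: true if the buffer is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);                  // throws UTFException on bad input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Pure ASCII survives a code-unit reversal.
    writeln(isValidUtf8(naiveReverse("hello")));
    // Multibyte text does not: the bytes of each character end up
    // in the wrong order.
    writeln(isValidUtf8(naiveReverse("你好")));
}
```

The ASCII input stays valid only because each of its characters happens
to fit in one code unit.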
Technically, if you follow UTF and do your char[] manipulations very
carefully, it is correct, but realistically few if any people will do
such things (I won't). Also, if you do this, your program will probably
run as slowly as one with the proposed char/string solution, maybe slower
(since language/stdlib level support can be heavily optimized).
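As a rough sketch of what "very carefully" means in practice, the
following decodes the string into code points before touching it. It
assumes a current Phobos (std.utf.encode); codePointCount and
reverseByCodePoint are hypothetical helper names. Note that this is
only correct at the code point level: text with combining marks would
need more work still.

```d
import std.stdio : writeln;
import std.utf : encode;

// Hypothetical helper: counts code points rather than code units.
size_t codePointCount(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)      // asking for dchar makes foreach decode UTF-8
        ++n;
    return n;
}

// Hypothetical helper: reverses by code point, so each multibyte
// sequence stays intact.
char[] reverseByCodePoint(const(char)[] s)
{
    char[] result;
    result.reserve(s.length);
    foreach_reverse (dchar c; s)
        encode(result, c);    // appends c, re-encoded as UTF-8
    return result;
}

void main()
{
    string s = "你好!";
    writeln(s.length);              // number of code units (bytes)
    writeln(codePointCount(s));     // number of code points
    writeln(reverseByCodePoint(s)); // still readable text
}
```

Every operation now pays for a decode step, which is the kind of cost
that language or library level support would be in a position to
optimize.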
What I'd like then, is a program that is correct and as fast as possible
while still being correct.
Sure you can get some speed gains by just using ASCII and saying to hell
with UTF, but you should probably only do that when profiling has shown
that such speed gains are actually useful/needed in your program.
Ultimately we have to decide whether we want D to default to UTF code
which might run slightly slower but allow better localization and
international friendliness, or if we want it to default to ASCII or some
such encoding that runs slightly faster but is mostly limited to English.
I'd like the default to be UTF. Then we can have a base of code to
correctly manipulate UTF strings (in phobos and language supported).
Writing correct ASCII manipulation routines without good library/language
support is a lot easier than writing good UTF manipulation routines
without good library/language support, and UTF will probably be used
much more than ASCII.
Also, if we move over to full blown UTF, we won't have to give up ASCII.
It seems to me like the phobos std.string functions are pretty much
ASCII string manipulating functions (no multibyte string support). So
just copy those out to a separate library, call it "ASCII lib", and
there's your library support for ASCII. That leaves string literals,
which is a slight problem, but I suppose easily fixed:
ubyte[] hi = "hello!"a;
Just add a postfix 'a' for strings which makes the string an ASCII
literal, of type ubyte[]. D arrays don't seem powerful enough to do UTF
manipulations without special attention, but they are powerful enough to
do ASCII manipulations without special attention, so using ubyte[] as an
ASCII string should give full language support for these. Given that
and the "ASCII lib", you pretty much have the current D string
manipulation capabilities afaik, and it will be fast.
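Here is a minimal sketch of the ubyte[] idea. The 'a' postfix is only a
proposal and does not exist in D, so this sketch assumes
std.string.representation from a current Phobos as a stand-in for it.
asciiToUpperInPlace is a hypothetical example of what an "ASCII lib"
routine could look like.

```d
import std.stdio : writeln;
import std.string : representation;

// Hypothetical "ASCII lib" routine: uppercases in place, one byte per
// character, with no decoding step.
void asciiToUpperInPlace(ubyte[] s)
{
    foreach (ref b; s)
        if (b >= 'a' && b <= 'z')
            b -= 'a' - 'A';
}

void main()
{
    // Stand-in for the proposed  ubyte[] hi = "hello!"a;
    ubyte[] hi = "hello!".representation.dup;

    asciiToUpperInPlace(hi);

    // Plain array operations are safe here: every index is a
    // character boundary.
    ubyte[] head = hi[0 .. 4];

    writeln(cast(const(char)[]) hi);
    writeln(cast(const(char)[]) head);
}
```

Indexing and slicing work without special attention because one
element is always one character, which is exactly what char[] holding
UTF-8 cannot promise.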