First Impressions

Fri Sep 29 11:57:22 PDT 2006

Chad J wrote:

> I'd like the default to be UTF. Then we can have a base of code to
> correctly manipulate UTF strings (in phobos and language supported).
> Writing correct ASCII manipulation routines without good library/language
> support is a lot easier than writing good UTF manipulation routines
> without good library/language support, and UTF will probably be used
> much more than ASCII.

But D already uses Unicode for all strings, encoded as UTF-8, UTF-16 or UTF-32 ?

When you say "ASCII", do you mean 8-bit encodings perhaps ?
(since all proper 7-bit ASCII are already valid UTF-8 too)
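A minimal sketch of that distinction, assuming a current D compiler and
Phobos (std.utf.validate and UTFException are real Phobos names; the
isAscii7 helper and the sample bytes are made up for illustration):

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if every byte fits in 7 bits.
bool isAscii7(const(ubyte)[] bytes)
{
    foreach (b; bytes)
    {
        if (b > 0x7F)
            return false;
    }
    return true;
}

void main()
{
    // "hello!" as raw 7-bit ASCII bytes.
    ubyte[] ascii = [0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x21];
    assert(isAscii7(ascii));

    // The same bytes, viewed as char data, are already well-formed UTF-8.
    auto s = cast(const(char)[]) ascii;
    validate(s);                 // would throw UTFException if malformed
    assert(s == "hello!");

    // An 8-bit encoding is a different matter: 0xF6 is "ö" in ISO-8859-1,
    // but a lone 0xF6 byte is not valid UTF-8.
    ubyte[] latin1 = [0x73, 0x6D, 0xF6, 0x72];
    assert(!isAscii7(latin1));

    bool rejected = false;
    try
    {
        validate(cast(const(char)[]) latin1);
    }
    catch (UTFException e)
    {
        rejected = true;
    }
    assert(rejected);
}
```

So pure 7-bit data needs no conversion at all, while 8-bit encoded
data has to be transcoded before it can live in a char[].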

> Also, if we move over to full blown UTF, we won't have to give up ASCII.
> It seems to me like the phobos std.string functions are pretty much
> ASCII string manipulating functions (no multibyte string support). So
> just copy those out to a separate library, call it "ASCII lib", and
> there's your library support for ASCII. That leaves string literals,
> which is a slight problem, but I suppose easily fixed:
> ubyte[] hi = "hello!"a;

I don't understand this, why can't you use UTF-8 for this ?

char[] hi = "hello!";

> Just add a postfix 'a' for strings which makes the string an ASCII 
> literal, of type ubyte[].  D arrays don't seem powerful enough to do UTF 
> manipulations without special attention, but they are powerful enough to 
> do ASCII manipulations without special attention, so using ubyte[] as an 
> ASCII string should give full language support for these.  Given that 
> and ASCIILIB you pretty much have the current D string manipulation 
> capabilities afaik, and it will be fast.

What is not powerful enough about the foreach(dchar c; str) ?
It will step through that UTF-8 array one codepoint at a time.
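A rough sketch of what that looks like, written against a current D
compiler, where literals are of type string (in the D of this thread
the variable would be a plain char[]). The countCodePoints helper and
the sample word are only for illustration:

```d
import std.stdio : writefln;

// Hypothetical helper: counts code points by letting foreach do the
// UTF-8 decoding, one dchar per iteration.
size_t countCodePoints(const(char)[] str)
{
    size_t n = 0;
    foreach (dchar c; str)
    {
        ++n;
    }
    return n;
}

void main()
{
    // U+00F6 and U+00E5 each take two bytes in UTF-8, so the array
    // holds more code units than code points.
    string str = "sm\u00F6rg\u00E5sbord";

    assert(str.length == 13);            // UTF-8 code units (bytes)
    assert(countCodePoints(str) == 11);  // decoded code points

    // With an index variable, foreach also reports the byte offset
    // at which each code point starts.
    foreach (i, dchar c; str)
    {
        writefln("offset %s: U+%04X", i, cast(uint) c);
    }
}
```

Plain indexing and .length still work on code units, so it is the
foreach with a dchar loop variable that does the decoding here.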

--anders