First Impressions
Anders F Björklund
afb at algonet.se
Fri Sep 29 11:57:22 PDT 2006
Chad J wrote:
> I'd like the default to be UTF. Then we can have a base of code to
> correctly manipulate UTF strings (in phobos and language supported).
> Writing correct ASCII manipulation routines without good library/language
> support is a lot easier than writing good UTF manipulation routines
> without good library/language support, and UTF will probably be used
> much more than ASCII.
But D already uses Unicode for all strings, encoded as UTF ?
When you say "ASCII", do you mean 8-bit encodings perhaps ?
(since all proper 7-bit ASCII text is already valid UTF-8 too)
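A minimal sketch of that last point, assuming a current D2 compiler (where string literals are typed `string` rather than `char[]`). The helper name isAscii is hypothetical; std.utf.validate is the Phobos routine that throws on malformed UTF:

```d
import std.utf : validate;

// Hypothetical helper: true if every UTF-8 code unit is in the 7-bit range.
bool isAscii(const(char)[] s)
{
    foreach (char c; s)
        if (c >= 0x80)
            return false;
    return true;
}

void main()
{
    string hi = "hello!";
    assert(isAscii(hi));    // pure 7-bit ASCII...
    validate(hi);           // ...so it passes UTF-8 validation (throws if malformed)
    assert(hi.length == 6); // one code unit per ASCII character
}
```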
> Also, if we move over to full blown UTF, we won't have to give up ASCII.
> It seems to me like the phobos std.string functions are pretty much
> ASCII string manipulating functions (no multibyte string support). So
> just copy those out to a separate library, call it "ASCII lib", and
> there's your library support for ASCII. That leaves string literals,
> which is a slight problem, but I suppose easily fixed:
> ubyte[] hi = "hello!"a;
I don't understand this, why can't you use UTF-8 for this ?
char[] hi = "hello!";
> Just add a postfix 'a' for strings which makes the string an ASCII
> literal, of type ubyte[]. D arrays don't seem powerful enough to do UTF
> manipulations without special attention, but they are powerful enough to
> do ASCII manipulations without special attention, so using ubyte[] as an
> ASCII string should give full language support for these. Given that
> and ASCIILIB you pretty much have the current D string manipulation
> capabilities afaik, and it will be fast.
What is not powerful enough about the foreach(dchar c; str) ?
It will step through that UTF-8 array one codepoint at a time.
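A minimal sketch of that loop, assuming a current D2 compiler (so the literal is a `string` rather than a mutable `char[]`). The helper name countCodePoints is hypothetical and only there to make the difference between code units and code points visible:

```d
import std.stdio;

// Hypothetical helper: count code points by letting foreach decode
// the UTF-8 array into dchar values, one code point per iteration.
size_t countCodePoints(const(char)[] str)
{
    size_t n = 0;
    foreach (dchar c; str)
        ++n;
    return n;
}

void main()
{
    string s = "Bj\u00F6rklund";  // U+00F6 takes two UTF-8 code units
    writeln(s.length);            // 10 code units (bytes)
    writeln(countCodePoints(s));  // 9 code points
}
```

Iterating the same array with `foreach (char c; str)` would instead visit the ten individual code units.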
--anders