Making all strings UTF ranges has some risk of WTF

dsimcha dsimcha at yahoo.com
Thu Feb 4 14:13:23 PST 2010


== Quote from Jerry Quinn (jlquinn at optonline.net)'s article
> Don Wrote:
> > We seem to be approaching the point where char[], wchar[] and dchar[]
> > are all arrays of dchar, but with different levels of compression.
> > It makes me wonder if the char, wchar types actually make any sense.
> > If char[] is actually a UTF string, then char[] ~ char should be
> > permitted ONLY if char can be implicitly converted to dchar. Otherwise,
> > you're performing cast(char[])(cast(ubyte[])s ~ cast(ubyte)c) which will
> > not necessarily result in a valid unicode string.
> Well, if you're working with a LOT of text, you may be mmapping GB's of UTF-8
> text.  Yes, this does happen.  You better be able to handle it in a sane manner,
> i.e. not reallocating the memory to read the data in.  So, there is a definite
> need for casting to array of char, and dealing with the inevitable stray
> non-unicode char in that mess.

Welcome to the world of DNA sequence manipulation.
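
To make the two quoted points concrete, here is a minimal sketch, assuming a
reasonably current Phobos (std.utf.decode, std.utf.validate). The helper name
countInvalidUnits is made up for illustration and is not from the thread.
Appending a dchar transcodes it to UTF-8, appending a lone code unit can leave
the string invalid, and a large buffer can be scanned in place while skipping
stray bytes:

```d
import std.utf : decode, validate, UTFException;
import std.exception : assertThrown;

// Hypothetical helper (not from the thread): counts decode failures in a
// buffer, walking the slice in place without copying or reallocating it.
size_t countInvalidUnits(const(char)[] text)
{
    size_t bad = 0;
    size_t i = 0;
    while (i < text.length)
    {
        immutable start = i;
        try
        {
            decode(text, i);   // advances i past one code point on success
        }
        catch (UTFException)
        {
            ++bad;
            i = start + 1;     // skip the offending code unit and resync
        }
    }
    return bad;
}

void main()
{
    dchar c = '\u00E9';        // needs two UTF-8 code units

    // char[] ~ dchar: the code point is encoded, so the result stays valid.
    char[] ok = "caf".dup;
    ok ~= c;
    assert(ok.length == 5);
    validate(ok);              // does not throw

    // Appending only the raw code unit is the ubyte-style concatenation
    // Don describes: 0xE9 is a lead byte with nothing following it.
    char[] bad = "caf".dup;
    bad ~= cast(char) 0xE9;
    assert(bad.length == 4);
    assertThrown!UTFException(validate(bad));

    // A buffer with one stray non-UTF-8 byte, scanned without copying.
    char[] buf = "ACGT".dup;
    buf ~= cast(char) 0xFF;    // 0xFF never appears in valid UTF-8
    buf ~= "ACGT";
    assert(countInvalidUnits(buf) == 1);
}
```

For an mmapped file the same scan would run over the mapped slice cast to
const(char)[], so nothing has to be read into freshly allocated memory.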


