First Impressions

Thomas Kuehne thomas-dloop at kuehne.cn
Sat Sep 30 01:56:17 PDT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Derek Parnell wrote on 2006-09-30:
> On Fri, 29 Sep 2006 10:04:57 -0700, Walter Bright wrote:
>
>> Derek Parnell wrote:
>>> And is it there yet? I mean, given that a string is just a lump of text, is
>>> there any text processing operation that cannot be simply done to a char[]
>>> item? I can't think of any but maybe somebody else can.
>> 
>> I believe it's there. I don't think std::string or java.lang.String have 
>> anything over it.
>
> I'm pretty sure that the phobos routines for search and replace only work
> for ASCII text. For example, std.string.find(japanesetext, "a") will nearly
> always fail to deliver the correct result. It finds the first occurrence of
> the byte value for the letter 'a' which may well be inside a Japanese
> character. It looks for byte-subsets rather than character sub-sets.


~wow~

Have a look at std.string.find's source and try to stop giggling *g*

The correct implementation would be:

# import std.string;
# import std.c.string;
# import std.utf;
# 
# int find(char[] s, dchar c)
# {
#     if (c <= 0x7F)
#     {   // Plain old ASCII
#         auto p = cast(char*)memchr(s, c, s.length);
#         if (p)
#             return p - cast(char*)s;
#         else
#             return -1;
#     }
# 
#     // c is a universal character
#     return std.string.find(s, toUTF8([c]));
# }

The same applies to ifind and the like.
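
For readers on a current compiler, here is a rough sketch of the same idea in present-day D. It is not the Phobos source: the name findChar is made up for illustration, and it assumes core.stdc.string.memchr, std.utf.encode and std.string.indexOf as they exist in today's D2 library. The returned value is a byte offset into the char[] data, not a character count.

```d
import core.stdc.string : memchr;
import std.string : indexOf;
import std.utf : encode;

// Hypothetical helper (not part of Phobos): byte offset of c in s, or -1.
ptrdiff_t findChar(const(char)[] s, dchar c)
{
    if (s.length == 0)
        return -1;

    if (c <= 0x7F)
    {
        // ASCII: every byte of a multi-byte UTF-8 sequence is >= 0x80,
        // so a plain byte scan cannot match inside another character.
        auto p = cast(const(char)*) memchr(s.ptr, cast(int) c, s.length);
        return p ? p - s.ptr : -1;
    }

    // Non-ASCII: encode c as UTF-8 and search for that byte sequence.
    char[4] buf;
    immutable len = encode(buf, c);
    return indexOf(s, buf[0 .. len]);
}

unittest
{
    auto text = "日本語 abc";              // each kanji takes 3 bytes in UTF-8
    assert(findChar(text, 'a') == 10);    // 9 bytes of kanji + 1 space
    assert(findChar(text, '語') == 6);
    assert(findChar(text, 'z') == -1);
}
```

Compile with -unittest to run the checks. The non-ASCII branch simply delegates to the substring search, just like the version above.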

Thomas


-----BEGIN PGP SIGNATURE-----

iD8DBQFFHj4fLK5blCcjpWoRAj67AJoDagf5zf7Az7ZqMDfOyZdRJ+aIqQCdGeen
ye80pstE4IJC1WoxgTVVgdc=
=iwT5
-----END PGP SIGNATURE-----


