Why UTF-8/16 character encodings?

Walter Bright newshound2 at digitalmars.com
Sat May 25 15:01:10 PDT 2013


On 5/25/2013 2:51 PM, Walter Bright wrote:
> On 5/25/2013 12:51 PM, Joakim wrote:
>> For a multi-language string encoding, the header would
>> contain a single byte for every language used in the string, along with multiple
>> index bytes to signify the start and finish of every run of single-language
>> characters in the string. So, a list of languages and a list of pure
>> single-language substrings.
>
> Please implement the simple C function strstr() with this simple scheme, and
> post it here.
>
> http://www.digitalmars.com/rtl/string.html#strstr

I'll go first. Here's a simple UTF-8 version in C. It's not the fastest way to 
do it, but at least it is correct:
----------------------------------
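#include <string.h>   /* strlen, memcmp, size_t, NULL */

char *strstr(const char *s1,const char *s2) {
     size_t len1 = strlen(s1);
     size_t len2 = strlen(s2);
     if (!len2)                  /* empty needle matches at the start */
         return (char *) s1;
     char c2 = *s2;              /* first byte of the needle */
     while (len2 <= len1) {      /* enough bytes left for a match */
         if (c2 == *s1)          /* cheap first-byte test before memcmp */
             if (memcmp(s2,s1,len2) == 0)
                 return (char *) s1;
         s1++;
         len1--;
     }
     return NULL;
}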
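
Byte-wise search is correct for UTF-8 because of how the encoding is laid out: continuation bytes always have the form 10xxxxxx, while the first byte of a character (0xxxxxxx for ASCII, 11xxxxxx for a multi-byte lead) never does. A valid UTF-8 needle therefore starts with a non-continuation byte and can only match at a character boundary of a valid UTF-8 haystack, and since each lead byte fixes the length of its sequence, the rest of the match stays aligned with whole characters. The sketch below checks this on a small case. It assumes both strings are valid UTF-8; `utf8_strstr` is a hypothetical rename of the function above (so it does not collide with the C library's `strstr`), and `at_char_boundary` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same algorithm as the strstr() above, renamed (hypothetical name) so it
   does not clash with the C library's own strstr. Assumes valid UTF-8. */
static const char *utf8_strstr(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    if (len2 == 0)
        return s1;                /* empty needle matches at the start */
    while (len2 <= len1) {        /* enough bytes left for a match */
        if (*s1 == *s2 && memcmp(s1, s2, len2) == 0)
            return s1;
        s1++;
        len1--;
    }
    return NULL;
}

/* UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: nonzero if p points at the first byte of a
   UTF-8 sequence (or at the terminating NUL). */
static int at_char_boundary(const char *p)
{
    return !is_continuation((unsigned char)*p);
}
```

For example, U+00A9 "©" (C2 A9) and U+00E9 "é" (C3 A9) share the trailing byte A9, yet searching "©café" for "é" lands 5 bytes in, on the lead byte C3, and never on the stray A9 at offset 1. Note the hex escapes must be split into separate string literals (`"\xC2\xA9" "caf\xC3\xA9"`), since `c`, `a` and `f` are themselves hex digits.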

