The length of strings vs. # of chars vs. sizeof

Tue Nov 3 01:47:52 PST 2009

Charles Hixson <charleshixsn at earthlink.net> wrote:

> Jesse Phillips wrote:
>> On Sun, 01 Nov 2009 11:36:31 -0800, Charles Hixson wrote:
>>
>>> Does anyone just *know* the answer?  (And if so, could they make the
>>> documentation explicit?)
>>
>> I believe the documentation you are looking for is:
>>
>> http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
>>
>> It is more about understanding UTF than it is about learning strings.
> Thanks, that does appear to be the answer.
> 
> So if a string is too long, and I shorten it by one character, I'd 
> better test it with std.utf.validate(str).  If it doesn't throw an 
> error, it's ok.  Otherwise shorten it again and retry.
> 
> I hope I understood this correctly.  (I'm sure there's a more elegant 
> way to do this, but here I'm going for a simple approach, as I should 
> rarely be encountering this problem.)
> 
> 
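As far as I know, if you want to shorten a UTF-8 string by one character 
you just check the top bits of the last byte. If it starts with 0 it is a 
plain ASCII character, so remove that one byte and you're done. If it 
starts with 10 it is a continuation byte, so go back further until you 
find a byte that starts with 11, and then remove that byte too.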
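
A minimal sketch of that backward scan, written in Python only because 
it is easy to try out. It assumes the input is already valid UTF-8, and 
drop_last_char is a made-up name, not a function from any library. 
Continuation bytes have the bit pattern 10xxxxxx, which is what the mask 
tests for.

```python
def drop_last_char(data: bytes) -> bytes:
    """Remove the last UTF-8 encoded character (assumes valid UTF-8 input)."""
    if not data:
        return data
    i = len(data) - 1
    # Continuation bytes look like 10xxxxxx; step back over them.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    # data[i] is now the first byte of the last character; cut there.
    return data[:i]


if __name__ == "__main__":
    text = "caf\u00e9".encode("utf-8")   # the final character takes two bytes
    print(len(text), len(drop_last_char(text)))
```

The same loop carries over to a D char[] or string slice more or less 
unchanged, since those are arrays of UTF-8 code units.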

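Every multi-byte character starts with a byte that starts with 11, and 
the number of leading 1s in that first byte tells you how many bytes are 
in the character (2, 3 or 4). All the following bytes start with 10. 
Single-byte (ASCII) characters start with 0.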
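
To make the byte-length vs. character-count distinction concrete, here 
is a rough Python sketch that reads the length of each character off its 
first byte. The function names are hypothetical, and it does no real 
validation beyond rejecting a byte that cannot start a character.

```python
def sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 character whose first byte is `lead`."""
    if (lead & 0x80) == 0x00:   # 0xxxxxxx: single-byte (ASCII)
        return 1
    if (lead & 0xE0) == 0xC0:   # 110xxxxx: two bytes
        return 2
    if (lead & 0xF0) == 0xE0:   # 1110xxxx: three bytes
        return 3
    if (lead & 0xF8) == 0xF0:   # 11110xxx: four bytes
        return 4
    raise ValueError("not a valid first byte: 0x%02X" % lead)


def count_chars(data: bytes) -> int:
    """Count characters by hopping from first byte to first byte."""
    count = 0
    i = 0
    while i < len(data):
        i += sequence_length(data[i])
        count += 1
    return count
```

For a string that mixes ASCII with accented or other non-ASCII 
characters, count_chars returns fewer characters than len() reports 
bytes.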

Hope that helps, but you should find a library that already has a 
"shorten my string" function.

-Rory