Behavior of strings with invalid unicode...

Jonathan M Davis jmdavisProg at gmx.com
Mon Nov 26 21:13:15 PST 2012


On Monday, November 26, 2012 08:47:48 monarch_dodra wrote:
> OK: I guess that makes sense. I kind of wish there'd be more of
> a documented "two-level" scheme, but that should be fine.

It's pretty much grown over time and isn't necessarily applied consistently.

> Well, popFront pops just 1 element only if the very first element
> is an invalid code point, but it will not "see" whether the code
> point at index 2 is invalid for multi-byte codes.
> 
> This kind of gives it a double-standard behavior, but I guess we
> have to draw a line somewhere.

We care about making popFront as fast as possible, and in general, front is
called on the character as well (which unfortunately makes the way that front
and popFront work for strings inherently inefficient, since both end up
examining the same bytes), so it makes sense to skip as much checking as
possible in popFront. It only needs to advance past the current code point, so
any checking that it doesn't need to do is best skipped. Speed wins over
correctness here, and anything that we can do to make it faster is desirable.
It's not perfect, but since in most cases the Unicode will be valid, and
validity is generally checked by front (or decode), it was deemed to be the
best approach.
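
To make that split concrete, here is a minimal sketch rather than anything
taken from Phobos itself. The helper names frontThrows and unitsPopped are
made up for illustration, the byte sequence is just one example of invalid
UTF-8, and the results noted in the comments assume a Phobos version that
behaves as described above (front decodes and throws, popFront only looks at
the lead byte).

```d
import std.exception : collectException;
import std.range.primitives : front, popFront;
import std.stdio : writeln;
import std.utf : UTFException;

// Hypothetical helper: true if decoding the first code point of s throws.
bool frontThrows(string s)
{
    return collectException!UTFException(s.front) !is null;
}

// Hypothetical helper: how many code units a single popFront call removes.
size_t unitsPopped(string s)
{
    immutable before = s.length;
    s.popFront();
    return before - s.length;
}

void main()
{
    // 0xE2 announces a 3-byte sequence, but 0x28 is not a continuation byte,
    // so the sequence is invalid. 0x7A is a plain 'z' that follows it.
    immutable(ubyte)[] raw = [0xE2, 0x28, 0xA1, 0x7A];
    string bad = cast(string) raw;

    // front has to decode, so it is the one that notices the problem.
    writeln("front throws: ", frontThrows(bad));    // expected: true

    // popFront trusts the lead byte and skips 3 code units without
    // inspecting the bytes in between.
    writeln("units popped: ", unitsPopped(bad));    // expected: 3
}
```

Code that cannot assume valid input can call std.utf.validate on the string
once up front, which keeps the checking out of the iteration itself.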

- Jonathan M Davis

