On 05/27/2013 09:21 PM, Martin Nowak wrote: > > See unittest/benchmark here: > > https://gist.github.com/blackwhale/5653927 > > > Looks promising. This will not detect 0xFF as an invalid UTF-8 lead byte. For 5- and 6-byte sequences, which are not used for Unicode, it will return a stride of 4.
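
To make the two cases concrete, here is a minimal sketch in C (not the D code from the gist) of a stride lookup that rejects them. The name `utf8_stride` and the convention of returning 0 for an invalid lead byte are assumptions made for illustration only. It looks only at the lead byte, so it does not check continuation bytes, overlong forms (0xC0, 0xC1), or the out-of-range lead bytes 0xF5..0xF7.

```c
#include <stddef.h>

/* Hypothetical helper, not the function from the gist.
   Returns the number of bytes in the UTF-8 sequence introduced by `lead`,
   or 0 if `lead` cannot start a sequence of at most 4 bytes. */
size_t utf8_stride(unsigned char lead)
{
    if (lead < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (lead < 0xC0) return 0;   /* 10xxxxxx: continuation byte, not a lead */
    if (lead < 0xE0) return 2;   /* 110xxxxx */
    if (lead < 0xF0) return 3;   /* 1110xxxx */
    if (lead < 0xF8) return 4;   /* 11110xxx */
    return 0;                    /* 0xF8..0xFD: old 5/6-byte forms; 0xFE, 0xFF: never valid */
}
```

With this convention a caller can treat a return value of 0 as the error case, so 0xFF and the 5- and 6-byte lead bytes are reported as invalid rather than being given a stride of 4.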