Of possible interest: fast UTF8 validation

Joakim dlang at joakim.fea.st
Wed May 16 17:18:06 UTC 2018


On Wednesday, 16 May 2018 at 16:48:28 UTC, Dmitry Olshansky wrote:
> On Wednesday, 16 May 2018 at 15:48:09 UTC, Joakim wrote:
>> On Wednesday, 16 May 2018 at 11:18:54 UTC, Andrei Alexandrescu 
>> wrote:
>>> https://www.reddit.com/r/programming/comments/8js69n/validating_utf8_strings_using_as_little_as_07/
>>
>> Sigh, this reminds me of the old quote about people spending a 
>> bunch of time making more efficient what shouldn't be done at 
>> all.
>
> Validating UTF-8 is super common; most text protocols and files 
> these days use it, and others have an option to do so.
>
> I’d like our validateUtf to be fast, since right now we do 
> validation every time we decode a string. And THAT is slow. 
> Trying to not validate on decode means most things should be 
> validated on input...

I think you know what I'm referring to, which is that UTF-8 is a 
badly designed format, not that input validation shouldn't be 
done.

