Challenge: write a really really small front() for UTF8

Vladimir Panteleev vladimir at thecybershadow.net
Mon Mar 24 09:37:07 PDT 2014


On Monday, 24 March 2014 at 16:31:42 UTC, Andrei Alexandrescu 
wrote:
> On 3/24/14, 2:02 AM, monarch_dodra wrote:
>> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
>> wrote:
>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>>
>>> Andrei
>>
>> Before we roll this out, could we discuss a strategy/guideline 
>> for detecting and handling invalid UTF sequences?
>
> I think std.array.front should return the invalid dchar on 
> error, and popFront should attempt to resync on error.

Ignoring UTF errors is a lossy operation that can cause 
irreversible data loss. For example, consider a program that 
needs to process Windows-1251 files: it reads the file from 
disk, converts it to UTF-8, processes it, converts it back to 
Windows-1251, and saves it back to disk. If a bug causes the 
UTF-8 conversion step to be accidentally skipped, the decoder 
sees raw Windows-1251 bytes, most of which are not valid UTF-8. 
A decoder that silently substitutes the invalid dchar would not 
report the bug; instead, the file's non-ASCII text would be 
replaced and lost when the file is saved back to disk.

I think UTF-8 decoding operations should, by default, throw on 
UTF-8 errors. Ignoring UTF-8 errors should only be done 
explicitly, with the programmer's consent.
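To make the two policies concrete, here is a minimal sketch in 
Rust (not D, and not Phobos code). The hypothetical function 
`decode_front` plays the role of front/popFront on a byte slice: 
it returns the first code point plus the number of bytes to skip. 
With the hypothetical `OnError::Fail` policy it reports invalid 
input as an error (the default argued for above); with 
`OnError::Replace` the caller explicitly opts in to substituting 
U+FFFD and skipping a single byte, which is just one simple 
resynchronization strategy. The names `decode_front`, 
`decode_all`, `OnError` and `InvalidUtf8` are invented for 
illustration; validation of each candidate sequence is delegated 
to Rust's `std::str::from_utf8`.

```rust
/// Hypothetical error type: the input does not start with a valid
/// UTF-8 sequence. `byte` is the first byte of the bad sequence.
#[derive(Debug, PartialEq)]
pub struct InvalidUtf8 {
    pub byte: u8,
}

/// Hypothetical error policy for the decoder.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OnError {
    /// Report invalid input to the caller (strict default).
    Fail,
    /// Substitute U+FFFD and skip one byte (lossy, explicit opt-in).
    Replace,
}

/// Decodes the code point at the front of `s` ("front") and returns it
/// together with the number of bytes to skip ("popFront").
/// Returns Ok(None) when `s` is empty.
pub fn decode_front(s: &[u8], policy: OnError) -> Result<Option<(char, usize)>, InvalidUtf8> {
    let first = match s.first() {
        Some(&b) => b,
        None => return Ok(None),
    };
    // Sequence length implied by the first byte; 0 means this byte can
    // never start a well-formed UTF-8 sequence.
    let len = match first {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0,
    };
    // Let the standard library reject bad continuation bytes, overlong
    // forms, surrogates and values above U+10FFFF.
    let decoded = if len > 0 && s.len() >= len {
        std::str::from_utf8(&s[..len]).ok().and_then(|t| t.chars().next())
    } else {
        None
    };
    match (decoded, policy) {
        (Some(c), _) => Ok(Some((c, len))),
        (None, OnError::Fail) => Err(InvalidUtf8 { byte: first }),
        // Resynchronize by skipping only the first byte and retrying.
        (None, OnError::Replace) => Ok(Some(('\u{FFFD}', 1))),
    }
}

/// Decodes a whole buffer by repeatedly taking the front code point
/// and popping the bytes it occupied.
pub fn decode_all(mut s: &[u8], policy: OnError) -> Result<String, InvalidUtf8> {
    let mut out = String::new();
    while let Some((c, n)) = decode_front(s, policy)? {
        out.push(c);
        s = &s[n..];
    }
    Ok(out)
}
```

Fed the Windows-1251 bytes for "Привет" (CF F0 E8 E2 E5 F2), the 
strict policy fails at the very first byte, so the skipped 
conversion step is caught immediately. The replacing policy 
instead yields six U+FFFD characters: all six letters collapse 
into the same placeholder, and the original text cannot be 
recovered from the output.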


More information about the Digitalmars-d mailing list