Challenge: write a really really small front() for UTF8
Vladimir Panteleev
vladimir at thecybershadow.net
Mon Mar 24 09:37:07 PDT 2014
On Monday, 24 March 2014 at 16:31:42 UTC, Andrei Alexandrescu wrote:
> On 3/24/14, 2:02 AM, monarch_dodra wrote:
>> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>>
>>> Andrei
>>
>> Before we roll this out, could we discuss a strategy/guideline in
>> regards to detecting and handling invalid UTF sequences?
>
> I think std.array.front should return the invalid dchar on
> error, and popFront should attempt to resync on error.

Ignoring UTF errors is a lossy operation and can cause
irreversible data loss. For example, consider a program that
needs to process Windows-1251 files: it would read the file from
disk, convert it to UTF-8, process it, convert it back to
Windows-1251, and save it back to disk. If a bug causes the
conversion to UTF-8 to be accidentally skipped, then all
non-ASCII data in that file is lost.
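
A rough illustration of that failure mode, sketched in Python rather
than D (the sample string and the helper name lossy_roundtrip are
made up for this example, and Python's errors="replace" stands in
for any decoder that silently ignores invalid sequences):

```python
# Bytes as they would be stored on disk in a Windows-1251 file.
original = "Привет, world".encode("cp1251")

def lossy_roundtrip(raw: bytes) -> bytes:
    """Hypothetical buggy pipeline: the cp1251 -> UTF-8 step was skipped."""
    # The raw cp1251 bytes are decoded as if they were UTF-8. With a
    # lenient decoder, every invalid sequence silently becomes U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # ... processing would happen here ...
    # Converting "back" to Windows-1251: U+FFFD has no cp1251 mapping,
    # so the lenient encoder substitutes "?" for each one.
    return text.encode("cp1251", errors="replace")

damaged = lossy_roundtrip(original)
print(damaged)
```

The ASCII part of the text survives, but the Cyrillic bytes are gone
and nothing in the output allows them to be reconstructed.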

I think UTF-8 decoding operations should, by default, throw on
UTF-8 errors. Ignoring UTF-8 errors should only be done
explicitly, with the programmer's consent.
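
As a minimal sketch of that policy (again in Python, not the D API
under discussion; decode_utf8 and its ignore_errors parameter are
hypothetical names), decoding is strict unless the caller opts out
explicitly:

```python
def decode_utf8(raw: bytes, *, ignore_errors: bool = False) -> str:
    """Hypothetical wrapper: strict by default, lenient only on request."""
    return raw.decode("utf-8", errors="replace" if ignore_errors else "strict")

bad = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8

# Default behaviour: the error is reported instead of being swallowed.
try:
    decode_utf8(bad)
    raised = False
except UnicodeDecodeError:
    raised = True

# Explicit opt-in: the caller accepts that invalid bytes become U+FFFD.
lenient = decode_utf8(bad, ignore_errors=True)
```

With this shape, the skipped-conversion bug fails loudly at the
decoding step instead of silently corrupting the file.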