Parsing a UTF-16LE file line by line, BUG?

Patrick Schluter via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Jan 29 06:49:29 PST 2017


On Saturday, 28 January 2017 at 15:40:24 UTC, Nestor wrote:
> On Friday, 27 January 2017 at 04:26:31 UTC, Era Scarecrow wrote:
>>  Skipping the BOM is just a matter of skipping the first two 
>> bytes identifying it...
>
> AFAIK in some cases the BOM takes up to 4 bytes (for UTF-32), 
> so when the input encoding is unknown one must perform some kind 
> of detection in order to apply the correct transcoding later. I 
> thought by now dmd had this functionality built in and exposed, 
> since the compiler itself seems to do it for source code units.

In UTF-8 files the BOM is 3 bytes long (EF BB BF), so depending 
on the encoding the BOM can take 2, 3 or 4 bytes.
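
Since the BOM length depends on the encoding, here is a minimal sketch of the kind of detection Nestor describes, looking only at the first bytes of the input. The names `Bom`, `BomInfo` and `detectBom` are made up for illustration and are not Phobos APIs. It only covers the UTF-8, UTF-16 and UTF-32 BOMs, and it assumes that FF FE 00 00 means UTF-32LE, although the same bytes could in theory be a UTF-16LE file whose first character is U+0000.

```d
import std.stdio : writeln;

/// Encodings this sketch can tell apart by their byte order mark.
enum Bom { none, utf8, utf16le, utf16be, utf32le, utf32be }

/// Which BOM was found and how many bytes it occupies.
struct BomInfo
{
    Bom kind;
    size_t length;
}

/// Hypothetical helper (not part of Phobos): inspects the start of `data`.
BomInfo detectBom(const(ubyte)[] data)
{
    // True if `haystack` begins with the bytes in `prefix`.
    static bool beginsWith(const(ubyte)[] haystack, const(ubyte)[] prefix)
    {
        return haystack.length >= prefix.length
            && haystack[0 .. prefix.length] == prefix;
    }

    static immutable ubyte[] bomUtf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] bomUtf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] bomUtf8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] bomUtf16le = [0xFF, 0xFE];
    static immutable ubyte[] bomUtf16be = [0xFE, 0xFF];

    // The UTF-32LE BOM starts with the UTF-16LE BOM (FF FE),
    // so the 4-byte marks have to be tested first.
    if (beginsWith(data, bomUtf32le)) return BomInfo(Bom.utf32le, 4);
    if (beginsWith(data, bomUtf32be)) return BomInfo(Bom.utf32be, 4);
    if (beginsWith(data, bomUtf8))    return BomInfo(Bom.utf8, 3);
    if (beginsWith(data, bomUtf16le)) return BomInfo(Bom.utf16le, 2);
    if (beginsWith(data, bomUtf16be)) return BomInfo(Bom.utf16be, 2);
    return BomInfo(Bom.none, 0);
}

void main()
{
    // "hi" encoded as UTF-16LE, preceded by its BOM.
    immutable ubyte[] sample = [0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00];

    auto info = detectBom(sample);
    // Skip exactly as many bytes as the detected BOM occupies.
    auto payload = sample[info.length .. $];

    writeln(info.kind, ", BOM bytes: ", info.length,
            ", payload bytes: ", payload.length);
}
```

With something like this, the number of bytes to skip comes from the detected BOM rather than being hard-coded to two, and `info.kind` tells you which transcoding to apply to the rest of the file.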

