std.stream, BOM, and deprecation

Ali Çehreli acehreli at yahoo.com
Sat Oct 13 19:15:32 PDT 2012


On 10/13/2012 06:53 PM, Charles Hixson wrote:
 > If std.stream is being deprecated, what is the correct way to deal with
 > file BOMs? This is particularly concerning utf8 files, which I
 > understand to be a bit problematic, as there isn't, actually, a utf8
 > BOM,

That's correct. There is just one byte order for UTF-8.

 > merely a convention which isn't a part of a standard.

I am not sure about that. The Unicode standard describes UTF-8 as code 
units following each other in the file. There can't be any confusion 
about their order. According to Wikipedia, the only use of BOM for UTF-8 
is to identify the file as having been encoded in UTF-8:

   http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

But that can't be a reliable signal. The file could just as well have 
been encoded in any one of the multitude of code pages, where those same 
three bytes are ordinary characters. Treating the first three bytes as a 
BOM would be taking a chance in that case and dropping three characters.
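
To make the trade-off concrete, here is a minimal sketch of what 
"treating the first three bytes as a BOM" amounts to. The helper name 
stripUtf8Bom is made up for this example (it is not a Phobos function), 
and the sketch simply assumes the data really is UTF-8:

```d
/// The UTF-8 encoding of U+FEFF (the byte order mark).
immutable ubyte[3] utf8Bom = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper, not part of Phobos: returns `data` without a
/// leading EF BB BF sequence. It assumes the caller already believes
/// the data is UTF-8; it cannot tell a real BOM from three code-page
/// characters that happen to have the same byte values.
const(ubyte)[] stripUtf8Bom(const(ubyte)[] data)
{
    if (data.length >= utf8Bom.length
        && data[0 .. utf8Bom.length] == utf8Bom[])
    {
        return data[utf8Bom.length .. $];
    }
    return data;
}

void main()
{
    import std.stdio : writeln;

    // The ASCII text "hi" (0x68 0x69), with and without a leading BOM.
    immutable ubyte[] withBom = [0xEF, 0xBB, 0xBF, 0x68, 0x69];
    immutable ubyte[] withoutBom = [0x68, 0x69];

    assert(stripUtf8Bom(withBom) == withoutBom);
    assert(stripUtf8Bom(withoutBom) == withoutBom);

    // Reinterpret the remaining bytes as UTF-8 text and print them.
    writeln(cast(const(char)[]) stripUtf8Bom(withBom));
}
```

In a real program the bytes would presumably come from something like 
std.file.read, cast to const(ubyte)[], rather than from a literal.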

 > But the
 > std.stdio documentation doesn't so much as mention byte order marks
 > (BOMs).
 >
 > If this should wait until std.io is released, then I could use
 > std.stream until then, but the documentation is already warning to avoid
 > using it.

As I understand it, it is all down to convention anyway. What is the 
meaning of the non-ASCII byte value 166? Only the generator of the file 
knows. :/
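
As a rough illustration of that ambiguity, the sketch below looks at the 
single byte 166 (0xA6) in two ways. The helper names isValidUtf8 and 
fromLatin1 are made up for this example, and ISO-8859-1 is only one 
guess among many possible code pages:

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `data` is a well-formed UTF-8 sequence.
bool isValidUtf8(const(ubyte)[] data)
{
    try
    {
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Hypothetical helper: decodes `data` under the assumption that it is
/// ISO-8859-1, where each byte value equals its Unicode code point.
string fromLatin1(const(ubyte)[] data)
{
    string result;
    foreach (b; data)
        result ~= cast(dchar) b; // appending a dchar encodes it as UTF-8
    return result;
}

void main()
{
    immutable ubyte[] data = [166];

    // On its own, 0xA6 is a UTF-8 continuation byte, so this is not
    // well-formed UTF-8.
    assert(!isValidUtf8(data));

    // Read as ISO-8859-1, the same byte is U+00A6 (BROKEN BAR).
    assert(fromLatin1(data) == "\u00A6");
}
```

Nothing in the byte itself says which reading is the intended one; the 
program has to be told, or has to guess.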

Ali
