Unicode BOM and endianness

Thomas Kuehne thomas-dloop at kuehne.cn
Fri Aug 4 13:35:59 PDT 2006

Hasan Aljudy wrote on 2006-08-04:
> Derek Parnell wrote:
>> On Fri, 04 Aug 2006 00:36:21 -0300, Tim Locke wrote:
>>>How do I acquire and determine the BOM and endianness of a file I am
>> You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark
> Are GNU tools really as ignorant of Unicode as that page implies?
> [quote]
> While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may 
> be used to mark text as UTF-8. Quite a lot of Windows software 
> (including Windows Notepad) adds one to UTF-8 files. However in 
> Unix-like systems (which make heavy use of text files for configuration) 
> this practice is not recommended, as it will interfere with correct 
> processing of important codes such as the hash-bang at the start of an 
> interpreted script.

Let's have 2 UTF-8 files with BOM: A and B

cat A B > C

A's BOM will remain a BOM, but B's BOM ends up in the middle of C and is
going to be interpreted as a "zero-width no-break space" (U+FEFF): an
invisible character that is nonetheless part of the text. Thus using BOMs
in combination with streaming, concatenating etc. will always cause
problems. In contrast to Windows, Linux - home to the GNU tools - makes no
distinction between "text" and "binary" files: both are treated as plain
byte streams.
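
To make the cat example concrete, here is a minimal sketch in Python
(not D, purely for illustration). The file contents "first" and "second"
are made up, and the byte strings a, b and c stand in for the files A, B
and C. It assumes the reader of C strips a single leading BOM, which is
what Python's "utf-8-sig" codec does.

```python
import codecs

# Hypothetical contents for files A and B; each starts with a UTF-8 BOM (EF BB BF).
a = codecs.BOM_UTF8 + "first\n".encode("utf-8")
b = codecs.BOM_UTF8 + "second\n".encode("utf-8")

# `cat A B > C` is a plain byte-wise concatenation.
c = a + b

# "utf-8-sig" removes one BOM at the very start of the data, nothing more.
text = c.decode("utf-8-sig")

# Any remaining U+FEFF is now an ordinary (invisible) character in the text.
bom_positions = [i for i, ch in enumerate(text) if ch == "\ufeff"]

print(repr(text))
print(bom_positions)
```

The BOM that came from B is still there, directly in front of "second",
and any tool comparing or parsing that line has to cope with it.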




More information about the Digitalmars-d-learn mailing list