Unicode BOM and endianness

Hasan Aljudy hasan.aljudy at gmail.com
Thu Aug 3 22:03:18 PDT 2006



Derek Parnell wrote:
> On Fri, 04 Aug 2006 00:36:21 -0300, Tim Locke wrote:
> 
> 
>>How do I acquire and determine the BOM and endianness of a file I am
>>reading?
>>
>>Thanks
> 
> 
> You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark
> 

Are GNU tools really as ignorant of Unicode as that page implies?

[quote]
While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may 
be used to mark text as UTF-8. Quite a lot of Windows software 
(including Windows Notepad) adds one to UTF-8 files. However in 
Unix-like systems (which make heavy use of text files for configuration) 
this practice is not recommended, as it will interfere with correct 
processing of important codes such as the hash-bang at the start of an 
interpreted script. It may also interfere with source for programming 
languages that don't recognise it. For example, gcc reports stray 
characters at the beginning of a source file, and in PHP, if output 
buffering is disabled, it has the subtle effect of causing the page to 
start being sent to the browser, preventing custom headers from being 
specified by the PHP script.
[/quote]
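
On the original question of how to determine the BOM and endianness of a file: a minimal sketch in Python, not taken from this thread, is to read the first four bytes and compare them against the known BOM signatures. The names `sniff_bom` and `sniff_file` are hypothetical, chosen only for illustration. The BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Check longer signatures first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32", "big"),
    (codecs.BOM_UTF32_LE, "UTF-32", "little"),
    (codecs.BOM_UTF8, "UTF-8", None),  # UTF-8 has no byte order
    (codecs.BOM_UTF16_BE, "UTF-16", "big"),
    (codecs.BOM_UTF16_LE, "UTF-16", "little"),
]


def sniff_bom(head):
    """Return (encoding, byte_order, bom_length) for the leading bytes.

    Returns (None, None, 0) when no BOM is present; in that case the
    encoding has to be determined some other way.
    """
    for bom, encoding, byte_order in BOMS:
        if head.startswith(bom):
            return encoding, byte_order, len(bom)
    return None, None, 0


def sniff_file(path):
    """Hypothetical helper: inspect the first four bytes of a file."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

The returned length tells you how many bytes to skip before decoding, which is also how a tool could avoid the hash-bang problem described in the quote. One caveat: a UTF-16 little-endian file whose first character after the BOM is U+0000 looks identical to a UTF-32 little-endian BOM, so this check is a heuristic, not a guarantee.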


