Unicode BOM and endianness
Hasan Aljudy
hasan.aljudy at gmail.com
Thu Aug 3 22:03:18 PDT 2006
Derek Parnell wrote:
> On Fri, 04 Aug 2006 00:36:21 -0300, Tim Locke wrote:
>
>
>>How do I acquire and determine the BOM and endianness of a file I am
>>reading?
>>
>>Thanks
>
>
> You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark
>
Are GNU tools really as ignorant of Unicode as that page implies?
[quote]
While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may
be used to mark text as UTF-8. Quite a lot of Windows software
(including Windows Notepad) adds one to UTF-8 files. However, in
Unix-like systems (which make heavy use of text files for configuration)
this practice is not recommended, as it will interfere with correct
processing of important codes such as the hash-bang at the start of an
interpreted script. It may also interfere with source for programming
languages that don't recognise it. For example, gcc reports stray
characters at the beginning of a source file, and in PHP, if output
buffering is disabled, it has the subtle effect of causing the page to
start being sent to the browser, preventing custom headers from being
specified by the PHP script.
[/quote]
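
On the original question of how to determine the BOM and endianness of a file: one common approach is to read the first few bytes and compare them against the known BOM signatures. Below is a rough sketch in Python (chosen only for brevity; the same byte comparison carries over to other languages). The names detect_bom and detect_file_bom are made up for illustration. It assumes the file actually begins with a BOM; a file without one yields no answer, and you would need another way to guess its encoding.

```python
import codecs

# Longer signatures are checked first, because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32", "little"),
    (codecs.BOM_UTF32_BE, "utf-32", "big"),
    (codecs.BOM_UTF8, "utf-8", None),  # UTF-8 has no byte order
    (codecs.BOM_UTF16_LE, "utf-16", "little"),
    (codecs.BOM_UTF16_BE, "utf-16", "big"),
]


def detect_bom(data):
    """Return (encoding, byte_order, bom_length) for the leading bytes.

    Returns (None, None, 0) when no known BOM is present.
    """
    for bom, encoding, order in BOMS:
        if data.startswith(bom):
            return encoding, order, len(bom)
    return None, None, 0


def detect_file_bom(path):
    """Hypothetical helper: inspect only the first 4 bytes of a file."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

One caveat: a UTF-16 little-endian file whose first character after the BOM is U+0000 looks identical to a UTF-32 little-endian BOM, so this check would report it as UTF-32. That case is rare in real text, but the ambiguity cannot be resolved from the BOM alone.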
More information about the Digitalmars-d-learn
mailing list