[Issue 1024] invalid UTF-8 sequence for \u00B6 (¶) in comment

d-bugmail at puremagic.com
Sun Mar 4 13:41:55 PST 2007


http://d.puremagic.com/issues/show_bug.cgi?id=1024





------- Comment #1 from fvbommel at wxs.nl  2007-03-04 15:41 -------
Was the encoding UTF-8? Did your file start with the appropriate BOM? (DMD
requires a BOM to consider a file anything other than pure ASCII.)

Here's a test for you, try to reproduce this:
---
urxae at urxae:~/tmp$ cat utf.d
//¶
urxae at urxae:~/tmp$ hd utf.d
00000000  ef bb bf 2f 2f c2 b6 0a                           |...//...|
00000008
urxae at urxae:~/tmp$ dmd -c utf.d
urxae at urxae:~/tmp$ 
---
The first command shows the contents of the file. (Apparently cat doesn't
handle BOMs; it just sends them straight to the console, and that's where the
extra symbol comes from.)
The second shows the hexdump of the file. Note the 'ef bb bf' UTF-8 BOM, and
the 'c2 b6' encoding of the '¶'.
The third command shows DMD compiling the file successfully.
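
If you don't have hd handy, the same eight bytes can be built and checked from
a script. This is only a minimal sketch in Python (my choice of language, not
something DMD needs); the helper name write_test_file is made up for
illustration, and the file name utf.d is the one from the session above.

```python
import codecs

# U+00B6 (the pilcrow sign) encodes to the two bytes c2 b6 in UTF-8.
pilcrow = "\u00b6".encode("utf-8")

# UTF-8 BOM (ef bb bf), then the comment "//" + pilcrow, then a newline.
data = codecs.BOM_UTF8 + b"//" + pilcrow + b"\n"


def write_test_file(path="utf.d"):
    """Write the bytes in binary mode so nothing re-encodes them.

    Hypothetical helper, not part of any tool mentioned in this report.
    """
    with open(path, "wb") as f:
        f.write(data)


# Space-separated hex, comparable to the hd output above.
print(data.hex(" "))
```

Calling write_test_file() should leave a utf.d in the current directory whose
hexdump matches the one shown above.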

See http://www.digitalmars.com/d/lex.html (under "Source Text") for the details
on the encodings accepted by DMD.
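
To check which BOM (if any) your own file starts with, something along these
lines should do. It is a rough sketch, not DMD's actual lexer logic; the
function name sniff_bom and the returned labels are my own, and the BOM
constants come from Python's standard codecs module.

```python
import codecs

# Longer BOMs go first: the UTF-32 LE BOM (ff fe 00 00) begins with the
# UTF-16 LE BOM (ff fe), so checking UTF-16 first would misreport it.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes):
    """Return a label for the BOM at the start of data, or None if absent."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and report its BOM, if any."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

For the utf.d above, sniff_file("utf.d") should report "UTF-8"; a file saved
without a BOM gives None.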

