[Issue 1865] New: Escape sequences are flawed.

d-bugmail at puremagic.com d-bugmail at puremagic.com
Sun Feb 24 13:32:09 PST 2008


http://d.puremagic.com/issues/show_bug.cgi?id=1865

           Summary: Escape sequences are flawed.
           Product: D
           Version: 1.027
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P1
         Component: DMD
        AssignedTo: bugzilla at digitalmars.com
        ReportedBy: aziz.kerim at gmail.com


The spec states (http://www.digitalmars.com/d/1.0/lex.html):
"Although string literals are defined to be composed of UTF characters, the
octal and hex escape sequences allow the insertion of arbitrary binary data."

This holds true for normal string literals (e.g. "abc") but not for escape
string literals. For instance:

auto str = \xDB;
pragma(msg, typeof(str).stringof); // Should be char[1u] but prints: char[2u]
auto str2 = "\xDB";
pragma(msg, typeof(str2).stringof); // Prints: char[1u]
static assert(\xDB == "\xDB"); // Should be equal, but aren't.
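
A length of 2 is what would result if the escape string literal were treated
as the code point U+00DB and encoded as UTF-8 instead of being stored as the
raw byte 0xDB. That is only an assumption about what dmd does here; it has not
been checked against the compiler source. The Python sketch below (Python, not
D) just shows the two byte counts side by side:

```python
# Assumption: \xDB outside quotes is handled as code point U+00DB and UTF-8
# encoded, while "\xDB" inside quotes is stored as the single raw byte 0xDB.
raw = bytes([0xDB])                  # what the spec's wording implies
encoded = chr(0xDB).encode("utf-8")  # what a UTF-8 encoding step would produce

print(len(raw))       # 1
print(len(encoded))   # 2
print(encoded.hex())  # c39b
```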

I also found that octal escape sequences are handled incorrectly.
The largest value a three-digit octal escape can express is 0777, which equals
0x1FF in hex. dmd appears to truncate such values to a single byte instead:

pragma(msg, '\777'.stringof); // Prints: '\xff'
static assert('\777' == 0x1FF); // Shouldn't fail.
static assert('\777' == 0xFF); // Shouldn't pass.
static assert('\377' == 0xFF); // Passes as they are really equal.

As we can see, values from 0400 (0x100) to 0777 (0x1FF) do not fit in one byte
and need two bytes to be represented correctly. Therefore, when the lexer
encounters escape string literals \400 to \777, or the same escapes inside
quotes ("\400" to "\777"), it must encode the value into the string as two
bytes. Example:

char[2] str = \777;
static assert(str[0] == 1 && str[1] == 0xFF);
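
The arithmetic behind this proposal can be sketched in Python (not D). The
helper name split_octal_escape is hypothetical and exists only for this
illustration; the high-byte-first order is taken from the static assert above.

```python
def split_octal_escape(digits: str) -> bytes:
    """Return the bytes proposed for an octal escape such as "777".

    Hypothetical helper: values up to 0xFF stay one byte, larger values
    are split into a high byte followed by a low byte.
    """
    value = int(digits, 8)
    if value <= 0xFF:
        return bytes([value])
    return bytes([value >> 8, value & 0xFF])


print(hex(int("777", 8)))               # 0x1ff
print(split_octal_escape("377").hex())  # ff
print(split_octal_escape("400").hex())  # 0100
print(split_octal_escape("777").hex())  # 01ff
```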

I think it's appropriate to mark this bug report as critical.

