String Literal Docs

div0 div0 at users.sourceforge.net
Sun Jun 20 03:52:22 PDT 2010


On 20/06/2010 11:03, Alix Pexton wrote:
> On 20/06/2010 01:09, div0 wrote:
>> On 19/06/2010 23:17, Ellery Newcomer wrote:
>>>
>>> All I can say is
>>>
>>> auto w = x"dead beef"w;
>>>
>>> results in
>>>
>>> Error: invalid UTF-8 sequence
>>>
>>> on dmd 2.047
>>
>> Then you've found a bug, you know what to do:
>>
>> http://d.puremagic.com/issues/
>>
>
> Hmn, that would seem to indicate to me that the postfix is being allowed
> when the hex represents a valid UTF sequence, but not otherwise.
>
> I didn't do too much testing myself as I know next to zilch about string
> internals ><
>
> The text that describes hex strings says that they have to have an even
> number of digits, but this would seem to imply that they have to have a
> multiple of 4 or 8 for wstrings and dstrings respectively, which makes
> sense, but I'm not sure that can be verified in the lexing of a string
> literal without insane lookahead rules ><

It says a multiple of 2, not an even number of digits. To me that implies
each character always takes 2 digits, and that accepting the suffix is just
a bug. It could be made clearer, though.

>
> But, then I guess that is why the spec says that hex strings are exempt
> from the valid UTF rule, and in that case hexstrings should really make
> byte arrays rather than strings, but failing that, always chars and not
> anything wider.
>
> A...

Yeah, hex strings should probably have the type ubyte[].

If you're using them to put arbitrary binary data in your program, you're
almost certainly going to cast the array to something else anyway, so char[],
wchar[], and dchar[] all seem a bit pointless, and since they allow invalid
UTF, making them ?char[] seems wrong.
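
As a rough sketch of the kind of cast I mean (this assumes a compiler that
accepts x"..." literals and allows casting them to a byte array; the function
name deadBeef is made up purely for illustration):

```d
// Hypothetical helper: returns the raw bytes of a hex string literal.
// Whitespace inside the literal is ignored, so this is DE AD BE EF.
immutable(ubyte)[] deadBeef()
{
    // No suffix is used; the literal is reinterpreted as bytes.
    return cast(immutable(ubyte)[]) x"dead beef";
}

void main()
{
    auto bytes = deadBeef();
    assert(bytes.length == 4);   // two hex digits per byte
    assert(bytes[0] == 0xDE);
    assert(bytes[3] == 0xEF);
}
```

If the literal were typed ubyte[] to begin with, the cast would not be needed
and the w/d suffix question would go away.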

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk

