byte to char safe?

Sergey Gromov snake.scaly at gmail.com
Sun Aug 2 17:11:31 PDT 2009


Sat, 01 Aug 2009 19:58:20 -0400, Harry wrote:

> Sergey Gromov Wrote:
> 
>> Thu, 30 Jul 2009 19:14:56 -0400, Harry wrote:
>> 
>>> Ary Borenszweig Wrote:
>>> 
>>>> Harry wrote:
>>>>> Again hello,
>>>>> 
>>>>> char[6] t = r"again" ~ cast(char)7 ~ r"hello";
>>>> 
>>>> If you want the result to be "again7hello", then no. You must do:
>>>> 
>>>> char[6] t = r"again" ~ '7' ~ r"hello";
>>>> 
>>>> or:
>>>> 
>>>> char[6] t = r"again" ~ (cast(char)('0' + 7)) ~ r"hello";
>>> 
>>> Hello Ary,
>>> 
>>> 7 is data not string.
>>> It makes own write function
>>> need style data in char[]
>>> Not sure if safe ?
>> 
>> If you use only your own write function then you can put just anything
>> into char[].  But if you pass that char[] to any standard function, or
>> even foreach, and there are non-UTF-8 sequences in there, the standard
>> function will fail.
>> 
>> Also note that values from 0 to 0x7F are valid UTF-8 codes and can be
>> safely inserted into char[].
>> 
>> If you want to safely put a larger constant into char[] you can use
>> unicode escape sequences: '\uXXXX' or '\UXXXXXXXX', where XXXX and
>> XXXXXXXX are 4 or 8 hexadecimal digits respectively:
>> 
>>     char[] foo = "hello " ~ "\u017e" ~ "\U00105614";
>>     foreach (dchar ch; foo)
>>         writefln("%x", cast(uint) ch);
>> 
>> Finally, if you want to encode a variable into char[], you can use
>> std.utf.encode function:
>> 
>>     char[] foo;
>>     uint value = 0x00100534;
>>     std.utf.encode(foo, value);
>> 
>> Unfortunately all std.utf functions accept only valid UTF characters.
>> Currently they're everything from 0 to 0xD7FF and from 0xE000 to
>> 0x10FFFF.  Any other character values will throw a run-time exception if
>> passed to standard functions.
> 
> thank you!
> 
> non-print utf8 is print with writef
> start of text \x02 is smile
> end of text \x03 is heart
> newline \x0a is newline!

Well, sure, standard writef simply outputs those characters to the
console.  Then console prints them according to its own rules.
Therefore special characters will have different representation on
different consoles.  If you want consistent output you should convert
those special characters to some printable form.
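
For instance, here is a minimal sketch of such a conversion.  It assumes
a current D2 compiler and Phobos, and toPrintable is only a hypothetical
helper name, not a library function.  It replaces ASCII control
characters with \xNN escapes and leaves everything else alone:

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): returns a copy of `s` in which
/// ASCII control characters are replaced by visible \xNN escapes.
string toPrintable(const(char)[] s)
{
    string result;
    // Iterating as dchar decodes the UTF-8; invalid sequences throw here.
    foreach (dchar ch; s)
    {
        if (ch < 0x20 || ch == 0x7F)
            result ~= format("\\x%02x", cast(uint) ch);
        else
            result ~= ch;   // appending a dchar re-encodes it as UTF-8
    }
    return result;
}

void main()
{
    // Start of text, end of text and newline become \x02, \x03 and \x0a.
    writeln(toPrintable("smile\x02 heart\x03\n"));
}
```

With this the output no longer depends on how a particular console
draws \x02 or \x03.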

> is difference? utf.encode(foo,value)  foo~"\U00100534"

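A little: encode works on a run-time value, while the escape sequence
is fixed at compile time.  The result is the same, though.  The code: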

uint value = 0x00100534;
std.utf.encode(foo, value);

is the same as:

foo ~= "\U00100534";
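
If you want to check that yourself, here is a small sketch, assuming a
current D2 compiler and Phobos.  encodeOne is a hypothetical wrapper I
made up for the example; only std.utf.encode is the real library call.
Note that current compilers want a dchar there, so a uint variable
would need a cast:

```d
import std.utf : encode;

/// Hypothetical wrapper: UTF-8 encodes a single code point at run time.
char[] encodeOne(dchar c)
{
    char[] buf;
    encode(buf, c);   // appends 1 to 4 code units; throws UTFException if c is invalid
    return buf;
}

void main()
{
    dchar value = 0x00100534;

    // The same code point written as a compile-time escape sequence.
    char[] viaLiteral;
    viaLiteral ~= "\U00100534";

    assert(encodeOne(value) == viaLiteral);
    assert(viaLiteral.length == 4);   // this code point takes four UTF-8 bytes
}
```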


More information about the Digitalmars-d-learn mailing list