To Walter, about char[] initialization by FF

Andrew Fedoniouk news at terrainformatica.com
Sat Jul 29 15:37:18 PDT 2006


"Walter Bright" <newshound at digitalmars.com> wrote in message 
news:eagmrk$1pn9$1 at digitaldaemon.com...
> Andrew Fedoniouk wrote:
>>>> What is the point of the current initialization?
>>> The point is to initialize it with an invalid value, in order to flush 
>>> out uninitialized data errors.
>>>
>>>> If you are doing initialization already,
>>>> and this initialization is part of the specification, why not use
>>>> the official "NUL" value in this case?
>>> Because 0 is a valid UTF-8 character.
>>
>> 1) What exactly does "UTF-8 character" mean?
>
> For an exact answer, the spec is: http://www.ietf.org/rfc/rfc3629.txt
> There isn't much to it.

Sorry, but I understand what a UCS character means;
what exactly is the "UTF-8 character" you are referring to?

Is it 1) a single octet in a UTF-8 sequence, or
2) a sequence of octets representing one Unicode character (a 21-bit value)?
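To make the two readings concrete, here is a small sketch in Python (used purely for illustration; it is not D, and the sample word is an arbitrary choice). It counts the same Russian word once as Unicode characters (reading 2) and once as UTF-8 octets (reading 1):

```python
# Illustration only: the two possible meanings of "UTF-8 character".
word = "Привет"  # arbitrary Russian sample word

code_points = len(word)               # reading 2: one Unicode character each
octets = len(word.encode("utf-8"))    # reading 1: individual UTF-8 octets

# Every Cyrillic letter in this word needs two octets in UTF-8.
print(code_points, octets)  # 6 12
```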


>
>> 2) In ASCII, char(0) is officially NUL. Why not initialize strings
>> with NUL?
>
> Because 0 characters are valid UTF-8 values. By using an invalid UTF-8 
> value, we can flush out bugs from uninitialized data.

Oh....

A UTF-8 octet with the value 0 can represent only one character:
the one with code point 0x00000000 (U+0000).

In plain English: a zero octet can never appear in the middle of a
multi-byte UTF-8 sequence.
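Both sides of this argument can be checked mechanically. The following is a minimal Python sketch (an illustration only, not part of D or Phobos) that encodes every Unicode scalar value and records which ones use the octet 0x00 and which use 0xFF. It shows that 0x00 appears only in the encoding of U+0000, while 0xFF never appears at all, which is the property that makes 0xFF usable as an "invalid" fill value:

```python
def zero_and_ff_usage():
    """Return the code points whose UTF-8 encoding contains 0x00 or 0xFF."""
    zero_users = []  # code points whose UTF-8 form contains octet 0x00
    ff_users = []    # code points whose UTF-8 form contains octet 0xFF
    for cp in range(0x110000):
        # Surrogates U+D800..U+DFFF are not encodable in UTF-8; skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        encoded = chr(cp).encode("utf-8")
        if 0x00 in encoded:
            zero_users.append(cp)
        if 0xFF in encoded:
            ff_users.append(cp)
    return zero_users, ff_users


zeros, ffs = zero_and_ff_usage()
print(zeros)  # [0]  (only U+0000 uses the zero octet)
print(ffs)    # []   (no character ever uses 0xFF)

# A strict UTF-8 decoder rejects a lone 0xFF octet.
try:
    bytes([0xFF]).decode("utf-8")
    ff_rejected = False
except UnicodeDecodeError:
    ff_rejected = True
print(ff_rejected)  # True
```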


>
>> I don't get it, sorry. In the KOI8-R (Russian) encoding, 0xFF is the letter 'Ъ'.
>> Are you saying that I cannot use char[] to represent Russian text in D?
>
> char[] is for UTF-8 encoded text only. For other encoding systems, use 
> ubyte[]. But rest assured that Russian (and every other language) has a 
> defined encoding in UTF-8, which is why it was selected for D.
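A short Python sketch of the disagreement above (illustration only; it relies on Python's built-in "koi8_r" codec, and D itself is not shown). The single octet 0xFF that means 'Ъ' in KOI8-R is not a valid UTF-8 octet, and the same letter is stored in UTF-8 as two different octets:

```python
# KOI8-R octet 0xFF decodes to CYRILLIC CAPITAL LETTER HARD SIGN (U+042A).
letter = bytes([0xFF]).decode("koi8_r")
print(hex(ord(letter)))  # 0x42a

# The same letter encoded as UTF-8: two octets, neither of which is 0xFF.
utf8 = letter.encode("utf-8")
print(utf8.hex())  # d0aa

# Re-encoding KOI8-R text as UTF-8 preserves the text itself.
koi8_text = "Привет".encode("koi8_r")
print(koi8_text.decode("koi8_r").encode("utf-8").decode("utf-8") == "Привет")  # True
```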

Sorry, but char[acter] in plain English means a character: the index of some
human-readable glyph in some table like ASCII, KOI-8,
MAC-ASCII, whatever.

An element of a UTF-8 sequence is an octet. I think you should rename
the 'char' type to 'octet' if D/Phobos is intended to support only UTF-8.

Andrew.

More information about the Digitalmars-d mailing list