Improving D's support of code-pages
Kirk McDonald
kirklin.mcdonald at gmail.com
Mon Aug 20 13:39:32 PDT 2007
Rioshin an'Harthen wrote:
> "Kirk McDonald" <kirklin.mcdonald at gmail.com> wrote in message
> news:facpkj$13ml$1 at digitalmars.com...
>
>> Regan Heath wrote:
>>
>>> Kirk McDonald wrote:
>>> Technically 'char' in C is a signed byte, not an unsigned one
>>> therefore byte[] is more accurate.
>>
>>
>> I don't agree with this last part. For starters, I had thought the
>> signed-ness of 'char' in C was not defined. In any case, we're talking
>> about chunks of arbitrary, homogenous binary data, so I think ubyte[]
>> is most appropriate.
>
>
> <ramble>
> True. The C standard does not define the signedness of the char type.
>
> What it does require of the char type is that any character in the
> basic execution character set
>
> A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
> a b c d e f g h i j k l m n o p q r s t u v w x y z
> 0 1 2 3 4 5 6 7 8 9
> ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
>
> fit into a char in such a way that they are non-negative.
>
> Quote from standard (ISO/IEC 9899:TC2 Committee Draft May 6, 2005):
>
> 6.2.5 Types
> 3 An object declared as type char is large enough to store any member of
> the basic execution character set. If a member of the basic execution
> character set is stored in a char object, its value is guaranteed to be
> nonnegative. If any other character is stored in a char object, the
> resulting value is implementation-defined but shall be within the range
> of values that can be represented in that type.
>
> 5.2.4.2.1 Size of integer types <limits.h>
> - number of bits for smallest object that is not a bit-field (byte)
> CHAR_BIT 8
> - minimum value for an object of type signed char
> SCHAR_MIN -127 // -(2^7 - 1)
> - maximum value for an object of type signed char
> SCHAR_MAX 127 // 2^7 - 1
> - maximum value for an object of type unsigned char
> UCHAR_MAX 255 // 2^8 - 1
> - minimum value for an object of type char
> CHAR_MIN see below
> - maximum value for an object of type char
> CHAR_MAX see below
>
> 2 If the value of an object of type char is treated as a signed integer
> when used in an expression, the value of CHAR_MIN shall be the same as
> that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of
> SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of
> CHAR_MAX shall be the same as that of UCHAR_MAX. The value of UCHAR_MAX
> shall equal 2^(CHAR_BIT) - 1.
> </ramble>
>
> So, applying this to the discussion would suggest that either byte[] or
> ubyte[] would be appropriate. However, the most natural would be to
> handle data as raw data without signs, thus ubyte[] feels more natural to
> use as the standard type for any data whatsoever.
Although this is interesting, and it does agree with what I was saying,
it is basically irrelevant. When passing a string to decode(), the bytes
therein could be in any encoding, even one which has nothing to do with
the above. (It could be in a multi-byte encoding!) None of the
guarantees that the C standard makes about char apply to these raw bytes.
Therefore ubyte[] is /definitely/ more appropriate.
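
A minimal C sketch of why the signed view is awkward for raw bytes. The
helper names (plain_char_is_signed, count_negative_as_char,
count_high_bytes) are hypothetical and exist only for this illustration;
the sketch assumes an 8-bit byte. Any byte above 0x7F, such as the two
bytes that encode a non-ASCII character in UTF-8, lies outside the
non-negative range the standard guarantees for char, so what you see
through plain char depends on the implementation, while the unsigned
view is the same everywhere.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   The C standard leaves this choice to the implementation. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Counts the bytes that come out negative when viewed through plain char.
   Where plain char is unsigned this always returns 0. */
static size_t count_negative_as_char(const unsigned char *data, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        /* Implementation-defined when the value exceeds CHAR_MAX. */
        char c = (char)data[i];
        if (c < 0)
            n++;
    }
    return n;
}

/* Counts the bytes with the high bit set, using the unsigned view.
   The result does not depend on the signedness of plain char. */
static size_t count_high_bytes(const unsigned char *data, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] >= 0x80)
            n++;
    }
    return n;
}
```

Given the five bytes 0x63 0x61 0x66 0xC3 0xA9 (a UTF-8 encoding of a
four-character word ending in a non-ASCII letter), count_high_bytes
reports 2 on any platform, whereas count_negative_as_char reports a
nonzero count only where plain char is signed.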
--
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org