Improving D's support of code-pages

Rioshin an'Harthen rharth75 at hotmail.com
Mon Aug 20 13:01:09 PDT 2007


"Kirk McDonald" <kirklin.mcdonald at gmail.com> wrote in message
news:facpkj$13ml$1 at digitalmars.com...
> Regan Heath wrote:
>> Kirk McDonald wrote:
>> Technically 'char' in C is a signed byte, not an unsigned one therefore 
>> byte[] is more accurate.
>
> I don't agree with this last part. For starters, I had thought the 
> signed-ness of 'char' in C was not defined. In any case, we're talking 
> about chunks of arbitrary, homogenous binary data, so I think ubyte[] is 
> most appropriate.

<ramble>
True. The C standard leaves the signedness of plain char up to the
implementation.

What it does require is that every character in the basic execution
character set, which includes

    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    0 1 2 3 4 5 6 7 8 9
    ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~

has a non-negative value when stored in a char.
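
To make that guarantee concrete, here is a minimal C sketch that checks it
for the graphic characters listed above. The names BASIC_GRAPHIC_CHARS and
all_nonnegative are made up for this illustration; they are not part of any
standard header.

```c
#include <assert.h>
#include <stddef.h>

/* The 91 graphic characters listed above (hypothetical name).
   The full basic execution set also has the space and a few
   control characters, which are left out here. */
static const char BASIC_GRAPHIC_CHARS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* Returns 1 if every character of s is non-negative when read
   through plain char, 0 otherwise. */
int all_nonnegative(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s < 0)
            return 0;
    }
    return 1;
}
```

On a conforming implementation all_nonnegative(BASIC_GRAPHIC_CHARS) should
return 1 whether plain char is signed or unsigned. Characters outside the
basic set carry no such promise.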

Quote from standard (ISO/IEC 9899:TC2 Committee Draft May 6, 2005):

6.2.5 Types
3  An object declared as type char is large enough to store any member of
    the basic execution character set. If a member of the basic execution
    character set is stored in a char object, its value is guaranteed to be
    nonnegative. If any other character is stored in a char object, the
    resulting value is implementation-defined but shall be within the range
    of values that can be represented in that type.

5.2.4.2.1 Sizes of integer types <limits.h>
    - number of bits for smallest object that is not a bit-field (byte)
        CHAR_BIT     8
    - minimum value for an object of type signed char
        SCHAR_MIN    -127        // -(2^7 - 1)
    - maximum value for an object of type signed char
        SCHAR_MAX    +127        // 2^7 - 1
    - maximum value for an object of type unsigned char
        UCHAR_MAX    255         // 2^8 - 1
    - minimum value for an object of type char
        CHAR_MIN     see below
    - maximum value for an object of type char
        CHAR_MAX     see below

2  If the value of an object of type char is treated as a signed integer
    when used in an expression, the value of CHAR_MIN shall be the same as
    that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that
    of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value
    of CHAR_MAX shall be the same as that of UCHAR_MAX. The value of
    UCHAR_MAX shall equal 2^(CHAR_BIT) - 1.
</ramble>
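
The <limits.h> rules quoted above can be checked from code. This is only a
rough C sketch; the function names are invented for this post, and the
UCHAR_MAX check assumes CHAR_BIT is smaller than the width of unsigned long
long.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns 1 if CHAR_MIN and CHAR_MAX match whichever pair of
   limits the quoted paragraph requires. */
int char_limits_consistent(void)
{
    if (plain_char_is_signed())
        return CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX;
    return CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX;
}

/* Aborts if the implementation contradicts the quoted limits.
   Assumption: CHAR_BIT < width of unsigned long long, so the
   shift below is well defined. */
void assert_char_limits(void)
{
    assert(char_limits_consistent());
    assert(CHAR_BIT >= 8);
    assert(UCHAR_MAX == (1ULL << CHAR_BIT) - 1);
}
```

Which branch of char_limits_consistent runs depends on the compiler and
target, which is exactly why portable code cannot assume either answer.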

Applied to this discussion, that means either byte[] or ubyte[] could be
defended, since C's char may be signed or unsigned. However, arbitrary
binary data is most naturally handled as raw values without a sign, so
ubyte[] feels like the more natural standard type for any data whatsoever.
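
To show why the sign matters for raw data, here is a small C sketch (C
rather than D, to stay with the standard quoted above). The two helper
functions are hypothetical; unsigned char and signed char stand in for D's
ubyte and byte.

```c
#include <assert.h>
#include <stddef.h>

/* Sums raw bytes viewed as unsigned values. */
unsigned long sum_unsigned(const unsigned char *data, size_t n)
{
    assert(data != NULL || n == 0);
    unsigned long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Sums the same kind of buffer viewed as signed values: bytes
   with the top bit set contribute negative numbers. */
long sum_signed(const signed char *data, size_t n)
{
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```

For the four bytes 0x01 0x7F 0x80 0xFF, sum_unsigned gives 511. Viewed
through signed char on a two's complement machine, the last two bytes read
as -128 and -1, so sum_signed gives -1. The bytes are the same; only the
unsigned view matches what one expects from a chunk of binary data.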



