Improving D's support of code-pages
Rioshin an'Harthen
rharth75 at hotmail.com
Mon Aug 20 13:01:09 PDT 2007
"Kirk McDonald" <kirklin.mcdonald at gmail.com> wrote in message
news:facpkj$13ml$1 at digitalmars.com...
> Regan Heath wrote:
>> Kirk McDonald wrote:
>> Technically 'char' in C is a signed byte, not an unsigned one therefore
>> byte[] is more accurate.
>
> I don't agree with this last part. For starters, I had thought the
> signed-ness of 'char' in C was not defined. In any case, we're talking
> about chunks of arbitrary, homogenous binary data, so I think ubyte[] is
> most appropriate.
<ramble>
True. The C standard does not define the signedness of the char type.
What it does require is that every character in the basic execution
character set
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9
! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
fits into a char with a non-negative value.
Quote from the standard (ISO/IEC 9899:TC2 Committee Draft May 6, 2005):
6.2.5 Types
3 An object declared as type char is large enough to store any member of
the basic execution character set. If a member of the basic execution
character set is stored in a char object, its value is guaranteed to be
nonnegative. If any other character is stored in a char object, the
resulting value is implementation-defined but shall be within the range
of values that can be represented in that type.
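As a rough illustration of that guarantee, here is a minimal C sketch that stores each of the characters listed above in a plain char and checks that none of them comes out negative. The names basic_chars and basic_chars_are_nonnegative are made up for this example, and the array covers only the graphic characters from the list above.

```c
#include <stddef.h>

/* The 91 graphic characters listed above (26 + 26 + 10 + 29).
   Hypothetical name, used only for this illustration. */
static const char basic_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* Returns 1 if every listed character has a nonnegative value when
   stored in a plain char, and 0 otherwise. */
int basic_chars_are_nonnegative(void)
{
    size_t i;
    for (i = 0; basic_chars[i] != '\0'; ++i) {
        if (basic_chars[i] < 0)
            return 0;
    }
    return 1;
}
```

On a conforming implementation this should return 1 whether plain char is signed or unsigned. A compiler with unsigned char may warn that the comparison is always false, which is the point.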
5.2.4.2.1 Size of integer types <limits.h>
- number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8
- minimum value for an object of type signed char
SCHAR_MIN -127 // -(2^7 - 1)
- maximum value for an object of type signed char
SCHAR_MAX 127 // 2^7 - 1
- maximum value for an object of type unsigned char
UCHAR_MAX 255 // 2^8 - 1
- minimum value for an object of type char
CHAR_MIN see below
- maximum value for an object of type char
CHAR_MAX see below
2 If the value of an object of type char is treated as a signed integer
when used in an expression, the value of CHAR_MIN shall be the same as
that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that
of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the
value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value
of UCHAR_MAX shall equal 2^(CHAR_BIT) - 1.
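The paragraph just quoted also gives a portable way to find out which choice a given compiler made: look at CHAR_MIN. A small sketch in C, using only the <limits.h> macros named above (the two function names are mine, not anything standard):

```c
#include <limits.h>

/* Returns 1 if plain char behaves as a signed type on this
   implementation, and 0 if it behaves as an unsigned type. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns 1 if CHAR_MIN and CHAR_MAX agree with the rule quoted above:
   they match either the signed char limits or 0 and UCHAR_MAX. */
int char_limits_are_consistent(void)
{
    if (plain_char_is_signed())
        return CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX;
    return CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX;
}
```

The first function may give different answers on different compilers or with different compiler settings, which is exactly why the signedness of char cannot be assumed.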
</ramble>
So, as far as C's char is concerned, either byte[] or ubyte[] would be
defensible. However, it is most natural to treat raw data as unsigned
values, so ubyte[] feels like the better standard type for arbitrary
data of any kind.
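To show what I mean by handling raw data without signs, here is a short sketch in C rather than D, since the discussion above is about C's char. It reads a buffer through unsigned char, so a byte such as 0xE4 ('ä' in ISO 8859-1) is seen as 228, whatever the signedness of plain char. The function name byte_sum is invented for this example.

```c
#include <stddef.h>

/* Sums the bytes of a buffer, reading each one as an unsigned value
   in the range 0..UCHAR_MAX. Hypothetical helper for illustration. */
unsigned long byte_sum(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < len; ++i)
        sum += p[i];
    return sum;
}
```

Had the loop read the same buffer through plain char on an implementation where char is signed, the byte 0xE4 would typically show up as a negative number, and the sum (or any table lookup indexed by it) would go wrong.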