strings and endianness
Jonathan M Davis
jmdavisProg at gmx.com
Wed Jan 18 12:16:56 PST 2012
On Wednesday, January 18, 2012 20:40:33 Johannes Pfau wrote:
> I'm currently finishing std.uuid (see
> http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue ). For name based hashes, a
> name string is passed to a hash function and I need to make sure that the
> resulting hash is the same on both little endian and big endian systems. So
> what's needed to convert a string to e.g. little endian?
>
> string --> as string is basically a byte array, is byte swapping even
> necessary?
> wstring --> read as shorts and swap using nativeToLittleEndian()?
> dstring --> read as ints and swap using nativeToLittleEndian()?
>
> Also remotely related questions: AFAIK http://www.ietf.org/rfc/rfc4122.txt
> doesn't exactly specify what encoding/byte order should be used for the UUID
> names? Does this mean different implementations are allowed to generate
> different UUIDs for the same input? (See chapter 'Algorithm for Creating a
> Name-Based UUID')
>
> RFC4122 also says "put the name space ID in network byte order.", but the
> namespace is a ubyte[16], so how should this work?
>
> Should name based UUIDs be different if they were created with the same
> name, but using different encodings (string vs wstring vs dstring)? That's
> the way boost.uuid implements it.
If RFC 4122 says that it's using big endian (and I'd be shocked if anything
like that used little endian), then you need to convert to big endian. How
that conversion is done, though, depends on what each of the values represents.
If they're 4 uints, then you'd need to swap each set of 4 bytes. If they're 8
ushorts, then you'd need to swap each set of 2 bytes.
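To show why the field width matters, here is a small sketch in Python (used purely as an illustration, not D). It byte-swaps the same 16 bytes twice, once as four 32-bit words and once as eight 16-bit words. This is what a little-endian host would have to do to reach big-endian order. A big-endian host would need no swap at all. The helper name `swap_words` is hypothetical.

```python
def swap_words(data: bytes, width: int) -> bytes:
    """Reverse the byte order inside each `width`-byte word of `data`
    (hypothetical helper, for illustration only)."""
    if len(data) % width != 0:
        raise ValueError("length must be a multiple of the word width")
    out = bytearray()
    for i in range(0, len(data), width):
        out += data[i:i + width][::-1]  # reverse just this one word
    return bytes(out)

raw = bytes(range(16))           # 00 01 02 ... 0f
as_uints = swap_words(raw, 4)    # 03 02 01 00 07 06 05 04 ...
as_ushorts = swap_words(raw, 2)  # 01 00 03 02 05 04 07 06 ...
```

The two results differ. Converting "to big endian" therefore only makes sense once you know which values the bytes represent.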
However, I believe that RFC 4122 lays out a UUID like this:
uint
ushort
ushort
ubyte
ubyte
ubyte
ubyte
ubyte
ubyte
ubyte
ubyte
So, you'd need to have the first 4 bytes in big endian as a uint and the next
two sets of 2 bytes in big endian as ushorts, leaving the rest alone.
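Here is a minimal Python sketch of that per-field conversion (Python is used only for illustration). It assumes the uint/ushort/ushort/8-bytes layout listed above. The variable names follow RFC 4122's field names (time_low, time_mid, time_hi_and_version), and the helper `le_fields_to_be` is hypothetical. The sketch reads the three integer fields as little endian and writes them back as big endian (network byte order). The trailing 8 bytes are copied unchanged.

```python
import struct

# Assumed layout: uint, ushort, ushort, then 8 raw bytes (16 bytes total).
# "<" means little endian, ">" means big endian; neither inserts padding.
LE_LAYOUT = "<IHH8s"
BE_LAYOUT = ">IHH8s"

def le_fields_to_be(data: bytes) -> bytes:
    """Re-encode a 16-byte UUID whose integer fields are stored little
    endian so that those fields are big endian (hypothetical helper)."""
    time_low, time_mid, time_hi_and_version, rest = struct.unpack(LE_LAYOUT, data)
    return struct.pack(BE_LAYOUT, time_low, time_mid, time_hi_and_version, rest)

# 58DF357E-8918-408D-8ABB-AFB70864ED9F as a little-endian host would store it
# if the first three fields were native integers.
little = bytes.fromhex("7e35df5818898d408abbafb70864ed9f")
big = le_fields_to_be(little)  # 58df357e8918408d8abbafb70864ed9f
```

Only bytes 0-3, 4-5 and 6-7 change order. The last 8 bytes are identical in both encodings.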
As for strings, remember that they're representing the data in the bytes, so I
don't believe that it makes sense to try and convert wstrings or dstrings to a
UUID directly. IIRC, the string must be 32 characters long (excluding the
dashes), and each of those characters represents the hex value of one nibble
in the UUID. So, if you have
58DF357E-8918-408D-8ABB-AFB70864ED9F
then 5 is the hex value for the high 4 bits of the first byte, 8 is the hex
value for the low 4 bits of the first byte, D is the hex value for the high 4
bits of the second byte, etc. So, there's no endian conversion going on at
all. You just take the characters (regardless of the type of string), convert
each hex character to its corresponding integral value ('5' -> 5, '8' -> 8,
'D' -> 13, etc.), and set the corresponding nibble in the ubyte[16].
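The nibble-by-nibble parse described above might look like the following in Python (an illustration only, not the std.uuid implementation). The function name `parse_uuid` is hypothetical. For simplicity it only strips dashes and does not check that they sit in the standard positions.

```python
def parse_uuid(text: str) -> bytes:
    """Turn a textual UUID into 16 bytes: each hex character sets one
    nibble, high nibble first, with no byte swapping anywhere."""
    digits = text.replace("-", "")
    if len(digits) != 32:
        raise ValueError("expected 32 hex digits")
    out = bytearray(16)
    for i, ch in enumerate(digits):
        nibble = int(ch, 16)            # '5' -> 5, '8' -> 8, 'D' -> 13, ...
        if i % 2 == 0:
            out[i // 2] = nibble << 4   # high 4 bits of byte i // 2
        else:
            out[i // 2] |= nibble       # low 4 bits of byte i // 2
    return bytes(out)

parsed = parse_uuid("58DF357E-8918-408D-8ABB-AFB70864ED9F")
```

The characters are consumed in order, so the string's encoding and the host's byte order never enter into it. As a cross-check, Python's standard `uuid.UUID(...).bytes` gives the same 16 bytes for this input.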
You're going to have to study RFC 4122, though, and make sure that you
understand it properly. I'm going primarily off of how I've seen UUIDs
implemented before. All of this should be in the RFC.
- Jonathan M Davis
More information about the Digitalmars-d-learn mailing list