repack ubyte[] to use only 7 bits

Manolo via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Dec 13 03:58:06 PST 2014


On Saturday, 13 December 2014 at 11:20:21 UTC, Manolo wrote:
> On Saturday, 13 December 2014 at 10:09:27 UTC, Charles Hixson 
> via Digitalmars-d-learn wrote:
>> Is there a standard way to do this?  The code below is 
>> untested, as I haven't yet written the x7to8 routine, and came 
>> up with a better way to do what this was to accomplish, but it 
>> feels as if this should be somewhere in the standard library, 
>> if I could only find it.
>>
>> /** Repack the data from an array of ubytes into an array of 
>> ubytes of
>> * which only the last 7 are significant.  The high bit will be 
>> set only
>> * if the byte would otherwise be zero.    */
>> byte[]    x8to7 (ubyte[] bin)
>> {
>>    ubyte[] bout;
>>    //    bit masks:    0 => 0xfe = 11111110, 0x00 = 00000000
>>    //                1 => 0x7f = 01111111, 0x00 = 00000000
>>    //                2 => 0x3f = 00111111, 0x80 = 10000000
>>    //                3 => 0x1f = 00011111, 0xc0 = 11000000
>>    //                4 => 0x0f = 00001111, 0xe0 = 11100000
>>    //                5 => 0x07 = 00000111, 0xf0 = 11110000
>>    //                6 => 0x03 = 00000011, 0xf8 = 11111000
>>    //                7 => 0x01 = 00000001, 0xfc = 11111100
>>    if (bin.length < 1)    return    bout;
>>    int    fByte, fBit;
>>    while    (fByte < bin.length)
>>    {    if (fByte + 1 == bin.length && fBit > 1)  break;
>>        ubyte    b;
>>        switch (fBit)
>>        {    case    0:
>>                b    =    bin[fByte]    / 2;
>>                break;
>>            case    1:
>>                b    =    bin[fByte] & 0x7f;
>>                break;
>>            case    2:
>>                ubyte    b1    =    (bin[fByte] & 0x3f) << 1;
>>                ubyte    b2    =    (bin[fByte + 1] & 0x80) >>> 
>> 7;
>>                b    ~=    (b1 | b2);
>>                break;
>>            case    3:
>>                ubyte    b1    =    (bin[fByte] & 0x1f) << 2;
>>                ubyte    b2    =    (bin[fByte + 1] & 0xc0) >>> 
>> 6;
>>                b    ~= (b1 | b2);
>>                break;
>>            case    4:
>>                ubyte    b1    =    (bin[fByte] & 0x0f) << 3;
>>                ubyte    b2    =    (bin[fByte + 1] & 0xe0) >>> 
>> 5;
>>                b    ~= (b1 | b2);
>>                break;
>>            case    5:
>>                ubyte    b1    =    (bin[fByte] & 0x07) << 4;
>>                ubyte    b2    =    (bin[fByte + 1] & 0xf0) >>> 
>> 4;
>>                b    ~= (b1 | b2);
>>                break;
>>            case    6:
>>                ubyte    b1    =    (bin[fByte] & 0x03) << 5;
>>                ubyte    b2    =    (bin[fByte + 1] & 0xf8) >>> 
>> 3;
>>                b    ~= (b1 | b2);
>>                break;
>>            case    7:
>>                ubyte    b1    =    (bin[fByte] & 0x01) << 6;
>>                ubyte    b2    =    (bin[fByte + 1] & 0xfc) >>> 
>> 2;
>>                b    ~= (b1 | b2);
>>                break;
>>            default:
>>                assert (false, "This path should never be 
>> taken");
>>        }    //    switch (fBit)
>>        if    (b == 0)    bout    ~=    0x80;
>>        else            bout    ~=    b;
>>        fBit    =    fBit + 7;
>>        if    (fBit > 7)
>>        {    fByte++;
>>            fBit -=    7;
>>        }
>>    }
>> }
>
> Are you trying to make a "kind-of" Variable-Length quantity 
> encoder ?
>
> eg:
> 0b10101110: last bit not set, integrate 0b10101110 and stop 
> reading.
> 0b10011001: last bit set, integrate 0b10011000 and continue to 
> next byte.
>
> http://en.wikipedia.org/wiki/Variable-length_quantity
>
> except that this algo limits the length to 24 bits. It was used 
> a lot with
> MIDI, at a time when hardware memory was costly (eg inside 
> hardware synthesizer or workstations).

Sorry, lack or accuraccy: the maximum value represented was a 24 
bit unsigned integer, but the data length was 32 bit for this 
value.
The thing is that the format included various fields, but because 
of the memory price the algo saved space when values where less 
than 0X7F, because only one byte was needed. Nowadays such as 
format would allow for example always 4 bytes to describes the 
data length:

nowadays:
data len "L" | data
0 1 2 3      | 4 ... L-1

so "nowadays", we can afford 4 bytes to say that a field length 
is 1

olddays:
variable len VL | data
1 to ?          | VL ... VL-1

"olddays", they used only one byte to say that a field length is 
1.


More information about the Digitalmars-d-learn mailing list