repack ubyte[] to use only 7 bits

Sat Dec 13 12:41:25 PST 2014

On Saturday, 13 December 2014 at 19:52:33 UTC, Charles Hixson via 
Digitalmars-d-learn wrote:
>
> On 12/13/2014 03:20 AM, Manolo via Digitalmars-d-learn wrote:
>> On Saturday, 13 December 2014 at 10:09:27 UTC, Charles Hixson 
>> via Digitalmars-d-learn wrote:
>>> Is there a standard way to do this?  The code below is 
>>> untested, as I haven't yet written the x7to8 routine, and 
>>> came up with a better way to do what this was to accomplish, 
>>> but it feels as if this should be somewhere in the standard 
>>> library, if I could only find it.
>>>
>>> /** Repack the data from an array of ubytes into an array of 
>>> ubytes of
>>> * which only the last 7 are significant.  The high bit will 
>>> be set only
>>> * if the byte would otherwise be zero.    */
>>> byte[]    x8to7 (ubyte[] bin)
>>> {
>>>    ubyte[] bout;
>>>    //    bit masks:    0 => 0xfe = 11111110, 0x00 = 00000000
>>>    //                1 => 0x7f = 01111111, 0x00 = 00000000
>>>    //                2 => 0x3f = 00111111, 0x80 = 10000000
>>>    //                3 => 0x1f = 00011111, 0xc0 = 11000000
>>>    //                4 => 0x0f = 00001111, 0xe0 = 11100000
>>>    //                5 => 0x07 = 00000111, 0xf0 = 11110000
>>>    //                6 => 0x03 = 00000011, 0xf8 = 11111000
>>>    //                7 => 0x01 = 00000001, 0xfc = 11111100
>>>    if (bin.length < 1)    return    bout;
>>>    int    fByte, fBit;
>>>    while    (fByte < bin.length)
>>>    {    if (fByte + 1 == bin.length && fBit > 1) break;
>>>        ubyte    b;
>>>        switch (fBit)
>>>        {    case    0:
>>>                b    =    bin[fByte]    / 2;
>>>                break;
>>>            case    1:
>>>                b    =    bin[fByte] & 0x7f;
>>>                break;
>>>            case    2:
>>>                ubyte    b1    =    (bin[fByte] & 0x3f) << 1;
>>>                ubyte    b2    =    (bin[fByte + 1] & 0x80)
>>> >>> 7;
>>>                b    ~=    (b1 | b2);
>>>                break;
>>>            case    3:
>>>                ubyte    b1    =    (bin[fByte] & 0x1f) << 2;
>>>                ubyte    b2    =    (bin[fByte + 1] & 0xc0)
>>> >>> 6;
>>>                b    ~= (b1 | b2);
>>>                break;
>>>            case    4:
>>>                ubyte    b1    =    (bin[fByte] & 0x0f) << 3;
>>>                ubyte    b2    =    (bin[fByte + 1] & 0xe0)
>>> >>> 5;
>>>                b    ~= (b1 | b2);
>>>                break;
>>>            case    5:
>>>                ubyte    b1    =    (bin[fByte] & 0x07) << 4;
>>>                ubyte    b2    =    (bin[fByte + 1] & 0xf0)
>>> >>> 4;
>>>                b    ~= (b1 | b2);
>>>                break;
>>>            case    6:
>>>                ubyte    b1    =    (bin[fByte] & 0x03) << 5;
>>>                ubyte    b2    =    (bin[fByte + 1] & 0xf8)
>>> >>> 3;
>>>                b    ~= (b1 | b2);
>>>                break;
>>>            case    7:
>>>                ubyte    b1    =    (bin[fByte] & 0x01) << 6;
>>>                ubyte    b2    =    (bin[fByte + 1] & 0xfc)
>>> >>> 2;
>>>                b    ~= (b1 | b2);
>>>                break;
>>>            default:
>>>                assert (false, "This path should never be 
>>> taken");
>>>        }    //    switch (fBit)
>>>        if    (b == 0)    bout    ~=    0x80;
>>>        else            bout    ~=    b;
>>>        fBit    =    fBit + 7;
>>>        if    (fBit > 7)
>>>        {    fByte++;
>>>            fBit -=    7;
>>>        }
>>>    }
>>> }
>>
>> Are you trying to make a "kind-of" Variable-Length quantity 
>> encoder ?
>>
>> eg:
>> 0b10101110: last bit not set, integrate 0b10101110 and stop 
>> reading.
>> 0b10011001: last bit set, integrate 0b10011000 and continue to 
>> next byte.
>>
>> http://en.wikipedia.org/wiki/Variable-length_quantity
>>
>> except that this algo limits the length to 24 bits. It was 
>> used a lot with
>> MIDI, at a time when hardware memory was costly (eg inside 
>> hardware synthesizer or workstations).
>>
> What I was trying to do was pack things into 7 bits so I could 
> recode 0's as 128.  I finally thought clearly about it and 
> realized that I only needed to use one particular byte value (I 
> chose 127) to duplicate so I could repack things with a string 
> of 0's replaced by 127 followed by the length (up to 126) of 
> zeros, and for 127 itself I'd just emit 127 twice.  This was to 
> pack binary data into a string that C routines wouldn't think 
> had ended partway through.  (If I get more than 127 zeros in a 
> row, I just have more than one packing code.)

Sorry, I misunderstood the thing.