Proposal new type "flags"

Steve Horne stephenwantshornenospam100 at aol.com
Wed Feb 7 00:43:47 PST 2007


Sorry for the delay - I haven't been following the group for a while.

Your reply shifted my perspective on the problem a bit, but I still
disagree with your proposed approach to flags.

Basically, I think you are trying to do with this flags type
something that should be done using structs with bitfields, or a
similar type specification that allows each bit or group of bits to be
accessed directly using a simple field name (without bitwise
operators).

The point here is that, while the specification of the fields is more
complex than a simple struct, there is a payoff for the investment.
The compiler can check things automatically that your examples check
explicitly using asserts, and there is no need for all the
bit-twiddling.

I don't actually know what D's support for bit-fields or packed
records is like, but here is a bit of syntax-fantasy...

  bitfield NodeSpec : INT16
  {
    at 0:14 unsigned Size;  //  half-open bit range
    at 14   bool     Is_Leaf;
    at 15   bool     Is_Branch;

    mask  All_Flags = Is_Leaf | Is_Branch;
      //  Each field defines an implicit mask as a property of the
      //  type, to allow bit-fiddling in any strange cases. The 'mask'
      //  clause allows a few special cases to be explicitly added.
  };

  NodeSpec x;

  x.all = 0;
    //  Clear the whole bitfield at once.

  x.all &= ~NodeSpec.All_Flags;
    //  an operation that may be easier as bit-twiddling rather than
    //  adding explicit support, though having appropriate masks
    //  available as type properties seems useful.
  
  x.Is_Branch = true;
    //  The easy way to access any individual flag.
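
For comparison, a rough equivalent using plain C struct bitfields
might look like the sketch below (only an illustration - in C the
overall width and the exact bit positions are implementation-defined,
and the names simply mirror the ones above):

  #include <stdbool.h>
  #include <string.h>

  typedef struct
  {
      unsigned Size      : 14;   /*  intended to be bits 0..13  */
      unsigned Is_Leaf   : 1;
      unsigned Is_Branch : 1;
  } NodeSpec;

  void example (void)
  {
      NodeSpec x;

      memset (&x, 0, sizeof x);  /*  clear the whole thing at once    */
      x.Is_Branch = true;        /*  set one flag, no bit-twiddling   */
  }

Note that there is no equivalent of 'x.all' or of the implicit
per-field masks - that convenience is exactly what the fantasy syntax
above is meant to add.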

The permissions example raises an important question - why are *you*
defining those flags rather than using predefined library definitions?
Nevertheless...


  bitfield Permissions : INT8
  {
    lsb 1  bool READ, WRITE, EXECUTE;
      //  'lsb n' means 'use least significant n remaining bits'
      //  'msb n' and 'rest' clauses could also be used.
  };

  bitfield FilePermissions : INT32
  {
    //  Build a compound bitfield type

    //  Each field defaults to 'lsb' with the size being the
    //  size of the specified field type.

    //  This could have been a struct, but allowing bitfields
    //  to be composed from other bitfields may allow more
    //  useful mask properties to be defined.
    Permissions Owner;
    Permissions Group;
    Permissions Others;
  };

Composition from enum types would be the same as composition from
signed/unsigned. An advantage to this approach is that the compiler
should be able to tell whether sufficient space is available in the
bitfield.

  enum Genre  { UNKNOWN, CLASSIC, HEAVYMETAL, LULLABY };
  enum Format { UNKNOWN, VORBIS, MP3, WAV };

  bitfield MusicInfo  //  default to int
  {
    Genre  m_Genre  = UNKNOWN;
    Format m_Format = UNKNOWN;
      //  use default position and sizing, and specify default
      //  values
  };

  search ("blah", MusicInfo (HEAVYMETAL, UNKNOWN));
  search ("blah", MusicInfo (m_Format=MP3));

The key point of all of this is simply that defining the bitfield type
saves work when accessing the bitfield, primarily by avoiding the need
for bit-twiddling. Adding properties to the type and values of that
type, along with implicit definition of a constructor, are just
obvious ways that a whole bunch of things normally done with
bit-twiddling can still be supported in a convenient and safe way, and
how bit-twiddling can still be supported as a last resort without
needing any casting or unions or other changes to the code structure.


The key issue with this bitfield approach is that it is really just a
tweak of what C structs with bitfields already do. Maybe what's really
needed is better support for bitfields in structs.


Other than that...

On Mon, 05 Feb 2007 20:28:09 +0100, Rueschi <rueschi at GIquadrat.de>
wrote:

>Steve Horne schrieb:


>Doesn't D allow you to write
>  alias FEATURE1|FEATURE2 FEATURE_SET;
>or similar? When testing, use
>  if ( FEATURE_SET==given_flags ) { /* all and only FEATURE_SET given */ }
>or
>  if ( FEATURE_SET<>given_flags ) { /* at least all FEATURE_SET given */ }

Maybe (not in D mode ATM) but I still don't see any real gain.

As soon as you need to define special masks outside the flags
definition, you may as well have simply defined explicit enum values
anyway. In fact, the enum would be clearer - all the values are
defined in one place.
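
Something along these lines, for example (plain C shown; the FEATURE
names are just placeholders taken from the quote above):

  enum Feature
  {
      FEATURE1    = 0x01,
      FEATURE2    = 0x02,
      FEATURE3    = 0x04,

      FEATURE_SET = FEATURE1 | FEATURE2   /*  mask, defined right here  */
  };

  /*  "at least all of FEATURE_SET given"  */
  int has_feature_set (unsigned given_flags)
  {
      return (given_flags & FEATURE_SET) == FEATURE_SET;
  }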

>> Another common requirement is to pack a couple of flags into the most
>> significant bits of an integer that doesn't need all its bits.

<snip>

>This is contrary to the statement of yours, on modern
>multi-megabyte-machines the additional space of one byte would
>not be worth the effort.

No - "common" is a relative term. Doing bit-twiddling is generally a
bad idea, and you should only do it when you have a good reason.
Within that set of occasions, cases such as the flags in the most
significant bits and other special cases are very common whereas a
simple set of flags counting up from bit zero is in my experience
quite rare.
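
The tree node case is typical - a size in the low bits with the flags
packed into the top of the same word. Something like this (names and
bit positions invented purely for illustration):

  #include <stdint.h>

  #define IS_LEAF    0x80000000u
  #define IS_BRANCH  0x40000000u
  #define SIZE_MASK  0x3FFFFFFFu

  uint32_t make_leaf (uint32_t size)
  {
      return (size & SIZE_MASK) | IS_LEAF;   /*  pack  */
  }

  uint32_t node_size (uint32_t node)
  {
      return node & SIZE_MASK;               /*  unpack  */
  }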

> Anyway you can write something like:

Unnecessarily complex. Explicit setting of mask values in an enum
would be much simpler, and the compile-time checking is almost
certainly unnecessary. If compile-time checking is needed, the way
these checks are written is itself verbose and error-prone.

The compile-time-checking ideal would be a bitfield system similar to
the one I describe above, where the compiler understands the
specification of the bitfield as a whole and can report errors without
the need for asserts.

>My approach makes no assumption to which bits are used. A compiler writer
>may even choose to use one byte per flag, as long as I get my type safety
>and operations

Type-safety isn't always the best way to avoid errors. Your approach
doesn't make access to the flags any simpler, especially since you
suggest using unions to overcome some of the limitations.

Don't get me wrong. Your own common usage examples seem nice and
simple both for specification and access. But your approach to
handling my tree node example shows that even slightly more complex
requirements can cause problems.

Fields are accessed many times more often than they are defined, so if
a flags/bitfield type is going to be considered, IMO your approach is
a missed opportunity. Taking more time to specify the type is most
beneficial if it results in assert-free compile-time error checks and
accesses that are simpler as well as safer.

>> Also, packing bits into an integer tends to be slower than just using
>> a boolean type, and since the saving in memory is usually small
>> (saving a couple of bytes here and there is pointless on machines with
>> megabytes, let alone gigabytes, of memory) there's no gain to pay for
>> the extra complexity. You should only do it if you have a specific
>> good reason.
>
>That is simply not true. When you have to pass flags as parameters, it
>is faster to combine them in one integral value instead of pushing and
>popping multiple bytes on and from the stack. It is waste of memory and
>cache pollution to represent a logical bit by a physical byte.

First, when putting the flags onto the stack, you need to pack
them together into a single integer. If this can be done at compile
time, then you gain maybe a billionth of a second if you are very
lucky.

Then, we come to accessing the parameters.

Checking several separate booleans is really no problem - they will
still be in the processor cache from being pushed onto the stack, for
a start, and remember that the processor is shifting data in chunks
much larger than a few bytes anyway.

So any gains are not going to be huge.

On a non-intel architecture, the separate bools approach may have
gained a bit by not needing masking instructions to check single
flags, but this doesn't work so well on an intel chip - a mov
instruction doesn't set processor flags, so some kind of flag-setting
instruction will be needed in place of the masking.

So no huge gains, but it's all either good or, at least, no cost for
the bitset approach, eh?

Well, no. The bitset approach can limit the opportunities for
out-of-order execution (and maybe make branch prediction less
reliable).

Why?

Writing a single flag (or indeed any subset of flags short of all of
them) requires a read-modify-write cycle. All the flags must be known
before the and/or flag-setting instruction can be performed.

Reading a single flag also requires all flags to be known (so the
masking instruction can be done).

This doesn't apply when using individual booleans.
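
In C terms, the difference is roughly the one below (a sketch - the
exact instructions are of course up to the compiler, but the data
dependency is inherent in the packed form):

  #include <stdbool.h>

  enum { IS_BRANCH = 0x40000000 };

  void set_branch_packed (unsigned *flags)
  {
      /*  load, modify, store - the new value depends on the current
          value of every other flag in the word  */
      *flags |= IS_BRANCH;
  }

  void set_branch_separate (bool *is_branch)
  {
      /*  a plain store, independent of the other flags  */
      *is_branch = true;
  }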

Also, considering the 'mov' thing, masking is more specific than
testing for any non-zero value, so masking may still work out less
efficient. This...

  mov eax, [esp+blah]
  and eax, eax
  jnz label1

... will run slightly faster than this...

  mov  eax,  [esp+blah]
  test eax, MASK
  jnz  label1

... simply because 'and eax, eax' is a shorter instruction, whereas
the test instruction has a load of immediate data to worry about.
Furthermore, IIRC, modern processors tend to combine certain pairs of
instructions. A mov-then-test-for-nonzero is the kind of common pair
of instructions I'd expect a processor to handle efficiently,
effectively creating a single mov-with-flag-setting instruction.

mov-then-mask with every possible mask accounted for? - doubt it.


So I stand by my original claim - dealing with packed bitfields tends
to be slower than dealing with separate booleans. There can be
performance benefits, but these tend to be at least matched by
performance penalties.

It's a tendency rather than an absolute, and there's not a lot in it
either way. I doubt that the difference would even register when
profiling most real-world software. And even if using booleans was
significantly faster, I wouldn't suggest switching just for
performance reasons - but the fact is that separate booleans are
easier to use, so it's the use of bitsets that needs to be justified.


>These instructions use the same number of registers, take approximately
>2 non-parallel instruction more but save 3 dwords on the stack.

So your code DOES take more instructions. AND it is more complex
(requiring three lines of condition-testing code rather than one).
Even in an example that you selected yourself, the only advantage to
counter two downsides is a few bytes less stack usage.

But you could have made a much stronger case, if only you'd noticed
that in the bitset example, testing the blink flag is actually
redundant - the RGB tests already cover it...

  if (   (color==AnsiColor.red  )
      || (color==AnsiColor.green)
      || (color==AnsiColor.blue ) ) { /* blah */ }

That would have saved you three assembler instructions. Three
instructions which look suspiciously as if they've been added by a
human programmer, to otherwise compiler-generated code. Not to mention
that your use of the blink flag was broken in the source (no mask, and
used as if it were a separate parameter).

So let's take a critical look at the assembler for the bitset variant...

Here's those three blink-testing instructions.

    mov    eax, [esp+color]
    test   eax, 0x08
    jnz    label1

Contrasted with the remainder - a rather sophisticated calculation to
determine if precisely one of the interesting three bits is set. This
code is nothing like the source code, and not what the average human
programmer would write - it suggests a pretty intelligent
optimisation.

    and    eax, 0x07
    mov    edx, eax
    shr    edx, 2
    and    eax, 0x03
    dec    edx
    shr    eax, 1
    adc    eax, edx
    jnz    label1
    jmp    use_only_one_plane
  label1:

Unfortunately, it is an intelligent optimisation of different source
code - similar to, but different from, the code you gave.

Look at the 'and eax, 0x07' masking at the top of that...

  1.  We already know that the blink flag is clear, so no need to
      mask it out.

  2.  The value checks in the source are whole-value comparisons, not
      restricted to only checking the first three bits, so the
      assembler doesn't match the source - it will miss cases where
      other bits are set that shouldn't be, and act differently to
      what the source code describes.

My suspicion - you have derived this example from real code, but have
modified it - and messed it up a bit. Your original source code
(ignoring the blink or other flags) probably looked something like
this...

  if (   ((color & MASK) == AnsiColor.red  )
      || ((color & MASK) == AnsiColor.green)
      || ((color & MASK) == AnsiColor.blue ) ) { /* blah */ }

As a result, you are arguing that twelve assembler instructions are at
least as fast as nine, where (if you'd got it right) you could have
been arguing that nine instructions are at least as fast as nine. I
know which one I'd prefer to be arguing ;-)

So with a bit of work, your bitset version can be made to appear a lot
more compelling. Even so, IMO you're still wrong.

Four flags as independent flag parameters is unusual. Even implemented
as booleans, the red, green and blue flags at least would be packed
into an AnsiColor struct (the 'blink' parameter looks like a misfit in
that collection).

Specify byte alignment for that struct, and suddenly the whole package
is one DWORD. Both versions use the same amount of stack - one or two
DWORDs depending on whether the blink flag is included along with the
RGB flags or not. When multiple flags are tested at once, that will be
optimised for both cases. But with the separate booleans (in a
byte-aligned struct) version, single flags can still be tested
individually without masking, and modified individually without
needing a read-modify-write cycle.
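
A sketch of what I mean (whether the struct really ends up as a
single DWORD on the stack, or even in a register, depends on the
compiler and calling convention, and the pack pragma is
compiler-specific):

  #include <stdbool.h>

  #pragma pack(push, 1)
  typedef struct
  {
      bool red;
      bool green;
      bool blue;
      bool blink;
  } ColorFlags;             /*  four byte-sized bools - one DWORD  */
  #pragma pack(pop)

  void use_color (ColorFlags c)
  {
      if (c.red || c.green || c.blue)   /*  no masking needed  */
      {
          /*  blah  */
      }
  }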


BTW - interesting question - Why didn't the compiler generate
something equivalent to...

  static bool Lookup [8] = { false, true,  true,  false,
                             true,  false, false, false };

  if (Lookup [color & MASK])
  {
    /*  blah  */
  }

I assume your example assembler is a symptom of supercompilation -
someone has done an exhaustive search to find the best possible
assembler for some common source code fragments, and built the results
into the compiler's back end. But the lookup table approach looks at
least as efficient to me.

That should give something like...

  mov  eax, [esp+color]
  and  eax, 0x07
  movzx eax, byte ptr [Lookup+eax]
  and  eax, eax
  jz   label1

This ignores the blink flag again - it's easily handled by either
doubling the lookup table size, or by adding those three instructions
again.

Since the lookup table can be embedded in the code (as switch lookup
tables usually are), it may even be addressed relative to the
instruction pointer, saving a few bytes more.

Presumably I'm missing something. No surprise there - my assembler is
extremely rusty.

-- 
Remove 'wants' and 'nospam' from e-mail.


