Let's go back to the last example:

uint32_t test_3( uint32_t arg0, uint32_t arg1 )
{
uint32_t arg0_test = arg0 & 0x00040000;
uint32_t arg1_test = arg1 & 0x00080000;
uint32_t test_comb = arg0_test | arg1_test;
uint32_t mask_comb = (uint32_t)(((int32_t)( test_comb | -test_comb ))>>31);
uint32_t result = ( arg0 & mask_comb ) | ( arg1 & (~mask_comb) );
return (result);
}

Here's the interesting bit:
uint32_t mask_comb = (uint32_t)(((int32_t)( test_comb | -test_comb ))>>31);

What we're doing here is creating an integer mask based on the results
from combined test.
An integer mask is similar to a predicate value except that instead of
true being defined as one, true is defined as "all bits on", which in
two's complement is negative one. The code above is *logically*equivalent to:
uint32_t mask_comb = ( test_comb != 0 ) ? 0xffffffff : 0x00000000;

How does it work?

uint32_t test_comb_sign_msb = test_comb | -test_comb;

The important thing to recognize is that any negative number will have
the most significant bit (MSB) set, and that's the only bit we care
about. So for any value of test_comb that is not zero, test_comb OR
negative test_comb will result in the MSB being set. However, for zero
(since there is no negative zero in integer code) zero OR negative zero
is still exactly zero.
uint32_t test_comb_saturate_sign = ((int32_t)test_comb_sign_msb) >> 31;

The cast to *int32_t*is important. We are telling the compiler that we want a arithmetic shift right (SRA), not a logical shift right (SRL). An SRA will use the MSB of the integer to fill the gap on the left caused by the shift (and is used with signed values). A SRL will fill the gap with zero (and is used with unsigned values). And because we know the sign is stored in the MSB and because we are shifting the MSB all the way to the right we know that the entire value will be filled with the sign bit. In other words, we are saturating the value with the sign bit.

uint32_t mask_comb = (uint32_t)test_comb_saturate_sign;

The
cast back into an unsigned value is just to be sure that we don't get
any unwanted sign extensions. In this particular case it wouldn't
really make a difference, but it's a good habit. Use signed values for math. Use unsigned
values for bit manipulation, unless you have a reason to need a signed
intruction - just remember to cast it back when you're done.

**Okay. We have all the bits set to zero, or all set to one. Now what?**

uint32_t result = ( arg0 & mask_comb ) | ( arg1 & (~mask_comb) );

What we're saying here is:
if all the bits of mask_comb are one,
then ( arg0 & mask_comb ) must be arg0
and ( arg1 & (~mask_comb) ) must be zero
...
if all the bits of mask_comb are zero,
then ( arg0 & mask_comb ) must be zero
and ( arg1 & (~mask_comb) ) must be arg1
You might think that ( arg1 & (~mask_comb) ) would map to two instructions:
TMP = complement of mask_comb
RESULT = arg1 and TMP
But the PPU has two instructions that are very handy for dealing with masks:
*andc*: And with Complement ( c = a & ~b )

*orc*: Or with Complement ( c = a | ~b ) Which makes this operation extremely quick. And now that the result has been selected based on the mask, it can simply be returned. Let look at the compiler output again:

test_3:
rlwinm 8,3,0,13,13 # arg0_test = arg0 << 13
rlwinm 10,4,0,12,12 # arg1_test = arg1 << 12
or 7,8,10 # test_comb = arg0_test | arg1_test
neg 6,7 # test_comb_neg = -test_comb
srawi 0,6,31 # mask_comb = test_comb_neg SRA 31
and 9,3,0 # arg0_result = arg0 & mask_comb
andc 3,4,0 # arg1_result = arg1 & ~mask_comb
or 3,9,3 # result = arg0_result | arg1_result
blr # RETURN (result)

**What do we see here?**[list] 1. No branches 2. No comparison operations, thus no dependencies on CR at all [/list] And because there are no optimization barriers in this code, when it is inlined into another section of code the compiler will be able to schedule it very well. With the branches and comparison instructions, the compiler is much more limited on how the instructions can be scheduled and must also deal with any contention for the condition registers.

**But doesn't that look a bit...**Sure. Let's clean it up a bit then by defining a couple of handy inline functions:

*messy*?
// Compare Not Zero for 32 bit integer values
static inline uint32_t cmpnz_u32( const uint32_t arg )
{
/* Set MSB if arg is positive */
const uint32_t arg_neg = -arg;
/* Set MSB if arg is not zero */
const uint32_t arg_snz = arg | arg_neg;
/* Set all bits to MSB */
const uint32_t arg_snz_sat = (uint32_t)( ((int32_t)arg_snz) >> 31 );
return ( arg_snz_sat );
}
// Select between 32 bit integer values from mask
static inline uint32_t sel_u32( const uint32_t mask, const uint32_t arg0, const uint32_t arg1 )
{
/* Set to arg0 only if mask is set */
const uint32_t arg0_result = arg0 & mask;
/* Set to arg1 only if mask is clear */
const uint32_t arg1_result = arg1 & (~mask);
/* Only one result can be non-zero. */
const uint32_t result = arg0_result | arg1_result;
return ( result );
}

And look how that cleans up the code...
uint32_t test_4( uint32_t arg0, uint32_t arg1 )
{
uint32_t arg0_test = arg0 & 0x00040000;
uint32_t arg1_test = arg1 & 0x00080000;
uint32_t test_comb = arg0_test | arg1_test;
uint32_t mask_comb = cmpnz_u32( test_comb );
uint32_t result = sel_u32( mask_comb, arg0, arg1 );
return (result);
}

**But does that generate the same results?**Yes. Identical results:

rlwinm 8,3,0,13,13
rlwinm 10,4,0,12,12
or 7,8,10
neg 6,7
srawi 0,6,31
and 9,3,0
andc 3,4,0
or 3,9,3
blr

Helpful Identities

Replacing integer
branches with masks sometimes involves gymnastics with bitwise
transformations. Here is a table of transformations intended to be
helpful as reference:
--------------------------------------
x = 0
-------------
xor x, p, p
--------------------------------------
x = -1
-------------
orc x, p, p
--------------------------------------
x = p
-------------
or x, p, p
--------------------------------------
x = p
-------------
and x, p, p
--------------------------------------
x = ~p
-------------
nor x, p, p
--------------------------------------
x = ~p
-------------
nand x, p, p
--------------------------------------
x = p = p & -1
-------------- -----------------
or x, p, p orc t0, p, p
and x, p, t0
--------------------------------------
x = 0 = p & 0
-------------- -----------------
xor x, p, p andi x, p, 0
--------------------------------------
x = p | p = p & p
-------------- -----------------
or x, p, p and x, p, p
--------------------------------------
x = p | p = p | ( p & q )
-------------- -----------------
or x, p, p and t0, p, q
or x, p, t0
--------------------------------------
x = ~(p|p) = ~(p&p)
-------------- -----------------
nor x, p, p nand x, p, p
--------------------------------------
x = ~(p|p) = ~((p|p) & (p|p))
-------------- -----------------
nor x, p, p or t0, p, p
nand x, t0, t0
--------------------------------------
x = ~(p|q) = ~((p|q) & (p|q))
-------------- -----------------
nor x, p, q or t0, p, q
nand x, t0, t0
--------------------------------------
x = ~(p|q) = (~p) & (~q)
-------------- -----------------
nor x, p, q nand t0, p, p
andc x, t0, q
--------------------------------------
x = ~(p^q) = ~((p^q) & (p^q))
-------------- -----------------
eqv x, p, q xor t0, p, q
nand x, t0, t0
--------------------------------------
x = ~(p^q) = ~(p^q)
-------------- -----------------
xor t0, p, q eqv x, p, q
nor x, t0, t0
--------------------------------------
x = p & q = ~(p^q) & q;
-------------- -----------------
and x, p, q xor t0, p, q
andc x, q, t0
--------------------------------------
x = ~(p&q) = ~(p&q)
-------------- -----------------
and t0, p, q nand x, p, q
nor x, t0, t0
--------------------------------------
x = ~(p&q) = (p^q) & q
-------------- -----------------
nand x, p, q xor t0, p, q
and x, t0, q
--------------------------------------
x = ~(p&q) = (~p) | (~q)
-------------- -----------------
nand x, p, q nor t0, p, p
orc x, t0, q
--------------------------------------
x = p ^ q = (p|q) & ~(p&q)
-------------- -----------------
xor x, p, q or t0, p, q
nand t1, p, q
and x, t0, t1
--------------------------------------
x = p & (q&r) = (p&q) & r
-------------- -----------------
and t0, q, r and t0, p, q
and x, p, t0 and x, t0, r
--------------------------------------
x = p | (q|r) = (p|q) | r
-------------- -----------------
or t0, q, r or t0, p, q
or x, p, t0 or x, t0, r
--------------------------------------
x = p & (q|r) = (p&q) | (p&r)
-------------- -----------------
or t0, q, r and t0, p, q
and x, p, t0 and t1, p, r
or x, t0, t1
--------------------------------------
x = p | (q&r) = (p|q) & (p|r)
-------------- -----------------
and t0, q, r or t0, p, q
or x, p, t0 or t1, p, r
and x, t0, t1
--------------------------------------

Other References

Here are some additional references that might prove useful: