Better Performance Through Branch Elimination
Mike Acton
April 11, 2006
April 11, 2006
Introduction
Second only to poor data access patterns, branches can have a big negative impact in the performance of a program. Methods for reducing branch penalties, such as both dynamic and static (software-assisted) branch prediction hardware, despite their successes, are increasingly less effective as the length of the instruction pipelines increase, particularly with in-order architectures where execution must be stalled when hardware prediction fails.
Branching, both conditional and unconditional, slows most implementations. Even an unconditional branch or a correctly predicted taken branch may cause a delay if the target instruction is not in the fetch buffer or the cache. It is therefore best to use branch instructions carefully and to devise algorithms that reduce branching. Many operations that normally use branches may be performed either with fewer or no branches.
-- From IBM's The PowerPC Compiler Writer's Guide 3.1.5
Branches represent a significant part of both performance critical and general purpose code - as a general rule of thumb, 20% of the instructions in typical code are branches. In inner loops and other code sections which demand the highest performance may benefit from a multifold increase in performance by eliminating, or reducing, branches.
This series of articles will present the types of delays that branches may cause in program execution and some programming patterns that help avoid those delays.
Part 1: Background on Branching
Part 2: Benefits to Branch Elimination
Part 3: Programming with Branches, Patterns and Tips
Part 4: More Techniques for Eliminating Branches
Part 5: Summary