Better Performance Through Branch Elimination

No Insider Info!

Although discussions on applying the Cell processor to game development are welcome here, do not ask for insider information related to Sony's Playstation 3.

The details of the hardware and development are covered by a non-disclosure agreement and under no conditions will confidential information be permitted on this site.

Playstation 3 developers are welcome to participate in the discussions, but be aware that this is a publicly accessible site and information not available to the general public may not be disclosed.

Keep it clean so that we can continue to build on the community of Cell developers both inside and outside video game development.

Thank you for your cooperation,
Mike.

Legal

Content Copyright © 2006 by Mike Acton. All Rights Reserved.

This site uses the Movable Type 3.2 content engine.

This site uses the phpBB bulletin board engine Copyright © 2001, 2005 phpBB Group.

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc.

PowerPC is a trademark of International Business Machines Corporation.

Linux is a registered trademark of Linus Torvalds in the U.S. and other countries.

Macintosh and Mac are registered trademarks of Apple Computer, Inc.

All other trademarks are the property of their respective owners.

Better Performance Through Branch Elimination

Mike Acton
April 11, 2006

Introduction

Second only to poor data access patterns, branches can have a large negative impact on the performance of a program. Methods for reducing branch penalties, such as dynamic and static (software-assisted) branch prediction, have had their successes, but they become less effective as instruction pipelines grow longer, particularly on in-order architectures where execution must stall when hardware prediction fails.

Branching, both conditional and unconditional, slows most implementations. Even an unconditional branch or a correctly predicted taken branch may cause a delay if the target instruction is not in the fetch buffer or the cache. It is therefore best to use branch instructions carefully and to devise algorithms that reduce branching. Many operations that normally use branches may be performed either with fewer or no branches.

-- From IBM's The PowerPC Compiler Writer's Guide 3.1.5
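As a small illustration of "fewer or no branches", here is a minimal sketch in C of an unsigned minimum written two ways. The function names `min_branch` and `min_select` are hypothetical and do not come from the guide quoted above. Whether the first version actually compiles to a branch, and whether the second is faster, depends on the compiler and the target, so treat this as a pattern to measure, not a guarantee.

```c
#include <stdint.h>

/* Branching version: a change in control flow selects the result. */
static uint32_t min_branch(uint32_t a, uint32_t b)
{
    if (a < b) {
        return a;
    }
    return b;
}

/* Branch-free version: the result is computed from the inputs.
 * (a < b) evaluates to 1 or 0; subtracting it from zero in unsigned
 * arithmetic gives a mask of all ones (a < b) or all zeros (otherwise). */
static uint32_t min_select(uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(a < b);
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for every pair of inputs; only the way the result is selected differs.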

Branches make up a significant part of both performance-critical and general-purpose code: as a rule of thumb, 20% of the instructions in typical code are branches. Inner loops and other code sections that demand the highest performance may see a multifold speedup when their branches are eliminated or reduced.
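To make the inner-loop case concrete, the sketch below counts the array elements that fall below a threshold, first with a data-dependent branch in the loop body and then without one. The names `count_below_branch` and `count_below_select` are made up for this example. The loop-control branch remains in both versions; only the branch on the data is removed, and any real speedup has to be confirmed by measurement on the target hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop body contains a branch whose direction depends on the data. */
static size_t count_below_branch(const uint32_t *v, size_t n, uint32_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < threshold) {
            count++;
        }
    }
    return count;
}

/* Loop body is a single basic block: the comparison result (0 or 1)
 * is added directly to the running count. */
static size_t count_below_select(const uint32_t *v, size_t n, uint32_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (size_t)(v[i] < threshold);
    }
    return count;
}
```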

This series of articles will present the types of delays that branches may cause in program execution and some programming patterns that help avoid those delays.

Part 1: Background on Branching
Part 2: Benefits to Branch Elimination
Part 3: Programming with Branches, Patterns and Tips
Part 4: More Techniques for Eliminating Branches
Part 5: Summary
