CellPerformance
All things related to getting the best performance from your Cell Broadband Engine™ (CBE) processor.
No Insider Info!
Although discussions on applying the Cell processor to game development are welcome here, do not ask for insider information related to Sony's PlayStation 3.

The details of the hardware and development are covered by a non-disclosure agreement and under no conditions will confidential information be permitted on this site.

PlayStation 3 developers are welcome to participate in the discussions, but be aware that this is a publicly accessible site and information not available to the general public may not be disclosed.

Keep it clean so that we can continue to build on the community of Cell developers both inside and outside video game development.

Thank you for your cooperation,
Mike.
Legal
Content Copyright © 2006 by Mike Acton. All Rights Reserved.

This site uses the Movable Type 3.2 content engine.

This site uses the phpBB bulletin board engine Copyright © 2001, 2005 phpBB Group.

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc.

PowerPC is a trademark of International Business Machines Corporation.

Linux is a registered trademark of Linus Torvalds in the U.S. and other countries.

Macintosh and Mac are registered trademarks of Apple Computer, Inc.

All other trademarks are the property of their respective owners.
HowTo: Huge TLB pages on PS3 Linux
Mike Acton
January 30, 2007
Updated! (22 Mar 07) Minor edits. Added notes for YellowDog Linux. Added source code for using huge page allocation.
Updated! (30 Mar 07) A couple minor fixes. Thanks to Guénaël Renault for pointing them out!
Updated! (15 July 07) Added notes for kernel 2.6.21
Guest article: Understanding the TLB and minimizing misses is a critical part of high performance Cell programming. Unfortunately some PS3 kernels do not come with huge page support enabled. Jakub Kurzak and Alfredo Buttari step through the details of recompiling the kernel for huge page support.
The availability of huge TLB pages depends on how the Linux kernel was configured before it was compiled. The default kernel that ships with Fedora Core 5 (and most likely the kernel of any other distribution that ships binary kernel packages) doesn't include this option. So, to get huge TLB pages, you need to reconfigure the kernel, recompile it, and tell the boot loader about the newly created kernel image. Finally, we also show a way to allocate the huge TLB pages automatically at boot time.

[Mike Acton] This process also works with YellowDog Linux virtually unchanged.
Rebuilding the PS3 Linux Kernel
[Mike Acton] For more detailed information on the Linux Kernel and the build process, see:
[Mike Acton] For more information on using huge TLB pages, especially from user space, read hugetlbpage.txt, which is found in the kernel source under Documentation/vm/
Here are the steps:

  1. Recompile the kernel in order to have huge TLB pages
    1. Take the kernel source from the add-on CD (the file name is linux-20061110.tar.bz2)
      [Mike Acton] Download the PS3 Source Add-On CD [qj.net].
      [Mike Acton] A more recent kernel (2.6.21 as of this update) and its sources can be found in the newer Add-on disc package (CELL-Linux-CL_20070516-ADDON), which is available from various Linux mirrors:
    2. unpack it in the /usr/src directory
    3. make a link:
      	$ ln -s /usr/src/linux-20061110 /usr/src/linux
      
      [Mike Acton] For Linux 2.6.21:
      	$ ln -s /usr/src/linux-2.6.21-20070425 /usr/src/linux
      
    4. prepare for kernel configuration:
      [Mike Acton] For Linux 2.6.21:
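      To build a more recent kernel you will need to install a few things first. The kernel build needs a recent dtc, which is fetched with git, and building git's documentation (the doc and install-doc targets below) needs AsciiDoc and xmlto: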
      1. AsciiDoc. Download: asciidoc-8.2.1.tar.gz [methods.co.nz]
        $ cd /usr/src
        $ tar xzvf asciidoc-8.2.1.tar.gz
        $ cd asciidoc-8.2.1
        $ ./install.sh
        
      2. xmlto. Download: xmlto-0.0.18.tar.bz2 [cyberelk.net]
        $ cd /usr/src
        $ tar xjvf xmlto-0.0.18.tar.bz2
        $ cd xmlto-0.0.18
        $ ./configure
        $ make
        $ make install
        
      3. git, a revision control system. Download: git 1.5.2 [kernel.org]
        $ cd /usr/src
        $ tar xzvf git-1.5.2.tar.gz
        $ cd git-1.5.2
        $ make prefix=/usr all doc
        $ make prefix=/usr install install-doc
        
      4. dtc (Device Tree Compiler). NOTE: To build the kernel, you need a version newer than dtc-20060419.tar.gz, the version available on the dtc web page.
        $ cd /usr/src
        $ git clone git://www.jdl.com/software/dtc.git
        $ cd dtc
        $ make
        $ make install
        
        
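    5. [Mike Acton] If you have any older build data, clean it out with make mrproper before building. Note that mrproper also deletes an existing .config, so run it before copying in the configuration file in the next step.
      $ cd /usr/src/linux
      $ make mrproper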
      
    6. copy the kernel config file that comes with the Fedora installation to /usr/src/linux/.config
      $ cp /boot/config-2.6.16 /usr/src/linux/.config
      
      [Mike Acton] On YellowDog Linux, this file is /boot/config-2.6.16-20061110.ydl.1ps3
      [Mike Acton] For Linux 2.6.21:
      The config file has been updated significantly since the original 2.6.16 release. It's much easier to start with the file included in the kernel distribution.
      $ cd /usr/src/linux
      $ cp arch/powerpc/configs/ps3_defconfig .config
      
    7. The next step reads the old configuration file and prompts the user for every option that the new kernel has but the old configuration lacks (there are none in this case, since the old and new kernels are exactly the same version).
      $ make oldconfig
      
      [Mike Acton] For Linux 2.6.21: There's no need for this step if you copied the file from the kernel distribution itself.
    8. enable huge TLB pages in the kernel configuration
      $ make menuconfig
      
      Now go to File systems --> Pseudo filesystems and enable huge TLB pages by pressing the space bar on the "HugeTLB file system support" option. Then select "Exit" repeatedly and answer "Yes" when asked to save the new kernel configuration.
    9. compile the kernel and modules, then install the modules (this takes around 20 minutes):
      $ make all
      $ make modules_install
      
  2. install the new kernel:
    [Mike Acton] For Linux 2.6.21: Replace references to 2.6.16 with 2.6.21 in this and the following steps.
    $ cp /usr/src/linux/vmlinux /boot/vmlinux-2.6.16_HTLB
    
  3. create a ramdisk image for the new kernel:
    $ mkinitrd /boot/initrd-2.6.16_HTLB.img 2.6.16
    
    [Mike Acton] On YellowDog Linux, mkinitrd lives in /sbin.
    [Mike Acton] For Linux 2.6.21:
    When I run mkinitrd, it says: No modules available for kernel "2.6.21". What's up?

    The problem is that this version of the kernel isn't installed as "2.6.21"; it's installed as "2.6.21-rc7". You can discover that by looking in /lib/modules:
    $ ls -l /lib/modules
    total 16
    drwxr-xr-x 3 root root 4096 Mar 22 05:57 2.6.16
    drwxr-xr-x 5 root root 4096 Jan 19 06:06 2.6.16-20061110.ydl.1ps3
    drwxr-xr-x 3 root root 4096 Jul 15 08:24 2.6.20
    drwxr-xr-x 3 root root 4096 Jul 17 06:22 2.6.21-rc7
    
    So the actual command you need to run is:
    $ mkinitrd /boot/initrd-2.6.21_HTLB.img 2.6.21-rc7
    
  4. tell the bootloader (kboot) where the new kernel is:
    $ vim /etc/kboot.conf
    
    add the following line
    linux_htlb='/boot/vmlinux-2.6.16_HTLB initrd=/boot/initrd-2.6.16_HTLB.img'
    
    [Mike Acton] For YellowDog Linux, use:
    ydl_htlb='/dev/sda1:/vmlinux-2.6.16_HTLB initrd=/dev/sda1:/initrd-2.6.16_HTLB.img \
    root=/dev/sda2 init=/sbin/init video=ps3fb:mode:3 rhgb'
    ydl480i_htlb='/dev/sda1:/vmlinux-2.6.16_HTLB initrd=/dev/sda1:/initrd-2.6.16_HTLB.img \
    root=/dev/sda2 init=/sbin/init video=ps3fb:mode:1 rhgb'
    ydl1080i_htlb='/dev/sda1:/vmlinux-2.6.16_HTLB initrd=/dev/sda1:/initrd-2.6.16_HTLB.img \
    root=/dev/sda2 init=/sbin/init video=ps3fb:mode:4 rhgb'
    ydltext_htlb='/dev/sda1:/vmlinux-2.6.16_HTLB initrd=/dev/sda1:/initrd-2.6.16_HTLB.img \
    root=/dev/sda2 init=/sbin/init 3'
    If you want this kernel to be loaded by default, change the "default" line to:
    default=linux_htlb
    
    [Mike Acton] For YellowDog Linux, use one of the modes above.
  5. configure the boot process to allocate huge TLB pages (pick one of the following two options):
    1. OPTION 1:
      $ vim /etc/rc.local
      
      add the following lines:
      mkdir -p /huge
      echo 20 > /proc/sys/vm/nr_hugepages
      mount -t hugetlbfs nodev /huge
      chown root:root /huge
      chmod 755 /huge
      
      Be sure to change the "chown" line to match your system settings (for example, the user and group that will run programs using /huge).
    2. OPTION 2: create a /etc/init.d/htlb script with the following content:
      All the commands added to the rc.local file in option 1 are executed at the end of the boot sequence, when much of the system memory has already been allocated by other processes. Since each huge page needs a physically contiguous 16 MB block, the request for 20 pages yields only 6 or 7. To obtain a few more pages (8 or 9), the huge TLB page allocation has to move earlier in the boot sequence (i.e. into runlevel 1).

      [Mike Acton] chkconfig requires some additional header settings that were missing from the previous version of this script. The modified version is:
      	#!/bin/sh
      	#
      	# htlb:	Start/stop huge TLB pages allocation
      	#
      	# [Mike Acton] The runlevel and priority settings for chkconfig are stolen straight out of cpuspeed.
      	
      	# chkconfig: 12345 06 99
      	# description: Start/stop huge TLB pages allocation
      
      	. /etc/rc.d/init.d/functions
      
      	start()
      	{
      	    mkdir -p /huge
      	    echo 20 > /proc/sys/vm/nr_hugepages
      	    mount -t hugetlbfs nodev /huge
      	    chown root:root /huge
      	    chmod 775 /huge
      	}
      
      	stop()
      	{
      	    # Unmount first so that "restart" does not stack a second mount on /huge
      	    umount /huge 2>/dev/null
      	    echo 0 > /proc/sys/vm/nr_hugepages
      	}
      	
      	case "$1" in
      	  start)
      		start
      		;;
      	  stop)
      		stop
      		;;
      	  status)
      		grep Huge /proc/meminfo
      		;;
      	  restart|reload)
      		stop
      		start
      		;;
      	  *)
      		echo $"Usage: $0 {start|stop|status|restart|reload}"
      		exit 1
      		;;
      	esac
      	
      	exit 0
      
      Make the new service executable:
      $ chmod a+x /etc/init.d/htlb
      
      Add the service to runlevel-1:
      $ /sbin/chkconfig --add htlb
      
  6. reboot. During the boot process, when presented with the "kboot:" prompt, you can choose your kernel using the Tab key.
[Mike Acton] Verify that huge pages are now installed and working:
$ cat /proc/meminfo | grep Huge
You should see something like:
HugePages_Total:     8
HugePages_Free:      8
Hugepagesize:    16384 kB
and...
$ cat /proc/filesystems  | grep huge
You should see something like:
nodev   hugetlbfs
[Mike Acton] Here are some helper functions for allocating and freeing huge memory:

cp_hugemem.c
cp_hugemem.h

They are very simple to use:
#include <stdio.h>
#include <string.h>
#include "cp_hugemem.h"

int main( void )
{
    // Allocate 128 MB...
    const size_t  hmem_size = 128 * 1024 * 1024;
    cp_hugemem    hmem;

    int was_hugemem_allocated = cp_hugemem_alloc( &hmem, hmem_size );
    if ( !was_hugemem_allocated )
    {
        fprintf(stderr,"Error: Could not allocate hugemem\n");
        return (-1);
    }

    // Use the memory...
    char* ptr = (char*)hmem.addr;
    memset( ptr, 0, hmem_size );

    // Free...
    cp_hugemem_free( &hmem );
    return (0);
}
About the Authors
Jakub Kurzak AKA Koobas is a researcher at the University of Tennessee, Knoxville, and a member of the Innovative Computing Lab (ICL - http://icl.cs.utk.edu/), where he mostly works on programming multi-core processors and the Cell processor. Before that he was a student at the University of Houston, where he worked on programming distributed memory machines using message passing (MPI). Jakub's interests are in parallel programming techniques (message passing, multi-threading), parallel number crunching algorithms, and performance optimization.

Alfredo Buttari is a research associate at the Computer Science department of the University of Tennessee, Knoxville. Alfredo is a member of the Innovative Computing Laboratory, which deals with many aspects of High Performance Computing. His interests are in developing high performance software for linear algebra, which is mostly achieved through parallel programming techniques of all sorts (MPI, OpenMP, threads...), including more exotic approaches like the Cell programming model. Before coming to Tennessee, Alfredo received a PhD and a Master's degree in Computer Science from the "Tor Vergata" University of Rome (Italy).