HowTo: Huge TLB pages on PS3 Linux

Updated! (22 Mar 07) Minor edits. Added notes for YellowDog Linux. Added source code for using huge page allocation.
Updated! (30 Mar 07) A couple minor fixes. Thanks to Guénaël Renault for pointing them out!
Updated! (15 July 07) Added notes for kernel 2.6.21
Guest article: Understanding the TLB and minimizing misses is a critical part of high performance Cell programming. Unfortunately some PS3 kernels do not come with huge page support enabled. Jakub Kurzak and Alfredo Buttari step through the details of recompiling the kernel for huge page support.
The availability of huge TLB pages depends on how the Linux kernel was configured before compilation. The default kernel that ships with Fedora Core 5 (and most likely the one shipped by any other distribution that provides binary kernel packages) doesn't include this option. So, in order to have huge TLB pages, it is necessary to reconfigure the kernel, recompile it, and tell the boot loader about the newly created kernel image. Finally, we also show a way to allocate the huge TLB pages automatically at boot time.

[Mike Acton] This process also works with YellowDog Linux virtually unchanged.
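Why huge pages help comes down to a little arithmetic: a TLB can only map (number of entries) times (page size) bytes at once, often called its reach, and any access outside that reach costs a TLB miss. The sketch below computes the reach and the number of pages (and therefore TLB entries) a buffer spans. The function names are ours, and the entry count in the usage note is a hypothetical round number, not the documented size of the Cell PPE's TLB.

```c
#include <assert.h>

/* TLB reach: how much memory can be accessed without a TLB miss,
 * assuming each of `entries` TLB entries maps one page of `page_size` bytes. */
unsigned long long tlb_reach(unsigned long entries, unsigned long long page_size)
{
    assert(page_size > 0);
    return (unsigned long long)entries * page_size;
}

/* Number of pages (and hence TLB entries) needed to cover a buffer of
 * `bytes` bytes, assuming the buffer starts on a page boundary. */
unsigned long long pages_spanned(unsigned long long bytes, unsigned long long page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

For example, with a hypothetical 1024-entry TLB, tlb_reach(1024, 4096) is 4 MB, while tlb_reach(1024, 16 * 1024 * 1024) with 16 MB huge pages is 16 GB. Likewise, a 128 MB buffer spans 32768 regular 4 KB pages but only 8 huge pages.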
Rebuilding the PS3 Linux Kernel
[Mike Acton] For more detailed information on the Linux Kernel and the build process, see:
[Mike Acton] For more information on using huge TLB pages, especially from user space, read hugetlbpage.txt, which is found in the kernel source under Documentation/vm/.
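hugetlbpage.txt describes two ways for user-space programs to obtain huge pages: mapping a file on a mounted hugetlbfs, and System V shared memory created with the SHM_HUGETLB flag. Below is a minimal sketch of the second approach, assuming huge pages have already been reserved as described in the steps that follow. huge_shm_alloc and huge_shm_free are hypothetical names, and depending on the kernel a non-root user may also need to belong to the group named in /proc/sys/vm/hugetlb_shm_group.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000  /* value from the Linux kernel headers */
#endif

/* Allocate `size` bytes of System V shared memory backed by huge pages.
 * `size` should be a multiple of the huge page size (Hugepagesize in
 * /proc/meminfo). Returns NULL on failure. */
void *huge_shm_alloc(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id < 0)
        return NULL;

    void *addr = shmat(id, NULL, 0);

    /* Mark the segment for removal right away: it is destroyed once the
     * last process detaches, so it cannot outlive the program. */
    shmctl(id, IPC_RMID, NULL);

    return (addr == (void *)-1) ? NULL : addr;
}

/* Detach memory obtained from huge_shm_alloc(); returns 0 on success. */
int huge_shm_free(void *addr)
{
    assert(addr != NULL);
    return shmdt(addr);
}
```

Removing the segment immediately after attaching is a design choice: it trades the ability to share the segment by ID with other processes for the guarantee that the huge pages are returned to the pool when the program exits.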
Here are the steps:

  1. Recompile the kernel in order to have huge TLB pages
    1. Take the kernel source from the add-on CD (the filename is linux-20061110.tar.bz2)
      [Mike Acton] Download the PS3 Source Add-On CD [qj.net].
      [Mike Acton] A more recent kernel (2.6.21 as of this update) and its sources can be found in the newer Add-on disc package (CELL-Linux-CL_20070516-ADDON), which is available from various Linux mirrors:
    2. unpack it in the /usr/src directory
    3. make a link:
      	$ ln -s /usr/src/linux-20061110 /usr/src/linux
      [Mike Acton] For Linux 2.6.21:
      	$ ln -s /usr/src/linux-2.6.21-20070425 /usr/src/linux
    4. prepare for kernel configuration:
      [Mike Acton] For Linux 2.6.21:
      To build a more recent kernel you will need to install a few things first:
      1. AsciiDoc. Download: asciidoc-8.2.1.tar.gz [methods.co.nz]
        $ cd /usr/src
        $ tar xzvf asciidoc-8.2.1.tar.gz
        $ cd asciidoc-8.2.1
        $ ./install.sh
      2. xmlto. Download: xmlto-0.0.18.tar.bz2 [cyberelk.net]
        $ cd /usr/src
        $ tar xjvf xmlto-0.0.18.tar.bz2
        $ cd xmlto-0.0.18
        $ ./configure
        $ make
        $ make install
      3. git, a revision control system. Download: git 1.5.2 [kernel.org]
        $ cd /usr/src
        $ tar xzvf git-1.5.2.tar.gz
        $ cd git-1.5.2
        $ make prefix=/usr all doc
        $ make prefix=/usr install install-doc
      4. dtc (Device Tree Compiler). NOTE: To build the kernel, you need a version newer than dtc-20060419.tar.gz, the version available on the dtc web page.
        $ cd /usr/src
        $ git clone git://www.jdl.com/software/dtc.git
        $ cd dtc
        $ make
        $ make install

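    5. [Mike Acton] Run make mrproper before building to clean out any data left over from older builds. Note that mrproper also deletes any existing .config, so it has to come before the configuration file is copied in the next step.
      $ cd /usr/src/linux
      $ make mrproper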
    6. copy the kernel config file that comes with the Fedora installation into /usr/src/linux:
      $ cp /boot/config-2.6.16 /usr/src/linux/.config
      [Mike Acton] On YellowDog Linux, this file is /boot/config-2.6.16-20061110.ydl.1ps3
      [Mike Acton] For Linux 2.6.21:
      The config file has been updated significantly since the original 2.6.16 release. It's much easier to start with the file included in the kernel distribution.
      $ cd /usr/src/linux
      $ cp arch/powerpc/configs/ps3_defconfig .config
    7. This step goes through the old configuration file and prompts you for any kernel option that is present in the new kernel but missing from the old configuration (there should be none in this case, since the old and new kernels are exactly the same version):
      $ make oldconfig
      [Mike Acton] For Linux 2.6.21: There's no need for this step if you copied the file from the kernel distribution itself.
    8. enable huge TLB pages in the kernel configuration
      $ make menuconfig
      Now go to File systems --> Pseudo filesystems and enable huge TLB pages by pressing the space bar on the "HugeTLB file system support" (CONFIG_HUGETLBFS) option. Then select "Exit" repeatedly and answer "Yes" when asked to save the new kernel configuration.
    9. compile kernel and modules and install modules (it will take around 20 minutes):
      $ make all
      $ make modules_install
  2. install the new kernel:
    [Mike Acton] For Linux 2.6.21: Replace references to 2.6.16 with 2.6.21 in this and the following steps.
    $ cp /usr/src/linux/vmlinux /boot/vmlinux-2.6.16_HTLB
  3. create a ramdisk image for the new kernel:
    $ mkinitrd /boot/initrd-2.6.16_HTLB.img 2.6.16
    [Mike Acton] On YellowDog Linux, mkinitrd lives in /sbin.
    [Mike Acton] For Linux 2.6.21:
    Q: When I run mkinitrd, it says: No modules available for kernel "2.6.21". What's up?

    The problem is that this version of the kernel isn't installed as "2.6.21"; it's installed as "2.6.21-rc7". You can discover that by looking in /lib/modules:
    $ ls -l /lib/modules
    total 16
    drwxr-xr-x 3 root root 4096 Mar 22 05:57 2.6.16
    drwxr-xr-x 5 root root 4096 Jan 19 06:06 2.6.16-20061110.ydl.1ps3
    drwxr-xr-x 3 root root 4096 Jul 15 08:24 2.6.20
    drwxr-xr-x 3 root root 4096 Jul 17 06:22 2.6.21-rc7
    So the actual command you need to run is:
    $ mkinitrd /boot/initrd-2.6.21_HTLB.img 2.6.21-rc7
  4. tell the bootloader (kboot) where the new kernel is:
    $ vim /etc/kboot.conf
    add the following line
    linux_htlb='/boot/vmlinux-2.6.16_HTLB initrd=/boot/initrd-2.6.16_HTLB.img'
    [Mike Acton] For YellowDog Linux, use:
    ydl_htlb ='/dev/sda1:/vmlinux-2.6.16_HTLB initrd=/dev/sda1:/initrd-2.6.16_HTLB.img \
    root=/dev/sda2 init=/sbin/init video=ps3fb:mode:3 rhgb'
    ydl480i_htlb ='/dev/sda1:/vmlinux-2.6.16_HTLB initrd=/dev/sda1:/initrd-2.6.16_HTLB.img \
    root=/dev/sda2 init=/sbin/init video=ps3fb:mode:1 rhgb'
    ydl1080i_htlb ='/dev/sda1:/vmlinux-2.6.16_HTLB initrd=/dev/sda1:/initrd-2.6.16_HTLB.img \
    root=/dev/sda2 init=/sbin/init video=ps3fb:mode:4 rhgb'
    ydltext_htlb ='/dev/sda1:/vmlinux-2.6.16_HTLB initrd=/dev/sda1:/initrd-2.6.16_HTLB.img \
    root=/dev/sda2 init=/sbin/init 3'

    If you want this kernel to be loaded by default, change the "default" line to:
    default=linux_htlb
    [Mike Acton] For YellowDog Linux, use one of the modes above.

  5. Instruct the boot process to allocate huge TLB pages (pick one of the following two options):
    1. OPTION 1:
      $ vim /etc/rc.local
      add the following lines:
      mkdir -p /huge
      echo 20 > /proc/sys/vm/nr_hugepages
      mount -t hugetlbfs nodev /huge
      chown root:root /huge
      chmod 755 /huge
      Be sure to change the "chown" line to match your system settings (for example, the user and group that will run your huge page programs).
    2. OPTION 2: All the commands added to rc.local in Option 1 are executed at the end of the boot sequence, when much of the system memory has already been allocated by other processes. As a result, only 6 or 7 huge pages get allocated. To obtain a few more (8 or 9), move the huge TLB page allocation earlier in the boot sequence (i.e. to runlevel 1) by creating an /etc/init.d/htlb script with the following content:

      [Mike Acton] chkconfig required some additional settings that were not in the previous version of this script. The modified version is:
      #!/bin/sh
      #
      # htlb: Start/stop huge TLB pages allocation
      #
      # [Mike Acton] The runlevel and priority settings for chkconfig are stolen straight out of cpuspeed.

      # chkconfig: 12345 06 99
      # description: Start/stop huge TLB pages allocation

      . /etc/rc.d/init.d/functions

      start()
      {
          mkdir -p /huge
          # Reserve the huge pages first, then mount hugetlbfs on /huge
          echo 20 > /proc/sys/vm/nr_hugepages
          mount -t hugetlbfs nodev /huge
          chown root:root /huge
          chmod 775 /huge
      }

      stop()
      {
          # Unmount first so that "restart" does not stack a second mount on /huge
          umount /huge
          echo 0 > /proc/sys/vm/nr_hugepages
      }

      case "$1" in
          start)
              start
              ;;
          stop)
              stop
              ;;
          restart|reload)
              stop
              start
              ;;
          *)
              echo $"Usage: $0 {start|stop|restart|reload}"
              exit 1
              ;;
      esac

      exit 0
      Make the new service executable:
      $ chmod a+x /etc/init.d/htlb
      Add the service to runlevel-1:
      $ /sbin/chkconfig --add htlb
  6. Reboot. During the boot process, when presented with the "kboot:" prompt, you can choose your kernel using the Tab key.
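Both options in step 5 reserve huge pages by writing to /proc/sys/vm/nr_hugepages, and, as noted there, the kernel may reserve fewer pages than requested. A program can make the same request and then read the value back to see how many pages it actually got. This is a minimal sketch under that assumption; write_count and read_count are hypothetical helper names, and writing the real file requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Write `count` to a sysctl-style file such as /proc/sys/vm/nr_hugepages.
 * Returns 0 on success and -1 on failure (the real file needs root). */
int write_count(const char *path, long count)
{
    assert(path != NULL && count >= 0);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int printed = fprintf(f, "%ld\n", count);
    /* The data only reaches the file (or the kernel) when the stream is
     * flushed, so a failure can also surface in fclose(). */
    if (fclose(f) != 0 || printed < 0)
        return -1;
    return 0;
}

/* Read a single number back from such a file. For nr_hugepages this is
 * the number of pages the kernel actually reserved, which can be lower
 * than the number requested. Returns -1 on failure. */
long read_count(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long count;
    if (fscanf(f, "%ld", &count) != 1)
        count = -1;
    fclose(f);
    return count;
}
```

For example, write_count("/proc/sys/vm/nr_hugepages", 20) followed by read_count("/proc/sys/vm/nr_hugepages") would return the number of pages actually reserved, which on a PS3 is typically well below 20.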
[Mike Acton] Validate that huge pages are now installed and working:
$ cat /proc/meminfo | grep Huge
You should see something like:
HugePages_Total:     8
HugePages_Free:      8
Hugepagesize:    16384 kB
and...
$ cat /proc/filesystems | grep huge
You should see something like:
nodev   hugetlbfs
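The same check can be made from inside a program, for example to confirm that enough huge pages are free before allocating. Here is a minimal sketch that parses the three lines shown above; struct hugepage_info and parse_hugepage_info are hypothetical names, and the field labels are assumed to match the example output exactly.

```c
#include <assert.h>
#include <stdio.h>

/* Huge page counters as reported in /proc/meminfo. */
struct hugepage_info {
    long total;    /* HugePages_Total */
    long free;     /* HugePages_Free */
    long size_kb;  /* Hugepagesize, in kB */
};

/* Scan /proc/meminfo-style text from `f` for the huge page lines.
 * Returns 0 if all three fields were found, -1 otherwise (missing
 * fields are left at -1). */
int parse_hugepage_info(FILE *f, struct hugepage_info *info)
{
    char line[256];

    assert(f != NULL && info != NULL);
    info->total = info->free = info->size_kb = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        /* A space in a scanf format matches any run of whitespace,
         * so the padding after each colon does not matter. */
        if (sscanf(line, "HugePages_Total: %ld", &info->total) == 1)
            continue;
        if (sscanf(line, "HugePages_Free: %ld", &info->free) == 1)
            continue;
        sscanf(line, "Hugepagesize: %ld", &info->size_kb);
    }
    return (info->total >= 0 && info->free >= 0 && info->size_kb >= 0) ? 0 : -1;
}
```

On a live system, pass in the stream from fopen("/proc/meminfo", "r"); free * size_kb then gives the huge page memory still available, in kB.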
[Mike Acton] Here are some helper functions for allocating and freeing huge memory:

cp_hugemem.c
cp_hugemem.h

They are very simple to use:
#include <stdio.h>
#include "cp_hugemem.h"

int main( void )
{
    // Allocate 128 MB backed by huge pages...
    const size_t hmem_size = 128 * 1024 * 1024;
    cp_hugemem hmem;

    int was_hugemem_allocated = cp_hugemem_alloc( &hmem, hmem_size );
    if ( !was_hugemem_allocated )
    {
        fprintf(stderr,"Error: Could not allocate hugemem\n");
        return (-1);
    }

    // Use the memory (here, just touch the first byte)...
    char* ptr = (char*)hmem.addr;
    ptr[0] = 'x';

    // Free...
    cp_hugemem_free( &hmem );
    return 0;
}
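The contents of cp_hugemem.c are not reproduced here. For readers who want to see what such a helper involves, the sketch below follows the hugetlbfs approach described in hugetlbpage.txt: create a file on the mounted hugetlbfs (the /huge mount point from step 5) and mmap it. It is an assumption about how such a helper could work, not the actual cp_hugemem implementation; huge_mmap, huge_munmap, round_to_huge, and the hugemem.<pid> file name are all hypothetical, and the process needs write permission on the mount point.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round `size` up to a whole number of huge pages. */
size_t round_to_huge(size_t size, size_t huge_page_size)
{
    assert(huge_page_size > 0);
    return ((size + huge_page_size - 1) / huge_page_size) * huge_page_size;
}

/* Map `size` bytes backed by huge pages via a file on a mounted hugetlbfs.
 * On success, returns the address and stores the mapped length in
 * *mapped_len; returns NULL on failure. */
void *huge_mmap(const char *mount_dir, size_t size,
                size_t huge_page_size, size_t *mapped_len)
{
    char path[4096];
    size_t len = round_to_huge(size, huge_page_size);
    void *addr;
    int fd;

    assert(mount_dir != NULL && mapped_len != NULL);
    snprintf(path, sizeof path, "%s/hugemem.%ld", mount_dir, (long)getpid());
    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping keeps the pages alive; the file name is no longer needed. */
    close(fd);
    unlink(path);

    if (addr == MAP_FAILED)
        return NULL;
    *mapped_len = len;
    return addr;
}

/* Release memory obtained from huge_mmap(); returns 0 on success. */
int huge_munmap(void *addr, size_t mapped_len)
{
    return munmap(addr, mapped_len);
}
```

With the 128 MB request from the example above and 16 MB pages, round_to_huge leaves the size at 128 MB, which needs all 8 pages from the example /proc/meminfo output, so the mapping (or the first access to it) is expected to fail if any of them is already in use.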
About the Authors
Jakub Kurzak AKA Koobas is a researcher at the University of Tennessee, Knoxville, and a member of the Innovative Computing Lab (ICL - http://icl.cs.utk.edu/), where he mostly works on programming multi-core processors and the Cell processor. Before that he was a student at the University of Houston, where he worked on programming distributed memory machines using message passing (MPI). Jakub's interests are in parallel programming techniques (message passing, multi-threading), parallel number crunching algorithms, and performance optimization.

Alfredo Buttari is a research associate in the Computer Science department of the University of Tennessee, Knoxville. Alfredo is a member of the Innovative Computing Laboratory, which deals with many aspects of High Performance Computing. His interests are in developing high performance software for Linear Algebra, mostly through parallel programming techniques of all sorts (MPI, OpenMP, threads...), including more exotic approaches like the Cell programming model. Before coming to Tennessee, Alfredo received a PhD and a Master's degree in Computer Science from the "Tor Vergata" University of Rome (Italy).