Tuesday, January 15, 2019

Setting up AMD's ROCm From Source

Setting up a Local ROCm Build from Source

These instructions document how to compile and run ROCm from source on a Linux system running either an upstream kernel or a kernel built from the ROCK-Kernel-Driver repository. Users who simply wish to use ROCm for general-purpose GPU programming generally will find it much easier to install from the binary distribution, as detailed in the README. However, those who wish to modify or develop ROCm source code may find this guide useful.

I realize that AMD does have a repository intended for people who want to set up ROCm from source, but it primarily contains a collection of scripts. These instructions were written under the assumption that people exist who, like me, want to work with ROCm source code, don't work for AMD, and would rather understand what they're doing up front than dig through a ton of shell scripts.

Prerequisites

This guide does not cover installing the ROCm kernel components. Compiling and installing the Linux kernel from source is involved enough that it is better covered in other documentation. Instead, the following instructions require that you are either already running a sufficiently new Linux kernel (i.e. 4.18 or later) or have installed the kernel from the ROCm kernel driver repository.

If you haven't yet added your username to the video group, do so first (the new group membership takes effect the next time you log in):

sudo usermod -a -G video $USER

Next, ensure that the video group has write access to /dev/kfd. We will do so by setting the group of the device to video, and ensuring that any member of the group has read and write access to the device:

echo 'KERNEL=="kfd", SUBSYSTEM=="kfd", GROUP="video", MODE="0660"' | sudo tee /etc/udev/rules.d/99-kfd.rules
sudo udevadm control --reload-rules
sudo udevadm trigger

Afterwards, ensure that your changes have taken effect by checking that the kfd virtual device has the correct permissions:

ls -lah /dev | grep kfd

# The above command should output something like:
# crw-rw----   1 root video   240,   0 Jan 15 10:04 kfd
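
If you'd rather script this check, here's a minimal sketch. The check_kfd_access function is my own hypothetical helper, not part of ROCm; it only tests whether the device node exists and whether the current shell can read and write it.

```shell
#!/bin/sh
# Hypothetical helper (not part of ROCm): report whether a device node is
# usable by the current user. Defaults to /dev/kfd when no path is given.
check_kfd_access() {
    dev="${1:-/dev/kfd}"
    if [ ! -e "$dev" ]; then
        echo "missing: $dev does not exist (is the kernel driver loaded?)"
        return 2
    fi
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "ok: $dev is readable and writable"
        return 0
    fi
    echo "denied: no read/write access to $dev"
    return 1
}

# Run the check against the default device; never abort the calling shell.
check_kfd_access || true
```

If this prints "denied" right after you ran usermod, try again from a fresh login session before changing anything else.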

As for software packages, you must first install some additional packages if you don't already have them. On Debian-based systems:

sudo apt install build-essential cmake g++ pkg-config libnuma-dev libpci-dev

Setting up directories

These instructions assume that you want to install ROCm to a directory without requiring root permissions. Start by creating a directory to hold all source code, and create a subdirectory into which we'll "install" ROCm components. Create environment variables holding the full paths of both directories:

mkdir rocm
cd rocm
ROCM_DIR=$(pwd)
mkdir install
ROCM_INSTALL_DIR=$ROCM_DIR/install

The remainder of these instructions assume that $ROCM_DIR and $ROCM_INSTALL_DIR have been set as above.
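
Note that these variables only exist in the shell where you set them. If you expect to come back to this in a new terminal, one option is to save them to a small file you can source later. This is only a sketch: write_rocm_env and the env.sh filename are names I made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write an env.sh file under the given ROCm source
# directory so both variables can be restored in a new shell.
write_rocm_env() {
    # Resolve the argument to an absolute path; fail if it doesn't exist.
    rocm_dir="$(cd "$1" && pwd)" || return 1
    mkdir -p "$rocm_dir/install"
    cat > "$rocm_dir/env.sh" <<EOF
export ROCM_DIR="$rocm_dir"
export ROCM_INSTALL_DIR="$rocm_dir/install"
EOF
    # Print the location of the generated file.
    echo "$rocm_dir/env.sh"
}

# Usage (from inside the rocm directory created above):
#   write_rocm_env .
#   . ./env.sh
```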

Installing ROCT-Thunk-Interface

The ROCT-Thunk-Interface repository contains a simple C library wrapping ROCm system calls.

Start by cloning the repository if you haven't already:

cd $ROCM_DIR
git clone https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface.git
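
Since every section in this guide starts with "clone the repository if you haven't already", that check can be wrapped in a function. The following is a sketch under my own naming (clone_if_missing is hypothetical); it assumes git's default behavior of cloning into a directory named after the repository.

```shell
#!/bin/sh
# Hypothetical helper: clone a repository into the current directory only if
# its target directory does not exist yet. Extra arguments are passed to
# "git clone" before the URL (for example --recursive or -b <branch>).
clone_if_missing() {
    url="$1"; shift
    # git's default target directory is the URL's last component, minus .git
    dir=$(basename "$url" .git)
    if [ -d "$dir" ]; then
        echo "skipping $dir: already present"
        return 0
    fi
    git clone "$@" "$url"
}

# Usage:
#   cd $ROCM_DIR
#   clone_if_missing https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface.git
```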

Next, build the library in a "build" subdirectory:

cd $ROCM_DIR/ROCT-Thunk-Interface
mkdir build
cd build
cmake -D CMAKE_INSTALL_PREFIX=$ROCM_INSTALL_DIR ..
make

Finally, install the library and headers to our target directory:

make install
make install-dev

Later on, if you modify ROCT-Thunk-Interface code, re-run the following commands to recompile and reinstall it:

cd $ROCM_DIR/ROCT-Thunk-Interface/build
make
make install
make install-dev

Installing ROCR-Runtime

ROCR-Runtime wraps the relatively minimal ROCT-Thunk-Interface library to provide more complex functionality, such as userspace memory management. This can only be compiled after you have completed the previous instructions for setting up ROCT-Thunk-Interface.

First, clone the repository if you haven't already:

cd $ROCM_DIR
git clone https://github.com/RadeonOpenCompute/ROCR-Runtime.git

Once again, use cmake to build in a separate build directory. We'll need to specify both where to install the runtime, and where we already installed the thunk interface (they're the same place, assuming you're following along):

cd $ROCM_DIR/ROCR-Runtime/src
mkdir build
cd build
cmake -D CMAKE_PREFIX_PATH=$ROCM_INSTALL_DIR \
      -D CMAKE_INSTALL_PREFIX=$ROCM_INSTALL_DIR \
      ..
make

Finally, install the library to our target directory:

make install

Later, if you modify ROCR-Runtime code, re-run the following commands to recompile and reinstall it:

cd $ROCM_DIR/ROCR-Runtime/src/build
make
make install
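
The recompile-and-reinstall commands have the same shape for each component, so they can be wrapped in a small function. This is just a sketch: rebuild_component and the JOBS variable are names I made up for illustration, and it assumes the build directory was already configured with cmake as described above.

```shell
#!/bin/sh
# Hypothetical helper: re-run make and then the given install targets in an
# existing build directory. Stops at the first failing step.
# With no targets given, it runs only "make install".
rebuild_component() {
    build_dir="$1"; shift
    if [ ! -d "$build_dir" ]; then
        echo "no build directory: $build_dir" >&2
        return 1
    fi
    # Default to the plain "install" target.
    [ $# -gt 0 ] || set -- install
    (
        cd "$build_dir" || exit 1
        make -j"${JOBS:-1}" || exit 1
        for target in "$@"; do
            make "$target" || exit 1
        done
    )
}

# Usage:
#   rebuild_component "$ROCM_DIR/ROCT-Thunk-Interface/build" install install-dev
#   rebuild_component "$ROCM_DIR/ROCR-Runtime/src/build"
```

The subshell keeps the cd from changing your working directory, which is handy when you're bouncing between several repositories.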

Installing the HCC Compiler

This step installs the HCC compiler and other tools such as OpenCL. It requires that you have already set up ROCR-Runtime.

First, clone the repository if you haven't already. Note that this repository is larger than the others because it also pulls in several submodules, including LLVM and clang:

cd $ROCM_DIR
git clone --recursive -b clang_tot_upgrade https://github.com/RadeonOpenCompute/hcc.git

As before, build in a separate directory. Also specify where to install the binaries:

cd $ROCM_DIR/hcc
mkdir build
cd build
cmake -D CMAKE_INSTALL_PREFIX=$ROCM_INSTALL_DIR \
      -D CMAKE_PREFIX_PATH=$ROCM_INSTALL_DIR \
      -D CMAKE_BUILD_TYPE=Release \
      ..
make -j8

Finally, install HCC to the target directory:

# Run this with several threads to speed it up--there's a lot to install.
make -j8 install

Later, if you modify the HCC compiler's code, re-run the following commands to update your installation:

cd $ROCM_DIR/hcc/build
make -j8
make -j8 install

Installing HIP

This step installs the compiler for HIP, AMD's CUDA-like language.

First, clone the repository if you haven't already:

cd $ROCM_DIR
git clone https://github.com/ROCm-Developer-Tools/HIP.git

As before, build in a separate directory, and specify the install location and the location where we installed hcc previously:

cd $ROCM_DIR/HIP
mkdir build
cd build
cmake -D CMAKE_INSTALL_PREFIX=$ROCM_INSTALL_DIR \
      -D HCC_HOME=$ROCM_INSTALL_DIR \
      -D HSA_PATH=$ROCM_INSTALL_DIR/hsa \
      -D CMAKE_BUILD_TYPE=Release \
      ..
LD_LIBRARY_PATH=$ROCM_INSTALL_DIR/lib make -j8
make install

Later, if you modify the HIP source code, re-run the following commands to update your installation:

cd $ROCM_DIR/HIP/build

# Setting LD_LIBRARY_PATH may not be necessary here if you've already followed
# all of these instructions and have added the ROCm libraries to your library
# search path in a more permanent manner.
LD_LIBRARY_PATH=$ROCM_INSTALL_DIR/lib make -j8
make install

Installing rocRAND

This may not be necessary for some people, but I was using it in some of my test programs.

First, clone rocRAND as you did the other projects:

cd $ROCM_DIR
git clone https://github.com/ROCmSoftwarePlatform/rocRAND.git

rocRAND's build scripts have a very annoying issue: hardcoded paths to /opt/rocm/... in several places that I couldn't figure out how to override using cmake options. So, just fix it with a hammer:

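# Find a list of files with hardcoded paths (run this inside the rocRAND
# checkout, or grep will also search every other repository you've cloned)
cd $ROCM_DIR/rocRAND
grep -r "opt/rocm" *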

# Now, manually go through every file and replace the /opt/rocm/... paths with
# the location at which you've been installing ROCm.
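
If editing every file by hand sounds tedious, a blunt first pass can be automated. The following is only a sketch: replace_rocm_prefix is a hypothetical helper of mine, it assumes GNU grep and GNU sed, and it simply swaps the literal /opt/rocm prefix everywhere. Paths that point into a subdirectory of /opt/rocm may still need hand editing, since the layout under your install prefix may not mirror the binary distribution's.

```shell
#!/bin/sh
# Hypothetical helper: rewrite every literal /opt/rocm prefix under a source
# tree to point at a different install prefix. Skips .git and binary files.
# Assumes GNU grep and GNU sed (for -i), and that the new prefix contains
# no '|' or '&' characters, which are special in the sed expression below.
replace_rocm_prefix() {
    src_dir="$1"
    new_prefix="$2"
    grep -rlI --exclude-dir=.git "/opt/rocm" "$src_dir" |
    while IFS= read -r file; do
        sed -i "s|/opt/rocm|$new_prefix|g" "$file"
    done
}

# Usage:
#   replace_rocm_prefix "$ROCM_DIR/rocRAND" "$ROCM_INSTALL_DIR"
```

Run git diff in the rocRAND checkout afterwards to review exactly what changed before you build.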

Next, we'll build in a separate directory again:

cd $ROCM_DIR/rocRAND
mkdir build
cd build
cmake -D CMAKE_INSTALL_PREFIX=$ROCM_INSTALL_DIR \
      -D CMAKE_CXX_COMPILER=$ROCM_INSTALL_DIR/bin/hcc \
      -D BUILD_TEST=OFF \
      ..
make -j8
make install

The libraries and headers were installed to an odd location, so I copied the libraries and symlinked the header directories to locations where they're more easily found by the local installation of hipcc:

cd $ROCM_INSTALL_DIR
cp -r hiprand/lib/* lib/
cp -r rocrand/lib/* lib/

cd ./include
ln -s -T ../hiprand/include ./hiprand
ln -s -T ../rocrand/include ./rocrand

Finally, the headers are wrong in several places (even if you installed the binary distribution of ROCm). If you try to use rocRAND, you'll encounter errors due to "missing" includes from the rocrand or hiprand headers. You'll usually just need to edit the headers and, for each wrong include path, add the directory containing the header to the #include <...> line. There's been an open PR for this issue on the rocRAND repository for a while, but they just don't seem to care about usability very much when it comes to this library.

Additional Steps to Update System Paths

If you plan to run hipcc, hcc, or other tools from your standard command line, you'll need to add your ROCm install location to your PATH. If you're using bash, this can be done by modifying .bashrc:

echo "export PATH=\$PATH:$ROCM_INSTALL_DIR/bin" >> ~/.bashrc
source ~/.bashrc
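
To confirm that the PATH change worked, you can check where each tool resolves. Here's a minimal sketch; check_tools_on_path is a hypothetical helper name, and it relies only on the shell's built-in command -v.

```shell
#!/bin/sh
# Hypothetical helper: report where each named tool resolves on PATH.
# Returns non-zero if any of the tools is missing.
check_tools_on_path() {
    missing=0
    for tool in "$@"; do
        if resolved=$(command -v "$tool"); then
            echo "$tool -> $resolved"
        else
            echo "$tool -> not found on PATH"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_tools_on_path hcc hipcc
```

If a tool resolves to somewhere other than your install directory, an older installation is probably earlier in your PATH.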

Some ROCm programs require an HCC_HOME environment variable. This is set to the installation directory of HCC, which was just $ROCM_INSTALL_DIR for us:

echo "export HCC_HOME=$ROCM_INSTALL_DIR" >> ~/.bashrc

Even after running the above step, you'll probably often encounter errors about missing runtime libraries. You can fix this in one of two ways:

  • Adding an entry to /etc/ld.so.conf.d/. If possible, you should start by trying to create a new file in the /etc/ld.so.conf.d/ directory, which contains files listing the paths that the system's dynamic linker searches:

    # Create a config file for most of the ROCm libraries
    echo "$ROCM_INSTALL_DIR/lib" | sudo tee /etc/ld.so.conf.d/ROCm.conf
    # For some reason, the HSA libraries are in a separate directory, so
    # append this line to the config file as well.
    echo "$ROCM_INSTALL_DIR/hsa/lib" | sudo tee -a /etc/ld.so.conf.d/ROCm.conf
    # Finally, update the settings for the dynamic linker:
    sudo ldconfig
    
  • Updating the LD_LIBRARY_PATH environment variable. This way is easy, but generally discouraged if you are able to add new library search paths in another way:

    echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$ROCM_INSTALL_DIR/lib" >> ~/.bashrc
    echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$ROCM_INSTALL_DIR/hsa/lib" >> ~/.bashrc
    source ~/.bashrc
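
One caveat about the echo ... >> ~/.bashrc commands in this section: they append a new line every time they run, so repeating the setup leaves duplicate entries behind. A small guard avoids that. This is a sketch with a made-up helper name (append_once), not something ROCm provides.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so re-running the setup does not duplicate entries.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   append_once "export HCC_HOME=$ROCM_INSTALL_DIR" ~/.bashrc
#   append_once "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$ROCM_INSTALL_DIR/lib" ~/.bashrc
```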
    

Uninstalling

For the most part, I attempted to keep invasive system changes to a minimum in these instructions. So, uninstalling should be fairly straightforward:

  1. Delete the ROCm directory you created: rm -rf $ROCM_DIR.

  2. If you updated your PATH environment variable or set HCC_HOME, delete the corresponding lines from .bashrc or whichever configuration file you modified.

  3. If you added new entries to /etc/ld.so.conf.d/, remove them: sudo rm /etc/ld.so.conf.d/ROCm.conf, and run sudo ldconfig. If you instead modified LD_LIBRARY_PATH, delete the relevant lines from .bashrc or wherever you made the changes.

  4. If you created the udev rule for /dev/kfd, remove it: sudo rm /etc/udev/rules.d/99-kfd.rules, and run sudo udevadm control --reload-rules.
