Today we look at how to build PyTorch on AMD's ROCm.
As you might know, AMD sells consumer GPUs that practically ask to be used for training and running deep learning models. But it seems little known that you can actually build PyTorch master on them relatively easily. I got asked about this at meetups two weeks in a row, where people appeared surprised, and I also always forget the exact command lines when they drop out of my bash history.
This is the setup I use:
- My base system is Debian unstable. I don't think that is a strict requirement.
- I use the stock Debian Linux 5.2 kernel (I had some trouble with 4.x kernels a long time ago) along with Debian's `firmware-amd-graphics` package (you need to have Debian main and non-free enabled for this to work).
- Being a bit lazy, I use the ROCm apt repository by adding this line to my apt sources:

  ```
  deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main
  ```

  (AMD provides detailed instructions on how to get this going.)
- At the time of writing, there is a small glitch in the repositories regarding the capitalization of HIP in the CMake tooling. I run

  ```
  for fn in $(find /opt/rocm/ -name \*.cmake); do
      sudo sed --in-place='~' 's/find_dependency(hip)/find_dependency(HIP)/' $fn
  done
  ```

  to fix things. This was actually about how PyTorch looked for HIP and has been fixed in PyTorch master (so PyTorch 1.7) by acxz. Thank you for tracking it down!
- This is the time when you need to check out the PyTorch master branch from GitHub.
- Before starting the actual compile, run

  ```
  python3 tools/amd_build/build_amd.py
  ```

  from the root of the git working directory. This does some magic replacing of CUDA with HIP (the ROCm equivalent) in the sources.
- Build PyTorch itself. I use

  ```
  RCCL_DIR=/opt/rocm/rccl/lib/cmake/rccl/ PYTORCH_ROCM_ARCH=gfx900 hip_DIR=/opt/rocm/hip/cmake/ USE_NVCC=OFF BUILD_CAFFE2_OPS=0 PATH=/usr/lib/ccache/:$PATH USE_CUDA=OFF python3 setup.py bdist_wheel
  ```

  Here `gfx900` is the architecture of your GPU, which you can get from `/opt/rocm/bin/rocm_agent_enumerator` (that's what the pros do) or from `/opt/rocm/bin/rocminfo` by searching for "gfx". Leaving out `PYTORCH_ROCM_ARCH` will build for all ROCm-supported architectures, which takes longer.
This produces a wheel package in `dist/`, which you can now install using `sudo pip3 install dist/*.whl`.
Now you can use PyTorch as usual: when you say `a = torch.randn(5, 5, device="cuda")`, it'll create a tensor on the (AMD) GPU.
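By the way, if you are (understandably) wary of sed-ing over `/opt/rocm` right away, the CMake capitalization fix from the list above can be rehearsed on a scratch copy first. The file name below is made up for illustration:

```shell
# Rehearse the find_dependency(hip) -> find_dependency(HIP) fix in a
# scratch directory; hip-config.cmake is a stand-in file for this demo.
DIR=$(mktemp -d)
echo 'find_dependency(hip)' > "$DIR/hip-config.cmake"

for fn in $(find "$DIR" -name \*.cmake); do
    sed --in-place='~' 's/find_dependency(hip)/find_dependency(HIP)/' "$fn"
done

cat "$DIR/hip-config.cmake"   # now says find_dependency(HIP)
ls "$DIR"                     # the '~' suffix kept a backup of the original
```

On the real system, the same loop runs over `/opt/rocm/` with sudo, exactly as shown in the list above.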
Bonus tip: setting up ccache
As compiling PyTorch can take quite long, I like to use ccache (as recommended in the PyTorch documentation). However, ROCm uses a fully qualified path to the compiler, so just putting ccache on the PATH won't work for the HIP compilation. The key binary for the compilation is `/opt/rocm/llvm/bin/clang-11`, so we (somewhat ad hoc) divert it.
I rename that file to `clang-11-real` and then add a two-line shell script `clang-11` in its place (don't forget to chmod it to make it executable):

```
#!/bin/sh
exec ccache /opt/rocm/llvm/bin/clang-11-real "$@"
```

As `/opt/rocm/` will be a symlink, I tend to put the versioned path in the script, too, but either should work.
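Spelled out as commands, the diversion looks like the sketch below. To keep it safe to try, it rehearses in a scratch directory with a fake `clang-11` standing in for the real compiler; on the real system you would run the same `mv`/`cat`/`chmod` against `/opt/rocm/llvm/bin` with sudo:

```shell
# Scratch rehearsal of the ccache diversion; BIN stands in for
# /opt/rocm/llvm/bin and the fake clang-11 for the real compiler.
BIN=$(mktemp -d)
printf '#!/bin/sh\necho "I am the real compiler"\n' > "$BIN/clang-11"
chmod +x "$BIN/clang-11"

# The actual diversion: rename the compiler, then drop in the wrapper.
# (The heredoc expands $BIN here; on the real system the script would
# contain the /opt/rocm path instead.)
mv "$BIN/clang-11" "$BIN/clang-11-real"
cat > "$BIN/clang-11" <<EOF
#!/bin/sh
exec ccache $BIN/clang-11-real "\$@"
EOF
chmod +x "$BIN/clang-11"
```

After this, anything invoking `clang-11` by its full path transparently goes through ccache.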
Disclosure: AMD sent me a card to try PyTorch on. (Thanks!) I also do work with AMD on other things, but anything in this blog post is my personal opinion and not necessarily that of AMD.