Building xformers on Windows

xformers speeds up AI image generation and reduces VRAM usage on Nvidia hardware. Pre-built binaries exist, but they may not be available for newer or nightly versions of PyTorch, in which case you have to build it yourself. This tutorial is for Windows, but most of the steps apply to Linux as well.

Prerequisites

You will need to install Visual Studio Build Tools 2022. In the installer, select the ‘Desktop development with C++’ option and install it.

You will also need the CUDA Toolkit installed. Its version must match the CUDA version your PyTorch install was built with; in my case that is 12.1.
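If you want to check the match rather than eyeball it, here is a minimal sketch. It assumes nvcc from the CUDA Toolkit is on your PATH and that PyTorch is already installed; the function names are my own, not part of any library. It compares torch.version.cuda against the release reported by nvcc --version.

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def toolkit_matches_torch(torch_cuda: Optional[str], nvcc_output: str) -> bool:
    """True when the toolkit release equals the CUDA version PyTorch was built with.

    torch_cuda is None for CPU-only PyTorch builds, which never match.
    """
    toolkit = parse_nvcc_release(nvcc_output)
    return torch_cuda is not None and toolkit == torch_cuda


def check_installed_versions() -> bool:
    """Compare the active PyTorch build against the nvcc found on PATH."""
    import torch  # imported lazily so the helpers above work without PyTorch

    nvcc_output = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return toolkit_matches_torch(torch.version.cuda, nvcc_output)
```

Run check_installed_versions() inside the activated environment; a False result means the toolkit and PyTorch disagree, or that PyTorch is a CPU-only build.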

If you already have a Python virtual environment you can use it. Either way, make sure the environment is activated with ‘.\venv\scripts\activate’ and that your desired PyTorch version is installed before continuing.
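A quick way to confirm both conditions from Python is sketched below. The helper names are hypothetical, and the check assumes a standard venv, where sys.prefix differs from sys.base_prefix once the environment is active.

```python
import importlib.util
import sys
from typing import List


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv."""
    return sys.prefix != sys.base_prefix


def is_installed(package: str) -> bool:
    """True when the top-level package can be found by this interpreter."""
    return importlib.util.find_spec(package) is not None


def environment_problems() -> List[str]:
    """Return human-readable problems; an empty list means ready to build."""
    problems = []
    if not in_virtualenv():
        problems.append("no virtual environment is active")
    if not is_installed("torch"):
        problems.append("PyTorch (torch) is not installed in this interpreter")
    return problems
```

Printing environment_problems() before you start should give an empty list; anything else is worth fixing first, because the build compiles against whichever PyTorch the interpreter finds.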

You will also want to have git installed.

Building

Grab the latest version with:

git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive

Make sure your Python virtual environment is activated and your desired PyTorch version is installed. To speed up the build you may also want to install ninja:

pip install ninja

To compile xformers:

python setup.py build
python setup.py bdist_wheel

You may notice very high memory usage, possibly spilling into the page file if one is enabled. To reduce it, lower the number of parallel build jobs by setting the MAX_JOBS environment variable, which PyTorch's extension builder reads when compiling with ninja. In PowerShell this is $Env:MAX_JOBS=4; adjust the value as needed.
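If you would rather drive the build from a script than set the variable in your shell, the following is a rough sketch. It assumes the job cap is read from MAX_JOBS, as PyTorch's torch.utils.cpp_extension does, and that the script runs from the xformers checkout; build_env and build_wheel are names I made up for illustration.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def build_env(max_jobs: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and cap the number of parallel compile jobs in it."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    env = dict(os.environ if base is None else base)
    env["MAX_JOBS"] = str(max_jobs)  # environment values must be strings
    return env


def build_wheel(max_jobs: int) -> None:
    """Run the wheel build with the capped environment (call from the repo root)."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        env=build_env(max_jobs),
        check=True,
    )
```

Calling build_wheel(4) is then equivalent to setting the variable and running the build by hand, without leaving the value set in your shell afterwards.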

I also recommend building only for your own GPU's CUDA compute capability instead of every supported architecture. Set the environment variable TORCH_CUDA_ARCH_LIST to that value, for example TORCH_CUDA_ARCH_LIST=8.9; you can look up the number for your GPU here.
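If the GPU you are building for is in the same machine, PyTorch can report the value for you. A small sketch, assuming a CUDA-enabled PyTorch is installed and the GPU is device 0 (the helper names are mine):

```python
from typing import Tuple


def arch_list_value(capability: Tuple[int, int]) -> str:
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST entry."""
    major, minor = capability
    return f"{major}.{minor}"


def detect_arch_list(device: int = 0) -> str:
    """Ask PyTorch for the compute capability of the given GPU."""
    import torch  # imported lazily so arch_list_value works without PyTorch

    return arch_list_value(torch.cuda.get_device_capability(device))
```

On a card with compute capability 8.9, detect_arch_list() gives the string to assign to TORCH_CUDA_ARCH_LIST.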

Compile time will vary but it can range anywhere from 10 minutes to an hour depending on how many workers you can allocate and your CPU.

Should the build crash with STACK_OVERFLOW, reduce the number of jobs and build only for the architecture you need. If all else fails, running editbin /STACK:reserve ptxas.exe from the Visual Studio developer command prompt may resolve the problem; try different reserve values. ptxas.exe is located in your CUDA Toolkit install.
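To find the ptxas.exe that editbin should patch, something like the sketch below may help. It assumes the CUDA installer set the CUDA_PATH environment variable and put ptxas.exe in its bin folder, which is the usual layout on Windows but not guaranteed; it falls back to searching PATH. locate_ptxas is a hypothetical helper name.

```python
import shutil
from pathlib import Path
from typing import Mapping, Optional


def locate_ptxas(env: Mapping[str, str]) -> Optional[Path]:
    """Return the path to ptxas, or None when it cannot be found."""
    cuda_path = env.get("CUDA_PATH")  # assumption: set by the CUDA installer
    if cuda_path:
        candidate = Path(cuda_path) / "bin" / "ptxas.exe"
        if candidate.is_file():
            return candidate
    # Fall back to a PATH search using the PATH from the given environment.
    found = shutil.which("ptxas", path=env.get("PATH"))
    return Path(found) if found else None
```

Call it as locate_ptxas(os.environ) after importing os, and consider copying the original ptxas.exe somewhere safe before modifying it.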

Installing

Once the build completes there will be a .whl file in the dist directory. Install it with:

pip install <file>.whl
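If you rebuild often, picking the wheel by hand gets tedious. The sketch below assumes it runs from the xformers checkout and that the most recently modified .whl in dist is the one you want; newest_wheel and install_wheel are illustrative names, not part of pip or xformers.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def newest_wheel(dist_dir: str = "dist") -> Optional[Path]:
    """Return the most recently modified .whl in dist_dir, or None if there is none."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


def install_wheel(wheel: Path) -> None:
    """Install the wheel into the interpreter that is running this script."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", str(wheel)],
        check=True,
    )
```

After installing, running python -m xformers.info in the same environment should print the build details, which is a quick way to confirm the new build is the one being picked up.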