xformers allows you to speed up AI image generation and reduce VRAM usage on Nvidia hardware. Pre-built binaries are available, but they may not exist for newer or nightly versions of PyTorch, in which case you have to build it yourself. This tutorial is for Windows, but most of the steps apply to Linux as well.
You will also need the CUDA Toolkit installed; this must match the CUDA version of your PyTorch install, which in my case is 12.1.
If you already have an existing Python virtual environment you can make use of that. In any case, make sure your desired PyTorch version is installed before continuing and that the environment is activated with ‘.\venv\scripts\activate’.
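To double check the installed PyTorch build and its CUDA version against your CUDA Toolkit, you can run something like the following from the activated environment (the exact output will depend on your install):
python -c "import torch; print(torch.__version__, torch.version.cuda)"
nvcc --version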
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
Make sure your Python virtual environment is activated and your desired PyTorch version is installed. To speed things up you may also want to install ninja.
pip install ninja
To compile xformers:
python setup.py build
python setup.py bdist_wheel
You may notice very high memory usage during the build, possibly spilling into the page file if it is enabled. To reduce it you will need to limit the number of max workers; in PowerShell this can be done with $Env:MAX_WORKERS=4, adjusting the value as needed.
I also recommend building only for your specific hardware's CUDA architecture by setting the environment variable TORCH_CUDA_ARCH_LIST=8.9; to find the right version number you can look up your GPU here.
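For example, in PowerShell, set both before running the build commands (8.9 corresponds to Ada Lovelace cards such as the RTX 4090; substitute your own values):
$Env:MAX_WORKERS = 4
$Env:TORCH_CUDA_ARCH_LIST = "8.9"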
Compile time will vary, ranging anywhere from 10 minutes to an hour depending on how many workers you can allocate and your CPU.
Should the build crash with STACK_OVERFLOW, reduce the number of jobs and set only the architecture you need. If all else fails, running editbin /STACK:reserve against ptxas.exe may resolve the problem (use the Visual Studio Developer Command Prompt); try different reserve values. ptxas.exe is located in your CUDA Toolkit install.
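As a rough example only (the reserve size here is a guess to experiment with, and your CUDA Toolkit install path may differ):
editbin /STACK:33554432 "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\ptxas.exe"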
Installing
Once complete there will be a .whl file in the dist directory, which you can install with pip.
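The exact filename depends on your xformers, Python and platform versions; it will look something like this (replace the placeholder with the actual wheel name in dist):
pip install .\dist\xformers-<version>-<python>-win_amd64.whl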
As you may or may not be aware, the United Kingdom government recently passed the highly controversial Online Safety Bill; at the time of writing it's awaiting royal assent before it becomes law. This law will severely impact freedom of speech in the UK, but more importantly it risks undermining the safety of every single person living in the UK by threatening online encryption technology with government-mandated backdoors.
Aside from trying to force platforms to censor any speech they deem ‘hateful’ (which could be anything at all, since there are no strict guidelines), these encryption backdoors run the very real risk of being exploited by criminals and exposing personal information. Before you say “I have nothing to hide”, this includes banking, medical, housing and other information that most sane people want to keep private.
Defenders of this awful piece of legislation say it's required to protect children. The truth, however, is that it will do very little to accomplish that goal and may even have a negative effect on a child's privacy: the vast majority of child exploitation already occurs on platforms that do not utilize encryption, and from countries outside the UK. Regardless, it should not be the job of the government to protect children in the first place; lazy and incompetent parents are what they should be targeting.
Ultimately this is almost certainly going to be a disaster, and will hopefully backfire badly on the British government. Numerous companies have already indicated that they will block UK users rather than undermine their encryption; after all, who wants the UK government spying on them? This does mean you will need to be more careful in the meantime about which software and platforms you make use of.
Ironically, this law will actually help you find the companies that can (within reason) be trusted, by seeing which ones block the UK, so I guess we should thank them for that.
As usual with most governments, they really don't understand how technology works. There are numerous ways to avoid censorship and protect your online safety and privacy; in this article I will cover the most important of these. You may also find additional useful information in my prior article.
Virtual Private Network
The easiest way to protect your privacy is to use a VPN, and there are many available, including free ones. These also allow you to bypass government blocks by routing your connection through another country. Whilst this isn't 100% secure against a government trying to unmask who you are, it's good enough for the average person.
Some popular privacy-focused options are ProtonVPN (free tier available) and Mullvad (paid); as always, do your research before choosing a VPN.
Data Minimisation
If you are going to post things on the internet then you should at least take the time to minimise personal information. Avoid using your real name, age and date of birth, and do not upload pictures of yourself or your family or any sensitive documents. This doesn't only apply to social media but also to educational and government websites; none of them can be trusted with personal information.
If you are going to upload images, make sure location data is not included; if you are unsure, you can strip all metadata with a tool such as ExifCleaner.
Alternative Platforms / Software
Avoiding mainstream platforms is also important, since in many cases these are the most likely to comply with the UK government and other censorship. Some alternatives to popular platforms are:
There are plenty more; mainly you want to look for things that are open source, provide end-to-end encryption, and refuse to censor content.
Conclusion
With the proper tools and a bit of effort you can protect your privacy online whilst retaining the right to say whatever the hell you like. Make no mistake: this sort of intrusive monitoring and censorship is going to come to all countries in time unless you fight against it.
Stable Diffusion XL is Stability AI's latest AI image generator, released in July 2023. It gives a significant improvement in image quality compared to prior versions at a modest performance cost. A number of popular frontends have at least partial support for SDXL, which should improve over time.
In this tutorial I'm going to assume that you have no prior knowledge. The frontend I'm using is Comfy UI; while it has a steeper learning curve, it offers the best performance and the fewest issues of the currently available frontends.
Installation
Windows users with Nvidia hardware, or those wanting to use CPU only, can download the portable version; you only need to extract it somewhere and run the desired batch file.
For users of AMD hardware, Linux or OSX, you will need to perform a manual install as described here. You will need Python 3.10.x* installed, or alternatively you can use an Anaconda environment, which is outside the scope of this tutorial. You'll also need git installed to clone the repository.
When installing Python on Windows, make sure the ‘add to PATH’ option is checked.
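You can quickly confirm both Python and git are available from your terminal afterwards with:
python --version
git --version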
Open your command prompt / terminal and browse to where you want Comfy UI installed; a minimum of around 20GB of free space is required, but more is recommended.
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
Next, create a virtual environment so it doesn't interfere with other Python software you have installed; if you don't care about this you can skip straight to installing.
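On Windows the environment can be created and activated as follows (on Linux or OSX, activate with ‘source venv/bin/activate’ instead):
python -m venv venv
.\venv\scripts\activate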
Run the following commands depending on your hardware and operating system:
# Nvidia (all)*
pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -U xformers==0.0.22.post4
# AMD (Linux / ROCm)
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.4.2
# AMD 7000 series (Linux / ROCm)
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
# AMD (Windows / DirectML)
# This must be run after installing from requirements.txt
pip uninstall torch
pip install torch-directml
# If you get the error "Torch not compiled with CUDA enabled" run this and repeat
pip uninstall torch
SDXL may have serious issues with DirectML; more than 8GB of VRAM or AMD Smart Access Memory may be required to avoid crashing.
The Torch version is pinned at 2.1.0 for users installing the precompiled xformers 0.0.22.post4; if you're not using xformers this pin is not needed.
xformers typically reduces VRAM usage and increases speed; however, you may find you get similar or better results using the --use-pytorch-cross-attention launch option, particularly with newer versions of PyTorch, so feel free to experiment.
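If you want to confirm that xformers installed correctly and see which features it detected, it ships a small info module you can run from the activated environment:
python -m xformers.info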
Then you just need to install the rest of the dependencies.
pip install -r requirements.txt
To make things a bit easier to launch, I recommend making a simple launcher script. A very basic example PowerShell script, which includes an automatic update, would be:
git pull
.\venv\scripts\activate
python main.py --auto-launch
# Use one of the below instead only if needed
# For Linux AMD 6000 series users you may need to use this
HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py --auto-launch
# For Linux AMD 7000 series users you may need to use this
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --auto-launch
# For AMD Windows DirectML users
python main.py --auto-launch --directml
Save it as a .ps1 file in the same location as main.py, then right click it and choose Run with PowerShell to launch Comfy UI. If all is working you can download the SDXL models; you will need the base model and the refiner model. Put them both in ‘ComfyUI/models/checkpoints’ and restart Comfy UI.
Enable High Quality Previews
To enable high quality previews, download taesdxl_decoder.pth and place it in ‘ComfyUI/models/vae_approx’, then add --preview-method auto to the launch script.
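The launch line in the example script above would then become:
python main.py --auto-launch --preview-method auto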
Basic Images
To start you can copy the below workflow or load one from the examples provided.
This is the most basic workflow you can get: the positive and negative text prompts are encoded by the CLIP Text Encode nodes and passed to the KSampler, which produces the latent image; the size is set by the Empty Latent Image node; finally, the latent output is decoded by the VAE Decode node, giving the actual image.
By making a few changes we can add the refiner model to the workflow. This is intended to be used after the base model and helps produce additional details; you can give it the same or a different prompt to get interesting results, though using the same seed as the base may be preferable in most cases:
As you can see we've pretty much duplicated everything here, except the denoise parameter in the refiner sampler has been reduced to 0.2. Denoise controls how strongly the sampler affects the latent image, ranging from 0 to 1 (0% to 100%); since we want the refiner to refine the image, not completely replace it, a value of 0.05 to 0.4 is a good choice.
Another important parameter is the number of steps. Simply put, the more steps you have the more refined the image will appear; however, there is a point of diminishing returns where the image changes very little or not at all with additional steps. The optimal amount varies with the sampler used, the image subject and various other hard-to-predict factors. Typically 20-30 steps is decent for a preview, with 100-400 for a final image, though you may find at times that fewer steps produce a more desirable image.
There are a number of samplers to choose from, although in my opinion only a few are really worthwhile: Euler is a decent general purpose sampler, DDIM is excellent for faces, and DPMPP_2M is good overall for most purposes. As with anything, experimenting is a good idea.
The CFG (Classifier Free Guidance) scale controls how strongly the sampler tries to adhere to the prompt; higher values tend to give results closer to the prompt but can result in weirdness if set too high. A range of 4-15 usually works, and a lower value in the refiner seems to be preferable.
Make sure you check out the examples for different workflows; there are multiple ways to do the same thing, and the more advanced SDXL workflow given in the examples tends to generate better images than the more traditional method shown here.
This only covers the very basics of what you can do with AI image generation, but you can get a lot of interesting results with just this.