Posts Tagged ‘strix halo’


Running FLUX.2-dev on a Strix Halo

Wednesday, December 3rd, 2025

FLUX.2-dev is the newest text-to-image generator from the German AI company Black Forest Labs (BFL). FLUX.2-dev is the open-weight variant and can be downloaded and used locally on your PC. It is a huge model with 32 billion parameters, 64GB in size, so you need a beefy computer with a lot of memory.

https://huggingface.co/black-forest-labs/FLUX.2-dev

Luckily, the Strix Halo has up to 128GB of unified memory, so it's ideal for these huge AI models.

The easiest way to try out image, video, and audio generation is with ComfyUI.

There are several ways of installing ComfyUI, such as using a toolbox, but here we follow the instructions from the ComfyUI GitHub repository.

Installing ComfyUI on a Strix Halo

First, create a virtual environment to isolate your Python install:

python3 -m venv .comfyui
source .comfyui/bin/activate
git clone https://github.com/comfyanonymous/ComfyUI.git

cd ComfyUI

pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx1151/

pip install -r requirements.txt
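
Before starting ComfyUI, it's worth a quick sanity check that this PyTorch build actually sees the iGPU through ROCm (ROCm builds expose the GPU via the torch.cuda API; the reported device name may vary):

python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"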

Running ComfyUI

In short:

TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention
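
ComfyUI then serves its web UI on http://127.0.0.1:8188 by default. If your Strix Halo runs headless, adding ComfyUI's --listen flag makes the UI reachable from other machines on your LAN:

TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention --listen 0.0.0.0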

Workflow for FLUX.2-dev on ComfyUI

Here is a collection of examples showing how to run specific models:

https://comfyanonymous.github.io/ComfyUI_examples/

Of course, you can sometimes save some VRAM by using a quantized model:

https://huggingface.co/city96/FLUX.2-dev-gguf

https://huggingface.co/city96/FLUX.2-dev-gguf/blob/main/flux2-dev-Q4_K_M.gguf
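
The GGUF file can be fetched from the command line with the Hugging Face CLI. Note that loading GGUF checkpoints in ComfyUI requires the ComfyUI-GGUF custom node, and the models/unet target directory below follows that project's convention; adjust to your layout:

pip install -U "huggingface_hub[cli]"
huggingface-cli download city96/FLUX.2-dev-gguf flux2-dev-Q4_K_M.gguf --local-dir ComfyUI/models/unet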

But for the best quality you want to stick to the bf16 or fp8 models. We just downloaded all the files from the official example:

https://comfyanonymous.github.io/ComfyUI_examples/flux2/

Running FLUX.2-dev with ComfyUI on Strix Halo

The best way to start ComfyUI on a Strix Halo is by adding some extra parameters for better performance:

TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py  --use-pytorch-cross-attention --disable-mmap

Don’t forget to add --disable-mmap, otherwise ComfyUI will hang.

It will take about 6 to 7 minutes to generate a 20-step 1200×600 picture. That's not fast, but the image quality is excellent.

 


Fixing crashes on AMD Strix Halo — Image & Video Toolbox

Wednesday, November 26th, 2025

kyuz0 introduced his simple-to-set-up toolboxes for Fedora; they also run on Ubuntu. This collection of toolboxes gives any Strix Halo owner an easy way to try different AI applications on their machine, without having to mess around with software installations.

https://github.com/kyuz0

I can say they are a delight to use. Updating a toolbox is as simple as running a script. No more hassle with dependencies or compiling issues.

What you do need is patience, a huge SSD, and a fast unmetered internet connection. You’ll end up downloading 250GB+ to try out different things, such as LLMs, video generation, and image generation.

Anyway, the toolboxes are getting better and better. Performance is improving, with speedups of up to 300% in some cases, and memory usage is going down. But stability remains another PITA.

ROCm is no fun because of instability and daily crashes. But there is good news on the horizon.

A fix is out, which stops ROCm from crashing. That is the good news. The bad news is that it’s only available in a release candidate for the new kernel.

In the end it is a missing feature (or a bug) in the kernel for the Strix Halo, fixed in the upcoming Linux kernel 6.18. Hopefully this patch will be backported to 6.17; otherwise we will have broken software support in Ubuntu 25.10 on the flagship AMD APU.

In the meantime, you can mitigate the problem in Ubuntu 25.10 / 25.04 with this workaround:

options amdgpu cwsr_enable=0

Just add that line to `/etc/modprobe.d/strix-halo.conf` (or any other file in that directory), as shown in the sketch below.
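
A minimal sketch for applying it on Ubuntu, where amdgpu is loaded from the initramfs, so the initramfs needs regenerating:

echo 'options amdgpu cwsr_enable=0' | sudo tee /etc/modprobe.d/strix-halo.conf
sudo update-initramfs -u
sudo reboot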

https://community.frame.work/t/amd-rocm-does-not-support-the-amd-ryzen-ai-300-series-gpus/68767/51

At least with that I got no more crashes on Ubuntu 25.10.

Not sure what impact this has on performance. AFAIK, a Strix Halo has 1.5 times the VGPR capacity of other RDNA 3.5 hardware.

What does cwsr do?

cwsr_enable (int)

CWSR (compute wave store and resume) allows the GPU to preempt shader execution in the middle of a compute wave. Default is 1 to enable this feature. Setting 0 disables it.

https://www.kernel.org/doc/html/v6.7/gpu/amdgpu/module-parameters.html#cwsr-enable-int
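
You can check the current value at runtime; with the workaround active it should read 0 instead of the default 1:

cat /sys/module/amdgpu/parameters/cwsr_enable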

I really don’t know what disabling it means in practice (changing the default is only a workaround), and I don’t know why it fixes the VGPR issue, but it does work.

VGPRs have something to do with dynamic wave32 compute shaders; I suspect that’s the issue, because in addition to crashes I also experienced hangs/deadlocks.

RDNA increased SIMD register file capacity to 128 KB, up from 64 KB on GCN. RDNA 3 introduced a 192 KB register file configuration for high-end GPUs, where die area is likely less of a concern. But that strategy isn’t efficient for raytracing.

https://chipsandcheese.com/p/dynamic-register-allocation-on-amds

If you know what the performance impact is, please let me know in the comments.

Once the fix lands in your kernel (hopefully backported to 6.17), don’t forget to comment out or delete the added line `options amdgpu cwsr_enable=0` in /etc/modprobe.d/strix-halo.conf.

Links and resources:

  • https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-usage-large-thread-groups/
  • https://github.com/kyuz0/amd-strix-halo-image-video-toolboxes

Enable WakeOnLan (WOL) on Strix Halo

Saturday, November 15th, 2025

First, enter the BIOS and enable Wake-on-LAN there.

Is that all? You might expect everything to work now by sending the magic WOL packet from another machine to the Strix Halo’s MAC address on your LAN.

How to find the MAC address of your Strix Halo:

Boot your Strix Halo machine, open a terminal, and type:

ip a

Look for eth0 or enoX and the line starting with link/ether:

00:1A:2B:3C:4D:5E (example only)
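
Or filter it down to just the MAC address (assuming your interface is called eno1; adjust the name):

ip link show eno1 | awk '/link\/ether/ {print $2}'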

Be aware that Wake-On-LAN can only work on Ethernet, not Wi-Fi.

Power off your Strix Halo.

On another machine or laptop run:

wakeonlan <Strix-Halo-MAC-address>
whatis wakeonlan
wakeonlan (1) - Perl script to wake up computers

If you need to install it:

sudo apt install wakeonlan

But this did not work from my laptop.

wakeonlan <Strix-Halo-MAC-address>

After powering off and issuing this command from my laptop, nothing happened.

So I started the Strix Halo manually by pressing the power button and began debugging.

sudo ethtool eno1 | grep Wake
Supports Wake-on: pumbg
Wake-on: d

What does this mean?

According to the ethtool(8) man page (Qwen-Coder’s first answer mixed up a few of the letters):

This indicates which Wake-on-LAN modes your NIC supports, encoded as single letters:

p – Wake on PHY activity
u – Wake on unicast messages
m – Wake on multicast messages
b – Wake on broadcast messages
g – Wake on MagicPacket (the standard WoL packet)
d – Disabled (no wake triggers active)

Wake-on: d means it is disabled on the software side.

You have to enable it in software as well, which is a bit surprising.

Enable Wake-on-LAN on your Ethernet interface in software

sudo ethtool -s eno1 wol g

After that:

sudo ethtool eno1 | grep Wake
Supports Wake-on: pumbg
Wake-on: g

So g means it is now enabled for magic packets.

I shut down my Strix Halo, and now I could start it from my laptop:

wakeonlan <MAC-address>

But it stopped working after a reboot; it only worked once.

Enable Wake-on-LAN on your Ethernet interface persistently

To make it persistent after reboot, you have to configure it with nmcli:

Look up the connection name for your Ethernet interface (here it’s netplan-eno1):

sudo nmcli d

Then make it persistent:

sudo nmcli c modify "netplan-eno1" 802-3-ethernet.wake-on-lan magic
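
After the next reboot you can verify both the NetworkManager property and the driver state (connection and interface names are from my machine; yours may differ):

nmcli -f 802-3-ethernet.wake-on-lan connection show "netplan-eno1"
sudo ethtool eno1 | grep Wake-on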

Links and resources:

  • https://wiki.debian.org/WakeOnLan

Setting up unified memory for Strix Halo correctly on Ubuntu 25.04 or 25.10

Wednesday, November 12th, 2025

You have followed the online instructions, for example for this great toolbox, but when running the Strix Halo toolbox you are still encountering memory errors on your 128GB Strix Halo system; qwen-image-studio, for example, fails to run:

File "/opt/venv/lib64/python3.13/site-packages/torch/utils/_device.py", line 104, in torch_function
return func(*args, **kwargs)
torch.OutOfMemoryError: HIP out of memory. Tried to allocate 3.31 GiB. GPU 0 has a total capacity of 128.00 GiB of which 780.24 MiB is free. Of the allocated memory 57.70 GiB is allocated by PyTorch, and 75.82 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_HIP_ALLOC_CONF=expandable_segments:True to avoid fragmentation. 

(https://github.com/kyuz0/amd-strix-halo-image-video-toolboxes/issues)

You’ve verified your configuration using sudo dmesg | grep "amdgpu:.*memory", and the output indicates that the GTT size is correct.

It is likely that you configured the GTT size using the now outdated and deprecated parameter amdgpu.gttsize, which may explain why the setting is not taking effect. Alternatively, you may have used the incorrect prefix `amdttm.` instead of the correct `ttm.`.

Please verify your configuration to ensure the proper syntax is used:

How to check the unified memory setting on AMD Strix Halo/Krackan Point:

cat /sys/module/ttm/parameters/p* | awk '{print $1 / (1024 * 1024 / 4)}'

The last two lines must be the same, and the number you see is the amount of unified memory in GB.
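
To see which value belongs to which parameter, let grep print the file names alongside:

grep . /sys/module/ttm/parameters/p*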

How to setup unified memory correctly

In the BIOS, set the GMA (Graphics Memory Allocation) to the minimum value: 512MB. Then, add a kernel boot parameter to enable unified memory support.

Avoid outdated methods; they no longer work. Also, note that the approach differs slightly depending on your hardware: AMD Ryzen processors require a different configuration (ttm) than Instinct-class (professional workstation) GPUs (amdttm).

To max out unified memory:

Edit /etc/default/grub and change the GRUB_CMDLINE_LINUX_DEFAULT line to one of the following:

128GB HALO

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=33554432 ttm.page_pool_size=33554432"

96GB HALO

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=25165824 ttm.page_pool_size=25165824"

64GB HALO

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=16777216 ttm.page_pool_size=16777216"

32GB HALO

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=8388608 ttm.page_pool_size=8388608"

The math here for 32GB: 32 × 1024 × 1024 × 1024 / 4096 = 32 × 1024 × 256 = 8388608.

The default page size is 4096 bytes.
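
You can let the shell check the arithmetic; the 124GB line is just an example for leaving ~4GB to the system on a 128GB machine (see the note below):

echo $(( 32 * 1024 * 1024 * 1024 / 4096 ))    # 32GB  -> 8388608
echo $(( 124 * 1024 * 1024 * 1024 / 4096 ))   # 124GB -> 32505856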

After you’ve edited the /etc/default/grub:

sudo update-grub2
reboot

You should probably leave some memory (~4GB) for your system to run smoothly, so adapt the lines above accordingly. Maybe 2GB is enough if you don’t run a GUI.

Check your config

To check if you’ve done it correctly, reboot and check:

cat /sys/module/ttm/parameters/p* | awk '{print $1 / (1024 * 1024 / 4)}'
96
96

Here, the full 96GB is configured as unified memory on a 96GB Strix Halo.

If the last two numbers are not the same, try debugging the problem.

Debugging unified memory problem

Check dmesg for AMD VRAM and GTT size:

sudo dmesg | grep "amdgpu:.*memory"

[ 10.290438] amdgpu 0000:64:00.0: amdgpu: amdgpu: 512M of VRAM memory ready
[ 10.290440] amdgpu 0000:64:00.0: amdgpu: amdgpu: 131072M of GTT memory ready.

This looks correct. But if you set the GTT size with the old amdgpu.gttsize method, dmesg will report the right GTT size while the memory still can’t be used by ROCm unless the TTM limits are set correctly. You’ll notice another warning in the dmesg output during early boot:

sudo dmesg | grep "amdgpu"

[ 17.652893] amdgpu 0000:c5:00.0: amdgpu: [drm] Configuring gttsize via module parameter is deprecated, please use ttm.pages_limit
[ 17.652895] amdgpu 0000:c5:00.0: amdgpu: [drm] GTT size has been set as 103079215104 but TTM size has been set as 48956567552, this is unusual

Furthermore, you’ll see a lot of sources mentioning amdttm.pages_limit or amdttm.page_pool_size. These won’t work on your Strix Halo; those settings are for AMD Instinct machines.

Confusing, yes, but just be careful to use the right settings in /etc/default/grub for GRUB_CMDLINE_LINUX_DEFAULT.

And don’t forget to check it with the oneliner mentioned above:

cat /sys/module/ttm/parameters/p* | awk '{print $1 / (1024 * 1024 / 4)}'

 

Links and resources:

  • https://github.com/ROCm/ROCm/issues/5562#issuecomment-3452179504
  • https://strixhalo.wiki/
  • https://blog.linux-ng.de/2025/07/13/getting-information-about-amd-apus/
  • https://www.jeffgeerling.com/blog/2025/increasing-vram-allocation-on-amd-ai-apus-under-linux