Fixing crashes on AMD Strix Halo — Image & Video Toolbox

November 26th, 2025

kyuz0 introduced his easy-to-set-up toolboxes for Fedora; they also run on Ubuntu. This collection of toolboxes gives any Strix Halo owner an easy way to try different AI applications on their machine without having to mess around with software installations.

https://github.com/kyuz0

I can say they are a delight to use. Updating a toolbox is as simple as running a script. No more hassle with dependencies or compile issues.

What you do need is patience, a huge SSD, and a fast, unmetered internet connection. You’ll end up downloading 250 GB+ to try out different things, such as LLMs and video and image generation.

Anyway, the toolboxes keep getting better. Performance is improving, with speedups of up to 300% in some cases, and memory usage is going down, but stability remains a PITA.

ROCm is no fun because of instability and daily crashes. But there is good news on the horizon.

A fix is out, which stops ROCm from crashing. That is the good news. The bad news is that it’s only available in a release candidate for the new kernel.

In the end, it is a missing feature (or a bug) in the kernel's Strix Halo support that is fixed in the upcoming Linux kernel 6.18. Hopefully this patch will be backported to 6.17; otherwise we will have broken software support in Ubuntu 25.10 on AMD's flagship APU.

In the meantime, you can mitigate the problem in Ubuntu 25.10 / 25.04 with this workaround:

options amdgpu cwsr_enable=0

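Just add that line to `/etc/modprobe.d/strix-halo.conf` (or any other file in that directory, as long as its name ends in `.conf`) and reboot. If the setting does not seem to take effect, rebuilding the initramfs with `sudo update-initramfs -u` may be needed, since amdgpu is often loaded from there.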

https://community.frame.work/t/amd-rocm-does-not-support-the-amd-ryzen-ai-300-series-gpus/68767/51

At least with that I got no more crashes on Ubuntu 25.10.
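To double-check that the option really sits in a file modprobe will read, a small script can list every active `options amdgpu` line. This is only a sketch: the helper name `find_amdgpu_options` is my own, and it assumes the standard `/etc/modprobe.d` directory and that only `*.conf` files count.

```python
from pathlib import Path


def find_amdgpu_options(conf_dir="/etc/modprobe.d"):
    """Return (file name, line) pairs for every active 'options amdgpu' line."""
    hits = []
    directory = Path(conf_dir)
    if not directory.is_dir():
        return hits
    # modprobe only reads files whose name ends in .conf
    for conf in sorted(directory.glob("*.conf")):
        try:
            text = conf.read_text(errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        for line in text.splitlines():
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # commented-out lines are ignored
            if stripped.split()[:2] == ["options", "amdgpu"]:
                hits.append((conf.name, stripped))
    return hits


if __name__ == "__main__":
    for name, line in find_amdgpu_options():
        print(f"{name}: {line}")
```

If the script prints nothing, the line is either missing, commented out, or in a file without the `.conf` suffix.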

I am not sure what impact this has on performance. AFAIK, Strix Halo has 1.5 times the VGPR capacity of other RDNA 3.5 hardware.

What does cwsr do?

cwsr_enable (int)

CWSR(compute wave store and resume) allows the GPU to preempt shader execution in the middle of a compute wave. Default is 1 to enable this feature. Setting 0 disables it.

https://www.kernel.org/doc/html/v6.7/gpu/amdgpu/module-parameters.html#cwsr-enable-int
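To see which value the running driver actually uses, the parameter can be read back after a reboot. A minimal sketch, assuming the amdgpu module exposes it under `/sys/module/amdgpu/parameters/cwsr_enable` (the function name `cwsr_status` is just for illustration):

```python
from pathlib import Path

# Assumed sysfs location of the live parameter value
PARAM_PATH = "/sys/module/amdgpu/parameters/cwsr_enable"


def cwsr_status(param_path=PARAM_PATH):
    """Describe the live cwsr_enable value: 'enabled', 'disabled' or 'unknown'."""
    try:
        raw = Path(param_path).read_text().strip()
    except OSError:
        return "unknown"  # amdgpu not loaded, or parameter not exposed here
    if raw == "0":
        return "disabled"
    if raw == "1":
        return "enabled"
    return "unknown"


if __name__ == "__main__":
    print(f"CWSR is {cwsr_status()}")
```

With the workaround active, this should report `disabled`; `enabled` means the default of 1 is still in effect.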

I really don’t know what disabling CWSR means in practice, and I don’t know why it fixes the VGPR issue, but changing the default does work as a workaround.

VGPRs (vector general-purpose registers) have something to do with dynamic wave32 compute shaders. I suspect that’s where the issue lies, because in addition to crashes I also experienced hangs/deadlocks.

RDNA increased SIMD register file capacity to 128 KB, up from 64 KB on GCN. RDNA 3 introduced a 192 KB register file configuration for high end GPUs, where die area is likely less of a concern. But that strategy isn’t efficient for raytracing.

https://chipsandcheese.com/p/dynamic-register-allocation-on-amds

If you know more about why this workaround helps, please let me know in the comments.

Once the fix reaches your kernel (6.18, or 6.17 if the patch gets backported), don’t forget to comment out or delete the added line `options amdgpu cwsr_enable=0` in `/etc/modprobe.d/strix-halo.conf`.
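A quick way to tell whether the running kernel is already 6.18 or newer is to compare the release string. This is a rough sketch with a helper name of my own (`kernel_at_least`); note that it cannot detect a backport into a 6.17 kernel, because the version number would not change.

```python
import platform
import re


def kernel_at_least(release, major, minor):
    """True if a release string such as '6.17.0-7-generic' is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


if __name__ == "__main__":
    release = platform.release()
    try:
        fixed = kernel_at_least(release, 6, 18)
    except ValueError:
        print(f"{release}: could not parse the kernel version")
    else:
        if fixed:
            print(f"{release}: 6.18 or newer, the workaround can probably go")
        else:
            print(f"{release}: older than 6.18, keep the workaround for now")
```

Release candidates such as `6.18.0-rc7` count as 6.18 here, which matches the fix currently being available only in a release candidate.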

Links and resources:

  • https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-usage-large-thread-groups/
  • https://github.com/kyuz0/amd-strix-halo-image-video-toolboxes
