How to use AMD ROCm on Krackan Point / Ryzen AI 300 series

September 30th, 2025

While AMD’s ROCm platform promises powerful GPU computing on Linux, users often encounter frustrating and contradictory statements about support for their consumer hardware.

Is ROCm officially supported on any AMD APU? No, according to the official support matrix:

https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html

Only discrete GPU cards are mentioned here.

Yes, according to other AMD sources, it is supported in preview on the new Strix Halo and other (high-end) Ryzen APUs, e.g. Strix Point / Krackan Point.

https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/index.html

Here is llamacpp-rocm, which provides prebuilt llama.cpp binaries with ROCm acceleration:

https://github.com/lemonade-sdk/llamacpp-rocm

In the past, I’ve run ROCm on a 4800U to try out LLMs. While offloading work to a rather small GPU, the integrated Vega of the 4800U, doesn’t necessarily make the LLM run faster, it does run quieter and doesn’t stress the CPU as much, so besides speed there are other benefits to be gained.

So I wanted to try running llama.cpp with ROCm on an AMD Krackan Point laptop APU, a Ryzen AI 7 350 with an integrated Radeon 860M, which uses the same RDNA 3.5 architecture as Strix Halo.

Spoof ROCm support using HSA_OVERRIDE_GFX_VERSION

So the idea: just use the Strix Halo ROCm build and spoof the GPU ID.

Spoofing your GPU ID is dead simple: just set the HSA_OVERRIDE_GFX_VERSION environment variable.

The iGPU ID of a Strix Halo is GFX1151.

The iGPU ID of a Krackan Point is GFX1152.
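If you want to check what your own machine reports, rocminfo (shipped with ROCm) prints the gfx ID of every HSA agent; the grep is just a convenience:

rocminfo | grep gfx

On a Krackan Point this should list gfx1152.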

So with this workaround, the only thing you have to do is download a working Strix Halo build, and instead of running:

llama-cli -m model.gguf

You run:

HSA_OVERRIDE_GFX_VERSION="11.5.1" llama-cli -m model.gguf
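The override value maps onto the gfx ID as major.minor.stepping: 11.5.1 stands for gfx1151 (11.5.2 would be gfx1152, for which the Strix Halo build ships no kernels). If you run llama.cpp a lot, you can export the variable once per shell session instead of prefixing every command:

# 11.5.1 -> gfx1151, the architecture the Strix Halo build was compiled for
export HSA_OVERRIDE_GFX_VERSION="11.5.1"
llama-cli -m model.gguf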

That’s all. Now you have ROCm running on a Krackan Point laptop.

Running llama.cpp with ROCm (Ubuntu) on a Ryzen AI 7 350

The easiest way to run llama.cpp with ROCm support is to download a fresh build of llama.cpp with AMD ROCm acceleration made by AMD:

Download the latest release:

cd ~/Downloads
wget https://github.com/lemonade-sdk/llamacpp-rocm/releases/download/b1066/llama-b1066-ubuntu-rocm-gfx1151-x64.zip

Unzip the downloaded file

unzip llama-b1066-ubuntu-rocm-gfx1151-x64.zip -d llama-b1066-ubuntu-rocm-gfx1151-x64

Enter the dir

cd llama-b1066-ubuntu-rocm-gfx1151-x64

Mark llama-bench executable

chmod u+x llama-bench
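The archive contains the other llama.cpp tools as well; if you also want llama-cli or llama-server (assuming the build ships them, as upstream llama.cpp releases do), mark those executable too:

chmod u+x llama-cli llama-server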

Download a GGUF model

wget -P ~/Downloads https://huggingface.co/unsloth/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q5_K_M.gguf
./llama-bench -m ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf

It won’t run:

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1152 (0x1152), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
rocBLAS error: Cannot read ./rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1152
List of available TensileLibrary Files : 
"./rocblas/library/TensileLibrary_lazy_gfx1151.dat"
Aborted

Aha, the build only ships gfx1151 rocBLAS kernels and we forgot to override the GPU ID:

HSA_OVERRIDE_GFX_VERSION="11.5.1" llama-cli -m model.gguf

And now we’re running:

HSA_OVERRIDE_GFX_VERSION="11.5.1" ./llama-bench -m ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | ROCm | 99 | pp512 | 863.83 ± 86.23 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | ROCm | 99 | tg128 | 43.32 ± 1.89 |
build: 703f9e3 (1)

For Ubuntu, there is one more step to allow a toolbox container to use your GPU for ROCm: create a udev rule:

https://github.com/kyuz0/amd-strix-halo-toolboxes?tab=readme-ov-file#211-ubuntu-users

cat /etc/udev/rules.d/99-amd-kfd.rules:

SUBSYSTEM=="kfd", GROUP="render", MODE="0666", OPTIONS+="last_rule"
SUBSYSTEM=="drm", KERNEL=="card[0-9]*", GROUP="render", MODE="0666", OPTIONS+="last_rule"

Is it worth running ROCm on Strix Point / Krackan Point?

Not really. It isn’t faster. Vulkan is doing a better job at the moment.

llama.cpp ROCm vs Vulkan vs CPU benchmarks on a Ryzen AI 7 350

| model                    |       size |   params | backend    | ngl |  test |             t/s |
| ------------------------ | ---------: | -------: | ---------- | --: | ----: | --------------: |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | ROCm       |  99 | pp512 |  863.83 ± 86.23 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | ROCm       |  99 | tg128 |    43.32 ± 1.89 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | RPC,Vulkan |  99 | pp512 | 1599.95 ± 14.06 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | RPC,Vulkan |  99 | tg128 |    80.84 ± 2.81 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | RPC        |  99 | pp512 |   406.69 ± 0.21 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | RPC        |  99 | tg128 |   108.54 ± 1.82 |

Running the small Qwen3 model on the CPU (the plain RPC rows) is surprisingly the fastest at token generation, but prompt processing is much faster on the GPU via Vulkan.
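If you want a rough CPU baseline from the same ROCm binary, llama-bench can keep all layers off the GPU with -ngl 0 (the RPC rows above came from a separate build, so treat this only as an approximation):

HSA_OVERRIDE_GFX_VERSION="11.5.1" ./llama-bench -m ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf -ngl 0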

Sources:
https://github.com/kyuz0/amd-strix-halo-toolboxes
https://llm-tracker.info/_TOORG/Strix-Halo
https://github.com/lemonade-sdk/llamacpp-rocm/
