While AMD’s ROCm platform promises powerful GPU computing on Linux, users often run into frustrating and contradictory statements about whether their consumer hardware is supported.
Is ROCm officially supported on any AMD APU? No, according to the official support matrix:
https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
Only discrete GPU cards are mentioned here.
Yes, according to other AMD sources, it is supported in preview on the new Strix Halo and other (high-end) Ryzen APUs, e.g. Strix Point / Krackan Point.
https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/index.html
Here is llamacpp-rocm:
https://github.com/lemonade-sdk/llamacpp-rocm
In the past, I’ve run ROCm on a 4800U to try out LLMs. Offloading work to a rather small GPU, the integrated Vega of the 4800U, doesn’t necessarily make an LLM run faster, but it does run quieter and doesn’t stress the CPU as much, so besides speed there are other benefits to be gained.
So I wanted to try running llama.cpp with ROCm on an AMD Krackan Point laptop APU, a Ryzen AI 7 350 with an integrated Radeon 860M, which uses the same RDNA 3.5 architecture as Strix Halo.
Spoof ROCm support using HSA_OVERRIDE_GFX_VERSION
So the idea is simple: take a ROCm build that targets Strix Halo and spoof the GPU ID.
Spoofing your GPU ID is dead simple: just set the environment variable HSA_OVERRIDE_GFX_VERSION.
The iGPU ID of a Strix Halo is gfx1151.
The iGPU ID of a Krackan Point is gfx1152.
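If you want to double-check which gfx ID your own APU reports (assuming a ROCm runtime is already installed), rocminfo will print it:
rocminfo | grep gfx
Look for the Name: gfx11xx line under the GPU agent.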
So with this workaround the only thing you have to do is download a working example for Strix Halo, and instead of running:
llama-cli -m model.gguf
You run:
HSA_OVERRIDE_GFX_VERSION="11.5.1" llama-cli -m model.gguf
That’s all. Now you have ROCm running on a Krackan Point laptop.
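If you don’t want to prefix every command, you can also export the override once per shell session (or add the export to your ~/.bashrc); this is just a convenience variant of the same workaround:
export HSA_OVERRIDE_GFX_VERSION="11.5.1"
llama-cli -m model.gguf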
Running llama.cpp with ROCm (Ubuntu) on a Ryzen AI 7 350
The easiest way to run llama.cpp with ROCm support is to download one of the fresh llama.cpp builds with AMD ROCm acceleration made by AMD:
Download the latest release:
cd Downloads
wget https://github.com/lemonade-sdk/llamacpp-rocm/releases/download/b1066/llama-b1066-ubuntu-rocm-gfx1151-x64.zip
Unzip the downloaded file:
unzip llama-b1066-ubuntu-rocm-gfx1151-x64.zip -d llama-b1066-ubuntu-rocm-gfx1151-x64
Enter the dir:
cd llama-b1066-ubuntu-rocm-gfx1151-x64
Mark llama-bench executable:
chmod u+x llama-bench
Download a GGUF model:
wget -O ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf https://huggingface.co/unsloth/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q5_K_M.gguf?download=true
Then run the benchmark:
./llama-bench -m ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf
It won’t run:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1152 (0x1152), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
rocBLAS error: Cannot read ./rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1152
List of available TensileLibrary Files : 
"./rocblas/library/TensileLibrary_lazy_gfx1151.dat"
Aborted
Aha, we forgot to override the GPU ID:
HSA_OVERRIDE_GFX_VERSION="11.5.1" llama-cli -m model.gguf
And now we’re running:
HSA_OVERRIDE_GFX_VERSION="11.5.1" ./llama-bench -m ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | ROCm | 99 | pp512 | 863.83 ± 86.23 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | ROCm | 99 | tg128 | 43.32 ± 1.89 |
build: 703f9e3 (1)
On Ubuntu, you have to do one more step to allow a toolbox container to use your GPU for ROCm. For that you have to create a udev rule:
https://github.com/kyuz0/amd-strix-halo-toolboxes?tab=readme-ov-file#211-ubuntu-users
Contents of /etc/udev/rules.d/99-amd-kfd.rules:
SUBSYSTEM=="kfd", GROUP="render", MODE="0666", OPTIONS+="last_rule"
SUBSYSTEM=="drm", KERNEL=="card[0-9]*", GROUP="render", MODE="0666", OPTIONS+="last_rule"
Is it worth running ROCm on Strix Point / Krackan Point?
Not really. It isn’t faster. Vulkan is doing a better job at the moment.
Llama.cpp ROCm vs Vulkan vs CPU Benchmarks on a Ryzen AI 7 350
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | ROCm | 99 | pp512 | 863.83 ± 86.23 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | ROCm | 99 | tg128 | 43.32 ± 1.89 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | RPC,Vulkan | 99 | pp512 | 1599.95 ± 14.06 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | RPC,Vulkan | 99 | tg128 | 80.84 ± 2.81 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | RPC | 99 | pp512 | 406.69 ± 0.21 |
| qwen3 0.6B Q5_K - Medium | 418.15 MiB | 596.05 M | RPC | 99 | tg128 | 108.54 ± 1.82 |
Running the small Qwen3 model on the CPU (the RPC rows above) is surprisingly the fastest at token generation, but prompt processing is much faster on Vulkan/GPU.
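For reference, the three backends above correspond roughly to three llama-bench invocations. The ROCm row is the run shown earlier; the Vulkan and CPU rows assume a separate llama.cpp build compiled with the Vulkan backend, which is my assumption and isn’t part of the llamacpp-rocm download.
ROCm (spoofed GPU ID):
HSA_OVERRIDE_GFX_VERSION="11.5.1" ./llama-bench -m ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf
Vulkan (a llama.cpp build with the Vulkan backend):
./llama-bench -m ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf
CPU only (-ngl 0 keeps all layers off the GPU):
./llama-bench -m ~/Downloads/Qwen3-0.6B-Q5_K_M.gguf -ngl 0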
Sources:
https://github.com/kyuz0/amd-strix-halo-toolboxes
https://llm-tracker.info/_TOORG/Strix-Halo
https://github.com/lemonade-sdk/llamacpp-rocm/