Posts Tagged ‘AI’


Running FLUX.2-dev on a Strix Halo

Wednesday, December 3rd, 2025

FLUX.2-dev is the newest text-to-image generator introduced by the German AI company Black Forest Labs (BFL). FLUX.2-dev is the open-source variant and can be downloaded and used locally on your PC. It is a huge model with 32 billion parameters, about 64GB in size, so you need a beefy computer with a lot of memory.

https://huggingface.co/black-forest-labs/FLUX.2-dev

Luckily, the Strix Halo has up to 128GB of unified memory, so it’s ideal for these huge AI models.

The easiest way to try out image, video and audio generation is with ComfyUI.

There are several ways of installing ComfyUI, including using a toolbox, but here we follow the instructions in the ComfyUI GitHub repository.

Installing ComfyUI on a Strix Halo

Create a virtual environment to isolate your Python install:

python3 -m venv .comfyui
source .comfyui/bin/activate
git clone https://github.com/comfyanonymous/ComfyUI.git

cd ComfyUI

pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx1151/

pip install -r requirements.txt
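
Before starting ComfyUI it’s worth checking that the ROCm nightly build of PyTorch actually sees the integrated GPU. A quick sanity check like this (run inside the activated venv; PyTorch’s ROCm build reports the GPU through the regular torch.cuda API) can save some head-scratching:

python -c "import torch; print(torch.__version__); print('GPU available:', torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU found')"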

Running ComfyUI

In short:

TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention

Workflow for FLUX.2-dev on ComfyUI

Here is a collection of examples showing how to run specific models:

https://comfyanonymous.github.io/ComfyUI_examples/

Of course, you can sometimes save some VRAM by using a quantized model:

https://huggingface.co/city96/FLUX.2-dev-gguf

https://huggingface.co/city96/FLUX.2-dev-gguf/blob/main/flux2-dev-Q4_K_M.gguf
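
If you want to go the quantized route, something like this should fetch the Q4_K_M file; the target directory is an assumption on my part (the GGUF loader custom node typically looks in models/unet, so check its documentation for your setup):

cd ~/ComfyUI
wget -P models/unet "https://huggingface.co/city96/FLUX.2-dev-gguf/resolve/main/flux2-dev-Q4_K_M.gguf"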

But for the best quality you want to stick to the bf16 or fp8 models. We just downloaded all the files from the official example:

https://comfyanonymous.github.io/ComfyUI_examples/flux2/

Running FLUX.2-dev with ComfyUI on Strix Halo

The best way to start ComfyUI on a Strix Halo is by adding some extra parameters for better performance:

TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py  --use-pytorch-cross-attention --disable-mmap

Don’t forget to add --disable-mmap, otherwise ComfyUI will hang.
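
To avoid typing the environment variable and the flags every time, a small launch script helps. This is just a sketch; adjust the paths to wherever you created the venv and cloned ComfyUI:

#!/bin/bash
# run-comfyui.sh - launcher sketch, the paths below are assumptions
source ~/.comfyui/bin/activate
cd ~/ComfyUI || exit 1
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
python main.py --use-pytorch-cross-attention --disable-mmap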

It takes about 6 to 7 minutes to generate a 20-step 1200×600 picture. That’s not fast, but the image quality is excellent.

 


Fixing crashes on AMD Strix Halo — Image & Video Toolbox

Wednesday, November 26th, 2025

kyuz0 introduced his simple-to-set-up toolboxes for Fedora; they also run on Ubuntu. This collection of toolboxes gives any Strix Halo owner an easy way to try different AI applications on their machine, without having to mess around with software installations.

https://github.com/kyuz0

I can say they are a delight to use. Updating a toolbox is as simple as running a script. No more hassle with dependencies or compilation issues.

What you do need is patience, a huge SSD, and a fast unmetered internet connection. You’ll end up downloading 250GB+ to try out different things, such as LLMs, video and image generation.

Anyway, the toolboxes are getting better and better. Performance is improving, with speedups of up to 300% in some cases and reduced memory usage, but stability remains a PITA.

ROCm is no fun because of instability and daily crashes. But there is good news on the horizon.

A fix is out, which stops ROCm from crashing. That is the good news. The bad news is that it’s only available in a release candidate for the new kernel.

In the end it is a missing feature (or a bug) in the kernel for the Strix Halo that is fixed in the upcoming Linux kernel 6.18. Hopefully this patch will be backported to 6.17; otherwise we will have broken software support in Ubuntu 25.10 on AMD’s flagship APU.

In the meantime, you can mitigate the problem in Ubuntu 25.10 / 25.04 with this workaround:

options amdgpu cwsr_enable=0

Just add that to `/etc/modprobe.d/strix-halo.conf` (or any other file in `/etc/modprobe.d/`).
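
For example (treat this as a sketch and adapt it to your setup; on Ubuntu the amdgpu module is loaded from the initramfs, so you most likely also need to regenerate it for the option to take effect at boot):

echo "options amdgpu cwsr_enable=0" | sudo tee /etc/modprobe.d/strix-halo.conf
sudo update-initramfs -u
sudo reboot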

https://community.frame.work/t/amd-rocm-does-not-support-the-amd-ryzen-ai-300-series-gpus/68767/51

At least with that I got no more crashes on Ubuntu 25.10.

Not sure what impact this has on performance. AFAIK, the Strix Halo has 1.5 times the VGPR capacity of other RDNA 3.5 hardware.

What does CWSR do?

cwsr_enable (int) (https://www.kernel.org/doc/html/v6.7/gpu/amdgpu/module-parameters.html)

CWSR (compute wave store and resume) allows the GPU to preempt shader execution in the middle of a compute wave. Default is 1 to enable this feature. Setting 0 disables it.

https://www.kernel.org/doc/html/v6.7/gpu/amdgpu/module-parameters.html#cwsr-enable-int
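
To see what your running kernel is currently using, the parameter should be readable from sysfs (assuming it is exposed there, like most amdgpu module parameters):

cat /sys/module/amdgpu/parameters/cwsr_enable
# 1 = CWSR enabled (the default), 0 = disabled (the workaround)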

I really don’t know what disabling it means in practice (changing a kernel default is of course only a workaround), and I don’t know why it fixes the VGPR issue, but it does work.

VGPRs have something to do with dynamic wave32 compute shaders; I suspect that’s the issue, because in addition to crashes I also experienced hangs/deadlocks.

RDNA increased SIMD register file capacity to 128 KB, up from 64 KB on GCN. RDNA 3 introduced a 192 KB register file configuration for high end GPUs, where die area is likely less of a concern. But that strategy isn’t efficient for raytracing.

https://chipsandcheese.com/p/dynamic-register-allocation-on-amds

If you know more about the performance impact, please let me know in the comments.

Hopefully this patch will be backported to 6.17. Don’t forget to comment out or delete the added line `options amdgpu cwsr_enable=0` in `/etc/modprobe.d/strix-halo.conf` once the fixed kernel is released.
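
Reverting is just removing the file (or the line) and regenerating the initramfs again, for example:

sudo rm /etc/modprobe.d/strix-halo.conf
sudo update-initramfs -u
sudo reboot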

Links and resources:

  • https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-usage-large-thread-groups/
  • https://github.com/kyuz0/amd-strix-halo-image-video-toolboxes

Setting up unified memory for Strix Halo correctly on Ubuntu 25.04 or 25.10

Wednesday, November 12th, 2025

You have followed the online instructions, for example those for this great toolbox, but when running the Strix Halo Toolbox you are still encountering memory errors on your 128GB Strix Halo system, and qwen-image-studio, for example, fails to run:

File "/opt/venv/lib64/python3.13/site-packages/torch/utils/_device.py", line 104, in torch_function
return func(*args, **kwargs)
torch.OutOfMemoryError: HIP out of memory. Tried to allocate 3.31 GiB. GPU 0 has a total capacity of 128.00 GiB of which 780.24 MiB is free. Of the allocated memory 57.70 GiB is allocated by PyTorch, and 75.82 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_HIP_ALLOC_CONF=expandable_segments:True to avoid fragmentation. 

(https://github.com/kyuz0/amd-strix-halo-image-video-toolboxes/issues)

You’ve verified your configuration using sudo dmesg | grep "amdgpu:.*memory", and the output indicates that the GTT size is correct.

It is likely that you configured the GTT size using the now outdated and deprecated parameter amdgpu.gttsize, which may explain why the setting is not taking effect. Alternatively, you may have used the incorrect prefix `amdttm.` instead of the correct `ttm.`.

Please verify your configuration to ensure the proper syntax is used:

How to check the unified memory setting on AMD Strix Halo/Krackan Point:

cat /sys/module/ttm/parameters/p* | awk '{print $1 / (1024 * 1024 / 4)}'

The last two lines must be the same, and the number you see is the amount of unified memory in GB.

How to set up unified memory correctly

In the BIOS, set the GMA (Graphics Memory Allocation) to the minimum value: 512MB. Then, add a kernel boot parameter to enable unified memory support.

Avoid outdated methods; they no longer work. Also, note that the approach differs slightly depending on your hardware: AMD Ryzen processors require a different configuration (`ttm`) compared to Instinct-class (professional workstation) GPUs (`amdttm`). See the check below if you are unsure which one applies.
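
If in doubt, you can look at which module actually exposes the parameters on your machine; on a Strix Halo with the in-kernel driver it should be ttm, while amdttm only shows up with the out-of-tree DKMS stack used on Instinct machines (this check is my own assumption, not from the official docs):

ls /sys/module/ttm/parameters/ 2>/dev/null
ls /sys/module/amdttm/parameters/ 2>/dev/null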

To max out unified memory:

Edit /etc/default/grub and change the GRUB_CMDLINE_LINUX_DEFAULT line to one of the following:

128GB HALO

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=33554432 ttm.page_pool_size=33554432"

96GB HALO

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=25165824 ttm.page_pool_size=25165824"

64GB HALO

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=16777216 ttm.page_pool_size=16777216"

32GB HALO

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=8388608 ttm.page_pool_size=8388608"

The math for 32GB is: 32 × 1024 × 1024 × 1024 / 4096 = 32 × 1024 × 256 = 8388608 pages.

The default page size is 4096 bytes.
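
In general the conversion is just the desired size in bytes divided by the 4096-byte page size; a quick shell sketch for an arbitrary size:

GB=32   # desired unified memory in GB
echo $(( GB * 1024 * 1024 * 1024 / 4096 ))   # -> 8388608 pages for 32GB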

After you’ve edited /etc/default/grub:

sudo update-grub2
reboot

You should probably leave some memory (~4GB) for your system to run smoothly, so adapt the lines above accordingly. Maybe 2GB is enough if you don’t run a GUI.

Check your config

To verify that everything worked, reboot and run:

cat /sys/module/ttm/parameters/p* | awk '{print $1 / (1024 * 1024 / 4)}'
96
96

Here the full 96GB is configured as unified memory on a 96GB Strix Halo.

If the last two numbers are not the same, try debugging the problem.

Debugging unified memory problems

Check dmesg for AMD VRAM and GTT size:

sudo dmesg | grep "amdgpu:.*memory"

[ 10.290438] amdgpu 0000:64:00.0: amdgpu: amdgpu: 512M of VRAM memory ready
[ 10.290440] amdgpu 0000:64:00.0: amdgpu: amdgpu: 131072M of GTT memory ready.

This seems correct, but if you set the GTT size with the old amdgpu.gttsize method, dmesg will report the right amount of GTT memory, yet ROCm can’t actually use it unless the TTM limits are also set correctly. You’ll notice another warning in the dmesg output from early boot.

sudo dmesg | grep "amdgpu"

[ 17.652893] amdgpu 0000:c5:00.0: amdgpu: [drm] Configuring gttsize via module parameter is deprecated, please use ttm.pages_limit
[ 17.652895] amdgpu 0000:c5:00.0: amdgpu: [drm] GTT size has been set as 103079215104 but TTM size has been set as 48956567552, this is unusual

Furthermore, you will see a lot of sources mentioning amdttm.pages_limit or amdttm.page_pool_size. This won’t work on your Strix Halo; those settings are for AMD Instinct machines.

Confusing, yes, but just be careful to use the right settings in /etc/default/grub for GRUB_CMDLINE_LINUX_DEFAULT.
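
A quick way to double-check after rebooting that the parameters actually made it onto the kernel command line (example output for the 96GB configuration):

cat /proc/cmdline | tr ' ' '\n' | grep ttm
ttm.pages_limit=25165824
ttm.page_pool_size=25165824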

And don’t forget to check it with the one-liner mentioned above:

cat /sys/module/ttm/parameters/p* | awk '{print $1 / (1024 * 1024 / 4)}'

 

Links and resources:

  • https://github.com/ROCm/ROCm/issues/5562#issuecomment-3452179504
  • https://strixhalo.wiki/
  • https://blog.linux-ng.de/2025/07/13/getting-information-about-amd-apus/
  • https://www.jeffgeerling.com/blog/2025/increasing-vram-allocation-on-amd-ai-apus-under-linux

 


Running the Dutch LLM fietje-2-chat with reasonable speed on your Android Phone

Thursday, October 10th, 2024

To run Fietje-2-Chat locally on your Android phone at a reasonable speed, you’ll need to create a special quantized version of fietje-2b-chat.

One of the best apps for running LLMs on your Android phone is ChatterUI (https://github.com/Vali-98/ChatterUI).

You can download the APK from GitHub, transfer it to your phone and install it. It’s not yet available on F-Droid.

As most Android phones have an ARM CPU, use special `quants` that run faster on ARM because they use NEON extensions, int8mm and SVE instructions.

Note that these optimized kernels require the model to be quantized into one of the formats: Q4_0_4_4 (Arm Neon), Q4_0_4_8 (int8mm) or Q4_0_8_8 (SVE). The SVE mulmat kernel specifically requires a vector width of 256 bits. When running on devices with a different vector width, it is recommended to use the Q4_0_4_8 (int8mm) or Q4_0_4_4 (Arm Neon) formats for better performance.

https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md#arm-cpu-optimized-mulmat-kernels

How to create a special ARM optimized version of Fietje-2-Chat

Download the f16 gguf version of Fietje-2-Chat:

wget -O fietje-2b-chat-f16.gguf "https://huggingface.co/BramVanroy/fietje-2-chat-gguf/resolve/main/fietje-2b-chat-f16.gguf?download=true"

Install a Docker version of llama.cpp to do the conversion:

mkdir -p ~/llama/models
sudo docker run -v /home/user/llama/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B

To convert the f32 or f16 gguf to the Q4_0_4_4 (Arm Neon) format:

docker run --rm -v /home/user/llama/models:/models ghcr.io/ggerganov/llama.cpp:full --quantize "/models/fietje-2b-chat-f16.gguf" "/models/fietje-2b-chat-Q4_0_4_4.gguf" "Q4_0_4_4"

Transfer the fietje-2b-chat-Q4_0_4_4.gguf to your Android Device.
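
If you have USB debugging enabled, adb can push the file instead of copying it manually; the target folder here is just a suggestion, any location reachable from ChatterUI’s file picker works:

adb push fietje-2b-chat-Q4_0_4_4.gguf /sdcard/Download/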

Open ChatterUI

Import Model:

Go to menu -> API -> Model -> import -> fietje-2b-chat-Q4_0_4_4.gguf

Load Model:

menu -> API -> Model -> select fietje-2b-chat-Q4_0_4_4.gguf -> Load

Then leave the settings as they are and start typing in the prompt. The first cold run will be a little slow, but once it’s running you’ll get about 10 tokens/s on a Snapdragon 865 phone.

That’s not bad.

If you’re interested in an LLM that generates much better Dutch on your phone than Llama 3.2 or Phi-3, give Fietje a try.


Creating graphics with LLMs: generate SVG!

Wednesday, September 25th, 2024

LLMs can generate text, and Scalable Vector Graphics (SVG) is an XML-based vector image format for defining two-dimensional graphics. XML is human-readable, so it can be generated by LLMs.

How well do they perform?

https://duckduckgo.com/?q=DuckDuckGo+AI+Chat&ia=chat&duckai=1

Generate a dog in svg

Looks like a cute mix between a rubber duck and a dog to me.

Generate a chair in svg

More a stool than a chair.

"A mix between a rubber duck and a dog."

LLMs have humor, in a way.


Open source coding with Ollama as Copilot

Tuesday, September 24th, 2024

Among the many tools available to developers today, an AI coding assistant is a nice must-have.

Zed is a modern open-source code editor written in Rust, designed for collaborative editing and multiplayer teamwork. It also works fine as a stand-alone editor with Git support.

Ollama offers easy and privacy-friendly local LLM support: get up and running with large language models on your own machine.

Zed does offer AI integration with ChatGPT or Claude, but it can also connect to your local Ollama install.

To try this out, just add this to the settings file in Zed (Ctrl+,):

 

"assistant": {
"version": "1",
"provider": {
"default_model": {
"name": "qwen2.5-coder",
"display_name": "codeqwen",
"max_tokens": 2048,
"keep_alive": -1
},
"name": "ollama",
// Recommended setting to allow for model startup
"low_speed_timeout_in_seconds": 30
}
}
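
The config above assumes that Ollama is running locally and that the model has already been pulled; if not, something like this gets it in place first:

ollama pull qwen2.5-coder
ollama list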

Open a new assistant tab; you can add your tab content to the assistant’s context and let the AI assistant annotate your script:

/tab
Annotate the JS file

And can LLMs create graphics? Of course they can.

How to create images with a Large Language Model (LLM)

SVG!

SVG is an XML-based format that can be understood by humans and machines, so if a coding assistant can write code, it can also write SVG. 😉

Does it create sensible graphics? Not really.

I asked qwen2.5-coder in Zed:

Create a SVG file that shows a chair.

Does this look like a chair?

 

Another attempt:

Does it really look like a chair? What does it look like to you? Let me know in the comments!