How to Run a GPU-Accelerated, Local AI Voice Assistant in Home Assistant with Sentinel Core

by Pepijn de V

I’m running out of openings, but I’m glad you’re all here!

Last week I told you about the GPU driver landscape, now I want to take a moment to explore one possible use case for our board: running a local voice assistant on your Sentinel Core using Home Assistant.

This was our initial pitch for the board, and it turns out to be just barely possible. Good for tinkering, less ideal to support as a product. Nevertheless, let us tinker!

The first order of business is getting GPU drivers into Home Assistant. They maintain their own operating system, which I forked to make a few changes:

Configure the Linux kernel to enable AMDGPU
Add the ARM patches to the patches folder
Install the firmware package to actually instantiate the card

Then, it’s just a matter of building and installing the operating system.

But now we’re kind of stuck. Home Assistant supports Ollama, but Ollama only supports ROCm, and ROCm doesn’t work on ARM. They also support OpenAI, but have repeatedly refused to allow setting the API base URL needed for local LLMs. The saving grace comes by way of Extended OpenAI Conversation which, while not very actively maintained, does support any OpenAI compatible server (in other words, pretty much all of them).

Except now we need to find a server that works on ARM with a backend that also works on ARM, and after a bit of research the answer lies in llama.cpp with its Vulkan backend. To run it on Home Assistant, I created this LLM addon repository that contains a Docker image of llama.cpp with Vulkan and tool calling enabled (tool calling was in itself a whole adventure to get right).

Okay so now we’re done right? We have drivers, an LLM server, and an HA integration, all ready to go. Well, yes, now you can CHAT with your LLM, but it’s called a VOICE assistant for a reason. For local voice functionality, Home Assistant uses Piper and Whisper, and while Piper runs adequately on CPU, Whisper not so much.

Home Assistant uses a thing called Faster Whisper which supports CPU and NVIDIA, as far as I know. However, it turns out that the fine folks of llama.cpp also maintain whisper.cpp, which supports the same Vulkan backend. The bad news is that the Wyoming server that communicates between Whisper and Home Assistant appears to be abandoned, so I did what any insane person would do and forked it to update Whisper.cpp and brush up the build system and other dependencies.

And NOW we have ourselves a fullly functioning voice assistant! But what a stack of forks it has been. I hope this shows both how cool this board is to tinker with, but also that it is still quite experimental. Hopefully a Sentinel Core community of excited users and developers can help push this to a higher level!

Questions?

Ask Crowd Supply about an order
Ask Sanctuary Systems a technical question

Learn More About This Project

Go to the main project page
See all project updates

Sentinel Core

A Mini-ITX Raspberry Pi CM5 I/O board with PCIe

How to Run a GPU-Accelerated, Local AI Voice Assistant in Home Assistant with Sentinel Core

Questions?

Learn More About This Project

Subscribe to the Crowd Supply newsletter, highlighting the latest creators and projects