The Raspberry Pi is a compelling low-power option for running GPU-accelerated LLMs locally.

For my main test setup, here’s the hardware I used (some links are affiliate links):

– Raspberry Pi 5 8GB ($80): https://www.raspberrypi.com/products/raspberry-pi-5/
– Raspberry Pi 27W Power Supply ($14): https://www.raspberrypi.com/products/power-supply/
– 1TB USB SSD ($64): https://amzn.to/3OjJysQ
– Pineboards HatDrive! Bottom ($20): https://amzn.to/3Zbz0T5
– JMT M.2 Key to PCIe eGPU Dock ($55): https://amzn.to/4eCpi0g
– OCuLink cable ($20): https://amzn.to/3YTXNJW
– Lian-Li SFX 750W PSU ($130): https://amzn.to/48T4a4R
– AMD RX 6700 XT ($400): https://amzn.to/3UXywgI

And here are the resources I mentioned for setting up your own GPU-accelerated Pi:

– Blog post with AMD GPU setup instructions: https://www.jeffgeerling.com/blog/2024/amd-radeon-pro-w7700-running-on-raspberry-pi
– Blog post with llama.cpp Vulkan instructions: https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5
– Llama Benchmarking issue: https://github.com/geerlingguy/ollama-benchmark/issues/1
– AMD not supporting ROCm on Arm: https://github.com/ROCm/ROCm/issues/3960
– Raspberry Pi PCIe Database: https://pipci.jeffgeerling.com
– Home Assistant Voice Control: https://www.home-assistant.io/voice_control/
– James Mackenzie’s video with RX 580: https://www.youtube.com/watch?v=J0z09Ddr58w

Support me on Patreon: https://www.patreon.com/geerlingguy
Sponsor me on GitHub: https://github.com/sponsors/geerlingguy
Merch: https://www.redshirtjeff.com
2nd Channel: https://www.youtube.com/@GeerlingEngineering
3rd Channel: https://www.youtube.com/@Level2Jeff

Contents:

00:00 – Why do this on a Pi?
01:33 – Should I even try?
02:06 – Hardware setup
04:34 – Comparisons with Llama
05:43 – How much is too much?
06:52 – Benchmark results
07:41 – Software setup
09:13 – More models, more testing
