Running Immich with AI-Powered Image Search on Raspberry Pi 5 + AXera NPU
TL;DR: Got Immich running with CLIP-based semantic search on a Raspberry Pi 5 using the AXera AX8850 NPU. Chinese language search works surprisingly well thanks to the ViT-L-14-336-CN model. Setup took about 30 minutes once I figured out the ML server configuration.
What is Immich?
Immich is an open-source, self-hosted photo and video management platform. Think Google Photos, but you control the data. It supports automatic backup, intelligent search, and cross-device access.
Why This Setup?
I wanted to test AI-accelerated image search on edge hardware. The AXera AX8850 NPU on our M5Stack development board provides hardware acceleration for the CLIP models, making semantic search actually usable on a Pi.
Hardware Setup
- Raspberry Pi 5
- M5Stack AX8850 AI Module (provides NPU acceleration)
- Standard Pi power supply and storage
Step-by-Step Deployment
1. Download the Pre-built Package
Grab the optimized Immich build from Hugging Face:
```bash
git clone https://huggingface.co/AXERA-TECH/immich
```
Note: You'll need `git lfs` installed. If you don't have it, install it first.
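On Debian-based systems such as Raspberry Pi OS, that typically means:
```bash
sudo apt-get install -y git-lfs
git lfs install   # registers the LFS hooks for your user
```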
What you get:
```text
m5stack@raspberrypi:~/rsp/immich $ ls -lh
total 421M
drwxrwxr-x 2 m5stack m5stack 4.0K Oct 10 09:12 asset
-rw-rw-r-- 1 m5stack m5stack 421M Oct 10 09:20 ax-immich-server-aarch64.tar.gz
-rw-rw-r-- 1 m5stack m5stack    0 Oct 10 09:12 config.json
-rw-rw-r-- 1 m5stack m5stack 7.6K Oct 10 09:12 docker-deploy.zip
-rw-rw-r-- 1 m5stack m5stack 104K Oct 10 09:12 immich_ml-1.129.0-py3-none-any.whl
-rw-rw-r-- 1 m5stack m5stack 9.4K Oct 10 09:12 README.md
-rw-rw-r-- 1 m5stack m5stack  177 Oct 10 09:12 requirements.txt
```
2. Load the Docker Image
```bash
cd immich
docker load -i ax-immich-server-aarch64.tar.gz
```
If Docker isn't installed, you'll need to set that up first (see below).
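The quickest route on Raspberry Pi OS is Docker's official convenience script (review scripts before piping them into a shell):
```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER   # log out and back in for the group change to apply
```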
3. Configure the Environment
```bash
unzip docker-deploy.zip
cp example.env .env
```
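It's worth skimming `.env` before starting the stack. Upstream Immich's example env file carries keys such as `UPLOAD_LOCATION` and `DB_PASSWORD`; the names in this package's file may differ, so treat these as illustrative:
```bash
# Illustrative keys from upstream Immich's example.env -- verify against
# the file shipped in docker-deploy.zip:
#   UPLOAD_LOCATION=./library   # where uploaded originals are stored
#   DB_PASSWORD=postgres        # change it if the Pi is reachable by others
nano .env
```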
4. Start the Core Services
```bash
docker compose -f docker-compose.yml -f docker-compose.override.yml up -d
```
Success looks like this:
```text
[+] Running 3/3
 ✔ Container immich_postgres  Started  1.0s
 ✔ Container immich_redis     Started  0.9s
 ✔ Container immich_server    Started  0.9s
```
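If anything looks off, the standard Docker commands tell you more:
```bash
docker compose ps                      # all three containers should be "running"
docker compose logs -f immich_server   # follow the server logs
```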
5. Set Up the ML Service (The Interesting Part)
The ML service handles the AI-powered image search. It runs separately, outside Docker, so it can leverage the NPU.
Create and activate a virtual environment:
```bash
python -m venv mich
source mich/bin/activate
```
Install dependencies:
```bash
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
pip install -r requirements.txt
pip install immich_ml-1.129.0-py3-none-any.whl
```
Launch the ML server:
```bash
python -m immich_ml
```
You should see:
```text
[10/10/25 09:50:12] INFO     Listening at: http://[::]:3003 (8698)
[INFO] Available providers: ['AXCLRTExecutionProvider']
[10/10/25 09:50:16] INFO     Application startup complete.
```
The `AXCLRTExecutionProvider` entry confirms the NPU is being used.
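From another shell, you can sanity-check the service; recent immich_ml builds expose a simple health endpoint (treat the exact route as an assumption for this build):
```bash
# Should print "pong" if the ML server is reachable.
curl http://localhost:3003/ping
```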
Web Interface Configuration
Initial Setup
- Navigate to `http://<your-pi-ip>:2283`, Immich's default web port (e.g., `192.168.20.27:2283`); port 3003 is the ML service, not the web UI
- The first visit requires creating an admin account; credentials are stored locally
<img src="https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/linux/ax8850_card/images/immich1.png" width="95%" />
Configure the ML Server
This is critical - the web interface needs to know where your ML service is running.
- Go to Settings → Machine Learning
- Set the URL to your Pi's IP and port 3003: `http://192.168.20.27:3003`
- Choose your CLIP model based on language:
  - Chinese search: `ViT-L-14-336-CN__axera`
  - English search: `ViT-L-14-336__axera`
<img src="https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/linux/ax8850_card/images/immich4.png" width="95%" />
First-Time Index
Important: You need to manually trigger the initial indexing.
- Go to Administration → Jobs
- Find "SMART SEARCH"
- Click "Run Job" to process your uploaded images
<img src="https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/linux/ax8850_card/images/immich6.png" width="95%" />
Testing Image Search
Upload some photos, wait for indexing to complete, then try semantic searches:
<img src="https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/linux/ax8850_card/images/immich7.png" width="95%" />
The search is genuinely semantic: you can search for "sunset" or "dogs playing" and it will find relevant images even if those exact words appear nowhere in the filenames.
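The same search is scriptable. A sketch assuming the `POST /api/search/smart` route (present in recent Immich releases) and an API key:
```bash
# Returns JSON with the assets that best match the text query.
curl -X POST "http://192.168.20.27:2283/api/search/smart" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "sunset over water"}'
```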
Technical Notes
- The NPU acceleration makes CLIP inference fast enough for interactive search
- Chinese language support is genuinely good with the CN model
- The ML server runs independently, so you can restart it without affecting the main Immich service (see the systemd sketch below)
- Docker handles PostgreSQL and Redis automatically
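To keep the ML server alive across reboots, a minimal systemd unit works. This sketch assumes the paths from this walkthrough (`~/rsp/immich`, the `mich` venv, user `m5stack`); adjust them to your layout:
```bash
sudo tee /etc/systemd/system/immich-ml.service > /dev/null <<'EOF'
[Unit]
Description=Immich ML server (AXera NPU)
After=network.target

[Service]
User=m5stack
WorkingDirectory=/home/m5stack/rsp/immich
ExecStart=/home/m5stack/rsp/immich/mich/bin/python -m immich_ml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now immich-ml
```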
Why M5Stack in This Stack?
The AX8850 NPU module provides the hardware acceleration that makes this practical on a Pi. Without it, running CLIP inference would be too slow for interactive use. We're working on more edge AI applications that leverage this acceleration - this Immich setup is a good real-world test case.
Questions about the setup or the NPU integration? Happy to dig into specifics.