    Running Home Assistant Control with a 0.5B Model? I Got the Full Pipeline Working on M5Stack LLM8850

      yuyun2000

      🎯 TL;DR

      I got a complete voice → control pipeline running in under 2 seconds with qwen2.5-0.5B, covering voice wake-up, ASR, pause detection, and device control. That speed is genuinely usable for day-to-day smart home interaction.


      Hey Makers! 👋

      I've been working on a pretty interesting challenge: getting a tiny 0.5B parameter LLM to reliably control Home Assistant devices. Spoiler alert—it actually works, and it's fast. Let me walk you through the journey, the failures, and what finally clicked.


      01 | The Core Problem: Speed vs. Intelligence

      Product: M5Stack LLM8850
      Category: Edge AI / Smart Home Integration

      When building voice control for Home Assistant, you hit a classic tradeoff:

      • Large models (7B+): Great at understanding context, terrible at response time.
      • Small models (0.5B-1.5B): Lightning fast, but they often hallucinate or ignore instructions.

      My goal was simple: Make a 0.5B model output structured JSON commands reliably, just like ChatGPT would—without the random gibberish or making up device names.
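
      To make "structured JSON commands" concrete: the target output is a small JSON object that the control service can map directly onto a Home Assistant service call. The exact schema is up to you; the field names below are just illustrative:

      ```json
      {
        "service": "light.turn_on",
        "target_device": "light.living_room_lamp"
      }
      ```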

      The approach? Inject real device info into the LLM's context and force it to output standardized control commands. Sounds easy, right? Well...


      02 | First Attempt: Prompt Engineering (Epic Fail)

      Category: Methodology

      I figured I'd try the "prompt engineering" route first—after all, it works wonders with GPT-4. I crafted a detailed system prompt with JSON examples:

      [screenshot: the system prompt with JSON examples]
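
      It looked roughly like this (paraphrased, not the exact prompt):

      ```text
      You are a smart home assistant. You can only control the devices listed below.
      Devices: light.living_room_lamp, switch.kitchen_plug, cover.bedroom_curtain
      Respond ONLY with a JSON command, for example:
      {"service": "light.turn_on", "target_device": "light.living_room_lamp"}
      ```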

      Result? The model just copy-pasted my examples verbatim. It completely ignored the actual user command.

      Lesson learned: Small models don't have the reasoning capacity to "follow instructions" the way larger models do. You can't trick them with clever prompts alone.


      03 | Second Attempt: Dataset Fine-Tuning (This Worked)

      Category: Model Training

      While digging through GitHub, I found home-llm—a project that already proved this concept with a 3B model. The author's core insight was brilliant: Build a custom training dataset that teaches the model the exact input→output pattern.

      [screenshot: the home-llm project]

      Their approach:

      • System Prompt: Randomized lists of devices + available services (simulating real homes).
      • User Prompt: Commands drawn from a pre-built library.
      • Model Output: Properly formatted JSON commands matching the request.
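
      A single training example in that style looks roughly like this (simplified and paraphrased; see the home-llm repo for the real dataset builder and field names):

      ```json
      {
        "system": "Devices:\nlight.kitchen_light 'Kitchen Light' = off\nswitch.garage_door 'Garage Door' = off\nServices: light.turn_on, light.turn_off, switch.turn_on, switch.turn_off",
        "user": "turn on the kitchen light",
        "assistant": "{\"service\": \"light.turn_on\", \"target_device\": \"light.kitchen_light\"}"
      }
      ```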

      But there were two critical gaps:

      ❌ Gap #1: No "Device Not Found" Handling
      If you said "turn on the washing machine in the bedroom" but that device didn't exist, the model would just pick a random device. No error handling.

      ❌ Gap #2: The Model Didn't Know Its Own Capabilities
      Ask it "what devices can you control?" and it couldn't answer. This caused frequent mismatches during actual use.


      04 | My Dataset Improvements

      Category: Data Engineering

      I extended the training data to fix these issues:

      ✅ Fix #1: Added "Device Not Found" Samples

      • Included training examples where the user asks for non-existent devices.
      • Taught the model to respond with: "devices not found".
      • Result: Near-perfect rejection of garbage inputs.
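
      An added sample of that kind looks something like this (illustrative, same simplified format as above):

      ```json
      {
        "system": "Devices:\nlight.kitchen_light 'Kitchen Light' = off\nServices: light.turn_on, light.turn_off",
        "user": "turn on the washing machine in the bedroom",
        "assistant": "devices not found"
      }
      ```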

      ✅ Fix #2: Added Device Query Capability

      • Added samples like "what can you control?" or "list all devices."
      • Model learned to extract and report the device list from the system prompt.
      • Result: Noticeably better device name matching (though room assignments still occasionally glitch).
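
      And a device-query sample looks something like this (again illustrative):

      ```json
      {
        "system": "Devices:\nlight.kitchen_light 'Kitchen Light' = off\nswitch.garage_door 'Garage Door' = off\nServices: light.turn_on, light.turn_off, switch.turn_on, switch.turn_off",
        "user": "what devices can you control?",
        "assistant": "I can control: Kitchen Light (light), Garage Door (switch)."
      }
      ```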

      Resources:

      • Full dataset + fine-tuned model: qwen2.5-0.5b-ha @ HuggingFace

      05 | The Fine-Tuning Process (Using LLaMA-Factory)

      Category: Training Workflow

      I used LLaMA-Factory for the actual fine-tuning. Here's the quick rundown:

      Step 1: Convert Dataset Format
      Transform the JSON into LLaMA-Factory's expected structure:

      [screenshot: converted dataset sample]
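
      If you go with the sharegpt-style format, a converted record looks roughly like this (structure per LLaMA-Factory's data docs; the content itself is a made-up example):

      ```json
      [
        {
          "conversations": [
            {"from": "human", "value": "turn on the kitchen light"},
            {"from": "gpt", "value": "{\"service\": \"light.turn_on\", \"target_device\": \"light.kitchen_light\"}"}
          ],
          "system": "Devices:\nlight.kitchen_light 'Kitchen Light' = off\nServices: light.turn_on, light.turn_off"
        }
      ]
      ```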

      Step 2: Register the Dataset
      Edit dataset_info.json to add your dataset config:

      [screenshot: dataset_info.json entry]
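
      The entry is only a few lines; for a sharegpt-style file named ha_control.json (hypothetical name) it looks roughly like this, with key names per LLaMA-Factory's docs (double-check against the version you're running):

      ```json
      {
        "ha_control": {
          "file_name": "ha_control.json",
          "formatting": "sharegpt",
          "columns": {
            "messages": "conversations",
            "system": "system"
          }
        }
      }
      ```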

      Step 3: Fix Identity Presets
      Don't forget to change the default identity dataset—otherwise your model will introduce itself as ChatGPT. 😂

      Step 4: Run the Training
      Reference notebook: Colab Fine-Tuning Script

      Key params:

      • Base model: qwen2.5-0.5b (best speed/quality balance)
      • Hardware: RTX 3090 or better recommended (T4 free tier is painfully slow)
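
      If you'd rather train locally than on Colab, a LoRA SFT run with the LLaMA-Factory CLI looks roughly like this (file name and hyperparameters are illustrative, not the exact values from my run):

      ```yaml
      # sft_qwen05b_ha.yaml (hypothetical config name)
      model_name_or_path: Qwen/Qwen2.5-0.5B-Instruct
      stage: sft
      do_train: true
      finetuning_type: lora
      dataset: ha_control,identity
      template: qwen
      output_dir: saves/qwen2.5-0.5b-ha
      per_device_train_batch_size: 4
      gradient_accumulation_steps: 4
      learning_rate: 1.0e-4
      num_train_epochs: 3.0
      ```

      Then start training with `llamafactory-cli train sft_qwen05b_ha.yaml`.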

      06 | Deployment Architecture

      Product: M5Stack LLM8850
      Category: System Design

      To make this actually usable, I built a control service: HomeAssistant-Edge

      Pipeline:

      Voice Input → VAD (Voice Activity Detection) → KWS (Keyword Spotting)  
      → ASR (Speech Recognition) → LLM (Structured Output) → HA API Call
      

      Why M5Stack LLM8850?
      This hardware was critical to making the whole thing work:

      • ✅ Edge inference: 0.5B model runs in ~1.5 seconds.
      • ✅ Complete toolchain: Supports VAD/KWS/ASR/LLM model deployment.
      • ✅ Local network: No cloud dependency = better privacy + speed.
      • ✅ Ecosystem: M5Stack's modular design makes client expansion trivial.

      Setup:

      1. Input your Home Assistant API key.
      2. Load pre-trained ASR + LLM models.
      3. Start local network service and wait for M5 clients to connect.
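
      Under the hood, the last hop is just Home Assistant's REST API: POST the parsed command to /api/services/<domain>/<service> with a long-lived access token. Here's a minimal sketch of that step (not the actual HomeAssistant-Edge code; the URL and token are placeholders):

      ```python
      import json
      import requests

      HA_URL = "http://homeassistant.local:8123"   # your HA instance
      HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"    # created on your HA profile page

      def execute_command(llm_output: str) -> None:
          """Parse the LLM's JSON command and forward it to Home Assistant."""
          try:
              cmd = json.loads(llm_output)
          except json.JSONDecodeError:
              # Non-JSON replies such as "devices not found" or a device list answer
              print("Not a control command:", llm_output)
              return
          domain, service = cmd["service"].split(".", 1)   # "light.turn_on" -> ("light", "turn_on")
          resp = requests.post(
              f"{HA_URL}/api/services/{domain}/{service}",
              headers={"Authorization": f"Bearer {HA_TOKEN}"},
              json={"entity_id": cmd["target_device"]},
              timeout=5,
          )
          resp.raise_for_status()

      # execute_command('{"service": "light.turn_on", "target_device": "light.kitchen_light"}')
      ```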

      07 | Real-World Performance & Limitations

      Category: Testing Results

      🎯 What Works

      • Speed: Full pipeline (ASR + transmission + pause detection) in under 2 seconds.
      • Accuracy: Reliable control for common devices.
      • Robustness: Properly rejects invalid commands and missing devices.

      ⚠️ Current Limitations

      1. Lab-only validation: Limited device variety; needs real-world testing.
      2. Client incomplete: Only the 8850 does inference; M5 microphone modules aren't integrated yet.
      3. No dynamic context: System prompts are cached in memory for speed, so the model can't query real-time device states, the current time, etc.

      🔧 Next Steps

      • ASR error tolerance: Add training data for typos, homophones.
      • Dynamic info injection: Refactor prompt system to support live state queries.
      • Advanced features: Build datasets for timers, scene automation, etc.

      💬 Discussion

      Have you tried running LLMs on edge devices for smart home control? What gotchas did you run into? Drop your experiences below—I'd love to hear how others are tackling this problem!


      📌 Resources

      • 🤗 Fine-tuned Model & Dataset: https://huggingface.co/yunyu1258/qwen2.5-0.5b-ha
      • 🛠️ Control Service Code: https://github.com/yuyun2000/HomeAssistant-Edge
      • 🔧 Fine-Tuning Tool: https://github.com/hiyouga/LLaMA-Factory
      • 📖 Reference Project: https://github.com/acon96/home-llm
      • 📚 M5Stack Docs: https://docs.m5stack.com
      • 🗣️ Community Forum: https://community.m5stack.com

      Note: This post shares technical experiments and gotchas from building edge-based LLM systems. All implementation details are open-sourced for the maker community.
