Building a Local Smart Home Voice Assistant With ESPHome!

Introduction

Home Assistant recently announced the much-anticipated wake word feature as part of their "Year of the Voice" initiative. This feature is an exciting milestone for users looking to create local, private voice assistants similar to Amazon Alexa or Google Home but focused on operating entirely offline.

In this guide, we’ll show you how to set up Home Assistant's voice assistant with wake words and then take it a step further by building a custom voice assistant using ESPHome, an ESP32, a microphone, and a speaker.

Year of the Voice: An Overview

Earlier this year, Home Assistant began the "Year of the Voice" project, aiming to integrate a local voice assistant with a heightened emphasis on privacy. Since the start, voice features have gradually rolled out:

Initially, text input was used to control devices.
Next, support for analog phones was added as a way to talk to the voice assistant.
Push-to-talk was later added for ESPHome devices.
The latest update introduces wake words.

Importance of Wake Words

Wake words like “OK Google” or “Alexa” are what make hands-free interaction possible. The device listens continuously and only starts processing a command once it hears the wake word, so you can control your smart home without pressing a button or picking up a phone.

Setting Up Home Assistant for Voice Interaction

Step 1: Configure Voice Pipeline

Navigate to Settings.
Select Voice Assistants.
Default pipeline: Modify the existing Home Assistant pipeline by clicking the dropdown to configure its features.

Step 2: Install Required Add-ons

Piper: Handles text-to-speech operations.
- Install it, configure the voice model, and make sure start on boot and watchdog are enabled.
Whisper: Converts speech from the microphone to text.
- Install it, select a model that suits your hardware, and enable start on boot and watchdog.
openWakeWord: Detects wake words in the streamed audio.
- Install it, adjust the threshold and trigger level if needed, then start it.
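
As a rough illustration of the openWakeWord tuning mentioned above, the add-on's Configuration tab can also be edited as YAML. The option names and values below reflect my understanding of the add-on's documentation and are assumptions; check the Configuration tab of your installed version before relying on them.

```yaml
# openWakeWord add-on options (illustrative values, not recommendations)
threshold: 0.5     # activation score between 0 and 1; higher means fewer activations
trigger_level: 1   # activations required before a detection is reported
```

Raising either value should reduce false triggers at the cost of occasionally missing a genuine wake word, so change one setting at a time and retest.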

Step 3: Set Up Voice Components

Go back to the Voice Assistants settings and configure the components in the pipeline:

Set Speech-to-Text to Whisper, Text-to-Speech to Piper, and Wake Word to openWakeWord.
Choose the wake word you want to use.

Building a Custom Voice Assistant with ESPHome

Required Components:

ESP32 Development Board: Any ESP32 development board should work.
I2S Microphone: The ICS-43434 or INMP441 are recommended.
Optional Components: An I2S amplifier and speaker for audio feedback.

Wiring the Components:

Wire the microphone and the optional amplifier and speaker to the ESP32. Both devices share the I2S bit clock and frame sync lines, but each has its own data line: data in from the microphone and data out to the amplifier.
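
The shared-clock wiring can be sketched in ESPHome's `i2s_audio` configuration roughly as follows. The GPIO numbers and the ids `mic` and `spk` are placeholders chosen for illustration, not values from the original guide. Key names follow the ESPHome releases that introduced wake word support, so newer releases may differ.

```yaml
# Shared I2S clock lines (pin numbers are placeholders; use the pins you wired)
i2s_audio:
  i2s_lrclk_pin: GPIO26    # frame sync (LRCLK / WS), shared by mic and amplifier
  i2s_bclk_pin: GPIO25     # bit clock (BCLK / SCK), shared by mic and amplifier

microphone:
  - platform: i2s_audio
    id: mic                # hypothetical id
    adc_type: external     # external I2S microphone such as the INMP441
    i2s_din_pin: GPIO33    # data in: microphone SD pin to the ESP32
    pdm: false

speaker:
  - platform: i2s_audio
    id: spk                # hypothetical id
    dac_type: external     # external I2S amplifier
    i2s_dout_pin: GPIO22   # data out: ESP32 to the amplifier DIN pin
```

Leave out the `speaker` section if you are not fitting an amplifier.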

Configuring ESPHome:

Create an ESPHome config for the device.
Use the ESP-IDF framework.
Configure the I2S settings:
- Specify GPIO pins and adjust settings for noise suppression, auto-gain, etc.
Upload and integrate with Home Assistant:
- Enter the device's IP address when adding it through the ESPHome integration in Home Assistant.
- Configure the device settings and fine-tune as needed.
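
A minimal sketch of the framework and voice assistant parts of such a config might look like this. It assumes a microphone with id `mic` and a speaker with id `spk` are defined elsewhere in the same file. The board name and tuning values are placeholders, and the option names are those of the ESPHome releases that introduced wake word support, so check the documentation for your version.

```yaml
esp32:
  board: esp32dev              # placeholder; set this to your board
  framework:
    type: esp-idf              # ESP-IDF instead of the Arduino framework

api:                           # native API connection used by Home Assistant

voice_assistant:
  microphone: mic              # assumed id of an i2s_audio microphone
  speaker: spk                 # assumed id of an i2s_audio speaker (optional)
  use_wake_word: true          # let the Home Assistant pipeline listen for the wake word
  noise_suppression_level: 2   # illustrative starting value
  auto_gain: 31dBFS            # illustrative starting value
  volume_multiplier: 2.0       # illustrative starting value
  on_client_connected:
    - voice_assistant.start_continuous:   # start streaming once Home Assistant connects
```

The three tuning values are only starting points. Adjust them one at a time while watching how reliably the wake word triggers.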

Final Testing:

Use the wake word to activate the voice assistant and issue commands.
Ensure proper functionality by checking the sensor in ESPHome.

Keywords

Home Assistant
Wake Word
Voice Assistant
ESPHome
ESP32
I2S Microphone
Local Privacy

FAQ

Q1: What is the "Year of the Voice" initiative in Home Assistant? A1: "Year of the Voice" is a project focused on adding a robust voice assistant feature to Home Assistant, emphasizing local operation and privacy.

Q2: Why are wake words important for a voice assistant? A2: Wake words allow for hands-free operation, making interactions with the smart home more practical and seamless.

Q3: What hardware do I need to build a custom voice assistant with ESPHome? A3: You need an ESP32 development board, an I2S microphone, and optionally an I2S amplifier and speaker for feedback.

Q4: Can I use my own custom wake words with Home Assistant? A4: Yes, you can train your own wake word models, although they may not perform as well as the pre-trained ones.

Q5: Does streaming audio from multiple microphones to the Home Assistant server use significant bandwidth? A5: No. Each device streams about 32 kilobytes per second, which is negligible on a typical home network.
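
The 32 kilobytes per second figure matches what you would expect from uncompressed 16 kHz, 16-bit mono audio. That audio format is my assumption, not something the original states:

```latex
16{,}000\ \tfrac{\text{samples}}{\text{s}}
\times 2\ \tfrac{\text{bytes}}{\text{sample}}
\times 1\ \text{channel}
= 32{,}000\ \tfrac{\text{bytes}}{\text{s}}
\approx 32\ \text{kB/s}
= 256\ \text{kbit/s}
```

Under the same assumption, ten devices streaming at once would use about 320 kB/s, or roughly 2.56 Mbit/s.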

Q6: Can different devices have different wake words or languages? A6: Yes, you can configure different pipelines for different devices, allowing various wake words and languages.

Q7: Are there any pre-built devices recommended for use with ESPHome voice assistants? A7: The M5Stack ATOM Echo is a popular pre-built device used in many demonstrations, but you can also build your own with an ESP32 and the components listed above.