Merged
15 commits
10 changes: 8 additions & 2 deletions .env.template
@@ -2,6 +2,11 @@
GCLOUD_PROJECT="your-gcloud-project-id"
GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# -- Wakeup Word --
# STACKCHAN_NO_USE_CLIENT_WAKEUP_WORD=1

# STACKCHAN_USE_OPEN_WAKE_WORD=1

# -- Speech Recognition --
# Google Cloud STT
STACKCHAN_USE_GOOGLE_CLOUD_STT=1
@@ -13,8 +18,9 @@ STACKCHAN_GOOGLE_CLOUD_STT_LANGUAGE_CODE="ja-JP"
# STACKCHAN_WHISPER_CLI_VAD_MODEL_PATH="/path/to/whisper.cpp/ggml-silero-v5.1.2.bin"

# Whisper Server
# STACKCHAN_USE_USE_WHISPER_SERVER=1
# STACKCHAN_WHISPER_SERVER_PORT=8431
# STACKCHAN_USE_WHISPER_SERVER=1
# STACKCHAN_WHISPER_SERVER_URL="http://127.0.0.1:8080/inference"
# STACKCHAN_WHISPER_SERVER_MODEL=

# -- Speech Synthesis --
# Google Cloud TTS
1 change: 1 addition & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1 @@
Write commit messages in English. Avoid Japanese commit messages.
22 changes: 22 additions & 0 deletions .github/workflows/pr-lint-typecheck.yml
@@ -4,17 +4,28 @@ on:
  push:
    branches:
      - main
      - develop
    paths:
      - ".github/workflows/pr-lint-typecheck.yml"
      - "Makefile"
      - "platformio.ini"
      - "platformio-m5stack.ini"
      - "pyproject.toml"
      - "protobuf/**"
      - "uv.lock"
      - "firmware/lib/generated_protobuf/**"
      - "stackchan_server/**"
      - "example_apps/**"
  pull_request:
    paths:
      - ".github/workflows/pr-lint-typecheck.yml"
      - "Makefile"
      - "platformio.ini"
      - "platformio-m5stack.ini"
      - "pyproject.toml"
      - "protobuf/**"
      - "uv.lock"
      - "firmware/lib/generated_protobuf/**"
      - "stackchan_server/**"
      - "example_apps/**"

@@ -42,3 +53,14 @@ jobs:

      - name: ty
        run: uv run ty check stackchan_server example_apps

      - name: Install PlatformIO
        run: python -m pip install platformio

      - name: Verify generated protobuf files are up to date
        run: |
          set -euo pipefail
          pio pkg install -e m5stack-cores3-m5unified
          make protobuf
          git diff --stat -- stackchan_server/generated_protobuf firmware/lib/generated_protobuf
          git diff --exit-code -- stackchan_server/generated_protobuf firmware/lib/generated_protobuf
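
The "Verify generated protobuf files are up to date" step regenerates the protobuf outputs and fails if git then reports a difference. A minimal local sketch of the same idea in Python follows. The function names and the injectable `run` parameter are hypothetical and not part of this PR, and the sketch assumes the nanopb generator has already been installed (the workflow does that with `pio pkg install`).

```python
import subprocess
from typing import Callable, Sequence

# Output directories taken from the workflow step above.
GENERATED_DIRS = (
    "stackchan_server/generated_protobuf",
    "firmware/lib/generated_protobuf",
)

Runner = Callable[[Sequence[str]], int]


def run_command(args: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(args), check=False).returncode


def generated_protobuf_is_current(run: Runner = run_command) -> bool:
    """Regenerate the protobuf outputs and report whether git sees no diff."""
    if run(["make", "protobuf"]) != 0:
        raise RuntimeError("make protobuf failed")
    # `git diff --exit-code` exits with 1 when the listed paths have changes.
    return run(["git", "diff", "--exit-code", "--", *GENERATED_DIRS]) == 0
```

Calling `generated_protobuf_is_current()` from the repository root before pushing would approximate what CI checks; passing a fake `run` makes the logic testable without touching the working tree.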
42 changes: 42 additions & 0 deletions .github/workflows/pr-platformio-build.yml
@@ -0,0 +1,42 @@
name: PR PlatformIO Build

on:
  push:
    branches:
      - main
      - develop
    paths:
      - ".github/workflows/pr-platformio-build.yml"
      - "platformio.ini"
      - "platformio-m5stack.ini"
      - "protobuf/**"
      - "firmware/**"
  pull_request:
    paths:
      - ".github/workflows/pr-platformio-build.yml"
      - "platformio.ini"
      - "platformio-m5stack.ini"
      - "protobuf/**"
      - "firmware/**"

jobs:
  platformio-build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.13"

      - name: Install PlatformIO
        run: python -m pip install platformio

      - name: Prepare firmware config
        run: cp firmware/include/config.template.h firmware/include/config.h

      - name: Build firmware
        run: pio run --environment m5stack-cores3-m5unified
49 changes: 29 additions & 20 deletions AGENTS.md
@@ -5,49 +5,57 @@
## Overview

- The CoreS3 side lives in `firmware/`; the Python server side lives in `stackchan_server/`.
- The WebSocket on-wire format is not a hand-written binary header but protobuf, defined in `protobuf/websocket-message.proto`.
- Audio uplink is `AudioPcm`; audio downlink is `AudioWav` (actually raw PCM).
- The server exposes FastAPI and provides both a WebSocket and a REST API.
- Servo control has been added; the WebSocket protocol includes `ServoCmd` / `ServoDoneEvt`.

## Key points of the state transitions

- Firmware states: `Idle`, `Listening`, `Thinking`, `Speaking`, `Disconnected`
- The server can command `StateCmd` values `0..3` (`Idle` through `Speaking`)
- The server can command `StateCmd` values `Idle` / `Listening` / `Thinking` / `Speaking`
- `Disconnected` is a firmware-internal state, entered when the WebSocket disconnects
- A talk session starts when a `WakeWordEvt` is received or when the wake word is triggered artificially through the REST API
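
The state rules above can be sketched as a small Python model. This is only an illustration: the `FirmwareState` enum and `can_server_command` helper are hypothetical names, not identifiers from `stackchan_server/` or from the generated protobuf code.

```python
from enum import Enum


class FirmwareState(Enum):
    """Firmware states listed above (hypothetical Python mirror)."""

    IDLE = "Idle"
    LISTENING = "Listening"
    THINKING = "Thinking"
    SPEAKING = "Speaking"
    DISCONNECTED = "Disconnected"


# States the server may request through StateCmd.
# Disconnected is excluded because only the firmware enters it.
SERVER_COMMANDABLE = frozenset(
    {
        FirmwareState.IDLE,
        FirmwareState.LISTENING,
        FirmwareState.THINKING,
        FirmwareState.SPEAKING,
    }
)


def can_server_command(state: FirmwareState) -> bool:
    """Return True if the server is allowed to send this state in a StateCmd."""
    return state in SERVER_COMMANDABLE
```

A server-side guard such as `can_server_command(FirmwareState.DISCONNECTED)` returning `False` would keep the firmware-only state out of outgoing commands.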

## WebSocket protocol summary

- Common header: `WsHeader` (`<B B B H H>`, packed, little-endian)
- 1 WebSocket binary frame = 1 protobuf `WebSocketMessage`
- protobuf definition: `protobuf/websocket-message.proto`
- package: `stackchan.websocket.v1`
- envelope fields
- `kind`
- `message_type`
- `seq`
- `oneof body`
- `kind`
- `1=AudioPcm`
- `2=AudioWav`
- `3=StateCmd`
- `4=WakeWordEvt`
- `5=StateEvt`
- `6=SpeakDoneEvt`
- `7=ServoCmd`
- `8=ServoDoneEvt`
- `AudioPcm`
- `AudioWav`
- `StateCmd`
- `WakeWordEvt`
- `StateEvt`
- `SpeakDoneEvt`
- `ServoCmd`
- `ServoDoneEvt`
- `messageType`
- `1=START`
- `2=DATA`
- `3=END`
- `START`
- `DATA`
- `END`

### Current behavior

- `AudioPcm`
- PCM16LE / 16kHz / 1ch
- `START -> DATA* -> END`
- `AudioPcmStart -> AudioChunk* -> AudioPcmEnd`
- `DATA` is sent every 2000 samples (4000 bytes, about 125 ms)
- Ends automatically after 3 seconds of silence
- `AudioWav`
- Despite the name, this is a PCM stream, not a WAV container
- The `START` payload is `<uint32 sample_rate><uint16 channels>`
- Sends `AudioWavStart.sample_rate` / `AudioWavStart.channels`
- `DATA` chunks are 4096 bytes by default
- Sent in segments of about 2 seconds; the second segment starts early, about 1 second later
- `ServoCmd`
- payload: `<uint8 count><commands...>`
- op: `0=Sleep`, `1=MoveX`, `2=MoveY`
- `ServoCommandSequence.commands[]`
- op: `Sleep`, `MoveX`, `MoveY`
- Receiving a new command replaces the sequence that is currently running
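
The `AudioPcm` figures above follow directly from the format: 2000 samples of 16-bit mono audio are 4000 bytes, and at 16 kHz they last 125 ms. The sketch below works through that arithmetic and splits a PCM buffer into a start, data, end sequence. It uses plain tuples rather than the generated protobuf classes, `frame_pcm` is a hypothetical helper, and sending a shorter final chunk is an assumption, not confirmed firmware behavior.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # PCM16LE
CHANNELS = 1              # mono
SAMPLES_PER_CHUNK = 2000  # one DATA message

CHUNK_BYTES = SAMPLES_PER_CHUNK * BYTES_PER_SAMPLE * CHANNELS  # 4000 bytes
CHUNK_MS = SAMPLES_PER_CHUNK * 1000 / SAMPLE_RATE_HZ           # 125.0 ms


def frame_pcm(pcm: bytes) -> list[tuple[str, bytes]]:
    """Split PCM16LE mono audio into a START, DATA..., END message list."""
    frames: list[tuple[str, bytes]] = [("START", b"")]
    for offset in range(0, len(pcm), CHUNK_BYTES):
        # Assumption: a trailing partial chunk is sent as a shorter DATA message.
        frames.append(("DATA", pcm[offset:offset + CHUNK_BYTES]))
    frames.append(("END", b""))
    return frames
```

One second of audio is 32000 bytes, so `frame_pcm` yields eight `DATA` messages between `START` and `END`.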

## Server side (`stackchan_server/`)
@@ -82,15 +90,16 @@

- `src/main.cpp`
- After connecting to Wi-Fi, connects to `/ws/stackchan`
- Receives and handles `AudioWav`, `StateCmd`, `ServoCmd`
- Decodes the protobuf `WebSocketMessage`, then receives and handles `AudioWav`, `StateCmd`, `ServoCmd`
- Returns from `Thinking` / `Speaking` to `Idle` if communication stalls for 60 seconds
- `src/listening.cpp`
- Reads the microphone in units of 256 samples
- 2-second ring buffer
- Sends protobuf `AudioPcmStart/Data/End`
- Stops after 3 seconds of silence
- `src/speaking.cpp`
- Receives TTS segments using 3 buffers
- After `END`, plays back with `M5.Speaker.playRaw()`
- Receives protobuf `AudioWavStart/Data/End` using 3 buffers
- After `AudioWavEnd`, plays back with `M5.Speaker.playRaw()`
- Emits `SpeakDoneEvt` when playback completes
- `src/servo.cpp`
- Executes `ServoCmd` asynchronously
26 changes: 26 additions & 0 deletions Makefile
@@ -1,7 +1,33 @@
UV ?= uv
PROTO_DIR := protobuf
PROTO_FILE := $(PROTO_DIR)/websocket-message.proto
PY_PROTO_OUT_DIR := stackchan_server/generated_protobuf
FW_PROTO_OUT_DIR := firmware/lib/generated_protobuf
NANOPB_GENERATOR := .pio/libdeps/m5stack-cores3-m5unified/Nanopb/generator/nanopb_generator.py

.PHONY: lint lint-fix protobuf protobuf-python protobuf-firmware clean-protobuf

lint:
	uv run ruff check stackchan_server example_apps
	uv run ty check stackchan_server example_apps

lint-fix:
	uv run ruff check --fix stackchan_server example_apps
	uv run ty check stackchan_server example_apps

protobuf: protobuf-python protobuf-firmware

protobuf-python: $(PROTO_FILE)
	mkdir -p $(PY_PROTO_OUT_DIR)
	touch $(PY_PROTO_OUT_DIR)/__init__.py
	$(UV) run python -m grpc_tools.protoc -I$(PROTO_DIR) --python_out=$(PY_PROTO_OUT_DIR) $(PROTO_FILE)

protobuf-firmware: $(PROTO_FILE)
	@test -f $(NANOPB_GENERATOR) || (echo "nanopb generator not found: $(NANOPB_GENERATOR)" && exit 1)
	mkdir -p $(FW_PROTO_OUT_DIR)
	$(UV) run python $(NANOPB_GENERATOR) --proto-path=$(PROTO_DIR) --output-dir=$(FW_PROTO_OUT_DIR) $(PROTO_FILE)

clean-protobuf:
	rm -f $(PY_PROTO_OUT_DIR)/websocket_message_pb2.py
	rm -f stackchan_server/generated/websocket_message_pb2.py
	rm -f $(FW_PROTO_OUT_DIR)/websocket-message.pb.h $(FW_PROTO_OUT_DIR)/websocket-message.pb.c
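
For readers who prefer to see the `protobuf-python` target spelled out, the sketch below builds the same command line in Python and derives the output file name that `clean-protobuf` removes. The helper names are hypothetical; the paths and flags are copied from the Makefile variables above.

```python
from pathlib import Path

# Values mirrored from the Makefile variables.
PROTO_DIR = Path("protobuf")
PROTO_FILE = PROTO_DIR / "websocket-message.proto"
PY_PROTO_OUT_DIR = Path("stackchan_server/generated_protobuf")


def protoc_python_command(uv: str = "uv") -> list[str]:
    """Build the argument list that the protobuf-python target runs."""
    return [
        uv, "run", "python", "-m", "grpc_tools.protoc",
        f"-I{PROTO_DIR.as_posix()}",
        f"--python_out={PY_PROTO_OUT_DIR.as_posix()}",
        PROTO_FILE.as_posix(),
    ]


def expected_python_output() -> Path:
    """Return the generated module path; the hyphen in the proto name becomes an underscore."""
    return PY_PROTO_OUT_DIR / (PROTO_FILE.stem.replace("-", "_") + "_pb2.py")
```

Passing the list to `subprocess.run(protoc_python_command(), check=True)` from the repository root should be equivalent to the Makefile recipe, minus the `mkdir` and `touch` steps.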
24 changes: 21 additions & 3 deletions docs/server_ja.md
@@ -59,13 +59,31 @@ STACKCHAN_WHISPER_CLI_MODEL_PATH="/path/to/whisper.cpp/ggml-small.bin"
STACKCHAN_WHISPER_CLI_VAD_MODEL_PATH="/path/to/whisper.cpp/ggml-silero-v5.1.2.bin"
```

### (Optional) whisper-server settings for Whisper.cpp
### (Optional) Whisper Server settings

(WIP)

Set `STACKCHAN_WHISPER_SERVER_URL` to the full URL of the Whisper Server inference endpoint.
If it is not set, `http://127.0.0.1:8080/inference` is used.

#### Example: whisper-server settings for Whisper.cpp

whisper.cpp/examples/server: https://github.com/ggml-org/whisper.cpp/tree/master/examples/server

```
STACKCHAN_USE_WHISPER_SERVER=1
STACKCHAN_WHISPER_SERVER_URL="http://127.0.0.1:8080/inference"
STACKCHAN_WHISPER_SERVER_MODEL=
```

#### Example: using [Lemonade](https://lemonade-server.ai/)

Lemonade: https://lemonade-server.ai/

```
STACKCHAN_USE_USE_WHISPER_SERVER=1
STACKCHAN_WHISPER_SERVER_PORT=8431
STACKCHAN_USE_WHISPER_SERVER=1
STACKCHAN_WHISPER_SERVER_URL=http://localhost:13305/api/v1/audio/transcriptions
STACKCHAN_WHISPER_SERVER_MODEL=Whisper-Large-v3-Turbo
```
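
As a rough illustration of how these three variables might be resolved, the sketch below reads them from the environment and applies the documented default URL. The `WhisperServerConfig` class and `load_whisper_server_config` function are hypothetical names, and treating only the value `1` as enabled and an empty model as unset are assumptions; the actual parsing in `stackchan_server/` is not shown in this diff.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default taken from the documentation above.
DEFAULT_WHISPER_SERVER_URL = "http://127.0.0.1:8080/inference"


@dataclass(frozen=True)
class WhisperServerConfig:
    enabled: bool
    url: str
    model: Optional[str]


def load_whisper_server_config(
    env: Mapping[str, str] = os.environ,
) -> WhisperServerConfig:
    """Resolve the Whisper Server settings from environment variables."""
    # Assumption: only the literal value "1" enables the Whisper Server backend.
    enabled = env.get("STACKCHAN_USE_WHISPER_SERVER", "") == "1"
    # An unset or empty URL falls back to the documented default.
    url = env.get("STACKCHAN_WHISPER_SERVER_URL") or DEFAULT_WHISPER_SERVER_URL
    # Assumption: an empty model value means no model name is sent.
    model = env.get("STACKCHAN_WHISPER_SERVER_MODEL") or None
    return WhisperServerConfig(enabled=enabled, url=url, model=model)
```

With the Lemonade example values, the result carries the `/api/v1/audio/transcriptions` URL and the model `Whisper-Large-v3-Turbo`; with only `STACKCHAN_USE_WHISPER_SERVER=1` set, it falls back to the whisper.cpp default endpoint.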

## 音声合成の設定