Experimenting with fully on-device meeting processing using Qwen 3.5 9B
#32 · by Marshall (developer) · Mar 28, 2026
We're experimenting with fully on-device meeting processing using Qwen 3.5 9B, an open-source language model that runs locally on your Mac via Apple's MLX framework.
Meeting assistant briefings, action items, and potentially full meeting summarization could run entirely on your device with no internet connection required.
Early quality results: In benchmark testing across 10 meeting scenarios (goal tracking, interpersonal dynamics, reference context usage, multi-speaker chaos), the on-device model with a tuned prompt scored 90% on our grading rubric — matching Gemini 3 Flash cloud performance.
System requirements we're targeting:
- Apple Silicon Mac with 16+ GB RAM (model uses ~5.6 GB)
- Best experience: M1 Max, M2 Max, M3 Max, M4 Pro, or newer (~1-3 second briefings)
- Usable on any Pro chip (~3-5 seconds)
- Base chips work but slower (~5-10 seconds)
- ~5.6 GB one-time download, stored locally
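For anyone curious what an eligibility check for these targets might look like, here's a minimal sketch. The function names (`meets_requirements`, `local_check`) and the exact thresholds wired in are assumptions based on the numbers above, not the check we'd actually ship:

```python
import os
import platform

MIN_RAM_BYTES = 16 * 1024**3  # 16+ GB RAM target from the requirements list

def meets_requirements(machine: str, total_ram_bytes: int) -> bool:
    """Rough eligibility check: Apple Silicon (arm64) with 16+ GB RAM."""
    return machine == "arm64" and total_ram_bytes >= MIN_RAM_BYTES

def local_check() -> bool:
    """Check the current machine (macOS; sysconf keys vary by OS)."""
    try:
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError):
        return False  # sysconf key unavailable on this platform
    return meets_requirements(platform.machine(), total)
```

A real check would also want to distinguish chip tiers (base vs. Pro vs. Max) for the latency estimates above, which `platform` alone doesn't expose.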
What's next: Exploring structured output (JSON) support for action items and meeting summaries, which would enable fully offline meeting processing end-to-end.
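To make the structured-output idea concrete, here's a sketch of the validation side: parsing a local model's JSON output into action items. The item schema (`owner`/`task`/`due`) is a hypothetical placeholder, since the actual schema is still being explored; the fence-stripping step reflects a common quirk where local models wrap JSON in Markdown fences:

```python
import json
import re

# Hypothetical action-item schema; the real one is still being designed.
REQUIRED_KEYS = {"owner", "task", "due"}

def parse_action_items(raw: str) -> list:
    """Parse model output into a validated list of action-item dicts."""
    # Local models often wrap JSON in ```json fences or add stray prose,
    # so extract a fenced block if one is present before parsing.
    match = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    text = match.group(1) if match else raw
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of action items")
    for item in items:
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"action item missing keys: {missing}")
    return items
```

Rejecting malformed items outright (rather than silently dropping fields) makes failures visible, which matters when deciding whether the on-device model's structured output is reliable enough to ship.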
Comments (1)
Posted a video walkthrough testing Qwen 3.5 9B with the meeting assistant: https://www.youtube.com/watch?v=zVlj9gzmAaA
Covers latency on M1 Max (~10-12s), memory usage (~7GB), projected M5 performance, and why we're not shipping it yet. Grammar correction works well on-device, but meeting intelligence still needs cloud models.