On-device AI

Speech processing always runs on your Mac: audio capture, transcription (Parakeet), speaker identification, echo cancellation, and voice detection. No setup, no plan requirement — audio never leaves the machine.

The AI features that work on transcript text — summaries, the live assistant, dictation cleanup, transform — default to Google Gemini through our open-source proxy. This page is about running those on your own hardware instead. Three levels, in increasing order of setup:

You wantUseCovers
No AI at allLocal ModeTranscription and speakers only; every AI feature off
Typing features on-device, zero setupApple (On-Device)Dictation, Transform
Every AI feature on your hardwareCustom endpoint → a local serverDictation, Transform, summaries, live assistant

Apple (On-Device): dictation and transform

On macOS 26 with Apple Intelligence enabled, MimicScribe can run Dictation and Transform on Apple’s built-in on-device model. Nothing to install, nothing to configure.

Open Settings → AI Provider, and under Feature routing set Dictation or Transform to Apple (On-Device). The option only appears on Macs where the model is available.

Apple’s model has a small context window — a full meeting transcript doesn’t fit — so it’s offered only for the typing features. For summaries and the live assistant, use a local server below.

A local server: every feature

Any server that speaks the OpenAI chat-completions API works. Ollama is the shortest path:

  1. Install Ollama and pull a model:

    ollama pull qwen3
  2. In Settings → AI Provider, under Custom endpoint, fill in:

    • Base URLhttp://localhost:11434/v1
    • Model — the name you pulled, e.g. qwen3.6:27b
    • API Key — leave blank
    • Reasoning — leave at None. Thinking models (like Qwen) otherwise spend minutes on hidden reasoning before every answer.

    Click Verify. It checks the server is reachable, then sends a tiny request to confirm the model actually responds.

  3. Under Feature routing, set the features you want to Custom endpoint. You can mix — summaries on your local model, dictation on Gemini.

Settings → AI Provider with every feature routed to a custom endpoint: Ollama on localhost, model qwen3.6:27b, no API key, reasoning set to None

LM Studio and llama.cpp work the same way; only the Base URL and model name change. The full reference — what routes where, plan limits, troubleshooting — is on the Custom AI Endpoint page.

What to expect

Own hardware means owning the quality tradeoff:

  • Quality is the model you pick. MimicScribe’s prompts are tuned against Gemini. Dictation and transform are short, focused tasks that mid-size local models handle; meeting summarization is a long-context task — use the largest model your Mac runs comfortably.
  • A few things stay on the default Gemini path: reference-document search, vocabulary spelling hints, and image (OCR) processing. They rely on capabilities a general chat endpoint doesn’t reliably provide, and each fires only if you use the feature it serves.
  • Free-tier caps still apply. A custom endpoint shares the same per-feature daily allowance as the default path — a product limit, not a cost one. Light and Unlimited lift it.

Watch it work

Open Settings → Network Log and run a meeting or a dictation. Each AI request lists where it went — with a local server, that’s your own machine:

The Network Log showing AI requests going to localhost — the meeting summary ran on this Mac

Network Activity documents what the log records and how to verify it from outside the app with nettop or Little Snitch.

Local Mode: no AI at all

Routing changes where the AI runs. Local Mode is different — it turns AI off entirely for a meeting: transcription and speaker separation only, nothing sent anywhere, no summaries to backfill until you choose to. Toggle it per meeting when you start one, or set Start meetings offline in Settings → AI Provider to make it the default. Details in Privacy & Data.