Custom AI Endpoint

By default, MimicScribe’s text-AI features — summaries, speaker attribution, action items, the live assistant, dictation, and transform — run on Google Gemini through an open-source proxy. Audio always stays on your Mac; only transcript text is sent.

You can route those features to your own OpenAI-compatible endpoint instead — a model running locally on your Mac, or any hosted provider. It’s a power-user option for flexibility, and with a local model it keeps transcript text on your machine.

What you can connect

Any endpoint that speaks the OpenAI chat-completions API:

Local — text stays on your Mac. Ollama, LM Studio, or llama.cpp running on your machine. Nothing leaves the device. Step-by-step walkthrough: On-Device AI.
Hosted — text goes to that provider. OpenRouter, OpenAI, or any compatible service. Transcript text is sent to them instead of to Google through our proxy.

The default stays Gemini because that’s what the prompts are tuned against. With your own model, the output quality is the quality of the model you pick.

Set it up

Open Settings → AI.

Under Custom endpoint, fill in:
- Base URL — your server’s OpenAI-compatible address, e.g. http://localhost:11434/v1 for Ollama.
- Model — the model name your server exposes, e.g. qwen3.6:27b.
- API Key — optional. Leave it blank for a local server with no auth; set it for a hosted provider.
- Reasoning — sent as reasoning_effort. None (the default) makes thinking models answer directly instead of reasoning at length first. Some hosted providers only accept Low/Medium/High — if Verify or a feature reports the field was rejected, switch to one of those or Model default (omits the field).
Click Verify. It first checks the server is reachable, then sends a tiny request to confirm the model actually responds — a reachable server with the wrong model name is the most common misconfiguration, and a reachability check alone won’t catch it.
Under Feature routing, set any feature — Dictation, Transform, Post-meeting summary, Live assistant — to Custom endpoint. Each feature lists only the providers it supports, and you can mix: summaries on your local model, dictation on Gemini.

What routes to your endpoint

The features you switch in Feature routing go to your endpoint: dictation, transform, post-meeting summaries (including speaker attribution, action items, and corrections), and the live assistant.

A few things always stay on the default Gemini path: reference-document search, the vocabulary spelling hints, and image (OCR) processing. These rely on capabilities a general chat endpoint doesn’t reliably provide.

Privacy

A local endpoint keeps transcript text on your Mac end to end — combined with on-device transcription, nothing leaves the machine.
A hosted endpoint sends transcript text to that provider instead of to Google. You are responsible for the privacy and security of any endpoint you configure.
Audio is never sent, on any path.

If you want AI summaries with zero cloud, point the features at a local model. If you want no AI at all, use Start meetings offline (Settings → AI) — that keeps transcription on-device and pauses every AI feature.

You can watch the routing work. Open Settings → Network Log and run a meeting: each AI request lists the endpoint it went to — your server’s address instead of our proxy. With a local model that’s localhost, and nothing AI-related leaves the machine. See Network Activity for what the log records.

Plans and limits

A custom endpoint shares your free-tier daily allowance — the same per-feature daily caps as the default Gemini path. The Light and Unlimited plans lift those caps. This is a product limit, not a cost one: even when the model runs on your own hardware, free use is metered the same way.

This is separate from Bring Your Own Key (your own Gemini API key), which is a paid-plan option configured in the same AI settings pane.

Troubleshooting

Verify reports the two failure modes separately:

Cannot connect / server returned a code — the Base URL is wrong, or the server isn’t running.
Model not found / timed out — the server is reachable, but the Model name doesn’t match what it exposes, or the model is too slow to load. Check the exact name your server lists.

If a feature is set to Custom endpoint but no model name is saved, that run is skipped and the feature reports “Custom endpoint not configured — choose a model in AI settings.”

To confirm what is and isn’t leaving your Mac at any moment, see Network Activity.