Cost Comparison

MimicScribe’s Unlimited pricing bullet claims it’s often cheaper than pasting transcripts into ChatGPT or Claude. This page shows the measured numbers behind that claim and what the baseline actually is.

What the baseline is

The paste-in baseline simulates a first approach without a custom retrieval pipeline — the workflow most people, including most technical people, actually use for meeting transcripts: open a chat, paste the transcript, ask a question. A developer who builds their own RAG on top of a cheap model can spend less than these numbers, but that’s not the comparison this benchmark targets.

All paste-in calls are routed through OpenRouter, which adds roughly a 5% markup over direct-provider API pricing. MimicScribe’s own side calls Gemini Flash directly.

Results — cost per meeting or per search query

S1–S3 average across 3 meetings: a short standup, a complex discovery call, and a 90-minute enterprise negotiation. S4 averages across 3 cross-meeting search queries over a 10-meeting haystack of 3 long (75–80 minute) meetings plus 7 shorter ones.

| Model | S1: Summary | S2: Summary + action items | S3: Summary + 3 follow-up Qs | S4a: Search, 3 candidates | S4b: Search, 10 meetings |
|---|---|---|---|---|---|
| Claude Opus 4.7 | $0.052 | $0.054 | $0.176 | $0.015 | $0.224 |
| Gemini 3.1 Pro | $0.040 | $0.043 | $0.155 | $0.022 | $0.082 |
| Claude Sonnet 4.6 | $0.020 | $0.022 | $0.071 | $0.006 | $0.097 |
| GPT-5.4 | $0.019 | $0.021 | $0.059 | $0.003 | $0.071 |
| Claude Haiku 4.5 | $0.006 | $0.007 | $0.021 | $0.002 | $0.032 |
| GPT-5.4 mini | $0.005 | $0.005 | $0.016 | $0.001 | $0.022 |
| DeepSeek V3.2 | $0.001 | $0.001 | $0.004 | $0.0003 | $0.008 |
| ◆ MimicScribe Unlimited | $0.005 | $0.005 | $0.014 | $0.0006 | $0.0006 |

Per-call OpenRouter pricing as measured at benchmark run time. Numbers reflect actual token usage — including reasoning tokens for models that think by default (Opus, Gemini 3.1 Pro).

Where the gap comes from

Single-meeting summaries (S1, S2). MimicScribe’s tuned Gemini Flash summary call is roughly on par with Haiku 4.5 and GPT-5.4 mini, and 4–11× cheaper than the flagship models (Opus, Sonnet, GPT-5.4, Gemini 3.1 Pro). Low-cost models like DeepSeek V3.2 undercut it on raw API price, but the user still does all the orchestration themselves: copy the transcript, craft the prompt, move the output to wherever they actually use it.

Multi-step workflows (S3). The gap widens once you ask follow-up questions. A paste-in user without caching re-sends the full transcript per question. For a 90-minute meeting plus three follow-ups, Claude Opus 4.7 costs about $0.39 per session; MimicScribe spends about $0.03 for the equivalent work. With a continued-thread model (transcript sent once, follow-ups appended) the Opus side would be lower, but still multiple times our cost.
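
The arithmetic behind that claim is simple enough to sketch. The snippet below models the fresh-chat case; every token count and price in it is a hypothetical placeholder, not a measured number from the table above.

```python
# Back-of-envelope model of the S3 fresh-chat workflow: the full transcript
# is re-sent with every follow-up question. All token counts and prices are
# illustrative assumptions, not the benchmark's measured values.

def session_cost(transcript_tokens: int, question_tokens: int,
                 answer_tokens: int, n_followups: int,
                 in_price: float, out_price: float) -> float:
    """USD cost of a summary plus n follow-ups, transcript re-sent each call."""
    calls = 1 + n_followups  # the initial summary request, then each follow-up
    input_tokens = calls * transcript_tokens + n_followups * question_tokens
    output_tokens = calls * answer_tokens
    return input_tokens * in_price + output_tokens * out_price

# Example: a ~20k-token transcript with 3 follow-ups at a hypothetical
# flagship price of $5/M input and $25/M output tokens.
print(f"${session_cost(20_000, 50, 500, 3, 5e-6, 25e-6):.2f}")  # ≈ $0.45
```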

Cross-meeting search (S4). This is the widest gap. A first-approach user who doesn’t know which meeting holds the answer pastes all their recent transcripts into the prompt. With a realistic 10-meeting mix (3 long meetings plus 7 shorter), the naive paste runs about $0.22 per query in Opus or $0.10 in Sonnet 4.6. MimicScribe’s cost stays flat at $0.0006 per query because retrieval happens locally — only the matched chunk ever reaches the LLM.
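
The scaling difference is worth spelling out: the paste-in cost grows with every meeting added to the haystack, while the retrieval cost does not. A sketch, with assumed transcript sizes, chunk size, and price:

```python
# Why S4's paste-in cost grows with the corpus while retrieval stays flat.
# Meeting sizes, chunk size, and the input price are illustrative assumptions.

IN_PRICE = 2.5e-6                      # $/input token, hypothetical
HAYSTACK = 3 * [18_000] + 7 * [4_000]  # assumed token counts: 3 long + 7 short
CHUNK_TOKENS = 400                     # one retrieved chunk (assumed size)

paste_in = sum(HAYSTACK) * IN_PRICE    # every meeting is billed on every query
retrieval = CHUNK_TOKENS * IN_PRICE    # flat, regardless of corpus size
print(f"paste-in ${paste_in:.3f}/query vs retrieval ${retrieval:.4f}/query")
```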

Quality at equivalent cost

Cost without a quality baseline is meaningless. Every summary in the benchmark was scored by an independent Claude Sonnet judge on three dimensions, 1–5: groundedness (no hallucinations), comprehensiveness (covers decisions, key points, blockers, outcomes), and usefulness (would a person who missed the meeting get the gist and next steps).

| Model | S1 cost | Groundedness | Comprehensiveness | Usefulness |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.001 | 4.7 | 5.0 | 5.0 |
| GPT-5.4 mini | $0.005 | 4.7 | 5.0 | 5.0 |
| ◆ MimicScribe Unlimited | $0.005 | 4.7 | 4.7 | 5.0 |
| Claude Haiku 4.5 | $0.007 | 4.7 | 5.0 | 5.0 |
| GPT-5.4 | $0.022 | 4.7 | 5.0 | 5.0 |
| Claude Sonnet 4.6 | $0.022 | 4.7 | 5.0 | 5.0 |
| Gemini 3.1 Pro | $0.036 | 4.7 | 4.7 | 5.0 |
| Claude Opus 4.7 | $0.053 | 4.7 | 5.0 | 5.0 |

Scores averaged across 3 meetings — a short standup, a multi-stakeholder discovery call, and a 90-minute enterprise negotiation.

Groundedness and usefulness are at parity with the flagship models. The 0.3-point comprehensiveness gap is isolated to the 90-minute meeting, where MimicScribe and Gemini 3.1 Pro both scored 4 (vs 5 for every other model); since both run on Gemini-family models, the long-meeting gap looks like a family limitation rather than a tuning gap in our pipeline. Short and medium-length meetings score 5/5/5.

What this doesn’t measure

Your time. The benchmark bills the paste-in side for every call a user would make, but not for the minutes spent copying transcripts, crafting prompts, or moving outputs to the tool where the work actually happens. For a heavy user that time is often the dominant cost; this benchmark deliberately sets it aside.

Explicit caching. A user who keeps a continued thread in Claude or ChatGPT gets some prompt caching automatically. S3’s numbers above reflect the worst case for the paste-in side: a fresh chat where every question re-sends the full transcript. Continued-thread caching would narrow the gap, though not close it, as the sketch below suggests.
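
To make "narrow but not close" concrete, here is a hedged extension of the fresh-chat sketch from the S3 section, assuming cached prefix tokens are billed at 10% of the normal input rate; real cache pricing varies by provider.

```python
# Continued-thread variant: the transcript prefix is cached after the first
# message and billed at a discounted rate on follow-ups. The 10% cached rate
# and all token counts/prices are assumptions for illustration.

def cached_session_cost(transcript_tokens: int, question_tokens: int,
                        answer_tokens: int, n_followups: int,
                        in_price: float, out_price: float,
                        cached_rate: float = 0.1) -> float:
    first = transcript_tokens * in_price + answer_tokens * out_price
    per_followup = (transcript_tokens * in_price * cached_rate  # cached prefix
                    + question_tokens * in_price                # fresh question
                    + answer_tokens * out_price)
    return first + n_followups * per_followup

# Same hypothetical inputs as the fresh-chat sketch: the session drops from
# roughly $0.45 to roughly $0.18, a narrower gap that is still far from closed.
print(f"${cached_session_cost(20_000, 50, 500, 3, 5e-6, 25e-6):.2f}")
```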

Implicit caching on our side. MimicScribe’s Q&A pipeline in production benefits from Gemini’s implicit cache on repeated prefixes. The numbers above treat every call as a cache miss, so they slightly overstate our real cost.

Classifier overhead. Production MimicScribe runs a small classifier in front of every Q&A call to route the intent. That classifier call is not counted in the numbers above; real cost is marginally higher than shown.

How we test

Three meeting transcripts drawn from our existing benchmark corpora: a short weekly standup, a multi-stakeholder discovery call, and a 90-minute enterprise sales negotiation. Each is run through every model once for S1, S2, and S3. S4 uses a 10-meeting haystack mixing 3 long-form transcripts with 7 shorter template-corpus samples, against three hand-written queries targeting specific meetings.

The paste-in baseline uses a reasonable — not naive — prompt: “Summarize this meeting into sections: Summary, Decisions, Key Points” for S1, bundled “Summary + Action Items” for S2, transcript + question for S3. For S4a we paste the three most plausible candidate transcripts (the user narrows by title and date); for S4b we dump the entire 10-meeting haystack in one prompt (the user has no way to narrow).
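
For concreteness, the baseline’s prompt assembly looks roughly like the sketch below. The S1 wording is the one quoted above; the S3 and S4b shapes are paraphrased from the text, and the separator between pasted transcripts is an assumption.

```python
# Paste-in baseline prompt assembly (a sketch, not the exact harness code).

def s1_prompt(transcript: str) -> str:
    return ("Summarize this meeting into sections: Summary, Decisions, "
            f"Key Points\n\n{transcript}")

def s3_prompt(transcript: str, question: str) -> str:
    # Fresh chat: the full transcript travels with every follow-up question.
    return f"{transcript}\n\n{question}"

def s4b_prompt(haystack: list[str], query: str) -> str:
    # S4b: the entire 10-meeting haystack in one prompt, since the user has
    # no way to narrow the candidates. The "---" separator is assumed.
    return "\n\n---\n\n".join(haystack) + f"\n\n{query}"
```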

MimicScribe’s side runs the production summary prompt against Gemini Flash with structured JSON output, then per-question Q&A calls using the same transcript plus the summary. Retrieval on S4 simulates what search_meetings returns in production: the target meeting’s top-scoring chunk.
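
A minimal stand-in for that retrieval step is shown below. The naive keyword-overlap scoring is purely illustrative; this page does not specify how production search_meetings scores chunks, only that the top-scoring chunk is what reaches the LLM.

```python
# Simulated search_meetings: pick the single top-scoring chunk so that only
# it, never the full haystack, is sent to the model. Scoring here is naive
# keyword overlap, an assumption made for illustration.

def top_chunk(chunks: list[str], query: str) -> str:
    query_terms = set(query.lower().split())
    return max(chunks, key=lambda c: len(query_terms & set(c.lower().split())))
```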

Token counts come directly from each provider’s API response. Costs are derived from pricing snapshots fetched from OpenRouter’s /models endpoint at run time and stored alongside the results. OpenRouter adds a small markup over direct provider pricing; direct-API costs would be slightly lower on the paste-in side.
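
The cost derivation itself is mechanical, roughly as below. The /models endpoint and its pricing fields match OpenRouter’s published schema (pricing.prompt and pricing.completion as USD-per-token strings), but verify against the live API before depending on them.

```python
# Snapshot OpenRouter pricing at run time, then derive per-call cost from
# the token counts each API response reports.
import requests

snapshot = requests.get("https://openrouter.ai/api/v1/models").json()["data"]
pricing = {m["id"]: m["pricing"] for m in snapshot}

def call_cost(model_id: str, prompt_tokens: int, completion_tokens: int) -> float:
    p = pricing[model_id]
    return (prompt_tokens * float(p["prompt"])
            + completion_tokens * float(p["completion"]))
```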

Quality scores come from an independent Claude Sonnet judge that sees the transcript and each model’s summary blind (no model names, no attribution) and scores groundedness, comprehensiveness, and usefulness on a 1–5 scale. For fair comparison, MimicScribe’s structured JSON output is flattened to summary + actionItems — the same content the user sees in the product — before the judge reads it.
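
The flattening step, sketched: the summary and actionItems keys are named in the text above; any structure beyond those two fields is assumed.

```python
# Flatten MimicScribe's structured JSON into the plain text the user sees,
# which is what the blind judge reads.

def flatten_for_judge(output: dict) -> str:
    lines = [output["summary"]]
    items = output.get("actionItems", [])
    if items:
        lines += ["", "Action items:"] + [f"- {item}" for item in items]
    return "\n".join(lines)
```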

Where this sits in the bigger picture

Speed and quality benchmarks for MimicScribe live on the technology page: diarization accuracy, meeting-assistant briefing quality, context retrieval, meeting search. This cost comparison answers a different question: how much does using MimicScribe cost versus approximating the same workflow with a chat window and a general-purpose model? The answer varies by model and by how much back-and-forth the workflow involves, but for any multi-step or cross-meeting use, MimicScribe is meaningfully cheaper, and the gap widens the more you use it.