Stack the platform fee + LLM + STT + TTS + telephony and a working AI receptionist runs $0.09–$0.36/min. For a typical SMB doing 300–500 calls/month, that's $90–$360 in raw infrastructure — before any agency build or run cost. See how we deploy AI voice agents end to end at our AI voice agents service page.
- $0.05/min: Vapi entry-only fee, before LLM, STT, TTS & telephony. Source: Vapi.ai/pricing
- $1,000/mo: Vapi HIPAA compliance flat surcharge, separate from usage. Source: Vapi billing docs
- ~50%: ElevenLabs price cut, Feb 2025; Creator & Pro plans dropped to $0.10/min. Source: ElevenLabs blog
- 1.4–1.7s: Hamming.ai independent p50 end-to-end latency across 4M+ production calls. Source: Hamming.ai, 2026
- 1m 32s: Median duration of a successful AI voice call (Canonical Chat dataset). Source: Canonical Chat
- 42%: Share of AI voice calls that meet their stated objective. Source: Canonical Chat, 2025
The "$0.05 per minute" line on the Vapi pricing page is technically true. It's also useless, because no working AI receptionist actually costs $0.05 a minute. It costs that, plus an LLM, plus speech-to-text, plus text-to-speech, plus a phone number that can receive calls. Stack those and the real number for an SMB looks more like $0.09 to $0.36 a minute, depending on the platform and the voice you pick.
This article tabulates the real 2026 cost of an AI receptionist across five platforms (Vapi, Retell, Synthflow, Bland, and ElevenLabs Conversational AI), every component that goes into a working stack, and three SMB worked examples (dental, law, HVAC) so you can see the monthly bill before you commit. Every number is sourced. Nothing is rounded to make a vendor look better.
What does “AI voice agent cost per minute” actually include?
A working AI receptionist isn't one product. It's a pipeline. Audio comes in over a phone line, gets transcribed to text, gets reasoned over by an LLM, gets rendered back to audio, and goes out the same phone line. Each of those steps is a paid API call.
Five components, billed on five different clocks:
- Orchestration / platform fee: the per-minute rate the voice-agent platform charges to glue the pipeline together (Vapi $0.05, Retell $0.055, Synthflow $0.09 voice engine).
- Speech-to-text (STT): what the caller says, transcribed in real time. Deepgram, AssemblyAI, OpenAI, and Google STT are the common choices.
- LLM tokens: input tokens (the prompt + caller transcript) and output tokens (the AI's response), billed per million.
- Text-to-speech (TTS): the voice you hear back. ElevenLabs, Cartesia, Deepgram Aura, OpenAI TTS, Hume, and PlayHT compete on quality and latency.
- Telephony termination (PSTN): Twilio, Telnyx, or SignalWire to connect the call to a real phone number. Inbound and outbound are billed separately, plus a monthly DID rental.
Bland and ElevenLabs Conversational AI are the two exceptions. Both quote a single all-in per-minute rate that bundles the components above. Everything else stacks.
What are the 5 components of an AI voice agent stack (and what does each cost in 2026)?
Across the platforms covered here, expect roughly these per-minute ranges in 2026:
| Component | Typical 2026 range | Notes |
|---|---|---|
| Orchestration platform fee | $0.05 to $0.09/min | Vapi, Retell, Synthflow voice engine |
| Speech-to-text (STT) | $0.0025 to $0.016/min | AssemblyAI Universal-2 cheapest; Google STT v2 Chirp standard |
| LLM (text models) | $0.005 to $0.080/min | Depends on model and tokens-per-minute; GPT-5 nano vs GPT-5.4 |
| TTS | $0.015 to $0.10/min | OpenAI TTS-1 cheap; ElevenLabs Flash mid; PlayHT Premium high |
| Telephony (US local) | $0.008 to $0.022/min | Twilio local inbound $0.0085, toll-free inbound $0.022 |
| HIPAA / compliance | $0 to $1,000/mo flat | Vapi $1,000/mo; Synthflow Enterprise-only; Retell add-on |
| Concurrency | First 5 to 20 free | Then $8 to $20/mo per extra reserved line |
The cheapest viable stack lands near $0.07/min. A premium stack with a top-tier voice and a frontier LLM lands closer to $0.30/min. Most working SMB deployments sit between $0.10 and $0.20/min.
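The floor and ceiling quoted above can be sanity-checked by summing the per-minute bounds in the component table. The sketch below is only that: the dictionary keys are illustrative labels, and adding every minimum together assumes the cheapest option for each component can actually be combined on one platform, which may not hold in practice.

```python
# Per-minute ranges in USD, copied from the component table above.
# Keys are illustrative labels, not vendor or API identifiers.
COMPONENT_RANGES = {
    "platform": (0.05, 0.09),
    "stt": (0.0025, 0.016),
    "llm": (0.005, 0.080),
    "tts": (0.015, 0.10),
    "telephony": (0.008, 0.022),
}

def stack_range(components):
    """Return (low, high) per-minute cost by summing each component's bounds."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high

low, high = stack_range(COMPONENT_RANGES)
print(f"${low:.4f} to ${high:.4f} per minute")
```

The sums come out at about $0.08 and $0.31 per minute, in the same neighborhood as the cheapest and premium stacks described above. Flat fees such as HIPAA and concurrency are left out because they are billed monthly, not per minute.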
How much does an AI receptionist cost per minute, all-in?
| Platform | Platform fee | STT | TTS | LLM | Telephony | HIPAA add-on | All-in @ 500 min/mo |
|---|---|---|---|---|---|---|---|
| Vapi | $0.05/min | bring your own ($0.003 to $0.016) | bring your own ($0.015 to $0.10) | bring your own ($0.005 to $0.08) | bring your own (~$0.008+) | $1,000/mo | $35 to $165 + $1,000 if HIPAA |
| Retell | $0.055/min | included in voice | $0.015/min platform voice | $0.003 to $0.080/min selectable | bring your own (~$0.008+) | PII removal +$0.01/min | ~$55 ($0.11/min x 500) |
| Synthflow | $0.09/min voice engine | included | included | $0.02 to $0.05/min | $0.00 to $0.02/min | Enterprise plan only | $75 to $120 ($0.15 to $0.24/min) |
| Bland | (bundled) | included | included | included | included (US) | not separately listed | $55 to $70 ($0.11 to $0.14/min) |
| ElevenLabs Agents | (bundled) | included | included | LLM pass-through extra | bring your own | not separately listed | $40 to $60 + LLM ($0.08 to $0.12/min x 500) |
Verified against Vapi pricing, Retell AI pricing, Synthflow pricing, Bland AI billing docs, and the pxlpeak ElevenLabs pricing breakdown. All numbers retrieved 2026-05-03.
What this means for SMB operators: the headline rate on a vendor's pricing page is the floor, not the ceiling. Vapi's “$0.05/min” is true only if you bring zero LLM, zero STT, zero TTS, and zero phone number. The honest comparison number is the all-in 500-minutes-a-month column. That puts Bland and Retell within a few cents of each other for a typical use case, while Vapi swings widely depending on the voice and model you wire in.
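A flat monthly fee changes the comparison most at low volume. As a rough illustration, the helper below spreads a flat fee across the minutes actually used. The function name and the $0.15/min stack rate are assumptions chosen for the example, not quoted Vapi prices; only the $1,000/month HIPAA figure comes from the table above.

```python
def effective_rate(per_minute_rate, minutes, flat_monthly_fee=0.0):
    """Per-minute cost once a flat monthly fee is spread across minutes used."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return per_minute_rate + flat_monthly_fee / minutes

# Assumed mid-range stack rate of $0.15/min plus the $1,000/mo HIPAA surcharge.
for minutes in (500, 5000):
    rate = effective_rate(0.15, minutes, flat_monthly_fee=1000)
    print(f"{minutes} min/mo -> ${rate:.2f}/min")
```

Under these assumptions the surcharge adds $2.00/min at 500 minutes a month and $0.20/min at 5,000, which is why a flat compliance fee dominates the bill for small practices.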
What does Vapi actually cost (what's hidden in the $0.05/min)?
Vapi includes 10 concurrent call lines by default; additional lines run $10/month each. The platform is genuinely flexible (any STT, any LLM, any TTS), which is also why pricing comparisons keep getting it wrong. There's no single “Vapi price.” There's a Vapi-orchestrated stack price, and you build it.
One caveat worth flagging: the vapi.ai/pricing page is JavaScript-rendered, so automated fetchers (and AI Overviews) often miss it. The $0.05/min platform figure here comes from two independent third-party breakdowns that line up with Vapi's own billing docs.
What does Retell AI actually cost?
Retell's pricing structure is the most transparent of the five platforms. Every component is a line item on the live pricing page, and you can build the stack you want from a dropdown. Concurrency is generous: the first 20 concurrent calls are free, then $8/month per additional reserved line. Add-ons include a Knowledge Base at +$0.005/min and PII removal at +$0.01/min. Verified phone numbers are $10/month each.
For most SMB receptionist deployments, Retell is the platform that gives you the smallest gap between your back-of-the-envelope estimate and the actual invoice.
What does Synthflow cost?
Because Synthflow charges PAYG across all usage tiers, it rewards variable-volume SMBs: you pay only for minutes you actually use, with no minimum commitment. The tradeoff is that per-minute rates are higher than a well-tuned Retell stack, so at sustained high volume (5,000+ minutes/month) it can run more expensive. For HIPAA-required SMBs, the Enterprise tier is the only path, which means negotiating custom pricing above the 10K-minute threshold.
What does Bland AI cost after the December 2025 tier change?
Bland's all-in pricing is the cleanest model on the market for SMB buyers who don't want to think about component stacks. Pick a tier, get a number, that's the bill. Transfer time is billed separately and tiered as well: $0.05/min on Start, $0.04/min on Build, $0.03/min on Scale. The free trial includes 2 credits and a free inbound number (a $15/month value).
If you see a 2025-dated article quoting Bland at $0.09/min, treat the rest of that article with skepticism. Pricing isn't the only thing it's missed.
What does ElevenLabs Conversational AI cost (the platform nobody compares)?
This is the platform every multi-vendor comparison leaves out, which is strange given that ElevenLabs cut Conversational AI prices roughly 50% in early 2025 and now sits in the same per-minute range as Bland and Retell.
The ~50% Conversational AI price cut was announced February 11, 2025, with Creator and Pro plans dropping to $0.10/min. Pre-cut, the rate was approximately $0.20/min. ElevenLabs' subscription tiers run Free $0 / Starter $6 / Creator $11 / Pro $99 / Scale $299 / Business $990, and Business plan low-latency TTS gets as low as 5 cents/minute when used standalone.
ElevenLabs is the right call when voice quality is the differentiator (legal intake, premium concierge), not when raw component cost is. The voices are consistently the best in the field, and ElevenLabs Flash leads Hamming.ai's TTS latency ranking at 75ms (see the latency benchmarks section below).
STT, LLM, and TTS component pricing reference (2026)
For Vapi, Retell, and Synthflow stacks, you pick the components yourself. Here's the per-minute math in 2026.
Speech-to-text (STT)
| Vendor / model | Rate | Notes |
|---|---|---|
| Deepgram Nova-3 streaming, monolingual | $0.0048/min PAYG | $0.0042/min on Growth tier |
| AssemblyAI Universal-2 | $0.0025/min ($0.15/hr) | Cheapest in class |
| OpenAI Whisper API | $0.006/min | |
| OpenAI gpt-4o-transcribe | $0.006/min | |
| OpenAI gpt-4o-mini-transcribe | $0.003/min | |
| Google Cloud STT v2 Chirp (streaming) | $0.016/min standard, down to $0.004/min at volume | 60 min/mo free tier |
Sources: Deepgram pricing, AssemblyAI pricing, TokenMix Whisper API guide, Google Cloud Speech-to-Text pricing.
LLMs (text models, $/MTok)
| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| OpenAI GPT-5 | $1.25 | $10.00 |
| OpenAI GPT-5.4 | $2.50 | $15.00 |
| OpenAI GPT-5.4 mini | $0.75 | $4.50 |
| OpenAI GPT-5.4 nano | $0.20 | $1.25 |
| OpenAI GPT-4o | $2.50 | $10.00 |
| Anthropic Claude Sonnet 4 / 4.5 / 4.6 | $3.00 | $15.00 |
| Anthropic Claude Haiku 4.5 | $1.00 | $5.00 |
| Anthropic Claude Opus 4.5 / 4.6 / 4.7 | $5.00 | $25.00 |
| Google Gemini 2.5 Flash (text in) | $0.30 | $2.50 |
| Google Gemini 2.5 Pro (up to 200k) | $1.25 | $10.00 |
OpenAI rates verified via BenchLM's API pricing tracker and PricePerToken; Anthropic via Claude pricing docs; Google via the Gemini API pricing page. Voice agents typically burn 200 to 600 input tokens and 100 to 300 output tokens per turn; at 5 to 10 turns per minute, that lands in the $0.005 to $0.080/min range depending on model.
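The token arithmetic behind that per-minute range can be sketched as follows. The prices are copied from the table above; the dictionary keys are illustrative labels and not API model identifiers, and the token and turn counts are picked from the ranges stated in the text. Real usage varies with prompt length and how much conversation history is resent each turn.

```python
# USD per million tokens (input, output), copied from the table above.
# Keys are illustrative labels, not API model identifiers.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
}

def llm_cost_per_minute(model, in_tokens_per_turn, out_tokens_per_turn, turns_per_minute):
    """Estimate LLM spend per call minute from per-turn token counts."""
    in_price, out_price = PRICES[model]
    in_cost = in_tokens_per_turn * turns_per_minute * in_price / 1_000_000
    out_cost = out_tokens_per_turn * turns_per_minute * out_price / 1_000_000
    return in_cost + out_cost

# Heavy case: top of the stated token and turn ranges on a larger model.
heavy = llm_cost_per_minute("gpt-5.4", 600, 300, 10)
# Middle case: mid-range token counts on a smaller model.
mid = llm_cost_per_minute("claude-haiku-4.5", 400, 200, 8)
print(f"heavy: ${heavy:.4f}/min, mid: ${mid:.4f}/min")
```

With these inputs the heavy case works out to $0.06/min and the middle case to about $0.011/min. Output tokens account for most of the cost in both, so shorter agent responses are the most direct lever.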
For real-time audio I/O, OpenAI gpt-4o-realtime bills $100/MTok audio in and $200/MTok audio out, which works out to roughly $0.06/min in and $0.24/min out.
TTS
| Vendor / model | Rate | Notes |
|---|---|---|
| ElevenLabs Flash v2.5 / Turbo v2.5 | 0.5 to 1 credit/character (tier-dependent) | Business low-latency 'as low as 5 cents/min' |
| Cartesia Sonic-3 | 15 credits/sec audio | Pro $4/mo, Startup $39/mo, Scale $239/mo |
| OpenAI TTS-1 | $15/MChar (~$0.015/1K chars) | |
| OpenAI TTS-1-HD | $30/MChar (~$0.030/1K chars) | |
| OpenAI gpt-4o-mini-tts | ~$0.015/min | |
| Deepgram Aura-2 | $0.030/1K chars PAYG | $0.027/1K chars on Growth |
| Hume Octave Pro | $70/mo, 1M chars; $0.05/1K chars overage | |
| PlayHT Creator | $49/mo annual ($99 monthly) | Not verified against the primary page (connection issues at fetch) |
Sources: ElevenLabs pricing, Cartesia pricing, TokenMix TTS comparison, Deepgram pricing, Hume pricing, voice.ai PlayHT pricing breakdown.
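Several TTS rates above are quoted per character, while the rest of this article works per minute. A rough conversion is sketched below. The 1,000 characters per call minute figure is an assumption made for illustration and does not come from any vendor page; an agent that talks for only half the call would speak roughly half that.

```python
# Assumption for illustration only: the agent speaks about 1,000 characters
# per minute of call time. This is not a vendor-published figure.
CHARS_PER_MINUTE = 1000

# USD per 1,000 characters, copied from the TTS table above.
PER_1K_CHARS = {
    "OpenAI TTS-1": 0.015,
    "OpenAI TTS-1-HD": 0.030,
    "Deepgram Aura-2 (PAYG)": 0.030,
    "Hume Octave Pro (overage)": 0.05,
}

def tts_cost_per_minute(price_per_1k_chars, chars_per_minute=CHARS_PER_MINUTE):
    """Convert a per-1K-character price into an approximate per-minute price."""
    return price_per_1k_chars * chars_per_minute / 1000

for name, price in PER_1K_CHARS.items():
    print(f"{name}: ${tts_cost_per_minute(price):.3f}/min")
```

Pass a different `chars_per_minute` to model a more or less talkative agent; the per-minute cost scales linearly with it.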
What does telephony termination cost? (Twilio vs Telnyx vs SignalWire)
| Provider | Local inbound | Local outbound (US) | Toll-free inbound | DID rental |
|---|---|---|---|---|
| Twilio Programmable Voice (local) | $0.0085/min | $0.0140/min | $0.0220/min | $1.15/mo local, $2.15/mo toll-free |
| Telnyx (local) | from $0.0035/min | from $0.005/min | from $0.015/min | verify on Numbers page |
| SignalWire (10DLC local) | $0.0066/min | $0.0080/min | $0.0147/min | $0.50/mo local, $0.80/mo toll-free |
| Twilio SIP / WebRTC | $0.004/min | $0.004/min | n/a | n/a |
| SignalWire SIP / WebRTC | $0.003/min | $0.003/min | n/a | n/a |
Verified via Twilio Programmable Voice pricing, Telnyx Elastic SIP pricing, SignalWire voice pricing. Telnyx also offers channel-based billing as an alternative to per-minute: first 10 channels at $12/mo, scaling down to $8/mo above 250.
For an SMB doing under 5,000 minutes a month, the telephony delta between providers is roughly $20 to $50/month. It's not where you should optimize first.
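To see how small that delta is, the local inbound rates from the table can be multiplied out at a given volume. This is a minimal sketch that covers usage charges only: DID rental is left out because the Telnyx figure is not listed above, and the Telnyx rate is a "from" price, so its real cost may be higher.

```python
# Local inbound USD per minute, copied from the telephony table above.
# Telnyx is a "from" price, so treat its result as a lower bound.
LOCAL_INBOUND = {"Twilio": 0.0085, "Telnyx": 0.0035, "SignalWire": 0.0066}

def usage_cost(provider, minutes):
    """Monthly inbound usage charge in USD, excluding number rental."""
    return LOCAL_INBOUND[provider] * minutes

minutes = 5000
costs = {name: usage_cost(name, minutes) for name in LOCAL_INBOUND}
spread = max(costs.values()) - min(costs.values())
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
print(f"spread: ${spread:.2f}")
```

At 5,000 inbound minutes the gap between the most and least expensive provider is $25 a month, and it shrinks proportionally at lower volumes.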
What does an AI voice agent actually cost an SMB per month? (Three worked examples)
Per-minute rates are abstractions. Here's what three real SMB use cases pay each month at typical 2026 stacks.
Dental practice
300 calls/month, 3.5 min average = 1,050 min/mo. Volume reflects the 40 to 60 calls/day median for solo dental practices (existing-patient calls run 1 to 3 min, new-patient calls 4 to 6 min, blended around 3.5 min).
Retell stack (GPT-4.1, platform voice, Twilio local)
1,050 min x $0.11/min + ~$0.0085/min telephony + $10 number
ElevenLabs Agents stack (Turbo tier, GPT-4o-mini pass-through, Twilio local)
1,050 min x $0.10/min + LLM ~$0.01/min + ~$0.0085/min telephony
Law firm after-hours intake
50 calls/month, 6 min average = 300 min/mo. Unqualified intake calls run up to 5 min; PI / family intake creeps to 8 to 10 min. Six minutes is a reasonable blended average.
Retell stack
300 min x $0.11/min + telephony + $10 number
ElevenLabs Agents stack (Premium tier for voice quality)
300 min x $0.12/min + LLM + telephony
Volume is low enough that the $10 number rental matters as much as the per-minute rate.
HVAC contractor
400 calls/month, 2 min average = 800 min/mo. HVAC inbound calls are short and transactional: address, problem, dispatch window. The bigger problem this stack solves is the approximately 22% annual missed-call rate (35% in peak season).
Retell stack
800 min x $0.11/min + telephony + $10 number
ElevenLabs Agents stack (Standard tier)
800 min x $0.08/min + LLM + telephony
These figures are infrastructure only. They don't include the time to build the agent, write the prompts, integrate with the booking flow or CRM, or run the system day to day.
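The formulas in the three examples above can be totaled with a small helper. This is a sketch under stated assumptions: where the law and HVAC formulas only say "+ LLM + telephony", it carries over the dental example's figures (about $0.01/min LLM pass-through and Twilio local inbound at $0.0085/min). The function and variable names are hypothetical.

```python
# Assumptions carried over from the dental example where the law and HVAC
# formulas only say "+ LLM + telephony".
TELEPHONY = 0.0085        # Twilio local inbound, USD/min
LLM_PASS_THROUGH = 0.01   # approximate LLM pass-through, USD/min

def monthly_bill(minutes, platform_rate, llm_rate=0.0, number_rental=0.0):
    """Infrastructure-only monthly cost in USD."""
    return minutes * (platform_rate + llm_rate + TELEPHONY) + number_rental

# (minutes per month, ElevenLabs per-minute rate) for each worked example.
scenarios = {
    "dental": (1050, 0.10),
    "law": (300, 0.12),
    "hvac": (800, 0.08),
}

bills = {}
for name, (minutes, eleven_rate) in scenarios.items():
    bills[name] = {
        "retell": monthly_bill(minutes, 0.11, number_rental=10),
        "elevenlabs": monthly_bill(minutes, eleven_rate, llm_rate=LLM_PASS_THROUGH),
    }
    print(name, {k: round(v, 2) for k, v in bills[name].items()})
```

Under these assumptions the dental Retell stack comes to about $134 a month and the HVAC ElevenLabs stack to about $79. Swapping in your own call volume and rates is a matter of editing the `scenarios` entries.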
What this means for SMB operators: for a typical SMB doing under 1,500 minutes a month, the platform you pick rarely changes the bill by more than $30 to $50/month. What changes the bill 3x to 10x is whether the system is wired into your scheduling and CRM correctly. The infrastructure is cheap. The integration work is where projects succeed or fail. For an outbound counterpart (booking confirmations, reactivation campaigns), see how we structure lead-generation automation.
How long is the average AI voice agent call?
| Use case | Typical duration | Source |
|---|---|---|
| Dental, existing patient | 1 to 3 min | AgentZap dental phone stats |
| Dental, new patient | 4 to 6 min | AgentZap |
| Law firm, unqualified intake | up to 5 min | Filevine intake KPIs |
| Law firm, qualified sign-up | ~30 min | Filevine |
| PI / family law intake | 8 to 10 min | Alert Communications |
| AI voice agent (all calls, blended median) | 51s | Canonical Chat |
| AI voice agent (successful calls) | 1m 32s | Canonical Chat |
Canonical Chat also reports that 42% of AI voice calls meet their stated objective, vs ~70% first-call resolution and 2m 50s avg talk time for human call-center agents. Read those numbers together: AI agents are ~60% as effective as humans on first-call resolution, but they cost $0.11/min instead of $0.50 to $1.20/min.
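One way to read those numbers together is cost per successful call: per-minute rate times call length, divided by the success rate. The sketch below is a simplification with loud assumptions. It treats the 51-second blended median and the 2m 50s human talk time as if they were averages, counts talk time only, and ignores what a failed call costs the business downstream.

```python
def cost_per_resolved_call(rate_per_min, avg_minutes, success_rate):
    """Spend per call that meets its objective, counting failed attempts too."""
    return rate_per_min * avg_minutes / success_rate

# AI agent: $0.11/min, 51s blended median (used as an average), 42% success.
ai = cost_per_resolved_call(0.11, 51 / 60, 0.42)
# Human agent: $0.50 to $1.20/min, 2m 50s talk time, ~70% first-call resolution.
human_low = cost_per_resolved_call(0.50, 170 / 60, 0.70)
human_high = cost_per_resolved_call(1.20, 170 / 60, 0.70)

print(f"AI: ${ai:.2f}, human: ${human_low:.2f} to ${human_high:.2f}")
```

On these inputs the AI agent lands near $0.22 per resolved call against roughly $2.02 to $4.86 for a human agent. The gap is wide enough that the conclusion survives fairly large errors in the assumptions.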
Voice-AI latency benchmarks: vendor self-reported vs Hamming.ai independent
| Source | Metric | Value |
|---|---|---|
| Hamming.ai (independent, 4M+ calls) | p50 end-to-end latency | 1.4 to 1.7s |
| Hamming.ai | p99 end-to-end latency | 8 to 15s |
| Canonical Chat (production) | Median human-to-AI response gap | 1.95s |
| Canonical Chat | Median time-to-first-AI-word | 880ms |
| Vapi (self-reported) | Target | sub-500ms; real-world 600ms to 1,000ms+ |
| Retell (tested with ElevenLabs v3) | Observed | ~600ms; long-turn pauses ~1.1s |
| ElevenLabs (claimed) | Component-level | sub-100ms |
| TTS leader (Hamming) | ElevenLabs Flash | 75ms |
| TTS runner-up (Hamming) | Cartesia Sonic | 90ms |
| ITU-T G.114 reference | One-way voice quality threshold | <300ms |
What this means for SMB operators: every vendor's marketing page is technically truthful and operationally misleading. ElevenLabs Flash genuinely does deliver ~75ms TTS. Vapi orchestration genuinely can target sub-500ms. The reason real calls land at 1.4 to 1.7s is that you stack STT + network + LLM + TTS + network + return, and none of the sub-500ms claims survive contact with the full pipeline. When you evaluate a platform, ask for end-to-end median (and p99) on a stack that matches what you'll actually deploy, not isolated component benchmarks.
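If a vendor hands over raw call logs, the median and p99 can be computed directly. Below is a minimal sketch using the nearest-rank percentile method. The latency samples are hypothetical, made up for illustration, and a meaningful p99 needs far more than ten calls.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical end-to-end latencies in seconds, measured from the moment the
# caller stops speaking to the moment the agent's audio starts.
latencies = [1.2, 1.4, 1.5, 1.5, 1.6, 1.7, 1.9, 2.4, 3.8, 9.5]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"p50 = {p50}s, p99 = {p99}s")
```

The sample shows why both numbers matter: a median of 1.6s looks acceptable, while the 9.5s tail is the call a customer hangs up on.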
Is it cheaper to build an AI voice agent or buy a platform?
Two real-world numbers from the indie-operator world worth keeping in mind:
- The "$3,000 to find out the hard way" tax. One operator tested 7 voice-agent platforms before settling on one, spending ~$3K in API credits along the way. If you're shopping, set a budget for the trial phase.
- Custom builds that cost more than budgeted. A Reddit-cited insurance agency budgeted $80K for a custom AI build, ended up spending $160K, and then switched to a $600/mo managed platform. Stories like this are anecdotes, not industry-wide averages, but they illustrate why the build-vs-buy math so rarely closes for SMBs.
Agency retainer ranges from operator-published data: $800 to $3,500/month for full-deployment management, with setup fees of $3K to $15K for the build. Custom dev hourly rates run $50 to $150/hr for typical implementation work.
What this means for SMB operators: if you're an SMB, you shouldn't be in the “build” business. The math has been settled for two years. The honest framing is platform-vs-platform, not build-vs-buy. We cover the integration side at our AI integration services page.
What's the cheapest viable AI voice agent stack for an SMB right now?
Key finding 1: don't optimize on the platform fee.
Key finding 2: bundled all-in beats stacked components for predictability.
Key finding 3: HIPAA changes the calculus.
Key finding 4: ElevenLabs Conversational AI is the gap in every other comparison.
Key finding 5: the integration tax is the real cost.
For dental, salon, and clinic verticals where HIPAA matters, lean Retell or Synthflow Enterprise. For HVAC, contractors, and small law where speed of deploy matters more than HIPAA, lean Bland. For premium-voice intake (boutique law, concierge medical), lean ElevenLabs Conversational AI Premium tier. We deploy across all four for clients via our AI voice agents service; for ongoing customer-side conversation handling beyond the receptionist scope, see customer support automation.
Frequently asked questions
How much does an AI receptionist cost per month for a small business?
For an SMB doing 300 to 500 calls a month at typical 2 to 4 minute average duration, raw platform infrastructure runs $80 to $200/month across the major platforms. Add agency build cost ($1,500 to $5,000 one-time) and run cost ($400 to $1,200/month) if you don't want to manage it yourself.
Is it cheaper to use Bland or Vapi?
For SMB volumes (under 2,000 minutes/month) Bland is usually cheaper because the all-in $0.11 to $0.14/min beats a typical Vapi stack of platform + LLM + STT + TTS + telephony, especially if you'd otherwise pay Vapi's $1,000/month HIPAA surcharge. For 5,000+ minutes/month with no HIPAA need, a tuned Vapi stack can come in lower.
What's hidden in Vapi's $0.05 per minute price?
The $0.05/min is platform orchestration only. STT, LLM, TTS, and telephony are all billed separately. HIPAA compliance adds a flat $1,000/month. Realistic stacked cost for a working agent runs $0.07 to $0.33/min.
Does ElevenLabs Conversational AI charge for the LLM separately?
Yes. The per-minute rate ($0.08 / $0.10 / $0.12 by tier) covers orchestration and TTS. LLM tokens are pass-through and billed separately on top.
How does Synthflow's pricing compare to Vapi or Retell pay-per-minute?
Synthflow is pay-as-you-go at $0.15 to $0.24/min depending on the LLM and telephony choices you make. Retell at ~$0.11/min on a typical stack and Vapi at $0.07 to $0.25/min are also pay-per-minute. All three charge only for minutes used, with no unused-capacity penalty. Synthflow's higher PAYG floor reflects its bundled voice engine and LLM handling; Retell's lower floor reflects a more a-la-carte component model.
What happens to the per-minute rate when I'm not on a call?
You're not billed for idle time on any of the five platforms. Concurrency reservations are billed monthly (e.g., $8 to $20/mo per extra reserved line on Retell or Synthflow), but per-minute usage only accrues during active call time. Hold time and transfer time are billed on Bland (transfer at $0.03 to $0.05/min by tier).
How much does it cost to have an agency build and run an AI receptionist?
Operator-published ranges put agency setup at $3K to $15K and ongoing retainer at $200 to $1,200/month for typical SMB deployments, scaling to $3,500/month for full-deployment management on more complex stacks. Build cost varies with how many integrations (CRM, scheduling, dispatch, payment) you need wired in.
Methodology and Sources
All pricing data was retrieved on 2026-05-03 from primary vendor pricing pages where reachable, and from independent third-party breakdowns (Cloudtalk, Emitrr, pxlpeak, BenchLM, TokenMix, PricePerToken) where the primary page was JavaScript-rendered, returned 403 to automated fetchers, or refused connection. Specifically:
- The vapi.ai/pricing page is JS-rendered; the $0.05/min platform fee was cross-confirmed via two independent third-party breakdowns (Cloudtalk and Emitrr).
- The OpenAI pricing page returned 403 to automated retrieval; GPT-5, GPT-5.4, Whisper, and TTS rates were sourced via three independent secondary trackers (BenchLM, TokenMix, PricePerToken) that aligned on the same numbers.
- The ElevenLabs Conversational AI tier breakdown ($0.08 / $0.10 / $0.12 by Standard/Turbo/Premium) is sourced from pxlpeak's detailed explainer alongside the official Feb 2025 X announcement of the ~50% price cut. The ElevenLabs help-center page returned 403.
- Bland's pricing changed materially on Dec 5, 2025 from a flat $0.09/min to tiered Start/Build/Scale rates. Older articles still quote the obsolete number; this article uses the current Bland docs.
- Synthflow moved to a pure PAYG model in 2026. Previously published bundled-plan pricing (Pro/Growth/Agency tiers) is no longer offered. This article reflects the current PAYG structure confirmed on synthflow.ai/pricing as of 2026-05-03.
- Latency benchmarks combine vendor self-reported figures with the independent Hamming.ai dataset (4M+ production calls) and Canonical Chat's production telemetry, deliberately presented side by side to surface the gap between marketing claims and field reality.
- Vendor-published comparisons were excluded as primary sources because every one we audited produced a self-serving result. Where vendor docs were used, only first-party billing and pricing pages were cited.
We will refresh this article each quarter as platform pricing shifts.

WRITTEN BY
Paul Bendzik
Founder, Scale me AI · 10+ years in software, marketing, and AI automation
See how Scale me AI builds, deploys, and operates AI voice agents for SMBs.
Typically live in under 2 weeks.
Last updated 2026-05-03
