Voice Payments Are an Architecture Decision, Not a Feature Request

By Shuttle Team, February 26, 2026

Why CPaaS, CCaaS, and AI voice platforms need an embedded payment layer — not another bolt-on integration.

The shift

Payments moved inside the call.

Not "after the call." Not "we'll send you a link." Inside it. The customer says a number, the system captures it, the payment confirms, and the conversation continues. No handoff. No second session. No human in the loop.

This happened because voice platforms matured. IVR gave way to intelligent routing. Intelligent routing gave way to AI agents. And the AI agents handling collections, renewals, order confirmations, and insurance claims hit the same wall every time: the customer is ready to pay, and the platform has nowhere to send them.

For CPaaS providers, CCaaS vendors, and AI voice platforms, this is no longer a feature request. It's an architecture decision. And the wrong one costs years.

Why platforms can't ignore this

Your customers are already solving this problem. They're solving it badly.

Collections teams are saying "I'll send you a link" and losing 30-40% of payment commitments in the handoff. Contact centres are asking callers to read card numbers over open lines, pulling PCI scope into systems that were never designed for it. AI agents are hitting a dead end the moment money enters the conversation, because there's no payment surface inside the call flow.

These aren't edge cases. They're the default experience for any platform running voice at scale.

The question for platform teams isn't whether your customers need payments inside voice. They do. The question is whether they solve it inside your platform or outside it.

Outside means third-party bolt-ons, compliance risk you can't control, and a customer experience your platform gets blamed for but can't fix.

Inside means you own the integration point. You control the experience. And you capture value on every transaction.

Three architecture patterns

Every platform that confronts this problem lands on one of three approaches. Two of them fail at scale.

Pattern 1: Transfer-out

The call hits a payment moment. The system transfers the caller to a separate IVR, a standalone payment line operated by a third party. The caller re-authenticates, enters card details, and either gets transferred back or hangs up.

This is the legacy model. It works for low-volume, low-expectation use cases. It fails everywhere else.

Transfer-out breaks the conversation. The AI agent loses context. The caller loses patience. Completion rates drop below 50% on most implementations because the friction of being bounced between systems is enough to make people give up.

For AI voice platforms, transfer-out is structurally incompatible. The entire value proposition of an AI agent is that it handles the conversation end-to-end. Transferring to a separate IVR for payment is an admission that the agent can't do its job.

Pattern 2: Bolt-on

The platform integrates a single payment service provider's (PSP's) API directly into the call flow: Stripe, or Worldpay, or whichever PSP the first customer required. Card capture happens inline, maybe via DTMF, maybe via a pause-and-link flow. It works for that one PSP.

Then the second customer needs Adyen. The third needs Checkout.com. The enterprise prospect mandates their existing acquirer relationship and won't negotiate.

Bolt-on creates a linear scaling problem. Every new PSP requires a separate integration, a separate certification process, and ongoing maintenance as APIs change. By the third PSP, payments are consuming product and engineering resources that should be building the platform's core product.

Worse, bolt-on usually means the platform is handling card data, or at minimum sitting in the PCI scope chain in ways nobody planned for. One PSP integration is manageable. Five is a compliance programme.

Pattern 3: Embedded layer

A dedicated payment layer sits between the voice platform and any PSP. The platform makes one integration. The layer handles PSP routing, PCI compliance, card capture, and settlement across every gateway the customer needs.

The voice platform never touches card data. The AI agent never sees a card number. The layer handles DTMF capture in a PCI-certified environment, confirms the payment in real time, and returns the result to the call flow. The agent says "your payment of £247 has been confirmed" and the conversation moves on.

This is the only pattern that scales across customers, PSPs, and compliance requirements without turning payments into a permanent line item on the product roadmap.
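As a rough illustration of the single integration point described above, here is a minimal Python sketch. Every name in it (PaymentRequest, PaymentResult, PaymentLayer, FakePaymentLayer, agent_payment_turn) is hypothetical and not taken from any real SDK. It assumes GBP amounts in minor units. The point it shows is the shape of the boundary: the platform sends an amount and a call reference, and gets back a status and an opaque reference, never a card number.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PaymentRequest:
    merchant_id: str    # which tenant is taking the payment
    call_id: str        # the live call the capture attaches to
    amount_minor: int   # amount in minor units (pence); sketch assumes GBP


@dataclass(frozen=True)
class PaymentResult:
    status: str         # "confirmed" or "failed"
    reference: str      # opaque reference; card data never crosses this boundary


class PaymentLayer(Protocol):
    """Hypothetical interface: the only thing the voice platform integrates."""

    def collect(self, request: PaymentRequest) -> PaymentResult: ...


class FakePaymentLayer:
    """Local stand-in so the sketch runs; a real layer would capture DTMF
    and talk to the merchant's PSP inside its own environment."""

    def collect(self, request: PaymentRequest) -> PaymentResult:
        return PaymentResult(status="confirmed", reference=f"pay_{request.call_id}")


def agent_payment_turn(layer: PaymentLayer, request: PaymentRequest) -> str:
    """Return the agent's next line based only on the structured result."""
    result = layer.collect(request)
    if result.status == "confirmed":
        pounds, pence = divmod(request.amount_minor, 100)
        amount = f"£{pounds}" if pence == 0 else f"£{pounds}.{pence:02d}"
        return f"Your payment of {amount} has been confirmed."
    return "That payment did not go through. Would you like to try again?"
```

Because the agent only ever branches on the result status, swapping the fake layer for a real one would not change the conversation logic.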

The PCI wall

PCI compliance is where most platform teams underestimate the problem.

The moment card data touches any system, even transiently, that system is in PCI scope. The voice platform, the call recording infrastructure, the AI model, the logging pipeline. All of it.

For voice, the architecture has to enforce hard boundaries.

DTMF capture must be separated. When a caller enters card digits via keypad tones, those tones must be intercepted and processed in a PCI-certified environment before they reach the platform's audio stream. The platform receives silence or flat tones during capture. The AI agent hears nothing. The recording captures nothing.

AI agents must never process card data. The language model handles the conversation. The payment layer handles the card. These two systems must be architecturally isolated. Not just logically separated, but running in different environments with different compliance boundaries. An AI model that has access to card data is a PCI problem that no QSA will sign off on.
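The two boundaries above can be pictured with a toy model of the call's event stream. This is only a sketch of what each side is allowed to see. The CallEvent type, the event kinds, and platform_view are assumptions made for illustration. In a real deployment the filtering would happen inside the payment layer's PCI-certified environment, before audio or events ever reach the platform, not in the platform's own process as it does here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CallEvent:
    kind: str      # hypothetical kinds: "speech", "dtmf", "capture_start", "capture_end"
    payload: str


def platform_view(events: Iterable[CallEvent]) -> Iterator[CallEvent]:
    """Yield only what the platform, the AI agent, and the recording may see.

    Keypad digits pressed between capture_start and capture_end are dropped,
    so card digits never appear downstream. Digits pressed outside a capture
    window (for example a menu choice) pass through untouched.
    """
    capturing = False
    for event in events:
        if event.kind == "capture_start":
            capturing = True
            yield event
        elif event.kind == "capture_end":
            capturing = False
            yield event
        elif capturing and event.kind == "dtmf":
            continue  # card digit: stays inside the payment layer
        else:
            yield event
```

The capture markers still reach the platform, so the agent knows when to stay quiet and when to resume, without knowing what was typed.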

The real question is scope reduction, not scope management. SAQ A and SAQ D are the shortest and the most extensive of the PCI DSS self-assessment questionnaires, and the difference between them is the difference between a compliance checkbox and a six-figure annual audit programme. Platform teams that build payment capture in-house inherit SAQ D. Platform teams that use an embedded layer operate at SAQ A.

This isn't a configuration decision. It's an architectural boundary that determines your compliance posture for years.

The multi-tenant PSP problem

Voice platforms are multi-tenant by nature. You don't serve one customer. You serve hundreds or thousands, each with their own payment stack.

The mid-market customer uses Stripe because it was easy to set up. The enterprise customer mandates Worldpay because they negotiated rates across their entire business. The regulated customer requires a specific acquirer for compliance reasons. The international customer needs local acquiring in markets where your default PSP doesn't operate.

A platform that supports one PSP serves one segment. A platform that supports any PSP serves the entire market.

This is the multi-tenant PSP problem, and it's unique to platforms. A single merchant picks one PSP and moves on. A platform must support whatever PSP each of its merchants requires, without building and maintaining each integration.

In voice, this compounds. Each PSP has different tokenisation flows, different API patterns for authorisation, different webhook formats for confirmation. Building DTMF capture that works with Stripe is one project. Making it work with Stripe, Adyen, Worldpay, Checkout.com, Braintree, and 35 others is a programme.

The embedded layer pattern solves this. The platform integrates once. The layer routes to the right PSP per merchant, per transaction. Adding a new PSP for a new customer is configuration, not engineering.
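To make "configuration, not engineering" concrete, here is a minimal routing sketch. The merchant IDs, the adapter functions, and the route helper are all hypothetical. The adapters are stubs that return a string, not real PSP SDK calls. What the sketch shows is that the PSP is looked up per merchant at transaction time, so onboarding a merchant on an already-supported PSP is one new entry in a mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Charge:
    merchant_id: str
    token: str          # tokenised card reference, never raw card data
    amount_minor: int
    currency: str


# Each adapter hides one PSP's API shape behind the same signature.
# These are stubs for illustration, not real PSP client code.
PspAdapter = Callable[[Charge], str]


def stripe_adapter(charge: Charge) -> str:
    return f"stripe accepted {charge.token}"


def worldpay_adapter(charge: Charge) -> str:
    return f"worldpay accepted {charge.token}"


ADAPTERS: Dict[str, PspAdapter] = {
    "stripe": stripe_adapter,
    "worldpay": worldpay_adapter,
}

# Per-merchant configuration: which PSP each tenant has mandated.
MERCHANT_PSP: Dict[str, str] = {
    "merchant_midmarket": "stripe",
    "merchant_enterprise": "worldpay",
}


def route(charge: Charge) -> str:
    """Pick the PSP configured for this merchant and hand the charge to it."""
    psp_name = MERCHANT_PSP.get(charge.merchant_id)
    if psp_name is None:
        raise LookupError(f"no PSP configured for {charge.merchant_id}")
    return ADAPTERS[psp_name](charge)


# Onboarding another Stripe merchant is a config change, not new code.
MERCHANT_PSP["merchant_new"] = "stripe"
```

The engineering cost in this model sits in the adapters, which is exactly the part the embedded layer is meant to own on the platform's behalf.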

Build vs partner: the real numbers

Every platform team's first instinct is to build it. "We'll integrate Stripe's API, add DTMF capture, handle PCI ourselves." The instinct is understandable. The maths doesn't hold up.

Timeline: 12-18 months to production for a single PSP with PCI-compliant voice capture. That's not a guess. It's the consistent timeline reported by platform teams that have done it. DTMF integration, PCI audit preparation, acquirer certification, production hardening, edge case handling. Each step takes longer than expected.

Cost: $2M+ fully loaded. Engineering time, PCI QSA assessment ($50-150k annually), infrastructure for a PCI-certified capture environment, ongoing maintenance as PSP APIs evolve. For a second PSP, add another 4-6 months and $500k+.

Compliance burden: PCI DSS Level 1 assessment is not a one-time event. It's an annual cycle. Audit preparation, evidence gathering, remediation, assessor engagement. The internal team required to maintain certification is typically 2-3 dedicated headcount.
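The timeline and cost figures above can be combined into a quick back-of-envelope estimate. The sketch below uses only the numbers already quoted (12-18 months and $2M for the first PSP, 4-6 months and $500k for the second) and adds two assumptions of its own: that every PSP after the second costs the same as the second, and that the work is done sequentially. Both figures are stated as "+" floors in the text, so the cost returned is a lower bound. Annual assessment fees and compliance headcount are left out.

```python
def build_estimate(psp_count: int) -> dict:
    """Lower-bound in-house build estimate for a given number of PSPs.

    Assumption (not from the article): each PSP beyond the second costs
    the same as the second, and integrations are built one after another.
    """
    if psp_count < 1:
        raise ValueError("psp_count must be at least 1")
    extra = psp_count - 1
    return {
        "months_min": 12 + 4 * extra,
        "months_max": 18 + 6 * extra,
        "cost_floor_usd": 2_000_000 + 500_000 * extra,
    }


# Three PSPs under these assumptions: 20 to 30 months, at least $3.0M.
print(build_estimate(3))
```

Even under these simplified assumptions, supporting a third PSP in-house lands at roughly two years or more of elapsed time.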

The build path makes sense in exactly one scenario: payments are your core product and PSP integration is your competitive advantage. For every other platform, it's a misallocation of engineering resources toward a solved problem.

This is what Shuttle does. One integration. 40+ PSPs. Voice capture via DTMF with architectural separation. The integration takes weeks, not quarters. The platform never enters PCI scope.

What production looks like

In production, the payment confirms inside the natural pause of the conversation. Under two seconds. The agent says "please enter your card number on your keypad." The caller enters digits. The tones are intercepted by the payment layer before reaching the platform's audio stream. The agent hears silence. The recording captures nothing. The payment processes. The agent confirms and moves on.

No transfers. No separate IVR. No card data in the platform's environment.

The platform receives structured webhook events. Payment initiated, payment confirmed, payment failed, refund processed. Enough to update CRM records, order status, customer accounts. The payment layer handles the transaction. The platform handles the business logic.
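A sketch of what consuming those events might look like on the platform side follows. The event type strings, the JSON field names (type, order_id), and the handle_webhook function are assumptions for illustration, not a documented payload format. Signature verification and retries, which a production handler would need, are left out to keep the example short.

```python
import json
from typing import Dict

# Hypothetical mapping from the four event types named above to the
# order status the platform's own business logic wants to record.
ORDER_STATUS_BY_EVENT: Dict[str, str] = {
    "payment.initiated": "pending",
    "payment.confirmed": "paid",
    "payment.failed": "payment_failed",
    "refund.processed": "refunded",
}


def handle_webhook(raw_body: str, orders: Dict[str, str]) -> bool:
    """Apply one webhook event to the platform's order records.

    Returns True if the event was recognised and applied, False otherwise.
    The payload carries identifiers and a status only, never card data.
    """
    event = json.loads(raw_body)
    status = ORDER_STATUS_BY_EVENT.get(event.get("type"))
    if status is None or "order_id" not in event:
        return False  # unknown or malformed event: ignore rather than guess
    orders[event["order_id"]] = status
    return True
```

Keeping the handler this thin reflects the split described above: the layer owns the transaction, and the platform only translates outcomes into its own records.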

The customer never knows the layer exists. The experience is branded as the platform's. The merchant portal is the platform's. The infrastructure is invisible.

The revenue model platforms miss

Most platform teams think about payments as a cost centre. Infrastructure to build, compliance to manage, PSPs to wrangle. A tax on the product roadmap.

They're missing the revenue line.

Every payment processed through your platform is a transaction you can monetise. Revenue share on payment volume. Paid by the PSP or the payment layer, not by the merchant. The platform's cut comes from the infrastructure it provides, not from charging its customers more.

This is incremental revenue on existing call volume. No new sales motion. No new product to market. Customers are already making calls. They're already trying to pay. The platform just captures value on the transactions it's already facilitating.

For a voice platform processing 100,000 payment transactions per month, even modest per-transaction economics generate a meaningful revenue line, with zero engineering investment after the initial integration.
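The arithmetic behind that claim is simple enough to sketch. The per-transaction share below is a pure placeholder chosen to make the calculation visible. It is not a quoted rate, a typical rate, or a Shuttle figure, and real commercial terms will differ. Only the 100,000 transactions per month comes from the text above.

```python
def monthly_share_minor(transactions_per_month: int, share_per_txn_minor: int) -> int:
    """Platform revenue share per month, in minor currency units.

    share_per_txn_minor is a hypothetical input: whatever per-transaction
    share the platform has actually negotiated, expressed in minor units.
    """
    if transactions_per_month < 0 or share_per_txn_minor < 0:
        raise ValueError("inputs must be non-negative")
    return transactions_per_month * share_per_txn_minor


# Placeholder example: 100,000 transactions at 5 minor units each.
monthly = monthly_share_minor(100_000, 5)
print(f"monthly: {monthly / 100:.2f}, annual: {monthly * 12 / 100:.2f}")
```

The useful property is that the result scales with call volume the platform already has, so the only variable to negotiate is the per-transaction share.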

The payment layer handles PCI, PSP routing, and settlement. The platform collects a share. That's the model.

Decision framework

Build if:

  • Payments are your core product, not an adjacent feature

  • You need exactly one PSP and will never need another

  • You have dedicated PCI compliance staff and budget for annual assessment

  • You're prepared for a 12-18 month timeline before first production transaction

Partner if:

  • Your customers use different PSPs and you need to support all of them

  • You want payments live in weeks, not quarters

  • PCI compliance is a cost you'd rather eliminate than manage

  • Your engineering team should be building your core product, not payment infrastructure

Do nothing if:

  • You're comfortable with customers solving payments outside your platform

  • You don't mind the revenue leaking to third parties

  • You're not competing with platforms that have embedded payments

Doing nothing is a valid choice. It just becomes less valid every quarter as more platforms ship native payment capabilities and your customers start asking why you haven't.

Shuttle is the payment layer for voice, links, and embedded checkout. PCI DSS Level 1. 40+ PSPs. One integration. shuttleglobal.com
