How Voice AI Ordering Platforms Handle Payments — And Why It's Broken

By Shuttle Team, February 28, 2026

Voice AI Can Take the Order. It Can't Take the Payment.

Voice AI for restaurant ordering is one of the fastest-growing categories in foodtech. SoundHound is in 10,000+ locations. ConverseNow powers Panera and Checkers. Kea, Loman, VOICEplug, Tarro, Checkmate, and a wave of startups funded in 2024-2025 are racing to automate phone orders for restaurants.

The AI is getting good at taking orders. It handles menu questions, customisations, upsells. It works 24/7. It doesn't put callers on hold. Some platforms report 95%+ order accuracy and 26% increases in phone order revenue.

But every one of these platforms hits the same wall: when the customer wants to pay by card, the AI can't handle it.

Not because the AI isn't capable. Because payment capture over voice is a PCI compliance nightmare, and no voice AI company wants to own that problem.

So they work around it. And every workaround is broken in a different way.


The Three Payment Patterns (And Why They All Have Problems)

Pattern 1: The SMS Payment Link Fallback

How it works: The AI takes the order over the phone. When it's time to pay, the system sends the customer an SMS with a payment link. The customer hangs up (or stays on the line), opens the link on their phone, and enters their card details on a web checkout page.

Who does this: SoundHound (with Toast), Checkmate, ActiveMenus, and most chatbot ordering platforms.

Why it's broken:

The whole point of voice AI is that the customer calls and the experience is handled. Sending them to a different channel — phone to SMS to mobile browser — breaks that.

  • Drop-off is real. Not every customer completes the link. Some don't see the text. Some don't trust it. Some are driving.

  • It's not a voice experience anymore. You've built a sophisticated AI that handles the conversation naturally, then you drop the customer into a generic checkout page. The brand experience falls apart.

  • The payment link routes through one PSP (payment service provider). If the platform generates Stripe links, every restaurant pays Stripe fees, even if the restaurant already has a Worldpay or Square merchant account with better rates.

The SMS fallback is the most common approach because it's the easiest to build. But it's a compromise, not a solution.

Pattern 2: POS-Routed Payment

How it works: The AI takes the order and pushes it directly to the restaurant's POS system (Toast, Square, Clover). Payment is handled at the POS — the customer pays when they pick up, or the POS charges a card on file.

Who does this: ConverseNow, Kea, Tarro.

Why it's broken:

  • No prepayment guarantee. If the customer doesn't pay until pickup, there's no commitment. No-shows cost restaurants money. Prepaid orders have significantly higher completion rates.

  • Only works with supported POS systems. Each POS integration is a separate engineering project: Toast's API differs from Square's, which differs from Clover's, and the voice AI platform has to build and maintain each one. Supporting 5 POS systems means 5 integration codebases.

  • The POS controls the payment. Toast processes through Toast Payments. Square processes through Square. The voice AI platform has no control over fees, has no visibility into payment data, and can't offer payment as a feature.

  • Franchise fragmentation. A franchise with 200 locations might have 3 different POS systems across its estate. The voice AI platform needs to support all of them — and each POS routes to its own bundled processor.

POS-routed payment works for simple pay-at-pickup flows. It doesn't work when the platform needs prepayment, multi-PSP flexibility, or payment data ownership.
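
The per-POS integration burden can be pictured with a minimal sketch. Everything in it is hypothetical: the `PosAdapter` interface, the two adapter classes, and the payload shapes are invented for illustration and are not the real Toast or Square APIs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLine:
    name: str
    quantity: int
    unit_price_cents: int


class PosAdapter(ABC):
    """Hypothetical interface: one implementation per POS vendor."""

    @abstractmethod
    def build_payload(self, lines: list) -> dict:
        """Translate a voice order into this POS's (invented) order format."""


class ToastLikeAdapter(PosAdapter):
    # Invented payload shape, not the real Toast API.
    def build_payload(self, lines: list) -> dict:
        return {"selections": [
            {"item": line.name, "qty": line.quantity,
             "price": line.unit_price_cents / 100}
            for line in lines
        ]}


class SquareLikeAdapter(PosAdapter):
    # Invented payload shape, not the real Square API.
    def build_payload(self, lines: list) -> dict:
        return {"line_items": [
            {"name": line.name, "quantity": str(line.quantity),
             "base_price_money": {"amount": line.unit_price_cents,
                                  "currency": "USD"}}
            for line in lines
        ]}


# Every supported POS needs its own entry, and its own maintained adapter.
ADAPTERS = {
    "toast": ToastLikeAdapter(),
    "square": SquareLikeAdapter(),
}


def push_order(pos_name: str, lines: list) -> dict:
    """Pick the adapter for the restaurant's POS; fail loudly if unsupported."""
    adapter = ADAPTERS.get(pos_name)
    if adapter is None:
        raise ValueError(f"No integration built for POS '{pos_name}'")
    return adapter.build_payload(lines)
```

Note that nothing in either payload carries payment information. In this pattern the order reaches the POS, and the POS decides how and when the customer pays.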

Pattern 3: Voice Card Capture

How it works: The AI asks the customer to speak their card number, expiry date, and CVV. The audio is processed, tokenised, and the card is charged — all while the customer is on the phone.

Who attempts this: Almost nobody. Loman AI claims basic PCI-compliant voice processing, but this is extremely rare.

Why it's broken (or rather, why nobody does it):

Voice card capture triggers the most demanding tier of PCI DSS compliance:

  • CVV audio must be suppressed: the recording system must detect and redact CVV digits in real time

  • Call recordings must be purged — any recording containing card data must be identified and deleted

  • The voice channel must be encrypted end-to-end — from the customer's phone through the AI platform to the payment processor

  • PCI DSS 4.0.1 (effective March 2025) added specific requirements for voice channel security that most platforms haven't implemented
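
To give a feel for what "detect and redact" involves, here is a minimal sketch that masks card-like digit runs in a text transcript. It is an illustration only. Real systems have to suppress the audio itself, and the `redact_transcript` helper, the regular expression, and the masking format are assumptions rather than anything prescribed by PCI DSS.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:          # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact_transcript(text: str) -> str:
    """Mask any digit run that looks like a card number and passes Luhn."""
    def mask(match):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return "[REDACTED CARD ending " + digits[-4:] + "]"
        return match.group()        # leave order IDs and similar numbers alone
    return CARD_LIKE.sub(mask, text)
```

The checksum trick only works for the card number. A CVV is three or four digits with no checksum, so it cannot be reliably spotted after the fact, which is one reason compliant systems keep it out of the recording in the first place.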

Building PCI-compliant voice card capture from scratch is a 6-12 month project requiring specialist security engineering and annual PCI audits. For a voice AI startup that raised $5M to build ordering intelligence, spending half the runway on payment compliance infrastructure doesn't make sense.

So they don't. They fall back to SMS links or POS routing. And the payment experience stays broken.


The Underlying Problem: PSP Fragmentation

Beyond the payment capture challenge, voice AI ordering platforms face a structural issue that gets worse with scale: every restaurant uses a different payment processor.

A platform with 1,000 restaurant clients might have merchants on:

  • Toast Payments (bundled with Toast POS)

  • Square (bundled with Square POS)

  • Clover (bundled with Fiserv)

  • Stripe (independent restaurants using Stripe for online orders)

  • Worldpay (enterprise chains with negotiated rates)

  • Adyen (larger groups with multi-market operations)

If the voice AI platform processes payments through its own Stripe account, it adds a second set of processing fees alongside whatever the restaurant already pays its POS provider. The restaurant is now paying two processors: one for in-store orders and one for phone orders.

If the platform tries to route to each restaurant's existing processor, it needs to build and maintain integrations with every PSP its restaurants use. That's the multi-PSP problem — and it's why platforms in other verticals adopt a payment layer rather than building PSP integrations one at a time.

The food ordering vertical just hasn't caught up yet. Most voice AI companies are still in Pattern 1 or Pattern 2, treating payments as someone else's problem. The ones scaling to thousands of locations are starting to feel the pain.


What a Proper Solution Looks Like

The voice AI ordering payment problem has four requirements:

1. PCI-Compliant Payment Capture During the Call

The customer shouldn't leave the voice channel to pay. The payment capture should happen mid-conversation — the AI hands off to a secure payment environment, the customer enters their card details (via DTMF tones or a secure voice channel), and the conversation resumes.

This is exactly what voice payment infrastructure provides. The voice AI platform integrates once. PCI compliance is carried by the payment provider, not the AI company. Card data never touches the AI platform's systems.

The customer experience: "I'd like to pay by card." → Secure tones capture card details → "Payment confirmed. Your order will be ready in 20 minutes."

No SMS. No link. No channel switching.
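
The handoff can be sketched roughly as follows. This is a simplified model under stated assumptions: `SecureCaptureSession` and `CaptureResult` are hypothetical names standing in for the payment provider's side of the call, the `*` and `#` key meanings are invented, and the token is just a random string rather than anything issued by a real PSP.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaptureResult:
    """All the AI platform ever sees: an opaque token and the last four digits."""
    token: str
    last4: str


class SecureCaptureSession:
    """Hypothetical stand-in for the payment provider's secure environment."""

    def __init__(self, expected_length: int = 16) -> None:
        self._expected_length = expected_length
        self._digits = []

    def on_dtmf(self, key: str) -> Optional[CaptureResult]:
        """Feed one keypress; return a result once '#' ends a complete entry."""
        if len(key) == 1 and key in "0123456789":
            self._digits.append(key)
            return None
        if key == "*":                      # caller wants to start over
            self._digits.clear()
            return None
        if key == "#" and len(self._digits) == self._expected_length:
            pan = "".join(self._digits)
            self._digits.clear()            # do not keep the number around
            # A real provider would vault the card with the PSP at this point.
            return CaptureResult(token="tok_" + secrets.token_hex(8),
                                 last4=pan[-4:])
        return None
```

The design point is the return type. The AI platform pauses its own listening, waits for a `CaptureResult`, and resumes the conversation with a token it can charge, without the card number ever passing through its systems.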

2. Route to the Restaurant's Own PSP

When a customer pays for their order, the payment should process through whatever gateway the restaurant already uses — not through the voice AI platform's Stripe account.

A PSP-neutral payment layer makes this possible. The platform integrates once. Each restaurant merchant is configured with their existing processor. Payments route accordingly. The restaurant sees the transaction in their existing settlement reports, alongside their in-store and online orders.

This eliminates the double-processing problem and means the voice AI platform isn't in the money flow — which simplifies compliance, accounting, and merchant relationships.
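
A minimal sketch of the routing idea, assuming a hypothetical `PaymentRouter` class: the platform calls one `charge` method, and a per-merchant configuration decides which processor receives the payment. The handlers here are stand-ins that return strings. Real ones would call each processor's API, and the merchant IDs are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Charge:
    merchant_id: str
    amount_cents: int
    card_token: str   # opaque token from the capture step, never a raw card number


class PaymentRouter:
    """Hypothetical PSP-neutral router: one integration point, many processors."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Charge], str]] = {}
        self._merchant_psp: Dict[str, str] = {}

    def register_psp(self, name: str, handler: Callable[[Charge], str]) -> None:
        self._handlers[name] = handler

    def configure_merchant(self, merchant_id: str, psp_name: str) -> None:
        if psp_name not in self._handlers:
            raise ValueError(f"Unknown PSP '{psp_name}'")
        self._merchant_psp[merchant_id] = psp_name

    def charge(self, charge: Charge) -> str:
        """Send the charge to whichever PSP this merchant is configured for."""
        psp_name = self._merchant_psp.get(charge.merchant_id)
        if psp_name is None:
            raise LookupError(f"Merchant '{charge.merchant_id}' has no PSP configured")
        return self._handlers[psp_name](charge)


# Stand-in handlers for illustration only.
router = PaymentRouter()
router.register_psp("worldpay", lambda c: f"worldpay:{c.merchant_id}:{c.amount_cents}")
router.register_psp("adyen", lambda c: f"adyen:{c.merchant_id}:{c.amount_cents}")
router.configure_merchant("pizzeria_01", "worldpay")
router.configure_merchant("sushi_group_07", "adyen")
```

Adding a restaurant on a new processor becomes a configuration change rather than a new integration on the voice AI platform's side.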

3. Payment Links for Fallback and Chat Channels

Not every payment needs to happen over voice. Some customers prefer a link. Chat and messaging channels (WhatsApp, Instagram DMs) need payment links by default.

The payment links should route through the same PSP configuration as voice payments — so a restaurant's phone order and WhatsApp order both process through the same gateway, with the same fees, and appear in the same reporting.
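
One way to keep links on the same configuration is to have the link identify the merchant and order rather than a processor, so the processor is looked up at payment time exactly as it would be for a voice payment. The sketch below assumes that design. The `pay.example.com` URL, the parameter names, and the signing key are placeholders, and the HMAC signature is an added assumption to stop the amount being edited in transit.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SIGNING_KEY = b"demo-key-do-not-use"   # placeholder; load from a secret store in practice


def _sign(params: dict) -> str:
    """HMAC-SHA256 over the sorted, URL-encoded parameters."""
    message = urlencode(sorted(params.items())).encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def build_payment_link(merchant_id: str, order_id: str,
                       amount_cents: int, channel: str) -> str:
    """Build a signed link that names the merchant, not the processor."""
    params = {"m": merchant_id, "o": order_id,
              "a": str(amount_cents), "c": channel}
    params["sig"] = _sign(params)
    return "https://pay.example.com/l?" + urlencode(params)


def verify_payment_link(url: str) -> dict:
    """Return the link's parameters, or raise if the signature does not match."""
    query = {key: values[0]
             for key, values in parse_qs(urlparse(url).query).items()}
    signature = query.pop("sig", "")
    if not hmac.compare_digest(signature, _sign(query)):
        raise ValueError("Payment link was tampered with or is malformed")
    return query
```

Because the verified parameters contain only a merchant ID and a channel label, a WhatsApp link and a phone payment for the same restaurant resolve to the same processor and can be reported side by side.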

4. Platform-Level Reporting and Revenue Share

The voice AI platform needs visibility into payment data across all its restaurant clients — transaction volumes, success rates, channel breakdown (voice vs link vs chat). And it needs a commercial model: revenue share on payment transactions, turning payment infrastructure from a cost into a revenue line.
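
As a rough sketch of what that reporting could compute, the function below groups transactions by channel and derives a success rate and a platform share. The `Transaction` fields and the basis-point revenue share are assumptions for illustration. The actual commercial terms would be whatever the platform negotiates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    merchant_id: str
    channel: str        # "voice", "link", or "chat"
    amount_cents: int
    succeeded: bool


def channel_report(transactions, revenue_share_bps: int) -> dict:
    """Summarise attempts, success rate, volume, and platform share per channel."""
    buckets = defaultdict(lambda: {"attempts": 0, "successes": 0, "volume_cents": 0})
    for tx in transactions:
        bucket = buckets[tx.channel]
        bucket["attempts"] += 1
        if tx.succeeded:
            bucket["successes"] += 1
            bucket["volume_cents"] += tx.amount_cents

    report = {}
    for channel, bucket in buckets.items():
        report[channel] = {
            **bucket,
            "success_rate": bucket["successes"] / bucket["attempts"],
            # Integer maths: basis points of successful volume, rounded down.
            "platform_share_cents":
                bucket["volume_cents"] * revenue_share_bps // 10_000,
        }
    return report
```

Only successful payments count towards volume and revenue share, while failed attempts still count towards the success rate, which is what makes the voice versus link comparison meaningful.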


Why This Matters Now

Three things are converging:

Voice AI is scaling past the early adopter phase. SoundHound, ConverseNow, and others are in thousands of locations. At that scale, the SMS-link workaround creates measurable conversion loss. Platforms processing millions of orders per month can't afford the drop-off.

PCI DSS 4.0.1 is enforced. The updated standard (effective March 2025) tightened requirements for voice channel security. Platforms that were borderline compliant with voice card capture need to re-evaluate. Those that haven't attempted it face an even higher bar to build it themselves.

Restaurants are consolidating payment relationships. As POS systems bundle more services (Toast Capital, Square Loans, Clover Rapid Deposit), restaurants are increasingly locked into their POS provider's payment stack. Voice AI platforms that add yet another payment processor create friction. Platforms that route to the restaurant's existing processor remove it.

The voice AI companies that figure out payments will have a structural advantage. Every order that converts to a prepaid order (vs pay-at-pickup) is guaranteed revenue for the restaurant. Platforms that enable prepayment capture more of that value, and can monetise the payment flow.


FAQ

Can voice AI platforms use Stripe Connect for restaurant payments?

Stripe Connect lets platforms process payments on behalf of sub-merchants (restaurants). But every transaction routes through Stripe — which doesn't work when a restaurant already uses Toast Payments, Square, or Worldpay. Stripe Connect is a single-PSP solution. Voice AI platforms serving diverse restaurant merchants need multi-PSP flexibility. See when platforms outgrow Stripe Connect.

What about Twilio <Pay> for voice card capture?

Twilio's `<Pay>` verb handles DTMF-based card capture during a call. It's a building block, not a complete solution. It connects to a limited set of processors through Pay Connectors, doesn't provide merchant onboarding, and leaves PCI scope management to the platform. A payment layer built on Twilio's voice infrastructure provides the same DTMF capture with multi-PSP routing, white-label merchant management, and PCI compliance included.
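
For reference, a `<Pay>` instruction is a small piece of TwiML. The sketch below builds one with Python's standard XML library rather than Twilio's helper SDK. `chargeAmount`, `paymentConnector`, and `action` are `<Pay>` attributes, while the function name, connector name, and callback URL are placeholders.

```python
import xml.etree.ElementTree as ET


def build_pay_twiml(amount: str, connector_name: str, action_url: str) -> str:
    """Return a TwiML document asking Twilio to collect a card payment."""
    response = ET.Element("Response")
    ET.SubElement(response, "Pay", {
        "chargeAmount": amount,              # decimal string, e.g. "18.50"
        "paymentConnector": connector_name,  # connector configured in the Twilio account
        "action": action_url,                # where Twilio posts the outcome
    })
    return ET.tostring(response, encoding="unicode")


twiml = build_pay_twiml("18.50", "Example_Connector",
                        "https://example.com/pay-result")
```

Note the single `paymentConnector` value. Each `<Pay>` instruction targets one connector, so choosing the right processor per restaurant is logic the platform has to supply around it.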

How does this work for franchise operations?

A franchise with 200 locations might have 5 different POS systems and 3 different payment processors across its estate. A PSP-neutral payment layer configures each location with its assigned processor. The voice AI platform integrates once. Payments at each location route to the correct gateway. The franchisor gets group-level reporting. See enterprise PSP mandates for more on franchise payment complexity.

What about drive-thru voice AI (Taco Bell, Wendy's)?

Drive-thru payment is handled at the window via card terminal — the AI takes the order, the hardware takes the payment. The voice payment problem is specific to phone ordering and off-premise channels where the customer isn't physically present. That said, drive-thru AI platforms expanding into phone ordering (which many are) will hit the same payment wall.

Do ghost kitchens have different payment needs?

Yes. A ghost kitchen operating 5 virtual brands from one physical kitchen needs payments routed to the correct brand entity — potentially different merchant accounts, different PSPs, different settlement. Processing all brands through one account creates accounting and tax complexity. A payment layer with sub-merchant routing handles this cleanly.
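
A minimal sketch of brand-level routing, assuming each virtual brand maps to its own merchant account. The brand names, merchant IDs, and the `split_by_brand` helper are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BasketItem:
    brand: str
    name: str
    price_cents: int


# Hypothetical mapping from virtual brand to its own merchant account.
BRAND_MERCHANT = {
    "burger-brand": "merchant_burgers",
    "wings-brand": "merchant_wings",
}


def split_by_brand(items) -> dict:
    """Total the items per brand merchant account, one charge amount each."""
    totals = defaultdict(int)
    for item in items:
        merchant = BRAND_MERCHANT.get(item.brand)
        if merchant is None:
            raise ValueError(f"No merchant account for brand '{item.brand}'")
        totals[merchant] += item.price_cents
    return dict(totals)
```

Each resulting amount can then be charged against that brand's own merchant account and processor, so settlement and tax reporting stay separated by entity.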



Building a voice AI ordering platform?

Shuttle gives food ordering platforms PCI-compliant voice payment capture, payment links, and multi-PSP routing through a single integration. Each restaurant uses their existing processor. Your platform earns revenue share. No PCI burden on your team.
