The $40 billion deepfake fraud forecast: where banks need to defend now

A finance employee at a global engineering firm joined what looked like a routine video call with the company's CFO and several colleagues. The call ended with a transfer of HK$200 million (roughly $25.6 million) to five accounts the employee had never seen before. Every person on that call, except the employee, was an AI-generated deepfake. The incident, at engineering firm Arup, is no longer an outlier. According to projections from Deloitte's Center for Financial Services, echoed at the 2026 Munich Cyber Security Conference, AI-enabled fraud losses in the United States are on track to reach $40 billion by 2027, up from roughly $12 billion in 2023.
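To put that forecast in perspective, the two figures imply a steep compound growth rate. The short calculation below is only arithmetic on the rounded numbers quoted above ($12 billion in 2023, $40 billion in 2027); the function name is illustrative and not part of any published methodology.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a span in years."""
    return (end / start) ** (1 / years) - 1


# Rounded figures from the forecast quoted above (assumed, not exact).
LOSSES_2023 = 12e9
LOSSES_2027 = 40e9

rate = implied_annual_growth(LOSSES_2023, LOSSES_2027, 2027 - 2023)
print(f"Implied annual growth: {rate:.1%}")
```

On these rounded inputs the implied growth is roughly 35% a year, meaning losses would more than triple over four years.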

A fraud category that outpaced its defenses

In the past 18 months, three forces converged to push voice and video impersonation into the mainstream of financial crime. Generative voice tools now need only three to five seconds of sample audio to clone a target. Real-time face-swapping models became fast enough to run inside ordinary video conferencing software. And criminal marketplaces packaged the whole stack into "fraud-as-a-service" offerings priced in the low thousands of dollars. The result is a threat that scales the way phishing did a decade ago, but with a far higher conversion rate, because the victim hears a familiar voice or sees a trusted face.

Recent industry reporting illustrates the scale. The State of the Call 2026 survey found that one in four Americans received a deepfake voice call in the past 12 months, and the FBI now classifies deepfake-enabled business email compromise as one of the fastest-growing fraud categories targeting US enterprises, with average losses of roughly $680,000 per incident.

Where banks are most exposed

Banking and financial services sit at the centre of the threat for a simple reason: every channel that uses a human voice or face as a trust signal is now potentially compromised.

Call-centre voice authentication

Many institutions still rely on voiceprints or knowledge-based verification when a customer phones in. Cloning tools now need only a brief audio sample, which is easy to harvest from social media, podcasts, or voicemail greetings, so legacy voiceprint checks can be defeated with replayed or synthesised audio by an attacker who has never met the customer. Detecting synthetic voice in real time is now the baseline expectation, not a future capability.

Remote onboarding and KYC

Account-opening flows that record a short selfie video or a "say this phrase" liveness check were designed to stop replay attacks, not generative ones. Off-the-shelf face-swap tools have been shown in independent testing to defeat several widely deployed liveness providers under lab conditions. Banks running remote KYC at scale need detection that targets generative artefacts, not just basic replay.

Internal executive impersonation

The Arup case is the headline example, but every multinational with public-facing executives is exposed. Voice cloning of a CEO, paired with a spoofed caller ID or a compromised email thread, is now a documented pattern in fraud teams' incident reports.

What "good enough" detection looks like in 2026

A defensible posture today rests on three layers working together:

Multimodal biometric checks that combine voice, face, and behavioural signals — so a successful deepfake on one channel is caught by inconsistency on another.
Generative artefact detection trained on the latest open-source and commercial voice and video models, refreshed often, with explainable signals fraud analysts can act on.
Identity correlation across modalities — for example, confirming that a voice and a face actually belong to the same person, even when no prior biometric template exists.

The last point is critical because attackers increasingly mix and match. A real face, a cloned voice. A real voice, a deepfaked face. Standalone detectors miss the seam. Cross-modal identity checks close it.
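The way the three layers cover for each other can be sketched in a few lines. This is a minimal illustration, not a description of any vendor's product: the score names, the 0-to-1 scale, and every threshold are assumptions, and the scores themselves would come from upstream detectors that are not shown here.

```python
from dataclasses import dataclass


@dataclass
class ChannelSignals:
    """Hypothetical scores in [0, 1] produced by upstream detectors (assumed, not a real API)."""
    voice_synthetic: float    # likelihood the audio is machine-generated
    face_synthetic: float     # likelihood the video is machine-generated
    behaviour_anomaly: float  # how unusual the session behaviour looks
    voice_face_match: float   # likelihood the voice and face belong to the same person


def assess(signals: ChannelSignals,
           artefact_threshold: float = 0.7,
           anomaly_threshold: float = 0.8,
           match_threshold: float = 0.5) -> list[str]:
    """Return the reasons a session should be escalated; an empty list means no flag."""
    reasons = []
    if signals.voice_synthetic >= artefact_threshold:
        reasons.append("synthetic voice artefacts")
    if signals.face_synthetic >= artefact_threshold:
        reasons.append("synthetic video artefacts")
    if signals.behaviour_anomaly >= anomaly_threshold:
        reasons.append("behavioural anomaly")
    # Cross-modal check: runs even when both single-channel detectors pass.
    if signals.voice_face_match < match_threshold:
        reasons.append("voice and face do not match")
    return reasons


# A real face paired with a cloned voice that slips under the voice detector's threshold.
mixed = ChannelSignals(voice_synthetic=0.4, face_synthetic=0.1,
                       behaviour_anomaly=0.2, voice_face_match=0.2)
print(assess(mixed))
```

In this example neither standalone detector fires, yet the session is still escalated because the voice and face do not correlate. That is the seam the cross-modal layer is meant to close.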

A practical checklist for fraud and security teams

For teams reviewing their 2026 controls, a useful starting point:

Inventory every channel where voice or face acts as a trust signal — call centre, video KYC, internal authorisation flows, fraud reviews.
Test current providers against the latest generation of voice cloning and face-swap tools, not last year's. Refresh that test quarterly.
Add an out-of-band step — a callback to a registered number, a second approver, or a transaction freeze — for any voice- or video-only authorisation above a defined threshold.
Brief executives whose voices and faces are public on the realistic threat, including which channels should never be trusted for instructions.
Track regulatory movement. The proposed AI Fraud Accountability Act of 2026 (S.3982) would create federal penalties for digital impersonation with intent to defraud, a signal that compliance teams should prepare for tighter audit requirements.
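The out-of-band item in the checklist is the easiest one to express as an explicit rule. The sketch below is one hedged way to encode it: the channel names, the `AuthorisationRequest` type, and the 10,000 threshold are all placeholders that a real institution would replace with its own policy values.

```python
from dataclasses import dataclass

# Channels where a voice or face is the only trust signal (assumed labels).
VOICE_VIDEO_CHANNELS = frozenset({"phone", "video_call"})


@dataclass(frozen=True)
class AuthorisationRequest:
    """Hypothetical record of a payment instruction and how it was confirmed."""
    amount: float
    channels: frozenset  # every channel over which the instruction was confirmed


def needs_out_of_band(request: AuthorisationRequest, threshold: float = 10_000.0) -> bool:
    """True when a request above the threshold was confirmed by voice or video alone."""
    voice_or_video_only = bool(request.channels) and request.channels <= VOICE_VIDEO_CHANNELS
    return voice_or_video_only and request.amount > threshold


# A large transfer instructed only on a video call, as in the Arup incident.
video_only = AuthorisationRequest(amount=25_600_000, channels=frozenset({"video_call"}))
print(needs_out_of_band(video_only))
```

A request that also carries a confirmation outside voice and video, such as a callback to a registered number logged under its own channel label, would not trip the rule. A request with no recorded channel at all is left for other controls to catch.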

The institutions winning against this threat are not the ones with the loudest detection vendor. They are the ones treating voice and face as evidence to be verified, not assumed. Corsound AI's Deepfake Detect and Voice-to-Face AI were built for exactly this problem: real-time detection of synthetic voice and video, plus cross-modal verification that a voice and face genuinely belong to the same person, without a pre-existing database.

Photo: RDNE Stock project / Pexels
