Biometric injection attacks: how deepfakes slip past video KYC

[Image: digital facial recognition scan with biometric overlay, representing video identity verification]

In Q1 2026, biometric authentication systems faced a new kind of attacker — one that doesn't show up in front of a camera. Industry research now finds that one in five biometric fraud attempts involves deepfake manipulation, and a fast-growing share of those attempts bypass the camera entirely. Instead of holding up a printed photo or playing video on a phone, attackers are feeding pre-rendered deepfakes straight into authentication APIs through virtual camera drivers — a technique known as a biometric injection attack. For banks, fintechs, and any business running remote KYC, it changes what a “liveness check” actually has to mean.

What is a biometric injection attack?

Traditional presentation attacks try to fool a camera with something physical: a printed photo, a 3D mask, a phone screen showing video. Liveness detection has gotten reasonably good at spotting those.

Injection attacks skip that battle altogether. The attacker installs virtual webcam software on the device running the verification flow, then pipes a synthesized video — often a deepfake of a real customer assembled from leaked KYC data and social media images — directly into the camera feed. The authentication system never sees a real face, but it doesn't know that. To the API, the bytes look like any other live capture.
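
To see the problem concretely, consider the capture call itself. The minimal browser-side sketch below uses only the standard navigator.mediaDevices API; it illustrates the attack surface and is not any particular vendor's SDK:

```typescript
// A typical browser-side capture call in a web KYC flow. A virtual camera
// driver satisfies it exactly like a hardware webcam: the returned
// MediaStream carries whatever frames the driver emits, deepfake included.
async function startCapture(): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { width: 1280, height: 720 },
  });
  // Everything downstream (preview, encoding, upload to the verification
  // API) sees only pixels; nothing here reveals where they came from.
  return stream;
}
```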

According to a 2026 Biometric Update analysis, injection-based attacks are now growing faster than every other category of biometric fraud, partly because the tooling has been productized. Deepfake-as-a-Service marketplaces sell ready-to-go video clones for $10–$50 and full synthetic identity packages for around $15.

Why this attack vector is exploding right now

Three things converged in the last twelve months to make injection attacks a mass-market threat:

  • Generative video has crossed the realism threshold. A few seconds of public audio or video is now enough to train a clone capable of fooling most consumer-grade liveness models.
  • Crime-as-a-service has lowered the skill floor. Cybercriminals can rent deepfake images, synthetic identities, cloned voices, and biometric datasets for as little as $5.
  • Remote onboarding is now the default. The same video KYC flows that made digital banking accessible during the pandemic are the surface area attackers are now industrializing.

The result: deepfake files grew from roughly 500,000 in 2023 to 8 million in 2025, and fraud attempts leveraging deepfake content have climbed more than 2,000% over three years.

Where existing defenses fall short

Most deployed liveness systems were designed for a world where the attacker had to physically defeat a camera. Injection attacks invalidate three of their core assumptions at once.

Camera trust is no longer a given

Browser and mobile camera APIs treat any registered device as legitimate. Virtual camera drivers, many of them perfectly ordinary tools used by streamers and remote workers, register exactly the same way as a real webcam. The capture API exposes both identically, and most authentication SDKs don't probe any deeper.
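
One partial mitigation is to enumerate capture devices and flag labels associated with common virtual camera drivers. This is a heuristic sketch only: labels are self-reported and trivially renamed, and the pattern list here is illustrative, not exhaustive.

```typescript
// Heuristic check: flag video inputs whose labels match common virtual
// camera drivers. A match is a risk signal; an empty result proves nothing.
const VIRTUAL_CAMERA_HINTS = [/obs/i, /manycam/i, /virtual/i, /snap camera/i];

async function flagSuspiciousCameras(): Promise<string[]> {
  // Note: device labels are only populated after the user grants
  // camera permission via getUserMedia.
  const devices = await navigator.mediaDevices.enumerateDevices();
  return devices
    .filter((d) => d.kind === "videoinput")
    .filter((d) => VIRTUAL_CAMERA_HINTS.some((re) => re.test(d.label)))
    .map((d) => d.label);
}
```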

Texture and motion cues can be synthesized

Skin micro-texture, eye reflections, head-pose dynamics — the signals first-generation liveness models rely on — are exactly the cues today's diffusion-based generators have been trained to reproduce. They're now table stakes for a credible deepfake.

One channel is no longer enough

A model that only inspects video pixels has nothing to fall back on when those pixels are synthetic. Defenses that fuse multiple modalities, checking what the user looks like, what they sound like, and whether the two match, present attackers with a much harder problem.
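
As a rough illustration of the voice-to-face idea, assume two upstream models (not shown) that embed a face crop and a voice clip into a shared space, as in cross-modal voice-face matching research; consistency then reduces to a similarity threshold. The threshold value below is an assumption, not a published figure:

```typescript
// Hypothetical voice-to-face consistency check over shared embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function voiceMatchesFace(
  faceEmbedding: number[],
  voiceEmbedding: number[],
  threshold = 0.6, // illustrative; would be calibrated on labeled sessions
): boolean {
  return cosineSimilarity(faceEmbedding, voiceEmbedding) >= threshold;
}
```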

What a layered defense looks like

Stopping an injection attack means catching the deepfake in the signal itself, not waiting for the camera to give it away. That's where multi-modal detection comes in; a fusion sketch follows the list:

  • Deepfake detection on the media stream — frame-level and audio-level analysis that looks for the statistical fingerprints of generative models, not just for “liveness.”
  • Voice-to-face consistency — verifying that the voice in a session is biometrically consistent with the face on screen, without needing a pre-enrolled voiceprint.
  • Device and capture-path telemetry — flagging virtual cameras, screen captures, and re-encoded streams before they ever reach the model.
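
A minimal sketch of how those three layers might be fused into a single session decision, assuming hypothetical upstream detectors that emit the scores named below; the weights and cutoffs are illustrative, not tuned values:

```typescript
// Hypothetical fusion of the three layers into one session decision.
interface SessionSignals {
  deepfakeScore: number;  // 0..1, generative-artifact analysis of the stream
  voiceFaceMatch: number; // 0..1, cross-modal voice-to-face consistency
  captureRisk: number;    // 0..1, device and capture-path telemetry
}

type Decision = "pass" | "review" | "reject";

function decide(s: SessionSignals): Decision {
  // A single strong signal is enough to reject: the layers are
  // independent checks, not votes that average each other away.
  if (s.deepfakeScore > 0.9 || s.captureRisk > 0.9) return "reject";
  // Otherwise combine into a weighted risk score and route borderline
  // sessions to manual review instead of silently passing them.
  const risk =
    0.5 * s.deepfakeScore + 0.3 * (1 - s.voiceFaceMatch) + 0.2 * s.captureRisk;
  if (risk > 0.6) return "reject";
  if (risk > 0.3) return "review";
  return "pass";
}
```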

The throughline is that no single check is sufficient anymore. Layered detection is now the price of doing remote identity verification at all.

The bottom line for fraud and security teams

Injection attacks are not a future scenario. They are the dominant growth category in biometric fraud right now, and the economics — $5 for a synthetic identity, free virtual camera drivers, and minutes of effort per attempt — guarantee they keep scaling. Banks, fintechs, telcos, and HR platforms running remote verification need to assume that any camera feed could be synthetic and design their controls accordingly.

Corsound AI's Deepfake Detect was built for exactly this threat model: real-time detection of synthetic audio and video on live streams, with multi-modal signal analysis that catches what camera-only liveness misses. If your KYC, customer-support, or remote-onboarding flows still depend on a single video channel, talk to our team about closing the injection-attack gap.

Photo: cottonbro studio / Pexels
