Snapshot caveat: Detection-tool accuracy and the adoption of provenance and watermarking evolve quickly. Reflects June 2026.
Deepfakes and Detecting AI-Generated Media
In Short
A deepfake is an image, sound, video, or block of text that AI has altered or synthesized to look authentic. Neither ordinary eyes nor automated detectors reliably catch the best ones, so the durable defense is verifying where media came from rather than hunting for visual flaws. Most of what you see is not fake, and the everyday risks of scam calls and fraudulent payments are defused by slowing down and confirming the source through a second channel.
01. What It Is
Synthetic media is any image, audio, video, or text produced or substantially altered by AI. A deepfake is the common name for a convincing fake of a specific real person, though it now also covers fully invented people and scenes. The word comes from the username of a Reddit poster who shared face-swapped videos in 2017, and its meaning has widened since.
The material spans a spectrum, from lightly edited footage, to a face swap or voice clone, to a fully generated person who never existed, to AI-written text. All of it is built to read as authentic.
02. Why It Matters
Most media is authentic, but the real damage is concentrated in a few serious areas.
Fraud is the fastest-moving everyday risk. Criminals clone a relative's voice from public audio for an emergency call, or impersonate an executive to authorize a payment. Deloitte's Center for Financial Services estimated US generative-AI fraud losses could reach US$40 billion by 2027, up from US$12.3 billion in 2023.
More on these schemes is in AI Scams and Fraud.
Information and elections are a second area, where a cloned voice or fabricated clip can outrun any correction. By volume, though, the dominant harm is non-consensual sexual imagery. Tracking by Sensity and others finds that most deepfake videos online are non-consensual sexual content, with reported shares ranging from about 90% to 98% depending on the year, and almost all of it targets women. A subtler cost is the liar's dividend, where real evidence gets waved away as "probably AI."
03. How It Works
The types
Face swap and reenactment put one person's face or expressions onto another's footage, and they are behind most political clips and video-call scams. Voice cloning recreates a voice from a few seconds of public audio. Fully generated media invents a person or voice from nothing, and AI-written text rounds out the set.
How they are made
Two engines dominate. A GAN, or generative adversarial network, pits a generator that makes fakes against a discriminator that tries to catch them, and the two train against each other until the fakes pass. Diffusion starts from random noise and removes that noise step by step, steering the result toward what a prompt describes.
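The diffusion loop can be sketched in a few lines. This is a toy, not a working image generator: the "image" is four numbers, the noise schedule `alpha_bar` is an arbitrary choice, and the hypothetical `oracle_noise` function stands in for the trained neural network that a real system uses to predict the noise in its current sample.

```python
import math
import random


def alpha_bar(t: int, steps: int) -> float:
    """Toy noise schedule: 1.0 means clean (t=0), near 0 means almost pure noise."""
    return 1.0 - t / (steps + 1)


def oracle_noise(x_t, target, t, steps):
    """Hypothetical stand-in for the trained network: returns the exact noise in x_t."""
    a = alpha_bar(t, steps)
    return [(x - math.sqrt(a) * y) / math.sqrt(1.0 - a) for x, y in zip(x_t, target)]


def sample(target, steps=50, seed=0):
    """Start from pure noise and remove a little of it at every step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # random starting point
    for t in range(steps, 0, -1):
        a_now, a_next = alpha_bar(t, steps), alpha_bar(t - 1, steps)
        eps = oracle_noise(x, target, t, steps)
        # Estimate the clean signal, then re-noise it to the next, lower noise level.
        x0 = [(xi - math.sqrt(1.0 - a_now) * ei) / math.sqrt(a_now) for xi, ei in zip(x, eps)]
        x = [math.sqrt(a_next) * c + math.sqrt(1.0 - a_next) * e for c, e in zip(x0, eps)]
    return x


target = [0.2, 0.8, 0.5, 0.1]  # stands in for "what the prompt asks for"
result = sample(target)
print([round(v, 6) for v in result])
```

Because the oracle knows the answer, the loop lands on the target. A real model only approximates the noise, which is why its outputs are plausible rather than exact.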
The mechanics and current tools are covered in Diffusion and Image Generation.
Why detectors are unreliable
Automated detection breaks down where it is needed most. On Deepfake-Eval-2024, a benchmark built from fakes circulating online in 2024, open-source state-of-the-art detectors did far worse than on older lab tests, with AUC falling about 50% for video, 48% for audio, and 45% for images. Compression and re-sharing wreck the clean conditions the detectors were tuned on.
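AUC is the chance that a detector gives a randomly chosen fake a higher score than a randomly chosen real item, so 1.0 is perfect and 0.5 is a coin flip. The sketch below computes it directly from that definition. The labels and scores are made up for illustration and are not taken from Deepfake-Eval-2024.

```python
def auc(labels, scores):
    """Probability that a random fake outscores a random real item (ties count half)."""
    fakes = [s for y, s in zip(labels, scores) if y == 1]
    reals = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if f > r else 0.5 if f == r else 0.0 for f in fakes for r in reals)
    return wins / (len(fakes) * len(reals))


labels = [1, 1, 1, 0, 0, 0]  # 1 = fake, 0 = real (made-up data)
lab_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # clean benchmark: fakes score high
wild_scores = [0.6, 0.4, 0.5, 0.5, 0.3, 0.6]  # re-shared media: scores overlap

print(auc(labels, lab_scores))   # 1.0
print(auc(labels, wild_scores))  # about 0.56, barely better than a coin flip
```

Once the scores for real and fake items overlap, no threshold separates them cleanly, whatever cutoff the tool's operator picks.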
False positives are the other failure. A Stanford study ran seven AI-text detectors over TOEFL essays by non-native English speakers. The detectors flagged an average of 61.22% of the essays as AI-written, and at least one tool flagged about 97% of the 91 essays, while the same detectors stayed near-perfect on native-speaker writing. A detector score is a weak signal, never a verdict, and should not by itself decide a grade or justify an accusation.
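A short calculation with Bayes' rule shows why a flag is weak evidence when the false-positive rate is high. Only the 61.22% false-positive rate comes from the study above. The 10% base rate of AI-written essays and the 90% true-positive rate are assumptions chosen for illustration, so the result is a sketch of the reasoning, not a measured figure.

```python
def chance_flag_is_right(base_rate, true_positive_rate, false_positive_rate):
    """Bayes' rule: P(text is AI-written | detector flagged it)."""
    flagged_ai = true_positive_rate * base_rate
    flagged_human = false_positive_rate * (1.0 - base_rate)
    return flagged_ai / (flagged_ai + flagged_human)


# 61.22% is the study's false-positive rate for non-native writers.
# The 10% base rate and 90% true-positive rate are illustrative assumptions.
p = chance_flag_is_right(base_rate=0.10, true_positive_rate=0.90, false_positive_rate=0.6122)
print(f"{p:.0%}")  # 14%
```

Under these assumptions, roughly six of every seven flagged essays would be human-written.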
Provenance and watermarks
The better path is provenance, recording where content came from. Content Credentials are the C2PA standard for it. A credential (a manifest) records how an asset was made and edited, including any AI use, and is signed so any later change breaks the signature. It is an open, royalty-free standard, not DRM. The limit is that the manifest usually rides inside the file, so it can be stripped, and many platforms re-encode uploads and drop it. C2PA's fallback is a soft binding, an invisible watermark that tries to relink a stripped file to its credential. A credential proves what a tool recorded, not that the content is true.
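The tamper-evidence idea can be sketched with Python's standard library. This is a simplified stand-in, not the C2PA format: real Content Credentials use certificate-based signatures and a defined manifest structure, whereas this sketch uses an HMAC with a hypothetical shared key (`SIGNING_KEY`) and a plain dictionary. The function names are also invented for illustration.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # hypothetical; real credentials use certificate-based signatures


def make_credential(asset: bytes, history: list) -> dict:
    """Bind an edit history to the asset's hash, then sign the pair."""
    claim = {"asset_sha256": hashlib.sha256(asset).hexdigest(), "history": history}
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify(asset: bytes, credential) -> str:
    if credential is None:
        return "no credential"  # stripped or never attached: proves nothing
    payload = json.dumps(credential["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, credential["signature"]):
        return "credential altered"
    if hashlib.sha256(asset).hexdigest() != credential["claim"]["asset_sha256"]:
        return "asset changed after signing"
    return "intact"


photo = b"original pixels"
cred = make_credential(photo, ["captured by camera", "AI background fill"])

print(verify(photo, cred))             # intact
print(verify(b"edited pixels", cred))  # asset changed after signing
print(verify(photo, None))             # no credential
```

An "intact" result confirms only that the recorded history matches the file. It does not say the scene in the photo is true.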
Invisible watermarks work differently. Google DeepMind's SynthID embeds an imperceptible signal at creation in content generated by Google's own products (Gemini, Imagen, Lyria, Veo), and the signal is designed to survive edits like cropping and compression. By Google's May 2025 figure it had marked over 10 billion pieces of content. But a SynthID detector recognizes only its own signature, so it says nothing about media from other tools, and watermarks are breakable. University of Maryland researchers broke every watermark they tested and could even forge a fake one onto a real image, and the authors of one removal tool reported stripping about 79% of SynthID marks in a test Google disputes. Even so, adoption of provenance tools is spreading, with the Content Authenticity Initiative past 5,000 members by August 2025.
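Why a detector recognizes only its own signature is easier to see in a toy example. The sketch below uses a classic keyed spread-spectrum watermark, which is a textbook technique and not how SynthID works internally. The pixel values, keys, strength, and threshold are all made up for illustration.

```python
import random


def pattern(key: int, n: int) -> list:
    """Keyed pseudo-random +1/-1 pattern; only the key holder can regenerate it."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(signal, key, strength=2.0):
    """Add a faint copy of the keyed pattern to every sample."""
    return [s + strength * p for s, p in zip(signal, pattern(key, len(signal)))]


def detect(signal, key, threshold=1.0) -> bool:
    """Correlate the mean-centered signal with the keyed pattern."""
    p = pattern(key, len(signal))
    mean = sum(signal) / len(signal)
    score = sum((s - mean) * q for s, q in zip(signal, p)) / len(signal)
    return score > threshold


rng = random.Random(42)
image = [rng.gauss(128.0, 10.0) for _ in range(10_000)]  # made-up "pixel" values
marked = embed(image, key=1234)

print(detect(marked, key=1234))  # True: the matching key finds its own mark
print(detect(marked, key=9999))  # False: another vendor's detector sees nothing
print(detect(image, key=1234))   # False: unmarked content carries no signal
```

A negative result therefore means only "no mark from this key," which is why it cannot clear media made with other tools.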
The deeper legal picture, including EU marking rules, is in Copyright, IP, and Data Provenance in AI.
04. How to Spot or Verify
Begin from a realistic baseline. In a peer-reviewed study, people scored about 48%, no better than chance, at telling AI faces from real photos and rated the fakes 7.7% more trustworthy. A 2025 iProov test of 2,000 UK and US consumers found that only 0.1% correctly classified every real and fake image and video, even after being warned to look for fakes. If neither people nor tools reliably catch the best fakes, the fix must be procedural, not perceptual.
- Slow down. Urgency is the scammer's main lever, so pressure to act now is itself a warning sign.
- Check source and context. Who posted it, where it first appeared, and whether a reputable outlet carries it.
- Reverse image search a still to find the original or an earlier version.
- Confirm through a second channel. Call back on a number you already have. For a voice-clone emergency call, the FTC advises hanging up and calling back, with a family code word agreed in advance.
- Check Content Credentials at contentcredentials.org/verify. Finding none is normal for almost all media and proves nothing.
- Treat visual cues as weak and expiring. MIT Media Lab's Detect Fakes lists eight, from skin and eye shadows to glasses glare and blinking, while stressing there is "no single tell-tale sign."
The shift that matters is from "do the pixels look fake" to "can I confirm where this came from."
05. Key Terms
| Term | What it means |
|---|---|
| Deepfake / synthetic media | Media produced or substantially altered by AI. A deepfake usually means a convincing fake of a real person. |
| Face swap and reenactment | One person's face or expressions placed onto another's footage so the target appears to say what an attacker scripts. |
| Voice cloning | Recreating a voice from a short sample. A few seconds of public audio can be enough. |
| GAN (generative adversarial network) | A generator makes fakes while a discriminator catches them, until the output passes. The engine behind early fake faces. |
| Provenance / C2PA Content Credentials | A file's recorded origin and edit history, attached as a cryptographically signed, tamper-evident record anyone can inspect. |
| Invisible watermark (e.g., SynthID) | A signal hidden in content at generation, imperceptible to people but readable by a detector. Not a visible logo or metadata. |
| Liar's dividend | In a fake-aware world, real footage can be dismissed as "probably AI," letting wrongdoers dodge real evidence. |
06. Examples
Arup video call (confirmed May 2024):
The engineering firm Arup confirmed that an employee in Hong Kong made 15 transfers worth about US$25 million after a video meeting earlier that year in which the "CFO" and colleagues were all AI-generated.
New Hampshire robocall (January 2024):
Days before the primary, an AI-cloned "Joe Biden" call told voters to stay home. The FCC ruled that AI-generated voices in robocalls are illegal under the Telephone Consumer Protection Act, and it finalized a US$6 million forfeiture against the operative behind the call for spoofed caller ID under the Truth in Caller ID Act.
TAKE IT DOWN Act:
The Act, signed May 19, 2025, criminalizes publishing non-consensual intimate images, including AI "digital forgeries," and gives covered platforms 48 hours to remove a flagged image. Platforms had until May 19, 2026 to put that removal process in place.
07. Common Misconceptions
"There is always a visual tell, like six fingers or bad teeth."
Those artifacts were real but are a moving target, fixed by each new model generation, so the best fakes now pass a careful look.
"A high detector score means it is fake."
Detectors carry high false-positive rates and degrade on compressed, re-shared media. They are also biased against non-native English writers. A flag is a weak signal, not proof.
"A credential proves an image is real, and no credential means it is fake."
A credential records only what a tool logged about creation and edits, not whether the content is accurate. Absence of one is normal and proves nothing.
"Watermarks like SynthID catch any AI content."
SynthID only flags content from Google's own tools carrying its signature, and researchers have shown invisible watermarks can be stripped or forged.
"Deepfakes are mainly a celebrity and election problem."
By volume the dominant harm is non-consensual sexual imagery of women, while the fastest-growing financial harm hits ordinary people through voice-clone and fake-executive fraud.
"If I cannot spot it, I am gullible and helpless."
Experts and detectors also fail on the best fakes, so this is not a personal weakness. The procedural routine above is the defense, used before you act or share.