Please ensure Javascript is enabled for purposes of website accessibility
Back

Thousands of Hackers Red-Team Generative AI Models at DEF CON 31

Yinon Douchan
,
AI Researcher
August 29, 2023

DEF CON, a Las Vegas fixture, has long been the proving ground for cybersecurity's cutting edge. This year, the focus shifted to artificial intelligence. The Generative Red Team Challenge, with the White House's involvement, aimed to expose the weaknesses of leading large language models (LLMs) before they could be exploited in the wild.

The objective was clear: to simulate real-world adversarial attacks, uncovering biases, harmful outputs, and security flaws that could have far-reaching consequences. The models involved, kept largely under wraps for security purposes, were subjected to many attacks.

The Scale of the Generative Red Team Challenge

The scale of the operation was substantial. Participants, ranging from seasoned cybersecurity professionals to newcomers, employed a variety of techniques. Prompt injection, data exfiltration attempts, and "jailbreaking" were among the arsenal used.

One of the prominent attack vectors was prompt injection, where hackers manipulated user inputs to override the model's safety guidelines. By carefully crafting prompts, they could coax the AI into generating harmful or biased content, bypassing built-in safeguards. The challenge exposed the difficulty of making AI models truly safe.

Data Exfiltration and Bias Concerns

Data exfiltration, the attempt to extract sensitive information, proved to be another area of concern. Hackers explored ways to retrieve training data or other confidential details, raising alarms about potential data leakage.

The challenge also highlighted the persistent issue of bias and toxicity. AI models, trained on vast datasets, can inadvertently perpetuate societal biases, generating discriminatory or offensive outputs. Participants documented numerous instances of models producing biased or harmful content, underscoring the ethical challenges of AI development.

Jailbreaking and Information Disclosure

"Jailbreaking," the act of tricking an AI into ignoring its programming, was another key tactic. Hackers found ways to manipulate the models into providing information they were supposed to withhold, or to perform actions they were explicitly forbidden from doing.

Also, some participants were able to get the AI models to reveal information about their training data, and other internal information.

The implications of these findings are important. The AI industry is now facing the reality of these vulnerabilities. Developers are under pressure to strengthen their models' defenses, implementing more robust safety measures and addressing the root causes of bias.

Governments and policymakers are also taking notice, recognizing the need for regulations and guidelines to ensure responsible AI development and deployment. Ethical considerations are important to consider. As AI becomes more integrated into our lives, we must ensure that these systems are safe, fair, and transparent.

-

“As GenAI threats are emerging, organizations implementing AI like chatbots or decision engines can use the DEF CON findings as a checklist of what to guard against when using generative AI.”

‍

Latest AI Deepfake articles

Deepfake Investment Scams Are Exploding—And the Stakes Just Got Personal

Over the past few weeks, my feed has been flooded with "exclusive" video pitches featuring familiar faces like Gal Gadot, Dovi Frances, Yasmin Lukatz, Eyal Valdman, and even Warren Buffett. Each video promises extraordinary returns from a supposedly exclusive investment fund. The presentations are incredibly polished, flawlessly lip-synced, and convincingly authentic.

The only problem? None of these videos are real.

Why Does This Matter?

  • Hyper-Realism on Demand: Advanced generative AI now easily replicates faces, voices, and micro-expressions in real-time.
  • Massive Reach: Fraudsters distribute thousands of micro-targeted ads across Instagram, YouTube Shorts, and TikTok. Removing one only leads to a rapid replacement.
  • Record Losses: In 2024, a deepfake impersonation of a CFO cost a UK engineering firm $25 million. Regulators estimate nearly 40% of last year's investment fraud complaints involved manipulated audio or video.

What To Watch For

  • Too-Good-To-Be-True Promises: Genuine celebrities rarely endorse 15% daily returns.
  • One-Way Communication: Disabled comments, invitation-only direct messages, and suspiciously new "official" websites are red flags.
  • Subtle Visual Artifacts: Watch for flat hairline lighting, inconsistent blinking patterns, or an unnatural stare when the speaker moves.

How Clarity Responds

At Clarity, our detection engine swiftly identified the recent "Gal Gadot investment pitch" deepfake within 4 seconds, pinpointing subtle lip-sync inconsistencies invisible to human observers.

As deepfakes proliferate at machine speed, automated verification is essential. Our technology analyzes facial dynamics, audio patterns, and metadata in real-time, enabling rapid removal of fraudulent content—before it reaches potential victims. Think of our solution as antivirus software for the age of synthetic media—always active, continuously evolving, and most effective when supported by an educated public.

Yet, technology alone isn't enough; critical thinking and vigilance remain crucial.

If You Encounter a Suspicious Investment Video:

  • Pause: Don’t act immediately.
  • Verify: Confirm the source through known, official channels.
  • Report: Use the “impersonation” option available on most platforms.
  • Share Awareness: Inform others. Community awareness grows faster than deepfake scams when actively spread.
Together, let's protect our communities—investors, families, and fans alike—from synthetic media fraud.
‍
graphical user interface, website

‍

Last week, Unit42 by Palo Alto Networks published a fascinating - and frightening - deep dive into how easily threat actors are creating synthetic identities to infiltrate organizations.

We’re talking about AI-generated personas, complete with fake resumes, social profiles, and most notably, deepfaked video interviews. These attackers aren’t just sending phishing emails anymore. They’re showing up on your video calls, looking and sounding like the perfect candidate.

At Clarity, this is exactly the kind of threat we’ve been preparing for.

The Rise of Deepfakes in Hiring - A New Attack Vector

The interview process has become a weak link in organizational security. With remote hiring now standard, verifying a candidate’s identity has never been more challenging - and adversaries know it.

Deepfake technology has reached a point where bad actors can spin up convincing video personas in hours. As Unit42 highlighted, state-sponsored groups are already exploiting this to gain insider access to critical infrastructure, data, and intellectual property.

This isn’t just a cybersecurity issue - it’s a trust crisis.

‍

Inside Unit42’s Findings - A Manual Deepfake Hunt

In their detailed analysis, Unit42 showcased just how layered and complex synthetic identity attacks can be. Each figure in their report highlights different aspects of deepfake deception - from AI-generated profile photos and fabricated resumes to manipulated video interviews, with cheap and widely available hardware to higher-quality deepfakes using resource-intensive techniques.

Their approach demonstrates the painstaking process of manually dissecting these fakes:

  • Spotting subtle visual glitches

  • Identifying inconsistencies across frames

  • Cross-referencing digital footprints

While their expertise is impressive, it also underscores a critical point: most organizations don’t have the time, resources, or deepfake specialists to conduct this level of forensic analysis for every candidate or call.

That’s exactly why Clarity exists.

‍

How Clarity Detects What the Human Eye Can’t

Let’s face it - no recruiter, hiring manager, or IT admin can be expected to spot a high-quality deepfake in a live interview. That’s where Clarity comes in.

Our AI-powered detection platform is designed to seamlessly analyze video feeds, pre-recorded interviews, and live calls to identify synthetic media in real-time.

When we ran the videos shared in Unit42’s report through our Clarity Studio, the outcome was clear:

Deepfake detected - with a clear confidence score that tells you instantly whether a video is real or synthetic. No need for manual checks or deepfake expertise - Clarity delivers fast, decisive answers when it matters most.

No manual frame-by-frame reviews. No specialized training required. Just fast, reliable detection that integrates directly into your workflows.

‍

Automating Trust in a Synthetic World

At Clarity, we believe organizations shouldn’t have to become deepfake experts to stay protected. Whether you're hiring globally, conducting sensitive interviews, or verifying identities remotely, our system ensures:

  • Real-time detection during live calls

  • Comprehensive analysis of recorded videos

  • Automated alerts when synthetic media is detected

With Clarity, you can focus on growing your team and business, without second-guessing who’s really on the other side of the screen.

See It In Action

We applaud Unit42 for shedding light on this growing threat. To demonstrate how proactive detection can neutralize these risks, we’ve analyzed the same deepfake videos from their post using Clarity Studio.

Check out the screenshots below to see how Clarity instantly flags these synthetic identities - before they become your next insider threat.

Our studio results on Unit42 Figure 4 video: A demonstration of a realtime deepfake on cheap and widely-available hardware

‍

Our studio results on Unit42 Figure 4 video: A demonstration of a realtime deepfake on cheap and widely-available hardware
Our studio results on Unit42 Figure 5: demonstration of identity switching
Our studio results on Unit42 Figure 6. A higher quality deepfake using a more resource-intensive technique
Our studio results on Unit42 Figure 7c. The "sky-or-ground"

‍

On Saturday night, Israeli Channel 14 mistakenly aired a manipulated video of former Defense Minister Yoav Gallant—an AI-generated deepfake that appeared to originate from Iranian media sources. The incident, which took place during the channel’s evening newscast, showcased Gallant speaking in Hebrew but with a clear Persian accent. The anchor, recognizing the suspicious nature of the clip, interrupted the broadcast mid-sentence, calling out the video as fabricated.

“On the first sentence I said stop the video. We apologize. This is cooked… These are not Gallant’s words but AI trying to insert messages about the U.S. and the Houthis,” said anchor Sarah Beck live on air.

Shortly after, Channel 14 issued an official statement confirming that the video was aired without prior verification and that an internal investigation was underway.

What Actually Happened?

The video portrayed Gallant stating that “the U.S. will not be able to defeat the Houthis,” a politically charged statement intended to sow confusion and manipulate public sentiment. Although the channel removed the clip within seconds, the damage was already done: the AI-generated video had reached thousands of viewers.

This incident highlights the speed, sophistication, and geopolitical implications of deepfake attacks.

How Clarity Responded — in Real Time

Minutes after the clip aired, our team at Clarity ran the footage through Clarity Studio, our real-time media analysis and deepfake detection platform. The results were clear:

  • Manipulation Level: High
  • Audio-Visual Inconsistencies: Detected in voice pattern and facial dynamics
  • Anomaly Source: Synthetic voice generation with foreign accent simulation

Here’s the detection screenshot from Clarity Studio:

We identified clear mismatches between Gallant’s known voice and speech pattern compared to the clip, along with temporal inconsistencies in facial movement and audio syncing—hallmarks of state-sponsored deepfake manipulation.

Why It Matters

This wasn’t a fringe incident. This was a high-profile deception attempt broadcast on national television. Deepfakes are no longer future threats. They are present-day weapons—used to spread disinformation, manipulate public opinion, and erode trust in media.

And this time, Clarity caught it before the narrative could spiral out of control.

The Takeaway

Broadcasters, law enforcement, and government agencies need tools that can verify audio and video authenticity in real time. This isn’t just about technology—it’s about safeguarding democratic discourse and preventing psychological operations from hostile actors.

At Clarity, we’re building the tools to detect these threats before they become headlines.

‍

Changpeng Zhao (CZ) of Binance recently warned, deepfakes are proliferating in the crypto space, impersonating prominent figures to promote scams and fraudulent projects. The message is clear: the digital age has ushered in a new era of brand vulnerability.

Deepfakes, powered by sophisticated artificial intelligence, manipulate audio and video to create convincing forgeries. The technology's accessibility and affordability have democratized its use, making it easier for malicious actors to create realistic impersonations.

In the financial and crypto sectors, where trust is paramount, deepfakes can cause substantial damage. Impersonating CEOs, creating fake endorsements, and fabricating promotional materials are just a few of the tactics being employed. The potential for financial damage is substantial, as unsuspecting individuals are tricked into sending money or divulging sensitive information.

Consider the recent surge in deepfakes impersonating public figures endorsing cryptocurrency scams. These fabricated videos, often spread through social media, can deceive even savvy investors.

Brand And Financial Consequences

The consequences are concerning, leading to substantial financial losses and a severe erosion of trust in the affected brands.

The impact on brand reputation can be significant. Deepfakes can tarnish a brand's image overnight, eroding the credibility built over years. Regaining trust after a deepfake incident is an uphill battle, requiring a concerted effort to restore public confidence. In a digital world where information spreads quickly, the damage can be extensive and long-lasting.

However, there are strategies for mitigating and preventing deepfake attacks. Technological solutions are at the forefront of this battle. Deepfake detection tools, powered by AI, can analyze videos and audio to identify telltale signs of manipulation. 

Blockchain technology offers another layer of protection, providing a secure and transparent way to verify identity and content. Watermarking and digital signatures can also help authenticate media and prevent tampering.

A Technological Arms Race

The deepfake threat isn't static; it's a rapidly evolving landscape. The technology itself is constantly being refined, with advancements in AI and machine learning pushing the boundaries of what's possible. 

This evolution is driven by a technological arms race. As detection tools improve, so do the methods used to create deepfakes. Generative adversarial networks (GANs), for instance, are becoming more sophisticated, allowing for the creation of highly realistic synthetic content. 

Furthermore, the accessibility of powerful computing resources and open-source deepfake software democratizes the technology, placing it within reach of even less technically skilled individuals.

This constant evolution presents a significant challenge for detection and mitigation efforts. It's not simply a matter of developing a one-size-fits-all solution; it's an ongoing battle against increasingly sophisticated techniques

-

Detection, collaboration, and information sharing are all vital in combating this evolving threat. While detection and prevention should be the first port of call, collaboration with law enforcement and regulatory agencies can help bring deepfake creators to justice.

‍

 

‍