Deepfakes and Stolen Voices: How to Navigate a New Era of Identity Theft

This newsletter is a repost from my blog, Noah's Ark on LinkedIn. If you enjoy it, please consider subscribing.

In today’s newsletter I want to explain where malicious actors can steal voices from, give examples of what a stolen voice can be used for, and cover what you can do to protect yourself, your family, and your business.

As I have written before, generative AI has opened a Pandora’s box in which we can no longer trust anything we see or hear. While we as a society are used to being somewhat skeptical of what we see, we are not accustomed to questioning what we hear, and especially “who” we are hearing. The flip side is that we have never had to think about how our voice might be “stolen” or misused, or about just how accessible our voices are to anyone looking for them.

With this new reality comes a host of problems that the everyday person is neither educated on nor prepared for, and that will have far-reaching consequences. The goal of this blog is to provide foundational education and awareness of how a voice can be “stolen”, what it can be used for, and what practical steps can be taken in light of this new threat.

Let’s start by identifying where our voices can be stolen from.

Ready to get depressed? Me too. Let’s rip the bandaid off.

Most of us have never considered needing to be careful about where our voice appears. At most, we think about “what” we are saying from a reputation-management standpoint; never before have we had to deal with the fact that our voice itself can be replicated.

Here is a non-comprehensive list of places where our voices might appear digitally. Note that I say “digitally” because some of the places where our voices are found aren’t necessarily “online”.

  • Social Media Platforms: TikTok, Facebook, Instagram, YouTube, Twitter, etc., where users upload videos, often including their voice. (Good luck to all you would-be influencers)
  • Voicemail Recordings: Personal and business voicemails are vulnerable, and many sales-prospecting tools make it easy to find both personal and business phone numbers.
  • Phone Calls: Your voice can be recorded during a personal or work call.
  • Video Calls: Your voice can be recorded during a call on Zoom, Google Meet, Microsoft Teams, FaceTime, etc.
  • Voice Messages on Messaging Apps: Your voice can be taken from voice messages on Facebook Messenger, WhatsApp, Signal, Telegram, iMessage, etc.
  • Game Chats: Voice chats can be recorded while you’re gaming. Think PlayStation parties, Xbox parties, Discord rooms, etc.
  • Podcasts and Interviews: Often available to anyone on streaming platforms or YouTube.
  • Public Speeches and Presentations: Videos or recordings that other people took at a public event can often be found online.
  • Video Blogs: Personal blogs with audio content can be found online.
  • Workshops and Webinars: If you are a speaker or someone asking a question and a recording gets posted.
  • Educational Recordings: Lectures, online courses, and instructional videos often contain the voices of instructors and/or participants.
  • Other People’s Personal Recordings: You might appear in videos or recordings that other people have taken of you.
  • Legal and Administrative Recordings: Courtroom proceedings and other public and governmental meetings are often recorded.
  • Customer Service Calls: These are often recorded and could be accessed in a breach or by a disgruntled employee.
  • Smart Home Devices: Hackers could potentially access recordings stored on home devices like Alexa, Google Home, Ring Doorbells, etc.

These are dark days to have made an enemy, and a good time to be in the hacking, social engineering, scamming, extortion, or catfishing businesses.

So who is most vulnerable?

Here is a (non-exhaustive) list of people most likely to have their voices stolen and deepfaked:

  1. Public Figures and Celebrities
  2. Politicians and Leaders
  3. Content Creators and Influencers
  4. Business Executives
  5. Activists and Community Leaders
  6. Individuals in Customer-Facing Roles
  7. Kids who are Victims of Bullying
  8. Those Navigating a Messy Divorce or Custody Battle
  9. Journalists and Reporters
  10. Whistleblowers
  11. Legal Professionals
  12. Human Rights Activists
  13. High-Net-Worth Individuals
  14. Family Members of all of the Above

So what makes these people particularly vulnerable? They either have something people want (power, money, information, abilities, etc.), a voice that others will react to hearing, or a high likelihood of having made enemies. Note, however, that this is not to say your everyday person can’t be a victim of voice theft as well.

So why might someone steal a voice?

Here is another (non-exhaustive) list:

  1. Identity Theft: Stealing someone’s voice can allow malicious actors to access confidential information, personal assets, and accounts.
  2. Manipulation and Social Engineering: A stolen voice can be used to breach a business, manipulate others into taking certain actions, or compromise sensitive information. (A business in Hong Kong lost $25m to this)
  3. Misinformation: A stolen voice can lend credibility to misleading or false information in an attempt to deceive people and/or sway public opinion. (The Biden deepfake call was an example of this)
  4. Ransom or Blackmail: Threats of misuse of a voice can be used to extort ransom payments from potential victims. (We spoke with a victim of this)
  5. Scams: Stolen voices can be used to manipulate and scam friends or family members of the victim.
  6. Reputational Attacks: Stolen voices can be used to say damaging and/or misleading things in an attempt to attack the reputation of an individual.

All kinds of nasty stuff there. To make this more tangible, here are some examples of what this might look like in the wild. Many of these are taken directly from a great paper published by the Department of Homeland Security, and I have indicated those examples where appropriate.

Deepfakes in the workplace

Scenario #1: Corporate Enhanced Social Engineering Attacks (From DHS Paper)

In this scenario we consider the use of deepfake technology to more convincingly execute social engineering attacks. First, a malign actor conducts research on the company’s line of business, executives, and employees, identifying the Chief Executive Officer (CEO) and the Finance Director. He researches a joint venture the company recently announced, then uses TED Talks and other online videos of the CEO to train a model and create deepfake audio of the CEO’s voice. Researching the Finance Director’s social media profiles, he sees a picture of a new baby and a message that it’s hard to return to work. Next, the malign actor places a call to the Finance Director with the goal of fraudulently obtaining funds. He asks the Finance Director how the return to work is going and about the baby. The Finance Director answers his phone and recognizes his boss’s voice. The malign actor directs him to wire $250K to an account for the joint venture. The funds are wired, and the malign actor then transfers them across several different accounts.

Scenario #2: Financial Institution Social Engineering Attack (From DHS Paper)

In this scenario, the malign actor employs deepfake audio to attack a financial institution for financial gain. She conducts research on the dark web and obtains names, addresses, social security numbers, and bank account numbers of several individuals. The malign actor identifies the individuals’ TikTok and Instagram profiles and uses the videos posted there to train a model and create deepfake audio of her targets. She researches the financial institution’s verification policy and determines that it uses a voice authentication system. Next, she calls the financial institution and passes voice authentication. She is routed to a representative and uses the proprietary customer information obtained via the dark web, telling the representative that she is unable to access her account online and needs to reset her password. She is provided a temporary password for the online account. The malign actor gains access to her target’s financial accounts and wires funds from the target’s account to overseas accounts.

Scenario #3: Stock Manipulation (From DHS Paper)

In this scenario, we consider a deepfake generated to manipulate the stock market and allow the malign actor to make an illicit profit. A malign actor wishes to make a quick profit through stock manipulation. The actor thoroughly researches a stock and purchases it at a low price. He creates several fake profiles on stock market forums such as Reddit and Stockaholics, presenting the users as employees of the company. Posing as these employees, the actor posts comments about a pending “major” announcement. Having identified the company’s CEO, the actor trains a model of the CEO’s speech based on interviews that aired on various television and radio programs. The actor creates an audio deepfake of the CEO discussing the pending “major” announcement, posts it on social media, and links to the audio on the stock market forums. Monitoring the forums, the malign actor sees a huge spike of activity confirming his deepfake audio is working. The stock increases in price by 1,000 percent, and the malign actor cashes out before the stock drops. Other investors could lose money, and the company’s reputation could suffer. The company may state that the audio of the CEO was fake, and investors may look to the company to make them whole for any losses suffered.

Deepfakes in your personal life

Scenario #1: Scams

Here is a real-world example that my grandfather sent me from his neighborhood Nextdoor channel.

Scenario #2: Cyberbullying (From DHS Paper)

In this scenario we consider a deepfake generated to depict a target in a situation that would damage their reputation or impact their access to groups, services, or benefits, perhaps by depicting the individual engaged in criminal behavior. The attacker wishes to undermine the reputation of the target, which may have the secondary effect of enhancing the status of someone the attacker prefers. In a well-publicized recent incident in Pennsylvania, a woman attempted to damage the reputations of her daughter’s rivals, who were in competition for limited spots on a cheerleading squad. In this scenario, a deepfake video depicting the target engaged in criminal behavior is produced and sent to individuals in positions of authority over the target’s activities. Based on the video, these authorities restrict or remove the target from participating in certain activities.

Deepfakes in politics

Scenario #1: Election Influence (From DHS Paper)

In this scenario we consider a deepfake used to spread disinformation around the time of an election. In the run-up to the election, a group of tech-savvy supporters of Candidate A could launch a disinformation campaign against Candidate B. In the scenario, malign actors may take advantage of audio, video, and text deepfakes to achieve their objectives. While individual audio and video deepfakes are likely to create more sensational headlines and grab people’s attention, the threat of text deepfakes lies in their ability to permeate the information environment without necessarily raising alarms. Another key use of text deepfakes is controlling the narrative on social media platforms. This approach could heighten societal tensions, damage the reputation of an opponent, incite a political base, or undermine trust in the election process.

So what can you do?

Now that we are all sufficiently depressed with the state of the world we are entering, let’s take a moment to explore the practical steps we can take to protect ourselves.

Proactive steps:

  1. Continued Education: Now that you are aware of the threats, stay updated on the latest developments in generative AI and keep reading up on how you can protect yourself. Take the time to educate your loved ones as well; they might not be aware of the problem, and knowing there is a threat is the first step toward defending against it.
  2. Audit Exposure: Take time to conduct a thorough review of everywhere your voice appears digitally, and do the same for your kids, parents, grandparents, and other family members. The list I provided above can be used as a starting point, and the sketch after this list shows one simple way to track your progress.
  3. Limit Accessibility: Once you have taken inventory, take action to limit the quantity and digital accessibility of your voice: remove your personal voicemail recordings, update your social media privacy settings to their strictest levels (or even better, remove videos and audio), and take down old videos and recordings that are no longer relevant.
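
To make that audit concrete, here is a minimal sketch, in Python, of one way to track your progress as you work through the list. Everything here is illustrative: the categories simply mirror the list earlier in this post, and none of the names come from a real tool.

```python
# Hypothetical checklist for tracking a voice-exposure audit.
# The categories mirror the list earlier in this post; extend as needed.

EXPOSURE_SOURCES = [
    "Social media videos (TikTok, YouTube, Instagram, etc.)",
    "Voicemail greetings (personal and business)",
    "Podcasts, interviews, webinars, and recorded talks",
    "Voice messages in chat apps",
    "Smart home device recordings",
]

def remaining_exposures(findings: dict) -> list:
    """Return the sources that still expose your voice and need action."""
    return [source for source in EXPOSURE_SOURCES if findings.get(source, True)]

if __name__ == "__main__":
    # Mark a source False once you have removed or locked down those recordings.
    findings = {source: True for source in EXPOSURE_SOURCES}
    findings["Voicemail greetings (personal and business)"] = False

    for source in remaining_exposures(findings):
        print(f"[TODO] Limit or remove: {source}")
```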

Even though the steps above remove the low-hanging fruit, the unfortunate reality is that we cannot remove all instances of our voices online. If bad actors truly want to find and clone our voice, odds are they can. And if they can’t steal ours, they can likely find and use the voice of someone close to us. Given this, more important than “protecting” our voices is knowing what steps to take once a voice has already been stolen, and staying skeptical of what, and who, we are hearing at all times.

Reactive steps:

  1. Think Critically: For everything you see and hear online or on a call, consider what is being said and asked, and question the identity and motives of the speaker: Why is it being said? Who is saying it? Am I sure it’s really that person? Do they have an agenda? Is this the proper channel of communication for this information? Does this follow the relevant processes? Don’t just accept information at face value. Adopt a mindset of always being skeptical of what you are hearing, especially when information or money is being requested.
  2. Check Speaker Identity: For both individuals and businesses, I recommend implementing a second method of verifying speaker identity on calls. For individuals and families this might be security questions or shared passphrases (see the first sketch after this list). For a business this might mean process updates and training around confirming speaker identity, checking the email address that sent the meeting invitation, and defining what information can be exchanged over which channels.
  3. Adopt Tools to Secure Calls: I’m obviously heavily biased here, but in my eyes we have crossed the Rubicon when it comes to distinguishing real audio from deepfake audio in the wild, and relying on people to think critically about what is being said and to question the identity of speakers at all times isn’t realistic, especially in emotionally charged, high-stress scenarios. Given this, we need to adopt tools that can verify the source of the audio being generated, analyze the content of what is being said and alert us to suspicious asks, and identify deepfaked speech in real time (the second sketch after this list shows roughly where such a detector starts).
  4. Adopt Tools to Verify Content: Similar to securing calls, we now need accessible tools that can analyze the videos, recordings, and other media we consume to identify misinformation and flag AI-generated content. Long term, responsibility for content protection should likely reside with the platforms that distribute content (social media, search engines, news outlets, phone providers, etc.), but in the meantime there are still tools we can use to validate anything we have questions about.
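
To make the passphrase idea in step 2 concrete, here is a minimal sketch, assuming a shared secret agreed on in person and a device that stores only a salted hash of it. The function names and the fixed salt are illustrative, not from any real product:

```python
import hashlib
import hmac

SALT = b"family-verification-v1"  # illustrative; use a random per-user salt in practice
ITERATIONS = 100_000  # PBKDF2 iteration count

def store_passphrase(passphrase: str) -> bytes:
    """Store a salted hash of the shared passphrase, never the plaintext."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), SALT, ITERATIONS)

def verify_caller(stored_hash: bytes, spoken_passphrase: str) -> bool:
    """Check a caller's passphrase using a constant-time comparison."""
    candidate = hashlib.pbkdf2_hmac("sha256", spoken_passphrase.encode(), SALT, ITERATIONS)
    return hmac.compare_digest(stored_hash, candidate)

if __name__ == "__main__":
    stored = store_passphrase("correct horse battery staple")
    print(verify_caller(stored, "correct horse battery staple"))  # True
    print(verify_caller(stored, "hi grandma, it's really me"))    # False
```

One caveat worth noting: once a passphrase has been spoken on a call that turns out to be compromised, treat it as burned and agree on a new one in person.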
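
And to give a feel for what the tools in step 3 are doing under the hood, here is a toy sketch of the kind of feature extraction a voice anti-spoofing pipeline might start with, using the open-source librosa library. The classifier is a placeholder, and this is a simplified illustration, not the actual approach of DeepTrust or any other vendor:

```python
import librosa
import numpy as np

def extract_features(audio_path: str) -> np.ndarray:
    """Load a recording and compute MFCCs, a common input to anti-spoofing models."""
    y, sr = librosa.load(audio_path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    # Summarize each coefficient over time (mean and variance per coefficient).
    return np.concatenate([mfcc.mean(axis=1), mfcc.var(axis=1)])

def score_call(audio_path: str, model) -> float:
    """Return the model's probability that the audio is synthetic.

    `model` is assumed to be a trained, sklearn-style classifier
    (hypothetical here); production systems also have to score
    streaming audio in real time rather than whole files after the fact.
    """
    features = extract_features(audio_path).reshape(1, -1)
    return float(model.predict_proba(features)[0, 1])
```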

Closing thoughts

The high quality, low cost, and accessibility of generative AI are driving rapid adoption for all kinds of legitimate and amazing use cases. While we are still in the early days of malicious use of generative AI, real-world incidents are becoming more and more common, and it’s naive to think this technology won’t increasingly be abused. To mitigate the risks, we all need to educate those around us, audit where our voices appear online, and adopt new tools and processes to protect ourselves.

In closing, I’ll leave you with an assignment: have at least one conversation (more is ideal) with someone close to you to educate them about generative AI, make them aware of some of the risks, have them audit where their voice appears digitally, and point them toward resources that can help protect them.

Stay safe out there everyone. The coming years are going to be interesting.


Who am I?

I (Noah) am a Co-Founder at DeepTrust, where we are building an end-to-end call security solution to combat deepfakes and generative AI. If you’re interested in learning more, or have additional questions about deepfakes, I’d love to chat. Feel free to shoot me an email at noah@deeptrustai.com.

Ready to get started? Sign up here!

Finally, if you enjoyed this blog, I invite you to follow my newsletter, Noah’s Ark on LinkedIn.