EXPERT OPINION
Many are aware of fake videos of
politicians, carefully crafted to convey
false messages and statements that
call their integrity into question. But
with companies becoming more vocal
and visible on social media, and CEOs
speaking out about purpose-driven
brand strategies using videos and
images, is there a risk that influential
business leaders will provide the source
material needed to launch deepfake
attacks?
In Figure 1, the request for a personal
cell number suggests the attacker may
be trying to circumvent any caller ID
facility in place on the company telephony
network that would confirm the
caller’s identity.
How deepfake phishing attacks
are created
BEC: The first step in
voice phishing
A BEC campaign follows a
highly focused period of research into
a target organisation. Using all available
resources to examine organisational
structure, threat actors can effectively
identify and target employees authorised
to release payments.
Through impersonation of senior
executives or known and trusted suppliers,
attackers seek authorisation and release
of payments to false accounts.
Figure 2: Possible example of pre-deepfake audio attack BEC email
Voice phishing enhances
BEC attacks
Today, deepfake audio is used to
enhance BEC attacks, and reporting
indicates a marked rise in deepfake
audio attacks over the last year.
An FBI report found that BEC attacks
have cost organisations worldwide more
than US$26 billion between June 2016
and July of this year.

But will these voice-based attacks
become more prominent as the next
generation of phishing (or ‘vishing’,
for voice phishing) and mature into the
preferred attack vector instead of BEC?
“The scam is frequently carried out
when a subject compromises legitimate
business or personal email accounts
through social engineering or computer
intrusion to conduct unauthorised
transfers of funds,” according to the
FBI alert.

Deepfake audio is considered one of
the most advanced forms of cyberattack
through its use of AI technology. In fact,
research has recently demonstrated
that a convincing cloned voice can be
developed with under four seconds of
source audio.
Figure 1: Possible example of pre-deepfake audio attack BEC email
In comparison to producing deepfake
video, deepfake audio is more extensible
and more difficult to detect. This is
according to Axios, which has stated:
“Detecting audio deepfakes requires
training a computer to listen for inaudible
hints that the voice couldn’t have come
from an actual person.”
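To make that idea concrete, the following is a minimal sketch of the kind of detector Axios alludes to: a classifier trained on spectral summaries of known-genuine and known-synthesised clips. The file names are hypothetical, and librosa and scikit-learn are illustrative choices rather than tools named in any of the reporting.

    # Minimal sketch of an audio deepfake detector: a classifier trained on
    # spectral summaries, where synthesis artefacts ('inaudible hints') tend
    # to surface. File names are hypothetical; librosa and scikit-learn are
    # illustrative choices only.
    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression

    def spectral_features(path):
        # Summarise a clip as the mean and spread of its MFCCs,
        # giving one fixed-length vector per recording.
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    genuine = ["genuine_01.wav", "genuine_02.wav"]  # hypothetical corpora
    cloned = ["cloned_01.wav", "cloned_02.wav"]

    X = np.array([spectral_features(p) for p in genuine + cloned])
    y = np.array([1] * len(genuine) + [0] * len(cloned))

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # Probability that an unseen clip is a genuine human voice.
    print(clf.predict_proba([spectral_features("suspect.wav")])[0][1])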
Within this small time frame, all the
distinguishable personal voice traits
necessary to create a convincing
deepfake, such as pronunciation, tempo,
intonation, pitch and resonance, are
likely to be present to feed into the
algorithm. However, the more source
audio and training samples, the more
convincing the output.
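Those traits can be estimated directly from a short clip. Below is a rough sketch, assuming a hypothetical four-second source file and again using librosa as an illustrative tool:

    # Rough sketch: measuring voice traits named above (pitch, intonation,
    # tempo) from ~4 seconds of audio. 'sample.wav' is hypothetical.
    import numpy as np
    import librosa

    y, sr = librosa.load("sample.wav", sr=16000, duration=4.0)

    # The fundamental-frequency contour captures pitch and intonation.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"), sr=sr)
    print("median pitch (Hz):", np.nanmedian(f0))

    # Onset rate serves as a crude proxy for speaking tempo.
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    print("onsets per second:", len(onsets) / (len(y) / sr))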
Deepfake audio is created by feeding
training data and sample audio into
appropriate algorithms. This material can
comprise a multitude of audio clips
of the target, which are often collected
from public sources such as speeches,
presentations, interviews, TED talks,
phone calls in public, eavesdropping
and corporate videos, many of which are
freely available online.
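In outline, the workflow looks something like the sketch below. The voice_cloning module and every call on it are hypothetical stand-ins for the class of products described here, not a real API.

    # Outline of a voice-cloning pipeline as described above. The
    # 'voice_cloning' module and all of its functions are hypothetical
    # stand-ins, not a real library.
    import voice_cloning  # hypothetical

    # 1. Gather publicly available audio of the target (speeches,
    #    interviews, corporate videos and so on).
    source_clips = ["keynote_2019.wav", "earnings_call_q3.wav", "ted_talk.wav"]

    # 2. Fit a voice model; more source audio yields a more convincing clone.
    model = voice_cloning.train(samples=source_clips)

    # 3. Synthesise arbitrary text in the cloned voice.
    audio = model.synthesise("Please release the attached payment today.")
    audio.save("cloned_request.wav")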
Through the use of speech synthesis,
a voice model can be effortlessly
created that is capable of reading out
text with the same intonation, cadence
and manner as the target entity. Some
products even permit users to select
a voice of any gender and age, rather
than emulating the intended target. This
methodology has the potential to allow
for real-time conversation or interaction.
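Real-time use would wrap the same hypothetical model in a loop, speaking each typed reply into a live call; again, this is a sketch of the methodology, not a real interface:

    # Sketch of real-time interaction with the same hypothetical model:
    # each typed reply is spoken in the cloned voice into a live call.
    while True:
        reply = input("attacker> ")
        if not reply:
            break
        model.synthesise(reply).play()  # hypothetical low-latency playback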
In today’s charged global political
climate, the output of a deepfake attack
can also be used to create distrust,
change opinion and cause reputational
damage.