
EXPERT OPINION

Many are aware of fake videos of politicians, carefully crafted to convey false messages and statements that call their integrity into question. But with companies becoming more vocal and visible on social media, and CEOs speaking out about purpose-driven brand strategies using videos and images, is there a risk that influential business leaders will provide the source material for deepfake attacks?

How deepfake phishing attacks are created

BEC: The first step in voice phishing

A BEC campaign follows a highly focused period of research into a target organisation. Using all available resources to examine organisational structure, threat actors can identify and target the employees authorised to release payments. By impersonating senior executives or known and trusted suppliers, attackers seek the authorisation and release of payments to false accounts.

An FBI report found that BEC attacks have cost organisations worldwide more than US$26 billion between June 2016 and July of this year. "The scam is frequently carried out when a subject compromises legitimate business or personal email accounts through social engineering or computer intrusion to conduct unauthorised transfers of funds," according to the FBI alert.

Figure 1: Possible example of pre-deepfake audio attack BEC email

In Figure 1, the request for a personal cell number suggests the attacker intends to circumvent any caller ID facility on the company telephony network that would otherwise confirm the caller's identity.

Figure 2: Possible example of pre-deepfake audio attack BEC email

Voice phishing enhances BEC attacks

Today, deepfake audio is used to enhance BEC attacks, and reporting has indicated a marked rise in deepfake audio attacks over the last year. But will these attacks become more prominent as the next generation of phishing (or 'vishing', for voice phishing) and mature into the preferred attack vector in place of BEC?

Deepfake audio is considered one of the most advanced forms of cyberattack because of its use of AI technology. In fact, research has recently demonstrated that a convincing cloned voice can be produced from under four seconds of source audio. Even within this small time frame, all the distinguishable personal voice traits necessary to create a convincing deepfake, such as pronunciation, tempo, intonation, pitch and resonance, are likely to be present to feed into the algorithm. However, the more source audio and training samples available, the more convincing the output.

Creating deepfake audio requires feeding training data and sample audio into appropriate algorithms. This material can comprise a multitude of audio clips of the target, often collected from public sources such as speeches, presentations, interviews, TED talks, phone calls in public, eavesdropping and corporate videos, many of which are freely available online. Through the use of speech synthesis, a voice model capable of reading out text with the same intonation, cadence and manner as the target can be created with little effort, as the sketch below illustrates.
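To show how accessible this capability has become, the snippet below is a minimal sketch of this kind of voice cloning, assuming the open source Coqui TTS package and its publicly documented XTTS model. The file names and spoken text are hypothetical, and a reference sample should only ever be used with the speaker's consent, for example when red-teaming your own organisation's defences.

```python
# Minimal voice-cloning sketch using the open source Coqui TTS library
# (pip install TTS). File names and prompt text are hypothetical examples.
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model (XTTS v2).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Clone the voice in a short, consented reference recording and have the
# model read out arbitrary text in that voice.
tts.tts_to_file(
    text="This is a demonstration of a cloned voice reading arbitrary text.",
    speaker_wav="consented_reference_sample.wav",  # a few seconds of source audio
    language="en",
    file_path="cloned_output.wav",
)
```

A wrapper this thin is precisely the article's point: the heavy lifting is done by pretrained models, so the attacker's effort is reduced to gathering a few seconds of source audio.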
Some products even permit users to select a voice of any gender and age, rather than emulating the intended target. This methodology also has the potential to allow for real-time conversation or interaction.

In today's charged global political climate, the output of a deepfake attack can also be used to create distrust, change opinion and cause reputational damage.

Detection, however, remains difficult. In comparison to deepfake video, deepfake audio is more extensible and more difficult to detect. This is according to Axios, which has stated: "Detecting audio deepfakes requires training a computer to listen for inaudible hints that the voice couldn't have come from an actual person." A simple sketch of that idea follows.
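The snippet below sketches that approach under simple, stated assumptions: acoustic features (MFCCs, extracted with librosa) summarise each clip, and a scikit-learn classifier is trained to separate genuine from synthesised speech. The file paths are hypothetical placeholders; a real detector would need a large labelled corpus and far richer features.

```python
# Toy sketch of audio deepfake detection: extract acoustic features from
# labelled clips and train a classifier to separate genuine from synthesised
# speech. Assumes librosa and scikit-learn; paths and labels are hypothetical.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def clip_features(path: str) -> np.ndarray:
    """Summarise a clip as the per-coefficient mean and variance of its MFCCs."""
    audio, sample_rate = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.var(axis=1)])

# Hypothetical corpus; in practice this would be hundreds of labelled clips.
real_clips = ["real_001.wav", "real_002.wav"]      # genuine recordings
fake_clips = ["cloned_001.wav", "cloned_002.wav"]  # synthesised recordings

X = np.array([clip_features(p) for p in real_clips + fake_clips])
y = np.array([0] * len(real_clips) + [1] * len(fake_clips))  # 1 = deepfake

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```

The particular classifier is incidental; the pipeline shape is the point: extract acoustic features, train on labelled genuine and synthetic speech, then flag suspicious audio for human review rather than trusting a caller's voice alone.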