Voice impersonators can fool speaker recognition systems

Skilful voice impersonators can fool state-of-the-art speaker recognition systems, which are generally poor at detecting voice modifications, according to new research from the University of Eastern Finland. This vulnerability poses significant security concerns.

Mobile devices are increasingly equipped with applications that respond to voice commands. Users can dictate messages, translate phrases and run search queries by voice alone. The widespread use of electronic services has increased the demand for applications that use voice to recognise the speaker, whether for authentication or for public safety. However, as voice applications grow in popularity, so does the potential for their misuse.

Attacks against speaker recognition can be mounted by technical means such as voice conversion, speech synthesis and replay. The scientific community is systematically developing countermeasures against these technically generated attacks. However, voice modifications produced by a human, such as impersonation and voice disguise, cannot easily be detected with the existing countermeasures.
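To see why such modifications can slip through, it helps to know how an automatic speaker verification system typically decides a trial: it compares a fixed-length voice embedding extracted from the test utterance against the enrolled speaker's embedding, and accepts if the similarity exceeds a threshold. The sketch below is a minimal illustration of that decision logic, not the system used in the study; the embeddings, noise scale and threshold are all hypothetical stand-ins.

```python
import numpy as np

def cosine_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings (higher = more similar)."""
    return float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

# Hypothetical 256-dim embeddings; a real system would derive these from audio
# with, e.g., an i-vector or neural-network front end.
rng = np.random.default_rng(0)
enrolled = rng.normal(size=256)                         # target speaker's enrolment voiceprint
impostor = enrolled + rng.normal(scale=0.6, size=256)   # impersonator drifting toward the target

THRESHOLD = 0.7  # illustrative operating point; tuned per system in practice
score = cosine_score(enrolled, impostor)
print(f"score={score:.3f} -> {'ACCEPT' if score >= THRESHOLD else 'REJECT'}")
```

An impersonator succeeds whenever their modified voice lands close enough to the target's voiceprint to clear the threshold.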

Voice impersonation is common in the entertainment industry, where professionals and amateurs copy the voice characteristics and speech behaviour of other speakers, usually public figures. An easier form of voice modification is voice disguise, in which speakers alter their voices to avoid being recognised. The latter is common in situations that do not require face-to-face communication, and ranges from innocent prank calls to crimes such as blackmail or threatening calls. This motivates efforts to make speaker recognition more robust against human-induced voice modifications.

The study analysed speech from two professional impersonators who mimicked eight Finnish public figures. The study of voice disguise additionally included acted speech from 60 Finnish speakers recorded in two sessions; these speakers were asked to fake their age, attempting to sound like an elderly person and like a child. The impersonators were able to fool both automatic systems and human listeners when mimicking some speakers. In the acted speech, sounding like a child proved a successful disguise strategy, as the performance of both automatic systems and listeners degraded under this modification.
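The article does not spell out how "degraded performance" is measured, but speaker verification results are conventionally summarised by the equal error rate (EER): the operating point at which the false-accept and false-reject rates coincide, with a higher EER meaning worse performance. Below is a minimal sketch, using simulated trial scores rather than the study's data, of how a disguise that pushes genuine-speaker scores down towards impostor scores raises the EER.

```python
import numpy as np

def equal_error_rate(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """EER: the error rate at the threshold where false-accept and
    false-reject rates are (approximately) equal."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)

# Hypothetical trial scores: disguise lowers genuine scores towards impostors,
# so the two distributions overlap more and the EER rises.
rng = np.random.default_rng(1)
genuine_normal = rng.normal(0.8, 0.10, 1000)
genuine_disguised = rng.normal(0.5, 0.15, 1000)
impostor = rng.normal(0.3, 0.10, 1000)
print(f"EER, normal voice:    {equal_error_rate(genuine_normal, impostor):.1%}")
print(f"EER, disguised voice: {equal_error_rate(genuine_disguised, impostor):.1%}")
```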

More information: Rosa González Hautamäki et al, Acoustical and perceptual study of voice disguise by age modification in speaker verification, Speech Communication (2017). DOI: 10.1016/j.specom.2017.10.002

