AI researchers claim 93% accuracy in detecting keystrokes over Zoom audio

Image: Woman setting up a microphone right by her MacBook. Some people hate to hear other people’s keyboards on video calls, but AI-backed side channel attackers? They say crank that gain.

Getty Images

By recording keystrokes and training a deep learning model, three researchers claim to have achieved upwards of 90 percent accuracy in interpreting remote keystrokes, based on the sound profiles of individual keys.

In their paper “A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards” (full PDF), UK researchers Joshua Harrison, Ehsan Toreini, and Maryam Mehrnezhad claim that the trio of ubiquitous machine learning, microphones, and video calls “present a greater threat to keyboards than ever.” Laptops, in particular, are more susceptible to having their keyboards recorded in quieter public areas, like coffee shops, libraries, or offices, the paper notes. And most laptops have uniform, non-modular keyboards, with similar acoustic profiles across models.

Previous attempts at keylogging VoIP calls, without physical access to the subject, achieved 91.7 percent top-5 accuracy over Skype in 2017 and 74.3 percent accuracy in VoIP calls in 2018. Combining the output of the keystroke interpretations with a “hidden Markov model” (HMM), which guesses at more-likely next-letter outcomes and could correct “hrllo” to “hello,” saw one prior side channel study’s accuracy jump from 72 to 95 percent, though that was an attack on dot-matrix printers. The UK researchers believe their paper is the first to make use of the recent sea change in neural network technology, including self-attention layers, to mount an audio side channel attack.
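To make the “hrllo” to “hello” idea concrete, here is a minimal sketch of HMM-style decoding in Python. It is not the code from any of the studies mentioned. The word list, the per-keystroke classifier scores, and the function names `make_bigram_logp` and `viterbi_decode` are all made up for illustration, and the classifier's scores stand in for proper HMM emission probabilities.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def make_bigram_logp(words, alpha=0.1):
    """Build a smoothed log P(next letter | previous letter) from a word list."""
    counts = {}
    for word in words:
        prev = "^"  # start-of-word marker
        for ch in word:
            counts.setdefault(prev, {})
            counts[prev][ch] = counts[prev].get(ch, 0) + 1
            prev = ch

    def logp(prev, ch):
        row = counts.get(prev, {})
        total = sum(row.values()) + alpha * len(ALPHABET)
        return math.log((row.get(ch, 0) + alpha) / total)

    return logp

def viterbi_decode(observations, logp):
    """Pick the letter sequence that best balances classifier scores
    against how plausible each letter-to-letter transition is."""
    best = {"^": (0.0, "")}  # last letter -> (log score, path so far)
    for obs in observations:
        step = {}
        for ch, p in obs.items():
            score, path = max(
                (prev_score + logp(prev, ch), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[ch] = (score + math.log(p), path + ch)
        best = step
    return max(best.values())[1]

# Hypothetical toy vocabulary and invented classifier output: the second
# keystroke sounds slightly more like "r" than "e" (neighbors on QWERTY).
WORDS = ["hello", "help", "held", "shell", "yellow"]
logp = make_bigram_logp(WORDS)
observations = [
    {"h": 1.0},
    {"r": 0.6, "e": 0.4},
    {"l": 1.0},
    {"l": 1.0},
    {"o": 1.0},
]

raw_guess = "".join(max(o, key=o.get) for o in observations)
corrected = viterbi_decode(observations, logp)
print(raw_guess, "->", corrected)
```

With these invented numbers, the raw per-key guess is “hrllo,” while the decoder settles on “hello” because “h” followed by “r” followed by “l” never appears in the toy vocabulary.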

Training and validation accuracy for the researchers’ study, with phone-recorded data on the left, and Zoom on the right.

IEEE/Durham University
The microfiber towel was to reduce table vibration pickup. It’s a bit conspicuous, but it also might not be necessary, given how well the Zoom results turned out.

IEEE/Durham University
The process for turning audio recordings into machine-learning-friendly bits.

IEEE/Durham University
More detail on how audio files were transformed into data ready for analysis.

IEEE/Durham University

The researchers used a 2021 MacBook Pro to test their concept, a laptop that “features a keyboard identical in switch design to their models from the last two years and potentially those in the future,” typing on 36 keys 25 times each to train their model on the waveforms associated with each key. They used an iPhone 13 mini, 17 cm away, to record the keyboard’s audio for their first test. For the second test, they recorded the laptop keys over Zoom, using the MacBook’s built-in microphones, with Zoom’s noise suppression set to its lowest level. In both tests, they were able to achieve higher than 93 percent accuracy, with the phone-recorded audio edging closer to 95-96 percent.
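Thirty-six keys pressed 25 times each works out to 900 short clips, which first have to be cut out of the longer recordings. The sketch below shows one common way to do that, by looking for jumps in short-term signal energy. It is an illustration under stated assumptions, not the paper's exact pipeline: the frame length, threshold, gap setting, and the synthetic "recording" are all hypothetical.

```python
import math

def frame_energies(samples, frame_len):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def find_keystroke_starts(samples, frame_len=100, threshold=1.0, min_gap_frames=2):
    """Return the sample index where each loud burst begins.

    A frame counts as loud when its energy reaches `threshold`; loud frames
    within `min_gap_frames` of the previous loud frame are treated as part
    of the same keystroke.
    """
    starts = []
    last_loud = None
    for idx, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= threshold:
            if last_loud is None or idx - last_loud > min_gap_frames:
                starts.append(idx * frame_len)
            last_loud = idx
    return starts

def synthetic_recording(press_positions, length=3000, burst_len=200):
    """Silence with a short sine burst at each press position (made-up data)."""
    samples = [0.0] * length
    for start in press_positions:
        for k in range(burst_len):
            samples[start + k] = math.sin(2 * math.pi * k / 10)
    return samples

recording = synthetic_recording([500, 1500, 2500])
starts = find_keystroke_starts(recording)
print(starts)
```

On this synthetic signal, the function reports the three burst positions that were planted. A real recording would need a threshold tuned to the room and microphone, and each detected clip would then be labeled with its key before training.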

a collection of mechanical keyboards with different switch types, but the researchers had no particular say on that strategy.

Sound-based side channel attacks on sensitive computer data are sometimes seen in research, though rarely in disclosed breaches. Scientists have used computer sounds to read PGP keys, and machine learning and webcam mics to “see” a remote screen. Side channel attacks themselves are a real threat, however. The 2013 “Dropmire” scandal that saw the US spying on its European allies was highly likely to have involved some kind of side channel attack, whether through wires, radio frequencies, or sound.
