Tools for capturing and analyzing keyboard input paired with microphone capture
The main goal is to exploit the sound produced by pressing keyboard keys as a side channel in order to guess the content of the text being typed. To achieve this, the algorithm takes as input a training set consisting of an audio recording together with the keys that were typed during that recording.
Using this data, the algorithm learns what each keystroke sounds like and later attempts to recognise keystrokes from captured audio alone. The training set is very specific, in the sense that it targets a single setup: keyboard, microphone, and the relative position between the two. Changing any of these factors renders the approach useless. As a bonus, the current implementation does the prediction in real time.
The main steps involved in the implementation are the following:
- collecting training data
- creating a prediction model (learning step)
- keypress detection
- predicting the key for a detected keypress
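The steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual method: it assumes keypress detection by a short-time energy threshold and prediction by picking the per-key averaged template with the highest normalized correlation. All names (`detect_keypresses`, `build_templates`, `predict_key`, `frame_len`, `threshold`, `refractory_frames`) are hypothetical.

```python
import math


def detect_keypresses(samples, frame_len, threshold, refractory_frames=2):
    """Return start indices of frames whose mean energy exceeds `threshold`.

    Assumption: a keypress shows up as a burst of energy. After each
    detection, `refractory_frames` frames are skipped so that one press
    is not reported several times.
    """
    onsets = []
    skip_until = -1
    for i in range(len(samples) // frame_len):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and i >= skip_until:
            onsets.append(i * frame_len)
            skip_until = i + 1 + refractory_frames
    return onsets


def normalized_correlation(a, b):
    """Cosine similarity between two equal-length windows."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_templates(labelled_windows):
    """Learning step (assumed): average all training windows of each key.

    `labelled_windows` is a list of (key, window) pairs, where every
    window has the same length.
    """
    sums, counts = {}, {}
    for key, window in labelled_windows:
        if key not in sums:
            sums[key] = [0.0] * len(window)
            counts[key] = 0
        sums[key] = [s + x for s, x in zip(sums[key], window)]
        counts[key] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def predict_key(window, templates):
    """Return the key whose template correlates best with `window`."""
    return max(templates,
               key=lambda k: normalized_correlation(window, templates[k]))
```

A typical flow under these assumptions would be: call `build_templates` once on the training data, run `detect_keypresses` on incoming audio, cut a window at each detected position, and pass it to `predict_key`. Because the templates are tied to one keyboard, microphone and placement, they would have to be rebuilt whenever the setup changes.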
In the current implementation, the sound between keystrokes is simply discarded. We keep only the audio within 75-100 ms before and after the actual press. This is a bit tricky, as there seem to be random delays between the key being pressed and the event being captured by the program, most likely caused by both hardware and software factors.
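Cutting the audio around each press could look something like the sketch below. It assumes the recording is a flat list of samples and that press timestamps are given in seconds relative to the start of the recording. The function and parameter names are hypothetical, and the default of 75 ms is simply the lower end of the range mentioned above.

```python
def extract_keystroke_windows(samples, sample_rate, press_times_s,
                              half_window_ms=75.0):
    """Return one fixed-length window of samples per key press.

    Each window spans `half_window_ms` before and after the press time.
    Presses too close to either end of the recording are skipped, and
    everything outside the windows is discarded.
    """
    half = int(round(sample_rate * half_window_ms / 1000.0))
    windows = []
    for t in press_times_s:
        center = int(round(t * sample_rate))
        start, end = center - half, center + half
        if start < 0 or end > len(samples):
            continue  # not enough audio around this press
        windows.append(samples[start:end])
    return windows
```

This sketch does not compensate for the delay between the physical press and the recorded event. A wider window, or re-centering each window on its energy peak, are possible ways to absorb that jitter, but they are not taken from the implementation.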
The implementation depends on the following libraries:
- SDL2 (libsdl): used to capture audio and to open GUI windows
- FFTW3 (fftw): some of the helper tools perform Fourier transforms