This CREST report explores how keystroke dynamics can be used to identify the name and native language of an anonymous user.

Executive summary

Keystroke dynamics is the analysis of how an individual uses a keyboard. These typing behaviours can be as uniquely identifiable as a person’s handwriting or signature and this data can reveal identifying characteristics about an individual.

The analysis of the typing rhythm and cadence of a user can be used to identify an individual, while also providing information about the person sitting at the keyboard, which can include characteristics such as handedness, hand size, or typing style.

Whereas previous work largely focused on confirming the identity of an anonymous user, this work aimed to understand more about the individual using the device by determining the name and native language of an anonymous user, based on how they type.

The first experiment focused on determining the name of an anonymous user and collected typing samples from 84 users. The participants completed several typing exercises in which the timing of each keystroke was recorded. The typing data were subdivided into substrings of two characters (bigrams), and the time between releasing the first key and pressing the second (the flight time) was calculated. The research hypothesis was that bigrams more familiar to the user would rank discernibly higher than those less commonly used.
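The bigram and flight-time calculation described above can be sketched as follows. This is a minimal illustration rather than the report's implementation: the event format `(key, press_ms, release_ms)`, the function names, and the sample timings are all hypothetical, and ranking bigrams by shortest mean flight time is an assumption about what "higher ranking" means.

```python
from collections import defaultdict
from statistics import mean


def bigram_flight_times(events):
    """Group flight times by bigram.

    events: list of (key, press_ms, release_ms) tuples in typing order
    (a hypothetical format). Flight time is the press time of the second
    key minus the release time of the first; it can be negative when the
    second key is pressed before the first is released.
    """
    flights = defaultdict(list)
    for (k1, _p1, r1), (k2, p2, _r2) in zip(events, events[1:]):
        flights[k1 + k2].append(p2 - r1)
    return dict(flights)


def rank_bigrams(flights):
    """Order bigrams from shortest to longest mean flight time.

    Assumption: a shorter flight time indicates a more familiar bigram.
    """
    return sorted(flights, key=lambda b: mean(flights[b]))


# Made-up timings (milliseconds) for someone typing "john".
sample = [("j", 0, 80), ("o", 120, 190), ("h", 260, 330), ("n", 350, 420)]
flights = bigram_flight_times(sample)
ranking = rank_bigrams(flights)
print(flights)
print(ranking)
```

In practice each bigram would be observed many times across the typing exercises, so the per-bigram lists would hold many samples and the ranking would be based on their averages.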

The research used machine learning classifiers to develop a model that predicts the bigrams in an anonymous user’s name with a balanced accuracy of approximately 70%.

The second experiment aimed to predict the native language of an individual based on an analysis of their typing behaviours. The experiment collected data from 492 participants across five native languages (English, French, German, Spanish, and Italian), with around 100 people in each group. The participants were again required to complete typing exercises, and these data were segmented into bigrams.

Machine learning classifiers were again used to predict a user’s native language. In the first instance, the research aimed to determine whether English was the user’s native language (i.e. English versus French, Spanish, German, or Italian), achieving a balanced accuracy of 71% with the SVC (support vector classifier) model.

When predicting the native language of an individual from among all five languages, the approach achieved a balanced accuracy of 45%. While this leaves significant room for improvement, it does perform notably better than a random prediction.
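To put the 71% and 45% figures in context, balanced accuracy is the mean of the per-class recall, so a classifier that ignores its input scores about 1/2 on a two-class task and about 1/5 on a five-class task. The sketch below illustrates this with made-up labels; the function and variable names are hypothetical and the data are not from the study.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


languages = ["English", "French", "German", "Spanish", "Italian"]

# Made-up ground truth: four participants per language.
y_true = [lang for lang in languages for _ in range(4)]

# Five-class baseline: always guess the same language.
five_class_baseline = balanced_accuracy(y_true, ["English"] * len(y_true))

# Two-class task (English versus the rest), always guessing "Other".
binary_true = ["English" if lang == "English" else "Other" for lang in y_true]
binary_baseline = balanced_accuracy(binary_true, ["Other"] * len(binary_true))

print(five_class_baseline)  # 0.2
print(binary_baseline)      # 0.5
```

Because the metric averages over classes rather than over participants, the unequal group sizes in the English-versus-rest task do not inflate the baseline.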

The research established that users display repeatable and predictable typing behaviours based on familiar identity or linguistic data.

Read more
  • E. F. Gehringer, Choosing passwords: security and human factors, in Technology and Society, 2002 (ISTAS’02). 2002 International Symposium on. IEEE, 2002, pp. 369–37
  • R. V. Yampolskiy and V. Govindaraju, Behavioural biometrics: a survey and classification, International Journal of Biometrics, vol. 1, no. 1, pp. 81–113, 2008
  • F. Monrose and A. D. Rubin, Keystroke dynamics as a biometric for authentication, Future Generation Computer Systems, vol. 16, no. 4, pp. 351–359, 2000
  • P. M. Fitts and M. I. Posner. Human performance. 1967
  • K. Delac and M. Grgic, A survey of biometric recognition methods, in Electronics in Marine, 2004. Proceedings Elmar 2004. 46th International Symposium. IEEE, 2004, pp. 184–193
  • S. Douhou and J. R. Magnus, The reliability of user authentication through keystroke dynamics, Statistica Neerlandica, vol. 63, no. 4, pp. 432–449, 2009
  • A. Dvorak, N. L. Merrick, W. L. Dealey, and G. C. Ford, Type writing behaviour, New York: American Book Company, vol. 1, no. 6, 1936
  • S. Bleha, C. Slivinsky, and B. Hussien, Computer-access security systems using keystroke dynamics, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 12, pp. 1217–1222, 1990
  • D. Rudrapal, S. Das, and S. Debbarma, Improvisation of biometrics authentication and identification through keystrokes pattern analysis, in International Conference on Distributed Computing and Internet Technology. Springer, 2014, pp. 287–292
  • M. Rybnik, M. Tabedzki, and K. Saeed, A keystroke dynamics-based system for user identification, in Computer Information Systems and Industrial Management Applications, 2008. CISIM’08. 7th. IEEE, 2008, pp. 225–230
  • P. Pinto, B. Patrao, and H. Santos, Free-typed text using keystroke dynamics for continuous authentication, in IFIP International Conference on Communications and Multimedia Security. Springer, 2014, pp. 33–45
  • A. Messerman, T. Mustafic, S. A. Camtepe, and S. Albayrak, Continuous and non-intrusive identity verification in real-time environments based on free-text keystroke dynamics, in Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011, pp. 1–8
  • C. Epp, M. Lippold, and R. L. Mandryk, Identifying emotional states using keystroke dynamics, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2011, pp. 715–724
  • F. Bergadano, D. Gunetti, and C. Picardi, User authentication through keystroke dynamics, ACM Transactions on Information and System Security (TISSEC), vol. 5, no. 4, pp. 367–397, 2002
  • R. Giot, M. El-Abed, and C. Rosenberger, Keystroke dynamics overview, in Biometrics. InTech, 2011, pp. 157–182
  • A. K. Jain, S. C. Dass, and K. Nandakumar, Soft biometric traits for personal recognition systems, in Biometric Authentication. Springer, 2004, pp. 731–738
  • S. Z. S. Idrus, E. Cherrier, C. Rosenberger, and P. Bours, Soft biometrics for keystroke dynamics: Profiling individuals while typing passwords, Computers & Security, vol. 45, pp. 147–155, 2014
  • M. Monaro, R. Spolaor, Q. Li, M. Conti, L. Gamberini, and G. Sartori, Type me the truth! Detecting deceitful users via keystroke dynamics, in Proceedings of the 12th International Conference on Availability, Reliability and Security, 2017, pp. 1–6