Lecture | Lunch included
Florence Nightingale Colloquium presents Odette Scharenborg
- Friday 21 February 2020
- The seminar is aimed at a broad audience. In particular, we invite master's students, PhD candidates and supervisors interested or involved in the Data Science Research programme, as well as colleagues from LIACS and MI, to attend. The seminar is organized by the DSO, MI and LIACS.
- Florence Nightingale Colloquium
Niels Bohrweg 1
2333 CA Leiden
Link to registration form.
Speech representations and speech processing in humans and deep neural networks
Speech recognition is the mapping of a continuous, highly variable speech signal onto discrete, abstract representations. The question of how speech is represented and processed in the human brain and in automatic speech recognition (ASR) systems, although crucial both in the field of human speech processing and in the field of automatic speech processing, has historically been investigated in the two fields separately. I will argue that comparisons between humans and deep neural network (DNN)-based ASR systems, and cross-fertilization of the two research fields, can provide valuable insights into the way humans process speech and can improve ASR technology. Specifically, I will present the results of several experiments, carried out on both human listeners and DNN-based ASR systems, on the representation of speech and on lexically-guided perceptual learning, i.e., the ability to adapt a sound category on the basis of new incoming information, resulting in improved processing of subsequent information. I will explain how listeners adapt to the speech of new speakers, and I will present the results of a lexically-guided perceptual learning study that we carried out on a DNN-based ASR system, modelled on the human experiments. To investigate the speech representations and adaptation processes in the DNN-based ASR systems, we visualized the activations in the hidden layers of the DNN. These visualizations revealed that DNNs use speech representations similar to those used by human listeners, without being explicitly taught to do so, and showed an adaptation of the phoneme categories similar to what is assumed to happen in the human brain.