Lips pouted or not? How improved speaker recognition can help forensic investigations

Police investigations use wiretapped phone recordings as investigative material fairly regularly. But how do they know that the voice on the recording actually belongs to the suspect? PhD student Laura Smorenburg is trying to answer that question.

Everyone has a different voice, so we often know immediately whether we are hearing a friend or someone we would cross the street to avoid. But when the police intercept a phone call, it is much harder to say with 100 per cent certainty who is speaking on the recording. Moreover, the sound quality of a phone call is worse than that of, say, a microphone recording. This makes things difficult for the police, who often work with phone recordings. Forensic analysts are therefore trying to improve speaker recognition.

Role of consonants

One way to do that is to zoom in on the different sounds of a language. Smorenburg looked specifically at the role of two groups of consonants: nasals and fricatives. 'For example, how does someone pronounce the G in Dutch? Is it a soft or a hard G? If you compare the speech characteristics in different types of recordings, you can examine whether they are likely to come from the same speaker,' she says.

Some sounds contain more information about the speaker than others. 'The phonetic context of a word can influence pronunciation. When you speak, you don't pronounce sounds separately but join them together, and that affects the sounds themselves. The G in Dutch is a good example: in the word "geen" (meaning "none"), it sounds different from the G in "goed" (meaning "good"), because your lips are in a different position. People differ in the degree and timing of lip pouting when they pronounce rounded sounds like the vowel in "goed", so I wanted to know whether those differences between speakers matter for forensic analysis,' Smorenburg explains. If sounds in certain contexts carry more speaker information, forensic analysts could concentrate on those sounds and work more effectively.

Minimal gain

The study found that analysing nasal and fricative consonants in specific sound contexts does indeed yield slightly more speaker information, although Smorenburg adds a caveat: 'In practice, you have to make do with what you have, because there is often very little data available for analysis. Examining only nasal and fricative consonants from specific contexts gives you only minimal gains in evidence. In a way, this is also good news, because it means that forensic investigators do not have to take phonetic context into account in their analyses.'

