Special sessions

Speech Prosody 2024 includes seven special sessions. When making a submission, authors are asked to indicate whether they want their paper to be considered for a special session. You can find descriptions of each below.

Description:

This special session aims to foster discussions among speech scientists and technologists on the application of Information Theory in prosodic research, particularly exploring the potential of Large Language Models (LLMs) within this framework. The session seeks contributions that highlight the utility of Information Theory in prosodic research and delve into the interplay between prosody and information-theoretic effects in speech. Information Theory provides explicit measures, including entropy, information content, redundancy, and mutual information, which offer a formal approach to studying prosodic phenomena and their relationship to linguistic variation.
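For orientation, the measures named above are commonly written as follows. This is only a sketch of the textbook definitions for a discrete variable; the session description does not prescribe any particular formulation, and the symbols ($X$, $Y$, $p$, $\mathcal{X}$) are notation assumed here for illustration.

```latex
\begin{align}
  I(x)   &= -\log_2 p(x)
         && \text{information content of outcome } x \\
  H(X)   &= -\sum_{x \in \mathcal{X}} p(x)\,\log_2 p(x)
         && \text{entropy} \\
  R(X)   &= 1 - \frac{H(X)}{\log_2 |\mathcal{X}|}
         && \text{redundancy, relative to the maximum entropy} \\
  I(X;Y) &= \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
         && \text{mutual information}
\end{align}
```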

Furthermore, the advent of Large Language Models has revolutionized language understanding and generation. In the special session, contributions are invited to examine the applicability of current information-theoretic measures, such as surprisal, informativity, and conditional entropy, in quantifying information content and determining prosodic variation. The session also aims to explore emerging measures that leverage the capabilities of LLMs for modeling language, offering a deeper understanding of linguistic content in prosodic research based on information-theoretic principles.
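As a rough illustration of two of the measures mentioned in this description, the sketch below computes surprisal and conditional entropy from a toy list of (context, word) pairs. The data, the function names, and the relative-frequency estimator are all assumptions made for this example; in practice the conditional probabilities would more likely be taken from a language model.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (context, word) pairs, for illustration only.
bigrams = [("the", "cat"), ("the", "dog"), ("the", "cat"), ("a", "dog")]


def conditional_probs(pairs):
    """Estimate P(word | context) by relative frequency."""
    counts = defaultdict(Counter)
    for context, word in pairs:
        counts[context][word] += 1
    return {
        context: {w: c / sum(ctr.values()) for w, c in ctr.items()}
        for context, ctr in counts.items()
    }


def surprisal(probs, context, word):
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(probs[context][word])


def conditional_entropy(pairs):
    """H(W | C) = sum over contexts c of P(c) * H(W | C = c), in bits."""
    probs = conditional_probs(pairs)
    context_counts = Counter(context for context, _ in pairs)
    total = len(pairs)
    h = 0.0
    for context, dist in probs.items():
        h_given_c = -sum(p * math.log2(p) for p in dist.values())
        h += (context_counts[context] / total) * h_given_c
    return h


probs = conditional_probs(bigrams)
print(surprisal(probs, "the", "cat"))   # surprisal of "cat" after "the"
print(conditional_entropy(bigrams))     # average uncertainty about the next word
```

Higher surprisal marks a word as less predictable in its context, which is the kind of quantity that can then be related to prosodic variables such as duration or f0 range.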

Description:

Spoken language often places multiple competing demands on the speech organs, creating an intriguing tension between the segmental and suprasegmental domains. As an example, phonological laryngeal contrasts, such as voicing contrasts in obstruents, are largely regulated with the same laryngeal articulators that govern fundamental frequency (F0). Consequently, the implementation of a voicing contrast typically exerts a localized influence on the resulting F0 contours (House & Fairbanks 1953). This particular microprosodic effect has been widely regarded as one of the main drivers in the emergence of phonological tone (e.g. Ohala 1973; Hyman 1976). Another well-known example is the intrinsic F0 of vowels. During the production of high vowels, tongue raising stretches the vocal folds, which increases F0, while during the production of low vowels, jaw lowering serves to slacken the vocal folds, which decreases F0 (e.g. Chen et al. 2021).

These examples can be considered universal or near-universal in the sense that they are phonetically natural and exhibit a high degree of consistency across the world’s languages. However, the phonological environments, magnitude, and temporal extent of segment–prosody interactions have also been shown to vary by language. For example, aspiration triggers an F0-lowering effect in some languages but not others (e.g. Chen 2011), and vowel-intrinsic F0 tends to be suppressed by a phonological low tone target (Connell 2002). Additionally, in some languages, segment–prosody interactions serve as minor secondary cues to a contrast, while in others, they develop over time into primary contrasts, triggering tonogenesis.

In recent years, there has been a surge of renewed interest in segment-intrinsic microprosody, which highlights some of the remaining open questions, e.g.: How ‘automatic’ is segment-intrinsic microprosody and to what extent can it be controlled? What is the exact relation between pitch and other segment-intrinsic microprosodic phenomena, such as segment-induced variations in voice quality and formants? How does all this relate to higher-level phonological units and sound change? These questions mainly arise from gaps in our understanding of the mechanisms driving segment-intrinsic prosody. With this special session, we hope to showcase studies that broaden our knowledge with research into previously understudied phenomena and under-documented languages from the perspective of segment–prosody interactions. We particularly welcome papers that probe and model the extent of typological and interspeaker variability in segment-intrinsic prosody.

References
Chen, Wei-Rong, Douglas H. Whalen & Mark K. Tiede. 2021. A dual mechanism for intrinsic f0. Journal of Phonetics 87. doi:10.1016/j.wocn.2021.101063.
Chen, Yiya. 2011. How does phonology guide phonetics in segment-f0 interaction? Journal of Phonetics 39(4), 612–625. doi:10.1016/j.wocn.2011.04.001.
Connell, Bruce. 2002. Tone languages and the universality of intrinsic F0. Evidence from Africa. Journal of Phonetics 30(1), 101–129. doi:10.1006/jpho.2001.0156.
House, Arthur S. & Grant Fairbanks. 1953. The influence of consonant environment upon the secondary acoustical characteristics of vowels. Journal of the Acoustical Society of America 25(1), 105–113. doi:10.1121/1.1906982.
Hyman, Larry M. 1976. Phonologization. In Alphonse G. Juiland (ed.), Linguistic studies offered to Joseph Greenberg on the occasion of his sixtieth birthday, 407–418. Saratoga: Anma Libri.
Ohala, John J. 1973. The physiology of tone. In Larry M. Hyman (ed.), Consonant types and tone (Southern California Occasional Papers in Linguistics 1), 1–14. Los Angeles: University of Southern California.

Notes:
This special session is designed to accommodate a maximum of 6 oral talks. Depending on the quantity and quality of the abstracts we receive, there is also the possibility of including posters in the session.

Description:

Pitch plays a critical role in human communication through different manifestations in various types of music and language. Human perception needs to adapt to pitch not only in language but also in music across the life span. Empirical evidence shows that the ability to process pitch varies across individuals and can influence the processing of language and music bidirectionally.

While the decoding of pitch in non-tonal languages may mainly involve post-lexical perceptual processes, pitch information in tone languages provides independent cues and interacts with segments in real-time word recognition. Yet, in music perception, pitch processing intertwines with not only lower-level perceptual processes but also higher-order musical structures and properties. How is pitch processed in these seemingly different cognitive domains? How do linguistic and musical skills contribute to domain-general auditory processing, and vice versa? Do the same factors influence pitch processing in individuals from different populations in a similar way? In this special session, we call for papers that show how pitch contributes to the process of language comprehension and music perception across different populations.

We welcome papers that address these questions from different angles and in different populations, including native speakers of tonal or non-tonal languages with and without specific language impairment, bilingual listeners, children who are acquiring a tone language, musicians, and individuals with neurodevelopmental conditions such as autism, amusia, and dyslexia. The goal of this session is to provide a forum for researchers to gather empirical evidence that contributes to the theoretical development of how pitch drives language comprehension and music perception. We also encourage submissions that investigate this issue using a variety of methods, including behavioural measures, eye-tracking, EEG, and fMRI/MEG.

Description:

Speakers employ intonation for diverse communicative purposes, ranging from the signalling of different linguistic functions, such as asking questions, marking focus, or juncture, to the indexing of socio-cognitive information, such as a speaker’s emotional or affective state, stance, or attitude (Chen, 2022; Grice et al., 2023; Gussenhoven, 2004; Ladd, 2008; Xu, 2019). For example, words in focus are typically realized with larger f0 movements, longer durations, and amplified intensity, as compared to non-focused words. Also, a happy voice is most often associated with raised f0, while a sad tone of voice goes with a lowering of f0.

To date, these linguistic and socio-cognitive functions of intonation have often been considered separately in both theoretical frameworks and experimental studies, even though they might interact. This interplay between different communicative functions of intonation is particularly intriguing in languages with lexical tone, such as African tone languages or Sinitic varieties, where f0 is the primary cue to encode lexical meaning in addition to its post-lexical functions (e.g., Hyman et al., 2020; Zhang et al., 2020). In line with this, recent studies have shown that prosodic differences between neutral (information-seeking) questions and rhetorical questions in a variety of languages, both tonal and non-tonal, could be explained by a combination of assertive force, focus, and (negative) speaker attitude (Zahner-Ritter, Chen, et al., 2022; Zahner-Ritter, Einfeldt, et al., 2022).

A fruitful avenue for intonational research is thus to consider the multidimensionality of intonational functions to advance our knowledge of how linguistic and socio-cognitive functions are intertwined in different communicative contexts. Comparative approaches across tonal and non-tonal languages will help unravel the typological differences and similarities. Furthermore, analyses of intonational meaning also need to go beyond f0 variation and include other prosodic cues such as duration or voice quality. Additionally, it is essential to examine how intonation interacts with other linguistic aspects (e.g., lexis and syntax) and non-linguistic elements (e.g., facial expressions or gestures).

Our proposed special session aims to foster discussion on how to integrate the linguistic and socio-cognitive functions of intonation in conveying communicative meanings. We welcome submissions on typologically diverse languages, especially those that explore the interaction between lexical tone and intonation, and submissions on understudied languages and populations. We also encourage contributions that approach the topic from a methodological perspective. Our proposed special session aligns with two of the four themes outlined for Speech Prosody 2024. Specifically, we delve into a) cross-linguistic variation encompassing typologically diverse languages, placing emphasis on b) individual and social variation.

References
Chen, Y. (2022). Mind the subtle f0 modifications: The interaction of tone and intonation in Sinitic varieties. Stellenbosch Papers in Linguistics Plus, 62(02). https://doi.org/10.5842/62-2-904
Grice, M., Wehrle, S., Krüger, M., Spaniol, M., Cangemi, F., & Vogeley, K. (2023). Linguistic prosody in autism spectrum disorder—An overview. Language and Linguistics Compass, 17(5), e12498. https://doi.org/10.1111/lnc3.12498
Gussenhoven, C. (2004). The phonology of tone and intonation. Cambridge University Press.
Hyman, L. M., Sande, H., Lionnet, F., Rolle, N., & Clem, E. (2020). Sub-Saharan Africa. In C. Gussenhoven & A. Chen (Eds.), The Oxford Handbook of Language Prosody (pp. 182–194). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198832232.013.11
Ladd, D. R. (2008). Intonational phonology (2nd ed.). Cambridge University Press.
Xu, Y. (2019). Prosody, Tone and Intonation. In W. F. Katz & P. F. Assmann (Eds.), The Routledge Handbook of Phonetics (pp. 314–356). Routledge.
Zahner-Ritter, K., Chen, Y., Dehé, N., & Braun, B. (2022). The prosodic marking of rhetorical questions in Standard Chinese. Journal of Phonetics, 95, 101190. https://doi.org/10.1016/j.wocn.2022.101190
Zahner-Ritter, K., Einfeldt, M., Wochner, D., James, A., Dehé, N., & Braun, B. (2022). Three kinds of rising-falling contours in German wh-questions: Evidence from form and function. Frontiers in Communication. https://doi.org/10.3389/fcomm.2022.838955
Zhang, J., Duanmu, S., & Chen, Y. (2020). Prosodic systems of China and Siberia. In C. Gussenhoven & A. Chen (Eds.), The Oxford Handbook of Language Prosody (pp. 331–343). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198832232.013.21

Description:

Although prosody is an intrinsic aspect of oral language production, work on the prosodic planning of utterances has been quite limited since the early models of speech production. Levelt's language production model famously contained a Prosody generator, the workings of which were at the time still, in his own words, "enigmatic". Some decades later, this characterisation still largely holds. Two important models, those of Levelt (1989) and Keating and Shattuck-Hufnagel (2002), propose the existence of a minimal generic prosodic representation for the planned phrase or utterance, enriched appropriately as more information about morpho-syntactic content is generated. However, parts of these models still lack detail and, perhaps for that same reason, little experimental work has been done to examine the characteristics of the initial and final representations of prosodic structure. Various studies have focused on specific aspects of prosody, e.g., how many categories of pitch accent there are in a sentence or how pauses reflect planning or phrase difficulty. Nevertheless, some of the main questions posed by the early theoretical work remain to be answered, particularly those regarding the mechanisms behind prosodic planning. For example, how far ahead is prosodic structure planned and how much look-ahead is required? Is prosodic information available before or after segmental information? In both cases, there is seemingly contradictory evidence. On the first question, some models of the prosodic structure of a sentence posit an overall structure that requires knowledge of the sentence as a whole, whereas evidence from production studies suggests that production is incremental. On the second, with respect to when prosodic information on contrastive focus becomes available, some models suggest that it is added at the phonological/phonetic stages, whereas others suggest it is planned at the conceptual stage.

Given the difficulty in putting together individual pieces of evidence, this special session aims to bring together researchers from different fields—phonetics, phonology, psycholinguistics, neurolinguistics, cognitive science—in order to jointly address these questions. We will invite contributions by speakers investigating prosody production from various angles, including experimental and corpus approaches. Examples include empirical work on the scope of prosodic planning, individual differences in the realisation of prosodic cues, and corpus analyses on the conversational contexts in which specific prosodic cues occur.

By highlighting the varied work that has been done in recent years on this topic, we hope to begin to understand as well as address the gaps and challenges that are involved in understanding prosodic production. This interdisciplinary special session will therefore illustrate the current state of the art and call attention to the main research questions surrounding prosody production, from planning to articulation.

References

Keating, P., & Shattuck-Hufnagel, S. (2002). A prosodic view of word form encoding for speech production. UCLA Working Papers in Phonetics, 112–156.
Levelt, W. (1989). Speaking: From intention to articulation. MIT Press.