Leiden University Centre for Digital Humanities

Small Grants 2024 Research Projects

The LUCDH foster the development of new digital research by awarding a number of Small Grants each year. As in previous years the LUCDH received a large number of excellent grant applications for Research and Personal Development funds. Congratulations to the recipients of this year's research awards!

Small Grants 2024 Research Projects

Tessa de Boer & Ramona Negrón & Jessica den Oudsten

Dear Digital Diary: An Exploration of the Huydecoper Diaries (1648-1704) through Handwritten Text Recognition

This project focuses on Joan Huydecoper van Maarsseveen (1625-1704). Huydecoper was a prominent individual who served as Burgomaster, Director of the Dutch East India Company, and Council in the Admiralty in Amsterdam. His diaries, including copies of his letters and lists of expenses, written between 1648 and 1704, have survived (Utrechts Archief). They provide a wealth of information about a variety of themes. For one, they provide a unique insight into Huydecoper's interactions with the Amsterdam elite. But the information in his diaries extends far beyond. Huydecoper corresponded, for example, with his servants and schoolmasters, as well as with scholars, diplomats, and merchants worldwide, in Dutch, French, Italian, and Latin.

Yet, despite the potential of his diaries as a source, they remain understudied. The primary obstacles for historians are the extensive volume of the diaries and Huydecoper's challenging handwriting. Our aim is, therefore, to make his diaries searchable by utilizing the latest digital methods.

The project has four principal goals:

Develop a high-quality HTR model to generate instant transcriptions for a historically relevant collection of primary sources. This will transform the currently inaccessible collection into a searchable full-text database, saving time and increasing research efficiency.
Develop a research agenda based on the primary source, utilizing collective expertise to identify priorities across historical subfields. This involves raising awareness of the source and the developed search tool.
Train a promising student in paleography and cutting-edge digital tools, bridging the gap between traditional and digital methodologies in source work.
Promote collaboration between academic and industrial expertise by working with archives.

The successful implementation of the HTR model will drastically enhance accessibility to the Huydecoper Diaries, offering new perspectives on various historiographical debates.

Jacob van Loo: Portrait of Joan Huydecoper

Louie Buana & Muhammad Asyrafi

Lontara Digital: Developing Text Recognition Model for Lontara Manuscripts Using Transkribus and PyLaia

Lontara manuscripts from South Sulawesi have earned a reputation among scholars of Southeast Asian studies for the realistic and open-ended portrayal of the past, in contrast to other local traditions that mix myth and legend with history. In the words of local scholar Abidin (1971), Lontara chronicles record events in a matter-of-fact way, without excessively flattering the rulers, which serves as useful and reliable preliminary sketches for reconstructing history. Among many genres of Lontara manuscripts in South Sulawesi, a local pre-Islamic epic called La Galigo rises to prominence. La Galigo is the world's longest epic (even longer than the Mahabharata) that was acknowledged by UNESCO as Memory of The World for Indonesia and The Netherlands in 2011.

This project aims to provide a tool for the accessibility and literacy of Lontara sources from South Sulawesi through the use of technology. This pilot project will develop a Customised Text Recognition Model prototype for the Lontara script by utilising Transkribus and PyLaia. The project will begin with creating a fresh Ground Truth from the La Galigo manuscripts kept in Leiden University Special Collection.

Ultimately, this project will compile the digitised documents written in Lontara script scattered in libraries and institutions worldwide. By doing this, the project will map, catalogue, and preserve Buginese and Makassarese knowledge and open many research possibilities. In the long run, we attempt to make the Lontara collection searchable using Handwritten Text Recognition (HTR) technology. The expected results from this pilot project, therefore:

Text Recognition Model for Lontara script;
Web application for Latin-Lontara and Lontara-Latin transliteration;
Blog/website to showcase the database of scattered Lontara manuscripts.

The project is also expected to open up opportunities for further exploration of non-Western historical sources in the Leiden University collections through HTR technology. At the time of writing this proposal, there are only five non-western public models available in Transkribus. As a result, the project aims to address the gaps in the non-western script model in Transkribus.

Lettie Dorst, Alina Karakanta, Katinka Zeven and Mayra Nas

Reading Metaphor in Literary Machine Translation and Post-Editing

This project focuses on metaphors in literary machine translation. It investigates what happens to metaphors when literary texts are machine-translated, for example using Google Translate or ChatGPT, and how readers respond to machine-translated metaphors. While a small but growing number of studies has investigated the usefulness of Statistical and Neural Machine Translation for literary translation, obtaining promising results for different genres (short stories, novels, poems) and language pairs (including English-French and English-Spanish), no studies have thus far focused specifically on metaphors and whether the way Machine Translation handles metaphors affects how readers understand and appreciate literary texts. A recent study by Guerberof-Arenas and Toral (2022) suggests that metaphors are likely to require creative solutions in translation, while Machine Translation tends to be direct, producing a literal translation by default. However, we don’t know whether readers interpret unidiomatic or unexpected literally translated metaphors as errors or as creativity.

The project will involve three main activities:

The first activity involves machine-translating 10 excerpts from English novels into Dutch using an Neural Machine Translation model (Google Translate) and a Large Language Model (ChatGPT).
For the second activity, the Machine Translation output will be corrected and revised (post-edited) by two professional literary translators to determine which ones they consider incorrect or inappropriate (linguistically, stylistically or culturally) as compared to how a human would translate them.
Finally, the third activity addresses the reactions of readers when they read literary passages with machine-translated metaphors. An online questionnaire will be used to determine how understandable, readable and enjoyable they are from the readers’ perspective.

Jeremy Farrell

Mapping the development of Quranic Reading Traditions (QRTs) with the HAMaLaT-Quran Database (8th–11th centuries CE)

In the past twenty five years, the field of Quranic Studies has witnessed the wide adoption of computational tools and frameworks of inquiry, which in turn has enabled scholarship on both the literary forms and manuscript tradition of the Quran to shed new light on the history of the sacred scripture and early communities of Muslims. Increasingly, researchers wishing to extend the scale of inquiry have begun applying similar computational tools and frameworks to a vast source of data, the Quranic Reading Traditions: the textually encoded record representing a historical figure's recitation of the entire text of the Quran and which is accepted as valid for ritual purposes. The formulation and subsequent transmission of these Quranic Reading Traditions occurred, over the course of the 7th through 11th centuries CE, through the efforts of an ethnically and economically diverse network of hundreds of actors from across the geographical expanse of the Islamic polity over more than 400 years. With the support of the Leiden University Centre for Digital Humanities, the Historical Atlas of the Mobility, Literature, and Transmission of the Quranic Reading Traditions (HAMaLaT-Quran) project models the available prosopographical data using digital tools, including network analysis (SNA), geographic information systems (GIS), and data visualization libraries. The resulting database will serve as a tool to enable future research to formulate and test precise questions about not only the growth and spread of individual reading traditions, but also the nature of the transmission of culturally vital information within the early Islamic social system.

Biography
Jeremy Farrell is a Postdoctoral Research Scholar at the Leiden University Centre for Linguistics (LUCL), as part of the ERC-Consolidator project entitled QurCan: The Canonization of the Quranic Reading Traditions. His scholarship utilizes text critical and computational methods, focusing on diverse aspects of Islamic society and Arabic literature, particularly the influence of textuality on the transmission of culture.

Nivja de Jong & Paz Gonzalez Gonzalez

The Effect of Immersive 360 Tasks on Aspects of L2 Speaking

Within language learning studies, 360-degree video technology is recognised for providing learners with a realistic and authentic environment. This technology creates immersive content from real-world footage, and fosters a sense of presence and engagement in language learning environments. In prior studies funded by LUCDH projects, speaking tasks utilizing 360-degree techniques were developed to prompt speech for applied linguistic research. Although these techniques were tested in previous projects, a comprehensive evaluation of their effectiveness and outcomes is still pending.

The present project aims to address this research gap by evaluating the effect of using immersive 360 videos as elicitation compared to speech elicited through the same tasks presented in two types of 2D formats: normal 2D videos and cartoons. The comparison involves quantitative measurements of the naturalness of the speech in the three conditions and a qualitative evaluation by students. Because the production and use of speaking tasks with the 360 technique are more expensive than 2D videos or simple cartoons, the outcomes of the research will inform future researchers and language teaching professionals to evaluate the worth of creating 360 videos for speaking tasks in research and teaching.

Sarita Koendjbiharie

Virtual reality storytelling, embodied experience, empathy and understanding (Correspondents of the World)

Virtual reality storytelling, embodied experience, empathy and understanding

In his 2015 TED talk Chris Milk touted Virtual Reality as ‘ultimate empathy machine’. Why? Since with this technology, one can viscerally experience anything from another person’s perspective.

VR experiences, or immersive virtual environments (IVEs), are 3D, computer generated environments which enable free movement and interaction with these surroundings by users. In IVEs the perceptual input of the real world is replaced by perceptual input from a virtual world (Herrera et al. 2018).

Researchers have created IVEs for various purposes usually for perspective-taking of stigmatised or marginalised groups of people. For example, ‘the 1000 cut journey’, an IVE in which participants embody a black male, Michael Sterling, experiencing racism as a child, adolescent and adult. Or ‘Carne y Arena’ where one is transported to the Mexican desert to join a caravan of migrants about to cross the United States border with a ‘coyote’ or smuggler.

Evidence on the impact of IVEs on changes in empathy behaviour is however mixed and based mainly on survey data. This study addresses the need for more clarity on user ‘s understanding of the other after walking in their ‘virtual shoes’.

Correspondents of the World is a platform that invites people around the world to tell their stories and share their personal experiences in relation to global issues including gender and sexuality, migration and exile, liberation and human rights. In this study, we create an IVE of such a story location. For example: Black Lives Matter: An Experience in the Train or Gender-neutral Toilets – A Safe Space Taken Away?

In this study the IVE will enable participants to feel the perspective of a character/author depicted in the story through access to the sights and sounds, and even to the feelings and emotions, associated with the personal story.

The IVE enables study of embodied experience i.e. action taken with one’s own body, emotional arousal and its impact on empathy. Insights have been accumulating, there is space however, for more exploratory qualitative research into user experience and empathic arousal after Virtual Reality storytelling, particularly compared to online reading and imagining the context.

Through this research we seek a more nuanced understanding on the impact of Virtual Reality storytelling on users, its strengths and pitfalls including blind morality (Bloom, 2017). We aim to analyse in-depth the extent to which Virtual Reality can foster empathy and enhance better understanding in the world.

Sources

Milk C. How virtual reality can create the ultimate empathy machine [Internet]. TED: Ideas worth spreading. 2015 [cited 2023Dec13]. How virtual reality can create the ultimate empathy machine (TED)
Herrera, F., Bailenson, J., Weisz, E., Ogle, E., & Zaki, J. (2018). Building long-term empathy: A large-scale comparison of traditional and virtual reality perspective-taking. PloS one, 13(10), e0204494.
1000 cut journey (Cogburn Research Group)
Carne-y-arena (Virtually present, Physically invisible)
Black lives matter an experience in the train (CotW)
Gender neutral toilets a safe space taken away (CotW)
Bloom, P. (2017). Against empathy: The case for rational compassion. Random House.

Updated text and image (20 Feb 2025)

Over the second half of 2024, Dr. Sarita Koendjbiharie and Vennila Vilvanathan (VR Research Programmer) have developed and completed a virtual reality (VR) environment for researching the elicitation of empathy. This VR environment simulates a real-life experience by Naomi, a female passenger of color in a first class train compartment. Her online written story has been the basis for developing the VR setting. In the final phase, Naomi visited the SSH lab in the Silvius building to experience VR first-hand. She gave constructive feedback used for finetuning the simulation, and encouragingly conveyed her enthusiasm about the outcome. The VR setting is life-like according to her, and the interaction with the train conductor captures importance nuances.

Demos at the opening of the Humanities Hub and Immersive Tech Event also led to feedback used for improvement and insight. The research including data collection through VR experiments, is set to take place in the first half of 2025.

Alex Reuneker

D or t? Using Big Data to Explore Linguistic Factors in Dutch Verb Spelling

Introduction

The rules for Dutch verb conjugation are simple and understandable, yet they remain a stumbling block for both students in secondary and higher education, and for the working population. Previous studies show that errors in verb spelling, even those by skilled writers, are due to factors such as time pressure and frequency-dominance in homophone verbs. For instance, the finite verb verhuist (to move) and the participle verhuisd (has moved) have identical pronunciation, but different orthographies. As the participle verhuisd has a higher frequency of use, it is often used in linguistic contexts that require verhuist. The mainly psycholinguistic approach of existing studies to understand such problems has targeted homophone dominance specifically for its relation to extralinguistic factors such as working memory, highlighting that cognitive pressure may cause fallback from computation to look-up of the most frequent word form in memory. In the literature, however, several intra-linguistic factors are hypothesized to be of influence but remain to be systematically tested.

[Updated: 25 Feb 2025]

The project

As existing research on Dutch verb spelling is often limited to small experimental studies, this project will provide new insights into determinants of verb-spelling errors. As it becomes clear from the recent debate surrounding the declining state of the language proficiency of high-school students, from spelling to reading skills, the time is ripe to add a digital, quantitative perspective to problems in language education. This project explored two suggested, yet unstudied intra-linguistic factors in verb spelling, namely personal vs possessive pronouns (e.g., word je... vs wordt je broer...), and imperative forms (word lid!). Together with the applicant, two student-assistants have investigated these factors by combining their insights in Dutch Linguistics and Digital Humanities onto a new dataset spanning 6 million answers by thousands of users collected through Gespeld.nl.

The results

The resulting studies were completed. One of the resulting papers is in process of finalization, the other has been submitted for review to an academic journal.

Study 1: The spelling of homophonic verbs preceding the reduced possessive and personal pronoun je (Alex Reuneker & Mette Rebel)

As mentioned above, even for the best students in Dutch secondary education, the spelling of homophonic verbs before reduced pronoun je ‘you/your’ remains a problem, as the pronoun may be either the reduced form of personal pronoun jij ‘you’ or of the possessive pronoun jouw ‘your’. Consequently, the spelling of the verb preceding depends on this difference: it gets a final –t or not. For this study, we analysed the legal framework in which Dutch language proficiency is described, we compared teaching methods, and we performed quantitative analyses, such as regressions analyses, on a large set of students’ spellings of verbs to test possible associations between spelling errors in homophonic verbs and the two functions of the reduced pronoun je, taking into account education level, and education stage. The results show that homophonic verbs preceding je ‘you/your’ indeed contain significantly more spelling errors than other homophonic verbs. Accordingly, we have formulated concrete recommendations for the legal framework and methods of Dutch language education.

Study 2: The spelling of the imperative mood in secondary education

‘A peculiar type of finite verb’ is what Van den Toorn (1984: 12) calls the imperative mood. A finite verb is, after all, a verb that agrees with its subject and indicates whether an event or situation is simultaneous with or prior to the moment of speaking. However, in the imperative mood, the subject of the sentence is not expressed, and according to traditional grammar, in Dutch the imperative mood cannot change tense. This study focused on the form of the imperative mood that is identical to the form of the first person singular present tense. In the case of homophone verbs, this form sounds the same as the verb form in third-person singular present tense, but it differs in orthography. In a large-scale data analysis, we compared the errors made in the spelling of the imperative mood with that of regular homophonic fine verbs. We show that Dutch secondary-school students at all levels and in all grades have more difficulty spelling the imperative mood than spelling a regular finite verb. Accordingly, we have offered concrete recommendations for improving education materials.

Conclusion

Using big data, we were able to enhance the academic knowledge of two specific factors in verb-spelling errors and to offer evidence-based solutions to long-lasting and persistent problem in Dutch language education.

Naomi Truan

Interviews Going Open! Developing Interdisciplinary Guidelines on How to Publish Qualitative Interview Data as Open Data

The project “Interviews Going Open! Developing Interdisciplinary Guidelines on How to Publish Qualitative Interview Data as Open Data” aims at making potentially sensitive data used for qualitative purposes suitable for its reuse in Open Access. Using sociolinguistic interviews as an example, we will develop FAIR guidelines and workflows (https://www.go-fair.org/fair-principles/) and apply the framework of the Text Encoding Initiative (https://tei-c.org/) to show how qualitative data may go open.