Leiden University Centre for Digital Humanities

Small Grants 2024 Research Projects

The LUCDH fosters the development of new digital research by awarding a number of Small Grants each year. As in previous years, the LUCDH received a large number of excellent grant applications for Research and Personal Development funds. Congratulations to the recipients of this year's research awards!

Dear Digital Diary: An Exploration of the Huydecoper Diaries (1648-1704) through Handwritten Text Recognition

This project focuses on Joan Huydecoper van Maarsseveen (1625-1704), a prominent figure who served as Burgomaster, Director of the Dutch East India Company, and Councillor in the Admiralty in Amsterdam. His diaries, which include copies of his letters and lists of expenses and were written between 1648 and 1704, have survived (Utrechts Archief). They provide a wealth of information on a variety of themes. For one, they offer a unique insight into Huydecoper's interactions with the Amsterdam elite. But the information in his diaries extends far beyond that circle. Huydecoper corresponded, for example, with his servants and schoolmasters, as well as with scholars, diplomats, and merchants worldwide, in Dutch, French, Italian, and Latin.

Yet, despite the potential of his diaries as a source, they remain understudied. The primary obstacles for historians are the extensive volume of the diaries and Huydecoper's challenging handwriting. Our aim is, therefore, to make his diaries searchable by utilizing the latest digital methods.

The project has four principal goals:

  1. Develop a high-quality HTR model to generate instant transcriptions for a historically relevant collection of primary sources. This will transform the currently inaccessible collection into a searchable full-text database, saving time and increasing research efficiency.
  2. Develop a research agenda based on the primary source, utilizing collective expertise to identify priorities across historical subfields. This involves raising awareness of the source and the developed search tool.
  3. Train a promising student in paleography and cutting-edge digital tools, bridging the gap between traditional and digital methodologies in source work.
  4. Promote collaboration between academic and industrial expertise by working with archives.
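
One common way to judge whether an HTR model is of high quality is the character error rate (CER): the edit distance between the model output and a manually made reference transcription, divided by the length of the reference. The sketch below is illustrative only and is not the project's own evaluation code; the function names and the sample line are invented for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented sample line (not taken from the Huydecoper diaries).
reference_line = "den 3 januari"
model_output = "den 8 januarij"
print(f"CER: {character_error_rate(reference_line, model_output):.3f}")
```

A lower CER means fewer corrections are needed before the transcription is usable for full-text search.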

The successful implementation of the HTR model will drastically enhance accessibility to the Huydecoper Diaries, offering new perspectives on various historiographical debates. 

Jacob van Loo: Portrait of Joan Huydecoper

Lontara Digital: Developing Text Recognition Model for Lontara Manuscripts Using Transkribus and PyLaia

Lontara manuscripts from South Sulawesi have earned a reputation among scholars of Southeast Asian studies for their realistic and open-ended portrayal of the past, in contrast to other local traditions that mix myth and legend with history. In the words of local scholar Abidin (1971), Lontara chronicles record events in a matter-of-fact way, without excessively flattering the rulers, which makes them useful and reliable preliminary sketches for reconstructing history. Among the many genres of Lontara manuscripts in South Sulawesi, a local pre-Islamic epic called La Galigo stands out. La Galigo is the world's longest epic (even longer than the Mahabharata) and was recognised by UNESCO as part of the Memory of the World for Indonesia and the Netherlands in 2011.

This project aims to provide a tool for the accessibility and literacy of Lontara sources from South Sulawesi through the use of technology. This pilot project will develop a Customised Text Recognition Model prototype for the Lontara script by utilising Transkribus and PyLaia. The project will begin with creating a fresh Ground Truth from the La Galigo manuscripts kept in Leiden University Special Collection.

Ultimately, this project will compile the digitised documents written in Lontara script that are scattered across libraries and institutions worldwide. By doing this, the project will map, catalogue, and preserve Buginese and Makassarese knowledge and open up many research possibilities. In the long run, we aim to make the Lontara collection searchable using Handwritten Text Recognition (HTR) technology. The expected results of this pilot project are therefore:

  1. Text Recognition Model for Lontara script; 
  2. Web application for Latin-Lontara and Lontara-Latin transliteration; 
  3. Blog/website to showcase the database of scattered Lontara manuscripts.
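
As a rough illustration of what Lontara-Latin transliteration involves, the sketch below converts Lontara text to Latin letters. It rests on several assumptions that are not part of the project description: it derives its table from the Unicode character names of the Buginese block (U+1A00 to U+1A1F) via Python's `unicodedata` module, so the spellings are those of the Unicode names rather than a scholarly romanisation, and the function names are invented here. It is a minimal sketch, not the project's planned web application.

```python
import unicodedata

LETTER_PREFIX = "BUGINESE LETTER "
VOWEL_PREFIX = "BUGINESE VOWEL SIGN "


def build_tables():
    """Derive syllable and vowel-sign tables from Unicode character names."""
    letters, vowel_signs = {}, {}
    for code_point in range(0x1A00, 0x1A20):
        char = chr(code_point)
        name = unicodedata.name(char, "")  # "" for unassigned code points
        if name.startswith(LETTER_PREFIX):
            # Each letter carries an inherent vowel, e.g. "KA" -> "ka".
            letters[char] = name[len(LETTER_PREFIX):].lower()
        elif name.startswith(VOWEL_PREFIX):
            vowel_signs[char] = name[len(VOWEL_PREFIX):].lower()
    return letters, vowel_signs


LETTERS, VOWEL_SIGNS = build_tables()


def lontara_to_latin(text: str) -> str:
    """Transliterate Lontara characters; other characters pass through."""
    pieces = []
    previous_was_letter = False
    for char in text:
        if char in LETTERS:
            pieces.append(LETTERS[char])
            previous_was_letter = True
        elif char in VOWEL_SIGNS and previous_was_letter:
            # A vowel sign replaces the inherent "a" of the preceding letter.
            pieces[-1] = pieces[-1][:-1] + VOWEL_SIGNS[char]
            previous_was_letter = False
        else:
            pieces.append(char)
            previous_was_letter = False
    return "".join(pieces)


# Build the sample from character names to avoid hard-coded code points.
la = unicodedata.lookup("BUGINESE LETTER LA")
ga = unicodedata.lookup("BUGINESE LETTER GA")
sign_i = unicodedata.lookup("BUGINESE VOWEL SIGN I")
sign_o = unicodedata.lookup("BUGINESE VOWEL SIGN O")
print(lontara_to_latin(la + " " + ga + la + sign_i + ga + sign_o))
```

The reverse direction is harder: the script generally leaves syllable-final consonants and gemination unwritten, so a Latin-to-Lontara converter has to decide what to drop, and a Lontara-to-Latin result like the one above may still need a reader's interpretation.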

The project is also expected to open up opportunities for further exploration of non-Western historical sources in the Leiden University collections through HTR technology. At the time of writing this proposal, only five public models for non-Western scripts were available in Transkribus. The project therefore aims to help close this gap.

Reading Metaphor in Literary Machine Translation and Post-Editing

This project focuses on metaphors in literary machine translation. It investigates what happens to metaphors when literary texts are machine-translated, for example using Google Translate or ChatGPT, and how readers respond to machine-translated metaphors. While a small but growing number of studies has investigated the usefulness of Statistical and Neural Machine Translation for literary translation, obtaining promising results for different genres (short stories, novels, poems) and language pairs (including English-French and English-Spanish), no studies have thus far focused specifically on metaphors and whether the way Machine Translation handles metaphors affects how readers understand and appreciate literary texts. A recent study by Guerberof-Arenas and Toral (2022) suggests that metaphors are likely to require creative solutions in translation, while Machine Translation tends to be direct, producing a literal translation by default. However, we don’t know whether readers interpret unidiomatic or unexpected literally translated metaphors as errors or as creativity.

The project will involve three main activities:

  • The first activity involves machine-translating 10 excerpts from English novels into Dutch using a Neural Machine Translation model (Google Translate) and a Large Language Model (ChatGPT).
  • For the second activity, the Machine Translation output will be corrected and revised (post-edited) by two professional literary translators to determine which metaphor translations they consider incorrect or inappropriate (linguistically, stylistically or culturally) compared with how a human would translate them.
  • Finally, the third activity addresses the reactions of readers when they read literary passages with machine-translated metaphors. An online questionnaire will be used to determine how understandable, readable and enjoyable they are from the readers’ perspective.

Mapping the development of Quranic Reading Traditions (QRTs) with the HAMaLaT-Quran Database (8th–11th centuries CE)


The Effect of Immersive 360 Tasks on Aspects of L2 Speaking

Within language learning studies, 360-degree video technology is recognised for providing learners with a realistic and authentic environment. This technology creates immersive content from real-world footage, and fosters a sense of presence and engagement in language learning environments. In prior studies funded by LUCDH projects, speaking tasks utilizing 360-degree techniques were developed to prompt speech for applied linguistic research. Although these techniques were tested in previous projects, a comprehensive evaluation of their effectiveness and outcomes is still pending.

The present project aims to address this research gap by comparing speech elicited with immersive 360 videos to speech elicited through the same tasks presented in two types of 2D format: normal 2D videos and cartoons. The comparison involves quantitative measurements of the naturalness of the speech in the three conditions and a qualitative evaluation by students. Because producing and using speaking tasks with the 360 technique is more expensive than using 2D videos or simple cartoons, the outcomes of the research will help future researchers and language teaching professionals judge whether creating 360 videos for speaking tasks in research and teaching is worthwhile.

Virtual reality storytelling, embodied experience, empathy and understanding (Correspondents of the World)


D or t? Using Big Data to Explore Linguistic Factors in Dutch Verb Spelling

The rules for Dutch verb conjugation are simple and understandable, yet they remain a stumbling block both for students in secondary and higher education and for the working population. Previous studies show that errors in verb spelling, even those by skilled writers, are due to factors such as time pressure and frequency dominance in homophone verbs. For instance, the finite verb verhuist (moves) and the participle verhuisd (has moved) have identical pronunciation but different orthographies. As the participle verhuisd has a higher frequency of use, it is often used in linguistic contexts that require verhuist. The mainly psycholinguistic approach of existing studies has targeted homophone dominance specifically for its relation to extralinguistic factors such as working memory, highlighting that cognitive pressure may cause writers to fall back from computing the correct form to looking up the most frequent word form in memory. In the literature, however, several intra-linguistic factors are hypothesized to play a role but remain to be systematically tested.

As existing research on Dutch verb spelling is often limited to small experimental studies, this project will provide new insights into the determinants of verb-spelling errors. As has become clear from the recent debate about the declining language proficiency of high-school students, from spelling to reading skills, the time is ripe to add a digital, quantitative perspective to problems in language education. This project aims to explore two suggested yet unstudied intra-linguistic factors in verb spelling, namely personal vs possessive pronouns (e.g., word je... vs wordt je broer...) and imperative forms (word lid!). Together with the applicant, two student-assistants will investigate these factors by applying their insights from Dutch Linguistics and Digital Humanities to a new dataset spanning 6 million answers by thousands of users collected through Gespeld.nl. The resulting studies will enhance academic knowledge of the factors behind verb-spelling errors and at the same time offer evidence-based solutions to a long-standing and persistent problem in Dutch language education.
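
A quantitative comparison of this kind could start from error rates per linguistic context. The sketch below is a hypothetical illustration: the record layout (context label, answer given, correct form), the context labels and the four toy answers are all invented here and do not reflect the actual structure or content of the Gespeld.nl dataset.

```python
from collections import defaultdict


def error_rate_by_context(answers):
    """Return the share of wrong answers for each context label.

    `answers` is an iterable of (context, given, correct) tuples;
    this layout is an assumption made for the example.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for context, given, correct in answers:
        totals[context] += 1
        if given != correct:
            errors[context] += 1
    return {context: errors[context] / totals[context] for context in totals}


# Invented toy answers, for illustration only.
toy_answers = [
    ("personal_pronoun", "wordt", "word"),     # "word je ..." answered wrongly
    ("personal_pronoun", "word", "word"),
    ("possessive_pronoun", "wordt", "wordt"),  # "wordt je broer ..."
    ("imperative", "wordt", "word"),           # "word lid!" answered wrongly
]

for context, rate in error_rate_by_context(toy_answers).items():
    print(f"{context}: {rate:.0%} errors")
```

On real data, the same grouping could be extended with further variables, such as the relative frequency of the competing homophone forms.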


Interviews Going Open! Developing Interdisciplinary Guidelines on How to Publish Qualitative Interview Data as Open Data

The project “Interviews Going Open! Developing Interdisciplinary Guidelines on How to Publish Qualitative Interview Data as Open Data” aims to make potentially sensitive qualitative data suitable for reuse as open data. Using sociolinguistic interviews as an example, we will develop guidelines and workflows based on the FAIR principles (https://www.go-fair.org/fair-principles/) and apply the framework of the Text Encoding Initiative (https://tei-c.org/) to show how qualitative data may go open.
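
To give an impression of what interview data encoded in the TEI framework can look like, the sketch below wraps pseudonymised interview turns in TEI `<u>` (utterance) elements using Python's standard library. It is a minimal illustration under assumptions made here: the function name, speaker codes, header wording and sample turns are invented, and the output has not been validated against a TEI schema or against the guidelines the project will develop.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def interview_to_tei(title: str, turns) -> str:
    """Build a small TEI document from (speaker_code, utterance) pairs."""
    ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace
    root = ET.Element(tei("TEI"))

    # Minimal header: title, publication statement and source description.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Illustrative example, not published data."
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source, tei("p")).text = "Transcribed from an audio recording."

    # One <u> element per turn; speakers are referenced by pseudonymous code.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for speaker_code, utterance in turns:
        u = ET.SubElement(body, tei("u"), {"who": f"#{speaker_code}"})
        u.text = utterance

    return ET.tostring(root, encoding="unicode")


# Invented sample turns with pseudonymised speaker codes.
sample_turns = [
    ("INT", "Where did you grow up?"),
    ("P01", "In a small village in the north."),
]
print(interview_to_tei("Sample sociolinguistic interview", sample_turns))
```

Referring to speakers only by code keeps names out of the transcript itself, which is one simple step towards sharing sensitive interview material.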