Leiden University Centre for Digital Humanities


Staff members and affiliates of the Leiden University Centre for Digital Humanities are involved with a variety of digital research projects. Some of them are featured here.

Detecting cross-linguistic Syntactic Differences Automatically (DeSDA)

Sjef Barbiers

The goal of this project is to investigate whether syntactic differences between languages can be detected automatically, using online parallel corpora and software tools for annotation, search and analysis. This approach has the potential to greatly enhance the empirical basis of theoretical comparative syntax research and will enable syntacticians to model syntactic variation theoretically on the basis of quantitative analysis of the correlations between syntactic properties. Since the project should be seen as a test case for this larger goal and has to be carried out by one PhD student, the research topic will be narrowed down to the (morpho-)syntax of verbs, including auxiliaries, in Germanic languages. The main descriptive question will be: Which differences do we find in the Germanic languages with respect to the structural positions of verbs and with respect to verbal inflection? The main theoretical question will be: To what extent is the existing theory of verb placement and inflection, as it has been developed since the late 1980s, capable of capturing these facts?

For more details, see the full proposal.

Exploring new methods in comparing sign language corpora: analysing cross-linguistic variation in the lexicon

Victoria Nyst

Between 2007 and 2014, four large video corpora of West African sign languages (SLs) were compiled at Leiden University under the guidance of the applicant. From 2007 to 2012, two large projects documented local SL use at various places in Mali and Ghana, leading to three digital video corpora. The first Malian Sign Language corpus contains recordings of SL use by deaf signers in Bamako and Mopti, consisting of over 27 hours of recorded discourse featuring 65 signers (Nyst 2008; 2010). The second Malian Sign Language corpus contains the results of an SL survey in the Dogon area of Mali, notably in Bandiagara, Douentza and surrounding villages, and Berbey, close to Hombori. This corpus contains 32 hours of signing by 68 signers and includes signed conversations, interviews and lexical items in at least three independent SL varieties (Nyst, Sylla, Magassouba, forthcoming). The third SL corpus contains around 30 hours of discourse data from 15 signers of Adamorobe Sign Language, which emerged spontaneously in response to the high incidence of hereditary deafness in the village of Adamorobe, Ghana. In 2014, the fourth, extensively annotated corpus was archived, containing a representative sample (around 30 hours from 9 signers) of the emerging SL of the village of Bouakako in Côte d’Ivoire. All corpora are annotated in ELAN (Crasborn & Sloetjes 2008). Glosses are in French, except in the Adamorobe Sign Language corpus, which is glossed in English and Akan. Lexical databases with phonological coding are available for all corpora except the second Malian SL corpus. All corpora are stored at the Endangered Languages Archive in London, the DoBeS archive in Nijmegen, or both.

For more details, see the full proposal.