The Seductions and Dangers of GrETEL 4

  • Jan Odijk, Utrecht University
Tuesday 26 March 2019
Cleveringaplaats 1
2311 BD Leiden

LUCDH Lunch Talk

GrETEL is an application for searching and analyzing syntactically annotated Dutch corpora (treebanks and parsebanks). A core feature of GrETEL is the option to search on the basis of an example sentence (‘query-by-example’), so that researchers can use it without having to write a query themselves and without having to know the fine details of the syntactic structures. Version 4 of GrETEL (GrETEL 4) adds several new features, in particular: (1) users can upload their own text corpus in a variety of formats (TXT, CHAT, TEI, FoLiA), together with metadata; each sentence in the text is parsed automatically and made available in the search and analysis interface as a parsebank; (2) search results can be analyzed in more ways, by dragging elements from the query and the metadata to a pivot table and combining them; (3) Dutch CHAT corpora can be enriched with so-called MOR and GRA tiers, which enables the application of CLAN-based analysis tools such as EVAL and KIDEVAL for research into language acquisition and language deficiencies.

In the first part of the lecture, I will introduce GrETEL 4 and give a live demo in which I analyze some phenomena related to verbal clusters in Dutch. I aim to show that everyone in the audience can now make such queries by using the query-by-example method.

In the second part, I will show that the ease with which one can now have queries created automatically also entails some dangers: without thorough knowledge of the nature of the syntactic structures and of the way in which the query-by-example method generalizes from a single example to a query, one runs the risk of missing crucial examples. I will illustrate this risk, show how one can check for it, and propose some methods to ensure that one can nevertheless draw reliable conclusions on the basis of GrETEL 4 queries in treebanks and parsebanks.

