Experiments in tracing the origin of quotes through late imperial Chinese corpora
Wednesday 31 October 2018
Van Wijkplaats 2
2311 BX Leiden
In this talk, Paul Vierthaler will present his current attempts to develop machine learning algorithms that accurately predict the source text for quotes found in late imperial Chinese documents. It is possible, even easy, to identify when two texts share information. It is often more difficult to assess which text relies on which, or whether a third, unknown text is involved. Paul is developing a method to help scholars evaluate the directionality of such intertextuality, and he will present the current state of this work at the China seminar.
Paul Vierthaler is a University Lecturer of the Digital Humanities at Leiden University. In his current monograph project, he analyzes how historical events are represented in “quasi-histories” written in late imperial China. In this work, he studies how information transforms in genre- and time-dependent ways across thousands of semi-trustworthy to untrustworthy texts. To facilitate rapid and rigorous research, Paul is interested in developing and adapting computational methods to analyze and visualize large natural language corpora. Additionally, as a continuation of past work on quantitative bibliographic analysis, Paul is developing an extensible and mineable bibliographic database of public domain Chinese texts. Paul has held postdoctoral fellowships at Boston College and Harvard University, and in 2014 he was awarded a Ph.D. in East Asian Languages and Literatures from Yale University.