Digging up new information from ancient Chinese texts

13 December 2016

How were ideas about politics and society distributed in ancient China? Hilde De Weerdt, Professor of Chinese History, investigates this using new digital methods. We speak with her about networks, big data and digital humanities.

Since the eighties, China has been at the forefront when it comes to digitisation of written sources. Huge numbers of ancient Chinese books, letters and other texts are now available in digital format. What can these texts tell us about the history of China? Using new digital research methods such as text mining, we are now able to analyse much larger amounts of text in a short time. This provides new insights, says Hilde De Weerdt.

What exactly is your research about?

‘I study Chinese political history and the role that communication networks have played in it. I try to answer such questions as how new ideas about politics and society spread throughout Chinese history. We always thought that it happened from above, from the court. But by analysing large amounts of written sources we found that the situation was rather different. From the 12th century onwards, a cultural, literate elite emerged, which spread all kinds of ideas and exercised influence in that way.’

So the elite had more power than we thought?

‘Exactly. These people wrote many texts, such as letters, poems and notebooks (similar to today's blogs), in which they commented on a variety of issues, from defence to diplomatic relations, religion and inflation. In my research, we have identified the networks through which these texts were distributed. In many cases they turned out to be large networks, covering long distances. You can imagine that in this manner, despite the enormous size of China, people were able to bring together their ideas and say: we want our views to be respected. Many ideas were developed locally and then became so widespread that they finally penetrated to the top.’

How did you manage to map these networks?

‘In the texts that we used in the study, we analysed who was in contact with one another, who addressed letters to whom, and who commented to whom. We did this by, for instance, semi-automatically extracting personal names from the texts. Then we visualised these networks using digital methods, and mapped which regions in China they covered.’
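The mapping step described here can be illustrated with a minimal sketch. It assumes the names have already been extracted into (sender, recipient) pairs; the pairs and the helper names `build_network` and `degree` are hypothetical placeholders, not data or code from the study.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, recipient) pairs standing in for names
# extracted from letters; these are placeholders, not research data.
letters = [
    ("Writer A", "Writer B"),
    ("Writer A", "Writer C"),
    ("Writer B", "Writer A"),
    ("Writer A", "Writer B"),
    ("Writer D", "Writer C"),
]


def build_network(pairs):
    """Count letters per directed pair and collect each person's contacts."""
    edge_weights = Counter(pairs)
    neighbours = defaultdict(set)
    for sender, recipient in pairs:
        # Treat contact as mutual when listing who is connected to whom.
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    return edge_weights, dict(neighbours)


def degree(neighbours):
    """Number of distinct correspondents per person."""
    return {person: len(contacts) for person, contacts in neighbours.items()}


edge_weights, neighbours = build_network(letters)
degrees = degree(neighbours)

# List people from most to fewest distinct correspondents.
for person, count in sorted(degrees.items(), key=lambda item: (-item[1], item[0])):
    print(person, count)
```

A real analysis would add place information to each person in order to map the regions a network covers; that step is left out of this sketch.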

This kind of research is probably only possible if many texts are digitally available?

‘That's right, and in that area we are at an advantage in China Studies. From an early stage, much attention has been paid to the digitisation of the Chinese textual tradition, so we have a lot of digital sources. But so far, little has been done to really make use of all this material.’

Why is that?

‘The possibilities are still limited. Apart from performing keyword searches, not much else can be done with these texts in most commercial databases. In the eighties and nineties, we were already very happy with that; it was revolutionary that these texts were digitised in the first place. But not all the possibilities of the data were used. For example, if a keyword search yielded a thousand results, that used to be too many to work with. The motto then was: limit yourself to a particular period or author, in order to reduce the number of hits. But that method no longer quite fits present-day research. You now want to use all the data you have, in order to discover new questions and to reach new insights.’

Is that why you have developed the MARKUS platform?

‘Yes, in our research into communication networks, we realised that it is not very efficient to annotate large amounts of text manually. That is why we developed MARKUS, which automatically tags relevant information for you. Suppose you're investigating Chinese city walls and you want to know where the walls were, how high they were, how and when they were built, and so on. If you search for that kind of information in the traditional way, it will take years. But using a platform like MARKUS, you can automatically extract all kinds of information from the texts.’

How exactly does that work?

‘You enter the text, indicate what kind of information you're looking for, and the system will help you by extracting, for instance, place names, personal names, time references and other relevant information. All that information is also linked to other databases, which you may analyse along with your own data in visualisation platforms linked to MARKUS. Combining all these different datasets leads to new insights.’
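As a rough illustration of this kind of automated tagging, the sketch below marks known terms in a text by looking them up in a small dictionary. It is not how MARKUS itself is implemented; the `GAZETTEER` entries, the `tag_text` function and the sample sentence are all assumptions made up for the example.

```python
import re

# Hypothetical mini-gazetteer mapping known terms to a tag type.
# A real system would draw on far larger reference databases.
GAZETTEER = {
    "朱熹": "person",
    "杭州": "place",
    "淳熙二年": "time",
}


def tag_text(text, gazetteer):
    """Return (term, tag type, start offset) for each gazetteer term found."""
    if not gazetteer:
        return []
    # Try longer terms first so they win over any shorter overlapping terms.
    terms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    return [
        (match.group(0), gazetteer[match.group(0)], match.start())
        for match in pattern.finditer(text)
    ]


# Invented sample sentence for illustration, not a quotation from a source.
sample = "淳熙二年朱熹至杭州"
tags = tag_text(sample, GAZETTEER)

for term, tag_type, offset in tags:
    print(offset, tag_type, term)
```

Plain dictionary lookup is the simplest possible approach; it cannot resolve ambiguous names or find terms missing from the dictionary, which is where a researcher's review of the results comes in.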

Would you consider the data you work with to be ‘big data’?

‘Anyone who deals with astronomy data will find our datasets to be small. And relatively speaking, they are, of course. But there are certainly parallels with big data. The idea of not working with small samples, but using all available data and looking for patterns in it, that is something we are doing now. It used to be impossible to analyse tens of thousands of texts, or it could only be done with large teams and it would take decades. So when we talk about the use of new methods for processing data, I say yes, that is happening. But it is still in its infancy.’

You are one of the people at the forefront in the field of Digital Humanities. To what extent are other humanities scholars involved in this?

‘I think we're in a transition period. The interest is definitely there, but many researchers do not feel equipped to do research with digital methods. The younger generation do, which shows that things are changing. But collaboration with computer scientists is crucial, because it helps you accomplish things that you would not be able to do alone. And conversely, computer scientists can learn from the humanities. Cooperation can lead to new, original projects. That is why we recently founded the Leiden University Centre for Digital Humanities.’

How did you come into contact with digital humanities?

‘I had an early interest in the computational side, but I only really started using digital research methods some seven or eight years ago. I wanted to map these Chinese communication networks and was searching for the best way to do that. And sometimes, as in my case, computational methods offer something extra, because they allow you to approach your research question from a different angle.’

Final question: have you always been interested in China?

‘By chance, I recently found a few name cards with Chinese characters, which I had made in primary school. I was nine years old at the time, I think. So yes, I believe my interest in China has always been there.’

Hilde De Weerdt obtained her PhD from Harvard University and was a postdoc at the University of California at Berkeley, Stanford University and Harvard University. Subsequently, she taught Chinese history at the University of Oxford (2007-2012) and at King's College London (2012-2013). Since 2013, she has been Professor of Chinese History at the Leiden Institute for Area Studies, where she focuses on the question of how elite networks shaped Chinese politics. Together with Dr Brent Ho she developed the platform MARKUS for the annotation and visualisation of data in Chinese texts.

(JvdB)

This article is part of a series of interviews with researchers from the Leiden Centre of Data Science (LCDS). LCDS is a network of researchers from different scientific disciplines, who use innovative methods to deal with large amounts of data. Collaboration between these researchers leads to new solutions to problems in science and society.