'Distant Reading': Analyzing Online Book Reviews

  • Peter Boot, Huygens ING
17 May 2018
Responses to books used to be shared at the coffee machine, in conversations between friends or in book discussion groups: all settings that are not easily accessible to researchers. That situation has changed dramatically since readers began writing about books online. Online book discussion offers scholars an important opportunity to investigate what readers appreciate in books. The ODBR database of Online Dutch Book Response contains (at the time of writing) 390,000 (short) online book reviews and 510,000 other book response items, harvested from Dutch-language mass review sites and from the largest online bookseller in the Netherlands, bol.com. While a resource such as this, based on self-selection of participants, can never claim to be representative of 'the' reader, the database does bring together a very large collection of non-professional writing about literature and books more generally. Because of the size of the collection, studying the reviews requires a tool for 'distant reading': a tool that can summarize the evaluative aspects discussed in reviews.

In my talk I will describe the creation of such a tool, aimed at discovering the standards of quality in literature and reading that people explicitly or implicitly express in the reviews. The tool will assign evaluative codes to reviews based on the presence of certain textual patterns in the reviews' texts. I will ask why we need such a tool, and why general-purpose tools are not enough. I will also ask what it means to reduce a complex text to numeric codes, and what we lose and gain in that transformation. How far will it get us? Will it help us to investigate, without manually annotating thousands of reviews, questions such as how different standards are applied in judging different genres, or how different readers or reader groups apply different standards?
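
To make the idea of pattern-based coding concrete, here is a minimal Python sketch. It is not the tool presented in the talk. The code names (`STYLE`, `PLOT`, `EMOTION`), the regular expressions and the sample reviews are all hypothetical, and the patterns are in English for readability although the ODBR reviews are in Dutch. The sketch only shows how the presence of a textual pattern could be turned into a code, and how codes could then be counted per genre.

```python
import re
from collections import Counter

# Hypothetical evaluative codes and patterns, for illustration only.
# A real code book would be far larger and written for Dutch text.
PATTERNS = {
    "STYLE": re.compile(r"\b(beautifully written|writing style|prose)\b", re.IGNORECASE),
    "PLOT": re.compile(r"\b(plot|storyline|twists?)\b", re.IGNORECASE),
    "EMOTION": re.compile(r"\b(moving|touching|made me cry)\b", re.IGNORECASE),
}


def code_review(text):
    """Return the sorted list of codes whose pattern occurs in the text."""
    return sorted(code for code, pattern in PATTERNS.items() if pattern.search(text))


def code_counts_by_genre(reviews):
    """Count how often each code is assigned per genre.

    `reviews` is an iterable of (genre, text) pairs; each code is counted
    at most once per review.
    """
    counts = {}
    for genre, text in reviews:
        counts.setdefault(genre, Counter()).update(code_review(text))
    return counts


if __name__ == "__main__":
    sample = [
        ("thriller", "Great plot with a twist."),
        ("thriller", "The storyline was thin."),
        ("literary", "Moving prose."),
    ]
    for genre, counter in code_counts_by_genre(sample).items():
        print(genre, dict(counter))
```

Even this toy version shows the trade-off raised above: a review that praises the plot and one that complains about it receive the same `PLOT` code, so the codes record which standard is invoked, not how the book is judged against it.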

