Children's stories as a window to investigate empathy
Researcher Max van Duijn and PhD student Bram van Dijk apply language models to stories told by children to investigate empathy. For this research, they received the Best Paper Award at the Conference on Computational Natural Language Learning (CoNLL) in Singapore.
'I use language and stories as a window to investigate certain cognitive skills, in particular empathy or Theory of Mind,' says Max van Duijn. He is an associate professor at the Leiden Institute of Advanced Computer Science (LIACS). While story comprehension in young children has been widely studied, little is known about the kinds of stories that elementary school-aged children spontaneously invent and tell. This gap gave rise to the idea of creating ChiSCor (Children's Story Corpus), a database of stories told by children. Using computational techniques, Van Duijn and Van Dijk analyze the relationship between story structure and children's empathy.
Children telling stories
Together with a rotating team of research assistants, Van Duijn and Van Dijk visited school classrooms, a community center and after-school programs to collect stories from children. They decided not to take the children out of class one by one, but to let them tell their stories in the group. Van Duijn: ‘We deliberately chose to leave children in their natural environment, so they basically tell the story not to us, the researchers, but to their group mates.’
Training language models
Van Duijn initially considered himself more of a "consumer" of computer science: someone who uses the techniques but does not contribute to their development himself. 'But that changed in the course of the project. This is partly due to the appearance of large language models such as ChatGPT.' Large amounts of text are used to train language models. Based on this data, language models produce representations of how different linguistic elements relate to each other.
Until recently, little attention was paid to the type of training data. Researchers are now increasingly trying to use smaller and more targeted data sets. 'The quality of the training data is perhaps more important than the quantity,' Van Dijk explains. In their research, Van Duijn and Van Dijk showed that even with relatively little data, you can teach a computer all kinds of word meanings. This appealed to the conference jury.
Two-way traffic between alpha and beta research
The outcome of the research is thus twofold, says Van Duijn: 'On the one hand, we are making a contribution to language and cognitive science with our database and results. On the other hand, our work lays a foundation for training language models more efficiently using narrative data. It has become two-way traffic between alpha questions and beta methods, and vice versa.'
Best Paper Award
For this work, Van Duijn and Van Dijk, along with their co-authors Suzan Verberne and Marco Spruit, received the Best Paper Award. This is partly thanks to ChiSCor, a unique database consisting of about seven hundred stories and metadata from children between the ages of four and twelve. The database has been made publicly available so that researchers from developmental psychology, linguistics and pedagogy can work with the stories. Van Dijk: ‘We hope that other researchers will use the data, because there is still a wealth of information stored in ChiSCor. Moreover, working with children's stories is also a lot of fun and educational!’
For more information, view the project website.