How zero-shot learning changes the world
On June 22, the week of data literacy started. The week was organized by PublicNL in close collaboration with LCDS. The central question was: how will we deal with data in the future? What major changes have we seen in the past five years, and what may we expect for the future? Are there any pointers?
LCDS had requested the Making Sense of Illustrated Handwritten Archives team to support the PublicNL organisation by kicking off the week of data literacy. The Making Sense team consisted of Jaap van den Herik, Lise Stork, Andreas Weber and Katy Wolstencroft.
The Photo of the Century
The first lecture started with a broad overview in which the theme of Safety was central. Two lines were drawn from it, one to Surveillance (collecting data) and one to Privacy (protecting data). This contrast was illustrated by the Photograph of the Century, by which we mean the photo of the murder of George Floyd. Strictly speaking, it was not a photograph but a video that lasted more than eight minutes: a large flow of data. The (academic) question here is: can you read from a single photograph what has happened or what will happen?
The value of human data
We now reformulate the academic question above: what can we do with data, and what can we do with a photograph? To a certain extent, we can use existing knowledge to interpret a photo. We can also extract new knowledge from data (examples). This area has developed tremendously over the last 25 years. In 1997, Deep Blue defeated world champion Kasparov with the help of 700,000 Grandmaster (GM) games. In the years 2015-2017, a similar leap took place in the game of Go: the European champion, the number 5 of the world and the world champion were defeated by programs that drew on 15 million, 25 million and 40 million GM games respectively. The question then arose: can a program that plays against itself generate the necessary patterns on its own? That turned out to be possible. AlphaGo Zero showed that human knowledge is fallible by winning a 100-game match 100-0 against the strongest program in the world. This strongly supports the idea that PERFECT knowledge and some data lead to a better interpretation of a snapshot. The same applies to photos and archival objects.
Making Sense Research Question
But what if you don’t have perfect knowledge? What if there are serious limits on the size and coverage of the data you can learn from? Can the computer learn from related knowledge and predict unseen classes? This is one of the central problems in the Making Sense project. Lise Stork presented her work on zero-shot learning to identify species from historical illustrations.
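To make the idea of predicting unseen classes concrete, here is a minimal sketch of one common zero-shot setup, attribute-based classification. It is not the method used in the Making Sense project. The class names, the four attributes and all scores are invented for illustration. The sketch assumes that a separate model, trained only on other (seen) classes, has already turned an illustration into attribute scores. The unseen class is then chosen by comparing those scores with a textual description of each class.

```python
import math

# Hypothetical class descriptions, written as attribute vectors:
# (has_wings, has_fur, has_scales, lives_in_water).
# None of these classes needs any training image.
CLASS_ATTRIBUTES = {
    "bat": [1.0, 1.0, 0.0, 0.0],
    "carp": [0.0, 0.0, 1.0, 1.0],
    "otter": [0.0, 1.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_predict(predicted_attributes, class_attributes):
    """Return the class whose description best matches the predicted attributes."""
    return max(
        class_attributes,
        key=lambda name: cosine(predicted_attributes, class_attributes[name]),
    )


# Assumed output of an attribute predictor for one illustration (made-up numbers).
scores = [0.1, 0.9, 0.05, 0.8]
prediction = zero_shot_predict(scores, CLASS_ATTRIBUTES)
print(prediction)
```

The related knowledge here is the attribute description of each class. Adding a new species only requires a new description, not new labelled images.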
The illustrations are part of a large archive of historical biodiversity data from Naturalis. The rest of the collection consists of 17,000 handwritten pages that document an expedition to Indonesia in the early 1800s. As well as identifying the species in the illustrations, the Making Sense team are automating the process of mining the rest of the content to make it amenable to research. In the seminar (Werkcollege), Andreas Weber and Katy Wolstencroft described the complexity of the collection and explained why a number of computer science technologies had to be brought together to handle it. We use other machine learning and neural network approaches to identify where in the text the most important named entities can be found, and we structure those named entities using a semantic model that is based on established community data standards. The underlying semantics enable better inference and reasoning within the collection. They also allow us to interlink with external biodiversity resources, connecting data across the world, from historic expeditions to the present day.
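The step from named entities to a semantic model can be sketched as follows. This is only an illustration, not the project's pipeline. A hypothetical lookup table stands in for the machine learning models that find the entities. The predicate names are borrowed from Darwin Core terms as an assumption, since the text does not say which community standards are used. The page identifier and the transcribed line are invented.

```python
# Hypothetical lookup table mapping entity text to a predicate.
# It stands in for a trained named-entity recogniser.
GAZETTEER = {
    "Pteropus edulis": "dwc:scientificName",
    "Buitenzorg": "dwc:locality",
}


def extract_triples(page_id, text, gazetteer):
    """Return (subject, predicate, object) triples for entities found in the text."""
    triples = []
    for surface_form, predicate in gazetteer.items():
        if surface_form in text:
            triples.append((page_id, predicate, surface_form))
    return triples


# Invented example of a transcribed line from a field note.
line = "Pteropus edulis, observed near Buitenzorg."
triples = extract_triples("page-0042", line, GAZETTEER)
for triple in triples:
    print(triple)
```

Once every page is described by triples with shared predicates, records that mention the same species or locality can be linked to each other and to external resources that use the same terms.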
The technology objectives of our research are to invest in data and in data literacy, that is, in knowledge about data, in data processing and in software, with the goal of making citizens and civil servants data literate in 2021.
The societal objectives of our research are (1) exchanging data, (2) performing community-related tasks, (3) taking care of relational privacy, and (4) training, with the goal of exchanging increasing amounts of data and successfully handling multifaceted societal problems.
The main remaining obstacle is the following question: can we establish exactly what happened from a single picture of George Floyd without violating the privacy rules?
Here are the links to the webinars of June 22nd:
Webinar: Zero Shot Learning – Jaap van den Herik and Lise Stork (Dutch)
Webinar: Data literacy - Andreas Weber and Katy Wolstencroft (Dutch and English)