Low-Resource Chat-based Conversational Intelligence (LESSEN)
Access to information is a human right (United Nations, 1948). Information technology, such as search engines and recommender systems, has become the key mediator and facilitator in connecting people to information (White, 2016). Conversational artificial intelligence (AI) technology is increasingly being used as a means to connect people to the information that they need, both in quotidian tasks and for occasional knowledge-intensive problem solving for important issues or crises (Allan et al., 2018).
- 2022 - 2027
- Suzan Verberne
University of Amsterdam, Leiden University, University of Groningen, University of Applied Sciences, Radboud University Nijmegen, Achmea, Albert Heijn, Bol.com, KPN, Rasa Technologies, Ahold Delhaize, National Police.
The development of conversational AI technology, both chat-based (a.k.a. text-based) and voice-based, has seen rapid progress in recent years, enabled by large pre-trained language models. The technology is highly successful in conversational agents that support a large number of services, such as Alexa and Siri (Lee and Jha, 2019; Rastogi et al., 2020). The number of consumer-facing applications of conversational AI is increasing rapidly, with domains of application ranging from retail to government to telecommunication to finance to health (Gao et al., 2021), and with services including sales, help desks, and advice.
In particular, in the Dutch ecosystem, diverse economic stakeholders share an ambition to develop increased technological autonomy in the area of conversational AI. Stakeholders selling a multitude of grocery product categories to a large customer base (e.g., Ahold Delhaize and Albert Heijn), offering a broad spectrum of financial services (Achmea) or telecommunication services (KPN), or maintaining a large-scale catalogue in general retail (Bol.com) all share the same need for technological autonomy concerning conversational AI for the Dutch language. That is, they seek to develop conversational AI technology for the Dutch language that covers a diverse range of application domains, user scenarios, and tasks, subject to Dutch (and EU) regulations concerning the use of potentially privacy-sensitive data.
Key to the development of effective chat-based conversational AI technology are the availability of a large volume of training material for conversational agents and significant computational resources to carry out the training (Su et al., 2018). At present, state-of-the-art conversational AI technology and the required resources are mostly owned by big tech companies such as Amazon, Facebook, Google and Microsoft. Increasingly, however, stakeholders, and especially European stakeholders, opt to develop a higher level of technological autonomy, for economic, legal, and/or strategic reasons (Timmers, 2021).
The key societal breakthrough that the LESSEN project aims to achieve is to democratize conversational AI technology for the Dutch language.
Widespread and democratic adoption of conversational AI technology for the Dutch language by a wide spectrum of technological and societal stakeholders is hampered by three main hindrances:
• Steep computational costs for training conversational agents and challenging inference times.
• A lack of training material for lower resourced languages, tasks and domains.
• A lack of guarantees for the safety and transparency of conversational agents.
While Dutch is a relatively small language in terms of the number of native speakers (Wikipedia, 2021), it is a well-resourced language in terms of the core technologies and datasets needed to build state-of-the-art language-based solutions (ELRA, 2021). The societal problem at the heart of LESSEN lies with the compute and data requirements that are needed to enable and support a myriad of tasks and scenarios for conversational AI technologies for economic stakeholders in the Dutch ecosystem. Labeled data is the core fuel that drives the development, optimization and deployment of conversational AI, and a lack of resources for training and fine-tuning conversational AI for specific domains, tasks, and scenarios blocks further adoption and innovation of conversational AI in the Dutch ecosystem.
LESSEN addresses the compute and data hindrances identified above for the case of chat-based conversational AI for the Dutch language. In doing so, LESSEN directly addresses the NWA route
• “Creating value through responsible access to and use of big data.”
LESSEN enables economic stakeholders in the Netherlands to create value by giving them direct access to data- and compute-efficient conversational AI technology, subject to clear and actionable guidelines and guarantees concerning the safe and transparent use of such technology. LESSEN answers the cluster question
• “Can we develop human language technology (HLT) that allows us to communicate with our computers (smartphones, tablets)?”
by making state-of-the-art chat-based AI technology available for the Dutch language, thereby allowing users to communicate with apps and services in a conversational manner, which is an increasingly popular medium for people to communicate with their computers and the services to which they provide access.