PhD project
From Text to Knowledge: Language Models for Knowledge Graph Extraction and Ontology Learning
This PhD project investigates how language models can be used to extract structured knowledge from text in a way that is accurate, robust, and applicable to real-world use cases.
- Duration: 2022 - 2026
- Contact: Roos Bakker

Knowledge graphs and ontologies are powerful means to structure textual data and express linguistic entities and relations. However, creating such models requires significant manual effort from domain experts. Natural Language Processing (NLP) offers opportunities to automate this task by extracting structured information from text. This PhD research investigates methods for automating the extraction and evaluation of semantic models across domains.
In the legal domain, we compared language models for extracting knowledge graphs. We focused on FLINT, an ontology formalising legal actions with roles such as actors and objects. We fine-tuned BERT models on a newly created Dutch dataset, reaching an accuracy of ≈ 0.80. Comparisons with rule-based approaches and with prompting strategies for instruction-tuned generative Large Language Models (LLMs) showed that the fine-tuned models performed best, although LLMs offer advantages in low-resource scenarios.
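At its core, this setup is a standard token-classification fine-tuning recipe. The sketch below shows one assumed data-preparation step: aligning word-level FLINT role labels with BERT subword tokens. The label set, example sentence, and checkpoint (BERTje, `GroNLP/bert-base-dutch-cased`) are illustrative assumptions, not the project's exact configuration.

```python
# Minimal sketch (an assumed setup, not the project's exact pipeline): preparing
# one Dutch sentence for BERT token-classification fine-tuning on FLINT-style
# roles. The label set, example, and checkpoint are illustrative placeholders.
from transformers import AutoTokenizer

label_list = ["O", "B-ACTOR", "I-ACTOR", "B-ACTION", "I-ACTION", "B-OBJECT", "I-OBJECT"]
label2id = {label: i for i, label in enumerate(label_list)}

# One annotated example: words with word-level FLINT role tags
# ("The mayor grants the permit").
words = ["De", "burgemeester", "verleent", "de", "vergunning"]
tags  = ["O", "B-ACTOR", "B-ACTION", "O", "B-OBJECT"]

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
encoding = tokenizer(words, is_split_into_words=True)

# Align word-level labels with subword tokens; special tokens and non-initial
# subwords get -100 so the loss function ignores them during fine-tuning.
aligned_labels = []
previous_word_id = None
for word_id in encoding.word_ids():
    if word_id is None or word_id == previous_word_id:
        aligned_labels.append(-100)
    else:
        aligned_labels.append(label2id[tags[word_id]])
    previous_word_id = word_id

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
print(aligned_labels)
```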
In the safety domain, we researched relation extraction for knowledge graphs. Using a dataset of annotated news articles, we evaluated methods including co-occurrence analysis, embedding-based relation classification, and LLM prompting. LLMs outperformed the other methods, though they still fell short of human-created ground truths (F1 ≈ 0.6).
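To illustrate the simplest of these methods, the sketch below implements a naive co-occurrence baseline: two entities that appear in the same sentence become a candidate, untyped edge. The entity list and toy sentences are invented for demonstration only.

```python
# Minimal sketch of a co-occurrence baseline for relation extraction: entities
# mentioned in the same sentence are linked by a candidate edge. The entities
# and example sentences are illustrative assumptions, not the project's data.
from collections import Counter
from itertools import combinations

entities = {"fire brigade", "chemical plant", "evacuation", "residents"}

sentences = [
    "The fire brigade responded to an explosion at the chemical plant.",
    "An evacuation was ordered for residents near the chemical plant.",
]

pair_counts = Counter()
for sentence in sentences:
    found = [e for e in entities if e in sentence.lower()]
    for a, b in combinations(sorted(found), 2):
        pair_counts[(a, b)] += 1  # undirected, untyped edge candidate

# Each counted pair becomes a candidate edge in the knowledge graph; relation
# types would have to come from a classifier or an LLM in a later step.
for (a, b), n in pair_counts.most_common():
    print(f"{a} -- {b}  (co-occurrences: {n})")
```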
Beyond knowledge graphs, we explored ontology learning, which imposes more constraints on entities and relations. We modelled an ontology in the safety domain and tested different prompting strategies. Results varied across ontology elements: individuals reached an F1 score of ≈ 0.65, whereas properties reached only ≈ 0.2. For identifying axioms, we created a benchmark and tested LLMs; while they are unsuitable for full automation, they can provide valuable candidate axioms for ontology engineers.
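To give an impression of how such prompting can look, the following sketch builds a prompt that asks for candidate subclass axioms and parses the reply into structured pairs. The prompt wording, class names, and the hard-coded response are assumptions for demonstration; in practice the response would come from an instruction-tuned LLM.

```python
# Illustrative sketch of axiom prompting: ask an LLM for candidate subclass
# axioms over a small safety-domain class list and parse its reply. All names
# and the stand-in response are hypothetical, not the project's benchmark.
import re

classes = ["Incident", "Fire", "Explosion", "EmergencyService", "FireBrigade"]

prompt = (
    "Given the ontology classes "
    + ", ".join(classes)
    + ", list plausible subclass axioms, one per line, "
    "in the form SubClassOf(<subclass>, <superclass>)."
)

# Stand-in for the LLM call; a real system would send `prompt` to a model here.
response = """SubClassOf(Fire, Incident)
SubClassOf(Explosion, Incident)
SubClassOf(FireBrigade, EmergencyService)"""

pattern = re.compile(r"SubClassOf\((\w+),\s*(\w+)\)")
candidates = [
    (sub, sup)
    for sub, sup in pattern.findall(response)
    if sub in classes and sup in classes  # keep only axioms over known classes
]

# Candidate axioms are offered to an ontology engineer for review,
# not added to the ontology automatically.
for sub, sup in candidates:
    print(f"candidate axiom: {sub} SubClassOf {sup}")
```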
A major challenge is evaluation: gold-standard datasets are costly to create and maintain. To address this, we developed automatic evaluation metrics that measure the semantic and syntactic quality of a graph. Experiments with concept removal and noise injection showed that these metrics reliably reflect the effects of such changes, enabling developers to assess their modifications. Evaluating LLMs presents distinct challenges, as their outputs are variable and often unverifiable; we reviewed existing metrics and conducted a practitioner survey to provide practical guidelines.
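The sketch below conveys the underlying idea with a deliberately simple metric (not the project's actual metrics): an edge-level F1 between a reference graph and a modified one, checked under concept removal and noise injection. The triples are invented for illustration.

```python
# Simple sketch of automatic graph evaluation: precision/recall/F1 over
# (subject, relation, object) triples, checked against two perturbations.
# The reference triples are illustrative, not the project's data or metrics.
def edge_f1(reference, candidate):
    """F1 over triples; a crude, purely syntactic quality measure."""
    ref, cand = set(reference), set(candidate)
    if not ref or not cand:
        return 0.0
    precision = len(ref & cand) / len(cand)
    recall = len(ref & cand) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

reference = {
    ("FireBrigade", "respondsTo", "Fire"),
    ("Fire", "occursAt", "ChemicalPlant"),
    ("Evacuation", "protects", "Residents"),
}

# Concept removal: drop every triple that mentions the concept "Fire".
removed = {t for t in reference if "Fire" not in t}
# Noise injection: add a spurious triple.
noisy = reference | {("Residents", "respondsTo", "ChemicalPlant")}

print("after concept removal:", round(edge_f1(reference, removed), 2))
print("after noise injection:", round(edge_f1(reference, noisy), 2))
```

Both perturbations lower the score relative to the unmodified graph, which is the behaviour one wants from a change-sensitive metric.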
In summary, this work highlights the potential and limitations of NLP for structured information extraction. We observe that domain specificity matters: models trained or adapted to a particular field outperform general-purpose ones. Structural complexity also influences results: axioms are harder to identify than surface-level entities. Finally, evaluation is essential, and automatic metrics can reliably detect changes in knowledge graphs and ontologies. Taken together, these findings show that while fully automated knowledge graph extraction and ontology learning remain out of reach, NLP techniques such as fine-tuned models and generative LLMs can act as valuable assistants to ontology engineers by offering candidate axioms, suggesting refinements, and accelerating the modelling process.