Conference Agenda

Session

WS06: Evaluating Automated Subject Indexing Methods

Time: Wednesday, 03/Dec/2025, 11:15am - 1:15pm

Location: Brontë A, Knowledge Centre

Capacity: 18 (cabaret)

Presentations

Evaluating Automated Subject Indexing Methods

Maximilian Kähler

German National Library, Germany

Overview

Level of experience for attendees with relevant technologies: intermediate

This workshop will explore theoretical and practical aspects of evaluating automated subject indexing methods. As artificial intelligence and machine learning become increasingly prevalent in information retrieval and subject indexing, it is essential to understand how to evaluate these methods effectively in order to inform choices in system design, ensure quality and identify opportunities for further improvement. Key questions include: How can I drill down into the subject suggestions generated by various methods and determine the strengths and weaknesses of these methods? What benefits do LLM-based methods offer over traditional methods? This workshop aims to provide participants with a comprehensive understanding of the key metrics, their modes of aggregation and the dimensions of evaluation that should be considered, as well as hands-on experience with CASIMiR, an R evaluation toolkit newly developed at the German National Library (DNB).
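The "modes of aggregation" of set-retrieval metrics can be illustrated with a small sketch. CASIMiR is an R package and its API is not reproduced here; the Python code below is a hypothetical, self-contained illustration. It compares predicted and gold-standard subject term sets per document and aggregates precision, recall and F1 in two common ways: micro-averaging pools the counts over all documents, while document-wise macro-averaging scores each document separately and then takes the mean. The function names, the dictionary input format and the convention of scoring an empty denominator as 0.0 are assumptions.

```python
from typing import Dict, Set, Tuple

# Hypothetical input format: document ID -> set of subject term IDs.
Labels = Dict[str, Set[str]]


def prf(tp: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (assumed convention: 0.0 for empty denominators)."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_average(gold: Labels, pred: Labels) -> Tuple[float, float, float]:
    """Pool true positives and set sizes over all documents, then compute the metrics once."""
    tp = sum(len(gold[d] & pred.get(d, set())) for d in gold)
    n_pred = sum(len(pred.get(d, set())) for d in gold)
    n_gold = sum(len(gold[d]) for d in gold)
    return prf(tp, n_pred, n_gold)


def document_macro_average(gold: Labels, pred: Labels) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 per document, then take the unweighted mean of each."""
    per_doc = [
        prf(len(gold[d] & pred.get(d, set())), len(pred.get(d, set())), len(gold[d]))
        for d in gold
    ]
    n = len(per_doc)
    return tuple(sum(scores[i] for scores in per_doc) / n for i in range(3))


gold = {"doc1": {"a", "b", "c", "d"}, "doc2": {"e"}}
pred = {"doc1": {"a", "b", "x"}, "doc2": {"y"}}
print(micro_average(gold, pred))           # approx. (0.5, 0.4, 0.444)
print(document_macro_average(gold, pred))  # approx. (0.333, 0.25, 0.286)
```

In this toy example the two modes diverge noticeably: the missed single term of doc2 pulls the document-wise average down much more than the pooled micro average. Label-wise macro-averaging (computing the metrics per subject term and then averaging) is a further option that gives rare terms more weight.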

We will examine the strengths and limitations of various automated subject indexing approaches, including Lexical Matching ([1], [8]), Partitioned Label Trees ([2], [3], [4]), X-Transformer ([5], [6]), and LLM-generated subject terms ([7]). As an example, we will study the application of these algorithms to a test set of German book titles, predicting subject terms from the Integrated Authority File (GND) [9]. Because of its sheer size, the GND is a very challenging target vocabulary and offers rich opportunities for studying the advantages and disadvantages of the various subject indexing approaches.

The workshop will conclude with an open discussion in which participants can share their perspectives and experience on aspects beyond the quality of subject suggestions that should be factored into an in-depth evaluation: resource requirements, feasibility, open-source availability, etc.

Prerequisites and Preparation

It is neither mandatory nor expected that participants understand German for this workshop. Example book titles and German subject terms will be translated into English.
Participants are expected to have a basic understanding of information retrieval and subject indexing concepts, in particular of the basic information retrieval metrics Precision, Recall and F-Score.
Familiarity with R and basic programming concepts is helpful but not required. Participants will be provided with fully functional code examples.
Participants are encouraged to bring their own laptops with R (and preferably also an IDE such as RStudio, Positron or VS Code) installed, to work through the provided example notebooks (see below).
Software and data will be provided at: https://github.com/deutsche-nationalbibliothek/casimir-workshop. Please make sure to follow the installation instructions in advance of the workshop.

Participants will be provided with:

Example datasets with subject term suggestions from various subject indexing methods
Access to the CASIMiR package (https://github.com/deutsche-nationalbibliothek/casimir)
Example Quarto notebook(s) that contain the first steps of an analysis with CASIMiR

Planned Outcomes

By the end of this workshop, participants will:

Understand the theoretical aspects to consider when starting an evaluation project,
Be familiar with some pros and cons of current approaches to automated subject indexing,
Have hands-on experience with a drill-down analysis in R using the CASIMiR package,
Be able to apply the knowledge gained to their own evaluation projects.

Detailed Timetable

-----------------------------------------------------------------------------------

Theory I: 30 Minutes

why automated subject indexing is hard
existing methods for automated indexing
introducing the datasets

-----------------------------------------------------------------------------------

Work-Book 1: 10 Minutes

Comparing specific examples of automatically assigned subject terms

-----------------------------------------------------------------------------------

Theory II: 5 Minutes

metrics basics: set retrieval vs. ranked retrieval

-----------------------------------------------------------------------------------

Work-Book 2: 5 Minutes

Computing overall set retrieval metrics

-----------------------------------------------------------------------------------

Work-Book 3: 5 Minutes

Conducting stratified analysis along document and label groups

-----------------------------------------------------------------------------------

Work-Book 4: (optional, for fast study)

Precision-Recall-Curves
Ranked Retrieval Metrics
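The distinction between set retrieval and ranked retrieval can be sketched as follows, assuming each method returns scored suggestions per document (a hypothetical input format, not CASIMiR's). Cutting the ranking at a score threshold turns it into a set, so sweeping the threshold traces out a precision-recall curve; precision@k instead evaluates the top k positions of the ranking directly. This Python sketch is illustrative only; function names, data and thresholds are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: document ID -> list of (subject term, score) suggestions.
Suggestions = Dict[str, List[Tuple[str, float]]]


def pr_curve(gold: Dict[str, Set[str]], suggestions: Suggestions,
             thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Micro-averaged (threshold, precision, recall) points; thresholds with no kept suggestions are skipped."""
    n_gold = sum(len(labels) for labels in gold.values())
    points = []
    for t in thresholds:
        tp = n_pred = 0
        for doc, labels in gold.items():
            # Set retrieval: keep every suggestion scoring at or above the threshold.
            kept = {term for term, score in suggestions.get(doc, []) if score >= t}
            tp += len(kept & labels)
            n_pred += len(kept)
        if n_pred:
            points.append((t, tp / n_pred, tp / n_gold))
    return points


def precision_at_k(gold_labels: Set[str], ranked: List[Tuple[str, float]], k: int) -> float:
    """Share of the k highest-scoring suggestions that are in the gold set (divides by k even if fewer exist)."""
    top_k = sorted(ranked, key=lambda pair: pair[1], reverse=True)[:k]
    return sum(term in gold_labels for term, _ in top_k) / k


gold = {"d1": {"a", "b"}, "d2": {"c"}}
suggestions = {"d1": [("a", 0.9), ("x", 0.6), ("b", 0.3)], "d2": [("c", 0.8), ("y", 0.5)]}
for t, p, r in pr_curve(gold, suggestions, [0.7, 0.5, 0.2]):
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
print(precision_at_k(gold["d1"], suggestions["d1"], k=2))  # 0.5
```

In this toy example precision is not monotonic along the curve (1.00, then 0.50, then 0.60), which is one reason why interpolated precision is often reported for precision-recall curves.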

-----------------------------------------------------------------------------------

Theory II: 10 Minutes

Analysing the long tail

-----------------------------------------------------------------------------------

Work-Book 5: 15 Minutes

Stratify results by label frequency
Propensity-scored metrics (optional, for fast study)
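Stratifying results by label frequency can be sketched in a few lines. In this hypothetical Python illustration, each gold subject term is assigned to a frequency band according to how often it occurs in some reference corpus (for example, the training data), and recall is computed separately per band. The band boundaries, counts and function name are invented for illustration and do not reflect CASIMiR's interface. A lower recall in the rarest band would indicate that a method struggles with the long tail.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Band = Tuple[int, int]  # inclusive (low, high) range of label frequencies


def recall_by_frequency_band(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]],
                             label_freq: Dict[str, int], bands: List[Band]) -> Dict[Band, float]:
    """Recall per band: found gold terms / all gold terms whose frequency lies in the band."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for doc, labels in gold.items():
        predicted = pred.get(doc, set())
        for term in labels:
            freq = label_freq.get(term, 0)  # terms unseen in the reference corpus count as frequency 0
            for low, high in bands:
                if low <= freq <= high:
                    totals[(low, high)] += 1
                    hits[(low, high)] += int(term in predicted)
                    break
    # Bands without any gold terms are omitted rather than reported as 0.
    return {band: hits[band] / totals[band] for band in bands if totals[band]}


label_freq = {"a": 500, "b": 3, "c": 40, "d": 1}  # hypothetical training-set counts
bands = [(0, 9), (10, 99), (100, 10**9)]
gold = {"doc1": {"a", "b"}, "doc2": {"a", "c", "d"}}
pred = {"doc1": {"a"}, "doc2": {"a", "c"}}
print(recall_by_frequency_band(gold, pred, label_freq, bands))
# {(0, 9): 0.0, (10, 99): 1.0, (100, 1000000000): 1.0}
```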

-----------------------------------------------------------------------------------

Work-Book 6: 10 Minutes

final assignment: decrypt workshop results

-----------------------------------------------------------------------------------

Theory III (Advanced Topics): 15 Minutes

propensity-scored metrics
Graded relevance and expert ratings
Combining multiple methods
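Propensity-scored metrics reweight each correct suggestion by the inverse of an estimated label propensity, so that hits on rare subject terms count for more. One widely used propensity model is that of Jain, Prabhu and Varma (KDD 2016): p_l = 1 / (1 + C (N_l + B)^(-A)) with C = (ln N - 1)(B + 1)^A, where N is the number of training documents and N_l the number of training documents carrying label l; A = 0.55 and B = 1.5 are common default values. The Python sketch below assumes this model and an unnormalised PSP@k; whether the workshop or CASIMiR uses exactly these choices is an assumption, and many papers report PSP@k normalised by the best achievable value instead. All names and counts are hypothetical.

```python
import math
from typing import Dict, List, Set


def jain_propensity(label_count: int, n_docs: int, a: float = 0.55, b: float = 1.5) -> float:
    """Assumed propensity model (Jain et al., 2016): p = 1 / (1 + C * (N_l + B)^-A), C = (ln N - 1)(B + 1)^A."""
    c = (math.log(n_docs) - 1.0) * (b + 1.0) ** a
    # For a label seen only once (N_l = 1), this simplifies to p = 1 / ln N.
    return 1.0 / (1.0 + c * (label_count + b) ** (-a))


def psp_at_k(gold_labels: Set[str], ranked_labels: List[str],
             propensity: Dict[str, float], k: int) -> float:
    """Unnormalised propensity-scored precision@k: each gold hit in the top k contributes 1 / propensity."""
    return sum(1.0 / propensity[term] for term in ranked_labels[:k] if term in gold_labels) / k


n_docs = 10_000  # hypothetical training-set size
counts = {"frequent": 5_000, "rare": 1}  # hypothetical label counts
propensity = {term: jain_propensity(n, n_docs) for term, n in counts.items()}
gold = {"frequent", "rare"}
print(psp_at_k(gold, ["rare", "other", "frequent"], propensity, k=3))
```

Because every propensity lies below 1 in this model (for N > e), PSP@k is at least as large as plain precision@k, and the hit on the rare term dominates the score.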

-----------------------------------------------------------------------------------

Discussion: 15 Minutes

Other aspects of evaluation

-----------------------------------------------------------------------------------

Additional Information about Instructor

The workshop will be led by Maximilian Kähler, Research Software Engineer at the German National Library.

Mr Kähler earned degrees in mathematical sciences from the universities of Göttingen, Durham (UK) and Leipzig. After completing his studies, he specialized as a data scientist and research software engineer. His previous positions took him to the Federal Institute for Quality Assurance and Transparency in Health Care (IQTIG) in Berlin and the Helmholtz Centre for Environmental Research (UFZ) in Leipzig, before he joined the German National Library (DNB) in October 2021. Kähler is part of the Department for Automatic Indexing and Online Publications and leads a DNB research project that investigates how recent advances in natural language processing and novel machine learning approaches can be exploited for automated subject indexing.

References

[1] O. Suominen, “Maui Like Lexical Matching,” https://github.com/NatLibFi/Annif/wiki/Backend%3A-MLLM.

[2] O. Suominen, “Omikuji Backend,” https://github.com/NatLibFi/Annif/wiki/Backend%3A-Omikuji.

[3] Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma, “Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising,” The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018, pp. 993–1002, Apr. 2018, doi: 10.1145/3178876.3185998.

[4] S. Khandagale, H. Xiao, and R. Babbar, “Bonsai: diverse and shallow trees for extreme multi-label classification,” Mach Learn, vol. 109, no. 11, pp. 2099–2119, Nov. 2020, doi: 10.1007/s10994-020-05888-2.

[5] W.-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, and I. S. Dhillon, “Taming Pretrained Transformers for Extreme Multi-label Text Classification,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA: ACM, Aug. 2020, pp. 3163–3171. doi: 10.1145/3394486.3403368.

[6] J. Zhang, W. Chang, H. Yu, and I. S. Dhillon, “Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification,” Oct. 2021, Accessed: Nov. 08, 2021. [Online]. Available: https://arxiv.org/abs/2110.00685v2

[7] L. Kluge and M. Kähler, “DNB-AI-Project at SemEval-2025 Task 5: An LLM-Ensemble Approach for Automated Subject Indexing,” Apr. 2025, Accessed: May 07, 2025. [Online]. Available: https://arxiv.org/abs/2504.21589v1

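[8] O. Medelyan, E. Frank, and I. H. Witten, “Human-competitive tagging using automatic keyphrase extraction,” in Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore: ACL and AFNLP, Aug. 2009, pp. 1318–1327, doi: 10.5555/3454287.3454810.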

[9] Geschäftsstelle der GND-Zentrale an der Deutschen Nationalbibliothek, “Gemeinsame Normdatei,” Accessed: Oct. 11, 2024. [Online]. Available: https://gnd.network/