Conference Agenda

Session

Thursday poster session: Thursday poster session

Time:

Thursday, 04/Dec/2025:

9:00am - 5:00pm

Location: Brontë

Knowledge Centre. Capacity: 60

Presentations

From Annotation to Insight: Human-in-the-Loop Machine Learning for Historical Archives in HAICu WP2

C.A. Romein¹, K. van Schuijlenburg², B. Wolf³, S. Peeters¹, S.J.L. Weggeman³, K. Dijkstra³, M. Dhali², A. Weber¹, L.R.B. Schomaker²

¹University of Twente, The Netherlands; ²University of Groningen, The Netherlands; ³NHL Stenden University of Applied Sciences, The Netherlands

This poster gives insight into ongoing research of new machine-learning architectures, which promote continual-machine learning through continual, ‘life-long’ harvesting of labels and annotations in order to enable multi-modal data mining (1). This work is carried out in the context of work pagage 2 of the large Dutch HAICu project (digital Humanities, Artificial Intelligence & Cultural heritage, 2024-2030) (2). The work package’s focus lies on layout clustering, document structure detection, contextual text linking. We research and develop scalable solutions for studying and interpreting handwritten and other multimodal collections with complex layouts by creating innovative feedback loops between volunteers-in-the-loop and machine-generated output (e.g. recurring textual and graphic patterns). Our use cases stem from three different Dutch archives: the National Archives in The Hague (NA), the Groninger Archives (GA), and the Collection Overijssel (CO).

In the context of the NA, we work with the archives of the Dutch Ministry of Colonial Affairs (1850-1900), entailing ca. 4 million scans. Researchers from the NHL-Stenden University of Applied Sciences use advanced deep-learning technologies such as Laypa (3), DinoV2 vision transformers (4), and openTSNE mapping (5) to develop sophisticated latent space embeddings. These techniques transform page layouts based on structural features, going beyond conventional automatic text recognition to capture historical documents' nuanced visual and contextual complexity. Visualizing this latent space exposes clusters of handwriting styles for printed tabular structures, implicit tables on mixed layout pages, and writing density for different column widths (see first illustration). Concurrently, research at the University of Twente attempts to exploit the colonial archive’s innate preexisting structure and historical ordering principles (e.g. indices and klappers) for rapid information retrieval through keyword search.

Also at the University of Twente, researchers study the Staten van Overijssel collection, comprising 60,000 pages. It documents centuries of political and social interactions in the Netherlands (1578-1795). We pay in particular attention to historical petitions, representing the 'voice of the people' and offering insights into societal concerns (see second illustration). One of the core objectives is to generate innovative automatic metadata - likely with ANNIF (6) - that will make these documents more accessible and interpretable for the general public.

Central to WP2's methodology is research into robust human-in-the-loop frameworks that recognize the limitations of fully automated systems. Solutions developed within this WP rather continuously refine machine learning models by integrating academic expertise and volunteer contributions. This collaborative approach democratizes cultural heritage research and actively engages diverse stakeholders in preserving and interpreting colonial, regional and other multimodal archives.

Acknowledgement:

This work is supported by the Dutch Research Agenda (NWA) of the Dutch Research Council (NWO), grant: NWA.1518.22.105.

References:

(1) Schomaker, L.R.B. (2020). Lifelong Learning for Text Retrieval and Recognition in Historical Handwritten Document Collections. https://doi.org/10.1142/9789811203244_0012.

(2) HAICu: https://www.haicu.science/.

(3) Klut, S., et al. (2023). Laypa: A Novel Framework for Applying Segmentation Networks. https://doi.org/10.1145/3604951.3605520.

(4) Oquab, M., et al. (2023). Dinov2: Learning robust visual features. https://doi.org/10.48550/arXiv.2304.07193.

(5) https://opentsne.readthedocs.io/en/stable/# and Poličar, P.G., et al. (2024). OpenTSNE. https://doi.org/10.18637/jss.v109.i03.

(6) Annif: https://annif.org/ and Suominen, O. (2019). Annif: DIY automated subject indexing. https://doi.org/10.18352/lq.10285.

AI Tools for Digital Libraries: Enhancing User Experience and Trust

Petr Žabička¹, Jan Rychtář², Martin Lhoták³, Filip Jebavý¹, Filip Kersch³

¹Moravian Library in Brno, Czech Republic; ²Trinera s.r.o., Czech Republic; ³Library of hhe Academy of Sciences, Czech Republic

As generative AI and semantic technologies rapidly enter the public sphere, digital libraries are increasingly expected to offer intuitive LLM-based functions and interfaces alongside traditional tools. This presentation explores how large language models (LLMs), multimodal AI, and semantic search can transform user-facing services in digital libraries — and the crucial design decisions that determine their usefulness, reliability, and trustworthiness.

In the Czech Republic, most digital libraries are based on the open-source Kramerius digital library system. A wide array of digital libraries exists, established by both large libraries directly under the Ministry of Culture, such as the National Library and the Moravian Library, as well as specialized libraries, university libraries, regional libraries, and even some smaller institutions. As there are almost 50 installations of Kramerius, the landscape is quite fragmented. The Czech Digital Library was conceived as a common index and user centric front-end with the aim to provide one point of entry to the users. Currently, the Czech Digital Library provides access to 350,000 documents represented by 150 million pages. Since 2019, it has also served as the official national aggregator for modern library documents, forwarding data to the Europeana Digital Library.

For over a year, the Czech Digital Library has enhanced its public domain digital content by integrating external AI services for translation, page summaries, and text-to-speech functionalities. These AI features are available to users for a single page or a selected portion of it. They function for publicly accessible documents, both for scanned documents available in JPG/JPG2000 formats and for born-digital documents accessible as PDFs. The services allow the use of the OCR text layer to translate documents into more than 10 languages, enable text-to-speech functionality, and summarize page content into a few quickly digestible points. Although these features are relatively limited, they provide tangible benefits by enabling users to work with documents in languages they do not understand, quickly analyze document content, or absorb information by listening, particularly benefiting users with special needs. In testing is a LLM query interface for querying either a page or a whole document. The testing phase includes enhanced features that allow users not only to summarize a page but also to query the AI with open-ended questions about the content of the displayed page. Additionally, the testing involves summarizing entire documents, such as monographs, articles, or newspaper issues, and querying these entire documents. Although this option is not available in the production environment, for testing purposes it is possible to easily switch between different external AI services and their models from the user interface to observe varying responses.

For querying, we have been testing a range of models from OpenAI, Anthropic and Google to get a sense of how different (and differently priced) models respond to user queries. For translations, Google Translate and DeepL has been tested first but as DeepL does not support Latin we decided to use solely Google Translate, even though there are some disadvantages there as well. Early in testing, we discovered that creating summaries in a language different from the original is more effective when the document is first translated. This approach prevents models from inadvertently switching back to the original language partway through the summary, ensuring consistency and accuracy. For text to speech we have tested services from Google, OpenAI and ElevenLabs. As each of these services has its own advantages and disadvantages, we are allowing the user to pick a model and a voice for each target language. During the presentation, we will discuss our findings and experience in greater detail.

Currently, all these services are implemented solely in the digital library front-end to accelerate user testing and interface improvements, and have not yet been integrated into backend systems where some of these features might ultimately belong. Since all significant online AI services require payment for extensive use, we require user authentication and route all AI service requests through a common proxy, monitoring token usage and setting usage limits. This gives us valuable data on the real use of the AI services as well as protects us from the numerous LLM crawlers that ignore the robots.txt settings.

The presentation will then concentrate on a recent Newspaper memory project. We indexed 25 newspaper titles dating from 1880 to 1914, totaling approximately 500,000 pages. Due to the absence of precise page segmentation data, we divided the text into approximately 10 million chunks using arbitrary heuristics and generated vector representations for each chunk. Then we gave the users the possibility to ask questions in a natural language and used LLM to give consistent answers based on the most relevant texts with references to the original articles to allow users to check the sources themselves.

Another experiment involved bibliographic data aggregated by the Moravian Library for its Central Portal for Libraries. We enriched MARC records of monographs with publisher and additional annotations, testing the effectiveness of natural language searches within library catalogues. The early testing suggests that from this activity is the need to create relevant AI summaries for books that have already been digitized but lack any annotation.

Another experiment involved bibliographic data aggregated by the Moravian Library for its Central Portal for Libraries. We enriched MARC records of monographs with publisher and additional annotations and tested the effectiveness of semantic searches within library catalogues. The initial testing showed promising results in terms of relevance. Going forward, our primary focus will be on analyzing user queries to better understand how they utilize semantic search capabilities, which will help us refine its integration into library systems.

All the above mentioned experiments as well as other considerations led us to the decision to develop a new version of the Czech digital library front-end. We examine how to design hybrid interfaces that balance the precision of keyword search with the flexibility of semantic understanding. We highlight core challenges — from integrating AI-generated image descriptions and document translations to building “Ask a Document” conversational interfaces — while maintaining user trust and legal compliance. Additional challenges include determining over which and how large portions of documents or digital libraries AI functions should be enabled. Should we allow users to query virtual collections curated by experts or their own lists of favorite documents? Another consideration is whether to pre-generate document summaries and offer them directly to users, and if so, can this be done for documents not in the public domain? Furthermore, addressing the hallucination of LLMs in open-ended queries is crucial to ensure that the answers users receive genuinely derive from the document's text. How can we effectively communicate this in the user interface?

The rise of LLMs and other AI tools since 2023 has created opportunities and pressures for libraries to rethink user interactions with digital collections. Traditional catalogue and full-text search interfaces are being supplemented — and in some cases replaced — by semantic search, image-to-text analysis, summarization, and even conversational querying. These tools can enhance discovery and accessibility, but they also raise critical questions about transparency, user control, and ethical use. When should AI-enhanced features be applied automatically versus offered as opt-in tools? How do we prevent hallucinations, bias, and copyright violations inherent in generative models? How can we design user interfaces that clearly distinguish between keyword and semantic search modes?

We will share lessons and design decisions from ongoing development in the Czech Digital Library and partner institutions, including merging results from different search paradigms, supporting multilingual and multimodal queries, and ensuring responsible LLM integration. Our aim is to offer a thoughtful and transparent approach to AI deployment, enhancing user research experiences without compromising clarity, providing Kramerius-based digital library teams with ready-to-use tools, and sharing adaptable examples, patterns, and strategies with conference participants. Our next steps will include detailed testing of the newly developed user interface, which will be largely completed by the time of the conference, as well as the integration of advanced AI functionalities. Simultaneously, we aim to gather usage data to support the effective integration of these AI features into digital libraries and catalogues.

Making Meaningful Connections: How Wellcome Collection Has Used Graph Technology to Aid Collections Discovery

Jonathan Cates, Antonia Langfelder, Štěpán Brychta, Daniel Sim

Wellcome Collection, United Kingdom

This poster will explore how Wellcome Collection has developed a new graph database to support its mission to make meaningful connections between different perspectives and stories of health past, present and future.

Wellcome Collection's catalogue theme pages, built around human tagged concepts and provide a key entry point to our collections. Despite these foundations, recent research revealed that they remained hidden and sometimes offered few meaningful onward connections limiting their ability to truly connect our collection.

We will showcase our work to address these shortcomings by developing a knowledge graph that integrates external sources, including MeSH, LCSH, and Wikidata, to provide new and enriched connections between our theme pages. We will show that by working to eliminate duplicates, aligning synonymous terms, and connecting related concepts, our approach enhances both the user experience and the underlying data infrastructure.

Our poster discusses the technical design of the graph — including nodes for works, concepts, and external authorities, and edges capturing semantic relationships — as well as the modular pipeline enabling scalable ingestion. We will highlight the importance of aligning data sources and understanding the relationships between these ontologies.

Throughout the project we have attempted to deepen our understanding of user needs and have been led by that understanding to make several key decisions about how we introduce the graph to our users. We will discuss how this learning has been reflected in the design and iteration of our new theme pages, and how we have approached releasing these gradually to our users.

The success of this project has depended on a multi-disciplinary team comprising user experience researchers, designers, software engineers, machine learning engineers, and library and archives professionals. We will highlight the importance of these different disciplines and reflect on how other institutions might learn from our experience.

We also explore how the graph provides a foundation for future exploration of machine learning and 'AI" techniques, including: the use of graph embeddings to expose more connections, named entity recognition, leveraging full text in our OCR data, and exploring the use of LLMs and VLMs to deepen out understanding of the works in our collection.

We believe that by leveraging and reconciling recognised and publicly accessible ontologies as the foundations for this work, we hope this project offers a practical model for GLAM institutions aiming to integrate machine learning and AI approaches into collections discovery.

Wellcome Collection has embarked on an ambitious 10 year strategy centred on a vision of a world where everyone's experience of health matters. We believe that this work is central to that vision.

UK Web Archive Data: Opportunities for the AI Community

Helena Byrne

British Library, United Kingdom

The UK Web Archive (UKWA) is a partnership of all six UK Legal Deposit Libraries. Its aim is to collect and preserve websites published in the UK, encompassing a broad spectrum of topics. Selective archiving began in 2005 and, following the implementation of the Legal Deposit Libraries (Non-Print Works) Regulations 2013 (www.legislation.gov.uk/uksi/2013/777/made), archiving continued at a whole-domain level. The entire collection amounts to approximately 2 petabytes (PB) of data, representing millions of websites and billions of documents and other files published to the web. The archive includes curated or thematic collections covering a diverse array of subjects and events, ranging from UK General Elections, Blogs, and the UEFA Women’s Euros 2022, to Live Art, the History of the Book, and the French Community in London.

Since November 2024, the UKWA has published one large data set of all records curated in the Annotation Curation Tool (ACT) - the software utilised to capture descriptive, administrative and technical metadata. This data set has fifteen metadata fields describing a mixture of websites, sections of websites, individual pages and some social media profiles. A number of inactive (relatively static) more targeted curated collections have also been published. These data sets comprise of twenty metadata fields including collection location data and a description of the websites (if one was entered into the curation tool).

The data sets are published in the UK Web Archive section of the British Library Research Repository in the folder ‘UK Web Archive: Data’ - https://bl.iro.bl.uk/collections/5379d014-1774-46e1-a96a-7089e7c814a3?locale=en

This lightning talk will highlight the rights and copyright status of this new resource. It will outline potential research projects that will be of interest to the AI community. These potential projects could help to develop tools and techniques to implement AI workflows to manage metadata management and creation.

Responsible AI Strategies

Abbey Potter

Library of Congress, United States of America

The Library of Congress has been exploring how to use computational methods like machine learning and AI for the benefit of its users, staff and stakeholders for many years. The Library has been using optical character recognition (OCR), a form of machine learning, since the 1990s to make typed documents machine readable. Natural language processing (NLP) is also used for several kinds of text analysis tasks, and speech to text services are provided on our website and at our National Library for the Blind and Print Disabled. We’ve experimented with AI for enhancing staff workflows for preprocessing digitization tasks, for building human in the loop review interfaces, for text summarization, data classification, and metadata generation. We’ve supported AI-enabled research with our collections. Text, network and object analysis are some examples. We’ve also tested AI-enhanced search and discovery interfaces with public users and our Innovators in Residence used AI to create engaging and popular public demonstration projects that reimagine how these technologies could connect with new users and researchers. These experiments inspired the development of the AI Planning Framework, which applies the lessons learned from previous experimentation to three planning phases and a set of activities to guide teams through an AI implementation.

Responsible AI is centered on evaluating the readiness, appropriateness and effectiveness of the data and models utilized for specific use cases and assessing the impacts of AI programs on users, staff and stakeholders. Recent advances in AI technologies have inspired new thinking around how the technology could help to address core challenges at the Library. Beginning in 2022, the Digital Innovation Division called LC Labs began three high-priority and focused experiments. The first exploring AI-assisted cataloging, the second testing models in creating authoritative Bill summaries, and the third extracting metadata from historic Copyright Registration forms. This presentation will share how these experiments are driving the adoption of the AI Planning Framework in the agency-wide AI governance process. This governance is guided by the goals of centering and maximizing the expertise of staff, the evaluation of models and tools prior to making implementation decisions, with preparing and understanding Library data in the AI ecosystem, specifically in terms of data readiness and data authenticity.

Propelling the Library’s Responsible AI strategies are the staff’s deep knowledge of collection data, insights into user needs, and prior experience with previous digital transformations in the sector. Hands-on experiments with and exposure to AI tools and output help to build knowledge and confidence with staff in applying AI responsibly. This presentation will also share how encouraging and iterating experiments can open new pathways to addressing long-standing challenges and issues and how the expertise and effort of staff are crucial to Responsible AI in LAMs.

AI tools can have significant differences in the licensing, privacy and security terms and settings.Models and tools also vary wildly in cost and how much compute is required to run them. These differences are important to understand yet not always transparent. Selecting and evaluating models that are appropriate and effective for individual use cases are critical to successful AI implementations. The presentation will show examples of how the Library is evaluating models.

Understanding how to evaluate these elements comes from hands-on experience with AI tools and output. Supporting access to AI tools and providing safe spaces to gain experience enables the development of informed and relevant review processes, usage guidelines, and rules of behavior. Exposure to AI helps to realize benefits and address challenges while reducing risks for users, communities and the institution. The presentation will give examples of how Library staff are prepared for and supported in their use of AI tools. .

AI tests and experiments produce a multitude of data and information to analyze. Quantitative metrics measure things like F1 scores (a combination of precision and recall measurements comparing machine generated results with ground truth data) and other indicators for accuracy. Often, manual review is needed of AI output, especially when the accuracy or quality of output is subjective or multiple terms or topics could be acceptable. Manual reviews could also include qualitative scoring for factors like completeness, fairness or style. In addition to the output data, the AI tools and programs need to be evaluated according to the risk factors they could present to organizations or people and if AI experiments, tests, or initial implementations will support the goals and values of the organization. All of these evaluations also need to be done throughout the lifecycle of the AI use case or task because a risk unique to AI systems are that results and outputs can change as models and tools are updated and as the model processes more data. The Library has developed a framework to assess AI uses according to three overarching factors that are evaluated based on evidence, staff feedback and according to strategic priorities and roadmaps. The first factor is evaluating if the results of the AI test, pilot or experiment is responsible. Are potential benefits realized and can risks be mitigated? Are the data, staff and tools utilized in the AI process compliant to Library copyright, privacy and security policies? Does the AI use case support the Library’s AI principles. The second factor to evaluate is if the AI test was effective. DId the output match expectations? Was the tool tested with Library data and reviewed by Library staff? Is the output good enough? The third factor we are evaluating is if the AI tool or process is practical to implement. Is the AI process cost effective in the short and long term and how is that estimated? Can the AI tools, models, and data be integrated into and managed with existing infrastructure, or would updates in infrastructure, process and policy be required? And can the Library ensure the quality of the AI output overtime? Building on existing evaluation and impact assessment methodologies and sharing AI evaluation practices with other libraries, archives and museums are essential to Responsible AI.

The presentation will also include recent work (concluding in August of 2025) to develop recommendations and demonstrations for how the Library could responsibly create and share datasets expressly for training and tuning AI models for Library-specific tasks and uses. Responsible AI Datasets recommendations will include how to identify and mitigate statistical biases in datasets and how to document transformations and processes to prepare datasets for use in AI tools.

Throughout the Library’s ongoing AI explorations we’ve learned that staff expertise and quality, balanced training data will drive Responsible AI at the Library of Congress. To enable innovation, experiments will continue as the AI roadmap is developed for priority use cases. We’ve also learned that the risks could be mitigated with strong quality assurance and governance. Sharing approaches to Responsible AI throughout the LAM community could have consequential influence on the responsible, effective and practical application of AI technologies in our sector.

Quantitatively Assessing the Applicability of LLM's to Metadata Creation and Enrichment

Neil Jefferies, Alex Hitchman

Bodleian Libraries, University of Oxford, United Kingdom

The objective of this project, part of the Oxford-OpenAI collaboration, is to understand where the use of LLM's (in particular, OpenAI models, but also others) might add value to special collections resource description activities in the library.

The evaluation will be carried out against a test corpus (circa 100 representative items) of newly digitsed materials drawn from our Global Dissertation collection, for which we have a card catalogue but no entries in our online catalogue. Both the disserations and cards are digitsed, and the images used as the basis for comparing parallel workflows:

1. Manual cataloguing to generate a minimal record required by our online catalogue (the baseline) based primarily on the existing card record.

2. Manual cataloguing to generate a 'gold standard' record, making use of the digitised source materials and any external reference sources as deemed necessary. This will be used as a benchmark against which outcomes can be measured.

3. Use of an LLM to extract and/or enrich metadata from card images, using both a basic task prompt and more advanced prompting to include the additional sources captured during workflow 2.

4. Use of an LLM to extract metadata from disseration images, using both a basic task prompt and more advanced prompting to include the additional sources captured during workflow 2.

Additional dimensions of investigation with the corpus will include:

1. The impact on performance of different output format requirements (MARC vs JSON vs BibFrame)

2. The effect of injecting causality - using prompting to sequence text extraction, metadata creation and metadata enrichment

3. A comparison of generative and conventional, trained AI (OCR/HTR) approaches for text extraction.

Case Studies in Responsible AI: Generative Approaches to Genealogy and Historical Newspaper Archives

Teodora Oniga, Irene Iriarte Carretero

Findmypast, United Kingdom

Findmypast is a UK based genealogy company and a part of DC Thomson, with a mission to connect people to their family stories. With millions of digitised records, newspapers and research tools, such as a family tree building canvas, our users can explore their family history and delve deeper into what the life of their ancestors looked like.

However, family history can be a complicated endeavour, with many sources being difficult to find and interpret [1]. Technological advances, and AI in particular, can play a key role in unlocking the hobby and making the process more accessible to a wider audience [2]. This talk focuses on describing some of the different existing and future applications of AI within Findmypast, as well as some of the practical aspects of utilising advanced generative technology within a niche market.

One of our goals as a business is to help bring the past to life and key to doing this is helping contextualise the facts that users find in our wide variety of records. This is a classic application of generative AI [3] and the talk will discuss a possible solution to turning the factual and disparate information that our users add to people in their family tree into accessible narratives about their lives. This would give our users a more holistic view of their ancestors’ lives as well as creating highly shareable content that is more easily interpretable by a wider range of audiences.

We have found that running this workflow at scale requires careful consideration of several aspects. Critically, it was key to acknowledge the personal and sensitive nature of the content that is being created using generative AI, given that it is describing real people’s lives. To account for this, we performed many rounds of prompt engineering in a systematic way to ensure that we achieved an appropriate balance between keeping the information factual and adding some appropriate narrative that made the content engaging. The benchmarking of this balance was open to genealogical experts within the business using an internal platform inspired by Chatbot Arena [4].

Furthermore, we will discuss the importance of testing the narratives on significantly diverse test cases before making the experience available to our users given the wide range of time periods and circumstances that the people captured in our platform lived within. Finally for this use case, we will discuss the metrics we used to mitigate the common generative AI risk of hallucinations [5].

In addition, we will explore another potential application of generative AI, this time to historical newspapers. In recent years, Large Language Models (LLM) have found uses in historical archives for tasks such as transcription [6], OCR correction [7] and entity extraction [8]. However, generative techniques such as summarisation and retrieval augmented generation (RAG) on historical text have not been as widely explored. As a method that combines a knowledge base with a generative model to enhance the relevance of the generative model outputs and reduce hallucinations, RAG has become ubiquitous in the field of information retrieval. Some initial works have suggested that applying RAG to historical newspaper articles can provide benefits over traditional information retrieval [9].

While showing great potential in assisting researchers and family historians beyond transcription/OCR tasks, such systems are not without limitations [10,11]. In addition to the well-known problem of hallucinations, LLMs can introduce biases and unfairness [12,13]. Various methods have been proposed for evaluating such systems, including the use of LLMs, known as LLM-as-a-judge [14,15]. Using LLMs as evaluators has significant cost and speed advantages over human evaluation and can be used to iteratively improve the system. A key question remains whether such approaches, which have their own challenges [16], are applicable to historical text.

To answer some of these questions, we describe an experimental newspaper article RAG application, utilising hybrid search (text and vector) as its underlying information retrieval system. We will then present an analysis on the use of an LLM-as-a-judge framework for evaluating bias and unfairness, as well as our findings on the impact of various bias mitigation strategies [17].

As the accuracy of the system is a key concern, we also examine using the LLM-as-a-judge approach for measuring general performance metrics. We will look at metrics for both the retrieval (relevance, accuracy) and generation (relevance, faithfulness, correctness) stages [18]. In addition, we also investigate applying similar metrics to the summarisation of historical newspaper articles.

The findings from this exploratory study will further our understanding of how effective automated systems such as LLM-as-a-judge are in the context of historical newspapers and illustrate potential approaches to mitigating biases and LLM inaccuracies in a systematic manner.

In conclusion, during the talk we will use these examples to highlight the huge role that AI can play in transforming the way that our users interact with our content. Ultimately, we believe that AI can enable a wider audience to find, explore and share the vast amount of digital heritage that we hold as a business. Innovative applications of AI within our sector can help users more easily find content that is more relevant to them in a large corpus of unstructured data, as well as helping them contextualise it. This can drive a better understanding and appreciation of the past and increase the accessibility of historical materials.

However, it is crucial that we consider our responsibility over these datasets and experiences by taking a more active role in selecting, curating and contextualising this content. We will focus on discussing the need to carefully consider the implications and intricacies of each use case in isolation and implement robust frameworks to manage and mitigate potential biases and limitations of this technology.

Mitchell BA, Kim B. Amateur family genealogists researching their family history: a scoping review of motivations and psychosocial impacts. Genealogy. 2023 Dec 28;8(1):3.
Shan F, Luther K. Reexamining Technological Support for Genealogy Research, Collaboration, and Education. Proceedings of the ACM on Human-Computer Interaction. 2025 May 2;9(2):1-33.
Spennemann DH. Generative artificial intelligence, human agency and the future of cultural heritage. Heritage. 2024;7(7):3597.
Chiang WL, Zheng L, Sheng Y, Angelopoulos AN, Li T, Li D, Zhu B, Zhang H, Jordan M, Gonzalez JE, Stoica I. Chatbot arena: An open platform for evaluating llms by human preference. In Forty-first International Conference on Machine Learning 2024 Mar 7.
Gosmar D, Dahl DA. Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks. arXiv preprint arXiv:2501.13946. 2025 Jan 19.
Humphries M, Leddy LC, Downton Q, Legace M, McConnell J, Murray I, Spence E. Unlocking the archives: Using large language models to transcribe handwritten historical documents. Historical Methods: A Journal of Quantitative and Interdisciplinary History. 2025 Apr 30:1-9.
Thomas A, Gaizauskas R, Lu H. Leveraging LLMs for post-OCR correction of historical newspapers. In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA)@ LREC-COLING-2024 2024 May (pp. 116-121).
González-Gallardo CE, Tran HT, Hamdi A, Doucet A. Leveraging open large language models for historical named entity recognition. In International Conference on Theory and Practice of Digital Libraries 2024 Sep 24 (pp. 379-395). Cham: Springer Nature Switzerland.
Tran TT, González-Gallardo CE, Doucet A. Retrieval Augmented Generation for Historical Newspapers. In Proceedings of the 24th ACM/IEEE Joint Conference on Digital Libraries 2024 Dec 16 (pp. 1-5).
Mauermann J, Oberbichler S. LLM Biases: Expected and Unexpected Model Design Effects in Historical Newspaper Article Extraction on the Messina Earthquake. DH Lab. Updated 31 Jan 2025. Accessed 31 May 2025. Available from: https://doi.org/10.58079/137qr
Toth GM, Albrecht R, Pruski C. Explainable AI, LLM, and digitized archival cultural heritage: a case study of the Grand Ducal Archive of the Medici. AI & SOCIETY. 2025 Mar 25:1-3.
Dai S, Xu C, Xu S, Pang L, Dong Z, Xu J. Bias and unfairness in information retrieval systems: New challenges in the llm era. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2024 Aug 25 (pp. 6437-6447).
Gallegos IO, Rossi RA, Barrow J, Tanjim MM, Kim S, Dernoncourt F, Yu T, Zhang R, Ahmed NK. Bias and fairness in large language models: A survey. Computational Linguistics. 2024 Sep 1;50(3):1097-179.
Es S, James J, Anke LE, Schockaert S. Ragas: Automated evaluation of retrieval augmented generation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations 2024 Mar (pp. 150-158).
Gao M, Hu X, Yin X, Ruan J, Pu X, Wan X. Llm-based nlg evaluation: Current status and challenges. Computational Linguistics. 2025 Apr 10:1-28.
Li D, Jiang B, Huang L, Beigi A, Zhao C, Tan Z, Bhattacharjee A, Jiang Y, Chen C, Wu T, Shu K. From generation to judgment: Opportunities and challenges of llm-as-a-judge. arXiv preprint arXiv:2411.16594. 2024 Nov 25.
Singh K, Ngu W. Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval. In Companion Proceedings of the ACM on Web Conference 2025 2025 May 8 (pp. 1705-1712).
Yu H, Gan A, Zhang K, Tong S, Liu Q, Liu Z. Evaluation of retrieval-augmented generation: A survey. In CCF Conference on Big Data 2024 Aug 8 (pp. 102-120). Singapore: Springer Nature Singapore.

AI-Powered Subject Indexing in the Archives – Piloting Finto AI at the Finnish Literature Society

Milla Eräsaari, Teemu Hirvonen

Finnish Literature Society, Finland

The Finnish Literature Society (SKS) Archives has launched a development project to explore the potential of AI-assisted subject indexing within its archival description workflows. At the core of this initiative is Finto AI, an open-source tool developed by the National Library of Finland. Using the General Finnish Ontology (YSO) as its vocabulary, Finto AI enables the automatic generation of subject terms. It is pre-trained on Finnish-language data and offers the flexibility for domain-specific adaptation.

Our goal is to integrate Finto AI into our archival management system to enhance metadata creation, improve discoverability, and support the long-term accessibility of archival materials. At the same time, we want to ensure that the implementation of AI is not a top-down technical imposition, but a collaborative effort that includes the expertise and perspectives of our staff. The project is still in its early planning phase, and we are actively seeking insights, experiences, and peer feedback from others in the GLAM (Galleries, Libraries, Archives, and Museums) community.

In this five-minute lightning talk, We will briefly present the project’s aims and pose five key questions we hope to explore with others:

What kind of workflow best supports AI-assisted subject indexing in archival environments?
How should automation and human expertise be balanced?
How can we make the most of an AI tool like Finto AI when a significant portion of our materials is not yet digitised?
We are considering how to balance manual and automated work, especially when dealing with analogue materials, and whether subject indexing and digitisation should be coordinated or treated as separate stages.
How can staff be actively involved in the development and implementation process?
Our colleagues are proud how they know literature and folklore, they don´t want to be superseded by machines, but we have a huge task ahead of us and we do need help. How can we find ways to see AI as a help rather than a threat?
Should we explicitly mark or indicate when indexing has been performed by AI?
What level of transparency do users and professionals expect?
Are we even asking the right questions about AI in cultural heritage work?
What assumptions underlie our implementation decisions, and what perspectives might we be overlooking?

Adopting AI in GLAM institutions is not just a technical challenge—it is also a cultural and organisational one. Instead of presenting polished solutions, this talk is a call for collaboration. We’re keen to hear from other institutions that have experimented with AI-assisted subject indexing tools, and to exchange tips, lessons learned, and critical reflections.

References:

Suominen, O. (2019). Annif: DIY automated subject indexing using multiple algorithms. LIBER Quarterly: The Journal of the Association of European Research Libraries, 29(1), 1-25. https://doi.org/10.18352/lq.10285
National Library of Finland. Annif Project. https://annif.org
National Library of Finland. https://ai.finto.fi/

The fifth law of Ranganathan and implications for information science in artificial intelligence-driven world: addressing the anxieties of the AI in developing countries.

Chiatu Matilda ODUAGWU, Emmanuel A. ODUAGWU, Oyemike Victor BENSON

FEDERAL POLYTECHNIC NEKEDE, OWERRI, NIGERIA.

Introduction

This paper explores the fifth law of S. R. Ranganathan in consonance with technological trends, with a particular emphasis on the concerns of the artificial intelligence (AI) era, especially in developing countries. Ranganathan's five laws of library science, first proposed in 1931, have long served as foundational principles for the operation and philosophy of libraries worldwide. These laws include: (1) Books are for use, (2) Every reader his/her book, (3) Every book its reader, (4) Save the time of the reader, and (5) The library is a growing organism (Ranganathan, 1931). The fifth law, which posits that "the library is a growing organism," underscores the dynamic nature of libraries as entities that must continually evolve, adapt, and expand in response to societal, technological, and informational changes. This principle implies not only physical growth in collections and infrastructure but also intellectual and operational adaptation to ensure relevance and sustainability. In an era dominated by rapid technological advancements, particularly AI, this law takes on renewed significance, urging library and information science (LIS) professionals to reexamine traditional practices and embrace innovation to address emerging challenges, including those unique to resource-constrained environments in developing nations.

Philosophy behind the Fifth Law as Proposed By Ranganathan

The philosophy behind Ranganathan's fifth law is rooted in the idea that libraries are living systems, akin to biological organisms, which must grow to survive and thrive. Ranganathan emphasized that stagnation leads to obsolescence, and thus libraries must accommodate increases in users, resources, and services while adapting to external influences (Ranganathan, 1931). In the context of information science, the law encourages proactive responses to technological disruptions, ensuring that libraries remain vital hubs for knowledge dissemination. As libraries transition from print-centric to digital and AI-enhanced environments, the fifth law provides a framework for understanding how technological trends can foster growth rather than threaten extinction, particularly in developing countries where access to advanced technologies is often limited (Benson & Oduagwu, 2023).

The Nexus between the Fifth Law and Technological Trends in Information Science

The nexus between the fifth law and technological trends in information science is evident through the lens of historical industrial revolutions. During the First Industrial Revolution (late 18th to early 19th century), mechanization introduced printing presses that expanded library collections, embodying the growth aspect of the fifth law. The Second Industrial Revolution (late 19th to early 20th century) brought electrification and mass production, enabling broader access to information resources. The Third Industrial Revolution, marked by digitalization and the internet in the late 20th century, transformed libraries into hybrid spaces with online catalogs and databases, requiring LIS professionals to adapt to information technology (IT) tools (Noruzi, 2004). For instance, AI-driven recommendation systems can match users with resources more efficiently, aligning with Ranganathan's emphasis on saving time and ensuring every book finds its reader, even in under-resourced settings (Omame & Alex-Nmecha, 2020).

Industrial Revolutions and Information Officers’ Reactions in Accordance with Law that States: ‘Library is a Growing Organism’

Information professionals' responses and attitudinal approaches to technological development have varied historically, often oscillating between enthusiasm and resistance. In the early days of computerization, many librarians viewed automation as a threat to traditional roles, fearing job displacement and loss of human touch (Barner, 2011). However, progressive attitudes, guided by the fifth law, have led to successful integrations, such as the adoption of integrated library systems (ILS) that streamline operations. This attitudinal shift is crucial, as the fifth law implies that resistance to growth could render libraries irrelevant in an AI-driven world, a concern amplified in African contexts where infrastructure challenges persist (Benson & Oduagwu, 2023).

Impact of Artificial Intelligence on Library Practice,

The impact of AI on library practice is profound and multifaceted. AI tools can revolutionize cataloging by automating metadata extraction and classification, reducing human error and processing time (Omame & Alex-Nmecha, 2020). In information retrieval, AI-powered search engines and chatbots provide instant, accurate responses to user queries, enhancing accessibility. For collection development, predictive algorithms analyze usage patterns to inform acquisitions, ensuring collections grow dynamically. In user services, AI enables virtual reference desks and personalized learning paths, particularly beneficial in resource-scarce environments. However, in developing countries, AI's impact is tempered by infrastructural challenges. For example, in Nigeria, AI has improved cataloging in academic libraries but is hindered by inconsistent electricity and limited high-speed internet, which affects access to cloud-based AI platforms (Benson & Oduagwu, 2023). Similarly, in India, AI applications in public libraries have boosted digital literacy programs, yet uneven adoption exacerbates urban-rural divides (Kalbande et al, (2024).

Expectations of Library and Information Professional

Expectations of library and information professionals in the AI era are evolving rapidly. Professionals are now expected to possess hybrid skills, combining traditional LIS knowledge with data science, programming, and ethical AI literacy (Asemi & Asemi, 2018). The changing role of stakeholders in library sectors further amplifies this. Library administrators must prioritize AI investments, policymakers should develop supportive regulations, and users become co-creators in AI-enhanced ecosystems. For instance, collaborations between libraries and tech firms in countries like Kenya have led to AI-driven mobile libraries, extending services to remote areas (Masinde, Mugambi & Wambiri, 2024). Additionally, educators in LIS programs must reposition curricula to include e-learning and AI competencies, as highlighted in studies on African library schools (Benson, Oduagwu, Mbajiorgu, & Ike., 2022).

Changing Role of Stakeholders in Response to the Fifth Law in Current AI Ages

Despite these opportunities, several concerns associated with AI in libraries, particularly in developing countries, warrant attention. Anxieties stem from the "AI divide," where limited access to high-performance computing infrastructure such as GPUs and data centers prevents equitable adoption (Asim, Arif, Rafiq & Ahmad, 2023). To address these concerns, several recommendations are proposed. First, LIS education curricula in developing countries should incorporate AI training modules, emphasizing ethical frameworks like UNESCO's Recommendation on the Ethics of Artificial Intelligence (UNESCO, 2021). Governments and international organizations must invest in infrastructure, such as subsidized cloud computing and renewable energy for libraries. Collaborative networks, like those under the International Federation of Library Associations (IFLA), can facilitate knowledge sharing and open-source AI tools tailored for low-resource settings. Libraries should adopt AI governance policies that prioritize transparency, bias audits, and user consent. Professional development programs, including workshops on AI ethics, can build confidence and skills (Chigwada, 2024). Finally, stakeholder engagement through public forums and partnerships ensures inclusive growth, aligning with the fifth law's organism metaphor, and drawing on entrepreneurial approaches in LIS to foster innovation (Anyanwu, Oduagwu, Ossai-Onah & Amaechi, 2013).

Conclusion

In conclusion, this paper asserts that librarians must awaken to the transformative potential of AI, embracing technologies to realize optimal service delivery while mitigating risks. By reinterpreting Ranganathan's fifth law in the AI context, libraries in developing countries can evolve from passive repositories to dynamic, inclusive knowledge ecosystems. This adaptation not only addresses AI anxieties but also positions libraries as leaders in the information revolution, fostering sustainable development and equity.

References

Anyanwu, E. U., Oduagwu, E. A., Ossai-Onah, O. V. & Amaechi, N. M. . (2013). Repositioning library and information science graduates in Nigeria for self-employment through entrepreneurship education. American International Journal of Contemporary Research, 3(8), 178 - 184. https://www.aijcrnet.com/journals/Vol_3_No_8_August_2013/21.pdf

Asemi, A., & Asemi, A. (2018). Artificial intelligence (AI) application in library systems in Iran: A taxonomy study. Library Philosophy and Practice (e-journal)1840, https://digitalcommons.unl.edu/libphilprac/1840.

Asim, M., Arif, M., Rafiq, M. & Ahmad, R.. (2023). Investigating applications of artificial intelligence in university libraries of Pakistan: An empirical study. The Journal of Academic Librarianship, 49(6), 102803. DOI: 10.1016/j.acalib.2023.102803.

Barner, K. (2011). The Library is a Growing Organism: Ranganathan's Fifth Law of Library Science and the Academic Library in the Digital Era. Library Philosophy and Practice (e-journal),548. https://digitalcommons.unl.edu/libphilprac/548/

Benson, O. V., & Oduagwu, E. A. (2023). Artificial intelligence and library practice in developing countries: A call for realistic partnership and sustainable collaboration. The Information Technologist, 20 (2), 117-126.

Benson, O. V., Oduagwu, M. C., & Mbarjiorgu, O. F. & Ike, C. P. (2022). Towards sustainable e-learning in library schools: Expectations of library and information science educators in Africa. Conference Proceedings of the 60^th National Conference & AGM of the Nigerian Library Association, Abuja, pp. 633 -651

Chigwada, J. (2024). A proposed framework for a digital literacy course for artificial intelligence in academic libraries. South African Journal of Libraries and Information Science, 90(2), 1-8. https://sajlis.journals.ac.za/pub/article/view/2388/1662

Cox, A. M., Pinfield, S., & Rutter, S. (2019). Extending McKinsey’s 7S model to understand strategic alignment in academic libraries. Library Management, 40(5), 313-326. https://eprints.whiterose.ac.uk/id/eprint/135632/7/SevenSpaper14062018_anonymised_-_revised_final.pdf

Ekere, Justina N. & Benson, O. V. (2022). Managing smart campus and smart libraries: A look at challenges and the way forward for libraries in developing countries. Library Philosophy and Practice (e-journal)7478. https://digitalcommons.unl.edu/libphilprac/7478/

Hervieux, S., & Wheatley, A. (2021). Perceptions of artificial intelligence: A survey of academic librarians in Canada and the United States. The Journal of Academic Librarianship, 47(1), https://doi.org/10.1016/j.acalib.2020.102270.

Noruzi, A. (2004). Application of Ranganathan's laws to the web. Webology, 1(2), Article 8. https://www.webology.org/2004/v1n2/a8.html

Ranganathan, S. R. (1931). The five laws of library science. Madras Library Association.

UNESCO. (2021). Recommendation on the ethics of artificial intelligence. UNESCO.