Conference Agenda

Session

SP03: Short papers

Time:

Friday, 05/Dec/2025:

10:00am - 11:00am

Location: Pigott Theatre (Auditorium)

Knowledge Centre, capacity 255

Presentations

Hacking AI chatbots for critical AI literacy in the library

Heather Ford, Andrew Burrell, Monica Monin, Suneel Jethani, Bhuva Narayan

University of Technology Sydney, Australia

AI is seeping into the fabric of our information environment as generative AI (genAI) tools are increasingly used to search for and discover information. But AI systems regularly produce errors (also known as “hallucinations”) which demonstrate that uncertainty is a feature rather than a bug of such systems. Despite this problem, we regularly hear stories about people who have mistakenly used false information provided by these tools in their communications and outputs. An American lawyer, for example, was fined because he submitted fake citations originating from ChatGPT in a court filing and Australian academics had to apologise for their submission to a parliamentary inquiry that included false claims against consultancy firms originating from Google Bard.

There is wide agreement about the need for AI literacy in order for publics to recognise how to use AI effectively and ethically but less consensus on how AI literacy is best achieved. A key component of many AI literacy frameworks is an understanding of how AI works. Using a case study of a pilot, participatory AI literacy intervention working with 14 librarians in four Greater Sydney libraries, we argue that instead of learning only about how AI works, AI literacy might also involve learning when, how and why AI doesn’t work. In this, we extend Mike Ananny’s argument that AI errors are a mechanism for “making generative AI a public problem” (Ananny 2024) by articulating how AI literacy programs could expand or intensify such learning. The concept of socio-technical error and uncertainty, we argue, is a useful heuristic for understanding AI – particularly in the context of information search and discovery, a primary practice in both public and academic libraries.

The project involved both social research and co-design research methods over six key phases in 2024 (see Table 1), with a second version planned for late 2025.

Phase 1: Pre-project survey Anonymous pre-project survey to determine current AI skills confidence levels among participating librarians

Phase 2: Interviews In situ interviews asking questions about current attitudes and practices

Phase 3: Workshops Two co-design workshops with librarians

Phase 4: Exhibitions The Making of Misbehaving Machines exhibits at four partner libraries

Phase 5: Post-project survey Anonymous post-project survey to determine AI skills confidence changes as a result of the project

Phase 6: Review workshop Workshop to discuss draft results, lessons learned and next steps

Participating libraries were selected for diversity of library workers’ predicted experience and skills using generative AI tools to include: one university library (University of Technology Sydney), one vocational training institution (TAFE NSW, Ultimo) and two public libraries (Parramatta and Sydney City libraries).

During the workshops, participants learned about the problem of AI error and uncertainty through presentations of the latest social research. They then experimented with genAI tools to try to produce errors and reveal uncertainties and to design alternative interfaces that made those tools’ uncertainties more transparent. Participants co-created datasets of possible genAI questions and answers to think through AI uncertainty and, using this data, they worked together to produce a library exhibit (“The Making of Misbehaving Machines”) where a model was trained on the data produced and its workings were explicitly demonstrated and explained. The library exhibits ran for at least two weeks in each of the four locations and librarians used the opportunity to talk to their clients about genAI, with some in academic settings using the exhibit for class library visits.

Analysis of the survey results indicate a general improvement in confidence across all dimensions as a result of librarians’ participation in the project. For participating librarians, the project helped them to understand what genAI tools are useful for by understanding their limitations. According to one librarian, "The project highlighted the negative side of LLMs" and "opened my eyes to how deceptive the appearance of some answers are". Interestingly, the project improved librarians’ confidence in their own ability to evaluate the results of genAI tools at the expense of those tools. According to one participant, “now my lack of confidence is in the LLM, not in my ability to prompt it”.

In the final project review workshop, librarians discussed how library clients (and some of their library workers colleagues) were currently either afraid of AI, which meant that the refused experimenting with tools or they were blindly using AI, when it is rolled into other library search databases, for example. In both cases, the problem is the lack of agency and critical engagement with genAI that they felt it was important for librarians to work against.

The project is a small step towards improving critical AI literacy of librarians and the clients they support. We aimed to test the hypothesis that AI error and uncertainty could be an important component of AI literacy conceptualisations and curricula and recognised that understanding how AI models make mistakes is not only practically useful but also works to shore up human confidence in the wake of AI’s authoritative dominance.

The (pilot) project was limited in its reach and we noted a number of elements that we want to improve. This includes improving the learning materials on genAI error and uncertainty and enabling librarians to learn on their own before they start making exhibits, as well as further opportunities for the interactivity of the exhibit and options for remote setup in rural locations. We plan to develop these ideas in a second version of the project in late 2025.

Understanding how AI tools make errors and exhibit uncertainty, although important, will never constitute the entirety of an AI literacy curriculum for libraries. But we hope to have demonstrated how it is possible to learn about how AI works by learning why and when it doesn’t work, and that the ways in which learning is achieved is by increasing librarians’ confidence in using and supporting clients’ use of genAI tools collectively.

Reference

Ananny, Mike. 2024. “Making Generative Artificial Intelligence a Public Problem. Seeing Publics and Sociotechnical Problem-Making in Three Scenes of AI Failure.” Javnost - The Public 31 (1): 89–105. https://doi.org/10.1080/13183222.2024.2319000.

Recognising Hands, Recognising Processes - Benchmarking eXplainable Automated Text Recognition (X-ATR) for Libraries

Joe Nockels

University of Sheffield, Digital Humanities Institute, United Kingdom

Responding to the conference theme of ‘AI Everywhere, All at Once’, this paper will address: To what level libraries require eXplainable AI (XAI) to utilise Automated Text Recognition (ATR) at scale on their collections? With the National Library of Scotland (NLS) as case study, the role of AI-enabled transcription, against the backdrop of technological advancement and shifting user expectations, is further illuminated not abstracted within a library context. Well-articulated processes are increasingly important as libraries incorporate ATR - the conversion of images-of-text into computer-readable format, further within their collections systems [7]. In line with this, XAI is defined as the effort to sufficiently provide model information for nontechnical users [2], and offers a way to scrutinise outputs as well as anticipate potential ethical implications when scaling processes [15]. Directed by a growing emphasis on XAI within libraries, this paper centres issues of trust, AI literacy and user expectations [9, 15].

The perceived tradeoff between XAI and model performance features heavily in scholarly literature [1, 5], and often foregoes attempts to prioritise building intelligible systems [13]. This paper therefore demonstrates how the lack of consistent eXplainable ATR (X-ATR) leaves room for overhyping tool accuracy and creates potential for AI-driven approaches to disrupt social norms between libraries as trustworthy collection stewards and end-users [6, 8].

Method

The Library of Congress AI planning framework of ‘Understand, Experiment and Implement’ acted as a methodological wrapper for measuring X-ATR and libraries’ requirements [12]. In framing X-ATR as a social issue requiring meaningful exchange between informed actors, an Action Research (AR) method with NLS digital staff, curators and public users, was performed over three months in 2025. This enabled local problem-solving through collaborative reflection [4], synthesised with a content analysis of academic, public-facing, and grey literature on XAI library approaches.

Thereafter, six ATRs were chosen across a range of tool openness: from Open Source (OS) to commercial Large Language Models (LLMs). Each ATR was used to re-OCR the same 50 pages of The Spiritualist Newspaper (1869-1882), which forms a record of Scottish Spiritualism - a belief system that the living could communicate with the deceased [3, pg. 1], available via the NLS Data Foundry (https://data.nls.uk/). The Spiritualist also contains reports of mediums’ ‘automatic writing’ during seances (3, pg. 73), leading to future explorations into whether these activities functioned as logical systems mirroring present discussion of AI use. These ‘research in-practice’ ATR experiments prompted further AR-led discussions, both on error rates and the potential value of XAI-ATR, resulting in eventual findings.

Findings

X-ATR is shown to differ tremendously between tools and even within project documentation regarding preprocessing, model training and setting parameters. Though this partially correlates to ATR openness, OS and community-built tools also require greater XAI to allay anxieties - especially from curators - if deployed at scale. ATR accuracy is also shown to have little relationship to XAI. A set of recommendations is therefore proposed for developers to construct mutually intelligible language surrounding ATR, and broader AI, processes.

Value/Originality

Operationalising Samek and Müller’s [14] notion that XAI is required for user trust in systems by pinpointing the degree to which this applies to AI-enabled automated transcription within libraries. Real-world recommendations are also provided for how to select, implement and test XAI for libraries, as well as upskill staff in such approaches.

References -

Ali, S., Abuhmed, T., El-Sappagh, S., Muhammed, K., Alonso-Moral, J.M., Roberto Confalonieri, R., Guidotti, R., Del Ser, J., Natalia Díaz-Rodríguez, N., Herrera F. (2023) ‘Explainable Artificial Intelligence (XAI): What We Know and What Is Left to Attain Trustworthy Artificial Intelligence’, Information Fusion, 99: 1-52. doi: 10.1016/j.inffus.2023.101805.

Confalonieri, R., Ludovik, C., Wagner, B., and Besold, T.R. (2021) ‘A Historical Perspective of Explainable Artificial Intelligence’, WIREs - Data Mining and Knowledge Discovery, WILEY, 11: 1–21. doi: 10.1002/widm.1391.

Foot, M. (2023) Modern Spiritualism and Scottish Art - Scots, Spirits and Seances, 1860-1940. London: Bloomsbury.

Kemmis, S., McTaggart, R., Nixon, R. (2014). The Action Research Planner: Doing Critical Participatory Action Research. New York: Springer.

Lipton, Z.C. (2016) ‘The Mythos of Model Interpretability’, Paper presented at 4th ICML Workshop on Human Interpretability in Machine Learning, June 23, 2016, New York. 1-8. doi: 10.48550/arXiv.1606.03490

McCarron, L. (2023). Interview by Joseph Nockels [Microsoft Teams]. 27 April.

Mühlberger, G., Seaward, L., Terras, M., Ares Oliveira, S., Bosch, V., Bryan, M., Colutto, S. … Zagoris, K. (2019) ‘Transforming scholarship in the archives through handwriting text recognition, Transkribus as a case study’, Journal of Documentation, 75(50): 965-967. doi: 10.1108/JD-07-2018-0114/full/html

Mussgnung, A. (2024) ‘AI is Not Context’. Work-in-Progress Talk, Centre for Technomoral Future, University of Edinburgh, October 22, 2024.

National Library of Scotland (2024) ‘AI Statement’. Available at: https://data.nls.uk/projects/ai-statement/ (accessed 25 March 2025).

National Library of Scotland (2023) Data Foundry. Available at: https://data.nls.uk/ (Accessed 5 September 2023).

National Library of Scotland (2024) Data Foundry, ‘The Spiritualist’, https://data.nls.uk/data/digitised-collections/spiritualist-newspapers/ (Accessed 20 March 2023).

Potter, A. (2023) ‘Introducing the LC Labs Artificial Intelligence Planning Framework’, The Signal. Available at: https://blogs.loc.gov/thesignal/2023/11/introducing-the-lc-labs-artificial-intelligence-planning-framework/ (Accessed 24 March 2024).

Rudin, C. (2019) ‘Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead’, Nature Machine Intelligence, 1: 206-215. Doi: 10.1038/s42256-019-0048-x

Samek, W., Müller, K-R. (2019) ‘Towards Explainable Artificial Intelligence’ in Wojciech Samek, Grégoire Montavon, Andrea Vedaldi, Lars Kai Hansen, Klaus-Robert Müller (eds.) Explainable AI: Interpreting, Explaining and Visualising Deep Learning, pp. 5-22. Cham, Switzerland: Springer Nature.

Van Wessel, J.W. (2020) AI in Libraries: Seven Principles. National Library of the Netherlands. Available at: https://zenodo.org/records/3865344 (accessed 25 March 2025).

CultureQuest: Deploying AI-powered Characters in Museum Spaces to Reimagine Visitor Engagement

Richard Cole¹, Chris Bevan¹, Frances Pickworth¹, Ben Ackland², Thomas Keane²

¹University of Bristol; ²Meaning Machine

In the wake of a transformative wave of public-facing AI tools, the question that museums face is not what AI could do to them, but rather what AI could do for them.

With this short talk, we will present the findings from CultureQuest, a collaborative pilot project led jointly by the University of Bristol and Meaning Machine, working in partnership with Bristol Museums, and funded by Digital Catapult Creative Connect. The project uses generative AI to create personalised, interactive ‘quests’ that lead museum visitors away from passive consumption of information towards active interpretation of exhibits. The goal is both to educate—by encouraging deeper engagement with museum objects and narratives—but also to entertain, by turning the act of moving through a gallery into a dynamic and revelatory experience.

Our pilot focuses on the Egyptian Gallery at Bristol Museum and Art Gallery, using the space as a testbed for a scalable quest system driven by Meaning Machine’s AI-powered NPCs (non-player characters). Visitors take on the role of ly-en-Amen-nay-es-nebet-ta, a deceased ancient Egyptian woman entering the afterlife who must discover her past in order to pass the weighing of the heart ritual and enter the 'Field of Reeds’. ly-en-Amen-nay-es-nebet-ta's character is directly inspired by an empty coffin on display in the gallery. Visitors then use their smartphone to engage in conversations with four ancient Egyptian gods - Osiris, Anubis, Maat, and Ra. These characters set the visitors tasks that require them to interact closely with objects and ideas across the gallery as they progress through the Duat.

Each character’s knowledge and personality are shaped by a corpus of collection-specific data provided by the museum, and are designed in partnership with curatorial staff. This means that NPCs are aware of their ‘setting’: not only do they understand the collection data they have been trained on, but they also refer to their position within the exhibition space. Because the NPCs are powered by large language models (LLMs) and can respond to natural language input, the conversations visitors have with them are not pre-scripted, but generated dynamically. As a result, the quest can offer each visitor a unique experience that responds as their interests develop in real time and unfolds at a pace of their own choosing.

The system is built using Meaning Machine’s Game Conscious™ NPC engine, which has previously been deployed for the purposes of immersive entertainment and is being tested in a heritage context for the first time. Meaning Machine’s NPCs are not merely chatbots with a historical skin—they are context-aware, conversationally rich characters who understand their own ‘world,’ refer to other objects in the gallery, and remember prior exchanges. By using the Game Conscious™ engine, CultureQuest offers museum visitors opportunities for agency and goal-oriented interaction that have previously been the prerogative of digital gaming, transforming them into ‘researchers’ or ‘explorers’ within the gallery space.

This presentation explores the project in light of the Fantastic Futures 2025 theme ‘AI Everywhere, All at Once’ by offering insight into the potential benefits and drawbacks of integrating generative AI into public heritage environments. We reflect on the design principles that guided our approach, including:

Co-design and interdisciplinarity: The project represents a fusion of academic research in historical simulation and games (University of Bristol), commercial innovation in AI character systems (Meaning Machine), and institutional knowledge within the GLAM sector (Bristol Museums). This triangulated model brings together a diverse range of expertise right from the outset, ensuring that quality assurance is integrated at all stages of the project.

Ethical responsibility: Drawing on curatorial input, we have put in place measures to ensure NPCs’ output is factually accurate, culturally sensitive, and non-hallucinatory (a known risk with LLMs). Characters are deliberately ‘bounded’ by their roles and knowledge, with meta-dialogue (e.g. explaining their limits) built in where appropriate.

Accessibility and inclusivity: The system prioritises intuitive UX design, offering both accessibility and agency. Visitors can interact via typing into a messaging app-style interface, with text-to-voice and voice-to-voice functionality also available to support visually impaired users. Onboarding is handled via in-gallery prompts and onboarding materials are co-designed with curators.

Scalability and sustainability: CultureQuest is a smartphone-based system that requires no specialist hardware. This ‘bring your own device’ model is both environmentally and economically viable, reducing the need for costly infrastructure that may be at risk of obsolescence, while also providing the basis for broader deployment across multiple institutions.

This pilot was the first real-world deployment of CultureQuest in a museum environment. Over the course of a four-month sprint (April–July 2025), we prototyped a fully functioning quest system, developed the narrative through iterative playtesting, and conducted an initial player study of 40 participants. The study gathered both qualitative and quantitative data to understand how visitors to the Egyptian Gallery responded to the AI characters inspired by the collection, as well as how the quest system influenced visitor behaviour in and experience of the space. Preliminary findings suggest that character-driven storytelling and personalised quest structures hold potential to boost visitor engagement, particularly with groups.

In sharing our findings, challenges, and lessons learned from this pilot at Fantastic Futures 2025, we aim to prompt discussion around the practical, ethical, and creative implications of deploying generative AI in GLAM contexts. What does it take to make AI characters that feel at home in a museum? How do we strike the right balance between control and openness, authenticity and artistic license? And how might visitor-facing AI influence not only how museums are experienced, but what they might become?

Philippe Genêt, Victor Zimmermann

Deutsche Nationalbibliothek (German National Library), Germany

The challenge of copyright and a potential remedy

As all large libraries, the German National Library (DNB) seeks to make its collection as publicly accessible as possible. A Large Language Model (LLM) trained on its holdings would allow users to search semantically across the entire catalogue, greatly enhancing the outreach and usefulness of the library. However, most of the collection is under copyright, and German law limits the distribution of such a model to the library’s premises.

Research with protected works is also heavily constrained by German copyright legislation. Even the 2018 Text‑and‑Data‑Mining exemption (§ 60d UrhG) only permits scientific analysis; it does not ease requirements for compiling, publishing, or archiving the resulting data (Iacino et al. 2023). Because protection lasts up to 70 years after an author’s death, literary scholars, for example, tend to limit themselves to 19th‑century texts, avoiding more recent works.

Thus, while an LLM based on DNB’s collection would be highly valuable both for researchers and the wider public, the main obstacle is copyright. To obtain the benefits of an openly shareable, knowledge‑rich LLM without violating copyright, DNB is exploring Derived Text Formats (DTFs) (Schöch et al. 2020). DTF‑training promises secure, transparent, and copyright‑compliant LLMs that could be queried online by anyone – everywhere, all at once.

To create DTFs, original texts are first processed with text‑and‑data‑mining tools (e.g., POS‑tagging, lemmatization). Then, information is selectively reduced until the work falls below the copyright threshold. The remaining data no longer infringes copyright but still answers at least one research question.

While there are also vector-based DTFs – the replacement of individual words with numerical values such as word embeddings –, DNB focuses on token-based DTFs. One of them is the Bag‑of‑words or document‑term matrix that consists of a list of tokens with their absolute frequencies (optionally with word form, POS, and/or lemma). It can be used for text classification, clustering, and authorship attribution. N‑grams, ordered sequences of n tokens presented alphabetically or by frequency, can be applied in stylometric studies. Masking or deletion of specific tokens can render a text uncopyrighted while enabling analyses – e.g. a drama stripped of its spoken lines can still serve for network analyses.

DNB is mapping the copyright landscape (Iacino et al. 2025) to identify DTFs that are both research‑valuable and legally safe, with the intention of releasing selected corpora in these formats for free use.

LLM training under constraints

Furthermore, DNB is testing whether DTFs can potentially unlock its collections for LLM use while respecting copyright. To explore this, DNB joined the CORAL research project (Constrained Retrieval‑Augmented Language Models), funded by the Federal Ministry of Research, Technology and Space.

CORAL combines exclusive data – DNB’s digital holdings, web crawls from the Internet Archive and Common Crawl, and proprietary financial data – to study how such material can be transformed or obfuscated yet remain useful for LLM training. The goal is to build models that generate purpose‑specific, non‑hallucinatory text that is traceable to sources.

Because existing constraint techniques are insufficient and users can coax models into leaking sensitive or protected information, the project investigates new retrieval‑augmented generation (RAG) methods, robust training procedures, resource‑efficient architectures, and safeguards against data leakage. It also seeks to ensure transparency of outputs (grounding, originality, referenceability).

All methods, models, and RAG pipelines will be benchmarked and tested on real‑world use cases, including those of DNB, offering a blueprint for other institutions hindered by legal and technical data constraints.

The technical headache

While the idea to train an LLM on the concepts of DNB’s 19 million digital works without using the original wording by using DTFs sounds plausible, the practical implementation remains a challenge. DTFs strip virtually all textual cues that LLMs rely on (large‑scale parameters, attention, next‑token prediction, high confidence). This makes it difficult for the model to learn and retain knowledge without risking data leakage.

To mitigate these effects, DNB explores different approaches in CORAL, such as:

avoiding structured information and limiting duplication in the corpus (Sun et al. 2024),
scaling down context windows or using static embeddings in preprocessing (Speicher et al. 2024),
creating bottlenecks in the architecture to force the model to generalise (Lesci et al. 2024),
injecting noise during training (Speicher et al. 2024), or
filtering out generated text that would violate copyright post-hoc (Yan et al. 2024).

Since each measure harms the quality, transparency, or usefulness of the resulting model, balancing the trade-offs is the core research challenge. Currently, the project members are experimenting with different solutions. Results that are expected before the Fantastic Futures conference will be presented there.

References

Iacino, Gianna, Paweł Kamocki, Keli Du, Christof Schöch, Andreas Witt, Philippe Genêt, and José Calvo Tello. 2025. “Legal status of Derived Text Formats.” In Recht und Zugang 3/2024. https://doi.org/10.5771/2699-1284-2024-3-149

Iacino, Gianna, Pawel Kamocki, and Peter Leinen. 2023. Assessment of the Impact of the DSM-Directive on Text+. Zenodo. https://doi.org/10.5281/zenodo.12759960 .

Lesci, P., Meister, C., Hofmann, T., Vlachos, A. and Pimentel, T., 2024. Causal Estimation of Memorisation Profiles. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 15616-15635). Association for Computational Linguistics.

Schöch, Christof, Frédéric Döhl, Achim Rettinger, Evelyn Gius, Peer Trilcke, Peter Leinen, Fotis Jannidis, Maria Hinzmann, and Jörg Röpke. 2020. “Abgeleitete Textformate: Text und Data Mining mit urheberrechtlich geschützten Textbeständen.” In Zeitschrift für digitale Geisteswissenschaften. Wolfenbüttel 2020. text/html Format. DOI: 10.17175/2020_006.

Speicher, T., Khan, M.A., Wu, Q., Nanda, V., Das, S., Ghosh, B., Gummadi, K.P. and Terzi, E., 2024. Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications. arXiv preprint arXiv:2407.19262.

Sun, Albert Yu, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, and Vaikkunth Mugunthan. 2024. “Does Fine-Tuning GPT-3 with the OpenAI API Leak Personally-Identifiable Information?” arXiv preprint arXiv:2307.16382. https://arxiv.org/abs/2307.16382.

Yan, B., Li, K., Xu, M., Dong, Y., Zhang, Y., Ren, Z. and Cheng, X., 2024. On protecting the data privacy of large language models (llms): A survey. arXiv preprint arXiv:2403.05156.