Conference Agenda

Session

LT01: Lightning talks

Time:

Thursday, 04/Dec/2025:

4:00pm - 5:00pm

Location: Pigott Theatre (Auditorium)

Knowledge Centre, capacity 255

Presentations

Introducing the AI for Cultural Heritage Hub (ArCH)

Amelie Roper, Suzanne Paul

Cambridge University Library, United Kingdom

The University of Cambridge holds the UK’s highest concentration of Designated Collections outside of London - an extraordinary resource recognised by the Arts Council and UNESCO for their national and international significance. Numbering over 15 million items and housed within the University’s museums, libraries and garden, these collections span the globe and millennia. They represent a rich repository of cultural and natural history and play a pivotal role in underpinning research across the arts, humanities, social sciences and sciences. However, challenges such as analogue formats, handwritten documentation, fragmented and dispersed objects, multi-lingual sources and multi-dimensional surfaces render much of this material inaccessible.

This paper will introduce the AI for Cultural Heritage Hub project: www.lib.cam.ac.uk/arch. This is a proof-of-concept project funded by AI@Cam and Schmidt Sciences, and led by Cambridge University Library in collaboration with the Department of Applied Mathematics and Theoretical Physics, the Collections, Connections, Communities Strategic Research Initiative and the Leverhulme Centre for the Future of Intelligence at the University of Cambridge. Over 14 months (February 2025 to March 2026), ArCH aims to deploy the convening power of the University of Cambridge’s distributed network of collections to create a secure workspace that will empower two types of non-technical users - cultural heritage practitioners and academic researchers working with cultural heritage collections - to analyse cultural heritage data with AI tools.

Central to ArCH’s vision is the need for a secure environment where researchers working with cultural heritage collections data can leverage the power of AI. The University of Cambridge’s collections include data that is “sensitive” for a variety of reasons, including copyright, data protection, licensing, subject matter, materiality and provenance. This results in a need to retain control over this data and how it is used, thereby limiting the usability of existing AI tools. This challenge is not unique to Cambridge: it is faced by cultural heritage organisations and collections researchers worldwide.

Alongside the creation of the infrastructure underpinning the secure workspace (or hub) and an associated community of practice, ArCH is investigating the potential of AI to address three cultural heritage challenges: (1) unlocking inaccessible collections; (2) reconstructing fragmentary and dispersed cultural objects; and (3) integrating expert cultural knowledge into AI algorithms. A series of case studies based on one or more of the University’s cultural heritage collections are being used to address these challenges. These include card catalogues at Cambridge University Library and the Scott Polar Research Institute; accession registers at the University Museum of Zoology; specimen labels at the University Herbarium; the Book of the Dead Ramose, an Egyptian papyrus dating to the 13th century BC at the Fitzwilliam Museum; and a 16th-century Mesoamerican pictographic lectionary at Cambridge University Library. The hub will prototype adaptive AI solutions to enhance understanding of these collections and identify a selection of AI tools to address these challenges.

This presentation will address the conference theme of building AI systems for and with staff and users. As well as providing an overview of the project’s progress to date and its case studies, it will outline the ways in which ArCH is using the combined knowledge of cultural heritage practitioners, collections researchers, IT professionals and AI experts to develop the hub. Interdisciplinary collaboration and skills sharing are built into the project structure. ArCH draws together expertise in secure platform and software development in the cultural heritage sector, collections-led research, digital humanities, library, archive, garden and museum collection curation, AI methodologies and tools, technology and digital cultures, and cultural heritage sector project management and funding application development.

ArCH is jointly led by Amelie Roper (Head of Research and Manager of the Research Institute at Cambridge University Library) and Tuan Pham (Head of Digital Innovation and Development at Cambridge University Library), who have overall responsibility for project delivery. The project is then divided into three teams: the Workspace Team, the AI Experts Team and the Support Team. The Workspace Team has responsibility for gathering requirements, horizon scanning, building the platform, integrating AI tools and preparing documentation. The AI Experts Team provides expertise in AI methods, tools and data, and the Support Team assists with network building, budgeting, delivery, events and comms. In addition, there are sub-teams for each case study, overseen by Dr Suzanne Paul (Keeper of Rare Books and Early Manuscripts at Cambridge University Library) and an Advisory Board comprising representatives from the University of Cambridge and wider cultural heritage and academic sectors. Their expertise spans applied mathematics, archaeology, art history, book history, curation, data provenance, history of science, literature, manuscript studies and museum studies, as well as heritage practice and leadership. The paper will demonstrate how this team structure and the associated collaboration are underpinning the development of the hub, and provide insights into project methodology and possible next steps once the initial phase of ArCH has been completed.

Write it down! Fostering Responsible Reuse of Cultural Heritage Data with Interoperable Dataset Descriptions

Henk Alkemade², Steven Claeyssens³, Maria Eskevich⁴, Nuno Freire¹, Nele Gabriëls⁵, Alba Irollo¹, Antoine Isaac¹, Sarah Oberbichler⁶, Giulia Osti⁷

¹Europeana Foundation; ²CARARE; ³KB, National Library of the Netherlands; ⁴Huygens Institute for History and Culture of the Netherlands; ⁵KU Leuven Libraries; ⁶Leibniz Institute of European History; ⁷University College Dublin

Cultural heritage institutions have seen a surge in the creation of datasets ready for computational use, while researchers increasingly experiment with datasets through computational processing and AI-assisted methods. For both groups, issues of transparency have sparked interest in developing documentation practices cutting across the artificial intelligence/machine learning (AI/ML) and the digital cultural heritage (DCH) sector, aiming to provide better information on e.g., the purpose, composition, reusability, collection processes and provenance, or societal biases reflected in datasets. The publication of Datasheet for Datasets (Gebru et al., 2021) and the Collections as Data movement (Padilla et al. 2023) have sparked the definition of guidelines for dataset creators and publishers who want to follow FAIR and CARE principles and make it easier for one to reuse their data in a responsible, well-informed manner.

Gathering CH professionals, technical experts and humanities scholars from the Europeana Research and EuropeanaTech communities, the Datasheets for Digital Cultural Heritage working group has adapted existing ML documentation approaches to the DCH case. As a first outcome, a template (Alkemade et al, 2023) has sought to address the complexities of DCH datasets, shaped by layered curatorial decisions, often subject to evolving and non-linear trajectories. In the spirit of the common European data space for cultural heritage (2025), which is being deployed under the stewardship of the Europeana Initiative, the working group has then supported professionals interested in applying the template in their institutional context (see for example Lehmann et al., 2024) and fostered exchanges with other initiatives emerging at the European level and exploring suitable ways to describe datasets. One key initiative in this regard concerns the proposal for Data-Envelopes for Cultural Heritage (Luthra et al., 2024), which has focused specifically on providing machine-readable descriptions of datasets, especially considering the W3C Data Catalogue Vocabulary (DCAT) that is used in many data portals.

The goal of this collaboration is both to validate and further refine the existing templates following a community-led approach, and to investigate how to ensure (human-machine) interoperability in the data space, which aims to establish a diverse data offer (including datasets suitable for AI applications, as illustrated by the AI4Culture platform (2025)) as well as making use of DCAT.

Our contribution will report on the following ongoing work:

Alignment with DCAT: DCH datasheet fields are being mapped to DCAT to enable machine-readability
Alignment between DCH datasheets and data-envelopes, establishing conceptual and structural compatibility, and supporting future integration with other legal, technical and ethical frameworks.
Gathering a set of exemplary dataset descriptions
Creation of (prototype) tooling to support and simplify the creation, reuse and integration of descriptions into existing workflows.

We also plan to discuss new items that will begin before the conference:

Identify possible connections with data research plans and data management plans. This may extend to interoperability with emerging European Cultural Heritage Cloud (ECHOES, 2025).
Establish a modular structure for descriptions, aiming at operationalising the templates by defining building blocks, including a ‘core’ common to most DCH collections and a series of ‘profiles’, tailored to research data management and AI/ML workflows (e.g., AI Model Research Documentation Sheet (AIRDocS) (Oberbichler, 2025)
Providing guidance to use these modules and possibly develop custom ones.

While some components remain under active development (e.g. prototype, profiles and guidelines for their development), we present this work in progress to foster dialogue and invite broader engagement from the Fantastic Futures community.

References

AI4Culture project (2025). AI4Culture, Empowering Cultural Heritage through Artificial Intelligence. https://ai4culture.eu
Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Irollo, A., Lehmann, J., Neudecker, C., Osti, G., & van Strien, D. (2023, September 25). Datasheets for Digital Cultural Heritage Datasets—Template v.1. Zenodo. https://zenodo.org/records/8375034
Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Lehmann, J., Neudecker, C., Osti, G., & Van Strien, D. (2023). Datasheets for Digital Cultural Heritage Datasets. Journal of Open Humanities Data, 9, 17. https://doi.org/10.5334/johd.124
Common European data space for cultural heritage (2025), Welcome to the Common European data space for cultural heritage. https://www.dataspace-culturalheritage.eu/en
ECHOES project (2025), ECCCH, The Cultural Heritage Cloud, https://www.echoes-eccch.eu/
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for Datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
Luthra, M., & Eskevich, M. (2024). Data-Envelopes for Cultural Heritage: Going beyond Datasheets. In I. Siegert & K. Choukri (Eds.), Proceedings of the Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024 (pp. 52–65). ELRA and ICCL. https://aclanthology.org/2024.legal-1.9
Lehmann, J., & Schneider, S. (2024). Metadata of the "Alter Realkatalog" (ARK) of Berlin State Library (SBB). https://doi.org/10.5281/zenodo.13284442
Oberbichler, S. (2025). AI Model Research Documentation Sheet (AIRDocS). https://doi.org/10.5281/zenodo.15046713
Padilla, T., Scates Kettler, H., Varner, S., & Shorish, Y. (2023). Vancouver Statement on Collections as Data. https://zenodo.org/records/8342171
Pushkarna, M., Zaldivar, A., & Kjartansson, O. (2022). Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. 2022 ACM Conference on Fairness, Accountability, and Transparency, 1776–1826. https://doi.org/10.1145/3531146.3533231
World Wide Web Consortium. (2024). Data Catalog Vocabulary (DCAT) - Version 3. https://www.w3.org/TR/vocab-dcat-3/

Multimodal AI for visual book navigation: lessons learned implementing and evaluating image search for a large library collection

Marie Roald

The National Library of Norway, Norway

In addition to text, books often contain visual content such as illustrations, photographs, diagrams or other graphical elements. Thus, the National Library of Norway's (NLN's) digitised book collection is not just a text collection but a multimodal collection that combines text and visuals. This lightning talk covers our ongoing work addressing this multimodality by developing an image search application (https://dh.nb.no/run/bildesok/) to facilitate exploration of NLN's digitised books through their visual elements.

NLN has since 2006 digitised its collection, and, nearly 20 years later, it has digitised more than 600 000 books, nearly all books published in Norway. As part of the digitisation pipeline, books are run through a layout analysis step, which results in ALTO-XML files with annotated text regions and non-text regions (i.e. image, illustration, etc.) The textual output from this digitisation pipeline is the foundation for the search functionality in NLN's online library, Nettbiblioteket (https://www.nb.no/search), and the DH-Lab tools [1]. Modern image-based and multimodal AI models open the door to similar capabilities for images and show promise for both similarity-based image retrieval and automatic metadata generation.

However, as our data source is images extracted from the output of automatic layout detection of scanned books from NLN's collection, the data pose unique challenges:

Data quality: The images are not cleanly digitised and can contain segmentation errors from the automatic layout detection
Scale: There is a lot of data (hundreds of thousands of books, tens of millions of automatically detected image elements)
Language: As this is a Norwegian collection, we need a search that works for Norwegian.

This lightning talk outlines our ongoing work addressing these challenges. We share our preliminary results and lessons learned from developing the image search application and evaluating modern models for retrieval and metadata generation. Specifically, we expand on the work in [2], which evaluated modern image-based and multimodal AI models with respect to image retrieval and classification and implemented a prototype image search application with images from books published before 1900. In this work, we also include books from the 20th and 21st centuries, which led to a 50-fold increase in image elements and a broader visual diversity. Additionally, we expand the evaluation to include more realistic and domain-specific image transforms. Our results demonstrate that modern multimodal AI models can enhance the discoverability of visual content within extensive library book collections.

[1]: M. B. Birkenes, L. Johnsen, and A. Kåsen. "NB DH-LAB: A Corpus Infrastructure for Social Sciences and Humanities Computing". CLARIN Annu. Conf. Proc 2023. 2023, pp. 30–34.

[2]: M. Roald, M. B. Birkenes, L. G. Johnsen. "Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway's Digitised Book Collection". Proc. Comput. Humanit. Res. Conf. 2024. 2024, pp. 892–905.

Crafting responsible AI afterlives: Co-designing a practical resource with the GLAM Sector

Bethan Jones¹, Jenny Kidd¹, Eva Nieto McAvoy²

¹Cardiff University, United Kingdom; ²King's College London

Those attending the 803^rd Interstellar Song Contest would have had the opportunity to visit the Interstellar Song Contest Museum, dedicated to the history of the competition. Among the costumes, posters and videos from previous contests visitors would also have been able to access a hologram archive containing informational holograms in the likeness of those associated with the contest, such as Graham Norton. Of course, the 803^rdInterstellar Song Contest Museum only exists in the confines of the BBC’s science fiction programme Doctor Who, but with the improved capabilities of AI-enabled voice ‘clones’ and ‘deepfake’ technologies, the use of AI to ‘revive’ the dead within museums and heritage sites is fast becoming a mundane proposition. Notable examples include the Dalí Lives installation at the Dalí Museum (Salvador Dalí), and the Hello Vincent chatbot at the Musée d’Orsay (Vincent van Gogh).

However, the creation of what we call ‘AI afterlives’ raises deep ethical questions within GLAM contexts: How should cultural professionals work with the digital/digitised human remains of public figures and cultural ‘icons’? How should consent be understood in the context of (remediated, and posthumous) algorithmic afterlives? How might our ethical and regulatory frameworks need to be (re)configured? And what are the implications of cultural professionals using off the shelf chatbot creators (for example Chatbot Kit)? These questions are increasingly important in the context of concerns about disinformation and declining levels of trust in public institutions and the media (OECD 2024). Cultural institutions - and GLAM professionals - continue to enjoy very high levels of trust and confidence (National Museum Directors' Council 2022) yet our research has evidenced deep concerns within the sector that increased use of AI might undermine that trust, with questions about distortion, distrust and disinformation looming large (Kidd and Rees 2021; Nieto McAvoy and Kidd 2024, Kidd and Nieto McAvoy 2025).

As part of the Leverhulme Trust-funded Synthetic Pasts project (2024-2026) we have been working collaboratively with the sector to better understand and respond to these concerns; in particular where algorithms are being used to ‘revive’ historic figures for interactions with the public. In this presentation we will introduce and reflect upon our co-design work – alongside 20 UK cultural professionals and creative studio yello brick – to produce an innovative resource for museum/historic sites navigating the creation of ‘AI afterlives’.

This work started in 2024 with a series of workshops where we introduced a range of digital afterlife tools and examples to cultural professionals and asked them to consider the opportunities and challenges they pose for the sector. Participants were also asked to reflect on how working with AI afterlives fits with their professional values and organisational principles through a series of activities. While responses to these activities suggested that such approaches might - when done well - enhance visitor engagement and offer greater accessibility to collections materials and existing digital assets, a great many challenges and concerns were identified: [1] Creating AI afterlives might distort or flatten the past, creating inaccurate or misleading representations and perpetuating harmful stereotypes; [2] relatedly, machine learning processes (grounded in datasets that are incomplete and often biased) might compound known issues around historical interpretation and representation, potentially reversing recent positive gains in the GLAM sector relating to decolonisation and the diversification of historical narratives; [3] interactions with AI afterlives might narrow visitors’ scope of inquiry and limit the types of questions they ask, restricting rather than encouraging critical thinking; [4] widespread uses of AI within institutions will unquestionably compound environmental and sustainability concerns associated with digital heritage; and [5] working with cultural heritage data in this way could well encourage data uses and entanglements that cultural professionals need specific skills to navigate, including ethical and legal/regulatory literacies. Our workshop attendees also reflected that museums, galleries, archives and libraries range from the large and well-resourced to the small and volunteer-led, with very different levels of technological skill and understanding, varied priorities, and diverse audiences (and audience expectations) to navigate. There is much to consider then as institutions decide whether and how to create AI afterlives, and under what circumstances they might be resisted.

Building from these workshops we have produced an online resource which encourages heritage professionals to reflect upon the creation of AI afterlives through a series of innovative scenarios. In March 2025 we began user testing and evaluation of the resource, including through a series of reflective interviews with workshop participants, to assess whether it meets the needs and requirements identified. This resource will be finalised in the coming months on the basis of insights from the user tests and will be ready to introduce and demonstrate at the conference in December.

In our research and collaborative work we have found that thinking about AI afterlives vividly crystalises concerns that have long been expressed within the sector about how materials from the past are made use of in the present, including digitally: Questions about how, what and who we remember; about what it means to represent and commemorate the dead; about claims to – and ownership of – the past; and about the varied ethical implications of using digital tools and working with data. Thinking about how the GLAM sector works with AI through the concept of ‘afterlives’ and ‘revival’ strongly signposts the need for responsible digital innovation and values-driven machine learning initiatives.

In sum, this presentation will [1] introduce the concept of AI afterlives and reflect on some examples, [2] unpack some of the practical and ethical issues associated with AI afterlives for the GLAM sector, [3] give an account of our collaborative work alongside cultural professionals and creative studio yello brick, and [4] introduce and demonstrate the AI afterlives resource that is one outcome of this work.

References

Kidd, J. and Rees, A. 2021. “A museum of deepfakes? Potentials and pitfalls for deep learning technologies.” In Stylianou-Lambert, T., Heraclidou, A. and Bounia, A. (eds), Museum Media(ting): Emerging Technologies and Difficult Heritage. New York and Oxford: Berghahn Books, pp. 218–232.

National Museum Directors' Council. 2022. “Ipsos MORI Veracity Index: museum curators among the most trusted professionals 10 Jan 2022.” National Museums, 10 January. https://www.nationalmuseums.org.uk/news/ipsos-mori-veracity-index-museum-curators-among-most-trusted-professionals/

Nieto McAvoy, E. and Kidd, J. 2024. “Synthetic Heritage: Online platforms, deceptive genealogy and the ethics of algorithmically generated memory.” Memory, Mind & Media, 3, p.e12. https://doi.org/10.1017/mem.2024.10

Kidd, J. and Nieto McAvoy, E. Forthcoming 2025. AI Afterlives: digital memory and synthetic pasts. Bloomsbury.

OECD. 2024. OECD Survey on Drivers of Trust in Public Institutions – 2024 Results: Building Trust in a Complex Policy Environment. OECD Publishing, Paris, https://doi.org/10.1787/9a20554b-en.

Challenges of Building a RAG System for Library Reference

Alberto Martinez

El Colegio de Mexico, Mexico

At the Biblioteca Daniel Cosío Villegas at El Colegio de México we have been developing an AI powered chatbot designed to support users in exploring a corpus of historical Mexican legal texts covering the period from the 1680s to the 1920s.

As we moved forward, we ran into a critical challenge in terms of the design of RAG systems, in that, it operates counter to the basic principles of library reference services. Librarians do not aim to hand out final answers to research questions. Instead, we guide users, helping them locate the most relevant and authoritative materials so they can shape their own answers.

This presentation will explore the tensions, challenges, and some possible approaches to adapting RAG based AI tools to the particular service model of academic and research libraries.

I will start by explaining how the Legislación Mexicana chatbot was developed and why we opted to use a RAG system. I will describe some of the practical hurdles we faced, such as how to manage the risks of hallucination or oversimplification, how to ensure that the system is transparent about where its answers come from, and how to preserve the exploratory nature of the research process rather than shutting it down with a simplified summary.

I will talk about some of the strategies we have been using to address these issues. For example, we are focusing on document retrieval and linking over pure synthesis. We are adding disclaimers and interface prompts that help users understand what the system can and cannot do.

Finally, I will reflect on what all of this means for the future of RAG systems in libraries. We need to rethink what we are trying to accomplish when we bring these tools into our environments. Should we measure success by how quickly a user gets a summarized answer, or by how well the system helps them navigate the complexity of a collection? Should we aim for systems that replace human interaction, or that extend and support it?

I believe this project makes an important contribution because it highlights the conceptual gap between how RAG systems work and what responsible library reference work actually looks like. As more libraries, and library service providers begin testing AI, it is essential to ask not just what these systems can technically do, but how they align with the values and goals that shape our work.

AI-Based OCR for Slovene Heritage: A Comparative Study of Tools and Practices

Ajda Zavrtanik Drglin, Bakir Toskić

National and University Library, Slovenia

Digitising cultural heritage materials in less widely spoken languages such as Slovene presents significant challenges for AI-based optical character recognition (OCR) systems. At the Digital Library of Slovenia (dLib.si), we have primarily relied on ABBYY FineReader and its Server Engine. While generally effective, these tools often struggle with more complex materials, including historical newspapers, typewritten documents, and handwritten manuscripts.

In this lightning talk, we present insights from our recent evaluation of multiple OCR tools for the more demanding Slovene-language materials. Pero OCR, trained on Czech, delivered the most accurate results for historical newspapers – the publicly available OCR model was trained on a dataset, very similar to our own, down to the period of history that the newspapers covered. Pero OCR also showed promising outcomes for typewritten PDFs, while remaining somewhat inconsistent. We also tested Surya OCR and Tesseract OCR, two open-source systems with multilingual support. Both tools offer flexibility and customisation options, yet neither fully meets the practical demands of our heritage digitisation workflows at this time. The most significant advantage of Pero OCR compared to Surya and Tesseract was the general similarity of the Czech and Slovenian languages, which in particular meant better recognition of special characters, such as č, ž, š, that other models struggle with. Still, the absence of any general OCR model trained on Slovene only remains a persistent obstacle, which continues to limit the potential of AI-driven text recognition in our national context.

For handwritten manuscripts, we compared Pero and Transkribus, ultimately selecting Transkribus for its dedicated handwriting recognition capabilities and ability to train your own recognition models for each hand, which has proven to be crucial. In anticipation of the 2026 centenary of Slovene poet Srečko Kosovel’s death, we are launching a citizen science initiative to refine OCR outputs for his manuscripts. Volunteers will work from AI-generated drafts to produce accurate, structured TXT files — essential for full-text search, accessibility, long-term preservation, and scholarly use.

By the time of the conference in December, we expect to present expanded results and updated comparisons. We also aim to encourage colleagues in other institutions to share their own experiences with AI-based OCR, especially in linguistically under-resourced or complex environments. Our goal is to foster collaboration and mutual learning across the GLAM sector.

References

Pero OCR https://pero-ocr.fit.vutbr.cz/
Pero OCR results on the Digital Library of Slovenia https://dlib.si/results/?euapi=1&query=%27keywords%3docr+pero%27&pageSize=25
Transkribus https://app.transkribus.org/
Tesseract OCR https://tesseract-ocr.github.io/
Surya OCR https://github.com/VikParuchuri/surya
Kosovel handwriting https://dlib.si/static/kosovel/kosovel.html

Libraries as Research Engines: AI Agents for Discovery through Language, Inference, and Network Structure

James Lee, Xuemao Wang

Northwestern University, United States of America

We propose a vision of the academic research library in the era of AI, which supplements existing services and collections with computational AI research that meets the future needs of our scholars. At Northwestern University, we are advancing this shift through a clear methodological evolution, from ensemble topic modeling and network-based interpretability metrics, to the deployment of AI research-assistant agents that orchestrate multi-modal computational reasoning across curated, temporally deep, interdisciplinary collections to assist in collaborative research projects.

Our current research builds on the foundational Model of Models (MoM) framework, originally funded by the Andrew W. Mellon Foundation, which demonstrated that libraries could serve not only as repositories but as semantic brokers across disciplines. MoM used ensemble topic models to reveal bridging concepts across law, medicine, literature, and social media, establishing proof of concept for computational infrastructure rooted in the library. That foundation has now evolved into a new generation of AI agents, designed not just to model knowledge but to navigate it, by integrating statistical inference, temporal reasoning, and semantic structure discovery across multiple modalities and domains.

These agents combine three synergistic capabilities:

Transformer-based language models for multilingual semantic reasoning, historical discourse tracking, and comparative evaluation of domain-specific corpora across time. These models allow us to uncover both local and global contextual meaning, capturing how concepts shift across cultural, linguistic, and disciplinary spaces.
Statistical modeling, including Granger causality and partial correlation networks, to identify directional influence between discourses and track how concepts move across epistemic communities. These tools allow us to analyze textual archives as dynamic systems of semantic cause-and-effect, not exclusively collections of retrospectively interpreted content.
Network science, including betweenness centrality, modularity, and Gini inequality measures, to map semantic brokerage and epistemic connectivity. By tracing how concepts bridge otherwise disconnected clusters, we model how knowledge flows across disciplinary boundaries and how meaning is structurally encoded within AI language models themselves.

This research is anchored in the Academic Innovation division at Northwestern University Libraries, a new initiative that integrates expertise in AI and data science, digital humanities, geospatial analysis, research data services, and open-source development. Our collaborators span the Medill School of Journalism, Media and Integrated Marketing Communications' Knight Lab, faculty in computer science and network theory at the McCormick School of Engineering, and historians and media scholars from the School of Communication.

Two flagship case studies illustrate this evolution:

Archives as Sensors

This project reconceptualizes the humanities archive as a temporal instrument, not just a retrospective record. Using over eight million JSTOR articles and transformer-based topic modeling, we constructed Gaussian Graphical Models (GGMs) and applied Granger causality to topic time series. Our findings show that discursive shifts in the humanities often precede equivalent scientific discourse by 15–25 years. For example, metaphors for climate instability appear in the 1920s literary corpus, decades before the emergence of formal climate science discourse.

This work reveals that archives encode predictive semantic signals and function as “epistemic ice cores.” Humanities language serves not as post hoc commentary, but as a leading indicator of emerging worldviews. These findings support a broader claim: that textual archives constitute an underrecognized early-warning system for sociotechnical change, and that libraries are the institutions best positioned to activate that system.

Semantic Brokerage in Topic Networks

This study investigates the structural backbone of meaning in topic networks generated by AI language models. We trained both LDA and transformer-based models (e.g., BERTopic) on corpora including Wikipedia, PubMed Central, and global news archives. We constructed semantic networks based on document co-occurrence and topic vector similarity, then measured how central nodes (brokers) distribute according to betweenness centrality.

Our findings reveal a universal power-law distribution in semantic brokers: a small number of high-centrality concepts connect large portions of the network. These brokers do not correspond to the most frequent terms (hubs), but to bridging concepts that span clusters. For instance, “vaccine trials” connects public health, clinical research, and global policy clusters, while “franchise” connects legal discourse across antitrust, consumer rights, and civil litigation.

Critically, we show that this power-law behavior emerges only in sparse regimes, when weak ties are pruned and the network backbone revealed. This aligns with social theories of weak ties and structural holes, but reframed for semantic systems. As networks are simplified, the true epistemic infrastructure of discourse emerges, not by degree, but by brokerage.

Taken together, these two projects demonstrate that libraries can house knowledge while increasingly shaping its research logic. When AI systems are trained on curated, ethically governed collections that span centuries, disciplines, and epistemic registers, they become precise tools of both information retrieval and transdisciplinary understanding. More importantly, these systems cannot be trained, validated, or trusted without library stewardship. They depend on the provenance, balance, and interpretability that only libraries can provide.

The implications for LAM institutions are profound:

Libraries become epistemic infrastructures, not just points of entry to services and collections, directly supporting AI systems that learn from and reveal the semantic evolution of knowledge.
Archives become predictive instruments, capable of tracking discursive anticipation, not just retrospective framing.
Museums and galleries can adopt these methods to analyze curatorial metadata, interpret visual language shifts, and map cultural brokerage across historical collections.

In doing so, libraries and their sister institutions offer something commercial AI platforms cannot: an ethics of context, a commitment to meaning over scale, and a framework where AI amplifies human judgment rather than replacing it.

This work offers the LAM community a blueprint for becoming central actors in the next phase of machine learning as research engines that can meaningfully contribute to how AI discovers, connects, and reasons. We conclude that the future of AI in GLAM will be defined by the ability to trace connections, anticipate meaning, and broker knowledge across the boundaries that separate disciplines, media, and historical moments. We propose a vision where that future will be built, curated, and led by libraries.

Culture for AI: formulating an alignment

Rasa Bocyte¹, Johan Oomen¹, Lorena Aldana², Harry Verwayen², Antoine Isaac², Emma O'Hare¹, Valentine Charles²

¹Netherlands Institute for Sound & Vision; ²Europeana Foundation

This submission presents insights gathered in the Alignment Assembly on ‘Culture for AI’, organised in the spring and summer of 2025 inviting the digital cultural heritage community to collectively shape a shared vision for responsible AI, surfacing consensus, tensions, and community-driven insights through a participatory process.

Introduction: Cultural Heritage and AI

The digital cultural heritage community has never been a bystander in digital transition; we have actively shaped and developed digital infrastructures and tools that adhere to public values and inspire other sectors. Cultural heritage institutions not only leverage AI technologies to enhance their collections and storytelling methods (Münster 2024, Hansen 2023, Gefen 2021); they also possess unique expertise, perspectives, and datasets that can inform the development of responsible AI systems. Likewise, culture plays a crucial role in shaping AI itself, fostering diversity, increasing AI literacy and supporting the creation of AI models that are culturally sensitive, inclusive, and aligned with societal values (Thiel 2024, Boer 2024).

This reciprocal relationship between AI and culture (AI for culture and culture for AI) emphasises the potential that the cultural heritage community has in guiding the responsible development and application of AI (Bočytė 2024). If we, as the cultural heritage sector, want to claim an active role in the development of AI, we need to agree on a common vision for what we actually mean by responsible AI. Yet doing so is not straightforward.

Need for a shared vision

The vagueness of terms such as ‘ethical’, ‘trustworthy’ and ‘participatory’ AI and the lack of good practices that would make them more tangible, creates complexity and ambiguity around the topic. Addressing this uncertainty is particularly key in the context of the common European data space for cultural heritage [1], the European Union flagship initiative to accelerate the digital transformation of the cultural heritage sector. The data space comprises cutting-edge infrastructure, a vibrant community and a suite of products, frameworks and tools which facilitate the open and trustworthy sharing of heritage data across Europe. This infrastructure is not built from scratch; it draws on over 15 years of experience from the Europeana initiative, which has pioneered open metadata standards, interoperability protocols, and participatory governance models. These foundations provide the heritage sector in Europe with a strong platform for shaping how AI is implemented in ways that reinforce public values, long-term stewardship, and ethical data use. As we expand its infrastructure and nurture its community, the question remains; What position should the data space take in responsible AI development and adoption? And what do responsible AI practices and critical engagement with AI look like in our community?

The Alignment Assembly on ‘Culture for AI’

This presentation at the Fantastic Futures conference will discuss learnings from the Alignment Assembly on ‘Culture for AI’ - a collaborative process designed to shape a shared vision on AI for the data space community. The process, running in May-July 2025, includes a participatory activity on the online open-source consultation platform Pol.is [2], as well as discussions with the community during both offline and online events. This Alignment Assembly invites the digital cultural heritage community to engage with a curated set of provocative statements about AI, vote on them, and contribute their views. The initial statements were co-created drawing on experiences from the data space, recent policy discussions, and a dedicated stakeholder dialogue.

The process is based on the Alignment Assembly model developed by the Collective Intelligence Project and inspired by the by the Alignment Assembly on “AI and the Commons” that was conducted using the same model by Open Future, Creative Commons and Fundación Karisma between February-March 2024 (Hong 2024). This model combines large-scale online surveys with structured conversations to inform policy debates and guide technology development in line with shared values.

The process aims to achieve three key objectives: first, to identify areas of consensus and contention within the community contributing to the common European data space for cultural heritage; second, to foster dialogue around emerging dilemmas and showcase how these are being navigated in practice; and third, to co-create a living resource that offers actionable guidance and captures community-driven insights. The Assembly surfaces both areas of broad agreement and topics marked by strong disagreement. At the launch in May, 19 initial statements were published, including these examples:

"Cultural heritage institutions should keep their data as open as possible, even if AI models trained on it generate commercial profit."
"Heritage organisations should actively develop open-source AI models to provide alternatives to big tech’s control over cultural data."
"Less cultural heritage data will be published openly if the sector embraces AI, due to concerns of misuse, commercial exploitation and loss of control."
"Heritage organisations should reject partnerships with AI companies that fail to comply with ethical labour and data practices globally."

The platform encourages participants to also add additional statements that they feel are important and urgent. This way, topics discussed in the assembly will grow organically.

In order to interpret and structure the complex, and at times contradictory, input generated through this participatory process, the initiative applies a structured thinking model [3] for systematically identifying, comparing, and prioritising the dilemmas surfaced through the participatory process. The model considers AI's impact along two dimensions: the temporal horizon (from current applications through emerging opportunities to future scenarios) and stakeholder impact, ranging from individuals and organisations through the broader cultural heritage sector to societal and global contexts. Particular attention is given to the intersection where AI can have sectoral impact through the widespread adoption of still-emerging technologies. This is the ‘zone’ in which the common European data space for cultural heritage is already most active, offering the potential to shape the development of new tools and services and define the conditions under which they are adopted. Exploring this space can help to surface shared values that also inform current tools whose implications remain unclear.

Anticipated outcomes

We anticipate that dozens of community members in the sector (ranging from data specialists and technology developers to curators and educators) will share their thoughts (at the time of writing, over 60 participants have already taken part during the first week of the process). Given that analysing the data will commence from July onwards, it is impossible to share insights at the time of submitting this paper proposal. Also, we anticipate to follow up the online questionnaire with discussions with the various communities connected to the European data space for cultural heritage (including EuropeanaTech, AI4LAM, Creative Commons). Based on the outcomes of these activities, during the session we will:

Shed light on the most agreeable and contentious topics in the community;
Discuss how heritage communities can collaborate to turn the share vision into concrete actions and guidance for the sector;
Elaborate on divergent positions in the community and what actions can be taken to get more clarity on them.
Discuss good practices that exemplify what responsible AI means for the digital heritage community.

The Alignment Assembly on ‘Culture for AI’ directly supports the AI4LAM community’s mission by fostering collective reflection and dialogue, empowering cultural heritage institutions and other stakeholders to collaboratively shape ethical, inclusive, and value-aligned AI practices. With this session, we hope to contribute to create space for alignment on core values that drive our sector’s engagement with AI.

Notes

[1] https://www.dataspace-culturalheritage.eu/en

[2] https://pol.is/home

[3] The model is explained in this blogpost: https://pro.europeana.eu/post/share-your-views-about-ai-and-digital-cultural-heritage

References

Bočytė, R., Oomen, J., Hazejager, K., & Libot, C. (2024). The media sector on its AI journey. https://www.ai4media.eu/whitepapers/the-media-sector-on-its-ai-journey-directions-for-experimentation-implementation/

Boer, V. de, & Stork, L. (2024). Hybrid intelligence for digital humanities. In HHAI 2024: Hybrid Human AI Systems for the Social Good (pp. 94–104). IOS Press. Frontiers in Artificial Intelligence and Applications, Vol. 386.

Gefen, A., Saint-Raymond, L., & Venturini, T. (2021). AI for digital humanities and computational social sciences. In AI and Society (pp. 191–202). Springer International Publishing. https://doi.org/10.1007/978-3-030-69128-8_12

Hansen, A., Krack, N., Dutkiewicz, L., Bočytė, R., et al. (2023). Final white paper on the social, economic, and political impact of media AI technologies.

Hong, S., & Tarkowski, A. (2024). Alignment assembly on AI and the commons: Outcomes and learnings. Open Future. https://openfuture.eu/publication/alignment-assembly-on-ai-and-the-commons-outcomes-and-learnings/

Münster, S., Maiwald, F., di Lenardo, I., Henriksson, J., Isaac, A., Graf, M. M., Beck, C., & Oomen, J. (2024). Artificial intelligence for digital heritage innovation: Setting up a R&D agenda for Europe. Heritage, 7(2), 794–816.

Thiel, S., & Bernhardt, J. C. (2024). AI in museums: Reflections, perspectives and applications. Transcript Verlag. https://openresearchlibrary.org/content/33770d87-1343-411d-bd84-f765cf29dded

Building Bridges, Not Black Boxes: Integrating Generative AI and Linked Open Data for Cultural Heritage Futures

Shani Evenstein Sigalov

University of London, UK

As Generative AI (GenAI) systems rapidly evolve, memory institutions are under growing pressure to engage with these tools, both as potential accelerators of their mission, and as sources of new ethical, technical, and institutional challenges. The ways in which AI systems reshape access to cultural heritage, who gets represented, and how, are not abstract concerns: they shape what is remembered, what is cited, and what becomes part of the digital public record.

While Cultural Heritage Institutions (CHIs) hold vast repositories of structured and trustworthy data, these are rarely used to train or inform the Large Language Models (LLMs) that underpin today’s most powerful AI systems. Conversely, GenAI tools are not yet well integrated into CHIs’ own metadata workflows, largely due to infrastructural, skill-based, and ethical constraints. Linked Open Data (LOD) platforms such as Wikidata, hold enormous potential to serve as a connective layer—a bridge between structured cultural knowledge and machine-readable systems. However, despite numerous pilot initiatives, this potential remains largely unrealized at scale.

This presentation introduces a new interdisciplinary initiative: AI-BRIDGES (AI-Driven Bridging of Resources and Integration of Data Governance in Cultural Heritage Systems). This project seeks to address these gaps by designing a robust ecosystem that meaningfully connects CHIs, GenAI technologies, and LOD platforms in ways that center equity, openness, and human judgment. The project positions cultural heritage data not just as a passive resource to be consumed by AI, but as a co-constructed body of knowledge that can shape the future of responsible, ethical AI development.

Funded through a Marie Skłodowska-Curie Postdoctoral Fellowship and hosted by the University of London’s Digital Humanities Research Hub at the School of Advanced Study, AI-BRIDGES explores three interrelated research strands, metaphorically framed as bridges:

CHIs ↔ LOD: Many CHIs maintain high-quality metadata but struggle to publish it in formats compatible with LOD platforms. This bridge focuses on designing low-code/no-code tools that allow CHIs—especially smaller institutions without in-house technical staff—to streamline metadata cleaning, normalization, and publication. These pipelines are intended to democratize access to LOD participation, aligning with open knowledge values and enhancing discoverability across global heritage networks.
LOD ↔ GenAI: A second bridge addresses the disconnect between the structured knowledge in LOD platforms (such as Wikidata) and the opaque, often unverifiable outputs of large language models. AI-BRIDGES investigates the use of retrieval-augmented generation (RAG) to make GenAI systems more transparent and source-aware, particularly in cultural heritage contexts. By enabling LLMs to "call out" to structured LOD datasets when generating responses, this approach aims to reduce hallucinations, improve inclusivity, and reintroduce provenance into AI-driven knowledge production.
Users ↔ LOD: The third bridge focuses on people. It explores how participatory data curation activities—especially involving students and early-career researchers—can support both digital literacy and the integrity of LOD platforms. When users are invited to contribute to and critique the data that informs AI systems, they become not just consumers of culture, but co-creators of more equitable digital futures. This pedagogical model draws on the principles of participatory culture and critical digital literacy.

These research avenues are not simply technical -- they are fundamentally ethical and institutional. They concern how GenAI tools are designed, who contributes to their knowledge base, and how cultural memory is negotiated in a digital future. AI-BRIDGES positions itself at the intersection of design, critique, and capacity building, seeking to elevate the agency of CHIs and their communities in shaping AI futures.

The AI-BRIDGES project is still in its early stages. Rather than presenting polished outcomes, this talk will serve as an open invitation into the "thinking work" of the project, highlighting dilemmas, assumptions, and design choices that are actively being shaped. These include:

What responsibilities do CHIs have when their metadata is used by AI systems they do not control?
How do we meaningfully involve CHIs that lack in-house technical capacity in co-designing AI workflows?
What new literacies are needed—not just for users, but for institutions themselves—to navigate AI transformations responsibly?
How can we sustain collaborative, open knowledge ecosystems that resist centralization and control by commercial platforms?

In line with the Fantastic Futures 2025 theme of "The Fantastic Futures We Need," this presentation argues that the future we need is neither fully automated nor wholly preserved -- it is co-created. And in order to remain co-created, it requires intentional alliances across memory institutions, technologists, researchers and educators. These alliances must be grounded not only in tools and workflows, but also in shared commitments to transparency, inclusion and justice.

The success of AI-BRIDGES depends on exactly the kinds of partnerships that the AI4LAMs community promotes. While the project draws on open knowledge practices and open-source infrastructure (e.g., Wikidata, CC-licensed datasets), its true sustainability hinges on its ability to build and maintain a participatory network of cultural practitioners, researchers, educators, and technologists. The infrastructure is only as valuable as the care, governance and judgment embedded in its use.

The presentation will share preliminary insights from early engagements with Wikimedia affiliates, educational institutions, and CHI professionals. It will also outline upcoming experiments, such as designing a pipeline for GLAM to share data on Wikidata and Wikibases that harnesses GenAI; RAG prototyping with Wikidata; co-design sessions with students and GLAM staff; and workshops on participatory data curation -- all open for collaboration. A key objective of this session is to expose GLAM professionals to the opportunity to shape this work as it unfolds.

This talk is not a showcase of finished products; It is a provocation, and an invitation for an opportunity to reflect together on what kind of future we want -- not only for memory institutions, but for the data and technologies that increasingly mediate collective memory. Rather than being spectators to GenAI disruption, CHIs and their collaborators can be co-authors of its future, which requires us to come together -- not only around shared challenges, but around shared values.

Attendees can expect to leave with:

A conceptual framework for understanding the intersections between GenAI, LOD, and CHI workflows.
A practical overview of what the project is building and where collaborators might plug in.
An invitation to contribute ideas, critiques, and partnerships that will shape the direction of AI-BRIDGES.

As a community committed to ethical, transparent, and equitable uses of AI in the GLAM sector, AI4LAM is an ideal space to surface these questions. It is our hope this talk will spark further conversation, and more importantly, collaboration -- toward building the kind of AI-powered futures we actually want: ones that are open, inclusive, and shaped by those who care for our cultural memory.

References (select)

Evenstein Sigalov, S. & Nachmias, R. (2023). Investigating the Potential of the Semantic Web for Education: Exploring Wikidata as a Learning Platform. Education & Information Technologies. https://doi.org/10.1007/s10639-023-11664-1
Ferrara, E. (2023). Should ChatGPT Be Biased? First Monday. https://doi.org/10.5210/fm.v28i11.13346
Jenkins, H. (2006). Fans, Bloggers, and Gamers: Exploring Participatory Culture. NYU Press.
Padilla, T. (2017). On a collections as data imperative. https://doi.org/10.5281/zenodo.437265
Wilcock, G. (2024). New technologies for spoken dialogue systems: LLMs, RAG and the GenAI Stack. 14th International Workshop on Spoken Dialogue Systems.

Intelligent Systems for Screen Archives: Between Cultural and Computational Value

Daniel Chávez Heras, Arianna Ciula, Geoffroy Noël, Kirsty Warner

King's College London, United Kingdom

This presentation introduces the working model and preliminary findings of Intelligent Systems for Screen Archives (ISSA), a cross-institutional research and development project led by King’s College London in partnership with five regional UK film and television archives: National Library of Scotland, National Library of Wales, Northern Ireland Screen, North West Film Archive, and Yorkshire Film Archive, as well as a convening partner, Film Archives UK.

In the past decades, film and television archives have digitised their collections at different scales, in various formats, and increasingly including digital-born materials. These growing collections of moving images are assembled and made available through rich contextual and medium-specific knowledge, encoded in catalogues, databases, metadata records, and in the moving images themselves. Emerging AI technologies suggest great potential to understand screen heritage from a computational perspective and create new forms of value for moving image archives and their users. At the same time, the rapid proliferation of this family of technologies and the enormous financial interests vested in them have been profoundly disorienting, especially to cultural institutions such as film and television archives that hold data-intensive collections but rarely have the resources, infrastructures, and in-house expertise required to accurately define the computational value of their holdings. Articulating this value to their users and stakeholders is the main challenge tackled by this project.

ISSA is designed by the Department of Digital Humanities at King's College London and King's Digital Lab, supported by the BFI Innovation Challenge Fund, made possible with National Lottery funding.

In this presentation, we will expand on ISSA’s working model and show preliminary findings by focusing on two interlinked components of the project:

The modular prototype for creative experimentation with audiovisual data, called DEERIN (Data Enrichment, Exploration, Retrieval & Interaction).
The programme of co-design workshops that grounds technical development in archival expertise and priorities is called AIMS (AI for Media Sandbox).

The prototype affords tasks such as automated captioning and metadata enrichment (via LLMs and LVMs), visualisation and navigation of moving image collections (using clustering and projection techniques like UMAP and t-SNE), and retrieval-augmented generation (RAG and GraphRAG) for storytelling and enhanced search. All components are built using open-source technologies and published via a public code repository.

These developments are situated within broader public discourse around AI, shaped by the widespread use of generative systems like ChatGPT, Gemini, and Copilot. Such tools have raised expectations for AI’s usefulness across sectors, including culture and heritage, yet the technical sophistication behind these systems often remains opaque to non-specialist users. ISSA responds to this gap by creating transparent, explainable tools built specifically for the needs and capacities of film and television archives. Through hands-on experimentation, partner institutions are empowered to better understand what AI can and cannot do for their collections and users.

Each of these components has been developed to accommodate a wide range of use cases that reflect the distinct challenges faced by partner archives. These include the need to enrich legacy metadata, improve discoverability for under-catalogued collections, surface themes or geographic patterns within vast footage, and enable new modes of public engagement with heritage materials. For example, Northern Ireland Screen has explored how AI might support enhanced access for broadcast partners and educators, while the Yorkshire Film Archive has focused on enabling better search and navigation for community-based users interested in local histories. These diverse scenarios are critical in shaping the design and evaluation of tools and workflows within ISSA.

Crucially, ISSA avoids a one-size-fits-all approach. Instead, it supports situated experimentation by working closely with each partner institution. Partners engage in asynchronous testing of their own collections and participate in targeted co-design workshops. This approach situates technical design and development within the curatorial needs and operational contexts of each institution, favouring incremental development of AI technologies in the context of screen archives, supported by extended testing and expert evaluation.

The AIMS workshops are a core element of this participatory model. They serve to build shared understanding across the project team, encourage reflective dialogue on the implications of AI integration, and identify site-specific priorities for innovation. Through these workshops, partners have raised questions around archival labour, future cataloguing practices, the interpretability of AI outputs, and the risks of over-automation. These discussions are feeding directly into the development of the DEERIN prototype, ensuring that tool design remains accountable to curatorial and operational realities.

ISSA’s methods draw from state-of-the-art computational techniques but embed them within human-centred design. This includes large language and visual models for generating descriptive metadata, visualisation techniques for exploring content relationships, and retrieval-augmented generation approaches for creative exploration. However, these tools are not intended to replace archival expertise. Instead, they are meant to support and extend existing practices by aligning with institutional values, workflows, and knowledge infrastructures.

Our presentation will include draft architecture and workflow illustrations from the prototype, alongside emerging use cases from the workshops. These examples demonstrate how ISSA enables new forms of discovery, access, and reuse of archival material. In doing so, we will reflect on how participatory technology design, explainable infrastructure, and critical experimentation shape our collaborative model. ISSA is driven by research in the emerging field of computational moving image studies (Chávez Heras, 2024, Arnold and Tilton, 2023); its aims are influenced by the key notion of critical technical practice (Agre, 1997); and its design draws from frameworks in participatory design (Simonsen & Robertson, 2013) and responsible AI development (Whittaker et al., 2021).

ISSA’s broader objectives include creating a publicly accessible code repository and knowledge base that documents experiments and shares tools, insights, and best practices for the sector. It also seeks to document gaps in infrastructure and capacity that can inform future funding calls and policy decisions. These goals ensure that ISSA’s impact extends beyond its immediate partners, contributing to a wider network of GLAM institutions considering how to engage responsibly and effectively with AI.

We conclude by situating ISSA within ongoing debates on AI in the GLAM (galleries, libraries, archives, and museums) sector. In response to the conference theme, “AI Everywhere, All at Once,” ISSA offers a grounded case study in how AI development can be steered and integrated in the workflows of cultural institutions, respecting their values, expertise, and public responsibilities, without sidelining innovation. Rather than simply showcasing technical progress, our contribution invites dialogue about how cultural institutions can actively shape new directions for AI systems, making them not only usable, but meaningful.

References:
Agre, P. E. (1997). Computation and Human Experience. Cambridge University Press.
Arnold, T. and Tilton, L. (2023). Distant Viewing: Computational Exploration of Digital Images. MIT Press.
Chávez Heras, D. (2024). Cinema and Machine Vision: Artificial Intelligence, Aesthetics and Spectatorship. Edinburgh University Press.
Simonsen, J., & Robertson, T. (Eds.). (2013). Routledge International Handbook of Participatory Design.
Whittaker, M. et al. (2021). The AI Now Report.

Arctic Fish Skin and AI: Ecological Entanglements and the Future of Cultural Heritage Preservation

Elisa Palomino¹, Ana Cordoba Crespo²

¹Smithsonian Institution, Arctic Studies Center, Washington DC; ²Central Saint Martins, University of the Arts, London

Arctic Indigenous communities such as the Nanai of Eastern Siberia have cultivated material cultures rooted in ecological and spiritual knowledge. Garments made from organic materials such as fish skin, gutskin and sinew sustainably sourced from their immediate environment are the basis of their cultural practices. These artefacts are not only utilitarian and sacred, but also express Indigenous values that place all beings - human, animal and plants - interconnected. These social foundations stand in sharp contrast to the anthropocentric paradigms underpinning Artificial Intelligence (AI), which seeks to simulate human cognition in machines, often marginalising other ways of knowing and being. As AI technologies increasingly penetrate cultural institutions, including those in the GLAM sector (galleries, libraries, archives and museums), there is a growing need to ensure that the deployment of these tools engages ethically with indigenous heritage.

The migration of Arctic material culture, particularly garments such as fish skin robes, from their communities of origin to distant Western museums has been an intellectual, spiritual and artistic loss. The inaccessibility of these artefacts, now held in institutions like the British Museum, the Penn Museum, and the Smithsonian’s National Museum of Natural History (NMNH), continues the colonial legacies of extraction and dispossession. While some museums have facilitated in-person access for community Elders, these efforts often remain limited in scope, reaching only a few individuals due to geographical, financial, and logistical constraints. Addressing these challenges requires not only technological innovation but also a shift in the governance of heritage materials, privileging Indigenous voices in all stages of digital engagement.

In this context, the integration of AI-enhanced 3D modelling offers a powerful tool to extend access to cultural heritage and facilitate virtual repatriation. This paper presents a case study examining a collaborative initiative between two researchers—one a fashion anthropologist affiliated with the NMNH, the other a fashion educator and CLO3D specialist teaching at University of the Arts, London—working in partnership with Nanai cultural practitioners. The project focuses on digitally reproducing a 19th-century Nanai fish skin robe housed at Penn Museum. Originally collected during an 1898 expedition to Siberia and acquired following the 1900 Exposition Universelle in Paris, the robe was later transferred to the Commercial Museum in Philadelphia and ultimately accessioned into the Penn Museum’s ethnographic collections. While these garments may appear to be mere artefacts, they are in fact spiritually charged belongings, created with ritual knowledge and designed to accompany women through major life transitions such as marriage and burials, serving as spiritual shields and cultural markers.

Using high-resolution photographs, parametric design tools, and AI-enhanced visualisation software, the research team reconstructed the fish skin robe digitally. AI-enhanced CLO3D enabled accurate pattern simulation and virtual garment construction, while Blender facilitated the replication of textures and material properties such as the sheen of fish skin and the rigidity of metal coins. The resulting 3D replica served as a pedagogical and cultural resource, deployed in workshops involving Nanai Elders and local youth. These sessions not only enhanced community access to otherwise inaccessible heritage items but also revitalised intergenerational transmission of cultural knowledge through open education. In doing so, the project contributed to the broader aims of digital repatriation—returning knowledge and experience, if not the physical artefact itself, to its community of origin.

This digital replica also functioned as a form of research into the intersections between Indigenous knowledge systems and AI technologies. Indigenous cosmologies offer a powerful lens through which to reimagine the goals of AI development. Where AI tends to privilege rationalist abstraction and individual agency, Indigenous knpwledge systems emphasise relationality, reciprocity, and collective memory. The case study highlights the potential for rethinking AI design not only to serve technical efficiency or institutional objectives but to uphold ethical commitments to inclusion, sovereignty, and cultural integrity. The team’s previous projects, which include the digital reconstruction of a Yup’ik parka at the Anchorage Museum and an Ainu fish skin coat at the Nibutani Ainu Museum, support the scalability and adaptability of such interdisciplinary methodologies across different Arctic and sub-Arctic communities.

Nonetheless, the use of AI-enhanced 3D technologies in heritage contexts is not without complications. Existing platforms often fail to capture the subtleties of organic materials, and their spiritual ceremonial meaning. In addition, digitisation raises complex questions around intellectual property, access rights, and data sovereignty. For instance, while 3D models offer unprecedented accessibility, unrestricted archiving may violate cultural protocols, especially concerning sacred or sensitive items. The risk of cultural appropriation, misrepresentation, and further extraction remains high if such initiatives are not governed by community-led frameworks. As the project makes clear, effective digitisation must involve negotiated agreements on authorship and control, particularly in light of the colonial histories that shaped the very presence of these artefacts in Western collections.

These challenges demand the development of ethical guidelines and legal mechanisms tailored to the cultural and spiritual aspects of Indigenous heritage. Policymakers and cultural institutions must take a proactive role in anticipating the implications of digitisation, crafting regulatory frameworks that balance technological innovation with social justice. This includes revisiting intellectual property norms, developing culturally responsive access controls, and ensuring that communities can define the terms of their digital presence. In this regard, the UNESCO Recommendation on the Ethics of Artificial Intelligence (2021) and the Indigenous AI Protocol (2020) provide essential starting points. Both frameworks emphasise the importance of inclusivity, data sovereignty, and the recognition of alternative knowledge systems in the development and application of AI technologies.

The research presented here argues for a radical reorientation of AI-driven cultural heritage practices—one that moves beyond mere visual replication to consider ethical dimensions. In particular, it highlights the importance of designing 3D and AI systems not from abstract technical parameters, but from within Indigenous worldviews that understand heritage as living, relational, and connected with specific communities. The future of responsible digital heritage lies in co-creation and co-stewardship, wherein Indigenous communities are not passive recipients of digital technologies but active designers and decision-makers. This approach ensures not only cultural fidelity but also builds institutional accountability and trust.

By integrating digital technologies into collaborative education and preservation strategies, this project demonstrates how AI can be harnessed to support sustainable, inclusive, and community-led approaches to heritage. It offers a model for how academic researchers, museum professionals, and Indigenous knowledge holders can work together to co-produce digital tools that honour and perpetuate cultural memory. As globalisation and technological acceleration continue to threaten cultural continuity, such interdisciplinary and intercultural collaborations will be essential. The lessons drawn from Arctic fish skin traditions—rooted in adaptability, creativity, and ecological knowledge—offer not only a model for preserving the past but also a vision for designing more ethical futures in the age of AI.

References

American Declaration on the Rights of Indigenous Peoples (2016) Organization of American States. Secretariat for Access to Rights and Equity. Department of Social Inclusion. Available at: https://www.oas.org/en/sare/documents/DecAmIND.pdf (Accessed: 4 January 2025).

Benedict, C. (2023) Traditional Knowledge and Copyright Intersections. Creative Commons – We Like to Share Medium Blog. Available at: https://medium.com/creative-commons-we-like-to-share/traditional-knowledge-and-copyright-intersections-d8cb78375e40 (Accessed: 13 January 2025).

Brown, D. and Nicholas, G. (2012) Protecting Indigenous Cultural Property in the Age of Digital Democracy: Institutional and Communal Responses to Canadian First Nations and Māori Heritage Concerns. Journal of Material Culture 17(3): 307-324.

Christen, K. (2009) Access and Accountability: The Ecology of Information Sharing in the Digital Age. Anthropology News, April 2009, 4-5.

Csoba DeHass, M. and Taitt, A. (2017) 3D Technology in Collaborative Heritage Preservation. In Digital Representation of Indigenous Peoples through Sharing, Collaboration, and Negotiation. C. Gish Hill and M. Csoba DeHass (eds.). Special Issue of Museum Anthropology Review 12(2). pp.120-152.

Csoba DeHass, M. and Hollinger, E. (2018) 3D Heritage Preservation and Indigenous Communities in the Circumpolar North. In Arctic Yearbook 2018: Arctic Development in Theory & in Practice, L. Heininen and H. Exner-Pirot eds.pp. 218–33. Northern Research Forum.

Palomino, E. (2022) Indigenous Arctic Fish Skin Heritage: Sustainability, Craft and Material Innovation. PhD Thesis. University of the Arts, London. Available at: https://ualresearchonline.arts.ac.uk/id/eprint/20124/

Palomino, E. Karmon, A. Topaz, O., Solo, A., Cordoba, A. (2024) ‘Digital Tools in the Fashion Industry: fish skin garments and Ainu fish skin traditions’. Technology, Sustainability and the Fashion Industry: Can Fashion Save the World? Schramme, A., and Verboven, N. (eds.) Routledge. https://doi.org/10.4324/9781032658506

Palomino, E. (2025) Dye Plants used by the Indigenous Peoples of the Amur River in Fish Skin Artefacts. MDPI Heritage, 8, 195. Special Issue: Dyes in History and Archaeology 43. https://doi.org/10.3390/heritage8060195

Palomino, E., Katz, J. (2025) Arctic Encounters: Material Culture, Indigenous Worldviews, and AI Projections. in: Shons, A. Artistic Witness and Response to Environmental Change in the Arctic. London: Routledge.

User-guided visualizations of multispectral data for recovering degraded notation

Wallace Peaslee, Carola-Bibiane Schönlieb, Anna Breger

DAMTP, University of Cambridge, United Kingdom

Multispectral imaging (MSI) can be used to recover illegible writing or notation from manuscripts. It produces several images, each measuring light intensity in a different wavelength range. Better visualizations of degraded notation can sometimes be achieved with post-processing methods like those implemented in ENVI® or HOKU [1], e.g. Principal Component Analysis (PCA) or the Minimum Noise Fraction Transform (MNF) [2]. These approaches require specifying a region of interest (ROI), but otherwise do not adapt to the specific problem a researcher might be interested in.

We present methods designed to adapt to a researcher’s task. By using a few labels provided by the researcher indicating where notation is present and absent, we aim to automatically and efficiently produce good visualizations of degraded notation. This enables the incorporation of expert knowledge beyond what is possible with many standard methods, as they are unsupervised. As an example, the ‘best’ multispectral image can be determined through task-adaptive and label-dependent image quality measures like Normalized Potential Contrast [3].

Furthermore, several common methods from ENVI and HOKU (e.g. PCA and MNF) can be thought of as producing weighted averages of the images comprising MSI data, where these weights are chosen to maximize some objective or loss function. We can change the objective function to use the labels provided by a researcher, directly tailoring the resulting visualization towards their particular goal. This has the potential to efficiently produce better visualizations compared to standard approaches, at least in cases when a researcher already knows something about what they are looking for. For example, if some degraded notation or writing is faintly detectable in some images from MSI, in the raw data or after postprocessing, then an expert's labels combined with our semi-supervised approaches (e.g. using Normalized Potential Contrast [3]) may enable the recovery of additional notation.

We will provide real-world examples with multispectral images of degraded music manuscripts, discussing how changes induced in an objective function by an expert's labels can be useful but also noting when to be cautious about bias in results. This echoes a wider tension with reliability in the application of artificial intelligence to cultural heritage. The computational methods and algorithms we present, given implementations with practical and user-friendly tools, can be applied to many other problems. Finally, we plan to further develop and broaden models and methods that incorporate expert knowledge for the unique challenges and data that arise in cultural heritage applications.

References

[1] K. T. Knox, “Hoku—a multispectral software tool to recover erased writing on palimpsests,” The Vatican Library Review, vol. 1, no. 2, pp. 205–214, 2022.

[2] A. A. Green, M. Berman, P. Switzer, and M. D. Craig, “A transformation for ordering multispectral data in terms of image quality with implications for noise removal,” IEEE Transactions on geoscience and remote sensing, vol. 26, no. 1, pp. 65–74, 1988.

[3] W. Peaslee, A. Breger, and C.-B. Sch¨onlieb, “Potential contrast: Properties, equivalences, and generalization to multiple classes,” https://arxiv.org/abs/2505.01388, 2025.