Conference Agenda
Overview and details of the sessions of this conference.
Session Overview
Session
Colab 4.3. AI tools for archives
Presentations
ClioX: A Novel Decentralized Platform for Digital Asset Reference & Access
1 The University of British Columbia, Canada; 2 University of Lleida, Spain; 3 UNED, Spain
Short Description
This session demos the ClioX platform, which archives can use to make archival datasets available to researchers who want to conduct research with AI. The platform employs Privacy-Enhancing Technologies (e.g., compute-to-data design, data spaces, privacy-preserving federated machine learning, secure multiparty computation, distant reading and visualization) to protect archival documents of a sensitive nature while responding to growing researcher requests for access and analysis using AI.
Abstract
This session will present and seek feedback on ClioX, a novel decentralized reference and access computing infrastructure for archives and digital humanities researchers. ClioX explores the research question: "How can Privacy-Enhancing Technologies (PETs) (e.g., compute-to-data design, data spaces, privacy-preserving federated machine learning, secure multiparty computation, distant reading and visualization) be used to provide greater access to sensitive archival documents without compromising privacy?" ClioX was developed because archival institutions process a growing volume of archival documents of a sensitive nature and receive a growing number of researcher requests to access and analyse such documents using AI. Archives struggle to conduct sensitivity reviews using manual or automated techniques, which prevents compliance with AI and data protection regulations and presents barriers to access for researchers.
ClioX addresses this problem by (1) allowing archives to make archival datasets available to researchers for a variety of AI-enabled computations (e.g., exploratory data analysis, clustering, topic modelling, and sentiment analysis) and (2) allowing researchers to run AI algorithms over one or many archival datasets held in archival 'data spaces', so that the data never leaves the custody and control of the archives while aggregated results are still returned to researchers. ClioX builds upon the EU's PontusX, an open source framework for the industrial AI and data economy and the largest publicly available X-Ecosystem, which is powered by smart contracts and Gaia-X and transforms Data Act compliance into scalable collaboration and monetization opportunities. These have the potential not only to provide greater access to archival data but also to create new business models for archives and digital humanities that leverage decentralized governance and computing.
Transforming Archives: AI for Automated Audio Descriptions at AUROL-UCR (original title: Transformando Archivos: IA para Descripciones Automatizadas de Audios en el AUROL-UCR)
1 Cargil, Costa Rica; 2 Intel, Costa Rica; 3 Universidad de Costa Rica, Archivo Universitario Rafael Obregón Loría, Costa Rica; 4 Universidad de Costa Rica, Radioemisoras UCR, Costa Rica
Short Description
This exploratory project aims to assess the feasibility and challenges of using Artificial Intelligence to automatically describe 4,000 audio recordings from digitised tapes held at the Archivo Universitario Rafael Obregón Loría of the Universidad de Costa Rica. The session will present initial findings, covering the institutional context and the resources required, and the presenters hope to receive feedback from colleagues on the project's development and future.
Abstract
The Universidad de Costa Rica (UCR) is a public higher education institution that enjoys constitutional autonomy. It was founded in the 1940s and has been declared a Meritorious Institution of Costa Rican Education and Culture (Institución Benemérita de la Educación y la Cultura de Costa Rica). It is notable for its ties to the country's productive sector through its research activities, social action projects, continuing education, and cultural outreach at various locations across the country. The Archivo Universitario Rafael Obregón Loría (AUROL) is the university body responsible for coordinating the University Archives System and the institution's Historical Archive. It promotes the conservation of UCR's documentary heritage and places it at the service of the university community and society. In 2014, AUROL received the UCR Historical Sound Archive (Fonoteca Histórica) from Radioemisoras de la UCR (Radios UCR), comprising vinyl records, magnetic tapes, and cassettes. In 2015, the digitisation of 4,000 tapes of national programmes, out of a total of 8,000, was prioritised. Although AUROL has allocated resources to describing the material, the process has been slow because of the number of recordings. Artificial Intelligence (AI) could be an effective tool for producing descriptions at scale and improving the accessibility and use of these materials. For this reason, a project is proposed to explore the strengths, limitations, and practical considerations of AI tools, including technical requirements, the resources needed, and their compatibility with the technologies used by UCR's Centro de Informática. To that end, the project plans to review studies, research, and best practices from institutions that use AI in archives, analysing technologies suitable for description and assessing UCR's legal, administrative, and technological context. This exploratory stage will make it possible to understand AI in archival description in the Latin American and university context, laying the groundwork for later implementation. The long-term impact will be the cultural preservation of audio collections and the establishment of a model for adopting automated description technologies. During the session, feedback will be sought on the use of AI in archival description, exploring tools and experiences and fostering collaborations with other institutions.
Unlocking Archival Access: Using Transkribus and AI to Transform Finding Aids into Searchable Data
READ-COOP (Transkribus), Austria
Short Description
Finding aids are essential for archive users, but many remain undigitised in archives' reading rooms. This presentation highlights a project by READ-COOP together with Library and Archives Canada that used Transkribus, an AI-driven platform, to extract and convert information from complex archival finding aids into a database. It outlines the workflow and the accuracy of the trained AI models, opening a discussion on how AI can help archivists make analogue written documents accessible and searchable.
Abstract
Finding aids (inventories, indexes, registers, etc.) are often the first point of access for archive users to locate documents. Although they take years of expertise to create, many remain undigitised in reading rooms or are available only as non-searchable and/or unstructured PDFs. This presentation showcases a project conducted by READ-COOP in collaboration with Library and Archives Canada (LAC). It demonstrates how AI models can be trained to extract information from structured analogue documents, such as archival finding aids, for seamless integration into a database. Transkribus is a platform, designed with a focus on user-friendliness, for transcribing and searching historical documents using AI-powered text recognition. Originally developed as part of the Horizon 2020 "READ" EU project, it is currently maintained and further developed by the non-profit-oriented READ-COOP European Cooperative Society.
It allows users to train custom AI models and process large volumes of handwritten and printed documents without coding knowledge. The LAC project involved extracting information from two finding aids previously scanned by LAC: the Selective Index to Canadian Newspapers (1890–1950), a subject index of typewritten cards with occasional handwritten notes, and the Order in Council Registers (1925–1942), handwritten volumes in tabular format documenting the Privy Council Office's activities. The complexity of the sources required training multiple custom AI models. The workflow involved:
- layout recognition to identify fields containing information
- text recognition for both handwritten and typewritten content
- date normalisation
- automated subject extraction
The exported and reviewed results were integrated by LAC into their system. We will provide a detailed walkthrough of the project, illustrating how the models were trained, the accuracy achieved, and the solutions found to challenges, and showing how similar projects can be carried out in Transkribus. In addition, we will give an outlook on the further development of the technology, in which all steps will be performed by a single AI model instead of a series of models. The approaches discussed differ from models like ChatGPT insofar as they are more predictable and reliable. We will invite participants to discuss other potential applications of this technology beyond finding aids, and how automated text recognition and information extraction can support archivists in their work to make information accessible and searchable.
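As an illustration of what the date normalisation step in the workflow above might involve, the following is a minimal Python sketch. It is not the method used by Transkribus or LAC: the function name `normalise_date`, the two accepted input shapes ("12 Mar. 1927" and "March 3, 1931"), and the choice of ISO 8601 output are all assumptions made for the example.

```python
import re
from datetime import date
from typing import Optional

# Month lookup keyed on the first three letters (hypothetical helper table).
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# Assumed input shapes: "12 Mar. 1927" and "March 3, 1931".
_DAY_MONTH_YEAR = re.compile(r"(\d{1,2})\s+([A-Za-z]{3})[A-Za-z]*\.?,?\s+(\d{4})")
_MONTH_DAY_YEAR = re.compile(r"([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{1,2}),?\s+(\d{4})")


def normalise_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if the text cannot be parsed."""
    text = raw.strip()
    match = _DAY_MONTH_YEAR.fullmatch(text)
    if match:
        day, month_name, year = match.group(1), match.group(2), match.group(3)
    else:
        match = _MONTH_DAY_YEAR.fullmatch(text)
        if not match:
            return None  # unrecognised shape: leave for manual review
        month_name, day, year = match.group(1), match.group(2), match.group(3)

    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    try:
        # date() rejects impossible dates such as 31 February.
        return date(int(year), month, int(day)).isoformat()
    except ValueError:
        return None


if __name__ == "__main__":
    for sample in ["12 Mar. 1927", "March 3, 1931", "31 Feb 1930", "illegible"]:
        print(sample, "->", normalise_date(sample))
```

Returning `None` rather than guessing keeps uncertain transcriptions visible for the kind of human review the abstract mentions before results are integrated into a database.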