Conference Agenda

Session Overview
Session
Panel: Publication and reuse of digital collections: A GLAM Labs approach
Time:
Wednesday, 29/May/2024:
2:45pm - 4:30pm

Session Chair: Mahendra Mahey, Tallinn University, Estonia
Location: H-207 [2nd floor]

Presentations
2:45pm - 4:15pm

Publication and reuse of digital collections: A GLAM Labs approach

Gustavo Candela1, Sally Chambers2, Nele Gabriëls3, Katrine Hofmann Gasser4, Olga Holownia5, Lars Johnsen6

1University of Alicante, Spain; 2DARIAH, Belgium; 3KU Leuven, Belgium; 4Royal Danish Library, Denmark; 5IIPC, United States of America; 6National Library of Norway

For decades, GLAM (Galleries, Libraries, Archives and Museums) institutions have been exploring new ways to make their digital collections available. They host a wide diversity of rich content including, for example, maps, images, born-digital materials, text, audio or video materials that are available in many forms in terms of access and copyright. Recent advances in technology based on Artificial Intelligence and Machine Learning have provided a new context in which data level access has become a crucial aspect to engage with the research community. GLAM institutions can play a relevant role in this new context based on their expertise and knowledge as curators and content publishers [1], and their efforts can be maximised by the recently established common European data space for cultural heritage [2].

New initiatives such as Collections as Data [3] and the International GLAM Labs Community [4] have recently emerged in the cultural heritage sector to promote the publication of digital collections suitable for computational use as well as the reuse of content in innovative ways. Following their principles, a growing number of cultural heritage institutions have been making their digital collections available under open licenses, releasing prototypes and creating sandboxes for researchers. Some examples include the Data Foundry at the National Library of Scotland, the Library of Congress Labs, and the British Library Labs [5]. Inspired by previous approaches focused on the use of Jupyter Notebooks such as the GLAM Workbench [6], several institutions have started to use the notebooks to make available documentation and code based on their digital collections [7]. In addition, a checklist describing the steps to publish Collections as Data focused on small and medium-sized GLAM institutions has been recently published as a collaborative effort by the International GLAM Labs Community [8, 9].

These efforts provide an extensive demonstration of different initiatives to publish and reuse digital collections suitable for computational use. However, GLAM institutions need guidance in order to meet the current and emerging needs of the research community covering the following aspects: i) data workflows and checklists to provide data level access; ii) data quality in terms of content (e.g., OCR) and metadata; iii) documentation about the digital collections; and iv) reproducible examples of use.

The purpose of this panel is to introduce the work performed in the context of the International GLAM Labs Community to help GLAM organizations adopt best practices when using new trends such as Collections as Data. This proposal fits several of the conference topics, including “creating and using cultural heritage collections as data: workflows, checklists, tools” and “the reproducibility and repurposing of data, workflows, and lessons learned”.

Format

The speakers representing the GLAM Labs community will provide an introduction to the concepts and practices mentioned above, with a particular focus on the checklist to publish collections as data as well as the planned next steps. This will be followed by two case studies looking at lessons learned from Library Labs, and an overview of the potential of the European data space for cultural heritage. In the presentations and the 30-minute Q&A part of the panel, we will cover the following questions: i) what is computational access and how can it be achieved in small and medium-sized institutions?; ii) how can GLAM institutions provide documentation and reproducible examples of use based on their digital collections?; iii) what are the steps and best practices to publish digital collections suitable for computational use?; and iv) how can a community be established to share ideas and knowledge about GLAM? In addition, future work will be explored regarding potential research lines for the International GLAM Labs Community.

Proposed format:

Presentations

Sally Chambers & Olga Holownia: Introduction to GLAM Labs as a community, recent projects and next steps [12 minutes]

Gustavo Candela & Nele Gabriëls: Introduction to Checklist for Publishing Collections as Data [12 minutes]

In order to support GLAM institutions in meeting the needs of the research community, a checklist for preparing collections as datasets for computational use was created. It is set up as a tool for GLAMs to leverage their digital assets for digital scholarship. This presentation will talk about how the checklist was developed based on input from the community through a survey of their needs. The checklist will be presented as well as a brief case study, offering both GLAM professionals and DH researchers insight into the principles and their implementation.

Katrine Hofmann Gasser: Connecting people with data at KB Labs at the Royal Danish Library: lessons learned from collaborative projects and initiatives. [12 minutes]

KB Labs (labs.kb.dk) was set up in 2016 and for the past 8 years has focused on providing opportunities for students and researchers to work with library collections and to use the tools developed by the KB IT Department. This presentation will give a brief overview of the labs within the Library, how we work with data and how we connect with our users through collaborative projects. The presentation will also cover the most recent initiatives involving the AI Lab and AI politics.

Lars Johnsen: Connecting people with tools: lessons learned at the DH-lab at the National Library of Norway [12 minutes]

The DH-lab assists scholars and students in the use of digital tools and methods. Since 2013, it has built a research infrastructure that allows for computational analysis in alignment with the FAIR principles, primarily through Jupyter notebooks and user-friendly web applications. This presentation will focus on the lessons learned from providing tools to researchers who work with digital collections provided by the National Library of Norway through the DH-lab.

Sally Chambers & Gustavo Candela: Crossroads: the common European data space for cultural heritage [12 minutes]

Building on the experience of Europeana, the launch of a data space by the European Union has opened up new possibilities for the sharing and reuse of cultural heritage data. The data space has the potential to maximize the efforts carried out by cultural heritage institutions, also connecting them with wider academic and research communities. This presentation will focus on the new perspectives it can bring to the GLAM Labs Community.

Moderated Q&A discussion [30 minutes]

After a brief introductory question round regarding the attendees' experiences with GLAM collections as data - be it from the perspective of a GLAM institution or Lab or that of a dataset user - the Q&A discussion will focus on the following topics/questions:

  • Access to data: key challenges for researchers and GLAM institutions

  • Solutions for scaling up and reproducibility

  • Next steps for organisations that have adopted a checklist.

References

[1] Research Libraries UK. A manifesto for the digital shift in research libraries, 2020, https://www.rluk.ac.uk/digital-shift-manifesto/.

[2] https://pro.europeana.eu/page/data-space-deployment

[3] Padilla, T., Allen, L., Frost, H., Potvin, S., Russey Roke, E., & Varner, S. (2019). Final Report --- Always Already Computational: Collections as Data (Version 1). Zenodo. https://doi.org/10.5281/zenodo.3152935

[4] Data Foundry at the National Library of Scotland: https://data.nls.uk/, the Library of Congress Labs: https://labs.loc.gov/, British Library Labs: https://labs.biblios.tech/.

[5] https://labs.biblios.tech/item-category/datasets/

[6] Tim Sherratt. (2021). GLAM Workbench (version v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.5603060

[7] Candela, G., Chambers, S., & Sherratt, T. (2023). An approach to assess the quality of Jupyter projects published by GLAM institutions. Journal of the Association for Information Science and Technology, 74(13), 1550–1564. https://doi.org/10.1002/asi.24835

[8] Candela, G., Gabriëls, N., Chambers, S., Dobreva, M., Ames, S., Ferriter, M., Fitzgerald, N., Harbo, V., Hofmann, K., Holownia, O., Irollo, A., Mahey, M., Manchester, E., Pham, T.-A., Potter, A. and Van Keer, E. (2023), "A checklist to publish collections as data in GLAM institutions", Global Knowledge, Memory and Communication, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/GKMC-06-2023-0195

[9] Mahey, M., Al-Abdulla, A., Ames, S., Bray, P., Candela, G., Chambers, S., Derven, C., Dobreva-McPherson, M., Gasser, K., Karner, S., Kokegei, K., Laursen, D., Potter, A., Straube, A., Wagner, S-C. and Wilms, L. with forewords by: Al-Emadi, T. A., Broady-Preston, J., Landry, P. and Papaioannou, G. (2019) Open a GLAM Lab. Digital Cultural Heritage Innovation Labs, Book Sprint, Doha, Qatar, 23-27 September 2019. https://glamlabs.io/books/open-a-glam-lab/


4:15pm - 4:30pm

Surveying cultural heritage data labs

Kaspar Beelen1, Marten Düring2, Danièle Guido2

1School of Advanced Study, University of London, United Kingdom; 2Centre for Contemporary and Digital History, University of Luxembourg

The “Always already computational. Collections as data” paradigm, coined by a research project of the same name, has resonated strongly since 2016 among libraries, archives and other GLAM institutions worldwide. Many of them strive to offer access to their data and are experimenting with public APIs, dedicated data labs, data dumps, and even closed computing environments. In parallel, the decidedly computational analysis of cultural heritage data has emerged as a vibrant subfield which has so far produced a dedicated journal, a conference series, workshops and monographs. We define the subfield “computational humanities” as a distinct user group of computer-savvy humanists who wish to analyze cultural (heritage) data at scale, harnessing advanced methods from data science and machine learning.

This short paper addresses the question of the extent to which GLAM institutions succeed in meeting the needs of the research community. It is motivated by our ongoing work to create a data lab for the impresso project. impresso aims to break down national and institutional data silos, providing unified access to newspaper and radio archives in Western Europe. The forthcoming impresso data lab strives to facilitate access to such a complex, multilingual and multimodal collection, especially focussing on the “programming historians” as a distinct user group.

The presentation will include an overview of the current state of the art in commercially and publicly funded cultural heritage data labs, present user requirements and researcher personas as well as transparency requirements. More specifically, this entails:

  1. A survey of data labs for computational humanities research: the first part of our analysis comprises a survey of existing data labs in the fields of cultural heritage and digital humanities. To determine the shape of the impresso data lab, we need to gather ideas and best practices. We investigate how data labs provide access to collections, e.g. via APIs, data dumps or other means, what type of information they make available (metadata, text, image) and also to what extent these labs manage to integrate heterogeneous data (or provide access in parallel). Besides access, we inspect whether labs provide computational infrastructure to support the analysis and exploration of their data, for example by allowing users to spin up dedicated VMs or, less costly, Google Colab notebooks or Binder environments. We especially focus on the role of notebooks as a bridge between infrastructure and research applications (Melgar-Estrada et al., 2019).

  2. User requirements and researcher personas: after establishing what exists (in terms of data labs), we elicit user requirements from researchers interested in working with historical media archives at scale. Building an infrastructure doesn’t automatically mean it will be used by the community (Zundert, 2012). Therefore, in the second part of the presentation, we report on interviews conducted with researchers in the computational humanities. More generally, we will discuss how we envisage creating communities around the tools and models we develop as part of the data lab, and ensuring longer-term support and use (Arnold et al., 2019).

  3. Requirements for transparency and data-criticism: transparency has been a key value of the impresso project since its inception. The quality of digital research depends on being able to understand (and control) the process through which data was collected, processed and analyzed. We assess how a data lab can maximize both transparency and utility (allowing users to look under the hood and be in charge of the research, without this becoming a burden or hindrance). We discuss various methods to enhance transparency, for example through collecting paradata on the collections by documenting archival knowledge; releasing the models used in processing data; ensuring users can recreate and repurpose data pipelines; and facilitating data-criticism through overviews of both present and missing data (Beelen et al., 2023).

The computational analysis of cultural heritage data has been embraced by both data providers and the digital humanities community. It is, however, not obvious how exactly institutions can effectively and efficiently support research practices. This short presentation will report on the lessons we learned during the survey, interviews, and our own design process.