Conference Agenda

Session Overview
Session: Poster session
Time: Thursday, 30 May 2024, 4:40pm - 5:30pm
Location: Háma


https://dhnb.eu/conferences/dhnb2024/posters/
Presentations

Insights into the Labour’s Memory Project Infrastructure

Raphaela Heil1, Theo Erbenius1, Isto Huvila2, Eva Pettersson3, Örjan Simonson1, Olle Sköld2

1Popular Movements’ Archive Uppsala, Sweden; 2Department of ALM, Uppsala University, Sweden; 3Department of Linguistics and Philology, Uppsala University, Sweden

The Labour’s Memory (LM) project aims to make the annual and financial reports from the years 1880 to 2020 from local, regional and national Swedish blue-collar trade union organisations, as well as their international umbrella organisations, the International Trade Secretariats (ITS) and the International Confederation of Free Trade Unions (ICFTU), digitally available to researchers and trade union organisations via a dedicated web platform. The report documents fulfil a similar purpose for organisations worldwide and can therefore provide a basis for comparison and serve as entryways to the life and work of trade unions.

The reports from the various organisations are held and digitised at the collaborating archives: the Swedish Labour Movement’s Archive and Library (ARAB, Stockholm, Sweden), the Popular Movements’ Archive in Uppsala (FAC, Uppsala, Sweden), the Archive of Social Democracy (AdSD, Bonn, Germany) and the International Institute of Social History (IISH, Amsterdam, the Netherlands). The presentation of the digitised reports on the platform and the users’ experiences are enhanced via computational linguistics (spelling normalisation and named entity recognition, to facilitate full-text and faceted searches), computerised image processing (automatic text recognition, which makes the report contents digitally available) and user-driven design methodologies (for example, eliciting the needs and goals of platform users). This expertise is provided by researchers from the Department of Linguistics and Philology, the Department of Information Technology and the Department of ALM at Uppsala University.

The purpose of this poster is to present the Labour’s Memory project to the wider audience of the DHNB community and share experiences from the process of developing the digital infrastructure with researchers and engineers. Besides providing an update on the progress of the project, the poster will focus on sharing insights into the infrastructural design choices made during the process to inform future work on digital platforms aimed at a mixed audience.

In LM, the Omeka S system (https://omeka.org/s/) was chosen as the framework for the web platform through which the digitised material will be made available. For each report, the metadata and transcriptions are delivered directly to the platform from each of the four archiving institutions, in order to leverage Omeka’s search and filtering capabilities. In addition, the respective digital images are made available via the International Image Interoperability Framework (IIIF, https://iiif.io/), which ensures a unified and well-defined interface for accessing whole collections, individual pages or even portions of an image, regardless of each institution’s individual implementation. In the case of LM, the choice of IIIF furthermore allows for the reuse of existing systems (IISH, AdSD) and facilitates the implementation of an infrastructure that can be reused for future projects (ARAB, FAC). The poster will highlight the advantages of an Omeka S based approach, and how and why the system was configured and customised for LM. Moreover, the limitations of using an existing framework, compared with developing a custom platform, will be discussed.
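To illustrate the kind of access that IIIF standardises, the following minimal sketch builds IIIF Image API (version 3) request URLs for a whole page and for a rectangular portion of it. The server address and the page identifier are hypothetical placeholders and do not refer to the endpoints of the LM archives.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API 3.0 request URL of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers may contain reserved characters such as '/', so they
    # are percent-encoded before being placed in the URL path.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and page identifier, for illustration only.
BASE = "https://iiif.example.org/image/3"
PAGE = "reports/1923/page-004"

# The whole page at the largest size the server offers.
full_page = iiif_image_url(BASE, PAGE)

# A portion of the page: region is x,y,width,height in pixels of the
# full image; size "400," scales the result to a width of 400 pixels.
detail = iiif_image_url(BASE, PAGE, region="100,200,800,600", size="400,")
```

Because the URL pattern is the same for every compliant server, a client written against one archive's images would in principle work unchanged against another's.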

Labour’s Memory is funded by Riksbankens Jubileumsfond under grant agreement IN20-0040.



Letter Collections - from Word to Web

Senka Drobac1, Hanna-Leena Paloposki2, Ilona Pikkanen2

1University of Helsinki, Finland; 2The Finnish Literature Society, Finland

This paper describes the process of transforming letter catalogues written as Word documents into Resource Description Framework (RDF) data for publishing on the Semantic Web. Part of the digital humanities consortium project Constellations of Correspondence (CoCo) (Tuominen et al. 2022, Drobac et al. 2023a), this work aims to aggregate, harmonize, link, enrich, and publish 19th-century epistolary metadata from various Finnish Cultural Heritage (CH) organizations. A key challenge in this task is the catalogues' format. Although well-suited for human use, their inconsistent formatting and structure pose difficulties for computational processing.

Many CH organizations, including the National Library of Finland and the Swedish Literature Society in Finland, keep their epistolary collections in traditional Word documents. Typically a file begins with the record creator's name and biography, followed by details of documents in their archive. The Letter Exchange section is usually divided into subcategories such as Received Letters, Sent Letters, Letter Concepts, and exchanges between other individuals. Each subsection contains varying details about correspondences. The creation of these catalogues over decades by different archivists has led to many exceptions in both formatting and structure.

Automatically parsing these files is complicated due to the variation in document structure and line-level inconsistencies. Sometimes, documents include information about multiple persons, breaking the typical structure format. For instance, the Åkerman-Voipio family archive contains records of 19 different persons. Personal archives can also include documents of family members, as seen in the Frans Victor Hannus archive, which includes records of his wife's correspondence. Additionally, the lack of standard naming conventions in subsections and the possibility of letter correspondence information spanning multiple lines with additional comments or locations add to the complexity.

To confront these challenges, our initial step was to make the documents uniform manually. Research assistants reviewed the documents for inconsistencies and standardized the format for automatic parsing. This process involved separating catalogues with multiple main archival persons, harmonizing subsection titles, and introducing specific markers for line breaks and comments. Our goal was to standardize sections, especially those involving correspondence between other actors, replacing diverse formats with a consistent one.

Following this groundwork, we created a rule-based parser capable of reading the standardized Word documents. It extracts essential information such as actors (sender, recipient), dates of sending, number of letters sent, and archival information. Whenever available, we also capture biographical information on persons. On occasion, the parser also identifies details on mentioned persons and places, letter types, or other relevant information. Once all required information is extracted, it is fed into the transformation pipeline. This pipeline converts the data into RDF following the CoCo model (Drobac et al. 2023b) and publishes it on the Semantic Web portal.
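The parse-then-transform idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the CoCo parser: the line format ("name; N letters; YYYY-YYYY"), the `LetterRecord` fields, and the `http://example.org/letters/` namespace are all hypothetical, and the output is simplified N-Triples with plain string literals rather than the CoCo data model.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical standardized catalogue line: "name; N letters; YYYY-YYYY"
LINE_RE = re.compile(
    r"^(?P<name>[^;]+);\s*(?P<count>\d+)\s+letters?;\s*"
    r"(?P<start>\d{4})(?:-(?P<end>\d{4}))?\s*$"
)

EX = "http://example.org/letters/"  # hypothetical namespace, not the CoCo model


@dataclass
class LetterRecord:
    sender: str
    recipient: str
    count: int
    start: int
    end: Optional[int]


def parse_line(line, creator, section):
    """Parse one catalogue line; return None if it does not match the rule."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    name = m.group("name").strip()
    # In a "received" section the listed person wrote to the archive creator;
    # in any other section the creator is treated as the sender.
    sender, recipient = (name, creator) if section == "received" else (creator, name)
    end = m.group("end")
    return LetterRecord(sender, recipient, int(m.group("count")),
                        int(m.group("start")), int(end) if end else None)


def to_ntriples(rec, rec_id):
    """Serialize a record as simplified N-Triples (all objects as plain literals)."""
    def lit(value):
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    subj = f"<{EX}record/{rec_id}>"
    triples = [
        (subj, f"<{EX}sender>", lit(rec.sender)),
        (subj, f"<{EX}recipient>", lit(rec.recipient)),
        (subj, f"<{EX}count>", lit(rec.count)),
        (subj, f"<{EX}startYear>", lit(rec.start)),
    ]
    if rec.end is not None:
        triples.append((subj, f"<{EX}endYear>", lit(rec.end)))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


record = parse_line("Topelius, Zacharias; 12 letters; 1867-1871",
                    creator="Runeberg, Fredrika", section="received")
ntriples = to_ntriples(record, 1)
```

Lines that do not match the rule return `None` instead of raising an error, so that they could be collected for manual review, which mirrors the combination of manual and automatic processing described here.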

This combination of manual and rule-based automatic processing ensures accurate data handling, making the process both reliable and trustworthy. It is a crucial step in making these historical documents accessible on the Semantic Web, thereby enhancing their utility for academic research and public interest.

This poster will present the key aspects of our methodology, challenges faced, and the innovative solutions we implemented.



Large language models to supercharge digital humanities

Andres Karjus1,2

1Tallinn University; 2Estonian Business School

The increasing capacities of large language models present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, augmenting and automating qualitative analytic tasks previously typically allocated to human labor. This contribution describes a systematic qual-quant mixed methods framework to harness and combine expert knowledge, machine scalability, and rigorous quantification, with attention to transparency and replicability. Seventeen machine-assisted case studies are showcased as proof of concept. These cover linguistic and discourse analysis, lexical semantic change detection, interview analysis, historical event cause inference and text mining, detection of political stance, text and idea reuse, genre composition in literature and film, social network inference, automated lexicography, missing metadata augmentation, and multimodal visual cultural analytics. In contrast to the focus on English in the emerging LLM applicability literature, many examples here deal with scenarios involving smaller languages and historical texts prone to digitization distortions. In all but the most difficult tasks requiring expert knowledge, generative LLMs can demonstrably serve as viable research instruments or artificial assistants, reflecting recent LLM applicability research (cf. Gilardi et al. 2023, Ziems et al. 2023). Machine output, like human annotation, may contain errors and variation, but the agreement rate can and should be accounted for in subsequent statistical modeling; a bootstrapping approach is discussed. The replications among the case studies (Mulder et al. 2023, Sobchuk et al. 2023, Kanger et al. 2022) illustrate how tasks previously requiring potentially months of team effort and complex computational pipelines can now be accomplished by an LLM-assisted scholar in a fraction of the time.
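The abstract does not specify the bootstrapping procedure, so the following is only a generic sketch of one way to attach uncertainty to a machine-human agreement rate: a percentile bootstrap over a small validation set. The labels, function names, and parameter defaults are illustrative assumptions.

```python
import random


def agreement_rate(machine, human):
    """Share of items on which machine and human labels coincide."""
    return sum(m == h for m, h in zip(machine, human)) / len(human)


def bootstrap_ci(machine, human, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the agreement rate."""
    rng = random.Random(seed)
    n = len(human)
    rates = []
    for _ in range(n_boot):
        # Resample item indices with replacement, keeping label pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        rates.append(sum(machine[i] == human[i] for i in idx) / n)
    rates.sort()
    lower = rates[int((alpha / 2) * n_boot)]
    upper = rates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented validation labels, for illustration only.
human = ["pro", "con", "pro", "neutral", "con", "pro", "pro", "con", "neutral", "pro"]
machine = ["pro", "con", "pro", "con", "con", "pro", "pro", "con", "neutral", "con"]

point = agreement_rate(machine, human)
lower, upper = bootstrap_ci(machine, human)
```

The resulting interval, rather than the point estimate alone, could then be carried into downstream models so that conclusions reflect annotation uncertainty.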



Social Media Analysis of Public Reactions to the Israel-Gaza War: Insights from Facebook and Instagram

Wajdi Zaghouani2, Anissa Jrad1

1HBKU, Qatar; 2HBKU, Qatar

The Israel-Gaza War remains a highly intricate and polarizing issue that captures global attention. This study delves into the dynamics of public reactions on Facebook and Instagram, platforms pivotal in shaping the discourse around this geopolitical event. We analyze user-generated content to uncover how individuals react to and engage with posts related to the Israel-Gaza War, and what insights can be gleaned from their interactions.

Research Question: How do users on Facebook and Instagram engage with and respond to content about the Israel-Gaza War? What patterns emerge in their interactions, and how do these reflect broader public sentiment and discourse?

Methodology: Utilizing CrowdTangle, a comprehensive social media analytics tool, we compiled a large dataset of posts from Facebook and Instagram. Our collection strategy involved keyword searches and hashtag tracking (e.g., #Israel, #Gaza, #PeaceInTheMiddleEast) to ensure a diverse representation of perspectives—from news coverage and official communications to grassroots activism and personal narratives. We meticulously documented user interactions, including likes, shares, comments, and emotional reactions, to analyze engagement patterns.

Findings:

Temporal Patterns: Examination of post and reaction distributions revealed spikes in activity following key events like bombings or attacks, reflecting an initial shock and engagement that brings graphic images and major announcements to the forefront of global attention. Over time, certain hashtags or narratives gain traction, maintaining sustained discussions that influence public perception and policy discussions.

Sentiment Analysis: Sentiment analysis of user comments and reactions indicated significant negative sentiments towards the actions of both sides, especially highlighting issues like hospital bombing, censorship, misinformation, and humanitarian impacts. Sentiments can shift rapidly in response to new incidents or information disclosures. Moreover, support and opposition for each party vary widely across different social media platforms and user communities, with some showing a higher prevalence of pro-Palestinian content while others display more balanced or varied political leanings.

User Engagement: Analysis of engagement metrics showed that content depicting graphic imagery or major updates tends to receive heightened interaction, indicating user priorities and the impact of visual content in activism.

Influential Voices: We identified key influencers, including activists, media outlets, and notable personalities, who significantly shape the narrative and mobilize engagement through their extensive reach.

Geographical Analysis: Geotagged data provided insight into regional sentiment variations, underscoring how local contexts influence perceptions of the war.

This research elucidates the critical role of social media in facilitating dialogue, spreading information, and fostering collective expression during complex geopolitical events. By deploying a robust methodology and an extensive dataset, our study enhances understanding of public reactions and the influence of social media on contemporary discourse and activism surrounding the Israel-Gaza War on key platforms like Facebook and Instagram.

Acknowledgments

This work was made possible by grant NPRP14C-0916-210015 / MARSAD Sub-Project from the Qatar National Research Fund / Qatar Research Development and Innovation Council (QRDI). The contents herein reflect the work and are solely the authors’ responsibility.



MARSAD Observatory: Monitoring and Analyzing Social Networks Topics in the MENA Region

Wajdi Zaghouani

Hamad Bin Khalifa University, Qatar

The MARSAD Observatory aims to revolutionize social media data monitoring and analysis in the Middle East and North Africa (MENA) region. This live social media observatory provides real-time insights into the dynamic digital discourse in MENA, serving individuals and institutions. MARSAD takes a multidisciplinary approach to offer comprehensive data on trending topics, sentiments, emotions, and nuanced insights into gendered and generational dynamics within Qatar's digital public sphere.

The project's objectives encompass a diverse range of ambitious goals. Objective 1 involves creating a large and balanced annotated dataset that covers multiple Arabic dialects and regions, addressing a crucial gap in current data resources. Objective 2 focuses on developing an AI-powered Social Media Monitoring Platform to enable real-time data analysis, providing users with an intuitive interface to navigate the MENA region's digital landscape.

MARSAD's commitment to bridging language barriers is a distinguishing feature. It will be the first social media monitoring tool to offer support for Arabic dialects, capturing diverse linguistic nuances prevalent in the region and making it a unique and indispensable resource.

Beyond real-time monitoring, MARSAD supports archiving topics (Objective 3), ensuring that historical public datasets remain accessible. This feature enables researchers to analyze the evolution of digital discourse over time, crucial for studying online narratives, social trends, and the impact of digital activism.

Fostering inclusivity within the digital public sphere is central to Objective 4. MARSAD investigates gendered and generational dynamics in Qatar's digital landscape, with a focus on marginalized groups. Informed by feminist theories, the project seeks to illuminate both empowering and challenging aspects of digital activism, promoting more inclusive digital cultures in MENA.

Methodologically, MARSAD employs cutting-edge techniques in Artificial Intelligence (AI) and Natural Language Processing (NLP). Data collection relies on the X API, extracting live public X streams from MENA. Premium access to the X API ensures historical data availability. A substantial sample undergoes meticulous annotation for sentiments, emotions, hate speech, irony, sarcasm, and stance.

To achieve Objective 1, Python scripts automate data collection, focusing on specific keywords and posts in standard Arabic and dialects. Skilled annotators follow comprehensive guidelines, ensuring high accuracy through blind and double annotations.
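The abstract does not state how double annotations are compared. One common check is Cohen's kappa, which corrects raw agreement between two annotators for agreement expected by chance. The sketch below is an illustrative assumption about how such a check could look, with invented labels.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented sentiment labels from two blind annotators.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Items on which the annotators disagree could be routed to adjudication before entering the training data.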

Objective 2 integrates advanced machine learning and deep learning techniques using the annotated dataset. Techniques include LSTM, Bi-LSTM, CNNs, RNNs, and transformers like BERT and GPT-2.

The project culminates in a dynamic real-time social media monitor, offering daily trend exploration, topic and hashtag searches, and interaction visualization. An Application Programming Interface (API) facilitates external application integration. MARSAD includes a user feedback feature, contributing to ongoing annotation and data enhancement efforts.

In conclusion, the MARSAD Observatory advances our understanding of MENA's digital discourse. Through innovative data collection, analysis, and inclusivity, MARSAD empowers researchers, policymakers, and the community to navigate and shape the region's digital landscape effectively. Interdisciplinary collaboration, cutting-edge technology, and a critical perspective drive MARSAD to contribute significantly to the study of digital communication in MENA, promoting more inclusive and informed digital cultures.

This work was made possible by grant NPRP14C-0916-210015 / MARSAD from Qatar National Research Fund.



Uppsala Runestaff Database

Michael Dunn

Uppsala University, Sweden

Runestaves are perpetual calendars made and used in Sweden and surrounding areas from the turn of the millennium until they were superseded by printed almanacks in the 17th or 18th century (Hallonquist 1994). They were usually carved on wood, and took various forms, most commonly staff, sword, paddle, or (wooden) book. The runestaves were marked with sequences of symbols, mostly from the younger futhark runic alphabet, from which could be read Sundays, the new moons, and various fixed feasts over any year, and which allowed a simple calculation of Easter. As a calendrical calculator, the runestaves represent a folk-scientific instrument of considerable complexity (Halonen 2020). The Uppsala Runestaff database contains aligned transcriptions of more than 600 runestaves, representing more than half of the runestaves known to exist. This digital tool is designed primarily to facilitate analysis of runestaves as texts, using stemmatological and cultural evolutionary approaches (Roelli 2020). Analysis of the textual content of the runestaves corpus provides insight into how they were used, allows reconstruction of regional and temporal traditions in their manufacture, and gives evidence of the manner in which the knowledge required to make and use them was transmitted.
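As a modern arithmetic analogue of the calendrical calculation mentioned above, the sketch below computes a year's golden number (its position in the 19-year lunar cycle used in perpetual calendars) and the Julian-calendar date of Easter, using the standard algorithm published by Meeus. It is not a model of how the symbols on a staff were read, and the function names are illustrative.

```python
def golden_number(year):
    """Position of the year in the 19-year lunar cycle, from 1 to 19."""
    return year % 19 + 1


def julian_easter(year):
    """Return (month, day) of Easter Sunday in the Julian calendar."""
    a = year % 4
    b = year % 7
    c = year % 19
    d = (19 * c + 15) % 30
    e = (2 * a + 4 * b - d + 34) % 7
    # Easter Sunday falls d + e days after 22 March; the two lines
    # below convert that offset into a month (3 or 4) and a day.
    month = (d + e + 114) // 31
    day = (d + e + 114) % 31 + 1
    return month, day


easter_2008 = julian_easter(2008)   # (4, 14), i.e. 14 April in the Julian calendar
cycle_2008 = golden_number(2008)    # 14
```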



Using ChatGPT for (semi-) automatic subject indexing of different document types

Johannes Widegren1, Koraljka Golub1, Jue Wang2

1Linnaeus University, Sweden; 2University of Chinese Academy of Sciences

We are currently in a phase where it seems that new applications for large language models (LLMs) in general and generative pre-trained transformers (GPTs) in particular are tested every day. Examples are as diverse as automated data mining for building energy management (C. Zhang et al., 2024), evaluating the accuracy of differential-diagnosis lists for clinical vignettes (Hirosawa et al., 2023), and human-machine-augmented intelligent vehicles (J. Zhang et al., 2023). They can also be used to extract structured information from unstructured text (Söderström, 2023). This poster presents a pilot study on one such application, the potential use of OpenAI’s ChatGPT for automatic subject indexing of archival documents in Swedish, Swedish LGBTQ fiction and Chinese fiction. The accuracy of the assigned subject index terms is compared with the output from ANNIF (Suominen et al., 2022), an established automatic subject indexing software used in libraries.

The results display an impressive degree of accuracy for the subject index terms assigned by ChatGPT, but challenges have been identified in all three document types. For example, the appropriateness of the terms for historical text is highly questionable at times. The terms assigned by ANNIF, in contrast, are drawn from a controlled vocabulary, which ensures that they have been manually selected as suitable subject index terms. The pilot study shows that it is feasible to run the index terms suggested by ChatGPT through ANNIF to get index terms from a controlled vocabulary while harnessing ChatGPT’s state-of-the-art natural language understanding. This presents intriguing opportunities for implementing GPTs in the archival/library cataloging workflows. Semi-automatic approaches and manual checks are still to be preferred, however, in order to maintain the authenticity of the generated metadata.
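The idea of constraining free-text suggestions to a controlled vocabulary can be sketched as below. This is not how ANNIF works internally and does not use its API: plain fuzzy string matching from the Python standard library stands in for it, and the vocabulary, the suggested terms, and the cutoff value are invented for illustration.

```python
from difflib import get_close_matches


def map_to_vocabulary(free_terms, vocabulary, cutoff=0.8):
    """Map free-text terms to controlled-vocabulary labels by string similarity.

    Returns (mapped_labels, unmatched_terms). Unmatched terms are kept
    so that a cataloguer can review them manually.
    """
    by_lower = {label.lower(): label for label in vocabulary}
    mapped, unmatched = [], []
    for term in free_terms:
        hits = get_close_matches(term.lower(), list(by_lower), n=1, cutoff=cutoff)
        if hits:
            label = by_lower[hits[0]]
            if label not in mapped:
                mapped.append(label)
        else:
            unmatched.append(term)
    return mapped, unmatched


# Invented controlled vocabulary and model-suggested terms.
vocabulary = ["Trade unions", "Letters", "Family life", "Migration"]
suggested = ["trade union", "letters", "space travel"]

mapped, unmatched = map_to_vocabulary(suggested, vocabulary)
```

Keeping the unmatched terms visible, instead of silently discarding them, fits the semi-automatic workflow with manual checks that the pilot study recommends.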



A new resource of Icelandic sagas: Digitizing normalized scholarly editions and enhancing textual data

Ellert Þór Jóhannsson1, Þórður Ingi Guðjónsson2, Finnur Ágúst Ingimundarson1

1Árni Magnússon Institute for Icelandic studies, Iceland; 2Old Icelandic Text Society

Introduction

This poster presents a new project aimed at digitizing Icelandic saga text editions and further enhancing the resulting textual data. This includes implementing lemmatization processes, creating a corpus, linking the lemmas with a lexicographic resource, and constructing a comprehensive inflectional database. The aim is to have the material accessible online on a user-friendly platform. The initiative seeks to provide scholars, researchers, and language enthusiasts with a dynamic resource for exploring and understanding Old Norse vocabulary and linguistic structures in a detailed and straightforward manner.

Background

Icelandic family sagas represent the peak of Icelandic medieval writing culture. Until now these sagas have only been freely available online in a normalized form compatible with Modern Icelandic language standards, both orthographically and morphologically, eliminating various nuances present in their original Old Norse form.

Íslenska fornritafélagið (Old Icelandic Text Society) was founded in 1928 and has ever since worked on publishing Icelandic medieval texts in print with detailed introductions and textual commentary. The editions of the society use their own normalization standard of the language as it was around the year 1200. Most of the print volumes were published before electronic processing of texts. The current project aims to facilitate access to the texts closer to their original linguistic state and to create a versatile resource for the study of Old Norse in a standardized form.

Methods and workflow

  1. Digitization of text editions:
  • Employ Optical Character Recognition (OCR) and text-processing techniques to convert printed editions into machine-readable formats.
  • Enhance accessibility by providing an online platform to explore the digitized texts.
  2. Lemmatization:
  • Implement natural language processing algorithms to identify and categorize word forms within the texts.
  • Facilitate linguistic analysis by generating lemmatized versions of the texts, revealing the base or dictionary forms of words.
  3. Linking to The Dictionary of Old Norse Prose (ONP):
  • Integrate the lemmatized texts with ONP, creating a link between the sagas and an authoritative lexicographic resource.
  • Enable users to cross-reference words with their definitions, contextualizing the language within a linguistic framework and the broader context of other medieval text genres.
  4. Inflectional Database:
  • Develop a paradigmatic database showcasing different inflectional forms for each headword, allowing users to explore morphological variations.
  • Include unattested forms in the inflectional descriptions, providing a comprehensive view of potential linguistic forms and expanding the understanding of Old Norse grammar.
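How lemmatization and an inflectional database with attested and unattested forms could fit together is sketched below. This is an illustrative data structure, not the project's implementation: the class and function names are assumptions, a simple lookup table stands in for the natural language processing step, and the attestation flags on the example paradigm of hestr ("horse") are invented for demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    form: str        # inflected word form
    features: str    # grammatical description, e.g. "dat.pl"
    attested: bool   # whether the form occurs in the corpus (invented here)


# One headword with its paradigm; attestation flags are illustrative only.
PARADIGMS = {
    "hestr": [
        Form("hestr", "nom.sg", True),
        Form("hest", "acc.sg", True),
        Form("hesti", "dat.sg", True),
        Form("hests", "gen.sg", True),
        Form("hestar", "nom.pl", True),
        Form("hesta", "acc.pl", True),
        Form("hestum", "dat.pl", False),
        Form("hesta", "gen.pl", True),
    ],
}

# Reverse index from word form to every (lemma, features) analysis.
INDEX = {}
for lemma, forms in PARADIGMS.items():
    for f in forms:
        INDEX.setdefault(f.form, []).append((lemma, f.features))


def lemmatize(token):
    """Return the sorted list of candidate lemmas for a token."""
    return sorted({lemma for lemma, _ in INDEX.get(token.lower(), [])})


def analyses(token):
    """Return all grammatical analyses recorded for a token."""
    return [features for _, features in INDEX.get(token.lower(), [])]


def unattested_forms(lemma):
    """List paradigm forms that are flagged as not occurring in the corpus."""
    return [f.form for f in PARADIGMS.get(lemma, []) if not f.attested]
```

A form such as hesta maps to one lemma but two analyses, which is the kind of ambiguity a linked dictionary entry and a visible paradigm can help users resolve.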

Results

This project contributes to the field of Old Norse studies by offering a unified digital platform that combines digitized sagas, a lemmatized corpus, and a link to a comprehensive dictionary. The inclusion of inflectional descriptions offers a useful resource to Old Norse learners and researchers, promoting a more nuanced understanding of Old Norse morphology. The platform's user-friendly interface will cater to a diverse audience, fostering research and facilitating the exploration of Old Norse language resources. As a result, important texts will be more accessible and engaging for scholars, learners and enthusiasts alike.



404 Not Found. Dire Straits and Safe Havens for Digital Scholarly Editions in Norway

Annika Rockenberger, Johanne Emilie Christensen, Federico Aurora

University of Oslo Library, Norway

Since the inaugural DHNB conference in Oslo in 2016, presentations and discussions about digital scholarly editions (DSEs) at DHNB conferences have dwindled. The "big names" and national icons (Henrik Ibsen, N.F.S. Grundtvig, S. Kierkegaard, Strindberg, Z. Topelius, L. Holberg etc.) have been published, and the eagerness and technological momentum of the earlier days of DSEs have come to a halt. We're now in a situation where even the larger, previously well-funded DSE projects face issues with legacy code and systems (Evensen 2020), maintenance, research software engineer/developer knowledge transfer, necessary back- and front-end updates, and the institutional requirement to make research data FAIR, all while no funds or research time are allocated (Baunvig et al. 2023). Not to mention the many small DSE projects in which individual researchers put much effort, knowledge, and expertise into creating valuable scholarly resources that are now virtually useless because they can no longer be accessed.

Against the backdrop of this serious situation, the University of Oslo Library has launched an initiative to systematically investigate the "state of affairs" of DSEs in Norway and chart a route to a sustainable national infrastructure for digital editions.

Building and fostering a network of interdisciplinary researchers, software engineers/developers, and cultural heritage specialists is one measure for achieving sustainability. With the network, we aim to retain and spread knowledge about DSEs, their technical infrastructure and local solutions for hosting and maintenance. We believe the key to keeping valuable humanities data in the form of DSEs is having a strong community that is interested in, and can lobby for, allocating funds and resources to an infrastructure fit for long-term archiving and accessibility of editions.

Furthermore, we believe that sustainability can only be achieved when DSEs are conceptualized, planned, developed, and published within a realistic setting and with as much clarity about standards for data, software, systems, and maintenance requirements as possible. We will thus develop a set of recommendations for researchers at the University of Oslo and beyond, built on the outcomes of a feasibility study we are doing in the fall of 2024 and an in-depth survey of the state of the art of DSEs in the spring of 2024.

In our poster, we will provide an overview of the project - its background and aims - and will highlight our work with (a) community building as a key to sustainability, (b) the design and expected outcomes of our DSE survey, and (c) the design of our feasibility study for implementing the lessons learned into a national infrastructure.



Collaborative Infrastructure as a Disruptive Force for Interdisciplinary Digital Scholarship, Illustrated by Use Cases of the Transkribus Stakeholder Platform.

Andy Stauder1, Annika Rockenberger2, Minna Kaukonen3, Bragi Þorgrímur Ólafsson4, Unnar Ingvarsson5, Therese Foldvik6, Johanne Emilie Christensen2

1READ-COOP SCE, Austria; 2University of Oslo Library; 3National Library of Finland; 4National and University Library of Iceland; 5National Archives of Iceland; 6University of Oslo

Introduction:

We explore the positive feedback loop created by shared infrastructure that enables collaboration in scholarly research. Collaboration can take various forms, including data sharing, joint AI-model training, direct collaboration, and education. Each of these forms contributes to the loop, leading to improved capabilities and usefulness of the collaborative infrastructure, making collections of documents more valuable and increasing the attractiveness of scanning projects for memory institutions such as museums, libraries and archives. This leads to more interest in the infrastructure, which in turn makes it more useful, and so forth. This positive feedback loop leads to an increased study of original sources and greater replicability of studies, i.e., it strengthens scholarly quality criteria. The framework for this collaborative infrastructure is the READ co-operative. This is a stakeholder-oriented rather than profit-oriented social business, which was founded for the purpose of maintaining and further developing the technology and community built up during two EU-funded academic research projects. The socio-technological platform built during these projects, now widely known in the academic community and beyond, is called Transkribus.

Data Sharing:

Data sharing is a fundamental aspect of collaboration in scholarly research. By sharing data, researchers can build upon each other's work, leading to new insights and discoveries. This process creates a positive feedback loop where the more data is shared, the more research can be conducted, and the more knowledge is generated. The same is true for training data that is fed into recognition models for handwritten text recognition, natural language processing and information extraction.

Direct Collaboration:

Direct collaboration involves researchers working together on a common project, e.g. the transcription, annotation, statistical analysis or digital publication of original sources. This type of collaboration can lead to the development of new ideas, methods, and theories. It also fosters a sense of community and support among researchers, which can further enhance the research process. The discussed Transkribus software platform fosters this type of collaboration through its cloud-based approach.

Education:

Education is a critical component of collaboration in scholarly research. By educating students and early-career researchers on the importance of collaboration, including collaboration tools and infrastructure, we can create a new generation of researchers committed to sharing data and working together. This will further strengthen the positive feedback loop of collaboration and lead to even greater advances in scholarly research.

Human-Artificial-Intelligence Positive Feedback Loop:

The combination of networked human intelligence and artificial intelligence creates a positive feedback loop that amplifies the benefits of collaboration. When these two patterns of information processing are combined, they can create systems that are more efficient, effective, and adaptable than either pattern alone. Not only does this lead to greater quantities of historical data that can be processed, but also to qualitatively new insights.

Compounding Effect:

The positive feedback loop of collaboration has a compounding effect, leading to exponential growth in the amount of knowledge generated. As more researchers collaborate, more data is shared, better AI models trained, more research conducted, and more knowledge produced. This cycle continues to repeat itself, leading to a rapid expansion of human knowledge, across disciplines.

Practical Use Cases:

The collaborative Transkribus infrastructure has applications in many disciplines, in particular in the digital humanities, including in the Nordic and Baltic countries. A few of them are presented in this paper, answering the following questions: A) What was the general scope and purpose of the projects? B) If the project coordinator is a memory institution, how has it collaborated with researchers and research institutions? Or, if the project coordinator is a researcher, research team or research institution, how have they collaborated with memory institutions? C) What role have Transkribus and the co-operative played in this?

University of Oslo Library

A)

  1. Transcribing two Early Modern prints for a bilingual digital scholarly edition: the Ethica Complementoria, together with the Tranchierbuch, in the German version from 1674 and its Danish translation from 1678. Public/shared models from the platform were re-used and a dedicated model was trained for German print to minimise manual corrections. Export in XML/TEI will form the basis of a digital edition.

  2. Transcribing the private correspondence of Christopher Hansteen (Norwegian astronomer), especially the letters from the expedition to Siberia in the late 1820s. This collaboration with the National Library is re-using one of their models and customising it for Hansteen’s hand. It will be re-used to transcribe the professional correspondence held at the Museum for University History and the History of Science, University of Oslo, which collaborates with the University Library for its digitisation efforts.

  3. Planned: Transcriptions of the Library’s East Asian Special Collection, including Tibetan prints from the 18th and 19th centuries.

B)

  1. Training (Transkribus and other recognition software) and project or individual guidance. Guidance sessions with researchers at the University who want to use handwritten text recognition software for their transcriptions. These range from Arabic, Ottoman Turkish, Coptic, French, Spanish, German, English, Danish, Norwegian, and Latin texts in handwriting or print to musical notation (mediaeval) and specimen descriptions from the Museum of Natural History. The Library mainly does guidance and training sessions for researchers as part of their research support.

  2. There is collaboration with other memory institutions, namely the Museum for University History and the History of Science, the Dept. of Pedagogy and the Dept. of Economics at the University of Oslo, and the National Library of Norway.

  3. The Library aims to become a member of the Co-operative and to offer researchers and partners in smaller memory institutions access to its services. This will also be part of the skills development and knowledge sharing hub “BærUt! Sustainable Digital Scholarly Editions” from 2024 to 2026.

C)

  1. Community support and quick answers to odd problems via a dedicated Slack channel

  2. Documentation and how-to-guides

  3. Ready-to-use training materials (e.g. presentation slides)

  4. Re-use of public models

  5. A scholarship programme that supports training sessions

National Library of Finland

A)

The primary role has been within the NewsEye Project, funded by the Horizon 2020 programme. First, the researchers from the University of Helsinki, to which the National Library belongs, chose the most important newspapers from a research perspective to be reprocessed with Transkribus and to be used as pilot material in the Research and Innovation Action project. This entailed about 500 000 pages in two languages. Second, after the project, the research community speaking Finnish and Swedish has been able to enjoy the improved search results due to a large-scale reprocessing of another 2 million pages.

B)

The National Library of Finland works regularly with researchers, either as project partners in research projects or cooperating in other manners, for instance by offering data services, digital source materials or training in using both of these.

C)

The National Library of Finland has cooperated with read-coop sce to improve text recognition for Finnish historical newspapers from 1771 to the 1920s. The improvement that Transkribus provided is remarkable compared to the earlier text recognition results. The reprocessed corpus consists of about 2.5 million pages in Finnish and Swedish. The improved versions of recognized newspaper texts are accessible in the publication and presentation system of the National Library of Finland at https://digi.nationallibrary.fi. The read-coop sce team, and before that, the Transkribus project team, have provided user support, project management, and a recognition technology that made new levels of correctness possible, providing significant value to the academic community.

The National and University Library of Iceland (NULI) and National Archives of Iceland

A)

Digital humanities project via the Icelandic Centre for Digital Humanities and Arts. This is a forum for the development, hosting, and consultation on the development and access to digital databases in the humanities and arts, as well as for research based on these databases. The project aimed to train the Transkribus application to read old Icelandic handwriting. The project, carried out by historian Emil Gunnlaugsson, resulted in two models for late 18th century and 19th century Icelandic handwriting. The models have been made publicly available to Transkribus users on the platform.

B)

Being one research and one memory institution, the University and the Archives have been working through the Digital Humanities Center to foster collaboration between researchers on the one hand and holders of large collections of historical documents on the other. The goal is to promote the development of and access to research infrastructure in the field of digital humanities and to link Icelandic research to international development in the field.

C)

For the discussed project, the READ co-operative has mainly acted as a technology provider, offering an easy-to-use, customisable tool for working with historical documents, even in languages that have a relatively small number of speakers. An additional advantage is that even where recognition results are not perfect, they still make the documents easier to read for users of varying skill levels and for the general public.

The project shows that the Transkribus infrastructure is also very compatible with other types of infrastructure and collaboration projects, fostering collaboration on the metalevel, too.

University of Oslo

A)

SAMLA – digitizing Norwegian tradition archives: Three Norwegian archives containing cultural historical material, which includes folktales, legends, traditions etc. are in the process of being digitised. SAMLA has generated ca. 500,000 image files. The aim is to make this material accessible through one web-based portal, which will be launched in the autumn of 2024.

B)

The project is coordinated by a research institution and thus working from a research perspective. One crucial part of the project was the source material, whose holders are both memory and other research institutions, namely:

- Norwegian folklore archives, University of Oslo

- Norwegian ethnological research, The Norwegian folk museum

- Ethno-folkloristic archives, University of Bergen, project owner.

This means two of them are archives situated within research institutions, which makes collaboration easier due to similarities in structure and workflows, while one of the institutions is a museum. This shows the importance of strong connections between research institutions and other organisations in society, in order to have both methodological rigour in research on the one hand and relevant objects of study on the other.

C)

Transcriptions play an important role in making the material accessible. The materials vary in dialect, spelling, and layout, and are a mixture of handwritten and typed texts, and of clean copies and drafts. SAMLA is currently experimenting with Transkribus recognition to build models for layout and text, aiming for as low a character error rate as possible to make the transcriptions readable for the public. There are plans to connect Transkribus to an existing (Goobi) database for more automatic file management and for transcribing larger batches of documents before publishing the transcriptions on the web portal. SAMLA also plans to find a good workflow for crowdsourcing, using Transkribus, where the public may correct errors they find in the transcribed material.
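To make the notion of character error rate concrete, the following is a minimal sketch of how it is commonly computed: the edit (Levenshtein) distance between a reference transcription and the recognised text, divided by the length of the reference. The function names and the sample strings are illustrative assumptions; this is not the evaluation code used by Transkribus or SAMLA.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a character from ref
                            curr[j - 1] + 1,      # insert a character from hyp
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a seven-character word.
cer = character_error_rate("eventyr", "eventvr")
```

In this hypothetical example one of seven characters is wrong, so the rate is 1/7, or roughly 14 %.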

Conclusion:

The positive feedback loop of collaboration in scholarly research is a powerful force for advancing human knowledge. By sharing data, working together, and educating future researchers, we can create a more collaborative and productive research environment that will lead to even greater discoveries.



Archaeological Artefact Database of Finland (AADA)

Petro Pesonen1,2,3, Ulla Moilanen2, Meeli Roose2, Jarkko Saipio1,2, Jasse Tiilikkala2,3, Usman Sanwal2, Visa Immonen4, Outi Vesakoski2, Päivi Onkamo2

1Finnish Heritage Agency, Finland; 2University of Turku, Finland; 3University of Helsinki, Finland; 4University of Bergen, Norway

The Archaeological Artefact Database of Finland (AADA) is planned to cover all prehistoric artefacts in Finland. So far, the database offers comprehensive information on over 49,000 collection entries of Finnish archaeological materials. It covers the whole prehistory of Finland from the beginning of the pioneer settlement after the Last Ice Age (c. 8900 calBC) until the beginning of the Medieval period (c. 1300 AD). Geographically, it covers the entire territory of present-day Finland, including the Åland Islands, as well as artefacts collected before the Second World War from the territories ceded to Russia in 1945 (e.g. Karelia, Petsamo). The artefacts are categorized by type and are accompanied by photos. The database includes the details and measurements of those artefacts that can be classified typologically, excluding e.g. flakes, informal chipped tools (scrapers, awls, burins, etc.), iron knives, nails and the like.

The database provides spatio-temporal context for comparing artefacts across different time periods and regions. To facilitate data usage, we also offer a geospatial framework for visualizing and analysing the database. The AADA database offers a valuable resource for studying Finland's prehistory and is accessible on Zenodo. The data will be continuously updated in a GitHub repository managed by the Finnish Heritage Agency and the University of Turku. New versions of AADA will be released on Zenodo at regular intervals. The AADA database is part of the trend towards more open materials that can be used in collaborative research, which also represents a shift towards greater reliability and quality.



Latvian Prose Counter: from digitized books to data visualizations

Anda Baklāne, Valdis Saulespurēns

National Library of Latvia, Latvia

The Latvian Prose Counter (LPC) is a multifaceted digital platform that showcases the potential of digital text analysis and visualization, provides comprehensive insights into Latvian novels from the 19th and 20th centuries, and serves as an experimental hub for full-text and metadata analysis of these novels. This initiative is a collaborative effort between the National Library of Latvia (NLL) and the Institute of Literature, Folklore, and Art of the University of Latvia (ILFA), aiming to combine the resources of both institutions into a comprehensive digital resource. The morphological and syntactical markup of texts is produced using NLP tools created by the Institute of Informatics and Mathematics of the University of Latvia.

The poster delineates the LPC's workflow, which encompasses text digitization, preprocessing, analysis, visualization, and enrichment with references to full-text objects, authoritative data, and ILFA's database contributions. Utilizing open-source Jupyter Notebooks for data processing and visualization underscores the project's commitment to transparency and reusability.

As the number of novels digitized by the NLL increases, the content and functionalities of the LPC are constantly updated. This ongoing development is anticipated to evolve into a holistic representation of the Latvian prose landscape that will facilitate a nuanced understanding through distant reading methodologies.

At the time of this poster presentation, the Latvian Prose Counter offers insights into novels from the Corpus of Latvian Early Novels (1879-1940) and features four distinguished authors from the Soviet era (195 authors in total). Users can delve into various quantitative aspects, such as the frequency of words (categorized by author, work, and parts of speech), prevalent sentence types, and lexical diversity within texts.
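As an illustration of the kind of quantitative parameters mentioned above, the sketch below computes word frequencies and one simple measure of lexical diversity, the type-token ratio (distinct words divided by total words). The tokenisation rule, the function names and the sample sentence are assumptions made for this example; the LPC's actual notebooks may use different measures and rely on morphological analysis rather than raw word forms.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(text: str) -> Counter:
    """Count how often each word form occurs."""
    return Counter(tokenize(text))


def type_token_ratio(text: str) -> float:
    """Distinct word forms divided by total word tokens (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample: four tokens, three distinct word forms.
sample = "Saule lec. Saule riet."
frequencies = word_frequencies(sample)
diversity = type_token_ratio(sample)
```

Because the type-token ratio falls as texts get longer, comparisons between novels of different lengths would normally use fixed-size samples or a length-corrected measure.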

Moreover, the platform emphasizes the importance of data visualization in complementing the presentation of quantitative parameters. It aims not merely to furnish details on these parameters but also to present this information through engaging and intuitive visual representations.



Digital tools, citizen engagement and vulnerable cultural heritage

Eiríkur Smári Sigurðarson1, Skúli Björn Gunnarsson2

1University of Iceland, Iceland; 2Gunnar Gunnarsson Institute, Iceland

The CINE project (Connected Culture and Natural Heritage in a Northern Environment), funded by the INTERREG Northern Periphery and Arctic Programme, aimed to transform people’s experiences of outdoor heritage sites through technology, building on the idea of “museums without walls”. New digital interfaces such as augmented reality, virtual world technology, and easy-to-use apps brought the past alive, allowed people to visualise the effects of the changing environment on heritage sites, and helped them to imagine possible futures. CINE developed content management toolkits enabling curators, archivists, historians, individuals and communities to make innovative heritage projects that create unique on-site and off-site customer experiences in specific locations.

We will present the main result of the CINE project and further work on developing a “citizen science app” (Muninn) to monitor and register cultural heritage sites, to collect new information about known sites (descriptions, photographs, 360 photographs and 3D photogrammetry models) and add to geographic databases.



Jubileumsportalen – contextualizing 1923’s jubilee exhibition using digital methods

Siska Humlesjö, Johan Åhlfeldt, Anders Strinnholm

University of Gothenburg, Sweden

Jubileumsportalen is a web portal collecting the views, images and other surviving documents produced during the Gothenburg 1923 Jubilee exhibition. The Jubilee exhibition was inaugurated on 8 May 1923 by King Gustav V to commemorate the foundation of the city in 1621. Due to extremely high ambitions, financial troubles, and the aftermath of WWI, the exhibition had been delayed by two years. Within the exhibition grounds, temporary structures were erected, including a modern lighthouse, an aerial railway, the world's largest restaurant, and a 7000 square meter industrial exhibition hall. While most of these buildings were later demolished, some of Gothenburg's most iconic landmarks were established during this jubilee period.

Jubileumsportalen[1] leverages digital methods to make this part of local history available to a wider audience. The portal utilizes digitized photographs from different sources. These materials are linked to one of the contemporary maps of the exhibition area using geographical data. By providing context to high-resolution photographs available in IIIF format, the project aims to make the exhibition's materials accessible. The digitized collection includes photographs taken by an official exhibition photographer, menus from the exhibition's restaurants, official posters, and more. Additionally, the project utilizes digitized literature, such as guidebooks, official publications, and news articles, to describe and contextualize both the materials and the geographical locations.

The portal is a collaborative initiative between Gothenburg University Library and Gothenburg Research Infrastructure in Digital Humanities (GRIDH). Its purpose is to employ digital humanities technology to showcase the library's collection, with the goal of making this historical material accessible and engaging for a wider audience.

The main interface presents a contemporary map from the exhibition. The user can click the different data points on the map and access the material in depth, with links to the source material published in GUPEA (Gothenburg University Publications Electronic Archive). Places and buildings are described and contextualized to give the user an understanding of the exhibition's scope and the society that formed it. A toggle also allows the user to explore the data collectively in a gallery mode. Making the images available in IIIF gives the user the possibility to zoom in on details. All images are free to download and have citation information.

[1] Accessible at https://jubileet1923.dh.gu.se/



Representing the Íslendinga Saga As Knowledge Graphs of Events and Social Relationships: Developing Workflows Based on a Pilot Case

Shintaro YAMADA1, Jun OGAWA2, Ikki OHMUKAI1

1The University of Tokyo, Japan; 2ROIS-DS Center for Open Data in the Humanities, Japan

The sagas of medieval Iceland comprise several genres. The Íslendinga saga in the Sturlunga saga, classified as a contemporary saga, depicts events surrounding the powerful Sturlungar family clan and the social and political circumstances of the time. This research examines the Íslendinga saga with the aim of representing its narrative content as a knowledge graph.

A knowledge graph is a network of data that describes relationships between things in a machine-readable form. One way of representing such data is with RDF (Resource Description Framework), which can be used to structure information as a graph by linking entities through common formats. A knowledge graph is externally extensible; as long as a common descriptive format is used, separately created graphs can be easily integrated. Employing widely used vocabularies like CIDOC-CRM and HIMIKO will make it possible to combine knowledge graphs. It should be noted that a research project in Iceland has also begun exploring an ontology for Icelandic sagas, and part of its work can be seen in its GitHub repository.

In constructing a knowledge graph of the Íslendinga saga, this research focuses on two aspects: 1) maintaining chronological continuity when representing the various events described in the saga, and 2) capturing relationships between characters. The graphs will be used to analyze the dynamics of how the characters solve their own problems that arise in the saga, and by doing so it can reveal the dynamics of problem-solving in medieval Icelandic society.

We create two different knowledge graphs in order to represent the Íslendinga saga according to these two aspects. One is an event-oriented graph describing events in the saga and the other is a character-oriented graph outlining relationships between characters. The event-oriented graph captures entities like persons, places, and objects and their associations in events such as conflicts, killings, and lawsuits. The character-oriented graph describes kinship ties like marriages and sibling relationships, as well as social relationships between characters where possible. Both graphs are constructed according to the texts of the saga, but the character-oriented graph also includes interpreted information based on the reading of the texts, such as friendship or social relationships between characters, which are sometimes not explicitly mentioned in the texts but are implicitly recognizable through understanding the contexts. These two knowledge graphs can be integrated into a single knowledge graph representing the knowledge contained in the Íslendinga saga.
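The idea of two separately built graphs that merge into one can be sketched with plain subject-predicate-object triples, the basic unit of RDF. Everything in the example is a placeholder: the `ex:` property names, the identifiers such as `person:A`, and the events themselves are hypothetical and are taken neither from the saga nor from the project's vocabulary, which builds on CIDOC-CRM and HIMIKO. A real implementation would more likely use an RDF library and full URIs.

```python
# A triple is (subject, predicate, object), as in RDF.
Triple = tuple[str, str, str]

# Event-oriented graph: who took part in which event, where, and in what order.
event_graph: set[Triple] = {
    ("event:1", "rdf:type", "ex:Conflict"),
    ("event:1", "ex:hasParticipant", "person:A"),
    ("event:1", "ex:hasParticipant", "person:B"),
    ("event:1", "ex:tookPlaceAt", "place:X"),
    ("event:1", "ex:precedes", "event:2"),   # keeps chronological continuity
    ("event:2", "rdf:type", "ex:Lawsuit"),
    ("event:2", "ex:hasParticipant", "person:A"),
}

# Character-oriented graph: kinship and social ties between persons.
character_graph: set[Triple] = {
    ("person:A", "ex:siblingOf", "person:C"),
    ("person:B", "ex:marriedTo", "person:C"),
}

# Because both graphs share identifiers, integration is a set union.
combined: set[Triple] = event_graph | character_graph


def objects(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    """All objects linked to `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def events_of(graph: set[Triple], person: str) -> set[str]:
    """All events in which `person` is recorded as a participant."""
    return {s for s, p, o in graph if p == "ex:hasParticipant" and o == person}
```

With the combined graph, a single query can cross both perspectives, for example listing the events of `person:A` and then looking up that person's kin.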

The Íslendinga saga has approximately two hundred chapters, and over a hundred people appear in the texts. As a pilot study for constructing graphs of the entire work, this research initially attempts to cover approximately one-quarter of the narrative, focusing on establishing workflows for graph construction. Specifically, it aims to compile vocabularies for appropriately capturing narrative content and to identify entities and resources that need representation in the graphs.

YAMADA-Representing the Íslendinga Saga As Knowledge Graphs-217.pdf


Display, Ontology and Database for Exhibition Documentation

Emmanuel Château-Dutier, Lena Krause, David Valentine, Zoë Renaudie

Université de Montréal

This poster aims to present the Display project developed within L’Ouvroir, the Digital Art History and Museology Laboratory at the Université de Montréal for the Partnership for New Uses with CIECO. Bringing together a team of researchers in art history, computer science, and museology, the laboratory is working on a digital tool to assist research on exhibition displays. We would like to present our methodology and the role of the DH Lab in the conception of this tool.

The mobilization and utilization of numerous archival sources to document the history of art museum exhibitions and enable their reconstruction is at the heart of the CIECO project. L’Ouvroir is designing this tool to support all research operations, from collecting historical information to formulating hypotheses and recording results.

In an often sparse documentary context, this abstract model allows for the recording of historical information about exhibition installations through a spatial approach that defines the possibilities for topological inference between objects in the exhibition space. The first step was therefore to design a computer ontology to explicitly and formally describe the characteristics of an exhibition installation (proximity and contiguity of exhibits, faces, is Left Of, etc.). It will be compatible with the CIDOC-CRM ontology (https://www.cidoc-crm.org/), a reference conceptual model promoted by the international museum organization, forming an extension to cover the specific domain of exhibition installations.

Reflecting on the database tool to be provided to researchers, and communicating its requirements to the developer, is the second research axis of this project. The user-friendly database will be used by art historians without specific technical skills and will facilitate:

- The creation or automated import of lists of artworks

- The recording or definition of the geometry of an exhibition space

- The localization of artworks in this space.

This brief presentation will account for the various roles in the team, the decisions made, and the documents produced for this purpose. We are currently finalizing the database model to submit for production and would be pleased to present it to the audience for expert feedback.

Château-Dutier-Display, Ontology and Database for Exhibition Documentation-146.pdf


Runoregi: A User Interface for Exploring Text Similarity in Oral Poetry

Maciej Michał Janicki1, Kati Kallio1,2, Mari Sarv3, Eetu Mäkelä1

1University of Helsinki, Finland; 2Finnish Literature Society; 3Estonian Literary Museum

This demonstration presents the user interface Runoregi used for exploring text similarity in large collections of Finnic oral poetry. We showcase the different views of the interface and their applications in folkloristic research.

Janicki-Runoregi-164.pdf


Towards Humanistic AI: Mapping an Emergent Field of DH Practices

Mats Fridlund1, Daniel Brodén1, David Alfter1, Ashely Green1, Aram Karimi1, Gustaf Nelhans2, Cecilia Lindhé1

1University of Gothenburg, Sweden; 2University of Borås, Sweden

Although many consider ‘AI’ or ‘artificial intelligence’ a debatable and fuzzy concept, it nevertheless today figures prominently in academic discourse, funding politics and policy-making. One could even talk about a broad institutionalisation process currently underway outside the fields of engineering and data science, as seen by several initiatives to manage the use of AI in higher education, the Swedish Research Council’s recent guidelines for using AI tools in research project applications (https://www.vr.se/english/applying-for-funding/applying-for-a-grant/guidelines-for-the-use-of-ai-tools.html), and the large Swedish research program WASP-HS (2019–2028) (https://wasp-hs.org/) aimed at fostering interdisciplinary knowledge about AI and autonomous systems in the humanities and social sciences, and their impact on human and social development. Notably, throughout this the role of the humanities is emphasised.

Following the surge of ChatGPT and ‘generative AI’, the importance of humanities researchers is often argued to lie in exploring ethical, social, etc., aspects of AI (Dimock, 2020). However, humanities scholars – perhaps most prominently within corpus linguistics and language technology, but also in Digital Humanities (DH) and, to some extent, traditional disciplines, such as archaeology, comparative literature and history – have for a long time been developing and using resources that are today often associated with AI and also tend to use this term in communicating their research to academic and non-academic stakeholders. At the same time, humanists are sometimes active within the fields of AI without realising the depth, degree or character of their involvement. Depending on how it is applied, AI is used as a term for technological applications (including the applications themselves), a general field of expertise or an imaginary of something that does not fully exist. For better or worse, one thing is clear: the influence of the term will likely continue to be felt in academic discourse for the near future, as a “fact”, “fantasy”, “desire” or perceived “destiny” (Zhao, 2022).

To not only meet the discursive norms but also open up for a more structured discussion and demystify the notion of AI within the humanities, we have elsewhere proposed Humanistic AI as an apt term for discussing an emergent field of practices in the intersection of the application of AI tools and the interests that fall within the domain of DH and the humanities, taking into account the contested nature of the term AI (Fridlund et al., 2024). As noted by one of the few publications that use the term, “While there is massive investment all over the world related to one side of AI, namely engineering, it is also important to create rules and competence related to humanistic AI and its effects on people and societies” (Zhao, 2022). Thus, this paper will further explore ‘Humanistic AI’ as a term that enables DH to contribute to the expanding AI discourse by highlighting what could be viewed as AI-related work within DH centres and research.

By providing a conceptual overview and a bibliometric mapping of the emergence of a terminology that refers both to the field of AI and humanities in research publications, the paper will delineate the context of the term and what we mean by it. To ground our discussion in practice, we will address three core areas of practice considered as Humanistic AI, in terms of using, developing or interrogating AI which will be concretised through projects involving the Gothenburg Research Infrastructure in Digital Humanities (GRIDH, formerly Centre for Digital Humanities) at the University of Gothenburg. We conclude by suggesting the term’s pragmatic usefulness for communication with the wider research community.

Humanities + artificial intelligence

We will primarily use ‘Humanistic AI’ to reference activities within humanities research and cultural heritage that apply, develop or study AI tools and applications. For clarity, we briefly sketch out what we mean by ‘humanistic’ and ‘AI’, respectively. Within AI, ‘humanistic’ can be used to designate ‘humane’ or ‘human-like’ functionalities and behaviours as well as to describe aspects related to humanities disciplines or knowledge domains (our use concerns the latter sense). Among historians, there is a consensus that ‘the humanities’ consists of a complex of academic disciplines and practices perceived as distinct and yet under continuous renegotiation (Bon, 2013). For instance, in Sweden many humanistic disciplines move across different university faculties, and before the 1960s the faculty of humanities represented both the humanities and the social sciences (Ekström & Östh Gustafsson, 2022).

The meaning of AI and artificial intelligence is somewhat more problematic due to its increasingly widened and contested meanings. To clarify and critique the various uses (and abuses) of the AI term a number of alternative terms have been introduced, including ‘augmented intelligence’, ‘intelligence augmentation’, ‘automated approaches’, ‘autonomous systems’ and ‘intelligent systems’. The “classic” textbook describes AI as a field “concerned with not just understanding but also building intelligent entities—machines that can compute how to act effectively and safely in a wide variety of novel situations”, encompassing “logic, probability, and continuous mathematics; perception, reasoning, learning, and action; fairness, trust, social good, and safety; and applications that range from microelectronic devices to robotic planetary explorers to online services with billions of users” (Russell & Norvig, 2021). While such a broad range of notions exists about AI, we will pragmatically discuss it as an emergent field of practice that develops and studies so called intelligent machines, as well as the use of such algorithms and machines. In particular, this refers to machine and software applications from the subfields of, among others, Expert Systems, Machine Learning, Natural Language Processing, Speech Recognition, Computer Vision, Robotics, and Genetic Algorithms, which include a range of applications such as clustering, deep learning, image segmentation, text classification and topic modelling.

To discuss the emergence of a research field related to the humanities and humanistic endeavours, we have conducted bibliometric searches in Google Scholar and Web of Science for publications including the term ‘humanistic AI’ as well as broader search criteria to capture wider uses in the humanities through articles mentioning AI together with ‘humanist’, ‘humanitarian’, ‘humanities’, etc. With such broader search criteria, we found 1,300 articles, reviews and proceedings papers in Web of Science. When visualising these more than 130 different AI-related keywords in VOSviewer (van Eck & Waltman, 2010), we found several distinct clusters: one revolving around digital humanities, technical applications of machine learning, internet of things and image and text analysis; a cluster depicting AI and risks; another centred around posthumanities, cybernetics, and ethics; one on applications within humanitarian law and military applications; and one referring to heterogeneous fields of AI applications within education, culture, media and digital methods. The paper will further analyse these visualisations of the clustering of co-occurring keywords, including central humanities-related ones such as ‘AI ethics’, ‘ethical AI’, ‘human-centred AI’, ‘responsible AI’, ‘explainable artificial intelligence’ – and ‘humanistic AI’.

Notably, during the last two decades the term ‘Humanistic AI’ has been used in different ways. For instance, in 2003 it was used to describe the trajectory within design of intelligent machines that tries to emulate human cognitive capabilities rather than mimicking the human brain’s anatomical functioning (Krishnakumar, 2002). More recently, ‘Human-Centered AI’ (HAI) has been used for similar AI activities and processes. Such efforts are often shaped by a rationale implying that HAI in augmenting rather than replacing human decision-making is not just efficient but also more ‘fair’, ‘compatible’, and ‘humane’. Furthermore, there are a range of similar AI related activities drawing on HSS perspectives (see Saheb et al., 2022). Also, the term is increasingly used within academic research. The Media Lab at KTH Royal Institute of Technology engages in interdisciplinary research combining “advanced engineering with philosophy, art, aesthetics and other disciplines from the humanities” to “develop a strong humanistic stance with respect to AI” (https://www.kth.se/hct/mid/research/media-lab/about-1.929121) and the University of Bologna’s Humanistic AI unit applies AI techniques to humanities that includes “classification, exploration, management, and preservation of cultural heritage, archives, or demo-ethno-anthropological materials” (https://centri.unibo.it/alma-ai/en/scientific-units/humanistic-ai).

Applying, developing and interrogating AI

Drawing together these topics and themes, we suggest that AI is involved in humanistic research mainly through three core practices (exemplified below through the expertise at GRIDH): 1) humanistic scholars applying existing tools incorporating AI applications in their research; 2) engineers and programmers developing custom-made, AI-related resources for humanities research; 3) humanists interrogating AI through reflexive critical analysis of AI tools’ embedded values, positions (‘bias’) and affordances.

Application of AI involves a range of diverse techniques and methods, including vector representation for text, contextual search, data annotation, clustering, image classification, and recognition. Specific examples of applications implemented at GRIDH include advanced word embeddings (Word2Vec, FastText, etc.) to create vector representations of textual content, allowing for semantic similarity analysis, topic modelling, and contextual understanding; word embeddings in combination with domain-specific ontologies to enhance semantic understanding; topic modelling techniques, such as Dynamic Topic Modeling (DTM), to capture evolving themes and topics in historical text; semantic search to clarify the meaning of queries and documents and to improve search recall and precision; and image colour clustering based on similarity of embeddings.
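As a minimal illustration of the semantic similarity analysis mentioned above, the following Python sketch ranks words by the cosine similarity of their vectors. It is not GRIDH's implementation: the three-dimensional vectors, the word list and the function names are invented for illustration, whereas real Word2Vec or FastText vectors are learned from a corpus and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; not taken from any trained model.
toy_vectors = {
    "poem": [0.9, 0.1, 0.3],
    "poetry": [0.8, 0.2, 0.35],
    "potato": [0.05, 0.9, 0.1],
}


def most_similar(word, vectors):
    """Rank all other words by cosine similarity to `word`, best first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("poem", toy_vectors):
        print(f"{other}: {score:.3f}")
```

A 'likeness' search of the kind described for text collections can be thought of as this ranking applied to vectors of words or documents, with an approximate nearest-neighbour index replacing the exhaustive loop once the vocabulary grows large.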

Development of AI involves developing resources for solving complex research issues not easily solvable by simply applying existing applications. This can be done in different ways, such as training classifiers, fine-tuning existing or training new transformer models from scratch based on specific text or image corpora. Such development practices at GRIDH include computer vision and deep learning techniques for automatic image annotation, object detection and segmentation for image labelling. However, developing more general AI applications requires a deeper understanding of the underlying principles (and implications), and large amounts of training data, which is a constraint often hard to satisfy in the humanities (e.g. documents to be analysed are in extinct languages, or artefacts under scrutiny no longer exist).

Interrogation of AI entails applying humanistic research-based reflection to interrogate the implications of the AI tools and methods. This partly relates to practices within fields such as Critical Digital Humanities, Critical Code Studies, digitalSTS, etc., that concern interdisciplinary analysis of advanced data-driven approaches, software, etc., and the socio-cultural production of knowledge in digitalised society. In practice, such AI reflexiveness at times takes the form of explicit interdisciplinary studies including humanities scholars, and at times emerges tacitly in project conversations with humanist scholars probing the interpretative limits and affordances of the data generated by AI tools. This often entails making opaque AI algorithms fathomable, or at least trying to work out their inner workings by conducting in-depth analyses of model performance and of training processes that include human-in-the-loop components or active learning techniques.

The presentation will explore these three areas of Humanistic AI practice through descriptions of text-based and multimodal DH projects at GRIDH: ‘The Nordisk familjebok’ research infrastructure project, developed together with Data as Impact Lab at the University of Borås, which implements ‘likeness’ search functionalities using a Word2vec model; ‘The New Order of Criticism’ (Ingvarsson et al., 2022), a mixed-methods project that uses Swedish LLMs for classification of book reviews in newspaper corpora with a comparative perspective on quantitative and qualitative approaches; the ‘Literary Lab’, developed for the Swedish Literature Bank, which uses ML algorithms to cluster images of illustrations, initials, graphic ornaments, and sheet music for visualisations; the ‘Ivar Aroseniusarkivet’ project, which visualises thematic clusterings of a large archival collection of artworks; the project ‘Rock Art in Three Dimensions’ (Horn et al., 2022), which uses AI-enhanced Augmented Reality (AR) technologies aligned with the ethics of conservation; and the research project ‘Terrorism in Swedish politics’ (Edlund et al., 2022), carried out together with Språkbanken Text (University of Gothenburg) and Språkbanken Speech (KTH Royal Institute of Technology), which studies parliamentary discourse on terrorism, drawing on both speech analysis in the form of automatic speech recognition (ASR) and deep neural networks, and text analysis, using, among other things, word vectors to trace conceptual development in nuanced ways.

Summary

By combining a conceptualisation of Humanistic AI with highlighting the AI-related DH practices at GRIDH, our paper will contribute to opening up a discussion and demystifying AI as a field of interest within the humanities.



Towards Standards in Digital Editions of Old Norse Prose: A Case Study

Sebastian Pohland

University of Oslo, Norway

This paper provides an overview of the author's ongoing PhD research project at the University of Oslo, which aims to contribute to ongoing efforts to transition the field of Old Norse philology into the age of digital humanities by providing a detailed meta-analysis of the tools and technologies currently available for the creation of Old Norse digital editions, as well as of the opportunities and pitfalls presented by the transition towards digital-first edition projects given current trends.



Digital Datasets Created from Archival Sources: The Problem of Data Quality in the Study of Private Letters

Marin Laak, Kadri Vider, Neeme Kahusk, Mari Sarv

Estonian Literary Museum, Estonia

We will focus our presentation on the analysis of the results we have obtained from studying the letters and correspondence in the Estonian Cultural Archives through the textual databases created from them. The empirical basis of our research is the collection of manuscript private letters of Estonian literary figures in the 20th century. We would like to discuss some methodological issues related to data preparation to highlight an important problem of the impact of the quality of textual data on research results. The goal of our work is to analyse the results of the application of computational methods and their dependence on data quality. For this purpose, we compare different datasets created from archival sources of cultural history and highlight how the quality of metadata and the structuring of content elements affect the content of research results.

The research presented in this paper was conducted in the framework of the research project „Source Documents in the Cultural Process: Estonian Materials in the Collections and Databases of the Estonian Literary Museum” (I and II, 2019-2023, funded by the Ministry of Education and Research of the Republic of Estonia). The project focused on the implementation of digital methods and international standards in the management, publication, and research of archival sources. The use of existing and emerging textual data and databases with the help of computational analysis will allow for an increasingly better and more evidence-based overview of the various aspects of the information stored in the collections of the Estonian Literary Museum (ELM), as well as of changes in society, culture and mindsets. Our interdisciplinary research is inspired by the surprising results achieved in the study of Estonian and Finnish folklore using computational methods, e.g. in the study of poetic and narrative text corpora of Estonian folklore (Sarv, Järv 2023) and similarity analysis applied to Finnish oral folklore (Janicki, Kallio, Sarv 2023).

The archival sources of our research consist of private letters and correspondence between Marie Under (1883-1980) and Ivar Ivask (1927-1992), two Estonian exile/diaspora writers in the West after the Second World War. Marie Under was one of the most appreciated poets in the Estonian diaspora in Sweden and a candidate for the Nobel Prize. Her correspondence with the younger literary scholar and poet Dr. Ivar Ivask in Minnesota (and later Oklahoma), USA, contains altogether ca. 550 letters from 1957-1979. Their letters are exceptionally poetic, but also long and informative, as we found in traditional literary analysis (close reading).

For exploring thematic and temporal variability of correspondence we had to create a textual database of letters. The methodological challenge relates to archival data preparation. The first step in this complicated process was converting the handwritten letters into a machine-readable format to use them as textual data and applying automatic language data analyses (see also Laak et al 2019).

The raw version of the dataset of the Marie Under and Ivar Ivask correspondence consists of approx. 300,000 words, thus the average annual number of words in letters was about 3000 words. The raw data (text only) was unstructured: the authors and dates of each letter were not distinguishable automatically, and thus the data quality was unknown and required metadata extraction.

Our initial hypothesis, derived from traditional qualitative literary research, was, for example, that the correspondence of our refugee/diaspora writers thematically covers a wide range of topics, revealing the productive activity of literati in preserving national culture in exile and diaspora communities. To explore this hypothesis, we applied frequency and theme analysis to the raw data.

Theme analysis applied to the original textual data revealed the dynamics of the correspondence over the years and a large number of topics that were very personal in nature. The frequency analysis of top content words in the correspondence texts also yielded several unexpected results.

The results of the frequency analyses showed that the top eight content words (nouns and verbs in lemma base form) are: letter (‘kiri’), poem (‘luuletus’), to write (‘kirjutama’), to do/perform (‘tegema’), time (‘aeg’), to read (‘lugema’), poetry (‘luule’), to see/meet (‘nägema’) (Laak, Kirss 2023). We expected that Estonia (‘Eesti’, as a proper name or possible gerund) would also be among the top words, but surprisingly it was not.

According to the frequency plots of the top content words, the subject matter of the letters was broader before 1965, after which, for some reason, the relationship between the correspondents changed. In 1965 the correspondence stopped for a while, and afterwards relatively formal letters were exchanged. From this point on, the relative frequencies of the top words show the content words ‘letter’ and ‘read’, which point to a more formal relationship, standing out from the usual distribution.
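The kind of per-year relative frequency behind such plots can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: it assumes the letters have already been dated and lemmatised, and the function name and the two tiny sample "letters" are invented for the example.

```python
from collections import Counter


def relative_frequencies_by_year(letters, lemmas):
    """Compute, per year, the share of all tokens taken up by each lemma.

    `letters` is an iterable of (year, list_of_lemmas) pairs; `lemmas` is
    the list of content words to track.
    """
    totals = Counter()  # total token count per year
    hits = {lemma: Counter() for lemma in lemmas}  # tracked lemma counts per year
    for year, tokens in letters:
        totals[year] += len(tokens)
        for token in tokens:
            if token in hits:
                hits[token][year] += 1
    return {
        lemma: {
            year: hits[lemma][year] / totals[year]
            for year in sorted(totals)
            if totals[year]  # skip years with no tokens at all
        }
        for lemma in lemmas
    }


# Invented sample data: two very short lemmatised "letters".
sample_letters = [
    (1960, ["kiri", "luule", "aeg", "luule"]),
    (1966, ["kiri", "lugema", "kiri", "aeg"]),
]

if __name__ == "__main__":
    result = relative_frequencies_by_year(sample_letters, ["kiri", "luule"])
    for lemma, by_year in result.items():
        print(lemma, by_year)
```

Normalising by the yearly token total matters here because the volume of correspondence varies between years; raw counts alone would mostly reflect how much was written rather than what it was about.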

The frequency analysis of topical content words did not support the hypothesis about the broad range of topics in the letters relating to national exile and diaspora activities. It turned out that the number of topics at the centre of the correspondence was relatively narrow. The letters focused on reading and analysing the artistic value of poems, translating poetry and preparing books for publication.

The results of the project showed that in order to study textual datasets created from manuscript archival sources using computational methods, it is necessary to solve a number of specific problems. The next challenges are 1) to analyse the frequency of topical content words by distinguishing the authors of the letters, based on the structured and annotated database of the letters, and 2) to carry out a network analysis in order to determine the geographical extent of international contacts of Estonian writers in exile.

One of the goals of our study is to analyse the results of the application of computational methods and their dependence on data quality. In our presentation we will compare different datasets created from older archival sources and highlight how the quality of metadata and the structuring of content elements affect the content of research results.

Based on the topic analysis applied to text collections with higher quality data, it is possible to highlight topics personally related to the authors of the letters and the differences between the authors.

Computational exploration of digital data from archival sources of cultural history revealed best practices and lessons learned from collaboration between archive, literature studies and computational linguistics.



Capitalizing on experience to experiment and innovate: feedback and reflection on the future of the Huma-Num research infrastructure

Antoine Silvestre de Sacy, Stéphane Pouyllau

IR* Huma-Num (UAR 3598), CNRS, France

The French national research infrastructure Huma-Num has just celebrated its tenth anniversary. We feel that this is an opportune moment to take a step back and reflect on the past ten years, combining the experience gained with the innovation required of any infrastructure.

The three historical missions of the infrastructure have always remained the same:

  • To accompany the evolution of the human and social sciences (SSH) communities in the context of digitalization and Open Science;
  • To implement an infrastructure for the "FAIRization" (Findable, Accessible, Interoperable, Reusable) of data;
  • To participate in the construction of international infrastructures with the SSH communities in the context of the European Open Science Cloud (EOSC).

Relying on national and international communities, the Huma-Num IR* has been built around a chain of services following the data lifecycle and designed to promote this _fairization_ of research data, developing in-house services or offering off-the-shelf services corresponding to needs typical of digital humanities projects (data storage services, web hosting, virtual machines, computing power, data warehouses, search engine...).

The core principles of the infrastructure are built around three axes:

  • Innovation through use: relying on communities of researchers to co-develop services based on and adapted to their needs.
  • Service operation and data control: hosted at the IN2P3 computing center in Lyon, Huma-Num hosts all data in a sovereign and controlled manner.
  • Mixed and in-house development: backed by industrial partnerships, Huma-Num co-develops part of its services, taking advantage of the opportunities offered by the CNRS (partnerships, joint laboratories with industrial companies, etc.).

However, ten years after its construction, the infrastructure can be said to have reached a phase of maturity, having overcome the initial challenges associated with its start-up and established a solid base in the national and international landscape. The question of what a national research infrastructure in the humanities and social sciences is today therefore arises all the more acutely, especially at a time of democratization of artificial intelligence and massive use of generative models:

  • How can we reconcile the exploitation of existing services and the innovation required for tomorrow's challenges?
  • What is the right balance between listening to communities, anticipating needs and guiding research practices?
  • How can a national research infrastructure capitalize on past experience to design tomorrow's research infrastructure, especially at a time when innovation seems, at first glance, to be the preserve of private companies?


Visualizing quire structures on Handrit.is

Beeke Stegmann

Árni Magnússon Institute for Icelandic Studies, Iceland

Information about quire structures of manuscripts can be quite complex, but is often highly relevant for researchers. This is particularly the case when scholars are investigating aspects related to the manuscripts’ materiality and genesis. Therefore, quire structures are a standard element that is included in most catalogue descriptions of handmade books.

Traditionally, quire structures are given in more or less condensed formulas, and their format can vary considerably between subfields. To increase the usability of quire structure information and to make it more easily accessible to different users, we experimented with including visualizations of quire structures in the joint online catalogue Handrit.is. Instead of re-inventing the wheel, however, we wanted to take advantage of available software and decided to build on “VisColl” (Collation Visualization; see Porter et al. 2017), an open-source system for modelling and visualizing the physical collation of manuscripts. The idea was to integrate the existing code into our online catalogue, but adjusting it for our particular use and needs was not without challenges.

The aim of the present presentation is to reflect on this collaborative initiative at the crossroads of archives, conservation, manuscript studies, cataloguing, DH and data management.[1] Lessons learned will be addressed with focus on how DH can best support the digital use of collections - in this case through additional tools in an online catalogue - and satisfy needs and requirements of both general ongoing cataloguing as well as particular large-scale research projects.[2]

Incorporating the open-source data module was not as straightforward as initially thought. Even though the creators of VisColl make all their code available and share it in an exemplary manner, the nature of our use is different from what was aimed at with the original software, requiring considerable adjustments. In particular, employing visualizations for cataloguing purposes and sharing them with users on a large scale makes a one-by-one project layout and personal logins impractical. In the end, the solution that was most practical and efficient for us was to use the VisColl interface for entering data and to export the underlying code generated by the software, while employing our own, newly developed code for displaying the graphs. That way, a pop-up window can be embedded in the front-end of the online catalogue Handrit.is for manuscripts that have the relevant encoding. Also, no login is required for the user, and additional visualizations can be created one by one as the cataloguing progresses.

[1] Institutions collaborating on this initiative are The Árni Magnússon Institute for Icelandic Studies and the National and University Library of Iceland. Main participants are, in alphabetical order by first names, Beeke Stegmann (SÁM), Halldóra Kristinsdóttir (Lbs), Kristinn Sigurðsson (Lbs), Silvia Hufnagel (SÁM) and Trausti Dagsson (SÁM).

[2] The initiative is part of a three-year research project, “Life of Paper: Cycles of production, Use and Reuse of 17th-Century Paper in Iceland”, funded by The Icelandic Research Council; Grant Number 228695.



Understanding researchers' needs by surveying to support them

Liisa Näpärä

National Library of Finland, Finland

In order to effectively support, collaborate, and understand researchers’ needs, there is a demand for systematic information collection as digital research, methodologies, technology, pedagogy, and practices are evolving. In recent years, the National Library of Finland (NLF) has emphasized its focus on this aspect. This aligns with the purpose of repeating a survey in the first months of 2024 to evaluate researchers' current needs and experiences with digital resources, materials, and services. The survey follows the design of the initial survey conducted in the spring of 2020.

The survey about researchers’ needs for the digital resources and services of the National Library of Finland has proven to be a remarkable way to obtain new development ideas and improve current services. It has been structured with a research data life cycle and data management planning in mind to help identify services and their relevance to researchers. After the 2020 survey, various activities and improvements have been carried out. For instance, a specific tool was developed to collect customized datasets and to serve those who are interested in and capable of handling moderate-size (not necessarily big) data with qualitative or mixed methods. Data download numbers have already been promising two years after its launch. Additionally, a contact point for collaboration has attracted researchers regularly. However, there is still room to raise awareness about this option among the broader research community.

Besides the survey-based actions, a number of research collaborations have been active. To name a few: 1) the Fin-Clariah research infrastructure for social sciences and humanities, to provide out-of-copyright digitized materials to a computation environment, and 2) the development of a large Finnish language model together with the TurkuNLP group, based on the modern language and legal deposit collection.

After a few years, it is time to reassess the relevance of current research and data services. The survey aims to identify areas requiring enhancement and to guide the NLF in developing digital collections, data, and research services. The core emphasis of the survey lies in understanding researchers' experiences and cultivating ideas for the improvement of digital resources and services. Comparing the upcoming results with the previous ones will indicate, or at least reflect, how well and how widely the NLF’s recent developments have become known among researchers.

In the presentation, the 2024 survey results are analysed, further initiatives for development are presented, and conclusions are compared with the previous survey, highlighting areas of improvement. Repeated surveys are essential in providing evidence of evolving needs, with particular attention to potential shifts in research profiles. The previous answers indicate that the methods used in research on digital resources are still very traditional for humanities and social sciences. Additionally, the survey implied that most attention is needed for digital newspapers and research collaboration in general. Actions have since been taken, and collaboration has been carried out in projects and on other occasions.

The survey results are anticipated to provide significant value and serve as indicators for the further development of collaboration, research, and data services, aligning closely with the evolving needs of researchers.



Uralic Historical Atlas (URHIA): Interactive web app for spatial data

Meeli Roose1, Tua Nylén1, Petro Pesonen2, Harri Tolvanen1, Outi Vesakoski1

1University of Turku, Finland; 2Finnish Heritage Agency (Museovirasto)

The field of digital humanities has advanced significantly, with improved infrastructure and storage capabilities fostering overall research development. This progress, including the evolution of datasets, has expanded opportunities for spatial data storage, management, and analysis. The "spatial turn" in digital humanities incorporates spatial analysis, GIS, and other methodologies into evolving datasets, facilitating cultural studies and providing clear spatial views of complex data.

For 15 years, the University of Turku has fostered interdisciplinary collaboration in studying language evolution and human diversity (www.bedlan.net, https://sites.utu.fi/urko/, www.humandiversity.fi). We compile spatial data and offer open access to databases in Finland and Northern Eurasia. To ensure easy access, we developed the Uralic Historical Atlas (URHIA), an interactive platform (https://sites.utu.fi/urhia/) for researchers and lay audiences. URHIA is built on UTU-GeoNode (http://geonode.utu.fi/) as a versatile resource hub.

Within URHIA, we curate thematic spatial datasets, creating a dynamic space beyond a mere repository. It serves as a live data showroom, presenting various datasets through interactive online maps, enabling active user engagement. Current offerings include the Uralic Language Atlas, showcasing speaker areas, and the Archaeological Artefact Atlas of Finland. We'll discuss URHIA's developmental challenges, particularly those of the two current showrooms, emphasizing the need to address challenges for future advancements.

The initial URHIA platform, developed collaboratively within the URKO project (2020-2022) at the University of Turku, focused on spatial language data (Rantanen et al., 2022). It introduced the Uralic Language Atlas, utilizing interactive maps to showcase diverse information on Uralic languages' speaker areas. This approach, guided by user-centered design (UCD) principles (Roose et al., 2021), set a standard for cross-disciplinary collaboration, involving experts from biology, linguistics, archaeology, geoinformatics, and IT support.

The development of the second showroom, initiated in 2023, the Archaeological Artefact Atlas of Finland, introduced fresh challenges for the underlying platform and necessitated the integration of the Oskari platform into UTU-GeoNode, forming the foundation for URHIA. The data within the Archaeological Artefact Atlas encompasses the typological classification of prehistoric artefacts along with their coordinates (Pesonen et al. 2024, submitted). Sharing archaeological data on a spatial platform poses challenges due to the specific needs of data functionalities related to the subsetting of big data on the map. This database offers a spatio-temporal (c. 8900 calBC - 1300/1500 calAD) context for comparing artefacts across different periods and regions, encompassing approximately 38,000 single artefacts and approximately 10,000 pottery-type identifications.

The design and user-friendly layout of historical spatial data platforms is crucial for enhancing spatial data usability (Slingerland et al., 2020). Data visualisation plays a significant role in enriching studies, emphasising the importance of how a map view is designed in spatial data platforms (Coetzee et al., 2020; Jiang et al., 2019; Kraak and Ormeling, 2020). Through teamwork and a dedication to user-centric design, the URHIA spatial data platform has become a dynamic tool, ready to meet the diverse research needs of the community exploring Uralic historical and cultural data.



Examples from the Translocalis: Cultural Heritage, Narratives, Emotions, Perceptions and Voices of the Finnish Media, People, and Soldiers on the Imperial War.

Aytac Yurukcu

University of Eastern Finland Karelian Institute

"Like everywhere in Finland, the call to provide woollen clothes for soldiers has been well received, and the same goes for here. The government, in turn, has granted 25,000 marks for the acquisition of fur coats for the guard." Satakunta, 10 September 1877, 45, 3. (A letter from the reader of the Satakunta newspaper from Helsinki.).

The nineteenth century was a challenging period not only for the Ottoman Empire, which Russia condescendingly referred to as "The Sick Man of Europe," but also for Russia, which faced all the major powers in the Crimean War. These empires fought nine times between the beginning of the seventeenth century and the Russo-Turkish War of 1877–78. The war had far-reaching consequences in the Balkans and Caucasus and was unique in the way journalists portrayed the war to Europeans, Russians, Turks, and Balkan nations. The media, journalists, military attachés, and foreign correspondents had shaped war news, which drew a sizable audience and frequently deeply engaged people on both an emotional and intellectual level. However, the conflict also exerted a substantial influence on the ethnic majorities and peripheral minorities of the Russian Empire and in the army, including Finns, Estonians, and Poles. Hence, the war had the capacity to significantly impact the Finnish media, society, and the character of the national movement with regards to patriotism and the nation's position within the empire (Liikanen, 1995; Alapuro, 2018).

The war created new circumstances in realpolitik all around Europe and between the empires. One of these outstanding impacts of the war was inevitably on the Baltic Provinces and Grand Duchy of Finland (Thaden, 1964), dominated by the Russian Empire during the golden age of nationalism in the late nineteenth century. Within the imperial context in Finland (Snellman and Kalleinen, 2022), the war also shaped the different media and war perspectives of the people and the army officers on the threshold. This paper provides a scholarly contribution to the study of vernacular writing history by using previously unused letters from readers (Kokko, 2021; Kuismin & Driscoll, 2013) and soldiers as primary sources.

The Translocalis Database (1) contains over 60 published letters from readers and soldiers in 1877–78 that will be examined in this research. Using local letters, which had been sent to newspapers by readers and soldiers from the war zone, as source material gives details on the perspectives of society, people, and war and shares the community's experiences with the war. The study posits a significant hypothesis that Finland's emerging notion of a distinct state and nationhood was influenced by wartime events, as evidenced by particular instances from readers' letters, news coverage of the war in newspapers, and tales of troops.

The theoretical framework builds on the interaction between one's personal identity and a sense of belonging in the community, as argued by Knott (2017), the concept of social layer experiences, and the theory of historical times, as developed by Koselleck (2004). Koselleck emphasizes the war impacts and the link between "the space of experience" and "the horizon of expectation" in national and social politics. Finally, the framework draws on Anderson's (1991) classical theory of nationalism as an ‘imagined community’, since the public sphere was the essential foundation both for nationalism as an "imagined community" and for the development of mass movements of civil society in Finland. Cultural heritage collections, qualitative content analysis, and digital humanities tools (https://korp.csc.fi/korp/, https://voyant-tools.org/, https://digi.kansalliskirjasto.fi/, and https://digi.kansalliskirjasto.fi/collections?id=742) will be used to analyze the written texts by key themes: society, soldiers, solidarity, narratives, war news, and enemy images.

Many scholars (Kansanaho, 1965; Hiisivaara, 1969; Backström, 1996; Laitila, 2001; Suistola and Tiilikainen, 2014; Outinen, 2016; Parppei, 2021) have discussed the war from various perspectives; however, these discussions have not been directly linked to societal experiences. By using double-sided letters, newspapers, and diaries together, such practices about the war (Kettunen, 2018) addressed the issue of nationhood, solidarity, welfare, and nation-building. Of course, firsthand evidence and national experience have been difficult to gain. However, the Translocalis also contains articles from readers that were published in newspapers throughout times of conflict, providing distinct and extensive information about people's war experiences. In addition, the Digital Collections of the National Library of Finland (2) contain mass media sources that provide insight into the effects of the war on the mindset of the Finnish media, military personnel, and the information disseminated by them and media outlets during the peak development period of Finnish media and the later period of societal renaissance.

This research looks at how the war altered the general, local, social, and political history of information circulation in Finland by asking: How did the war affect the community? How did the 1877–78 battle impact the media and soldiers' narratives and perceptions of liberty and independence? What kinds of war, news, narratives, and emotions have been expressed in the newspapers, as well as in the readers’ and soldiers’ letters, and why? How did they imagine, construct, narrate, and visualize their senses and sensations during wartime?

(1) Translocal Database developed by the Academy of Finland Centre of Excellence in the History of Experiences (HEX), https://digi.kansalliskirjasto.fi/sanomalehti/binding/431835?page=3

(2) Digital Newspaper Collections: National Library of Finland Digital Newspaper Collection. http://digi.kansalliskirjasto.fi/



Automation of Linguistic Annotation in Historical Lithuanian Corpus

Mindaugas Šinkūnas, Ignas Rudaitis

Institute of the Lithuanian Language, Lithuania

The Institute of the Lithuanian Language holds the largest corpus of historical Lithuanian, which has been collected for more than two decades in cooperation with scholars from Lithuanian and German universities. The texts date from the first Lithuanian printed book in 1547 to the formation of standard Lithuanian at the end of the 19th century. The precise transcription of books and manuscripts was achieved by working with original copies, which are held in libraries across Lithuania, Poland, Germany, the UK, Sweden etc. The corpus of ca. 6m tokens is mostly used to research the history of Indo-European and Baltic languages and literature. The potential of the research is increased with tagged non-linguistic metadata and a partial linguistic annotation.

Four types of annotation are set to augment the corpus: (1) part of speech, (2) inflectional features, (3) lemma, and (4) modernized spelling. In the general context of linguistic corpora, (1–3) are commonplace, and (4) reflects the specifics of a historical corpus. The Historical Lithuanian Corpus, like many other corpora of this kind, consists of texts written in various dialects and orthographic conventions; these vary greatly between texts of the same period and differ from today's orthographic standard.

To minimize manual labour in producing these annotations, various computational approaches will be assessed, with most focus on supervised machine learning models. This choice follows naturally from the fact that prior to this work, the present authors had already compiled a training dataset in the order of hundreds of thousands of manually annotated tokens, consolidating multiple scholarly sources.

All of (1–4) can be—and have been—stated as computational problems. In the parlance of natural language processing (NLP), (1) is known as part-of-speech tagging, (2) as morphological analysis, and (3) as lemmatization. In NLP, the interest in (1–3) is already widespread, producing many well-performing solutions. Therefore, a developer of any historical corpus could also be expected to benefit from them.

However, (4) is different. In the literature, “historical spelling normalization” is the preferred term for this task; its relevance is mostly confined to historical corpora, making it much less popular by comparison. In addition, even (1–3) have some specifics when applied to historical language varieties. These specifics, arising mostly from sparsity of data, are sufficient to deprive some first-line NLP approaches of their otherwise established advantage.

With that in mind, it is prudent to regularly reassess the state of problems (1–4) in the context of historical language varieties. It is especially pressing nowadays, given the unprecedented flux in the NLP ecosystem, first brought about by deep learning and distributional semantics, and then by large language models (LLMs). The development of Historical Lithuanian Corpus has presented such an opportunity.

This paper will focus on the following questions, to the extent that the experience with historical Lithuanian varieties permits: (a) To date, have LLMs become relevant to tasks of historical NLP, and if so, how are they to be integrated into the pipeline? (b) How do the most common models in neural NLP compare when historical datasets are used, and is the comparison influenced by the specific nature of the datasets? (c) In this context, can these models benefit from different choices of pipelining and representation? (d) Can annotated corpora of modern language varieties be used to augment historical training datasets, and if so, in what ways? (e) Given that, as a computational problem, historical spelling normalization overlaps with spelling correction, grapheme-to-phoneme transduction and some other NLP tasks, can the advances in solving these latter tasks be transferred to solving historical spelling normalization? (f) Given this very same overlap, is an analogous kind of transfer possible in the opposite direction?
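The overlap with spelling correction named in question (e) can be illustrated with a minimal sketch. This is not the authors' method: it is a plain nearest-lexicon-entry baseline using Levenshtein distance, and the function names, the `max_distance` threshold and the toy English forms are assumptions chosen purely for illustration, not data from the Historical Lithuanian Corpus.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def normalize(token: str, lexicon: list[str], max_distance: int = 2) -> str:
    """Return the closest modern form, or the token unchanged if nothing is close.

    `lexicon` and `max_distance` are hypothetical parameters of this sketch.
    """
    best = min(lexicon, key=lambda w: (edit_distance(token, w), w))
    return best if edit_distance(token, best) <= max_distance else token


# Toy modern lexicon (English, for illustration only).
MODERN = ["unto", "love", "have"]
normalized = [normalize(t, MODERN) for t in ["vnto", "loue", "xyzzy"]]
```

A baseline of this kind ignores context and data sparsity, which is precisely where the supervised models discussed in the abstract would be expected to differ.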

Lastly, the paper will remark on some typological aspects of Lithuanian, which should help the reader situate the results in the greater perspective of historical NLP.



Text Recognition, Network Analysis, and Spatial Analysis: Approaching 17th-Century Court Records from a New Perspective.

Ville-Pekka Iivari Kääriäinen

University of Helsinki, Finland

This paper explores 17th-century court records from the Parish of Iisalmi, utilizing a digital humanities approach that integrates Handwritten Text Recognition (HTR), Social Network Analysis, and Spatial Analysis. By examining the progression of state formation at the local level, this research offers fresh insights into the dynamics of state building “from below,” challenging traditional narratives centered on the decrees of central political elites. The study leverages the Court Records of Iisalmi parish, a rich but underexploited source that, despite its detailed content, presents challenges due to archaic language and handwriting. Employing HTR technology, particularly through the Transkribus application, the researcher has developed a model for deciphering 17th-century handwritten Swedish, making these valuable records more accessible to the academic community.

The creation of a relational database from these records has allowed for an unprecedented level of analysis, collecting meta and descriptive data for each legal case and linking individuals across multiple cases. This methodology not only bridges the gap between quantitative and qualitative research methods but also highlights the importance of everyday interactions and the agency of seemingly marginal figures in the historical process of state formation. The paper further discusses the potential for future research, particularly in the application of spatial network analysis to better understand the geographical dimensions of these interpersonal relationships.
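How linking individuals across cases can feed a network analysis may be sketched as follows. This is an illustration under stated assumptions, not the project's actual schema: the `(case_id, person_id)` row layout, the placeholder identifiers and the function name are all hypothetical. Two people are connected whenever they appear in the same case, and the edge weight counts their shared cases.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rows of a case-participant table: (case_id, person_id).
ROWS = [
    (1, "person_A"), (1, "person_B"),
    (2, "person_A"), (2, "person_B"), (2, "person_C"),
    (3, "person_C"),
]


def co_occurrence_edges(rows):
    """Build weighted, undirected edges between people who share a court case."""
    people_by_case = {}
    for case_id, person in rows:
        people_by_case.setdefault(case_id, set()).add(person)

    edges = Counter()
    for people in people_by_case.values():
        # Sorting makes (a, b) and (b, a) count as the same edge.
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)] += 1
    return edges


edges = co_occurrence_edges(ROWS)
```

An edge list of this shape can be loaded into any network analysis tool; attaching place information to each person would be one way to extend it towards the spatial network analysis mentioned above.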



Historical Farm and People Registry – Turning static list entries into network nodes

Eiríkur Smári Sigurðarson1, Pétur Húni Björnsson2

1University of Iceland, Iceland; 2Árni Magnússon Institute for Icelandic Studies

The aim of developing the Historical Farm and People Registry is to create a reliable infrastructure for research involving data on people and places in Iceland from 1703 to 1920, based on the official censuses of the period. This has been done by (a) establishing a reliable historical farm registry, (b) mapping census data onto the farm registry, and (c) connecting people between censuses, thereby transforming the static lists of the censuses into an interconnected network of nodes. In the spring of 2024 an attempt is being made to use AI solutions to suggest linkages between individuals that have not so far been linked, "finishing" the product as far as possible.
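Step (c), connecting people between censuses, can be pictured with a minimal record-linkage sketch. It is not the project's method: the record fields (`id`, `name`, `age`), the thresholds and the made-up records are assumptions for illustration. A pair becomes a candidate link when the ages are consistent with the gap between the two censuses and the names are sufficiently similar.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0 and 1 (standard library difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_links(census_a, census_b, year_gap,
                    min_similarity=0.85, age_tolerance=2):
    """Suggest (id_a, id_b, score) links between two census lists.

    All parameter names and default thresholds are hypothetical.
    """
    links = []
    for ra in census_a:
        for rb in census_b:
            expected_age = ra["age"] + year_gap
            if abs(rb["age"] - expected_age) > age_tolerance:
                continue  # ages are inconsistent with the time between censuses
            score = name_similarity(ra["name"], rb["name"])
            if score >= min_similarity:
                links.append((ra["id"], rb["id"], round(score, 2)))
    return links


# Made-up records, not taken from any census.
EARLIER = [{"id": "a1", "name": "Jon Jonsson", "age": 10}]
LATER = [
    {"id": "b1", "name": "Jon Jonson", "age": 25},
    {"id": "b2", "name": "Gudrun Jonsdottir", "age": 25},
]
links = candidate_links(EARLIER, LATER, year_gap=15)
```

Suggestions produced this way would still need review, and placing each record on a farm from the registry in step (a) would give a further constraint for ruling out false matches.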

This presentation will outline the aim and scope of the project, explain the development process and problems encountered on the way, and demonstrate a use case for the "final" product.



Collecting streaming services

Andreas Lenander Ægidius1, Mads Møller Tommerup Andersen2

1The Royal Danish Library, Denmark; 2University of Copenhagen, Denmark

In the streaming era, the very thing that defines it is what threatens to impede access to important media history and cultural heritage. Streaming’s barriers to entry and its interim content catalogs challenge its actual collection and preservation for research and teaching purposes. If researchers and libraries do not work together to document and preserve these services, we will keep losing important sources and data. A legal mandate to collect and preserve cultural heritage drives the national library to address this issue. The same issue sparks engagement among scholars who wish to collect streaming for research purposes, but who end up making individual archives unknown to their associates. Therefore, we wish to address this issue in collaboration by answering the following research questions: What methodological challenges do we find when we collect and study streaming services using our two different collection methods? What characterizes collections of streaming interfaces and how can we improve future collections? From a collection perspective, we argue that streaming services consist of their catalog, metadata, and graphical user interfaces. First, we map the large-scale legal deposit collection of streaming at a national library as well as a media researcher’s small-scale targeted collection. Second, we compare the resulting collections of web sites and graphical user interfaces in order to discuss methodological challenges. The findings of this comparative analysis indicate the existing deficiencies in both collections and suggest potential improvements in the collection and preservation of streaming services.

Concluding discussion

In the following, we will focus on methodological challenges surrounding the collection process. We will divide our discussion into three sections. First, we will discuss the advantages and disadvantages of the two approaches. Second, we will discuss the extent to which the two collections can supply or support each other and in that way mitigate their methodological challenges. Finally, we will discuss steps to achieve greater transparency and better data about and from streaming services.

We have described the national library’s method as a general collection approach and the media researcher’s method as a targeted collection approach. A parallel difference is whether a collection has a macro-level and/or micro-level approach. To exemplify this, we briefly recount how the library has initiated a general automated collection from commercial aggregators of born digital music and books, along with a targeted automated effort to collect streaming-only TV programs from the two major public service TV-stations. However, the library’s current attempted micro-level collection effort provides but a narrow slice of the dimensions of streaming. So far, the method does not capture the graphical user interface, only video files, program descriptions, images, and various other metadata.

The researcher’s targeted approach seems to have a wider margin of success by collecting a few streaming services in full or thematically. This should provide data that satisfies that researcher’s immediate needs. However, it runs the risk of being too narrow and hence supporting very few insights into the different dimensions and roles of streaming services in contemporary society.

We have already touched upon another important distinction to do with the degree of automation of the process of collecting. At the library, curatorial and technical staff thoroughly test and improve upon their practice of automated collection through many years of quality control. Yet, the library’s collection only happens outside any paywalls, which is a key collection bias and analytical disadvantage. What is not automatically collected will be missing from the collection. The interfaces, arguably the most important aspect of the streaming experience, are missing. In other words, our evidence suggests that the automated process has deficiencies in terms of the “lack of depth” during the collection of the interfaces of the streaming services.

In contrast, Author B has documented versions of streaming services’ interfaces with manual screenshots after having logged in to the services. If a web page does not load successfully, the researcher can just reload the web page to capture all the content in the subsequent screenshot. In the hand-held collection process, the researcher will inevitably tackle such obstacles in the short run on the micro level. However, the obstacles could accumulate during a longer scheduled collection. In other words, the hand-held collection is subject to the researcher’s stamina. If collection fatigue sets in at some point, the consistency of that collection could suffer. This is a potential hindrance for studies that aim to provide longitudinal insights on the development of streaming services. Author B remedied this by downshifting from monthly to quarterly collection. It seems that the hand-held method has longitudinal potential while the library’s collection method is established as longitudinal. Since there is no exact time-based definition of a longitudinal study, we could argue that the library’s collection method should provide very lengthy longitudinal advantages. Its allocated work effort of testing and improving its method should increase the replicability of the automated method. A threat worth mentioning stems from the fact that the very lengthy longitudinal collection is subject to the library’s strategic priorities, which must be renegotiated every 3-5 years.

Still, the two methods could support each other simply because they cover different depths and timelines of streaming. We will describe two possible interdependencies at the practical level and policy level. The targeted approach can document user profiles better and collect more micro-level details, e.g. full samples of images in a carousel and aspects of personalization, which are missing in the automatic collection. The choice to, or attempt at, collecting inside paywalls or logins is a very important one. The library’s general collection of other types of content and related materials from streaming services can serve as context for both methods. In the case of national services, the library should be able to give researchers access to the videos featured in the screenshots and historic news coverage of specific services. However, we should consider the methods as interdependent rather than the one being subordinate to the other. In other words, researchers can help the library patch the obvious holes in past collections. Looking forward, the researchers can help the library adjust and improve collections. Given its wide and ambitious scope, it should be in everyone’s interest to have an optimal, broad, very lengthy longitudinal collection of streaming. It is a demanding task for the curators to assess what future needs will be. Researchers and eventually the public will continuously have to help the library assess the merit of their general approach that aims to collect everything automatically. This will require a better and continuous dialogue between the library and these stakeholders about what should be collected from the internet (Brügger, 2018; Schafer and Winters, 2021). The same goes for the library and researchers concerning the various dimensions of streaming captured in varied datasets (Kelly and Sørensen, 2021: 87). Not only are the collections mutually beneficial on a practical level; we reiterate that they are also interdependent at a policy level.

The library initiated its collection practices based on investigations made by appointed experts and researchers that led to legal deposit legislation for dynamic web sites (Bache and Finnemann, 2003). Without renewed and continued exchange of advice and assessments of what needs to be collected, there will not be a wide, long, and lasting contextual collection to draw from and build upon. As such, it is a fundamental interdependence at the level of policy and at the level of the individual collections of specific types of content, as shown here in the case of streaming services. There are multiple advantages of developing relationships between researchers and institutions. Not only do they facilitate greater access to data but they may also potentially increase access to the expertise and tools required to make sense of said data while also enforcing appropriate digital data preservation policies (Kelly, 2022: 16–17).

We wish to add that the streaming services are also important partners and collaborators. However, their preservation needs and research interests might not match those of the libraries and the researchers. Nevertheless, greater awareness of the legal deposit obligations and the benefits of research collaborations could produce better preservation of digital cultural heritage, more research insights, improved public-service offers, and potential commercial gains. The dynamic, elusive Netflix catalog seems like a hyperobject, an n-dimensional non-entity on par with the internet itself (Morton, 2013). Yet, Netflix does have a research division and a website that links to their research (Netflix, n.d.). A quick survey of the site suggests that while they do present at conferences and publish in ACM proceedings, they do not seem open to collaboration with independent researchers.

We can pick up, browse through, and pass on books made of paper and their born digital counterparts. We cannot, at this point in time, ‘press play’ in a collected version of the software and content assemblage we call streaming services. Their constituent parts are spread across various collections. Curators and researchers face a near-insurmountable task of reconstructing them as a piece of cultural heritage and as a research object. Potentially, an automated collection process inside login would collect every page, and its contents, that is linked to in Figure 2. Barring any conventional download of content in a ripping manner, this would be akin to having a bot real-time “watch” all Netflix shows or work its way through a curated playlist of content that is covered by Danish legal deposit law. Such a scenario presents a probable collection method with a very high degree of complexity. Which elements in the interface should be “clicked” and in which sequence? Will the play button be consistently placed in the interface? Will the ‘more info’-button load a pop-up or take us to a different “site” of the interface? Let alone the apparent impossibility of automatically collecting interactive TV series!

In summary, this means that collecting streaming services is very problematic and fraught with challenges. Nonetheless, we are hopeful since we can identify overlapping collections that can support each other. The collections can be built upon and further enriched by research requests if researchers and institutions, in collaboration with the services, acknowledge interdependencies and produce rich documentation and metadata.

We urge researchers and curators to help politicians realize how big a problem the lack of transparency around streaming actually is. As we have discussed, the actual interfaces of these services are not well preserved for future reference, and every single day, important material is regrettably not collected. Also, if a streaming service used its interface in a way that was societally problematic, the current state of our collections might make that difficult for anyone to prove. One solution is that a political intervention could make it mandatory for streaming services to hand over documentation of their service's appearance (inside paywalls) to the library on a recurring basis. Another major transparency issue is the lack of reliable "ratings" or numbers that document streaming use, which most services do not share with the public. Altogether, these circumstances point to how streaming services are difficult to research and escape important critical observation.

In the meantime, we recommend that more parties test and share their experiences of using various tools that support archival and research-based collection of online materials. An example of this is Webrecorder’s software Browsertrix Cloud that provides a user interface for non-developers while being compliant with standardized Web archive formats (Myrvoll et al., n.d.). As tools become easier to use, it is increasingly important to provide documentation of choices before and during the collection to increase the methodological transparency and reflexivity of a given collection. This will help future exchanges and access to collections of streaming services.

Funding

A grant from the Ministry of Culture Denmark funded the research project that provided insights for this article: FPK-2021-0004.



Og að mér lifanda lifir enn hans hamingja -- Rare syntactic phenomena in parsed historical corpora

Ingunn Hreinberg Indriðadóttir, Þórhallur Eyþórsson

University of Iceland, Iceland

Introduction
This paper examines the historical distribution of the Prepositional Absolute Construction (PAC), a rare and underdescribed construction in Icelandic. Our study shows how a syntactically annotated digital corpus of historical texts can be used to describe the preservation and development of uncommon syntactic phenomena across different stages of a language. PAC has been defined as a small clause containing a subject NP and a present or past participle whose case is governed by the preposition ‘at’ (Eyþórsson and Indriðadóttir 2018). The construction consists of three types, labeled Type 1, Type 2a and Type 2b. Examples are given in (1):

(1a) Type 1: að öllum sjáandi

at all.dat seeing

‘While everybody sees/saw.’

(1b) Type 2a: að viku liðinni

at week.dat passed.dat

‘When the week had passed.’

(1c) Type 2b: að athuguðu máli

at considered.dat matter.dat

‘When the matter had been considered.’

In Type 1 (1a), the verb is in the present participle and the subject is in an active clause with a finite verb. In Type 2a (1b), the verb is in the past participle and the subject is in an active clause with a finite verb. In Type 2b (1c), the verb is in the past participle and the subject is in a passive clause with a finite verb (underlying object). The past participle shows agreement with the NP in case, number and gender in both Old and Modern Icelandic. The present participle is not inflected in Modern Icelandic, but it shows agreement in Old Icelandic, with a distinction in the masculine singular (OIc. komandi ‘coming’ (nom.sg.)/ komanda (obl.sg.) vs. Modern Icelandic komandi (nom./obl.sg.)).

The grammatical function of the NP in PAC is either that of a subject of a finite active clause (Type 1 and Type 2a) or a subject of a finite passive clause, i.e. an “underlying” object (Type 2b), as illustrated in (2):

(2a) Allir sjá þetta.

all.nom see this.acc

‘Everybody sees this.’

(2b) Vikan líður.

the-week.nom passes

‘The week passes.’

(2c) Mál var athugað.

case.nom was considered.nom

‘A case was considered.’

Due to their correspondence to subjects in finite clauses, in this paper we call the NPs in PAC “subjects”, as has been argued by Indriðadóttir and Eyþórsson (to appear). However, it remains to be shown independently, by means of the standard tests for subjecthood, that the relevant oblique NP is actually a subject in the PAC.

Previous studies of this structure have suggested that the distribution of the present and past participles is different in Old and Modern Icelandic, i.e. that there are very few occurrences of Type 1 in Modern Icelandic, whereas Type 2 is relatively common, in particular Type 2b. It has also been maintained that case marking within PAC developed very early on, as both dative and accusative NPs are attested in the PAC in Old Icelandic, whereas in Modern Icelandic only dative NPs seem to be found (Eyþórsson and Indriðadóttir 2018).

The results presented in this paper show that the distribution pattern of present and past participles in PAC has been consistent throughout all stages of the language from the 12th century to Modern Icelandic. This is important as it means that while PAC has always been a rare construction in the Icelandic language, it has maintained a similar level of use and productivity. Furthermore, our results show that accusative NPs in PAC survived in Icelandic until the 19th century, many centuries longer than has previously been considered.

PAC in IcePaHC

In this paper, we describe our detailed investigation of the distribution and development of PAC in Icelandic that we conducted in the Icelandic Parsed Historical Corpus (IcePaHC) (Wallenberg et al. 2011). We searched for all possible types of PAC with a single search command and extracted a total of 243 results. The results were then analyzed in R (2021), where the distribution of PAC from the 12th to the 21st century was defined, based on the three types described in (1). We then describe the results in detail, considering issues such as the grammatical function of the NP and the participle and the word order patterns within the construction.

Results

Of the 243 results, 5 were eliminated as they were considered improperly annotated. The remaining 238 results were analyzed in R (2021), where we defined the distribution of PAC from the 12th to the 21st century, based on the three types described in (1). Fig. 1 shows the distribution of the examples by century.

SEE PDF FOR FIGURE 1

Figure 1. Distribution of PAC by type and century

As shown in Fig. 1, the vast majority of examples were of Type 2b (3), while considerably fewer examples were found of Type 2a (4), and only a few examples were found of Type 1 (5). Most examples of Types 2a and 2b date from the 14th and 17th centuries.

(3) En [að sénu þessu mikla tákni]

but at seen.dat this.dat great.dat sign.dat

lofuðu allir guð og sæla Maríu Magdalenu.

praised all god and blessed Mary Magdalen

‘But having seen this great sign everybody praised God and the blessed Mary Magdalen.’ (ID 1350.MARTA.REL-SAG,.914)

(4) [en að morgni komnum] þá stóð Jesús í sjávarfjörunni.

but at morning.dat come.dat then stood Jesus in seashore

‘but when morning came Jesus was standing on the seashore.’

(ID 1540.NTJOHN.REL-BIB,232.1631)

(5) Og [að mér lifanda] lifir enn hans hamingja.

and at me.dat living.dat lives still his happiness

‘And while I am living his happiness still lives.’

(ID 1300.ALEXANDER.NAR-SAG,.554)

A closer look reveals that most of the examples from these periods (and in the results in general) can be found in only three publications. There are numerous examples from the 14th century, including 16 examples from the Saga of Bishop Árni (1325) and 16 examples from the Saga of Martha and Mary Magdalene (1350). From the 17th century, quite a few examples were also found, of which 18 were found in Jón Ólafsson Indíafari’s Travel Book (1661). In other respects, the number of examples is fairly evenly distributed between the centuries. The increase in examples in the 14th and 17th centuries is therefore probably due more to the personal style of the respective authors than to the increased general use of PAC in these periods. These results are consistent with the description of PAC in Modern Icelandic, i.e. that PAC Type 2 is relatively common, in particular Type 2b, which enjoys more productivity than the others. Nevertheless, all three types of PAC can be found through different stages of Icelandic, even though the construction is rare.
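The tabulation by type and century described above can be sketched as follows. The authors report doing this step in R; the Python below is only an illustration. It assumes that the first field of an example ID is the year of the text, which matches the dates given for the sagas above, and it assumes a simple binning in which the years 1300 to 1399 count as the 14th century; the authors' own binning may differ. The three records are examples (3) to (5) quoted above with the types assigned to them in the text.

```python
from collections import Counter

# (ID, PAC type) for examples (3)-(5) as quoted in this abstract.
EXAMPLES = [
    ("1350.MARTA.REL-SAG,.914", "2b"),
    ("1540.NTJOHN.REL-BIB,232.1631", "2a"),
    ("1300.ALEXANDER.NAR-SAG,.554", "1"),
]


def year_of(example_id: str) -> int:
    """Read the year from the first dot-separated field of the ID (assumed layout)."""
    return int(example_id.split(".", 1)[0])


def century_of(example_id: str) -> int:
    """Assumed binning: years 1300-1399 are counted as the 14th century."""
    return year_of(example_id) // 100 + 1


def tabulate(examples):
    """Count examples per (century, type) pair."""
    return Counter((century_of(ex_id), pac_type) for ex_id, pac_type in examples)


table = tabulate(EXAMPLES)
```

Grouping by the second field of the ID (the text name) in the same way would show how strongly a few publications drive the counts for a given century.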

Previous studies have maintained that case marking within PAC developed very early on, as examples of dative and accusative NPs have been found in PAC in Old Icelandic, whereas in Modern Icelandic only dative NPs seem to be found (Indriðadóttir and Eyþórsson, to appear, and Eyþórsson and Indriðadóttir 2018). In our search, we found much newer examples of PAC with an accusative NP, such as example (6), which is from the 17th century.

(6) Og [eftir máltíð gerða] var hann kallaður

and after meal.acc done.acc was he called

að fylgja einu líki sem verið hafði kaupmaður

to follow one corpse that been had merchant

til sinnar greftrunar.

to his burial

‘And after the meal he was summoned to a funeral procession to the burial of someone who had been a merchant.’ (ID 1628.OLAFUREGILS.BIO-TRA,.769)

We also found examples that contain the preposition á ‘on’ rather than the usual að, (7)-(8). This is interesting on its own, as examples of PAC with prepositions other than að have not been discovered before. Moreover, while the overt NP töðuslætti ‘haymaking’ in (7) is in the dative case, the missing NP in (8) is in the accusative, as shown by the form of the participle gert ‘done’:

(7) Anno 1661-1661 [á liðnum töðuslætti]

year 1661-1661 on passed.dat haymaking.dat

brann allur bærinn á Gröf á Höfðaströnd

burned all the-farm on Gröf on Höfðaströnd

hvar biskupinn herra Gísli hafði bú og mikið af fjárhlutum

where the-bishop mister Gísli had estate and much of property

sem þar voru inni.

that there was inside

‘In the year 1661, after haymaking, the whole farm on Gröf on Höfðaströnd, where Bishop Gísli had an estate and much property inside, burned.’

(ID 1725.BISKUPASOGUR.NAR-REL,.943)

(8) Þrællinn kvað honum illa farið að eggja sig til stórræða

the-slave said him badly gone to incite him to big-venture

en svíkja sig [á svo gert ofan]

but betray him on so done.acc down

um frelsi og fé er hann bauð honum til.

about freedom and money that he offered him to

‘The slave said that he had fared badly by inciting him to a big venture and thereupon betraying him of the freedom and money that he had offered him.’

(ID 1830.HELLISMENN.NAR-SAG,.220)

These results are important as this means that accusative NPs in PAC survived in Icelandic until the 19th century, many centuries longer than has previously been considered.

Finally, we considered word order within PAC, which can either be verb–subject (9) or subject–verb (10):

(9) [Að liðnum þessum jólum] var herra Árni biskup

at passed.dat this.dat Christmas.dat was mister Árni bishop

að stóli sínum til vors.

on seat his to spring

‘When this Christmas had passed Bishop Árni was in his bishopric until spring.’

(ID 1325.ARNI.NAR-SAG,.442)

(10) og [að veizlunni endaðri] voru menn með gjöfum útleystir.

and at feast.dat ended.dat were men with farewell-presents parted

‘and when the feast was over the men were given farewell gifts.’

(ID 1675.ARMANN.NAR-FIC,121.1007)

The latter appears obligatory if the NP is a pronoun, as in (5). In the paper, we argue that similar rules apply to word order in PAC as in constructions like Object Shift, i.e. an unstressed pronominal object must always precede a sentential adverb, and cannot be left in the default object position, or in situ (see Holmberg 1986 and much later work). Our conclusion is supported by the fact that pronouns are generally affected by different rules of word order than other types of NPs in Icelandic syntax (see e.g. Thráinsson 2007:31–37).

While rules and variation in word order in PAC have not changed over time, the construction tends to be shorter in Modern Icelandic than in older stages of the language, as argued by Indriðadóttir and Eyþórsson (to appear). In our paper, we demonstrate that PAC in older stages of Icelandic would allow much longer and more complex NPs than can be found in modern Icelandic and that long and complex NPs can be found preceding the participle (11) or following the participle (12).

(11) Og svo [að þessu samtali í það sinn öllu enduðu]

and so at this.dat conversation.dat in that time all.dat ended.dat

lét kóngur kalla á sinn eigin víntapparameistara

let king call on his own wine-steward

Kristján Skammelsson.

Kristján Skammelsson

‘and so when this conversation was all over at that time the King had his own wine steward, Kristján Skammelsson, summoned.’

(ID 1661.INDIAFARI.BIO-TRA,40.375)

(12) Á miðja nátt fyrir framferðartíma sællar Mörtu

on mid night before death-time blessed Martha

[að sofnaðum bræðrum og þeim mönnum sem vöku héldu

at asleep.dat brothers.dat and those.dat men.dat that wake held

með kertum og ljósum] þaut mikill hvirfilvindur(...)

with candles and lights whistled great whirlwind

‘At midnight before the time of death of the blessed Martha, when the brothers and the men who held a wake with candles and lights had fallen asleep, a great whirlwind whistled (...).’

(ID 1350.MARTA.REL-SAG,.703)

The results of this study are robust, providing a detailed historical overview of PAC, which has never been done before, describing how this construction has been preserved in Icelandic and how it has developed over time. Our study demonstrates how syntactically annotated digital corpora can contribute to the study of the preservation and development of uncommon syntactic phenomena across different stages of a language.

Indriðadóttir-Og að mér lifanda lifir enn hans hamingja -- Rare syntactic phenomena-209.pdf


Three 3D scanners and 13 institutions: How to prioritize projects and spark interest

Hrönn Konráðsdóttir

National Museum of Iceland, Iceland

In the early stages of the Centre for Digital Humanities and Arts in Iceland, three 3D scanners from Artec were purchased. They were chosen with the idea of being able to scan everything from chess pieces to rooms and houses. The scanners are located at the National Museum of Iceland but are easily movable between institutions. They can also be brought on site when needed.

The scanners serve as a collaborative resource for 13 institutions. The National Museum’s role is to use the scanners to scan its own collections, to supervise their lending to the other participating institutions, and to assist with their use when possible. The initial stages have highlighted the importance of effective project prioritization and institutional engagement. The presentation will delve into the initial phases of the project, addressing the challenges and successes encountered while initiating this venture. Furthermore, it will provide insight into ongoing projects and their current status, and outline future aspirations.



Exploring Existentialist Design in Digital Humanities: A Case Study of User Experience at the National Library of Norway

Jana Sverdljuk

National Library of Norway, Norway

This paper examines the application of existentialist philosophical principles to user experience design within the Digital Humanities Laboratory at the National Library of Norway (NLN). Over the past decade, the NLN DH Lab has evolved from organizing workshops on Jupyter Notebook tools to developing user-friendly applications tailored to streamline computer-mediated research on cultural heritage materials. Drawing from existentialist philosophy, the paper explores how these applications prioritize exploration, accommodate subjectivity and interpretation, embrace fluidity and multiplicity of meanings, and facilitate engagement with being-in-the-world. By analyzing user personas and their interactions with the applications, the paper demonstrates the profound impact of existentialist design principles on user experiences and interactions within the digital humanities landscape. The paper concludes with reflections on the dynamic relationship between technology design and user engagement, underscoring the role of design principles in shaping meaningful and impactful experiences for diverse user groups.

Sverdljuk-Exploring Existentialist Design in Digital Humanities-172.docx


Digitized language variation for computational dialectology: The Dialect Atlas of Finnish by Lauri Kettunen (1940)

Jenni Santaharju1, Terhi Honkola1, Perttu Seppä2, Kaj Syrjänen1, Unni Leino3, Outi Vesakoski1,4

1University of Turku, Finland; 2University of Helsinki, Finland; 3Tampere University, Finland; 4Turku Institute of Advanced Studies

Our work introduces open access data conversions based on the Dialect Atlas of Finnish (Kettunen 1940) that are suitable for computational analyses of dialectal traits. The dialect atlas was collected by Lauri Kettunen in the 1920s–1930s and describes the spatial variation of Finnish ca. 100 years ago. The data is organised into 213 maps, each describing the areal variation of one (mostly) morpho-phonological feature and identifying which variant(s) of the feature were present in each of the 525 Finnish-speaking municipalities. The dialect atlas is so far the most comprehensive dataset on dialectal variation in Finnish and the only data available on the historical linguistic landscape. As the dialect atlas was collected before urbanisation and the mass movements during WW2, it is suitable for studying, for example, the development of the preindustrial linguistic landscape and the mechanisms driving dialect formation. The dialect atlas has been studied with both traditional (reviewed in Aarikka 2023) and quantitative methods (reviewed in Syrjänen 2016).

A digital version of the dialect atlas is found at http://kettunen.fnhost.org, and an undocumented version of the data, corrected by us, is at http://urn.fi/urn:nbn:fi:csc-kata20151130145346403821. The latter is based on the first digitised version of the dialect atlas by Embleton & Wheeler (1997, 2000), done in collaboration with the Institute for the Languages of Finland (KOTUS) and corrected by us (www.bedlan.net). In this paper we provide the raw data of the dialect atlas readily formatted in different coding schemes, along with additional annotations to further facilitate the use of this valuable linguistic data. We also reconstruct Kettunen's data collection procedure and provide a linguistic classification of the collected traits. Furthermore, we promote the use of the data and the FAIR principles by offering the different coding schemes used in different BEDLAN papers: we have modified the dialect atlas into multiple versions according to the needs of different studies (Syrjänen et al. 2016, Honkola et al. 2018 and Santaharju et al. ms in revision). We offer here not only the master copy and metadata, but also the different versions, available in the open repository. Finally, we contribute to the development of methodology in evolutionary language sciences and computational dialectology by discussing different coding schemes for the language data. Even though analogies between linguistic and genetic data have been discussed from a theoretical point of view, e.g. in Andersen (2006), Croft (2008) and Pakendorf (2014), the different alternatives for coding linguistic data so that it matches genetic data have received little attention (see Leino et al. 2020).
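As a rough illustration of what a difference in coding scheme can mean for data of this shape (municipalities by features, with one or more variants attested per feature), the sketch below converts the same toy records into a binary presence/absence matrix and into a multistate matrix. The municipality names, feature names, variants and both function names are hypothetical, and the two schemes are generic examples rather than the schemes actually distributed with the dataset.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy records: municipality -> feature -> variants attested there.
# These are NOT values from the Dialect Atlas of Finnish.
ATLAS: Dict[str, Dict[str, Set[str]]] = {
    "Municipality A": {"feature_1": {"a"}, "feature_2": {"x", "y"}},
    "Municipality B": {"feature_1": {"b"}, "feature_2": {"x"}},
    "Municipality C": {"feature_1": {"a"}, "feature_2": {"y"}},
}


def binary_coding(
    data: Dict[str, Dict[str, Set[str]]]
) -> Tuple[List[Tuple[str, str]], Dict[str, List[int]]]:
    """One 0/1 column per (feature, variant) pair: 1 if the variant is attested."""
    columns = sorted(
        {(f, v) for feats in data.values() for f, vs in feats.items() for v in vs}
    )
    rows = {
        place: [1 if v in feats.get(f, set()) else 0 for f, v in columns]
        for place, feats in data.items()
    }
    return columns, rows


def multistate_coding(
    data: Dict[str, Dict[str, Set[str]]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """One column per feature: each distinct combination of variants is one state."""
    features = sorted({f for feats in data.values() for f in feats})
    rows = {
        place: [
            "+".join(sorted(feats[f])) if feats.get(f) else "?"  # "?" = no data
            for f in features
        ]
        for place, feats in data.items()
    }
    return features, rows


if __name__ == "__main__":
    bin_cols, bin_rows = binary_coding(ATLAS)
    ms_cols, ms_rows = multistate_coding(ATLAS)
    print(bin_cols, bin_rows)
    print(ms_cols, ms_rows)
```

In the binary version a municipality with two co-occurring variants simply has two 1s, whereas the multistate version has to treat the combination as a state of its own; this is one of the trade-offs that makes the choice of coding scheme matter for downstream analyses.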

Our work and the dataset provided here contribute to the rise of the evolutionary language sciences by adapting approaches from evolutionary biology and computational sciences to operate with large digitised linguistic datasets. This paper will especially advance computational dialectometry, not only by providing a new data resource to an international audience, but also by presenting a well-established framework for conducting studies within digital humanities.



Using BERT to Study Semantic Variations of Climate Change Keywords in Danish News Articles

Florian Meier

Aalborg University, Denmark

The terms greenhouse effect, global warming, and climate change are often used synonymously in everyday conversations about the warming planet. Utilizing Danish BERT, the study employs masked language model tasks to uncover semantic shifts and variations of these climate change-related keywords in Danish news articles from 1990 to 2021. The findings offer insights into contextual understandings and framing nuances by journalists, contributing to a deeper comprehension of these terms in the Danish media discourse on climate change.
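The abstract does not specify how the masked-language-model predictions are compared. One possible comparison, sketched below under that caveat, assumes that the top-k substitutes predicted for each masked keyword have already been collected, and measures how much the substitute sets of two keywords overlap (Jaccard similarity). The keyword labels, the substitute lists and the function names are placeholders for illustration, not model output or results from the study.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical placeholder data: keyword -> top-k substitutes that a masked
# language model might propose when the keyword is masked in context.
SUBSTITUTES: Dict[str, List[str]] = {
    "keyword_1": ["sub_a", "sub_b", "sub_c"],
    "keyword_2": ["sub_b", "sub_c", "sub_d"],
    "keyword_3": ["sub_e", "sub_f", "sub_g"],
}


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity of two substitute lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists share nothing measurable
    return len(set_a & set_b) / len(union)


def pairwise_overlap(subs: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of keywords."""
    return {
        (k1, k2): jaccard(subs[k1], subs[k2])
        for k1, k2 in combinations(sorted(subs), 2)
    }


if __name__ == "__main__":
    for pair, score in pairwise_overlap(SUBSTITUTES).items():
        print(pair, round(score, 2))
```

A high overlap would suggest that two keywords are used in interchangeable contexts; computing the same score separately per time period would be one way to look for shifts over time.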

Meier-Using BERT to Study Semantic Variations of Climate Change Keywords-125.pdf


From Miðgarð to Marvel: Norse Mythology, Augmented Heritage and the Prose Edda

Alan Thomas Searles

University of Iceland, Iceland

This research proposes that the digital integration of cultural heritage in Iceland should expand beyond urban settings to encompass heritage sites, monuments, churches, graveyards and natural landscapes, initially focusing on the research centre at Snorrastofa in Reykholt and its association with Snorri Sturluson, the Prose Edda and Norse Mythology.

In recent decades, digital technologies have increasingly been employed in cultural and heritage spaces in Iceland. Typically these spaces (museums, galleries and libraries) are located in and around Reykjavík and have relatively large visitor numbers. Technologies such as barcodes, QR codes and NFC tags have all been utilised with varying degrees of success in attempts to improve the participatory and experiential engagement of visitors (Werthner et al.). Mobile phones are increasingly being used as interactive devices in cultural and heritage spaces (Lombardi). The capabilities of recent convergent technologies, applications and cloud storage afford opportunities for comprehensive integration of digital interfaces with cultural heritage.

In 2021 the European Commission launched the Digital Decade framework, which includes a strategy, targets and objectives for digital innovation in Europe until 2030. As part of the Digital Decade framework, a number of projects and policies have been initiated. These include the project on Digital Cultural Heritage, focusing on the areas of digitisation, online access to cultural material and digital preservation. According to the Digital Cultural Heritage website, “Cultural heritage is evolving rapidly thanks to digital technologies. The momentum is now to preserve our cultural heritage and bring it to this digital decade.” The Cultural Heritage Cloud initiative has been launched with the aim of developing specific digital collaborative tools for the sector while removing barriers for smaller and remote institutions. The Cultural Heritage Cloud will aim to add a new digital dimension to cultural heritage preservation, conservation, restoration and enhancement.

This research will use the framework and initiatives currently being implemented by the EU to extend digital cultural mapping and virtual heritage technologies to a traditional museum in rural Iceland. Research indicates that public engagement with a culturally significant encounter, such as visiting a museum or heritage centre, can be greatly enhanced through the implementation of interactive tools and mobile devices (Ruiz-Gómez et al.). These tools can be as simple as QR codes located appropriately to promote user engagement with an interactive experience or as complex as augmented, virtual or extended realities designed to create an immersive experiential environment for users.

The primary objective of the research project will be to connect people visiting places of cultural significance to the literature and history of the place. We hope to increase visitors’ understanding of the intimate relationship between the Prose Edda and Norse Mythology and, by extension, their appreciation of the far-reaching and enduring impact of Reykholt on twentieth-century culture, cinema and literature. To achieve this objective, the plan is to implement technologies which will allow individuals to interact seamlessly with specific locations and, through digital interfaces and extended realities, connect those physical locations with medieval Norse literature and mythology.

The proposed research will be carried out in close collaboration with Snorrastofa, an independent research centre located in Reykholt in western Iceland, the main residence of Snorri Sturluson (1179-1241). The main goal of Snorrastofa is to facilitate research on the medieval period in general, and on Snorri and his works in particular. Reykholt is the place where modern Norse Mythology was written or compiled into a single book for the first time, almost a thousand years ago.

A number of innovative interactive digital projects have been implemented at heritage sites in Iceland in recent years. The Find the Past project at Thingvellir National Park and the Back to Hofsstaðir project in Garðabær both utilise WebXR technology to create an extended virtual environment which visitors can access via mobile devices. While this technology is immersive, it is still a passive experience for users. A more interactive experience is being offered at the 1238 visitor centre at Sauðárkrókur in north Iceland, where guests can take part in the Battle of Örlygsstaðir using Virtual Reality headsets and haptic feedback breastplates. Three-dimensional holographic images linked to specific locations via QR codes are being investigated by the town of Hafnarfjörður, and a 360-degree scanned map of the statue garden at the Einar Jónsson museum has been created by the engineering firm Efla and made available online to schoolchildren. These and other projects indicate that the technology and expertise are available in Iceland for private interests and tourist companies to implement digitally enhanced experiences for tourists and visitors to heritage sites. This project will attempt to apply some of these tools, technologies and expertise via a digital humanities project focusing on virtual heritage. While the concept of digital cultural heritage is not new, the technology and tools needed to implement the ideas developed in recent decades have reached a point where the theory can now be put into practice (Salek Farokhi and Hosseini).

According to the European Commission website on Culture and Creativity, “cultural heritage..encompasses a broad spectrum of resources inherited from the past in all forms and aspects”, and “Research and innovation nurture smart and technologically advanced solutions to help Europe protect and promote its cultural heritage.”

The EU project “Integrated e-Services for Advanced Access to Heritage in Cultural Tourist Destinations (ISAAC)” confirmed that adopting innovative ICT could enhance and improve the promotion of local cultural heritage and improve cooperation across sectors and research disciplines.

During the implementation of this project I will draw on the work of Erik Champion and his extensive research on the concept of Virtual Heritage, and will follow EU guidelines for Digital Cultural Heritage projects. This is an attempt to address a gap in digital humanities research in Iceland, specifically in virtual heritage studies, and I hope it will contribute to a structure which future studies of digital heritage and culture may utilise and benefit from.



 
Conference: DHNB 2024