Conference Agenda

Session Overview
Session
WS2-1: FULL-DAY WORKSHOP (DiPaDA 2024)
Time:
Tuesday, 28/May/2024:
8:30am - 10:00am

Session Chair: Mats Fridlund, University of Gothenburg, Sweden
Session Chair: Matti La Mela, Uppsala University, Sweden
Location: K-208 [2nd floor]

https://www.hi.is/sites/default/files/atli/byggingar/khi-stakkahl-2h_2.gif

Full programme at: https://dhnb.eu/conferences/dhnb2024/workshops/dipada/

09:00-09:10 Welcoming (Organizing Committee)

09:10-09:30 Transcriber effects in the Icelandic parliament corpus (Anton Karl Ingason, Lilja Björk Stefánsdóttir)

09:30-10:00 Augmenting the Analysis of Political Discourse: A Word Embedding and Context-sensitive Methodological Approach to the Swedish Parliamentary Corpus (Leif-Jöran Olsson, Daniel Brodén, Mats Fridlund, Magnus P. Ängsal, Patrik Öhberg)


Presentations

Transcriber effects in the Icelandic parliament corpus

Anton Karl Ingason, Lilja Björk Stefánsdóttir

University of Iceland, Iceland

The Icelandic parliament corpus is being used to study individual
lifespan change in sociolinguistic style-shift. We report on how the
word order effect in question is affected by decisions made by those
who transcribe the speeches and show that while some changes are made
by the transcribers, the overall pattern of linguistic usage is not
substantially altered. Ideally, each recording is manually checked by
an annotator, but automatic annotation can be used with the understanding
that quantitative findings are subject to minor errors.

Ingason-Transcriber effects in the Icelandic parliament corpus-245.pdf


Augmenting the Analysis of Political Discourse: A Word Embedding and Context-sensitive Methodological Approach to the Swedish Parliamentary Corpus

Leif-Jöran Olsson, Daniel Brodén, Mats Fridlund, Magnus P. Ängsal, Patrik Öhberg

University of Gothenburg, Sweden

Introduction

Recent years have witnessed a rapidly expanding interest in data-driven research on parliamentary collections as well as a drive towards further development and standardisation of parliamentary infrastructures (La Mela et al. 2022: 3–4), including cross-national initiatives such as Parla-CLARIN (https://github.com/clarin-eric/parla-clarin). As regards Sweden, although major work is carried out within the SWERIK infrastructure and the Welfare State Analytics project (WeStAc), research on the Swedish parliamentary datasets has, so far, been primarily exploratory and driven by broad conceptual issues and the application of common statistical measurements (see Ohlsson et al. 2022; Brodén et al. 2023; Jarlbrink & Norén 2023), leaving a significant gap in the data modelling and the contextual understanding of the data.

This paper will present the augmented methodological approach to analysing the Swedish parliamentary discourse on terrorism developed in the ‘Terrorism in Swedish Politics’ (SweTerror) project (2021–2025) (Edlund et al. 2022). Drawing upon a mixed methods approach, we will discuss a set of context-sensitive analyses of the Swedish parliamentary record, integrating language technology (LT) and contextualising methods from, among other research areas, political science, history of ideas and linguistics. Notably, the discussion connects to the current debate within digital humanities (DH) about the need for engaging with the contextual complexities of text mining large-scale archival collections. According to digital historian Jo Guldi (2023), dedication to the question of what makes text mining accurate and robust will only get data-driven analysis of large-scale collections so far, as without a contextual sensibility applied to the materials the results tend to raise more questions than they answer (see also Bode 2018).

Here we outline SweTerror’s enactment of a contextualised understanding of the text data through the development of a custom-made dataset and the use of word embeddings (vectors) for analysing the framing of terrorism in the parliamentary debates, 1968–2018. The paper presents an LT approach firmly grounded in humanities and social sciences (HSS) research questions, and highlights its methodological and analytical potential by presenting results from two case studies of how the Swedish Parliament (riksdag) and the different political parties have engaged with the issue of terrorism.

The dataset and LT approach of SweTerror

The paper will focus on our work with developing the SweTerror corpus, using and adapting the Swedish Parliament Corpus of the edited transcripts of minutes that are currently being cleaned up, partly re-digitised and curated for research purposes (current version 0.14). The dataset is longitudinal and encompasses both the bicameral and unicameral Parliament (1867–1970 and 1971–2018, respectively), consisting of roughly 4 M tokens per parliamentary year (see below). The structure of speeches is reintroduced with a correctness of 90+ percent. Notably, the dataset is annotated with metadata about Members of Parliament (MPs) concerning name, party affiliation, gender and regional representation. Furthermore, we describe the exchange between SweTerror and SWERIK, with SweTerror’s LT analyst Olsson serving on the advisory board and technical advisory board of SWERIK and further enriching and curating the Swedish Parliament Corpus for the benefit of the infrastructure and our research purposes. This work includes contributing various forms of quality control; in this paper we will point at some issues of relevance for our analysis, including the identification of omissions in the dataset such as missing debate protocols.

From the infrastructure perspective, the paper will highlight the integration of workflows into the Språkbanken Text (SB Text) infrastructure, including the Korp tool (Borin et al. 2012), to avoid reprocessing in the SB Text infrastructure. In turn, the process will introduce more flexibility to Korp’s word picture functionalities, feed into new Sparv plugins (Borin et al. 2016) and be accessible through APIs. By extension, this means that workflows and data will also be integrated into the CLARIN ERIC infrastructure.

Concerning contextualisation, writings in Digital Humanities on Swedish Parliamentary data have mostly focused on more technical and formalistic aspects of the documentary record, such as issues surrounding OCR quality and metadata as well as how the transcriptions of the minutes are the result of post-speech editing (see Norén & Jarlbrink 2024). However, SweTerror seeks to enact a more contextualising understanding of the data, a simple yet significant element being our choice to, contrary to most parliamentary datasets, group the debates by parliamentary year (autumn–summer), rather than calendar year, to distinctly represent the mandate period of the Parliament. An important rationale for this is that changes of government during election years (mid-calendar year) affect the political dynamics, and we have previously shown that governmental position is a major factor for MPs’ motion writing on the topic of terrorism (Brodén et al. 2023). Following our lead, SWERIK in 2024 adopted this principle for their Swedish Parliament Corpus.
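The grouping by parliamentary year rather than calendar year can be illustrated with a minimal sketch. It assumes, purely for illustration, that a parliamentary year begins on 1 September; the abstract only says "autumn–summer", and the function name and label format are hypothetical rather than taken from the SweTerror or SWERIK code.

```python
from datetime import date

# Assumption: the parliamentary year is taken to start on 1 September.
# The abstract only states "autumn-summer"; the real opening date may differ.
SESSION_START_MONTH = 9


def parliamentary_year(d: date, start_month: int = SESSION_START_MONTH) -> str:
    """Return a label such as '1991/92' for the parliamentary year containing d."""
    start = d.year if d.month >= start_month else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


# Hypothetical debate dates, grouped by parliamentary year.
debates = [date(1991, 5, 14), date(1991, 10, 2), date(1992, 3, 20)]
groups = {}
for d in debates:
    groups.setdefault(parliamentary_year(d), []).append(d)
```

With these toy dates, the May 1991 debate falls in 1990/91 while the October 1991 and March 1992 debates share 1991/92, so a change of government after an autumn election is not split across two analytical units.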

From the technical perspective, the paper will describe our workflows around the annotation pipelines, where the outputs are continuously aggregated and analysed in an iterative process with each layer of annotation having at least one manual evaluation. Specifically, we discuss our work with word vectors and metadata, respectively, for contextual readings in two case studies concerning the occurrence of terrorism discourse in the Swedish parliamentary debate transcripts.

Case study 1: Vectors of violence

A key part of the SweTerror project is applying word embeddings, or word vectors (Mikolov et al. 2013), for longitudinal exploration and examination of conceptual changes and conceptual similarities related to the notion of terrorism (see Stampnitzky 2013; Ditrych 2014; Zoller 2021). The word vectors are used in combination with enriched document annotation and quality-assessed metadata to create ‘temporal lenses’ to traverse our analytic universe. Furthermore, we highlight the work concerning document-based analyses of classification and Named Entity Recognition (NER). Both of these enrichments are used for traversal or retrieval of similar sections and related activities based on network relations.

This case study explores the Swedish parliamentary speech on terrorism with regard to discourse semantic patterns and competing terms from the realm of political violence. We aim to compare the usage of discourse-relevant lexical items such as the Swedish terms ‘terror’, ‘terrorism’, and ‘våldsbejakande extremism’ (‘violence-affirming extremism’). To this end, we compare these units diachronically by means of their similarity and closeness, estimated through a range of word vectors, allowing us to trace the development of the parliamentary discourse on terrorism with regard to continuities and discontinuities. Comparing different vectors of semantically related lexical units diachronically allows us to identify potential discursive shifts in the framing of terrorism over time. A key focus of the analysis is whether, and if so how, the vectors of ‘terror’ changed when the modern usage in Swedish of the word ‘terrorism’ emerged in the early 1970s. Another point of interest is possible discursive shifts related to the establishment of the specific term ‘våldsbejakande extremism’ in the 2010s (Andersson Malmros 2022) and whether it had an impact on the vectors of ‘terrorism’.

In our preliminary findings, the calculated similarity between ‘terror’ and ‘terrorism’ remains rather high over time, ranging from 0.73 (2015) to 0.93 (1974), where a value of 1.0 means identical embeddings in the data. Although there are noteworthy differences between separate years, a general decrease in similarity after 2001 is discernible, with 2005 as the first year when similarity drops below 0.8 and 2015 displaying the all-time lowest degree of similarity. This discursive change can likely be interpreted as a further specialisation of the term ‘terrorism’ post 9/11, meaning that it increasingly diverges from the more general term ‘terror’. A further point of inquiry is examining different ways of wording political violence by way of the lexical items mentioned in relation to MPs’ party affiliation. This methodological advancement enables the outlining of differences and similarities in framing terrorism with respect to lines of political-ideological differences in parliament diachronically.
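A year-by-year similarity series of the kind reported above could be computed roughly as follows. This is a sketch under stated assumptions: the abstract does not name the similarity measure, so cosine similarity (a common choice for word embeddings) is assumed, and the three-dimensional vectors are invented toy values, not SweTerror data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy embeddings (invented values), one small model per year.
embeddings_by_year = {
    1974: {"terror": [0.9, 0.1, 0.3], "terrorism": [0.8, 0.2, 0.4]},
    2015: {"terror": [0.9, 0.1, 0.3], "terrorism": [0.3, 0.8, 0.4]},
}

# Similarity between the two terms within each year's model.
similarity_by_year = {
    year: cosine_similarity(vectors["terror"], vectors["terrorism"])
    for year, vectors in embeddings_by_year.items()
}

# Years in which the similarity falls below an illustrative threshold of 0.8.
below_threshold = [y for y, s in sorted(similarity_by_year.items()) if s < 0.8]
```

Because both words are compared inside the same year's model, the per-year scores can be set side by side without aligning the separate embedding spaces to each other.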

Case study 2: Terrorism is not for beginners

Another integral part of the SweTerror project is combining word vectors and metadata (see above) for classifier tasks and traversal. The act of giving speeches holds significance for analysis, as it influences the discourse surrounding policies, but also serves to inform the public about position-taking and the law-making process. However, the allocation of speaking opportunities and debates on certain issues among legislators is not random. There are patterns. One such pattern is gendered speech behaviour. For instance, women often deliver fewer speeches (Bäck, Debus & Müller 2014) and exhibit more emotional (Dietrich, Hayes & O'Brien 2019) and less aggressive speech tendencies (Kathlene 1994). Moreover, women tend to speak less on subjects regarded as ‘masculine’ (Bäck & Debus 2019).

The parliamentary speeches in our dataset are automatically assigned a Persistent Identifier (PID) for the speaker, which can be linked to other metadata and calculated metadata. This allows for an analysis of structural differences in the debates on terrorism at party level, including government versus opposition, speech volume measured as token percentage, and differences between men and women. Furthermore, we will integrate ‘seniority’ as an analytical factor through measurements based on the speaker’s age, years in parliament and position (Minister or not, membership in Parliamentary Committees, committee chairs, governmental party, etc.). Our temporal periods are dynamic in the sense that they can encompass, among other things, parliamentary year, government period, and eras defined by MPs that have been particularly influential in the debate on terrorism, all of which will be explored for both continuities and discontinuities. Networking inside parliament is explored and workflows have been established for visualising and comparing extra-parliamentary and intra-parliamentary networking.
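Speech volume measured as token percentage can be sketched as a simple aggregation over speech records keyed by speaker metadata. The record layout, field names and values below are hypothetical and chosen only for illustration; they do not reflect the actual corpus schema.

```python
from collections import defaultdict

# Hypothetical speech records: speaker PID, linked metadata and token count.
speeches = [
    {"pid": "p1", "gender": "woman", "party": "A", "tokens": 300},
    {"pid": "p2", "gender": "man", "party": "B", "tokens": 500},
    {"pid": "p1", "gender": "woman", "party": "A", "tokens": 200},
    {"pid": "p3", "gender": "man", "party": "A", "tokens": 1000},
]


def token_share(records, key):
    """Return each group's share of all tokens, in percent, grouped by `key`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["tokens"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: 100 * n / grand_total for group, n in totals.items()}


share_by_gender = token_share(speeches, "gender")
share_by_party = token_share(speeches, "party")
```

The same function works for any metadata field attached to the PID, so government versus opposition or a seniority band could be used as the grouping key in the same way.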

This case study also provides an overview of women’s speech participation over an extended period of time and explores which MPs have discussed terrorism. Our preliminary findings suggest that women’s share of speech has risen slowly since the early 20th century, levelling out close to 50 percent in recent decades (since 1988). Nowadays, women speak about as much as men do. Whether that pattern is also reproduced in the terrorism debate will be presented and discussed in our conference presentation, once we have more complete results.

Conclusions

We conclude by drawing together our lines of thought about the augmented methodological approach of SweTerror to analysing Swedish parliamentary discourse and how it can enrich a contextual understanding of parliamentary data, in Sweden and beyond.

Olsson-Augmenting the Analysis of Political Discourse-247.pdf


Conference: DHNB 2024