Conference Agenda

Overview and details of the sessions of this conference.

Please note that all times are shown in the time zone of the conference.

Session Overview
Session: SESSION #15: LINGUISTIC ANALYSIS
Time: Friday, 31/May/2024, 8:45am - 10:15am

Session Chair: Matti Lamela, Uppsala University, Sweden
Location: K-207 [2nd floor]

Presentations
8:45am - 9:15am

Gly2Mdc v.2.0: Lessons Learned from Building a Tool for Hieroglyphic Texts

Heidi Annika Jauhiainen

University of Helsinki, Finland

In order to advance digital methods in Egyptology, machine-readable hieroglyphic texts are needed. While machine-readable cuneiform texts have been used extensively in Assyriology, the intricate nature of the hieroglyphic script makes it difficult to create accessible corpora. Dedicated hieroglyphic text editors are used to produce images of texts in which the signs are correctly arranged above and beside one another in box-like groups. These images are used in publications, but the machine-readable project files are generally discarded. In this paper, I introduce Gly2Mdc v.2.0, a tool that transforms .gly files containing encoded hieroglyphic texts into a more human-readable format. The tool extracts and cleans the encoding and lets users save the text in different formats. The aim is to give users of hieroglyphic text editors the opportunity to also publish their texts in machine-readable form, thereby increasing the amount of text available for developing digital methods. The paper discusses the challenges faced in developing the tool, including the impossibility of rendering the original text faithfully in machine-readable form and the difficulty of converting the encoding to Unicode.

Jauhiainen-Gly2Mdc v20-183.pdf
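Gly2Mdc's own code is not shown here. As a minimal sketch of the final step the abstract mentions, converting a Manuel de Codage (MdC) style encoding to Unicode, the following Python handles only the simplest case: Gardiner codes joined by the MdC operators `-` (sequence), `:` (vertical stacking) and `*` (horizontal juxtaposition). It assumes Python 3.8+, whose Unicode database includes the Egyptian hieroglyph format controls added in Unicode 12, and it maps `:` and `*` naively onto the Unicode vertical and horizontal joiners. The function names are hypothetical, and brackets, operator precedence, phonetic aliases and fine layout are deliberately left out, which shows in miniature why a faithful conversion is hard.

```python
import re
import unicodedata

# Naive mapping of MdC layout operators to Unicode 12+ format controls.
# Brackets and operator precedence are NOT handled in this sketch.
OPERATORS = {
    ":": unicodedata.lookup("EGYPTIAN HIEROGLYPH VERTICAL JOINER"),
    "*": unicodedata.lookup("EGYPTIAN HIEROGLYPH HORIZONTAL JOINER"),
    "-": "",  # plain separator between signs/groups: nothing to emit
}

# Gardiner code: category (A-Z or Aa), number, optional variant letter.
GARDINER = re.compile(r"^(Aa|[A-Z])(\d+)([A-Za-z]?)$")


def normalize_code(code: str) -> str:
    """Hypothetical helper: turn an MdC Gardiner code such as 'A1' or 'Aa1'
    into the zero-padded form used in Unicode names ('A001', 'AA001')."""
    match = GARDINER.match(code)
    if match is None:
        raise ValueError(f"not a Gardiner code: {code!r}")
    category, number, variant = match.groups()
    return f"{category.upper()}{int(number):03d}{variant.upper()}"


def mdc_to_unicode(mdc: str) -> str:
    """Hypothetical converter for simple MdC strings.
    Phonetic aliases such as 'n' raise ValueError; Gardiner-shaped codes
    that have no Unicode character raise KeyError."""
    out = []
    # The capturing group keeps the operators in the split result.
    for token in re.split(r"([-:*])", mdc.replace(" ", "")):
        if token in OPERATORS:
            out.append(OPERATORS[token])
        elif token:
            name = "EGYPTIAN HIEROGLYPH " + normalize_code(token)
            out.append(unicodedata.lookup(name))
    return "".join(out)
```

For example, `mdc_to_unicode("A1-G1")` yields the seated-man sign (U+13000) followed by the vulture sign G1, while `"A1:G1"` inserts a vertical joiner between them.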


9:15am - 9:30am

An unexpected gender-agreement pattern in Icelandic

Einar Freyr Sigurðsson¹, Oddur Snorrason², Ása Bergný Tómasdóttir³

¹The Árni Magnússon Institute for Icelandic Studies, Iceland; ²Queen Mary University of London; ³University of Iceland

This paper examines gender-agreement variation for Icelandic sports-team names using the Icelandic Gigaword Corpus. Feminine and masculine sports-team names, such as Keflavík and Fjölnir, respectively, allow two different agreement patterns: (a) the expected (feminine/masculine) gender agreement corresponding to the gender of the team name, see (1) below, or (b) unexpected neuter agreement, see (2) below.

(1) Fjölnir er fallinn ‘Fjölnir.MASC is relegated.MASC’

(2) Fjölnir er fallið ‘Fjölnir.MASC is relegated.NEUT’

Interestingly, our corpus results reveal that the vast majority of the examples, 80% of the total, show neuter agreement. It is unclear how to account for this unexpected gender-agreement pattern; we discuss several possible explanatory factors.

Sigurðsson-An unexpected gender-agreement pattern in Icelandic-215.pdf


9:30am - 9:45am

Word of the year 1919: Conveying the media’s favorite annual linguistic parlor game to a different era

Steinþór Steingrímsson, Einar Freyr Sigurðsson, Starkaður Barkarson, Atli Jasonarson, Ágústa Þorbergsdóttir

The Árni Magnússon Institute for Icelandic Studies, Iceland

In Iceland, a word of the year is chosen annually by both the Icelandic National Broadcasting Service and the Árni Magnússon Institute for Icelandic Studies (AMI). We explore the possibility of doing the same for a year more than 100 years in the past, 1919, using the same methods that AMI applies to the present day. This approach has various limitations, which we discuss, and it raises many questions, such as how well texts from journals and periodicals reflect the actual word use of the time.

Steingrímsson-Word of the year 1919-225.pdf
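The abstract does not spell out AMI's procedure. As a hedged illustration of one common way to shortlist candidate words, the sketch below ranks the words of a target year (e.g. 1919) by a log-ratio keyness score against a reference corpus. The function names, the smoothing constant and the frequency threshold are assumptions made for illustration; this is not necessarily the method AMI uses.

```python
import math
from collections import Counter


def log_ratio(t_count, t_total, r_count, r_total, smoothing=0.5):
    """Binary log of a word's relative frequency in the target year divided by
    its relative frequency in the reference corpus. `smoothing` is added to
    both counts so that words absent from the reference give a finite score."""
    t_rel = (t_count + smoothing) / t_total
    r_rel = (r_count + smoothing) / r_total
    return math.log2(t_rel / r_rel)


def rank_candidates(target_tokens, reference_tokens, min_count=5, top_n=10):
    """Hypothetical ranking: up to `top_n` (word, score) pairs, most
    characteristic of the target year first. Words occurring fewer than
    `min_count` times in the target year are ignored."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    t_total = sum(target.values())
    r_total = sum(reference.values())
    scored = [
        (word, log_ratio(count, t_total, reference[word], r_total))
        for word, count in target.items()
        if count >= min_count
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Here `target_tokens` would hold the tokens of the 1919 texts and `reference_tokens` those of surrounding years; a ranking like this could at most serve as a shortlist for human judgement.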


9:45am - 10:00am

Analysing lexical cohesion and topicality in online interaction

Antti Kanner, Anna Vatanen, Eetu Mäkelä

University of Helsinki, Finland

Free forms of conversation are characterised by an informal flow of topic and theme, in which the original, or temporally first, topic does not necessarily constrain the subsequent turns of conversation. Topic shifts in spoken language have been studied in interactional linguistics, especially in Conversation Analysis (CA), where the concept of topicality encompasses thematic structuredness and progression as well as the means speakers use to manage these structures (for an overview, see Couper-Kuhlen & Selting 2018: 312-328).

The concept of lexical cohesion, on the other hand, was most famously introduced by Halliday & Hasan (1976), according to whom lexical cohesion in texts is maintained by lexical selections that are in some way predictable from selections made earlier. Intuitively, words belonging to the same conceptual sphere are often found in the same segments of discourse, and that conceptual sphere then forms one aspect through which the subject matter of the discourse can be characterised. This intuition is often exploited in computational approaches to topicality, such as topic modelling. From the Conversation Analytic perspective, however, specific vocabulary constitutes only one dimension along which a topic shift can be observed. Others include, for example, the use of specialised expressions that explicitly signal the initiation of, or a shift to, a new topic, and shifts in the temporal orientation of the discourse (e.g., moving from recounting past events to planning for the future). Topic closures, too, can be marked with specific verbal expressions. Yet the interplay between these dimensions has not been thoroughly investigated. (Couper-Kuhlen & Selting 2018: 312-328.)

Unlike several other fields of linguistics, Conversation Analysis has made relatively little use of computational methods. However, O’Keeffe & Walsh (2012) have argued that corpus linguistics and CA, despite their ontological differences, are not mutually incompatible, and have demonstrated this by applying corpus-linguistic methods in several studies of classroom interaction. Other studies approaching computational methods from a CA perspective include Haugh & Musgrave (2018), who present a combinatorial procedure for identifying instances of an interactional practice across relatively large tracts of data. In computational linguistics and HCI, the structures of interaction have attracted more attention, as the development of talking machines has been a steady interest (see, e.g., Compagno et al. 2018).

By combining the conversation-analytic perspective on topic shifts with computational analysis of lexical cohesion, we ask what role vocabulary plays in topic formation and, consequently, how far a purely word-based method can go in recognizing topic shifts in conversation. In our study, we use roughly 10,000 lines of online chatroom discussion data to assess the degree to which computationally mapped lexical cohesion and qualitatively analysed topic shifts converge.

In operationalizing word-based lexical cohesiveness, the most common text analysis methods (including topic modeling) measure the degree of co-occurrence between groups of words and build computational topics as representations of these co-occurrence patterns. This approach aligns well with text-internal lexical cohesion: the meanings the words have outside the data are not taken into account; all that matters is how they relate to each other within the data. Word embeddings, on the other hand, which are trained on datasets much larger than our 10,000 lines and represent word distributions at the type level, are well suited to tracking text-external associations. Because they are based on distributional similarity, word embeddings are often very greedy in establishing proximities: any feature of a word, as long as it leaves a distributional imprint, will be reflected in its distributions and, by extension, in vector space models. In our case, however, this is not an issue, as Halliday & Hasan’s definition of adequate association between words is equally generous.
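As a minimal sketch of a purely text-internal co-occurrence measure, assuming each chat line is treated as one context, the following computes pointwise mutual information (PMI) between word pairs that appear in the same line. PMI is a simpler stand-in chosen for illustration, not the topic models used in the study, and `line_pmi` and its `min_pair_count` threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def line_pmi(lines, min_pair_count=2):
    """PMI for word pairs co-occurring in the same line, computed from this
    data alone: PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), where each
    probability is the share of lines containing the word(s)."""
    docs = [set(line.lower().split()) for line in lines]
    n = len(docs)
    word_df = Counter(w for doc in docs for w in doc)
    # Sorting makes each pair a canonical (a, b) tuple with a < b.
    pair_df = Counter(pair for doc in docs for pair in combinations(sorted(doc), 2))
    scores = {}
    for (a, b), count in pair_df.items():
        if count < min_pair_count:
            continue  # rare pairs give unreliable estimates
        scores[(a, b)] = math.log2(count * n / (word_df[a] * word_df[b]))
    return scores
```

A positive score means the pair co-occurs more often than chance within this data; nothing outside the data influences the result, which is the text-internal property described above.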

In our study, we test which statistical models best capture the topical structuredness of discourse. We mark the associations identified by both topic (CA-wise) and word embedding models in the data, and experiment with different ways of extracting semantically cohesive structures. These include chains connecting associated word pairs and sliding windows within which overall associativeness is measured. We assess which models align best with the qualitatively analysed topic shifts and where each model’s particular misalignments lie. As an outcome, we are able to discuss the part lexical cohesion plays in topic development in thematically unbounded online discussions. This discussion contributes to the development of large-scale automated methods for understanding topical progression in online discussion data.
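One possible way to operationalise the sliding-window idea is sketched below under stated assumptions: lines are tokenised on whitespace, `embeddings` is a hypothetical dictionary mapping words to pretrained vectors, and the cohesion across each gap between lines is the cosine similarity of the mean vectors of the lines on either side, an approach in the spirit of TextTiling. Gaps with low similarity are flagged as candidate topic shifts. The window size and threshold are illustrative, and this is not presented as the authors' procedure.

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding; None if none do."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gap_scores(lines, embeddings, window=3):
    """Cosine similarity between the `window` lines before and after each gap.
    Gap i lies between lines[i - 1] and lines[i]; gaps where either side has
    no known words are skipped."""
    tokens = [line.lower().split() for line in lines]
    scores = {}
    for i in range(1, len(lines)):
        left = [w for t in tokens[max(0, i - window):i] for w in t]
        right = [w for t in tokens[i:i + window] for w in t]
        left_vec = mean_vector(left, embeddings)
        right_vec = mean_vector(right, embeddings)
        if left_vec is not None and right_vec is not None:
            scores[i] = cosine(left_vec, right_vec)
    return scores


def candidate_shifts(scores, threshold=0.5):
    """Gaps whose cross-window similarity falls below `threshold`."""
    return sorted(i for i, s in scores.items() if s < threshold)
```

The flagged gaps could then be compared with the qualitatively analysed topic shifts; a fixed threshold is the simplest choice, and a relative criterion such as local minima would be an equally plausible alternative.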



10:00am - 10:15am

A humanist in search of computer scientists: A (so far unsuccessful) attempt to apply topic modeling techniques to Wittgenstein’s Nachlass

Filippo Mosca

Wittgenstein Archives, University of Bergen; University of Rome Tor Vergata

This paper explores the application of the Latent Dirichlet Allocation (LDA) model to the writings of the philosopher Ludwig Wittgenstein. More specifically, it explains what topic modelling is and why it can be useful for philosophical interpretation, shows the two major stages of applying LDA in practice (pre-processing the data and running the model), and addresses some important challenges in assembling the corpus of Wittgenstein’s writings: the multilingual nature of the corpus, repeated text sequences, and the choice of basic textual units. Finally, the paper presents the results of a specific LDA analysis of Wittgenstein’s Nachlass and points out the limitations and problems of these results.

Mosca-A humanist in search of computer scientists-116.pdf
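The abstract does not say which LDA implementation was used. As a hedged illustration of the "running the model" stage only, the following pure-Python sketch fits LDA with collapsed Gibbs sampling, one standard inference method; in practice a library implementation would normally be used. Pre-processing (tokenisation, handling the multilingual text and repeated sequences, choosing the textual unit) is assumed to be done already, and the function names and hyperparameter defaults are illustrative, not taken from the paper.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical helper).

    docs: list of token lists, one per textual unit.
    Returns (doc_topic, topic_word): per-document topic proportions and
    per-topic word probabilities, estimated from the final sample."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]            # tokens in doc d with topic k
    nkw = [[0] * n_vocab for _ in topics]           # tokens of word w with topic k
    nk = [0] * n_topics                             # tokens with topic k
    z = []                                          # topic of every token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional: p(k) ∝ (n_dk + alpha)(n_kw + beta) / (n_k + V*beta)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[k][index[w]] + beta) / (nk[k] + n_vocab * beta) for w in vocab}
        for k in topics
    ]
    return doc_topic, topic_word


def top_words(topic_word, n=10):
    """The n most probable words of each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]
```

With `docs` built as, say, one token list per remark, `top_words(lda_gibbs(docs, n_topics=20)[1])` lists each topic's most probable words; the choice of textual unit directly changes what counts as a document here, which is why the abstract treats it as a challenge.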


 
Conference: DHNB 2024