Conference Agenda
SESSION#17: MEDIEVAL STUDIES
Presentations
10:45am - 11:00am
Analyzing Kinship Sentiment in Medieval Documents
1 CDHU: Uppsala University, Sweden; 2 Culture, Cognition, Coevolution Lab: Harvard University; 3 George Mason University; 4 New York University; 5 University of Texas, Austin; 6 Northeastern University

Complex kinship systems have long shaped societies, influencing relationships within different familial structures. These systems can extend beyond immediate and genetically-related family to include distant relatives, in-laws and others, and have historically been used to determine who is or is not eligible for marriage. However, little is understood about the historical psychological impacts of kinship systems, leaving aspects of historical societies’ sentiments surrounding kinship to be explored. In modern WEIRD (Western Educated Industrialized Rich and Democratic) societies, individuals often exhibit traits of individualism, non-conformity, and trust in societal structures. The influence of kinship structures on WEIRD psychology was previously explored in Schulz et al.'s (2019) study which examined effects of changes in Medieval Europe. It revealed that prolonged exposure to the Western Christian (pre-Catholic) Church led to reduced rates of cousin marriage, which correlated with a more individualistic and impartially prosocial modern psychology. This shift was attributed to the Church's emphasis on promoting nuclear households, weakening extended family ties, and fostering mobility within the community.

The present study takes a new approach to examining the impact of kinship psychology through the medieval church. It leverages empirical data extracted from historical texts using natural language processing techniques, including sentiment analysis of texts written in Latin.
The objective is not only to support theoretical claims and shed light on historical kinship sentiment but also to explore computational methodologies applicable to comparative historical research in other regions, such as the influence of religion on kinship and social psychology in other European regions. Such results are to be aggregated and eventually compared within an interdisciplinary team involving classics, computational linguistics, economic history, anthropology, and psychology across regions and time considering the development of WEIRD kinship structures alongside surrounding economic, historical, and socio-psychological influences. One key aspect of this research is a cross-institutional and interdisciplinary approach, engaging collaboratively with an international network of experts in, e.g. linguistics, history, and psychology, as well as with key cultural institutions. This collaborative framework has been instrumental in enriching our materials, critical to building an understanding of historical kinship systems, and allowing for a more detailed analysis through the integration of diverse perspectives. Such partnerships have not only facilitated access to valuable historical texts but also have provided unique insights into the interpretative methodologies across the various disciplines engaged in this project. The present preliminary analysis uses the CBMA (Corpus Burgundiae Medii Aevi), which consists of over 22,000 medieval charters, hagiographies, and other religious, legal, and communicative texts from medieval Burgundy spanning roughly the 5th to 15th centuries. These texts, available in XML and pre-processed parsed formats, were prepared for analysis, which included the extraction of kinship terms and matched sentiment scores within their contexts. Various methods, including a LatinBERT language model (Bamman & Burns 2020) and dependency parsing, were explored to determine the different contexts surrounding kinship terms. 
Ultimately, dependency parsing using LatinCy (Burns 2023) was the chosen method, which incorporated grammatical relationships between words, enhancing the overall analysis and improving accuracy and interpretation. Plain-text versions of the texts were first processed and parsed using LatinCy. Sentences containing both kinship terms and sentiment terms according to specified lists (over 600 kinship terms and 6000 sentiment terms) were extracted. Specifically, sentiment-related terms were identified using a sentiment dictionary (Sprugnoli et al. 2021; 2023), aiding in the exploration of sentiment-related language within the context of kinship terms. To gain an understanding of the bearing of certain kinship terms within the corpus, a TF-IDF score was also obtained and combined with terms’ average sentiment scores, highlighting terms' varying importance across different texts--notably, more religious texts would contain more references to, e.g. 'father' and 'son'. The outcomes of this analysis yielded valuable initial insights, including average sentiment scores for kinship terms weighted by TF-IDF scores. Furthermore, sentiment scores were assigned to each kinship term in individual documents, enabling chronological comparisons to track sentiment shifts related to terms over time. Specific kinship terms were furthermore extracted, including gender pairs for comparison (e.g. 'brother', 'sister', 'mother-in-law', 'father-in-law'). Those terms which had the most variability over the timespan of the corpus (measured by their standard deviation) were also examined. Subsets of the corpus based on genre (diplomatic texts, hagiographies, etc) were also specified and explored individually. The use of the CBMA corpus serves as a valuable example for researchers exploring questions pertaining to kinship terms and sentiment within various texts. 
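The abstract does not say how the TF-IDF scores and average sentiment scores were combined. As a minimal sketch of one possible combination, the Python below takes a TF-IDF-weighted mean of per-document sentiment for each kinship term. The toy documents, the sentiment values, and the function names `tf_idf` and `weighted_sentiment` are assumptions made for illustration and are not taken from the study or from the CBMA.

```python
import math

# Toy corpus: lemmatised tokens per document (invented for illustration).
docs = {
    "charter_a": ["pater", "filius", "pater", "terra"],
    "charter_b": ["frater", "soror", "terra"],
    "vita_c": ["pater", "filius", "filius", "deus"],
}

# Hypothetical mean sentiment of the sentences in which a kinship term
# occurs, keyed by (document, term). These numbers are not real results.
sentiment = {
    ("charter_a", "pater"): 0.6,
    ("vita_c", "pater"): 0.0,
    ("charter_a", "filius"): 0.2,
    ("vita_c", "filius"): 0.4,
    ("charter_b", "frater"): -0.2,
    ("charter_b", "soror"): 0.1,
}


def tf_idf(term, doc_id):
    """Plain TF-IDF: relative term frequency times log inverse document frequency."""
    tokens = docs[doc_id]
    tf = tokens.count(term) / len(tokens)
    df = sum(1 for other in docs.values() if term in other)
    return tf * math.log(len(docs) / df)


def weighted_sentiment(term):
    """TF-IDF-weighted mean sentiment of a term across the documents it occurs in."""
    pairs = [(tf_idf(term, d), s) for (d, t), s in sentiment.items() if t == term]
    total = sum(weight for weight, _ in pairs)
    if total == 0:
        # The term occurs in every document (IDF of zero) or in none.
        return None
    return sum(weight * score for weight, score in pairs) / total
```

Calling `weighted_sentiment("pater")` on this toy data gives more influence to the document in which the term is relatively more frequent. A term that occurs in every document has an IDF of zero under this definition, so a smoothed IDF variant may be preferable in practice.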
The use of NLP methods to explore and analyze such a sizeable corpus of Latin texts is furthermore expanding the boundaries of what has been done within computational Latin studies, as the models and pipelines used are only recently developed or currently under development. It lays a foundation for further in-depth investigations of kinship in linguistics, classics, and across historical social sciences, facilitating further discoveries in this interdisciplinary field.

11:00am - 11:30am
Asynchronous linked editing of texts in physical objects
University of Copenhagen, Denmark

Introduction
Several projects (at the University of Copenhagen these include Editiones Arnamagnæanae Electronicae and the Dictionary of Old Norse Prose) aim to digitally record an analysis of early texts and their language which is closely based on material evidence, normally manuscripts. Digital methods for these processes can be assisted by various techniques including imaging, text recognition and linguistic analysis. The overall process normally proceeds in a single direction: objects are imaged, transcribed, structured as texts and linguistically normalised and parsed. At each stage information is often discarded, particularly as the data standards for each process are often incompatible. The present paper describes a working model and application (at https://menotag.ku.dk) that allows these processes to proceed asynchronously and without information loss, that is, linguistically-annotated texts can be linked in detail to manuscript imaging, and manuscript imaging can be used to produce linguistically-annotated texts. This technology produces richly interactive editions and linguistic analyses that are grounded in and linked with the material evidence. It further provides a potential ground-truth set, based on existing editions, for training new handwritten text recognition models.

Background
There are a number of standards for the digital description of text-bearing objects and/or texts deriving from those objects. Many of these standards and applications are focused on the material objects (IIIF, CIDOC-CRM, Transkribus / PAGE XML and related formats, and TEI’s manuscript description tagset), that is, they take the material object as their starting point but may have extensions for encoding the text (e.g. IIIF annotations, CRMtext). A number of alternative standards and applications exist for encoding philological editions of early texts.
The Text Encoding Initiative (TEI), an XML-based standard, is by far the most widely implemented of these. Thirdly, there exists a set of applications and de facto standards for linguistic analysis such as Corpus Workbench. These use still different structures and models, even though in the case of historical languages and text, the underlying corpora derive from unique physical text-bearing artefacts. Some projects have gone some way to overcoming these boundaries. The Menota project has for two decades been maintaining a set of standards based on TEI as well as hosting digital editions based on those standards. Menota’s editions are based on unique physical objects (manuscripts, charters and inscriptions), and the page and line boundaries of the material source are incorporated into almost all editions. At the same time word and punctuation tokens form a central part of the model. This allows for additional linguistic annotation. Menota’s archive is currently hosted as part of Norway’s CLARIN infrastructure. This approach provides a bridge between the physical artefact, whereby manuscript images are linked to pages of transcription, as well as corpus linguistic tools. A new set of technologies has emerged more recently that allow for automatic analysis of digital images of text, such as Transkribus and eScriptorium/Kraken. These can identify text regions, lines and words and characters on a digitally-imaged page and then recognise the characters with varying degrees of accuracy, once trained to do so. Transkribus, for example, is a tool that allows synchronous editing of texts from objects: handwritten text recognition (HTR) technology generates text from a manuscript page, which can then be corrected and edited as TEI/XML. The workflow is unidirectional, however: TEI/XML documents cannot be linked to the manuscript pages in any detail, and there is little compatibility between the HTR data formats (PAGE XML, for example) and the potential resulting TEI/XML. 
This study describes an application and model (MenotaG) which is designed to integrate the processes of describing a physical object, editing its text and analysing its language. This application has been driven by two projects in particular: Editiones Arnamagnæanae Electronicae, a collaboration headed by the present author between the Universities of Iceland and Copenhagen to publish peer-reviewed manuscript-based digital editions, as well as the Dictionary of Old Norse Prose, which requires high-quality editions for its work in the semantic analysis of the Old Norse corpus. An asynchronous process and model makes it possible to take existing TEI/XML documents and link them deeply to the physical objects they derive from, and vice versa. This opens the possibility of using existing transcriptions for training new HTR models, as well as creating interactive digital editions integrating image and text.

Model
The data model that has been developed for this project has been designed to link together the three main domains of analysis. The principal goal is to be able to effectively link the textual, linguistic and material information together, while maintaining compatibility and ideally digital links to the data sources. The model is under development and is described at https://menotag.ku.dk/q?p=menota/home/about. The model is graph-based, realised in the application as a relational database. This allows for leveraging mature technologies for editing and publishing data and building complex applications, as well as providing the suite of in-built spatial types and functions (OpenGIS standard) found in modern relational database management systems. The text-bearing object is treated as a series of surfaces on which text is written in lines, and lines are a series of word tokens (which may also continue onto another line) or punctuation tokens.
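The surface, line and token hierarchy just described can be illustrated with a small Python sketch. It is not MenotaG's actual schema, which the abstract says is realised as a relational database. The class names, fields and the helper `tokens_in_order` are hypothetical and only show how a word token that continues onto a second line can remain a single token.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Token:
    token_id: int
    text: str
    is_punctuation: bool = False


@dataclass
class Line:
    # Each entry pairs a token with the fragment of it written on this line,
    # so one token can appear on two consecutive lines.
    fragments: List[Tuple[Token, str]] = field(default_factory=list)


@dataclass
class Surface:
    label: str
    lines: List[Line] = field(default_factory=list)


def tokens_in_order(surface: Surface) -> List[Token]:
    """Return each token once, in reading order, even if it spans two lines."""
    seen = set()
    ordered = []
    for line in surface.lines:
        for token, _fragment in line.fragments:
            if token.token_id not in seen:
                seen.add(token.token_id)
                ordered.append(token)
    return ordered


# Hypothetical example: the word "konungr" is split across a line break.
er = Token(1, "er")
konungr = Token(2, "konungr")
stop = Token(3, ".", is_punctuation=True)
page = Surface(
    label="1r",
    lines=[
        Line(fragments=[(er, "er"), (konungr, "kon")]),
        Line(fragments=[(konungr, "ungr"), (stop, ".")]),
    ],
)
```

Keeping the written fragment on the line and the full form on the token is one way to let the same token serve both the material view (what is on each line) and the linguistic view (the word sequence).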
The spatial relationships between the surface and its lines, words and non-linguistic features are recorded as polygons relating to the digital images representing the surface. This extends the capabilities of IIIF annotations, which can only refer to rectangles on the image. As the data encoded is a superset of IIIF, it can be represented also in this standard. The text is itself treated in a way compatible with TEI and Menota’s application of it: a work consisting of a hierarchical, ordered structure, which at the most detailed level consists of a series of word and punctuation tokens. All elements in this structure are linked to the underlying TEI by use of XPath paths. These in turn have different representations depending on the relative closeness on the one hand to the physical object’s representation of the text (‘facsimile’ level in Menota parlance) and on the other the underlying linguistic entity (‘normalised’ text), with an intermediate form corresponding to Old Norse diplomatic edition conventions (roughly script normalised, orthography unnormalised, expansions of abbreviations marked). The token here is what allows the linguistic structure to be connected to the physical object.

Application
The application at menotag.ku.dk is designed to use existing tools and resources while providing optimal methods and workflows for textual editing. A user wishing to begin transcribing and editing a manuscript that does not have an existing digital transcription can begin by linking images, either from an IIIF manifest or to the main repository of digitised manuscripts in Old Norse and Icelandic at handrit.is. Manuscript pages can be segmented automatically using Kraken and the user is then provided with an interface where the transcription of each line can be entered under the image of the line itself. For Latin works Kraken’s existing downloadable models can be used to perform a preliminary HTR on the page.
If the manuscript is unreadable by these technologies, the user can easily draw outlines of the manuscript lines themselves. The resulting transcriptions can be tokenised automatically, and normalisation and lemmatisation applied using machine learning to assist the user. The resulting information can be exported to both PAGE XML and TEI. Where existing Menota-style TEI documents are available, these can be imported using a simple interface and the user can begin the process of linking the text and tokens to the image. Once the page has been segmented, either manually or with the help of Kraken, the transcription lines can be automatically linked to the TEI text. The original TEI document can then be updated with the new information and exported. This paper will demonstrate an edition that mixes these techniques. The thirteenth-century Icelandic Third Grammatical Treatise has three independent manuscripts. One has been transcribed and is available in Menota’s archives. A second is a relatively clear copy, and a third is a palimpsest that is very difficult to read. The first is imported, linked to images and enhanced in this system. The second has been segmented automatically and transcribed manually, and the third segmented and transcribed manually. The three versions are linked using automatic tools for collation, providing automatic registration of variants and generation of a stemma.

Future development
The application is currently not released as open source, the main reason being institutional security policies, but work is under way to comply with local rules in order to release the code and ensure that it is externally deployable. The machine learning tools, however, benefit from a single deployment per language. The application itself is being tested in both teaching and research environments, and is being actively developed in anticipation of the launch of the first digital volume of Editiones Arnamagnæanae Electronicae this year (late 2024).
Work is also underway on exposing the image-text links using the IIIF annotations API. Ultimately the whole data structure will be exposed as Linked Open Data and/or with a SPARQL endpoint. Challenges still remain with respect to speed: while the application leverages different browser APIs to ensure that the image processing is done as efficiently as possible, initial page loads for each image are slow at the server end (around 10s per page image). This is due to the complex model which links the manuscripts to text structure, in part recursively, being loaded in full for each manuscript page. Other standards-related issues remain, particularly in exporting full TEI XML including the text structure, and in re-exporting imported TEI, particularly where word tokens have been deleted or added. MenotaG is also waiting for the outcome of other projects in standardising graph models for text, particularly the Semantic TEI project.

11:30am - 11:45am
Quantifying Medieval Scribal Habits – The Case of Abbreviations in West Norse Manuscripts
1 Department of Linguistics and Philology, Uppsala University, Sweden; 2 Department of Scandinavian Languages, Uppsala University, Sweden

Abbreviations were an integral part of the medieval script with Latin letters, as a way of increasing the writing speed and saving parchment. Originating in Latin texts, the system was transferred into texts in vernacular languages, although to a very varying degree. In the Nordic medieval vernaculars, the West Norse area (Norway and Iceland) made a greater use of abbreviations than the East Norse area (Sweden and Denmark). Furthermore, within the West Norse area, the Icelandic manuscripts are especially characterized by an elaborated use of abbreviations. In the Menota Handbook, ch. 6.1, it is stated that as much as one third of the words are abbreviated in some Icelandic manuscripts. In the present study, the West Norse abbreviation system is in focus. Our aim is to extract as much information as possible regarding the use of the abbreviations from digital editions of Old West Norse texts, by combining digital methods for information extraction and qualitative analysis of the retrieved data. We have divided our investigation into four separate, but interrelated, questions:
For our digital method in question 2 above, we need digital texts transcribed in a format where both abbreviation sign and expansion are available. Texts having only the expansion can also be used for questions 1, 3 and 4. Furthermore, we need texts of a certain length (at least 10 000 tokens) in order to make the quantification reliable. We have used all available texts meeting the demands above in the digital text archives in Menota[1] and Emroon[2]. Within this text corpus, we have texts from both medieval Norway and Iceland, allowing for a comparison between Iceland and Norway, and texts from different times, allowing for the investigation of the chronological development. In total, the Icelandic dataset contains approximately 350,000 words from the time period 1280–1425, and the Norwegian dataset consists of roughly 720,000 words from the time period 1200–1350. Of course, when more material becomes available, the picture given by our investigation will be further nuanced. A common formal classification of the medieval abbreviations is that of dividing them into suspensions (the end of the word is abbreviated), contractions (the middle of the word is abbreviated), superscript letters and special signs (e.g. Hreinn Benediktsson 1965: 85). Regarding the content of the abbreviations, i.e. what linguistic units they represent, they can be divided into two types: 1) abbreviations representing lexical or grammatical forms and 2) abbreviations representing graphemic-phonetic units. Many of the words that are abbreviated in a lexical way are highly frequent words, either names, highly frequent within a certain text (e.g. Gunnlaugr in Gunnlaugs saga), or other words recurring in many texts (e.g. verbs like segja or mǽla). It should be noted that one and the same abbreviation sign can be used for both lexical and graphemic-phonetic representation; superscript ‘m’ can be used in lexical abbreviations like ‘r’ + superscript ‘m’ for riddurum (dat. 
of riddari) as well as for the sequence ‘um’, occurring in different words. It is also known that recurring formulas are often abbreviated through suspensions, e.g. in manuscripts with poetry or legal texts (e.g. Hreinn Benediktsson 1965: 87). In our investigation, we will address both these perspectives; we will account for the type of abbreviation as well as for the units being abbreviated. Indeed, for all the research questions above, tentative answers can be given beforehand. It is sometimes stated in the literature that the earliest Icelandic manuscripts are less abbreviated than the later ones, and one could assume that there is a general trend during the Middle Ages towards a more frequent use of abbreviations (nr 1). Also, one could assume that the number of different abbreviations would decrease during the Middle Ages, parallel to the reduction of the number of different ordinary letter forms (nr 2). Furthermore, it is very likely that the frequency of the words affects the proneness of the scribe to abbreviate (nr 3), and finally it is often pointed out that the tendency to abbreviate is stronger in Iceland than in Norway (nr 4). However, our empirical basis and our digital methodology will allow us to quantify the use of abbreviations in West Norse manuscripts in a way that has not been done before. In our study, we show that some of the previously stated hypotheses about the use of abbreviations in West Norse manuscripts could be validated on an empirical basis. Abbreviations appear to be used gradually more frequently during the Middle Ages in Iceland. The number of different abbreviation signs does not increase in the texts that we have investigated, however; on the contrary, certain abbreviation signs used in the older texts have disappeared in the later texts. The tendency for Icelandic manuscripts to be more heavily abbreviated than the Norwegian ones is also confirmed and quantified in our investigation.
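The abstract does not describe the extraction code. As a minimal sketch of how the share of abbreviated words might be computed when expansions are marked in the transcription, the Python below counts TEI `<w>` elements that contain an `<ex>` (editorial expansion) element. The sample XML, the choice of `<ex>` as the only abbreviation signal, and the function name `abbreviation_rate` are assumptions for illustration; the actual Menota and Emroon encodings are richer than this.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented four-word sample; <ex> wraps letters supplied when expanding
# an abbreviation, so two of the four words count as abbreviated.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <w>kon<ex>ungr</ex></w>
  <w>ok</w>
  <w>h<ex>ann</ex></w>
  <w>sagði</w>
</text>"""


def abbreviation_rate(xml_text: str) -> float:
    """Share of <w> tokens that contain at least one <ex> expansion."""
    root = ET.fromstring(xml_text)
    words = list(root.iter(TEI + "w"))
    if not words:
        return 0.0
    abbreviated = [w for w in words if w.find(f".//{TEI}ex") is not None]
    return len(abbreviated) / len(words)
```

Running `abbreviation_rate(SAMPLE)` on the invented sample gives 0.5. Computing the same ratio per text and ordering the texts by date or provenance would give the kind of chronological and regional comparison the study describes.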
[1] https://clarino.uib.no/menota/catalogue
[2] https://www.emroon.no/#
Conference: DHNB 2024