86th Annual Meeting of the Association for Information Science & Technology (ASIS&T 2023)

Session

Paper Session 21: Knowledge Organization and Cultural Analytics

Time:

Tuesday, 31/Oct/2023:

11:30am - 1:00pm

Session Chair: Deborah Lee, University College London, UK

Location: Bordeaux Suite, 2nd Floor, Novotel

Presentations

11:30am - 11:45am
ID: 265 / PS-21: 1
Short Papers
Topics: Archives, Data Curation, and Preservation (archives; records; cultural heritage materials; digital data curation; digital libraries; digital humanities)
Keywords: knowledge organization, Chinese ancient book, ontology, system design, digital humanities

Using Ontology to Organize Chinese Ancient Books in the Digital Age

Linxu Wang, Tong Wei, Jun Wang

Peking University, People's Republic of China

The digitization, curation, and utilization of Chinese ancient books are crucial to the digital humanities. Despite progress in these areas, issues with data interoperability, data sharing, and data linkage persist because the field lacks both a standardized, annotated corpus of ancient texts and a general description framework for ancient books. To overcome these challenges, this paper proposes an ontology-based description framework that integrates catalogs of Chinese ancient books from various institutions, creating a standardized, interpretable, and researchable knowledge base. The framework combines general standards with the unique characteristics of ancient books, revealing complex relationships among books, between books and people, and between books and time periods, and thereby providing a more comprehensive understanding of the knowledge contained within ancient books. Additionally, this paper applies the framework to The National Rare Ancient Book Directory, a catalog containing 13,026 books from over 400 institutes, to develop an interactive system. The system is available at https://rarebib.pkudh.org/. Our results demonstrate that the framework standardizes data and provides a sophisticated and nuanced understanding of the knowledge within ancient books. This has noteworthy implications for individuals engaged in research, scholarship, and reading in the digital age.
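As a rough illustration of the book-to-book, book-to-person, and book-to-time relationships described in this abstract, the sketch below stores them as subject-predicate-object triples. All identifiers and predicate names (`book:A`, `hasAuthor`, `isCommentaryOn`, and so on) are hypothetical placeholders, not terms from the authors' ontology or data from their knowledge base.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); illustrative only.
TRIPLES = [
    ("book:A", "hasAuthor", "person:X"),
    ("book:A", "createdIn", "period:P1"),
    ("book:B", "isCommentaryOn", "book:A"),
    ("book:B", "hasAuthor", "person:Y"),
    ("book:B", "createdIn", "period:P2"),
]


def build_index(triples):
    """Group (predicate, object) pairs by subject for quick lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def related(index, subject, prefix):
    """Return the relations of `subject` whose object starts with `prefix`."""
    return [(p, o) for p, o in index.get(subject, []) if o.startswith(prefix)]


index = build_index(TRIPLES)
book_links = related(index, "book:B", "book:")      # book-to-book relations
person_links = related(index, "book:B", "person:")  # book-to-person relations
```

A production system would more likely use an RDF store and a published vocabulary; the plain-Python index here only shows the shape of the data.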

11:45am - 12:10pm
ID: 337 / PS-21: 2
Long Papers
Topics: Archives, Data Curation, and Preservation (archives; records; cultural heritage materials; digital data curation; digital libraries; digital humanities)
Keywords: Person-oriented ontology, Biographical ontology, Digital humanities, Metadata crosswalk

Person-Oriented Ontologies Analysis for Digital Humanities Collections from a Metadata Crosswalk Perspective

Rui Liu¹, Dana McKay², George Buchanan¹

¹University of Melbourne, Australia; ²RMIT University, Australia

Mapping between different representations of similar data is a common challenge in digital humanities (DH). In practical DH collections, the ‘person’ is an essential, central unit, and other parts can link to the ‘person’ to form the knowledge base. However, there is still no general and useful person-oriented ontology in the DH community. In many practical DH projects, DH experts have developed their own ontologies, but these ontologies are not interoperable. Therefore, it is important to explore existing biographical ontologies and develop a comprehensive person-oriented ontology for DH.

Using the metadata crosswalk method, we examined the ontologies provided for persons in three DH collections to analyze how they map onto standard ontologies such as FOAF (Friend of a Friend). This paper uncovers a significant and consistent gap between standard biographical ontologies and those used in practical DH collections, and identifies a set of heterogeneity problems, including differing granularities of metadata. Drawing on this crosswalk, we propose three key types of person-oriented ontological elements: basic biographical elements, relational elements, and explanatory elements (such as career, connected with role and time). This metadata crosswalk provides a foundation for future matching between person-oriented ontologies and facilitates semantic interoperability between DH collections.
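A metadata crosswalk of the kind described here can be pictured as a lookup table from a collection's local field names to terms in a standard vocabulary, with unmapped fields exposing the gap. The sketch below is a minimal illustration under stated assumptions: the local field names (`full_name`, `career`, and so on) and the sample record are hypothetical, and only the FOAF terms on the right-hand side come from the FOAF vocabulary.

```python
# Hypothetical crosswalk: local field names (assumed) -> FOAF terms.
CROSSWALK = {
    "full_name": "foaf:name",
    "given_name": "foaf:givenName",
    "surname": "foaf:familyName",
    "acquaintance": "foaf:knows",
}


def apply_crosswalk(record, crosswalk):
    """Split a record into fields that map to the standard and fields that do not."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value  # no equivalent term in the target vocabulary
        else:
            mapped[target] = value
    return mapped, unmapped


# Hypothetical person record from a DH collection.
record = {
    "full_name": "Example Person",
    "surname": "Person",
    "career": "magistrate (role) during an assumed time span",
}
mapped, unmapped = apply_crosswalk(record, CROSSWALK)
```

In this toy case the `career` field falls into `unmapped`, which mirrors the kind of explanatory element the abstract says standard biographical ontologies tend not to cover.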

12:10pm - 12:35pm
ID: 166 / PS-21: 3
Long Papers
Topics: Informetrics and Scholarly Publishing (bibliometrics; infometrics; scientometrics; altmetrics; open science; scholarly communication and new modes of publishing; measurement of information production and use)
Keywords: Interdisciplinary Prediction, Interdisciplinary Topic, Co-word Network, Link Prediction, Digital Humanities

Interdisciplinary Topic Link Prediction Based on Co-Word Network: A Case Study on Digital Humanities

Chaoguang Huo¹, Yueji Han¹, Chenwei Zhang², Fanfan Huo¹, Xiaobin Lu¹

¹Renmin University of China, People's Republic of China; ²University of Hong Kong, People's Republic of China

Interdisciplinary research plays a crucial role in addressing complex challenges in science, technology, and society. Predicting interdisciplinary links between topics can unveil potential interdisciplinary relationships and foster innovation. Treating topics extracted from interdisciplinary research as interdisciplinary topics, we predict the potential links among them based on their co-word network, and we propose integrating topic semantic content features, author direct-collaboration features, and indirect-collaboration features to improve prediction accuracy. We construct interdisciplinary topic link prediction models based on Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), GraphSAGE, BERT, and node2vec. Using digital humanities as a case study, our experimental results show that integrating semantic content, direct-collaboration, and indirect-collaboration features significantly improves the Area Under the Curve (AUC) and Average Precision (AP), outperforming predictions based solely on the co-word network. The predicted results provide valuable research directions and references for digital humanities scholars, with examples in the fields of cultural heritage and historical geographic information systems.
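To make the co-word setting and the AUC metric concrete, the sketch below builds a tiny co-word graph and scores candidate links with a common-neighbors heuristic. This is a deliberately simple baseline swapped in for illustration; it is not the GCN, GAT, or GraphSAGE models named in the abstract, and the keyword lists and scores are invented toy data.

```python
from itertools import combinations


def co_word_graph(docs):
    """Build an undirected co-word graph: keywords sharing a document are linked."""
    adj = {}
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def common_neighbors(adj, a, b):
    """Heuristic link score: number of neighbors the two topics share."""
    return len(adj.get(a, set()) & adj.get(b, set()))


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Toy keyword lists, one per (hypothetical) paper.
docs = [
    ["ontology", "archive"],
    ["ontology", "gis"],
    ["archive", "heritage"],
    ["gis", "heritage"],
]
adj = co_word_graph(docs)
score = common_neighbors(adj, "ontology", "heritage")
```

The `auc` helper takes the scores a model assigns to held-out true links and to sampled non-links, so the same function could evaluate any scoring method, including learned embeddings.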

12:35pm - 12:50pm
ID: 324 / PS-21: 4
Short Papers
Topics: Data Science, Analytics, and Visualization (data science; data analytics; data mining; decision analytics; social analytics; information visualization; images; sound)
Keywords: named entity recognition, machine learning, data science, cultural analytics, Native American studies

Tuning out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature

Nikolaus Nova Parulian¹, Ryan Dubnicek¹, Daniel Evans¹, Yuerong Hu¹, Glen Layne-Worthey¹, J. Stephen Downie¹, Raina Heaton², Kun Lu², Raymond Orr³, Isabella Magni⁴, John Walsh⁵

¹University of Illinois at Urbana-Champaign, USA; ²University of Oklahoma, USA; ³Dartmouth College, USA; ⁴University of Sheffield, UK; ⁵Indiana University, USA

Named Entity Recognition (NER), the automated identification and tagging of entities in text, is a popular natural language processing task and has the power to transform restricted data into open datasets of entities for further research. This project benchmarks four NER models (Stanford NER, BookNLP, spaCy-trf, and RoBERTa) to identify the most accurate approach and to generate an open-access, gold-standard dataset of human-annotated entities. To meet a real-world use case, we benchmark these models on a sample dataset of sentences from Native American-authored literature, identifying edge cases and areas of improvement for future NER work.
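Benchmarking NER output against human annotations is commonly done with entity-level precision, recall, and F1. The sketch below assumes exact-match scoring, where a predicted entity counts only if its span and label both agree with the gold annotation; the abstract does not state which matching scheme the project uses, and the spans and labels here are invented.

```python
def prf1(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) entities."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations for one sentence: (token start, token end, label).
gold = [(0, 2, "PERSON"), (5, 6, "LOC")]
predicted = [(0, 2, "PERSON"), (5, 6, "ORG"), (8, 9, "LOC")]

precision, recall, f1 = prf1(gold, predicted)
```

Here the second prediction has the right span but the wrong label, so it counts as both a false positive and a missed gold entity. A more lenient scheme could give partial credit for overlapping spans.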

Conference Agenda (All times are shown in Greenwich Mean Time (GMT) unless otherwise noted)