Conference Agenda (All times are shown in Mountain Daylight Time)

Overview and details of the sessions of this conference.

Session Overview
Session
Paper Session 09: Text and Data Processing
Time: Monday, 01/Nov/2021, 8:00am - 9:30am

Session Chair: Haihua Chen, University of North Texas, USA
Location: Salon C, Lobby Level, Marriott

Presentations
8:00am - 8:30am
ID: 184 / PS-09: 1
Long Papers
Confirmation 1: I/we agree if this paper/presentation is accepted, all authors/panelists listed as “presenters” will present during the Annual Meeting and will pay and register at least for the day of the presentation.
Confirmation 2: I/we further agree presenting authors/panelists who have not registered on or before the early bird registration deadline will be removed from the conference program, and their paper will be removed from the Proceedings.
Confirmation 3: I/we acknowledge that all session authors/presenters have read and agree to the ASIS&T Annual Meeting Policies found at https://www.asist.org/am21/submission-types-instructions/
Topics: Data Science; Analytics; and Visualization
Keywords: Wikipedia article quality assessment, language representation model, deep ensemble learning

Measuring Quality of Wikipedia Articles by Feature Fusion-Based Stack Learning

Jingrui Hou, Jiangnan Li, Ping Wang

Wuhan University, People's Republic of China

Online open-source knowledge repositories such as Wikipedia have become an increasingly important source for users to access knowledge. However, due to Wikipedia's large volume, it is challenging to evaluate article quality manually. To address this challenge, we propose a novel approach named “feature fusion-based stack learning” to assess the quality of Wikipedia articles. Pre-trained language models, including BERT (Bidirectional Encoder Representations from Transformers) and ELMo (Embeddings from Language Models), are applied to extract semantic information from Wikipedia content. A feature fusion framework combining semantic and statistical features is built and fed into an out-of-sample (OOS) stacking model, which includes both machine learning and deep learning models. We extensively compare the performance of the proposed model with existing models across different metrics, and conduct ablation studies to demonstrate the effectiveness of our feature fusion framework and OOS stacking. Overall, the experiments show that our method substantially outperforms state-of-the-art models.
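The core idea of out-of-sample stacking can be sketched as follows: the meta-learner is trained only on predictions that each base model made for rows it did not see during training. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy learners `nearest_centroid` and `majority`, the function names, the round-robin fold split, and the one-feature data are all hypothetical stand-ins; the paper's BERT/ELMo feature extraction and its actual machine learning and deep learning base models are omitted.

```python
from collections import Counter
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], int]
Learner = Callable[[Sequence[Vector], Sequence[int]], Model]


def nearest_centroid(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Hypothetical toy learner: predict the label whose class centroid is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def majority(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Hypothetical toy learner: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def out_of_fold_features(X, y, learners: List[Learner], k: int = 3) -> List[Vector]:
    """Each row's meta-features come only from base models that did NOT see
    that row during training, so the meta-learner trains on out-of-sample predictions."""
    n = len(X)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        held_out = list(range(fold, n, k))  # every k-th row forms one fold
        train_idx = [i for i in range(n) if i % k != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for j, learn in enumerate(learners):
            model = learn(X_tr, y_tr)
            for i in held_out:
                meta[i][j] = float(model(X[i]))
    return meta


def fit_stack(X, y, learners: List[Learner], meta_learner: Learner, k: int = 3) -> Model:
    """Fit the meta-learner on out-of-fold predictions, then refit each base
    learner on the full data for use at prediction time."""
    meta_model = meta_learner(out_of_fold_features(X, y, learners, k), y)
    base_models = [learn(X, y) for learn in learners]
    return lambda x: meta_model([float(m(x)) for m in base_models])


# Hypothetical one-feature toy data (e.g., a single fused quality score per article).
X = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]]
y = [0, 0, 0, 1, 1, 1]
stack = fit_stack(X, y, [nearest_centroid, majority], nearest_centroid)
print(stack([0.1]), stack([5.3]))  # prints "0 1"
```

Training the meta-learner on out-of-fold rather than in-sample predictions keeps it from simply learning to trust base models that have memorized their training data.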



8:30am - 9:00am
ID: 262 / PS-09: 2
Long Papers
Topics: Data Science; Analytics; and Visualization
Keywords: semantic alignment, fitness assessment, data selection, multiple data streams, data practices

The Reproducible Data Reuse (ReDaR) Framework to Capture and Assess Multiple Data Streams

Donald Keefer, Catherine Blake

University of Illinois at Urbana-Champaign, USA

Much of the literature in knowledge discovery from data (KDD) focuses on algorithms that are faster and more accurate at capturing patterns in a given data set. However, answering a research question is fundamentally connected with how well the data is aligned with the questions being asked. Thus, data selection is one of the most important steps in ensuring that models produced by the KDD process are useful in practice. A lack of documentation about the data selection rationale and the transformations needed to semantically align the data streams prevents others from reproducing the research and hinders the development of best practices in data integration. Our goal in this paper is to provide KDD practitioners with a framework that brings together theories of provenance, information quality, and contextual reasoning, enabling researchers to construct a semantically aligned dataset through data selection, description, and documentation grounded in an application-focused assessment.



9:00am - 9:15am
ID: 273 / PS-09: 3
Short Papers
Topics: Data Science; Analytics; and Visualization
Keywords: Organic Materials, Automated Knowledge Extraction, Named-Entity-Recognition, Text Mining, Deep Learning

Text to Insight: Accelerating Organic Materials Knowledge Extraction via Deep Learning

Xintong Zhao¹, Steven Lopez², Semion Saikin³, Xiaohua Hu¹, Jane Greenberg¹

¹Drexel University, USA; ²Northeastern University, USA; ³Kebotix, Inc., USA

Scientific literature is one of the most significant resources for sharing knowledge. Researchers turn to scientific literature as a first step in designing an experiment. Given the extensive and growing volume of literature, the common approach of reading and manually extracting knowledge is too time-consuming, creating a bottleneck in the research cycle. This challenge spans nearly every scientific domain. In materials science, experimental data distributed across millions of publications are extremely helpful for predicting materials properties and designing novel materials. However, only recently have researchers explored computational approaches to knowledge extraction, and primarily for inorganic materials. This study aims to explore knowledge extraction for organic materials. We built a research dataset composed of 855 annotated and 708,376 unannotated sentences drawn from 92,667 abstracts. We used named-entity recognition (NER) with a BiLSTM-CNN-CRF deep learning model to automatically extract key knowledge from the literature. Early-phase results show high potential for automated knowledge extraction. The paper presents our findings and a framework for supervised knowledge extraction that can be adapted to other scientific domains.
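In a BiLSTM-CNN-CRF tagger, the neural layers produce per-token scores for each tag, and the CRF layer selects the highest-scoring tag sequence with Viterbi decoding; the resulting BIO tags are then grouped into entity spans. The sketch below illustrates only that final decoding step, under stated assumptions: the single `MAT` (material) entity type, the tokens, the emission scores, and the transition penalties are hypothetical hand-set values, not outputs of the authors' model or their annotation schema.

```python
from typing import Dict, List, Tuple

Transitions = Dict[Tuple[str, str], float]


def viterbi(emissions: List[Dict[str, float]], transitions: Transitions, tags: List[str]) -> List[str]:
    """Linear-chain CRF decoding: return the tag sequence that maximizes
    the sum of per-token emission scores and tag-to-tag transition scores."""
    # best[t]: score of the best path so far that ends in tag t
    best = {t: transitions.get(("START", t), 0.0) + emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def bio_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity_type, surface_text) spans."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:  # continue the open entity
            current.append(token)
            continue
        if current:  # close any open entity
            spans.append((label, " ".join(current)))
        current, label = ([token], tag[2:]) if tag != "O" else ([], "")
    if current:
        spans.append((label, " ".join(current)))
    return spans


TAGS = ["O", "B-MAT", "I-MAT"]  # hypothetical tag set with one "MAT" entity type
tokens = ["We", "synthesized", "poly", "thiophene", "films"]
emissions = [  # hypothetical scores standing in for BiLSTM-CNN outputs
    {"O": 2.0, "B-MAT": 0.0, "I-MAT": 0.0},
    {"O": 2.0, "B-MAT": 0.0, "I-MAT": 0.5},
    {"O": 0.5, "B-MAT": 1.5, "I-MAT": 1.0},
    {"O": 0.3, "B-MAT": 0.2, "I-MAT": 1.2},
    {"O": 1.0, "B-MAT": 0.1, "I-MAT": 0.8},
]
# Strongly penalize the invalid BIO moves START -> I-MAT and O -> I-MAT.
transitions = {("START", "I-MAT"): -10.0, ("O", "I-MAT"): -10.0}
tags = viterbi(emissions, transitions, TAGS)
print(tags)                     # ['O', 'O', 'B-MAT', 'I-MAT', 'O']
print(bio_spans(tokens, tags))  # [('MAT', 'poly thiophene')]
```

The transition scores are what distinguish CRF decoding from picking each token's best tag independently: they let the decoder rule out tag sequences that are invalid under the BIO scheme.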



9:15am - 9:30am
ID: 234 / PS-09: 4
Short Papers
Topics: Library and Information Science
Keywords: book reviews, text mining, affective terms, mood, emotion

Moods in Book Reviews: Text Mining Approach

Hyerim Cho, Denice Adkins, Jenny Bossaller, Heather Moulaison-Sandy

University of Missouri, USA

Spiteri and Pecoskie (2018) proposed a taxonomy of terms to describe emotion and tone in novels. We tested those terms against 5,144 full-text book reviews from the New York Times Book Review to discover whether the proposed terms were used in published reviews to describe books and, of the terms used, which were used most often. Findings demonstrate that the terms chosen by Spiteri and Pecoskie are used in professional book reviews, though some may be used in multiple senses rather than only in relation to emotional content. This work contributes to a larger-scale project that tests machine models for identifying emotional content in books, with the ultimate aim of creating automated media recommendation systems that include emotion as an identifier.
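The two questions in this study (whether each taxonomy term appears in reviews, and which terms appear most often) can be sketched as a simple whole-word count over a corpus. This is a minimal illustration under stated assumptions: the terms and review snippets below are hypothetical placeholders, not the actual Spiteri and Pecoskie taxonomy or New York Times Book Review data, and the function name `term_usage` is invented for this example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def term_usage(reviews: Iterable[str], terms: List[str]) -> Tuple[Counter, Counter]:
    """Return (total occurrences, number of reviews containing the term) per term.
    Matching is case-insensitive and restricted to whole words."""
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    occurrences: Counter = Counter()
    documents: Counter = Counter()
    for text in reviews:
        for term, pattern in patterns.items():
            hits = len(pattern.findall(text))
            if hits:
                occurrences[term] += hits
                documents[term] += 1
    return occurrences, documents


# Hypothetical terms and reviews, for illustration only.
terms = ["dark", "haunting", "whimsical", "bleak", "tender"]
reviews = [
    "A dark, haunting debut.",
    "The humor is dark but never bleak.",
    "A whimsical tale; the room was dark and cold.",
]
occurrences, documents = term_usage(reviews, terms)
print(occurrences.most_common(1))               # [('dark', 3)]
print([t for t in terms if documents[t] == 0])  # ['tender']
```

Raw counts like these cannot separate affective from literal uses ("the room was dark"), which is consistent with the finding that some terms are used in multiple senses; distinguishing them would require looking at each term's context.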



Conference: ASIS&T 2021