Conference Agenda (All times are shown in Eastern Daylight Time)
Overview and details of the sessions of this conference.
Truth in the Timestamps: Data Management as a Shield Against Misconduct
A. Yoon1, J. Kim2
1Indiana University Indianapolis, USA; 2University of North Texas, USA
While research misconduct poses a serious threat to science, the role of data management in preventing misconduct, particularly by ensuring that raw or primary data is securely archived and accessible for audits or verification, remains underexplored. This study examines the existing and potential relationships between research misconduct and data management practices.
11:15am - 11:45am
The Uneven Impact of Big Data in Science
X. Han, O. J. Gstrein, V. Andrikopoulos
University of Groningen, The Netherlands
Data practices vary widely across scientific disciplines. While Big Data has significantly transformed research activities across various domains and has been heralded as a revolutionary force in scientific paradigms, its application has not been uniform across all fields. This study examines Big Data research and practices in data-intensive scientific domains, identifying its distinct features and revealing the uneven adoption and impact of Big Data across disciplines. Our findings indicate that discussions on the epistemological concepts and definitions of Big Data in data-intensive scientific domains are limited, with little divergence among scholars. Machine learning emerges as a central technological focus across disciplines, closely integrated with research topics and widely driving scientific advancements. Additionally, this paper highlights the instrumental role of Big Data in scientific inquiry and underscores the disparities in its impact across different disciplines. Through this review, we aim to foster a more comprehensive understanding of Big Data’s evolving role in science, emphasizing the need for continued critical reflection as its influence continues to develop.
11:45am - 12:00pm
An Exploratory Study of the Cross-border Flow of Research Data in the US and China
R. Tao1, L. Xu1, Y. Du2, J. Ye1
1Nanjing University, People's Republic of China; 2University of North Texas, USA
In the era of globalization, academic resources flow around the world. The cross-border flow of traditional academic publications such as e-journals and databases has received widespread attention, and basic rules governing that flow have taken shape. As an emerging and important academic resource, research data also travels across borders. However, most research focuses on how to regulate the cross-border flow of research data, and very few studies have analyzed that flow statistically. This paper studies Chinese and US research data repositories and illustrates the cross-border flow of research data and the multinational cooperation among those repositories, using proportion analysis, word frequency analysis, and social network analysis. It finds that the degree of data localization in both China and the US is high. Among the research data that does flow across borders, the US shows greater diversity in flow direction, subjects, and cooperative relationships.
12:00pm - 12:30pm
Embracing Training Dataset Bias for Automated Harmful Detection
A. Schöpke Gonzalez, N. Kim, L. Hemphill
University of Michigan, USA
The increasing volume of social media content surpasses the capacity of human moderation and poses psychological risks, leading to a need for automated moderation systems. However, these systems often exhibit biases against minoritized groups. One way to mitigate these biases is by altering the training data, which are biased by human annotators. Increasing diversity among annotators can help, but implementing this is challenging for machine learning specialists and tends to focus on minimizing identity-based bias rather than embracing diverse perspectives. Using moral systems theory from social psychology, we suggest that automated systems should incorporate diverse, context-aware interpretations of harm, embracing biases to adequately address moderation issues. We analyze how different dimensions of 2,180 U.S.-based annotators’ personal moral systems like institutional affiliation (religion, political party), values (political ideology), and identities (age, gender, sexual orientation, and race, ethnicity, or place of origin) influenced how they judged whether 101 social media comments were harmful. We find institutional affiliations have the greatest impact on labeling, followed by values and identities. These insights advocate for a diversity approach that reflects community-specific user bases, allowing model developers and online communities to intentionally select biases for better moderation outcomes.