10:30am - 11:00am
Justifying Biodiversity Data Access Restrictions: A Global Comparison of Data Policies
M. Kaehrle, K. Eschenfelder
University of Wisconsin-Madison, USA
Web-accessible biodiversity databases accept and openly share species observations from the public, benefitting research, conservation, and education. However, public data sharing can also bring harm, for example by facilitating poaching. Databases may mitigate harm by designating certain species as “sensitive” and restricting access to those species data. In this paper, we describe how 39 databases that share participatory science data justify automatic data access restrictions based on species-related concerns. We developed a codebook describing rationales for restricting access to sensitive species data and analyzed rationale use in relation to database characteristics such as size, type, host institution type, national origin, and taxonomic scope. We found a small set of commonly used rationales, wide variation in the number of rationales provided, and a surprising number of databases citing few rationales. Larger databases, aggregators, and governmental databases tended to cite more rationales, but there were numerous outliers. We hope our framework of rationales will support databases seeking to document data restrictions and assist with the creation of controlled vocabulary terms. The study also provides examples that could guide data curation education about responsible policy development.
11:00am - 11:15am
“Unnecessarily cumbersome”: Researchers’ Opinions on Restricted Data Access Systems
M. A. Brown¹, A. Thomer², L. Hemphill¹
¹University of Michigan, USA; ²University of Arizona, USA
Research data archives use restricted data access protocols to manage access to sensitive data. However, these systems can be cumbersome for researchers who want to reuse data, because the systems most often implemented introduce friction into the research process. In 2020, we surveyed 481 data reusers at the Inter-university Consortium for Political and Social Research (ICPSR) about restricted data access systems. We found that 80% of respondents would be more likely to reuse data if restricted data access applications were made faster and easier. Additionally, most researchers indicated that they believe the security of research data is very important. However, researchers disagreed on the appropriate set of mechanisms for keeping research data secure, and they especially discounted interventions that introduce friction to accessing data. These findings present challenges for archives in implementing restricted data access systems that balance protecting research subjects with encouraging data reuse.
11:15am - 11:30am
Interactive Graph Visualization and Teaming Recommendation in an Interdisciplinary Project’s Talent Knowledge Graph
J. Xu¹, J. Chen¹, Y. Ye², Z. Sembay³, S. Thaker³, P. Payne-Foster³, J. Chen³, Y. Ding¹
¹School of Information, University of Texas at Austin; ²Columbia University; ³School of Medicine, University of Alabama at Birmingham
Interactive visualization of large scholarly knowledge graphs combined with LLM reasoning shows promise but remains under-explored. We address this gap by developing an interactive visualization system for the Cell Map for AI (CM4AI) Talent Knowledge Graph (28,000 experts and 1,179 biomedical datasets). Our approach integrates WebGL visualization with LLM agents to overcome limitations of traditional tools such as Gephi, particularly in handling large numbers of nodes interactively. Key functionalities include responsive exploration, filtering, and AI-driven recommendations with justifications. This integration could help users identify potential collaborators and relevant dataset users within biomedical and AI research communities. The system contributes a novel framework that enhances knowledge graph exploration through intuitive visualization and transparent, LLM-guided recommendations. This adaptable solution extends beyond the CM4AI community to other large knowledge graphs, improving information representation and decision-making. Demo: https://cm4aikg.vercel.app/
11:30am - 11:45am
How Data Reuse Leads to Citation Performance: The Mediating Role of Coauthorship in ICPSR-Based Author Data Coupling
H. Kim, S. Bratt
University of Arizona, USA
This study investigates how data reuse contributes to scholarly impact by tracing the pathway from dataset reuse similarity to coauthorship formation and citation performance. We introduce Author Data Coupling, defined as the extent to which two authors independently cite the same dataset. Drawing on ICPSR, SciSciNet, and OpenAlex, we analyze 15,575 authors and 3,987,858 author pairs, each pair comprising two researchers who independently cited the same dataset. Using Quadratic Assignment Procedure regression, we find that higher author data coupling is associated with greater average citation performance (H1), showing that dataset reuse similarity alone correlates with higher scholarly impact. Among author pairs with no prior coauthorship, those who later collaborated after independently citing the same dataset tended to receive more citations than those who did not (H2). Stronger data coupling also correlates with a higher likelihood of coauthorship, which in turn correlates with higher citation performance, indicating a mediating effect (H3). Together, these findings suggest that shared dataset reuse not only signals intellectual alignment but is also linked to the emergence of new collaborations. In this way, data reuse operates not only as a foundation for knowledge production but also as relational infrastructure that shapes collaboration and enhances academic influence.
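As a rough illustration of the Author Data Coupling idea described above, the following minimal sketch counts, for each pair of authors, how many datasets both have cited. It is not the authors' implementation: the abstract does not state how coupling strength is operationalized, so the shared-dataset count, the function name `author_data_coupling`, and the toy records are all assumptions made for this example. The sketch also does not check that the two citations were made independently (that is, in separate publications).

```python
from collections import defaultdict
from itertools import combinations


def author_data_coupling(citations):
    """Count shared datasets for every author pair.

    `citations` is an iterable of (author, dataset) records.
    Assumption: coupling strength is the number of datasets cited by
    both authors; the source does not specify the exact measure.
    """
    # Group the authors who cited each dataset.
    authors_by_dataset = defaultdict(set)
    for author, dataset in citations:
        authors_by_dataset[dataset].add(author)

    # Every pair of authors sharing a dataset gains one unit of coupling.
    coupling = defaultdict(int)
    for authors in authors_by_dataset.values():
        for a, b in combinations(sorted(authors), 2):
            coupling[(a, b)] += 1
    return dict(coupling)


# Hypothetical toy data: three authors, three datasets.
toy_citations = [
    ("A", "d1"), ("B", "d1"), ("C", "d1"),
    ("A", "d2"), ("B", "d2"),
    ("C", "d3"),
]
result = author_data_coupling(toy_citations)
print(result)
```

In this toy input, authors A and B share two datasets while the other pairs share one, so A and B would be the most strongly coupled pair under the assumed measure.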