10:30am - 11:00am Justifying Biodiversity Data Access Restrictions: A Global Comparison of Data Policies
M. Kaehrle, K. Eschenfelder
University of Wisconsin-Madison, USA
Web-accessible biodiversity databases accept and openly share species observations from the public, benefitting research, conservation, and education. However, public data sharing can also bring harm, for example by facilitating poaching. Databases may mitigate harm by designating certain species as “sensitive” and restricting access to those species data. In this paper, we describe how 39 databases that share participatory science data justify automatic data access restrictions based on species-related concerns. We developed a codebook describing rationales for restricting access to sensitive species data and analyzed rationale use in relation to database characteristics such as size, type, host institution type, national origin, and taxonomic scope. We found a small set of commonly used rationales, wide variation in the number of rationales provided, and a surprising number of databases citing few rationales. Larger databases, aggregators, and governmental databases tended to cite more rationales, but there were numerous outliers. We hope our framework of rationales will support databases seeking to document data restrictions and assist with the creation of controlled vocabulary terms. The study also provides examples that could guide data curation education about responsible policy development.
11:00am - 11:15am “Unnecessarily cumbersome”: Researchers’ Opinions on Restricted Data Access Systems
M. A. Brown1, A. Thomer2, L. Hemphill1
1University of Michigan, USA; 2University of Arizona, USA
Research data archives use restricted data access protocols to manage access to sensitive data. However, these systems can make data reuse cumbersome for researchers, because commonly implemented systems introduce friction into the research process. In 2020, we fielded a survey of 481 data reusers at the Inter-university Consortium for Political and Social Research (ICPSR) about restricted data access systems. We found that 80% of respondents would be more likely to reuse data if restricted data access applications were made faster and easier. Additionally, most researchers indicated that they believe the security of research data is very important. However, researchers disagreed on the appropriate set of mechanisms for keeping research data secure, and they especially discounted interventions that add friction to accessing data. These findings present challenges for archives in implementing restricted data access systems that balance protecting research subjects with encouraging data reuse.
11:15am - 11:45am Understanding Data Search Behaviors Through the Lens of Search Stages: A Comparative Study of Data Retrieval Systems and Generative Search Engines
S. Wu1, S. Peng2, Q. Li3, P. Wang2
1Nanyang Technological University, Singapore; 2Wuhan University, China; 3Nankai University, China
Generative search engines address limitations of traditional data retrieval systems, including rigid keyword-based queries, impersonalized results, and choice overload. However, they introduce new challenges such as prompt literacy demands, hallucination risks, and reduced output diversity. While these trade-offs fundamentally reshape user interactions with search systems, the comparative dynamics of search behavior across generative and traditional systems remain underexplored. This study bridges this gap by analyzing data search behaviors through a search stage framework, revealing distinct interaction patterns. Building upon the Information Search Process Model and Information Seeking Behavior Model, this study proposes a stage model of data search behavior. Experimental data were analyzed to explore the proposed model. Our findings identify both convergent and divergent behavioral patterns: while certain search stage types and behaviors overlap across systems, substantial differences emerge in stage transition dynamics (encompassing transition types, frequencies, and pathways) and specific behaviors. This study uncovers a fundamental tension in data search: traditional retrieval systems support broad exploratory patterns but constrain interaction depth, while generative search engines enable deeper engagement at the expense of exploration breadth. This trade-off between breadth and depth presents significant implications for the design of next-generation intelligent retrieval systems that optimize both dimensions of user interaction.
11:45am - 12:00pm Interactive Graph Visualization and Teaming Recommendation in an Interdisciplinary Project’s Talent Knowledge Graph
J. Xu1, J. Chen1, Y. Ye2, Z. Sembay3, S. Thaker3, P. Payne-Foster3, J. Chen3, Y. Ding1
1School of Information, University of Texas at Austin; 2Columbia University; 3School of Medicine, University of Alabama at Birmingham
Interactive visualization of large scholarly knowledge graphs combined with LLM reasoning shows promise but remains under-explored. We address this gap by developing an interactive visualization system for the Cell Map for AI (CM4AI) Talent Knowledge Graph (28,000 experts and 1,179 biomedical datasets). Our approach integrates WebGL visualization with LLM agents to overcome limitations of traditional tools such as Gephi, particularly for large-scale interactive node handling. Key functionalities include responsive exploration, filtering, and AI-driven recommendations with justifications. This integration could help users identify potential collaborators and relevant dataset users within biomedical and AI research communities. The system contributes a novel framework that enhances knowledge graph exploration through intuitive visualization and transparent, LLM-guided recommendations. This adaptable solution extends beyond the CM4AI community to other large knowledge graphs, improving information representation and decision-making. Demo: https://cm4aikg.vercel.app/