Conference Agenda
Session Overview
Session A: AI/ML: Curation challenges and opportunities: I
Paper session.
Presentations
Leveraging LLM for Semantic Search and Curation in a National Research Data Catalog
INRAE, France
We present a suite of operational services (TRL 7-9) that leverage Artificial Intelligence to augment, not replace, human expertise. We have developed a prototype national catalog for French research data that integrates hybrid search capabilities with a suite of AI-driven tools for metadata enhancement and quality assessment. The catalog combines traditional faceted search with a multilingual semantic search engine, using bi-encoder models for efficient retrieval and cross-encoders for precise reranking. To tackle metadata inconsistency, we utilize right-sized, open-source LLMs like Mistral Small to align entities to controlled vocabularies (e.g., ROR) and generate standardized classifications (e.g., scientific disciplines). This approach minimizes computational costs and environmental impact while ensuring transparency by always distinguishing between original and AI-generated metadata. Acknowledging that metadata can be of low quality, we have also built a novel curation analysis tool using a few-shot LLM to assess the semantic substance of descriptions. Our roadmap focuses on evolving these tools into a proactive "FAIR by Design" ecosystem.

Taming the AI Curator: A Content Focused Data Description Diagnostic and Assistive Writing Tool
University of Texas at Austin, United States of America; Washington University in St. Louis, United States
We designed an AI-based tool to diagnose data descriptions and help users write clear, accurate, and complete ones. The tool's components include best-practice data description guidelines, data descriptions reviewed by experts and used as few-shot prompts, and chain-of-thought reasoning to explain the diagnostic outputs. We engineered our prompts and Large Language Model choice so that a score of 8 reflects an acceptable data description. Users can double-check the evaluations and the assisted descriptions, which helps minimize hallucinated outputs and scores that are inconsistent with expert reviewers. The application is crafted to match the standards of our field and to be used with guided intention.

Scaling Data Sharing Expertise with AI: a Case Study from DataSeer and Taylor & Francis
Taylor & Francis, United Kingdom; DataSeer
This paper outlines the collaborative development of an AI data curation tool to support data sharing in the journal publishing workflow. As scrutiny of research increases, concerns have grown regarding reproducibility, as well as fraud and bad actors in the research lifecycle. Transparent, reproducible, and well-curated data is foundational to restoring confidence. In this paper we describe the current data sharing policy landscape at academic publishers and outline key challenges that might limit further data policy implementation and enforcement at journals. We provide insights into a new approach to data sharing compliance checks, the DataSeer SnapShot tool, and how this tool was developed in collaboration with the Open Science, Implementation, and Editorial Operations teams at academic publisher Taylor & Francis. The potential future iterations of the tool and the implications of its wider implementation are also discussed.
