Conference Agenda
Overview and details of the sessions of this conference. Please select a date or location to show only sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).
Please note that all times are shown in the time zone of the conference.
Session Overview
Session H: Discoverability
Paper session.
Presentations
A Citation Analysis of Government of Canada Open Data in Academic Literature: Leveraging AI for Open Data Archive Impact Assessment
Concordia University, Canada

This presentation introduces the first comprehensive analysis of how Government of Canada open data is cited in academic literature, addressing a critical challenge digital curators face: how to demonstrate the impact and value of open data collections. Using a fine-tuned BERT language model trained on over 3,000 manually verified citation examples, the study overcame the problem of inconsistent data citation standards and identified 3,953 citing articles with 91% accuracy, significantly outperforming traditional keyword-matching methods (73% accuracy). The study reveals key usage patterns across disciplines, identifying environmental science, agriculture, and immigration studies as primary users of Canadian government data. Its findings provide digital curators with evidence-based insights for strategic collection development and resource allocation decisions, while the open-source methodology offers the community immediately deployable tools for impact assessment. In an era of budget cuts, where archives must continually justify their value, the study demonstrates how AI can enhance traditional bibliometric approaches to provide more comprehensive and accurate measures of collection impact, directly addressing contemporary challenges in digital curation.

‘&%$£ In = &%$£ Out’: How Controlled Vocabularies and Metadata Standards Are Fundamental for Developing Open Research Indicators
University of Bristol, United Kingdom

In 2024 the UK Reproducibility Network (UKRN) initiated a set of pilots involving institutional members and solution providers to establish good practice in institutional monitoring of Open Research through the creation of robust indicators. The Open Research Indicators Pilot was sector-led, with institutions and solution providers working together to develop, test, and evaluate prototype machine learning solutions with valid, reliable, and ethical indicators for measuring Open Research. The University of Bristol led the ‘Openness of Data’ pilot and assessed providers’ data to ascertain the usefulness of machine learning for this purpose. The pilot’s findings highlight the inherent challenges and limitations of monitoring and assessing published datasets for openness within a research landscape that prioritises articles as benchmark outputs; the combination of article primacy and existing publisher and repository systems means datasets can currently only be monitored through Data Availability Statements (DAS). Our analysis of machine learning tools confirmed an uncomfortable truth many in the RDM community suspected: we do not have enough openly available, machine-actionable metadata for digital tools to reliably and accurately extract DAS, and we are not doing enough at the human interface with researchers to ensure their DAS are easy to understand and describe how their data can be found by others, both of which undermine the measurement of openness.

Standardization Vs. Preservation? Supporting Interoperability by Enhancing Thematic Metadata at Social Science Archives
Centre for Social Sciences, Hungary

The presentation "Standardization Vs. Preservation: Supporting Interoperability by Enhancing Thematic Metadata at Social Science Archives" addresses the challenges of data standardization and interoperability in social science research. Emphasizing the importance of effective metadata practices, the ONTOLISST project aims to study thematic ontologies with the goal of improving data discoverability and sharing among diverse research infrastructures. The project, funded by the European Commission's Horizon Europe program, investigates the varied approaches to thematic metadata creation across research repositories containing social science survey data. By analyzing metadata structures and curation practices, the research seeks to identify and explain commonalities and discrepancies in metadata schemes that hinder interoperability. The study highlights the need for rich metadata documentation while navigating the complexities arising from competing standards and the diversity of data description practices. Drawing on data documentation received from the repositories and extensive interviews with data management experts, the project presents two kinds of outcomes: research studies and technical innovation. The results of the analysis feed into the development of a semi-automated thematic metadata-generating scheme based on a simplified thesaurus (LiSST). This tool aims to facilitate the integration and accessibility of social science data, fostering connectivity across disciplines and languages. The anticipated outcome is thus a harmonized metadata structure that upholds the rich, nuanced meanings of original research while promoting discoverability and reuse. By focusing on the balance between standardization and preservation, ONTOLISST affirms that thoughtful approaches to thematic metadata can yield practical solutions to interoperability challenges, ultimately enhancing the usability and visibility of social science datasets in the global research landscape.
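
The first presentation describes fine-tuning a BERT classifier on manually verified citation examples. A minimal sketch of that kind of setup, using the Hugging Face transformers and datasets libraries, is shown below; the in-line example passages and training settings are hypothetical placeholders, not the presenters' actual data or configuration.

```python
# Minimal sketch (not the presenters' code): fine-tune a BERT classifier to
# flag passages that cite Government of Canada open data. The tiny in-line
# dataset stands in for the ~3,000 manually verified examples described above.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

examples = {
    "text": [
        "Temperature normals were retrieved from the Government of Canada Open Data portal.",
        "We report 95% confidence intervals for all regression coefficients.",
    ],
    "label": [1, 0],  # 1 = cites GoC open data, 0 = does not
}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_dataset = Dataset.from_dict(examples).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="goc-citation-classifier",
                           num_train_epochs=1, per_device_train_batch_size=2),
    train_dataset=train_dataset,
)
trainer.train()
# The trained model can then score candidate passages from harvested articles,
# keeping those predicted as citing for the bibliometric analysis.
```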
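
The ‘Openness of Data’ pilot concerns locating and assessing Data Availability Statements. The sketch below illustrates one simple heuristic approach to that task (heading search plus keyword cues); the heading patterns, cue lists, and example article are illustrative assumptions, not the pilot's actual tooling.

```python
# Minimal sketch: locate a Data Availability Statement (DAS) in plain article
# text with a heading heuristic, then apply keyword rules to guess whether the
# data are openly available. All patterns and cues here are illustrative only.
import re

DAS_HEADINGS = r"(data availability statement|data availability|availability of data)"
OPEN_CUES = ("publicly available", "openly available", "doi.org", "zenodo", "figshare")
CLOSED_CUES = ("available on request", "upon reasonable request", "cannot be shared")

def extract_das(full_text: str) -> str | None:
    """Return the paragraph following a DAS-style heading, if one is found."""
    match = re.search(DAS_HEADINGS + r"\s*[:\n]\s*(.+?)(?:\n\s*\n|$)",
                      full_text, flags=re.IGNORECASE | re.DOTALL)
    return match.group(2).strip() if match else None

def classify_openness(das: str) -> str:
    text = das.lower()
    if any(cue in text for cue in OPEN_CUES):
        return "open"
    if any(cue in text for cue in CLOSED_CUES):
        return "restricted"
    return "unclear"

article = """\
Methods ...

Data Availability Statement:
The survey data are openly available at https://doi.org/10.1234/example.

References
...
"""
das = extract_das(article)
print(das, "->", classify_openness(das) if das else "no DAS found")
```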
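
The ONTOLISST abstract mentions a semi-automated scheme that maps repository-specific thematic terms onto a simplified thesaurus (LiSST). A minimal sketch of that kind of mapping via fuzzy string matching follows; the controlled vocabulary and matching threshold are invented for illustration and are not the project's actual LiSST terms or algorithm.

```python
# Minimal sketch: map free-text thematic keywords from different repositories
# onto a small controlled vocabulary via fuzzy string matching.
# The vocabulary below is a hypothetical stand-in for the LiSST thesaurus.
from difflib import SequenceMatcher

CONTROLLED_VOCABULARY = ["migration", "labour market", "political attitudes", "health"]

def map_to_thesaurus(keyword: str, threshold: float = 0.6) -> str | None:
    """Return the best-matching controlled term, or None if nothing is close enough."""
    best_term, best_score = None, 0.0
    for term in CONTROLLED_VOCABULARY:
        score = SequenceMatcher(None, keyword.lower(), term).ratio()
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= threshold else None

if __name__ == "__main__":
    for raw in ["Labor market participation", "immigration", "voting behaviour"]:
        print(raw, "->", map_to_thesaurus(raw))
```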
