Conference Agenda
Session G: Curating complex data
Paper session.
Presentations
On Curating HTR Training Datasets for Romanian Language with use of Transcribathon Tool
1 AIT Austrian Institute of Technology GmbH, Austria; 2 Facts & Files Digital Services GmbH, Germany; 3 CrossLang NV, Belgium

This paper presents a workflow for HTR dataset generation in Romanian using Transcribathon’s Correct HTR feature. Leveraging citizen-science transcriptions aligned with Transkribus outputs, our case study on Jurnalul lui Dumitru Nistor reduced CER from 15.26% to 0.13%. The approach enables efficient dataset curation and supports scalable model development in low-resource languages.

Dealing with Unprecedented Scale and Complexity: Lessons from Archiving HS2 Digital Archaeological Data
University of York, United Kingdom

Construction of High Speed 2, the UK's largest linear infrastructure project, brought the need to undertake the most extensive archaeological programme the country had ever seen. The monumental geographic and temporal scale of the project posed huge challenges for how its archaeological outcome would be recorded and preserved. Specific government-mandated requirements imposed on the overall scheme created a complex regulatory environment for the large number of parties involved. Discrepancies between companies with distinct methodologies and individual reporting standards posed a threat to the consistency of the records and therefore to their preservation, access and reuse. The Archaeology Data Service (ADS) was tasked with setting standards for data deposition, digital preservation, and access to all archaeological data created by HS2. With over 31 terabytes of data in an array of formats, the project shone a light on the limitations of existing frameworks when managing large-scale, heterogeneous datasets while presenting a significant opportunity for innovation in archiving practice, infrastructure, and rationale.

The scale and complexity of HS2 introduced both technical and epistemological risks to how we provide long-term digital preservation for the data entrusted to our care while ensuring it remains findable, accessible, interoperable, and re-usable. This paper analyses and reflects upon how the ADS has been transformed by the demands of HS2, not only in its technical capacity but in its understanding of the infrastructural, organisational, and ethical dimensions of large-scale digital curation. These challenges offered a proving ground in which future approaches to archaeological data management and archiving could be tested. This led to new tools, adaptable procedures, better workflows, and more nuanced perspectives on the value of curated data. Our capacity to ensure data integrity and accessibility in perpetuity has expanded, demonstrating the project’s long-term infrastructural benefit to the wider sector.

Auditing the Human BioMolecular Atlas Program (HuBMAP) Human Reference Atlas (HRA): An Evaluation of Core Digital Objects
Indiana University Bloomington, United States of America

Data auditing has become increasingly critical for large-scale biomedical repositories as they serve diverse research communities while maintaining scientific rigor and compliance with established standards. The Human BioMolecular Atlas Program (HuBMAP) aims to map the human body at single-cell resolution through curated spatial and molecular data. The Human Reference Atlas (HRA), a central output of HuBMAP, includes datasets such as Anatomical Structures, Cell Types and Biomarkers (ASCT+B) tables, 2D Functional Tissue Unit (FTU) illustrations, 3D reference organ models, and Organ Mapping Antibody Panels (OMAPs). This study reports the first comprehensive, third-party audit of the HRA, conducted from March to July 2024 to assess data quality, internal consistency, and adherence to Standard Operating Procedures (SOPs).

The audit methodology combined systematic evaluation of metadata completeness and correctness with visual inspection protocols designed to assess user experience and functional utility across different digital object types. Using a combination of visual inspections, file metadata analyses, and spreadsheet comparisons across 34 ASCT+B tables, 22 2D FTU illustrations, 70 3D reference models, and 21 OMAP datasets, the audit demonstrated overwhelmingly positive results, with compliance rates of 94-100% across most evaluation criteria. Findings indicate that HuBMAP maintains robust curation standards, with structural issues present in fewer than 10% of ASCT+B tables. This audit provides a replicable model for future quality assurance activities in large-scale biomedical data infrastructures and highlights the importance of continuous audit processes for ensuring data integrity, transparency, and usability in contemporary digital curation contexts.
