Conference Agenda
Session Overview
Session
4.15. Harnessing Technology for Archival Access and Preservation
Presentations
US Revolutionary War Pension Files AI Transcription Collaboration
1 FamilySearch International, United States of America; 2 National Archives and Records Administration
Short Description
In 2024, the US National Archives and Records Administration (NARA) and FamilySearch collaborated to create an AI-driven, entity-enhanced transcription of several million historical images of US Revolutionary War pension files. NARA provided 30 thousand transcribed images to improve the FamilySearch handwritten-text recognition model, and in turn FamilySearch processed more than 2 million Revolutionary War images and returned an entity-enhanced transcript to NARA for each one.
Abstract
The US National Archives and Records Administration (NARA) is preparing a special celebration for the 250th anniversary of the signing of the US Declaration of Independence in the summer of 2026. Part of this preparation is the full transcription of pension packets from the US Revolutionary War, which bring to life the stories of the people who served and their families. To accomplish this, NARA has recruited thousands of volunteers who are manually transcribing the images, but a significant risk remained that the transcription project would not finish on time.
In October of 2023, NARA representatives at the ICA Congress in Abu Dhabi heard a presentation about the application of AI to handwritten texts at FamilySearch and were interested in exploring a collaboration. In the summer of 2024, the two organizations combined efforts to accelerate the transcription of the US Revolutionary War pension files. NARA contributed 30 thousand faithfully transcribed images from the collection, which FamilySearch used to create a new and improved AI model. FamilySearch then used that model to process the images and delivered the entity-enhanced transcripts to NARA. NARA's expert volunteers are now reviewing and correcting the content, and NARA has taken a massive step towards finishing the 2026 transcription project on time. FamilySearch is happy for the additional training data, and for the chance to make the Revolutionary War images and transcripts available to its own patrons for family-history research after a short wait.
This project demonstrated how quickly and effectively technological collaborations can be formulated and carried out within the archive community. Both NARA and FamilySearch are applying the lessons learned from this project to make future collaboration even more effective, and the success of the project itself is a testament to the ways in which AI can be, and is being, used to enhance and improve the research experience for everyone.
Our presentation at the ICA Congress will describe what we did and how we did it, with an emphasis on what we learned and the value that we were able to unlock through AI and machine learning. It will also briefly touch on FamilySearch’s experience in developing AI for use with historical, handwritten documents, and on NARA’s approach to AI and what helps them build their capacity as they plan future digital transformation projects and roadmaps.

Fast Learns, Slow Remembers: Systems thinking in digital preservation using the Pace Layer and Viable System Models
Artefactual Systems Inc., Canada
Short Description
Digital preservation is a complex and long-term process. When our goal is to ensure that information is available and trusted in the future, how can we design, build, and operate durable and dynamic systems and processes that are adaptable and robust in the face of change? This presentation will describe a novel approach to understanding and addressing the challenges of digital preservation using two complementary approaches to systems thinking, the Pace Layer and Viable System Models.
Abstract
Digital preservation systems are complex, dynamic, and must operate over the very long term. However, maintaining complex systems over long periods of time presents many challenges. When our goal is to ensure that information is available and trusted in the future, how can we design, build, and operate durable and dynamic systems and processes that are adaptable and robust in the face of changes happening at many different rates and scales? This presentation will describe a novel approach to understanding and addressing the challenges and complexities of long-term digital preservation using two complementary approaches to systems thinking. The Pace Layer Model, first described by Stewart Brand, describes how complex systems are composed of interconnected layers that change at different rates, from slow and stable to fast and innovative.
These layers interact and depend on each other, with slower layers providing stability and context, while faster layers enable adaptation and innovation. The model highlights the importance of balancing change and stability in order for systems to remain robust and adaptable over time. The Viable System Model, created by Stafford Beer, describes the necessary organizational structure for a system to be viable (i.e., capable of independent existence). It is often used to model organisations and complex systems, helping to identify areas for improvement and ensure long-term sustainability. While seemingly disparate, these models share common characteristics that make their combination particularly valuable for understanding the challenges of caring for cultural memory in the very long term. Both emphasize the interdependence of system components, recognizing that changes in one layer can have cascading effects on others. They both stress the importance of autonomy and control at each level, allowing for flexibility and adaptation to change while maintaining overall system coherence. Key concepts common to both approaches will be outlined and explained in the context of the OAIS reference model, a framework for long-term digital preservation that defines roles and responsibilities for managing and preserving information packages over very long time frames. The presentation emphasizes the necessity of systems thinking and provides new perspectives on the development of more resilient and sustainable digital preservation practices. Audience members will come away with a better understanding of the Pace Layer and Viable System Models, and the benefits of systems thinking in efforts to care for cultural memory.

The Media is the Message: Optical or Analog Storage Media without Magnetic Information
FH Potsdam, Germany
Short Description
Today's digital information is stored on hard disks or tape, both relying on a magnetic mechanism for storage.
These media are used regardless of how important the data is or how long it must be stored. They need constant replacement, data integrity checks, expensive cooling and IT experts. Today the technology for storing digital data is on the move, creating new optical or analog storage solutions. Various existing and emerging technologies are presented that can provide viable alternatives.
Abstract
Almost all digital data in data centers is stored on either hard drives or tape, both relying on magnetic storage. Because this form of storage is unstable, the data needs constant integrity checks. For example, the Amazon cloud storage S3 held about 280 trillion data objects in 2023, requiring four billion checksum calculations each second! All this costs computing power, energy and money, only because we rely on magnetic storage for information that should not change. Additionally, both hard drives and tape do not have a very long lifespan and are replaced every 3-10 years, adding to the electronic waste. Cultural institutions really want a storage medium that needs no power during the centuries of storage. It should be stable enough to keep information for decades, but it should also be cheap to produce. It should have no moving parts and need no cooling. It should be safe from electromagnetic impulses, hackers, heat, floods and other threats. As a tradeoff we would be willing to compromise on the speed of reading and writing the data. Currently a handful of alternative storage providers are either already on the market or in the later stages of their development. They rely on a wide variety of storage media and formats. Their unifying characteristic is non-magnetic storage, which provides a very stable form that does not need constant maintenance and checks. These storage media range from rather traditional paper or film to more advanced solutions on ceramics or specially developed material.
Some of the companies find analog storage sufficient for their customers’ needs, others have developed new methods of encoding digital data. Solutions range from analog “back-ups” for archives to server racks for the biggest data centers worldwide. This market is currently on the move and driven by companies that try to adapt their products to the needs of their customers. But the market is also influenced by the customers (that is, archives) that are no longer willing to pay huge storage costs for data that should stay unchanged for decades or centuries. In my talk I want to encourage cultural institutions to look for alternative and new ways of safeguarding their data long-term. We should use our working relationships with the IT staff who currently keep our data and invite them to see the new market out there. It can potentially save a lot of money while using less energy and providing a much safer storage solution.