Conference Agenda
Overview and details of the sessions of this conference.
Session Overview
Session
4.03. Reconstructing Memory: Artificial Intelligence, Archival Practice, and Digital Ethics
Presentations
Archival Science and Machine Learning: Automatic Classification of Archival Documents
Universidade Federal de Santa Maria (UFSM), RS, Brazil

Short Description
The evolution of artificial intelligence techniques has supported several areas of knowledge. In archival science, classification is one of the archival management operations. This article presents how machine learning can support archival science in the task of classifying university documents in order to improve the quality of classification. Experiments on real documents show that the approach achieves good results, with an accuracy of 98% on the prediction.

Abstract
Archival document management is important from both personal and institutional perspectives. Keeping documents is not always an easy task, as it requires detailed organization and a sense of importance, and losing them can have serious consequences. Document management matters because it is through documents that part of the history of institutions, and of people, can be recovered. Furthermore, storing documents without an efficient mechanism is not good practice. An efficient mechanism for accessing and locating documents is as important as any other stage of archival management. Access to information becomes highly strategic considering the advancement of information technologies that allow the optimization of work procedures. Beyond the technological issue, access to information is an act of transparency, which strengthens democracy and the exercise of citizenship. Classifying documents appropriately has become an essential aspect of document care. Classification is one of the activities in the archival document management process, which includes specific procedures and routines that enable greater efficiency and agility in the management and control of information.
The classification plan for archival documents in public administration was elaborated by the Arquivo Nacional of Brazil and constitutes an essential element in the organization of archives. In most institutions, classification is often performed by professionals from other areas, which results in classification errors and excessive time spent selecting the subject appropriate to the content of the document. In this context, automating the task of classifying archival documents can help people identify the main subject of a document and improve the classification process by reducing the time this task takes. Artificial intelligence (AI) is one of the paradigms of computer science. One of AI's areas of activity is machine learning, in which techniques are developed to allow the computer to improve its performance on a specific task. Machine learning-based systems solve problems through inductive methods: generally, a model capable of solving a specific task is induced from a training set. This paper presents an automatic approach for classifying archival documents at a federal higher education institution. The process involves a set of activities, from document identification to the automatic classification of its class (document type). To perform the automatic classification, data mining and text mining algorithms are used. The present work applies three classification algorithms, with a qualitative and quantitative analysis and an exploratory approach, through a case study. The preliminary results demonstrate that the approach is efficient: the algorithms tested achieved an average accuracy of 98% in predicting document types across 11,041 documents. Finally, the presented method can be applied and shared with other institutions, since the proposed solution is independent of the computerized archival document management system the institution uses.
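The abstract does not name the three classification algorithms used. As a minimal, hypothetical sketch of the general technique it describes (inducing a text classifier for document types from a labeled training set), a bag-of-words Naive Bayes could look like this; the sample documents and class labels below are invented for illustration:

```python
import math
from collections import Counter, defaultdict

# Hypothetical training set of (document text, document type).
# The real study used 11,041 institutional documents.
TRAIN = [
    ("ata de reuniao do conselho universitario", "minutes"),
    ("ata da sessao ordinaria do colegiado", "minutes"),
    ("edital de selecao de bolsistas", "call"),
    ("edital de concurso publico para docentes", "call"),
]

def train_naive_bayes(samples):
    """Induce class priors and per-class word counts from labeled texts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(text, class_counts, word_counts, vocab):
    """Return the most probable class under a multinomial model with add-one smoothing."""
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            lp += math.log((word_counts[label][word] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_naive_bayes(TRAIN)
print(classify("edital de selecao", *model))  # → call
```

Because the model is induced purely from labeled text, it is independent of any particular document management system, which is the property the abstract highlights.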
Artificial Intelligence Technology for Recognition of Handwritten Chinese Characters in Hunan Provincial Archives
1Hunan Provincial Archives, China; 2Xiangtan University, China

Short Description
Hunan Provincial Archives houses over 100,000 historical files, and Chinese character recognition is challenged by diverse patterns and styles. AI technology aids in this through: (1) image correction and repair with a CNN; (2) histogram equalization-based image enhancement; (3) image classification with target detection; (4) text localization with YOLOv3; (5) handwritten image generation with a generative RNN; (6) character recognition with a deep-learning model; (7) error correction with a language model.

Abstract
There are over 100,000 historical files in Hunan Provincial Archives. These files have not been systematically arranged, with missing pages, inaccurate descriptions, and unidentified special carriers. The recognition of Chinese characters faces significant challenges: there are more than 6,700 commonly used Chinese characters, spanning traditional and simplified characters in terms of patterns, printed and handwritten characters in terms of writing, as well as regular script, cursive script, clerical script, seal script, and semi-cursive script in terms of styles. In addition, it is difficult to understand the background of historical records: most of the documents refer to historical events before the 1940s, which requires a good understanding of China's modern history. Therefore, AI technology is needed to recognize the handwritten characters. Seven steps realize the recognition of handwritten Chinese characters with AI-based technologies. (1) Image correction and repair. A deep convolutional neural network (CNN), combined with GAN technology, is used to repair images through end-to-end learning. (2) Image enhancement. A histogram equalization-based image enhancement algorithm is employed to highlight the details of images. The algorithm features a shorter calculation time and better real-time performance, which is more in line with engineering requirements while ensuring higher contrast in the enhanced images. (3) Image classification. We use common OCR technology for full-text recognition and then analyze the recognition results to achieve classification. (4) Text localization. The YOLOv3 location-mark detection algorithm automatically learns features through a large number of network parameters on the training set. (5) Handwritten image generation. We use a deep neural network to learn the characteristics of images from the existing data, use a generative RNN to output images, score the generated images with an adversarial neural network, and finally generate satisfactory images. (6) Character recognition. Intelligent image recognition technology based on deep learning obtains the features of each segmented element, distinguishes and recognizes different characters, carries out multi-network parallel recognition with mutual checking, and produces the final result. (7) Error correction. A language model is established from the sample materials. The candidate characters and recognition distances obtained through OCR in the earlier stage are scored by the language model and further corrected along the optimal path to select the best candidate characters, improving OCR accuracy.

The Impact of Open Data, Artificial Intelligence, and Big Data on Archive Preservation in the Digital Era
National Library and Archives, United Arab Emirates

Short Description
Big data, AI, and open data increase productivity but threaten the preservation of archives by misclassifying records, warping historical context, and raising cyberthreats.
In order to preserve archival integrity in the digital age, this study examines these issues and offers remedies such as data verification, AI supervision, and robust cybersecurity.

Abstract
Open data, artificial intelligence (AI), and big data have emerged as key technologies in the digital revolution that has transformed a number of industries. These developments pose threats to the integrity and validity of archival records even as they greatly increase efficiency and improve information access. The study examines the difficulties that open data, AI, and big data present for preserving archives, especially with regard to the degradation or loss of historical accuracy. Historical facts may be misinterpreted as a result of open data, which gives the public free access to information for their own use and redistribution. The original context or meaning of historical events may be changed when historical records are released as open data, because they may be examined by AI technologies that introduce bias or mistakes. If left unchecked, inaccurate information has the potential to mislead future generations and skew historical accounts. Archival preservation is made more difficult by AI's role in data analysis and classification. AI has the potential to improve the efficiency of archival record management, but it can also create false material, alter historical data, or incorrectly classify records. The legitimacy of archival materials may be threatened by biased or inaccurate interpretations produced by AI algorithms that lack contextual awareness. With its enormous volumes of data, big data also makes records difficult to retain and preserve. The sheer amount of data challenges archival management, and there is a greater chance of cyberthreats such as hacking or data theft. The integrity of archives may be further impacted by incorrect conclusions drawn from excessive big data analysis.
This paper will look at these problems and offer remedies, such as strong data verification, AI systems supervised by humans, and cutting-edge cybersecurity protocols. It seeks to assist archivists in preserving the authenticity and correctness of historical documents in an increasingly digital world by tackling these issues.

The Role of Artificial Intelligence in Identifying or Reconstituting Archival Aggregations of Digital Records and Enriching Metadata Schemas: A Practical Case Study and a Project Reporting Framework
1University of Macerata, Italy; 2ICCROM International Centre for the Study of the Preservation and Restoration of Cultural Property; 3Associazione Nazionale Archivistica Italiana (ANAI); 4NATO Archives, Belgium

Short Description
Can we use AI tools to constitute or reconstitute archival aggregations and create metadata schemas for them? The project aims at identifying concrete areas where AI technologies could play a crucial role. The team has tested some of the available tools with case studies. As part of the InterPARES Trust AI network, the team is also trying to see how an AI project can be documented from the perspective of practitioners, and to test the framework in a real case scenario.

Abstract
ICA 2025 Paper Application. Subtheme 4: Digital and Accessible. Since 2021, InterPARES Trust AI has been working on designing, developing, and leveraging artificial intelligence to support the ongoing availability and accessibility of trustworthy public records. Within this framework, the CU05 working group started its activities with the aim of answering the question: Can we use AI tools to constitute or reconstitute archival aggregations and create metadata schemas for them?
The project aims at identifying concrete areas where AI technologies could play a crucial role. In doing so, the study is analysing case studies. The project is articulated in three phases: (1) a market survey of existing software solutions; (2) case studies to experiment with available software solutions; (3) how organizations can document AI projects related to recordkeeping, with suggested frameworks and templates. The presentation intends to focus on points (2) and (3). The team has tested some of the available tools with case studies. At the ICA conference a case study will also be presented, involving the Australian software company RecordsPoint and the NATO Archives. Within the case study, the research team investigated if and to what extent it is possible to: use artificial intelligence to classify records against rules and regulations; automatically flag and alert for high risk of sensitive data and/or security markers; index based on records' content and context against function-based records classification schemes; perform text summarization of given records; and make inferences about the organization or person that has created or received and then set aside the records. In addition, the group has been working on a final report that aims at documenting the project in its various phases. The idea is to develop a framework that organizations can use to support the implementation of AI projects related to recordkeeping. This framework is built on the most recent literature on the topic and on relevant standards and recommendations, such as the OAIS and IPELTU standards. The team is trying to see, in cooperation with other InterPARES Trust AI studies, how an AI project can be documented from the perspective of practitioners, and to test the framework in a real case scenario. The project started in 2022 and is now ongoing. At the ICA meeting, it will be possible to present the final results and the related documentation.
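The abstract does not describe how the case-study tasks were implemented. As a hypothetical illustration of the simplest form of one of them, flagging records at high risk of containing sensitive data or security markers, a pattern-based detector could look like the sketch below; the patterns and category names are invented, and a real system would rely on AI models rather than regular expressions alone:

```python
import re

# Hypothetical patterns for sensitive content and security markers.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\+?\d[\d\s()-]{7,}\d\b"),
    "security_marker": re.compile(r"\b(CONFIDENTIAL|SECRET|RESTRICTED)\b"),
}

def flag_record(text):
    """Return the list of sensitive-data categories detected in a record's text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)]

record = "RESTRICTED - contact j.doe@example.org for the minutes."
print(flag_record(record))  # → ['email', 'security_marker']
```

In the workflow the abstract outlines, such flags would only raise alerts for human review; the supervised-by-humans remedy discussed in the preceding paper applies here as well.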