FROM ia_archiver TO OpenAI: THE PASTS AND FUTURES OF AUTOMATED DATA SCRAPERS
Katherine Mackinnon1, Emily Maemura2
1University of Toronto, Canada; 2University of Illinois Urbana-Champaign, United States of America
Data scraping practices have recently come under scrutiny, as datasets scraped from the web’s social spaces are the basis of new generative AI tools like Google’s Gemini, Microsoft’s Copilot, and OpenAI’s ChatGPT. The practices of scrapers and crawlers rest on a conception of the internet as a mountain of data sitting, waiting, available to be acted upon, extracted, and put to use. In this paper, we examine the robots.txt exclusion protocol, which has long been used to govern the behavior of crawlers and is often taken as a proxy for consent in widespread data scraping and web archiving. By addressing the underlying assumptions of the protocol, we aim to counter a recent narrative that “the basic social contract of the web is falling apart” (Pierce, 2024), and instead argue that data-extractive infrastructures have been at work throughout the past 30 years of the web. Positioning this work within the field of critical data studies, we aim to find new ways for web archives and modes of collection to become unbound from the “capitalist logics of data extraction” upon which they are currently built (Theilen et al., 2021).
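To make the exclusion protocol concrete, the following minimal sketch (not part of the paper) shows how a compliant crawler consults a site’s robots.txt before fetching a page, using Python’s standard-library parser; the site URL and user-agent strings are illustrative placeholders.

```python
import urllib.robotparser

# Parse the site's robots.txt, the file the abstract describes as a
# de facto proxy for consent between site owners and crawlers.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder site
rp.read()

# A well-behaved crawler identifies itself and checks whether it is
# allowed to fetch a given URL; non-compliant scrapers simply skip this step.
for agent in ("ia_archiver", "GPTBot", "*"):
    allowed = rp.can_fetch(agent, "https://example.com/some/page.html")
    print(f"{agent}: fetch allowed = {allowed}")

# Some sites also declare a crawl delay, which compliant crawlers honor.
print("crawl delay for *:", rp.crawl_delay("*"))
```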
Taming Ambiguity: Managerial Contradictions in AI Data Production Industry
Julie Yujie Chen
University of Toronto, Canada
As human cognitive capacities for comprehension, interpretation, learning, and problem solving are put to work annotating and training datasets for artificial intelligence (AI) and machine learning models, it becomes imperative, from the management perspective, to standardize subjective interpretations of data and align workers’ understanding of data with clients’ interests and values. The tendency to treat the data worker’s mind as a contested terrain of labor control raises important questions for labor politics and critical data studies.
The paper delves into the management of interpretative labor in the AI data production industry, with a specific focus on the impact of organizational and market dynamics. Taking the AI data production industry in China as a case study, I elucidate the contradictions in managing cognitive and interpretative labor, examine the factors contributing to these managerial contradictions, and attend to workers’ cognitive tactics as means of negotiation and resistance.
Breaking data flows and connecting data practices: examining data frictions in digital platform APIs
Fang Jiao1, Jo Bates2
1Chinese University of Hong Kong, Hong Kong S.A.R. (China); 2University of Sheffield, Sheffield, UK
This study examines data frictions – the combination of sociotechnical circumstances involving the consumption of energy, time, and resources that shapes the mobility of data – embedded in digital platform APIs, and how these frictions shape the connections between various API-based data practices. Through a document analysis of Twitter/X APIs’ historical documentation and changelogs, the study articulates a multi-layered understanding of data frictions, comprising material, technical, and discursive layers, which correspond respectively to the elements of data practice: materials, competencies, and meanings. Beyond this descriptive understanding, the study argues that data frictions render the material element of data practice limited and potentially fractured, so that third-party developers’ data practice environment is confined to the technical environment recommended by the APIs. This shared constraint also makes it possible for practitioners with different identities, such as programmers, software developers, business analysts, and academic researchers, to share and discuss code on open-source platforms like GitHub. As practitioners apply the APIs across different fields, the platform extends this meaning to the broader social web, serving as one strategy of platformization and contributing to digital platforms’ programmability and ongoing infrastructuralization.
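As a hypothetical illustration of how an API’s recommended technical environment structures third-party data practice (a sketch, not drawn from the study itself), the Python snippet below queries the X (Twitter) API v2 recent-search endpoint; the bearer token, query, and retry policy are placeholder assumptions, while authentication, pagination tokens, and rate-limit responses are the points where data friction surfaces for the developer.

```python
import time
import requests

# Placeholder credential: tokens are issued only to approved developer accounts,
# itself a form of friction shaping who can practice with the data at all.
BEARER_TOKEN = "YOUR_BEARER_TOKEN"
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def fetch_page(query: str, next_token: str | None = None) -> dict:
    """Fetch one page of recent search results, following the API's own
    pagination and rate-limit rules rather than the developer's preferences."""
    params = {"query": query, "max_results": 100}
    if next_token:
        params["next_token"] = next_token  # pagination cursor dictated by the API
    resp = requests.get(
        SEARCH_URL,
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params=params,
    )
    if resp.status_code == 429:
        # Rate limiting as concrete data friction: the platform decides when
        # data may move, and the developer can only wait for the reset time.
        reset = int(resp.headers.get("x-rate-limit-reset", time.time() + 60))
        time.sleep(max(reset - time.time(), 1))
        return fetch_page(query, next_token)
    resp.raise_for_status()
    return resp.json()
```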
From tech-solutionism to community-centred data capability for disaster preparedness
Anthony McCosker, Yong-Bin Kang, Frances Shaw, Kath Albury
Swinburne University of Technology, Australia
The urgency of enhancing community resilience in the face of escalating disasters necessitates a shift in disaster preparedness strategies. This paper presents a novel approach developed in collaboration with the Australian Red Cross, focusing on community-centred data practices for disaster resilience. Recognising the limitations of traditional digital humanitarianism, which largely relied on crowdsourcing and social platform data, our project shifts the paradigm towards empowering communities with data capabilities. We developed a Community Resource Mapping Pipeline and a prototype platform to map community resources and strengths, emphasising community-led processes and local data capability development to improve local disaster preparedness and response. Our organisational participatory approach involved workshops with stakeholders to co-shape research questions and platform design within a human-centred framework. Our prototype demonstrates the potential of community-led data capability building in enhancing disaster preparedness, underscoring the importance of involving communities in both data collection and decision-making.