Conference Agenda

Overview and details of the sessions of this conference.

Please note that all times are shown in the time zone of the conference.

Session Overview
Session: PSG 1 - e-Government
Time: Wednesday, 27/Aug/2025, 8:30am - 10:30am

Session Chair: Prof. C. William WEBSTER, University of Stirling

"AI adoption choices"

Presentations

Towards a Benchmark for LLM-Based Agents in Public-Sector Institutions

Jonathan Rystrøm1, Chris Schmitz2, Karolina Korgul1, Jan Batzner3,4

1Oxford Internet Institute, University of Oxford, UK; 2Centre for Digital Governance, Hertie School, Germany; 3Weizenbaum Institute, Germany; 4Technical University of Munich, Germany

Large Language Model (LLM)-based agents offer significant potential for public sector organizations by streamlining processes, improving processing speed, and increasing consistency and transparency when implemented effectively (Straub et al., 2024). However, current work examining their impact on the public sector is insufficient to guide either research or practical implementation. Empirical analyses of AI adoption lag behind the technological frontier and focus too narrowly on the small group of early-adopting institutions, paying insufficient attention to what is technologically possible; theoretical approaches, meanwhile, lack grounding in actual technological capabilities. Neither adequately addresses the "jagged frontier" of progress: what is theoretically automatable today versus what is not. We posit that this knowledge gap severely inhibits meaningful analysis and forward-looking policy formulation, particularly regarding "downstream" effects of agent integration such as organizational change and the shifting role of human bureaucrats.

We argue that benchmarking, the systematic evaluation of LLM-based agents against sets of tasks (Wang et al., 2024), is a promising avenue of research to remedy these problems. We derive essential criteria for effective public sector agent benchmarking from theories of public management and automation. First, benchmarks must be based on authentic public sector work and reflect the wide variety of subject knowledge, media formats, and administrative discretion this work may entail (Zacka, 2022). Second, they should reflect processes with several interdependent subtasks that feed into each other, as proposed in leading models of automation (Acemoglu and Restrepo, 2018); these tasks should require interaction with complex systems and precise interpretation of regulations. Third, benchmarks must allow for meaningful translation of technical performance metrics into human-compatible metrics (Thomas & Uminsky, 2022). Evaluation must therefore extend beyond simple performance scores to include robustness to environmental changes, cost-effectiveness compared to human baselines, and fairness assessments that identify potential biases.
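A minimal sketch of how these criteria, plus the extended evaluation dimensions, could be recorded as a per-benchmark checklist (the field names below are our illustrative assumptions, not a schema from the paper):

    # Illustrative checklist: the three derived criteria plus the extended
    # evaluation dimensions. Field names are assumptions, not the paper's.
    from dataclasses import dataclass, fields

    @dataclass
    class BenchmarkCriteria:
        authentic_public_sector_work: bool  # 1. grounded in real administrative work
        interdependent_subtasks: bool       # 2. multi-step processes that feed into each other
        human_compatible_metrics: bool      # 3. metrics translate into human terms
        robustness_to_change: bool          # beyond raw scores: environmental shifts
        cost_vs_human_baseline: bool        # cost-effectiveness against human work
        fairness_assessment: bool           # identification of potential biases

        def satisfies_all(self) -> bool:
            """True only when every criterion is met."""
            return all(getattr(self, f.name) for f in fields(self))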

Using these criteria, we evaluate 874 existing agent benchmarks through LLM-assisted distant reading: LLMs analyze whether each benchmark's title and abstract satisfy our specified criteria, providing written justifications followed by binary valid/invalid determinations. Our review reveals significant gaps: a complete lack of realistic public sector-relevant processes, no conceptualization of fairness metrics, very limited measurement beyond simple performance, and almost no translation into human-relevant metrics. These findings highlight the need for benchmarks that enable more direct comparisons with human performance and better assessment of automation potential, and that guide AI development toward solutions more beneficial for actual public sector tasks. Such benchmarks would give researchers and policymakers tools to better understand the current and future impacts of AI in public administration, supporting evidence-based workforce planning and organizational development.
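As a rough illustration of what this screening step could look like (not the authors' code: the model name, prompt wording, and the screen_benchmark helper are all assumptions of ours):

    # Hypothetical sketch of LLM-assisted distant reading: the model writes a
    # justification, then a final VALID/INVALID verdict for each benchmark.
    from dataclasses import dataclass
    from openai import OpenAI

    client = OpenAI()

    CRITERIA = (
        "1. Tasks are grounded in authentic public sector work.\n"
        "2. Processes consist of several interdependent subtasks.\n"
        "3. Metrics translate into human-compatible terms."
    )

    @dataclass
    class Screening:
        justification: str
        valid: bool

    def screen_benchmark(title: str, abstract: str) -> Screening:
        """Written justification first, binary verdict on the last line."""
        prompt = (
            f"Criteria:\n{CRITERIA}\n\n"
            f"Benchmark title: {title}\nAbstract: {abstract}\n\n"
            "Briefly justify whether the benchmark satisfies ALL criteria, "
            "then answer on the final line with exactly VALID or INVALID."
        )
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed model; the abstract does not name one
            messages=[{"role": "user", "content": prompt}],
        )
        text = response.choices[0].message.content.strip()
        *body, verdict = text.splitlines()
        return Screening("\n".join(body).strip(), verdict.strip() == "VALID")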



Investigating municipalities’ legitimacy considerations when deciding to make or buy public sector AI applications

Marissa Hoekstra1,2, Alex Ingrams2

1TNO; 2Leiden University

The debate on the legitimacy of public sector AI applications focuses strongly on judgements of an application's output. Focusing only on output, however, is not sufficient: the development process should also be assessed in terms of legitimacy, as development choices can threaten democratic procedures and thus affect the legitimacy of the resulting AI application. One such choice is who is involved in building the AI application. Organizations can make the application in-house, collaborate with another organization to build it, or buy it from another party. The literature currently lacks a concept that describes this aspect, so this research conceptualizes the choice as "configurations in building public sector AI applications". It is currently unclear whether, and how, public professionals deliberately and strategically weigh the choice to make, collaborate on, or buy public sector AI. The aim of this study is therefore to answer the following research question: which types of legitimacy considerations are taken into account in the decision for a particular configuration in building public sector AI applications? To answer this question, the study conducts a qualitative multiple-case study of eight municipal chatbots for public service delivery.



A clash of logics: Westminster budgeting for public sector AI adoption

Chloe Chadwick1, Nicholas Robinson2, Nathan Davies1

1University of Oxford, United Kingdom; 2Hertie School, Germany

There is growing interest among governments and policymakers in the potential of artificial intelligence (AI) to improve service delivery, productivity, and management (Mergel et al., 2023; Bright et al., 2024). Yet despite the proliferation of Generative AI, commoditisation of foundation models, and the popularisation of open-source options, public sector adoption continues to lag, with most efforts confined to pilot or trial phases (OECD & UNESCO, 2024).

While much academic attention has been paid to how AI could enhance public budgeting (Valle-Cruz et al., 2022), far less has been given to how existing budgeting and public financial management systems may constrain AI adoption, even when adapted for digital projects. This paper argues that current budgeting rules, shaped by legacy infrastructure funding models and New Public Management reforms, are often poorly aligned with the financial demands of AI implementation. Successful adoption requires not only upfront and sustained investment but also strengthened internal capabilities and cross-departmental coordination, all of which challenge traditional public sector budgeting practices.

Drawing on documentary analysis and in-depth interviews with fiscal and technology decision-makers in three Westminster systems – Australia, Canada, and the United Kingdom – this empirical study identifies a deeper institutional contradiction between entrenched public budgeting logics and the iterative, uncertain nature of AI development. We find that these tensions give rise to four archetypal organisational responses, each reflecting distinct attempts to resolve these conflicts.

Building on institutional logics literature, and extending recent work in public financial management, the paper demonstrates that budgeting should not be viewed as a neutral constraint but as a critical institutional lever. Addressing current barriers requires not just alternative funding mechanisms such as tranched investment or innovation funds, but more fundamentally, institutional innovation capable of reconciling competing logics of budgeting and AI implementation.