Conference Agenda
Session LP01: Long papers

Presentations
10:00am - 10:30am
In bed with AI: the library learnings from working with big tech
University of Oxford, United Kingdom

In March 2025 it was publicly announced that the University of Oxford, along with many other universities, was entering a partnership with OpenAI. At Oxford, a six-month pilot at the Bodleian Libraries was a key part of this institutional partnership. In our talk we will discuss what this partnership aimed to achieve, and reflect on what we learned from our staff, students and researchers as we delivered the project.

In summer 2024, the Bodleian Libraries began discussions for a partnership that would deliver mutual benefit. Plans included some distinctly non-AI activities, such as scaled digitisation and time-and-motion studies using new digitisation equipment. But we also wanted to look at how AI worked with our library use cases, particularly in metadata creation and enhancement and in transcript creation. We also wanted the time and space to benchmark AI projects in the GLAM sector, so we could understand what was already being trialled and where we might focus any future efforts.

Delivery of our pilot project began in February 2025, and following the press release we worked with library staff in particular to understand their concerns about an AI project in general, and one with OpenAI in particular. We used all-staff meetings, FAQs and a new GLAM AI Community of Practice as tools to understand, listen, and share our thinking. Key concerns arose over the trustworthiness of AI-generated information, ethical concerns over copyright infringement, and our status as a trusted knowledge institution staffed by trustworthy information professionals. Other issues included fears over changes in jobs and the environmental impact of AI training and use.
We did not dismiss these concerns, but built them into a new framework for assessing future tech-company partnerships, and also worked with a University-wide centre for AI and Machine Learning to seek new ways to get real and rigorous data on the energy and rare-metals use of AI, as well as of other aspects of our digital infrastructure. As we write this abstract, the pilot is about halfway through, but we have committed to writing up our learnings and sharing them in our institutional repository. We do not yet know our final findings. At Fantastic Futures we hope to share our learnings and to hear from others who have worked with AI companies or are considering doing so.

10:30am - 11:00am
Case study of compar:IA: Creating (international) language data commons while raising awareness
Compar:IA - Ministry of Culture, France

How did we create (international) data commons for multilingual AI, all while raising awareness? This presentation explores the case of compar:IA, a public platform launched by the French government in late 2024. At its core, compar:IA is a simple tool: users ask a question, two AI models respond blindly, and the user votes for the better answer. After voting, the user discovers which models they compared, sees how others voted, and accesses information about each model's characteristics and estimated environmental impact. While the tool is simple in form, its impacts have been significant, both in generating high-quality datasets for underrepresented languages and in raising public awareness of AI's diversity, limitations, and consequences. This presentation will walk through how we built compar:IA, what we've learned from user data, and how we are now replicating the model in other countries.

1. Starting point: the need to create French-language data commons

One of the main motivations for launching compar:IA was the lack of high-quality evaluation data for conversational AI in French. To illustrate the problem: French represented just 0.16% of the training data in models like LLaMA 2 (Touvron et al., 2023). The underrepresentation of non-English languages introduces major bias and performance issues for users who do not interact with AI in English. Our hypothesis was that a participatory evaluation tool, a "chatbot arena", could serve as a low-barrier way to collect useful, real-world language data while also being engaging to the public. That is what compar:IA set out to do. By creating a space where users compare models without knowing which is which, we not only gathered millions of tokens of French-language prompts and answers, but also captured user preferences, the key data needed to improve model alignment.
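A blind vote of this kind maps directly onto the (prompt, chosen, rejected) triples that direct preference optimization (DPO) style fine-tuning consumes. A minimal sketch of that mapping, assuming a simplified record schema (the field and function names here are illustrative, not the published dataset's actual columns):

```python
from dataclasses import dataclass

@dataclass
class BlindVote:
    """One anonymised blind-comparison interaction (illustrative schema)."""
    prompt: str    # the user's question
    answer_a: str  # response from the first hidden model
    answer_b: str  # response from the second hidden model
    winner: str    # "a" or "b": the user's blind preference

def to_preference_pair(vote: BlindVote) -> dict:
    """Turn a blind vote into the (prompt, chosen, rejected) triple
    used for preference-based alignment such as DPO."""
    if vote.winner == "a":
        chosen, rejected = vote.answer_a, vote.answer_b
    else:
        chosen, rejected = vote.answer_b, vote.answer_a
    return {"prompt": vote.prompt, "chosen": chosen, "rejected": rejected}

# Example: the user preferred the second (hidden) model's answer
vote = BlindVote(
    prompt="Quels sont les droits d'auteur sur une photographie ?",
    answer_a="Réponse du modèle A...",
    answer_b="Réponse du modèle B...",
    winner="b",
)
pair = to_preference_pair(vote)
```

Because users vote before the model identities are revealed, pairs built this way carry real-world human preferences without brand bias, which is what makes them useful for alignment.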
To date, more than 170,000 conversations, 60,000 votes, and 40,000 reactions have been collected and published in open datasets on Hugging Face. Developers are already using the data to fine-tune models, run evaluations, and explore direct preference optimization in French. Researchers in digital humanities and the social sciences have also expressed interest in studying how people actually use AI in practice: what questions they ask, how they phrase them, how expectations vary across topics. Proposals for visualizing and analyzing the dataset have already been submitted to us: for example, Bunka.ai designed a topographical map to illustrate AI usage patterns based on the compar:IA dataset.

2. Side quest that became central: raising awareness

While our initial goal was data generation, we quickly realized that compar:IA's most immediate impact was public awareness. Many users told us that it was the first time they had ever used a conversational AI model other than ChatGPT, and that this experience, coupled with the environmental impact statistics, made a lasting impression. The act of comparing reveals a lot, often more than reading articles or watching hype unfold in the news. Compar:IA has thus become a tool for raising awareness of four fundamental issues: model pluralism, cultural bias, environmental impact, and the sharing of open-source data for model alignment. Since its launch in October 2024, the platform has been visited more than 180,000 times. Over 250,000 questions have been submitted and over 100,000 blind votes cast. These numbers reflect not only engagement, but curiosity and trust from a wide audience.

Model plurality becomes immediately visible. Most people use only one model, typically ChatGPT, which alone captures 66% of the conversational AI usage share in France. Compar:IA reminds users that there is no "one AI." There are many, and they behave differently.
This echoes broader ideas of media, algorithmic, and now model plurality: exposing users to different perspectives, even in automated form.

Linguistic and cultural biases also become more obvious. When people ask about movies, history, or politics, and both models suggest only US-based answers, the point about training data bias becomes clearer. This is especially relevant in a context where only 14% of French respondents say they are concerned about AI-driven bias or discrimination (Ipsos & CESI, 2024). The comparison format makes this issue tangible in a way that is immediate and relatable.

Environmental impact is brought into the conversation as well. When the anonymous models are revealed, an energy estimate based on the EcoLogits methodology (created by the GenAI Impact association) is shown. Even if not entirely precise, owing to gaps in provider transparency, it gives users a sense of the relative cost of different models and prompts new questions about how we use AI, at what scale, and for what purposes. This is particularly important in a country where only 19% of the population expresses concern about AI's climate impact (Ipsos & CESI, 2024).

Beyond the platform itself, we've invested in outreach to extend this awareness-building. Compar:IA has been integrated into many classroom activities, university seminars, and professional education courses since day one. This was partly because educators found the tool themselves, and partly because our team proactively reached out to teacher communities to promote it. We also co-designed a workshop format called "Le Duel des IA" ("The AI Duel"), which is increasingly used by educators to explore the environmental impact of models, and soon bias and digital sovereignty, with groups of all ages. Compar:IA has received media coverage from national outlets including France Info, France Inter, BFM, HuffPost, ActuIA, and Frandroid.
It was also featured in podcasts such as Le Comptoir de l'IA and IA Frugale, and mentioned over 250 times on platforms like LinkedIn and X. In February 2025, we organized "compar:IA Day" at the Bibliothèque nationale de France, with over 300 participants attending talks, panels, and workshops from our partners. The event confirmed the public's interest in critical and hands-on approaches to AI. Perhaps most importantly, people simply enjoy the platform. It is simple, surprising, and fun, which means it spreads organically. A significant portion of our visits come from social sharing and personal recommendations.

3. International replication: same recipe, different kitchens

Given the success of compar:IA in France, we are now working to replicate the model in other countries and languages. The need is clear. Most languages lack public benchmarks for conversational AI, and most users have never had the chance to evaluate models in their own language. Compar:IA offers a lightweight and participatory way to do both. Our international expansion is structured around local partnerships. We are currently onboarding institutions in several countries that want to adapt the platform to their language, choose relevant models for comparison, and organize local engagement and communication efforts. We handle the technical side: hosting, configuration, API integration, and environmental impact estimates. Our partners help localize content and bring the tool to life in their context. We are learning a lot through this process, and the momentum is real. There is strong interest from countries where AI performance in the local language is poor and where institutions are actively seeking ways to explore and improve it. We also hope that the more datasets we gather across languages, the more leverage we have to encourage model developers to support multilingualism.

Conclusion

compar:IA is a small tool that plays a big role.
It creates open datasets, reveals model differences, and builds public understanding, all in one user flow. This combination of data commons and public awareness-raising is especially relevant in the context of GLAM institutions and cultural research. As libraries, archives, and museums face the challenge of "AI Everywhere, All at Once," compar:IA offers a model for what public AI infrastructure might look like. It is not about building AI from scratch, but about shaping how AI is evaluated, understood, and improved, with the public, not just for them. We believe this work is relevant to many other countries and contexts. We welcome new partners, researchers, and institutions interested in adapting or reusing the platform, the datasets, or the workshop formats. And we look forward to sharing what we have learned, and learning from others, at Fantastic Futures 2025.