GraphRAG for Research: Connecting Scientific Knowledge through Graphs and Generative AI
Manfred Nölte1, Bert Gollnick2, Jesko Rehberg2, Johanna Rockstroh3, Frank J. Müller3
1State and University Library Bremen, Germany; 2Gollnick Data Solutions GmbH; 3University of Bremen
How can we move beyond keyword-based search to support meaningful research discovery in the era of information abundance? In this interactive workshop, we explore how Retrieval-Augmented Generation (RAG) powered by Large Language Models (LLMs) and combined with semantic graph technologies (GraphRAG) can provide accurate, context-aware, and rights-conscious access to scholarly knowledge. We also explore how staff and researchers can engage with, scrutinize, and collaborate with AI tools embedded in research infrastructures, and we address the challenges of integrating generative AI into workflows involving copyrighted materials, sensitive information, and long-term preservation.
What are RAG and GraphRAG, what motivates their application, and why do they matter for librarians? Retrieval-Augmented Generation (RAG) is a method for adapting an AI system to the data or textual sources of a given context (e.g. a scientific domain, a company, or a project) without having to train or fine-tune the AI system. It reduces the tendency of AI systems to give ‘hallucinated answers’ and provides references to the given data. RAG is widely used, well documented, and under constant development, with GraphRAG as one example. Plain RAG is similarity-based and relies on subsymbolic knowledge, whereas GraphRAG adds symbolic knowledge to the approach. Because the knowledge is explicitly encoded in a graph-based data structure, the answers of a GraphRAG assistant become more explainable and understandable.
The most important role of libraries is to provide access to knowledge, with many other roles supporting this mission: collecting, preserving, organizing, and describing knowledge. Traditionally, libraries have provided access to knowledge primarily through the creation and use of metadata. An exception is the generation of full text through digitization and OCR, which turns libraries into full-text providers.
LLMs and GraphRAG systems now go beyond metadata and operate on the full-text and semantic level, generating responses based on the provided documents and on patterns learned from large corpora. This is a powerful development that libraries should actively engage with. It brings challenges, such as data privacy, copyright issues, reduced critical engagement, and over-reliance on AI. It also brings opportunities, for example new roles for libraries: to critically guide the use of AI, to advise users on responsible practices, and to organize knowledge at a higher semantic level, for instance through knowledge graphs or ontologies. The traditional role of 'describing knowledge' comes into play here. Ontologies and knowledge graphs, as flexible data structures, have the potential to reach the semantic level, and LLMs help to make these technologies more scalable.
Looking ahead to future developments, we outline how these technologies will increasingly rely on ontologies and symbolic representations, paving the way for richer, explainable, and standards-driven AI services. We envision an evolving AI landscape in which contextualized knowledge, semantic relationships, and discipline-specific models play a central role in responsible and effective research discovery.
Beyond technical applications, the workshop addresses critical issues of trust, transparency, and AI literacy in the academic sector. By introducing the concepts of RAG and GraphRAG in a two-hour format, we aim to strengthen participants’ AI literacy. GraphRAG, as a form of source-grounded AI, has the potential to increase trust in the outputs of AI systems. Finally, the workshop is very much about building AI systems: we will demonstrate how AI can be customized to the domains, documents, and datasets owned, provided, or selected by the users themselves, whether in their work, projects, or research.
The preconditions for doing this in practice will be explained during the workshop; they enable library staff to undertake this work both for and with their users.
Overview of the workshop schedule: After a brief introduction to the use of generative AI in academic libraries, participants will experience the full pipeline of building a research assistant system. This includes the automatic detection of scientific publications (e.g., with tools such as OpenAI Deep Research), the semantic structuring of entities extracted from text (authors, topics, institutions, methods, etc.), and the creation of a navigable, queryable knowledge graph that can be explored visually or through natural language questions. By combining RAG with semantic graph technologies (GraphRAG), such systems enable accurate, context-sensitive, and rights-conscious access to scholarly knowledge. A RAG layer further allows for the dynamic generation of user-oriented responses that reflect the provenance and structure of the source material.
Through live demonstrations grounded in real academic use cases, participants will see how institutions such as university libraries or research data centers can locally host and operate these systems within the framework of the EU AI Act, thus reinforcing their role as trusted institutional AI providers. An example from the University of Bremen (Faculty of Pedagogy and Educational Sciences) will illustrate how scientific documents can be semantically linked, enabling users to move seamlessly from specific facts to broader concepts using graph-powered AI.
To clarify the scope of use cases: RAG and GraphRAG make it possible to prompt an AI using hundreds or thousands of documents as a basis, such as scientific papers, protocols, or digitised historical texts. These techniques are field-independent, which makes them applicable to a wide range of texts and datasets.
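To make the retrieval step of RAG more concrete, the following is a minimal sketch, not the workshop's implementation. It assumes a toy corpus with invented document ids and texts, and it uses a simple bag-of-words cosine similarity as a stand-in for the embedding model a real system would use. The function names `retrieve` and `build_prompt` are hypothetical. The sketch shows how the passages most similar to a question are selected and placed in the prompt together with their ids, so that a generated answer can point back to its sources.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: ids and texts are invented for illustration only.
DOCUMENTS = {
    "doc-1": "Knowledge graphs organize scholarly entities such as authors and topics.",
    "doc-2": "Optical character recognition turns digitized pages into full text.",
    "doc-3": "Retrieval augmented generation grounds answers in retrieved documents.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the ids of up to k documents that share vocabulary with the question."""
    query = Counter(tokenize(question))
    scored = [
        (cosine(query, Counter(tokenize(text))), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest score first; ties are broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question, documents, k=2):
    """Assemble an LLM prompt that lists each retrieved passage with its id."""
    hits = retrieve(question, documents, k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How does retrieval augmented generation ground answers?", DOCUMENTS))
```

The prompt built this way would then be passed to an LLM of the institution's choice; that call is deliberately left out, since the abstract does not name a specific model or API.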
Details for Participation, Timetable, and Prerequisites:
Format / Type of session: Formal, instructor-led session (two instructors) including short tutorials, guided group exercises, and a hands-on demonstration (duration: 2 hours). This session is 'on-site only' and has a maximum capacity of 24 attendees.
Timetable:
20 minutes: Introduction
- Introduction to the use of generative AI and possible future roles of academic libraries
5 minutes: Short discussion
80 minutes: Live-Coding and concepts
- Learn how to extract and semantically enrich metadata through NER/NEL techniques (Named Entity Recognition and Named Entity Linking)
- Explore knowledge graphs built with Neo4j that structure scholarly entities and their relationships
- Test a GraphRAG implementation that integrates search, question answering, and knowledge navigation
- Reflect critically on the possibilities and limitations of current technologies and on how the underlying data influence the desired outcomes
15 minutes: Final discussion
- Discuss licensing models (e.g., Open Access vs. licensed media), data mining rights (TDM), and copyright implications in AI systems
- Reflect on the institutional role of libraries and universities in building and hosting trustworthy AI systems
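The live-coding steps in the timetable above (entity linking, a knowledge graph of scholarly entities, and graph-based retrieval) can be illustrated with a minimal sketch. It is not the workshop's implementation: the workshop uses Neo4j, whereas this sketch keeps the graph in a plain in-memory dictionary so that it runs without a database. All triples, node names, and relation names are invented for illustration, and the dictionary lookup in `link_entities` is a simple stand-in for real NER/NEL components.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, invented for illustration only.
TRIPLES = [
    ("Paper A", "WRITTEN_BY", "Author X"),
    ("Paper A", "HAS_TOPIC", "Inclusive Education"),
    ("Paper B", "HAS_TOPIC", "Inclusive Education"),
    ("Paper B", "USES_METHOD", "Interview Study"),
    ("Author X", "AFFILIATED_WITH", "Institution Y"),
]


def build_graph(triples):
    """Index every triple under both of its endpoints so it can be traversed either way."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj, "out"))
        graph[obj].append((relation, subject, "in"))
    return dict(graph)


def link_entities(question, graph):
    """Stand-in for NER/NEL: return graph nodes whose name occurs in the question."""
    lowered = question.lower()
    return sorted(node for node in graph if node.lower() in lowered)


def neighborhood(graph, start_nodes, hops=1):
    """Collect all triples reachable within the given number of hops from the start nodes."""
    facts = set()
    seen = set(start_nodes)
    frontier = set(start_nodes)
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for relation, other, direction in graph.get(node, []):
                # Restore the original subject/object order of the triple.
                fact = (node, relation, other) if direction == "out" else (other, relation, node)
                facts.add(fact)
                if other not in seen:
                    seen.add(other)
                    next_frontier.add(other)
        frontier = next_frontier
    return sorted(facts)


def facts_to_context(facts):
    """Render triples as explicit, human-readable statements for an LLM prompt."""
    return "\n".join(f"({s}) -[{r}]-> ({o})" for s, r, o in facts)


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    entities = link_entities("Which papers deal with inclusive education?", graph)
    print(facts_to_context(neighborhood(graph, entities, hops=2)))
```

Because the retrieved context consists of explicit triples rather than similarity scores, each statement handed to the language model can be traced back to a specific edge in the graph, which is the sense in which GraphRAG answers are described as more explainable.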
Level of experience for attendees: Previous knowledge is not required. The workshop is intended for participants at the introductory level.
Target Audience: Library and information professionals, AI developers in GLAM institutions, digital humanities researchers, data scientists, research data specialists, digital strategy officers in libraries, and staff working on AI strategy development.
Prerequisites: Internet access; Google Account or GitHub Account. Bringing a laptop or other device to the workshop is recommended but not essential.
Keywords: GraphRAG, Generative AI, Semantic Search, Named Entity Recognition
Further Reading:
- Retrieval Augmented Generation: Your 2025 AI Guide (Collabnix, June 2025); https://collabnix.com/retrieval-augmented-generation-rag-complete-guide-to-building-intelligent-ai-systems-in-2025/
- Zhentao Xu, Mark Jerome Cruz, Matthew Guevara, Tie Wang, Manasi Deshpande, Xiaofeng Wang, and Zheng Li. 2024. Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24). Association for Computing Machinery, New York, NY, USA, 2905–2909. https://doi.org/10.1145/3626772.3661370
- Cossette, A., Blumenfeld, Z., & Sanoja, D. (2025). The Developer’s Guide to GraphRAG. Neo4j. Retrieved from https://neo4j.com/books/the-developers-guide-to-graphrag/
- Yi Sun, Wanru Yang, and Yin Liu. 2024. The Application of Constructing Knowledge Graph of Oral Historical Archives Resources Based on LLM-RAG. In Proceedings of the 2024 8th International Conference on Information System and Data Mining (ICISDM '24). Association for Computing Machinery, New York, NY, USA, 142–149. https://doi.org/10.1145/3686397.3686420
- Hunger, M., Bratanic, T., De Jong, N., Senechal, M., & Persistent Team. (2025). Neo4j LLM knowledge graph builder – Extract nodes and relationships from unstructured text. Neo4j Labs. https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/