Duration of the workshop
2.5
Target groups
Analysts and researchers working with text data, e.g. transcripts, news articles, social media posts or reviews
Is the workshop geared at an exclusively German or an international audience?
International
Workshop language
English
Description of the content of the workshop
This workshop introduces the use of Large Language Models (LLMs) for structured information extraction in market research and the social sciences. Participants will implement solutions to natural language processing tasks such as text classification, entity recognition, and sentiment analysis. The session includes hands-on exercises in Python using the library "instructor". Participants will learn about strategies for prompting, few-shot examples and fine-tuning. The approaches taught are compatible with a wide range of open-source and commercial models. Discussion sections of the workshop will cover the methodological and technical possibilities and limitations of LLMs for information extraction.
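To give a flavour of the pattern, here is a minimal sketch of structured extraction with a few-shot prompt. It is an illustration under stated assumptions, not the workshop material: the schema, the example reviews and all function names are hypothetical, and the model response is a hard-coded string so that the snippet runs without an API key. The instructor library builds on Pydantic models for the schema and validation step; this sketch uses only the Python standard library instead.

```python
import json
from dataclasses import dataclass

# Hypothetical label set for this illustration.
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


@dataclass
class ReviewExtraction:
    """Target schema: the fields we want the model to fill in."""
    product: str
    sentiment: str

    def __post_init__(self):
        # Reject outputs that do not match the schema.
        if not isinstance(self.product, str) or not self.product.strip():
            raise ValueError("product must be a non-empty string")
        if self.sentiment not in ALLOWED_SENTIMENTS:
            raise ValueError(f"unexpected sentiment: {self.sentiment!r}")


# Hypothetical few-shot examples: input text paired with the desired output.
FEW_SHOT_EXAMPLES = [
    ("The headphones broke after two days.",
     {"product": "headphones", "sentiment": "negative"}),
    ("Love this kettle, it boils in a minute.",
     {"product": "kettle", "sentiment": "positive"}),
]


def build_prompt(text):
    """Assemble an instruction, the few-shot examples and the new text."""
    lines = ["Extract the product and the sentiment "
             "(positive, neutral or negative) as JSON."]
    for example_text, example_output in FEW_SHOT_EXAMPLES:
        lines.append(f"Text: {example_text}")
        lines.append(f"JSON: {json.dumps(example_output)}")
    lines.append(f"Text: {text}")
    lines.append("JSON:")
    return "\n".join(lines)


def parse_response(raw):
    """Parse the model's JSON answer and validate it against the schema."""
    return ReviewExtraction(**json.loads(raw))


prompt = build_prompt("The vacuum is fine, nothing special.")

# Assumption: stands in for the text an LLM would return for `prompt`.
simulated_response = '{"product": "vacuum", "sentiment": "neutral"}'

result = parse_response(simulated_response)
print(result)
```

The validation step is the important part: an answer outside the allowed label set raises an error instead of silently entering the dataset, which is what makes the extracted data usable for downstream analysis.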
Goals of the workshop
- Get hands-on experience with structured information extraction
- Get an overview of available models, tools and prompting tactics
- Learn about evaluation, efficiency and limitations
- Share experiences and use cases
Necessary prior knowledge of participants
Basic knowledge of Python. R users can use the primer listed under recommended literature to get up to speed quickly. The code examples in the workshop can be followed with minimal coding knowledge; extending them requires a bit more.
Literature that participants need to read prior to participation
A starter guide will be sent before the workshop. It will contain instructions for using Google Colab and installing the required Python packages.
Recommended additional literature
Primer on Python for R users: https://rstudio.github.io/reticulate/articles/python_primer.html
Information about the instructor
Paul Simmering is a data scientist at Q Agentur für Forschung, where he works on social media and review analysis. He has presented research on sentiment analysis at GOR 23 and GOR 24.
Maximum number of participants
20
Will participants need to bring their own devices in order to be able to access the Internet? Will they need to bring anything else to the workshop?
Participants will need to bring a laptop. An OpenAI API key will be provided for use during the workshop. The recommended development environment for beginners is Google Colab, which is free and runs in the browser. A starter guide will be provided. Advanced users are welcome to use an IDE of their choice. They may also use an LLM platform other than OpenAI, provided it is compatible with instructor, such as Anthropic, Cohere, Gemini or local models using Ollama.