Conference Agenda

Overview and details of the sessions of this conference, with abstracts and downloads where available.

Session Overview
Session
B1: Large, Larger, LLMs
Time:
Thursday, 22/Feb/2024:
10:45am - 11:45am

Session Chair: Daniela Wetzelhütter, University of Applied Sciences Upper Austria, Austria
Location: Seminar 3 (Room 1.03/1.04)

Rheinische Fachhochschule Köln, Campus Vogelsanger Straße, Vogelsanger Str. 295, 50825 Cologne, Germany

Presentations

Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles

Patrick Mertes1, Sophie Tschersich2

1Inspirient GmbH, Germany; 2Verian Group, Germany

Relevance & Research Question:

Manual classification of job titles into classification systems like ISCO-08 is a time-consuming and labor-intensive task that often suffers from inconsistencies due to differences in expertise. The naïve approach of using lookup tables, while effective for a limited number of samples, falls short on larger, heterogeneous datasets in which the same job appears under different spellings. To address these challenges, our research seeks to answer the question: can we build a classifier that outperforms the naïve approach and significantly improves the process of job title classification into ISCO-08?

Methods & Data:

After a rigorous exploration of various machine learning approaches, we identified feedforward neural networks as the most effective for our task. Our training dataset comprises a diverse collection of approximately 100,000 distinct German job titles, each meticulously labeled with the corresponding ISCO-08 code by human experts.
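As a rough illustration of this kind of setup, the sketch below trains a feedforward classifier on labeled job titles with scikit-learn. The file name, column names, features, and network architecture are all assumptions made for illustration; the abstract does not specify the authors' implementation.

```python
# A minimal sketch, assuming a hypothetical CSV with columns
# "job_title" and "isco08_code".
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

df = pd.read_csv("job_titles_labeled.csv")  # hypothetical file name
X_train, X_test, y_train, y_test = train_test_split(
    df["job_title"], df["isco08_code"], test_size=0.2, random_state=42
)

# Character n-grams are robust to the spelling variants mentioned above;
# MLPClassifier is the feedforward neural network component.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    MLPClassifier(hidden_layer_sizes=(256,), max_iter=50),
)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```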

Results:

Around 60% of the classifier's predictions come with confidence levels of 90% or higher, and for those predictions accuracy reaches approximately 94%, allowing that portion of the data to be reviewed swiftly. When no low-confidence classifications are discarded, the overall accuracy is approximately 73%. Manual review therefore remains crucial for ensuring correctness, but the classifier significantly speeds up the process. Furthermore, it provides three ISCO-08 recommendations, each with a confidence score, giving human experts a solid reference point for manual classification even in cases of lower confidence.
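The confidence-based triage described above might look like the following sketch. The 0.9 threshold and the three recommendations match the figures in the abstract, while the function itself and its interface are hypothetical.

```python
import numpy as np

def classify_with_triage(clf, titles, threshold=0.9):
    """Auto-accept high-confidence predictions; route the rest to review.

    clf: any fitted classifier exposing predict_proba and classes_,
    e.g. the pipeline from the previous sketch.
    """
    proba = clf.predict_proba(titles)                  # (n_samples, n_classes)
    top3 = np.argsort(proba, axis=1)[:, -3:][:, ::-1]  # best-first top 3
    results = []
    for i in range(len(titles)):
        best = top3[i, 0]
        if proba[i, best] >= threshold:
            # High confidence: accept automatically (still spot-checked).
            results.append(("auto", clf.classes_[best], proba[i, best]))
        else:
            # Low confidence: give the expert three scored recommendations.
            recs = [(clf.classes_[j], proba[i, j]) for j in top3[i]]
            results.append(("review", recs, None))
    return results
```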

Added Value:

Our data science approach offers several valuable benefits. Firstly, it significantly reduces the time investment required, enabling researchers to process larger datasets efficiently. Secondly, it enhances the consistency of job title classification into ISCO-08, mitigating the issues associated with fully manual classification. Lastly, the classifier's accuracy is expected to improve further as it continues to learn from more examples and corrections over time, making it a valuable and ever-evolving tool for labor market research. Initial tests show that the same approach works very well with other national and international economic and social classifications (e.g., ISCED, KldB 2010).



Where do LLMs fit in NLP pipelines?

Paul Simmering, Paavo Huoviala

Q Agentur für Forschung GmbH, Germany

Relevance & Research Question

Large language models (LLMs) can perform classic tasks in natural language processing (NLP), such as text classification, sentiment analysis and named entity recognition. It is tempting to replace whole pipelines with an LLM.

But the flexibility and ease of use of LLMs come at a price: their throughput is low, they require a provider such as OpenAI or one's own GPU cluster, and their operating costs are high.

This study aims to evaluate the practicality of LLMs in NLP pipelines by asking: "What is the optimal placement of LLMs in these pipelines when considering speed, affordability, adaptability, and project management?"

Methods & Data

This study utilizes a mixed-method approach of benchmarks, economic analysis and two case studies. Model performance and speed are assessed on benchmarks. Economic considerations stem from prices for machine learning workloads on cloud platforms. The first case study is on social media monitoring of a patient community. It is centered on an LLM that performs multiple tasks using in-context instructions. The second case is large-scale monitoring of cosmetics trends using a modular pipeline of small models.
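The first case study's pattern, a single LLM handling several tasks via in-context instructions, can be sketched roughly as follows. The model name, prompt, and label set are illustrative assumptions, not the authors' actual setup.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """For the social media post below, return JSON with:
- "sentiment": one of positive / neutral / negative
- "topics": a list of short topic labels
- "entities": any product or brand names mentioned

Post: {post}"""

def annotate(post: str) -> str:
    """One LLM call performs classification, topic tagging, and NER at once."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; any chat model works here
        messages=[{"role": "user", "content": PROMPT.format(post=post)}],
        temperature=0,
    )
    return response.choices[0].message.content

print(annotate("The new inhaler is easier to use, but my pharmacy is always out of stock."))
```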

Results

Small neural networks outperform LLMs by over 100-fold in throughput and cost-efficiency. Yet, without any parameter training, LLMs attain high benchmark accuracy through in-context examples, making them preferable for small-scale projects that lack labeled training data. They also allow labeling schemes to be changed without retraining, which helps at the proof-of-concept stage. Further, they can be used to aid or automate the collection of labeled examples, as sketched below.
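One way to operationalize that last point, given here as an assumed sketch rather than the authors' method: let an LLM produce "silver" labels for unlabeled texts, then train a cheap small model on them.

```python
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

client = OpenAI()

def llm_label(text: str) -> str:
    """Ask the LLM for a single sentiment label; the prompt is illustrative."""
    r = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Label the sentiment of this text as exactly one of "
                       f"positive, neutral, negative:\n{text}",
        }],
        temperature=0,
    )
    return r.choices[0].message.content.strip().lower()

# Toy data; in practice this would be thousands of unlabeled posts.
unlabeled = ["Love the new serum!", "Shipping took forever.", "It's okay, I guess."]
silver_labels = [llm_label(t) for t in unlabeled]

# A small, fast model trained on the LLM's labels then handles large-scale
# monitoring at a fraction of the LLM's cost.
small_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
small_model.fit(unlabeled, silver_labels)
```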

Added Value

LLMs have only recently become available to many organizations and have drawn new practitioners to the field. A first instinct may be to treat LLMs as a universal solution for any language problem.

The aim of this study is to provide social scientists and market researchers with references that help them navigate the tradeoffs of using LLMs versus classic NLP techniques. It combines theory with benchmark results and practical experience.



Sentiment Analysis in the Wild

Orkan Dolay1, Denis Bonnay2

1Bilendi & respondi, France; 2Université Paris Nanterre, France

Relevance & Research Question
The development of digital methods has brought new promise to qualitative and conversational surveys, but it has also raised new challenges on the analysis side. How should we handle large volumes of verbatim responses? AI, and LLMs in particular, seem to come to the rescue. But how reliable are they?
This presentation examines the specific opportunities and challenges for AI in the concrete case of sentiment analysis and compares different AI models.

Methods & Data

We used a dataset of 167K <comment, rating> pairs in English and French, collected from participants who were routinely asked both to rate how much they appreciated a product on a 0-10 scale and to provide a comment explaining their thoughts in connection with the rating.

We then assessed how well different sentiment analysis (SA) models could predict participants' ratings from their comments.

Since SA models typically predict sentiment on a five-point (star) scale, the question is whether those stars match the ratings given by participants.

We compared the Google NLP suite against NLP Town, an open-source multilingual neural network model based on BERT: essentially a multilingual BERT fine-tuned for sentiment analysis on 629K Amazon reviews. We then fine-tuned NLP Town with in-house datasets and external open data such as the Cornell dataset and Wikipedia extracts, before comparing performance with ChatGPT 3.5 and 4. A minimal sketch of the evaluation setup follows.
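The sketch below assumes the public NLP Town checkpoint on Hugging Face and one plausible (not necessarily the authors') binning of 0-10 ratings into stars.

```python
from transformers import pipeline

# Public NLP Town checkpoint: multilingual BERT fine-tuned on product reviews.
sa = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)

def rating_to_stars(rating: float) -> int:
    """Bin a 0-10 appreciation rating into 1-5 stars (one plausible mapping)."""
    return min(5, int(rating // 2) + 1)

comment, rating = "Très déçu, le produit est arrivé cassé.", 2
pred_stars = int(sa(comment)[0]["label"].split()[0])  # labels look like "1 star"
print(f"predicted: {pred_stars} stars, participant: {rating_to_stars(rating)} stars")
```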
Results

Performance varies between 50% and 70%.
It is possible to use neural networks as an alternative to quantitative questioning in qual-at-scale and to develop more conversational surveys.

In this context, qual/quant data is useful to fine-tune open-source models such as NLP Town.

But the risk of generalization failure is very real. It proved very useful to supplement our qual/quant data with extra open data (Wikipedia extracts serving as paradigmatic descriptive verbatims).

However, few-shot learning and latest-generation LLMs such as ChatGPT 4 do change the game by providing an 'effortless' (but still not perfect) solution.

Added Value

Fundamental research:

- assessing the reliability of AI in NLP

- comparing the reliability of different AI models

- identifying challenges and limitations



 