An Experimental Comparison of Large Language Models for Emotion Recognition in Italian Tweets / Diamantini, C.; Mircoli, A.; Potena, D.; Vagnoni, S. - 3606:(2023). (Paper presented at the 2nd Italian Conference on Big Data and Data Science, ITADATA 2023, 2023).
An Experimental Comparison of Large Language Models for Emotion Recognition in Italian Tweets
Diamantini C.; Mircoli A.; Potena D.; Vagnoni S.
2023
Abstract
In recent years, the advent of Large Language Models (LLMs), task-agnostic models trained on huge amounts of textual data, has given momentum to a wide variety of NLP applications, ranging from chatbots to sentiment classifiers. Many LLMs are now publicly available, each with different features and performance, and selecting the best LLM for a specific task can be challenging. In this work, we focus on the task of emotion recognition in Italian social media content and present an experimental comparison of three of the most popular LLMs: Google's Bidirectional Encoder Representations from Transformers (BERT), OpenAI's Generative Pre-trained Transformer 3 (GPT-3), and GPT-3.5. Model specialization in emotion recognition is achieved through two different approaches, namely fine-tuning and prompt engineering with few-shot task transfer. The experiments are performed on TwIT, a corpus of about 3100 Italian tweets annotated with six emotions. The results show that fine-tuning GPT-3 yields the best performance on the considered dataset, achieving a remarkable F1 score of 0.90.
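
To make the few-shot prompt-engineering approach mentioned in the abstract concrete, the sketch below shows how an Italian tweet could be classified with in-context examples via the OpenAI chat API. This is a minimal illustration, not the authors' actual setup: the model identifier ("gpt-3.5-turbo"), the six emotion labels (assumed here to be the Ekman set, which the abstract does not enumerate), and the example tweets are all assumptions.

```python
# Minimal sketch of few-shot prompt engineering for emotion recognition in
# Italian tweets, assuming the OpenAI chat API. Labels and examples are
# placeholders, not taken from the TwIT corpus.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed label set: the abstract says "six emotions" without listing them.
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

# A handful of labeled Italian tweets used as in-context examples
# (few-shot task transfer); these are invented for illustration.
FEW_SHOT_EXAMPLES = [
    ("Che giornata meravigliosa, sono felicissimo!", "joy"),
    ("Non ci posso credere, è assurdo quello che è successo.", "surprise"),
    ("Sono stanco di queste continue bugie.", "anger"),
]

def classify_emotion(tweet: str) -> str:
    """Ask the model to label an Italian tweet with one of the six emotions."""
    messages = [{
        "role": "system",
        "content": (
            "You are an emotion classifier for Italian tweets. "
            f"Answer with exactly one label from: {', '.join(EMOTIONS)}."
        ),
    }]
    # Interleave each labeled example as a user/assistant exchange.
    for text, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": tweet})

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model identifier
        messages=messages,
        temperature=0,  # deterministic output suits classification
    )
    return response.choices[0].message.content.strip()

print(classify_emotion("Ho paura di quello che potrebbe accadere domani."))
```

The alternative approach compared in the paper, fine-tuning, would instead update the model's weights on the labeled corpus rather than supplying examples at inference time; the abstract reports that this route, applied to GPT-3, gave the best results.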