An Experimental Comparison of Large Language Models for Emotion Recognition in Italian Tweets / Diamantini, C.; Mircoli, A.; Potena, D.; Vagnoni, S. - 3606:(2023). (Paper presented at the 2nd Italian Conference on Big Data and Data Science, ITADATA 2023, 2023).

An Experimental Comparison of Large Language Models for Emotion Recognition in Italian Tweets

Diamantini, C.; Mircoli, A.; Potena, D.; Vagnoni, S.
2023-01-01

Abstract

In recent years, the advent of Large Language Models (LLMs), task-agnostic models trained on huge amounts of textual data, has given momentum to a wide variety of NLP applications, ranging from chatbots to sentiment classifiers. Many LLMs are now publicly available, each with different features and performance, so selecting the best LLM for a specific task can be challenging. In this work, we focus on emotion recognition in Italian social media content and present an experimental comparison of three of the most popular LLMs: Google's Bidirectional Encoder Representations from Transformers (BERT), OpenAI's Generative Pre-trained Transformer 3 (GPT-3), and GPT-3.5. The models were specialized for emotion recognition through two approaches: fine-tuning and prompt engineering with few-shot task transfer. Experiments were performed on TwIT, a corpus of about 3,100 Italian tweets annotated with six emotions. The results show that fine-tuning GPT-3 yields the best performance on this dataset, achieving a remarkable F1 score of 0.90.
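To make the prompt engineering with few-shot task transfer approach concrete, the sketch below assembles a few-shot prompt for six-way emotion classification of an Italian tweet. It is a minimal illustration, not the authors' actual setup: the emotion labels, demonstration tweets, and prompt wording are all assumptions, since the abstract does not specify them; the resulting string would be sent to a completion-style model such as GPT-3 or GPT-3.5.

```python
# Minimal sketch of few-shot prompt engineering for emotion
# classification. The label set, demonstration tweets, and prompt
# wording are illustrative assumptions, not taken from the paper.

# Hypothetical six-emotion label set (the abstract does not name
# the six emotions annotated in the TwIT corpus).
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "disgust"]

# A few hand-labeled demonstrations (the "few shots").
FEW_SHOT_EXAMPLES = [
    ("Che bella giornata, sono felicissimo!", "joy"),
    ("Mi manca tantissimo, non smetto di piangere.", "sadness"),
    ("Basta! Sono stufo di tutte queste bugie!", "anger"),
]


def build_prompt(tweet: str) -> str:
    """Assemble an instruction, the demonstrations, and the tweet
    to classify into a single completion-style prompt."""
    lines = [
        "Classify the emotion of each Italian tweet as one of: "
        + ", ".join(EMOTIONS) + ".",
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Tweet: {text}", f"Emotion: {label}", ""]
    # The model is expected to complete the final line with a label.
    lines += [f"Tweet: {tweet}", "Emotion:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Ho paura di quello che succederà domani."))
```

For the fine-tuning approach, the same tweet/label pairs would instead be supplied as training data (for example, as prompt/completion records) to the model provider's fine-tuning pipeline rather than embedded in each prompt.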
Files attached to this product:
No files are associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/326492

Citations
  • PubMed Central: not available
  • Scopus: 0
  • Web of Science: not available