Sentiment analysis (SA), also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotional tone behind a piece of text. It involves analyzing the text to identify whether it expresses a positive, negative, or neutral sentiment. SA can be applied to various types of text data, such as social media posts, customer reviews, and news articles. This experiment is based on the Internet Movie Database (IMDB) dataset, which comprises movie reviews and their associated positive or negative labels. The objective of our experiment is to identify the model with the best accuracy and the greatest generality. Text preprocessing is the first and most critical phase in an NLP system, since it significantly impacts the overall accuracy of the classification algorithms. The experiment implements unsupervised sentiment classification algorithms, including Valence Aware Dictionary and sEntiment Reasoner (VADER) and TextBlob. We also examine supervised sentiment classification methods, namely Naïve Bayes (Bernoulli NB and Multinomial NB). The Term Frequency-Inverse Document Frequency (TFIDF) model is used for feature selection and extraction. According to our experimental results, the combination of Multinomial NB and TFIDF achieves the highest accuracy, 87.63%, in both classification reports.
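The combination of TFIDF features and Multinomial NB named in the abstract can be illustrated with a minimal from-scratch sketch. It is not the authors' implementation: the toy reviews in `TRAIN`, the whitespace tokenizer, the smoothed IDF formula, and the Laplace smoothing value `alpha=1.0` are all assumptions chosen for illustration, and the real experiment uses the IMDB dataset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy reviews for illustration only (not IMDB data).
TRAIN = [
    ("a wonderful film with great acting", "positive"),
    ("great story and wonderful cast", "positive"),
    ("a boring film with terrible acting", "negative"),
    ("terrible story and a boring plot", "negative"),
]


def tokenize(text):
    """Very simple preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def train(samples, alpha=1.0):
    """Fit a Multinomial NB model on TF-IDF weights (alpha = Laplace smoothing)."""
    docs = [tokenize(text) for text, _ in samples]
    labels = [label for _, label in samples]
    idf = fit_idf(docs)

    # Sum the TF-IDF weight of each term within each class.
    weights = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weights[label][term] += w

    class_counts = Counter(labels)
    log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}

    # Smoothed log-likelihood of each term given each class.
    log_lik = {}
    for c in class_counts:
        total = sum(weights[c].values()) + alpha * len(idf)
        log_lik[c] = {t: math.log((weights[c][t] + alpha) / total) for t in idf}
    return idf, log_prior, log_lik


def predict(model, text):
    """Return the class with the highest posterior log-score."""
    idf, log_prior, log_lik = model
    features = tfidf(tokenize(text), idf)
    scores = {
        c: log_prior[c] + sum(w * log_lik[c][t] for t, w in features.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict(model, "wonderful acting and great cast"))
print(predict(model, "boring terrible plot"))
```

In practice a library implementation would normally be used instead of hand-written code; the sketch only makes the scoring rule explicit, where each class score is the log prior plus the TFIDF-weighted sum of smoothed term log-likelihoods.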

Multinomial Naive Bayes Classifier for Sentiment Analysis of Internet Movie Database / Dewi, Christine; Chen, Rung-Ching; Juli Christanto, Henoch; Cauteruccio, Francesco. - In: VIETNAM JOURNAL OF COMPUTER SCIENCE. - ISSN 2196-8888. - ELECTRONIC. - 10:4(2023), pp. 485-498. [10.1142/S2196888823500100]

Multinomial Naive Bayes Classifier for Sentiment Analysis of Internet Movie Database

Francesco Cauteruccio
2023-01-01

Bernoulli NB; multinomial NB; Naïve Bayes; Sentiment analysis; sentiment classifications; TFIDF
Files in this item:
Dewi_Multinomial-naïve-bayes-classifier_2023.pdf (Adobe PDF, 820.7 kB)
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/319811