Sentiment analysis (SA), also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotional tone behind a piece of text. It involves analyzing the text to identify whether it expresses a positive, negative, or neutral sentiment. SA can be applied to many types of text data, such as social media posts, customer reviews, and news articles. This experiment is based on the Internet Movie Database (IMDB) dataset, which comprises movie reviews and their associated positive or negative labels. The objective of our experiment is to identify the model with the best accuracy and the greatest generality. Text preprocessing is the first and most critical phase in an NLP system, since it significantly impacts the overall accuracy of the classification algorithms. The experiment implements unsupervised sentiment classification algorithms, including Valence Aware Dictionary and sEntiment Reasoner (VADER) and TextBlob. We also examine supervised sentiment classification methods, namely Naïve Bayes (Bernoulli NB and Multinomial NB). The Term Frequency-Inverse Document Frequency (TF-IDF) model is used for feature selection and extraction. In our experimental results, the combination of Multinomial NB and TF-IDF achieves the highest accuracy, 87.63%, for both classification reports.
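To make the best-performing combination named in the abstract more concrete, the following is a minimal, standard-library-only sketch of TF-IDF features feeding a Multinomial Naïve Bayes classifier. It is not the authors' implementation: the class name `TfidfMultinomialNB`, the crude `tokenize` helper, the six toy reviews, the smoothed IDF formula, the L2 normalization, and the Laplace smoothing constant `alpha` are all assumptions chosen for illustration, and the toy data has nothing to do with the IMDB results reported above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Crude preprocessing (assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class TfidfMultinomialNB:
    """Illustrative TF-IDF + Multinomial Naive Bayes; hypothetical helper class."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace/Lidstone smoothing constant (assumed value)

    def _tfidf(self, tokens):
        # Term frequency times IDF, then L2-normalize the document vector.
        counts = Counter(t for t in tokens if t in self.idf)
        weights = {t: n * self.idf[t] for t, n in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        return {t: w / norm for t, w in weights.items()}

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(tokenized)
        # Document frequency: number of documents containing each term.
        df = Counter()
        for tokens in tokenized:
            df.update(set(tokens))
        # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        # Accumulate TF-IDF mass per term within each class.
        mass = {c: defaultdict(float) for c in self.classes}
        for tokens, y in zip(tokenized, labels):
            for t, w in self._tfidf(tokens).items():
                mass[y][t] += w
        vocab_size = len(self.idf)
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            self.log_prior[c] = math.log(class_counts[c] / n_docs)
            total = sum(mass[c].values()) + self.alpha * vocab_size
            self.log_likelihood[c] = {
                t: math.log((mass[c][t] + self.alpha) / total) for t in self.idf
            }
        return self

    def predict(self, doc):
        # Score each class by log prior plus weighted log likelihoods; pick the best.
        weights = self._tfidf(tokenize(doc))
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in weights.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch (not drawn from the IMDB dataset).
train_docs = [
    "a wonderful, moving film with great acting",
    "great story and wonderful cast",
    "I loved this brilliant movie",
    "a boring, terrible film with awful acting",
    "awful plot and terrible pacing",
    "I hated this dull movie",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = TfidfMultinomialNB(alpha=1.0).fit(train_docs, train_labels)
print(model.predict("wonderful acting and a great story"))
print(model.predict("terrible, boring and awful"))
```

In practice, a library pipeline would normally replace this hand-rolled class; the sketch only exposes the moving parts: document frequencies, per-class accumulation of TF-IDF mass, smoothing, and an argmax over log scores.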
Multinomial Naive Bayes Classifier for Sentiment Analysis of Internet Movie Database / Dewi, Christine; Chen, Rung-Ching; Juli Christanto, Henoch; Cauteruccio, Francesco. - In: VIETNAM JOURNAL OF COMPUTER SCIENCE. - ISSN 2196-8888. - Electronic. - 10:4(2023), pp. 485-498. [10.1142/S2196888823500100]
Multinomial Naive Bayes Classifier for Sentiment Analysis of Internet Movie Database
Francesco Cauteruccio
2023-01-01
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Dewi_Multinomial-naïve-bayes-classifier_2023.pdf | Open access | Publisher's version (published with the publisher's layout) | Creative Commons | 820.7 kB | Adobe PDF |


