The class-imbalance problem is the problem of learning a classification rule from data that are skewed in favor of one class. On these datasets traditional learning techniques tend to overlook the less numerous class, at the advantage of the majority class. However, the minority class is often the most interesting one for the task at hand. For this reason, the class-imbalance problem has received increasing attention in the last few years. In the present paper we point the attention of the reader to a learning algorithm for the minimization of the average misclassification risk. In contrast to some popular class-imbalance learning methods, this method has its roots in statistical decision theory. A particular interesting characteristic is that when class distributions are unknown, the method can work by resorting to stochastic gradient algorithm. We study the behavior of this algorithm on imbalanced datasets, demonstrating that this principled approach allows to obtain better classification performances compared to the principal methods proposed in the literature.

Bayes Vector Quantizer for Class-Imbalance Problem / Diamantini, Claudia; Potena, Domenico. - In: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING. - ISSN 1041-4347. - 21:5(2009), pp. 638-651. [10.1109/TKDE.2008.187]

Bayes Vector Quantizer for Class-Imbalance Problem

Diamantini Claudia;Potena Domenico
2009-01-01

Abstract

The class-imbalance problem is the problem of learning a classification rule from data that are skewed in favor of one class. On these datasets traditional learning techniques tend to overlook the less numerous class, at the advantage of the majority class. However, the minority class is often the most interesting one for the task at hand. For this reason, the class-imbalance problem has received increasing attention in the last few years. In the present paper we point the attention of the reader to a learning algorithm for the minimization of the average misclassification risk. In contrast to some popular class-imbalance learning methods, this method has its roots in statistical decision theory. A particular interesting characteristic is that when class distributions are unknown, the method can work by resorting to stochastic gradient algorithm. We study the behavior of this algorithm on imbalanced datasets, demonstrating that this principled approach allows to obtain better classification performances compared to the principal methods proposed in the literature.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11566/53209
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 33
  • ???jsp.display-item.citation.isi??? 17
social impact