Understanding the underlying structure of medical data is essential for developing robust and reliable classification models. Supervised learning, which relies on predefined classes, may fail to capture the intrinsic patterns within the data, potentially leading to suboptimal outcomes. This study investigates the application of unsupervised clustering to analyze and validate the structure of a public medical dataset, the Breast Tissue Dataset, with varying class configurations (6 vs. 4 classes). Clustering methods, such as KMeans and Affinity Propagation models, were applied alongside classification models, including Random Forest and XGBoost. Key performance metrics, such as accuracy and confusion matrices, were employed to evaluate classification performance, while clustering results were assessed using the Adjusted Rand Index (ARI) and the Hopkins Test, which evaluates the clustering tendency of datasets. Additionally, the robustness of clustering to measurement uncertainty was examined by introducing synthetic noise (5 % and 10 % perturbations) into the input data, simulating real-world variability. The study further explores how clustering can reveal insights into class labels and assess the separability of different groups. Results demonstrate the utility of combining unsupervised clustering with supervised methods to enhance data exploration, assess the reliability of predefined labels, and improve classification in medical applications, even in the presence of measurement uncertainty.
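The noise-robustness check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic blobs standing in for the Breast Tissue Dataset, the perturbation is assumed to be a uniform relative error of at most 5 % or 10 % per value, and the helper names `perturb` and `ari_under_noise` are hypothetical. Only KMeans and the Adjusted Rand Index are taken from the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def perturb(X, level, rng):
    """Return X with a uniform relative perturbation of at most `level` per value.

    Assumption: the abstract does not specify the noise model; a
    multiplicative uniform error is used here purely for illustration.
    """
    return X * (1.0 + rng.uniform(-level, level, size=X.shape))


def ari_under_noise(X, y, n_clusters, levels=(0.0, 0.05, 0.10), seed=0):
    """Cluster perturbed copies of X with KMeans and score each against labels y."""
    rng = np.random.default_rng(seed)
    scores = {}
    for level in levels:
        X_noisy = perturb(X, level, rng)
        # Standardize so that no single feature dominates the Euclidean distance.
        X_scaled = StandardScaler().fit_transform(X_noisy)
        labels = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(X_scaled)
        # ARI compares the cluster assignment with the predefined classes;
        # it is invariant to how the clusters are numbered.
        scores[level] = adjusted_rand_score(y, labels)
    return scores


# Synthetic stand-in data (sample and feature counts are arbitrary).
X, y = make_blobs(n_samples=120, n_features=5, centers=4, random_state=0)
scores = ari_under_noise(X, y, n_clusters=4)
for level, ari in scores.items():
    print(f"noise level {level:.0%}: ARI = {ari:.3f}")
```

Comparing the ARI across noise levels gives a rough indication of how stable the agreement between clusters and predefined labels is under measurement uncertainty; the same loop could be run with 6 and 4 clusters to compare class configurations.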

Bridging Supervised and Unsupervised Learning for Classification of Breast Tissue / Negri, Virginia; Iadarola, Grazia; Mingotti, Alessandro; Tinarelli, Roberto; Peretto, Lorenzo. - (2025), pp. 1-6. (20th IEEE International Symposium on Medical Measurements and Applications, MeMeA 2025 grc 2025) [10.1109/memea65319.2025.11067980].

Bridging Supervised and Unsupervised Learning for Classification of Breast Tissue

Iadarola, Grazia (second author; contribution: Conceptualization); 2025-01-01

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/349734