Pancreatic ductal adenocarcinoma (PDAC) is among the deadliest cancers, with a five-year overall survival rate of 13%. Its aggressiveness is driven by marked inter- and intra-tumoral heterogeneity, which complicates diagnosis, limits therapeutic efficacy, and hinders biomarker development. A deeper understanding of the molecular processes shaping this heterogeneity is essential for improving patient stratification and identifying clinically relevant targets. This thesis addresses these challenges through an integrated analysis of genomic, transcriptomic, and single-cell data to characterize PDAC heterogeneity and uncover functional alterations across tumor and stromal compartments. The first chapter examines copy number variations (CNVs), which are recognized contributors to genomic instability and tumor evolution. CNVs alter gene dosage, disrupt regulatory architecture, and promote oncogenesis and therapy resistance. A comprehensive literature review outlines their origins, biological impact, and clinical significance in PDAC, covering both somatic and germline events. The review emphasizes how CNVs contribute to tumor diversity, affect prognosis, and may guide patient stratification. It also identifies gaps in current knowledge, including the need for more sensitive detection technologies (particularly single-cell approaches) and for larger cohorts to fully define CNV-driven mechanisms in PDAC. The second chapter benchmarks four computational tools (sciCNV, InferCNV, CopyKAT, and SCEVAN) for identifying tumor cells from single-cell RNA sequencing (scRNA-seq) data based on CNV inference. Across scRNA-seq datasets from PDAC tumors, adjacent tissue, and healthy pancreas, performance varied substantially: InferCNV showed the highest sensitivity (0.72) and SCEVAN the highest specificity (0.75). However, overlap between tools was limited (<30%), and false positives were frequent.
These results show that CNV-based tumor cell calling is unreliable on its own and must be complemented with known PDAC biomarkers. The findings also highlight the need for more robust computational strategies for tumor cell identification. The third chapter addresses the lack of a centralized, clinically informative repository of PDAC transcriptomic signatures. A systematic review of 399 publications identified 732 single-gene and multi-gene signatures linked to tumor progression, immune modulation, and therapy response. These signatures were integrated into PanSCOPE, a searchable database that enables exploration of gene- or signature-level information, associated clinical parameters, and compatibility with bulk or single-cell transcriptomic datasets. PanSCOPE supports biomarker discovery, patient stratification, and the study of tumor subpopulations that may influence disease trajectory or treatment outcomes. The fourth chapter uses scRNA-seq to characterize single-nucleotide variants (SNVs) in six PDAC patients, linking mutations to specific cellular contexts. Approximately 4,000 SNVs were detected, including 77 tumor-enriched and 114 non-tumor-enriched variants. Functional prioritization using multiple predictive tools identified variants potentially affecting protein structure, RNA splicing, or post-transcriptional regulation. The results reveal cell-type-specific mutational patterns, subclonal diversity, and microenvironment-associated adaptations. Overall, this thesis provides a multi-layered framework for understanding PDAC heterogeneity, integrating CNV profiling, computational benchmarking, curated transcriptomic resources, and single-cell mutational analysis. The methods and tools developed, particularly PanSCOPE, offer valuable resources for biomarker discovery, patient stratification, and precision oncology.
Pancreatic ductal adenocarcinoma (PDAC) is among the most lethal tumors, with a five-year overall survival rate of 13%. Its aggressiveness stems from marked inter- and intra-tumoral heterogeneity, which complicates diagnosis, limits therapeutic efficacy, and hinders the development of reliable biomarkers. A deeper understanding of the molecular mechanisms driving this heterogeneity is essential for improving patient stratification and identifying prognostic and therapeutic targets. This thesis addresses these challenges through an integrated analysis of genomic, transcriptomic, and single-cell data to characterize PDAC heterogeneity and identify functional alterations in the tumor and stromal compartments. The first chapter analyzes copy number variations (CNVs), recognized as important determinants of genomic instability and tumor evolution. CNVs affect gene dosage, alter regulatory architecture, and promote oncogenesis and therapy resistance. A systematic literature review describes their origin, biological impact, and clinical relevance in PDAC, highlighting both somatic and germline events. The analysis shows how CNVs contribute to tumor heterogeneity and carry prognostic value, underlining the need for more sensitive detection techniques, particularly single-cell approaches, and for studies in larger cohorts. The second chapter evaluates four computational tools (sciCNV, InferCNV, CopyKAT, and SCEVAN) for identifying tumor cells from scRNA-seq data through CNV inference. Using datasets from PDAC tumors, adjacent tissue, and healthy pancreas, performance varied strongly: InferCNV showed the highest sensitivity (0.72) and SCEVAN the highest specificity (0.75). However, agreement between tools was limited (<30%), and false positives were frequent.
This shows that tumor cell identification based solely on CNVs is not reliable and requires confirmation with known PDAC biomarkers. The third chapter addresses the lack of a centralized archive of clinically relevant PDAC transcriptomic signatures. A systematic review of 399 publications identified 732 gene signatures, either single-gene or multi-gene, associated with progression, immunomodulation, and therapy response. These signatures were integrated into PanSCOPE, a searchable database for exploring signatures, associated clinical parameters, and comparisons with bulk or single-cell datasets. PanSCOPE supports biomarker discovery, patient stratification, and the identification of relevant tumor subpopulations. The fourth chapter uses scRNA-seq data to characterize single-nucleotide variants (SNVs) in six patients with PDAC, linking mutations to their specific cellular contexts. Approximately 4,000 SNVs were identified, including 77 enriched in the tumor compartment and 114 enriched in the non-tumor compartment. Functional prioritization with several predictive tools highlighted variants potentially capable of affecting protein structure, RNA splicing, or post-transcriptional regulation. The results reveal cell-type-specific patterns, subclonal diversity, and microenvironment-associated adaptations. Overall, this thesis provides an integrated approach to studying PDAC heterogeneity, combining CNV analysis, comparison of computational tools, curated transcriptomic resources, and single-cell mutation analysis. The methods and resources developed, particularly PanSCOPE, provide useful support for biomarker identification, patient stratification, and the development of precision oncology strategies.
SINGLE-CELL RNA ANALYSIS APPLIED TO PANCREATIC CANCER ENABLES THE IDENTIFICATION OF CELL POPULATIONS, MUTATIONS AND CELL-SPECIFIC TRANSCRIPTOMIC SIGNATURES / Oketch, Daisy Judith Akinyi. - (2026 Mar 24).
SINGLE-CELL RNA ANALYSIS APPLIED TO PANCREATIC CANCER ENABLES THE IDENTIFICATION OF CELL POPULATIONS, MUTATIONS AND CELL-SPECIFIC TRANSCRIPTOMIC SIGNATURES
OKETCH, DAISY JUDITH AKINYI
2026-03-24
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


