Loss-function learning for cell type mapping of spatial transcriptomics using single-cell RNA-seq data
Abstract
Breast cancers are complex cellular ecosystems consisting of multiple cell types. Heterotypic interactions between these cell types and the cell types' unique gene expression profiles play central roles in cancer progression and response to therapy. However, our understanding of the spatial organization and cellular composition of these tumors remains limited. Recent models for estimating cellular compositions require labeled training data, which do not necessarily cover the complete biological diversity. In addition, the data under investigation can contain cellular compositions and/or cell types that differ from those in the data used to train the deconvolution model. Most deconvolution models attribute the contributions of cell types not covered by the reference profiles to other cell types, which makes the results biased, noisy, and less trustworthy. Immune cells are known to adjust their molecular phenotype to the tumor microenvironment, which makes it harder for deconvolution models that rely on static reference profiles to estimate the cellular composition accurately. In the first part of this thesis, we propose adaptive digital tissue deconvolution (ADTD) to overcome these problems. ADTD is the second model of its kind to estimate the hidden background composition and the first to also take the background's profile into account. In addition, it is the first deconvolution model that also adapts the reference profiles to the tumor microenvironment and can therefore estimate cell type regulation. We used ADTD to investigate differences in regulation between breast cancer subtypes (estrogen receptor-positive, human epidermal growth factor receptor 2-positive, and triple-negative breast cancer). In the second part of the thesis, we analyze single-cell RNA sequencing (scRNA-seq) data and corresponding spatial transcriptomics data from 26 human breast cancers with the digital tissue deconvolution (DTD) model to determine the cellular composition within breast tumors.
Additionally, we investigate the cell type distribution in the spatial transcriptomics data to obtain information about how the different cell types are arranged spatially. The use of scRNA-seq data has revolutionized the field of transcriptomics by providing a high-resolution view of gene expression at the single-cell level. However, spatial transcriptomics analysis is often complicated by the presence of multiple cell types in a given spot. DTD is a promising deconvolution model that can be trained on scRNA-seq data and then transferred to spatial transcriptomics data to estimate the cellular composition in each spot and thereby obtain the distribution of every cell type. Recently, spatial transcriptomics measurements have emerged as a powerful tool for mapping the transcriptome of cells in their tissue context. This provides a valuable resource for understanding cell type-specific gene expression in complex tissues such as breast cancers. In this study, we applied DTD to spatial transcriptomics data to spatially map the cell type distributions. The state-of-the-art model Cell2location is used as a benchmark. Our results show that DTD is a powerful and easy-to-use tool for determining the cellular composition of breast cancers. We also find that the results of DTD are better justified than those of Cell2location. In the first part of the thesis, we demonstrate that ADTD achieves high deconvolution performance and estimates the cellular composition of the hidden background as well as its profile under optimal conditions. Additionally, ADTD performs well even under difficult conditions. We also recovered known gene regulations for breast cancer subtypes that match the literature. In the second part of the thesis, we demonstrate the usability and performance of DTD in combination with scRNA-seq and spatial transcriptomics data for spatial mapping and for understanding the cellular heterogeneity of breast cancers.
Description
Postponed access: the file will be accessible after 2028-06-01