
Exploring human T cells functional diversity using single-cell RNAseq: methodological and biological strategies

Abstract : Single-cell RNA sequencing (scRNAseq) is a fairly young technique that takes a snapshot of the transcriptome of individual cells. After a slow start, its use has become widespread. The richness of the data enables a fine dissection of a living organism’s biology: it gives access to an unprecedented amount of information to better understand cell heterogeneity, or to quantify the differences between physiological and pathological states. However, the single-cell approach is a double-edged sword. Despite its democratisation, the wet-lab part can still be improved, as we are currently only able to capture 5 to 20% of the reads. The computational part is also challenging: scRNAseq data is noisy, both because its effective dimension is high and because of the incomplete capture. Because scRNAseq is still a young technique, not enough time has elapsed for analysis standards to emerge. On the contrary, the number of analytical tools is increasing exponentially. There is nonetheless a common pipeline: load the genes × cells count matrix (or its transpose), filter out outlier cells and genes, normalise, and reduce the dimension. From the data projected onto a smaller subspace, the next steps can be clustering, trajectory inference or visualisation. Finally, the different clusters or trajectory nodes are annotated. This last step, where we interpret the data, is critical but unfortunately often biased. In this thesis, I focused on two aspects of the analysis of scRNAseq data: a methodological aspect, and the interpretation step. First, I studied dimensional noise, alternatively called the curse of dimensionality. The curse complicates the analysis: it blurs the differences between close and faraway data points. Since analysing scRNAseq data relies heavily on building neighbor graphs, performance is degraded by the curse, which distorts these graphs. The usual trick is to reduce the dimension.
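The blurring of distances described above can be illustrated with a minimal sketch. It does not use scRNAseq data: it assumes points drawn uniformly in a unit hypercube, and the function name `relative_contrast` is hypothetical, introduced only for this illustration. The relative contrast, (d_max - d_min) / d_min, measures how well the nearest and the farthest neighbors of a query point can be told apart; it tends to shrink as the dimension grows.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances between
    one random query point and n_points random points, all drawn uniformly
    in the unit hypercube of the given dimension (a toy assumption)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Print the contrast for increasing dimensions; smaller values mean
    # that near and far points are harder to distinguish.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(200, dim), 3))
```

With real count matrices the effect depends on the intrinsic dimension of the data rather than on the number of genes, so this toy example only shows the general trend.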
However, the blurring, or concentration, of distances is not the only effect of dimensional noise. An additional effect, the hubness phenomenon, is also detrimental to the analysis, as it distorts nearest-neighbor graphs. While measure concentration cannot be corrected in high-dimensional spaces, hubness can. I quantified the magnitude of the hubness phenomenon in omics data, and the effect of correcting for hubness on the performance of scRNAseq analysis. scRNAseq data is indeed prone to hubness, especially the datasets with a high intrinsic dimension. The performance when analysing the latter would be improved by hubness correction, with the best performance reached in the space with the highest effective dimension. I recognise that this might be perceived as just another tool in the already existing jungle, but I believe that the change of paradigm is really interesting, as we conceptually modified one of the most frequently performed steps of the analysis, dimension reduction. Second, I focused more specifically on T cells, through the prism of regulatory T cells. Those cells have a precise functional definition, while there is no strong consensus on the population’s markers in humans. I hypothesized that there might be a decorrelation between function and phenotype, and I decided to extend my study to all T cells, since the lineage paradigm is also questionable here. I performed a supervised analysis of scRNAseq data in order to better unveil T cells’ functionality. After defining functional modules, I can link each cell to its function(s). First, I assessed the novelty of the approach by comparing it to the unsupervised pipeline. Then, I characterized the functional differences between T cells from healthy and cancerous tissue. We also applied this method to analyse dendritic cells from COVID-19 patients, scoring the functions exerted by dendritic cells.
This strategy can be applied to other immune cells, to other diseases, and even in a physiological setting, so as to functionally map immune cells.
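The hubness phenomenon mentioned in the abstract can be quantified in several ways. The sketch below uses one common measure, the skewness of the k-occurrence distribution (how often each point appears in the k-nearest-neighbor lists of the other points). The abstract does not state which measure the thesis relies on, so this choice is an assumption, as are the synthetic Gaussian data and the function names, which are hypothetical. A strongly positive skewness indicates that a few points ("hubs") appear in many neighbor lists.

```python
import math
import random


def k_occurrence(points, k):
    """Count how often each point appears among the k nearest neighbours
    (Euclidean distance) of every other point."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        dists = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness (third standardised moment) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def gaussian_cloud(n_points, dim, seed=0):
    """Synthetic stand-in for expression data: i.i.d. standard normal coordinates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]


if __name__ == "__main__":
    # Compare hubness in a low- and a high-dimensional synthetic cloud.
    for dim in (2, 100):
        counts = k_occurrence(gaussian_cloud(300, dim), k=5)
        print(dim, round(skewness(counts), 2), max(counts))
```

This brute-force neighbor search is quadratic in the number of points and is only meant for small illustrations; it measures hubness but does not implement any hubness correction.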
Contributor : Elise Amblard
Submitted on : Thursday, August 11, 2022 - 4:38:48 PM
Last modification on : Saturday, August 13, 2022 - 3:37:44 AM


Files produced by the author(s)


  • HAL Id : tel-03750034, version 1



Elise Amblard. Exploring human T cells functional diversity using single-cell RNAseq: methodological and biological strategies. Bioinformatics [q-bio.QM]. Université de Paris, 2021. English. ⟨tel-03750034⟩


