
Two useful bioinformatics applications for cancer data analysis: a single-cell dataset comparison tool; a survival analysis and patient risk prediction tool.

Óscar González-Velasco and Alberto Berral-González

Centro de Investigación del Cáncer

Date: 23/06/2022
Time: 12:30
Online
Host: Javier De Las Rivas

In this seminar we will present two bioinformatics algorithms developed in R: one for the analysis of single-cell data and another for the analysis of gene-associated survival data in patients. Both are applied to the field of cancer computational genomics.

– The first one, called ClusterFoldSimilarity, is an R package that calculates a measure of similarity between cell clusters from different single-cell datasets without the need to correct for batch effects or to normalize and merge the data, thus avoiding the artifacts and loss of information that these techniques can introduce. The similarity metric is based on the average vector modulus and the sign of the product of logarithmic fold-changes. The algorithm compares every pair of cell clusters from any number of datasets (including datasets with different numbers of clusters), which allows direct identification of the cell clusters that the datasets have in common, as well as of the most significant genes that mark each cluster.

– The second algorithm, called ASURI, is also an R package. It provides an integrated set of functions for disease survival analysis and patient risk prediction based on gene signatures. The tool allows: (i) the discovery of robust and reproducible gene lists associated with disease survival, based on gene expression or on other gene-related activity signals; (ii) the discovery of gene markers linked to survival, by identifying significant associations between gene expression (or other gene-related signals) and clinical variables or phenotypic characteristics; and (iii) the construction of robust patient risk predictors based on gene signatures, using univariate and multivariate approaches.
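
The idea behind the similarity metric described for ClusterFoldSimilarity can be illustrated with a small sketch. It is written in Python for brevity and is not the package's code or API. It rests on several assumptions that the announcement does not state: fold-changes are computed as one cluster versus all other cells of the same dataset, a pseudocount of 1 is added before taking logarithms, and each gene contributes the square root of the absolute product of the two fold-changes, carrying the sign of that product. All function names and the toy data are hypothetical.

```python
import math


def log2_fold_changes(expr, labels, cluster, pseudocount=1.0):
    """Per-gene log2 fold-change of `cluster` versus the rest of the dataset.

    expr:   list of cells, each a list of non-negative values (one per gene)
    labels: cluster label of each cell, in the same order as `expr`
    """
    inside = [cell for cell, lab in zip(expr, labels) if lab == cluster]
    outside = [cell for cell, lab in zip(expr, labels) if lab != cluster]
    fold_changes = []
    for g in range(len(expr[0])):
        mean_in = sum(cell[g] for cell in inside) / len(inside)
        mean_out = sum(cell[g] for cell in outside) / len(outside)
        # Assumed pseudocount keeps the ratio finite when a mean is zero.
        fold_changes.append(
            math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        )
    return fold_changes


def fold_similarity(fc_a, fc_b):
    """Average signed magnitude of the per-gene fold-change products.

    Genes that change in the same direction in both clusters add to the
    score; genes that change in opposite directions subtract from it.
    """
    total = 0.0
    for a, b in zip(fc_a, fc_b):
        product = a * b
        total += math.copysign(math.sqrt(abs(product)), product)
    return total / len(fc_a)


def compare_datasets(expr_a, labels_a, expr_b, labels_b):
    """Score every cluster of dataset A against every cluster of dataset B.

    Both datasets must list the same genes in the same order. Fold-changes
    are computed inside each dataset, so the raw values are never merged.
    """
    scores = {}
    for ca in sorted(set(labels_a)):
        fc_a = log2_fold_changes(expr_a, labels_a, ca)
        for cb in sorted(set(labels_b)):
            fc_b = log2_fold_changes(expr_b, labels_b, cb)
            scores[(ca, cb)] = fold_similarity(fc_a, fc_b)
    return scores


# Toy data: three genes, two clusters per dataset, on different scales.
expr_a = [[10, 1, 5], [12, 0, 5], [1, 9, 5], [0, 11, 5]]
labels_a = ["a1", "a1", "a2", "a2"]
expr_b = [[2, 40, 20], [4, 44, 20], [50, 2, 20], [46, 0, 20]]
labels_b = ["b1", "b1", "b2", "b2"]

scores = compare_datasets(expr_a, labels_a, expr_b, labels_b)

# For each cluster of A, keep the best-scoring cluster of B.
best_match = {
    ca: max(sorted(set(labels_b)), key=lambda cb: scores[(ca, cb)])
    for ca in sorted(set(labels_a))
}
print(best_match)
```

Because each fold-change is computed within its own dataset, the difference in overall scale between the two toy datasets does not affect which clusters are paired. That is the property the description above attributes to the approach.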