One-Class Differential Expression Analysis Using Tensor Decomposition-Based Unsupervised Feature Extraction Applied to Integrated Analysis of Multiple Omics Data from 26 Lung Adenocarcinoma Cell Lines
Y-h. Taguchi
Department of Physics, Chuo University, Tokyo, Japan
Purpose:
Characterizing cancer cell lines (CCLs) themselves, without comparison to any reference.
Reasons:
1. CCLs often differ from the tumors from which they were generated, although CCLs are often used as representatives of tumors.
2. Most studies using CCLs focus on comparing treated and control CCLs, not on the CCLs themselves.
Methods:
Collecting 26 non-small cell lung (NSCL) CCLs from DBTSS (*) and identifying genes commonly expressed among the 26 NSCL CCLs.
But what is the definition of “commonly expressed” if there is no reference?
(*) dbtss.hgc.jp: DataBase of Transcription Start Sites. It now includes more data sets, including histone modification, promoter methylation, RNA-seq, long-read, and single-cell RNA-seq data.
Solutions:
Use of the recently proposed tensor decomposition (TD) based unsupervised feature extraction (FE).
Q: What is TD?
A: An extension of matrix factorization to tensors.
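$$x_{ijk} = \sum_{l_1, l_2, l_3} G(l_1, l_2, l_3)\, x_{l_1 i}\, x_{l_2 j}\, x_{l_3 k}$$
[Figure: the tensor $x_{ijk}$ is decomposed into the core tensor $G$ and the singular value matrices $x_{l_1 i}$, $x_{l_2 j}$, and $x_{l_3 k}$.]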
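The decomposition above can be tried on a small random tensor. The following is a minimal sketch, assuming the higher-order singular value decomposition (HOSVD) as the algorithm; the slides do not name one, and the helper names `unfold` and `hosvd` are hypothetical. In the slide notation, $x_{l_1 i}$ corresponds to `u_i[i, l1]`.

```python
import numpy as np

def unfold(x, mode):
    """Matricize tensor x so that `mode` becomes the row index."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def hosvd(x):
    """Return the core tensor G and one singular value matrix per mode.

    Assumption: HOSVD is used here; other TD algorithms exist.
    """
    # Columns of each U are the singular value vectors of that mode.
    us = [np.linalg.svd(unfold(x, m), full_matrices=False)[0]
          for m in range(x.ndim)]
    g = x
    for m, u in enumerate(us):
        # Project mode m onto its singular value vectors.
        g = np.moveaxis(np.tensordot(u.T, g, axes=(1, m)), 0, m)
    return g, us

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5, 4))          # toy tensor x_ijk
g, (u_i, u_j, u_k) = hosvd(x)

# x_ijk = sum over l1, l2, l3 of G(l1,l2,l3) x_{l1 i} x_{l2 j} x_{l3 k}
x_rec = np.einsum("abc,ia,jb,kc->ijk", g, u_i, u_j, u_k)
```

Here `x_rec` should reproduce `x` up to floating-point error, which is a quick check that the core tensor and the three matrices fit the formula.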
Overall plan of this study
Proposed methods
Synthetic data sets
Real data sets
Biological validations of selected genes
TD based unsupervised FE applied to a synthetic data set
[Figure: synthetic tensor of genes × cell lines × omics (Omics A, Omics B, Omics C). Expression profiles of three gene groups: commonly expressive, omics-specific expressive, and cell-line-specific expressive.]
Task: identify “commonly expressive” genes
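Results of TD:
[Figure: gene singular value vectors $x_{l_1 i}$ ($i = 1, \ldots, 10^4$ genes) for $l_1 = 1$ and $l_1 = 5$, with the commonly expressive, omics-specific expressive, and cell-line-specific expressive genes marked.]
[Figure: cell line singular value vector $x_{l_2 j}$ ($j = 1, \ldots, 20$ cell lines) for $l_2 = 1$, representing cell-line-independent expression.]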
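A small version of the synthetic experiment can be sketched as follows. All sizes, signal strengths, and group positions are assumptions chosen for illustration (1,000 genes rather than the number on the slide), and the rule for picking $l_2$ and $l_3$ (the singular value vector closest to a constant) is one simple heuristic, not necessarily the author's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_cells, n_omics, n_group = 1000, 20, 3, 50  # assumed sizes

# Gaussian noise plus three planted gene groups (all values are assumptions)
x = rng.normal(size=(n_genes, n_cells, n_omics))
x[0:50] += 5.0               # commonly expressive
x[50:100, :, 0] += 5.0       # omics-specific expressive (first omics only)
x[100:150, :10, :] += 5.0    # cell-line-specific expressive (first 10 lines)

def left_vectors(x, mode):
    """Singular value vectors of one mode (columns), via the unfolded matrix."""
    m = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
    return np.linalg.svd(m, full_matrices=False)[0]

u_gene, u_cell, u_omics = (left_vectors(x, m) for m in range(3))
core = np.einsum("ijk,ia,jb,kc->abc", x, u_gene, u_cell, u_omics,
                 optimize=True)

# Pick the cell line and omics vectors that look most constant,
# then the gene vector most strongly coupled to them through the core.
l2 = int(np.argmax(np.abs(u_cell.sum(axis=0))))
l3 = int(np.argmax(np.abs(u_omics.sum(axis=0))))
l1 = int(np.argmax(np.abs(core[:, l2, l3])))

# Ground truth is known here, so simply take the n_group largest components.
selected = np.argsort(-np.abs(u_gene[:, l1]))[:n_group]
```

With these settings the commonly expressive genes carry the largest components in the chosen gene singular value vector, so `selected` should recover the first planted group. On real data the group size is unknown and a statistical cut-off is needed instead.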
TD based unsupervised FE applied to a real data set
[Figure: three omics, TSS-seq, RNA-seq, and ChIP-seq (H3K27ac); panel labels: coincidence, regulation.]
Target: commonly expressive genes independent of omics and cell lines
26 lung adenocarcinoma cell lines (*)
Chromosomes 1-22, X, and Y: the 3 omics signals are summed within each 25,000 bp interval of each chromosome.
TD is applied to $x_{jki} \in \mathbb{R}^{(3\ \text{omics}) \times (26\ \text{cell lines}) \times (\text{intervals})}$
(*) dbtss.hgc.jp
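The interval summation can be sketched as below. The input format (lists of base-pair positions and signal values for one chromosome) and the function name `bin_signal` are hypothetical; only the 25,000 bp interval length and the 3 × 26 layout come from the slide.

```python
import numpy as np

BIN = 25_000  # interval length in bp, as stated on the slide

def bin_signal(positions, values, chrom_length):
    """Sum `values` observed at 0-based bp `positions` within each BIN-bp interval."""
    n_bins = -(-chrom_length // BIN)  # ceiling division
    idx = np.asarray(positions) // BIN
    return np.bincount(idx, weights=values, minlength=n_bins)

# Toy signal on a hypothetical 100,000 bp chromosome
positions = [10, 24_999, 25_000, 60_000]
values = [1.0, 2.0, 4.0, 8.0]
binned = bin_signal(positions, values, chrom_length=100_000)

# Toy tensor: the same profile reused for every omics and cell line,
# only to show the (omics, cell lines, intervals) layout.
tensor = np.array([[binned for _ in range(26)] for _ in range(3)])
```

In this toy case the first two positions fall into the first interval, so `binned` holds the sums 3, 4, 8, and 0, and `tensor` has shape (3, 26, 4). In a real analysis each omics and cell line pair would contribute its own profile, and the intervals of all chromosomes would be concatenated along the last axis.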
[Figure: the first cell line singular value vector, plotted over the 26 lung adenocarcinoma cell lines.]
[Figure: the core tensor $G(l_1, l_2, l_3)$ links the cell line singular value vectors, the omics singular value vectors (associated with omics-independent expression), and the interval singular value vectors.]
[Figure: interval singular value vectors $x_{l_3 i}$ for $l_3 = 1, 2, 3$.]
Outliers = intervals associated with cell-line-independent expression
2703 Entrez gene IDs are included in the outlier intervals.
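The slides do not say how outliers are called. One possible approach, sketched here purely as an assumption, is to treat the components of an interval singular value vector as Gaussian under the null hypothesis, compute two-sided P-values, and keep the intervals that survive Benjamini-Hochberg correction. The threshold of 0.01 and the names `bh_adjust` and `outliers` are illustrative.

```python
import math
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P-value.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

def outliers(u, alpha=0.01):
    """Indices whose component deviates from a Gaussian null (an assumption)."""
    z = (u - u.mean()) / u.std()
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])  # two-sided
    return np.flatnonzero(bh_adjust(p) < alpha)

# Toy singular value vector: 2,000 intervals, the first 20 shifted strongly
rng = np.random.default_rng(1)
u = rng.normal(size=2000)
u[:20] += 10.0
found = outliers(u)
```

In this toy vector the 20 shifted intervals are the ones expected to be returned. The genes located in the selected intervals would then be looked up in a genome annotation, a step not shown here.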
Biological validations of selected genes
Q1: Are the selected genes NSCLC-specific?
A1: Yes (MSigDB)
[Venn diagram: selected genes vs. NSCLC genes; counts shown: 78, 100, 2625.]
Q2: Are the selected genes lung-specific?
A2: No, but they are glandular-cell-specific (g:Profiler).
[Figure: tissue specificity (lung) and cell type specificity (glandular cells).]
GO term / Reactome enrichment
SRP-dependent cotranslational protein targeting to membrane
nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (NMD)
SRP and NMD are often reported as cancer-causing factors.
For further enrichment analyses, including PPI and TF binding, see the supporting information of the paper.
Conclusions
TD based unsupervised FE was applied to a synthetic data set and a real (multi-omics) data set.
In the synthetic data set, TD based unsupervised FE successfully identified commonly expressed genes.
In the real data set, TD based unsupervised FE successfully identified biologically reasonable genes.
Future directions
Since DBTSS was recently updated (Sep. 2017), it now has more data sets to which TD based unsupervised FE can be applied.
I am looking for someone who can provide more data sets to which TD based unsupervised FE can be applied (e.g., paired multi-omics measurements from in vitro studies).