
Tensor decomposition-based unsupervised feature extraction applied to matrix products for multiview data processing

Y-h. Taguchi

Department of Physics, Chuo University, Tokyo, Japan.

PLoS ONE 12(8): e0183933. DOI: 10.1371/journal.pone.0183933

What's typical in Bioinformatics?

Few samples (a few), huge number of variables (= genes, ~$10^4$) → a typical “large p, small n” problem

Difficult to apply usual statistical analyses

e.g. small samples → deep learning ×; “large p, small n” problem → variable selection by sparse modeling (lasso) ×

Approaches specific to bioinformatics are required

Purpose: multiview data analysis

[Figure: a persons × features matrix ($x_{ij}$) and a persons × shoppings matrix ($x_{il}$) that share the persons. Example labels: features A, B, D, M; persons β, δ, μ; shoppings 1, 3, 4.]

matrix → tensor: $x_{ij} \times x_{il} = x_{ijl}$
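The step from two matrices to one tensor can be sketched in a few lines. This is a minimal illustration, assuming NumPy and random toy values; the array names and the sizes (3 persons, 4 features, 3 shoppings) are hypothetical and only mirror the toy labels in the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, not taken from any real data set.
n_persons, n_features, n_shoppings = 3, 4, 3
x_features = rng.normal(size=(n_persons, n_features))    # x_ij
x_shoppings = rng.normal(size=(n_persons, n_shoppings))  # x_il

# x_ijl = x_ij * x_il: for each person i, the outer product of the
# feature row and the shopping row.
x_tensor = np.einsum("ij,il->ijl", x_features, x_shoppings)

print(x_tensor.shape)
```

The shared index i (persons) is kept as one mode, so the result has three modes: persons, features and shoppings.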

Tensor decomposition

$$x_{ijl} = x_{ij} \times x_{il} \simeq \sum_{k_1,k_2,k_3} G(k_1,k_2,k_3)\, x_{ik_1} x_{jk_2} x_{lk_3}$$

i: persons, j: features, l: shoppings
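A decomposition of this form, with a core tensor G and one factor matrix per mode, can be sketched as follows. The slide does not name the algorithm, so this sketch assumes a Tucker-type decomposition computed by higher-order SVD (HOSVD) with NumPy; the helper names `unfold` and `hosvd` and the toy sizes are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` into a matrix whose rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor):
    """Full-rank HOSVD of a three-mode tensor.

    Returns the core tensor G and one factor matrix per mode; the columns
    of each factor are the left singular vectors of that mode's unfolding.
    """
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    # Project the tensor onto the factors to obtain G(k1, k2, k3).
    core = np.einsum("ijl,ia,jb,lc->abc", tensor, *factors)
    return core, factors

rng = np.random.default_rng(0)
x_ij = rng.normal(size=(5, 6))   # persons x features (toy values)
x_il = rng.normal(size=(5, 4))   # persons x shoppings (toy values)
x_ijl = np.einsum("ij,il->ijl", x_ij, x_il)

core, (u_i, u_j, u_l) = hosvd(x_ijl)

# Sum over k1, k2, k3 of G(k1,k2,k3) * x_ik1 * x_jk2 * x_lk3.
rebuilt = np.einsum("abc,ia,jb,lc->ijl", core, u_i, u_j, u_l)
print(np.allclose(rebuilt, x_ijl))
```

Here `u_i`, `u_j` and `u_l` play the roles of $x_{ik_1}$, $x_{jk_2}$ and $x_{lk_3}$. Without truncation the reconstruction is exact; keeping only the leading components gives the approximation.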

Demonstration using synthetic data set

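[Figure: layout of the synthetic data set. Panel labels: block sizes 50, 50 and 1000; "+20% noise"; "100% noise"; "No correlations". The two matrices are combined into a 50 × 1000 × 1000 tensor.]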

Tensor decomposition

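[Figure: decomposition results for the synthetic data. Panels: $x_{ik_1}$ for $k_1 = 1, 2, 3$ ($1 \le i \le 50$, persons); $x_{jk_2}$ for $k_2 = 1, 2$ ($1 \le j \le 1000$, features); $x_{lk_3}$ for $k_3 = 1, 2$ ($1 \le l \le 1000$, shoppings).]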

Advantages as multiview data analysis tools

・No weights required to integrate multiple views

・Completely unsupervised learning (no model building using prior knowledge)

・Smaller computational resources because of linearity

Disadvantages....

・Tends to require more memory. Solution: summing over i, $\sum_i x_{ij} \times x_{il}$, results in a j × l matrix that can be converted back (explanation omitted).

・With no shared features or samples, the result is a four-mode tensor.
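The memory saving mentioned in the first bullet can be illustrated with a small sketch, assuming NumPy and hypothetical toy sizes. It only shows that the sum over i equals a single matrix product, so the three-mode tensor never has to be stored; how the result is converted back is omitted on the slide and is not sketched here.

```python
import numpy as np

rng = np.random.default_rng(0)
x_features = rng.normal(size=(5, 40))    # x_ij: 5 samples, 40 features
x_shoppings = rng.normal(size=(5, 30))   # x_il: 5 samples, 30 shoppings

# Build the full tensor and sum it over the shared index i ...
full_tensor = np.einsum("ij,il->ijl", x_features, x_shoppings)
summed_from_tensor = full_tensor.sum(axis=0)

# ... which equals one matrix product that never materialises the tensor.
summed_direct = x_features.T @ x_shoppings   # shape (40, 30)

print(np.allclose(summed_from_tensor, summed_direct))
print(full_tensor.size, summed_direct.size)
```

The tensor holds i × j × l numbers while the summed matrix holds only j × l, which is where the saving comes from when j and l are in the thousands.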

Feature extraction

Real data are never separated this well

Assume Gaussian

Detect outliers

P-values by χ² distribution:

$$P_i = P\left[ > \sum_k \left( \frac{x_{ik}}{\sigma} \right)^2 \right]$$

Benjamini-Hochberg corrected P < 0.01

[Figure: histogram P(p) of 1 − p, from 0 to 1]
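The outlier detection steps above can be sketched with the Python standard library only. This is a minimal illustration under stated assumptions: σ is estimated as the standard deviation of each selected component (the slide does not say how σ is obtained), the number of selected components is used as the degrees of freedom, and the function names and the toy data with three planted outliers are hypothetical.

```python
import math
import random
import statistics

def chi2_sf(stat, dof):
    """Upper-tail probability P[chi2 > stat] for integer dof >= 1."""
    y = stat / 2.0
    if dof % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    else:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < dof / 2.0:
        if y > 0:
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):   # from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def select_outliers(components, threshold=0.01):
    """components: list of equal-length lists, one per selected k."""
    # Assumption: sigma is the standard deviation of each component.
    sigmas = [statistics.pstdev(col) for col in components]
    n = len(components[0])
    stats = [sum((col[i] / s) ** 2 for col, s in zip(components, sigmas))
             for i in range(n)]
    p_values = [chi2_sf(t, len(components)) for t in stats]
    adjusted = benjamini_hochberg(p_values)
    return [i for i, p in enumerate(adjusted) if p < threshold]

# Toy component: Gaussian values with three planted outliers.
rng = random.Random(0)
component = [rng.gauss(0.0, 1.0) for _ in range(1000)]
planted = [3, 250, 999]
for i in planted:
    component[i] = 8.0

selected = select_outliers([component])
print(selected)
```

Features whose adjusted P-value falls below 0.01 are the ones treated as outliers from the assumed Gaussian, that is, as selected features.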

Applications: multi-omics data

[Figure: mRNA and miRNA expression measured on the same samples (sample1 to sample5), divided into an A group and a B group; annotations: "active", "expression", "interaction"]

$x_{ij} \times x_{il}$; i: 161 samples (8 groups), j: 13,393 mRNAs, l: 755 miRNAs

Selection of $x_{ik_1}$ distinct between symptoms

[Figure: $x_{ik_1}$ for $k_1 = 1, \ldots, 5$, with P-values]

$1 \le k_1 \le 5$ are symptom dependent

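[Table: core tensor entries $G(k_1,k_2,k_3)$ for $1 \le k_1, k_2, k_3 \le 5$, with columns $k_2$, $k_3$, $k_1$, G, ordered from larger G to smaller G; $k_1$: sample, $k_2$: mRNA, $k_3$: miRNA]

Larger G: $1 \le k_2 \le 5$ and $1 \le k_3 \le 2$, giving $x_{jk_2}$ and $x_{lk_3}$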
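Reading off which mRNA and miRNA components go with the selected sample components amounts to ranking core tensor entries by magnitude. The sketch below assumes NumPy, ranks by the absolute value of G (an assumption, since the slide only says "larger G"), and uses a made-up 5 × 5 × 5 core tensor with 0-based indices; the function name `rank_core_entries` is hypothetical.

```python
import numpy as np

def rank_core_entries(core, k1_selected, top=5):
    """List (k1, k2, k3, G) with the largest |G| among the chosen k1."""
    entries = [
        (k1, k2, k3, core[k1, k2, k3])
        for k1 in k1_selected
        for k2 in range(core.shape[1])
        for k3 in range(core.shape[2])
    ]
    entries.sort(key=lambda e: abs(e[3]), reverse=True)
    return entries[:top]

# Made-up core tensor, only for illustration (indices are 0-based).
core = np.zeros((5, 5, 5))
core[0, 1, 0] = -9.0
core[2, 0, 1] = 4.0
core[1, 3, 3] = 0.5

top_entries = rank_core_entries(core, k1_selected=[0, 1, 2], top=2)
for k1, k2, k3, g in top_entries:
    print(k1, k2, k3, g)
```

The $k_2$ and $k_3$ values that appear in the top entries indicate which $x_{jk_2}$ and $x_{lk_3}$ to pass on to the outlier detection step.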

assume Gaussian

Detect outliers

Benjamini-Hochberg corrected P < 0.01

P-values by χ² distribution

7 miRNAs out of 755 miRNAs; 427 mRNAs out of 13,393 mRNAs (biological validations omitted)

Summary

・As a feature selection method for multiview data, I propose applying tensor decomposition to a tensor generated by the product of matrices and then selecting features with BH-corrected P-values < 0.01, computed from a χ² distribution assumed for a mode.

・For the synthetic data set, apparently uncorrelated variables embedded in noise are decomposed into the original orthogonal vectors after the correlated variables are identified.

・For the multi-omics data set, a small fraction (a few %) of intercorrelated and biologically reasonable miRNAs and mRNAs are identified among a huge number of mRNAs and miRNAs.

My presentation at GIW2017: GIW 7 RNA Bioinformatics, 2nd Nov., morning (ca. 10 AM), at Adonis (1F)

Tensor decomposition-based unsupervised feature extraction identified the universal nature of sequence-nonspecific off-target regulation of mRNA mediated by microRNA transfection
