Data Analytics and Mathematical Modeling for Psychiatric Diagnosis in a Big Data Processing Environment

Kazuo Ishii, PhD, Professor of Genomic Sciences

Kazuo Ishii 1*, Shusuke Numata 2, Makoto Kinoshita 2 and Tetsuro Ohmori 2

1 Tokyo University of Agriculture and Technology, Tokyo, Japan
2 University of Tokushima School of Medicine, Tokushima, Japan
* E-mail: [email protected]

Post on 07-Aug-2020


Page 1:

Data Analytics and Mathematical Modeling for Psychiatric Diagnosis in a Big Data Processing Environment

Kazuo Ishii, PhD, Professor of Genomic Sciences

Kazuo Ishii 1*, Shusuke Numata 2, Makoto Kinoshita 2 and Tetsuro Ohmori 2

1 Tokyo University of Agriculture and Technology, Tokyo, Japan
2 University of Tokushima School of Medicine, Tokushima, Japan
* E-mail: [email protected]

Page 2:

Agenda

• Background
• Research Aim and Target
• Research Scheme
• Practical Case Study
• Summary

Page 3:

Era of Genomic Big Data

• Genomic big data production by next-generation sequencing technologies is increasing year after year.

[Figure: Next Generation Sequencers]

Background

Page 4:

Mental Health

• Neuropsychiatric disorders, such as depression and bipolar disorders, are increasing year after year.
• But there is no effective evidence-based diagnosis.
• A new big-data-based diagnosis system is expected to provide revolutionary innovation in mental health.

[Figure: Increasing Number of Mental Illness. Number of patients (x 1,000 persons) in 1996, 1999, 2002, 2005, 2008 and 2011, by category: Depression, Bipolar Disorders, Persistent Mood Disorders, Others. From Japanese Government Documents (2012).]

Page 5:

Research Aim and Target

• Aim: Development of Big Data Mining Method
  Development of optimized algorithms and mathematical modeling methods for genomic big data, from 500,000 to 10,000,000 explanatory variables (biological markers)

• Target (data is provided by Tokushima Univ.)
  Diagnosis system for three major mental disorders: depression, etc.

Page 6:

Overview of Research Process: Mathematical Modeling for Big Data

• Unstructured Data: Hadoop MapReduce, shell scripting, data processing with NoSQL, Monte Carlo simulation
• Structured Data: data processing with RDBMS (MySQL, PostgreSQL)
• Selection of Explanatory Variables: statistical significance tests (Student's t test, Mann-Whitney U test, etc.), sparse modeling
• Discrimination of Data: multivariable analyses (multiple regression, discriminant analysis), Support Vector Machine (SVM), machine learning (SOM etc.), Bayesian filtering, etc.
• Mathematical Modeling: linear regression model, logistic regression model and mixed model, etc.
• Optimization of Models: coefficient of determination, Wilks' lambda, Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC), etc.
• Evaluation of Models: cross validation, including leave-one-out

Research Scheme
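The evaluation stage names leave-one-out cross validation. The following is a minimal Python sketch of that procedure, assuming a hypothetical nearest-mean classifier on a single variable; the slides do not say which classifier is evaluated at this stage, and the function names and toy data are illustrative only.

```python
from statistics import mean


def nearest_mean_predict(train_x, train_y, x):
    """Assign x to the class whose training mean is closest (illustrative classifier)."""
    classes = sorted(set(train_y))
    centers = {c: mean(v for v, y in zip(train_x, train_y) if y == c) for c in classes}
    return min(classes, key=lambda c: abs(x - centers[c]))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_mean_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy methylation ratios (made up for illustration): 0 = control, 1 = patient.
xs = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
ys = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(xs, ys)
```

With small cohorts, leave-one-out uses almost all samples for training in every round, at the cost of one model fit per sample.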

Page 7:

HPC and Cloud (Amazon)

• HPC: very large memory and many-core CPUs (powerful and high performance)
  4 TB memory, 80-core CPU

• Cloud (Amazon): many-core CPUs, but memory is not so large
  244 GB memory, 32-core CPU x n
  More CPU cores are available by using many instances.

Platform should be selected based on its purpose

Research Scheme

Page 8:

Example of Methylation Calling Software

• Bismark: mapping with Bowtie
• PASH: small memory and fast
• BSMAP: mapping with SOAP
• Methylcoder
• BS-Seq: for plants
• Kismeth: for plants, web-based

Research Scheme

Page 9:

HPC: High Performance Computing

Page 10:

HPC: RIKEN "K Computer" Compatible Computer: SCLS supercomputer system

Page 11:

Utilization of Amazon Web Services (AWS)

Provided by Mr. Y. Yoshiara, AWS Japan

Page 12:

Available Open Source Tools on Amazon Web Services

Provided by Mr. Y. Yoshiara, AWS Japan

Page 13:

Platform should be selected based on its purpose

• Data analysis of Methyl-Seq requires extremely large memory
• e.g. Bismark (methylation site calling software) -> 870 GB in one process
  R -> 900 GB in one process
  That is, about 1 TB of memory is required.

The Amazon cloud could not run methylation calling with Bismark.

Research Scheme
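The memory figures quoted in these slides can be turned into a simple feasibility check. The sketch below is illustrative only: the platform keys, job keys and the `fits_in_memory` helper are hypothetical names, and the numbers are the ones quoted in the slides (4 TB for the HPC node, 244 GB per cloud instance, about 870 GB for one Bismark process and 900 GB for one R process).

```python
# Memory available to a single node, in GB, as quoted in the slides.
PLATFORM_MEMORY_GB = {
    "hpc_node": 4 * 1024,    # 4 TB
    "cloud_instance": 244,   # per instance
}

# Peak memory of a single process, in GB, as quoted in the slides.
JOB_MEMORY_GB = {
    "bismark_methylation_calling": 870,
    "r_analysis": 900,
}


def fits_in_memory(job, platform):
    """One process cannot span instances, so compare against a single node's memory."""
    return JOB_MEMORY_GB[job] <= PLATFORM_MEMORY_GB[platform]


feasible = {
    (job, platform): fits_in_memory(job, platform)
    for job in JOB_MEMORY_GB
    for platform in PLATFORM_MEMORY_GB
}
```

Adding more cloud instances raises the total core count but not the memory visible to one process, which is why the check compares per-node memory rather than cluster totals.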

Page 14:

Practical Case Study

In this presentation, we show only the case of the 450K microarray. Results of NGS will be shown elsewhere.

Page 15:

Research Process in This Method: Mathematical Modeling for Big Data

• Structured Data: Illumina 450K DNA Methylation Microarray
• Selection of Explanatory Variables: Mann-Whitney U test and ranking
• Discrimination of Data: Linear Discriminant Analysis (LDA)
• Mathematical Modeling: discriminant function
• Optimization of Models: backward elimination method
• Evaluation of Models: cross validation (training set and validation set)

Page 16:

DNA methylation rate does not show a normal distribution

Both Next Generation Sequencing data and Methylation MicroArray data

The Beta-value for the ith interrogated CpG site is defined as:

[Equation shown as an image on the slide; not recovered]

where y_i,methy and y_i,unmethy are the intensities measured by the ith methylated and unmethylated probes, respectively
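A commonly used definition for Illumina methylation arrays that matches these variable names is Beta_i = max(y_i,methy, 0) / (max(y_i,unmethy, 0) + max(y_i,methy, 0) + alpha), where alpha is a small constant offset (100 is a frequently recommended value). Treat this as an assumption about what the slide showed. A minimal Python sketch under that assumption, with made-up intensities:

```python
def beta_value(methylated, unmethylated, alpha=100.0):
    """Beta-value of one CpG site from its two probe intensities.

    Negative intensities (possible after background subtraction) are clipped
    to 0; alpha is an assumed offset that stabilises low-intensity sites.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + alpha)


# Made-up intensities for illustration.
fully_methylated = beta_value(9900.0, 0.0)     # 9900 / 10000
half_methylated = beta_value(4950.0, 4950.0)   # 4950 / 10000
unmethylated = beta_value(-20.0, 5000.0)       # numerator clipped to 0
```

Because both intensities are clipped at zero and alpha is positive, the result always lies between 0 and 1.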

Page 17:

DNA methylation rate does not show a normal distribution

Both Next Generation Sequencing data and Methylation MicroArray data

• Variances are not equal
• Range: 0 <= Beta <= 1

[Figure: Beta score plotted across sites. Source: Protocol Exchange (2014) doi:10.1038/protex.2014.002]

Page 18:

Ratio Data

• The distribution of errors is not normal (binomial distribution)
• Variances are not equal: σ^2 = npq
• Dependent variables have upper and lower limits (0 <= Beta <= 1)
• Overdispersion is sometimes observed

So significance should be tested with non-parametric tests.
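To make the unequal-variance point concrete: if a methylation ratio is estimated from n binomial trials with success probability p, the count has variance npq (with q = 1 - p) and the ratio has variance pq/n, so the variance changes with the mean. A small Python illustration, where n = 100 is an arbitrary choice:

```python
def binomial_count_variance(n, p):
    """Variance of the number of successes in n Bernoulli(p) trials: n*p*q."""
    return n * p * (1.0 - p)


def ratio_variance(n, p):
    """Variance of the observed ratio (successes / n): p*q / n."""
    return p * (1.0 - p) / n


n = 100  # arbitrary number of trials, for illustration only
variances = {p: binomial_count_variance(n, p) for p in (0.05, 0.5, 0.95)}
```

The variance peaks at p = 0.5 and shrinks toward both limits, which violates the equal-variance assumption behind tests such as Student's t test.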

Page 19:

A Non-Parametric Test is Required: Mann-Whitney U test

[Figure: -Log2(P) for each site, with the selected sites marked; 20 patients and 19 healthy volunteers]

This is an example for one neuropsychiatric disease. 20 patients and 19 healthy volunteers were tested with 500,000 explanatory variables.
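The selection step can be sketched as follows: run a Mann-Whitney U test per site, score each site by -log2(p), and keep the top-ranked sites. This is a minimal standard-library sketch, not the authors' pipeline; it uses the normal approximation without tie or continuity correction, and the site names and beta-values are made up.

```python
import math


def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U for sample a, approximate p-value). No tie or continuity
    correction is applied, so treat the p-value as a rough screening score.
    """
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2.0))


def rank_sites(patients, controls, top=20):
    """Score each site by -log2(p) and return the best-scoring site names."""
    scores = {}
    for site in patients:
        _, p = mann_whitney_u(patients[site], controls[site])
        scores[site] = -math.log2(p)
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Made-up beta-values for two hypothetical sites.
patients = {"site_A": [0.80, 0.82, 0.85, 0.90, 0.88],
            "site_B": [0.40, 0.55, 0.45, 0.60, 0.50]}
controls = {"site_A": [0.20, 0.25, 0.22, 0.30, 0.28],
            "site_B": [0.42, 0.52, 0.47, 0.58, 0.51]}
selected = rank_sites(patients, controls, top=1)
```

With 500,000 sites the same loop applies unchanged; a production analysis would more likely call a tested library routine and account for multiple testing.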

Page 20:

Linear Discriminant Analysis: Discriminant Function

Page 21:

Discriminant Function

[Equation shown as an image on the slide; not recovered]

where
f_km = the value (score) on the canonical discriminant function for case m in group k;
X_ikm = the value on discriminant variable X_i for case m in group k; and
u_i = coefficients which produce the desired characteristics in the function.

Discriminant Score
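The definitions above are consistent with the usual linear form f_km = u_0 + u_1 X_1km + u_2 X_2km + ... + u_p X_pkm, which is assumed here because the formula itself is not shown. The Python sketch below fits a two-group Fisher linear discriminant on two variables using only the standard library. The function names, the restriction to two variables, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean


def fit_lda_2d(group0, group1):
    """Fit a two-group Fisher discriminant on 2 variables.

    Each group is a list of (x1, x2) pairs. Returns (u0, u1, u2) so that
    score = u0 + u1*x1 + u2*x2 is positive on the group1 side of the midpoint.
    """
    def centroid(g):
        return (mean(p[0] for p in g), mean(p[1] for p in g))

    m0, m1 = centroid(group0), centroid(group1)

    # Pooled within-group scatter matrix [[a, b], [b, d]].
    a = b = d = 0.0
    for g, m in ((group0, m0), (group1, m1)):
        for x1, x2 in g:
            a += (x1 - m[0]) ** 2
            b += (x1 - m[0]) * (x2 - m[1])
            d += (x2 - m[1]) ** 2

    det = a * d - b * b  # must be non-zero for the inverse to exist
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # u = S_w^{-1} (m1 - m0), using the closed-form 2x2 inverse.
    u1 = (d * dx - b * dy) / det
    u2 = (-b * dx + a * dy) / det
    # The intercept places the zero of the score midway between the centroids.
    u0 = -(u1 * (m0[0] + m1[0]) + u2 * (m0[1] + m1[1])) / 2.0
    return u0, u1, u2


def discriminant_score(coeffs, x):
    """f = u0 + u1*x1 + u2*x2 for one case."""
    u0, u1, u2 = coeffs
    return u0 + u1 * x[0] + u2 * x[1]


# Made-up methylation ratios at two hypothetical marker sites.
controls = [(0.20, 0.70), (0.25, 0.65), (0.30, 0.75), (0.22, 0.72)]
patients = [(0.60, 0.30), (0.65, 0.35), (0.70, 0.25), (0.62, 0.28)]
coeffs = fit_lda_2d(controls, patients)
control_scores = [discriminant_score(coeffs, x) for x in controls]
patient_scores = [discriminant_score(coeffs, x) for x in patients]
```

In this toy data set every patient receives a positive score and every control a negative one. With 20 markers the same idea applies, but the scatter matrix becomes 20 x 20 and needs a general linear solver.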

Page 22:

Evaluation of the Discrimination: Sensitivity and Specificity

Sensitivity = true positives / (true positives + false negatives)
            = diagnosed as patients / patients

Specificity = true negatives / (true negatives + false positives)
            = diagnosed as non-patients / healthy volunteers
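These two ratios can be computed directly from the true and predicted labels. A short Python sketch with made-up labels, where True means "patient":

```python
def sensitivity_specificity(actual, predicted):
    """actual and predicted are sequences of booleans (True = patient)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # patients diagnosed as patients
    fn = sum(1 for a, p in pairs if a and not p)      # patients missed
    tn = sum(1 for a, p in pairs if not a and not p)  # healthy diagnosed as non-patients
    fp = sum(1 for a, p in pairs if not a and p)      # healthy flagged as patients
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: 4 patients (3 detected) and 4 healthy volunteers (all cleared).
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(actual, predicted)
```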

Page 23:

Discriminant Analysis of a Psychiatric Disorder with DNA Methylation Markers in a Training Group

Discriminant analysis with 20 patients and 19 healthy volunteers (training group), using the methylation rates of the top 20 DNA markers ranked by the Mann-Whitney U test

[Figure: discriminant score (positive / negative) for healthy volunteers and patients; 20 patients and 19 healthy volunteers]

Page 24:

Discriminant Analysis of a Psychiatric Disorder with DNA Methylation Markers in a Validation Group

Discriminant analysis with 12 patients and 12 healthy volunteers (validation group), using the methylation rates of the top 20 DNA markers ranked by the Mann-Whitney U test

The discriminant function was reconstructed for evaluation of variables.

[Figure: discriminant score (positive / negative) for healthy volunteers and patients; 12 patients and 12 healthy volunteers]

Page 25:

Cluster Analysis of a Psychiatric Disorder with DNA Methylation Markers in a Training Group

[Figure: clustered heat map titled "MDD-Control: 13 Methylation Sites". Columns are the 39 samples (AVG_Beta values per array position); rows are methylation sites 1 to 13. Sample groups: healthy volunteers and patients; 20 patients and 19 healthy volunteers.]

Page 26:

Summary

• The big data processing environment should be selected based on its performance and the purpose of the data analysis
• Multivariable diagnosis methods using DNA methylation ratios work well for the diagnosis of psychiatric diseases
• Selection with a non-parametric test and multivariable analysis is extremely effective