Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care


Upload: informa-australia

Post on 27-Jan-2015


DESCRIPTION

Dr. Karin Verspoor, Scientific Director, Health and Life Sciences, NICTA, delivered this presentation at the 2013 Cancer Centres Symposium in Australia. The annual event explores current opportunities and challenges surrounding cancer centre policy, funding, operations, innovations and development. For more information about the annual event, please visit the conference website: http://www.informa.com.au/cancercentressymposium

TRANSCRIPT

Page 1: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

NICTA Copyright 2013 From imagination to impact

Bioinformatics and data analytics for next-generation cancer care

Karin Verspoor, PhD
Principal Researcher
Scientific Director, Health and Life Sciences
NICTA

Page 2: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Challenge

• Enhancing and supporting biomedical data analysis and interpretation will facilitate:
  – Automated surveillance of patients
  – Performance outcomes analysis
  – Improved efficiency in treatment
  – Clinical decision support
  – Predictive modeling of disease risk
  – Reduction of human effort in disease research
  – Improved diagnostics for disease
  – Accelerated drug target and lead identification
  – Personalised/precision medicine

Page 3: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Data, Data, Everywhere

• Electronic health records
• Radiology images: X-ray, MRI and PET scans
• Radiology and pathology reports
• Data from sensors
• Registry data
• Medicare claim data
• Published biomedical articles
• DNA (genetic material) from biopsy samples

Page 4: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Making Sense of Biomedical Data

Page 5: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Computation for biomedical data

Page 6: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

The need for automation

• Yann LeCun, Director of the Center for Data Science at New York University [1]:
  – “much of the knowledge in the world will soon need to be extracted by machines, because there will not be enough brain power to do it.”
• Russ Altman, Stanford University [2]:
  – “Our entire understanding of biology and medicine is really contained in the published literature. And since people write in natural language, if you can’t get computers to turn that information into databases and computable information, you’re falling behind.”

[1] http://www.forbes.com/sites/sap/2013/11/14/the-white-house-honors-sap-stanford-and-nct/
[2] http://biomedicalcomputationreview.org/content/ncbcs-take-stock-and-look-forward-fruitful-centers-face-sunset

Page 7: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

“Convergence”

Bringing together clinicians, biologists, engineers, computer scientists, mathematicians, statisticians and physicists

Biomedical Informatics: Application of knowledge representation and computational infrastructure for biomedical data storage, retrieval, manipulation, and analysis.

Bioinformatics: Process, analyse and interpret protein and genomics data.

• Computational methods and algorithms
• Robust, scalable computation
• Data mining
• Predictive analytics

Page 8: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Machines to Data to Machines to Knowledge to Action

Page 9: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Uncovering Hidden Information

• About 80% of information is buried in textual form
  – Clinical notes
  – Radiology reports
  – GP and specialist letters
  – Medical articles
• Text Mining Applications
  – Extracting data from clinical notes
  – Connecting with proteomic and genomic data
  – Linking with biomedical literature

Page 10: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Practice-based Evidence

• EHRs capture health-related data
• Turning that data into actionable information requires analysis and modeling
  – Data-driven methods
  – Integration of multiple sources of data
  – e.g. combining clinical and genetic indicators in prediction of cancer prognosis
• Models produced via data mining and predictive analysis profile inherited risks and environmental/behavioral factors associated with patient disorders
• These models can be utilised to generate predictions about treatment outcomes

Page 11: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Pharmacovigilance

• Mining of clinical records to identify adverse drug events
  – Estimated >90% of adverse events do not appear in coded data
  – Transform patient records into patient-feature matrix encoded using clinical terminologies
• Detect statistical associations between drugs and adverse events

LePendu et al. (2013) “Pharmacovigilance Using Clinical Notes” Clinical Pharmacology & Therapeutics 93(6), 547–555; doi: 10.1038/clpt.2013.47
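
As a rough illustration of the last two steps on this slide (and not the method of LePendu et al.), the sketch below turns free-text notes into a patient-feature matrix and computes an odds ratio for one drug-event pair. The term lists, the example notes and the function names are all hypothetical. The naive token matching also ignores negation such as "no bleeding", which a real pipeline would need to handle.

```python
# Hypothetical term lists standing in for a clinical terminology.
DRUG_TERMS = {"warfarin", "ibuprofen"}
EVENT_TERMS = {"bleeding", "rash"}

# Invented example notes, one per patient.
NOTES = [
    "Patient on warfarin, presented with bleeding.",
    "Warfarin continued; bleeding noted.",
    "Started ibuprofen for pain.",
    "Warfarin dose reviewed.",
    "No medications. Mild rash.",
    "Routine follow-up.",
]


def featurize(note):
    """Return the set of known drug/event terms mentioned in one note."""
    tokens = {tok.strip(".,;:()").lower() for tok in note.split()}
    return tokens & (DRUG_TERMS | EVENT_TERMS)


def odds_ratio(rows, drug, event):
    """Odds ratio for a drug-event pair over a list of patient feature sets.

    Adds 0.5 to every cell (Haldane-Anscombe correction) so that an empty
    cell does not cause a division by zero.
    """
    a = b = c = d = 0
    for feats in rows:
        has_drug, has_event = drug in feats, event in feats
        if has_drug and has_event:
            a += 1          # drug and event
        elif has_drug:
            b += 1          # drug only
        elif has_event:
            c += 1          # event only
        else:
            d += 1          # neither
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# One row per patient; each row is the set of features present.
ROWS = [featurize(note) for note in NOTES]

if __name__ == "__main__":
    print(odds_ratio(ROWS, "warfarin", "bleeding"))
```

An odds ratio well above 1 flags a pair for review. On its own it says nothing about causation, and with counts this small it is only a toy.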

Page 12: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Text Mining for in-hospital infection

• Hospital-acquired infection is a major health burden
  – $4.5 billion cost, 98,000 deaths in US annually [1]
  – >$100 million, 1000 deaths in Australia annually for 2 common infections [2]
• Surveillance as the foundation of prevention and control
  – shown to lower infection rates, improve detection, identify overuse of expensive drugs [3]
  – pervasive surveillance not feasible without automated support
• Our approach: mining radiology reports and images
  – automate surveillance, leverage hospital information flow
  – side benefit: early detection
  – Joint project: Alfred Health, Melbourne Health, Peter Mac Cancer Institute

Page 13: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Text mining and beyond

• Current text mining performance
  – 94% recall, 90% precision at scan level
  – 98% recall, 88% precision at patient level
  – Effective for surveillance; improvement needed for real-time detection
• Directly classifying CT images for IFI
  – Matching images being provided by hospital partners
  – Set up as multi-task learning problem: detect <Image,Report> pair as indicative of IFI
• Mining patient records for risk indicators
  – Mining historical patient data to learn impacting factors
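
To make the scan-level and patient-level figures concrete, here is a minimal sketch of how precision and recall can be computed at both levels. The slide does not say how scan results were rolled up to patients, so the rule used here (a patient counts as positive if any of their scans is positive) is an assumption, and the labels are invented.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for two aligned lists of truthy/falsy labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def any_positive_by_patient(patient_ids, labels):
    """Map each patient to True if at least one of their scans is positive."""
    flagged = {}
    for pid, label in zip(patient_ids, labels):
        flagged[pid] = flagged.get(pid, False) or bool(label)
    return flagged


def patient_level(patient_ids, gold, predicted):
    """Aggregate scan-level labels to patient-level lists in a fixed order."""
    gold_by_patient = any_positive_by_patient(patient_ids, gold)
    pred_by_patient = any_positive_by_patient(patient_ids, predicted)
    order = sorted(gold_by_patient)
    return ([gold_by_patient[pid] for pid in order],
            [pred_by_patient[pid] for pid in order])


# Invented data: six scans from four patients.
PATIENT_IDS = ["A", "A", "B", "B", "C", "D"]
GOLD = [1, 1, 0, 0, 1, 0]   # reference label per scan
PRED = [1, 0, 0, 1, 1, 0]   # classifier output per scan

if __name__ == "__main__":
    print("scan level:", precision_recall(GOLD, PRED))
    print("patient level:", precision_recall(*patient_level(PATIENT_IDS, GOLD, PRED)))
```

Under this aggregation rule a missed scan is forgiven when another scan from the same patient is caught, which tends to raise recall at the patient level, while a single false alarm still marks the whole patient as positive.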

Page 14: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Searching for Disease-related Genes

• Large amounts of individual genetic variation
  – SNPs, insertions, deletions
  – Copy number variation, genomic duplications, inversions, translocations
  – DNA methylation, chromatin state, histone modification, RNA binding affinity, etc.
• Identifying variation is becoming easier; interpreting it remains difficult

Image credit: Jane Ades, NHGRI, http://www.sciencedaily.com/releases/2008/01/080122101914.htm

Page 15: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Single Nucleotide Polymorphisms

• Haplotypes: Associated SNP alleles
• Chromosome regions where two groups differ in haplotype frequencies might contain genes affecting the disease
• Analysis enabled by large-scale genomic data collection, data storage, and statistical frameworks scaled to large data sets
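
One simple way to test whether two groups differ in the frequency of a haplotype or allele is a chi-squared test on a 2x2 table of counts. The sketch below is a minimal version using only the standard library. The counts are invented and the function names are illustrative, not taken from any tool mentioned in the talk.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. cases, controls); columns are haplotype present
    or absent. No continuity correction is applied.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Invented counts: haplotype seen on 60 of 100 case chromosomes
    # and on 40 of 100 control chromosomes.
    stat = chi_squared_2x2(60, 40, 40, 60)
    print(stat, p_value_1df(stat))
```

In a genome-wide setting the same test is repeated for every region, so the significance threshold has to be corrected for the number of tests performed.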

Page 16: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Size of Epistasis Search Space

[Figure: SNP-by-sample data matrix. Recoverable labels: "1M SNPs", "1 mm x 10,000 samples", "1 mm"]
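
The size of the search space follows from counting unordered combinations of SNPs. As a small worked example, assuming the 1M SNPs shown in the figure and an exhaustive search with no pruning (the function name is illustrative):

```python
from math import comb


def search_space(n_snps, order):
    """Number of unordered SNP combinations of the given order."""
    return comb(n_snps, order)


if __name__ == "__main__":
    n = 1_000_000
    print(f"pairs:   {search_space(n, 2):,}")   # about 5 x 10^11
    print(f"triples: {search_space(n, 3):,}")   # about 1.7 x 10^17
```

Each combination then needs a statistical test across all samples, which is why an exhaustive search beyond pairs quickly becomes impractical on a single machine.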

Page 17: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Genome Wide Interaction Search (GWIS)
Adam Kowalczyk

GWIS – Genome Wide Interaction Search system

Our Strength: Integration of mathematical, computational, signal processing and bioinformatics expertise resulting in a unique, novel solution, Genome Wide Interaction Search (GWIS):
• Run time improved by up to 3 orders of magnitude
• Improved detection rate

Page 18: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Timing at a glance

2nd Order GWIS with Bigger Datasets: The future of GWAS studies implies bigger datasets giving more precision, but longer computing times. We are ready for these future datasets.

3rd Order GWIS: We are developing even faster techniques, to make 3rd Order GWAS feasible (all combinations of 3 SNPs).

Columns:
(A) Standard algorithm (IG*)
(B) GWIS-CPU (4 Cores Intel 3.0 GHz)
(C) GWIS-GPU (1 GTX 470)
(D) GWIS-GPU on MASSIVE GPU Cluster (~200 Tesla C2050)
(E) GWIS-GPU on Titan (18,688 Tesla K20)

Order  SNPs x Samples  (A)             (B)           (C)            (D)             (E)
2nd    300K x 3K       108 days        39 minutes    3 minutes      ~0              ~0
2nd    1M x 10K        11 years        25 hours      1.85 hours     ~0.5 minutes    ~0
2nd    5M x 10K        275 years       26 days       1.91 days      ~12.24 minutes  ~0
3rd    300K x 3K       ~30K years      ~30 years     ~2.3 years     ~5 days         ~38 minutes
3rd    1M x 10K        ~3.6M years     ~3.7K years   ~282 years     ~612 days       ~3.2 days
3rd    5M x 10K        ~458.3M years   ~453K years   ~34.9K years   ~208 years      ~1.1 years

Chi-squared test

* Fastest according to the benchmark paper: Li Chen, Guoqiang Yu, David J. Miller, Lei Song, Carl Langefeld, David Herrington, Yongmei Liu, and Yue Wang, A Ground Truth Based Comparative Study on Detecting Epistatic SNPs, Proceedings (IEEE Int Conf Bioinformatics Biomed). 2009 November 1; 1-4(Nov 2009):
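
The larger rows of the timing table look consistent with run time growing in proportion to the number of SNP combinations multiplied by the number of samples. The slide does not state how those rows were obtained, so the sketch below treats that scaling rule as an assumption and checks it against the 2nd order figures. The function name is illustrative.

```python
from math import comb


def scale_factor(order, snps_from, samples_from, snps_to, samples_to):
    """Assumed run-time ratio: (combinations ratio) x (samples ratio)."""
    combos = comb(snps_to, order) / comb(snps_from, order)
    return combos * (samples_to / samples_from)


if __name__ == "__main__":
    # Scale the measured 300K x 3K row up to 1M x 10K for 2nd order search.
    factor = scale_factor(2, 300_000, 3_000, 1_000_000, 10_000)
    print(f"scale factor: {factor:.2f}")
    print(f"standard algorithm: {108 * factor / 365.25:.1f} years")  # table: 11 years
    print(f"GWIS-GPU (1 GTX 470): {3 * factor / 60:.2f} hours")      # table: 1.85 hours
```

Under this assumption the 300K x 3K timings scale by a factor of about 37, which comes close to the 1M x 10K row of the table.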

Page 19: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Application context: Integrated genomics for lethal prostate cancer

A/Prof Chris Hovens at Royal Melbourne Hospital has:
• Acquired a unique bio-bank of over 1500 prostate cancer samples
• Extensive clinical information

This demands computational resources and expertise to address complex genomic analysis problems

Page 20: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Integrated genomics for lethal prostate cancer

Sample acquisition

Unique metastatic samples are harvested by clinical and surgical researchers during the progression of the disease

Page 21: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Integrated genomics for lethal prostate cancer

Molecular analysis

Samples are profiled using multiple high-resolution, high-throughput platforms generating large amounts of molecular level data

Heterogeneous:
• DNA sequencing (whole genome)
• RNA sequencing
• Methylation profiling
• Copy-number variation profiling

= 40TB data (doubling every 3 months)
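
A quick projection shows what doubling every 3 months implies for storage. This sketch simply extrapolates the two numbers on the slide (40 TB, 3-month doubling) and assumes the doubling rate holds, which the slide does not claim beyond the present. The function names are illustrative.

```python
import math

START_TB = 40.0          # current volume from the slide
DOUBLING_MONTHS = 3.0    # doubling period from the slide


def projected_tb(months):
    """Projected data volume in TB after the given number of months."""
    return START_TB * 2 ** (months / DOUBLING_MONTHS)


def months_to_reach(target_tb):
    """Months until the projected volume reaches target_tb."""
    return DOUBLING_MONTHS * math.log2(target_tb / START_TB)


if __name__ == "__main__":
    print(projected_tb(12))          # volume after one year
    print(months_to_reach(1000.0))   # time to reach 1 PB (1000 TB)
```

Four doublings in a year is a sixteen-fold increase, so storage and compute planning has to be revisited on roughly the same timescale as the experiments themselves.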

Page 22: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Algorithms for variant interpretation

Harness the power of the literature
• Extract information about genes and genetic variants from biomedical research publications
• Start with the simple hypothesis that any mention of a genetic variant is meaningful
• Prioritize variants with literature support
• Provide pointers to the evidence for human interpretation
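
The workflow on this slide can be sketched with a simple pattern-based extractor: find variant mentions in text, index them by document, and rank candidate variants by how many documents mention them. This is a minimal illustration under the slide's "any mention is meaningful" hypothesis, not the system described in the talk. The regular expression covers only a few common notations, and the abstracts and function names are invented.

```python
import re

AMINO = "ACDEFGHIKLMNPQRSTVWY"

# Heuristic patterns: cDNA substitutions (c.1799T>A), three-letter protein
# substitutions (p.Val600Glu) and one-letter protein substitutions (V600E).
VARIANT_RE = re.compile(
    r"\b(?:c\.\d+[ACGT]>[ACGT]"
    r"|p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}"
    rf"|[{AMINO}]\d+[{AMINO}])\b"
)

# Invented example texts keyed by a document identifier.
ABSTRACTS = {
    "doc1": "The BRAF V600E mutation (c.1799T>A) was detected in 12 tumours.",
    "doc2": "Patients carrying V600E responded to treatment.",
    "doc3": "No variants were reported.",
}


def build_index(abstracts):
    """Map each variant mention to the sorted list of documents containing it."""
    index = {}
    for doc_id, text in abstracts.items():
        for variant in set(VARIANT_RE.findall(text)):
            index.setdefault(variant, []).append(doc_id)
    return {variant: sorted(docs) for variant, docs in index.items()}


def prioritise(candidates, index):
    """Order candidate variants by number of supporting documents, most first."""
    return sorted(candidates, key=lambda v: (-len(index.get(v, [])), v))


if __name__ == "__main__":
    index = build_index(ABSTRACTS)
    for variant in prioritise(["G12D", "c.1799T>A", "V600E"], index):
        # The document list is the pointer to evidence for a human reader.
        print(variant, index.get(variant, []))
```

Keeping the document identifiers alongside each variant is what provides the pointers to the evidence: the ranking only orders the work, and a person still reads the sources.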

Page 23: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Curation of Genetic Variant Information from the biomedical literature

http://opennicta.com/home/health/variome

• Partnership with InSiGHT database (Human Variome Project)
  – Collect and catalogue mutations in specific genes implicated in gastrointestinal hereditary tumours
  – Collected both by direct deposit of genetic variants, and from curation of the published literature
• We have developed a text annotation schema and annotated a corpus of relevant literature
  – Variant Annotation Schema
  – covers genes, mutations, diseases, patients, body parts, ethnic group, age, gender, characteristics; also relationships among these
• In progress: build entity and relation extraction tools to support curation of this information

Page 24: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

A “Phenotypic code” for complex disease

• Simple and complex diseases appear to share a genetic architecture
• Mining of co-morbidities of complex diseases and Mendelian diseases with known genetic cause identifies a ‘code’ for each complex disease in terms of Mendelian genetic loci.
• Evidence of epistasis among the Mendelian variants (superlinear complex disease risk)

Blair et al. Cell (2013); 155 (1); 70-80. http://dx.doi.org/10.1016/j.cell.2013.08.030

Page 25: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

BiomRKRS: Biomarker Retrieval and Knowledge Reasoning System

• Knowledge management for biomarker data
• Using ontologies/controlled vocabularies as backbone for integration and retrieval
• Integrating information from a range of sources, including the literature
• Support querying according to various characteristics

Page 26: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Searching for information via complex queries

Page 27: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Predictive Modeling

• EHRs capture health-related data
• Turning that data into actionable information requires analysis and modeling
  – Data-driven methods
  – Integration of multiple sources of data
  – e.g. combining clinical and genetic indicators in prediction of cancer prognosis
• Models produced via data mining and predictive analysis profile inherited risks and environmental/behavioral factors associated with patient disorders, which can be utilized to generate predictions about treatment outcomes

Page 28: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

Biomedical informatics @ NICTA

Page 29: Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care

We Do Good STUFF