No Boundary Thinking in Bioinformatics Workshop Keynote


Upload: casey-greene

Post on 12-Apr-2017


TRANSCRIPT

Page 1: No Boundary Thinking in Bioinformatics Workshop Keynote

The bounty of the commons.

Casey Greene @GreeneScientist

Image: William R Shepherd

Page 2: No Boundary Thinking in Bioinformatics Workshop Keynote
Page 3: No Boundary Thinking in Bioinformatics Workshop Keynote

It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.

- Attributed to Mark Twain

Page 4: No Boundary Thinking in Bioinformatics Workshop Keynote
Page 5: No Boundary Thinking in Bioinformatics Workshop Keynote

Everybody knows there are 4 subtypes of HGSC.

… everybody but Greg.

Page 6: No Boundary Thinking in Bioinformatics Workshop Keynote

Tothill et al. Clinical Cancer Research. 2008

One hundred and seventy one tumors consistently segregated into one of the six k-means clusters. Most of the remaining tumors (80 of 114) could be further assigned to one of the molecular subsets by performing class prediction.

171 clustered cleanly

80 could be assigned

34 ???

12-40% unclear
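One way to read the 12-40% range, using only the counts quoted on this slide. This is a sketch: which tumors count as "unclear" at each end of the range is an assumption, not something the slide states.

```python
# Counts quoted from Tothill et al. 2008 on this slide.
clustered_cleanly = 171        # consistently segregated by k-means
remaining = 114                # did not segregate consistently
assigned_by_prediction = 80    # of the remaining, assigned by class prediction

total = clustered_cleanly + remaining                 # all tumors considered
never_assigned = remaining - assigned_by_prediction   # the "34 ???" on the slide

# Assumed lower bound: only tumors that were never assigned are unclear.
lower = never_assigned / total
# Assumed upper bound: every tumor that failed to cluster cleanly is unclear.
upper = remaining / total

print(f"{never_assigned} of {total} never assigned: {lower:.0%}")
print(f"{remaining} of {total} not cleanly clustered: {upper:.0%}")
```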

Page 7: No Boundary Thinking in Bioinformatics Workshop Keynote

The Cancer Genome Atlas, Nature. 2011

The silhouette width was computed to filter out expression profiles that were included in a subclass, but that were not robust representatives of the subclass. This resulted in the removal of 51 of 135 samples of the Differentiated subclass; 12 of 107 samples of the Immunoreactive subclass; 0 of 109 samples of the Mesenchymal subclass; and 13 of 138 samples of the Proliferative subclass.
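A minimal sketch of a silhouette-width filter like the one described in the quote. The Euclidean distance, the cutoff of 0, and the toy points are assumptions for illustration; the quote does not give the distance measure or threshold that was used.

```python
import math

def silhouette_widths(points, labels):
    """Silhouette width of every sample; assumes at least two clusters."""
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from sample i to every other sample, grouped by cluster.
        dists = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(lab, []).append(math.dist(p, q))
        if own not in dists:
            widths.append(0.0)  # singleton cluster: width is defined as 0
            continue
        a = sum(dists[own]) / len(dists[own])   # mean distance within own cluster
        b = min(sum(d) / len(d)                 # mean distance to nearest other cluster
                for lab, d in dists.items() if lab != own)
        widths.append((b - a) / max(a, b))
    return widths

# Toy data: sample 3 is labelled "A" but sits next to cluster "B".
points = [(0, 0), (0, 1), (1, 0), (9, 9), (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", "A", "A", "B", "B", "B"]

widths = silhouette_widths(points, labels)
cutoff = 0.0  # assumed threshold
kept = [i for i, w in enumerate(widths) if w > cutoff]
print("kept samples:", kept)
```

Samples removed this way are exactly the ones a fixed-k clustering labels anyway, which is why the removal counts on this slide matter for judging how distinct the subtypes are.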

Page 8: No Boundary Thinking in Bioinformatics Workshop Keynote

Verhaak et al. JCI. 2013

Page 9: No Boundary Thinking in Bioinformatics Workshop Keynote

Unified Bioinformatics Pipeline

curatedOvarianData

Dataset Inclusion Criteria
Remove •  <130 tumors •  Custom array technology

Datasets: TCGA, Tothill, Yoshihara, Bonome, Mayo*

*Our group deposited 528 samples to GEO (GSE74357)

Sample Inclusion Criteria
Keep •  Histology •  High Grade

Gene Selection Criteria
Keep •  1500 MAD •  Union

Clustering

Analyses: SAM, Overrepresented Pathways, Survival, Match Clusters
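The gene selection step can be sketched as follows, reading "1500 MAD, Union" as: take the 1500 genes with the highest median absolute deviation (MAD) in each dataset, then keep the union across datasets. That reading, the function names, and the toy values are assumptions; the repository linked later in the talk is the reference for what was actually run.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of one gene's expression values."""
    center = median(values)
    return median(abs(v - center) for v in values)

def top_mad_genes(expression, n):
    """expression maps gene -> values across samples in one dataset."""
    ranked = sorted(expression, key=lambda g: mad(expression[g]), reverse=True)
    return set(ranked[:n])

def union_gene_set(datasets, n=1500):
    """Union of the top-n MAD genes selected separately in each dataset."""
    selected = set()
    for expression in datasets.values():
        selected |= top_mad_genes(expression, n)
    return selected

# Toy example with n=1 so the selection is easy to follow.
datasets = {
    "dataset_1": {"GENE1": [1, 1, 1, 1], "GENE2": [0, 5, 10, 15], "GENE3": [2, 3, 2, 3]},
    "dataset_2": {"GENE1": [0, 10, 20, 30], "GENE2": [4, 4, 4, 4], "GENE3": [1, 2, 1, 2]},
}
genes = union_gene_set(datasets, n=1)
print(sorted(genes))
```

Taking the union rather than the intersection keeps a gene that is highly variable in any one population, so no single dataset dictates the feature set.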

Page 10: No Boundary Thinking in Bioinformatics Workshop Keynote

Unified Bioinformatics Pipeline

[Pipeline diagram repeated from Page 9]

Are HGSC subtypes consistent?

Page 11: No Boundary Thinking in Bioinformatics Workshop Keynote

k-Means and NMF are consistent.

Page 12: No Boundary Thinking in Bioinformatics Workshop Keynote

Cross-population analysis does not support four subtypes.

Page 13: No Boundary Thinking in Bioinformatics Workshop Keynote

Why didn’t TCGA (2011) find this?

Page 14: No Boundary Thinking in Bioinformatics Workshop Keynote

Why didn’t Konecny (2014) find this?

Page 15: No Boundary Thinking in Bioinformatics Workshop Keynote

What about TCGA’s re-analysis of Tothill?

Page 16: No Boundary Thinking in Bioinformatics Workshop Keynote

What if you re-analyze Tothill without LMP samples?

Page 17: No Boundary Thinking in Bioinformatics Workshop Keynote

What if you re-analyze Tothill with LMP samples?

Page 18: No Boundary Thinking in Bioinformatics Workshop Keynote

Comprehensive cross-population analysis of high-grade serous ovarian cancer supports no more than three subtypes
bioRxiv: http://dx.doi.org/10.1101/030239
github: http://github.com/greenelab/hgsc_subtypes

Page 19: No Boundary Thinking in Bioinformatics Workshop Keynote

Research is to see what everybody else has seen and to think what nobody else has thought.

- Albert Szent-Györgyi

Image by J.W. McGuire/NIH

Page 20: No Boundary Thinking in Bioinformatics Workshop Keynote

Image from You Don’t Know Jack. Vol 3.

Page 21: No Boundary Thinking in Bioinformatics Workshop Keynote

If you showed 16,000 computers 10 million images from youtube, what would they see?

Le et al. 2012

Page 22: No Boundary Thinking in Bioinformatics Workshop Keynote
Page 23: No Boundary Thinking in Bioinformatics Workshop Keynote

•  Pseudomonas aeruginosa compendium •  > 100 different experiments •  Many different labs

Page 24: No Boundary Thinking in Bioinformatics Workshop Keynote

Analysis with Denoising Autoencoders of Gene Expression (ADAGE)

Tan et al. Pac Sym Bio 2015; Tan et al. mSystems 2016.
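For readers new to denoising autoencoders, here is a minimal sketch of one forward pass: corrupt the input, encode it into a few node activities, decode, and score the reconstruction against the clean input. The masking noise, sigmoid units, tied weights, and all sizes and values are assumptions for illustration, and training is omitted. The cited papers and the ADAGE repository describe the real model.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, rate, rng):
    """Masking noise: set each input to zero with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

def encode(x, weights, hidden_bias):
    """Node activities; weights[g][k] links gene g to node k."""
    return [
        sigmoid(sum(weights[g][k] * x[g] for g in range(len(x))) + hidden_bias[k])
        for k in range(len(hidden_bias))
    ]

def decode(h, weights, visible_bias):
    """Reconstruct the expression vector using the same (tied) weights."""
    return [
        sigmoid(sum(weights[g][k] * h[k] for k in range(len(h))) + visible_bias[g])
        for g in range(len(visible_bias))
    ]

def cross_entropy(x, x_hat):
    """Reconstruction cost between the clean input and its reconstruction."""
    eps = 1e-12
    return -sum(
        v * math.log(r + eps) + (1 - v) * math.log(1 - r + eps)
        for v, r in zip(x, x_hat)
    )

rng = random.Random(0)
n_genes, n_nodes = 6, 2   # hypothetical sizes
weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_nodes)] for _ in range(n_genes)]
hidden_bias = [0.0] * n_nodes
visible_bias = [0.0] * n_genes

sample = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]   # expression scaled to [0, 1]
noisy = corrupt(sample, rate=0.1, rng=rng)
activity = encode(noisy, weights, hidden_bias)
reconstruction = decode(activity, weights, visible_bias)
loss = cross_entropy(sample, reconstruction)   # scored against the clean input
print(activity, loss)
```

Training would adjust the weights to lower this loss across the whole compendium. No sample labels enter the objective, so the model can use every public experiment.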

Page 25: No Boundary Thinking in Bioinformatics Workshop Keynote

The worst thing that can be said about your grant is that it's a fishing expedition.

- Tom Blumenthal (@ 2016 Gold Lab Symposium)

Page 26: No Boundary Thinking in Bioinformatics Workshop Keynote

If you engage in a fishing expedition you can actually catch fish.

- Tom Blumenthal (@ 2016 Gold Lab Symposium)

Image by Peter (stockispicts)

Page 27: No Boundary Thinking in Bioinformatics Workshop Keynote

The Transcription Factor Anr Controls P.a. Response to Low O2

[Diagram labels: Low O2; Anr; CF Lung Epithelium]

Page 28: No Boundary Thinking in Bioinformatics Workshop Keynote

High-weight genes

Page 29: No Boundary Thinking in Bioinformatics Workshop Keynote

Node42 reflects Anr Activity

[Figure: heatmaps of Node42 (Anr activity), panels A-C, for datasets E-GEOD-17179, E-GEOD-17296, E-GEOD-52445, and E-GEOD-33160. Sample labels: wt, Δanr, Δdnr, ΔanrΔroxSR; EXP and STAT; O2. Platforms: Microarray, RNAseq PAO1, RNAseq J215. Each heatmap has its own color key.]

Page 30: No Boundary Thinking in Bioinformatics Workshop Keynote

New Experiment Validates Node 42’s Low-O2 Signature

CF lung epithelial cells Jack Hammond

[Figure: Node42 - Anr Activity heatmaps, panels A-C (datasets E-GEOD-17179, E-GEOD-17296, E-GEOD-52445, E-GEOD-33160; Microarray, RNAseq PAO1, RNAseq J215).]

Page 31: No Boundary Thinking in Bioinformatics Workshop Keynote

Cross-platform normalization of microarray and RNA-seq data for machine learning applications

Thompson, Tan, Greene. PeerJ. Jeff Thompson

Page 32: No Boundary Thinking in Bioinformatics Workshop Keynote

Cross-platform normalization of microarray and RNA-seq data for machine learning applications

Thompson, Tan, Greene. PeerJ. 2016

Page 33: No Boundary Thinking in Bioinformatics Workshop Keynote

New Experiment Validates Node 42’s Low-O2 Signature

CF lung epithelial cells Jack Hammond

[Figure: Node42 - Anr Activity heatmaps, panels A-C (datasets E-GEOD-17179, E-GEOD-17296, E-GEOD-52445, E-GEOD-33160; Microarray, RNAseq PAO1, RNAseq J215).]

Page 34: No Boundary Thinking in Bioinformatics Workshop Keynote

ADAGE analysis of publicly available gene expression data collections illuminates Pseudomonas aeruginosa-host interactions
bioRxiv: http://dx.doi.org/10.1101/030650
github: http://github.com/greenelab/adage
Tan, Hammond, Hogan, and Greene. mSystems. 2016

Page 35: No Boundary Thinking in Bioinformatics Workshop Keynote

I didn’t want to just know the names of things. I remember really wanting to know how it all worked.

- Elizabeth Blackburn

Image: US Embassy Sweden

Page 36: No Boundary Thinking in Bioinformatics Workshop Keynote

Activity volcano plot

Fold Change

-log(p)

Page 37: No Boundary Thinking in Bioinformatics Workshop Keynote

Pareto activity selection

Fold Change

-log(p)
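Pareto selection on a volcano plot can be sketched like this: keep every point that no other point beats on both axes at once. Using the absolute fold change, the node names, and the values below are assumptions for illustration; the slide only names the two axes.

```python
def pareto_front(points):
    """points maps name -> (abs_fold_change, neg_log_p).

    A point is kept unless another point is at least as large on both
    axes and strictly larger on at least one of them.
    """
    front = []
    for name, (fc, nlp) in points.items():
        dominated = any(
            (ofc >= fc and onlp >= nlp) and (ofc > fc or onlp > nlp)
            for other, (ofc, onlp) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical node activities: (|fold change|, -log(p)).
activities = {
    "node_a": (3.0, 2.0),
    "node_b": (1.0, 5.0),
    "node_c": (2.0, 1.0),   # beaten by node_a on both axes
    "node_d": (0.5, 4.0),   # beaten by node_b on both axes
    "node_e": (2.5, 3.0),
}
selected = pareto_front(activities)
print(selected)
```

Unlike a pair of fixed cutoffs, this needs no threshold on either axis. Removing the selected points and repeating the selection gives successive layers when more candidates are wanted.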

Page 38: No Boundary Thinking in Bioinformatics Workshop Keynote

volcano + networks = pathway-style analysis

Page 39: No Boundary Thinking in Bioinformatics Workshop Keynote

We can measure gene-gene similarity with ADAGE weights.

… …
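One simple way to turn weights into a gene-gene similarity is to correlate two genes' weight vectors across nodes. The slide does not say which measure is used, so the Pearson correlation, the gene names, and the weights below are all assumptions.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length weight vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

# Hypothetical weights: one row per gene, one column per node.
gene_weights = {
    "gene_a": [0.9, -0.2, 0.1, 0.0],
    "gene_b": [1.8, -0.4, 0.2, 0.0],    # same pattern as gene_a, scaled
    "gene_c": [-0.9, 0.2, -0.1, 0.0],   # opposite pattern to gene_a
}

similarity = {
    (g, h): pearson(gene_weights[g], gene_weights[h])
    for g in gene_weights
    for h in gene_weights
    if g < h
}
for pair, r in similarity.items():
    print(pair, round(r, 3))
```

Genes whose weights rise and fall together across nodes score near 1, and pairs above a chosen cutoff can be linked to form a network.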

Page 40: No Boundary Thinking in Bioinformatics Workshop Keynote

ADAGE similarity captures functional similarity.

Tan et al. in prep.

Page 41: No Boundary Thinking in Bioinformatics Workshop Keynote

ADAGE-based pathway analysis reveals underlying mechanisms

Tan et al. in prep.

Page 42: No Boundary Thinking in Bioinformatics Workshop Keynote

Where do we have (enough) data?

Greene et al. Pac Sym Bio. 2016

Page 43: No Boundary Thinking in Bioinformatics Workshop Keynote

ADAGE webserver coming soon! http://www.greenelab.com/webservers

Page 44: No Boundary Thinking in Bioinformatics Workshop Keynote

Open science can be hypercollaborative.

Page 45: No Boundary Thinking in Bioinformatics Workshop Keynote

Open Science != Reproducibility

Page 46: No Boundary Thinking in Bioinformatics Workshop Keynote

Computational biology should be reproducible:

•  Accurately

•  Quickly

•  Without contacting original authors

Page 47: No Boundary Thinking in Bioinformatics Workshop Keynote

Continuous Analysis

Page 48: No Boundary Thinking in Bioinformatics Workshop Keynote

Continuous Analysis

Page 49: No Boundary Thinking in Bioinformatics Workshop Keynote

Cloud computing allows “Continuous Analysis” at scale.

Research Cluster

Page 50: No Boundary Thinking in Bioinformatics Workshop Keynote

Semi-Supervised Learning of the Electronic Health Record for Phenotype Stratification
bioRxiv: http://dx.doi.org/10.1101/039800
github: http://github.com/greenelab/DAPS

Reproducible computational workflows with continuous analysis
bioRxiv: http://dx.doi.org/10.1101/056473
github: http://github.com/greenelab/continuous_analysis

Page 51: No Boundary Thinking in Bioinformatics Workshop Keynote
Page 52: No Boundary Thinking in Bioinformatics Workshop Keynote

Research Parasite Awards (The “Parasites”)

Selection criteria for the work in question:

•  The awardee must not have been involved in the design of the experiments that generated the data.

•  The awardee published independently of the original investigators, and the original investigators are not authors of the secondary analyses but are appropriately credited in the manuscripts.

•  The awardee may have extended, replicated or disproved what the original investigators had posited.

•  The awardee has provided source code and intermediate or final results in a manner that enhances reproducibility.

Page 53: No Boundary Thinking in Bioinformatics Workshop Keynote

Research Parasite Awards (The “Parasites”)

Additional selection criteria for the Junior Parasite award:

•  The awardee must have published the work at the training stage of their career (postdoctoral, graduate, or undergraduate). If the awardee has assumed a position as an independent investigator she or he should not have been in that position for more than 2 years.

•  The award will be based on work described in a single manuscript (submitted alongside the nomination letter).

Page 54: No Boundary Thinking in Bioinformatics Workshop Keynote

Research Parasite Awards (The “Parasites”)

Additional selection criteria for the Sustained Parasitism award:

•  The awardee must be in an independent investigator position in academia, industry or public sector.

•  The awardee must be a last or corresponding author on the three manuscripts submitted alongside the nomination letter.

•  At least a five-year period must have elapsed between the publication of the first manuscript and the final manuscript.

Page 55: No Boundary Thinking in Bioinformatics Workshop Keynote

Details

•  Submit by October 14, 2016 @ 5PM HST

•  Additional Instructions at http://greenelab.com/parasite-award

•  COI rules are strict! I can only talk about rules.

2017 Selection Committee:

Page 56: No Boundary Thinking in Bioinformatics Workshop Keynote

Greene Lab: Jaclyn Taroni (Postdoc), Daniel Himmelstein (Postdoc), Jie Tan (Grad Student), Gregory Way (Grad Student), Brett Beaulieu-Jones (Grad Student), Amy Campbell (Postbacc), René Zelaya (Programmer), Matt Huyck (Programmer), Dongbo Hu (Programmer), Kathy Chen (Undergrad), Mulin Xiong (Undergrad), Tim Chang (Undergrad), Roshan Ravishankar (Undergrad)

Collaborators: Deb Hogan & Jack Hammond

Data: All investigators who publicly release their gene expression data.

Images: Artists who release their work under a Creative Commons license.

Funding: Gordon and Betty Moore Foundation, National Science Foundation, Cystic Fibrosis Foundation, National Institutes of Health

Find us online: http://www.greenelab.com
Twitter: @GreeneScientist

Calvin and Hobbes. Bill Watterson