information in the lifesciences… the science of the 21 century … · 2020. 5. 12. · ebi is an...

EBI is an Outstation of the European Molecular Biology Laboratory. Ewan Birney (tweetable until the last bit – I’ll point it out). Information in the Lifesciences… The Science of the 21st Century… and why it needs infrastructure


Post on 13-Feb-2021


  • EBI is an Outstation of the European Molecular Biology Laboratory.

    Ewan Birney (tweetable until the last bit – I’ll point it out)

    Information in the Lifesciences… The Science of the 21st Century… and why it needs infrastructure

  • Outline of the talk

    • Who am I?

    • Genomics 2000-2012

      • HapMap, 1000 Genomes, GWAS, ENCODE

    • Route into medicine

    • Why we need an infrastructure

    • (Some more whimsical uses of DNA…)

  • Who am I?

    • Associate Director at the European Bioinformatics Institute (EBI)

    • Involved in genomics since I was 19 (almost 20 years!)

    • Trained as a biochemist – most people think I am CS

    • Analysed – sometimes led – human/mouse/rat/platypus etc. genomes

    • Lead the analysis of ENCODE

    EBI is in Hinxton, South Cambridgeshire

    EBI is part of EMBL, ~like CERN for molecular biology

  • Molecular Biology

    • The study of how life works – at a molecular level

    • Key molecules:

      • DNA – Information store (Disk)

      • RNA – Key information transformer, also does stuff (RAM)

      • Proteins – The business end of life (Chip, robotic arms)

      • Metabolites – Fuel and signalling molecules (electricity)

    • We have theories of how these interact, but no theories to predict what they are

    • Instead we determine attributes of molecules and store them in globally accessible, open, databases

  • Crash Course in DNA

  • DNA is a covalently linked polymer, nearly always found in anti-parallel, non-covalent pairs

  • We represent it as strings, not worrying about one of the two paired polymers

    >6 dna:chromosome chromosome:GRCh37:6:133017695:133161157:1
    GCAGCAAGACAGAAGTGACTCATACATACAAGGGATCCCCAATAAGATTATCGGCAGATTTCTCATCAATAACTTTGGAGACCACAAAGCATTGAGCTGATATATTTAAAGTACTGAAAGAAAAAAAAATCTGACAACCAAGAATTCTATATCCATCAGAACTGCCCTTCAAAAGGGAGGGAGAAATGAAGACATTCTCAGATTTGAGAAGAAAGGAAAGAGAGAAGGGAGGGGAGGGGAGAGGAGGGGAGGGGAGGAGAGGAGAGGAGAGGGCACAGTGGCTCACGCCTGTAATCCTAGCACTTTGCAAGACTGAGGCCAGTGGAACACCTGAGGTCAGGAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCCACTAAAAATACAAAAAATTAGCCAGGCGTGGTGGCAGGTGCCTGTAGTTCCAGCTACTCAGGAGGCTGAGGCAGCAGAATGGCGTGAACTCGGGAGGTGGAGCTTGCAGTGAGCTGAGATTGCGCCCCTGCACTCCAGCCTGGGTGACAGAGTGAGACTCTGTCTCAAAAAAATAAAAAGTTTAAAAATATTTTAAAAAAAGAAAGAAAGAAGGGAG

    1 monomer is called a “base pair” – bp
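The point about representing DNA as a single string can be made concrete: because the two polymers pair base for base and run anti-parallel, the second strand can be recomputed from the first. The following is only a minimal sketch, assuming upper-case A/C/G/T input; the function name `reverse_complement` is chosen here for illustration and is not from the talk.

```python
# Illustrative sketch: recover the partner strand from the stored strand.
# Assumes the input contains only upper-case A, C, G and T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the anti-parallel partner strand of a DNA string."""
    # Swap each base for its pairing partner, then reverse the direction.
    return seq.translate(COMPLEMENT)[::-1]


first_bases = "GCAGCAAGAC"  # first ten bases of the example sequence above
print(reverse_complement(first_bases))
```

Applying the function twice returns the original string, which is why storing one strand loses no information.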

  • We can routinely determine small parts of DNA

    1977-1990 – 500 bp, manual tracking

    1990-2000 – 500 bp, computational tracking, 1D, “capillary”

    2005-2012 – 20-100 bp, 2D systems (“2nd Generation” or NGS)

    2012-?? – >5 kb, real time, “3rd Generation”

    Fred Sanger, inventor of terminator DNA sequencing

  • Costs have come down exponentially

  • Volumes have gone up!

    [Figure: bases (log scale, 1e5 to 1e14) against date (1980 to 2015), with series for capillary reads, assembled sequences and next-gen. reads]

    See blog posts about data-specific compression

    Similar explosion of image-based methods

  • A genome is all our DNA

    Every cell has two copies of 3e9 bp (one from mum, one from dad) in 24 polymers (“chromosomes”)

    E. coli: 4e6; Yeast: 12e6; Medaka: 0.9e9; White Pine: 20e9
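To give a feel for these sizes as data, here is a small back-of-the-envelope sketch. It assumes the simplest possible packing of 2 bits per base (four letters, no compression, no quality scores), which is an assumption made for this illustration rather than anything stated in the talk; the genome sizes are the ones quoted on the slide.

```python
# Genome sizes in base pairs (one copy), as quoted on the slide.
GENOME_SIZES_BP = {
    "E. coli": 4e6,
    "Yeast": 12e6,
    "Medaka": 0.9e9,
    "Human": 3e9,
    "White Pine": 20e9,
}

# Assumption for this sketch: A, C, G, T only, packed at 2 bits per base.
BITS_PER_BASE = 2


def packed_megabytes(n_bases: float) -> float:
    """Size in megabytes (1e6 bytes) of a sequence packed at 2 bits per base."""
    return n_bases * BITS_PER_BASE / 8 / 1e6


for species, size in GENOME_SIZES_BP.items():
    print(f"{species}: {packed_megabytes(size):,.0f} MB")
```

Under that assumption one copy of the human genome fits in 750 MB; real sequencing data is far larger because each base is read many times and stored with quality information.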

  • Human Genome project

    • 1989 – 2000 – sequencing the human genome

      • Just 1 “individual” – actually a mosaic of about 24 individuals, but treated as if it was one

      • Old school technologies

      • A bit epic

    • Now

      • Same data volume generated in ~3 mins in a current large scale centre

      • It’s all about the analysis

  • What happened next?

  • We looked into human variation

    3 in 10,000 bases between any two individuals are different (a bit more between Africans)

    The similarity of a European to an African (any population) is only marginally smaller than European to European (2 or 3%). Only a minute amount of DNA is unique to any population
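As a rough worked example of what "3 in 10,000" means in absolute terms, the sketch below multiplies that rate by the 3e9 bp genome size quoted earlier in the talk. It treats the rate as uniform along the genome, which is a simplifying assumption made here.

```python
GENOME_BP = 3_000_000_000  # one copy of the human genome, from the earlier slide
DIFF_RATE = 3 / 10_000     # differing bases between two individuals, from this slide


def expected_differences(genome_bp: int, rate: float) -> int:
    """Expected number of differing bases, assuming a uniform rate."""
    return round(genome_bp * rate)


print(expected_differences(GENOME_BP, DIFF_RATE))
```

That is on the order of 900,000 differing positions per genome copy between two people.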

  • … and associate this with traits or disease

    (you can infer the majority of the genome by knowing about 1 base every 5,000 to 10,000 bases – the experiments to look at this density are far cheaper than sequencing)

  • ENCODE Dimensions

    [Figure: grid of experiments, Methods/Factors against Cells]

    182 Cell Lines/Tissues

    164 Assays (114 different ChIP)

    3,010 Experiments

    5 TeraBases

    1716x of the Human Genome

  • ENCODE Uniform Analysis Pipeline

    Mapped reads from production (BAM)

    Uniform Peak Calling Pipeline (SPP, PeakSeq)

    IDR Processing, QC and Blacklist Filtering

    Motif Discovery Stats, GSC enrichments, etc.

    Signal Aggregation over peaks

    Signal Generation (read extension and mappability correction)

    Segmentation

    [Figure: replicate comparison, Rep1 against Rep2, contrasting poor and good reproducibility]

    Self Organising Maps

    ChromHMM/Segway

    Anshul Kundaje, Qunhua Li, Michael Hoffman, Jason Ernst, Joel Rozowsky, Pouya Kheradpour
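One step named in the pipeline above, blacklist filtering, is simple enough to sketch: peaks that overlap a known problematic region are dropped. This is not the ENCODE pipeline's code; it is a minimal illustration that assumes peaks and blacklist regions are half-open `(chromosome, start, end)` tuples, and the names `overlaps` and `filter_blacklisted` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout for this sketch: (chromosome, start, end), half-open.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals are on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only the peaks that overlap no blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


peaks = [("chr6", 100, 200), ("chr6", 500, 600), ("chr1", 150, 250)]
blacklist = [("chr6", 180, 300)]
kept = filter_blacklisted(peaks, blacklist)
print(kept)
```

The all-pairs comparison is fine for a toy example; at genome scale one would sort the intervals or use an interval index instead.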

  • Discovering functional genome segments

    Well understood: TSS, Gene Start, Gene Bodies

    Reassuringly Interesting: “Enhancers” (2 states), Insulators

    Definitely There, Unexpected: Specific Gene End, Sub-classification of Repeats

    ~7 Major flavours of genome, 25 “elaborations”, 1,000s of details

    Michael Hoffman, Jason Ernst, Bill Noble, Manolis Kellis

  • Impact on Medicine

  • 3 big areas of impact for medicine

    Germ line risk to disease

    “Precision” cancer medicine

    Pathogens + hospital-acquired infections

  • Germ Line impact

    • Everyone has differential risk of disease

    • But the shift in risk is small

      • Perhaps 1 to 2% have a striking change in risk to a serious disease (>10 fold) which is “actionable”

      • This goes up to 3-4% if you count some less clinically worrying diseases

    1:500 people have HCM

    1:500 people have FH

  • Precision cancer diagnosis

    • Cancer is a genomic disease

    • By sequencing a cancer you can understand its molecular form better

    • Particular molecular forms respond to particular drugs

  • Pathogens

    • Sequencing provides a clear cut diagnosis of pathogens

    • Can also be used to sequence environments (eg, hospitals)

    • Immune systems for hospitals

  • Why we need an infrastructure…

  • Infrastructure components for ENCODE

    • ~300 TB of space

      • (small compared to 1,000 genomes/Cancer)

    • >5 years of CPU

      • (EBI 10,000 core farm critical in turn around times)

    • High bandwidth connectivity + Aspera

      • Easier to connect Seattle and Santa Cruz via EBI (!)

  • Infrastructures are critical…

  • But we only notice them when they go wrong

  • Biology already needs an information infrastructure

    • For the human genome

      • (…and the mouse, and the rat, and… x 150 now, 1000 in the future!) – Ensembl

    • For the function of genes and proteins

      • For all genes, in text and computational – UniProt and GO

    • For all 3D structures

      • To understand how proteins work – PDBe

    • For where things are expressed

      • The differences and functionality of cells – Atlas

  • …But this keeps on going…

    • We have to scale across all of (interesting) life

      • There are a lot of species out there!

    • We have to handle new areas, in particular medicine

      • A set of European haplotypes for good imputation

      • A set of actionable variants in germline and cancers

    • We have to improve our chemical understanding

      • Of biological chemicals

      • Of chemicals which interfere with biology

  • How?

    Fully Centralised

    Pros: Stability, reuse, learning ease

    Cons: Hard to concentrate expertise across life science; geographic and language placement; bottlenecks and lack of diversity

    Fully Distributed

    Pros: Responsive; geographically and language responsive

    Cons: Internal communication overhead; harder for end users to learn; harder to provide multi-decade stability

  • How?

    Robust network with a strong hub

    General Hub (EBI)

    Node

    “Domain Specific hub”

    “National”

  • ELIXIR’s mission

    To build a sustainable European infrastructure for biological information, supporting life science research and its translation to:

    • medicine

    • environment

    • bioindustries

    • society

  • Overlapping Networks

  • Other infrastructures needed for biology

    • EuroBioImaging

      • Cellular and whole organism imaging

    • BioBanks (BBMRI)

      • We need numbers – European populations – in particular for rare diseases, but also for specific sub types of common disease

    • Mouse models and phenotypes (Infrafrontier)

      • A baseline set of knockouts and phenotypes in our most tractable mammalian model

      • (it’s hard to prove something in human)

    • Robust molecular assays in a clinical setting (EATRIS)

      • The ability to reliably use state of the art molecular techniques in a clinical research setting

  • (and don’t tweet this bit)

    And… just for fun…

  • Over a beer…

    Ha! At some point all the data we store is going to be DNA…

    Of course, the cost effective way to store this would be as DNA…

  • 1 g == 1 PB (with redundancy)

  • Scaleable? Cost effective?
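To make the idea of storing data as DNA concrete, here is a deliberately naive sketch that maps every 2 bits of a byte to one base and back again. It is not the scheme behind the "1 g == 1 PB" figure: it has no redundancy or error correction, and the 00/01/10/11 to A/C/G/T mapping is an assumption chosen for illustration.

```python
# Assumed mapping for this sketch: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, four bases per byte, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(dna: str) -> bytes:
    """Invert encode(); expects a multiple of four A/C/G/T characters."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = b"Hi"
dna = encode(message)
print(dna, decode(dna))
```

A practical scheme would also need to cope with synthesis and sequencing errors, which is where the redundancy mentioned on the slide comes in.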

  • ENCODE Authors

    *** Overall Coordination *** Ian Dunham (1), Anshul Kundaje (2).

    *** Data Production Leads *** Shelley F. Aldred (3), Patrick J. Collins (3), Carrie A. Davis (4), Francis Doyle (5), Charles B. Epstein (6), Seth Frietze (7), Jennifer Harrow (8), Vishwanath R. Iyer (9), Rajinder Kaul (10), Jainab Khatun (11), Bryan R. Lajoie (12), Stephen G. Landt (13), Bum-Kyu Lee (9), Florencia Pauli (14), Kate R. Rosenbloom (15), Peter Sabo (16), Alexias Safi (17), Amartya Sanyal (12), Noam Shoresh (6), Jeremy M. Simon (18), Lingyun Song (17), Nathan D. Trinklein (3).

    *** Lead Analysts *** Robert C. Altshuler (19), Ewan Birney (1), James B. Brown (20), Chao Cheng (21), Sarah Djebali (22), Xianjun Dong (23), Ian Dunham (1), Jason Ernst (19), Terrence S. Furey (24), Mark Gerstein (21), Belinda Giardine (25), Melissa Greven (23), Ross C. Hardison (26), Robert S. Harris (25), Javier Herrero (1), Michael M. Hoffman (16), Sowmya Iyer (27), Manolis Kellis (19), Jainab Khatun (11), Pouya Kheradpour (19), Anshul Kundaje (2), Timo Lassmann (28), Qunhua Li (20), Xinying Lin (23), Angelika Merkel (29), Ali Mortazavi (30), Stephen C. J. Parker (31), Timothy E. Reddy (14), Joel Rozowsky (21), Felix Schlesinger (4), Bob Thurman (16), Jie Wang (23), Lucas D. Ward (19), Troy W. Whitfield (23), Steven P. Wilder (1), Weisheng Wu (25), Hualin S. Xi (32), Kevin (Yuk-Lap) Yip (21), Jiali Zhuang (23).

    *** Writing Group *** Bradley E. Bernstein (33), Ewan Birney (1), Ian Dunham (1), Eric D. Green (34), Chris Gunter (14), Michael Snyder (35).

    *** NHGRI Project Management *** Leslie B. Adams (36), Laura A.L. Dillon (36), Elise A. Feingold (36), Peter J. Good (36), Eric D. Green (34), Caroline J. Kelly (36), Rebecca F. Lowdon (36), Michael J. Pazin (36), Judith R. Wexler (36), Julia Zhang (36).

    *** Principal Investigators *** Bradley E. Bernstein (33), Ewan Birney (1), Gregory E. Crawford (37), Job Dekker (12), Laura Elnitski (38), Peggy J. Farnham (7), Mark Gerstein (21), Morgan C. Giddings (11), Thomas R. Gingeras (39), Eric D. Green (34), Roderic Guigó (40), Ross C. Hardison (26), Timothy J. Hubbard (8), Manolis Kellis (19), W. James Kent (15), Jason D. Lieb (18), Elliott H. Margulies (31), Richard M. Myers (14), Michael Snyder (35), John A. Stamatoyannopoulos (16), Scott A. Tenenbaum (5), Zhiping Weng (23), Kevin P. White (41), Barbara Wold (42).

    *** Broad Institute Group *** Bradley E. Bernstein (33), Charles B. Epstein (6), Noam Shoresh (6), Pouya Kheradpour (19), Jason Ernst (19), Tarjei S. Mikkelsen (6), Shawn Gillespie (43), Alon Goren (33), Oren Ram (33), Xiaolan Zhang (6), Li Wang (6), Robbyn Issner (6), Michael J. Coyne (6), Timothy Durham (6), Manching Ku (33), Thanh Truong (6), Lucas D. Ward (19), Robert C. Altshuler (19), Matthew Eaton (19), Manolis Kellis (19).

    *** Boise State University Proteomics Group *** Jainab Khatun (11), Yanbao Yu (44), John Wrobel (11), Brian A. Risk (11), Harsha Gunawardena (44), Heather C. Kuiper (44), Christopher W. Maier (44), Ling Xie (44), Xian Chen (44), Morgan C. Giddings (11).

    *** Data Coordination Center *** Katrina Learned (15), Venkat S. Malladi (15), Kate R. Rosenbloom (15), Cricket A. Sloan (15), Matthew C. Wong (15), Galt P. Barber (15), Melissa S. Cline (15), Timothy R. Dreszer (15), Steven G. Heitner (15), Donna Karolchik (15), W. James Kent (15), Vanessa M. Kirkup (15), Laurence R. Meyer (15), Jeffrey C. Long (15), Morgan Maddren (15), Brian J. Raney (15).

    *** Sanger Institute, Washington University, Yale University, Center for Genomic Regulation, Barcelona, UCSC, MIT, University of Lausanne, CNIO Group *** Bronwen Aken (8), Roger P. Alexander (21), Suganthi Balasubramanian (21), Daniel Barrell (8), Gemma Barson (8), Andrew Berry (8), Nitin Bhardwaj (21), Alexandra Bignell (8), Veronika Boychenko (8), Michael Brent (45), Giovanni Bussotti (22), Chao Cheng (21), Jacqueline Chrast (46), Claire Davidson (8), Thomas Derrien (47), Gloria Despacio-Reyes (8), Mark Diekhans (48), Jason Ernst (19), Iakes Ezkurdia (49), Julio Fernandez Banet (8), Adam Frankish (8), Mark Gerstein (21), James Gilbert (8), Jose Manuel Gonzalez (8), Ed Griffiths (8), Roderic Guigó (40), Lukas Habegger (21), Jennifer Harrow (8), Rachel Harte (48), David Haussler (50), Cédric Howald (51), Timothy J. Hubbard (8), Toby Hunt (8), Mike Kay (8), Manolis Kellis (19), Pouya Kheradpour (19), Ekta Khurana (21), Felix Kokocinski (8), Jing (Jane) Leng (21), Michael F. Lin (19), Lucas Lochovsky (21), Jane Loveland (8), Zhi Lu (52), Deepa Manthravadi (8), Marco Mariotti (22), Renqiang Min (21), Xinmeng (Jasmine) Mu (21), Jonathan Mudge (8), Gaurab Mukherjee (8), Cedric Notredame (22), Baikang Pei (21), Alexandre Reymond (51), Jose Manuel Rodriguez (49), Joel Rozowsky (21), Gary Saunders (8), Andrea Sboner (53), Stephen Searle (8), Cristina Sisu (21), Catherine Snow (8), Charlie Steward (8), Andrea Tanzer (54), Electra Tapanari (8), Michael L. Tress (49), Alfonso Valencia (49), Marijke J. van Baren (55), Nathalie Walters (46), Laurens Wilming (8), Koon-Kiu Yan (21), Kevin (Yuk-Lap) Yip (21), Amonida Zadissa (8), Zhengdong Zhang (56), Arif Harmanci (21), Alexej Abyzov (21).

    *** Genome Institute of Singapore Group ***

    *** HudsonAlpha Institute, Caltech, Stanford Group *** Devin M. Absher (14), Henry Amrhein (59), Michael Anaya (42), Anita Bansal (14), Serafim Batzoglou (2), Kevin M. Bowling (14), Marie K. Cross (14), Nicholas S. Davis (14), Tracy Eggleston (14), Clarke Gasper (42), Jason Gertz (14), DeSalvo Gilberto (59), Chris Gunter (14), Preti Jain (14), Brandon King (59), Anshul Kundaje (2), Shawn E. Levy (14), Max W. Libbrecht (2), Georgi K. Marinov (59), Kenneth McCue (59), Sarah K. Meadows (14), Ali Mortazavi (30), Michael A. Muratet (14), Richard M. Myers (14), Amy S. Nesmith (14), J. Scott Newberry (14), Kimberly M. Newberry (14), Stephanie L. Parker (14), E. Christopher Partridge (14), Florencia Pauli (14), Shirley Pepke (60), Barbara Pusey (14), Timothy E. Reddy (14), Arend Sidow (61), Diane Trout (59), Katherine E. Varley (14), Jost Vielmetter (42), Brian A. Williams (59), Barbara Wold (42).

    *** University of Massachusetts Medical School Genome Folding Group *** Job Dekker (12), Gaurav Jain (12), Bryan R. Lajoie (12), Amartya Sanyal (12).

    *** University of Massachusetts Medical School Weng Group *** Zhiping Weng (23), Troy W. Whitfield (23), Jie Wang (23), Patrick J. Collins (3), Shelley F. Aldred (3), Nathan D. Trinklein (3), E. Christopher Partridge (14), Richard M. Myers (14).

    *** NHGRI Groups *** Laura Elnitski (38), Elliott H. Margulies (31), Stephen C. J. Parker (31), Hanna M. Petrykowska (38).

    *** Duke University, EBI, University of Texas, Austin, University of North Carolina-Chapel Hill Group *** Gregory E. Crawford (37), Terrence S. Furey (24), Paul G. Giresi (18), Linda L. Grasfeder (18), Vishwanath R. Iyer (9), Damian Keefe (1), Seul Ki Kim (18), Min Jae Kim (18), Bum-Kyu Lee (9), Jason D. Lieb (18), Zheng Liu (9), Darin London (17), Ryan M. McDaniell (9), Joanna O. Mieczkowska (18), Piotr A. Mieczkowski (62), Yunyun Ni (9), Naim U. Rashid (63), Alexias Safi (17), Matthew R. Schaner (18), Nathan C. Sheffield (17), Christopher Shestak (18), Kimberly A. Showers (18), Jeremy M. Simon (18), Lingyun Song (17), Tianyuan Wang (17), Deborah Winter (17), Zhancheng Zhang (24), Zhuzhu Z. Zhang (18), Ewan Birney (1), Akshay A. Bhinge (9), Anna Battenhouse (9), Sheera Adar (18).

    *** Lawrence Berkeley National Laboratory Group *** Matthew J. Blow (64), Axel Visel (64), Len A. Pennachio (65).

    *** Cold Spring Harbor, University of Geneva, Center for Genomic Regulation, Barcelona, RIKEN, University of Lausanne, Genome Institute of Singapore Group *** Sarah Djebali (22), Carrie A. Davis (4), Angelika Merkel (29), Alex Dobin (4), Timo Lassmann (28), Ali Mortazavi (30), Andrea Tanzer (54), Julien Lagarde (22), Wei Lin (4), Felix Schlesinger (4), Chenghai Xue (4), Georgi K. Marinov (59), Jainab Khatun (11), Brian A. Williams (59), Chris Zaleski (4), Joel Rozowsky (21), Maik Röder (22), Felix Kokocinski (8), Rehab F. Abdellhamid (66), Tyler Alioto (67), Igor Antoshechkin (59), Michael T. Baer (4), Philippe Batut (4), Ian Bell (68), Kimberly Bell (4), Sudipto Chakrabortty (4), Xian Chen (44), Jacqueline Chrast (46), Joao Curado (22), Thomas Derrien (47), Jorg Drenkow (4), Erica Dumais (68), Jackie Dumais (68), Radah Duttagupta (68), Megan Fastuca (4), Kata Fejes-Toth (69), Pedro Ferreira (22), Sylvain Foissac (68), Melissa J. Fullwood (58), Hui Gao (68), David Gonzalez (22), Assaf Gordon (4), Harsha Gunawardena (44), Cédric Howald (51), Sonali Jha (4), Rory Johnson (22), Philipp Kapranov (68), Brandon King (59), Colin Kingswood (70), Guoliang Li (71), Oscar J. Luo (57), Eddie Park (30), Jonathan B. Preall (4), Kimberly Presaud (4), Paolo Ribeca (67), Brian A. Risk (11), Daniel Robyr (72), Xiaoan Ruan (57), Michael Sammeth (67), Kuljeet Singh Sandhu (57), Lorain Schaeffer (59), Lei-Hoon See (4), Atif Shahab (57), Jorgen Skancke (22), Ana Maria Suzuki (66), Hazuki Takahashi (66), Hagen Tilgner (73), Diane Trout (59), Nathalie Walters (46), Huaien Wang (4), John Wrobel (11), Yanbao Yu (44), Yoshihide Hayashizaki (66), Jennifer Harrow (8), Mark Gerstein (21), Timothy J. Hubbard (8), Alexandre Reymond (51), Stylianos E. Antonarakis (72), Gregory J. Hannon (4), Morgan C. Giddings (11), Yijun Ruan (57), Barbara Wold (42), Piero Carninci (66), Roderic Guigó (40), Thomas R. Gingeras (39).

    *** University of Washington, University of Massachusetts Medical Center Group *** Job Dekker (12), Morgan J. Diegel (16), Erica Gist (16), R. Scott Hansen (74), Eric Haugen (16), Richard A. Humbert (16), Gaurav Jain (12), Audra K Johnson (16), Rajinder Kaul (10), Tattyana V. Kutyavin (16), Bryan R. Lajoie (12), Kristen Lee (16), Patrick Navas (75), Shane J. Neph (16), Fiedencio V. Neri (16), Alex Reynolds (16), Eric Rynes (16), Peter Sabo (16), Richard S. Sandstrom (16), Daniel L. Bates (16), Theresa K. Canfield (16), Amartya Sanyal (12), Anthony O. Shafer (16), George Stamatoyannopoulos (75), John A. Stamatoyannopoulos (16), Sean Thomas (16), Bob Thurman (16), Shinny Vong (16), Hao Wang (16), Molly A. Weaver (16).

    *** Stanford-Yale, Harvard, University of Massachusetts Medical School, University of Southern California/UC Davis Group *** Alexej Abyzov (21), Nick Addleman (13), Roger P. Alexander (21), Raymond K. Auerbach (76), Suganthi Balasubramanian (21), Keith Bettinger (13), Nitin Bhardwaj (21), Alan P. Boyle (13), Alina R. Cao (77), Philip Cayting (13), Alexandra Charos (78), Chao Cheng (21), Yong Cheng (13), Catharine Eastman (13), Ghia Euskirchen (13), Peggy J. Farnham (7), Joseph D. Fleming (79), Seth Frietze (7), Mark Gerstein (21), Fabian Grubert (13), Lukas Habegger (21), Manoj Hariharan (13), Arif Harmanci (21), Sushma Iyengar (80), Victor X. Jin (81), Konrad J. Karczewski (13), Maya Kasowski (13), Ekta Khurana (21), Phil Lacroute (13), Hugo Lam (13), Nathan Lamarre-Vincent (79), Stephen G. Landt (13), Jing (Jane) Leng (21), Jin Lian (82), Marianne Lindahl-Allen (79), Lucas Lochovsky (21), Zhi Lu (52), Renqiang Min (21), Benoit Miotto (79), Hannah Monahan (78), Zarmik Moqtaderi (79), Xinmeng (Jasmine) Mu (21), Henriette O'Geen (77), Zhengqing Ouyang (13), Dorrelyn Patacsil (13), Baikang Pei (21), Debasish Raha (78), Lucia Ramirez (13), Brian Reed (78), Joel Rozowsky (21), Andrea Sboner (53), Minyi Shi (13), Cristina Sisu (21), Teri Slifer (13), Michael Snyder (35), Kevin Struhl (79), Sherman M. Weissman (82), Heather Witt (83), Linfeng Wu (13), Xiaoqin Xu (77), Koon-Kiu Yan (21), Xinqiong Yang (13), Kevin (Yuk-Lap) Yip (21), Zhengdong Zhang (56).

    *** University of Albany SUNY Group *** Scott A. Tenenbaum (5), Luiz O. Penalva (84), Francis Doyle (5).

    *** University of Chicago, Stanford Group *** Subhradip Karmakar (41), Stephen G. Landt (13), Raj R. Bhanvadia (41), Alina Choudhury (41), Marc Domanus (41), Lijia Ma (41), Jennifer Moran (41), Dorrelyn Patacsil (13), Teri Slifer (13), Alec Victorsen (41), Xinqiong Yang (13), Michael Snyder (35), Kevin P. White (41).

    *** University of Heidelberg Group *** Thomas Auer (85), Lazaro Centanin (85), Michael Eichenlaub (85), Franziska Gruhl (85), Stephan Heermann (85), Daigo Inoue (85), Tanja Kellner (85), Stephan Kirchmaier (85), Claudia Mueller (85), Robert Reinhardt (85), Lea Schertel (85), Stephanie Schneider (85), Rebecca Sinn (85), Beate Wittbrodt (85), Jochen Wittbrodt (85).

    *** Data Analysis Center *** Ian Dunham (1), Kathryn Beal (1), Alvis Brazma (86), Paul Flicek (1), Javier Herrero (1), Nathan Johnson (1), Damian Keefe (1), Margus Lukk (86), Nicholas M. Luscombe (87), Daniel Sobral (1), Juan M. Vaquerizas (87), Steven P. Wilder (1), Serafim Batzoglou (2), Arend Sidow (61), Nadine Hussami (2), Sofia Kyriazopoulou-Panagiotopoulou (2), Max W. Libbrecht (2), Marc A. Schaub (2), Anshul Kundaje (2), Ross C. Hardison (26), Webb Miller (25), Belinda Giardine (25), Robert S. Harris (25), Weisheng Wu (25), Peter J. Bickel (20), Balazs Banfai (20), Nathan P. Boley (20), James B. Brown (20), Haiyan Huang (20), Qunhua Li (20), Jingyi Jessica Li (88), William Stafford Noble (89), Jeffrey A. Bilmes (90), Orion J. Buske (16), Michael M. Hoffman (16), Avinash D. Sahu (16), Peter V. Kharchenko (91), Peter J. Park (91), Zhiping Weng (23), Sowmya Iyer (27), Xianjun Dong (23), Melissa Greven (23), Xinying Lin (23), Jie Wang (23), Hualin S. Xi (32), Jiali Zhuang (23), Alexej Abyzov (21), Roger P. Alexander (21), Suganthi Balasubramanian (21), Nitin Bhardwaj (21), Chao Cheng (21), Lukas Habegger (21), Arif Harmanci (21), Ekta Khurana (21), Jing (Jane) Leng (21), Lucas Lochovsky (21), Zhi Lu (52), Renqiang Min (21), Xinmeng (Jasmine) Mu (21), Baikang Pei (21), Andrea Sboner (53), Cristina Sisu (21), Koon-Kiu Yan (21), Kevin (Yuk-Lap) Yip (21), Mark Gerstein (21), Joel Rozowsky (21), Zhengdong Zhang (56), Ewan Birney (1).


  • (you can follow me on twitter @ewanbirney)

    I blog and update this on Google Plus publicly

  • Summary of ENCODE

    • The majority of the genome participates in a biochemical event

    • A substantial portion (>25%) of the genome participates in a chromatin related event

    • There is a complex and better understood (quantitative) TF/Chromatin/RNA relationship, both in promoters and splicing

    • TFs co-associate in complex, combinatoric ways

    • We can classify the genome into 7 very broad classes of states, and 1,000s of microstates

      • We can show experimentally and computationally that this is informative

    • We can inform >50% of the non coding GWAS hits in the NHGRI Catalog

    • Somatic variants are, in bulk, less common in ENCODE functional regions than the rest of the genome

    • Come and play with the data!

  • The ENCODE Consortium

    Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)

    Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)

    Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)

    Jim Kent (David Haussler, Kate Rosenbloom)

    John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)

    Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)

    Rick Myers (Barbara Wold)

    Scott Tenenbaum (Luiz Penalva)

    Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo) 

    Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki) 

    Zhiping Weng (Nathan Trinklein, Rick Myers)

    Additional ENCODE Participants: Elliott Margulies, Eric Green, Job Dekker, Laura Elnitski, Len Pennachio, Jochen Wittbrodt

    .. and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups

    NHGRI: Elise Feingold, Mike Pazin, Peter Good