prp.ucsd.edu/presentations/2015-workshop/2015_pacific...
TRANSCRIPT
A multi-dimensional approach to studying multiple sclerosis pathogenesis
[Figure: from genotype to phenotype, low to high level: genomic, expression, proteins, pathways; cellular process (cell cycle, cell differentiation, apoptosis); physiological/pathological process (health/disease): autoimmunity, cancer, metabolic, neurodegenerative, infectious, metagenomic]
Challenges in data acquisition and analysis in the biological sciences
• Democratization of equipment
• Distributed data generation
• Cheaper per byte (?)
• Individualized analysis
• PRP will enable a new mode of collaborating for health sciences.
Current projects
• Genomics
  – GWAS
  – WGA
  – RNA expression
• Endophenotypes
  – Clinical
  – Imaging
    • MRI
    • OCT
• Microbiome
  – Human
  – Experimental
• Heterogeneous networks
  – Drug repurposing
  – Genetic architecture
• Genomics
  – GWAS (80,000 samples × 2M variants) -> 160B datapoints
    • IMSGC: collaboration with 40+ labs
  – WGA (500 genomes) -> 125 Tb
    • Own effort. Potential IMSGC involvement
  – RNA expression (~1000 samples) -> 10 Tb
    • Own effort. EPIC Project
• Endophenotypes
  – Clinical (24/7 monitoring, wearables, etc.)
  – Imaging
    • MRI (5000)
    • OCT (3000)
• Microbiome (collaboration with R. Knight)
  – Human (4000 16S, 500 metagenomics)
  – Experimental (?)
• Heterogeneous networks
  – Drug repurposing
  – Genetic architecture of diseases
6M edges
50,000 nodes
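The sizes quoted on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the "50,000 nodes" and "6M edges" labels describe the heterogeneous network as an undirected simple graph, which the slide does not state.

```python
# Back-of-the-envelope checks for the dataset sizes quoted on the slide.
# Function names are hypothetical; the graph is ASSUMED undirected and simple.

def gwas_datapoints(samples: int, variants: int) -> int:
    """One genotype call per (sample, variant) pair."""
    return samples * variants


def mean_degree(nodes: int, edges: int) -> float:
    """Average number of edges per node in an undirected graph."""
    return 2 * edges / nodes


def density(nodes: int, edges: int) -> float:
    """Fraction of all possible undirected node pairs that are connected."""
    possible_pairs = nodes * (nodes - 1) / 2
    return edges / possible_pairs


if __name__ == "__main__":
    # 80,000 samples x 2M variants, as stated for the GWAS dataset
    print(f"GWAS datapoints: {gwas_datapoints(80_000, 2_000_000):,}")
    # 50,000 nodes and 6M edges, as labeled on the slide
    print(f"Mean degree: {mean_degree(50_000, 6_000_000):.0f}")
    print(f"Density: {density(50_000, 6_000_000):.4%}")
```

Under these assumptions the GWAS product matches the 160B figure on the slide, and the network averages 240 edges per node while remaining sparse (under 1% of possible pairs).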
Current data transfer speeds
• Data transfer frequency: ad hoc
• Within building (10 Gbps)
• From lab to QB3 cluster (10 Gbps)
  – QB3 cluster (~5000 cores)
• From lab to EC2 (20 Mbps)
QB3 cluster
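To make the gap between these link speeds concrete, the following sketch estimates ideal transfer times for the dataset sizes mentioned earlier. It rests on assumptions not stated in the slides: "Tb" is read as terabytes (10^12 bytes), the link is fully utilized with no protocol overhead, and the helper name is hypothetical.

```python
# Idealized transfer-time estimate: time = bits to send / link rate.
# ASSUMPTIONS: "Tb" on the slides means terabytes (10**12 bytes), and the
# link runs at its nominal rate with no overhead. Real transfers are slower.

TB = 10**12          # bytes per terabyte (decimal)
GBPS = 10**9         # bits per second
MBPS = 10**6         # bits per second


def transfer_time_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Seconds needed to move size_bytes over a link at the given bit rate."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    datasets = {"WGA (125 Tb)": 125 * TB, "RNA expression (10 Tb)": 10 * TB}
    links = {"lab to QB3 (10 Gbps)": 10 * GBPS, "lab to EC2 (20 Mbps)": 20 * MBPS}
    for data_name, size in datasets.items():
        for link_name, rate in links.items():
            days = transfer_time_seconds(size, rate) / 86_400
            print(f"{data_name} over {link_name}: {days:.1f} days")
```

Under these assumptions, the whole-genome dataset would take a little over a day at 10 Gbps but well over a year at 20 Mbps, which illustrates why the lab-to-EC2 link is the bottleneck.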