how hpc meets big data · how hpc meets big data ... data from the sequencer need to be served to...

30

Upload: others

Post on 28-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 2: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 3: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable

eXact lab - ICTP Data Day 3

how HPC meets big data

● HPDA: High Performance Data Analysis● tasks involving sufficient data volumes and

algorithm complexity to require HPC resources

● Use cases● climate modeling● risk analysis● national security● life science

Page 4: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable

eXact lab - ICTP Data Day 4

use case: DNA sequencing

● In 2006 the XPRIZE Foundation offered $10 million to the first team that“... could sequence 100 whole human genomes at a cost

of $10000 or less per genome, in 30 days or less ...”

● from September 5, 2013 to October 5

➔ HURRY UP!!

Page 5: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable

eXact lab - ICTP Data Day 5

use case: DNA sequencing

● In 2006 the XPRIZE Foundation offered $10 million to the first team that“... could sequence 100 whole human genomes at a cost

of $10000 or less per genome, in 30 days or less ...” ● from September 5, 2013 to October 5

➔ XPRIZE canceled the price on August 22"... genome sequencing technology is plummeting in cost and increasing in speed independent of our competition. Today, companies can do this for less than $5,000 per

genome, in a few days or less ..."

Page 6: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 7: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 8: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 9: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable

eXact lab - ICTP Data Day 9

Customer needs analysis

● Huge amount of genomic data from Illumina Hi-Seq 2000

● To backup (~20k € per run)● To post-process● Always available

➔ Data from the sequencer need to be served to the computational infrastructure

➔ Need for a fast, high performance, highly scalable file system, with robust failover and recovery mechanisms

Page 10: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable

eXact lab - ICTP Data Day 10

Customer needs analysis

➔ Need for a fast, high performance, highly scalable file system, with robust failover and recovery mechanisms

➔ Lustre File System

➔ parallel and distributed

➔ high availability features

Page 11: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 13: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 14: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 15: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 16: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 17: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 18: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 19: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 20: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 21: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 22: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 23: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 24: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 25: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable

eXact lab - ICTP Data Day 25

The way to ensure MDT integrity

Page 26: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 27: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 28: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 29: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable
Page 30: how HPC meets big data · how HPC meets big data ... Data from the sequencer need to be served to the computational infrastructure Need for a fast, high performance, highly scalable