cifar


Upload: bill-st-arnaud

Post on 15-Jan-2015


DESCRIPTION

Developing a Canadian cyber-infrastructure strategy

TRANSCRIPT

Page 1: Cifar

Drowning in data

• The need to deal with and benefit from large quantities of data is not a new concept: it has been noted in many policy reports, particularly in the US and UK, over the past several years.

Source: Ian Foster, UoChicago

Page 2: Cifar

The Data Deluge

Genomic sequencing output doubles every 9 months

Climate model intercomparison project (CMIP) of the IPCC

2004: 36 TB

2012: 2,300 TB

1,330 molecular biology databases listed in Nucleic Acids Research (96 in Jan 2001)

MACHO et al.: 1 TB

Palomar: 3 TB

2MASS: 10 TB

GALEX: 30 TB

Sloan: 40 TB

Pan-STARRS: 40,000 TB

Source: Ian Foster, UoChicago
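
As a rough check on the growth figures above, the doubling time implied by the CMIP archive sizes (36 TB in 2004, 2,300 TB in 2012) can be estimated and set beside the 9-month doubling quoted for genomic sequencing. This is only a sketch that assumes steady exponential growth between the two data points; the function name `doubling_time` is illustrative and not from the slides.

```python
import math


def doubling_time(start_size, end_size, elapsed):
    """Doubling time, in the same units as `elapsed`.

    Assumes steady exponential growth from start_size to end_size,
    which is a simplification of how real archives grow.
    """
    if start_size <= 0 or end_size <= start_size or elapsed <= 0:
        raise ValueError("need 0 < start_size < end_size and elapsed > 0")
    return elapsed / math.log2(end_size / start_size)


# CMIP archive sizes as given on the slide: 36 TB (2004) and 2,300 TB (2012).
cmip_years = doubling_time(36, 2300, 2012 - 2004)
cmip_months = cmip_years * 12

print(f"CMIP archive doubling time: about {cmip_months:.0f} months")
print("Genomic sequencing output doubling time (from the slide): 9 months")
```

Under that assumption the CMIP archive doubled roughly every 16 months, which is slower than the sequencing figure but still a 64-fold increase in eight years.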

Page 3: Cifar

Big science has achieved big successes

LIGO: 1 PB data in last science run, distributed worldwide

ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs

OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010

• Robust production solutions
• Substantial teams and expense
• Sustained, multi-year effort
• Application-specific solutions, built on common technology

Source: Ian Foster, UoChicago

Page 4: Cifar

Growth in sensor networks and Citizen Science

Real Time Health Monitoring

Glacier Tracking

Smart Trash

Page 5: Cifar

NSF Vision

Page 6: Cifar

Critical Factors

Source: NSF

Page 7: Cifar

But small & medium science in Canada is struggling

• More data, more complex data
• Ad-hoc solutions
• Inadequate software, hardware
• Data plan mandates

Source: Ian Foster, UoChicago

Page 8: Cifar

Time-consuming tasks in science

• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …

Source: Ian Foster, UoChicago

Page 9: Cifar

SaaS services in action: The XSEDE vision

XUAS

Source: Ian Foster, UoChicago

Page 10: Cifar

The real cost of campus computing

Source: Christian Belady

Belady, C., “In the Data Center, Power and Cooling Costs More than IT Equipment it Supports”, Electronics Cooling Magazine (February 2007)

• HPC represents 15-20% of campus electrical energy at many Canadian universities*

• Closet clusters consume 5-10% of campus electricity*

• Universities collectively spending millions of dollars on capital cost and electrical energy of computing

* Studies undertaken by CANARIE of 4 universities: UBC, Dalhousie, Ottawa U, UoAlberta

Page 11: Cifar

Research Computing Pyramid

Source: Dan Reed, PCAST

[Figure: pyramid of research computing tiers]

• Petascale/Exascale/…
• National HPC infrastructure
• University HPC infrastructure
• Closet clusters
• Mobile/Desktop computing

Axis labels: "Data, data, data"; "Compute, compute, compute"; capable users, from 10^2 to 10^9

Annotations: Compute Canada; Role for cloud computing

Page 12: Cifar

USA & Europe programs: commercial clouds to support research

• US Government $200 million “Big Data for Research and Discovery” initiative: research universities, government labs and commercial cloud providers
– For example, the 1000 Genomes Project data is stored on Amazon with free access for researchers
– Grants are available to researchers to use Amazon tools to undertake computation

• European public-private clouds for research partnership: “European Cloud Partnership”
– CERN, European Space Agency, European Molecular Biology Laboratory plus several Internet companies

• Network organizations in the USA, UK, Netherlands, etc. are brokering commercial cloud services for research and education to significantly reduce costs

Page 13: Cifar

Other Canadian initiatives

• CANARIE + Compute Canada
– “Integrated Digital Infrastructure”
– Integrating networks and HPC

• Research directions being determined by the infrastructure?

• Workshop in Saskatoon in June

Page 14: Cifar

Questions for attendees

1. Should Canada pursue a research cyber-infrastructure and/or Big Data strategy?

2. Do we need an organization or leadership council to promote a cyber-infrastructure or Big Data strategy in Canada?

3. Given Canada is so far behind, should we partner with international groups such as XSEDE, NeCTAR, etc.?

4. Should we focus on those who need the most help – small and medium science in Canada?

5. Who should lead cyber-infrastructure in Canada? Researchers, infrastructure providers, funding councils, VPRs, CIOs, Government?

6. Is it the role of universities to operate 1 MW power plants and massive compute facilities that are identical to commercial facilities?