slide 1 steve lloyd culham - july 2006 the data deluge and the grid steve lloyd queen mary,...

40
Steve Lloyd Culham - July 2006 Slide 1 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Upload: alexa-mckinnon

Post on 27-Mar-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 1

The Data Deluge

and the GridSteve Lloyd

Queen Mary, University of LondonCulham July 2006

Page 2: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 2

Outline

• What is Data?• Where it comes from – e-

Science• The CERN LHC and Experiments• What is the Grid?• GridPP• Challenges ahead

Page 3: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 3

What is Data?

Anything that can be expressed as numbers

Raw Information → Numbers → Binary Digits

Pictures

Electrical Signals

Sound

Store amount of Red, Green and Blue

Store loudness at each time

Lots of Pictures + Sound = DVD Video

Store voltage or current

Text

Every character has a numerical code

Page 4: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 4

Digital DataNumbers are stored as Binary digits1 Bit = 0 or 11 Byte = 8 bits

Can store yes/no or on/offCan store numbers from 0 to 255(Enough for a character a-z, A-Z, *£$@<... )

25 = 0x128 + 0x64 + 0x32 + 1x16 +1x8 + 0x4 + 0x2 + 1x1 = 00011001

1 kiloByte = ~1,000 Bytes

Typical Word Document ~30kB

1 MegaByte = ~1,000,000 Bytes

A Floppy Disk ~1.4MBA CD ~700MB

1 GigaByte = ~1,000,000,000 Bytes Typical PC Hard Drive 120-

400 GB1 TeraByte = ~1,000,000,000,000 Bytes1 PetaByte = ~1,000,000,000,000,000 Bytes ~1.4 Million

CDs1 ExaByte = ~1,000,000,000,000,000,000 Bytes World Annual Information

Production

World Annual Book Production

Page 5: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 5

Data AnalysisWhat is done with data?

Nothing

Read it Listen to it

Watch it

Analyse it

23Read A

Read BC = A + BPrint C

5

Computer

Program

"Job"

Calculate how proteins fold

Calculate what the weather is going to do

Page 6: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 6

e-ScienceIn the UK this sort of activity has become known as "e-Science"

"e-Science will change the dynamic of the way Science is undertaken"

"Science increasingly done through distributed global collaborations enabled by the internet using very large data collections, terascale computing resources and high performance visualisation"

Dr John Taylor - Director General of Research Councils:

"e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it"

Page 7: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 7

AstronomyCrab Nebula

Optical

RadioInfra-red

X-ray

Jet in M87

HST optical

Gemini mid-IR

VLA radio

Chandra X-ray Virtual Observatories

Page 8: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 8

Earth Observation

1 TB/day

Ozone map

Ottawa

The Institute of Physics (Google

Earth)

Page 9: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 9

Bioinformatics

Page 10: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 10

HealthcareDynamic Brain Atlas

Breast Screening

Scanning

Remote Consultancy

Page 11: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 11

Collaborative Engineering

Real-timecollection

Multi-sourceData Analysis

Unitary Plan Wind Tunnel

Archival storage

Page 12: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 12

Digital CurationDigitization of almost anything

To create Digital Libraries and Museums

Page 13: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 13

The CERN LHC

4 Large Experiments

The world’s most powerful particle accelerator - 2007

Page 14: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 14

ALICE- heavy ion collisions, to create quark-gluon plasmas- 50,000 particles in each collision

LHCb- to study the differences between matter and antimatter- detect over 100 million b and b-bar mesons each year

ATLAS- General purpose- Origin of mass- Supersymmetry- 2,000 scientists from 34 countries

CMS- General purpose- 1,800 scientists from over 150 institutes

The Experiments

Page 15: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 15

7,000 tonnes 42m long22m wide22m high

2,000 Physicists150 Institutes34 Countries

ATLAS Detector

(About the height of a 5 storey building)

Page 16: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 16

Page 17: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 17

The Higgs• Primary objective of the

LHC -• What is the origin of Mass? • Is it the Higgs Particle?

Massless Particle – Travels at the speed of light

Low Mass Particle – Travels slower

High Mass Particle – Travels slower still

Page 18: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 18

Starting from this event…

We are looking for this “signature”Selectivity: 1 in 1013

Like looking for 1 person in a thousand world populations

Or for a needle in 20 million haystacks!

The LHC Data Challenge • ~100,000,000

electronic channels• 800,000,000 proton-

proton interactions per second

• 0.0002 Higgs per second

• 10 PBytes of data a year

• (10 Million GBytes = 14 Million CDs)

Page 19: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 19

LHC Computing RequirementsCPU Power (Reconstruction, Simulation, User Analysis etc) - 100,000 of today's PCs

Distributed Computing Solution – "The Grid"

'Tape' Storage20 PetaBytes (= 20 M GBytes)

Disk Storage – 2.5 PetaBytes (= 2.5 M GBytes)

Page 20: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 20

Web: Information Sharing

• Invented at CERN by Tim Berners-Lee

• Agreed protocols: HTTP, HTML, URLs

• Anyone can access information and post their own

• Quickly crossed over into public use

No. of

Inte

rnet

host

s (m

illio

ns)

Year

Page 21: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 21

Distributed Resource Sharing @Home

Projects• Uses home PCs to run

numerous calculations with dozens of variables.

• Distributed computing project, not a Grid

• Some @home projects– BBC Climate

Change ExperimentSETI @ Home

– FightAIDS@home

Distributed File SharingPeer To Peer Networks

Peer-to-peer network

• No centralised database of files• Legal problems with sharing

copyrighted material• Security problems

Page 22: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 22

SETI@home

• A distributed computing project - not really a Grid project

• You pull the data from them rather than they submit the job to you

Arecibo telescope in Puerto Rico

Users - 5,240,038

Results received – 1,632,106,991

Years of CPU Time – 2,121,057

Extraterrestrials found – 0

Page 23: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 23

The Grid

Ian Foster / Carl Kesselman:

"A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."

'Grid' means different things to different people

All agree it's a funding opportunity!

Page 24: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 24

Electricity GridAnalogy with the Electricity Power Grid

'Standard Interface'

Power Stations

Distribution Infrastructure

Page 25: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 25

Computing Grid

Computing and Data Centres

Fibre Optics of the Internet

Page 26: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 26

Middleware

MIDDLEWARE

CPUDisks, CPU etc

PROGRAMS

OPERATING SYSTEM

Word/Excel

Email/Web

Your Progra

mGames

CPUCluste

r

UserInterfac

eMachine

CPUCluste

r

CPUCluste

r

Resource Broker

Information Service

Single PC

Grid

DiskServer

Your Progra

m

Middleware is the Operating System of a distributed computing system

Replica CatalogueBookkeepin

g Service

Page 27: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 27

The Grid VisionFrom this:

To this:

Page 28: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 28

19 UK Universities, CERN and

CCLRC (RAL & Daresbury) Funded by PPARC:

GridPP1 2001-2004 “From Web to Grid”

GridPP2 2004-2007 “From Prototype to Production”

GridPP3 2007-2011 (proposed)“From Production to

Exploitation”

Who are GridPP?

Developed a working, highly functional Grid

Page 29: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 29

International Context

LHC Computing Grid (LCG)Grid Deployment Project for

LHC

EU Enabling Grids for e-Science (EGEE) 2004-2008Grid Deployment Project for all disciplines

GridPP LCG

EGEE

GridPP is part of EGEE and LCG (currently the largest Grid in the world)

UK National Grid ServiceUK’s core production computational and data Grid

Open Science Grid (USA)Science applications from HEP to biochemistry

Nordugrid (Scandinavia)Grid Research and Development collaboration

Page 30: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 30

GridPP Middleware Development

Workload Management

Storage Interfaces

Network Monitoring

SecurityInformation Services

Grid Data Management

Page 31: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 31

What you need to use the Grid

1. Get a digital certificate (UK Certificate Authority)

2. Join a Virtual Organisation (VO)

3. Get access to a local User Interface Machine (UI) and copy your files and certificate there

Authentication – who you are

Authorisation – what you are allowed to do

4. Write some Job Description Language (JDL) and scripts to wrap your programs

############# HelloWorld.jdl #################Executable = "/bin/echo";Arguments = "Hello welcome to the Grid ";StdOutput = "hello.out";StdError = "hello.err";OutputSandbox = {"hello.out","hello.err"};#########################################

Page 32: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 32

How it works

Job Descriptio

n

UI Machine

Resource Broker

Input SandboxScript you

want to run

Other files (Job

Options, Source...)

Storage Element

Compute Element

Storage Element

Grid

Proxy Certificate

Output SandboxOutput

files (Plots, Logs...)

Input Data

Output Data

Job

Output Sandbo

x

Page 33: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 33

Tier StructureTier 0

Tier 1National centres

Tier 2Regional groups

Institutes

Workstations

Offline farm

Online system

CERN computer centre

RAL,UK

ScotGrid NorthGridSouthGrid London

FranceItalyGermanyUSA

Glasgow Edinburgh DurhamUseful model for Particle Physics but not necessary for others

Page 34: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 34

UK Tier-1 Centre at RAL

• High quality data services• National and International Role• UK focus for International Grid

development•1000 Dual CPU•200 TB Disk•220 TB Tape

(Capacity 1PB)Grid Operations Centre

Page 35: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 35

UK Tier-2 Centres

ScotGridDurham, Edinburgh, Glasgow NorthGridDaresbury, Lancaster, Liverpool,Manchester, Sheffield

SouthGridBirmingham, Bristol, Cambridge,Oxford, RAL PPD

LondonBrunel, Imperial, QMUL, RHUL, UCL

Mostly funded by HEFCE

Page 36: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 36

The Grid at Queen MaryThe Queen Mary e-Science

High Throughput Cluster

174 PCs (348 CPUs)40 TByte Disk Storage

Part of the London Tier-2 Centre

Currently adding another 285 PCs

Page 37: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 37

The LCG Grid Status Worldwide

182 Sites

23,438 CPUs

9.2 PB Disk

2,200 Years of CPU time

UK

21 Sites

4,482 CPUs

180 TB Disk

593 Years of CPU time

Page 38: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 38

"GridPP has been developed to help answer questions about the conditions in the Universe just after the Big Bang," said Professor Keith Mason, head of the Particle Physics and Astronomy Research Council (PPARC)."But the same resources and techniques can be exploited by other sciences with a more direct benefit to society."

Page 39: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 39

Future Challenges

(Ex-)Concorde(15 km)

CD stack with1 year LHC data(~ 20 km)

We are here(4 km)

• Scaling to full size ~20,000 → 100,000 CPUs

• Stability, Robustness etc• Security (Hackers Paradise!)• Sharing resources (in RAE

environment!)• International Collaboration• Increased Industrial take-up• Spread beyond Science• Continued funding beyond start of

LHC!

Page 40: Slide 1 Steve Lloyd Culham - July 2006 The Data Deluge and the Grid Steve Lloyd Queen Mary, University of London Culham July 2006

Steve Lloyd Culham - July 2006 Slide 40

Further Info …

http://www.gridpp.ac.uk

RSS News feed