“Cloud Computing and Big Data, the next frontier of innovation”
DESCRIPTION
Session on Big Data and Cloud organized by Fundación Areces on Thursday, March 21, in Madrid. Attached are the slides of my talk, “Cloud Computing and Big Data, the next frontier of innovation”. More information at http://www.jorditorres.org/jornada-el-impacto-de-la-nube-y-el-big-data-en-la-ciencia/
TRANSCRIPT
Cloud Computing and Big Data, the next frontier of innovation
Jordi Torres, UPC-BSC. Madrid, 21 March 2013
Summary A major breakthrough in science occurred centuries ago when mathematical theory made it possible to formalize experimentation. Another fundamental step in the advancement of science came with the advent of computers. Thanks to them, today we have powerful supercomputers whose simulations let us create scenarios that are expensive, dangerous, or even impossible to reproduce in real life: a true breakthrough for science and progress. Unfortunately, until now the power of supercomputing has not been within everyone's reach; it has been restricted to a limited set of research groups, due to the costs of building and maintaining such large infrastructures. With the advent of what is known as Cloud Computing, this situation is changing: many fields of science that until now could not benefit from this technology are becoming able to take advantage of it. However, now that the data available for these calculations has reached very large magnitudes, what is known as Big Data, current computing systems face new challenges that computer science itself has begun to address. This presentation discusses the characteristics of this new reality formed by the Cloud and Big Data, emphasizing the new challenges that must be addressed urgently in order to respond to the needs of the advancement of science.
HOW DID SCIENCE START?
Source: Prof. Mateo Valero, BSC-CNS 2010
HOW IS SCIENCE ADVANCING TODAY?
Source: Prof. Mateo Valero, BSC-CNS 2010
MATHEMATICAL CALCULATIONS?
WHERE?
MN3 (MareNostrum 3)
Compute:      8 cores/chip, 2 chips/node, 16 cores/node, 3,028 nodes, 48,448 total cores
Performance:  2.6 GHz, 20.8 Gflops/core, 332.8 Gflops/node, 1,000.0 total Tflops
Memory:       2 GB/core, 32 GB/node, 96.89 TB total
Network:      0.7 μs latency, 40 Gb/s bandwidth
Storage:      2,000 TB
Consumption:  1,080 kW
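As a quick sanity check of the figures above, a minimal sketch in Python; the 8 flops/cycle/core factor is an assumption inferred from 20.8 Gflops ÷ 2.6 GHz, not a number stated on the slide:

```python
# Sanity check of the MN3 peak-performance figures from the table above.
# flops_per_cycle is inferred from 20.8 / 2.6; it is an assumption.
freq_ghz = 2.6
flops_per_cycle = 8
cores_per_node = 16
nodes = 3028

gflops_core = freq_ghz * flops_per_cycle    # 20.8 Gflops/core
gflops_node = gflops_core * cores_per_node  # 332.8 Gflops/node
total_tflops = gflops_node * nodes / 1000   # ~1,007.7 Tflops, i.e. ~1 Pflops
total_cores = cores_per_node * nodes        # 48,448 cores
memory_tb = 32 * nodes / 1000               # ~96.9 TB

print(f"{gflops_core=} {gflops_node=} {total_tflops=:.1f}")
print(f"{total_cores=} {memory_tb=:.2f}")
```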
FOR SOME SPANISH RESEARCH GROUPS!
AND…
FOR THE REST OF THE WORLD?
Source: http://news.cnet.com/8301-13846_3-57349321-62/amazon-takes-supercomputing-to-the-cloud
GOOD NEWS!
CLOUD COMPUTING?
Source: http://www.wired.com/wiredenterprise/2011/12/nonexistent-supercomputer/all/1
28,000 m²
Source: http://www.facebook.com/media/set/?set=a.190842620965185.47008.140375289345252
40 MW
HUGE DATA CENTERS
Photo: Google
> 4 football pitches
Source: http://www.google.com/about/datacenters/gallery/images
A different way of producing IT
Photo: J.T.
On-demand self-service
Rapid elasticity
Ubiquitous access
…
Pay per use
CLOUD COMPUTING: IT as a service
Source: http://www.telegraph.co.uk/technology/reviews/9241719/Power-Ethernet-Sockets-review.html
Example of benefits (IaaS):
1 computer in a rack for 120 hours = 120 computers in three racks for 1 hour
Idea: Tutorial SC2011, Robert Grossman
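A minimal sketch of why this equivalence holds under pure pay-per-use pricing; the hourly price below is a made-up placeholder, not a real quote:

```python
# Under pay-per-use, both options cost the same; the second finishes
# 120x sooner. The hourly price is a hypothetical placeholder.
price_per_instance_hour = 0.10                 # assumed $/instance-hour

option_a = 1 * 120 * price_per_instance_hour   # 1 machine for 120 hours
option_b = 120 * 1 * price_per_instance_hour   # 120 machines for 1 hour

print(option_a, option_b)  # 12.0 12.0 -> same cost, 120x less wall-clock time
```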
AND DATA?
Source: http://www.docuciencia.es/2009/05/lhc-el-acelerador-de-particulas/
“… the LHC produces 1 PetaByte of data every second, big data and lack of computing resources were becoming the European Organization for Nuclear Research’s biggest IT challenges…”
Source: computerweekly.com/news/2240173897/CERN-adopts-OpenStack-private-cloud-to-solve-big-data-challenges
1 Gigabyte (GB) = 1,000,000,000 bytes
1 Terabyte (TB) = 1,000 GB
1 Petabyte (PB) = 1,000,000 GB
1 Exabyte (EB) = 1,000,000,000 GB
1 Zettabyte (ZB) = 1,000,000,000,000 GB
Source: The Economist, Feb 25th, 2010, http://www.economist.com/node/15579717
Deluge of data created daily
Big Data? definition?
BIG DATA? Big Data is data that exceeds the storing, processing and managing capacity of conventional systems. The reason is that the data is too big, moves too fast, or doesn’t fit the structures of our current systems’ architectures. Moreover, to gain value from this data, we must change the way we analyze it.
NEW CHALLENGES that must be addressed urgently in order to respond to the needs of the advancement of science:
1. Storing
2. Managing
3. Processing
4. Analyzing
1. Data processing challenges
Affordable storage. But scanning disks… assume 100 MB/sec: scanning 2 TB takes more than 5 hours (2,000,000 MB ÷ 100 MB/s = 20,000 s ≈ 5.5 h).
Assume 20,000 disks: scanning 2 TB takes 1 second.
Approach: massive parallelism.
Source: http://www.google.com/about/datacenters/gallery/images/_2000/IDI_018.jpg
Rethinking data processing is required: MapReduce, Storm, S4, … (see the sketch below)
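As a toy illustration of the MapReduce idea named above (not code from the talk): a map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums each group. Real frameworks distribute exactly these phases across thousands of disks and cores:

```python
# Minimal single-process sketch of the MapReduce model: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in the input."""
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values for each key: the word count."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big clouds", "big data moves fast"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(shuffle_phase(pairs)))  # {'big': 3, 'data': 2, ...}
```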
2. Data storage challenges
New storage technologies are required.
RAM vs HDD: HDD is 100× cheaper than RAM, but 1,000 times slower (a back-of-the-envelope comparison follows below).
Present solutions: solid-state drives (SSD), non-volatile.
Research: Storage Class Memory (SCM).
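A rough sketch of that trade-off; only the 100×-cheaper / 1,000×-slower ratios come from the slide, the absolute HDD figures are illustrative assumptions:

```python
# Back-of-the-envelope RAM vs HDD trade-off. Only the ratios (HDD ~100x
# cheaper, ~1000x slower) come from the slide; absolute numbers are assumed.
hdd_bandwidth_mb_s = 100                         # assumed HDD scan speed
hdd_cost_per_gb = 0.05                           # assumed HDD price, $/GB

ram_bandwidth_mb_s = hdd_bandwidth_mb_s * 1000   # ~1000x faster
ram_cost_per_gb = hdd_cost_per_gb * 100          # ~100x more expensive

data_mb = 2_000_000                              # 2 TB, as in the scan example
print(f"HDD scan: {data_mb / hdd_bandwidth_mb_s / 3600:.1f} h")  # ~5.6 h
print(f"RAM scan: {data_mb / ram_bandwidth_mb_s:.0f} s")         # ~20 s
```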
3. Data management challenges
Relational databases can’t support everything.
Example: eventual consistency. Solution: “NoSQL systems”. Research: new management systems. (A toy illustration follows below.)
Source: gigaom.com/cloud/big-data-and-nosql-march-to-the-enterprise/
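A toy sketch (not from the talk) of what eventual consistency means: a write is acknowledged after reaching one replica, so a read from another replica can briefly return stale data until replication catches up:

```python
# Toy model of eventual consistency in a replicated key-value store.
# Writes go to one replica and propagate later; reads from another
# replica may briefly see stale data. Purely illustrative.
class Replica:
    def __init__(self):
        self.data = {}

replicas = [Replica(), Replica()]
pending = []                        # replication log not yet applied

def write(key, value):
    replicas[0].data[key] = value   # acknowledged once ONE replica has it
    pending.append((key, value))    # shipped to the others later

def read(key, replica_index):
    return replicas[replica_index].data.get(key)

def anti_entropy():
    """Background sync: apply pending writes to all replicas."""
    while pending:
        key, value = pending.pop(0)
        for r in replicas:
            r.data[key] = value

write("x", 42)
print(read("x", 1))   # None -> stale read before sync
anti_entropy()
print(read("x", 1))   # 42   -> replicas eventually converge
```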
4. Obtaining value from data
[Figure: Data → Information → Knowledge pyramid; volume decreases and value increases toward the top.]
Information by itself is not actionable knowledge.
Prediction using data mining & machine learning techniques (see the sketch below).
Research: the majority of algorithms work well on thousands of records; however, at the moment they are impractical for thousands of millions.
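A minimal, hypothetical sketch of “prediction using data mining & machine learning techniques”: train a classifier on a small sample and predict labels for unseen records. The data and model choice are illustrative, not from the talk; scaling this from thousands to thousands of millions of records is exactly the open challenge named above:

```python
# Illustrative prediction example: fit a classifier on toy data and
# predict labels for unseen records. Data and model are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))           # 1,000 records, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy target derived from the data

model = LogisticRegression().fit(X, y)   # "thousands of records" regime
X_new = rng.normal(size=(3, 5))
print(model.predict(X_new))              # predicted labels for unseen records
```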
Cloud Computing and Big Data:
the next frontier of science and innovation
www.smartcityexpo.com www.bsc.es/eBusiness
Autonomic Systems and e-Business Platforms research line at BSC/UPC
Thank you for your attention
www.JordiTorres.org - @JordiTorresBCN