“Cloud Computing and Big Data, the next frontier of innovation”
DESCRIPTION
Session on Big Data and Cloud organized by Fundación Areces on Thursday, March 21, in Madrid. Attached are the slides of my talk, “Cloud Computing and Big Data, the next frontier of innovation”. More information at http://www.jorditorres.org/jornada-el-impacto-de-la-nube-y-el-big-data-en-la-ciencia/
TRANSCRIPT
Cloud Computing and Big Data, the next frontier of innovation
Jordi Torres, UPC-BSC. Madrid, 21 March 2013
Summary A major breakthrough in science occurred centuries ago when mathematical theory made it possible to formalize experimentation. Another fundamental step in the advancement of science came with the advent of computers. Thanks to them, today we have powerful supercomputers whose simulations let us create scenarios that are expensive, dangerous, or even impossible to reproduce in real life: a true breakthrough for science and progress. Unfortunately, until now the power of supercomputing has not been within everyone's reach; it has been restricted to a limited set of research groups, due to the costs of building and maintaining such large infrastructures. With the advent of what is known as Cloud Computing, this situation is changing: many fields of science that until now could not benefit from this technology are becoming able to take advantage of it. However, now that the data available for these calculations has reached very large magnitudes, what is known as Big Data, current computing systems face new challenges that computer science itself has begun to address. This presentation discusses the characteristics of this new reality formed by the Cloud and Big Data, emphasizing the new challenges that must be addressed urgently in order to respond to the needs of the advancement of science.
HOW DID SCIENCE START?
Source: Prof. Mateo Valero, BSC-CNS 2010
HOW IS SCIENCE ADVANCING TODAY?
Source: Prof. Mateo Valero, BSC-CNS 2010
MATHEMATICAL CALCULATIONS?
WHERE?
MN3 (MareNostrum 3)
Compute:      8 cores/chip, 2 chips/node, 16 cores/node, 3,028 nodes, 48,448 total cores
Performance:  2.6 GHz, 20.8 Gflops/core, 332.8 Gflops/node, 1,000.0 total Tflops
Memory:       2 GB/core, 32 GB/node, 96.89 TB total
Network:      0.7 μs latency, 40 Gb/s bandwidth
Storage:      2,000 TB
Consumption:  1,080 kW
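As a quick sanity check of the figures above, a minimal sketch in Python; the 8 flops/cycle/core factor is an assumption inferred from 20.8 Gflops ÷ 2.6 GHz, not a number stated on the slide:

```python
# Sanity check of the MN3 peak-performance figures from the table above.
# flops_per_cycle is inferred from 20.8 / 2.6; it is an assumption.
freq_ghz = 2.6
flops_per_cycle = 8
cores_per_node = 16
nodes = 3028

gflops_core = freq_ghz * flops_per_cycle    # 20.8 Gflops/core
gflops_node = gflops_core * cores_per_node  # 332.8 Gflops/node
total_tflops = gflops_node * nodes / 1000   # ~1,007.7 Tflops, i.e. ~1 Pflops
total_cores = cores_per_node * nodes        # 48,448 cores
memory_tb = 32 * nodes / 1000               # ~96.9 TB

print(f"{gflops_core=} {gflops_node=} {total_tflops=:.1f}")
print(f"{total_cores=} {memory_tb=:.2f}")
```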
FOR SOME SPANISH RESEARCH GROUPS!
AND…
FOR THE REST OF THE WORLD?
Source: http://news.cnet.com/8301-13846_3-57349321-62/amazon-takes-supercomputing-to-the-cloud
GOOD NEWS!
CLOUD COMPUTING?
Source: http://www.wired.com/wiredenterprise/2011/12/nonexistent-supercomputer/all/1
28,000 m²
Source: http://www.facebook.com/media/set/?set=a.190842620965185.47008.140375289345252
40 MW
HUGE DATA CENTERS
Photo: Google
> 4 football pitches
Source: http://www.google.com/about/datacenters/gallery/images
A different way of producing IT
Photo: J.T.
On-demand self-service
Rapid elasticity
Ubiquitous access
…
Pay per use
CLOUD COMPUTING: IT as a service
Source: http://www.telegraph.co.uk/technology/reviews/9241719/Power-Ethernet-Sockets-review.html
Example of benefits (IaaS):
1 computer in a rack for 120 hours = 120 computers in three racks for 1 hour
Idea: Tutorial SC2011, Robert Grossman
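A minimal sketch of why this equivalence holds under pure pay-per-use pricing; the hourly price below is a made-up placeholder, not a real quote:

```python
# Under pay-per-use, both options cost the same; the second finishes
# 120x sooner. The hourly price is a hypothetical placeholder.
price_per_instance_hour = 0.10                 # assumed $/instance-hour

option_a = 1 * 120 * price_per_instance_hour   # 1 machine for 120 hours
option_b = 120 * 1 * price_per_instance_hour   # 120 machines for 1 hour

print(option_a, option_b)  # 12.0 12.0 -> same cost, 120x less wall-clock time
```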
AND DATA?
Source: http://www.docuciencia.es/2009/05/lhc-el-acelerador-de-particulas/
“… the LHC produces 1 PetaByte of data every second, big data and lack of computing resources were becoming the European Organization for Nuclear Research’s biggest IT challenges…”
Source: computerweekly.com/news/2240173897/CERN-adopts-OpenStack-private-cloud-to-solve-big-data-challenges
1 Gigabyte (GB) = 1,000,000,000 bytes
1 Terabyte (TB) = 1,000 GB
1 Petabyte (PB) = 1,000,000 GB
1 Exabyte (EB) = 1,000,000,000 GB
1 Zettabyte (ZB) = 1,000,000,000,000 GB
Source: The Economist, Feb 25th, 2010, http://www.economist.com/node/15579717
Deluge of data created daily
Big Data? definition?
BIG DATA? Big Data is data that exceeds the storing, processing and managing capacity of conventional systems. The reason is that the data is too big, moves too fast, or doesn’t fit the structures of our current systems’ architectures. Moreover, to gain value from this data, we must change the way we analyze it.
NEW CHALLENGES that must be addressed urgently in order to respond to the needs of the advancement of science:
1. Storing
2. Managing
3. Processing
4. Analyzing
1. Data processing challenges
Affordable storage. But scanning disks… assume 100 MB/sec: scanning 2 TB takes more than 5 hours (2,000,000 MB ÷ 100 MB/s = 20,000 s ≈ 5.5 h).
Assume 20,000 disks: scanning 2 TB takes 1 second.
Approach: massive parallelism.
Source: http://www.google.com/about/datacenters/gallery/images/_2000/IDI_018.jpg
Rethinking data processing is required: MapReduce, Storm, S4, … (see the sketch below)
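As a toy illustration of the MapReduce idea named above (not code from the talk): a map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums each group. Real frameworks distribute exactly these phases across thousands of disks and cores:

```python
# Minimal single-process sketch of the MapReduce model: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in the input."""
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values for each key: the word count."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big clouds", "big data moves fast"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(shuffle_phase(pairs)))  # {'big': 3, 'data': 2, ...}
```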
2. Data storage challenges
New storage technologies are required.
RAM vs HDD: HDD is 100× cheaper than RAM, but 1,000 times slower (a back-of-the-envelope comparison follows below).
Present solutions: solid-state drives (SSD), non-volatile.
Research: Storage Class Memory (SCM).
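A rough sketch of that trade-off; only the 100×-cheaper / 1,000×-slower ratios come from the slide, the absolute HDD figures are illustrative assumptions:

```python
# Back-of-the-envelope RAM vs HDD trade-off. Only the ratios (HDD ~100x
# cheaper, ~1000x slower) come from the slide; absolute numbers are assumed.
hdd_bandwidth_mb_s = 100                         # assumed HDD scan speed
hdd_cost_per_gb = 0.05                           # assumed HDD price, $/GB

ram_bandwidth_mb_s = hdd_bandwidth_mb_s * 1000   # ~1000x faster
ram_cost_per_gb = hdd_cost_per_gb * 100          # ~100x more expensive

data_mb = 2_000_000                              # 2 TB, as in the scan example
print(f"HDD scan: {data_mb / hdd_bandwidth_mb_s / 3600:.1f} h")  # ~5.6 h
print(f"RAM scan: {data_mb / ram_bandwidth_mb_s:.0f} s")         # ~20 s
```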
3. Data management challenges
Relational databases can’t support everything.
Example: eventual consistency. Solution: “NoSQL systems”. Research: new management systems. (A toy illustration follows below.)
Source: gigaom.com/cloud/big-data-and-nosql-march-to-the-enterprise/
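A toy sketch (not from the talk) of what eventual consistency means: a write is acknowledged after reaching one replica, so a read from another replica can briefly return stale data until replication catches up:

```python
# Toy model of eventual consistency in a replicated key-value store.
# Writes go to one replica and propagate later; reads from another
# replica may briefly see stale data. Purely illustrative.
class Replica:
    def __init__(self):
        self.data = {}

replicas = [Replica(), Replica()]
pending = []                        # replication log not yet applied

def write(key, value):
    replicas[0].data[key] = value   # acknowledged once ONE replica has it
    pending.append((key, value))    # shipped to the others later

def read(key, replica_index):
    return replicas[replica_index].data.get(key)

def anti_entropy():
    """Background sync: apply pending writes to all replicas."""
    while pending:
        key, value = pending.pop(0)
        for r in replicas:
            r.data[key] = value

write("x", 42)
print(read("x", 1))   # None -> stale read before sync
anti_entropy()
print(read("x", 1))   # 42   -> replicas eventually converge
```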
4. Obtaining value from data
[Figure: Data → Information → Knowledge pyramid; volume decreases and value increases toward the top.]
Information by itself is not actionable knowledge.
Prediction using data mining & machine learning techniques (see the sketch below).
Research: the majority of algorithms work well on thousands of records; however, at the moment they are impractical for thousands of millions.
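A minimal, hypothetical sketch of “prediction using data mining & machine learning techniques”: train a classifier on a small sample and predict labels for unseen records. The data and model choice are illustrative, not from the talk; scaling this from thousands to thousands of millions of records is exactly the open challenge named above:

```python
# Illustrative prediction example: fit a classifier on toy data and
# predict labels for unseen records. Data and model are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))           # 1,000 records, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy target derived from the data

model = LogisticRegression().fit(X, y)   # "thousands of records" regime
X_new = rng.normal(size=(3, 5))
print(model.predict(X_new))              # predicted labels for unseen records
```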
Cloud Computing and Big Data:
the next frontier of science and innovation
www.smartcityexpo.com www.bsc.es/eBusiness
Autonomic Systems and e-Business Platforms research line at BSC/UPC
Thank you for your attention
www.JordiTorres.org - @JordiTorresBCN