Big Data Public Private Forum 2013: Public Sector Meeting, Spain. Big Data Technological Trends

Big Data Public Private Forum. BIG DATA: TECHNOLOGICAL TRENDS. First workshop for the construction of a Roadmap for Big Data in Europe, 16/04/2013. Tomás Pariente, Atos Research and Innovation / BIG Project


TRANSCRIPT

Page 1

BIG

Big Data Public Private Forum

BIG DATA: TECHNOLOGICAL TRENDS
First workshop for the construction of a Roadmap for Big Data in Europe
16/04/2013
Tomás Pariente, Atos Research and Innovation / BIG Project

Page 2

First workshop for the construction of the Big Data roadmap for Europe, 16/04/2013, BIG 318062

WE KNOW WHAT BIG DATA IS, RIGHT?

[Figure: "BIG DATA" vs. "Small DATA" illustration, with the caption "Hi, big brother"]

Page 3

BIG DATA IS NOT ONLY ABOUT SIZE: DATA DIVERSITY MATTERS

BIG DATA = Traditional Structured Data + big size, social, open, real-time, multimedia, linked/shared, unstructured and “exhaust” data

3 Vs: Volume, Velocity, Variety

Page 4

THE DATA DELUGE: EXPONENTIAL DATA GROWTH

“Data is the new gold” (1)

Big Data definition: when dealing with data (data management) becomes the problem.

(1) Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda.

IBM: “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.”
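The IBM figure is easier to grasp in more familiar units. A short sketch of the unit conversion, assuming decimal (SI) prefixes so that 1 EB = 10^18 bytes; it is plain arithmetic on the quoted number, not additional data from the presentation:

```python
# Convert IBM's "2.5 quintillion bytes per day" into familiar units.
# Assumption: decimal SI prefixes (1 TB = 10**12 B, 1 EB = 10**18 B, 1 ZB = 10**21 B).
BYTES_PER_DAY = 2.5e18           # 2.5 quintillion (short scale) = 2.5 * 10**18
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

exabytes_per_day = BYTES_PER_DAY / 1e18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12
zettabytes_per_year = BYTES_PER_DAY * 365 / 1e21

print(f"{exabytes_per_day:.1f} EB per day")         # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")  # about 28.9 TB every second
print(f"{zettabytes_per_year:.1f} ZB per year")     # about 0.9 ZB per year
```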

Page 5

BIG DATA TECHNOLOGIES LANDSCAPE

[Figure: landscape of Big Data technologies, grouped into real-time processing and batch processing]

Page 6

BIG DATA STORAGE: NOSQL AND BEYOND

• Distributed File Systems:
  – Hadoop Distributed File System (HDFS).
  – Capability to store large amounts of unstructured data reliably on commodity hardware.
• NoSQL Databases:
  – Use data models other than the relational model known from the SQL world.
  – Do not necessarily adhere to the transactional properties of atomicity, consistency, isolation and durability (ACID).
• NewSQL Databases: shorthand for new scalable, high-performance SQL databases:
  – SQL as the primary mechanism for application interaction.
  – ACID support for transactions.
  – A non-locking concurrency control mechanism.
  – An architecture providing much higher per-node performance.
  – A scale-out, shared-nothing architecture, capable of running on a large number of nodes without suffering bottlenecks.
  – The expectation is that NewSQL systems are about 50 times faster than traditional OLTP RDBMS.
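ACID support is the main thing NewSQL keeps from the relational world and many NoSQL stores relax. To make the atomicity property concrete, a minimal sketch using Python's built-in `sqlite3`, a traditional single-node SQL engine used here only to illustrate ACID behavior, not NewSQL scale-out. The `accounts` table and `transfer` function are hypothetical:

```python
import sqlite3

# In-memory database; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the earlier credit was rolled back too.
        return False

print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False
print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'alice': 70, 'bob': 30}
```

The failed transfer credits `bob` first and only then fails on the debit; the rollback shows that the partial update never becomes visible.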

Page 7

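BIG DATA: THE NOSQL WORLD

• Key-value stores: schema-less, unstructured values (e.g. Voldemort)
• Column-family stores: row/column/timestamp addressing, value = string, several columns per row (e.g. Apache HBase)
• Document stores: documents stored in JSON or XML, accessible by key or content (e.g. CouchDB, MongoDB)
• Graph databases: graph structures for highly associative data such as social networks, accessible by key or content (e.g. Neo4j)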
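Of the four models, the column-family addressing scheme, (row, column, timestamp) → value, is the least familiar for readers coming from SQL. A toy in-memory sketch of that scheme. The class `ColumnFamilyTable` is hypothetical and keeps a bounded number of versions per cell. It illustrates the data model only and is not the HBase client API:

```python
import time
from collections import defaultdict

class ColumnFamilyTable:
    """Toy wide-column table addressed by (row key, column, timestamp)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts=None):
        ts = time.time_ns() if ts is None else ts
        cells = self._rows[row][column]
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # newest version first
        del cells[self.max_versions:]                        # evict the oldest versions

    def get(self, row, column, ts=None):
        """Newest value written at or before `ts` (the latest one if ts is None)."""
        for cell_ts, value in self._rows.get(row, {}).get(column, []):
            if ts is None or cell_ts <= ts:
                return value
        return None

    def row(self, row):
        """Latest value of every column in a row; rows need not share columns."""
        return {col: cells[0][1] for col, cells in self._rows.get(row, {}).items() if cells}

t = ColumnFamilyTable(max_versions=2)
t.put("user:42", "info:city", "Madrid", ts=1)
t.put("user:42", "info:city", "Barcelona", ts=2)
t.put("user:42", "info:city", "Sevilla", ts=3)  # the ts=1 version is evicted
print(t.get("user:42", "info:city"))        # Sevilla
print(t.get("user:42", "info:city", ts=2))  # Barcelona
print(t.get("user:42", "info:city", ts=1))  # None
```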

Page 8

Kick Off meeting 10-11/09/2012, BIG 318062

BIG DATA: APACHE BIG DATA TOOLS

[Figure: overview of Apache Big Data tools, courtesy of Michael Hausenblas]

Page 9

BIG DATA IS ABOUT CHOOSING THE RIGHT THING

Processing stages: acquisition, batch processing, real-time processing, query facade

Real-time processing platforms: Storm / Apache S4 (stream processing, intelligent parallelization, robust and flexible topologies)

Two processing paths: batch / historical and real-time

Batch processing with Map-Reduce (Hadoop): analytical platform, intelligent parallelization, reliability
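The Map-Reduce programming model behind Hadoop is usually illustrated with word count. A single-process Python sketch of the three phases (map, shuffle, reduce). In Hadoop, mappers and reducers run in parallel across the cluster and the framework performs the shuffle. None of that distribution is modeled here, and the function names are illustrative, not Hadoop's Java API:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: aggregate all values collected for one key."""
    return key, sum(values)

def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

docs = ["big data is not only about size", "data diversity matters", "Big Data"]
print(word_count(docs)["data"])  # 3
```

Because each mapper call sees only one record and each reducer call sees only one key, both phases can be parallelized independently. This is the "intelligent parallelization" the slide refers to.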

Query facade for performant queries: Cloudera Impala, Apache Hive, Apache Solr (Lucene), RDBMS (SQL)…

Distributed massive storage: Hadoop Distributed File System (HDFS), NoSQL (HBase, Cassandra, CouchDB…)
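HDFS gets its reliability on commodity hardware by splitting files into large fixed-size blocks and storing several replicas of each block on different machines. A simplified sketch of that idea, assuming a 128 MB block size (the Hadoop 2.x default `dfs.blocksize`) and 3 replicas (the default `dfs.replication`). Placement here is plain round-robin, not HDFS's rack-aware policy, and the function names are hypothetical:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed: 128 MB, Hadoop 2.x default block size
REPLICATION = 3                 # assumed: default replication factor

def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to `replication`
    distinct nodes (simple round-robin, NOT HDFS's rack-aware placement)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    n_blocks = math.ceil(file_size / block_size)
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(n_blocks)
    }

def readable_after_failures(placement, failed):
    """The file stays readable if every block keeps at least one live replica."""
    return all(any(node not in failed for node in nodes) for nodes in placement.values())

nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
layout = place_blocks(1024**3, nodes)  # a 1 GiB file
print(len(layout))                     # 8 blocks
print(layout[0])                       # ['dn1', 'dn2', 'dn3']
print(readable_after_failures(layout, {"dn1", "dn2"}))  # True
```

With three distinct replicas per block, any two machines can fail without data loss, which is why cheap, failure-prone hardware is acceptable.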

Data sources: corporate, linked, social, unstructured, events, logs…

Real-time path: running pipelines, fast algorithms, high throughput, no storage or complex storage
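On the real-time path, events are processed as they flow past, and only a small amount of state is kept instead of storing the whole stream. A minimal sketch of one such operator: a sliding-window counter of the kind one might run inside a Storm or S4 topology. It assumes timestamps in seconds that arrive in order. The class `SlidingWindowCounter` is hypothetical and not part of either framework's API:

```python
from collections import deque

class SlidingWindowCounter:
    """Count events whose timestamp lies in (now - window, now].
    Memory is bounded by the window: older events are evicted, not stored."""

    def __init__(self, window):
        self.window = window
        self._events = deque()  # timestamps in arrival order

    def add(self, ts):
        self._events.append(ts)
        self._evict(ts)

    def count(self, now):
        self._evict(now)
        return len(self._events)

    def _evict(self, now):
        # Assumes in-order timestamps, so expired events are always at the left end.
        while self._events and self._events[0] <= now - self.window:
            self._events.popleft()

counter = SlidingWindowCounter(window=60)
for t in [0, 10, 30, 65, 70]:
    counter.add(t)
print(counter.count(now=70))  # 3 (events at 30, 65 and 70)
```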

Batch / historical path: long-lasting analytical algorithms, iterative processes that might take days, huge volume, data curation

Acquisition software: Apache Kafka (messaging, publish-subscribe, hundreds of thousands of messages per second); Apache Flume (for events or logs, event pushing)
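Kafka's publish-subscribe model decouples data producers from consumers. Messages are appended to a topic log, and each consumer group reads at its own pace by tracking an offset into that log. A toy single-process sketch of the idea. The `Broker` class and its methods are hypothetical, not the Kafka client API, and partitioning, persistence and replication are all omitted:

```python
from collections import defaultdict

class Broker:
    """Toy publish-subscribe log: producers append to a topic, and each
    consumer group reads independently by keeping its own offset."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> append-only list of messages
        self._offsets = defaultdict(int)  # (topic, group) -> next offset to read

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def poll(self, topic, group, max_messages=10):
        start = self._offsets[(topic, group)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(topic, group)] = start + len(batch)  # advance this group only
        return batch

broker = Broker()
for event in ["login", "click", "logout"]:
    broker.publish("web-logs", event)
print(broker.poll("web-logs", "analytics"))                 # ['login', 'click', 'logout']
print(broker.poll("web-logs", "archiver", max_messages=2))  # ['login', 'click']
print(broker.poll("web-logs", "analytics"))                 # []
```

Because consumers only move their own offset, a slow batch consumer and a fast real-time consumer can read the same stream without interfering with each other.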

3 V’s

Page 10

TRENDS

Big Data trends

• Big Data analytics: new efficient and scalable algorithms; multidisciplinary teams (data scientists); understanding of technology platforms; aggregation and correlation algorithms
• New visualization and query techniques: Big Data vs. small eyes; take “time” into account; faster queries
• Going real-time: stream processing; real-time queries; real-time visualization paradigms
• Data management: performance and scalability; storage selection and costs; cloud vs. data centers
• Data curation: data selection; data value and garbage; trust, provenance
• New business models: new business models for selling data; dealing with privacy and ownership; fostering reuse of data; “Do it before the competitors do”
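The analytics and real-time trends meet when aggregation and correlation statistics must be updated in a single pass over a stream, without storing the data. A minimal sketch of single-pass Pearson correlation using Welford-style incremental updates. The class name `OnlineCorrelation` is hypothetical:

```python
import math

class OnlineCorrelation:
    """Single-pass Pearson correlation using Welford-style incremental updates."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # sum of squared deviations of x
        self.m2_y = 0.0  # sum of squared deviations of y
        self.c_xy = 0.0  # co-moment: sum of (x - mean_x) * (y - mean_y)

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the *old* mean
        self.mean_x += dx / self.n
        dy = y - self.mean_y
        self.mean_y += dy / self.n
        # Combine the old-mean deviation with the new-mean deviation (Welford).
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def correlation(self):
        if self.n < 2 or self.m2_x == 0 or self.m2_y == 0:
            return float("nan")  # undefined for fewer than 2 points or zero variance
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)

corr = OnlineCorrelation()
for x in range(10):
    corr.update(x, 2 * x + 1)  # perfectly linear relationship
print(round(corr.correlation(), 6))  # 1.0
```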

Page 11

BUT BE AWARE OF THE RISKS

• Too many solutions: blank page blockage
• Get hold of data, break data silos, data quality
• Policies: security, privacy, IPR
• Investment: old apps, storage, data centers vs. cloud
• Curation: trust, provenance
• Few professionals: data scientists

Page 12

BIG DATA AND THE PUBLIC SECTOR: FINDINGS FROM THE TECHAMERICA SURVEY

1. Big Data is Here to Stay: 82% Say Real-Time Big Data is the Way of the Future
2. Real-Time Big Data Could Save Government 10% or More Annually
3. Real-Time Big Data Could Save a Significant Number of Lives
4. Big Data is Helping Improve the Quality of Citizens’ Lives
5. State IT Officials Agree Big Data Can Improve Social and Welfare Services
6. Big Data Advances in Medicine, Public Safety Seen as Most Important
7. Privacy and Policy Concerns Remain a Barrier to Utilizing Big Data
8. Public Sector IT Officials Frustrated With Multiple Data Formats, Leadership Changes
9. Many Public Sector IT Officials Say Database Queries Take Too Much Time
10. Nearly All Government IT Officials Would Opt For Real-Time Access to Data Over Backward-Looking Queries

Page 13

Big Data in the Public Sector

Page 14

PROJECT BIG - SECTOR FORUMS AND TECHNICAL WORKING GROUPS

Page 15

PROJECT BIG SECTORS’ ROADMAP

Identification of sector’s requisites

Applicability of Big Data: technical white papers in each sector

Elaboration of sector roadmap

▶ Requirements and objectives from all sectors (industry-driven working groups).

▶ Introduce technologies and trends to the stakeholders so that they better understand Big Data technologies and their capabilities.

▶ Sectorial roadmap (elaborate a roadmap per sector).

▶ Contributions towards an integrated (cross-sectorial) roadmap.

Page 16

PROJECT BIG - TIMELINE OF THE MOST IMPORTANT DELIVERABLES

• 04/2013 D2.2.1: 1st version of technical white papers
• 04/2013 D2.3.1: 1st version of sector’s requisites
• 06/2013 D4.2.1: 1st version of IPR, standardization recommendations
• 09/2013 D2.4.1: 1st version of sector’s roadmap
• 10/2013 D4.3.1: First draft of the Big Data Public-Private Forum
• 01/2014 D2.2.2: Final version of technical white papers
• 04/2014 D2.3.2: Final version of sectors’ requisites
• 04/2014 D4.2.2: Final version of IPR, standardization recommendations
• 10/2014 D2.5: Cross-sectorial roadmap consolidation

Page 17

THANKS

Tomás Pariente Lobo
Atos Research & Innovation