Big Data Public Private Forum 2013: public sector meeting, Spain: Big Data technological trends
TRANSCRIPT
BIG
Big Data Public Private Forum
BIG DATA TECHNOLOGICAL TRENDS
First workshop for the construction of a Roadmap for Big Data in Europe
16/04/2013
Tomás Pariente, Atos Research and Innovation / BIG Project
BIG: Big Data Public Private Forum
First workshop for the construction of the Big Data roadmap for Europe, 16/04/2013, BIG 318062
WE KNOW WHAT BIG DATA IS, RIGHT?
BIG DATA
Small DATA
Hi, big brother
BIG DATA IS NOT ONLY ABOUT SIZE: DATA DIVERSITY MATTERS
Traditional structured data + the data characteristics below = BIG DATA
BIG SIZE
SOCIAL
OPEN
REAL TIME
MULTIMEDIA
LINKED/SHARED
UNSTRUCTURED
“EXHAUST”
3 Vs: Volume, Velocity, Variety
THE DATA DELUGE: EXPONENTIAL DATA GROWTH
“Data is the new gold” (1)
Big Data definition: when dealing with data becomes THE problem
IBM: “Every day, we create 2.5 quintillion bytes of data - so much that 90% of the data in the world today has been created in the last two years alone.”
(Figure labels: Data mgmt, Data)
(1) Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda
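To put the IBM figure quoted above into more familiar units, here is a small arithmetic sketch. It assumes "quintillion" means the short-scale 10^18 and that decimal units apply (1 exabyte = 10^18 bytes); the function names are illustrative only and the yearly total simply extrapolates the daily figure.

```python
# Assumptions: short-scale quintillion (10**18) and decimal exabytes.
BYTES_PER_DAY = 25 * 10**17      # 2.5 quintillion bytes, from the IBM quote
BYTES_PER_EXABYTE = 10**18


def exabytes_per_day() -> float:
    """Daily data creation expressed in exabytes."""
    return BYTES_PER_DAY / BYTES_PER_EXABYTE


def exabytes_per_year(days: int = 365) -> float:
    """Naive extrapolation of the daily figure to a whole year."""
    return exabytes_per_day() * days


if __name__ == "__main__":
    print(exabytes_per_day())
    print(exabytes_per_year())
```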
BIG DATA TECHNOLOGIES LANDSCAPE
Real-time processing
Batch processing
BIG DATA STORAGE: NOSQL AND BEYOND
• Distributed File Systems:
– Hadoop Distributed File System (HDFS).
– Capability to store large amounts of unstructured data reliably on commodity hardware.
• NoSQL Databases:
– Use data models other than the relational model known from the SQL world.
– Do not necessarily adhere to the transactional properties of atomicity, consistency, isolation and durability (ACID).
• NewSQL Databases: shorthand for new scalable/high-performance SQL DBs.
– SQL as the primary mechanism for application interaction.
– ACID support for transactions.
– A non-locking concurrency control mechanism.
– An architecture providing much higher per-node performance.
– A scale-out, shared-nothing architecture, capable of running on a large number of nodes without suffering bottlenecks.
– The expectation is that NewSQL systems are about 50 times faster than traditional OLTP RDBMS.
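As an illustration of what "ACID support for transactions" buys an application, the sketch below uses Python's built-in sqlite3 module. SQLite is neither a NoSQL nor a NewSQL system; it is used here only as a convenient stand-in, and the accounts table and transfer function are hypothetical.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def demo():
    """Run one valid and one invalid transfer on a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    first = transfer(conn, "alice", "bob", 30)    # fits the balance
    second = transfer(conn, "alice", "bob", 500)  # would overdraw, rolled back
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return first, second, balances
```

The second transfer leaves no trace: both updates are undone together, which is the atomicity guarantee that many NoSQL stores relax in exchange for scale.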
BIG DATA: THE NOSQL WORLD
• Schema-less, unstructured: Voldemort
• Row – Column – Timestamp; value = string; several columns: Apache HBase
• Documents; stored in JSON or XML; accessible by key or content: CouchDB, MongoDB
• Graph structures; highly associative, social networks; accessible by key or content: Neo4j
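The four data models listed above can be contrasted with a rough sketch in which plain Python dictionaries and lists stand in for the real stores. None of this is the API of any product named on the slide; all keys, records and helper functions are hypothetical.

```python
# Key-value: schema-less, the store only sees an opaque value per key.
key_value = {"user:42": '{"name": "Ana"}'}

# Row - column - timestamp: each cell keeps timestamped string values.
wide_column = {
    "user:42": {
        "info:name": {1366070400: "Ana"},
        "info:city": {1366070400: "Madrid", 1366156800: "Sevilla"},
    }
}

# Document: JSON-like records, reachable by key or by content.
documents = {
    "doc1": {"name": "Ana", "city": "Sevilla"},
    "doc2": {"name": "Luis", "city": "Madrid"},
}

# Graph: nodes plus labelled edges, suited to highly associative data.
edges = [("Ana", "follows", "Luis"), ("Luis", "follows", "Eva")]


def latest_cell(row, column):
    """Return the most recent version of a wide-column cell."""
    versions = wide_column[row][column]
    return versions[max(versions)]


def find_documents(field, value):
    """Content-based lookup: keys of documents whose field equals value."""
    return sorted(k for k, doc in documents.items() if doc.get(field) == value)


def neighbours(node, label):
    """Follow outgoing edges with the given label from a node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]
```

For example, `latest_cell("user:42", "info:city")` picks the value with the highest timestamp, while `find_documents("city", "Madrid")` shows the "accessible by content" idea for document stores.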
Kick Off meeting, 10-11/09/2012, BIG 318062
BIG DATA: APACHE BIG DATA TOOLS
Courtesy of Michael Hausenblas
BIG DATA IS ABOUT CHOOSING THE RIGHT THING
• Data sources (the 3 V's): Corporate, Linked, Social, Un-structured, Events, Logs…
• Acquisition (acquisition software):
– Apache Kafka: messaging, publish-subscribe, hundreds of thousands per second.
– Apache Flume: for events or logs, event pushing.
• Real-time processing:
– Processing platforms: Storm / Apache S4. Stream processing, intelligent parallelization, robust and flexible topologies.
– Running pipelines, fast algorithms, high throughput, no storage or complex storage.
• Batch processing (batch / historical):
– Map-Reduce (Hadoop): analytical platform, intelligent parallelization, reliability.
– Long-lasting analytical algorithms, iterative process (might take days), huge volume, data curation.
• Distributed massive storage: Hadoop File System (HDFS), NoSQL (HBase, Cassandra, CouchDB…)
• Query facade (performant queries): Cloudera Impala, Apache Hive, Apache Solr (Lucene), RDBMS (SQL)…
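The difference between the batch and the real-time paths can be sketched with a toy word count. This is a single-process illustration of the map / shuffle / reduce idea and of incremental stream processing, not the Hadoop or Storm API; every function name here is an assumption made for the example.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def batch_word_count(documents):
    """Batch path: the whole data set is available before processing starts."""
    pairs = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(pairs))


def streaming_word_count(events):
    """Real-time path: update counts per event and emit a snapshot each time."""
    counts = Counter()
    for event in events:
        for word in event.lower().split():
            counts[word] += 1
        yield dict(counts)
```

The batch function returns one answer after reading everything, whereas the streaming generator yields an up-to-date answer after every event without needing the history to be stored.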
TRENDS
Big Data Trends
• Big Data Analytics: new efficient and scalable algorithms; multidisciplinary teams (data scientists); understanding technology platforms; aggregation and correlation algorithms
• New visualization and query techniques: Big Data vs. small eyes; take “time” into account; faster queries
• Going real-time: stream processing; real-time queries; real-time visualization paradigms
• Data Management: performance and scalability; storage selection and costs; cloud vs. data centers
• Data curation: data selection; data value and garbage; trust, provenance
• New business models: new business models for selling data; dealing with privacy, ownership; fostering reuse of data; “Do it before the competitors do”
BUT BE AWARE OF THE RISKS
• Too many solutions: blank page blockage
• Get hold of data: break data silos, data quality
• Policies: security, privacy, IPR
• Investment: old apps, storage, data centers vs. cloud
• Curation: trust, provenance
• Few professionals: data scientists
BIG DATA AND THE PUBLIC SECTOR: FINDINGS FROM THE TECHAMERICA SURVEY
1. Big Data is Here to Stay: 82% Say Real-Time Big Data is the Way of the Future
2. Real-Time Big Data Could Save Government 10% or More Annually
3. Real-Time Big Data Could Save Significant Number of Lives
4. Big Data is Helping Improve the Quality of Citizens’ Lives
5. State IT Officials Agree Big Data Can Improve Social and Welfare Services
6. Big Data Advances in Medicine, Public Safety Seen as Most Important
7. Privacy and Policy Concerns Remain a Barrier to Utilizing Big Data
8. Public Sector IT Officials Frustrated With Multiple Data Formats, Leadership Changes
9. Many Public Sector IT Officials Say Database Queries Take Too Much Time
10. Nearly All Government IT Officials Would Opt For Real Time Access to Data Over Backward Looking Queries
Big Data in the Public Sector
PROJECT BIG - SECTOR FORUMS AND TECHNICAL WORKING GROUPS
PROJECT BIG: SECTORS’ ROADMAP
Identification of sector requirements
Applicability of Big Data: technical white papers in each sector
Elaboration of sector roadmap
▶ Requirements and objectives from all sectors (industry-driven working groups).
▶ Introduce technologies and trends to the stakeholders so that they better understand Big Data technologies and their capabilities.
▶ Sectorial roadmap (elaborate a roadmap per sector).
▶ Contributions towards an integrated (cross-sectorial) roadmap.
PROJECT BIG - TIMELINE OF THE MOST IMPORTANT DELIVERABLES
• 04/2013: D2.2.1, first version of technical white papers
• 04/2013: D2.3.1, first version of sector requirements
• 06/2013: D4.2.1, first version of IPR and standardization recommendations
• 09/2013: D2.4.1, first version of sector roadmap
• 10/2013: D4.3.1, first draft of the Big Data Public-Private Forum
• 01/2014: D2.2.2, final version of technical white paper
• 04/2014: D2.3.2, final version of sector requirements
• 04/2014: D4.2.2, final version of IPR and standardization recommendations
• 10/2014: D2.5, cross-sectorial roadmap consolidation
BIG
Big Data Public Private Forum
THANKS
Tomás Pariente Lobo
Atos Research & Innovation
Atos
[email protected]