Becoming data-driven companies: making the adoption of analytics pervasive in every organization
11/04/2019 - Padova
SPEAKERS
Carlo Arioli, EMEA Marketing Manager @ [email protected] +39-346-2256423
Gianluigi Viganò, EMEA Presales @ [email protected] +39-335-7483447
Time for the real title
2005
6:15 am Morning Run
9:00 am Trip to Work
11:00 am Entertain
1:30 pm Booking trip, ’cause of Ads
3:00 pm Shopping
4:01 pm Alert received from my Bank
7:00 pm Dinner with Bio
9:00 pm Relax & Enjoy
Data Efficiency
Strategic
Driven / Speed / Projects
Volume / Costs
Tools / Process
Silos / Skills
Value / Outcomes
The «Data-Driven Company» value chain
New data sources
Volume
Descriptive analytics
Classical predictive statistics
Advanced machine learning
Cognitive modeling
Horizontal scalability
Analytical program languages
Speed of analytics
Cultural change
Data-enabled decision making
Role profiles
Analytics talents
Adaptation of business processes
Automation of business processes
Agile processes
Technical foundations
Optimization
Orchestration of data
Data security
Unstructured data
Corporate IT stacks
Organization
Cross-functionality
Cloud workloads
Business foundation
Strategy and (Analytic) Vision
Data analytics governance
Multiplicative: it is only as good as the weakest link
DATA x ANALYTICS x IT x PEOPLE x PROCESSES = Value captured
Source: «Achieving business impact with data», McKinsey Digital
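The "multiplicative" reading of the value chain can be illustrated with a few lines of Python. This is only a sketch: the 0-to-1 maturity scores below are hypothetical and do not come from the slide or from McKinsey. It shows why one weak link drags the product down far more than it drags down an average.

```python
import math

# Hypothetical maturity scores (0.0 to 1.0) for the five links of the chain.
scores = {
    "data": 0.9,
    "analytics": 0.9,
    "it": 0.9,
    "people": 0.9,
    "processes": 0.2,  # the weakest link
}

# Multiplicative model: value captured is the product of all links.
value_captured = math.prod(scores.values())

# An additive (average) view hides the weak link.
average_score = sum(scores.values()) / len(scores)

weakest = min(scores, key=scores.get)
print(f"product={value_captured:.5f} average={average_score:.2f} weakest={weakest}")
```

With four strong links and one weak one, the average still looks healthy while the product collapses, which is the point the slide makes about the weakest link.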
You Don’t Need Big Data — You Need the Right Analytics …for all
* RIGHT = in the right place, accessible to the right people, in the right way and at the right time, to help make the right business decisions at the right cost
What if ? Pervasive data-enabled decision making
Exabyte* proven scale (*1 EB = 1 000 000 000 GB)
The Industry’s only infrastructure agnostic, Unified Advanced Analytics Platform
5-1000x faster query response, certified
Analyze in the Right Place
Strong Reliable Performance at Exabyte Scale
In-Database Analytics & Machine Learning
Freedom from Underlying Infrastructure
Point of view: NO compromise
6 x C
• Column-Oriented
• Cluster based (MPP)
• Compression & encoding (TCO)
• Cloud proven with EON mode
• Complementary to Open Source
• Compliant with ANSI SQL
Why 6? 6 is a perfect number according to number theory!
How ?
Advanced Analytics & ML: Rich SQL analytics and in-database machine learning
Extremely fast, scalable & cost effective: Columnar DB with Massively Parallel Processing architecture
Easy to use & develop: Standard SQL, certified integration with all BI & ETL tools
Streaming: Analytics with Kafka
Extended Data Science: Spark integration, Java, C, R & Python & V-Python
Analyze on existing Data Lakes: Analyze data in place with SQL on Hadoop and SQL on Amazon S3
Certified Multi-Cloud: Certified on Azure, AWS & Google
Vertica – Technical Value Proposition
Data Transformation
Messaging & ETL
BI & Visualization
R Java Python
ODBC, JDBC, ADO.NET
Geospatial
Event Series
Time series
Text Analytics
Pattern Matching
Regression
User-Defined Functions
SQL
Machine Learning
C++
Vertica Unified Analytics with Open Architecture
Row/Column Security, Masking, FPE, LDAP, Kerberos
ROS JSON{} CSV
Geospatial Real-Time Text Analytics
Event Series
Pattern Matching
Time Series
Machine Learning Regression
Column Oriented
1 column = 1 file on disk (or more)
Ideal for load-/read-intensive workloads with dramatic reduction of disk I/O
Only reads the columns involved in the query from disk instead of every row and column
Reads and writes in very large block sizes
SELECT avg(price) FROM tickstore WHERE symbol = 'AAPL' AND date = '5/06/09'
Column Store - Reads 3 columns (symbol, date, price)
Row Store - Reads all columns
[Figure: the same four tickstore rows stored row-wise and column-wise; the table is padded with many further columns holding NQDS / NYSE values]
symbol   price    date      ...
AAPL     143.74   5/05/09   NQDS ...
AAPL     143.75   5/06/09   NYSE ...
BBY      37.03    5/05/09   NYSE ...
BBY      37.13    5/06/09   NYSE ...
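The difference between the two layouts can be sketched in a few lines of Python. This is a toy model, not how Vertica stores data: the four rows come from the slide, the column name `exchange` is an assumption for the unnamed fourth column, and "values read" is just a counter standing in for disk I/O.

```python
# Toy comparison of row-oriented and column-oriented reads.
# Rows mirror the tickstore slide; the 'exchange' column name is an assumption.
ROWS = [
    {"symbol": "AAPL", "price": 143.74, "date": "5/05/09", "exchange": "NQDS"},
    {"symbol": "AAPL", "price": 143.75, "date": "5/06/09", "exchange": "NYSE"},
    {"symbol": "BBY", "price": 37.03, "date": "5/05/09", "exchange": "NYSE"},
    {"symbol": "BBY", "price": 37.13, "date": "5/06/09", "exchange": "NYSE"},
]

# Column layout: one list per column, standing in for "1 column = 1 file".
COLUMNS = {name: [row[name] for row in ROWS] for name in ROWS[0]}


def avg_price_row_store(rows, symbol, date):
    """Scan whole rows: every column of every row counts as read."""
    values_read = 0
    prices = []
    for row in rows:
        values_read += len(row)
        if row["symbol"] == symbol and row["date"] == date:
            prices.append(row["price"])
    return sum(prices) / len(prices), values_read


def avg_price_column_store(columns, symbol, date):
    """Touch only the three columns named in the query."""
    needed = ("symbol", "date", "price")
    values_read = sum(len(columns[name]) for name in needed)
    prices = [
        p
        for s, d, p in zip(columns["symbol"], columns["date"], columns["price"])
        if s == symbol and d == date
    ]
    return sum(prices) / len(prices), values_read


row_avg, row_reads = avg_price_row_store(ROWS, "AAPL", "5/06/09")
col_avg, col_reads = avg_price_column_store(COLUMNS, "AAPL", "5/06/09")
print("row store:", row_avg, "values read:", row_reads)
print("column store:", col_avg, "values read:", col_reads)
```

Both functions return the same average; the gap in values read grows with every extra column the table carries, which is what the wide NQDS / NYSE padding in the figure is meant to suggest.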
Compression & Encoding
Compression Results
Data type              Ratio
CDR                    8:1
Consumer               30:1
Marketing              20:1
Network Logs           60:1
SNMP                   20:1
Trading, IoT (float)   5:1
IoT (int)              10:1
Clickstream            10:1
Just-In-Time Decoding
Engine: Encoded blocks
Buffer Pool: De-compress only
Network: Encoded blocks + optional LZO
Disk: Encoding + Compression
Results decoded Just-In-Time
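To make the idea of working on encoded blocks concrete, here is a minimal Python sketch using run-length encoding. RLE is chosen only as a familiar example of a columnar encoding; the slides do not say which encodings are used where, and the sample column is made up.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


def count_matches_encoded(runs, target):
    """Count rows equal to target without decoding the column."""
    return sum(count for value, count in runs if value == target)


# A sorted, low-cardinality column compresses to one pair per distinct value.
exchange = ["NQDS"] * 3 + ["NYSE"] * 9
runs = rle_encode(exchange)

print(runs)
print(count_matches_encoded(runs, "NYSE"))
```

The predicate is evaluated on two pairs instead of twelve values, and `rle_decode` is only needed at the very end, when results have to be returned in plain form.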
Cluster Based
Vendor-agnostic, MPP, Shared-Nothing, Scale-Out
Physical Rack Servers: 8-40 cores, 8-16 GB / core, 24x HDD / SSD, Linux (RHEL/Ubuntu/Oracle…)
Virtual / Cloud Servers: 8-64 vCPU, 4-8 GB / vCPU, SAN / S3, Linux (RHEL/Ubuntu/AWS…)
Intranet: 1/10 GbE
Private Network: 10 GbE
Client
Vertica Cluster
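A shared-nothing MPP cluster can be pictured as rows hashed across nodes, each node aggregating its own slice, and a final merge of the partial results. The Python sketch below illustrates that pattern only; the node count, the use of CRC32 as hash function and the sample rows are assumptions, not a description of Vertica internals.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Pick a node from a deterministic hash of the segmentation key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def segment(rows, num_nodes=NUM_NODES):
    """Distribute rows so that each node owns a disjoint slice."""
    nodes = [[] for _ in range(num_nodes)]
    for symbol, price in rows:
        nodes[node_for(symbol, num_nodes)].append((symbol, price))
    return nodes


def local_aggregate(node_rows):
    """Each node computes partial (sum, count) per symbol on its own data."""
    partial = defaultdict(lambda: [0.0, 0])
    for symbol, price in node_rows:
        partial[symbol][0] += price
        partial[symbol][1] += 1
    return partial


def merge(partials):
    """The initiator combines partial results into final averages."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for symbol, (price_sum, count) in partial.items():
            total[symbol][0] += price_sum
            total[symbol][1] += count
    return {symbol: s / c for symbol, (s, c) in total.items()}


rows = [("AAPL", 143.74), ("AAPL", 143.75), ("BBY", 37.03), ("BBY", 37.13)]
nodes = segment(rows)
averages = merge(local_aggregate(node_rows) for node_rows in nodes)
print(averages)
```

Because no node needs another node's rows to build its partial result, adding nodes spreads both storage and work, which is the scale-out property the slide refers to.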
Platform agnostic …
The data storage decisions you make today won’t impact your ability to execute in the future
SQL Database
++Analytics & ML
Access one unified analytics engine and license across all infrastructure choices
Choose your Deployment Choose your Consumption
On Prem
Choose your Cloud
Compute / Storage
Hardware Agnostic
Hybrid Cloud
Query Engine
… included Eon
Amazon, Microsoft Azure, Google Cloud
Amazon S3
First Generation: Using the cloud as a data center (IaaS)
Second Generation: Separation of compute and storage
Cultural change
Today ‘Right Time’ is… (Near) Real Time…
NOW
6K concurrent analysts
RT distance price discrimination
I don’t need NRT / I do need RT: car testing
Analytic maturity journey: what leaders do better
1. Ultra Fast Ad-hoc Dashboard & Analytics
2. TCO Effective E-DWH & reporting
3. Easy Enterprise Scale Predictive Analytics
4. Automated, Complex Predictive Analytics
Source: McKinsey, The need to lead in data and analytics
communicate simply Build strong capabilities
Process & Technology Metrics
What if ? Less Guess work !
“When we did the first queries, they were done so fast, we thought they
were broken.”
- Michael Relich, Guess
1 hr to 3.6 sec
8 hrs (overnight) to < 30 sec
Better sales tracking and customer service in stores.
Improved merchandise allocation and distribution across locations.
Take the GUESS work out: what? why? what now ?
Customer 360 you want to «pay for»: self-service, «ad-hoc», ultra-fast
Catch Media’s cloud-based B2B Analytics & Engagement Intelligence Platform enables content owners and distributors to keep their consumers satisfied and loyal by understanding consumer behavior and acting upon it at the right time
Mission: The Right Analytics
Powered by
Business Understanding
Data Analysis & Understanding
Data Preparation
Modeling
Evaluation
Deployment
Machine Learning
Speed
ANSI SQL
Scalability
Massively Parallel
Processing
Deploy Anywhere
Outlier Detection
Normalization
Imbalanced Data Processing
Sampling
Missing Value Imputation
And More…
Support Vector Machines
Random Forests
Logistic Regression
Linear Regression
Ridge Regression
Naive Bayes
Cross Validation
And More…
Model-level Stats
ROC Tables
Error Rate
Lift Table
Confusion Matrix
R-Squared
MSE
In-Database Scoring
Speed
Scale
Security
Pattern Matching
Date/Time Algebra
Window/Partition
Date Type Handling
Sequences
And More…
Sessionize
Time Series
Statistical Summary
SQL
Analytics & in-DB Machine Learning Process Flow
DS cycle
@ columnar MPP speed
80% 20%
Leveraging OpenSource … at scale
Building Predictive Analytics into the Core of Vertica
Simpler Data Science at MPP speed & scale
Style Prediction ?
Personalized Mobile Messages
Online Personalized Recommendation
Predictive Analytics
what will ?
Style Prediction Automation
Dialogue with data: SQL
Prepare your next IoT evolution: Speed & Integration
Predictive Maintenance
«We calculated 17 different statistical functions on 2 billion data points in less than a minute, which is faster than our previous system or any other system I’m aware of would have taken just to retrieve the data»
https://youtu.be/IZkkoy5ZT1M
«A significant reduction in the operational cost (351 % ROI) »
https://youtu.be/QZ5vWqblVXU
TCO
Predictive Maintenance
Easy to do
«The agility of Vertica is core for … a non-IT organization like Suunto»
https://youtu.be/BTIee0tYq9E
Wearables for B2C
IoT - Predictive Maintenance - Listen to Data-Driven leaders
Philips Aims for Zero Unplanned Downtime with Predictive Analytics
Featuring: Dr. Mauro Barbieri, Senior Scientist, Philips Research
Vision, Organization, Technology
https://www.brighttalk.com/webcast/8913/351928
“First of all, it’s a very fast database, actually the fastest I’ve been working with … and the speed is not only in querying the data, it is also in loading the data. You can answer complex queries on hundreds of billions of rows …
However, there is one more aspect: the learning curve for the development organization and the consumers of the data. Time to market and development cost are extremely important, and especially in this domain, if you want to develop new features fast, which means making new predictive models and also finding out which ones work and which do not, you need to be able to load, process and integrate data fast, to make datasets available at higher speed … and that’s what Vertica allowed us to do.
… And on top of it, it supports standard SQL, it can be interfaced to everything and deployed everywhere. All engineers know SQL to some degree, so there is a very low threshold for people to start using the data, and when they start using the data, they realize the value and they are happy to contribute, and that’s the value of Vertica.”
Dr. Mauro Barbieri, Senior Scientist, Philips Research
Time to Results
Cross-functional, accurate, agile & transparent
• Thousands of live MRIs
• Trillions of data points
• NRT Dashboarding
• Predictive Models
IoT Vertica based reference architecture
Rental charges
Power consumption
Machine sensors
Access logs
Optional Spark/Storm inclusion - convert all currencies to $
Geospatial to track usage and failures by location
Machine learning to categorize, classify, and predict
Multidimensional to aggregate by dimensions
Real-time dashboards
To Ops as parts replacement recommendations
To Finance as lease buyout recommendations
Time Series to interpolate missing values. Event Series Joins to blend feeds.
Log Text Analytics and Pattern Match to understand errors
Join with data in several other data ponds in many formats
Many live streams in many formats
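The "interpolate missing values" step on sensor feeds can be sketched as plain linear gap filling. The Python below is a generic illustration under stated assumptions, not Vertica's time series implementation: timestamps are simple numbers, missing readings are `None`, and gaps at the very start or end are left unfilled.

```python
def interpolate_gaps(timestamps, readings):
    """Fill None readings linearly between the nearest known neighbours.

    Leading and trailing gaps have only one neighbour and stay None.
    """
    filled = list(readings)
    known = [i for i, value in enumerate(readings) if value is not None]
    for left, right in zip(known, known[1:]):
        t0, t1 = timestamps[left], timestamps[right]
        v0, v1 = readings[left], readings[right]
        for i in range(left + 1, right):
            fraction = (timestamps[i] - t0) / (t1 - t0)
            filled[i] = v0 + fraction * (v1 - v0)
    return filled


# Hypothetical machine-sensor feed sampled every 10 seconds, two samples lost.
timestamps = [0, 10, 20, 30, 40]
readings = [20.0, None, None, 26.0, 28.0]
filled = interpolate_gaps(timestamps, readings)
print(filled)
```

Once every feed is on a regular, gap-free grid, joining it with other feeds by time becomes a straightforward alignment problem.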
CEF
Flex Zone
Semi-Structured Data Flex Table Instant view with Vertica BI executive dashboards
Structure on demand or “schema on need”. Mitigate the volatility and variety of machine data
Now Avro and CSV added to the growing list of open-source parsers
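"Schema on need" can be pictured as loading semi-structured records as they are and deciding which columns to expose only at query time. The Python sketch below shows that idea with JSON lines; the device names and fields are invented, and the helper functions are hypothetical, not part of any Vertica API.

```python
import json

# Hypothetical machine events; each record carries a different set of keys.
RAW_EVENTS = [
    '{"device": "pump-1", "temp_c": 71.5, "status": "ok"}',
    '{"device": "pump-2", "temp_c": 88.0, "vibration": 0.42}',
    '{"device": "pump-3", "status": "error", "code": "E17"}',
]


def load_flexible(lines):
    """Load records without declaring any columns up front."""
    return [json.loads(line) for line in lines]


def discover_keys(records):
    """List every key seen so far, i.e. the columns that could be exposed."""
    keys = set()
    for record in records:
        keys.update(record)
    return sorted(keys)


def project(records, columns):
    """Materialize only the requested columns; absent keys become None."""
    return [tuple(record.get(column) for column in columns) for record in records]


records = load_flexible(RAW_EVENTS)
keys = discover_keys(records)
view = project(records, ["device", "temp_c"])
print(keys)
print(view)
```

New keys showing up in tomorrow's feed need no reload or schema change here; they simply appear in `discover_keys` and can be projected when someone needs them.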
Vertica Unified Analytics follow your maturity journey
HDFS ( months / years )
Vertica HOT LAYER (columnar / MPP / compression )
Fast Ad-hoc Dashboard & Analytics
Data Access / Export: JDBC / ODBC, REST API
Kafka - Message Bus
Web Services
TCO Effective E-DWH & reporting
NRT Data Driven Custom App
Data Visualization & Mining Layer
Off the Shelf tools: Logi, Power BI, MicroStrategy, Tableau, QlikView
VSQL-on-Hadoop
Vertica In-DB ML
UDx: C++, Java
Data science collab tools
Vertica Kafka Connector (speed)
Vertica Flex Table
Vertica copy
Enterprise Scale Predictive Analytics
Ext DWH
Vertica Ingestion
• JSON
• DELIMITED
• PAIR DELIMITED
• AVRO
• CSV
• CEF
• REGEX & SDK
OLTP Transactional Data
Geospatial
Sensors
Events
Logs & Text
Sensors, Events, Probes
Batch + Micro Batch + Stream
RT analytics
Real Time CEP conn
Automated, Complex Predictive Analytics
RT
1 2 3 4 5
Traditional ETL
The Lastminute speech: simplicity !
• Data Science
• BI and DWH
• Clickstream Analytics
• Campaign Management
• CRM Optimization
«Vertica stands out for the ease with which it was integrated into the existing BI ecosystem … and for the efficient horizontal scalability with which it handles the growth of data and analyses at lastminute.com group … because it is considered to have a more effective pricing model for the organization’s objectives»
CIO, Lastminute Group
Try it free (till 1 TB) !
FREE Vertica Community edition: www.vertica.com/try
Take outs: accelerate “data-driven” transformation
Take advantage of advances in tools for the modern data pipeline
Deploy a fast, core “unified” analytics engine to leverage what already exists and to scale up easily across the enterprise
Embrace “governed” self-service analytics with more raw, ad-hoc and the explosion of external data
Employ machine learning and automation at scale with enterprise wide simplicity, accuracy & org transparency
Mobilize the organization
Democratize data access
Focus on 1 to 2 areas in the organization with defined use cases
Change workflows and extend skills to leverage automated analytics
Launch a cultural transformation through training, competitions, and communications
= Big impact from simpler enterprise-wide analytics
Source: adapted from McKinsey “Getting big impact from big data”
Data Analytics without limits
see more at:
vertica.com