Qrious about Insights
Big Data in the Real World
AUT DSRG Workshop
Guy Kloss
[email protected]
Enterprise Architect
Qrious Limited
7 February 2017
The Problem Examples The Solution Tools of the Trade Boxing up a Solution Flotsam and Jetsam
Outline
1 The Problem
2 Examples
3 The Solution
4 Tools of the Trade
5 Boxing up a Solution
6 Flotsam and Jetsam
Guy Kloss | Big Data in the Real World 2/41
Who/What is Qrious?
We help New Zealand businesses and public sector organisations create value and solve their most pressing business problems by turning data into actionable insight.
Who/What is Qrious?
Backed by Spark
Approx. 60 employees
Offices in Auckland & Wellington
Substantial investment across Data, Platform & People
Built from the ground up (new generation technology and working principles)
One of the largest Data Science teams in the country, with > 80% qualified to Masters & PhD level and over 60 years of combined experience
NZ’s leading data analytics specialist by 2017
Our Capabilities
Advanced analytics
Location insights
Big Data platforms
Consulting services
BI & Warehousing
Who am I?
Chemical Engineer (Masters)
Rocket Scientist (German Aerospace Centre)
Computer Scientist (PhD)
Former lecturer (AUT)
Lead Software Developer and Head Crypto Geek @ Mega
Enterprise Architect at Qrious
Dad, baseballer, diver, . . . general geek!
1 The Problem
Data size
Number of records
Data volume
An exponentially growing data world
Primary Memory/Disk Capacity
An exponentially growing data world
Relative Speeds
Source: http://www.cs.cmu.edu/~amarp/cpu-io-gap
Size Does Matter!
Access/processing beyond a single machine (RAM, disk, CPU)
Expensive data transfers at volume (latency, throughput)
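To make the cost of moving data at volume concrete, here is a rough back-of-the-envelope sketch. The 1 PB volume and the 10 Gbit/s link are illustrative assumptions, not figures from the talk, and latency and protocol overhead are ignored.

```python
def transfer_time_seconds(volume_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to push a data volume over a link (no latency, no overhead)."""
    return volume_bytes * 8 / link_bits_per_second


PETABYTE = 10**15          # bytes; decimal petabyte (assumed volume)
LINK_10_GBIT = 10 * 10**9  # bits per second (assumed link speed)

seconds = transfer_time_seconds(PETABYTE, LINK_10_GBIT)
days = seconds / 86_400    # seconds per day
print(f"{seconds:.0f} s, about {days:.1f} days")
```

Even under these idealised assumptions the copy takes more than a week, which is the motivation for not shipping the data around in the first place.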
Storage Issues
Storage, access, index, find
Transfer, manage, prevent data loss
Types of Data
Structured
Unstructured
Graphs
Free text
. . .
Correlating . . . co-relating . . . mashing . . .
Not a single-record problem
But an m : n problem
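A small sketch of why correlating data sets is an m : n problem: in a many-to-many join, every record on one side pairs with every matching record on the other, so the output per key grows with the product m × n. The data and the function name `join_many_to_many` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def join_many_to_many(left, right):
    """Hash join: pair every left record with every right record sharing its key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lval, rval) for key, lval in left for rval in index[key]]


# Hypothetical data: m sightings and n events per area code.
sightings = [("A", "s1"), ("A", "s2"), ("B", "s3")]
events = [("A", "e1"), ("A", "e2"), ("A", "e3"), ("B", "e4")]

pairs = join_many_to_many(sightings, events)
print(len(pairs))  # 2 * 3 pairs for "A" plus 1 * 1 pair for "B"
```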
Beyond Exponential
Problems are between exponential and hyperexponential
→ Enabling data processing in an exponential world
2 Examples
Number of Records
> 1 trillion (10^9) records: Spark’s location based data set
Anonymised for privacy (on ingest)
Fully encrypted (at rest and in transport)
Continuous/stream ingestion
Normalisation and segmentation on data set
Correlating with external data set
→ Finding insights in this “hay mountain”
Data Volume
100s of TB to PB of “Data Lakes”
Not just a backup/data grave
Fully encrypted (at rest and in transport)
Includes data querying and processing capability
→ Capability to “store everything” (every thing and kind)
3 The Solution
Divide and Conquer
Massively parallel processing: MPP
Parallelise: Map-Reduce
Pipelines: Stream processing
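The Map-Reduce idea can be sketched in a few lines: split the input into chunks, process each chunk independently (map), then merge the partial results (reduce). This is a toy word count, not a Hadoop job; the function names and chunk size are assumptions, and the thread pool merely stands in for the worker nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of all other chunks."""
    return Counter(word.lower() for line in lines for word in line.split())


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(lines, chunk_size=2, workers=4):
    """Split the input into chunks, map them concurrently, then reduce."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


lines = ["big data", "Big insight", "data data lake"]
totals = word_count(lines)
print(totals["data"], totals["big"])
```

Because the map step never looks outside its own chunk, the chunks can be spread over as many workers as are available; only the much smaller partial counts have to be brought together.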
Leverage Data Locality
Bring processing to the data
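A naive sketch of locality-aware scheduling, assuming each data block is replicated on a few nodes and each node runs one task at a time. All names (`schedule`, the block and node identifiers) are hypothetical; real cluster schedulers are considerably more elaborate.

```python
def schedule(tasks, block_locations, free_nodes):
    """Greedily assign each task to a free node that already stores its block.

    Falls back to any free node (a remote read) when no local node is free.
    Assumes there are at least as many free nodes as tasks.
    Returns {task: (node, is_local)}.
    """
    free = set(free_nodes)
    plan = {}
    for task, block in tasks.items():
        local = [node for node in block_locations.get(block, []) if node in free]
        node = local[0] if local else min(free)
        plan[task] = (node, bool(local))
        free.discard(node)
    return plan


# Hypothetical cluster: which nodes hold a replica of which block.
block_locations = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1"]}
tasks = {"t1": "b1", "t2": "b2", "t3": "b3"}

plan = schedule(tasks, block_locations, ["n1", "n2", "n3"])
for task, (node, is_local) in plan.items():
    print(task, node, "local" if is_local else "remote read")
```

In this run the third task ends up with a remote read because the only node holding its block is already taken, which shows that greedy placement is simple but not optimal.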
The Right Tools
Don’t re-invent the wheel
Use existing high performing tools where possible
Available high productivity frameworks, making use of high level languages
The right tool for the type of data
Use the Source, Luke! (Leverage open source based tooling with a community)
The Right Data Organisation
Row vs. columnar storage
→ For analytics often better in columnar format
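The difference between the two layouts can be illustrated with plain Python structures. This is only a sketch with made-up records; real columnar formats add compression and on-disk encoding on top of the same idea.

```python
# Row layout: one complete record after another.
rows = [
    {"id": 1, "region": "Auckland", "amount": 10.0},
    {"id": 2, "region": "Wellington", "amount": 20.0},
    {"id": 3, "region": "Auckland", "amount": 5.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}


# Columnar layout: all values of one field stored together.
columns = to_columns(rows)

# Row layout: every record is visited in full although one field is needed.
total_from_rows = sum(record["amount"] for record in rows)

# Columnar layout: the aggregate reads a single column and nothing else.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Analytical queries typically aggregate a few fields over many records, so reading only the columns involved saves most of the I/O.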
In, Out, Cha-Cha-Cha
Ingest data from (legacy, external) source systems
→ ETL – Extract, Transform, Load
Make sure the rhythm fits (no missing “Out”)
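A minimal ETL sketch using only the Python standard library, with an in-memory CSV string standing in for a legacy export and SQLite standing in for the target store. The data, table name and cleaning rules are assumptions made for illustration. Note the final query: the pipeline is only complete once the data can be read back out.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export with untidy values.
SOURCE_CSV = "customer,amount\n alice ,10.50\nBOB,n/a\ncarol,7\n"


def extract(text):
    """Extract: parse the raw export into dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: normalise names and drop rows with an unparseable amount."""
    clean = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue
        clean.append((record["customer"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# The "Out" step: query the loaded data.
result = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
print(result)
```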
4 Tools of the Trade
Hadoop
Hadoop and distributions
Processing tools for relational, streaming, batch, graph, text, search, . . .
Allocates cluster resources dynamically
Data distributed (with redundancy), so compute allocated where data is
Hadoop Distributions
Many Hadoop distributions: Similar to Linux distributions
Cloudera Partnership with Qrious
  “Bronze” partner
  Ambitions to become “Silver” partner and MSP (managed service provider)
Basic Hadoop Tool Suite
Example: Cloudera Hadoop Distribution
MPP Databases
DB for massively parallel processing (MPP)
Greenplum database and forks (based on PostgreSQL)
Generic and Specialised DBs
Generic RDBMS (where useful)
NoSQL
Graph DB
Other columnar species
5 Boxing up a Solution
Delivering a Suitable Solution
Includes:
System management
Connectivity
Application logic
Services
Yummy add-ons
System Management Framework
Security
  Dedicated sub-networks with specific firewall rules
  External firewalls
  User and credentials management
  Log collector
  Other security tools . . .
System access
  VPN
  Remote desktop services
Connectivity
API gateways
(Reverse) proxies
SFTP
Application Logic
Platform-as-a-Service (PaaS)
Huge benefits of containerising application logic (using Docker)
→ Much shorter delivery cycles
APIs, Micro-Services
Orchestration of Big Data analysis
Services
Solutioning, build
Analytics and development
Operation and maintenance
Bonus Points for . . .
Provenance (reproducibility, auditability, compliance)
AI and ML
Blockchain (non-repudiation, trust, “smart contracts”, identity management, federation, . . . )
6 Flotsam and Jetsam
In the Qrious Pipeline
Make Big Data a commodity: Don’t buy, pay for what you need!
→ Big-Data-as-a-Service – BDPaaS
Sliced, diced and configured to your needs
Straight on bare metal, not VMs (like most cloud hosters)
Maximising the Job Market
What skills do you need?
RDBMS?
SAS?
NoSQL DBs?
Maybe Hadoop is a good answer?
Questions?
Parallelise!
Guy Kloss
[email protected]
Just a humble hair-dryer from the 30s: “One of the first machines used for permanent wave hairstyling back in the 1920’s and 1930’s.”
Dark Roasted Blend: http://www.darkroastedblend.com/2007/05/mystery-devices-issue-2.html