Power product innovation with Big Data technologies


Introducing:

Zhixuan Wang Experian

Hua Li Experian

In the 21st century, data is the new oil, Big Data analytics is the new engine, and Big Data tools are the new machinery.

4/21/2017 Experian Public Vision 2017


Big Data and open source landscape


Apache Hadoop Stack

Tip: Use Hadoop Streaming to write the mapper and reducer in your favorite programming language

Credit Card Attrition Trigger Transaction Data Insight System (TDIS)

1. Historical spend enables a probability expectation (profile) to be computed

2. As time passes, new transactions adjust the probability expectation

3. Notify when a transaction does not occur within the probability expectation threshold
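The three steps can be illustrated with a minimal sketch. The deck does not say which statistical model TDIS uses, so the exponential model of the gaps between transactions, the `SpendProfile` class, `should_notify`, and the 0.01 threshold are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpendProfile:
    """Hypothetical running profile of the gaps (in days) between transactions."""
    last_day: Optional[float] = None  # day number of the most recent transaction
    gap_count: int = 0
    mean_gap: float = 0.0

    def update(self, day: float) -> None:
        """Fold one new transaction into the profile (step 2)."""
        if self.last_day is not None:
            gap = day - self.last_day
            self.gap_count += 1
            # Incremental mean, so no per-transaction history has to be kept.
            self.mean_gap += (gap - self.mean_gap) / self.gap_count
        self.last_day = day

    def silence_probability(self, today: float) -> float:
        """P(no transaction for this long), under the assumed exponential model."""
        if self.gap_count == 0 or self.mean_gap <= 0:
            return 1.0  # not enough history to form an expectation
        return math.exp(-(today - self.last_day) / self.mean_gap)


def should_notify(profile: SpendProfile, today: float, threshold: float = 0.01) -> bool:
    """Step 3: trigger when the observed silence is less likely than the threshold."""
    return profile.silence_probability(today) < threshold


profile = SpendProfile()
for day in [0, 7, 14, 21, 28]:  # step 1: weekly historical spend
    profile.update(day)

print(should_notify(profile, today=30))  # False: 2 quiet days is normal for a weekly spender
print(should_notify(profile, today=90))  # True: 62 quiet days is far outside the profile
```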


Hadoop Streaming with secondary sort

Sort

• Primary key = account number

• Secondary key = date-time

Calculate triggers in reducer

• Build up the profile from account-grouped, date-time-ordered transactions

• Reuse existing Python code

Results

• 10M accounts with 1.2B transactions over 24 months

• No profile data needs to be stored (~50GB / snapshot)

• Finished in 1 hour 17 minutes on 6 machines with 8 cores each

• Trigger delivery improved from weekly to daily
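A reducer that relies on this sort order might look like the sketch below. It assumes tab-separated input lines of the form `account<TAB>ISO timestamp<TAB>amount`, which is a hypothetical layout, and it computes a simple per-account statistic rather than the actual trigger logic. Because secondary sort delivers each account's lines together and in time order, the reducer can stream through them without buffering or re-sorting. In Hadoop Streaming this ordering is commonly arranged by partitioning on the first key field only (for example with `KeyFieldBasedPartitioner`) while sorting on both fields.

```python
from datetime import datetime
from itertools import groupby


def parse(line):
    """Split one 'account<TAB>timestamp<TAB>amount' line (assumed layout)."""
    account, timestamp, amount = line.rstrip("\n").split("\t")
    return account, datetime.fromisoformat(timestamp), float(amount)


def reduce_stream(lines):
    """Yield 'account<TAB>count<TAB>longest gap in days' for each account.

    Relies on the input already being sorted by account, then by date-time.
    """
    records = (parse(line) for line in lines if line.strip())
    for account, group in groupby(records, key=lambda record: record[0]):
        previous = None
        count = 0
        longest_gap_days = 0.0
        for _, when, _amount in group:
            if previous is not None:
                gap = (when - previous).total_seconds() / 86400
                longest_gap_days = max(longest_gap_days, gap)
            previous = when
            count += 1
        yield f"{account}\t{count}\t{longest_gap_days:.1f}"


# In a streaming job the reducer would be fed sys.stdin; a small sample is used here.
sample = [
    "A1\t2016-01-01T10:00:00\t25.00",
    "A1\t2016-01-03T10:00:00\t10.00",
    "A1\t2016-01-10T10:00:00\t5.00",
    "B2\t2016-02-01T09:00:00\t99.00",
]
for row in reduce_stream(sample):
    print(row)
```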


Apache Spark


Interactively explore credit card transaction data

Credit card transaction data: 24 months

• 25GB bzipped

• 1.2B transactions

– 18 fields / transaction

Cluster:

• 8 machines

– 32 cores / machine

– 256GB memory / machine

Split, convert and load data

Fire up spark-shell

Cache data

Cache it!

Check cached data and executors

Explore data (fast!)

Take a peek

Five-number summary on TRAN_AMT
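For reference, a five-number summary is the minimum, first quartile, median, third quartile, and maximum. The deck's screenshots of the Spark session are not in this transcript, so the plain-Python sketch below only shows what is being computed on a column such as TRAN_AMT; the sample amounts are made up, and the "inclusive" quartile convention is one of several in common use.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical transaction amounts standing in for the TRAN_AMT column.
tran_amt = [25.0, 5.0, 45.0, 15.0, 35.0, 10.0, 40.0, 20.0, 30.0]
print(five_number_summary(tran_amt))  # (5.0, 15.0, 25.0, 35.0, 45.0)
```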


Save results

Top merchant ZIP Codes™

Recap and tips

Start Spark Shell

• Set a proper number of executors and memory per executor

Convert, load, cache data

• Spark >= 1.6: memory efficient

• Partition data to fit each executor’s memory limit

Explore
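The two sizing tips can be turned into a back-of-the-envelope check before launching the shell. The sketch below is a rule of thumb, not a Spark API: `plan_cache`, the 0.5 storage fraction, and the example cluster numbers are all assumptions, and the in-memory size of a dataset is usually much larger than its compressed size on disk.

```python
import math


def plan_cache(dataset_gb, executors, cores_per_executor,
               executor_memory_gb, storage_fraction=0.5):
    """Rough sizing check; every input is a hypothetical planning number."""
    cache_capacity_gb = executors * executor_memory_gb * storage_fraction
    total_cores = executors * cores_per_executor
    # Memory share available to one task when all cores of an executor are busy.
    per_task_gb = executor_memory_gb * storage_fraction / cores_per_executor
    partitions_for_memory = math.ceil(dataset_gb / per_task_gb)
    # Use at least one partition per core, rounded up to whole waves of tasks.
    waves = max(1, math.ceil(partitions_for_memory / total_cores))
    return {
        "fits_in_cache": dataset_gb <= cache_capacity_gb,
        "partitions": waves * total_cores,
    }


# Example: 200GB in memory, 16 executors with 8 cores and 64GB each (made-up numbers).
print(plan_cache(dataset_gb=200, executors=16, cores_per_executor=8, executor_memory_gb=64))
# {'fits_in_cache': True, 'partitions': 128}
```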


Graph database


Challenge: Finding the missing link

Who are my family members?

Potential applications:

• Healthcare: Elderly patients living close to their children

• Wealth services: Identify the heirs of elderly customers

• Retail: Condolence / celebration / holiday gifts and services

• Anti-money laundering: Domestic politically exposed persons

• Fraud prevention: Synthetic ID fraud

What is a graph database?

Graph

• A collection of vertices (nodes) and edges (relationships) that connect them

Graph database

• Index-free adjacency: connected nodes physically “point” to each other in the database

Design the graph for family search

• Extremely flexible data format

• Most of the time, family members are not directly connected

• Nodes that are useful family indicators:

– Address

– Phone number

– Email address

– Last name

• Other uses:

– Meetup / eHarmony (based on hobbies, tastes, etc.)

– Facebook / LinkedIn (based on co-workers, classmates, etc.)
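The design above can be sketched with an in-memory adjacency structure: people are never linked directly, only through shared indicator nodes, so family candidates are found two hops away. The `FamilyGraph` class, the sample people, and the "at least two shared indicator types" rule are illustrative assumptions, not the schema or scoring used in the talk, and a real deployment would run the equivalent traversal in a graph database.

```python
from collections import defaultdict


class FamilyGraph:
    """Tiny in-memory graph: person nodes connect only to indicator nodes."""

    def __init__(self):
        self.adjacent = defaultdict(set)  # node -> set of neighboring nodes

    def link(self, person, kind, value):
        """Connect a person to an indicator node such as ('address', '12 Oak St')."""
        person_node = ("person", person)
        indicator = (kind, value)
        self.adjacent[person_node].add(indicator)
        self.adjacent[indicator].add(person_node)

    def candidates(self, person, min_shared=2):
        """People two hops away who share at least min_shared indicator types."""
        start = ("person", person)
        shared = defaultdict(set)
        for indicator in self.adjacent.get(start, ()):
            for other in self.adjacent[indicator]:
                if other != start:
                    shared[other[1]].add(indicator[0])
        return {name: sorted(kinds) for name, kinds in shared.items()
                if len(kinds) >= min_shared}


graph = FamilyGraph()
graph.link("Ann Lee", "last_name", "Lee")
graph.link("Ann Lee", "address", "12 Oak St")
graph.link("Ann Lee", "phone", "555-0100")
graph.link("Bob Lee", "last_name", "Lee")
graph.link("Bob Lee", "address", "12 Oak St")
graph.link("Cat Lee", "last_name", "Lee")
graph.link("Cat Lee", "address", "9 Elm Rd")
graph.link("Dan Wu", "address", "12 Oak St")

print(graph.candidates("Ann Lee"))  # {'Bob Lee': ['address', 'last_name']}
```

Sharing only a last name (Cat Lee) or only an address (Dan Wu) is not enough under this rule, which is why a single indicator type is treated as weak evidence here.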


Comparison

SQL query (RDBMS) vs. Cypher query (graph database)

Geolocation database with PostgreSQL


Geolocation data

Exponential growth of mobile location data with the rise of smartphones

Wide applications:

• Home / work location detection

• Favorite shops

• Mobile marketing service

• Passenger analysis

Key question:

• Where has the consumer been?

Supporting components:

• Where is the Points of Interest (POI) data?

• Which POIs are around the consumer?

[Figure: what location data reveals: where you live, where you work, where you shop, where you travel, how you get there, events you attend, where you spend free time]

OpenStreetMap: best free source for points of interest

OpenStreetMap (OSM) is a collaborative project to create a free, editable map of the world

• Not as accurate as Google, but getting closer and closer, especially in major cities

• Points, lines, polygons

• Rich tags:

– Addr: house number, street, city, etc.

– Shop: alcohol, beverage, computer

– Admin_level: 2 (country), 4 (state), 6 (city)

– Highway: residential, primary, cycleway, track, etc.

– Amenity: library, school, parking area, bar

– Cuisine: coffee, pizza, Chinese, sushi

• Can be easily imported into PostgreSQL with the PostGIS extension

What POIs are around me?


Geohash

• Hierarchical coding of (latitude, longitude) coordinates into nested grid cells

• Arbitrary accuracy

• Fast encoding

[Figure: cell 9 divided into 32 two-character cells]

9p 9r 9x 9z
9n 9q 9w 9y
9j 9m 9t 9v
9h 9k 9s 9u
95 97 9e 9g
94 96 9d 9f
91 93 99 9c
90 92 98 9b

[Figure: cell 9q divided into 32 three-character cells]

9qb 9qc 9qf 9qg 9qu 9qv 9qy 9qz
9q8 9q9 9qd 9qe 9qs 9qt 9qw 9qx
9q2 9q3 9q6 9q7 9qk 9qm 9qq 9qr
9q0 9q1 9q4 9q5 9qh 9qj 9qn 9qp
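The encoding behind these cells is short enough to sketch. This is the standard public geohash scheme (alternating longitude and latitude bisections, five bits per base-32 character), written here as a plain function; the name `geohash_encode` and the sample coordinates are choices made for this example. Each extra character refines the previous cell, which is why a longer hash always starts with the shorter one.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    refine_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        refine_lon = not refine_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# A point in San Francisco falls in cell 9, then 9q, then 9q8.
print(geohash_encode(37.7749, -122.4194, precision=3))  # 9q8
```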


Nearby points: easy case of vicinity search

Which store am I visiting?

• Identify the search radius

• Find POI candidates within the candidate geohash cells

• Filter by actual distance calculation
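A minimal sketch of this filter-then-measure pattern follows. It assumes POIs are bucketed by a 6-character geohash and that the search radius is smaller than one cell, so hashing the nine points of a small box around the query reaches every cell that can contain a match. The helper names, the precision, and the sample POIs are hypothetical, and edge cases near the poles or the antimeridian are ignored.

```python
import math
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
PRECISION = 6  # cells of roughly 1.2 km x 0.6 km at the equator; keep radius well below that
EARTH_RADIUS_M = 6371000.0


def geohash_encode(lat, lon, precision):
    bounds = [[-180.0, 180.0], [-90.0, 90.0]]  # [longitude range, latitude range]
    values = [lon, lat]
    code, bits = [], 0
    for i in range(precision * 5):
        axis = i % 2  # even bits refine longitude, odd bits latitude
        mid = (bounds[axis][0] + bounds[axis][1]) / 2
        bit = values[axis] >= mid
        bounds[axis][0 if bit else 1] = mid
        bits = (bits << 1) | bit
        if i % 5 == 4:
            code.append(BASE32[bits])
            bits = 0
    return "".join(code)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_index(pois):
    """Bucket (name, lat, lon) tuples by geohash cell."""
    index = defaultdict(list)
    for name, lat, lon in pois:
        index[geohash_encode(lat, lon, PRECISION)].append((name, lat, lon))
    return index


def nearby(index, lat, lon, radius_m):
    # Step 1: turn the radius into a small box around the query point.
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / math.cos(math.radians(lat))
    # Step 2: candidate cells are those touched by the box (nine sample points).
    cells = {geohash_encode(lat + a * dlat, lon + b * dlon, PRECISION)
             for a in (-1, 0, 1) for b in (-1, 0, 1)}
    # Step 3: exact distance only for the few candidates left.
    hits = []
    for cell in cells:
        for name, plat, plon in index.get(cell, []):
            distance = haversine_m(lat, lon, plat, plon)
            if distance <= radius_m:
                hits.append((name, round(distance)))
    return sorted(hits, key=lambda hit: hit[1])


pois = [("Cafe", 37.7752, -122.4190), ("Bookstore", 37.7760, -122.4194),
        ("Museum", 37.7850, -122.4194)]
print(nearby(build_index(pois), 37.7749, -122.4194, radius_m=150))
```

The exact distance calculation runs only on POIs in a handful of cells rather than on the whole table, which is the point of filtering first.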


Nearby polygons: advanced case of vicinity search

Which park am I visiting?

Challenge #1: The geohash of a polygon is the geohash of its center, but the boundary could be very far away from its center

Solution: Categorize polygons by size first, then customize the search radius for each size category

Multiple level search: Find polygons of all sizes

Am I in the park?

Challenge #2: Given a point, how do we determine whether it is inside the polygon?

Solution: ST_Within (PostGIS built-in function), using the ray-casting algorithm

• Draw a ray from the point in any direction

• Count the number of intersections with the polygon boundary

• Odd: inside. Even: outside
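The odd/even rule is easy to see in a few lines of code. The sketch below is a textbook ray-casting test on planar coordinates, with the ray pointing in the positive x direction. It illustrates the idea only and is not a description of how PostGIS implements ST_Within; points exactly on the boundary are not handled specially, and the L-shaped "park" is made up.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # edge from vertex i to the next vertex
        # The edge can only cross the ray if its endpoints lie on opposite sides of y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # each crossing flips odd/even
    return inside


# Hypothetical L-shaped park, vertices listed in order around the boundary.
park = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(point_in_polygon(1, 3, park))  # True: one crossing (odd)
print(point_in_polygon(3, 3, park))  # False: in the notch, no crossings (even)
```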


Key takeaways

• Use OpenStreetMap + PostgreSQL (PostGIS) to handle your geolocation data

• Filter the candidates first before you calculate distance

Summary

Tips on some of the latest techniques, based on our experience

• Spark:

– Set a proper number of executors and memory per executor

– Partition data to fit each executor’s memory limit

• Graph database:

– Much more efficient when you would otherwise need multiple joins in a traditional RDBMS

– Much more flexible

• Geolocation data:

– OpenStreetMap + PostgreSQL

– Filter candidates before a proximity search

http://www.experian.com/big-data/datalabs.html

Questions and answers

Experian contact:

Hua Li [email protected]

Zhixuan Wang [email protected]

Share your thoughts about Vision 2017!


Please take the time now to give us your feedback about this session.

You can complete the survey at the kiosk outside.

How would you rate both the Speaker and Content?
