High-Performance Analytics - myvertica
TRANSCRIPT
Who We Are
Leading Payment Solutions Provider for Vertical Industry
• Airline ticketing
• Mutual funds
• ……
Top Acquirer for Small & Micro Business
Acquiring (merchant service) business
• Wholesale market
• Shopping mall
• Dining
• Supermarket
• Life style
Integrated Finance Enabler
Wealth management
• Trust, asset management products
Credit Loan
• Small & micro loan
Account Custodian
• P2P
2006
2011
2013
New Financial Eco-system Strategy
‣ Provide “Fundamental Infrastructure” for new financial eco-system
‣ Leverage distribution channels and customers to cross sell
‣ Align with traditional and new financial institutions to integrate financial products
Data Architecture v1.0
TTY Fund System
…
P2P Payment
System
Financial Data
External Data
Data Analysis
OBIEE
Reporting System
Data Query
DW1.0_v0.1_20160717
Oracle RAC
Standby
Database
Synchronizing
What We Need
• Fast data transfer
• High scalability
• Data analysis capabilities
• Stability and reliability
• Simple management
Data Architecture v2.0
TTY Fund System
…
P2P Payment
System
Financial Data
External Data
Real-time
Streaming Data
Storm
Topology
Data API
Redis
Data Query
Data Portal
Risk Control
System
OBIEE Reporting
System
Data Analysis
DW2.0_v0.5_20160810
Oracle RAC
Kafka
Standby
Database
DWH
Vertica Vertica Vertica
ETL
SRC
Benefit
• Overall data processing performance has improved by a factor of ten
• Data volume for analysis has expanded from a few months’ worth to 3-5 years’ worth
• Data extraction time has dropped from 4-5 hours to 1-2 hours, roughly a threefold speedup
• Data queries have become 100 times faster
Data Architecture v3.0
Hadoop
DWH3
DW3.0_v0.7_20160810
Data API
e2
Data Query
Data Portal
Risk Control
System
Data Analysis
(R, Python)
Vertica
DW3
Real-time
Streaming Data
Storm
Redis
Kafka
Other System
TTY Fund System
…
P2P Payment
System
Financial Data
External Data
Standby
Database
Vertica
SRC
Oracle RAC
OBIEE Reporting
System
ETL
Log Analysis
Network Behavior
Analysis
Data Query and Data Reports
Jan 2016 – Jun 2016
• Times of Data Query: 4,691
• Avg. Feedback Rate: 96.4%
• Total Data Reports: 386
Business Mgmt
Finance
Risk Control
Operation Mgmt
PR
BUs
Data Portal
• Display financial, risk, and operational indicators for 400 business units
• Use a variety of chart types, such as pie charts, line charts, and geographic maps
• Support queries over up to 3 years of historical data, with results returned in seconds
• Provide real-time analysis, tracking real-time trading trends and regional distribution
Data Portal - Topological Graph
Vertica
SRC
Vertica
DW3
Data API
(Java)
ATAT
e2
Web Service
Web
Data Portal
r2 DSL
Engine
Standby
Database
Data Report
Module
Data Query
Module
BI Module
e2
(Python)
IFS
ETL Cache
Management
Module
Data Access
Module
Data Storage
Module
MySQL
Data Portal
Console
Mobile
Data Portal
Data Report
Report
Download
Self Service
Data Query
Modeling
BI
Data Drilling
User
Management
System
Setting
Property
Management
DP_v0.1_20160810
DW3H
Hadoop
Data Portal - Time Series Analysis
• Extract trend components from the trading time series and analyze performance
• Identify outliers and interpret abnormal changes in light of actual business events
• Compare the trading time series with the same period of the previous year and identify the differences
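The three steps above can be sketched in a few lines. This is a minimal illustration, not the Data Portal's implementation: the centered moving average, the residual threshold of two standard deviations, and all function names and sample values are assumptions made for the example.

```python
from statistics import mean, pstdev

def moving_average(series, window):
    """Centered moving average as a simple trend estimate (None at the edges)."""
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = mean(series[i - half:i + half + 1])
    return trend

def find_outliers(series, trend, threshold=2.0):
    """Indices whose residual from the trend exceeds threshold * residual std."""
    residuals = {i: x - t for i, (x, t) in enumerate(zip(series, trend))
                 if t is not None}
    sigma = pstdev(list(residuals.values()))
    return [i for i, r in residuals.items() if abs(r) > threshold * sigma]

def year_over_year(current, previous):
    """Relative change of each period against the same period last year."""
    return [(c - p) / p for c, p in zip(current, previous)]

# Made-up daily trading volumes with one spike at index 4.
volumes = [100, 102, 101, 103, 180, 104, 105, 106, 107]
trend = moving_average(volumes, window=3)
print(find_outliers(volumes, trend))
print(year_over_year([110, 90], [100, 100]))
```

A flagged index is only a candidate: as the second bullet says, it still has to be interpreted against what actually happened in the business on that day.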
Data Portal - Data Drilling
• Summarize a body of data in multiple ways and present different views of the same underlying information
• Give deeper insight into a problem and help find its cause
Drill Down Drill Down
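As a rough illustration of drilling down, the sketch below summarizes the same rows at one level and then at the next level within a chosen group. The dimension names (`province`, `city`), the `amount` measure, and the sample rows are hypothetical; the slides do not say which hierarchy the portal uses.

```python
from collections import defaultdict

def summarize(rows, keys, measure="amount"):
    """Sum `measure` grouped by the given tuple of dimension keys."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)

def drill_down(rows, parent_keys, parent_value, child_key, measure="amount"):
    """Keep one parent group, then summarize it by the next level down."""
    subset = [r for r in rows
              if tuple(r[k] for k in parent_keys) == parent_value]
    return summarize(subset, parent_keys + (child_key,), measure)

# Made-up transactions.
rows = [
    {"province": "Jiangsu", "city": "Zhenjiang", "amount": 120.0},
    {"province": "Jiangsu", "city": "Nanjing", "amount": 80.0},
    {"province": "Zhejiang", "city": "Hangzhou", "amount": 50.0},
    {"province": "Jiangsu", "city": "Zhenjiang", "amount": 30.0},
]
print(summarize(rows, ("province",)))
print(drill_down(rows, ("province",), ("Jiangsu",), "city"))
```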
Data Portal - Real-time Trading Volume Forecast
• The historical trading volume time series for P2P payment is significantly autocorrelated
• Test the log-transformed series for stationarity and build an ARIMA model with dummy variables to predict the trend in trading volume over the next half hour
• Test whether the actual trading volume falls within the confidence interval, and flag outliers to prevent risks
• Provide a reference for decision makers and reflect the state of the business
P2P Payment Trading Volume Forecast
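The forecasting loop can be illustrated with a deliberately simplified stand-in. The sketch below is not the ARIMA model with dummy variables described above: it fits a plain AR(1) model to log volumes by least squares and builds an approximate 95% interval from the residual spread, ignoring parameter uncertainty. Function names and sample volumes are assumptions for the example.

```python
import math
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    m = mean(series)
    den = sum((v - m) ** 2 for v in series)
    num = sum((series[i] - m) * (series[i - lag] - m)
              for i in range(lag, len(series)))
    return num / den

def fit_ar1(log_series):
    """Least-squares fit of y[t] = c + phi * y[t-1] + e[t]; returns (c, phi, sigma)."""
    x, y = log_series[:-1], log_series[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    residuals = [b - (c + phi * a) for a, b in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return c, phi, sigma

def forecast_interval(last_volume, c, phi, sigma, z=1.96):
    """One-step forecast as (low, point, high), back on the volume scale."""
    center = c + phi * math.log(last_volume)
    return (math.exp(center - z * sigma), math.exp(center),
            math.exp(center + z * sigma))

def is_outlier(actual_volume, interval):
    """True when the observed volume falls outside the forecast interval."""
    low, _, high = interval
    return not (low <= actual_volume <= high)

# Made-up half-hourly volumes.
volumes = [1000, 1100, 1050, 1150, 1080, 1120, 1060, 1140, 1090, 1110]
c, phi, sigma = fit_ar1([math.log(v) for v in volumes])
interval = forecast_interval(volumes[-1], c, phi, sigma)
print(interval, is_outlier(5000, interval))
```

Working on logs keeps the forecast positive and makes the interval multiplicative, which suits volumes that vary by orders of magnitude over a day.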
Data Portal - User Behavior Tracking
Real-time regional distribution of trading
• Compute the trading volume of each region in real time, covering more than 600 cities
• Promptly reflect the effectiveness of marketing strategies such as coupons and promotional messages
• Help product managers locate key customers by comparing transaction amounts and observing activity in each region
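A per-region running total over a sliding window is one simple way to picture the real-time computation. The sketch below is an in-process stand-in only: the architecture slides list Kafka, Storm, and Redis for streaming, none of which is used here. The class name, the window length, and the sample events are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque

class RegionalVolume:
    """Running trading volume per city over a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()           # (timestamp, city, amount), oldest first
        self.totals = defaultdict(int)  # amounts as integers (e.g. cents)

    def add(self, timestamp, city, amount):
        """Record one transaction and drop events that left the window."""
        self.events.append((timestamp, city, amount))
        self.totals[city] += amount
        self._expire(timestamp)

    def _expire(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _, city, amount = self.events.popleft()
            self.totals[city] -= amount
            if self.totals[city] == 0:
                del self.totals[city]

    def top(self, n):
        """The n cities with the highest volume inside the window."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Made-up events: (seconds, city, amount).
tracker = RegionalVolume(window_seconds=60)
tracker.add(0, "Zhenjiang", 100)
tracker.add(30, "Nanjing", 50)
tracker.add(61, "Zhenjiang", 20)
print(tracker.top(2))
```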
Customer Centric Data Warehouse
Fact Labels
Derived Insights
• User Loss Probability
• Product and Content
Recommendations
• Credit Scores
• …
• Information Retrieval
• Complaints
• Marketing Times
• ...
• Customer Lifetime Value
• Activity Score
• Investment Style
• Risk Preference
• …
• Demographic Attributes
• Purchase Transactions
• Website Browsing
• …
360-Degree
Customer View
Customer Level
Interaction Status
Customer Centric Data Warehouse - Overview
Browse the catalog
Click to buy
Pay the order
Complete transaction
Customer Centric Data Warehouse - Profile
Marketing Times: Once
Complaints: None
Name: Mr. Hong
Gender: Male
Age: 32
Location: Zhenjiang, Jiangsu
P2P investment: 3,604,676.07 RMB
Account Number: 10
Fund investment: 9,000.00 RMB
ARPU: 2,978.14
RFM analysis: Rank 3
Active Evaluation: Grade 6
Purchase Evaluation: Grade 2
Yield Preference: 10%-15%
Fund Type Preference: Stock Fund
2013.05.13 09:28:19
Opened fund investment account
2013.05.13 14:22:25
Invested a stock fund
1,000.00 RMB
2014.09.18 14:09:09
Opened P2P payment account
2014.09.18 15:00:32
Used online chat
2014.10.15 14:18:01
Received repayment
7.39 RMB
2016.07.25 18:00:07
Last P2P payment
37,000.00 RMB
Customer Centric Data Warehouse - Industry Report
2014-2015Q1
• High costs of reaching targeted customers
• Mobile Era
• Social Channels
2015Q2
• Chinese Dama
2015Q3
• Liquidity Preference
2015
• White-collar Workers
• Decision Making in a Short Time
Risk Identification
Supay illegal-cash-advance identification
• Screen variables such as average transaction amount, trading period, and number of credit cards used, based on risk control experience
• Apply PCA to avoid subjective bias and select the principal components by contribution rate
• Compute the Silhouette Coefficient to determine the number of clusters, and use both K-Means and K-Medoids clustering algorithms
• Compare the groups and identify the group most suspected of illegal cash advances
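The cluster-count selection step can be sketched as follows. This is a minimal illustration under stated assumptions: it covers only K-Means and the mean silhouette coefficient, omits the PCA and K-Medoids steps, uses a deterministic centroid initialization for reproducibility, and runs on made-up 2-D points rather than real transaction features.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; initial centroids are evenly spaced input points."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; singletons score 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        dists = {c: [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i] for c in clusters}
        own = dists[labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                      # mean distance within cluster
        b = min(sum(d) / len(d) for c, d in dists.items()
                if c != labels[i] and d)             # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def best_k(points, candidates):
    """Candidate cluster count with the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))

# Made-up points forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
print(best_k(points, [2, 3, 4]))
```

Once the count is chosen, the groups are compared on the screened variables, which is where the risk control judgment in the last bullet comes in.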
‘Bad’ Signature Recognition
• Randomly draw over 60,000 samples and label each signature as good or bad
• Convert each sample to grayscale and normalize it to an n*n matrix
• Create a test set (2,000 samples) and a training set (838 samples), balancing the proportions of bad and good signatures
• Run a CNN model with TensorFlow and determine the best model parameters through iteration
• Evaluate the classification model; the identification accuracy is about 90%
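The preprocessing step (grayscale conversion and normalization to an n*n matrix) might look like the sketch below. The CNN itself is not shown. The BT.601 luminance weights, nearest-neighbour resampling, scaling to the range 0 to 1, and the function names are assumptions; the slides do not state which methods were used.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to luminance with BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_image]

def resize_nearest(gray, n):
    """Resample a 2-D grid to n x n with nearest-neighbour sampling."""
    h, w = len(gray), len(gray[0])
    return [[gray[i * h // n][j * w // n] for j in range(n)] for i in range(n)]

def normalize(gray):
    """Scale 0-255 intensities to the range 0 to 1."""
    return [[v / 255.0 for v in row] for row in gray]

def preprocess(rgb_image, n):
    """Full pipeline: grayscale, resize to n x n, normalize."""
    return normalize(resize_nearest(to_grayscale(rgb_image), n))

# Made-up 2 x 4 image of white and black pixels.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [[WHITE, WHITE, BLACK, BLACK],
         [BLACK, BLACK, WHITE, WHITE]]
print(preprocess(image, 2))
```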
Speech Recognition
• Collect over 2,000 Mandarin speech samples whose content is the digits 0 to 9
• Apply mel-frequency cepstrum processing to the audio and extract features
• Set up mutually exclusive training and test sets
• Build an n-gram model from the training speech corpus using the Kaldi speech recognition toolkit
• Optimize the model parameters of the GMM-HMM algorithm and test the stability of the model
• The identification accuracy is above 95%
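Mel-frequency features rest on a frequency warping that spaces filter bands evenly on the mel scale. The sketch below shows only that warping, using the common 2595 * log10(1 + f / 700) convention; it is not Kaldi code, and the band limits and filter count in the usage lines are illustrative assumptions.

```python
import math

def hz_to_mel(hz):
    """Map a frequency in Hz to the mel scale (common log10 convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(low_hz, high_hz, n_filters):
    """Centre frequencies (Hz) of n_filters bands evenly spaced in mel."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_filters)]

# Illustrative settings: 10 bands between 0 Hz and 4000 Hz.
centers = mel_center_frequencies(0.0, 4000.0, 10)
print([round(c) for c in centers])
```

The centres crowd together at low frequencies and spread out at high ones, which roughly mirrors how pitch differences are perceived.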
Text Analysis
• Scrape web pages, including P2P platform websites, news, and forums
• Remove duplicate and irrelevant web pages, tokenize the text, and apply filters
• Use a Naïve Bayes classifier for predictions such as identifying whether a loan is unsecured
• Monitor opinions in news and forums and analyze the semantic orientation of text to improve early warning systems
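The classification step can be sketched with a small multinomial Naïve Bayes classifier. This is a minimal illustration, not the presenters' model: add-one smoothing is an assumption, the class and label names are hypothetical, and the training documents are made-up, pre-tokenized English phrases (real Chinese text would first need word segmentation).

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
        self.vocabulary = {t for counts in self.token_counts.values()
                           for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)    # log prior
            for t in tokens:
                if t in self.vocabulary:             # skip unseen tokens
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training snippets.
docs = [["no", "collateral", "required"],
        ["unsecured", "credit", "loan"],
        ["backed", "by", "property", "collateral"],
        ["secured", "by", "vehicle", "mortgage"]]
labels = ["unsecured", "unsecured", "secured", "secured"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["unsecured", "loan", "no", "collateral"]))
```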