Machine Learning and Big Data By Poo Kuan Hoong (Multimedia University)

Post on 11-Jan-2017

Page 1: Machine learning and big data

Machine Learning and Big Data
By Poo Kuan Hoong (Multimedia University)

Page 2: Machine learning and big data

Disclaimer: The views and opinions expressed in these slides are those of the author and do not necessarily reflect the official policy or position of Multimedia University. The analyses in these slides are examples only. They should not be used in real-world analytic products, as they are based on very limited and dated open-source information. Assumptions made within the analysis are not reflective of the position of Multimedia University.

Page 3: Machine learning and big data

Data Science Institute

• The Data Science Institute is a research center based in the Faculty of Computing & Informatics, Multimedia University.

• Its members bring expertise from several faculties, including the Faculty of Computing and Informatics, the Faculty of Engineering, the Faculty of Management, and the Faculty of Information Science and Technology.

• The institute conducts research in leading data science areas, including stream mining, video analytics, machine learning, deep learning, next-generation data visualization, and advanced data modelling.

Page 4: Machine learning and big data

Research Structure

Domain | Sub-Domain | Research Areas
Algorithm and Machine Learning | High Performance and Parallel Computing | 1. HPC for massive heterogeneous data sources; 2. Enhanced algorithmic performance using shared and distributed memory parallel processing (GPGPU)
Algorithm and Machine Learning | Performance Optimization | 1. Big Data Stream Mining; 2. Data Storage
Social Media Analytics | Data Mining | 1. Predictive Analytics
Social Media Analytics | Social Media Modelling | 1. Sentiment Analysis; 2. Topic Modelling

Page 5: Machine learning and big data

Research Structure

Domain | Sub-Domain | Research Areas
Behavioral Analytics | Media Analytics | 1. Media Recommender; 2. Customer Profiling
Smart Cities | 1. Sensor networks
Transport & mobility management | 1. Image and Video Analytics
Network Analysis | 1. Fault Prediction; 2. Intrusion Prediction

Page 6: Machine learning and big data

Research Structure

Domain | Sub-Domain | Research Areas
Public Health Analytics | Public health data | 1. Infectious Disease modeling; 2. Home Monitoring and Sensing Technologies
Public Health Analytics | Multi-domain Electronic Health Records data | 1. Knowledge + Data Driven Risk Factor; 2. Text mining for clinical notes
Financial & Business Analytics | Marketing and e-commerce | 1. Finance and Banking
Financial & Business Analytics | Financial market design and behavior | 1. Time Series Analysis

Page 7: Machine learning and big data

In the near future….

Page 8: Machine learning and big data

Machine learning is all around us…

Machine learning is part of our daily lives:

• Email spam detection

• Photo search using keywords

• Movie and song recommender systems

• Voice recognition

• Video captioning

• Self-driving cars

• etc.

Page 9: Machine learning and big data

What is machine learning?

Data Algorithms Insight

Page 10: Machine learning and big data

Machine Learning 101

• Machine learning is a process for generalizing from examples
    • examples = example or "training" data
    • generalizing = building "statistical models" to capture correlations
    • process = an ongoing process: we keep validating and refitting models to improve accuracy

• Simple machine learning workflow:
    • Explore data
    • FIT models based on data
    • APPLY models in prediction
    • Evaluate and validate the models

*Essentially, all models are wrong, but some are useful
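The four workflow steps above can be sketched in a few lines of Python. This is only an illustration and is not taken from the slides: the data, the single feature (number of links in an email) and the one-threshold model (a decision stump) are all assumptions chosen to keep the example short.

```python
# Hypothetical examples: (links_in_email, is_spam). The values are invented.
examples = [(0, 0), (1, 0), (1, 0), (2, 0), (5, 1), (6, 1), (8, 1), (9, 1),
            (1, 0), (7, 1)]
train, held_out = examples[:8], examples[8:]

# 1. Explore: how balanced are the classes in the training data?
spam_share = sum(label for _, label in train) / len(train)

def fit_stump(rows):
    """2. FIT: pick the threshold that makes the fewest training errors."""
    def errors(threshold):
        return sum((x >= threshold) != bool(label) for x, label in rows)
    return min(sorted(x for x, _ in rows), key=errors)

def apply_stump(threshold, x):
    """3. APPLY: predict spam (1) when the feature reaches the threshold."""
    return int(x >= threshold)

threshold = fit_stump(train)
predictions = [apply_stump(threshold, x) for x, _ in held_out]

# 4. Evaluate: accuracy on examples the model has not seen during fitting.
accuracy = sum(p == label for p, (_, label) in zip(predictions, held_out)) / len(held_out)
print(threshold, predictions, accuracy)
```

In practice the evaluate step feeds back into fitting: if held-out accuracy is poor, the model or the features are revised and the loop repeats.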

Page 11: Machine learning and big data

3 types of machine learning

• Supervised Learning – generalizing from labeled data

http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_2.jpg
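As a rough illustration of learning from labeled data, the sketch below classifies a new point by majority vote among its nearest labeled neighbours (K-Nearest Neighbour). The feature vectors and the "ham"/"spam" labels are hypothetical and only serve the example.

```python
import math
from collections import Counter

# Hypothetical labeled examples: (feature vector, label).
labeled = [
    ((1.0, 1.2), "ham"), ((0.8, 1.0), "ham"), ((1.2, 0.9), "ham"),
    ((4.0, 4.2), "spam"), ((4.3, 3.9), "spam"), ((3.8, 4.1), "spam"),
]

def knn_predict(examples, query, k=3):
    """Return the majority label among the k examples closest to query."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

prediction_a = knn_predict(labeled, (1.1, 1.0))
prediction_b = knn_predict(labeled, (4.1, 4.0))
print(prediction_a, prediction_b)
```

The labels are what make this supervised: the algorithm never has to discover the two groups, it only has to generalize from them.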

Page 12: Machine learning and big data

3 types of machine learning

• Unsupervised learning – generalizing from unlabeled data

http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_3.jpg
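For unlabeled data, a common starting point is clustering. The sketch below is a bare-bones K-means under simplifying assumptions: the points are invented, the first k points serve as initial centroids, and the loop runs for a fixed number of iterations instead of testing for convergence.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # simplistic initialisation (assumption)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled points forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

No labels are supplied anywhere; the two groups emerge from the distances alone.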

Page 13: Machine learning and big data

3 types of machine learning

• Reinforcement learning – generalizing from feedback received over time

http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_5.jpg
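One way to picture learning from feedback is tabular Q-learning on a toy problem. The corridor environment, the reward of 1 at the goal, and the values of ALPHA and GAMMA below are all assumptions. To keep the run deterministic, the sketch tries every state-action pair in turn rather than exploring at random as a real agent would.

```python
GOAL = 4                 # hypothetical corridor: states 0..4, reward on reaching state 4
ACTIONS = (-1, +1)       # step left or step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: move along the corridor, return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, (1.0 if next_state == GOAL else 0.0)

# Q-table: estimated long-term value of taking action a in state s.
q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}

for _ in range(200):
    for s, a in list(q):
        next_state, reward = step(s, a)
        # The goal is terminal, so no future value is added there.
        future = 0.0 if next_state == GOAL else max(q[(next_state, b)] for b in ACTIONS)
        # Q-learning update: nudge the estimate towards reward + discounted future value.
        q[(s, a)] += ALPHA * (reward + GAMMA * future - q[(s, a)])

# Greedy policy: in each state, take the action with the highest estimated value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

The only feedback is the reward at the goal, yet the value estimates propagate backwards until every state prefers stepping right.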

Page 14: Machine learning and big data
Page 15: Machine learning and big data

Common machine learning techniques…

Naive Bayes

Decision Tree

K-Nearest Neighbour

Artificial Neural Network

Support Vector Machine

Ensemble Methods: Random Forest, Bagging, Adaboost

Logistic Regression

K-means

Page 16: Machine learning and big data

Which technique to use?

• What is the size and dimensionality of my training set?

• Is my data linearly separable?

• How much do I care about computational efficiency?
    • Model building vs real-time prediction time
    • Eager vs lazy learning / online vs batch learning
    • Prediction performance vs speed

• Do I care about interpretability or should it "just work well?"
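The eager vs lazy trade-off can be made concrete with two tiny classifiers. In this sketch (data and class names are hypothetical), the lazy learner only stores the training set and does all its work at prediction time, while the eager learner summarises each class as a centroid up front and predicts cheaply.

```python
import math
from collections import defaultdict

# Hypothetical training data: (feature vector, label).
train = [((1.0, 1.0), "a"), ((1.5, 0.5), "a"), ((0.5, 1.5), "a"),
         ((5.0, 5.0), "b"), ((5.5, 4.5), "b"), ((4.5, 5.5), "b")]

class LazyNearestNeighbour:
    def fit(self, examples):
        self.examples = list(examples)  # no computation, just store the data
        return self

    def predict(self, x):
        self.comparisons = len(self.examples)  # one distance per stored example
        return min(self.examples, key=lambda ex: math.dist(ex[0], x))[1]

class EagerNearestCentroid:
    def fit(self, examples):
        groups = defaultdict(list)
        for features, label in examples:
            groups[label].append(features)
        # All the work happens here: one centroid per class.
        self.centroids = {label: tuple(sum(col) / len(pts) for col in zip(*pts))
                          for label, pts in groups.items()}
        return self

    def predict(self, x):
        self.comparisons = len(self.centroids)  # one distance per class
        return min(self.centroids, key=lambda label: math.dist(self.centroids[label], x))

query = (4.8, 5.1)
lazy = LazyNearestNeighbour().fit(train)
eager = EagerNearestCentroid().fit(train)
lazy_label, eager_label = lazy.predict(query), eager.predict(query)
print(lazy_label, lazy.comparisons, eager_label, eager.comparisons)
```

With six training points the difference is trivial, but the lazy learner's prediction cost grows with the training set, which matters when predictions must be served in real time.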

Page 17: Machine learning and big data

What can I do with machine learning?

• Customer Churn Analysis

• Predictive Maintenance

• Customer Segmentation

• Product Recommendation

Page 18: Machine learning and big data

Business Analytics: Predict Customer Churn

• Problem: Customer churn leads to lost income and high costs for acquiring new customers

• Solution: Build a predictive model to forecast likely churn, act pre-emptively, and learn from historical data

1. Get customer data (set-top boxes, web logs, transaction history)

2. Explore data, and fit predictive models based on past or real-time data

3. Apply and validate models until predictions are accurate

4. Identify customers likely to churn

5. Escalate the incidents to Business Ops. to investigate and act accordingly
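Steps 2 to 4 above might look like the following sketch, which fits a logistic regression by batch gradient descent and flags customers whose predicted churn probability exceeds 0.5. Everything specific here is an assumption made for illustration: the two features, the history rows, the customer IDs, the scaling constants and the 0.5 cut-off.

```python
import math

# Hypothetical history: (weeks_inactive, complaints, churned). Values are invented.
history = [
    (8, 3, 1), (10, 2, 1), (7, 4, 1), (9, 5, 1),
    (0, 0, 0), (3, 0, 0), (1, 1, 0), (3, 1, 0),
]

def scale(weeks, complaints):
    """Bring both features into 0..1 (assumed maxima: 10 weeks, 5 complaints)."""
    return (weeks / 10, complaints / 5)

def churn_probability(weights, features):
    w1, w2, bias = weights
    x1, x2 = features
    return 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + bias)))

def fit(rows, learning_rate=0.5, epochs=10000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for weeks, complaints, churned in rows:
            x1, x2 = scale(weeks, complaints)
            error = churn_probability(weights, (x1, x2)) - churned
            grad[0] += error * x1
            grad[1] += error * x2
            grad[2] += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights

weights = fit(history)                          # step 2: fit on past data
current = {"C-201": (9, 3), "C-202": (2, 0)}    # hypothetical active customers
at_risk = [cid for cid, raw in current.items()  # steps 3-4: apply and identify
           if churn_probability(weights, scale(*raw)) > 0.5]
print(at_risk)
```

The `at_risk` list is what would be escalated to Business Ops in step 5. A real pipeline would also validate the model on held-out customers before acting on it.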

Page 19: Machine learning and big data

Operation Analytics: Predictive Maintenance

• Problem: Network or service outages lead to lost income and high expenses

• Solution: Build a predictive model to forecast likely outages, act pre-emptively, and learn from historical data

1. Get resource usage data (latency, syslog, outage reports)

2. Explore data, and fit predictive models based on past or real-time data

3. Apply and validate models until predictions are accurate

4. Forecast resource saturation, demand and usage

5. Escalate the incidents to IT Ops. to investigate and act accordingly
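Step 4, forecasting resource saturation, can be approximated by extrapolating a trend line. The sketch below fits a least-squares line to daily usage and estimates how many days remain before a capacity threshold is reached. The usage figures, the 90% threshold and the 7-day alert window are assumptions, and the data is idealised so that the trend is exactly linear.

```python
# Hypothetical daily disk usage in percent, one reading per day.
usage = [52, 55, 58, 61, 64, 67, 70]
CAPACITY = 90          # assumed saturation threshold, in percent
ALERT_WINDOW_DAYS = 7  # assumed lead time that IT Ops needs

def fit_trend(values):
    """Least-squares line through (day index, value); returns (slope, intercept)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_trend(usage)

if slope > 0:
    # Day index at which the trend line crosses capacity, minus today's index.
    days_left = (CAPACITY - intercept) / slope - (len(usage) - 1)
else:
    days_left = float("inf")  # usage is flat or falling: no saturation forecast

escalate = days_left <= ALERT_WINDOW_DAYS
print(round(days_left, 1), escalate)
```

Real usage data is noisy and often seasonal, so a straight line is only a first approximation; the point is that the forecast turns raw metrics into a decision about when to escalate.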

Page 20: Machine learning and big data

Summary: The machine learning process

• Problem: Identify a problem that costs time and money

• Solution: Build a predictive model to forecast likely incidents, act pre-emptively, and learn

1. Get all data relevant to the problem

2. Explore data, and fit predictive models on past/real-time data

3. Apply and validate models until predictions are accurate

4. Forecast KPIs and metrics associated with the use case

5. Escalate the incidents to respective units to investigate and act

Page 21: Machine learning and big data

Machine learning tools

Page 22: Machine learning and big data

Thanks!

Questions?

@kuanhoong

https://www.linkedin.com/in/kuanhoong
