machine learning and big data
Machine Learning and Big Data
By Poo Kuan Hoong (Multimedia University)
Disclaimer: The views and opinions expressed in these slides are those of the author and do not necessarily reflect the official policy or position of Multimedia University. Examples of analysis performed within these slides are only examples. They should not be utilized in real-world analytic products as they are based only on very limited and dated open source information. Assumptions made within the analysis are not reflective of the position of Multimedia University.
Data Science Institute
• The Data Science Institute is a research center based in the Faculty of Computing & Informatics, Multimedia University.
• Its members bring expertise from across faculties such as the Faculty of Computing and Informatics, Faculty of Engineering, Faculty of Management and Faculty of Information Science and Technology.
• It conducts research in leading data science areas including stream mining, video analytics, machine learning, deep learning, next-generation data visualization and advanced data modelling.
Research Structure
| Domain | Sub-Domain | Research Areas |
| Algorithm and Machine Learning | High Performance and Parallel Computing | 1. HPC for massive heterogeneous data sources; 2. Enhanced algorithmic performance using shared and distributed memory parallel processing (GPGPU) |
| | Performance Optimization | 1. Big Data Stream Mining; 2. Data Storage |
| Social Media Analytics | Data mining | 1. Predictive Analytics |
| | Social Media Modelling | 1. Sentiment Analysis; 2. Topic Modelling |
Research Structure
| Domain | Sub-Domain | Research Areas |
| Behavioral Analytics | Media Analytics | 1. Media Recommender; 2. Customer Profiling |
| | Smart Cities | 1. Sensor networks |
| | Transport & mobility management | 1. Image and Video Analytics |
| | Network Analysis | 1. Fault Prediction; 2. Intrusion Prediction |
Research Structure
| Domain | Sub-Domain | Research Areas |
| Public Health Analytics | Public health data | 1. Infectious Disease modeling; 2. Home Monitoring and Sensing Technologies |
| | Multi-domain Electronic Health Records data | 1. Knowledge + Data Driven Risk Factor; 2. Text mining for clinical notes |
| Financial & Business Analytics | Marketing and e-commerce | 1. Finance and Banking |
| | Financial market design and behavior | 1. Time Series Analysis |
In the near future…
Machine learning is all around us…
• Machine learning is part of our daily lives
  • Email spam detection
  • Photo search using keywords
  • Movie and song recommender systems
  • Voice recognition
  • Video captioning
  • Self-driving cars
  • etc.
What is machine learning?
Data Algorithms Insight
Machine Learning 101
• Machine Learning is a process for generalizing from examples
  • examples = example or "training" data
  • generalizing = building "statistical models" to capture correlations
  • process = ongoing process; we keep validating and refitting models to improve accuracy
• Simple machine learning workflow:
  • Explore data
  • FIT models based on data
  • APPLY models in prediction
  • Evaluate and validate the models
* Essentially, all models are wrong, but some are useful (George Box)
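The explore / FIT / APPLY / evaluate workflow can be sketched in a few lines of plain Python. This is only an illustration, not part of the original slides: the data is synthetic, a least-squares line stands in for "a model", and the names `fit_line`, `predict` and `mean_squared_error` are made up for this example.

```python
import random
import statistics

# Hypothetical data: y is roughly 3*x + 2 plus noise (made up for illustration).
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [3.0 * x + 2.0 + random.gauss(0, 0.5) for x in xs]

# 1. Explore: look at simple summary statistics before modelling.
print("mean x:", statistics.mean(xs), "mean y:", statistics.mean(ys))

# Hold out every fifth example so the model is validated on unseen data.
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 5 != 0]
held_out = [p for i, p in enumerate(pairs) if i % 5 == 0]

# 2. FIT: ordinary least squares for y = slope * x + intercept.
def fit_line(points):
    mean_x = statistics.mean(x for x, _ in points)
    mean_y = statistics.mean(y for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# 3. APPLY: use the fitted model to predict.
def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# 4. Evaluate: mean squared error on the held-out examples.
def mean_squared_error(model, points):
    return statistics.mean((predict(model, x) - y) ** 2 for x, y in points)

model = fit_line(train)
mse = mean_squared_error(model, held_out)
print("model:", model, "held-out MSE:", mse)
```

If the held-out error is too high, the loop repeats: explore again, refit, and re-validate.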
3 types of machine learning
• Supervised Learning – generalizing from labeled data
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_2.jpg
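As a rough illustration of generalizing from labeled data, here is a minimal k-nearest-neighbour classifier in plain Python. The feature vectors, the "spam"/"ham" labels and the function name `knn_predict` are hypothetical and chosen only for this sketch.

```python
import math
from collections import Counter

# Hypothetical labeled examples: (feature vector, label).
labeled = [
    ((1.0, 1.2), "spam"),
    ((0.9, 1.0), "spam"),
    ((1.2, 0.8), "spam"),
    ((3.0, 3.1), "ham"),
    ((3.2, 2.9), "ham"),
    ((2.8, 3.3), "ham"),
]

def knn_predict(examples, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(labeled, (1.1, 1.0)))
print(knn_predict(labeled, (3.0, 3.0)))
```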
3 types of machine learning
• Unsupervised learning – generalizing from unlabeled data
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_3.jpg
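For unlabeled data, a common approach is clustering. The sketch below is a bare-bones k-means (Lloyd's algorithm) in plain Python; the points are invented, and starting from the first k points as centroids is a simplification made for this example (real implementations usually use a randomized initialization).

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly assigning and re-averaging."""
    # Simplifying assumption: the first k points serve as initial centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled points forming two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.8, 1.1), (5.1, 4.9), (4.9, 5.2)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```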
3 types of machine learning
• Reinforcement learning – generalizing from feedback received over time
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_5.jpg
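One of the simplest settings for learning from feedback over time is a multi-armed bandit. The sketch below uses an epsilon-greedy strategy in plain Python; the reward probabilities, the step count and the function name `run_bandit` are assumptions made for illustration only.

```python
import random

def run_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from reward feedback."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)  # running estimate of each action's value
    counts = [0] * len(true_rewards)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(true_rewards))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(true_rewards)), key=lambda a: estimates[a])
        # Feedback from the (simulated) environment: reward 1 or 0.
        reward = 1.0 if rng.random() < true_rewards[action] else 0.0
        counts[action] += 1
        # Incremental mean update of the estimate for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical success probabilities for three actions.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
print(estimates, counts)
```

Over time the agent pulls the best-paying action most often, even though it was never told which one that is.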
Common machine learning techniques…
Naive Bayes
Decision Tree
K-Nearest Neighbour
Artificial Neural Network
Support Vector Machine
Ensemble Methods: Random Forest, Bagging, AdaBoost
Logistic Regression
K-means
Which technique to use?
• What is the size and dimensionality of my training set?
• Is my data linearly separable?
• How much do I care about computational efficiency?
  • Model building time vs real-time prediction time
  • Eager vs lazy learning / online vs batch learning
  • Prediction performance vs speed
• Do I care about interpretability, or should it "just work well"?
What can I do with machine learning?
• Customer Churn Analysis
• Predictive Maintenance
• Customer Segmentation
• Product Recommendation
Business Analytics: Predict Customer Churn
• Problem: Customer churn leads to lost income and high costs of acquiring new customers
• Solution: Build a predictive model to forecast possible churn, act pre-emptively and learn from historical data
1. Get customer data (set-top boxes, web logs, transaction history)
2. Explore data, and fit predictive models based on past or real-time data
3. Apply and validate models until predictions are accurate
4. Identify customers likely to churn
5. Escalate the incidents to Business Ops. to investigate and act accordingly
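Steps 2 to 4 above could look roughly like the following sketch, which fits a logistic regression by gradient descent in plain Python. Everything here is an assumption for illustration: the two features (support calls last month, months since last purchase), the customer IDs, the 0.5 cut-off and the function names are hypothetical, and a real project would typically use an established library and far more data.

```python
import math

# Hypothetical history: ((support_calls, months_since_purchase), churned 0/1).
history = [
    ((0, 1), 0), ((1, 0), 0), ((0, 2), 0), ((1, 1), 0), ((2, 1), 0),
    ((4, 5), 1), ((5, 4), 1), ((3, 6), 1), ((6, 5), 1), ((4, 4), 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(data, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias

def churn_probability(model, x):
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Fit on past data, then score current customers (IDs are made up).
model = fit_logistic(history)
current_customers = {"C001": (0, 1), "C002": (5, 5), "C003": (1, 1)}
at_risk = [cid for cid, x in current_customers.items()
           if churn_probability(model, x) > 0.5]
print("Likely to churn:", at_risk)
```

The `at_risk` list is what would be escalated to Business Ops in step 5.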
Operation Analytics: Predictive Maintenance
• Problem: Network or service outages lead to lost income and high expenses
• Solution: Build a predictive model to forecast possible outages, act pre-emptively and learn from historical data
1. Get resource usage data (latency, syslog, outage reports)
2. Explore data, and fit predictive models based on past or real-time data
3. Apply and validate models until predictions are accurate
4. Forecast resource saturation, demand and usage
5. Escalate the incidents to IT Ops. to investigate and act accordingly
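Forecasting resource saturation (step 4) can be illustrated with a simple linear trend. The sketch below fits a least-squares line to daily disk usage and estimates how many days remain before a threshold is crossed. The usage figures, the 90% threshold and the function names are hypothetical; real usage patterns are rarely this linear.

```python
import statistics

# Hypothetical daily disk usage (% of capacity) for the last 10 days.
usage = [61.0, 62.1, 62.9, 64.2, 65.0, 66.1, 66.8, 68.1, 69.0, 70.2]

def fit_trend(values):
    """Least-squares line through (day index, value) pairs."""
    days = list(range(len(values)))
    mean_day = statistics.mean(days)
    mean_val = statistics.mean(values)
    sxy = sum((d - mean_day) * (v - mean_val) for d, v in zip(days, values))
    sxx = sum((d - mean_day) ** 2 for d in days)
    slope = sxy / sxx
    return slope, mean_val - slope * mean_day

def days_until(values, threshold):
    """Days from the last observation until the trend reaches `threshold`."""
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None  # no upward trend, so no forecast saturation
    day_hit = (threshold - intercept) / slope
    return max(0.0, day_hit - (len(values) - 1))

remaining = days_until(usage, threshold=90.0)
print("Estimated days until 90% usage:", remaining)
```

A forecast below some agreed lead time would be the trigger for escalating to IT Ops in step 5.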
Summary: The machine learning process
• Problem: Identify a problem that may cost time and incur high expenses
• Solution: Build a predictive model to forecast possible incidents, act pre-emptively and learn from historical data
1. Get all data relevant to the problem
2. Explore data, and fit predictive models on past/real-time data
3. Apply and validate models until predictions are accurate
4. Forecast KPIs & metrics associated with the use case
5. Escalate the incidents to the respective units to investigate and act
Machine learning tools
Thanks!
Questions?
@kuanhoong
https://www.linkedin.com/in/kuanhoong