imtiaz khan data_science_analytics

Upload: imtiaz-khan

Post on 23-Jan-2018

Page 1: Imtiaz khan data_science_analytics

Imtiaz Khan

Hyderabad Email: [email protected]

Mobile: +919493377607

A passionate engineer with a background in R, Azure Machine Learning, machine learning and algorithms, and a strong research interest in working with data of large volume, velocity and variety. Other experience includes web and Windows application development in C#, Visual Studio and SQL.

Professional Summary:

Intensive, hands-on experience in data analytics. Technical skills span statistics to programming, including data engineering, data visualization, machine learning, and programming in R and SQL.

Trained on the data analysis problems common to most business verticals: classification, regression, recommender systems, clustering, association analysis, frequent pattern mining and outlier detection.

Strong foundation in business analysis.

Exposure to Microsoft technologies such as the .NET Framework and the Visual Studio IDE.

Key Skills and Technologies:

Data Analytics: Predictive Analytics Problems: Supervised and Unsupervised Learning

R Language: RStudio

Azure Machine Learning Studio

Language Understanding Intelligent Service (LUIS)

R packages required for data science

Other IDEs: Visual Studio 2010, 2012

Microsoft .NET Framework 4.0

SQL Server Management Studio 2008, 2010, 2012, 2013

Microsoft Excel 2010, 2013

Windows 7, Windows 8

Technical Skills:

Data Collection Techniques

Collecting data from Excel/csv/tsv files

Collecting data from databases

Collecting data via scraping

Data Preparation Techniques

Structured Data Preparation:

Data Type Conversion

Category to Numeric Conversion

Numeric to Category Conversion

Data Normalization: 0-1, Z-Score

Handling Skewed Data: Box-Cox Transformation

Handling Missing Data

Text Data Preparation:

Normalizing Text

Stop Word Removal

Whitespace Removal

Stemming

Building Document Term Matrix

Image Data Preparation:

Converting to Grayscale

Pixel Value Normalization

Building Pixel Intensity Matrix

Data Analytics: Predictive Analytics (Classification, Regression, Recommenders)

Machine Learning Algorithms

Classification and Regression: KNN Model, Decision Tree Model, Naive Bayes Model, Logistic Regression, SVM Model

Recommenders: Content-Based Recommendation, User-User KNN Model, Item-Item KNN Model, Latent Factor Model

Clustering: Iterative Models, Hierarchical Models, Density Models

Outlier Detection: Probabilistic Model, Density Model, KNN Model

Association Analysis: Apriori Model

Mathematical Skills: Linear Algebra, Vector Algebra, Probability, Calculus and Statistics

Matrix Algebra

Understanding of factorization: spectral factorization, eigen factorization, SVD factorization

Applications of matrices: image processing, solving systems of equations, modelling discrete systems

Probability

Bayes Rule/Reasoning

MAP vs. MLE Reasoning

Properties of random variables: expectation, variance, entropy and cross-entropy, covariance and correlation

Understanding of standard random processes

Probability distributions: Normal, Gamma

Parameter estimation in distributions: MAP and MLE approaches

Statistics

Descriptive stats for single variable

mean, median, mode, quantiles, percentiles

standard deviation, variance

MAD, IQR

Descriptive stats for two variables

covariance

correlation

chi-squared Analysis

Hypothesis Testing

Job Experience:

Accenture

Senior Software Engineer

Machine Learning Projects:

Project: Ticket Classification in Application Management using Natural Language Processing (NLP) and Recommendation Systems

Problem Domain: Tickets come in many forms from end users. How can we reduce the time to classify and resolve these tickets?

This project aims to substantially reduce the cost-to-serve in Application Management by reducing the number of tickets and the time taken to resolve them through cognitive automation. Cognitive automation can help developers, testers and project managers make better decisions during the defect logging, defect resolution and test execution phases. It reduces the effort of analysing and classifying known issues, recommends similar issues, and automates creating a Team Foundation Server (TFS) work item and assigning it to the right team member to fix the issue.

Use Case 1: Issue to Issue

Identify similarities between the observed defect and historic issues.

Remove duplicate issues when similar ones are already being worked on.

Provide a consolidated view of similar defects that can be resolved together with the same fix.

Use Case 2: Issue to Resolver/Tester/Developer (Assignee Recommendation)

Enable Build/Test Leads to identify the best candidate to fix or test the issue fastest, based on the analytics the tool provides.

Show the history of testers and developers who worked on similar fixes, to help decide whom to assign the defect to for faster turnaround.

Use Case 3: Root Cause Analysis

Assist users in identifying and analysing the root causes underlying a ticket, using available log data and other information sources pertaining to the incident.

Cognitive Solution: High Level Approach

When a new ticket arrives, the tool parses its details, maps the ticket to the previously built knowledge model, and flags semantically highly similar tickets as duplicates.

Based on the user's acceptance or rejection of each recommendation, the tool learns incrementally and improves its duplicate detection.

If no duplicate tickets exist, the tool lists semantically related tickets in ranked order, together with their degree of semantic association.
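The duplicate-or-related ranking described above can be illustrated with a minimal sketch. It is written in plain Python and is not the project's implementation, which used LUIS and Azure Machine Learning. Bag-of-words cosine similarity stands in for the semantic similarity measure, and the function names, the 0.8 duplicate threshold and the sample tickets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar_tickets(new_ticket, history, duplicate_threshold=0.8):
    """Score historic tickets against new_ticket and split them in two.

    Tickets at or above the (assumed) threshold are treated as duplicates;
    the rest with a non-zero score are returned as related tickets.
    Both lists are ordered from highest to lowest score.
    """
    new_vec = Counter(tokenize(new_ticket))
    scored = [
        (ticket_id, cosine_similarity(new_vec, Counter(tokenize(text))))
        for ticket_id, text in history.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    duplicates = [(t, s) for t, s in scored if s >= duplicate_threshold]
    related = [(t, s) for t, s in scored if 0 < s < duplicate_threshold]
    return duplicates, related


# Hypothetical ticket history, for illustration only.
history = {
    "T1": "Login page throws error after password reset",
    "T2": "Report export to Excel fails with timeout",
    "T3": "Error on login page after password reset",
}
duplicates, related = rank_similar_tickets(
    "Login page error after password reset", history
)
```

Here `duplicates` holds the two login tickets and `related` is empty, because the export ticket shares no terms with the new one. The feedback loop from user acceptance or rejection is not modelled in this sketch.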

Technology Support

Language Understanding Intelligent Service (LUIS): for natural language processing of the email content.

Azure Machine Learning (ML): for classifying the content and finding recommendations.

Project: Churn Modelling in Telecom using R

This project is in the telecommunications domain (prepaid segment). It involves voluntary churn modelling to help business personnel understand the business problem: voluntary customer churn (e.g. a drop in usage, movement to another network, or no revenue generation) is a significant concern in many service industries. One way to decrease churn is to identify in advance the customers who are at risk of churning and target them with an incentive to stay. However, this requires accurate predictions about which customers are at risk. Churn is the term for customers quitting and joining another service provider. Most telecom companies suffer from voluntary churn. Churn rate has a strong impact on the lifetime value of the customer because it affects the length of service and the future revenue of the company.

The problem was framed as binary classification, since the goal was to predict one of two outcomes: “1” (churn) or “0” (not churn). The two machine learning models considered were logistic regression and a decision tree. Logistic regression had the edge, with an accuracy of around 78.4%. The proactive business action was to treat customers according to the revenue they generate for the company. For example, high-revenue customers were given regular calls and follow-ups, and low-revenue customers were emailed regularly.

Packages used in R

outliers: used for detecting outliers in the data set.

VIM: used for visualizing missing and/or imputed values, which helps in exploring the data and the structure of the missing and/or imputed values.

rattle: a graphical user interface for data mining using R.

car: provides the vif function (variance inflation factor) for detecting multicollinearity between predictors.

rpart: a decision tree package used for building the tree model.

ROCR: used for computing the AUC metric to compare models.
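As an illustration of how accuracy and AUC compare two churn classifiers, here is a minimal sketch in plain Python. It is not the R/ROCR code used in the project. The labels and predicted probabilities are made-up values, and `model_a` and `model_b` are hypothetical names, not the project's logistic regression and decision tree outputs.

```python
def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases where thresholding the score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Made-up data: 1 = churn, 0 = not churn.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]  # hypothetical predicted probabilities
model_b = [0.6, 0.5, 0.4, 0.7, 0.3, 0.2]  # hypothetical predicted probabilities

auc_a, auc_b = auc(labels, model_a), auc(labels, model_b)
acc_a = accuracy(labels, model_a)
```

Accuracy depends on the chosen threshold, while AUC summarises ranking quality across all thresholds, which is why it is convenient for comparing models.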

Projects: Learning Projects

A knowledge-driven supervised learning approach to identify the image of a handwritten single digit and determine what that digit is (Kaggle.com)

This competition is aimed at taking an image of a single handwritten digit and determining what the digit is. K-Nearest Neighbours and Naive Bayes were used separately for prediction. KNN performed better, with an accuracy of 97%. The dataset's parameters are first pre-processed to remove near-zero-variance parameters. The most effective parameters are then filtered and used for prediction. 10-fold cross-validation is used to create training and test sets.
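The combination of KNN and 10-fold cross-validation can be sketched in a few lines of plain Python. This is an illustration, not the R code used for the competition. The toy two-dimensional points stand in for pixel vectors, the near-zero-variance filtering step is left out, and the function names and the choice of k = 3 are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, folds):
    """Yield (train_idx, test_idx) pairs; fold i tests rows where j % folds == i."""
    for i in range(folds):
        test_idx = [j for j in range(n) if j % folds == i]
        train_idx = [j for j in range(n) if j % folds != i]
        yield train_idx, test_idx


def cross_validated_accuracy(xs, ys, folds=10, k=3):
    """Average accuracy over all held-out rows across the folds."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(xs), folds):
        train_x = [xs[j] for j in train_idx]
        train_y = [ys[j] for j in train_idx]
        for j in test_idx:
            if knn_predict(train_x, train_y, xs[j], k) == ys[j]:
                correct += 1
    return correct / len(xs)


# Toy data: two well-separated clusters standing in for two digit classes.
xs = [(i * 0.1, i * 0.1) for i in range(10)]
xs += [(5 + i * 0.1, 5 + i * 0.1) for i in range(10)]
ys = [0] * 10 + [1] * 10
score = cross_validated_accuracy(xs, ys, folds=10, k=3)
```

Each row is used for testing exactly once and is never part of the training set that classifies it, which is the point of the cross-validation step.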

A supervised learning approach to identify an insulting comment

The challenge is to detect when a comment from a conversation would be considered insulting to another participant in the conversation. Naive Bayes was used for prediction. The dataset's parameters (terms) are first pre-processed to normalize the text, remove punctuation, stop words and numbers, and stem the words. The most effective of the resulting bigram terms are then filtered and used for prediction. 10-fold cross-validation is used to create training and test sets.
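The text preparation steps above can be sketched as follows in plain Python, rather than the R used in the project. The stop-word list is a tiny illustrative assumption, stemming is omitted because it needs a dedicated library, and the sample comment is made up.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to", "of", "and"}


def preprocess(comment):
    """Lowercase, strip punctuation and numbers, and drop stop words."""
    text = comment.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # replaces punctuation and digits
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bigrams(tokens):
    """Return adjacent token pairs, each joined by a single space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


tokens = preprocess("You are the WORST player, 100% clueless!!")
terms = bigrams(tokens)
```

The resulting bigram terms would become the columns of a document-term matrix on which a classifier such as Naive Bayes is trained.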

Predict survival on the Titanic (Kaggle.com)

This competition is aimed at analysing what sorts of people were likely to survive, applying machine learning tools to predict which passengers survived the tragedy. Logistic regression and random forest were used separately for prediction and achieved the same score (0.77990). The dataset's parameters are first pre-processed to impute missing values. The most effective parameters are then filtered and used for prediction. 10-fold cross-validation is used to create training and test sets.
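The imputation step can be illustrated with a minimal sketch in plain Python. The source does not say which imputation method was used, so median imputation is an assumption here, and the records and field names are hypothetical.

```python
from statistics import median


def impute_median(rows, column):
    """Return copies of rows with None in `column` replaced by the column median."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = median(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical passenger records with one missing age.
passengers = [
    {"name": "A", "age": 22.0},
    {"name": "B", "age": None},
    {"name": "C", "age": 38.0},
    {"name": "D", "age": 26.0},
]
imputed = impute_median(passengers, "age")
```

The function returns new records and leaves the input untouched, so the original missing-value pattern stays available for inspection.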

Responsibilities:

1. Collected data from Excel/csv/tsv files, databases, services and web scraping.

2. Performed data normalization using Z-score and min-max normalization, smoothed skewed data through Box-Cox transformation, and handled missing data.

3. Conducted exploratory and descriptive data analysis on large data sets.

4. Explored features through univariate (mean, median and mode), bivariate (covariance and correlation) and multivariate (using the R package ggplot2) statistical relationships.

5. Applied dimensionality reduction and image compression using Principal Component Analysis (PCA).

6. Used a logistic regression model on the Titanic survivor dataset to measure the relationship between the categorical dependent variable and the independent variables by estimating probabilities.

7. Implemented K-Nearest Neighbors (KNN) to identify digits in a handwritten image dataset (image processing).

8. Performed text and sentiment analysis on comments posted on social media networks to classify whether a comment is insulting, using a Naïve Bayes approach.

9. Improved the models through K-fold cross-validation.
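The two normalization methods named in item 2 can be written out as a short sketch in plain Python rather than R. The function names and sample values are illustrative assumptions, and the population standard deviation is used for the Z-score.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Centre values on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: every z-score is 0
    return [(v - mu) / sigma for v in values]


# Made-up sample with mean 5 and population standard deviation 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = min_max_normalize(values)
standardized = z_score_normalize(values)
```

Min-max scaling bounds the result to the 0-1 range but is sensitive to extreme values, while the Z-score keeps outliers visible as large positive or negative values.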

Accenture

Senior Software Engineer

Projects

The projects below involved developing a Fare Management Solution using multiple technologies. We implemented this Solution using BizTalk, SharePoint, BI and Dynamics AX. AX provides transactional, finance and back-office support for the Solution. We integrated Dynamics AX with payment providers, BizTalk and the online website using AIF and WCF services.

Project - 3: Accenture Fare Management Solution

Senior Software Engineer (May 2013 – May 2015)

1. Worked on the Coded UI tool integrated with Visual Studio: wrote scripts and developed a framework that enables faster overnight execution of test cases.

2. Integrated our test suite with several technologies, including BizTalk, SQL Server, Microsoft Dynamics AX and a web application.

3. Learnt new technologies such as Microsoft Dynamics AX and created XPO files that enable faster creation of the data required for the test cases.

4. Developed and configured the result set produced after execution to be emailed to all team members using SMTP settings.

5. Implemented and developed automated test practices for both web and Windows applications, primarily using Visual Studio’s Coded UI module.

6. Designed and created test scripts in C# to address areas such as database impacts, software scenarios, regression testing, negative testing, error or bug retests, and usability, in preparation for implementation.

Project - 2: Accenture Software’s

Software Engineer (Nov 2012 - May 2013)

Responsibilities:

1. Contributed to testing and validating the AX solution against requirements.

2. Delivered testing results to the customer in a professional manner.

3. Delivered testing results within the required timeline and to the required quality.

4. Provided inputs for continuous improvement of the testing group.

5. Cooperated with the centralized Technical and Functional departments.

6. Documented and evaluated test results to log defects.

Project - 1: Presto E-ticketing

Associate Software Engineer (May 2012 - Nov 2012)

Responsibilities:

1. Major technologies involved in deploying the solution included .NET, SQL, BizTalk and Microsoft Dynamics AX.

2. Monitored the connectivity of servers and reported it to the developers to ensure stability of the environment.

3. Mentored freshers in performing their tasks, guiding them with the necessary knowledge transfer.

4. Provided timely reports to the supervisor about fluctuations in the environment and took necessary actions to ensure stability.

Certifications:

BCS, The Chartered Institute for IT: Foundation Certificate in Business Analysis

Educational Background:

Course of Study | Board/University | Year of Passing | Percentage

B.E. (Electronics and Communication) | Nagarjuna University | 2012 | 78

Intermediate (Higher Secondary Education) | Board of Intermediate Education, AP | 2008 | 96.4

10th standard (Schooling) | ICSE | 2006 | 87.5

Personal Details:

Name: Imtiaz Khan

Date of Birth: 02 Nov, 1990

Father’s Name: Mohammad Khan

Sex: Male

Nationality: Indian

Declaration:

I hereby declare that the information furnished above is true to the best of my knowledge.

Date:

Place: (Imtiaz khan)