Course - Cluster Analysis with R

Upload: persontyle

Post on 16-Apr-2017

© 2014 Persontyle Ltd. All rights reserved.

CLUSTER ANALYSIS WITH R

[cluster analysis divides data into groups that are meaningful, useful, or both]

LEARNING STAGE – ADVANCED

DURATION – 3 DAYS

WHAT IS CLUSTER ANALYSIS?

Cluster analysis, or clustering, is the study of methods and algorithms for

finding groups in data. It is an enormously important part of data science –

and a topic always treated in data mining and machine learning. Clustering

methods are used in areas as disparate as customer segmentation,

recommender systems, drug compound library design, risk modeling, fraud

detection, gene expression studies, field biology, text mining, … the list is

nearly endless. Clustering is an essential part of predictive modeling

methodology, from data exploration and hypothesis generation about the

classes or structure of data to forming part of some predictive

modeling tasks.

Some Applications of Cluster Analysis:

• Market researchers and analysts use cluster analysis to partition the

general population of consumers into market segments and to better

understand the relationships between different groups of

potential customers, in support of product positioning and new

product development.

• Clustering is used to group all the shopping items available on the web into

a set of unique products.

• In the study of social networks, clustering is used to

recognize communities within large groups of people.

• Cluster analysis is used to identify areas where there are greater

incidences of particular types of crime. By identifying these distinct areas or

"hot spots" where a similar crime has happened over a period of time, it is

possible to manage law enforcement resources more effectively.

• Flickr's map of photos and other map sites use clustering to reduce the

number of markers on a map.

R logo is trademark of the R Foundation, from http://www.r-project.org

This course presents a broad overview of Cluster Analysis, a form of

unsupervised machine learning that is used for exploratory data

analysis, data summarization, ordination, and even predictive modelling.

This course will provide an in-depth review of both clustering theory and

application across a large spectrum of disciplines and applied settings,

from drug discovery to management science.

Clustering topics, such as issues with data types, measures of

similarity, and clustering algorithms and their taxonomy, will be

additionally explored in the form of hands-on labs with the use of the

R programming language.

Participants will come away with information and a set of tools that will

form the basis for an approach to the use of Cluster Analysis for

clustering problems in their respective domain.

Cluster Analysis forms an important area of statistical learning theory,

both as an independent discipline of unsupervised learning and, at

times, as a subdomain within predictive modelling and supervised

learning.

CLUSTER ANALYSIS WITH R: a 3-day course for professionals and researchers interested in

developing practical skills in implementing clustering

algorithms using R.

“If we can get usable, flexible, dependable machine learning software into the hands of domain experts,

benefits to society are bound to follow.” Dr Kiri L. Wagstaff, researcher at NASA JPL

WHAT WILL YOU LEARN?

This course proceeds by way of simulated and practical examples in R, through

which participants will learn and explore: the general concerns of

data and data types used in clustering, measures of similarity (including

notions of “distance” and “metric”), the theoretical foundations of

clustering, data summarization, ordination, connections to data mining and

prediction, clustering approaches in the form of an informal taxonomy,

algorithm complexity, relevant graph theory, specific clustering algorithms

(model-based, hierarchical, partitional, graphical, hybrids, co-clustering,

asymmetric clustering, online clustering), visualization of various forms of

clustering results, clustering validation, and parallelism.
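
As a minimal sketch of two of the topics listed above (measures of similarity, and the hierarchical and partitional algorithm families), the following base R example clusters a small simulated data set. It is not taken from the course materials: the simulated data, the Euclidean distance, complete linkage, and the choice of two clusters are all illustrative assumptions.

```r
# Minimal sketch using only base R (the stats package ships with R).
# The data are simulated; group centres, spread, and sizes are assumptions.
set.seed(42)

# Two well-separated groups of 20 points each in two dimensions.
group_a <- matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2)
group_b <- matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
x <- rbind(group_a, group_b)

# Measure of similarity: pairwise Euclidean distances between rows.
d <- dist(x, method = "euclidean")

# Hierarchical clustering: agglomerative, complete linkage, cut into 2 groups.
hc <- hclust(d, method = "complete")
hc_labels <- cutree(hc, k = 2)

# Partitional clustering: k-means with 2 centres and several random starts.
km <- kmeans(x, centers = 2, nstart = 10)

# Cross-tabulate the two labelings to see how far they agree.
agreement <- table(hierarchical = hc_labels, kmeans = km$cluster)
print(agreement)
```

On well-separated data such as this, the two methods would be expected to recover the same grouping, possibly with the cluster numbers swapped. On real data the choice of distance, linkage, and number of clusters matters far more, which is where validation comes in.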

Participants will learn about data preparation and various clustering

algorithms and visualization methods with the help of R and

R clustering packages. Examples will include real-world applications in

drug discovery, bioinformatics, social media, management science,

finance, ecology, and others. Specific applications will include drug

compound library design and diversity, gene expression, community

detection, customer segmentation, species ordination, and QSAR (Quantitative

Structure-Activity Relationship). Participants will come away

from the course with the tools and applied understanding necessary to

approach a large array of clustering problems in their domain.

PREREQUISITES

Participants should have at least a passing familiarity with the following

topics: probability theory, statistics, matrix algebra, and programming in R.

VIRTUOUS CIRCLE OF LEARNING

Learning outcomes combine theory, an overview of concepts and practices,

applied examples from the real world, and implementation (hands-on labs).

Time allocated to each topic will drive the depth and coverage of that topic.

WHAT SHOULD I BRING?

Along with bringing your laptop and a charger, don’t forget to bring loads

of curiosity, scepticism, eagerness to participate and the desire to learn.

WHO SHOULD TAKE THIS COURSE?

DATA SCIENTISTS

QUANTITATIVE PROFESSIONALS

TECHNOLOGISTS/PROGRAMMERS

DATA/MARKET ANALYSTS

RESEARCH ANALYSTS

This course is intended for those currently working as data analysts, programmers, or market researchers with limited exposure to clustering techniques and algorithms, as well as those looking to move into the field.

COURSE INSTRUCTORS

“Today’s Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these,

developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data.”

Kevin P. Murphy, Research Scientist at Google

John MacCuish

John MacCuish is a founder and President of Mesa Analytics & Computing, Inc. and a

computer scientist with over 20 years of experience as a researcher, algorithm designer,

and data scientist in applied settings. John has published numerous journal articles,

books, successful grant applications, patents, and technical reports on graph theory,

algorithm animation, scientific visualization, image processing, cheminformatics,

bioinformatics, and data mining. He also wrote or contributed to many internal and

confidential reports on fraud detection, image recognition, precision agriculture,

economic modeling, queuing theory models, financial risk modeling, text mining, and

drug discovery. He is a recognized expert in cluster analysis, designing algorithms and

implementing original software for clustering solutions in the field of early drug

discovery. John has a Distinguished Performance Award from Los Alamos National

Laboratory for his work on the IRS Fraud Detection Project.

Dr. Norah MacCuish

Dr. Norah MacCuish received her Ph.D. from Cornell University in the field of

Theoretical Physical Chemistry. Her twenty years’ experience in pharmaceutical and

software companies has given her expertise in the areas of diversity assessment for

compound acquisitions, combinatorial chemistry library design, and chemical information

systems use and design, both in basic drug discovery research and software

development. She was awarded a Bronze Impact award for her collaborative work

involving a Smith Kline Pharmaceutical Partnership. Norah has numerous publications

and has made scientific presentations in the areas of fluid simulations, chemical

diversity analysis, object-relational database systems, and chemical cluster

analysis. She was the Principal Investigator for two Phase I NSF SBIR grants, as

well as a Phase II NSF SBIR titled “Cheminformatics Teaching Tools for the

Cheminformatics Virtual Classroom.”

THE SCHOOL OF DATA SCIENCE

The School of Data Science, a project of Persontyle, specializes in designing and delivering

structured, relevant and practical learning experiences for all of us to understand data science in

simple human terms.

RETURN ON INVESTMENT (ROI): CONVINCE YOUR BOSS

The advent of the data-driven, connected era means that analyzing

massive-scale, messy, noisy, and unstructured data is going to increasingly form part

of everyone's work.

The School of Data Science learning programs provide a unique investment

opportunity that pays for itself many times over.

For corporate bookings or to organize on-site training email

[email protected] or call now +44 (0)20 3239 3141

www.persontyle.com/school

• World-class Instructors

• Develop Practical Data Science Skills

• Real World Industry Use Cases

• Short Courses For Time Convenience

• Value For Money

Register Now

Follow us on Twitter @schooltds

Like us on Facebook

Get in touch! [email protected]

Limited seats. We encourage you to register as soon as you can.

"For the best return on your money, pour

your purse into your head." Benjamin Franklin