apache mahout - isabel drost-frommisabel-drost.de/hadoop/slides/devoxx.pdf · apache mahout making...


Apache Mahout: Making data analysis easy


Isabel Drost

Nighttime:

Co-founder and committer, Apache Mahout. Organiser of the Berlin Hadoop Get Together.

Daytime: Software developer.

Guest lecturer at TU Berlin. Co-organiser of Berlin Buzzwords 2010.


● “Mastering Data-Intensive Collaboration and Decision Making”

● EU-funded research project.
  – Number of partners: 8
  – Coordinator: Research Academic Computer Technology Institute (CTI), Greece


Hello Devoxx!


Machine learning background?


Agenda

● Data Mining / Machine Learning?

● Why is scaling hard?

● Going beyond simple statistics.


Data Mining Applications

● Marketing.
● Surveillance.
● Fraud detection.
● Scientific discovery.
● Discovering items usually purchased together.

= Extracting patterns from data.


Machine Learning Applications

● E-mail spam classification.
● News topic discovery.
● Building recommender systems.

= Extracting prediction models from data.


Machine learning – what's that?


Archimedes taking a Warm Bath. Image by John Leech, from The Comic History of Rome by Gilbert Abbott à Beckett. Bradbury, Evans & Co, London, 1850s.


Archimedes' model of nature


June 25, 2008 by chase-me, http://www.flickr.com/photos/sasy/2609508999


An SVM's model of nature


The challenge


Mission

Provide scalable data mining algorithms.


http://www.flickr.com/photos/honou/2936937247/


HowTo: From data to information.


January 3, 2006 by Matt Callow, http://www.flickr.com/photos/blackcustard/81680010


http://www.flickr.com/photos/29143375@N05/3344809375/in/photostream/

http://www.flickr.com/photos/redux/409356158/


http://www.flickr.com/photos/disowned/1158260369/

HDFS is not restricted to MapReduce jobs. Other applications can build on it too, many of them under way at Apache: the HBase database, the Apache Mahout machine learning library, and matrix operations.


http://www.flickr.com/photos/29143375@N05/3344809375/in/photostream/ http://www.flickr.com/photos/redux/409356158/in/photostream/

http://www.flickr.com/photos/noodlepie/2675987121/ http://www.flickr.com/photos/topsy/204929063/


From data to information.

● Collect data and define your learning problem.

● Data preparation.

● Training a prediction model.

● Checking the performance of your model.


● Remove noise.
● Convert text to vectors.


From texts to vectors


Sunny weather

High performance computing

If we looked at two words only:


Aaron

Zuse


Binary bag of words

● Imagine an n-dimensional space.
● Each dimension = one possible word in the texts.
● Vector entry is 1 if the word occurs in the text, 0 otherwise.

● Problem:
  ● The number of word occurrences is not accounted for.

$$b_{i,j} = \begin{cases} 1 & \text{if } t_i \in d_j \\ 0 & \text{otherwise} \end{cases}$$
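A minimal Python sketch of the binary bag of words, assuming a naive tokenizer (lowercase, ASCII letter runs only) and a vocabulary built from the texts themselves; the function names are hypothetical, not Mahout's API.

```python
import re

def tokenize(text):
    # Simplifying assumption: lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(texts):
    # One dimension per distinct word seen in any text, in sorted order.
    return sorted({word for text in texts for word in tokenize(text)})

def binary_vector(text, vocabulary):
    # b_{i,j} = 1 if word t_i occurs in text d_j, 0 otherwise.
    words = set(tokenize(text))
    return [1 if word in words else 0 for word in vocabulary]

texts = ["Sunny weather", "High performance computing", "sunny sunny day"]
vocabulary = build_vocabulary(texts)
print(vocabulary)
print(binary_vector(texts[2], vocabulary))
```

"sunny" appears twice in the last text but still gets a 1, which is exactly the problem the slide names.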


Term Frequency

● Imagine an n-dimensional space.
● Each dimension = one possible word in the texts.
● Vector entry equals the word's frequency in the text.

● Problem:
  ● Common words dominate the vectors.

$$b_{i,j} = n_{i,j}$$
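The term-frequency variant only swaps the 0/1 entry for a count. A short sketch, again assuming the naive tokenizer and a hand-picked, hypothetical vocabulary:

```python
import re
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())

def tf_vector(text, vocabulary):
    # b_{i,j} = n_{i,j}: number of times word t_i occurs in text d_j.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

vocabulary = ["rain", "sun", "the", "weather", "wind"]
print(tf_vector("The weather: the sun, the rain, the wind", vocabulary))
```

The entry for "the" is four times larger than any content word, which shows how common words dominate.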


TF with stop-word filtering

● Imagine an n-dimensional space.
● Each dimension = one possible word in the texts.
● Filter out stop words.
● Vector entry equals the word's frequency in the text.

● Problem:
  ● Common and rare words get the same weight.

$$b_{i,j} = n_{i,j}$$
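Filtering stop words before counting can be sketched as follows. The stop-word list here is a tiny hypothetical sample; real lists are much longer and language-specific.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

def tokenize(text):
    # Lowercase, keep ASCII letter runs, then drop stop words.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_vocabulary(texts):
    # Stop words never become dimensions.
    return sorted({word for text in texts for word in tokenize(text)})

def tf_vector(text, vocabulary):
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

texts = ["The weather: the sun, the rain, the wind"]
vocabulary = build_vocabulary(texts)
print(vocabulary, tf_vector(texts[0], vocabulary))
```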


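TF-IDF

● Imagine an n-dimensional space.
● Each dimension = one possible word in the texts.
● Filter out stop words.
● Vector entry equals the weighted frequency.

● Problem:
  ● Long texts get larger values.

$$b_{i,j} = n_{i,j} \times \log \frac{|D|}{|\{d : t_i \in d\}|}$$

Here $n_{i,j}$ is the number of occurrences of word $t_i$ in text $d_j$, and $|D|$ is the number of texts.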
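A small Python sketch of the TF-IDF formula above, assuming the natural logarithm (the base only rescales all weights) and the same hypothetical tokenizer and stop-word list as before:

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(texts):
    # b_{i,j} = n_{i,j} * log(|D| / |{d : t_i in d}|), natural log.
    docs = [Counter(tokenize(text)) for text in texts]
    vocabulary = sorted({word for doc in docs for word in doc})
    # Document frequency: in how many texts does each word occur at least once?
    doc_freq = {word: sum(1 for doc in docs if word in doc) for word in vocabulary}
    idf = {word: math.log(len(docs) / doc_freq[word]) for word in vocabulary}
    vectors = [[doc[word] * idf[word] for word in vocabulary] for doc in docs]
    return vocabulary, vectors

texts = ["sunny weather", "sunny sunny beach", "high performance computing"]
vocabulary, vectors = tfidf_vectors(texts)
for text, vector in zip(texts, vectors):
    print(text, [round(x, 3) for x in vector])
```

A word that occurs in every text gets weight $\log 1 = 0$, so it no longer dominates the vector.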


Normalized TF-IDF

● Imagine an n-dimensional space.
● Each dimension = one possible word in the texts.
● Filter out stop words.
● Vector entry equals the weighted frequency.
● Normalize vectors.

● Problem:
  ● Additional domain knowledge is ignored.

$$b_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \times \log \frac{|D|}{|\{d : t_i \in d\}|}$$
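The normalized formula divides each count by the text's total word count $\sum_k n_{k,j}$. A sketch under the same assumptions as the previous examples (naive tokenizer, hypothetical stop-word list, natural log), where an empty text is mapped to an all-zero vector by assumption:

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def normalized_tfidf_vectors(texts):
    docs = [Counter(tokenize(text)) for text in texts]
    vocabulary = sorted({word for doc in docs for word in doc})
    idf = {word: math.log(len(docs) / sum(1 for doc in docs if word in doc))
           for word in vocabulary}
    vectors = []
    for doc in docs:
        # sum_k n_{k,j}: word count of text j after stop-word removal.
        length = sum(doc.values())
        vectors.append([doc[word] / length * idf[word] if length else 0.0
                        for word in vocabulary])
    return vocabulary, vectors

texts = ["sunny beach", "sunny beach sunny beach", "high performance computing"]
vocabulary, vectors = normalized_tfidf_vectors(texts)
print(vectors[0] == vectors[1])
```

Repeating a text twice no longer changes its vector, which addresses the "long texts get larger values" problem of plain TF-IDF.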


Reality

● There are a few more words in news.
● Use all relevant features/signals available:
  ● Words.
  ● Header fields.
  ● Characteristics of the publishing URL.
  ● …
● Usually a pipeline of feature extractors.
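A pipeline of feature extractors can be as simple as a list of functions whose outputs are merged into one sparse feature map. This is a hypothetical sketch: the news-item format (a dict with body, headers and url), the extractor names, and the feature naming scheme are all assumptions for illustration.

```python
import re
from urllib.parse import urlparse

# Each extractor maps a raw news item (a hypothetical dict format)
# to a dict of named features; the pipeline merges their outputs.

def word_features(item):
    return {"word=" + w: 1.0 for w in re.findall(r"[a-z]+", item["body"].lower())}

def header_features(item):
    return {"header:" + k.lower() + "=" + v.lower(): 1.0
            for k, v in item["headers"].items()}

def url_features(item):
    parsed = urlparse(item["url"])
    return {"host=" + (parsed.hostname or ""): 1.0,
            "url_depth": float(parsed.path.count("/"))}

def run_pipeline(item, extractors):
    features = {}
    for extract in extractors:
        features.update(extract(item))
    return features

item = {"body": "Sunny weather in Brussels",
        "headers": {"Lang": "EN"},
        "url": "http://news.example.com/weather/today"}
features = run_pipeline(item, [word_features, header_features, url_features])
print(features)
```

New signals are added by appending another extractor to the list, without touching the existing ones.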


From data to information.

● Collect data and define your learning problem.

● Data preparation.

● Training a prediction model.

● Checking the performance of your model.


Step 2: Similarity


Euclidean

Cosine
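The two measures can be compared on toy term-frequency vectors (hypothetical data). Euclidean distance is sensitive to vector length; cosine similarity only looks at the angle between vectors.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine of the angle between a and b; ignores vector length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for empty texts (an assumption)
    return dot / (norm_a * norm_b)

short_text = [1, 1, 0]   # hypothetical TF vectors over a 3-word vocabulary
long_text = [10, 10, 0]  # same word mix, ten times longer
other_text = [0, 1, 1]   # different word mix

print(euclidean_distance(short_text, long_text), euclidean_distance(short_text, other_text))
print(cosine_similarity(short_text, long_text), cosine_similarity(short_text, other_text))
```

By Euclidean distance the short text is closer to the unrelated text; by cosine it is identical in direction to the long text with the same word mix.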


Step 3: Clustering


Until stable.
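The clustering slides iterate until the assignment is stable; given the "choice of initial k" point on the next slide, this reads as k-means. A minimal pure-Python sketch of Lloyd's k-means, assuming random seed selection from the input points and Euclidean distance (not Mahout's MapReduce implementation):

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Seed selection (an assumption): k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # "until stable"
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

`max_iter` caps the loop in case the centroids never settle exactly; seed selection and the choice of k are the knobs the next slide lists as practical problems.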


Reality

● Seed selection.

● Choice of initial k.

● Continuous updates.

● Regular addition of clusters.


From data to information.

● Collect data and define your learning problem.

● Data preparation.

● Training a prediction model.

● Checking the performance of your model.


Evaluation

● Compare against gold standard.

● Use quality measures.

● Manual inspection.
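Comparing clusters against a gold standard can be made concrete with purity, one common cluster-quality measure; the slide does not name a measure, so this choice is an assumption. Each cluster is credited with the items carrying its most frequent gold label.

```python
from collections import Counter

def purity(cluster_assignments, gold_labels):
    # Group gold labels by assigned cluster.
    by_cluster = {}
    for cluster, label in zip(cluster_assignments, gold_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    # Each cluster contributes the size of its majority gold label.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(gold_labels)

clusters = [0, 0, 0, 1, 1, 1]
gold = ["sports", "sports", "politics", "politics", "politics", "politics"]
print(purity(clusters, gold))
```

Purity alone is easy to game (one cluster per item scores 1.0), which is one reason to combine several quality measures with manual inspection.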


From data to information.

● Collect data and define your learning problem.

● Data preparation.

● Training a prediction model.

● Checking the performance of your model.


http://www.flickr.com/photos/generated/943078008/


What else does Mahout have to offer?


Identify dominant topics

● Given a dataset of texts, identify main topics.

● Examples:
  ● Dominant topics in a set of mails.
  ● Identifying news message categories.

Algorithms: Parallel LDA


Assign items to defined categories.

● Given pre-defined categories, assign items to them.
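The slide does not name a classification algorithm. The sketch below assumes multinomial naive Bayes with add-one smoothing, a common baseline for text such as the e-mail spam example earlier; the class, training data, and tokenizer are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)          # texts per category
        self.word_counts = defaultdict(Counter)    # word counts per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokenize(text):
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((counts[w] + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

model = NaiveBayes().fit(
    ["cheap pills buy now", "buy cheap watches",
     "meeting agenda for monday", "project meeting notes"],
    ["spam", "spam", "ham", "ham"])
print(model.predict("cheap watches now"), model.predict("monday meeting"))
```

Scores are summed in log space to avoid floating-point underflow on long texts.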


By freezelight, http://www.flickr.com/photos/63056612@N00/155554663/


Recommendation mining.

● Collaborative filtering.


Show most relevant ads


http://www.flickr.com/photos/alainpicard/4175214747

http://www.flickr.com/photos/25831000@N08/4156701164

http://www.flickr.com/photos/jfclere/4061801735

http://www.flickr.com/photos/claudio_ar/2643165035/

http://www.flickr.com/photos/claudio_ar/2643180457

Thanks to Falko Menge for the pictures of Brussels.

http://www.flickr.com/photos/joachim_s_mueller/2417313476/

http://www.flickr.com/photos/sebastian_bergmann/1244514498

http://www.flickr.com/photos/philfotos/4510197138/

Recommending places


Recommending people


Recommendation mining.

● Online collaborative filtering on a single machine.
● Offline MapReduce-based version.
● Content similarity can be integrated.

● Based on the former Taste project.
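The idea behind item-based collaborative filtering can be shown with a minimal co-occurrence recommender in plain Python. This is not the Java API of Mahout's Taste-derived recommenders; the input format, function names, and toy preferences are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def cooccurrence(preferences):
    # preferences: user -> set of liked items (hypothetical input format).
    # counts[a][b] = number of users who liked both a and b.
    counts = defaultdict(Counter)
    for items in preferences.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts

def recommend(user, preferences, counts, how_many=3):
    # Score unseen items by how often they co-occur with the user's items.
    seen = preferences[user]
    scores = Counter()
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    return [item for item, _ in scores.most_common(how_many)]

prefs = {
    "alice": {"Grand Place", "Atomium", "Manneken Pis"},
    "bob": {"Grand Place", "Atomium"},
    "carol": {"Atomium", "Manneken Pis", "Mini-Europe"},
    "dave": {"Grand Place"},
}
counts = cooccurrence(prefs)
print(recommend("dave", prefs, counts))
```

The co-occurrence table is the expensive part and can be computed offline (for example as a batch job), while scoring a single user stays cheap.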


Frequent pattern mining

● Given groups of items, find commonly co-occurring items.

● Examples:
  ● In shopping carts, find items bought together.
  ● In query logs, find queries issued in one session.
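For the shopping-cart example, a naive sketch that counts co-occurring item pairs and keeps those above a minimum support. It only finds pairs and enumerates them directly; scalable miners use smarter algorithms such as FP-growth. The carts are toy data.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    # Count every unordered pair of distinct items bought together.
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    # Keep only pairs seen in at least min_support transactions.
    return {pair: n for pair, n in counts.items() if n >= min_support}

carts = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["beer", "chips"],
    ["bread", "butter", "beer"],
]
print(frequent_pairs(carts, min_support=2))
```

Sorting each transaction first makes ("bread", "butter") and ("butter", "bread") count as the same pair.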


By crypto, http://www.flickr.com/photos/crypto/3201254932/sizes/l/

By libraryman, http://www.flickr.com/photos/libraryman/78337046/sizes/l/


By quinnanya, http://www.flickr.com/photos/quinnanya/2806883231/


March 14, 2009 by Artful Magpie, http://www.flickr.com/photos/kmtucker/3355551036/

Requirements to get started


Why go for Apache Mahout?


Jumpstart your project with proven code.

January 8, 2008 by dreizehn28, http://www.flickr.com/photos/1328/2176949559


Discuss ideas and problems online.

November 16, 2005 [phil h], http://www.flickr.com/photos/hi-phi/64055296


Become a committer.


Become a committer of Apache Mahout:

Sebastian Schelter, Jake Mannix, Benson Margulies, Robin Anil, David Hall, AbdelHakim Deneche, Karl Wettin, Sean Owen, Grant Ingersoll, Otis Gospodnetic, Drew Farris, Jeff Eastman, Ted Dunning, Isabel Drost

Emeritus:

Niranjan Balasubramanian, Erik Hatcher, Ozgur Yilmazel, Dawid Weiss


*[email protected]

*[email protected]

Interest in solving hard problems.

Being part of lively community.

Engineering best practices.

Bug reports, patches, features.

Documentation, code, examples.

Image by: Patrick McEvoy


Thanks to Tim Lossen et al. for taking amazing pictures of the conference.


Berlin Buzzwords 2011

Search / Store / Scale

May/June 2011
