Setting up a mini big data architecture, just for you! - Bas Geerdink
DESCRIPTION
In this session, we'll start from scratch and build a nice little software stack that you can use to experiment with big data software. Along the way, I'll show the steps to take for setting up a virtual server with a NoSQL database, Hadoop, a stream processing engine, and visualization tools. After importing the data, we'll have a modest result in the form of a visualization of some 'little' big data. This session will give you an introduction to the world of big data architecture, without getting too complex or fuzzy. There will be some theory, but the focus is on the practical things you need to do to get started. Bring your laptop if you want some hands-on experience right away! Join this session if you want to understand what's under the hood of Cloudera, Hortonworks, and MapR, and want to play with modern open source software!
TRANSCRIPT
Building a (mini) Big Data architecture
Bas Geerdink
5 November 2014
About me
• Work: ING
• Education: Master’s degree in AI and Informatics
• Programming since 1998 (C#, Java, Scala, Python, …)
• Twitter: @bgeerdink
• Email: [email protected]
Introduction
• Big Data
– Volume, Velocity, Variety
• Predictive Analytics / Machine Learning
– Classification
– Clustering
– Recommendation
• Today’s goal:
– Start small, create a playground!
– Learn some basic tools and techniques
Reference big data solution architecture
• On-premise:
– Hortonworks
– Cloudera
– MapR
– IBM InfoSphere BigInsights
– HP Vertica
– Oracle
– Teradata
– SAS
• Cloud-based:
– Amazon Elastic MapReduce
– Microsoft Azure HDInsight
– Google (App Engine, BigTable, Prediction API, …)
– SAP HANA
There are several out-of-the-box options to get started with big data development… however, we’ll set up our own environment!
Mahout features
• Optimized for large datasets (millions of records)
• Moving from Hadoop to Spark
• Supervised learning
– Classification: Naïve Bayes, Hidden Markov Models (NN), Random Forest
– Logistic Regression (predicts the probability of a class)
• Unsupervised learning
– Clustering: k-Means, Canopy
– Recommendations
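To make the clustering entry above concrete, here is a minimal k-Means sketch in plain Python. The two-dimensional toy points are made up for illustration, and the centroids are seeded with the first k points (rather than at random) so the run is deterministic; Mahout runs the same assign/update loop in parallel over far larger datasets.

```python
def kmeans(points, k, iterations=20):
    # Simplification for the demo: seed centroids with the first k points
    # instead of random initialization, so the toy run is deterministic.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster))
    return centroids

# Two obvious blobs: one around (0, 0), one around (10, 10).
data = [(0.1, 0.2), (0.3, -0.1), (-0.2, 0.4),
        (9.8, 10.1), (10.2, 9.9), (10.0, 10.3)]
centroids = kmeans(data, k=2)
print(sorted(centroids))
```

After a couple of iterations the centroids settle on the means of the two blobs, which is exactly the behavior the parallel version exploits: the assignment step is embarrassingly parallel across records.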
Mahout Algorithms

Size of dataset  Mahout algorithm             Execution model  Characteristics
Small            SGD                          Sequential       Uses all types of predictor vars
Medium           (Complementary) Naïve Bayes  Parallel         Prefers text, high training cost
Large            Random Forest                Parallel         Uses all types of predictor vars, high training cost
Source: Cloudera (2011)
Example 1: newsgroups
• Data: newsgroup items
• 20,000 records
• Train with a Naïve Bayes classifier
• Categories: 20 newsgroups
• Prediction: newsgroup of an unclassified item
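The kind of model this example trains can be sketched in a few lines of plain Python: a multinomial Naïve Bayes text classifier with add-one smoothing. The miniature corpus and the two newsgroup labels below are made up for illustration; Mahout fits the same family of model on the full 20,000-document corpus.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (label, text) pairs. Returns (priors, word counts, vocab)."""
    priors = Counter(label for label, _ in docs)
    counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        for word in text.lower().split():
            counts[label][word] += 1
            vocab.add(word)
    return priors, counts, vocab

def predict(model, text):
    priors, counts, vocab = model
    total_docs = sum(priors.values())
    best_label, best_score = None, float("-inf")
    for label in priors:
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(priors[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("comp.graphics", "opengl rendering polygons shaders"),
    ("comp.graphics", "texture rendering pipeline"),
    ("rec.sport.hockey", "goal puck ice season"),
    ("rec.sport.hockey", "playoffs goal team ice"),
]
model = train(training)
print(predict(model, "rendering shaders"))  # → comp.graphics
```

The real pipeline adds TF-IDF vectorization and runs the counting step as a MapReduce job, but the per-class word counting shown here is the heart of it.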
Example 2: hospital treatment
• Data: hospital surgeries in the 1950s, 60s, and 70s
• 306 records
• Train with logistic regression
• Features:
– Age of subject
– Year of treatment
– Number of positive axillary nodes
• Prediction: survival probability
• Visualization: D3.js
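A logistic regression on these three features can be sketched with plain stochastic gradient descent. The handful of pre-scaled records below are synthetic stand-ins, not the real 306-record dataset; the point is only to show the shape of the computation (sigmoid of a weighted sum, weights nudged by the prediction error).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, lr=0.1, epochs=2000):
    """rows: feature tuples; labels: 1 = survived, 0 = did not survive."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            # Predicted probability, then a gradient step on the error.
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Synthetic, pre-scaled features: (age/100, years since start/10, nodes/10).
rows   = [(0.34, 0.1, 0.0), (0.40, 0.3, 0.1), (0.45, 0.5, 0.0),
          (0.63, 0.2, 2.2), (0.70, 0.4, 1.5), (0.66, 0.6, 2.5)]
labels = [1, 1, 1, 0, 0, 0]

w, b = train(rows, labels)
# Survival probability for a hypothetical young patient with no positive nodes.
prob = sigmoid(sum(wi * xi for wi, xi in zip(w, (0.38, 0.2, 0.0))) + b)
print(round(prob, 3))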
Summary
Want to move on?
• Follow courses on Coursera
– Machine Learning: https://www.coursera.org/course/ml
– Introduction to Data Science: https://www.coursera.org/course/datasci
• Read Hadoop/Mahout/R tutorials and books
• Get some ML datasets:
– http://archive.ics.uci.edu/ml/datasets.html
– http://aws.amazon.com/datasets
– http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
• Expand the ecosystem: Hive, Pig, HBase, Spark, …