notes for big data


Upload: niyati-nayak

Post on 02-Jan-2016


Page 1: Notes for Big Data

Hadoop Fundamentals I teaches you the basics of Apache Hadoop and the concept of Big Data. The

materials and software used in this course are all FREE! This is the second version of this course.

Review the What's New? section for a list of changes made from version 1 of this course.

Welcome!

About this course Page

About your instructors URL

What's New? Page

Taking this course, a guided tour (7:01) URL

Taking this course, a guided tour - Transcript URL

Technical assistance

Course forum

Reading material and references

Hadoop: The Definitive Guide (May 2012) URL

Hadoop Essentials - A Quantitative Approach (Oct 2012) URL

Hadoop in Action (Dec 2010) URL

1

Lesson 1

Lesson 1: Introduction to Hadoop

Learning objectives

Understand what Hadoop is

Understand what Big Data is

Learn about other open source software related to Hadoop

Understand how Big Data solutions can work on the Cloud

Instructions

Review all the videos provided

Complete the lab

Videos

What is Hadoop? - Part 1 (3:49) URL

What is Hadoop? - Part 2 (4:31) URL

What is Hadoop? - Transcript URL

Hands-on lab - Creating your own Hadoop cluster

We will use IBM InfoSphere BigInsights (BigInsights) software to work with Hadoop.

BigInsights is available in different editions; this course uses the Quick Start Edition, which is

free and has no limits on usage time or data size.

Step 1: Choose any of these options to work with BigInsights

Option 1: Download and install BigInsights

Download BigInsights Quick Start Edition (free to use) URL

Hadoop Fundamentals I

Version 2: Updated July 2013

Option 2: Use BigInsights on the Amazon Cloud

Review the "Hadoop and Amazon Cloud" course (BD005EN) for details URL

Option 3: Use BigInsights on the IBM SmartCloud Enterprise

Review the "Hadoop and the IBM SmartCloud Enterprise" course (BD006EN) for details URL

Option 4: Download and use the supplied VMWare image

Download the 64-bit VMWare image URL

Download and install the free VMWare Player to run the VMWare image URL

Use the supplied VMWare image - User ID / password URL

Step 2: Set up lab input files

Download and copy the lab input files to the right locations Page

Lab Solution

Lab solution (6:41) URL

2

Lesson 2

Lesson 2: Hadoop architecture

Learning objectives

Understand the main Hadoop components

Learn how HDFS works

List data access patterns for which HDFS is designed

Describe how data is stored in an HDFS cluster

Instructions

Review all the videos provided

Complete the lab

Videos

Hadoop architecture and HDFS (8:01) URL

Hadoop architecture and HDFS - Transcript URL

Topology awareness and writing to HDFS (2:37) URL

Topology awareness and writing to HDFS - Transcript URL

HDFS Command Line (4:28) URL

HDFS Command Line - Transcript URL

Hands-on lab

Exploring HDFS - Lab instructions URL

Lab solution (5:45) URL

3

Lesson 3

Lesson 3: Introduction to MapReduce

Learning objectives

Understand the concepts of map and reduce operations

Describe how Hadoop executes a MapReduce job

List MapReduce fault tolerance and scheduling features

List MapReduce fundamental data types

Describe a MapReduce data flow
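
The map, shuffle/sort and reduce stages named in these objectives can be sketched in plain Java. This is only an illustration of the data flow on a single machine, not course material and not the Hadoop API; the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle/sort -> reduce flow.
// Assumption: everything runs in one JVM; Hadoop would distribute these steps.
final class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort step: group emitted values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: collapse the list of values for one key into a single sum.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Run the three steps in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, clusters=1, data=1}
        System.out.println(wordCount(List.of("big data", "big clusters")));
    }
}
```

The sorted `TreeMap` stands in for the sort that happens between the map and reduce phases; a real job would also partition keys across several reducers.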

Instructions

Review all the videos provided

Complete the lab

Videos

Map and Reduce operations - Introduction (4:21) URL

Map and Reduce operations - Introduction - Transcript URL

Submitting a MapReduce job (1:23) URL

Submitting a MapReduce job - Transcript URL

Distributed mergesort engine (1:11) URL

Distributed mergesort engine - Transcript URL

Fundamental data types (2:09) URL

Fundamental data types - Transcript URL

Fault tolerance (1:04) URL

Fault tolerance - Transcript URL

Scheduling and task execution (1:51) URL

Scheduling and task execution - Transcript URL

Hands-on lab

Using MapReduce - Lab instructions URL

4

Lesson 4

Lesson 4: Querying data

Learning objectives

Understand how to work with Pig, Hive and JAQL

Instructions

Review all the videos provided

Complete the lab

Videos

An overview of Pig, Hive and Jaql (3:23) URL

An overview of Pig, Hive and Jaql - Transcript URL

Working with Pig (7:43) URL

Working with Pig - Transcript URL

Working with Hive (9:34) URL

Working with Hive - Transcript URL

Working with JAQL (4:28) URL

Working with JAQL - Transcript URL

Hands-on lab

Working with Jaql, Pig, and Hive - Lab instructions URL

Working with Jaql, Pig and Hive - Lab solution Part 1 (5:01) URL

Working with Jaql, Pig and Hive - Lab solution Part 2 (4:50) URL

Working with Jaql, Pig and Hive - Lab solution Part 3 (5:07) URL

Working with Jaql, Pig and Hive - Lab solution Part 4 (4:35) URL

5

Lesson 5

Lesson 5: Hadoop administration

Learning objectives

Understand how to add and remove nodes in a Hadoop cluster

Learn how to monitor the health status of your cluster

Learn how to configure Hadoop

Instructions

Review all the videos provided

Complete the lab

Videos

Adding and removing nodes to the cluster (7:46) URL

Verifying cluster health & stopping/starting components (2:41) URL

Configuring Hadoop - Part 1 (7:44) URL

Configuring Hadoop - Part 2 (2:52) URL

Setting up rack topology (1:52) URL

Hands-on lab

Hadoop Administration - Lab instructions URL

Hadoop Administration - Lab solution Part 1 (5:29) URL

Hadoop Administration - Lab solution Part 2 (4:59) URL

Hadoop Administration - Lab solution Part 3 (4:25) URL

Hadoop Administration - Lab solution Part 4 (3:55) URL

6

Lesson 6

Lesson 6: Moving data into Hadoop

Learning objectives

Understand how to move data into Hadoop using Flume

Instructions

Review all the videos provided

Complete the lab

Videos

Introduction to Flume (4:42) URL

Introduction to Flume - Transcript URL

Flume modes of operation and configuration (3:39) URL

Flume modes of operation and configuration - Transcript URL

Hands-on lab

Data Movement - Lab instructions URL

7

Test

Test your knowledge

Test objectives and instructions Page

Take the test! Quiz

Evaluation Form: Please provide feedback Assignment

Print your certificate!

Not available until the activity Evaluation Form: Please provide feedback is marked complete.

Not available until you achieve a required score in Take the test!.

SQL Access for Hadoop teaches you how to use SQL to access big data stored in HDFS or HBase.

The course presents the different alternatives for SQL access, such as Hive, Impala, and Big SQL. It explains the similarities and differences between these three technologies. The course includes hands-on exercises and access to a Hadoop cluster with Hive, HBase, HDFS and Big SQL, so you can try these technologies firsthand. At the end of the course you will understand the different alternatives for accessing Big Data with SQL, and you will have gained hands-on experience with these technologies.

Welcome!

About this course Page

About your instructors URL

Taking this course, a guided tour (7:01) URL

Taking this course, a guided tour - Transcript URL

Technical assistance

Course forum

Reading material and references

Hadoop in Action URL

1

Lesson 1

Lesson 1: Introduction to Hive, Big SQL and Impala

Learning objectives

Understand Hive, Big SQL and Impala concepts, terminology and architecture

Understand similarities and differences between these technologies

Instructions

Review all the videos provided

Complete the lab

Videos

Lesson Outline (0:57) URL

Lesson Outline - Transcript URL

SQL for Big Data: Overview (5:43) URL

SQL for Big Data - Transcript URL

Introduction to Hive (8:31) URL

Introduction to Hive - Transcript URL

Introduction to Impala (7:08) URL

Introduction to Impala - Transcript URL

Introduction to Big SQL (9:38) URL

Introduction to Big SQL - Transcript URL

SQL Access for Hadoop

Hands-on lab - Accessing a Hadoop Cluster on the Cloud

Follow the steps in this section to gain access to a Hadoop Cluster on the Cloud.

Accessing the Cloud Based Environment for Exercises (6:30) URL

Accessing the Cloud Based Environment for Exercises - Transcript URL

Using putty with the IM Demo Cloud (5:17) URL

Using putty with the IM Demo Cloud - Transcript URL

2

Lesson 2

Lesson 2: Working with SQL using Hive

Learning objectives

Learn how to create tables and run HiveQL queries from the command line

Instructions

Review all the videos provided

Videos

Lesson outline (00:45) URL

Lesson Outline - Transcript URL

Exploring and Configuring the Hive environment (5:35) URL

Exploring and Configuring the Hive Environment - Transcript URL

Hive Tables (7:45) URL

Hive Tables - Transcript URL

Querying data with Hive (6:28) URL

Querying data with Hive - Transcript URL

Hands-on lab

Lab instructions - Working with Hive URL

3

Lesson 3

Lesson 3: Working with SQL using Big SQL

Lab objectives

Learn how to configure your Big SQL environment

Learn how to create tables and run Big SQL queries

Understand how to work with the JSQSH command line interface

Understand how to work with a JDBC or ODBC client

Instructions

Watch the videos in this lesson

Review the lab instructions

Videos

Exploring the Big SQL environment (6:05) URL

Exploring the Big SQL Environment - Transcript URL

Starting, stopping and monitoring the Big SQL server process (4:14) URL

Starting, stopping and monitoring the Big SQL server process - Transcript URL

Configuring the Big SQL server (4:57) URL

Configuring the Big SQL server - Transcript URL

Getting started with JSQSH and connecting to a data source (10:56) URL

Getting started with JSQSH and connecting to a data source - Transcript URL

Creating and dropping schemas and tables (6:14) URL

Creating and dropping schemas and tables - Transcript URL

Loading tables and running queries (15:00) URL

Loading tables and running queries - Transcript URL

Working with Complex Data Types (7:19) URL

Working with Complex Data Types - Transcript URL

Connecting and running queries using JDBC and Eclipse (11:08) URL

Connecting and running queries using JDBC and Eclipse - Transcript URL

Hands-on lab

Lab instructions - Working with Big SQL URL

4

Lesson 4

Lesson 4: Accessing HBase with Hive and Big SQL

Learning objectives

Understand how to access HBase with Hive

Understand how to access HBase with Big SQL

Learn how to deal with HBase encoding and storage

Instructions

Review all the videos provided

Complete the lab

Videos

HBase Support: Overview (8:22) URL

HBase Support: Overview - Transcript URL

Working with Big SQL and HBase (15:01) URL

Working with Big SQL and HBase - Transcript URL

Hands-on lab

Accessing HBase with SQL URL

5

Lesson 5

Lesson 5: System Tables and Troubleshooting

Learning objectives

Understand how to work with Catalog and System Tables with Big SQL

Learn how to troubleshoot a problem in Big SQL

Instructions

Review all the videos provided

Complete the labs

Videos

Troubleshooting in Big SQL (5:25) URL

Troubleshooting in Big SQL - Transcript URL

Inspecting Catalog and System Tables in Big SQL (3:11) URL

Inspecting Catalog and System Tables in Big SQL - Transcript URL

6

Test

Test your knowledge

Test objectives and instructions Page

Take the test! Quiz

Print your certificate!

Not available until you achieve a required score in Take the test!.

Stream Computing I teaches you the basics of Stream Computing using IBM InfoSphere Streams. This

is the first in a series of two courses. The course and the materials are all FREE. Trial software of

InfoSphere Streams will be used for the labs.

Welcome!

About this course Page

Taking this course, a guided tour (7:01) URL

Taking this course, a guided tour - Transcript URL

Technical assistance

Course forum (Input your feedback)

Download the course materials

Download the VMWare Image (with a 90 day trial of Streams 3.1) for exercises URL

Reading material and references

IBM InfoSphere Streams: Assembling Continuous Insight in the Information Revolution URL

1

Lesson 1

Lesson 1: Introduction to Stream Computing

Learning objectives

Understand what Stream Computing is all about

Instructions

Review all the videos provided

Complete the lab

Videos

What is Stream Computing? (5:23) URL

What is Stream Computing? - Transcript URL

The evolution of analytics (4:30) URL

The evolution of analytics - Transcript URL

Event processing vs stream computing (3:01) URL

Event processing vs stream computing - Transcript URL

Use cases for stream computing (3:09) URL

Use cases for stream computing - Transcript URL

Introduction to IBM InfoSphere Streams (7:24) URL

Introduction to IBM InfoSphere Streams - Transcript URL

Stream Computing I

* Preview *

Hands-on lab - Downloading and installing InfoSphere Streams

We will use IBM's InfoSphere Streams Trial software to work with Stream Computing. This trial

software can be used for 90 days and has all the features of the fee-based version.

Download InfoSphere Streams (trial version) URL

Install InfoSphere Streams - Instructions URL

2

Lesson 2

Lesson 2: Streams concepts and terms

Learning objectives

Understand Streams concepts such as instances, hosts, operators, PEs, and jobs.

Instructions

Review all the videos provided

Complete the lab

Videos

Streams instances and hosts (3:46) URL

Streams instances and hosts - Transcript URL

Operators and Processing Elements (5:27) URL

Operators and Processing Elements - Transcript URL

Components of Streams (4:36) URL

Components of Streams - Transcript URL

Streams Studio IDE (3:53) URL

3

Lesson 3

Lesson 3: Streams applications

Learning objectives

Working with SPL

Get started with Streams applications

Instructions

Review all the videos provided

Complete the lab

Videos

What is the Streams Processing Language (SPL)? (5:26) URL

What is the Streams Processing Language (SPL) - Transcript URL

4

Lesson 4

Lesson 4: Composing an Application in Streams

Learning objectives

Understand how to work with Streams operators such as Functor, Aggregate, InetSource, and more!

Instructions

Review all the videos provided

Complete the lab

Videos

Setting up the environment and the inetSource operator (7:24) URL

Using the custom operator (9:33) URL

Using the filter operator (6:34) URL

Using the sort operator and tumbling windows (10:43) URL

Extracting values using Aggregate (7:42) URL

Working with the Join operator (14:17) URL

Selecting out columns using Functor operator (9:44) URL

Building an entire application with Drag and Drop in Streams 3.0 (36:17) URL
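
The tumbling windows mentioned in the video list above can be illustrated outside of Streams. The sketch below is plain Java, not SPL, and is not taken from the course: it assumes a count-based window that collects a fixed number of values, emits one aggregate (a sum), and then starts over with an empty window. The names `TumblingWindowSketch` and `sumPerWindow` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of a count-based tumbling window with a sum aggregate.
// Assumption: an incomplete trailing window is simply dropped in this sketch.
final class TumblingWindowSketch {

    // Sum each full, non-overlapping window of `size` consecutive values.
    static List<Integer> sumPerWindow(List<Integer> stream, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        List<Integer> sums = new ArrayList<>();
        int count = 0;
        int sum = 0;
        for (int value : stream) {
            sum += value;
            count++;
            if (count == size) {
                // Window is full: emit the aggregate and flush the window.
                sums.add(sum);
                count = 0;
                sum = 0;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // prints [6, 15]
        System.out.println(sumPerWindow(List.of(1, 2, 3, 4, 5, 6, 7), 3));
    }
}
```

Because the window is emptied after every emission, no value contributes to more than one result, which is what distinguishes a tumbling window from a sliding one.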

5

Lesson 5

Lesson 5: Deploying Streams Applications

Learning objectives

Understand how to deploy a Streams application

Instructions

Review all the videos provided

Complete the lab

Videos

Runtime architecture and introduction to topologies (5:36) URL

Runtime architecture and introduction to topologies - Transcript URL

Working with instances (2:00) URL

Working with instances - Transcript URL

Using StreamTool (4:52) URL

Using StreamTool - Transcript URL

6

Not available

7

Not available

Spreadsheet-like Analytics teaches you how to explore big data and takes you on a journey of

discovery without having to write a single line of code. Using BigSheets, a tool developed by IBM

Research, you can perform analytics on big data with an interface similar to a regular spreadsheet.

BigSheets masks all complexities of processing big data, and lets analysts and managers concentrate on

getting the analytics they want without having to know how to code.

Welcome!

About this course Page

Taking this course, a guided tour (7:01) URL

Taking this course, a guided tour - Transcript URL

Technical assistance

Course forum

1

Lesson 1

Lesson 1: Getting started with BigSheets

Learning objectives

Understand what BigSheets is

Learn who the target users of BigSheets are

Instructions

Review all the videos provided

Videos

Introduction to BigSheets (3:49) URL

What can you do with BigSheets? (1:11) URL

Working with BigSheets (3:31) URL

A tour of BigSheets - Part 1 (2:59) URL

A tour of BigSheets - Part 2 (3:01) URL

2

Lesson 2

Lesson 2: Discovering what BigSheets can do

Learning objectives

Using a simple scenario, understand BigSheets features and capabilities

Instructions

Review all the videos provided

Spreadsheet-like Analytics

Videos

Gathering input data from an application (4:04) URL

Manipulating data in BigSheets (3:26) URL

Overview of other BigSheets scenarios (2:31) URL

3

Lesson 3

Lesson 3: Deep Dive into BigSheets

Learning objectives

Exploring data by adding sheets

Understanding workflow and workbook diagrams

Monitoring BigSheets in the Dashboard

Instructions

Review all the videos provided

Complete the lab

Videos

Exploring Data by Adding Sheets - Part 1 (6:32) URL

Exploring Data by Adding Sheets - Part 1 - Transcript URL

Exploring Data by Adding Sheets - Part 2 (7:40) URL

Exploring Data by Adding Sheets - Part 2 - Transcript URL

Exploring Data by Adding Sheets - Part 3 (8:02) URL

Exploring Data by Adding Sheets - Part 3 - Transcript URL

Exploring Data by Adding Sheets - Part 4 (7:58) URL

Exploring Data by Adding Sheets - Part 4 - Transcript URL

Exploring Data by Adding Sheets - Part 5 (6:46) URL

Exploring Data by Adding Sheets - Part 5 - Transcript URL

Understanding Workflow and Workbook Diagrams (5:04) URL

Understanding Workflow and Workbook Diagrams - Transcript URL

Monitoring BigSheets in Dashboard (4:26) URL

Monitoring BigSheets in Dashboard - Transcript URL

4

Lesson 4

Lesson 4: A complete case study using BigSheets

Learning objectives

Understand how to work with BigSheets using a complete case study

Instructions

Review all the videos provided

Videos

Brought to you by SciSpike (www.scispike.com)

Java Fundamentals teaches you the basics of the Java Programming Language. The skills you gain can

also help you with Big Data technologies since MapReduce jobs in Hadoop can be written in Java.

Course Feedback (help us complete developing this course!)

Course forum (input your feedback)

1

Lesson 1

Lesson 1: Java overview

Learning objectives

Learn about the history of Java

Understand what JVM, JRE, JDK, and Java APIs are

Learn about Java Editions

Instructions

Complete all the presentations

Presentations

Java Overview SCORM package

2

Lesson 5

Lesson 5: Packages and Access Control

Learning objectives

Understand what packages are

Learn about package naming conventions

Learn about access level modifiers (private, protected, public)

Understand the import statement

Instructions

Complete all the presentations

Presentations

Packages and Access Control SCORM package

3

Java Fundamentals

*Preview*

Lesson 7

Lesson 7: Arrays

Learning objectives

Learn what arrays are

Understand the syntax for arrays in Java

Learn how to work with arrays

Compare arrays to collections
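
As a rough illustration of these objectives (not taken from the course presentations), the sketch below shows array declaration syntax, indexed access, and the contrast between a fixed-length array and a resizable `List`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: basic array syntax and a comparison with a collection.
final class ArraysSketch {

    // Sum the elements of an int array using index-based access.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Copy an array into a resizable List and append one more element,
    // something the fixed-length array itself cannot do.
    static List<Integer> toGrowableList(int[] values, int extra) {
        List<Integer> list = new ArrayList<>();
        for (int v : values) {
            list.add(v);
        }
        list.add(extra);
        return list;
    }

    public static void main(String[] args) {
        int[] sizes = {3, 1, 2};   // array literal; length is fixed at 3
        int[] zeros = new int[4];  // elements default to 0
        Arrays.sort(sizes);        // sorts in place
        System.out.println(Arrays.toString(sizes));       // prints [1, 2, 3]
        System.out.println(zeros.length + " " + sum(sizes)); // prints 4 6
        System.out.println(toGrowableList(sizes, 4));     // prints [1, 2, 3, 4]
    }
}
```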

Instructions

Complete all the presentations

Presentations

Arrays SCORM package

4

Lesson 10

Lesson 10: JavaBeans

Learning objectives

Learn what JavaBeans are

Implementing the Serializable interface

Learn about JavaBeans properties

Understand what introspection is
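
A minimal sketch of the ideas listed in these objectives, assuming a made-up `Course` bean with two properties. It is not from the course presentations: it shows a class that follows the JavaBeans conventions (public no-argument constructor, getter/setter pairs, `Serializable`) and uses the standard `java.beans.Introspector` to discover its properties at runtime. `BeanSketch`, `Course` and `propertyNames` are hypothetical names.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical bean plus introspection of its properties.
final class BeanSketch {

    // A bean: no-arg constructor, private fields, public getters and setters.
    public static class Course implements Serializable {
        private static final long serialVersionUID = 1L;

        private String title;
        private int lessons;

        public Course() {
        }

        public String getTitle() {
            return title;
        }

        public void setTitle(String title) {
            this.title = title;
        }

        public int getLessons() {
            return lessons;
        }

        public void setLessons(int lessons) {
            this.lessons = lessons;
        }
    }

    // Introspection: ask the Introspector which properties the class exposes.
    // Passing Object.class as the stop class leaves out the inherited "class" property.
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor descriptor : info.getPropertyDescriptors()) {
                names.add(descriptor.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // Lists the property names derived from the getter/setter pairs.
        System.out.println(propertyNames(Course.class));
    }
}
```

The property names come from the accessor method names rather than from the private fields, which is why the getter/setter naming convention matters.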

Instructions

Complete all the presentations

Presentations

JavaBeans SCORM package

5

Lesson 12

Lesson 12: Additional Features

Learning objectives

Learn about the enhanced for loop (foreach)

Understand what autoboxing is

Learn about varargs

Learn about static imports

Understand how to work with annotations
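
The features named in these objectives can be seen together in one short example. This is a sketch for orientation only, not course material; the class and method names are hypothetical.

```java
import static java.lang.Math.max; // static import: call max(...) without the Math prefix

import java.util.ArrayList;
import java.util.List;

// Illustrative only: enhanced for, autoboxing, varargs, static import, annotation.
final class FeaturesSketch {

    // Varargs: callers pass zero or more ints; inside the method it is an int[].
    static int largest(int... values) {
        int best = Integer.MIN_VALUE;
        for (int v : values) { // enhanced for loop over an array
            best = max(best, v);
        }
        return best;
    }

    // Autoboxing: each int is wrapped in an Integer when added to the list.
    static List<Integer> box(int... values) {
        List<Integer> boxed = new ArrayList<>();
        for (int v : values) {
            boxed.add(v);
        }
        return boxed;
    }

    // Annotation: @Override asks the compiler to check that this method
    // really overrides a method inherited from a superclass.
    @Override
    public String toString() {
        return "FeaturesSketch";
    }

    public static void main(String[] args) {
        System.out.println(largest(4, 9, 2)); // prints 9
        int total = 0;
        for (Integer n : box(1, 2, 3)) { // enhanced for over a collection
            total += n;                  // unboxing: Integer back to int
        }
        System.out.println(total); // prints 6
    }
}
```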

Instructions

Complete all the presentations

Presentations

Additional Features SCORM package

Brought to you by Jaspersoft (www.jaspersoft.com)

Hadoop Reporting and Analysis teaches you how to build your own Hadoop/Big Data reports over

relevant Hadoop technologies such as HBase, Hive, etc. It provides guidelines to choose between

various reporting techniques: Direct Batch Reports, Live Exploration, and Indirect Batch

Analysis. Hands-on labs are included using the free version of Jaspersoft and BigInsights (IBM's

Hadoop distribution). All materials and software used are FREE!

Welcome!

About this course Page

Taking this course, a guided tour (7:01) URL

Taking this course, a guided tour - Transcript URL

Technical assistance

Course forum

Instructions to Download Jaspersoft Software File

Attachments Folder

1

Lesson 1

Lesson 1: Introduction to Reporting and Analysis on Hadoop

Learning objectives

- Understanding Why Reporting and Analysis on Hadoop is important

- Approaches to Big Data reporting and analysis

- Big Data Access Technologies for Reporting and Analysis

- Business Intelligence and Hadoop Architecture

Instructions

- Review all the videos provided

Videos

Introduction to Reporting and Analytics on Hadoop (14:11) URL

Introduction to Reporting and Analytics on Hadoop - Transcript URL

2

Lesson 2

Lesson 2: Direct Batch Reporting on Hadoop

Learning objectives

- Understanding Direct Batch Reporting

- Importance of Direct Batch Reporting on Hadoop

- Guidelines to choose the Direct Batch Reporting approach

- Creating a Direct Batch Report on Hadoop

Instructions

- Review all the videos provided

- Complete the lab

Videos

Direct Batch Reporting (4:51) URL

Direct Batch Reporting Demo (10:27) URL

Hands-on lab

Creating Direct batch reports for big data - Instructions URL

Creating a big data direct batch report - Solution (11:36) URL

3

Lesson 3

Lesson 3: Live Exploration of Big Data

Learning objectives

- Understanding Live Exploration of Big Data

- Guidelines to choose Live Exploration approach to Big Data analysis

- Perform Live Exploration of Big Data on Hadoop

Instructions

- Review all the videos provided

- Complete the lab

Videos

Live Exploration Reporting (5:22) URL

Live Exploration Tutorial (10:43) URL

Hands-on lab

Practice Live Exploration URL

Practice Live Exploration - Solution (12:56) URL

4

Lesson 4

Lesson 4: Indirect Batch Analysis on Hadoop

Learning objectives

- Understanding Indirect Batch Analysis on Hadoop

- Guidelines to choose Indirect Batch Analysis approach

- Perform Indirect Batch analysis on Big Data

Instructions

- Review all the videos provided

- Complete the lab

Videos

Indirect Batch Analysis of Big Data (5:50) URL

Indirect Batch Analysis of Big Data - Demo (4:47) URL

Hands-on lab

Indirect Batch Analysis - Lab Instructions URL

Indirect Batch Analysis - Lab Solution (6:11) URL

5

Test

Test your knowledge

Test objectives and instructions Page

Take the test! Quiz

Print your certificate!

Not available until you achieve a required score in Take the test!.

6

Evaluation Form

Evaluation form

Evaluation Form: Please provide feedback
