Teaching Splunk in the University Classroom · 2017-10-13
Copyright © 2014 Splunk Inc.
Peter Zadrozny Adjunct Professor San José State University
Teaching Splunk in the University Classroom
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make.
In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
About Me
• CTO/Founder, Opallios
  – Consulting on Big Data, Performance Engineering and Software Architecture
• Professional Highlights
  – Started operations of WebLogic in Europe
  – Started operations of Sun Microsystems in Mexico
Introduction
History
• Undergraduate elective from the Computer Science Department at San Jose State University
  – Category: CS 185C Advanced Practical Computing Topics
  – Course: Introduction to Big Data Analytics
• 80% of the students are from the Master in Computer Science program
  – Qualifies as a graduate program elective course
• Course first taught in Fall 2012
• Registration limited to 30 students
  – First elective in the CS Department to have a waiting list
Course Objective
• Focus on creating “Data Wranglers”
  – They know how to “massage” data
  – They know which tool is best for the problem at hand
  – They know how to install these tools and to manage them
  – They know how to use them to do the required analysis
• Data Wranglers are paired with domain experts
  – To achieve ideal synergies for a comprehensive analysis
• The course was designed with the employer in mind
• We consulted with a bunch of companies:
  – Preference given to job candidates that have gone through the worst of the learning curve of big data technologies
• The course is focused on getting a practical understanding of the most popular tools and technologies that make it possible to process and analyze big data
  – Hadoop & Hive
  – Splunk
• Students learn how to use these tools through exercises and real projects on a real cloud
  – No quizzes or exams
• The course is fully supported by
Course Organization
Requirements
• For Splunk
  – None
• For Hadoop & Hive
  – Data Structures and Algorithms
    · They need to know the basics of programming
  – Data Base Management Systems
    · Experience with SQL is required
• For the cloud services
  – Unix system administration experience
Semester Structure
• Three general lectures
  – Introduction
  – Methodology for Big Data Projects
    · Used for grading the projects
  – Use and rules of the GoGrid cloud services
• 13 lectures for Hadoop and Hive
  – Start with this because it’s substantially more difficult
  – Academic work load is less at the beginning of the semester
• 13 lectures for Splunk
• At least three guest speakers throughout the semester
Course Work
• One individual lab
  – Set up a high availability Splunk cluster in the GoGrid cloud
    · 1 master
    · 1 search head
    · 3 indexers
    · 3 forwarders
  – Monitor the /var/log directory on the indexers and forwarders
  – Create a report that shows all the failed login attempts on all monitored servers
  – Write a “setup” document that can be used by another person
    · Presents step-by-step instructions for this lab exercise
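To make the goal of the lab report concrete, here is a minimal Python sketch of the kind of result the failed-login report is meant to produce: a count of failed login attempts per monitored server. It is not how the lab is done (the lab uses Splunk searches over the indexed /var/log data); it only illustrates the logic. The sample log lines, host names, user names, and the regular expression are assumptions modeled on a common OpenSSH log style, and real log formats vary by system.

```python
import re
from collections import Counter

# Hypothetical sample lines in the style of an OpenSSH auth log.
# Host names, users, and addresses are invented for illustration.
SAMPLE_LINES = [
    "Oct 13 10:01:02 indexer1 sshd[2211]: Failed password for root from 10.0.0.5 port 51122 ssh2",
    "Oct 13 10:01:09 indexer1 sshd[2211]: Failed password for invalid user admin from 10.0.0.5 port 51130 ssh2",
    "Oct 13 10:02:40 forwarder2 sshd[887]: Accepted password for student from 10.0.0.9 port 40022 ssh2",
    "Oct 13 10:03:15 forwarder2 sshd[901]: Failed password for student from 10.0.0.9 port 40031 ssh2",
]

# Assumed pattern: syslog timestamp, host, sshd process, then the failure message.
FAILED_LOGIN = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+sshd\[\d+\]:\s+"
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<src>\S+)"
)


def failed_logins_by_host(lines):
    """Count failed login attempts per host in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.match(line)
        if match:
            counts[match.group("host")] += 1
    return counts


report = failed_logins_by_host(SAMPLE_LINES)
for host, count in sorted(report.items()):
    print(f"{host}: {count} failed login attempt(s)")
```

In the lab itself the equivalent grouping is done with a Splunk search across all indexers and forwarders, so the report covers every monitored server in one place.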
• Team Project
  – Teams of three students, randomly chosen
  – They must find the data sets and define the analytics
    · Analytics must be reviewed by instructor
  – Typical sources
    · Yelp
    · www.data.gov
    · data.sfgov.org (City of San Francisco)
    · data.cityofchicago.org (City of Chicago)
    · data.cityofnewyork.us (City of New York)
    · data.ca.gov (State of California)
    · data.ny.gov (State of New York)
Splunk Lectures
• Originally used the Splunk Education materials
  – For the first two semesters
• The dynamics of corporate education assume
  – 8 hours a day for 4 or 5 days
  – Exclusive dedication to the course
• This does not work in an academic setting
  – Two 75-minute lectures a week
  – Students have additional courses
• Motivation to write a book
  – Practical hands-on projects
  – Using Splunk
  – Based on the experiences of teaching this course
  – NOT an academic text book
• Provides the necessary
  – Theory
  – Methodology
  – Practical projects used to teach Splunk commands
    · Machine Data
    · Social media
• Content of lectures is synchronized with due dates of
  – Individual lab
  – Team project schedule of delivery
• Schedule of delivery for the project report is based on the methodology
  – Phase 1 – Loading the Data
  – Phase 2 – Writing the searches
  – Phase 3 – Visualization of the results
  – Final Presentations
Lecture  Activity               Content
1                               What is Splunk? (C1); Getting data into Splunk (C2)
2                               Remote Data Collection (C15)
3        Project selection due  Scaling and high availability (C16)
4                               The FAA Project: Getting the flight data into Splunk (C9)
5        Individual lab due     The FAA Project: Getting the flight data into Splunk (C9)
6                               The FAA Project: Analyzing airlines, airports, flights and delays (C10)
7                               The FAA Project: Analyzing airlines, airports, flights and delays (C10)
8        Phase 1 due            The FAA Project: Analyzing a specific flight over the years (C11)
9                               Analyzing Tweets (C12)
10                              Analyzing Foursquare Check-Ins (C13)
11       Phase 2 due            Sentiment Analysis (C14)
12                              Using log files to create advanced analytics (C7)
13       Phase 3 due            Final presentations
Grading
• Individual lab
  – Full credit if the report presents the failed logins for all the servers being monitored
  – The objective is for the student to install and understand the different roles in a Splunk cluster
  – Get familiar with the basic use of indexing and search
• Team Project
  – Team deliverable is a report document
  – Grading rubric based on the project methodology
  – Big data practitioners participate in grading
Grading Rubric

Phase/Item                                    Points   Phase Total
Phase 1 – Loading and Verifying Data                   21
  Data input                                  9
  Verification of loaded data                 7
  Report quality                              5
Phase 2 – Building and Verifying Searches              33
  Searches                                    20
  Verification of searches                    8
  Report quality                              5
Phase 3 – Visualization of results                     20
  Visualizations                              15
  Report quality                              5
Final Presentation                                     10
  Delivery of presentation                    5
  Quality of presentation                     5
Team member grade                                      16
TOTAL                                                  100
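As a quick arithmetic check on the rubric, the item points can be summed per phase and compared against the phase totals on the slide. The sketch below does that in Python; the dictionary layout and variable names are illustrative choices, and only the point values come from the rubric.

```python
# Item points per phase, copied from the grading rubric.
# The data structure itself is an illustrative assumption.
RUBRIC = {
    "Phase 1 - Loading and Verifying Data": {
        "Data input": 9,
        "Verification of loaded data": 7,
        "Report quality": 5,
    },
    "Phase 2 - Building and Verifying Searches": {
        "Searches": 20,
        "Verification of searches": 8,
        "Report quality": 5,
    },
    "Phase 3 - Visualization of results": {
        "Visualizations": 15,
        "Report quality": 5,
    },
    "Final Presentation": {
        "Delivery of presentation": 5,
        "Quality of presentation": 5,
    },
    "Team member grade": {
        "Team member grade": 16,
    },
}

# Sum the item points within each phase, then across phases.
phase_totals = {phase: sum(items.values()) for phase, items in RUBRIC.items()}
total = sum(phase_totals.values())

for phase, points in phase_totals.items():
    print(f"{phase}: {points}")
print(f"TOTAL: {total}")
```

The per-phase sums come out to 21, 33, 20, 10 and 16, which add up to the stated total of 100.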
Project Examples
Bay Area Bike Share
• Analysis of transportation patterns of the users of this bicycle rental company
• 17 million log entries
• Approximately 150,000 “trips”
• 6 months' worth of data
• 610MB
Analysis of 311 Calls
• A comparison of 311 calls of New York City and San Francisco
• 6 years' worth of data
  – July 2008 to April 2014
• San Francisco
  – 900,000 events
  – 300MB
• New York City
  – 9,000,000 events
  – 3GB
Beer Advocate Reviews
• A comprehensive analysis of best rated beers by season, alcohol content, specialty and brewery location
• Total of 1,586,614 reviews
• From July 1998 to January 2012
• 1.43GB
Yelp – Analysis of Phoenix Reviews
• 11,537 businesses in the Phoenix metropolitan area
• 8,282 check-in sets
• 43,873 users
• 229,907 reviews
University Campus Safety and Security
• Data provided by the US Department of Education
• Data includes 6 years
  – 2006 to 2011
• Used three files
  – Campus crime data
  – Campus violations data
  – Campus fire data
Top 15 Most Dangerous Universities
A Week in
• A broad analysis of interesting facts found in Foursquare check-ins
• September 8 to 14, 2012
• 24,187,615 Check-Ins
• 10GB
Most Popular Coffee Shops
US Office of Foreign Labor Certification
• A detailed analysis of H1B visas and corporate sponsorships for permanent residency
• From 2006 to 2013
• 745MB
Permanent Residency Sponsorships
Splunk Academic Partner Program
More than 100 universities are teaching Splunk…
Launch Splunk learning in your university today by leveraging the resources of the Splunk Academic Partner Program
www.splunk.com/goto/academic
• Licensing – Free, 5GB one-year Splunk licenses for your classroom and students
• Free training courses – Leverage free web-based training courses to jump-start learning
• Instructor content – Get watermarked PDFs of the slides that professional Splunk trainers use
• Community – Network with peers at other schools to share content and best practices
Questions? Contact Rob Reed, Worldwide Education Evangelist, at [email protected]
THANK YOU