lab #1: data wrangling and exploration

8
Lab #1: Data Wrangling and Exploration Everything Data CompSci 216 Spring 2015

Upload: lykien

Post on 14-Feb-2017

249 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Lab #1: Data Wrangling and Exploration

Lab  #1:  Data  Wrangling  and  Exploration Everything  Data

CompSci  216  Spring  2015

Page 2: Lab #1: Data Wrangling and Exploration

Announcements  (Wed.  Jan.  14)

•  Welcome  our  UTAs!

•  Most  of  our  office  hours  have  been  posted  on  the  website

2

Kevin  Wu Eric  Wu Billy  Wan

Page 3: Lab #1: Data Wrangling and Exploration

Seat  assignment 3

Front  of  room

Back  of  room

1 2 3 4 5

6 7 8 9 10

11 12 13 14 15

16 17 18 19 20

See  course  website  (under  “Schedule”)  for  team  assignment

Page 4: Lab #1: Data Wrangling and Exploration

Format  of  this  lab

•  Discuss  last  homework – Continue  working  on  your  setup  if  needed

•  Work  on  new  exercises – Raise  your  hand  to  get  your  answer  checked  by  the  course  staff

– Win  challenge  to  get  li]le  prizes! •  Summarize  lessons  learned  (10  min.)

4

Page 5: Lab #1: Data Wrangling and Exploration

Lesson  learned:  reality  check

•  Data  is  messy •  Reality  is  messy •  Garbage  in,    garbage  out

☞  One  reason  why  you  need  data  wrangling

5

Image  credit:  http://www.practicalpolymath.com/wp-content/uploads/2012/10/eliminate_the_paper_mess.jpg

Page 6: Lab #1: Data Wrangling and Exploration

Lesson  learned:  “abstraction” •  More  structure  and    semantics  ➠  more    powerful  analysis,  and

•  Different  data  models    ➠  different  questions  and    processing  methods

☞  Another  reason  you  need  data  wrangling •  Examples  in  this  lab – Character  strings: – Dates: – Tables:

6

Picture  of  Prof.  Carlo  Sequin,  UC  Berkeley.  Image  credit:  http://scgp.stonybrook.edu/archives/4169

regular  expressions meaningful  range  conditions,  subtraction filtering,  grouping,  counting,  sorting…

Page 7: Lab #1: Data Wrangling and Exploration

Lesson  learned:  UI  or  not  UI

•  Exploratory  analysis  goes  a  long  way – You  can  get  fair  amount  of  insight  without  a  single  line  of  code

7

•  Interactivity  is  nice •  Easy  to  learn,    difficult  to  master – Tool-­‐‑specific  recipes  vs.  universal  primitives

Image  credit:  http://www.24tee.com/movie_related_shirts/lunch_is_for_wimps_tee_shirt

Page 8: Lab #1: Data Wrangling and Exploration

What’s  next

•  No  class  next  Monday  (MLK  Day) •  No  homework  until  next  Wednesday  (lecture  on  relational  data  processing) – Resolve  any  remaining  setup  issues – Try  to  gain  some  efficiency  with  VM  and  shell

8