Data Science from 3,209 Feet

John Chandler, University of Montana and Ars Quanta

A Data Scientist Toolkit

• A scripting language (Python, C#, Java, Perl)
• A statistical computing language (R, SAS, SPSS)
• Database languages/environments (MSSQL, Oracle, Postgres, SQLite)
• A distributed computing environment (MapReduce, in many flavors)
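
The slide names MapReduce without showing what it does. The toy, single-process Python sketch below illustrates the idea on the classic word-count task. It is not the API of any real MapReduce framework, and the function names are hypothetical. The map step emits key/value pairs, the shuffle step groups values by key, and the reduce step combines each group into one result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit one (key, value) pair per word: here (word, 1).
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Group all values by key; a real framework does this across machines.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value for one key into a single result.
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(docs))
```

The "many flavors" differ mostly in how they distribute and persist these stages, not in the map/shuffle/reduce shape itself.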

Fundamentally, we are flipping bits, but this isn’t software development.

[Figure: CRISP-DM process model (Shearer, 2000)]

Tools for data preparation

• A scripting language (Python, C#, Java)
• A statistical computing language (R, SAS, SPSS)
• Database languages/environments (MSSQL, Oracle, Postgres, SQLite)
• A distributed computing environment (MapReduce)
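
Here is one hedged illustration of how two of these tools can share the preparation work. The sketch uses Python's standard-library `sqlite3` module: Python cleans messy raw records (inconsistent region labels, a missing amount), and SQL does the aggregation. The table, its columns, and the records are all hypothetical.

```python
import sqlite3

# Hypothetical raw records: (customer_id, region, purchase amount as text).
RAW_ROWS = [
    ("c1", " West ", "19.99"),
    ("c2", "east", ""),  # missing amount
    ("c3", "EAST", "5.00"),
    ("c1", "west", "10.01"),
]


def clean(row):
    # Normalize the region label and parse the amount; None marks a missing value.
    customer_id, region, amount = row
    region = region.strip().lower()
    amount = float(amount) if amount.strip() else None
    return customer_id, region, amount


def prepare(rows):
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE purchases (customer_id TEXT, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)", (clean(r) for r in rows)
    )
    # Let SQL aggregate; rows with a missing amount (NULL) are filtered out.
    query = """
        SELECT region, COUNT(*) AS n, ROUND(SUM(amount), 2) AS total
        FROM purchases
        WHERE amount IS NOT NULL
        GROUP BY region
        ORDER BY region
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for region, n, total in prepare(RAW_ROWS):
        print(region, n, total)
```

Dropping the incomplete row is only one choice. Whether to drop, impute, or flag missing values is a data preparation decision in its own right.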

[Figure: CRISP-DM process model (Shearer, 2000)]

Advice

• What is the simplest thing that could possibly work?
• Start small and expand scope.
• Use general tools.
• Bring uncertainty into the spotlight.
• Expect iteration.
• Take a clear-eyed look at the option of not competing on data.
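
Two of these points, starting with the simplest thing that could possibly work and bringing uncertainty into the spotlight, can be combined in a few lines. The slides do not prescribe a method, so the sketch below is only one possible reading. It scores a predict-the-mean baseline by mean absolute error and reports a simple percentile bootstrap interval around that score. The function names, parameters, and synthetic data are illustrative assumptions.

```python
import random
import statistics


def mean_baseline_mae(train, test):
    # Simplest thing that could possibly work: predict the training mean every time.
    prediction = statistics.fmean(train)
    return statistics.fmean(abs(y - prediction) for y in test)


def bootstrap_interval(train, test, n_resamples=1000, alpha=0.05, seed=0):
    # Resample the test set with replacement to see how much the error estimate moves.
    rng = random.Random(seed)
    estimates = sorted(
        mean_baseline_mae(train, rng.choices(test, k=len(test)))
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    data_rng = random.Random(42)
    values = [data_rng.gauss(100, 15) for _ in range(200)]  # synthetic data only
    train, test = values[:150], values[150:]
    point = mean_baseline_mae(train, test)
    low, high = bootstrap_interval(train, test)
    print(f"baseline MAE {point:.2f} (95% bootstrap interval {low:.2f} to {high:.2f})")
```

A baseline like this gives later, more complex models something concrete to beat, and the interval shows whether an apparent improvement is larger than the noise in the evaluation.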