data science 1 - bagrow.combagrow.com/ds1/lec00_notes_2018-08-27.pdf · 27/8/2018  · data science...

86
Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Upload: others

Post on 23-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Data Science 1STAT/CS 287

Prof. James Bagrow, UVM Dept of Math and Statistics

LECTURE 00 (2018-08-27)

Page 2: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Today’s Schedule

Intro + Motivation

Course logistics

Intro/Review of programming

Page 3: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

HomeworkRegister and install Anaconda on your machine ➞ Complete HW00.➞ HW00 includes additional setup reading

Work through reading

More Python review next timeFollowed by review of probability and statistics

Page 4: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

What is data science?

“Data science is the study of the generalizable extraction of knowledge from data”

http://en.wikipedia.org/wiki/Data_science

Page 5: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

What is data science?“Data science is the study of the generalizable extraction of knowledge from data”

http://en.wikipedia.org/wiki/Data_science

Page 6: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

What is data science?mixed with domain knowledge:

• social networks/sociology • biology or drug discovery • neuroscience • business • etc…

STATS

CS

MATH

Page 7: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

What is data science?A mixed course is incredibly useful for majors preparing for today’s interdisciplinary research and industry.

“All the skills I wish I learned before grad school!”

- Me, now

STATS

CS

MATH

Page 8: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

What is data science?What research and

coursework make data science look like

What data science actually is

Statistical analysis,predictive modeling,machine learning

Data collection, preprocessing, filtering

Presentation and visualization, reporting

Statistical analysis,predictive modeling,machine learning

Data collection, preprocessing, filtering

Presentation and visualization, reporting, explanation

Page 9: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

What is data science?What research and

coursework make data science look like

What data science actually is

Statistical analysis,predictive modeling,machine learning

Data collection, preprocessing, filtering

Presentation and visualization, reporting

Statistical analysis,predictive modeling,machine learning

Data collection, preprocessing, filtering

Presentation and visualization, reporting, explanation

Page 10: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

What is data science?A meta-field field encompassing:

statistics machine learning data mining knowledge discovery database engineering information retrieval visualization ⋮

Page 11: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Course logisticsWebsite: http://bagrow.com/ds1/

SyllabusLecture notes/slides

Homeworks, Projects on Blackboard

Let’s go over the syllabus right now!

Page 12: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Topics include• "good" practices for scientific computing • practical data exploration and analysis • web access to data sets • Natural language processing (NLP) and ML • Classic regression and inference methods • A/B testing and multiple comparisons problems • bayesian statistics and computation • visualization principles and design • Interpreting and communicating results!

Page 13: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

This is a programming-intensive class. The language will be python.

Page 14: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Why Python? Why not: RMATLAB IDL Perl Julia SAS STATA ⋮

?

Page 15: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Why Python? Why not: RMATLAB IDL Perl Julia SAS STATA ⋮

?

http://pypl.github.io/PYPL.html

Answers: 1. Popularity 2. Personal Preference 3. Plasticity

Page 16: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Why Python? Why not: RMATLAB IDL Perl Julia SAS STATA ⋮

?

http://pypl.github.io/PYPL.html

Answers: 1. Popularity 2. Personal Preference 3. Plasticity

Page 17: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

print("Hello world")

This is a programming-intensive class. The language will be python.

Page 18: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types

Page 19: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types

4 3.45“a” True

Page 20: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types Data structures

4 3.45“a” True

Page 21: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types Data structures

4 3.45“a” True

vectors, arrays, hashes, matrices, sets, etc.

Page 22: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types Data structures

Code

4 3.45“a” True

vectors, arrays, hashes, matrices, sets, etc.

Page 23: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types Data structures

Code

4 3.45“a” True

vectors, arrays, hashes, matrices, sets, etc.

keywords+ - % #

for while

Page 24: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types Data structures

Code

4 3.45“a” True

vectors, arrays, hashes, matrices, sets, etc.

keywords+ - % #

for while

Code structures

Page 25: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types Data structures

Code

4 3.45“a” True

vectors, arrays, hashes, matrices, sets, etc.

keywords+ - % #

for while

Code structuresfunctions

moduleslibraries

Page 26: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Programming review

Data types Data structures

Code

4 3.45“a” True

vectors, arrays, hashes, matrices, sets, etc.

keywords+ - % #

for while

Code structuresfunctions

moduleslibraries

Objects/Classes

Page 27: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Python has all of these

Page 28: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Use it interactively

Python 3.5.2 |Anaconda custom (x86_64)| (default, Jul 2 2016, 17:52:12) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwinType "help", "copyright", "credits" or "license" for more information.>>>

Page 29: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Use it interactively

Python 3.5.2 |Anaconda custom (x86_64)| (default, Jul 2 2016, 17:52:12) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwinType "help", "copyright", "credits" or "license" for more information.>>>

Prompt

Page 30: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Use it interactively

Python 3.5.2 |Anaconda custom (x86_64)| (default, Jul 2 2016, 17:52:12) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwinType "help", "copyright", "credits" or "license" for more information.>>>

Page 31: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Use it interactively

Python 3.5.2 |Anaconda custom (x86_64)| (default, Jul 2 2016, 17:52:12) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> print("hello world")

Page 32: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Use it interactively

Python 3.5.2 |Anaconda custom (x86_64)| (default, Jul 2 2016, 17:52:12) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwinType "help", "copyright", "credits" or "license" for more information.>>>hello world

print("hello world")

Page 33: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Use it with files

script.py:

Use it with files

script.py:

Page 34: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Use it with files

script.py: import sys # not used

def product(x,y): return x*y

print("x =", product(5,7))

Use it with files

script.py:

Page 35: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Use it with files

script.py: import sys # not used

def product(x,y): return x*y

print("x =", product(5,7))

x = 35

Use it with files

script.py:

Page 36: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Simple syntaxx = 34 # a comment y = "Hello" z = 3.45

if z == 3.45 or y == "Hello": x = x+1 y = y + " World" # String concat.

print(x) print(y)

Page 37: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Data types

integers

floats

strings

booleans

x = 5

pi = 3.14

s = "hi\tmom"

found = True

Page 38: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Data structuresLists

Page 39: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Data structuresLists nums = [4,7,12,9]

Page 40: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Data structuresLists nums = [4,7,12,9]

names = ["John","Alice","Bob"]

Page 41: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Data structuresLists nums = [4,7,12,9]

names = ["John","Alice","Bob"][5, [1,2,3], "nested and mixed"]

Page 42: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Using lists>>> nums = [4,7,12,9]>>> print(nums[0], nums[-1])4 9>>> print(nums[:3])[4,7,12]>>> print(nums[2:])[12,9]>>> nums.append(15)>>> print(nums)[4,7,12,9,15]

index access

slicing

modifying

Page 43: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

OK, so what?I want to study a social network

>>> F = ["Alice", "John", "Bob"]

Page 44: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

OK, so what?I want to study a social network

>>> F = ["Alice", "John", "Bob"]

But whose friends are these?

Page 45: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

OK, so what?I want to study a social network

>>> F = ["Alice", "John", "Bob"]

But whose friends are these?

>>> L = [["Alice", "John", "Bob"], ["John", "Bob", "Stacy"]]>>> print(L[1])["John", "Bob", "Stacy"]

Page 46: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

OK, so what?I want to study a social network

>>> F = ["Alice", "John", "Bob"]

But whose friends are these?

>>> L = [["Alice", "John", "Bob"], ["John", "Bob", "Stacy"]]>>> print(L[1])["John", "Bob", "Stacy"]

But who is person "1"?

Page 47: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

DictionariesA list maps the integers 0, 1, … to some values. Why limit ourselves to integers?

Page 48: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

DictionariesA list maps the integers 0, 1, … to some values. Why limit ourselves to integers?

>>> F = {"Gwen":["Alice", "John", "Bob"], "Peter":["John", "Bob", "Stacy"]}

Page 49: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

DictionariesA list maps the integers 0, 1, … to some values. Why limit ourselves to integers?

>>> F = {"Gwen":["Alice", "John", "Bob"], "Peter":["John", "Bob", "Stacy"]}>>> print(F['Gwen'])

Page 50: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

DictionariesA list maps the integers 0, 1, … to some values. Why limit ourselves to integers?

>>> F = {"Gwen":["Alice", "John", "Bob"], "Peter":["John", "Bob", "Stacy"]}>>> print(F['Gwen'])['Alice', 'John', 'Bob']

Page 51: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

DictionariesA list maps numbers 0, 1, … to values. Why limit ourselves to integers?

>>> F = {"Gwen":["Alice", "John", "Bob"], "Peter":["John", "Bob", "Stacy"]}

Generalization: maps KEYS to VALUES

Page 52: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Dictionaries cont.>>> F = {"Gwen":["Alice", "John", "Bob"], "Peter":["John", "Bob", "Stacy"]}

Generalization: maps KEYS to VALUES

Keys must be unique and immutable

Page 53: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Dictionaries cont.>>> F = {"Gwen":["Alice", "John", "Bob"], "Peter":["John", "Bob", "Stacy"]}

Generalization: maps KEYS to VALUES

Keys must be unique and immutable

Finding a key in a dict is very fast (make great lookup tables)

Page 54: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Dictionaries cont>>> D = {} # empty>>> D[5] = 9>>> D[3] = 12>>> print(D){3: 12, 5: 9}>>> D[5] = 8>>> del D[3]>>> print(D){5: 8}

Page 55: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Dictionaries cont>>> D = {} # empty>>> D[5] = 9>>> D[3] = 12>>> print(D){3: 12, 5: 9}>>> D[5] = 8>>> del D[3]>>> print(D){5: 8}

dicts are unordered

Page 56: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Dictionaries cont>>> D = {} # empty>>> D[5] = 9>>> D[3] = 12>>> print(D){3: 12, 5: 9}>>> D[5] = 8>>> del D[3]>>> print(D){5: 8}

dicts are unordered

replaced a value

Page 57: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Dictionaries cont>>> D = {} # empty>>> D[5] = 9>>> D[3] = 12>>> print(D){3: 12, 5: 9}>>> D[5] = 8>>> del D[3]>>> print(D){5: 8}

dicts are unordered

replaced a value

deleted a (key,value)

Page 58: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

More data structures to come

Page 59: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowL = ["Alice","Bob"]

for x in L:print("x =", x)

Page 60: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowL = ["Alice","Bob"]

for x in L:print("x =", x)

A python for-loop is like a "for each" loop: The looping variable (x) becomes each item in the list

Page 61: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowL = ["Alice","Bob"]

for x in L:print("x =", x)

A python for-loop is like a "for each" loop: The looping variable (x) becomes each item in the list

White space is significant!

Page 62: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowL = ["Alice","Bob"]

for x in L:print("x =", x)

x = Alicex = Bob

Output:

Page 63: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowL = ["Alice","Bob"]

for x in L:print("x =", x)

versus: for i in range(len(L)):print("x =", L[i])

compare:

Page 64: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowL = ["Alice","Bob"]

for x in L:print("x =", x)

versus: for i in range(len(L)):print("x =", L[i])

compare:

Becoming comfortable with "for each" loops makes for less code

and more readable code!

Page 65: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowx = 5y = "yes"z = Truep = 0.333

If statements

Page 66: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowx = 5y = "yes"z = Truep = 0.333

If statements

if x < 10 and y == "yes" and not z:print("won't happen")

print("if statement over")

Page 67: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowx = 5y = "yes"z = Truep = 0.333

If statements

if x < 10 and y == "yes" and not z:print("won't happen")

print("if statement over")

if 0 < p < 1: print("yep")

Page 68: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowx = 5y = "yes"z = Truep = 0.333

If statements

if x < 10 and y == "yes" and not z:print("won't happen")

print("if statement over")

if 0 < p < 1: print("yep")

nice! same as: p > 0 and p < 1

Page 69: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowWhile loop

Page 70: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowWhile loop

x = 0while True: x += 1

if x >= 100:break

print(x)

(x += 1 is a shortcut for x = x + 1)

Page 71: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Control flowWhile loop

x = 0while True: x += 1

if x >= 100:break

print(x)

x = 0while x < 100: x += 1print(x)

(x += 1 is a shortcut for x = x + 1)

Page 72: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Functionsdefmy_function(x,y): """Thisisthedocstring.This functiondoesblahblahblah. """ #Thecodewouldgohere.. pass

defsecond_func(username,password=None): ifpasswordisnotNone: print("foundapassword")

Page 73: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

"Batteries included"Want to write a web crawler? No problem! To download web pages:

Page 74: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

"Batteries included"Want to write a web crawler? No problem! To download web pages:

>>> import urllib>>> web = urllib.request.urlopen("http://bagrow.com/ds1")>>> text = web.read() # read?>>> print(text[2800:3400])

Page 75: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

"Batteries included"Want to write a web crawler? No problem! To download web pages:

>>> import urllib>>> web = urllib.request.urlopen("http://bagrow.com/ds1")>>> text = web.read() # read?>>> print(text[2800:3400])b'data,\nso it is more important than ever to understand how to collect, process, and\nanalyze these data. A picture is worth a thousand words, so visualizations,\nfrom scientific plots and infographics to interactive data explorers, are\ncrucial to summarize and communicate new discoveries.\n</p>\n\n</div><!--/row-->\n\n\n\n<div class="row">\n <h3>Resources</h3>\n <div class="col-md-7">\n <ul>\n <li>The <b><a href="syllabus_stat287_fall2016.pdf">course syllabus</a></b>. Be sure to check this out. </li>\n <li>The course introductory reading, <a href="whirlwindtourpython/00-Title.html">A W'

Page 76: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

"Batteries included"Programmatic power:

Page 77: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

"Batteries included"Programmatic power:

import urllib

website = "http://uvm.edu/"names = ["alice","bob","john"]

for name in names:url = website + name + ".html"print(url)

Page 78: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

"Batteries included"Programmatic power:

import urllib

website = "http://uvm.edu/"names = ["alice","bob","john"]

for name in names:url = website + name + ".html"print(url)

http://uvm.edu/alice.htmlhttp://uvm.edu/bob.htmlhttp://uvm.edu/john.html

Page 79: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

"Batteries included"Programmatic power:

import urllib

names = ["alice","bob","john"]

for name in names:url = "http://uvm.edu/{}.html".format(name)

Page 80: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

"Batteries included"Programmatic power:

import urllib

names = ["alice","bob","john"]

for name in names:url = "http://uvm.edu/{}.html".format(name)

string substitution

Page 82: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Computer setup

We're using a scientific python environment called Anaconda

Page 83: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Computer setup

Supports Windows/Mac/Linux

Includes package manager, text editor, and interactive prompt

Free for academic use

We're using a scientific python environment called Anaconda

Page 84: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Computer setup

https://www.anaconda.com/download/

Page 85: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

Computer setup

Let's see it in action

Page 86: Data Science 1 - bagrow.combagrow.com/ds1/LEC00_notes_2018-08-27.pdf · 27/8/2018  · Data Science 1 STAT/CS 287 Prof. James Bagrow, UVM Dept of Math and Statistics LECTURE 00 (2018-08-27)

HomeworkRegister and install Anaconda on your machine ➞ Complete HW00.➞ HW00 includes additional setup reading

Work through reading

More Python review next timeFollowed by review of probability and statistics