
Data Wrangling Lab

Sept 26-29, 2016 (c) 2016 iCDO@UALR 1

David /WEI DAI

CDO-1 Certificate Program: Foundations for Chief Data Officers


Agenda

• Basic Python Program

• MongoDB Lab

• Clean Data Lab


A Tutorial on the Python Programming Language


Why do we choose Python?

• C or C++

• Java

• Perl

• Scheme

• Fortran

• Python

• Matlab

• Modern, interpreted, object-oriented, full-featured high-level programming language

• Portable (Unix/Linux, Mac OS X, Windows)

• Open source; intellectual property rights held by the Python Software Foundation

• Python versions: 2.x and 3.x

• 3.x is not backwards compatible with 2.x

• This course uses the 3.x version

• Fast program development

• Simple syntax

• Easy to write readable code

• Large standard library

• Lots of third-party libraries: NumPy, SciPy, Biopython, Matplotlib

Python Program Platform

• Open a browser and access the website:

• https://teslae.host.ualr.edu:8888

• Password: python


Hello World

• At the prompt, type “hello world!”


The print Function and Strings

>>> print('hello')
hello
>>> print('hello', 'David')
hello David

• Elements separated by commas print with a space between them

• Strings are immutable

• “+” is overloaded to do concatenation

>>> x = 'hello'
>>> x = x + ' America'
>>> print(x)
hello America
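A minimal sketch of what "immutable" means in practice (the variable names are illustrative only): assigning to a character position raises a TypeError, while "+" builds a new string and rebinds the name, leaving the original object untouched.

```python
# Strings cannot be changed in place; "+" builds a new string instead.
x = 'hello'
before = x                 # second reference to the original string

try:
    x[0] = 'H'             # item assignment is not supported by str
except TypeError as err:
    print('TypeError:', err)

x = x + ' America'         # creates a new string and rebinds x
print(x)                   # hello America
print(before)              # hello (the original object is unchanged)
```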


Substrings and Methods

>>> s = '012345'
>>> print(s[3])
3
>>> print(s[1:4])
123
>>> print(s[2:])
2345
>>> print(s[:4])
0123
>>> print(s[-2])
4

• len(String) – returns the number of characters in the String

• str(Object) – returns a String representation of the Object

>>> print(len(s))
6
>>> print(str(10.3))
10.3


• Relational operators

  ==   equal
  !=   not equal (the Python 2 spelling <> is not valid in 3.x)
  >    greater than
  >=   greater than or equal
  <    less than
  <=   less than or equal

• Logical operators

  and   and
  or    or
  not   not
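The operators above can be tried out with a short sketch; the values of a and b are arbitrary and chosen only for illustration.

```python
a, b = 3, 7

is_equal = (a == b)              # relational: equal
is_different = (a != b)          # relational: not equal
in_range = (a < b and b <= 7)    # logical and of two comparisons
either = (a > b or a >= 3)       # logical or: the second test is true
negated = not (a < b)            # logical not

print(is_equal, is_different, in_range, either, negated)
# False True True True False
```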


Variables

• Are not declared, just assigned

• The variable is created the first time you assign it a value

• Assignment is = and comparison is ==
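A small sketch of these three points, using made-up names (count, undefined_total): a name exists only after its first assignment, it can later be rebound to a value of another type, and == compares rather than assigns.

```python
count = 10                    # the first assignment creates the variable
first_type = type(count)      # int

count = 'ten'                 # the same name can be rebound to another type
second_type = type(count)     # str

print(count == 'ten')         # comparison with ==, prints True

try:
    print(undefined_total)    # this name was never assigned
except NameError as err:
    print('NameError:', err)
```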


Lists

• Ordered collection of data

• Data can be of different types

• Lists are mutable

• Issues with shared references and mutability

• Same subset operations as Strings

>>> x = [1, 'hello', (3 + 2j)]
>>> print(x)
[1, 'hello', (3+2j)]
>>> print(x[2])
(3+2j)
>>> print(x[0:2])
[1, 'hello']


Lists: Modifying Content

• x[i] = a reassigns the ith element to the value a

• Since x and y point to the same list object, both are changed

• The method append also modifies the list

>>> x = [1, 2, 3]
>>> y = x
>>> x[1] = 15
>>> print(x)
[1, 15, 3]
>>> print(y)
[1, 15, 3]
>>> x.append(12)
>>> print(y)
[1, 15, 3, 12]
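When the sharing shown above is not wanted, one common way around it is to copy the list before modifying it. The sketch below is illustrative only; the names alias and snapshot are not part of the lab files.

```python
x = [1, 2, 3]
alias = x              # a second name for the same list object
snapshot = x.copy()    # shallow copy; list(x) or x[:] would also work

x[1] = 15
x.append(12)

print(alias)           # [1, 15, 3, 12] (follows every change to x)
print(snapshot)        # [1, 2, 3]      (independent of x)
print(alias is x)      # True
print(snapshot is x)   # False
```

A shallow copy duplicates only the outer list; nested mutable elements would still be shared.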


Lists: Modifying Contents

• The method append modifies the list and returns None

• List addition (+) returns a new list

>>> x = [1, 2, 3]
>>> y = x
>>> z = x.append(12)
>>> print(z == None)
True
>>> print(y)
[1, 2, 3, 12]
>>> x = x + [9, 10]
>>> print(x)
[1, 2, 3, 12, 9, 10]
>>> print(y)
[1, 2, 3, 12]
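A related detail that is easy to miss: for lists, the augmented form += extends the existing object in place, unlike x = x + [...], which builds a new list. The following sketch contrasts the two (values are arbitrary).

```python
x = [1, 2, 3]
y = x                  # y and x name the same list

x += [4]               # in-place extend: y sees the change
after_iadd = list(y)   # copy of y's contents at this point

x = x + [5]            # new list: x is rebound, y keeps the old object

print(after_iadd)      # [1, 2, 3, 4]
print(x)               # [1, 2, 3, 4, 5]
print(y)               # [1, 2, 3, 4]
print(x is y)          # False
```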


If Else Statements

if expression:
    statement(s)
else:
    statement(s)
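A concrete instance of the template, as a sketch: the function name describe and the 3-hour threshold are invented for illustration. Note that the indentation is what marks the statements belonging to each branch.

```python
def describe(hours):
    """Classify a course by its credit hours (threshold is arbitrary)."""
    if hours >= 3:
        label = 'full course'
    else:
        label = 'short course'
    return label

print(describe(3))   # full course
print(describe(1))   # short course
```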


For Loops

• Similar to Perl foreach loops, iterating through a list of values

forloop1.py:

for x in [1, 6, 12, 3]:
    print(x)

Output:

1
6
12
3

forloop2.py:

for x in range(4):
    print(x)

Output:

0
1
2
3

• range(N) generates the numbers 0, 1, …, N-1 (in Python 3 it is a lazy range object, not a list)
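For reference, a short sketch of the other forms of range and of enumerate, which is handy when both the position and the value are needed. The list values repeat the ones from forloop1.py; everything else is illustrative.

```python
first_four = list(range(4))           # stop only
two_to_five = list(range(2, 6))       # start, stop
countdown = list(range(10, 0, -3))    # start, stop, negative step

print(first_four)    # [0, 1, 2, 3]
print(two_to_five)   # [2, 3, 4, 5]
print(countdown)     # [10, 7, 4, 1]

# enumerate pairs each value with its index
pairs = []
for index, value in enumerate([1, 6, 12, 3]):
    pairs.append((index, value))
print(pairs)         # [(0, 1), (1, 6), (2, 12), (3, 3)]
```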


Functions are first class objects

• Can be assigned to a variable

• Can be passed as a parameter

• Can be returned from a function

• Functions are treated like any other variable in Python; the def statement simply assigns a function to a variable
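The three properties listed above can be sketched as follows; square, apply_twice, and make_adder are made-up names used only for this illustration.

```python
def square(n):
    return n * n

def apply_twice(func, value):
    """Takes a function as a parameter and calls it twice."""
    return func(func(value))

def make_adder(k):
    """Returns a new function that adds k to its argument."""
    def add(n):
        return n + k
    return add

f = square                       # assigned to a variable
print(f(4))                      # 16
print(apply_twice(square, 3))    # 81
add_five = make_adder(5)         # returned from a function
print(add_five(10))              # 15
```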


Function Basics

functionbasics.py:

def max(x, y):
    if x > y:
        return x
    else:
        return y

>>> max(2, 5)
5


Python for Graphs

• Matplotlib is a Python 2D plotting library which produces high-quality figures

• Ready-made demos are available in the plot_demo.ipy file.


MongoDB LAB


MongoDB Express User Interface

• http://teslae.host.ualr.edu:8081

• Username: mongotest

• Password: mongotest


MongoDB Express

• MongoDB Express is a web-based MongoDB admin interface

• You can create, review, export, and delete data through the platform


MongoDB Express Lab

• Export cities.json

• Add a new city of your choice to MongoDB

• Query or find the new city name

• Delete the new city name


Clean Data Lab


Courses Data in MongoDB


Connect to MongoDB


CRUD Operation for MongoDB
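The slide's own code is not reproduced in this transcript, so what follows is only a sketch of how the four CRUD operations are commonly written with the PyMongo driver, not the lab's actual solution. The helper names, the connection URI, and the database and collection names are assumptions; the PyMongo calls used are MongoClient, insert_one, find, update_one, and delete_one. In find, the projection argument selects which fields are returned (1 means include) and is not a value to match.

```python
def get_collection(uri, db_name, collection_name):
    """Connect and return a collection handle (requires the pymongo package)."""
    from pymongo import MongoClient   # imported here so the helpers below load without it
    return MongoClient(uri)[db_name][collection_name]

def create_course(collection, course):
    """Create: insert one document and return its generated _id."""
    return collection.insert_one(course).inserted_id

def read_courses(collection, query, projection=None):
    """Read: return matching documents, limited to the projected fields."""
    return list(collection.find(query, projection))

def update_course(collection, query, changes):
    """Update: set the given fields on the first match; return how many changed."""
    return collection.update_one(query, {"$set": changes}).modified_count

def delete_course(collection, query):
    """Delete: remove the first match; return how many were removed."""
    return collection.delete_one(query).deleted_count

# Example usage (hypothetical URI and names, needs a running server):
# courses = get_collection("mongodb://localhost:27017", "school", "courses")
# create_course(courses, {"courseid": "7100", "title": "data quality algorithm", "hours": 3})
# read_courses(courses, {"courseid": "7100"}, {"title": 1, "hours": 1, "_id": 0})
```

Passing the collection in as a parameter keeps each helper independent of how the connection was made.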


Basic Python-MongoDB Lab

• Write code to add a new course

{"courseid": "71XX",                   <-- Change XX
 "subject": "information science",
 "title": "data quality algorithm",    <-- Change course name
 "hours": 3                            <-- Change hours
}

• Write code to search for your courses

query = {"title": "data quality algorithm"}    <-- Change title name
projection = {"hours": 3}                      <-- Change hours


Basic Python-MongoDB lab (cont.)

• A challenge project

  • Write code to add your name to the teachers’ list


Clean Data lab (cont.)

• Teachers, Courses, and Students are MDM (master data management) data, so they are accurate and trusted.

• student_course_report and teacher_course_report contain incorrect data, but teacherid, studentid, and courseid are correct.

[Screenshots: Teachersinfo, teacher_course_report]
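Because the ids in the reports can be trusted, one way to clean a report row is to look up each id in the matching master collection and overwrite the row's other fields with the master values. The sketch below does this with plain Python dictionaries standing in for the MongoDB collections; apart from teacherid and courseid, every field name and value is invented for illustration.

```python
# Trusted master data, keyed by id (contents are made up for illustration).
teachers = {"T01": {"teacherid": "T01", "name": "Ada Smith"}}
courses = {"7100": {"courseid": "7100", "title": "data quality algorithm", "hours": 3}}

# A report row whose ids are right but whose descriptive fields are wrong.
report = [
    {"teacherid": "T01", "name": "Ada Smyth",
     "courseid": "7100", "title": "data qualty", "hours": 30},
]

def clean_row(row, masters):
    """Return a copy of row with fields overwritten from the master records.

    masters maps an id field name (such as "teacherid") to a dict of
    master records keyed by that id.
    """
    cleaned = dict(row)
    for id_field, lookup in masters.items():
        master = lookup.get(row.get(id_field))
        if master is not None:          # leave the row alone if the id is unknown
            cleaned.update(master)
    return cleaned

masters = {"teacherid": teachers, "courseid": courses}
cleaned_report = [clean_row(row, masters) for row in report]
print(cleaned_report[0])
```

If two master collections share a field name, the later one wins in this sketch, so real data may need explicit field mapping.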


Clean Data lab (cont.)

teacher_course_report


Clean Data lab (cont.)

• Write code to clean student_course_report

• Tips:

[Screenshots: coursesinfo, studentsinfo, student_course_report]


Clean Data lab (cont.)

• A challenge project

  • Write code to clean t_s_c_report.

[Screenshots: coursesinfo, studentsinfo, Teachersinfo]

THANK YOU

