data structures in python - r grapenthin · basic python data structures (built-in) •list, dict,...

29
Data Structures in Python October 2, 2017

Upload: nguyenphuc

Post on 24-Apr-2018

225 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

DataStructuresinPythonOctober2,2017

Page 2: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

Whatisadatastructure?

• Waytostoredataandhavesomemethodtoretrieveandmanipulateit

• Lotsofexamplesinpython:• List,dict,tuple,set,string• Array• Series,DataFrame

• Someoftheseare“built-in”(meaningyoucanjustusethem),othersarecontainedwithinotherpythonpackages,likenumpy andpandas

Page 3: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

BasicPythonDataStructures(built-in)• List,dict,tuple,set,string

• Eachofthesecanbeaccessedinavarietyofways

• Decisiononwhichtouse?Dependsonwhatsortoffeaturesyouneed(easyindexing,immutability,etc)• Mutablevsimmutable

• Mutable– canchange• Immutable– doesn’tchange x=something#immutabletype

printxfunc(x)printx#printsthesamething

x=something#mutabletypeprintxfunc(x)printx#mightprintsomethingdifferent

Page 4: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

BasicStructure:List

• Veryversatile,canhaveitemsofdifferenttypes,ismutable

• Tocreate:usesquarebrackets[]tocontaincommaseparatedvalues

• Example:>>l=[‘a’,‘b’,123]• >>l[’a’,‘b’,123]

• Togetvaluesout:>>l[1](useindex,startswith0)>>b

• Wesawthesebackinlab3

Page 5: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

BasicStructure:Set

• Setisanunorderedcollectionwithnoduplicatevalues,ismutable• Createusing{}• Example:>>s={1,2,3}• >>s

set([1,2,3])

• Usefulforeliminatingduplicatevaluesfromalist,doingoperationslikeintersection,difference,union

Page 6: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

BasicStructure:Tuple

• Tupleholdsvaluesseparatedbycommas,areimmutable• Createusing,or()tocreateempty• Example:>>t=1,2,3

• >>t(1,2,3)

>>type(t)type‘tuple’

• Usefulwhenstoringdatathatdoesnotchange,whenneedingtooptimizeperformanceofcode(pythonknowshowmuchmemoryneeded)

Page 7: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

BasicStructure:Dict• Representedbykey:value pair

• Keys:canbyanyimmutabletypeandunique• Values:canbeanytype(mutableorimmutable)

• Tocreate:usecurlybraces{}ordict()andlistbothkeyandvalue• >>> letters = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}

>>> type(letters) <type 'dict'>

• Toaccessdataindictionary,callbythekey• >>>letters[2]

'b'• Haveusefulmethodslikekeys(),values(),iteritems(),itervalues() usefulforaccessingdictionaryentries

• Usefulwhen:• Needassociationbetweenkey:value pair• Needtoquicklylookupdatabasedonadefinedkey• Valuesaremodified

Page 8: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

Array:UseNumPy!

• Whatisanarray?• “listoflists”• SimilartoMatlab insomeways

• Createa2x3array• [123;456]:matlab• np.array([[1.,2.,3.],[4.,5.,6.]])

• WhatisNumPy?• NumericalPython• Pythonlibraryveryusefulforscientificcomputing

• HowtoaccessNumPy?• Needtoimportitintoyourpythonworkspaceorintoyourscript

• >>importnumpy asnp

>>>importnumpy asnp>>> y = np.array([[1.,2.,3.], [4.,5.,6.]]) >>> y array([[ 1., 2., 3.],

[ 4., 5., 6.]]) >>>

Page 9: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

WhyuseaNumPy array?

• Whatisit?• “multidimensionalarrayofobjectsofallthesametype”

• Morecompactforthanlist(don’tneedtostorebothvalueandtypelikeinalist)• Reading/writingfasterwithNumPy• Getalotofvectorandmatrixoperations

• Can’tdo“vectorized”operationsonlist(likeelement-wiseaddition,multiplication)

• Canalsodothestandardstuff,likeindexing,comparisons,logicaloperations

Page 10: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

CreatingNumPy ArraysCreatingNumPy arrayandcheckingifeachelementis>3

CreateNumPy array,printoutarraydimensions,anduseindexingtools

Create2x2NumPy arraywithjustzeros

Page 11: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

MoreCreatingNumPy Arrays

• arange:like“range”,returnsanndarray

• Usereshapetodefine/changeshapeofarray

Page 12: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

OperationswithNumPy Arrays

• Arithmeticoperations(e.g.+,-,*,/,**)withscalarsandbetweenequal-sizearrays– doneelementbyelement• Anewarrayiscreatedwiththeresult

• Universalfunctions(forexample:sin,cos,exp)alsooperateelementwiseonanarray,newarrayresults

Page 13: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

Becareful:*vsdot

• *isproductoperator,operateselementwiseinNumPy arrays

A*B– elementwisemultiplication

.dot– matrixproduct

Page 14: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

OtherUsefulNumPy ArrayOperations• Sum,min,max:canbeusedtogetvaluesforallelementsinarray

• Canuse(axis=#)tospecifycertainrowsandcolumns

Getsumofallelementsinarray,alsominandmaxwithinarray

Sumofeachcolumn(axis=0)

Minofeachrow(axis=1)

Cumulativesumalongeachrow

Page 15: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

IndexingwithNumPy Arrays• 1Darrays(justlikelists)

• Multidimensionalarrays:workwithanindexperaxis

Createarrayusingarange

Pulloutelementatposition3

Pulloutelementsinpositionsstartingat3,before6

Elementatrow3,column4

Eachrowin2nd columnEachrowin2nd column

Eachcolumnin2nd and3rd row

Page 16: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

Whatispandas?

• OpensourcepackagewithuserfriendlydatastructuresanddataanalysistoolsforPython• BuiltontopofNumPy,givesmoretools

• Veryusefulfortabulardataincolumns(i.e.spreadsheets),timeseriesdata,matrixdata,etc

• Twomaindatastructures:• Series(1-dimensional)• DataFrame (2-dimensional)

• Howtoaccess:• Needtoimportitintoyourpythonworkspaceorintoyourscript

>>importpandasaspd

paneldata:multidimensionalstructureddatasets

Page 17: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

Pandas:Series• Effectivelya1-DNumPy arraywithanindex• 1Dlabeledarraythatcanholdanydatatype,withlabelsknownasthe“index”

>>>s=pd.Series(data,index=index)

datacanbeanarray,scalar,oradict

Page 18: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

Pandas:Series

• Canusingslicingtograboutvalues

• Canalsouseindextograboutvalues

Page 19: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

Pandas:DataFrame• Mostcommonlyusedpandasobject• DataFrame isbasicallyatablemadeupofnamedcolumnsofseries• Thinkspreadsheetortableofsomekind• Cantakedatafrom

• Dict of1Darrays,lists,dicts,Series• 2Dnumpy array• Series• AnotherDataFrame

• Canalsodefineindex(rowlabels)andcolumns(columnlabels)• SeriescanbedynamicallyaddedtoorremovedfromtheDataFrame

Page 20: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

CreatingDataFrames

• Fromdict ofSeriesordicts:Have2series(oneandtwo)

NewDataFrame (df)isunionofthe2Seriesindices

Outputincludesrowlabels(index)andcolumnlabelsasspecified

NotetheNaN reportedbecauseofno4th valuein“one”Usingarrays/listsissimilar:

Ifnoindexisgiven,indexwillberange(n)wherenisarraylength

Page 21: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

AccessingDataFrame Info

Canaccessspecificrows

Canaccessspecificrowsandcolumns

GrabspecificcolumnfromexistingDataFrame

Page 22: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

AccessingDataFrame InfoGrabspecificcolumnfromexistingDataFrame

Makeanewcolumnthroughoperationsonothers

Getridofcolumns

Page 23: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

WorkingwithDataFrames Create2differentDataFrames

Addthedataframes together

Noteelementwiseaddition,withtheresulthavingtheunionofrowandcolumnlabels,evenifyoudon’thavevaluesineachposition

LotsofNumPy elementwisefunctionsworkonDataFrames,asdooperationsliketranspose(.T),.dot

Page 24: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

OthercoolthingstodowithDataFrames

Basicstatistics

sorting

Page 25: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

OthercoolthingstodowithDataFrames

Grabbingdatathatmeetacertaincondition

Filteringdatatograbonlydatathatcontainscertainvaluesusing.isin

Addanewcolumnatendofdataframe

Page 26: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

DataFrames:groupby

• Thisallowsyoutosplitupdataintogroupsbasedonsomecriteria,applysomefunction,andgetaresult

Using“groupby”toselectrowsthatcontainsamevalueinE,thensumthosevalues

Page 27: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

PlottingDatainSeries

Createdaseriesof1000randomnumbers,withanindexofdatesstartingat1/1/2000

Plottedthecumulativesumofthoserandomnumbers

Page 28: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

PlottingDatainDataFrames

Using.plot()withDataFrames willplotallofthecolumnswithlabels

Page 29: data Structures In Python - R Grapenthin · Basic Python Data Structures (built-in) •List, dict, tuple, set, string •Each of these can be accessed in a variety of ways •Decision

Nextup:

• Labtoday– workingwithdatastructures

• Nextweek:howtogetdataintoandoutofpython(I/Otopics)