data structures in python - r grapenthin · basic python data structures (built-in) •list, dict,...
TRANSCRIPT
DataStructuresinPythonOctober2,2017
Whatisadatastructure?
• Waytostoredataandhavesomemethodtoretrieveandmanipulateit
• Lotsofexamplesinpython:• List,dict,tuple,set,string• Array• Series,DataFrame
• Someoftheseare“built-in”(meaningyoucanjustusethem),othersarecontainedwithinotherpythonpackages,likenumpy andpandas
BasicPythonDataStructures(built-in)• List,dict,tuple,set,string
• Eachofthesecanbeaccessedinavarietyofways
• Decisiononwhichtouse?Dependsonwhatsortoffeaturesyouneed(easyindexing,immutability,etc)• Mutablevsimmutable
• Mutable– canchange• Immutable– doesn’tchange x=something#immutabletype
printxfunc(x)printx#printsthesamething
x=something#mutabletypeprintxfunc(x)printx#mightprintsomethingdifferent
BasicStructure:List
• Veryversatile,canhaveitemsofdifferenttypes,ismutable
• Tocreate:usesquarebrackets[]tocontaincommaseparatedvalues
• Example:>>l=[‘a’,‘b’,123]• >>l[’a’,‘b’,123]
• Togetvaluesout:>>l[1](useindex,startswith0)>>b
• Wesawthesebackinlab3
BasicStructure:Set
• Setisanunorderedcollectionwithnoduplicatevalues,ismutable• Createusing{}• Example:>>s={1,2,3}• >>s
set([1,2,3])
• Usefulforeliminatingduplicatevaluesfromalist,doingoperationslikeintersection,difference,union
BasicStructure:Tuple
• Tupleholdsvaluesseparatedbycommas,areimmutable• Createusing,or()tocreateempty• Example:>>t=1,2,3
• >>t(1,2,3)
>>type(t)type‘tuple’
• Usefulwhenstoringdatathatdoesnotchange,whenneedingtooptimizeperformanceofcode(pythonknowshowmuchmemoryneeded)
BasicStructure:Dict• Representedbykey:value pair
• Keys:canbyanyimmutabletypeandunique• Values:canbeanytype(mutableorimmutable)
• Tocreate:usecurlybraces{}ordict()andlistbothkeyandvalue• >>> letters = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
>>> type(letters) <type 'dict'>
• Toaccessdataindictionary,callbythekey• >>>letters[2]
'b'• Haveusefulmethodslikekeys(),values(),iteritems(),itervalues() usefulforaccessingdictionaryentries
• Usefulwhen:• Needassociationbetweenkey:value pair• Needtoquicklylookupdatabasedonadefinedkey• Valuesaremodified
Array:UseNumPy!
• Whatisanarray?• “listoflists”• SimilartoMatlab insomeways
• Createa2x3array• [123;456]:matlab• np.array([[1.,2.,3.],[4.,5.,6.]])
• WhatisNumPy?• NumericalPython• Pythonlibraryveryusefulforscientificcomputing
• HowtoaccessNumPy?• Needtoimportitintoyourpythonworkspaceorintoyourscript
• >>importnumpy asnp
>>>importnumpy asnp>>> y = np.array([[1.,2.,3.], [4.,5.,6.]]) >>> y array([[ 1., 2., 3.],
[ 4., 5., 6.]]) >>>
WhyuseaNumPy array?
• Whatisit?• “multidimensionalarrayofobjectsofallthesametype”
• Morecompactforthanlist(don’tneedtostorebothvalueandtypelikeinalist)• Reading/writingfasterwithNumPy• Getalotofvectorandmatrixoperations
• Can’tdo“vectorized”operationsonlist(likeelement-wiseaddition,multiplication)
• Canalsodothestandardstuff,likeindexing,comparisons,logicaloperations
CreatingNumPy ArraysCreatingNumPy arrayandcheckingifeachelementis>3
CreateNumPy array,printoutarraydimensions,anduseindexingtools
Create2x2NumPy arraywithjustzeros
MoreCreatingNumPy Arrays
• arange:like“range”,returnsanndarray
• Usereshapetodefine/changeshapeofarray
OperationswithNumPy Arrays
• Arithmeticoperations(e.g.+,-,*,/,**)withscalarsandbetweenequal-sizearrays– doneelementbyelement• Anewarrayiscreatedwiththeresult
• Universalfunctions(forexample:sin,cos,exp)alsooperateelementwiseonanarray,newarrayresults
Becareful:*vsdot
• *isproductoperator,operateselementwiseinNumPy arrays
A*B– elementwisemultiplication
.dot– matrixproduct
OtherUsefulNumPy ArrayOperations• Sum,min,max:canbeusedtogetvaluesforallelementsinarray
• Canuse(axis=#)tospecifycertainrowsandcolumns
Getsumofallelementsinarray,alsominandmaxwithinarray
Sumofeachcolumn(axis=0)
Minofeachrow(axis=1)
Cumulativesumalongeachrow
IndexingwithNumPy Arrays• 1Darrays(justlikelists)
• Multidimensionalarrays:workwithanindexperaxis
Createarrayusingarange
Pulloutelementatposition3
Pulloutelementsinpositionsstartingat3,before6
Elementatrow3,column4
Eachrowin2nd columnEachrowin2nd column
Eachcolumnin2nd and3rd row
Whatispandas?
• OpensourcepackagewithuserfriendlydatastructuresanddataanalysistoolsforPython• BuiltontopofNumPy,givesmoretools
• Veryusefulfortabulardataincolumns(i.e.spreadsheets),timeseriesdata,matrixdata,etc
• Twomaindatastructures:• Series(1-dimensional)• DataFrame (2-dimensional)
• Howtoaccess:• Needtoimportitintoyourpythonworkspaceorintoyourscript
>>importpandasaspd
paneldata:multidimensionalstructureddatasets
Pandas:Series• Effectivelya1-DNumPy arraywithanindex• 1Dlabeledarraythatcanholdanydatatype,withlabelsknownasthe“index”
>>>s=pd.Series(data,index=index)
datacanbeanarray,scalar,oradict
Pandas:Series
• Canusingslicingtograboutvalues
• Canalsouseindextograboutvalues
Pandas:DataFrame• Mostcommonlyusedpandasobject• DataFrame isbasicallyatablemadeupofnamedcolumnsofseries• Thinkspreadsheetortableofsomekind• Cantakedatafrom
• Dict of1Darrays,lists,dicts,Series• 2Dnumpy array• Series• AnotherDataFrame
• Canalsodefineindex(rowlabels)andcolumns(columnlabels)• SeriescanbedynamicallyaddedtoorremovedfromtheDataFrame
CreatingDataFrames
• Fromdict ofSeriesordicts:Have2series(oneandtwo)
NewDataFrame (df)isunionofthe2Seriesindices
Outputincludesrowlabels(index)andcolumnlabelsasspecified
NotetheNaN reportedbecauseofno4th valuein“one”Usingarrays/listsissimilar:
Ifnoindexisgiven,indexwillberange(n)wherenisarraylength
AccessingDataFrame Info
Canaccessspecificrows
Canaccessspecificrowsandcolumns
GrabspecificcolumnfromexistingDataFrame
AccessingDataFrame InfoGrabspecificcolumnfromexistingDataFrame
Makeanewcolumnthroughoperationsonothers
Getridofcolumns
WorkingwithDataFrames Create2differentDataFrames
Addthedataframes together
Noteelementwiseaddition,withtheresulthavingtheunionofrowandcolumnlabels,evenifyoudon’thavevaluesineachposition
LotsofNumPy elementwisefunctionsworkonDataFrames,asdooperationsliketranspose(.T),.dot
OthercoolthingstodowithDataFrames
Basicstatistics
sorting
OthercoolthingstodowithDataFrames
Grabbingdatathatmeetacertaincondition
Filteringdatatograbonlydatathatcontainscertainvaluesusing.isin
Addanewcolumnatendofdataframe
DataFrames:groupby
• Thisallowsyoutosplitupdataintogroupsbasedonsomecriteria,applysomefunction,andgetaresult
Using“groupby”toselectrowsthatcontainsamevalueinE,thensumthosevalues
PlottingDatainSeries
Createdaseriesof1000randomnumbers,withanindexofdatesstartingat1/1/2000
Plottedthecumulativesumofthoserandomnumbers
PlottingDatainDataFrames
Using.plot()withDataFrames willplotallofthecolumnswithlabels
Nextup:
• Labtoday– workingwithdatastructures
• Nextweek:howtogetdataintoandoutofpython(I/Otopics)