Data Structures in Python - R Grapenthin

Post on 24-Apr-2018


Data Structures in Python
October 2, 2017

What is a data structure?

• Way to store data and have some method to retrieve and manipulate it

• Lots of examples in Python:
  • List, dict, tuple, set, string
  • Array
  • Series, DataFrame

• Some of these are “built-in” (meaning you can just use them), others are contained within other Python packages, like numpy and pandas

Basic Python Data Structures (built-in)

• List, dict, tuple, set, string

• Each of these can be accessed in a variety of ways

• Decision on which to use? Depends on what sort of features you need (easy indexing, immutability, etc.)

• Mutable vs immutable
  • Mutable – can change
  • Immutable – doesn’t change

x = something # immutable type
print x
func(x)
print x # prints the same thing

x = something # mutable type
print x
func(x)
print x # might print something different
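A minimal sketch of the mutable vs immutable behavior described above, written for Python 3 (so print takes parentheses). The name func follows the slide; its body is an assumption chosen only for illustration.

```python
def func(x):
    # For a tuple, += builds a new tuple and rebinds only the local name x.
    # For a list, += extends the caller's list in place.
    x += x

t = (1, 2, 3)   # immutable type
func(t)
print(t)        # (1, 2, 3)

l = [1, 2, 3]   # mutable type
func(l)
print(l)        # [1, 2, 3, 1, 2, 3]
```

The same function leaves the tuple untouched but changes the list, which is the practical difference to keep in mind when passing data into functions.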

Basic Structure: List

• Very versatile, can have items of different types, is mutable

• To create: use square brackets [] to contain comma-separated values

• Example:
>>> l = ['a', 'b', 123]
>>> l
['a', 'b', 123]

• To get values out (use index, starts with 0):
>>> l[1]
'b'

• We saw these back in lab 3
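A short sketch of common list operations, reusing the list from the example above. The operations shown (negative indexing, item assignment, append, slicing) are standard Python; the choice of which to show is an assumption about what is most useful here.

```python
l = ['a', 'b', 123]

first = l[0]      # 'a'  (indexing starts at 0)
last = l[-1]      # 123  (negative indices count from the end)

l[1] = 'B'        # lists are mutable: replace an item in place
l.append(4.5)     # add an item to the end
middle = l[1:3]   # slice: items at positions 1 and 2

print(l)          # ['a', 'B', 123, 4.5]
print(middle)     # ['B', 123]
```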

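Basic Structure: Set

• Set is an unordered collection with no duplicate values, is mutable
• Create using {} around comma-separated values (an empty {} creates a dict; use set() for an empty set)
• Example:
>>> s = {1, 2, 3}
>>> s
set([1, 2, 3])

• Useful for eliminating duplicate values from a list, doing operations like intersection, difference, union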
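The uses listed above can be sketched as follows. The variable names and values are made up for illustration.

```python
values = [3, 1, 2, 3, 1]
unique = set(values)   # duplicates removed: {1, 2, 3}

a = {1, 2, 3}
b = {3, 4}

common = a & b         # intersection: {3}
only_a = a - b         # difference:   {1, 2}
either = a | b         # union:        {1, 2, 3, 4}

empty = set()          # {} would create an empty dict, not a set
```

The operators have method equivalents (a.intersection(b), a.difference(b), a.union(b)) if you prefer names over symbols.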

Basic Structure: Tuple

• Tuple holds values separated by commas, is immutable
• Create using commas, or () to create an empty tuple
• Example:
>>> t = 1, 2, 3
>>> t
(1, 2, 3)
>>> type(t)
<type 'tuple'>

• Useful when storing data that does not change, when needing to optimize performance of code (Python knows how much memory is needed)
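A brief sketch of tuple creation, unpacking, and immutability. The variable names are illustrative only.

```python
t = 1, 2, 3       # the commas make the tuple; parentheses are optional
single = (1,)     # a one-item tuple needs a trailing comma
empty = ()        # empty tuple

a, b, c = t       # unpack the values into separate names

try:
    t[0] = 10     # tuples are immutable, so item assignment fails
except TypeError as err:
    message = str(err)
    print("cannot modify a tuple:", message)
```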

Basic Structure: Dict

• Represented by key:value pairs
  • Keys: can be any immutable type and must be unique
  • Values: can be any type (mutable or immutable)

• To create: use curly braces {} or dict() and list both key and value
>>> letters = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
>>> type(letters)
<type 'dict'>

• To access data in dictionary, call by the key
>>> letters[2]
'b'

• Have useful methods like keys(), values(), iteritems(), itervalues() for accessing dictionary entries

• Useful when:
  • Need association between key:value pair
  • Need to quickly look up data based on a defined key
  • Values are modified
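A small sketch of dictionary access and modification using the letters example above. It is written for Python 3, where items() and values() take the place of the Python 2 methods iteritems() and itervalues().

```python
letters = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}

letters[5] = 'e'     # add a new key:value pair
letters[1] = 'A'     # modify the value stored under an existing key

keys = list(letters.keys())
values = list(letters.values())

# Loop over key:value pairs
pairs = [(k, v) for k, v in letters.items()]

# get() returns a default instead of raising KeyError for a missing key
missing = letters.get(99, 'not found')
```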

Array: Use NumPy!

• What is an array?
  • “list of lists”
  • Similar to Matlab in some ways

• Create a 2x3 array
  • [1 2 3; 4 5 6] : Matlab
  • np.array([[1.,2.,3.],[4.,5.,6.]])

• What is NumPy?
  • Numerical Python
  • Python library very useful for scientific computing

• How to access NumPy?
  • Need to import it into your Python workspace or into your script

>>> import numpy as np
>>> y = np.array([[1.,2.,3.], [4.,5.,6.]])
>>> y
array([[ 1., 2., 3.],
       [ 4., 5., 6.]])

Why use a NumPy array?

• What is it?
  • “multidimensional array of objects of all the same type”

• More compact than a list (don’t need to store both value and type like in a list)
• Reading/writing faster with NumPy
• Get a lot of vector and matrix operations
  • Can’t do “vectorized” operations on a list (like element-wise addition, multiplication)

• Can also do the standard stuff, like indexing, comparisons, logical operations
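To make the “vectorized” point concrete, here is a minimal comparison of the same operators applied to lists and to NumPy arrays. The values are made up for illustration.

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

list_sum = a + b                       # concatenation: [1, 2, 3, 4, 5, 6]

arr_sum = np.array(a) + np.array(b)    # element-wise: [5 7 9]
arr_prod = np.array(a) * np.array(b)   # element-wise: [ 4 10 18]

# a * b on the two lists raises TypeError;
# a * 2 repeats the list rather than doubling each value.
repeated = a * 2                       # [1, 2, 3, 1, 2, 3]
doubled = np.array(a) * 2              # [2 4 6]
```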

Creating NumPy Arrays

Creating NumPy array and checking if each element is > 3

Create NumPy array, print out array dimensions, and use indexing tools

Create 2x2 NumPy array with just zeros
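The original screenshots for these three steps are not in the transcript, so the following is a sketch of what they plausibly showed. The array values are assumptions.

```python
import numpy as np

# Create an array and check whether each element is > 3
a = np.array([1, 2, 3, 4, 5])
mask = a > 3              # boolean array: [False False False  True  True]

# Create a 2-D array, inspect its dimensions, and index into it
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.shape)            # (2, 3): 2 rows, 3 columns
print(b.ndim)             # 2
element = b[0, 1]         # first row, second column -> 2

# Create a 2x2 array of zeros (the shape is passed as a tuple)
z = np.zeros((2, 2))
```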

More Creating NumPy Arrays

• arange: like “range”, returns an ndarray

• Use reshape to define/change shape of array
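A short sketch of arange and reshape; the sizes are chosen only for illustration.

```python
import numpy as np

a = np.arange(6)          # array([0, 1, 2, 3, 4, 5])
b = np.arange(0, 10, 2)   # start, stop (exclusive), step -> [0 2 4 6 8]

c = a.reshape(2, 3)       # same 6 values arranged as 2 rows x 3 columns
# [[0 1 2]
#  [3 4 5]]
```

reshape requires that the new shape holds the same number of elements as the original array.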

Operations with NumPy Arrays

• Arithmetic operations (e.g. +, -, *, /, **) with scalars and between equal-size arrays – done element by element
  • A new array is created with the result

• Universal functions (for example: sin, cos, exp) also operate elementwise on an array, new array results
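The element-by-element behavior can be sketched as follows, with made-up values.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

scaled = a * 2        # scalar applied to every element: [2. 4. 6.]
total = a + b         # element by element: [11. 22. 33.]
squared = a ** 2      # [1. 4. 9.]
ratio = b / a         # [10. 10. 10.]

# Universal functions also work element by element and return new arrays
ones = np.exp(np.zeros(3))   # exp(0) = 1 for each element
sines = np.sin(a)

# The inputs are unchanged; every result above is a new array
```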

Be careful: * vs dot

• * is the product operator, operates elementwise on NumPy arrays

A*B – elementwise multiplication

.dot – matrix product
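A minimal example of the difference, using two small matrices chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B    # multiplies matching entries
# [[0 2]
#  [3 0]]

matrix = A.dot(B)      # rows of A times columns of B
# [[2 1]
#  [4 3]]
```

In Python 3.5 and later, A @ B gives the same matrix product as A.dot(B).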

Other Useful NumPy Array Operations

• Sum, min, max: can be used to get values for all elements in array

• Can use (axis=#) to specify certain rows and columns

Get sum of all elements in array, also min and max within array

Sum of each column (axis=0)

Min of each row (axis=1)

Cumulative sum along each row
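The four captions above can be sketched like this; the array is an assumption standing in for the one in the missing screenshots.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = a.sum()               # 21, over all elements
smallest = a.min()            # 1
largest = a.max()             # 6

col_sums = a.sum(axis=0)      # sum of each column: [5 7 9]
row_mins = a.min(axis=1)      # min of each row:    [1 4]
row_cumsum = a.cumsum(axis=1) # running total along each row
# [[ 1  3  6]
#  [ 4  9 15]]
```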

Indexing with NumPy Arrays

• 1D arrays (just like lists)

• Multidimensional arrays: work with an index per axis

Create array using arange

Pull out element at position 3

Pull out elements in positions starting at 3, before 6

Element at row 3, column 4

Each row in 2nd column

Each column in 2nd and 3rd row
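A sketch of the indexing steps captioned above. The arrays are assumptions, and “row 3, column 4” is read here as counting from 1, which corresponds to index [2, 3] in Python's 0-based indexing.

```python
import numpy as np

# 1-D array: indexing and slicing work just like lists
a = np.arange(10)         # [0 1 2 ... 9]
at_three = a[3]           # element at position 3 -> 3
three_to_six = a[3:6]     # positions 3, 4, 5 -> [3 4 5]

# 2-D array: one index per axis, written as [row, column]
b = np.arange(20).reshape(4, 5)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]]

elem = b[2, 3]            # third row, fourth column -> 13
second_col = b[:, 1]      # each row in the 2nd column -> [ 1  6 11 16]
rows_2_3 = b[1:3, :]      # each column in the 2nd and 3rd rows
```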

What is pandas?

• Open source package with user-friendly data structures and data analysis tools for Python
  • Built on top of NumPy, gives more tools

• Very useful for tabular data in columns (i.e. spreadsheets), time series data, matrix data, etc.

• Two main data structures:
  • Series (1-dimensional)
  • DataFrame (2-dimensional)

• How to access:
  • Need to import it into your Python workspace or into your script

>>> import pandas as pd

Panel data: multidimensional structured data sets

Pandas: Series

• Effectively a 1-D NumPy array with an index
• 1D labeled array that can hold any data type, with labels known as the “index”

>>> s = pd.Series(data, index=index)

data can be an array, scalar, or a dict
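The three kinds of data mentioned above can each be passed to pd.Series. The values and labels below are made up for illustration.

```python
import numpy as np
import pandas as pd

# From an array, with explicit index labels
s_arr = pd.Series(np.array([1.0, 2.0, 3.0]), index=['a', 'b', 'c'])

# From a dict: the keys become the index
s_dict = pd.Series({'a': 1, 'b': 2})

# From a scalar: the value is repeated for every index label
s_scalar = pd.Series(5.0, index=['a', 'b', 'c'])

print(s_arr)
```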

Pandas: Series

• Can use slicing to grab values

• Can also use the index to grab values
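A sketch of both access styles. The Series is made up, and the explicit .iloc and .loc accessors are used here as an assumption about the clearest way to separate position-based from label-based access.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

first_two = s.iloc[:2]        # slice by position: rows 0 and 1
by_label = s['c']             # look up by index label -> 30
label_slice = s.loc['b':'d']  # label slices include both endpoints
above = s[s > 15]             # boolean selection keeps matching rows
```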

Pandas: DataFrame

• Most commonly used pandas object
• DataFrame is basically a table made up of named columns of Series
• Think spreadsheet or table of some kind
• Can take data from:
  • Dict of 1D arrays, lists, dicts, Series
  • 2D numpy array
  • Series
  • Another DataFrame

• Can also define index (row labels) and columns (column labels)
• Series can be dynamically added to or removed from the DataFrame

Creating DataFrames

• From dict of Series or dicts: have 2 Series (one and two)

New DataFrame (df) is union of the 2 Series indices

Output includes row labels (index) and column labels as specified

Note the NaN reported because of no 4th value in “one”

Using arrays/lists is similar:

If no index is given, index will be range(n) where n is array length
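The screenshots for this slide are missing, so this is a sketch of the two constructions described. The column names one and two come from the slide; the values and index labels are assumptions.

```python
import pandas as pd

# From a dict of Series: the row index is the union of both Series indices
d = {
    'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
    'two': pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(d)
print(df)   # row 'd' shows NaN under 'one' (no 4th value there)

# From a dict of lists: with no index given, rows are labeled 0..n-1
df2 = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                    'two': [4.0, 3.0, 2.0, 1.0]})
```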

Accessing DataFrame Info

Can access specific rows

Can access specific rows and columns

Grab specific column from existing DataFrame

Accessing DataFrame Info

Make a new column through operations on others

Get rid of columns
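A sketch of the access patterns captioned above, on a small made-up DataFrame. The column name three and the use of .loc, .iloc, and drop are assumptions about how the original examples were written.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0],
                   'two': [4.0, 5.0, 6.0]},
                  index=['a', 'b', 'c'])

row_b = df.loc['b']                    # one row by label (returned as a Series)
first_rows = df.iloc[0:2]              # rows by position
block = df.loc[['a', 'c'], ['two']]    # specific rows and columns

col_one = df['one']                    # one column (a Series)

df['three'] = df['one'] * df['two']    # new column from operations on others

df = df.drop(columns=['two'])          # get rid of a column
```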

Working with DataFrames

Create 2 different DataFrames

Add the DataFrames together

Note elementwise addition, with the result having the union of row and column labels, even if you don’t have values in each position

Lots of NumPy elementwise functions work on DataFrames, as do operations like transpose (.T), .dot
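A sketch of adding two DataFrames with different labels, followed by a NumPy function, a transpose, and a matrix product. The shapes, labels, and values are assumptions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 2)), index=['a', 'b', 'c'], columns=['A', 'B'])
df2 = pd.DataFrame(np.ones((2, 3)), index=['a', 'b'], columns=['A', 'B', 'C'])

# Labels are aligned first; positions present in only one frame become NaN
total = df1 + df2

exp_df = np.exp(df1)        # NumPy elementwise function, returns a DataFrame
transposed = df1.T          # rows and columns swapped: shape (2, 3)
product = df1.T.dot(df1)    # (2x3) times (3x2) matrix product -> 2x2
```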

Other cool things to do with DataFrames

Basic statistics

Sorting

Other cool things to do with DataFrames

Grabbing data that meet a certain condition

Filtering data to grab only data that contains certain values using .isin

Add a new column at end of DataFrame
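The operations captioned above can be sketched as follows. The DataFrame contents are made up; the column name E is borrowed from the groupby slide.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2, 4],
                   'B': [10.0, 20.0, 30.0, 40.0]})

stats = df.describe()                # count, mean, std, min, quartiles, max
means = df.mean()                    # mean of each column

sorted_df = df.sort_values(by='A')   # sort rows by the values in column A

big = df[df['B'] > 15]               # rows that meet a condition
picked = df[df['A'].isin([1, 4])]    # rows whose A value is in the given list

df['E'] = ['one', 'two', 'one', 'three']   # new column added at the end
```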

DataFrames: groupby

• This allows you to split up data into groups based on some criteria, apply some function, and get a result

Using “groupby” to select rows that contain the same value in E, then sum those values
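A minimal split-apply-combine sketch. The column E follows the caption above; the other column and all values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({'E': ['one', 'two', 'one', 'two'],
                   'A': [1, 2, 3, 4]})

# Split rows by the value in E, then sum column A within each group
sums = df.groupby('E')['A'].sum()
print(sums)
# one -> 1 + 3 = 4
# two -> 2 + 4 = 6
```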

Plotting Data in Series

Created a Series of 1000 random numbers, with an index of dates starting at 1/1/2000

Plotted the cumulative sum of those random numbers

Plotting Data in DataFrames

Using .plot() with DataFrames will plot all of the columns with labels
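A sketch of both plotting examples, assuming matplotlib is installed (pandas uses it for .plot()). The random seed, the column names A to D, and the non-interactive backend are assumptions; remove the matplotlib.use line and call plt.show() to see the figures in a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, runs without a display

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Series of 1000 random numbers indexed by dates starting at 1/1/2000
ts = pd.Series(rng.standard_normal(1000),
               index=pd.date_range('1/1/2000', periods=1000))
cum = ts.cumsum()
ax_series = cum.plot()          # line plot of the cumulative sum

# DataFrame: .plot() draws one labeled line per column
df = pd.DataFrame(rng.standard_normal((1000, 4)),
                  index=ts.index, columns=['A', 'B', 'C', 'D'])
ax_df = df.cumsum().plot()
```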

Next up:

• Lab today – working with data structures

• Next week: how to get data into and out of Python (I/O topics)
