Learn Python Programming Second Edition
A beginner's guide to learning the fundamentals of Python language to write efficient, high-quality code
Fabrizio Romano
BIRMINGHAM - MUMBAI
Learn Python Programming Second Edition
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Richa Tripathi
Acquisition Editor: Karan Sadawana
Content Development Editor: Rohit Singh
Technical Editor: Romy Dias
Copy Editor: Safis Editing
Project Coordinator: Vaidehi Sawant
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Jason Monteiro
Production Coordinator: Shantanu Zagade
First published: December 2015
Second edition: June 2018
Production reference: 1280618
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78899-666-2
www.packtpub.com
To my dear dear friend and mentor, Torsten Alexander Lange. Thank you for all the love and support.
Foreword
I first got to know Fabrizio when he became our lead developer a few years ago. It was quickly apparent that he was one of those rare people who combine rigorous technical expertise with a genuine care for the people around him and a true passion to mentor and teach. Whether it was designing a system, pairing to write code, doing code reviews, or even organizing team card games at lunch, Fab was always thinking not only about the best way to do the job, but also about how to make sure that the entire team had the skills and motivation to do their best.
You'll meet the same wise and caring guide in this book. Every chapter, every example, every explanation has been carefully thought out, driven by a desire to impart the best and most accurate understanding of the technology, and to do it with kindness. Fab takes you under his wing to teach you both Python's syntax and its best practices.
I'm also impressed with the scope of this book. Python has grown and evolved over the years, and it now spans an enormous ecosystem, being used for web development, routine data handling, and ETL, and increasingly for data science. If you are new to the Python ecosystem, it's often hard to know what to study to achieve your goals. In this book, you will find useful examples exposing you to many different uses of Python, which will help guide you as you move through the breadth that Python offers.
I hope you will enjoy learning Python and become a member of our global community. I'm proud to have been asked to write this, but above all, I'm pleased that Fab will be your guide.
Naomi Ceder
Python Software Foundation Fellow
Contributors
About the author
Fabrizio Romano was born in Italy in 1975. He holds a master's degree in computer science engineering from the University of Padova. He is also a certified scrum master, Reiki master and teacher, and a member of CNHC.
He moved to London in 2011 to work for companies such as Glasses Direct, TBG/Sprinklr, and student.com. He now works at Sohonet as a Principal Engineer/Team Lead.
He has given talks on Teaching Python and TDD at two editions of EuroPython, and at Skillsmatter and ProgSCon, in London.
I'm grateful to all those who helped me create this book. Special thanks to Dr. Naomi Ceder for writing the foreword to this edition, and to Heinrich Kruger and Julio Trigo for reviewing this volume. To my friends and family, who love me and support me every day, thank you. And to Petra Lange, for always being so lovely to me, thank you.
About the reviewers
Heinrich Kruger was born in South Africa in 1981. He obtained a bachelor's degree with honors from the University of the Witwatersrand in South Africa in 2005 and a master's degree in computer science from Utrecht University in the Netherlands in 2008.
He worked as a research assistant at Utrecht University from 2009 until 2013 and has been working as a professional software developer since 2014. He has been using Python for personal projects and in his studies since 2004, and professionally since 2014.
Julio Vicente Trigo Guijarro is a computer scientist and software engineer with over a decade of experience in software development. He completed his studies at the University of Alicante, Spain, in 2007. He has worked with several technologies and languages, including Microsoft Dynamics NAV, Java, JavaScript, and Python. He is a certified Scrum Master. He has been using Python since 2012, and he is passionate about software design, quality, and coding standards. He currently works as senior software developer and team lead at Sohonet, developing real-time collaboration applications.
I would like to thank my parents for their love, good advice, and continuous support. I would also like to thank all the friends I have met along the way, who enriched my life, for keeping up my motivation and helping me progress.
Table of Contents
Title Page
Copyright and Credits
Learn Python Programming
Second Edition
Dedication
Packt Upsell
Why subscribe?
PacktPub.com
Foreword
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
1. A Gentle Introduction to Python
A proper introduction
Enter the Python
About Python
Portability
Coherence
Developer productivity
An extensive library
Software quality
Software integration
Satisfaction and enjoyment
What are the drawbacks?
Who is using Python today?
Setting up the environment
Python 2 versus Python 3
Installing Python
Setting up the Python interpreter
About virtualenv
Your first virtual environment
Your friend, the console
How you can run a Python program
Running Python scripts
Running the Python interactive shell
Running Python as a service
Running Python as a GUI application
How is Python code organized?
How do we use modules and packages?
Python's execution model
Names and namespaces
Scopes
Objects and classes
Guidelines on how to write good code
The Python culture
A note on IDEs
Summary
2. Built-in Data Types
Everything is an object
Mutable or immutable? That is the question
Numbers
Integers
Booleans
Real numbers
Complex numbers
Fractions and decimals
Immutable sequences
Strings and bytes
Encoding and decoding strings
Indexing and slicing strings
String formatting
Tuples
Mutable sequences
Lists
Byte arrays
Set types
Mapping types – dictionaries
The collections module
namedtuple
defaultdict
ChainMap
Enums
Final considerations
Small values caching
How to choose data structures
About indexing and slicing
About the names
Summary
3. Iterating and Making Decisions
Conditional programming
A specialized else – elif
The ternary operator
Looping
The for loop
Iterating over a range
Iterating over a sequence
Iterators and iterables
Iterating over multiple sequences
The while loop
The break and continue statements
A special else clause
Putting all this together
A prime generator
Applying discounts
A quick peek at the itertools module
Infinite iterators
Iterators terminating on the shortest input sequence
Combinatoric generators
Summary
4. Functions, the Building Blocks of Code
Why use functions?
Reducing code duplication
Splitting a complex task
Hiding implementation details
Improving readability
Improving traceability
Scopes and name resolution
The global and nonlocal statements
Input parameters
Argument-passing
Assignment to argument names doesn't affect the caller
Changing a mutable affects the caller
How to specify input parameters
Positional arguments
Keyword arguments and default values
Variable positional arguments
Variable keyword arguments
Keyword-only arguments
Combining input parameters
Additional unpacking generalizations
Avoid the trap! Mutable defaults
Return values
Returning multiple values
A few useful tips
Recursive functions
Anonymous functions
Function attributes
Built-in functions
One final example
Documenting your code
Importing objects
Relative imports
Summary
5. Saving Time and Memory
The map, zip, and filter functions
map
zip
filter
Comprehensions
Nested comprehensions
Filtering a comprehension
dict comprehensions
set comprehensions
Generators
Generator functions
Going beyond next
The yield from expression
Generator expressions
Some performance considerations
Don't overdo comprehensions and generators
Name localization
Generation behavior in built-ins
One last example
Summary
6. OOP, Decorators, and Iterators
Decorators
A decorator factory
Object-oriented programming (OOP)
The simplest Python class
Class and object namespaces
Attribute shadowing
Me, myself, and I – using the self variable
Initializing an instance
OOP is about code reuse
Inheritance and composition
Accessing a base class
Multiple inheritance
Method resolution order
Class and static methods
Static methods
Class methods
Private methods and name mangling
The property decorator
Operator overloading
Polymorphism – a brief overview
Data classes
Writing a custom iterator
Summary
7. Files and Data Persistence
Working with files and directories
Opening files
Using a context manager to open a file
Reading and writing to a file
Reading and writing in binary mode
Protecting against overriding an existing file
Checking for file and directory existence
Manipulating files and directories
Manipulating pathnames
Temporary files and directories
Directory content
File and directory compression
Data interchange formats
Working with JSON
Custom encoding/decoding with JSON
IO, streams, and requests
Using an in-memory stream
Making HTTP requests
Persisting data on disk
Serializing data with pickle
Saving data with shelve
Saving data to a database
Summary
8. Testing, Profiling, and Dealing with Exceptions
Testing your application
The anatomy of a test
Testing guidelines
Unit testing
Writing a unit test
Mock objects and patching
Assertions
Testing a CSV generator
Boundaries and granularity
Testing the export function
Final considerations
Test-driven development
Exceptions
Profiling Python
When to profile?
Summary
9. Cryptography and Tokens
The need for cryptography
Useful guidelines
Hashlib
Secrets
Random numbers
Token generation
Digest comparison
HMAC
JSON Web Tokens
Registered claims
Time-related claims
Auth-related claims
Using asymmetric (public-key) algorithms
Useful references
Summary
10. Concurrent Execution
Concurrency versus parallelism
Threads and processes – an overview
Quick anatomy of a thread
Killing threads
Context-switching
The Global Interpreter Lock
Race conditions and deadlocks
Race conditions
Scenario A – race condition not happening
Scenario B – race condition happening
Locks to the rescue
Scenario C – using a lock
Deadlocks
Quick anatomy of a process
Properties of a process
Multithreading or multiprocessing?
Concurrent execution in Python
Starting a thread
Starting a process
Stopping threads and processes
Stopping a process
Spawning multiple threads
Dealing with race conditions
A thread's local data
Thread and process communication
Thread communication
Sending events
Inter-process communication with queues
Thread and process pools
Using a process to add a timeout to a function
Case examples
Example one – concurrent mergesort
Single-thread mergesort
Single-thread multipart mergesort
Multithreaded mergesort
Multiprocess mergesort
Example two – batch sudoku-solver
What is Sudoku?
Implementing a sudoku-solver in Python
Solving sudoku with multiprocessing
Example three – downloading random pictures
Downloading random pictures with asyncio
Summary
11. Debugging and Troubleshooting
Debugging techniques
Debugging with print
Debugging with a custom function
Inspecting the traceback
Using the Python debugger
Inspecting log files
Other techniques
Profiling
Assertions
Where to find information
Troubleshooting guidelines
Using console editors
Where to inspect
Using tests to debug
Monitoring
Summary
12. GUIs and Scripts
First approach – scripting
The imports
Parsing arguments
The business logic
Second approach – a GUI application
The imports
The layout logic
The business logic
Fetching the web page
Saving the images
Alerting the user
How can we improve the application?
Where do we go from here?
The turtle module
wxPython, PyQt, and PyGTK
The principle of least astonishment
Threading considerations
Summary
13. Data Science
IPython and Jupyter Notebook
Installing the required libraries
Using Anaconda
Starting a Notebook
Dealing with data
Setting up the Notebook
Preparing the data
Cleaning the data
Creating the DataFrame
Unpacking the campaign name
Unpacking the user data
Cleaning everything up
Saving the DataFrame to a file
Visualizing the results
Where do we go from here?
Summary
14. Web Development
What is the web?
How does the web work?
The Django web framework
Django design philosophy
The model layer
The view layer
The template layer
The Django URL dispatcher
Regular expressions
A regex website
Setting up Django
Starting the project
Creating users
Adding the Entry model
Customizing the admin panel
Creating the form
Writing the views
The home view
The entry list view
The form view
Tying up URLs and views
Writing the templates
The future of web development
Writing a Flask view
Building a JSON quote server in Falcon
Summary
A farewell
Other Books You May Enjoy
Leave a review - let other readers know what you think
Preface
When I started writing the first edition of this book, I knew very little about what was expected. Gradually, I learned how to convert each topic into a story. I wanted to talk about Python by offering useful, simple, easy-to-grasp examples, but, at the same time, I wanted to pour my own experience into the pages, anything I've learned over the years that I thought would be valuable for the reader: something to think about, reflect upon, and hopefully assimilate. Readers may disagree and come up with a different way of doing things, but hopefully a better way.
I wanted this book to not just be about the language but about programming. The art of programming, in fact, comprises many aspects, and language is just one of them.
Another crucial aspect of programming is independence: the ability to unblock yourself when you hit a wall and don't know what to do to solve the problem you're facing. There is no book that can teach it, so I thought, instead of trying to teach that aspect, I will try and train the reader in it. Therefore, I left comments, questions, and remarks scattered throughout the whole book, hoping to inspire the reader. I hoped that they would take the time to browse the Web or the official documentation, to dig deeper, learn more, and discover the pleasure of finding things out by themselves.
Finally, I wanted to write a book that, even in its presentation, would be slightly different. So, I decided, with my editor, to write the first part in a theoretical way, presenting topics that would describe the characteristics of Python, and to have a second part made up of various real-life projects, to show the reader how much can be achieved with this language.
With all these goals in mind, I then had to face the hardest challenge: take all the content I wanted to write and make it fit in the number of pages that were allowed. It has been tough, and sacrifices were made.
My efforts have been rewarded though: to this day, after almost 3 years, I still receive lovely messages from readers, every now and then, who thank me and tell me things like "your book has empowered me". To me, it is the most beautiful compliment. I know that the language might change and pass, but I have managed to share some of my knowledge with the reader, and that piece of knowledge will stick with them.
And now, I have written the second edition of this book, and this time, I had a little more space. So I decided to add a chapter about IO, which was desperately needed, and I even had the opportunity to add two more chapters, one about secrets and one about concurrent execution. The latter is definitely the most challenging chapter in the whole book, and its purpose is that of stimulating the reader to reach a level where they will be able to easily digest the code in it and understand its concepts.
I have kept all the original chapters, except for the last one that was slightly redundant. They have all been refreshed and updated to the latest version of Python, which is 3.7 at the time of writing.
When I look at this book, I see a much more mature product. There are more chapters, and the content has been reorganized to better fit the narrative, but the soul of the book is still there. The main and most important point, empowering the reader, is still very much intact.
I hope that this edition will be even more successful than the previous one, and that it will help the readers become great programmers. I hope to help them develop critical thinking, great skills, and the ability to adapt over time, thanks to the solid foundation they have acquired from the book.
Who this book is for
Python is the most popular introductory teaching language in the top computer science universities in the US, so if you are new to software development, or if you have little experience and would like to start off on the right foot, then this language and this book are what you need. Its amazing design and portability will help you to become productive regardless of the environment you choose to work with.
If you have already worked with Python or any other language, this book can still be useful to you, both as a reference to Python's fundamentals, and for providing a wide range of considerations and suggestions collected over two decades of experience.
What this book covers
Chapter 1, A Gentle Introduction to Python, introduces you to fundamental programming concepts. It guides you through getting Python up and running on your computer and introduces you to some of its constructs.
Chapter 2, Built-in Data Types, introduces you to Python built-in data types. Python has a very rich set of native data types, and this chapter will give you a description and a short example for each of them.
Chapter 3, Iterating and Making Decisions, teaches you how to control the flow of your code by inspecting conditions, applying logic, and performing loops.
Chapter 4, Functions, the Building Blocks of Code, teaches you how to write functions. Functions are the keys to reusing code, to reducing debugging time, and, in general, to writing better code.
Chapter 5, Saving Time and Memory, introduces you to the functional aspects of Python programming. This chapter teaches you how to write comprehensions and generators, which are powerful tools that you can use to speed up your code and save memory.
Chapter 6, OOP, Decorators, and Iterators, teaches you the basics of object-oriented programming with Python. It shows you the key concepts and all the potentials of this paradigm. It also shows you one of the most beloved characteristics of Python: decorators. Finally, it also covers the concept of iterators.
Chapter 7, Files and Data Persistence, teaches you how to deal with files, streams, data interchange formats, and databases, among other things.
Chapter 8, Testing, Profiling, and Dealing with Exceptions, teaches you how to make your code more robust, fast, and stable using techniques such as testing and profiling. It also formally defines the concept of exceptions.
Chapter 9, Cryptography and Tokens, touches upon the concepts of security, hashes, encryption, and tokens, which are part of day-to-day programming at present.
Chapter 10, Concurrent Execution, is a challenging chapter that describes how to do many things at the same time. It provides an introduction to the theoretical aspects of this subject and then presents three nice exercises that are developed with different techniques, thereby enabling the reader to understand the differences between the paradigms presented.
Chapter 11, Debugging and Troubleshooting, shows you the main methods for debugging your code and some examples on how to apply them.
Chapter 12, GUIs and Scripts, guides you through an example from two different points of view. They are at opposite ends of the spectrum: one implementation is a script, and another one is a proper graphical user interface application.
Chapter 13, Data Science, introduces a few key concepts and a very special tool, the Jupyter Notebook.
Chapter 14, Web Development, introduces the fundamentals of web development and delivers a project using the Django web framework. The example will be based on regular expressions.
To get the most out of this book
You are encouraged to follow the examples in this book. In order to do so, you will need a computer, an internet connection, and a browser. The code in this book is written for Python 3.7, but it should also work, for the most part, with any recent Python 3.* version. I have given guidelines on how to install Python on your operating system. The procedures to do that change all the time, so you will need to refer to the most up-to-date guide on the Web to find precise setup instructions. I have also explained how to install all the extra libraries used in the various examples and provided suggestions in case the reader runs into any issues while installing them. No particular editor is required to type the code; however, I suggest that those who are interested in following the examples should consider adopting a proper coding environment. I have given suggestions on this matter in the first chapter.
Download the example code files
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learn-Python-Programming-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Within the learn.pp folder, we will create a virtual environment called learnpp."
A block of code is set as follows:
# we define a function, called local
def local():
    m = 7
    print(m)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
# key.points.mutable.assignment.py
x = [1, 2, 3]
def func(x):
    x[1] = 42  # this changes the caller!
    x = 'something else'  # this points x to a new string object
Any command-line input or output is written as follows:
>>> import sys
>>> print(sys.version)
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "To open the console in Windows, go to the Start menu, choose Run, and type cmd."
Warnings or important notes appear like this.
Tips and tricks appear like this.
A Gentle Introduction to Python
"Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime."
– Chinese proverb
According to Wikipedia, computer programming is:
"...a process that leads from an original formulation of a computing problem to executable computer programs. Programming involves activities such as analysis, developing understanding, generating algorithms, verification of requirements of algorithms including their correctness and resources consumption, and implementation (commonly referred to as coding) of algorithms in a target programming language."
In a nutshell, coding is telling a computer to do something using a language it understands.
Computers are very powerful tools, but unfortunately, they can't think for themselves. They need to be told everything: how to perform a task, how to evaluate a condition to decide which path to follow, how to handle data that comes from a device, such as the network or a disk, and how to react when something unforeseen happens, say, something is broken or missing.
You can code in many different styles and languages. Is it hard? I would say yes and no. It's a bit like writing. Everybody can learn how to write, and you can too. But what if you wanted to become a poet? Then writing alone is not enough. You have to acquire a whole other set of skills, and this will take a longer and greater effort.
In the end, it all comes down to how far you want to go down the road. Coding is not just putting together some instructions that work. It is so much more!
Good code is short, fast, elegant, easy to read and understand, simple, easy to modify and extend, easy to scale and refactor, and easy to test. It takes time to be able to write code that has all these qualities at the same time, but the good news is that you're taking the first step towards it at this very moment by reading this book. And I have no doubt you can do it. Anyone can; in fact, we all program all the time, only we aren't aware of it.
Would you like an example?
Say you want to make instant coffee. You have to get a mug, the instant coffee jar, a teaspoon, water, and the kettle. Even if you're not aware of it, you're evaluating a lot of data. You're making sure that there is water in the kettle and that the kettle is plugged in, that the mug is clean, and that there is enough coffee in the jar. Then, you boil the water and maybe, in the meantime, you put some coffee in the mug. When the water is ready, you pour it into the mug, and stir.
So, how is this programming?
Well, we gathered resources (the kettle, coffee, water, teaspoon, and mug) and we verified some conditions concerning them (the kettle is plugged in, the mug is clean, and there is enough coffee). Then we started two actions (boiling the water and putting coffee in the mug), and when both of them were completed, we finally ended the procedure by pouring water into the mug and stirring.
Can you see it? I have just described the high-level functionality of a coffee program. It wasn't that hard because this is what the brain does all day long: evaluate conditions, decide to take actions, carry out tasks, repeat some of them, and stop at some point. Clean objects, put them back, and so on.
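To make the analogy a little more concrete, the coffee procedure could be sketched in Python roughly as follows. This is only an illustration: the function name make_instant_coffee, its parameters, and the error messages are hypothetical names invented here, and a real program would of course model the steps differently.

```python
def make_instant_coffee(water_in_kettle, kettle_plugged_in, mug_clean, coffee_left):
    """Return the list of steps performed, or raise if a condition fails.

    Every name in this function is made up for illustration only.
    """
    # Verify the conditions on the resources we gathered.
    if not water_in_kettle:
        raise ValueError("fill the kettle first")
    if not kettle_plugged_in:
        raise ValueError("plug the kettle in")
    if not mug_clean:
        raise ValueError("wash the mug")
    if coffee_left < 1:
        raise ValueError("buy more coffee")

    steps = []
    # Two actions that, in real life, can overlap in time.
    steps.append("boil water")
    steps.append("put coffee in mug")
    # Only when both are done do we finish the procedure.
    steps.append("pour water into mug")
    steps.append("stir")
    return steps


print(make_instant_coffee(True, True, True, coffee_left=3))
```

The sketch mirrors the three phases described above: gather resources (the arguments), verify conditions (the if statements), and carry out the actions in order.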
All you need now is to learn how to deconstruct all those actions you do automatically in real life so that a computer can actually make some sense of them. And you need to learn a language as well, to instruct it.
So this is what this book is for. I'll tell you how to do it and I'll try to do that by means of many simple but focused examples (my favorite kind).
In this chapter, we are going to cover the following:
Python's characteristics and ecosystem
Guidelines on how to get up and running with Python and virtual environments
How to run Python programs
How to organize Python code and Python's execution model
A proper introduction
I love to make references to the real world when I teach coding; I believe they help people retain the concepts better. However, now is the time to be a bit more rigorous and see what coding is from a more technical perspective.
When we write code, we're instructing a computer about the things it has to do. Where does the action happen? In many places: the computer memory, hard drives, network cables, the CPU, and so on. It's a whole world, which most of the time is the representation of a subset of the real world.
If you write a piece of software that allows people to buy clothes online, you will have to represent real people, real clothes, real brands, sizes, and so on and so forth, within the boundaries of a program.
In order to do so, you will need to create and handle objects in the program you're writing. A person can be an object. A car is an object. A pair of socks is an object. Luckily, Python understands objects very well.
The two main features any object has are properties and methods. Let's take a person object as an example. Typically in a computer program, you'll represent people as customers or employees. The properties that you store against them are things like the name, the SSN, the age, whether they have a driving license, their email, gender, and so on. In a computer program, you store all the data you need in order to use an object for the purpose you're serving. If you are coding a website to sell clothes, you probably want to store the heights and weights as well as other measures of your customers so that you can suggest the appropriate clothes for them. So, properties are characteristics of an object. We use them all the time: "Could you pass me that pen?" "Which one?" "The black one." Here, we used the black property of a pen to identify it (most likely among a blue and a red one).
Methods are things that an object can do. As a person, I have methods such as speak, walk, sleep, wake up, eat, dream, write, read, and so on. All the things that I can do could be seen as methods of the object that represents me.
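As a rough preview of how properties and methods might look in Python, here is a minimal sketch. The class name Person, its attributes, and the sizing rule are all assumptions made up for this illustration; classes are covered properly in Chapter 6.

```python
class Person:
    """A hypothetical customer record for an online clothes shop."""

    def __init__(self, name, age, height_cm):
        # Properties: data that describes the object.
        self.name = name
        self.age = age
        self.height_cm = height_cm

    def speak(self, words):
        # Method: something the object can do.
        return f"{self.name} says: {words}"

    def suggested_size(self):
        # An invented rule, only to show a method that uses a property.
        return "L" if self.height_cm >= 180 else "M"


customer = Person("Ann", 30, 185)   # sample data, not from the book
print(customer.name)                # inspecting a property
print(customer.speak("hello"))      # running a method
print(customer.suggested_size())
```

Properties are read with the dot syntax (customer.name), while methods are called with parentheses (customer.speak("hello")).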
So, now that you know what objects are and that they expose methods that you can run and properties that you can inspect, you're ready to start coding. Coding in fact is simply about managing those objects that live in the subset of the world that we're reproducing in our software. You can create, use, reuse, and delete objects as you please.
According to the Data Model chapter on the official Python documentation (https://docs.python.org/3/reference/datamodel.html):
"Objects are Python's abstraction for data. All data in a Python program is represented by objects or by relations between objects."
We'll take a closer look at Python objects in Chapter 6, OOP, Decorators, and Iterators. For now, all we need to know is that every object in Python has an ID (or identity), a type, and a value.
Once created, the ID of an object is never changed. It's a unique identifier for it, and it's used behind the scenes by Python to retrieve the object when we want to use it.
The type, as well, never changes. The type tells what operations are supported by the object and the possible values that can be assigned to it.
We'll see Python's most important data types in Chapter 2, Built-in Data Types.
The value can either change or not. If it can, the object is said to be mutable, while when it cannot, the object is said to be immutable.
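The built-in functions id() and type() let you look at two of these three aspects directly. The short sketch below uses a list as an example of a mutable object and a string as an example of an immutable one; the variable names are arbitrary choices for this illustration.

```python
numbers = [1, 2, 3]                # a list is mutable
identity_before = id(numbers)
numbers.append(4)                  # the value changes...
same_object = id(numbers) == identity_before   # ...but the ID does not

print(type(numbers))               # the type stays the same as well
print(same_object)

greeting = "hello"                 # a string is immutable
try:
    greeting[0] = "J"              # attempt to change the value in place
except TypeError as err:
    immutable_error = str(err)
    print(immutable_error)
```

Appending to the list changes its value while leaving its identity alone, whereas the attempt to modify the string in place is rejected with a TypeError.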
How do we use an object? We give it a name, of course! When you give an object a name, then you can use the name to retrieve the object and use it.
In a more generic sense, objects such as numbers, strings (text), collections, and so on are associated with a name. Usually, we say that this name is the name of a variable. You can see the variable as being like a box, which you can use to hold data.
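A few assignments show what giving an object a name looks like in practice. The names and values below are arbitrary examples. One detail the box picture does not capture is that two names can refer to the very same object, which the last lines of this sketch illustrate.

```python
age = 42                  # the name `age` now refers to an int object
title = "Learn Python"    # `title` refers to a str object
sizes = ["S", "M", "L"]   # `sizes` refers to a list object

also_sizes = sizes        # a second name for the same list, not a copy
also_sizes.append("XL")

print(sizes)                  # the change is visible through both names
print(also_sizes is sizes)    # `is` checks whether two names share one object
```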
So, you have all the objects you need; what now? Well, we need to use them, right? We may want to send them over a network connection or store them in a database. Maybe display them on a web page or write them into a file. In order to do so, we need to react to a user filling in a form, or pressing a button, or opening a web page and performing a search. We react by running our code, evaluating conditions to choose which parts to execute, how many times, and under which circumstances.
And to do all this, basically we need a language. That's what Python is for. Python is the language we'll use together throughout this book to instruct the computer to do something for us.
Now, enough of this theoretical stuff; let's get started.
Enter the Python
Python is the marvelous creation of Guido van Rossum, a Dutch computer scientist and mathematician who decided to gift the world with a project he was playing around with over Christmas 1989. The language appeared to the public somewhere around 1991, and since then has evolved to be one of the leading programming languages used worldwide today.
I started programming when I was 7 years old, on a Commodore VIC-20, which was later replaced by its bigger brother, the Commodore 64. Its language was BASIC. Later on, I landed on Pascal, Assembly, C, C++, Java, JavaScript, Visual Basic, PHP, ASP, ASP.NET, C#, and other minor languages I cannot even remember, but only when I landed on Python did I finally have that feeling that you have when you find the right couch in the shop. When all of your body parts are yelling, "Buy this one! This one is perfect for us!"
It took me about a day to get used to it. Its syntax is a bit different from what I was used to, but after getting past that initial feeling of discomfort (like having new shoes), I just fell in love with it. Deeply. Let's see why.
About Python
Before we get into the gory details, let's get a sense of why someone would want to use Python (I recommend reading the Python page on Wikipedia for a more detailed introduction).
To my mind, Python epitomizes the following qualities.
Portability
Python runs everywhere, and porting a program from Linux to Windows or Mac is usually just a matter of fixing paths and settings. Python is designed for portability and it takes care of specific operating system (OS) quirks behind interfaces that shield you from the pain of having to write code tailored to a specific platform.
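One small, concrete instance of such an interface is the standard library's pathlib module, which builds file paths using whatever separator the current operating system expects. The sketch below is only an illustration; the folder and file names are hypothetical and nothing is read from or written to disk.

```python
import os
from pathlib import Path

# Build a path from its parts; the separator is chosen for the current OS.
config_path = Path("settings") / "app" / "config.ini"   # hypothetical file

print(config_path)         # uses "/" on Linux and macOS, "\" on Windows
print(config_path.suffix)  # the file extension
print(os.sep)              # the separator Python uses on this platform
```

The same three lines of path-building code run unchanged on every platform, which is the kind of shielding the paragraph above refers to.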
Coherence
Python is extremely logical and coherent. You can see it was designed by a brilliant computer scientist. Most of the time, you can just guess what a method is called, if you don't know it.
You may not realize how important this is right now, especially if you are at the beginning, but this is a major feature. It means less clutter in your head, as well as less skimming through the documentation, and less need for mappings in your brain when you code.
Developer productivity
According to Mark Lutz (Learning Python, 5th Edition, O'Reilly Media), a Python program is typically one-fifth to one-third the size of equivalent Java or C++ code. This means the job gets done faster. And faster is good. Faster means a faster response to the market. Less code not only means less code to write, but also less code to read (and professional coders read much more than they write), less code to maintain, to debug, and to refactor.
Another important aspect is that Python runs without the need for lengthy and time-consuming compilation and linkage steps, so you don't have to wait to see the results of your work.
An extensive library
Python has an incredibly wide standard library (it's said to come with batteries included). If that wasn't enough, the Python community all over the world maintains a body of third-party libraries, tailored to specific needs, which you can access freely at the Python Package Index (PyPI). When you code Python and you realize that you need a certain feature, in most cases, there is at least one library where that feature has already been implemented for you.
Software quality
Python is heavily focused on readability, coherence, and quality. The language uniformity allows for high readability and this is crucial nowadays where coding is more of a collective effort than a solo endeavor. Another important aspect of Python is its intrinsic multiparadigm nature. You can use it as a scripting language, but you also can exploit object-oriented, imperative, and functional programming styles. It is versatile.
Software integration
Another important aspect is that Python can be extended and integrated with many other languages, which means that even when a company is using a different language as their mainstream tool, Python can come in and act as a glue agent between complex applications that need to talk to each other in some way. This is kind of an advanced topic, but in the real world, this feature is very important.
Satisfaction and enjoyment
Last, but not least, there is the fun of it! Working with Python is fun. I can code for 8 hours and leave the office happy and satisfied, alien to the struggle other coders have to endure because they use languages that don't provide them with the same amount of well-designed data structures and constructs. Python makes coding fun, no doubt about it. And fun promotes motivation and productivity.
These are the major aspects of why I would recommend Python to everyone. Of course, there are many other technical and advanced features that I could have talked about, but they don't really pertain to an introductory section like this one. They will come up naturally, chapter after chapter, in this book.
What are the drawbacks?
Probably, the only drawback that one could find in Python, which is not due to personal preferences, is its execution speed. Typically, Python is slower than its compiled brothers. The standard implementation of Python produces, when you run an application, a compiled version of the source code called bytecode (with the extension .pyc), which is then run by the Python interpreter. The advantage of this approach is portability, which we pay for with a slowdown due to the fact that Python is not compiled down to machine level as are other languages.
However, Python speed is rarely a problem today, hence its wide use regardless of this suboptimal feature. What happens is that, in real life, hardware cost is no longer a problem, and usually it's easy enough to gain speed by parallelizing tasks. Moreover, many programs spend a great proportion of the time waiting for IO operations to complete; therefore, the raw execution speed is often a secondary factor to the overall performance. When it comes to number crunching though, one can switch to faster Python implementations, such as PyPy, which provides an average five-fold speedup by implementing advanced compilation techniques (check http://pypy.org/ for reference).
When doing data science, you'll most likely find that the libraries that you use with Python, such as Pandas and NumPy, achieve native speed due to the way they are implemented.
If that wasn't a good-enough argument, you can always consider that Python has been used to drive the backend of services such as Spotify and Instagram, where performance is a concern. Nonetheless, Python has done its job perfectly adequately.
Who is using Python today?
Not yet convinced? Let's take a very brief look at the companies that are using Python today: Google, YouTube, Dropbox, Yahoo!, Zope Corporation, Industrial Light & Magic, Walt Disney Feature Animation, Blender 3D, Pixar, NASA, the NSA, Red Hat, Nokia, IBM, Netflix, Yelp, Intel, Cisco, HP, Qualcomm, and JPMorgan Chase, to name just a few.
Even games such as Battlefield 2, Civilization IV, and QuArK are implemented using Python.
Python is used in many different contexts, such as system programming, web programming, GUI applications, gaming and robotics, rapid prototyping, system integration, data science, database applications, and much more. Several prestigious universities have also adopted Python as their main language in computer science courses.
Setting up the environment
Before we talk about installing Python on your system, let me tell you about which Python version I'll be using in this book.
Python 2 versus Python 3
Python comes in two main versions: Python 2, which is the past, and Python 3, which is the present. The two versions, though very similar, are incompatible in some respects.
In the real world, Python 2 is actually quite far from being the past. In short, even though Python 3 has been out since 2008, the transition phase from Version 2 is still far from being over. This is mostly due to the fact that Python 2 is widely used in the industry, and of course, companies aren't so keen on updating their systems just for the sake of updating them, following the "if it ain't broke, don't fix it" philosophy. You can read all about the transition between the two versions on the web.
Another issue that has hindered the transition is the availability of third-party libraries. Usually, a Python project relies on tens of external libraries, and of course, when you start a new project, you need to be sure that there is already a Version-3-compatible library for any business requirement that may come up. If that's not the case, starting a brand-new project in Python 3 means introducing a potential risk, which many companies are not happy to take.
At the time of writing, though, the majority of the most widely used libraries have been ported to Python 3, and it's quite safe to start a project in Python 3 for most cases. Many of the libraries have been rewritten so that they are compatible with both versions, mostly harnessing the power of the six library (the name comes from the multiplication 2 x 3, due to the porting from Version 2 to 3), which helps introspecting and adapting the behavior according to the version used. According to PEP 373 (https://legacy.python.org/dev/peps/pep-0373/), the end of life (EOL) of Python 2.7 has been set to 2020, and there won't be a Python 2.8, so this is the time when companies that have projects running in Python 2 need to start devising an upgrade strategy to move to Python 3 before it's too late.
On my box (MacBook Pro), this is the latest Python version I have:
>>> import sys
>>> print(sys.version)
3.7.0a3 (default, Jan 27 2018, 00:46:45)
[Clang 9.0.0 (clang-900.0.39.2)]
So you can see that the version is an alpha release of Python 3.7, which will be released in June 2018. The preceding text is a little bit of Python code that I typed into my console. We'll talk about it in a moment.
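If you want your own code to check the interpreter version rather than just display it, the sys module also exposes the version as structured data. The following is only a small sketch of that idea, not something the rest of the book relies on; the variable names are illustrative.

```python
import sys

# sys.version_info is a named tuple:
# (major, minor, micro, releaselevel, serial)
major, minor = sys.version_info.major, sys.version_info.minor
print(f"Running Python {major}.{minor}")

# Tuples compare element by element, so a minimum-version check
# fits in a single expression.
is_python3 = sys.version_info >= (3, 0)
print("Python 3 or later:", is_python3)
```

Comparing against a tuple is usually safer than parsing the sys.version string, whose exact layout varies between builds.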
All the examples in this book will be run using Python 3.7. Even though at the moment the final version might still be slightly different than what I have, I will make sure that all the code and examples are up to date with 3.7 by the time the book is published.
Some of the code can also run in Python 2.7, either as it is or with minor tweaks, but at this point in time, I think it's better to learn Python 3, and then, if you need to, learn the differences it has with Python 2, rather than going the other way around.
Don't worry about this version thing though; it's not that big an issue in practice.
Installing Python
I never really got the point of having a setup section in a book, regardless of what it is that you have to set up. Most of the time, between the time the author writes the instructions and the time you actually try them out, months have passed. That is, if you're lucky. One version change and things may not work in the way that is described in the book. Luckily, we have the web now, so in order to help you get up and running, I'll just give you pointers and objectives.
I am conscious that the majority of readers would probably have preferred to have guidelines in the book. I doubt it would have made their life much easier, as I strongly believe that if you want to get started with Python you have to put in that initial effort in order to get familiar with the ecosystem. It is very important, and it will boost your confidence to face the material in the chapters ahead. If you get stuck, remember that Google is your friend.
Setting up the Python interpreter
First of all, let's talk about your OS. Python is fully integrated and most likely already installed in almost every Linux distribution. If you have a Mac, it's likely that Python is already there as well (however, possibly only Python 2.7), whereas if you're using Windows, you probably need to install it.
Getting Python and the libraries you need up and running requires a bit of handiwork. Linux and macOS seem to be the most user-friendly OSes for Python programmers; Windows, on the other hand, is the one that requires the biggest effort.
My current system is a MacBook Pro, and this is what I will use throughout the book, along with Python 3.7.
The place you want to start is the official Python website: https://www.python.org. This website hosts the official Python documentation and many other resources that you will find very useful. Take the time to explore it.
Another excellent, resourceful website on Python and its ecosystem is http://docs.python-guide.org. You can find instructions to set up Python on different operating systems, using different methods.
Find the download section and choose the installer for your OS. If you are on Windows, make sure that when you run the installer, you check the option install pip (actually, I would suggest making a complete installation, just to be safe, of all the components the installer holds). We'll talk about pip later.
Now that Python is installed in your system, the objective is to be able to open a console and run the Python interactive shell by typing python.
Please note that I usually refer to the Python interactive shell simply as the Python console.
To open the console in Windows, go to the Start menu, choose Run, and type cmd. If you encounter anything that looks like a permission problem while working on the examples in this book, please make sure you are running the console with administrator rights.
On macOS, you can start a Terminal by going to Applications | Utilities | Terminal.
If you are on Linux, you know all that there is to know about the console.
I will use the term console interchangeably to indicate the Linux console, the Windows Command Prompt, and the Macintosh Terminal. I will also indicate the command-line prompt with the Linux default format, like this:
$ sudo apt-get update
If you're not familiar with that, please take some time to learn the basics on how a console works. In a nutshell, after the $ sign, you normally find an instruction that you have to type. Pay attention to capitalization and spaces, as they are very important.
Whatever console you open, type python at the prompt, and make sure the Python interactive shell shows up. Type exit() to quit. Keep in mind that you may have to specify python3 if your OS comes with Python 2.* preinstalled.
This is roughly what you should see when you run Python (it will change in some details according to the version and OS):
$ python3.7
Python 3.7.0a3 (default, Jan 27 2018, 00:46:45)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Now that Python is set up and you can run it, it's time to make sure you have the other tool that will be indispensable to follow the examples in the book: virtualenv.
About virtualenv
As you probably have guessed by its name, virtualenv is all about virtual environments. Let me explain what they are and why we need them, and let me do it by means of a simple example.
You install Python on your system and you start working on a website for Client X. You create a project folder and start coding. Along the way, you also install some libraries; for example, the Django framework, which we'll see in depth in Chapter 14, Web Development. Let's say the Django version you install for Project X is 1.7.1.
Now, your website is so good that you get another client, Y. She wants you to build another website, so you start Project Y and, along the way, you need to install Django again. The only issue is that now the Django version is 1.8 and you cannot install it on your system because this would replace the version you installed for Project X. You don't want to risk introducing incompatibility issues, so you have two choices: either you stick with the version you have currently on your machine, or you upgrade it and make sure the first project is still fully working correctly with the new version.
Let's be honest, neither of these options is very appealing, right? Definitely not. So, here's the solution: virtualenv!
virtualenv is a tool that allows you to create a virtual environment. In other words, it is a tool to create isolated Python environments, each of which is a folder that contains all the necessary executables to use the packages that a Python project would need (think of packages as libraries for the time being).
So you create a virtual environment for Project X, install all the dependencies, and then you create a virtual environment for Project Y, installing all its dependencies without the slightest worry because every library you install ends up within the boundaries of the appropriate virtual environment. In our example, Project X will hold Django 1.7.1, while Project Y will hold Django 1.8.
It is of vital importance that you never install libraries directly at the system level. Linux, for example, relies on Python for many different tasks and operations, and if you fiddle with the system installation of Python, you risk compromising the integrity of the whole system (guess to whom this happened...). So take this as a rule, such as brushing your teeth before going to bed: always, always create a virtual environment when you start a new project.
To install virtualenv on your system, there are a few different ways. On a Debian-based distribution of Linux, for example, you can install it with the following command:
$ sudo apt-get install python-virtualenv
Probably, the easiest way is to follow the instructions you can find on the virtualenv official website: https://virtualenv.pypa.io.
You will find that one of the most common ways to install virtualenv is by using pip, a package management system used to install and manage software packages written in Python.
As of Python 3.5, the suggested way to create a virtual environment is to use the venv module. Please see the official documentation for further information. However, at the time of writing, virtualenv is still by far the tool most used for creating virtual environments.
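For completeness, here is a minimal sketch of what creating an environment with the standard library's venv module can look like from Python itself. The folder names are assumptions made for illustration (a temporary directory stands in for a real project root), and the book's examples continue to use virtualenv.

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical location: a throwaway temporary folder plays the role
# of a project root.
project_root = Path(tempfile.mkdtemp())
env_dir = project_root / ".venv"

# with_pip=False keeps the sketch quick and offline; pass with_pip=True
# if you want pip bootstrapped into the new environment.
venv.create(env_dir, with_pip=False)

# Environments created by venv contain a pyvenv.cfg marker file.
print((env_dir / "pyvenv.cfg").exists())
```

From the console, the equivalent is to run the module directly, for example python3 -m venv .venv inside the project folder.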
Your first virtual environment
It is very easy to create a virtual environment, but according to how your system is configured and which Python version you want the virtual environment to run, you need to run the command properly. Another thing you will need to do with a virtualenv, when you want to work with it, is to activate it. Activating a virtualenv basically produces some path juggling behind the scenes so that when you call the Python interpreter, you're actually calling the active virtual environment one, instead of the mere system one.
I'll show you a full example on my Macintosh console. We will:
1. Create a folder named learn.pp under your project root (which in my case is a folder called srv, in my home folder). Please adapt the paths according to the setup you fancy on your box.
2. Within the learn.pp folder, we will create a virtual environment called learnpp.
Some developers prefer to call all virtual environments using the same name (for example, .venv). This way they can run scripts against any virtualenv by just knowing the name of the project they dwell in. The dot in .venv is there because in Linux/macOS prepending a name with a dot makes that file or folder invisible.
3. After creating the virtual environment, we will activate it. The methods are slightly different between Linux, macOS, and Windows.
4. Then, we'll make sure that we are running the desired Python version (3.7.*) by running the Python interactive shell.
5. Finally, we will deactivate the virtual environment using the deactivate command.
These five simple steps will show you all you have to do to start and use a project.
Here's an example of how those steps might look (note that you might get a slightly different result, according to your OS, Python version, and so on) on macOS (commands that start with a # are comments, spaces have been introduced for readability, and ⇢ indicates where the line has wrapped around due to lack of space):
fabmp:srv fab$ # step 1 - create folder
fabmp:srv fab$ mkdir learn.pp
fabmp:srv fab$ cd learn.pp
fabmp:learn.pp fab$ # step 2 - create virtual environment
fabmp:learn.pp fab$ which python3.7
/Users/fab/.pyenv/shims/python3.7
fabmp:learn.pp fab$ virtualenv -p
⇢ /Users/fab/.pyenv/shims/python3.7 learnpp
Running virtualenv with interpreter /Users/fab/.pyenv/shims/python3.7
Using base prefix '/Users/fab/.pyenv/versions/3.7.0a3'
New python executable in /Users/fab/srv/learn.pp/learnpp/bin/python3.7
Also creating executable in /Users/fab/srv/learn.pp/learnpp/bin/python
Installing setuptools, pip, wheel...done.
fabmp:learn.pp fab$ # step 3 - activate virtual environment
fabmp:learn.pp fab$ source learnpp/bin/activate
(learnpp) fabmp:learn.pp fab$ # step 4 - verify which python
(learnpp) fabmp:learn.pp fab$ which python
/Users/fab/srv/learn.pp/learnpp/bin/python
(learnpp) fabmp:learn.pp fab$ python
Python 3.7.0a3 (default, Jan 27 2018, 00:46:45)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
(learnpp) fabmp:learn.pp fab$ # step 5 - deactivate
(learnpp) fabmp:learn.pp fab$ deactivate
fabmp:learn.pp fab$
Notice that I had to tell virtualenv explicitly to use the Python 3.7 interpreter because on my box Python 2.7 is the default one. Had I not done that, I would have had a virtual environment with Python 2.7 instead of Python 3.7.
You can combine the two instructions for step 2 in one single command like this:
$ virtualenv -p $(which python3.7) learnpp
I chose to be explicitly verbose in this instance, to help you understand each bit of the procedure.
Another thing to notice is that in order to activate a virtual environment, we need to run the /bin/activate script, which needs to be sourced. When a script is sourced, it means that it is executed in the current shell, and therefore its effects last after the execution. This is very important. Also notice how the prompt changes after we activate the virtual environment, showing its name on the left (and how it disappears when we deactivate it). On Linux, the steps are the same so I won't repeat them here. On Windows, things change slightly, but the concepts are the same. Please refer to the official virtualenv website for guidance.
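Whichever OS you are on, you can also ask the interpreter itself whether it is running from a virtual environment. The helper below is a hypothetical sketch (the function name is mine, not part of any library); it relies on sys.prefix and sys.base_prefix, which differ inside an environment, and on the sys.real_prefix attribute that older virtualenv releases set.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the original installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("prefix:", sys.prefix)
print("inside a virtual environment:", in_virtual_environment())
```

Run it once before activating and once after; the second line of output should flip from False to True.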
At this point, you should be able to create and activate a virtual environment. Please try and create another one without me guiding you. Get acquainted with this procedure because it's something that you will always be doing: we never work system-wide with Python, remember? It's extremely important.
So, with the scaffolding out of the way, we're ready to talk a bit more about Python and how you can use it. Before we do that though, allow me to speak a few words about the console.
Your friend, the console
In this era of GUIs and touchscreen devices, it seems a little ridiculous to have to resort to a tool such as the console, when everything is just about one click away.
But the truth is every time you remove your right hand from the keyboard (or the left one, if you're a lefty) to grab your mouse and move the cursor over to the spot you want to click on, you're losing time. Getting things done with the console, counter-intuitive as it may be, results in higher productivity and speed. I know, you have to trust me on this.
Speed and productivity are important and, personally, I have nothing against the mouse, but there is another very good reason for which you may want to get well-acquainted with the console: when you develop code that ends up on some server, the console might be the only available tool. If you make friends with it, I promise you, you will never get lost when it's of utmost importance that you don't (typically, when the website is down and you have to investigate very quickly what's going on).
So it's really up to you. If you're undecided, please grant me the benefit of the doubt and give it a try. It's easier than you think, and you'll never regret it. There is nothing more pitiful than a good developer who gets lost within an SSH connection to a server because they are used to their own custom set of tools, and only to that.
Now, let's get back to Python.
How you can run a Python program
There are a few different ways in which you can run a Python program.
Running Python scripts
Python can be used as a scripting language. In fact, it always proves itself very useful. Scripts are files (usually of small dimensions) that you normally execute to do something like a task. Many developers end up having their own arsenal of tools that they fire when they need to perform a task. For example, you can have scripts to parse data in a format and render it into another different format. Or you can use a script to work with files and folders. You can create or modify configuration files, and much more. Technically, there is not much that cannot be done in a script.
It's quite common to have scripts running at a precise time on a server. For example, if your website database needs cleaning every 24 hours (for example, the table that stores the user sessions, which expire pretty quickly but aren't cleaned automatically), you could set up a Cron job that fires your script at 3:00 A.M. every day.
According to Wikipedia, the software utility Cron is a time-based job scheduler in Unix-like computer operating systems. People who set up and maintain software environments use Cron to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals.
I have Python scripts to do all the menial tasks that would take me minutes or more to do manually, and at some point, I decided to automate. We'll devote half of Chapter 12, GUIs and Scripts, to scripting with Python.
Running the Python interactive shell
Another way of running Python is by calling the interactive shell. This is something we already saw when we typed python on the command line of our console.
So, open a console, activate your virtual environment (which by now should be second nature to you, right?), and type python. You will be presented with a couple of lines that should look like this:
$ python
Python 3.7.0a3 (default, Jan 27 2018, 00:46:45)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Those >>> are the prompt of the shell. They tell you that Python is waiting for you to type something. If you type a simple instruction, something that fits in one line, that's all you'll see. However, if you type something that requires more than one line of code, the shell will change the prompt to ..., giving you a visual clue that you're typing a multiline statement (or anything that would require more than one line of code).
Go on, try it out; let's do some basic math:
>>> 2 + 4
6
>>> 10 / 4
2.5
>>> 2 ** 1024
179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216
The last operation is showing you something incredible. We raise 2 to the power of 1024, and Python is handling this task with no trouble at all. Try to do it with the built-in integer types of Java, C++, or C#. It won't work, unless you use special libraries to handle such big numbers.
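To get a feel for why this works, here is a short sketch you can paste into the shell or a file. It shows that Python integers simply grow as needed, while floating point numbers (which have a fixed size) cannot hold the same value. The variable names are just for illustration.

```python
big = 2 ** 1024

# Python integers have arbitrary precision: they grow as needed.
print(big.bit_length())        # 1025 bits are needed to store it
print(len(str(big)))           # 309 decimal digits
print(big * big == 2 ** 2048)  # True: arithmetic stays exact

# Floats have a fixed size, so the same power overflows.
base = 2.0
try:
    base ** 1024
    overflowed = False
except OverflowError:
    overflowed = True
print("float overflowed:", overflowed)
```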
I use the interactive shell every day. It's extremely useful to debug very quickly, for example, to check if a data structure supports an operation. Or maybe to inspect or run a piece of code.
When you use Django (a web framework), the interactive shell is coupled with it and allows you to work your way through the framework tools, to inspect the data in the database, and many more things. You will find that the interactive shell will soon become one of your dearest friends on the journey you are embarking on.
Another solution, which comes in a much nicer graphic layout, is to use Integrated DeveLopment Environment (IDLE). It's quite a simple IDE, which is intended mostly for beginners. It has a slightly larger set of capabilities than the naked interactive shell you get in the console, so you may want to explore it. It comes for free in the Windows Python installer and you can easily install it in any other system. You can find information about it on the Python website.
Guido van Rossum named Python after the British comedy group, Monty Python, so it's rumored that the name IDLE has been chosen in honor of Eric Idle, one of Monty Python's founding members.
Running Python as a service
Apart from being run as a script, and within the boundaries of a shell, Python can be coded and run as an application. We'll see many examples throughout the book about this mode. And we'll understand more about it in a moment, when we talk about how Python code is organized and run.
Running Python as a GUI application
Python can also be run as a graphical user interface (GUI) application. There are several frameworks available, some of which are cross-platform and some others are platform-specific. In Chapter 12, GUIs and Scripts, we'll see an example of a GUI application created using Tkinter, which is an object-oriented layer that lives on top of Tk (Tkinter means Tk interface).
Tk is a GUI toolkit that takes desktop application development to a higher level than the conventional approach. It is the standard GUI for Tool Command Language (Tcl), but also for many other dynamic languages, and it can produce rich native applications that run seamlessly under Windows, Linux, macOS, and more.
Tkinter comes bundled with Python; therefore, it gives the programmer easy access to the GUI world, and for these reasons, I have chosen it to be the framework for the GUI examples that I'll present in this book.
Among the other GUI frameworks, we find that the following are the most widely used:
PyQt
wxPython
PyGTK
Describing them in detail is outside the scope of this book, but you can find all the information you need on the Python website (https://docs.python.org/3/faq/gui.html) in the What platform-independent GUI toolkits exist for Python? section. If GUIs are what you're looking for, remember to choose the one you want according to some principles. Make sure they:
Offer all the features you may need to develop your project
Run on all the platforms you may need to support
Rely on a community that is as wide and active as possible
Wrap graphic drivers/tools that you can easily install/access
How is Python code organized?
Let's talk a little bit about how Python code is organized. In this section, we'll start going down the rabbit hole a little bit more and introduce more technical names and concepts.
Starting with the basics, how is Python code organized? Of course, you write your code into files. When you save a file with the extension .py, that file is said to be a Python module.
If you're on Windows or macOS, which typically hide file extensions from the user, please make sure you change the configuration so that you can see the complete names of the files. This is not strictly a requirement, but a suggestion.
It would be impractical to save all the code that is required for software to work within one single file. That solution works for scripts, which are usually not longer than a few hundred lines (and often they are much shorter than that).
A complete Python application can be made of hundreds of thousands of lines of code, so you will have to scatter it through different modules, which is better, but not nearly good enough. It turns out that even like this, it would still be impractical to work with the code. So Python gives you another structure, called a package, which allows you to group modules together. A package is nothing more than a folder, which must contain a special file, __init__.py, that doesn't need to hold any code but whose presence is required to tell Python that the folder is not just some folder, but it's actually a package (note that as of Python 3.3, the __init__.py module is not strictly required any more).
As always, an example will make all of this much clearer. I have created an example structure in my book project, and when I type in my console:
$ tree -v example
I get a tree representation of the contents of the ch1/example folder, which holds the code for the examples of this chapter. Here's what the structure of a really simple application could look like:
example
├── core.py
├── run.py
└── util
    ├── __init__.py
    ├── db.py
    ├── math.py
    └── network.py
You can see that within the root of this example, we have two modules, core.py and run.py, and one package: util. Within core.py, there may be the core logic of our application. On the other hand, within the run.py module, we can probably find the logic to start the application. Within the util package, I expect to find various utility tools, and in fact, we can guess that the modules there are named based on the types of tools they hold: db.py would hold tools to work with databases, math.py would, of course, hold mathematical tools (maybe our application deals with financial data), and network.py would probably hold tools to send/receive data on networks.
As explained before, the __init__.py file is there just to tell Python that util is a package and not just a mere folder.
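To see a package in action, the following sketch rebuilds a cut-down version of that layout in a temporary folder and imports a module from it. Everything about the contents is an assumption made for illustration: the book does not show what network.py holds, so the is_valid_port helper is invented here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate a cut-down version of the example layout in a temporary folder.
root = Path(tempfile.mkdtemp())
package_dir = root / "util"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")  # empty: it only marks the package

# Hypothetical helper, invented for this illustration.
(package_dir / "network.py").write_text(
    "def is_valid_port(port):\n"
    "    return 0 < port < 65536\n"
)

# Make the folder that contains the package importable, then import from it.
sys.path.insert(0, str(root))
network = importlib.import_module("util.network")

print(network.__name__)             # util.network
print(network.is_valid_port(8080))  # True
```

In a real project you would not build files on the fly like this; you would simply write import util.network (or from util import network) from a module that sits next to the util folder.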
Had this software been organized within modules only, it would have been harder to infer its structure. I put a module-only example under the ch1/files_only folder; see it for yourself:
$ tree -v files_only
This shows us a completely different picture:
files_only/
├── core.py
├── db.py
├── math.py
├── network.py
└── run.py
It is a little harder to guess what each module does, right? Now, consider that this is just a simple example, so you can guess how much harder it would be to understand a real application if we couldn't organize the code in packages and modules.
How do we use modules and packages?
When a developer is writing an application, it is likely that they will need to apply the same piece of logic in different parts of it. For example, when writing a parser for the data that comes from a form that a user can fill in a web page, the application will have to validate whether a certain field is holding a number or not. Regardless of how the logic for this kind of validation is written, it's likely that it will be needed in more than one place.
For example, in a poll application, where the user is asked many questions, it's likely that several of them will require a numeric answer. For example:
What is your age?
How many pets do you own?
How many children do you have?
How many times have you been married?
It would be very bad practice to copy/paste (or, more properly said: duplicate) the validation logic in every place where we expect a numeric answer. This would violate the don't repeat yourself (DRY) principle, which states that you should never repeat the same piece of code more than once in your application. I feel the need to stress the importance of this principle: you should never repeat the same piece of code more than once in your application (pun intended).
There are several reasons why repeating the same piece of logic can be very bad, the most important ones being:
There could be a bug in the logic, and therefore, you would have to correct it in every place that the logic is applied.
You may want to amend the way you carry out the validation, and again you would have to change it in every place it is applied.
You may forget to fix/amend a piece of logic because you missed it when searching for all its occurrences. This would leave wrong/inconsistent behavior in your application.
Your code would be longer than needed, for no good reason.
Python is a wonderful language and provides you with all the tools you need to apply all the coding best practices. For this particular example, we need to be able to reuse a piece of code. To be able to reuse a piece of code, we need to have a construct that will hold the code for us so that we can call that construct every time we need to repeat the logic inside it. That construct exists, and it's called a function.
I'm not going too deep into the specifics here, so please just remember that a function is a block of organized, reusable code that is used to perform a task. Functions can assume many forms and names, according to what kind of environment they belong to, but for now this is not important. We'll see the details when we are able to appreciate them, later on, in the book. Functions are the building blocks of modularity in your application, and they are almost indispensable. Unless you're writing a super-simple script, you'll use functions all the time. We'll explore functions in Chapter 4, Functions, the Building Blocks of Code.
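Just to give the idea some shape, here is one way the numeric validation from the poll example could be written once and reused for every question. This is only a sketch: the function name, the sample answers, and the decision to accept anything float() can parse are all assumptions of mine, and a real application may need stricter rules.

```python
def is_number(value):
    """Return True if the form value can be read as a number."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


# Hypothetical answers collected from the poll form.
answers = {
    "age": "42",
    "pets": "two",
    "children": "3",
    "marriages": "",
}

# The same validation logic is applied to every answer, written only once.
for question, answer in answers.items():
    print(question, is_number(answer))
```

If the validation rules change, only the body of is_number has to change; every place that calls it picks up the fix automatically.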
Python comes with a very extensive library, as I have already said a few pages ago. Now, maybe it's a good time to define what a library is: a library is a collection of functions and objects that provide functionalities that enrich the abilities of a language.
For example, within Python's math library, we can find a plethora of functions, one of which is the factorial function, which of course calculates the factorial of a number.
In mathematics, the factorial of a non-negative integer number N, denoted as N!, is defined as the product of all positive integers less than or equal to N. For example, the factorial of 5 is calculated as: 5! = 5 * 4 * 3 * 2 * 1 = 120
The factorial of 0 is 0! = 1, to respect the convention for an empty product.
So, if you wanted to use this function in your code, all you would have to do is to import it and call it with the right input values. Don't worry too much if input values and the concept of calling are not very clear for now; please just concentrate on the import part. We use a library by importing what we need from it, and then we use it.
In Python, to calculate the factorial of number 5, we just need the following code:
>>> from math import factorial
>>> factorial(5)
120
Whatever we type in the shell, if it has a printable representation, will be printed on the console for us (in this case, the result of the function call: 120).
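To connect the mathematical definition with the library function, here is a sketch that multiplies the integers by hand and compares the result with math.factorial. The hand-written version (factorial_by_hand is a name I made up) is for illustration only; in real code you would import the library one.

```python
from math import factorial


def factorial_by_hand(n):
    """Multiply all positive integers less than or equal to n."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1  # the empty product, which is why 0! equals 1
    for factor in range(2, n + 1):
        result *= factor
    return result


print(factorial_by_hand(5))  # 120
print(factorial_by_hand(0))  # 1
print(factorial_by_hand(5) == factorial(5))  # True
```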
So, let's go back to our example, the one with core.py, run.py, util, and so on.
In our example, the package util is our utility library. Our custom utility belt that holds all those reusable tools (that is, functions), which we need in our application. Some of them will deal with databases (db.py), some with the network (network.py), and some will perform mathematical calculations (math.py) that are outside the scope of Python's standard math library and, therefore, we have to code them for ourselves.
We will see in detail how to import functions and use them in their dedicated chapter. Let's now talk about another very important concept: Python's execution model.
Python's execution model
In this section, I would like to introduce you to a few very important concepts, such as scope, names, and namespaces. You can read all about Python's execution model in the official language reference, of course, but I would argue that it is quite technical and abstract, so let me give you a less formal explanation first.
Names and namespaces
Say you are looking for a book, so you go to the library and ask someone for the book you want to fetch. They tell you something like Second Floor, Section X, Row Three. So you go up the stairs, look for Section X, and so on.
It would be very different to enter a library where all the books are piled together in random order in one big room. No floors, no sections, no rows, no order. Fetching a book would be extremely hard.
When we write code, we have the same issue: we have to try and organize it so that it will be easy for someone who has no prior knowledge about it to find what they're looking for. When software is structured correctly, it also promotes code reuse. On the other hand, disorganized software is more likely to expose scattered pieces of duplicated logic.
First of all, let's start with the book. We refer to a book by its title and in Python lingo, that would be a name. Python names are the closest abstraction to what other languages call variables. Names basically refer to objects and are introduced by name-binding operations. Let's make a quick example (notice that anything that follows a # is a comment):
>>>n=3#integernumber
>>>address="221bBakerStreet,NW16XE,London"#SherlockHolmes'address
>>>employee={
...'age':45,
...'role':'CTO',
...'SSN':'AB1234567',
...}
>>>#let'sprintthem
>>>n
3
>>>address
'221bBakerStreet,NW16XE,London'
>>>employee
{'age':45,'role':'CTO','SSN':'AB1234567'}
>>>other_name
Traceback(mostrecentcalllast):
File"<stdin>",line1,in<module>
NameError:name'other_name'isnotdefined
Wedefinedthreeobjectsintheprecedingcode(doyourememberwhatarethethreefeatureseveryPythonobjecthas?):
Anintegernumbern(type:int,value:3)Astringaddress(type:str,value:SherlockHolmes'address)Adictionaryemployee(type:dict,value:adictionarythatholdsthreekey/valuepairs)
Don'tworry,Iknowyou'renotsupposedtoknowwhatadictionaryis.We'llseeinChapter2,Built-inDataTypes,thatit'sthekingofPythondatastructures.
Haveyounoticedthatthepromptchangedfrom>>>to...whenItypedinthedefinitionofemployee?That'sbecausethedefinitionspansovermultiplelines.
So,whataren,address,andemployee?Theyarenames.Namesthatwecanusetoretrievedatawithinourcode.Theyneedtobekeptsomewheresothatwheneverweneedtoretrievethoseobjects,wecanusetheirnamestofetchthem.Weneedsomespacetoholdthem,hence:namespaces!
Anamespaceisthereforeamappingfromnamestoobjects.Examplesarethesetofbuilt-innames(containingfunctionsthatarealwaysaccessibleinanyPythonprogram),theglobalnamesinamodule,andthelocalnamesinafunction.Eventhesetofattributesofanobjectcanbeconsideredanamespace.
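To make the idea of a namespace as a mapping a little more tangible, here is a minimal sketch that peeks at a few of them using the built-in functions globals, locals, and vars. The names answer, show_local_names, and Box are made up for this illustration; they don't come from the examples in this chapter.

```python
import builtins

answer = 42  # a name bound in the module's global namespace


def show_local_names():
    # Names bound inside a function live in its local namespace.
    colour = "red"
    size = 3
    return sorted(locals())


class Box:
    def __init__(self, width):
        self.width = width  # stored in the instance's attribute namespace


local_names = show_local_names()
box_namespace = vars(Box(10))

print("answer" in globals())        # the global namespace behaves like a dict
print(local_names)                  # the names local to the function
print(box_namespace)                # the attributes of one object
print(hasattr(builtins, "print"))   # built-in names live in the builtins module
```

In each case, what you get back is a plain mapping from names to objects, which is all a namespace really is.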
Thebeautyofnamespacesisthattheyallowyoutodefineandorganizeyournameswithclarity,withoutoverlappingorinterference.Forexample,thenamespaceassociatedwiththatbookwewerelookingforinthelibrarycanbeusedtoimportthebookitself,likethis:
from library.second_floor.section_x.row_three import book
Westartfromthelibrarynamespace,andbymeansofthedot(.)operator,wewalkintothatnamespace.Withinthisnamespace,welookforsecond_floor,andagainwewalkintoitwiththe.operator.Wethenwalkintosection_x,andfinallywithinthelastnamespace,row_three,wefindthenamewewerelookingfor:book.
Walkingthroughanamespacewillbeclearerwhenwe'llbedealingwithrealcodeexamples.Fornow,justkeepinmindthatnamespacesareplaceswherenamesareassociatedwithobjects.
Thereisanotherconcept,whichiscloselyrelatedtothatofanamespace,whichI'dliketobrieflytalkabout:thescope.
Scopes
According to Python's documentation:
"A scope is a textual region of a Python program, where a namespace is directly accessible."
Directly accessible means that when you're looking for an unqualified reference to a name, Python tries to find it in the namespace.
Scopes are determined statically, but during runtime they are used dynamically. This means that by inspecting the source code, you can tell what the scope of a name is, but this doesn't prevent the software from altering that during runtime. There are four different scopes that Python makes accessible (not necessarily all of them are present at the same time, of course):
The local scope, which is the innermost one and contains the local names.
The enclosing scope, that is, the scope of any enclosing function. It contains non-local names and also non-global names.
The global scope contains the global names.
The built-in scope contains the built-in names. Python comes with a set of functions that you can use in an off-the-shelf fashion, such as print, all, abs, and so on. They live in the built-in scope.
Theruleisthefollowing:whenwerefertoaname,Pythonstartslookingforitinthecurrentnamespace.Ifthenameisnotfound,Pythoncontinuesthesearchtotheenclosingscopeandthiscontinuesuntilthebuilt-inscopeissearched.Ifanamehasn'tbeenfoundaftersearchingthebuilt-inscope,thenPythonraisesaNameErrorexception,whichbasicallymeansthatthenamehasn'tbeendefined(yousawthisintheprecedingexample).
Theorderinwhichthenamespacesarescannedwhenlookingforanameistherefore:local,enclosing,global,built-in(LEGB).
This is all very theoretical, so let's see an example. In order to show you local and enclosing namespaces, I will have to define a few functions. Don't worry if you are not familiar with their syntax for the moment. We'll study functions in Chapter 4, Functions, the Building Blocks of Code. Just remember that in the following code, when you see def, it means I'm defining a function:
# scopes1.py
# Local versus Global
# we define a function, called local
def local():
    m = 7
    print(m)
m = 5
print(m)
# we call, or `execute` the function local
local()
In the preceding example, we define the same name m, both in the global scope and in the local one (the one defined by the local function). When we execute this program with the following command (have you activated your virtualenv?):
$ python scopes1.py
We see two numbers printed on the console: 5 and 7.
What happens is that the Python interpreter parses the file, top to bottom. First, it finds a couple of comment lines, which are skipped, then it parses the definition of the function local. When called, this function does two things: it binds the name m to an object representing the number 7 and prints it. The Python interpreter keeps going and it finds another name binding. This time the binding happens in the global scope and the value is 5. The next line is a call to the print function, which is executed (and so we get the first value printed on the console: 5).
After this, there is a call to the function local. At this point, Python executes the function, so at this time, the binding m = 7 happens and it's printed.
One very important thing to notice is that the part of the code that belongs to the definition of the local function is indented by four spaces on the right. Python uses indentation to delimit blocks of code: you walk into a block by indenting, and walk out of it by unindenting. The indented body of a function is what defines its local scope (other indented blocks, such as those of if and for statements, do not create a new scope). Some coders use two spaces, others three, but the suggested number of spaces to use is four. It's a good measure to maximize readability. We'll talk more about all the conventions you should embrace when writing Python code later.
Whatwouldhappenifweremovedthatm=7line?RemembertheLEGBrule.
Pythonwouldstartlookingforminthelocalscope(functionlocal),and,notfindingit,itwouldgotothenextenclosingscope.Thenextone,inthiscase,istheglobalonebecausethereisnoenclosingfunctionwrappedaroundlocal.Therefore,wewouldseetwonumbers5printedontheconsole.Let'sactuallyseewhatthecodewouldlooklike:
# scopes2.py
# Local versus Global
def local():
    # m doesn't belong to the scope defined by the local function
    # so Python will keep looking into the next enclosing scope.
    # m is finally found in the global scope
    print(m, 'printing from the local scope')
m = 5
print(m, 'printing from the global scope')
local()
Running scopes2.py will print this:
$ python scopes2.py
5 printing from the global scope
5 printing from the local scope
Asexpected,Pythonprintsmthefirsttime,thenwhenthefunctionlocaliscalled,misn'tfoundinitsscope,soPythonlooksforitfollowingtheLEGBchainuntilmisfoundintheglobalscope.
Let'sseeanexamplewithanextralayer,theenclosingscope:
# scopes3.py
# Local, Enclosing and Global
def enclosing_func():
    m = 13
    def local():
        # m doesn't belong to the scope defined by the local
        # function so Python will keep looking into the next
        # enclosing scope. This time m is found in the enclosing
        # scope
        print(m, 'printing from the local scope')
    # calling the function local
    local()
m = 5
print(m, 'printing from the global scope')
enclosing_func()
Running scopes3.py will print on the console:
$ python scopes3.py
5 printing from the global scope
13 printing from the local scope
Asyoucansee,theprintinstructionfromthefunctionlocalisreferringtomasbefore.misstillnotdefinedwithinthefunctionitself,soPythonstartswalkingscopesfollowingtheLEGBorder.Thistimemisfoundintheenclosingscope.
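The three scripts above each show one or two levels of the LEGB chain. As a compact recap, here is a sketch that touches all four levels in a single module. The function names (outer, inner, inner_without_binding, uses_builtin) and the name label are hypothetical, chosen only for this illustration.

```python
label = "global"            # G: bound at module level


def outer():
    label = "enclosing"     # E: bound in the enclosing function

    def inner():
        label = "local"     # L: bound in the innermost function
        return label

    def inner_without_binding():
        # No local binding here, so the lookup falls back to outer()
        return label

    return inner(), inner_without_binding()


def uses_builtin():
    # len is not local, enclosing, or global: it is found in the built-ins (B)
    return len("abc")


result = outer()
print(result)
print(label)            # the global binding is untouched by the functions
print(uses_builtin())
```

Notice that none of the functions changes the global label: each binding lives in its own namespace, and the lookup simply stops at the first one that contains the name.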
Don'tworryifthisisstillnotperfectlyclearfornow.Itwillcometoyouaswegothroughtheexamplesinthebook.TheClassessectionofthePythontutorial(https://docs.python.org/3/tutorial/classes.html)hasaninterestingparagraphaboutscopesandnamespaces.Makesureyoureaditatsomepointifyouwantadeeperunderstandingofthesubject.
Beforewefinishoffthischapter,Iwouldliketotalkabitmoreaboutobjects.Afterall,basicallyeverythinginPythonisanobject,soIthinktheydeserveabitmoreattention.
ObjectsandclassesWhenIintroducedobjectspreviouslyintheAproperintroductionsectionofthechapter,Isaidthatweusethemtorepresentreal-lifeobjects.Forexample,wesellgoodsofanykindonthewebnowadaysandweneedtobeabletohandle,store,andrepresentthemproperly.Butobjectsareactuallysomuchmorethanthat.Mostofwhatyouwilleverdo,inPython,hastodowithmanipulatingobjects.
So,withoutgoingintotoomuchdetail(we'lldothatinChapter6,OOP,Decorators,andIterators),Iwanttogiveyoutheinanutshellkindofexplanationaboutclassesandobjects.
We've already seen that objects are Python's abstraction for data. In fact, everything in Python is an object: numbers, strings (data structures that hold text), containers, collections, even functions. You can think of them as if they were boxes with at least three features: an ID (unique), a type, and a value.
Buthowdotheycometolife?Howdowecreatethem?Howdowewriteourowncustomobjects?Theanswerliesinonesimpleword:classes.
Objectsare,infact,instancesofclasses.ThebeautyofPythonisthatclassesareobjectsthemselves,butlet'snotgodownthisroad.Itleadstooneofthemostadvancedconceptsofthislanguage:metaclasses.Fornow,thebestwayforyoutogetthedifferencebetweenclassesandobjectsisbymeansofanexample.
Sayafriendtellsyou,Iboughtanewbike!Youimmediatelyunderstandwhatshe'stalkingabout.Haveyouseenthebike?No.Doyouknowwhatcoloritis?Nope.Thebrand?Nope.Doyouknowanythingaboutit?Nope.Butatthesametime,youknoweverythingyouneedinordertounderstandwhatyourfriendmeantwhenshetoldyousheboughtanewbike.Youknowthatabikehastwowheelsattachedtoaframe,asaddle,pedals,handlebars,brakes,andsoon.Inotherwords,evenifyouhaven'tseenthebikeitself,youknowtheconceptofbike.Anabstractsetoffeaturesandcharacteristicsthattogetherformsomethingcalledbike.
Incomputerprogramming,thatiscalledaclass.It'sthatsimple.Classesareusedtocreateobjects.Infact,objectsaresaidtobeinstancesofclasses.
Inotherwords,weallknowwhatabikeis;weknowtheclass.ButthenIhavemyownbike,whichisaninstanceofthebikeclass.Andmybikeisanobjectwithitsowncharacteristicsandmethods.Youhaveyourownbike.Sameclass,butdifferentinstance.Everybikeevercreatedintheworldisaninstanceofthebikeclass.
Let'sseeanexample.Wewillwriteaclassthatdefinesabikeandthenwe'llcreatetwobikes,oneredandoneblue.I'llkeepthecodeverysimple,butdon'tfretifyoudon'tunderstandeverythingaboutit;allyouneedtocareaboutatthismomentistounderstandthedifferencebetweenaclassandanobject(orinstanceofaclass):
# bike.py
# let's define the class Bike
class Bike:
    def __init__(self, colour, frame_material):
        self.colour = colour
        self.frame_material = frame_material
    def brake(self):
        print("Braking!")
# let's create a couple of instances
red_bike = Bike('Red', 'Carbon fiber')
blue_bike = Bike('Blue', 'Steel')
# let's inspect the objects we have, instances of the Bike class.
print(red_bike.colour)  # prints: Red
print(red_bike.frame_material)  # prints: Carbon fiber
print(blue_bike.colour)  # prints: Blue
print(blue_bike.frame_material)  # prints: Steel
# let's brake!
red_bike.brake()  # prints: Braking!
IhopebynowIdon'tneedtotellyoutorunthefileeverytime,right?Thefilenameisindicatedinthefirstlineofthecodeblock.Justrun$pythonfilename,andyou'llbefine.Butremembertohaveyourvirtualenvactivated!
Somanyinterestingthingstonoticehere.Firstthingsfirst;thedefinitionofaclasshappenswiththeclassstatement.Whatevercodecomesaftertheclassstatement,andisindented,iscalledthebodyoftheclass.Inourcase,thelastlinethatbelongstotheclassdefinitionistheprint("Braking!")one.
After having defined the class, we're ready to create instances. You can see that the class body hosts the definition of two methods. A method is basically (and simplistically) a function that belongs to a class.
The first method, __init__, is an initializer. It uses some Python magic to set up the objects with the values we pass when we create it.
Every method that has leading and trailing double underscores, in Python, is called a magic method. Magic methods are used by Python for a multitude of different purposes; hence it's never a good idea to name a custom method using two leading and trailing underscores. This naming convention is best left to Python.
The other method we defined, brake, is just an example of an additional method that we could call if we wanted to brake the bike. It contains just a call to the print function, of course; it's an example.
We created two bikes then. One is red with a carbon fiber frame, and the other one is blue with a steel frame. We pass those values upon creation. After creation, we print out the colour and frame_material attributes of both bikes, just as an example. We also call the brake method of red_bike.
One last thing to notice. You remember I told you that the set of attributes of an object is considered to be a namespace? I hope it's clearer what I meant now. You see that by getting to the frame_material attribute through different namespaces (red_bike, blue_bike), we obtain different values. No overlapping, no confusion.
Thedot(.)operatorisofcoursethemeansweusetowalkintoanamespace,inthecaseofobjectsaswell.
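If you'd like to see those per-object namespaces directly, the built-in vars function returns the dictionary behind an instance's attributes. What follows is a small sketch that redefines a trimmed-down Bike class (without the brake method) so that it can run on its own.

```python
class Bike:
    def __init__(self, colour, frame_material):
        self.colour = colour
        self.frame_material = frame_material


red_bike = Bike('Red', 'Carbon fiber')
blue_bike = Bike('Blue', 'Steel')

# vars() returns the dictionary that backs an instance's attribute namespace
print(vars(red_bike))
print(vars(blue_bike))

# Both objects are instances of the same class, yet they are distinct objects
print(type(red_bike) is type(blue_bike))
print(red_bike is blue_bike)
```

The same attribute names appear in both dictionaries, each tied to its own values: same class, different instances.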
GuidelinesonhowtowritegoodcodeWritinggoodcodeisnotaseasyasitseems.AsIalreadysaidbefore,goodcodeexposesalonglistofqualitiesthatisquitehardtoputtogether.Writinggoodcodeis,tosomeextent,anart.Regardlessofwhereonthepathyouwillbehappytosettle,thereissomethingthatyoucanembracewhichwillmakeyourcodeinstantlybetter:PEP8.
AccordingtoWikipedia:
"Python'sdevelopmentisconductedlargelythroughthePythonEnhancementProposal(PEP)process.ThePEPprocessistheprimarymechanismforproposingmajornewfeatures,forcollectingcommunityinputonanissue,andfordocumentingthedesigndecisionsthathavegoneintoPython."
PEP8isperhapsthemostfamousofallPEPs.ItlaysoutasimplebuteffectivesetofguidelinestodefinePythonaestheticssothatwewritebeautifulPythoncode.Ifyoutakeonesuggestionoutofthischapter,pleaseletitbethis:useit.Embraceit.Youwillthankmelater.
Codingtodayisnolongeracheck-in/check-outbusiness.Rather,it'smoreofasocialeffort.SeveraldeveloperscollaborateonapieceofcodethroughtoolssuchasGitandMercurial,andtheresultiscodethatisfatheredbymanydifferenthands.
GitandMercurialareprobablythedistributedrevisioncontrolsystemsthataremostusedtoday.Theyareessentialtoolsdesignedtohelpteamsofdeveloperscollaborateonthesamesoftware.
Thesedays,morethanever,weneedtohaveaconsistentwayofwritingcode,sothatreadabilityismaximized.WhenalldevelopersofacompanyabidebyPEP8,it'snotuncommonforanyofthemlandingonapieceofcodetothinktheywroteitthemselves.Itactuallyhappenstomeallthetime(IalwaysforgetthecodeIwrite).
This has a tremendous advantage: when you read code that you could have written yourself, you read it easily. Without a convention, every coder would structure the code the way they like most, or simply the way they were taught or are used to, and this would mean having to interpret every line according to someone else's style. It would mean having to lose much more time just trying to understand it. Thanks to PEP 8, we can avoid this. I'm such a fan of it that I won't sign off a code review if the code doesn't respect it. So, please take the time to study it; it's very important.
Intheexamplesinthisbook,IwilltrytorespectitasmuchasIcan.Unfortunately,Idon'thavetheluxuryof79characters(whichisthemaximumlinelengthsuggestedbyPEP8),andIwillhavetocutdownonblanklinesandotherthings,butIpromiseyouI'lltrytolayoutmycodesothatit'sasreadableaspossible.
ThePythonculturePythonhasbeenadoptedwidelyinallcodingindustries.It'susedbymanydifferentcompaniesformanydifferentpurposes,andit'salsousedineducation(it'sanexcellentlanguageforthatpurpose,becauseofitsmanyqualitiesandthefactthatit'seasytolearn).
OneofthereasonsPythonissopopulartodayisthatthecommunityarounditisvast,vibrant,andfullofbrilliantpeople.Manyeventsareorganizedallovertheworld,mostlyeitheraroundPythonoritsmainwebframework,Django.
Pythonisopen,andveryoftensoarethemindsofthosewhoembraceit.CheckoutthecommunitypageonthePythonwebsiteformoreinformationandgetinvolved!
ThereisanotheraspecttoPythonwhichrevolvesaroundthenotionofbeingPythonic.IthastodowiththefactthatPythonallowsyoutousesomeidiomsthataren'tfoundelsewhere,atleastnotinthesameformoraseasytouse(IfeelquiteclaustrophobicwhenIhavetocodeinalanguagewhichisnotPythonnow).
Anyway,overtheyears,thisconceptofbeingPythonichasemergedand,thewayIunderstandit,issomethingalongthelinesofdoingthingsthewaytheyaresupposedtobedoneinPython.
To help you understand a little bit more about Python's culture and about being Pythonic, I will show you the Zen of Python. A lovely Easter egg that is very popular. Open up a Python console and type import this. What follows is the result of this line:
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Therearetwolevelsofreadinghere.Oneistoconsideritasasetofguidelinesthathavebeenputdowninafunway.Theotheroneistokeepitinmind,andmaybereaditonceinawhile,tryingtounderstandhowitreferstosomethingdeeper:somePythoncharacteristicsthatyouwillhavetounderstanddeeplyinordertowritePythonthewayit'ssupposedtobewritten.Startwiththefunlevel,andthendigdeeper.Alwaysdigdeeper.
AnoteonIDEs
JustafewwordsaboutIDEs.Tofollowtheexamplesinthisbook,youdon'tneedone;anytexteditorwilldofine.Ifyouwanttohavemoreadvancedfeatures,suchassyntaxcoloringandautocompletion,youwillhavetofetchyourselfanIDE.YoucanfindacomprehensivelistofopensourceIDEs(justGooglePythonIDEs)onthePythonwebsite.IpersonallyuseSublimeTexteditor.It'sfreetotryoutanditcostsjustafewdollars.IhavetriedmanyIDEsinmylife,butthisistheonethatmakesmemostproductive.
Twoimportantpiecesofadvice:
Whatever IDE you choose to use, try to learn it well so that you can exploit its strengths, but don't depend on it. Exercise yourself to work with VIM (or any other text editor) once in a while; learn to be able to do some work on any platform, with any set of tools.
Whatever text editor/IDE you use, when it comes to writing Python, indentation is four spaces. Don't use tabs, don't mix them with spaces. Use four spaces, not two, not three, not five. Just use four. The whole world works like that, and you don't want to become an outcast because you were fond of the three-space layout.
SummaryInthischapter,westartedtoexploretheworldofprogrammingandthatofPython.We'vebarelyscratchedthesurface,justalittle,touchingconceptsthatwillbediscussedlateroninthebookingreaterdetail.
WetalkedaboutPython'smainfeatures,whoisusingitandforwhat,andwhatarethedifferentwaysinwhichwecanwriteaPythonprogram.
Inthelastpartofthechapter,weflewoverthefundamentalnotionsofnamespaces,scopes,classes,andobjects.WealsosawhowPythoncodecanbeorganizedusingmodulesandpackages.
Onapracticallevel,welearnedhowtoinstallPythononoursystem,howtomakesurewehavethetoolsweneed,pipandvirtualenv,andwealsocreatedandactivatedourfirstvirtualenvironment.Thiswillallowustoworkinaself-containedenvironmentwithouttheriskofcompromisingthePythonsysteminstallation.
Nowyou'rereadytostartthisjourneywithme.Allyouneedisenthusiasm,anactivatedvirtualenvironment,thisbook,yourfingers,andsomecoffee.
Trytofollowtheexamples;I'llkeepthemsimpleandshort.Ifyouputthemunderyourfingertips,youwillretainthemmuchbetterthanifyoujustreadthem.
Inthenextchapter,wewillexplorePython'srichsetofbuilt-indatatypes.There'smuchtocoverandmuchtolearn!
Built-in Data Types
"Data! Data! Data!" he cried impatiently. "I can't make bricks without clay."
– Sherlock Holmes – The Adventure of the Copper Beeches
Everythingyoudowithacomputerismanagingdata.Datacomesinmanydifferentshapesandflavors.It'sthemusicyoulistento,themoviesyoustream,thePDFsyouopen.Eventhesourceofthechapteryou'rereadingatthisverymomentisjustafile,whichisdata.
Datacanbesimple,anintegernumbertorepresentanage,orcomplex,likeanorderplacedonawebsite.Itcanbeaboutasingleobjectoraboutacollectionofthem.Datacanevenbeaboutdata,thatis,metadata.Datathatdescribesthedesignofotherdatastructuresordatathatdescribesapplicationdataoritscontext.InPython,objectsareabstractionfordata,andPythonhasanamazingvarietyofdatastructuresthatyoucanusetorepresentdata,orcombinethemtocreateyourowncustomdata.
Inthischapter,wearegoingtocoverthefollowing:
Python objects' structures
Mutability and immutability
Built-in data types: numbers, strings, sequences, collections, and mapping types
The collections module
Enumerations
EverythingisanobjectBeforewedelveintothespecifics,IwantyoutobeveryclearaboutobjectsinPython,solet'stalkalittlebitmoreaboutthem.Aswealreadysaid,everythinginPythonisanobject.Butwhatreallyhappenswhenyoutypeaninstructionlikeage=42inaPythonmodule?
Ifyougotohttp://pythontutor.com/,youcantypethatinstructionintoatextboxandgetitsvisualrepresentation.Keepthiswebsiteinmind;it'sveryusefultoconsolidateyourunderstandingofwhatgoesonbehindthescenes.
So,whathappensisthatanobjectiscreated.Itgetsanid,thetypeissettoint(integernumber),andthevalueto42.Anameageisplacedintheglobalnamespace,pointingtothatobject.Therefore,wheneverweareintheglobalnamespace,aftertheexecutionofthatline,wecanretrievethatobjectbysimplyaccessingitthroughitsname:age.
Ifyouweretomovehouse,youwouldputalltheknives,forks,andspoonsinaboxandlabelitcutlery.Canyouseeit'sexactlythesameconcept?Here'sascreenshotofwhatitmaylooklike(youmayhavetotweakthesettingstogettothesameview):
So, for the rest of this chapter, whenever you read something such as name = some_value, think of a name placed in the namespace that is tied to the scope in which the instruction was written, with a nice arrow pointing to an object that has an id, a type, and a value. There is a little bit more to say about this mechanism, but it's much easier to talk about it over an example, so we'll get back to this later.
Mutableorimmutable?ThatisthequestionAfirstfundamentaldistinctionthatPythonmakesondataisaboutwhetherornotthevalueofanobjectchanges.Ifthevaluecanchange,theobjectiscalledmutable,whileifthevaluecannotchange,theobjectiscalledimmutable.
It is very important that you understand the distinction between mutable and immutable because it affects the code you write, so here's a question:
>>> age = 42
>>> age
42
>>> age = 43  # A
>>> age
43
In the preceding code, on the line #A, have I changed the value of age? Well, no. But now it's 43 (I hear you say...). Yes, it's 43, but 42 was an integer number, of the type int, which is immutable. So, what happened is really that on the first line, age is a name that is set to point to an int object, whose value is 42. When we type age = 43, what happens is that another object is created, of the type int and value 43 (also, the id will be different), and the name age is set to point to it. So, we didn't change that 42 to 43. We actually just pointed age to a different location: the new int object whose value is 43. Let's see the same code also printing the IDs:
>>> age = 42
>>> id(age)
4377553168
>>> age = 43
>>> id(age)
4377553200
NoticethatweprinttheIDsbycallingthebuilt-inidfunction.Asyoucansee,theyaredifferent,asexpected.Bearinmindthatagepointstooneobjectatatime:42first,then43.Nevertogether.
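Another way to convince yourself that rebinding a name doesn't touch the original object is to keep a second name pointing to it. This is just a sketch; the names snapshot and same_before are made up for the occasion.

```python
age = 42
snapshot = age          # a second name bound to the same int object
same_before = snapshot is age

age = 43                # rebinding: age now refers to a different object

print(same_before)      # both names referred to one and the same object
print(snapshot)         # the original object is untouched
print(age)
```

The is operator compares identities rather than values, so it is a handy companion to the id function when you want to know whether two names refer to the same object.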
Now, let's see the same example using a mutable object. For this example, let's just use a Person object, that has a property age (don't worry about the class declaration for now; it's there only for completeness):
>>> class Person():
...     def __init__(self, age):
...         self.age = age
...
>>> fab = Person(age=42)
>>> fab.age
42
>>> id(fab)
4380878496
>>> id(fab.age)
4377553168
>>> fab.age = 25  # I wish!
>>> id(fab)  # will be the same
4380878496
>>> id(fab.age)  # will be different
4377552624
Inthiscase,IsetupanobjectfabwhosetypeisPerson(acustomclass).Oncreation,theobjectisgiventheageof42.I'mprintingit,alongwiththeobjectid,andtheIDofageaswell.Noticethat,evenafterIchangeagetobe25,theIDoffabstaysthesame(whiletheIDofagehaschanged,ofcourse).CustomobjectsinPythonaremutable(unlessyoucodethemnottobe).Keepthisconceptinmind;it'sveryimportant.I'llremindyouaboutitthroughouttherestofthechapter.
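A practical consequence of mutability is that a change made through one name is visible through every other name bound to the same object. Here is a minimal sketch, assuming the same Person class as above; the names alias and id_before are only there for illustration.

```python
class Person:
    def __init__(self, age):
        self.age = age


fab = Person(age=42)
alias = fab                     # no copy is made: two names, one object
id_before = id(fab)

fab.age = 25                    # mutate the object in place

print(alias.age)                # the change is visible through the other name
print(id(fab) == id_before)     # the object's identity did not change
```

This is exactly the situation in which the distinction between mutable and immutable starts to affect the code you write.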
NumbersLet'sstartbyexploringPython'sbuilt-indatatypesfornumbers.Pythonwasdesignedbyamanwithamaster'sdegreeinmathematicsandcomputerscience,soit'sonlylogicalthatithasamazingsupportfornumbers.
Numbersareimmutableobjects.
Integers
Python integers have an unlimited range, subject only to the available virtual memory. This means that it doesn't really matter how big a number you want to store is: as long as it can fit in your computer's memory, Python will take care of it. Integer numbers can be positive, negative, and 0 (zero). They support all the basic mathematical operations, as shown in the following example:
>>> a = 14
>>> b = 3
>>> a + b  # addition
17
>>> a - b  # subtraction
11
>>> a * b  # multiplication
42
>>> a / b  # true division
4.666666666666667
>>> a // b  # integer division
4
>>> a % b  # modulo operation (remainder of division)
2
>>> a ** b  # power operation
2744
The preceding code should be easy to understand. Just notice one important thing: Python has two division operators, one performs the so-called true division (/), which returns the quotient of the operands, and the other one, the so-called integer division (//), which returns the floored quotient of the operands. It might be worth noting that in Python 2 the division operator / behaves differently than in Python 3: between two integers, it performs integer division. See how the two operators differ for positive and negative numbers:
>>> 7 / 4  # true division
1.75
>>> 7 // 4  # integer division, flooring returns 1
1
>>> -7 / 4  # true division again, result is opposite of previous
-1.75
>>> -7 // 4  # integer div., result not the opposite of previous
-2
This is an interesting example. If you were expecting a -1 on the last line, don't feel bad, it's just the way Python works. The result of an integer division in Python is always rounded towards minus infinity. If, instead of flooring, you want to truncate a number to an integer, you can use the built-in int function, as shown in the following example:
>>> int(1.75)
1
>>> int(-1.75)
-1
Notice that the truncation is done toward 0.
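To put flooring and truncation side by side, here is a short sketch that uses math.trunc and the built-in divmod function, both from the standard library. The variable names are just for illustration.

```python
import math

a, b = -7, 4

floored = a // b                  # floor division rounds toward minus infinity
truncated = math.trunc(a / b)     # truncation rounds toward zero
quotient, remainder = divmod(a, b)

print(floored, truncated)
print(quotient, remainder)
# The identity that ties // and % together:
print(a == quotient * b + remainder)
```

The last line is the reason why Python floors: with this choice, the remainder always takes the sign of the divisor, and the identity a == (a // b) * b + a % b holds for negative numbers as well.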
There is also an operator to calculate the remainder of a division. It's called the modulo operator, and it's represented by a percent sign (%):
>>> 10 % 3  # remainder of the division 10 // 3
1
>>> 10 % 4  # remainder of the division 10 // 4
2
One nice feature introduced in Python 3.6 is the ability to add underscores within number literals (between digits or base specifiers, but not leading or trailing). The purpose is to help make some numbers more readable, like for example 1_000_000_000:
>>> n = 1_024
>>> n
1024
>>> hex_n = 0x_4_0_0  # 0x400 == 1024
>>> hex_n
1024
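Two loose ends from this section are worth a quick sketch: the unlimited range of integers, and the fact that underscores are also understood by the int function and by the format mini-language (again, from Python 3.6 onward). The names big and population are made up for this example.

```python
big = 2 ** 100                    # far beyond 64 bits, still an exact int
population = 7_500_000_000        # underscores are ignored by the parser

print(big)
print(f"{population:_}")          # underscore as a thousands separator
print(f"{population:,}")          # comma as a thousands separator
print(int("1_000_000"))           # int() accepts underscores in strings too
```

Keep in mind that the underscores only affect how a number is written or displayed; the int object itself is the same either way.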
BooleansBooleanalgebraisthatsubsetofalgebrainwhichthevaluesofthevariablesarethetruthvalues:trueandfalse.InPython,TrueandFalsearetwokeywordsthatareusedtorepresenttruthvalues.Booleansareasubclassofintegers,andbehaverespectivelylike1and0.TheequivalentoftheintclassforBooleansistheboolclass,whichreturnseitherTrueorFalse.Everybuilt-inPythonobjecthasavalueintheBooleancontext,whichmeanstheybasicallyevaluatetoeitherTrueorFalsewhenfedtotheboolfunction.We'llseeallaboutthisinChapter3,IteratingandMakingDecisions.
Boolean values can be combined in Boolean expressions using the logical operators and, or, and not. Again, we'll see them in full in the next chapter, so for now let's just see a simple example:
>>> int(True)  # True behaves like 1
1
>>> int(False)  # False behaves like 0
0
>>> bool(1)  # 1 evaluates to True in a boolean context
True
>>> bool(-42)  # and so does every non-zero number
True
>>> bool(0)  # 0 evaluates to False
False
>>> # quick peek at the operators (and, or, not)
>>> not True
False
>>> not False
True
>>> True and True
True
>>> False or True
True
You can see that True and False behave like integers (bool is a subclass of int) when you try to add them. Python upcasts them to integers and performs the addition:
>>> 1 + True
2
>>> False + 42
42
>>> 7 - True
6
Upcasting is a type conversion operation that goes from a subclass to its parent. In the example presented here, True and False, which belong to a class derived from the integer class, are converted back to integers when needed. This topic is about inheritance and will be explained in detail in Chapter 6, OOP, Decorators, and Iterators.
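You can verify the relationship between bool and int with the built-in issubclass and isinstance functions. The sketch below also shows one place where it comes in handy: counting how many items satisfy a condition. The temperatures list is invented for the example.

```python
print(issubclass(bool, int))      # bool derives from int
print(isinstance(True, int))      # so True is also an int
print(True == 1, False == 0)

# A practical consequence: summing booleans counts the True values
temperatures = [18, 25, 31, 22, 29]
hot_days = sum(t > 24 for t in temperatures)
print(hot_days)
```

Each comparison t > 24 produces True or False, and sum adds them up as if they were 1 and 0.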
RealnumbersRealnumbers,orfloatingpointnumbers,arerepresentedinPythonaccordingtotheIEEE754double-precisionbinaryfloating-pointformat,whichisstoredin64bitsofinformationdividedintothreesections:sign,exponent,andmantissa.
QuenchyourthirstforknowledgeaboutthisformatonWikipedia:http://en.wikipedia.org/wiki/Double-precision_floating-point_format.
Usually, programming languages give coders two different formats: single and double precision. The former takes up 32 bits of memory, and the latter 64. Python supports only the double format. Let's see a simple example:
>>> pi = 3.1415926536  # how many digits of PI can you remember?
>>> radius = 4.5
>>> area = pi * (radius ** 2)
>>> area
63.617251235400005
In the calculation of the area, I wrapped the radius ** 2 within parentheses. Even though that wasn't necessary because the power operator has higher precedence than the multiplication one, I think the formula reads more easily like that. Moreover, should you get a slightly different result for the area, don't worry. It might depend on your OS, how Python was compiled, and so on. As long as the first few decimal digits are correct, you know it's a correct result.
The sys.float_info struct sequence holds information about how floating point numbers will behave on your system. This is what I see on my box:
>>> import sys
>>> sys.float_info
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308,
min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53,
epsilon=2.220446049250313e-16, radix=2, rounds=1)
Let's make a few considerations here: we have 64 bits to represent float numbers. This means we can represent at most 2 ** 64 == 18,446,744,073,709,551,616 numbers with that amount of bits. Take a look at the max and epsilon values for the float numbers, and you'll realize it's impossible to represent them all. There is just not enough space, so they are approximated to the closest representable number. You probably think that only extremely big or extremely small numbers suffer from this issue. Well, think again and try the following in your console:
>>> 0.3 - 0.1 * 3  # this should be 0!!!
-5.551115123125783e-17
Whatdoesthistellyou?Ittellsyouthatdoubleprecisionnumberssufferfromapproximationissuesevenwhenitcomestosimplenumberslike0.1or0.3.Whyisthisimportant?Itcanbeabigproblemifyou'rehandlingprices,orfinancialcalculations,oranykindofdatathatneedsnottobeapproximated.Don'tworry,Pythongivesyouthedecimaltype,whichdoesn'tsufferfromtheseissues;we'llseetheminamoment.
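When floats are good enough for the job but you still need to compare them, a common approach is to test for closeness rather than equality. Here is a minimal sketch using math.isclose from the standard library; the tolerance in the last line is an arbitrary value picked for the example.

```python
import math

total = 0.1 * 3

print(total == 0.3)                 # exact comparison is fragile
print(math.isclose(total, 0.3))     # comparison within a relative tolerance
print(abs(total - 0.3) < 1e-9)      # a hand-rolled absolute tolerance
```

This doesn't remove the approximation; it just makes the comparison tolerant of it, which is fine for measurements but not for money.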
Complexnumbers
Pythongivesyoucomplexnumberssupportoutofthebox.Ifyoudon'tknowwhatcomplexnumbersare,theyarenumbersthatcanbeexpressedintheforma+ibwhereaandbarerealnumbers,andi(orjifyou'reanengineer)istheimaginaryunit,thatis,thesquarerootof-1.aandbarecalled,respectively,therealandimaginarypartofthenumber.
It's actually unlikely you'll be using them, unless you're coding something scientific. Let's see a small example:
>>> c = 3.14 + 2.73j
>>> c.real  # real part
3.14
>>> c.imag  # imaginary part
2.73
>>> c.conjugate()  # conjugate of A + Bj is A - Bj
(3.14-2.73j)
>>> c * 2  # multiplication is allowed
(6.28+5.46j)
>>> c ** 2  # power operation as well
(2.4067000000000007+17.1444j)
>>> d = 1 + 1j  # addition and subtraction as well
>>> c - d
(2.14+1.73j)
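A couple more operations you may find useful, sketched here with the built-in abs function and the cmath module from the standard library (the complex counterpart of math). The number z is chosen so that the results are easy to check by hand.

```python
import cmath

z = 3 + 4j

print(abs(z))               # modulus: the distance from the origin
print(z * z.conjugate())    # a number times its conjugate is real
print(cmath.sqrt(-1))       # math.sqrt(-1) would raise ValueError instead
```

The modulus of 3 + 4j is 5.0, as in the classic 3-4-5 right triangle.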
Fractions and decimals
Let's finish the tour of the number department with a look at fractions and decimals. Fractions hold a rational number as an integer numerator and denominator in their lowest forms. Let's see a quick example:
>>> from fractions import Fraction
>>> Fraction(10, 6)  # mad hatter? notice the result has been simplified
Fraction(5, 3)
>>> Fraction(1, 3) + Fraction(2, 3)  # 1/3 + 2/3 == 3/3 == 1/1
Fraction(1, 1)
>>> f = Fraction(10, 6)
>>> f.numerator
5
>>> f.denominator
3
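The main selling point of fractions is that arithmetic on them is exact. Here is a small sketch of what that means in practice; note that Fraction also accepts a string, which avoids going through a float first.

```python
from fractions import Fraction

third = Fraction(1, 3)

print(third + third + third == 1)                # exact: no rounding error
print(Fraction('0.1') * 3 == Fraction('0.3'))    # unlike the float version
print(float(Fraction(5, 3)))                     # convert to float only at the end
```

Compare the second line with the 0.3 - 0.1 * 3 experiment we ran on floats: here the two sides are exactly equal.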
Although they can be very useful at times, it's not that common to spot them in commercial software. It is much more common to see decimal numbers being used in all those contexts where precision is everything; for example, in scientific and financial calculations.
It's important to remember that arbitrary precision decimal numbers come at a price in performance, of course. The amount of data to be stored for each number is greater than it is for floats, and the way they are handled causes the Python interpreter much more work behind the scenes. Another interesting thing to note is that you can get and set the precision by accessing decimal.getcontext().prec.
Let's see a quick example with decimal numbers:
>>> from decimal import Decimal as D  # rename for brevity
>>> D(3.14)  # pi, from float, so approximation issues
Decimal('3.140000000000000124344978758017532527446746826171875')
>>> D('3.14')  # pi, from a string, so no approximation issues
Decimal('3.14')
>>> D(0.1) * D(3) - D(0.3)  # from float, we still have the issue
Decimal('2.775557561565156540423631668E-17')
>>> D('0.1') * D(3) - D('0.3')  # from string, all perfect
Decimal('0.0')
>>> D('1.4').as_integer_ratio()  # 7/5 = 1.4 (isn't this cool?!)
(7, 5)
Notice that when we construct a Decimal number from a float, it takes on all the approximation issues that the float carries. On the other hand, when the Decimal has no approximation issues (for example, when we feed an int or a string representation to the constructor), then the calculation has no quirky behavior.
Whenitcomestomoney,usedecimals.
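Here is a sketch of both ideas: setting the precision through a context, and rounding a monetary amount to cents with the quantize method. I'm using decimal.localcontext so that the change of precision stays confined to the with block. The price, quantity, and tax rate are invented figures.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

# localcontext() gives a temporary context, so the global precision is untouched
with localcontext() as ctx:
    ctx.prec = 5                        # five significant digits
    one_third = Decimal(1) / Decimal(3)

price = Decimal('19.99')
quantity = 3
vat_rate = Decimal('0.175')             # hypothetical tax rate

net = price * quantity
# quantize rounds to the exponent of its argument: here, two decimal places
vat = (net * vat_rate).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

print(one_third)
print(net, vat, net + vat)
```

Which rounding mode is appropriate depends on the rules you have to comply with; ROUND_HALF_UP is only one of the options the decimal module offers.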
Thisconcludesourintroductiontobuilt-innumerictypes.Let'snowlookatsequences.
ImmutablesequencesLet'sstartwithimmutablesequences:strings,tuples,andbytes.
StringsandbytesTextualdatainPythonishandledwithstrobjects,morecommonlyknownasstrings.TheyareimmutablesequencesofUnicodecodepoints.Unicodecodepointscanrepresentacharacter,butcanalsohaveothermeanings,suchasformattingdata,forexample.Python,unlikeotherlanguages,doesn'thaveachartype,soasinglecharacterisrenderedsimplybyastringoflength1.
Unicode is an excellent way to handle data, and should be used for the internals of any application. When it comes to storing textual data though, or sending it on the network, you may want to encode it, using an appropriate encoding for the medium you're using. The result of an encoding is a bytes object, whose syntax and behavior are similar to those of strings. String literals are written in Python using single, double, or triple quotes (both single or double). If built with triple quotes, a string can span multiple lines. An example will clarify this:
>>> # 4 ways to make a string
>>> str1 = 'This is a string. We built it with single quotes.'
>>> str2 = "This is also a string, but built with double quotes."
>>> str3 = '''This is built using triple quotes,
... so it can span multiple lines.'''
>>> str4 = """This too
... is a multiline one
... built with triple double-quotes."""
>>> str4  # A
'This too\nis a multiline one\nbuilt with triple double-quotes.'
>>> print(str4)  # B
This too
is a multiline one
built with triple double-quotes.
In#Aand#B,weprintstr4,firstimplicitly,andthenexplicitly,usingtheprintfunction.Aniceexercisewouldbetofindoutwhytheyaredifferent.Areyouuptothechallenge?(hint:lookupthestrfunction.)
Strings,likeanysequence,havealength.Youcangetthisbycallingthelenfunction:
>>>len(str1)
49
Encoding and decoding strings
Using the encode/decode methods, we can encode Unicode strings and decode bytes objects. UTF-8 is a variable-length character encoding, capable of encoding all possible Unicode code points. It is the dominant encoding for the web. Notice also that by prefixing a string literal with b, we're creating a bytes object:
>>> s = "This is üŋíc0de"  # unicode string: code points
>>> type(s)
<class 'str'>
>>> encoded_s = s.encode('utf-8')  # utf-8 encoded version of s
>>> encoded_s
b'This is \xc3\xbc\xc5\x8b\xc3\xadc0de'  # result: bytes object
>>> type(encoded_s)  # another way to verify it
<class 'bytes'>
>>> encoded_s.decode('utf-8')  # let's revert to the original
'This is üŋíc0de'
>>> bytes_obj = b"A bytes object"  # a bytes object
>>> type(bytes_obj)
<class 'bytes'>
Indexing and slicing strings
When manipulating sequences, it's very common to have to access them at one precise position (indexing), or to get a subsequence out of them (slicing). When dealing with immutable sequences, both operations are read-only.
While indexing comes in one form, a zero-based access to any position within the sequence, slicing comes in different forms. When you get a slice of a sequence, you can specify the start and stop positions, and the step. They are separated with a colon (:) like this: my_sequence[start:stop:step]. All the arguments are optional, start is inclusive, and stop is exclusive. It's much easier to show an example, rather than explain them further in words:
>>> s = "The trouble is you think you have time."
>>> s[0]  # indexing at position 0, which is the first char
'T'
>>> s[5]  # indexing at position 5, which is the sixth char
'r'
>>> s[:4]  # slicing, we specify only the stop position
'The '
>>> s[4:]  # slicing, we specify only the start position
'trouble is you think you have time.'
>>> s[2:14]  # slicing, both start and stop positions
'e trouble is'
>>> s[2:14:3]  # slicing, start, stop and step (every 3 chars)
'erb '
>>> s[:]  # quick way of making a copy
'The trouble is you think you have time.'
Of all the lines, the last one is probably the most interesting. If you don't specify a parameter, Python will fill in the default for you. In this case, start will be the start of the string, stop will be the end of the string, and step will be the default 1. This is an easy and quick way of obtaining a copy of a sequence: for a mutable sequence such as a list, you get a new object with the same value, while for an immutable string, CPython can simply hand back the very same object. Can you find a way to get the reversed copy of a string using slicing (don't look it up; find it for yourself)?
String formatting
One of the features strings have is the ability to be used as a template. There are several different ways of formatting a string, and for the full list of possibilities, I encourage you to look up the documentation. Here are some common examples:
>>> greet_old = 'Hello %s!'
>>> greet_old % 'Fabrizio'
'Hello Fabrizio!'
>>> greet_positional = 'Hello {} {}!'
>>> greet_positional.format('Fabrizio', 'Romano')
'Hello Fabrizio Romano!'
>>> greet_positional_idx = 'This is {0}! {1} loves {0}!'
>>> greet_positional_idx.format('Python', 'Fabrizio')
'This is Python! Fabrizio loves Python!'
>>> greet_positional_idx.format('Coffee', 'Fab')
'This is Coffee! Fab loves Coffee!'
>>> keyword = 'Hello, my name is {name} {last_name}'
>>> keyword.format(name='Fabrizio', last_name='Romano')
'Hello, my name is Fabrizio Romano'
In the previous example, you can see four different ways of formatting strings. The first one, which relies on the % operator, is the oldest style; it is still supported, but it is generally discouraged in new code. The more modern way to format a string is by using the format string method. You can see, from the different examples, that a pair of curly braces acts as a placeholder within the string. When we call format, we feed it data that replaces the placeholders. We can specify indexes (and much more) within the curly braces, and even names, which implies we'll have to call format using keyword arguments instead of positional ones.
Notice how greet_positional_idx is rendered differently by feeding different data to the call to format. Apparently, I'm into Python and coffee... big surprise!
One last feature I want to show you is a relatively new addition to Python (Version 3.6) and it's called formatted string literals. This feature is quite cool: strings are prefixed with f, and contain replacement fields surrounded by curly braces. Replacement fields are expressions evaluated at runtime, and then formatted using the format protocol:
>>> name = 'Fab'
>>> age = 42
>>> f"Hello! My name is {name} and I'm {age}"
"Hello! My name is Fab and I'm 42"
>>> from math import pi
>>> f"No arguing with {pi}, it's irrational..."
"No arguing with 3.141592653589793, it's irrational..."
Check out the official documentation to learn everything about string formatting and how powerful it can be.
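As a small taste of the "much more" that can go inside the curly braces, here is a sketch of a few format specifications. The values and variable names (price, label) are made up for illustration; the same specifications work both in formatted string literals and with the format method.

```python
from math import pi

price = 1234567.891   # illustrative value
label = 'total'       # illustrative value

two_decimals = f"{pi:.2f}"               # fixed-point, 2 digits after the point
thousands = f"{price:,.2f}"              # comma as thousands separator
padded = f"{label:>10}|"                 # right-aligned in a 10-character field
same_with_format = "{:.2f}".format(pi)   # same spec, using str.format

print(two_decimals)      # 3.14
print(thousands)         # 1,234,567.89
print(padded)            #      total|
print(same_with_format)  # 3.14
```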
Tuples
The last immutable sequence type we're going to see is the tuple. A tuple is a sequence of arbitrary Python objects. In a tuple, items are separated by commas. They are used everywhere in Python, because they allow for patterns that are hard to reproduce in other languages. Sometimes tuples are used implicitly; for example, to set up multiple variables on one line, or to allow a function to return multiple different objects (in many other languages, a function usually returns one object only), and even in the Python console, you can use tuples implicitly to print multiple elements with one single instruction. We'll see examples of several of these cases:
>>> t = ()  # empty tuple
>>> type(t)
<class 'tuple'>
>>> one_element_tuple = (42, )  # you need the comma!
>>> three_elements_tuple = (1, 3, 5)  # parentheses are optional here
>>> a, b, c = 1, 2, 3  # tuple for multiple assignment
>>> a, b, c  # implicit tuple to print with one instruction
(1, 2, 3)
>>> 3 in three_elements_tuple  # membership test
True
Notice that the membership operator in can also be used with lists, strings, dictionaries, and, in general, with collection and sequence objects.
Notice that to create a tuple with one item, we need to put that comma after the item. The reason is that without the comma that item is just itself wrapped in parentheses, kind of in a redundant mathematical expression. Notice also that on assignment, parentheses are optional so my_tuple = 1, 2, 3 is the same as my_tuple = (1, 2, 3).
One thing that tuple assignment allows us to do is one-line swaps, with no need for a third temporary variable. Let's see first a more traditional way of doing it:
>>> a, b = 1, 2
>>> c = a  # we need three lines and a temporary var c
>>> a = b
>>> b = c
>>> a, b  # a and b have been swapped
(2, 1)
And now let's see how we would do it in Python:
>>> a, b = 0, 1
>>> a, b = b, a  # this is the Pythonic way to do it
>>> a, b
(1, 0)
Take a look at the line that shows you the Pythonic way of swapping two values. Do you remember what I wrote in Chapter 1, A Gentle Introduction to Python? A Python program is typically one-fifth to one-third the size of equivalent Java or C++ code, and features like one-line swaps contribute to this. Python is elegant, where elegance in this context also means economy.
Because they are immutable, tuples can be used as keys for dictionaries (we'll see this shortly). To me, tuples are Python's built-in data that most closely represent a mathematical vector. This doesn't mean that this was the reason for which they were created though. Tuples usually contain a heterogeneous sequence of elements, while on the other hand, lists are most of the time homogeneous. Moreover, tuples are normally accessed via unpacking or indexing, while lists are usually iterated over.
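Two of the uses mentioned above, returning several objects from a function and using tuples as dictionary keys, can be sketched as follows. The function min_max and the grid dictionary are hypothetical names invented for this illustration (functions are covered properly later in the book).

```python
def min_max(values):
    """Return the smallest and the largest value as a 2-tuple."""
    return min(values), max(values)  # the comma is what builds the tuple


low, high = min_max([4, 1, 7, 3])  # unpack the returned tuple into two names

# Tuples of hashable items are hashable, so they can be dictionary keys.
grid = {(0, 0): 'origin', (1, 0): 'east'}

print(low, high)      # 1 7
print(grid[(1, 0)])   # east
```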
Mutable sequences
Mutable sequences differ from their immutable sisters in that they can be changed after creation. There are two mutable sequence types in Python: lists and byte arrays. I said before that the dictionary is the king of data structures in Python. I guess this makes the list its rightful queen.
Lists
Python lists are mutable sequences. They are very similar to tuples, but they don't have the restrictions of immutability. Lists are commonly used to store collections of homogeneous objects, but there is nothing preventing you from storing heterogeneous collections as well. Lists can be created in many different ways. Let's see an example:
>>> []  # empty list
[]
>>> list()  # same as []
[]
>>> [1, 2, 3]  # as with tuples, items are comma separated
[1, 2, 3]
>>> [x + 5 for x in [2, 3, 4]]  # Python is magic
[7, 8, 9]
>>> list((1, 3, 5, 7, 9))  # list from a tuple
[1, 3, 5, 7, 9]
>>> list('hello')  # list from a string
['h', 'e', 'l', 'l', 'o']
In the previous example, I showed you how to create a list using different techniques. I would like you to take a good look at the line that says Python is magic, which I am not expecting you to fully understand at this point (unless you cheated and you're not a novice!). That is called a list comprehension, a very powerful functional feature of Python, which we'll see in detail in Chapter 5, Saving Time and Memory. I just wanted to make your mouth water at this point.
Creatinglistsisgood,buttherealfuncomeswhenweusethem,solet'sseethemainmethodstheygiftuswith:
>>>a=[1,2,1,3]
>>>a.append(13)#wecanappendanythingattheend
>>>a
[1,2,1,3,13]
>>>a.count(1)#howmany`1`arethereinthelist?
2
>>>a.extend([5,7])#extendthelistbyanother(orsequence)
>>>a
[1,2,1,3,13,5,7]
>>>a.index(13)#positionof`13`inthelist(0-basedindexing)
4
>>>a.insert(0,17)#insert`17`atposition0
>>>a
[17,1,2,1,3,13,5,7]
>>>a.pop()#pop(removeandreturn)lastelement
7
>>>a.pop(3)#popelementatposition3
1
>>>a
[17,1,2,3,13,5]
>>>a.remove(17)#remove`17`fromthelist
>>>a
[1,2,3,13,5]
>>>a.reverse()#reversetheorderoftheelementsinthelist
>>>a
[5,13,3,2,1]
>>>a.sort()#sortthelist
>>>a
[1,2,3,5,13]
>>>a.clear()#removeallelementsfromthelist
>>>a
[]
Theprecedingcodegivesyouaroundupofalist'smainmethods.Iwanttoshowyouhowpowerfultheyare,usingextendasanexample.Youcanextendlistsusinganysequencetype:
>>>a=list('hello')#makesalistfromastring
>>>a
['h','e','l','l','o']
>>>a.append(100)#append100,heterogeneoustype
>>>a
['h','e','l','l','o',100]
>>>a.extend((1,2,3))#extendusingtuple
>>>a
['h','e','l','l','o',100,1,2,3]
>>>a.extend('...')#extendusingstring
>>>a
['h','e','l','l','o',100,1,2,3,'.','.','.']
Now,let'sseewhatarethemostcommonoperationsyoucandowithlists:
>>>a=[1,3,5,7]
>>>min(a)#minimumvalueinthelist
1
>>>max(a)#maximumvalueinthelist
7
>>>sum(a)#sumofallvaluesinthelist
16
>>>len(a)#numberofelementsinthelist
4
>>>b=[6,7,8]
>>>a+b#`+`withlistmeansconcatenation
[1,3,5,7,6,7,8]
>>>a*2#`*`hasalsoaspecialmeaning
[1,3,5,7,1,3,5,7]
The last two lines in the preceding code are quite interesting because they introduce us to a concept called operator overloading. In short, it means that operators such as +, -, *, %, and so on, may represent different operations according to the context they are used in. It doesn't make any sense to sum two lists, right? Therefore, the + sign is used to concatenate them. Likewise, the * sign is used to concatenate the list to itself as many times as the right operand specifies.
Now, let's take a step further and see something a little more interesting. I want to show you how powerful the sorted function can be and how easy it is in Python to achieve results that require a great deal of effort in other languages:
>>> from operator import itemgetter
>>> a = [(5, 3), (1, 3), (1, 2), (2, -1), (4, 9)]
>>> sorted(a)
[(1, 2), (1, 3), (2, -1), (4, 9), (5, 3)]
>>> sorted(a, key=itemgetter(0))
[(1, 3), (1, 2), (2, -1), (4, 9), (5, 3)]
>>> sorted(a, key=itemgetter(0, 1))
[(1, 2), (1, 3), (2, -1), (4, 9), (5, 3)]
>>> sorted(a, key=itemgetter(1))
[(2, -1), (1, 2), (5, 3), (1, 3), (4, 9)]
>>> sorted(a, key=itemgetter(1), reverse=True)
[(4, 9), (5, 3), (1, 3), (1, 2), (2, -1)]
The preceding code deserves a little explanation. First of all, a is a list of tuples. This means each element in a is a tuple (a 2-tuple, to be precise). When we call sorted(some_list), we get a sorted version of some_list. In this case, the sorting on a 2-tuple works by sorting them on the first item in the tuple, and on the second when the first one is the same. You can see this behavior in the result of sorted(a), which yields [(1, 2), (1, 3), ...]. Python also gives us the ability to control which element(s) of the tuple the sorting must be run against. Notice that when we instruct the sorted function to work on the first element of each tuple (by key=itemgetter(0)), the result is different: [(1, 3), (1, 2), ...]. The sorting is done only on the first element of each tuple (which is the one at position 0). If we want to replicate the default behavior of a simple sorted(a) call, we need to use key=itemgetter(0, 1), which tells Python to sort first on the elements at position 0 within the tuples, and then on those at position 1. Compare the results and you'll see they match.
For completeness, I included an example of sorting only on the elements at position 1, and the same but in reverse order. If you have ever seen sorting in Java, I expect you to be quite impressed at this moment.
The Python sorting algorithm is very powerful, and it was written by Tim Peters (we've already seen this name, can you recall when?). It is aptly named Timsort, and it is a blend between merge and insertion sort and has better time performance than most other algorithms used for mainstream programming languages. Timsort is a stable sorting algorithm, which means that when multiple records have the same key, their original order is preserved. We've seen this in the result of sorted(a, key=itemgetter(0)), which has yielded [(1, 3), (1, 2), ...], in which the order of those two tuples has been preserved because they have the same value at position 0.
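Stability is also what makes multi-pass sorting work: sort on the secondary criterion first, then on the primary one, and ties on the primary key keep the order set by the earlier pass. The following is a minimal sketch with made-up (name, age) records.

```python
from operator import itemgetter

# Illustrative (name, age) records
people = [('Ann', 30), ('Bob', 25), ('Cid', 30), ('Dee', 25)]

# One pass on age: people with equal ages keep their original relative order.
by_age = sorted(people, key=itemgetter(1))

# Two passes: names descending first, then age ascending.
by_name_desc = sorted(people, key=itemgetter(0), reverse=True)
by_age_then_name_desc = sorted(by_name_desc, key=itemgetter(1))

print(by_age)
# [('Bob', 25), ('Dee', 25), ('Ann', 30), ('Cid', 30)]
print(by_age_then_name_desc)
# [('Dee', 25), ('Bob', 25), ('Cid', 30), ('Ann', 30)]
```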
Byte arrays
To conclude our overview of mutable sequence types, let's spend a couple of minutes on the bytearray type. Basically, they represent the mutable version of bytes objects. They expose most of the usual methods of mutable sequences as well as most of the methods of the bytes type. Items are integers in the range [0, 256).
When it comes to intervals, I'm going to use the standard notation for open/closed ranges. A square bracket on one end means that the value is included, while a round bracket means it's excluded. The granularity is usually inferred by the type of the edge elements so, for example, the interval [3, 7] means all integers between 3 and 7, inclusive. On the other hand, (3, 7) means all integers between 3 and 7 exclusive (hence 4, 5, and 6). Items in a bytearray type are integers between 0 and 256; 0 is included, 256 is not. One reason intervals are often expressed like this is to ease coding. If we break a range [a, b) into N consecutive ranges, we can easily represent the original one as a concatenation like this:
[a, k1) + [k1, k2) + [k2, k3) + ... + [kN-1, b)
The middle points (ki) being excluded on one end, and included on the other end, allow for easy concatenation and splitting when intervals are handled in the code.
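Python's range follows the same half-open convention (start included, stop excluded), so the concatenation property can be checked directly. This is only a sketch; the cut points are arbitrary, and it borrows the list comprehension syntax that is explained later in the book.

```python
a, b = 0, 10
cuts = [3, 7]  # arbitrary middle points k1, k2

edges = [a, *cuts, b]  # [0, 3, 7, 10]

# Build the consecutive pieces [0, 3), [3, 7), [7, 10).
pieces = [list(range(lo, hi)) for lo, hi in zip(edges, edges[1:])]

# Concatenating the pieces gives back [a, b) with no gaps and no duplicates.
rejoined = [n for piece in pieces for n in piece]

print(pieces)                          # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
print(rejoined == list(range(a, b)))   # True
```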
Let'sseeaquickexamplewiththebytearraytype:
>>>bytearray()#emptybytearrayobject
bytearray(b'')
>>>bytearray(10)#zero-filledinstancewithgivenlength
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
>>>bytearray(range(5))#bytearrayfromiterableofintegers
bytearray(b'\x00\x01\x02\x03\x04')
>>>name=bytearray(b'Lina')#A-bytearrayfrombytes
>>>name.replace(b'L',b'l')
bytearray(b'lina')
>>>name.endswith(b'na')
True
>>>name.upper()
bytearray(b'LINA')
>>>name.count(b'L')
1
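As you can see in the preceding code, there are a few ways to create a bytearray object. They can be useful in many situations; for example, when receiving data through a socket, they eliminate the need to concatenate data while polling, hence they can prove to be very handy. On the line #A, I created a bytearray called name from the bytes literal b'Lina' to show you how the bytearray object exposes methods from both sequences and strings, which is extremely handy. Notice that methods such as replace and upper return a new bytearray and leave name untouched, which is why the final count of b'L' is still 1. If you think about it, they can be considered as mutable strings.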
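The socket scenario can be sketched without any networking by using a list of byte chunks as a stand-in for what successive recv calls might return. The chunk contents here are invented for illustration.

```python
# Stand-in for chunks that would arrive from successive socket reads.
chunks = [b'GET /index', b'.html HTTP', b'/1.1\r\n']

buffer = bytearray()
for chunk in chunks:
    buffer.extend(chunk)  # the same object grows in place

buffer[0:3] = b'PUT'      # slice assignment works, unlike with bytes
message = bytes(buffer)   # make an immutable bytes object when done

print(message)  # b'PUT /index.html HTTP/1.1\r\n'
```

With an immutable bytes object, each concatenation would create a brand new object, which is the overhead a bytearray avoids.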
Set types
Python also provides two set types, set and frozenset. The set type is mutable, while frozenset is immutable. They are unordered collections of hashable objects (which, among the built-in types, means immutable ones). Hashability is a characteristic that allows an object to be used as a set member as well as a key for a dictionary, as we'll see very soon.
From the official documentation: "An object is hashable if it has a hash value which never changes during its lifetime, and can be compared to other objects. Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally. All of Python's immutable built-in objects are hashable while mutable containers are not."
Objects that compare equal must have the same hash value. Sets are very commonly used to test for membership, so let's introduce the in operator in the following example:
>>> small_primes = set()  # empty set
>>> small_primes.add(2)  # adding one element at a time
>>> small_primes.add(3)
>>> small_primes.add(5)
>>> small_primes
{2, 3, 5}
>>> small_primes.add(1)  # Look what I've done, 1 is not a prime!
>>> small_primes
{1, 2, 3, 5}
>>> small_primes.remove(1)  # so let's remove it
>>> 3 in small_primes  # membership test
True
>>> 4 in small_primes
False
>>> 4 not in small_primes  # negated membership test
True
>>> small_primes.add(3)  # trying to add 3 again
>>> small_primes
{2, 3, 5}  # no change, duplication is not allowed
>>> bigger_primes = set([5, 7, 11, 13])  # faster creation
>>> small_primes | bigger_primes  # union operator `|`
{2, 3, 5, 7, 11, 13}
>>> small_primes & bigger_primes  # intersection operator `&`
{5}
>>> small_primes - bigger_primes  # difference operator `-`
{2, 3}
In the preceding code, you can see two different ways to create a set. One creates an empty set and then adds elements one at a time. The other creates the set using a list of numbers as an argument to the constructor, which does all the work for us. Of course, you can create a set from a list or tuple (or any iterable) and then you can add and remove members from the set as you please.
We'll look at iterable objects and iteration in the next chapter. For now, just know that iterable objects are objects you can iterate over, one element at a time.
Another way of creating a set is by simply using the curly braces notation, like this:
>>> small_primes = {2, 3, 5, 5, 3}
>>> small_primes
{2, 3, 5}
Notice I added some duplication to emphasize that the resulting set won't have any. Let's see an example about the immutable counterpart of the set type, frozenset:
>>> small_primes = frozenset([2, 3, 5, 7])
>>> bigger_primes = frozenset([5, 7, 11])
>>> small_primes.add(11)  # we cannot add to a frozenset
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
>>> small_primes.remove(2)  # nor can we remove
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'remove'
>>> small_primes & bigger_primes  # intersect, union, etc. allowed
frozenset({5, 7})
As you can see, frozenset objects are quite limited compared to their mutable counterpart. They still prove very effective for membership tests, union, intersection, and difference operations, and for performance reasons.
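One practical consequence of immutability is that a frozenset is hashable, so it can be used where a regular set cannot: as a dictionary key or as a member of another set. The sketch below uses invented names (team_scores and so on) and relies on try/except, which is covered later in the book.

```python
# A frozenset can be a dictionary key; the order of its members is irrelevant.
team_scores = {frozenset({'ann', 'bob'}): 3}
score = team_scores[frozenset({'bob', 'ann'})]

# Hashable objects have a hash value...
tuple_is_hashable = isinstance(hash((1, 2)), int)

# ...while mutable containers such as sets raise TypeError.
try:
    hash({'ann', 'bob'})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

print(score)              # 3
print(tuple_is_hashable)  # True
print(set_is_hashable)    # False
```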
Mapping types – dictionaries
Of all the built-in Python data types, the dictionary is easily the most interesting one. It's the only standard mapping type, and it is the backbone of every Python object.
A dictionary maps keys to values. Keys need to be hashable objects, while values can be of any arbitrary type. Dictionaries are mutable objects. There are quite a few different ways to create a dictionary, so let me give you a simple example of how to create a dictionary equal to {'A': 1, 'Z': -1} in five different ways:
>>> a = dict(A=1, Z=-1)
>>> b = {'A': 1, 'Z': -1}
>>> c = dict(zip(['A', 'Z'], [1, -1]))
>>> d = dict([('A', 1), ('Z', -1)])
>>> e = dict({'Z': -1, 'A': 1})
>>> a == b == c == d == e  # are they all the same?
True  # They are indeed
Have you noticed those double equals? Assignment is done with one equal, while to check whether an object is the same as another one (or five in one go, in this case), we use double equals. There is also another way to compare objects, which involves the is operator, and checks whether the two objects are the same (if they have the same ID, not just the value), but unless you have a good reason to use it, you should use the double equals instead. In the preceding code, I also used one nice function: zip. It is named after the real-life zip, which glues together two things taking one element from each at a time. Let me show you an example:
>>> list(zip(['h', 'e', 'l', 'l', 'o'], [1, 2, 3, 4, 5]))
[('h', 1), ('e', 2), ('l', 3), ('l', 4), ('o', 5)]
>>> list(zip('hello', range(1, 6)))  # equivalent, more Pythonic
[('h', 1), ('e', 2), ('l', 3), ('l', 4), ('o', 5)]
In the preceding example, I have created the same list in two different ways, one more explicit, and the other a little bit more Pythonic. Forget for a moment that I had to wrap the list constructor around the zip call (the reason is because zip returns an iterator, not a list, so if I want to see the result I need to exhaust that iterator into something, a list in this case), and concentrate on the result. See how zip has coupled the first elements of its two arguments together, then the second ones, then the third ones, and so on and so forth? Take a look at your pants (or at your purse, if you're a lady) and you'll see the same behavior in your actual zip. But let's go back to dictionaries and see how many wonderful methods they expose for allowing us to manipulate them as we want.
Let's start with the basic operations:
>>> d = {}
>>> d['a'] = 1  # let's set a couple of (key, value) pairs
>>> d['b'] = 2
>>> len(d)  # how many pairs?
2
>>> d['a']  # what is the value of 'a'?
1
>>> d  # how does `d` look now?
{'a': 1, 'b': 2}
>>> del d['a']  # let's remove `a`
>>> d
{'b': 2}
>>> d['c'] = 3  # let's add 'c': 3
>>> 'c' in d  # membership is checked against the keys
True
>>> 3 in d  # not the values
False
>>> 'e' in d
False
>>> d.clear()  # let's clean everything from this dictionary
>>> d
{}
Notice how accessing keys of a dictionary, regardless of the type of operation we're performing, is done through square brackets. Do you remember strings, lists, and tuples? We were accessing elements at some position through square brackets as well, which is yet another example of Python's consistency.
Let's see now three special objects called dictionary views: keys, values, and items. These objects provide a dynamic view of the dictionary entries and they change when the dictionary changes. keys() returns all the keys in the dictionary, values() returns all the values in the dictionary, and items() returns all the (key, value) pairs in the dictionary.
According to the Python documentation: "Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond."
Enough with this chatter; let's put all this down into code:
>>> d = dict(zip('hello', range(5)))
>>> d
{'h': 0, 'e': 1, 'l': 3, 'o': 4}
>>> d.keys()
dict_keys(['h', 'e', 'l', 'o'])
>>> d.values()
dict_values([0, 1, 3, 4])
>>> d.items()
dict_items([('h', 0), ('e', 1), ('l', 3), ('o', 4)])
>>> 3 in d.values()
True
>>> ('o', 4) in d.items()
True
There are a few things to notice in the preceding code. First, notice how we're creating a dictionary by iterating over the zipped version of the string 'hello' and range(5), which yields the integers 0, 1, 2, 3, 4. The string 'hello' has two 'l' characters inside, and they are paired up with the values 2 and 3 by the zip function. Notice how in the dictionary, the second occurrence of the 'l' key (the one with value 3), overwrites the first one (the one with value 2). Another thing to notice is that when asking for any view, the original order is now preserved, while before Version 3.6 there was no guarantee of that.
As of Python 3.6, the dict type has been reimplemented to use a more compact representation. This resulted in dictionaries using 20% to 25% less memory when compared to Python 3.5. Moreover, in Python 3.6, as a side effect, dictionaries are natively ordered. This feature has received such a welcome from the community that in 3.7 it has become a legit feature of the language rather than an implementation side effect. A dict is ordered if it remembers the order in which keys were first inserted.
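The dynamic nature of dictionary views described above is easy to verify: a view created before a change still reflects the dictionary after it. The following is a minimal sketch (assuming Python 3.7 or later for the insertion order) with an invented dictionary.

```python
d = {'a': 1, 'b': 2}
keys = d.keys()    # a live view, not a snapshot
items = d.items()

d['c'] = 3         # change the dictionary after creating the views

keys_after = list(keys)
has_new_pair = ('c', 3) in items
common = keys & {'a', 'c', 'z'}  # keys views support set operations

print(keys_after)    # ['a', 'b', 'c']
print(has_new_pair)  # True
print(common == {'a', 'c'})  # True
```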
We'll see how these views are fundamental tools when we talk about iterating over collections. Let's take a look now at some other methods exposed by Python's dictionaries; there's plenty of them and they are very useful:
>>> d
{'h': 0, 'e': 1, 'l': 3, 'o': 4}
>>> d.popitem()  # removes the last inserted item (as of Python 3.7)
('o', 4)
>>> d
{'h': 0, 'e': 1, 'l': 3}
>>> d.pop('l')  # remove item with key `l`
3
>>> d.pop('not-a-key')  # remove a key not in dictionary: KeyError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'not-a-key'
>>> d.pop('not-a-key', 'default-value')  # with a default value?
'default-value'  # we get the default value
>>> d.update({'another': 'value'})  # we can update dict this way
>>> d.update(a=13)  # or this way (like a function call)
>>> d
{'h': 0, 'e': 1, 'another': 'value', 'a': 13}
>>> d.get('a')  # same as d['a'] but if key is missing no KeyError
13
>>> d.get('a', 177)  # default value used if key is missing
13
>>> d.get('b', 177)  # like in this case
177
>>> d.get('b')  # key is not there, so None is returned
All these methods are quite simple to understand, but it's worth talking about that None, for a moment. Every function in Python returns None, unless the return statement is explicitly used to return something else, but we'll see this when we explore functions. None is frequently used to represent the absence of a value, and it is quite commonly used as a default value for arguments in function declarations. Inexperienced coders sometimes write code that returns either False or None. Both False and None evaluate to False in a Boolean context so it may seem there is not much difference between them. But actually, I would argue there is quite an important difference: False means that we have information, and the information we have is False. None means no information. And no information is very different from information that is False. In layman's terms, if you ask your mechanic, Is my car ready?, there is a big difference between the answer, No, it's not (False) and, I have no idea (None).
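The mechanic analogy can be turned into a small sketch. The function is_car_ready, the plates, and the repairs dictionary are all hypothetical; the point is that checking with is None keeps "no information" separate from a genuine False.

```python
def is_car_ready(plate, repairs):
    """Return True or False when the status is known, None when it is not."""
    return repairs.get(plate)  # dict.get returns None for a missing key


# Hypothetical data: plate -> whether the repair is finished
repairs = {'AB123': True, 'CD456': False}

answers = {}
for plate in ('AB123', 'CD456', 'ZZ999'):
    ready = is_car_ready(plate, repairs)
    if ready is None:      # identity check: only None matches here
        answers[plate] = 'I have no idea'
    elif ready:
        answers[plate] = "Yes, it's ready"
    else:
        answers[plate] = "No, it's not"

print(answers)
```

A plain `if not ready:` test would have lumped the unknown plate together with the car that is known not to be ready.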
One last method I really like about dictionaries is setdefault. It behaves like get, but also sets the key with the given value if it is not there. Let's see an example:
>>> d = {}
>>> d.setdefault('a', 1)  # 'a' is missing, we get default value
1
>>> d
{'a': 1}  # also, the key/value pair ('a', 1) has now been added
>>> d.setdefault('a', 5)  # let's try to override the value
1
>>> d
{'a': 1}  # no override, as expected
So, we're now at the end of this tour. Test your knowledge about dictionaries by trying to foresee what d looks like after this line:
>>> d = {}
>>> d.setdefault('a', {}).setdefault('b', []).append(1)
Don't worry if you don't get it immediately. I just wanted to encourage you to experiment with dictionaries.
This concludes our tour of built-in data types. Before I discuss some considerations about what we've seen in this chapter, I want to take a peek briefly at the collections module.
The collections module
When Python's general-purpose built-in containers (tuple, list, set, and dict) aren't enough, we can find specialized container data types in the collections module. They are:
Data type       Description
namedtuple()    Factory function for creating tuple subclasses with named fields
deque           List-like container with fast appends and pops on either end
ChainMap        Dictionary-like class for creating a single view of multiple mappings
Counter         Dictionary subclass for counting hashable objects
OrderedDict     Dictionary subclass that remembers the order entries were added
defaultdict     Dictionary subclass that calls a factory function to supply missing values
UserDict        Wrapper around dictionary objects for easier dictionary subclassing
UserList        Wrapper around list objects for easier list subclassing
UserString      Wrapper around string objects for easier string subclassing
We don't have the room to cover all of them, but you can find plenty of examples in the official documentation, so here I'll just give a small example to show you namedtuple, defaultdict, and ChainMap.
namedtuple
A namedtuple is a tuple-like object that has fields accessible by attribute lookup as well as being indexable and iterable (it's actually a subclass of tuple). This is sort of a compromise between a full-fledged object and a tuple, and it can be useful in those cases where you don't need the full power of a custom object, but you want your code to be more readable by avoiding weird indexing. Another use case is when there is a chance that items in the tuple need to change their position after refactoring, forcing the coder to refactor also all the logic involved, which can be very tricky. As usual, an example is better than a thousand words (or was it a picture?). Say we are handling data about the left and right eyes of a patient. We save one value for the left eye (position 0) and one for the right eye (position 1) in a regular tuple. Here's how that might be:
>>> vision = (9.5, 8.8)
>>> vision
(9.5, 8.8)
>>> vision[0]  # left eye (implicit positional reference)
9.5
>>> vision[1]  # right eye (implicit positional reference)
8.8
Now let's pretend we handle vision objects all the time, and at some point the designer decides to enhance them by adding information for the combined vision, so that a vision object stores data in this format: (left eye, combined, right eye).
Do you see the trouble we're in now? We may have a lot of code that depends on vision[0] being the left eye information (which it still is) and vision[1] being the right eye information (which is no longer the case). We have to refactor our code wherever we handle these objects, changing vision[1] to vision[2], and it can be painful. We could have probably approached this a bit better from the beginning, by using a namedtuple. Let me show you what I mean:
>>> from collections import namedtuple
>>> Vision = namedtuple('Vision', ['left', 'right'])
>>> vision = Vision(9.5, 8.8)
>>> vision[0]
9.5
>>> vision.left  # same as vision[0], but explicit
9.5
>>> vision.right  # same as vision[1], but explicit
8.8
If within our code, we refer to the left and right eyes using vision.left and vision.right, all we need to do to fix the new design issue is to change our factory and the way we create instances. The rest of the code won't need to change:
>>> Vision = namedtuple('Vision', ['left', 'combined', 'right'])
>>> vision = Vision(9.5, 9.2, 8.8)
>>> vision.left  # still correct
9.5
>>> vision.right  # still correct (though now is vision[2])
8.8
>>> vision.combined  # the new vision[1]
9.2
You can see how convenient it is to refer to those values by name rather than by position. After all, a wise man once wrote, Explicit is better than implicit (can you recall where? Think Zen if you can't...). This example may be a little extreme; of course, it's not likely that our code designer will go for a change like this, but you'd be amazed to see how frequently issues similar to this one happen in a professional environment, and how painful it is to refactor them.
defaultdict
The defaultdict data type is one of my favorites. It allows you to avoid checking if a key is in a dictionary by simply inserting it for you on your first access attempt, with a default value whose type you pass on creation. In some cases, this tool can be very handy and shorten your code a little. Let's see a quick example. Say we are updating the value of age, by adding one year. If age is not there, we assume it was 0 and we update it to 1:
>>> d = {}
>>> d['age'] = d.get('age', 0) + 1  # age not there, we get 0 + 1
>>> d
{'age': 1}
>>> d = {'age': 39}
>>> d['age'] = d.get('age', 0) + 1  # age is there, we get 40
>>> d
{'age': 40}
The second line of the preceding example is actually the short version of a four-line if clause that we would have to write if dictionaries didn't have the get method (we'll see all about if clauses in Chapter 3, Iterating and Making Decisions). Now let's see how it would work with a defaultdict data type:
>>> from collections import defaultdict
>>> dd = defaultdict(int)  # int is the default type (0 the value)
>>> dd['age'] += 1  # short for dd['age'] = dd['age'] + 1
>>> dd
defaultdict(<class 'int'>, {'age': 1})  # 1, as expected
Notice how we just need to instruct the defaultdict factory that we want an int number to be used in case the key is missing (we'll get 0, which is the default for the int type). Also, notice that even though in this example there is no gain on the number of lines, there is definitely a gain in readability, which is very important. You can also use a different technique to instantiate a defaultdict data type, which involves creating a factory object. To dig deeper, please refer to the official documentation.
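As a sketch of the factory idea just mentioned: the argument to defaultdict can be any callable that takes no arguments, such as list or a small lambda. The word list and the settings dictionary below are invented for illustration.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# list as the factory: every missing key starts as an empty list.
by_initial = defaultdict(list)
for word in words:
    by_initial[word[0]].append(word)

# A custom factory: any zero-argument callable will do.
settings = defaultdict(lambda: 'unknown')
settings['color'] = 'blue'
missing = settings['size']  # not there, so the factory supplies 'unknown'

print(dict(by_initial))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(missing)             # unknown
print('size' in settings)  # True: the access inserted the key
```

Note the last line: simply reading a missing key adds it to the dictionary, which is worth keeping in mind when membership tests matter.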
ChainMap
ChainMap is an extremely nice data type which was introduced in Python 3.3. It behaves like a normal dictionary but, according to the Python documentation, it "is provided for quickly linking a number of mappings so they can be treated as a single unit". This is usually much faster than creating one dictionary and running multiple update calls on it. ChainMap can be used to simulate nested scopes and is useful in templating. The underlying mappings are stored in a list. That list is public and can be accessed or updated using the maps attribute. Lookups search the underlying mappings successively until a key is found. By contrast, writes, updates, and deletions only operate on the first mapping.
A very common use case is providing defaults, so let's see an example:
>>> from collections import ChainMap
>>> default_connection = {'host': 'localhost', 'port': 4567}
>>> connection = {'port': 5678}
>>> conn = ChainMap(connection, default_connection)  # map creation
>>> conn['port']  # port is found in the first dictionary
5678
>>> conn['host']  # host is fetched from the second dictionary
'localhost'
>>> conn.maps  # we can see the mapping objects
[{'port': 5678}, {'host': 'localhost', 'port': 4567}]
>>> conn['host'] = 'packtpub.com'  # let's add host
>>> conn.maps
[{'port': 5678, 'host': 'packtpub.com'},
 {'host': 'localhost', 'port': 4567}]
>>> del conn['port']  # let's remove the port information
>>> conn.maps
[{'host': 'packtpub.com'}, {'host': 'localhost', 'port': 4567}]
>>> conn['port']  # now port is fetched from the second dictionary
4567
>>> dict(conn)  # easy to merge and convert to regular dictionary
{'host': 'packtpub.com', 'port': 4567}
I just love how Python makes your life easy. You work on a ChainMap object, configure the first mapping as you want, and when you need a complete dictionary with all the defaults as well as the customized items, you just feed the ChainMap object to a dict constructor. If you have never coded in other languages, such as Java or C++, you probably won't be able to appreciate fully how precious this is, and how Python makes your life so much easier. I do; I feel claustrophobic every time I have to code in some other language.
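The nested scopes use mentioned earlier can be sketched with the new_child method and the parents property, both part of ChainMap. The scope contents are made up for illustration.

```python
from collections import ChainMap

global_scope = {'debug': False, 'user': 'guest'}  # illustrative values
scope = ChainMap(global_scope)

# new_child puts a fresh mapping in front of the existing ones.
inner = scope.new_child({'user': 'fab'})
inner['debug'] = True  # writes only touch the first mapping

print(inner['user'])          # fab (found in the inner mapping)
print(inner['debug'])         # True
print(global_scope['debug'])  # False: the outer scope is untouched
print(inner.parents['user'])  # guest (the view without the first mapping)
```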
EnumsTechnicallynotabuilt-indatatype,asyouhavetoimportthemfromtheenummodule,butdefinitelyworthmentioning,areenumerations.TheywereintroducedinPython3.4,andthoughitisnotthatcommontoseetheminprofessionalcode(yet),IthoughtI'dgiveyouanexampleanyway.
Theofficialdefinitiongoeslikethis:"Anenumerationisasetofsymbolicnames(members)boundtounique,constantvalues.Withinanenumeration,thememberscanbecomparedbyidentity,andtheenumerationitselfcanbeiteratedover."
Sayyouneedtorepresenttrafficlights.Inyourcode,youmightresorttodoingthis:
>>>GREEN=1
>>>YELLOW=2
>>>RED=4
>>>TRAFFIC_LIGHTS=(GREEN,YELLOW,RED)
>>>#orwithadict
>>>traffic_lights={'GREEN':1,'YELLOW':2,'RED':4}
There's nothing special about the preceding code. It's something, in fact, that is very common to find. But consider doing this instead:
>>> from enum import Enum
>>> class TrafficLight(Enum):
...     GREEN = 1
...     YELLOW = 2
...     RED = 4
...
>>> TrafficLight.GREEN
<TrafficLight.GREEN: 1>
>>> TrafficLight.GREEN.name
'GREEN'
>>> TrafficLight.GREEN.value
1
>>> TrafficLight(1)
<TrafficLight.GREEN: 1>
>>> TrafficLight(4)
<TrafficLight.RED: 4>
Ignoring for a moment the (relative) complexity of a class definition, you can appreciate how this might be more advantageous. The data structure is much cleaner, and the API it provides is much more powerful. I encourage you to check out the official documentation to explore all the great features you can find in the enum module. I think it's worth exploring, at least once.
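The official definition quoted earlier mentions two properties that the console session does not exercise: members can be compared by identity, and the enumeration can be iterated over. The following is a minimal sketch of both, reusing the TrafficLight class from above; the variable names (names, current) are only illustrative.

```python
from enum import Enum


class TrafficLight(Enum):
    GREEN = 1
    YELLOW = 2
    RED = 4


# Iterating over the enumeration yields its members in definition order.
names = [light.name for light in TrafficLight]
print(names)

# Members are singletons, so identity comparison is the idiomatic check.
current = TrafficLight(4)
print(current is TrafficLight.RED)

# A member is not equal to its raw value: the Enum is its own type.
print(TrafficLight.RED == 4)

# Looking up a value that no member is bound to raises ValueError.
try:
    TrafficLight(3)
except ValueError as error:
    print('lookup failed:', error)
```

Compared with the bare constants or the dict, a typo such as TrafficLight.REDD fails loudly with an AttributeError, and a member can never be confused with an unrelated integer that happens to share its value.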
Final considerations
That's it. Now you have seen a very good proportion of the data structures that you will use in Python. I encourage you to take a dive into the Python documentation and experiment further with each and every data type we've seen in this chapter. It's worth it, believe me. Everything you'll write will be about handling data, so make sure your knowledge about it is rock solid.
Before we leap into Chapter 3, Iterating and Making Decisions, I'd like to share some final considerations about different aspects that to my mind are important and not to be neglected.
Small values caching
When we discussed objects at the beginning of this chapter, we saw that when we assign a name to an object, Python creates the object, sets its value, and then points the name to it. We can assign different names to the same value and we expect different objects to be created, like this:
>>> a = 1000000
>>> b = 1000000
>>> id(a) == id(b)
False
In the preceding example, a and b are assigned to two int objects that have the same value but are not the same object: as you can see, their id is not the same. So let's do it again:
>>> a = 5
>>> b = 5
>>> id(a) == id(b)
True
Oh, oh! Is Python broken? Why are the two objects the same now? We didn't do a = b = 5, we set them up separately. Well, the answer is performance. Python caches short strings and small numbers, to avoid having many copies of them clogging up the system memory. Everything is handled properly under the hood so you don't need to worry a bit, but make sure that you remember this behavior should your code ever need to fiddle with IDs.
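The practical consequence is that identity (is, or comparing id values) and equality (==) answer different questions. The sketch below is only an illustration and rests on an assumption worth stating: the caching of small integers (from -5 to 256) is an implementation detail of CPython, the reference interpreter, not a language guarantee, so the identity results may differ elsewhere. The integers are built with int(...) at runtime because, inside a script, the compiler may reuse a single object for equal literals.

```python
# Build the integers at runtime so the compiler cannot merge equal literals.
a = int('1000000')
b = int('1000000')

# Equality compares values and is what application code should rely on.
print(a == b)  # True

# Identity compares objects; the result depends on the interpreter's caching.
print(a is b)

small_a = int('5')
small_b = int('5')
print(small_a == small_b)  # True
# On CPython this prints True, because small integers are cached.
print(small_a is small_b)
```

In short, compare values with == and reserve is for singletons such as None.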
How to choose data structures
As we've seen, Python provides you with several built-in data types and sometimes, if you're not that experienced, choosing the one that serves you best can be tricky, especially when it comes to collections. For example, say you have many dictionaries to store, each of which represents a customer. Within each customer dictionary, there's an 'id': 'code' unique identification code. In what kind of collection would you place them? Well, unless I know more about these customers, it's very hard to answer. What kind of access will I need? What sort of operations will I have to perform on each of them, and how many times? Will the collection change over time? Will I need to modify the customer dictionaries in any way? What is going to be the most frequent operation I will have to perform on the collection?
If you can answer the preceding questions, then you will know what to choose. If the collection never shrinks or grows (in other words, it won't need to add/delete any customer object after creation) or shuffles, then tuples are a possible choice. Otherwise, lists are a good candidate. Every customer dictionary has a unique identifier though, so even a dictionary could work. Let me draft these options for you:
# example customer objects
customer1 = {'id': 'abc123', 'full_name': 'Master Yoda'}
customer2 = {'id': 'def456', 'full_name': 'Obi-Wan Kenobi'}
customer3 = {'id': 'ghi789', 'full_name': 'Anakin Skywalker'}
# collect them in a tuple
customers = (customer1, customer2, customer3)
# or collect them in a list
customers = [customer1, customer2, customer3]
# or maybe within a dictionary, they have a unique id after all
customers = {
    'abc123': customer1,
    'def456': customer2,
    'ghi789': customer3,
}
Some customers we have there, right? I probably wouldn't go with the tuple option, unless I wanted to highlight that the collection is not going to change. I'd say usually a list is better, as it allows for more flexibility.
Another factor to keep in mind is that tuples and lists are ordered collections. If you use a dictionary (prior to Python 3.6; insertion order has been a language guarantee only since 3.7) or a set, you lose the ordering, so you need to know if ordering is important in your application.
What about performance? For example, in a list, operations such as insertion and membership can take O(n), while they are O(1) for a dictionary. It's not always possible to use dictionaries though, if we don't have the guarantee that we can uniquely identify each item of the collection by means of one of its properties, and that the property in question is hashable (so it can be a key in dict).
If you're wondering what O(n) and O(1) mean, please Google big O notation. In this context, let's just say that if performing an operation Op on a data structure takes O(f(n)), it would mean that Op takes at most a time t ≤ c * f(n) to complete, where c is some positive constant, n is the size of the input, and f is some function. So, think of O(...) as an upper bound for the running time of an operation (it can be used also to size other measurable quantities, of course).
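To make the O(n) versus O(1) contrast concrete, here is a small sketch that looks up a customer by id in a list and in a dictionary keyed by id. The helper find_in_list and the names customers_list and customers_by_id are invented for this illustration; with three customers the difference is invisible, but the list scan grows with the size of the collection while the dictionary lookup does not (on average).

```python
customers_list = [
    {'id': 'abc123', 'full_name': 'Master Yoda'},
    {'id': 'def456', 'full_name': 'Obi-Wan Kenobi'},
    {'id': 'ghi789', 'full_name': 'Anakin Skywalker'},
]


def find_in_list(customers, customer_id):
    """Scan the list until the id matches: O(n) in the worst case."""
    for customer in customers:
        if customer['id'] == customer_id:
            return customer
    return None


# Index the same customers by id: building the dict is O(n), done once.
customers_by_id = {customer['id']: customer for customer in customers_list}

print(find_in_list(customers_list, 'ghi789')['full_name'])
# A dict lookup hashes the key and is O(1) on average.
print(customers_by_id['ghi789']['full_name'])
# .get() returns None instead of raising KeyError for an unknown id.
print(customers_by_id.get('zzz000'))
```

The dictionary only works here because every customer has a unique, hashable id, which is exactly the condition described above.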
Another way of understanding if you have chosen the right data structure is by looking at the code you have to write in order to manipulate it. If everything comes easily and flows naturally, then you probably have chosen correctly, but if you find yourself thinking your code is getting unnecessarily complicated, then you should probably reconsider your choices. It's quite hard to give advice without a practical case though, so when you choose a data structure for your data, try to keep ease of use and performance in mind and give precedence to what matters most in the context you are in.
About indexing and slicing
At the beginning of this chapter, we saw slicing applied on strings. Slicing, in general, applies to a sequence: tuples, lists, strings, and so on. With lists, slicing can also be used for assignment. I've almost never seen this used in professional code, but still, you know you can. Could you slice dictionaries or sets? I hear you scream, "Of course not!" Excellent; I see we're on the same page here, so let's talk about indexing.
There is one characteristic about Python indexing I haven't mentioned before. I'll show you by way of an example. How do you address the last element of a collection? Let's see:
>>> a = list(range(10))  # `a` has 10 elements. Last one is 9.
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> len(a)  # its length is 10 elements
10
>>> a[len(a) - 1]  # position of last one is len(a) - 1
9
>>> a[-1]  # but we don't need len(a)! Python rocks!
9
>>> a[-2]  # equivalent to len(a) - 2
8
>>> a[-3]  # equivalent to len(a) - 3
7
If the list a has 10 elements, because of the 0-index positioning system of Python, the first one is at position 0 and the last one is at position 9. In the preceding example, the elements are conveniently placed in a position equal to their value: 0 is at position 0, 1 at position 1, and so on.
So, in order to fetch the last element, we need to know the length of the whole list (or tuple, or string, and so on) and then subtract 1. Hence: len(a) - 1. This is so common an operation that Python provides you with a way to retrieve elements using negative indexing. This proves very useful when you do data manipulation. Here's a nice diagram about how indexing works on the string "HelloThere" (which is Obi-Wan Kenobi sarcastically greeting General Grievous):
Trying to address indexes greater than 9 or smaller than -10 will raise an IndexError, as expected.
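The relationship between positive and negative positions can also be printed out in code. This sketch assumes the greeting is the 10-character string 'HelloThere', written without a space, which is what the index bounds of 9 and -10 imply; the variable names are illustrative.

```python
greeting = 'HelloThere'  # 10 characters, written without a space (assumption)

# Print each character with its positive and its negative index.
for position, letter in enumerate(greeting):
    print(letter, position, position - len(greeting))

print(greeting[0], greeting[-10])  # both address the first character
print(greeting[9], greeting[-1])   # both address the last character

# Anything outside the range -10..9 raises IndexError.
for bad_index in (10, -11):
    try:
        greeting[bad_index]
    except IndexError as error:
        print(bad_index, '->', error)
```

For any valid negative index i, greeting[i] is the same element as greeting[len(greeting) + i].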
About the names
You may have noticed that, in order to keep the examples as short as possible, I have called many objects using simple letters, like a, b, c, d, and so on. This is perfectly OK when you debug on the console or when you show that a + b == 7, but it's bad practice when it comes to professional coding (or any type of coding, for that matter). I hope you will indulge me if I sometimes do it; the reason is to present the code in a more compact way.
In a real environment though, when you choose names for your data, you should choose them carefully and they should reflect what the data is about. So, if you have a collection of Customer objects, customers is a perfectly good name for it. Would customers_list, customers_tuple, or customers_collection work as well? Think about it for a second. Is it good to tie the name of the collection to the datatype? I don't think so, at least in most cases. So I'd say if you have an excellent reason to do so, go ahead; otherwise, don't. The reason is, once that customers_tuple starts being used in different places of your code, and you realize you actually want to use a list instead of a tuple, you're up for some fun refactoring (also known as wasted time). Names for data should be nouns, and names for functions should be verbs. Names should be as expressive as possible. Python is actually a very good example when it comes to names. Most of the time you can just guess what a function is called if you know what it does. Crazy, huh?
Chapter 2, Meaningful Names, of Clean Code by Robert C. Martin (Prentice Hall) is entirely dedicated to names. It's an amazing book that helped me improve my coding style in many different ways, and is a must-read if you want to take your coding to the next level.
Summary
In this chapter, we've explored the built-in data types of Python. We've seen how many there are and how much can be achieved by just using them in different combinations.
We've seen number types, sequences, sets, mappings, collections (and a special guest appearance by Enum), we've seen that everything is an object, we've learned the difference between mutable and immutable, and we've also learned about slicing and indexing (and, proudly, negative indexing as well).
We've presented simple examples, but there's much more that you can learn about this subject, so stick your nose into the official documentation and explore.
Most of all, I encourage you to try out all the exercises by yourself, get your fingers using that code, build some muscle memory, and experiment, experiment, experiment. Learn what happens when you divide by zero, when you combine different number types into a single expression, when you manage strings. Play with all data types. Exercise them, break them, discover all their methods, enjoy them, and learn them very, very well.
If your foundation is not rock solid, how good can your code be? And data is the foundation for everything. Data shapes what dances around it.
The more you progress with the book, the more it's likely that you will find some discrepancies or maybe a small typo here and there in my code (or yours). You will get an error message, something will break. That's wonderful! When you code, things break all the time, you debug and fix all the time, so consider errors as useful exercises to learn something new about the language you're using, and not as failures or problems. Errors will keep coming up until your very last line of code, that's for sure, so you may as well start making your peace with them now.
The next chapter is about iterating and making decisions. We'll see how actually to put those collections to use, and take decisions based on the data we're presented with. We'll start to go a little faster now that your knowledge is building up, so make sure you're comfortable with the contents of this chapter before you move to the next one. Once more, have fun, explore, break things. It's a very good way to learn.
Iterating and Making Decisions
"Insanity: doing the same thing over and over again and expecting different results." - Albert Einstein
In the previous chapter, we looked at Python's built-in data types. Now that you're familiar with data in its many forms and shapes, it's time to start looking at how a program can use it.
According to Wikipedia: In computer science, control flow (or alternatively, flow of control) refers to the specification of the order in which the individual statements, instructions or function calls of an imperative program are executed or evaluated.
In order to control the flow of a program, we have two main weapons: conditional programming (also known as branching) and looping. We can use them in many different combinations and variations, but in this chapter, instead of going through all the possible forms of those two constructs in a documentation fashion, I'd rather give you the basics and then I'll write a couple of small scripts with you. In the first one, we'll see how to create a rudimentary prime-number generator, while in the second one, we'll see how to apply discounts to customers based on coupons. This way, you should get a better feeling for how conditional programming and looping can be used.
In this chapter, we are going to cover the following:
Conditional programming
Looping in Python
A quick peek at the itertools module
Conditional programming
Conditional programming, or branching, is something you do every day, every moment. It's about evaluating conditions: if the light is green, then I can cross; if it's raining, then I'm taking the umbrella; and if I'm late for work, then I'll call my manager.
The main tool is the if statement, which comes in different forms and colors, but basically it evaluates an expression and, based on the result, chooses which part of the code to execute. As usual, let's look at an example:
# conditional.1.py
late = True
if late:
    print('I need to call my manager!')
This is possibly the simplest example: when fed to the if statement, late acts as a conditional expression, which is evaluated in a Boolean context (exactly like if we were calling bool(late)). If the result of the evaluation is True, then we enter the body of the code immediately after the if statement. Notice that the print instruction is indented: this means it belongs to the block defined by the if clause. Execution of this code yields:
$ python conditional.1.py
I need to call my manager!
Since late is True, the print call was executed. Let's expand on this example:
# conditional.2.py
late = False
if late:
    print('I need to call my manager!')  #1
else:
    print('no need to call my manager...')  #2
This time I set late = False, so when I execute the code, the result is different:
$ python conditional.2.py
no need to call my manager...
Depending on the result of evaluating the late expression, we can either enter block #1 or block #2, but not both. Block #1 is executed when late evaluates to True, while block #2 is executed when late evaluates to False. Try assigning False/True values to the late name, and see how the output for this code changes accordingly.
The preceding example also introduces the else clause, which becomes very handy when we want to provide an alternative set of instructions to be executed when an expression evaluates to False within an if clause. The else clause is optional, as is evident by comparing the preceding two examples.
A specialized else: elif
Sometimes all you need is to do something if a condition is met (a simple if clause). At other times, you need to provide an alternative, in case the condition is False (if/else clause). There are also situations where you may have more than two paths to choose from. Since calling the manager (or not calling them) is kind of a binary type of example (either you call or you don't), let's change the type of example and keep expanding. This time, we decide on tax percentages. If my income is less than $10,000, I won't pay any taxes. If it is between $10,000 and $30,000, I'll pay 20% in taxes. If it is between $30,000 and $100,000, I'll pay 35% in taxes, and if it's over $100,000, I'll (gladly) pay 45% in taxes. Let's put this all down into beautiful Python code:
# taxes.py
income = 15000
if income < 10000:
    tax_coefficient = 0.0  #1
elif income < 30000:
    tax_coefficient = 0.2  #2
elif income < 100000:
    tax_coefficient = 0.35  #3
else:
    tax_coefficient = 0.45  #4
print('I will pay:', income * tax_coefficient, 'in taxes')
Executing the preceding code yields:
$ python taxes.py
I will pay: 3000.0 in taxes
Let's go through the example line by line: we start by setting up the income value. In the example, my income is $15,000. We enter the if clause. Notice that this time we also introduced the elif clause, which is a contraction of else-if, and it's different from a bare else clause in that it also has its own condition. So, the if expression of income < 10000 evaluates to False, therefore block #1 is not executed.
The control passes to the next condition evaluator: elif income < 30000. This one evaluates to True, therefore block #2 is executed, and because of this, Python then resumes execution after the whole if/elif/elif/else clause (which we can just call the if clause from now on). There is only one instruction after the if clause, the print call, which tells us I will pay 3000.0 in taxes this year (15,000 * 20%). Notice that the order is mandatory: if comes first, then (optionally) as many elif clauses as you need, and then (optionally) an else clause.
Interesting, right? No matter how many lines of code you may have within each block, when one of the conditions evaluates to True, the associated block is executed and then execution resumes after the whole clause. If none of the conditions evaluates to True (for example, income = 200000), then the body of the else clause would be executed (block #4). This example expands our understanding of the behavior of the else clause. Its block of code is executed when none of the preceding if/elif/.../elif expressions has evaluated to True.
Try to modify the value of income until you can comfortably execute all blocks at will (one per execution, of course). And then try the boundaries. This is crucial: whenever you have conditions expressed as equalities or inequalities (==, !=, <, >, <=, >=), those numbers represent boundaries. It is essential to test boundaries thoroughly. Should I allow you to drive at 18 or 17? Am I checking your age with age < 18, or age <= 18? You can't imagine how many times I've had to fix subtle bugs that stemmed from using the wrong operator, so go ahead and experiment with the preceding code. Change some < to <= and set income to be one of the boundary values (10,000, 30,000, 100,000) as well as any value in between. See how the result changes, and get a good understanding of it before proceeding.
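One way to run the boundary experiment without editing the script by hand each time is to wrap the same if/elif/else chain in a function and probe every threshold from just below, exactly on, and just above. This is only a sketch: functions are covered later in the book, and the name tax_coefficient as a function (rather than a variable) is a choice made for this illustration.

```python
def tax_coefficient(income):
    """Return the tax rate for an income, mirroring the chain in taxes.py."""
    if income < 10000:
        return 0.0
    elif income < 30000:
        return 0.2
    elif income < 100000:
        return 0.35
    else:
        return 0.45


# Probe each boundary from just below, exactly on, and just above.
for boundary in (10000, 30000, 100000):
    for income in (boundary - 1, boundary, boundary + 1):
        print(income, tax_coefficient(income))
```

With the < operator, an income that sits exactly on a boundary falls into the higher bracket; changing < to <= would move it into the lower one, which is precisely the kind of off-by-one difference worth checking.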
Let's now see another example that shows us how to nest if clauses. Say your program encounters an error. If the alert system is the console, we print the error. If the alert system is an email, we send it according to the severity of the error. If the alert system is anything other than console or email, we don't know what to do, therefore we do nothing. Let's put this into code:
# errorsalert.py
alert_system = 'console'  # other value can be 'email'
error_severity = 'critical'  # other values: 'medium' or 'low'
error_message = 'OMG! Something terrible happened!'
if alert_system == 'console':
    print(error_message)  #1
elif alert_system == 'email':
    if error_severity == 'critical':
        send_email('[email protected]', error_message)  #2
    elif error_severity == 'medium':
        send_email('[email protected]', error_message)  #3
    else:
        send_email('[email protected]', error_message)  #4
The preceding example is quite interesting, because of its silliness. It shows us two nested if clauses (outer and inner). It also shows us that the outer if clause doesn't have any else, while the inner one does. Notice how indentation is what allows us to nest one clause within another one.
If alert_system == 'console', body #1 is executed, and nothing else happens. On the other hand, if alert_system == 'email', then we enter into another if clause, which we called inner. In the inner if clause, according to error_severity, we send an email to either an admin, first-level support, or second-level support (blocks #2, #3, and #4). The send_email function is not defined in this example, therefore trying to run it would give you an error. In the source code of the book, which you can download from the website, I included a trick to redirect that call to a regular print function, just so you can experiment on the console without actually sending an email. Try changing the values and see how it all works.
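The redirect trick from the book's source code is not reproduced here. As a stand-in, the following sketch defines a hypothetical send_email that simply prints, so the nested clauses can be exercised on the console. The function body and the example.com recipient addresses are placeholders invented for this illustration, and the inner clause is lightly restructured to pick a recipient first and send once.

```python
def send_email(recipient, message):
    """Hypothetical stand-in: print instead of sending a real email."""
    print('To:', recipient, '|', message)


alert_system = 'email'
error_severity = 'medium'
error_message = 'OMG! Something terrible happened!'

if alert_system == 'console':
    print(error_message)
elif alert_system == 'email':
    # Inner if clause: choose the recipient according to severity.
    if error_severity == 'critical':
        recipient = 'admin@example.com'  # placeholder address
    elif error_severity == 'medium':
        recipient = 'support.1@example.com'  # placeholder address
    else:
        recipient = 'support.2@example.com'  # placeholder address
    send_email(recipient, error_message)
```

Setting alert_system to anything other than 'console' or 'email' makes the outer clause fall through silently, because it has no else.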
The ternary operator
One last thing I would like to show you, before moving on to the next subject, is the ternary operator or, in layman's terms, the short version of an if/else clause. When the value of a name is to be assigned according to some condition, sometimes it's easier and more readable to use the ternary operator instead of a proper if clause. In the following example, the two code blocks do exactly the same thing:
# ternary.py
order_total = 247  # GBP
# classic if/else form
if order_total > 100:
    discount = 25  # GBP
else:
    discount = 0  # GBP
print(order_total, discount)
# ternary operator
discount = 25 if order_total > 100 else 0
print(order_total, discount)
For simple cases like this, I find it very nice to be able to express that logic in one line instead of four. Remember, as a coder, you spend much more time reading code than writing it, so Python's conciseness is invaluable.
Are you clear on how the ternary operator works? Basically, name = something if condition else something-else. So name is assigned something if condition evaluates to True, and something-else if condition evaluates to False.
Now that you know everything about controlling the path of the code, let's move on to the next subject: looping.
Looping
If you have any experience with looping in other programming languages, you will find Python's way of looping a bit different. First of all, what is looping? Looping means being able to repeat the execution of a code block more than once, according to the loop parameters we're given. There are different looping constructs, which serve different purposes, and Python has distilled all of them down to just two, which you can use to achieve everything you need. These are the for and while statements.
While it's definitely possible to do everything you need using either of them, they serve different purposes and therefore they're usually used in different contexts. We'll explore this difference thoroughly in this chapter.
The for loop
The for loop is used when looping over a sequence, such as a list, tuple, or a collection of objects. Let's start with a simple example and expand on the concept to see what the Python syntax allows us to do:
# simple.for.py
for number in [0, 1, 2, 3, 4]:
    print(number)
This simple snippet of code, when executed, prints all numbers from 0 to 4. The for loop is fed the list [0, 1, 2, 3, 4] and at each iteration, number is given a value from the sequence (which is iterated sequentially, in order), then the body of the loop is executed (the print line). The number value changes at every iteration, according to which value is coming next from the sequence. When the sequence is exhausted, the for loop terminates, and the execution of the code resumes normally with the code after the loop.
Iterating over a range
Sometimes we need to iterate over a range of numbers, and it would be quite unpleasant to have to do so by hardcoding the list somewhere. In such cases, the range function comes to the rescue. Let's see the equivalent of the previous snippet of code:
# simple.for.py
for number in range(5):
    print(number)
The range function is used extensively in Python programs when it comes to creating sequences: you can call it by passing one value, which acts as stop (counting from 0), or you can pass two values (start and stop), or even three (start, stop, and step). Check out the following example:
>>> list(range(10))  # one value: from 0 to value (excluded)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(3, 8))  # two values: from start to stop (excluded)
[3, 4, 5, 6, 7]
>>> list(range(-10, 10, 4))  # three values: step is added
[-10, -6, -2, 2, 6]
For the moment, ignore that we need to wrap range(...) within a list. The range object is a little bit special, but in this case, we're just interested in understanding what values it will return to us. You can see that the deal is the same with slicing: start is included, stop excluded, and optionally you can add a step parameter, which by default is 1.
Try modifying the parameters of the range() call in our simple.for.py code and see what it prints. Get comfortable with it.
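A few more range calls worth trying are sketched below: a negative step, a range that turns out empty, and a direct comparison with slicing. The numbers variable is just an illustrative name.

```python
# Counting down: with a negative step, start must be greater than stop.
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]

# An empty range: start is already past stop for a positive step.
print(list(range(5, 2)))  # []

# range mirrors slicing: same start/stop/step semantics.
numbers = list(range(10))
print(numbers[3:8] == list(range(3, 8)))  # True
print(numbers[0:10:4] == list(range(0, 10, 4)))  # True
```

An empty range is not an error: a for loop over it simply executes its body zero times.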
Iterating over a sequence
Now we have all the tools to iterate over a sequence, so let's build on that example:
# simple.for.2.py
surnames = ['Rivest', 'Shamir', 'Adleman']
for position in range(len(surnames)):
    print(position, surnames[position])
The preceding code adds a little bit of complexity to the game. Execution will show this result:
$ python simple.for.2.py
0 Rivest
1 Shamir
2 Adleman
Let's use the inside-out technique to break it down, OK? We start from the innermost part of what we're trying to understand, and we expand outward. So, len(surnames) is the length of the surnames list: 3. Therefore, range(len(surnames)) is actually transformed into range(3). This gives us the range [0, 3), which is basically a sequence (0, 1, 2). This means that the for loop will run three iterations. In the first one, position will take value 0, while in the second one, it will take value 1, and finally value 2 in the third and last iteration. What is (0, 1, 2), if not the possible indexing positions for the surnames list? At position 0, we find 'Rivest', at position 1, 'Shamir', and at position 2, 'Adleman'. If you are curious about what these three men created together, change print(position, surnames[position]) to print(surnames[position][0], end=''), add a final print() outside of the loop, and run the code again.
Now, this style of looping is actually much closer to languages such as Java or C++. In Python, it's quite rare to see code like this. You can just iterate over any sequence or collection, so there is no need to get the list of positions and retrieve elements out of a sequence at each iteration. It's expensive, needlessly expensive. Let's change the example into a more Pythonic form:
# simple.for.3.py
surnames = ['Rivest', 'Shamir', 'Adleman']
for surname in surnames:
    print(surname)
Now that's something! It's practically English. The for loop can iterate over the surnames list, and it gives back each element in order at each iteration. Running this code will print the three surnames, one at a time. It's much easier to read, right?
What if you wanted to print the position as well though? Or what if you actually needed it? Should you go back to the range(len(...)) form? No. You can use the enumerate built-in function, like this:
# simple.for.4.py
surnames = ['Rivest', 'Shamir', 'Adleman']
for position, surname in enumerate(surnames):
    print(position, surname)
This code is very interesting as well. Notice that enumerate gives back a two-tuple (position, surname) at each iteration, but still, it's much more readable (and more efficient) than the range(len(...)) example. You can call enumerate with a start parameter, such as enumerate(iterable, start), and it will start from start, rather than 0. Just another little thing that shows you how much thought has been given in designing Python so that it makes your life easier.
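As a quick illustration of the start parameter, this sketch numbers the same surnames from 1, which is handy for human-facing output.

```python
surnames = ['Rivest', 'Shamir', 'Adleman']

# Number the surnames from 1 instead of the default 0.
for position, surname in enumerate(surnames, start=1):
    print(position, surname)

# enumerate is lazy; wrapping it in list() shows the two-tuples it yields.
print(list(enumerate(surnames, start=1)))
```

Note that start only changes the counter: the elements still come out in their original order, and surnames[position] would no longer line up with the printed number.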
You can use a for loop to iterate over lists, tuples, and in general anything that Python calls iterable. This is a very important concept, so let's talk about it a bit more.
Iterators and iterables
According to the Python documentation (https://docs.python.org/3/glossary.html), an iterable is:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop.
Simply put, what happens when you write for k in sequence: ... body ..., is that the for loop asks sequence for the next element, it gets something back, it calls that something k, and then executes its body. Then, once again, the for loop asks sequence for the next element, it calls it k again, and executes the body again, and so on and so forth, until the sequence is exhausted. Empty sequences will result in zero executions of the body.
Some data structures, when iterated over, produce their elements in order, such as lists, tuples, and strings, while some others don't, such as sets and dictionaries (prior to Python 3.6). Python gives us the ability to iterate over iterables, using a type of object called an iterator.
According to the official documentation (https://docs.python.org/3/glossary.html), an iterator is:
An object representing a stream of data. Repeated calls to the iterator's __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
Don't worry if you don't fully understand all the preceding legalese, you will in due time. I put it here as a handy reference for the future.
In practice, the whole iterable/iterator mechanism is somewhat hidden behind the code. Unless you need to code your own iterable or iterator for some reason, you won't have to worry about this too much. But it's very important to understand how Python handles this key aspect of control flow because it will shape the way you will write your code.
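To see the mechanism that the for loop normally hides, the two glossary definitions can be exercised by hand with the built-in iter() and next() functions. This is a minimal sketch; the variable name iterator is illustrative.

```python
surnames = ['Rivest', 'Shamir', 'Adleman']

# iter() asks the iterable for a fresh iterator.
iterator = iter(surnames)
print(next(iterator))  # Rivest
print(next(iterator))  # Shamir
print(next(iterator))  # Adleman

# The iterator is now exhausted: next() raises StopIteration.
try:
    next(iterator)
except StopIteration:
    print('iterator exhausted')

# An iterator is its own iterator, so a second pass over it yields nothing.
print(iter(iterator) is iterator)  # True
print(list(iterator))  # []

# The list itself hands out a brand-new iterator every time.
print(list(iter(surnames)))
```

This is the "multiple iteration passes" exception from the glossary in action: the list can be looped over again and again, while the exhausted iterator behaves like an empty container.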
Iterating over multiple sequences
Let's see another example of how to iterate over two sequences of the same length, in order to work on their respective elements in pairs. Say we have a list of people and a list of numbers representing the age of the people in the first list. We want to print a pair person/age on one line for all of them. Let's start with an example and let's refine it gradually:
# multiple.sequences.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
for position in range(len(people)):
    person = people[position]
    age = ages[position]
    print(person, age)
By now, this code should be pretty straightforward for you to understand. We need to iterate over the list of positions (0, 1, 2, 3) because we want to retrieve elements from two different lists. Executing it we get the following:
$ python multiple.sequences.py
Conrad 29
Deepak 30
Heinrich 34
Tom 36
This code is both inefficient and not Pythonic. It's inefficient because retrieving an element given the position can be an expensive operation, and we're doing it from scratch at each iteration. The postal worker doesn't go back to the beginning of the road each time they deliver a letter, right? They move from house to house. From one to the next one. Let's try to make it better using enumerate:
# multiple.sequences.enumerate.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
for position, person in enumerate(people):
    age = ages[position]
    print(person, age)
That's better, but still not perfect. And it's still a bit ugly. We're iterating properly on people, but we're still fetching age using positional indexing, which we want to lose as well. Well, no worries, Python gives you the zip function, remember?
Let's use it:
# multiple.sequences.zip.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
for person, age in zip(people, ages):
    print(person, age)
Ah! So much better! Once again, compare the preceding code with the first example and admire Python's elegance. The reason I wanted to show this example is twofold. On the one hand, I wanted to give you an idea of how much shorter Python code can be compared to other languages where the syntax doesn't allow you to iterate over sequences or collections as easily. And on the other hand, and much more importantly, notice that when the for loop asks zip(sequenceA, sequenceB) for the next element, it gets back a tuple, not just a single object. It gets back a tuple with as many elements as the number of sequences we feed to the zip function. Let's expand a little on the previous example in two ways, using explicit and implicit assignment:
# multiple.sequences.explicit.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
nationalities = ['Poland', 'India', 'South Africa', 'England']
for person, age, nationality in zip(people, ages, nationalities):
    print(person, age, nationality)
In the preceding code, we added the nationalities list. Now that we feed three sequences to the zip function, the for loop gets back a three-tuple at each iteration. Notice that the position of the elements in the tuple respects the position of the sequences in the zip call. Executing the code will yield the following result:
$ python multiple.sequences.explicit.py
Conrad 29 Poland
Deepak 30 India
Heinrich 34 South Africa
Tom 36 England
Sometimes, for reasons that may not be clear in a simple example such as the preceding one, you may want to explode the tuple within the body of the for loop. If that is your desire, it's perfectly possible to do so:
# multiple.sequences.implicit.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
nationalities = ['Poland', 'India', 'South Africa', 'England']
for data in zip(people, ages, nationalities):
    person, age, nationality = data
    print(person, age, nationality)
It's basically doing what the for loop does automatically for you, but in some cases you may want to do it yourself. Here, the three-tuple data that comes from zip(...) is exploded within the body of the for loop into three variables: person, age, and nationality.
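All of the zip examples assume sequences of the same length. What happens otherwise is worth a quick sketch: zip silently stops at the shortest input, while zip_longest from the itertools module pads the gaps. The shortened ages list below is deliberately broken for the sake of illustration.

```python
from itertools import zip_longest

people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34]  # one age is missing on purpose

# zip stops as soon as the shortest sequence is exhausted: Tom is dropped.
print(list(zip(people, ages)))

# zip_longest keeps going and pads the gaps with fillvalue.
print(list(zip_longest(people, ages, fillvalue=None)))
```

Because zip raises no error on mismatched lengths, a missing element can go unnoticed; checking len(people) == len(ages) beforehand is a cheap safeguard when the data comes from elsewhere.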
The while loop
In the preceding pages, we saw the for loop in action. It's incredibly useful when you need to loop over a sequence or a collection. The key point to keep in mind, when you need to be able to discriminate which looping construct to use, is that the for loop rocks when you have to iterate over a finite number of elements. It can be a huge number, but still, something that ends at some point.
There are other cases though, when you just need to loop until some condition is satisfied, or even loop indefinitely until the application is stopped, such as cases where we don't really have something to iterate on, and therefore the for loop would be a poor choice. But fear not, for these cases, Python provides us with the while loop.
The while loop is similar to the for loop, in that they both loop, and at each iteration they execute a body of instructions. What is different between them is that the while loop doesn't loop over a sequence (it can, but you have to write the logic manually and it wouldn't make any sense, you would just want to use a for loop), rather, it loops as long as a certain condition is satisfied. When the condition is no longer satisfied, the loop ends.
As usual, let's see an example that will clarify everything for us. We want to print the binary representation of a positive number. In order to do so, we can use a simple algorithm that collects the remainders of division by 2 (in reverse order), and that turns out to be the binary representation of the number itself:
6 / 2 = 3 (remainder: 0)
3 / 2 = 1 (remainder: 1)
1 / 2 = 0 (remainder: 1)
List of remainders: 0, 1, 1.
Reversed, this is 1, 1, 0, which is also the binary representation of 6: 110
Let's write some code to calculate the binary representation for the number 39, which is 100111 in base 2:
# binary.py
n = 39
remainders = []
while n > 0:
    remainder = n % 2  # remainder of division by 2
    remainders.insert(0, remainder)  # we keep track of remainders
    n //= 2  # we divide n by 2
print(remainders)
In the preceding code, n > 0 is the condition to keep looping. We can make the code a little shorter (and more Pythonic), by using the divmod function, which is called with a number and a divisor, and returns a tuple with the result of the integer division and its remainder. For example, divmod(13, 5) would return (2, 3), and indeed 5 * 2 + 3 = 13:
# binary.2.py
n = 39
remainders = []
while n > 0:
    n, remainder = divmod(n, 2)
    remainders.insert(0, remainder)
print(remainders)
In the preceding code, we have reassigned n to the result of the division by 2, and the remainder, in one single line.
Notice that the condition in a while loop is a condition to continue looping. If it evaluates to True, then the body is executed and then another evaluation follows, and so on, until the condition evaluates to False. When that happens, the loop is exited immediately without executing its body.
If the condition never evaluates to False, the loop becomes a so-called infinite loop. Infinite loops are used, for example, when polling from network devices: you ask the socket whether there is any data, you do something with it if there is any, then you sleep for a small amount of time, and then you ask the socket again, over and over again, without ever stopping.
Having the ability to loop over a condition, or to loop indefinitely, is the reason why the for loop alone is not enough, and therefore Python provides the while loop.
By the way, if you need the binary representation of a number, check out the bin function.
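As a cross-check of the hand-written algorithm against the built-in, the sketch below wraps the loop from binary.2.py in a function (to_binary_digits is a name made up for this illustration) and compares its result with bin, which returns a string carrying a '0b' prefix.

```python
def to_binary_digits(n):
    """Collect remainders of repeated division by 2, most significant first."""
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)
        remainders.insert(0, remainder)
    return remainders


digits = to_binary_digits(39)
print(digits)  # [1, 0, 0, 1, 1, 1]

# bin() returns a string with a '0b' prefix.
print(bin(39))  # 0b100111

# Joining our digits gives the same text once the prefix is stripped.
print(''.join(str(digit) for digit in digits) == bin(39)[2:])  # True
```

One difference to be aware of: for n = 0 the loop body never runs, so the function returns an empty list, whereas bin(0) returns '0b0'.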
Just for fun, let's adapt one of the examples (multiple.sequences.py) using the while logic:
# multiple.sequences.while.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
position = 0
while position < len(people):
    person = people[position]
    age = ages[position]
    print(person, age)
    position += 1
Intheprecedingcode,Ihavehighlightedtheinitialization,condition,andupdateofthepositionvariable,whichmakesitpossibletosimulatetheequivalentforloopcodebyhandlingtheiterationvariablemanually.Everythingthatcanbedonewithaforloopcanalsobedonewithawhileloop,eventhoughyoucanseethere'sabitofboilerplateyouhavetogothroughinordertoachievethesameresult.Theoppositeisalsotrue,butunlessyouhaveareasontodoso,yououghttousetherighttoolforthejob,and99.9%ofthetimeyou'llbefine.
So,torecap,useaforloopwhenyouneedtoiterateoveraniterable,andawhileloopwhenyouneedtoloopaccordingtoaconditionbeingsatisfiedornot.Ifyoukeepinmindthedifferencebetweenthetwopurposes,youwillneverchoosethewrongloopingconstruct.
Let'snowseehowtoalterthenormalflowofaloop.
The break and continue statements
According to the task at hand, sometimes you will need to alter the regular flow of a loop. You can either skip a single iteration (as many times as you want), or you can break out of the loop entirely. A common use case for skipping iterations is, for example, when you're iterating over a list of items and you need to work on each of them only if some condition is verified. On the other hand, if you're iterating over a collection of items, and you have found one of them that satisfies some need you have, you may decide not to continue the loop at all and therefore break out of it. There are countless possible scenarios, so it's better to see a couple of examples.
Let's say you want to apply a 20% discount to all the products in a basket list that have an expiration date of today. The way you achieve this is to use the continue statement, which tells the looping construct (for or while) to stop execution of the body immediately and go to the next iteration, if any. This example will take us a little deeper down the rabbit hole, so be ready to jump:
# discount.py
from datetime import date, timedelta
today = date.today()
tomorrow = today + timedelta(days=1)  # today + 1 day is tomorrow
products = [
    {'sku': '1', 'expiration_date': today, 'price': 100.0},
    {'sku': '2', 'expiration_date': tomorrow, 'price': 50},
    {'sku': '3', 'expiration_date': today, 'price': 20},
]
for product in products:
    if product['expiration_date'] != today:
        continue
    product['price'] *= 0.8  # equivalent to applying 20% discount
    print(
        'Price for sku', product['sku'],
        'is now', product['price'])
We start by importing the date and timedelta objects, then we set up our products. Those with sku as 1 and 3 have an expiration date of today, which means we want to apply a 20% discount on them. We loop over each product and we inspect the expiration date. If it is not (inequality operator, !=) today, we don't want to execute the rest of the body suite, so we continue.
Notice that it is not important where in the body suite you place the continue statement (you can even use it more than once). When you reach it, execution of the body stops and the loop moves on to the next iteration. If we run the discount.py module, this is the output:
$ python discount.py
Price for sku 1 is now 80.0
Price for sku 3 is now 16.0
This shows you that the last two lines of the body haven't been executed for sku number 2.
Let's now see an example of breaking out of a loop. Say we want to tell whether at least one of the elements in a list evaluates to True when fed to the bool function. Given that we need to know whether there is at least one, when we find it, we don't need to keep scanning the list any further. In Python code, this translates to using the break statement. Let's write this down into code:
# any.py
items = [0, None, 0.0, True, 0, 7]  # True and 7 evaluate to True
found = False  # this is called "flag"
for item in items:
    print('scanning item', item)
    if item:
        found = True  # we update the flag
        break
if found:  # we inspect the flag
    print('At least one item evaluates to True')
else:
    print('All items evaluate to False')
The preceding code is such a common pattern in programming, you will see it a lot. When you inspect items this way, basically what you do is to set up a flag variable, then start the inspection. If you find one element that matches your criteria (in this example, that evaluates to True), then you update the flag and stop iterating. After iteration, you inspect the flag and take action accordingly. Execution yields:
$ python any.py
scanning item 0
scanning item None
scanning item 0.0
scanning item True
At least one item evaluates to True
See how execution stopped after True was found? The break statement acts exactly like the continue one, in that it stops executing the body of the loop immediately, but also, prevents any other iteration from running, effectively breaking out of the loop. The continue and break statements can be used together with no limitation in their numbers, both in the for and while looping constructs.
By the way, there is no need to write code to detect whether there is at least one element in a sequence that evaluates to True. Just check out the built-in any function.
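As a quick illustration of that built-in, the following sketch feeds the same items list to any. The second call, with a list I made up that contains only falsy values, shows the opposite outcome.

```python
items = [0, None, 0.0, True, 0, 7]

# any() returns True as soon as it meets one truthy element
found = any(items)

# it returns False when every element is falsy
nothing_found = any([0, None, 0.0])

print(found)          # True
print(nothing_found)  # False
```

Like the loop with break, any stops consuming the iterable at the first truthy element, so it doesn't scan more than it needs to.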
A special else clause
One of the features I've seen only in the Python language is the ability to have else clauses after while and for loops. It's very rarely used, but it's definitely nice to have. In short, you can have an else suite after a for or while loop. If the loop ends normally, because of exhaustion of the iterator (for loop) or because the condition is finally not met (while loop), then the else suite (if present) is executed. In case execution is interrupted by a break statement, the else clause is not executed.
Let's take an example of a for loop that iterates over a group of items, looking for one that would match some condition. In case we don't find at least one that satisfies the condition, we want to raise an exception. This means we want to arrest the regular execution of the program and signal that there was an error, or exception, that we cannot deal with. Exceptions will be the subject of Chapter 8, Testing, Profiling, and Dealing with Exceptions, so don't worry if you don't fully understand them now. Just bear in mind that they will alter the regular flow of the code.
Let me now show you two examples that do exactly the same thing, but one of them is using the special for...else syntax. Say that we want to find, among a collection of people, one that could drive a car:
# for.no.else.py
class DriverException(Exception):
    pass
people = [('James', 17), ('Kirk', 9), ('Lars', 13), ('Robert', 8)]
driver = None
for person, age in people:
    if age >= 18:
        driver = (person, age)
        break
if driver is None:
    raise DriverException('Driver not found.')
Notice the flag pattern again. We set the driver to be None, then if we find one, we update the driver flag, and then, at the end of the loop, we inspect it to see whether one was found. I kind of have the feeling that those kids would drive a very metallic car, but anyway, notice that if a driver is not found, DriverException is raised, signaling to the program that execution cannot continue (we're lacking the driver).
The same functionality can be rewritten a bit more elegantly using the following code:
# for.else.py
class DriverException(Exception):
    pass
people = [('James', 17), ('Kirk', 9), ('Lars', 13), ('Robert', 8)]
for person, age in people:
    if age >= 18:
        driver = (person, age)
        break
else:
    raise DriverException('Driver not found.')
Notice that we aren't forced to use the flag pattern any more. The exception is raised as part of the for loop logic, which makes good sense because the for loop is checking on some condition. All we need is to set up a driver object in case we find one, because the rest of the code is going to use that information somewhere. Notice the code is shorter and more elegant, because the logic is now correctly grouped together where it belongs.
In the Transforming Code into Beautiful, Idiomatic Python video, Raymond Hettinger suggests a much better name for the else statement associated with a for loop: nobreak. If you struggle to remember how the else works for a for loop, simply remembering this fact should help you.
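The else clause behaves the same way on a while loop: it runs only when the condition becomes False without a break having been hit. Here is a small sketch of that; find_divisor is a hypothetical helper I'm introducing just for this illustration, and it uses a function definition, which we'll only cover properly in the next chapter.

```python
def find_divisor(n):
    """Return the smallest divisor of n in [2, n), or None if there is none."""
    divisor = 2
    while divisor < n:
        if n % divisor == 0:
            break  # found one: the else suite is skipped
        divisor += 1
    else:
        # the condition became False and no break was executed
        return None
    return divisor

print(find_divisor(15))  # 3
print(find_divisor(13))  # None
```

Reading the else as nobreak works here too: the None is returned only when the loop ran to completion without breaking.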
Putting all this together
Now that you have seen all there is to see about conditionals and loops, it's time to spice things up a little, and look at those two examples I anticipated at the beginning of this chapter. We'll mix and match here, so you can see how you can use all these concepts together. Let's start by writing some code to generate a list of prime numbers up to some limit. Please bear in mind that I'm going to write a very inefficient and rudimentary algorithm to detect primes. The important thing for you is to concentrate on those bits in the code that belong to this chapter's subject.
A prime generator
According to Wikipedia:
A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself. A natural number greater than 1 that is not a prime number is called a composite number.
Based on this definition, if we consider the first 10 natural numbers, we can see that 2, 3, 5, and 7 are primes, while 1, 4, 6, 8, 9, and 10 are not. In order to have a computer tell you whether a number, N, is prime, you can divide that number by all natural numbers in the range [2, N). If any of those divisions yields zero as a remainder, then the number is not a prime. Enough chatter, let's get down to business. I'll write two versions of this, the second of which will exploit the for...else syntax:
# primes.py
primes = []  # this will contain the primes in the end
upto = 100  # the limit, inclusive
for n in range(2, upto + 1):
    is_prime = True  # flag, new at each iteration of outer for
    for divisor in range(2, n):
        if n % divisor == 0:
            is_prime = False
            break
    if is_prime:  # check on flag
        primes.append(n)
print(primes)
There are a lot of things to notice in the preceding code. First of all, we set up an empty primes list, which will contain the primes at the end. The limit is 100, and you can see it's inclusive in the way we call range() in the outer loop. If we wrote range(2, upto) that would be [2, upto), right? Therefore range(2, upto + 1) gives us [2, upto + 1) == [2, upto].
So, there are two for loops. In the outer one, we loop over the candidate primes, that is, all natural numbers from 2 to upto. Inside each iteration of this outer loop, we set up a flag (which is set to True at each iteration), and then start dividing the current n by all numbers from 2 to n - 1. If we find a proper divisor for n, it means n is composite, and therefore we set the flag to False and break the loop. Notice that when we break the inner one, the outer one keeps on going normally. The reason why we break after having found a proper divisor for n is that we don't need any further information to be able to tell that n is not a prime.
When we check on the is_prime flag, if it is still True, it means we couldn't find any number in [2, n) that is a proper divisor for n, therefore n is a prime. We append n to the primes list, and hop! Another iteration proceeds, until n equals 100.
Running this code yields:
$ python primes.py
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79,
83, 89, 97]
Before we proceed, one question: of all the iterations of the outer loop, one of them is different from all the others. Could you tell which one, and why? Think about it for a second, go back to the code, try to figure it out for yourself, and then keep reading on.
Did you figure it out? If not, don't feel bad, it's perfectly normal. I asked you to do it as a small exercise because it's what coders do all the time. The skill to understand what the code does by simply looking at it is something you build over time. It's very important, so try to exercise it whenever you can. I'll tell you the answer now: the iteration that behaves differently from all others is the first one. The reason is that in the first iteration, n is 2. Therefore the innermost for loop won't even run, because it's a for loop that iterates over range(2, 2), and what is that if not [2, 2)? Try it out for yourself, write a simple for loop with that iterable, put a print in the body suite, and see whether anything happens (it won't...).
Now, from an algorithmic point of view, this code is inefficient, so let's at least make it more beautiful:
# primes.else.py
primes = []
upto = 100
for n in range(2, upto + 1):
    for divisor in range(2, n):
        if n % divisor == 0:
            break
    else:
        primes.append(n)
print(primes)
Much nicer, right? The is_prime flag is gone, and we append n to the primes list when we know the inner for loop hasn't encountered any break statements. See how the code looks cleaner and reads better?
Applying discounts
In this example, I want to show you a technique I like a lot. In many programming languages, other than the if/elif/else constructs, in whatever form or syntax they may come, you can find another statement, usually called switch/case, that in Python is missing. It is the equivalent of a cascade of if/elif/.../elif/else clauses, with a syntax similar to this (warning! JavaScript code!):
/* switch.js */
switch (day_number) {
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
        day = "Weekday";
        break;
    case 6:
        day = "Saturday";
        break;
    case 0:
        day = "Sunday";
        break;
    default:
        day = "";
        alert(day_number + ' is not a valid day number.')
}
In the preceding code, we switch on a variable called day_number. This means we get its value and then we decide what case it fits in (if any). From 1 to 5 there is a cascade, which means no matter the number, [1, 5] all go down to the bit of logic that sets day as "Weekday". Then we have single cases for 0 and 6, and a default case to prevent errors, which alerts the system that day_number is not a valid day number, that is, not in [0, 6]. Python is perfectly capable of realizing such logic using if/elif/else statements:
# switch.py
if 1 <= day_number <= 5:
    day = 'Weekday'
elif day_number == 6:
    day = 'Saturday'
elif day_number == 0:
    day = 'Sunday'
else:
    day = ''
    raise ValueError(
        str(day_number) + ' is not a valid day number.')
In the preceding code, we reproduce the same logic of the JavaScript snippet in Python, using if/elif/else statements. I raised the ValueError exception just as an example at the end, if day_number is not in [0, 6]. This is one possible way of translating the switch/case logic, but there is also another one, sometimes called dispatching, which I will show you in the last version of the next example.
By the way, did you notice the first line of the previous snippet? Have you noticed that Python can make double (actually, even multiple) comparisons? It's just wonderful!
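To see what those chained comparisons amount to, here is a minimal sketch. The value of day_number and the names chained and explicit are arbitrary choices of mine; the point is only that a chain behaves like the individual comparisons joined with and.

```python
day_number = 3

# a chained comparison...
chained = 1 <= day_number <= 5

# ...gives the same result as joining the two comparisons with `and`
# (in the chained form, day_number is evaluated only once)
explicit = 1 <= day_number and day_number <= 5

out_of_range = 1 <= 7 <= 5   # 7 is not <= 5
longer_chain = 0 < 1 < 2 < 3  # chains can have more than two links

print(chained, explicit)  # True True
print(out_of_range)       # False
print(longer_chain)       # True
```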
Let's start the new example by simply writing some code that assigns a discount to customers based on their coupon value. I'll keep the logic down to a minimum here, remember that all we really care about is understanding conditionals and loops:
# coupons.py
customers = [
    dict(id=1, total=200, coupon_code='F20'),  # F20: fixed, £20
    dict(id=2, total=150, coupon_code='P30'),  # P30: percent, 30%
    dict(id=3, total=100, coupon_code='P50'),  # P50: percent, 50%
    dict(id=4, total=110, coupon_code='F15'),  # F15: fixed, £15
]
for customer in customers:
    code = customer['coupon_code']
    if code == 'F20':
        customer['discount'] = 20.0
    elif code == 'F15':
        customer['discount'] = 15.0
    elif code == 'P30':
        customer['discount'] = customer['total'] * 0.3
    elif code == 'P50':
        customer['discount'] = customer['total'] * 0.5
    else:
        customer['discount'] = 0.0
for customer in customers:
    print(customer['id'], customer['total'], customer['discount'])
We start by setting up some customers. They have an order total, a coupon code, and an ID. I made up four different types of coupons, two are fixed and two are percentage-based. You can see that in the if/elif/else cascade I apply the discount accordingly, and I set it as a 'discount' key in the customer dictionary.
At the end, I just print out part of the data to see whether my code is working properly:
$ python coupons.py
1 200 20.0
2 150 45.0
3 100 50.0
4 110 15.0
This code is simple to understand, but all those clauses are kind of cluttering the logic. It's not easy to see what's going on at a first glance, and I don't like it. In cases like this, you can exploit a dictionary to your advantage, like this:
# coupons.dict.py
customers = [
    dict(id=1, total=200, coupon_code='F20'),  # F20: fixed, £20
    dict(id=2, total=150, coupon_code='P30'),  # P30: percent, 30%
    dict(id=3, total=100, coupon_code='P50'),  # P50: percent, 50%
    dict(id=4, total=110, coupon_code='F15'),  # F15: fixed, £15
]
discounts = {
    'F20': (0.0, 20.0),  # each value is (percent, fixed)
    'P30': (0.3, 0.0),
    'P50': (0.5, 0.0),
    'F15': (0.0, 15.0),
}
for customer in customers:
    code = customer['coupon_code']
    percent, fixed = discounts.get(code, (0.0, 0.0))
    customer['discount'] = percent * customer['total'] + fixed
for customer in customers:
    print(customer['id'], customer['total'], customer['discount'])
Running the preceding code yields exactly the same result we had from the snippet before it. We saved two lines, but more importantly, we gained a lot in readability, as the body of the for loop now is just three lines long, and very easy to understand. The concept here is to use a dictionary as a dispatcher. In other words, we try to fetch something from the dictionary based on a code (our coupon_code), and by using dict.get(key, default), we make sure we also cater for when the code is not in the dictionary and we need a default value.
Notice that I had to apply some very simple linear algebra in order to calculate the discount properly. Each discount has a percentage and fixed part in the dictionary, represented by a two-tuple. By applying percent * total + fixed, we get the correct discount. When percent is 0, the formula just gives the fixed amount, and it gives percent * total when fixed is 0.
This technique is important because it is also used in other contexts, with functions, where it actually becomes much more powerful than what we've seen in the preceding snippet. Another advantage of using it is that you can code it in such a way that the keys and values of the discounts dictionary are fetched dynamically (for example, from a database). This will allow the code to adapt to whatever discounts and conditions you have, without having to modify anything.
If it's not completely clear to you how it works, I suggest you take your time and experiment with it. Change values and add print statements to see what's going on while the program is running.
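To give you a taste of what dispatching with functions might look like, here is a rough sketch. It relies on function definitions, which are only introduced in the next chapter, and the function names (fixed_20, percent_30, no_discount) and the unknown 'XYZ' coupon are my own inventions for the sake of the example, not part of the coupons scripts.

```python
def fixed_20(total):
    return 20.0  # fixed amount, the total is ignored

def percent_30(total):
    return total * 0.3

def no_discount(total):
    return 0.0  # fallback for unknown coupon codes

# coupon code -> function that calculates the discount
discounts = {
    'F20': fixed_20,
    'P30': percent_30,
}

customers = [
    dict(id=1, total=200, coupon_code='F20'),
    dict(id=2, total=150, coupon_code='P30'),
    dict(id=3, total=100, coupon_code='XYZ'),  # not in the dictionary
]
for customer in customers:
    # fetch the function, then call it with the order total
    calculate = discounts.get(customer['coupon_code'], no_discount)
    customer['discount'] = calculate(customer['total'])

for customer in customers:
    print(customer['id'], customer['total'], customer['discount'])
```

The values of the dictionary are now pieces of behavior rather than numbers, so each coupon could apply any rule it likes without the loop body changing at all.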
A quick peek at the itertools module
A chapter about iterables, iterators, conditional logic, and looping wouldn't be complete without a few words about the itertools module. If you are into iterating, this is a kind of heaven.
The Python official documentation (https://docs.python.org/2/library/itertools.html) describes the itertools module as follows:
This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python. The module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an "iterator algebra" making it possible to construct specialized tools succinctly and efficiently in pure Python.
By no means do I have the room here to show you all the goodies you can find in this module, so I encourage you to go check it out for yourself, I promise you'll enjoy it. In a nutshell, it provides you with three broad categories of iterators. I will give you a very small example of one iterator taken from each one of them, just to make your mouth water a little.
Infinite iterators
Infinite iterators allow you to work with a for loop in a different fashion, such as if it were a while loop:
# infinite.py
from itertools import count
for n in count(5, 3):
    if n > 20:
        break
    print(n, end=', ')  # instead of newline, comma and space
Running the code gives this:
$ python infinite.py
5, 8, 11, 14, 17, 20,
The count factory class makes an iterator that just goes on and on counting. It starts from 5 and keeps adding 3 to it. We need to break it manually if we don't want to get stuck in an infinite loop.
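Breaking manually is not the only option. As a small sketch, assuming you know in advance how many values you want, you can wrap the infinite iterator in islice (also from itertools), which stops after a given number of items. The choice of six items here is mine, picked so that the values match the ones printed by infinite.py.

```python
from itertools import count, islice

# take only the first six values from the infinite iterator
numbers = list(islice(count(5, 3), 6))
print(numbers)  # [5, 8, 11, 14, 17, 20]
```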
Iterators terminating on the shortest input sequence
This category is very interesting. It allows you to create an iterator based on multiple iterators, combining their values according to some logic. The key point here is that among those iterators, in case any of them are shorter than the rest, the resulting iterator won't break, it will simply stop as soon as the shortest iterator is exhausted. This is very theoretical, I know, so let me give you an example using compress. This iterator gives you back the data according to a corresponding item in a selector being True or False:
compress('ABC', (1, 0, 1)) would give back 'A' and 'C', because they correspond to 1. Let's see a simple example:
# compress.py
from itertools import compress
data = range(10)
even_selector = [1, 0] * 10
odd_selector = [0, 1] * 10
even_numbers = list(compress(data, even_selector))
odd_numbers = list(compress(data, odd_selector))
print(odd_selector)
print(list(data))
print(even_numbers)
print(odd_numbers)
Notice that odd_selector and even_selector are 20 elements long, while data is just 10 elements long. compress will stop as soon as data has yielded its last element. Running this code produces the following:
$ python compress.py
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 2, 4, 6, 8]
[1, 3, 5, 7, 9]
It's a very fast and nice way of selecting elements out of an iterable. The code is very simple, just notice that instead of using a for loop to iterate over each value that is given back by the compress calls, we used list(), which does the same, but instead of executing a body of instructions, puts all the values into a list and returns it.
Combinatoric generators
Last but not least, combinatoric generators. These are really fun, if you are into this kind of thing. Let's just see a simple example on permutations.
According to Wolfram Mathworld:
A permutation, also called an "arrangement number" or "order", is a rearrangement of the elements of an ordered list S into a one-to-one correspondence with S itself.
For example, there are six permutations of ABC: ABC, ACB, BAC, BCA, CAB, and CBA.
If a set has N elements, then the number of permutations of them is N! (N factorial). For the ABC string, the permutations are 3! = 3 * 2 * 1 = 6. Let's do it in Python:
# permutations.py
from itertools import permutations
print(list(permutations('ABC')))
This very short snippet of code produces the following result:
$ python permutations.py
[('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B'),
('C', 'B', 'A')]
Be very careful when you play with permutations. Their number grows at a rate that is proportional to the factorial of the number of the elements you're permuting, and that number can get really big, really fast.
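If you want to get a feel for how fast that is, the following sketch prints a few factorials using math.factorial, and then double-checks the N! claim on a five-letter string. The sizes and the letters string are arbitrary values I picked for illustration.

```python
from itertools import permutations
from math import factorial

# how many permutations to expect for a few input sizes
for size in (3, 5, 10, 20):
    print(size, factorial(size))

# count the permutations one by one, without building a list in memory
letters = 'ABCDE'
how_many = sum(1 for _ in permutations(letters))
print(how_many == factorial(len(letters)))  # True
```

Counting with a running sum instead of calling list() is a deliberate choice: for larger inputs, materializing every permutation would exhaust your memory long before the loop finished.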
Summary
In this chapter, we've taken another step toward expanding our coding vocabulary. We've seen how to drive the execution of the code by evaluating conditions, and we've seen how to loop and iterate over sequences and collections of objects. This gives us the power to control what happens when our code is run, which means we are getting an idea of how to shape it so that it does what we want and it reacts to data that changes dynamically.
We've also seen how to combine everything together in a couple of simple examples, and in the end, we took a brief look at the itertools module, which is full of interesting iterators that can enrich our abilities with Python even more.
Now it's time to switch gears, take another step forward, and talk about functions. The next chapter is all about them because they are extremely important. Make sure you're comfortable with what has been covered up to now. I want to provide you with interesting examples, so I'll have to go a little faster. Ready? Turn the page.
Functions, the Building Blocks of Code
"To create architecture is to put in order. Put what in order? Functions and objects."
– Le Corbusier
In the previous chapters, we have seen that everything is an object in Python, and functions are no exception. But, what exactly is a function? A function is a sequence of instructions that perform a task, bundled as a unit. This unit can then be imported and used wherever it's needed. There are many advantages to using functions in your code, as we'll see shortly.
In this chapter, we are going to cover the following:
Functions: what they are and why we should use them
Scopes and name resolution
Function signatures: input parameters and return values
Recursive and anonymous functions
Importing objects for code reuse
I believe the saying, a picture is worth one thousand words, is particularly true when explaining functions to someone who is new to this concept, so please take a look at the following diagram:
As you can see, a function is a block of instructions, packaged as a whole, like a box. Functions can accept input arguments and produce output values. Both of these are optional, as we'll see in the examples in this chapter.
A function in Python is defined by using the def keyword, after which the name of the function follows, terminated by a pair of parentheses (which may or may not contain input parameters), and a colon (:) signals the end of the function definition line. Immediately afterwards, indented by four spaces, we find the body of the function, which is the set of instructions that the function will execute when called.
Note that the indentation by four spaces is not mandatory, but it is the number of spaces suggested by PEP 8, and, in practice, it is the most widely used spacing measure.
A function may or may not return an output. If a function wants to return an output, it does so by using the return keyword, followed by the desired output. If you have an eagle eye, you may have noticed the little * after Optional in the output section of the preceding diagram. This is because a function always returns something in Python, even if you don't explicitly use the return clause. If the function has no return statement in its body, or no value is given to the return statement itself, the function returns None. The reasons behind this design choice are outside the scope of an introductory chapter, so all you need to know is that this behavior will make your life easier. As always, thank you, Python.
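The three cases just described can be checked with a tiny sketch. The function names below are hypothetical and exist only to label each case.

```python
def no_return():
    pass  # no return statement at all

def bare_return():
    return  # return statement with no value

def explicit_return():
    return 42

print(no_return())        # None
print(bare_return())      # None
print(explicit_return())  # 42
```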
Why use functions?
Functions are among the most important concepts and constructs of any language, so let me give you a few reasons why we need them:
They reduce code duplication in a program. By having a specific task taken care of by a nice block of packaged code that we can import and call whenever we want, we don't need to duplicate its implementation.
They help in splitting a complex task or procedure into smaller blocks, each of which becomes a function.
They hide the implementation details from their users.
They improve traceability.
They improve readability.
Let's look at a few examples to get a better understanding of each point.
Reducing code duplication
Imagine that you are writing a piece of scientific software, and you need to calculate primes up to a limit, as we did in the previous chapter. You have a nice algorithm to calculate them, so you copy and paste it to wherever you need. One day, though, your friend, B. Riemann, gives you a better algorithm to calculate primes, which will save you a lot of time. At this point, you need to go over your whole code base and replace the old code with the new one.
This is actually a bad way to go about it. It's error-prone, you never know what lines you are chopping out or leaving in by mistake, when you cut and paste code into other code, and you may also risk missing one of the places where prime calculation is done, leaving your software in an inconsistent state where the same action is performed in different places in different ways. What if, instead of replacing code with a better version of it, you need to fix a bug, and you miss one of the places? That would be even worse.
So, what should you do? Simple! You write a function, get_prime_numbers(upto), and use it anywhere you need a list of primes. When B. Riemann comes to you and gives you the new code, all you have to do is replace the body of that function with the new implementation, and you're done! The rest of the software will automatically adapt, since it's just calling the function.
Your code will be shorter, it will not suffer from inconsistencies between old and new ways of performing a task, or undetected bugs due to copy-and-paste failures or oversights. Use functions, and you'll only gain from it, I promise.
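As a sketch of what such a function could look like, here is get_prime_numbers(upto) with a body that simply reuses the rudimentary for...else algorithm from the previous chapter. That body is my assumption for illustration; the whole point is that it could later be swapped for a better algorithm without touching any of the callers.

```python
def get_prime_numbers(upto):
    """Return the list of primes in [2, upto]."""
    primes = []
    for n in range(2, upto + 1):
        for divisor in range(2, n):
            if n % divisor == 0:
                break
        else:
            primes.append(n)
    return primes

# callers only depend on the name and the signature
print(get_prime_numbers(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```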
Splitting a complex task
Functions are also very useful for splitting long or complex tasks into smaller ones. The end result is that the code benefits from it in several ways, for example, readability, testability, and reuse. To give you a simple example, imagine that you're preparing a report. Your code needs to fetch data from a data source, parse it, filter it, polish it, and then a whole series of algorithms needs to be run against it, in order to produce the results that will feed the Report class. It's not uncommon to read procedures like this that are just one big do_report(data_source) function. There are tens or hundreds of lines of code that end with return report.
These situations are slightly more common in scientific code, which tends to be brilliant from an algorithmic point of view, but sometimes lacks the touch of experienced programmers when it comes to the style in which it is written. Now, picture a few hundred lines of code. It's very hard to follow through, to find the places where things are changing context (such as finishing one task and starting the next one). Do you have the picture in your mind? Good. Don't do it! Instead, look at this code:
# data.science.example.py
def do_report(data_source):
    # fetch and prepare data
    data = fetch_data(data_source)
    parsed_data = parse_data(data)
    filtered_data = filter_data(parsed_data)
    polished_data = polish_data(filtered_data)
    # run algorithms on data
    final_data = analyse(polished_data)
    # create and return report
    report = Report(final_data)
    return report
The previous example is fictitious, of course, but can you see how easy it would be to go through the code? If the end result looks wrong, it would be very easy to debug each of the single data outputs in the do_report function. Moreover, it's even easier to exclude part of the process temporarily from the whole procedure (you just need to comment out the parts you need to suspend). Code like this is easier to deal with.
Hiding implementation details
Let's stay with the preceding example to talk about this point as well. You can see that, by going through the code of the do_report function, you can get a pretty good understanding without reading one single line of implementation. This is because functions hide the implementation details. This feature means that, if you don't need to delve into the details, you are not forced to, in the way you would if do_report was just one big, fat function. In order to understand what was going on, you would have to read every single line of code. With functions, you don't need to. This reduces the time you spend reading the code and since, in a professional environment, reading code takes much more time than actually writing it, it's very important to reduce it by as much as we can.
Improving readability
Coders sometimes don't see the point in writing a function with a body of one or two lines of code, so let's look at an example that shows you why you should do it.
Imagine that you need to multiply two matrices:
Would you prefer to have to read this code:
# matrix.multiplication.nofunc.py
a = [[1, 2], [3, 4]]
b = [[5, 1], [2, 1]]
c = [[sum(i * j for i, j in zip(r, c)) for c in zip(*b)]
     for r in a]
Or would you prefer this one:
# matrix.multiplication.func.py
# this function could also be defined in another module
def matrix_mul(a, b):
    return [[sum(i * j for i, j in zip(r, c)) for c in zip(*b)]
            for r in a]
a = [[1, 2], [3, 4]]
b = [[5, 1], [2, 1]]
c = matrix_mul(a, b)
It's much easier to understand that c is the result of the multiplication between a and b in the second example. It's much easier to read through the code and, if you don't need to modify that multiplication logic, you don't even need to go into the implementation details. Therefore, readability is improved here while, in the first snippet, you would have to spend time trying to understand what that complicated list comprehension is doing.
Don't worry if you don't understand list comprehensions, we'll study them in Chapter 5, Saving Time and Memory.
Improving traceability
Imagine that you have written an e-commerce website. You have displayed the product prices all over the pages. Imagine that the prices in your database are stored with no VAT (sales tax), but you want to display them on the website with VAT at 20%. Here are a few ways of calculating the VAT-inclusive price from the VAT-exclusive price:
# vat.py
price = 100  # GBP, no VAT
final_price1 = price * 1.2
final_price2 = price + price / 5.0
final_price3 = price * (100 + 20) / 100.0
final_price4 = price + price * 0.2
All four of these ways of calculating a VAT-inclusive price are perfectly acceptable, and I promise you I have found them all in my colleagues' code, over the years. Now, imagine that you have started selling your products in different countries and some of them have different VAT rates, so you need to refactor your code (throughout the website) in order to make that VAT calculation dynamic.
How do you trace all the places in which you are performing a VAT calculation? Coding today is a collaborative task and you cannot be sure that the VAT has been calculated using only one of those forms. It's going to be hell, believe me.
So, let's write a function that takes the input values, vat and price (VAT-exclusive), and returns a VAT-inclusive price:
# vat.function.py
def calculate_price_with_vat(price, vat):
    return price * (100 + vat) / 100
Now you can import that function and use it in any place in your website where you need to calculate a VAT-inclusive price, and when you need to trace those calls, you can search for calculate_price_with_vat.
Note that, in the preceding example, price is assumed to be VAT-exclusive, and vat is a percentage value (for example, 19, 20, or 23).
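Here is a short usage sketch of that function, repeated so the snippet runs on its own. The prices and rates are made-up values; the variable names uk_price and other_price are just labels I chose.

```python
def calculate_price_with_vat(price, vat):
    return price * (100 + vat) / 100

# price is VAT-exclusive, vat is a percentage value
uk_price = calculate_price_with_vat(100, 20)
other_price = calculate_price_with_vat(100, 23)

print(uk_price)     # 120.0
print(other_price)  # 123.0
```

Every call site now spells out the rate it uses, so switching a country to a different VAT rate means changing one argument rather than hunting for four different formulas.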
Scopes and name resolution
Do you remember when we talked about scopes and namespaces in Chapter 1, A Gentle Introduction to Python? We're going to expand on that concept now. Finally, we can talk about functions and this will make everything easier to understand. Let's start with a very simple example:
# scoping.level.1.py
def my_function():
    test = 1  # this is defined in the local scope of the function
    print('my_function:', test)
test = 0  # this is defined in the global scope
my_function()
print('global:', test)
I have defined the test name in two different places in the previous example. It is actually in two different scopes. One is the global scope (test = 0), and the other is the local scope of the my_function function (test = 1). If you execute the code, you'll see this:
$ python scoping.level.1.py
my_function: 1
global: 0
It's clear that test = 1 shadows the test = 0 assignment in my_function. In the global context, test is still 0, as you can see from the output of the program, but we define the test name again in the function body, and we set it to point to an integer of value 1. Both test names therefore exist, one in the global scope, pointing to an int object with a value of 0, the other in the my_function scope, pointing to an int object with a value of 1. Now let's comment out the line with test = 1. When the function prints test, Python no longer finds it in the local scope, so it searches for the name in the next enclosing namespace (recall the LEGB rule: local, enclosing, global, built-in described in Chapter 1, A Gentle Introduction to Python) and, in this case, we will see the value 0 printed twice. Try it in your code.
Now, let's raise the stakes here and level up:
# scoping.level.2.py
def outer():
    test = 1  # outer scope
    def inner():
        test = 2  # inner scope
        print('inner:', test)
    inner()
    print('outer:', test)
test = 0  # global scope
outer()
print('global:', test)
In the preceding code, we have two levels of shadowing. One level is in the function outer, and the other one is in the function inner. It is far from rocket science, but it can be tricky. If we run the code, we get:
$ python scoping.level.2.py
inner: 2
outer: 1
global: 0
Try commenting out the test = 1 line. Can you figure out what the result will be? Well, when reaching the print('outer:', test) line, Python will have to look for test in the next enclosing scope, therefore it will find and print 0, instead of 1. Make sure you comment out test = 2 as well, to see whether you understand what happens, and whether the LEGB rule is clear, before proceeding.
Another thing to note is that Python gives you the ability to define a function in another function. The inner function's name is defined within the namespace of the outer function, exactly as would happen with any other name.
The global and nonlocal statements
Going back to the preceding example, we can alter what happens to the shadowing of the test name by using one of these two special statements: global and nonlocal. As you can see from the previous example, when we define test = 2 in the inner function, we overwrite test neither in the outer function nor in the global scope. We can get read access to those names if we use them in a nested scope that doesn't define them, but we cannot modify them because, when we write an assignment instruction, we're actually defining a new name in the current scope.
How do we change this behavior? Well, we can use the nonlocal statement. According to the official documentation:
"The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope excluding globals."
Let's introduce it in the inner function, and see what happens:
# scoping.level.2.nonlocal.py
def outer():
    test = 1  # outer scope
    def inner():
        nonlocal test
        test = 2  # nearest enclosing scope (which is 'outer')
        print('inner:', test)
    inner()
    print('outer:', test)
test = 0  # global scope
outer()
print('global:', test)
Notice how in the body of the inner function, I have declared the test name to be nonlocal. Running this code produces the following result:
$ python scoping.level.2.nonlocal.py
inner: 2
outer: 2
global: 0
Wow, look at that result! It means that, by declaring test to be nonlocal in the inner function, we actually get to bind the test name to the one declared in the outer function. If we removed the nonlocal test line from the inner function and tried the same trick in the outer function, we would get a SyntaxError, because the nonlocal statement works on enclosing scopes excluding the global one.
Isthereawaytogettothattest=0intheglobalnamespacethen?Ofcourse,wejustneedtousetheglobalstatement:
# scoping.level.2.global.py
def outer():
    test = 1  # outer scope
    def inner():
        global test
        test = 2  # global scope
        print('inner:', test)
    inner()
    print('outer:', test)
test = 0  # global scope
outer()
print('global:', test)
Note that we have now declared the test name to be global, which will basically bind it to the one we defined in the global namespace (test = 0). Run the code and you should get the following:
$ python scoping.level.2.global.py
inner: 2
outer: 1
global: 2
This shows that the name affected by the test = 2 assignment is now the global one. This trick would also work in the outer function because, in this case, we're referring to the global scope. Try it for yourself and see what changes. Get comfortable with scopes and name resolution; it's very important. Also, could you tell what happens if you defined inner outside outer in the preceding examples?
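As a small illustration of where nonlocal tends to be useful, here is a minimal sketch of a counter that keeps its state in an enclosing scope. The names make_counter and increment are hypothetical and do not come from the book's source code.

```python
# A hypothetical counter factory (not from the book) built on nonlocal.
def make_counter():
    count = 0  # lives in the enclosing scope of increment

    def increment():
        nonlocal count  # rebind the name defined in make_counter
        count += 1
        return count

    return increment


counter = make_counter()
print(counter())  # prints: 1
print(counter())  # prints: 2
```

Without the nonlocal line, count += 1 would be treated as an assignment to a new local name, and the call would fail with an UnboundLocalError.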
Input parameters
At the beginning of this chapter, we saw that a function can take input parameters. Before we delve into all possible types of parameters, let's make sure you have a clear understanding of what passing a parameter to a function means. There are three key points to keep in mind:
1. Argument-passing is nothing more than assigning an object to a local variable name
2. Assigning an object to an argument name inside a function doesn't affect the caller
3. Changing a mutable object argument in a function affects the caller
Let's look at an example for each of these points.
Argument-passing
Take a look at the following code. We declare a name, x, in the global scope, then we declare a function, func(y), and finally we call it, passing x:
# key.points.argument.passing.py
x = 3
def func(y):
    print(y)
func(x)  # prints: 3
When func is called with x, within its local scope, a name, y, is created, and it's pointed to the same object x is pointing to. This is better clarified by the following figure (don't worry about Python 3.3, this is a feature that hasn't changed):
The right part of the preceding figure depicts the state of the program when execution has reached the end, after func has returned (None). Take a look at the Frames column, and note that we have two names, x and func, in the global namespace (Global frame), pointing to an int (with a value of 3) and to a function object, respectively. Right beneath it, in the rectangle titled func, we can see the function's local namespace, in which only one name has been defined: y. Because we have called func with x (line 5 in the left part of the figure), y is pointing to the same object that x is pointing to. This is what happens under the hood when an argument is passed to a function. If we had used the name x instead of y in the function definition, things would have been exactly the same (only maybe a bit confusing at first): there would be a local x in the function, and a global x outside, as we saw in the Scopes and name resolution section previously in this chapter.
So, in a nutshell, what really happens is that the function creates, in its local scope, the names defined as arguments and, when we call it, we basically tell Python which objects those names must be pointed toward.
Assignment to argument names doesn't affect the caller
This is something that can be tricky to understand at first, so let's look at an example:
# key.points.assignment.py
x = 3
def func(x):
    x = 7  # defining a local x, not changing the global one
func(x)
print(x)  # prints: 3
In the preceding code, when the x = 7 line is executed, within the local scope of the func function, the name, x, is pointed to an integer with a value of 7, leaving the global x unaltered.
Changing a mutable affects the caller
This is the final point, and it's very important because Python apparently behaves differently with mutables (just apparently, though). Let's look at an example:
# key.points.mutable.py
x = [1, 2, 3]
def func(x):
    x[1] = 42  # this affects the caller!
func(x)
print(x)  # prints: [1, 42, 3]
Wow, we actually changed the original object! If you think about it, there is nothing weird in this behavior. The x name in the function is set to point to the caller object by the function call and, within the body of the function, we're not changing x, in that we're not changing its reference, or, in other words, we are not changing which object x is pointing to. We're accessing that object's element at position 1, and changing its value.
Remember point #2 under the Input parameters section: Assigning an object to an argument name within a function doesn't affect the caller. If that is clear to you, the following code should not be surprising:
# key.points.mutable.assignment.py
x = [1, 2, 3]
def func(x):
    x[1] = 42  # this changes the caller!
    x = 'something else'  # this points x to a new string object
func(x)
print(x)  # still prints: [1, 42, 3]
Take a look at the two lines in the body of func. At first, like before, we just access the caller object again, at position 1, and change its value to number 42. Then, we reassign x to point to the 'something else' string. This leaves the caller unaltered and, in fact, the output is the same as that of the previous snippet.
Take your time to play around with this concept, and experiment with prints and calls to the id function until everything is clear in your mind. This is one of the key aspects of Python and it must be very clear, otherwise you risk introducing subtle bugs into your code. Once again, the Python Tutor website (http://www.pythontutor.com/) will help you a lot by giving you a visual representation of these concepts.
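One possible way to run the experiment with id suggested above is sketched below. The function name check_identity and the values are made up for illustration; the point is only that the parameter starts out bound to the caller's object and stops being so after reassignment.

```python
# Illustrative only: check_identity and its names are hypothetical.
def check_identity(y, original_id):
    same_before = id(y) == original_id  # y is bound to the caller's object
    y = [9, 9, 9]                       # rebinding y to a new list
    same_after = id(y) == original_id   # y now points to another object
    return same_before, same_after


x = [1, 2, 3]
print(check_identity(x, id(x)))  # prints: (True, False)
print(x)                         # prints: [1, 2, 3]
```

The caller's list is untouched because the function only changed what its local name refers to, never the list itself.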
Now that we have a good understanding of input parameters and how they behave, let's see how we can specify them.
How to specify input parameters
There are five different ways of specifying input parameters:
Positional arguments
Keyword arguments
Variable positional arguments
Variable keyword arguments
Keyword-only arguments
Let's look at them one by one.
Positional arguments
Positional arguments are read from left to right and they are the most common type of arguments:
# arguments.positional.py
def func(a, b, c):
    print(a, b, c)
func(1, 2, 3)  # prints: 1 2 3
There is not much else to say. They can be as numerous as you want and they are assigned by position. In the function call, 1 comes first, 2 comes second, and 3 comes third, therefore they are assigned to a, b, and c, respectively.
Keyword arguments and default values
Keyword arguments are assigned by keyword using the name=value syntax:
# arguments.keyword.py
def func(a, b, c):
    print(a, b, c)
func(a=1, c=2, b=3)  # prints: 1 3 2
Keyword arguments are matched by name, even when they don't respect the definition's original position (we'll see that there is a limitation to this behavior later, when we mix and match different types of arguments).
The counterpart of keyword arguments, on the definition side, is default values. The syntax is the same, name=value, and allows us to not have to provide an argument if we are happy with the given default:
# arguments.default.py
def func(a, b=4, c=88):
    print(a, b, c)
func(1)  # prints: 1 4 88
func(b=5, a=7, c=9)  # prints: 7 5 9
func(42, c=9)  # prints: 42 4 9
func(42, 43, 44)  # prints: 42 43 44
There are two things to notice, which are very important. First of all, you cannot specify a default argument on the left of a positional one. Second, note how in the examples, when an argument is passed without using the argument_name=value syntax, it must come before any keyword arguments, and it is assigned by position (in these calls, to a). Notice also that passing values in a positional fashion still works, and follows the function signature order (last line of the example).
Try and scramble those arguments and see what happens. Python error messages are very good at telling you what's wrong. So, for example, if you tried something such as this:
# arguments.default.error.py
def func(a, b=4, c=88):
    print(a, b, c)
func(b=1, c=2, 42)  # positional argument after keyword one
You would get the following error:
$ python arguments.default.error.py
  File "arguments.default.error.py", line 4
    func(b=1, c=2, 42)  # positional argument after keyword one
                   ^
SyntaxError: positional argument follows keyword argument
This informs you that you've called the function incorrectly.
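The first rule mentioned earlier, that a default argument cannot appear to the left of a positional one in a definition, can be checked in a similar way. The sketch below is only an illustration: it feeds a deliberately invalid definition to the built-in compile function so that the error can be caught instead of stopping the script. The exact wording of the message differs between Python versions, so only the error type is printed.

```python
# The source string below is deliberately invalid; compile() lets us
# trigger the SyntaxError without breaking this script.
bad_definition = "def func(a=1, b):\n    pass\n"

try:
    compile(bad_definition, "<example>", "exec")
except SyntaxError as err:
    error_name = type(err).__name__
    print(error_name)  # prints: SyntaxError
```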
Variable positional arguments
Sometimes you may want to pass a variable number of positional arguments to a function, and Python provides you with the ability to do it. Let's look at a very common use case, the minimum function. This is a function that calculates the minimum of its input values:
# arguments.variable.positional.py
def minimum(*n):
    # print(type(n))  # n is a tuple
    if n:  # explained after the code
        mn = n[0]
        for value in n[1:]:
            if value < mn:
                mn = value
        print(mn)
minimum(1, 3, -7, 9)  # n = (1, 3, -7, 9) - prints: -7
minimum()  # n = () - prints: nothing
As you can see, when we specify a parameter prepending a * to its name, we are telling Python that that parameter will be collecting a variable number of positional arguments, according to how the function is called. Within the function, n is a tuple. Uncomment print(type(n)) to see for yourself and play around with it for a bit.
Have you noticed how we checked whether n wasn't empty with a simple if n:? This is because collection objects evaluate to True when non-empty, and otherwise False in Python. This is true for tuples, sets, lists, dictionaries, and so on. One other thing to note is that we may want to throw an error when we call the function with no arguments, instead of silently doing nothing. In this context, we're not concerned about making this function robust, but in understanding variable positional arguments.
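For the curious, a stricter variant could look like the following sketch. It is not the book's version: raising TypeError (and the message text) is an assumption made here for illustration, and the function returns the result instead of printing it.

```python
# A hypothetical stricter variant of minimum(); the choice of TypeError
# and its message are assumptions, not part of the original example.
def minimum(*n):
    """Return the smallest argument, or raise if none were given."""
    if not n:  # an empty tuple evaluates to False
        raise TypeError('minimum() expects at least one argument')
    mn = n[0]
    for value in n[1:]:
        if value < mn:
            mn = value
    return mn


print(minimum(1, 3, -7, 9))  # prints: -7
```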
Let's make another example to show you two things that, in my experience, are confusing to those who are new to this:
# arguments.variable.positional.unpacking.py
def func(*args):
    print(args)
values = (1, 3, -7, 9)
func(values)  # equivalent to: func((1, 3, -7, 9))
func(*values)  # equivalent to: func(1, 3, -7, 9)
Take a good look at the last two lines of the preceding example. In the first one, we call func with one argument, a four-element tuple. In the second example, by using the * syntax, we're doing something called unpacking, which means that the four-element tuple is unpacked, and the function is called with four arguments: 1, 3, -7, 9.
This behavior is part of the magic Python does to allow you to do amazing things when calling functions dynamically.
Variable keyword arguments
Variable keyword arguments are very similar to variable positional arguments. The only difference is the syntax (** instead of *) and that they are collected in a dictionary. Collection and unpacking work in the same way, so let's look at an example:
# arguments.variable.keyword.py
def func(**kwargs):
    print(kwargs)
# All calls equivalent. They print: {'a': 1, 'b': 42}
func(a=1, b=42)
func(**{'a': 1, 'b': 42})
func(**dict(a=1, b=42))
All the calls are equivalent in the preceding example. You can see that adding a ** in front of the parameter name in the function definition tells Python to use that name to collect a variable number of keyword parameters. On the other hand, when we call the function, we can either pass name=value arguments explicitly, or unpack a dictionary using the same ** syntax.
The reason why being able to pass a variable number of keyword parameters is so important may not be evident at the moment, so, how about a more realistic example? Let's define a function that connects to a database. We want to connect to a default database by simply calling this function with no parameters. We also want to connect to any other database by passing the function the appropriate arguments. Before you read on, try to spend a couple of minutes figuring out a solution by yourself:
# arguments.variable.db.py
def connect(**options):
    conn_params = {
        'host': options.get('host', '127.0.0.1'),
        'port': options.get('port', 5432),
        'user': options.get('user', ''),
        'pwd': options.get('pwd', ''),
    }
    print(conn_params)
    # we then connect to the db (commented out)
    # db.connect(**conn_params)
connect()
connect(host='127.0.0.42', port=5433)
connect(port=5431, user='fab', pwd='gandalf')
Note that in the function, we can prepare a dictionary of connection parameters (conn_params) using default values as fallbacks, allowing them to be overwritten if they are provided in the function call. There are better ways to do this with fewer lines of code, but we're not concerned with that right now. Running the preceding code yields the following result:
$ python arguments.variable.db.py
{'host': '127.0.0.1', 'port': 5432, 'user': '', 'pwd': ''}
{'host': '127.0.0.42', 'port': 5433, 'user': '', 'pwd': ''}
{'host': '127.0.0.1', 'port': 5431, 'user': 'fab', 'pwd': 'gandalf'}
Note the correspondence between the function calls and the output. Notice how default values are overridden according to what was passed to the function.
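One of the shorter alternatives hinted at above might look like the sketch below, which merges a dictionary of defaults with the given options using dictionary unpacking (this requires Python 3.5 or later). The DEFAULTS name is hypothetical, and this is only one possible approach, not necessarily the one the author had in mind.

```python
# A hypothetical shorter variant of connect(); the default values are
# taken from the example above, the DEFAULTS name is an assumption.
DEFAULTS = {'host': '127.0.0.1', 'port': 5432, 'user': '', 'pwd': ''}


def connect(**options):
    # Keys unpacked later override earlier ones, so options win.
    conn_params = {**DEFAULTS, **options}
    return conn_params


print(connect())
print(connect(port=5431, user='fab', pwd='gandalf'))
```

Note one difference in behavior: this variant also passes through any unexpected keys found in options, whereas the original version silently ignores them.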
Keyword-only arguments
Python 3 allows for a new type of parameter: the keyword-only parameter. We are going to study them only briefly as their use cases are not that frequent. There are two ways of specifying them, either after the variable positional arguments, or after a bare *. Let's see an example of both:
# arguments.keyword.only.py
def kwo(*a, c):
    print(a, c)
kwo(1, 2, 3, c=7)  # prints: (1, 2, 3) 7
kwo(c=4)  # prints: () 4
# kwo(1, 2)  # breaks with the following error
# TypeError: kwo() missing 1 required keyword-only argument: 'c'
def kwo2(a, b=42, *, c):
    print(a, b, c)
kwo2(3, b=7, c=99)  # prints: 3 7 99
kwo2(3, c=13)  # prints: 3 42 13
# kwo2(3, 23)  # breaks with the following error
# TypeError: kwo2() missing 1 required keyword-only argument: 'c'
As anticipated, the function, kwo, takes a variable number of positional arguments (a) and a keyword-only one, c. The results of the calls are straightforward and you can uncomment the third call to see what error Python returns. Note that the call is syntactically valid: the error is a TypeError raised when the function is called.
The same applies to the function, kwo2, which differs from kwo in that it takes a positional argument, a, a keyword argument, b, and then a keyword-only one, c. You can uncomment the third call to see the error.
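If you prefer to see the error without uncommenting anything, a try/except block works too. The following is a minimal sketch, with kwo2 changed to return its arguments instead of printing them so that the result is easier to inspect.

```python
# Same signature as kwo2 above; returning a tuple is a change made
# here for illustration only.
def kwo2(a, b=42, *, c):
    return a, b, c


print(kwo2(3, c=13))  # prints: (3, 42, 13)

try:
    kwo2(3, 23)  # c is required and can only be passed by keyword
except TypeError as err:
    print(err)
```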
Now that you know how to specify different types of input parameters, let's see how you can combine them in function definitions.
Combining input parameters
You can combine input parameters, as long as you follow these ordering rules:
When defining a function, normal positional arguments come first (name), then any default arguments (name=value), then the variable positional arguments (*name or simply *), then any keyword-only arguments (either name or name=value form is good), and then any variable keyword arguments (**name).
On the other hand, when calling a function, arguments must be given in the following order: positional arguments first (value), then any combination of keyword arguments (name=value), variable positional arguments (*name), and then variable keyword arguments (**name).
Since this can be a bit tricky when left hanging in the theoretical world, let's look at a couple of quick examples:
# arguments.all.py
def func(a, b, c=7, *args, **kwargs):
    print('a, b, c:', a, b, c)
    print('args:', args)
    print('kwargs:', kwargs)
func(1, 2, 3, *(5, 7, 9), **{'A': 'a', 'B': 'b'})
func(1, 2, 3, 5, 7, 9, A='a', B='b')  # same as previous one
Note the order of the parameters in the function definition, and that the two calls are equivalent. In the first one, we're using the unpacking operators for iterables and dictionaries, while in the second one we're using a more explicit syntax. The execution of this yields the following (I printed only the result of one call, the other one being the same):
$ python arguments.all.py
a, b, c: 1 2 3
args: (5, 7, 9)
kwargs: {'A': 'a', 'B': 'b'}
Let's now look at an example with keyword-only arguments:
# arguments.all.kwonly.py
def func_with_kwonly(a, b=42, *args, c, d=256, **kwargs):
    print('a, b:', a, b)
    print('c, d:', c, d)
    print('args:', args)
    print('kwargs:', kwargs)
# both calls equivalent
func_with_kwonly(3, 42, c=0, d=1, *(7, 9, 11), e='E', f='F')
func_with_kwonly(3, 42, *(7, 9, 11), c=0, d=1, e='E', f='F')
Note the keyword-only arguments, c and d, in the function declaration. They come after the *args variable positional argument, and it would be the same if they came right after a single * (in which case there wouldn't be a variable positional argument). The execution of this yields the following (I printed only the result of one call):
$ python arguments.all.kwonly.py
a, b: 3 42
c, d: 0 1
args: (7, 9, 11)
kwargs: {'e': 'E', 'f': 'F'}
One other thing to note is the names I gave to the variable positional and keyword arguments. You're free to choose differently, but be aware that args and kwargs are the conventional names given to these parameters, at least generically.
Additional unpacking generalizations
One of the recent new features, introduced in Python 3.5, is the ability to extend the iterable (*) and dictionary (**) unpacking operators to allow unpacking in more positions, an arbitrary number of times, and in additional circumstances. I'll present you with an example concerning function calls:
# additional.unpacking.py
def additional(*args, **kwargs):
    print(args)
    print(kwargs)
args1 = (1, 2, 3)
args2 = [4, 5]
kwargs1 = dict(option1=10, option2=20)
kwargs2 = {'option3': 30}
additional(*args1, *args2, **kwargs1, **kwargs2)
In the previous example, we defined a simple function that prints its input arguments, args and kwargs. The new feature lies in the way we call this function. Notice how we can unpack multiple iterables and dictionaries, and they are correctly coalesced under args and kwargs. The reason why this feature is important is that it allows us not to have to merge args1 with args2, and kwargs1 with kwargs2 in the code. Running the code produces:
$ python additional.unpacking.py
(1, 2, 3, 4, 5)
{'option1': 10, 'option2': 20, 'option3': 30}
Please refer to PEP 448 (https://www.python.org/dev/peps/pep-0448/) to learn the full extent of this new feature and see further examples.
Avoid the trap! Mutable defaults
One thing to be very aware of with Python is that default values are created at def time, therefore, subsequent calls to the same function will possibly behave differently according to the mutability of their default values. Let's look at an example:
# arguments.defaults.mutable.py
def func(a=[], b={}):
    print(a)
    print(b)
    print('#' * 12)
    a.append(len(a))  # this will affect a's default value
    b[len(a)] = len(a)  # and this will affect b's one
func()
func()
func()
Both parameters have mutable default values. This means that, if you affect those objects, any modification will stick around in subsequent function calls. See if you can understand the output of those calls:
$ python arguments.defaults.mutable.py
[]
{}
############
[0]
{1: 1}
############
[0, 1]
{1: 1, 2: 2}
############
It's interesting, isn't it? While this behavior may seem very weird at first, it actually makes sense, and it's very handy, for example, when using memoization techniques (Google an example of that, if you're interested). Even more interesting is what happens when, between the calls, we introduce one that doesn't use defaults, such as this:
# arguments.defaults.mutable.intermediate.call.py
func()
func(a=[1, 2, 3], b={'B': 1})
func()
When we run this code, this is the output:
$ python arguments.defaults.mutable.intermediate.call.py
[]
{}
############
[1, 2, 3]
{'B': 1}
############
[0]
{1: 1}
############
This output shows us that the defaults are retained even if we call the function with other values. One question that comes to mind is, how do I get a fresh empty value every time? Well, the convention is the following:
# arguments.defaults.mutable.no.trap.py
def func(a=None):
    if a is None:
        a = []
    # do whatever you want with `a` ...
Note that, by using the preceding technique, if a isn't passed when calling the function, you always get a brand new, empty list.
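To see the convention at work, here is a small sketch with a function body filled in. The append_to name and its item parameter are hypothetical, chosen only to make the effect visible.

```python
# A hypothetical function using the None-default convention.
def append_to(item, a=None):
    if a is None:
        a = []  # a brand new list on every call that omits `a`
    a.append(item)
    return a


print(append_to(1))          # prints: [1]
print(append_to(2))          # prints: [2]
print(append_to(3, [1, 2]))  # prints: [1, 2, 3]
```

The second call does not see the 1 appended by the first, because each call that omits a builds its own list.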
Okay, enough with the input, let's look at the other side of the coin, the output.
Return values
The return values of functions are one of those things where Python is ahead of most other languages. Functions are usually allowed to return one object (one value) but, in Python, you can return a tuple, and this implies that you can return whatever you want. This feature allows a coder to write software that would be much harder to write in any other language, or certainly more tedious. We've already said that to return something from a function we need to use the return statement, followed by what we want to return. There can be as many return statements as needed in the body of a function.
On the other hand, if within the body of a function we don't return anything, or we invoke a bare return statement, the function will return None. This behavior is harmless and, even though I don't have the room here to go into detail explaining why Python was designed like this, let me just tell you that this feature allows for several interesting patterns, and confirms Python as a very consistent language.
I say it's harmless because you are never forced to collect the result of a function call. I'll show you what I mean with an example:
# return.none.py
def func():
    pass
func()  # the return of this call won't be collected. It's lost.
a = func()  # the return of this one instead is collected into `a`
print(a)  # prints: None
Note that the whole body of the function is composed only of the pass statement. As the official documentation tells us, pass is a null operation. When it is executed, nothing happens. It is useful as a placeholder when a statement is required syntactically, but no code needs to be executed. In other languages, we would probably just indicate that with a pair of curly brackets ({}), which define an empty scope, but in Python, a scope is defined by indenting code, therefore a statement such as pass is necessary.
Notice also that the first call of the func function returns a value (None) which we don't collect. As I said before, collecting the return value of a function call is not mandatory.
Now, that's good but not very interesting so, how about we write an interesting function? Remember that in Chapter 1, A Gentle Introduction to Python, we talked about the factorial of a number. Let's write our own factorial function here (for simplicity, I will assume the function is always called correctly with appropriate values so I won't sanity-check the input argument):
# return.single.value.py
def factorial(n):
    if n in (0, 1):
        return 1
    result = n
    for k in range(2, n):
        result *= k
    return result
f5 = factorial(5)  # f5 = 120
Note that we have two points of return. If n is either 0 or 1 (in Python it's common to use the in type of check, as I did, instead of the more verbose if n == 0 or n == 1:), we return 1. Otherwise, we perform the required calculation and we return result. Let's try to write this function a little bit more succinctly:
# return.single.value.2.py
from functools import reduce
from operator import mul
def factorial(n):
    return reduce(mul, range(1, n + 1), 1)
f5 = factorial(5)  # f5 = 120
I know what you're thinking: one line? Python is elegant, and concise! I think this function is readable even if you have never seen reduce or mul, but if you can't read it or understand it, set aside a few minutes and do some research on the Python documentation until its behavior is clear to you. Being able to look up functions in the documentation and understand code written by someone else is a task every developer needs to be able to perform, so take this as a challenge.
To this end, make sure you look up the help function, which proves quite helpful when exploring with the console.
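If you would like a hint before diving into the documentation, the following sketch spells out, as a plain loop, roughly what the reduce call does here. The factorial_loop name is made up for this illustration.

```python
from functools import reduce
from operator import mul


# A hypothetical loop that mirrors reduce(mul, range(1, n + 1), 1).
def factorial_loop(n):
    result = 1                   # the initial value passed to reduce
    for k in range(1, n + 1):
        result = mul(result, k)  # mul(a, b) is the same as a * b
    return result


print(factorial_loop(5))  # prints: 120
print(factorial_loop(5) == reduce(mul, range(1, 6), 1))  # prints: True
```

When n is 0, the range is empty and the initial value, 1, is returned unchanged, which is why the one-liner needs no special case.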
Returning multiple values
Unlike in most other languages, in Python it's very easy to return multiple objects from a function. This feature opens up a whole world of possibilities and allows you to code in a style that is hard to reproduce with other languages. Our thinking is limited by the tools we use, therefore when Python gives you more freedom than other languages, it is actually boosting your own creativity as well. To return multiple values is very easy, you just use tuples (either explicitly or implicitly). Let's look at a simple example that mimics the divmod built-in function:
# return.multiple.py
def moddiv(a, b):
    return a // b, a % b
print(moddiv(20, 7))  # prints (2, 6)
I could have wrapped the returned expression, a // b, a % b, in parentheses, making it an explicit tuple, but there's no need for that. The preceding function returns both the result and the remainder of the division, at the same time.
In the source code for this example, I have left a simple example of a test function to make sure my code is doing the correct calculation.
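The book's test function is not reproduced here. As a rough idea of what such a check might look like, the sketch below compares moddiv with the built-in divmod and also shows how the returned tuple can be unpacked into two names. The test_moddiv name is an assumption.

```python
def moddiv(a, b):
    return a // b, a % b


# Unpack the returned tuple into two separate names.
quotient, remainder = moddiv(20, 7)
print(quotient, remainder)  # prints: 2 6


# A hypothetical sanity check against the built-in divmod.
def test_moddiv():
    for a, b in [(20, 7), (9, 3), (1, 5)]:
        assert moddiv(a, b) == divmod(a, b)


test_moddiv()
```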
A few useful tips
When writing functions, it's very useful to follow guidelines so that you write them well. I'll quickly point some of them out:
Functions should do one thing: Functions that do one thing are easy to describe in one short sentence. Functions that do multiple things can be split into smaller functions that do one thing. These smaller functions are usually easier to read and understand. Remember the data science example we saw a few pages ago.
Functions should be small: The smaller they are, the easier it is to test them and to write them so that they do one thing.
The fewer input parameters, the better: Functions that take a lot of arguments quickly become harder to manage (among other issues).
Functions should be consistent in their return values: Returning False or None is not the same thing, even if within a Boolean context they both evaluate to False. False means that we have information (False), while None means that there is no information. Try writing functions that return in a consistent way, no matter what happens in their body.
Functions shouldn't have side effects: In other words, functions should not affect the values you call them with. This is probably the hardest statement to understand at this point, so I'll give you an example using lists. In the following code, note how numbers is not sorted by the sorted function, which actually returns a sorted copy of numbers. Conversely, the list.sort() method is acting on the numbers object itself, and that is fine because it is a method (a function that belongs to an object and therefore has the rights to modify it):
>>> numbers = [4, 1, 7, 5]
>>> sorted(numbers)  # won't sort the original `numbers` list
[1, 4, 5, 7]
>>> numbers  # let's verify
[4, 1, 7, 5]  # good, untouched
>>> numbers.sort()  # this will act on the list
>>> numbers
[1, 4, 5, 7]
Follow these guidelines and you'll write better functions, which will serve you well.
Chapter 3, Functions, in Clean Code by Robert C. Martin, Prentice Hall, is dedicated to functions and it's probably the best set of guidelines I've ever read on the subject.
Recursive functions
When a function calls itself to produce a result, it is said to be recursive. Sometimes recursive functions are very useful in that they make it easier to write code. Some algorithms are very easy to write using the recursive paradigm, while others are not. There is no recursive function that cannot be rewritten in an iterative fashion, so it's usually up to the programmer to choose the best approach for the case at hand.
The body of a recursive function usually has two sections: one where the return value depends on a subsequent call to itself, and one where it doesn't (called a base case).
As an example, we can consider the (hopefully familiar by now) factorial function, N!. The base case is when N is either 0 or 1. The function returns 1 with no need for further calculation. On the other hand, in the general case, N! returns the product 1 * 2 * ... * (N-1) * N. If you think about it, N! can be rewritten like this: N! = (N-1)! * N. As a practical example, consider 5! = 1 * 2 * 3 * 4 * 5 = (1 * 2 * 3 * 4) * 5 = 4! * 5.
Let's write this down in code:
# recursive.factorial.py
def factorial(n):
    if n in (0, 1):  # base case
        return 1
    return factorial(n - 1) * n  # recursive case
When writing recursive functions, always consider how many nested calls you make, since there is a limit. For further information on this, check out sys.getrecursionlimit() and sys.setrecursionlimit().
Recursive functions are used a lot when writing algorithms and they can be really fun to write. As an exercise, try to solve a couple of simple problems using both a recursive and an iterative approach.
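As a starting point for that exercise, here is a sketch that places an iterative factorial next to the recursive one and shows what happens when the recursion goes deeper than the interpreter allows. The factorial_iterative name is hypothetical, and the limit itself depends on your interpreter's settings.

```python
import sys


def factorial(n):
    if n in (0, 1):  # base case
        return 1
    return factorial(n - 1) * n  # recursive case


# A hypothetical iterative counterpart, for comparison.
def factorial_iterative(n):
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


print(factorial(5) == factorial_iterative(5))  # prints: True

too_deep = sys.getrecursionlimit() + 10
try:
    factorial(too_deep)  # needs more nested calls than allowed
except RecursionError:
    print('recursion limit exceeded')
```

The iterative version has no such restriction, since it never nests calls.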
Anonymous functions
One last type of function that I want to talk about is the anonymous function. These functions, which are called lambdas in Python, are usually used when a fully-fledged function with its own name would be overkill, and all we want is a quick, simple one-liner that does the job.
Imagine that you want a list of all the numbers up to N that are multiples of five. Imagine that you want to filter those out using the filter function, which takes a function and an iterable and constructs a filter object that you can iterate on, from those elements of the iterable for which the function returns True. Without using an anonymous function, you would do something like this:
# filter.regular.py
def is_multiple_of_five(n):
    return not n % 5
def get_multiples_of_five(n):
    return list(filter(is_multiple_of_five, range(n)))
Note how we use is_multiple_of_five to filter the first n natural numbers. This seems a bit excessive: the task is simple and we don't need to keep the is_multiple_of_five function around for anything else. Let's rewrite it using a lambda function:
# filter.lambda.py
def get_multiples_of_five(n):
    return list(filter(lambda k: not k % 5, range(n)))
The logic is exactly the same but the filtering function is now a lambda. Defining a lambda is very easy and follows this form: func_name = lambda [parameter_list]: expression. A function object is returned, which is equivalent to this: def func_name([parameter_list]): return expression.
Note that optional parameters are indicated following the common syntax of wrapping them in square brackets.
Let's look at another couple of examples of equivalent functions defined in the two forms:
# lambda.explained.py
# example 1: adder
def adder(a, b):
    return a + b
# is equivalent to:
adder_lambda = lambda a, b: a + b
# example 2: to uppercase
def to_upper(s):
    return s.upper()
# is equivalent to:
to_upper_lambda = lambda s: s.upper()
The preceding examples are very simple. The first one adds two numbers, and the second one produces the uppercase version of a string. Note that I assigned what is returned by the lambda expressions to a name (adder_lambda, to_upper_lambda), but there is no need for that when you use lambdas in the way we did in the filter example.
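Another place where an unnamed lambda often shows up is the key argument of the built-in sorted function. The sketch below, with made-up data, sorts a list of pairs by their second element.

```python
# Made-up data for illustration: (label, number) pairs.
pairs = [('a', 3), ('b', 1), ('c', 2)]

# The lambda tells sorted which part of each pair to compare.
by_number = sorted(pairs, key=lambda pair: pair[1])
print(by_number)  # prints: [('b', 1), ('c', 2), ('a', 3)]
```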
Function attributes
Every function is a fully-fledged object and, as such, it has many attributes. Some of them are special and can be used in an introspective way to inspect the function object at runtime. The following script is an example that shows a part of them and how to display their value for an example function:
# func.attributes.py
def multiplication(a, b=1):
    """Return a multiplied by b."""
    return a * b
special_attributes = [
    "__doc__", "__name__", "__qualname__", "__module__",
    "__defaults__", "__code__", "__globals__", "__dict__",
    "__closure__", "__annotations__", "__kwdefaults__",
]
for attribute in special_attributes:
    print(attribute, '->', getattr(multiplication, attribute))
I used the built-in getattr function to get the value of those attributes. getattr(obj, attribute) is equivalent to obj.attribute and comes in handy when we need to get an attribute at runtime using its string name. Running this script yields:
$ python func.attributes.py
__doc__ -> Return a multiplied by b.
__name__ -> multiplication
__qualname__ -> multiplication
__module__ -> __main__
__defaults__ -> (1,)
__code__ -> <code object multiplication at 0x10caf7660, file "func.attributes.py", line 1>
__globals__ -> {...omitted...}
__dict__ -> {}
__closure__ -> None
__annotations__ -> {}
__kwdefaults__ -> None
I have omitted the value of the __globals__ attribute, as it was too big. An explanation of the meaning of this attribute can be found in the Callable types section of the Python Data Model documentation page (https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy). Should you want to see all the attributes of an object, just call dir(object_name) and you'll be given the list of all of its attributes.
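As a small addition, getattr also accepts an optional third argument that is returned when the attribute doesn't exist, which avoids an AttributeError. The sketch below illustrates this together with dir; the 'missing' attribute name and the 'n/a' fallback are made up.

```python
def multiplication(a, b=1):
    """Return a multiplied by b."""
    return a * b


print(getattr(multiplication, '__name__'))        # prints: multiplication
# 'missing' is a made-up attribute name; 'n/a' is the fallback value.
print(getattr(multiplication, 'missing', 'n/a'))  # prints: n/a
print('__doc__' in dir(multiplication))           # prints: True
```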
Built-in functions
Python comes with a lot of built-in functions. They are available anywhere and you can get a list of them by inspecting the builtins module with dir(__builtins__), or by going to the official Python documentation. Unfortunately, I don't have the room to go through all of them here. We've already seen some of them, such as any, bin, bool, divmod, filter, float, getattr, id, int, len, list, min, print, set, tuple, type, and zip, but there are many more, which you should read at least once. Get familiar with them, experiment, write a small piece of code for each of them, and make sure you have them at your fingertips so that you can use them when you need them.
One final example
Before we finish off this chapter, how about one last example? I was thinking we could write a function to generate a list of prime numbers up to a limit. We've already seen the code for this so let's make it a function and, to keep it interesting, let's optimize it a bit.
It turns out that, to decide whether a number, N, is prime, you don't need to divide it by all numbers from 2 to N-1. You can stop at √N. Moreover, you don't need to test the division for all numbers from 2 to √N, you can just use the primes in that range. I'll leave it to you to figure out why this works, if you're interested. Let's see how the code changes:
# primes.py
from math import sqrt, ceil
def get_primes(n):
    """Calculate a list of primes up to n (included)."""
    primelist = []
    for candidate in range(2, n + 1):
        is_prime = True
        root = ceil(sqrt(candidate))  # division limit
        for prime in primelist:  # we try only the primes
            if prime > root:  # no need to check any further
                break
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            primelist.append(candidate)
    return primelist
The overall structure of the code is the same as in the previous chapter, but we have changed the division algorithm so that we only test divisibility using the previously calculated primes and we stop once the testing divisor is greater than the root of the candidate. We used the primelist result list to get the primes for the division. We calculated the root value using a fancy formula, the integer value of the ceiling of the root of the candidate. While a simple int(k ** 0.5) + 1 would have served our purpose as well, the formula I chose is cleaner and requires me to use a couple of imports, which I wanted to show you. Check out the functions in the math module, they are very interesting!
Documenting your code
I'm a big fan of code that doesn't need documentation. When you program correctly, choose the right names and take care of the details, your code should come out as self-explanatory and documentation should not be needed. Sometimes a comment is very useful though, and so is some documentation. You can find the guidelines for documenting Python in PEP 257 - Docstring conventions (https://www.python.org/dev/peps/pep-0257/), but I'll show you the basics here.
Python is documented with strings, which are aptly called docstrings. Any object can be documented, and you can use either one-line or multiline docstrings. One-liners are very simple. They should not provide another signature for the function, but clearly state its purpose:
# docstrings.py
def square(n):
    """Return the square of a number n."""
    return n ** 2
def get_username(userid):
    """Return the username of a user given their id."""
    return db.get(user_id=userid).username
Using triple double-quoted strings allows you to expand easily later on. Use sentences that end in a period, and don't leave blank lines before or after.
Multiline docstrings are structured in a similar way. There should be a one-liner that briefly gives you the gist of what the object is about, and then a more verbose description. As an example, I have documented a fictitious connect function, using the Sphinx notation, in the following example:
def connect(host, port, user, password):
    """Connect to a database.

    Connect to a PostgreSQL database directly, using the given parameters.

    :param host: The host IP.
    :param port: The desired port.
    :param user: The connection username.
    :param password: The connection password.
    :return: The connection object.
    """
    # body of the function here...
    return connection
Sphinx is probably the most widely used tool for creating Python documentation. In fact, the official Python documentation was written with it. It's definitely worth spending some time checking it out.
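Docstrings are not just comments: Python stores them on the object, in the __doc__ attribute, which is what tools such as the help function read. A minimal sketch, reusing the square function from above:

```python
def square(n):
    """Return the square of a number n."""
    return n ** 2


print(square.__doc__)  # prints: Return the square of a number n.
# Calling help(square) in the console displays the same text,
# together with the function's signature.
```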
Importing objects
Now that you know a lot about functions, let's look at how to use them. The whole point of writing functions is to be able to reuse them later, and in Python, this translates to importing them into the namespace where you need them. There are many different ways to import objects into a namespace, but the most common ones are import module_name and from module_name import function_name. Of course, these are quite simplistic examples, but bear with me for the time being.
The import module_name form finds the module_name module and defines a name for it in the local namespace where the import statement is executed. The from module_name import identifier form is a little bit more complicated than that, but basically does the same thing. It finds module_name and searches for an attribute (or a submodule) and stores a reference to identifier in the local namespace.
Both forms have the option to change the name of the imported object using the as clause:
from mymodule import myfunc as better_named_func
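Since mymodule is a made-up name, here is a small runnable sketch of the same idea using the standard library math module. The aliases m and square_root are hypothetical names I picked only for illustration.

```python
# Alias a whole module.
import math as m

# Alias a single object imported from a module.
from math import sqrt as square_root

print(m.pi > 3)                # prints: True
print(square_root(16))         # prints: 4.0

# Both names refer to the very same function object.
print(m.sqrt is square_root)   # prints: True
```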
Just to give you a flavor of what importing looks like, here's an example from a test module of one of my projects (notice that the blank lines between blocks of imports follow the guidelines from PEP 8 at https://www.python.org/dev/peps/pep-0008/#imports: standard library, third party, and local code):
from datetime import datetime, timezone  # two imports on the same line
from unittest.mock import patch  # single import

import pytest  # third party library

from core.models import (  # multiline import
    Exam,
    Exercise,
    Solution,
)
When you have a structure of files starting in the root of your project, you can use the dot notation to get to the object you want to import into your current namespace, be it a package, a module, a class, a function, or anything else. The from module import syntax also allows a catch-all clause, from module import *, which is sometimes used to get all the names from a module into the current namespace at once, but it's frowned upon for several reasons, such as performance and the risk of silently shadowing other names. You can read all that there is to know about imports in the official Python documentation but, before we leave the subject, let me give you a better example.
Imagine that you have defined a couple of functions: square(n) and cube(n) in a module, funcdef.py, which is in the lib folder. You want to use them in a couple of modules that are at the same level of the lib folder, called func_import.py and func_from.py. Showing the tree structure of that project produces something like this:
├── func_from.py
├── func_import.py
├── lib
    ├── funcdef.py
    └── __init__.py
Before I show you the code of each module, please remember that the traditional way to tell Python that lib is actually a package is to put an __init__.py module in it.
There are two things to note about the __init__.py file. First of all, it is a fully-fledged Python module so you can put code into it as you would with any other module. Second, as of Python 3.3, its presence is no longer required to make a folder be interpreted as a Python package.
The code is as follows:
# funcdef.py
def square(n):
    return n ** 2

def cube(n):
    return n ** 3

# func_import.py
import lib.funcdef
print(lib.funcdef.square(10))
print(lib.funcdef.cube(10))

# func_from.py
from lib.funcdef import square, cube
print(square(10))
print(cube(10))
Both these files, when executed, print 100 and 1000. You can see how differently we then access the square and cube functions, according to how and what we imported in the current scope.
Relative imports
The imports we've seen so far are called absolute, that is, they define the whole path of the module that we want to import, or from which we want to import an object. There is another way of importing objects into Python, which is called a relative import. It's helpful in situations where we want to rearrange the structure of large packages without having to edit sub-packages, or when we want to make a module inside a package able to import itself. Relative imports are done by adding leading dots in front of the module name: a single dot refers to the current package, and each additional dot backtracks one more folder, until we find what we're searching for. Simply put, it is something such as this:
from .mymodule import myfunc
For a complete explanation of relative imports, refer to PEP 328 (https://www.python.org/dev/peps/pep-0328/). In later chapters, we'll create projects using different libraries and we'll use several different types of imports, including relative ones, so make sure you take a bit of time to read up about it in the official Python documentation.
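Relative imports only work inside a package, so they are hard to try in a single file. The following is a self-contained sketch, not a recommended project layout: it writes a throwaway package to a temporary folder and imports from it. The names demo_pkg, shapes, and area_of_square are hypothetical; funcdef and square mirror the earlier example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (all names here are hypothetical).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "funcdef.py").write_text("def square(n):\n    return n ** 2\n")

# One leading dot: look for funcdef in the same package as this module.
(pkg / "shapes.py").write_text(
    "from .funcdef import square\n"
    "\n"
    "def area_of_square(side):\n"
    "    return square(side)\n"
)

sys.path.insert(0, str(root))   # make demo_pkg importable
importlib.invalidate_caches()   # be safe: the files were just created
shapes = importlib.import_module("demo_pkg.shapes")
print(shapes.area_of_square(10))  # prints: 100
```

If demo_pkg were renamed, shapes.py would not need to change, which is exactly the benefit described above.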
Summary
In this chapter, we explored the world of functions. They are extremely important and, from now on, we'll use them basically everywhere. We talked about the main reasons for using them, the most important of which are code reuse and implementation hiding.
We saw that a function object is like a box that takes optional inputs and produces outputs. We can feed input values to a function in many different ways, using positional and keyword arguments, and using variable syntax for both types.
Now you should know how to write a function, document it, import it into your code, and call it.
The next chapter will force me to push my foot down on the throttle even more, so I suggest you take any opportunity you get to consolidate and enrich the knowledge you've gathered so far by putting your nose into the Python official documentation.
Saving Time and Memory
"It's not the daily increase but daily decrease. Hack away at the unessential."
– Bruce Lee
I love this quote from Bruce Lee. He was such a wise man! Especially the second part, "hack away at the unessential", is to me what makes a computer program elegant. After all, if there is a better way of doing things so that we don't waste time or memory, why not?
Sometimes, there are valid reasons for not pushing our code up to the maximum limit: for example, sometimes to achieve a negligible improvement, we have to sacrifice readability or maintainability. Does it make any sense to have a web page served in 1 second with unreadable, complicated code, when we can serve it in 1.05 seconds with readable, clean code? No, it makes no sense.
On the other hand, sometimes it's perfectly reasonable to try to shave off a millisecond from a function, especially when the function is meant to be called thousands of times. Every millisecond you save there means one second saved per thousand calls, and this could be meaningful for your application.
In light of these considerations, the focus of this chapter will not be to give you the tools to push your code to the absolute limits of performance and optimization "no matter what," but rather, to enable you to write efficient, elegant code that reads well, runs fast, and doesn't waste resources in an obvious way.
In this chapter, we are going to cover the following:
The map, zip, and filter functions
Comprehensions
Generators
I will perform several measurements and comparisons, and cautiously draw some conclusions. Please do keep in mind that on a different box with a different setup or a different operating system, results may vary. Take a look at this code:
# squares.py
def square1(n):
    return n ** 2  # squaring through the power operator

def square2(n):
    return n * n  # squaring through multiplication
Both functions return the square of n, but which is faster? From a simple benchmark I ran on them, it looks like the second is slightly faster. If you think about it, it makes sense: calculating the power of a number involves multiplication and therefore, whatever algorithm you may use to perform the power operation, it's not likely to beat a simple multiplication such as the one in square2.
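If you want to repeat this kind of measurement yourself, one possible way (not necessarily the benchmark behind the remark above) is the timeit module from the standard library. The argument 12345 and the number of repetitions are arbitrary choices of mine, and the timings will differ from machine to machine, so I'm not showing any expected output.

```python
from timeit import timeit


def square1(n):
    return n ** 2  # squaring through the power operator


def square2(n):
    return n * n  # squaring through multiplication


# Time 200,000 calls of each function; timeit returns elapsed seconds.
t1 = timeit(lambda: square1(12345), number=200_000)
t2 = timeit(lambda: square2(12345), number=200_000)

print(f"square1: {t1:.4f}s")
print(f"square2: {t2:.4f}s")
```

Run it a few times before drawing conclusions: a single measurement can easily be distorted by whatever else the machine is doing.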
Do we care about this result? In most cases, no. If you're coding an e-commerce website, chances are you won't ever even need to raise a number to the second power, and if you do, it's likely to be a sporadic operation. You don't need to concern yourself with saving a fraction of a microsecond on a function you call a few times.
So, when does optimization become important? One very common case is when you have to deal with huge collections of data. If you're applying the same function on a million customer objects, then you want your function to be tuned up to its best. Gaining 1/10 of a second on a function called one million times saves you 100,000 seconds, which is about 27.8 hours. That's not the same, right? So, let's focus on collections, and let's see which tools Python gives you to handle them with efficiency and grace.
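Many of the concepts we will see in this chapter are based on those of the iterator and iterable. Simply put, an iterator is an object that can return its next element when asked, and that raises a StopIteration exception when exhausted, while an iterable is an object from which such an iterator can be obtained. We'll see how to code custom iterator and iterable objects in Chapter 6, OOP, Decorators, and Iterators.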
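To make the distinction concrete, here is a tiny sketch that uses only the built-in iter and next functions on a plain list. The variable names are mine.

```python
numbers = [1, 2]      # a list is an iterable...
it = iter(numbers)    # ...from which we can obtain an iterator

print(next(it))  # prints: 1
print(next(it))  # prints: 2

try:
    next(it)          # nothing left to return
except StopIteration:
    print("exhausted")  # prints: exhausted

# An iterator is its own iterator, while a list hands out a fresh
# iterator every time it is asked for one.
print(iter(it) is it)                   # prints: True
print(iter(numbers) is iter(numbers))   # prints: False
```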
Due to the nature of the objects we're going to explore in this chapter, I was often forced to wrap the code in a list constructor. This is because passing an iterator/generator to list(...) exhausts it and puts all the generated items in a newly created list, which I can easily print to show you its content. This technique hinders readability, so let me introduce an alias for list:
# alias.py
>>> range(7)
range(0, 7)
>>> list(range(7))  # put all elements in a list to view them
[0, 1, 2, 3, 4, 5, 6]
>>> _ = list  # create an "alias" to list
>>> _(range(7))  # same as list(range(7))
[0, 1, 2, 3, 4, 5, 6]
There are three steps to notice here: list(range(7)) is the call we need to make in order to show what would be generated by range(7); _ = list is the moment when I create the alias to list (I chose the hopefully unobtrusive underscore); and _(range(7)) is the equivalent call, where I use the alias instead of list.
Hopefully readability will benefit from this, and please keep in mind that I will assume this alias to have been defined for all the code in this chapter.
The map, zip, and filter functions
We'll start by reviewing map, filter, and zip, which are the main built-in functions one can employ when handling collections, and then we'll learn how to achieve the same results using two very important constructs: comprehensions and generators. Fasten your seatbelt!
map
According to the official Python documentation:
map(function, iterable, ...) returns an iterator that applies function to every item of iterable, yielding the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted.
We will explain the concept of yielding later on in the chapter. For now, let's translate this into code: we'll use a lambda function that takes a variable number of positional arguments, and just returns them as a tuple:
# map.example.py
>>> map(lambda *a: a, range(3))  # 1 iterable
<map object at 0x10acf8f98>  # Not useful! Let's use alias
>>> _(map(lambda *a: a, range(3)))  # 1 iterable
[(0,), (1,), (2,)]
>>> _(map(lambda *a: a, range(3), 'abc'))  # 2 iterables
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> _(map(lambda *a: a, range(3), 'abc', range(4, 7)))  # 3
[(0, 'a', 4), (1, 'b', 5), (2, 'c', 6)]
>>> # map stops at the shortest iterator
>>> _(map(lambda *a: a, (), 'abc'))  # empty tuple is shortest
[]
>>> _(map(lambda *a: a, (1, 2), 'abc'))  # (1, 2) shortest
[(1, 'a'), (2, 'b')]
>>> _(map(lambda *a: a, (1, 2, 3, 4), 'abc'))  # 'abc' shortest
[(1, 'a'), (2, 'b'), (3, 'c')]
In the preceding code, you can see why we have to wrap calls in list(...) (or its alias, _, in this case). Without it, I get the string representation of a map object, which is not really useful in this context, is it?
You can also notice how the elements of each iterable are passed to the function: first, the first element of each iterable, then the second one of each iterable, and so on. Notice also that map stops when the shortest of the iterables we called it with is exhausted. This is actually a very nice behavior; it doesn't force us to level off all the iterables to a common length, and it doesn't break if they aren't all the same length.
map is very useful when you have to apply the same function to one or more collections of objects. As a more interesting example, let's see the decorate-sort-undecorate idiom (also known as Schwartzian transform). It's a technique that was extremely popular when Python sorting wasn't providing key-functions, and therefore is less used today, but it's a cool trick that still comes in handy once in a while.
Let's see a variation of it in the next example: we want to sort in descending order by the sum of credits accumulated by students, so as to have the best student at position 0. We write a function to produce a decorated object, we sort, and then we undecorate. Each student has credits in three (possibly different) subjects. In this context, to decorate an object means to transform it, either adding extra data to it, or putting it into another object, in a way that allows us to be able to sort the original objects the way we want. This technique has nothing to do with Python decorators, which we will explore later on in the book.
After the sorting, we revert the decorated objects to get the original ones from them. This is called undecorating:
# decorate.sort.undecorate.py
students = [
    dict(id=0, credits=dict(math=9, physics=6, history=7)),
    dict(id=1, credits=dict(math=6, physics=7, latin=10)),
    dict(id=2, credits=dict(history=8, physics=9, chemistry=10)),
    dict(id=3, credits=dict(math=5, physics=5, geography=7)),
]

def decorate(student):
    # create a 2-tuple (sum of credits, student) from student dict
    return (sum(student['credits'].values()), student)

def undecorate(decorated_student):
    # discard sum of credits, return original student dict
    return decorated_student[1]

students = sorted(map(decorate, students), reverse=True)
students = _(map(undecorate, students))
Let's start by understanding what each student object is. In fact, let's print the first one:
{'credits': {'history': 7, 'math': 9, 'physics': 6}, 'id': 0}
You can see that it's a dictionary with two keys: id and credits. The value of credits is also a dictionary in which there are three subject/grade key/value pairs. As I'm sure you recall from our visit in the data structures world, calling dict.values() returns an iterable view object containing only the values. Therefore, sum(student['credits'].values()) for the first student is equivalent to sum((9, 6, 7)).
Let's print the result of calling decorate with the first student:
>>> decorate(students[0])
(22, {'credits': {'history': 7, 'math': 9, 'physics': 6}, 'id': 0})
If we decorate all the students like this, we can sort them on their total amount of credits by just sorting the list of tuples. In order to apply the decoration to each item in students, we call map(decorate, students). Then we sort the result, and then we undecorate in a similar fashion. If you have gone through the previous chapters correctly, understanding this code shouldn't be too hard.
Printing students after running the whole code yields:
$ python decorate.sort.undecorate.py
[{'credits': {'chemistry': 10, 'history': 8, 'physics': 9}, 'id': 2},
 {'credits': {'latin': 10, 'math': 6, 'physics': 7}, 'id': 1},
 {'credits': {'history': 7, 'math': 9, 'physics': 6}, 'id': 0},
 {'credits': {'geography': 7, 'math': 5, 'physics': 5}, 'id': 3}]
And you can see, by the order of the student objects, that they have indeed been sorted by the sum of their credits.
For more on the decorate-sort-undecorate idiom, there's a very nice introduction in the sorting how-to section of the official Python documentation (https://docs.python.org/3.7/howto/sorting.html#the-old-way-using-decorate-sort-undecorate).
One thing to notice about the sorting part: what if two or more students share the same total sum? The sorting algorithm would then proceed to sort the tuples by comparing the student objects with each other. This doesn't make any sense, and in more complex cases, could lead to unpredictable results, or even errors (in Python 3, comparing two dictionaries with < raises a TypeError). If you want to be sure to avoid this issue, one simple solution is to create a three-tuple instead of a two-tuple, having the sum of credits in the first position, the position of the student object in the students list in the second one, and the student object itself in the third one. This way, if the sum of credits is the same, the tuples will be sorted against the position, which will always be different and therefore enough to resolve the sorting between any pair of tuples.
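Here is a minimal sketch of that three-tuple variation. The student data is made up so that two students tie on 23 credits, and the two-argument decorate is my own adaptation rather than code from the example above. For comparison, the last lines show the modern alternative based on a key function.

```python
students = [
    dict(id=0, credits=dict(math=9, physics=6, history=7)),       # total 22
    dict(id=1, credits=dict(math=6, physics=7, latin=10)),        # total 23
    dict(id=2, credits=dict(history=8, physics=9, chemistry=6)),  # total 23
]


def total_credits(student):
    return sum(student['credits'].values())


def decorate(position, student):
    # 3-tuple: (sum of credits, position in the list, student)
    return (total_credits(student), position, student)


def undecorate(decorated_student):
    return decorated_student[2]


decorated = sorted(
    map(decorate, range(len(students)), students), reverse=True)
ranked = list(map(undecorate, decorated))
# The tie on 23 is resolved by position, so dicts are never compared.
print([s['id'] for s in ranked])  # prints: [2, 1, 0]

# Modern alternative: a key function. The sort is stable, so tied
# students keep their original relative order, even with reverse=True.
by_key = sorted(students, key=total_credits, reverse=True)
print([s['id'] for s in by_key])  # prints: [1, 2, 0]
```

Notice that the two approaches order the tied students differently: with reverse=True the position tie-breaker is reversed too, while the key-based sort preserves the original order of equal elements.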
zip
We've already covered zip in the previous chapters, so let's just define it properly and then I want to show you how you could combine it with map.
According to the Python documentation:
zip(*iterables) returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted. With a single iterable argument, it returns an iterator of 1-tuples. With no arguments, it returns an empty iterator.
Let's see an example:
# zip.grades.py
>>> grades = [18, 23, 30, 27]
>>> avgs = [22, 21, 29, 24]
>>> _(zip(avgs, grades))
[(22, 18), (21, 23), (29, 30), (24, 27)]
>>> _(map(lambda *a: a, avgs, grades))  # equivalent to zip
[(22, 18), (21, 23), (29, 30), (24, 27)]
In the preceding code, we're zipping together the average and the grade for the last exam, for each student. Notice how easy it is to reproduce zip using map (last two instructions of the example). Here as well, to visualize results we have to use our _ alias.
A simple example on the combined use of map and zip could be a way of calculating the element-wise maximum amongst sequences, that is, the maximum of the first element of each sequence, then the maximum of the second one, and so on:
# maxims.py
>>> a = [5, 9, 2, 4, 7]
>>> b = [3, 7, 1, 9, 2]
>>> c = [6, 8, 0, 5, 3]
>>> maxs = map(lambda n: max(*n), zip(a, b, c))
>>> _(maxs)
[6, 9, 2, 9, 7]
Notice how easy it is to calculate the max values of three sequences. zip is not strictly needed of course, we could just use map. Sometimes it's hard, when showing a simple example, to grasp why using a technique might be good or bad. We forget that we aren't always in control of the source code, we might have to use a third-party library, which we can't change the way we want. Having different ways to work with data is therefore really helpful.
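For the curious, this is one way the map-only version could look. Because map accepts several iterables and passes one item from each to the function, max can be handed to it directly. The name maxs_zip is mine, added only to compare the two results.

```python
a = [5, 9, 2, 4, 7]
b = [3, 7, 1, 9, 2]
c = [6, 8, 0, 5, 3]

# map calls max(5, 3, 6), then max(9, 7, 8), and so on.
maxs = list(map(max, a, b, c))
print(maxs)  # prints: [6, 9, 2, 9, 7]

# Same result as the map + zip version shown above.
maxs_zip = list(map(lambda n: max(*n), zip(a, b, c)))
print(maxs == maxs_zip)  # prints: True
```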
filter
According to the Python documentation:
filter(function, iterable) constructs an iterator from those elements of iterable for which function returns True. iterable may be either a sequence, a container which supports iteration, or an iterator. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.
Let's see a very quick example:
# filter.py
>>> test = [2, 5, 8, 0, 0, 1, 0]
>>> _(filter(None, test))
[2, 5, 8, 1]
>>> _(filter(lambda x: x, test))  # equivalent to previous one
[2, 5, 8, 1]
>>> _(filter(lambda x: x > 4, test))  # keep only items > 4
[5, 8]
In the preceding code, notice how the second call to filter is equivalent to the first one. If we pass a function that takes one argument and returns the argument itself, only those arguments that evaluate to True will make the function return a true value, therefore this behavior is exactly the same as passing None. It's often a very good exercise to mimic some of the built-in Python behaviors. When you succeed, you can say you fully understand how Python behaves in a specific situation.
Armed with map, zip, and filter (and several other functions from the Python standard library) we can massage sequences very effectively. But those functions are not the only way to do it. So let's see one of the nicest features of Python: comprehensions.
Comprehensions
Comprehensions are a concise notation for performing some operation on a collection of elements, and/or selecting a subset of them that meets some condition. They are borrowed from the functional programming language Haskell (https://www.haskell.org/), and contribute to giving Python a functional flavor, together with iterators and generators.
Python offers you different types of comprehensions: list, dict, and set. We'll concentrate on the first one for now, and then it will be easy to explain the other two.
Let's start with a very simple example. I want to calculate a list with the squares of the first 10 natural numbers. How would you do it? There are a couple of equivalent ways:
# squares.map.py
# If you code like this you are not a Python dev! ;)
>>> squares = []
>>> for n in range(10):
...     squares.append(n ** 2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# This is better, one line, nice and readable
>>> squares = map(lambda n: n ** 2, range(10))
>>> _(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The preceding example should be nothing new for you. Let's see how to achieve the same result using a list comprehension:
# squares.comprehension.py
>>> [n ** 2 for n in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
As simple as that. Isn't it elegant? Basically we have put a for loop within square brackets. Let's now filter out the odd squares. I'll show you how to do it with map and filter first, and then using a list comprehension again:
# even.squares.py
# using map and filter
sq1 = list(
    map(lambda n: n ** 2, filter(lambda n: not n % 2, range(10)))
)
# equivalent, but using list comprehensions
sq2 = [n ** 2 for n in range(10) if not n % 2]
print(sq1, sq1 == sq2)  # prints: [0, 4, 16, 36, 64] True
I think that now the difference in readability is evident. The list comprehension reads much better. It's almost English: give me all squares (n ** 2) for n between 0 and 9 if n is even.
According to the Python documentation:
A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.
Nested comprehensions
Let's see an example of nested loops. It's very common when dealing with algorithms to have to iterate on a sequence using two placeholders. The first one runs through the whole sequence, left to right. The second one as well, but it starts from the first one, instead of 0. The concept is that of testing all pairs without duplication. Let's see the classical for loop equivalent:
# pairs.for.loop.py
items = 'ABCD'
pairs = []
for a in range(len(items)):
    for b in range(a, len(items)):
        pairs.append((items[a], items[b]))
If you print pairs at the end, you get:
$ python pairs.for.loop.py
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'B'), ('B', 'C'), ('B', 'D'),
 ('C', 'C'), ('C', 'D'), ('D', 'D')]
All the tuples with the same letter are those where b is at the same position as a. Now, let's see how we can translate this in a list comprehension:
# pairs.list.comprehension.py
items = 'ABCD'
pairs = [(items[a], items[b])
    for a in range(len(items)) for b in range(a, len(items))]
This version is just two lines long and achieves the same result. Notice that in this particular case, because the for loop over b has a dependency on a, it must follow the for loop over a in the comprehension. If you swap them around, you'll get a name error.
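As a side note, this "all pairs without duplication" pattern is common enough that the standard library covers it. The following sketch swaps in itertools.combinations_with_replacement, which is a different technique from the comprehension above, and checks that it produces the same pairs in the same order.

```python
from itertools import combinations_with_replacement

items = 'ABCD'

# The comprehension from the example above.
pairs = [(items[a], items[b])
         for a in range(len(items)) for b in range(a, len(items))]

# The standard library equivalent: pairs of items, where the second
# one never comes before the first one in the input sequence.
library_pairs = list(combinations_with_replacement(items, 2))

print(library_pairs[:4])
# prints: [('A', 'A'), ('A', 'B'), ('A', 'C'), ('A', 'D')]
print(pairs == library_pairs)  # prints: True
```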
Filtering a comprehension
We can apply filtering to a comprehension. Let's do it first with filter. Let's find all Pythagorean triples whose short sides are numbers smaller than 10. We obviously don't want to test a combination twice, and therefore we'll use a trick similar to the one we saw in the previous example:
# pythagorean.triple.py
from math import sqrt
# this will generate all possible pairs
mx = 10
triples = [(a, b, sqrt(a**2 + b**2))
    for a in range(1, mx) for b in range(a, mx)]
# this will filter out all non pythagorean triples
triples = list(
    filter(lambda triple: triple[2].is_integer(), triples))
print(triples)  # prints: [(3, 4, 5.0), (6, 8, 10.0)]
A Pythagorean triple is a triple (a, b, c) of integer numbers satisfying the equation $a^2 + b^2 = c^2$.
In the preceding code, we generated a list of three-tuples, triples. Each tuple contains two integer numbers (the legs), and the hypotenuse of the right triangle whose legs are the first two numbers in the tuple. For example, when a is 3 and b is 4, the tuple will be (3, 4, 5.0), and when a is 5 and b is 7, the tuple will be (5, 7, 8.602325267042627).
After having all the triples done, we need to filter out all those that don't have a hypotenuse that is an integer number. In order to do this, we filter based on float_number.is_integer() being True. This means that of the two example tuples I showed you before, the one with 5.0 hypotenuse will be retained, while the one with the 8.602325267042627 hypotenuse will be discarded.
This is good, but I don't like that the triple has two integer numbers and a float. They are supposed to be all integers, so let's use map to fix this:
# pythagorean.triple.int.py
from math import sqrt
mx = 10
triples = [(a, b, sqrt(a**2 + b**2))
    for a in range(1, mx) for b in range(a, mx)]
triples = filter(lambda triple: triple[2].is_integer(), triples)
# this will make the third number in the tuples integer
triples = list(
    map(lambda triple: triple[:2] + (int(triple[2]), ), triples))
print(triples)  # prints: [(3, 4, 5), (6, 8, 10)]
Notice the step we added. We take each element in triples and we slice it, taking only the first two elements in it. Then, we concatenate the slice with a one-tuple, in which we put the integer version of that float number that we didn't like. Seems like a lot of work, right? Indeed it is. Let's see how to do all this with a list comprehension:
# pythagorean.triple.comprehension.py
from math import sqrt
# this step is the same as before
mx = 10
triples = [(a, b, sqrt(a**2 + b**2))
    for a in range(1, mx) for b in range(a, mx)]
# here we combine filter and map in one CLEAN list comprehension
triples = [(a, b, int(c)) for a, b, c in triples if c.is_integer()]
print(triples)  # prints: [(3, 4, 5), (6, 8, 10)]
I know. It's much better, isn't it? It's clean, readable, shorter. In other words, it's elegant.
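If you want to go one step further, the two comprehensions can be merged into one that never creates a float. This is only a sketch of an alternative, not a replacement for the code above, and it assumes Python 3.8 or later because it relies on math.isqrt. The single-element list in the third for clause is just a trick to give the candidate hypotenuse a name.

```python
from math import isqrt  # assumption: Python 3.8+, where isqrt exists

mx = 10
triples = [
    (a, b, c)
    for a in range(1, mx)
    for b in range(a, mx)
    for c in [isqrt(a**2 + b**2)]  # integer part of the square root
    if c**2 == a**2 + b**2         # keep only exact squares
]
print(triples)  # prints: [(3, 4, 5), (6, 8, 10)]
```

Working with integers only also sidesteps any worry about floating point rounding when the numbers get large.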
I'm going quite fast here, as anticipated in the Summary of Chapter 4, Functions, the Building Blocks of Code. Are you playing with this code? If not, I suggest you do. It's very important that you play around, break things, change things, see what happens. Make sure you have a clear understanding of what is going on. You want to become a ninja, right?
dict comprehensions
Dictionary and set comprehensions work exactly like the list ones, only there is a little difference in the syntax. The following example will suffice to explain everything you need to know:
# dictionary.comprehensions.py
from string import ascii_lowercase
lettermap = dict((c, k) for k, c in enumerate(ascii_lowercase, 1))
If you print lettermap, you will see the following (I omitted the middle results, you get the gist):
$ python dictionary.comprehensions.py
{'a': 1,
 'b': 2,
 ...
 'y': 25,
 'z': 26}
What happens in the preceding code is that we're feeding the dict constructor with a comprehension (technically, a generator expression, we'll see it in a bit). We tell the dict constructor to make key/value pairs from each tuple in the comprehension. We enumerate the sequence of all lowercase ASCII letters, starting from 1, using enumerate. Piece of cake. There is also another way to do the same thing, which is closer to the other dictionary syntax:
lettermap = {c: k for k, c in enumerate(ascii_lowercase, 1)}
It does exactly the same thing, with a slightly different syntax that highlights a bit more of the key: value part.
Dictionaries do not allow duplication in the keys, as shown in the following example:
# dictionary.comprehensions.duplicates.py
word = 'Hello'
swaps = {c: c.swapcase() for c in word}
print(swaps)  # prints: {'H': 'h', 'e': 'E', 'l': 'L', 'o': 'O'}
We create a dictionary whose keys are the letters in the 'Hello' string, and whose values are the same letters with the case swapped. Notice there is only one 'l': 'L' pair.
The constructor doesn't complain, it simply reassigns duplicates to the latest value. Let's make this clearer with another example; let's assign to each key its position in the string:
# dictionary.comprehensions.positions.py
word = 'Hello'
positions = {c: k for k, c in enumerate(word)}
print(positions)  # prints: {'H': 0, 'e': 1, 'l': 3, 'o': 4}
Notice the value associated with the letter 'l': 3. The 'l': 2 pair isn't there; it has been overridden by 'l': 3.
set comprehensions
The set comprehensions are very similar to list and dictionary ones. Python allows both the set() constructor to be used, or the explicit {} syntax. Let's see one quick example:
# set.comprehensions.py
word = 'Hello'
letters1 = set(c for c in word)
letters2 = {c for c in word}
print(letters1)  # prints: {'H', 'o', 'e', 'l'} (the order may vary)
print(letters1 == letters2)  # prints: True
Notice how for set comprehensions, as for dictionaries, duplication is not allowed and therefore the resulting set has only four letters. Also, notice that the expressions assigned to letters1 and letters2 produce equivalent sets.
The syntax used to create letters2 is very similar to the one we can use to create a dictionary comprehension. You can spot the difference only by the fact that dictionaries require keys and values, separated by colons, while sets don't.
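Because the two syntaxes are so close, one corner case is worth a quick check: with nothing inside, the braces always mean a dictionary. The following sketch uses names of my own choosing to show the three cases side by side.

```python
word = 'Hello'

letters = {c for c in word}             # no colon: a set
upper = {c: c.upper() for c in word}    # key: value pairs: a dict
empty = {}                              # empty braces: a dict, not a set
empty_set = set()                       # the way to get an empty set

print(type(letters).__name__)    # prints: set
print(type(upper).__name__)      # prints: dict
print(type(empty).__name__)      # prints: dict
print(type(empty_set).__name__)  # prints: set
```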
Generators
Generators are a very powerful tool that Python gifts us with. They are based on the concepts of iteration, as we said before, and they allow for coding patterns that combine elegance with efficiency.
Generators are of two types:
Generator functions: These are very similar to regular functions, but instead of returning results through return statements, they use yield, which allows them to suspend and resume their state between each call
Generator expressions: These are very similar to the list comprehensions we've seen in this chapter, but instead of returning a list they return an object that produces results one by one
Generator functions
Generator functions behave like regular functions in all respects, except for one difference. Instead of collecting results and returning them at once, they are automatically turned by Python into iterators that yield results one at a time when you call next on them.
This is all very theoretical so, let's make it clear why such a mechanism is so powerful, and then let's see an example.
Say I asked you to count out loud from 1 to 1,000,000. You start, and at some point I ask you to stop. After some time, I ask you to resume. At this point, what is the minimum information you need to be able to resume correctly? Well, you need to remember the last number you called. If I stopped you after 31,415, you will just go on with 31,416, and so on.
The point is, you don't need to remember all the numbers you said before 31,415, nor do you need them to be written down somewhere. Well, you may not know it, but you're behaving like a generator already!
Take a good look at the following code:
# first.n.squares.py
def get_squares(n):  # classic function approach
    return [x ** 2 for x in range(n)]
print(get_squares(10))

def get_squares_gen(n):  # generator approach
    for x in range(n):
        yield x ** 2  # we yield, we don't return
print(list(get_squares_gen(10)))
The result of the two print calls will be the same: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]. But there is a huge difference between the two functions. get_squares is a classic function that collects all the squares of numbers in [0, n) in a list, and returns it. On the other hand, get_squares_gen is a generator, and behaves very differently. Each time the interpreter reaches the yield line, its execution is suspended. The only reason those print calls produce the same result is because we fed get_squares_gen to the list constructor, which exhausts the generator completely by asking for the next element until a StopIteration is raised. Let's see this in detail:
# first.n.squares.manual.py
def get_squares_gen(n):
    for x in range(n):
        yield x ** 2

squares = get_squares_gen(4)  # this creates a generator object
print(squares)  # <generator object get_squares_gen at 0x10dd...>
print(next(squares))  # prints: 0
print(next(squares))  # prints: 1
print(next(squares))  # prints: 4
print(next(squares))  # prints: 9
# the following raises StopIteration, the generator is exhausted,
# any further call to next will keep raising StopIteration
print(next(squares))
In the preceding code, each time we call next on the generator object, we either start it (first next) or make it resume from the last suspension point (any other next).
The first time we call next on it, we get 0, which is the square of 0, then 1, then 4, then 9, and since the for loop stops after that (n is 4), then the generator naturally ends. A classic function would at that point just return None, but in order to comply with the iteration protocol, a generator will instead raise a StopIteration exception.
This explains how a for loop works. When you call for k in range(n), what happens under the hood is that the for loop gets an iterator out of range(n) and starts calling next on it, until StopIteration is raised, which tells the for loop that the iteration has reached its end.
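To see that mechanism spelled out, here is a rough sketch of what a for loop does for us, written by hand with iter, next, and a try/except. The manual_for function is a hypothetical helper of mine, and the real for statement is of course implemented inside the interpreter, not like this.

```python
def get_squares_gen(n):
    for x in range(n):
        yield x ** 2


def manual_for(iterable):
    """Collect items the way a for loop would visit them."""
    visited = []
    iterator = iter(iterable)        # the loop asks for an iterator...
    while True:
        try:
            item = next(iterator)    # ...and keeps calling next on it
        except StopIteration:        # the signal that iteration is over
            break
        visited.append(item)         # this plays the role of the loop body
    return visited


print(manual_for(range(4)))            # prints: [0, 1, 2, 3]
print(manual_for(get_squares_gen(4)))  # prints: [0, 1, 4, 9]
```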
Having this behavior built into every iteration aspect of Python makes generators even more powerful because once we write them, we'll be able to plug them into whatever iteration mechanism we want.
At this point, you're probably asking yourself why you would want to use a generator instead of a regular function. Well, the title of this chapter should suggest the answer. I'll talk about performance later, so for now let's concentrate on another aspect: sometimes generators allow you to do something that wouldn't be possible with a simple list. For example, say you want to analyze all permutations of a sequence. If the sequence has a length of N, then the number of its permutations is N!. This means that if the sequence is 10 elements long, the number of permutations is 3,628,800. But a sequence of 20 elements would have 2,432,902,008,176,640,000 permutations. They grow factorially.
Now imagine you have a classic function that is attempting to calculate all permutations, put them in a list, and return it to you. With 10 elements, it would already need a noticeable amount of time and memory to build a list of more than 3.6 million items, but for 20 elements there is simply no way that it can be done.
On the other hand, a generator function will be able to start the computation and give you back the first permutation, then the second, and so on. Of course you won't have the time to parse them all, there are too many, but at least you'll be able to work with some of them.
Remember when we were talking about the break statement in for loops? When we found a number dividing a candidate prime we were breaking the loop, and there was no need to go on.
Sometimes it's exactly the same, only the amount of data you have to iterate over is so huge that you cannot keep it all in memory in a list. In this case, generators are invaluable: they make possible what wouldn't be possible otherwise.
So, in order to save memory (and time), use generator functions whenever possible.
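A quick way to get a feel for the savings is the following sketch. It makes two assumptions worth stating: sys.getsizeof only measures the container object itself (not the integers it refers to), and I'm using itertools.permutations and islice from the standard library as the lazy permutation source instead of a hand-written generator function.

```python
import sys
from itertools import islice, permutations


def get_squares(n):
    return [x ** 2 for x in range(n)]


def get_squares_gen(n):
    for x in range(n):
        yield x ** 2


as_list = get_squares(100_000)     # all results live in memory at once
as_gen = get_squares_gen(100_000)  # nothing has been computed yet

# The list object alone is far bigger than the generator object.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # prints: True

# permutations is lazy, so asking for the first three permutations of
# 20 elements is instantaneous, even though 20! of them exist.
first_three = list(islice(permutations(range(20)), 3))
print(len(first_three))      # prints: 3
print(first_three[0][-3:])   # prints: (17, 18, 19)
```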
It's also worth noting that you can use the return statement in a generator function. It will cause a StopIteration exception to be raised, effectively ending the iteration. This is extremely important. If a return statement were actually to make the function return something, it would break the iteration protocol. Python's consistency prevents this, and allows us great ease when coding. Let's see a quick example:
# gen.yield.return.py
def geometric_progression(a, q):
    k = 0
    while True:
        result = a * q**k
        if result <= 100000:
            yield result
        else:
            return
        k += 1

for n in geometric_progression(2, 5):
    print(n)
The preceding code yields all terms of the geometric progression, $a, aq, aq^2, aq^3, \ldots$. When the progression produces a term that is greater than 100000, the generator stops (with a return statement). Running the code produces the following result:
$ python gen.yield.return.py
2
10
50
250
1250
6250
31250
The next term would have been 156250, which is too big.
Speaking about StopIteration, as of Python 3.5, the way that exceptions are handled in generators has changed (the new behavior became the default in Python 3.7). To understand the implications of the change is probably asking too much of you at this point, so just know that you can read all about it in PEP 479 (https://legacy.python.org/dev/peps/pep-0479/).
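If you are curious anyway, here is a small sketch of the situation PEP 479 addresses. It assumes Python 3.7 or later, where the new behavior is the default, and first_items is a deliberately buggy function I made up for the occasion: the next call inside it raises StopIteration on the empty string.

```python
def first_items(iterables):
    """Yield the first item of each iterable (buggy on empty ones)."""
    for iterable in iterables:
        # next() raises StopIteration when the iterable is empty.
        # Inside a generator, that exception must not leak out.
        yield next(iter(iterable))


try:
    print(list(first_items(['ab', '', 'cd'])))
except RuntimeError as err:
    # Python 3.7+ converts the leaked StopIteration to a RuntimeError.
    print('RuntimeError:', err)
```

Before the change, the stray StopIteration would have silently ended the generator, and the list call would have returned just ['a'], hiding the bug.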
Going beyond next
At the beginning of this chapter, I told you that generator objects are based on the iteration protocol. We'll see in Chapter 6, OOP, Decorators, and Iterators a complete example of how to write a custom iterator/iterable object. For now, I just want you to understand how next() works.
What happens when you call next(generator) is that you're calling the generator.__next__() method. Remember, a method is just a function that belongs to an object, and objects in Python can have special methods. __next__() is just one of these and its purpose is to return the next element of the iteration, or to raise StopIteration when the iteration is over and there are no more elements to return.
If you recall, in Python, an object's special methods are also called magic methods, or dunder (from "double underscore") methods.
When we write a generator function, Python automatically transforms it into an object that is very similar to an iterator, and when we call next(generator), that call is transformed into generator.__next__(). Let's revisit the previous example about generating squares:
# first.n.squares.manual.method.py
def get_squares_gen(n):
    for x in range(n):
        yield x ** 2

squares = get_squares_gen(3)
print(squares.__next__())  # prints: 0
print(squares.__next__())  # prints: 1
print(squares.__next__())  # prints: 4
# the following raises StopIteration, the generator is exhausted,
# any further call to next will keep raising StopIteration
print(squares.__next__())
The result is exactly as the previous example, only this time instead of using the next(squares) proxy call, we're directly calling squares.__next__().
Generator objects also have three other methods that allow us to control their behavior: send, throw, and close. send allows us to communicate a value back to the generator object, while throw and close, respectively, allow us to raise an exception within the generator and close it. Their use is quite advanced and I won't be covering them here in detail, but I want to spend a few words on send, with a simple example:
# gen.send.preparation.py
def counter(start=0):
    n = start
    while True:
        yield n
        n += 1

c = counter()
print(next(c))  # prints: 0
print(next(c))  # prints: 1
print(next(c))  # prints: 2
The preceding generator function creates a generator object that will run forever. You can keep calling next on it, and it will never stop. Alternatively, you can put it in a for loop, for example, for n in counter(): ..., and it will go on forever as well. But what if you wanted to stop it at some point? One solution is to use a variable to control the while loop. Something such as this:
# gen.send.preparation.stop.py
stop = False

def counter(start=0):
    n = start
    while not stop:
        yield n
        n += 1

c = counter()
print(next(c))  # prints: 0
print(next(c))  # prints: 1
stop = True
print(next(c))  # raises StopIteration
This will do it. We start with stop = False, and until we change it to True, the generator will just keep going, like before. The moment we change stop to True though, the while loop will exit, and the next call will raise a StopIteration exception. This trick works, but I don't like it. We depend on an external variable, and this can lead to issues: what if another function changes that stop? Moreover, the code is scattered. In a nutshell, this isn't good enough.
We can make it better by using generator.send(). When we call generator.send(), the value that we feed to send is passed into the generator, execution is resumed, and we can fetch that value via the yield expression. This is all very complicated when explained with words, so let's see an example:
# gen.send.py
def counter(start=0):
    n = start
    while True:
        result = yield n  # A
        print(type(result), result)  # B
        if result == 'Q':
            break
        n += 1
c = counter()
print(next(c))  # C
print(c.send('Wow!'))  # D
print(next(c))  # E
print(c.send('Q'))  # F
Execution of the preceding code produces the following:
$ python gen.send.py
0
<class 'str'> Wow!
1
<class 'NoneType'> None
2
<class 'str'> Q
Traceback (most recent call last):
  File "gen.send.py", line 14, in <module>
    print(c.send('Q'))  # F
StopIteration
I think it's worth going through this code line by line, as if we were executing it, to see whether we can understand what's going on.
We start the generator execution with a call to next (#C). Within the generator, n is set to the same value as start. The while loop is entered, execution stops (#A) and n (0) is yielded back to the caller. 0 is printed on the console.
We then call send (#D), execution resumes, and result is set to 'Wow!' (still #A), then its type and value are printed on the console (#B). result is not 'Q', therefore n is incremented by 1 and execution goes back to the while condition, which, being True, evaluates to True (that wasn't hard to guess, right?). Another loop cycle begins, execution stops again (#A), and n (1) is yielded back to the caller. 1 is printed on the console.
At this point, we call next (#E), execution is resumed again (#A), and because we are not sending anything to the generator explicitly, Python behaves exactly like functions that are not using the return statement; the yield n expression (#A) returns None. result therefore is set to None, and its type and value are yet again printed on the console (#B). Execution continues, result is not 'Q' so n is incremented by 1, and we start another loop again. Execution stops again (#A) and n (2) is yielded back to the caller. 2 is printed on the console.
And now for the grand finale: we call send again (#F), but this time we pass in 'Q', therefore when execution is resumed, result is set to 'Q' (#A). Its type and value are printed on the console (#B), and then finally the if clause evaluates to True and the while loop is stopped by the break statement. The generator naturally terminates, which means a StopIteration exception is raised. You can see its traceback in the last few lines printed on the console.
This is not at all simple to understand at first, so if it's not clear to you, don't be discouraged. You can keep reading on and come back to this example after some time.
Using send allows for interesting patterns, and it's worth noting that send can also be used to start the execution of a generator (provided you call it with None).
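As a small sketch of that last point, the following example starts a generator with send(None) instead of next, feeds it a couple of values, and then shuts it down with close. The running_total function is a hypothetical example written for this illustration, not something from the standard library.

```python
def running_total():
    """Accumulate the values sent in, yielding the total so far."""
    total = 0
    while True:
        value = yield total
        if value is not None:  # a plain next() would deliver None here
            total += value


acc = running_total()
start = acc.send(None)    # same effect as next(acc): run to the first yield
after_ten = acc.send(10)
after_five = acc.send(5)
acc.close()               # finishes the generator at the paused yield
print(start, after_ten, after_five)  # prints: 0 10 15
```

Sending anything other than None to a generator that hasn't started yet raises a TypeError, because there is no yield expression waiting to receive the value.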
The yield from expression
Another interesting construct is the yield from expression. This expression allows you to yield values from a subiterator. Its use allows for quite advanced patterns, so let's just see a very quick example of it:
# gen.yield.for.py
def print_squares(start, end):
    for n in range(start, end):
        yield n ** 2

for n in print_squares(2, 5):
    print(n)
The previous code prints the numbers 4, 9, 16 on the console (on separate lines). By now, I expect you to be able to understand it by yourself, but let's quickly recap what happens. The for loop outside the function gets an iterator from print_squares(2, 5) and calls next on it until iteration is over. Every time the generator is called, execution is suspended (and later resumed) on yield n ** 2, which returns the square of the current n. Let's see how we can transform this code benefiting from the yield from expression:
# gen.yield.from.py
def print_squares(start, end):
    yield from (n ** 2 for n in range(start, end))

for n in print_squares(2, 5):
    print(n)
This code produces the same result, but as you can see yield from is actually running a subiterator, (n ** 2 ...). The yield from expression returns to the caller each value the subiterator is producing. It's shorter and it reads better.
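To give a flavor of the more advanced patterns, here is a minimal sketch in which yield from delegates to a recursive call. The flatten function is a hypothetical example for illustration, and it assumes nesting is done with lists only.

```python
def flatten(nested):
    """Yield the items of arbitrarily nested lists, depth first."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # delegate to the sub-generator
        else:
            yield item


print(list(flatten([1, [2, [3, 4]], 5])))  # prints: [1, 2, 3, 4, 5]
```

Without yield from, the recursive branch would need its own inner for loop that re-yields every value.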
Generator expressions
Let's now talk about the other technique to generate values one at a time.
The syntax is exactly the same as list comprehensions, only, instead of wrapping the comprehension with square brackets, you wrap it with round brackets. That is called a generator expression.
In general, generator expressions behave like equivalent list comprehensions, but there is one very important thing to remember: generators allow for one iteration only, then they will be exhausted. Let's see an example:
# generator.expressions.py
>>> cubes = [k**3 for k in range(10)]  # regular list
>>> cubes
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
>>> type(cubes)
<class 'list'>
>>> cubes_gen = (k**3 for k in range(10))  # create as generator
>>> cubes_gen
<generator object <genexpr> at 0x103fb5a98>
>>> type(cubes_gen)
<class 'generator'>
>>> _(cubes_gen)  # this will exhaust the generator
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
>>> _(cubes_gen)  # nothing more to give
[]
Look at the line in which the generator expression is created and assigned the name cubes_gen. You can see it's a generator object. In order to see its elements, we can use a for loop, a manual set of calls to next, or simply feed it to a list constructor, which is what I did (remember I'm using _ as an alias).
Notice how, once the generator has been exhausted, there is no way to recover the same elements from it again. We need to recreate it if we want to use it from scratch again.
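One simple way to "recreate" a generator on demand is to wrap the expression in a function, so that every call hands back a fresh generator object. The following is only a sketch, and make_cubes is a hypothetical name chosen for this illustration.

```python
def make_cubes(n):
    """Return a brand new generator of cubes each time it is called."""
    return (k**3 for k in range(n))


gen = make_cubes(4)
first_pass = list(gen)       # consumes the generator
second_pass = list(gen)      # already exhausted
fresh = list(make_cubes(4))  # a new generator starts from scratch
print(first_pass, second_pass, fresh)
# prints: [0, 1, 8, 27] [] [0, 1, 8, 27]
```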
In the next few examples, let's see how to reproduce map and filter using generator expressions:
# gen.map.py
def adder(*n):
    return sum(n)

s1 = sum(map(lambda *n: adder(*n), range(100), range(1, 101)))
s2 = sum(adder(*n) for n in zip(range(100), range(1, 101)))
In the previous example, s1 and s2 are exactly the same: they are the sum of adder(0, 1), adder(1, 2), adder(2, 3), and so on, which translates to 1 + 3 + 5 + ... + 199. The syntax is different, though I find the generator expression to be much more readable:
# gen.filter.py
cubes = [x**3 for x in range(10)]
odd_cubes1 = filter(lambda cube: cube % 2, cubes)
odd_cubes2 = (cube for cube in cubes if cube % 2)
In the previous example, odd_cubes1 and odd_cubes2 are the same: they generate a sequence of odd cubes. Yet again, I prefer the generator syntax. This should be evident when things get a little more complicated:
# gen.map.filter.py
N = 20
cubes1 = map(
    lambda n: (n, n**3),
    filter(lambda n: n % 3 == 0 or n % 5 == 0, range(N))
)
cubes2 = (
    (n, n**3) for n in range(N) if n % 3 == 0 or n % 5 == 0)
The preceding code creates two generators, cubes1 and cubes2. They are exactly the same, and return two-tuples $(n, n^3)$ when n is a multiple of 3 or 5.
If you print list(cubes1), you get: [(0, 0), (3, 27), (5, 125), (6, 216), (9, 729), (10, 1000), (12, 1728), (15, 3375), (18, 5832)].
See how much better the generator expression reads? It may be debatable when things are very simple, but as soon as you start nesting functions a bit, like we did in this example, the superiority of the generator syntax is evident. It's shorter, simpler, and more elegant.
Now, let me ask you a question: what is the difference between the following lines of code?
# sum.example.py
s1 = sum([n**2 for n in range(10**6)])
s2 = sum((n**2 for n in range(10**6)))
s3 = sum(n**2 for n in range(10**6))
Strictly speaking, they all produce the same sum. The expressions to get s2 and s3 are exactly the same because the brackets in s2 are redundant. They are both generator expressions inside the sum function. The expression to get s1 is different though. Inside sum, we find a list comprehension. This means that in order to calculate s1, the sum function has to call next on a list a million times.
Do you see where we're losing time and memory? Before sum can start calling next on that list, the list needs to have been created, which is a waste of time and space. It's much better for sum to call next on a simple generator expression. There is no need to have all the numbers from range(10**6) stored in a list.
So, watch out for extra brackets when you write your expressions: sometimes it's easy to skip over these details, which makes our code very different. If you don't believe me, check out the following code:
# sum.example.2.py
s = sum([n**2 for n in range(10**8)])  # this is killed
# s = sum(n**2 for n in range(10**8))  # this succeeds
print(s)  # prints: 333333328333333350000000
Try running the preceding example. If I run the first line on my old Linux box with 8 GB RAM, this is what I get:
$ python sum.example.2.py
Killed
On the other hand, if I comment out the first line, and uncomment the second one, this is the result:
$ python sum.example.2.py
333333328333333350000000
Sweet generator expressions. The difference between the two lines is that in the first one, a list with the squares of the first hundred million numbers must be made before being able to sum them up. That list is huge, and we ran out of memory (at least, my box did; if yours doesn't, try a bigger number), therefore the operating system kills the process for us. Sad face.
But when we remove the square brackets, we don't have a list any more. The sum function receives 0, 1, 4, 9, and so on until the last one, and sums them up. No problems, happy face.
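If you'd rather not bring your machine to its knees, a gentler way to see the difference is to ask Python how big the two objects are. This is only a rough sketch: sys.getsizeof reports the size of the container itself (not of the integers it refers to), and the exact numbers depend on your platform and Python version, which is why none are shown here.

```python
import sys

squares_list = [n**2 for n in range(10**5)]  # all the items exist right now
squares_gen = (n**2 for n in range(10**5))   # items are produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print('list:', list_size, 'bytes')
print('generator:', gen_size, 'bytes')

# Both still produce the same sum
same_total = sum(squares_list) == sum(squares_gen)
print(same_total)  # prints: True
```

The size of the generator object stays the same no matter how large the range is, while the list grows with the number of items.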
Some performance considerations
So, we've seen that we have many different ways to achieve the same result. We can use any combination of map, zip, and filter, or choose to go with a comprehension, or maybe choose to use a generator, either function or expression. We may even decide to go with for loops; when the logic to apply to each running parameter isn't simple, they may be the best option.
Other than readability concerns though, let's talk about performance. When it comes to performance, usually there are two factors that play a major role: space and time.
Space means the size of the memory that a data structure is going to take up. The best way to choose is to ask yourself if you really need a list (or tuple) or if a simple generator function would work as well. If a generator would do, go with it; it'll save a lot of space. The same goes for functions; if you don't actually need them to return a list or tuple, then you can transform them into generator functions as well.
Sometimes, you will have to use lists (or tuples); for example, there are algorithms that scan sequences using multiple pointers, or that run over the sequence more than once. A generator function (or expression) can be iterated over only once and then it's exhausted, so in these situations, it wouldn't be the right choice.
Time is a bit harder than space because it depends on more variables, and therefore it isn't possible to state that X is faster than Y with absolute certainty for all cases. However, based on tests run on Python today, we can say that on average, map exhibits performance similar to list comprehensions and generator expressions, while for loops are consistently slower.
In order to appreciate the reasoning behind these statements fully, we need to understand how Python works, and this is a bit outside the scope of this book, as it's too technical in detail. Let's just say that map and list comprehensions run at C-language speed within the interpreter, while a Python for loop is run as Python bytecode within the Python Virtual Machine, which is often much slower.
There are several different implementations of Python. The original one, and still the most common one, is CPython (https://github.com/python/cpython), which is written in C. C is one of the most powerful and popular programming languages still used today.
How about we do a small exercise and try to find out whether the claims I made are accurate? I will write a small piece of code that collects the results of divmod(a, b) for a certain set of integer pairs, (a, b). I will use the time function from the time module to calculate the elapsed time of the operations that I will perform:
# performances.py
from time import time

mx = 5000
t = time()  # start time for the for loop
floop = []
for a in range(1, mx):
    for b in range(a, mx):
        floop.append(divmod(a, b))
print('for loop: {:.4f} s'.format(time() - t))  # elapsed time
t = time()  # start time for the list comprehension
compr = [
    divmod(a, b) for a in range(1, mx) for b in range(a, mx)]
print('list comprehension: {:.4f} s'.format(time() - t))
t = time()  # start time for the generator expression
gener = list(
    divmod(a, b) for a in range(1, mx) for b in range(a, mx))
print('generator expression: {:.4f} s'.format(time() - t))
As you can see, we're creating three lists: floop, compr, and gener. Running the code produces the following:
$ python performances.py
for loop: 4.4814 s
list comprehension: 3.0210 s
generator expression: 3.4334 s
The list comprehension runs in ~67% of the time taken by the for loop. That's impressive. The generator expression came quite close to that, with a good ~77%. The reason the generator expression is slower is that we need to feed it to the list() constructor, and this has a little bit more overhead compared to a sheer list comprehension. If I didn't have to retain the results of those calculations, a generator would probably have been a more suitable option.
An interesting thing to notice is that, within the body of the for loop, we're appending data to a list. This implies that Python does the work, behind the scenes, of resizing it every now and then, allocating space for items to be appended. I guessed that creating a list of zeros, and simply filling it with the results, might have sped up the for loop, but I was wrong. Check it for yourself; you just need mx * (mx - 1) // 2 elements to be preallocated.
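If you want to try the preallocation experiment, here is one possible sketch of it. It uses a much smaller mx so that it runs quickly, and perf_counter instead of time (my choice for this sketch, since it is meant for measuring short intervals). No timings are shown because they will differ on every machine.

```python
from time import perf_counter

mx = 300  # deliberately small; raise it to get meaningful timings
size = mx * (mx - 1) // 2  # number of (a, b) pairs with 1 <= a <= b < mx

t = perf_counter()
prealloc = [0] * size  # the list is created once, at its final length
i = 0
for a in range(1, mx):
    for b in range(a, mx):
        prealloc[i] = divmod(a, b)  # fill in place instead of appending
        i += 1
print('preallocated for loop: {:.4f} s'.format(perf_counter() - t))

t = perf_counter()
floop = []
for a in range(1, mx):
    for b in range(a, mx):
        floop.append(divmod(a, b))
print('appending for loop: {:.4f} s'.format(perf_counter() - t))
```

Both loops build exactly the same list, so any difference you measure comes from how the list is filled.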
Let's see a similar example that compares a for loop and a map call:
# performances.map.py
from time import time

mx = 2 * 10 ** 7
t = time()
absloop = []
for n in range(mx):
    absloop.append(abs(n))
print('for loop: {:.4f} s'.format(time() - t))
t = time()
abslist = [abs(n) for n in range(mx)]
print('list comprehension: {:.4f} s'.format(time() - t))
t = time()
absmap = list(map(abs, range(mx)))
print('map: {:.4f} s'.format(time() - t))
This code is conceptually very similar to the previous example. The only thing that has changed is that we're applying the abs function instead of the divmod one, and we have only one loop instead of two nested ones. Execution gives the following result:
$ python performances.map.py
for loop: 3.8948 s
list comprehension: 1.8594 s
map: 1.1548 s
And map wins the race: ~62% of the list comprehension and ~30% of the for loop. Take these results with a pinch of salt, as things might be different according to various factors, such as OS and Python version. But in general, I think it's safe to say that these results are good enough for having an idea when it comes to coding for performance.
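If you want to repeat this kind of measurement with less manual bookkeeping, the standard library's timeit module runs a statement many times and returns the total elapsed time. The following is just a sketch with a deliberately small mx and an arbitrary number of repetitions; no timings are shown because they depend entirely on your machine.

```python
from timeit import timeit

setup = 'mx = 10**4'
statements = {
    'for loop': 'out = []\nfor n in range(mx):\n    out.append(abs(n))',
    'list comprehension': '[abs(n) for n in range(mx)]',
    'map': 'list(map(abs, range(mx)))',
}

# Run each statement 100 times and record the total time in seconds
timings = {
    label: timeit(stmt, setup=setup, number=100)
    for label, stmt in statements.items()
}
for label, seconds in timings.items():
    print('{}: {:.4f} s'.format(label, seconds))
```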
Apart from the case-by-case little differences though, it's quite clear that the for loop option is the slowest one, so let's see why we still want to use it.
Don't overdo comprehensions and generators
We've seen how powerful list comprehensions and generator expressions can be. And they are, don't get me wrong, but the feeling that I have when I deal with them is that their complexity grows exponentially. The more you try to do within a single comprehension or a generator expression, the harder it becomes to read, understand, and therefore maintain or change.
If you check the Zen of Python again, there are a few lines that I think are worth keeping in mind when dealing with optimized code:
>>> import this
...
Explicit is better than implicit.
Simple is better than complex.
...
Readability counts.
...
If the implementation is hard to explain, it's a bad idea.
...
Comprehensions and generator expressions are more implicit than explicit, can be quite difficult to read and understand, and they can be hard to explain. Sometimes you have to break them apart using the inside-out technique, to understand what's going on.
To give you an example, let's talk a bit more about Pythagorean triples. Just to remind you, a Pythagorean triple is a tuple of positive integers (a, b, c) such that $a^2 + b^2 = c^2$.
We saw how to calculate them in the Filtering a comprehension section, but we did it in a very inefficient way because we were scanning all pairs of numbers below a certain threshold, calculating the hypotenuse, and filtering out those that were not producing a triple.
A better way to get a list of Pythagorean triples is to generate them directly. There are many different formulas you can use to do this; we'll use the Euclidean formula.
This formula says that any triple (a, b, c), where $a = m^2 - n^2$, $b = 2mn$, $c = m^2 + n^2$, with m and n positive integers such that m > n, is a Pythagorean triple. For example, when m = 2 and n = 1, we find the smallest triple: (3, 4, 5).
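It takes only a few lines of algebra to check that the formula does what it claims. Expanding the squares of a and b as defined above gives:

```latex
\begin{align*}
a^2 + b^2 &= (m^2 - n^2)^2 + (2mn)^2 \\
          &= m^4 - 2m^2n^2 + n^4 + 4m^2n^2 \\
          &= m^4 + 2m^2n^2 + n^4 \\
          &= (m^2 + n^2)^2 = c^2
\end{align*}
```

With m = 2 and n = 1 this reads 9 + 16 = 25, which is the (3, 4, 5) triple.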
There is one catch though: consider the triple (6, 8, 10), which is just like (3, 4, 5) with all the numbers multiplied by 2. This triple is definitely Pythagorean, since $6^2 + 8^2 = 10^2$, but we can derive it from (3, 4, 5) simply by multiplying each of its elements by 2. The same goes for (9, 12, 15), (12, 16, 20), and in general for all the triples that we can write as (3k, 4k, 5k), with k being a positive integer greater than 1.
A triple that cannot be obtained by multiplying the elements of another one by some factor, k, is called primitive. Another way of stating this is: if the three elements of a triple are coprime, then the triple is primitive. Two numbers are coprime when they don't share any prime factor amongst their divisors, that is, their greatest common divisor (GCD) is 1. For example, 3 and 5 are coprime, while 3 and 6 are not, because they are both divisible by 3.
So, the Euclidean formula tells us that if m and n are coprime, and m - n is odd, the triple they generate is primitive. In the following example, we will write a generator expression to calculate all the primitive Pythagorean triples whose hypotenuse (c) is less than or equal to some integer, N. This means we want all triples for which $m^2 + n^2 \le N$. When n is 1, the formula looks like this: $m^2 \le N - 1$, which means we can approximate the calculation with an upper bound of $m \le N^{1/2}$.
So, to recap: m must be greater than n, they must also be coprime, and their difference m - n must be odd. Moreover, in order to avoid useless calculations, we'll put the upper bound for m at floor(sqrt(N)) + 1.
The floor function for a real number, x, gives the maximum integer, n, such that n ≤ x, for example, floor(3.8) = 3, floor(13.1) = 13. Taking floor(sqrt(N)) + 1 means taking the integer part of the square root of N and adding a minimal margin just to make sure we don't miss any numbers.
Let's put all of this into code, step by step. Let's start by writing a simple gcd function that uses Euclid's algorithm:
# functions.py
def gcd(a, b):
    """Calculate the Greatest Common Divisor of (a, b)."""
    while b != 0:
        a, b = b, a % b
    return a
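As a quick sanity check (not part of the original example), we can compare this function against math.gcd from the standard library on a handful of arbitrarily chosen pairs. The function is repeated here so that the snippet runs on its own.

```python
import math


def gcd(a, b):
    """Calculate the Greatest Common Divisor of (a, b)."""
    while b != 0:
        a, b = b, a % b
    return a


pairs = [(12, 18), (3, 5), (3, 6), (17, 0), (100, 75)]
mismatches = [(a, b) for a, b in pairs if gcd(a, b) != math.gcd(a, b)]
print(mismatches)            # prints: []
print(gcd(3, 5), gcd(3, 6))  # prints: 1 3
```

In real code you would probably just import math.gcd; writing our own here keeps the example self-explanatory.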
The explanation of Euclid's algorithm is available on the web, so I won't spend any time here talking about it; we need to focus on the generator expression. The next step is to use the knowledge we gathered before to generate a list of primitive Pythagorean triples:
# pythagorean.triple.generation.py
from functions import gcd

N = 50
triples = sorted(                                    # 1
    ((a, b, c) for a, b, c in (                      # 2
        ((m**2 - n**2), (2 * m * n), (m**2 + n**2))  # 3
        for m in range(1, int(N**.5) + 1)            # 4
        for n in range(1, m)                         # 5
        if (m - n) % 2 and gcd(m, n) == 1            # 6
    ) if c <= N), key=lambda *triple: sum(*triple)   # 7
)
There you go. It's not easy to read, so let's go through it line by line. At #3, we start a generator expression that is creating triples. You can see from #4 and #5 that we're looping on m in [1, M), with M being the integer part of sqrt(N), plus 1. On the other hand, n loops within [1, m), to respect the m > n rule. It's worth noting how I calculated sqrt(N), that is, N**.5, which is just another way to do it that I wanted to show you.
At #6, you can see the filtering conditions to make the triples primitive: (m - n) % 2 evaluates to True when (m - n) is odd, and gcd(m, n) == 1 means m and n are coprime. With these in place, we know the triples will be primitive. This takes care of the innermost generator expression. The outermost one starts at #2, and finishes at #7. We take the triples (a, b, c) in (...innermost generator...) such that c <= N.
Finally, at #1 we apply sorting, to present the list in order. At #7, after the outermost generator expression is closed, you can see that we specify the sorting key to be the sum a + b + c. This is just my personal preference, there is no mathematical reason behind it.
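As a side note, the key at #7 collects its single argument into a one-element tuple and immediately unpacks it again, so it ends up calling sum on the triple. The following sketch, using a few hand-picked triples, shows that passing the built-in sum directly as the key gives the same ordering.

```python
triples = [(15, 8, 17), (3, 4, 5), (9, 40, 41), (5, 12, 13)]

by_lambda = sorted(triples, key=lambda *triple: sum(*triple))
by_sum = sorted(triples, key=sum)  # sum(triple) is a + b + c

print(by_sum)
# prints: [(3, 4, 5), (5, 12, 13), (15, 8, 17), (9, 40, 41)]
print(by_lambda == by_sum)  # prints: True
```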
So, what do you think? Was it straightforward to read? I don't think so. And believe me, this is still a simple example; I have seen much worse in my career.
This kind of code is difficult to understand, debug, and modify. It shouldn't find a place in a professional environment.
So, let's see if we can rewrite this code into something more readable:
# pythagorean.triple.generation.for.py
from functions import gcd

def gen_triples(N):
    for m in range(1, int(N**.5) + 1):            # 1
        for n in range(1, m):                     # 2
            if (m - n) % 2 and gcd(m, n) == 1:    # 3
                c = m**2 + n**2                   # 4
                if c <= N:                        # 5
                    a = m**2 - n**2               # 6
                    b = 2 * m * n                 # 7
                    yield (a, b, c)               # 8

triples = sorted(
    gen_triples(50), key=lambda *triple: sum(*triple))  # 9
This is so much better. Let's go through it, line by line. You'll see how much easier it is to understand.
We start looping at #1 and #2, in exactly the same way we were looping in the previous example. On line #3, we have the filtering for primitive triples. On line #4, we deviate a bit from what we were doing before: we calculate c, and on line #5, we filter on c being less than or equal to N. Only when c satisfies that condition do we calculate a and b, and yield the resulting tuple. It's always good to delay all calculations for as long as possible so that we don't waste time and CPU. On the last line, we apply sorting with the same key we were using in the generator expression example.
I hope you agree, this example is easier to understand. And I promise you, if you have to modify the code one day, you'll find that modifying this one is easy, while modifying the other version will take much longer (and it will be more error-prone).
If you print the results of both examples (they are the same), you will get this:
[(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25), (21, 20, 29), (35, 12, 37), (9, 40, 41)]
The moral of the story is, try and use comprehensions and generator expressions as much as you can, but if the code starts to be complicated to modify or to read, you may want to refactor it into something more readable. Your colleagues will thank you.
Name localization
Now that we are familiar with all types of comprehensions and generator expressions, let's talk about name localization within them. Python 3.* localizes loop variables in all four forms of comprehensions: list, dict, set, and generator expressions. This behavior is therefore different from that of the for loop. Let's see a simple example to show all the cases:
# scopes.py
A = 100
ex1 = [A for A in range(5)]
print(A)  # prints: 100
ex2 = list(A for A in range(5))
print(A)  # prints: 100
ex3 = dict((A, 2 * A) for A in range(5))
print(A)  # prints: 100
ex4 = set(A for A in range(5))
print(A)  # prints: 100
s = 0
for A in range(5):
    s += A
print(A)  # prints: 4
In the preceding code, we declare a global name, A = 100, and then we exercise the four comprehensions: list, generator expression, dictionary, and set. None of them alters the global name, A. Conversely, you can see at the end that the for loop modifies it. The last print statement prints 4.
Let's see what happens if A wasn't there:
# scopes.noglobal.py
ex1 = [A for A in range(5)]
print(A)  # breaks: NameError: name 'A' is not defined
The preceding code would work the same with any of the four types of comprehensions. After we run the first line, A is not defined in the global namespace. Once again, the for loop behaves differently:
# scopes.for.py
s = 0
for A in range(5):
    s += A
print(A)  # prints: 4
print(globals())
The preceding code shows that after a for loop, if the loop variable wasn't defined before it, we can find it in the global frame. To make sure of it, let's take a peek at it by calling the globals() built-in function:
$ python scopes.for.py
4
{'__name__': '__main__', '__doc__': None, ..., 's': 10, 'A': 4}
Together with a lot of other boilerplate stuff that I have omitted, we can spot 'A': 4.
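The same contrast can be packed into a few lines that you can run in one go. This is just a sketch: the names i, j, and leaked are arbitrary choices for the illustration.

```python
for i in range(3):
    pass
print(i)  # prints: 2 (the for loop variable survives the loop)

squares = [j * j for j in range(3)]
try:
    j  # the comprehension variable was never bound in this scope
except NameError:
    leaked = False
else:
    leaked = True
print(leaked)  # prints: False
```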
Generation behavior in built-ins
Among the built-in types, the generation behavior is now quite common. This is a major difference between Python 2 and Python 3. A lot of functions, such as map, zip, and filter, have been transformed so that they return objects that behave like iterables. The idea behind this change is that if you need to make a list of those results, you can always wrap the call in list(), and you're done. On the other hand, if you just need to iterate and want to keep the impact on memory as light as possible, you can use those functions safely.
Another notable example is the range function. In Python 2 it returns a list, and there is another function called xrange that returns an object that you can iterate on, which generates the numbers on the fly. In Python 3 this function has gone, and range now behaves like it.
But this concept, in general, is now quite widespread. You can find it in the open() function, which is used to operate on file objects (we'll see it in Chapter 7, Files and Data Persistence), but also in enumerate, in the dictionary keys, values, and items methods, and several other places.
It all makes sense: Python's aim is to try to reduce the memory footprint by avoiding wasting space wherever possible, especially in those functions and methods that are used extensively in most situations.
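Here is a short sketch of what this means in practice. A map object hands out its items once and is then exhausted, while a range object never materializes its numbers, so even a huge one is cheap to create and can be iterated more than once. The variable names are arbitrary, and the len call on such a large range assumes a 64-bit build of Python.

```python
doubled = map(lambda x: x * 2, range(3))
first_pass = list(doubled)
second_pass = list(doubled)  # the map object is now exhausted
print(first_pass, second_pass)  # prints: [0, 2, 4] []

big = range(10**12)  # no list of a trillion numbers is created
print(len(big), big[5])  # prints: 1000000000000 5

small = range(3)
print(list(small), list(small))  # prints: [0, 1, 2] [0, 1, 2]
```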
Do you remember the beginning of this chapter? I said that it makes more sense to optimize the performance of code that has to deal with a lot of objects, rather than shaving off a few milliseconds from a function that we call twice a day.
One last example
Before we finish this chapter, I'll show you a simple problem that I used to submit to candidates for a Python developer role in a company I used to work for.
The problem is the following: given the sequence 0 1 1 2 3 5 8 13 21 ..., write a function that would return the terms of this sequence up to some limit, N.
If you haven't recognized it, that is the Fibonacci sequence, which is defined as F(0) = 0, F(1) = 1 and, for any n > 1, F(n) = F(n-1) + F(n-2). This sequence is excellent to test knowledge about recursion, memoization techniques, and other technical details, but in this case, it was a good opportunity to check whether the candidate knew about generators.
Let's start from a rudimentary version of a function, and then improve on it:
# fibonacci.first.py
def fibonacci(N):
    """Return all fibonacci numbers up to N."""
    result = [0]
    next_n = 1
    while next_n <= N:
        result.append(next_n)
        next_n = sum(result[-2:])
    return result

print(fibonacci(0))  # [0]
print(fibonacci(1))  # [0, 1, 1]
print(fibonacci(50))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
From the top: we set up the result list to a starting value of [0]. Then we start the iteration from the next element (next_n), which is 1. While the next element is not greater than N, we keep appending it to the list and calculating the next. We calculate the next element by taking a slice of the last two elements in the result list and passing it to the sum function. Add some print statements here and there if this is not clear to you, but by now I would expect it not to be an issue.
When the condition of the while loop evaluates to False, we exit the loop and return result. You can see the result of those print statements in the comments next to each of them.
At this point, I would ask the candidate the following question: What if I just wanted to iterate over those numbers? A good candidate would then change the code to what you'll find here (an excellent candidate would have started with it!):
# fibonacci.second.py
def fibonacci(N):
    """Return all fibonacci numbers up to N."""
    yield 0
    if N == 0:
        return
    a = 0
    b = 1
    while b <= N:
        yield b
        a, b = b, a + b

print(list(fibonacci(0)))  # [0]
print(list(fibonacci(1)))  # [0, 1, 1]
print(list(fibonacci(50)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
This is actually one of the solutions I was given. I don't know why I kept it, but I'm glad I did so I can show it to you. Now, the fibonacci function is a generator function. First we yield 0, then if N is 0, we return (this will cause a StopIteration exception to be raised). If that's not the case, we start iterating, yielding b at every loop cycle, and then updating a and b. All we need in order to be able to produce the next element of the sequence is the past two: a and b, respectively.
This code is much better, has a lighter memory footprint, and all we have to do to get a list of Fibonacci numbers is to wrap the call with list(), as usual. But what about elegance? I can't leave it like that, can I? Let's try the following:
# fibonacci.elegant.py
def fibonacci(N):
    """Return all fibonacci numbers up to N."""
    a, b = 0, 1
    while a <= N:
        yield a
        a, b = b, a + b
Much better. The whole body of the function is four lines, five if you count the docstring. Notice how, in this case, using tuple assignment (a, b = 0, 1 and a, b = b, a + b) helps in making the code shorter, and more readable.
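To close the loop on the interview question ("what if I just wanted to iterate over those numbers?"), here is a small usage sketch of the elegant version. The function is repeated so that the snippet runs on its own; islice comes from the standard library's itertools module, and the limits used are arbitrary.

```python
from itertools import islice


def fibonacci(N):
    """Return all fibonacci numbers up to N."""
    a, b = 0, 1
    while a <= N:
        yield a
        a, b = b, a + b


print(list(fibonacci(50)))  # prints: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Take only the first five terms, without computing the rest
first_five = list(islice(fibonacci(10**6), 5))
print(first_five)  # prints: [0, 1, 1, 2, 3]

# Feed the generator straight into sum; no list is ever built
even_total = sum(n for n in fibonacci(100) if n % 2 == 0)
print(even_total)  # prints: 44
```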
Summary
In this chapter, we explored the concept of iteration and generation a bit more deeply. We looked at the map, zip, and filter functions in detail, and learned how to use them as an alternative to a regular for loop approach.
Then we covered the concept of comprehensions, for lists, dictionaries, and sets. We explored their syntax and how to use them as an alternative to both the classic for loop approach and also to the use of the map, zip, and filter functions.
Finally, we talked about the concept of generation, in two forms: generator functions and expressions. We learned how to save time and space by using generation techniques and saw how they can make possible what wouldn't normally be if we used a conventional approach based on lists.
We talked about performance, and saw that for loops are last in terms of speed, but they provide the best readability and flexibility to change. On the other hand, functions such as map and filter, and list comprehensions, can be much faster.
The complexity of the code written using these techniques grows exponentially so, in order to favor readability and ease of maintainability, we still need to use the classic for loop approach at times. Another difference is in the name localization, where the for loop behaves differently from all other types of comprehensions.
The next chapter will be all about objects and classes. It is structurally similar to this one, in that we won't explore many different subjects, just a few of them, but we'll try to dive into them a little bit more deeply.
Make sure you understand the concepts of this chapter before moving on to the next one. We're building a wall brick by brick, and if the foundation is not solid, we won't get very far.
OOP, Decorators, and Iterators
La classe non è acqua. (Class will out)
– Italian saying
I could probably write a whole book about object-oriented programming (OOP) and classes. In this chapter, I'm facing the hard challenge of finding the balance between breadth and depth. There are simply too many things to tell, and plenty of them would take more than this whole chapter if I described them in depth. Therefore, I will try to give you what I think is a good panoramic view of the fundamentals, plus a few things that may come in handy in the next chapters. Python's official documentation will help in filling the gaps.
In this chapter, we are going to cover the following topics:
Decorators
OOP with Python
Iterators
Decorators
In Chapter 5, Saving Time and Memory, I measured the execution time of various expressions. If you recall, I had to initialize a variable to the start time, and subtract it from the current time after execution in order to calculate the elapsed time. I also printed it on the console after each measurement. That was very tedious.
Every time you find yourself repeating things, an alarm bell should go off. Can you put that code in a function and avoid repetition? The answer most of the time is yes, so let's look at an example:
# decorators/time.measure.start.py
from time import sleep, time

def f():
    sleep(.3)

def g():
    sleep(.5)

t = time()
f()
print('f took:', time() - t)  # f took: 0.3001396656036377
t = time()
g()
print('g took:', time() - t)  # g took: 0.5039339065551758
In the preceding code, I defined two functions, f and g, which do nothing but sleep (for 0.3 and 0.5 seconds, respectively). I used the sleep function to suspend the execution for the desired amount of time. Notice how the time measure is pretty accurate. Now, how do we avoid repeating that code and those calculations? One first potential approach could be the following:
# decorators/time.measure.dry.py
from time import sleep, time

def f():
    sleep(.3)

def g():
    sleep(.5)

def measure(func):
    t = time()
    func()
    print(func.__name__, 'took:', time() - t)

measure(f)  # f took: 0.30434322357177734
measure(g)  # g took: 0.5048270225524902
Ah, much better now. The whole timing mechanism has been encapsulated into a function so we don't repeat code. We print the function name dynamically and it's easy enough to code. What if we need to pass arguments to the function we measure? This code would get just a bit more complicated, so let's see an example:
# decorators/time.measure.arguments.py
from time import sleep, time

def f(sleep_time=0.1):
    sleep(sleep_time)

def measure(func, *args, **kwargs):
    t = time()
    func(*args, **kwargs)
    print(func.__name__, 'took:', time() - t)

measure(f, sleep_time=0.3)  # f took: 0.30056095123291016
measure(f, 0.2)  # f took: 0.2033553123474121
Now, f is expecting to be fed sleep_time (with a default value of 0.1), so we don't need g any more. I also had to change the measure function so that it now accepts a function, any variable positional arguments, and any variable keyword arguments. In this way, whatever we call measure with, we redirect those arguments to the call to func we do inside.
This is very good, but we can push it a little bit further. Let's say we want to somehow have that timing behavior built into the f function, so that we could just call it and have that measure taken. Here's how we could do it:
# decorators/time.measure.deco1.py
from time import sleep, time

def f(sleep_time=0.1):
    sleep(sleep_time)

def measure(func):
    def wrapper(*args, **kwargs):
        t = time()
        func(*args, **kwargs)
        print(func.__name__, 'took:', time() - t)
    return wrapper

f = measure(f)  # decoration point
f(0.2)  # f took: 0.20372915267944336
f(sleep_time=0.3)  # f took: 0.30455899238586426
print(f.__name__)  # wrapper <- ouch!
The preceding code is probably not so straightforward. Let's see what happens here. The magic is in the decoration point. We basically reassign f with whatever is returned by measure when we call it with f as an argument. Within measure, we define another function, wrapper, and then we return it. So, the net effect is that after the decoration point, when we call f, we're actually calling wrapper. Since the wrapper inside is calling func, which is f, we are actually closing the loop like that. If you don't believe me, take a look at the last line.
wrapper is actually... a wrapper. It takes variable positional and keyword arguments, and calls f with them. It also does the time measurement calculation around the call.
This technique is called decoration, and measure is, effectively, a decorator. This paradigm became so popular and widely used that at some point, Python added a special syntax for it (check out https://www.python.org/dev/peps/pep-0318/). Let's explore three cases: one decorator, two decorators, and one decorator that takes arguments:
# decorators/syntax.py
def func(arg1, arg2, ...):
    pass
func = decorator(func)

# is equivalent to the following:

@decorator
def func(arg1, arg2, ...):
    pass
Basically, instead of manually reassigning the function to what was returned by the decorator, we prepend the definition of the function with the special syntax, @decorator_name.
We can apply multiple decorators to the same function in the following way:
# decorators/syntax.py
def func(arg1, arg2, ...):
    pass
func = deco1(deco2(func))

# is equivalent to the following:

@deco1
@deco2
def func(arg1, arg2, ...):
    pass
When applying multiple decorators, pay attention to the order. In the preceding example, func is decorated with deco2 first, and the result is decorated with deco1. A good rule of thumb is: the closer the decorator is to the function, the sooner it is applied.
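A tiny runnable sketch makes the order visible. The two decorators below, outer and inner, are hypothetical examples that simply wrap the return value in a label, so the result shows which one was applied first.

```python
def outer(func):
    def wrapper(*args, **kwargs):
        return 'outer(' + func(*args, **kwargs) + ')'
    return wrapper


def inner(func):
    def wrapper(*args, **kwargs):
        return 'inner(' + func(*args, **kwargs) + ')'
    return wrapper


@outer
@inner
def greet(name):
    return 'Hello ' + name


# inner is closer to the function, so it is applied first
print(greet('Fab'))  # prints: outer(inner(Hello Fab))
```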
Some decorators can take arguments. This technique is generally used to produce other decorators. Let's look at the syntax, and then we'll see an example of it:
# decorators/syntax.py
def func(arg1, arg2, ...):
    pass
func = decoarg(arg_a, arg_b)(func)

# is equivalent to the following:

@decoarg(arg_a, arg_b)
def func(arg1, arg2, ...):
    pass
As you can see, this case is a bit different. First, decoarg is called with the given arguments, and then its return value (the actual decorator) is called with func. Before I give you another example, let's fix one thing that is bothering me. I don't want to lose the original function name and docstring (and other attributes as well, check the documentation for the details) when I decorate it. But because inside our decorator we return wrapper, the original attributes from func are lost and f ends up being assigned the attributes of wrapper. There is an easy fix for that from the beautiful functools module. I will fix the last example, and I will also rewrite it to use the @ syntax:
# decorators/time.measure.deco2.py
from time import sleep, time
from functools import wraps

def measure(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        t = time()
        func(*args, **kwargs)
        print(func.__name__, 'took:', time() - t)
    return wrapper

@measure
def f(sleep_time=0.1):
    """I'm a cat. I love to sleep!"""
    sleep(sleep_time)

f(sleep_time=0.3)  # f took: 0.3010902404785156
print(f.__name__, ':', f.__doc__)  # f : I'm a cat. I love to sleep!
Now we're talking! As you can see, all we need to do is to tell Python that wrapper actually wraps func (by means of the wraps function), and you can see that the original name and docstring are now maintained.
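As a small complement, the sketch below checks what functools.wraps leaves on the decorated function. The noisy decorator and the add function are hypothetical, introduced only for this illustration. Besides copying the name and docstring, wraps also stores the original function on the wrapper as __wrapped__, which lets you reach the undecorated function when you need it.

```python
from functools import wraps


def noisy(func):
    """Hypothetical decorator that announces each call."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        print('calling', func.__name__)
        return func(*args, **kwargs)
    return wrapper


@noisy
def add(a, b):
    """Return a + b."""
    return a + b


print(add.__name__)           # add
print(add.__doc__)            # Return a + b.
print(add.__wrapped__(2, 3))  # 5 (bypasses wrapper, nothing is announced)
```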
Let's see another example. I want a decorator that prints an error message when the result of a function is greater than a certain threshold. I will also take this opportunity to show you how to apply two decorators at once:
# decorators/two.decorators.py
from time import sleep, time
from functools import wraps

def measure(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        t = time()
        result = func(*args, **kwargs)
        print(func.__name__, 'took:', time() - t)
        return result
    return wrapper

def max_result(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if result > 100:
            print('Result is too big ({0}). Max allowed is 100.'
                  .format(result))
        return result
    return wrapper

@measure
@max_result
def cube(n):
    return n ** 3

print(cube(2))
print(cube(5))
Take your time in studying the preceding example until you are sure you understand it well. If you do, I don't think there is any decorator you now won't be able to write.
I had to enhance the measure decorator, so that its wrapper now returns the result of the call to func. The max_result decorator does that as well, but before returning, it checks that result is not greater than 100, which is the maximum allowed. I decorated cube with both of them. First, max_result is applied, then measure. Running this code yields this result:
$ python two.decorators.py
cube took: 3.0994415283203125e-06
8

Result is too big (125). Max allowed is 100.
cube took: 1.0013580322265625e-05
125
For your convenience, I have separated the results of the two calls with a blank line. In the first call, the result is 8, which passes the threshold check. The running time is measured and printed. Finally, we print the result (8).
On the second call, the result is 125, so the error message is printed, the result returned, and then it's the turn of measure, which prints the running time again, and finally, we print the result (125).
Had I decorated the cube function with the same two decorators but in a different order, the error message would have followed the line that prints the running time, instead of preceding it.
A decorator factory
Let's simplify this example now, going back to a single decorator: max_result. I want to make it so that I can decorate different functions with different thresholds, as I don't want to write one decorator for each threshold. Let's amend max_result so that it allows us to decorate functions specifying the threshold dynamically:
# decorators/decorators.factory.py
from functools import wraps

def max_result(threshold):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if result > threshold:
                print(
                    'Result is too big ({0}). Max allowed is {1}.'
                    .format(result, threshold))
            return result
        return wrapper
    return decorator

@max_result(75)
def cube(n):
    return n ** 3

print(cube(5))
The preceding code shows you how to write a decorator factory. If you recall, decorating a function with a decorator that takes arguments is the same as writing func = decorator(argA, argB)(func), so when we decorate cube with max_result(75), we're doing cube = max_result(75)(cube).
Let's go through what happens, step by step. When we call max_result(75), we enter its body. A decorator function is defined inside, which takes a function as its only argument. Inside that function, the usual decorator trick is performed. We define wrapper, inside of which we check the result of the original function's call. The beauty of this approach is that from the innermost level, we can still refer to both func and threshold, which allows us to set the threshold dynamically.
wrapper returns result, decorator returns wrapper, and max_result returns decorator. This means that our cube = max_result(75)(cube) call actually becomes cube = decorator(cube). Not just any decorator though, but one for which threshold has a value of 75. This is achieved by a mechanism called closure, which is outside of the scope of this chapter but still very interesting, so I mentioned it for you to do some research on it.
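If you want a quick taste of closures before researching them, the following is a minimal sketch. The make_multiplier function is a hypothetical example, not part of the chapter's code. The inner function remembers the value of factor from the call that created it, just as wrapper remembers threshold.

```python
def make_multiplier(factor):
    # multiply "closes over" factor: it keeps access to it
    # even after make_multiplier has returned
    def multiply(n):
        return n * factor
    return multiply


double = make_multiplier(2)
triple = make_multiplier(3)

print(double(10))  # 20
print(triple(10))  # 30

# The captured value lives in the function's closure cells
print(double.__closure__[0].cell_contents)  # 2
```

Each call to make_multiplier produces a separate function with its own captured factor, which is why every @max_result(...) decoration can carry a different threshold.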
Running the last example produces the following result:
$ python decorators.factory.py
Result is too big (125). Max allowed is 75.
125
The preceding code allows me to use the max_result decorator with different thresholds at my own will, like this:
# decorators/decorators.factory.py
@max_result(75)
def cube(n):
    return n ** 3

@max_result(100)
def square(n):
    return n ** 2

@max_result(1000)
def multiply(a, b):
    return a * b
Note that every decoration uses a different threshold value.
Decorators are very popular in Python. They are used quite often and they simplify (and beautify, I dare say) the code a lot.
Object-oriented programming (OOP)
It's been quite a long and hopefully nice journey and, by now, we should be ready to explore OOP. I'll use the definition from Kindler, E.; Krivy, I. (2011), Object-oriented simulation of systems with sophisticated control, International Journal of General Systems, and adapt it to Python:
Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which are data structures that contain data, in the form of attributes, and code, in the form of functions known as methods. A distinguishing feature of objects is that an object's methods can access and often modify the data attributes of the object with which they are associated (objects have a notion of "self"). In OO programming, computer programs are designed by making them out of objects that interact with one another.
Python has full support for this paradigm. Actually, as we have already said, everything in Python is an object, so this shows that OOP is not just supported by Python, but it's a part of its very core.
The two main players in OOP are objects and classes. Classes are used to create objects (objects are instances of the classes from which they were created), so we could see them as instance factories. When objects are created by a class, they inherit the class attributes and methods. They represent concrete items in the program's domain.
The simplest Python class
I will start with the simplest class you could ever write in Python:
# oop/simplest.class.py
class Simplest():  # when empty, the parentheses are optional
    pass

print(type(Simplest))  # what type is this object?
simp = Simplest()  # we create an instance of Simplest: simp
print(type(simp))  # what type is simp?
# is simp an instance of Simplest?
print(type(simp) == Simplest)  # There's a better way for this
Let's run the preceding code and explain it line by line:
$ python simplest.class.py
<class 'type'>
<class '__main__.Simplest'>
True
The Simplest class I defined has only the pass instruction in its body, which means it doesn't have any custom attributes or methods. The parentheses after the name are optional if empty. I will print its type (__main__ is the name of the scope in which top-level code executes), and I am aware that, in the comment, I wrote object instead of class. It turns out that, as you can see by the result of that print, classes are actually objects. To be precise, they are instances of type. Explaining this concept would lead us to a talk about metaclasses and metaprogramming, advanced concepts that require a solid grasp of the fundamentals to be understood and are beyond the scope of this chapter. As usual, I mentioned it to leave a pointer for you, for when you'll be ready to dig deeper.
Let's go back to the example: I used Simplest to create an instance, simp. You can see that the syntax to create an instance is the same as we use to call a function. Then we print what type simp belongs to and we verify that simp is in fact an instance of Simplest. I'll show you a better way of doing this later on in the chapter.
Up to now, it's all very simple. What happens when we write class ClassName(): pass, though? Well, what Python does is create a class object and assign it a name. This is very similar to what happens when we declare a function using def.
Class and object namespaces
After the class object has been created (which usually happens when the module is first imported), it basically represents a namespace. We can call that class to create its instances. Each instance inherits the class attributes and methods and is given its own namespace. We already know that, to walk a namespace, all we need to do is to use the dot (.) operator.
Let's look at another example:
# oop/class.namespaces.py
class Person:
    species = 'Human'

print(Person.species)  # Human
Person.alive = True  # Added dynamically!
print(Person.alive)  # True
man = Person()
print(man.species)  # Human (inherited)
print(man.alive)  # True (inherited)
Person.alive = False
print(man.alive)  # False (inherited)
man.name = 'Darth'
man.surname = 'Vader'
print(man.name, man.surname)  # Darth Vader
In the preceding example, I have defined a class attribute called species. Any variable defined in the body of a class is an attribute that belongs to that class. In the code, I have also defined Person.alive, which is another class attribute. You can see that there is no restriction on accessing that attribute from the class. You can see that man, which is an instance of Person, inherits both of them, and reflects them instantly when they change.
man also has two attributes that belong to its own namespace and therefore are called instance attributes: name and surname.
Class attributes are shared among all instances, while instance attributes are not; therefore, you should use class attributes to provide the states and behaviors to be shared by all instances, and use instance attributes for data that belongs just to one specific object.
Attribute shadowing
When you search for an attribute in an object, if it is not found, Python keeps searching in the class that was used to create that object (and keeps searching until it's either found or the end of the inheritance chain is reached). This leads to an interesting shadowing behavior. Let's look at another example:
# oop/class.attribute.shadowing.py
class Point:
    x = 10
    y = 7

p = Point()
print(p.x)  # 10 (from class attribute)
print(p.y)  # 7 (from class attribute)
p.x = 12  # p gets its own `x` attribute
print(p.x)  # 12 (now found on the instance)
print(Point.x)  # 10 (class attribute still the same)
del p.x  # we delete instance attribute
print(p.x)  # 10 (now search has to go again to find class attr)
p.z = 3  # let's make it a 3D point
print(p.z)  # 3
print(Point.z)
# AttributeError: type object 'Point' has no attribute 'z'
The preceding code is very interesting. We have defined a class called Point with two class attributes, x and y. When we create an instance, p, you can see that we can print both x and y from the p namespace (p.x and p.y). What happens when we do that is Python doesn't find any x or y attributes on the instance, and therefore searches the class, and finds them there.
Then we give p its own x attribute by assigning p.x = 12. This behavior may appear a bit weird at first, but if you think about it, it's exactly the same as what happens in a function that declares x = 12 when there is a global x = 10 outside. We know that x = 12 won't affect the global one, and for classes and instances, it is exactly the same.
After assigning p.x = 12, when we print it, the search doesn't need to read the class attributes, because x is found on the instance, therefore we get 12 printed out. We also print Point.x, which refers to x in the class namespace.
And then, we delete x from the namespace of p, which means that, on the next line, when we print it again, Python will go again and search for it in the class, because it won't be found in the instance anymore.
The last three lines show you that assigning attributes to an instance doesn't mean that they will be found in the class. Instances get whatever is in the class, but the opposite is not true.
What do you think about putting the x and y coordinates as class attributes? Do you think it was a good idea? What if you added another instance of Point? Would that help to show why class attributes can be very useful?
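One way to explore those questions is to create a second instance and watch what each one sees. The sketch below reuses the Point class from the example; the second instance q and the use of vars() to inspect instance namespaces are additions for illustration only.

```python
class Point:
    x = 10
    y = 7


p = Point()
q = Point()

p.x = 12          # shadows the class attribute on p only
print(p.x, q.x)   # 12 10

Point.y = 0       # changing the class attribute is seen by every
print(p.y, q.y)   # 0 0  (neither instance has its own y)

# vars() shows what actually lives in each instance namespace
print(vars(p))    # {'x': 12}
print(vars(q))    # {}
```

Coordinates normally differ from point to point, so they are arguably better stored on each instance; class attributes suit values that really are common to all instances.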
Me, myself, and I – using the self variable
From within an instance method, we can refer to the instance by means of a special argument, called self by convention. self is always the first argument of an instance method. Let's examine this behavior together with how we can share, not just attributes, but methods with all instances:
# oop/class.self.py
class Square:
    side = 8

    def area(self):  # self is a reference to an instance
        return self.side ** 2

sq = Square()
print(sq.area())  # 64 (side is found on the class)
print(Square.area(sq))  # 64 (equivalent to sq.area())
sq.side = 10
print(sq.area())  # 100 (side is found on the instance)
Note how the area method is used by sq. The two calls, Square.area(sq) and sq.area(), are equivalent, and teach us how the mechanism works. Either you pass the instance to the method call (Square.area(sq)), which within the method will take the name self, or you can use a more comfortable syntax, sq.area(), and Python will translate that for you behind the scenes.
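To peek at that translation, you can inspect the object that sq.area evaluates to before it is called. This is only a sketch for the curious: it relies on the __self__ and __func__ attributes that Python 3 bound method objects expose, which the chapter itself does not cover.

```python
class Square:
    side = 8

    def area(self):
        return self.side ** 2


sq = Square()
bound = sq.area  # a bound method: the function plus the instance

print(bound.__self__ is sq)           # True (the instance it is bound to)
print(bound.__func__ is Square.area)  # True (the plain function on the class)
print(bound())                        # 64 (same as Square.area(sq))
```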
Let's look at a better example:
# oop/class.price.py
class Price:
    def final_price(self, vat, discount=0):
        """Returns price after applying vat and fixed discount."""
        return (self.net_price * (100 + vat) / 100) - discount

p1 = Price()
p1.net_price = 100
print(Price.final_price(p1, 20, 10))  # 110.0 (100 * 1.2 - 10)
print(p1.final_price(20, 10))  # equivalent
The preceding code shows you that nothing prevents us from using arguments when declaring methods. We can use the exact same syntax as we used with the function, but we need to remember that the first argument will always be the instance. We don't necessarily need to call it self, but it's the convention, and this is one of the few cases where it's very important to abide by it.
Initializing an instance
Have you noticed how, before calling p1.final_price(...), we had to assign net_price to p1? There is a better way to do it. In other languages, this would be called a constructor, but in Python, it's not. It is actually an initializer, since it works on an already-created instance, and therefore it's called __init__. It's a magic method, which is run right after the object is created. Python objects also have a __new__ method, which is the actual constructor. In practice, it's not so common to have to override it, though; it's a practice that is mostly used when coding metaclasses, which, as we mentioned, is a fairly advanced topic that we won't explore in the book:
# oop/class.init.py
class Rectangle:
    def __init__(self, side_a, side_b):
        self.side_a = side_a
        self.side_b = side_b

    def area(self):
        return self.side_a * self.side_b

r1 = Rectangle(10, 4)
print(r1.side_a, r1.side_b)  # 10 4
print(r1.area())  # 40
r2 = Rectangle(7, 3)
print(r2.area())  # 21
Things are finally starting to take shape. When an object is created, the __init__ method is automatically run for us. In this case, I coded it so that when we create an object (by calling the class name like a function), we pass arguments to the creation call, like we would on any regular function call. The way we pass parameters follows the signature of the __init__ method, and therefore, in the two creation statements, 10 and 7 will be side_a for r1 and r2, respectively, while 4 and 3 will be side_b. You can see that the call to area() from r1 and r2 reflects that they have different instance attributes. Setting up objects in this way is much nicer and more convenient.
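To see the constructor and the initializer as two separate steps, here is a small sketch. Overriding __new__ like this is done purely for demonstration, and the calls list is a hypothetical helper used to record the order of events; you would rarely write this in real code.

```python
calls = []  # hypothetical log of which special method ran, in order


class Rectangle:
    def __new__(cls, *args, **kwargs):
        calls.append('__new__')
        # object.__new__ creates the bare, uninitialized instance
        return super().__new__(cls)

    def __init__(self, side_a, side_b):
        calls.append('__init__')
        self.side_a = side_a
        self.side_b = side_b


r = Rectangle(10, 4)
print(calls)               # ['__new__', '__init__']
print(r.side_a, r.side_b)  # 10 4
```

The instance already exists by the time __init__ runs, which is why __init__ is described as an initializer rather than a constructor.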
OOP is about code reuse
By now it should be pretty clear: OOP is all about code reuse. We define a class, we create instances, and those instances use methods that are defined only in the class. They will behave differently according to how the instances have been set up by the initializer.
Inheritance and composition
But this is just half of the story, OOP is much more powerful. We have two main design constructs to exploit: inheritance and composition.
Inheritance means that two objects are related by means of an Is-A type of relationship. On the other hand, composition means that two objects are related by means of a Has-A type of relationship. It's all very easy to explain with an example:
# oop/class_inheritance.py
class Engine:
    def start(self):
        pass

    def stop(self):
        pass

class ElectricEngine(Engine):  # Is-A Engine
    pass

class V8Engine(Engine):  # Is-A Engine
    pass

class Car:
    engine_cls = Engine

    def __init__(self):
        self.engine = self.engine_cls()  # Has-A Engine

    def start(self):
        print(
            'Starting engine {0} for car {1}... Wroom, wroom!'
            .format(
                self.engine.__class__.__name__,
                self.__class__.__name__)
        )
        self.engine.start()

    def stop(self):
        self.engine.stop()

class RaceCar(Car):  # Is-A Car
    engine_cls = V8Engine

class CityCar(Car):  # Is-A Car
    engine_cls = ElectricEngine

class F1Car(RaceCar):  # Is-A RaceCar and also Is-A Car
    pass  # engine_cls same as parent

car = Car()
racecar = RaceCar()
citycar = CityCar()
f1car = F1Car()
cars = [car, racecar, citycar, f1car]
for car in cars:
    car.start()
""" Prints:
Starting engine Engine for car Car... Wroom, wroom!
Starting engine V8Engine for car RaceCar... Wroom, wroom!
Starting engine ElectricEngine for car CityCar... Wroom, wroom!
Starting engine V8Engine for car F1Car... Wroom, wroom!
"""
The preceding example shows you both the Is-A and Has-A types of relationships between objects. First of all, let's consider Engine. It's a simple class that has two methods, start and stop. We then define ElectricEngine and V8Engine, which both inherit from Engine. You can see that by the fact that when we define them, we put Engine within the parentheses after the class name.
This means that both ElectricEngine and V8Engine inherit attributes and methods from the Engine class, which is said to be their base class.
The same happens with cars. Car is a base class for both RaceCar and CityCar. RaceCar is also the base class for F1Car. Another way of saying this is that F1Car inherits from RaceCar, which inherits from Car. Therefore, F1Car Is-A RaceCar and RaceCar Is-A Car. Because of the transitive property, we can say that F1Car Is-A Car as well. CityCar too, Is-A Car.
When we define class A(B): pass, we say A is the child of B, and B is the parent of A. Parent and base class are synonyms, as are child and derived class. Also, we say that a class inherits from another class, or that it extends it.
This is the inheritance mechanism.
On the other hand, let's go back to the code. Each car class has a class attribute, engine_cls, which is a reference to the engine class we want to assign to each type of car. Car has a generic Engine, while the two race cars have a powerful V8 engine, and the city car has an electric one.
When a car is created, in the initializer method, __init__, we create an instance of whatever engine class is assigned to the car, and set it as the engine instance attribute.
It makes sense to have engine_cls shared among all class instances because it's quite likely that all instances of the same car class will have the same kind of engine. On the other hand, it wouldn't be good to have one single engine (an instance of any Engine class) as a class attribute, because we would be sharing one engine among all instances, which is incorrect.
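The difference is easy to demonstrate. In the sketch below, the running flag on Engine and the SharedEngineCar and OwnEngineCar class names are assumptions added for illustration; they are not part of class_inheritance.py. The first class stores an engine instance on the class, the second stores an engine class and creates one engine per car.

```python
class Engine:
    def __init__(self):
        self.running = False  # hypothetical state flag for the demo

    def start(self):
        self.running = True


class SharedEngineCar:
    engine = Engine()  # ONE engine instance shared by every car


class OwnEngineCar:
    engine_cls = Engine  # the class is shared...

    def __init__(self):
        self.engine = self.engine_cls()  # ...but each car gets its own engine


a, b = SharedEngineCar(), SharedEngineCar()
a.engine.start()
print(b.engine.running)  # True (b's engine started too: same object)

c, d = OwnEngineCar(), OwnEngineCar()
c.engine.start()
print(d.engine.running)  # False (d has a separate engine)
```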
The type of relationship between a car and its engine is a Has-A type. A car Has-A engine. This is called composition, and reflects the fact that objects can be made of many other objects. A car Has-A engine, gears, wheels, a frame, doors, seats, and so on.
When designing OOP code, it is of vital importance to describe objects in this way so that we can use inheritance and composition correctly to structure our code in the best way.
Notice how I had to avoid having dots in the class_inheritance.py script name, as dots in module names make imports difficult. Most modules in the source code of the book are meant to be run as standalone scripts, therefore I chose to add dots to enhance readability when possible, but in general, you want to avoid dots in your module names.
Before we leave this paragraph, let's check whether I told you the truth with another example:
# oop/class.issubclass.isinstance.py
from class_inheritance import Car, RaceCar, F1Car

car = Car()
racecar = RaceCar()
f1car = F1Car()
cars = [(car, 'car'), (racecar, 'racecar'), (f1car, 'f1car')]
car_classes = [Car, RaceCar, F1Car]
for car, car_name in cars:
    for class_ in car_classes:
        belongs = isinstance(car, class_)
        msg = 'is a' if belongs else 'is not a'
        print(car_name, msg, class_.__name__)
""" Prints:
car is a Car
car is not a RaceCar
car is not a F1Car
racecar is a Car
racecar is a RaceCar
racecar is not a F1Car
f1car is a Car
f1car is a RaceCar
f1car is a F1Car
"""
As you can see, car is just an instance of Car, while racecar is an instance of RaceCar (and of Car, by extension) and f1car is an instance of F1Car (and of both RaceCar and Car, by extension). A banana is an instance of Banana. But, also, it is a Fruit. Also, it is Food, right? This is the same concept. To check whether an object is an instance of a class, use the isinstance function. It is recommended over sheer type comparison: (type(object) == Class).
Notice I have left out the lines that are printed when class_inheritance is imported, since its top-level code runs on import. We saw them in the previous example.
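The reason isinstance is preferred over comparing types becomes clear as soon as inheritance is involved. Here is a minimal, self-contained sketch; the two classes are redefined locally so the snippet runs on its own, without importing class_inheritance.

```python
class Car:
    pass


class RaceCar(Car):
    pass


racecar = RaceCar()

# An exact type comparison ignores inheritance...
print(type(racecar) == Car)      # False
# ...while isinstance honors the Is-A relationship
print(isinstance(racecar, Car))  # True

# isinstance also accepts a tuple of classes to check against
print(isinstance(racecar, (int, RaceCar)))  # True
```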
Let's also check inheritance – same setup, different logic in the for loops:
# oop/class.issubclass.isinstance.py
for class1 in car_classes:
    for class2 in car_classes:
        is_subclass = issubclass(class1, class2)
        msg = '{0} a subclass of'.format(
            'is' if is_subclass else 'is not')
        print(class1.__name__, msg, class2.__name__)
""" Prints:
Car is a subclass of Car
Car is not a subclass of RaceCar
Car is not a subclass of F1Car
RaceCar is a subclass of Car
RaceCar is a subclass of RaceCar
RaceCar is not a subclass of F1Car
F1Car is a subclass of Car
F1Car is a subclass of RaceCar
F1Car is a subclass of F1Car
"""
Interestingly, we learn that a class is a subclass of itself. Check the output of the preceding example to see that it matches the explanation I provided.
One thing to notice about conventions is that class names are always written using CapWords, which means ThisWayIsCorrect, as opposed to functions and methods, which are written this_way_is_correct. Also, when in the code, you want to use a name that is a Python-reserved keyword or a built-in function or class, the convention is to add a trailing underscore to the name. In the first for loop example, I'm looping through the class names using for class_ in ..., because class is a reserved word. But you already knew all this because you have thoroughly studied PEP8, right?
To help you picture the difference between Is-A and Has-A, take a look at the following diagram:
Accessing a base class
We've already seen class declarations, such as class ClassA: pass and class ClassB(BaseClassName): pass. When we don't specify a base class explicitly, Python will set the special object class as the base class for the one we're defining. Ultimately, all classes derive from object. Note that, if you don't specify a base class, the parentheses are optional.
Therefore, writing class A: pass or class A(): pass or class A(object): pass is exactly the same thing. The object class is a special class in that it has the methods that are common to all Python classes, and it doesn't allow you to set any attributes on it.
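Both claims can be checked quickly. The sketch below is only an illustration: it uses the __bases__ attribute to show the direct base classes of each declaration, and then tries to set an attribute on a bare object() instance, which fails, while the same assignment on an instance of a user-defined class works. The assigned flag is a hypothetical name used to record the outcome.

```python
class A:
    pass


class B():
    pass


class C(object):
    pass


# All three declarations end up with object as their only base class
print(A.__bases__ == B.__bases__ == C.__bases__ == (object,))  # True

plain = object()
try:
    plain.name = 'x'   # bare object instances accept no attributes
    assigned = True
except AttributeError:
    assigned = False
print(assigned)  # False

a = A()
a.name = 'x'     # instances of our own classes do
print(a.name)    # x
```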
Let's see how we can access a base class from within a class:
# oop/super.duplication.py
class Book:
    def __init__(self, title, publisher, pages):
        self.title = title
        self.publisher = publisher
        self.pages = pages

class Ebook(Book):
    def __init__(self, title, publisher, pages, format_):
        self.title = title
        self.publisher = publisher
        self.pages = pages
        self.format_ = format_
Take a look at the preceding code. Three of the input parameters are duplicated in Ebook. This is quite bad practice because we now have two sets of instructions that are doing the same thing. Moreover, any change in the signature of Book.__init__ will not be reflected in Ebook. We know that Ebook Is-A Book, and therefore we would probably want changes to be reflected in the child classes.
Let's see one way to fix this issue:
# oop/super.explicit.py
class Book:
    def __init__(self, title, publisher, pages):
        self.title = title
        self.publisher = publisher
        self.pages = pages

class Ebook(Book):
    def __init__(self, title, publisher, pages, format_):
        Book.__init__(self, title, publisher, pages)
        self.format_ = format_

ebook = Ebook(
    'Learn Python Programming', 'Packt Publishing', 500, 'PDF')
print(ebook.title)  # Learn Python Programming
print(ebook.publisher)  # Packt Publishing
print(ebook.pages)  # 500
print(ebook.format_)  # PDF
Now, that's better. We have removed that nasty duplication. Basically, we tell Python to call the __init__ method of the Book class, and we feed self to the call, making sure that we bind that call to the present instance.
If we modify the logic within the __init__ method of Book, we don't need to touch Ebook; it will auto-adapt to the change.
This approach is good, but we can still do a bit better. Say that we change the name of Book to Liber, because we've fallen in love with Latin. We have to change the __init__ method of Ebook to reflect the change. This can be avoided by using super:
# oop/super.implicit.py
class Book:
    def __init__(self, title, publisher, pages):
        self.title = title
        self.publisher = publisher
        self.pages = pages

class Ebook(Book):
    def __init__(self, title, publisher, pages, format_):
        super().__init__(title, publisher, pages)
        # Another way to do the same thing is:
        # super(Ebook, self).__init__(title, publisher, pages)
        self.format_ = format_

ebook = Ebook(
    'Learn Python Programming', 'Packt Publishing', 500, 'PDF')
print(ebook.title)  # Learn Python Programming
print(ebook.publisher)  # Packt Publishing
print(ebook.pages)  # 500
print(ebook.format_)  # PDF
super is a function that returns a proxy object that delegates method calls to a parent or sibling class. In this case, it will delegate that call to __init__ to the Book class, and the beauty of this method is that now we're even free to change Book to Liber without having to touch the logic in the __init__ method of Ebook.
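super is not limited to __init__; it works for any method you override. As a sketch, the following adds a hypothetical describe method to a trimmed-down Book (the publisher argument is dropped here for brevity), and Ebook extends the parent's result instead of rewriting it.

```python
class Book:
    def __init__(self, title, pages):
        self.title = title
        self.pages = pages

    def describe(self):  # hypothetical method, not in the book's example
        return '{} ({} pages)'.format(self.title, self.pages)


class Ebook(Book):
    def __init__(self, title, pages, format_):
        super().__init__(title, pages)
        self.format_ = format_

    def describe(self):
        # Reuse the parent's logic, then add the ebook-specific part
        return super().describe() + ' [' + self.format_ + ']'


ebook = Ebook('Learn Python Programming', 500, 'PDF')
print(ebook.describe())  # Learn Python Programming (500 pages) [PDF]
```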
Now that we know how to access a base class from a child, let's explore Python's multiple inheritance.
Multiple inheritance
Apart from composing a class using more than one base class, what is of interest here is how an attribute search is performed. Take a look at the following diagram:
As you can see, Shape and Plotter act as base classes for all the others. Polygon inherits directly from them, RegularPolygon inherits from Polygon, and both RegularHexagon and Square inherit from RegularPolygon. Note also that Shape and Plotter implicitly inherit from object, therefore we have what is called a diamond or, in simpler terms, more than one path to reach a base class. We'll see why this matters in a few moments. Let's translate it into some simple code:
# oop/multiple.inheritance.py
class Shape:
    geometric_type = 'Generic Shape'

    def area(self):  # This acts as placeholder for the interface
        raise NotImplementedError

    def get_geometric_type(self):
        return self.geometric_type

class Plotter:
    def plot(self, ratio, topleft):
        # Imagine some nice plotting logic here...
        print('Plotting at {}, ratio {}.'.format(
            topleft, ratio))

class Polygon(Shape, Plotter):  # base class for polygons
    geometric_type = 'Polygon'

class RegularPolygon(Polygon):  # Is-A Polygon
    geometric_type = 'Regular Polygon'

    def __init__(self, side):
        self.side = side

class RegularHexagon(RegularPolygon):  # Is-A RegularPolygon
    geometric_type = 'RegularHexagon'

    def area(self):
        return 1.5 * (3 ** .5 * self.side ** 2)

class Square(RegularPolygon):  # Is-A RegularPolygon
    geometric_type = 'Square'

    def area(self):
        return self.side * self.side

hexagon = RegularHexagon(10)
print(hexagon.area())  # 259.8076211353316
print(hexagon.get_geometric_type())  # RegularHexagon
hexagon.plot(0.8, (75, 77))  # Plotting at (75, 77), ratio 0.8.
square = Square(12)
print(square.area())  # 144
print(square.get_geometric_type())  # Square
square.plot(0.93, (74, 75))  # Plotting at (74, 75), ratio 0.93.
Take a look at the preceding code: the Shape class has one attribute, geometric_type, and two methods: area and get_geometric_type. It's quite common to use base classes (such as Shape, in our example) to define an interface – methods for which children must provide an implementation. There are different and better ways to do this, but I want to keep this example as simple as possible.
We also have the Plotter class, which adds the plot method, thereby providing plotting capabilities for any class that inherits from it. Of course, the plot implementation is just a dummy print in this example. The first interesting class is Polygon, which inherits from both Shape and Plotter.
There are many types of polygons, one of which is the regular one, which is both equiangular (all angles are equal) and equilateral (all sides are equal), so we create the RegularPolygon class that inherits from Polygon. For a regular polygon, where all sides are equal, we can implement a simple __init__ method on RegularPolygon, which takes the length of the side. Finally, we create the RegularHexagon and Square classes, which both inherit from RegularPolygon.
This structure is quite long, but hopefully gives you an idea of how to specialize the classification of your objects when you design the code.
Now, please take a look at the last eight lines. Note that when I call the area method on hexagon and square, I get the correct area for both. This is because they both provide the correct implementation logic for it. Also, I can call get_geometric_type on both of them, even though it is not defined on their classes, and Python has to go all the way up to Shape to find an implementation for it. Note that, even though the implementation is provided in the Shape class, the self.geometric_type used for the return value is correctly taken from the caller instance.
The plot method calls are also interesting, and show you how you can enrich your objects with capabilities they wouldn't otherwise have. This technique is very popular in web frameworks such as Django (which we'll explore in Chapter 14, Web Development), which provides special classes called mixins, whose capabilities you can just use out of the box. All you have to do is to define the desired mixin as one of the base classes for your own, and that's it.
Multiple inheritance is powerful, but can also get really messy, so we need to make sure we understand what happens when we use it.
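Going back to the Shape.area placeholder that raises NotImplementedError: one alternative way to declare such an interface, offered here as an assumption about what a stricter approach could look like rather than as the book's own recommendation, is the standard library abc module. With it, a class that leaves an abstract method unimplemented cannot even be instantiated. The instantiated flag is a hypothetical name used to record the outcome.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Subclasses must provide an implementation."""


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


print(Square(12).area())  # 144

try:
    Shape()  # abstract classes cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False
print(instantiated)  # False
```

The trade-off is that the error surfaces at instantiation time instead of at the first call to area, which tends to make a missing implementation easier to spot.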
Method resolution order
By now, we know that when you ask for someobject.attribute and attribute is not found on that object, Python starts searching in the class that someobject was created from. If it's not there either, Python searches up the inheritance chain until either attribute is found or the object class is reached. This is quite simple to understand if the inheritance chain is only composed of single-inheritance steps, which means that classes have only one parent. However, when multiple inheritance is involved, there are cases when it's not straightforward to predict what will be the next class that will be searched for if an attribute is not found.
Python provides a way to always know the order in which classes are searched on attribute lookup: the Method Resolution Order (MRO).
The MRO is the order in which base classes are searched for a member during lookup. From version 2.3, Python uses an algorithm called C3, which guarantees monotonicity. In Python 2.2, new-style classes were introduced. The way you write a new-style class in Python 2.* is to define it with an explicit object base class. Classic classes were not explicitly inheriting from object and have been removed in Python 3. One of the differences between classic and new-style classes in Python 2.* is that new-style classes are searched with the new MRO.
With regards to the previous example, let's see the MRO for the Square class:
# oop/multiple.inheritance.py
print(square.__class__.__mro__)
# prints:
# (<class '__main__.Square'>, <class '__main__.RegularPolygon'>,
# <class '__main__.Polygon'>, <class '__main__.Shape'>,
# <class '__main__.Plotter'>, <class 'object'>)
To get to the MRO of a class, we can go from the instance to its __class__ attribute, and from that to its __mro__ attribute. Alternatively, we could have called Square.__mro__, or Square.mro() directly, but if you have to do it dynamically, it's more likely you will have an object than a class.
Note that the only point of doubt is the bisection after Polygon, where the inheritance chain breaks into two ways: one leads to Shape and the other to Plotter. We know by scanning the MRO for the Square class that Shape is searched before Plotter.
Why is this important? Well, consider the following code:
# oop/mro.simple.py
class A:
    label = 'a'

class B(A):
    label = 'b'

class C(A):
    label = 'c'

class D(B, C):
    pass

d = D()
print(d.label)  # Hypothetically this could be either 'b' or 'c'
Both B and C inherit from A, and D inherits from both B and C. This means that the lookup for the label attribute can reach the top (A) through either B or C. According to which is reached first, we get a different result.
So, in the preceding example, we get 'b', which is what we were expecting, since B is the leftmost one among the base classes of D. But what happens if I remove the label attribute from B? This would be a confusing situation: will the algorithm go all the way up to A or will it get to C first? Let's find out:
# oop/mro.py
class A:
    label = 'a'

class B(A):
    pass  # was: label = 'b'

class C(A):
    label = 'c'

class D(B, C):
    pass

d = D()
print(d.label)  # 'c'
print(d.__class__.mro())  # notice another way to get the MRO
# prints:
# [<class '__main__.D'>, <class '__main__.B'>,
# <class '__main__.C'>, <class '__main__.A'>, <class 'object'>]
So, we learn that the MRO is D-B-C-A-object, which means when we ask for d.label, we get 'c', which is correct.
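The same order governs super(), which is why it can delegate to a sibling class and not only to a parent. In the sketch below, the who method is a hypothetical addition to the A, B, C, D diamond: each class adds its own name and then delegates with super(). Note that B's super() call lands on C, its sibling, because C comes next in the MRO of D.

```python
class A:
    def who(self):  # hypothetical method used to trace the lookup
        return ['A']


class B(A):
    def who(self):
        return ['B'] + super().who()


class C(A):
    def who(self):
        return ['C'] + super().who()


class D(B, C):
    def who(self):
        return ['D'] + super().who()


print(D().who())  # ['D', 'B', 'C', 'A']
print([cls.__name__ for cls in D.__mro__])
# ['D', 'B', 'C', 'A', 'object']
```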
In day-to-day programming, it is not common to have to deal with the MRO, but the first time you fight against some mixin from a framework, I promise you'll be glad I spent a paragraph explaining it.
Class and static methods
So far, we have coded classes with attributes in the form of data and instance methods, but there are two other types of methods that we can place inside a class: static methods and class methods.
Static methods
As you may recall, when you create a class object, Python assigns a name to it. That name acts as a namespace, and sometimes it makes sense to group functionalities under it. Static methods are perfect for this use case since, unlike instance methods, they are not passed any special argument. Let's look at an example of an imaginary StringUtil class:
# oop/static.methods.py
class StringUtil:
    @staticmethod
    def is_palindrome(s, case_insensitive=True):
        # we allow only letters and numbers
        s = ''.join(c for c in s if c.isalnum())  # Study this!
        # For case insensitive comparison, we lower-case s
        if case_insensitive:
            s = s.lower()
        for c in range(len(s) // 2):
            if s[c] != s[-c - 1]:
                return False
        return True

    @staticmethod
    def get_unique_words(sentence):
        return set(sentence.split())

print(StringUtil.is_palindrome(
    'Radar', case_insensitive=False))  # False: Case Sensitive
print(StringUtil.is_palindrome('A nut for a jar of tuna'))  # True
print(StringUtil.is_palindrome('Never Odd, Or Even!'))  # True
print(StringUtil.is_palindrome(
    'In Girum Imus Nocte Et Consumimur Igni')  # Latin! Show-off!
)  # True
print(StringUtil.get_unique_words(
    'I love palindromes. I really really love them!'))
# {'them!', 'really', 'palindromes.', 'I', 'love'}
Theprecedingcodeisquiteinteresting.Firstofall,welearnthatstaticmethodsarecreatedbysimplyapplyingthestaticmethoddecoratortothem.Youcanseethattheyaren'tpassedanyspecialargumentso,apartfromthedecoration,theyreallyjustlooklikefunctions.
Wehaveaclass,StringUtil,thatactsasacontainerforfunctions.Anotherapproachwouldbetohaveaseparatemodulewithfunctionsinside.It'sreallyamatterofpreferencemostofthetime.
Thelogicinsideis_palindromeshouldbestraightforwardforyoutounderstandbynow,but,justincase,let'sgothroughit.First,weremoveallcharactersfromsthatareneitherlettersnornumbers.Inordertodothis,weusethejoinmethodofastringobject(anemptystringobject,inthiscase).Bycallingjoinonanemptystring,theresultisthatallelementsintheiterableyoupasstojoinwillbeconcatenatedtogether.Wefeedjoinageneratorexpressionthatsaystotakeanycharacterfromsifthecharacteriseitheralphanumericoranumber.Thisisbecause,inpalindromesentences,wewanttodiscardanythingthatisnotacharacteroranumber.
We then lowercase s if case_insensitive is True, and then we proceed to check whether it is a palindrome. In order to do this, we compare the first and last characters, then the second and the second to last, and so on. If at any point we find a difference, it means the string isn't a palindrome and therefore we can return False. On the other hand, if we exit the for loop normally, it means no differences were found, and we can therefore say the string is a palindrome.
Notice that this code works correctly regardless of the length of the string; that is, whether the length is odd or even. len(s) // 2 reaches half of s, and if s has an odd number of characters, the middle one won't be checked (such as in RaDaR, where D is not checked), but we don't care; it would be compared with itself, so it would always pass that check.
get_unique_words is much simpler: it just returns a set to which we feed a list with the words from a sentence. The set class removes any duplication for us, so we don't need to do anything else.
The StringUtil class provides us a nice container namespace for methods that are meant to work on strings. I could have coded a similar example with a MathUtil class, and some static methods to work on numbers, but I wanted to show you something different.
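To make the comparison with the module-based approach a bit more concrete, here is a minimal sketch of the palindrome check written as a plain function. It is not taken from the book's source code: the standalone is_palindrome function is an assumption made for illustration, and it swaps the explicit loop for a comparison against a reversed slice.

```python
def is_palindrome(s, case_insensitive=True):
    """Return True if s reads the same in both directions."""
    # keep only letters and digits, as StringUtil.is_palindrome does
    s = ''.join(c for c in s if c.isalnum())
    if case_insensitive:
        s = s.lower()
    # a palindrome is equal to its own reversed copy
    return s == s[::-1]


print(is_palindrome('Radar', case_insensitive=False))  # False
print(is_palindrome('A nut for a jar of tuna'))  # True
print(is_palindrome(''))  # True
```

Both versions should agree on every input; the sliced one builds a reversed copy of the string, which is usually a fair price for readability on short inputs.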
Class methods
Class methods are slightly different from static methods in that, like instance methods, they also take a special first argument, but in this case, it is the class object itself. A very common use case for coding class methods is to provide factory capability to a class. Let's see an example:
# oop/class.methods.factory.py
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_tuple(cls, coords):  # cls is Point
        return cls(*coords)

    @classmethod
    def from_point(cls, point):  # cls is Point
        return cls(point.x, point.y)

p = Point.from_tuple((3, 7))
print(p.x, p.y)  # 3 7
q = Point.from_point(p)
print(q.x, q.y)  # 3 7
In the preceding code, I showed you how to use a class method to create a factory for the class. In this case, we want to create a Point instance by passing both coordinates (regular creation p = Point(3, 7)), but we also want to be able to create an instance by passing a tuple (Point.from_tuple) or another instance (Point.from_point).
Within the two class methods, the cls argument refers to the Point class. As with the instance method, which takes self as the first argument, the class method takes a cls argument. Both self and cls are named after a convention that you are not forced to follow but are strongly encouraged to respect. No Python coder would change these names: the convention is so strong that parsers, linters, and any tool that automatically does something with your code expect it, so it's much better to stick to it.
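One practical consequence of receiving cls, rather than naming Point directly, shows up with inheritance. The following is a minimal sketch; the LabeledPoint subclass is hypothetical and exists only to illustrate the point.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_tuple(cls, coords):
        # cls is whichever class the method was called on
        return cls(*coords)


class LabeledPoint(Point):  # hypothetical subclass, for illustration only
    label = 'unnamed'


p = Point.from_tuple((3, 7))
lp = LabeledPoint.from_tuple((3, 7))
print(type(p).__name__)   # Point
print(type(lp).__name__)  # LabeledPoint
```

Because the factory calls cls(...) instead of Point(...), the subclass inherits a factory that produces instances of the subclass without any extra code.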
Class and static methods play well together. Static methods are actually quite helpful in breaking up the logic of a class method to improve its layout. Let's see an example by refactoring the StringUtil class:
# oop/class.methods.split.py
class StringUtil:

    @classmethod
    def is_palindrome(cls, s, case_insensitive=True):
        s = cls._strip_string(s)
        # For case insensitive comparison, we lower-case s
        if case_insensitive:
            s = s.lower()
        return cls._is_palindrome(s)

    @staticmethod
    def _strip_string(s):
        return ''.join(c for c in s if c.isalnum())

    @staticmethod
    def _is_palindrome(s):
        for c in range(len(s) // 2):
            if s[c] != s[-c - 1]:
                return False
        return True

    @staticmethod
    def get_unique_words(sentence):
        return set(sentence.split())

print(StringUtil.is_palindrome('A nut for a jar of tuna'))  # True
print(StringUtil.is_palindrome('A nut for a jar of beans'))  # False
Compare this code with the previous version. First of all, note that even though is_palindrome is now a class method, we call it in the same way we were calling it when it was a static one. The reason why we changed it to a class method is that after factoring out a couple of pieces of logic (_strip_string and _is_palindrome), we need to get a reference to them, and if we have no cls in our method, the only option would be to call them like this: StringUtil._strip_string(...) and StringUtil._is_palindrome(...). That is not good practice, because we would hardcode the class name in the is_palindrome method, thereby putting ourselves in the position of having to modify it whenever we want to change the class name. Using cls stands in for the class name, which means our code won't need any amendments.
Notice how the new logic reads much better than the previous version. Moreover, notice that, by naming the factored-out methods with a leading underscore, I am hinting that those methods are not supposed to be called from outside the class, but this will be the subject of the next section.
Private methods and name mangling
If you have any background with languages like Java, C#, or C++, then you know they allow the programmer to assign a privacy status to attributes (both data and methods). Each language has its own slightly different flavor for this, but the gist is that public attributes are accessible from any point in the code, while private ones are accessible only within the scope they are defined in.
In Python, there is no such thing. Everything is public; therefore, we rely on conventions and on a mechanism called name mangling.
The convention is as follows: if an attribute's name has no leading underscores, it is considered public. This means you can access it and modify it freely. When the name has one leading underscore, the attribute is considered private, which means it's probably meant to be used internally and you should not use it or modify it from the outside. Very common use cases for private attributes are helper methods that are supposed to be used by public ones (possibly in call chains in conjunction with other methods), and internal data, such as scaling factors, or any other data that ideally we would put in a constant (a variable that cannot change, but, surprise, surprise, Python doesn't have those either).
This characteristic usually scares people from other backgrounds off; they feel threatened by the lack of privacy. To be honest, in my whole professional experience with Python, I've never heard anyone screaming "oh my God, we have a terrible bug because Python lacks private attributes!" Not once, I swear.
That said, the call for privacy actually makes sense because without it, you risk introducing bugs into your code for real. Let me show you what I mean:
# oop/private.attrs.py
class A:
    def __init__(self, factor):
        self._factor = factor

    def op1(self):
        print('Op1 with factor {}...'.format(self._factor))

class B(A):
    def op2(self, factor):
        self._factor = factor
        print('Op2 with factor {}...'.format(self._factor))

obj = B(100)
obj.op1()  # Op1 with factor 100...
obj.op2(42)  # Op2 with factor 42...
obj.op1()  # Op1 with factor 42... <- This is BAD
In the preceding code, we have an attribute called _factor, and let's pretend it's so important that it isn't modified at runtime after the instance is created, because op1 depends on it to function correctly. We've named it with a leading underscore, but the issue here is that when we call obj.op2(42), we modify it, and this is reflected in subsequent calls to op1.
Let's fix this undesired behavior by adding another leading underscore:
# oop/private.attrs.fixed.py
class A:
    def __init__(self, factor):
        self.__factor = factor

    def op1(self):
        print('Op1 with factor {}...'.format(self.__factor))

class B(A):
    def op2(self, factor):
        self.__factor = factor
        print('Op2 with factor {}...'.format(self.__factor))

obj = B(100)
obj.op1()  # Op1 with factor 100...
obj.op2(42)  # Op2 with factor 42...
obj.op1()  # Op1 with factor 100... <- Wohoo! Now it's GOOD!
Wow, look at that! Now it's working as desired. Python is kind of magic and in this case, what is happening is that the name-mangling mechanism has kicked in.
Name mangling means that any attribute name that has at least two leading underscores and at most one trailing underscore, such as __my_attr, is replaced with a name that includes an underscore and the class name before the actual name, such as _ClassName__my_attr.
This means that when you inherit from a class, the mangling mechanism gives your private attribute two different names in the base and child classes so that name collision is avoided. Every class and instance object stores references to their attributes in a special attribute called __dict__, so let's inspect obj.__dict__ to see name mangling in action:
# oop/private.attrs.py
print(obj.__dict__.keys())
# dict_keys(['_factor'])
This is the _factor attribute that we find in the problematic version of this example. But look at the one that is using __factor:
# oop/private.attrs.fixed.py
print(obj.__dict__.keys())
# dict_keys(['_A__factor', '_B__factor'])
See? obj has two attributes now, _A__factor (mangled within the A class), and _B__factor (mangled within the B class). This is the mechanism that ensures that when you do obj.__factor = 42, __factor in A isn't changed, because you're actually touching _B__factor, which leaves _A__factor safe and sound.
If you're designing a library with classes that are meant to be used and extended by other developers, you will need to keep this in mind in order to avoid the unintentional overriding of your attributes. Bugs like these can be pretty subtle and hard to spot.
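It is worth stressing that name mangling only renames the attribute; it does not lock it away. The short sketch below, written for illustration and not part of the book's source code, shows that the mangled name remains reachable from outside the class.

```python
class A:
    def __init__(self, factor):
        self.__factor = factor  # stored on the instance as _A__factor

    def op1(self):
        return self.__factor


obj = A(100)
print(obj.__dict__)  # {'_A__factor': 100}
print(hasattr(obj, '__factor'))  # False: no mangling outside the class body
print(obj._A__factor)  # 100

obj._A__factor = 42  # possible, but it defeats the purpose
print(obj.op1())  # 42
```

In other words, two leading underscores protect you from accidental collisions in subclasses, not from a determined caller.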
The property decorator
Another thing that would be a crime not to mention is the property decorator. Imagine that you have an age attribute in a Person class and at some point you want to make sure that when you change its value, you're also checking that age is within a proper range, such as [18, 99]. You can write accessor methods, such as get_age() and set_age(...) (also called getters and setters), and put the logic there. get_age() will most likely just return age, while set_age(...) will also do the range check. The problem is that you may already have a lot of code accessing the age attribute directly, which means you're now up to some tedious refactoring. Languages like Java overcome this problem by using the accessor pattern basically by default. Many Java Integrated Development Environments (IDEs) autocomplete an attribute declaration by writing getter and setter accessor method stubs for you on the fly.
Python is smarter, and does this with the property decorator. When you decorate a method with property, you can use the name of the method as if it were a data attribute. Because of this, it's always best to refrain from putting logic that would take a while to complete in such methods because, by accessing them as attributes, we are not expecting to wait.
Let's look at an example:
# oop/property.py
class Person:
    def __init__(self, age):
        self.age = age  # anyone can modify this freely

class PersonWithAccessors:
    def __init__(self, age):
        self._age = age

    def get_age(self):
        return self._age

    def set_age(self, age):
        if 18 <= age <= 99:
            self._age = age
        else:
            raise ValueError('Age must be within [18, 99]')

class PersonPythonic:
    def __init__(self, age):
        self._age = age

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        if 18 <= age <= 99:
            self._age = age
        else:
            raise ValueError('Age must be within [18, 99]')

person = PersonPythonic(39)
print(person.age)  # 39 - Notice we access as data attribute
person.age = 42  # Notice we access as data attribute
print(person.age)  # 42
person.age = 100  # ValueError: Age must be within [18, 99]
The Person class may be the first version we write. Then we realize we need to put the range logic in place so, with another language, we would have to rewrite Person as the PersonWithAccessors class, and refactor all the code that was using Person.age. In Python, we rewrite Person as PersonPythonic (you normally wouldn't change the name, of course) so that the age is stored in a private _age variable, and we define property getters and setters using that decoration, which allows us to keep using the person instances as we were before. A getter is a method that is called when we access an attribute for reading. On the other hand, a setter is a method that is called when we access an attribute to write it. In other languages, such as Java, it's customary to define them as get_age() and set_age(int value), but I find the Python syntax much neater. It allows you to start writing simple code and refactor later on, only when you need it; there is no need to pollute your code with accessors only because they may be helpful in the future.
The property decorator also allows for read-only data (no setter) and for special actions when the attribute is deleted. Please refer to the official documentation to dig deeper.
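As a quick taste of those two features, here is a minimal sketch. The Person class below is a hypothetical variation written for this illustration (it is not the one from oop/property.py): name has a getter only, which makes it read-only, while age defines a deleter that runs when the attribute is deleted.

```python
class Person:
    def __init__(self, name, age):
        self._name = name
        self._age = age

    @property
    def name(self):
        # no setter is defined, so name is read-only
        return self._name

    @property
    def age(self):
        return self._age

    @age.deleter
    def age(self):
        # special action performed on `del person.age`
        self._age = None


person = Person('Alice', 39)
try:
    person.name = 'Bob'
except AttributeError as err:
    print('Cannot set name:', type(err).__name__)

del person.age
print(person.age)  # None
```

Assigning to a property that has no setter raises AttributeError, so read-only data is enforced at the point of assignment rather than silently ignored.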
Operator overloading
I find Python's approach to operator overloading to be brilliant. To overload an operator means to give it a meaning according to the context in which it is used. For example, the + operator means addition when we deal with numbers, but concatenation when we deal with sequences.
In Python, when you use operators, you're most likely calling the special methods of some objects behind the scenes. For example, the a[k] call roughly translates to type(a).__getitem__(a, k).
As an example, let's create a class that stores a string and evaluates to True if '42' is part of that string, and False otherwise. Also, let's give the class a length property that corresponds to that of the stored string:
# oop/operator.overloading.py
class Weird:
    def __init__(self, s):
        self._s = s

    def __len__(self):
        return len(self._s)

    def __bool__(self):
        return '42' in self._s

weird = Weird('Hello! I am 9 years old!')
print(len(weird))  # 24
print(bool(weird))  # False

weird2 = Weird('Hello! I am 42 years old!')
print(len(weird2))  # 25
print(bool(weird2))  # True
That was fun, wasn't it? For the complete list of magic methods that you can override in order to provide your custom implementation of operators for your classes, please refer to the Python data model in the official documentation.
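To connect this with the + and a[k] examples mentioned at the start of the section, here is a minimal sketch of a class that overloads both. The Vector class is hypothetical and written only for illustration.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # called for: self + other
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # called for: self == other
        return (self.x, self.y) == (other.x, other.y)

    def __getitem__(self, k):
        # called for: self[k]
        return (self.x, self.y)[k]

    def __repr__(self):
        return f'Vector({self.x}, {self.y})'


v = Vector(1, 2) + Vector(3, 4)
print(v)  # Vector(4, 6)
print(v[0], type(v).__getitem__(v, 1))  # 4 6
```

The last line spells out the translation described earlier: v[1] and type(v).__getitem__(v, 1) end up calling the same method.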
Polymorphism – a brief overview
The word polymorphism comes from the Greek polys (many, much) and morphē (form, shape), and its meaning is the provision of a single interface for entities of different types.
In our car example, we call engine.start(), regardless of what kind of engine it is. As long as it exposes the start method, we can call it. That's polymorphism in action.
In other languages, such as Java, in order to give a function the ability to accept different types and call a method on them, those types need to be coded in such a way that they share an interface. In this way, the compiler knows that the method will be available regardless of the type of the object the function is fed (as long as it extends the proper interface, of course).
In Python, things are different. Polymorphism is implicit: nothing prevents you from calling a method on an object; therefore, technically, there is no need to implement interfaces or other patterns.
There is a special kind of polymorphism called ad hoc polymorphism, which is what we saw in the previous section: operator overloading. This is the ability of an operator to change shape, according to the type of data it is fed.
Polymorphism also allows Python programmers to simply use the interface (methods and properties) exposed from an object rather than having to check which class it was instantiated from. This allows the code to be more compact and feel more natural.
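A minimal sketch of this idea follows. The two engine classes and the start_all function are hypothetical names made up for this illustration; they only loosely echo the car example and share no base class or interface.

```python
class ElectricEngine:
    def start(self):
        return 'Electric engine humming'


class V8Engine:
    def start(self):
        return 'V8 engine roaring'


def start_all(engines):
    # no isinstance checks: anything exposing a start() method works
    return [engine.start() for engine in engines]


results = start_all([ElectricEngine(), V8Engine()])
print(results)
# ['Electric engine humming', 'V8 engine roaring']
```

If an object without a start method were passed in, the call would fail at runtime with an AttributeError; that is the trade-off for not declaring interfaces up front.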
I cannot spend too much time on polymorphism, but I encourage you to check it out by yourself; it will expand your understanding of OOP. Good luck!
Data classes
Before we leave the OOP realm, there is one last thing I want to mention: data classes. Introduced in Python 3.7 by PEP 557 (https://www.python.org/dev/peps/pep-0557/), they can be described as "mutable named tuples with defaults". Let's dive into an example:
# oop/dataclass.py
from dataclasses import dataclass

@dataclass
class Body:
    '''Class to represent a physical body.'''
    name: str
    mass: float = 0.  # Kg
    speed: float = 1.  # m/s

    def kinetic_energy(self) -> float:
        return (self.mass * self.speed ** 2) / 2

body = Body('Ball', 19, 3.1415)
print(body.kinetic_energy())  # 93.755711375 Joule
print(body)  # Body(name='Ball', mass=19, speed=3.1415)
In the previous code, I have created a class to represent a physical body, with one method that allows me to calculate its kinetic energy (using the renowned formula Ek = ½mv²). Notice that name is supposed to be a string, while mass and speed are both floats, and both are given a default value. It's also interesting that I didn't have to write any __init__ method; it's done for me by the dataclass decorator, along with methods for comparison and for producing the string representation of the object (implicitly called on the last line by print).
You can read all the specifications in PEP 557 if you are curious, but for now just remember that data classes might offer a nicer, slightly more powerful alternative to named tuples, in case you need it.
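To see the generated comparison method, the defaults, and the "mutable" part of that description at work, here is a short sketch that reuses a trimmed-down Body class (the kinetic_energy method is left out for brevity; the 'Rock' instance is just an illustrative value).

```python
from dataclasses import dataclass


@dataclass
class Body:
    name: str
    mass: float = 0.
    speed: float = 1.


a = Body('Ball', 19, 3.1415)
b = Body('Ball', 19, 3.1415)
print(a == b)  # True: the generated __eq__ compares field by field

a.speed = 2.0  # unlike a named tuple, fields can be reassigned
print(a == b)  # False

print(Body('Rock'))  # Body(name='Rock', mass=0.0, speed=1.0)
```

Without the decorator, two distinct instances of a plain class would compare as different, since the default equality falls back to identity.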
Writing a custom iterator
Now we have all the tools to appreciate how we can write our own custom iterator. Let's first define an iterable and an iterator:
Iterable: An object is said to be iterable if it's capable of returning its members one at a time. Lists, tuples, strings, and dictionaries are all iterables. Custom objects that define either of the __iter__ or __getitem__ methods are also iterables.
Iterator: An object is said to be an iterator if it represents a stream of data. A custom iterator is required to provide an implementation for __iter__ that returns the object itself, and an implementation for __next__ that returns the next item of the data stream until the stream is exhausted, at which point all successive calls to __next__ simply raise the StopIteration exception. Built-in functions, such as iter and next, are mapped to call __iter__ and __next__ on an object, behind the scenes.
Let's write an iterator that returns all the odd characters from a string first, and then the even ones:
# iterators/iterator.py
class OddEven:

    def __init__(self, data):
        self._data = data
        self.indexes = (list(range(0, len(data), 2)) +
            list(range(1, len(data), 2)))

    def __iter__(self):
        return self

    def __next__(self):
        if self.indexes:
            return self._data[self.indexes.pop(0)]
        raise StopIteration

oddeven = OddEven('ThIsIsCoOl!')
print(''.join(c for c in oddeven))  # TIICO!hssol

oddeven = OddEven('HoLa')  # or manually...
it = iter(oddeven)  # this calls oddeven.__iter__ internally
print(next(it))  # H
print(next(it))  # L
print(next(it))  # o
print(next(it))  # a
So, we needed to provide an implementation for __iter__ that returned the object itself, and then one for __next__. Let's go through it. What needed to happen was the return of _data[0], _data[2], _data[4], ..., _data[1], _data[3], _data[5], ... until we had returned every item in the data. In order to do that, we prepared a list, indexes, such as [0, 2, 4, 6, ..., 1, 3, 5, ...], and while there was at least one element in it, we popped the first one and returned the element from the data that was at that position, thereby achieving our goal. When indexes was empty, we raised StopIteration, as required by the iterator protocol.
There are other ways to achieve the same result, so go ahead and try to code a different one yourself. Make sure the end result works for all edge cases: empty sequences, sequences of length 1, 2, and so on.
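If you want something to compare your own attempt against, here is one possible alternative, offered as a sketch rather than as the reference solution. It assumes the data supports slicing (strings, lists, and tuples do) and delegates the bookkeeping to itertools.chain.

```python
from itertools import chain


class OddEven:

    def __init__(self, data):
        # data[::2] yields positions 0, 2, 4, ...
        # data[1::2] yields positions 1, 3, 5, ...
        self._it = chain(data[::2], data[1::2])

    def __iter__(self):
        return self

    def __next__(self):
        # next() raises StopIteration once the chain is exhausted
        return next(self._it)


print(''.join(OddEven('ThIsIsCoOl!')))  # TIICO!hssol
print(list(OddEven('')))  # []
print(list(OddEven('a')))  # ['a']
print(list(OddEven('ab')))  # ['a', 'b']
```

This version avoids indexes.pop(0), which shifts every remaining element of the list on each call, at the cost of requiring sliceable input.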
Summary
In this chapter, we looked at decorators, discovered the reasons for having them, and covered a few examples using one or more at the same time. We also saw decorators that take arguments, which are usually used as decorator factories.
We scratched the surface of object-oriented programming in Python. We covered all the basics, so you should now be able to understand the code that will come in future chapters. We talked about all kinds of methods and attributes that one can write in a class, and we explored inheritance versus composition, method overriding, properties, operator overloading, and polymorphism.
At the end, we very briefly touched on iterators, so now you understand generators more deeply.
In the next chapter, we're going to see how to deal with files and how to persist data in several different ways and formats.
Files and Data Persistence
"Persistence is the key to the adventure we call life."
– Torsten Alexander Lange
In the previous chapters, we have explored several different aspects of Python. As the examples have a didactic purpose, we've run them in a simple Python shell, or in the form of a Python module. They ran, maybe printed something on the console, and then they terminated, leaving no trace of their brief existence.
Real-world applications, though, are generally much different. Naturally, they still run in memory, but they interact with networks, disks, and databases. They also exchange information with other applications and devices, using formats that are suitable for the situation.
In this chapter, we are going to start closing in on the real world by exploring the following:
Files and directories
Compression
Networks and streams
The JSON data-interchange format
Data persistence with pickle and shelve, from the standard library
Data persistence with SQLAlchemy
As usual, I will try to balance breadth and depth, so that by the end of the chapter, you will have a solid grasp of the fundamentals and will know how to fetch further information on the web.
Working with files and directories
When it comes to files and directories, Python offers plenty of useful tools. In particular, in the following examples, we will leverage the os and shutil modules. As we'll be reading and writing on the disk, I will be using a file, fear.txt, which contains an excerpt from Fear, by Thich Nhat Hanh, as a guinea pig for some of our examples.
Opening files
Opening a file in Python is very simple and intuitive. In fact, we just need to use the open function. Let's see a quick example:
# files/open_try.py
fh = open('fear.txt', 'rt')  # r: read, t: text
for line in fh.readlines():
    print(line.strip())  # remove whitespace and print
fh.close()
The previous code is very simple. We call open, passing the filename, and telling open that we want to read it in text mode. There is no path information before the filename; therefore, open will assume the file is in the same folder the script is run from. This means that if we run this script from outside the files folder, then fear.txt won't be found.
Once the file has been opened, we obtain a file object back, fh, which we can use to work on the content of the file. In this case, we use the readlines() method to iterate over all the lines in the file, and print them. We call strip() on each line to get rid of any extra spaces around the content, including the line termination character at the end, since print will already add one for us. This is a quick and dirty solution that works in this example, but should the content of the file contain meaningful spaces that need to be preserved, you will have to be slightly more careful in how you sanitize the data. At the end of the script, we flush and close the stream.
Closing a file is very important, as we don't want to risk failing to release the handle we have on it. Therefore, we need to apply some precaution, and wrap the previous logic in a try/finally block. This has the effect that, whatever error might occur while we read the file, we can rest assured that close() will be called:
# files/open_try.py
fh = open('fear.txt', 'rt')  # if this fails, there is nothing to close
try:
    for line in fh.readlines():
        print(line.strip())
finally:
    fh.close()
The logic is exactly the same, but now it is also safe.
Don't worry if you don't understand try/finally for now. We will explore how to deal with exceptions in the next chapter. For now, suffice to say that putting code within the body of a try block adds a mechanism around that code that allows us to detect errors (which are called exceptions) and decide what to do if they happen. In this case, we don't really do anything in case of errors, but by closing the file within the finally block, we make sure that line is executed whether or not any error has happened.
We can simplify the previous example this way:
# files/open_try.py
fh = open('fear.txt')  # rt is default
try:
    for line in fh:  # we can iterate directly on fh
        print(line.strip())
finally:
    fh.close()
As you can see, rt is the default mode for opening files, so we don't need to specify it. Moreover, we can simply iterate on fh, without explicitly calling readlines() on it. Python is very nice and gives us shorthands to make our code shorter and simpler to read.
All the previous examples produce a print of the file on the console (check out the source code to read the whole content):
An excerpt from Fear - By Thich Nhat Hanh
The Present Is Free from Fear
When we are not fully present, we are not really living. We’re not really there, either
for our loved ones or for ourselves. If we’re not there, then where are we? We are
running, running, running, even during our sleep. We run because we’re trying to escape
from our fear.
...
Using a context manager to open a file
Let's admit it: the prospect of having to litter our code with try/finally blocks is not one of the best. As usual, Python gives us a much nicer way to open a file in a secure fashion: by using a context manager. Let's see the code first:
# files/open_with.py
with open('fear.txt') as fh:
    for line in fh:
        print(line.strip())
The previous example is equivalent to the one before it, but reads so much better. The with statement supports the concept of a runtime context defined by a context manager. This is implemented using a pair of methods, __enter__ and __exit__, that allow user-defined classes to define a runtime context that is entered before the statement body is executed and exited when the statement ends. The file object returned by the open function can act as a context manager, and the true beauty of it lies in the fact that fh.close() will be called automatically for us, even in case of errors.
Context managers are used in several different scenarios, such as thread synchronization, closure of files or other objects, and management of network and database connections. You can find information about them in the contextlib documentation page (https://docs.python.org/3.7/library/contextlib.html).
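To make the __enter__/__exit__ pair less abstract, here is a minimal sketch of a user-defined context manager. The Tracker class is hypothetical and does nothing useful beyond recording the order in which things happen.

```python
class Tracker:
    """Hypothetical context manager that records what happens."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this is what gets bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        # runs when the with block ends, even if an exception was raised
        self.events.append('exit')
        return False  # do not suppress the exception


tracker = Tracker()
try:
    with tracker as t:
        t.events.append('body')
        raise ValueError('something went wrong')
except ValueError:
    pass

print(tracker.events)  # ['enter', 'body', 'exit']
```

Notice that 'exit' is recorded even though the body raised an exception, which mirrors how a file gets closed when reading it fails inside a with block.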
Reading and writing to a file
Now that we know how to open a file, let's see a couple of different ways that we have to read and write to it:
# files/print_file.py
with open('print_example.txt', 'w') as fw:
    print('Hey I am printing into a file!!!', file=fw)
A first approach uses the print function, which you've seen plenty of times in the previous chapters. After obtaining a file object, this time specifying that we intend to write to it ("w"), we can tell the call to print to direct its effects on the file, instead of the default sys.stdout, which, when executed on a console, is mapped to it.
The previous code has the effect of creating the print_example.txt file if it doesn't exist, or truncating it in case it does, and writing the line Hey I am printing into a file!!! to it.
This is all nice and easy, but not what we typically do when we want to write to a file. Let's see a much more common approach:
# files/read_write.py
with open('fear.txt') as f:
    lines = [line.rstrip() for line in f]

with open('fear_copy.txt', 'w') as fw:
    fw.write('\n'.join(lines))
In the previous example, we first open fear.txt and collect its content into a list, line by line. Notice that this time, I'm calling a more precise method, rstrip(), as an example, to make sure I only strip the whitespace on the right-hand side of every line.
In the second part of the snippet, we create a new file, fear_copy.txt, and we write to it all the lines from the original file, joined by a newline, \n. Python is gracious and works by default with universal newlines, which means that even though the original file might have a newline that is different from \n, it will be translated automatically for us before the line is returned. This behavior is, of course, customizable, but normally it is exactly what you want. Speaking of newlines, can you think of one of them that might be missing in the copy?
Reading and writing in binary mode
Notice that by opening a file passing t in the options (or omitting it, as it is the default), we're opening the file in text mode. This means that the content of the file is treated and interpreted as text. If you wish to write bytes to a file, you can open it in binary mode. This is a common requirement when you deal with files that don't just contain raw text, such as images, audio/video, and, in general, any other proprietary format.
In order to handle files in binary mode, simply specify the b flag when opening them, as in the following example:
# files/read_write_bin.py
with open('example.bin', 'wb') as fw:
    fw.write(b'This is binary data...')

with open('example.bin', 'rb') as f:
    print(f.read())  # prints: b'This is binary data...'
In this example, I'm still using text as binary data, but it could be anything you want. You can see it's treated as binary by the fact that you get the b'This ...' prefix in the output.
Protecting against overriding an existing file
Python gives us the ability to open files for writing. By using the w flag, we open a file and truncate its content. This means the file is overwritten with an empty file, and the original content is lost. If you wish to only open a file for writing in case it doesn't exist, you can use the x flag instead, as in the following example:
# files/write_not_exists.py
with open('write_x.txt', 'x') as fw:
    fw.write('Writing line 1')  # this succeeds

with open('write_x.txt', 'x') as fw:
    fw.write('Writing line 2')  # this fails
If you run the previous snippet, you will find a file called write_x.txt in your directory, containing only one line of text. The second part of the snippet, in fact, fails to execute. This is the output I get on my console:
$ python write_not_exists.py
Traceback (most recent call last):
  File "write_not_exists.py", line 6, in <module>
    with open('write_x.txt', 'x') as fw:
FileExistsError: [Errno 17] File exists: 'write_x.txt'
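In a real program, you would normally want to react to that error rather than let it stop the script. Here is a minimal sketch of how that might look; write_once is a hypothetical helper name, and the file is created inside a temporary directory so that the example leaves nothing behind.

```python
import os
from tempfile import TemporaryDirectory


def write_once(path, text):
    """Write text to path only if path does not exist yet."""
    try:
        with open(path, 'x') as fw:
            fw.write(text)
    except FileExistsError:
        return False  # the file was already there: leave it untouched
    return True


with TemporaryDirectory() as td:
    target = os.path.join(td, 'write_x.txt')
    first = write_once(target, 'Writing line 1')
    second = write_once(target, 'Writing line 2')
    with open(target) as f:
        content = f.read()

print(first, second, content)  # True False Writing line 1
```

Opening with x and catching FileExistsError is generally preferable to checking for existence first and then opening with w, because the check and the creation happen in a single step.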
Checking for file and directory existence
If you want to make sure a file or directory exists (or it doesn't), the os.path module is what you need. Let's see a small example:
# files/existence.py
import os

filename = 'fear.txt'
path = os.path.dirname(os.path.abspath(filename))

print(os.path.isfile(filename))  # True
print(os.path.isdir(path))  # True
print(path)  # /Users/fab/srv/lpp/ch7/files
The preceding snippet is quite interesting. After declaring the filename with a relative reference (in that it is missing the path information), we use abspath to calculate the full, absolute path of the file. Then, we get the path information (by removing the filename at the end) by calling dirname on it. The result, as you can see, is printed on the last line. Notice also how we check for existence, both for a file and a directory, by calling isfile and isdir. In the os.path module, you find all the functions you need to work with pathnames.
Should you ever need to work with paths in a different way, you can check out pathlib. While os.path works with strings, pathlib offers classes representing filesystem paths with semantics appropriate for different operating systems. It is beyond the scope of this chapter, but if you're interested, check out PEP 428 (https://www.python.org/dev/peps/pep-0428/), and its page in the standard library.
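Just to give a flavor of the difference, here is a brief sketch of a few pathlib counterparts to the os.path calls used above. It is only an illustration; fear.txt does not need to exist for most of these lines to work.

```python
from pathlib import Path

# paths are objects, and the / operator joins their components
path = Path.cwd() / 'fear.txt'

print(path.name)    # fear.txt
print(path.stem)    # fear
print(path.suffix)  # .txt
print(path.parent == Path.cwd())  # True
print(path.parent.is_dir())       # True
print(path.is_file())  # True only if fear.txt exists in the current folder
```

The attributes name, parent, and suffix play roughly the roles of os.path.basename, os.path.dirname, and the second element returned by os.path.splitext.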
Manipulating files and directories
Let's see a couple of quick examples on how to manipulate files and directories. The first example manipulates the content:
# files/manipulation.py
from collections import Counter
from string import ascii_letters

chars = ascii_letters + ' '  # letters plus the space character

def sanitize(s, chars):
    return ''.join(c for c in s if c in chars)

def reverse(s):
    return s[::-1]

with open('fear.txt') as stream:
    lines = [line.rstrip() for line in stream]

with open('raef.txt', 'w') as stream:
    stream.write('\n'.join(reverse(line) for line in lines))

# now we can calculate some statistics
lines = [sanitize(line, chars) for line in lines]
whole = ' '.join(lines)  # join with a space so words on adjacent lines stay separate
cnt = Counter(whole.lower().split())
print(cnt.most_common(3))
The previous example defines two functions: sanitize and reverse. They are simple functions whose purpose is to remove anything that is not a letter or space from a string, and produce the reversed copy of a string, respectively.
We open fear.txt and we read its content into a list. Then we create a new file, raef.txt, which will contain the horizontally-mirrored version of the original one. We write all the content of lines with a single operation, using join on a newline character. Maybe more interesting is the bit at the end. First, we reassign lines to a sanitized version of itself, by means of a list comprehension. Then we put them together in the whole string, and finally, we pass the result to Counter. Notice that we split the string and put it in lowercase. This way, each word will be counted correctly, regardless of its case, and, thanks to split, we don't need to worry about extra spaces anywhere. When we print the three most common words, we realize that truly Thich Nhat Hanh's focus is on others, as we is the most common word in the text:
$ python manipulation.py
[('we', 17), ('the', 13), ('were', 7)]
Let's now see an example of manipulation more oriented to disk operations, in which we put the shutil module to use:
# files/ops_create.py
import shutil
import os

BASE_PATH = 'ops_example'  # this will be our base path
os.mkdir(BASE_PATH)

path_b = os.path.join(BASE_PATH, 'A', 'B')
path_c = os.path.join(BASE_PATH, 'A', 'C')
path_d = os.path.join(BASE_PATH, 'A', 'D')

os.makedirs(path_b)
os.makedirs(path_c)

for filename in ('ex1.txt', 'ex2.txt', 'ex3.txt'):
    with open(os.path.join(path_b, filename), 'w') as stream:
        stream.write(f'Some content here in {filename}\n')

shutil.move(path_b, path_d)

shutil.move(
    os.path.join(path_d, 'ex1.txt'),
    os.path.join(path_d, 'ex1d.txt')
)
In the previous code, we start by declaring a base path, which will safely contain all the files and folders we're going to create. We then use makedirs to create two directories: ops_example/A/B and ops_example/A/C. (Can you think of a way of creating the two directories by using map?)
We use os.path.join to concatenate directory names, as using / would specialize the code to run on a platform where the directory separator is /, but then the code would fail on platforms with a different separator. Let's delegate to join the task of figuring out which is the appropriate separator.
After creating the directories, within a simple for loop, we put some code that creates three files in directory B. Then, we move the folder B and its content to a different name: D. And finally, we rename ex1.txt to ex1d.txt. If you open that file, you'll see it still contains the original text from the for loop. Calling tree on the result produces the following:
$ tree ops_example/
ops_example/
└── A
    ├── C
    └── D
        ├── ex1d.txt
        ├── ex2.txt
        └── ex3.txt
Manipulating pathnames
Let's explore the abilities of os.path a little more by means of a simple example:
# files/paths.py
import os

filename = 'fear.txt'
path = os.path.abspath(filename)

print(path)
print(os.path.basename(path))
print(os.path.dirname(path))
print(os.path.splitext(path))
print(os.path.split(path))

readme_path = os.path.join(
    os.path.dirname(path), '..', '..', 'README.rst')
print(readme_path)
print(os.path.normpath(readme_path))
Reading the result is probably a good enough explanation for this simple example:
/Users/fab/srv/lpp/ch7/files/fear.txt  # path
fear.txt  # basename
/Users/fab/srv/lpp/ch7/files  # dirname
('/Users/fab/srv/lpp/ch7/files/fear', '.txt')  # splitext
('/Users/fab/srv/lpp/ch7/files', 'fear.txt')  # split
/Users/fab/srv/lpp/ch7/files/../../README.rst  # readme_path
/Users/fab/srv/lpp/README.rst  # normalized
Temporary files and directories
Sometimes, it's very useful to be able to create a temporary directory or file when running some code. For example, when writing tests that affect the disk, you can use temporary files and directories to run your logic and assert that it's correct, and to be sure that at the end of the test run, the test folder has no leftovers. Let's see how you do it in Python:
# files/tmp.py
import os
from tempfile import NamedTemporaryFile, TemporaryDirectory

with TemporaryDirectory(dir='.') as td:
    print('Temp directory:', td)
    with NamedTemporaryFile(dir=td) as t:
        name = t.name
        print(os.path.abspath(name))
The preceding example is quite straightforward: we create a temporary directory in the current one ("."), and we create a named temporary file in it. We print the name of the temporary directory, as well as the full path of the file:
$ python tmp.py
Temp directory: ./tmpwa9bdwgo
/Users/fab/srv/lpp/ch7/files/tmpwa9bdwgo/tmp3d45hm46
Running this script will produce a different result every time. After all, it's a temporary random name we're creating here, right?
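The "no leftovers" claim is easy to check. The following sketch, written for illustration, records whether the file and the directory exist at three different moments; the variable names are my own, and the temporary directory is created in the system default location rather than in the current folder.

```python
import os
from tempfile import NamedTemporaryFile, TemporaryDirectory

with TemporaryDirectory() as td:
    with NamedTemporaryFile(dir=td) as t:
        name = t.name
        existed_inside = os.path.exists(name)
    # the inner with block has ended: the file is gone
    file_after = os.path.exists(name)
# the outer with block has ended: the directory is gone too
dir_after = os.path.exists(td)

print(existed_inside, file_after, dir_after)  # True False False
```

Both objects clean up after themselves as soon as their with block ends, which is what makes them convenient in tests.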
Directory content
With Python, you can also inspect the content of a directory. I'll show you two ways of doing this:
# files/listing.py
import os

with os.scandir('.') as it:
    for entry in it:
        print(
            entry.name, entry.path,
            'File' if entry.is_file() else 'Folder'
        )
This snippet uses os.scandir, called on the current directory. We iterate on the results, each of which is an instance of os.DirEntry, a nice class that exposes useful properties and methods. In the code, we access a subset of those: name, path, and is_file(). Running the code yields the following (I omitted a few results for brevity):
$ python listing.py
fixed_amount.py ./fixed_amount.py File
existence.py ./existence.py File
...
ops_example ./ops_example Folder
...
Amorepowerfulwaytoscanadirectorytreeisgiventousbyos.walk.Let'sseeanexample:#files/walking.pyimportos
forroot,dirs,filesinos.walk('.'):print(os.path.abspath(root))ifdirs:print('Directories:')fordir_indirs:print(dir_)print()
iffiles:print('Files:')forfilenameinfiles:print(filename)print()
Runningtheprecedingsnippetwillproducealistofallfilesanddirectoriesinthecurrentone,anditwilldothesameforeachsub-directory.
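Because os.walk hands you every directory together with the files it contains, it lends itself to small aggregations over a whole tree. The following is a sketch under my own assumptions: tree_size is a hypothetical helper, and the tiny directory tree is built in a temporary folder just so the example is self-contained.

```python
import os
from tempfile import TemporaryDirectory


def tree_size(top):
    """Return the total size, in bytes, of all files below top."""
    total = 0
    for root, dirs, files in os.walk(top):
        for filename in files:
            # root is the directory currently being visited
            total += os.path.getsize(os.path.join(root, filename))
    return total


with TemporaryDirectory() as td:
    os.makedirs(os.path.join(td, 'sub'))
    with open(os.path.join(td, 'a.txt'), 'w') as f:
        f.write('12345')  # 5 bytes
    with open(os.path.join(td, 'sub', 'b.txt'), 'w') as f:
        f.write('123')  # 3 bytes
    size = tree_size(td)

print(size)  # 8
```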
File and directory compression
Before we leave this section, let me give you an example of how to create a compressed file. In the source code of the book, I have two examples: one creates a ZIP file, while the other one creates a tar.gz file. Python allows you to create compressed files in several different ways and formats. Here, I am going to show you how to create the most common one, ZIP:
# files/compression/zip.py
from zipfile import ZipFile

with ZipFile('example.zip', 'w') as zp:
    zp.write('content1.txt')
    zp.write('content2.txt')
    zp.write('subfolder/content3.txt')
    zp.write('subfolder/content4.txt')

with ZipFile('example.zip') as zp:
    zp.extract('content1.txt', 'extract_zip')
    zp.extract('subfolder/content3.txt', 'extract_zip')
In the preceding code, we import ZipFile, and then, within a context manager, we write into it four dummy content files (two of which are in a sub-folder, to show ZIP preserves the full path). Afterwards, as an example, we open the compressed file and extract a couple of files from it, into the extract_zip directory. If you are interested in learning more about data compression, make sure you check out the Data Compression and Archiving section of the standard library documentation (https://docs.python.org/3.7/library/archiving.html), where you'll be able to learn all about this topic.
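The tar.gz variant mentioned above is not reproduced in this section, so what follows is only a rough sketch of how it could look with the standard library's tarfile module. It is not the book's own example: the file names are made up, and everything happens inside a temporary directory so the snippet can run anywhere.

```python
import os
import tarfile
from tempfile import TemporaryDirectory

with TemporaryDirectory() as td:
    # Create a dummy file to archive (hypothetical name).
    source = os.path.join(td, 'content1.txt')
    with open(source, 'w') as f:
        f.write('some content')

    archive = os.path.join(td, 'example.tar.gz')
    # 'w:gz' opens the archive for writing with gzip compression.
    with tarfile.open(archive, 'w:gz') as tar:
        # arcname is the name the file gets inside the archive
        tar.add(source, arcname='content1.txt')

    # 'r:gz' reads it back; getnames lists the archived members.
    with tarfile.open(archive, 'r:gz') as tar:
        names = tar.getnames()

print(names)  # ['content1.txt']
```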
Data interchange formats
Modern software architecture tends to split an application into several components. Whether you embrace the service-oriented architecture paradigm, or you push it even further into the microservices realm, these components will have to exchange data. But even if you are coding a monolithic application, whose codebase is contained in one project, chances are that you still have to exchange data with APIs, other programs, or simply handle the data flow between the frontend and the backend part of your website, which very likely won't speak the same language.
Choosing the right format in which to exchange information is crucial. A language-specific format has the advantage that the language itself is very likely to provide you with all the tools to make serialization and deserialization a breeze. However, you will lose the ability to talk to other components that have been written in different versions of the same language, or in different languages altogether. Regardless of what the future looks like, going with a language-specific format should only be done if it is the only possible choice for the given situation.
A much better approach is to choose a format that is language agnostic, and can be spoken by all (or at least most) languages. In the team I lead, we have people from England, Poland, South Africa, Spain, Greece, India, Italy, to mention just a few. We all speak English, so regardless of our native tongue, we can all understand each other (well... mostly!).
In the software world, some popular formats have become the de facto standard over recent years. The most famous ones probably are XML, YAML, and JSON. The Python standard library features the xml and json modules, and, on PyPI (https://pypi.org/), you can find a few different packages to work with YAML.
In the Python environment, JSON is probably the most commonly used one. It wins over the other two because of being part of the standard library, and for its simplicity. If you have ever worked with XML, you know what a nightmare it can be.
Working with JSON
JSON is the acronym of JavaScript Object Notation, and it is a subset of the JavaScript language. It has been around for almost two decades now, so it is well known and widely adopted by basically all languages, even though it is actually language independent. You can read all about it on its website (https://www.json.org/), but I'm going to give you a quick introduction to it now.
JSON is based on two structures: a collection of name/value pairs, and an ordered list of values. You will immediately realize that these two objects map to the dictionary and list data types in Python, respectively. As data types, it offers strings, numbers, objects, and values, such as true, false, and null. Let's see a quick example to get us started:
# json_examples/json_basic.py
import sys
import json

data = {
    'big_number': 2 ** 3141,
    'max_float': sys.float_info.max,
    'a_list': [2, 3, 5, 7],
}

json_data = json.dumps(data)
data_out = json.loads(json_data)
assert data == data_out  # json and back, data matches
We begin by importing the sys and json modules. Then we create a simple dictionary with some numbers inside and a list. I wanted to test serializing and deserializing using very big numbers, both int and float, so I put 2 ** 3141 and whatever is the biggest floating point number my system can handle.
We serialize with json.dumps, which takes data and converts it into a JSON formatted string. That string is then fed into json.loads, which does the opposite: from a JSON formatted string, it reconstructs the data into Python. On the last line, we make sure that the original data and the result of the serialization/deserialization through JSON match.
Let's see, in the next example, what JSON data would look like if we printed it:
# json_examples/json_basic.py
import json

info = {
    'full_name': 'Sherlock Holmes',
    'address': {
        'street': '221B Baker St',
        'zip': 'NW1 6XE',
        'city': 'London',
        'country': 'UK',
    }
}

print(json.dumps(info, indent=2, sort_keys=True))
In this example, we create a dictionary with Sherlock Holmes' data in it. If, like me, you're a fan of Sherlock Holmes, and are in London, you'll find his museum at that address (which I recommend visiting, it's small but very nice).
Notice how we call json.dumps, though. We have told it to indent with two spaces, and sort keys alphabetically. The result is this:
$ python json_basic.py
{
  "address": {
    "city": "London",
    "country": "UK",
    "street": "221B Baker St",
    "zip": "NW1 6XE"
  },
  "full_name": "Sherlock Holmes"
}
The similarity with Python is huge. The one difference is that if you place a comma on the last element in a dictionary, like I've done in Python (as it is customary), JSON will complain.
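If you want to see JSON "complain" about a trailing comma, the following small sketch (mine, with made-up strings) feeds the json module one valid document and one with a comma after the last pair. The exact error message depends on the Python version, so the example only records that an exception was raised.

```python
import json

valid = '{"city": "London", "country": "UK"}'
trailing = '{"city": "London", "country": "UK",}'  # note the final comma

print(json.loads(valid))

try:
    json.loads(trailing)
except json.JSONDecodeError as err:
    # The parser refuses the document instead of guessing.
    rejected = True
    print('Invalid JSON:', err.msg)
else:
    rejected = False
```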
Let me show you something interesting:
# json_examples/json_tuple.py
import json

data_in = {
    'a_tuple': (1, 2, 3, 4, 5),
}

json_data = json.dumps(data_in)
print(json_data)  # {"a_tuple": [1, 2, 3, 4, 5]}
data_out = json.loads(json_data)
print(data_out)  # {'a_tuple': [1, 2, 3, 4, 5]}
In this example, we have put a tuple, instead of a list. The interesting bit is that, conceptually, a tuple is also an ordered list of items. It doesn't have the flexibility of a list, but still, it is considered the same from the perspective of JSON. Therefore, as you can see by the first print, in JSON a tuple is transformed into a list. Naturally then, the information that it was a tuple is lost, and when deserialization happens, the a_tuple value in data_out is actually a list. It is important that you keep this in mind when dealing with data, as going through a transformation process that involves a format that only comprises a subset of the data structures you can use implies there will be information loss. In this case, we lost the information about the type (tuple versus list).
This is actually a common problem. For example, you can't serialize all Python objects to JSON, as it is not clear if JSON should revert that (or how). Think about datetime, for example. An instance of that class is a Python object that JSON won't allow serializing. If we transform it into a string such as 2018-03-04T12:00:30Z, which is the ISO 8601 representation of a date with time and time zone information, what should JSON do when deserializing? Should it say this is actually deserializable into a datetime object, so I'd better do it, or should it simply consider it as a string and leave it as it is? What about data types that can be interpreted in more than one way?
The answer is that when dealing with data interchange, we often need to transform our objects into a simpler format prior to serializing them with JSON. This way, we will know how to reconstruct them correctly when we deserialize them.
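One way to read "transform into a simpler format first" is to convert the datetime to an ISO 8601 string yourself, and convert it back after loading, since you are the one who knows which field holds a date. The sketch below is an illustration under my own assumptions: the event dictionary and its keys are invented, and datetime.fromisoformat requires Python 3.7 or later.

```python
import json
from datetime import datetime, timezone

event = {
    'name': 'launch',  # hypothetical payload
    'when': datetime(2018, 3, 4, 12, 0, 30, tzinfo=timezone.utc),
}

# Transform to a simpler format before serializing.
payload = dict(event, when=event['when'].isoformat())
json_data = json.dumps(payload)
print(json_data)

# We know 'when' holds an ISO 8601 string, so we can rebuild it.
loaded = json.loads(json_data)
loaded['when'] = datetime.fromisoformat(loaded['when'])
assert loaded == event
```

JSON itself never learns that the string is a date. The knowledge lives in the code on both sides of the exchange.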
In some cases, though, and mostly for internal use, it is useful to be able to serialize custom objects, so, just for fun, I'm going to show you how with two examples: complex numbers (because I love math) and datetime objects.
Custom encoding/decoding with JSON
In the JSON world, we can consider terms like encoding/decoding as synonyms to serializing/deserializing. They basically all mean transforming to and back from JSON. In the following example, I'm going to show you how to encode complex numbers:
# json_examples/json_cplx.py
import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            return {
                '_meta': '_complex',
                'num': [obj.real, obj.imag],
            }
        return json.JSONEncoder.default(self, obj)

data = {
    'an_int': 42,
    'a_float': 3.14159265,
    'a_complex': 3 + 4j,
}

json_data = json.dumps(data, cls=ComplexEncoder)
print(json_data)

def object_hook(obj):
    try:
        if obj['_meta'] == '_complex':
            return complex(*obj['num'])
    except (KeyError, TypeError):
        return obj

data_out = json.loads(json_data, object_hook=object_hook)
print(data_out)
We start by defining a ComplexEncoder class, which needs to implement the default method. This method is called, one at a time, with each object that the encoder doesn't know how to serialize on its own, in the obj variable. At some point, obj will be our complex number, 3+4j. When that is true, we return a dictionary with some custom meta information, and a list that contains both the real and the imaginary part of the number. That is all we need to do to avoid losing information for a complex number.
We then call json.dumps, but this time we use the cls argument to specify our custom encoder. The result is printed:
{"an_int": 42, "a_float": 3.14159265, "a_complex": {"_meta": "_complex", "num": [3.0, 4.0]}}
Half the job is done. For the deserialization part, we could have written another class that would inherit from JSONDecoder, but, just for fun, I've used a different technique that is simpler and uses a small function: object_hook.
Within the body of object_hook, we find another try block, but don't worry about it for now. I'll explain it in detail in the next chapter. The important part is the two lines within the body of the try block itself. The function receives an object (notice, the function is only called when obj is a dictionary), and if the metadata matches our convention for complex numbers, we pass the real and imaginary parts to the complex function. The try/except block is there only to prevent malformed JSON from ruining the party (and if that happens, we simply return the object as it is).
The last print produces:
{'an_int': 42, 'a_float': 3.14159265, 'a_complex': (3+4j)}
You can see that a_complex has been correctly deserialized.
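For comparison, here is roughly what the class-based alternative could look like. This is only a sketch of one possible approach, not code from the book: ComplexDecoder and its _hook helper are hypothetical names, and the class simply plugs its own hook into the object_hook parameter that json.JSONDecoder already accepts.

```python
import json


class ComplexDecoder(json.JSONDecoder):
    """Hypothetical decoder: rebuilds dicts tagged with _meta == '_complex'."""

    def __init__(self, **kwargs):
        # JSONDecoder accepts object_hook as a keyword argument, so the
        # subclass only needs to pass its own hook along.
        super().__init__(object_hook=self._hook, **kwargs)

    @staticmethod
    def _hook(obj):
        if obj.get('_meta') == '_complex':
            return complex(*obj['num'])
        return obj


json_data = (
    '{"an_int": 42, '
    '"a_complex": {"_meta": "_complex", "num": [3.0, 4.0]}}'
)
data_out = json.loads(json_data, cls=ComplexDecoder)
print(data_out)  # {'an_int': 42, 'a_complex': (3+4j)}
```

The function-based version is shorter, while the class keeps the decoding convention in one reusable place. Which one reads better is largely a matter of taste.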
Let's see a slightly more complex (no pun intended) example now: dealing with datetime objects. I'm going to split the code into two blocks, the serializing part, and the deserializing afterwards:
# json_examples/json_datetime.py
import json
from datetime import datetime, timedelta, timezone

now = datetime.now()
now_tz = datetime.now(tz=timezone(timedelta(hours=1)))

class DatetimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            try:
                # total_seconds() also handles negative offsets,
                # which the .seconds attribute alone would not
                off = int(obj.utcoffset().total_seconds())
            except AttributeError:
                off = None
            return {
                '_meta': '_datetime',
                'data': obj.timetuple()[:6] + (obj.microsecond, ),
                'utcoffset': off,
            }
        return json.JSONEncoder.default(self, obj)

data = {
    'an_int': 42,
    'a_float': 3.14159265,
    'a_datetime': now,
    'a_datetime_tz': now_tz,
}

json_data = json.dumps(data, cls=DatetimeEncoder)
print(json_data)
The reason why this example is slightly more complex lies in the fact that datetime objects in Python can be time zone aware or not; therefore, we need to be more careful. The flow is basically the same as before, only it is dealing with a different data type. We start by getting the current date and time information, and we do it both without (now) and with (now_tz) time zone awareness, just to make sure our script works. We then proceed to define a custom encoder as before, and we implement once again the default method. The important bits in that method are how we get the time zone offset (off) information, in seconds, and how we structure the dictionary that returns the data. This time, the metadata says it's a datetime information, and then we save the first six items in the time tuple (year, month, day, hour, minute, and second), plus the microseconds in the data key, and the offset after that. Could you tell that the value of data is a concatenation of tuples? Good job if you could!
When we have our custom encoder, we proceed to create some data, and then we serialize. The print statement returns (after I've done some prettifying):
{
  "a_datetime": {
    "_meta": "_datetime",
    "data": [2018, 3, 18, 17, 57, 27, 438792],
    "utcoffset": null
  },
  "a_datetime_tz": {
    "_meta": "_datetime",
    "data": [2018, 3, 18, 18, 57, 27, 438810],
    "utcoffset": 3600
  },
  "a_float": 3.14159265,
  "an_int": 42
}
Interestingly, we find out that None is translated to null, its JavaScript equivalent. Moreover, we can see our data seems to have been encoded properly. Let's proceed to the second part of the script:
# json_examples/json_datetime.py
def object_hook(obj):
    try:
        if obj['_meta'] == '_datetime':
            if obj['utcoffset'] is None:
                tz = None
            else:
                tz = timezone(timedelta(seconds=obj['utcoffset']))
            return datetime(*obj['data'], tzinfo=tz)
    except (KeyError, TypeError):
        return obj

data_out = json.loads(json_data, object_hook=object_hook)
Once again, we first verify that the metadata is telling us it's a datetime, and then we proceed to fetch the time zone information. Once we have that, we pass the 7-tuple (using * to unpack its values in the call) and the time zone information to the datetime call, getting back our original object. Let's verify it by printing data_out:
{
  'a_datetime': datetime.datetime(2018, 3, 18, 18, 1, 46, 54693),
  'a_datetime_tz': datetime.datetime(
    2018, 3, 18, 19, 1, 46, 54711,
    tzinfo=datetime.timezone(datetime.timedelta(seconds=3600))),
  'a_float': 3.14159265,
  'an_int': 42
}
As you can see, we got everything back correctly. As an exercise, I'd like to challenge you to write the same logic, but for a date object, which should be simpler.
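If you get stuck on the exercise, the following is one possible shape for a solution, offered as a sketch rather than as the reference answer. DateEncoder and the '_date' tag are names I made up to mirror the datetime example. Note the explicit check that excludes datetime, which is a subclass of date.

```python
import json
from datetime import date, datetime


class DateEncoder(json.JSONEncoder):
    def default(self, obj):
        # datetime is a subclass of date, so exclude it explicitly.
        if isinstance(obj, date) and not isinstance(obj, datetime):
            return {
                '_meta': '_date',  # hypothetical tag, mirroring '_datetime'
                'data': [obj.year, obj.month, obj.day],
            }
        return super().default(obj)


def object_hook(obj):
    if obj.get('_meta') == '_date':
        return date(*obj['data'])
    return obj


data = {'an_int': 42, 'a_date': date(2018, 3, 18)}
json_data = json.dumps(data, cls=DateEncoder)
print(json_data)

data_out = json.loads(json_data, object_hook=object_hook)
assert data_out == data
```

There is no time zone to worry about here, which is why the date version ends up shorter.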
Before we move on to the next topic, a word of caution. Perhaps it is counter-intuitive, but working with datetime objects can be one of the trickiest things to do, so, although I'm pretty sure this code is doing what it is supposed to do, I want to stress that I only tested it very lightly. So if you intend to grab it and use it, please do test it thoroughly. Test for different time zones, test for daylight saving time being on and off, test for dates before the epoch, and so on. You might find that the code in this section then would need some modifications to suit your cases.
Let's now move to the next topic, IO.
IO, streams, and requests
IO stands for input/output, and it broadly refers to the communication between a computer and the outside world. There are several different types of IO, and it is outside the scope of this chapter to explain all of them, but I still want to offer you a couple of examples.
Using an in-memory stream
The first will show you the io.StringIO class, which is an in-memory stream for text IO. The second one instead will escape the locality of our computer, and show you how to perform an HTTP request. Let's see the first example:
# io_examples/string_io.py
import io

stream = io.StringIO()
stream.write('Learning Python Programming.\n')
print('Become a Python ninja!', file=stream)

contents = stream.getvalue()
print(contents)

stream.close()
In the preceding code snippet, we import the io module from the standard library. This is a very interesting module that features many tools related to streams and IO. One of them is StringIO, which is an in-memory buffer in which we're going to write two sentences, using two different methods, as we did with files in the first examples of this chapter. We can either call StringIO.write or use print, telling it to direct the data to our stream.
By calling getvalue, we can get the content of the stream (and print it), and finally we close it. The call to close causes the text buffer to be immediately discarded.
There is a more elegant way to write the previous code (can you guess it, before you look?):
# io_examples/string_io.py
with io.StringIO() as stream:
    stream.write('Learning Python Programming.\n')
    print('Become a Python ninja!', file=stream)
    contents = stream.getvalue()
    print(contents)
Yes, it is again a context manager. Like open, io.StringIO works well within a context manager block. Notice the similarity with open: in this case too, we don't need to manually close the stream.
In-memory objects can be useful in a multitude of situations. Memory is much faster than a disk and, for small amounts of data, can be the perfect choice.
When running the script, the output is:
$ python string_io.py
Learning Python Programming.
Become a Python ninja!
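One situation where an in-memory stream comes in handy is capturing what a function prints, for example in a test. The sketch below is my own illustration: greet is a hypothetical function, and contextlib.redirect_stdout from the standard library temporarily points sys.stdout at our StringIO.

```python
import io
from contextlib import redirect_stdout


def greet(name):
    # Hypothetical function that prints instead of returning a value.
    print(f'Hello, {name}!')


with io.StringIO() as stream:
    # While the inner block is active, print writes to the stream.
    with redirect_stdout(stream):
        greet('world')
    captured = stream.getvalue()

assert captured == 'Hello, world!\n'
```

Nothing touches the disk here, and the buffer is discarded as soon as the outer with block ends.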
Making HTTP requests
Let's now explore a couple of examples on HTTP requests. I will use the requests library for these examples, which you can install with pip. We're going to perform HTTP requests against the httpbin.org API, which, interestingly, was developed by Kenneth Reitz, the creator of the requests library itself. This library is amongst the most widely adopted all over the world:
import requests

urls = {
    'get': 'https://httpbin.org/get?title=learn+python+programming',
    'headers': 'https://httpbin.org/headers',
    'ip': 'https://httpbin.org/ip',
    'now': 'https://now.httpbin.org/',
    'user-agent': 'https://httpbin.org/user-agent',
    'UUID': 'https://httpbin.org/uuid',
}

def get_content(title, url):
    resp = requests.get(url)
    print(f'Response for {title}')
    print(resp.json())

for title, url in urls.items():
    get_content(title, url)
    print('-' * 40)
The preceding snippet should be simple to understand. I declare a dictionary of URLs against which I want to perform requests. I have encapsulated the code that performs the request into a tiny function: get_content. As you can see, very simply, we perform a GET request (by using requests.get), and we print the title and the JSON decoded version of the body of the response. Let me say a word about this last bit.
When we perform a request to a website, or API, we get back a response object, which is, very simply, what was returned by the server we performed the request against. The body of all responses from httpbin.org happens to be JSON encoded, so instead of getting the body as it is (by getting resp.text) and manually decoding it, calling json.loads on it, we simply combine the two by leveraging the json method on the response object. There are plenty of reasons why the requests package has become so widely adopted, and one of them is definitely its ease of use.
Now, when you perform a request in your application, you will want to have a much more robust approach in dealing with errors and so on, but for this chapter, a simple example will do. Don't worry, I will give you a more comprehensive introduction to HTTP requests in Chapter 14, Web Development.
Going back to our code, in the end, we run a for loop and get all the URLs. When you run it, you will see the result of each call printed on your console, like this (prettified and trimmed for brevity):
$ python reqs.py
Response for get
{
  "args": {
    "title": "learn python programming"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.0"
  },
  "origin": "82.47.175.158",
  "url": "https://httpbin.org/get?title=learn+python+programming"
}
... rest of the output omitted ...
Notice that you might get a slightly different output in terms of version numbers and IPs, which is fine. Now, GET is only one of the HTTP verbs, and it is definitely the most commonly used. The second one is the ubiquitous POST, which is the type of request you make when you need to send data to the server. Every time you submit a form on the web, you're basically making a POST request. So, let's try to make one programmatically:
# io_examples/reqs_post.py
import requests

url = 'https://httpbin.org/post'
data = dict(title='Learn Python Programming')

resp = requests.post(url, data=data)
print('Response for POST')
print(resp.json())
The previous code is very similar to the one we saw before, only this time we don't call get, but post, and because we want to send some data, we specify that in the call. The requests library offers much, much more than this, and it has been praised by the community for the beautiful API it exposes. It is a project that I encourage you to check out and explore, as you will end up using it all the time, anyway.
Running the previous script (and applying some prettifying magic to the output) yields the following:
$ python reqs_post.py
Response for POST
{'args': {},
 'data': '',
 'files': {},
 'form': {'title': 'Learn Python Programming'},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate',
             'Connection': 'close',
             'Content-Length': '30',
             'Content-Type': 'application/x-www-form-urlencoded',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.7.0 CPython/3.7.0b2 '
                           'Darwin/17.4.0'},
 'json': None,
 'origin': '82.45.123.178',
 'url': 'https://httpbin.org/post'}
Notice how the headers are now different, and we find the data we sent under the form key of the response body.
I hope these short examples are enough to get you started, especially with requests. The web changes every day, so it's worth learning the basics and then brushing up every now and then.
Let's now move on to the last topic of this chapter: persisting data on disk in different formats.
Persisting data on disk
In the last section of this chapter, we're exploring how to persist data on disk in three different formats. We will explore pickle, shelve, and a short example that will involve accessing a database using SQLAlchemy, the most widely adopted ORM library in the Python ecosystem.
Serializing data with pickle
The pickle module, from the Python standard library, offers tools to convert Python objects into byte streams, and vice versa. Even though there is a partial overlap in the API that pickle and json expose, the two are quite different. As we have seen previously in this chapter, JSON is a text format, human readable, language independent, and supports only a restricted subset of Python data types. The pickle module, on the other hand, is not human readable, translates to bytes, is Python specific, and, thanks to the wonderful Python introspection capabilities, it supports an extremely large number of data types.
Regardless of these differences, though, which you should know when you consider whether to use one or the other, I think that the most important concern regarding pickle lies in the security threats you are exposed to when you use it. Unpickling erroneous or malicious data from an untrusted source can be very dangerous, so if you decide to adopt it in your application, you need to be extra careful.
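To give you a feeling for why unpickling untrusted data is risky, here is a deliberately harmless sketch of the mechanism. It is my own illustration: the Sneaky class is made up, and it uses the __reduce__ hook, through which an object tells pickle which callable to invoke when it is loaded. Here the callable is the innocent built-in sorted, but an attacker could name something far less friendly.

```python
import pickle


class Sneaky:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        # A malicious payload would name a dangerous callable instead.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Sneaky())

# The receiver never asked to call sorted, yet loading the bytes does so.
result = pickle.loads(payload)
print(result)  # [1, 2, 3]
```

The bytes decide what gets called, not the code that loads them, which is why you should only unpickle data you trust.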
That said, let's see it in action, by means of a simple example:
# persistence/pickler.py
import pickle
from dataclasses import dataclass

@dataclass
class Person:
    first_name: str
    last_name: str
    id: int

    def greet(self):
        print(f'Hi, I am {self.first_name} {self.last_name}'
              f' and my ID is {self.id}'
              )

people = [
    Person('Obi-Wan', 'Kenobi', 123),
    Person('Anakin', 'Skywalker', 456),
]

# save data in binary format to a file
with open('data.pickle', 'wb') as stream:
    pickle.dump(people, stream)

# load data from a file
with open('data.pickle', 'rb') as stream:
    peeps = pickle.load(stream)

for person in peeps:
    person.greet()
In the previous example, we create a Person class using the dataclass decorator, which we have seen in Chapter 6, OOP, Decorators, and Iterators. The only reason I wrote this example with a data class is to show you how effortlessly pickle deals with it, with no need for us to do anything we wouldn't do for a simpler data type.
The class has three attributes: first_name, last_name, and id. It also exposes a greet method, which simply prints a hello message with the data.
We create a list of instances, and then we save it to a file. In order to do so, we use pickle.dump, to which we feed the content to be pickled, and the stream to which we want to write. Immediately after that, we read from that same file, and by using pickle.load, we convert back into Python the whole content of that stream. Just to make sure that the objects have been converted correctly, we call the greet method on both of them. The result is the following:
$ python pickler.py
Hi, I am Obi-Wan Kenobi and my ID is 123
Hi, I am Anakin Skywalker and my ID is 456
The pickle module also allows you to convert to (and from) bytes objects, by means of the dumps and loads functions (note the s at the end of both names). In day-to-day applications, pickle is usually used when we need to persist Python data that is not supposed to be exchanged with another application. One example I stumbled upon recently was the session management in a Flask plugin, which pickles the session object before sending it to Redis. In practice, though, you are unlikely to have to deal with this library very often.
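As a quick illustration of dumps and loads, the sketch below (with made-up data) pickles a dictionary to a bytes object and back, without involving any file. It also shows a difference from the JSON round trip we saw earlier: the tuple survives as a tuple.

```python
import pickle

data = {'name': 'Obi-Wan', 'ids': (1, 2, 3)}

blob = pickle.dumps(data)  # a bytes object, not written to any file
print(type(blob))  # <class 'bytes'>

restored = pickle.loads(blob)
assert restored == data
# Unlike the JSON round trip, the tuple comes back as a tuple.
assert isinstance(restored['ids'], tuple)
```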
Another tool that is possibly used even less, but that proves to be very useful when you are short of resources, is shelve.
Saving data with shelve
A shelf is a persistent dictionary-like object. The beauty of it is that the values you save into a shelf can be any object you can pickle, so you're not restricted like you would be if you were using a database. Albeit interesting and useful, the shelve module is used quite rarely in practice. Just for completeness, let's see a quick example of how it works:
# persistence/shelf.py
import shelve

class Person:
    def __init__(self, name, id):
        self.name = name
        self.id = id

with shelve.open('shelf1.shelve') as db:
    db['obi1'] = Person('Obi-Wan', 123)
    db['ani'] = Person('Anakin', 456)
    db['a_list'] = [2, 3, 5]
    db['delete_me'] = 'we will have to delete this one...'

    print(list(db.keys()))  # ['ani', 'a_list', 'delete_me', 'obi1']

    del db['delete_me']  # gone!

    print(list(db.keys()))  # ['ani', 'a_list', 'obi1']

    print('delete_me' in db)  # False
    print('ani' in db)  # True

    a_list = db['a_list']
    a_list.append(7)
    db['a_list'] = a_list
    print(db['a_list'])  # [2, 3, 5, 7]
Apart from the wiring and the boilerplate around it, the previous example resembles an exercise with dictionaries. We create a simple Person class and then we open a shelve file within a context manager. As you can see, we use the dictionary syntax to store four objects: two Person instances, a list, and a string. If we print the keys, we get a list containing the four keys we used. Immediately after printing it, we delete the (aptly named) delete_me key/value pair from the shelf. Printing the keys again shows the deletion has succeeded. We then test a couple of keys for membership, and finally, we append number 7 to a_list. Notice how we have to extract the list from the shelf, modify it, and save it again.
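To see why the extract, modify, and save dance is needed, consider this small sketch (mine, using a made-up shelf name inside a temporary directory). With the default settings, every access to a key unpickles a fresh copy of the value, so an in-place append changes the copy and is silently lost.

```python
import os
import shelve
from tempfile import TemporaryDirectory

with TemporaryDirectory() as td:
    # Hypothetical shelf name inside a throwaway directory.
    with shelve.open(os.path.join(td, 'demo.shelve')) as db:
        db['a_list'] = [2, 3, 5]
        # db['a_list'] returns a fresh unpickled copy, so this append
        # changes the copy and never reaches the shelf.
        db['a_list'].append(7)
        stored = db['a_list']

print(stored)  # [2, 3, 5]
```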
In case this behavior is undesired, there is something we can do:
# persistence/shelf.py
with shelve.open('shelf2.shelve', writeback=True) as db:
    db['a_list'] = [11, 13, 17]
    db['a_list'].append(19)  # in-place append!
    print(db['a_list'])  # [11, 13, 17, 19]
By opening the shelf with writeback=True, we enable the writeback feature, which allows us to simply append to a_list as if it actually was a value within a regular dictionary. The reason why this feature is not active by default is that it comes with a price that you pay in terms of memory consumption and slower closing of the shelf.
Now that we have paid homage to the standard library modules related to data persistence, let's take a look at the most widely adopted ORM in the Python ecosystem: SQLAlchemy.
Saving data to a database
For this example, we are going to work with an in-memory database, which will make things simpler for us. In the source code of the book, I have left a couple of comments to show you how to generate a SQLite file, so I hope you'll explore that option as well.
You can find a free database browser for SQLite at sqlitebrowser.org. If you are not satisfied with it, you will be able to find a wide range of tools, some free, some not free, that you can use to access and manipulate a database file.
Before we dive into the code, allow me to briefly introduce the concept of a relational database.
A relational database is a database that allows you to save data following the relational model, invented in 1969 by Edgar F. Codd. In this model, data is stored in one or more tables. Each table has rows (also known as records, or tuples), each of which represents an entry in the table. Tables also have columns (also known as attributes), each of which represents an attribute of the records. Each record is identified through a unique key, more commonly known as the primary key, which is the union of one or more columns in the table. To give you an example: imagine a table called Users, with columns id, username, password, name, and surname. Such a table would be perfect to contain users of our system. Each row would represent a different user. For example, a row with the values 3, gianchub, my_wonderful_pwd, Fabrizio, and Romano, would represent my user in the system.
The reason why the model is called relational is because you can establish relations between tables. For example, if you added a table called PhoneNumbers to our fictitious database, you could insert phone numbers into it, and then, through a relation, establish which phone number belongs to which user.
In order to query a relational database, we need a special language. The main standard is called SQL, which stands for Structured Query Language. It was born out of something called relational algebra, which is a very nice family of algebras used to model data stored according to the relational model, and to perform queries on it. The most common operations you can perform usually involve filtering on the rows or columns, joining tables, aggregating the results according to some criteria, and so on. To give you an example in English, a query on our imaginary database could be: Fetch all users (username, name, surname) whose username starts with "m", who have at most one phone number. In this query, we are asking for a subset of the columns in the Users table. We are filtering on users by taking only those whose username starts with the letter m, and even further, only those who have at most one phone number.
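To make the English query a bit more tangible, here is a rough sketch of how it might be written in SQL, run through the sqlite3 module from the standard library (not SQLAlchemy, which comes later). Everything in it is an assumption made for the illustration: the table and column names, the sample rows, and the omission of the password column.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, username TEXT, name TEXT, surname TEXT
    );
    CREATE TABLE phone_numbers (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        number TEXT
    );
    INSERT INTO users VALUES
        (1, 'mario', 'Mario', 'Rossi'),
        (2, 'maria', 'Maria', 'Bianchi'),
        (3, 'gianchub', 'Fabrizio', 'Romano'),
        (4, 'mace', 'Mace', 'Windu');
    INSERT INTO phone_numbers (user_id, number) VALUES
        (1, '555-0100'),
        (2, '555-0101'),
        (2, '555-0102');
""")

# Users whose username starts with "m" and who have at most one number.
# The LEFT JOIN keeps users with no phone number at all.
rows = conn.execute("""
    SELECT u.username, u.name, u.surname
    FROM users AS u
    LEFT JOIN phone_numbers AS p ON p.user_id = u.id
    WHERE u.username LIKE 'm%'
    GROUP BY u.id
    HAVING COUNT(p.id) <= 1
    ORDER BY u.username
""").fetchall()
conn.close()

print(rows)  # [('mace', 'Mace', 'Windu'), ('mario', 'Mario', 'Rossi')]
```

The query touches the three kinds of operation mentioned above: filtering (WHERE), joining (LEFT JOIN), and aggregating (GROUP BY with HAVING).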
Back in the days when I was a student in Padova, I spent a whole semester learning both the relational algebra semantics, and the standard SQL (amongst other things). If it wasn't for a major bicycle accident I had the day of the exam, I would say that this was one of the most fun exams I ever had to prepare.
Now, each database comes with its own flavor of SQL. They all respect the standard to some extent, but none fully does, and they are all different from one another in some respects. This poses an issue in modern software development. If our application contains SQL code, it is quite likely that if we decided to use a different database engine, or maybe a different version of the same engine, we would find our SQL code needs amending.
This can be quite painful, especially since SQL queries can become very, very complicated quite quickly. In order to alleviate this pain a little, computer scientists (bless them) have created code that maps objects of a particular language to tables of a relational database. Unsurprisingly, such tools are called Object-Relational Mappers (ORMs).
In modern application development, you would normally start interacting with a database by using an ORM, and should you find yourself in a situation where you can't perform a query you need to perform, through the ORM, you would then resort to using SQL directly. This is a good compromise between having no SQL at all, and using no ORM, which ultimately means specializing the code that interacts with the database, with the aforementioned disadvantages.
In this section, I'd like to show an example that leverages SQLAlchemy, the most popular Python ORM. We are going to define two models (Person and Address) which map to a table each, and then we're going to populate the database and perform a few queries on it.
Let's start with the model declarations:
# persistence/alchemy_models.py
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import (
    Column, Integer, String, ForeignKey, create_engine)
from sqlalchemy.orm import relationship
At the beginning, we import some functions and types. The first thing we need to do then is to create an engine. This engine tells SQLAlchemy about the type of database we have chosen for our example:
# persistence/alchemy_models.py
engine = create_engine('sqlite:///:memory:')
Base = declarative_base()

class Person(Base):
    __tablename__ = 'person'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)

    addresses = relationship(
        'Address',
        back_populates='person',
        order_by='Address.email',
        cascade='all, delete-orphan'
    )

    def __repr__(self):
        return f'{self.name}(id={self.id})'

class Address(Base):
    __tablename__ = 'address'

    id = Column(Integer, primary_key=True)
    email = Column(String)
    person_id = Column(ForeignKey('person.id'))
    person = relationship('Person', back_populates='addresses')

    def __str__(self):
        return self.email
    __repr__ = __str__

Base.metadata.create_all(engine)
Each model then inherits from the Base class, which in this example consists of the mere default, returned by declarative_base(). We define Person, which maps to a table called person, and exposes the attributes id, name, and age. We also declare a relationship with the Address model, by stating that accessing the addresses attribute will fetch all the entries in the address table that are related to the particular Person instance we're dealing with. The cascade option affects how creation and deletion work, but it is a more advanced concept, so I'd suggest you gloss over it for now and maybe investigate more later on.
The last thing we declare is the __repr__ method, which provides us with the official string representation of an object. This is supposed to be a representation that can be used to completely reconstruct the object, but in this example, I simply use it to provide something in output. Python redirects repr(obj) to a call to obj.__repr__().
We also declare the Address model, which will contain email addresses, and a reference to the person they belong to. You can see the person_id and person attributes are both about setting a relation between the Address and Person instances. Note how I declared the __str__ method on Address, and then assigned an alias to it, called __repr__. This means that calling both repr and str on Address objects will ultimately result in calling the __str__ method. This is quite a common technique in Python, so I took the opportunity to show it to you here.
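The aliasing technique has nothing to do with SQLAlchemy in particular, so here it is in isolation, as a tiny sketch with a hypothetical Email class and a made-up address. The class body simply binds the same function object to a second name.

```python
class Email:
    """Hypothetical plain class, unrelated to SQLAlchemy."""

    def __init__(self, address):
        self.address = address

    def __str__(self):
        return self.address

    # Bind the same function object to a second name.
    __repr__ = __str__


e = Email('obi1@example.com')
print(str(e))   # obi1@example.com
print(repr(e))  # obi1@example.com
print([e])      # [obi1@example.com]  (containers use repr)
```

The last line shows why the alias matters for our models: when a list of Address objects is printed, Python uses repr on each element.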
On the last line, we tell the engine to create tables in the database according to our models.
A deeper understanding of this code would require much more space than I can afford, so I encourage you to read up on database management systems (DBMS), SQL, Relational Algebra, and SQLAlchemy.
Now that we have our models, let's use them to persist some data!
Let's take a look at the following example:
# persistence/alchemy.py
from alchemy_models import Person, Address, engine
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)
session = Session()
First we create a session, which is the object we use to manage the database. Next, we proceed by creating two people:
anakin = Person(name='Anakin Skywalker', age=32)
obi1 = Person(name='Obi-Wan Kenobi', age=40)
We then add email addresses to both of them, using two different techniques. One assigns them to a list, and the other one simply appends them:
obi1.addresses = [
    Address(email='[email protected]'),
    Address(email='[email protected]'),
]

anakin.addresses.append(Address(email='[email protected]'))
anakin.addresses.append(Address(email='[email protected]'))
anakin.addresses.append(Address(email='[email protected]'))
We haven't touched the database yet. It's only when we use the session object that something actually happens in it:
session.add(anakin)
session.add(obi1)
session.commit()
Adding the two Person instances is enough to also add their addresses (this is thanks to the cascading effect). Calling commit is what actually tells SQLAlchemy to commit the transaction and save the data in the database. A transaction is an operation that provides something like a sandbox, but in a database context. As long as the transaction hasn't been committed, we can roll back any modification we have done to the database, and by so doing, revert to the state we were in before starting the transaction itself. SQLAlchemy offers more complex and granular ways to deal with transactions, which you can study in its official documentation, as it is quite an advanced topic.
We now query for all the people whose name starts with Obi by using like, which hooks to the LIKE operator in SQL:
obi1=session.query(Person).filter(
Person.name.like('Obi%')
).first()
print(obi1,obi1.addresses)
Wetakethefirstresultofthatquery(weknowweonlyhaveObi-Wananyway),andprintit.Wethenfetchanakin,byusinganexactmatchonhisname(justtoshowyouadifferentwayoffiltering):
anakin=session.query(Person).filter(
Person.name=='AnakinSkywalker'
).first()
print(anakin,anakin.addresses)
WethencaptureAnakin'sID,anddeletetheanakinobjectfromtheglobalframe:
anakin_id=anakin.id
delanakin
ThereasonwedothisisbecauseIwanttoshowyouhowtofetchanobjectbyitsID.Beforewedothat,wewritethedisplay_infofunction,whichwewillusetodisplaythefullcontentofthedatabase(fetchedstartingfromtheaddresses,inordertodemonstratehowtofetchobjectsbyusingarelationattributeinSQLAlchemy):
def display_info():
    # get all addresses first
    addresses = session.query(Address).all()

    # display results
    for address in addresses:
        print(f'{address.person.name} <{address.email}>')

    # display how many objects we have in total
    print('people: {}, addresses: {}'.format(
        session.query(Person).count(),
        session.query(Address).count())
    )

The display_info function prints all the addresses, along with the respective person's name, and, at the end, produces a final piece of information regarding the number of objects in the database. We call the function, then we fetch and delete anakin (think about Darth Vader and you won't be sad about deleting him), and then we display the info again, to verify he's actually disappeared from the database:
display_info()
anakin = session.query(Person).get(anakin_id)
session.delete(anakin)
session.commit()
display_info()
The output of all these snippets run together is the following (for your convenience, I have separated the output into four blocks, to reflect the four blocks of code that actually produce that output):

$ python alchemy.py
Obi-Wan Kenobi(id=2) [[email protected], [email protected]]

Anakin Skywalker(id=1) [[email protected], [email protected], [email protected]]

Anakin Skywalker <[email protected]>
Anakin Skywalker <[email protected]>
Anakin Skywalker <[email protected]>
Obi-Wan Kenobi <[email protected]>
Obi-Wan Kenobi <[email protected]>
people: 2, addresses: 5

Obi-Wan Kenobi <[email protected]>
Obi-Wan Kenobi <[email protected]>
people: 1, addresses: 2

As you can see from the last two blocks, deleting anakin has deleted one Person object, and the three addresses associated with it. Again, this is due to the fact that cascading took place when we deleted anakin.
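To make the idea of a transaction as a sandbox a bit more tangible, here is a minimal sketch. Please note that it deliberately uses the standard library's sqlite3 module with an in-memory database instead of SQLAlchemy, and that the person table and the count_people helper are made up just for this illustration. It only shows that uncommitted changes can be rolled back, while committed ones cannot:

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()


def count_people():
    """Hypothetical helper: how many rows are in the person table."""
    return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]


# This change is not committed, so it can still be undone.
conn.execute("INSERT INTO person (name) VALUES (?)", ("Anakin Skywalker",))
conn.rollback()
after_rollback = count_people()

# Once committed, a later rollback has nothing left to undo.
conn.execute("INSERT INTO person (name) VALUES (?)", ("Obi-Wan Kenobi",))
conn.commit()
conn.rollback()
after_commit = count_people()

print(after_rollback, after_commit)
```

The first insert vanishes because we rolled back before committing; the second one survives because the commit came first.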
This concludes our brief introduction to data persistence. It is a vast and, at times, complex domain, which I encourage you to explore, learning as much theory as possible. Lack of knowledge or proper understanding, when it comes to database systems, can really bite.
Summary
In this chapter, we have explored working with files and directories. We have learned how to open files for reading and writing and how to do that more elegantly by using context managers. We also explored directories: how to list their content, both recursively and not. We also learned about pathnames, which are the gateway to accessing both files and directories.
We then briefly saw how to create a ZIP archive, and extract its content. The source code of the book also contains an example with a different compression format: tar.gz.
We talked about data interchange formats, and have explored JSON in some depth. We had some fun writing custom encoders and decoders for specific Python data types.
Then we explored IO, both with in-memory streams and HTTP requests.
And finally, we saw how to persist data using pickle, shelve, and the SQLAlchemy ORM library.
You should now have a pretty good idea of how to deal with files and data persistence, and I hope you will take the time to explore these topics in much more depth by yourself.
The next chapter will look at testing, profiling, and dealing with exceptions.
Testing, Profiling, and Dealing with Exceptions
"Just as the wise accepts gold after testing it by heating, cutting and rubbing it, so are my words to be accepted after examining them, but not out of respect for me."
– Buddha
I love this quote by the Buddha. Within the software world, it translates perfectly into the healthy habit of never trusting code just because someone smart wrote it or because it's been working fine for a long time. If it has not been tested, code is not to be trusted.
Why are tests so important? Well, for one, they give you predictability. Or, at least, they help you achieve high predictability. Unfortunately, there is always some bug that sneaks into the code. But we definitely want our code to be as predictable as possible. What we don't want is to have a surprise, in other words, our code behaving in an unpredictable way. Would you be happy to know that the software that checks on the sensors of the plane that is taking you on your holidays sometimes goes crazy? No, probably not.
Therefore, we need to test our code; we need to check that its behavior is correct, that it works as expected when it deals with edge cases, that it doesn't hang when the components it's talking to are broken or unreachable, that the performance is well within the acceptable range, and so on.
This chapter is all about that: making sure that your code is prepared to face the scary outside world, that it's fast enough, and that it can deal with unexpected or exceptional conditions.
In this chapter, we're going to explore the following topics:
Testing (several aspects of it, including a brief introduction to test-driven development)
Exception handling
Profiling and performances
Let's start by understanding what testing is.
Testing your application
There are many different kinds of tests, so many, in fact, that companies often have a dedicated department, called quality assurance (QA), made up of individuals who spend their day testing the software the company developers produce.
To start making an initial classification, we can divide tests into two broad categories: white-box and black-box tests.
White-box tests are those that exercise the internals of the code; they inspect it down to a very fine level of detail. On the other hand, black-box tests are those that consider the software under test as if within a box, the internals of which are ignored. Even the technology, or the language used inside the box, is not important for black-box tests. What they do is plug input into one end of the box and verify the output at the other end; that's it.
There is also an in-between category, called gray-box testing, which involves testing a system in the same way we do with the black-box approach, but having some knowledge about the algorithms and data structures used to write the software and only partial access to its source code.
There are many different kinds of tests in these categories, each of which serves a different purpose. To give you an idea, here are a few:
Frontend tests: Make sure that the client side of your application is exposing the information that it should, all the links, the buttons, the advertising, everything that needs to be shown to the client. It may also verify that it is possible to walk a certain path through the user interface.
Scenario tests: Make use of stories (or scenarios) that help the tester work through a complex problem or test a part of the system.
Integration tests: Verify the behavior of the various components of your application when they are working together sending messages through interfaces.
Smoke tests: Particularly useful when you deploy a new update on your application. They check whether the most essential, vital parts of your application are still working as they should and that they are not on fire. This term comes from when engineers tested circuits by making sure nothing was smoking.
Acceptance tests, or user acceptance testing (UAT): What a developer does with a product owner (for example, in a SCRUM environment) to determine whether the work that was commissioned was carried out correctly.
Functional tests: Verify the features or functionalities of your software.
Destructive tests: Take down parts of your system, simulating a failure, to establish how well the remaining parts of the system perform. These kinds of tests are performed extensively by companies that need to provide an extremely reliable service, such as Amazon and Netflix, for example.
Performance tests: Aim to verify how well the system performs under a specific load of data or traffic so that, for example, engineers can get a better understanding of the bottlenecks in the system that could bring it to its knees in a heavy-load situation, or those that prevent scalability.
Usability tests, and the closely related user experience (UX) tests: Aim to check whether the user interface is simple and easy to understand and use. They aim to provide input to the designers so that the user experience is improved.
Security and penetration tests: Aim to verify how well the system is protected against attacks and intrusions.
Unit tests: Help the developer to write the code in a robust and consistent way, providing the first line of feedback and defense against coding mistakes, refactoring mistakes, and so on.
Regression tests: Provide the developer with useful information about a feature being compromised in the system after an update. Some of the causes for a system being said to have a regression are an old bug coming back to life, an existing feature being compromised, or a new issue being introduced.
Many books and articles have been written about testing, and I have to point you to those resources if you're interested in finding out more about all the different kinds of tests. In this chapter, we will concentrate on unit tests, since they are the backbone of software-crafting and form the vast majority of tests that are written by a developer.
Testing is an art, an art that you don't learn from books, I'm afraid. You can learn all the definitions (and you should), and try to collect as much knowledge about testing as you can, but you will likely be able to test your software properly only when you have done it for long enough in the field.
When you are having trouble refactoring a bit of code, because every little thing you touch makes a test blow up, you learn how to write less rigid and limiting tests, which still verify the correctness of your code but, at the same time, allow you the freedom and joy to play with it, to shape it as you want.
When you are being called too often to fix unexpected bugs in your code, you learn how to write tests more thoroughly, how to come up with a more comprehensive list of edge cases, and strategies to cope with them before they turn into bugs.
When you are spending too much time reading tests and trying to refactor them to change a small feature in the code, you learn to write simpler, shorter, and better-focused tests.
I could go on with this when you... you learn..., but I guess you get the picture. You need to get your hands dirty and build experience. My suggestion? Study the theory as much as you can, and then experiment using different approaches. Also, try to learn from experienced coders; it's very effective.
The anatomy of a test
Before we concentrate on unit tests, let's see what a test is, and what its purpose is.
A test is a piece of code whose purpose is to verify something in our system. It may be that we're calling a function passing two integers, that an object has a property called donald_duck, or that when you place an order on some API, after a minute you can see it dissected into its basic elements, in the database.
A test is typically composed of three sections:
Preparation: This is where you set up the scene. You prepare all the data, the objects, and the services you need in the places you need them so that they are ready to be used.
Execution: This is where you execute the bit of logic that you're checking against. You perform an action using the data and the interfaces you have set up in the preparation phase.
Verification: This is where you verify the results and make sure they are according to your expectations. You check the returned value of a function, or that some data is in the database, some is not, some has changed, a request has been made, something has happened, a method has been called, and so on.
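To give you a feel for these three sections, here is a tiny sketch. The apply_discount function is hypothetical, invented only so that the test has something to check, and the test is written as a plain function with a bare assert:

```python
def apply_discount(price, percent):
    """Hypothetical function under test."""
    return price - price * percent / 100


def test_apply_discount():
    # Preparation: set up the data the test needs.
    price = 200
    percent = 25

    # Execution: run the bit of logic we are checking.
    result = apply_discount(price, percent)

    # Verification: compare the outcome with our expectation.
    assert result == 150


test_apply_discount()
```

In a test this small the three phases are almost trivial, but keeping them visually separate pays off as soon as the preparation grows.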
While tests usually follow this structure, in a test suite, you will typically find some other constructs that take part in the testing game:
Setup: This is something quite commonly found in several different tests. It's logic that can be customized to run for every test, class, module, or even for a whole session. In this phase, developers usually set up connections to databases, maybe populate them with data that will be needed there for the test to make sense, and so on.
Teardown: This is the opposite of the setup; the teardown phase takes place when the tests have been run. Like the setup, it can be customized to run for every test, class or module, or session. Typically in this phase, we destroy any artefacts that were created for the test suite, and clean up after ourselves.
Fixtures: They are pieces of data used in the tests. By using a specific set of fixtures, outcomes are predictable and therefore tests can perform verifications against them.
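Here is a rough sketch of how setup, teardown, and fixture data can look in practice. To keep it self-contained, I'm using the standard library's unittest module here, with its setUp and tearDown hooks; the class name and the file content are made up for the example:

```python
import os
import shutil
import tempfile
import unittest


class TestScratchDirectory(unittest.TestCase):

    def setUp(self):
        # Setup: runs before every test method in this class.
        self.workdir = tempfile.mkdtemp()
        # Fixture data: a known file the tests can rely on.
        self.path = os.path.join(self.workdir, "greeting.txt")
        with open(self.path, "w") as stream:
            stream.write("hello")

    def tearDown(self):
        # Teardown: runs after every test method, pass or fail.
        shutil.rmtree(self.workdir)

    def test_file_content(self):
        with open(self.path) as stream:
            self.assertEqual(stream.read(), "hello")


# Run the test case programmatically so the sketch works as a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestScratchDirectory)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the temporary directory is created and destroyed around each test, every test starts from the same predictable state.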
In this chapter, we will use the pytest Python library. It is an incredibly powerful tool that makes testing much easier and provides plenty of helpers so that the test logic can focus more on the actual testing than the wiring around it. You will see, when we get to the code, that one of the characteristics of pytest is that fixtures, setup, and teardown often blend into one.
Testing guidelines
Like software, tests can be good or bad, with a whole range of shades in the middle. To write good tests, here are some guidelines:
Keep them as simple as possible. It's okay to violate some good coding rules, such as hardcoding values or duplicating code. Tests need, first and foremost, to be as readable as possible and easy to understand. When tests are hard to read or understand, you can never be confident they are actually making sure your code is performing correctly.
Tests should verify one thing and one thing only. It's very important that you keep them short and contained. It's perfectly fine to write multiple tests to exercise a single object or function. Just make sure that each test has one and only one purpose.
Tests should not make any unnecessary assumption when verifying data. This is tricky to understand at first, but it is important. Verifying that the result of a function call is [1, 2, 3] is not the same as saying the output is a list that contains the numbers 1, 2, and 3. In the former, we're also assuming the ordering; in the latter, we're only assuming which items are in the list. The differences sometimes are quite subtle, but they are still very important.
Tests should exercise the what, rather than the how. Tests should focus on checking what a function is supposed to do, rather than how it is doing it. For example, focus on the fact that it's calculating the square root of a number (the what), instead of on the fact that it is calling math.sqrt to do it (the how). Unless you're writing performance tests or you have a particular need to verify how a certain action is performed, try to avoid this type of testing and focus on the what. Testing the how leads to restrictive tests and makes refactoring hard. Moreover, the type of test you have to write when you concentrate on the how is more likely to degrade the quality of your testing codebase when you amend your software frequently.
Tests should use the minimal set of fixtures needed to do the job. This is another crucial point. Fixtures have a tendency to grow over time. They also tend to change every now and then. If you use big amounts of fixtures and ignore redundancies in your tests, refactoring will take longer. Spotting bugs will be harder. Try to use a set of fixtures that is big enough for the test to perform correctly, but not any bigger.
Tests should run as fast as possible. A good test codebase could end up being much longer than the code being tested itself. It varies according to the situation and the developer, but, whatever the length, you'll end up having hundreds, if not thousands, of tests to run, which means the faster they run, the faster you can get back to writing code. When using TDD, for example, you run tests very often, so speed is essential.
Tests should use up the least possible amount of resources. The reason for this is that every developer who checks out your code should be able to run your tests, no matter how powerful their box is. It could be a skinny virtual machine or a neglected Jenkins box; your tests should run without chewing up too many resources.
A Jenkins box is a machine that runs Jenkins, software that is capable of, among many other things, running your tests automatically. Jenkins is frequently used in companies where developers use practices such as continuous integration and extreme programming.
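The guideline about unnecessary assumptions is perhaps the easiest one to get wrong, so here is a small sketch of it. The unique_tags function is hypothetical; the point is only that its contract says nothing about ordering, so the test shouldn't either:

```python
def unique_tags(text):
    """Hypothetical function: return the distinct words found in `text`."""
    return list(set(text.split()))


def test_unique_tags():
    result = unique_tags("red green red blue")

    # Too strict: `assert result == ['red', 'green', 'blue']` would also
    # pin down an ordering that the function never promised.

    # Better: verify only which items are in the result.
    assert sorted(result) == ["blue", "green", "red"]


test_unique_tags()
```

If one day the implementation changes and starts returning the tags in a different order, this test keeps passing, which is exactly what we want.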
Unit testing
Now that you have an idea about what testing is and why we need it, let's introduce the developer's best friend: the unit test.
Before we proceed with the examples, allow me to share some words of caution: I'll try to give you the fundamentals about unit testing, but I don't follow any particular school of thought or methodology to the letter. Over the years, I have tried many different testing approaches, eventually coming up with my own way of doing things, which is constantly evolving. To put it as Bruce Lee would have: "Absorb what is useful, discard what is useless and add what is specifically your own."
Writing a unit test
Unit tests take their name after the fact that they are used to test small units of code. To explain how to write a unit test, let's take a look at a simple snippet:
# data.py
def get_clean_data(source):
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data
The get_clean_data function is responsible for getting data from source, cleaning it, and returning it to the caller. How do we test this function?
One way of doing this is to call it and then make sure that load_data was called once with source as its only argument. Then we have to verify that clean_data was called once, with the return value of load_data. And, finally, we would need to make sure that the return value of clean_data is what is returned by the get_clean_data function as well.
To do this, we need to set up the source and run this code, and this may be a problem. One of the golden rules of unit testing is that anything that crosses the boundaries of your application needs to be simulated. We don't want to talk to a real data source, and we don't want to actually run real functions if they are communicating with anything that is not contained in our application. A few examples would be a database, a search service, an external API, and a file in the filesystem.
We need these restrictions to act as a shield, so that we can always run our tests safely without the fear of destroying something in a real data source.
Another reason is that it may be quite difficult for a single developer to reproduce the whole architecture on their box. It may require the setting up of databases, APIs, services, files and folders, and so on and so forth, and this can be difficult, time-consuming, or sometimes not even possible.
Very simply put, an application programming interface (API) is a set of tools for building software applications. An API expresses a software component in terms of its operations, input and output, and underlying types. For example, if you create software that needs to interface with a data provider service, it's very likely that you will have to go through their API in order to gain access to the data.
Therefore, in our unit tests, we need to simulate all those things in some way. Unit tests need to be run by any developer without the need for the whole system to be set up on their box.
A different approach, which I always favor when it's possible to do so, is to simulate entities without using fake objects, but using special-purpose test objects instead. For example, if your code talks to a database, instead of faking all the functions and methods that talk to the database and programming the fake objects so that they return what the real ones would, I'd much rather spawn a test database, set up the tables and data I need, and then patch the connection settings so that my tests are running real code, against the test database, thereby doing no harm at all. In-memory databases are excellent options for these cases.
One of the applications that allow you to spawn a database for testing is Django. Within the django.test package, you can find several tools that help you write your tests so that you won't have to simulate the dialog with a database. By writing tests this way, you will also be able to check on transactions, encodings, and all other database-related aspects of programming. Another advantage of this approach consists in the ability of checking against things that can change from one database to another.
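As a rough idea of what running real code against a throwaway database can look like, here is a sketch that uses the standard library's sqlite3 module with an in-memory database. It is not Django's machinery, and the users table, save_user, and count_users are all hypothetical names made up for the example:

```python
import sqlite3


def save_user(conn, email):
    """Hypothetical code under test: store one user."""
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()


def count_users(conn):
    """Hypothetical helper used by the test to inspect the database."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_save_user():
    # Spawn a test database that lives only in memory.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT NOT NULL)")
    try:
        # Real code, real SQL, but no real data source is touched.
        save_user(conn, "user@example.com")
        assert count_users(conn) == 1
    finally:
        # Closing the connection discards the whole database.
        conn.close()


test_save_user()
```

Passing the connection in as an argument is what makes it easy to point the code at the test database; in a real project you would more likely patch the connection settings, as described above.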
Sometimes, though, it's still not possible, and we need to use fakes, so let's talk about them.
Mock objects and patching
First of all, in Python, these fake objects are called mocks. Up to Python 3.3, the mock library was a third-party library that basically every project would install via pip but, from Python 3.3, it has been included in the standard library under the unittest module (as unittest.mock), and rightfully so, given its importance and how widespread it is.
The act of replacing a real object or function (or in general, any piece of data structure) with a mock is called patching. The mock library provides the patch tool, which can act as a function or class decorator, and even as a context manager that you can use to mock things out. Once you have replaced everything you don't need to run with suitable mocks, you can pass to the second phase of the test and run the code you are exercising. After the execution, you will be able to check those mocks to verify that your code has worked correctly.
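Here is a sketch of the checks described earlier for get_clean_data, written with patch used as a context manager. To keep the example self-contained, I have wrapped the three functions in a hypothetical DataPipeline class and used patch.object; with module-level functions you would normally pass patch the dotted path of the name to replace instead:

```python
from unittest.mock import patch


class DataPipeline:
    """Hypothetical wrapper around the functions from data.py."""

    def load_data(self, source):
        raise RuntimeError("would talk to a real data source")

    def clean_data(self, data):
        raise RuntimeError("not needed in this test")

    def get_clean_data(self, source):
        data = self.load_data(source)
        cleaned_data = self.clean_data(data)
        return cleaned_data


def test_get_clean_data():
    # Replace both collaborators with mocks for the duration of the block.
    with patch.object(DataPipeline, "load_data") as load_mock, \
            patch.object(DataPipeline, "clean_data") as clean_mock:
        load_mock.return_value = "raw data"
        clean_mock.return_value = "clean data"

        result = DataPipeline().get_clean_data("some-source")

    # load_data was called once, with source as its only argument.
    load_mock.assert_called_once_with("some-source")
    # clean_data was called once, with the return value of load_data.
    clean_mock.assert_called_once_with("raw data")
    # The return value of clean_data is what we get back.
    assert result == "clean data"


test_get_clean_data()
```

Notice that the real load_data and clean_data never run: if they did, the RuntimeError would make the test blow up.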
Assertions
The verification phase is done through the use of assertions. An assertion is a function (or method) that you can use to verify equality between objects, as well as other conditions. When a condition is not met, the assertion will raise an exception that will make your test fail. You can find a list of assertions in the unittest module documentation; however, when using pytest, you will typically use the generic assert statement, which makes things even simpler.
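If you have never seen what a failing assert statement does, this minimal sketch shows it in plain Python, with no testing library involved. The check function and the message text are made up for the example:

```python
def check(value):
    # The optional message after the comma ends up in the exception.
    assert value == 42, f"expected 42, got {value}"


# A passing assertion does nothing at all.
check(42)

# A failing assertion raises AssertionError.
try:
    check(41)
except AssertionError as err:
    message = str(err)

print(message)
```

A test runner relies on exactly this behavior: a test that raises AssertionError is reported as failed, and one that doesn't is reported as passed.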
Testing a CSV generator
Let's now adopt a practical approach. I will show you how to test a piece of code, and we will touch on the rest of the important concepts around unit testing, within the context of this example.
We want to write an export function that does the following: it takes a list of dictionaries, each of which represents a user. It creates a CSV file, puts a header in it, and then proceeds to add all the users who are deemed valid according to some rules. The export function also takes a filename, which will be the name for the CSV in output. And, finally, it takes an indication on whether to allow an existing file with the same name to be overwritten.
As for the users, they must abide by the following: each user has at least an email, a name, and an age. There can be a fourth field representing the role, but it's optional. The user's email address needs to be valid, the name needs to be non-empty, and the age must be an integer between 18 and 65.
This is our task, so now I'm going to show you the code, and then we're going to analyze the tests I wrote for it. But, first things first, in the following code snippets, I'll be using two third-party libraries: marshmallow and pytest. They both are in the requirements of the book's source code, so make sure you have installed them with pip.
marshmallow is a wonderful library that provides us with the ability to serialize and deserialize objects and, most importantly, gives us the ability to define a schema that we can use to validate a user dictionary. pytest is one of the best pieces of software I have ever seen. It is used everywhere now, and has replaced other tools such as nose, for example. It provides us with great tools to write beautiful short tests.
But let's get to the code. I called it api.py just because it exposes a function that we can use to do things. I'll show it to you in chunks:
# api.py
import os
import csv
from copy import deepcopy

from marshmallow import Schema, fields, pre_load
from marshmallow.validate import Length, Range


class UserSchema(Schema):
    """Represent a *valid* user. """

    email = fields.Email(required=True)
    name = fields.String(required=True, validate=Length(min=1))
    age = fields.Integer(
        required=True, validate=Range(min=18, max=65)
    )
    role = fields.String()

    @pre_load(pass_many=False)
    def strip_name(self, data):
        data_copy = deepcopy(data)

        try:
            data_copy['name'] = data_copy['name'].strip()
        except (AttributeError, KeyError, TypeError):
            pass

        return data_copy


schema = UserSchema()
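This first part is where we import all the modules we need (os, csv, and deepcopy from copy), and some tools from marshmallow, and then we define the schema for the users. As you can see, we inherit from marshmallow.Schema, and then we set four fields. Notice we are using two String fields, plus Email and Integer. These will already provide us with some validation from marshmallow. Notice there is no required=True in the role field.
We need to add a couple of custom bits, though. For the age, we pass a Range validator to make sure the value is within the range we want; marshmallow reports a validation error in case it's not. And marshmallow will kindly take care of raising an error should we pass anything but an integer.
Next, we constrain name with Length(min=1), because the fact that a name key in the dictionary is there doesn't guarantee that the name is actually non-empty. On its own, that validator would still accept a name made of whitespace only, so we also add strip_name, which is decorated with pre_load and therefore runs before validation. It works on a deep copy of the data, so the caller's dictionary is left untouched, and it strips all leading and trailing whitespace characters from the name. If the result is empty, the Length validator rejects it. If the name is missing, or isn't a string, we leave the data as it is and let field validation deal with it. Notice we don't need to add a custom validator for the email field. This is because marshmallow will validate it, and a valid email cannot be empty.
We then instantiate schema, so that we can use it to validate data. So let's write the export function: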
# api.py
def export(filename, users, overwrite=True):
    """Export a CSV file.

    Create a CSV file and fill with valid users. If `overwrite`
    is False and file already exists, raise IOError.
    """
    if not overwrite and os.path.isfile(filename):
        raise IOError(f"'{filename}' already exists.")

    valid_users = get_valid_users(users)
    write_csv(filename, valid_users)
As you see, its internals are quite straightforward. If overwrite is False and the file already exists, we raise IOError with a message saying the file already exists. Otherwise, if we can proceed, we simply get the list of valid users and feed it to write_csv, which is responsible for actually doing the job. Let's see how all these functions are defined:
# api.py
def get_valid_users(users):
    """Yield one valid user at a time from users. """
    yield from filter(is_valid, users)


def is_valid(user):
    """Return whether or not the user is valid. """
    return not schema.validate(user)
Turns out I coded get_valid_users as a generator, as there is no need to make a potentially big list in order to put it in a file. We can validate and save them one by one. The heart of validation is simply a delegation to schema.validate, which uses the validation engine provided by marshmallow. The way this works is by returning a dictionary, which is empty if validation succeeded, or else it will contain error information. We don't really care about collecting the error information for this task, so we simply ignore it, and within is_valid we basically return True if the return value from schema.validate is empty, and False otherwise.
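The trick of treating an empty dictionary of errors as success deserves a quick illustration. In this sketch, validate is a hand-written, hypothetical stand-in that only mimics the shape of what schema.validate returns (it is not marshmallow, and it checks just two of the rules); is_valid and get_valid_users mirror the ones in api.py:

```python
def validate(user):
    """Hypothetical stand-in: return a dict of errors, empty if all is well."""
    errors = {}
    if not str(user.get("name", "")).strip():
        errors["name"] = ["Name cannot be empty."]
    age = user.get("age")
    if not isinstance(age, int) or not 18 <= age <= 65:
        errors["age"] = ["Age must be an integer between 18 and 65."]
    return errors


def is_valid(user):
    # An empty dict is falsy, so `not {}` is True.
    return not validate(user)


def get_valid_users(users):
    # Lazily yield valid users one at a time; no intermediate list is built.
    yield from filter(is_valid, users)


users = [
    {"name": "Primus", "age": 18},
    {"name": "   ", "age": 30},
    {"name": "Maximus", "age": 65},
]
valid_names = [user["name"] for user in get_valid_users(users)]
print(valid_names)
```

Only the first and last users make it through; the one with a whitespace-only name produces a non-empty error dictionary and is filtered out.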
One last piece is missing; here it is:

# api.py
def write_csv(filename, users):
    """Write a CSV given a filename and a list of users.

    The users are assumed to be valid for the given CSV structure.
    """
    fieldnames = ['email', 'name', 'age', 'role']

    with open(filename, 'x', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for user in users:
            writer.writerow(user)
Again, the logic is straightforward. We define the header in fieldnames, then we open filename for writing, and we specify newline='', which is recommended in the documentation when dealing with CSV files. When the file has been created, we get a writer object by using the csv.DictWriter class. The beauty of this tool is that it is capable of mapping the user dictionaries to the field names, so we don't need to take care of the ordering.
We write the header first, and then we loop over the users and add them one by one. Notice, this function assumes it is fed a list of valid users, and it may break if that assumption is false (with the default values, it would break if any user dictionary had extra fields).
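If you want to see both behaviors of csv.DictWriter without touching the filesystem, here is a small sketch that writes to an in-memory io.StringIO stream instead of a file. The user data, including the planet key, is made up for the example:

```python
import csv
import io

fieldnames = ["email", "name", "age", "role"]
stream = io.StringIO(newline="")
writer = csv.DictWriter(stream, fieldnames=fieldnames)
writer.writeheader()

# Keys in a different order, and no 'role': DictWriter maps each value
# to the right column and leaves the missing one empty.
writer.writerow({"age": 18, "name": "Primus", "email": "primus@example.com"})
output = stream.getvalue()

# With the default settings, a key that is not in fieldnames is an error.
try:
    writer.writerow(
        {"email": "x@example.com", "name": "X", "age": 30, "planet": "Mars"}
    )
except ValueError:
    extra_field_rejected = True
else:
    extra_field_rejected = False

print(output.splitlines(), extra_field_rejected)
```

This is the reason write_csv can afford to ignore the ordering of the keys, and also the reason it must be fed dictionaries that contain no unexpected fields.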
That's the whole code you have to keep in mind. I suggest you spend a moment to go through it again. There is no need to memorize it, and the fact that I have used small helper functions with meaningful names will enable you to follow the testing along more easily.
Let's now get to the interesting part: testing our export function. Once again, I'll show you the code in chunks:
# tests/test_api.py
import os
from unittest.mock import patch, mock_open, call
import pytest
from ..api import is_valid, export, write_csv
Let's start from the imports: we need os, some mocking tools from unittest.mock (patch, mock_open, and call), then pytest, and, finally, we use a relative import to fetch the three functions that we want to actually test: is_valid, export, and write_csv.
Before we can write tests, though, we need to make a few fixtures. As you will see, a fixture is a function that is decorated with the pytest.fixture decorator. In most cases, we expect a fixture to return something, so that we can use it in a test. We have some requirements for a user dictionary, so let's write a couple of users: one with minimal requirements, and one with full requirements. Both need to be valid. Here is the code:
# tests/test_api.py
@pytest.fixture
def min_user():
    """Represent a valid user with minimal data. """
    return {
        'email': '[email protected]',
        'name': 'Primus Minimus',
        'age': 18,
    }


@pytest.fixture
def full_user():
    """Represent valid user with full data. """
    return {
        'email': '[email protected]',
        'name': 'Maximus Plenus',
        'age': 65,
        'role': 'emperor',
    }
In this example, the only difference is the presence of the role key, but it's enough to show you the point, I hope. Notice that instead of simply declaring dictionaries at a module level, we actually have written two functions that return a dictionary, and we have decorated them with the pytest.fixture decorator. This is because when you declare a dictionary at module-level, which is supposed to be used in your tests, you need to make sure you copy it at the beginning of every test. If you don't, you may have a test that modifies it, and this will affect all tests that follow it, compromising their integrity.
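The danger of a shared, module-level dictionary is easy to reproduce without any testing library. In this sketch, MODULE_USER, make_user, and the two pretend tests are all hypothetical; make_user plays the role that a fixture function plays, building a fresh dictionary on every call:

```python
# Shared state: every test that imports this sees the very same object.
MODULE_USER = {"name": "Primus", "age": 18}


def make_user():
    """Build a brand-new dictionary on every call, like a fixture would."""
    return {"name": "Primus", "age": 18}


def test_that_mutates():
    MODULE_USER["age"] = 99


def test_that_reads():
    return MODULE_USER["age"]


# The second test sees the change made by the first one.
test_that_mutates()
leaked_age = test_that_reads()

# With a factory, each caller gets its own independent copy.
first = make_user()
first["age"] = 99
second = make_user()

print(leaked_age, second["age"])
```

The value that leaks from one test into the next is exactly the kind of problem that makes a test suite pass or fail depending on the order in which the tests run.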
By using these fixtures, pytest will give us a new dictionary every test run, so we don't need to go through that pain ourselves. Notice that if a fixture returns another type, instead of dict, then that is what you will get in the test. Fixtures also are composable, which means they can be used in one another, which is a very powerful feature of pytest. To show you this, let's write a fixture for a list of users, in which we put the two we already have, plus one that would fail validation because it has no age. Let's take a look at the following code:
# tests/test_api.py
@pytest.fixture
def users(min_user, full_user):
    """List of users, two valid and one invalid. """
    bad_user = {
        'email': '[email protected]',
        'name': 'Horribilis',
    }
    return [min_user, bad_user, full_user]
Nice. So, now we have two users that we can use individually, but also we have a list of three users. The first round of tests will be testing how we are validating a user. We will group all the tests for this task within a class. This not only helps giving related tests a namespace, a place to be, but, as we'll see later on, it allows us to declare class-level fixtures, which are defined just for the tests belonging to the class. Take a look at this code:
# tests/test_api.py
class TestIsValid:
    """Test how code verifies whether a user is valid or not. """

    def test_minimal(self, min_user):
        assert is_valid(min_user)

    def test_full(self, full_user):
        assert is_valid(full_user)
We start very simply by making sure our fixtures are actually passing validation. This is very important, as those fixtures will be used everywhere, so we want them to be perfect. Next, we test the age. Two things to notice here: I will not repeat the class signature, so the code that follows is indented by four spaces and it's because these are all methods within the same class, okay? And, second, we're going to use parametrization quite heavily.
Parametrization is a technique that enables us to run the same test multiple times, but feeding different data to it. It is very useful, as it allows us to write the test only once with no repetition, and the result will be very intelligently handled by pytest, which will run all those tests as if they were actually separate, thus providing us with clear error messages when they fail. If you parametrize manually, you lose this feature, and believe me you won't be happy. Let's see how we test the age:
# tests/test_api.py
    @pytest.mark.parametrize('age', range(18))
    def test_invalid_age_too_young(self, age, min_user):
        min_user['age'] = age
        assert not is_valid(min_user)
Right, so we start by writing a test to check that validation fails when the user is too young. According to our rule, a user is too young when they are younger than 18. We check for every age between 0 and 17, by using range.
If you take a look at how the parametrization works, you'll see we declare the name of an object, which we then pass to the signature of the method, and then we specify which values this object will take. For each value, the test will be run once. In the case of this first test, the object's name is age, and the values are all those returned by range(18), which means all integer numbers from 0 to 17 are included. Notice how we feed age to the test method, right after self, and then we do something else, which is also very interesting. We pass this method a fixture: min_user. This has the effect of activating that fixture for the test run, so that we can use it, and can refer to it from within the test. In this case, we simply change the age within the min_user dictionary, and then we verify that the result of is_valid(min_user) is False.
We do this last bit by asserting on the fact that not False is True. In pytest, this is how you check for something. You simply assert that something is truthy. If that is the case, the test has succeeded. Should it instead be the opposite, the test would fail.
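To get an intuition for what "for each value, the test will be run once" means, here is a very rough, hand-rolled imitation in plain Python. It is only a conceptual stand-in: pytest does far more than this (collection, fixture injection, reporting), and is_valid_age, run_case, and the outcomes dictionary are hypothetical names made up for the sketch:

```python
def is_valid_age(age):
    """Hypothetical check mirroring the rule: an integer between 18 and 65."""
    return isinstance(age, int) and 18 <= age <= 65


def run_case(age):
    """One independent test run: it passes if the age is rejected."""
    assert not is_valid_age(age)


# Run the same test body once per value, recording each outcome separately
# so that a failure for one age doesn't hide the results for the others.
outcomes = {}
for age in range(18):
    try:
        run_case(age)
        outcomes[age] = "passed"
    except AssertionError:
        outcomes[age] = "failed"

print(len(outcomes), set(outcomes.values()))
```

Keeping one outcome per value is the important part: a naive loop with a bare assert inside a single test would stop at the first failing age and tell you nothing about the rest.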
Let's proceed and add all the tests needed to make validation fail on the age:

# tests/test_api.py
    @pytest.mark.parametrize('age', range(66, 100))
    def test_invalid_age_too_old(self, age, min_user):
        min_user['age'] = age
        assert not is_valid(min_user)

    @pytest.mark.parametrize('age', ['NaN', 3.1415, None])
    def test_invalid_age_wrong_type(self, age, min_user):
        min_user['age'] = age
        assert not is_valid(min_user)
So, another two tests. One takes care of the other end of the spectrum, from 66 years of age to 99. And the second one instead makes sure that age is invalid when it's not an integer number, so we pass some values, such as a string, a float, and None, just to make sure. Notice how the structure of the test is basically always the same, but, thanks to the parametrization, we feed very different input arguments to it.
Now that we have the age-failing all sorted out, let's add a test that actually checks the age is within the valid range:
# tests/test_api.py
    @pytest.mark.parametrize('age', range(18, 66))
    def test_valid_age(self, age, min_user):
        min_user['age'] = age
        assert is_valid(min_user)
It's as easy as that. We pass the correct range, from 18 to 65, and remove the not in the assertion. Notice how all tests start with the test_ prefix, and have a different name.
We can consider the age as being taken care of. Let's move on to write tests on mandatory fields:
# tests/test_api.py
    @pytest.mark.parametrize('field', ['email', 'name', 'age'])
    def test_mandatory_fields(self, field, min_user):
        min_user.pop(field)
        assert not is_valid(min_user)

    @pytest.mark.parametrize('field', ['email', 'name', 'age'])
    def test_mandatory_fields_empty(self, field, min_user):
        min_user[field] = ''
        assert not is_valid(min_user)

    def test_name_whitespace_only(self, min_user):
        min_user['name'] = '\n\t'
        assert not is_valid(min_user)
The previous three tests still belong to the same class. The first one tests whether a user is invalid when one of the mandatory fields is missing. Notice that at every test run, the min_user fixture is restored, so we only have one missing field per test run, which is the appropriate way to check for mandatory fields. We simply pop the key out of the dictionary. This time the parametrization object takes the name field, and, by looking at the first test, you see all the mandatory fields in the parametrization decorator: email, name, and age.
In the second one, things are a little different. Instead of popping keys out, we simply set them (one at a time) to the empty string. Finally, in the third one, we check for the name to be made of whitespace only.
The previous tests take care of mandatory fields being there and being non-empty, and of the formatting around the name key of a user. Good. Let's now write the last two tests for this class. We want to check email validity, and type for email, name, and the role:
# tests/test_api.py
    @pytest.mark.parametrize(
        'email, outcome',
        [
            ('missing_at.com', False),
            ('@missing_start.com', False),
            ('missing_end@', False),
            ('missing_dot@example', False),
            ('[email protected]', True),
            ('δοκιμή@παράδειγμα.δοκιμή', True),
            ('аджай@экзампл.рус', True),
        ]
    )
    def test_email(self, email, outcome, min_user):
        min_user['email'] = email
        assert is_valid(min_user) == outcome
This time, the parametrization is slightly more complex. We define two objects (email and outcome), and then we pass a list of tuples, instead of a simple list, to the decorator. What happens is that each time the test is run, one of those tuples will be unpacked to fill the values of email and outcome, respectively. This allows us to write one test for both valid and invalid email addresses, instead of two separate ones. We define an email address, and we specify the outcome we expect from validation. The first four are invalid email addresses, but the last three are actually valid. I have used a couple of examples with Unicode, just to make sure we're not forgetting to include our friends from all over the world in the validation.
Notice how the validation is done: we assert that the result of the call matches the outcome we have set.
Let's now write a simple test to make sure validation fails when we feed the wrong type to the fields (again, the age has been taken care of separately before):
# tests/test_api.py
    @pytest.mark.parametrize(
        'field, value',
        [
            ('email', None),
            ('email', 3.1415),
            ('email', {}),
            ('name', None),
            ('name', 3.1415),
            ('name', {}),
            ('role', None),
            ('role', 3.1415),
            ('role', {}),
        ]
    )
    def test_invalid_types(self, field, value, min_user):
        min_user[field] = value
        assert not is_valid(min_user)
As we did before, just for fun, we pass three different values, none of which is actually a string. This test could be expanded to include more values, but, honestly, we shouldn't need to write tests such as this one. I have included it here just to show you what's possible.
Before we move to the next test class, let me talk about something we have seen when we were checking the age.

Boundaries and granularity
While checking for the age, we have written three tests to cover the three ranges: 0-17 (fail), 18-65 (success), 66-99 (fail). Why did we do this? The answer lies in the fact that we are dealing with two boundaries: 18 and 65. So our testing needs to focus on the three regions those two boundaries define: before 18, between 18 and 65, and after 65. How you do it is not crucial, as long as you make sure you test the boundaries correctly. This means if someone changes the validation in the schema from 18 <= value <= 65 to 18 <= value < 65 (notice the missing =), there must be a test that fails on the 65.
This concept is known as a boundary, and it's very important that you recognize boundaries in your code so that you can test against them.
Another important thing to understand is which zoom level we want when getting close to the boundaries. In other words, which unit should I use to move around it? In the case of age, we're dealing with integers, so a unit of 1 will be the perfect choice (which is why we used 16, 17, 18, 19, 20, ...). But what if you were testing for a timestamp? Well, in that case, the correct granularity will likely be different. If the code has to act differently according to your timestamp and that timestamp represents seconds, then the granularity of your tests should zoom down to seconds. If the timestamp represents years, then years should be the unit you use. I hope you get the picture. This concept is known as granularity, and needs to be combined with that of boundaries, so that by going around the boundaries with the correct granularity, you can make sure your tests are not leaving anything to chance.
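To make the idea concrete, here is a minimal sketch of boundary checks with a granularity of 1. The is_valid_age function and the MIN_AGE and MAX_AGE constants are hypothetical stand-ins for the age rule in the schema, not the code used in this chapter:

```python
# Hypothetical validator standing in for the schema's age rule.
MIN_AGE, MAX_AGE = 18, 65


def is_valid_age(age):
    """Return True only for integer ages within [MIN_AGE, MAX_AGE]."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        return False
    return MIN_AGE <= age <= MAX_AGE


# The two boundaries, plus the values one unit away on either side.
boundary_cases = {
    17: False,
    18: True,
    19: True,
    64: True,
    65: True,
    66: False,
}

for age, expected in boundary_cases.items():
    assert is_valid_age(age) is expected, age
```

If someone changed the comparison to MIN_AGE <= age < MAX_AGE, the case for 65 would fail, which is exactly the protection described above.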
Let's now continue with our example, and test the export function.

Testing the export function
In the same test module, I have defined another class that represents a test suite for the export function. Here it is:
# tests/test_api.py
class TestExport:

    @pytest.fixture
    def csv_file(self, tmpdir):
        yield tmpdir.join("out.csv")

    @pytest.fixture
    def existing_file(self, tmpdir):
        existing = tmpdir.join('existing.csv')
        existing.write('Please leave me alone...')
        yield existing
Let's start understanding the fixtures. We have defined them at class-level this time, which means they will be alive only for as long as the tests in the class are running. We don't need these fixtures outside of this class, so it doesn't make sense to declare them at a module level like we've done with the user ones.
So, we need two files. If you recall what I wrote at the beginning of this chapter, when it comes to interaction with databases, disks, networks, and so on, we should mock everything out. However, when possible, I prefer to use a different technique. In this case, I will employ temporary folders, which will be born within the fixture, and die within it, leaving no trace of their existence. I am much happier if I can avoid mocking. Mocking is amazing, but it can be tricky, and a source of bugs, unless it's done correctly.
Now, the first fixture, csv_file, defines a managed context in which we obtain a reference to a temporary folder. We can consider the logic up to and including the yield, as the setup phase. The fixture itself, in terms of data, is represented by the temporary filename. The file itself is not present yet. When a test runs, the fixture is created, and at the end of the test, the rest of the fixture code (the one after yield, if any) is executed. That part can be considered the teardown phase. In this case, it consists of exiting the context manager, which means the temporary folder is deleted (along with all its content). You can put much more in each phase of any fixture, and with experience, I'm sure you'll master the art of doing setup and teardown this way. It actually comes very naturally quite quickly.
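The setup, yield, teardown shape of a fixture can be imitated in plain Python. What follows is only an analogy built on contextlib and tempfile from the standard library, not how pytest implements fixtures; the csv_path name is made up for this sketch:

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def csv_path():
    """Yield a path inside a temporary folder that is removed afterwards."""
    # Setup: everything up to and including the yield.
    with tempfile.TemporaryDirectory() as folder:
        yield os.path.join(folder, 'out.csv')
    # Teardown: leaving the with block deletes the folder and its content.


with csv_path() as path:
    folder = os.path.dirname(path)
    assert not os.path.exists(path)  # only a name so far, no file yet
    with open(path, 'w') as stream:
        stream.write('email,name,age,role\n')
    assert os.path.exists(path)

assert not os.path.exists(folder)  # the teardown has run
```

The code after the yield plays the role of the teardown phase, just as it does in a fixture.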
The second fixture is very similar to the first one, but we'll use it to test that we can prevent overwriting when we call export with overwrite=False. So we create a file in the temporary folder, and we put some content into it, just to have the means to verify it hasn't been touched.
Notice how both fixtures are returning the filename with the full path information, to make sure we actually use the temporary folder in our code. Let's now see the tests:
# tests/test_api.py
    def test_export(self, users, csv_file):
        export(csv_file, users)

        lines = csv_file.readlines()

        assert [
            'email,name,age,role\n',
            '[email protected],Primus Minimus,18,\n',
            '[email protected],Maximus Plenus,65,emperor\n',
        ] == lines
This test employs the users and csv_file fixtures, and immediately calls export with them. We expect that a file has been created, and populated with the two valid users we have (remember the list contains three users, but one is invalid).
To verify that, we open the temporary file, and collect all its lines into a list. We then compare the content of the file with a list of the lines that we expect to be in it. Notice we only put the header, and the two valid users, in the correct order.
Now we need another test, to make sure that if there is a comma in one of the values, our CSV is still generated correctly. Being a comma-separated values (CSV) file, we need to make sure that a comma in the data doesn't break things up:
# tests/test_api.py
    def test_export_quoting(self, min_user, csv_file):
        min_user['name'] = 'A name, with a comma'

        export(csv_file, [min_user])

        lines = csv_file.readlines()
        assert [
            'email,name,age,role\n',
            '[email protected],"A name, with a comma",18,\n',
        ] == lines
This time, we don't need the whole users list, we just need one as we're testing a specific thing, and we have the previous test to make sure we're generating the file correctly with all the users. Remember, always try to minimize the work you do within a test.
So, we use min_user, and put a nice comma in its name. We then repeat the procedure, which is very similar to that of the previous test, and finally we make sure that the name is put in the CSV file surrounded by double quotes. This is enough for any good CSV parser to understand that they don't have to break on the comma inside the double quotes.
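The quoting behavior can be observed with the csv module from the standard library, independently of the export function. This is a small sketch; the row below, including its placeholder email address, is invented for the illustration:

```python
import csv
import io

# Hypothetical row; the e-mail address is a placeholder.
row = ['user@example.com', 'A name, with a comma', 18, '']

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['email', 'name', 'age', 'role'])
writer.writerow(row)

# The field that contains a comma is wrapped in double quotes.
lines = buffer.getvalue().splitlines()
print(lines[1])

# Reading the data back keeps the name in one piece.
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed[1])
```

Note that the reader hands every value back as a string, so the age comes out as '18' rather than 18.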
Now I want one more test, which needs to check that when the file exists and we don't want to overwrite it, our code won't touch it:
# tests/test_api.py
    def test_does_not_overwrite(self, users, existing_file):
        with pytest.raises(IOError) as err:
            export(existing_file, users, overwrite=False)

        assert err.match(
            r"'{}' already exists\.".format(existing_file)
        )

        # let's also verify the file is still intact
        assert existing_file.read() == 'Please leave me alone...'
This is a beautiful test, because it allows me to show you how you can tell pytest that you expect a function call to raise an exception. We do it in the context manager given to us by pytest.raises, to which we feed the exception we expect from the call we make inside the body of that context manager. If the exception is not raised, the test will fail.
I like to be thorough in my test, so I don't want to stop there. I also assert on the message, by using the convenient err.match helper (watch out, it takes a regular expression, not a simple string; we'll see regular expressions in Chapter 14, Web Development).
Finally, let's make sure that the file still contains its original content (which is why I created the existing_file fixture) by opening it, and comparing all of its content to the string it should be.
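For comparison, the unittest module offers assertRaisesRegex, which plays a similar role to pytest.raises combined with match. The sketch below is self-contained: the export function in it is a simplified, hypothetical stand-in written for this illustration, not the one from this chapter:

```python
import os
import re
import tempfile
import unittest


def export(filename, rows, overwrite=True):
    """Hypothetical, simplified stand-in for the chapter's export."""
    if not overwrite and os.path.exists(filename):
        raise IOError(f"'{filename}' already exists.")
    with open(filename, 'w') as stream:
        stream.writelines(rows)


class TestDoesNotOverwrite(unittest.TestCase):

    def test_does_not_overwrite(self):
        with tempfile.TemporaryDirectory() as folder:
            existing = os.path.join(folder, 'existing.csv')
            with open(existing, 'w') as stream:
                stream.write('Please leave me alone...')

            # re.escape protects against regex metacharacters in the path.
            pattern = re.escape(f"'{existing}' already exists.")
            with self.assertRaisesRegex(IOError, pattern):
                export(existing, ['a,b\n'], overwrite=False)

            # The file must still hold its original content.
            with open(existing) as stream:
                self.assertEqual(stream.read(), 'Please leave me alone...')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDoesNotOverwrite)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Escaping the whole message with re.escape is a defensive choice: a path may contain characters, such as dots or backslashes, that have a special meaning in a regular expression.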
Final considerations
Before we move on to the next topic, let me just wrap up with some considerations.
First, I hope you have noticed that I haven't tested all the functions I wrote. Specifically, I didn't test get_valid_users, validate, and write_csv. The reason is because these functions are implicitly tested by our test suite. We have tested is_valid and export, which is more than enough to make sure our schema is validating users correctly, and the export function is dealing with filtering out invalid users correctly, respecting existing files when needed, and writing a proper CSV. The functions we haven't tested are the internals, they provide logic that participates in doing something that we have thoroughly tested anyway. Would adding extra tests for those functions be good or bad? Think about it for a moment.
The answer is actually difficult. The more you test, the less you can refactor that code. As it is now, I could easily decide to call is_valid with another name, and I wouldn't have to change any of my tests. If you think about it, it makes sense, because as long as is_valid provides correct validation to the get_valid_users function, I don't really need to know about it. Does this make sense to you?
If instead I had tests for the validate function, then I would have to change them, if I decided to call it differently (or to somehow change its signature).
So, what is the right thing to do? Tests or no tests? It will be up to you. You have to find the right balance. My personal take on this matter is that everything needs to be thoroughly tested, either directly or indirectly. And I want the smallest possible test suite that guarantees me that. This way, I will have a great test suite in terms of coverage, but not any bigger than necessary. You need to maintain those tests!
I hope this example made sense to you, I think it has allowed me to touch on the important topics.
If you check out the source code for the book, in the test_api.py module, I have added a couple of extra test classes, which will show you how different testing would have been had I decided to go all the way with the mocks. Make sure you read that code and understand it well. It is quite straightforward and will offer you a good comparison with my personal approach, which I have shown you here.
Now, how about we run those tests? (The output is re-arranged to fit this book's format):
$ pytest tests
====================== test session starts ======================
platform darwin -- Python 3.7.0b2, pytest-3.5.0, py-1.5.3, ...
rootdir: /Users/fab/srv/lpp/ch8, inifile:
collected 132 items

tests/test_api.py ...............................................
.................................................................
.................... [100%]

================== 132 passed in 0.41 seconds ===================
Make sure you run $ pytest tests from within the ch8 folder (add the -vv flag for a verbose output that will show you how parametrization modifies the names of your tests). As you can see, 132 tests were run in less than half a second, and they all succeeded. I strongly suggest you check out this code and play with it. Change something in the code and see whether any test is breaking. Understand why it is breaking. Is it something important that means the test isn't good enough? Or is it something silly that shouldn't cause the test to break? All these apparently innocuous questions will help you gain deep insight into the art of testing.
I also suggest you study the unittest module, and pytest too. These are tools you will use all the time, so you need to be very familiar with them.
Let's now check out test-driven development!

Test-driven development
Let's talk briefly about test-driven development (TDD). It is a methodology that was rediscovered by Kent Beck, who wrote Test-Driven Development by Example, Addison Wesley, 2002, which I encourage you to check out if you want to learn about the fundamentals of this subject.
TDD is a software development methodology that is based on the continuous repetition of a very short development cycle.
First, the developer writes a test, and makes it run. The test is supposed to check a feature that is not yet part of the code. Maybe it is a new feature to be added, or something to be removed or amended. Running the test will make it fail and, because of this, this phase is called Red.
When the test has failed, the developer writes the minimal amount of code to make it pass. When running the test succeeds, we have the so-called Green phase. In this phase, it is okay to write code that cheats, just to make the test pass. This technique is called fake it 'till you make it. Later, tests are enriched with different edge cases, and the cheating code then has to be rewritten with proper logic. Adding other test cases is called triangulation.
The last piece of the cycle is where the developer takes care of both the code and the tests (at separate times) and refactors them until they are in the desired state. This last phase is called Refactor.
The TDD mantra therefore is Red-Green-Refactor.
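Here is a deliberately tiny sketch of faking it and triangulation. The add and add_fake functions, and the two test cases, are invented for this illustration and are not part of the book's code:

```python
def add_fake(a, b):
    # Green by cheating: enough to satisfy the first test case only.
    return 4


def add(a, b):
    # Proper logic, written once triangulation exposes the fake.
    return a + b


# Each case pairs the call arguments with the expected result.
# The second case is the triangulation step.
cases = [((2, 2), 4), ((3, 5), 8)]


def passes(func, case):
    args, expected = case
    return func(*args) == expected


fake_results = [passes(add_fake, case) for case in cases]
real_results = [passes(add, case) for case in cases]

print(fake_results)  # [True, False]
print(real_results)  # [True, True]
```

With a single case the fake is indistinguishable from the real thing; the second case is what forces the proper implementation.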
At first, it feels really weird to write tests before the code, and I must confess it took me a while to get used to it. If you stick to it, though, and force yourself to learn this slightly counter-intuitive way of working, at some point something almost magical happens, and you will see the quality of your code increase in a way that wouldn't be possible otherwise.
When you write your code before the tests, you have to take care of what the code has to do and how it has to do it, both at the same time. On the other hand, when you write tests before the code, you can concentrate on the what part alone, while you write them. When you write the code afterward, you will mostly have to take care of how the code has to implement what is required by the tests. This shift in focus allows your mind to concentrate on the what and how parts in separate moments, yielding a brain power boost that will surprise you.
There are several other benefits that come from the adoption of this technique:
You will refactor with much more confidence: Tests will break if you introduce bugs. Moreover, the architectural refactor will also benefit from having tests that act as guardians.
The code will be more readable: This is crucial in our time, when coding is a social activity and every professional developer spends much more time reading code than writing it.
The code will be more loosely coupled and easier to test and maintain: Writing the tests first forces you to think more deeply about code structure.
Writing tests first requires you to have a better understanding of the business requirements: If your understanding of the requirements is lacking information, you'll find writing a test extremely challenging and this situation acts as a sentinel for you.
Having everything unit tested means the code will be easier to debug: Moreover, small tests are perfect for providing alternative documentation. English can be misleading, but five lines of Python in a simple test are very hard to misunderstand.
Higher speed: It's faster to write tests and code than it is to write the code first and then lose time debugging it. If you don't write tests, you will probably deliver the code sooner, but then you will have to track the bugs down and solve them (and, rest assured, there will be bugs). The combined time taken to write the code and then debug it is usually longer than the time taken to develop the code with TDD, where having tests running before the code is written ensures that the number of bugs in it will be much lower than in the other case.
On the other hand, the main shortcomings of this technique are the following ones:
The whole company needs to believe in it: Otherwise, you will have to constantly argue with your boss, who will not understand why it takes you so long to deliver. The truth is, it may take you a bit longer to deliver in the short-term, but in the long-term, you gain a lot with TDD. However, it is quite hard to see the long-term because it's not under our noses like the short-term is. I have fought battles with stubborn bosses in my career, to be able to code using TDD. Sometimes it has been painful, but always well worth it, and I have never regretted it because, in the end, the quality of the result has always been appreciated.
If you fail to understand the business requirements, this will reflect in the tests you write, and therefore it will reflect in the code too: This kind of problem is quite hard to spot until you do UAT, but one thing that you can do to reduce the likelihood of it happening is to pair with another developer. Pairing will inevitably require discussions about the business requirements, and discussion will bring clarification, which will help writing correct tests.
Badly written tests are hard to maintain: This is a fact. Tests with too many mocks or with extra assumptions or badly-structured data will soon become a burden. Don't let this discourage you; just keep experimenting and change the way you write them until you find a way that doesn't require a huge amount of work every time you touch your code.
I'm quite passionate about TDD. When I interview for a job, I always ask whether the company adopts it. I encourage you to check it out and use it. Use it until you feel something clicking in your mind. You won't regret it, I promise.

Exceptions
Even though I haven't formally introduced them to you, by now I expect you to at least have a vague idea of what an exception is. In the previous chapters, we've seen that when an iterator is exhausted, calling next on it raises a StopIteration exception. We met IndexError when we tried accessing a list at a position that was outside the valid range. We also met AttributeError when we tried accessing an attribute on an object that didn't have it, and KeyError when we did the same with a key and a dictionary.
Now the time has come for us to talk about exceptions.
Sometimes, even though an operation or a piece of code is correct, there are conditions in which something may go wrong. For example, if we're converting user input from string to int, the user could accidentally type a letter in place of a digit, making it impossible for us to convert that value into a number. When dividing numbers, we may not know in advance whether we're attempting a division by zero. When opening a file, it could be missing or corrupted.
When an error is detected during execution, it is called an exception. Exceptions are not necessarily lethal; in fact, we've seen that StopIteration is deeply integrated in the Python generator and iterator mechanisms. Normally, though, if you don't take the necessary precautions, an exception will cause your application to break. Sometimes, this is the desired behavior, but in other cases, we want to prevent and control problems such as these. For example, we may alert the user that the file they're trying to open is corrupted or that it is missing so that they can either fix it or provide another file, without the need for the application to die because of this issue. Let's see an example of a few exceptions:
# exceptions/first.example.py
>>> gen = (n for n in range(2))
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> print(undefined_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'undefined_name' is not defined
>>> mylist = [1, 2, 3]
>>> mylist[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> mydict = {'a': 'A', 'b': 'B'}
>>> mydict['c']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'c'
>>> 1 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
As you can see, the Python shell is quite forgiving. We can see Traceback, so that we have information about the error, but the program doesn't die. This is a special behavior: a regular program or a script would normally die if nothing were done to handle exceptions.
To handle an exception, Python gives you the try statement. When you enter the try clause, Python will watch out for one or more different types of exceptions (according to how you instruct it), and if they are raised, it will allow you to react. The try statement is composed of the try clause, which opens the statement, one or more except clauses (all optional) that define what to do when an exception is caught, an else clause (optional), which is executed when the try clause is exited without any exception raised, and a finally clause (optional), whose code is executed regardless of whatever happened in the other clauses. The finally clause is typically used to clean up resources (we saw this in Chapter 7, Files and Data Persistence, when we were opening files without using a context manager).
Mind the order: it's important. Also, try must be followed by at least one except clause or a finally clause. Let's see an example:
# exceptions/try.syntax.py
def try_syntax(numerator, denominator):
    try:
        print(f'In the try block: {numerator}/{denominator}')
        result = numerator / denominator
    except ZeroDivisionError as zde:
        print(zde)
    else:
        print('The result is:', result)
        return result
    finally:
        print('Exiting')

print(try_syntax(12, 4))
print(try_syntax(11, 0))
The preceding example defines a simple try_syntax function. We perform the division of two numbers. We are prepared to catch a ZeroDivisionError exception if we call the function with denominator=0. Initially, the code enters the try block. If denominator is not 0, result is calculated and the execution, after leaving the try block, resumes in the else block. We print result and return it. Take a look at the output and you'll notice that just before returning result, which is the exit point of the function, Python executes the finally clause.
When denominator is 0, things change. We enter the except block and print zde. The else block isn't executed because an exception was raised in the try block. Before (implicitly) returning None, we still execute the finally block. Take a look at the output and see whether it makes sense to you:
$ python try.syntax.py
In the try block: 12/4  # try
The result is: 3.0      # else
Exiting                 # finally
3.0                     # return within else
In the try block: 11/0  # try
division by zero        # except
Exiting                 # finally
None                    # implicit return end of function
When you execute a try block, you may want to catch more than one exception. For example, when trying to decode a JSON object, you may run into ValueError for malformed JSON, or TypeError if the type of the data you're feeding to json.loads() is not a string. In this case, you may structure your code like this:
# exceptions/json.example.py
import json
json_data = '{}'

try:
    data = json.loads(json_data)
except (ValueError, TypeError) as e:
    print(type(e), e)
This code will catch both ValueError and TypeError. Try changing json_data = '{}' to json_data = 2 or json_data = '{{', and you'll see the different output.
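If you would rather see the three cases side by side, the following sketch wraps the same try statement in a helper. The describe function is made up for this illustration:

```python
import json


def describe(json_data):
    """Return the name of the exception raised by json.loads, if any."""
    try:
        json.loads(json_data)
    except (ValueError, TypeError) as e:
        return type(e).__name__
    return 'no exception'


for candidate in ('{}', 2, '{{'):
    print(repr(candidate), '->', describe(candidate))
```

The malformed string is reported as JSONDecodeError, which the json module defines as a subclass of ValueError, and that is why the ValueError in the except clause catches it.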
If you want to handle multiple exceptions differently, you can just add more except clauses, like this:
# exceptions/multiple.except.py
try:
    # some code
except Exception1:
    # react to Exception1
except (Exception2, Exception3):
    # react to Exception2 or Exception3
except Exception4:
    # react to Exception4
...
Keep in mind that an exception is handled in the first block that defines that exception class or any of its bases. Therefore, when you stack multiple except clauses like we've just done, make sure that you put specific exceptions at the top and generic ones at the bottom. In OOP terms, children on top, grandparents at the bottom. Moreover, remember that only one except handler is executed when an exception is raised.
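The effect of the ordering is easy to observe with built-in exceptions, since LookupError is the base class of both KeyError and IndexError. The two lookup functions below are hypothetical helpers written only to show the difference:

```python
def lookup_generic_first(container, key):
    try:
        return container[key]
    except LookupError:  # base class of KeyError and IndexError
        return 'handled by LookupError'
    except KeyError:     # never reached: the clause above wins
        return 'handled by KeyError'


def lookup_specific_first(container, key):
    try:
        return container[key]
    except KeyError:
        return 'handled by KeyError'
    except LookupError:
        return 'handled by LookupError'


print(lookup_generic_first({}, 'missing'))
print(lookup_specific_first({}, 'missing'))
print(lookup_specific_first([1, 2], 5))
```

Python does not warn about the unreachable clause in the first function, so getting the order right is up to you.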
You can also write custom exceptions. To do that, you just have to inherit from any other exception class. Python's built-in exceptions are too many to be listed here, so I have to point you to the official documentation. One important thing to know is that every Python exception derives from BaseException, but your custom exceptions should never inherit directly from it. The reason is because handling such an exception will also trap system-exiting exceptions, such as SystemExit and KeyboardInterrupt, which derive from BaseException, and this could lead to severe issues. In the case of disaster, you want to be able to Ctrl + C your way out of an application.
You can easily solve the problem by inheriting from Exception, which inherits from BaseException but doesn't include any system-exiting exception in its children because they are siblings in the built-in exceptions hierarchy (see https://docs.python.org/3/library/exceptions.html#exception-hierarchy).
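A short sketch can confirm where these classes sit in the hierarchy. ValidationError, run, invalid, and interrupted are names invented for this example:

```python
class ValidationError(Exception):  # hypothetical custom exception
    pass


def run(action):
    """Call action, recovering only from Exception and its subclasses."""
    try:
        action()
    except Exception as e:
        return f'recovered from {type(e).__name__}'
    return 'ok'


def invalid():
    raise ValidationError('bad input')


def interrupted():
    raise KeyboardInterrupt  # simulates the user pressing Ctrl + C


# The custom exception is caught by the except Exception handler...
print(run(invalid))

# ...while the system-exiting exceptions are siblings of Exception.
assert issubclass(ValidationError, Exception)
assert not issubclass(KeyboardInterrupt, Exception)
assert not issubclass(SystemExit, Exception)
assert issubclass(KeyboardInterrupt, BaseException)
```

Calling run(interrupted) would let the KeyboardInterrupt propagate to the caller, which is the behavior you want when someone tries to stop the application.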
Programming with exceptions can be very tricky. You could inadvertently silence errors, or trap exceptions that aren't meant to be handled. Play it safe by keeping in mind a few guidelines: always put in the try clause only the code that may cause the exception(s) that you want to handle. When you write except clauses, be as specific as you can, don't just resort to except Exception because it's easy. Use tests to make sure your code handles edge cases in a way that requires the least possible amount of exception handling. Writing an except statement without specifying any exception would catch any exception, therefore exposing your code to the same risks you incur when you derive your custom exceptions from BaseException.
You will find information about exceptions almost everywhere on the web. Some coders use them abundantly, others sparingly. Find your own way of dealing with them by taking examples from other people's source code. There are plenty of interesting open source projects on websites such as GitHub (https://github.com) and Bitbucket (https://bitbucket.org/).
Before we talk about profiling, let me show you an unconventional use of exceptions, just to give you something to help you expand your views on them. They are not just simply errors:
# exceptions/for.loop.py
n = 100
found = False
for a in range(n):
    if found: break
    for b in range(n):
        if found: break
        for c in range(n):
            if 42 * a + 17 * b + c == 5096:
                found = True
                print(a, b, c)  # 79 99 95
The preceding code is quite a common idiom if you deal with numbers. You have to iterate over a few nested ranges and look for a particular combination of a, b, and c that satisfies a condition. In the example, condition is a trivial linear equation, but imagine something much cooler than that. What bugs me is having to check whether the solution has been found at the beginning of each loop, in order to break out of them as fast as we can when it is. The breakout logic interferes with the rest of the code and I don't like it, so I came up with a different solution for this. Take a look at it, and see whether you can adapt it to other cases too:
# exceptions/for.loop.py
class ExitLoopException(Exception):
    pass

try:
    n = 100
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if 42 * a + 17 * b + c == 5096:
                    raise ExitLoopException(a, b, c)
except ExitLoopException as ele:
    print(ele)  # (79, 99, 95)
Can you see how much more elegant it is? Now the breakout logic is entirely handled with a simple exception whose name even hints at its purpose. As soon as the result is found, we raise it, and immediately the control is given to the except clause that handles it. This is food for thought. This example indirectly shows you how to raise your own exceptions. Read up on the official documentation to dive into the beautiful details of this subject.
Moreover, if you are up for a challenge, you might want to try to make this last example into a context manager for nested for loops. Good luck!

Profiling Python
There are a few different ways to profile a Python application. Profiling means having the application run while keeping track of several different parameters, such as the number of times a function is called and the amount of time spent inside it. Profiling can help us find the bottlenecks in our application, so that we can improve only what is really slowing us down.
If you take a look at the profiling section in the standard library official documentation, you will see that there are a couple of different implementations of the same profiling interface, profile and cProfile:
cProfile is recommended for most users, it's a C extension with reasonable overhead that makes it suitable for profiling long-running programs
profile is a pure Python module whose interface is imitated by cProfile, but which adds significant overhead to profiled programs
This interface does deterministic profiling, which means that all function calls, function returns, and exception events are monitored, and precise timings are made for the intervals between these events. Another approach, called statistical profiling, randomly samples the effective instruction pointer, and deduces where time is being spent.
The latter usually involves less overhead, but provides only approximate results. Moreover, because of the way the Python interpreter runs the code, deterministic profiling doesn't add as much overhead as one would think, so I'll show you a simple example using cProfile from the command line.
We're going to calculate Pythagorean triples (I know, you've missed them...) using the following code:
# profiling/triples.py
def calc_triples(mx):
    triples = []
    for a in range(1, mx + 1):
        for b in range(a, mx + 1):
            hypotenuse = calc_hypotenuse(a, b)
            if is_int(hypotenuse):
                triples.append((a, b, int(hypotenuse)))
    return triples

def calc_hypotenuse(a, b):
    return (a**2 + b**2) ** .5

def is_int(n):  # n is expected to be a float
    return n.is_integer()

triples = calc_triples(1000)
The script is extremely simple; we iterate over the interval [1, mx] with a and b (avoiding repetition of pairs by setting b >= a) and we check whether they belong to a right triangle. We use calc_hypotenuse to get hypotenuse for a and b, and then, with is_int, we check whether it is an integer, which means (a, b, c) is a Pythagorean triple. When we profile this script, we get information in a tabular form. The columns are ncalls, tottime, percall, cumtime, percall, and filename:lineno(function). They represent the number of calls we made to a function, how much time we spent in it, and so on. I'll trim a couple of columns to save space, so if you run the profiling yourself, don't worry if you get a different result. Here is the result:
$ python -m cProfile triples.py
1502538 function calls in 0.704 seconds

Ordered by: standard name

ncalls  tottime  percall  filename:lineno(function)
500500    0.393    0.000  triples.py:17(calc_hypotenuse)
500500    0.096    0.000  triples.py:21(is_int)
     1    0.000    0.000  triples.py:4(<module>)
     1    0.176    0.176  triples.py:4(calc_triples)
     1    0.000    0.000  {built-in method builtins.exec}
  1034    0.000    0.000  {method 'append' of 'list' objects}
     1    0.000    0.000  {method 'disable' of '_lsprof.Profil...
500500    0.038    0.000  {method 'is_integer' of 'float' objects}
Even with this limited amount of data, we can still infer some useful information about this code. First, we can see that the time complexity of the algorithm we have chosen grows with the square of the input size. The number of times we get inside the inner loop body is exactly mx (mx + 1) / 2. We run the script with mx = 1000, which means we get 500500 times inside the inner for loop. Three main things happen inside that loop: we call calc_hypotenuse, we call is_int, and, if the condition is met, we append the triple to the triples list.
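The count can be checked directly. For each a, the inner loop runs mx - a + 1 times, and summing that over a from 1 to mx gives mx (mx + 1) / 2. The two helper functions below are written just for this check:

```python
def inner_loop_count(mx):
    """Count how many times the inner loop body of calc_triples runs."""
    count = 0
    for a in range(1, mx + 1):
        for b in range(a, mx + 1):
            count += 1
    return count


def closed_form(mx):
    return mx * (mx + 1) // 2


print(inner_loop_count(1000))  # 500500
print(closed_form(1000))       # 500500
```

This matches the 500500 calls reported for calc_hypotenuse and is_int in the profiling output.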
Taking a look at the profiling report, we notice that the algorithm has spent 0.393 seconds inside calc_hypotenuse, which is way more than the 0.096 seconds spent inside is_int, given that they were called the same number of times, so let's see whether we can boost calc_hypotenuse a little.
As it turns out, we can. As I mentioned earlier in this book, the ** power operator is quite expensive, and in calc_hypotenuse, we're using it three times. Fortunately, we can easily transform two of those into simple multiplications, like this:
def calc_hypotenuse(a, b):
    return (a*a + b*b) ** .5
This simple change should improve things. If we run the profiling again, we see that 0.393 is now down to 0.137. Not bad! This means now we're spending only about 35% of the time inside calc_hypotenuse that we were before (0.137 / 0.393 is roughly 0.35).
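If you want a quick comparison without running the whole profiler, the timeit module from the standard library can time the two variants in isolation. This is only a rough sketch: the function names are made up, and the figures you get will depend on your machine and Python version, so no expected timings are shown:

```python
import timeit


def hypotenuse_pow(a, b):
    # Original version: three uses of the ** operator.
    return (a**2 + b**2) ** .5


def hypotenuse_mul(a, b):
    # Revised version: two of them replaced by multiplications.
    return (a*a + b*b) ** .5


# Both versions must agree before their speed is compared.
assert hypotenuse_pow(3, 4) == hypotenuse_mul(3, 4) == 5.0

for func in (hypotenuse_pow, hypotenuse_mul):
    seconds = timeit.timeit(lambda: func(3, 4), number=100_000)
    print(f'{func.__name__}: {seconds:.3f}s')
```

Checking that the two functions return the same value first is a small safeguard: an optimization that changes the result is a bug, not an improvement.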
Let's see whether we can improve is_int as well, by changing it, like this:
def is_int(n):
    return n == int(n)
This implementation is different, and the advantage is that it also works when n is an integer. Alas, when we run the profiling against it, we see that the time taken inside the is_int function has gone up to 0.135 seconds, so, in this case, we need to revert to the previous implementation. You will find the three versions in the source code for the book.
This example was trivial, of course, but enough to show you how one could profile an application. Having the number of calls that are performed against a function helps us better understand the time complexity of our algorithms. For example, you wouldn't believe how many coders fail to see that those two for loops run proportionally to the square of the input size.
One thing to mention: depending on what system you're using, results may be different. Therefore, it's quite important to be able to profile software on a system that is as close as possible to the one the software is deployed on, if not actually on that one.
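Besides the command line, cProfile can also be driven from code, with pstats used to sort and print the report. The sketch below is one possible way to do it, under a few assumptions of mine: the helper functions are inlined into calc_triples and the input is reduced to 100 to keep the run short, so the numbers will not match the ones shown earlier:

```python
import cProfile
import io
import pstats


def calc_triples(mx):
    triples = []
    for a in range(1, mx + 1):
        for b in range(a, mx + 1):
            hypotenuse = (a*a + b*b) ** .5
            if hypotenuse.is_integer():
                triples.append((a, b, int(hypotenuse)))
    return triples


profiler = cProfile.Profile()
profiler.enable()
triples = calc_triples(100)  # smaller input than in the chapter
profiler.disable()

# Collect the report in a string, sorted by time spent in each function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('tottime').print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by tottime puts the functions where most time is actually spent at the top, which is usually where you want to start looking.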

When to profile?
Profiling is super cool, but we need to know when it is appropriate to do it, and in what measure we need to address the results we get from it.
Donald Knuth once said, "premature optimization is the root of all evil," and, although I wouldn't have put it down so drastically, I do agree with him. After all, who am I to disagree with the man who gave us The Art of Computer Programming, TeX, and some of the coolest algorithms I have ever studied when I was a university student?
So, first and foremost: correctness. You want your code to deliver the correct results, therefore write tests, find edge cases, and stress your code in every way you think makes sense. Don't be protective, don't put things in the back of your brain for later because you think they're not likely to happen. Be thorough.
Second, take care of coding best practices. Remember the following: readability, extensibility, loose coupling, modularity, and design. Apply OOP principles: encapsulation, abstraction, single responsibility, open/closed, and so on. Read up on these concepts. They will open horizons for you, and they will expand the way you think about code.
Third, refactor like a beast! The Boy Scouts rule says:
"Always leave the campground cleaner than you found it."
Apply this rule to your code.
And, finally, when all of this has been taken care of, then and only then, take care of optimizing and profiling.
Run your profiler and identify bottlenecks. When you have an idea of the bottlenecks you need to address, start with the worst one first. Sometimes, fixing a bottleneck causes a ripple effect that will expand and change the way the rest of the code works. Sometimes this is only a little, sometimes a bit more, according to how your code was designed and implemented. Therefore, start with the biggest issue first.
One of the reasons Python is so popular is that it is possible to implement it in many different ways. So, if you find yourself having trouble boosting up some part of your code using sheer Python, nothing prevents you from rolling up your sleeves, buying 200 liters of coffee, and rewriting the slow piece of code in C, which is guaranteed to be fun!

Summary
In this chapter, we explored the world of testing, exceptions, and profiling.
I tried to give you a fairly comprehensive overview of testing, especially unit testing, which is the kind of testing that a developer mostly does. I hope I have succeeded in channeling the message that testing is not something that is perfectly defined that you can learn from a book. You need to experiment with it a lot before you get comfortable. Of all the efforts a coder must make in terms of study and experimentation, I'd say testing is the one that is the most important.
We briefly saw how we can prevent our program from dying because of errors, called exceptions, that happen at runtime. And, to steer away from the usual ground, I have given you an example of a somewhat unconventional use of exceptions to break out of nested for loops. That's not the only case, and I'm sure you'll discover others as you grow as a coder.
At the end, we very briefly touched on profiling, with a simple example and a few guidelines. I wanted to talk about profiling for the sake of completeness, so at least you can play around with it.
In the next chapter, we're going to explore the wonderful world of secrets, hashing, and creating tokens.
I am aware that I gave you a lot of pointers in this chapter, with no links or directions. I'm afraid this was by choice. As a coder, there won't be a single day at work when you won't have to look something up in a documentation page, in a manual, on a website, and so on. I think it's vital for a coder to be able to search effectively for the information they need, so I hope you'll forgive me for this extra training. After all, it's all for your benefit.
Cryptography and Tokens
"Three may keep a Secret, if two of them are dead."
– Benjamin Franklin, Poor Richard's Almanack
In this short chapter, I am going to give you a brief overview of the cryptographic services offered by the Python standard library. I am also going to touch upon something called JSON Web Token, which is a very interesting standard to represent claims securely between two parties.
In particular, we are going to explore the following:
Hashlib
Secrets
HMAC
JSON Web Tokens with PyJWT, which seems to be the most popular Python library for dealing with JWTs
Let's start by spending a moment talking about cryptography and why it is so important.
The need for cryptography
According to the statistics you can find all over the web, the estimated number of smartphone users in 2019 will be around 2.5 billion. Each of those people knows the PIN to unlock their phone, the credentials to log in to applications we all use to do, well, basically everything, from buying food to finding a street, from sending a message to a friend, to seeing if our bitcoin wallet has increased in value since we last checked 10 seconds ago.
If you are an application developer, you have to take security very, very seriously. It doesn't matter how small or apparently insignificant your application is: security should always be a concern for you.
Security in information technology is achieved by employing several different means, but by far, the most important one is cryptography. Everything you do with your computer or phone should include a layer where cryptography takes place (and if not, that's really bad). It is used to pay online with a credit card, to transfer messages over the network in a way that even if someone intercepts them, they won't be able to read them, and it is used to encrypt your files when you back them up in the cloud (because you do, right?). The list of examples is endless.
Now, the purpose of this chapter is not to teach you the difference between hashing and encryption, as I could write a whole other book on the subject. Rather, it is to show you how you can use the tools that Python offers you to create digests, tokens, and in general, to be on the safe(r) side when you need to implement something cryptography-related.
Useful guidelines
Always remember the following rules:
Rule number one: Do not attempt to create your own hash or encryption functions. Simply don't. Use tools and functions that are there already. It is incredibly tough to come up with a good, solid, robust algorithm to do hashing or encryption, so it's best to leave it to professional cryptographers.
Rule number two: Follow rule number one.
Those are the only two rules you need. Apart from them, it is very useful to understand cryptography, so you need to try and learn as much as you can about this subject. There is plenty of information on the web, but for your convenience, I'll put some useful references at the end of this chapter.
Now, let's dig into the first of the standard library modules I want to show you: hashlib.
Hashlib
This module exposes a common interface to many different secure hash and message digest algorithms. The difference in those two terms is simply historical: older algorithms were called digests, while the modern algorithms are called hashes.
In general, a hash function is any function that can be used to map data of an arbitrary size to data of a fixed size. It is a one-way function (strictly speaking it is not encryption, since nothing is ever meant to be decrypted): it is not expected to be possible to recover the message given its hash.
There are several algorithms that can be used to calculate a hash, so let's see how to find out which ones are supported by your system (note, your results might be different than mine):
>>> import hashlib
>>> hashlib.algorithms_available
{'SHA512', 'SHA256', 'shake_256', 'sha3_256', 'ecdsa-with-SHA1',
 'DSA-SHA', 'sha1', 'sha384', 'sha3_224', 'whirlpool', 'mdc2',
 'RIPEMD160', 'shake_128', 'MD4', 'dsaEncryption', 'dsaWithSHA',
 'SHA1', 'blake2s', 'md5', 'sha', 'sha224', 'SHA', 'MD5',
 'sha256', 'SHA384', 'sha3_384', 'md4', 'SHA224', 'MDC2',
 'sha3_512', 'sha512', 'blake2b', 'DSA', 'ripemd160'}
>>> hashlib.algorithms_guaranteed
{'blake2s', 'md5', 'sha224', 'sha3_512', 'shake_256', 'sha3_256',
 'shake_128', 'sha256', 'sha1', 'sha512', 'blake2b', 'sha3_384',
 'sha384', 'sha3_224'}
By opening a Python shell, we can get the list of available algorithms for our system. If our application has to talk to third-party applications, it's always best to pick an algorithm out of those guaranteed, though, as that means every platform actually supports them. Notice that a lot of them start with sha, which means secure hash algorithm. Let's keep going in the same shell: we are going to create a hash for the binary string b'Hash me now!', and we're going to do it in two ways:
>>> h = hashlib.blake2b()
>>> h.update(b'Hash me')
>>> h.update(b' now!')
>>> h.hexdigest()
'56441b566db9aafcf8cdad3a4729fa4b2bfaab0ada36155ece29f52ff70e1e9d'
'7f54cacfe44bc97c7e904cf79944357d023877929430bc58eb2dae168e73cedf'
>>> h.digest()
b'VD\x1bVm\xb9\xaa\xfc\xf8\xcd\xad:G)\xfaK+\xfa\xab\n\xda6\x15^'
b'\xce)\xf5/\xf7\x0e\x1e\x9d\x7fT\xca\xcf\xe4K\xc9|~\x90L\xf7'
b'\x99D5}\x028w\x92\x940\xbcX\xeb-\xae\x16\x8es\xce\xdf'
>>> h.block_size
128
>>> h.digest_size
64
>>> h.name
'blake2b'
We have used the blake2b cryptographic function, which is quite sophisticated and was added in Python 3.6. After creating the hash object h, we update its message in two steps. Not that we need to, but sometimes we need to hash data that is not available all at once, so it's good to know we can do it in steps.
When the message is like we want it to be, we get the hex representation of the digest. This will use two characters per byte (as each character represents 4 bits, which is half a byte). We also get the byte representation of the digest, and then we inspect its details: it has a block size (the internal block size of the hash algorithm in bytes) of 128 bytes, a digest size (the size of the resulting hash in bytes) of 64 bytes, and a name. Could all this be done in one simpler line? Yes, of course:
>>> hashlib.blake2b(b'Hash me now!').hexdigest()
'56441b566db9aafcf8cdad3a4729fa4b2bfaab0ada36155ece29f52ff70e1e9d'
'7f54cacfe44bc97c7e904cf79944357d023877929430bc58eb2dae168e73cedf'
Notice how the same message produces the same hash, which of course is expected.
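To make the point about hashing in steps a little more concrete, here is a minimal sketch that feeds a message to blake2b a few bytes at a time and checks that the result matches hashing it in one go. The helper name hash_in_chunks and the chunk size are made up for this illustration; the same pattern is what you would typically use when reading a large file piece by piece.

```python
import hashlib


def hash_in_chunks(data, chunk_size=4):
    """Hash `data` by feeding it to blake2b `chunk_size` bytes at a time.

    Hypothetical helper, written only for this illustration.
    """
    h = hashlib.blake2b()
    for start in range(0, len(data), chunk_size):
        h.update(data[start:start + chunk_size])
    return h.hexdigest()


message = b'Hash me now!'
one_shot = hashlib.blake2b(message).hexdigest()
chunked = hash_in_chunks(message)

# Feeding the message in pieces gives the same digest as hashing it at once.
print(chunked == one_shot)
```

How the message is split does not matter: only the concatenation of everything passed to update is hashed.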
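Let's see what we get if, instead of the blake2b function, we use sha256:
>>> hashlib.sha256(b'Hash me now!').hexdigest()
'10d561fa94a89a25ea0c7aa47708bdb353bbb062a17820292cd905a3a60d6783'
The resulting hash is shorter: 256 bits, rather than the 512 bits that blake2b produces by default. A shorter digest leaves an attacker a smaller space to search, although sha256 is still widely considered secure for general use.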
Hashing is a very interesting topic, and of course the simple examples we've seen so far are just the start. The blake2b function allows us a great deal of flexibility in terms of customization. This is extremely useful to prevent some kinds of attacks (for the full explanation of those threats, please do refer to the standard documentation for the hashlib module at https://docs.python.org/3.7/library/hashlib.html). Let's see another example where we customize a hash by adding a key, a salt, and a person (a personalization string). All of this extra information will cause the hash to be different than the one we would get if we didn't provide it, and it is crucial in adding extra security to the data handled in our system:
>>> h = hashlib.blake2b(
...     b'Important payload', digest_size=16, key=b'secret-key',
...     salt=b'random-salt', person=b'fabrizio'
... )
>>> h.hexdigest()
'c2d63ead796d0d6d734a5c3c578b6e41'
The resulting hash is only 16 bytes long. Among the customization parameters, salt is probably the most famous one. It is random data that is used as an additional input to a one-way function that hashes data. It is commonly stored alongside the resulting hash, in order to provide the means to recover the same hash given the same message.
If you want to make sure you hash a password properly, you can use pbkdf2_hmac, a key derivation algorithm that allows you to specify a salt and also the number of iterations used by the algorithm itself. As computers get more and more powerful, it is important to increase the number of iterations we do over time, otherwise the likelihood of a successful brute-force attack on our data increases as time passes. Here's how you would use such an algorithm (your output will differ, since the salt is random):
>>> import os
>>> dk = hashlib.pbkdf2_hmac(
...     'sha256', b'Password123', os.urandom(16), 100000
... )
>>> dk.hex()
'f8715c37906df067466ce84973e6e52a955be025a59c9100d9183c4cbec27a9e'
Notice I have used os.urandom to provide a 16 byte random salt, as recommended by the documentation.
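Since the salt has to be kept in order to check a password later, it may help to see the whole round trip. What follows is only a sketch under a few assumptions: the function names hash_password and verify_password are invented for this example, the iteration count simply mirrors the one used above, and the comparison uses compare_digest from the secrets module rather than ==.

```python
import hashlib
import os
import secrets

ITERATIONS = 100_000  # same figure as the example above; tune it for your hardware


def hash_password(password, salt=None):
    """Return (salt, derived_key) for a password given as a string.

    Hypothetical helper: a fresh 16-byte salt is generated unless one is given.
    """
    if salt is None:
        salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac(
        'sha256', password.encode('utf-8'), salt, ITERATIONS)
    return salt, dk


def verify_password(password, salt, expected_dk):
    """Derive the key again with the stored salt and compare the results."""
    _, dk = hash_password(password, salt)
    return secrets.compare_digest(dk, expected_dk)


# Store both values; the salt is not a secret, but it is needed for checking.
salt, stored = hash_password('Password123')

print(verify_password('Password123', salt, stored))
print(verify_password('password123', salt, stored))
```

Without the stored salt, the second derivation would use different random bytes and even the correct password would fail to match.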
I encourage you to explore and experiment with this module, as sooner or later you will have to use it. Now, let's move on to the secrets one.
Secrets
This nice, small module is used for generating cryptographically strong, random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets. It was added in Python 3.6, and basically deals with three things: random numbers, tokens, and digest comparison. Let's explore them very quickly.
Random numbers
We can use three functions in order to deal with random numbers:
# secrs/secr_rand.py
import secrets
print(secrets.choice('Choose one of these words'.split()))
print(secrets.randbelow(10 ** 6))
print(secrets.randbits(32))
The first one, choice, picks an element at random from a non-empty sequence. The second one, randbelow, generates a random integer from 0 up to, but not including, the argument you call it with, and the third one, randbits, generates a non-negative integer with n random bits in it. Running that code produces the following output (which is always different):
$ python secr_rand.py
one
504156
3172492450
You should use these functions instead of those from the random module whenever you need randomness in the context of cryptography, as these are specially designed for this task. Let's see what the module gives us for tokens.
Token generation
Again, we have three functions that all produce a token, albeit in different formats. Let's see the example:
# secrs/secr_rand.py
print(secrets.token_bytes(16))
print(secrets.token_hex(32))
print(secrets.token_urlsafe(32))
The first one, token_bytes, simply returns a random byte string containing n bytes (16, in this example). The other two do the same, but token_hex returns a token in hexadecimal format, and token_urlsafe returns a token that only contains characters suitable for being included in a URL. Let's see the output (which is a continuation from the previous run):
b'\xda\x863\xeb\xbb|\x8fk\x9b\xbd\x14Q\xd4\x8d\x15}'
9f90fd042229570bf633e91e92505523811b45e1c3a72074e19bbeb2e5111bf7
bl4qz_Av7QNvPEqZtKsLuTOUsNLFmXW3O03pn50leiY
This is all nice, so why don't we have some fun and write a random password generator using these tools?
# secrs/secr_gen.py
import secrets
from string import digits, ascii_letters

def generate_pwd(length=8):
    # here, digits is the string '0123456789' imported above
    chars = digits + ascii_letters
    return ''.join(secrets.choice(chars) for _ in range(length))

def generate_secure_pwd(length=16, upper=3, digits=3):
    # within this function, digits is the required number of digits
    if length < upper + digits + 1:
        raise ValueError('Nice try!')
    while True:
        pwd = generate_pwd(length)
        if (any(c.islower() for c in pwd)
                and sum(c.isupper() for c in pwd) >= upper
                and sum(c.isdigit() for c in pwd) >= digits):
            return pwd

print(generate_secure_pwd())
print(generate_secure_pwd(length=3, upper=1, digits=1))
In the previous code, we defined two functions. generate_pwd simply generates a random string of given length by joining together length characters picked at random from a string that contains all the letters of the alphabet (lowercase and uppercase), and the 10 decimal digits.
Then, we define another function, generate_secure_pwd, that simply keeps calling generate_pwd until the random string we get matches the requirements, which are quite simple. The password must have at least one lowercase character, at least as many uppercase characters as the upper argument says, at least as many digits as the digits argument says, and a total length given by the length argument.
Before we dive into the while loop, it's worth noting that if we sum together the requirements (uppercase, lowercase, and digits) and that sum is greater than the overall length of the password, there is no way we can ever satisfy the condition within the loop. So, in order to avoid getting stuck in an infinite loop, I have put a check clause in the first line of the body, and I raise a ValueError in case I need it. Could you think of how to write a test for this edge case?
The body of the while loop is straightforward: first we generate the random password, and then we verify the conditions by using any and sum. any returns True if any of the items in the iterable it's called with evaluate to True. The use of sum is actually slightly more tricky here, in that it exploits polymorphism. Can you see what I'm talking about before you read on?
Well, it's very simple: True and False in Python are instances of bool, which is a subclass of int, therefore when summing on an iterable of True/False values, they will automatically be interpreted like integers by the sum function. That is called polymorphism, and we've briefly talked about it in Chapter 6, OOP, Decorators, and Iterators.
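A quick way to convince yourself of this is to try it in isolation. The short sketch below uses one sample password (the same string that appears as the first line of the example output) and counts its uppercase letters and digits exactly the way generate_secure_pwd does; the variable names are only illustrative.

```python
pwd = 'nsL5voJnCi7Ote3F'

# bool is a subclass of int, so True behaves like 1 and False like 0.
print(issubclass(bool, int))
print(True + True)

# Each generator yields one True/False per character; sum adds them up.
uppercase_count = sum(c.isupper() for c in pwd)
digit_count = sum(c.isdigit() for c in pwd)
print(uppercase_count, digit_count)
```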
Running the example produces the following result:
$ python secr_gen.py
nsL5voJnCi7Ote3F
J5e
The second password is probably not too secure...
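As for the question about testing the edge case, here is one possible answer, written as a sketch with plain functions and no test framework so that it runs on its own. The two password functions are repeated from secr_gen.py to keep the block self-contained; the names of the two check functions are invented for this example.

```python
import secrets
from string import ascii_letters, digits


def generate_pwd(length=8):
    chars = digits + ascii_letters
    return ''.join(secrets.choice(chars) for _ in range(length))


def generate_secure_pwd(length=16, upper=3, digits=3):
    if length < upper + digits + 1:
        raise ValueError('Nice try!')
    while True:
        pwd = generate_pwd(length)
        if (any(c.islower() for c in pwd)
                and sum(c.isupper() for c in pwd) >= upper
                and sum(c.isdigit() for c in pwd) >= digits):
            return pwd


def rejects_impossible_requirements():
    """3 uppercase + 3 digits + 1 lowercase cannot fit in 6 characters."""
    try:
        generate_secure_pwd(length=6, upper=3, digits=3)
    except ValueError:
        return True
    return False


def accepts_smallest_valid_length():
    """7 characters is the shortest length that can satisfy 3 + 3 + 1."""
    pwd = generate_secure_pwd(length=7, upper=3, digits=3)
    return len(pwd) == 7 and any(c.islower() for c in pwd)


print(rejects_impossible_requirements())
print(accepts_smallest_valid_length())
```

Testing both sides of the boundary (6 and 7 here) is what makes sure the guard is neither too strict nor too loose.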
One last example, before we move on to the next module. Let's generate a reset password URL:
# secrs/secr_reset.py
import secrets

def get_reset_pwd_url(token_length=16):
    token = secrets.token_urlsafe(token_length)
    return f'https://fabdomain.com/reset-pwd/{token}'

print(get_reset_pwd_url())
This function is so easy I will only show you the output:
$ python secr_reset.py
https://fabdomain.com/reset-pwd/m4jb7aKgzTGuyjs9lTIspw
Digest comparison
This is probably quite surprising, but within secrets, you can find the compare_digest(a, b) function, which is the equivalent of comparing two digests by simply doing a == b. So, why do we need that function? It's because it has been designed to prevent timing attacks. These kinds of attacks can infer information about where the two digests start being different, according to the time it takes for the comparison to fail. So, compare_digest prevents this attack by removing the correlation between time and failures. I think this is a brilliant example of how sophisticated attacking methods can be. If you raised your eyebrows in astonishment, maybe now it's clearer why I said to never implement cryptography functions by yourself.
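In use, compare_digest looks exactly like an equality check. The following sketch compares hex digests of two messages; the messages and variable names are arbitrary and only serve the illustration.

```python
import hashlib
import secrets

expected = hashlib.sha256(b'Hash me now!').hexdigest()
same_message = hashlib.sha256(b'Hash me now!').hexdigest()
other_message = hashlib.sha256(b'Hash me later!').hexdigest()

# Same answers as ==, but the time taken does not reveal
# where the two values start to differ.
print(secrets.compare_digest(expected, same_message))
print(secrets.compare_digest(expected, other_message))
```

Both arguments must be of the same kind: either two bytes-like objects or two strings containing only ASCII characters.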
And that's it! Now, let's check out hmac.
HMAC
This module implements the HMAC algorithm, as described by RFC 2104 (https://tools.ietf.org/html/rfc2104.html). Since it is very small, but nonetheless important, I will provide you with a simple example:
# hmc.py
import hmac
import hashlib

def calc_digest(key, message):
    key = bytes(key, 'utf-8')
    message = bytes(message, 'utf-8')
    dig = hmac.new(key, message, hashlib.sha256)
    return dig.hexdigest()

digest = calc_digest('secret-key', 'Important Message')
As you can see, the interface is always the same or similar. We first convert the key and the message into bytes, and then create a digest instance that we will use to get a hexadecimal representation of the hash. Not much else to say, but I thought to add this module anyway, for completeness.
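What an HMAC digest is typically used for is checking that a message really comes from someone who holds the key, and that it hasn't been altered on the way. The sketch below reuses calc_digest from hmc.py and adds a verification helper; is_authentic is a name invented for this example, and hmac.compare_digest is the same function that secrets exposes.

```python
import hashlib
import hmac


def calc_digest(key, message):
    key = bytes(key, 'utf-8')
    message = bytes(message, 'utf-8')
    dig = hmac.new(key, message, hashlib.sha256)
    return dig.hexdigest()


def is_authentic(key, message, received_digest):
    """Recompute the digest and compare it with the one received."""
    expected = calc_digest(key, message)
    return hmac.compare_digest(expected, received_digest)


tag = calc_digest('secret-key', 'Important Message')

print(is_authentic('secret-key', 'Important Message', tag))  # untouched
print(is_authentic('secret-key', 'Important message', tag))  # altered message
print(is_authentic('wrong-key', 'Important Message', tag))   # wrong key
```

Only the first check succeeds: changing a single character of the message, or using a different key, produces a different digest.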
Now, let's move on to a different type of token: JWTs.
JSON Web Tokens
A JSON Web Token, or JWT, is a JSON-based open standard for creating tokens that assert some number of claims. You can learn all about this technology on the website (https://jwt.io/). In a nutshell, this type of token is composed of three sections, separated by a dot, in the format A.B.C. A is the header, which declares the token type and the algorithm used to compute the signature. B is the payload, which is where we put the data and the claims. C is the signature, which is used to verify the validity of the token. A, B, and C are all encoded with a URL safe Base64 encoding (which I'll refer to as Base64URL).
Base64 is a very popular binary-to-text encoding scheme that represents binary data in an ASCII string format by translating it into a radix-64 representation. The radix-64 representation uses the letters A-Z, a-z, and the digits 0-9, plus the two symbols + and / for a grand total of 64 symbols altogether. Therefore, not surprisingly, the Base64 alphabet is made up of these 64 symbols. Base64 is used, for example, to encode images attached in an email. It happens seamlessly, so the vast majority of people are completely oblivious of this fact.
The reason why a JWT is encoded using Base64URL is because of the characters + and /, which in a URL context mean space, and path separator, respectively. Therefore in the URL safe version, they are replaced with - and _. Moreover, any padding character (=), which is normally used in Base64, is stripped out, as this too has a specific meaning within a URL.
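The difference between the two alphabets is easy to see with the base64 module from the standard library. In this small sketch, the two input bytes are chosen on purpose so that plain Base64 produces both + and /; stripping the padding is done by hand, which is an assumption about how a JWT library would go about it rather than a description of any specific one.

```python
import base64

raw = b'\xfb\xff'  # two bytes picked so that '+' and '/' both show up

standard = base64.b64encode(raw)
url_safe = base64.urlsafe_b64encode(raw)
stripped = url_safe.rstrip(b'=')  # JWTs drop the trailing padding

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='
print(stripped)  # b'-_8'
```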
The way this type of token works is therefore slightly different than what we are used to when we work with hashes. In fact, the information that the token carries is always visible. You just need to decode A and B to get the header (with the algorithm) and the payload. However, the security lies in part C, which is a signature computed over the encoded A and B parts (an HMAC, when a shared-secret algorithm such as HS256 is used). If you try to modify the B part by editing the payload, encoding it back to Base64, and replacing it in the token, the signature won't match anymore, and therefore the token will be invalid.
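To see the A.B.C anatomy at work, here is a deliberately simplified sketch that assembles and checks an HS256-style token using only the standard library. It is meant purely as an illustration of the mechanism: all the function names are invented for this example, no claims are validated, and, as per rule number one, real code should rely on a library such as PyJWT instead.

```python
import base64
import hashlib
import hmac
import json


def b64url(data):
    """Base64URL-encode bytes, without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b'=')


def b64url_decode(segment):
    """Restore the stripped padding, then decode."""
    padding = b'=' * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def sign(signing_input, key):
    """Part C: HMAC-SHA256 over 'A.B', Base64URL-encoded."""
    return b64url(hmac.new(key, signing_input, hashlib.sha256).digest())


def make_token(payload, key):
    header = {'typ': 'JWT', 'alg': 'HS256'}
    a = b64url(json.dumps(header, separators=(',', ':')).encode())
    b = b64url(json.dumps(payload, separators=(',', ':')).encode())
    return b'.'.join([a, b, sign(a + b'.' + b, key)])


def is_valid(token, key):
    a, b, c = token.split(b'.')
    return hmac.compare_digest(c, sign(a + b'.' + b, key))


key = b'secret-key'
token = make_token({'payload': 'data', 'id': 123456789}, key)
a, b, c = token.split(b'.')

# Anyone can read A and B; no key is needed for that.
print(json.loads(b64url_decode(a)))
print(json.loads(b64url_decode(b)))
print(is_valid(token, key))

# Swap in an edited payload but keep the old signature: no longer valid.
forged_b = b64url(json.dumps({'payload': 'data', 'id': 1}).encode())
forged = b'.'.join([a, forged_b, c])
print(is_valid(forged, key))
```

The forged token fails because whoever edited the payload cannot recompute part C without knowing the key.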
This means that we can build a payload with claims such as logged in as admin, or something along those lines, and as long as the token is valid, we know we can trust that that user is actually logged in as an admin.
When dealing with JWTs, you want to make sure you have researched how to handle them safely. Things like not accepting unsigned tokens, or restricting the list of algorithms you use to encode and decode, as well as other security measures, are very important and you should take the time to investigate and learn them.
For this part of the code, you will have to have the PyJWT and cryptography Python packages installed. As always, you will find them in the requirements of the source code of this book.
Let's start with a simple example:
# tok.py
import jwt

data = {'payload': 'data', 'id': 123456789}
token = jwt.encode(data, 'secret-key')
# listing the accepted algorithms restricts what decode will trust
data_out = jwt.decode(token, 'secret-key', algorithms=['HS256'])
print(token)
print(data_out)
We define the data payload, which contains an ID and some payload data. Then, we create a token using the jwt.encode function, which takes at least the payload and a secret key, which is used to compute the signature. The default algorithm used to calculate the token is HS256. Let's see the output:
$ python tok.py
b'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJwYXlsb2FkIjoiZGF0YSIsImlkIjoxMjM0NTY3ODl9.WFRY-uoACMoNYX97PXXjEfXFQO1rCyFCyiwxzOVMn40'
{'payload': 'data', 'id': 123456789}
So, as you can see, the token is a binary string of Base64URL-encoded pieces of data (this is what the PyJWT release used for this book returns; from PyJWT 2.0 onward, encode returns a regular string and decode requires the algorithms argument). We have called jwt.decode, providing the correct secret key. Had we done otherwise, the decoding would have broken.
Sometimes, you might want to be able to inspect the content of the token without verifying it. You can do so by simply calling decode this way:
# tok.py
# PyJWT 2.0 and later spell this options={'verify_signature': False}
jwt.decode(token, verify=False)
This is useful, for example, when values in the token payload are needed to recover the secret key, but that technique is quite advanced so I won't be spending time on it in this context. Instead, let's see how we can specify a different algorithm for computing the signature:
# tok.py
token512 = jwt.encode(data, 'secret-key', algorithm='HS512')
data_out = jwt.decode(token512, 'secret-key', algorithms=['HS512'])
print(data_out)
The output is our original payload dictionary. Notice that, while encode takes a single algorithm, decode takes a list in its algorithms argument, so in case you want to allow more than one algorithm in the decoding phase, you can simply list them all.
Now, while you are free to put whatever you want in the token payload, there are some claims that have been standardized, and they enable you to have a great deal of control over the token.
Registered claims
At the time of writing this book, these are the registered claims:
iss: The issuer of the token
sub: The subject of the token, that is, the party this token is carrying information about
aud: The audience for the token
exp: The expiration time, after which the token is considered to be invalid
nbf: The not before (time), or the time before which the token is considered to be not valid yet
iat: The time at which the token was issued
jti: The token ID
Claims can also be categorized as public or private:
Private: Are those that are defined by users (consumers and producers) of the JWTs. In other words, these are ad hoc claims used for a particular case. As such, care must be taken to prevent collisions.
Public: Are claims that are either registered with the IANA JSON Web Token Claims Registry (a registry where users can register their claims and thus prevent collisions), or named using a collision resistant name (for instance, by prepending a namespace to its name).
To learn all about claims, please refer to the official website. Now, let's see a couple of code examples involving a subset of these claims.
Time-related claims
Let's see how we might use the claims related to time:
# claims_time.py
from datetime import datetime, timedelta
from time import sleep
import jwt

iat = datetime.utcnow()
nbf = iat + timedelta(seconds=1)
exp = iat + timedelta(seconds=3)
data = {'payload': 'data', 'nbf': nbf, 'exp': exp, 'iat': iat}

def decode(token, secret):
    print(datetime.utcnow().time().isoformat())
    try:
        print(jwt.decode(token, secret, algorithms=['HS256']))
    except (
        jwt.ImmatureSignatureError, jwt.ExpiredSignatureError
    ) as err:
        print(err)
        print(type(err))

secret = 'secret-key'
token = jwt.encode(data, secret)

decode(token, secret)
sleep(2)
decode(token, secret)
sleep(2)
decode(token, secret)
In this example, we set the issued at (iat) claim to the current UTC time (UTC stands for Coordinated Universal Time). We then set the not before (nbf) and expire time (exp) at 1 and 3 seconds from now, respectively. We then defined a decode helper function that reacts to a token not being valid yet, or being expired, by trapping the appropriate exceptions, and then we call it three times, interspersed by two calls to sleep. This way, we will try to decode the token when it's not valid yet, then when it's valid, and finally when it's already expired. This function also prints a useful timestamp before attempting to decode. Let's see how it goes (blank lines have been added for readability):
$ python claims_time.py
14:04:13.469778
The token is not yet valid (nbf)
<class 'jwt.exceptions.ImmatureSignatureError'>

14:04:15.475362
{'payload': 'data', 'nbf': 1522591454, 'exp': 1522591456, 'iat': 1522591453}

14:04:17.476948
Signature has expired
<class 'jwt.exceptions.ExpiredSignatureError'>
As you can see, it all executed as expected. We get nice, descriptive messages from the exceptions, and get the original payload back when the token is actually valid.
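The logic behind these two checks is simple enough to sketch by hand. The function below is a simplified stand-in, not PyJWT's actual implementation: check_time_claims is an invented name, there is no leeway handling, and the token is treated as expired as soon as the current time reaches exp. The timestamps are built to mirror the ones shown in the output above.

```python
from datetime import datetime, timedelta, timezone


def check_time_claims(claims, now):
    """Return 'immature', 'expired', or 'valid'.

    Hypothetical helper; claims hold Unix timestamps, now is an aware datetime.
    """
    ts = now.timestamp()
    if 'nbf' in claims and ts < claims['nbf']:
        return 'immature'
    if 'exp' in claims and ts >= claims['exp']:
        return 'expired'
    return 'valid'


iat = datetime(2018, 4, 1, 14, 4, 13, tzinfo=timezone.utc)
claims = {
    'iat': int(iat.timestamp()),
    'nbf': int((iat + timedelta(seconds=1)).timestamp()),
    'exp': int((iat + timedelta(seconds=3)).timestamp()),
}

# Check right away, after 2 seconds, and after 4 seconds.
for offset in (0, 2, 4):
    moment = iat + timedelta(seconds=offset)
    print(offset, check_time_claims(claims, moment))
```

Passing the current time in as an argument, instead of sleeping as in claims_time.py, makes this kind of logic much easier to test.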
Auth-related claims
Let's see another quick example involving the issuer (iss) and audience (aud) claims. The code is conceptually very similar to the previous example, and we're going to exercise it in the same way:
# claims_auth.py
import jwt

data = {'payload': 'data', 'iss': 'fab', 'aud': 'learn-python'}
secret = 'secret-key'
token = jwt.encode(data, secret)

def decode(token, secret, issuer=None, audience=None):
    try:
        print(jwt.decode(
            token, secret, issuer=issuer, audience=audience,
            algorithms=['HS256']))
    except (
        jwt.InvalidIssuerError, jwt.InvalidAudienceError
    ) as err:
        print(err)
        print(type(err))

decode(token, secret)
# not providing the issuer won't break
decode(token, secret, audience='learn-python')
# not providing the audience will break
decode(token, secret, issuer='fab')
# both will break
decode(token, secret, issuer='wrong', audience='learn-python')
decode(token, secret, issuer='fab', audience='wrong')

decode(token, secret, issuer='fab', audience='learn-python')
As you can see, this time, we have specified issuer and audience. It turns out that if we don't provide the issuer when decoding the token, it won't cause the decoding to break. However, providing the wrong issuer will actually break decoding. On the other hand, both failing to provide the audience, or providing the wrong audience, will break decoding.
As in the previous example, I have written a custom decode function that reacts to the appropriate exceptions. See if you can follow along with the calls and the relative output that follows (I'll help with some blank lines):
$ python claims_auth.py
Invalid audience
<class 'jwt.exceptions.InvalidAudienceError'>

{'payload': 'data', 'iss': 'fab', 'aud': 'learn-python'}

Invalid audience
<class 'jwt.exceptions.InvalidAudienceError'>

Invalid issuer
<class 'jwt.exceptions.InvalidIssuerError'>

Invalid audience
<class 'jwt.exceptions.InvalidAudienceError'>

{'payload': 'data', 'iss': 'fab', 'aud': 'learn-python'}
Now, let's see one final example for a more complex use case.
Using asymmetric (public-key) algorithms
Sometimes, using a shared secret is not the best option. In those cases, it might be useful to adopt a different technique. In this example, we are going to create a token (and decode it) using a pair of RSA keys.
Public key cryptography, or asymmetrical cryptography, is any cryptographic system that uses pairs of keys: public keys which may be disseminated widely, and private keys which are known only to the owner. If you are interested in learning more about this topic, please see the end of this chapter for recommendations.
Now, let's create two pairs of keys. One pair will have no password, and one will. To create them, I'm going to use the ssh-keygen utils from OpenSSH (https://www.ssh.com/ssh/keygen/). In the folder where my scripts for this chapter are, I created an rsa subfolder. Within it, run the following:
$ ssh-keygen -t rsa
Give the name key to the path (it will be saved in the current folder), and simply hit the Enter key when asked for the password. When done, do the same again, but this time use the name keypwd for the key, and give it a password. The one I chose is the classic Password123. Be aware that recent OpenSSH releases write private keys in OpenSSH's own format by default; if that is your case, you will likely need to add the -m PEM option to the command so that the keys are created in the PEM format the following code expects. When you are done, change back to the ch9 folder, and run this code:
# token_rsa.py
import jwt
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization

data = {'payload': 'data'}

def encode(data, priv_filename, priv_pwd=None, algorithm='RS256'):
    with open(priv_filename, 'rb') as key:
        private_key = serialization.load_pem_private_key(
            key.read(),
            password=priv_pwd,
            backend=default_backend()
        )
    return jwt.encode(data, private_key, algorithm=algorithm)

def decode(data, pub_filename, algorithm='RS256'):
    with open(pub_filename, 'rb') as key:
        public_key = key.read()
    # decode expects the list of accepted algorithms
    return jwt.decode(data, public_key, algorithms=[algorithm])

# no pwd
token = encode(data, 'rsa/key')
data_out = decode(token, 'rsa/key.pub')
print(data_out)

# with pwd
token = encode(data, 'rsa/keypwd', priv_pwd=b'Password123')
data_out = decode(token, 'rsa/keypwd.pub')
print(data_out)
In the previous example, we defined a couple of custom functions to encode and decode tokens using private/public keys. As you can see in the signature of the encode function, we are using the RS256 algorithm this time. We need to open the private key file by using the special load_pem_private_key function, which allows us to specify a content, password, and backend. .pem is the name of the format in which our keys have been created. If you take a look at those files, you will probably recognize them, since they are quite popular.
The logic is pretty straightforward, and I would encourage you to think about at least one use case where this technique might be more suitable than using a shared key.
Useful references
Here, you can find a list of useful references if you want to dig deeper into the fascinating world of cryptography:
Cryptography: https://en.wikipedia.org/wiki/Cryptography
JSON Web Tokens: https://jwt.io
Hash functions: https://en.wikipedia.org/wiki/Cryptographic_hash_function
HMAC: https://en.wikipedia.org/wiki/HMAC
Cryptography services (Python STD library): https://docs.python.org/3.7/library/crypto.html
IANA JSON Web Token Claims Registry: https://www.iana.org/assignments/jwt/jwt.xhtml
PyJWT library: https://pyjwt.readthedocs.io/
Cryptography library: https://cryptography.io/
There is way more on the web, and plenty of books you can also study, but I'd recommend that you start with the main concepts and then gradually dive into the specifics you want to understand more thoroughly.
Summary
In this short chapter, we explored the world of cryptography in the Python standard library. We learned how to create a hash (or digest) for a message using different cryptographic functions. We also learned how to create tokens and deal with random data when it comes to the cryptography context.
We then took a small tour outside the standard library to learn about JSON Web Tokens, which are used intensively today in authentication and claims-related functionalities by modern systems and applications.
The most important thing is to understand that doing things manually can be very risky when it comes to cryptography, so it's always best to leave it to the professionals and simply use the tools we have available.
The next chapter will be all about moving away from one line of software execution. We're going to learn how software works in the real world, explore concurrent execution, and learn about threads, processes, and the tools Python gives us to do more than one thing at a time, so to speak.
Concurrent Execution
"What do we want? Now! When do we want it? Fewer race conditions!"
– Anna Melzer
In this chapter, I'm going to up the game a little bit, both in terms of the concepts I'll present, and in the complexity of the code snippets I'll show you. If you don't feel up to the task, or as you are reading through you realize it is getting too difficult, feel free to skip it. You can always come back to it when you feel ready.
The plan is to take a detour from the familiar single-threaded execution paradigm, and deep dive into what can be described as concurrent execution. I will only be able to scratch the surface of this complex topic, so I won't expect you to be a master of concurrency by the time you're done reading, but I will, as usual, try to give you enough information so that you can then proceed by walking the path, so to speak.
We will learn about all the important concepts that apply to this area of programming, and I will try to show you examples coded in different styles, to give you a solid understanding of the basics of these topics. To dig deep into this challenging and interesting branch of programming, you will have to refer to the Concurrent Execution section in the Python documentation (https://docs.python.org/3.7/library/concurrency.html), and maybe supplement your knowledge by studying books on the subject.
In particular, we are going to explore the following:
The theory behind threads and processes
Writing multithreaded code
Writing multiprocessing code
Using executors to spawn threads and processes
A brief example of programming with asyncio
Let's start by getting the theory out of the way.
Concurrency versus parallelism
Concurrency and parallelism are often mistaken for the same thing, but there is a distinction between them. Concurrency is the ability to make progress on multiple things over the same period of time, not necessarily in parallel. Parallelism is the ability to do a number of things at the very same instant.
Imagine you take your other half to the theater. There are two lines: that is, for VIP and regular tickets. There is only one functionary checking tickets and so, in order to avoid blocking either of the two queues, they check one ticket from the VIP line, then one from the regular line. Over time, both queues are processed. This is an example of concurrency.
Now imagine that another functionary joins, so now we have one functionary per queue. This way, both queues will be processed each by its own functionary. This is an example of parallelism.
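The single-functionary scenario can be mimicked in a few lines of plain, single-threaded Python. This is just a toy model of the interleaving, with invented names and ticket labels; nothing in it runs in parallel, which is exactly the point.

```python
from collections import deque


def check_tickets_concurrently(vip, regular):
    """One functionary alternates between the queues, one ticket at a time."""
    lines = deque(q for q in (deque(vip), deque(regular)) if q)
    checked = []
    while lines:
        current = lines.popleft()
        checked.append(current.popleft())  # check one ticket from this line
        if current:
            lines.append(current)  # go to the back, the other line is next
    return checked


order = check_tickets_concurrently(['V1', 'V2'], ['R1', 'R2', 'R3'])
print(order)  # ['V1', 'R1', 'V2', 'R2', 'R3']
```

Both lines make progress over the same stretch of time, even though only one ticket is ever being checked at any given moment.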
Modern laptop processors feature multiple cores (normally two to four). A core is an independent processing unit that belongs to a processor. Having more than one core means that the CPU in question has the physical ability to actually execute tasks in parallel. Within each core, normally there is a constant alternation of streams of work, which is concurrent execution.
Bear in mind that I'm keeping the discussion generic on purpose here. According to which system you are using, there will be differences in how execution is handled, so I will concentrate on the concepts that are common to all, or at least most, systems.
Threads and processes – an overview
A thread can be defined as a sequence of instructions that can be run by a scheduler, which is that part of the operating system that decides which chunk of work will receive the necessary resources to be carried out. Typically, a thread lives within a process. A process can be defined as an instance of a computer program that is being executed.
In previous chapters, we have run our own modules and scripts with commands similar to $ python my_script.py. What happens when a command like that is run, is that a Python process is created. Within it, a main thread of execution is spawned. The instructions in the script are what will be run within that thread.
This is just one way of working though, and Python can actually use more than one thread within the same process, and can even spawn multiple processes. Unsurprisingly, these branches of computer science are called multithreading and multiprocessing.
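As a small preview, the sketch below starts one extra thread and records where it runs. The thread name worker-1 and the info dictionary are arbitrary choices made for this example.

```python
import os
import threading

info = {}


def worker():
    # Runs in the extra thread, but still inside the same Python process.
    info['worker_pid'] = os.getpid()
    info['worker_thread'] = threading.current_thread().name


thread = threading.Thread(target=worker, name='worker-1')
thread.start()
thread.join()  # wait for the worker to finish

print(info['worker_pid'] == os.getpid())  # same process
print(threading.main_thread().name, info['worker_thread'])  # two threads
```

The process ID reported from within the worker is the same as the one of the script itself: the two threads live in the same process.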
In order to understand the difference, let's take a moment to explore threads and processes in slightly more depth.
Quick anatomy of a thread
Generally speaking, there are two different types of threads:
User-level threads: Threads that we can create and manage in order to perform a task
Kernel-level threads: Low-level threads that run in kernel mode and act on behalf of the operating system
Given that Python works at the user level, we're not going to deep dive into kernel threads at this time. Instead, we will explore several examples of user-level threads in this chapter's examples.
A thread can be in any of the following states:
New thread: A thread that hasn't started yet, and hasn't been allocated any resources.
Runnable: The thread is waiting to run. It has all the resources needed to run, and as soon as the scheduler gives it the green light, it will be run.
Running: A thread whose stream of instructions is being executed. From this state, it can go back to a non-running state, or die.
Not-running: A thread that has been paused. This could be due to another thread taking precedence over it, or simply because the thread is waiting for a long-running IO operation to finish.
Dead: A thread that has died because it has reached the natural end of its stream of execution, or it has been killed.
Transitions between states are provoked either by our actions or by the scheduler. There is one thing to bear in mind, though; it is best not to interfere with the death of a thread.
Killing threads
Killing threads is not considered to be good practice. Python doesn't provide the ability to kill a thread by calling a method or function, and this should be a hint that killing threads isn't something you want to be doing.
One reason is that a thread might have children (threads spawned from within the thread itself) which would be orphaned when their parent dies. Another reason could be that if the thread you're killing is holding a resource that needs to be closed properly, you might prevent that from happening and that could potentially lead to problems.
Later, we will see an example of how we can work around these issues.
Context-switching
We have said that the scheduler can decide when a thread can run, or is paused, and so on. Any time a running thread needs to be suspended so that another can be run, the scheduler saves the state of the running thread in a way that it will be possible, at a later time, to resume execution exactly where it was paused.
This act is called context-switching. People do that all the time too. We are doing some paperwork, and we hear bing! on our phone. We stop the paperwork and check our phone. When we're done dealing with what was probably the umpteenth picture of a funny cat, we go back to our paperwork. We don't start the paperwork from the beginning, though; we simply continue where we had left off.
Context-switching is a marvelous ability of modern computers, but it can become troublesome if you generate too many threads. The scheduler then will try to give each of them a chance to run for a little time, and there will be a lot of time spent saving and recovering the state of the threads that are respectively paused and restarted.
In order to avoid this problem, it is quite common to limit the number of threads (the same consideration applies to processes) that can be run at any given point in time. This is achieved by using a structure called a pool, the size of which can be decided by the programmer. In a nutshell, we create a pool and then assign tasks to its threads. When all the threads of the pool are busy, the program won't be able to spawn a new thread until one of them terminates (and goes back to the pool). Pools are also great for saving resources, in that they provide recycling features to the thread ecosystem.
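Here is a minimal sketch of a pool, using ThreadPoolExecutor from the standard library. The pool size of 2 and the task function are arbitrary choices for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def task(n):
    """Return the square of n, along with the name of the thread that ran it."""
    return n * n, threading.current_thread().name


# Six tasks, but never more than two threads running them.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(task, range(6)))

squares = [square for square, _ in results]
workers = {name for _, name in results}

print(squares)  # [0, 1, 4, 9, 16, 25]
print(len(workers) <= 2)  # the same threads were reused for all the tasks
```

Whenever a thread of the pool finishes a task, it is recycled to pick up the next one, instead of a brand new thread being created.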
Whenyouwritemultithreadedcode,itisusefultohaveinformationaboutthemachineoursoftwareisgoingtorunon.Thatinformation,coupledwithsomeprofiling(we'lllearnaboutitinChapter11,DebuggingandTroubleshooting),shouldenableustocalibratethesizeofourpoolscorrectly.
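To make the idea of calibrating a pool a bit more concrete, here is a minimal sketch that reads the number of cores from the machine and derives a pool size from it. The suggested_pool_size helper and its factor parameter are my own assumptions, not a rule from the standard library; profiling should have the final say.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def suggested_pool_size(io_bound=True, factor=4):
    """Hypothetical heuristic: a few threads per core for IO-bound
    work, one worker per core otherwise."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return cores * factor if io_bound else cores


size = suggested_pool_size()
with ThreadPoolExecutor(max_workers=size) as executor:
    # A trivial task, just to show the pool doing some work.
    lengths = list(executor.map(len, ['spam', 'eggs', 'bacon']))

print(f'Cores: {os.cpu_count()}, pool size: {size}, results: {lengths}')
```

Treat the numbers as a starting point only: the right size depends on how long each task spends waiting as opposed to computing.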
The Global Interpreter Lock
In July 2015, I attended the EuroPython conference in Bilbao, where I gave a talk about test-driven development. The camera operator unfortunately lost the first half of it, but I've since been able to give that talk another couple of times, so you can find a complete version of it on the web. At the conference, I had the great pleasure of meeting Guido van Rossum and talking to him, and I also attended his keynote speech.
One of the topics he addressed was the infamous Global Interpreter Lock (GIL). The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This means that even though you can write multithreaded code in Python, there is only one thread running at any point in time (per process, of course).
In computer programming, a mutual exclusion object (mutex) is a program object that allows multiple program threads to share the same resource, such as file access, but not simultaneously.
This is normally seen as an undesired limitation of the language, and many developers take pride in cursing this great villain. The truth lies somewhere else though, as was beautifully explained by Raymond Hettinger in his Keynote on Concurrency, at PyBay 2017 (https://bit.ly/2KcijOB). About 10 minutes in, Raymond explains that it is actually quite simple to remove the GIL from Python. It takes about a day of work. The price you pay for this GIL-ectomy though, is that you then have to apply locks yourself wherever they are needed in your code. This leads to a more expensive footprint, as multitudes of individual locks take more time to be acquired and released, and most importantly, it introduces the risk of bugs, as writing robust multithreaded code is not easy and you might end up having to write dozens or hundreds of locks.
In order to understand what a lock is, and why you might want to use it, we first need to talk about one of the perils of multithreaded programming: race conditions.
Race conditions and deadlocks
When it comes to writing multithreaded code, you need to be aware of the dangers that come when your code is no longer executed linearly. By that, I mean that multithreaded code is exposed to the risk of being paused at any point in time by the scheduler, because it has decided to give some CPU time to another stream of instructions.
This behavior exposes you to different types of risks, the two most famous being race conditions and deadlocks. Let's talk about them briefly.
Race conditions
A race condition is a behavior of a system where the output of a procedure depends on the sequence or timing of other uncontrollable events. When these events don't unfold in the order intended by the programmer, a race condition becomes a bug.
It's much easier to explain this with an example.
Imagine you have two threads running. Both are performing the same task, which consists of reading a value from a location, performing an action with that value, incrementing the value by 1 unit, and saving it back. Say that the action is to post that value to an API.
Scenario A – race condition not happening
Thread A reads the value (1), posts 1 to the API, then increments it to 2, and saves it back. Right after this, the scheduler pauses Thread A, and runs Thread B. Thread B reads the value (now 2), posts 2 to the API, increments it to 3, and saves it back.
At this point, after the operation has happened twice, the value stored is correct: 1 + 2 = 3. Moreover, the API has been called with both 1 and 2, correctly.
Scenario B – race condition happening
Thread A reads the value (1), posts it to the API, increments it to 2, but before it can save it back, the scheduler decides to pause Thread A in favor of Thread B.
Thread B reads the value (still 1!), posts it to the API, increments it to 2, and saves it back. The scheduler then switches over to Thread A again. Thread A resumes its stream of work by simply saving the value it was holding after incrementing, which is 2.
After this scenario, even though the operation has happened twice as in Scenario A, the value saved is 2, and the API has been called twice with 1.
In a real-life situation, with multiple threads and real code performing several operations, the overall behavior of the program explodes into a myriad of possibilities. We'll see an example of this later on, and we'll fix it using locks.
The main problem with race conditions is that they make our code non-deterministic, which is bad. There are areas in computer science where non-determinism is used to achieve things, and that's fine, but in general you want to be able to predict how your code will behave, and race conditions make it impossible to do so.
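If you'd like to replay Scenario A and Scenario B without relying on a real scheduler, the following sketch simulates them deterministically. It uses no threads at all: each scenario is a hand-written list of steps, and run_schedule (a name I made up for this illustration) executes them in order. The "API" is just a list that records what would have been posted.

```python
def run_schedule(schedule):
    """Replay a fixed interleaving of steps for two pretend threads."""
    shared = {'value': 1}  # the location both threads read and write
    local = {}             # each thread's private copy of the value
    posted = []            # what a hypothetical API would have received
    for thread, step in schedule:
        if step == 'read':
            local[thread] = shared['value']
            posted.append(local[thread])  # post the value just read
        elif step == 'save':
            shared['value'] = local[thread] + 1  # increment and save
    return shared['value'], posted


# Scenario A: Thread A finishes before Thread B starts.
scenario_a = [('A', 'read'), ('A', 'save'), ('B', 'read'), ('B', 'save')]
# Scenario B: Thread A is paused between reading and saving.
scenario_b = [('A', 'read'), ('B', 'read'), ('B', 'save'), ('A', 'save')]

print(run_schedule(scenario_a))  # (3, [1, 2])
print(run_schedule(scenario_b))  # (2, [1, 1])
```

The only difference between the two runs is the order of the steps, which is exactly what the scheduler controls in real multithreaded code.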
Locks to the rescue
Locks come to the rescue when dealing with race conditions. For example, in order to fix the preceding example, all you need is a lock around the procedure. A lock is like a guardian that will allow only one thread to take hold of it (we say to acquire a lock), and until that thread releases the lock, no other thread can acquire it. They will have to sit and wait until the lock is available again.
Scenario C – using a lock
Thread A acquires the lock, reads the value (1), posts to the API, increases to 2, and the scheduler suspends it. Thread B is given some CPU time, so it tries to acquire the lock. But the lock hasn't been released yet by Thread A, so Thread B sits and waits. The scheduler might notice this, and quickly decide to switch back to Thread A.
Thread A saves 2, and releases the lock, making it available to all other threads.
At this point, whether the lock is acquired again by Thread A, or by Thread B (because the scheduler might have decided to switch again), is not important. The procedure will always be carried out correctly, since the lock makes sure that when a thread reads a value, it has to complete the procedure (ping API, increment, and save) before any other thread can read the value as well.
There is a multitude of different locks available in the standard library. I definitely encourage you to read up on them to understand all the perils you might encounter when writing multithreaded code, and how to solve them.
Let's now talk about deadlocks.
Deadlocks
A deadlock is a state in which each member of a group is waiting for some other member to take action, such as sending a message or, more commonly, releasing a lock, or a resource.
A simple example will help you get the picture. Imagine two little kids playing together. Find a toy that is made of two parts, and give each of them one part. Naturally, neither of them will want to give the other one their part, and they will want the other one to release the part they have. So neither of them will be able to play with the toy, as they each hold half of it, and will indefinitely wait for the other kid to release the other half.
Don't worry, no kids were harmed during the making of this example. It all happened in my mind.
Another example could be having two threads execute the same procedure again. The procedure requires acquiring two resources, A and B, both guarded by a separate lock. Thread 1 acquires A, and Thread 2 acquires B, and then they will wait indefinitely until the other one releases the resource it has. But that won't happen, as they both are instructed to wait and acquire the second resource in order to complete the procedure. Threads can be much more stubborn than kids.
You can solve this problem in several ways. The easiest one might be simply to apply an order to the acquisition of resources: every thread must acquire A first, then B, then C, and so on. This means that the thread that gets A will also get all the rest, because nobody else can hold B or C without holding A first.
Another way is to put a lock around the whole resource acquisition procedure, so that even if it might happen out of order, it will still be within the context of a lock, which means only one thread at a time can actually gather all the resources.
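Here is a small sketch of the first solution, ordered acquisition. Two threads receive the same two locks in opposite order, which would normally be a recipe for a deadlock. Sorting the locks by id() before acquiring them is just one convenient way (my choice for this illustration) to impose a single global order; any consistent ordering would do.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []


def use_both(name, first, second):
    # Always acquire the locks in the same global order, regardless
    # of the order in which the caller passed them in.
    ordered = sorted((first, second), key=id)
    with ordered[0]:
        with ordered[1]:
            results.append(name)


# The two threads ask for the locks in opposite order.
t1 = threading.Thread(target=use_both, args=('T1', lock_a, lock_b))
t2 = threading.Thread(target=use_both, args=('T2', lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(sorted(results))  # ['T1', 'T2']
```

Without the call to sorted, T1 could end up holding lock_a while T2 holds lock_b, and both would wait forever.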
Let's now pause our talk on threads for a moment, and explore processes.
Quick anatomy of a process
Processes are normally more complex than threads. In general, they contain a main thread, but can also be multithreaded if you choose. They are capable of spawning multiple sub-threads, each of which contains its own set of registers and a stack. Each process provides all the resources that the computer needs in order to execute the program.
Similarly to using multiple threads, we can design our code to take advantage of a multiprocessing design. Multiple processes are likely to run over multiple cores, therefore with multiprocessing, you can truly parallelize computation. Their memory footprints, though, are slightly heavier than those of threads, and another drawback to using multiple processes is that inter-process communication (IPC) tends to be more expensive than communication between threads.
Properties of a process
A UNIX process is created by the operating system. It typically contains the following:
A process ID, process group ID, user ID, or group ID
An environment and working directory
Program instructions
Registers, a stack, and a heap
File descriptors
Signal actions
Shared libraries
Inter-process communication tools (pipes, message queues, semaphores, or shared memory)
If you are curious about processes, open up a shell and type $ top. This command displays and updates sorted information about the processes that are running in your system. When I run it on my machine, the first line tells me the following:
$ top
Processes: 477 total, 4 running, 473 sleeping, 2234 threads
...
This gives you an idea about how much work our computers are doing without us being really aware of it.
Multithreading or multiprocessing?
Given all this information, deciding which approach is the best means having an understanding of the type of work that needs to be carried out, and knowledge about the system that will be dedicated to doing that work.
There are advantages to both approaches, so let's try to clarify the main differences.
Here are some advantages of using multithreading:
Threads are all born within the same process. They share resources and can communicate with one another very easily. Communication between processes requires more complex structures and techniques.
The overhead of spawning a thread is smaller than that of a process. Moreover, their memory footprint is also smaller.
Threads can be very effective for IO-bound applications that block. For example, while one thread is blocked waiting for a network connection to give back some data, work can be easily and effectively switched to another thread.
Because there aren't any shared resources between processes, we need to use IPC techniques, and they require more memory than communication between threads.
Here are some advantages of using multiprocessing:
We can avoid the limitations of the GIL by using processes.
Sub-processes that fail won't kill the main application.
Threads suffer from issues such as race conditions and deadlocks; while using processes the likelihood of having to deal with them is greatly reduced.
Context-switching of threads can become quite expensive when their number is above a certain threshold.
Processes can make better use of multicore processors.
Processes are better than multiple threads at handling CPU-intensive tasks.
In this chapter, I'll show you both approaches for multiple examples, so hopefully you'll gain a good understanding of the various different techniques. Let's get to the code then!
Concurrent execution in Python
Let's start by exploring the basics of Python multithreading and multiprocessing with some simple examples.
Keep in mind that several of the following examples will produce an output that depends on a particular run. When dealing with threads, things can get non-deterministic, as I mentioned earlier. So, if you experience different results, it is absolutely fine. You will probably notice that some of your results will vary from run to run too.
Starting a thread
First things first, let's start a thread:
# start.py
import threading

def sum_and_product(a, b):
    s, p = a + b, a * b
    print(f'{a}+{b}={s}, {a}*{b}={p}')

t = threading.Thread(
    target=sum_and_product, name='SumProd', args=(3, 7)
)
t.start()
After importing threading, we define a function: sum_and_product. This function calculates the sum and the product of two numbers, and prints the results. The interesting bit is after the function. We instantiate t from threading.Thread. This is our thread. We passed the name of the function that will be run as the thread body, we gave it a name, and passed the arguments 3 and 7, which will be fed into the function as a and b, respectively.
After having created the thread, we start it with the homonymous method.
At this point, Python will start executing the function in a new thread, and when that operation is done, the whole program will be done as well, and exit. Let's run it:
$ python start.py
3+7=10, 3*7=21
Starting a thread is therefore quite simple. Let's see a more interesting example where we display more information:
# start_with_info.py
import threading
from time import sleep

def sum_and_product(a, b):
    sleep(.2)
    print_current()
    s, p = a + b, a * b
    print(f'{a}+{b}={s}, {a}*{b}={p}')

def status(t):
    if t.is_alive():
        print(f'Thread {t.name} is alive.')
    else:
        print(f'Thread {t.name} has terminated.')

def print_current():
    print('The current thread is {}.'.format(
        threading.current_thread()
    ))
    print('Threads: {}'.format(list(threading.enumerate())))

print_current()
t = threading.Thread(
    target=sum_and_product, name='SumProd', args=(3, 7)
)
t.start()
status(t)
t.join()
status(t)
In this example, the thread logic is exactly the same as in the previous one, so you don't need to sweat on it and can concentrate on the (insane!) amount of logging information I added. We use two functions to display information: status and print_current. The first one takes a thread in input and displays its name and whether or not it's alive by calling its is_alive method. The second one prints the current thread, and then enumerates all the threads in the process. This information comes from threading.current_thread and threading.enumerate.
There is a reason why I put .2 seconds of sleeping time within the function. When the thread starts, its first instruction is to sleep for a moment. The sneaky scheduler will catch that, and switch execution back to the main thread. You can verify this by the fact that in the output, you will see the result of status(t) before that of print_current from within the thread. This means that that call happens while the thread is sleeping.
Finally, notice I called t.join() at the end. That instructs Python to block until the thread has completed. The reason for that is because I want the last call to status(t) to tell us that the thread is gone. Let's peek at the output (slightly rearranged for readability):
$ python start_with_info.py
The current thread is
<_MainThread(MainThread, started 140735733822336)>.
Threads: [<_MainThread(MainThread, started 140735733822336)>]
Thread SumProd is alive.
The current thread is <Thread(SumProd, started 123145375604736)>.
Threads: [
<_MainThread(MainThread, started 140735733822336)>,
<Thread(SumProd, started 123145375604736)>
]
3+7=10, 3*7=21
Thread SumProd has terminated.
As you can see, at first the current thread is the main thread. The enumeration shows only one thread. Then we create and start SumProd. We print its status and we learn it is alive. Then, and this time from within SumProd, we display information about the current thread again. Of course, now the current thread is SumProd, and we can see that enumerating all threads returns both of them. After the result is printed, we verify, with one last call to status, that the thread has terminated, as predicted. Should you get different results (apart from the IDs of the threads, of course), try increasing the sleeping time and see whether anything changes.
Starting a process
Let's now see an equivalent example, but instead of using a thread, we'll use a process:
# start_proc.py
import multiprocessing
...
p = multiprocessing.Process(
    target=sum_and_product, name='SumProdProc', args=(7, 9)
)
p.start()
The code is exactly the same as for the first example, but instead of using a Thread, we actually instantiate multiprocessing.Process. The sum_and_product function is the same as before. The output is also the same, except the numbers are different.
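For convenience, here is a self-contained variation of the same idea that you can paste into a file and run. It is a sketch rather than the listing above: I split the work into a pure function plus a small report wrapper (a hypothetical helper) so the calculation can be checked on its own, and I added the if __name__ == '__main__' guard, which is needed on platforms where child processes are started by importing the main module again.

```python
import multiprocessing


def sum_and_product(a, b):
    s, p = a + b, a * b
    return f'{a}+{b}={s}, {a}*{b}={p}'


def report(a, b):
    # This function runs in the child process.
    print(sum_and_product(a, b))


if __name__ == '__main__':
    p = multiprocessing.Process(
        target=report, name='SumProdProc', args=(7, 9)
    )
    p.start()
    p.join()  # wait for the child, so that exitcode is available
    print(f'{p.name} finished with exit code {p.exitcode}')
```

An exit code of 0 means the child process terminated normally.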
Stopping threads and processes
As mentioned before, in general, stopping a thread is a bad idea, and the same goes for a process. Being sure you've taken care to dispose and close everything that is open can be quite difficult. However, there are situations in which you might want to be able to stop a thread, so let me show you how to do it:
# stop.py
import threading
from time import sleep

class Fibo(threading.Thread):
    def __init__(self, *a, **kwa):
        super().__init__(*a, **kwa)
        self._running = True

    def stop(self):
        self._running = False

    def run(self):
        a, b = 0, 1
        while self._running:
            print(a, end=' ')
            a, b = b, a + b
            sleep(0.07)
        print()

fibo = Fibo()
fibo.start()
sleep(1)
fibo.stop()
fibo.join()
print('All done.')
For this example, we use a Fibonacci generator. We've seen it before so I won't explain it. The important bit to focus on is the _running attribute. First of all, notice the class inherits from Thread. By overriding the __init__ method, we can set the _running flag to True. When you write a thread this way, instead of giving it a target function, you simply override the run method in the class. Our run method calculates a new Fibonacci number, and then sleeps for about 0.07 seconds.
In the last block of code, we create and start an instance of our class. Then we sleep for one second, which should give the thread time to produce about 14 Fibonacci numbers. When we call fibo.stop(), we aren't actually stopping the thread. We simply set our flag to False, and this allows the code within run to reach its natural end. This means that the thread will die organically. We call join to make sure the thread is actually done before we print All done. on the console. Let's check the output:
$ python stop.py
0 1 1 2 3 5 8 13 21 34 55 89 144 233
All done.
Check how many numbers were printed: 14, as predicted.
This is basically a workaround technique that allows you to stop a thread. If you design your code correctly according to multithreading paradigms, you shouldn't have to kill threads all the time, so let that need become your alarm bell that something could be designed better.
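A common variation on the flag technique is to use a threading.Event instead of a plain Boolean. The sketch below is my own illustration (the Ticker class and its attributes are hypothetical names): the worker waits on the event with a timeout, so it both paces itself and notices the stop request immediately, without having to finish a sleep first.

```python
import threading
from time import sleep


class Ticker(threading.Thread):
    """Hypothetical worker that counts ticks until asked to stop."""

    def __init__(self, interval=0.01):
        super().__init__()
        self.interval = interval
        self.ticks = 0
        self._stop_requested = threading.Event()

    def stop(self):
        self._stop_requested.set()

    def run(self):
        # wait() returns False on timeout and True as soon as the
        # event is set, which ends the loop right away.
        while not self._stop_requested.wait(self.interval):
            self.ticks += 1


ticker = Ticker()
ticker.start()
sleep(0.2)
ticker.stop()
ticker.join()
print(f'Ticks: {ticker.ticks}, alive: {ticker.is_alive()}')
```

As with the flag, the thread is never killed from the outside: it reaches the end of run on its own.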
Stopping a process
When it comes to stopping a process, things are different, and fuss-free. You can use either the terminate or kill method, but please make sure you know what you're doing, as all the preceding considerations about open resources left hanging are still true.
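As a rough sketch of what that looks like, the following script starts a process that would never end by itself, and then terminates it. The sleep_forever function is a made-up stand-in for real work. Note the call to join after terminate: it waits for the process to actually go away, so that exitcode is populated.

```python
import multiprocessing
import time


def sleep_forever():
    # Stand-in for a task that never finishes on its own.
    while True:
        time.sleep(1)


if __name__ == '__main__':
    p = multiprocessing.Process(target=sleep_forever)
    p.start()
    print(f'Alive after start: {p.is_alive()}')
    p.terminate()  # on Unix this sends SIGTERM to the child
    p.join()       # wait until the child is really gone
    print(f'Alive after terminate: {p.is_alive()}')
    print(f'Exit code: {p.exitcode}')
```

On Unix, a process stopped this way reports a negative exit code (the negated signal number), which is a handy way to tell it apart from one that finished normally.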
Spawning multiple threads
Just for fun, let's play with two threads now:
# starwars.py
import threading
from time import sleep
from random import random

def run(n):
    t = threading.current_thread()
    for count in range(n):
        print(f'Hello from {t.name}! ({count})')
        sleep(0.2 * random())

obi = threading.Thread(target=run, name='Obi-Wan', args=(4, ))
ani = threading.Thread(target=run, name='Anakin', args=(3, ))
obi.start()
ani.start()
obi.join()
ani.join()
The run function simply gets the current thread, and then enters a loop of n cycles, in which it prints a greeting message, and sleeps for a random amount of time, between 0 and 0.2 seconds (random() returns a float between 0 and 1).
The purpose of this example is to show you how a scheduler might jump between threads, so it helps to make them sleep a little. Let's see the output:
$ python starwars.py
Hello from Obi-Wan! (0)
Hello from Anakin! (0)
Hello from Obi-Wan! (1)
Hello from Obi-Wan! (2)
Hello from Anakin! (1)
Hello from Obi-Wan! (3)
Hello from Anakin! (2)
As you can see, the output alternates randomly between the two. Every time that happens, you know a context switch has been performed by the scheduler.
Dealing with race conditions
Now that we have the tools to start threads and run them, let's simulate a race condition such as the one we discussed earlier:
# race.py
import threading
from time import sleep
from random import random

counter = 0
randsleep = lambda: sleep(0.1 * random())

def incr(n):
    global counter
    for count in range(n):
        current = counter
        randsleep()
        counter = current + 1
        randsleep()

n = 5
t1 = threading.Thread(target=incr, args=(n, ))
t2 = threading.Thread(target=incr, args=(n, ))
t1.start()
t2.start()
t1.join()
t2.join()
print(f'Counter: {counter}')
In this example, we define the incr function, which gets a number n in input, and loops over n. In each cycle, it reads the value of the counter, sleeps for a random amount of time (between 0 and 0.1 seconds) by calling randsleep, a tiny lambda function I wrote to improve readability, then increases the value of the counter by 1.
I chose to use global in order to have read/write access to counter, but it could be anything really, so feel free to experiment with that yourself.
The whole script basically starts two threads, each of which runs the same function, and gets n = 5. Notice how we need to join on both threads at the end to make sure that when we print the final value of the counter (last line), both threads are done doing their work.
When we print the final value, we would expect the counter to be 10, right? Two threads, five loops each, that makes 10. However, we almost never get 10 if we run this script. I ran it myself many times, and it seems to always hit somewhere between 5 and 7. The reason this happens is that there is a race condition in this code, and those random sleeps I added are there to exacerbate it. If you removed them, there would still be a race condition, because the counter is increased in a non-atomic way (which means an operation that can be broken down in multiple steps, and therefore paused in between). However, the likelihood of that race condition showing is really low, so adding the random sleep helps.
Let's analyze the code. t1 gets the current value of the counter, say, 3. t1 then sleeps for a moment. If the scheduler switches context in that moment, pausing t1 and starting t2, t2 will read the same value, 3. Whatever happens afterward, we know that both threads will update the counter to be 4, which will be incorrect as after two readings it should have gone up to 5. Adding the second random sleep call, after the update, helps the scheduler switch more frequently, and makes it easier to show the race condition. Try commenting out one of them, and see how the result changes (it will do so, dramatically).
Now that we have identified the issue, let's fix it by using a lock. The code is basically the same, so I'll show you only what changes:
# race_with_lock.py
incr_lock = threading.Lock()

def incr(n):
    global counter
    for count in range(n):
        with incr_lock:
            current = counter
            randsleep()
            counter = current + 1
            randsleep()
This time we have created a lock, from the threading.Lock class. We could call its acquire and release methods manually, or we can be Pythonic and use it within a context manager, which looks much nicer, and does the whole acquire/release business for us. Notice I left the random sleeps in the code. However, every time you run it, it will now return 10.
The difference is this: when the first thread acquires that lock, it doesn't matter that when it's sleeping, a moment later, the scheduler switches the context. The second thread will try to acquire the lock, and Python will answer with a resounding no. So, the second thread will just sit and wait until that lock is released. As soon as the scheduler switches back to the first thread, and the lock is released, then the other thread will have a chance (if it gets there first, which is not necessarily guaranteed), to acquire the lock and update the counter. Try adding some prints into that logic to see whether the threads alternate perfectly or not. My guess is that they won't, at least not every time. Remember the threading.current_thread function, to be able to see which thread is actually printing the information.
Python offers several data structures in the threading module: Lock, RLock, Condition, Semaphore, Event, Timer, and Barrier. I won't be able to show you all of them, because unfortunately I don't have the room to explain all the use cases, but reading the documentation of the threading module (https://docs.python.org/3.7/library/threading.html) will be a good place to start understanding them.
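To give you a taste of one of the structures in that list, here is a minimal sketch of a Semaphore. While a Lock lets one thread through at a time, a Semaphore created with a value of 2 lets up to two threads through. The worker function and the counters that track how many threads are inside at once are my own additions, purely for illustration.

```python
import threading
from time import sleep

slots = threading.Semaphore(2)  # at most two threads inside at once
stats_lock = threading.Lock()   # protects the two counters below
active = 0
max_active = 0


def worker():
    global active, max_active
    with slots:
        with stats_lock:
            active += 1
            max_active = max(max_active, active)
        sleep(0.02)  # pretend to use a scarce resource
        with stats_lock:
            active -= 1


threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f'Peak number of threads inside: {max_active}')
```

Six threads are started, but the peak never goes above 2, because the other threads block on the semaphore until a slot frees up.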
Let's now see an example about thread-local data.
A thread's local data
The threading module offers a way to implement local data for threads. Local data is an object that holds thread-specific data. Let me show you an example, and allow me to sneak in a Barrier too, so I can tell you how it works:
# local.py
import threading
from random import randint

local = threading.local()

def run(local, barrier):
    local.my_value = randint(0, 10**2)
    t = threading.current_thread()
    print(f'Thread {t.name} has value {local.my_value}')
    barrier.wait()
    print(f'Thread {t.name} still has value {local.my_value}')

count = 3
barrier = threading.Barrier(count)
threads = [
    threading.Thread(
        target=run, name=f'T{name}', args=(local, barrier)
    ) for name in range(count)
]
for t in threads:
    t.start()
We start by defining local. That is the special object that holds thread-specific data. We run three threads. Each of them will assign a random value to local.my_value, and print it. Then the thread reaches a Barrier object, which is programmed to hold three threads in total. When the barrier is hit by the third thread, they all can pass. It's basically a nice way to make sure that N threads reach a certain point and they all wait until every single one of them has arrived.
Now, if local was a normal, dummy object, the second thread would override the value of local.my_value, and the third would do the same. This means that we would see them printing different values in the first set of prints, but they would show the same value (the last one) in the second round of prints. But that doesn't happen, thanks to local. The output shows the following:
$ python local.py
Thread T0 has value 61
Thread T1 has value 52
Thread T2 has value 38
Thread T2 still has value 38
Thread T0 still has value 61
Thread T1 still has value 52
Notice the wrong order, due to the scheduler switching context, but the values are all correct.
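If you want to convince yourself that each thread really gets its own copy of the attributes, here is a tiny deterministic sketch. The main thread sets a value on a threading.local object, and a worker thread then checks whether it can see it. The seen_by_worker dictionary is just a hypothetical way to carry the findings back to the main thread.

```python
import threading

local = threading.local()
local.my_value = 'main'
seen_by_worker = {}


def worker():
    # The attribute set by the main thread is not visible here.
    seen_by_worker['before'] = hasattr(local, 'my_value')
    local.my_value = 'worker'
    seen_by_worker['after'] = local.my_value


t = threading.Thread(target=worker)
t.start()
t.join()

print(seen_by_worker)  # {'before': False, 'after': 'worker'}
print(local.my_value)  # main
```

The worker's assignment leaves the main thread's value untouched, which is exactly the behavior a normal object would not give you.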
Thread and process communication
We have seen quite a lot of examples so far. So, let's explore how to make threads and processes talk to one another by employing a queue. Let's start with threads.
Thread communication
For this example, we will be using a normal Queue, from the queue module:
# comm_queue.py
import threading
from queue import Queue

SENTINEL = object()

def producer(q, n):
    a, b = 0, 1
    while a <= n:
        q.put(a)
        a, b = b, a + b
    q.put(SENTINEL)

def consumer(q):
    while True:
        num = q.get()
        q.task_done()
        if num is SENTINEL:
            break
        print(f'Got number {num}')

q = Queue()
cns = threading.Thread(target=consumer, args=(q, ))
prd = threading.Thread(target=producer, args=(q, 35))
cns.start()
prd.start()
q.join()
The logic is very basic. We have a producer function that generates Fibonacci numbers and puts them in a queue. When the next number is greater than a given n, the producer exits the while loop, and puts one last thing in the queue: a SENTINEL. A SENTINEL is any object that is used to signal something, and in our case, it signals to the consumer that the producer is done.
The interesting bit of logic is in the consumer function. It loops indefinitely, reading values out of the queue and printing them out. There are a couple of things to notice here. First, see how we are calling q.task_done()? That is to acknowledge that the element in the queue has been processed. The purpose of this is to allow the final instruction in the code, q.join(), to unblock when all elements have been acknowledged, so that the execution can end.
Second, notice how we use the is operator to compare against the items in order to find the sentinel. We'll see shortly that when using a multiprocessing.Queue this won't be possible anymore. Before we get there, would you be able to guess why?
Running this example produces a series of lines, such as Got number 0, Got number 1, and so on, until 34, since the limit we put is 35, and the next Fibonacci number would be 55.
Sending events
Another way to make threads communicate is to fire events. Let me quickly show you an example of that:
# evt.py
import threading

def fire():
    print('Firing event...')
    event.set()

def listen():
    event.wait()
    print('Event has been fired')

event = threading.Event()
t1 = threading.Thread(target=fire)
t2 = threading.Thread(target=listen)
t2.start()
t1.start()
Here we have two threads that run fire and listen, respectively firing and listening for an event. To fire an event, call the set method on it. The t2 thread, which is started first, is already listening to the event, and will sit there until the event is fired. The output from the previous example is the following:
$ python evt.py
Firing event...
Event has been fired
Events are great in some situations. Think about having threads that are waiting on a connection object to be ready, before they can actually start using it. They could be waiting on an event, and one thread could be checking that connection, and firing the event when it's ready. Events are fun to play with, so make sure you experiment and think about use cases for them.
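The connection scenario could look something like the following sketch. There is no real connection here: connect merely sleeps for a moment to imitate a slow handshake, and all the names (connection_ready, use_connection, the W0 to W2 workers) are invented for the example. I also pass a timeout to wait, so that a worker can give up instead of blocking forever.

```python
import threading
from time import sleep

connection_ready = threading.Event()
log = []


def connect():
    sleep(0.05)  # stand-in for a slow connection handshake
    connection_ready.set()


def use_connection(name):
    # wait() returns True if the event was set, False on timeout.
    if connection_ready.wait(timeout=2):
        log.append(f'{name} is using the connection')
    else:
        log.append(f'{name} gave up waiting')


workers = [
    threading.Thread(target=use_connection, args=(f'W{i}', ))
    for i in range(3)
]
for w in workers:
    w.start()
connector = threading.Thread(target=connect)
connector.start()
for t in workers + [connector]:
    t.join()

print(sorted(log))
```

A single call to set releases all the threads that are waiting on the event, which is what makes events convenient for this kind of "ready" signal.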
Inter-process communication with queues
Let's now see how to communicate between processes using a queue. This example is very very similar to the one for threads:
# comm_queue_proc.py
import multiprocessing

SENTINEL = 'STOP'

def producer(q, n):
    a, b = 0, 1
    while a <= n:
        q.put(a)
        a, b = b, a + b
    q.put(SENTINEL)

def consumer(q):
    while True:
        num = q.get()
        if num == SENTINEL:
            break
        print(f'Got number {num}')

q = multiprocessing.Queue()
cns = multiprocessing.Process(target=consumer, args=(q, ))
prd = multiprocessing.Process(target=producer, args=(q, 35))
cns.start()
prd.start()
As you can see, in this case, we have to use a queue that is an instance of multiprocessing.Queue, which doesn't expose a task_done method. However, we don't need to join on the queue here: the two processes are not daemons, so the main process waits for them to finish before it exits. Therefore we only need to start the two processes and all will work. The output of this example is the same as the one before.
When it comes to IPC, be careful. Objects are pickled when they enter the queue, so IDs get lost, and there are a few other subtle things to take care of. This is why in this example I can no longer use an object as a sentinel, and compare using is, like I did in the multi-threaded version. That sentinel object would be pickled in the queue (because this time the Queue comes from multiprocessing and not from queue like before), and would assume a new ID after unpickling, failing to compare correctly. The string "STOP" in this case does the trick, and it will be up to you to find a suitable value for a sentinel, which needs to be something that could never clash with any of the items that could be in the same queue. I leave it up to you to refer to the documentation, and learn as much as you can on this topic.
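You don't need two processes to see why the identity check fails. The following sketch runs the pickle round trip by hand, which is a simplification of what happens to an item as it travels through a multiprocessing.Queue. The variable names are mine.

```python
import pickle

# An anonymous object, like the sentinel of the threaded example.
sentinel = object()
sentinel_copy = pickle.loads(pickle.dumps(sentinel))
print(sentinel_copy is sentinel)  # False: a brand new object

# A string survives the trip as far as equality is concerned.
stop = 'STOP'
stop_copy = pickle.loads(pickle.dumps(stop))
print(stop_copy == stop)  # True

# None is a singleton, so even identity survives.
none_copy = pickle.loads(pickle.dumps(None))
print(none_copy is None)  # True
```

This is why the multiprocessing version compares with == against a value, rather than with is against an object. None can also work as a sentinel, provided it can never be a legitimate item in your queue.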
Queues aren't the only way to communicate between processes. You can also use pipes (multiprocessing.Pipe), which provide a connection (as in, a pipe, clearly) from one process to another, and vice versa. You can find plenty of examples in the documentation; they aren't that different from what we've seen here.
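To show the shape of the Pipe API, here is a deliberately simplified sketch in which both ends of the pipe live in the same process. In a real program, you would hand one of the two connection objects to a child process through its args. The message contents are made up.

```python
from multiprocessing import Pipe

# Pipe() returns two connection objects, one for each end.
# By default the pipe is duplex, so both ends can send and receive.
parent_conn, child_conn = Pipe()

child_conn.send({'host': 'example.com', 'attempt': 1})
message = parent_conn.recv()
print(message)

parent_conn.send('ack')
reply = child_conn.recv()
print(reply)

parent_conn.close()
child_conn.close()
```

Just like with multiprocessing.Queue, whatever you send is pickled on one side and unpickled on the other, so the same caveats about object identity apply.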
Thread and process pools
As mentioned before, pools are structures designed to hold N objects (threads, processes, and so on). When the usage reaches capacity, no work is assigned to a thread (or process) until one of those currently working becomes available again. Pools, therefore, are a great way to limit the number of threads (or processes) that can be alive at the same time, preventing the system from starving due to resource exhaustion, or the computation time from being affected by too much context switching.
In the following examples, I will be tapping into the concurrent.futures module to use the ThreadPoolExecutor and ProcessPoolExecutor executors. These two classes use a pool of threads (and processes, respectively), to execute calls asynchronously. They both accept a parameter, max_workers, which sets the upper limit to how many threads (or processes) can be used at the same time by the executor.
Let's start from the multithreaded example:
# pool.py
from concurrent.futures import ThreadPoolExecutor, as_completed
from random import randint
import threading

def run(name):
    value = randint(0, 10**2)
    tname = threading.current_thread().name
    print(f'Hi, I am {name} ({tname}) and my value is {value}')
    return (name, value)

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(run, f'T{name}') for name in range(5)
    ]
    for future in as_completed(futures):
        name, value = future.result()
        print(f'Thread {name} returned {value}')
After importing the necessary bits, we define the run function. It gets a random value, prints it, and returns it, along with the name argument it was called with. The interesting bit comes right after the function.
As you can see, we're using a context manager to call ThreadPoolExecutor, to which we pass max_workers=3, which means the pool size is 3. This means only three threads at any time will be alive.
We define a list of future objects by making a list comprehension, in which we call submit on our executor object. We instruct the executor to run the run function, with a name that will go from T0 to T4. A future is an object that encapsulates the asynchronous execution of a callable.
Then we loop over the future objects, as they are done. To do this, we use as_completed to get an iterator of the future instances that returns them as soon as they complete (finish or were cancelled). We grab the result of each future by calling the homonymous method, and simply print it. Given that run returns a tuple name, value, we expect the result to be a two-tuple containing name and value. If we print the output of a run (bear in mind each run can potentially be slightly different), we get:
$ python pool.py
Hi, I am T0 (ThreadPoolExecutor-0_0) and my value is 5
Hi, I am T1 (ThreadPoolExecutor-0_0) and my value is 23
Hi, I am T2 (ThreadPoolExecutor-0_1) and my value is 58
Thread T1 returned 23
Thread T0 returned 5
Hi, I am T3 (ThreadPoolExecutor-0_0) and my value is 93
Hi, I am T4 (ThreadPoolExecutor-0_1) and my value is 62
Thread T2 returned 58
Thread T3 returned 93
Thread T4 returned 62
Before reading on, can you tell why the output looks like this? Could you explain what happened? Spend a moment thinking about it.
So, what goes on is that three tasks start running, so we get three Hi, I am... messages printed out. Once all three of them are running, the pool is at capacity, so we need to wait for at least one of them to complete before anything else can happen. In the example run, T0 and T1 complete (which is signaled by the printing of what they returned), so their threads return to the pool and can be used again. They get run with names T3 and T4, and finally all three, T2, T3, and T4, complete. You can see from the thread names in the output how the threads are actually reused, and how they are reassigned to T3 and T4 once they become available.
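When you don't need to handle results in completion order, the executor's map method is a shorter alternative to submit plus as_completed. The sketch below uses a made-up square function; map returns the results in the same order as the inputs, regardless of which task finishes first.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


with ThreadPoolExecutor(max_workers=3) as executor:
    # Results come back in input order, not in completion order.
    results = list(executor.map(square, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```

The pool size still caps how many calls run at the same time; map only changes how you collect the results.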
Let's now see the same example, but with the multiprocess design:
# pool_proc.py
from concurrent.futures import ProcessPoolExecutor, as_completed
from random import randint
from time import sleep

def run(name):
    sleep(.05)
    value = randint(0, 10**2)
    print(f'Hi, I am {name} and my value is {value}')
    return (name, value)

with ProcessPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(run, f'P{name}') for name in range(5)
    ]
    for future in as_completed(futures):
        name, value = future.result()
        print(f'Process {name} returned {value}')
The difference is truly minimal. We use ProcessPoolExecutor this time, and the run function is exactly the same, with one small addition: we sleep for 50 milliseconds at the beginning of each run. This is to exacerbate the behavior and have the output clearly show the size of the pool, which is still three. If we run the example, we get:
$ python pool_proc.py
Hi, I am P0 and my value is 19
Hi, I am P1 and my value is 97
Hi, I am P2 and my value is 74
Process P0 returned 19
Process P1 returned 97
Process P2 returned 74
Hi, I am P3 and my value is 80
Hi, I am P4 and my value is 68
Process P3 returned 80
Process P4 returned 68
This output clearly shows the pool size being three. It is very interesting to notice that if we remove that call to sleep, most of the time the output will have five prints of Hi, I am..., followed by five prints of Process Px returned.... How can we explain that? Well it's simple. By the time the first three processes are done, and returned by as_completed, all three are asked for their result, and whatever is returned, is printed. While this happens, the executor can already start recycling two processes to run the final two tasks, and they happen to print their Hi, I am... messages, before the prints in the for loop are allowed to take place.
This basically means ProcessPoolExecutor is quite fast and aggressive (in terms of getting the scheduler's attention), and it's worth noting that this behavior doesn't happen with the thread counterpart, in which, if you recall, we didn't need to use any artificial sleeping.
The important thing to keep in mind though, is being able to appreciate that even simple examples such as these can already be slightly tricky to understand or explain. Let this be a lesson to you, so that you raise your attention to 110% when you code for multithreaded or multiprocess designs.
Let's now move on to a more interesting example.
Using a process to add a timeout to a function
Most, if not all, libraries that expose functions to make HTTP requests provide the ability to specify a timeout when performing the request. This means that if, after X seconds (X being the timeout), the request hasn't completed, the whole operation is aborted and execution resumes from the next instruction. Not all functions expose this feature though, so when a function doesn't provide the ability to be interrupted, we can use a process to simulate that behavior. In this example, we'll be trying to translate a hostname into an IPv4 address. The gethostbyname function, from the socket module, doesn't allow us to put a timeout on the operation, so we use a process to do that artificially. The code that follows might not be so straightforward, so I encourage you to spend some time going through it before you read on for the explanation:
# hostres/util.py
import socket
from multiprocessing import Process, Queue

def resolve(hostname, timeout=5):
    exitcode, ip = resolve_host(hostname, timeout)
    if exitcode == 0:
        return ip
    else:
        return hostname

def resolve_host(hostname, timeout):
    queue = Queue()
    proc = Process(target=gethostbyname, args=(hostname, queue))
    proc.start()
    proc.join(timeout=timeout)
    if queue.empty():
        proc.terminate()
        ip = None
    else:
        ip = queue.get()
    return proc.exitcode, ip

def gethostbyname(hostname, queue):
    ip = socket.gethostbyname(hostname)
    queue.put(ip)
Let's start from resolve. It simply takes a hostname and a timeout, and calls resolve_host with them. If the exit code is 0 (which means the process terminated correctly), it returns the IPv4 address that corresponds to that host. Otherwise, it returns the hostname itself, as a fallback mechanism.
Next, let's talk about gethostbyname. It takes a hostname and a queue, and calls socket.gethostbyname to resolve the hostname. When the result is available, it is put into the queue. Now, this is where the issue lies. If the call to socket.gethostbyname takes longer than the timeout we want to assign, we need to kill it.
The resolve_host function does exactly this. It receives the hostname and the timeout and, at first, it simply creates a queue. Then it spawns a new process that takes gethostbyname as the target, and passes the appropriate arguments. Then the process is started and joined on, but with a timeout.
Now, the successful scenario is this: the call to socket.gethostbyname succeeds quickly, the IP is in the queue, the process terminates well before its timeout expires, and when we get to the if part, the queue will not be empty. We fetch the IP from it, and return it, alongside the process exit code.
In the unsuccessful scenario, the call to socket.gethostbyname takes too long, and the process is killed after its timeout has expired. Because the call did not complete, no IP has been inserted in the queue, and therefore it will be empty. In the if logic, we therefore set the IP to None, and return as before. The resolve function will find that the exit code is not 0 (as the process didn't terminate happily, but was killed instead), and will correctly return the hostname instead of the IP, which we couldn't get anyway.
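For contrast, here is a minimal sketch of the same idea built on a thread rather than a process. It is not the approach used in hostres/util.py: the helper name call_with_timeout and the slow function are hypothetical, made up for illustration. A thread cannot be killed from the outside, so on timeout we can only stop waiting for it, which is exactly the limitation that makes a process the better tool when the call really has to be aborted:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
from time import sleep

def call_with_timeout(func, *args, timeout=1.0, default=None):
    """Hypothetical helper: return func(*args), or default on timeout."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        # We give up waiting, but the worker thread keeps running
        # in the background until func returns on its own.
        return default
    finally:
        # Do not block here waiting for the worker to finish.
        executor.shutdown(wait=False)

def slow(seconds):
    # Stand-in for a blocking call such as socket.gethostbyname.
    sleep(seconds)
    return 'done'

fast_result = call_with_timeout(slow, 0.01, timeout=1.0)
late_result = call_with_timeout(slow, 0.5, timeout=0.05, default='gave up')
print(fast_result, late_result)
```

The second call returns the default after about 50 milliseconds, but the thread behind it still runs to completion, so the interpreter will wait for it at exit. With a process, terminate() removes that leftover work.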
In the source code of the book, in the hostres folder of this chapter, I have added some tests to make sure this behavior is actually correct. You can find instructions on how to run them in the README.md file in the folder. Make sure you check the test code too, it should be quite interesting.
Case examples
In this final part of the chapter, I am going to show you three case examples in which we'll see how to do the same thing by employing different approaches (single-thread, multithread, and multiprocess). Finally, I'll dedicate a few words to asyncio, a module that introduces yet another way of doing asynchronous programming in Python.
Example one – concurrent mergesort
The first example will revolve around the mergesort algorithm. This sorting algorithm is based on the divide et impera (divide and conquer) design paradigm. The way it works is very simple. You have a list of numbers you want to sort. The first step is to divide the list into two parts, sort them, and merge the results back into one sorted list. Let me give you a simple example with six numbers. Imagine we have a list, v = [8, 5, 3, 9, 0, 2]. The first step would be to divide the list, v, into two sublists of three numbers: v1 = [8, 5, 3] and v2 = [9, 0, 2]. Then we sort v1 and v2 by recursively calling mergesort on them. The result would be v1 = [3, 5, 8] and v2 = [0, 2, 9]. In order to combine v1 and v2 back into a sorted v, we simply consider the first item in both lists, and pick the minimum of those. The first iteration would compare 3 and 0. We pick 0, leaving v2 = [2, 9]. Then we rinse and repeat: we compare 3 and 2, we pick 2, so now v2 = [9]. Then we compare 3 and 9. This time we pick 3, leaving v1 = [5, 8], and so on and so forth. Next we would pick 5 (5 versus 9), then 8 (8 versus 9), and finally 9. This would give us a new, sorted version of v: v = [0, 2, 3, 5, 8, 9].
The reason why I chose this algorithm as an example is twofold. First, it is easy to parallelize. You split the list in two, have two processes work on them, and then collect the results. Second, it is possible to amend the algorithm so that it splits the initial list into any number of parts N ≥ 2, and assigns those parts to N processes. Recombination is as simple as dealing with just two parts. This characteristic makes it a good candidate for a concurrent implementation.
Single-thread mergesort
Let's see how all this translates into code, starting by learning how to code our own homemade mergesort:
# ms/algo/mergesort.py
def sort(v):
    if len(v) <= 1:
        return v
    mid = len(v) // 2
    v1, v2 = sort(v[:mid]), sort(v[mid:])
    return merge(v1, v2)

def merge(v1, v2):
    v = []
    h = k = 0
    len_v1, len_v2 = len(v1), len(v2)
    while h < len_v1 or k < len_v2:
        if k == len_v2 or (h < len_v1 and v1[h] < v2[k]):
            v.append(v1[h])
            h += 1
        else:
            v.append(v2[k])
            k += 1
    return v
Let's start from the sort function. First we encounter the base of the recursion, which says that if the list has 0 or 1 elements, we don't need to sort it, we can simply return it as it is. If that is not the case, then we calculate the midpoint (mid), and recursively call sort on v[:mid] and v[mid:]. I hope you are by now very familiar with the slicing syntax, but just in case you need a refresher, the first one is all elements in v up to the mid index (excluded), and the second one is all elements from mid to the end. The results of sorting them are assigned respectively to v1 and v2. Finally, we call merge, passing v1 and v2.
The logic of merge uses two pointers, h and k, to keep track of which elements in v1 and v2 we have already compared. If we find that the minimum is in v1, we append it to v, and increase h. On the other hand, if the minimum is in v2, we append it to v but increase k this time. The procedure is running in a while loop whose condition, combined with the inner if, makes sure we don't get errors due to indexes out of bounds. It's a pretty standard algorithm that you can find in many different variations on the web.
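To watch the two pointers at work, here is a small sketch that follows the same logic as merge but also records every comparison. The function name merge_steps is hypothetical and exists only for this illustration; the input is the pair of sorted halves from the six-number walkthrough earlier in this section:

```python
def merge_steps(v1, v2):
    """Merge two sorted lists, recording (left, right, picked) per step."""
    merged, steps = [], []
    h = k = 0
    while h < len(v1) or k < len(v2):
        # None marks a list that has already been fully consumed.
        left = v1[h] if h < len(v1) else None
        right = v2[k] if k < len(v2) else None
        if right is None or (left is not None and left < right):
            picked = left
            h += 1
        else:
            picked = right
            k += 1
        merged.append(picked)
        steps.append((left, right, picked))
    return merged, steps

merged, steps = merge_steps([3, 5, 8], [0, 2, 9])
for left, right, picked in steps:
    print(f'{left} vs {right} -> {picked}')
print(merged)  # [0, 2, 3, 5, 8, 9]
```

The first line printed is 3 vs 0 -> 0, matching the first iteration described in the walkthrough, and the last step has nothing left in the first list, which is the out-of-bounds case the while condition and the inner if guard against.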
In order to make sure this code is solid, I have written a test suite that resides in the ch10/ms folder. I encourage you to check it out.
Now that we have the building blocks, let's see how we modify this to make it so that it works with an arbitrary number of parts.
Single-thread multipart mergesort
The code for the multipart version of the algorithm is quite simple. We can reuse the merge function, but we'll have to rewrite the sort one:
# ms/algo/multi_mergesort.py
from functools import reduce
from .mergesort import merge

def sort(v, parts=2):
    assert parts > 1, 'Parts need to be at least 2.'
    if len(v) <= 1:
        return v
    chunk_len = max(1, len(v) // parts)
    chunks = (
        sort(v[k: k + chunk_len], parts=parts)
        for k in range(0, len(v), chunk_len)
    )
    return multi_merge(*chunks)

def multi_merge(*v):
    return reduce(merge, v)
We saw reduce in Chapter 4, Functions, the Building Blocks of Code, when we coded our own factorial function. The way it works within multi_merge is to merge the first two lists in v. Then the result is merged with the third one, after which the result is merged with the fourth one, and so on.
Take a look at the new version of sort. It takes the v list, and the number of parts we want to split it into. The first thing we do is check that we passed a correct number for parts, which needs to be at least two. Then, like before, we have the base of the recursion. And finally we get into the main logic of the function, which is simply a multipart version of the one we saw in the previous example. We calculate the length of each chunk using the max function, just in case there are fewer elements in the list than parts. And then we write a generator expression that calls sort recursively on each chunk. Finally, we merge all the results by calling multi_merge.
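The folding performed by reduce can be checked by hand on a small list. The sketch below is only an illustration under a few assumptions: merge is a stand-in built on heapq.merge from the standard library rather than the book's own function, the built-in sorted replaces the recursive call, and the helper name split is hypothetical:

```python
from functools import reduce
from heapq import merge as heap_merge

def merge(v1, v2):
    # Stand-in for the book's merge: combines two sorted lists.
    return list(heap_merge(v1, v2))

def split(v, parts):
    # Same chunking rule as in sort: at least one element per chunk.
    chunk_len = max(1, len(v) // parts)
    return [v[k:k + chunk_len] for k in range(0, len(v), chunk_len)]

chunks = split([8, 5, 3, 9, 0, 2, 7], parts=3)
sorted_chunks = [sorted(chunk) for chunk in chunks]

# reduce folds from the left: ((c0 + c1) + c2) + c3
folded = reduce(merge, sorted_chunks)
by_hand = merge(
    merge(merge(sorted_chunks[0], sorted_chunks[1]), sorted_chunks[2]),
    sorted_chunks[3],
)
print(chunks)   # [[8, 5], [3, 9], [0, 2], [7]]
print(folded)   # [0, 2, 3, 5, 7, 8, 9]
```

Both folded and by_hand hold the same sorted list, which is all multi_merge relies on.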
I am aware that in explaining this code, I haven't been as exhaustive as I usually am, and I'm afraid it is on purpose. The example that comes after the mergesort will be much more complex, so I would like to encourage you to really try to understand the previous two snippets as thoroughly as you can.
Now, let's take this example to the next step: multithreading.
Multithreaded mergesort
In this example, we amend the sort function once again, so that, after the initial division into chunks, it spawns a thread per part. Each thread uses the single-threaded version of the algorithm to sort its part, and then at the end we use the multi-merge technique to calculate the final result. Translating into Python:
# ms/algo/mergesort_thread.py
from functools import reduce
from math import ceil
from concurrent.futures import ThreadPoolExecutor, as_completed
from .mergesort import sort as _sort, merge

def sort(v, workers=2):
    if len(v) == 0:
        return v
    dim = ceil(len(v) / workers)
    chunks = (v[k: k + dim] for k in range(0, len(v), dim))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [
            executor.submit(_sort, chunk) for chunk in chunks
        ]
        return reduce(
            merge,
            (future.result() for future in as_completed(futures))
        )
We import all the required tools, including executors, the ceiling function, and sort and merge from the single-threaded version of the algorithm. Notice how I changed the name of the single-threaded sort into _sort upon importing it.
In this version of sort, we check whether v is empty first, and if not we proceed. We calculate the dimension of each chunk using the ceil function. It's basically doing what we were doing with max in the previous snippet, but I wanted to show you another way to solve the issue.
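The two sizing rules both avoid a chunk length of zero, but they do not always split a list the same way. The following sketch compares them on a seven-element list; the helper names chunks_floor and chunks_ceil are hypothetical and only wrap the expressions used in the two modules:

```python
from math import ceil

def chunks_floor(v, parts):
    size = max(1, len(v) // parts)   # rule used in multi_mergesort.py
    return [v[k:k + size] for k in range(0, len(v), size)]

def chunks_ceil(v, workers):
    size = ceil(len(v) / workers)    # rule used in mergesort_thread.py
    return [v[k:k + size] for k in range(0, len(v), size)]

v = list(range(7))
floor_sizes = [len(c) for c in chunks_floor(v, 3)]
ceil_sizes = [len(c) for c in chunks_ceil(v, 3)]
print(floor_sizes)  # [2, 2, 2, 1]
print(ceil_sizes)   # [3, 3, 1]
```

Rounding up never produces more chunks than workers, which suits a pool where each chunk becomes one submitted job. Rounding down can produce an extra, shorter chunk, which is harmless in the recursive single-threaded version.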
When we have the dimension, we calculate the chunks and prepare a nice generator expression to serve them to the executor. The rest is straightforward: we define a list of future objects, each of which is the result of calling submit on the executor. Each future object runs the single-threaded _sort algorithm on the chunk it has been assigned to.
Finally, as they are returned by the as_completed function, the results are merged using the same technique we saw in the earlier multipart example.
Multiprocess mergesort
To perform the final step, we need to amend only two lines in the previous code. If you have paid attention in the introductory examples, you will know which of the two lines I am referring to. In order to save some space, I'll just give you the diff of the code:
# ms/algo/mergesort_proc.py
...
from concurrent.futures import ProcessPoolExecutor, as_completed
...
def sort(v, workers=2):
    ...
    with ProcessPoolExecutor(max_workers=workers) as executor:
    ...
That's it! Basically all you have to do is use ProcessPoolExecutor instead of ThreadPoolExecutor, and instead of spawning threads, you are spawning processes.
Do you recall when I was saying that processes can actually run on different cores, while threads run within the same process so they are not actually running in parallel? This is a good example to show you a consequence of choosing one approach or the other. Because the code is CPU-intensive, and there is no IO going on, splitting the list and having threads working the chunks doesn't add any advantage. On the other hand, using processes does. I have run some performance tests (run the ch10/ms/performance.py module by yourself and you will see how your machine performs) and the results prove my expectations:
$ python performance.py
Testing Sort
Size: 100000
Elapsed time: 0.492s
Size: 500000
Elapsed time: 2.739s
Testing Sort Thread
Size: 100000
Elapsed time: 0.482s
Size: 500000
Elapsed time: 2.818s
Testing Sort Proc
Size: 100000
Elapsed time: 0.313s
Size: 500000
Elapsed time: 1.586s
The two tests are run on two lists of 100,000 and 500,000 items, respectively, and I am using four workers for the multithreaded and multiprocessing versions. Using different sizes is quite useful when looking for patterns. As you can see, the time elapsed is basically the same for the first two versions (single-threaded and multithreaded), while the multiprocessing version cuts it by roughly 40%. It takes slightly more than 50% of the original time because having to spawn processes, and handle them, comes at a price. But still, you can definitely appreciate that I have a processor with two cores on my machine.
This also tells you that even though I used four workers in the multiprocessing version, I can still only parallelize proportionately to the number of cores my processor has. Therefore, going beyond two workers makes very little difference.
Now that you are all warmed up, let's move on to the next example.
Example two – batch sudoku-solver
In this example, we are going to explore a sudoku-solver. We are not going to go into much detail with it, as the point is not that of understanding how to solve sudoku, but rather to show you how to use multiprocessing to solve a batch of sudoku puzzles.
What is interesting in this example is that instead of making the comparison between single and multithreaded versions again, we're going to skip that and compare the single-threaded version with two different multiprocess versions. One will assign one puzzle per worker, so if we solve 1,000 puzzles, we'll use 1,000 workers (well, we will use a pool of N workers, each of which is constantly recycled). The other version will instead divide the initial batch of puzzles by the pool size, and batch-solve each chunk within one process. This means, assuming a pool size of four, dividing those 1,000 puzzles into chunks of 250 puzzles each, and giving each chunk to one worker, for a total of four of them.
The code I will present to you for the sudoku-solver (without the multiprocessing part) comes from a solution designed by Peter Norvig, which has been distributed under the MIT license. His solution is so efficient that, after trying to re-implement my own for a few days, and getting to the same result, I simply gave up and decided to go with his design. I did do a lot of refactoring though, because I wasn't happy with his choice of function and variable names, so I made those more book friendly, so to speak. You can find the original code, a link to the original page from which I got it, and the original MIT license, in the ch10/sudoku/norvig folder. If you follow the link, you'll find a very thorough explanation of the sudoku-solver by Norvig himself.
What is Sudoku?
First things first. What is a sudoku puzzle? Sudoku is a number-placement puzzle based on logic that originated in Japan. The objective is to fill a 9x9 grid with digits so that each row, column, and box (3x3 subgrids that compose the grid) contains all of the digits from 1 to 9. You start from a partially populated grid, and add number after number using logic considerations.
Sudoku can be interpreted, from a computer science perspective, as a problem that fits in the exact cover category. Donald Knuth, the author of The Art of Computer Programming (and many other wonderful books), has devised an algorithm, called Algorithm X, to solve problems in this category. A beautiful and efficient implementation of Algorithm X, called Dancing Links, which harnesses the power of circular doubly-linked lists, can be used to solve sudoku. The beauty of this approach is that all it requires is a mapping between the structure of the sudoku and the Dancing Links algorithm, and without having to do any of the logic deductions normally needed to solve the puzzle, it gets to the solution at the speed of light.
Many years ago, when my free time was a number greater than zero, I wrote a Dancing Links sudoku-solver in C#, which I still have archived somewhere, which was great fun to design and code. I definitely encourage you to check out the literature and code your own solver, it's a great exercise, if you can spare the time.
In this example's solution though, we're going to use a search algorithm used in conjunction with a process that, in artificial intelligence, is known as constraint propagation. The two are quite commonly used together to make a problem simpler to solve. We'll see that in our example, they are enough for us to be able to solve a difficult sudoku in a matter of milliseconds.
Implementing a sudoku-solver in Python
Let's now explore my refactored implementation of the solver. I'm going to present the code to you in steps, as it is quite involved (also, I won't repeat the source name at the top of each snippet, until I move to another module):
# sudoku/algo/solver.py
import os
from itertools import zip_longest, chain
from time import time

def cross_product(v1, v2):
    return [w1 + w2 for w1 in v1 for w2 in v2]

def chunk(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
We start with some imports, and then we define a couple of useful functions: cross_product and chunk. They do exactly what the names hint at. The first one returns the cross-product between two iterables, while the second one returns an iterator of chunks (tuples) taken from iterable, each of which has n elements, and the last of which might be padded with a given fillvalue, should the length of iterable not be a multiple of n. Then we proceed to define a few structures, which will be used by the solver:
digits = '123456789'
rows = 'ABCDEFGHI'
cols = digits
squares = cross_product(rows, cols)
all_units = (
    [cross_product(rows, c) for c in cols]
    + [cross_product(r, cols) for r in rows]
    + [cross_product(rs, cs)
        for rs in chunk(rows, 3) for cs in chunk(cols, 3)]
)
units = dict(
    (square, [unit for unit in all_units if square in unit])
    for square in squares
)
peers = dict(
    (square, set(chain(*units[square])) - set([square]))
    for square in squares
)
Without going too much into detail, let's hover over these objects. squares is a list of all squares in the grid. Squares are represented by a string such as A3 or C7. Rows are labeled with letters, and columns with numbers, so A3 will indicate the square in the first row, and third column.
all_units is a list of all possible rows, columns, and blocks. Each of those elements is represented as a list of the squares that belong to the row/column/block. units is a more complex structure. It is a dictionary with 81 keys. Each key represents a square, and the corresponding value is a list with three elements in it: a row, a column, and a block. Of course, those are the row, column, and block that the square belongs to.
Finally, peers is a dictionary very similar to units, but the value of each key (which still represents a square) is a set containing all peers for that square. Peers are defined as all the squares belonging to the row, column, and block the square in the key belongs to, excluding the square itself. These structures will be used in the calculation of the solution, when attempting to solve a puzzle.
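A quick way to get comfortable with these objects is to build them and count. The sketch below repeats the two helpers and the structures so that it runs on its own; units and peers are written as dict comprehensions, which is only a stylistic variation on the dict(...) calls above:

```python
from itertools import zip_longest, chain

def cross_product(v1, v2):
    return [w1 + w2 for w1 in v1 for w2 in v2]

def chunk(iterable, n, fillvalue=None):
    # n references to the same iterator, so zip pulls n items per tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

pairs = cross_product('AB', '12')
groups = list(chunk('ABCDEFG', 3, fillvalue='x'))
print(pairs)    # ['A1', 'A2', 'B1', 'B2']
print(groups)   # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]

digits = '123456789'
rows = 'ABCDEFGHI'
cols = digits
squares = cross_product(rows, cols)
all_units = (
    [cross_product(rows, c) for c in cols]
    + [cross_product(r, cols) for r in rows]
    + [cross_product(rs, cs)
       for rs in chunk(rows, 3) for cs in chunk(cols, 3)]
)
units = {s: [u for u in all_units if s in u] for s in squares}
peers = {s: set(chain(*units[s])) - {s} for s in squares}

# 81 squares, 27 units (9 columns + 9 rows + 9 blocks), 20 peers each
print(len(squares), len(all_units), len(peers['C2']))
```

Every square ends up with 20 peers: 8 others in its row, 8 in its column, and the 4 squares of its block that are in neither.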
Before we take a look at the function that parses the input lines, let me give you an example of what an input puzzle looks like:
1..3.......75...3..3.4.8.2...47....9.........689....4..5..178.4.....2.75.......1.
The first nine characters represent the first row, then another nine for the second row, and so on. Empty squares are represented by dots:
def parse_puzzle(puzzle):
    assert set(puzzle) <= set('.0123456789')
    assert len(puzzle) == 81
    grid = dict((square, digits) for square in squares)
    for square, digit in zip(squares, puzzle):
        if digit in digits and not place(grid, square, digit):
            return False  # Incongruent puzzle
    return grid

def solve(puzzle):
    grid = parse_puzzle(puzzle)
    return search(grid)
This simple parse_puzzle function is used to parse an input puzzle. We do a little bit of sanity checking at the beginning, asserting that the set of characters in the input puzzle is a subset of the set of all digits plus a dot. Then we make sure we have 81 input characters, and finally we define grid, which initially is simply a dictionary with 81 keys, each of which is a square, all with the same value, which is a string of all possible digits. This is because a square in a completely empty grid has the potential to become any number from 1 to 9. The for loop is definitely the most interesting part. We parse each of the 81 characters in the input puzzle, coupling them with the corresponding square in the grid, and we try to "place" them. I put that in double quotes because, as we'll see in a moment, the place function does much more than simply setting a given number in a given square. If we find that we cannot place a digit from the input puzzle, it means the input is invalid, and we return False. Otherwise, we're good to go and we return the grid.
parse_puzzle is used in the solve function, which simply parses the input puzzle, and unleashes search on it. What follows is therefore the heart of the algorithm:
def search(grid):
    if not grid:
        return False
    if all(len(grid[square]) == 1 for square in squares):
        return grid  # Solved
    values, square = min(
        (len(grid[square]), square) for square in squares
        if len(grid[square]) > 1
    )
    for digit in grid[square]:
        result = search(place(grid.copy(), square, digit))
        if result:
            return result
This simple function first checks whether the grid is actually non-empty (a failed placement hands it False instead of a grid). Then it tries to see whether the grid is solved. A solved grid will have one value per square. If that is not the case, it loops through each square and finds the square with the minimum number of candidates. If a square has a string value of only one digit, it means a number has been placed in that square. But if the value is more than one digit, then those are possible candidates, so we need to find the square with the minimum number of candidates, and try them. Trying a square whose candidates are 23 is much better than trying one whose candidates are 23589. In the first case, we have a 50% chance of getting the right value, while in the second one, we only have 20%. Choosing the square with the minimum number of candidates therefore maximizes the chances for us to place good numbers in the grid.
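The selection step can be tried on its own with a tiny, made-up grid. The four squares and their candidate strings below are hypothetical and far smaller than a real 81-square grid, but the min expression has the same shape as the one in search:

```python
# Hypothetical partial grid: square name -> remaining candidates.
grid = {'A1': '23589', 'A2': '7', 'A3': '23', 'A4': '146'}

# Tuples compare element by element, so min picks the smallest
# candidate count first and uses the square name to break ties.
count, square = min(
    (len(candidates), name)
    for name, candidates in grid.items()
    if len(candidates) > 1          # skip squares already settled
)
print(count, square)  # 2 A3
```

A2 is ignored because it already holds a single digit, and A3 wins over A1 and A4 because it has only two candidates left.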
Once the candidates have been found, we try them in order and if any of them results in being successful, we have solved the grid and we return. You might have noticed the use of the place function in the search too. So let's explore its code:
def place(grid, square, digit):
    """Eliminate all the other values (except digit) from
    grid[square] and propagate.
    Return grid, or False if a contradiction is detected.
    """
    other_vals = grid[square].replace(digit, '')
    if all(eliminate(grid, square, val) for val in other_vals):
        return grid
    return False
This function takes a work-in-progress grid, and tries to place a given digit in a given square. As I mentioned before, "placing" is not that straightforward. In fact, when we place a number, we have to propagate the consequences of that action throughout the grid. We do that by calling the eliminate function, which applies two strategies of the sudoku game:
If a square has only one possible value, eliminate that value from the square's peers
If a unit has only one place for a value, place the value there
Let me briefly offer an example of both points. For the first one, if you place, say, number 7 in a square, then you can eliminate 7 from the list of candidates for all the squares that belong to the row, column, and block that square belongs to.
For the second point, say you're examining the fourth row and, of all the squares that belong to it, only one of them has number 7 in its candidates. This means that number 7 can only go in that precise square, so you should go ahead and place it there.
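The two strategies can be seen in miniature on a toy unit of four squares. Everything in this sketch is hypothetical: the candidate strings are invented, the helper names are made up, and each rule is applied once without the recursive propagation the real solver performs:

```python
# A toy unit (think of part of a row) with invented candidates.
unit = ['D1', 'D2', 'D3', 'D4']
grid = {'D1': '7', 'D2': '127', 'D3': '13', 'D4': '37'}

def drop_from_peers(grid, unit, square):
    """Rule 1: square is settled, so remove its digit from the others."""
    digit = grid[square]
    for other in unit:
        if other != square:
            grid[other] = grid[other].replace(digit, '')

def only_place_for(grid, unit, digit):
    """Rule 2: return the single square that can still hold digit, if any."""
    places = [sq for sq in unit if digit in grid[sq]]
    return places[0] if len(places) == 1 else None

drop_from_peers(grid, unit, 'D1')          # 7 is taken by D1
forced = only_place_for(grid, unit, '2')   # where can 2 still go?
print(grid)    # {'D1': '7', 'D2': '12', 'D3': '13', 'D4': '3'}
print(forced)  # D2
```

After the first rule, D4 is down to a single candidate as well, which in the full algorithm would trigger another round of elimination. That chain reaction is what propagating the consequences means.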
The following function, eliminate, applies these two rules. Its code is quite involved, so instead of going line by line and offering an excruciating explanation, I have added some comments, and will leave you with the task of understanding it:
def eliminate(grid, square, digit):
    """Eliminate digit from grid[square]. Propagate when candidates
    are <= 2.
    Return grid, or False if a contradiction is detected.
    """
    if digit not in grid[square]:
        return grid  # already eliminated
    grid[square] = grid[square].replace(digit, '')

    ## (1) If a square is reduced to one value, eliminate value
    ## from peers.
    if len(grid[square]) == 0:
        return False  # nothing left to place here, wrong solution
    elif len(grid[square]) == 1:
        value = grid[square]
        if not all(
            eliminate(grid, peer, value) for peer in peers[square]
        ):
            return False

    ## (2) If a unit is reduced to only one place for a value,
    ## then put it there.
    for unit in units[square]:
        places = [sqr for sqr in unit if digit in grid[sqr]]
        if len(places) == 0:
            return False  # No place for this value
        elif len(places) == 1:
            # digit can only be in one place in unit,
            # assign it there
            if not place(grid, places[0], digit):
                return False
    return grid
The rest of the functions in the module aren't important for the rest of this example, so I will skip them. You can run this module by itself; it will first perform a series of checks on its data structures, and then it will solve all the sudoku puzzles I have placed in the sudoku/puzzles folder. But that is not what we're interested in, right? We want to see how to solve sudoku using multiprocessing techniques, so let's get to it.
Solving sudoku with multiprocessing
In this module, we're going to implement three functions. The first one simply solves a batch of sudoku puzzles, with no multiprocessing involved. We will use the results for benchmarking. The second and the third ones will use multiprocessing, with and without batch-solving, so we can appreciate the differences. Let's start:
# sudoku/process_solver.py
import os
from functools import reduce
from operator import concat
from math import ceil
from time import time
from contextlib import contextmanager
from concurrent.futures import ProcessPoolExecutor, as_completed
from unittest import TestCase
from algo.solver import solve

@contextmanager
def timer():
    t = time()
    yield
    tot = time() - t
    print(f'Elapsed time: {tot:.3f}s')
After a long list of imports, we define a context manager that we're going to use as a timer device. It takes a reference to the current time (t), and then it yields. After having yielded, that's when the body of the managed context is executed. Finally, on exiting the managed context, we calculate tot, which is the total amount of time elapsed, and print it. It's a simple and elegant context manager written with the decoration technique, and it's super fun. Let's now see the three functions I mentioned earlier:
def batch_solve(puzzles):
    # Single thread batch solve.
    return [solve(puzzle) for puzzle in puzzles]
This one is a single-threaded simple batch solver, which will give us a time to compare against. It simply returns a list of all solved grids. Boring. Now, check out the following code:
def parallel_single_solver(puzzles, workers=4):
    # Parallel solve - 1 process per each puzzle
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = (
            executor.submit(solve, puzzle) for puzzle in puzzles
        )
        return [
            future.result() for future in as_completed(futures)
        ]
This one is much better. It uses ProcessPoolExecutor to run a pool of workers. We spawn one future object per puzzle, and since the pool has four workers by default, each of them ends up solving roughly one-fourth of the puzzles. The logic is extremely similar to any multiprocessing example we have already seen in the chapter. Let's see the third function:
def parallel_batch_solver(puzzles, workers=4):
    # Parallel batch solve - Puzzles are chunked into `workers`
    # chunks. A process for each chunk.
    assert len(puzzles) >= workers
    dim = ceil(len(puzzles) / workers)
    chunks = (
        puzzles[k: k + dim] for k in range(0, len(puzzles), dim)
    )
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = (
            executor.submit(batch_solve, chunk) for chunk in chunks
        )
        results = (
            future.result() for future in as_completed(futures)
        )
        return reduce(concat, results)
This last function is slightly different. Instead of spawning one future object per puzzle, it splits the total list of puzzles into workers chunks, and then creates one future object per chunk. This means that if workers is eight, we're going to spawn eight future objects. Notice that instead of passing solve to executor.submit, we're passing batch_solve, which does the trick. The reason why I coded the last two functions so differently is that I was curious to see how severe the overhead is when we recycle processes from a pool a non-negligible number of times.
Now that we have the functions defined, let's use them:
puzzles_file = os.path.join('puzzles', 'sudoku-topn234.txt')
with open(puzzles_file) as stream:
    puzzles = [puzzle.strip() for puzzle in stream]

# single thread solve
with timer():
    res_batch = batch_solve(puzzles)

# parallel solve, 1 process per puzzle
with timer():
    res_parallel_single = parallel_single_solver(puzzles)

# parallel batch solve, 1 batch per process
with timer():
    res_parallel_batch = parallel_batch_solver(puzzles)

# Quick way to verify that the results are the same, but
# possibly in a different order, as they depend on how the
# processes have been scheduled.
assert_items_equal = TestCase().assertCountEqual
assert_items_equal(res_batch, res_parallel_single)
assert_items_equal(res_batch, res_parallel_batch)
print('Done.')
We use a set of 234 very hard sudoku puzzles for this benchmarking session. As you can see, we simply run the three functions, batch_solve, parallel_single_solver, and parallel_batch_solver, all within a timed context. We collect the results, and, just to make sure, we verify that all the runs have produced the same results.
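Both helpers used in that script, the timed context and the order-insensitive comparison, can be tried in isolation. The sketch below is a variant rather than the book's code: this timer takes a hypothetical results list and stores the elapsed time in it instead of printing, and it uses perf_counter, which is better suited to measuring short intervals:

```python
from contextlib import contextmanager
from time import perf_counter, sleep
from unittest import TestCase

@contextmanager
def timer(results):
    # Code before yield runs on entering the with block,
    # code after yield runs on leaving it.
    start = perf_counter()
    yield
    results.append(perf_counter() - start)

timings = []
with timer(timings):
    sleep(0.01)   # stands in for a batch of puzzles

assert_items_equal = TestCase().assertCountEqual
assert_items_equal([3, 1, 2, 2], [2, 3, 2, 1])   # same items, any order

try:
    assert_items_equal([1, 2, 2], [1, 1, 2])     # same length, counts differ
    same = True
except AssertionError:
    same = False
print(len(timings), same)  # 1 False
```

Despite its name, assertCountEqual checks that both sequences contain the same elements the same number of times, regardless of order. That is the property we need when results come back in scheduling order.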
Of course, in the second and third runs, we have used multiprocessing, so we cannot guarantee that the order in the results will be the same as that of the single-threaded batch_solve. This minor issue is brilliantly solved with the aid of assertCountEqual, one of the worst-named methods in the Python standard library. We find it in the TestCase class, which we can instantiate just to take a reference to the method we need. We're not actually running unit tests, but this is a cool trick, and I wanted to show it to you. Let's see the output of running this module:
$ python process_solver.py
Elapsed time: 5.368s
Elapsed time: 2.856s
Elapsed time: 2.818s
Done.
Wow. That is quite interesting. First of all, you can once again see that my machine has a two-core processor, as the time elapsed for the multiprocessing runs is about half the time taken by the single-threaded solver. However, what is actually much more interesting is the fact that there is basically no difference in the time taken by the two multiprocessing functions. Multiple runs sometimes end in favor of one approach, and sometimes in favor of the other. Understanding why requires a deep understanding of all the components that are taking part in the game, not just the processes, and therefore is not something we can discuss here. It is fairly safe to say though, that the two approaches are comparable in terms of performance.
In the source code for the book, you can find tests in the sudoku folder, with instructions on how to run them. Take the time to check them out!
And now, let's get to the final example.
Example three – downloading random pictures
This example has been fun to code. We are going to download random pictures from a website. I'll show you three versions: a serial one, a multiprocessing one, and finally a solution coded using asyncio. In these examples, we are going to use a website called http://lorempixel.com, which provides you with an API that you can call to get random images. If you find that the website is down or slow, you can use an excellent alternative to it: https://lorempizza.com/.
It may be something of a cliché for a book written by an Italian, but the pictures are gorgeous. You can search for another alternative on the web, if you want to have some fun. Whatever website you choose, please be sensible and try not to hammer it by making a million requests to it. The multiprocessing and asyncio versions of this code can be quite aggressive!
Let's start by exploring the single-threaded version of the code:
# aio/randompix_serial.py
import os
from secrets import token_hex
import requests

PICS_FOLDER = 'pics'
URL = 'http://lorempixel.com/640/480/'

def download(url):
    # Use the url argument rather than the module-level URL constant.
    resp = requests.get(url)
    return save_image(resp.content)

def save_image(content):
    filename = '{}.jpg'.format(token_hex(4))
    path = os.path.join(PICS_FOLDER, filename)
    with open(path, 'wb') as stream:
        stream.write(content)
    return filename

def batch_download(url, n):
    return [download(url) for _ in range(n)]

if __name__ == '__main__':
    saved = batch_download(URL, 10)
    print(saved)
This code should be straightforward to you by now. We define a download function, which makes a request to the given URL, saves the result by calling save_image, and feeds it the body of the response from the website. Saving the image is very simple: we create a random filename with token_hex, just because it's fun, then we calculate the full path of the file, create it in binary mode, and write into it the content of the response. We return the filename to be able to print it on screen. Finally, batch_download simply runs the n requests we want to run and returns the filenames as a result.
You can leapfrog the if __name__ ... line for now, it will be explained in Chapter 12, GUIs and Scripts, and it's not important here. All we do is call batch_download with the URL and we tell it to download 10 images. If you have an editor, open the pics folder, and you can see it getting populated in a few seconds (also notice: the script assumes the pics folder exists).
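If you would rather not create the folder by hand, one possible tweak is to let save_image create it on demand. This is only a sketch: the extra folder parameter is an assumption introduced here (the original function reads the PICS_FOLDER constant), and a temporary directory is used so the example leaves your working directory untouched:

```python
import os
import tempfile
from secrets import token_hex

def save_image(content, folder):
    # Create the target folder if needed; exist_ok avoids an error
    # when it is already there.
    os.makedirs(folder, exist_ok=True)
    filename = '{}.jpg'.format(token_hex(4))   # 8 random hex characters
    path = os.path.join(folder, filename)
    with open(path, 'wb') as stream:
        stream.write(content)
    return filename

base = tempfile.mkdtemp()
folder = os.path.join(base, 'pics')            # does not exist yet
name = save_image(b'not really a jpeg', folder)
print(name, os.listdir(folder))
```

The bytes written here are obviously not a real image. In the script they would be resp.content.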
Let's spice things up a bit. Let's introduce multiprocessing (the code is vastly similar, so I will not repeat it):
# aio/randompix_proc.py
...
from concurrent.futures import ProcessPoolExecutor, as_completed
...
def batch_download(url, n, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = (executor.submit(download, url) for _ in range(n))
        return [future.result() for future in as_completed(futures)]
...
The technique should be familiar to you by now. We simply submit jobs to the executor, and collect the results as they become available. Because this is IO bound code, the processes work quite fast and there is heavy context-switching while the processes are waiting for the API response. If you have a view over the pics folder, you will notice that it's not getting populated in a linear fashion any more, but rather, in batches.
Let's now look at the asyncio version of this example.
Downloading random pictures with asyncio
The code is probably the most challenging of the whole chapter, so don't feel bad if it is too much for you at this moment in time. I have added this example just as a mouthwatering device, to encourage you to dig deeper into the heart of Python asynchronous programming. Another thing worth knowing is that there are probably several other ways to write this same logic, so please bear in mind that this is just one of the possible examples.
The asyncio module provides infrastructure for writing single-threaded, concurrent code using coroutines, multiplexing IO access over sockets and other resources, running network clients and servers, and other related primitives. It was added to Python in version 3.4, and some claim it will become the de facto standard for writing Python code in the future. I don't know whether that's true, but I know it is definitely worth seeing an example:
# aio/randompix_corout.py
import os
from secrets import token_hex
import asyncio
import aiohttp
First of all, we cannot use requests any more, as it is not suitable for asyncio. We have to use aiohttp, so please make sure you have installed it (it's in the requirements for the book):
PICS_FOLDER = 'pics'
URL = 'http://lorempixel.com/640/480/'

async def download_image(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.read()
The previous code does not look too friendly, but it's not so bad, once you know the concepts behind it. We define the async coroutine download_image, which takes a URL as parameter.
In case you don't know, a coroutine is a computer program component that generalizes subroutines for non-preemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations. A subroutine is a sequence of program instructions that performs a specific task, packaged as a unit.
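Suspending and resuming is easier to picture with a tiny experiment that has nothing to do with HTTP. In this sketch (the worker coroutine and the log list are made up for illustration) two coroutines take turns: each one runs until it reaches an await, hands control back to the event loop, and later picks up exactly where it left off. It uses asyncio.run, which is available from Python 3.7:

```python
import asyncio

log = []

async def worker(name, steps):
    for step in range(steps):
        log.append(f'{name}{step}')
        # Suspension point: control goes back to the event loop,
        # which can resume another coroutine in the meantime.
        await asyncio.sleep(0)

async def main():
    # Run both coroutines concurrently in a single thread.
    await asyncio.gather(worker('A', 2), worker('B', 2))

asyncio.run(main())
print(log)
```

With the default event loop the log reads A0, B0, A1, B1: neither coroutine finishes before the other starts, yet no threads or processes are involved. Switching only ever happens at an await, which is what non-preemptive means here.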
Inside download_image, we create a session object using the ClientSession context manager, and then we get the response by using another context manager, this time from session.get. The fact that these managers are defined as asynchronous simply means that they are able to suspend execution in their enter and exit methods. We return the content of the response by using the await keyword, which allows suspension. Notice that creating a session for each request is not optimal, but I felt that for the purpose of this example I would keep the code as straightforward as possible, so I leave its optimization to you, as an exercise.
Let's proceed with the next snippet:
async def download(url, semaphore):
    async with semaphore:
        content = await download_image(url)
    filename = save_image(content)
    return filename

def save_image(content):
    filename = '{}.jpg'.format(token_hex(4))
    path = os.path.join(PICS_FOLDER, filename)
    with open(path, 'wb') as stream:
        stream.write(content)
    return filename
Another coroutine, download, gets a URL and a semaphore. All it does is fetch the content of the image by calling download_image, save it, and return the filename. The interesting bit here is the use of that semaphore. We use it as an asynchronous context manager, so that we can suspend this coroutine as well, and allow a switch to something else, but more than how, it is important to understand why we want to use a semaphore. The reason is simple: this semaphore is kind of the equivalent of a pool of threads. We use it to allow at most N coroutines to be active at the same time. We instantiate it in the next function, and we pass 10 as the initial value. Every time a coroutine acquires the semaphore, its internal counter is decreased by 1, therefore when 10 coroutines have acquired it, the next one will sit and wait, until the semaphore is released by a coroutine that has completed. This is a nice way to try to limit how aggressively we are fetching images from the website API.
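To see the limiting effect of a semaphore without touching the network, here is a minimal sketch. It is not the chapter's code: fake_download, run_batch, the stats dictionary, and the LIMIT of 3 are all hypothetical names and values, and asyncio.sleep stands in for the real download. It records how many coroutines are inside the semaphore at the same time.

```python
import asyncio

LIMIT = 3  # hypothetical cap; the chapter's example uses 10


async def fake_download(index, semaphore, stats):
    # Stand-in for download(): wait for a free slot, then "work".
    async with semaphore:
        stats['active'] += 1
        stats['peak'] = max(stats['peak'], stats['active'])
        await asyncio.sleep(0.01)  # simulates network I/O
        stats['active'] -= 1
    return index


async def run_batch(count):
    semaphore = asyncio.Semaphore(LIMIT)
    stats = {'active': 0, 'peak': 0}
    # gather() returns the results in the order the coroutines were passed.
    results = await asyncio.gather(
        *(fake_download(i, semaphore, stats) for i in range(count))
    )
    return results, stats['peak']


results, peak = asyncio.run(run_batch(10))
print(results, peak)
```

However many coroutines are scheduled, the recorded peak should never exceed LIMIT, which is the behavior the paragraph above describes.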
The save_image function is not a coroutine, and its logic has already been discussed in the previous examples. Let's now get to the part of the code where execution takes place:
def batch_download(images, url):
    loop = asyncio.get_event_loop()
    semaphore = asyncio.Semaphore(10)
    cors = [download(url, semaphore) for _ in range(images)]
    # Python 3.7 accepts bare coroutines here; Python 3.11+ requires tasks.
    res, _ = loop.run_until_complete(asyncio.wait(cors))
    loop.close()
    return [r.result() for r in res]

if __name__ == '__main__':
    saved = batch_download(20, URL)
    print(saved)
We define the batch_download function, which takes a number, images, and the URL of where to fetch them. The first thing it does is create an event loop, which is necessary to run any asynchronous code. The event loop is the central execution device provided by asyncio. It provides multiple facilities, including:
Registering, executing, and cancelling delayed calls (timeouts)
Creating client and server transports for various kinds of communication
Launching subprocesses and the associated transports for communication with an external program
Delegating costly function calls to a pool of threads
After the event loop is created, we instantiate the semaphore, and then we proceed to create a list of coroutine objects, cors. By calling loop.run_until_complete, we make sure the event loop will run until the whole task has been completed. We feed it the result of a call to asyncio.wait, which wraps the coroutines in futures and waits for them to complete.
When done, we close the event loop, and return a list of the results yielded by each future object (the filenames of the saved images). Notice how we capture the results of the call to loop.run_until_complete. asyncio.wait returns two sets: the futures that are done and those that are still pending. Because by default it returns only when every future has completed, the second set is empty here, so we assign it to _. This is a common Python idiom used when we want to signal that we're not interested in that object.
At the end of the module, we call batch_download and we get 20 images saved. They come in batches, and the whole process is limited by a semaphore with only 10 available spots.
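The pair returned by asyncio.wait can be inspected in isolation. The following is a small sketch, not the book's code: work and main are hypothetical names, the coroutines are wrapped in tasks explicitly (recent Python versions require that), and asyncio.run replaces the manual loop handling.

```python
import asyncio


async def work(n):
    await asyncio.sleep(0.01 * n)  # pretend to do some I/O
    return n * n


async def main():
    # Wrap each coroutine in a task before handing it to asyncio.wait.
    tasks = [asyncio.ensure_future(work(n)) for n in range(4)]
    done, pending = await asyncio.wait(tasks)
    # `done` is a set, so its order is arbitrary: sort the results.
    return sorted(task.result() for task in done), len(pending)


squares, still_pending = asyncio.run(main())
print(squares, still_pending)
```

With the default return_when setting, the call returns only when everything has finished, so the pending set is empty and can be discarded with _.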
And that's it! To learn more about asyncio, please refer to the documentation page (https://docs.python.org/3.7/library/asyncio.html) for the asyncio module on the standard library. This example was fun to code, and hopefully it will motivate you to study hard and understand the intricacies of this wonderful side of Python.
Summary
In this chapter, we learned about concurrency and parallelism. We saw how threads and processes help in achieving one and the other. We explored the nature of threads and the issues that they expose us to: race conditions and deadlocks.
We learned how to solve those issues by using locks and careful resource management. We also learned how to make threads communicate and share data, and we talked about the scheduler, which is that part of the operating system that decides which thread will run at any given time. We then moved to processes, and explored a bunch of their properties and characteristics.
Following the initial theoretical part, we learned how to implement threads and processes in Python. We dealt with multiple threads and processes, fixed race conditions, and learned workarounds to stop threads without leaving any resource open by mistake. We also explored IPC, and used queues to exchange messages between processes and threads. We also played with events and barriers, which are some of the tools provided by the standard library to control the flow of execution in a non-deterministic environment.
After all these introductory examples, we deep dived into three case examples, which showed how to solve the same problem using different approaches: single-thread, multithread, multiprocess, and asyncio.
We learned about merge sort and how, in general, divide and conquer algorithms are easy to parallelize.
We learned about sudoku, and explored a nice solution that uses a little bit of artificial intelligence to run an efficient algorithm, which we then ran in different serial and parallel modes.
Finally, we saw how to download random pictures from a website, using serial, multiprocess, and asyncio code. The latter was by far the hardest piece of code in the whole book, and its presence in the chapter serves as a reminder, or some sort of milestone that will encourage the reader to learn Python well, and deeply.
Now we'll move on to much simpler, and mostly project-oriented chapters, where we get a taste of different real-world applications in different contexts.
Debugging and Troubleshooting
"If debugging is the process of removing software bugs, then programming must be the process of putting them in."
– Edsger W. Dijkstra
In the life of a professional coder, debugging and troubleshooting take up a significant amount of time. Even if you work on the most beautiful code base ever written by a human, there will still be bugs in it; that is guaranteed.
We spend an awful lot of time reading other people's code and, in my opinion, a good software developer is someone who keeps their attention high, even when they're reading code that is not reported to be wrong or buggy.
Being able to debug code efficiently and quickly is a skill that every coder needs to keep improving. Some think that because they have read the manual, they're fine, but the reality is, the number of variables in the game is so great that there is no manual. There are guidelines one can follow, but there is no magic book that will teach you everything you need to know in order to become good at this.
I feel that on this particular subject, I have learned the most from my colleagues. It amazes me to observe someone very skilled attacking a problem. I enjoy seeing the steps they take, the things they verify to exclude possible causes, and the way they consider the suspects that eventually lead them to a solution.
Every colleague we work with can teach us something, or surprise us with a fantastic guess that turns out to be the right one. When that happens, don't just remain in wonderment (or worse, in envy), but seize the moment and ask them how they got to that guess and why. The answer will allow you to see whether there is something you can study in-depth later on so that, maybe next time, you'll be the one who will catch the bug.
Some bugs are very easy to spot. They come out of coarse mistakes and, once you see the effects of those mistakes, it's easy to find a solution that fixes the problem.
But there are other bugs that are much more subtle, much more slippery, and require true expertise, and a great deal of creativity and out-of-the-box thinking, to be dealt with.
The worst of all, at least for me, are the nondeterministic ones. These sometimes happen, and sometimes don't. Some happen only in environment A but not in environment B, even though A and B are supposed to be exactly the same. Those bugs are the truly evil ones, and they can drive you crazy.
And of course, bugs don't just happen in the sandbox, right? With your boss telling you, "Don't worry! Take your time to fix this. Have lunch first!" Nope. They happen on a Friday at half past five, when your brain is cooked and you just want to go home. It's in those moments when everyone is getting upset in a split second, when your boss is breathing down your neck, that you have to be able to keep calm. And I do mean it. That's the most important skill to have if you want to be able to fight bugs effectively. If you allow your mind to get stressed, say goodbye to creative thinking, to logical deduction, and to everything you need at that moment. So take a deep breath, sit properly, and focus.
In this chapter, I will try to demonstrate some useful techniques that you can employ according to the severity of the bug, and a few suggestions that will hopefully boost your weapons against bugs and issues.
Specifically, we're going to look at the following:
Debugging techniques
Profiling
Assertions
Troubleshooting guidelines
Debugging techniques
In this part, I'll present you with the most common techniques, the ones I use most often; however, please don't consider this list to be exhaustive.
Debugging with print
This is probably the easiest technique of all. It's not very effective, it cannot be used everywhere, and it requires access to both the source code and a Terminal that will run it (and therefore show the results of the print function calls).
However, in many situations, this is still a quick and useful way to debug. For example, if you are developing a Django website and what happens in a page is not what you would expect, you can fill the view with prints and keep an eye on the console while you reload the page. When you scatter calls to print in your code, you normally end up in a situation where you duplicate a lot of debugging code, either because you're printing a timestamp (like we did when we were measuring how fast list comprehensions and generators were), or because you somehow have to build a string of some sort that you want to display.
Another issue is that it's extremely easy to forget calls to print in your code.
So, for these reasons, rather than using a bare call to print, I sometimes prefer to code a custom function. Let's see how.
Debugging with a custom function
Having a custom function in a snippet that you can quickly grab and paste into the code, and then use to debug, can be very useful. If you're fast, you can always code one on the fly. The important thing is to code it in a way that it won't leave stuff around when you eventually remove the calls and its definition. Therefore it's important to code it in a way that is completely self-contained. Another good reason for this requirement is that it will avoid potential name clashes with the rest of the code.
Let's see an example of such a function:
# custom.py
def debug(*msg, print_separator=True):
    print(*msg)
    if print_separator:
        print('-' * 40)

debug('Data is ...')
debug('Different', 'Strings', 'Are not a problem')
debug('After while loop', print_separator=False)
In this case, I am using a keyword-only argument to be able to print a separator, which is a line of 40 dashes.
The function is very simple. I just redirect whatever is in msg to a call to print and, if print_separator is True, I print a line separator. Running the code will show the following:
$ python custom.py
Data is ...
----------------------------------------
Different Strings Are not a problem
----------------------------------------
After while loop
As you can see, there is no separator after the last line.
This is just one easy way to somehow augment a simple call to the print function. Let's see how we can calculate a time difference between calls, using one of Python's tricky features to our advantage:
# custom_timestamp.py
from time import sleep

def debug(*msg, timestamp=[None]):
    print(*msg)
    from time import time  # local import
    if timestamp[0] is None:
        timestamp[0] = time()  #1
    else:
        now = time()
        print(
            'Time elapsed: {:.3f}s'.format(now - timestamp[0])
        )
        timestamp[0] = now  #2

debug('Entering nasty piece of code...')
sleep(.3)
debug('First step done.')
sleep(.5)
debug('Second step done.')
This is a bit trickier, but still quite simple. First, notice we import the time function from the time module from inside the debug function. This allows us to avoid having to add that import outside of the function, and maybe forget it there.
Take a look at how I defined timestamp. It's a list, of course, but what's important here is that it is a mutable object. This means that it is created only once, when Python executes the function definition, and it retains its value throughout different calls. Therefore, if we put a timestamp in it after each call, we can keep track of time without having to use an external global variable. I borrowed this trick from my studies on closures, a technique that I encourage you to read about because it's very interesting.
Right, so, after having printed whatever message we had to print and imported time, we then inspect the content of the only item in timestamp. If it is None, we have no previous reference, therefore we set the value to the current time (#1).
On the other hand, if we have a previous reference, we can calculate a difference (which we nicely format to three decimal digits) and then we finally put the current time again in timestamp (#2). It's a nice trick, isn't it?
Running this code shows this result:
$ python custom_timestamp.py
Entering nasty piece of code...
First step done.
Time elapsed: 0.304s
Second step done.
Time elapsed: 0.505s
Whatever your situation, having a self-contained function like this can be very useful.
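Since the mutable-default trick is borrowed from closures, it may help to see the same idea written as an actual closure. This is only a sketch for comparison: make_debug is a hypothetical name, perf_counter replaces time, and the inner function returns the elapsed time so that it can be checked.

```python
from time import perf_counter, sleep


def make_debug():
    """Return a debug function that remembers when it was last called."""
    last = None  # lives in the enclosing scope, not in a global

    def debug(*msg):
        nonlocal last
        print(*msg)
        now = perf_counter()
        elapsed = None if last is None else now - last
        if elapsed is not None:
            print('Time elapsed: {:.3f}s'.format(elapsed))
        last = now
        return elapsed

    return debug


debug = make_debug()
first = debug('Entering nasty piece of code...')
sleep(0.05)
second = debug('First step done.')
```

The state is hidden in the enclosing function rather than in a default argument, at the cost of one extra line to create the function before using it.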
Inspecting the traceback
We briefly talked about the traceback in Chapter 8, Testing, Profiling, and Dealing with Exceptions, when we saw several different kinds of exceptions. The traceback gives you information about what went wrong in your application. It's helpful to read it, so let's see a small example:
# traceback_simple.py
d = {'some': 'key'}
key = 'some-other'
print(d[key])
We have a dictionary and we try to access a key that isn't in it. You should remember that this will raise a KeyError exception. Let's run the code:
$ python traceback_simple.py
Traceback (most recent call last):
  File "traceback_simple.py", line 3, in <module>
    print(d[key])
KeyError: 'some-other'
You can see that we get all the information we need: the module name, the line that caused the error (both the number and the instruction), and the error itself. With this information, you can go back to the source code and try to understand what's going on.
Let's now create a more interesting example that builds on top of this, and exercises a feature that is only available in Python 3. Imagine that we're validating a dictionary, working on mandatory fields, therefore we expect them to be there. If not, we need to raise a custom ValidatorError that we will trap further upstream in the process that runs the validator (which is not shown here, so it could be anything, really). It should be something like this:
# traceback_validator.py
class ValidatorError(Exception):
    """Raised when accessing a dict results in KeyError."""

d = {'some': 'key'}
mandatory_key = 'some-other'
try:
    print(d[mandatory_key])
except KeyError as err:
    raise ValidatorError(
        f'`{mandatory_key}` not found in d.'
    ) from err
We define a custom exception that is raised when the mandatory key isn't there. Note that its body consists of its documentation string, so we don't need to add any other statements.
Very simply, we define a dummy dict and try to access it using mandatory_key. We trap KeyError and raise ValidatorError when that happens. And we do it by using the raise ... from ... syntax, which was introduced in Python 3 by PEP 3134 (https://www.python.org/dev/peps/pep-3134/), to chain exceptions. The purpose of doing this is that we may also want to raise ValidatorError in other circumstances, not necessarily as a consequence of a mandatory key being missing. This technique allows us to run the validation in a simple try/except that only cares about ValidatorError.
Without being able to chain exceptions, we would lose information about KeyError. The code produces this result:
$ python traceback_validator.py
Traceback (most recent call last):
  File "traceback_validator.py", line 7, in <module>
    print(d[mandatory_key])
KeyError: 'some-other'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "traceback_validator.py", line 10, in <module>
    '`{}` not found in d.'.format(mandatory_key)) from err
__main__.ValidatorError: `some-other` not found in d.
This is brilliant, because we can see the traceback of the exception that led us to raise ValidatorError, as well as the traceback for the ValidatorError itself.
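The upstream code that traps the error is not shown in the text, so the following is only a sketch of what it could look like. The validate function is a hypothetical wrapper around the same logic; the point is that the exception chained with from stays reachable through the __cause__ attribute.

```python
class ValidatorError(Exception):
    """Raised when accessing a dict results in KeyError."""


def validate(data, mandatory_key):
    # Hypothetical helper wrapping the lookup shown in the text.
    try:
        return data[mandatory_key]
    except KeyError as err:
        raise ValidatorError(
            f'`{mandatory_key}` not found in data.'
        ) from err


try:
    validate({'some': 'key'}, 'some-other')
except ValidatorError as exc:
    caught = exc
    cause = exc.__cause__  # the original KeyError, preserved by `from`

print(repr(caught), repr(cause))
```

The caller only needs to know about ValidatorError, yet nothing about the original KeyError is lost.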
I had a nice discussion with one of my reviewers about the traceback you get from the pip installer. He was having trouble setting everything up in order to review the code for Chapter 13, Data Science. His fresh Ubuntu installation was missing a few libraries that were needed by the pip packages in order to run correctly.
The reason he was blocked was that he was trying to fix the errors displayed in the traceback starting from the top one. I suggested that he start from the bottom one instead, and fix that. The reason was that, if the installer had gotten to that last line, I guess that before that, whatever error may have occurred, it was still possible to recover from it. Only after the last line, pip decided it wasn't possible to continue any further, and therefore that was the one to fix first. Once the libraries required to fix that error had been installed, everything else went smoothly.
Reading a traceback can be tricky, and my friend was lacking the necessary experience to address this problem correctly. Therefore, if you end up in the same situation, don't be discouraged: try to shake things up a bit, and don't take anything for granted.
Python has a huge and wonderful community and it's very unlikely that, when you encounter a problem, you're the first one to see it, so open a browser and search. By doing so, your searching skills will also improve because you will have to trim the error down to the minimum but essential set of details that will make your search effective.
If you want to play and understand the traceback a bit better, in the standard library there is a module you can use called, surprise surprise, traceback. It provides a standard interface to extract, format, and print stack traces of Python programs, mimicking the behavior of the Python interpreter when it prints a stack trace.
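As a small taste of the traceback module, the sketch below captures a traceback as a string instead of letting the interpreter print it. The risky function is a hypothetical stand-in for any failing code; traceback.format_exc is part of the standard library.

```python
import traceback


def risky():
    d = {'some': 'key'}
    return d['some-other']


try:
    risky()
except KeyError:
    # format_exc() returns the text the interpreter would have printed.
    report = traceback.format_exc()

last_line = report.strip().splitlines()[-1]
print(report)
```

Having the traceback as a string makes it easy to log it, attach it to an error report, or trim it down before searching for it online.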
Using the Python debugger
Another very effective way of debugging Python is to use the Python debugger: pdb. Instead of using it directly though, you should definitely check out the pdbpp library. pdbpp augments the standard pdb interface by providing some convenient tools, my favorite of which is the sticky mode, which allows you to see a whole function while you step through its instructions.
There are several different ways to use this debugger (whichever version, it's not important), but the most common one consists of simply setting a breakpoint and running the code. When Python reaches the breakpoint, execution is suspended and you get console access to that point so that you can inspect all the names, and so on. You can also alter data on the fly to change the flow of the program.
As a toy example, let's pretend we have a parser that is raising KeyError because a key is missing in a dictionary. The dictionary is from a JSON payload that we cannot control, and we just want, for the time being, to cheat and pass that control, since we're interested in what comes afterward. Let's see how we could intercept this moment, inspect the data, fix it, and get to the bottom of it, with pdbpp:
# pdebugger.py
# d comes from a JSON payload we don't control
d = {'first': 'v1', 'second': 'v2', 'fourth': 'v4'}
# keys also comes from a JSON payload we don't control
keys = ('first', 'second', 'third', 'fourth')

def do_something_with_value(value):
    print(value)

for key in keys:
    do_something_with_value(d[key])

print('Validation done.')
As you can see, this code will break when key gets the 'third' value, which is missing in the dictionary. Remember, we're pretending that both d and keys come dynamically from a JSON payload we don't control, so we need to inspect them in order to fix d and pass the for loop. If we run the code as it is, we get the following:
$ python pdebugger.py
v1
v2
Traceback (most recent call last):
  File "pdebugger.py", line 10, in <module>
    do_something_with_value(d[key])
KeyError: 'third'
So we see that that key is missing from the dictionary, but since every time we run this code we may get a different dictionary or keys tuple, this information doesn't really help us. Let's inject a call to pdb just before the for loop. You have two options:
import pdb
pdb.set_trace()
This is the most common way of doing it. You import pdb and call its set_trace function. Many developers have macros in their editor to add this line with a keyboard shortcut. As of Python 3.7 though, we can simplify things even further, to this:
breakpoint()
The new breakpoint built-in function calls sys.breakpointhook() under the hood, which is programmed by default to call pdb.set_trace(). However, you can reprogram sys.breakpointhook() to call whatever you want, and therefore breakpoint will point to that too, which is very convenient.
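To make the reprogramming concrete, here is a minimal sketch that swaps the hook for a function that merely records the call. recording_hook and calls are hypothetical names; sys.breakpointhook and breakpoint are the real Python 3.7+ names described above.

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    # Hypothetical replacement hook: note the call, open no debugger.
    calls.append((args, kwargs))


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint()  # routed to recording_hook
finally:
    sys.breakpointhook = original_hook  # restore the default behavior

print(calls)
```

A related switch that needs no code changes is the PYTHONBREAKPOINT environment variable: setting it to 0 makes the default hook ignore every breakpoint() call.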
The code for this example is in the pdebugger_pdb.py module. If we now run this code, things get interesting (note that your output may vary a little and that all the comments in this output were added by me):
$ python pdebugger_pdb.py
(Pdb++) l
16
17  -> for key in keys:  # breakpoint comes in
18         do_something_with_value(d[key])
19
(Pdb++) keys  # inspecting the keys tuple
('first', 'second', 'third', 'fourth')
(Pdb++) d.keys()  # inspecting keys of `d`
dict_keys(['first', 'second', 'fourth'])
(Pdb++) d['third'] = 'placeholder'  # add tmp placeholder
(Pdb++) c  # continue
v1
v2
placeholder
v4
Validation done.
First, note that when you reach a breakpoint, you're served a console that tells you where you are (the Python module) and which line is the next one to be executed. You can, at this point, perform a bunch of exploratory actions, such as inspecting the code before and after the next line, printing a stack trace, and interacting with the objects. Please consult the official Python documentation (https://docs.python.org/3.7/library/pdb.html) on pdb to learn more about this. In our case, we first inspect the keys tuple. After that, we inspect the keys of d. We see that 'third' is missing, so we put it in ourselves (could this be dangerous? Think about it). Finally, now that all the keys are in, we type c, which means (c)ontinue.
pdb also gives you the ability to proceed with your code one line at a time using (n)ext, to (s)tep into a function for deeper analysis, or to handle breaks with (b)reak. For a complete list of commands, please refer to the documentation or type (h)elp in the console.
You can see, from the output of the preceding run, that we could finally get to the end of the validation.
pdb (or pdbpp) is an invaluable tool that I use every day. So, go and have fun: set a breakpoint somewhere and try to inspect it, follow the official documentation, and try the commands in your code to see their effect and learn them well.
Notice that in this example I have assumed you installed pdbpp. If that is not the case, then you might find that some commands don't work the same in pdb. One example is the letter d, which would be interpreted by pdb as the down command. In order to get around that, you would have to add a ! in front of d, to tell pdb that it is meant to be interpreted literally, and not as a command.
Inspecting log files
Another way of debugging a misbehaving application is to inspect its log files. Log files are special files in which an application writes down all sorts of things, normally related to what's going on inside of it. If an important procedure is started, I would typically expect a corresponding line in the logs. It is the same when it finishes, and possibly for what happens inside of it.
Errors need to be logged so that when a problem happens, we can inspect what went wrong by taking a look at the information in the log files.
There are many different ways to set up a logger in Python. Logging is very malleable and you can configure it. In a nutshell, there are normally four players in the game: loggers, handlers, filters, and formatters:
Loggers: Expose the interface that the application code uses directly
Handlers: Send the log records (created by loggers) to the appropriate destination
Filters: Provide a finer-grained facility for determining which log records to output
Formatters: Specify the layout of the log records in the final output
Logging is performed by calling methods on instances of the Logger class. Each line you log has a level. The levels normally used are: DEBUG, INFO, WARNING, ERROR, and CRITICAL. You can import them from the logging module. They are in order of severity and it's very important to use them properly because they will help you filter the contents of a log file based on what you're searching for. Log files usually become extremely big so it's very important to have the information in them written properly so that you can find it quickly when it matters.
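The four players and the levels can be seen working together in a few lines. This is a sketch under stated assumptions: the logger name 'ch11.sketch' and the SkipNoisy filter are made up for the illustration, and an in-memory stream stands in for a real destination so the result can be inspected.

```python
import io
import logging


class SkipNoisy(logging.Filter):
    """Hypothetical filter: drop any record that mentions 'noisy'."""

    def filter(self, record):
        return 'noisy' not in record.getMessage()


stream = io.StringIO()  # stands in for a file or a network destination
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(message)s'))
handler.addFilter(SkipNoisy())

logger = logging.getLogger('ch11.sketch')
logger.setLevel(logging.INFO)  # DEBUG records are discarded
logger.propagate = False  # keep the demo away from the root logger
logger.addHandler(handler)

logger.debug('below the minimum level')
logger.info('processing started')
logger.warning('a noisy warning')
logger.error('something failed')

output = stream.getvalue()
print(output)
```

Only the INFO and ERROR lines should reach the stream: the DEBUG record is stopped by the logger's level, and the WARNING record by the filter.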
You can log to a file but you can also log to a network location, to a queue, to a console, and so on. In general, if you have an architecture that is deployed on one machine, logging to a file is acceptable, but when your architecture spans over multiple machines (such as in the case of service-oriented or microservice architectures), it's very useful to implement a centralized solution for logging so that all log messages coming from each service can be stored and investigated in a single place. It helps a lot, otherwise trying to correlate giant files from several different sources to figure out what went wrong can become truly challenging.
A service-oriented architecture (SOA) is an architectural pattern in software design in which application components provide services to other components via a communications protocol, typically over a network. The beauty of this system is that, when coded properly, each service can be written in the most appropriate language to serve its purpose. The only thing that matters is the communication with the other services, which needs to happen via a common format so that data exchange can be done. Microservice architectures are an evolution of SOAs, but follow a different set of architectural patterns.
Here, I will present you with a very simple logging example. We will log a few messages to a file:
# log.py
import logging

logging.basicConfig(
    filename='ch11.log',
    level=logging.DEBUG,  # minimum level captured in the file
    format='[%(asctime)s] %(levelname)s: %(message)s',
    datefmt='%m/%d/%Y %I:%M:%S %p')

mylist = [1, 2, 3]
logging.info('Starting to process `mylist`...')

for position in range(4):
    try:
        logging.debug(
            'Value at position %s is %s', position, mylist[position]
        )
    except IndexError:
        logging.exception('Faulty position: %s', position)

logging.info('Done parsing `mylist`.')
Let's go through it line by line. First, we import the logging module, then we set up a basic configuration. In general, a production-logging configuration is much more complicated than this, but I wanted to keep things as easy as possible. We specify a filename, the minimum logging level we want to capture in the file, and the message format. We'll log the date and time information, the level, and the message.
I will start by logging an info message that tells me we're about to process our list. Then, I will log the value at each position (this time using the DEBUG level, by using the debug function). I'm using debug here because I want to be able to filter out these logs in the future (by setting the minimum level to logging.INFO or more), because I might have to handle very big lists and I don't want to log all the values.
If we get IndexError (and we do, since I'm looping over range(4)), we call logging.exception(), which is the same as logging.error(), but it also prints the traceback.
At the end of the code, I log another info message saying we're done. The result is this:
# ch11.log
[05/06/2018 11:13:48 AM] INFO: Starting to process `mylist`...
[05/06/2018 11:13:48 AM] DEBUG: Value at position 0 is 1
[05/06/2018 11:13:48 AM] DEBUG: Value at position 1 is 2
[05/06/2018 11:13:48 AM] DEBUG: Value at position 2 is 3
[05/06/2018 11:13:48 AM] ERROR: Faulty position: 3
Traceback (most recent call last):
  File "log.py", line 15, in <module>
    position, mylist[position]))
IndexError: list index out of range
[05/06/2018 11:13:48 AM] INFO: Done parsing `mylist`.
This is exactly what we need to be able to debug an application that is running on a box, and not on our console. We can see what went on, the traceback of any exception raised, and so on.
The example presented here only scratches the surface of logging. For a more in-depth explanation, you can find information in the Python HOWTOs section of the official Python documentation: Logging HOWTO, and Logging Cookbook.
Logging is an art. You need to find a good balance between logging everything and logging nothing. Ideally, you should log anything that you need to make sure your application is working correctly, and possibly all errors or exceptions.
Other techniques
In this final section, I'd like to demonstrate briefly a couple of techniques that you may find useful.
Profiling
We talked about profiling in Chapter 8, Testing, Profiling, and Dealing with Exceptions, and I'm only mentioning it here because profiling can sometimes explain weird errors that are due to a component being too slow. Especially when networking is involved, having an idea of the timings and latencies your application has to go through is very important in order to understand what may be going on when problems arise, therefore I suggest you get acquainted with profiling techniques from a troubleshooting perspective as well.
Assertions
Assertions are a nice way to make your code ensure your assumptions are verified. If they are, all proceeds regularly but, if they are not, you get a nice exception that you can work with. Sometimes, instead of inspecting, it's quicker to drop a couple of assertions in the code just to exclude possibilities. Let's see an example:
# assertions.py
mylist = [1, 2, 3]  # this ideally comes from some place
assert 4 == len(mylist)  # this will break
for position in range(4):
    print(mylist[position])
This code simulates a situation in which mylist isn't defined by us like that, of course, but we're assuming it has four elements. So we put an assertion there, and the result is this:
$ python assertions.py
Traceback (most recent call last):
  File "assertions.py", line 3, in <module>
    assert 4 == len(mylist)  # this will break
AssertionError
This tells us exactly where the problem is.
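An assertion can also carry a message, which saves a second round of inspection. The sketch below uses a hypothetical check_length helper to show the optional second operand of assert, which becomes the text of the AssertionError.

```python
def check_length(items, expected):
    # The expression after the comma becomes the exception message.
    assert len(items) == expected, (
        f'expected {expected} items, got {len(items)}'
    )
    return True


try:
    check_length([1, 2, 3], 4)
except AssertionError as err:
    message = str(err)

print(message)
```

Bear in mind that assertions are removed when Python runs with the -O flag, so they are a debugging aid and not a replacement for proper input validation.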
Where to find information
In the Python official documentation, there is a section dedicated to debugging and profiling, where you can read up about the bdb debugger framework, and about modules such as faulthandler, timeit, trace, tracemalloc, and of course pdb. Just head to the standard library section in the documentation and you'll find all this information very easily.
Troubleshooting guidelines
In this short section, I'd like to give you a few tips that come from my troubleshooting experience.
Using console editors
First, get comfortable using Vim or nano as an editor, and learn the basics of the console. When things break, you don't have the luxury of your editor with all the bells and whistles there. You have to connect to a box and work from there. So it's a very good idea to be comfortable browsing your production environment with console commands, and be able to edit files using console-based editors, such as vi, Vim, or nano. Don't let your usual development environment spoil you.
Where to inspect
My second suggestion concerns where to place your debugging breakpoints. It doesn't matter if you are using print, a custom function, or pdb, you still have to choose where to place the calls that provide you with the information, right?
Well, some places are better than others, and there are ways to handle the debugging progression that are better than others.
I normally avoid placing a breakpoint in an if clause because, if that clause is not exercised, I lose the chance of getting the information I wanted. Sometimes it's not easy or quick to get to the breakpoint, so think carefully before placing them.
Another important thing is where to start. Imagine that you have 100 lines of code that handle your data. Data comes in at line 1, and somehow it's wrong at line 100. You don't know where the bug is, so what do you do? You can place a breakpoint at line 1 and patiently go through all the lines, checking your data. In the worst case scenario, 99 lines (and many cups of coffee) later, you spot the bug. So, consider using a different approach.
You start at line 50, and inspect. If the data is good, it means the bug happens later, in which case you place your next breakpoint at line 75. If the data at line 50 is already bad, you go on by placing a breakpoint at line 25. Then, you repeat. Each time, you move either backward or forward, by half the jump you did last time.
In our worst-case scenario, your debugging would go from 1, 2, 3, ..., 99, in a linear fashion, to a series of jumps such as 50, 75, 87, 93, 96, ..., 99, which is way faster. In fact, it's logarithmic: for 100 lines, about seven inspections are enough. This searching technique is called binary search; it's based on a divide-and-conquer approach, and it's very effective, so try to master it.
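The same halving strategy can be written down as code. This is a sketch under two assumptions that do not come from the text: the processing is available as a list of step functions, and once the data goes bad it stays bad in every later step. All the names (first_bad_step, add_one, negate, is_positive) are hypothetical.

```python
def first_bad_step(steps, data, is_valid):
    """Return the index of the first step that makes the data invalid."""
    def state_after(n):
        # Re-run the first n steps from scratch.
        value = data
        for step in steps[:n]:
            value = step(value)
        return value

    # Invariant: data is good after `low` steps and bad after `high`.
    low, high = 0, len(steps)
    while high - low > 1:
        mid = (low + high) // 2
        if is_valid(state_after(mid)):
            low = mid  # still good: the bug happens later
        else:
            high = mid  # already bad: the bug happens earlier
    return high - 1


def add_one(value):
    return value + 1


def negate(value):
    return -value


checks = []


def is_positive(value):
    checks.append(value)  # count how many inspections were needed
    return value > 0


steps = [add_one] * 100
steps[62] = negate  # the planted bug
culprit = first_bad_step(steps, 1, is_positive)
print(culprit, len(checks))
```

In this run the planted bug at position 62 is found with far fewer inspections than a line-by-line walk would need.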
Using tests to debug
Do you remember Chapter 8, Testing, Profiling, and Dealing with Exceptions, about tests? Well, if we have a bug and all tests are passing, it means something is wrong or missing in our test code base. So, one approach is to modify the tests in such a way that they cater for the new edge case that has been spotted, and then work your way through the code. This approach can be very beneficial, because it makes sure that your bug will be covered by a test when it's fixed.
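As an illustration of that workflow, here is a minimal sketch built around a made-up bug: a hypothetical mean function that used to crash on an empty list. The test for the edge case is written first, fails, and then passes once the guard is added.

```python
import unittest


def mean(values):
    # Fixed version: the hypothetical original divided by len(values)
    # unconditionally and crashed on an empty list.
    if not values:
        return 0.0
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    def test_regular_values(self):
        self.assertEqual(mean([1, 2, 3]), 2.0)

    def test_empty_list(self):
        # Added after the bug report; it fails until the guard exists.
        self.assertEqual(mean([]), 0.0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once the fix is in, the new test stays in the suite and guards against the same bug coming back.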
Monitoring
Monitoring is also very important. Software applications can go completely crazy and have non-deterministic hiccups when they encounter edge-case situations such as the network being down, a queue being full, or an external component being unresponsive. In these cases, it's important to have an idea of what the big picture was when the problem occurred and be able to correlate it to something related to it in a subtle, perhaps mysterious way.
You can monitor API endpoints, processes, web page availability and load times, and basically almost everything that you can code. In general, when starting an application from scratch, it can be very useful to design it keeping in mind how you want to monitor it.
Summary
In this short chapter, we looked at different techniques and suggestions for debugging and troubleshooting our code. Debugging is an activity that is always part of a software developer's work, so it's important to be good at it.
If approached with the correct attitude, it can be fun and rewarding.
We explored techniques to inspect our code based on functions, logging, debuggers, traceback information, profiling, and assertions. We saw simple examples of most of them and we also talked about a set of guidelines that will help when it comes to facing the fire.
Just remember always to stay calm and focused, and debugging will be much easier. This too is a skill that needs to be learned, and it's the most important one. An agitated and stressed mind cannot work properly, logically, and creatively; therefore, if you don't strengthen it, it will be hard for you to put all of your knowledge to good use.
In the next chapter, we are going to explore GUIs and scripts, taking an interesting detour from the more common web-application scenario.
GUIs and Scripts
"A user interface is like a joke. If you have to explain it, it's not that good."
– Martin LeBlanc
In this chapter, we're going to work on a project together. We are going to write a simple scraper that finds and saves images from a web page. We'll focus on three parts:
A simple HTTP web server in Python
A script that scrapes a given URL
A GUI application that scrapes a given URL
A graphical user interface (GUI) is a type of interface that allows the user to interact with an electronic device through graphical icons, buttons, and widgets, as opposed to text-based or command-line interfaces, which require commands or text to be typed on the keyboard. In a nutshell, any browser, any office suite such as LibreOffice, and, in general, anything that pops up when you click on an icon, is a GUI application.
So, if you haven't already done so, this would be the perfect time to start a console and position yourself in a folder called ch12 in the root of your project for this book. Within that folder, we'll create two Python modules (scrape.py and guiscrape.py) and a folder (simple_server). Within simple_server, we'll write our HTML page: index.html. Images will be stored in simple_server/img.
The structure in ch12 should look like this:
$ tree -A
.
├── guiscrape.py
├── scrape.py
└── simple_server
    ├── img
    │   ├── owl-alcohol.png
    │   ├── owl-book.png
    │   ├── owl-books.png
    │   ├── owl-ebook.jpg
    │   └── owl-rose.jpeg
    ├── index.html
    └── serve.sh
If you're using either Linux or macOS, you can do what I do and put the code to start the HTTP server in a serve.sh file. On Windows, you'll probably want to use a batch file.
The HTML page we're going to scrape has the following structure:
# simple_server/index.html
<!DOCTYPE html>
<html lang="en">
  <head><title>Cool Owls!</title></head>
  <body>
    <h1>Welcome to my owl gallery</h1>
    <div>
      <img src="img/owl-alcohol.png" height="128" />
      <img src="img/owl-book.png" height="128" />
      <img src="img/owl-books.png" height="128" />
      <img src="img/owl-ebook.jpg" height="128" />
      <img src="img/owl-rose.jpeg" height="128" />
    </div>
    <p>Do you like my owls?</p>
  </body>
</html>
It's an extremely simple page, so let's just note that we have five images, three of which are PNGs and two of which are JPGs (note that even though they are both JPGs, one ends with .jpg and the other with .jpeg, which are both valid extensions for this format).
So, Python gives you a very simple HTTP server for free that you can start with the following command (in the simple_server folder):
$ python -m http.server 8000
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [06/May/2018 16:54:30] "GET / HTTP/1.1" 200 -
...
The last line is the log you get when you access http://localhost:8000, where our beautiful page will be served. Alternatively, you can put that command in a file called serve.sh, and just run that with this command (make sure it's executable):
$ ./serve.sh
It will have the same effect. If you have the code for this book, your page should look something like this:
Feel free to use any other set of images, as long as you use at least one PNG and one JPG, and that in the src attribute you use relative paths, not absolute ones. I got these lovely owls from https://openclipart.org/.
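The server we started with python -m http.server can also be launched from Python itself, which saves maintaining a separate shell script and batch file. What follows is only a minimal sketch and not part of the book's code: the make_server and serve helpers are names made up for this illustration, and the sketch assumes Python 3.7 or later, where http.server.ThreadingHTTPServer and the directory argument of SimpleHTTPRequestHandler are available.

```python
import functools
import http.server


def make_server(directory, host='127.0.0.1', port=8000):
    """Return an HTTP server that serves the files found in `directory`.

    Hypothetical helper: it mirrors `python -m http.server`, but binds
    to localhost only by default.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve(directory='simple_server', port=8000):
    """Serve `directory` until interrupted with Ctrl+C."""
    server = make_server(directory, port=port)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling serve() from the ch12 folder should then make the page available at http://localhost:8000, just like the command-line version.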
First approach – scripting
Now, let's start writing the script. I'll go through the source in three steps: imports, arguments parsing, and business logic.
The imports
Here's how the script starts:
# scrape.py
import argparse
import base64
import json
import os
from bs4 import BeautifulSoup
import requests
Going through them from the top, you can see that we'll need to parse the arguments, which we'll feed to the script itself (argparse). We will need the base64 library to save the images within a JSON file (json), and we'll need os to work with file names and their extensions. Finally, we'll need BeautifulSoup for scraping the web page easily, and requests to fetch its content. I assume you're familiar with requests as we have used it in previous chapters.
We will explore the HTTP protocol and the requests mechanism in Chapter 14, Web Development, so for now, let's just (simplistically) say that we perform an HTTP request to fetch the content of a web page. We can do it programmatically using a library, such as requests, and it's more or less the equivalent of typing a URL in your browser and pressing Enter (the browser then fetches the content of a web page and displays it to you).
Of all these imports, only the last two don't belong to the Python standard library, so make sure you have them installed:
$ pip freeze | egrep -i "soup|requests"
beautifulsoup4==4.6.0
requests==2.18.4
Of course, the version numbers might be different for you. If they're not installed, use this command to do so:
$ pip install beautifulsoup4==4.6.0 requests==2.18.4
At this point, the only thing that I reckon might confuse you is the base64/json couple, so allow me to spend a few words on that.
As we saw in the previous chapter, JSON is one of the most popular formats for data exchange between applications. It's also widely used for other purposes, for example, to save data in a file. In our script, we're going to offer the user the ability to save images as image files, or as a single JSON file. Within the JSON, we'll put a dictionary with keys as the image names and values as their content. The only issue is that JSON is a text format and cannot hold raw binary data directly, and this is where the base64 library comes to the rescue: it encodes arbitrary bytes as plain ASCII text.
The base64 library is actually quite useful. For example, when you send an email with an image attached to it, the image typically gets encoded with Base64 before the email is sent. On the recipient side, images are automatically decoded into their original binary format so that the email client can display them.
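To make the base64/json pairing concrete, here is a small, self-contained sketch of the round trip. It is an illustration rather than part of scrape.py: instead of a real image, it uses the eight-byte signature that every PNG file starts with, and the key tiny.png is just a made-up name.

```python
import base64
import json

# The 8-byte signature found at the start of every PNG file.
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# json.dumps cannot serialize bytes, so we encode them to Base64 first
# and turn the resulting ASCII bytes into a str.
encoded = base64.b64encode(raw).decode('ascii')
payload = json.dumps({'tiny.png': encoded})

# Reversing the two steps gives the original bytes back.
restored = base64.b64decode(json.loads(payload)['tiny.png'])
print(encoded, restored == raw)
```

The encoded text begins with iVBORw0KGgo, which is why every Base64-encoded PNG starts with those same characters.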
Parsing arguments
Now that the technicalities are out of the way, let's see the second section of our script (it should be at the end of the scrape.py module):
if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Scrape a webpage.')
    parser.add_argument(
        '-t',
        '--type',
        choices=['all', 'png', 'jpg'],
        default='all',
        help='The image type we want to scrape.')
    parser.add_argument(
        '-f',
        '--format',
        choices=['img', 'json'],
        default='img',
        help='The format images are _saved to.')
    parser.add_argument(
        'url',
        help='The URL we want to scrape for images.')
    args = parser.parse_args()
    scrape(args.url, args.format, args.type)
Look at that first line; it is a very common idiom when it comes to scripting. According to the official Python documentation, the '__main__' string is the name of the scope in which top-level code executes. A module's __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt.
Therefore, if you put the execution logic under that if, it will be run only when you run the script directly, as its __name__ will be '__main__'. On the other hand, should you import from this module, then its name will be set to something else, so the logic under the if won't run.
The first thing we do is define our parser. I would recommend using the standard library module, argparse, which is simple enough and quite powerful. There are other options out there, but in this case, argparse will provide us with all we need.
We want to feed our script three different pieces of data: the types of images we want to save, the format in which we want to save them, and the URL for the page to be scraped.
The types can be PNGs, JPGs, or both (default), while the format can be either image or JSON, image being the default. URL is the only mandatory argument.
So, we add the -t option, allowing also the long version, --type. The choices are 'all', 'png', and 'jpg'. We set the default to 'all' and we add a help message.
We do a similar procedure for the format argument, allowing both the short and long syntax (-f and --format), and finally we add the url argument, which is the only one that is specified differently so that it won't be treated as an option, but rather as a positional argument.
In order to parse all the arguments, all we need is parser.parse_args(). Very simple, isn't it?
The last line is where we trigger the actual logic, by calling the scrape function, passing all the arguments we just parsed. We will see its definition shortly. The nice thing about argparse is that if you call the script by passing -h, it will print a nice usage text for you automatically. Let's try it out:
$ python scrape.py -h
usage: scrape.py [-h] [-t {all,png,jpg}] [-f {img,json}] url

Scrape a webpage.

positional arguments:
  url                   The URL we want to scrape for images.

optional arguments:
  -h, --help            show this help message and exit
  -t {all,png,jpg}, --type {all,png,jpg}
                        The image type we want to scrape.
  -f {img,json}, --format {img,json}
                        The format images are _saved to.
If you think about it, the one true advantage of this is that we just need to specify the arguments and we don't have to worry about the usage text, which means we won't have to keep it in sync with the arguments' definition every time we change something. This is precious.
Here are a few different ways to call our scrape.py script, which demonstrate that type and format are optional, and how you can use the short and long syntaxes to employ them:
$ python scrape.py http://localhost:8000
$ python scrape.py -t png http://localhost:8000
$ python scrape.py --type=jpg -f json http://localhost:8000
The first one is using default values for type and format. The second one will save only PNG images, and the third one will save only JPGs, but in JSON format.
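If you'd like to see what argparse hands back for each of these calls without touching the network, you can pass the arguments to parse_args as a list instead of letting it read them from the command line. The following sketch assumes a build_parser helper, which is a name introduced here for illustration (the script builds its parser inline), and it shortens the help strings:

```python
import argparse


def build_parser():
    """Hypothetical helper that rebuilds the parser used by scrape.py."""
    parser = argparse.ArgumentParser(description='Scrape a webpage.')
    parser.add_argument(
        '-t', '--type', choices=['all', 'png', 'jpg'], default='all',
        help='Image type to scrape.')
    parser.add_argument(
        '-f', '--format', choices=['img', 'json'], default='img',
        help='Format to save images to.')
    parser.add_argument('url', help='URL to scrape for images.')
    return parser


parser = build_parser()
for argv in (
    ['http://localhost:8000'],
    ['-t', 'png', 'http://localhost:8000'],
    ['--type=jpg', '-f', 'json', 'http://localhost:8000'],
):
    # parse_args accepts an explicit list, which is handy for experiments.
    args = parser.parse_args(argv)
    print(args.type, args.format, args.url)
```

Each printed line shows the type, format, and URL that would be passed on to scrape, with the defaults filled in wherever an option was left out.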
The business logic
Now that we've seen the scaffolding, let's deep dive into the actual logic (if it looks intimidating, don't worry; we'll go through it together). Within the script, this logic lies after the imports and before the parsing (before the if __name__ clause):
def scrape(url, format_, type_):
    try:
        page = requests.get(url)
    except requests.RequestException as err:
        print(str(err))
    else:
        soup = BeautifulSoup(page.content, 'html.parser')
        images = _fetch_images(soup, url)
        images = _filter_images(images, type_)
        _save(images, format_)
Let's start with the scrape function. The first thing it does is fetch the page at the given url argument. Whatever error may happen while doing this, we catch it as a RequestException (bound to err) and print it. RequestException is the base exception class for all the exceptions in the requests library.
However, if things go well, and we have a page back from the GET request, then we can proceed (else branch) and feed its content to the BeautifulSoup parser. The BeautifulSoup library allows us to parse a web page in no time, without having to write all the logic that would be needed to find all the images in a page, which we really don't want to do. It's not as easy as it seems, and reinventing the wheel is never good. To fetch images, we use the _fetch_images function and we filter them with _filter_images. Finally, we call _save with the result.
Splitting the code into different functions with meaningful names allows us to read it more easily. Even if you haven't seen the logic of the _fetch_images, _filter_images, and _save functions, it's not hard to predict what they do, right? Check out the following:
def _fetch_images(soup, base_url):
    images = []
    for img in soup.findAll('img'):
        src = img.get('src')
        img_url = f'{base_url}/{src}'
        name = img_url.split('/')[-1]
        images.append(dict(name=name, url=img_url))
    return images
_fetch_images takes a BeautifulSoup object and a base URL. All it does is loop through all of the images found on the page and fill in the name and url information about them in a dictionary (one per image). All dictionaries are added to the images list, which is returned at the end.
There is some trickery going on when we get the name of an image. We split the img_url (http://localhost:8000/img/my_image_name.png) string using '/' as a separator, and we take the last item as the image name. There is a more robust way of doing this, but for this example it would be overkill. If you want to see the details of each step, try to break this logic down into smaller steps, and print the result of each of them to help yourself understand. The debugging techniques we explored in the previous chapter offer a much more efficient way of doing this.
Anyway, by just adding print(images) at the end of the _fetch_images function, we get this:
[{'url': 'http://localhost:8000/img/owl-alcohol.png', 'name': 'owl-alcohol.png'},
 {'url': 'http://localhost:8000/img/owl-book.png', 'name': 'owl-book.png'}, ...]
I truncated the result for brevity. You can see each dictionary has a url and name key/value pair, which we can use to fetch, identify, and save our images as we like. At this point, I hear you asking what would happen if the images on the page were specified with an absolute path instead of a relative one, right? Good question!
The answer is that the script will fail to download them because this logic expects relative paths. I was about to add a bit of logic to solve this issue when I thought that, at this stage, it would be a nice exercise for you to do it, so I'll leave it up to you to fix it.
Hint: Inspect the start of that src variable. If it starts with 'http', it's probably an absolute path. You might also want to check out urllib.parse to do that.
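If you get stuck on the exercise, here is one possible direction based on the urllib.parse hint. It's only a sketch under a few assumptions: resolve_image is a hypothetical helper (it is not in scrape.py), and it relies on urljoin, which resolves src against the base URL the way a browser would and leaves src untouched when it is already absolute.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def resolve_image(base_url, src):
    """Return a dict with the image name and its absolute URL.

    Hypothetical helper: `src` may be relative ('img/owl.png') or
    absolute ('https://example.com/owl.png').
    """
    img_url = urljoin(base_url, src)
    # Take the name from the path only, so a query string such as
    # '?size=2' doesn't end up in the file name.
    name = posixpath.basename(urlparse(img_url).path)
    return dict(name=name, url=img_url)


print(resolve_image('http://localhost:8000', 'img/owl-book.png'))
print(resolve_image('http://localhost:8000', 'https://example.com/a/owl.png?size=2'))
```

One thing to be aware of: like a browser, urljoin treats the last segment of a base URL without a trailing slash as a file, so a base of http://localhost:8000/gallery combined with img/owl.png resolves to http://localhost:8000/img/owl.png.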
I hope the body of the _filter_images function is interesting to you. I wanted to show you how to check on multiple extensions using a mapping technique:
def _filter_images(images, type_):
    if type_ == 'all':
        return images
    ext_map = {
        'png': ['.png'],
        'jpg': ['.jpg', '.jpeg'],
    }
    return [
        img for img in images
        if _matches_extension(img['name'], ext_map[type_])
    ]

def _matches_extension(filename, extension_list):
    name, extension = os.path.splitext(filename.lower())
    return extension in extension_list
In this function, if type_ is all, then no filtering is required, so we just return all the images. On the other hand, when type_ is not all, we get the allowed extensions from the ext_map dictionary, and use it to filter the images in the list comprehension that ends the function body. You can see that by using another helper function, _matches_extension, I have made the list comprehension simpler and more readable.
All _matches_extension does is split the name of the image, getting its extension, and check whether it is within the list of allowed ones. Can you find one micro-improvement (speed-wise) that could be made to this function?
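One candidate answer, which may or may not be the one the author has in mind, is to store the allowed extensions in sets, and to lowercase only the extension rather than the whole filename. In the sketch below, EXT_MAP and matches_extension are illustrative names, not the ones used in scrape.py:

```python
import os

# Sets give constant-time membership tests; with only one or two
# extensions per type, the gain over a list is tiny in practice.
EXT_MAP = {
    'png': frozenset({'.png'}),
    'jpg': frozenset({'.jpg', '.jpeg'}),
}


def matches_extension(filename, extensions):
    """Return True if the extension of `filename` is in `extensions`."""
    # Lowercase just the extension instead of the whole filename.
    extension = os.path.splitext(filename)[1].lower()
    return extension in extensions


print(matches_extension('OWL-ROSE.JPEG', EXT_MAP['jpg']))
print(matches_extension('owl-book.png', EXT_MAP['jpg']))
```

With lists this short, readability matters more than speed, so treat this as food for thought rather than a required change.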
I'm sure you're wondering why I have collected all the images in the list and then removed them, instead of checking whether I wanted to save them before adding them to the list. The first reason is that I needed _fetch_images in the GUI application as it is now. The second reason is that combining fetching and filtering would produce a longer and more complicated function, and I'm trying to keep the complexity level down. The third reason is that this could be a nice exercise for you to do:
def _save(images, format_):
    if images:
        if format_ == 'img':
            _save_images(images)
        else:
            _save_json(images)
        print('Done')
    else:
        print('No images to save.')

def _save_images(images):
    for img in images:
        img_data = requests.get(img['url']).content
        with open(img['name'], 'wb') as f:
            f.write(img_data)

def _save_json(images):
    data = {}
    for img in images:
        img_data = requests.get(img['url']).content
        b64_img_data = base64.b64encode(img_data)
        str_img_data = b64_img_data.decode('utf-8')
        data[img['name']] = str_img_data

    with open('images.json', 'w') as ijson:
        ijson.write(json.dumps(data))
Let's keep going through the code and inspect the _save function. You can see that, when images isn't empty, this basically acts as a dispatcher. We either call _save_images or _save_json, depending on what information is stored in the format_ variable.
We are almost done. Let's jump to _save_images. We loop on the images list and for each dictionary we find there, we perform a GET request on the image URL and save its content in a file, which we name as the image itself.
Finally, let's now step into the _save_json function. It's very similar to the previous one. We basically fill in the data dictionary. The image name is the key, and the Base64 representation of its binary content is the value. When we're done populating our dictionary, we use the json library to dump it in the images.json file. I'll give you a small preview of that:
# images.json (truncated)
{
  "owl-alcohol.png": "iVBORw0KGgoAAAANSUhEUgAAASwAAAEICA...
  "owl-book.png": "iVBORw0KGgoAAAANSUhEUgAAASwAAAEbCAYAA...
  "owl-books.png": "iVBORw0KGgoAAAANSUhEUgAAASwAAAElCAYA...
  "owl-ebook.jpg": "/9j/4AAQSkZJRgABAQEAMQAxAAD/2wBDAAEB...
  "owl-rose.jpeg": "/9j/4AAQSkZJRgABAQEANAA0AAD/2wBDAAEB...
}
And that's it! Now, before proceeding to the next section, make sure you play with this script and understand how it works. Try to modify something, print out intermediate results, add a new argument or functionality, or scramble the logic. We're going to migrate it into a GUI application now, which will add a layer of complexity simply because we'll have to build the interface, so it's important that you're well acquainted with the business logic: it will allow you to concentrate on the rest of the code.
Second approach – a GUI application
There are several libraries for writing GUI applications in Python. The most famous ones are Tkinter, wxPython, PyGTK, and PyQt. They all offer a wide range of tools and widgets that you can use to compose a GUI application.
The one I'm going to use for the rest of this chapter is Tkinter. Tkinter stands for Tk interface and it is the standard Python interface to the Tk GUI toolkit. Both Tk and Tkinter are available on most Unix platforms, macOS, as well as on Windows systems.
Let's make sure that tkinter is installed properly on your system by running this command:
$ python -m tkinter
It should open a dialog window, demonstrating a simple Tk interface. If you can see that, we're good to go. However, if it doesn't work, please search for tkinter in the Python official documentation (https://docs.python.org/3.7/library/tkinter.html). You will find several links to resources that will help you get up and running with it.
We're going to make a very simple GUI application that basically mimics the behavior of the script we saw in the first part of this chapter. We won't add the ability to save JPGs or PNGs singularly, but after you've gone through this chapter, you should be able to play with the code and put that feature back in by yourself.
So, this is what we're aiming for:
Gorgeous, isn't it? As you can see, it's a very simple interface (this is how it should look on a Mac). There is a frame (that is, a container) for the URL field and the Fetch info button, another frame for the Listbox (Content) to hold the image names and the radio button to control the way we save them, and finally there is a Scrape! button at the bottom. We also have a status bar, which shows us some information.
In order to get this layout, we could just place all the widgets on a root window, but that would make the layout logic quite messy and unnecessarily complicated. So, instead, we will divide the space using frames and place the widgets in those frames. This way we will achieve a much nicer result. So, this is the draft for the layout:
We have a Root Window, which is the main window of the application. We divide it into two rows, the first one in which we place the Main Frame, and the second one in which we place the Status Frame (which will hold the status bar text). The Main Frame is subsequently divided into three rows. In the first one, we place the URL Frame, which holds the URL widgets. In the second one, we place the Img Frame, which will hold the Listbox and the Radio Frame, which will host a label and the radio button widgets. And finally we have the third one, which will just hold the Scrape button.
In order to lay out frames and widgets, we will use a layout manager, called grid, that simply divides up the space into rows and columns, as in a matrix.
Now, all the code I'm going to write comes from the guiscrape.py module, so I won't repeat its name for each snippet, to save space. The module is logically divided into three sections, not unlike the script version: imports, layout logic, and business logic. We're going to analyze them line by line, in three chunks.
The imports
Imports are like in the script version, except we've lost argparse, which is no longer needed, and we have added two lines:
# guiscrape.py
from tkinter import *
from tkinter import ttk, filedialog, messagebox
...
The first line is quite common practice when dealing with tkinter, although in general it is bad practice to import using the * syntax. You can incur name collisions and, if the module is too big, importing everything would be expensive.
After that, we import ttk, filedialog, and messagebox explicitly, following the conventional approach used with this library. ttk is the new set of styled widgets. They behave basically like the old ones, but are capable of drawing themselves correctly according to the style your OS is set on, which is nice.
The rest of the imports (omitted) are what we need in order to carry out the task you know well by now. Note that there is nothing we need to install with pip in this second part; we already have everything we need.
The layout logic
I'm going to paste it chunk by chunk so that I can explain it easily to you. You'll see how all those pieces we talked about in the layout draft are arranged and glued together. What I'm about to paste, as we did in the script before, is the final part of the guiscrape.py module. We'll leave the middle part, the business logic, for last:
if __name__ == "__main__":
    _root = Tk()
    _root.title('Scrape app')
As you know by now, we only want to execute the logic when the module is run directly, so that first line shouldn't surprise you.
In the last two lines, we set up the main window, which is an instance of the Tk class. We instantiate it and give it a title. Note that I use the prepending underscore technique for all the names of the tkinter objects, in order to avoid potential collisions with names in the business logic. I just find it cleaner like this, but you're allowed to disagree:
    _mainframe = ttk.Frame(_root, padding='5 5 5 5')
    _mainframe.grid(row=0, column=0, sticky=(E, W, N, S))
Here, we set up the Main Frame. It's a ttk.Frame instance. We set _root as its parent, and give it some padding. The padding is a measure in pixels of how much space should be inserted between the inner content and the borders in order to let our layout breathe a little, otherwise we have a sardine effect, where widgets are packed too tightly.
The second line is more interesting. We place this _mainframe on the first row (0) and first column (0) of the parent object (_root). We also say that this frame needs to extend itself in each direction by using the sticky argument with all four cardinal directions. If you're wondering where they came from, it's the from tkinter import * magic that brought them to us:
    _url_frame = ttk.LabelFrame(
        _mainframe, text='URL', padding='5 5 5 5')
    _url_frame.grid(row=0, column=0, sticky=(E, W))
    _url_frame.columnconfigure(0, weight=1)
    _url_frame.rowconfigure(0, weight=1)
Next, we start by placing the URL Frame down. This time, the parent object is _mainframe, as you will recall from our draft. This is not just a simple Frame, it's actually a LabelFrame, which means we can set the text argument and expect a rectangle to be drawn around it, with the content of the text argument written in the top-left part of it (check out the previous picture if it helps). We position this frame at (0, 0), and say that it should expand to the left and to the right. We don't need the other two directions.
Finally, we use rowconfigure and columnconfigure to make sure it behaves correctly, should it need to resize. This is just a formality in our present layout:
    _url = StringVar()
    _url.set('http://localhost:8000')
    _url_entry = ttk.Entry(
        _url_frame, width=40, textvariable=_url)
    _url_entry.grid(row=0, column=0, sticky=(E, W, S, N), padx=5)
    _fetch_btn = ttk.Button(
        _url_frame, text='Fetch info', command=fetch_url)
    _fetch_btn.grid(row=0, column=1, sticky=W, padx=5)
Here, we have the code to lay out the URL textbox and the _fetch_btn button. A textbox in this environment is called Entry. We instantiate it as usual, setting _url_frame as its parent and giving it a width. Also, and this is the most interesting part, we set the textvariable argument to be _url. _url is a StringVar, which is an object that is now connected to Entry and will be used to manipulate its content. Therefore, we don't modify the text in the _url_entry instance directly, but by accessing _url. In this case, we call the set method on it to set the initial value to the URL of our local web page.
We position _url_entry at (0, 0), setting all four cardinal directions for it to stick to, and we also set a bit of extra padding on the left and right edges using padx, which adds padding on the x-axis (horizontal). On the other hand, pady takes care of the vertical direction.
By now, you should get that every time you call the .grid method on an object, we're basically telling the grid layout manager to place that object somewhere, according to rules that we specify as arguments in the grid() call.
Similarly, we set up and place the _fetch_btn button. The only interesting parameter is command=fetch_url. This means that when we click this button, we call the fetch_url function. This technique is called a callback:
    _img_frame = ttk.LabelFrame(
        _mainframe, text='Content', padding='9 0 0 0')
    _img_frame.grid(row=1, column=0, sticky=(N, S, E, W))
This is what we called Img Frame in the layout draft. It is placed on the second row of its parent, _mainframe. It will hold the Listbox and the Radio Frame:
    _images = StringVar()
    _img_listbox = Listbox(
        _img_frame, listvariable=_images, height=6, width=25)
    _img_listbox.grid(row=0, column=0, sticky=(E, W), pady=5)
    _scrollbar = ttk.Scrollbar(
        _img_frame, orient=VERTICAL, command=_img_listbox.yview)
    _scrollbar.grid(row=0, column=1, sticky=(S, N), pady=6)
    _img_listbox.configure(yscrollcommand=_scrollbar.set)
This is probably the most interesting bit of the whole layout logic. As we did with _url_entry, we need to drive the contents of Listbox by tying it to an _images variable. We set up Listbox so that _img_frame is its parent, and _images is the variable it's tied to. We also pass some dimensions.
The interesting bit comes from the _scrollbar instance. Note that, when we instantiate it, we set its command to _img_listbox.yview. This is the first half of the contract between Listbox and Scrollbar. The other half is provided by the _img_listbox.configure method, which sets yscrollcommand=_scrollbar.set.
By providing this reciprocal bond, when we scroll on Listbox, Scrollbar will move accordingly and vice versa, when we operate Scrollbar, Listbox will scroll accordingly:
    _radio_frame = ttk.Frame(_img_frame)
    _radio_frame.grid(row=0, column=2, sticky=(N, S, W, E))
We place the Radio Frame, ready to be populated. Note that Listbox is occupying (0, 0) on _img_frame, Scrollbar (0, 1), and therefore _radio_frame will go in (0, 2):
    _choice_lbl = ttk.Label(
        _radio_frame, text="Choose how to save images")
    _choice_lbl.grid(row=0, column=0, padx=5, pady=5)
    _save_method = StringVar()
    _save_method.set('img')
    _img_only_radio = ttk.Radiobutton(
        _radio_frame, text='As Images', variable=_save_method,
        value='img')
    _img_only_radio.grid(
        row=1, column=0, padx=5, pady=2, sticky=W)
    _img_only_radio.configure(state='normal')
    _json_radio = ttk.Radiobutton(
        _radio_frame, text='As JSON', variable=_save_method,
        value='json')
    _json_radio.grid(row=2, column=0, padx=5, pady=2, sticky=W)
Firstly, we place the label, and we give it some padding. Note that the label and radio buttons are children of _radio_frame.
As for the Entry and Listbox objects, Radiobutton is also driven by a bond to an external variable, which I called _save_method. Each Radiobutton instance sets a value argument, and by checking the value on _save_method, we know which button is selected:
    _scrape_btn = ttk.Button(
        _mainframe, text='Scrape!', command=save)
    _scrape_btn.grid(row=2, column=0, sticky=E, pady=5)
On the third row of _mainframe we place the Scrape button. Its command is save, which saves the images listed in Listbox, after we have successfully parsed a web page:
    _status_frame = ttk.Frame(
        _root, relief='sunken', padding='2 2 2 2')
    _status_frame.grid(row=1, column=0, sticky=(E, W, S))
    _status_msg = StringVar()
    _status_msg.set('Type a URL to start scraping...')
    _status = ttk.Label(
        _status_frame, textvariable=_status_msg, anchor=W)
    _status.grid(row=0, column=0, sticky=(E, W))
We end the layout section by placing down the status frame, which is a simple ttk.Frame. To give it a little status bar effect, we set its relief property to 'sunken' and give it a uniform padding of two pixels. It needs to stick to the left, right, and bottom parts of the _root window, so we set its sticky attribute to (E, W, S).
We then place a label in it and, this time, we tie it to a StringVar object, because we will have to modify it every time we want to update the status bar text. You should be acquainted with this technique by now.
Finally, on the last line, we run the application by calling the mainloop method on the Tk instance:
    _root.mainloop()
Please remember that all these instructions are placed under the if __name__ == "__main__": clause in the original script.
As you can see, the code to design our GUI application is not hard. Granted, at the beginning, you have to play around a little bit. Not everything will work out perfectly at the first attempt, but I promise you it's very easy and you can find plenty of tutorials on the web. Let's now get to the interesting bit, the business logic.
The business logic
We'll analyze the business logic of the GUI application in three chunks. There is the fetching logic, the saving logic, and the alerting logic.
Fetching the web page
Let's start with the code to fetch the page and images:
config = {}

def fetch_url():
    url = _url.get()
    config['images'] = []
    _images.set(())  # initialised as an empty tuple
    try:
        page = requests.get(url)
    except requests.RequestException as err:
        _sb(str(err))
    else:
        soup = BeautifulSoup(page.content, 'html.parser')
        images = fetch_images(soup, url)
        if images:
            _images.set(tuple(img['name'] for img in images))
            _sb('Images found: {}'.format(len(images)))
        else:
            _sb('No images found')
        config['images'] = images

def fetch_images(soup, base_url):
    images = []
    for img in soup.findAll('img'):
        src = img.get('src')
        img_url = f'{base_url}/{src}'
        name = img_url.split('/')[-1]
        images.append(dict(name=name, url=img_url))
    return images
First of all, let me explain that config dictionary. We need some way of passing data between the GUI application and the business logic. Now, instead of polluting the global namespace with many different variables, my personal preference is to have a single dictionary that holds all the objects we need to pass back and forth, so that the global namespace isn't clogged up with all those names, and we have a single, clean, easy way of knowing where all the objects that are needed by our application are.
In this simple example, we'll just populate the config dictionary with the images we fetch from the page, but I wanted to show you the technique so that you have at least one example. This technique comes from my experience with JavaScript. When you code a web page, you often import several different libraries. If each of these cluttered the global namespace with all sorts of variables, there might be issues in making everything work, because of name clashes and variable overriding.
So, it's much better to leave the global namespace as clean as we can. In this case, I find that using one config variable is more than acceptable.
The fetch_url function is quite similar to what we did in the script. First, we get the url value by calling _url.get(). Remember that the _url object is a StringVar instance that is tied to the _url_entry object, which is an Entry. The text field you see on the GUI is the Entry, but the text behind the scenes is the value of the StringVar object.
By calling get() on _url, we get the value of the text, which is displayed in _url_entry.
The next step is to prepare config['images'] to be an empty list, and to empty the _images variable, which is tied to _img_listbox. This, of course, has the effect of cleaning up all the items in _img_listbox.
After this preparation step, we can try to fetch the page, using the same try/except logic we adopted in the script at the beginning of the chapter. The one difference is the action we take if things go wrong. We call _sb(str(err)). _sb is a helper function whose code we'll see shortly. Basically, it sets the text in the status bar for us. Not a good name, right? I had to explain its behavior to you: food for thought.
If we can fetch the page, then we create the soup instance, and fetch the images from it. The logic of fetch_images is exactly the same as the one explained before, so I won't repeat myself here.
If we have images, using a quick tuple comprehension (which is actually a generator expression fed to a tuple constructor) we feed the _images StringVar and this has the effect of populating our _img_listbox with all the image names. Finally, we update the status bar.
If there were no images, we still update the status bar, and at the end of the function, regardless of how many images were found, we update config['images'] to hold the images list. In this way, we'll be able to access the images from other functions by inspecting config['images'] without having to pass that list around.
Saving the images
The logic to save the images is pretty straightforward. Here it is:
def save():
    if not config.get('images'):
        _alert('No images to save')
        return

    if _save_method.get() == 'img':
        dirname = filedialog.askdirectory(mustexist=True)
        _save_images(dirname)
    else:
        filename = filedialog.asksaveasfilename(
            initialfile='images.json',
            filetypes=[('JSON', '.json')])
        _save_json(filename)

def _save_images(dirname):
    if dirname and config.get('images'):
        for img in config['images']:
            img_data = requests.get(img['url']).content
            filename = os.path.join(dirname, img['name'])
            with open(filename, 'wb') as f:
                f.write(img_data)
        _alert('Done')

def _save_json(filename):
    if filename and config.get('images'):
        data = {}
        for img in config['images']:
            img_data = requests.get(img['url']).content
            b64_img_data = base64.b64encode(img_data)
            str_img_data = b64_img_data.decode('utf-8')
            data[img['name']] = str_img_data

        with open(filename, 'w') as ijson:
            ijson.write(json.dumps(data))
        _alert('Done')
When the user clicks the Scrape! button, the save function is called using the callback mechanism.
The first thing that this function does is check whether there are actually any images to be saved. If not, it alerts the user about it, using another helper function, _alert, whose code we'll see shortly. No further action is performed if there are no images.
On the other hand, if the config['images'] list is not empty, save acts as a dispatcher, and it calls _save_images or _save_json, according to which value is held by _save_method. Remember, this variable is tied to the radio buttons, therefore we expect its value to be either 'img' or 'json'.
This dispatcher is a bit different from the one in the script. According to which method we have selected, a different action must be taken.
If we want to save the images as images, we need to ask the user to choose a directory. We do this by calling filedialog.askdirectory and assigning the result of the call to the dirname variable. This opens up a nice dialog window that asks us to choose a directory. The directory we choose must exist, as specified by the way we call the method. This is done so that we don't have to write code to deal with a potentially missing directory when saving the files.
Here's how this dialog should look on a Mac:
If we cancel the operation, dirname will be empty (tkinter typically returns an empty string in this case, rather than None).
Before finishing analyzing the logic in save, let's quickly go through _save_images.
It's very similar to the version we had in the script so just note that, at the beginning, in order to be sure that we actually have something to do, we check on both dirname and the presence of at least one image in config['images'].
If that's the case, it means we have at least one image to save and the path for it, so we can proceed. The logic to save the images has already been explained. The one thing we do differently this time is join the directory (which means the complete path) to the image name, by means of os.path.join.
At the end of _save_images, if we saved at least one image, we alert the user that we're done.
Let's go back now to the other branch in save. This branch is executed when the user selects the As JSON radio button before pressing the Scrape button. In this case, we want to save a file; therefore, we cannot just ask for a directory. We want to give the user the ability to choose a filename as well. Hence, we fire up a different dialog: filedialog.asksaveasfilename.
We pass an initial filename, which is proposed to the user; they have the ability to change it if they don't like it. Moreover, because we're saving a JSON file, we're forcing the user to use the correct extension by passing the filetypes argument. It is a list, with any number of two-tuples (description, extension), that drives the logic of the dialog.
Here's how this dialog should look on macOS:
Once we have chosen a place and a filename, we can proceed with the saving logic, which is the same as it was in the previous script. We create a JSON object from a Python dictionary (data) that we populate with key/value pairs made by the image name and its Base64-encoded content.
In _save_json as well, we have a little check at the beginning that makes sure that we don't proceed unless we have a filename and at least one image to save. This ensures that if the user presses the Cancel button, nothing bad happens.
Alerting the user
Finally, let's see the alerting logic. It's extremely simple:
def _sb(msg):
    _status_msg.set(msg)

def _alert(msg):
    messagebox.showinfo(message=msg)
That's it! To change the status bar message, all we need to do is access the _status_msg StringVar, as it's tied to the _status label.
On the other hand, if we want to show the user a more visible message, we can fire up a message box. Here's how it should look on a Mac:
The messagebox object can also be used to warn the user (messagebox.showwarning) or to signal an error (messagebox.showerror). But it can also be used to provide dialogs that ask us whether we're sure we want to proceed or if we really want to delete that file, and so on.
If you inspect messagebox by simply printing out what dir(messagebox) returns, you'll find functions such as askokcancel, askquestion, askretrycancel, askyesno, and askyesnocancel, as well as a set of constants to verify the response of the user, such as CANCEL, NO, OK, OKCANCEL, YES, and YESNOCANCEL. You can compare these to the user's choice so that you know the next action to execute when the dialog closes.
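As an illustration of these question dialogs, here is a sketch of how we might ask for confirmation before overwriting an existing file. Both should_write and confirm_overwrite are hypothetical helpers that don't appear in guiscrape.py; the only tkinter call involved is messagebox.askyesno, which returns True when the user answers yes. The decision logic is kept separate from the dialog so that it can be exercised without opening any window:

```python
import os


def should_write(path, ask):
    """Return True if `path` may be written.

    `ask` is any callable that takes a question and returns a truthy
    value for "yes"; it is only consulted when `path` already exists.
    """
    if not os.path.exists(path):
        return True
    question = 'Overwrite {}?'.format(os.path.basename(path))
    return bool(ask(question))


def confirm_overwrite(path):
    """Ask the user through a tkinter dialog (needs a running GUI)."""
    # Imported here so should_write can be used without a display.
    from tkinter import messagebox
    return should_write(
        path, lambda question: messagebox.askyesno(message=question))
```

In _save_json, for example, a call such as confirm_overwrite(filename) could guard the block that opens the file for writing.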
How can we improve the application?
Now that you're accustomed to the fundamentals of designing a GUI application, I'd like to give you some suggestions on how to make ours better.
We can start with the code quality. Do you think this code is good enough, or would you improve it? If so, how? I would test it, and make sure it's robust and caters for all the various scenarios that a user might create by clicking around on the application. I would also make sure the behavior is what I would expect when the website we're scraping is down for any reason.
Another thing that we could improve is the naming. I have prudently named all the components with a leading underscore, both to highlight their somewhat private nature, and to avoid having name clashes with the underlying objects they are linked to. But in retrospect, many of those components could use a better name, so it's really up to you to refactor until you find the form that suits you best. You could start by giving a better name to the _sb function!
As far as the user interface is concerned, you could try to resize the main application. See what happens? The whole content stays exactly where it is. Empty space is added if you expand, or the whole set of widgets disappears gradually if you shrink. This behavior isn't exactly nice, therefore one quick solution could be to make the root window fixed (that is, unable to resize).
Anotherthingthatyoucoulddotoimprovetheapplicationistoaddthesamefunctionalitywehadinthescript,tosaveonlyPNGsorJPGs.Inordertodothis,youcouldplaceacomboboxsomewhere,withthreevalues:All,PNGs,JPGs,orsomethingsimilar.Theusershouldbeabletoselectoneofthoseoptionsbeforesavingthefiles.
Evenbetter,youcouldchangethedeclarationofListboxsothatit'spossibletoselectmultipleimagesatthesametime,andonlytheselectedoneswillbesaved.Ifyoumanagetodothis(it'snotashardasitseems,believeme),thenyoushouldconsiderpresentingtheListboxabitbetter,maybeprovidingalternatingbackgroundcolorsfortherows.
Anothernicethingyoucouldaddisabuttonthatopensupadialogtoselectafile.ThefilemustbeoneoftheJSONfilestheapplicationcanproduce.Onceselected,youcouldrunsomelogictoreconstructtheimagesfromtheirBase64-encodedversion.Thelogictodothisisverysimple,sohere'sanexample:
import base64
import json

with open('images.json', 'r') as f:
    data = json.loads(f.read())

for (name, b64val) in data.items():
    with open(name, 'wb') as f:
        f.write(base64.b64decode(b64val))
Asyoucansee,weneedtoopenimages.jsoninreadmode,andgrabthedatadictionary.Oncewehaveit,wecanloopthroughitsitems,andsaveeachimagewiththeBase64-decodedcontent.I'llleaveituptoyoutotiethislogictoabuttonintheapplication.
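To check that saving and loading really mirror each other, a small round trip can help. The following is a minimal sketch, not the application's code: the image names and byte strings are made-up placeholders, and a temporary directory stands in for wherever the user would save the file.

```python
import base64
import json
import tempfile
from pathlib import Path

# Hypothetical payloads; real ones would be the bytes of PNG/JPG files.
images = {'a.png': b'\x89PNG fake bytes', 'b.jpg': b'\xff\xd8 fake bytes'}

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / 'images.json'

    # Save: b64encode returns bytes, so decode to str before dumping to JSON.
    encoded = {name: base64.b64encode(content).decode('utf-8')
               for name, content in images.items()}
    json_path.write_text(json.dumps(encoded))

    # Load: reverse the two steps.
    loaded = json.loads(json_path.read_text())
    restored = {name: base64.b64decode(b64val)
                for name, b64val in loaded.items()}

print(restored == images)
```

If the final comparison holds, the JSON file carries everything needed to rebuild the images.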
AnothercoolfeaturethatyoucouldaddistheabilitytoopenupapreviewpanethatshowsanyimageyouselectfromListbox,sothattheusercantakeapeekattheimagesbeforedecidingtosavethem.
Finally,onelastsuggestionforthisapplicationistoaddamenu.MaybeevenasimplemenuwithFileand?toprovidetheusualHelporAbout.Justforfun.Addingmenusisnotthatcomplicated;youcanaddtext,keyboardshortcuts,images,andsoon.
Where do we go from here?
If you are interested in digging deeper into the world of GUIs, then I'd like to offer you the following suggestions.
The turtle module
The turtle module is an extended reimplementation of the eponymous module from the Python standard distribution up to version Python 2.5. It's a very popular way to introduce children to programming.
It'sbasedontheideaofanimaginaryturtlestartingat(0,0)intheCartesianplane.Youcanprogrammaticallycommandtheturtletomoveforwardandbackward,rotate,andsoon;bycombiningallthepossiblemoves,allsortsofintricateshapesandimagescanbedrawn.
It'sdefinitelyworthcheckingout,ifonlytoseesomethingdifferent.
wxPython, PyQt, and PyGTK
After you have explored the vastness of the tkinter realm, I'd suggest you explore other GUI libraries: wxPython (https://www.wxpython.org/), PyQt (https://riverbankcomputing.com/software/pyqt/intro), and PyGTK (https://pygobject.readthedocs.io/en/latest/). You may find that one of these works better for you, or makes it easier to code the application you need.
Ibelievethatcoderscanrealizetheirideasonlywhentheyareconsciousofwhattoolstheyhaveavailable.Ifyourtoolsetistoonarrow,yourideasmayseemimpossibleorextremelyhardtorealize,andtheyriskremainingexactlywhattheyare,justideas.
Ofcourse,thetechnologicalspectrumtodayishumongous,soknowingeverythingisnotpossible;therefore,whenyouareabouttolearnanewtechnologyoranewsubject,mysuggestionistogrowyourknowledgebyexploringbreadthfirst.
Investigateseveralthings,andthengodeepwiththeoneorthefewthatlookedmostpromising.Thiswayyou'llbeabletobeproductivewithatleastonetool,andwhenthetoolnolongerfitsyourneeds,you'llknowwheretodigdeeper,thankstoyourpreviousexploration.
The principle of least astonishment
When designing an interface, there are many different things to bear in mind. One of them, which for me is the most important, is the law or principle of least astonishment. It basically states that if in your design a necessary feature has a high astonishing factor, it may be necessary to redesign your application. To give you one example, when you're used to working with Windows, where the buttons to minimize, maximize, and close a window are on the top-right corner, it's quite hard to work on Linux, where they are at the top-left corner. You'll find yourself constantly going to the top-right corner only to discover once more that the buttons are on the other side.
Ifacertainbuttonhasbecomesoimportantinapplicationsthatit'snowplacedinapreciselocationbydesigners,pleasedon'tinnovate.Justfollowtheconvention.Userswillonlybecomefrustratedwhentheyhavetowastetimelookingforabuttonthatisnotwhereit'ssupposedtobe.
ThedisregardforthisruleisthereasonwhyIcannotworkwithproductssuchasJira.Ittakesmeminutestodosimplethingsthatshouldrequireseconds.
Threading considerations
This topic is outside the scope of this book, but I do want to mention it.
IfyouarecodingaGUIapplicationthatneedstoperformalong-runningoperationwhenabuttonisclicked,youwillseethatyourapplicationwillprobablyfreezeuntiltheoperationhasbeencarriedout.Inordertoavoidthis,andmaintaintheapplication'sresponsiveness,youmayneedtorunthattime-expensiveoperationinadifferentthread(orevenadifferentprocess)sothattheOSwillbeabletodedicatealittlebitoftimetotheGUIeverynowandthen,tokeepitresponsive.
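Just to give a flavor of the idea, here is a minimal sketch that uses only the standard library and no GUI at all. The slow_operation function is a hypothetical stand-in for the time-expensive work, and the polling loop plays the role that a timer callback (for example, one scheduled with tkinter's after method) would play in a real application.

```python
import queue
import threading
import time


def slow_operation(results):
    """Hypothetical stand-in for a long-running task."""
    time.sleep(0.2)
    results.put('done')


results = queue.Queue()
worker = threading.Thread(target=slow_operation, args=(results,), daemon=True)
worker.start()

# The main thread is not blocked: it keeps checking for a result and
# is free to do other work (in a GUI, handling events) in between.
ticks = 0
while True:
    try:
        outcome = results.get_nowait()
        break
    except queue.Empty:
        ticks += 1
        time.sleep(0.01)

worker.join()
print(outcome, ticks)
```

The queue is the only point of contact between the two threads, which keeps the worker from touching any GUI object directly.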
Gainagoodgraspofthefundamentalsfirst,andthenhavefunexploringthem!
Summary
In this chapter, we worked on a project together. We have written a script that scrapes a very simple web page and accepts optional commands that alter its behavior in doing so. We also coded a GUI application to do the same thing by clicking buttons instead of typing on a console. I hope you enjoyed reading it and following along as much as I enjoyed writing it.
Wesawmanydifferentconcepts,suchasworkingwithfilesandperformingHTTPrequests,andwetalkedaboutguidelinesforusabilityanddesign.
Ihaveonlybeenabletoscratchthesurface,buthopefullyyouhaveagoodstartingpointfromwhichtoexpandyourexploration.
Throughoutthechapter,Ihavepointedoutseveraldifferentwaysyoucouldimprovetheapplication,andIhavechallengedyouwithafewexercisesandquestions.Ihopeyouhavetakenthetimetoplaywiththoseideas.Youcanlearnalotjustbyplayingaroundwithfunapplicationsliketheonewe'vecodedtogether.
Inthenextchapter,we'regoingtotalkaboutdatascience,oratleastaboutthetoolsthataPythonprogrammerhaswhenitcomestofacingthissubject.
Data Science
"If we have data, let's look at data. If all we have are opinions, let's go with mine."
– Jim Barksdale, former Netscape CEO
Datascienceisaverybroadtermandcanassumeseveraldifferentmeaningsbasedoncontext,understanding,tools,andsoon.Therearecountlessbooksonthissubject,whichisnotsuitableforthefaint-hearted.
Inordertodoproperdatascience,youneedto,attheveryleast,knowmathematicsandstatistics.Then,youmaywanttodigintoothersubjects,suchaspatternrecognitionandmachinelearningand,ofcourse,thereisaplethoraoflanguagesandtoolsyoucanchoosefrom.
Iwon'tbeabletotalkabouteverythinghere.Therefore,inordertorenderthischaptermeaningful,we'regoingtoworkonacoolprojecttogetherinstead.
Aroundtheyear2012/2013,Iwasworkingforatop-tiersocialmediacompanyinLondon.Istayedtherefortwoyears,andIwasprivilegedtoworkwithseveralpeoplewhosebrillianceIcanonlystarttodescribe.WewerethefirstintheworldtohaveaccesstotheTwitterAdsAPI,andwewerepartnerswithFacebookaswell.Thatmeansalotofdata.
Ouranalystsweredealingwithahugenumberofcampaignsandtheywerestrugglingwiththeamountofworktheyhadtodo,sothedevelopmentteamIwasapartoftriedtohelpbyintroducingthemtoPythonandtothetoolsPythongivesyoutodealwithdata.ItwasaveryinterestingjourneythatledmetomentorseveralpeopleinthecompanyandeventuallytookmetoManilawhere,fortwoweeks,IgaveintensivetraininginPythonanddatasciencetotheanalystsoverthere.
Theprojectwe'regoingtodointhischapterisalightweightversionofthefinalexampleIpresentedtomystudentsinManila.Ihaverewrittenittoasizethatwillfitthischapter,andmadeafewadjustmentshereandthereforteachingpurposes,butallthemainconceptsarethere,soitshouldbefunandinstructionalforyou.
Specifically,wearegoingtoexplorethefollowing:
The Jupyter Notebook
Pandas and NumPy: main libraries for data science in Python
A few concepts around Pandas's DataFrame class
Creating and manipulating a dataset
Let'sstartbytalkingaboutRomangods.
IPython and Jupyter Notebook
In 2001, Fernando Perez was a graduate student in physics at CU Boulder, and was trying to improve the Python shell so that he could have the niceties he was used to when he was working with tools such as Mathematica and Maple. The result of that effort took the name IPython.
Inanutshell,thatsmallscriptbeganasanenhancedversionofthePythonshelland,throughtheeffortofothercodersandeventuallywithproperfundingfromseveraldifferentcompanies,itbecamethewonderfulandsuccessfulprojectitistoday.Some10yearsafteritsbirth,aNotebookenvironmentwascreated,poweredbytechnologiessuchasWebSockets,theTornadowebserver,jQuery,CodeMirror,andMathJax.TheZeroMQlibrarywasalsousedtohandlethemessagesbetweentheNotebookinterfaceandthePythoncorethatliesbehindit.
TheIPythonNotebookhasbecomesopopularandwidelyusedthat,overtime,allsortsofgoodieshavebeenaddedtoit.Itcanhandlewidgets,parallelcomputing,allsortsofmediaformats,andmuch,muchmore.Moreover,atsomepoint,itbecamepossibletocodeinlanguagesotherthanPythonfromwithintheNotebook.
Thishasledtoahugeprojectthatatsomestagehasbeensplitintotwo:IPythonhasbeenstrippeddowntofocusmoreonthekernelpartandtheshell,whiletheNotebookhasbecomeabrandnewprojectcalledJupyter.Jupyterallowsinteractivescientificcomputationstobemadeinmorethan40languages.
Thischapter'sprojectwillallbecodedandruninaJupyterNotebook,soletmeexplaininafewwordswhataNotebookis.
ANotebookenvironmentisawebpagethatexposesasimplemenuandthecellsinwhichyoucanrunPythoncode.Eventhoughthecellsareseparateentitiesthatyoucanrunindividually,theyallsharethesamePythonkernel.Thismeansthatallthenamesthatyoudefineinacell(thevariables,functions,andsoon)willbeavailableinanyothercell.
Simply put, a Python kernel is a process in which Python is running. The Notebook web page is, therefore, an interface exposed to the user for driving this kernel. The web page communicates with it using a very fast messaging system.
Apartfromallthegraphicaladvantages,thebeautyofhavingsuchanenvironmentliesintheabilitytorunaPythonscriptinchunks,andthiscanbeatremendousadvantage.Takeascriptthatisconnectingtoadatabasetofetchdataandthenmanipulatethatdata.Ifyoudoitintheconventionalway,withaPythonscript,youhavetofetchthedataeverytimeyouwanttoexperimentwithit.WithinaNotebookenvironment,youcanfetchthedatainacellandthenmanipulateandexperimentwithitinothercells,sofetchingiteverytimeisnotnecessary.
TheNotebookenvironmentisalsoextremelyhelpfulfordatasciencebecauseitallowsforstep-by-stepintrospection.Youdoonechunkofworkandthenverifyit.Youthendoanotherchunkandverifyagain,andsoon.
It'salsoinvaluableforprototypingbecausetheresultsarethere,rightinfrontofyoureyes,immediatelyavailable.
Ifyouwanttoknowmoreaboutthesetools,pleasecheckoutipython.organdjupyter.org.
IhavecreatedaverysimpleexampleNotebookwithafibonaccifunctionthatgivesyouthelistofalltheFibonaccinumberssmallerthanagivenN.Inmybrowser,itlookslikethis:
EverycellhasanIn[]label.Ifthere'snothingbetweenthebrackets,itmeansthatacellhasneverbeenexecuted.Ifthereisanumber,itmeansthatthecellhasbeenexecuted,andthenumberrepresentstheorderinwhichthecellwasexecuted.Finally,a*meansthatthecelliscurrentlybeingexecuted.
You can see in the picture that in the first cell I have defined the fibonacci function, and I have executed it. This has the effect of placing the fibonacci name in the global frame associated with the Notebook, therefore the fibonacci function is now available to the other cells as well. In fact, in the second cell, I can run fibonacci(100) and see the results in Out[2]. In the third cell, I have shown you one of the several magic functions you can find in a Notebook: %timeit runs the code several times and provides you with a nice benchmark for it. All the measurements for the list comprehensions and generators I did in Chapter 5, Saving Time and Memory, were carried out with this nice feature.
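Since the screenshot is not reproduced in this transcript, here is a minimal sketch of what such a function could look like. It is an assumption written to match the description (all Fibonacci numbers smaller than a given N), not necessarily the code shown in the picture.

```python
def fibonacci(n):
    """Return the list of Fibonacci numbers smaller than n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result


# In a Notebook, the last expression of a cell is displayed automatically;
# in a separate cell you could also try: %timeit fibonacci(100)
fibonacci(100)
```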
Youcanexecuteacellasmanytimesasyouwant,andchangetheorderinwhichyourunthem.Cellsareverymalleable,youcanalsoputinmarkdowntextorrenderthemasheaders.
MarkdownisalightweightmarkuplanguagewithplaintextformattingsyntaxdesignedsothatitcanbeconvertedtoHTMLandmanyotherformats.
Also, whatever you place in the last row of a cell will be automatically printed for you. This is very handy because you're not forced to write print(...) explicitly.
FeelfreetoexploretheNotebookenvironment;onceyou'refriendswithit,it'salong-lastingrelationship,Ipromise.
Installing the required libraries
In order to run the Notebook, you have to install a handful of libraries, each of which collaborates with the others to make the whole thing work. Alternatively, you can just install Jupyter and it will take care of everything for you. For this chapter, there are a few other dependencies that we need to install. You can find them listed in requirements/requirements.data.science.in. To install them, please take a look at README.rst in the root folder of the project, and you will find instructions specifically for this chapter.
Using Anaconda
Sometimes installing data science libraries can be extremely painful. If you are struggling to install the libraries for this chapter in your virtual environment, an alternative choice you have is to install Anaconda. Anaconda is a free and open source distribution of the Python and R programming languages for data science and machine-learning-related applications that aims to simplify package management and deployment. You can download it from the anaconda.org website. Once you have installed it in your system, take a peek at the various requirements for this chapter and install them through Anaconda.
Starting a Notebook
Once you have all the required libraries installed, you can start a Notebook either with the following command or by using the Anaconda interface:
$ jupyter notebook
Youwillhaveanopenpageinyourbrowseratthisaddress(theportmightbedifferent):http://localhost:8888/.GotothatpageandcreateanewNotebookusingthemenu.Whenyoufeelcomfortablewithit,you'rereadytogo.IstronglyencourageyoutotryandgetaJupyterenvironmentrunning,beforeyouproceedreadingon.Itisanexcellentexercisesometimestohavetodealwithdifficultdependencies.
OurprojectwilltakeplaceinaNotebook,thereforeIwilltageachcodesnippetwiththecellnumberitbelongsto,sothatyoucaneasilyreproducethecodeandfollowalong.
Ifyoufamiliarizeyourselfwiththekeyboardshortcuts(lookintheNotebook'sHelpsection),youwillbeabletomovebetweencellsandhandletheircontentwithouthavingtoreachforthemouse.ThiswillmakeyoumoreproficientandwayfasterwhenyouworkinaNotebook.
Let'snowmoveonandtalkaboutthemostinterestingpartofthischapter:data.
Dealingwithdata
Typically,whenyoudealwithdata,thisisthepathyougothrough:youfetchit,youcleanandmanipulateit,andthenyouinspectit,andpresentresultsasvalues,spreadsheets,graphs,andsoon.Iwantyoutobeinchargeofallthreestepsoftheprocesswithouthavinganyexternaldependencyonadataprovider,sowe'regoingtodothefollowing:
1. We'regoingtocreatethedata,simulatingthefactthatitcomesinaformatthatisnotperfectorreadytobeworkedon
2. We're going to clean it and feed it to the main tool we'll use in the project: DataFrame from the pandas library
3. We're going to manipulate the data in DataFrame
4. We're going to save DataFrame to a file in different formats
5. We're going to inspect the data and get some results out of it
SettinguptheNotebook
Firstthingsfirst,let'sproducethedata.Westartfromthech13-dataprepNotebook:
#1
import json
import random
from datetime import date, timedelta
import faker
Cell#1takescareoftheimports.Wehavealreadyencounteredthem,apartfromfaker.Youcanusethismoduletopreparefakedata.It'sveryusefulintests,whenyouprepareyourfixtures,togetallsortsofthingssuchasnames,emailaddresses,phonenumbers,andcreditcarddetails.Itisallfake,ofcourse.
Preparing the data
We want to achieve the following data structure: we're going to have a list of user objects. Each user object will be linked to a number of campaign objects. In Python, everything is an object, so I'm using this term in a generic way. The user object may be a string, a dictionary, or something else.
Acampaigninthesocialmediaworldisapromotionalcampaignthatamediaagencyrunsonsocialmedianetworksonbehalfofaclient.Rememberthatwe'regoingtopreparethisdatasothatit'snotinperfectshape(butitwon'tbethatbadeither...):
#2
fake=faker.Faker()
Firstly,weinstantiatetheFakerthatwe'llusetocreatethedata:
#3
usernames = set()
usernames_no = 1000
# populate the set with 1000 unique usernames
while len(usernames) < usernames_no:
    usernames.add(fake.user_name())
Then we need usernames. I want 1,000 unique usernames, so I keep adding to the usernames set until it has 1,000 elements. A set doesn't allow duplicate elements, therefore uniqueness is guaranteed:
#4
def get_random_name_and_gender():
    skew = .6  # 60% of users will be female
    male = random.random() > skew
    if male:
        return fake.name_male(), 'M'
    else:
        return fake.name_female(), 'F'

def get_users(usernames):
    users = []
    for username in usernames:
        name, gender = get_random_name_and_gender()
        user = {
            'username': username,
            'name': name,
            'gender': gender,
            'email': fake.email(),
            'age': fake.random_int(min=18, max=90),
            'address': fake.address(),
        }
        users.append(json.dumps(user))
    return users

users = get_users(usernames)
users[:3]
Here,wecreatealistofusers.Eachusernamehasnowbeenaugmentedtoafull-blownuserdictionary,withotherdetailssuchasname,gender,andemail.EachuserdictionaryisthendumpedtoJSONandaddedtothelist.Thisdatastructureisnotoptimal,ofcourse,butwe'resimulatingascenariowhereuserscometouslikethat.
Notetheskeweduseofrandom.random()tomake60%ofusersfemale.Therestofthelogicshouldbeveryeasyforyoutounderstand.
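If you want to convince yourself that the comparison with skew really yields about 60% female users, a quick simulation is enough. This is a minimal sketch; the fixed seed and the number of draws are arbitrary choices made only to keep the experiment repeatable.

```python
import random

random.seed(42)  # fixed seed only so the sketch is repeatable
skew = 0.6
draws = 100_000

# random.random() is uniform in [0, 1): it exceeds 0.6 about 40% of the
# time (male), so the remaining ~60% of draws are female.
females = sum(1 for _ in range(draws) if not random.random() > skew)
share = females / draws
print(round(share, 2))
```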
Notealsothelastline.Eachcellautomaticallyprintswhat'sonthelastline;therefore,theoutputof#4isalistwiththefirstthreeusers:
['{"username":"samuel62","name":"TonyaLucas","gender":"F","email":
"[email protected]","age":27,"address":"PSC8934,Box4049\\nAPOAA
43073"}',
'{"username":"eallen","name":"CharlesHarmon","gender":"M","email":
"[email protected]","age":28,"address":"38661ClarkMewsApt.
528\\nAnthonychester,ID25919"}',
'{"username":"amartinez","name":"LauraDunn","gender":"F","email":
"[email protected]","age":88,"address":"0536DanielCourtApt.541\\nPort
Christopher,HI49399-3415"}']
Ihopeyou'refollowingalongwithyourownNotebook.Ifyouare,pleasenotethatalldataisgeneratedusingrandomfunctionsandvalues;therefore,youwillseedifferentresults.TheywillchangeeverytimeyouexecutetheNotebook.
Cell #5 contains the logic to generate a campaign name:
#5
# campaign name format:
# InternalType_StartDate_EndDate_TargetAge_TargetGender_Currency
def get_type():
    # just some gibberish internal codes
    types = ['AKX', 'BYU', 'GRZ', 'KTR']
    return random.choice(types)

def get_start_end_dates():
    duration = random.randint(1, 2 * 365)
    offset = random.randint(-365, 365)
    start = date.today() - timedelta(days=offset)
    end = start + timedelta(days=duration)

    def _format_date(date_):
        return date_.strftime("%Y%m%d")

    return _format_date(start), _format_date(end)

def get_age():
    age = random.randint(20, 45)
    age -= age % 5
    diff = random.randint(5, 25)
    diff -= diff % 5
    return '{}-{}'.format(age, age + diff)

def get_gender():
    return random.choice(('M', 'F', 'B'))

def get_currency():
    return random.choice(('GBP', 'EUR', 'USD'))

def get_campaign_name():
    separator = '_'
    type_ = get_type()
    start, end = get_start_end_dates()
    age = get_age()
    gender = get_gender()
    currency = get_currency()
    return separator.join(
        (type_, start, end, age, gender, currency))
Analystsusespreadsheetsallthetime,andtheycomeupwithallsortsofcodingtechniquestocompressasmuchinformationaspossibleintothecampaignnames.TheformatIchoseisasimpleexampleofthattechnique—thereisacodethattellsusthecampaigntype,thenthestartandenddates,thenthetargetageandgender,andfinallythecurrency.Allvaluesareseparatedbyanunderscore.
In the get_type function, I use random.choice() to get one value randomly out of a collection. Probably more interesting is get_start_end_dates. First, I get the duration for the campaign, which goes from one day to two years (randomly), then I get a random offset in time which I subtract from today's date in order to get the start date. Given that an offset is a random number between -365 and 365, would anything be different if I added it to today's date instead of subtracting it?
When I have both the start and end dates, I return a stringified version of each; they are joined by an underscore later, in get_campaign_name.
Then, we have a bit of modular trickery going on with the age calculation. I hope you remember the modulo operator (%) from Chapter 2, Built-in Data Types.
What happens here is that I want an age range that has multiples of five as extremes. There are many ways to do it, but what I do is get a random number between 20 and 45 for the left extreme, and remove the remainder of the division by 5. So, if, for example, I get 28, I will remove 28 % 5 = 3 from it, getting 25. I could have just used random.randrange(), but it's hard to resist modular division.
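For comparison, here is a minimal sketch of the random.randrange() alternative. The function name is made up for this example. One caveat: the two approaches are not statistically identical. With the modulo version, 45 comes up only when randint returns exactly 45, so it is rarer than the other values, while randrange picks each multiple of five with the same probability.

```python
import random


def get_age_with_randrange():
    # Left extreme: a multiple of 5 from 20 to 45 (the stop value is excluded).
    age = random.randrange(20, 50, 5)
    # Width of the range: a multiple of 5 from 5 to 25.
    diff = random.randrange(5, 30, 5)
    return '{}-{}'.format(age, age + diff)


print(get_age_with_randrange())
```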
Therestofthefunctionsarejustsomeotherapplicationsofrandom.choice()andthelastone,get_campaign_name,isnothingmorethanacollectorforallthesepuzzlepiecesthatreturnsthefinalcampaignname:
#6
# campaign data:
# name, budget, spent, clicks, impressions
def get_campaign_data():
    name = get_campaign_name()
    budget = random.randint(10**3, 10**6)
    spent = random.randint(10**2, budget)
    clicks = int(random.triangular(10**2, 10**5, 0.2 * 10**5))
    impressions = int(random.gauss(0.5 * 10**6, 2))
    return {
        'cmp_name': name,
        'cmp_bgt': budget,
        'cmp_spent': spent,
        'cmp_clicks': clicks,
        'cmp_impr': impressions
    }
In#6,wewriteafunctionthatcreatesacompletecampaignobject.Iusedafewdifferentfunctionsfromtherandommodule.random.randint()givesyouanintegerbetweentwoextremes.Theproblemwithitisthatitfollowsauniformprobabilitydistribution,whichmeansthatanynumberintheintervalhasthesameprobabilityofcomingup.
Therefore,whendealingwithalotofdata,ifyoudistributeyourfixturesusingauniformdistribution,theresultsyougetwillalllooksimilar.Forthisreason,Ichosetousetriangularandgauss,forclicksandimpressions.Theyusedifferentprobabilitydistributionssothatwe'llhavesomethingmoreinterestingtoseeintheend.
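To see how differently the three functions behave, you can draw a sample from each and compare a couple of summary statistics. This is a minimal sketch: it reuses the same arguments as get_campaign_data for triangular and gauss, while the uniform range, the seed, and the sample size are arbitrary choices for the comparison.

```python
import random
import statistics

random.seed(0)  # fixed seed only so the sketch is repeatable
n = 10_000

samples = {
    'uniform': [random.randint(10**2, 10**5) for _ in range(n)],
    'triangular': [random.triangular(10**2, 10**5, 0.2 * 10**5)
                   for _ in range(n)],
    'gauss': [random.gauss(0.5 * 10**6, 2) for _ in range(n)],
}

# Mean and standard deviation for each sample.
summary = {
    label: (statistics.mean(values), statistics.stdev(values))
    for label, values in samples.items()
}
for label, (mean, stdev) in summary.items():
    print(label, round(mean), round(stdev, 2))
```

Note how tightly gauss clusters around 500,000 when the standard deviation is only 2: the impressions will barely move from that value.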
Just to make sure we're on the same page with the terminology: clicks represents the number of clicks on a campaign advertisement, budget is the total amount of money allocated for the campaign, spent is how much of that money has already been spent, and impressions is the number of times the campaign has been fetched, as a resource, from its source, regardless of the number of clicks that were performed on the campaign. Normally, the number of impressions is greater than the number of clicks.
Nowthatwehavethedata,it'stimetoputitalltogether:
#7
def get_data(users):
    data = []
    for user in users:
        campaigns = [get_campaign_data()
                     for _ in range(random.randint(2, 8))]
        data.append({'user': user, 'campaigns': campaigns})
    return data
Asyoucansee,eachitemindataisadictionarywithauserandalistofcampaignsthatareassociatedwiththatuser.
Cleaning the data
Let's start cleaning the data:
#8
rough_data = get_data(users)
rough_data[:2]  # let's take a peek
Wesimulatefetchingthedatafromasourceandtheninspectit.TheNotebookistheperfecttoolforinspectingyoursteps.Youcanvarythegranularitytoyourneeds.Thefirstiteminrough_datalookslikethis:
{'user':'{"username":"samuel62","name":"TonyaLucas","gender":"F","email":
"[email protected]","age":27,"address":"PSC8934,Box4049\\nAPOAA
43073"}',
'campaigns':[{'cmp_name':'GRZ_20171018_20171116_35-55_B_EUR',
'cmp_bgt':999613,
'cmp_spent':43168,
'cmp_clicks':35603,
'cmp_impr':500001},
...
{'cmp_name':'BYU_20171122_20181016_30-45_B_USD',
'cmp_bgt':561058,
'cmp_spent':472283,
'cmp_clicks':44823,
'cmp_impr':499999}]}
So,wenowstartworkingonit:
#9
data = []
for datum in rough_data:
    for campaign in datum['campaigns']:
        campaign.update({'user': datum['user']})
        data.append(campaign)
data[:2]  # let's take another peek
ThefirstthingweneedtodoinordertobeabletofeedDataFramewiththisdataistodenormalizeit.Thismeanstransformingdataintoalistwhoseitemsarecampaigndictionaries,augmentedwiththeirrelativeuserdictionary.Userswillbeduplicatedineachcampaigntheybelongto.Thefirstitemindatalookslikethis:
{'cmp_name':'GRZ_20171018_20171116_35-55_B_EUR',
'cmp_bgt':999613,
'cmp_spent':43168,
'cmp_clicks':35603,
'cmp_impr':500001,
'user':'{"username":"samuel62","name":"TonyaLucas","gender":"F","email":
"[email protected]","age":27,"address":"PSC8934,Box4049\\nAPOAA
43073"}'}
You can see that the user object has been brought into the campaign dictionary, and that this is repeated for each campaign.
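The same flattening can be seen at a glance on a tiny, made-up dataset. This minimal sketch differs from cell #9 in one respect, which is a deliberate choice for the example: instead of calling update on each campaign (which modifies the original dictionaries), it builds new dictionaries and leaves the input untouched.

```python
# Hypothetical miniature version of rough_data.
rough = [
    {'user': 'alice', 'campaigns': [{'cmp_name': 'A1'}, {'cmp_name': 'A2'}]},
    {'user': 'bob', 'campaigns': [{'cmp_name': 'B1'}]},
]

# One output item per campaign, each carrying a copy of its user.
flat = [
    {**campaign, 'user': datum['user']}
    for datum in rough
    for campaign in datum['campaigns']
]
print(flat)
```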
Now,Iwouldliketohelpyouandofferadeterministicsecondpartofthechapter,soI'mgoingtosavethedataIgeneratedheresothatI(andyou,too)willbeabletoloaditfromthenextNotebook,andweshouldthenhavethesameresults:
#10
with open('data.json', 'w') as stream:
    stream.write(json.dumps(data))
Youshouldfindthedata.jsonfileinthesourcecodeforthebook.Nowwearedonewithch13-dataprep,sowecancloseit,andopenupch13.
Creating the DataFrame
First, we have another round of imports:
#1
import json
import calendar
import numpy as np
from pandas import DataFrame
import arrow
import pandas as pd
The json and calendar libraries come from the standard library. numpy is the NumPy library, the fundamental package for scientific computing with Python. NumPy stands for Numerical Python, and it is one of the most widely used libraries in the data science environment. I'll say a few words about it later on in this chapter. pandas is the very core upon which the whole project is based. Pandas stands for Python Data Analysis Library. Among many other things, it provides DataFrame, a matrix-like data structure with advanced processing capabilities. It's customary to import DataFrame separately and then to import pandas as pd.
arrowisanicethird-partylibrarythatspeedsupdealingwithdatesdramatically.Technically,wecoulddoitwiththestandardlibrary,butIseenoreasonnottoexpandtherangeoftheexampleandshowyousomethingdifferent.
Aftertheimports,weloadthedataasfollows:
#2
with open('data.json') as stream:
    data = json.loads(stream.read())
Andfinally,it'stimetocreateDataFrame:
#3
df=DataFrame(data)
df.head()
WecaninspectthefirstfiverowsusingtheheadmethodofDataFrame.Youshouldseesomethinglikethis:
Jupyterrenderstheoutputofthedf.head()callasHTMLautomatically.Inordertohaveatext-basedoutput,simplywrapdf.head()inaprintcall.
TheDataFramestructureisverypowerful.Itallowsustomanipulatealotofitscontents.Youcanfilterbyrows,columns,aggregateondata,andmanyotheroperations.YoucanoperatewithrowsorcolumnswithoutsufferingthetimepenaltyyouwouldhavetopayifyouwereworkingondatawithpurePython.Thishappensbecause,underthecovers,pandasisharnessingthepoweroftheNumPylibrary,whichitselfdrawsitsincrediblespeedfromthelow-levelimplementationofitscore.
UsingDataFrameallowsustocouplethepowerofNumPywithspreadsheet-likecapabilitiessothatwe'llbeabletoworkonourdatainafashionthatissimilartowhatananalystcoulddo.Only,wedoitwithcode.
Butlet'sgobacktoourproject.Let'sseetwowaystoquicklygetabird'seyeviewofthedata:
#4
df.count()
countyieldsacountofallthenon-emptycellsineachcolumn.Thisisgoodtohelpyouunderstandhowsparseyourdatacanbe.Inourcase,wehavenomissingvalues,sotheoutputis:
cmp_bgt       5037
cmp_clicks    5037
cmp_impr      5037
cmp_name      5037
cmp_spent     5037
user          5037
dtype: int64
Nice! We have 5,037 rows, and the counts themselves are reported as integers (dtype: int64 means long integers because they take 64 bits each). Given that we have 1,000 users and the number of campaigns per user is a random number between 2 and 8 (5 on average, so about 5,000 rows in total), we're in line with what I was expecting:
#5
df.describe()
Thedescribemethodisanice,quickwaytointrospectabitfurther:
             cmp_bgt    cmp_clicks       cmp_impr      cmp_spent
count    5037.000000   5037.000000    5037.000000    5037.000000
mean   496930.317054  40920.962676  499999.498312  246963.542783
std    287126.683484  21758.505210       2.033342  217822.037701
min      1057.000000    341.000000  499993.000000     114.000000
25%    247663.000000  23340.000000  499998.000000   64853.000000
50%    491650.000000  37919.000000  500000.000000  183716.000000
75%    745093.000000  56253.000000  500001.000000  379478.000000
max    999577.000000  99654.000000  500008.000000  975799.000000
As you can see, it gives us several measures, such as count, mean, std (standard deviation), min, and max, and shows how data is distributed across the quartiles (the 25%, 50%, and 75% rows). Thanks to this method, we already have a rough idea of how our data is structured.
Let'sseewhicharethethreecampaignswiththehighestandlowestbudgets:
#6
df.sort_values(by=['cmp_bgt'], ascending=False).head(3)
Thisgivesthefollowingoutput:
      cmp_bgt  cmp_clicks  cmp_impr                           cmp_name
3321   999577        8232    499997  GRZ_20180810_20190107_40-55_M_EUR
2361   999534       53223    499999  GRZ_20180516_20191030_25-30_B_EUR
2220   999096       13347    499999  KTR_20180620_20190809_40-50_F_USD
Andacalltotailshowsustheoneswiththelowestbudgets:
#7
df.sort_values(by=['cmp_bgt'],ascending=False).tail(3)
Unpacking the campaign name
Now it's time to increase the complexity. First of all, we want to get rid of that horrible campaign name (cmp_name). We need to explode it into parts and put each part in one dedicated column. In order to do this, we'll use the apply method of the Series object.
The pandas.core.series.Series class is basically a powerful wrapper around an array (think of it as a list with augmented capabilities). We can extract a Series object from DataFrame by accessing it in the same way we do with a key in a dictionary, and we can call apply on that Series object, which will run a function feeding each item in the Series to it. We compose the result into a new DataFrame, and then join that DataFrame with df:
#8
def unpack_campaign_name(name):
    # very optimistic method, assumes data in campaign name
    # is always in good state
    type_, start, end, age, gender, currency = name.split('_')
    start = arrow.get(start, 'YYYYMMDD').date()
    end = arrow.get(end, 'YYYYMMDD').date()
    return type_, start, end, age, gender, currency

campaign_data = df['cmp_name'].apply(unpack_campaign_name)
campaign_cols = [
    'Type', 'Start', 'End', 'Age', 'Gender', 'Currency']
campaign_df = DataFrame(
    campaign_data.tolist(), columns=campaign_cols, index=df.index)
campaign_df.head(3)
Withinunpack_campaign_name,wesplitthecampaignnameinparts.Weusearrow.get()togetaproperdateobjectoutofthosestrings(arrowmakesitreallyeasytodoit,doesn'tit?),andthenwereturntheobjects.Aquickpeekatthelastlinereveals:
  Type       Start         End    Age Gender Currency
0  KTR  2019-03-24  2020-11-06  20-35      F      EUR
1  GRZ  2017-05-21  2018-07-24  30-45      B      GBP
2  KTR  2017-12-18  2018-02-08  30-40      F      GBP
Nice!Oneimportantthing:evenifthedatesappearasstrings,theyarejusttherepresentationoftherealdateobjectsthatarehostedinDataFrame.
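If you prefer not to depend on arrow, the same unpacking can be done with the standard library alone. The following is a minimal sketch; the two function names are made up for this example, and it is just as optimistic about the input as the arrow version.

```python
from datetime import date, datetime


def parse_compact_date(value):
    """Parse a 'YYYYMMDD' string into a datetime.date."""
    return datetime.strptime(value, '%Y%m%d').date()


def unpack_campaign_name_stdlib(name):
    # Assumes the name always has exactly six underscore-separated parts.
    type_, start, end, age, gender, currency = name.split('_')
    return (type_, parse_compact_date(start), parse_compact_date(end),
            age, gender, currency)


unpacked = unpack_campaign_name_stdlib('KTR_20190324_20201106_20-35_F_EUR')
print(unpacked)
```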
Another very important thing: when joining two DataFrame instances, it's imperative that they have the same index, otherwise pandas won't be able to know which rows go with which. Therefore, when we create campaign_df, we set its index to the one from df. This enables us to join them. When creating this DataFrame, we also pass the column names:
#9
df=df.join(campaign_df)
Andafterjoin,wetakeapeek,hopingtoseematchingdata:
#10
df[['cmp_name']+campaign_cols].head(3)
Thetruncatedoutputoftheprecedingcodesnippetisasfollows:
                            cmp_name Type       Start         End
0  KTR_20190324_20201106_20-35_F_EUR  KTR  2019-03-24  2020-11-06
1  GRZ_20170521_20180724_30-45_B_GBP  GRZ  2017-05-21  2018-07-24
2  KTR_20171218_20180208_30-40_F_GBP  KTR  2017-12-18  2018-02-08
Asyoucansee,joinwassuccessful;thecampaignnameandtheseparatecolumnsexposethesamedata.Didyouseewhatwedidthere?We'reaccessingDataFrameusingthesquarebracketssyntax,andwepassalistofcolumnnames.ThiswillproduceabrandnewDataFrame,withthosecolumns(inthesameorder),onwhichwethencallthehead()method.
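To see why the shared index matters, consider this minimal sketch with two tiny, made-up frames. It assumes pandas is installed, and the deliberately unusual index labels (10, 20, 30) are there only to make the effect visible.

```python
import pandas as pd

left = pd.DataFrame({'cmp_name': ['A', 'B', 'C']}, index=[10, 20, 30])

# Same index as left: the rows line up one to one.
aligned = pd.DataFrame({'Type': ['x', 'y', 'z']}, index=left.index)
# Default index (0, 1, 2): no labels in common with left.
misaligned = pd.DataFrame({'Type': ['x', 'y', 'z']})

good = left.join(aligned)
bad = left.join(misaligned)
print(good)
print(bad)
```

In the second join, pandas finds no matching labels, so the Type column is filled with missing values instead of raising an error. That silent behavior is exactly why passing index=df.index is a good habit.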
Unpacking the user data
We now do the exact same thing for each piece of user JSON data. We call apply on the user series, running the unpack_user_json function, which takes a JSON user object and transforms it into a list of its fields, which we can then inject into a brand new DataFrame, user_df. After that, we'll join user_df back with df, like we did with campaign_df:
#11
def unpack_user_json(user):
    # very optimistic as well, expects user objects
    # to have all attributes
    user = json.loads(user.strip())
    return [
        user['username'],
        user['email'],
        user['name'],
        user['gender'],
        user['age'],
        user['address'],
    ]

user_data = df['user'].apply(unpack_user_json)
user_cols = [
    'username', 'email', 'name', 'gender', 'age', 'address']
user_df = DataFrame(
    user_data.tolist(), columns=user_cols, index=df.index)
Verysimilartothepreviousoperation,isn'tit?Weshouldalsonoteherethat,whencreatinguser_df,weneedtoinstructDataFrameaboutthecolumnnamesandtheindex.Let'sjoinandtakeaquickpeek:
#12
df=df.join(user_df)
#13
df[['user']+user_cols].head(2)
Theoutputshowsusthateverythingwentwell.We'regood,butwe'renotdoneyet.Ifyoucalldf.columnsinacell,you'llseethatwestillhaveuglynamesforourcolumns.Let'schangethat:
#14
better_columns=[
'Budget','Clicks','Impressions',
'cmp_name','Spent','user',
'Type','Start','End',
'TargetAge','TargetGender','Currency',
'Username','Email','Name',
'Gender','Age','Address',
]
df.columns=better_columns
Good!Now,withtheexceptionof'cmp_name'and'user',weonlyhavenicenames.
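Assigning a whole list to df.columns works by position, so it silently mislabels the data if the columns are not in the order you expect, and that order can depend on how the DataFrame was built and on the pandas version. A more defensive alternative is to rename by name. The following is a minimal sketch on a one-row, made-up frame, assuming pandas is installed; the mapping covers only some of the columns for brevity.

```python
import pandas as pd

# Hypothetical one-row frame using a few of the original column names.
df_demo = pd.DataFrame([{
    'cmp_name': 'X', 'cmp_bgt': 1, 'cmp_spent': 2,
    'cmp_clicks': 3, 'cmp_impr': 4,
}])

# Each old name is mapped explicitly, so column order does not matter.
mapping = {
    'cmp_bgt': 'Budget',
    'cmp_clicks': 'Clicks',
    'cmp_impr': 'Impressions',
    'cmp_spent': 'Spent',
}
renamed = df_demo.rename(columns=mapping)
print(renamed.columns.tolist())
```

Columns that are not in the mapping (cmp_name here) keep their names, and rename returns a new DataFrame rather than modifying df_demo.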
Completing the dataset
The next step is to add some extra columns. For each campaign, we have the numbers of clicks and impressions, and we have the amounts spent. This allows us to introduce three measurement ratios: CTR, CPC, and CPI. They stand for Click Through Rate, Cost Per Click, and Cost Per Impression, respectively.
Thelasttwoarestraightforward,butCTRisnot.Sufficeittosaythatitistheratiobetweenclicksandimpressions.Itgivesyouameasureofhowmanyclickswereperformedonacampaignadvertisementperimpression—thehigherthisnumber,themoresuccessfultheadvertisementisinattractinguserstoclickonit:
#15
def calculate_extra_columns(df):
    # Click Through Rate
    df['CTR'] = df['Clicks'] / df['Impressions']
    # Cost Per Click
    df['CPC'] = df['Spent'] / df['Clicks']
    # Cost Per Impression
    df['CPI'] = df['Spent'] / df['Impressions']

calculate_extra_columns(df)
I wrote this as a function, but I could have just written the code in the cell. It's not important. What I want you to notice here is that we're adding those three columns with one line of code each, but DataFrame applies the operation automatically (the division, in this case) to each pair of cells from the appropriate columns. So, even if they are masked as three divisions, these are actually 5037 * 3 divisions, because they are performed for each row. Pandas does a lot of work for us, and also does a very good job of hiding the complexity of it.
The function, calculate_extra_columns, takes a DataFrame and works directly on it. This mode of operation is called in-place. Do you remember how list.sort() was sorting the list? It is the same deal. You could also say that this function is not pure, which means it has side effects, as it modifies the mutable object it is passed as an argument.
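To make the in-place behavior concrete, here is a small sketch on made-up numbers. The function name calculate_ctr is hypothetical and computes only one of the three ratios.

```python
import pandas as pd


def calculate_ctr(df):
    # Adds a column to the DataFrame it receives (a side effect)
    # and, like list.sort(), returns nothing.
    df['CTR'] = df['Clicks'] / df['Impressions']


df = pd.DataFrame({'Clicks': [10, 50], 'Impressions': [100, 200]})
result = calculate_ctr(df)

print(result)               # None
print(df['CTR'].tolist())   # [0.1, 0.25]
```

The single division line is applied to every row, and the caller's df is changed even though nothing is returned.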
We can take a look at the results by filtering on the relevant columns and calling head:
#16
df[['Spent', 'Clicks', 'Impressions',
    'CTR', 'CPC', 'CPI']].head(3)
This shows us that the calculations were performed correctly on each row:
    Spent  Clicks  Impressions       CTR       CPC       CPI
0   39383   62554       499997  0.125109  0.629584  0.078766
1  210452   36176       500001  0.072352  5.817448  0.420903
2  342507   62299       500001  0.124598  5.497793  0.685013
Now, I want to verify the accuracy of the results manually for the first row:
#17
clicks = df['Clicks'][0]
impressions = df['Impressions'][0]
spent = df['Spent'][0]
CTR = df['CTR'][0]
CPC = df['CPC'][0]
CPI = df['CPI'][0]
print('CTR:', CTR, clicks / impressions)
print('CPC:', CPC, spent / clicks)
print('CPI:', CPI, spent / impressions)
This yields the following output:
CTR: 0.1251087506525039 0.1251087506525039
CPC: 0.6295840393899671 0.6295840393899671
CPI: 0.0787664725988356 0.0787664725988356
This is exactly what we saw in the previous output. Of course, I wouldn't normally need to do this, but I wanted to show you how you can perform calculations this way. You can access a Series (a column) by passing its name to the DataFrame, in square brackets, and then you access each row by its position, exactly as you would with a regular list or tuple (this works here because the default integer index coincides with the row position).
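One detail worth seeing in code: square brackets on a Series look up an index label, which only matches the row position when the index is the default 0, 1, 2, and so on. The sketch below uses a deliberately different index (the labels 10, 20, 30 are made up) to show the difference; .iloc and .loc are the explicit pandas accessors for position and label.

```python
import pandas as pd

# Same click counts as in the output above, but with a non-default index.
clicks = pd.Series([62554, 36176, 62299], index=[10, 20, 30])

first_by_position = clicks.iloc[0]   # first row, whatever its label is
first_by_label = clicks.loc[10]      # the row whose label is 10

# With the default index, label 0 and position 0 are the same row.
default_clicks = pd.Series([62554, 36176, 62299])
first_default = default_clicks[0]

print(first_by_position, first_by_label, first_default)
```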
We're almost done with our DataFrame. All we are missing now is a column that tells us the duration of the campaign and a column that tells us which day of the week corresponds to the start date of each campaign. This allows me to expand on how to play with date objects:
#18
def get_day_of_the_week(day):
    number_to_day = dict(enumerate(calendar.day_name, 1))
    return number_to_day[day.isoweekday()]

def get_duration(row):
    return (row['End'] - row['Start']).days

df['DayofWeek'] = df['Start'].apply(get_day_of_the_week)
df['Duration'] = df.apply(get_duration, axis=1)
We used two different techniques here but first, the code.
get_day_of_the_week takes a date object. If you cannot understand what it does, please take a few moments to try to understand for yourself before reading the explanation. Use the inside-out technique like we've done a few times before.
So, as I'm sure you know by now, if you put calendar.day_name in a list call, you get ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']. This means that, if we enumerate calendar.day_name starting from 1, we get pairs such as (1, 'Monday'), (2, 'Tuesday'), and so on. If we feed these pairs to a dictionary, we get a mapping between the days of the week as numbers (1, 2, 3, ...) and their names. When the mapping is created, in order to get the name of a day, we just need to know its number. To get it, we call date.isoweekday(), which tells us which day of the week that date is (as a number). You feed that into the mapping and, boom! You have the name of the day.
get_duration is interesting as well. First, notice it takes an entire row, not just a single value. What happens in its body is that we perform a subtraction between a campaign's end and start dates. When you subtract date objects, the result is a timedelta object, which represents a given amount of time. We take the value of its .days property. It is as simple as that.
Now, we can introduce the fun part, the application of those two functions.
The first application is performed on a Series object, like we did before for 'user' and 'cmp_name'; there is nothing new here.
The second one is applied to the whole DataFrame and, in order to instruct pandas to perform that operation on the rows, we pass axis=1.
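The two building blocks can be checked without pandas at all. The following sketch uses only the standard library; the two dates are the start and end of the first campaign shown in this chapter's output.

```python
import calendar
from datetime import date

# Mapping from ISO weekday numbers (1 = Monday ... 7 = Sunday) to names.
number_to_day = dict(enumerate(calendar.day_name, 1))

start = date(2019, 3, 24)
end = date(2020, 11, 6)

day_name = number_to_day[start.isoweekday()]
delta = end - start          # subtracting dates gives a timedelta

print(day_name)                              # Sunday
print(type(delta).__name__, delta.days)      # timedelta 593
```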
We can verify the results very easily, as shown here:
#19
df[['Start', 'End', 'Duration', 'DayofWeek']].head(3)
The preceding code yields the following output:
       Start         End  Duration DayofWeek
0 2019-03-24  2020-11-06       593    Sunday
1 2017-05-21  2018-07-24       429    Sunday
2 2017-12-18  2018-02-08        52    Monday
So, we now know that between the 24th of March, 2019 and the 6th of November, 2020 there are 593 days, and that the 24th of March, 2019 is a Sunday.
If you're wondering what the purpose of this is, I'll provide an example. Imagine that you have a campaign that is tied to a sports event that usually takes place on a Sunday. You may want to inspect your data according to the days so that you can correlate them to the various measurements you have. We're not going to do it in this project, but it was useful to see, if only for the different way of calling apply() on a DataFrame.
Cleaning everything up
Now that we have everything we want, it's time to do the final cleaning; remember we still have the 'cmp_name' and 'user' columns. Those are useless now, so they have to go. Also, I want to reorder the columns in the DataFrame so that it is more relevant to the data it now contains. In order to do this, we just need to filter df on the column list we want. We'll get back a brand new DataFrame that we can reassign to df itself:
#20
final_columns = [
    'Type', 'Start', 'End', 'Duration', 'DayofWeek', 'Budget',
    'Currency', 'Clicks', 'Impressions', 'Spent', 'CTR', 'CPC',
    'CPI', 'TargetAge', 'TargetGender', 'Username', 'Email',
    'Name', 'Gender', 'Age'
]
df = df[final_columns]
I have grouped the campaign information at the beginning, then the measurements, and finally the user data at the end. Now our DataFrame is clean and ready for us to inspect.
Before we start going crazy with graphs, what about taking a snapshot of the DataFrame so that we can easily reconstruct it from a file, rather than having to redo all the steps we did to get here? Some analysts may want to have it in spreadsheet form, to do a different kind of analysis than the one we want to do, so let's see how to save the DataFrame to a file. It's easier done than said.
Saving the DataFrame to a file
We can save a DataFrame in many different ways. You can type df.to_ and then press Tab to make autocompletion pop up, to see all the possible options.
We're going to save the DataFrame in three different formats, just for fun. First, CSV:
#21
df.to_csv('df.csv')
Then JSON:
#22
df.to_json('df.json')
And finally, in an Excel spreadsheet:
#23
df.to_excel('df.xls')
The CSV file looks like this (output truncated):
,Type,Start,End,Duration,DayofWeek,Budget,Currency,Clicks,Im
0,KTR,2019-03-24,2020-11-06,593,Sunday,847110,EUR,62554,499997
1,GRZ,2017-05-21,2018-07-24,429,Sunday,510835,GBP,36176,500001
2,KTR,2017-12-18,2018-02-08,52,Monday,720897,GBP,62299,500001,
And the JSON one looks like this (again, output truncated):
{
  "Age": {
    "0": 29,
    "1": 29,
    "10": 80,
So, it's extremely easy to save a DataFrame in many different formats, and the good news is that the reverse is also true: it's very easy to load a spreadsheet into a DataFrame. The programmers behind pandas went a long way to ease our tasks, something to be grateful for.
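As a quick illustration of the reverse direction, here is a sketch of a CSV round trip. To keep it self-contained it writes to an in-memory buffer rather than to df.csv, and it uses a tiny frame with a few values borrowed from the first two rows above; pd.read_csv with index_col=0 is one way to get the saved index back.

```python
import io

import pandas as pd

# Tiny stand-in for the campaign DataFrame.
df = pd.DataFrame({
    'Type': ['KTR', 'GRZ'],
    'Clicks': [62554, 36176],
    'Impressions': [499997, 500001],
})

# Save to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer)

# Load it back; the first CSV column holds the index that to_csv wrote.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

With a real file, the same two calls would take a path such as 'df.csv' in place of the buffer.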
Visualizing the results
Finally, the juicy bits. In this section, we're going to visualize some results. From a data science perspective, I'm not very interested in going deep into analysis, especially because the data is completely random, but still, this code will get you started with graphs and other features.
Something I learned in my life, and this may come as a surprise to you, is that looks also count, so it's very important that when you present your results, you do your best to make them pretty.
First, we tell the Notebook to render graphs in the cell output frame, which is convenient. We do it with the following:
#24
%matplotlib inline
Then, we proceed with some styling:
#25
import matplotlib.pyplot as plt
plt.style.use(['classic', 'ggplot'])
import pylab
pylab.rcParams.update({'font.family': 'serif'})
Its purpose is to make the graphs we will look at in this section a little bit prettier. You can also instruct the Notebook to do this when you start it from the console by passing a parameter, but I wanted to show you this way too since it can be annoying to have to restart the Notebook just because you want to plot something. In this way, you can do it on the fly and then keep working.
We also use pylab to set the font.family to serif. This might not be necessary on your system. Try to comment it out and execute the Notebook, and see whether anything changes.
Now that the DataFrame is complete, let's run df.describe() (#26) again. The results should look something like this:
This kind of quick result is perfect for satisfying those managers who have 20 seconds to dedicate to you and just want rough numbers.
Once again, please keep in mind that our campaigns have different currencies, so these numbers are actually meaningless. The point here is to demonstrate the DataFrame capabilities, not to get to a correct or detailed analysis of real data.
Alternatively, a graph is usually much better than a table with numbers because it's much easier to read it and it gives you immediate feedback. So, let's graph out the four pieces of information we have on each campaign: 'Budget', 'Spent', 'Clicks', and 'Impressions':
#27
df[['Budget', 'Spent', 'Clicks', 'Impressions']].hist(
    bins=16, figsize=(16, 6));
We extract those four columns (this will give us another DataFrame made with only those columns) and call the histogram hist() method on it. We give some measurements on the bins and figure sizes, but basically, everything is done automatically.
One important thing: since this instruction is the only one in this cell (which also means, it's the last one), the Notebook will print its result before drawing the graph. To suppress this behavior and have only the graph drawn with no printing, just add a semicolon at the end (you thought I was reminiscing about Java, didn't you?). Here are the graphs:
They are beautiful, aren't they? Did you notice the serif font? How about the meaning of those figures? If you go back and take a look at the way we generate the data, you will see that all these graphs make perfect sense:
Budget is simply a random integer in an interval, therefore we were expecting a uniform distribution, and there we have it; it's practically a constant line.
Spent is a uniform distribution as well, but the high end of its interval is the budget, which is moving. This means we should expect something such as a quadratic hyperbola that decreases to the right. And there it is as well.
Clicks was generated with a triangular distribution with a mode at roughly 20% of the interval size, and you can see that the peak is right there, at about 20% to the left.
Impressions was a Gaussian distribution, which is the one that assumes the famous bell shape. The mean was exactly in the middle and we had a standard deviation of 2. You can see that the graph matches those parameters.
Good! Let's plot out the measures we calculated:
#28
df[['CTR', 'CPC', 'CPI']].hist(
    bins=20, figsize=(16, 6))
Here is the plot representation:
We can see that the CPC values are heavily concentrated on the left of the graph, meaning that most of them are very low, with a long tail stretching to the right (in statistical terms, the distribution is right-skewed). The CPI has a similar shape, but is less extreme.
Now, all this is nice, but if you wanted to analyze only a particular segment of the data, how would you do it? We can apply a mask to a DataFrame so that we get another one with only the rows that satisfy the mask condition. It's like applying a global, row-wise if clause:
#29
mask = (df.Spent > 0.75 * df.Budget)
df[mask][['Budget', 'Spent', 'Clicks', 'Impressions']].hist(
    bins=15, figsize=(16, 6), color='g');
In this case, I prepared mask to filter out all the rows for which the amount spent is less than or equal to 75% of the budget. In other words, we'll include only those campaigns for which we have spent more than three-quarters of the budget. Notice that in mask, I am showing you an alternative way of asking for a DataFrame column, by using direct property access (object.property_name), instead of dictionary-like access (object['property_name']). If property_name is a valid Python name (and doesn't clash with an existing DataFrame attribute or method), you can use both ways interchangeably (JavaScript works like this as well).
mask is applied in the same way that we access a dictionary with a key. When you apply mask to a DataFrame, you get back another one and we select only the relevant columns on this and call hist() again. This time, just for fun, we want the results to be green:
Note that the shapes of the graphs haven't changed much, apart from the Spent graph, which is quite different. The reason for this is that we've asked only for the rows where the amount spent is more than 75% of the budget. This means that we're including only the rows where the amount spent is close to the budget. The budget numbers come from a uniform distribution. Therefore, it is quite obvious that the Spent graph is now assuming that kind of shape. If you make the boundary even tighter and ask for 85% or more, you'll see the Spent graph become more and more like the Budget one.
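It may help to look at what a mask actually is before plotting anything. In this sketch (budgets and amounts are made up), the comparison produces one Boolean per row, and indexing the DataFrame with it keeps only the rows marked True.

```python
import pandas as pd

df = pd.DataFrame({
    'Budget': [1000, 1000, 1000, 1000],
    'Spent': [100, 750, 800, 990],
})

# One True/False per row: is Spent strictly greater than 75% of Budget?
mask = df.Spent > 0.75 * df.Budget
filtered = df[mask]

print(mask.tolist())               # [False, False, True, True]
print(filtered['Spent'].tolist())  # [800, 990]
```

Note that the row with Spent equal to exactly 75% of the budget is excluded, because the comparison is strict.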
Let's now ask for something different. How about the measure of 'Spent', 'Clicks', and 'Impressions' grouped by day of the week:
#30
df_weekday = df.groupby(['DayofWeek']).sum()
df_weekday[['Impressions', 'Spent', 'Clicks']].plot(
    figsize=(16, 6), subplots=True);
The first line creates a new DataFrame, df_weekday, by asking for a grouping by 'DayofWeek' on df. The function used to aggregate the data is a sum.
The second line gets a slice of df_weekday using a list of column names, something we're accustomed to by now. On the result, we call plot(), which is a bit different to hist(). The subplots=True option makes plot draw three independent graphs:
Interestingly enough, we can see that most of the action happens on Sundays and Wednesdays. If this were meaningful data, this would potentially be important information to give to our clients, which is why I'm showing you this example.
Note that the days are sorted alphabetically, which scrambles them up a bit. Can you think of a quick solution that would fix the issue? I'll leave it to you as an exercise to come up with something.
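If you want to compare notes after trying the exercise, here is one possible approach among several, shown on made-up data: group and sum as before, then reindex the result using the order of calendar.day_name. Selecting the numeric columns before summing is a choice made for this sketch, so that text and date columns never reach sum().

```python
import calendar

import pandas as pd

df = pd.DataFrame({
    'DayofWeek': ['Sunday', 'Monday', 'Sunday', 'Wednesday'],
    'Clicks': [10, 20, 30, 40],
    'Spent': [1, 2, 3, 4],
})

# Group and sum only the numeric columns we care about.
df_weekday = df.groupby('DayofWeek')[['Clicks', 'Spent']].sum()

# Put the rows in calendar order, keeping only days present in the data.
ordered_days = [day for day in calendar.day_name if day in df_weekday.index]
df_weekday = df_weekday.reindex(ordered_days)
print(df_weekday)
```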
Let's finish this presentation section with a couple more things. First, a simple aggregation. We want to aggregate on 'TargetGender' and 'TargetAge', and show 'Impressions' and 'Spent'. For both, we want to see 'mean' and the standard deviation ('std'):
#31
agg_config = {
    'Impressions': ['mean', 'std'],
    'Spent': ['mean', 'std'],
}
df.groupby(['TargetGender', 'TargetAge']).agg(agg_config)
It's very easy to do. We will prepare a dictionary that we'll use as a configuration. Then, we perform a grouping on the 'TargetGender' and 'TargetAge' columns, and we pass our configuration dictionary to the agg() method. The result is truncated and rearranged a little bit to make it fit, and shown here:
                          Impressions                    Spent
                                 mean       std           mean
TargetGender TargetAge
B            20-25      499999.741573  1.904111  218917.000000
             20-30      499999.618421  2.039393  237180.644737
             20-35      499999.358025  2.039048  256378.641975
...          ...                  ...       ...            ...
M            20-25      499999.355263  2.108421  277232.276316
             20-30      499999.635294  2.075062  252140.117647
             20-35      499999.835821  1.871614  308598.149254
This is the textual representation, of course, but you can also have the HTML one.
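The result of agg() with a dictionary like this has two levels of column labels, which is easier to appreciate on a frame small enough to check by hand. The numbers below are made up; only the shape of the configuration mirrors the one above.

```python
import pandas as pd

df = pd.DataFrame({
    'TargetGender': ['B', 'B', 'M', 'M'],
    'TargetAge': ['20-25', '20-25', '20-30', '20-30'],
    'Impressions': [10, 20, 30, 50],
    'Spent': [1.0, 3.0, 5.0, 9.0],
})

agg_config = {
    'Impressions': ['mean', 'std'],
    'Spent': ['mean', 'std'],
}
result = df.groupby(['TargetGender', 'TargetAge']).agg(agg_config)

# Rows are addressed by (gender, age), columns by (measure, statistic).
b_mean = result.loc[('B', '20-25'), ('Impressions', 'mean')]
m_spent = result.loc[('M', '20-30'), ('Spent', 'mean')]
print(result)
print(b_mean, m_spent)   # 15.0 7.0
```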
Let's do one more thing before we wrap this chapter up. I want to show you something called a pivot table. It's kind of a buzzword in the data environment, so an example such as this one, albeit very simple, is a must:
#32
pivot = df.pivot_table(
    values=['Impressions', 'Clicks', 'Spent'],
    index=['TargetAge'],
    columns=['TargetGender'],
    aggfunc=np.sum
)
pivot
We create a pivot table that shows us the relationship between 'TargetAge' and 'Impressions', 'Clicks', and 'Spent'. These last three will be subdivided according to 'TargetGender'. The aggregation function (aggfunc) used to calculate the results is the numpy.sum function (numpy.mean would be the default, had I not specified anything).
After creating the pivot table, we simply print it with the last line in the cell, and here's a crop of the result:
It's pretty clear and provides very useful information when the data is meaningful.
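Here is the same kind of pivot on a handful of made-up rows, so that every cell can be verified by eye. As an assumption of this sketch, the aggregation is given as the string 'sum', which pandas also accepts for aggfunc, instead of np.sum.

```python
import pandas as pd

df = pd.DataFrame({
    'TargetAge': ['20-25', '20-25', '20-25', '20-30', '20-30'],
    'TargetGender': ['F', 'F', 'M', 'F', 'M'],
    'Clicks': [1, 5, 2, 3, 4],
    'Spent': [10, 50, 20, 30, 40],
})

pivot = df.pivot_table(
    values=['Clicks', 'Spent'],
    index=['TargetAge'],
    columns=['TargetGender'],
    aggfunc='sum',
)

# Columns are (value, gender) pairs; two 'F' rows for 20-25 are summed.
clicks_f = pivot.loc['20-25', ('Clicks', 'F')]
spent_m = pivot.loc['20-30', ('Spent', 'M')]
print(pivot)
print(clicks_f, spent_m)   # 6 40
```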
That's it! I'll leave you to discover more about the wonderful world of IPython, Jupyter, and data science. I strongly encourage you to get comfortable with the Notebook environment. It's much better than a console, it's extremely practical and fun to use, and you can even create slides and documents with it.
Where do we go from here?
Data science is indeed a fascinating subject. As I said in the introduction, those who want to delve into its meanders need to be well-trained in mathematics and statistics. Working with data that has been interpolated incorrectly renders any result about it useless. The same goes for data that has been extrapolated incorrectly or sampled with the wrong frequency. To give you an example, imagine a population of individuals that are aligned in a queue. If for some reason, the gender of that population alternated between male and female, the queue would be something like this: F-M-F-M-F-M-F-M-F...
If you sampled it taking only the even elements, you would draw the conclusion that the population was made up only of males, while sampling the odd ones would tell you exactly the opposite.
Of course, this was just a silly example, I know, but it's very easy to make mistakes in this field, especially when dealing with big data where sampling is mandatory and therefore, the quality of the introspection you make depends, first and foremost, on the quality of the sampling itself.
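The queue example takes only a few lines to reproduce. In this sketch the queue has ten people (an arbitrary length), and positions are counted from 1 as in the text, so the even positions are the slice starting at index 1.

```python
# An alternating queue: F-M-F-M-...
queue = ['F', 'M'] * 5

odd_positions = queue[0::2]    # 1st, 3rd, 5th, ... person
even_positions = queue[1::2]   # 2nd, 4th, 6th, ... person

print(set(even_positions))   # {'M'}
print(set(odd_positions))    # {'F'}
print(queue.count('F') / len(queue))   # 0.5, the real proportion
```

Both samples contain half of the population, yet each one gives a completely wrong picture of it.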
When it comes to data science and Python, these are the main tools you want to look at:
NumPy (http://www.numpy.org/): This is the main package for scientific computing with Python. It contains a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, useful linear algebra, the Fourier transform, random number capabilities, and much more.
Scikit-Learn (http://scikit-learn.org/): This is probably the most popular machine learning library in Python. It has simple and efficient tools for data mining and data analysis, accessible to everybody, and reusable in various contexts. It's built on NumPy, SciPy, and Matplotlib.
Pandas (http://pandas.pydata.org/): This is an open source, BSD-licensed library providing high-performance, easy-to-use data structures, and data analysis tools. We've used it throughout this chapter.
IPython (http://ipython.org/) / Jupyter (http://jupyter.org/): These provide a rich architecture for interactive computing.
Matplotlib (http://matplotlib.org/): This is a Python 2D plotting library that produces publication-quality figures in a variety of hard-copy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shell, Jupyter Notebook, web application servers, and four graphical user interface toolkits.
Numba (http://numba.pydata.org/): This gives you the power to speed up your applications with high-performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++, and Fortran, without having to switch languages or Python interpreters.
Bokeh (http://bokeh.pydata.org/): This is a Python-interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets.
Other than these single libraries, you can also find ecosystems, such as SciPy (http://scipy.org/) and the aforementioned Anaconda (https://anaconda.org/), that bundle several different packages in order to give you something that just works in an "out-of-the-box" fashion.
Installing all these tools and their several dependencies is hard on some systems, so I suggest that you try out ecosystems as well to see whether you are comfortable with them. It may be worth it.
Summary
In this chapter, we talked about data science. Rather than attempting to explain anything about this extremely wide subject, we delved into a project. We familiarized ourselves with the Jupyter Notebook, and with different libraries, such as Pandas, Matplotlib, and NumPy.
Of course, having to compress all this information into one single chapter means I could only touch briefly on the subjects I presented. I hope the project we've gone through together has been comprehensive enough to give you an idea of what could potentially be the workflow you might follow when working in this field.
The next chapter is dedicated to web development. So, make sure you have a browser ready and let's go!
Web Development
"Don't believe everything you read on the web."
– Confucius
In this chapter, we're going to work on a website together. By working on a small project, my aim is to open a window for you to take a peek into what web development is, along with the main concepts and tools you should know if you want to be successful with it.
In particular, we are going to explore the following:
The basic concepts around web programming
The Django web framework
Regular expressions
A brief overview of the Flask and Falcon web frameworks
Let's start with the fundamentals.
What is the web?
The World Wide Web (WWW), or simply the web, is a way of accessing information through the use of a medium called the internet. The internet is a huge network of networks, a networking infrastructure. Its purpose is to connect billions of devices together, all around the globe, so that they can communicate with one another. Information travels through the internet in a rich variety of languages, called protocols, that allow different devices to speak the same tongue in order to share content.
The web is an information-sharing model, built on top of the internet, which employs the Hypertext Transfer Protocol (HTTP) as a basis for data communication. The web, therefore, is just one of several different ways information can be exchanged over the internet; email, instant messaging, newsgroups, and so on, all rely on different protocols.
How does the web work?
In a nutshell, HTTP is an asymmetric request-response client-server protocol. An HTTP client sends a request message to an HTTP server. The server, in turn, returns a response message. In other words, HTTP is a pull protocol in which the client pulls information from the server (as opposed to a push protocol, in which the server pushes information down to the client). Take a look at the following diagram:
HTTP is based on TCP/IP (or the Transmission Control Protocol/Internet Protocol), which provides the tools for a reliable communication exchange.
An important feature of the HTTP protocol is that it's stateless. This means that the current request has no knowledge about what happened in previous requests. This is a limitation, but you can browse a website with the illusion of being logged in. Under the covers though, what happens is that, on login, a token of user information is saved (most often on the client side, in special files called cookies) so that each request the user makes carries the means for the server to recognize the user and provide a custom interface by showing their name, keeping their basket populated, and so on.
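The cookie mechanism can be sketched with the standard library's http.cookies module, without any network traffic. Everything specific here is hypothetical: the cookie name sessionid, the token abc123, and the sessions dictionary standing in for whatever storage a real server would use.

```python
from http.cookies import SimpleCookie

# Server side, at login: send the token in a Set-Cookie response header.
response_cookie = SimpleCookie()
response_cookie['sessionid'] = 'abc123'          # hypothetical token
set_cookie_header = response_cookie.output()
print(set_cookie_header)    # Set-Cookie: sessionid=abc123

# Client side: every later request sends the token back in a Cookie header.
cookie_header = 'sessionid=abc123'

# Server side, on a later request: parse the header and look the user up.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header)
token = request_cookie['sessionid'].value

sessions = {'abc123': 'fab'}     # hypothetical server-side session store
user = sessions.get(token)
print(user)                      # fab
```

Each request is still independent; it is only the token travelling with it that lets the server recognize the user.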
Even though it's very interesting, we're not going to delve into the rich details of HTTP and how it works. However, we're going to write a small website, which means we'll have to write the code to handle HTTP requests and return HTTP responses. I won't keep prepending HTTP to the terms request and response from now on, as I trust there won't be any confusion.
The Django web framework
For our project, we're going to use one of the most popular web frameworks you can find in the Python ecosystem: Django.
A web framework is a set of tools (libraries, functions, classes, and so on) that we can use to code a website. We need to decide what kind of requests we want to allow to be issued against our web server and how we respond to them. A web framework is the perfect tool for doing that because it takes care of many things for us so that we can concentrate only on the important bits without having to reinvent the wheel.
There are different types of frameworks. Not all of them are designed for writing code for the web. In general, a framework is a tool that provides functionalities to facilitate the development of software applications, products, and solutions.
Django design philosophy
Django is designed according to the following principles:
Don't repeat yourself (DRY): Don't repeat code, and code in a way that makes the framework deduce as much as possible from as little as possible.
Loose coupling: The various layers of the framework shouldn't know about each other (unless absolutely necessary for whatever reason). Loose coupling works best when paralleled with high cohesion, which means putting together things that change for the same reason, and spreading apart those that change for different reasons.
Less code: Applications should use the least possible amount of code, and be written in a way that favors reuse as much as possible.
Consistency: When using the Django framework, regardless of which layer you're coding against, your experience will be very consistent with the design patterns and paradigms that were chosen to lay out the project.
The framework itself is designed around the model-template-view (MTV) pattern, which is a variant of model-view-controller (MVC), which is widely employed by other frameworks. The purpose of such patterns is to separate concerns and promote code reuse and quality.
The model layer
Of the three layers, this is the one that defines the structure of the data that is handled by the application, and deals with data sources. A model is a class that represents a data structure. Through some Django magic, models are mapped to database tables so that you can store your data in a relational database.
A relational database stores data in tables in which each column is a property of the data and each row represents a single item or entry in the collection represented by that table. Through the primary key of each table, which is that part of the data that allows it to uniquely identify each item, it is possible to establish relationships between items belonging to different tables, that is, to put them into relation.
The beauty of this system is that you don't have to write database-specific code in order to handle your data. You just have to configure your models correctly and use them. The work on the database is done for you by the Django object-relational mapping (ORM), which takes care of translating operations done on Python objects into a language that a relational database can understand: SQL (or Structured Query Language). We saw an example of ORM in Chapter 7, Files and Data Persistence, where we explored SQLAlchemy.
One benefit of this approach is that you will be able to change databases without rewriting your code, since all the database-specific code is produced by Django on the fly, according to which database it's connected to. Relational databases speak SQL, but each of them has its own unique flavor of it; therefore, not having to hardcode any SQL in our application is a tremendous advantage.
Django allows you to modify your models at any time. When you do, you can run a command that creates a migration, which is the set of instructions needed to bring the database into a state that represents the current definition of your models.
To summarize, this layer deals with defining the data structures you need to handle in your website and gives you the means to save them to and load them from the database by simply accessing the models, which are Python objects.
The view layer
The function of a view is handling a request, performing whatever action needs to be carried out, and eventually returning a response. For example, if you open your browser and request a page corresponding to a category of products in an e-commerce shop, the view will likely talk to the database, asking for all the categories that are children of the selected category (for example, to display them in a navigation sidebar) and for all the products that belong to the selected category, in order to display them on the page.
Therefore, the view is the mechanism through which we can fulfill a request. Its result, the response object, can assume several different forms: a JSON payload, text, an HTML page, and so on. When you code a website, your responses usually consist of HTML or JSON.
The Hypertext Markup Language, or HTML, is the standard markup language used to create web pages. Web browsers run engines that are capable of interpreting HTML code and rendering it into what we see when we open a page of a website.
The template layer
This is the layer that provides the bridge between backend and frontend development. When a view has to return HTML, it usually does it by preparing a context object (a dictionary) with some data, and then it feeds this context to a template, which is rendered (that is to say, transformed into HTML), and returned to the caller in the form of a response (more precisely, the body of the response). This mechanism allows for maximum code reuse. If you go back to the category example, it's easy to see that, if you browse a website that sells products, it doesn't really matter which category you click on or what type of search you perform, the layout of the products page doesn't change. What does change is the data with which that page is populated.
Therefore, the layout of the page is defined by a template, which is written in a mixture of HTML and the Django template language. The view that serves that page collects all the products to be displayed in the context dictionary, and feeds it to the template, which will be rendered into an HTML page by the Django template engine.
The Django URL dispatcher
The way Django associates a Uniform Resource Locator (URL) with a view is by matching the requested URL with the patterns that are registered in a special file. A URL represents a page in a website so http://mysite.com/categories?id=123 would probably point to the page for the category with ID 123 on my website, while https://mysite.com/login would probably be the user login page.
The difference between HTTP and HTTPS is that the latter adds encryption to the protocol so that the data that you exchange with the website is secured. When you put your credit card details on a website, or log in anywhere, or do anything around sensitive data, you want to make sure that you're using HTTPS.
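To see the parts of a URL that such matching can work with, the example address above can be taken apart with the standard library's urllib.parse module. This is just an illustration of the URL's structure, not a description of how Django's dispatcher is implemented.

```python
from urllib.parse import parse_qs, urlparse

url = 'http://mysite.com/categories?id=123'
parts = urlparse(url)

print(parts.scheme)            # http
print(parts.netloc)            # mysite.com
print(parts.path)              # /categories
print(parse_qs(parts.query))   # {'id': ['123']}
```

The scheme is where the HTTP versus HTTPS distinction shows up, while the path and the query string identify the page being requested.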
Regular expressions
The way Django matches URLs to patterns is through a regular expression. A regular expression is a sequence of characters that defines a search pattern with which we can carry out operations, such as pattern and string matching, and find/replace.
Regular expressions have a special syntax to indicate things such as digits, letters, and spaces, as well as how many times we expect a character to appear, and much more. A complete explanation of this topic is outside the scope of this book. However, it is a very important subject, so the project we're going to work on together will revolve around it, in the hope that you will be stimulated to find the time to explore it a bit more on your own.
To give you a quick example, imagine that you wanted to specify a pattern to match a date, such as "26-12-1947". This string consists of two digits, one dash, two digits, one dash, and finally four digits. Therefore, we could write it like this: r'[0-9]{2}-[0-9]{2}-[0-9]{4}'. We created a character class by using square brackets, and we defined a range of digits inside, from 0 to 9, hence all the possible digits. Then, between curly brackets, we say that we expect two of them. Then a dash, then we repeat this pattern once as it is, and once more, by changing how many digits we expect, and without the final dash. Having a class such as [0-9] is such a common pattern that a special notation has been created as a shortcut: '\d'. Therefore, we can rewrite the pattern like this: r'\d{2}-\d{2}-\d{4}' and it will work exactly the same. That r in front of the string stands for raw, and its purpose is to prevent Python from trying to interpret backslash escape sequences, so that they can be passed as-is to the regular expression engine.
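Both spellings of the date pattern can be tried out with the standard library's re module. The two non-matching test strings are made up for contrast, and fullmatch is used here so that the whole string has to fit the pattern.

```python
import re

verbose = re.compile(r'[0-9]{2}-[0-9]{2}-[0-9]{4}')
short = re.compile(r'\d{2}-\d{2}-\d{4}')

results = {}
for text in ('26-12-1947', '1947-12-26', '26/12/1947'):
    # fullmatch returns a match object or None; bool() turns it into a flag.
    results[text] = (bool(verbose.fullmatch(text)),
                     bool(short.fullmatch(text)))
    print(text, results[text])
```

Only the first string matches, with either pattern. One nuance for the curious: on Python 3 strings, \d also accepts digits from other scripts, so the two patterns are equivalent for plain ASCII input like this.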
A regex website
So, here we are. We'll code a website that stores regular expressions so that we'll be able to play with them a little bit.
Before we proceed with creating the project, I'd like to talk about Cascading Style Sheets (CSS). CSS are files in which we specify how the various elements on an HTML page look. You can set all sorts of properties, such as shape, size, color, margins, borders, and fonts. In this project, I have tried my best to achieve a decent result on the pages, but I'm neither a frontend developer nor a designer, so please don't pay too much attention to how things look. Try to focus on how they work.
Setting up Django
On the Django website (https://www.djangoproject.com/), you can follow the tutorial, which gives you a pretty good idea of Django's capabilities. If you want, you can follow that tutorial first and then come back to this example. So, first things first; let's install Django in your virtual environment (you will find it is already installed, as it is part of the requirements file):
$ pip install django
When this command is done, you can test it within a console (try doing it with bpython, it gives you a shell similar to IPython but with nice introspection capabilities):
>>> import django
>>> django.VERSION
(2, 0, 5, 'final', 0)
Now that Django is installed, we're good to go. We'll have to do some scaffolding, so I'll quickly guide you through that.
Starting the project
Choose a folder in the book's environment and change into that. I'll use ch14. From there, we can start a Django project with the following command:
$ django-admin startproject regex
This will prepare the skeleton for a Django project called regex. Change into the regex folder and run the following:
$ python manage.py runserver
You should be able to go to http://127.0.0.1:8000/ with your browser and see the It worked! default Django page. This means that the project is correctly set up. When you've seen the page, kill the server with Ctrl + C (or whatever it says in the console). I'll paste the final structure for the project now so that you can use it as a reference:
$ tree -A regex  # from the ch14 folder
regex
├── entries
│   ├── __init__.py
│   ├── admin.py
│   ├── forms.py
│   ├── migrations
│   │   ├── 0001_initial.py
│   │   └── __init__.py
│   ├── models.py
│   ├── static
│   │   └── entries
│   │       └── css
│   │           └── main.css
│   ├── templates
│   │   └── entries
│   │       ├── base.html
│   │       ├── footer.html
│   │       ├── home.html
│   │       ├── insert.html
│   │       └── list.html
│   └── views.py
├── manage.py
└── regex
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py
Don't worry if you're missing files, we'll get there. A Django project is typically a collection of several different applications. Each application is meant to provide a functionality in a self-contained, reusable fashion. We'll create just one, called entries:
$ python manage.py startapp entries
Within the entries folder that has been created, you can get rid of the tests.py module.
Now, let's fix the regex/settings.py file in the regex folder. We need to add our application to the INSTALLED_APPS list so that we can use it (add it at the bottom of the list):
INSTALLED_APPS = [
    'django.contrib.admin',
    ...
    'entries',
]
Then, you may want to fix the language and time zone according to your personal preference. I live in London, so I set them like this:
LANGUAGE_CODE = 'en-gb'
TIME_ZONE = 'Europe/London'
There is nothing else to do in this file, so you can save and close it.
Now it's time to apply the migrations to the database. Django needs database support to handle users, sessions, and things like that, so we need to create a database and populate it with the necessary data. Luckily, this is very easily done with the following command:
$ python manage.py migrate
For this project, we use an SQLite database, which is basically just a file. On a real project, you would use a different database engine, such as MySQL or PostgreSQL.
Creating users
Now that we have a database, we can create a superuser using the console:
$ python manage.py createsuperuser
After entering the username and other details, we have a user with admin privileges. This is enough to access the Django admin section, so try to start the server:
$ python manage.py runserver
This will start the Django development server, which is a very useful built-in web server that you can use while working with Django. Now that the server is running, we can access the admin page at http://localhost:8000/admin/. I will show you a screenshot of this section later. If you log in with the credentials of the user you just created and head to the Authentication and Authorization section, you'll find Users. Open that and you will be able to see the list of users. You can edit the details of any user you want as an admin. In our case, make sure you create a different one so that there are at least two users in the system (we'll need them later). I'll call the first user Fabrizio (username: fab) and the second one Adriano (username: adri), in honor of my father.
By the way, you should see that the Django admin panel comes for free automatically. You define your models, hook them up, and that's it. This is an incredible tool that shows how advanced Django's introspection capabilities are. Moreover, it is completely customizable and extendable. It's truly an excellent piece of work.
Adding the Entry model
Now that the boilerplate is out of the way, and we have a couple of users, we're ready to code. We start by adding the Entry model to our application so that we can store objects in the database. Here's the code you'll need to add (remember to use the project tree for reference):
# entries/models.py
from django.db import models
from django.contrib.auth.models import User
from django.utils import timezone

class Entry(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    pattern = models.CharField(max_length=255)
    test_string = models.CharField(max_length=255)
    date_added = models.DateTimeField(default=timezone.now)

    class Meta:
        verbose_name_plural = 'entries'
This is the model we'll use to store regular expressions in our system. We'll store a pattern, a test string, a reference to the user who created the entry, and the moment of creation. You can see that creating a model is actually quite easy, but nonetheless, let's go through it line by line.
First we need to import the models module from django.db. This will give us the base class for our Entry model. Django models are special classes and much is done for us behind the scenes when we inherit from models.Model.
We want a reference to the user who created the entry, so we need to import the User model from Django's authentication application (django.contrib.auth). We also need to import the timezone module to get access to the timezone.now() function, which provides us with a timezone-aware version of datetime.now(). The beauty of this is that it's hooked up with the TIME_ZONE settings I showed you before.
As for the primary key for this class, if we don't set one explicitly, Django will add one for us. A primary key is a key that allows us to uniquely identify an Entry object in the database (in this case, Django will add an auto-incrementing integer ID).
So, we define our class, and we set up four class attributes. We have a ForeignKey attribute that is our reference to the User model. We also have two CharField attributes that hold the pattern and test strings for our regular expressions. We also have a DateTimeField, whose default value is set to timezone.now. Note that we don't call timezone.now right there: it's now, not now(). So, we're not passing a datetime instance (set at the moment in time when that line is executed); rather, we're passing a callable, a function that Django calls each time a new entry is created. This is similar to the callback mechanism we used in Chapter 12, GUIs and Scripts, when we were assigning commands to button clicks.
The last two lines are very interesting. We define a Meta class within the Entry class itself. The Meta class is used by Django to provide all sorts of extra information for a model. Django has a great deal of logic under the hood to adapt its behavior according to the information we put into the Meta class. In this case, in the admin panel, the pluralized version of Entry would be Entrys, which is wrong, therefore we need to set it manually. We specify the plural in all lowercase, as Django takes care of capitalizing it for us when needed.
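The difference between passing timezone.now and calling timezone.now() can be sketched in plain Python, without Django. This is only an illustration under stated assumptions: fake_now and resolve_default are hypothetical names, and a counter stands in for the clock so that the output is predictable.

```python
import itertools

_ticks = itertools.count(1)


def fake_now():
    """Hypothetical stand-in for timezone.now: a new value on every call."""
    return next(_ticks)


def resolve_default(default):
    """Call the default if it is callable, otherwise use it as a plain value."""
    return default() if callable(default) else default


frozen = fake_now()               # called once, when this line is executed
print(resolve_default(frozen))    # the same frozen value...
print(resolve_default(frozen))    # ...every single time
print(resolve_default(fake_now))  # a fresh value, computed on demand
print(resolve_default(fake_now))  # and another fresh one
```

Passing the function itself postpones the call until a value is actually needed, which is what we want for a creation timestamp.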
Now that we have a new model, we need to update the database to reflect the new state of the code. In order to do this, we need to instruct Django that it needs to create the code to update the database. This code is called a migration. Let's create it and execute it:
$ python manage.py makemigrations entries
$ python manage.py migrate
After these two instructions, the database will be ready to store Entry objects.
There are two different kinds of migrations: data and schema migrations. Data migrations port data from one state to another without altering its structure. For example, a data migration could set all products for a category as out of stock by switching a flag to False or 0. A schema migration is a set of instructions that alter the structure of the database schema. For example, that could be adding an age column to a Person table, or increasing the maximum length of a field to account for very long addresses. When developing with Django, it's quite common to have to perform both kinds of migrations over the course of development. Data evolves continuously, especially if you code in an agile environment.
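To make the distinction concrete, here is a small sketch that uses the standard library sqlite3 module directly rather than Django's migration framework. The product table, its columns, and the rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (name TEXT, category TEXT, in_stock INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("pen", "office", 1), ("desk", "office", 1), ("kite", "toys", 1)],
)

# Schema migration: the structure changes (a new column appears).
conn.execute("ALTER TABLE product ADD COLUMN price REAL")

# Data migration: the structure is untouched, only the rows change.
conn.execute("UPDATE product SET in_stock = 0 WHERE category = 'office'")

# PRAGMA table_info yields one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(product)")]
stock = dict(conn.execute("SELECT name, in_stock FROM product"))
print(columns)
print(stock)
```

In a Django project you would normally not write this SQL by hand: makemigrations generates the schema changes from the models.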
Customizing the admin panel
The next step is to hook the Entry model up with the admin panel. You can do it with one line of code, but in this case, I want to add some options to customize the way the admin panel shows the entries, both in the list view of all entry items in the database and in the form view that allows us to create and modify them.
All we need to do is to add the following code:
# entries/admin.py
from django.contrib import admin
from .models import Entry

@admin.register(Entry)
class EntryAdmin(admin.ModelAdmin):
    fieldsets = [
        ('Regular Expression',
         {'fields': ['pattern', 'test_string']}),
        ('Other Information',
         {'fields': ['user', 'date_added']}),
    ]
    list_display = ('pattern', 'test_string', 'user')
    list_filter = ['user']
    search_fields = ['test_string']
This is simply beautiful. My guess is that you probably already understand most of it, even if you're new to Django.
So, we start by importing the admin module and the Entry model. Because we want to foster code reuse, we import the Entry model using a relative import (there's a dot before models). This will allow us to move or rename the application without too much trouble. Then, we define the EntryAdmin class, which inherits from admin.ModelAdmin. The decoration on the class tells Django to display the Entry model in the admin panel, and what we put in the EntryAdmin class tells Django how to customize the way it handles this model.
First, we specify the fieldsets for the create/edit page. This will divide the page into two sections so that we get a better visualization of the content (pattern and test string) and the other details (user and timestamp) separately.
Then, we customize the way the list page displays the results. We want to see all the fields, but not the date. We also want to be able to filter on the user so that we can have a list of all the entries by just one user, and we want to be able to search on test_string.
I will go ahead and add three entries, one for myself and two on behalf of my father. The result is shown in the next two screenshots. After inserting them, the list page looks like this:
I have highlighted the three parts of this view that we customized in the EntryAdmin class. We can filter by user, we can search, and we have all the fields displayed. If you click on a pattern, the edit view opens up.
After our customization, it looks like this:
Notice how we have two sections: Regular Expression and Other Information, thanks to our custom EntryAdmin class. Have a go with it, add some entries to a couple of different users, get familiar with the interface. Isn't it nice to have all this for free?
Creating the form
Every time you fill in your details on a web page, you're inserting data in form fields. A form is a part of the HTML Document Object Model (DOM) tree. In HTML, you create a form by using the form tag. When you click on the submit button, your browser normally packs the form data together and puts it in the body of a POST request. As opposed to GET requests, which are used to ask the web server for a resource, a POST request normally sends data to the web server with the aim of creating or updating a resource. For this reason, handling POST requests usually requires more care than GET requests.
When the server receives data from a POST request, that data needs to be validated. Moreover, the server needs to employ security mechanisms to protect against various types of attacks. One attack that is very dangerous is the cross-site request forgery (CSRF) attack, which happens when data is sent from a domain that is not the one the user is authenticated on. Django allows you to handle this issue in a very elegant way.
So, instead of being lazy and using the Django admin to create the entries, I'm going to show you how to do it using a Django form. By using the tools the framework gives you, you get a very good degree of validation work already done (in fact, we won't need to add any custom validation ourselves).
There are two kinds of form classes in Django: Form and ModelForm. You use the former to create a form whose shape and behavior depend on how you code the class, what fields you add, and so on. On the other hand, the latter is a type of form that, albeit still customizable, infers fields and behavior from a model. Since we need a form for the Entry model, we'll use that one:
# entries/forms.py
from django.forms import ModelForm
from .models import Entry

class EntryForm(ModelForm):
    class Meta:
        model = Entry
        fields = ['pattern', 'test_string']
Amazingly enough, this is all we have to do to have a form that we can put on a page. The only notable thing here is that we restrict the fields to only pattern and test_string. Only logged-in users will be allowed access to the insert page, and therefore we don't need to ask who the user is, we already know that. As for the date, when we save an Entry, the date_added field will be set according to its default, therefore we don't need to specify that as well. We'll see in the view how to feed the user information to the form before saving. So, now that the background work is done, all we need is the views and the templates. Let's start with the views.
Writing the views
We need to write three views. We need one for the home page, one to display the list of all entries for a user, and one to create a new entry. We also need views to log in and log out. But thanks to Django, we don't need to write them. I'll paste the code in steps:
# entries/views.py
import re
from django.contrib.auth.decorators import login_required
from django.contrib.messages.views import SuccessMessageMixin
from django.urls import reverse_lazy
from django.utils.decorators import method_decorator
from django.views.generic import FormView, TemplateView
from .forms import EntryForm
from .models import Entry
Let's start with the imports. We need the re module to handle regular expressions, then we need a few classes and functions from Django, and finally, we need the Entry model and the EntryForm form.
The home view
The first view is HomeView:
# entries/views.py
class HomeView(TemplateView):
    template_name = 'entries/home.html'

    @method_decorator(
        login_required(login_url=reverse_lazy('login')))
    def get(self, request, *args, **kwargs):
        return super(HomeView, self).get(request, *args, **kwargs)
It inherits from TemplateView, which means that the response will be created by rendering a template with the context we'll create in the view. All we have to do is specify the template_name class attribute to point to the correct template. Django promotes code reuse to a point that if we didn't need to make this view accessible only to logged-in users, the first two lines would have been all we needed.
However, we want this view to be accessible only to logged-in users; therefore, we need to decorate it with login_required. Now, historically views in Django were functions; therefore, this decorator was designed to accept a function, and not a method like we have in this class. We're using Django class-based views in this project so, in order to make things work, we need to transform login_required so that it accepts a method (the difference being in the first argument: self). We do this by passing login_required to method_decorator.
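The idea of adapting a function decorator so that it works on a method can be sketched in plain Python. This is a simplified stand-in and not Django's actual method_decorator: require_user, adapt_for_method, and the dictionary used as a request are all hypothetical.

```python
import functools


def require_user(view):
    """Function decorator: expects the request as the FIRST argument."""
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        if request.get('user') is None:
            return 'redirect to login'
        return view(request, *args, **kwargs)
    return wrapper


def adapt_for_method(decorator):
    """Turn a function decorator into one that can decorate a method."""
    def method_dec(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            # Hide self, so the decorator sees the request first again.
            def bound(*a, **kw):
                return method(self, *a, **kw)
            return decorator(bound)(*args, **kwargs)
        return wrapper
    return method_dec


class View:
    @adapt_for_method(require_user)
    def get(self, request):
        return 'hello ' + request['user']


print(View().get({'user': 'fab'}))
print(View().get({'user': None}))
```

Applying require_user directly to get would not work, because the decorator would receive self where it expects the request.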
We also need to feed the login_required decorator with login_url information, and here comes another wonderful feature of Django. As you'll see after we're done with the views, in Django, you tie a view to a URL through a pattern, consisting of a string which may or may not be a regular expression, and possibly other information. You can give a name to each entry in the urls.py file so that when you want to refer to a URL, you don't have to hardcode its value into your code. All you have to do is get Django to reverse-engineer that URL from the name we gave to the entry in urls.py, defining the URL and the view that is tied to it. This mechanism will become clearer later. For now, just think of reverse('...') as a way of getting a URL from an identifier. In this way, you only write the actual URL once, in the urls.py file, which is brilliant. In the views.py code, we need to use reverse_lazy, which works exactly like reverse with one major difference: it only finds the URL when we actually need it (in a lazy fashion). The reason why reverse_lazy can be so useful is that sometimes it might happen that we need to reverse a URL from an identifier, but at the moment we call reverse, the urls.py module hasn't been loaded yet, which causes a failure. The lazy behavior of reverse_lazy solves the issue because even if the call is made before the urls.py module has been loaded, the actual reversing of the identifier, to get to the related URL, happens in a lazy fashion, later on, when urls.py has surely been loaded.
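The eager versus lazy behavior can be illustrated with a toy model. The URLS registry, the reverse function, and the LazyURL class below are hypothetical stand-ins written for this sketch; they only mimic the idea and are not how Django implements reverse or reverse_lazy.

```python
URLS = {}  # hypothetical registry, filled in "later", like urls.py


def reverse(name):
    """Eager lookup: fails if the registry is not populated yet."""
    return URLS[name]


class LazyURL:
    """Remember the name now, look the URL up only when it is needed."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return reverse(self.name)


# At "import time" the registry is still empty.
login_url = LazyURL('login')      # fine: nothing is looked up yet
try:
    reverse('login')              # eager lookup fails at this point
    eager_failed = False
except KeyError:
    eager_failed = True

URLS['login'] = '/login/'         # the URL configuration gets loaded
print(eager_failed)
print(str(login_url))             # the lookup happens only now
```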
The get method, which we just decorated, simply calls the get method of the parent class. Of course, the get method is the method that Django calls when a GET request is performed against the URL tied to this view.
The entry list view
This view is much more interesting than the previous one:
# entries/views.py
class EntryListView(TemplateView):
    template_name = 'entries/list.html'

    @method_decorator(
        login_required(login_url=reverse_lazy('login')))
    def get(self, request, *args, **kwargs):
        context = self.get_context_data(**kwargs)
        entries = Entry.objects.filter(
            user=request.user).order_by('-date_added')
        matches = (self._parse_entry(entry) for entry in entries)
        context['entries'] = list(zip(entries, matches))
        return self.render_to_response(context)

    def _parse_entry(self, entry):
        match = re.search(entry.pattern, entry.test_string)
        if match is not None:
            return (
                match.group(),
                match.groups() or None,
                match.groupdict() or None
            )
        return None
First of all, we decorate the get method as we did before. Inside of it, we need to prepare a list of Entry objects and feed it to the template, which shows it to the user. In order to do so, we start by getting the context dictionary like we're supposed to do, by calling the get_context_data method of the TemplateView class. Then, we use the ORM to get a list of the entries. We do this by accessing the objects manager, and calling a filter on it. We filter the entries according to which user is logged in, and we ask for them to be sorted in descending order (that '-' in front of the name specifies the descending order). The objects manager is the default manager every Django model is augmented with on creation: it allows us to interact with the database through its methods.
We parse each entry to get a list of matches (actually, I coded it so that matches is a generator expression). Finally, we add to the context an 'entries' key whose value is the coupling of entries and matches, so that each Entry instance is paired with the resulting match of its pattern and test string.
On the last line, we simply ask Django to render the template using the context we created.
Take a look at the _parse_entry method. All it does is perform a search on the entry.test_string with the entry.pattern. If the resulting match object is not None, it means that we found something. If so, we return a tuple with three elements: the overall group, the subgroups, and the group dictionary.
Notice that match.groups() and match.groupdict() might return respectively an empty tuple and an empty dict. In order to normalize empty results to a simpler None, I use a common pattern in Python by exploiting the or operator. A or B, in fact, will return A if A evaluates to a truthy value, or B otherwise. Can you think how this might differ from the behavior of the and operator?
If you're not familiar with those terms, don't worry, you'll see a screenshot soon with an example. We return None if there is no match (which technically is not needed, as Python would do that anyway, but I have included it here for the sake of being explicit).
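To see what the three elements look like in practice, here is the same logic as _parse_entry, rewritten as a standalone function so that it runs without Django. The parse name and the sample patterns are made up for the example.

```python
import re


def parse(pattern, test_string):
    """Standalone version of the _parse_entry logic."""
    match = re.search(pattern, test_string)
    if match is not None:
        return (
            match.group(),              # the overall match
            match.groups() or None,     # () is normalized to None
            match.groupdict() or None,  # {} is normalized to None
        )
    return None


# No groups in the pattern: subgroups and group dict are both None.
print(parse(r'\d+', 'room 101'))
# One named group and one unnamed group.
print(parse(r'(?P<user>\w+)@(\w+)', 'fab@example'))
# No match at all.
print(parse(r'x', 'abc'))
```

The second call returns ('fab@example', ('fab', 'example'), {'user': 'fab'}): named groups appear both in the subgroups tuple and in the group dictionary.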
The form view
Finally, let's examine EntryFormView:
# entries/views.py
class EntryFormView(SuccessMessageMixin, FormView):
    template_name = 'entries/insert.html'
    form_class = EntryForm
    success_url = reverse_lazy('insert')
    success_message = "Entry was created successfully"

    @method_decorator(
        login_required(login_url=reverse_lazy('login')))
    def get(self, request, *args, **kwargs):
        return super(EntryFormView, self).get(
            request, *args, **kwargs)

    @method_decorator(
        login_required(login_url=reverse_lazy('login')))
    def post(self, request, *args, **kwargs):
        return super(EntryFormView, self).post(
            request, *args, **kwargs)

    def form_valid(self, form):
        self._save_with_user(form)
        return super(EntryFormView, self).form_valid(form)

    def _save_with_user(self, form):
        self.object = form.save(commit=False)
        self.object.user = self.request.user
        self.object.save()
This is particularly interesting for a few reasons. First, it shows us a nice example of Python's multiple inheritance. We want to display a message on the page, after having inserted an Entry, so we inherit from SuccessMessageMixin. But we want to handle a form as well, so we also inherit from FormView.
Note that, when you deal with mixins and inheritance, you may have to consider the order in which you specify the base classes in the class declaration, as it will affect how methods are found when going up the inheritance chain to serve a call.
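The effect of the base class order can be shown with a tiny hierarchy. All the class names below are hypothetical and only loosely mirror the roles of a mixin and a view; the point is how the method resolution order (MRO) changes.

```python
class Base:
    def describe(self):
        return ['Base']            # deliberately does not call super()


class FormLike(Base):
    def describe(self):
        return ['FormLike'] + super().describe()


class MessageMixin:
    def describe(self):
        return ['MessageMixin'] + super().describe()


class MixinFirst(MessageMixin, FormLike):
    pass


class MixinLast(FormLike, MessageMixin):
    pass


print([cls.__name__ for cls in MixinFirst.__mro__])
print(MixinFirst().describe())   # the mixin takes part in the call chain
print([cls.__name__ for cls in MixinLast.__mro__])
print(MixinLast().describe())    # the chain stops at Base, mixin skipped
```

With the mixin listed first, its method runs and then delegates to the rest of the chain. With the mixin listed last, Base comes before it in the MRO and, since Base does not call super(), the mixin's method is never reached.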
In order to set up this view correctly, we need to specify a few attributes at the beginning: the template to be rendered, the form class to be used to handle the data from the POST request, the URL we need to redirect the user to in the case of success, and the success message.
Another interesting feature is that this view needs to handle both GET and POST requests. When we land on the form page for the first time, the form is empty, and that is the GET request. On the other hand, when we fill in the form and want to submit the Entry, we make a POST request. You can see that the body of get is conceptually identical to the one in HomeView. Django does everything for us.
The post method is just like get. The only reason we need to code these two methods is so that we can decorate them to require login.
Within the Django form-handling process (in the FormView class), there are a few methods that we can override in order to customize the overall behavior. We need to do it with the form_valid method. This method will be called when the form validation is successful. Its purpose is to save the form so that an Entry object is created out of it, and then stored in the database.
The only problem is that our form is missing the user. We need to intercept that moment in the chain of calls and put the user information in ourselves. This is done by calling the _save_with_user method, which is very simple.
First, we ask Django to save the form with the commit argument set to False. This creates an Entry instance without attempting to save it to the database. Saving it immediately would fail because the user information is not there.
The next line updates the Entry instance (self.object), adding the user information and, on the last line, we can safely save it. The reason I called it object and set it on the instance like that was to follow what the original FormView class does.
We're fiddling with the Django mechanism here, so if we want the whole thing to work, we need to pay attention to when and how we modify its behavior, and make sure we don't alter it incorrectly. For this reason, it's very important to remember to call the form_valid method of the base class (we use super for that) at the end of our own customized version, to make sure that every other action that method usually performs is carried out correctly.
Note how the request is tied to each view instance (self.request) so that we don't need to pass it through when we refactor our logic into methods. Note also that the user information has been added to the request automatically by Django. Finally, the reason why the whole process is split into very small methods like these is so that we can override only those that we need to customize. All this removes the need to write a lot of code.
Now that we have the views covered, let's see how we couple them to the URLs.
Tying up URLs and views
In the urls.py module, we tie each view to a URL. There are many ways of doing this. I chose the simplest one, which works perfectly for the extent of this exercise, but you may want to explore this subject more deeply if you intend to work with Django. This is the core around which the whole website logic will revolve; therefore, you should try to get it down correctly. Note that the urls.py module belongs to the project folder:
# regex/urls.py
from django.contrib import admin
from django.urls import path
from django.contrib.auth import views as auth_views
from django.urls import reverse_lazy
from entries.views import HomeView, EntryListView, EntryFormView

urlpatterns = [
    path('admin/', admin.site.urls),
    path('entries/', EntryListView.as_view(), name='entries'),
    path('entries/insert',
        EntryFormView.as_view(),
        name='insert'),
    path('login/',
        auth_views.login,
        kwargs={'template_name': 'admin/login.html'},
        name='login'),
    path('logout/',
        auth_views.logout,
        kwargs={'next_page': reverse_lazy('home')},
        name='logout'),
    path('', HomeView.as_view(), name='home'),
]
If you are familiar with version 1 of Django, you will notice some differences here, as this project is coded in version 2. As you can see, the magic comes from the path function, which has recently replaced the url function. First, we pass it a path string (also known as a route), then the view, and finally a name, which is what we will use in the reverse and reverse_lazy functions to recover the URL.
Note that, when using class-based views, we have to transform them into functions, which is what path is expecting. To do that, we call the as_view() method on them.
Note also that the first path entry, for the admin, is special. Instead of specifying a URL and a view, it specifies a URL prefix and another urls.py module (from the admin.site package). In this way, Django will complete all the URLs for the admin section by prepending 'admin/' to all the URLs specified in admin.site.urls. We could have done the same for our entries application (and we should have), but I feel it would have been a bit of overkill for this simple project.
The URL paths defined in this module are so simple that they don't require any regular expression to be defined. Should you need to use a regular expression, you can check out the re_path function, which is designed for that purpose.
We also include login and logout functionalities, by employing views that come straight out of the django.contrib.auth package. We enrich the declaration with the necessary information (such as the next page, for the logout view) and we don't need to write a single line of code to handle authentication. This is brilliant and saves us a lot of time. (Note that the function-based login and logout views were removed in Django 2.1, where the class-based LoginView and LogoutView take their place.)
Each path declaration must be done within the urlpatterns list and on this matter, it's important to consider that, when Django is trying to find a view for a URL that has been requested, the patterns are exercised in order, from top to bottom. The first one that matches is the one that will provide the view for it so, in general, you have to put specific patterns before generic ones, otherwise they will never get a chance to be caught. To show you an example that uses regular expressions in the route declaration, '^shop/categories/$' needs to come before '^shop' (notice that the '$' signals the end of the pattern, and it is not specified in the latter), otherwise it would never be called.
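The first-match-wins rule is easy to reproduce with the re module alone. The resolve function below is a hypothetical, much simplified resolver written for this sketch (it is not Django's URL resolver), and the view names are plain strings.

```python
import re


def resolve(patterns, url):
    """Return the view name of the first pattern that matches the URL."""
    for regex, view_name in patterns:
        if re.search(regex, url):
            return view_name
    return None


specific_first = [
    (r'^shop/categories/$', 'categories'),
    (r'^shop', 'shop'),
]
generic_first = [
    (r'^shop', 'shop'),
    (r'^shop/categories/$', 'categories'),
]

print(resolve(specific_first, 'shop/categories/'))  # categories
print(resolve(generic_first, 'shop/categories/'))   # shop
```

In the second list, the generic '^shop' pattern shadows the specific one, so the categories view can never be reached.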
So, models, forms, admin, views, and URLs are all done. All that's left is to take care of the templates. I'll have to be very brief on this part because HTML can be very verbose.
Writing the templates
All templates inherit from a base one, which provides the HTML structure for all others, in a very object-oriented programming (OOP) fashion. It also specifies a few blocks, which are areas that can be overridden by children so that they can provide custom content for those areas. Let's start with the base template:
# entries/templates/entries/base.html
{% load static from staticfiles %}
<!DOCTYPE html>
<html lang="en">
  <head>
    {% block meta %}
      <meta charset="utf-8">
      <meta name="viewport"
       content="width=device-width, initial-scale=1.0">
    {% endblock meta %}
    {% block styles %}
      <link href="{% static "entries/css/main.css" %}"
       rel="stylesheet">
    {% endblock styles %}
    <title>{% block title %}Title{% endblock title %}</title>
  </head>
  <body>
    <div id="page-content">
      {% block page-content %}
      {% endblock page-content %}
    </div>
    <div id="footer">
      {% block footer %}
      {% endblock footer %}
    </div>
  </body>
</html>
There is a good reason to repeat the entries folder from the templates one. When you deploy a Django website, you collect all the template files under one folder. If you don't specify the paths like I did, you may get a base.html template in the entries application, and a base.html template in another app. The last one to be collected will override any other file with the same name. For this reason, by putting them in a templates/entries folder and using this technique for each Django application you write, you avoid the risk of name collisions (the same goes for any other static file).
There is not much to say about this template, really, apart from the fact that it loads the static tag so that we can get easy access to the static path without hardcoding it in the template using {% static ... %}. The code in the special {% ... %} sections is code that defines logic. The code in the special {{ ... }} sections represents variables that will be rendered on the page.
We define five blocks: meta, styles, title, page-content, and footer, whose purpose is to hold the metadata, style information, title, the content of the page, and the footer, respectively. Blocks can be optionally overridden by child templates in order to provide different content within them.
Here's the footer:
# entries/templates/entries/footer.html
<div class="footer">
  Go back <a href="{% url "home" %}">home</a>.
</div>
It gives us a nice link to the home page, which comes from the following template:
# entries/templates/entries/home.html
{% extends "entries/base.html" %}
{% block title %}Welcome to the Entry website.{% endblock title %}
{% block page-content %}
  <h1>Welcome {{ user.first_name }}!</h1>
  <div class="home-option">To see the list of your entries
    please click <a href="{% url "entries" %}">here.</a>
  </div>
  <div class="home-option">To insert a new entry please click
    <a href="{% url "insert" %}">here.</a>
  </div>
  <div class="home-option">To login as another user please click
    <a href="{% url "logout" %}">here.</a>
  </div>
  <div class="home-option">To go to the admin panel
    please click <a href="{% url "admin:index" %}">here.</a>
  </div>
{% endblock page-content %}
It extends the base.html template, and overrides title and page-content. You can see that basically all it does is provide four links to the user. These are the list of entries, the insert page, the logout page, and the admin page. All of this is done without hardcoding a single URL, through the use of the {% url ... %} tag, which is the template equivalent of the reverse function.
The template for inserting an Entry is as follows:
# entries/templates/entries/insert.html
{% extends "entries/base.html" %}
{% block title %}Insert a new Entry{% endblock title %}
{% block page-content %}
  {% if messages %}
    {% for message in messages %}
      <p class="{{ message.tags }}">{{ message }}</p>
    {% endfor %}
  {% endif %}
  <h1>Insert a new Entry</h1>
  <form action="{% url "insert" %}" method="post">
    {% csrf_token %}{{ form.as_p }}
    <input type="submit" value="Insert">
  </form><br>
{% endblock page-content %}
{% block footer %}
  <div><a href="{% url "entries" %}">See your entries.</a></div>
  {% include "entries/footer.html" %}
{% endblock footer %}
There is some conditional logic at the beginning to display messages, if any, and then we define the form. Django gives us the ability to render a form by simply calling {{ form.as_p }} (alternatively, form.as_ul or form.as_table). This creates all the necessary fields and labels for us. The difference between the three commands is in the way the form is laid out: as paragraphs, as an unordered list, or as a table. We only need to wrap it in form tags and add a submit button. This behavior was designed for our convenience: we need the freedom to shape that <form> tag as we want, so Django isn't intrusive on that. Also, note the {% csrf_token %} tag. It will be rendered into a token by Django and will become part of the data sent to the server on submission. This way, Django will be able to verify that the request was from an allowed source, thus avoiding the aforementioned CSRF issue. Did you see how we handled the token when we wrote the view for the Entry insertion? Exactly. We didn't write a single line of code for it. Django takes care of it automatically thanks to a middleware class (CsrfViewMiddleware). Please refer to the official Django documentation (https://docs.djangoproject.com/en/2.0/) to explore this subject further.
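The general idea behind such a token can be sketched with the standard library. This is a deliberately simplified illustration and not how CsrfViewMiddleware is implemented (Django's real mechanism is more elaborate); the session dictionary and both function names are assumptions made for the example.

```python
import hmac
import secrets


def issue_token(session):
    """Create a random token and remember it on the server side."""
    token = secrets.token_urlsafe(32)
    session['csrf_token'] = token
    return token        # this value would be embedded in the HTML form


def is_valid(session, submitted):
    """Accept the POST only if it carries the token we issued."""
    expected = session.get('csrf_token')
    if expected is None or submitted is None:
        return False
    # Constant-time comparison, to avoid leaking information via timing.
    return hmac.compare_digest(expected, submitted)


session = {}
form_token = issue_token(session)
print(is_valid(session, form_token))  # genuine form submission
print(is_valid(session, 'guess'))     # forged request from another site
print(is_valid(session, None))        # token missing altogether
```

A page on another domain can make the browser send a request, but it cannot read the token embedded in our form, so its request fails the check.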
For this page, we also use the footer block to display a link to the home page. Finally, we have the list template, which is the most interesting one:
# entries/templates/entries/list.html
{% extends "entries/base.html" %}
{% block title %}Entries list{% endblock title %}
{% block page-content %}
  {% if entries %}
    <h1>Your entries ({{ entries|length }} found)</h1>
    <div><a href="{% url "insert" %}">Insert new entry.</a></div>
    <table class="entries-table">
      <thead>
        <tr><th>Entry</th><th>Matches</th></tr>
      </thead>
      <tbody>
        {% for entry, match in entries %}
          <tr class="entries-list {% cycle 'light-gray' 'white' %}">
            <td>
              Pattern: <code class="code">
                "{{ entry.pattern }}"</code><br>
              Test String: <code class="code">
                "{{ entry.test_string }}"</code><br>
              Added: {{ entry.date_added }}
            </td>
            <td>
              {% if match %}
                Group: {{ match.0 }}<br>
                Subgroups:
                  {{ match.1|default_if_none:"none" }}<br>
                Group Dict: {{ match.2|default_if_none:"none" }}
              {% else %}
                No matches found.
              {% endif %}
            </td>
          </tr>
        {% endfor %}
      </tbody>
    </table>
  {% else %}
    <h1>You have no entries</h1>
    <div><a href="{% url "insert" %}">Insert new entry.</a></div>
  {% endif %}
{% endblock page-content %}
{% block footer %}
  {% include "entries/footer.html" %}
{% endblock footer %}
It may take you a while to get used to the template language, but really, all there is to it is the creation of a table using a for loop. We start by checking whether there are any entries and, if so, we create a table. There are two columns, one for Entry, and the other for the match.
In the Entry column, we display the Entry object (apart from the user), and in the Matches column, we display that three-tuple we created in the EntryListView. Note that to access the attributes of an object, we use the same dot syntax we use in Python, for example {{ entry.pattern }} or {{ entry.test_string }}, and so on.
When dealing with lists and tuples, we cannot access items using the square brackets syntax, so we use the dot one as well ({{ match.0 }} is equivalent to match[0], and so on). We also use a filter, through the pipe (|) operator, to display a custom value if a match is None.
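The way a single dot can mean a dictionary key, an attribute, or an index can be approximated in a few lines of Python. The function below is only a rough sketch of the documented lookup order (dictionary lookup, then attribute lookup, then list-index lookup), not Django's actual code; resolve_dot, default_if_none, and FakeEntry are hypothetical names.

```python
def resolve_dot(value, key):
    """Approximate what {{ value.key }} does in a template."""
    try:
        return value[key]                  # 1. dictionary lookup
    except (TypeError, KeyError, IndexError):
        pass
    try:
        return getattr(value, key)         # 2. attribute lookup
    except (TypeError, AttributeError):
        pass
    return value[int(key)]                 # 3. list-index lookup


def default_if_none(value, default):
    """Plain-Python counterpart of the default_if_none filter."""
    return default if value is None else value


class FakeEntry:
    pattern = r'\d+'


match = ('101', None, None)
print(resolve_dot(match, '0'))                            # like match.0
print(default_if_none(resolve_dot(match, '1'), 'none'))   # like match.1
print(resolve_dot(FakeEntry(), 'pattern'))                # attribute
print(resolve_dot({'entries': 3}, 'entries'))             # dict key
```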
The Django template language (which is not properly Python) is kept simple for a precise reason. If you find yourself limited by the language, it means you're probably trying to do something in the template that should actually be done in the view, where that logic is more pertinent.
Allow me to show you a couple of screenshots of the list and insert templates. This is what the list of entries looks like for my father:
Note how the use of the cycle tag alternates the background color of the rows from white to light gray. Those classes are defined in the main.css file.
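If you want to picture what the cycle tag does, the closest thing in plain Python is itertools.cycle. The row labels below are placeholders; only the two class names come from the template.

```python
from itertools import cycle

rows = ['first', 'second', 'third']

# zip stops at the shortest iterable, so the endless cycle is safe here.
styled = list(zip(rows, cycle(['light-gray', 'white'])))
print(styled)
```

Each row gets the next value in the sequence, and the sequence starts over when it runs out, so the third row is light gray again.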
The Entry insertion page is smart enough to provide a few different scenarios. When you land on it at first, it presents you with just an empty form. If you fill it in correctly, it will display a nice message for you (see the following picture). However, if you fail to fill in both fields, it will display an error message before them, alerting you that those fields are required.
Note also the custom footer, which includes both a link to the entries list and a link to the home page:
And that's it! You can play around with the CSS styles if you want. Download the code for the book and have fun exploring and extending this project. Add something else to the model, create and apply a migration, play with the templates, there's lots to do!
Django is a very powerful framework, and offers so much more than what I've been able to show you in this chapter, so you should definitely check it out. The beauty of it is that Django is Python, so reading its source code is a very useful exercise.
The future of web development
Computer science is a very young subject, compared to other branches of science that have existed alongside humankind for centuries. One of its main characteristics is that it moves extremely fast. It leaps forward with such speed that, in just a few years, you can see changes that are comparable to real-world changes that took a century to happen. Therefore, as a coder, you must pay attention to what happens in this world, all the time.
Currently, because powerful computers are quite cheap and almost everyone has access to them, the trend is to try to avoid putting too much workload on the backend, and let the frontend handle part of it. Therefore, in the last few years, JavaScript frameworks and libraries, such as jQuery, Backbone and, more recently, React, have become very popular. Web development has shifted from a paradigm where the backend takes care of handling data, preparing it, and serving it to the frontend to display it, to a paradigm where the backend is sometimes just used as an API, a sheer data provider. The frontend fetches the data from the backend with an API call, and then it takes care of the rest. This shift facilitates the existence of paradigms such as Single-Page Application (SPA), where, ideally, the whole page is loaded once and then evolves, based on the content that usually comes from the backend. E-commerce websites that load the results of a search in a page that doesn't refresh the surrounding structure are made with similar techniques. Browsers can perform asynchronous calls such as Asynchronous JavaScript and XML (AJAX) that can return data that can be read, manipulated, and injected back into the page with JavaScript code.
So, if you're planning to work on web development, I strongly suggest you get acquainted with JavaScript (if you're not already), and also with APIs. In the last few pages of this chapter, I'll give you an example of how to make a simple API using two different Python microframeworks: Flask and Falcon.
Writing a Flask view
Flask (http://flask.pocoo.org/) is a Python microframework. It provides far fewer features than Django, but if your project is meant to be very small, then it might be a better choice. In my experience though, when developers choose Flask at the beginning of a project, they eventually end up adding plugin after plugin, until they have what I call a Django Frankenstein project. Being agile means having to periodically spend time reducing the technical debt accumulated over time. However, switching from Flask to Django can be a daunting operation, so when starting a new project, make sure you consider its evolution. My cheeky opinion on this matter is very simple: I always go with Django, as I personally prefer it to Flask, but you might disagree with me, so I want to offer you an example.
In your ch14 folder, create a flask folder with the following structure:
$ tree -A flask  # from the ch14 folder
flask
├── main.py
└── templates
    └── main.html
Basically, we're going to code two simple files: a Flask application and an HTML template. Flask uses Jinja2 as a template engine. It's extremely popular and very fast, to the point that even Django started offering native support for it:
# flask/templates/main.html
<!doctype html>
<title>Hello from Flask</title>
<h1>
  {% if name %}
    Hello {{ name }}!
  {% else %}
    Hello shy person!
  {% endif %}
</h1>
The template is almost offensively simple. All it does is change the greeting according to the presence of the name variable. A bit more interesting is the Flask application that renders it:
# flask/main.py
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
@app.route('/<name>')
def hello(name=None):
    return render_template('main.html', name=name)
We create an app object, which is a Flask application. We only feed the fully qualified name of the module, which is stored in __name__.
Then, we write a simple hello view, which takes an optional name argument. In the body of the view, we simply render the main.html template, passing to it the name argument, regardless of its value.
What's interesting is the routing. Differently from Django's way of tying up views and URLs (the urls.py module), in Flask you decorate your views with one or more @app.route decorators. In this case, we decorate twice: the first line ties the view to the root URL (/), while the second line ties the view to the root URL with a name information (/<name>).
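Stacking two route decorators on one view works because a registering decorator can record the function and hand it back unchanged. The toy class below illustrates that idea only; MiniApp is a hypothetical stand-in, not Flask, and it ignores variable parts such as <name>.

```python
class MiniApp:
    """Toy application that maps URL paths to view functions."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        def register(view):
            self.routes[path] = view
            return view      # unchanged, so decorators can be stacked
        return register


app = MiniApp()


@app.route('/')
@app.route('/hello')
def hello():
    return 'Hello shy person!'


print(sorted(app.routes))
print(app.routes['/'] is app.routes['/hello'])  # same view, two URLs
print(app.routes['/']())
```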
Change into the flask folder and type the following (make sure you have installed Flask, either with $ pip install flask or by installing the requirements in the source code for the book):
$ FLASK_APP=main.py flask run
You can open a browser and go to http://127.0.0.1:5000/. This URL has no name information; therefore, you will see Hello shy person! It is written all nice and big. Try to add something to that URL, such as http://127.0.0.1:5000/Milena. Hit Enter and the page will change to Hello Milena! (so you will have said hello to my sister).
Of course, Flask offers you much more than this, but we don't have the room to go through a more complex example. It's definitely worth exploring, though. Several projects use it successfully and it's fun and nice to create websites or APIs with it. Flask's author, Armin Ronacher, is a successful and very prolific coder. He also created or collaborated on several other interesting projects, such as Werkzeug, Jinja2, Click, and Sphinx. He also contributed functionality to Python's ast module.
Building a JSON quote server in Falcon
Falcon (http://falconframework.org/) is another microframework written in Python, which was designed to be light, fast, and flexible. I have seen this relatively young project evolve to become something really popular due to its speed, which is impressive, so I'm happy to show you a tiny example using it. We're going to build an API that returns a random quote from the Buddha.
In your ch14 folder, create a new one called falcon. We'll have two files: quotes.py and main.py. To run this example, install Falcon and Gunicorn ($ pip install falcon gunicorn or the full requirements for the book). Falcon is the framework, and Gunicorn (Green Unicorn) is a Python WSGI HTTP Server for Unix (which, in layman's terms, means the technology that is used to run the server).
The Web Server Gateway Interface (WSGI) is a simple calling convention for web servers to forward requests to web applications or frameworks written in Python. If you wish to learn more, please check out PEP 333, which defines the interface (PEP 3333 updates it for Python 3).
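To make the WSGI calling convention a little more concrete, here is a minimal sketch that uses only the standard library. The names simple_app and call_app are made up for this illustration; call_app plays the part a server such as Gunicorn would play, in a heavily simplified form.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    """A WSGI application: a callable taking environ and start_response."""
    body = f"You asked for {environ['PATH_INFO']}".encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]  # the response body is an iterable of bytes


def call_app(app, path):
    """Play the server's role: build an environ dict and call the app."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the keys WSGI requires
    environ['PATH_INFO'] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    chunks = app(environ, start_response)
    return captured['status'], b''.join(chunks).decode('utf-8')


print(call_app(simple_app, '/quote'))  # ('200 OK', 'You asked for /quote')
```

A framework such as Falcon gives you an object that follows this same convention, which is why a WSGI server can run it without knowing anything about the framework itself.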
When you're all set up, start by creating the quotes.py file:
# falcon/quotes.py
quotes = [
    "Thousands of candles can be lighted from a single candle, "
    "and the life of the candle will not be shortened. "
    "Happiness never decreases by being shared.",
    # ...
    "Peace comes from within. Do not seek it without.",
    # ...
]
You will find the complete list of quotes in the source code for this book. If you don't have it, you can instead fill in your favorite quotes. Note that not every line has a comma at the end. In Python, adjacent string literals are joined into a single string, and they can span several lines like that as long as they sit inside brackets (round, square, or curly). It's called implicit concatenation.
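Here is a short sketch of implicit concatenation at work. The variable names and the sample strings are placeholders chosen for this illustration.

```python
# Adjacent string literals are joined by the parser, even on a single line.
same_line = "Peace comes " "from within."
print(same_line)  # Peace comes from within.

# Brackets let the adjacent literals span several lines.
wrapped = (
    "Thousands of candles can be lighted from a single candle, "
    "and the life of the candle will not be shortened."
)

# In a list, only the commas separate elements.
sample = [
    "first half, "
    "second half",  # still part of the first element
    "another quote",
]
print(len(sample))  # 2

# The flip side: a forgotten comma silently merges two elements.
oops = ["a", "b" "c"]
print(oops)  # ['a', 'bc']
```

That last case is worth remembering when you edit a long list of strings: Python will not complain about the missing comma, it will just give you one string fewer.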
The code for the main application is not long, but it is interesting:
# falcon/main.py
import json
import random
import falcon
from quotes import quotes

class QuoteResource:
    def on_get(self, req, resp):
        quote = {
            'quote': random.choice(quotes),
            'author': 'The Buddha'
        }
        # Falcon 3.0 and later prefer resp.text over resp.body
        resp.body = json.dumps(quote)

# Falcon 3.0 and later prefer falcon.App() over falcon.API()
api = falcon.API()
api.add_route('/quote', QuoteResource())
Let's start with the class. In Django we had a get method, in Flask we defined a function, and here we write an on_get method, a naming style that reminds me of Java/C# event handlers. It takes a request and a response argument, both automatically fed by the framework. In its body, we define a dictionary with a randomly chosen quote, and the author information. Then we dump that dictionary to a JSON string and set the response body to its value. We don't need to return anything; Falcon will take care of it for us.
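If you'd like to see what on_get does without starting a server, the following sketch calls the handler by hand. It assumes a stand-in response object built with types.SimpleNamespace and a one-item quotes list; both are placeholders for this illustration, and Falcon itself is not involved.

```python
import json
import random
from types import SimpleNamespace

# A one-item stand-in for the list defined in quotes.py.
quotes = ["Peace comes from within. Do not seek it without."]


class QuoteResource:
    def on_get(self, req, resp):
        quote = {
            'quote': random.choice(quotes),
            'author': 'The Buddha'
        }
        resp.body = json.dumps(quote)  # nothing is returned


# The framework would normally build these two objects for us.
resp = SimpleNamespace(body=None)
QuoteResource().on_get(req=None, resp=resp)

payload = json.loads(resp.body)  # the body is a JSON string
print(payload['author'])  # The Buddha
```

The handler communicates only by mutating the response object it was given, which is why it has no return statement.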
At the end of the file, we create the Falcon application, and we call add_route on it to tie the handler we have just written to the URL we want.
When you're all set up, change to the falcon folder and type:
$ gunicorn main:api
Then, make a request (or simply open the page with your browser) to http://127.0.0.1:8000/quote. When I did it, I got this JSON in response:
{
    "quote": "Peace comes from within. Do not seek it without.",
    "author": "The Buddha"
}
Within the falcon folder, I have left a stress.py module for you, which tests how fast our Falcon code is. See if you can make it work by yourself; it should be very easy for you at this point.
Whatever framework you end up using for your web development, try to keep yourself informed about other choices too. Sometimes you may be in situations where a different framework is the right way to go, and having a working knowledge of different tools will give you an advantage.
Summary
In this chapter, we took a look at web development. We talked about important concepts, such as the DRY philosophy and the concept of a framework as a tool that provides us with many things we need in order to write code to serve requests. We also talked about the MTV pattern, and how nicely these three layers play together to realize a request-response path.
Then, we briefly introduced regular expressions, a subject of paramount importance, which provide the tools for URL routing.
There are many different frameworks out there, and Django is definitely one of the best and most widely used, so it's worth exploring, especially its source code, which is well written.
There are other very interesting and important frameworks too, such as Flask. They provide fewer features but might be faster, both in execution time and to set up. One that is extremely fast is the Falcon project, whose benchmarks are outstanding.
It's important to get a solid understanding of how the request-response mechanism works, and how the web in general works, so that eventually it won't matter too much which framework you have to use. You will be able to pick it up quickly because it will only be a matter of getting familiar with a way of doing something you already know a lot about.
Explore at least three frameworks and try to come up with different use cases to decide which one of them could be the ideal choice. When you are able to make that choice, you will know you have a good enough understanding of them.
A farewell
I hope that you are still thirsty and that this book will be just the first of many steps you take towards Python. It's a truly wonderful language, well worth learning deeply.
I hope that you enjoyed this journey with me. I did my best to make it interesting for you. It sure was for me; I had such a great time writing these pages.
Python is open source, so please keep sharing it and consider supporting the wonderful community around it.
Until next time, my friend, farewell!