automate the boring stuff with python: albert sweigart · 2015-07-13 · automate the boring stuff...
Automate the Boring Stuff with Python: Practical Programming for Total Beginners
Albert Sweigart
Published by No Starch Press
About the Author
Al Sweigart is a software developer and tech book author living in San Francisco. Python is his favorite programming language, and he is the developer of several open source modules for it. His other books are freely available under a Creative Commons license on his website http://www.inventwithpython.com/. His cat weighs 14 pounds.
About the Tech Reviewer
Ari Lacenski is a developer of Android applications and Python software. She lives in San Francisco, where she writes about Android programming at http://gradlewhy.ghost.io/ and mentors with Women Who Code. She’s also a folk guitarist.
Acknowledgments
I couldn’t have written a book like this without the help of a lot of people. I’d like to thank Bill Pollock; my editors, Laurel Chun, Leslie Shen, Greg Poulos, and Jennifer Griffith-Delgado; and the rest of the staff at No Starch Press for their invaluable help. Thanks to my tech reviewer, Ari Lacenski, for great suggestions, edits, and support.
Many thanks to our Benevolent Dictator For Life, Guido van Rossum, and everyone at the Python Software Foundation for their great work. The Python community is the best one I’ve found in the tech industry.
Finally, I would like to thank my family, friends, and the gang at Shotwell’s for not minding the busy life I’ve had while writing this book. Cheers!
Introduction
“You’ve just done in two hours what it takes the three of us two days to do.” My college roommate was working at a retail electronics store in the early 2000s. Occasionally, the store would receive a spreadsheet of thousands of product prices from its competitor. A team of three employees would print the spreadsheet onto a thick stack of paper and split it among themselves. For each product price, they would look up their store’s price and note all the products that their competitors sold for less. It usually took a couple of days.
“You know, I could write a program to do that if you have the original file for the printouts,” my roommate told them, when he saw them sitting on the floor with papers scattered and stacked around them.
After a couple of hours, he had a short program that read a competitor’s price from a file, found the product in the store’s database, and noted whether the competitor was cheaper. He was still new to programming, and he spent most of his time looking up documentation in a programming book. The actual program took only a few seconds to run. My roommate and his co-workers took an extra-long lunch that day.
This is the power of computer programming. A computer is like a Swiss Army knife that you can configure for countless tasks. Many people spend hours clicking and typing to perform repetitive tasks, unaware that the machine they’re using could do their job in seconds if they gave it the right instructions.
Whom Is This Book For?
Software is at the core of so many of the tools we use today: Nearly everyone uses social networks to communicate, many people have Internet-connected computers in their phones, and most office jobs involve interacting with a computer to get work done. As a result, the demand for people who can code has skyrocketed. Countless books, interactive web tutorials, and developer boot camps promise to turn ambitious beginners into software engineers with six-figure salaries.
This book is not for those people. It’s for everyone else.
On its own, this book won’t turn you into a professional software developer any more than a few guitar lessons will turn you into a rock star. But if you’re an office worker, administrator, academic, or anyone else who uses a computer for work or fun, you will learn the basics of programming so that you can automate simple tasks such as the following:
Moving and renaming thousands of files and sorting them into folders
Filling out online forms, no typing required
Downloading files or copying text from a website whenever it updates
Having your computer text you custom notifications
Updating or formatting Excel spreadsheets
Checking your email and sending out prewritten responses
These tasks are simple but time-consuming for humans, and they’re often so trivial or specific that there’s no ready-made software to perform them. Armed with a little bit of programming knowledge, you can have your computer do these tasks for you.
Conventions
This book is not designed as a reference manual; it’s a guide for beginners. The coding style sometimes goes against best practices (for example, some programs use global variables), but that’s a trade-off to make the code simpler to learn. This book is made for people to write throwaway code, so there’s not much time spent on style and elegance. Sophisticated programming concepts (like object-oriented programming, list comprehensions, and generators) aren’t covered because of the complexity they add. Veteran programmers may point out ways the code in this book could be changed to improve efficiency, but this book is mostly concerned with getting programs to work with the least amount of effort.
What Is Programming?
Television shows and films often show programmers furiously typing cryptic streams of 1s and 0s on glowing screens, but modern programming isn’t that mysterious. Programming is simply the act of entering instructions for the computer to perform. These instructions might crunch some numbers, modify text, look up information in files, or communicate with other computers over the Internet.
All programs use basic instructions as building blocks. Here are a few of the most common ones, in English:
“Do this; then do that.”
“If this condition is true, perform this action; otherwise, do that action.”
“Do this action that number of times.”
“Keep doing that until this condition is true.”
You can combine these building blocks to implement more intricate decisions, too. For example, here are the programming instructions, called the source code, for a simple program written in the Python programming language. Starting at the top, the Python software runs each line of code (some lines are run only if a certain condition is true or else Python runs some other line) until it reaches the bottom.
➊ passwordFile = open('SecretPasswordFile.txt')
➋ secretPassword = passwordFile.read()
➌ print('Enter your password.')
  typedPassword = input()
➍ if typedPassword == secretPassword:
➎     print('Access granted')
➏     if typedPassword == '12345':
➐         print('That password is one that an idiot puts on their luggage.')
  else:
➑     print('Access denied')
You might not know anything about programming, but you could probably make a reasonable guess at what the previous code does just by reading it. First, the file SecretPasswordFile.txt is opened ➊, and the secret password in it is read ➋. Then, the user is prompted to input a password (from the keyboard) ➌. These two passwords are compared ➍, and if they’re the same, the program prints Access granted to the screen ➎. Next, the program checks to see whether the password is 12345 ➏ and hints that this choice might not be the best for a password ➐. If the passwords are not the same, the program prints Access denied to the screen ➑.
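The same compare-and-branch logic can be tried without creating a file first. The following is only a minimal sketch, not the program above: the two passwords are hard-coded stand-ins (assumed values) for the file contents and the keyboard input, and message is a hypothetical variable name introduced for illustration.

```python
# Stand-ins for the file contents and the typed text (assumed values).
secretPassword = '12345'
typedPassword = '12345'

if typedPassword == secretPassword:
    message = 'Access granted'
    if typedPassword == '12345':
        # This nested check runs only when the passwords already match.
        message = message + ' (but pick a stronger password)'
else:
    message = 'Access denied'

print(message)
```

Changing typedPassword to any other string should send the program down the else branch instead.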
What Is Python?
Python refers to the Python programming language (with syntax rules for writing what is considered valid Python code) and the Python interpreter software that reads source code (written in the Python language) and performs its instructions. The Python interpreter is free to download from http://python.org/, and there are versions for Linux, OS X, and Windows.
The name Python comes from the surreal British comedy group Monty Python, not from the snake. Python programmers are affectionately called Pythonistas, and both Monty Python and serpentine references usually pepper Python tutorials and documentation.
Programmers Don’t Need to Know Much Math
The most common anxiety I hear about learning to program is that people think it requires a lot of math. Actually, most programming doesn’t require math beyond basic arithmetic. In fact, being good at programming isn’t that different from being good at solving Sudoku puzzles.
To solve a Sudoku puzzle, the numbers 1 through 9 must be filled in for each row, each column, and each 3×3 interior square of the full 9×9 board. You find a solution by applying deduction and logic from the starting numbers. For example, since 5 appears in the top left of the Sudoku puzzle shown in Figure I-1, it cannot appear elsewhere in the top row, in the leftmost column, or in the top-left 3×3 square. Solving one row, column, or square at a time will provide more number clues for the rest of the puzzle.
Figure I-1. A new Sudoku puzzle (left) and its solution (right). Despite using numbers, Sudoku doesn’t involve much math. (Images © Wikimedia Commons)
Just because Sudoku involves numbers doesn’t mean you have to be good at math to figure out the solution. The same is true of programming. Like solving a Sudoku puzzle, writing programs involves breaking down a problem into individual, detailed steps. Similarly, when debugging programs (that is, finding and fixing errors), you’ll patiently observe what the program is doing and find the cause of the bugs. And like all skills, the more you program, the better you’ll become.
Programming Is a Creative Activity
Programming is a creative task, somewhat like constructing a castle out of LEGO bricks. You start with a basic idea of what you want your castle to look like and inventory your available blocks. Then you start building. Once you’ve finished building your program, you can pretty up your code just like you would your castle.
The difference between programming and other creative activities is that when programming, you have all the raw materials you need in your computer; you don’t need to buy any additional canvas, paint, film, yarn, LEGO bricks, or electronic components. When your program is written, it can easily be shared online with the entire world. And though you’ll make mistakes when programming, the activity is still a lot of fun.
About This Book
The first part of this book covers basic Python programming concepts, and the second part covers various tasks you can have your computer automate. Each chapter in the second part has project programs for you to study. Here’s a brief rundown of what you’ll find in each chapter:
Part I
Chapter 1. Covers expressions, the most basic type of Python instruction, and how to use the Python interactive shell software to experiment with code.
Chapter 2. Explains how to make programs decide which instructions to execute so your code can intelligently respond to different conditions.
Chapter 3. Instructs you on how to define your own functions so that you can organize your code into more manageable chunks.
Chapter 4. Introduces the list data type and explains how to organize data.
Chapter 5. Introduces the dictionary data type and shows you more powerful ways to organize data.
Chapter 6. Covers working with text data (called strings in Python).
Part II
Chapter 7. Covers how Python can manipulate strings and search for text patterns with regular expressions.
Chapter 8. Explains how your programs can read the contents of text files and save information to files on your hard drive.
Chapter 9. Shows how Python can copy, move, rename, and delete large numbers of files much faster than a human user can. It also explains compressing and decompressing files.
Chapter 10. Shows how to use Python’s various bug-finding and bug-fixing tools.
Chapter 11. Shows how to write programs that can automatically download web pages and parse them for information. This is called web scraping.
Chapter 12. Covers programmatically manipulating Excel spreadsheets so that you don’t have to read them. This is helpful when the number of documents you have to analyze is in the hundreds or thousands.
Chapter 13. Covers programmatically reading Word and PDF documents.
Chapter 14. Continues to explain how to programmatically manipulate documents with CSV and JSON files.
Chapter 15. Explains how time and dates are handled by Python programs and how to schedule your computer to perform tasks at certain times. This chapter also shows how your Python programs can launch non-Python programs.
Chapter 16. Explains how to write programs that can send emails and text messages on your behalf.
Chapter 17. Explains how to programmatically manipulate images such as JPEG or PNG files.
Chapter 18. Explains how to programmatically control the mouse and keyboard to automate clicks and keypresses.
Downloading and Installing Python
You can download Python for Windows, OS X, and Ubuntu for free from http://python.org/downloads/. If you download the latest version from the website’s download page, all of the programs in this book should work.
WARNING
Be sure to download a version of Python 3 (such as 3.4.0). The programs in this book are written to run on Python 3 and may not run correctly, if at all, on Python 2.
You’ll find Python installers for 64-bit and 32-bit computers for each operating system on the download page, so first figure out which installer you need. If you bought your computer in 2007 or later, it is most likely a 64-bit system. Otherwise, you have a 32-bit version, but here’s how to find out for sure:
On Windows, select Start ▸ Control Panel ▸ System and check whether System Type says 64-bit or 32-bit.
On OS X, go to the Apple menu, select About This Mac ▸ More Info ▸ System Report ▸ Hardware, and then look at the Processor Name field. If it says Intel Core Solo or Intel Core Duo, you have a 32-bit machine. If it says anything else (including Intel Core 2 Duo), you have a 64-bit machine.
On Ubuntu Linux, open a Terminal and run the command uname -m. A response of i686 means 32-bit, and x86_64 means 64-bit.
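Once Python is installed, the interpreter can report much of this information itself. The following is a minimal sketch using the standard platform and sys modules; the variable names are illustrative, and the exact string returned by platform.machine() differs between operating systems.

```python
import platform
import sys

# platform.machine() reports the hardware name, for example 'x86_64' or 'AMD64'.
machineName = platform.machine()

# sys.maxsize is larger than 2**32 only when the interpreter is a 64-bit build.
is64BitPython = sys.maxsize > 2**32

print('Machine type:', machineName)
print('64-bit Python:', is64BitPython)
```

Note that the second check describes the Python build rather than the hardware: a 32-bit Python installed on a 64-bit machine would still print False.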
On Windows, download the Python installer (the filename will end with .msi) and double-click it. Follow the instructions the installer displays on the screen to install Python, as listed here:
1. Select Install for All Users and then click Next.
2. Install to the C:\Python34 folder by clicking Next.
3. Click Next again to skip the Customize Python section.
On Mac OS X, download the .dmg file that’s right for your version of OS X and double-click it. Follow the instructions the installer displays on the screen to install Python, as listed here:
1. When the DMG package opens in a new window, double-click the Python.mpkg file. You may have to enter the administrator password.
2. Click Continue through the Welcome section and click Agree to accept the license.
3. Select HD Macintosh (or whatever name your hard drive has) and click Install.
If you’re running Ubuntu, you can install Python from the Terminal by following these steps:
1. Open the Terminal window.
2. Enter sudo apt-get install python3.
3. Enter sudo apt-get install idle3.
4. Enter sudo apt-get install python3-pip.
Starting IDLE
While the Python interpreter is the software that runs your Python programs, IDLE (Python’s Integrated Development and Learning Environment) is the software where you’ll enter your programs, much like a word processor. Let’s start IDLE now.
On Windows 7 or newer, click the Start icon in the lower-left corner of your screen, enter IDLE in the search box, and select IDLE (Python GUI).
On Windows XP, click the Start button and then select Programs ▸ Python 3.4 ▸ IDLE (Python GUI).
On Mac OS X, open the Finder window, click Applications, click Python 3.4, and then click the IDLE icon.
On Ubuntu, select Applications ▸ Accessories ▸ Terminal and then enter idle3. (You may also be able to click Applications at the top of the screen, select Programming, and then click IDLE 3.)
The Interactive Shell
No matter which operating system you’re running, the IDLE window that first appears should be mostly blank except for text that looks something like this:
Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:25:23) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>>
This window is called the interactive shell. A shell is a program that lets you type instructions into the computer, much like the Terminal or Command Prompt on OS X and Windows, respectively. Python’s interactive shell lets you enter instructions for the Python interpreter software to run. The computer reads the instructions you enter and runs them immediately.
For example, enter the following into the interactive shell next to the >>> prompt:
>>> print('Hello world!')
After you type that line and press ENTER, the interactive shell should display this in response:
>>> print('Hello world!')
Hello world!
How to Find Help
Solving programming problems on your own is easier than you might think. If you’re not convinced, then let’s cause an error on purpose: Enter '42' + 3 into the interactive shell. You don’t need to know what this instruction means right now, but the result should look like this:
>>> '42' + 3
➊ Traceback (most recent call last):
    File "<pyshell#0>", line 1, in <module>
      '42' + 3
➋ TypeError: Can't convert 'int' object to str implicitly
>>>
The error message ➋ appeared here because Python couldn’t understand your instruction. The traceback part ➊ of the error message shows the specific instruction and line number that Python had trouble with. If you’re not sure what to make of a particular error message, search online for the exact error message. Enter “TypeError: Can’t convert ‘int’ object to str implicitly” (including the quotes) into your favorite search engine, and you should see tons of links explaining what the error message means and what causes it, as shown in Figure I-2.
Figure I-2. The Google results for an error message can be very helpful.
You’ll often find that someone else had the same question as you and that some other helpful person has already answered it. No one person can know everything about programming, so an everyday part of any software developer’s job is looking up answers to technical questions.
Asking Smart Programming Questions
If you can’t find the answer by searching online, try asking people in a web forum such as Stack Overflow (http://stackoverflow.com/) or the “learn programming” subreddit at http://reddit.com/r/learnprogramming/. But keep in mind there are smart ways to ask programming questions that help others help you. Be sure to read the Frequently Asked Questions sections these websites have about the proper way to post questions.
When asking programming questions, remember to do the following:
Explain what you are trying to do, not just what you did. This lets your helper know if you are on the wrong track.
Specify the point at which the error happens. Does it occur at the very start of the program or only after you do a certain action?
Copy and paste the entire error message and your code to http://pastebin.com/ or http://gist.github.com/. These websites make it easy to share large amounts of code with people over the Web, without the risk of losing any text formatting. You can then put the URL of the posted code in your email or forum post. For example, here are some pieces of code I’ve posted: http://pastebin.com/SzP2DbFx/ and https://gist.github.com/asweigart/6912168/.
Explain what you’ve already tried to do to solve your problem. This tells people you’ve already put in some work to figure things out on your own.
List the version of Python you’re using. (There are some key differences between version 2 Python interpreters and version 3 Python interpreters.) Also, say which operating system and version you’re running.
If the error came up after you made a change to your code, explain exactly what you changed.
Say whether you’re able to reproduce the error every time you run the program or whether it happens only after you perform certain actions. Explain what those actions are, if so.
Always follow good online etiquette as well. For example, don’t post your questions in all caps or make unreasonable demands of the people trying to help you.
Summary
For most people, their computer is just an appliance instead of a tool. But by learning how to program, you’ll gain access to one of the most powerful tools of the modern world, and you’ll have fun along the way. Programming isn’t brain surgery; it’s fine for amateurs to experiment and make mistakes.
I love helping people discover Python. I write programming tutorials on my blog at http://inventwithpython.com/blog/, and you can reach me by email at [email protected].
This book will start you off from zero programming knowledge, but you may have questions beyond its scope. Remember that asking effective questions and knowing how to find answers are invaluable tools on your programming journey.
Let’s begin!
Chapter 1. Python Basics
The Python programming language has a wide range of syntactical constructions, standard library functions, and interactive development environment features. Fortunately, you can ignore most of that; you just need to learn enough to write some handy little programs.
You will, however, have to learn some basic programming concepts before you can do anything. Like a wizard-in-training, you might think these concepts seem arcane and tedious, but with some knowledge and practice, you’ll be able to command your computer like a magic wand to perform incredible feats.
This chapter has a few examples that encourage you to type into the interactive shell, which lets you execute Python instructions one at a time and shows you the results instantly. Using the interactive shell is great for learning what basic Python instructions do, so give it a try as you follow along. You’ll remember the things you do much better than the things you only read.
Entering Expressions into the Interactive Shell
You run the interactive shell by launching IDLE, which you installed with Python in the introduction. On Windows, open the Start menu, select All Programs ▸ Python 3.3, and then select IDLE (Python GUI). On OS X, select Applications ▸ MacPython 3.3 ▸ IDLE. On Ubuntu, open a new Terminal window and enter idle3.
A window with the >>> prompt should appear; that’s the interactive shell. Enter 2 + 2 at the prompt to have Python do some simple math.
>>> 2 + 2
4
The IDLE window should now show some text like this:
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> 2 + 2
4
>>>
In Python, 2 + 2 is called an expression, which is the most basic kind of programming instruction in the language. Expressions consist of values (such as 2) and operators (such as +), and they can always evaluate (that is, reduce) down to a single value. That means you can use expressions anywhere in Python code that you could also use a value.
In the previous example, 2 + 2 is evaluated down to a single value, 4. A single value with no operators is also considered an expression, though it evaluates only to itself, as shown here:
>>> 2
2
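Because an expression reduces to a single value, it can stand wherever a value is expected. A small sketch of that idea follows; the variable names total and bigger are illustrative, and variables themselves are introduced later in this chapter.

```python
total = 2 + 2            # the expression on the right evaluates to 4
bigger = (2 + 2) * 10    # an expression used where a plain value could go

print(total)
print(bigger)
print(2 + 2 == 4)        # comparing an expression with a value
```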
ERRORS ARE OKAY!
Programs will crash if they contain code the computer can’t understand, which will cause Python to show an error message. An error message won’t break your computer, though, so don’t be afraid to make mistakes. A crash just means the program stopped running unexpectedly.
If you want to know more about an error message, you can search for the exact message text online to find out more about that specific error. You can also check out the resources at http://nostarch.com/automatestuff/ to see a list of common Python error messages and their meanings.
There are plenty of other operators you can use in Python expressions, too. For example, Table 1-1 lists all the math operators in Python.
Table 1-1. Math Operators from Highest to Lowest Precedence
Operator   Operation                           Example   Evaluates to…
**         Exponent                            2 ** 3    8
%          Modulus/remainder                   22 % 8    6
//         Integer division/floored quotient   22 // 8   2
/          Division                            22 / 8    2.75
*          Multiplication                      3 * 5     15
-          Subtraction                         5 - 2     3
+          Addition                            2 + 2     4
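The rows of Table 1-1 can be checked directly. The sketch below collects each example into a list (a data type covered in a later chapter) so the results print together; the variable name results is illustrative. The last two lines add a detail the table does not show: with a negative operand, // rounds down rather than toward zero, and % follows suit.

```python
results = [
    2 ** 3,    # exponent
    22 % 8,    # remainder after division
    22 // 8,   # floored quotient
    22 / 8,    # true division, always gives a float
    3 * 5,
    5 - 2,
    2 + 2,
]
print(results)

print(-22 // 8)   # floors downward, so this is -3 rather than -2
print(-22 % 8)    # the matching remainder is 2
```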
The order of operations (also called precedence) of Python math operators is similar to that of mathematics. The ** operator is evaluated first; the *, /, //, and % operators are evaluated next, from left to right; and the + and - operators are evaluated last (also from left to right). You can use parentheses to override the usual precedence if you need to. Enter the following expressions into the interactive shell:
>>> 2 + 3 * 6
20
>>> (2 + 3) * 6
30
>>> 48565878 * 578453
28093077826734
>>> 2 ** 8
256
>>> 23 / 7
3.2857142857142856
>>> 23 // 7
3
>>> 23 % 7
2
>>> 2     +     2
4
>>> (5 - 1) * ((7 + 1) / (3 - 1))
16.0
In each case, you as the programmer must enter the expression, but Python does the hard part of evaluating it down to a single value. Python will keep evaluating parts of the expression until it becomes a single value, as shown in Figure 1-1.
Figure 1-1. Evaluating an expression reduces it to a single value.
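The reduction can also be written out by hand. The steps below are one plausible sequence for the last expression in the earlier shell session, working through the parentheses from left to right; the figure itself is not reproduced here, so its exact intermediate steps are an assumption. Every line evaluates to the same final value.

```python
step0 = (5 - 1) * ((7 + 1) / (3 - 1))   # the original expression
step1 = 4 * ((7 + 1) / (3 - 1))         # 5 - 1 becomes 4
step2 = 4 * (8 / (3 - 1))               # 7 + 1 becomes 8
step3 = 4 * (8 / 2)                     # 3 - 1 becomes 2
step4 = 4 * 4.0                         # 8 / 2 becomes the float 4.0

print(step0, step1, step2, step3, step4)
```

The result is the float 16.0 rather than the integer 16 because the / operator always produces a float.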
These rules for putting operators and values together to form expressions are a fundamental part of Python as a programming language, just like the grammar rules that help us communicate. Here’s an example:
This is a grammatically correct English sentence.
This grammatically is sentence not English correct a.
The second line is difficult to parse because it doesn’t follow the rules of English. Similarly, if you type in a bad Python instruction, Python won’t be able to understand it and will display a SyntaxError error message, as shown here:
>>> 5 +
  File "<stdin>", line 1
    5 +
      ^
SyntaxError: invalid syntax
>>> 42 + 5 + * 2
  File "<stdin>", line 1
    42 + 5 + * 2
             ^
SyntaxError: invalid syntax
You can always test to see whether an instruction works by typing it into the interactive shell. Don’t worry about breaking the computer: The worst thing that could happen is that Python responds with an error message. Professional software developers get error messages while writing code all the time.
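The same test can be done from inside a program. The sketch below uses the built-in compile() function, which parses a string of source code without running it, together with a loop and a try/except block; those constructs are covered in later chapters, so treat this as a preview rather than something you need yet. The names snippets and results, and the '<example>' label, are illustrative.

```python
snippets = ['2 + 2', '5 +', '42 + 5 + * 2']
results = []

for source in snippets:
    try:
        # 'eval' mode asks Python to parse the text as a single expression.
        compile(source, '<example>', 'eval')
        results.append('ok')
    except SyntaxError:
        results.append('SyntaxError')

print(results)
```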
The Integer, Floating-Point, and String Data Types
Remember that expressions are just values combined with operators, and they always evaluate down to a single value. A data type is a category for values, and every value belongs to exactly one data type. The most common data types in Python are listed in Table 1-2. The values -2 and 30, for example, are said to be integer values. The integer (or int) data type indicates values that are whole numbers. Numbers with a decimal point, such as 3.14, are called floating-point numbers (or floats). Note that even though the value 42 is an integer, the value 42.0 would be a floating-point number.
Table 1-2. Common Data Types
Data type                Examples
Integers                 -2, -1, 0, 1, 2, 3, 4, 5
Floating-point numbers   -1.25, -1.0, -0.5, 0.0, 0.5, 1.0, 1.25
Strings                  'a', 'aa', 'aaa', 'Hello!', '11 cats'
Python programs can also have text values called strings, or strs (pronounced “stirs”). Always surround your string in single quote (') characters (as in 'Hello' or 'Goodbye cruel world!') so Python knows where the string begins and ends. You can even have a string with no characters in it, '', called a blank string. Strings are explained in greater detail in Chapter 6.
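To see which data type a value belongs to, you can ask Python directly. This sketch uses the built-in type() function, which is not otherwise covered in this section; the sample values are taken from Table 1-2 and the paragraph above it.

```python
print(type(42))        # an integer
print(type(42.0))      # a float, even though it is a whole number
print(type('Hello!'))  # a string
print(type(''))        # the blank string is still a string

# The two numbers are equal in value but belong to different data types.
print(42 == 42.0)
```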
If you ever see the error message SyntaxError: EOL while scanning string literal, you probably forgot the final single quote character at the end of the string, such as in this example:
>>> 'Hello world!
SyntaxError: EOL while scanning string literal
String Concatenation and Replication
The meaning of an operator may change based on the data types of the values next to it. For example, + is the addition operator when it operates on two integers or floating-point values. However, when + is used on two string values, it joins the strings as the string concatenation operator. Enter the following into the interactive shell:
>>> 'Alice' + 'Bob'
'AliceBob'
The expression evaluates down to a single, new string value that combines the text of the two strings. However, if you try to use the + operator on a string and an integer value, Python will not know how to handle this, and it will display an error message.
>>> 'Alice' + 42
Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    'Alice' + 42
TypeError: Can't convert 'int' object to str implicitly
The error message Can't convert 'int' object to str implicitly means that Python thought you were trying to concatenate an integer to the string 'Alice'. Your code will have to explicitly convert the integer to a string, because Python cannot do this automatically. (Converting data types will be explained in Dissecting Your Program when talking about the str(), int(), and float() functions.)
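As a brief preview of that explicit conversion, the sketch below wraps the integer in str() before concatenating. It also catches the TypeError with try/except (a construct from a later chapter) so the failing case can be shown without stopping the program. The variable names are illustrative, and the exact wording of the error message varies between Python versions.

```python
name = 'Alice'
number = 42

joined = name + str(number)   # str(42) evaluates to the string '42'
print(joined)

sawTypeError = False
try:
    name + number             # a string plus an integer is not allowed
except TypeError as error:
    sawTypeError = True
    print('TypeError:', error)
```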
The * operator is used for multiplication when it operates on two integer or floating-point values. But when the * operator is used on one string value and one integer value, it becomes the string replication operator. Enter a string multiplied by a number into the interactive shell to see this in action.
>>> 'Alice' * 5
'AliceAliceAliceAliceAlice'
The expression evaluates down to a single string value that repeats the original a number of times equal to the integer value. String replication is a useful trick, but it’s not used as often as string concatenation.
The * operator can be used with only two numeric values (for multiplication) or one string value and one integer value (for string replication). Otherwise, Python will just display an error message.
>>> 'Alice' * 'Bob'
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    'Alice' * 'Bob'
TypeError: can't multiply sequence by non-int of type 'str'
>>> 'Alice' * 5.0
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    'Alice' * 5.0
TypeError: can't multiply sequence by non-int of type 'float'
It makes sense that Python wouldn’t understand these expressions: You can’t multiply two words, and it’s hard to replicate an arbitrary string a fractional number of times.
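A few more replication cases are worth trying. In this sketch the variable names are illustrative; it shows a typical use (drawing a divider line), that the integer may come first, and that replicating zero times gives the blank string.

```python
divider = '-' * 20         # a common use: a row of twenty dashes
echo = 'Alice' * 3
flipped = 3 * 'Alice'      # the integer can come before the string
nothing = 'Alice' * 0      # zero copies leaves the blank string

print(divider)
print(echo)
print(flipped == echo)
print(nothing == '')
```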
Storing Values in Variables
A variable is like a box in the computer’s memory where you can store a single value. If you want to use the result of an evaluated expression later in your program, you can save it inside a variable.
Assignment Statements
You’ll store values in variables with an assignment statement. An assignment statement consists of a variable name, an equal sign (called the assignment operator), and the value to be stored. If you enter the assignment statement spam = 42, then a variable named spam will have the integer value 42 stored in it.
Think of a variable as a labeled box that a value is placed in, as in Figure 1-2.
Figure 1-2. spam = 42 is like telling the program, “The variable spam now has the integer value 42 in it.”
For example, enter the following into the interactive shell:
➊ >>> spam = 40
  >>> spam
  40
  >>> eggs = 2
➋ >>> spam + eggs
  42
  >>> spam + eggs + spam
  82
➌ >>> spam = spam + 2
  >>> spam
  42
A variable is initialized (or created) the first time a value is stored in it ➊. After that, you can use it in expressions with other variables and values ➋. When a variable is assigned a new value ➌, the old value is forgotten, which is why spam evaluated to 42 instead of 40 at the end of the example. This is called overwriting the variable. Enter the following code into the interactive shell to try overwriting a string:
>>> spam = 'Hello'
>>> spam
'Hello'
>>> spam = 'Goodbye'
>>> spam
'Goodbye'
Just like the box in Figure 1-3, the spam variable in this example stores 'Hello' until you replace it with 'Goodbye'.
Figure 1-3. When a new value is assigned to a variable, the old one is forgotten.
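If the old value is still needed, it has to be copied into another variable before the overwrite happens. A minimal sketch follows; oldSpam is a hypothetical name introduced for illustration.

```python
spam = 40
oldSpam = spam       # copy the current value into a second variable
spam = spam + 2      # the right side (40 + 2) is evaluated first, then stored

print(spam)
print(oldSpam)       # unaffected by the later assignment to spam
```

The line spam = spam + 2 works because Python finishes evaluating the expression on the right before it stores the result under the name on the left.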
Variable Names
Table 1-3 has examples of legal variable names. You can name a variable anything as long as it obeys the following three rules:
1. It can be only one word.
2. It can use only letters, numbers, and the underscore (_) character.
3. It can’t begin with a number.
Table 1-3. Valid and Invalid Variable Names
Valid variable names   Invalid variable names
balance                current-balance (hyphens are not allowed)
currentBalance         current balance (spaces are not allowed)
current_balance        4account (can’t begin with a number)
_spam                  42 (can’t begin with a number)
SPAM                   total_$um (special characters like $ are not allowed)
account4               'hello' (special characters like ' are not allowed)
Variable names are case-sensitive, meaning that spam, SPAM, Spam, and sPaM are four different variables. It is a Python convention to start your variables with a lowercase letter.
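Python can apply these rules for you. The sketch below runs the names from Table 1-3 through the string method isidentifier(), using a loop and a list (both covered in later chapters); candidates and validNames are illustrative names. One caveat: isidentifier() also returns True for reserved words such as if, which still cannot be used as variable names, so this is a rough check rather than a complete one.

```python
candidates = ['balance', 'currentBalance', 'current_balance', '_spam',
              'SPAM', 'account4', 'current-balance', 'current balance',
              '4account', '42', 'total_$um', "'hello'"]

validNames = []
for name in candidates:
    # isidentifier() checks the letters/digits/underscore rules for a name.
    if name.isidentifier():
        validNames.append(name)

print(validNames)
```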
This book uses camelcase for variable names instead of underscores; that is, variables lookLikeThis instead of looking_like_this. Some experienced programmers may point out that the official Python code style, PEP 8, says that underscores should be used. I unapologetically prefer camelcase and point to “A Foolish Consistency Is the Hobgoblin of Little Minds” in PEP 8 itself:
“Consistency with the style guide is important. But most importantly: know when to be inconsistent -- sometimes the style guide just doesn’t apply. When in doubt, use your best judgment.”
A good variable name describes the data it contains. Imagine that you moved to a new house and labeled all of your moving boxes as Stuff. You’d never find anything! The variable names spam, eggs, and bacon are used as generic names for the examples in this book and in much of Python’s documentation (inspired by the Monty Python “Spam” sketch), but in your programs, a descriptive name will help make your code more readable.
Your First Program
While the interactive shell is good for running Python instructions one at a time, to write entire Python programs, you’ll type the instructions into the file editor. The file editor is similar to text editors such as Notepad or TextMate, but it has some specific features for typing in source code. To open the file editor in IDLE, select File ▸ New Window.
The window that appears should contain a cursor awaiting your input, but it’s different from the interactive shell, which runs Python instructions as soon as you press ENTER. The file editor lets you type in many instructions, save the file, and run the program. Here’s how you can tell the difference between the two:
The interactive shell window will always be the one with the >>> prompt.
The file editor window will not have the >>> prompt.
Now it’s time to create your first program! When the file editor window opens, type the following into it:
➊ # This program says hello and asks for my name.

➋ print('Hello world!')
  print('What is your name?')    # ask for their name
➌ myName = input()
➍ print('It is good to meet you, ' + myName)
➎ print('The length of your name is:')
  print(len(myName))
➏ print('What is your age?')    # ask for their age
  myAge = input()
  print('You will be ' + str(int(myAge) + 1) + ' in a year.')
Once you’ve entered your source code, save it so that you won’t have to retype it each time you start IDLE. From the menu at the top of the file editor window, select File ▸ Save As. In the Save As window, enter hello.py in the File Name field and then click Save.
You should save your programs every once in a while as you type them. That way, if the computer crashes or you accidentally exit from IDLE, you won’t lose the code. As a shortcut, you can press CTRL-S on Windows and Linux or ⌘-S on OS X to save your file.
Once you’ve saved, let’s run our program. Select Run ▸ Run Module or just press the F5 key. Your program should run in the interactive shell window that appeared when you first started IDLE. Remember, you have to press F5 from the file editor window, not the interactive shell window. Enter your name when your program asks for it. The program’s output in the interactive shell should look something like this:
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
Hello world!
What is your name?
Al
It is good to meet you, Al
The length of your name is:
2
What is your age?
4
You will be 5 in a year.
>>>
When there are no more lines of code to execute, the Python program terminates; that is, it stops running. (You can also say that the Python program exits.)
You can close the file editor by clicking the X at the top of the window. To reload a saved program, select File ▸ Open from the menu. Do that now, and in the window that appears, choose hello.py, and click the Open button. Your previously saved hello.py program should open in the file editor window.
Dissecting Your Program
With your new program open in the file editor, let’s take a quick tour of the Python instructions it uses by looking at what each line of code does.
Comments
The following line is called a comment.
➊ # This program says hello and asks for my name.
Python ignores comments, and you can use them to write notes or remind yourself what the code is trying to do. Any text for the rest of the line following a hash mark (#) is part of a comment.
Sometimes, programmers will put a # in front of a line of code to temporarily remove it while testing a program. This is called commenting out code, and it can be useful when you’re trying to figure out why a program doesn’t work. You can remove the # later when you are ready to put the line back in.
Python also ignores the blank line after the comment. You can add as many blank lines to your program as you want. This can make your code easier to read, like paragraphs in a book.
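The three cases just described (a full-line comment, a comment after code, and a commented-out line) can be seen together in a short sketch. The variable total is an illustrative name, not part of hello.py.

```python
# This whole line is a comment, so Python skips it.
total = 2 + 2    # a comment can also follow code on the same line

# total = total * 100    (commented out, so this line never runs)

print(total)
```

Removing the # from the start of the commented-out line would put it back into the program and change what gets printed.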
The print() Function
The print() function displays the string value inside the parentheses on the screen.
➋ print('Hello world!')
  print('What is your name?')    # ask for their name
The line print('Hello world!') means “Print out the text in the string 'Hello world!'.” When Python executes this line, you say that Python is calling the print() function and the string value is being passed to the function. A value that is passed to a function call is an argument. Notice that the quotes are not printed to the screen. They just mark where the string begins and ends; they are not part of the string value.
NOTE
You can also use this function to put a blank line on the screen; just call print() with nothing in between the parentheses.
When writing a function name, the opening and closing parentheses at the end identify it as the name of a function. This is why in this book you’ll see print() rather than print. Chapter 3 describes functions in more detail.
The input() Function
The input() function waits for the user to type some text on the keyboard and press ENTER.
➌ myName = input()
This function call evaluates to a string equal to the user’s text, and the previous line of code assigns the myName variable to this string value.
You can think of the input() function call as an expression that evaluates to whatever string the user typed in. If the user entered 'Al', then the line would effectively become myName = 'Al'.
Printing the User’s Name
The following call to print() actually contains the expression 'It is good to meet you, ' + myName between the parentheses.
➍ print('It is good to meet you, ' + myName)
Remember that expressions can always evaluate to a single value. If 'Al' is the value stored in myName on the previous line, then this expression evaluates to 'It is good to meet you, Al'. This single string value is then passed to print(), which prints it on the screen.
The len() Function
You can pass the len() function a string value (or a variable containing a string), and the function evaluates to the integer value of the number of characters in that string.
➎ print('The length of your name is:')
  print(len(myName))
Enter the following into the interactive shell to try this:
>>> len('hello')
5
>>> len('My very energetic monster just scarfed nachos.')
46
>>> len('')
0
Just like those examples, len(myName) evaluates to an integer. It is then passed to print() to be displayed on the screen. Notice that print() allows you to pass it either integer values or string values. But notice the error that shows up when you type the following into the interactive shell:
>>> print('I am ' + 29 + ' years old.')
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    print('I am ' + 29 + ' years old.')
TypeError: Can't convert 'int' object to str implicitly
The print() function isn’t causing that error, but rather it’s the expression you tried to pass to print(). You get the same error message if you type the expression into the interactive shell on its own. (The exact wording of the message varies between Python versions, but it is always a TypeError.)
>>> 'I am ' + 29 + ' years old.'
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    'I am ' + 29 + ' years old.'
TypeError: Can't convert 'int' object to str implicitly
Python gives an error because you can use the + operator only to add two numbers together or concatenate two strings. You can’t add an integer to a string because this is ungrammatical in Python. You can fix this by using a string version of the integer instead, as explained in the next section.
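A minimal sketch of the failure described above, written so that it can run as a script rather than in the interactive shell. The try/except block and the variable names (age, error_name, fixed) are assumptions added for illustration and are not part of hello.py. Only the exception type is recorded, because the message text differs between Python versions.

```python
age = 29

# Concatenating a string and an integer raises TypeError.
try:
    message = 'I am ' + age + ' years old.'
    error_name = None
except TypeError as error:
    error_name = type(error).__name__

# Converting the integer to a string first makes the concatenation valid.
fixed = 'I am ' + str(age) + ' years old.'

print(error_name)
print(fixed)
```

The str() call used for the fix is the conversion function that the text introduces next.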
The str(), int(), and float() Functions
If you want to concatenate an integer such as 29 with a string to pass to print(), you’ll need to get the value '29', which is the string form of 29. The str() function can be passed an integer value and will evaluate to a string value version of it, as follows:
>>> str(29)
'29'
>>> print('I am ' + str(29) + ' years old.')
I am 29 years old.
Because str(29) evaluates to '29', the expression 'I am ' + str(29) + ' years old.' evaluates to 'I am ' + '29' + ' years old.', which in turn evaluates to 'I am 29 years old.'. This is the value that is passed to the print() function.
The str(), int(), and float() functions will evaluate to the string, integer, and floating-point forms of the value you pass, respectively. Try converting some values in the interactive shell with these functions, and watch what happens.
>>> str(0)
'0'
>>> str(-3.14)
'-3.14'
>>> int('42')
42
>>> int('-99')
-99
>>> int(1.25)
1
>>> int(1.99)
1
>>> float('3.14')
3.14
>>> float(10)
10.0
The previous examples call the str(), int(), and float() functions and pass them values of the other data types to obtain a string, integer, or floating-point form of those values.
The str() function is handy when you have an integer or float that you want to concatenate to a string. The int() function is also helpful if you have a number as a string value that you want to use in some mathematics. For example, the input() function always returns a string, even if the user enters a number. Enter spam = input() into the interactive shell and enter 101 when it waits for your text.
>>> spam = input()
101
>>> spam
'101'
The value stored inside spam isn’t the integer 101 but the string '101'. If you want to do math using the value in spam, use the int() function to get the integer form of spam and then store this as the new value in spam.
>>> spam = int(spam)
>>> spam
101
Now you should be able to treat the spam variable as an integer instead of a string.
>>> spam * 10 / 5
202.0
Note that if you pass a value to int() that it cannot evaluate as an integer, Python will display an error message.
>>> int('99.99')
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    int('99.99')
ValueError: invalid literal for int() with base 10: '99.99'
>>> int('twelve')
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    int('twelve')
ValueError: invalid literal for int() with base 10: 'twelve'
The int() function is also useful if you need to round a positive floating-point number down, because it discards the fractional part. If you want to round such a number up, add 1 to the result afterward, provided the number is not already a whole number (int(7.0) + 1 is 8, not 7).
>>> int(7.7)
7
>>> int(7.7) + 1
8
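The int()-based rounding shown above truncates toward zero, so it matches "rounding down" only for positive numbers. As a hedged aside (the math module belongs to the standard library but is not introduced in this chapter), math.floor() and math.ceil() cover the remaining cases:

```python
import math

# int() drops the fractional part, which means it truncates toward zero.
print(int(7.7))          # 7
print(int(-7.7))         # -7

# math.floor() always rounds down; math.ceil() always rounds up.
print(math.floor(-7.7))  # -8
print(math.ceil(7.7))    # 8
print(math.ceil(7.0))    # 7
```

Unlike adding 1 after int(), math.ceil() leaves a value that is already a whole number unchanged.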
In your program, you used the int() and str() functions in the last three lines to get a value of the appropriate data type for the code.
➏ print('What is your age?')  # ask for their age
  myAge = input()
  print('You will be ' + str(int(myAge) + 1) + ' in a year.')
The myAge variable contains the value returned from input(). Because the input() function always returns a string (even if the user typed in a number), you can use the int(myAge) code to return an integer value of the string in myAge. This integer value is then added to 1 in the expression int(myAge) + 1.
The result of this addition is passed to the str() function: str(int(myAge) + 1). The string value returned is then concatenated with the strings 'You will be ' and ' in a year.' to evaluate to one large string value. This large string is finally passed to print() to be displayed on the screen.
Let’s say the user enters the string '4' for myAge. The string '4' is converted to an integer, so you can add one to it. The result is 5. The str() function converts the result back to a string, so you can concatenate it with the second string, ' in a year.', to create the final message. These evaluation steps would look something like Figure 1-4.
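The same evaluation steps can be written out one at a time in code. In this sketch the string '4' stands in for what input() would return, and the intermediate names step1 through step4 are assumptions used only to make each stage visible.

```python
myAge = '4'                                     # stand-in for the value returned by input()

step1 = int(myAge)                              # the string '4' becomes the integer 4
step2 = step1 + 1                               # 4 + 1 is 5
step3 = str(step2)                              # the integer 5 becomes the string '5'
step4 = 'You will be ' + step3 + ' in a year.'  # concatenate the three strings

print(step4)
```

Each line corresponds to one reduction of the original expression 'You will be ' + str(int(myAge) + 1) + ' in a year.'.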
TEXT AND NUMBER EQUIVALENCE
Although the string value of a number is considered a completely different value from the integer or floating-point version, an integer can be equal to a floating-point number.
>>> 42 == '42'
False
>>> 42 == 42.0
True
>>> 42.0 == 0042.000
True
Python makes this distinction because strings are text, while integers and floats are both numbers.
Summary
You can compute expressions with a calculator or type string concatenations with a word processor. You can even do string replication easily by copying and pasting text. But expressions, and their component values (operators, variables, and function calls), are the basic building blocks that make programs. Once you know how to handle these elements, you will be able to instruct Python to operate on large amounts of data for you.
It is good to remember the different types of operators (+, -, *, /, //, %, and ** for math operations, and + and * for string operations) and the three data types (integers, floating-point numbers, and strings) introduced in this chapter.
A few different functions were introduced as well. The print() and input() functions handle simple text output (to the screen) and input (from the keyboard). The len() function takes a string and evaluates to an int of the number of characters in the string. The str(), int(), and float() functions will evaluate to the string, integer, or floating-point number form of the value they are passed.
In the next chapter, you will learn how to tell Python to make intelligent decisions about what code to run, what code to skip, and what code to repeat based on the values it has. This is known as flow control, and it allows you to write programs that make intelligent decisions.
Practice Questions
Q: 1. Which of the following are operators, and which are values?
*
'hello'
-88.8
-
/
+
5
Q: 2. Which of the following is a variable, and which is a string?
spam
'spam'
Q: 3. Name three data types.
Q: 4. What is an expression made up of? What do all expressions do?
Q: 5. This chapter introduced assignment statements, like spam = 10. What is the difference between an expression and a statement?
Q: 6. What does the variable bacon contain after the following code runs?
bacon = 20
bacon + 1
Q: 7. What should the following two expressions evaluate to?
'spam' + 'spamspam'
'spam' * 3
Q: 8. Why is eggs a valid variable name while 100 is invalid?
Q: 9. What three functions can be used to get the integer, floating-point number, or string version of a value?
Q: 10. Why does this expression cause an error? How can you fix it?
'I have eaten ' + 99 + ' burritos.'
Extra credit: Search online for the Python documentation for the len() function. It will be on a web page titled “Built-in Functions.” Skim the list of other functions Python has, look up what the round() function does, and experiment with it in the interactive shell.
Chapter 2. Flow Control
So you know the basics of individual instructions and that a program is just a series of instructions. But the real strength of programming isn’t just running (or executing) one instruction after another like a weekend errand list. Based on how the expressions evaluate, the program can decide to skip instructions, repeat them, or choose one of several instructions to run. In fact, you almost never want your programs to start from the first line of code and simply execute every line, straight to the end. Flow control statements can decide which Python instructions to execute under which conditions.
These flow control statements directly correspond to the symbols in a flowchart, so I’ll provide flowchart versions of the code discussed in this chapter. Figure 2-1 shows a flowchart for what to do if it’s raining. Follow the path made by the arrows from Start to End.
Figure 2-1. A flowchart to tell you what to do if it is raining
In a flowchart, there is usually more than one way to go from the start to the end. The same is true for lines of code in a computer program. Flowcharts represent these branching points with diamonds, while the other steps are represented with rectangles. The starting and ending steps are represented with rounded rectangles.
But before you learn about flow control statements, you first need to learn how to represent those yes and no options, and you need to understand how to write those branching points as Python code. To that end, let’s explore Boolean values, comparison operators, and Boolean operators.
Boolean Values
While the integer, floating-point, and string data types have an unlimited number of possible values, the Boolean data type has only two values: True and False. (Boolean is capitalized because the data type is named after mathematician George Boole.) When typed as Python code, the Boolean values True and False lack the quotes you place around strings, and they always start with a capital T or F, with the rest of the word in lowercase. Enter the following into the interactive shell. (Some of these instructions are intentionally incorrect, and they’ll cause error messages to appear.)
➊ >>> spam = True
  >>> spam
  True
➋ >>> true
  Traceback (most recent call last):
    File "<pyshell#2>", line 1, in <module>
      true
  NameError: name 'true' is not defined
➌ >>> True = 2 + 2
  SyntaxError: assignment to keyword
Like any other value, Boolean values are used in expressions and can be stored in variables ➊. If you don’t use the proper case ➋ or you try to use True and False for variable names ➌, Python will give you an error message.
Comparison Operators
Comparison operators compare two values and evaluate down to a single Boolean value. Table 2-1 lists the comparison operators.
Table 2-1. Comparison Operators
Operator    Meaning
==          Equal to
!=          Not equal to
<           Less than
>           Greater than
<=          Less than or equal to
>=          Greater than or equal to
These operators evaluate to True or False depending on the values you give them. Let’s try some operators now, starting with == and !=.
>>> 42 == 42
True
>>> 42 == 99
False
>>> 2 != 3
True
>>> 2 != 2
False
As you might expect, == (equal to) evaluates to True when the values on both sides are the same, and != (not equal to) evaluates to True when the two values are different. The == and != operators can actually work with values of any data type.
  >>> 'hello' == 'hello'
  True
  >>> 'hello' == 'Hello'
  False
  >>> 'dog' != 'cat'
  True
  >>> True == True
  True
  >>> True != False
  True
  >>> 42 == 42.0
  True
➊ >>> 42 == '42'
  False
Note that an integer or floating-point value will always be unequal to a string value. The expression 42 == '42' ➊ evaluates to False because Python considers the integer 42 to be different from the string '42'.
The <, >, <=, and >= operators, on the other hand, work properly only with integer and floating-point values.
  >>> 42 < 100
  True
  >>> 42 > 100
  False
  >>> 42 < 42
  False
  >>> eggCount = 42
➊ >>> eggCount <= 42
  True
  >>> myAge = 29
➋ >>> myAge >= 10
  True
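As a hedged illustration of the difference between the two groups of operators: in Python 3, == and != accept operands of different types and simply report that they differ, whereas an ordering comparison between an integer and a string raises a TypeError. The try/except block and the names same and ordering_error are assumptions made for this sketch.

```python
eggCount = 42

# Equality comparisons between different types are allowed.
same = (eggCount == '42')   # False: an integer never equals a string

# Ordering comparisons between an integer and a string are not.
try:
    eggCount < '100'
    ordering_error = None
except TypeError as error:
    ordering_error = type(error).__name__

print(same)
print(ordering_error)
```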
THE DIFFERENCE BETWEEN THE == AND = OPERATORS
You might have noticed that the == operator (equal to) has two equal signs, while the = operator (assignment) has just one equal sign. It’s easy to confuse these two operators with each other. Just remember these points:
The == operator (equal to) asks whether two values are the same as each other.
The = operator (assignment) puts the value on the right into the variable on the left.
To help remember which is which, notice that the == operator (equal to) consists of two characters, just like the != operator (not equal to) consists of two characters.
You’ll often use comparison operators to compare a variable’s value to some other value, like in the eggCount <= 42 ➊ and myAge >= 10 ➋ examples. (After all, instead of typing 'dog' != 'cat' in your code, you could have just typed True.) You’ll see more examples of this later when you learn about flow control statements.
Boolean Operators
The three Boolean operators (and, or, and not) are used to compare Boolean values. Like comparison operators, they evaluate these expressions down to a Boolean value. Let’s explore these operators in detail, starting with the and operator.
Binary Boolean Operators
The and and or operators always take two Boolean values (or expressions), so they’re considered binary operators. The and operator evaluates an expression to True if both Boolean values are True; otherwise, it evaluates to False. Enter some expressions using and into the interactive shell to see it in action.
>>> True and True
True
>>> True and False
False
A truth table shows every possible result of a Boolean operator. Table 2-2 is the truth table for the and operator.
Table 2-2. The and Operator’s Truth Table
Expression         Evaluates to…
True and True      True
True and False     False
False and True     False
False and False    False
On the other hand, the or operator evaluates an expression to True if either of the two Boolean values is True. If both are False, it evaluates to False.
>>> False or True
True
>>> False or False
False
You can see every possible outcome of the or operator in its truth table, shown in Table 2-3.
Table 2-3. The or Operator’s Truth Table
Expression         Evaluates to…
True or True       True
True or False      True
False or True      True
False or False     False
The not Operator
Unlike and and or, the not operator operates on only one Boolean value (or expression). The not operator simply evaluates to the opposite Boolean value.
  >>> not True
  False
➊ >>> not not not not True
  True
Much like using double negatives in speech and writing, you can nest not operators ➊, though there’s never not no reason to do this in real programs. Table 2-4 shows the truth table for not.
Table 2-4. The not Operator’s Truth Table
Expression    Evaluates to…
not True      False
not False     True
Mixing Boolean and Comparison Operators
Since the comparison operators evaluate to Boolean values, you can use them in expressions with the Boolean operators.
Recall that the and, or, and not operators are called Boolean operators because they always operate on the Boolean values True and False. While expressions like 4 < 5 aren’t Boolean values, they are expressions that evaluate down to Boolean values. Try entering some Boolean expressions that use comparison operators into the interactive shell.
>>> (4 < 5) and (5 < 6)
True
>>> (4 < 5) and (9 < 6)
False
>>> (1 == 2) or (2 == 2)
True
The computer will evaluate the left expression first, and then it will evaluate the right expression. When it knows the Boolean value for each, it will then evaluate the whole expression down to one Boolean value. You can think of the computer’s evaluation process for (4 < 5) and (5 < 6) as shown in Figure 2-2.
You can also use multiple Boolean operators in an expression, along with the comparison operators.
>>> 2 + 2 == 4 and not 2 + 2 == 5 and 2 * 2 == 2 + 2
True
The Boolean operators have an order of operations just like the math operators do. After any math and comparison operators evaluate, Python evaluates the not operators first, then the and operators, and then the or operators.
Figure 2-2. The process of evaluating (4 < 5) and (5 < 6) to True.
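The order of operations for the Boolean operators can be checked with a short sketch. The variable names are illustrative assumptions; each expression written without parentheses is compared with the explicitly parenthesized form that the rule above predicts.

```python
# not is applied before and, and and is applied before or.
implicit = True or False and not True
explicit = True or (False and (not True))
print(implicit, explicit)

# Grouping the or first changes the result.
regrouped = (True or False) and (not True)
print(regrouped)

# The longer expression from the text, fully parenthesized.
chained = ((2 + 2 == 4) and (not (2 + 2 == 5))) and (2 * 2 == 2 + 2)
print(chained)
```

When the intended grouping is not obvious, adding parentheses as in explicit costs nothing and makes the expression easier to read.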
Elements of Flow Control
Flow control statements often start with a part called the condition, and all are followed by a block of code called the clause. Before you learn about Python’s specific flow control statements, I’ll cover what a condition and a block are.
Conditions
The Boolean expressions you’ve seen so far could all be considered conditions, which are the same thing as expressions; condition is just a more specific name in the context of flow control statements. Conditions always evaluate down to a Boolean value, True or False. A flow control statement decides what to do based on whether its condition is True or False, and almost every flow control statement uses a condition.
Blocks of Code
Lines of Python code can be grouped together in blocks. You can tell when a block begins and ends from the indentation of the lines of code. There are three rules for blocks.
1. Blocks begin when the indentation increases.
2. Blocks can contain other blocks.
3. Blocks end when the indentation decreases to zero or to a containing block’s indentation.
Blocks are easier to understand by looking at some indented code, so let’s find the blocks in part of a small game program, shown here:
  if name == 'Mary':
➊     print('Hello Mary')
      if password == 'swordfish':
➋         print('Access granted.')
      else:
➌         print('Wrong password.')
The first block of code ➊ starts at the line print('Hello Mary') and contains all the lines after it. Inside this block is another block ➋, which has only a single line in it: print('Access granted.'). The third block ➌ is also one line long: print('Wrong password.').
Program Execution
In the previous chapter’s hello.py program, Python started executing instructions at the top of the program going down, one after another. The program execution (or simply, execution) is a term for the current instruction being executed. If you print the source code on paper and put your finger on each line as it is executed, you can think of your finger as the program execution.
Not all programs execute by simply going straight down, however. If you use your finger to trace through a program with flow control statements, you’ll likely find yourself jumping around the source code based on conditions, and you’ll probably skip entire clauses.
Flow Control Statements
Now, let’s explore the most important piece of flow control: the statements themselves. The statements represent the diamonds you saw in the flowchart in Figure 2-1, and they are the actual decisions your programs will make.
if Statements
The most common type of flow control statement is the if statement. An if statement’s clause (that is, the block following the if statement) will execute if the statement’s condition is True. The clause is skipped if the condition is False.
In plain English, an if statement could be read as, “If this condition is true, execute the code in the clause.” In Python, an if statement consists of the following:
The if keyword
A condition (that is, an expression that evaluates to True or False)
A colon
Starting on the next line, an indented block of code (called the if clause)
For example, let’s say you have some code that checks to see whether someone’s name is Alice. (Pretend name was assigned some value earlier.)
if name == 'Alice':
    print('Hi, Alice.')
All flow control statements end with a colon and are followed by a new block of code (the clause). This if statement’s clause is the block with print('Hi, Alice.'). Figure 2-3 shows what a flowchart of this code would look like.
Figure 2-3. The flowchart for an if statement
else Statements
An if clause can optionally be followed by an else statement. The else clause is executed only when the if statement’s condition is False. In plain English, an else statement could be read as, “If this condition is true, execute this code. Or else, execute that code.” An else statement doesn’t have a condition, and in code, an else statement always consists of the following:
The else keyword
A colon
Starting on the next line, an indented block of code (called the else clause)
Returning to the Alice example, let’s look at some code that uses an else statement to offer a different greeting if the person’s name isn’t Alice.
if name == 'Alice':
    print('Hi, Alice.')
else:
    print('Hello, stranger.')
Figure 2-4 shows what a flowchart of this code would look like.
Figure 2-4. The flowchart for an else statement
elif Statements
While only one of the if or else clauses will execute, you may have a case where you want one of many possible clauses to execute. The elif statement is an “else if” statement that always follows an if or another elif statement. It provides another condition that is checked only if all of the previous conditions were False. In code, an elif statement always consists of the following:
The elif keyword
A condition (that is, an expression that evaluates to True or False)
A colon
Starting on the next line, an indented block of code (called the elif clause)
Let’s add an elif to the name checker to see this statement in action.
if name == 'Alice':
    print('Hi, Alice.')
elif age < 12:
    print('You are not Alice, kiddo.')
This time, you check the person’s age, and the program will tell them something different if they’re younger than 12. You can see the flowchart for this in Figure 2-5.
Figure 2-5. The flowchart for an elif statement
The elif clause executes if age < 12 is True and name == 'Alice' is False. However, if both of the conditions are False, then both of the clauses are skipped. It is not guaranteed that at least one of the clauses will be executed. When there is a chain of elif statements, only one or none of the clauses will be executed. Once one of the statements’ conditions is found to be True, the rest of the elif clauses are automatically skipped. For example, open a new file editor window and enter the following code, saving it as vampire.py:
if name == 'Alice':
    print('Hi, Alice.')
elif age < 12:
    print('You are not Alice, kiddo.')
elif age > 2000:
    print('Unlike you, Alice is not an undead, immortal vampire.')
elif age > 100:
    print('You are not Alice, grannie.')
Here I’ve added two more elif statements to make the name checker greet a person with different answers based on age. Figure 2-6 shows the flowchart for this.
Figure 2-6. The flowchart for multiple elif statements in the vampire.py program
The order of the elif statements does matter, however. Let’s rearrange them to introduce a bug. Remember that the rest of the elif clauses are automatically skipped once a True condition has been found, so if you swap around some of the clauses in vampire.py, you run into a problem. Change the code to look like the following, and save it as vampire2.py:
  if name == 'Alice':
      print('Hi, Alice.')
  elif age < 12:
      print('You are not Alice, kiddo.')
➊ elif age > 100:
      print('You are not Alice, grannie.')
  elif age > 2000:
      print('Unlike you, Alice is not an undead, immortal vampire.')
Say the age variable contains the value 3000 before this code is executed. You might expect the code to print the string 'Unlike you, Alice is not an undead, immortal vampire.'. However, because the age > 100 condition is True (after all, 3000 is greater than 100) ➊, the string 'You are not Alice, grannie.' is printed, and the rest of the elif statements are automatically skipped. Remember, at most only one of the clauses will be executed, and for elif statements, the order matters!
Figure 2-7 shows the flowchart for the previous code. Notice how the diamonds for age > 100 and age > 2000 are swapped.
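To see the ordering bug without editing files, the two elif chains can be wrapped in functions and called with the same values. This is only a sketch: the function names greet_v1 and greet_v2, the return statements, and the sample name 'Carol' are assumptions added for illustration (functions have not been introduced at this point in the text). The conditions and messages are those of vampire.py and vampire2.py.

```python
def greet_v1(name, age):
    """Conditions in the vampire.py order (age > 2000 checked before age > 100)."""
    if name == 'Alice':
        return 'Hi, Alice.'
    elif age < 12:
        return 'You are not Alice, kiddo.'
    elif age > 2000:
        return 'Unlike you, Alice is not an undead, immortal vampire.'
    elif age > 100:
        return 'You are not Alice, grannie.'
    return None  # no clause matched


def greet_v2(name, age):
    """Conditions in the vampire2.py order (age > 100 checked first)."""
    if name == 'Alice':
        return 'Hi, Alice.'
    elif age < 12:
        return 'You are not Alice, kiddo.'
    elif age > 100:
        return 'You are not Alice, grannie.'
    elif age > 2000:
        return 'Unlike you, Alice is not an undead, immortal vampire.'
    return None  # no clause matched


print(greet_v1('Carol', 3000))
print(greet_v2('Carol', 3000))
print(greet_v1('Carol', 50))
```

With an age of 3000 the two versions give different greetings, and with an age of 50 neither chain runs any clause, which is why the result is None.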
Optionally, you can have an else statement after the last elif statement. In that case, it is guaranteed that at least one (and only one) of the clauses will be executed. If the conditions in every if and elif statement are False, then the else clause is executed. For example, let’s re-create the Alice program to use if, elif, and else clauses.
if name == 'Alice':
    print('Hi, Alice.')
elif age < 12:
    print('You are not Alice, kiddo.')
else:
    print('You are neither Alice nor a little kid.')
Figure 2-8 shows the flowchart for this new code, which we’ll save as littleKid.py.
In plain English, this type of flow control structure would be, “If the first condition is true, do this. Else, if the second condition is true, do that. Otherwise, do something else.” When you use all three of these statements together, remember these rules about how to order them to avoid bugs like the one in Figure 2-7. First, there is always exactly one if statement. Any elif statements you need should follow the if statement. Second, if you want to be sure that at least one clause is executed, close the structure with an else statement.
Figure 2-7. The flowchart for the vampire2.py program. The crossed-out path will logically never happen, because if age were greater than 2000, it would have already been greater than 100.
Figure 2-8. Flowchart for the previous littleKid.py program
while Loop Statements
You can make a block of code execute over and over again with a while statement. The code in a while clause will be executed as long as the while statement’s condition is True. In code, a while statement always consists of the following:
The while keyword
A condition (that is, an expression that evaluates to True or False)
A colon
Starting on the next line, an indented block of code (called the while clause)
You can see that a while statement looks similar to an if statement. The difference is in how they behave. At the end of an if clause, the program execution continues after the if statement. But at the end of a while clause, the program execution jumps back to the start of the while statement. The while clause is often called the while loop or just the loop.
Let’s look at an if statement and a while loop that use the same condition and take the same actions based on that condition. Here is the code with an if statement:
spam = 0
if spam < 5:
    print('Hello, world.')
    spam = spam + 1
Here is the code with a while statement:
spam = 0
while spam < 5:
    print('Hello, world.')
    spam = spam + 1
These statements are similar: both if and while check the value of spam, and if it’s less than five, they print a message. But when you run these two code snippets, something very different happens for each one. For the if statement, the output is simply "Hello, world.". But for the while statement, it’s "Hello, world." repeated five times! Take a look at the flowcharts for these two pieces of code, Figure 2-9 and Figure 2-10, to see why this happens.
Figure 2-9. The flowchart for the if statement code
Figure 2-10. The flowchart for the while statement code
The code with the if statement checks the condition, and it prints Hello, world. only once if that condition is true. The code with the while loop, on the other hand, will print it five times. It stops after five prints because the integer in spam is incremented by one at the end of each loop iteration, which means that the loop will execute five times before spam < 5 is False.
In the while loop, the condition is always checked at the start of each iteration (that is, each time the loop is executed). If the condition is True, then the clause is executed, and afterward, the condition is checked again. The first time the condition is found to be False, the while clause is skipped.
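The difference between the two snippets can be made visible by counting how many times each clause runs. The counter names if_runs and while_runs are assumptions added for this sketch; the conditions are the ones used above.

```python
# if: the clause runs at most once.
spam = 0
if_runs = 0
if spam < 5:
    if_runs = if_runs + 1
    spam = spam + 1

# while: the clause runs again for as long as the condition stays True.
spam = 0
while_runs = 0
while spam < 5:
    while_runs = while_runs + 1
    spam = spam + 1

print(if_runs)
print(while_runs)
print(spam)
```

After the while loop finishes, spam holds 5, which is the first value for which spam < 5 is False.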
An Annoying while Loop
Here’s a small example program that will keep asking you to type, literally, your name. Select File ▸ New Window to open a new file editor window, enter the following code, and save the file as yourName.py:
➊ name = ''
➋ while name != 'your name':
      print('Please type your name.')
➌     name = input()
➍ print('Thank you!')
First, the program sets the name variable ➊ to an empty string. This is so that the name != 'your name' condition will evaluate to True and the program execution will enter the while loop’s clause ➋.
The code inside this clause asks the user to type their name, which is assigned to the name variable ➌. Since this is the last line of the block, the execution moves back to the start of the while loop and reevaluates the condition. If the value in name is not equal to the string 'your name', then the condition is True, and the execution enters the while clause again.
But once the user types your name, the condition of the while loop will be 'your name' != 'your name', which evaluates to False. The condition is now False, and instead of the program execution reentering the while loop’s clause, it skips past it and continues running the rest of the program ➍. Figure 2-11 shows a flowchart for the yourName.py program.
Figure 2-11. A flowchart of the yourName.py program
Now, let’s see yourName.py in action. Press F5 to run it, and enter something other than your name a few times before you give the program what it wants.
Please type your name.
Al
Please type your name.
Albert
Please type your name.
%#@#%*(^&!!!
Please type your name.
your name
Thank you!
If you never enter your name, then the while loop’s condition will never be False, and the program will just keep asking forever. Here, the input() call lets the user enter the right string to make the program move on. In other programs, the condition might never actually change, and that can be a problem. Let’s look at how you can break out of a while loop.
break Statements
There is a shortcut to getting the program execution to break out of a while loop’s clause early. If the execution reaches a break statement, it immediately exits the while loop’s clause. In code, a break statement simply contains the break keyword.
Pretty simple, right? Here’s a program that does the same thing as the previous program, but it uses a break statement to escape the loop. Enter the following code, and save the file as yourName2.py:
➊ while True:
      print('Please type your name.')
➋     name = input()
➌     if name == 'your name':
➍         break
➎ print('Thank you!')
The first line ➊ creates an infinite loop; it is a while loop whose condition is always True. (The expression True, after all, always evaluates down to the value True.) The program execution will always enter the loop and will exit it only when a break statement is executed. (An infinite loop that never exits is a common programming bug.)
Just like before, this program asks the user to type your name ➋. Now, however, while the execution is still inside the while loop, an if statement gets executed ➌ to check whether name is equal to your name. If this condition is True, the break statement is run ➍, and the execution moves out of the loop to print('Thank you!') ➎. Otherwise, the if statement’s clause with the break statement is skipped, which puts the execution at the end of the while loop. At this point, the program execution jumps back to the start of the while statement ➊ to recheck the condition. Since this condition is merely the True Boolean value, the execution enters the loop to ask the user to type your name again. See Figure 2-12 for the flowchart of this program.
Run yourName2.py, and enter the same text you entered for yourName.py. The rewritten program should respond in the same way as the original.
Figure 2-12. The flowchart for the yourName2.py program with an infinite loop. Note that the X path will logically never happen because the loop condition is always True.
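A hedged sketch of the same loop that can run without a keyboard. The list typed and the counters position and prompts are assumptions that stand in for successive input() calls (lists are not introduced in this chapter), so the control flow of yourName2.py can be traced deterministically.

```python
typed = ['Al', 'Albert', 'your name', 'never read']
position = 0
prompts = 0

while True:
    prompts = prompts + 1      # stands in for print('Please type your name.')
    name = typed[position]     # stands in for name = input()
    position = position + 1
    if name == 'your name':
        break                  # leave the loop immediately
print('Thank you!')
print(prompts)
```

The last entry in typed is never consumed, because break exits the loop as soon as the third entry matches.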
continue Statements
Like break statements, continue statements are used inside loops. When the program execution reaches a continue statement, the program execution immediately jumps back to the start of the loop and reevaluates the loop’s condition. (This is also what happens when the execution reaches the end of the loop.)
TRAPPED IN AN INFINITE LOOP?
If you ever run a program that has a bug causing it to get stuck in an infinite loop, press CTRL-C. This will send a KeyboardInterrupt error to your program and cause it to stop immediately. To try it, create a simple infinite loop in the file editor, and save it as infiniteloop.py.
while True:
    print('Hello world!')
When you run this program, it will print Hello world! to the screen forever, because the while statement’s condition is always True. In IDLE’s interactive shell window, there are only two ways to stop this program: press CTRL-C or select Shell ▸ Restart Shell from the menu. CTRL-C is handy if you ever want to terminate your program immediately, even if it’s not stuck in an infinite loop.
Let’s use continue to write a program that asks for a name and password. Enter the following code into a new file editor window and save the program as swordfish.py.
  while True:
      print('Who are you?')
      name = input()
➊     if name != 'Joe':
➋         continue
      print('Hello, Joe. What is the password? (It is a fish.)')
➌     password = input()
      if password == 'swordfish':
➍         break
➎ print('Access granted.')
If the user enters any name besides Joe ➊, the continue statement ➋ causes the program execution to jump back to the start of the loop. When it reevaluates the condition, the execution will always enter the loop, since the condition is simply the value True. Once they make it past that if statement, the user is asked for a password ➌. If the password entered is swordfish, then the break statement ➍ is run, and the execution jumps out of the while loop to print Access granted. ➎. Otherwise, the execution continues to the end of the while loop, where it then jumps back to the start of the loop. See Figure 2-13 for this program’s flowchart.
Figure 2-13. A flowchart for swordfish.py. The X path will logically never happen because the loop condition is always True.
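The control flow of swordfish.py can also be traced with scripted answers instead of keyboard input. In this sketch the list typed and the counters position and password_prompts are assumptions standing in for the input() and print() calls; the conditions are the ones from the program.

```python
typed = ['Mary', 'Joe', 'tuna', 'Joe', 'swordfish']
position = 0
password_prompts = 0

while True:
    name = typed[position]                   # stands in for name = input()
    position = position + 1
    if name != 'Joe':
        continue                             # skip the rest of this iteration
    password_prompts = password_prompts + 1  # stands in for the password question
    password = typed[position]               # stands in for password = input()
    position = position + 1
    if password == 'swordfish':
        break
print('Access granted.')
print(password_prompts)
```

The first answer triggers continue, so no password is requested for it; the password question is reached twice, once for each time the name Joe is given.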
“TRUTHY” AND “FALSEY” VALUES
There are some values in other data types that conditions will consider equivalent to True and False. When used in conditions, 0, 0.0, and '' (the empty string) are considered False, while all other integer, floating-point, and string values are considered True. For example, look at the following program:
  name = ''
➊ while not name:
      print('Enter your name:')
      name = input()
  print('How many guests will you have?')
  numOfGuests = int(input())
➋ if numOfGuests:
➌     print('Be sure to have enough room for all your guests.')
  print('Done')
If the user enters a blank string for name, then the while statement’s condition will be True ➊, and the program continues to ask for a name. If the value for numOfGuests is not 0 ➋, then the condition is considered to be True, and the program will print a reminder for the user ➌.
You could have typed name == '' instead of not name, and numOfGuests != 0 instead of numOfGuests, but using the truthy and falsey values can make your code easier to read.
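The built-in bool() function converts a value to the Boolean that a condition would see, which gives a quick way to check the rule above. This is a small sketch; the helper name describe and the sample values are assumptions introduced for illustration.

```python
def describe(value):
    """Return 'truthy' or 'falsey' depending on how a condition treats value."""
    if value:
        return 'truthy'
    return 'falsey'


for value in [0, 0.0, '', 42, -1, 0.5, 'hello', ' ']:
    print(repr(value), bool(value), describe(value))
```

Note that a string containing a single space is not empty, so it counts as truthy.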
Run swordfish.py and give it some input. Until you claim to be Joe, it shouldn’t ask for a password, and once you enter the correct password, it should exit.
Who are you?
I'm fine, thanks. Who are you?
Who are you?
Joe
Hello, Joe. What is the password? (It is a fish.)
Mary
Who are you?
Joe
Hello, Joe. What is the password? (It is a fish.)
swordfish
Access granted.
for Loops and the range() Function
The while loop keeps looping while its condition is True (which is the reason for its name), but what if you want to execute a block of code only a certain number of times? You can do this with a for loop statement and the range() function.
In code, a for statement looks something like for i in range(5): and always includes the following:
The for keyword
A variable name
The in keyword
A call to the range() function with up to three integers passed to it
A colon
Starting on the next line, an indented block of code (called the for clause)
Let’s create a new program called fiveTimes.py to help you see a for loop in action.
print('My name is')
for i in range(5):
    print('Jimmy Five Times (' + str(i) + ')')
The code in the for loop’s clause is run five times. The first time it is run, the variable i is set to 0. The print() call in the clause will print Jimmy Five Times (0). After Python finishes an iteration through all the code inside the for loop’s clause, the execution goes back to the top of the loop, and the for statement increments i by one. This is why range(5) results in five iterations through the clause, with i being set to 0, then 1, then 2, then 3, and then 4. The variable i will go up to, but will not include, the integer passed to range(). Figure 2-14 shows a flowchart for the fiveTimes.py program.
Figure 2-14. The flowchart for fiveTimes.py
When you run this program, it should print Jimmy Five Times followed by the value of i five times before leaving the for loop.
My name is
Jimmy Five Times (0)
Jimmy Five Times (1)
Jimmy Five Times (2)
Jimmy Five Times (3)
Jimmy Five Times (4)
NOTE
You can use break and continue statements inside for loops as well. The continue statement will continue to the next value of the for loop’s counter, as if the program execution had reached the end of the loop and returned to the start. In fact, you can use continue and break statements only inside while and for loops. If you try to use these statements elsewhere, Python will give you an error.
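A brief sketch of both statements inside a for loop. The variable name odd_total and the particular conditions are assumptions chosen for illustration: continue skips the even values of i, and break ends the loop at the first odd value above 7.

```python
odd_total = 0
for i in range(10):
    if i % 2 == 0:
        continue                   # even number: move on to the next value of i
    if i > 7:
        break                      # stop the loop entirely
    odd_total = odd_total + i      # reached only for 1, 3, 5, and 7
print(odd_total)
print(i)
```

After the loop, i still holds the value it had when break ran.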
As another for loop example, consider this story about the mathematician Karl Friedrich Gauss. When Gauss was a boy, a teacher wanted to give the class some busywork. The teacher told them to add up all the numbers from 0 to 100. Young Gauss came up with a clever trick to figure out the answer in a few seconds, but you can write a Python program with a for loop to do this calculation for you.
➊ total = 0
➋ for num in range(101):
➌     total = total + num
➍ print(total)
The result should be 5,050. When the program first starts, the total variable is set to 0 ➊. The for loop ➋ then executes total = total + num ➌ 101 times, once for each value of num from 0 to 100. By the time the loop has finished all of its 101 iterations, every integer from 0 to 100 will have been added to total. At this point, total is printed to the screen ➍. Even on the slowest computers, this program takes less than a second to complete.
(Young Gauss figured out that there were 50 pairs of numbers that added up to 100: 0 + 100, 1 + 99, 2 + 98, 3 + 97, and so on, until 49 + 51. Since 50 × 100 is 5,000, when you add that middle 50, the sum of all the numbers from 0 to 100 is 5,050. Clever kid!)
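Gauss’s pairing trick generalizes to the well-known formula n × (n + 1) / 2 for the sum of the integers from 0 to n. As a rough check, the sketch below runs the same for loop and compares it with that formula. The variable names n and formula are made up for this illustration and are not part of the original program.

```python
n = 100

# Add up every integer from 0 to n with a for loop.
total = 0
for num in range(n + 1):
    total = total + num

# The closed-form shortcut; // is integer division.
formula = n * (n + 1) // 2

print(total)
print(formula)
print(total == formula)
```

Changing n to another non-negative integer is an easy way to convince yourself that the loop and the formula keep agreeing.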
An Equivalent while Loop
You can actually use a while loop to do the same thing as a for loop; for loops are just more concise. Let’s rewrite fiveTimes.py to use a while loop equivalent of a for loop.
print('My name is')
i = 0
while i < 5:
    print('Jimmy Five Times (' + str(i) + ')')
    i = i + 1
If you run this program, the output should look the same as the fiveTimes.py program, which uses a for loop.
The Starting, Stopping, and Stepping Arguments to range()
Some functions can be called with multiple arguments separated by a comma, and range() is one of them. This lets you change the integer passed to range() to follow any sequence of integers, including starting at a number other than zero.
for i in range(12, 16):
    print(i)
The first argument will be where the for loop’s variable starts, and the second argument will be up to, but not including, the number to stop at.
12
13
14
15
The range() function can also be called with three arguments. The first two arguments will be the start and stop values, and the third will be the step argument. The step is the amount that the variable is increased by after each iteration.
for i in range(0, 10, 2):
    print(i)
So calling range(0, 10, 2) will count from zero to eight by intervals of two.
0
2
4
6
8
The range() function is flexible in the sequence of numbers it produces for for loops. For example (I never apologize for my puns), you can even use a negative number for the step argument to make the for loop count down instead of up.
for i in range(5, -1, -1):
    print(i)
Running this loop prints the integers from five down to zero.
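One quick way to see which numbers a range() call produces, without writing a loop, is to pass it to the built-in list() function. Lists are not covered until Chapter 4, so treat this as a preview rather than one of the chapter’s own examples.

```python
# One, two, and three arguments: stop; start and stop; start, stop, and step.
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(12, 16)))     # [12, 13, 14, 15]
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]
print(list(range(5, -1, -1)))  # [5, 4, 3, 2, 1, 0]
```

In every case the stop value itself is left out, which is why counting down to zero needs a stop value of -1.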
Importing Modules
All Python programs can call a basic set of functions called built-in functions, including the print(), input(), and len() functions you’ve seen before. Python also comes with a set of modules called the standard library. Each module is a Python program that contains a related group of functions that can be embedded in your programs. For example, the math module has mathematics-related functions, the random module has random number–related functions, and so on.
Before you can use the functions in a module, you must import the module with an import statement. In code, an import statement consists of the following:
The import keyword
The name of the module
Optionally, more module names, as long as they are separated by commas
Once you import a module, you can use all the cool functions of that module. Let’s give it a try with the random module, which will give us access to the random.randint() function.
Enter this code into the file editor, and save it as printRandom.py:
import random
for i in range(5):
    print(random.randint(1, 10))
When you run this program, the output will look something like this:
4
1
8
4
1
The random.randint() function call evaluates to a random integer value between the two integers that you pass it. Since randint() is in the random module, you must first type random. in front of the function name to tell Python to look for this function inside the random module.
Here’s an example of an import statement that imports four different modules:
import random, sys, os, math
Now we can use any of the functions in these four modules. We’ll learn more about them later in the book.
from import Statements
An alternative form of the import statement is composed of the from keyword, followed by the module name, the import keyword, and a star; for example, from random import *.
With this form of import statement, calls to functions in random will not need the random. prefix. However, using the full name makes for more readable code, so it is better to use the normal form of the import statement.
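To make the readability point concrete, here is a small sketch of the same call written both ways. It uses from random import randint, which imports a single name instead of the star form described above; that variant is standard Python but is an addition to what this section covers.

```python
import random
# The random. prefix shows at a glance where randint() comes from.
print(random.randint(1, 10))

from random import randint
# No prefix is needed here, but the function's origin is less obvious.
print(randint(1, 10))
```

Both calls do the same thing; the only difference is how much the line tells a reader about where the function lives.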
Ending a Program Early with sys.exit()
The last flow control concept to cover is how to terminate the program. This always happens if the program execution reaches the bottom of the instructions. However, you can cause the program to terminate, or exit, by calling the sys.exit() function. Since this function is in the sys module, you have to import sys before your program can use it.
Open a new file editor window and enter the following code, saving it as exitExample.py:
import sys
while True:
    print('Type exit to exit.')
    response = input()
    if response == 'exit':
        sys.exit()
    print('You typed ' + response + '.')
Run this program in IDLE. This program has an infinite loop with no break statement inside. The only way this program will end is if the user enters exit, causing sys.exit() to be called. When response is equal to exit, the program ends. Since the response variable is set by the input() function, the user must enter exit in order to stop the program.
Summary
By using expressions that evaluate to True or False (also called conditions), you can write programs that make decisions on what code to execute and what code to skip. You can also execute code over and over again in a loop while a certain condition evaluates to True. The break and continue statements are useful if you need to exit a loop or jump back to the start.
These flow control statements will let you write much more intelligent programs. There’s another type of flow control that you can achieve by writing your own functions, which is the topic of the next chapter.
Practice Questions
Q: 1. What are the two values of the Boolean data type? How do you write them?
Q: 2. What are the three Boolean operators?
Q: 3. Write out the truth tables of each Boolean operator (that is, every possible combination of Boolean values for the operator and what they evaluate to).
Q: 4. What do the following expressions evaluate to?
(5 > 4) and (3 == 5)
not (5 > 4)
(5 > 4) or (3 == 5)
not ((5 > 4) or (3 == 5))
(True and True) and (True == False)
(not False) or (not True)
Q: 5. What are the six comparison operators?
Q: 6. What is the difference between the equal to operator and the assignment operator?
Q: 7. Explain what a condition is and where you would use one.
Q: 8. Identify the three blocks in this code:
spam = 0
if spam == 10:
    print('eggs')
    if spam > 5:
        print('bacon')
    else:
        print('ham')
    print('spam')
print('spam')
Q: 9. Write code that prints Hello if 1 is stored in spam, prints Howdy if 2 is stored in spam, and prints Greetings! if anything else is stored in spam.
Q: 10. What can you press if your program is stuck in an infinite loop?
Q: 11. What is the difference between break and continue?
Q: 12. What is the difference between range(10), range(0, 10), and range(0, 10, 1) in a for loop?
Q: 13. Write a short program that prints the numbers 1 to 10 using a for loop. Then write an equivalent program that prints the numbers 1 to 10 using a while loop.
Q: 14. If you had a function named bacon() inside a module named spam, how would you call it after importing spam?
Extra credit: Look up the round() and abs() functions on the Internet, and find out what they do. Experiment with them in the interactive shell.
Chapter 3. Functions
You’re already familiar with the print(), input(), and len() functions from the previous chapters. Python provides several built-in functions like these, but you can also write your own functions. A function is like a mini-program within a program.
To better understand how functions work, let’s create one. Type this program into the file editor and save it as helloFunc.py:
➊ def hello():
➋     print('Howdy!')
      print('Howdy!!!')
      print('Hello there.')
➌ hello()
  hello()
  hello()
The first line is a def statement ➊, which defines a function named hello(). The code in the block that follows the def statement ➋ is the body of the function. This code is executed when the function is called, not when the function is first defined.
The hello() lines after the function ➌ are function calls. In code, a function call is just the function’s name followed by parentheses, possibly with some number of arguments in between the parentheses. When the program execution reaches these calls, it will jump to the top line in the function and begin executing the code there. When it reaches the end of the function, the execution returns to the line that called the function and continues moving through the code as before.
Since this program calls hello() three times, the code in the hello() function is executed three times. When you run this program, the output looks like this:
Howdy!
Howdy!!!
Hello there.
Howdy!
Howdy!!!
Hello there.
Howdy!
Howdy!!!
Hello there.
A major purpose of functions is to group code that gets executed multiple times. Without a function defined, you would have to copy and paste this code each time, and the program would look like this:
print('Howdy!')
print('Howdy!!!')
print('Hello there.')
print('Howdy!')
print('Howdy!!!')
print('Hello there.')
print('Howdy!')
print('Howdy!!!')
print('Hello there.')
In general, you always want to avoid duplicating code, because if you ever decide to update the code (if, for example, you find a bug you need to fix), you’ll have to remember to change the code everywhere you copied it.
As you get more programming experience, you’ll often find yourself deduplicating code, which means getting rid of duplicated or copy-and-pasted code. Deduplication makes your
def Statements with Parameters
When you call the print() or len() function, you pass in values, called arguments in this context, by typing them between the parentheses. You can also define your own functions that accept arguments. Type this example into the file editor and save it as helloFunc2.py:
➊ def hello(name):
➋     print('Hello ' + name)
➌ hello('Alice')
  hello('Bob')
When you run this program, the output looks like this:
Hello Alice
Hello Bob
The definition of the hello() function in this program has a parameter called name ➊. A parameter is a variable that an argument is stored in when a function is called. The first time the hello() function is called, it’s with the argument 'Alice' ➌. The program execution enters the function, and the variable name is automatically set to 'Alice', which is what gets printed by the print() call ➋.
One special thing to note about parameters is that the value stored in a parameter is forgotten when the function returns. For example, if you added print(name) after hello('Bob') in the previous program, the program would give you a NameError because there is no variable named name. This variable was destroyed after the function call hello('Bob') had returned, so print(name) would refer to a name variable that does not exist.
This is similar to how a program’s variables are forgotten when the program terminates. I’ll talk more about why that happens later in the chapter, when I discuss what a function’s local scope is.
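The following sketch shows the forgotten parameter in action. It borrows the try and except statements covered in the “Exception Handling” section later in this chapter so that the NameError is reported instead of crashing the program. The gotNameError variable is a hypothetical flag added only for this illustration.

```python
def hello(name):
    print('Hello ' + name)

hello('Bob')

gotNameError = False
try:
    # name existed only while hello() was running, so this line fails.
    print(name)
except NameError as err:
    gotNameError = True
    print('NameError:', err)

print(gotNameError)
```

Without the try and except lines, the program would stop with a traceback at print(name).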
Return Values and return Statements
When you call the len() function and pass it an argument such as 'Hello', the function call evaluates to the integer value 5, which is the length of the string you passed it. In general, the value that a function call evaluates to is called the return value of the function.
When creating a function using the def statement, you can specify what the return value should be with a return statement. A return statement consists of the following:
The return keyword
The value or expression that the function should return
When an expression is used with a return statement, the return value is what this expression evaluates to. For example, the following program defines a function that returns a different string depending on what number it is passed as an argument. Type this code into the file editor and save it as magic8Ball.py:
➊ import random
➋ def getAnswer(answerNumber):
➌     if answerNumber == 1:
          return 'It is certain'
      elif answerNumber == 2:
          return 'It is decidedly so'
      elif answerNumber == 3:
          return 'Yes'
      elif answerNumber == 4:
          return 'Reply hazy try again'
      elif answerNumber == 5:
          return 'Ask again later'
      elif answerNumber == 6:
          return 'Concentrate and ask again'
      elif answerNumber == 7:
          return 'My reply is no'
      elif answerNumber == 8:
          return 'Outlook not so good'
      elif answerNumber == 9:
          return 'Very doubtful'
➍ r = random.randint(1, 9)
➎ fortune = getAnswer(r)
➏ print(fortune)
When this program starts, Python first imports the random module ➊. Then the getAnswer() function is defined ➋. Because the function is being defined (and not called), the execution skips over the code in it. Next, the random.randint() function is called with two arguments, 1 and 9 ➍. It evaluates to a random integer between 1 and 9 (including 1 and 9 themselves), and this value is stored in a variable named r.
The getAnswer() function is called with r as the argument ➎. The program execution moves to the top of the getAnswer() function ➌, and the value r is stored in a parameter named answerNumber. Then, depending on this value in answerNumber, the function returns one of many possible string values. The program execution returns to the line at the bottom of the program that originally called getAnswer() ➎. The returned string is assigned to a variable named fortune, which then gets passed to a print() call ➏ and is printed to the screen.
Note that since you can pass return values as an argument to another function call, you could shorten these three lines:
r = random.randint(1, 9)
fortune = getAnswer(r)
print(fortune)
to this single equivalent line:
print(getAnswer(random.randint(1, 9)))
Remember, expressions are composed of values and operators. A function call can be used in an expression because it evaluates to its return value.
The None Value
In Python there is a value called None, which represents the absence of a value. None is the only value of the NoneType data type. (Other programming languages might call this value null, nil, or undefined.) Just like the Boolean True and False values, None must be typed with a capital N.
This value-without-a-value can be helpful when you need to store something that won’t be confused for a real value in a variable. One place where None is used is as the return value of print(). The print() function displays text on the screen, but it doesn’t need to return anything in the same way len() or input() does. But since all function calls need to evaluate to a return value, print() returns None. To see this in action, enter the following into the interactive shell:
>>> spam = print('Hello!')
Hello!
>>> None == spam
True
Behind the scenes, Python adds return None to the end of any function definition with no return statement. This is similar to how a while or for loop implicitly ends with a continue statement. Also, if you use a return statement without a value (that is, just the return keyword by itself), then None is returned.
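Both cases described above, a function with no return statement and a function with a bare return, can be checked with a short sketch. The function names sayHi() and doNothing() are invented for this example. The comparison uses is None, which is the conventional way to test for None in Python; None == result1 would give the same answer here.

```python
def sayHi():
    print('Hi')   # no return statement, so None is returned implicitly

def doNothing():
    return        # a bare return also returns None

result1 = sayHi()
result2 = doNothing()

print(result1 is None)
print(result2 is None)
```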
Keyword Arguments and print()
Most arguments are identified by their position in the function call. For example, random.randint(1, 10) is different from random.randint(10, 1). The function call random.randint(1, 10) will return a random integer between 1 and 10, because the first argument is the low end of the range and the second argument is the high end (while random.randint(10, 1) causes an error).
However, keyword arguments are identified by the keyword put before them in the function call. Keyword arguments are often used for optional parameters. For example, the print() function has the optional parameters end and sep to specify what should be printed at the end of its arguments and between its arguments (separating them), respectively.
If you ran the following program:
print('Hello')
print('World')
the output would look like this:
Hello
World
The two strings appear on separate lines because the print() function automatically adds a newline character to the end of the string it is passed. However, you can set the end keyword argument to change this to a different string. For example, if the program were this:
print('Hello', end='')
print('World')
the output would look like this:
HelloWorld
The output is printed on a single line because there is no longer a newline printed after 'Hello'. Instead, the blank string is printed. This is useful if you need to disable the newline that gets added to the end of every print() function call.
Similarly, when you pass multiple string values to print(), the function will automatically separate them with a single space. Enter the following into the interactive shell:
>>> print('cats', 'dogs', 'mice')
cats dogs mice
But you could replace the default separating string by passing the sep keyword argument. Enter the following into the interactive shell:
>>> print('cats', 'dogs', 'mice', sep=',')
cats,dogs,mice
You can add keyword arguments to the functions you write as well, but first you’ll have to learn about the list and dictionary data types in the next two chapters. For now, just know that some functions have optional keyword arguments that can be specified when the function is called.
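The end and sep keyword arguments can also be combined in a single call. The sketch below is one illustrative combination; the particular separator and ending strings are arbitrary choices, not something the text prescribes.

```python
# sep goes between the arguments; end replaces the usual trailing newline.
print('cats', 'dogs', 'mice', sep=', ', end='!\n')

# Using a space for end keeps the next print() on the same line.
print('Hello', end=' ')
print('World')
```

The first call prints cats, dogs, mice! and the last two calls together print Hello World on one line.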
Local and Global Scope
Parameters and variables that are assigned in a called function are said to exist in that function’s local scope. Variables that are assigned outside all functions are said to exist in the global scope. A variable that exists in a local scope is called a local variable, while a variable that exists in the global scope is called a global variable. A variable must be one or the other; it cannot be both local and global.
Think of a scope as a container for variables. When a scope is destroyed, all the values stored in the scope’s variables are forgotten. There is only one global scope, and it is created when your program begins. When your program terminates, the global scope is destroyed, and all its variables are forgotten. Otherwise, the next time you ran your program, the variables would remember their values from the last time you ran it.
A local scope is created whenever a function is called. Any variables assigned in this function exist within the local scope. When the function returns, the local scope is destroyed, and these variables are forgotten. The next time you call this function, the local variables will not remember the values stored in them from the last time the function was called.
Scopes matter for several reasons:
Code in the global scope cannot use any local variables.
However, a local scope can access global variables.
Code in a function’s local scope cannot use variables in any other local scope.
You can use the same name for different variables if they are in different scopes. That is, there can be a local variable named spam and a global variable also named spam.
The reason Python has different scopes instead of just making everything a global variable is so that when variables are modified by the code in a particular call to a function, the function interacts with the rest of the program only through its parameters and the return value. This narrows down the number of lines of code that may be causing a bug. If your program contained nothing but global variables and had a bug because of a variable being set to a bad value, then it would be hard to track down where this bad value was set. It could have been set from anywhere in the program, and your program could be hundreds or thousands of lines long! But if the bug is because of a local variable with a bad value, you know that only the code in that one function could have set it incorrectly.
While using global variables in small programs is fine, it is a bad habit to rely on global variables as your programs get larger and larger.
Local Variables Cannot Be Used in the Global Scope
Consider this program, which will cause an error when you run it:
def spam():
    eggs = 31337
spam()
print(eggs)
If you run this program, the output will look like this:
Traceback (most recent call last):
  File "C:/test3784.py", line 4, in <module>
    print(eggs)
NameError: name 'eggs' is not defined
The error happens because the eggs variable exists only in the local scope created when spam() is called. Once the program execution returns from spam(), that local scope is destroyed, and there is no longer a variable named eggs. So when your program tries to run print(eggs), Python gives you an error saying that eggs is not defined. This makes sense if you think about it; when the program execution is in the global scope, no local scopes exist, so there can’t be any local variables. This is why only global variables can be used in the global scope.
Local Scopes Cannot Use Variables in Other Local Scopes
A new local scope is created whenever a function is called, including when a function is called from another function. Consider this program:
  def spam():
➊     eggs = 99
➋     bacon()
➌     print(eggs)
  def bacon():
      ham = 101
➍     eggs = 0
➎ spam()
When the program starts, the spam() function is called ➎, and a local scope is created. The local variable eggs ➊ is set to 99. Then the bacon() function is called ➋, and a second local scope is created. Multiple local scopes can exist at the same time. In this new local scope, the local variable ham is set to 101, and a local variable eggs (which is different from the one in spam()’s local scope) is also created ➍ and set to 0.
When bacon() returns, the local scope for that call is destroyed. The program execution continues in the spam() function to print the value of eggs ➌, and since the local scope for the call to spam() still exists here, the eggs variable is set to 99. This is what the program prints.
The upshot is that local variables in one function are completely separate from the local variables in another function.
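Because one function cannot reach into another function’s local variables, the usual way to share a value between them is to pass it in as an argument and hand the result back with return. The sketch below reworks the spam() and bacon() names from the program above to show this; the newEggs variable and the modified function bodies are assumptions made for the illustration.

```python
def bacon(eggs):
    # This eggs is bacon()'s own parameter; changing it does not
    # touch the eggs variable inside spam().
    eggs = eggs + 1
    return eggs

def spam():
    eggs = 99
    newEggs = bacon(eggs)
    print(eggs)      # spam()'s eggs is unchanged
    print(newEggs)   # the value bacon() handed back

spam()
```

Running this prints 99 and then 100: bacon() worked on its own copy of the name, and spam() only learned the new value through the return value.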
Global Variables Can Be Read from a Local Scope
Consider the following program:
def spam():
    print(eggs)
eggs = 42
spam()
print(eggs)
Since there is no parameter named eggs or any code that assigns eggs a value in the spam() function, when eggs is used in spam(), Python considers it a reference to the global variable eggs. This is why 42 is printed when the previous program is run.
Local and Global Variables with the Same Name
To simplify your life, avoid using local variables that have the same name as a global variable or another local variable. But technically, it’s perfectly legal to do so in Python. To see what happens, type the following code into the file editor and save it as sameName.py:
  def spam():
➊     eggs = 'spam local'
      print(eggs) # prints 'spam local'
  def bacon():
➋     eggs = 'bacon local'
      print(eggs) # prints 'bacon local'
      spam()
      print(eggs) # prints 'bacon local'
➌ eggs = 'global'
  bacon()
  print(eggs) # prints 'global'
When you run this program, it outputs the following:
bacon local
spam local
bacon local
global
There are actually three different variables in this program, but confusingly they are all named eggs. The variables are as follows:
➊ A variable named eggs that exists in a local scope when spam() is called.
➋ A variable named eggs that exists in a local scope when bacon() is called.
➌ A variable named eggs that exists in the global scope.
Since these three separate variables all have the same name, it can be confusing to keep track of which one is being used at any given time. This is why you should avoid using the same variable name in different scopes.
The global Statement
If you need to modify a global variable from within a function, use the global statement. If you have a line such as global eggs at the top of a function, it tells Python, “In this function, eggs refers to the global variable, so don’t create a local variable with this name.” For example, type the following code into the file editor and save it as sameName2.py:
  def spam():
➊     global eggs
➋     eggs = 'spam'
  eggs = 'global'
  spam()
  print(eggs)
When you run this program, the final print() call will output this:
spam
Because eggs is declared global at the top of spam() ➊, when eggs is set to 'spam' ➋, this assignment is done to the globally scoped eggs. No local eggs variable is created.
There are four rules to tell whether a variable is in a local scope or global scope:
1. If a variable is being used in the global scope (that is, outside of all functions), then it is always a global variable.
2. If there is a global statement for that variable in a function, it is a global variable.
3. Otherwise, if the variable is used in an assignment statement in the function, it is a local variable.
4. But if the variable is not used in an assignment statement, it is a global variable.
To get a better feel for these rules, here’s an example program. Type the following code into the file editor and save it as sameName3.py:
  def spam():
➊     global eggs
      eggs = 'spam' # this is the global
  def bacon():
➋     eggs = 'bacon' # this is a local
  def ham():
➌     print(eggs) # this is the global
  eggs = 42 # this is the global
  spam()
  print(eggs)
In the spam() function, eggs is the global eggs variable, because there’s a global statement for eggs at the beginning of the function ➊. In bacon(), eggs is a local variable, because there’s an assignment statement for it in that function ➋. In ham() ➌, eggs is the global variable, because there is no assignment statement or global statement for it in that function. If you run sameName3.py, the output will look like this:
spam
In a function, a variable will either always be global or always be local. There’s no way that the code in a function can use a local variable named eggs and then later in that same function use the global eggs variable.
NOTE
If you ever want to modify the value stored in a global variable from in a function, you must use a global statement on that variable.
If you try to use a local variable in a function before you assign a value to it, as in the following program, Python will give you an error. To see this, type the following into the file editor and save it as sameName4.py:
  def spam():
      print(eggs) # ERROR!
➊     eggs = 'spam local'
➋ eggs = 'global'
  spam()
If you run the previous program, it produces an error message.
Traceback (most recent call last):
  File "C:/test3784.py", line 6, in <module>
    spam()
  File "C:/test3784.py", line 2, in spam
    print(eggs) # ERROR!
UnboundLocalError: local variable 'eggs' referenced before assignment
This error happens because Python sees that there is an assignment statement for eggs in the spam() function ➊ and therefore considers eggs to be local. But because print(eggs) is executed before eggs is assigned anything, the local variable eggs doesn’t exist. Python will not fall back to using the global eggs variable ➋.
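Following the four rules above, there are two ways to write a function that avoids this error: leave out the assignment so the name is read as a global, or declare the name global before assigning to it. The sketch below shows both. The function names readOnly() and modifyGlobal() are invented for this example.

```python
eggs = 'global'

def readOnly():
    # No assignment to eggs in this function, so eggs is the global.
    print(eggs)

def modifyGlobal():
    # The global statement makes both the read and the assignment
    # below refer to the global eggs.
    global eggs
    print(eggs)
    eggs = 'changed by modifyGlobal()'

readOnly()
modifyGlobal()
print(eggs)
```

This prints global twice and then changed by modifyGlobal(), with no UnboundLocalError, because neither function has a local variable named eggs.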
FUNCTIONS AS “BLACK BOXES”
Often, all you need to know about a function are its inputs (the parameters) and output value; you don’t always have to burden yourself with how the function’s code actually works. When you think about functions in this high-level way, it’s common to say that you’re treating the function as a “black box.”
This idea is fundamental to modern programming. Later chapters in this book will show you several modules with functions that were written by other people. While you can take a peek at the source code if you’re curious, you don’t need to know how these functions work in order to use them. And because writing functions without global variables is encouraged, you usually don’t have to worry about the function’s code interacting with the rest of your program.
Exception Handling
Right now, getting an error, or exception, in your Python program means the entire program will crash. You don’t want this to happen in real-world programs. Instead, you want the program to detect errors, handle them, and then continue to run.
For example, consider the following program, which has a “divide-by-zero” error. Open a new file editor window and enter the following code, saving it as zeroDivide.py:
def spam(divideBy):
    return 42 / divideBy

print(spam(2))
print(spam(12))
print(spam(0))
print(spam(1))
We’ve defined a function called spam, given it a parameter, and then printed the value of that function with various arguments to see what happens. This is the output you get when you run the previous code:
21.0
3.5
Traceback (most recent call last):
  File "C:/zeroDivide.py", line 6, in <module>
    print(spam(0))
  File "C:/zeroDivide.py", line 2, in spam
    return 42 / divideBy
ZeroDivisionError: division by zero
A ZeroDivisionError happens whenever you try to divide a number by zero. From the line number given in the error message, you know that the return statement in spam() is causing an error.
Errors can be handled with try and except statements. The code that could potentially have an error is put in a try clause. The program execution moves to the start of a following except clause if an error happens.
You can put the previous divide-by-zero code in a try clause and have an except clause contain code to handle what happens when this error occurs.
def spam(divideBy):
    try:
        return 42 / divideBy
    except ZeroDivisionError:
        print('Error: Invalid argument.')

print(spam(2))
print(spam(12))
print(spam(0))
print(spam(1))
When code in a try clause causes an error, the program execution immediately moves to the code in the except clause. After running that code, the execution continues as normal. The output of the previous program is as follows:
21.0
3.5
Error: Invalid argument.
None
42.0
Note that any errors that occur in function calls in a try block will also be caught. Consider the following program, which instead has the spam() calls in the try block:
def spam(divideBy):
    return 42 / divideBy

try:
    print(spam(2))
    print(spam(12))
    print(spam(0))
    print(spam(1))
except ZeroDivisionError:
    print('Error: Invalid argument.')
When this program is run, the output looks like this:
21.0
3.5
Error: Invalid argument.
The reason print(spam(1)) is never executed is because once the execution jumps to the code in the except clause, it does not return to the try clause. Instead, it just continues moving down as normal.
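If you want the remaining calls to run after an error, each call needs its own try clause. One way to sketch this is to put the try and except statements inside a loop, so that an error in one iteration does not skip the others. The range(-2, 3) values and the errors counter below are arbitrary choices made for this illustration.

```python
def spam(divideBy):
    return 42 / divideBy

errors = 0
for divideBy in range(-2, 3):   # -2, -1, 0, 1, 2
    try:
        print(spam(divideBy))
    except ZeroDivisionError:
        # Only the iteration where divideBy is 0 ends up here.
        errors = errors + 1
        print('Error: Invalid argument.')

print('Number of errors: ' + str(errors))
```

Here the loop carries on after the error, so the calls for 1 and 2 still print their results.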
A Short Program: Guess the Number
The toy examples I’ve shown you so far are useful for introducing basic concepts, but now let’s see how everything you’ve learned comes together in a more complete program. In this section, I’ll show you a simple “guess the number” game. When you run this program, the output will look something like this:
I am thinking of a number between 1 and 20.
Take a guess.
10
Your guess is too low.
Take a guess.
15
Your guess is too low.
Take a guess.
17
Your guess is too high.
Take a guess.
16
Good job! You guessed my number in 4 guesses!
Type the following source code into the file editor, and save the file as guessTheNumber.py:
# This is a guess the number game.
import random
secretNumber = random.randint(1, 20)
print('I am thinking of a number between 1 and 20.')
# Ask the player to guess 6 times.
for guessesTaken in range(1, 7):
    print('Take a guess.')
    guess = int(input())
    if guess < secretNumber:
        print('Your guess is too low.')
    elif guess > secretNumber:
        print('Your guess is too high.')
    else:
        break    # This condition is the correct guess!
if guess == secretNumber:
    print('Good job! You guessed my number in ' + str(guessesTaken) + ' guesses!')
else:
    print('Nope. The number I was thinking of was ' + str(secretNumber))
Let’s look at this code line by line, starting at the top.
# This is a guess the number game.
import random
secretNumber = random.randint(1, 20)
First, a comment at the top of the code explains what the program does. Then, the program imports the random module so that it can use the random.randint() function to generate a number for the user to guess. The return value, a random integer between 1 and 20, is stored in the variable secretNumber.
print('I am thinking of a number between 1 and 20.')
# Ask the player to guess 6 times.
for guessesTaken in range(1, 7):
    print('Take a guess.')
    guess = int(input())
The program tells the player that it has come up with a secret number and will give the player six chances to guess it. The code that lets the player enter a guess and checks that guess is in a for loop that will loop at most six times. The first thing that happens in the loop is that the player types in a guess. Since input() returns a string, its return value is passed straight into int(), which translates the string into an integer value. This gets stored in a variable named guess.
    if guess < secretNumber:
        print('Your guess is too low.')
    elif guess > secretNumber:
        print('Your guess is too high.')
These few lines of code check to see whether the guess is less than or greater than the secret number. In either case, a hint is printed to the screen.
    else:
        break    # This condition is the correct guess!
If the guess is neither higher nor lower than the secret number, then it must be equal to the secret number, in which case you want the program execution to break out of the for loop.
if guess == secretNumber:
    print('Good job! You guessed my number in ' + str(guessesTaken) + ' guesses!')
else:
    print('Nope. The number I was thinking of was ' + str(secretNumber))
After the for loop, the previous if…else statement checks whether the player has correctly guessed the number and prints an appropriate message to the screen. In both cases, the program displays a variable that contains an integer value (guessesTaken and secretNumber). Since it must concatenate these integer values to strings, it passes these variables to the str() function, which returns the string value form of these integers. Now these strings can be concatenated with the + operators before finally being passed to the print() function call.
Summary
Functions are the primary way to compartmentalize your code into logical groups. Since the variables in functions exist in their own local scopes, the code in one function cannot directly affect the values of variables in other functions. This limits what code could be changing the values of your variables, which can be helpful when it comes to debugging your code.
Functions are a great tool to help you organize your code. You can think of them as black boxes: They have inputs in the form of parameters and outputs in the form of return values, and the code in them doesn’t affect variables in other functions.
In previous chapters, a single error could cause your programs to crash. In this chapter, you learned about try and except statements, which can run code when an error has been detected. This can make your programs more resilient to common error cases.
Practice Questions
Q: 1. Why are functions advantageous to have in your programs?
Q: 2. When does the code in a function execute: when the function is defined or when the function is called?
Q: 3. What statement creates a function?
Q: 4. What is the difference between a function and a function call?
Q: 5. How many global scopes are there in a Python program? How many local scopes?
Q: 6. What happens to variables in a local scope when the function call returns?
Q: 7. What is a return value? Can a return value be part of an expression?
Q: 8. If a function does not have a return statement, what is the return value of a call to that function?
Q: 9. How can you force a variable in a function to refer to the global variable?
Q: 10. What is the data type of None?
Q: 11. What does the import areallyourpetsnamederic statement do?
Q: 12. If you had a function named bacon() in a module named spam, how would you call it after importing spam?
Q: 13. How can you prevent a program from crashing when it gets an error?
Q: 14. What goes in the try clause? What goes in the except clause?
Practice Projects
For practice, write programs to do the following tasks.
The Collatz Sequence
Write a function named collatz() that has one parameter named number. If number is even, then collatz() should print number // 2 and return this value. If number is odd, then collatz() should print and return 3 * number + 1.
Then write a program that lets the user type in an integer and that keeps calling collatz() on that number until the function returns the value 1. (Amazingly enough, this sequence appears to work for any positive integer: every starting value tested so far eventually arrives at 1, though no one has proved that it always will. Even mathematicians aren’t sure why. Your program is exploring what’s called the Collatz sequence, sometimes called “the simplest impossible math problem.”)
Remember to convert the return value from input() to an integer with the int() function; otherwise, it will be a string value.
Hint: An integer number is even if number % 2 == 0, and it’s odd if number % 2 == 1.
The output of this program could look something like this:
Enter number:
3
10
5
16
8
4
2
1
Input Validation
Add try and except statements to the previous project to detect whether the user types in a noninteger string. Normally, the int() function will raise a ValueError error if it is passed a noninteger string, as in int('puppy'). In the except clause, print a message to the user saying they must enter an integer.
Chapter 4. Lists
One more topic you’ll need to understand before you can begin writing programs in earnest is the list data type and its cousin, the tuple. Lists and tuples can contain multiple values, which makes it easier to write programs that handle large amounts of data. And since lists themselves can contain other lists, you can use them to arrange data into hierarchical structures.
In this chapter, I’ll discuss the basics of lists. I’ll also teach you about methods, which are functions that are tied to values of a certain data type. Then I’ll briefly cover the list-like tuple and string data types and how they compare to list values. In the next chapter, I’ll introduce you to the dictionary data type.
The List Data Type
A list is a value that contains multiple values in an ordered sequence. The term list value refers to the list itself (which is a value that can be stored in a variable or passed to a function like any other value), not the values inside the list value. A list value looks like this: ['cat', 'bat', 'rat', 'elephant']. Just as string values are typed with quote characters to mark where the string begins and ends, a list begins with an opening square bracket and ends with a closing square bracket, []. Values inside the list are also called items. Items are separated with commas (that is, they are comma-delimited). For example, enter the following into the interactive shell:
  >>> [1, 2, 3]
  [1, 2, 3]
  >>> ['cat', 'bat', 'rat', 'elephant']
  ['cat', 'bat', 'rat', 'elephant']
  >>> ['hello', 3.1415, True, None, 42]
  ['hello', 3.1415, True, None, 42]
➊ >>> spam = ['cat', 'bat', 'rat', 'elephant']
  >>> spam
  ['cat', 'bat', 'rat', 'elephant']
The spam variable ➊ is still assigned only one value: the list value. But the list value itself contains other values. The value [] is an empty list that contains no values, similar to '', the empty string.
Getting Individual Values in a List with Indexes
Say you have the list ['cat', 'bat', 'rat', 'elephant'] stored in a variable named spam. The Python code spam[0] would evaluate to 'cat', and spam[1] would evaluate to 'bat', and so on. The integer inside the square brackets that follows the list is called an index. The first value in the list is at index 0, the second value is at index 1, the third value is at index 2, and so on. Figure 4-1 shows a list value assigned to spam, along with what the index expressions would evaluate to.
Figure 4-1. A list value stored in the variable spam, showing which value each index refers to
For example, type the following expressions into the interactive shell. Start by assigning a list to the variable spam.
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam[0]
'cat'
>>> spam[1]
'bat'
>>> spam[2]
'rat'
>>> spam[3]
'elephant'
>>> ['cat', 'bat', 'rat', 'elephant'][3]
'elephant'
➊ >>> 'Hello ' + spam[0]
➋ 'Hello cat'
>>> 'The ' + spam[1] + ' ate the ' + spam[0] + '.'
'The bat ate the cat.'
Notice that the expression 'Hello ' + spam[0] ➊ evaluates to 'Hello ' + 'cat' because spam[0] evaluates to the string 'cat'. This expression in turn evaluates to the string value 'Hello cat' ➋.
Python will give you an IndexError error message if you use an index that exceeds the number of values in your list value.
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam[10000]
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    spam[10000]
IndexError: list index out of range
Indexes can be only integer values, not floats. The following example will cause a TypeError error:
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam[1]
'bat'
>>> spam[1.0]
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    spam[1.0]
TypeError: list indices must be integers, not float
>>> spam[int(1.0)]
'bat'
Lists can also contain other list values. The values in these lists of lists can be accessed using multiple indexes, like so:
>>> spam = [['cat', 'bat'], [10, 20, 30, 40, 50]]
>>> spam[0]
['cat', 'bat']
>>> spam[0][1]
'bat'
>>> spam[1][4]
50
The first index dictates which list value to use, and the second indicates the value within the list value. For example, spam[0][1] evaluates to 'bat', the second value in the first list. If you use only one index, the expression evaluates to the full list value at that index.
Negative Indexes
While indexes start at 0 and go up, you can also use negative integers for the index. The integer value -1 refers to the last index in a list, the value -2 refers to the second-to-last index in a list, and so on. Enter the following into the interactive shell:
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam[-1]
'elephant'
>>> spam[-3]
'bat'
>>> 'The ' + spam[-1] + ' is afraid of the ' + spam[-3] + '.'
'The elephant is afraid of the bat.'
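One way to read a negative index is as a count back from the end of the list. The short sketch below checks that spam[-n] refers to the same item as spam[len(spam) - n]; the variable name lastIndex is only an illustration and does not come from the text.

```python
spam = ['cat', 'bat', 'rat', 'elephant']

# A negative index counts back from the end of the list,
# so spam[-n] is the same item as spam[len(spam) - n].
lastIndex = len(spam) - 1                # 3 for a four-item list
print(spam[-1] == spam[lastIndex])       # True: both are 'elephant'
print(spam[-3] == spam[len(spam) - 3])   # True: both are 'bat'
```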
Getting Sublists with Slices
Just as an index can get a single value from a list, a slice can get several values from a list, in the form of a new list. A slice is typed between square brackets, like an index, but it has two integers separated by a colon. Notice the difference between indexes and slices.
spam[2] is a list with an index (one integer).
spam[1:4] is a list with a slice (two integers).
In a slice, the first integer is the index where the slice starts. The second integer is the index where the slice ends. A slice goes up to, but will not include, the value at the second index. A slice evaluates to a new list value. Enter the following into the interactive shell:
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam[0:4]
['cat', 'bat', 'rat', 'elephant']
>>> spam[1:3]
['bat', 'rat']
>>> spam[0:-1]
['cat', 'bat', 'rat']
As a shortcut, you can leave out one or both of the indexes on either side of the colon in the slice. Leaving out the first index is the same as using 0, or the beginning of the list. Leaving out the second index is the same as using the length of the list, which will slice to the end of the list. Enter the following into the interactive shell:
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam[:2]
['cat', 'bat']
>>> spam[1:]
['bat', 'rat', 'elephant']
>>> spam[:]
['cat', 'bat', 'rat', 'elephant']
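Because a slice stops just before its second index, the number of items in spam[start:end] is end - start (as long as both indexes are within the list). A minimal sketch, with the variable name middle chosen only for illustration:

```python
spam = ['cat', 'bat', 'rat', 'elephant']

# The slice starts at index 1 and stops before index 3.
middle = spam[1:3]
print(middle)       # ['bat', 'rat']
print(len(middle))  # 2, which is 3 - 1

# Slicing builds a new list; the original list is left as it was.
print(spam)         # ['cat', 'bat', 'rat', 'elephant']
```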
Getting a List’s Length with len()
The len() function will return the number of values that are in a list value passed to it, just like it can count the number of characters in a string value. Enter the following into the interactive shell:
>>> spam = ['cat', 'dog', 'moose']
>>> len(spam)
3
Changing Values in a List with Indexes
Normally a variable name goes on the left side of an assignment statement, like spam = 42. However, you can also use an index of a list to change the value at that index. For example, spam[1] = 'aardvark' means “Assign the value at index 1 in the list spam to the string 'aardvark'.” Enter the following into the interactive shell:
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam[1] = 'aardvark'
>>> spam
['cat', 'aardvark', 'rat', 'elephant']
>>> spam[2] = spam[1]
>>> spam
['cat', 'aardvark', 'aardvark', 'elephant']
>>> spam[-1] = 12345
>>> spam
['cat', 'aardvark', 'aardvark', 12345]
List Concatenation and List Replication
The + operator can combine two lists to create a new list value in the same way it combines two strings into a new string value. The * operator can also be used with a list and an integer value to replicate the list. Enter the following into the interactive shell:
>>> [1, 2, 3] + ['A', 'B', 'C']
[1, 2, 3, 'A', 'B', 'C']
>>> ['X', 'Y', 'Z'] * 3
['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z']
>>> spam = [1, 2, 3]
>>> spam = spam + ['A', 'B', 'C']
>>> spam
[1, 2, 3, 'A', 'B', 'C']
Removing Values from Lists with del Statements
The del statement will delete values at an index in a list. All of the values in the list after the deleted value will be moved up one index. For example, enter the following into the interactive shell:
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> del spam[2]
>>> spam
['cat', 'bat', 'elephant']
>>> del spam[2]
>>> spam
['cat', 'bat']
The del statement can also be used on a simple variable to delete it, as if it were an “unassignment” statement. If you try to use the variable after deleting it, you will get a NameError error because the variable no longer exists.
In practice, you almost never need to delete simple variables. The del statement is mostly used to delete values from lists.
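The NameError described above can be seen in a small sketch like the following. The flag nameErrorRaised is a name made up for this illustration so the result can be checked afterward.

```python
spam = 42
del spam   # "unassign" the variable; the name spam no longer exists

nameErrorRaised = False
try:
    print(spam)   # using the deleted variable raises NameError
except NameError:
    nameErrorRaised = True
    print('spam is not defined anymore.')
```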
Working with Lists
When you first begin writing programs, it’s tempting to create many individual variables to store a group of similar values. For example, if I wanted to store the names of my cats, I might be tempted to write code like this:
catName1 = 'Zophie'
catName2 = 'Pooka'
catName3 = 'Simon'
catName4 = 'Lady Macbeth'
catName5 = 'Fat-tail'
catName6 = 'Miss Cleo'
(I don’t actually own this many cats, I swear.) It turns out that this is a bad way to write code. For one thing, if the number of cats changes, your program will never be able to store more cats than you have variables. These types of programs also have a lot of duplicate or nearly identical code in them. Consider how much duplicate code is in the following program, which you should enter into the file editor and save as allMyCats1.py:
print('Enter the name of cat 1:')
catName1 = input()
print('Enter the name of cat 2:')
catName2 = input()
print('Enter the name of cat 3:')
catName3 = input()
print('Enter the name of cat 4:')
catName4 = input()
print('Enter the name of cat 5:')
catName5 = input()
print('Enter the name of cat 6:')
catName6 = input()
print('The cat names are:')
print(catName1 + ' ' + catName2 + ' ' + catName3 + ' ' + catName4 + ' ' +
      catName5 + ' ' + catName6)
Instead of using multiple, repetitive variables, you can use a single variable that contains a list value. For example, here’s a new and improved version of the allMyCats1.py program. This new version uses a single list and can store any number of cats that the user types in. In a new file editor window, type the following source code and save it as allMyCats2.py:
catNames = []
while True:
    print('Enter the name of cat ' + str(len(catNames) + 1) +
          ' (Or enter nothing to stop.):')
    name = input()
    if name == '':
        break
    catNames = catNames + [name]  # list concatenation
print('The cat names are:')
for name in catNames:
    print('  ' + name)
When you run this program, the output will look something like this:
Enter the name of cat 1 (Or enter nothing to stop.):
Zophie
Enter the name of cat 2 (Or enter nothing to stop.):
Pooka
Enter the name of cat 3 (Or enter nothing to stop.):
Simon
Enter the name of cat 4 (Or enter nothing to stop.):
Lady Macbeth
Enter the name of cat 5 (Or enter nothing to stop.):
Fat-tail
Enter the name of cat 6 (Or enter nothing to stop.):
Miss Cleo
Enter the name of cat 7 (Or enter nothing to stop.):

The cat names are:
  Zophie
  Pooka
  Simon
  Lady Macbeth
  Fat-tail
  Miss Cleo
The benefit of using a list is that your data is now in a structure, so your program is much more flexible in processing the data than it would be with several repetitive variables.
Using for Loops with Lists
In Chapter 2, you learned about using for loops to execute a block of code a certain number of times. Technically, a for loop repeats the code block once for each value in a list or list-like value. For example, if you ran this code:
for i in range(4):
    print(i)
the output of this program would be as follows:
0
1
2
3
This is because the return value from range(4) is a list-like value that Python considers similar to [0, 1, 2, 3]. The following program has the same output as the previous one:
for i in [0, 1, 2, 3]:
    print(i)
What the previous for loop actually does is loop through its clause with the variable i set to a successive value in the [0, 1, 2, 3] list in each iteration.
NOTE
In this book, I use the term list-like to refer to data types that are technically named sequences. You don’t need to know the technical definition of this term, though.
A common Python technique is to use range(len(someList)) with a for loop to iterate over the indexes of a list. For example, enter the following into the interactive shell:
>>> supplies = ['pens', 'staplers', 'flame-throwers', 'binders']
>>> for i in range(len(supplies)):
        print('Index ' + str(i) + ' in supplies is: ' + supplies[i])
Index 0 in supplies is: pens
Index 1 in supplies is: staplers
Index 2 in supplies is: flame-throwers
Index 3 in supplies is: binders
Using range(len(supplies)) in the previously shown for loop is handy because the code in the loop can access the index (as the variable i) and the value at that index (as supplies[i]). Best of all, range(len(supplies)) will iterate through all the indexes of supplies, no matter how many items it contains.
The in and not in Operators
You can determine whether a value is or isn’t in a list with the in and not in operators. Like other operators, in and not in are used in expressions and connect two values: a value to look for in a list and the list where it may be found. These expressions will evaluate to a Boolean value. Enter the following into the interactive shell:
>>> 'howdy' in ['hello', 'hi', 'howdy', 'heyas']
True
>>> spam = ['hello', 'hi', 'howdy', 'heyas']
>>> 'cat' in spam
False
>>> 'howdy' not in spam
False
>>> 'cat' not in spam
True
For example, the following program lets the user type in a pet name and then checks to see whether the name is in a list of pets. Open a new file editor window, enter the following code, and save it as myPets.py:
myPets = ['Zophie', 'Pooka', 'Fat-tail']
print('Enter a pet name:')
name = input()
if name not in myPets:
    print('I do not have a pet named ' + name)
else:
    print(name + ' is my pet.')
The output may look something like this:
Enter a pet name:
Footfoot
I do not have a pet named Footfoot
The Multiple Assignment Trick
The multiple assignment trick is a shortcut that lets you assign multiple variables with the values in a list in one line of code. So instead of doing this:
>>> cat = ['fat', 'black', 'loud']
>>> size = cat[0]
>>> color = cat[1]
>>> disposition = cat[2]
you could type this line of code:
>>> cat = ['fat', 'black', 'loud']
>>> size, color, disposition = cat
The number of variables and the length of the list must be exactly equal, or Python will give you a ValueError:
>>> cat = ['fat', 'black', 'loud']
>>> size, color, disposition, name = cat
Traceback (most recent call last):
  File "<pyshell#84>", line 1, in <module>
    size, color, disposition, name = cat
ValueError: need more than 3 values to unpack
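The same trick also works when the values on the right side are simply separated by commas, which gives a compact way to swap two variables. This is a small sketch that goes slightly beyond the text above; the names a and b are placeholders.

```python
a = 'AAA'
b = 'BBB'

# The right side is evaluated first, then both names are assigned at once,
# so no temporary variable is needed to swap the values.
a, b = b, a
print(a)  # BBB
print(b)  # AAA
```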
Augmented Assignment Operators
When assigning a value to a variable, you will frequently use the variable itself. For example, after assigning 42 to the variable spam, you would increase the value in spam by 1 with the following code:
>>> spam = 42
>>> spam = spam + 1
>>> spam
43
As a shortcut, you can use the augmented assignment operator += to do the same thing:
>>> spam = 42
>>> spam += 1
>>> spam
43
There are augmented assignment operators for the +, -, *, /, and % operators, described in Table 4-1.
Table 4-1. The Augmented Assignment Operators
Augmented assignment statement    Equivalent assignment statement
spam += 1                         spam = spam + 1
spam -= 1                         spam = spam - 1
spam *= 1                         spam = spam * 1
spam /= 1                         spam = spam / 1
spam %= 1                         spam = spam % 1
The += operator can also do string and list concatenation, and the *= operator can do string and list replication. Enter the following into the interactive shell:
>>> spam = 'Hello'
>>> spam += ' world!'
>>> spam
'Hello world!'
>>> bacon = ['Zophie']
>>> bacon *= 3
>>> bacon
['Zophie', 'Zophie', 'Zophie']
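The remaining operators from Table 4-1 behave the same way on numbers. The following sketch applies each in turn to one variable; the starting value 10 is arbitrary.

```python
spam = 10
spam -= 3   # same as spam = spam - 3, so spam is 7
spam *= 4   # same as spam = spam * 4, so spam is 28
spam /= 2   # same as spam = spam / 2, so spam is 14.0 (division gives a float)
spam %= 5   # same as spam = spam % 5, so spam is 4.0
print(spam)
```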
Methods
A method is the same thing as a function, except it is “called on” a value. For example, if a list value were stored in spam, you would call the index() list method (which I’ll explain next) on that list like so: spam.index('hello'). The method part comes after the value, separated by a period.
Each data type has its own set of methods. The list data type, for example, has several useful methods for finding, adding, removing, and otherwise manipulating values in a list.
Finding a Value in a List with the index() Method
List values have an index() method that can be passed a value, and if that value exists in the list, the index of the value is returned. If the value isn’t in the list, then Python produces a ValueError error. Enter the following into the interactive shell:
>>> spam = ['hello', 'hi', 'howdy', 'heyas']
>>> spam.index('hello')
0
>>> spam.index('heyas')
3
>>> spam.index('howdy howdy howdy')
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    spam.index('howdy howdy howdy')
ValueError: 'howdy howdy howdy' is not in list
When there are duplicates of the value in the list, the index of its first appearance is returned. Enter the following into the interactive shell, and notice that index() returns 1, not 3:
>>> spam = ['Zophie', 'Pooka', 'Fat-tail', 'Pooka']
>>> spam.index('Pooka')
1
Adding Values to Lists with the append() and insert() Methods
To add new values to a list, use the append() and insert() methods. Enter the following into the interactive shell to call the append() method on a list value stored in the variable spam:
>>> spam = ['cat', 'dog', 'bat']
>>> spam.append('moose')
>>> spam
['cat', 'dog', 'bat', 'moose']
The previous append() method call adds the argument to the end of the list. The insert() method can insert a value at any index in the list. The first argument to insert() is the index for the new value, and the second argument is the new value to be inserted. Enter the following into the interactive shell:
>>> spam = ['cat', 'dog', 'bat']
>>> spam.insert(1, 'chicken')
>>> spam
['cat', 'chicken', 'dog', 'bat']
Notice that the code is spam.append('moose') and spam.insert(1, 'chicken'), not spam = spam.append('moose') and spam = spam.insert(1, 'chicken'). Neither append() nor insert() gives the new value of spam as its return value. (In fact, the return value of append() and insert() is None, so you definitely wouldn’t want to store this as the new variable value.) Rather, the list is modified in place. Modifying a list in place is covered in more detail later in Mutable and Immutable Data Types.
Methods belong to a single data type. The append() and insert() methods are list methods and can be called only on list values, not on other values such as strings or integers. Enter the following into the interactive shell, and note the AttributeError error messages that show up:
>>> eggs = 'hello'
>>> eggs.append('world')
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    eggs.append('world')
AttributeError: 'str' object has no attribute 'append'
>>> bacon = 42
>>> bacon.insert(1, 'world')
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    bacon.insert(1, 'world')
AttributeError: 'int' object has no attribute 'insert'
Removing Values from Lists with remove()
The remove() method is passed the value to be removed from the list it is called on. Enter the following into the interactive shell:
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam.remove('bat')
>>> spam
['cat', 'rat', 'elephant']
Attempting to delete a value that does not exist in the list will result in a ValueError error. For example, enter the following into the interactive shell and notice the error that is displayed:
>>> spam = ['cat', 'bat', 'rat', 'elephant']
>>> spam.remove('chicken')
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    spam.remove('chicken')
ValueError: list.remove(x): x not in list
If the value appears multiple times in the list, only the first instance of the value will be removed. Enter the following into the interactive shell:
>>> spam = ['cat', 'bat', 'rat', 'cat', 'hat', 'cat']
>>> spam.remove('cat')
>>> spam
['bat', 'rat', 'cat', 'hat', 'cat']
The del statement is good to use when you know the index of the value you want to remove from the list. The remove() method is good when you know the value you want to remove from the list.
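Since remove() deletes only the first matching value, removing every occurrence takes a loop. One possible sketch combines remove() with the in operator covered earlier; it is just one way to do it, not the only one.

```python
spam = ['cat', 'bat', 'rat', 'cat', 'hat', 'cat']

# Keep removing the first 'cat' until none are left.
# Checking with "in" first avoids the ValueError from remove().
while 'cat' in spam:
    spam.remove('cat')

print(spam)  # ['bat', 'rat', 'hat']
```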
Sorting the Values in a List with the sort() Method
Lists of number values or lists of strings can be sorted with the sort() method. For example, enter the following into the interactive shell:
>>> spam = [2, 5, 3.14, 1, -7]
>>> spam.sort()
>>> spam
[-7, 1, 2, 3.14, 5]
>>> spam = ['ants', 'cats', 'dogs', 'badgers', 'elephants']
>>> spam.sort()
>>> spam
['ants', 'badgers', 'cats', 'dogs', 'elephants']
You can also pass True for the reverse keyword argument to have sort() sort the values in reverse order. Enter the following into the interactive shell:
>>> spam.sort(reverse=True)
>>> spam
['elephants', 'dogs', 'cats', 'badgers', 'ants']
There are three things you should note about the sort() method. First, the sort() method sorts the list in place; don’t try to capture the return value by writing code like spam = spam.sort().
Second, you cannot sort lists that have both number values and string values in them, since Python doesn’t know how to compare these values. Type the following into the interactive shell and notice the TypeError error:
>>> spam = [1, 3, 2, 4, 'Alice', 'Bob']
>>> spam.sort()
Traceback (most recent call last):
  File "<pyshell#70>", line 1, in <module>
    spam.sort()
TypeError: unorderable types: str() < int()
Third, sort() uses “ASCIIbetical order” rather than actual alphabetical order for sorting strings. This means uppercase letters come before lowercase letters. Therefore, the lowercase a is sorted so that it comes after the uppercase Z. For an example, enter the following into the interactive shell:
>>> spam = ['Alice', 'ants', 'Bob', 'badgers', 'Carol', 'cats']
>>> spam.sort()
>>> spam
['Alice', 'Bob', 'Carol', 'ants', 'badgers', 'cats']
If you need to sort the values in regular alphabetical order, pass str.lower for the key keyword argument in the sort() method call.
>>> spam = ['a', 'z', 'A', 'Z']
>>> spam.sort(key=str.lower)
>>> spam
['a', 'A', 'z', 'Z']
This causes the sort() method to treat all the items in the list as if they were lowercase without actually changing the values in the list.
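To make the first point concrete, the sketch below captures the return value of sort() on purpose to show that it is None, while the list itself ends up sorted. The variable name result is made up for this illustration.

```python
spam = ['Carol', 'ants', 'Alice', 'badgers']

# sort() rearranges the list in place and returns None.
result = spam.sort(key=str.lower)
print(result)  # None
print(spam)    # ['Alice', 'ants', 'badgers', 'Carol']
```

Writing spam = spam.sort() would therefore replace the list in spam with None.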
Example Program: Magic 8 Ball with a List
Using lists, you can write a much more elegant version of the previous chapter’s Magic 8 Ball program. Instead of several lines of nearly identical elif statements, you can create a single list that the code works with. Open a new file editor window and enter the following code. Save it as magic8Ball2.py.
import random
messages = ['It is certain',
    'It is decidedly so',
    'Yes definitely',
    'Reply hazy try again',
    'Ask again later',
    'Concentrate and ask again',
    'My reply is no',
    'Outlook not so good',
    'Very doubtful']
print(messages[random.randint(0, len(messages) - 1)])
EXCEPTIONS TO INDENTATION RULES IN PYTHON
In most cases, the amount of indentation for a line of code tells Python what block it is in. There are some exceptions to this rule, however. For example, lists can actually span several lines in the source code file. The indentation of these lines does not matter; Python knows that until it sees the ending square bracket, the list is not finished. For example, you can have code that looks like this:
spam = ['apples',
    'oranges',
                    'bananas',
'cats']
print(spam)
Of course, practically speaking, most people use Python’s behavior to make their lists look pretty and readable, like the messages list in the Magic 8 Ball program.
You can also split up a single instruction across multiple lines using the \ line continuation character at the end. Think of \ as saying, “This instruction continues on the next line.” The indentation on the line after a \ line continuation is not significant. For example, the following is valid Python code:
print('Four score and seven ' + \
      'years ago...')
These tricks are useful when you want to rearrange long lines of Python code to be a bit more readable.
When you run this program, you’ll see that it works the same as the previous magic8Ball.py program.
Notice the expression you use as the index into messages: random.randint(0, len(messages) - 1). This produces a random number to use for the index, regardless of the size of messages. That is, you’ll get a random number between 0 and the value of len(messages) - 1. The benefit of this approach is that you can easily add and remove strings to the messages list without changing other lines of code. If you later update your code, there will be fewer lines you have to change and fewer chances for you to introduce bugs.
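As a side note, the standard random module also has a choice() function that picks one item from a list directly. The sketch below, which uses a shortened messages list for brevity, shows both approaches side by side; the variable names pickedByIndex and pickedByChoice are made up for this illustration.

```python
import random

messages = ['It is certain', 'Ask again later', 'Very doubtful']

# The approach from the program above: pick a random valid index.
pickedByIndex = messages[random.randint(0, len(messages) - 1)]

# An alternative: random.choice() returns a random item from the list.
pickedByChoice = random.choice(messages)

print(pickedByIndex)
print(pickedByChoice)
```

Either way, the code keeps working if strings are added to or removed from messages.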
List-like Types: Strings and Tuples
Lists aren’t the only data types that represent ordered sequences of values. For example, strings and lists are actually similar, if you consider a string to be a “list” of single text characters. Many of the things you can do with lists can also be done with strings: indexing; slicing; and using them with for loops, with len(), and with the in and not in operators. To see this, enter the following into the interactive shell:
>>> name = 'Zophie'
>>> name[0]
'Z'
>>> name[-2]
'i'
>>> name[0:4]
'Zoph'
>>> 'Zo' in name
True
>>> 'z' in name
False
>>> 'p' not in name
False
>>> for i in name:
        print('* * * ' + i + ' * * *')
* * * Z * * *
* * * o * * *
* * * p * * *
* * * h * * *
* * * i * * *
* * * e * * *
Mutable and Immutable Data Types
But lists and strings are different in an important way. A list value is a mutable data type: It can have values added, removed, or changed. However, a string is immutable: It cannot be changed. Trying to reassign a single character in a string results in a TypeError error, as you can see by entering the following into the interactive shell:
>>> name = 'Zophie a cat'
>>> name[7] = 'the'
Traceback (most recent call last):
  File "<pyshell#50>", line 1, in <module>
    name[7] = 'the'
TypeError: 'str' object does not support item assignment
The proper way to “mutate” a string is to use slicing and concatenation to build a new string by copying from parts of the old string. Enter the following into the interactive shell:
>>> name = 'Zophie a cat'
>>> newName = name[0:7] + 'the' + name[8:12]
>>> name
'Zophie a cat'
>>> newName
'Zophie the cat'
We used [0:7] and [8:12] to refer to the characters that we don’t wish to replace. Notice that the original 'Zophie a cat' string is not modified because strings are immutable.
Although a list value is mutable, the second line in the following code does not modify the list eggs:
>>> eggs = [1, 2, 3]
>>> eggs = [4, 5, 6]
>>> eggs
[4, 5, 6]
The list value in eggs isn’t being changed here; rather, an entirely new and different list value ([4, 5, 6]) is overwriting the old list value ([1, 2, 3]). This is depicted in Figure 4-2.
If you wanted to actually modify the original list in eggs to contain [4, 5, 6], you would have to do something like this:
>>> eggs = [1, 2, 3]
>>> del eggs[2]
>>> del eggs[1]
>>> del eggs[0]
>>> eggs.append(4)
>>> eggs.append(5)
>>> eggs.append(6)
>>> eggs
[4, 5, 6]
Figure 4-2. When eggs = [4, 5, 6] is executed, the contents of eggs are replaced with a new list value.
In the del and append() example, the list value that eggs ends up with is the same list value it started with. It’s just that this list has been changed, rather than overwritten. Figure 4-3 depicts the seven changes made by the first seven lines in the previous interactive shell example.
Figure 4-3. The del statement and the append() method modify the same list value in place.
Changing a value of a mutable data type (like what the del statement and append() method do in the previous example) changes the value in place, since the variable’s value is not replaced with a new list value.
Mutable versus immutable types may seem like a meaningless distinction, but Passing References will explain the different behavior when calling functions with mutable arguments versus immutable arguments. But first, let’s find out about the tuple data type, which is an immutable form of the list data type.
The Tuple Data Type
The tuple data type is almost identical to the list data type, except in two ways. First, tuples are typed with parentheses, ( and ), instead of square brackets, [ and ]. For example, enter the following into the interactive shell:
>>> eggs = ('hello', 42, 0.5)
>>> eggs[0]
'hello'
>>> eggs[1:3]
(42, 0.5)
>>> len(eggs)
3
But the main way that tuples are different from lists is that tuples, like strings, are immutable. Tuples cannot have their values modified, appended, or removed. Enter the following into the interactive shell, and look at the TypeError error message:
>>> eggs = ('hello', 42, 0.5)
>>> eggs[1] = 99
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    eggs[1] = 99
TypeError: 'tuple' object does not support item assignment
If you have only one value in your tuple, you can indicate this by placing a trailing comma after the value inside the parentheses. Otherwise, Python will think you’ve just typed a value inside regular parentheses. The comma is what lets Python know this is a tuple value. (Unlike some other programming languages, in Python it’s fine to have a trailing comma after the last item in a list or tuple.) Enter the following type() function calls into the interactive shell to see the distinction:
>>> type(('hello',))
<class 'tuple'>
>>> type(('hello'))
<class 'str'>
You can use tuples to convey to anyone reading your code that you don’t intend for that sequence of values to change. If you need an ordered sequence of values that never changes, use a tuple. A second benefit of using tuples instead of lists is that, because they are immutable and their contents don’t change, Python can implement some optimizations that make code using tuples slightly faster than code using lists.
Converting Types with the list() and tuple() Functions
Just like how str(42) will return '42', the string representation of the integer 42, the functions list() and tuple() will return list and tuple versions of the values passed to them. Enter the following into the interactive shell, and notice that the return value is of a different data type than the value passed:
>>> tuple(['cat', 'dog', 5])
('cat', 'dog', 5)
>>> list(('cat', 'dog', 5))
['cat', 'dog', 5]
>>> list('hello')
['h', 'e', 'l', 'l', 'o']
Converting a tuple to a list is handy if you need a mutable version of a tuple value.
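A minimal sketch of that round trip: convert the tuple to a list, change the list, and convert it back. The intermediate name eggsList is hypothetical. Note that the final tuple is a new value; the original tuple is never modified.

```python
eggs = ('hello', 42, 0.5)

eggsList = list(eggs)    # a mutable copy of the tuple's values
eggsList[1] = 99         # allowed, because eggsList is a list
eggs = tuple(eggsList)   # build a new tuple from the changed list

print(eggs)  # ('hello', 99, 0.5)
```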
References
As you’ve seen, variables store strings and integer values. Enter the following into the interactive shell:
>>> spam = 42
>>> cheese = spam
>>> spam = 100
>>> spam
100
>>> cheese
42
You assign 42 to the spam variable, and then you copy the value in spam and assign it to the variable cheese. When you later change the value in spam to 100, this doesn’t affect the value in cheese. This is because spam and cheese are different variables that store different values.
But lists don’t work this way. When you assign a list to a variable, you are actually assigning a list reference to the variable. A reference is a value that points to some bit of data, and a list reference is a value that points to a list. Here is some code that will make this distinction easier to understand. Enter this into the interactive shell:
➊ >>> spam = [0, 1, 2, 3, 4, 5]
➋ >>> cheese = spam
➌ >>> cheese[1] = 'Hello!'
>>> spam
[0, 'Hello!', 2, 3, 4, 5]
>>> cheese
[0, 'Hello!', 2, 3, 4, 5]
This might look odd to you. The code changed only the cheese list, but it seems that both the cheese and spam lists have changed.
When you create the list ➊, you assign a reference to it in the spam variable. But the next line ➋ copies only the list reference in spam to cheese, not the list value itself. This means the values stored in spam and cheese now both refer to the same list. There is only one underlying list because the list itself was never actually copied. So when you modify the value at index 1 of cheese ➌, you are modifying the same list that spam refers to.
Remember that variables are like boxes that contain values. The previous figures in this chapter show that lists in boxes aren’t exactly accurate because list variables don’t actually contain lists; they contain references to lists. (These references will have ID numbers that Python uses internally, but you can ignore them.) Using boxes as a metaphor for variables, Figure 4-4 shows what happens when a list is assigned to the spam variable.
Figure 4-4. spam = [0, 1, 2, 3, 4, 5] stores a reference to a list, not the actual list.
Then, in Figure 4-5, the reference in spam is copied to cheese. Only a new reference was created and stored in cheese, not a new list. Note how both references refer to the same list.
Figure 4-5. cheese = spam copies the reference, not the list.
When you alter the list that cheese refers to, the list that spam refers to is also changed, because both cheese and spam refer to the same list. You can see this in Figure 4-6.
Figure 4-6. cheese[1] = 'Hello!' modifies the list that both variables refer to.
Variables will contain references to list values rather than list values themselves. But for strings and integer values, you can think of variables as simply containing the string or integer value. References matter whenever variables store values of mutable data types, such as lists or dictionaries. For values of immutable data types such as strings, integers, or tuples, you can treat Python variables as storing the value itself, because such a value can never be changed in place.
Although Python variables technically contain references to list or dictionary values, people often casually say that the variable contains the list or dictionary.
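If you want to check whether two variables refer to the same list, Python’s is operator and the built-in id() function can help. They are not covered in the text above, so treat this as a supplementary sketch; the variable bacon is added here only for contrast.

```python
spam = [0, 1, 2, 3, 4, 5]
cheese = spam                 # copies the reference, not the list
bacon = [0, 1, 2, 3, 4, 5]    # a separate list with equal contents

print(cheese is spam)          # True: both names refer to one list
print(bacon is spam)           # False: bacon refers to a different list
print(bacon == spam)           # True: the two lists hold equal values
print(id(cheese) == id(spam))  # True: the same internal ID number
```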
Passing References
References are particularly important for understanding how arguments get passed to functions. When a function is called, the values of the arguments are copied to the parameter variables. For lists (and dictionaries, which I’ll describe in the next chapter), this means a copy of the reference is used for the parameter. To see the consequences of this, open a new file editor window, enter the following code, and save it as passingReference.py:
def eggs(someParameter):
    someParameter.append('Hello')

spam = [1, 2, 3]
eggs(spam)
print(spam)
Notice that when eggs() is called, a return value is not used to assign a new value to spam. Instead, it modifies the list in place, directly. When run, this program produces the following output:
[1, 2, 3, 'Hello']
Even though spam and someParameter contain separate references, they both refer to the same list. This is why the append('Hello') method call inside the function affects the list even after the function call has returned.
Keep this behavior in mind: Forgetting that Python handles list and dictionary variables this way can lead to confusing bugs.
The copy Module’s copy() and deepcopy() Functions
Although passing around references is often the handiest way to deal with lists and dictionaries, if the function modifies the list or dictionary that is passed, you may not want these changes in the original list or dictionary value. For this, Python provides a module named copy that provides both the copy() and deepcopy() functions. The first of these, copy.copy(), can be used to make a duplicate copy of a mutable value like a list or dictionary, not just a copy of a reference. Enter the following into the interactive shell:
>>> import copy
>>> spam = ['A', 'B', 'C', 'D']
>>> cheese = copy.copy(spam)
>>> cheese[1] = 42
>>> spam
['A', 'B', 'C', 'D']
>>> cheese
['A', 42, 'C', 'D']
Now the spam and cheese variables refer to separate lists, which is why only the list in cheese is modified when you assign 42 at index 1. As you can see in Figure 4-7, the reference ID numbers are no longer the same for both variables because the variables refer to independent lists.
Figure 4-7. cheese = copy.copy(spam) creates a second list that can be modified independently of the first.
If the list you need to copy contains lists, then use the copy.deepcopy() function instead of copy.copy(). The deepcopy() function will copy these inner lists as well.
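The difference shows up as soon as an inner list is changed. In the sketch below (the names shallow and deep are made up for this illustration), the copy made with copy.copy() still shares its inner lists with the original, while the copy made with copy.deepcopy() does not:

```python
import copy

spam = [['A', 'B'], ['C', 'D']]
shallow = copy.copy(spam)    # new outer list, same inner lists
deep = copy.deepcopy(spam)   # new outer list and new inner lists

spam[0][0] = 'changed'       # modify an inner list through spam

print(shallow[0][0])  # 'changed', because the inner list is shared
print(deep[0][0])     # 'A', because deep has its own inner lists
```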
Summary
Lists are useful data types since they allow you to write code that works on a modifiable number of values in a single variable. Later in this book, you will see programs using lists to do things that would be difficult or impossible to do without them.
Lists are mutable, meaning that their contents can change. Tuples and strings, although list-like in some respects, are immutable and cannot be changed. A variable that contains a tuple or string value can be overwritten with a new tuple or string value, but this is not the same thing as modifying the existing value in place, as the append() or remove() methods do on lists.
Variables do not store list values directly; they store references to lists. This is an important distinction when copying variables or passing lists as arguments in function calls. Because the value that is being copied is the list reference, be aware that any changes you make to the list might impact another variable in your program. You can use copy() or deepcopy() if you want to make changes to a list in one variable without modifying the original list.
Practice Questions
Q: 1. What is []?
Q: 2. How would you assign the value 'hello' as the third value in a list stored in a variable named spam? (Assume spam contains [2, 4, 6, 8, 10].)
For the following three questions, let’s say spam contains the list ['a', 'b', 'c', 'd'].
Q: 3. What does spam[int(int('3' * 2) / 11)] evaluate to?
Q: 4. What does spam[-1] evaluate to?
Q: 5. What does spam[:2] evaluate to?
For the following three questions, let’s say bacon contains the list [3.14, 'cat', 11, 'cat', True].
Q: 6. What does bacon.index('cat') evaluate to?
Q: 7. What does bacon.append(99) make the list value in bacon look like?
Q: 8. What does bacon.remove('cat') make the list value in bacon look like?
Q: 9. What are the operators for list concatenation and list replication?
Q: 10. What is the difference between the append() and insert() list methods?
Q: 11. What are two ways to remove values from a list?
Q: 12. Name a few ways that list values are similar to string values.
Q: 13. What is the difference between lists and tuples?
Q: 14. How do you type the tuple value that has just the integer value 42 in it?
Q: 15. How can you get the tuple form of a list value? How can you get the list form of a tuple value?
Q: 16. Variables that “contain” list values don’t actually contain lists directly. What do they contain instead?
Q: 17. What is the difference between copy.copy() and copy.deepcopy()?
Practice Projects
For practice, write programs to do the following tasks.
Comma Code
Say you have a list value like this:
spam = ['apples', 'bananas', 'tofu', 'cats']
Write a function that takes a list value as an argument and returns a string with all the items separated by a comma and a space, with and inserted before the last item. For example, passing the previous spam list to the function would return 'apples, bananas, tofu, and cats'. But your function should be able to work with any list value passed to it.
Character Picture Grid
Say you have a list of lists where each value in the inner lists is a one-character string, like this:
grid = [['.', '.', '.', '.', '.', '.'],
        ['.', 'O', 'O', '.', '.', '.'],
        ['O', 'O', 'O', 'O', '.', '.'],
        ['O', 'O', 'O', 'O', 'O', '.'],
        ['.', 'O', 'O', 'O', 'O', 'O'],
        ['O', 'O', 'O', 'O', 'O', '.'],
        ['O', 'O', 'O', 'O', '.', '.'],
        ['.', 'O', 'O', '.', '.', '.'],
        ['.', '.', '.', '.', '.', '.']]
You can think of grid[x][y] as being the character at the x- and y-coordinates of a “picture” drawn with text characters. The (0, 0) origin will be in the upper-left corner, the x-coordinates increase going right, and the y-coordinates increase going down.
Copy the previous grid value, and write code that uses it to print the image.
..OO.OO..
.OOOOOOO.
.OOOOOOO.
..OOOOO..
...OOO...
....O....
Hint: You will need to use a loop in a loop in order to print grid[0][0], then grid[1][0], then grid[2][0], and so on, up to grid[8][0]. This will finish the first row, so then print a newline. Then your program should print grid[0][1], then grid[1][1], then grid[2][1], and so on. The last thing your program will print is grid[8][5].
Also, remember to pass the end keyword argument to print() if you don’t want a newline printed automatically after each print() call.
Chapter5.DictionariesandStructuringDataInthischapter,Iwillcoverthedictionarydatatype,whichprovidesaflexiblewaytoaccessandorganizedata.Then,combiningdictionarieswithyourknowledgeoflistsfromthepreviouschapter,you’lllearnhowtocreateadatastructuretomodelatic-tac-toeboard.
TheDictionaryDataTypeLikealist,adictionaryisacollectionofmanyvalues.Butunlikeindexesforlists,indexesfordictionariescanusemanydifferentdatatypes,notjustintegers.Indexesfordictionariesarecalledkeys,andakeywithitsassociatedvalueiscalledakey-valuepair.
Incode,adictionaryistypedwithbraces,{}.Enterthefollowingintotheinteractiveshell:>>>myCat={'size':'fat','color':'gray','disposition':'loud'}
ThisassignsadictionarytothemyCatvariable.Thisdictionary’skeysare'size','color',and'disposition'.Thevaluesforthesekeysare'fat','gray',and'loud',respectively.Youcanaccessthesevaluesthroughtheirkeys:
>>>myCat['size']
'fat'
>>>'Mycathas'+myCat['color']+'fur.'
'Mycathasgrayfur.'
Dictionariescanstilluseintegervaluesaskeys,justlikelistsuseintegersforindexes,buttheydonothavetostartat0andcanbeanynumber.
>>>spam={12345:'LuggageCombination',42:'TheAnswer'}
Dictionariesvs.ListsUnlikelists,itemsindictionariesareunordered.Thefirstiteminalistnamedspamwouldbespam[0].Butthereisno“first”iteminadictionary.Whiletheorderofitemsmattersfordeterminingwhethertwolistsarethesame,itdoesnotmatterinwhatorderthekey-valuepairsaretypedinadictionary.Enterthefollowingintotheinteractiveshell:
>>>spam=['cats','dogs','moose']
>>>bacon=['dogs','moose','cats']
>>>spam==bacon
False
>>>eggs={'name':'Zophie','species':'cat','age':'8'}
>>>ham={'species':'cat','age':'8','name':'Zophie'}
>>>eggs==ham
True
Becausedictionariesarenotordered,theycan’tbeslicedlikelists.
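The comparison above can be sketched as a short script. One assumption to flag: the text describes older Python 3 releases. From Python 3.7 on, a dictionary does remember the order in which its keys were inserted when you loop over it, but equality still ignores that order, as this sketch illustrates.

```python
eggs = {'name': 'Zophie', 'species': 'cat', 'age': '8'}
ham = {'species': 'cat', 'age': '8', 'name': 'Zophie'}

# Equality compares the key-value pairs, not the order they were typed in.
same = (eggs == ham)

# On Python 3.7 and later, looping over a dictionary follows insertion order,
# so the two key lists below differ even though the dictionaries are equal.
eggs_keys = list(eggs)
ham_keys = list(ham)

print(same)
print(eggs_keys)
print(ham_keys)
```

Even with insertion order preserved, there is still no integer position you can index or slice by, so the advice about not slicing dictionaries holds.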
TryingtoaccessakeythatdoesnotexistinadictionarywillresultinaKeyErrorerrormessage,muchlikealist’s“out-of-range”IndexErrorerrormessage.Enterthefollowingintotheinteractiveshell,andnoticetheerrormessagethatshowsupbecausethereisno'color'key:
>>>spam={'name':'Zophie','age':7}
>>>spam['color']
Traceback(mostrecentcalllast):
File"<pyshell#1>",line1,in<module>
spam['color']
KeyError:'color'
Thoughdictionariesarenotordered,thefactthatyoucanhavearbitraryvaluesforthekeysallowsyoutoorganizeyourdatainpowerfulways.Sayyouwantedyourprogramtostoredataaboutyourfriends’birthdays.Youcanuseadictionarywiththenamesaskeysandthebirthdaysasvalues.Openanewfileeditorwindowandenterthefollowingcode.Saveitasbirthdays.py.
birthdays = {'Alice': 'Apr 1', 'Bob': 'Dec 12', 'Carol': 'Mar 4'}  # ➊

while True:
    print('Enter a name: (blank to quit)')
    name = input()
    if name == '':
        break

    if name in birthdays:  # ➋
        print(birthdays[name] + ' is the birthday of ' + name)  # ➌
    else:
        print('I do not have birthday information for ' + name)
        print('What is their birthday?')
        bday = input()
        birthdays[name] = bday  # ➍
        print('Birthday database updated.')
Youcreateaninitialdictionaryandstoreitinbirthdays➊.Youcanseeiftheenterednameexistsasakeyinthedictionarywiththeinkeyword➋,justasyoudidforlists.Ifthenameisinthedictionary,youaccesstheassociatedvalueusingsquarebrackets➌;ifnot,youcanadditusingthesamesquarebracketsyntaxcombinedwiththeassignmentoperator➍.
Whenyourunthisprogram,itwilllooklikethis:Enteraname:(blanktoquit)
Alice
Apr1isthebirthdayofAlice
Enteraname:(blanktoquit)
Eve
IdonothavebirthdayinformationforEve
Whatistheirbirthday?
Dec5
Birthdaydatabaseupdated.
Enteraname:(blanktoquit)
Eve
Dec5isthebirthdayofEve
Enteraname:(blanktoquit)
Ofcourse,allthedatayouenterinthisprogramisforgottenwhentheprogramterminates.You’lllearnhowtosavedatatofilesontheharddriveinChapter8.
Thekeys(),values(),anditems()MethodsTherearethreedictionarymethodsthatwillreturnlist-likevaluesofthedictionary’skeys,values,orbothkeysandvalues:keys(),values(),anditems().Thevaluesreturnedbythesemethodsarenottruelists:Theycannotbemodifiedanddonothaveanappend()method.Butthesedatatypes(dict_keys,dict_values,anddict_items,respectively)canbeusedinforloops.Toseehowthesemethodswork,enterthefollowingintotheinteractiveshell:
>>>spam={'color':'red','age':42}
>>>forvinspam.values():
print(v)
red
42
Here,aforloopiteratesovereachofthevaluesinthespamdictionary.Aforloopcanalsoiterateoverthekeysorbothkeysandvalues:
>>> for k in spam.keys():
        print(k)
color
age
>>>foriinspam.items():
print(i)
('color','red')
('age',42)
Usingthekeys(),values(),anditems()methods,aforloopcaniterateoverthekeys,values,orkey-valuepairsinadictionary,respectively.Noticethatthevaluesinthedict_itemsvaluereturnedbytheitems()methodaretuplesofthekeyandvalue.
Ifyouwantatruelistfromoneofthesemethods,passitslist-likereturnvaluetothelist()function.Enterthefollowingintotheinteractiveshell:
>>>spam={'color':'red','age':42}
>>>spam.keys()
dict_keys(['color','age'])
>>>list(spam.keys())
['color','age']
Thelist(spam.keys())linetakesthedict_keysvaluereturnedfromkeys()andpassesittolist(),whichthenreturnsalistvalueof['color','age'].
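As a rough illustration of the difference between the list-like value and a true list, the sketch below checks for an append() method on the dict_keys value and then modifies the copy made by list(). The variable names keys_view and key_list are made up for this example.

```python
spam = {'color': 'red', 'age': 42}

keys_view = spam.keys()

# The dict_keys value has no append() method, unlike a list.
has_append = hasattr(keys_view, 'append')

# list() copies the keys into a real list, which can be modified freely.
key_list = list(keys_view)
key_list.append('name')

# Changing the copied list does not add a key to the dictionary.
print(has_append)
print(key_list)
print(spam)
```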
Youcanalsousethemultipleassignmenttrickinaforlooptoassignthekeyandvaluetoseparatevariables.Enterthefollowingintotheinteractiveshell:
>>>spam={'color':'red','age':42}
>>>fork,vinspam.items():
print('Key:'+k+'Value:'+str(v))
Key:ageValue:42
Key:colorValue:red
CheckingWhetheraKeyorValueExistsinaDictionaryRecallfromthepreviouschapterthattheinandnotinoperatorscancheckwhetheravalueexistsinalist.Youcanalsousetheseoperatorstoseewhetheracertainkeyorvalueexistsinadictionary.Enterthefollowingintotheinteractiveshell:
>>>spam={'name':'Zophie','age':7}
>>>'name'inspam.keys()
True
>>>'Zophie'inspam.values()
True
>>>'color'inspam.keys()
False
>>>'color'notinspam.keys()
True
>>>'color'inspam
False
Inthepreviousexample,noticethat'color'inspamisessentiallyashorterversionofwriting'color'inspam.keys().Thisisalwaysthecase:Ifyoueverwanttocheckwhetheravalueis(orisn’t)akeyinthedictionary,youcansimplyusethein(ornotin)keywordwiththedictionaryvalueitself.
Theget()MethodIt’stedioustocheckwhetherakeyexistsinadictionarybeforeaccessingthatkey’svalue.Fortunately,dictionarieshaveaget()methodthattakestwoarguments:thekeyofthevaluetoretrieveandafallbackvaluetoreturnifthatkeydoesnotexist.
Enterthefollowingintotheinteractiveshell:>>>picnicItems={'apples':5,'cups':2}
>>>'Iambringing'+str(picnicItems.get('cups',0))+'cups.'
'Iambringing2cups.'
>>>'Iambringing'+str(picnicItems.get('eggs',0))+'eggs.'
'Iambringing0eggs.'
Becausethereisno'eggs'keyinthepicnicItemsdictionary,thedefaultvalue0isreturnedbytheget()method.Withoutusingget(),thecodewouldhavecausedanerrormessage,suchasinthefollowingexample:
>>>picnicItems={'apples':5,'cups':2}
>>>'Iambringing'+str(picnicItems['eggs'])+'eggs.'
Traceback(mostrecentcalllast):
File"<pyshell#34>",line1,in<module>
'Iambringing'+str(picnicItems['eggs'])+'eggs.'
KeyError:'eggs'
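A small sketch of the three cases get() can run into. It also shows one detail not covered above: the second argument is optional, and when it is left out, get() returns None for a missing key.

```python
picnicItems = {'apples': 5, 'cups': 2}

# Key is present: the stored value is returned and the fallback is ignored.
cups = picnicItems.get('cups', 0)

# Key is missing: the fallback value is returned instead of raising KeyError.
eggs = picnicItems.get('eggs', 0)

# Key is missing and no fallback is given: get() returns None.
napkins = picnicItems.get('napkins')

print(cups, eggs, napkins)
```

Note that get() never adds the missing key to the dictionary; picnicItems still has only its two original keys afterward.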
Thesetdefault()MethodYou’lloftenhavetosetavalueinadictionaryforacertainkeyonlyifthatkeydoesnotalreadyhaveavalue.Thecodelookssomethinglikethis:
spam={'name':'Pooka','age':5}
if'color'notinspam:
spam['color']='black'
Thesetdefault()methodoffersawaytodothisinonelineofcode.Thefirstargumentpassedtothemethodisthekeytocheckfor,andthesecondargumentisthevaluetosetatthatkeyifthekeydoesnotexist.Ifthekeydoesexist,thesetdefault()methodreturnsthekey’svalue.Enterthefollowingintotheinteractiveshell:
>>>spam={'name':'Pooka','age':5}
>>>spam.setdefault('color','black')
'black'
>>>spam
{'color':'black','age':5,'name':'Pooka'}
>>>spam.setdefault('color','white')
'black'
>>>spam
{'color':'black','age':5,'name':'Pooka'}
Thefirsttimesetdefault()iscalled,thedictionaryinspamchangesto{'color':'black','age':5,'name':'Pooka'}.Themethodreturnsthevalue'black'becausethisisnowthevaluesetforthekey'color'.Whenspam.setdefault('color','white')iscallednext,thevalueforthatkeyisnotchangedto'white'becausespamalreadyhasakeynamed'color'.
Thesetdefault()methodisaniceshortcuttoensurethatakeyexists.Hereisashortprogramthatcountsthenumberofoccurrencesofeachletterinastring.Openthefileeditorwindowandenterthefollowingcode,savingitascharacterCount.py:
message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = {}

for character in message:
    count.setdefault(character, 0)
    count[character] = count[character] + 1

print(count)
Theprogramloopsovereachcharacterinthemessagevariable’sstring,countinghowofteneachcharacterappears.Thesetdefault()methodcallensuresthatthekeyisinthecountdictionary(withadefaultvalueof0)sotheprogramdoesn’tthrowaKeyErrorerrorwhencount[character]=count[character]+1isexecuted.Whenyourunthisprogram,theoutputwilllooklikethis:
{' ': 13, ',': 1, '.': 1, 'A': 1, 'I': 1, 'a': 4, 'c': 3, 'b': 1, 'e': 5, 'd': 3, 'g': 2, 'i':
6, 'h': 3, 'k': 2, 'l': 3, 'o': 2, 'n': 4, 'p': 1, 's': 3, 'r': 5, 't': 6, 'w': 2, 'y': 1}
From the output, you can see that the lowercase letter c appears 3 times, the space character appears 13 times, and the uppercase letter A appears 1 time. This program will work no matter what string is inside the message variable, even if the string is millions of characters long!
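For comparison, here is a sketch that runs the setdefault() loop next to collections.Counter from the standard library. Counter is not the technique this chapter teaches; it is shown only as a way to cross-check the counts.

```python
from collections import Counter

message = 'It was a bright cold day in April, and the clocks were striking thirteen.'

# The approach from this chapter: make sure the key exists, then add 1.
count = {}
for character in message:
    count.setdefault(character, 0)
    count[character] = count[character] + 1

# A standard-library alternative that does the same tallying in one call.
counter = Counter(message)

print(count['c'], count[' '], count['A'])
print(count == counter)
```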
PrettyPrintingIfyouimportthepprintmoduleintoyourprograms,you’llhaveaccesstothepprint()andpformat()functionsthatwill“prettyprint”adictionary’svalues.Thisishelpfulwhenyouwantacleanerdisplayoftheitemsinadictionarythanwhatprint()provides.ModifythepreviouscharacterCount.pyprogramandsaveitasprettyCharacterCount.py.
import pprint

message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = {}

for character in message:
    count.setdefault(character, 0)
    count[character] = count[character] + 1

pprint.pprint(count)
This time, when the program is run, the output looks much cleaner, with the keys sorted.
{' ': 13,
 ',': 1,
 '.': 1,
 'A': 1,
 'I': 1,
 'a': 4,
 'b': 1,
 'c': 3,
 'd': 3,
 'e': 5,
 'g': 2,
 'h': 3,
 'i': 6,
 'k': 2,
 'l': 3,
 'n': 4,
 'o': 2,
 'p': 1,
 'r': 5,
 's': 3,
 't': 6,
 'w': 2,
 'y': 1}
Thepprint.pprint()functionisespeciallyhelpfulwhenthedictionaryitselfcontainsnestedlistsordictionaries.
Ifyouwanttoobtaintheprettifiedtextasastringvalueinsteadofdisplayingitonthescreen,callpprint.pformat()instead.Thesetwolinesareequivalenttoeachother:
pprint.pprint(someDictionaryValue)
print(pprint.pformat(someDictionaryValue))
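A brief sketch of pformat() with a nested dictionary. The guests data is borrowed from the picnic example later in this chapter, and the variable name formatted is just for illustration.

```python
import pprint

guests = {'Alice': {'apples': 5, 'pretzels': 12},
          'Bob': {'ham sandwiches': 3, 'apples': 2},
          'Carol': {'cups': 3, 'apple pies': 1}}

# pformat() returns the prettified text as a string instead of printing it.
formatted = pprint.pformat(guests)

# Because it is an ordinary string, it can be stored, measured, or printed later.
is_string = isinstance(formatted, str)
print(formatted)
```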
UsingDataStructurestoModelReal-WorldThingsEvenbeforetheInternet,itwaspossibletoplayagameofchesswithsomeoneontheothersideoftheworld.Eachplayerwouldsetupachessboardattheirhomeandthentaketurnsmailingapostcardtoeachotherdescribingeachmove.Todothis,theplayersneededawaytounambiguouslydescribethestateoftheboardandtheirmoves.
Inalgebraicchessnotation,thespacesonthechessboardareidentifiedbyanumberandlettercoordinate,asinFigure5-1.
Figure5-1.Thecoordinatesofachessboardinalgebraicchessnotation
Thechesspiecesareidentifiedbyletters:Kforking,Qforqueen,Rforrook,Bforbishop,andNforknight.Describingamoveusestheletterofthepieceandthecoordinatesofitsdestination.Apairofthesemovesdescribeswhathappensinasingleturn(withwhitegoingfirst);forinstance,thenotation2.Nf3Nc6indicatesthatwhitemovedaknighttof3andblackmovedaknighttoc6onthesecondturnofthegame.
There’sabitmoretoalgebraicnotationthanthis,butthepointisthatyoucanuseittounambiguouslydescribeagameofchesswithoutneedingtobeinfrontofachessboard.Youropponentcanevenbeontheothersideoftheworld!Infact,youdon’tevenneedaphysicalchesssetifyouhaveagoodmemory:Youcanjustreadthemailedchessmovesandupdateboardsyouhaveinyourimagination.
Computershavegoodmemories.Aprogramonamoderncomputercaneasilystorebillionsofstringslike'2.Nf3Nc6'.Thisishowcomputerscanplaychesswithouthavingaphysicalchessboard.Theymodeldatatorepresentachessboard,andyoucanwritecodetoworkwiththismodel.
Thisiswherelistsanddictionariescancomein.Youcanusethemtomodelreal-worldthings,likechessboards.Forthefirstexample,you’lluseagamethat’salittlesimplerthanchess:tic-tac-toe.
ATic-Tac-ToeBoardAtic-tac-toeboardlookslikealargehashsymbol(#)withnineslotsthatcaneachcontainanX,anO,orablank.Torepresenttheboardwithadictionary,youcanassigneachslotastring-valuekey,asshowninFigure5-2.
Youcanusestringvaluestorepresentwhat’sineachslotontheboard:'X','O',or''(aspacecharacter).Thus,you’llneedtostoreninestrings.Youcanuseadictionaryofvaluesforthis.Thestringvaluewiththekey'top-R'canrepresentthetop-rightcorner,thestringvaluewiththekey'low-L'canrepresentthebottom-leftcorner,thestringvaluewiththekey'mid-M'canrepresentthemiddle,andsoon.
Figure 5-2. The slots of a tic-tac-toe board with their corresponding keys
Thisdictionaryisadatastructurethatrepresentsatic-tac-toeboard.Storethisboard-as-a-dictionaryinavariablenamedtheBoard.Openanewfileeditorwindow,andenterthefollowingsourcecode,savingitasticTacToe.py:
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
            'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
            'low-L': ' ', 'low-M': ' ', 'low-R': ' '}
The data structure stored in the theBoard variable represents the tic-tac-toe board in Figure 5-3.
Figure5-3.Anemptytic-tac-toeboard
SincethevalueforeverykeyintheBoardisasingle-spacestring,thisdictionaryrepresentsacompletelyclearboard.IfplayerXwentfirstandchosethemiddlespace,youcouldrepresentthatboardwiththisdictionary:
theBoard={'top-L':'','top-M':'','top-R':'',
'mid-L':'','mid-M':'X','mid-R':'',
'low-L':'','low-M':'','low-R':''}
ThedatastructureintheBoardnowrepresentsthetic-tac-toeboardinFigure5-4.
Figure5-4.Thefirstmove
AboardwhereplayerOhaswonbyplacingOsacrossthetopmightlooklikethis:theBoard={'top-L':'O','top-M':'O','top-R':'O',
'mid-L':'X','mid-M':'X','mid-R':'',
'low-L':'','low-M':'','low-R':'X'}
ThedatastructureintheBoardnowrepresentsthetic-tac-toeboardinFigure5-5.
Figure5-5.PlayerOwins.
Ofcourse,theplayerseesonlywhatisprintedtothescreen,notthecontentsofvariables.Let’screateafunctiontoprinttheboarddictionaryontothescreen.MakethefollowingadditiontoticTacToe.py(newcodeisinbold):
theBoard={'top-L':'','top-M':'','top-R':'',
'mid-L':'','mid-M':'','mid-R':'',
'low-L':'','low-M':'','low-R':''}
defprintBoard(board):
print(board['top-L']+'|'+board['top-M']+'|'+board['top-R'])
print('-+-+-')
print(board['mid-L']+'|'+board['mid-M']+'|'+board['mid-R'])
print('-+-+-')
print(board['low-L']+'|'+board['low-M']+'|'+board['low-R'])
printBoard(theBoard)
When you run this program, printBoard() will print out a blank tic-tac-toe board.
 | |
-+-+-
 | |
-+-+-
 | |
TheprintBoard()functioncanhandleanytic-tac-toedatastructureyoupassit.Trychangingthecodetothefollowing:
theBoard={'top-L':'O','top-M':'O','top-R':'O','mid-L':'X','mid-M':
'X','mid-R':'','low-L':'','low-M':'','low-R':'X'}
defprintBoard(board):
print(board['top-L']+'|'+board['top-M']+'|'+board['top-R'])
print('-+-+-')
print(board['mid-L']+'|'+board['mid-M']+'|'+board['mid-R'])
print('-+-+-')
print(board['low-L']+'|'+board['low-M']+'|'+board['low-R'])
printBoard(theBoard)
Nowwhenyourunthisprogram,thenewboardwillbeprintedtothescreen.O|O|O
-+-+-
X|X|
-+-+-
||X
Becauseyoucreatedadatastructuretorepresentatic-tac-toeboardandwrotecodeinprintBoard()tointerpretthatdatastructure,younowhaveaprogramthat“models”thetic-tac-toeboard.Youcouldhaveorganizedyourdatastructuredifferently(forexample,usingkeyslike'TOP-LEFT'insteadof'top-L'),butaslongasthecodeworkswithyourdatastructures,youwillhaveacorrectlyworkingprogram.
Forexample,theprintBoard()functionexpectsthetic-tac-toedatastructuretobeadictionarywithkeysforallnineslots.Ifthedictionaryyoupassedwasmissing,say,the'mid-L'key,yourprogramwouldnolongerwork.
O|O|O
-+-+-
Traceback(mostrecentcalllast):
File"ticTacToe.py",line10,in<module>
printBoard(theBoard)
File"ticTacToe.py",line6,inprintBoard
print(board['mid-L']+'|'+board['mid-M']+'|'+board['mid-R'])
KeyError:'mid-L'
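One way to guard against this, sketched under the assumption that you want to check a board before printing it, is to list the keys printBoard() relies on and report any that are absent. The names BOARD_KEYS and missingBoardKeys() are hypothetical helpers, not part of ticTacToe.py.

```python
# The nine slot keys that printBoard() looks up.
BOARD_KEYS = ['top-L', 'top-M', 'top-R',
              'mid-L', 'mid-M', 'mid-R',
              'low-L', 'low-M', 'low-R']

def missingBoardKeys(board):
    """Return the slot keys that are expected but not present in board."""
    return [key for key in BOARD_KEYS if key not in board]

# A board dictionary that accidentally leaves out the 'mid-L' slot.
incompleteBoard = {'top-L': 'O', 'top-M': 'O', 'top-R': 'O',
                   'mid-M': 'X', 'mid-R': ' ',
                   'low-L': ' ', 'low-M': ' ', 'low-R': 'X'}

print(missingBoardKeys(incompleteBoard))
```

An empty list from missingBoardKeys() means the dictionary has every key printBoard() needs.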
Nowlet’saddcodethatallowstheplayerstoentertheirmoves.ModifytheticTacToe.pyprogramtolooklikethis:
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
            'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
            'low-L': ' ', 'low-M': ' ', 'low-R': ' '}

def printBoard(board):
    print(board['top-L'] + '|' + board['top-M'] + '|' + board['top-R'])
    print('-+-+-')
    print(board['mid-L'] + '|' + board['mid-M'] + '|' + board['mid-R'])
    print('-+-+-')
    print(board['low-L'] + '|' + board['low-M'] + '|' + board['low-R'])

turn = 'X'
for i in range(9):
    printBoard(theBoard)  # ➊
    print('Turn for ' + turn + '. Move on which space?')
    move = input()  # ➋
    theBoard[move] = turn  # ➌
    if turn == 'X':  # ➍
        turn = 'O'
    else:
        turn = 'X'
printBoard(theBoard)
The new code prints out the board at the start of each new turn ➊, gets the active player’s move ➋, updates the game board accordingly ➌, and then swaps the active player ➍ before moving on to the next turn.
Whenyourunthisprogram,itwilllooksomethinglikethis:||
-+-+-
||
-+-+-
||
TurnforX.Moveonwhichspace?
mid-M
||
-+-+-
|X|
-+-+-
||
TurnforO.Moveonwhichspace?
low-L
||
-+-+-
|X|
-+-+-
O||
--snip--
O|O|X
-+-+-
X|X|O
-+-+-
O||X
TurnforX.Moveonwhichspace?
low-M
O|O|X
-+-+-
X|X|O
-+-+-
O|X|X
Thisisn’tacompletetic-tac-toegame—forinstance,itdoesn’tevercheckwhetheraplayerhaswon—butit’senoughtoseehowdatastructurescanbeusedinprograms.
NOTE
Ifyouarecurious,thesourcecodeforacompletetic-tac-toeprogramisdescribedintheresourcesavailablefromhttp://nostarch.com/automatestuff/.
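As a rough sketch of what a win check could look like with this data structure (this is not the code from the book's resources, and the names WINNING_LINES and hasWon() are made up here), you can list the eight lines of three slots and test whether one player holds all three slots of any line.

```python
# Every row, column, and diagonal, written with the board's slot keys.
WINNING_LINES = [
    ('top-L', 'top-M', 'top-R'),
    ('mid-L', 'mid-M', 'mid-R'),
    ('low-L', 'low-M', 'low-R'),
    ('top-L', 'mid-L', 'low-L'),
    ('top-M', 'mid-M', 'low-M'),
    ('top-R', 'mid-R', 'low-R'),
    ('top-L', 'mid-M', 'low-R'),
    ('top-R', 'mid-M', 'low-L'),
]

def hasWon(board, player):
    """Return True if player ('X' or 'O') fills all three slots of any line."""
    for a, b, c in WINNING_LINES:
        if board[a] == player and board[b] == player and board[c] == player:
            return True
    return False

# The board from Figure 5-5, where player O has filled the top row.
theBoard = {'top-L': 'O', 'top-M': 'O', 'top-R': 'O',
            'mid-L': 'X', 'mid-M': 'X', 'mid-R': ' ',
            'low-L': ' ', 'low-M': ' ', 'low-R': 'X'}

print(hasWon(theBoard, 'O'))
print(hasWon(theBoard, 'X'))
```

Keeping the winning lines in a list of tuples means the check is data rather than eight separate if statements, which fits the chapter's theme of letting the data structure do the modeling.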
NestedDictionariesandListsModelingatic-tac-toeboardwasfairlysimple:Theboardneededonlyasingledictionaryvaluewithninekey-valuepairs.Asyoumodelmorecomplicatedthings,youmayfindyouneeddictionariesandliststhatcontainotherdictionariesandlists.Listsareusefultocontainanorderedseriesofvalues,anddictionariesareusefulforassociatingkeyswithvalues.Forexample,here’saprogramthatusesadictionarythatcontainsotherdictionariesinordertoseewhoisbringingwhattoapicnic.ThetotalBrought()functioncanreadthisdatastructureandcalculatethetotalnumberofanitembeingbroughtbyalltheguests.
allGuests = {'Alice': {'apples': 5, 'pretzels': 12},
             'Bob': {'ham sandwiches': 3, 'apples': 2},
             'Carol': {'cups': 3, 'apple pies': 1}}

def totalBrought(guests, item):
    numBrought = 0
    for k, v in guests.items():  # ➊
        numBrought = numBrought + v.get(item, 0)  # ➋
    return numBrought

print('Number of things being brought:')
print(' - Apples ' + str(totalBrought(allGuests, 'apples')))
print(' - Cups ' + str(totalBrought(allGuests, 'cups')))
print(' - Cakes ' + str(totalBrought(allGuests, 'cakes')))
print(' - Ham Sandwiches ' + str(totalBrought(allGuests, 'ham sandwiches')))
print(' - Apple Pies ' + str(totalBrought(allGuests, 'apple pies')))
Inside the totalBrought() function, the for loop iterates over the key-value pairs in guests ➊. Inside the loop, the string of the guest’s name is assigned to k, and the dictionary of picnic items they’re bringing is assigned to v. If the item parameter exists as a key in this dictionary, its value (the quantity) is added to numBrought ➋. If it does not exist as a key, the get() method returns 0 to be added to numBrought.
The output of this program looks like this:
Number of things being brought:
 - Apples 7
 - Cups 3
 - Cakes 0
 - Ham Sandwiches 3
 - Apple Pies 1
Thismayseemlikesuchasimplethingtomodelthatyouwouldn’tneedtobotherwithwritingaprogramtodoit.ButrealizethatthissametotalBrought()functioncouldeasilyhandleadictionarythatcontainsthousandsofguests,eachbringingthousandsofdifferentpicnicitems.ThenhavingthisinformationinadatastructurealongwiththetotalBrought()functionwouldsaveyoualotoftime!
Youcanmodelthingswithdatastructuresinwhateverwayyoulike,aslongastherestofthecodeinyourprogramcanworkwiththedatamodelcorrectly.Whenyoufirstbeginprogramming,don’tworrysomuchaboutthe“right”waytomodeldata.Asyougainmoreexperience,youmaycomeupwithmoreefficientmodels,buttheimportantthingisthatthedatamodelworksforyourprogram’sneeds.
SummaryYoulearnedallaboutdictionariesinthischapter.Listsanddictionariesarevaluesthatcancontainmultiplevalues,includingotherlistsanddictionaries.Dictionariesareusefulbecauseyoucanmaponeitem(thekey)toanother(thevalue),asopposedtolists,whichsimplycontainaseriesofvaluesinorder.Valuesinsideadictionaryareaccessedusingsquarebracketsjustaswithlists.Insteadofanintegerindex,dictionariescanhavekeysofavarietyofdatatypes:integers,floats,strings,ortuples.Byorganizingaprogram’svaluesintodatastructures,youcancreaterepresentationsofreal-worldobjects.Yousawanexampleofthiswithatic-tac-toeboard.
ThatjustaboutcoversallthebasicconceptsofPythonprogramming!You’llcontinuetolearnnewconceptsthroughouttherestofthisbook,butyounowknowenoughtostartwritingsomeusefulprogramsthatcanautomatetasks.YoumightnotthinkyouhaveenoughPythonknowledgetodothingssuchasdownloadwebpages,updatespreadsheets,orsendtextmessages,butthat’swherePythonmodulescomein!Thesemodules,writtenbyotherprogrammers,providefunctionsthatmakeiteasyforyoutodoallthesethings.Solet’slearnhowtowriterealprogramstodousefulautomatedtasks.
PracticeQuestionsQ: 1.Whatdoesthecodeforanemptydictionarylooklike?
Q: 2.Whatdoesadictionaryvaluewithakey'foo'andavalue42looklike?
Q: 3.Whatisthemaindifferencebetweenadictionaryandalist?
Q: 4.Whathappensifyoutrytoaccessspam['foo']ifspamis{'bar':100}?
Q: 5.Ifadictionaryisstoredinspam,whatisthedifferencebetweentheexpressions'cat'inspamand'cat'inspam.keys()?
Q: 6.Ifadictionaryisstoredinspam,whatisthedifferencebetweentheexpressions'cat'inspamand'cat'inspam.values()?
Q: 7.Whatisashortcutforthefollowingcode?if'color'notinspam:
spam['color']='black'
Q: 8.Whatmoduleandfunctioncanbeusedto“prettyprint”dictionaryvalues?
PracticeProjectsForpractice,writeprogramstodothefollowingtasks.
FantasyGameInventoryYouarecreatingafantasyvideogame.Thedatastructuretomodeltheplayer’sinventorywillbeadictionarywherethekeysarestringvaluesdescribingtheitemintheinventoryandthevalueisanintegervaluedetailinghowmanyofthatitemtheplayerhas.Forexample,thedictionaryvalue{'rope':1,'torch':6,'goldcoin':42,'dagger':1,'arrow':12}meanstheplayerhas1rope,6torches,42goldcoins,andsoon.
WriteafunctionnameddisplayInventory()thatwouldtakeanypossible“inventory”anddisplayitlikethefollowing:
Inventory:
12arrow
42goldcoin
1rope
6torch
1dagger
Total number of items: 62
Hint: You can use a for loop to loop through all the keys in a dictionary.
# inventory.py
stuff = {'rope': 1, 'torch': 6, 'gold coin': 42, 'dagger': 1, 'arrow': 12}

def displayInventory(inventory):
    print("Inventory:")
    item_total = 0
    for k, v in inventory.items():
        # Print each quantity and item name, and add the quantity to the total.
        print(str(v) + ' ' + k)
        item_total += v
    print("Total number of items: " + str(item_total))

displayInventory(stuff)
ListtoDictionaryFunctionforFantasyGameInventoryImaginethatavanquisheddragon’slootisrepresentedasalistofstringslikethis:
dragonLoot=['goldcoin','dagger','goldcoin','goldcoin','ruby']
WriteafunctionnamedaddToInventory(inventory,addedItems),wheretheinventoryparameterisadictionaryrepresentingtheplayer’sinventory(likeinthepreviousproject)andtheaddedItemsparameterisalistlikedragonLoot.TheaddToInventory()functionshouldreturnadictionarythatrepresentstheupdatedinventory.NotethattheaddedItemslistcancontainmultiplesofthesameitem.Yourcodecouldlooksomethinglikethis:
defaddToInventory(inventory,addedItems):
#yourcodegoeshere
inv={'goldcoin':42,'rope':1}
dragonLoot=['goldcoin','dagger','goldcoin','goldcoin','ruby']
inv=addToInventory(inv,dragonLoot)
displayInventory(inv)
Thepreviousprogram(withyourdisplayInventory()functionfromthepreviousproject)wouldoutputthefollowing:
Inventory:
45goldcoin
Chapter6.ManipulatingStringsTextisoneofthemostcommonformsofdatayourprogramswillhandle.Youalreadyknowhowtoconcatenatetwostringvaluestogetherwiththe+operator,butyoucandomuchmorethanthat.Youcanextractpartialstringsfromstringvalues,addorremovespacing,convertletterstolowercaseoruppercase,andcheckthatstringsareformattedcorrectly.YoucanevenwritePythoncodetoaccesstheclipboardforcopyingandpastingtext.
Inthischapter,you’lllearnallthisandmore.Thenyou’llworkthroughtwodifferentprogrammingprojects:asimplepasswordmanagerandaprogramtoautomatetheboringchoreofformattingpiecesoftext.
WorkingwithStringsLet’slookatsomeofthewaysPythonletsyouwrite,print,andaccessstringsinyourcode.
StringLiteralsTypingstringvaluesinPythoncodeisfairlystraightforward:Theybeginandendwithasinglequote.Butthenhowcanyouuseaquoteinsideastring?Typing'ThatisAlice'scat.'won’twork,becausePythonthinksthestringendsafterAlice,andtherest(scat.')isinvalidPythoncode.Fortunately,therearemultiplewaystotypestrings.
DoubleQuotes
Stringscanbeginandendwithdoublequotes,justastheydowithsinglequotes.Onebenefitofusingdoublequotesisthatthestringcanhaveasinglequotecharacterinit.Enterthefollowingintotheinteractiveshell:
>>>spam="ThatisAlice'scat."
Sincethestringbeginswithadoublequote,Pythonknowsthatthesinglequoteispartofthestringandnotmarkingtheendofthestring.However,ifyouneedtousebothsinglequotesanddoublequotesinthestring,you’llneedtouseescapecharacters.
EscapeCharacters
Anescapecharacterletsyouusecharactersthatareotherwiseimpossibletoputintoastring.Anescapecharacterconsistsofabackslash(\)followedbythecharacteryouwanttoaddtothestring.(Despiteconsistingoftwocharacters,itiscommonlyreferredtoasasingularescapecharacter.)Forexample,theescapecharacterforasinglequoteis\'.Youcanusethisinsideastringthatbeginsandendswithsinglequotes.Toseehowescapecharacterswork,enterthefollowingintotheinteractiveshell:
>>>spam='SayhitoBob\'smother.'
PythonknowsthatsincethesinglequoteinBob\'shasabackslash,itisnotasinglequotemeanttoendthestringvalue.Theescapecharacters\'and\"letyouputsinglequotesanddoublequotesinsideyourstrings,respectively.
Table6-1liststheescapecharactersyoucanuse.
Table6-1.EscapeCharacters
Escapecharacter Printsas
\' Singlequote
\" Doublequote
\t Tab
\n Newline(linebreak)
\\ Backslash
Enterthefollowingintotheinteractiveshell:>>>print("Hellothere!\nHowareyou?\nI\'mdoingfine.")
Hellothere!
Howareyou?
I'mdoingfine.
RawStrings
Youcanplaceanrbeforethebeginningquotationmarkofastringtomakeitarawstring.Arawstringcompletelyignoresallescapecharactersandprintsanybackslashthatappearsinthestring.Forexample,typethefollowingintotheinteractiveshell:
>>>print(r'ThatisCarol\'scat.')
ThatisCarol\'scat.
Becausethisisarawstring,Pythonconsidersthebackslashaspartofthestringandnotasthestartofanescapecharacter.Rawstringsarehelpfulifyouaretypingstringvaluesthatcontainmanybackslashes,suchasthestringsusedforregularexpressionsdescribedinthenextchapter.
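To make the difference concrete, this sketch compares an ordinary string with its raw counterpart. The Windows-style path is an invented example, chosen only because such paths contain many backslashes.

```python
# In an ordinary string, \' is an escape character that produces a single quote.
escaped = 'That is Carol\'s cat.'

# In a raw string, the backslash stays in the string as a character of its own.
raw = r'That is Carol\'s cat.'

# A made-up path written both ways; the two spellings are the same string value.
pathEscaped = 'C:\\Users\\Al\\notes.txt'
pathRaw = r'C:\Users\Al\notes.txt'

print(escaped)
print(raw)
print(pathEscaped == pathRaw)
```

The raw version of the sentence is one character longer than the escaped version because it keeps the backslash.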
MultilineStringswithTripleQuotes
Whileyoucanusethe\nescapecharactertoputanewlineintoastring,itisofteneasiertousemultilinestrings.AmultilinestringinPythonbeginsandendswitheitherthreesinglequotesorthreedoublequotes.Anyquotes,tabs,ornewlinesinbetweenthe“triplequotes”areconsideredpartofthestring.Python’sindentationrulesforblocksdonotapplytolinesinsideamultilinestring.
Openthefileeditorandwritethefollowing:print('''DearAlice,
Eve'scathasbeenarrestedforcatnapping,catburglary,andextortion.
Sincerely,
Bob''')
Savethisprogramascatnapping.pyandrunit.Theoutputwilllooklikethis:DearAlice,
Eve'scathasbeenarrestedforcatnapping,catburglary,andextortion.
Sincerely,
Bob
Notice that the single quote character in Eve's does not need to be escaped. Escaping single and double quotes is optional in multiline strings. The following print() call would print identical text but doesn’t use a multiline string:
print('Dear Alice,\n\nEve\'s cat has been arrested for catnapping, cat burglary, and extortion.\n\nSincerely,\nBob')
MultilineComments
Whilethehashcharacter(#)marksthebeginningofacommentfortherestoftheline,amultilinestringisoftenusedforcommentsthatspanmultiplelines.ThefollowingisperfectlyvalidPythoncode:
"""ThisisatestPythonprogram.
ThisprogramwasdesignedforPython3,notPython2.
"""
defspam():
"""Thisisamultilinecommenttohelp
explainwhatthespam()functiondoes."""
print('Hello!')
IndexingandSlicingStringsStringsuseindexesandslicesthesamewaylistsdo.Youcanthinkofthestring'Helloworld!'asalistandeachcharacterinthestringasanitemwithacorrespondingindex.
'   H   e   l   l   o       w   o   r   l   d   !   '
    0   1   2   3   4   5   6   7   8   9   10  11
Thespaceandexclamationpointareincludedinthecharactercount,so'Helloworld!'is12characterslong,fromHatindex0to!atindex11.
Enterthefollowingintotheinteractiveshell:>>>spam='Helloworld!'
>>>spam[0]
'H'
>>>spam[4]
'o'
>>>spam[-1]
'!'
>>>spam[0:5]
'Hello'
>>>spam[:5]
'Hello'
>>>spam[6:]
'world!'
Ifyouspecifyanindex,you’llgetthecharacteratthatpositioninthestring.Ifyouspecifyarangefromoneindextoanother,thestartingindexisincludedandtheendingindexisnot.That’swhy,ifspamis'Helloworld!',spam[0:5]is'Hello'.Thesubstringyougetfromspam[0:5]willincludeeverythingfromspam[0]tospam[4],leavingoutthespaceatindex5.
Notethatslicingastringdoesnotmodifytheoriginalstring.Youcancaptureaslicefromonevariableinaseparatevariable.Trytypingthefollowingintotheinteractiveshell:
>>>spam='Helloworld!'
>>>fizz=spam[0:5]
>>>fizz
'Hello'
Byslicingandstoringtheresultingsubstringinanothervariable,youcanhaveboththewholestringandthesubstringhandyforquick,easyaccess.
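A short sketch of how the slice boundaries behave, using the same 'Hello world!' string. The variable names head and tail are chosen just for this example.

```python
spam = 'Hello world!'

# The starting index is included and the ending index is not, so
# spam[:5] stops just before the space at index 5.
head = spam[:5]

# Starting a slice at the same index picks up exactly where head stopped.
tail = spam[5:]

# Because no index is counted twice or skipped, the two pieces rebuild the original.
rebuilt = head + tail

print(repr(head), repr(tail))
print(rebuilt == spam)
```

The original string in spam is unchanged by any of these slices.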
TheinandnotinOperatorswithStringsTheinandnotinoperatorscanbeusedwithstringsjustlikewithlistvalues.AnexpressionwithtwostringsjoinedusinginornotinwillevaluatetoaBooleanTrueorFalse.Enterthefollowingintotheinteractiveshell:
>>>'Hello'in'HelloWorld'
True
>>>'Hello'in'Hello'
True
>>>'HELLO'in'HelloWorld'
False
>>>''in'spam'
True
>>>'cats'notin'catsanddogs'
False
Theseexpressionstestwhetherthefirststring(theexactstring,casesensitive)canbefoundwithinthesecondstring.
UsefulStringMethodsSeveralstringmethodsanalyzestringsorcreatetransformedstringvalues.Thissectiondescribesthemethodsyou’llbeusingmostoften.
The upper(), lower(), isupper(), and islower() String Methods
The upper() and lower() string methods return a new string where all the letters in the original string have been converted to uppercase or lowercase, respectively. Nonletter characters in the string remain unchanged. Enter the following into the interactive shell:
>>>spam='Helloworld!'
>>>spam=spam.upper()
>>>spam
'HELLOWORLD!'
>>>spam=spam.lower()
>>>spam
'helloworld!'
Notethatthesemethodsdonotchangethestringitselfbutreturnnewstringvalues.Ifyouwanttochangetheoriginalstring,youhavetocallupper()orlower()onthestringandthenassignthenewstringtothevariablewheretheoriginalwasstored.Thisiswhyyoumustusespam=spam.upper()tochangethestringinspaminsteadofsimplyspam.upper().(Thisisjustlikeifavariableeggscontainsthevalue10.Writingeggs+3doesnotchangethevalueofeggs,buteggs=eggs+3does.)
Theupper()andlower()methodsarehelpfulifyouneedtomakeacase-insensitivecomparison.Thestrings'great'and'GREat'arenotequaltoeachother.Butinthefollowingsmallprogram,itdoesnotmatterwhethertheusertypesGreat,GREAT,orgrEAT,becausethestringisfirstconvertedtolowercase.
print('Howareyou?')
feeling=input()
iffeeling.lower()=='great':
print('Ifeelgreattoo.')
else:
print('Ihopetherestofyourdayisgood.')
Whenyourunthisprogram,thequestionisdisplayed,andenteringavariationongreat,suchasGREat,willstillgivetheoutputIfeelgreattoo.Addingcodetoyourprogramtohandlevariationsormistakesinuserinput,suchasinconsistentcapitalization,willmakeyourprogramseasiertouseandlesslikelytofail.
Howareyou?
GREat
Ifeelgreattoo.
Theisupper()andislower()methodswillreturnaBooleanTruevalueifthestringhasatleastoneletterandallthelettersareuppercaseorlowercase,respectively.Otherwise,themethodreturnsFalse.Enterthefollowingintotheinteractiveshell,andnoticewhateachmethodcallreturns:
>>>spam='Helloworld!'
>>>spam.islower()
False
>>>spam.isupper()
False
>>>'HELLO'.isupper()
True
>>>'abc12345'.islower()
True
>>>'12345'.islower()
False
>>>'12345'.isupper()
False
Sincetheupper()andlower()stringmethodsthemselvesreturnstrings,youcancallstringmethodsonthosereturnedstringvaluesaswell.Expressionsthatdothiswilllooklikeachainofmethodcalls.Enterthefollowingintotheinteractiveshell:
>>>'Hello'.upper()
'HELLO'
>>>'Hello'.upper().lower()
'hello'
>>>'Hello'.upper().lower().upper()
'HELLO'
>>>'HELLO'.lower()
'hello'
>>>'HELLO'.lower().islower()
True
TheisXStringMethodsAlongwithislower()andisupper(),thereareseveralstringmethodsthathavenamesbeginningwiththewordis.ThesemethodsreturnaBooleanvaluethatdescribesthenatureofthestring.HerearesomecommonisXstringmethods:
isalpha() returns True if the string consists only of letters and is not blank.
isalnum() returns True if the string consists only of letters and numbers and is not blank.
isdecimal() returns True if the string consists only of numeric characters and is not blank.
isspace() returns True if the string consists only of spaces, tabs, and newlines and is not blank.
istitle() returns True if the string consists only of words that begin with an uppercase letter followed by only lowercase letters.
Enter the following into the interactive shell:
>>> 'hello'.isalpha()
True
>>> 'hello123'.isalpha()
False
>>> 'hello123'.isalnum()
True
>>> 'hello'.isalnum()
True
>>> '123'.isdecimal()
True
>>> '    '.isspace()
True
>>> 'This Is Title Case'.istitle()
True
>>> 'This Is Title Case 123'.istitle()
True
>>> 'This Is not Title Case'.istitle()
False
>>> 'This Is NOT Title Case Either'.istitle()
False
TheisXstringmethodsarehelpfulwhenyouneedtovalidateuserinput.Forexample,thefollowingprogramrepeatedlyasksusersfortheirageandapassworduntiltheyprovidevalidinput.Openanewfileeditorwindowandenterthisprogram,savingitasvalidateInput.py:
while True:
    print('Enter your age:')
    age = input()
    if age.isdecimal():
        break
    print('Please enter a number for your age.')

while True:
    print('Select a new password (letters and numbers only):')
    password = input()
    if password.isalnum():
        break
    print('Passwords can only have letters and numbers.')
In the first while loop, we ask the user for their age and store their input in age. If age is a valid (decimal) value, we break out of this first while loop and move on to the second, which asks for a password. Otherwise, we inform the user that they need to enter a number and again ask them to enter their age. In the second while loop, we ask for a password, store the user’s input in password, and break out of the loop if the input was alphanumeric. If it wasn’t, we’re not satisfied, so we tell the user the password needs to be alphanumeric and again ask them to enter a password.
When run, the program’s output looks like this:
Enter your age:
forty two
Please enter a number for your age.
Enter your age:
42
Select a new password (letters and numbers only):
secr3t!
Passwords can only have letters and numbers.
Select a new password (letters and numbers only):
secr3t
Calling isdecimal() and isalnum() on variables, we’re able to test whether the values stored in those variables are decimal or not, alphanumeric or not. Here, these tests help us reject the input forty two and accept 42, and reject secr3t! and accept secr3t.
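If you want to try the same checks without typing answers at a prompt, one option is to wrap them in small functions. This is only a sketch: the function names is_valid_age() and is_valid_password() are made up for illustration and are not part of validateInput.py.

```python
def is_valid_age(text):
    # isdecimal() is True only for a non-empty string of digit characters.
    return text.isdecimal()


def is_valid_password(text):
    # isalnum() is True only for a non-empty string of letters and numbers.
    return text.isalnum()


# Try a few candidate inputs, including some that look numeric but are rejected.
for candidate in ['forty two', '42', '-5', '4.2', '']:
    print(repr(candidate), is_valid_age(candidate))

for candidate in ['secr3t!', 'secr3t']:
    print(repr(candidate), is_valid_password(candidate))
```

Notice that '-5' and '4.2' are rejected as ages, because the minus sign and the period are not decimal characters.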
The startswith() and endswith() String Methods
The startswith() and endswith() methods return True if the string value they are called on begins or ends (respectively) with the string passed to the method; otherwise, they return False. Enter the following into the interactive shell:
>>> 'Hello world!'.startswith('Hello')
True
>>> 'Hello world!'.endswith('world!')
True
>>> 'abc123'.startswith('abcdef')
False
>>> 'abc123'.endswith('12')
False
>>> 'Hello world!'.startswith('Hello world!')
True
>>> 'Hello world!'.endswith('Hello world!')
True
These methods are useful alternatives to the == equals operator if you need to check only whether the first or last part of the string, rather than the whole thing, is equal to another string.
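As a small illustration of checking only one end of a string, here is a sketch that picks filenames out of a list. The filenames are invented for the example.

```python
# Hypothetical filenames, used only for illustration.
filenames = ['report.txt', 'notes.txt', 'photo.png', 'report_old.txt']

# Keep the names that end with the .txt extension.
text_files = [name for name in filenames if name.endswith('.txt')]

# Keep the names that begin with the word report.
reports = [name for name in filenames if name.startswith('report')]

print(text_files)
print(reports)
```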
The join() and split() String Methods
The join() method is useful when you have a list of strings that need to be joined together into a single string value. The join() method is called on a string, gets passed a list of strings, and returns a string. The returned string is the concatenation of each string in the passed-in list. For example, enter the following into the interactive shell:
>>> ', '.join(['cats', 'rats', 'bats'])
'cats, rats, bats'
>>> ' '.join(['My', 'name', 'is', 'Simon'])
'My name is Simon'
>>> 'ABC'.join(['My', 'name', 'is', 'Simon'])
'MyABCnameABCisABCSimon'
Notice that the string join() calls on is inserted between each string of the list argument. For example, when join(['cats', 'rats', 'bats']) is called on the ', ' string, the returned string is 'cats, rats, bats'.
Remember that join() is called on a string value and is passed a list value. (It’s easy to accidentally call it the other way around.) The split() method does the opposite: It’s called on a string value and returns a list of strings. Enter the following into the interactive shell:
>>> 'My name is Simon'.split()
['My', 'name', 'is', 'Simon']
By default, the string 'My name is Simon' is split wherever whitespace characters such as the space, tab, or newline characters are found. These whitespace characters are not included in the strings in the returned list. You can pass a delimiter string to the split() method to specify a different string to split upon. For example, enter the following into the interactive shell:
>>> 'MyABCnameABCisABCSimon'.split('ABC')
['My', 'name', 'is', 'Simon']
>>> 'My name is Simon'.split('m')
['My na', 'e is Si', 'on']
A common use of split() is to split a multiline string along the newline characters. Enter the following into the interactive shell:
>>> spam = '''Dear Alice,
How have you been? I am fine.
There is a container in the fridge
that is labeled "Milk Experiment".

Please do not drink it.
Sincerely,
Bob'''
>>> spam.split('\n')
['Dear Alice,', 'How have you been? I am fine.', 'There is a container in the
fridge', 'that is labeled "Milk Experiment".', '', 'Please do not drink it.',
'Sincerely,', 'Bob']
Passing split() the argument '\n' lets us split the multiline string stored in spam along the newlines and return a list in which each item corresponds to one line of the string.
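Because split() and join() are opposites, you can chain them to take a string apart and put it back together. The following sketch shows a round trip with a delimiter, and then uses split() with no argument to tidy up uneven whitespace. The sample strings are made up for illustration.

```python
original = 'cats, rats, bats'

# Split on the delimiter, then join with the same delimiter to rebuild the string.
parts = original.split(', ')
rebuilt = ', '.join(parts)
print(parts)
print(rebuilt == original)

# split() with no argument treats any run of whitespace as one separator,
# so joining the pieces with a single space normalizes the spacing.
messy = 'My   name  is\tSimon'
tidy = ' '.join(messy.split())
print(tidy)
```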
Justifying Text with rjust(), ljust(), and center()
The rjust() and ljust() string methods return a padded version of the string they are called on, with spaces inserted to justify the text. The first argument to both methods is an integer length for the justified string. Enter the following into the interactive shell:
>>> 'Hello'.rjust(10)
'     Hello'
>>> 'Hello'.rjust(20)
'               Hello'
>>> 'Hello World'.rjust(20)
'         Hello World'
>>> 'Hello'.ljust(10)
'Hello     '
'Hello'.rjust(10) says that we want to right-justify 'Hello' in a string of total length 10. 'Hello' is five characters, so five spaces will be added to its left, giving us a string of 10 characters with 'Hello' justified right.
An optional second argument to rjust() and ljust() will specify a fill character other than a space character. Enter the following into the interactive shell:
>>> 'Hello'.rjust(20, '*')
'***************Hello'
>>> 'Hello'.ljust(20, '-')
'Hello---------------'
The center() string method works like ljust() and rjust() but centers the text rather than justifying it to the left or right. Enter the following into the interactive shell:
>>> 'Hello'.center(20)
'       Hello        '
>>> 'Hello'.center(20, '=')
'=======Hello========'
These methods are especially useful when you need to print tabular data that has the correct spacing. Open a new file editor window and enter the following code, saving it as picnicTable.py:
def printPicnic(itemsDict, leftWidth, rightWidth):
    print('PICNIC ITEMS'.center(leftWidth + rightWidth, '-'))
    for k, v in itemsDict.items():
        print(k.ljust(leftWidth, '.') + str(v).rjust(rightWidth))

picnicItems = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'cookies': 8000}
printPicnic(picnicItems, 12, 5)
printPicnic(picnicItems, 20, 6)
In this program, we define a printPicnic() function that will take in a dictionary of information and use center(), ljust(), and rjust() to display that information in a neatly aligned table-like format.
The dictionary that we’ll pass to printPicnic() is picnicItems. In picnicItems, we have 4 sandwiches, 12 apples, 4 cups, and 8000 cookies. We want to organize this information into two columns, with the name of the item on the left and the quantity on the right.
To do this, we decide how wide we want the left and right columns to be. Along with our dictionary, we’ll pass these values to printPicnic().
printPicnic() takes in a dictionary, a leftWidth for the left column of a table, and a rightWidth for the right column. It prints a title, PICNIC ITEMS, centered above the table. Then, it loops through the dictionary, printing each key-value pair on a line with the key justified left and padded by periods, and the value justified right and padded by spaces.
After defining printPicnic(), we define the dictionary picnicItems and call printPicnic() twice, passing it different widths for the left and right table columns.
When you run this program, the picnic items are displayed twice. The first time the left column is 12 characters wide, and the right column is 5 characters wide. The second time they are 20 and 6 characters wide, respectively.
---PICNIC ITEMS--
sandwiches..    4
apples......   12
cups........    4
cookies..... 8000
-------PICNIC ITEMS-------
sandwiches..........     4
apples..............    12
cups................     4
cookies.............  8000
Using rjust(), ljust(), and center() lets you ensure that strings are neatly aligned, even if you aren’t sure how many characters long your strings are.
Removing Whitespace with strip(), rstrip(), and lstrip()
Sometimes you may want to strip off whitespace characters (space, tab, and newline) from the left side, right side, or both sides of a string. The strip() string method will return a new string without any whitespace characters at the beginning or end. The lstrip() and rstrip() methods will remove whitespace characters from the left and right ends, respectively. Enter the following into the interactive shell:
>>> spam = '    Hello World    '
>>> spam.strip()
'Hello World'
>>> spam.lstrip()
'Hello World    '
>>> spam.rstrip()
'    Hello World'
Optionally, a string argument will specify which characters on the ends should be stripped. Enter the following into the interactive shell:
>>> spam = 'SpamSpamBaconSpamEggsSpamSpam'
>>> spam.strip('ampS')
'BaconSpamEggs'
Passing strip() the argument 'ampS' will tell it to strip occurrences of a, m, p, and capital S from the ends of the string stored in spam. The order of the characters in the string passed to strip() does not matter: strip('ampS') will do the same thing as strip('mapS') or strip('Spam').
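One consequence worth seeing for yourself: the argument to strip() is treated as a set of characters, not as a word to remove. The sketch below uses a made-up second string to show that stripping stops at the first character that is not in the set.

```python
spam = 'SpamSpamBaconSpamEggsSpamSpam'

# The order of the characters in the argument makes no difference.
print(spam.strip('ampS'))
print(spam.strip('Spam') == spam.strip('mapS'))

# strip('Spam') does not look for the word Spam. It removes any of the
# characters S, p, a, and m from each end until it meets a different one.
grass = 'pampas grass'
trimmed = grass.strip('Spam')
print(trimmed)
```

Here the lowercase s is not in the set (only capital S is), so stripping stops there on the left, and nothing is removed on the right.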
Copying and Pasting Strings with the pyperclip Module
The pyperclip module has copy() and paste() functions that can send text to and receive text from your computer’s clipboard. Sending the output of your program to the clipboard will make it easy to paste it to an email, word processor, or some other software.
Pyperclip does not come with Python. To install it, follow the directions for installing third-party modules in Appendix A. After installing the pyperclip module, enter the following into the interactive shell:
>>> import pyperclip
>>> pyperclip.copy('Hello world!')
>>> pyperclip.paste()
'Hello world!'
Of course, if something outside of your program changes the clipboard contents, the paste() function will return it. For example, if I copied this sentence to the clipboard and then called paste(), it would look like this:
>>> pyperclip.paste()
'For example, if I copied this sentence to the clipboard and then called
paste(), it would look like this:'
RUNNING PYTHON SCRIPTS OUTSIDE OF IDLE
So far, you’ve been running your Python scripts using the interactive shell and file editor in IDLE. However, you won’t want to go through the inconvenience of opening IDLE and the Python script each time you want to run a script. Fortunately, there are shortcuts you can set up to make running Python scripts easier. The steps are slightly different for Windows, OS X, and Linux, but each is described in Appendix B. Turn to Appendix B to learn how to run your Python scripts conveniently and be able to pass command line arguments to them. (You will not be able to pass command line arguments to your programs using IDLE.)
Project: Password Locker
You probably have accounts on many different websites. It’s a bad habit to use the same password for each of them because if any of those sites has a security breach, the hackers will learn the password to all of your other accounts. It’s best to use password manager software on your computer that uses one master password to unlock the password manager. Then you can copy any account password to the clipboard and paste it into the website’s Password field.
The password manager program you’ll create in this example isn’t secure, but it offers a basic demonstration of how such programs work.
THE CHAPTER PROJECTS
This is the first “chapter project” of the book. From here on, each chapter will have projects that demonstrate the concepts covered in the chapter. The projects are written in a style that takes you from a blank file editor window to a full, working program. Just like with the interactive shell examples, don’t only read the project sections; follow along on your computer!
Step 1: Program Design and Data Structures
You want to be able to run this program with a command line argument that is the account’s name (for instance, email or blog). That account’s password will be copied to the clipboard so that the user can paste it into a Password field. This way, the user can have long, complicated passwords without having to memorize them.
Open a new file editor window and save the program as pw.py. You need to start the program with a #! (shebang) line (see Appendix B) and should also write a comment that briefly describes the program. Since you want to associate each account’s name with its password, you can store these as strings in a dictionary. The dictionary will be the data structure that organizes your account and password data. Make your program look like the following:
#! python3
# pw.py - An insecure password locker program.

PASSWORDS = {'email': 'F7minlBDDuvMJuxESSKHFhTxFtjVB6',
             'blog': 'VmALvQyKAxiVH5G8v01if1MLZF3sdt',
             'luggage': '12345'}
Step 2: Handle Command Line Arguments
The command line arguments will be stored in the variable sys.argv. (See Appendix B for more information on how to use command line arguments in your programs.) The first item in the sys.argv list should always be a string containing the program’s filename ('pw.py'), and the second item should be the first command line argument. For this program, this argument is the name of the account whose password you want. Since the command line argument is mandatory, you display a usage message to the user if they forget to add it (that is, if the sys.argv list has fewer than two values in it). Make your program look like the following:
#! python3
# pw.py - An insecure password locker program.

PASSWORDS = {'email': 'F7minlBDDuvMJuxESSKHFhTxFtjVB6',
             'blog': 'VmALvQyKAxiVH5G8v01if1MLZF3sdt',
             'luggage': '12345'}

import sys
if len(sys.argv) < 2:
    print('Usage: python pw.py [account] - copy account password')
    sys.exit()

account = sys.argv[1]    # first command line arg is the account name
Step 3: Copy the Right Password
Now that the account name is stored as a string in the variable account, you need to see whether it exists in the PASSWORDS dictionary as a key. If so, you want to copy the key’s value to the clipboard using pyperclip.copy(). (Since you’re using the pyperclip module, you need to import it.) Note that you don’t actually need the account variable; you could just use sys.argv[1] everywhere account is used in this program. But a variable named account is much more readable than something cryptic like sys.argv[1].
Make your program look like the following:
#! python3
# pw.py - An insecure password locker program.

PASSWORDS = {'email': 'F7minlBDDuvMJuxESSKHFhTxFtjVB6',
             'blog': 'VmALvQyKAxiVH5G8v01if1MLZF3sdt',
             'luggage': '12345'}

import sys, pyperclip
if len(sys.argv) < 2:
    print('Usage: python pw.py [account] - copy account password')
    sys.exit()

account = sys.argv[1]    # first command line arg is the account name

if account in PASSWORDS:
    pyperclip.copy(PASSWORDS[account])
    print('Password for ' + account + ' copied to clipboard.')
else:
    print('There is no account named ' + account)
This new code looks in the PASSWORDS dictionary for the account name. If the account name is a key in the dictionary, we get the value corresponding to that key, copy it to the clipboard, and print a message saying that we copied the value. Otherwise, we print a message saying there’s no account with that name.
That’s the complete script. Using the instructions in Appendix B for launching command line programs easily, you now have a fast way to copy your account passwords to the clipboard. You will have to modify the PASSWORDS dictionary value in the source whenever you want to update the program with a new password.
Of course, you probably don’t want to keep all your passwords in one place where anyone could easily copy them. But you can modify this program and use it to quickly copy regular text to the clipboard. Say you are sending out several emails that have many of the same stock paragraphs in common. You could put each paragraph as a value in the PASSWORDS dictionary (you’d probably want to rename the dictionary at this point), and then you would have a way to quickly select and copy one of many standard pieces of text to the clipboard.
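Here is a rough sketch of that stock-paragraph idea. Everything in it is hypothetical: the SNIPPETS dictionary, its contents, and the lookup() function are invented for illustration, and the clipboard step is left out so the sketch runs without pyperclip installed. In a real script you would pass the returned text to pyperclip.copy() and call lookup(sys.argv, SNIPPETS).

```python
# Hypothetical stock paragraphs, keyed by a short name.
SNIPPETS = {'agree': 'Yes, I agree. That sounds fine to me.',
            'busy': 'Sorry, can we do this later this week or next week?'}


def lookup(argv, snippets):
    """Return (text, message) for the name in argv[1]; text is None on failure."""
    if len(argv) < 2:
        return None, 'Usage: python pw.py [name] - copy stock text'
    name = argv[1]  # first command line arg is the snippet name
    if name in snippets:
        return snippets[name], 'Text for ' + name + ' found.'
    return None, 'There is no text named ' + name


# Simulate three ways the program might be run.
for fake_argv in (['pw.py'], ['pw.py', 'busy'], ['pw.py', 'lunch']):
    text, message = lookup(fake_argv, SNIPPETS)
    print(message)
```

Returning the text instead of copying it right away is just a design choice for this sketch; it makes the lookup easy to try out with made-up argument lists.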
On Windows, you can create a batch file to run this program with the WIN-R Run window. (For more about batch files, see Appendix B.) Type the following into the file editor and save the file as pw.bat in the C:\Windows folder:
@py.exe C:\Python34\pw.py %*
@pause
With this batch file created, running the password-safe program on Windows is just a matter of pressing WIN-R and typing pw <account name>.
Project: Adding Bullets to Wiki Markup
When editing a Wikipedia article, you can create a bulleted list by putting each list item on its own line and placing a star in front. But say you have a really large list that you want to add bullet points to. You could just type those stars at the beginning of each line, one by one. Or you could automate this task with a short Python script.
The bulletPointAdder.py script will get the text from the clipboard, add a star and space to the beginning of each line, and then paste this new text to the clipboard. For example, if I copied the following text (for the Wikipedia article “List of Lists of Lists”) to the clipboard:
Lists of animals
Lists of aquarium life
Lists of biologists by author abbreviation
Lists of cultivars
and then ran the bulletPointAdder.py program, the clipboard would then contain the following:
* Lists of animals
* Lists of aquarium life
* Lists of biologists by author abbreviation
* Lists of cultivars
This star-prefixed text is ready to be pasted into a Wikipedia article as a bulleted list.
Step 1: Copy and Paste from the Clipboard
You want the bulletPointAdder.py program to do the following:
1. Paste text from the clipboard
2. Do something to it
3. Copy the new text to the clipboard
That second step is a little tricky, but steps 1 and 3 are pretty straightforward: They just involve the pyperclip.copy() and pyperclip.paste() functions. For now, let’s just write the part of the program that covers steps 1 and 3. Enter the following, saving the program as bulletPointAdder.py:
#! python3
# bulletPointAdder.py - Adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

import pyperclip
text = pyperclip.paste()

# TODO: Separate lines and add stars.

pyperclip.copy(text)
The TODO comment is a reminder that you should complete this part of the program eventually. The next step is to actually implement that piece of the program.
Step 2: Separate the Lines of Text and Add the Star
The call to pyperclip.paste() returns all the text on the clipboard as one big string. If we used the “List of Lists of Lists” example, the string stored in text would look like this:
'Lists of animals\nLists of aquarium life\nLists of biologists by author
abbreviation\nLists of cultivars'
The \n newline characters in this string cause it to be displayed with multiple lines when it is printed or pasted from the clipboard. There are many “lines” in this one string value. You want to add a star to the start of each of these lines.
You could write code that searches for each \n newline character in the string and then adds the star just after that. But it would be easier to use the split() method to return a list of strings, one for each line in the original string, and then add the star to the front of each string in the list.
Make your program look like the following:
#! python3
# bulletPointAdder.py - Adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

import pyperclip
text = pyperclip.paste()

# Separate lines and add stars.
lines = text.split('\n')
for i in range(len(lines)):    # loop through all indexes in the "lines" list
    lines[i] = '* ' + lines[i] # add star to each string in "lines" list

pyperclip.copy(text)
We split the text along its newlines to get a list in which each item is one line of the text. We store the list in lines and then loop through the items in lines. For each line, we add a star and a space to the start of the line. Now each string in lines begins with a star.
Step 3: Join the Modified Lines
The lines list now contains modified lines that start with stars. But pyperclip.copy() is expecting a single string value, not a list of string values. To make this single string value, pass lines into the join() method to get a single string joined from the list’s strings. Make your program look like the following:
#! python3
# bulletPointAdder.py - Adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

import pyperclip
text = pyperclip.paste()

# Separate lines and add stars.
lines = text.split('\n')
for i in range(len(lines)):    # loop through all indexes for "lines" list
    lines[i] = '* ' + lines[i] # add star to each string in "lines" list
text = '\n'.join(lines)
pyperclip.copy(text)
When this program is run, it replaces the text on the clipboard with text that has stars at the start of each line. Now the program is complete, and you can try running it with text copied to the clipboard.
Even if you don’t need to automate this specific task, you might want to automate some other kind of text manipulation, such as removing trailing spaces from the end of lines or converting text to uppercase or lowercase. Whatever your needs, you can use the clipboard for input and output.
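If you would like to experiment with the split-and-join logic without touching the clipboard, one way is to put it in a function that takes a string and returns a string. This is a sketch only; the add_bullets() name is hypothetical and is not part of bulletPointAdder.py.

```python
def add_bullets(text):
    # Split into lines, put a star and a space in front of each, and rejoin.
    lines = text.split('\n')
    for i in range(len(lines)):
        lines[i] = '* ' + lines[i]
    return '\n'.join(lines)


sample = 'Lists of animals\nLists of aquarium life\nLists of cultivars'
print(add_bullets(sample))
```

In the full script, the same function could sit between the calls to pyperclip.paste() and pyperclip.copy().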
Summary
Text is a common form of data, and Python comes with many helpful string methods to process the text stored in string values. You will make use of indexing, slicing, and string methods in almost every Python program you write.
The programs you are writing now don’t seem too sophisticated: they don’t have graphical user interfaces with images and colorful text. So far, you’re displaying text with print() and letting the user enter text with input(). However, the user can quickly enter large amounts of text through the clipboard. This ability provides a useful avenue for writing programs that manipulate massive amounts of text. These text-based programs might not have flashy windows or graphics, but they can get a lot of useful work done quickly.
Another way to manipulate large amounts of text is reading and writing files directly off the hard drive. You’ll learn how to do this with Python in the next chapter.
Practice Questions
Q: 1. What are escape characters?
Q: 2. What do the \n and \t escape characters represent?
Q: 3. How can you put a \ backslash character in a string?
Q: 4. The string value "Howl's Moving Castle" is a valid string. Why isn’t it a problem that the single quote character in the word Howl's isn’t escaped?
Q: 5. If you don’t want to put \n in your string, how can you write a string with newlines in it?
Q: 6. What do the following expressions evaluate to?
'Hello world!'[1]
'Hello world!'[0:5]
'Hello world!'[:5]
'Hello world!'[3:]
Q: 7. What do the following expressions evaluate to?
'Hello'.upper()
'Hello'.upper().isupper()
'Hello'.upper().lower()
Q: 8. What do the following expressions evaluate to?
'Remember, remember, the fifth of November.'.split()
'-'.join('There can be only one.'.split())
Q: 9. What string methods can you use to right-justify, left-justify, and center a string?
Q: 10. How can you trim whitespace characters from the beginning or end of a string?
Practice Project
For practice, write a program that does the following.
Table Printer
Write a function named printTable() that takes a list of lists of strings and displays it in a well-organized table with each column right-justified. Assume that all the inner lists will contain the same number of strings. For example, the value could look like this:
tableData = [['apples', 'oranges', 'cherries', 'banana'],
             ['Alice', 'Bob', 'Carol', 'David'],
             ['dogs', 'cats', 'moose', 'goose']]
Your printTable() function would print the following:
  apples Alice  dogs
 oranges   Bob  cats
cherries Carol moose
  banana David goose
Hint: Your code will first have to find the longest string in each of the inner lists so that the whole column can be wide enough to fit all the strings. You can store the maximum width of each column as a list of integers. The printTable() function can begin with colWidths = [0] * len(tableData), which will create a list containing the same number of 0 values as the number of inner lists in tableData. That way, colWidths[0] can store the width of the longest string in tableData[0], colWidths[1] can store the width of the longest string in tableData[1], and so on. You can then find the largest value in the colWidths list to find out what integer width to pass to the rjust() string method.
Chapter 7. Pattern Matching with Regular Expressions
You may be familiar with searching for text by pressing CTRL-F and typing in the words you’re looking for. Regular expressions go one step further: They allow you to specify a pattern of text to search for. You may not know a business’s exact phone number, but if you live in the United States or Canada, you know it will be three digits, followed by a hyphen, and then four more digits (and optionally, a three-digit area code at the start). This is how you, as a human, know a phone number when you see it: 415-555-1234 is a phone number, but 4,155,551,234 is not.
Regular expressions are helpful, but not many non-programmers know about them even though most modern text editors and word processors, such as Microsoft Word or OpenOffice, have find and find-and-replace features that can search based on regular expressions. Regular expressions are huge time-savers, not just for software users but also for programmers. In fact, tech writer Cory Doctorow argues that even before teaching programming, we should be teaching regular expressions:
“Knowing [regular expressions] can mean the difference between solving a problem in 3 steps and solving it in 3,000 steps. When you’re a nerd, you forget that the problems you solve with a couple keystrokes can take other people days of tedious, error-prone work to slog through.”[1]
In this chapter, you’ll start by writing a program to find text patterns without using regular expressions and then see how to use regular expressions to make the code much less bloated. I’ll show you basic matching with regular expressions and then move on to some more powerful features, such as string substitution and creating your own character classes. Finally, at the end of the chapter, you’ll write a program that can automatically extract phone numbers and email addresses from a block of text.
Finding Patterns of Text Without Regular Expressions
Say you want to find a phone number in a string. You know the pattern: three numbers, a hyphen, three numbers, a hyphen, and four numbers. Here’s an example: 415-555-4242.
Let’s use a function named isPhoneNumber() to check whether a string matches this pattern, returning either True or False. Open a new file editor window and enter the following code; then save the file as isPhoneNumber.py:
def isPhoneNumber(text):
    if len(text) != 12:  # ➊
        return False
    for i in range(0, 3):
        if not text[i].isdecimal():  # ➋
            return False
    if text[3] != '-':  # ➌
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():  # ➍
            return False
    if text[7] != '-':  # ➎
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():  # ➏
            return False
    return True  # ➐

print('415-555-4242 is a phone number:')
print(isPhoneNumber('415-555-4242'))
print('Moshi moshi is a phone number:')
print(isPhoneNumber('Moshi moshi'))
When this program is run, the output looks like this:
415-555-4242 is a phone number:
True
Moshi moshi is a phone number:
False
The isPhoneNumber() function has code that does several checks to see whether the string in text is a valid phone number. If any of these checks fail, the function returns False. First the code checks that the string is exactly 12 characters ➊. Then it checks that the area code (that is, the first three characters in text) consists of only numeric characters ➋. The rest of the function checks that the string follows the pattern of a phone number: The number must have the first hyphen after the area code ➌, three more numeric characters ➍, then another hyphen ➎, and finally four more numbers ➏. If the program execution manages to get past all the checks, it returns True ➐.
Calling isPhoneNumber() with the argument '415-555-4242' will return True. Calling isPhoneNumber() with 'Moshi moshi' will return False; the first test fails because 'Moshi moshi' is not 12 characters long.
You would have to add even more code to find this pattern of text in a larger string. Replace the last four print() function calls in isPhoneNumber.py with the following:
message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
for i in range(len(message)):
    chunk = message[i:i+12]  # ➊
    if isPhoneNumber(chunk):  # ➋
        print('Phone number found: ' + chunk)
print('Done')
When this program is run, the output will look like this:
Phone number found: 415-555-1011
Phone number found: 415-555-9999
Done
On each iteration of the for loop, a new chunk of 12 characters from message is assigned to the variable chunk ➊. For example, on the first iteration, i is 0, and chunk is assigned message[0:12] (that is, the string 'Call me at 4'). On the next iteration, i is 1, and chunk is assigned message[1:13] (the string 'all me at 41').
You pass chunk to isPhoneNumber() to see whether it matches the phone number pattern ➋, and if so, you print the chunk.
Continue to loop through message, and eventually the 12 characters in chunk will be a phone number. The loop goes through the entire string, testing each 12-character piece and printing any chunk it finds that satisfies isPhoneNumber(). Once we’re done going through message, we print Done.
While the string in message is short in this example, it could be millions of characters long and the program would still run in less than a second. A similar program that finds phone numbers using regular expressions would also run in less than a second, but regular expressions make it quicker to write these programs.
Finding Patterns of Text with Regular Expressions
The previous phone number–finding program works, but it uses a lot of code to do something limited: The isPhoneNumber() function is 17 lines but can find only one pattern of phone numbers. What about a phone number formatted like 415.555.4242 or (415) 555-4242? What if the phone number had an extension, like 415-555-4242 x99? The isPhoneNumber() function would fail to validate them. You could add yet more code for these additional patterns, but there is an easier way.
Regular expressions, called regexes for short, are descriptions for a pattern of text. For example, a \d in a regex stands for a digit character, that is, any single numeral 0 to 9. The regex \d\d\d-\d\d\d-\d\d\d\d is used by Python to match the same text the previous isPhoneNumber() function did: a string of three numbers, a hyphen, three more numbers, another hyphen, and four numbers. Any other string would not match the \d\d\d-\d\d\d-\d\d\d\d regex.
But regular expressions can be much more sophisticated. For example, adding a 3 in curly brackets ({3}) after a pattern is like saying, “Match this pattern three times.” So the slightly shorter regex \d{3}-\d{3}-\d{4} also matches the correct phone number format.
Creating Regex Objects
All the regex functions in Python are in the re module. Enter the following into the interactive shell to import this module:
>>> import re
NOTE
Most of the examples that follow in this chapter will require the re module, so remember to import it at the beginning of any script you write or any time you restart IDLE. Otherwise, you’ll get a NameError: name 're' is not defined error message.
Passing a string value representing your regular expression to re.compile() returns a Regex pattern object (or simply, a Regex object).
To create a Regex object that matches the phone number pattern, enter the following into the interactive shell. (Remember that \d means “a digit character” and \d\d\d-\d\d\d-\d\d\d\d is the regular expression for the correct phone number pattern.)
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
Now the phoneNumRegex variable contains a Regex object.
PASSING RAW STRINGS TO RE.COMPILE()
Remember that escape characters in Python use the backslash (\). The string value '\n' represents a single newline character, not a backslash followed by a lowercase n. You need to enter the escape character \\ to print a single backslash. So '\\n' is the string that represents a backslash followed by a lowercase n. However, by putting an r before the first quote of the string value, you can mark the string as a raw string, which does not escape characters.
Since regular expressions frequently use backslashes in them, it is convenient to pass raw strings to the re.compile() function instead of typing extra backslashes. Typing r'\d\d\d-\d\d\d-\d\d\d\d' is much easier than typing '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'.
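You can check for yourself that the raw string and the doubled-backslash string are the same value. The short sketch below compares the two spellings and uses len() to show the difference between '\n' and r'\n'; the variable names are made up for the example.

```python
# Two ways to spell the same regex string.
escaped = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'
raw = r'\d\d\d-\d\d\d-\d\d\d\d'
print(escaped == raw)

# '\n' is one newline character; r'\n' is a backslash followed by the letter n.
newline_length = len('\n')
raw_length = len(r'\n')
print(newline_length, raw_length)
```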
Matching Regex Objects
A Regex object’s search() method searches the string it is passed for any matches to the regex. The search() method will return None if the regex pattern is not found in the string. If the pattern is found, the search() method returns a Match object. Match objects have a group() method that will return the actual matched text from the searched string. (I’ll explain groups shortly.) For example, enter the following into the interactive shell:
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242
The mo variable name is just a generic name to use for Match objects. This example might seem complicated at first, but it is much shorter than the earlier isPhoneNumber.py program and does the same thing.
Here, we pass our desired pattern to re.compile() and store the resulting Regex object in phoneNumRegex. Then we call search() on phoneNumRegex and pass search() the string we want to search for a match. The result of the search gets stored in the variable mo. In this example, we know that our pattern will be found in the string, so we know that a Match object will be returned. Knowing that mo contains a Match object and not the null value None, we can call group() on mo to return the match. Writing mo.group() inside our print() call displays the whole match, 415-555-4242.
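When you can’t be sure the pattern will be found, calling group() on the result is risky, because None has no group() method. A minimal sketch of one way to guard against that follows; the find_phone() function name is hypothetical.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')


def find_phone(text):
    # search() returns a Match object, or None when the pattern is not found.
    mo = phone_regex.search(text)
    if mo is None:
        return None
    return mo.group()


print(find_phone('My number is 415-555-4242.'))
print(find_phone('I do not have a phone.'))
```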
Review of Regular Expression Matching
While there are several steps to using regular expressions in Python, each step is fairly simple.
1. Import the regex module with import re.
2. Create a Regex object with the re.compile() function. (Remember to use a raw string.)
3. Pass the string you want to search into the Regex object’s search() method. This returns a Match object (or None if the pattern is not found).
4. Call the Match object’s group() method to return a string of the actual matched text.
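The four steps can be sketched in a few lines. To tie things together, this example runs them twice, once with the long \d\d\d-\d\d\d-\d\d\d\d pattern and once with the shorter \d{3}-\d{3}-\d{4} pattern mentioned earlier, to show that both find the same text. The variable names are made up for illustration.

```python
import re  # step 1: import the regex module

message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'

# Step 2: create the Regex objects, using raw strings.
long_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

# Step 3: search the string; each call returns a Match object here.
long_match = long_regex.search(message)
short_match = short_regex.search(message)

# Step 4: call group() to get the matched text.
print(long_match.group())
print(short_match.group())
```

Both searches stop at the first phone number in the message, so the second number is not reported.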
NOTE
While I encourage you to enter the example code into the interactive shell, you should also make use of web-based regular expression testers, which can show you exactly how a regex matches a piece of text that you enter. I recommend the tester at http://regexpal.com/.
More Pattern Matching with Regular Expressions
Now that you know the basic steps for creating and finding regular expression objects with Python, you’re ready to try some of their more powerful pattern-matching capabilities.
Grouping with Parentheses
Say you want to separate the area code from the rest of the phone number. Adding parentheses will create groups in the regex: (\d\d\d)-(\d\d\d-\d\d\d\d). Then you can use the group() match object method to grab the matching text from just one group.
The first set of parentheses in a regex string will be group 1. The second set will be group 2. By passing the integer 1 or 2 to the group() match object method, you can grab different parts of the matched text. Passing 0 or nothing to the group() method will return the entire matched text. Enter the following into the interactive shell:
>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'
>>> mo.group(0)
'415-555-4242'
>>> mo.group()
'415-555-4242'
If you would like to retrieve all the groups at once, use the groups() method (note the plural form for the name).
>>> mo.groups()
('415', '555-4242')
>>> areaCode, mainNumber = mo.groups()
>>> print(areaCode)
415
>>> print(mainNumber)
555-4242
Since mo.groups() returns a tuple of multiple values, you can use the multiple-assignment trick to assign each value to a separate variable, as in the previous areaCode, mainNumber = mo.groups() line.
Parentheses have a special meaning in regular expressions, but what do you do if you need to match a parenthesis in your text? For instance, maybe the phone numbers you are trying to match have the area code set in parentheses. In this case, you need to escape the ( and ) characters with a backslash. Enter the following into the interactive shell:
>>> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
>>> mo.group(1)
'(415)'
>>> mo.group(2)
'555-4242'
The \( and \) escape characters in the raw string passed to re.compile() will match actual parenthesis characters.
Matching Multiple Groups with the Pipe
The | character is called a pipe. You can use it anywhere you want to match one of many expressions. For example, the regular expression r'Batman|Tina Fey' will match either 'Batman' or 'Tina Fey'.
When both Batman and Tina Fey occur in the searched string, the first occurrence of matching text will be returned as the Match object. Enter the following into the interactive shell:
>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'
>>> mo2 = heroRegex.search('Tina Fey and Batman.')
>>> mo2.group()
'Tina Fey'
NOTE
You can find all matching occurrences with the findall() method that’s discussed in The findall() Method.
You can also use the pipe to match one of several patterns as part of your regex. For example, say you wanted to match any of the strings 'Batman', 'Batmobile', 'Batcopter', and 'Batbat'. Since all these strings start with Bat, it would be nice if you could specify that prefix only once. This can be done with parentheses. Enter the following into the interactive shell:
>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'
The method call mo.group() returns the full matched text 'Batmobile', while mo.group(1) returns just the part of the matched text inside the first parentheses group, 'mobile'. By using the pipe character and grouping parentheses, you can specify several alternative patterns you would like your regex to match.
If you need to match an actual pipe character, escape it with a backslash, like \|.
OptionalMatchingwiththeQuestionMarkSometimesthereisapatternthatyouwanttomatchonlyoptionally.Thatis,theregexshouldfindamatchwhetherornotthatbitoftextisthere.The?characterflagsthegroupthatprecedesitasanoptionalpartofthepattern.Forexample,enterthefollowingintotheinteractiveshell:
>>>batRegex=re.compile(r'Bat(wo)?man')
>>>mo1=batRegex.search('TheAdventuresofBatman')
>>>mo1.group()
'Batman'
>>>mo2=batRegex.search('TheAdventuresofBatwoman')
>>>mo2.group()
'Batwoman'
The(wo)?partoftheregularexpressionmeansthatthepatternwoisanoptionalgroup.Theregexwillmatchtextthathaszeroinstancesoroneinstanceofwoinit.Thisiswhytheregexmatchesboth'Batwoman'and'Batman'.
Usingtheearlierphonenumberexample,youcanmaketheregexlookforphonenumbersthatdoordonothaveanareacode.Enterthefollowingintotheinteractiveshell:
>>>phoneRegex=re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
>>>mo1=phoneRegex.search('Mynumberis415-555-4242')
>>>mo1.group()
'415-555-4242'
>>>mo2=phoneRegex.search('Mynumberis555-4242')
>>>mo2.group()
'555-4242'
Youcanthinkofthe?assaying,“Matchzerooroneofthegroupprecedingthisquestionmark.”
Ifyouneedtomatchanactualquestionmarkcharacter,escapeitwith\?.
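One detail is easier to see in code: when the optional group is absent, the overall match still succeeds, but the group itself has no text. The following is a minimal sketch that reuses the phone number pattern from above; the variable names are illustrative only.

```python
import re

phone_regex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

with_area = phone_regex.search('My number is 415-555-4242')
without_area = phone_regex.search('My number is 555-4242')

print(with_area.group(1))     # text matched by the optional area code group
print(without_area.group(1))  # None: the optional group matched nothing
print(without_area.group())   # the overall match still succeeds
```

Checking a group against None is a simple way to tell which form of the number was found.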
MatchingZeroorMorewiththeStarThe*(calledthestarorasterisk)means“matchzeroormore”—thegroupthatprecedesthestarcanoccuranynumberoftimesinthetext.Itcanbecompletelyabsentorrepeatedoverandoveragain.Let’slookattheBatmanexampleagain.
>>>batRegex=re.compile(r'Bat(wo)*man')
>>>mo1=batRegex.search('TheAdventuresofBatman')
>>>mo1.group()
'Batman'
>>>mo2=batRegex.search('TheAdventuresofBatwoman')
>>>mo2.group()
'Batwoman'
>>>mo3=batRegex.search('TheAdventuresofBatwowowowoman')
>>>mo3.group()
'Batwowowowoman'
For'Batman',the(wo)*partoftheregexmatcheszeroinstancesofwointhestring;for'Batwoman',the(wo)*matchesoneinstanceofwo;andfor'Batwowowowoman',(wo)*matchesfourinstancesofwo.
Ifyouneedtomatchanactualstarcharacter,prefixthestarintheregularexpressionwithabackslash,\*.
MatchingOneorMorewiththePlusWhile*means“matchzeroormore,”the+(orplus)means“matchoneormore.”Unlikethestar,whichdoesnotrequireitsgrouptoappearinthematchedstring,thegroupprecedingaplusmustappearatleastonce.Itisnotoptional.Enterthefollowingintotheinteractiveshell,andcompareitwiththestarregexesintheprevioussection:
>>>batRegex=re.compile(r'Bat(wo)+man')
>>>mo1=batRegex.search('TheAdventuresofBatwoman')
>>>mo1.group()
'Batwoman'
>>>mo2=batRegex.search('TheAdventuresofBatwowowowoman')
>>>mo2.group()
'Batwowowowoman'
>>>mo3=batRegex.search('TheAdventuresofBatman')
>>>mo3==None
True
TheregexBat(wo)+manwillnotmatchthestring'TheAdventuresofBatman'becauseatleastonewoisrequiredbytheplussign.
Ifyouneedtomatchanactualplussigncharacter,prefixtheplussignwithabackslashtoescapeit:\+.
MatchingSpecificRepetitionswithCurlyBracketsIfyouhaveagroupthatyouwanttorepeataspecificnumberoftimes,followthegroupinyourregexwithanumberincurlybrackets.Forexample,theregex(Ha){3}willmatchthestring'HaHaHa',butitwillnotmatch'HaHa',sincethelatterhasonlytworepeatsofthe(Ha)group.
Insteadofonenumber,youcanspecifyarangebywritingaminimum,acomma,andamaximuminbetweenthecurlybrackets.Forexample,theregex(Ha){3,5}willmatch'HaHaHa','HaHaHaHa',and'HaHaHaHaHa'.
Youcanalsoleaveoutthefirstorsecondnumberinthecurlybracketstoleavetheminimumormaximumunbounded.Forexample,(Ha){3,}willmatchthreeormoreinstancesofthe(Ha)group,while(Ha){,5}willmatchzerotofiveinstances.Curlybracketscanhelpmakeyourregularexpressionsshorter.Thesetworegularexpressionsmatchidenticalpatterns:
(Ha){3}
(Ha)(Ha)(Ha)
And these two regular expressions also match identical patterns:
(Ha){3,5}
((Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha)(Ha))
Enter the following into the interactive shell:
>>> haRegex = re.compile(r'(Ha){3}')
>>>mo1=haRegex.search('HaHaHa')
>>>mo1.group()
'HaHaHa'
>>>mo2=haRegex.search('Ha')
>>>mo2==None
True
Here,(Ha){3}matches'HaHaHa'butnot'Ha'.Sinceitdoesn’tmatch'Ha',search()returnsNone.
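To compare the four curly bracket forms side by side, the sketch below tests each one against strings of zero to six repeats of Ha. It relies on the Regex object's fullmatch() method, which this chapter does not cover; it succeeds only when the whole string fits the pattern. The names tests and results are illustrative.

```python
import re

# '', 'Ha', 'HaHa', ... up to six repeats of 'Ha'
tests = ['Ha' * n for n in range(7)]

results = {}
for pattern in [r'(Ha){3}', r'(Ha){3,5}', r'(Ha){3,}', r'(Ha){,5}']:
    regex = re.compile(pattern)
    # Record how many repeats each fully matching string contains.
    results[pattern] = [len(s) // 2 for s in tests if regex.fullmatch(s)]
    print(pattern, 'accepts repeat counts', results[pattern])
```

Using fullmatch() here rather than search() avoids the case where a pattern such as (Ha){3} finds three repeats inside a longer run.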
GreedyandNongreedyMatchingSince(Ha){3,5}canmatchthree,four,orfiveinstancesofHainthestring'HaHaHaHaHa',youmaywonderwhytheMatchobject’scalltogroup()inthepreviouscurlybracketexamplereturns'HaHaHaHaHa'insteadoftheshorterpossibilities.Afterall,'HaHaHa'and'HaHaHaHa'arealsovalidmatchesoftheregularexpression(Ha){3,5}.
Python’sregularexpressionsaregreedybydefault,whichmeansthatinambiguoussituationstheywillmatchthelongeststringpossible.Thenon-greedyversionofthecurlybrackets,whichmatchestheshorteststringpossible,hastheclosingcurlybracketfollowedbyaquestionmark.
Enterthefollowingintotheinteractiveshell,andnoticethedifferencebetweenthegreedyandnongreedyformsofthecurlybracketssearchingthesamestring:
>>>greedyHaRegex=re.compile(r'(Ha){3,5}')
>>>mo1=greedyHaRegex.search('HaHaHaHaHa')
>>>mo1.group()
'HaHaHaHaHa'
>>>nongreedyHaRegex=re.compile(r'(Ha){3,5}?')
>>>mo2=nongreedyHaRegex.search('HaHaHaHaHa')
>>>mo2.group()
'HaHaHa'
Notethatthequestionmarkcanhavetwomeaningsinregularexpressions:declaringanongreedymatchorflagginganoptionalgroup.Thesemeaningsareentirelyunrelated.
Thefindall()MethodInadditiontothesearch()method,Regexobjectsalsohaveafindall()method.Whilesearch()willreturnaMatchobjectofthefirstmatchedtextinthesearchedstring,thefindall()methodwillreturnthestringsofeverymatchinthesearchedstring.Toseehowsearch()returnsaMatchobjectonlyonthefirstinstanceofmatchingtext,enterthefollowingintotheinteractiveshell:
>>>phoneNumRegex=re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>>mo=phoneNumRegex.search('Cell:415-555-9999Work:212-555-0000')
>>>mo.group()
'415-555-9999'
Ontheotherhand,findall()willnotreturnaMatchobjectbutalistofstrings—aslongastherearenogroupsintheregularexpression.Eachstringinthelistisapieceofthesearchedtextthatmatchedtheregularexpression.Enterthefollowingintotheinteractiveshell:
>>>phoneNumRegex=re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')#hasnogroups
>>>phoneNumRegex.findall('Cell:415-555-9999Work:212-555-0000')
['415-555-9999','212-555-0000']
Iftherearegroupsintheregularexpression,thenfindall()willreturnalistoftuples.Eachtuplerepresentsafoundmatch,anditsitemsarethematchedstringsforeachgroupintheregex.Toseefindall()inaction,enterthefollowingintotheinteractiveshell(noticethattheregularexpressionbeingcompilednowhasgroupsinparentheses):
>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)') # has groups
>>> phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')
[('415', '555', '9999'), ('212', '555', '0000')]
Tosummarizewhatthefindall()methodreturns,rememberthefollowing:
1. When called on a regex with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, the method findall() returns a list of string matches, such as ['415-555-9999', '212-555-0000'].
2. When called on a regex that has groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), the method findall() returns a list of tuples of strings (one string for each group), such as [('415', '555', '9999'), ('212', '555', '0000')].
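The two cases in this summary can be checked directly. The sketch below also includes a third case the summary does not mention: a regex with exactly one group, for which findall() returns a plain list of that group's strings rather than one-item tuples. The variable names are illustrative.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
three_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
one_group = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')

whole_matches = no_groups.findall(text)    # list of whole-match strings
group_tuples = three_groups.findall(text)  # list of tuples, one string per group
area_codes = one_group.findall(text)       # one group: list of that group's strings

print(whole_matches)
print(group_tuples)
print(area_codes)
```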
CharacterClassesIntheearlierphonenumberregexexample,youlearnedthat\dcouldstandforanynumericdigit.Thatis,\disshorthandfortheregularexpression(0|1|2|3|4|5|6|7|8|9).Therearemanysuchshorthandcharacterclasses,asshowninTable7-1.
Table 7-1. Shorthand Codes for Common Character Classes
Shorthand character class    Represents
\d    Any numeric digit from 0 to 9.
\D    Any character that is not a numeric digit from 0 to 9.
\w    Any letter, numeric digit, or the underscore character. (Think of this as matching “word” characters.)
\W    Any character that is not a letter, numeric digit, or the underscore character.
\s    Any space, tab, or newline character. (Think of this as matching “space” characters.)
\S    Any character that is not a space, tab, or newline.
Characterclassesareniceforshorteningregularexpressions.Thecharacterclass[0-5]willmatchonlythenumbers0to5;thisismuchshorterthantyping(0|1|2|3|4|5).
For example, enter the following into the interactive shell:
>>> xmasRegex = re.compile(r'\d+\s\w+')
>>> xmasRegex.findall('12 drummers, 11 pipers, 10 lords, 9 ladies, 8 maids, 7 swans, 6 geese, 5 rings, 4 birds, 3 hens, 2 doves, 1 partridge')
['12 drummers', '11 pipers', '10 lords', '9 ladies', '8 maids', '7 swans', '6 geese', '5 rings', '4 birds', '3 hens', '2 doves', '1 partridge']
Theregularexpression\d+\s\w+willmatchtextthathasoneormorenumericdigits(\d+),followedbyawhitespacecharacter(\s),followedbyoneormoreletter/digit/underscorecharacters(\w+).Thefindall()methodreturnsallmatchingstringsoftheregexpatterninalist.
MakingYourOwnCharacterClassesTherearetimeswhenyouwanttomatchasetofcharactersbuttheshorthandcharacterclasses(\d,\w,\s,andsoon)aretoobroad.Youcandefineyourowncharacterclassusingsquarebrackets.Forexample,thecharacterclass[aeiouAEIOU]willmatchanyvowel,bothlowercaseanduppercase.Enterthefollowingintotheinteractiveshell:
>>>vowelRegex=re.compile(r'[aeiouAEIOU]')
>>>vowelRegex.findall('Robocopeatsbabyfood.BABYFOOD.')
['o','o','o','e','a','a','o','o','A','O','O']
Youcanalsoincluderangesoflettersornumbersbyusingahyphen.Forexample,thecharacterclass[a-zA-Z0-9]willmatchalllowercaseletters,uppercaseletters,andnumbers.
Notethatinsidethesquarebrackets,thenormalregularexpressionsymbolsarenotinterpretedassuch.Thismeansyoudonotneedtoescapethe.,*,?,or()characterswithaprecedingbackslash.Forexample,thecharacterclass[0-5.]willmatchdigits0to5andaperiod.Youdonotneedtowriteitas[0-5\.].
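As a quick check of that claim, the sketch below runs the [0-5.] class with and without the backslash against a made-up sample string; both forms should return the same matches.

```python
import re

sample = 'Version 3.14, build 9.06'  # made-up text for illustration

unescaped = re.compile(r'[0-5.]')
escaped = re.compile(r'[0-5\.]')

found = unescaped.findall(sample)
found_escaped = escaped.findall(sample)

print(found)           # digits 0 to 5 and periods only; 9 and 6 are skipped
print(found_escaped)   # same result with the redundant backslash
```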
Byplacingacaretcharacter(^)justafterthecharacterclass’sopeningbracket,youcanmakeanegativecharacterclass.Anegativecharacterclasswillmatchallthecharactersthatarenotinthecharacterclass.Forexample,enterthefollowingintotheinteractiveshell:
>>> consonantRegex = re.compile(r'[^aeiouAEIOU]')
>>> consonantRegex.findall('Robocop eats baby food. BABY FOOD.')
['R', 'b', 'c', 'p', ' ', 't', 's', ' ', 'b', 'b', 'y', ' ', 'f', 'd', '.', ' ', 'B', 'B', 'Y', ' ', 'F', 'D', '.']
Now,insteadofmatchingeveryvowel,we’rematchingeverycharacterthatisn’tavowel.
TheCaretandDollarSignCharactersYoucanalsousethecaretsymbol(^)atthestartofaregextoindicatethatamatchmustoccuratthebeginningofthesearchedtext.Likewise,youcanputadollarsign($)attheendoftheregextoindicatethestringmustendwiththisregexpattern.Andyoucanusethe^and$togethertoindicatethattheentirestringmustmatchtheregex—thatis,it’snotenoughforamatchtobemadeonsomesubsetofthestring.
Forexample,ther'^Hello'regularexpressionstringmatchesstringsthatbeginwith'Hello'.Enterthefollowingintotheinteractiveshell:
>>>beginsWithHello=re.compile(r'^Hello')
>>>beginsWithHello.search('Helloworld!')
<_sre.SRE_Matchobject;span=(0,5),match='Hello'>
>>>beginsWithHello.search('Hesaidhello.')==None
True
Ther'\d$'regularexpressionstringmatchesstringsthatendwithanumericcharacterfrom0to9.Enterthefollowingintotheinteractiveshell:
>>> endsWithNumber = re.compile(r'\d$')
>>> endsWithNumber.search('Your number is 42')
<_sre.SRE_Match object; span=(16, 17), match='2'>
>>> endsWithNumber.search('Your number is forty two.') == None
True
Ther'^\d+$'regularexpressionstringmatchesstringsthatbothbeginandendwithoneormorenumericcharacters.Enterthefollowingintotheinteractiveshell:
>>> wholeStringIsNum = re.compile(r'^\d+$')
>>> wholeStringIsNum.search('1234567890')
<_sre.SRE_Match object; span=(0, 10), match='1234567890'>
>>> wholeStringIsNum.search('12345xyz67890') == None
True
>>> wholeStringIsNum.search('12 34567890') == None
True
Thelasttwosearch()callsinthepreviousinteractiveshellexampledemonstratehowtheentirestringmustmatchtheregexif^and$areused.
Ialwaysconfusethemeaningsofthesetwosymbols,soIusethemnemonic“Carrotscostdollars”toremindmyselfthatthecaretcomesfirstandthedollarsigncomeslast.
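One common use of ^ and $ together is validating input. The sketch below wraps the ^\d+$ pattern in a small helper; the function name is_all_digits is hypothetical and not part of the re module.

```python
import re

whole_string_is_num = re.compile(r'^\d+$')

def is_all_digits(text):
    """Return True if text consists of one or more digits and nothing else."""
    return whole_string_is_num.search(text) is not None

for candidate in ['1234567890', '12345xyz67890', '12 34567890', '']:
    print(repr(candidate), is_all_digits(candidate))
```

The empty string is rejected because \d+ requires at least one digit.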
TheWildcardCharacterThe.(ordot)characterinaregularexpressioniscalledawildcardandwillmatchanycharacterexceptforanewline.Forexample,enterthefollowingintotheinteractiveshell:
>>>atRegex=re.compile(r'.at')
>>>atRegex.findall('Thecatinthehatsatontheflatmat.')
['cat','hat','sat','lat','mat']
Rememberthatthedotcharacterwillmatchjustonecharacter,whichiswhythematchforthetextflatinthepreviousexamplematchedonlylat.Tomatchanactualdot,escapethedotwithabackslash:\..
MatchingEverythingwithDot-StarSometimesyouwillwanttomatcheverythingandanything.Forexample,sayyouwanttomatchthestring'FirstName:',followedbyanyandalltext,followedby'LastName:',andthenfollowedbyanythingagain.Youcanusethedot-star(.*)tostandinforthat“anything.”Rememberthatthedotcharactermeans“anysinglecharacterexceptthenewline,”andthestarcharactermeans“zeroormoreoftheprecedingcharacter.”
Enter the following into the interactive shell:
>>> nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
>>> mo = nameRegex.search('First Name: Al Last Name: Sweigart')
>>> mo.group(1)
'Al'
>>> mo.group(2)
'Sweigart'
Thedot-starusesgreedymode:Itwillalwaystrytomatchasmuchtextaspossible.Tomatchanyandalltextinanongreedyfashion,usethedot,star,andquestionmark(.*?).Likewithcurlybrackets,thequestionmarktellsPythontomatchinanongreedyway.
Enterthefollowingintotheinteractiveshelltoseethedifferencebetweenthegreedyandnongreedyversions:
>>>nongreedyRegex=re.compile(r'<.*?>')
>>>mo=nongreedyRegex.search('<Toserveman>fordinner.>')
>>>mo.group()
'<Toserveman>'
>>>greedyRegex=re.compile(r'<.*>')
>>>mo=greedyRegex.search('<Toserveman>fordinner.>')
>>>mo.group()
'<Toserveman>fordinner.>'
Bothregexesroughlytranslateto“Matchanopeninganglebracket,followedbyanything,followedbyaclosinganglebracket.”Butthestring'<Toserveman>fordinner.>'hastwopossiblematchesfortheclosinganglebracket.Inthenongreedyversionoftheregex,Pythonmatchestheshortestpossiblestring:'<Toserveman>'.Inthegreedyversion,Pythonmatchesthelongestpossiblestring:'<Toserveman>fordinner.>'.
MatchingNewlineswiththeDotCharacterThedot-starwillmatcheverythingexceptanewline.Bypassingre.DOTALLasthesecondargumenttore.compile(),youcanmakethedotcharactermatchallcharacters,includingthenewlinecharacter.
Enterthefollowingintotheinteractiveshell:
>>> noNewlineRegex = re.compile('.*')
>>> noNewlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
'Serve the public trust.'
>>> newlineRegex = re.compile('.*', re.DOTALL)
>>> newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
'Serve the public trust.\nProtect the innocent.\nUphold the law.'
TheregexnoNewlineRegex,whichdidnothavere.DOTALLpassedtothere.compile()callthatcreatedit,willmatcheverythingonlyuptothefirstnewlinecharacter,whereasnewlineRegex,whichdidhavere.DOTALLpassedtore.compile(),matcheseverything.ThisiswhythenewlineRegex.search()callmatchesthefullstring,includingitsnewlinecharacters.
ReviewofRegexSymbolsThischaptercoveredalotofnotation,sohere’saquickreviewofwhatyoulearned:
The ? matches zero or one of the preceding group.
The * matches zero or more of the preceding group.
The + matches one or more of the preceding group.
The {n} matches exactly n of the preceding group.
The {n,} matches n or more of the preceding group.
The {,m} matches 0 to m of the preceding group.
The {n,m} matches at least n and at most m of the preceding group.
{n,m}? or *? or +? performs a nongreedy match of the preceding group.
^spam means the string must begin with spam.
spam$ means the string must end with spam.
The . matches any character, except newline characters.
\d, \w, and \s match a digit, word, or space character, respectively.
\D, \W, and \S match anything except a digit, word, or space character, respectively.
[abc] matches any character between the brackets (such as a, b, or c).
[^abc] matches any character that isn’t between the brackets.
Case-InsensitiveMatchingNormally,regularexpressionsmatchtextwiththeexactcasingyouspecify.Forexample,thefollowingregexesmatchcompletelydifferentstrings:
>>>regex1=re.compile('Robocop')
>>>regex2=re.compile('ROBOCOP')
>>>regex3=re.compile('robOcop')
>>>regex4=re.compile('RobocOp')
Butsometimesyoucareonlyaboutmatchingtheletterswithoutworryingwhetherthey’reuppercaseorlowercase.Tomakeyourregexcase-insensitive,youcanpassre.IGNORECASEorre.Iasasecondargumenttore.compile().Enterthefollowingintotheinteractiveshell:
>>>robocop=re.compile(r'robocop',re.I)
>>>robocop.search('Robocopispartman,partmachine,allcop.').group()
'Robocop'
>>>robocop.search('ROBOCOPprotectstheinnocent.').group()
'ROBOCOP'
>>>robocop.search('Al,whydoesyourprogrammingbooktalkaboutrobocopsomuch?').group()
'robocop'
Substituting Strings with the sub() Method
Regular expressions can not only find text patterns but can also substitute new text in place of those patterns. The sub() method for Regex objects is passed two arguments. The first argument is the string that replaces any matches. The second is the string to search, that is, the text the regular expression is applied to. The sub() method returns a string with the substitutions applied.
For example, enter the following into the interactive shell:
>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'
Sometimesyoumayneedtousethematchedtextitselfaspartofthesubstitution.Inthefirstargumenttosub(),youcantype\1,\2,\3,andsoon,tomean“Enterthetextofgroup1,2,3,andsoon,inthesubstitution.”
Forexample,sayyouwanttocensorthenamesofthesecretagentsbyshowingjustthefirstlettersoftheirnames.Todothis,youcouldusetheregexAgent(\w)\w*andpassr'\1****'asthefirstargumenttosub().The\1inthatstringwillbereplacedbywhatevertextwasmatchedbygroup1—thatis,the(\w)groupoftheregularexpression.
>>> agentNamesRegex = re.compile(r'Agent (\w)\w*')
>>> agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
'A**** told C**** that E**** knew B**** was a double agent.'
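Group references in the replacement string can also reorder the matched text. The sketch below is an illustration rather than part of the chapter's examples: it assumes dates written as month/day/year and rewrites them as year-month-day. The pattern, variable names, and sample text are made up.

```python
import re

# Three groups: month, day, four-digit year.
date_regex = re.compile(r'(\d{1,2})/(\d{1,2})/(\d{4})')

text = 'Due 3/14/2015, shipped 12/1/2014.'

# \3, \1, and \2 insert the text of groups 3, 1, and 2 in that order.
reordered = date_regex.sub(r'\3-\1-\2', text)
print(reordered)
```

Passing the replacement as a raw string keeps Python from treating \1 as an escape sequence.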
ManagingComplexRegexesRegularexpressionsarefineifthetextpatternyouneedtomatchissimple.Butmatchingcomplicatedtextpatternsmightrequirelong,convolutedregularexpressions.Youcanmitigatethisbytellingthere.compile()functiontoignorewhitespaceandcommentsinsidetheregularexpressionstring.This“verbosemode”canbeenabledbypassingthevariablere.VERBOSEasthesecondargumenttore.compile().
Now instead of a hard-to-read regular expression like this:
phoneRegex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext\.)\s*\d{2,5})?)')
you can spread the regular expression over multiple lines with comments like this:
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?             # area code
    (\s|-|\.)?                     # separator
    \d{3}                          # first 3 digits
    (\s|-|\.)                      # separator
    \d{4}                          # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})?  # extension
    )''', re.VERBOSE)
Notehowthepreviousexampleusesthetriple-quotesyntax(''')tocreateamultilinestringsothatyoucanspreadtheregularexpressiondefinitionovermanylines,makingitmuchmorelegible.
ThecommentrulesinsidetheregularexpressionstringarethesameasregularPythoncode:The#symbolandeverythingafterittotheendofthelineareignored.Also,theextraspacesinsidethemultilinestringfortheregularexpressionarenotconsideredpartofthetextpatterntobematched.Thisletsyouorganizetheregularexpressionsoit’seasiertoread.
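To confirm that verbose mode changes only how the pattern is written, not what it matches, the sketch below compiles a shortened version of the phone pattern (without the extension part) both ways and compares the results. The sample strings and variable names are made up for illustration.

```python
import re

compact = re.compile(r'(\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}')

verbose = re.compile(r'''
    (\d{3}|\(\d{3}\))?   # area code
    (\s|-|\.)?           # separator
    \d{3}                # first 3 digits
    (\s|-|\.)            # separator
    \d{4}                # last 4 digits
    ''', re.VERBOSE)

samples = ['415-555-4242', '(415) 555-4242', '555.4242', 'no number here']

results = []
for s in samples:
    a = compact.search(s)
    b = verbose.search(s)
    # Store the matched text from each regex, or None when nothing matched.
    results.append((a.group() if a else None, b.group() if b else None))
    print(repr(s), '->', results[-1])
```

Because whitespace in a verbose pattern is ignored, a literal space has to be written as \s (or escaped) for it to count.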
Combiningre.IGNORECASE,re.DOTALL,andre.VERBOSEWhatifyouwanttousere.VERBOSEtowritecommentsinyourregularexpressionbutalsowanttousere.IGNORECASEtoignorecapitalization?Unfortunately,there.compile()functiontakesonlyasinglevalueasitssecondargument.Youcangetaroundthislimitationbycombiningthere.IGNORECASE,re.DOTALL,andre.VERBOSEvariablesusingthepipecharacter(|),whichinthiscontextisknownasthebitwiseoroperator.
Soifyouwantaregularexpressionthat’scase-insensitiveandincludesnewlinestomatchthedotcharacter,youwouldformyourre.compile()calllikethis:
>>>someRegexValue=re.compile('foo',re.IGNORECASE|re.DOTALL)
All three options for the second argument will look like this:
>>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
Thissyntaxisalittleold-fashionedandoriginatesfromearlyversionsofPython.Thedetailsofthebitwiseoperatorsarebeyondthescopeofthisbook,butcheckouttheresourcesathttp://nostarch.com/automatestuff/formoreinformation.Youcanalsopassotheroptionsforthesecondargument;they’reuncommon,butyoucanreadmoreaboutthemintheresources,too.
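A small sketch can show each flag doing its part. The pattern and sample text below are invented for illustration; the second regex leaves out re.DOTALL to show what changes.

```python
import re

pattern = r'''
    first   # matched in any letter case because of re.IGNORECASE
    .*      # spans newlines because of re.DOTALL
    last
'''

all_flags = re.compile(pattern, re.IGNORECASE | re.DOTALL | re.VERBOSE)
no_dotall = re.compile(pattern, re.IGNORECASE | re.VERBOSE)

text = 'FIRST line\nsecond line\nLAST line'

with_dotall = all_flags.search(text)
without_dotall = no_dotall.search(text)

print(repr(with_dotall.group()))  # the match runs across both newlines
print(without_dotall)             # None: the dot cannot cross a newline
```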
Project:PhoneNumberandEmailAddressExtractorSayyouhavetheboringtaskoffindingeveryphonenumberandemailaddressinalongwebpageordocument.Ifyoumanuallyscrollthroughthepage,youmightendupsearchingforalongtime.Butifyouhadaprogramthatcouldsearchthetextinyourclipboardforphonenumbersandemailaddresses,youcouldsimplypressCTRL-Atoselectallthetext,pressCTRL-Ctocopyittotheclipboard,andthenrunyourprogram.Itcouldreplacethetextontheclipboardwithjustthephonenumbersandemailaddressesitfinds.
Wheneveryou’retacklinganewproject,itcanbetemptingtodiverightintowritingcode.Butmoreoftenthannot,it’sbesttotakeastepbackandconsiderthebiggerpicture.Irecommendfirstdrawingupahigh-levelplanforwhatyourprogramneedstodo.Don’tthinkabouttheactualcodeyet—youcanworryaboutthatlater.Rightnow,sticktobroadstrokes.
Forexample,yourphoneandemailaddressextractorwillneedtodothefollowing:
Get the text off the clipboard.
Find all phone numbers and email addresses in the text.
Paste them onto the clipboard.
Nowyoucanstartthinkingabouthowthismightworkincode.Thecodewillneedtodothefollowing:
Use the pyperclip module to copy and paste strings.
Create two regexes, one for matching phone numbers and the other for matching email addresses.
Find all matches, not just the first match, of both regexes.
Neatly format the matched strings into a single string to paste.
Display some kind of message if no matches were found in the text.
Thislistislikearoadmapfortheproject.Asyouwritethecode,youcanfocusoneachofthesestepsseparately.EachstepisfairlymanageableandexpressedintermsofthingsyoualreadyknowhowtodoinPython.
Step1:CreateaRegexforPhoneNumbersFirst,youhavetocreatearegularexpressiontosearchforphonenumbers.Createanewfile,enterthefollowing,andsaveitasphoneAndEmail.py:
#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.
import pyperclip, re
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)
# TODO: Create email regex.
# TODO: Find matches in clipboard text.
# TODO: Copy results to the clipboard.
TheTODOcommentsarejustaskeletonfortheprogram.They’llbereplacedasyouwritetheactualcode.
Thephonenumberbeginswithanoptionalareacode,sotheareacodegroupisfollowedwithaquestionmark.Sincetheareacodecanbejustthreedigits(thatis,\d{3})orthreedigitswithinparentheses(thatis,\(\d{3}\)),youshouldhaveapipejoiningthoseparts.Youcanaddtheregexcomment#Areacodetothispartofthemultilinestringtohelpyourememberwhat(\d{3}|\(\d{3}\))?issupposedtomatch.
Thephonenumberseparatorcharactercanbeaspace(\s),hyphen(-),orperiod(.),sothesepartsshouldalsobejoinedbypipes.Thenextfewpartsoftheregularexpressionarestraightforward:threedigits,followedbyanotherseparator,followedbyfourdigits.Thelastpartisanoptionalextensionmadeupofanynumberofspacesfollowedbyext,x,orext.,followedbytwotofivedigits.
Step2:CreateaRegexforEmailAddressesYouwillalsoneedaregularexpressionthatcanmatchemailaddresses.Makeyourprogramlooklikethefollowing:
  #! python3
  # phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.
  import pyperclip, re
  phoneRegex = re.compile(r'''(
  --snip--
  # Create email regex.
  emailRegex = re.compile(r'''(
➊     [a-zA-Z0-9._%+-]+      # username
➋     @                      # @ symbol
➌     [a-zA-Z0-9.-]+         # domain name
      (\.[a-zA-Z]{2,4})      # dot-something
      )''', re.VERBOSE)
  # TODO: Find matches in clipboard text.
  # TODO: Copy results to the clipboard.
Theusernamepartoftheemailaddress➊isoneormorecharactersthatcanbeanyofthefollowing:lowercaseanduppercaseletters,numbers,adot,anunderscore,apercentsign,aplussign,orahyphen.Youcanputalloftheseintoacharacterclass:[a-zA-Z0-9._%+-].
Thedomainandusernameareseparatedbyan@symbol➋.Thedomainname➌hasaslightlylesspermissivecharacterclasswithonlyletters,numbers,periods,andhyphens:[a-zA-Z0-9.-].Andlastwillbethe“dot-com”part(technicallyknownasthetop-leveldomain),whichcanreallybedot-anything.Thisisbetweentwoandfourcharacters.
Theformatforemailaddresseshasalotofweirdrules.Thisregularexpressionwon’tmatcheverypossiblevalidemailaddress,butit’llmatchalmostanytypicalemailaddressyou’llencounter.
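To see what this pattern does and does not catch, the sketch below runs it on a few made-up addresses. The example.com and example.org domains and the variable names are placeholders chosen for illustration.

```python
import re

email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

text = ('Write to info@example.com or sales+west@mail.example.org, '
        'not to bob@localhost.')

# findall() returns one tuple per match; index 0 holds the whole address.
found = [groups[0] for groups in email_regex.findall(text)]
print(found)

# A top-level domain longer than four letters is cut short by {2,4}.
truncated = email_regex.search('curator@example.museum').group()
print(truncated)
```

The last result illustrates the limitation mentioned above: an address whose final part is longer than four letters is matched only partially, so the {2,4} range may need widening for some data.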
Step3:FindAllMatchesintheClipboardTextNowthatyouhavespecifiedtheregularexpressionsforphonenumbersandemailaddresses,youcanletPython’sremoduledothehardworkoffindingallthematchesontheclipboard.Thepyperclip.paste()functionwillgetastringvalueofthetextonthe
clipboard,andthefindall()regexmethodwillreturnalistoftuples.
Make your program look like the following:
  #! python3
  # phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.
  import pyperclip, re
  phoneRegex = re.compile(r'''(
  --snip--
  # Find matches in clipboard text.
  text = str(pyperclip.paste())
➊ matches = []
➋ for groups in phoneRegex.findall(text):
      phoneNum = '-'.join([groups[1], groups[3], groups[5]])
      if groups[8] != '':
          phoneNum += ' x' + groups[8]
      matches.append(phoneNum)
➌ for groups in emailRegex.findall(text):
      matches.append(groups[0])
  # TODO: Copy results to the clipboard.
Thereisonetupleforeachmatch,andeachtuplecontainsstringsforeachgroupintheregularexpression.Rememberthatgroup0matchestheentireregularexpression,sothegroupatindex0ofthetupleistheoneyouareinterestedin.
As you can see at ➊, you’ll store the matches in a list variable named matches. It starts off as an empty list, and a couple of for loops add to it. For the email addresses, you append group 0 of each match ➌. For the matched phone numbers, you don’t want to just append group 0. While the program detects phone numbers in several formats, you want the phone number appended to be in a single, standard format. The phoneNum variable contains a string built from groups 1, 3, 5, and 8 of the matched text ➋. (These groups are the area code, first three digits, last four digits, and extension.)
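The index numbers 1, 3, 5, and 8 are easier to trust once you see the whole tuple. The sketch below prints every item that findall() returns for one made-up phone number. It escapes the dot in ext\. so that it matches a literal period, which is an assumption about the intended pattern.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

# Take the tuple for the first (and only) match in the sample text.
groups = phone_regex.findall('Call 415-555-1234 ext 99 today.')[0]

for index, value in enumerate(groups):
    print(index, repr(value))
```

Index 0 holds the entire match because the whole pattern is wrapped in an outer pair of parentheses; the separators and the extension wrapper occupy the indexes in between.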
Step4:JointheMatchesintoaStringfortheClipboardNowthatyouhavetheemailaddressesandphonenumbersasalistofstringsinmatches,youwanttoputthemontheclipboard.Thepyperclip.copy()functiontakesonlyasinglestringvalue,notalistofstrings,soyoucallthejoin()methodonmatches.
Tomakeiteasiertoseethattheprogramisworking,let’sprintanymatchesyoufindtotheterminal.Andifnophonenumbersoremailaddresseswerefound,theprogramshouldtelltheuserthis.
Make your program look like the following:
#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.
--snip--
for groups in emailRegex.findall(text):
    matches.append(groups[0])
# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
RunningtheProgram
Foranexample,openyourwebbrowsertotheNoStarchPresscontactpageathttp://www.nostarch.com/contactus.htm,pressCTRL-Atoselectallthetextonthepage,andpressCTRL-Ctocopyittotheclipboard.Whenyourunthisprogram,theoutputwilllooksomethinglikethis:
Copiedtoclipboard:
800-420-7240
415-863-9900
415-863-9950
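If you want to try the matching logic without touching the clipboard, the four steps can be sketched as a single function that takes a string and returns the list of matches. This is a variation on the program rather than the program itself: the function name extract_contacts and the sample text are hypothetical, pyperclip is left out on purpose, and the dot in ext\. is escaped as an assumption about the intended pattern.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

def extract_contacts(text):
    """Return phone numbers (in one standard format) and email addresses."""
    matches = []
    for groups in phone_regex.findall(text):
        phone_num = '-'.join([groups[1], groups[3], groups[5]])
        if groups[8] != '':
            phone_num += ' x' + groups[8]
        matches.append(phone_num)
    for groups in email_regex.findall(text):
        matches.append(groups[0])
    return matches

sample = ('Sales: 415-555-1234 ext 22, support: 800.555.0199, '
          'mail: help@example.com')
contacts = extract_contacts(sample)
print('\n'.join(contacts) if contacts else 'No phone numbers or email addresses found.')
```

Keeping the matching in a function that works on plain strings makes it possible to test the regexes on sample text before wiring in pyperclip.paste() and pyperclip.copy().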
IdeasforSimilarProgramsIdentifyingpatternsoftext(andpossiblysubstitutingthemwiththesub()method)hasmanydifferentpotentialapplications.
Find website URLs that begin with http:// or https://.
Clean up dates in different date formats (such as 3/14/2015, 03-14-2015, and 2015/3/14) by replacing them with dates in a single, standard format.
Remove sensitive information such as Social Security or credit card numbers.
Find common typos such as multiple spaces between words, accidentally accidentally repeated words, or multiple exclamation marks at the end of sentences. Those are annoying!!
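As one possible starting point for the last idea, the sketch below chains three sub() calls to tidy up a sentence. The patterns and the function name tidy are illustrative assumptions, not the only way to approach it; \b (a word boundary) and the \1 reference inside the pattern itself are regex features this chapter does not cover.

```python
import re

# A word, whitespace, then the same word again (case-insensitive).
repeated_word = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)
extra_spaces = re.compile(r' {2,}')   # two or more spaces in a row
many_bangs = re.compile(r'!{2,}')     # two or more exclamation marks

def tidy(text):
    """Collapse repeated words, runs of spaces, and runs of '!'."""
    text = repeated_word.sub(r'\1', text)
    text = extra_spaces.sub(' ', text)
    text = many_bangs.sub('!', text)
    return text

cleaned = tidy('This is  is accidentally accidentally  annoying!!!')
print(cleaned)
```

A real cleanup tool would need more care, since some repeated words (such as "had had") are legitimate.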
SummaryWhileacomputercansearchfortextquickly,itmustbetoldpreciselywhattolookfor.Regularexpressionsallowyoutospecifytheprecisepatternsofcharactersyouarelookingfor.Infact,somewordprocessingandspreadsheetapplicationsprovidefind-and-replacefeaturesthatallowyoutosearchusingregularexpressions.
TheremodulethatcomeswithPythonletsyoucompileRegexobjects.Thesevalueshaveseveralmethods:search()tofindasinglematch,findall()tofindallmatchinginstances,andsub()todoafind-and-replacesubstitutionoftext.
There’sabitmoretoregularexpressionsyntaxthanisdescribedinthischapter.YoucanfindoutmoreintheofficialPythondocumentationathttp://docs.python.org/3/library/re.html.Thetutorialwebsitehttp://www.regular-expressions.info/isalsoausefulresource.
Nowthatyouhaveexpertisemanipulatingandmatchingstrings,it’stimetodiveintohowtoreadfromandwritetofilesonyourcomputer’sharddrive.
PracticeQuestionsQ: 1.WhatisthefunctionthatcreatesRegexobjects?
Q: 2.WhyarerawstringsoftenusedwhencreatingRegexobjects?
Q: 3.Whatdoesthesearch()methodreturn?
Q: 4.HowdoyougettheactualstringsthatmatchthepatternfromaMatchobject?
Q: 5.Intheregexcreatedfromr'(\d\d\d)-(\d\d\d-\d\d\d\d)',whatdoesgroup0cover?Group1?Group2?
Q: 6.Parenthesesandperiodshavespecificmeaningsinregularexpressionsyntax.Howwouldyouspecifythatyouwantaregextomatchactualparenthesesandperiodcharacters?
Q: 7.Thefindall()methodreturnsalistofstringsoralistoftuplesofstrings.Whatmakesitreturnoneortheother?
Q: 8.Whatdoesthe|charactersignifyinregularexpressions?
Q: 9.Whattwothingsdoesthe?charactersignifyinregularexpressions?
Q: 10.Whatisthedifferencebetweenthe+and*charactersinregularexpressions?
Q: 11.Whatisthedifferencebetween{3}and{3,5}inregularexpressions?
Q: 12.Whatdothe\d,\w,and\sshorthandcharacterclassessignifyinregularexpressions?
Q: 13.Whatdothe\D,\W,and\Sshorthandcharacterclassessignifyinregularexpressions?
Q: 14.Howdoyoumakearegularexpressioncase-insensitive?
Q: 15.Whatdoesthe.characternormallymatch?Whatdoesitmatchifre.DOTALLispassedasthesecondargumenttore.compile()?
Q: 16.Whatisthedifferencebetween.*and.*??
Q: 17.Whatisthecharacterclasssyntaxtomatchallnumbersandlowercaseletters?
Q: 18.IfnumRegex=re.compile(r'\d+'),whatwillnumRegex.sub('X','12drummers,11pipers,fiverings,3hens')return?
Q: 19.Whatdoespassingre.VERBOSEasthesecondargumenttore.compile()allowyoutodo?
Q: 20.Howwouldyouwritearegexthatmatchesanumberwithcommasforeverythreedigits?Itmustmatchthefollowing:
'42'
'1,234'
'6,368,745'
butnotthefollowing:
'12,34,567' (which has only two digits between the commas)
'1234' (which lacks commas)
Q: 21.HowwouldyouwritearegexthatmatchesthefullnameofsomeonewhoselastnameisNakamoto?Youcanassumethatthefirstnamethatcomesbeforeitwillalwaysbeonewordthatbeginswithacapitalletter.Theregexmustmatchthefollowing:
'SatoshiNakamoto'
'AliceNakamoto'
'RobocopNakamoto'
butnotthefollowing:
'satoshi Nakamoto' (where the first name is not capitalized)
'Mr. Nakamoto' (where the preceding word has a nonletter character)
'Nakamoto' (which has no first name)
'Satoshi nakamoto' (where Nakamoto is not capitalized)
Q: 22.HowwouldyouwritearegexthatmatchesasentencewherethefirstwordiseitherAlice,Bob,orCarol;thesecondwordiseithereats,pets,orthrows;thethirdwordisapples,cats,orbaseballs;andthesentenceendswithaperiod?Thisregexshouldbecase-insensitive.Itmustmatchthefollowing:
'Aliceeatsapples.'
'Bobpetscats.'
'Carolthrowsbaseballs.'
'AlicethrowsApples.'
'BOBEATSCATS.'
butnotthefollowing:
'Robocopeatsapples.'
'ALICETHROWSFOOTBALLS.'
'Caroleats7cats.'
PracticeProjectsForpractice,writeprogramstodothefollowingtasks.
StrongPasswordDetectionWriteafunctionthatusesregularexpressionstomakesurethepasswordstringitispassedisstrong.Astrongpasswordisdefinedasonethatisatleasteightcharacterslong,containsbothuppercaseandlowercasecharacters,andhasatleastonedigit.Youmayneedtotestthestringagainstmultipleregexpatternstovalidateitsstrength.
RegexVersionofstrip()Writeafunctionthattakesastringanddoesthesamethingasthestrip()stringmethod.Ifnootherargumentsarepassedotherthanthestringtostrip,thenwhitespacecharacterswillberemovedfromthebeginningandendofthestring.Otherwise,thecharactersspecifiedinthesecondargumenttothefunctionwillberemovedfromthestring.
[1]CoryDoctorow,“Here’swhatICTshouldreallyteachkids:howtodoregularexpressions,”Guardian,December4,2012,http://www.theguardian.com/technology/2012/dec/04/ict-teach-kids-regular-expressions/.
Chapter8.ReadingandWritingFilesVariablesareafinewaytostoredatawhileyourprogramisrunning,butifyouwantyourdatatopersistevenafteryourprogramhasfinished,youneedtosaveittoafile.Youcanthinkofafile’scontentsasasinglestringvalue,potentiallygigabytesinsize.Inthischapter,youwilllearnhowtousePythontocreate,read,andsavefilesontheharddrive.
Files and File Paths
A file has two key properties: a filename (usually written as one word) and a path. The path specifies the location of a file on the computer. For example, there is a file on my Windows 7 laptop with the filename project.docx in the path C:\Users\asweigart\Documents. The part of the filename after the last period is called the file’s extension and tells you a file’s type. project.docx is a Word document, and Users, asweigart, and Documents all refer to folders (also called directories). Folders can contain files and other folders. For example, project.docx is in the Documents folder, which is inside the asweigart folder, which is inside the Users folder. Figure 8-1 shows this folder organization.
Figure8-1.Afileinahierarchyoffolders
TheC:\partofthepathistherootfolder,whichcontainsallotherfolders.OnWindows,therootfolderisnamedC:\andisalsocalledtheC:drive.OnOSXandLinux,therootfolderis/.Inthisbook,I’llbeusingtheWindows-stylerootfolder,C:\.IfyouareenteringtheinteractiveshellexamplesonOSXorLinux,enter/instead.
Additionalvolumes,suchasaDVDdriveorUSBthumbdrive,willappeardifferentlyondifferentoperatingsystems.OnWindows,theyappearasnew,letteredrootdrives,suchasD:\orE:\.OnOSX,theyappearasnewfoldersunderthe/Volumesfolder.OnLinux,theyappearasnewfoldersunderthe/mnt(“mount”)folder.AlsonotethatwhilefoldernamesandfilenamesarenotcasesensitiveonWindowsandOSX,theyarecasesensitiveonLinux.
BackslashonWindowsandForwardSlashonOSXandLinuxOnWindows,pathsarewrittenusingbackslashes(\)astheseparatorbetweenfoldernames.OSXandLinux,however,usetheforwardslash(/)astheirpathseparator.Ifyouwantyourprogramstoworkonalloperatingsystems,youwillhavetowriteyourPythonscriptstohandlebothcases.
Fortunately,thisissimpletodowiththeos.path.join()function.Ifyoupassitthestringvaluesofindividualfileandfoldernamesinyourpath,os.path.join()willreturnastringwithafilepathusingthecorrectpathseparators.Enterthefollowingintotheinteractiveshell:
>>> import os
>>> os.path.join('usr', 'bin', 'spam')
'usr\\bin\\spam'
I’m running these interactive shell examples on Windows, so os.path.join('usr', 'bin', 'spam') returned 'usr\\bin\\spam'. (Notice that the backslashes are doubled because each backslash needs to be escaped by another backslash character.) If I had called this function on OS X or Linux, the string would have been 'usr/bin/spam'.
The os.path.join() function is helpful if you need to create strings for filenames. These strings will be passed to several of the file-related functions introduced in this chapter. For example, the following example joins names from a list of filenames to the end of a folder’s name:
>>> myFiles = ['accounts.txt', 'details.csv', 'invite.docx']
>>> for filename in myFiles:
        print(os.path.join('C:\\Users\\asweigart', filename))
C:\Users\asweigart\accounts.txt
C:\Users\asweigart\details.csv
C:\Users\asweigart\invite.docx
The Current Working Directory
Every program that runs on your computer has a current working directory, or cwd. Any filenames or paths that do not begin with the root folder are assumed to be under the current working directory. You can get the current working directory as a string value with the os.getcwd() function and change it with os.chdir(). Enter the following into the interactive shell:
>>> import os
>>> os.getcwd()
'C:\\Python34'
>>> os.chdir('C:\\Windows\\System32')
>>> os.getcwd()
'C:\\Windows\\System32'
Here, the current working directory is set to C:\Python34, so the filename project.docx refers to C:\Python34\project.docx. When we change the current working directory to C:\Windows\System32, project.docx is interpreted as C:\Windows\System32\project.docx.
Python will display an error if you try to change to a directory that does not exist.
>>> os.chdir('C:\\ThisFolderDoesNotExist')
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    os.chdir('C:\\ThisFolderDoesNotExist')
FileNotFoundError: [WinError 2] The system cannot find the file specified:
'C:\\ThisFolderDoesNotExist'
NOTE
While folder is the more modern name for directory, note that current working directory (or just working directory) is the standard term, not current working folder.
Absolute vs. Relative Paths
There are two ways to specify a file path.
An absolute path, which always begins with the root folder
A relative path, which is relative to the program’s current working directory
There are also the dot (.) and dot-dot (..) folders. These are not real folders but special names that can be used in a path. A single period (“dot”) for a folder name is shorthand for “this directory.” Two periods (“dot-dot”) means “the parent folder.”
Figure 8-2 is an example of some folders and files. When the current working directory is set to C:\bacon, the relative paths for the other folders and files are set as they are in the figure.
Figure 8-2. The relative paths for folders and files in the working directory C:\bacon
The .\ at the start of a relative path is optional. For example, .\spam.txt and spam.txt refer to the same file.
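The dot and dot-dot names can be seen in action with a short sketch. It relies on os.path.normpath(), a standard os.path function that this section does not otherwise cover, to collapse the special names. The fizz and spam.txt names are only illustrative, and none of the folders need to exist for this to run.

```python
import os

# Build a relative path that steps into fizz and then back out with dot-dot.
messy = os.path.join('.', 'fizz', '..', 'spam.txt')

# normpath() collapses the . and .. names without touching the hard drive.
tidy = os.path.normpath(messy)
print(tidy)

# The leading dot is optional, so both spellings name the same file.
same = os.path.normpath(os.path.join('.', 'spam.txt')) == os.path.normpath('spam.txt')
print(same)
```

Because the path is built with os.path.join(), the same sketch should behave the same way on Windows, OS X, and Linux.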
Creating New Folders with os.makedirs()
Your programs can create new folders (directories) with the os.makedirs() function. Enter the following into the interactive shell:
>>> import os
>>> os.makedirs('C:\\delicious\\walnut\\waffles')
This will create not just the C:\delicious folder but also a walnut folder inside C:\delicious and a waffles folder inside C:\delicious\walnut. That is, os.makedirs() will create any necessary intermediate folders in order to ensure that the full path exists. Figure 8-3 shows this hierarchy of folders.
Figure 8-3. The result of os.makedirs('C:\\delicious\\walnut\\waffles')
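If you would rather not create folders on your real C: drive while experimenting, the following sketch does the same thing inside a temporary folder. The tempfile module and the exist_ok keyword argument are not used elsewhere in this chapter; they are standard-library features brought in here only for illustration.

```python
import os
import tempfile

# Work inside a throwaway folder so nothing is created on the real C: drive.
base = tempfile.mkdtemp()
target = os.path.join(base, 'delicious', 'walnut', 'waffles')

os.makedirs(target)                  # creates delicious, walnut, and waffles
os.makedirs(target, exist_ok=True)   # a repeat call is harmless with exist_ok=True

print(os.path.isdir(target))
```

Without exist_ok=True, calling os.makedirs() on a folder that already exists raises a FileExistsError.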
The os.path Module
The os.path module contains many helpful functions related to filenames and file paths. For instance, you’ve already used os.path.join() to build paths in a way that will work on any operating system. Since os.path is a module inside the os module, you can import it by simply running import os. Whenever your programs need to work with files, folders, or file paths, you can refer to the short examples in this section. The full documentation for the os.path module is on the Python website at http://docs.python.org/3/library/os.path.html.
NOTE
Most of the examples that follow in this section will require the os module, so remember to import it at the beginning of any script you write and any time you restart IDLE. Otherwise, you’ll get a NameError: name 'os' is not defined error message.
Handling Absolute and Relative Paths
The os.path module provides functions for returning the absolute path of a relative path and for checking whether a given path is an absolute path.
Calling os.path.abspath(path) will return a string of the absolute path of the argument. This is an easy way to convert a relative path into an absolute one.
Calling os.path.isabs(path) will return True if the argument is an absolute path and False if it is a relative path.
Calling os.path.relpath(path, start) will return a string of a relative path from the start path to path. If start is not provided, the current working directory is used as the start path.
Try these functions in the interactive shell:
>>> os.path.abspath('.')
'C:\\Python34'
>>> os.path.abspath('.\\Scripts')
'C:\\Python34\\Scripts'
>>> os.path.isabs('.')
False
>>> os.path.isabs(os.path.abspath('.'))
True
Since C:\Python34 was the working directory when os.path.abspath() was called, the “single-dot” folder represents the absolute path 'C:\\Python34'.
NOTE
Since your system probably has different files and folders on it than mine, you won’t be able to follow every example in this chapter exactly. Still, try to follow along using folders that exist on your computer.
Enter the following calls to os.path.relpath() into the interactive shell:
>>> os.path.relpath('C:\\Windows', 'C:\\')
'Windows'
>>> os.path.relpath('C:\\Windows', 'C:\\spam\\eggs')
'..\\..\\Windows'
>>> os.getcwd()
'C:\\Python34'
Calling os.path.dirname(path) will return a string of everything that comes before the last slash in the path argument. Calling os.path.basename(path) will return a string of everything that comes after the last slash in the path argument. The dirname and basename of a path are outlined in Figure 8-4.
Figure 8-4. The basename follows the last slash in a path and is the same as the filename. The dirname is everything before the last slash.
For example, enter the following into the interactive shell:
>>> path = 'C:\\Windows\\System32\\calc.exe'
>>> os.path.basename(path)
'calc.exe'
>>> os.path.dirname(path)
'C:\\Windows\\System32'
If you need a path’s dirname and basename together, you can just call os.path.split() to get a tuple value with these two strings, like so:
>>> calcFilePath = 'C:\\Windows\\System32\\calc.exe'
>>> os.path.split(calcFilePath)
('C:\\Windows\\System32', 'calc.exe')
Notice that you could create the same tuple by calling os.path.dirname() and os.path.basename() and placing their return values in a tuple.
>>> (os.path.dirname(calcFilePath), os.path.basename(calcFilePath))
('C:\\Windows\\System32', 'calc.exe')
But os.path.split() is a nice shortcut if you need both values.
Also, note that os.path.split() does not take a file path and return a list of strings of each folder. For that, use the split() string method and split on the string in os.sep. Recall from earlier that the os.sep variable is set to the correct folder-separating slash for the computer running the program.
For example, enter the following into the interactive shell:
>>> calcFilePath.split(os.path.sep)
['C:', 'Windows', 'System32', 'calc.exe']
On OS X and Linux systems, there will be a blank string at the start of the returned list:
>>> '/usr/bin'.split(os.path.sep)
['', 'usr', 'bin']
The split() string method will work to return a list of each part of the path. It will work on any operating system if you pass it os.path.sep.
Finding File Sizes and Folder Contents
Once you have ways of handling file paths, you can then start gathering information about specific files and folders. The os.path module provides functions for finding the size of a file in bytes and the files and folders inside a given folder.
Calling os.path.getsize(path) will return the size in bytes of the file in the path argument.
Calling os.listdir(path) will return a list of filename strings for each file in the path argument. (Note that this function is in the os module, not os.path.)
Here’s what I get when I try these functions in the interactive shell:
>>> os.path.getsize('C:\\Windows\\System32\\calc.exe')
776192
>>> os.listdir('C:\\Windows\\System32')
['0409', '12520437.cpx', '12520850.cpx', '5U877.ax', 'aaclient.dll',
--snip--
'xwtpdui.dll', 'xwtpw32.dll', 'zh-CN', 'zh-HK', 'zh-TW', 'zipfldr.dll']
As you can see, the calc.exe program on my computer is 776,192 bytes in size, and I have a lot of files in C:\Windows\System32. If I want to find the total size of all the files in this directory, I can use os.path.getsize() and os.listdir() together.
>>> totalSize = 0
>>> for filename in os.listdir('C:\\Windows\\System32'):
        totalSize = totalSize + os.path.getsize(os.path.join('C:\\Windows\\System32', filename))
>>> print(totalSize)
1117846456
As I loop over each filename in the C:\Windows\System32 folder, the totalSize variable is incremented by the size of each file. Notice how when I call os.path.getsize(), I use os.path.join() to join the folder name with the current filename. The integer that os.path.getsize() returns is added to the value of totalSize. After looping through all the files, I print totalSize to see the total size of the C:\Windows\System32 folder.
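One thing the loop above glosses over is that os.listdir() returns the names of subfolders as well as files. The following sketch is one possible variation, not the only way to do it: it wraps the loop in a function and uses os.path.isfile() (covered in the next section) to count only files. The total_file_size name and the temporary demo folder are made up for this example.

```python
import os
import tempfile

def total_file_size(folder):
    """Add up the sizes, in bytes, of the files directly inside folder."""
    total = 0
    for filename in os.listdir(folder):
        full_path = os.path.join(folder, filename)
        if os.path.isfile(full_path):   # skip subfolders
            total = total + os.path.getsize(full_path)
    return total

# Try it on a temporary folder holding two small files and one subfolder.
demo_folder = tempfile.mkdtemp()
for name, text in [('a.txt', 'spam'), ('b.txt', 'bacon!')]:
    demo_file = open(os.path.join(demo_folder, name), 'w')
    demo_file.write(text)
    demo_file.close()
os.makedirs(os.path.join(demo_folder, 'subfolder'))

print(total_file_size(demo_folder))
```

The two demo files hold 4 and 6 characters of plain text, so the function reports 10 bytes and ignores the subfolder.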
Checking Path Validity
Many Python functions will crash with an error if you supply them with a path that does not exist. The os.path module provides functions to check whether a given path exists and whether it is a file or folder.
Calling os.path.exists(path) will return True if the file or folder referred to in the argument exists and will return False if it does not exist.
Calling os.path.isfile(path) will return True if the path argument exists and is a file and will return False otherwise.
Calling os.path.isdir(path) will return True if the path argument exists and is a folder and will return False otherwise.
Here’s what I get when I try these functions in the interactive shell:
>>> os.path.exists('C:\\Windows')
True
>>> os.path.exists('C:\\some_made_up_folder')
False
>>> os.path.isdir('C:\\Windows\\System32')
True
>>> os.path.isfile('C:\\Windows\\System32')
False
>>> os.path.isdir('C:\\Windows\\System32\\calc.exe')
False
>>> os.path.isfile('C:\\Windows\\System32\\calc.exe')
True
You can determine whether there is a DVD or flash drive currently attached to the computer by checking for it with the os.path.exists() function. For instance, if I wanted to check for a flash drive with the volume named D:\ on my Windows computer, I could do that with the following:
>>> os.path.exists('D:\\')
False
Oops! It looks like I forgot to plug in my flash drive.
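These three checks combine naturally into a small helper that a program can call before it tries to use a path. The sketch below is only an illustration; describe_path is a hypothetical name, and the temporary folder stands in for whatever folders exist on your computer.

```python
import os
import tempfile

def describe_path(path):
    """Return 'file', 'folder', or 'missing' for the given path."""
    if os.path.isfile(path):
        return 'file'
    if os.path.isdir(path):
        return 'folder'
    return 'missing'   # the path is neither a file nor a folder

demo_folder = tempfile.mkdtemp()
demo_file_path = os.path.join(demo_folder, 'hello.txt')
demo_file = open(demo_file_path, 'w')   # creates an empty file
demo_file.close()

print(describe_path(demo_folder))
print(describe_path(demo_file_path))
print(describe_path(os.path.join(demo_folder, 'some_made_up_folder')))
```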
The File Reading/Writing Process
Once you are comfortable working with folders and relative paths, you’ll be able to specify the location of files to read and write. The functions covered in the next few sections will apply to plaintext files. Plaintext files contain only basic text characters and do not include font, size, or color information. Text files with the .txt extension or Python script files with the .py extension are examples of plaintext files. These can be opened with Windows’s Notepad or OS X’s TextEdit application. Your programs can easily read the contents of plaintext files and treat them as an ordinary string value.
Binary files are all other file types, such as word processing documents, PDFs, images, spreadsheets, and executable programs. If you open a binary file in Notepad or TextEdit, it will look like scrambled nonsense, like in Figure 8-5.
Figure 8-5. The Windows calc.exe program opened in Notepad
Since every different type of binary file must be handled in its own way, this book will not go into reading and writing raw binary files directly. Fortunately, many modules make working with binary files easier—you will explore one of them, the shelve module, later in this chapter.
There are three steps to reading or writing files in Python.
1. Call the open() function to return a File object.
2. Call the read() or write() method on the File object.
3. Close the file by calling the close() method on the File object.
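As a preview, the three steps can be sketched end to end, once for writing and once for reading. The sections that follow explain each call in detail, including the 'w' argument. The temporary folder is used here only so the sketch does not depend on any file already being on your computer.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

# Writing: open, write, close.
hello_file = open(path, 'w')
hello_file.write('Hello world!')
hello_file.close()

# Reading: open, read, close.
hello_file = open(path)
content = hello_file.read()
hello_file.close()

print(content)
```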
Opening Files with the open() Function
To open a file with the open() function, you pass it a string path indicating the file you want to open; it can be either an absolute or relative path. The open() function returns a File object.
Try it by creating a text file named hello.txt using Notepad or TextEdit. Type Hello world! as the content of this text file and save it in your user home folder. Then, if you’re using Windows, enter the following into the interactive shell:
>>> helloFile = open('C:\\Users\\your_home_folder\\hello.txt')
If you’re using OS X, enter the following into the interactive shell instead:
>>> helloFile = open('/Users/your_home_folder/hello.txt')
Make sure to replace your_home_folder with your computer username. For example, my username is asweigart, so I’d enter 'C:\\Users\\asweigart\\hello.txt' on Windows.
Both these commands will open the file in “reading plaintext” mode, or read mode for short. When a file is opened in read mode, Python lets you only read data from the file; you can’t write or modify it in any way. Read mode is the default mode for files you open in Python. But if you don’t want to rely on Python’s defaults, you can explicitly specify the mode by passing the string value 'r' as a second argument to open(). So open('/Users/asweigart/hello.txt', 'r') and open('/Users/asweigart/hello.txt') do the same thing.
The call to open() returns a File object. A File object represents a file on your computer; it is simply another type of value in Python, much like the lists and dictionaries you’re already familiar with. In the previous example, you stored the File object in the variable helloFile. Now, whenever you want to read from or write to the file, you can do so by calling methods on the File object in helloFile.
Reading the Contents of Files
Now that you have a File object, you can start reading from it. If you want to read the entire contents of a file as a string value, use the File object’s read() method. Let’s continue with the hello.txt File object you stored in helloFile. Enter the following into the interactive shell:
>>> helloContent = helloFile.read()
>>> helloContent
'Hello world!'
If you think of the contents of a file as a single large string value, the read() method returns the string that is stored in the file.
Alternatively, you can use the readlines() method to get a list of string values from the file, one string for each line of text. For example, create a file named sonnet29.txt in the same directory as hello.txt and write the following text in it:
When, in disgrace with fortune and men's eyes,
I all alone beweep my outcast state,
And trouble deaf heaven with my bootless cries,
And look upon myself and curse my fate,
Make sure to separate the four lines with line breaks. Then enter the following into the interactive shell:
>>> sonnetFile = open('sonnet29.txt')
>>> sonnetFile.readlines()
["When, in disgrace with fortune and men's eyes,\n", 'I all alone beweep my
outcast state,\n', 'And trouble deaf heaven with my bootless cries,\n', 'And
look upon myself and curse my fate,']
Note that each of the string values ends with a newline character, \n, except for the last line of the file. A list of strings is often easier to work with than a single large string value.
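If the trailing newlines get in the way, one common approach is to strip them with the rstrip() string method. The sketch below first writes the sonnet to a temporary folder so that it runs on its own; the clean_lines name is just an illustration.

```python
import os
import tempfile

# Create sonnet29.txt in a temporary folder so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sonnet29.txt')
sonnet_file = open(path, 'w')
sonnet_file.write("When, in disgrace with fortune and men's eyes,\n"
                  'I all alone beweep my outcast state,\n'
                  'And trouble deaf heaven with my bootless cries,\n'
                  'And look upon myself and curse my fate,')
sonnet_file.close()

sonnet_file = open(path)
lines = sonnet_file.readlines()
sonnet_file.close()

# Remove the trailing newline, if there is one, from each line.
clean_lines = []
for line in lines:
    clean_lines.append(line.rstrip('\n'))

print(len(clean_lines))
print(clean_lines[1])
```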
Writing to Files
Python allows you to write content to a file in a way similar to how the print() function “writes” strings to the screen. You can’t write to a file you’ve opened in read mode, though. Instead, you need to open it in “write plaintext” mode or “append plaintext” mode, or write mode and append mode for short.
Write mode will overwrite the existing file and start from scratch, just like when you overwrite a variable’s value with a new value. Pass 'w' as the second argument to open() to open the file in write mode. Append mode, on the other hand, will append text to the end of the existing file. You can think of this as appending to a list in a variable, rather than overwriting the variable altogether. Pass 'a' as the second argument to open() to open the file in append mode.
If the filename passed to open() does not exist, both write and append mode will create a new, blank file. After reading or writing a file, call the close() method before opening the file again.
Let’s put these concepts together. Enter the following into the interactive shell:
>>> baconFile = open('bacon.txt', 'w')
>>> baconFile.write('Hello world!\n')
13
>>> baconFile.close()
>>> baconFile = open('bacon.txt', 'a')
>>> baconFile.write('Bacon is not a vegetable.')
25
>>> baconFile.close()
>>> baconFile = open('bacon.txt')
>>> content = baconFile.read()
>>> baconFile.close()
>>> print(content)
Hello world!
Bacon is not a vegetable.
First, we open bacon.txt in write mode. Since there isn’t a bacon.txt yet, Python creates one. Calling write() on the opened file and passing write() the string argument 'Hello world!\n' writes the string to the file and returns the number of characters written, including the newline. Then we close the file.
To add text to the existing contents of the file instead of replacing the string we just wrote, we open the file in append mode. We write 'Bacon is not a vegetable.' to the file and close it. Finally, to print the file contents to the screen, we open the file in its default read mode, call read(), store the resulting string in content, close the file, and print content.
Note that the write() method does not automatically add a newline character to the end of the string like the print() function does. You will have to add this character yourself.
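One way to avoid forgetting the newline is to put the write() call in a small helper function. This is only a sketch; write_lines is a hypothetical name, not a built-in, and the file is created in a temporary folder so that nothing in your working directory is overwritten.

```python
import os
import tempfile

def write_lines(file_obj, lines):
    """Write each string in lines to file_obj, adding a newline after each."""
    for line in lines:
        file_obj.write(line + '\n')

path = os.path.join(tempfile.mkdtemp(), 'bacon.txt')
bacon_file = open(path, 'w')
write_lines(bacon_file, ['Hello world!', 'Bacon is not a vegetable.'])
bacon_file.close()

bacon_file = open(path)
content = bacon_file.read()
bacon_file.close()
print(content)
```

Unlike the interactive example above, every line written this way ends with a newline, including the last one.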
Saving Variables with the shelve Module
You can save variables in your Python programs to binary shelf files using the shelve module. This way, your program can restore data to variables from the hard drive. The shelve module will let you add Save and Open features to your program. For example, if you ran a program and entered some configuration settings, you could save those settings to a shelf file and then have the program load them the next time it is run.
Enter the following into the interactive shell:
>>> import shelve
>>> shelfFile = shelve.open('mydata')
>>> cats = ['Zophie', 'Pooka', 'Simon']
>>> shelfFile['cats'] = cats
>>> shelfFile.close()
To read and write data using the shelve module, you first import shelve. Call shelve.open() and pass it a filename, and then store the returned shelf value in a variable. You can make changes to the shelf value as if it were a dictionary. When you’re done, call close() on the shelf value. Here, our shelf value is stored in shelfFile. We create a list cats and write shelfFile['cats'] = cats to store the list in shelfFile as a value associated with the key 'cats' (like in a dictionary). Then we call close() on shelfFile.
After running the previous code on Windows, you will see three new files in the current working directory: mydata.bak, mydata.dat, and mydata.dir. On OS X, only a single mydata.db file will be created.
These binary files contain the data you stored in your shelf. The format of these binary files is not important; you only need to know what the shelve module does, not how it does it. The module frees you from worrying about how to store your program’s data to a file.
Your programs can use the shelve module to later reopen and retrieve the data from these shelf files. Shelf values don’t have to be opened in read or write mode—they can do both once opened. Enter the following into the interactive shell:
>>> shelfFile = shelve.open('mydata')
>>> type(shelfFile)
<class 'shelve.DbfilenameShelf'>
>>> shelfFile['cats']
['Zophie', 'Pooka', 'Simon']
>>> shelfFile.close()
Here, we open the shelf files to check that our data was stored correctly. Entering shelfFile['cats'] returns the same list that we stored earlier, so we know that the list is correctly stored, and we call close().
Just like dictionaries, shelf values have keys() and values() methods that will return list-like values of the keys and values in the shelf. Since these methods return list-like values instead of true lists, you should pass them to the list() function to get them in list form. Enter the following into the interactive shell:
>>> shelfFile = shelve.open('mydata')
>>> list(shelfFile.keys())
['cats']
>>> list(shelfFile.values())
[['Zophie', 'Pooka', 'Simon']]
>>> shelfFile.close()
Plaintext is useful for creating files that you’ll read in a text editor such as Notepad or TextEdit, but if you want to save data from your Python programs, use the shelve module.
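The configuration-settings idea mentioned at the start of this section might look something like the following sketch. The setting names (volume and recent_files) are invented for the example, and the shelf is kept in a temporary folder so that no stray files are left in your working directory.

```python
import os
import shelve
import tempfile

shelf_path = os.path.join(tempfile.mkdtemp(), 'settings')

# First "run" of the program: save some settings.
shelf_file = shelve.open(shelf_path)
shelf_file['volume'] = 7
shelf_file['recent_files'] = ['hello.txt', 'bacon.txt']
shelf_file.close()

# A later "run": load the settings back into variables.
shelf_file = shelve.open(shelf_path)
volume = shelf_file['volume']
recent_files = shelf_file['recent_files']
saved_keys = sorted(shelf_file.keys())
shelf_file.close()

print(volume, recent_files, saved_keys)
```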
Saving Variables with the pprint.pformat() Function
Recall from Pretty Printing that the pprint.pprint() function will “pretty print” the contents of a list or dictionary to the screen, while the pprint.pformat() function will return this same text as a string instead of printing it. Not only is this string formatted to be easy to read, but it is also syntactically correct Python code. Say you have a dictionary stored in a variable and you want to save this variable and its contents for future use. Using pprint.pformat() will give you a string that you can write to a .py file. This file will be your very own module that you can import whenever you want to use the variable stored in it.
For example, enter the following into the interactive shell:
>>> import pprint
>>> cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]
>>> pprint.pformat(cats)
"[{'desc': 'chubby', 'name': 'Zophie'}, {'desc': 'fluffy', 'name': 'Pooka'}]"
>>> fileObj = open('myCats.py', 'w')
>>> fileObj.write('cats = ' + pprint.pformat(cats) + '\n')
83
>>> fileObj.close()
Here, we import pprint to let us use pprint.pformat(). We have a list of dictionaries, stored in a variable cats. To keep the list in cats available even after we close the shell, we use pprint.pformat() to return it as a string. Once we have the data in cats as a string, it’s easy to write the string to a file, which we’ll call myCats.py.
The modules that an import statement imports are themselves just Python scripts. When the string from pprint.pformat() is saved to a .py file, the file is a module that can be imported just like any other.
And since Python scripts are themselves just text files with the .py file extension, your Python programs can even generate other Python programs. You can then import these files into scripts.
>>> import myCats
>>> myCats.cats
[{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]
>>> myCats.cats[0]
{'name': 'Zophie', 'desc': 'chubby'}
>>> myCats.cats[0]['name']
'Zophie'
The benefit of creating a .py file (as opposed to saving variables with the shelve module) is that because it is a text file, the contents of the file can be read and modified by anyone with a simple text editor. For most applications, however, saving data using the shelve module is the preferred way to save variables to a file. Only basic data types such as integers, floats, strings, lists, and dictionaries can be written to a file as simple text. File objects, for example, cannot be encoded as text.
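The whole round trip can also be done from a single script. The sketch below writes the generated module to a temporary folder and adds that folder to sys.path so that import can find it. Both the sys.path step and the module name myCatsDemo are assumptions made for this example; the shell session above instead relies on myCats.py being in the current working directory.

```python
import os
import pprint
import sys
import tempfile

cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]

# Write the generated module into a temporary folder.
folder = tempfile.mkdtemp()
file_obj = open(os.path.join(folder, 'myCatsDemo.py'), 'w')
file_obj.write('cats = ' + pprint.pformat(cats) + '\n')
file_obj.close()

# Let import search the temporary folder, then import the generated module.
sys.path.insert(0, folder)
import myCatsDemo

print(myCatsDemo.cats == cats)
print(myCatsDemo.cats[0]['name'])
```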
Project: Generating Random Quiz Files
Say you’re a geography teacher with 35 students in your class and you want to give a pop quiz on US state capitals. Alas, your class has a few bad eggs in it, and you can’t trust the students not to cheat. You’d like to randomize the order of questions so that each quiz is unique, making it impossible for anyone to crib answers from anyone else. Of course, doing this by hand would be a lengthy and boring affair. Fortunately, you know some Python.
Here is what the program does:
Creates 35 different quizzes.
Creates 50 multiple-choice questions for each quiz, in random order.
Provides the correct answer and three random wrong answers for each question, in random order.
Writes the quizzes to 35 text files.
Writes the answer keys to 35 text files.
This means the code will need to do the following:
Store the states and their capitals in a dictionary.
Call open(), write(), and close() for the quiz and answer key text files.
Use random.shuffle() to randomize the order of the questions and multiple-choice options.
Step 1: Store the Quiz Data in a Dictionary
The first step is to create a skeleton script and fill it with your quiz data. Create a file named randomQuizGenerator.py, and make it look like the following:
  #! python3
  # randomQuizGenerator.py - Creates quizzes with questions and answers in
  # random order, along with the answer key.
➊ import random
  # The quiz data. Keys are states and values are their capitals.
➋ capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix',
      'Arkansas': 'Little Rock', 'California': 'Sacramento', 'Colorado': 'Denver',
      'Connecticut': 'Hartford', 'Delaware': 'Dover', 'Florida': 'Tallahassee',
      'Georgia': 'Atlanta', 'Hawaii': 'Honolulu', 'Idaho': 'Boise',
      'Illinois': 'Springfield', 'Indiana': 'Indianapolis', 'Iowa': 'Des Moines',
      'Kansas': 'Topeka', 'Kentucky': 'Frankfort', 'Louisiana': 'Baton Rouge',
      'Maine': 'Augusta', 'Maryland': 'Annapolis', 'Massachusetts': 'Boston',
      'Michigan': 'Lansing', 'Minnesota': 'Saint Paul', 'Mississippi': 'Jackson',
      'Missouri': 'Jefferson City', 'Montana': 'Helena', 'Nebraska': 'Lincoln',
      'Nevada': 'Carson City', 'New Hampshire': 'Concord', 'New Jersey': 'Trenton',
      'New Mexico': 'Santa Fe', 'New York': 'Albany', 'North Carolina': 'Raleigh',
      'North Dakota': 'Bismarck', 'Ohio': 'Columbus', 'Oklahoma': 'Oklahoma City',
      'Oregon': 'Salem', 'Pennsylvania': 'Harrisburg', 'Rhode Island': 'Providence',
      'South Carolina': 'Columbia', 'South Dakota': 'Pierre', 'Tennessee': 'Nashville',
      'Texas': 'Austin', 'Utah': 'Salt Lake City', 'Vermont': 'Montpelier',
      'Virginia': 'Richmond', 'Washington': 'Olympia', 'West Virginia': 'Charleston',
      'Wisconsin': 'Madison', 'Wyoming': 'Cheyenne'}
  # Generate 35 quiz files.
➌ for quizNum in range(35):
      # TODO: Create the quiz and answer key files.
      # TODO: Write out the header for the quiz.
      # TODO: Shuffle the order of the states.
      # TODO: Loop through all 50 states, making a question for each.
Since this program will be randomly ordering the questions and answers, you’ll need to import the random module ➊ to make use of its functions. The capitals variable ➋ contains a dictionary with US states as keys and their capitals as values. And since you want to create 35 quizzes, the code that actually generates the quiz and answer key files (marked with TODO comments for now) will go inside a for loop that loops 35 times ➌. (This number can be changed to generate any number of quiz files.)
Step 2: Create the Quiz File and Shuffle the Question Order
Now it’s time to start filling in those TODOs.
The code in the loop will be repeated 35 times—once for each quiz—so you have to worry about only one quiz at a time within the loop. First you’ll create the actual quiz file. It needs to have a unique filename and should also have some kind of standard header in it, with places for the student to fill in a name, date, and class period. Then you’ll need to get a list of states in randomized order, which can be used later to create the questions and answers for the quiz.
Add the following lines of code to randomQuizGenerator.py:
  #! python3
  # randomQuizGenerator.py - Creates quizzes with questions and answers in
  # random order, along with the answer key.
  --snip--
  # Generate 35 quiz files.
  for quizNum in range(35):
      # Create the quiz and answer key files.
➊     quizFile = open('capitalsquiz%s.txt' % (quizNum + 1), 'w')
➋     answerKeyFile = open('capitalsquiz_answers%s.txt' % (quizNum + 1), 'w')
      # Write out the header for the quiz.
➌     quizFile.write('Name:\n\nDate:\n\nPeriod:\n\n')
      quizFile.write((' ' * 20) + 'State Capitals Quiz (Form %s)' % (quizNum + 1))
      quizFile.write('\n\n')
      # Shuffle the order of the states.
      states = list(capitals.keys())
➍     random.shuffle(states)
      # TODO: Loop through all 50 states, making a question for each.
The filenames for the quizzes will be capitalsquiz<N>.txt, where <N> is a unique number for the quiz that comes from quizNum, the for loop’s counter. The answer key for capitalsquiz<N>.txt will be stored in a text file named capitalsquiz_answers<N>.txt. Each time through the loop, the %s placeholder in 'capitalsquiz%s.txt' and 'capitalsquiz_answers%s.txt' will be replaced by (quizNum + 1), so the first quiz and answer key created will be capitalsquiz1.txt and capitalsquiz_answers1.txt. These files will be created with calls to the open() function at ➊ and ➋, with 'w' as the second argument to open them in write mode.
The write() statements at ➌ create a quiz header for the student to fill out. Finally, a randomized list of US states is created with the help of the random.shuffle() function ➍, which randomly reorders the values in any list that is passed to it.
Step 3: Create the Answer Options
Now you need to generate the answer options for each question, which will be multiple choice from A to D. You’ll need to create another for loop—this one to generate the content for each of the 50 questions on the quiz. Then there will be a third for loop nested inside to generate the multiple-choice options for each question. Make your code look like the following:
  #! python3
  # randomQuizGenerator.py - Creates quizzes with questions and answers in
  # random order, along with the answer key.
  --snip--
      # Loop through all 50 states, making a question for each.
      for questionNum in range(50):
          # Get right and wrong answers.
➊         correctAnswer = capitals[states[questionNum]]
➋         wrongAnswers = list(capitals.values())
➌         del wrongAnswers[wrongAnswers.index(correctAnswer)]
➍         wrongAnswers = random.sample(wrongAnswers, 3)
➎         answerOptions = wrongAnswers + [correctAnswer]
➏         random.shuffle(answerOptions)
          # TODO: Write the question and answer options to the quiz file.
          # TODO: Write the answer key to a file.
The correct answer is easy to get—it’s stored as a value in the capitals dictionary ➊. This loop will loop through the states in the shuffled states list, from states[0] to states[49], find each state in capitals, and store that state’s corresponding capital in correctAnswer.
The list of possible wrong answers is trickier. You can get it by duplicating all the values in the capitals dictionary ➋, deleting the correct answer ➌, and selecting three random values from this list ➍. The random.sample() function makes it easy to do this selection. Its first argument is the list you want to select from; the second argument is the number of values you want to select. The full list of answer options is the combination of these three wrong answers with the correct answer ➎. Finally, the answers need to be randomized ➏ so that the correct response isn’t always choice D.
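To experiment with this logic on its own, the same six steps can be pulled into a function and tried on a much smaller dictionary. This is only a sketch; make_answer_options and sample_capitals are names invented for the example and are not part of randomQuizGenerator.py.

```python
import random

def make_answer_options(capitals, state):
    """Return the correct capital and a shuffled list of four answer options."""
    correct_answer = capitals[state]
    wrong_answers = list(capitals.values())
    del wrong_answers[wrong_answers.index(correct_answer)]
    wrong_answers = random.sample(wrong_answers, 3)
    options = wrong_answers + [correct_answer]
    random.shuffle(options)
    return correct_answer, options

# A small stand-in for the full 50-state dictionary.
sample_capitals = {'Alaska': 'Juneau', 'Arizona': 'Phoenix', 'Colorado': 'Denver',
                   'Idaho': 'Boise', 'Texas': 'Austin'}

correct, options = make_answer_options(sample_capitals, 'Texas')
print(correct, options)
```

The order of the options changes from run to run, but the list always holds four different capitals, one of which is the correct answer.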
Step 4: Write Content to the Quiz and Answer Key Files
All that is left is to write the question to the quiz file and the answer to the answer key file. Make your code look like the following:
  #! python3
  # randomQuizGenerator.py - Creates quizzes with questions and answers in
  # random order, along with the answer key.
  --snip--
      # Loop through all 50 states, making a question for each.
      for questionNum in range(50):
          --snip--
          # Write the question and the answer options to the quiz file.
          quizFile.write('%s. What is the capital of %s?\n' % (questionNum + 1,
              states[questionNum]))
➊         for i in range(4):
➋             quizFile.write('    %s. %s\n' % ('ABCD'[i], answerOptions[i]))
          quizFile.write('\n')
          # Write the answer key to a file.
➌         answerKeyFile.write('%s. %s\n' % (questionNum + 1, 'ABCD'[
              answerOptions.index(correctAnswer)]))
      quizFile.close()
      answerKeyFile.close()
A for loop that goes through integers 0 to 3 will write the answer options in the answerOptions list ➊. The expression 'ABCD'[i] at ➋ treats the string 'ABCD' as an array and will evaluate to 'A', 'B', 'C', and then 'D' on each respective iteration through the loop.
In the final line ➌, the expression answerOptions.index(correctAnswer) will find the integer index of the correct answer in the randomly ordered answer options, and 'ABCD'[answerOptions.index(correctAnswer)] will evaluate to the correct answer’s letter to be written to the answer key file.
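The two 'ABCD' expressions are easy to try in isolation. The following sketch uses a fixed list of options in place of the randomly shuffled one, so the result is predictable; the variable names are illustrative only.

```python
answer_options = ['Raleigh', 'Harrisburg', 'Denver', 'Lincoln']
correct_answer = 'Denver'

# Label each option with its letter, as the quiz file does.
labeled = []
for i in range(4):
    labeled.append('%s. %s' % ('ABCD'[i], answer_options[i]))

# Find the letter of the correct answer, as the answer key does.
correct_letter = 'ABCD'[answer_options.index(correct_answer)]

print(labeled)
print(correct_letter)
```

Since 'Denver' sits at index 2 of the list, 'ABCD'[2] gives the letter C.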
After you run the program, this is how your capitalsquiz1.txt file will look, though of course your questions and answer options may be different from those shown here, depending on the outcome of your random.shuffle() calls:
Name:

Date:

Period:

                    State Capitals Quiz (Form 1)

1. What is the capital of West Virginia?
    A. Hartford
    B. Santa Fe
    C. Harrisburg
    D. Charleston

2. What is the capital of Colorado?
    A. Raleigh
    B. Harrisburg
    C. Denver
    D. Lincoln

--snip--
The corresponding capitalsquiz_answers1.txt text file will look like this:
1. D
2. C
3. A
4. C
--snip--
Project: Multiclipboard
Say you have the boring task of filling out many forms in a web page or software with several text fields. The clipboard saves you from typing the same text over and over again. But only one thing can be on the clipboard at a time. If you have several different pieces of text that you need to copy and paste, you have to keep highlighting and copying the same few things over and over again.
You can write a Python program to keep track of multiple pieces of text. This “multiclipboard” will be named mcb.pyw (since “mcb” is shorter to type than “multiclipboard”). The .pyw extension means that Python won’t show a Terminal window when it runs this program. (See Appendix B for more details.)
The program will save each piece of clipboard text under a keyword. For example, when you run py mcb.pyw save spam, the current contents of the clipboard will be saved with the keyword spam. This text can later be loaded to the clipboard again by running py mcb.pyw spam. And if the user forgets what keywords they have, they can run py mcb.pyw list to copy a list of all keywords to the clipboard.
Here’s what the program does:
The command line argument for the keyword is checked.
If the argument is save, then the clipboard contents are saved to the keyword.
If the argument is list, then all the keywords are copied to the clipboard.
Otherwise, the text for the keyword is copied to the clipboard.
This means the code will need to do the following:
Read the command line arguments from sys.argv.
Read and write to the clipboard.
Save and load to a shelf file.
If you use Windows, you can easily run this script from the Run… window by creating a batch file named mcb.bat with the following content:
@pyw.exe C:\Python34\mcb.pyw %*
Step 1: Comments and Shelf Setup
Let’s start by making a skeleton script with some comments and basic setup. Make your code look like the following:
  #! python3
  # mcb.pyw - Saves and loads pieces of text to the clipboard.
➊ # Usage: py.exe mcb.pyw save <keyword> - Saves clipboard to keyword.
  #        py.exe mcb.pyw <keyword> - Loads keyword to clipboard.
  #        py.exe mcb.pyw list - Loads all keywords to clipboard.
➋ import shelve, pyperclip, sys
➌ mcbShelf = shelve.open('mcb')
  # TODO: Save clipboard content.
  # TODO: List keywords and load content.
  mcbShelf.close()
It’s common practice to put general usage information in comments at the top of the file ➊. If you ever forget how to run your script, you can always look at these comments for a reminder. Then you import your modules ➋. Copying and pasting will require the pyperclip module, and reading the command line arguments will require the sys module. The shelve module will also come in handy: Whenever the user wants to save a new piece of clipboard text, you’ll save it to a shelf file. Then, when the user wants to paste the text back to their clipboard, you’ll open the shelf file and load it back into your program. The shelf file will be named with the prefix mcb ➌.
Step 2: Save Clipboard Content with a Keyword
The program does different things depending on whether the user wants to save text to a keyword, load text into the clipboard, or list all the existing keywords. Let’s deal with that first case. Make your code look like the following:
  #! python3
  # mcb.pyw - Saves and loads pieces of text to the clipboard.
  --snip--
  # Save clipboard content.
➊ if len(sys.argv) == 3 and sys.argv[1].lower() == 'save':
➋     mcbShelf[sys.argv[2]] = pyperclip.paste()
  elif len(sys.argv) == 2:
➌     # TODO: List keywords and load content.
  mcbShelf.close()
If the first command line argument (which will always be at index 1 of the sys.argv list) is 'save' ➊, the second command line argument is the keyword for the current content of the clipboard. The keyword will be used as the key for mcbShelf, and the value will be the text currently on the clipboard ➋.
If there is only one command line argument, you will assume it is either 'list' or a keyword to load content onto the clipboard. You will implement that code later. For now, just put a TODO comment there ➌.
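The decision the program makes from sys.argv can be tested without touching the clipboard or a shelf file by moving it into a function that takes the argument list as a parameter. The sketch below is an illustration only; parse_command and the action names it returns are hypothetical and do not appear in mcb.pyw.

```python
def parse_command(argv):
    """Return an (action, keyword) tuple for an mcb.pyw-style argument list."""
    if len(argv) == 3 and argv[1].lower() == 'save':
        return ('save', argv[2])
    elif len(argv) == 2:
        if argv[1].lower() == 'list':
            return ('list', None)
        return ('load', argv[1])
    return ('unknown', None)

# The first item stands in for the script name, as it does in sys.argv.
print(parse_command(['mcb.pyw', 'save', 'spam']))
print(parse_command(['mcb.pyw', 'list']))
print(parse_command(['mcb.pyw', 'spam']))
```

In a real script you would pass sys.argv to the function instead of a hand-built list.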
Step 3: List Keywords and Load a Keyword’s Content
Finally, let’s implement the two remaining cases: The user wants to load clipboard text in from a keyword, or they want a list of all available keywords. Make your code look like the following:
  #! python3
  # mcb.pyw - Saves and loads pieces of text to the clipboard.
  --snip--
  # Save clipboard content.
  if len(sys.argv) == 3 and sys.argv[1].lower() == 'save':
      mcbShelf[sys.argv[2]] = pyperclip.paste()
  elif len(sys.argv) == 2:
      # List keywords and load content.
➊     if sys.argv[1].lower() == 'list':
➋         pyperclip.copy(str(list(mcbShelf.keys())))
      elif sys.argv[1] in mcbShelf:
➌         pyperclip.copy(mcbShelf[sys.argv[1]])
  mcbShelf.close()
If there is only one command line argument, first let’s check whether it’s 'list' ➊. If so, a string representation of the list of shelf keys will be copied to the clipboard ➋. The user can paste this list into an open text editor to read it.
Otherwise, you can assume the command line argument is a keyword. If this keyword exists in the mcbShelf shelf as a key, you can load the value onto the clipboard ➌.
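To see the three cases side by side without touching the real clipboard, the branching can be pulled into a function. This is a sketch under assumptions, not the script itself: handle_command is a hypothetical helper, a plain dictionary stands in for the shelf, and the clipboard_text parameter stands in for pyperclip.paste().

```python
def handle_command(argv, shelf, clipboard_text):
    """Mimic mcb.pyw's branching.

    argv plays the role of sys.argv, shelf is any dictionary-like object,
    and clipboard_text stands in for pyperclip.paste(). Returns the text
    that would be copied to the clipboard, or None if nothing is copied.
    """
    if len(argv) == 3 and argv[1].lower() == 'save':
        shelf[argv[2]] = clipboard_text      # save clipboard under keyword
        return None
    if len(argv) == 2:
        if argv[1].lower() == 'list':
            return str(list(shelf.keys()))   # list all keywords
        if argv[1] in shelf:
            return shelf[argv[1]]            # load a keyword's content
    return None


demo_shelf = {}
handle_command(['mcb.pyw', 'save', 'spam'], demo_shelf, 'Hello')
listing = handle_command(['mcb.pyw', 'list'], demo_shelf, '')
loaded = handle_command(['mcb.pyw', 'spam'], demo_shelf, '')
missing = handle_command(['mcb.pyw', 'eggs'], demo_shelf, '')
print(listing, loaded, missing)
```

An unknown keyword simply does nothing here, which matches the script: there is no else branch after the elif that checks whether the keyword is in the shelf.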
And that’s it! Launching this program has different steps depending on what operating system your computer uses. See Appendix B for details for your operating system.
Recall the password locker program you created in Chapter 6 that stored the passwords in a dictionary. Updating the passwords required changing the source code of the program. This isn’t ideal because average users don’t feel comfortable changing source code to update their software. Also, every time you modify the source code to a program, you run the risk of accidentally introducing new bugs. By storing the data for a program in a different place than the code, you can make your programs easier for others to use and more resistant to bugs.
Summary
Files are organized into folders (also called directories), and a path describes the location of a file. Every program running on your computer has a current working directory, which allows you to specify file paths relative to the current location instead of always typing the full (or absolute) path. The os.path module has many functions for manipulating file paths.
Your programs can also directly interact with the contents of text files. The open() function can open these files to read in their contents as one large string (with the read() method) or as a list of strings (with the readlines() method). The open() function can open files in write or append mode to create new text files or add to existing text files, respectively.
In previous chapters, you used the clipboard as a way of getting large amounts of text into a program, rather than typing it all in. Now you can have your programs read files directly from the hard drive, which is a big improvement, since files are much less volatile than the clipboard.
In the next chapter, you will learn how to handle the files themselves, by copying them, deleting them, renaming them, moving them, and more.
Practice Questions
Q: 1. What is a relative path relative to?
Q: 2. What does an absolute path start with?
Q: 3. What do the os.getcwd() and os.chdir() functions do?
Q: 4. What are the . and .. folders?
Q: 5. In C:\bacon\eggs\spam.txt, which part is the dir name, and which part is the base name?
Q: 6. What are the three “mode” arguments that can be passed to the open() function?
Q: 7. What happens if an existing file is opened in write mode?
Q: 8. What is the difference between the read() and readlines() methods?
Q: 9. What data structure does a shelf value resemble?
Practice Projects
For practice, design and write the following programs.
Extending the Multiclipboard
Extend the multiclipboard program in this chapter so that it has a delete <keyword> command line argument that will delete a keyword from the shelf. Then add a delete command line argument that will delete all keywords.
Mad Libs
Create a Mad Libs program that reads in text files and lets the user add their own text anywhere the word ADJECTIVE, NOUN, ADVERB, or VERB appears in the text file. For example, a text file may look like this:
The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was unaffected by these events.
The program would find these occurrences and prompt the user to replace them.
Enter an adjective:
silly
Enter a noun:
chandelier
Enter a verb:
screamed
Enter a noun:
pickup truck
The following text file would then be created:
The silly panda walked to the chandelier and then screamed. A nearby pickup truck was unaffected by these events.
The results should be printed to the screen and saved to a new text file.
Regex Search
Write a program that opens all .txt files in a folder and searches for any line that matches a user-supplied regular expression. The results should be printed to the screen.
Chapter 9. Organizing Files
In the previous chapter, you learned how to create and write to new files in Python. Your programs can also organize preexisting files on the hard drive. Maybe you’ve had the experience of going through a folder full of dozens, hundreds, or even thousands of files and copying, renaming, moving, or compressing them all by hand. Or consider tasks such as these:
Making copies of all PDF files (and only the PDF files) in every sub-folder of a folder
Removing the leading zeros in the filenames for every file in a folder of hundreds of files named spam001.txt, spam002.txt, spam003.txt, and so on
Compressing the contents of several folders into one ZIP file (which could be a simple backup system)
All this boring stuff is just begging to be automated in Python. By programming your computer to do these tasks, you can transform it into a quick-working file clerk who never makes mistakes.
As you begin working with files, you may find it helpful to be able to quickly see what the extension (.txt, .pdf, .jpg, and so on) of a file is. With OS X and Linux, your file browser most likely shows extensions automatically. With Windows, file extensions may be hidden by default. To show extensions, go to Start ▸ Control Panel ▸ Appearance and Personalization ▸ Folder Options. On the View tab, under Advanced Settings, uncheck the Hide extensions for known file types checkbox.
The shutil Module
The shutil (or shell utilities) module has functions to let you copy, move, rename, and delete files in your Python programs. To use the shutil functions, you will first need to use import shutil.
Copying Files and Folders
The shutil module provides functions for copying files, as well as entire folders.
Calling shutil.copy(source, destination) will copy the file at the path source to the folder at the path destination. (Both source and destination are strings.) If destination is a filename, it will be used as the new name of the copied file. This function returns a string of the path of the copied file.
Enter the following into the interactive shell to see how shutil.copy() works:
  >>> import shutil, os
  >>> os.chdir('C:\\')
➊ >>> shutil.copy('C:\\spam.txt', 'C:\\delicious')
  'C:\\delicious\\spam.txt'
➋ >>> shutil.copy('eggs.txt', 'C:\\delicious\\eggs2.txt')
  'C:\\delicious\\eggs2.txt'
The first shutil.copy() call copies the file at C:\spam.txt to the folder C:\delicious. The return value is the path of the newly copied file. Note that since a folder was specified as the destination ➊, the original spam.txt filename is used for the new, copied file’s filename. The second shutil.copy() call ➋ also copies the file at C:\eggs.txt to the folder C:\delicious but gives the copied file the name eggs2.txt.
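If you don’t have a C:\delicious folder handy, the same two calls can be tried in a temporary folder. This is a small sketch; the folder and file names are placeholders, and tempfile is used only so the experiment cleans up after itself.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small source file and an empty destination folder.
    source = os.path.join(tmp_dir, 'spam.txt')
    with open(source, 'w') as spam_file:
        spam_file.write('Hello')
    dest_folder = os.path.join(tmp_dir, 'delicious')
    os.makedirs(dest_folder)

    # Destination is a folder: the copy keeps the original filename.
    copy_in_folder = shutil.copy(source, dest_folder)

    # Destination is a filename: the copy gets the new name.
    renamed_copy = shutil.copy(source, os.path.join(dest_folder, 'spam2.txt'))

    copied_names = sorted(os.listdir(dest_folder))

print(copied_names)
```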
While shutil.copy() will copy a single file, shutil.copytree() will copy an entire folder and every folder and file contained in it. Calling shutil.copytree(source, destination) will copy the folder at the path source, along with all of its files and subfolders, to the folder at the path destination. The source and destination parameters are both strings. The function returns a string of the path of the copied folder.
Enter the following into the interactive shell:
>>> import shutil, os
>>> os.chdir('C:\\')
>>> shutil.copytree('C:\\bacon', 'C:\\bacon_backup')
'C:\\bacon_backup'
The shutil.copytree() call creates a new folder named bacon_backup with the same content as the original bacon folder. You have now safely backed up your precious, precious bacon.
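One detail worth knowing when you script backups: by default, shutil.copytree() expects the destination folder not to exist yet and raises FileExistsError if it does. The sketch below demonstrates this in a temporary folder; the names bacon, strips, and crispy.txt are invented for the demo.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a tiny folder tree: bacon/strips/crispy.txt
    source = os.path.join(tmp_dir, 'bacon')
    os.makedirs(os.path.join(source, 'strips'))
    with open(os.path.join(source, 'strips', 'crispy.txt'), 'w') as f:
        f.write('Bacon is not a vegetable.')

    # The first copy creates bacon_backup with the same contents.
    backup = shutil.copytree(source, os.path.join(tmp_dir, 'bacon_backup'))
    backup_has_file = os.path.exists(
        os.path.join(backup, 'strips', 'crispy.txt'))

    # A second copy to the same destination fails by default.
    try:
        shutil.copytree(source, backup)
        second_copy_failed = False
    except FileExistsError:
        second_copy_failed = True

print(backup_has_file, second_copy_failed)
```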
Moving and Renaming Files and Folders
Calling shutil.move(source, destination) will move the file or folder at the path source to the path destination and will return a string of the absolute path of the new location.
If destination points to a folder, the source file gets moved into destination and keeps its current filename. For example, enter the following into the interactive shell:
>>> import shutil
>>> shutil.move('C:\\bacon.txt', 'C:\\eggs')
'C:\\eggs\\bacon.txt'
Assuming a folder named eggs already exists in the C:\ directory, this shutil.move() call says, “Move C:\bacon.txt into the folder C:\eggs.”
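If there had been a bacon.txt file already in C:\eggs, the outcome depends on how the destination is written. When the destination names the file itself (C:\eggs\bacon.txt), the existing file is overwritten; when the destination is only the folder, shutil.move() typically refuses and raises shutil.Error instead. Since it’s easy to accidentally overwrite files in this way, you should take some care when using move().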
The destination path can also specify a filename. In the following example, the source file is moved and renamed.
>>> shutil.move('C:\\bacon.txt', 'C:\\eggs\\new_bacon.txt')
'C:\\eggs\\new_bacon.txt'
This line says, “Move C:\bacon.txt into the folder C:\eggs, and while you’re at it, rename that bacon.txt file to new_bacon.txt.”
Both of the previous examples worked under the assumption that there was a folder eggs in the C:\ directory. But if there is no eggs folder, then move() will rename bacon.txt to a file named eggs.
>>> shutil.move('C:\\bacon.txt', 'C:\\eggs')
'C:\\eggs'
Here, move() can’t find a folder named eggs in the C:\ directory and so assumes that destination must be specifying a filename, not a folder. So the bacon.txt text file is renamed to eggs (a text file without the .txt file extension), which is probably not what you wanted! This can be a tough-to-spot bug in your programs since the move() call can happily do something that might be quite different from what you were expecting. This is yet another reason to be careful when using move().
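One way to guard against both surprises (an accidental rename when the folder is missing, and clobbering an existing file) is to check before moving. The helper below, move_into_folder, is a hypothetical wrapper written for this illustration; it is not part of shutil.

```python
import os
import shutil
import tempfile


def move_into_folder(source, dest_folder):
    """Move source into dest_folder, refusing to rename or overwrite."""
    if not os.path.isdir(dest_folder):
        # Without this check, move() would rename source to dest_folder.
        raise NotADirectoryError(dest_folder + ' is not an existing folder')
    target = os.path.join(dest_folder, os.path.basename(source))
    if os.path.exists(target):
        raise FileExistsError(target + ' already exists')
    return shutil.move(source, target)


with tempfile.TemporaryDirectory() as tmp_dir:
    source = os.path.join(tmp_dir, 'bacon.txt')
    with open(source, 'w') as f:
        f.write('Bacon')
    eggs = os.path.join(tmp_dir, 'eggs')

    # The eggs folder does not exist yet, so the helper refuses to move.
    try:
        move_into_folder(source, eggs)
        refused = False
    except NotADirectoryError:
        refused = True

    # Once the folder exists, the move goes through.
    os.makedirs(eggs)
    new_path = move_into_folder(source, eggs)
    moved_ok = os.path.exists(new_path) and not os.path.exists(source)

print(refused, moved_ok)
```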
Finally, the folders that make up the destination must already exist, or else Python will throw an exception. Enter the following into the interactive shell:
>>> shutil.move('spam.txt', 'c:\\does_not_exist\\eggs\\ham')
Traceback (most recent call last):
  File "C:\Python34\lib\shutil.py", line 521, in move
    os.rename(src, real_dst)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'spam.txt' -> 'c:\\does_not_exist\\eggs\\ham'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    shutil.move('spam.txt', 'c:\\does_not_exist\\eggs\\ham')
  File "C:\Python34\lib\shutil.py", line 533, in move
    copy2(src, real_dst)
  File "C:\Python34\lib\shutil.py", line 244, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "C:\Python34\lib\shutil.py", line 108, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'c:\\does_not_exist\\eggs\\ham'
Python looks for eggs and ham inside the directory does_not_exist. It doesn’t find the nonexistent directory, so it can’t move spam.txt to the path you specified.
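If you do want the missing folders created, you can make them yourself before the move. The following sketch uses os.makedirs() with exist_ok=True in a temporary folder; the path pieces mirror the example above but are otherwise arbitrary.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    source = os.path.join(tmp_dir, 'spam.txt')
    with open(source, 'w') as f:
        f.write('Hello')

    destination = os.path.join(tmp_dir, 'does_not_exist', 'eggs', 'ham')

    # Create every missing parent folder of the destination first.
    os.makedirs(os.path.dirname(destination), exist_ok=True)

    moved_to = shutil.move(source, destination)
    ham_exists = os.path.isfile(moved_to)

print(ham_exists)
```

Note that ham ends up as a file (the renamed spam.txt), not a folder, because only its parent folders were created.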
Permanently Deleting Files and Folders
You can delete a single file or a single empty folder with functions in the os module, whereas to delete a folder and all of its contents, you use the shutil module.
Calling os.unlink(path) will delete the file at path.
Calling os.rmdir(path) will delete the folder at path. This folder must be empty of any files or folders.
Calling shutil.rmtree(path) will remove the folder at path, and all files and folders it contains will also be deleted.
Be careful when using these functions in your programs! It’s often a good idea to first run your program with these calls commented out and with print() calls added to show the files that would be deleted. Here is a Python program that was intended to delete files that have the .txt file extension but has a typo ('.rxt' in the endswith() call) that causes it to delete .rxt files instead:
import os
for filename in os.listdir():
    if filename.endswith('.rxt'):
        os.unlink(filename)
If you had any important files ending with .rxt, they would have been accidentally, permanently deleted. Instead, you should have first run the program like this:
import os
for filename in os.listdir():
    if filename.endswith('.rxt'):
        #os.unlink(filename)
        print(filename)
Now the os.unlink() call is commented, so Python ignores it. Instead, you will print the filename of the file that would have been deleted. Running this version of the program first will show you that you’ve accidentally told the program to delete .rxt files instead of .txt files.
Once you are certain the program works as intended, delete the print(filename) line and uncomment the os.unlink(filename) line. Then run the program again to actually delete the files.
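Instead of commenting and uncommenting lines, the same safety check can be built into a function with a switch. This is a sketch of that idea, not the chapter’s program: delete_by_extension and its dry_run parameter are names invented here.

```python
import os
import tempfile


def delete_by_extension(folder, extension, dry_run=True):
    """Delete files in folder whose names end with extension.

    With dry_run=True (the default), nothing is deleted; the function only
    prints what it would delete. Returns the list of matching filenames.
    """
    matches = []
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        if filename.endswith(extension) and os.path.isfile(path):
            matches.append(filename)
            if dry_run:
                print('Would delete ' + path)
            else:
                os.unlink(path)
    return matches


with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ('a.txt', 'b.txt', 'keep.rxt'):
        with open(os.path.join(tmp_dir, name), 'w') as f:
            f.write('demo')

    preview = delete_by_extension(tmp_dir, '.txt')        # dry run only
    still_there = sorted(os.listdir(tmp_dir))

    delete_by_extension(tmp_dir, '.txt', dry_run=False)   # really delete
    remaining = os.listdir(tmp_dir)

print(preview, remaining)
```

Making the dry run the default means a forgotten argument errs on the side of not deleting anything.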
Safe Deletes with the send2trash Module
Since Python’s built-in shutil.rmtree() function irreversibly deletes files and folders, it can be dangerous to use. A much better way to delete files and folders is with the third-party send2trash module. You can install this module by running pip install send2trash from a Terminal window. (See Appendix A for a more in-depth explanation of how to install third-party modules.)
Using send2trash is much safer than Python’s regular delete functions, because it will send folders and files to your computer’s trash or recycle bin instead of permanently deleting them. If a bug in your program deletes something with send2trash you didn’t intend to delete, you can later restore it from the recycle bin.
After you have installed send2trash, enter the following into the interactive shell:
>>> import send2trash
>>> baconFile = open('bacon.txt', 'a') # creates the file
>>> baconFile.write('Bacon is not a vegetable.')
25
>>> baconFile.close()
>>> send2trash.send2trash('bacon.txt')
In general, you should always use the send2trash.send2trash() function to delete files and folders. But while sending files to the recycle bin lets you recover them later, it will not free up disk space like permanently deleting them does. If you want your program to free up disk space, use the os and shutil functions for deleting files and folders. Note that the send2trash() function can only send files to the recycle bin; it cannot pull files out of it.
Walking a Directory Tree
Say you want to rename every file in some folder and also every file in every subfolder of that folder. That is, you want to walk through the directory tree, touching each file as you go. Writing a program to do this could get tricky; fortunately, Python provides a function to handle this process for you.
Let’s look at the C:\delicious folder with its contents, shown in Figure 9-1.
Figure 9-1. An example folder that contains three folders and four files
Here is an example program that uses the os.walk() function on the directory tree from Figure 9-1:
import os
for folderName, subfolders, filenames in os.walk('C:\\delicious'):
    print('The current folder is ' + folderName)
    for subfolder in subfolders:
        print('SUBFOLDER OF ' + folderName + ': ' + subfolder)
    for filename in filenames:
        print('FILE INSIDE ' + folderName + ': ' + filename)
    print('')
The os.walk() function is passed a single string value: the path of a folder. You can use os.walk() in a for loop statement to walk a directory tree, much like how you can use the range() function to walk over a range of numbers. Unlike range(), the os.walk() function will return three values on each iteration through the loop:
1. A string of the current folder’s name
2. A list of strings of the folders in the current folder
3. A list of strings of the files in the current folder
(By current folder, I mean the folder for the current iteration of the for loop. The current working directory of the program is not changed by os.walk().)
Just like you can choose the variable name i in the code for i in range(10):, you can also choose the variable names for the three values listed earlier. I usually use the names foldername, subfolders, and filenames.
When you run this program, it will output the following:
The current folder is C:\delicious
SUBFOLDER OF C:\delicious: cats
SUBFOLDER OF C:\delicious: walnut
FILE INSIDE C:\delicious: spam.txt
The current folder is C:\delicious\cats
FILE INSIDE C:\delicious\cats: catnames.txt
FILE INSIDE C:\delicious\cats: zophie.jpg
The current folder is C:\delicious\walnut
SUBFOLDER OF C:\delicious\walnut: waffles
The current folder is C:\delicious\walnut\waffles
FILE INSIDE C:\delicious\walnut\waffles: butter.txt
Since os.walk() returns lists of strings for the subfolder and filename variables, you can use these lists in their own for loops. Replace the print() function calls with your own custom code. (Or if you don’t need one or both of them, remove the for loops.)
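As one example of replacing the print() calls with custom code, the sketch below collects the full path of every file with a given extension. The function name find_files is made up for this illustration, and the demo rebuilds the folder layout from Figure 9-1 inside a temporary folder rather than using C:\delicious.

```python
import os
import tempfile


def find_files(root, extension):
    """Return the paths of all files under root ending with extension."""
    found = []
    for folder_name, subfolders, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(extension):
                # folder_name is the folder os.walk() is currently visiting.
                found.append(os.path.join(folder_name, filename))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Recreate the layout from Figure 9-1.
    root = os.path.join(tmp_dir, 'delicious')
    os.makedirs(os.path.join(root, 'cats'))
    os.makedirs(os.path.join(root, 'walnut', 'waffles'))
    for relative in ('spam.txt',
                     os.path.join('cats', 'catnames.txt'),
                     os.path.join('cats', 'zophie.jpg'),
                     os.path.join('walnut', 'waffles', 'butter.txt')):
        with open(os.path.join(root, relative), 'w') as f:
            f.write('demo')

    txt_files = find_files(root, '.txt')
    txt_names = sorted(os.path.basename(path) for path in txt_files)

print(txt_names)
```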
Compressing Files with the zipfile Module
You may be familiar with ZIP files (with the .zip file extension), which can hold the compressed contents of many other files. Compressing a file reduces its size, which is useful when transferring it over the Internet. And since a ZIP file can also contain multiple files and subfolders, it’s a handy way to package several files into one. This single file, called an archive file, can then be, say, attached to an email.
Your Python programs can both create and open (or extract) ZIP files using functions in the zipfile module. Say you have a ZIP file named example.zip that has the contents shown in Figure 9-2.
You can download this ZIP file from http://nostarch.com/automatestuff/ or just follow along using a ZIP file already on your computer.
Figure 9-2. The contents of example.zip
Reading ZIP Files
To read the contents of a ZIP file, first you must create a ZipFile object (note the capital letters Z and F). ZipFile objects are conceptually similar to the File objects you saw returned by the open() function in the previous chapter: They are values through which the program interacts with the file. To create a ZipFile object, call the zipfile.ZipFile() function, passing it a string of the .zip file’s filename. Note that zipfile is the name of the Python module, and ZipFile() is the name of the function.
For example, enter the following into the interactive shell:
  >>> import zipfile, os
  >>> os.chdir('C:\\') # move to the folder with example.zip
  >>> exampleZip = zipfile.ZipFile('example.zip')
  >>> exampleZip.namelist()
  ['spam.txt', 'cats/', 'cats/catnames.txt', 'cats/zophie.jpg']
  >>> spamInfo = exampleZip.getinfo('spam.txt')
  >>> spamInfo.file_size
  13908
  >>> spamInfo.compress_size
  3828
➊ >>> 'Compressed file is %sx smaller!' % (round(spamInfo.file_size / spamInfo.compress_size, 2))
  'Compressed file is 3.63x smaller!'
  >>> exampleZip.close()
A ZipFile object has a namelist() method that returns a list of strings for all the files and folders contained in the ZIP file. These strings can be passed to the getinfo() ZipFile method to return a ZipInfo object about that particular file. ZipInfo objects have their own attributes, such as file_size and compress_size in bytes, which hold integers of the original file size and compressed file size, respectively. While a ZipFile object represents an entire archive file, a ZipInfo object holds useful information about a single file in the archive.
The command at ➊ calculates how efficiently example.zip is compressed by dividing the original file size by the compressed file size and prints this information using a string formatted with %s.
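If you don’t have example.zip, you can build a small archive yourself and inspect it the same way. The sketch below is self-contained and uses with statements, which close the ZipFile automatically; the file contents ('spam ' repeated 2,000 times) are arbitrary filler chosen because repetitive text compresses well.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a 10,000-byte text file of highly repetitive content.
    text_path = os.path.join(tmp_dir, 'spam.txt')
    with open(text_path, 'w') as f:
        f.write('spam ' * 2000)

    # Put it in a ZIP archive under the name spam.txt.
    zip_path = os.path.join(tmp_dir, 'example.zip')
    with zipfile.ZipFile(zip_path, 'w') as new_zip:
        new_zip.write(text_path, arcname='spam.txt',
                      compress_type=zipfile.ZIP_DEFLATED)

    # Reopen the archive for reading and look up the file's ZipInfo object.
    with zipfile.ZipFile(zip_path) as example_zip:
        names = example_zip.namelist()
        spam_info = example_zip.getinfo('spam.txt')
        original_size = spam_info.file_size
        compressed_size = spam_info.compress_size

ratio = round(original_size / compressed_size, 2)
print('Compressed file is %sx smaller!' % ratio)
```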
Extracting from ZIP Files
The extractall() method for ZipFile objects extracts all the files and folders from a ZIP file into the current working directory.
  >>> import zipfile, os
  >>> os.chdir('C:\\') # move to the folder with example.zip
  >>> exampleZip = zipfile.ZipFile('example.zip')
➊ >>> exampleZip.extractall()
  >>> exampleZip.close()
After running this code, the contents of example.zip will be extracted to C:\. Optionally, you can pass a folder name to extractall() to have it extract the files into a folder other than the current working directory. If the folder passed to the extractall() method does not exist, it will be created. For instance, if you replaced the call at ➊ with exampleZip.extractall('C:\\delicious'), the code would extract the files from example.zip into a newly created C:\delicious folder.
The extract() method for ZipFile objects will extract a single file from the ZIP file. Since the previous example closed the archive, reopen it and continue the interactive shell example:
>>> exampleZip = zipfile.ZipFile('example.zip')
>>> exampleZip.extract('spam.txt')
'C:\\spam.txt'
>>> exampleZip.extract('spam.txt', 'C:\\some\\new\\folders')
'C:\\some\\new\\folders\\spam.txt'
>>> exampleZip.close()
The string you pass to extract() must match one of the strings in the list returned by namelist(). Optionally, you can pass a second argument to extract() to extract the file into a folder other than the current working directory. If this second argument is a folder that doesn’t yet exist, Python will create the folder. The value that extract() returns is the absolute path to which the file was extracted.
Creating and Adding to ZIP Files
To create your own compressed ZIP files, you must open the ZipFile object in write mode by passing 'w' as the second argument. (This is similar to opening a text file in write mode by passing 'w' to the open() function.)
When you pass a path to the write() method of a ZipFile object, Python will compress the file at that path and add it into the ZIP file. The write() method’s first argument is a string of the filename to add. The second argument is the compression type parameter, which tells the computer what algorithm it should use to compress the files; you can always just set this value to zipfile.ZIP_DEFLATED. (This specifies the deflate compression algorithm, which works well on all types of data.) Enter the following into the interactive shell:
>>> import zipfile
>>> newZip = zipfile.ZipFile('new.zip', 'w')
>>> newZip.write('spam.txt', compress_type=zipfile.ZIP_DEFLATED)
>>> newZip.close()
This code will create a new ZIP file named new.zip that has the compressed contents of spam.txt.
Keep in mind that, just as with writing to files, write mode will erase all existing contents of a ZIP file. If you want to simply add files to an existing ZIP file, pass 'a' as the second argument to zipfile.ZipFile() to open the ZIP file in append mode.
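The difference between the two modes is easy to check. The sketch below uses the ZipFile writestr() method, which adds a member directly from a string, so no input files are needed; the member names first.txt, second.txt, and third.txt are placeholders.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, 'new.zip')

    # Write mode creates the archive with one member.
    with zipfile.ZipFile(zip_path, 'w') as new_zip:
        new_zip.writestr('first.txt', 'one')

    # Append mode keeps the existing member and adds another.
    with zipfile.ZipFile(zip_path, 'a') as new_zip:
        new_zip.writestr('second.txt', 'two')
    with zipfile.ZipFile(zip_path) as new_zip:
        names_after_append = new_zip.namelist()

    # Write mode again starts over: the earlier members are gone.
    with zipfile.ZipFile(zip_path, 'w') as new_zip:
        new_zip.writestr('third.txt', 'three')
    with zipfile.ZipFile(zip_path) as new_zip:
        names_after_rewrite = new_zip.namelist()

print(names_after_append, names_after_rewrite)
```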
Project: Renaming Files with American-Style Dates to European-Style Dates
Say your boss emails you thousands of files with American-style dates (MM-DD-YYYY) in their names and needs them renamed to European-style dates (DD-MM-YYYY). This boring task could take all day to do by hand! Let’s write a program to do it instead.
Here’s what the program does:
It searches all the filenames in the current working directory for American-style dates.
When one is found, it renames the file with the month and day swapped to make it European-style.
This means the code will need to do the following:
Create a regex that can identify the text pattern of American-style dates.
Call os.listdir() to find all the files in the working directory.
Loop over each filename, using the regex to check whether it has a date.
If it has a date, rename the file with shutil.move().
For this project, open a new file editor window and save your code as renameDates.py.
Step 1: Create a Regex for American-Style Dates
The first part of the program will need to import the necessary modules and create a regex that can identify MM-DD-YYYY dates. The to-do comments will remind you what’s left to write in this program. Typing them as TODO makes them easy to find using IDLE’s CTRL-F find feature. Make your code look like the following:
  #! python3
  # renameDates.py - Renames filenames with American MM-DD-YYYY date format
  # to European DD-MM-YYYY.
➊ import shutil, os, re
  # Create a regex that matches files with the American date format.
➋ datePattern = re.compile(r"""^(.*?) # all text before the date
      ((0|1)?\d)-                     # one or two digits for the month
      ((0|1|2|3)?\d)-                 # one or two digits for the day
      ((19|20)\d\d)                   # four digits for the year
      (.*?)$                          # all text after the date
➌     """, re.VERBOSE)
  # TODO: Loop over the files in the working directory.
  # TODO: Skip files without a date.
  # TODO: Get the different parts of the filename.
  # TODO: Form the European-style filename.
  # TODO: Get the full, absolute file paths.
  # TODO: Rename the files.
From this chapter, you know the shutil.move() function can be used to rename files: Its arguments are the name of the file to rename and the new filename. Because this function exists in the shutil module, you must import that module ➊.
But before renaming the files, you need to identify which files you want to rename. Filenames with dates such as spam4-4-1984.txt and 01-03-2014eggs.zip should be renamed, while filenames without dates such as littlebrother.epub can be ignored.
You can use a regular expression to identify this pattern. After importing the re module at the top, call re.compile() to create a Regex object ➋. Passing re.VERBOSE for the second argument ➌ will allow whitespace and comments in the regex string to make it more readable.
The regular expression string begins with ^(.*?) to match any text at the beginning of the filename that might come before the date. The ((0|1)?\d) group matches the month. The first digit can be either 0 or 1, so the regex matches 12 for December but also 02 for February. This digit is also optional so that the month can be 04 or 4 for April. The group for the day is ((0|1|2|3)?\d) and follows similar logic; 3, 03, and 31 are all valid numbers for days. (Yes, this regex will accept some invalid dates such as 4-31-2014, 2-29-2013, and 0-15-2014. Dates have a lot of thorny special cases that can be easy to miss. But for simplicity, the regex in this program works well enough.)
While 1885 is a valid year, you can just look for years in the 20th or 21st century. This will keep your program from accidentally matching nondate filenames with a date-like format, such as 10-10-1000.txt.
The (.*?)$ part of the regex will match any text that comes after the date.
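Before wiring the regex into the rest of the program, it can help to try it against the filenames mentioned above. This sketch copies the pattern from the listing; the samples list and matches dictionary are names used only for this check.

```python
import re

# The same pattern as in renameDates.py.
date_pattern = re.compile(r"""^(.*?)  # all text before the date
    ((0|1)?\d)-                       # one or two digits for the month
    ((0|1|2|3)?\d)-                   # one or two digits for the day
    ((19|20)\d\d)                     # four digits for the year
    (.*?)$                            # all text after the date
    """, re.VERBOSE)

samples = ['spam4-4-1984.txt', '01-03-2014eggs.zip',
           'littlebrother.epub', '10-10-1000.txt']

# Map each sample filename to True if the regex finds a date in it.
matches = {name: date_pattern.search(name) is not None for name in samples}

for name, has_date in matches.items():
    print(name, has_date)
```

The first two names match, while littlebrother.epub and 10-10-1000.txt do not, the latter because its year does not start with 19 or 20.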
Step 2: Identify the Date Parts from the Filenames
Next, the program will have to loop over the list of filename strings returned from os.listdir() and match them against the regex. Any files that do not have a date in them should be skipped. For filenames that have a date, the matched text will be stored in several variables. Fill in the first three TODOs in your program with the following code:
  #! python3
  # renameDates.py - Renames filenames with American MM-DD-YYYY date format
  # to European DD-MM-YYYY.
  --snip--
  # Loop over the files in the working directory.
  for amerFilename in os.listdir('.'):
      mo = datePattern.search(amerFilename)
      # Skip files without a date.
➊     if mo == None:
➋         continue
➌     # Get the different parts of the filename.
      beforePart = mo.group(1)
      monthPart = mo.group(2)
      dayPart = mo.group(4)
      yearPart = mo.group(6)
      afterPart = mo.group(8)
  --snip--
If the Match object returned from the search() method is None ➊, then the filename in amerFilename does not match the regular expression. The continue statement ➋ will skip the rest of the loop and move on to the next filename.
Otherwise, the various strings matched in the regular expression groups are stored in variables named beforePart, monthPart, dayPart, yearPart, and afterPart ➌. The strings in these variables will be used to form the European-style filename in the next step.
To keep the group numbers straight, try reading the regex from the beginning and count up each time you encounter an opening parenthesis. Without thinking about the code, just write an outline of the regular expression. This can help you visualize the groups. For example:
datePattern = re.compile(r"""^(1) # all text before the date
    (2 (3) )-                     # one or two digits for the month
    (4 (5) )-                     # one or two digits for the day
    (6 (7) )                      # four digits for the year
    (8)$                          # all text after the date
    """, re.VERBOSE)
Here, the numbers 1 through 8 represent the groups in the regular expression you wrote. Making an outline of the regular expression, with just the parentheses and group numbers, can give you a clearer understanding of your regex before you move on with the rest of the program.
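If counting parentheses feels error-prone, one alternative (not what this project uses) is to give the groups names with the (?P<name>...) syntax and turn the inner groups into non-capturing ones. The sketch below is a variation on the chapter’s regex under that assumption; the group names before, month, day, year, and after are chosen here for illustration.

```python
import re

# A named-group variation of the date regex. Character classes such as
# [01] replace the (0|1) groups so that no inner groups need numbering.
named_date_pattern = re.compile(r"""^(?P<before>.*?)  # text before the date
    (?P<month>[01]?\d)-                               # month
    (?P<day>[0-3]?\d)-                                # day
    (?P<year>(?:19|20)\d\d)                           # year
    (?P<after>.*?)$                                   # text after the date
    """, re.VERBOSE)

mo = named_date_pattern.search('spam4-4-1984.txt')
parts = mo.groupdict()   # dictionary of group name -> matched text
print(parts)
```

With names, mo.group('day') reads more clearly than mo.group(4), at the cost of a slightly longer pattern.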
Step 3: Form the New Filename and Rename the Files
As the final step, concatenate the strings in the variables made in the previous step with the European-style date: The day comes before the month. Fill in the three remaining TODOs in your program with the following code:
  #! python3
  # renameDates.py - Renames filenames with American MM-DD-YYYY date format
  # to European DD-MM-YYYY.
  --snip--
      # Form the European-style filename.
➊     euroFilename = beforePart + dayPart + '-' + monthPart + '-' + yearPart + afterPart
      # Get the full, absolute file paths.
      absWorkingDir = os.path.abspath('.')
      amerFilename = os.path.join(absWorkingDir, amerFilename)
      euroFilename = os.path.join(absWorkingDir, euroFilename)
      # Rename the files.
➋     print('Renaming "%s" to "%s"...' % (amerFilename, euroFilename))
➌     #shutil.move(amerFilename, euroFilename) # uncomment after testing
Store the concatenated string in a variable named euroFilename ➊. Then, pass the original filename in amerFilename and the new euroFilename variable to the shutil.move() function to rename the file ➌.
This program has the shutil.move() call commented out and instead prints the filenames that will be renamed ➋. Running the program like this first can let you double-check that the files are renamed correctly. Then you can uncomment the shutil.move() call and run the program again to actually rename the files.
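Another way to double-check the renaming logic without touching any files is to isolate the string manipulation in a function that takes a filename and returns the new one. The function name to_euro_filename is invented for this sketch; the regex and group numbers are the ones from the project.

```python
import re

DATE_PATTERN = re.compile(r"""^(.*?)  # all text before the date
    ((0|1)?\d)-                       # one or two digits for the month
    ((0|1|2|3)?\d)-                   # one or two digits for the day
    ((19|20)\d\d)                     # four digits for the year
    (.*?)$                            # all text after the date
    """, re.VERBOSE)


def to_euro_filename(amer_filename):
    """Return the DD-MM-YYYY version of a filename, or None if no date."""
    mo = DATE_PATTERN.search(amer_filename)
    if mo is None:
        return None
    before, month, day = mo.group(1), mo.group(2), mo.group(4)
    year, after = mo.group(6), mo.group(8)
    # Day and month trade places; everything else stays where it was.
    return before + day + '-' + month + '-' + year + after


print(to_euro_filename('spam12-25-2014.txt'))
print(to_euro_filename('01-03-2014eggs.zip'))
print(to_euro_filename('littlebrother.epub'))
```

Because the function only builds strings, you can call it on as many sample names as you like before letting shutil.move() loose on real files.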
Ideas for Similar Programs
There are many other reasons why you might want to rename a large number of files.
To add a prefix to the start of the filename, such as adding spam_ to rename eggs.txt to spam_eggs.txt
To change filenames with European-style dates to American-style dates
To remove the zeros from files such as spam0042.txt
Project: Backing Up a Folder into a ZIP File
Say you’re working on a project whose files you keep in a folder named C:\AlsPythonBook. You’re worried about losing your work, so you’d like to create ZIP file “snapshots” of the entire folder. You’d like to keep different versions, so you want the ZIP file’s filename to increment each time it is made; for example, AlsPythonBook_1.zip, AlsPythonBook_2.zip, AlsPythonBook_3.zip, and so on. You could do this by hand, but it is rather annoying, and you might accidentally misnumber the ZIP files’ names. It would be much simpler to run a program that does this boring task for you.
For this project, open a new file editor window and save it as backupToZip.py.
Step 1: Figure Out the ZIP File’s Name
The code for this program will be placed into a function named backupToZip(). This will make it easy to copy and paste the function into other Python programs that need this functionality. At the end of the program, the function will be called to perform the backup. Make your program look like this:
  #! python3
  # backupToZip.py - Copies an entire folder and its contents into
  # a ZIP file whose filename increments.
➊ import zipfile, os
  def backupToZip(folder):
      # Backup the entire contents of "folder" into a ZIP file.
      folder = os.path.abspath(folder) # make sure folder is absolute
      # Figure out the filename this code should use based on
      # what files already exist.
➋     number = 1
➌     while True:
          zipFilename = os.path.basename(folder) + '_' + str(number) + '.zip'
          if not os.path.exists(zipFilename):
              break
          number = number + 1
➍     # TODO: Create the ZIP file.
      # TODO: Walk the entire folder tree and compress the files in each folder.
      print('Done.')
  backupToZip('C:\\delicious')
Do the basics first: Add the shebang (#!) line, describe what the program does, and import the zipfile and os modules ➊.
Define a backupToZip() function that takes just one parameter, folder. This parameter is a string path to the folder whose contents should be backed up. The function will determine what filename to use for the ZIP file it will create; then the function will create the file, walk the folder folder, and add each of the subfolders and files to the ZIP file. Write TODO comments for these steps in the source code to remind yourself to do them later ➍.
The first part, naming the ZIP file, uses the base name of the absolute path of folder. If the folder being backed up is C:\delicious, the ZIP file’s name should be delicious_N.zip, where N = 1 is the first time you run the program, N = 2 is the second time, and so on.
You can determine what N should be by checking whether delicious_1.zip already exists, then checking whether delicious_2.zip already exists, and so on. Use a variable named number for N ➋, and keep incrementing it inside the loop that calls os.path.exists() to check whether the file exists ➌. The first nonexistent filename found will cause the loop to break, since it will have found the filename of the new zip.
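The numbering loop can be tried out in isolation. In this sketch the check against the disk (os.path.exists()) is swapped for a check against a set of names passed in, purely so the logic can be exercised without creating files; next_zip_filename and existing_names are hypothetical names, not part of the project.

```python
import os


def next_zip_filename(folder, existing_names):
    """Return the first <basename>_N.zip name not in existing_names."""
    base = os.path.basename(os.path.abspath(folder))
    number = 1
    while True:
        candidate = base + '_' + str(number) + '.zip'
        if candidate not in existing_names:
            return candidate      # first unused name ends the search
        number = number + 1


first_run = next_zip_filename('delicious', set())
third_run = next_zip_filename(
    'delicious', {'delicious_1.zip', 'delicious_2.zip'})
gap_run = next_zip_filename('delicious', {'delicious_2.zip'})
print(first_run, third_run, gap_run)
```

The last call shows a property of this approach: if an earlier backup has been deleted, the loop reuses the lowest free number rather than continuing after the highest one.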
Step 2: Create the New ZIP File
Next let’s create the ZIP file. Make your program look like the following:
  #! python3
  # backupToZip.py - Copies an entire folder and its contents into
  # a ZIP file whose filename increments.
  --snip--
      while True:
          zipFilename = os.path.basename(folder) + '_' + str(number) + '.zip'
          if not os.path.exists(zipFilename):
              break
          number = number + 1
      # Create the ZIP file.
      print('Creating %s…' % (zipFilename))
➊     backupZip = zipfile.ZipFile(zipFilename, 'w')
      # TODO: Walk the entire folder tree and compress the files in each folder.
      print('Done.')
  backupToZip('C:\\delicious')
Now that the new ZIP file’s name is stored in the zipFilename variable, you can call zipfile.ZipFile() to actually create the ZIP file ➊. Be sure to pass 'w' as the second argument so that the ZIP file is opened in write mode.
Step 3: Walk the Directory Tree and Add to the ZIP File
Now you need to use the os.walk() function to do the work of listing every file in the folder and its subfolders. Make your program look like the following:
  #! python3
  # backupToZip.py - Copies an entire folder and its contents into
  # a ZIP file whose filename increments.
  --snip--
      # Walk the entire folder tree and compress the files in each folder.
➊     for foldername, subfolders, filenames in os.walk(folder):
          print('Adding files in %s…' % (foldername))
          # Add the current folder to the ZIP file.
➋         backupZip.write(foldername)
          # Add all the files in this folder to the ZIP file.
➌         for filename in filenames:
              newBase = os.path.basename(folder) + '_'
              if filename.startswith(newBase) and filename.endswith('.zip'):
                  continue # don't backup the backup ZIP files
              backupZip.write(os.path.join(foldername, filename))
      backupZip.close()
      print('Done.')
  backupToZip('C:\\delicious')
You can use os.walk() in a for loop ➊, and on each iteration it will return the iteration’s current folder name, the subfolders in that folder, and the filenames in that folder.
In the for loop, the folder is added to the ZIP file ➋. The nested for loop can go through each filename in the filenames list ➌. Each of these is added to the ZIP file, except for previously made backup ZIPs.
When you run this program, it will produce output that will look something like this:
Creating delicious_1.zip…
Adding files in C:\delicious…
Adding files in C:\delicious\cats…
Adding files in C:\delicious\waffles…
Adding files in C:\delicious\walnut…
Adding files in C:\delicious\walnut\waffles…
Done.
The second time you run it, it will put all the files in C:\delicious into a ZIP file named delicious_2.zip, and so on.
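One variation you may want to experiment with: the write() method accepts an optional arcname argument that sets the name stored inside the archive. The sketch below is an alternative to the project’s code, not a replacement for it. It stores each file under a path relative to the parent of the backed-up folder, and the function name backup_folder and its zip_path parameter are assumptions made for this illustration. Unlike the project’s version, it adds only files, so empty folders are not recorded.

```python
import os
import tempfile
import zipfile


def backup_folder(folder, zip_path):
    """Zip every file under folder, storing paths relative to its parent."""
    folder = os.path.abspath(folder)
    parent = os.path.dirname(folder)
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as backup_zip:
        for foldername, subfolders, filenames in os.walk(folder):
            for filename in filenames:
                full_path = os.path.join(foldername, filename)
                # arcname controls the name stored inside the archive.
                backup_zip.write(
                    full_path, arcname=os.path.relpath(full_path, parent))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small folder to back up.
    root = os.path.join(tmp_dir, 'delicious')
    os.makedirs(os.path.join(root, 'cats'))
    for relative in ('spam.txt', os.path.join('cats', 'catnames.txt')):
        with open(os.path.join(root, relative), 'w') as f:
            f.write('demo')

    # Keep the ZIP file outside the folder being backed up.
    zip_path = os.path.join(tmp_dir, 'delicious_1.zip')
    backup_folder(root, zip_path)

    with zipfile.ZipFile(zip_path) as backup_zip:
        archived = sorted(backup_zip.namelist())

print(archived)
```

Writing the ZIP file outside the folder being walked also sidesteps the need to skip earlier backups, which the project handles with its startswith() and endswith() check.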
Ideas for Similar Programs
You can walk a directory tree and add files to compressed ZIP archives in several other programs. For example, you can write programs that do the following:
Walk a directory tree and archive just files with certain extensions, such as .txt or .py, and nothing else
Walk a directory tree and archive every file except the .txt and .py ones
Find the folder in a directory tree that has the greatest number of files or the folder that uses the most disk space
Summary
Even if you are an experienced computer user, you probably handle files manually with the mouse and keyboard. Modern file explorers make it easy to work with a few files. But sometimes you’ll need to perform a task that would take hours using your computer’s file explorer.
The os and shutil modules offer functions for copying, moving, renaming, and deleting files. When deleting files, you might want to use the send2trash module to move files to the recycle bin or trash rather than permanently deleting them. And when writing programs that handle files, it’s a good idea to comment out the code that does the actual copy/move/rename/delete and add a print() call instead so you can run the program and verify exactly what it will do.
Often you will need to perform these operations not only on files in one folder but also on every folder in that folder, every folder in those folders, and so on. The os.walk() function handles this trek across the folders for you so that you can concentrate on what your program needs to do with the files in them.
The zipfile module gives you a way of compressing and extracting files in .zip archives through Python. Combined with the file-handling functions of os and shutil, zipfile makes it easy to package up several files from anywhere on your hard drive. These .zip files are much easier to upload to websites or send as email attachments than many separate files.
Previous chapters of this book have provided source code for you to copy. But when you write your own programs, they probably won’t come out perfectly the first time. The next chapter focuses on some Python modules that will help you analyze and debug your programs so that you can quickly get them working correctly.
PracticeQuestionsQ: 1.Whatisthedifferencebetweenshutil.copy()andshutil.copytree()?
Q: 2.Whatfunctionisusedtorenamefiles?
Q: 3.Whatisthedifferencebetweenthedeletefunctionsinthesend2trashandshutilmodules?
Q: 4.ZipFileobjectshaveaclose()methodjustlikeFileobjects’close()method.WhatZipFilemethodisequivalenttoFileobjects’open()method?
PracticeProjectsForpractice,writeprogramstodothefollowingtasks.
SelectiveCopyWriteaprogramthatwalksthroughafoldertreeandsearchesforfileswithacertainfileextension(suchas.pdfor.jpg).Copythesefilesfromwhateverlocationtheyareintoanewfolder.
DeletingUnneededFilesIt’snotuncommonforafewunneededbuthumongousfilesorfolderstotakeupthebulkofthespaceonyourharddrive.Ifyou’retryingtofreeuproomonyourcomputer,you’llgetthemostbangforyourbuckbydeletingthemostmassiveoftheunwantedfiles.Butfirstyouhavetofindthem.
Writeaprogramthatwalksthroughafoldertreeandsearchesforexceptionallylargefilesorfolders—say,onesthathaveafilesizeofmorethan100MB.(Remember,togetafile’ssize,youcanuseos.path.getsize()fromtheosmodule.)Printthesefileswiththeirabsolutepathtothescreen.
FillingintheGapsWriteaprogramthatfindsallfileswithagivenprefix,suchasspam001.txt,spam002.txt,andsoon,inasinglefolderandlocatesanygapsinthenumbering(suchasifthereisaspam001.txtandspam003.txtbutnospam002.txt).Havetheprogramrenameallthelaterfilestoclosethisgap.
Asanaddedchallenge,writeanotherprogramthatcaninsertgapsintonumberedfilessothatanewfilecanbeadded.
Chapter 10. Debugging
Now that you know enough to write more complicated programs, you may start finding not-so-simple bugs in them. This chapter covers some tools and techniques for finding the root cause of bugs in your program to help you fix bugs faster and with less effort.
To paraphrase an old joke among programmers, “Writing code accounts for 90 percent of programming. Debugging code accounts for the other 90 percent.”
Your computer will do only what you tell it to do; it won’t read your mind and do what you intended it to do. Even professional programmers create bugs all the time, so don’t feel discouraged if your program has a problem.
Fortunately, there are a few tools and techniques to identify what exactly your code is doing and where it’s going wrong. First, you will look at logging and assertions, two features that can help you detect bugs early. In general, the earlier you catch bugs, the easier they will be to fix.
Second, you will look at how to use the debugger. The debugger is a feature of IDLE that executes a program one instruction at a time, giving you a chance to inspect the values in variables while your code runs, and track how the values change over the course of your program. This is much slower than running the program at full speed, but it is helpful to see the actual values in a program while it runs, rather than deducing what the values might be from the source code.
Raising Exceptions
Python raises an exception whenever it tries to execute invalid code. In Chapter 3, you read about how to handle Python’s exceptions with try and except statements so that your program can recover from exceptions that you anticipated. But you can also raise your own exceptions in your code. Raising an exception is a way of saying, “Stop running the code in this function and move the program execution to the except statement.”
Exceptions are raised with a raise statement. In code, a raise statement consists of the following:
The raise keyword
A call to the Exception() function
A string with a helpful error message passed to the Exception() function
For example, enter the following into the interactive shell:
>>> raise Exception('This is the error message.')
Traceback (most recent call last):
  File "<pyshell#191>", line 1, in <module>
    raise Exception('This is the error message.')
Exception: This is the error message.
If there are no try and except statements covering the raise statement that raised the exception, the program simply crashes and displays the exception’s error message.
Often it’s the code that calls the function, not the function itself, that knows how to handle an exception. So you will commonly see a raise statement inside a function and the try and except statements in the code calling the function. For example, open a new file editor window, enter the following code, and save the program as boxPrint.py:
def boxPrint(symbol, width, height):
    if len(symbol) != 1:
        raise Exception('Symbol must be a single character string.')  # ➊
    if width <= 2:
        raise Exception('Width must be greater than 2.')  # ➋
    if height <= 2:
        raise Exception('Height must be greater than 2.')  # ➌
    print(symbol * width)
    for i in range(height - 2):
        print(symbol + (' ' * (width - 2)) + symbol)
    print(symbol * width)

for sym, w, h in (('*', 4, 4), ('O', 20, 5), ('x', 1, 3), ('ZZ', 3, 3)):
    try:
        boxPrint(sym, w, h)
    except Exception as err:  # ➍
        print('An exception happened: ' + str(err))  # ➎
Here we’ve defined a boxPrint() function that takes a character, a width, and a height, and uses the character to make a little picture of a box with that width and height. This box shape is printed to the console.
Say we want the character to be a single character, and the width and height to be greater than 2. We add if statements to raise exceptions if these requirements aren’t satisfied. Later, when we call boxPrint() with various arguments, our try/except will handle invalid arguments.
This program uses the except Exception as err form of the except statement ➍. If an Exception object is raised in boxPrint() ➊➋➌, this except statement will store it in a variable named err. The Exception object can then be converted to a string by passing it to str() to produce a user-friendly error message ➎. When you run boxPrint.py, the output will look like this:
****
*  *
*  *
****
OOOOOOOOOOOOOOOOOOOO
O                  O
O                  O
O                  O
OOOOOOOOOOOOOOOOOOOO
An exception happened: Width must be greater than 2.
An exception happened: Symbol must be a single character string.
Using the try and except statements, you can handle errors more gracefully instead of letting the entire program crash.
Getting the Traceback as a String
When Python encounters an error, it produces a treasure trove of error information called the traceback. The traceback includes the error message, the line number of the line that caused the error, and the sequence of the function calls that led to the error. This sequence of calls is called the call stack.
Open a new file editor window in IDLE, enter the following program, and save it as errorExample.py:
def spam():
    bacon()

def bacon():
    raise Exception('This is the error message.')

spam()
When you run errorExample.py, the output will look like this:
Traceback (most recent call last):
  File "errorExample.py", line 7, in <module>
    spam()
  File "errorExample.py", line 2, in spam
    bacon()
  File "errorExample.py", line 5, in bacon
    raise Exception('This is the error message.')
Exception: This is the error message.
From the traceback, you can see that the error happened on line 5, in the bacon() function. This particular call to bacon() came from line 2, in the spam() function, which in turn was called on line 7. In programs where functions can be called from multiple places, the call stack can help you determine which call led to the error.
The traceback is displayed by Python whenever a raised exception goes unhandled. But you can also obtain it as a string by calling traceback.format_exc(). This function is useful if you want the information from an exception’s traceback but also want an except statement to gracefully handle the exception. You will need to import Python’s traceback module before calling this function.
For example, instead of crashing your program right when an exception occurs, you can write the traceback information to a log file and keep your program running. You can look at the log file later, when you’re ready to debug your program. Enter the following into the interactive shell:
>>> import traceback
>>> try:
        raise Exception('This is the error message.')
    except:
        errorFile = open('errorInfo.txt', 'w')
        errorFile.write(traceback.format_exc())
        errorFile.close()
        print('The traceback info was written to errorInfo.txt.')

116
The traceback info was written to errorInfo.txt.
The 116 is the return value from the write() method, since 116 characters were written to the file. The traceback text was written to errorInfo.txt.
Traceback (most recent call last):
  File "<pyshell#28>", line 2, in <module>
Exception: This is the error message.
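The same idea can be wrapped in a function for use in a saved program. This is a minimal sketch, not the book's code: riskyOperation() and runAndLogErrors() are hypothetical names, and the with statement is used so the file is closed even if the write fails.

```python
import traceback

def riskyOperation():
    # Hypothetical stand-in for code that might fail.
    raise Exception('This is the error message.')

def runAndLogErrors(logFilename):
    """Run riskyOperation(); on failure, append the traceback to logFilename."""
    try:
        riskyOperation()
    except Exception:
        tracebackText = traceback.format_exc()
        # Append mode keeps tracebacks from earlier failures in the file.
        with open(logFilename, 'a') as errorFile:
            errorFile.write(tracebackText)
        return tracebackText
    return None
```

Calling runAndLogErrors('errorInfo.txt') lets the program keep running after the error while the details wait in the file.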
Assertions
An assertion is a sanity check to make sure your code isn’t doing something obviously wrong. These sanity checks are performed by assert statements. If the sanity check fails, then an AssertionError exception is raised. In code, an assert statement consists of the following:
The assert keyword
A condition (that is, an expression that evaluates to True or False)
A comma
A string to display when the condition is False
For example, enter the following into the interactive shell:
>>> podBayDoorStatus = 'open'
>>> assert podBayDoorStatus == 'open', 'The pod bay doors need to be "open".'
>>> podBayDoorStatus = 'I\'m sorry, Dave. I\'m afraid I can\'t do that.'
>>> assert podBayDoorStatus == 'open', 'The pod bay doors need to be "open".'
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    assert podBayDoorStatus == 'open', 'The pod bay doors need to be "open".'
AssertionError: The pod bay doors need to be "open".
Here we’ve set podBayDoorStatus to 'open', so from now on, we fully expect the value of this variable to be 'open'. In a program that uses this variable, we might have written a lot of code under the assumption that the value is 'open' (code that depends on its being 'open' in order to work as we expect). So we add an assertion to make sure we’re right to assume podBayDoorStatus is 'open'. Here, we include the message 'The pod bay doors need to be "open".' so it’ll be easy to see what’s wrong if the assertion fails.
Later, say we make the obvious mistake of assigning podBayDoorStatus another value, but don’t notice it among many lines of code. The assertion catches this mistake and clearly tells us what’s wrong.
In plain English, an assert statement says, “I assert that this condition holds true, and if not, there is a bug somewhere in the program.” Unlike exceptions, your code should not handle assert statements with try and except; if an assert fails, your program should crash. By failing fast like this, you shorten the time between the original cause of the bug and when you first notice the bug. This will reduce the amount of code you will have to check before finding the code that’s causing the bug.
Assertions are for programmer errors, not user errors. For errors that can be recovered from (such as a file not being found or the user entering invalid data), raise an exception instead of detecting it with an assert statement.
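To make the split between user errors and programmer errors concrete, here is a small sketch. The parseAge() function is hypothetical, and it raises ValueError, a built-in exception class, rather than the plain Exception used elsewhere in this chapter, so that the except clause does not also swallow an AssertionError.

```python
def parseAge(text):
    """Convert user-entered text to an age."""
    if not text.isdecimal():
        # User error: recoverable, so raise an exception the caller can handle.
        raise ValueError('Age must be a whole number, got: ' + repr(text))
    age = int(text)
    # Programmer error check: isdecimal() rules out a minus sign, so a
    # negative age here would mean a bug in this function.
    assert age >= 0, 'parseAge() produced a negative age: ' + str(age)
    return age

for entry in ('42', 'forty-two'):
    try:
        print(parseAge(entry))
    except ValueError as err:
        print('Invalid input: ' + str(err))
```

The bad input is reported and the loop continues, while a failed assertion would still crash the program as intended.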
Using an Assertion in a Traffic Light Simulation
Say you’re building a traffic light simulation program. The data structure representing the stoplights at an intersection is a dictionary with keys 'ns' and 'ew', for the stoplights facing north-south and east-west, respectively. The values at these keys will be one of the strings 'green', 'yellow', or 'red'. The code would look something like this:
market_2nd = {'ns': 'green', 'ew': 'red'}
mission_16th = {'ns': 'red', 'ew': 'green'}
These two variables will be for the intersections of Market Street and 2nd Street, and Mission Street and 16th Street. To start the project, you want to write a switchLights() function, which will take an intersection dictionary as an argument and switch the lights.
At first, you might think that switchLights() should simply switch each light to the next color in the sequence: Any 'green' values should change to 'yellow', 'yellow' values should change to 'red', and 'red' values should change to 'green'. The code to implement this idea might look like this:
def switchLights(stoplight):
    for key in stoplight.keys():
        if stoplight[key] == 'green':
            stoplight[key] = 'yellow'
        elif stoplight[key] == 'yellow':
            stoplight[key] = 'red'
        elif stoplight[key] == 'red':
            stoplight[key] = 'green'

switchLights(market_2nd)
You may already see the problem with this code, but let’s pretend you wrote the rest of the simulation code, thousands of lines long, without noticing it. When you finally do run the simulation, the program doesn’t crash, but your virtual cars do!
Since you’ve already written the rest of the program, you have no idea where the bug could be. Maybe it’s in the code simulating the cars or in the code simulating the virtual drivers. It could take hours to trace the bug back to the switchLights() function.
But if while writing switchLights() you had added an assertion to check that at least one of the lights is always red, you might have included the following at the bottom of the function:
    assert 'red' in stoplight.values(), 'Neither light is red! ' + str(stoplight)
With this assertion in place, your program would crash with this error message:
Traceback (most recent call last):
  File "carSim.py", line 14, in <module>
    switchLights(market_2nd)
  File "carSim.py", line 13, in switchLights
    assert 'red' in stoplight.values(), 'Neither light is red! ' + str(stoplight)
➊ AssertionError: Neither light is red! {'ns': 'yellow', 'ew': 'green'}
The important line here is the AssertionError ➊. While your program crashing is not ideal, it immediately points out that a sanity check failed: Neither direction of traffic has a red light, meaning that traffic could be going both ways. By failing fast early in the program’s execution, you can save yourself a lot of future debugging effort.
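The text stops at detecting the bug. One possible repair, offered only as a sketch, is to treat the intersection as a whole and step it through a fixed list of safe states. SAFE_CYCLE and switchLightsSafely() are hypothetical names; the assertion from above is kept as a guard.

```python
# Hypothetical fix: every state in this cycle has at least one red light.
SAFE_CYCLE = [
    {'ns': 'green', 'ew': 'red'},
    {'ns': 'yellow', 'ew': 'red'},
    {'ns': 'red', 'ew': 'green'},
    {'ns': 'red', 'ew': 'yellow'},
]

def switchLightsSafely(stoplight):
    """Move stoplight (a dict with 'ns' and 'ew' keys) to the next safe state."""
    # index() raises ValueError if the intersection is in an unknown state.
    currentIndex = SAFE_CYCLE.index(stoplight)
    nextState = SAFE_CYCLE[(currentIndex + 1) % len(SAFE_CYCLE)]
    stoplight.update(nextState)
    assert 'red' in stoplight.values(), 'Neither light is red! ' + str(stoplight)

market_2nd = {'ns': 'green', 'ew': 'red'}
for _ in range(4):
    switchLightsSafely(market_2nd)
    print(market_2nd)
```

After four switches the intersection is back in its starting state, and the assertion never fires because no state in the cycle lacks a red light.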
Disabling Assertions
Assertions can be disabled by passing the -O option when running Python. This is good for when you have finished writing and testing your program and don’t want it to be slowed down by performing sanity checks (although most of the time assert statements do not cause a noticeable speed difference). Assertions are for development, not the final product. By the time you hand off your program to someone else to run, it should be free of bugs and not require the sanity checks. See Appendix B for details about how to launch your probably-not-insane programs with the -O option.
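The effect of -O can be observed from a script by launching Python twice. This sketch uses the subprocess module and sys.executable, neither of which this chapter covers, and it assumes Python 3.7 or later for the capture_output argument.

```python
import subprocess
import sys

# A one-line program whose assertion always fails.
program = "assert False, 'Sanity check failed.'; print('Reached the end.')"

# Normal run: the assert raises AssertionError and the program stops.
normal = subprocess.run([sys.executable, '-c', program],
                        capture_output=True, text=True)

# Run with -O: the assert statement is skipped.
optimized = subprocess.run([sys.executable, '-O', '-c', program],
                           capture_output=True, text=True)

print('Without -O, exit code:', normal.returncode)
print('With -O, output:', optimized.stdout.strip())
```

Only the second run reaches the print() call, which shows that the assert was never evaluated.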
Logging
If you’ve ever put a print() statement in your code to output some variable’s value while your program is running, you’ve used a form of logging to debug your code. Logging is a great way to understand what’s happening in your program and in what order it’s happening. Python’s logging module makes it easy to create a record of custom messages that you write. These log messages will describe when the program execution has reached the logging function call and list any variables you have specified at that point in time. On the other hand, a missing log message indicates a part of the code was skipped and never executed.
Using the logging Module
To enable the logging module to display log messages on your screen as your program runs, copy the following to the top of your program (but under the #! python shebang line):
import logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
You don’t need to worry too much about how this works, but basically, when Python logs an event, it creates a LogRecord object that holds information about that event. The logging module’s basicConfig() function lets you specify what details about the LogRecord object you want to see and how you want those details displayed.
Say you wrote a function to calculate the factorial of a number. In mathematics, factorial 4 is 1 × 2 × 3 × 4, or 24. Factorial 7 is 1 × 2 × 3 × 4 × 5 × 6 × 7, or 5,040. Open a new file editor window and enter the following code. It has a bug in it, but you will also enter several log messages to help yourself figure out what is going wrong. Save the program as factorialLog.py.
import logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
logging.debug('Start of program')

def factorial(n):
    logging.debug('Start of factorial(%s)' % (n))
    total = 1
    for i in range(n + 1):
        total *= i
        logging.debug('i is ' + str(i) + ', total is ' + str(total))
    logging.debug('End of factorial(%s)' % (n))
    return total

print(factorial(5))
logging.debug('End of program')
Here, we use the logging.debug() function when we want to print log information. Each debug() call prints a line of information using the settings we gave to basicConfig(). This information will be in the format we specified in basicConfig() and will include the messages we passed to debug(). The print(factorial(5)) call is part of the original program, so the result is displayed even if logging messages are disabled.
The output of this program looks like this:
2015-05-23 16:20:12,664 - DEBUG - Start of program
2015-05-23 16:20:12,664 - DEBUG - Start of factorial(5)
2015-05-23 16:20:12,665 - DEBUG - i is 0, total is 0
2015-05-23 16:20:12,668 - DEBUG - i is 1, total is 0
2015-05-23 16:20:12,670 - DEBUG - i is 2, total is 0
2015-05-23 16:20:12,673 - DEBUG - i is 3, total is 0
2015-05-23 16:20:12,675 - DEBUG - i is 4, total is 0
2015-05-23 16:20:12,678 - DEBUG - i is 5, total is 0
2015-05-23 16:20:12,680 - DEBUG - End of factorial(5)
0
2015-05-23 16:20:12,684 - DEBUG - End of program
The factorial() function is returning 0 as the factorial of 5, which isn’t right. The for loop should be multiplying the value in total by the numbers from 1 to 5. But the log messages displayed by logging.debug() show that the i variable is starting at 0 instead of 1. Since zero times anything is zero, the rest of the iterations also have the wrong value for total. Logging messages provide a trail of breadcrumbs that can help you figure out when things started to go wrong.
Change the for i in range(n + 1): line to for i in range(1, n + 1):, and run the program again. The output will look like this:
2015-05-23 17:13:40,650 - DEBUG - Start of program
2015-05-23 17:13:40,651 - DEBUG - Start of factorial(5)
2015-05-23 17:13:40,651 - DEBUG - i is 1, total is 1
2015-05-23 17:13:40,654 - DEBUG - i is 2, total is 2
2015-05-23 17:13:40,656 - DEBUG - i is 3, total is 6
2015-05-23 17:13:40,659 - DEBUG - i is 4, total is 24
2015-05-23 17:13:40,661 - DEBUG - i is 5, total is 120
2015-05-23 17:13:40,661 - DEBUG - End of factorial(5)
120
2015-05-23 17:13:40,666 - DEBUG - End of program
The factorial(5) call correctly returns 120. The log messages showed what was going on inside the loop, which led straight to the bug.
You can see that the logging.debug() calls printed out not just the strings passed to them but also a timestamp and the word DEBUG.
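Once the loop is fixed, a quick cross-check against a trusted implementation gives extra confidence. The sketch below compares the corrected function with math.factorial() from the standard library; using math here is an addition for illustration, not part of factorialLog.py.

```python
import logging
import math

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

def factorial(n):
    logging.debug('Start of factorial(%s)' % (n))
    total = 1
    for i in range(1, n + 1):  # Start at 1 so total is never multiplied by 0.
        total *= i
        logging.debug('i is ' + str(i) + ', total is ' + str(total))
    logging.debug('End of factorial(%s)' % (n))
    return total

# Sanity check: the results should agree with the standard library.
for n in range(8):
    assert factorial(n) == math.factorial(n), 'Mismatch for n = ' + str(n)
print(factorial(5))
```

If the range() bug were still present, the very first comparison with n set to 1 would fail and name the offending value.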
Don’t Debug with print()
Typing import logging and logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s') is somewhat unwieldy. You may want to use print() calls instead, but don’t give in to this temptation! Once you’re done debugging, you’ll end up spending a lot of time removing print() calls from your code for each log message. You might even accidentally remove some print() calls that were being used for nonlog messages. The nice thing about log messages is that you’re free to fill your program with as many as you like, and you can always disable them later by adding a single logging.disable(logging.CRITICAL) call. Unlike print(), the logging module makes it easy to switch between showing and hiding log messages.
Log messages are intended for the programmer, not the user. The user won’t care about the contents of some dictionary value you need to see to help with debugging; use a log message for something like that. For messages that the user will want to see, like File not found or Invalid input, please enter a number, you should use a print() call. You don’t want to deprive the user of useful information after you’ve disabled log messages.
Logging Levels
Logging levels provide a way to categorize your log messages by importance. There are five logging levels, described in Table 10-1 from least to most important. Messages can be logged at each level using a different logging function.
Table 10-1. Logging Levels in Python
Level | Logging Function | Description
DEBUG | logging.debug() | The lowest level. Used for small details. Usually you care about these messages only when diagnosing problems.
INFO | logging.info() | Used to record information on general events in your program or confirm that things are working at their point in the program.
WARNING | logging.warning() | Used to indicate a potential problem that doesn’t prevent the program from working but might do so in the future.
ERROR | logging.error() | Used to record an error that caused the program to fail to do something.
CRITICAL | logging.critical() | The highest level. Used to indicate a fatal error that has caused or is about to cause the program to stop running entirely.
Your logging message is passed as a string to these functions. The logging levels are suggestions. Ultimately, it is up to you to decide which category your log message falls into. Enter the following into the interactive shell:
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
>>> logging.debug('Some debugging details.')
2015-05-18 19:04:26,901 - DEBUG - Some debugging details.
>>> logging.info('The logging module is working.')
2015-05-18 19:04:35,569 - INFO - The logging module is working.
>>> logging.warning('An error message is about to be logged.')
2015-05-18 19:04:56,843 - WARNING - An error message is about to be logged.
>>> logging.error('An error has occurred.')
2015-05-18 19:05:07,737 - ERROR - An error has occurred.
>>> logging.critical('The program is unable to recover!')
2015-05-18 19:05:45,794 - CRITICAL - The program is unable to recover!
The benefit of logging levels is that you can change what priority of logging message you want to see. Passing logging.DEBUG to the basicConfig() function’s level keyword argument will show messages from all the logging levels (DEBUG being the lowest level). But after developing your program some more, you may be interested only in errors. In that case, you can set basicConfig()’s level argument to logging.ERROR. This will show only ERROR and CRITICAL messages and skip the DEBUG, INFO, and WARNING messages.
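The filtering can be seen without touching the screen output. This sketch departs from the chapter's basicConfig() setup: it assumes a separate named logger (from logging.getLogger(), which the chapter does not cover) that writes to an in-memory buffer, so the result is easy to inspect.

```python
import io
import logging

# Assumption: a named logger with its own handler, so this example does
# not interfere with any basicConfig() settings already in place.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))

logger = logging.getLogger('levelsDemo')
logger.addHandler(handler)
logger.propagate = False        # Keep these messages out of the root logger.
logger.setLevel(logging.ERROR)  # Show only ERROR and CRITICAL messages.

logger.debug('Some debugging details.')
logger.info('The logging module is working.')
logger.warning('An error message is about to be logged.')
logger.error('An error has occurred.')
logger.critical('The program is unable to recover!')

print(buffer.getvalue())
```

Only the last two messages reach the buffer; the DEBUG, INFO, and WARNING calls are skipped because they fall below the ERROR threshold.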
Disabling Logging
After you’ve debugged your program, you probably don’t want all these log messages cluttering the screen. The logging.disable() function disables these so that you don’t have to go into your program and remove all the logging calls by hand. You simply pass logging.disable() a logging level, and it will suppress all log messages at that level or lower. So if you want to disable logging entirely, just add logging.disable(logging.CRITICAL) to your program. For example, enter the following into the interactive shell:
>>> import logging
>>> logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
>>> logging.critical('Critical error! Critical error!')
2015-05-22 11:10:48,054 - CRITICAL - Critical error! Critical error!
>>> logging.disable(logging.CRITICAL)
>>> logging.critical('Critical error! Critical error!')
>>> logging.error('Error! Error!')
Since logging.disable() will disable all messages after it, you will probably want to add it near the import logging line of code in your program. This way, you can easily find it to comment out or uncomment that call to enable or disable logging messages as needed.
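As an alternative to commenting the call in and out, logging can be switched back on at run time by passing logging.NOTSET to logging.disable(). The sketch below again assumes a named logger writing to an in-memory buffer, which is a setup this chapter does not use.

```python
import io
import logging

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))

logger = logging.getLogger('disableDemo')
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.DEBUG)

logger.critical('Shown: logging is enabled.')
logging.disable(logging.CRITICAL)  # Suppress CRITICAL and everything below it.
logger.critical('Hidden: logging is disabled.')
logging.disable(logging.NOTSET)    # Remove the override set by disable().
logger.critical('Shown again: logging is re-enabled.')

print(buffer.getvalue())
```

Note that logging.disable() affects every logger in the program, not just the one in this example.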
Logging to a File
Instead of displaying the log messages to the screen, you can write them to a text file. The logging.basicConfig() function takes a filename keyword argument, like so:
import logging
logging.basicConfig(filename='myProgramLog.txt', level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
The log messages will be saved to myProgramLog.txt. While logging messages are helpful, they can clutter your screen and make it hard to read the program’s output. Writing the logging messages to a file will keep your screen clear and store the messages so you can read them after running the program. You can open this text file in any text editor, such as Notepad or TextEdit.
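Here is a small end-to-end sketch that writes two messages to a log file and reads them back. It makes two assumptions beyond the text: the file is created in a temporary folder from the tempfile module, and basicConfig() is called with force=True, which requires Python 3.8 or later and replaces any logging setup made earlier in the same session.

```python
import logging
import os
import tempfile

logPath = os.path.join(tempfile.mkdtemp(), 'myProgramLog.txt')

# Without force=True, basicConfig() does nothing if logging has
# already been configured earlier in the same program.
logging.basicConfig(filename=logPath, level=logging.DEBUG,
                    format='%(asctime)s - %(levelname)s - %(message)s',
                    force=True)

logging.debug('Start of program')
logging.debug('End of program')
logging.shutdown()  # Flush and close the log file before reading it.

with open(logPath) as logFile:
    logContents = logFile.read()
print(logContents)
```

Nothing appears on the screen from the debug() calls themselves; both lines land in the file.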
IDLE’s Debugger
The debugger is a feature of IDLE that allows you to execute your program one line at a time. The debugger will run a single line of code and then wait for you to tell it to continue. By running your program “under the debugger” like this, you can take as much time as you want to examine the values in the variables at any given point during the program’s lifetime. This is a valuable tool for tracking down bugs.
To enable IDLE’s debugger, click Debug ▸ Debugger in the interactive shell window. This will bring up the Debug Control window, which looks like Figure 10-1.
When the Debug Control window appears, select all four of the Stack, Locals, Source, and Globals checkboxes so that the window shows the full set of debug information. While the Debug Control window is displayed, any time you run a program from the file editor, the debugger will pause execution before the first instruction and display the following:
The line of code that is about to be executed
A list of all local variables and their values
A list of all global variables and their values
Figure 10-1. The Debug Control window
You’ll notice that in the list of global variables there are several variables you haven’t defined, such as __builtins__, __doc__, __file__, and so on. These are variables that Python automatically sets whenever it runs a program. The meaning of these variables is beyond the scope of this book, and you can comfortably ignore them.
The program will stay paused until you press one of the five buttons in the Debug Control window: Go, Step, Over, Out, or Quit.
Go
Clicking the Go button will cause the program to execute normally until it terminates or reaches a breakpoint. (Breakpoints are described later in this chapter.) If you are done debugging and want the program to continue normally, click the Go button.
Step
Clicking the Step button will cause the debugger to execute the next line of code and then pause again. The Debug Control window’s list of global and local variables will be updated if their values change. If the next line of code is a function call, the debugger will “step into” that function and jump to the first line of code of that function.
Over
Clicking the Over button will execute the next line of code, similar to the Step button. However, if the next line of code is a function call, the Over button will “step over” the code in the function. The function’s code will be executed at full speed, and the debugger will pause as soon as the function call returns. For example, if the next line of code is a print() call, you don’t really care about code inside the built-in print() function; you just want the string you pass it printed to the screen. For this reason, using the Over button is more common than the Step button.
Out
Clicking the Out button will cause the debugger to execute lines of code at full speed until it returns from the current function. If you have stepped into a function call with the Step button and now simply want to keep executing instructions until you get back out, click the Out button to “step out” of the current function call.
Quit
If you want to stop debugging entirely and not bother to continue executing the rest of the program, click the Quit button. The Quit button will immediately terminate the program. If you want to run your program normally again, select Debug ▸ Debugger again to disable the debugger.
Debugging a Number Adding Program
Open a new file editor window and enter the following code:
print('Enter the first number to add:')
first = input()
print('Enter the second number to add:')
second = input()
print('Enter the third number to add:')
third = input()
print('The sum is ' + first + second + third)
Save it as buggyAddingProgram.py and run it first without the debugger enabled. The program will output something like this:
Enter the first number to add:
5
Enter the second number to add:
3
Enter the third number to add:
42
The sum is 5342
The program hasn’t crashed, but the sum is obviously wrong. Let’s enable the Debug Control window and run it again, this time under the debugger.
When you press F5 or select Run ▸ Run Module (with Debug ▸ Debugger enabled and all four checkboxes on the Debug Control window checked), the program starts in a paused state on line 1. The debugger will always pause on the line of code it is about to execute. The Debug Control window will look like Figure 10-2.
Figure 10-2. The Debug Control window when the program first starts under the debugger
Click the Over button once to execute the first print() call. You should use Over instead of Step here, since you don’t want to step into the code for the print() function. The Debug Control window will update to line 2, and line 2 in the file editor window will be highlighted, as shown in Figure 10-3. This shows you where the program execution currently is.
Figure 10-3. The Debug Control window after clicking Over
Click Over again to execute the input() function call, and the buttons in the Debug Control window will disable themselves while IDLE waits for you to type something for the input() call into the interactive shell window. Enter 5 and press Return. The Debug Control window buttons will be reenabled.
Keep clicking Over, entering 3 and 42 as the next two numbers, until the debugger is on line 7, the final print() call in the program. The Debug Control window should look like Figure 10-4. You can see in the Globals section that the first, second, and third variables are set to string values '5', '3', and '42' instead of integer values 5, 3, and 42. When the last line is executed, these strings are concatenated instead of added together, causing the bug.
Figure 10-4. The Debug Control window on the last line. The variables are set to strings, causing the bug.
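The text identifies the cause but leaves the repair to the reader. One straightforward fix, sketched here with a hypothetical helper function so it can be tried without typing input, is to convert each string with int() before adding.

```python
def addThreeNumbers(first, second, third):
    """Add three numbers given as strings, the way input() returns them."""
    return int(first) + int(second) + int(third)

# The same values entered in the debugger walkthrough.
print('The sum is ' + str(addThreeNumbers('5', '3', '42')))
```

In the original program, the equivalent change is to wrap each input() call in int() and pass the total to str() before concatenating it with 'The sum is '.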
Stepping through the program with the debugger is helpful but can also be slow. Often you’ll want the program to run normally until it reaches a certain line of code. You can configure the debugger to do this with breakpoints.
Breakpoints
A breakpoint can be set on a specific line of code and forces the debugger to pause whenever the program execution reaches that line. Open a new file editor window and enter the following program, which simulates flipping a coin 1,000 times. Save it as coinFlip.py.
import random
heads = 0
for i in range(1, 1001):
    if random.randint(0, 1) == 1:  # ➊
        heads = heads + 1
    if i == 500:
        print('Halfway done!')  # ➋
print('Heads came up ' + str(heads) + ' times.')
The random.randint(0, 1) call ➊ will return 0 half of the time and 1 the other half of the time. This can be used to simulate a 50/50 coin flip where 1 represents heads. When you run this program without the debugger, it quickly outputs something like the following:
Halfway done!
Heads came up 490 times.
If you ran this program under the debugger, you would have to click the Over button thousands of times before the program terminated. If you were interested in the value of heads at the halfway point of the program’s execution, when 500 of 1000 coin flips have been completed, you could instead just set a breakpoint on the line print('Halfway done!') ➋. To set a breakpoint, right-click the line in the file editor and select Set Breakpoint, as shown in Figure 10-5.
Figure 10-5. Setting a breakpoint
You don’t want to set a breakpoint on the if statement line, since the if statement is executed on every single iteration through the loop. By setting the breakpoint on the code in the if statement, the debugger breaks only when the execution enters the if clause.
The line with the breakpoint will be highlighted in yellow in the file editor. When you run the program under the debugger, it will start in a paused state at the first line, as usual. But if you click Go, the program will run at full speed until it reaches the line with the breakpoint set on it. You can then click Go, Over, Step, or Out to continue as normal.
If you want to remove a breakpoint, right-click the line in the file editor and select Clear Breakpoint from the menu. The yellow highlighting will go away, and the debugger will not break on that line in the future.
Summary
Assertions, exceptions, logging, and the debugger are all valuable tools to find and prevent bugs in your program. Assertions with the Python assert statement are a good way to implement “sanity checks” that give you an early warning when a necessary condition doesn’t hold true. Assertions are only for errors that the program shouldn’t try to recover from and should fail fast. Otherwise, you should raise an exception.
An exception can be caught and handled by the try and except statements. The logging module is a good way to look into your code while it’s running and is much more convenient to use than the print() function because of its different logging levels and ability to log to a text file.
The debugger lets you step through your program one line at a time. Alternatively, you can run your program at normal speed and have the debugger pause execution whenever it reaches a line with a breakpoint set. Using the debugger, you can see the state of any variable’s value at any point during the program’s lifetime.
These debugging tools and techniques will help you write programs that work. Accidentally introducing bugs into your code is a fact of life, no matter how many years of coding experience you have.
Practice Questions
Q: 1. Write an assert statement that triggers an AssertionError if the variable spam is an integer less than 10.
Q: 2. Write an assert statement that triggers an AssertionError if the variables eggs and bacon contain strings that are the same as each other, even if their cases are different (that is, 'hello' and 'hello' are considered the same, and 'goodbye' and 'GOODbye' are also considered the same).
Q: 3. Write an assert statement that always triggers an AssertionError.
Q: 4. What are the two lines that your program must have in order to be able to call logging.debug()?
Q: 5. What are the two lines that your program must have in order to have logging.debug() send a logging message to a file named programLog.txt?
Q: 6. What are the five logging levels?
Q: 7. What line of code can you add to disable all logging messages in your program?
Q: 8. Why is using logging messages better than using print() to display the same message?
Q: 9. What are the differences between the Step, Over, and Out buttons in the Debug Control window?
Q: 10. After you click Go in the Debug Control window, when will the debugger stop?
Q: 11. What is a breakpoint?
Q: 12. How do you set a breakpoint on a line of code in IDLE?
Practice Project
For practice, write a program that does the following.
Debugging Coin Toss
The following program is meant to be a simple coin toss guessing game. The player gets two guesses (it’s an easy game). However, the program has several bugs in it. Run through the program a few times to find the bugs that keep the program from working correctly.
import random
guess = ''
while guess not in ('heads', 'tails'):
    print('Guess the coin toss! Enter heads or tails:')
    guess = input()
toss = random.randint(0, 1) # 0 is tails, 1 is heads
if toss == guess:
    print('You got it!')
else:
    print('Nope! Guess again!')
    guesss = input()
    if toss == guess:
        print('You got it!')
    else:
        print('Nope. You are really bad at this game.')
Chapter11.WebScrapingInthoserare,terrifyingmomentswhenI’mwithoutWi-Fi,IrealizejusthowmuchofwhatIdoonthecomputerisreallywhatIdoontheInternet.OutofsheerhabitI’llfindmyselftryingtocheckemail,readfriends’Twitterfeeds,oranswerthequestion,“DidKurtwoodSmithhaveanymajorrolesbeforehewasintheoriginal1987Robocop?”[2]
SincesomuchworkonacomputerinvolvesgoingontheInternet,it’dbegreatifyourprogramscouldgetonline.WebscrapingisthetermforusingaprogramtodownloadandprocesscontentfromtheWeb.Forexample,Googlerunsmanywebscrapingprogramstoindexwebpagesforitssearchengine.Inthischapter,youwilllearnaboutseveralmodulesthatmakeiteasytoscrapewebpagesinPython.
webbrowser.ComeswithPythonandopensabrowsertoaspecificpage.Requests.DownloadsfilesandwebpagesfromtheInternet.BeautifulSoup.ParsesHTML,theformatthatwebpagesarewrittenin.Selenium.Launchesandcontrolsawebbrowser.Seleniumisabletofillinformsandsimulatemouseclicksinthisbrowser.
Project:mapit.pywiththewebbrowserModuleThewebbrowsermodule’sopen()functioncanlaunchanewbrowsertoaspecifiedURL.Enterthefollowingintotheinteractiveshell:
>>>importwebbrowser
>>>webbrowser.open('http://inventwithpython.com/')
AwebbrowsertabwillopentotheURLhttp://inventwithpython.com/.Thisisabouttheonlythingthewebbrowsermodulecando.Evenso,theopen()functiondoesmakesomeinterestingthingspossible.Forexample,it’stedioustocopyastreetaddresstotheclipboardandbringupamapofitonGoogleMaps.Youcouldtakeafewstepsoutofthistaskbywritingasimplescripttoautomaticallylaunchthemapinyourbrowserusingthecontentsofyourclipboard.Thisway,youonlyhavetocopytheaddresstoaclipboardandrunthescript,andthemapwillbeloadedforyou.
Thisiswhatyourprogramdoes:
Getsastreetaddressfromthecommandlineargumentsorclipboard.OpensthewebbrowsertotheGoogleMapspagefortheaddress.
Thismeansyourcodewillneedtodothefollowing:
Readthecommandlineargumentsfromsys.argv.Readtheclipboardcontents.Callthewebbrowser.open()functiontoopenthewebbrowser.
OpenanewfileeditorwindowandsaveitasmapIt.py.
Step 1: Figure Out the URL
Based on the instructions in Appendix B, set up mapIt.py so that when you run it from the command line, like so…
C:\> mapit 870 Valencia St, San Francisco, CA 94110
…the script will use the command line arguments instead of the clipboard. If there are no command line arguments, then the program will know to use the contents of the clipboard.
First you need to figure out what URL to use for a given street address. When you load http://maps.google.com/ in the browser and search for an address, the URL in the address bar looks something like this: https://www.google.com/maps/place/870+Valencia+St/@37.7590311,-122.4215096,17z/data=!3m1!4b1!4m2!3m1!1s0x808f7e3dadc07a37:0xc86b0b2bb93b73d8
The address is in the URL, but there’s a lot of additional text there as well. Websites often add extra data to URLs to help track visitors or customize sites. But if you try just going to https://www.google.com/maps/place/870+Valencia+St+San+Francisco+CA/, you’ll find that it still brings up the correct page. So your program can be set to open a web browser to 'https://www.google.com/maps/place/your_address_string' (where your_address_string is the address you want to map).
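To make the URL pattern concrete, here is a minimal sketch that builds the map URL from an address string. The function name build_maps_url and the constant MAPS_PREFIX are made up for this illustration, and the call to urllib.parse.quote_plus() is an extra that the mapIt.py project itself does not use; it simply escapes spaces and punctuation so the address is safe to place in a URL.

```python
from urllib.parse import quote_plus

# The fixed part of the URL described above.
MAPS_PREFIX = 'https://www.google.com/maps/place/'

def build_maps_url(address):
    """Return a Google Maps URL for the given street address."""
    # quote_plus() turns spaces into '+' and escapes characters such as ','.
    return MAPS_PREFIX + quote_plus(address)

print(build_maps_url('870 Valencia St'))
```

Passing the resulting string to webbrowser.open() would then bring up the map.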
Step 2: Handle the Command Line Arguments
Make your code look like this:
#! python3
# mapIt.py - Launches a map in the browser using an address from the
# command line or clipboard.
import webbrowser, sys
if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
# TODO: Get address from clipboard.
After the program’s #! shebang line, you need to import the webbrowser module for launching the browser and import the sys module for reading the potential command line arguments. The sys.argv variable stores a list of the program’s filename and command line arguments. If this list has more than just the filename in it, then len(sys.argv) evaluates to an integer greater than 1, meaning that command line arguments have indeed been provided.
Command line arguments are usually separated by spaces, but in this case, you want to interpret all of the arguments as a single string. Since sys.argv is a list of strings, you can pass it to the join() method, which returns a single string value. You don’t want the program name in this string, so instead of sys.argv, you should pass sys.argv[1:] to chop off the first element of the list. The final string that this expression evaluates to is stored in the address variable.
If you run the program by entering this into the command line…
mapit 870 Valencia St, San Francisco, CA 94110
…the sys.argv variable will contain this list value:
['mapIt.py', '870', 'Valencia', 'St,', 'San', 'Francisco,', 'CA', '94110']
The address variable will contain the string '870 Valencia St, San Francisco, CA 94110'.
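You can try the joining step on its own without touching the command line. This short sketch uses an ordinary list named argv as a stand-in for sys.argv (the stand-in is an assumption made so the example runs anywhere):

```python
# A stand-in for sys.argv, as if the user had run:
#   mapit 870 Valencia St, San Francisco, CA 94110
argv = ['mapIt.py', '870', 'Valencia', 'St,', 'San', 'Francisco,', 'CA', '94110']

# Skip the program name at index 0 and rejoin the rest with single spaces.
address = ' '.join(argv[1:])
print(address)
```

Note that the separator string is a single space, not an empty string; joining with '' would run all the words together.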
Step 3: Handle the Clipboard Content and Launch the Browser
Make your code look like the following:
#! python3
# mapIt.py - Launches a map in the browser using an address from the
# command line or clipboard.
import webbrowser, sys, pyperclip
if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
else:
    # Get address from clipboard.
    address = pyperclip.paste()
webbrowser.open('https://www.google.com/maps/place/' + address)
If there are no command line arguments, the program will assume the address is stored on the clipboard. You can get the clipboard content with pyperclip.paste() and store it in a variable named address. Finally, to launch a web browser with the Google Maps URL, call webbrowser.open().
While some of the programs you write will perform huge tasks that save you hours, it can be just as satisfying to use a program that conveniently saves you a few seconds each time you perform a common task, such as getting a map of an address. Table 11-1 compares the steps needed to display a map with and without mapIt.py.
Table 11-1. Getting a Map with and Without mapIt.py
Manually getting a map            Using mapIt.py
Highlight the address.            Highlight the address.
Copy the address.                 Copy the address.
Open the web browser.             Run mapIt.py.
Go to http://maps.google.com/.
Click the address text field.
Paste the address.
Press ENTER.
See how mapIt.py makes this task less tedious?
Ideas for Similar Programs
As long as you have a URL, the webbrowser module lets users cut out the step of opening the browser and directing themselves to a website. Other programs could use this functionality to do the following:
Open all links on a page in separate browser tabs.
Open the browser to the URL for your local weather.
Open several social network sites that you regularly check.
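As a rough sketch of the last idea, the following function opens every URL in a list. The names open_all and DAILY_SITES are invented for this example, and the opener parameter is an assumption added so you can try the function without actually launching a browser.

```python
import webbrowser

def open_all(urls, opener=webbrowser.open):
    """Open each URL with opener and return how many were opened."""
    count = 0
    for url in urls:
        opener(url)
        count += 1
    return count

# Hypothetical list of sites to check every morning.
DAILY_SITES = ['http://inventwithpython.com/', 'http://nostarch.com/']
# open_all(DAILY_SITES)  # uncomment to open each site in a browser tab
```

Calling open_all(DAILY_SITES) with the default opener hands each URL to webbrowser.open(), just as mapIt.py does with its single map URL.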
Downloading Files from the Web with the requests Module
The requests module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression. The requests module doesn’t come with Python, so you’ll have to install it first. From the command line, run pip install requests. (Appendix A has additional details on how to install third-party modules.)
The requests module was written because Python’s urllib2 module is too complicated to use. In fact, take a permanent marker and black out this entire paragraph. Forget I ever mentioned urllib2. If you need to download things from the Web, just use the requests module.
Next, do a simple test to make sure the requests module installed itself correctly. Enter the following into the interactive shell:
>>> import requests
If no error messages show up, then the requests module has been successfully installed.
Downloading a Web Page with the requests.get() Function
The requests.get() function takes a string of a URL to download. By calling type() on requests.get()’s return value, you can see that it returns a Response object, which contains the response that the web server gave for your request. I’ll explain the Response object in more detail later, but for now, enter the following into the interactive shell while your computer is connected to the Internet:
>>> import requests
➊ >>> res = requests.get('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')
>>> type(res)
<class 'requests.models.Response'>
➋ >>> res.status_code == requests.codes.ok
True
>>> len(res.text)
178981
>>> print(res.text[:250])
The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare
This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever. You may copy it, give it away or
re-use it under the terms of the Proje
The URL goes to a text web page for the entire play of Romeo and Juliet, provided by Project Gutenberg ➊. You can tell that the request for this web page succeeded by checking the status_code attribute of the Response object. If it is equal to the value of requests.codes.ok, then everything went fine ➋. (Incidentally, the status code for “OK” in the HTTP protocol is 200. You may already be familiar with the 404 status code for “Not Found.”)
If the request succeeded, the downloaded web page is stored as a string in the Response object’s text variable. This variable holds a large string of the entire play; the call to len(res.text) shows you that it is more than 178,000 characters long. Finally, calling print(res.text[:250]) displays only the first 250 characters.
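If you are curious about the numbers behind the status codes mentioned above, Python’s standard library can look them up by name. This sketch uses the http module’s HTTPStatus enumeration, which is not part of the requests module and is shown here only as an aside; the helper name is_ok is made up for the example.

```python
from http import HTTPStatus

# The two status codes mentioned above, looked up by name.
print(HTTPStatus.OK.value, HTTPStatus.OK.phrase)
print(HTTPStatus.NOT_FOUND.value, HTTPStatus.NOT_FOUND.phrase)

def is_ok(status_code):
    """Return True if status_code is the HTTP 'OK' code."""
    return status_code == HTTPStatus.OK
```

Comparing res.status_code against requests.codes.ok, as in the interactive shell example, performs the same kind of check.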
Checking for Errors
As you’ve seen, the Response object has a status_code attribute that can be checked against requests.codes.ok to see whether the download succeeded. A simpler way to check for success is to call the raise_for_status() method on the Response object. This will raise an exception if there was an error downloading the file and will do nothing if the download succeeded. Enter the following into the interactive shell:
>>> res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
>>> res.raise_for_status()
Traceback (most recent call last):
  File "<pyshell#138>", line 1, in <module>
    res.raise_for_status()
  File "C:\Python34\lib\site-packages\requests\models.py", line 773, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found
The raise_for_status() method is a good way to ensure that a program halts if a bad download occurs. This is a good thing: You want your program to stop as soon as some unexpected error happens. If a failed download isn’t a deal breaker for your program, you can wrap the raise_for_status() line with try and except statements to handle this error case without crashing.
import requests
res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
try:
    res.raise_for_status()
except Exception as exc:
    print('There was a problem: %s' % (exc))
This raise_for_status() method call causes the program to output the following:
There was a problem: 404 Client Error: Not Found
Always call raise_for_status() after calling requests.get(). You want to be sure that the download has actually worked before your program continues.
Saving Downloaded Files to the Hard Drive
From here, you can save the web page to a file on your hard drive with the standard open() function and write() method. There are some slight differences, though. First, you must open the file in write binary mode by passing the string 'wb' as the second argument to open(). Even if the page is in plaintext (such as the Romeo and Juliet text you downloaded earlier), you need to write binary data instead of text data in order to maintain the Unicode encoding of the text.
UNICODE ENCODINGS
Unicode encodings are beyond the scope of this book, but you can learn more about them from these web pages:
Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!): http://www.joelonsoftware.com/articles/Unicode.html
Pragmatic Unicode: http://nedbatchelder.com/text/unipain.html
To write the web page to a file, you can use a for loop with the Response object’s iter_content() method.
>>> import requests
>>> res = requests.get('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')
>>> res.raise_for_status()
>>> playFile = open('RomeoAndJuliet.txt', 'wb')
>>> for chunk in res.iter_content(100000):
        playFile.write(chunk)

100000
78981
>>> playFile.close()
The iter_content() method returns “chunks” of the content on each iteration through the loop. Each chunk is of the bytes data type, and you get to specify how many bytes each chunk will contain. One hundred thousand bytes is generally a good size, so pass 100000 as the argument to iter_content().
The file RomeoAndJuliet.txt will now exist in the current working directory. Note that while the filename on the website was pg1112.txt, the file on your hard drive has a different filename. The requests module simply handles downloading the contents of web pages. Once the page is downloaded, it is simply data in your program. Even if you were to lose your Internet connection after downloading the web page, all the page data would still be on your computer.
The write() method returns the number of bytes written to the file. In the previous example, there were 100,000 bytes in the first chunk, and the remaining part of the file needed only 78,981 bytes.
To review, here’s the complete process for downloading and saving a file:
1. Call requests.get() to download the file.
2. Call open() with 'wb' to create a new file in write binary mode.
3. Loop over the Response object’s iter_content() method.
4. Call write() on each iteration to write the content to the file.
5. Call close() to close the file.
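The chunk-by-chunk part of these steps can be tried without an Internet connection. In the sketch below, the function iter_chunks is a made-up stand-in for the Response object’s iter_content() method, the content variable is pretend data of the same length as the earlier download, and the file is written to a temporary folder; all of these are assumptions for illustration only.

```python
import os
import tempfile

def iter_chunks(data, chunk_size):
    """Yield successive chunk_size-byte pieces of data (stand-in for iter_content())."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

content = b'x' * 178981  # pretend this is the downloaded play

path = os.path.join(tempfile.mkdtemp(), 'RomeoAndJuliet.txt')
written = []
with open(path, 'wb') as play_file:                # 'wb' is write binary mode
    for chunk in iter_chunks(content, 100000):
        written.append(play_file.write(chunk))     # write() returns the byte count

print(written)
```

The with statement used here closes the file automatically when the block ends, which takes the place of the explicit close() call in step 5.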
That’s all there is to the requests module! The for loop and iter_content() stuff may seem complicated compared to the open()/write()/close() workflow you’ve been using to write text files, but it’s to ensure that the requests module doesn’t eat up too much memory even if you download massive files. You can learn about the requests module’s other features from http://requests.readthedocs.org/.
HTML
Before you pick apart web pages, you’ll learn some HTML basics. You’ll also see how to access your web browser’s powerful developer tools, which will make scraping information from the Web much easier.
Resources for Learning HTML
Hypertext Markup Language (HTML) is the format that web pages are written in. This chapter assumes you have some basic experience with HTML, but if you need a beginner tutorial, I suggest one of the following sites:
http://htmldog.com/guides/html/beginner/
http://www.codecademy.com/tracks/web/
https://developer.mozilla.org/en-US/learn/html/
A Quick Refresher
In case it’s been a while since you’ve looked at any HTML, here’s a quick overview of the basics. An HTML file is a plaintext file with the .html file extension. The text in these files is surrounded by tags, which are words enclosed in angle brackets. The tags tell the browser how to format the web page. A starting tag and closing tag can enclose some text to form an element. The text (or inner HTML) is the content between the starting and closing tags. For example, the following HTML will display Hello world! in the browser, with Hello in bold:
<strong>Hello</strong> world!
This HTML will look like Figure 11-1 in a browser.
Figure 11-1. Hello world! rendered in the browser
The opening <strong> tag says that the enclosed text will appear in bold. The closing </strong> tag tells the browser where the end of the bold text is.
There are many different tags in HTML. Some of these tags have extra properties in the form of attributes within the angle brackets. For example, the <a> tag encloses text that should be a link. The URL that the text links to is determined by the href attribute. Here’s an example:
Al's free <a href="http://inventwithpython.com">Python books</a>.
This HTML will look like Figure 11-2 in a browser.
Figure 11-2. The link rendered in the browser
Some elements have an id attribute that is used to uniquely identify the element in the page. You will often instruct your programs to seek out an element by its id attribute, so figuring out an element’s id attribute using the browser’s developer tools is a common task in writing web scraping programs.
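To see tags and attributes from a program’s point of view, here is a small sketch that pulls the href and id attributes out of a snippet of HTML. It uses the html.parser module from Python’s standard library rather than the Beautiful Soup module covered later in this chapter, and the class name AttributeCollector is invented for this example.

```python
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Record the href of every <a> tag and the id of every tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == 'a' and 'href' in attributes:
            self.hrefs.append(attributes['href'])
        if 'id' in attributes:
            self.ids.append(attributes['id'])

collector = AttributeCollector()
collector.feed('Al\'s free <a href="http://inventwithpython.com">Python books</a>. '
               'By <span id="author">Al Sweigart</span>')
print(collector.hrefs)
print(collector.ids)
```

This is only meant to show that attributes are name and value pairs attached to a starting tag; for real scraping work, a module built for the job will save you a lot of effort.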
Viewing the Source HTML of a Web Page
You’ll need to look at the HTML source of the web pages that your programs will work with. To do this, right-click (or CTRL-click on OS X) any web page in your web browser, and select View Source or View page source to see the HTML text of the page (see Figure 11-3). This is the text your browser actually receives. The browser knows how to display, or render, the web page from this HTML.
Figure 11-3. Viewing the source of a web page
I highly recommend viewing the source HTML of some of your favorite sites. It’s fine if you don’t fully understand what you are seeing when you look at the source. You won’t need HTML mastery to write simple web scraping programs. After all, you won’t be writing your own websites. You just need enough knowledge to pick out data from an existing site.
Opening Your Browser’s Developer Tools
In addition to viewing a web page’s source, you can look through a page’s HTML using your browser’s developer tools. In Chrome and Internet Explorer for Windows, the developer tools are already installed, and you can press F12 to make them appear (see Figure 11-4). Pressing F12 again will make the developer tools disappear. In Chrome, you can also bring up the developer tools by selecting View▸Developer▸Developer Tools. In OS X, pressing ⌘-OPTION-I will open Chrome’s Developer Tools.
Figure 11-4. The Developer Tools window in the Chrome browser
In Firefox, you can bring up the Web Developer Tools Inspector by pressing CTRL-SHIFT-C on Windows and Linux or by pressing ⌘-OPTION-C on OS X. The layout is almost identical to Chrome’s developer tools.
In Safari, open the Preferences window, and on the Advanced pane check the Show Develop menu in the menu bar option. After it has been enabled, you can bring up the developer tools by pressing ⌘-OPTION-I.
After enabling or installing the developer tools in your browser, you can right-click any part of the web page and select Inspect Element from the context menu to bring up the HTML responsible for that part of the page. This will be helpful when you begin to parse HTML for your web scraping programs.
DON’T USE REGULAR EXPRESSIONS TO PARSE HTML
Locating a specific piece of HTML in a string seems like a perfect case for regular expressions. However, I advise you against it. There are many different ways that HTML can be formatted and still be considered valid HTML, but trying to capture all these possible variations in a regular expression can be tedious and error prone. A module developed specifically for parsing HTML, such as Beautiful Soup, will be less likely to result in bugs.
You can find an extended argument for why you shouldn’t parse HTML with regular expressions at http://stackoverflow.com/a/1732454/1893164/.
Using the Developer Tools to Find HTML Elements
Once your program has downloaded a web page using the requests module, you will have the page’s HTML content as a single string value. Now you need to figure out which part of the HTML corresponds to the information on the web page you’re interested in.
This is where the browser’s developer tools can help. Say you want to write a program to pull weather forecast data from http://weather.gov/. Before writing any code, do a little research. If you visit the site and search for the 94105 ZIP code, the site will take you to a page showing the forecast for that area.
What if you’re interested in scraping the temperature information for that ZIP code? Right-click where it is on the page (or CONTROL-click on OS X) and select Inspect Element from the context menu that appears. This will bring up the Developer Tools window, which shows you the HTML that produces this particular part of the web page. Figure 11-5 shows the developer tools open to the HTML of the temperature.
Figure 11-5. Inspecting the element that holds the temperature text with the developer tools
From the developer tools, you can see that the HTML responsible for the temperature part of the web page is <p class="myforecast-current-lrg">57°F</p>. This is exactly what you were looking for! It seems that the temperature information is contained inside a <p> element with the myforecast-current-lrg class. Now that you know what you’re looking for, the Beautiful Soup module will help you find it in the string.
Parsing HTML with the Beautiful Soup Module
Beautiful Soup is a module for extracting information from an HTML page (and is much better for this purpose than regular expressions). The Beautiful Soup module’s name is bs4 (for Beautiful Soup, version 4). To install it, you will need to run pip install beautifulsoup4 from the command line. (Check out Appendix A for instructions on installing third-party modules.) While beautifulsoup4 is the name used for installation, to import Beautiful Soup you run import bs4.
For this chapter, the Beautiful Soup examples will parse (that is, analyze and identify the parts of) an HTML file on the hard drive. Open a new file editor window in IDLE, enter the following, and save it as example.html. Alternatively, download it from http://nostarch.com/automatestuff/.
<!-- This is the example.html example file. -->
<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>
As you can see, even a simple HTML file involves many different tags and attributes, and matters quickly get confusing with complex websites. Thankfully, Beautiful Soup makes working with HTML much easier.
Creating a BeautifulSoup Object from HTML
The bs4.BeautifulSoup() function needs to be called with a string containing the HTML it will parse. The bs4.BeautifulSoup() function returns a BeautifulSoup object. Enter the following into the interactive shell while your computer is connected to the Internet:
>>> import requests, bs4
>>> res = requests.get('http://nostarch.com')
>>> res.raise_for_status()
>>> noStarchSoup = bs4.BeautifulSoup(res.text)
>>> type(noStarchSoup)
<class 'bs4.BeautifulSoup'>
This code uses requests.get() to download the main page from the No Starch Press website and then passes the text attribute of the response to bs4.BeautifulSoup(). The BeautifulSoup object that it returns is stored in a variable named noStarchSoup.
You can also load an HTML file from your hard drive by passing a File object to bs4.BeautifulSoup(). Enter the following into the interactive shell (make sure the example.html file is in the working directory):
>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile)
>>> type(exampleSoup)
<class 'bs4.BeautifulSoup'>
Once you have a BeautifulSoup object, you can use its methods to locate specific parts of an HTML document.
Finding an Element with the select() Method
You can retrieve a web page element from a BeautifulSoup object by calling the select() method and passing a string of a CSS selector for the element you are looking for. Selectors are like regular expressions: They specify a pattern to look for, in this case, in HTML pages instead of general text strings.
A full discussion of CSS selector syntax is beyond the scope of this book (there’s a good selector tutorial in the resources at http://nostarch.com/automatestuff/), but here’s a short introduction to selectors. Table 11-2 shows examples of the most common CSS selector patterns.
Table 11-2. Examples of CSS Selectors
Selector passed to the select() method    Will match…
soup.select('div')                        All elements named <div>
soup.select('#author')                    The element with an id attribute of author
soup.select('.notice')                    All elements that use a CSS class attribute named notice
soup.select('div span')                   All elements named <span> that are within an element named <div>
soup.select('div > span')                 All elements named <span> that are directly within an element named <div>, with no other element in between
soup.select('input[name]')                All elements named <input> that have a name attribute with any value
soup.select('input[type="button"]')       All elements named <input> that have an attribute named type with value button
The various selector patterns can be combined to make sophisticated matches. For example, soup.select('p #author') will match any element that has an id attribute of author, as long as it is also inside a <p> element.
The select() method will return a list of Tag objects, which is how Beautiful Soup represents an HTML element. The list will contain one Tag object for every match in the BeautifulSoup object’s HTML. Tag values can be passed to the str() function to show the HTML tags they represent. Tag values also have an attrs attribute that shows all the HTML attributes of the tag as a dictionary. Using the example.html file from earlier, enter the following into the interactive shell:
>>> import bs4
>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile.read())
>>> elems = exampleSoup.select('#author')
>>> type(elems)
<class 'list'>
>>> len(elems)
1
>>> type(elems[0])
<class 'bs4.element.Tag'>
>>> elems[0].getText()
'Al Sweigart'
>>> str(elems[0])
'<span id="author">Al Sweigart</span>'
>>> elems[0].attrs
{'id': 'author'}
This code will pull the element with id="author" out of our example HTML. We use select('#author') to return a list of all the elements with id="author". We store this list of Tag objects in the variable elems, and len(elems) tells us there is one Tag object in the list; there was one match. Calling getText() on the element returns the element’s text, or inner HTML. The text of an element is the content between the opening and closing tags: in this case, 'Al Sweigart'.
Passing the element to str() returns a string with the starting and closing tags and the element’s text. Finally, attrs gives us a dictionary with the element’s attribute, 'id', and the value of the id attribute, 'author'.
You can also pull all the <p> elements from the BeautifulSoup object. Enter this into the interactive shell:
>>> pElems = exampleSoup.select('p')
>>> str(pElems[0])
'<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>'
>>> pElems[0].getText()
'Download my Python book from my website.'
>>> str(pElems[1])
'<p class="slogan">Learn Python the easy way!</p>'
>>> pElems[1].getText()
'Learn Python the easy way!'
>>> str(pElems[2])
'<p>By <span id="author">Al Sweigart</span></p>'
>>> pElems[2].getText()
'By Al Sweigart'
This time, select() gives us a list of three matches, which we store in pElems. Using str() on pElems[0], pElems[1], and pElems[2] shows you each element as a string, and using getText() on each element shows you its text.
Getting Data from an Element’s Attributes
The get() method for Tag objects makes it simple to access attribute values from an element. The method is passed a string of an attribute name and returns that attribute’s value. Using example.html, enter the following into the interactive shell:
>>> import bs4
>>> soup = bs4.BeautifulSoup(open('example.html'))
>>> spanElem = soup.select('span')[0]
>>> str(spanElem)
'<span id="author">Al Sweigart</span>'
>>> spanElem.get('id')
'author'
>>> spanElem.get('some_nonexistent_addr') == None
True
>>> spanElem.attrs
{'id': 'author'}
Here we use select() to find any <span> elements and then store the first matched element in spanElem. Passing the attribute name 'id' to get() returns the attribute’s value, 'author'.
Project: “I’m Feeling Lucky” Google Search
Whenever I search a topic on Google, I don’t look at just one search result at a time. By middle-clicking a search result link (or clicking while holding CTRL), I open the first several links in a bunch of new tabs to read later. I search Google often enough that this workflow (opening my browser, searching for a topic, and middle-clicking several links one by one) is tedious. It would be nice if I could simply type a search term on the command line and have my computer automatically open a browser with all the top search results in new tabs. Let’s write a script to do this.
This is what your program does:
Gets search keywords from the command line arguments.
Retrieves the search results page.
Opens a browser tab for each result.
This means your code will need to do the following:
Read the command line arguments from sys.argv.
Fetch the search result page with the requests module.
Find the links to each search result.
Call the webbrowser.open() function to open the web browser.
Open a new file editor window and save it as lucky.py.
Step 1: Get the Command Line Arguments and Request the Search Page
Before coding anything, you first need to know the URL of the search result page. By looking at the browser’s address bar after doing a Google search, you can see that the result page has a URL like https://www.google.com/search?q=SEARCH_TERM_HERE. The requests module can download this page and then you can use Beautiful Soup to find the search result links in the HTML. Finally, you’ll use the webbrowser module to open those links in browser tabs.
Make your code look like the following:
#! python3
# lucky.py - Opens several Google search results.
import requests, sys, webbrowser, bs4
print('Googling...') # display text while downloading the Google page
res = requests.get('http://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
# TODO: Retrieve top search result links.
# TODO: Open a browser tab for each result.
The user will specify the search terms using command line arguments when they launch the program. These arguments will be stored as strings in a list in sys.argv.
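Here is a small sketch of how the search URL is put together from the command line words. The function name build_search_url is made up for this example, and the use of urllib.parse.urlencode() is an addition that lucky.py itself does not make; it escapes spaces and symbols in the search terms, which plain string concatenation leaves alone.

```python
from urllib.parse import urlencode

def build_search_url(words):
    """Return a Google search URL for a list of search words."""
    query = ' '.join(words)
    # urlencode() escapes the query so spaces and symbols are URL-safe.
    return 'https://www.google.com/search?' + urlencode({'q': query})

# Stand-in for sys.argv[1:] after running: lucky python programming tutorials
print(build_search_url(['python', 'programming', 'tutorials']))
```

In the real program you would pass sys.argv[1:] instead of the hard-coded list.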
Step 2: Find All the Results
Now you need to use Beautiful Soup to extract the top search result links from your downloaded HTML. But how do you figure out the right selector for the job? For example, you can’t just search for all <a> tags, because there are lots of links you don’t care about in the HTML. Instead, you must inspect the search result page with the browser’s developer tools to try to find a selector that will pick out only the links you want.
After doing a Google search for Beautiful Soup, you can open the browser’s developer tools and inspect some of the link elements on the page. They look incredibly complicated, something like this:
<a href="/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&amp;ved=0CCgQFjAA&url=http%3A%2F%2Fwww.crummy.com%2Fsoftware%2FBeautifulSoup%2F&ei=LHBVU_XDD9KVyAShmYDwCw&usg=AFQjCNHAxwplurFOBqg5cehWQEVKi-TuLQ&sig2=sdZu6WVlBlVSDrwhtworMA" onmousedown="return rwt(this,'','','','1','AFQjCNHAxwplurFOBqg5cehWQEVKi-TuLQ','sdZu6WVlBlVSDrwhtworMA','0CCgQFjAA','','',event)" data-href="http://www.crummy.com/software/BeautifulSoup/"><em>Beautiful Soup</em>: We called him Tortoise because he taught us.</a>
It doesn’t matter that the element looks incredibly complicated. You just need to find the pattern that all the search result links have. But this <a> element doesn’t have anything that easily distinguishes it from the nonsearch result <a> elements on the page.
Make your code look like the following:
#! python3
# lucky.py - Opens several google search results.
import requests, sys, webbrowser, bs4
--snip--
# Retrieve top search result links.
soup = bs4.BeautifulSoup(res.text)
# Open a browser tab for each result.
linkElems = soup.select('.r a')
If you look up a little from the <a> element, though, there is an element like this: <h3 class="r">. Looking through the rest of the HTML source, it looks like the r class is used only for search result links. You don’t have to know what the CSS class r is or what it does. You’re just going to use it as a marker for the <a> element you are looking for. You can create a BeautifulSoup object from the downloaded page’s HTML text and then use the selector '.r a' to find all <a> elements that are within an element that has the r CSS class.
Step 3: Open Web Browsers for Each Result
Finally, we’ll tell the program to open web browser tabs for our results. Add the following to the end of your program:
#! python3
# lucky.py - Opens several google search results.
import requests, sys, webbrowser, bs4
--snip--
# Open a browser tab for each result.
linkElems = soup.select('.r a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))
By default, you open the first five search results in new tabs using the webbrowser module. However, the user may have searched for something that turned up fewer than five results. The soup.select() call returns a list of all the elements that matched your '.r a' selector, so the number of tabs you want to open is either 5 or the length of this list (whichever is smaller).
The built-in Python function min() returns the smallest of the integer or float arguments it is passed. (There is also a built-in max() function that returns the largest argument it is passed.) You can use min() to find out whether there are fewer than five links in the list and store the number of links to open in a variable named numOpen. Then you can run through a for loop by calling range(numOpen).
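The min() trick is easy to try on its own. In this sketch, the function name count_to_open and the two sample lists are invented for illustration; the lists stand in for whatever soup.select() returned.

```python
def count_to_open(links, limit=5):
    """Return how many links to open: at most limit, fewer if the list is short."""
    return min(limit, len(links))

few_links = ['a', 'b', 'c']      # stand-in for a short list of results
many_links = list(range(12))     # stand-in for a long list of results

print(count_to_open(few_links))
print(count_to_open(many_links))

# range() then never asks for an index past the end of the list.
for i in range(count_to_open(few_links)):
    print('Would open', few_links[i])
```

Because the count can never be larger than the list’s length, indexing with the loop variable cannot raise an IndexError.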
On each iteration of the loop, you use webbrowser.open() to open a new tab in the web browser. Note that the href attribute’s value in the returned <a> elements does not have the initial http://google.com part, so you have to concatenate that to the href attribute’s string value.
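String concatenation works when every href is relative to the site, as it is here. As an alternative sketch, the standard library’s urllib.parse.urljoin() function (not used by lucky.py) combines a base URL with an href and leaves an already complete URL alone. The shortened href below is made up for illustration.

```python
from urllib.parse import urljoin

base = 'http://google.com'
relative_href = '/url?sa=t&rct=j'   # shortened, made-up href for illustration

full_url = urljoin(base, relative_href)
print(full_url)

# An href that is already a complete URL is returned unchanged.
absolute = urljoin(base, 'http://www.crummy.com/software/BeautifulSoup/')
print(absolute)
```

Either approach gives webbrowser.open() a full URL to work with.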
Now you can instantly open the first five Google results for, say, Python programming tutorials by running lucky python programming tutorials on the command line! (See Appendix B for how to easily run programs on your operating system.)
Ideas for Similar Programs
The benefit of tabbed browsing is that you can easily open links in new tabs to peruse later. A program that automatically opens several links at once can be a nice shortcut to do the following:
Open all the product pages after searching a shopping site such as Amazon
Open all the links to reviews for a single product
Open the result links to photos after performing a search on a photo site such as Flickr or Imgur
Project: Downloading All XKCD Comics
Blogs and other regularly updating websites usually have a front page with the most recent post as well as a Previous button on the page that takes you to the previous post. Then that post will also have a Previous button, and so on, creating a trail from the most recent page to the first post on the site. If you wanted a copy of the site’s content to read when you’re not online, you could manually navigate over every page and save each one. But this is pretty boring work, so let’s write a program to do it instead.
XKCD is a popular geek webcomic with a website that fits this structure (see Figure 11-6). The front page at http://xkcd.com/ has a Prev button that guides the user back through prior comics. Downloading each comic by hand would take forever, but you can write a script to do this in a couple of minutes.
Here’s what your program does:
Loads the XKCD home page.
Saves the comic image on that page.
Follows the Previous Comic link.
Repeats until it reaches the first comic.
Figure 11-6. XKCD, “a webcomic of romance, sarcasm, math, and language”
This means your code will need to do the following:
Download pages with the requests module.
Find the URL of the comic image for a page using Beautiful Soup.
Download and save the comic image to the hard drive with iter_content().
Find the URL of the Previous Comic link, and repeat.
Open a new file editor window and save it as downloadXkcd.py.
Step 1: Design the Program
If you open the browser’s developer tools and inspect the elements on the page, you’ll find the following:
The URL of the comic’s image file is given by the src attribute of an <img> element.
The <img> element is inside a <div id="comic"> element.
The Prev button has a rel HTML attribute with the value prev.
The first comic’s Prev button links to the http://xkcd.com/# URL, indicating that there are no more previous pages.
Make your code look like the following:
#! python3
# downloadXkcd.py - Downloads every single XKCD comic.
import requests, os, bs4
url = 'http://xkcd.com' # starting url
os.makedirs('xkcd', exist_ok=True) # store comics in ./xkcd
while not url.endswith('#'):
    # TODO: Download the page.
    # TODO: Find the URL of the comic image.
    # TODO: Download the image.
    # TODO: Save the image to ./xkcd.
    # TODO: Get the Prev button's url.
print('Done.')
You’ll have a url variable that starts with the value 'http://xkcd.com' and repeatedly update it (in a while loop) with the URL of the current page’s Prev link. At every step in the loop, you’ll download the comic at url. You’ll know to end the loop when url ends with '#'.
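To see how the loop winds its way back to the first comic, here is a sketch that follows a made-up trail of Prev links stored in a dictionary. The PREV_LINKS dictionary and its three pages are assumptions standing in for the real website, so nothing is downloaded.

```python
# Made-up stand-in for the site: each page URL maps to its Prev link's href.
PREV_LINKS = {
    'http://xkcd.com': '/2/',
    'http://xkcd.com/2/': '/1/',
    'http://xkcd.com/1/': '#',   # the first comic's Prev link goes nowhere
}

url = 'http://xkcd.com'
visited = []
while not url.endswith('#'):
    visited.append(url)                          # "download" the page
    url = 'http://xkcd.com' + PREV_LINKS[url]    # follow the Prev link

print(visited)
```

After the oldest page is visited, url becomes 'http://xkcd.com#', the while condition turns False, and the loop ends.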
You will download the image files to a folder in the current working directory named xkcd. The call os.makedirs() ensures that this folder exists, and the exist_ok=True keyword argument prevents the function from throwing an exception if this folder already exists. The rest of the code is just comments that outline the rest of your program.
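You can watch exist_ok=True at work in a scratch folder. This sketch creates its folder inside a temporary directory (an assumption made so that nothing in your working directory is touched), calls os.makedirs() twice, and then shows what happens without the keyword argument.

```python
import os
import tempfile

base_dir = tempfile.mkdtemp()            # scratch folder so nothing real is touched
comics_dir = os.path.join(base_dir, 'xkcd')

os.makedirs(comics_dir, exist_ok=True)   # creates the folder
os.makedirs(comics_dir, exist_ok=True)   # second call is harmless

try:
    os.makedirs(comics_dir)              # without exist_ok, this raises
except FileExistsError:
    print('Folder already exists.')
```

This is why you can run downloadXkcd.py again and again without deleting the xkcd folder first.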
Step 2: Download the Web Page
Let’s implement the code for downloading the page. Make your code look like the following:
#! python3
# downloadXkcd.py - Downloads every single XKCD comic.
import requests, os, bs4
url = 'http://xkcd.com' # starting url
os.makedirs('xkcd', exist_ok=True) # store comics in ./xkcd
while not url.endswith('#'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text)
    # TODO: Find the URL of the comic image.
    # TODO: Download the image.
    # TODO: Save the image to ./xkcd.
    # TODO: Get the Prev button's url.
print('Done.')
First, print url so that the user knows which URL the program is about to download; then use the requests module’s requests.get() function to download it. As always, you immediately call the Response object’s raise_for_status() method to throw an exception and end the program if something went wrong with the download. Otherwise, you create a BeautifulSoup object from the text of the downloaded page.
Step 3: Find and Download the Comic Image
Make your code look like the following:
#! python3
# downloadXkcd.py - Downloads every single XKCD comic.
import requests, os, bs4
--snip--
    # Find the URL of the comic image.
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        comicUrl = comicElem[0].get('src')
        # Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()
        # TODO: Save the image to ./xkcd.
    # TODO: Get the Prev button's url.
print('Done.')
From inspecting the XKCD home page with your developer tools, you know that the <img> element for the comic image is inside a <div> element with the id attribute set to comic, so the selector '#comic img' will get you the correct <img> element from the BeautifulSoup object.
A few XKCD pages have special content that isn’t a simple image file. That’s fine; you’ll just skip those. If your selector doesn’t find any elements, then soup.select('#comic img') will return a blank list. When that happens, the program can just print an error message and move on without downloading the image.
Otherwise, the selector will return a list containing one <img> element. You can get the src attribute from this <img> element and pass it to requests.get() to download the comic’s image file.
Step 4: Save the Image and Find the Previous Comic
Make your code look like the following:
#! python3
# downloadXkcd.py - Downloads every single XKCD comic.
import requests, os, bs4
--snip--
        # Save the image to ./xkcd.
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()
    # Get the Prev button's url.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')
print('Done.')
At this point, the image file of the comic is stored in the res variable. You need to write this image data to a file on the hard drive.
You’ll need a filename for the local image file to pass to open(). The comicUrl will have a value like 'http://imgs.xkcd.com/comics/heartbleed_explanation.png', which you might have noticed looks a lot like a file path. And in fact, you can call os.path.basename() with comicUrl, and it will return just the last part of the URL, 'heartbleed_explanation.png'. You can use this as the filename when saving the image to your hard drive. You join this name with the name of your xkcd folder using os.path.join() so that your program uses backslashes (\) on Windows and forward slashes (/) on OS X and Linux. Now that you finally have the filename, you can call open() to open a new file in 'wb' “write binary” mode.
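As a minimal sketch of the filename step just described, the following uses only the standard library and the example URL quoted in the text. The variable names comic_url, filename, and local_path are illustrative and do not appear in the program.

```python
import os

# Example comic URL quoted in the text.
comic_url = 'http://imgs.xkcd.com/comics/heartbleed_explanation.png'

# basename() keeps only the part after the final slash.
filename = os.path.basename(comic_url)

# join() inserts the path separator appropriate for the current OS.
local_path = os.path.join('xkcd', filename)

print(filename)
print(local_path)
```

The printed path differs between Windows and other systems, which is the reason for calling os.path.join() instead of concatenating strings by hand.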
Remember from earlier in this chapter that to save files you’ve downloaded using Requests, you need to loop over the return value of the iter_content() method. The code in the for loop writes out chunks of the image data (at most 100,000 bytes each) to the file and then you close the file. The image is now saved to your hard drive.
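The chunked-write pattern can be tried without a network connection. In the sketch below, iter_chunks() is a hypothetical stand-in for the Response object's iter_content() method, and image_bytes is placeholder data rather than a real comic; only the loop shape mirrors the program.

```python
import os
import tempfile


def iter_chunks(data, chunk_size):
    """Hypothetical stand-in for iter_content(): yield pieces of data,
    each at most chunk_size bytes long."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


image_bytes = bytes(250000)   # placeholder for downloaded image data
chunk_sizes = []

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'comic.png')
    # The with statement closes the file even if a write fails.
    with open(path, 'wb') as image_file:
        for chunk in iter_chunks(image_bytes, 100000):
            image_file.write(chunk)
            chunk_sizes.append(len(chunk))
    saved_size = os.path.getsize(path)

print(chunk_sizes, saved_size)
```

Writing in chunks keeps memory use bounded by the chunk size, however large the download is.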
Afterward, the selector 'a[rel="prev"]' identifies the <a> element with the rel attribute set to prev, and you can use this <a> element’s href attribute to get the previous comic’s URL, which gets stored in url. Then the while loop begins the entire download process again for this comic.
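To see why the loop eventually stops, here is a small offline sketch of the same link-following logic. The prev_links dictionary is a made-up three-page site, not real XKCD data, and it assumes the oldest page's Prev link is '#', which is the value the loop condition checks for.

```python
# Hypothetical site map: each page URL maps to the href of its Prev link.
prev_links = {
    'http://xkcd.com': '/2/',
    'http://xkcd.com/2/': '/1/',
    'http://xkcd.com/1/': '#',   # assumed Prev link on the oldest page
}

url = 'http://xkcd.com'
visited = []
while not url.endswith('#'):
    visited.append(url)                         # "download" this page
    url = 'http://xkcd.com' + prev_links[url]   # follow the Prev link

print(visited)
print(url)
```

Each pass rebuilds url from the Prev link, so the loop condition is re-evaluated against the next page's address until the '#' link is reached.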
The output of this program will look like this:

Downloading page http://xkcd.com...
Downloading image http://imgs.xkcd.com/comics/phone_alarm.png...
Downloading page http://xkcd.com/1358/...
Downloading image http://imgs.xkcd.com/comics/nro.png...
Downloading page http://xkcd.com/1357/...
Downloading image http://imgs.xkcd.com/comics/free_speech.png...
Downloading page http://xkcd.com/1356/...
Downloading image http://imgs.xkcd.com/comics/orbital_mechanics.png...
Downloading page http://xkcd.com/1355/...
Downloading image http://imgs.xkcd.com/comics/airplane_message.png...
Downloading page http://xkcd.com/1354/...
Downloading image http://imgs.xkcd.com/comics/heartbleed_explanation.png...
--snip--
This project is a good example of a program that can automatically follow links in order to scrape large amounts of data from the Web. You can learn about Beautiful Soup’s other features from its documentation at http://www.crummy.com/software/BeautifulSoup/bs4/doc/.

Ideas for Similar Programs
Downloading pages and following links are the basis of many web crawling programs. Similar programs could also do the following:

Back up an entire site by following all of its links.
Copy all the messages off a web forum.
Duplicate the catalog of items for sale on an online store.

The requests and BeautifulSoup modules are great as long as you can figure out the URL you need to pass to requests.get(). However, sometimes this isn’t so easy to find. Or perhaps the website you want your program to navigate requires you to log in first. The selenium module will give your programs the power to perform such sophisticated tasks.
Controlling the Browser with the selenium Module
The selenium module lets Python directly control the browser by programmatically clicking links and filling in login information, almost as though there is a human user interacting with the page. Selenium allows you to interact with web pages in a much more advanced way than Requests and Beautiful Soup; but because it launches a web browser, it is a bit slower and hard to run in the background if, say, you just need to download some files from the Web.
Appendix A has more detailed steps on installing third-party modules.

Starting a Selenium-Controlled Browser
For these examples, you’ll need the Firefox web browser. This will be the browser that you control. If you don’t already have Firefox, you can download it for free from http://getfirefox.com/.
Importing the modules for Selenium is slightly tricky. Instead of import selenium, you need to run from selenium import webdriver. (The exact reason why the selenium module is set up this way is beyond the scope of this book.) After that, you can launch the Firefox browser with Selenium. Enter the following into the interactive shell:
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> type(browser)
<class 'selenium.webdriver.firefox.webdriver.WebDriver'>
>>> browser.get('http://inventwithpython.com')

You’ll notice when webdriver.Firefox() is called, the Firefox web browser starts up. Calling type() on the value webdriver.Firefox() reveals it’s of the WebDriver data type. And calling browser.get('http://inventwithpython.com') directs the browser to http://inventwithpython.com/. Your browser should look something like Figure 11-7.
Figure 11-7. After calling webdriver.Firefox() and get() in IDLE, the Firefox browser appears.
Finding Elements on the Page
WebDriver objects have quite a few methods for finding elements on a page. They are divided into the find_element_* and find_elements_* methods. The find_element_* methods return a single WebElement object, representing the first element on the page that matches your query. The find_elements_* methods return a list of WebElement objects for every matching element on the page.
Table 11-3 shows several examples of find_element_* and find_elements_* methods being called on a WebDriver object that’s stored in the variable browser.
Table 11-3. Selenium’s WebDriver Methods for Finding Elements

Method name    WebElement object/list returned

browser.find_element_by_class_name(name)
browser.find_elements_by_class_name(name)    Elements that use the CSS class name

browser.find_element_by_css_selector(selector)
browser.find_elements_by_css_selector(selector)    Elements that match the CSS selector

browser.find_element_by_id(id)
browser.find_elements_by_id(id)    Elements with a matching id attribute value

browser.find_element_by_link_text(text)
browser.find_elements_by_link_text(text)    <a> elements that completely match the text provided

browser.find_element_by_partial_link_text(text)
browser.find_elements_by_partial_link_text(text)    <a> elements that contain the text provided

browser.find_element_by_name(name)
browser.find_elements_by_name(name)    Elements with a matching name attribute value

browser.find_element_by_tag_name(name)
browser.find_elements_by_tag_name(name)    Elements with a matching tag name (case insensitive; an <a> element is matched by 'a' and 'A')
Except for the *_by_tag_name() methods, the arguments to all the methods are case sensitive. If no elements exist on the page that match what the method is looking for, the selenium module raises a NoSuchElementException. If you do not want this exception to crash your program, add try and except statements to your code.
Once you have the WebElement object, you can find out more about it by reading the attributes or calling the methods in Table 11-4.
Table 11-4. WebElement Attributes and Methods

Attribute or method    Description
tag_name               The tag name, such as 'a' for an <a> element
get_attribute(name)    The value for the element’s name attribute
text                   The text within the element, such as 'hello' in <span>hello</span>
clear()                For text field or text area elements, clears the text typed into it
is_displayed()         Returns True if the element is visible; otherwise returns False
is_enabled()           For input elements, returns True if the element is enabled; otherwise returns False
is_selected()          For checkbox or radio button elements, returns True if the element is selected; otherwise returns False
location               A dictionary with keys 'x' and 'y' for the position of the element in the page
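For example, open a new file editor and enter the following program:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.Firefox()
browser.get('http://inventwithpython.com')
try:
    elem = browser.find_element_by_class_name('bookcover')
    print('Found <%s> element with that class name!' % (elem.tag_name))
except NoSuchElementException:
    # Catch only the "not found" case so other errors are not hidden.
    print('Was not able to find an element with that name.')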
Here we open Firefox and direct it to a URL. On this page, we try to find elements with the class name 'bookcover', and if such an element is found, we print its tag name using the tag_name attribute. If no such element was found, we print a different message.
This program will output the following:

Found <img> element with that class name!

We found an element with the class name 'bookcover' and the tag name 'img'.
Clicking the Page
WebElement objects returned from the find_element_* and find_elements_* methods have a click() method that simulates a mouse click on that element. This method can be used to follow a link, make a selection on a radio button, click a Submit button, or trigger whatever else might happen when the element is clicked by the mouse. For example, enter the following into the interactive shell:

>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.get('http://inventwithpython.com')
>>> linkElem = browser.find_element_by_link_text('Read It Online')
>>> type(linkElem)
<class 'selenium.webdriver.remote.webelement.WebElement'>
>>> linkElem.click()   # follows the "Read It Online" link

This opens Firefox to http://inventwithpython.com/, gets the WebElement object for the <a> element with the text Read It Online, and then simulates clicking that <a> element. It’s just as if you had clicked the link yourself; the browser then follows that link.
Filling Out and Submitting Forms
Sending keystrokes to text fields on a web page is a matter of finding the <input> or <textarea> element for that text field and then calling the send_keys() method. For example, enter the following into the interactive shell:

>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.get('http://gmail.com')
>>> emailElem = browser.find_element_by_id('Email')
>>> emailElem.send_keys('[email protected]')
>>> passwordElem = browser.find_element_by_id('Passwd')
>>> passwordElem.send_keys('12345')
>>> passwordElem.submit()

As long as Gmail hasn’t changed the id of the Username and Password text fields since this book was published, the previous code will fill in those text fields with the provided text. (You can always use the browser’s inspector to verify the id.) Calling the submit() method on any element will have the same result as clicking the Submit button for the form that element is in. (You could have just as easily called emailElem.submit(), and the code would have done the same thing.)
Sending Special Keys
Selenium has a module for keyboard keys that are impossible to type into a string value, which function much like escape characters. These values are stored in attributes in the selenium.webdriver.common.keys module. Since that is such a long module name, it’s much easier to run from selenium.webdriver.common.keys import Keys at the top of your program; if you do, then you can simply write Keys anywhere you’d normally have to write selenium.webdriver.common.keys. Table 11-5 lists the commonly used Keys variables.

Table 11-5. Commonly Used Variables in the selenium.webdriver.common.keys Module

Attributes    Meanings
Keys.DOWN, Keys.UP, Keys.LEFT, Keys.RIGHT    The keyboard arrow keys
Keys.ENTER, Keys.RETURN    The ENTER and RETURN keys
Keys.HOME, Keys.END, Keys.PAGE_DOWN, Keys.PAGE_UP    The home, end, page down, and page up keys
Keys.ESCAPE, Keys.BACK_SPACE, Keys.DELETE    The ESC, BACKSPACE, and DELETE keys
Keys.F1, Keys.F2, ..., Keys.F12    The F1 to F12 keys at the top of the keyboard
Keys.TAB    The TAB key
For example, if the cursor is not currently in a text field, pressing the HOME and END keys will scroll the browser to the top and bottom of the page, respectively. Enter the following into the interactive shell, and notice how the send_keys() calls scroll the page:

>>> from selenium import webdriver
>>> from selenium.webdriver.common.keys import Keys
>>> browser = webdriver.Firefox()
>>> browser.get('http://nostarch.com')
>>> htmlElem = browser.find_element_by_tag_name('html')
>>> htmlElem.send_keys(Keys.END)     # scrolls to bottom
>>> htmlElem.send_keys(Keys.HOME)    # scrolls to top

The <html> tag is the base tag in HTML files: The full content of the HTML file is enclosed within the <html> and </html> tags. The element returned by browser.find_element_by_tag_name('html') is a good place to send keys to the general web page. This would be useful if, for example, new content is loaded once you’ve scrolled to the bottom of the page.
Clicking Browser Buttons
Selenium can simulate clicks on various browser buttons as well through the following methods:

browser.back(). Clicks the Back button.
browser.forward(). Clicks the Forward button.
browser.refresh(). Clicks the Refresh/Reload button.
browser.quit(). Clicks the Close Window button.

More Information on Selenium
Selenium can do much more beyond the functions described here. It can modify your browser’s cookies, take screenshots of web pages, and run custom JavaScript. To learn more about these features, you can visit the Selenium documentation at http://selenium-python.readthedocs.org/.

Summary
Most boring tasks aren’t limited to the files on your computer. Being able to programmatically download web pages will extend your programs to the Internet. The requests module makes downloading straightforward, and with some basic knowledge of HTML concepts and selectors, you can utilize the BeautifulSoup module to parse the pages you download.
But to fully automate any web-based tasks, you need direct control of your web browser through the selenium module. The selenium module will allow you to log in to websites and fill out forms automatically. Since a web browser is the most common way to send and receive information over the Internet, this is a great ability to have in your programmer toolkit.
Practice Questions
Q: 1. Briefly describe the differences between the webbrowser, requests, BeautifulSoup, and selenium modules.
Q: 2. What type of object is returned by requests.get()? How can you access the downloaded content as a string value?
Q: 3. What Requests method checks that the download worked?
Q: 4. How can you get the HTTP status code of a Requests response?
Q: 5. How do you save a Requests response to a file?
Q: 6. What is the keyboard shortcut for opening a browser’s developer tools?
Q: 7. How can you view (in the developer tools) the HTML of a specific element on a web page?
Q: 8. What is the CSS selector string that would find the element with an id attribute of main?
Q: 9. What is the CSS selector string that would find the elements with a CSS class of highlight?
Q: 10. What is the CSS selector string that would find all the <div> elements inside another <div> element?
Q: 11. What is the CSS selector string that would find the <button> element with a value attribute set to favorite?
Q: 12. Say you have a Beautiful Soup Tag object stored in the variable spam for the element <div>Hello world!</div>. How could you get a string 'Hello world!' from the Tag object?
Q: 13. How would you store all the attributes of a Beautiful Soup Tag object in a variable named linkElem?
Q: 14. Running import selenium doesn’t work. How do you properly import the selenium module?
Q: 15. What’s the difference between the find_element_* and find_elements_* methods?
Q: 16. What methods do Selenium’s WebElement objects have for simulating mouse clicks and keyboard keys?
Q: 17. You could call send_keys(Keys.ENTER) on the Submit button’s WebElement object, but what is an easier way to submit a form with Selenium?
Q: 18. How can you simulate clicking a browser’s Forward, Back, and Refresh buttons with Selenium?
Practice Projects
For practice, write programs to do the following tasks.

Command Line Emailer
Write a program that takes an email address and string of text on the command line and then, using Selenium, logs into your email account and sends an email of the string to the provided address. (You might want to set up a separate email account for this program.)
This would be a nice way to add a notification feature to your programs. You could also write a similar program to send messages from a Facebook or Twitter account.

Image Site Downloader
Write a program that goes to a photo-sharing site like Flickr or Imgur, searches for a category of photos, and then downloads all the resulting images. You could write a program that works with any photo site that has a search feature.

2048
2048 is a simple game where you combine tiles by sliding them up, down, left, or right with the arrow keys. You can actually get a fairly high score by repeatedly sliding in an up, right, down, and left pattern over and over again. Write a program that will open the game at https://gabrielecirulli.github.io/2048/ and keep sending up, right, down, and left keystrokes to automatically play the game.

Link Verification
Write a program that, given the URL of a web page, will attempt to download every linked page on the page. The program should flag any pages that have a 404 “Not Found” status code and print them out as broken links.

[2] The answer is no.
Chapter 12. Working with Excel Spreadsheets
Excel is a popular and powerful spreadsheet application for Windows. The openpyxl module allows your Python programs to read and modify Excel spreadsheet files. For example, you might have the boring task of copying certain data from one spreadsheet and pasting it into another one. Or you might have to go through thousands of rows and pick out just a handful of them to make small edits based on some criteria. Or you might have to look through hundreds of spreadsheets of department budgets, searching for any that are in the red. These are exactly the sort of boring, mindless spreadsheet tasks that Python can do for you.
Although Excel is proprietary software from Microsoft, there are free alternatives that run on Windows, OS X, and Linux. Both LibreOffice Calc and OpenOffice Calc work with Excel’s .xlsx file format for spreadsheets, which means the openpyxl module can work on spreadsheets from these applications as well. You can download the software from https://www.libreoffice.org/ and http://www.openoffice.org/, respectively. Even if you already have Excel installed on your computer, you may find these programs easier to use. The screenshots in this chapter, however, are all from Excel 2010 on Windows 7.

Excel Documents
First, let’s go over some basic definitions: An Excel spreadsheet document is called a workbook. A single workbook is saved in a file with the .xlsx extension. Each workbook can contain multiple sheets (also called worksheets). The sheet the user is currently viewing (or last viewed before closing Excel) is called the active sheet.
Each sheet has columns (addressed by letters starting at A) and rows (addressed by numbers starting at 1). A box at a particular column and row is called a cell. Each cell can contain a number or text value. The grid of cells with data makes up a sheet.

Installing the openpyxl Module
Python does not come with OpenPyXL, so you’ll have to install it. Follow the instructions for installing third-party modules in Appendix A; the name of the module is openpyxl. To test whether it is installed correctly, enter the following into the interactive shell:
>>> import openpyxl

If the module was correctly installed, this should produce no error messages. Remember to import the openpyxl module before running the interactive shell examples in this chapter, or you’ll get a NameError: name 'openpyxl' is not defined error.
This book covers version 2.1.4 of OpenPyXL, but new versions are regularly released by the OpenPyXL team. Don’t worry, though: New versions should stay backward compatible with the instructions in this book for quite some time. If you have a newer version and want to see what additional features may be available to you, you can check out the full documentation for OpenPyXL at http://openpyxl.readthedocs.org/.
Reading Excel Documents
The examples in this chapter will use a spreadsheet named example.xlsx stored in the root folder. You can either create the spreadsheet yourself or download it from http://nostarch.com/automatestuff/. Figure 12-1 shows the tabs for the three default sheets named Sheet1, Sheet2, and Sheet3 that Excel automatically provides for new workbooks. (The number of default sheets created may vary between operating systems and spreadsheet programs.)
Figure 12-1. The tabs for a workbook’s sheets are in the lower-left corner of Excel.
Sheet 1 in the example file should look like Table 12-1. (If you didn’t download example.xlsx from the website, you should enter this data into the sheet yourself.)

Table 12-1. The example.xlsx Spreadsheet

     A                        B               C
1    4/5/2015 1:34:02 PM      Apples          73
2    4/5/2015 3:41:23 AM      Cherries        85
3    4/6/2015 12:46:51 PM     Pears           14
4    4/8/2015 8:59:43 AM      Oranges         52
5    4/10/2015 2:07:00 AM     Apples          152
6    4/10/2015 6:10:37 PM     Bananas         23
7    4/10/2015 2:40:46 AM     Strawberries    98

Now that we have our example spreadsheet, let’s see how we can manipulate it with the openpyxl module.
Opening Excel Documents with OpenPyXL
Once you’ve imported the openpyxl module, you’ll be able to use the openpyxl.load_workbook() function. Enter the following into the interactive shell:

>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> type(wb)
<class 'openpyxl.workbook.workbook.Workbook'>

The openpyxl.load_workbook() function takes in the filename and returns a value of the Workbook data type. This Workbook object represents the Excel file, a bit like how a File object represents an opened text file.
Remember that example.xlsx needs to be in the current working directory in order for you to work with it. You can find out what the current working directory is by importing os and using os.getcwd(), and you can change the current working directory using os.chdir().
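The effect of the current working directory on a relative filename can be checked with the standard library alone. This sketch switches into a temporary folder (an assumption made so the example runs anywhere) and shows where the relative name 'example.xlsx' would be looked up; no spreadsheet file is actually created or opened.

```python
import os
import tempfile

original_dir = os.getcwd()

with tempfile.TemporaryDirectory() as folder:
    os.chdir(folder)
    try:
        current = os.getcwd()
        # A relative name is resolved against the current working directory.
        resolved = os.path.abspath('example.xlsx')
    finally:
        # Switch back before the temporary folder is removed.
        os.chdir(original_dir)

print(current)
print(resolved)
```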
Getting Sheets from the Workbook
You can get a list of all the sheet names in the workbook by calling the get_sheet_names() method. Enter the following into the interactive shell:

>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> wb.get_sheet_names()
['Sheet1', 'Sheet2', 'Sheet3']
>>> sheet = wb.get_sheet_by_name('Sheet3')
>>> sheet
<Worksheet "Sheet3">
>>> type(sheet)
<class 'openpyxl.worksheet.worksheet.Worksheet'>
>>> sheet.title
'Sheet3'
>>> anotherSheet = wb.get_active_sheet()
>>> anotherSheet
<Worksheet "Sheet1">

Each sheet is represented by a Worksheet object, which you can obtain by passing the sheet name string to the get_sheet_by_name() workbook method. Finally, you can call the get_active_sheet() method of a Workbook object to get the workbook’s active sheet. The active sheet is the sheet that’s on top when the workbook is opened in Excel. Once you have the Worksheet object, you can get its name from the title attribute.
Getting Cells from the Sheets
Once you have a Worksheet object, you can access a Cell object by its name. Enter the following into the interactive shell:

>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_sheet_by_name('Sheet1')
>>> sheet['A1']
<Cell Sheet1.A1>
>>> sheet['A1'].value
datetime.datetime(2015, 4, 5, 13, 34, 2)
>>> c = sheet['B1']
>>> c.value
'Apples'
>>> 'Row ' + str(c.row) + ', Column ' + c.column + ' is ' + c.value
'Row 1, Column B is Apples'
>>> 'Cell ' + c.coordinate + ' is ' + c.value
'Cell B1 is Apples'
>>> sheet['C1'].value
73

The Cell object has a value attribute that contains, unsurprisingly, the value stored in that cell. Cell objects also have row, column, and coordinate attributes that provide location information for the cell.
Here, accessing the value attribute of our Cell object for cell B1 gives us the string 'Apples'. The row attribute gives us the integer 1, the column attribute gives us 'B', and the coordinate attribute gives us 'B1'.
OpenPyXL will automatically interpret the dates in column A and return them as datetime values rather than strings. The datetime data type is explained further in Chapter 16.
Specifying a column by letter can be tricky to program, especially because after column Z, the columns start by using two letters: AA, AB, AC, and so on. As an alternative, you can also get a cell using the sheet’s cell() method and passing integers for its row and column keyword arguments. The first row or column integer is 1, not 0. Continue the interactive shell example by entering the following:
>>> sheet.cell(row=1, column=2)
<Cell Sheet1.B1>
>>> sheet.cell(row=1, column=2).value
'Apples'
>>> for i in range(1, 8, 2):
        print(i, sheet.cell(row=i, column=2).value)

1 Apples
3 Pears
5 Apples
7 Strawberries

As you can see, using the sheet’s cell() method and passing it row=1 and column=2 gets you a Cell object for cell B1, just like specifying sheet['B1'] did. Then, using the cell() method and its keyword arguments, you can write a for loop to print the values of a series of cells.
Say you want to go down column B and print the value in every cell with an odd row number. By passing 2 for the range() function’s “step” parameter, you can get cells from every second row (in this case, all the odd-numbered rows). The for loop’s i variable is passed for the row keyword argument to the cell() method, while 2 is always passed for the column keyword argument. Note that the integer 2, not the string 'B', is passed.
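The row stepping can be tried without a spreadsheet. In this sketch a plain list, column_b, stands in for column B of Table 12-1; the list and the odd_rows name are illustrative only, and no OpenPyXL call is involved.

```python
# Column B of the example sheet, copied from Table 12-1 (rows 1 to 7).
column_b = ['Apples', 'Cherries', 'Pears', 'Oranges',
            'Apples', 'Bananas', 'Strawberries']

odd_rows = []
for row in range(1, 8, 2):   # start at 1, stop before 8, step by 2
    # Spreadsheet rows start at 1, but list indexes start at 0.
    odd_rows.append((row, column_b[row - 1]))

for row, value in odd_rows:
    print(row, value)
```

The row - 1 adjustment is needed only because a Python list is zero-based; the cell() method takes the 1-based row number directly.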
You can determine the size of the sheet with the Worksheet object’s get_highest_row() and get_highest_column() methods. Enter the following into the interactive shell:

>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_sheet_by_name('Sheet1')
>>> sheet.get_highest_row()
7
>>> sheet.get_highest_column()
3

Note that the get_highest_column() method returns an integer rather than the letter that appears in Excel.
Converting Between Column Letters and Numbers
To convert from letters to numbers, call the openpyxl.cell.column_index_from_string() function. To convert from numbers to letters, call the openpyxl.cell.get_column_letter() function. Enter the following into the interactive shell:

>>> import openpyxl
>>> from openpyxl.cell import get_column_letter, column_index_from_string
>>> get_column_letter(1)
'A'
>>> get_column_letter(2)
'B'
>>> get_column_letter(27)
'AA'
>>> get_column_letter(900)
'AHP'
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_sheet_by_name('Sheet1')
>>> get_column_letter(sheet.get_highest_column())
'C'
>>> column_index_from_string('A')
1
>>> column_index_from_string('AA')
27

After you import these two functions from the openpyxl.cell module, you can call get_column_letter() and pass it an integer like 27 to figure out what the letter name of the 27th column is. The function column_index_from_string() does the reverse: You pass it the letter name of a column, and it tells you what number that column is. You don’t need to have a workbook loaded to use these functions. If you want, you can load a workbook, get a Worksheet object, and call a Worksheet object method like get_highest_column() to get an integer. Then, you can pass that integer to get_column_letter().
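For readers curious how such a conversion can work, here is a standalone sketch of the arithmetic. The functions column_letter() and column_index() are hypothetical names written for this illustration; they are not OpenPyXL's implementation, only a plain-Python approximation of the same mapping.

```python
def column_letter(index):
    """Return the letter name for a 1-based column index (1 -> 'A')."""
    if index < 1:
        raise ValueError('column index must be 1 or greater')
    letters = ''
    while index > 0:
        # There is no "zero" letter, so shift by one before dividing.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


def column_index(letters):
    """Return the 1-based column index for a letter name ('AA' -> 27)."""
    index = 0
    for ch in letters.upper():
        if not 'A' <= ch <= 'Z':
            raise ValueError('column names use the letters A to Z only')
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index


print(column_letter(27), column_letter(900))
print(column_index('A'), column_index('AA'))
```

The values agree with the interactive shell session shown earlier: 27 maps to 'AA' and 900 maps to 'AHP'.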
Getting Rows and Columns from the Sheets
You can slice Worksheet objects to get all the Cell objects in a row, column, or rectangular area of the spreadsheet. Then you can loop over all the cells in the slice. Enter the following into the interactive shell:

  >>> import openpyxl
  >>> wb = openpyxl.load_workbook('example.xlsx')
  >>> sheet = wb.get_sheet_by_name('Sheet1')
  >>> tuple(sheet['A1':'C3'])
  ((<Cell Sheet1.A1>, <Cell Sheet1.B1>, <Cell Sheet1.C1>), (<Cell Sheet1.A2>,
  <Cell Sheet1.B2>, <Cell Sheet1.C2>), (<Cell Sheet1.A3>, <Cell Sheet1.B3>,
  <Cell Sheet1.C3>))
➊ >>> for rowOfCellObjects in sheet['A1':'C3']:
➋         for cellObj in rowOfCellObjects:
              print(cellObj.coordinate, cellObj.value)
          print('--- END OF ROW ---')

  A1 2015-04-05 13:34:02
  B1 Apples
  C1 73
  --- END OF ROW ---
  A2 2015-04-05 03:41:23
  B2 Cherries
  C2 85
  --- END OF ROW ---
  A3 2015-04-06 12:46:51
  B3 Pears
  C3 14
  --- END OF ROW ---

Here, we specify that we want the Cell objects in the rectangular area from A1 to C3, and we get a Generator object containing the Cell objects in that area. To help us visualize this Generator object, we can use tuple() on it to display its Cell objects in a tuple.
This tuple contains three tuples: one for each row, from the top of the desired area to the bottom. Each of these three inner tuples contains the Cell objects in one row of our desired area, from the leftmost cell to the right. So overall, our slice of the sheet contains all the Cell objects in the area from A1 to C3, starting from the top-left cell and ending with the bottom-right cell.
To print the values of each cell in the area, we use two for loops. The outer for loop goes over each row in the slice ➊. Then, for each row, the nested for loop goes through each cell in that row ➋.
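The shape of that slice, and the order in which the two loops visit it, can be mimicked with plain strings. In this sketch the coordinate strings stand in for Cell objects; area and visited are illustrative names and nothing here calls OpenPyXL.

```python
# Build a tuple of row tuples for the area A1:C3, using coordinate
# strings as stand-ins for Cell objects.
area = tuple(
    tuple(column + str(row) for column in 'ABC')
    for row in range(1, 4)
)

visited = []
for row_of_cells in area:            # outer loop: one pass per row
    for coordinate in row_of_cells:  # inner loop: left to right in the row
        visited.append(coordinate)
    visited.append('--- END OF ROW ---')

print(area)
print(visited)
```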
To access the values of cells in a particular row or column, you can also use a Worksheet object’s rows and columns attribute. Enter the following into the interactive shell:

>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_active_sheet()
>>> sheet.columns[1]
(<Cell Sheet1.B1>, <Cell Sheet1.B2>, <Cell Sheet1.B3>, <Cell Sheet1.B4>,
<Cell Sheet1.B5>, <Cell Sheet1.B6>, <Cell Sheet1.B7>)
>>> for cellObj in sheet.columns[1]:
        print(cellObj.value)

Apples
Cherries
Pears
Oranges
Apples
Bananas
Strawberries

Using the rows attribute on a Worksheet object will give you a tuple of tuples. Each of these inner tuples represents a row, and contains the Cell objects in that row. The columns attribute also gives you a tuple of tuples, with each of the inner tuples containing the Cell objects in a particular column. For example.xlsx, since there are 7 rows and 3 columns, rows gives us a tuple of 7 tuples (each containing 3 Cell objects), and columns gives us a tuple of 3 tuples (each containing 7 Cell objects).
To access one particular tuple, you can refer to it by its index in the larger tuple. For example, to get the tuple that represents column B, you use sheet.columns[1]. To get the tuple containing the Cell objects in column A, you’d use sheet.columns[0]. Once you have a tuple representing one row or column, you can loop through its Cell objects and print their values.
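The relationship between the two views can be illustrated with ordinary tuples. This sketch copies the values of Table 12-1 into a rows tuple (the dates kept as plain strings for simplicity) and derives a columns tuple with zip(); it is a stand-in for the idea, not a description of how OpenPyXL builds these attributes.

```python
# The seven rows of Table 12-1 as a tuple of 3-item tuples.
rows = (
    ('4/5/2015 1:34:02 PM', 'Apples', 73),
    ('4/5/2015 3:41:23 AM', 'Cherries', 85),
    ('4/6/2015 12:46:51 PM', 'Pears', 14),
    ('4/8/2015 8:59:43 AM', 'Oranges', 52),
    ('4/10/2015 2:07:00 AM', 'Apples', 152),
    ('4/10/2015 6:10:37 PM', 'Bananas', 23),
    ('4/10/2015 2:40:46 AM', 'Strawberries', 98),
)

# zip(*rows) regroups the same values by column instead of by row.
columns = tuple(zip(*rows))

print(len(rows), len(rows[0]))        # 7 rows of 3 values
print(len(columns), len(columns[0]))  # 3 columns of 7 values
print(columns[1])                     # index 1 is column B
```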
Workbooks, Sheets, Cells
As a quick review, here’s a rundown of all the functions, methods, and data types involved in reading a cell out of a spreadsheet file:

1. Import the openpyxl module.
2. Call the openpyxl.load_workbook() function.
3. Get a Workbook object.
4. Call the get_active_sheet() or get_sheet_by_name() workbook method.
5. Get a Worksheet object.
6. Use indexing or the cell() sheet method with row and column keyword arguments.
7. Get a Cell object.
8. Read the Cell object’s value attribute.
Project: Reading Data from a Spreadsheet
Say you have a spreadsheet of data from the 2010 US Census and you have the boring task of going through its thousands of rows to count both the total population and the number of census tracts for each county. (A census tract is simply a geographic area defined for the purposes of the census.) Each row represents a single census tract. We’ll name the spreadsheet file censuspopdata.xlsx, and you can download it from http://nostarch.com/automatestuff/. Its contents look like Figure 12-2.
Figure 12-2. The censuspopdata.xlsx spreadsheet
Even though Excel can calculate the sum of multiple selected cells, you’d still have to select the cells for each of the 3,000-plus counties. Even if it takes just a few seconds to calculate a county’s population by hand, this would take hours to do for the whole spreadsheet.
In this project, you’ll write a script that can read from the census spreadsheet file and calculate statistics for each county in a matter of seconds.
This is what your program does:

Reads the data from the Excel spreadsheet.
Counts the number of census tracts in each county.
Counts the total population of each county.
Prints the results.

This means your code will need to do the following:

Open and read the cells of an Excel document with the openpyxl module.
Calculate all the tract and population data and store it in a data structure.
Write the data structure to a text file with the .py extension using the pprint module.
Step 1: Read the Spreadsheet Data
There is just one sheet in the censuspopdata.xlsx spreadsheet, named 'Population by Census Tract', and each row holds the data for a single census tract. The columns are the tract number (A), the state abbreviation (B), the county name (C), and the population of the tract (D).
Open a new file editor window and enter the following code. Save the file as readCensusExcel.py.

  #! python3
  # readCensusExcel.py - Tabulates population and number of census tracts for
  # each county.

➊ import openpyxl, pprint
  print('Opening workbook...')
➋ wb = openpyxl.load_workbook('censuspopdata.xlsx')
➌ sheet = wb.get_sheet_by_name('Population by Census Tract')
  countyData = {}

  # TODO: Fill in countyData with each county's population and tracts.
  print('Reading rows...')
➍ for row in range(2, sheet.get_highest_row() + 1):
      # Each row in the spreadsheet has data for one census tract.
      state  = sheet['B' + str(row)].value
      county = sheet['C' + str(row)].value
      pop    = sheet['D' + str(row)].value

  # TODO: Open a new text file and write the contents of countyData to it.

This code imports the openpyxl module, as well as the pprint module that you’ll use to print the final county data ➊. Then it opens the censuspopdata.xlsx file ➋, gets the sheet with the census data ➌, and begins iterating over its rows ➍.
Note that you’ve also created a variable named countyData, which will contain the populations and number of tracts you calculate for each county. Before you can store anything in it, though, you should determine exactly how you’ll structure the data inside it.
Step 2: Populate the Data Structure
The data structure stored in countyData will be a dictionary with state abbreviations as its keys. Each state abbreviation will map to another dictionary, whose keys are strings of the county names in that state. Each county name will in turn map to a dictionary with just two keys, 'tracts' and 'pop'. These keys map to the number of census tracts and population for the county. For example, the dictionary will look similar to this:

{'AK': {'Aleutians East': {'pop': 3141, 'tracts': 1},
        'Aleutians West': {'pop': 5561, 'tracts': 2},
        'Anchorage': {'pop': 291826, 'tracts': 55},
        'Bethel': {'pop': 17013, 'tracts': 3},
        'Bristol Bay': {'pop': 997, 'tracts': 1},
        --snip--

If the previous dictionary were stored in countyData, the following expressions would evaluate like this:

>>> countyData['AK']['Anchorage']['pop']
291826
>>> countyData['AK']['Anchorage']['tracts']
55

More generally, the countyData dictionary’s keys will look like this:

countyData[state abbrev][county]['tracts']
countyData[state abbrev][county]['pop']

Now that you know how countyData will be structured, you can write the code that will fill it with the county data. Add the following code to the bottom of your program:
  #! python3
  # readCensusExcel.py - Tabulates population and number of census tracts for
  # each county.

  --snip--

  for row in range(2, sheet.get_highest_row() + 1):
      # Each row in the spreadsheet has data for one census tract.
      state  = sheet['B' + str(row)].value
      county = sheet['C' + str(row)].value
      pop    = sheet['D' + str(row)].value

      # Make sure the key for this state exists.
➊     countyData.setdefault(state, {})
      # Make sure the key for this county in this state exists.
➋     countyData[state].setdefault(county, {'tracts': 0, 'pop': 0})

      # Each row represents one census tract, so increment by one.
➌     countyData[state][county]['tracts'] += 1
      # Increase the county pop by the pop in this census tract.
➍     countyData[state][county]['pop'] += int(pop)

  # TODO: Open a new text file and write the contents of countyData to it.
The last two lines of code perform the actual calculation work, incrementing the value for tracts ➌ and increasing the value for pop ➍ for the current county on each iteration of the for loop.
The other code is there because you cannot add a county dictionary as the value for a state abbreviation key until the key itself exists in countyData. (That is, countyData['AK']['Anchorage']['tracts'] += 1 will cause an error if the 'AK' key doesn’t exist yet.) To make sure the state abbreviation key exists in your data structure, you need to call the setdefault() method to set a value if one does not already exist for state ➊.
Just as the countyData dictionary needs a dictionary as the value for each state abbreviation key, each of those dictionaries will need its own dictionary as the value for each county key ➋. And each of those dictionaries in turn will need keys 'tracts' and 'pop' that start with the integer value 0. (If you ever lose track of the dictionary structure, look back at the example dictionary at the start of this section.)
Since setdefault() will do nothing if the key already exists, you can call it on every iteration of the for loop without a problem.
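The tallying logic can be exercised on its own with a handful of rows. The sample_rows list below is invented for illustration (the populations are not census figures), and the names sample_rows and county_data are hypothetical; only the setdefault() pattern mirrors the program.

```python
# Made-up (state, county, population) rows standing in for spreadsheet
# rows; the numbers are illustrative, not real census data.
sample_rows = [
    ('AK', 'Anchorage', 100),
    ('AK', 'Anchorage', 250),
    ('AK', 'Bethel', 40),
    ('AL', 'Autauga', 75),
]

county_data = {}
for state, county, pop in sample_rows:
    # Create the nested dictionaries only if they are missing.
    county_data.setdefault(state, {})
    county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})

    # One row is one tract; add its population to the county total.
    county_data[state][county]['tracts'] += 1
    county_data[state][county]['pop'] += int(pop)

print(county_data)
```

Because the second Anchorage row finds its keys already present, setdefault() leaves the running totals untouched and the counts accumulate correctly.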
Step3:WritetheResultstoaFileAftertheforloophasfinished,thecountyDatadictionarywillcontainallofthepopulationandtractinformationkeyedbycountyandstate.Atthispoint,youcouldprogrammorecodetowritethistoatextfileoranotherExcelspreadsheet.Fornow,let’sjustusethepprint.pformat()functiontowritethecountyDatadictionaryvalueasamassivestringtoafilenamedcensus2010.py.Addthefollowingcodetothebottomofyourprogram(makingsuretokeepitunindentedsothatitstaysoutsidetheforloop):
#! python3
# readCensusExcel.py - Tabulates population and number of census tracts for
# each county.

--snip--

for row in range(2, sheet.get_highest_row() + 1):
--snip--

# Open a new text file and write the contents of countyData to it.
print('Writing results...')
resultFile = open('census2010.py', 'w')
resultFile.write('allData = ' + pprint.pformat(countyData))
resultFile.close()
print('Done.')
The pprint.pformat() function produces a string that itself is formatted as valid Python code. By outputting it to a text file named census2010.py, you've generated a Python program from your Python program! This may seem complicated, but the advantage is that you can now import census2010.py just like any other Python module. In the interactive shell, change the current working directory to the folder with your newly created census2010.py file (on my laptop, this is C:\Python34), and then import it:
>>> import os
>>> os.chdir('C:\\Python34')
>>> import census2010
>>> census2010.allData['AK']['Anchorage']
{'pop': 291826, 'tracts': 55}
>>> anchoragePop = census2010.allData['AK']['Anchorage']['pop']
>>> print('The 2010 population of Anchorage was ' + str(anchoragePop))
The 2010 population of Anchorage was 291826
ThereadCensusExcel.pyprogramwasthrowawaycode:Onceyouhaveitsresultssavedtocensus2010.py,youwon’tneedtoruntheprogramagain.Wheneveryouneedthecountydata,youcanjustrunimportcensus2010.
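To see why the pprint.pformat() output can be imported, it helps to check that the string really is a valid Python literal. This is a small sketch that parses the string back with ast.literal_eval() instead of writing and importing a file; that substitution is an assumption made here to keep the example free of side effects, and it uses only the Anchorage figures shown above.

```python
import ast
import pprint

countyData = {'AK': {'Anchorage': {'pop': 291826, 'tracts': 55}}}

# This is the same text the program writes to census2010.py.
text = 'allData = ' + pprint.pformat(countyData)

# Take the part after "allData = " and parse it as a Python literal.
literal = text.split(' = ', 1)[1]
restored = ast.literal_eval(literal)

print(restored == countyData)  # True
```

Importing census2010.py does the same job: Python evaluates the dictionary literal and binds it to the name allData.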
Calculatingthisdatabyhandwouldhavetakenhours;thisprogramdiditinafewseconds.UsingOpenPyXL,youwillhavenotroubleextractinginformationthatissavedtoanExcelspreadsheetandperformingcalculationsonit.Youcandownloadthecompleteprogramfromhttp://nostarch.com/automatestuff/.
IdeasforSimilarProgramsManybusinessesandofficesuseExceltostorevarioustypesofdata,andit’snotuncommonforspreadsheetstobecomelargeandunwieldy.AnyprogramthatparsesanExcelspreadsheethasasimilarstructure:Itloadsthespreadsheetfile,prepssomevariablesordatastructures,andthenloopsthrougheachoftherowsinthespreadsheet.Suchaprogramcoulddothefollowing:
Compare data across multiple rows in a spreadsheet.
Open multiple Excel files and compare data between spreadsheets.
Check whether a spreadsheet has blank rows or invalid data in any cells and alert the user if it does.
Read data from a spreadsheet and use it as the input for your Python programs.
WritingExcelDocumentsOpenPyXLalsoprovideswaysofwritingdata,meaningthatyourprogramscancreateandeditspreadsheetfiles.WithPython,it’ssimpletocreatespreadsheetswiththousandsofrowsofdata.
CreatingandSavingExcelDocumentsCalltheopenpyxl.Workbook()functiontocreateanew,blankWorkbookobject.Enterthefollowingintotheinteractiveshell:
>>>importopenpyxl
>>>wb=openpyxl.Workbook()
>>>wb.get_sheet_names()
['Sheet']
>>>sheet=wb.get_active_sheet()
>>>sheet.title
'Sheet'
>>>sheet.title='SpamBaconEggsSheet'
>>>wb.get_sheet_names()
['SpamBaconEggsSheet']
TheworkbookwillstartoffwithasinglesheetnamedSheet.Youcanchangethenameofthesheetbystoringanewstringinitstitleattribute.
AnytimeyoumodifytheWorkbookobjectoritssheetsandcells,thespreadsheetfilewillnotbesaveduntilyoucallthesave()workbookmethod.Enterthefollowingintotheinteractiveshell(withexample.xlsxinthecurrentworkingdirectory):
>>>importopenpyxl
>>>wb=openpyxl.load_workbook('example.xlsx')
>>>sheet=wb.get_active_sheet()
>>>sheet.title='SpamSpamSpam'
>>>wb.save('example_copy.xlsx')
Here,wechangethenameofoursheet.Tosaveourchanges,wepassafilenameasastringtothesave()method.Passingadifferentfilenamethantheoriginal,suchas'example_copy.xlsx',savesthechangestoacopyofthespreadsheet.
Wheneveryoueditaspreadsheetyou’veloadedfromafile,youshouldalwayssavethenew,editedspreadsheettoadifferentfilenamethantheoriginal.Thatway,you’llstillhavetheoriginalspreadsheetfiletoworkwithincaseabuginyourcodecausedthenew,savedfiletohaveincorrectorcorruptdata.
CreatingandRemovingSheetsSheetscanbeaddedtoandremovedfromaworkbookwiththecreate_sheet()andremove_sheet()methods.Enterthefollowingintotheinteractiveshell:
>>>importopenpyxl
>>>wb=openpyxl.Workbook()
>>>wb.get_sheet_names()
['Sheet']
>>>wb.create_sheet()
<Worksheet"Sheet1">
>>>wb.get_sheet_names()
['Sheet','Sheet1']
>>>wb.create_sheet(index=0,title='FirstSheet')
<Worksheet"FirstSheet">
>>>wb.get_sheet_names()
['FirstSheet','Sheet','Sheet1']
>>>wb.create_sheet(index=2,title='MiddleSheet')
<Worksheet"MiddleSheet">
>>>wb.get_sheet_names()
['FirstSheet','Sheet','MiddleSheet','Sheet1']
Thecreate_sheet()methodreturnsanewWorksheetobjectnamedSheetX,whichbydefaultissettobethelastsheetintheworkbook.Optionally,theindexandnameofthenewsheetcanbespecifiedwiththeindexandtitlekeywordarguments.
Continue the previous example by entering the following:
>>> wb.get_sheet_names()
['First Sheet', 'Sheet', 'Middle Sheet', 'Sheet1']
>>> wb.remove_sheet(wb.get_sheet_by_name('Middle Sheet'))
>>> wb.remove_sheet(wb.get_sheet_by_name('Sheet1'))
>>> wb.get_sheet_names()
['First Sheet', 'Sheet']
Theremove_sheet()methodtakesaWorksheetobject,notastringofthesheetname,asitsargument.Ifyouknowonlythenameofasheetyouwanttoremove,callget_sheet_by_name()andpassitsreturnvalueintoremove_sheet().
Remembertocallthesave()methodtosavethechangesafteraddingsheetstoorremovingsheetsfromtheworkbook.
WritingValuestoCellsWritingvaluestocellsismuchlikewritingvaluestokeysinadictionary.Enterthisintotheinteractiveshell:
>>>importopenpyxl
>>>wb=openpyxl.Workbook()
>>>sheet=wb.get_sheet_by_name('Sheet')
>>>sheet['A1']='Helloworld!'
>>>sheet['A1'].value
'Helloworld!'
Ifyouhavethecell’scoordinateasastring,youcanuseitjustlikeadictionarykeyontheWorksheetobjecttospecifywhichcelltowriteto.
Project:UpdatingaSpreadsheetInthisproject,you’llwriteaprogramtoupdatecellsinaspreadsheetofproducesales.Yourprogramwilllookthroughthespreadsheet,findspecifickindsofproduce,andupdatetheirprices.Downloadthisspreadsheetfromhttp://nostarch.com/automatestuff/.Figure12-3showswhatthespreadsheetlookslike.
Figure12-3.Aspreadsheetofproducesales
Eachrowrepresentsanindividualsale.Thecolumnsarethetypeofproducesold(A),thecostperpoundofthatproduce(B),thenumberofpoundssold(C),andthetotalrevenuefromthesale(D).TheTOTALcolumnissettotheExcelformula=ROUND(B3*C3,2),whichmultipliesthecostperpoundbythenumberofpoundssoldandroundstheresulttothenearestcent.Withthisformula,thecellsintheTOTALcolumnwillautomaticallyupdatethemselvesifthereisachangeincolumnBorC.
Nowimaginethatthepricesofgarlic,celery,andlemonswereenteredincorrectly,leavingyouwiththeboringtaskofgoingthroughthousandsofrowsinthisspreadsheettoupdatethecostperpoundforanygarlic,celery,andlemonrows.Youcan’tdoasimplefind-and-replaceforthepricebecausetheremightbeotheritemswiththesamepricethatyoudon’twanttomistakenly“correct.”Forthousandsofrows,thiswouldtakehourstodobyhand.Butyoucanwriteaprogramthatcanaccomplishthisinseconds.
Your program does the following:
Loops over all the rows.
If the row is for garlic, celery, or lemons, changes the price.
This means your code will need to do the following:
Open the spreadsheet file.
For each row, check whether the value in column A is Celery, Garlic, or Lemon.
If it is, update the price in column B.
Save the spreadsheet to a new file (so that you don't lose the old spreadsheet, just in case).
Step1:SetUpaDataStructurewiththeUpdateInformationThepricesthatyouneedtoupdateareasfollows:
Celery 1.19
Garlic 3.07
Lemon 1.27
You could write code like this:
if produceName == 'Celery':
    cellObj = 1.19
if produceName == 'Garlic':
    cellObj = 3.07
if produceName == 'Lemon':
    cellObj = 1.27
Havingtheproduceandupdatedpricedatahardcodedlikethisisabitinelegant.Ifyouneededtoupdatethespreadsheetagainwithdifferentpricesordifferentproduce,youwouldhavetochangealotofthecode.Everytimeyouchangecode,youriskintroducingbugs.
Amoreflexiblesolutionistostorethecorrectedpriceinformationinadictionaryandwriteyourcodetousethisdatastructure.Inanewfileeditorwindow,enterthefollowingcode:
#! python3
# updateProduce.py - Corrects costs in produce sales spreadsheet.

import openpyxl

wb = openpyxl.load_workbook('produceSales.xlsx')
sheet = wb.get_sheet_by_name('Sheet')

# The produce types and their updated prices
PRICE_UPDATES = {'Garlic': 3.07,
                 'Celery': 1.19,
                 'Lemon': 1.27}

# TODO: Loop through the rows and update the prices.
SavethisasupdateProduce.py.Ifyouneedtoupdatethespreadsheetagain,you’llneedtoupdateonlythePRICE_UPDATESdictionary,notanyothercode.
Step2:CheckAllRowsandUpdateIncorrectPricesThenextpartoftheprogramwillloopthroughalltherowsinthespreadsheet.AddthefollowingcodetothebottomofupdateProduce.py:
#! python3
# updateProduce.py - Corrects costs in produce sales spreadsheet.

--snip--

# Loop through the rows and update the prices.
# The + 1 makes range() include the last row of the sheet.
➊ for rowNum in range(2, sheet.get_highest_row() + 1):    # skip the first row
    ➋ produceName = sheet.cell(row=rowNum, column=1).value
    ➌ if produceName in PRICE_UPDATES:
            sheet.cell(row=rowNum, column=2).value = PRICE_UPDATES[produceName]

➍ wb.save('updatedProduceSales.xlsx')
We loop through the rows starting at row 2, since row 1 is just the header ➊. The cell in column 1 (that is, column A) will be stored in the variable produceName ➋. If produceName exists as a key in the PRICE_UPDATES dictionary ➌, then you know this is a row that must have its price corrected. The correct price will be in PRICE_UPDATES[produceName].
NoticehowcleanusingPRICE_UPDATESmakesthecode.Onlyoneifstatement,ratherthancodelikeifproduceName=='Garlic':,isnecessaryforeverytypeofproducetoupdate.AndsincethecodeusesthePRICE_UPDATESdictionaryinsteadofhardcodingtheproducenamesandupdatedcostsintotheforloop,youmodifyonlythePRICE_UPDATESdictionaryandnotthecodeiftheproducesalesspreadsheetneedsadditionalchanges.
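The dictionary-driven loop can be exercised without OpenPyXL. The sketch below assumes a plain list of lists standing in for the worksheet; the rows and the wrong prices in them are hypothetical, while PRICE_UPDATES holds the corrected prices listed earlier.

```python
PRICE_UPDATES = {'Garlic': 3.07, 'Celery': 1.19, 'Lemon': 1.27}

# Hypothetical stand-in for the sheet: a header row followed by
# [produce, cost per pound, pounds sold] rows.
rows = [
    ['PRODUCE', 'COST PER POUND', 'POUNDS SOLD'],
    ['Potatoes', 0.86, 21.6],
    ['Garlic', 9.99, 5.0],
    ['Lemon', 9.99, 12.5],
]

# Start at index 1 to skip the header. Using len(rows) as the stop
# value makes sure the final row is visited too.
for rowNum in range(1, len(rows)):
    produceName = rows[rowNum][0]
    if produceName in PRICE_UPDATES:
        rows[rowNum][1] = PRICE_UPDATES[produceName]

print(rows[-1])  # ['Lemon', 1.27, 12.5]
```

Adding another correction means adding one key to PRICE_UPDATES; the loop itself does not change.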
Aftergoingthroughtheentirespreadsheetandmakingchanges,thecodesavestheWorkbookobjecttoupdatedProduceSales.xlsx➍.Itdoesn’toverwritetheoldspreadsheetjustincasethere’sabuginyourprogramandtheupdatedspreadsheetiswrong.Aftercheckingthattheupdatedspreadsheetlooksright,youcandeletetheoldspreadsheet.
Youcandownloadthecompletesourcecodeforthisprogramfromhttp://nostarch.com/automatestuff/.
IdeasforSimilarProgramsSincemanyofficeworkersuseExcelspreadsheetsallthetime,aprogramthatcanautomaticallyeditandwriteExcelfilescouldbereallyuseful.Suchaprogramcoulddothefollowing:
Read data from one spreadsheet and write it to parts of other spreadsheets.
Read data from websites, text files, or the clipboard and write it to a spreadsheet.
Automatically "clean up" data in spreadsheets. For example, it could use regular expressions to read multiple formats of phone numbers and edit them to a single, standard format.
SettingtheFontStyleofCellsStylingcertaincells,rows,orcolumnscanhelpyouemphasizeimportantareasinyourspreadsheet.Intheproducespreadsheet,forexample,yourprogramcouldapplyboldtexttothepotato,garlic,andparsniprows.Orperhapsyouwanttoitalicizeeveryrowwithacostperpoundgreaterthan$5.Stylingpartsofalargespreadsheetbyhandwouldbetedious,butyourprogramscandoitinstantly.
To customize font styles in cells, import the Font() and Style() functions from the openpyxl.styles module.
fromopenpyxl.stylesimportFont,Style
ThisallowsyoutotypeFont()insteadofopenpyxl.styles.Font().(SeeImportingModulestoreviewthisstyleofimportstatement.)
Here’sanexamplethatcreatesanewworkbookandsetscellA1tohavea24-point,italicizedfont.Enterthefollowingintotheinteractiveshell:
>>> import openpyxl
>>> from openpyxl.styles import Font, Style
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_sheet_by_name('Sheet')
➊ >>> italic24Font = Font(size=24, italic=True)
➋ >>> styleObj = Style(font=italic24Font)
➌ >>> sheet['A1'].style = styleObj
>>> sheet['A1'] = 'Hello world!'
>>> wb.save('styled.xlsx')
OpenPyXLrepresentsthecollectionofstylesettingsforacellwithaStyleobject,whichisstoredintheCellobject’sstyleattribute.Acell’sstylecanbesetbyassigningtheStyleobjecttothestyleattribute.
Inthisexample,Font(size=24,italic=True)returnsaFontobject,whichisstoredinitalic24Font➊.ThekeywordargumentstoFont(),sizeanditalic,configuretheFontobject’sstyleattributes.ThisFontobjectisthenpassedintotheStyle(font=italic24Font)call,whichreturnsthevalueyoustoredinstyleObj➋.AndwhenstyleObjisassignedtothecell’sstyleattribute➌,allthatfontstylinginformationgetsappliedtocellA1.
FontObjectsThestyleattributesinFontobjectsaffecthowthetextincellsisdisplayed.Tosetfontstyleattributes,youpasskeywordargumentstoFont().Table12-2showsthepossiblekeywordargumentsfortheFont()function.
Table 12-2. Keyword Arguments for Font style Attributes
Keyword argument    Data type    Description
name    String    The font name, such as 'Calibri' or 'Times New Roman'
size    Integer    The point size
bold    Boolean    True, for bold font
italic    Boolean    True, for italic font
YoucancallFont()tocreateaFontobjectandstorethatFontobjectinavariable.YouthenpassthattoStyle(),storetheresultingStyleobjectinavariable,andassignthatvariabletoaCellobject’sstyleattribute.Forexample,thiscodecreatesvariousfontstyles:
>>> import openpyxl
>>> from openpyxl.styles import Font, Style
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_sheet_by_name('Sheet')

>>> fontObj1 = Font(name='Times New Roman', bold=True)
>>> styleObj1 = Style(font=fontObj1)
>>> sheet['A1'].style = styleObj1
>>> sheet['A1'] = 'Bold Times New Roman'

>>> fontObj2 = Font(size=24, italic=True)
>>> styleObj2 = Style(font=fontObj2)
>>> sheet['B3'].style = styleObj2
>>> sheet['B3'] = '24 pt Italic'

>>> wb.save('styles.xlsx')
Here, we store a Font object in fontObj1 and use it to create a Style object, which we store in styleObj1, and then set the A1 Cell object's style attribute to styleObj1. We repeat the process with another Font object and Style object to set the style of a second cell. After you run this code, the styles of the A1 and B3 cells in the spreadsheet will be set to custom font styles, as shown in Figure 12-4.
Figure12-4.Aspreadsheetwithcustomfontstyles
For cell A1, we set the font name to 'Times New Roman' and set bold to True, so our text appears in bold Times New Roman. We didn't specify a size, so the openpyxl default, 11, is used. In cell B3, our text is italic, with a size of 24; we didn't specify a font name, so the openpyxl default, Calibri, is used.
FormulasFormulas,whichbeginwithanequalsign,canconfigurecellstocontainvaluescalculatedfromothercells.Inthissection,you’llusetheopenpyxlmoduletoprogrammaticallyaddformulastocells,justlikeanynormalvalue.Forexample:
>>>sheet['B9']='=SUM(B1:B8)'
Thiswillstore=SUM(B1:B8)asthevalueincellB9.ThissetstheB9celltoaformulathatcalculatesthesumofvaluesincellsB1toB8.YoucanseethisinactioninFigure12-5.
Figure12-5.CellB9containstheformula=SUM(B1:B8),whichaddsthecellsB1toB8.
Aformulaissetjustlikeanyothertextvalueinacell.Enterthefollowingintotheinteractiveshell:
>>>importopenpyxl
>>>wb=openpyxl.Workbook()
>>>sheet=wb.get_active_sheet()
>>>sheet['A1']=200
>>>sheet['A2']=300
>>>sheet['A3']='=SUM(A1:A2)'
>>>wb.save('writeFormula.xlsx')
ThecellsinA1andA2aresetto200and300,respectively.ThevalueincellA3issettoaformulathatsumsthevaluesinA1andA2.WhenthespreadsheetisopenedinExcel,A3willdisplayitsvalueas500.
Youcanalsoreadtheformulainacelljustasyouwouldanyvalue.However,ifyouwanttoseetheresultofthecalculationfortheformulainsteadoftheliteralformula,youmustpassTrueforthedata_onlykeywordargumenttoload_workbook().ThismeansaWorkbookobjectcanshoweithertheformulasortheresultoftheformulasbutnotboth.(ButyoucanhavemultipleWorkbookobjectsloadedforthesamespreadsheetfile.)Enterthefollowingintotheinteractiveshelltoseethedifferencebetweenloadingaworkbookwithandwithoutthedata_onlykeywordargument:
>>>importopenpyxl
>>>wbFormulas=openpyxl.load_workbook('writeFormula.xlsx')
>>>sheet=wbFormulas.get_active_sheet()
>>>sheet['A3'].value
'=SUM(A1:A2)'
>>>wbDataOnly=openpyxl.load_workbook('writeFormula.xlsx',data_only=True)
>>>sheet=wbDataOnly.get_active_sheet()
>>>sheet['A3'].value
500
Here,whenload_workbook()iscalledwithdata_only=True,theA3cellshows500,theresultofthe=SUM(A1:A2)formula,ratherthanthetextoftheformula.
Excel formulas offer a level of programmability for spreadsheets but can quickly become unmanageable for complicated tasks. For example, even if you're deeply familiar with Excel formulas, it's a headache to try to decipher what =IFERROR(TRIM(IF(LEN(VLOOKUP(F7, Sheet2!$A$1:$B$10000, 2, FALSE))>0,SUBSTITUTE(VLOOKUP(F7, Sheet2!$A$1:$B$10000, 2, FALSE), " ", ""),"")), "") actually does. Python code is much more readable.
AdjustingRowsandColumnsInExcel,adjustingthesizesofrowsandcolumnsisaseasyasclickinganddraggingtheedgesofaroworcolumnheader.Butifyouneedtosetaroworcolumn’ssizebasedonitscells’contentsorifyouwanttosetsizesinalargenumberofspreadsheetfiles,itwillbemuchquickertowriteaPythonprogramtodoit.
Rowsandcolumnscanalsobehiddenentirelyfromview.Ortheycanbe“frozen”inplacesothattheyarealwaysvisibleonthescreenandappearoneverypagewhenthespreadsheetisprinted(whichishandyforheaders).
SettingRowHeightandColumnWidthWorksheetobjectshaverow_dimensionsandcolumn_dimensionsattributesthatcontrolrowheightsandcolumnwidths.Enterthisintotheinteractiveshell:
>>>importopenpyxl
>>>wb=openpyxl.Workbook()
>>>sheet=wb.get_active_sheet()
>>>sheet['A1']='Tallrow'
>>>sheet['B2']='Widecolumn'
>>>sheet.row_dimensions[1].height=70
>>>sheet.column_dimensions['B'].width=20
>>>wb.save('dimensions.xlsx')
Asheet’srow_dimensionsandcolumn_dimensionsaredictionary-likevalues;row_dimensionscontainsRowDimensionobjectsandcolumn_dimensionscontainsColumnDimensionobjects.Inrow_dimensions,youcanaccessoneoftheobjectsusingthenumberoftherow(inthiscase,1or2).Incolumn_dimensions,youcanaccessoneoftheobjectsusingtheletterofthecolumn(inthiscase,AorB).
Thedimensions.xlsxspreadsheetlookslikeFigure12-6.
Figure12-6.Row1andcolumnBsettolargerheightsandwidths
OnceyouhavetheRowDimensionobject,youcansetitsheight.OnceyouhavetheColumnDimensionobject,youcansetitswidth.Therowheightcanbesettoanintegerorfloatvaluebetween0and409.Thisvaluerepresentstheheightmeasuredinpoints,whereonepointequals1/72ofaninch.Thedefaultrowheightis12.75.Thecolumnwidthcanbesettoanintegerorfloatvaluebetween0and255.Thisvaluerepresentsthenumberofcharactersatthedefaultfontsize(11point)thatcanbedisplayedinthecell.Thedefaultcolumnwidthis8.43characters.Columnswithwidthsof0orrowswithheightsof0arehiddenfromtheuser.
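The limits and units described above can be captured in two small helper functions. This is only a sketch: the function names are hypothetical and are not part of OpenPyXL, and the bounds (0 to 409 for heights, 0 to 255 for widths) and the 72-points-per-inch conversion are taken from the paragraph above.

```python
POINTS_PER_INCH = 72

def rowHeightInInches(height):
    """Check a row height given in points and convert it to inches."""
    if not 0 <= height <= 409:
        raise ValueError('row height must be between 0 and 409')
    return height / POINTS_PER_INCH

def checkColumnWidth(width):
    """Check a column width given in characters and return it unchanged."""
    if not 0 <= width <= 255:
        raise ValueError('column width must be between 0 and 255')
    return width

print(rowHeightInInches(72))   # 1.0
print(checkColumnWidth(20))    # 20
```

Checking a value like this before assigning it to height or width gives a clear error message in your own program if a size is out of range.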
Merging and Unmerging Cells
A rectangular area of cells can be merged into a single cell with the merge_cells() sheet method. Enter the following into the interactive shell:
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_active_sheet()
>>> sheet.merge_cells('A1:D3')
>>> sheet['A1'] = 'Twelve cells merged together.'
>>> sheet.merge_cells('C5:D5')
>>> sheet['C5'] = 'Two merged cells.'
>>> wb.save('merged.xlsx')
Theargumenttomerge_cells()isasinglestringofthetop-leftandbottom-rightcellsoftherectangularareatobemerged:'A1:D3'merges12cellsintoasinglecell.Tosetthevalueofthesemergedcells,simplysetthevalueofthetop-leftcellofthemergedgroup.
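To check how many cells a range string such as 'A1:D3' covers, you can work it out from the two corner coordinates. The helpers below are a hypothetical sketch written with the standard library only; they are not OpenPyXL functions.

```python
import re

def columnIndex(letters):
    """Convert a column name such as 'A' or 'AA' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index

def cellsInRange(rangeString):
    """Count the cells covered by a range such as 'A1:D3'."""
    match = re.fullmatch(r'([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)', rangeString)
    if match is None:
        raise ValueError('not a range like A1:D3: ' + rangeString)
    col1, row1, col2, row2 = match.groups()
    width = columnIndex(col2) - columnIndex(col1) + 1
    height = int(row2) - int(row1) + 1
    return width * height

print(cellsInRange('A1:D3'))  # 12
print(cellsInRange('C5:D5'))  # 2
```

Four columns (A to D) times three rows (1 to 3) gives the 12 cells mentioned above.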
Whenyourunthiscode,merged.xlsxwilllooklikeFigure12-7.
Figure12-7.Mergedcellsinaspreadsheet
Tounmergecells,calltheunmerge_cells()sheetmethod.Enterthisintotheinteractiveshell.
>>>importopenpyxl
>>>wb=openpyxl.load_workbook('merged.xlsx')
>>>sheet=wb.get_active_sheet()
>>>sheet.unmerge_cells('A1:D3')
>>>sheet.unmerge_cells('C5:D5')
>>>wb.save('merged.xlsx')
Ifyousaveyourchangesandthentakealookatthespreadsheet,you’llseethatthemergedcellshavegonebacktobeingindividualcells.
FreezePanesForspreadsheetstoolargetobedisplayedallatonce,it’shelpfulto“freeze”afewofthetoprowsorleftmostcolumnsonscreen.Frozencolumnorrowheaders,forexample,arealwaysvisibletotheuserevenastheyscrollthroughthespreadsheet.Theseareknownasfreezepanes.InOpenPyXL,eachWorksheetobjecthasafreeze_panesattributethatcanbesettoaCellobjectorastringofacell’scoordinates.Notethatallrowsaboveandallcolumnstotheleftofthiscellwillbefrozen,buttherowandcolumnofthecellitselfwillnotbefrozen.
Tounfreezeallpanes,setfreeze_panestoNoneor'A1'.Table12-3showswhichrowsandcolumnswillbefrozenforsomeexamplesettingsoffreeze_panes.
Table 12-3. Frozen Pane Examples
freeze_panes setting    Rows and columns frozen
sheet.freeze_panes = 'A2'    Row 1
sheet.freeze_panes = 'B1'    Column A
sheet.freeze_panes = 'C1'    Columns A and B
sheet.freeze_panes = 'C2'    Row 1 and columns A and B
sheet.freeze_panes = 'A1' or sheet.freeze_panes = None    No frozen panes
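The rule behind the table (everything above and to the left of the given cell is frozen) can be written as a short function. This is a hypothetical helper for illustration, not part of OpenPyXL; it returns how many rows and columns a given setting would freeze.

```python
import re

def frozenPanes(coordinate):
    """Return (frozen row count, frozen column count) for a setting
    such as 'C2'. None or 'A1' means nothing is frozen."""
    if coordinate is None:
        return (0, 0)
    match = re.fullmatch(r'([A-Za-z]+)(\d+)', coordinate)
    if match is None:
        raise ValueError('not a cell coordinate: ' + coordinate)
    letters, row = match.groups()
    column = 0
    for ch in letters.upper():
        column = column * 26 + (ord(ch) - ord('A') + 1)
    # Everything above the row and left of the column is frozen;
    # the cell's own row and column are not.
    return (int(row) - 1, column - 1)

for setting in ('A2', 'B1', 'C1', 'C2', 'A1', None):
    print(setting, frozenPanes(setting))
```

For 'C2' the function returns (1, 2), which matches the table: row 1 and columns A and B.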
Makesureyouhavetheproducesalesspreadsheetfromhttp://nostarch.com/automatestuff/.Thenenterthefollowingintotheinteractiveshell:
>>>importopenpyxl
>>>wb=openpyxl.load_workbook('produceSales.xlsx')
>>>sheet=wb.get_active_sheet()
>>>sheet.freeze_panes='A2'
>>>wb.save('freezeExample.xlsx')
Ifyousetthefreeze_panesattributeto'A2',row1willalwaysbeviewable,nomatterwheretheuserscrollsinthespreadsheet.YoucanseethisinFigure12-8.
Figure12-8.Withfreeze_panessetto'A2',row1isalwaysvisibleevenastheuserscrollsdown.
ChartsOpenPyXLsupportscreatingbar,line,scatter,andpiechartsusingthedatainasheet’scells.Tomakeachart,youneedtodothefollowing:
1. Create a Reference object from a rectangular selection of cells.
2. Create a Series object by passing in the Reference object.
3. Create a Chart object.
4. Append the Series object to the Chart object.
5. Optionally, set the drawing.top, drawing.left, drawing.width, and drawing.height variables of the Chart object.
6. Add the Chart object to the Worksheet object.
The Reference object requires some explaining. Reference objects are created by calling the openpyxl.charts.Reference() function and passing three arguments:
1. The Worksheet object containing your chart data.
2. A tuple of two integers, representing the top-left cell of the rectangular selection of cells containing your chart data: The first integer in the tuple is the row, and the second is the column. Note that 1 is the first row, not 0.
3. A tuple of two integers, representing the bottom-right cell of the rectangular selection of cells containing your chart data: The first integer in the tuple is the row, and the second is the column.
Figure12-9showssomesamplecoordinatearguments.
Figure12-9.Fromlefttoright:(1,1),(10,1);(3,2),(6,4);(5,3),(5,3)
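Because the tuples are in (row, column) order, which is the reverse of the letter-then-number order in a coordinate like B3, it can help to translate them. The following sketch uses two hypothetical helper functions (not part of OpenPyXL) to turn the tuple pairs from Figure 12-9 into range strings.

```python
def columnLetter(index):
    """Convert a 1-based column index to letters (1 -> 'A', 27 -> 'AA')."""
    letters = ''
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters

def toRange(topLeft, bottomRight):
    """Convert two (row, column) tuples to a range string like 'B3:D6'."""
    (row1, col1), (row2, col2) = topLeft, bottomRight
    return columnLetter(col1) + str(row1) + ':' + columnLetter(col2) + str(row2)

print(toRange((1, 1), (10, 1)))  # A1:A10
print(toRange((3, 2), (6, 4)))   # B3:D6
print(toRange((5, 3), (5, 3)))   # C5:C5
```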
Enter this interactive shell example to create a bar chart and add it to the spreadsheet:
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_active_sheet()
>>> for i in range(1, 11):    # create some data in column A
        sheet['A' + str(i)] = i

>>> refObj = openpyxl.charts.Reference(sheet, (1, 1), (10, 1))

>>> seriesObj = openpyxl.charts.Series(refObj, title='First series')

>>> chartObj = openpyxl.charts.BarChart()
>>> chartObj.append(seriesObj)
>>> chartObj.drawing.top = 50       # set the position
>>> chartObj.drawing.left = 100
>>> chartObj.drawing.width = 300    # set the size
>>> chartObj.drawing.height = 200

>>> sheet.add_chart(chartObj)
>>> wb.save('sampleChart.xlsx')
ThisproducesaspreadsheetthatlookslikeFigure12-10.
Figure12-10.Aspreadsheetwithachartadded
We’vecreatedabarchartbycallingopenpyxl.charts.BarChart().Youcanalsocreatelinecharts,scattercharts,andpiechartsbycallingopenpyxl.charts.LineChart(),openpyxl.charts.ScatterChart(),andopenpyxl.charts.PieChart().
Unfortunately,inthecurrentversionofOpenPyXL(2.1.4),theload_workbook()functiondoesnotloadchartsinExcelfiles.EveniftheExcelfilehascharts,theloadedWorkbookobjectwillnotincludethem.IfyouloadaWorkbookobjectandimmediatelysaveittothesame.xlsxfilename,youwilleffectivelyremovethechartsfromit.
SummaryOftenthehardpartofprocessinginformationisn’ttheprocessingitselfbutsimplygettingthedataintherightformatforyourprogram.ButonceyouhaveyourspreadsheetloadedintoPython,youcanextractandmanipulateitsdatamuchfasterthanyoucouldbyhand.
Youcanalsogeneratespreadsheetsasoutputfromyourprograms.SoifcolleaguesneedyourtextfileorPDFofthousandsofsalescontactstransferredtoaspreadsheetfile,youwon’thavetotediouslycopyandpasteitallintoExcel.
Equippedwiththeopenpyxlmoduleandsomeprogrammingknowledge,you’llfindprocessingeventhebiggestspreadsheetsapieceofcake.
PracticeQuestionsForthefollowingquestions,imagineyouhaveaWorkbookobjectinthevariablewb,aWorksheetobjectinsheet,aCellobjectincell,aCommentobjectincomm,andanImageobjectinimg.
Q: 1.Whatdoestheopenpyxl.load_workbook()functionreturn?
Q: 2.Whatdoestheget_sheet_names()workbookmethodreturn?
Q: 3.HowwouldyouretrievetheWorksheetobjectforasheetnamed'Sheet1'?
Q: 4.HowwouldyouretrievetheWorksheetobjectfortheworkbook’sactivesheet?
Q: 5.HowwouldyouretrievethevalueinthecellC5?
Q: 6.HowwouldyousetthevalueinthecellC5to"Hello"?
Q: 7.Howwouldyouretrievethecell’srowandcolumnasintegers?
Q: 8.Whatdotheget_highest_column()andget_highest_row()sheetmethodsreturn,andwhatisthedatatypeofthesereturnvalues?
Q: 9.Ifyouneededtogettheintegerindexforcolumn'M',whatfunctionwouldyouneedtocall?
Q: 10.Ifyouneededtogetthestringnameforcolumn14,whatfunctionwouldyouneedtocall?
Q: 11.HowcanyouretrieveatupleofalltheCellobjectsfromA1toF1?
Q: 12.Howwouldyousavetheworkbooktothefilenameexample.xlsx?
Q: 13.Howdoyousetaformulainacell?
Q: 14.Ifyouwanttoretrievetheresultofacell’sformulainsteadofthecell’sformulaitself,whatmustyoudofirst?
Q: 15.Howwouldyousettheheightofrow5to100?
Q: 16.HowwouldyouhidecolumnC?
Q: 17.NameafewfeaturesthatOpenPyXL2.1.4doesnotloadfromaspreadsheetfile.
Q: 18.Whatisafreezepane?
Q: 19.Whatfivefunctionsandmethodsdoyouhavetocalltocreateabarchart?
PracticeProjectsForpractice,writeprogramsthatperformthefollowingtasks.
MultiplicationTableMakerCreateaprogrammultiplicationTable.pythattakesanumberNfromthecommandlineandcreatesanN×NmultiplicationtableinanExcelspreadsheet.Forexample,whentheprogramisrunlikethis:
pymultiplicationTable.py6
…itshouldcreateaspreadsheetthatlookslikeFigure12-11.
Figure12-11.Amultiplicationtablegeneratedinaspreadsheet
Row1andcolumnAshouldbeusedforlabelsandshouldbeinbold.
BlankRowInserterCreateaprogramblankRowInserter.pythattakestwointegersandafilenamestringascommandlinearguments.Let’scallthefirstintegerNandthesecondintegerM.StartingatrowN,theprogramshouldinsertMblankrowsintothespreadsheet.Forexample,whentheprogramisrunlikethis:
pythonblankRowInserter.py32myProduce.xlsx
…the“before”and“after”spreadsheetsshouldlooklikeFigure12-12.
Figure12-12.Before(left)andafter(right)thetwoblankrowsareinsertedatrow3
Youcanwritethisprogrambyreadinginthecontentsofthespreadsheet.Then,whenwritingoutthenewspreadsheet,useaforlooptocopythefirstNlines.Fortheremaininglines,addMtotherownumberintheoutputspreadsheet.
SpreadsheetCellInverter
Writeaprogramtoinverttherowandcolumnofthecellsinthespreadsheet.Forexample,thevalueatrow5,column3willbeatrow3,column5(andviceversa).Thisshouldbedoneforallcellsinthespreadsheet.Forexample,the“before”and“after”spreadsheetswouldlooksomethinglikeFigure12-13.
Figure12-13.Thespreadsheetbefore(top)andafter(bottom)inversion
Youcanwritethisprogrambyusingnestedforloopstoreadinthespreadsheet’sdataintoalistoflistsdatastructure.ThisdatastructurecouldhavesheetData[x][y]forthecellatcolumnxandrowy.Then,whenwritingoutthenewspreadsheet,usesheetData[y][x]forthecellatcolumnxandrowy.
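The index swap at the heart of this project can be tried on a plain list of lists before any spreadsheet is involved. This is a sketch under the assumption that the data has already been read into sheetData[x][y] with zero-based indexes; the cell values are made up, and reading and writing the actual spreadsheet is left to you.

```python
# Hypothetical sheet contents, stored as sheetData[x][y]
# (column x, row y), both zero-based here.
sheetData = [
    ['A1', 'A2', 'A3'],   # column A
    ['B1', 'B2', 'B3'],   # column B
]

numColumns = len(sheetData)
numRows = len(sheetData[0])

inverted = []
for x in range(numRows):          # new columns come from old rows
    newColumn = []
    for y in range(numColumns):   # new rows come from old columns
        newColumn.append(sheetData[y][x])
    inverted.append(newColumn)

print(inverted)  # [['A1', 'B1'], ['A2', 'B2'], ['A3', 'B3']]
```

A sheet with two columns and three rows becomes one with three columns and two rows, and every value at (x, y) ends up at (y, x).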
TextFilestoSpreadsheetWriteaprogramtoreadinthecontentsofseveraltextfiles(youcanmakethetextfilesyourself)andinsertthosecontentsintoaspreadsheet,withonelineoftextperrow.ThelinesofthefirsttextfilewillbeinthecellsofcolumnA,thelinesofthesecondtextfilewillbeinthecellsofcolumnB,andsoon.
Usethereadlines()Fileobjectmethodtoreturnalistofstrings,onestringperlineinthefile.Forthefirstfile,outputthefirstlinetocolumn1,row1.Thesecondlineshouldbewrittentocolumn1,row2,andsoon.Thenextfilethatisreadwithreadlines()willbewrittentocolumn2,thenextfiletocolumn3,andsoon.
SpreadsheettoTextFilesWriteaprogramthatperformsthetasksofthepreviousprograminreverseorder:TheprogramshouldopenaspreadsheetandwritethecellsofcolumnAintoonetextfile,the
Chapter 13. Working with PDF and Word Documents
PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information. If you want your programs to read or write to PDFs or Word documents, you'll need to do more than simply pass their filenames to open().
Fortunately,therearePythonmodulesthatmakeiteasyforyoutointeractwithPDFsandWorddocuments.Thischapterwillcovertwosuchmodules:PyPDF2andPython-Docx.
PDFDocumentsPDFstandsforPortableDocumentFormatandusesthe.pdffileextension.AlthoughPDFssupportmanyfeatures,thischapterwillfocusonthetwothingsyou’llbedoingmostoftenwiththem:readingtextcontentfromPDFsandcraftingnewPDFsfromexistingdocuments.
Themoduleyou’llusetoworkwithPDFsisPyPDF2.Toinstallit,runpipinstallPyPDF2fromthecommandline.Thismodulenameiscasesensitive,somakesuretheyislowercaseandeverythingelseisuppercase.(CheckoutAppendixAforfulldetailsaboutinstallingthird-partymodules.)Ifthemodulewasinstalledcorrectly,runningimportPyPDF2intheinteractiveshellshouldn’tdisplayanyerrors.
THEPROBLEMATICPDFFORMAT
WhilePDFfilesaregreatforlayingouttextinawaythat’seasyforpeopletoprintandread,they’renotstraightforwardforsoftwaretoparseintoplaintext.Assuch,PyPDF2mightmakemistakeswhenextractingtextfromaPDFandmayevenbeunabletoopensomePDFsatall.Thereisn’tmuchyoucandoaboutthis,unfortunately.PyPDF2maysimplybeunabletoworkwithsomeofyourparticularPDFfiles.Thatsaid,Ihaven’tfoundanyPDFfilessofarthatcan’tbeopenedwithPyPDF2.
ExtractingTextfromPDFsPyPDF2doesnothaveawaytoextractimages,charts,orothermediafromPDFdocuments,butitcanextracttextandreturnitasaPythonstring.TostartlearninghowPyPDF2works,we’lluseitontheexamplePDFshowninFigure13-1.
Figure13-1.ThePDFpagethatwewillbeextractingtextfrom
DownloadthisPDFfromhttp://nostarch.com/automatestuff/,andenterthefollowingintotheinteractiveshell:
>>>importPyPDF2
>>>pdfFileObj=open('meetingminutes.pdf','rb')
>>>pdfReader=PyPDF2.PdfFileReader(pdfFileObj)
➊>>>pdfReader.numPages
19
➋>>>pageObj=pdfReader.getPage(0)
➌>>>pageObj.extractText()
'OOFFFFIICCIIAALLBBOOAARRDDMMIINNUUTTEESSMeetingofMarch7,2015
\nTheBoardofElementaryandSecondaryEducationshallprovideleadership
andcreatepoliciesforeducationthatexpandopportunitiesforchildren,
empowerfamiliesandcommunities,andadvanceLouisianainanincreasingly
competitiveglobalmarket.BOARDofELEMENTARYandSECONDARYEDUCATION'
First,importthePyPDF2module.Thenopenmeetingminutes.pdfinreadbinarymodeandstoreitinpdfFileObj.TogetaPdfFileReaderobjectthatrepresentsthisPDF,callPyPDF2.PdfFileReader()andpassitpdfFileObj.StorethisPdfFileReaderobjectinpdfReader.
ThetotalnumberofpagesinthedocumentisstoredinthenumPagesattributeofaPdfFileReaderobject➊.TheexamplePDFhas19pages,butlet’sextracttextfromonlythefirstpage.
Toextracttextfromapage,youneedtogetaPageobject,whichrepresentsasinglepage
ofaPDF,fromaPdfFileReaderobject.YoucangetaPageobjectbycallingthegetPage()method➋onaPdfFileReaderobjectandpassingitthepagenumberofthepageyou’reinterestedin—inourcase,0.
PyPDF2 uses a zero-based index for getting pages: The first page is page 0, the second is page 1, and so on. This is always the case, even if pages are numbered differently within the document. For example, say your PDF is a three-page excerpt from a longer report, and its pages are numbered 42, 43, and 44. To get the first page of this document, you would want to call pdfReader.getPage(0), not getPage(42) or getPage(1).
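If you know the number printed on a page, you can compute the index to pass to getPage() with a subtraction. The helper below is a hypothetical sketch, not a PyPDF2 function, and it assumes the printed page numbers are consecutive with no gaps.

```python
def pageIndex(printedNumber, firstPrintedNumber):
    """Convert the number printed on a page to a zero-based index.

    Assumes the document's printed page numbers are consecutive.
    """
    index = printedNumber - firstPrintedNumber
    if index < 0:
        raise ValueError('page is before the start of the document')
    return index

# The three-page excerpt numbered 42, 43, and 44:
print(pageIndex(42, 42))  # 0
print(pageIndex(44, 42))  # 2
```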
OnceyouhaveyourPageobject,callitsextractText()methodtoreturnastringofthepage’stext➌.Thetextextractionisn’tperfect:ThetextCharlesE.“Chas”Roemer,PresidentfromthePDFisabsentfromthestringreturnedbyextractText(),andthespacingissometimesoff.Still,thisapproximationofthePDFtextcontentmaybegoodenoughforyourprogram.
DecryptingPDFsSomePDFdocumentshaveanencryptionfeaturethatwillkeepthemfrombeingreaduntilwhoeverisopeningthedocumentprovidesapassword.EnterthefollowingintotheinteractiveshellwiththePDFyoudownloaded,whichhasbeenencryptedwiththepasswordrosebud:
>>>importPyPDF2
>>>pdfReader=PyPDF2.PdfFileReader(open('encrypted.pdf','rb'))
➊>>>pdfReader.isEncrypted
True
>>>pdfReader.getPage(0)
➋Traceback(mostrecentcalllast):
File"<pyshell#173>",line1,in<module>
pdfReader.getPage()
--snip--
File"C:\Python34\lib\site-packages\PyPDF2\pdf.py",line1173,ingetObject
raiseutils.PdfReadError("filehasnotbeendecrypted")
PyPDF2.utils.PdfReadError:filehasnotbeendecrypted
➌>>>pdfReader.decrypt('rosebud')
1
>>>pageObj=pdfReader.getPage(0)
AllPdfFileReaderobjectshaveanisEncryptedattributethatisTrueifthePDFisencryptedandFalseifitisn’t➊.Anyattempttocallafunctionthatreadsthefilebeforeithasbeendecryptedwiththecorrectpasswordwillresultinanerror➋.
ToreadanencryptedPDF,callthedecrypt()functionandpassthepasswordasastring➌.Afteryoucalldecrypt()withthecorrectpassword,you’llseethatcallinggetPage()nolongercausesanerror.Ifgiventhewrongpassword,thedecrypt()functionwillreturn0andgetPage()willcontinuetofail.Notethatthedecrypt()methoddecryptsonlythePdfFileReaderobject,nottheactualPDFfile.Afteryourprogramterminates,thefileonyourharddriveremainsencrypted.Yourprogramwillhavetocalldecrypt()againthenexttimeitisrun.
CreatingPDFsPyPDF2’scounterparttoPdfFileReaderobjectsisPdfFileWriterobjects,whichcancreatenewPDFfiles.ButPyPDF2cannotwritearbitrarytexttoaPDFlikePythoncandowithplaintextfiles.Instead,PyPDF2’sPDF-writingcapabilitiesarelimitedtocopyingpagesfromotherPDFs,rotatingpages,overlayingpages,andencryptingfiles.
PyPDF2doesn’tallowyoutodirectlyeditaPDF.Instead,youhavetocreateanewPDFandthencopycontentoverfromanexistingdocument.Theexamplesinthissectionwillfollowthisgeneralapproach:
1. Open one or more existing PDFs (the source PDFs) into PdfFileReader objects.
2. Create a new PdfFileWriter object.
3. Copy pages from the PdfFileReader objects into the PdfFileWriter object.
4. Finally, use the PdfFileWriter object to write the output PDF.
CreatingaPdfFileWriterobjectcreatesonlyavaluethatrepresentsaPDFdocumentinPython.Itdoesn’tcreatetheactualPDFfile.Forthat,youmustcallthePdfFileWriter’swrite()method.
Thewrite()methodtakesaregularFileobjectthathasbeenopenedinwrite-binarymode.YoucangetsuchaFileobjectbycallingPython’sopen()functionwithtwoarguments:thestringofwhatyouwantthePDF’sfilenametobeand'wb'toindicatethefileshouldbeopenedinwrite-binarymode.
Ifthissoundsalittleconfusing,don’tworry—you’llseehowthisworksinthefollowingcodeexamples.
CopyingPages
YoucanusePyPDF2tocopypagesfromonePDFdocumenttoanother.ThisallowsyoutocombinemultiplePDFfiles,cutunwantedpages,orreorderpages.
Downloadmeetingminutes.pdfandmeetingminutes2.pdffromhttp://nostarch.com/automatestuff/andplacethePDFsinthecurrentworkingdirectory.Enterthefollowingintotheinteractiveshell:
>>> import PyPDF2
>>> pdf1File = open('meetingminutes.pdf', 'rb')
>>> pdf2File = open('meetingminutes2.pdf', 'rb')
➊ >>> pdf1Reader = PyPDF2.PdfFileReader(pdf1File)
➋ >>> pdf2Reader = PyPDF2.PdfFileReader(pdf2File)
➌ >>> pdfWriter = PyPDF2.PdfFileWriter()

>>> for pageNum in range(pdf1Reader.numPages):
  ➍ pageObj = pdf1Reader.getPage(pageNum)
  ➎ pdfWriter.addPage(pageObj)

>>> for pageNum in range(pdf2Reader.numPages):
  ➏ pageObj = pdf2Reader.getPage(pageNum)
  ➐ pdfWriter.addPage(pageObj)

➑ >>> pdfOutputFile = open('combinedminutes.pdf', 'wb')
>>> pdfWriter.write(pdfOutputFile)
>>> pdfOutputFile.close()
>>> pdf1File.close()
>>> pdf2File.close()
OpenbothPDFfilesinreadbinarymodeandstorethetworesultingFileobjectsinpdf1Fileandpdf2File.CallPyPDF2.PdfFileReader()andpassitpdf1FiletogetaPdfFileReaderobjectformeetingminutes.pdf➊.Callitagainandpassitpdf2FiletogetaPdfFileReaderobjectformeetingminutes2.pdf➋.ThencreateanewPdfFileWriterobject,whichrepresentsablankPDFdocument➌.
Next, copy all the pages from the two source PDFs and add them to the PdfFileWriter object. Get the Page object by calling getPage() on a PdfFileReader object ➍. Then pass that Page object to your PdfFileWriter’s addPage() method ➎. These steps are done first for pdf1Reader and then again for pdf2Reader (➏ and ➐). When you’re done copying pages, write a new PDF called combinedminutes.pdf by passing a File object to the PdfFileWriter’s write() method ➑.
NOTE
PyPDF2cannotinsertpagesinthemiddleofaPdfFileWriterobject;theaddPage()methodwillonlyaddpagestotheend.
You have now created a new PDF file that combines the pages from meetingminutes.pdf and meetingminutes2.pdf into a single document. Remember that the File object passed to PyPDF2.PdfFileReader() needs to be opened in read-binary mode by passing 'rb' as the second argument to open(). Likewise, the File object passed to the PdfFileWriter’s write() method needs to be opened in write-binary mode with 'wb'.
RotatingPages
ThepagesofaPDFcanalsoberotatedin90-degreeincrementswiththerotateClockwise()androtateCounterClockwise()methods.Passoneoftheintegers90,180,or270tothesemethods.Enterthefollowingintotheinteractiveshell,withthemeetingminutes.pdffileinthecurrentworkingdirectory:
>>> import PyPDF2
>>> minutesFile = open('meetingminutes.pdf', 'rb')
>>> pdfReader = PyPDF2.PdfFileReader(minutesFile)
➊ >>> page = pdfReader.getPage(0)
➋ >>> page.rotateClockwise(90)
{'/Contents': [IndirectObject(961, 0), IndirectObject(962, 0),
--snip--
}
>>> pdfWriter = PyPDF2.PdfFileWriter()
>>> pdfWriter.addPage(page)
➌ >>> resultPdfFile = open('rotatedPage.pdf', 'wb')
>>> pdfWriter.write(resultPdfFile)
>>> resultPdfFile.close()
>>> minutesFile.close()
HereweusegetPage(0)toselectthefirstpageofthePDF➊,andthenwecallrotateClockwise(90)onthatpage➋.WewriteanewPDFwiththerotatedpageandsaveitasrotatedPage.pdf➌.
TheresultingPDFwillhaveonepage,rotated90degreesclockwise,asinFigure13-2.ThereturnvaluesfromrotateClockwise()androtateCounterClockwise()containalotofinformationthatyoucanignore.
Figure13-2.TherotatedPage.pdffilewiththepagerotated90degreesclockwise
OverlayingPages
PyPDF2canalsooverlaythecontentsofonepageoveranother,whichisusefulforaddingalogo,timestamp,orwatermarktoapage.WithPython,it’seasytoaddwatermarkstomultiplefilesandonlytopagesyourprogramspecifies.
Downloadwatermark.pdffromhttp://nostarch.com/automatestuff/andplacethePDFinthecurrentworkingdirectoryalongwithmeetingminutes.pdf.Thenenterthefollowingintotheinteractiveshell:
>>> import PyPDF2
>>> minutesFile = open('meetingminutes.pdf', 'rb')
➊ >>> pdfReader = PyPDF2.PdfFileReader(minutesFile)
➋ >>> minutesFirstPage = pdfReader.getPage(0)
➌ >>> pdfWatermarkReader = PyPDF2.PdfFileReader(open('watermark.pdf', 'rb'))
➍ >>> minutesFirstPage.mergePage(pdfWatermarkReader.getPage(0))
➎ >>> pdfWriter = PyPDF2.PdfFileWriter()
➏ >>> pdfWriter.addPage(minutesFirstPage)

➐ >>> for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

>>> resultPdfFile = open('watermarkedCover.pdf', 'wb')
>>> pdfWriter.write(resultPdfFile)
>>> minutesFile.close()
>>> resultPdfFile.close()
HerewemakeaPdfFileReaderobjectofmeetingminutes.pdf➊.WecallgetPage(0)togetaPageobjectforthefirstpageandstorethisobjectinminutesFirstPage➋.WethenmakeaPdfFileReaderobjectforwatermark.pdf➌andcallmergePage()onminutesFirstPage➍.TheargumentwepasstomergePage()isaPageobjectforthefirstpageofwatermark.pdf.
Nowthatwe’vecalledmergePage()onminutesFirstPage,minutesFirstPagerepresentsthewatermarkedfirstpage.WemakeaPdfFileWriterobject➎andaddthewatermarkedfirstpage➏.Thenweloopthroughtherestofthepagesinmeetingminutes.pdfandaddthemtothePdfFileWriterobject➐.Finally,weopenanewPDFcalledwatermarkedCover.pdfandwritethecontentsofthePdfFileWritertothenewPDF.
Figure13-3showstheresults.OurnewPDF,watermarkedCover.pdf,hasallthecontentsofthemeetingminutes.pdf,andthefirstpageiswatermarked.
Figure13-3.TheoriginalPDF(left),thewatermarkPDF(center),andthemergedPDF(right)
EncryptingPDFs
APdfFileWriterobjectcanalsoaddencryptiontoaPDFdocument.Enterthefollowingintotheinteractiveshell:
>>> import PyPDF2
>>> pdfFile = open('meetingminutes.pdf', 'rb')
>>> pdfReader = PyPDF2.PdfFileReader(pdfFile)
>>> pdfWriter = PyPDF2.PdfFileWriter()
>>> for pageNum in range(pdfReader.numPages):
        pdfWriter.addPage(pdfReader.getPage(pageNum))

➊ >>> pdfWriter.encrypt('swordfish')
>>> resultPdf = open('encryptedminutes.pdf', 'wb')
>>> pdfWriter.write(resultPdf)
>>> resultPdf.close()
Beforecallingthewrite()methodtosavetoafile,calltheencrypt()methodandpassitapasswordstring➊.PDFscanhaveauserpassword(allowingyoutoviewthePDF)andanownerpassword(allowingyoutosetpermissionsforprinting,commenting,extractingtext,andotherfeatures).Theuserpasswordandownerpasswordarethefirstandsecondargumentstoencrypt(),respectively.Ifonlyonestringargumentispassedtoencrypt(),itwillbeusedforbothpasswords.
Inthisexample,wecopiedthepagesofmeetingminutes.pdftoaPdfFileWriterobject.WeencryptedthePdfFileWriterwiththepasswordswordfish,openedanewPDFcalledencryptedminutes.pdf,andwrotethecontentsofthePdfFileWritertothenewPDF.Beforeanyonecanviewencryptedminutes.pdf,they’llhavetoenterthispassword.Youmaywanttodeletetheoriginal,unencryptedmeetingminutes.pdffileafterensuringitscopywascorrectlyencrypted.
Project:CombiningSelectPagesfromManyPDFsSayyouhavetheboringjobofmergingseveraldozenPDFdocumentsintoasinglePDFfile.Eachofthemhasacoversheetasthefirstpage,butyoudon’twantthecoversheetrepeatedinthefinalresult.EventhoughtherearelotsoffreeprogramsforcombiningPDFs,manyofthemsimplymergeentirefilestogether.Let’swriteaPythonprogramtocustomizewhichpagesyouwantinthecombinedPDF.
Atahighlevel,here’swhattheprogramwilldo:
Find all PDF files in the current working directory.
Sort the filenames so the PDFs are added in order.
Write each page, excluding the first page, of each PDF to the output file.
In terms of implementation, your code will need to do the following:
Call os.listdir() to find all the files in the working directory and remove any non-PDF files.
Call Python’s sort() list method to alphabetize the filenames.
Create a PdfFileWriter object for the output PDF.
Loop over each PDF file, creating a PdfFileReader object for it.
Loop over each page (except the first) in each PDF file.
Add the pages to the output PDF.
Write the output PDF to a file named allminutes.pdf.
Forthisproject,openanewfileeditorwindowandsaveitascombinePdfs.py.
Step1:FindAllPDFFilesFirst,yourprogramneedstogetalistofallfileswiththe.pdfextensioninthecurrentworkingdirectoryandsortthem.Makeyourcodelooklikethefollowing:
#! python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

➊ import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
      ➋ pdfFiles.append(filename)
➌ pdfFiles.sort(key=str.lower)

➍ pdfWriter = PyPDF2.PdfFileWriter()

# TODO: Loop through all the PDF files.

# TODO: Loop through all the pages (except the first) and add them.

# TODO: Save the resulting PDF to a file.
After the shebang line and the descriptive comment about what the program does, this code imports the os and PyPDF2 modules ➊. The os.listdir('.') call will return a list of every file in the current working directory. The code loops over this list and adds only those files with the .pdf extension to pdfFiles ➋. Afterward, this list is sorted in alphabetical order with the key=str.lower keyword argument to sort() ➌.
A PdfFileWriter object is created to hold the combined PDF pages ➍. Finally, a few comments outline the rest of the program.
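To see why the key=str.lower argument matters, here is a small sketch that runs the same filter on a made-up list of filenames (the names are assumptions for illustration, not files from the book) and compares the default sort with the case-insensitive one.

```python
# Hypothetical directory listing; a real program would use os.listdir('.').
filenames = ['b.pdf', 'Z.pdf', 'notes.txt', 'A.pdf']

pdfFiles = []
for filename in filenames:
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)

# The default sort compares character codes, so uppercase letters come first.
defaultOrder = sorted(pdfFiles)
# key=str.lower compares the lowercased names, giving alphabetical order.
alphabeticalOrder = sorted(pdfFiles, key=str.lower)

print(defaultOrder)       # ['A.pdf', 'Z.pdf', 'b.pdf']
print(alphabeticalOrder)  # ['A.pdf', 'b.pdf', 'Z.pdf']
```

Note that endswith('.pdf') is itself case sensitive, so a file named REPORT.PDF would be skipped by this filter.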
Step2:OpenEachPDFNowtheprogrammustreadeachPDFfileinpdfFiles.Addthefollowingtoyourprogram:
#! python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
--snip--

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    # TODO: Loop through all the pages (except the first) and add them.

# TODO: Save the resulting PDF to a file.
ForeachPDF,theloopopensafilenameinread-binarymodebycallingopen()with'rb'asthesecondargument.Theopen()callreturnsaFileobject,whichgetspassedtoPyPDF2.PdfFileReader()tocreateaPdfFileReaderobjectforthatPDFfile.
Step3:AddEachPageForeachPDF,you’llwanttoloopovereverypageexceptthefirst.Addthiscodetoyourprogram:
#! python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

--snip--

# Loop through all the PDF files.
for filename in pdfFiles:
--snip--
    # Loop through all the pages (except the first) and add them.
  ➊ for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# TODO: Save the resulting PDF to a file.
ThecodeinsidetheforloopcopieseachPageobjectindividuallytothePdfFileWriterobject.Remember,youwanttoskipthefirstpage.SincePyPDF2considers0tobethefirstpage,yourloopshouldstartat1➊andthengoupto,butnotinclude,theintegerinpdfReader.numPages.
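The effect of starting the range at 1 can be checked without any PDF at all. In this small sketch, pages_to_copy() is a hypothetical helper (not part of PyPDF2) that returns the zero-based page numbers the loop above would visit for a document with a given page count.

```python
def pages_to_copy(num_pages):
    """Return the zero-based page numbers to copy, skipping page 0."""
    return list(range(1, num_pages))


print(pages_to_copy(4))  # [1, 2, 3]
print(pages_to_copy(1))  # []
```

A PDF that consists only of a cover sheet therefore contributes no pages to the output.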
Step4:SavetheResultsAfterthesenestedforloopsaredonelooping,thepdfWritervariablewillcontainaPdfFileWriterobjectwiththepagesforallthePDFscombined.Thelaststepistowritethiscontenttoafileontheharddrive.Addthiscodetoyourprogram:
#! python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

--snip--

# Loop through all the PDF files.
for filename in pdfFiles:
--snip--
    # Loop through all the pages (except the first) and add them.
    for pageNum in range(1, pdfReader.numPages):
    --snip--

# Save the resulting PDF to a file.
pdfOutput = open('allminutes.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
Passing'wb'toopen()openstheoutputPDFfile,allminutes.pdf,inwrite-binarymode.Then,passingtheresultingFileobjecttothewrite()methodcreatestheactualPDFfile.Acalltotheclose()methodfinishestheprogram.
IdeasforSimilarProgramsBeingabletocreatePDFsfromthepagesofotherPDFswillletyoumakeprogramsthatcandothefollowing:
CutoutspecificpagesfromPDFs.ReorderpagesinaPDF.CreateaPDFfromonlythosepagesthathavesomespecifictext,identifiedbyextractText().
WordDocumentsPythoncancreateandmodifyWorddocuments,whichhavethe.docxfileextension,withthepython-docxmodule.Youcaninstallthemodulebyrunningpipinstallpython-docx.(AppendixAhasfulldetailsoninstallingthird-partymodules.)
NOTE
WhenusingpiptofirstinstallPython-Docx,besuretoinstallpython-docx,notdocx.Theinstallationnamedocxisforadifferentmodulethatthisbookdoesnotcover.However,whenyouaregoingtoimportthepython-docxmodule,you’llneedtorunimportdocx,notimportpython-docx.
Ifyoudon’thaveWord,LibreOfficeWriterandOpenOfficeWriterarebothfreealternativeapplicationsforWindows,OSX,andLinuxthatcanbeusedtoopen.docxfiles.Youcandownloadthemfromhttps://www.libreoffice.organdhttp://openoffice.org,respectively.ThefulldocumentationforPython-Docxisavailableathttps://python-docx.readthedocs.org/.AlthoughthereisaversionofWordforOSX,thischapterwillfocusonWordforWindows.
Comparedtoplaintext,.docxfileshavealotofstructure.ThisstructureisrepresentedbythreedifferentdatatypesinPython-Docx.Atthehighestlevel,aDocumentobjectrepresentstheentiredocument.TheDocumentobjectcontainsalistofParagraphobjectsfortheparagraphsinthedocument.(AnewparagraphbeginswhenevertheuserpressesENTERorRETURNwhiletypinginaWorddocument.)EachoftheseParagraphobjectscontainsalistofoneormoreRunobjects.Thesingle-sentenceparagraphinFigure13-4hasfourruns.
Figure13-4.TheRunobjectsidentifiedinaParagraphobject
ThetextinaWorddocumentismorethanjustastring.Ithasfont,size,color,andotherstylinginformationassociatedwithit.AstyleinWordisacollectionoftheseattributes.ARunobjectisacontiguousrunoftextwiththesamestyle.AnewRunobjectisneededwheneverthetextstylechanges.
ReadingWordDocumentsLet’sexperimentwiththepython-docxmodule.Downloaddemo.docxfromhttp://nostarch.com/automatestuff/andsavethedocumenttotheworkingdirectory.Thenenterthefollowingintotheinteractiveshell:
>>> import docx
➊ >>> doc = docx.Document('demo.docx')
➋ >>> len(doc.paragraphs)
7
➌ >>> doc.paragraphs[0].text
'Document Title'
➍ >>> doc.paragraphs[1].text
'A plain paragraph with some bold and some italic'
➎ >>> len(doc.paragraphs[1].runs)
4
➏ >>> doc.paragraphs[1].runs[0].text
'A plain paragraph with some '
➐ >>> doc.paragraphs[1].runs[1].text
'bold'
➑ >>> doc.paragraphs[1].runs[2].text
' and some '
➒ >>> doc.paragraphs[1].runs[3].text
'italic'
At➊,weopena.docxfileinPython,calldocx.Document(),andpassthefilenamedemo.docx.ThiswillreturnaDocumentobject,whichhasaparagraphsattributethatisalistofParagraphobjects.Whenwecalllen()ondoc.paragraphs,itreturns7,whichtellsusthattherearesevenParagraphobjectsinthisdocument➋.EachoftheseParagraphobjectshasatextattributethatcontainsastringofthetextinthatparagraph(withoutthestyleinformation).Here,thefirsttextattributecontains'DocumentTitle'➌,andthesecondcontains'Aplainparagraphwithsomeboldandsomeitalic'➍.
Each Paragraph object also has a runs attribute that is a list of Run objects. Run objects also have a text attribute, containing just the text in that particular run. Let’s look at the text attributes in the second Paragraph object, 'A plain paragraph with some bold and some italic'. Calling len() on this Paragraph object’s runs attribute tells us that there are four Run objects ➎. The first Run object contains 'A plain paragraph with some ' ➏. Then, the text changes to a bold style, so 'bold' starts a new Run object ➐. The text returns to an unbolded style after that, which results in a third Run object, ' and some ' ➑. Finally, the fourth and last Run object contains 'italic' in an italic style ➒.
WithPython-Docx,yourPythonprogramswillnowbeabletoreadthetextfroma.docxfileanduseitjustlikeanyotherstringvalue.
GettingtheFullTextfroma.docxFileIfyoucareonlyaboutthetext,notthestylinginformation,intheWorddocument,youcanusethegetText()function.Itacceptsafilenameofa.docxfileandreturnsasinglestringvalueofitstext.Openanewfileeditorwindowandenterthefollowingcode,savingitasreadDocx.py:
#! python3

import docx

def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)
ThegetText()functionopenstheWorddocument,loopsoveralltheParagraphobjectsintheparagraphslist,andthenappendstheirtexttothelistinfullText.Aftertheloop,thestringsinfullTextarejoinedtogetherwithnewlinecharacters.
ThereadDocx.pyprogramcanbeimportedlikeanyothermodule.NowifyoujustneedthetextfromaWorddocument,youcanenterthefollowing:
>>>importreadDocx
>>>print(readDocx.getText('demo.docx'))
DocumentTitle
Aplainparagraphwithsomeboldandsomeitalic
Heading,level1
Intensequote
firstiteminunorderedlist
firstiteminorderedlist
You can also adjust getText() to modify the string before returning it. For example, to indent each paragraph, replace the append() call in readDocx.py with this:
fullText.append('  ' + para.text)
To add a double space in between paragraphs, change the join() call code to this:
return '\n\n'.join(fullText)
Asyoucansee,ittakesonlyafewlinesofcodetowritefunctionsthatwillreada.docxfileandreturnastringofitscontenttoyourliking.
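Both adjustments can be folded into one helper. The sketch below uses a hypothetical function named joinParagraphs() with made-up parameter names. It takes a list of plain strings rather than a .docx file, so it runs without Python-Docx installed; in readDocx.py the strings would come from para.text.

```python
def joinParagraphs(paragraphTexts, indent='', separator='\n'):
    """Join paragraph strings, optionally indenting each one.

    indent is prepended to every paragraph; separator goes between them.
    """
    return separator.join(indent + text for text in paragraphTexts)


texts = ['Document Title', 'A plain paragraph']
print(joinParagraphs(texts))                                # plain, one per line
print(joinParagraphs(texts, indent='  ', separator='\n\n'))  # indented, double-spaced
```

Keeping the formatting choices in parameters means the caller decides on the layout without editing the function each time.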
StylingParagraphandRunObjectsInWordforWindows,youcanseethestylesbypressingCTRL-ALT-SHIFT-StodisplaytheStylespane,whichlookslikeFigure13-5.OnOSX,youcanviewtheStylespanebyclickingtheView▸Stylesmenuitem.
Figure13-5.DisplaytheStylespanebypressingCTRL-ALT-SHIFT-SonWindows.
Wordandotherwordprocessorsusestylestokeepthevisualpresentationofsimilartypesoftextconsistentandeasytochange.Forexample,perhapsyouwanttosetbodyparagraphsin11-point,TimesNewRoman,left-justified,ragged-righttext.Youcancreateastylewiththesesettingsandassignittoallbodyparagraphs.Then,ifyoulaterwanttochangethepresentationofallbodyparagraphsinthedocument,youcanjustchangethestyle,andallthoseparagraphswillbeautomaticallyupdated.
ForWorddocuments,therearethreetypesofstyles:ParagraphstylescanbeappliedtoParagraphobjects,characterstylescanbeappliedtoRunobjects,andlinkedstylescanbe
appliedtobothkindsofobjects.YoucangivebothParagraphandRunobjectsstylesbysettingtheirstyleattributetoastring.Thisstringshouldbethenameofastyle.IfstyleissettoNone,thentherewillbenostyleassociatedwiththeParagraphorRunobject.
ThestringvaluesforthedefaultWordstylesareasfollows:
'Normal' 'Heading5' 'ListBullet' 'ListParagraph'
'BodyText' 'Heading6' 'ListBullet2' 'MacroText'
'BodyText2' 'Heading7' 'ListBullet3' 'NoSpacing'
'BodyText3' 'Heading8' 'ListContinue' 'Quote'
'Caption' 'Heading9' 'ListContinue2' 'Subtitle'
'Heading1' 'IntenseQuote' 'ListContinue3' 'TOCHeading'
'Heading2' 'List' 'ListNumber' 'Title'
'Heading3' 'List2' 'ListNumber2'
'Heading4' 'List3' 'ListNumber3'
When setting the style attribute, do not use spaces in the style name. For example, while the style name may be Subtle Emphasis, you should set the style attribute to the string value 'SubtleEmphasis' instead of 'Subtle Emphasis'. Including spaces will cause Word to misread the style name and not apply it.
WhenusingalinkedstyleforaRunobject,youwillneedtoadd'Char'totheendofitsname.Forexample,tosettheQuotelinkedstyleforaParagraphobject,youwoulduseparagraphObj.style='Quote',butforaRunobject,youwoulduserunObj.style='QuoteChar'.
InthecurrentversionofPython-Docx(0.7.4),theonlystylesthatcanbeusedarethedefaultWordstylesandthestylesintheopened.docx.Newstylescannotbecreated—thoughthismaychangeinfutureversionsofPython-Docx.
CreatingWordDocumentswithNondefaultStylesIfyouwanttocreateWorddocumentsthatusestylesbeyondthedefaultones,youwillneedtoopenWordtoablankWorddocumentandcreatethestylesyourselfbyclickingtheNewStylebuttonatthebottomoftheStylespane(Figure13-6showsthisonWindows).
ThiswillopentheCreateNewStylefromFormattingdialog,whereyoucanenterthenewstyle.Then,gobackintotheinteractiveshellandopenthisblankWorddocumentwithdocx.Document(),usingitasthebaseforyourWorddocument.ThenameyougavethisstylewillnowbeavailabletousewithPython-Docx.
Figure13-6.TheNewStylebutton(left)andtheCreateNewStylefromFormattingdialog(right)
RunAttributesRunscanbefurtherstyledusingtextattributes.Eachattributecanbesettooneofthreevalues:True(theattributeisalwaysenabled,nomatterwhatotherstylesareappliedtotherun),False(theattributeisalwaysdisabled),orNone(defaultstowhatevertherun’sstyleissetto).
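The three-valued rule can be modeled in a few lines. This is only a conceptual sketch of the behavior described above, not how Python-Docx or Word computes formatting internally; effective_setting() and its parameter names are hypothetical.

```python
def effective_setting(run_value, style_value):
    """Model the rule above for one text attribute such as bold.

    run_value is True, False, or None (the value set on the run).
    style_value is True or False (what the run's style specifies).
    """
    if run_value is None:
        return style_value  # None defers to the style.
    return run_value        # True and False override the style.


print(effective_setting(None, True))    # True: inherited from the style
print(effective_setting(False, True))   # False: the run overrides the style
print(effective_setting(True, False))   # True: the run overrides the style
```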
Table13-1liststhetextattributesthatcanbesetonRunobjects.
Table13-1.RunObjecttextAttributes
Attribute Description
bold Thetextappearsinbold.
italic Thetextappearsinitalic.
underline Thetextisunderlined.
strike Thetextappearswithstrikethrough.
double_strike Thetextappearswithdoublestrikethrough.
all_caps Thetextappearsincapitalletters.
small_caps Thetextappearsincapitalletters,withlowercaseletterstwopointssmaller.
shadow Thetextappearswithashadow.
outline Thetextappearsoutlinedratherthansolid.
rtl Thetextiswrittenright-to-left.
imprint Thetextappearspressedintothepage.
emboss Thetextappearsraisedoffthepageinrelief.
Forexample,tochangethestylesofdemo.docx,enterthefollowingintotheinteractiveshell:
>>>doc=docx.Document('demo.docx')
>>>doc.paragraphs[0].text
'DocumentTitle'
>>>doc.paragraphs[0].style
'Title'
>>>doc.paragraphs[0].style='Normal'
>>>doc.paragraphs[1].text
'Aplainparagraphwithsomeboldandsomeitalic'
>>>(doc.paragraphs[1].runs[0].text,doc.paragraphs[1].runs[1].text,doc.
paragraphs[1].runs[2].text,doc.paragraphs[1].runs[3].text)
('Aplainparagraphwithsome','bold','andsome','italic')
>>>doc.paragraphs[1].runs[0].style='QuoteChar'
>>>doc.paragraphs[1].runs[1].underline=True
>>>doc.paragraphs[1].runs[3].underline=True
>>>doc.save('restyled.docx')
Here, we use the text and style attributes to easily see what’s in the paragraphs in our document. We can see that it’s simple to divide a paragraph into runs and access each run individually. So we get the first, second, and fourth runs in the second paragraph, style each run, and save the results to a new document.
The words Document Title at the top of restyled.docx will have the Normal style instead of the Title style, the Run object for the text A plain paragraph with some will have the QuoteChar style, and the two Run objects for the words bold and italic will have their underline attributes set to True. Figure 13-7 shows how the styles of paragraphs and runs look in restyled.docx.
Figure13-7.Therestyled.docxfile
YoucanfindmorecompletedocumentationonPython-Docx’suseofstylesathttps://python-docx.readthedocs.org/en/latest/user/styles.html.
WritingWordDocumentsEnterthefollowingintotheinteractiveshell:
>>>importdocx
>>>doc=docx.Document()
>>>doc.add_paragraph('Helloworld!')
<docx.text.Paragraphobjectat0x0000000003B56F60>
>>>doc.save('helloworld.docx')
Tocreateyourown.docxfile,calldocx.Document()toreturnanew,blankWordDocumentobject.Theadd_paragraph()documentmethodaddsanewparagraphoftexttothedocumentandreturnsareferencetotheParagraphobjectthatwasadded.Whenyou’redoneaddingtext,passafilenamestringtothesave()documentmethodtosavetheDocumentobjecttoafile.
Thiswillcreateafilenamedhelloworld.docxinthecurrentworkingdirectorythat,whenopened,lookslikeFigure13-8.
Figure13-8.TheWorddocumentcreatedusingadd_paragraph('Helloworld!')
Youcanaddparagraphsbycallingtheadd_paragraph()methodagainwiththenewparagraph’stext.Ortoaddtexttotheendofanexistingparagraph,youcancalltheparagraph’sadd_run()methodandpassitastring.Enterthefollowingintotheinteractiveshell:
>>>importdocx
>>>doc=docx.Document()
>>>doc.add_paragraph('Helloworld!')
<docx.text.Paragraphobjectat0x000000000366AD30>
>>>paraObj1=doc.add_paragraph('Thisisasecondparagraph.')
>>>paraObj2=doc.add_paragraph('Thisisayetanotherparagraph.')
>>>paraObj1.add_run('Thistextisbeingaddedtothesecondparagraph.')
<docx.text.Runobjectat0x0000000003A2C860>
>>>doc.save('multipleParagraphs.docx')
The resulting document will look like Figure 13-9. Note that the text This text is being added to the second paragraph. was added to the Paragraph object in paraObj1, which was the second paragraph added to doc. The add_paragraph() and add_run() methods return Paragraph and Run objects, respectively, to save you the trouble of extracting them as a separate step.
KeepinmindthatasofPython-Docxversion0.5.3,newParagraphobjectscanbeaddedonlytotheendofthedocument,andnewRunobjectscanbeaddedonlytotheendofaParagraphobject.
Thesave()methodcanbecalledagaintosavetheadditionalchangesyou’vemade.
Figure13-9.ThedocumentwithmultipleParagraphandRunobjectsadded
Bothadd_paragraph()andadd_run()acceptanoptionalsecondargumentthatisastringoftheParagraphorRunobject’sstyle.Forexample:
>>>doc.add_paragraph('Helloworld!','Title')
ThislineaddsaparagraphwiththetextHelloworld!intheTitlestyle.
AddingHeadingsCallingadd_heading()addsaparagraphwithoneoftheheadingstyles.Enterthefollowingintotheinteractiveshell:
>>>doc=docx.Document()
>>>doc.add_heading('Header0',0)
<docx.text.Paragraphobjectat0x00000000036CB3C8>
>>>doc.add_heading('Header1',1)
<docx.text.Paragraphobjectat0x00000000036CB630>
>>>doc.add_heading('Header2',2)
<docx.text.Paragraphobjectat0x00000000036CB828>
>>>doc.add_heading('Header3',3)
<docx.text.Paragraphobjectat0x00000000036CB2E8>
>>>doc.add_heading('Header4',4)
<docx.text.Paragraphobjectat0x00000000036CB3C8>
>>>doc.save('headings.docx')
The arguments to add_heading() are a string of the heading text and an integer from 0 to 4. The integer 0 makes the heading the Title style, which is used for the top of the document. Integers 1 to 4 are for various heading levels, with 1 being the main heading and 4 the lowest subheading. The add_heading() method returns a Paragraph object, which saves you from having to extract it from the Document object as a separate step.
Theresultingheadings.docxfilewilllooklikeFigure13-10.
Figure13-10.Theheadings.docxdocumentwithheadings0to4
AddingLineandPageBreaksToaddalinebreak(ratherthanstartingawholenewparagraph),youcancalltheadd_break()methodontheRunobjectyouwanttohavethebreakappearafter.Ifyouwanttoaddapagebreakinstead,youneedtopassthevaluedocx.text.WD_BREAK.PAGEasaloneargumenttoadd_break(),asisdoneinthemiddleofthefollowingexample:
>>>doc=docx.Document()
>>>doc.add_paragraph('Thisisonthefirstpage!')
<docx.text.Paragraphobjectat0x0000000003785518>
➊>>>doc.paragraphs[0].runs[0].add_break(docx.text.WD_BREAK.PAGE)
>>>doc.add_paragraph('Thisisonthesecondpage!')
<docx.text.Paragraphobjectat0x00000000037855F8>
>>>doc.save('twoPage.docx')
Thiscreatesatwo-pageWorddocumentwithThisisonthefirstpage!onthefirstpageandThisisonthesecondpage!onthesecond.EventhoughtherewasstillplentyofspaceonthefirstpageafterthetextThisisonthefirstpage!,weforcedthenextparagraphtobeginonanewpagebyinsertingapagebreakafterthefirstrunofthefirstparagraph➊.
AddingPicturesDocumentobjectshaveanadd_picture()methodthatwillletyouaddanimagetotheendofthedocument.Sayyouhaveafilezophie.pnginthecurrentworkingdirectory.Youcanaddzophie.pngtotheendofyourdocumentwithawidthof1inchandheightof4centimeters(Wordcanusebothimperialandmetricunits)byenteringthefollowing:
>>>doc.add_picture('zophie.png',width=docx.shared.Inches(1),
height=docx.shared.Cm(4))
<docx.shape.InlineShapeobjectat0x00000000036C7D30>
Thefirstargumentisastringoftheimage’sfilename.Theoptionalwidthandheightkeywordargumentswillsetthewidthandheightoftheimageinthedocument.Ifleftout,thewidthandheightwilldefaulttothenormalsizeoftheimage.
You’llprobablyprefertospecifyanimage’sheightandwidthinfamiliarunitssuchas
inchesandcentimeters,soyoucanusethedocx.shared.Inches()anddocx.shared.Cm()functionswhenyou’respecifyingthewidthandheightkeywordarguments.
SummaryTextinformationisn’tjustforplaintextfiles;infact,it’sprettylikelythatyoudealwithPDFsandWorddocumentsmuchmoreoften.YoucanusethePyPDF2moduletoreadandwritePDFdocuments.Unfortunately,readingtextfromPDFdocumentsmightnotalwaysresultinaperfecttranslationtoastringbecauseofthecomplicatedPDFfileformat,andsomePDFsmightnotbereadableatall.Inthesecases,you’reoutofluckunlessfutureupdatestoPyPDF2supportadditionalPDFfeatures.
Worddocumentsaremorereliable,andyoucanreadthemwiththepython-docxmodule.YoucanmanipulatetextinWorddocumentsviaParagraphandRunobjects.Theseobjectscanalsobegivenstyles,thoughtheymustbefromthedefaultsetofstylesorstylesalreadyinthedocument.Youcanaddnewparagraphs,headings,breaks,andpicturestothedocument,thoughonlytotheend.
ManyofthelimitationsthatcomewithworkingwithPDFsandWorddocumentsarebecausetheseformatsaremeanttobenicelydisplayedforhumanreaders,ratherthaneasytoparsebysoftware.Thenextchaptertakesalookattwoothercommonformatsforstoringinformation:JSONandCSVfiles.Theseformatsaredesignedtobeusedbycomputers,andyou’llseethatPythoncanworkwiththeseformatsmuchmoreeasily.
Practice Questions
Q: 1. A string value of the PDF filename is not passed to the PyPDF2.PdfFileReader() function. What do you pass to the function instead?
Q: 2.WhatmodesdotheFileobjectsforPdfFileReader()andPdfFileWriter()needtobeopenedin?
Q: 3.HowdoyouacquireaPageobjectforAboutThisBookfromaPdfFileReaderobject?
Q: 4.WhatPdfFileReadervariablestoresthenumberofpagesinthePDFdocument?
Q: 5.IfaPdfFileReaderobject’sPDFisencryptedwiththepasswordswordfish,whatmustyoudobeforeyoucanobtainPageobjectsfromit?
Q: 6.Whatmethodsdoyouusetorotateapage?
Q: 7.WhatmethodreturnsaDocumentobjectforafilenameddemo.docx?
Q: 8.WhatisthedifferencebetweenaParagraphobjectandaRunobject?
Q: 9.HowdoyouobtainalistofParagraphobjectsforaDocumentobjectthat’sstoredinavariablenameddoc?
Q: 10.Whattypeofobjecthasbold,underline,italic,strike,andoutlinevariables?
Q: 11.WhatisthedifferencebetweensettingtheboldvariabletoTrue,False,orNone?
Q: 12.HowdoyoucreateaDocumentobjectforanewWorddocument?
Q: 13.Howdoyouaddaparagraphwiththetext'Hellothere!'toaDocumentobjectstoredinavariablenameddoc?
Q: 14.WhatintegersrepresentthelevelsofheadingsavailableinWorddocuments?
PracticeProjectsForpractice,writeprogramsthatdothefollowing.
PDFParanoiaUsingtheos.walk()functionfromChapter9,writeascriptthatwillgothrougheveryPDFinafolder(anditssubfolders)andencryptthePDFsusingapasswordprovidedonthecommandline.SaveeachencryptedPDFwithan_encrypted.pdfsuffixaddedtotheoriginalfilename.Beforedeletingtheoriginalfile,havetheprogramattempttoreadanddecryptthefiletoensurethatitwasencryptedcorrectly.
Then,writeaprogramthatfindsallencryptedPDFsinafolder(anditssubfolders)andcreatesadecryptedcopyofthePDFusingaprovidedpassword.Ifthepasswordisincorrect,theprogramshouldprintamessagetotheuserandcontinuetothenextPDF.
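Two pieces of this project need no PDF library at all: walking the folder tree and building the output filename. The sketch below is one possible starting point, not a full solution. find_pdfs() and encrypted_name() are hypothetical helper names, and the naming scheme (minutes.pdf becomes minutes_encrypted.pdf) is an assumption about what the suffix rule above intends.

```python
import os


def encrypted_name(path):
    """Return path with _encrypted inserted before the file extension."""
    root, ext = os.path.splitext(path)
    return root + '_encrypted' + ext


def find_pdfs(folder):
    """Yield the path of every .pdf file in folder and its subfolders."""
    for dirpath, dirnames, filenames in os.walk(folder):
        for name in filenames:
            if name.lower().endswith('.pdf'):
                yield os.path.join(dirpath, name)


print(encrypted_name('minutes.pdf'))  # minutes_encrypted.pdf
```

The encryption and the verification step would then be applied to each path that find_pdfs() yields.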
CustomInvitationsasWordDocumentsSayyouhaveatextfileofguestnames.Thisguests.txtfilehasonenameperline,asfollows:
Prof.Plum
MissScarlet
Col.Mustard
AlSweigart
Robocop
WriteaprogramthatwouldgenerateaWorddocumentwithcustominvitationsthatlooklikeFigure13-11.
SincePython-DocxcanuseonlythosestylesthatalreadyexistintheWorddocument,youwillhavetofirstaddthesestylestoablankWordfileandthenopenthatfilewithPython-Docx.ThereshouldbeoneinvitationperpageintheresultingWorddocument,socalladd_break()toaddapagebreakafterthelastparagraphofeachinvitation.Thisway,youwillneedtoopenonlyoneWorddocumenttoprintalloftheinvitationsatonce.
Figure13-11.TheWorddocumentgeneratedbyyourcustominvitescript
Youcandownloadasampleguests.txtfilefromhttp://nostarch.com/automatestuff/.
Brute-ForcePDFPasswordBreakerSayyouhaveanencryptedPDFthatyouhaveforgottenthepasswordto,butyourememberitwasasingleEnglishword.Tryingtoguessyourforgottenpasswordisquiteaboringtask.InsteadyoucanwriteaprogramthatwilldecryptthePDFbytryingeverypossibleEnglishworduntilitfindsonethatworks.Thisiscalledabrute-forcepasswordattack.Downloadthetextfiledictionary.txtfromhttp://nostarch.com/automatestuff/.Thisdictionaryfilecontainsover44,000Englishwordswithonewordperline.
Using the file-reading skills you learned in Chapter 8, create a list of word strings by reading this file. Then loop over each word in this list, passing it to the decrypt() method. If this method returns the integer 0, the password was wrong and your program should continue to the next password. If decrypt() returns 1, then your program should break out of the loop and print the hacked password. You should try both the uppercase and lowercase forms of each word. (On my laptop, going through all 88,000 uppercase and lowercase words from the dictionary file takes a couple of minutes. This is why you shouldn’t use a simple English word for your passwords.)
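The word-list half of this project can be sketched separately from the PDF half. In the sketch below, candidate_passwords() is a hypothetical helper that turns lines read from a file into the uppercase and lowercase guesses described above. The two sample words are made up; a real run would pass in the lines of dictionary.txt.

```python
def candidate_passwords(lines):
    """Yield the uppercase and lowercase form of each word in lines."""
    for line in lines:
        word = line.strip()  # Drop the trailing newline from each line.
        if not word:
            continue         # Skip blank lines.
        yield word.upper()
        yield word.lower()


sampleLines = ['Apple\n', 'zebra\n']  # Stand-in for the dictionary file.
guesses = list(candidate_passwords(sampleLines))
print(guesses)  # ['APPLE', 'apple', 'ZEBRA', 'zebra']
```

Each guess would then be passed to decrypt() until a call returns a nonzero value.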
Chapter14.WorkingwithCSVFilesandJSONDataInChapter13,youlearnedhowtoextracttextfromPDFandWorddocuments.Thesefileswereinabinaryformat,whichrequiredspecialPythonmodulestoaccesstheirdata.CSVandJSONfiles,ontheotherhand,arejustplaintextfiles.Youcanviewtheminatexteditor,suchasIDLE’sfileeditor.ButPythonalsocomeswiththespecialcsvandjsonmodules,eachprovidingfunctionstohelpyouworkwiththesefileformats.
CSVstandsfor“comma-separatedvalues,”andCSVfilesaresimplifiedspreadsheetsstoredasplaintextfiles.Python’scsvmodulemakesiteasytoparseCSVfiles.
JSON(pronounced“JAY-sawn”or“Jason”—itdoesn’tmatterhowbecauseeitherwaypeoplewillsayyou’repronouncingitwrong)isaformatthatstoresinformationasJavaScriptsourcecodeinplaintextfiles.
(JSONisshortforJavaScriptObjectNotation.)Youdon’tneedtoknowtheJavaScriptprogramminglanguagetouseJSONfiles,buttheJSONformatisusefultoknowbecauseit’susedinmanywebapplications.
TheCSVModuleEachlineinaCSVfilerepresentsarowinthespreadsheet,andcommasseparatethecellsintherow.Forexample,thespreadsheetexample.xlsxfromhttp://nostarch.com/automatestuff/wouldlooklikethisinaCSVfile:
4/5/2015 13:34,Apples,73
4/5/2015 3:41,Cherries,85
4/6/2015 12:46,Pears,14
4/8/2015 8:59,Oranges,52
4/10/2015 2:07,Apples,152
4/10/2015 18:10,Bananas,23
4/10/2015 2:40,Strawberries,98
Iwillusethisfileforthischapter’sinteractiveshellexamples.Youcandownloadexample.csvfromhttp://nostarch.com/automatestuff/orenterthetextintoatexteditorandsaveitasexample.csv.
CSV files are simple, lacking many of the features of an Excel spreadsheet. For example, CSV files
Don’t have types for their values; everything is a string
Don’t have settings for font size or color
Don’t have multiple worksheets
Can’t specify cell widths and heights
Can’t have merged cells
Can’t have images or charts embedded in them
The advantage of CSV files is simplicity. CSV files are widely supported by many types of programs, can be viewed in text editors (including IDLE’s file editor), and are a straightforward way to represent spreadsheet data. The CSV format is exactly as advertised: It’s just a text file of comma-separated values.
Since CSV files are just text files, you might be tempted to read them in as a string and then process that string using the techniques you learned in Chapter 8. For example, since each cell in a CSV file is separated by a comma, maybe you could just call the split() method on each line of text to get the values. But not every comma in a CSV file represents the boundary between two cells. CSV files also have their own set of escape characters to allow commas and other characters to be included as part of the values. The split() method doesn’t handle these escape characters. Because of these potential pitfalls, you should always use the csv module for reading and writing CSV files.
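To make the pitfall concrete, here is a small sketch that parses the same line two ways. The sample line is made up for illustration, and io.StringIO simply stands in for a real file.

```python
import csv
import io

# A made-up CSV line whose first cell contains a comma inside double quotes.
line = '"Hello, world!",eggs,bacon,ham'

# Naive approach: split() treats every comma as a cell boundary.
naive_cells = line.split(',')

# csv.reader understands the quoting rules and keeps the first cell intact.
parsed_cells = next(csv.reader(io.StringIO(line)))

print(naive_cells)   # five pieces, with stray quote characters
print(parsed_cells)  # four cells
```

The split() version cuts the first cell in half and leaves the quote characters in the data, while the csv module returns the four values you would expect.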
Reader Objects
To read data from a CSV file with the csv module, you need to create a Reader object. A Reader object lets you iterate over lines in the CSV file. Enter the following into the interactive shell, with example.csv in the current working directory:
➊ >>> import csv
➋ >>> exampleFile = open('example.csv')
➌ >>> exampleReader = csv.reader(exampleFile)
➍ >>> exampleData = list(exampleReader)
➎ >>> exampleData
[['4/5/2015 13:34', 'Apples', '73'], ['4/5/2015 3:41', 'Cherries', '85'],
['4/6/2015 12:46', 'Pears', '14'], ['4/8/2015 8:59', 'Oranges', '52'],
['4/10/2015 2:07', 'Apples', '152'], ['4/10/2015 18:10', 'Bananas', '23'],
['4/10/2015 2:40', 'Strawberries', '98']]
The csv module comes with Python, so we can import it ➊ without having to install it first.
To read a CSV file with the csv module, first open it using the open() function ➋, just as you would any other text file. But instead of calling the read() or readlines() method on the File object that open() returns, pass it to the csv.reader() function ➌. This will return a Reader object for you to use. Note that you don’t pass a filename string directly to the csv.reader() function.
The most direct way to access the values in the Reader object is to convert it to a plain Python list by passing it to list() ➍. Using list() on this Reader object returns a list of lists, which you can store in a variable like exampleData. Entering exampleData in the shell displays the list of lists ➎.
Now that you have the CSV file as a list of lists, you can access the value at a particular row and column with the expression exampleData[row][col], where row is the index of one of the lists in exampleData, and col is the index of the item you want from that list. Enter the following into the interactive shell:
>>> exampleData[0][0]
'4/5/2015 13:34'
>>> exampleData[0][1]
'Apples'
>>> exampleData[0][2]
'73'
>>> exampleData[1][1]
'Cherries'
>>> exampleData[6][1]
'Strawberries'
exampleData[0][0] goes into the first list and gives us the first string, exampleData[0][2] goes into the first list and gives us the third string, and so on.
Reading Data from Reader Objects in a for Loop
For large CSV files, you’ll want to use the Reader object in a for loop. This avoids loading the entire file into memory at once. For example, enter the following into the interactive shell:
>>> import csv
>>> exampleFile = open('example.csv')
>>> exampleReader = csv.reader(exampleFile)
>>> for row in exampleReader:
        print('Row #' + str(exampleReader.line_num) + ' ' + str(row))

Row #1 ['4/5/2015 13:34', 'Apples', '73']
Row #2 ['4/5/2015 3:41', 'Cherries', '85']
Row #3 ['4/6/2015 12:46', 'Pears', '14']
Row #4 ['4/8/2015 8:59', 'Oranges', '52']
Row #5 ['4/10/2015 2:07', 'Apples', '152']
Row #6 ['4/10/2015 18:10', 'Bananas', '23']
Row #7 ['4/10/2015 2:40', 'Strawberries', '98']
After you import the csv module and make a Reader object from the CSV file, you can loop through the rows in the Reader object. Each row is a list of values, with each value representing a cell.
The print() function call prints the number of the current row and the contents of the row. To get the row number, use the Reader object’s line_num variable, which contains the number of the current line.
The Reader object can be looped over only once. To reread the CSV file, you must call csv.reader() again to create a new Reader object.
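Here is a short sketch of what “only once” means in practice. It uses io.StringIO as an in-memory stand-in for a file returned by open(), which is an assumption made so the example runs without example.csv on disk.

```python
import csv
import io

# io.StringIO stands in for a file opened with open('example.csv').
example_file = io.StringIO('4/5/2015 13:34,Apples,73\n4/5/2015 3:41,Cherries,85\n')

reader = csv.reader(example_file)
first_pass = list(reader)    # consumes every row
second_pass = list(reader)   # nothing is left to read

# Rewind the underlying file and build a fresh Reader to read the rows again.
example_file.seek(0)
third_pass = list(csv.reader(example_file))

print(len(first_pass), len(second_pass), len(third_pass))
```

With a real file you could also close it and call open() again instead of rewinding with seek(0).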
Writer Objects
A Writer object lets you write data to a CSV file. To create a Writer object, you use the csv.writer() function. Enter the following into the interactive shell:
>>> import csv
➊ >>> outputFile = open('output.csv', 'w', newline='')
➋ >>> outputWriter = csv.writer(outputFile)
>>> outputWriter.writerow(['spam', 'eggs', 'bacon', 'ham'])
21
>>> outputWriter.writerow(['Hello, world!', 'eggs', 'bacon', 'ham'])
32
>>> outputWriter.writerow([1, 2, 3.141592, 4])
16
>>> outputFile.close()
First, call open() and pass it 'w' to open a file in write mode ➊. This will create the object you can then pass to csv.writer() ➋ to create a Writer object.
On Windows, you’ll also need to pass a blank string for the open() function’s newline keyword argument. For technical reasons beyond the scope of this book, if you forget to set the newline argument, the rows in output.csv will be double-spaced, as shown in Figure 14-1.
Figure 14-1. If you forget the newline='' keyword argument in open(), the CSV file will be double-spaced.
The writerow() method for Writer objects takes a list argument. Each value in the list is placed in its own cell in the output CSV file. The return value of writerow() is the number of characters written to the file for that row (including newline characters).
This code produces an output.csv file that looks like this:
spam,eggs,bacon,ham
"Hello, world!",eggs,bacon,ham
1,2,3.141592,4
Notice how the Writer object automatically escapes the comma in the value 'Hello, world!' with double quotes in the CSV file. The csv module saves you from having to handle these special cases yourself.
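To see the escaping at work in both directions, the following sketch writes two made-up rows to an in-memory buffer (io.StringIO, used here instead of a real file) and then reads them back. The second row is an extra illustration that is not part of output.csv: it shows that a double quote inside a value is written as two double quotes.

```python
import csv
import io

# Made-up rows: one value contains a comma, another contains double quotes.
rows = [['Hello, world!', 'eggs'], ['She said "hi"', 'ham']]

buffer = io.StringIO(newline='')   # in-memory stand-in for an output file
writer = csv.writer(buffer)
for row in rows:
    writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original values exactly.
round_trip = list(csv.reader(io.StringIO(csv_text, newline='')))
print(round_trip == rows)
```

Because the Writer and Reader agree on the quoting rules, values survive the round trip unchanged.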
The delimiter and lineterminator Keyword Arguments
Say you want to separate cells with a tab character instead of a comma and you want the rows to be double-spaced. You could enter something like the following into the interactive shell:
>>> import csv
>>> csvFile = open('example.tsv', 'w', newline='')
➊ >>> csvWriter = csv.writer(csvFile, delimiter='\t', lineterminator='\n\n')
>>> csvWriter.writerow(['apples', 'oranges', 'grapes'])
23
>>> csvWriter.writerow(['eggs', 'bacon', 'ham'])
16
>>> csvWriter.writerow(['spam', 'spam', 'spam', 'spam', 'spam', 'spam'])
31
>>> csvFile.close()
This changes the delimiter and line terminator characters in your file. The delimiter is the character that appears between cells on a row. By default, the delimiter for a CSV file is a comma. The line terminator is the character that comes at the end of a row. By default, the line terminator is a newline. You can change these characters to different values by using the delimiter and lineterminator keyword arguments with csv.writer().
Passing delimiter='\t' and lineterminator='\n\n' ➊ changes the character between cells to a tab and the character between rows to two newlines. We then call writerow() three times to give us three rows.
This produces a file named example.tsv with the following contents:
apples    oranges    grapes

eggs    bacon    ham

spam    spam    spam    spam    spam    spam

Now that our cells are separated by tabs, we’re using the file extension .tsv, for tab-separated values.
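The delimiter keyword argument works for reading too. The sketch below writes one tab-separated row to an in-memory buffer and reads it back; it uses a single '\n' as the line terminator (rather than the double-spaced '\n\n' above) so that no blank rows appear when reading.

```python
import csv
import io

buffer = io.StringIO(newline='')   # in-memory stand-in for example.tsv
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
chars_written = writer.writerow(['apples', 'oranges', 'grapes'])

tsv_text = buffer.getvalue()

# Pass the same delimiter to csv.reader() so it splits on tabs, not commas.
rows = list(csv.reader(io.StringIO(tsv_text), delimiter='\t'))

print(repr(tsv_text))
print(chars_written)
print(rows)
```

If you forget delimiter='\t' when reading, each row comes back as a single cell containing the tab characters.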
Project: Removing the Header from CSV Files
Say you have the boring job of removing the first line from several hundred CSV files. Maybe you’ll be feeding them into an automated process that requires just the data and not the headers at the top of the columns. You could open each file in Excel, delete the first row, and resave the file, but that would take hours. Let’s write a program to do it instead.
The program will need to open every file with the .csv extension in the current working directory, read in the contents of the CSV file, and rewrite the contents without the first row to a file of the same name in a separate headerRemoved folder. This gives you a new, headless copy of each CSV file.
NOTE
As always, whenever you write a program that modifies files, be sure to back up the files first, just in case your program does not work the way you expect it to. You don’t want to accidentally erase your original files.
At a high level, the program must do the following:
Find all the CSV files in the current working directory.
Read in the full contents of each file.
Write out the contents, skipping the first line, to a new CSV file.
At the code level, this means the program will need to do the following:
Loop over a list of files from os.listdir(), skipping the non-CSV files.
Create a CSV Reader object and read in the contents of the file, using the line_num attribute to figure out which line to skip.
Create a CSV Writer object and write out the read-in data to the new file.
For this project, open a new file editor window and save it as removeCsvHeader.py.
Step 1: Loop Through Each CSV File
The first thing your program needs to do is loop over a list of all CSV filenames for the current working directory. Make your removeCsvHeader.py look like this:
  #! python3
  # removeCsvHeader.py - Removes the header from all CSV files in the current
  # working directory.

  import csv, os

  os.makedirs('headerRemoved', exist_ok=True)

  # Loop through every file in the current working directory.
  for csvFilename in os.listdir('.'):
      if not csvFilename.endswith('.csv'):
➊         continue    # skip non-csv files

      print('Removing header from ' + csvFilename + '...')

      # TODO: Read the CSV file in (skipping first row).

      # TODO: Write out the CSV file.
The os.makedirs() call will create a headerRemoved folder where all the headless CSV files will be written. A for loop on os.listdir('.') gets you partway there, but it will loop over all files in the working directory, so you’ll need to add some code at the start of the loop that skips filenames that don’t end with .csv. The continue statement ➊ makes the for loop move on to the next filename when it comes across a non-CSV file.
Just so there’s some output as the program runs, print out a message saying which CSV file the program is working on. Then, add some TODO comments for what the rest of the program should do.
Step 2: Read in the CSV File
The program doesn’t remove the first line from the CSV file. Rather, it creates a new copy of the CSV file without the first line. The copy keeps the original filename but is saved in the headerRemoved folder, so the original file is left untouched.
The program will need a way to track whether it is currently looping on the first row. Add the following to removeCsvHeader.py.
#! python3
# removeCsvHeader.py - Removes the header from all CSV files in the current
# working directory.

--snip--

    # Read the CSV file in (skipping first row).
    csvRows = []
    csvFileObj = open(csvFilename)
    readerObj = csv.reader(csvFileObj)
    for row in readerObj:
        if readerObj.line_num == 1:
            continue    # skip first row
        csvRows.append(row)
    csvFileObj.close()

    # TODO: Write out the CSV file.
The Reader object’s line_num attribute can be used to determine which line in the CSV file it is currently reading. Another for loop will loop over the rows returned from the CSV Reader object, and all rows but the first will be appended to csvRows.
As the for loop iterates over each row, the code checks whether readerObj.line_num is set to 1. If so, it executes a continue to move on to the next row without appending it to csvRows. For every row afterward, the condition will always be False, and the row will be appended to csvRows.
Step 3: Write Out the CSV File Without the First Row
Now that csvRows contains all rows but the first row, the list needs to be written out to a CSV file in the headerRemoved folder. Add the following to removeCsvHeader.py:
  #! python3
  # removeCsvHeader.py - Removes the header from all CSV files in the current
  # working directory.

  --snip--

  # Loop through every file in the current working directory.
➊ for csvFilename in os.listdir('.'):
      if not csvFilename.endswith('.csv'):
          continue    # skip non-CSV files

      --snip--

      # Write out the CSV file.
      csvFileObj = open(os.path.join('headerRemoved', csvFilename), 'w',
                        newline='')
      csvWriter = csv.writer(csvFileObj)
      for row in csvRows:
          csvWriter.writerow(row)
      csvFileObj.close()
The CSV Writer object will write the list to a CSV file in headerRemoved using csvFilename (which we also used in the CSV reader). Because the new file goes into the headerRemoved folder, the original file is not overwritten.
Once we create the Writer object, we loop over the sublists stored in csvRows and write each sublist to the file.
After the code is executed, the outer for loop ➊ will loop to the next filename from os.listdir('.'). When that loop is finished, the program will be complete.
To test your program, download removeCsvHeader.zip from http://nostarch.com/automatestuff/ and unzip it to a folder. Run the removeCsvHeader.py program in that folder. The output will look like this:
Removing header from NAICS_data_1048.csv...
Removing header from NAICS_data_1218.csv...
--snip--
Removing header from NAICS_data_9834.csv...
Removing header from NAICS_data_9986.csv...
This program should print a filename each time it strips the first line from a CSV file.
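If you are curious, here is one possible variation on the same idea, offered as a sketch rather than a replacement for removeCsvHeader.py. Instead of checking line_num, it calls next() on the Reader object to discard the first row, and it streams the remaining rows straight to the Writer object without building a csvRows list. The function names remove_header and remove_headers_in are made up for this example.

```python
import csv
import os

def remove_header(src_path, dest_path):
    """Copy the CSV at src_path to dest_path without its first row."""
    with open(src_path, newline='') as src, \
         open(dest_path, 'w', newline='') as dest:
        reader = csv.reader(src)
        writer = csv.writer(dest)
        next(reader, None)        # discard the header row, if there is one
        writer.writerows(reader)  # stream the remaining rows across

def remove_headers_in(folder, out_name='headerRemoved'):
    """Write headless copies of every .csv file in folder to folder/out_name."""
    out_dir = os.path.join(folder, out_name)
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(folder):
        if name.endswith('.csv'):
            print('Removing header from ' + name + '...')
            remove_header(os.path.join(folder, name),
                          os.path.join(out_dir, name))
```

Calling remove_headers_in('.') would process the current working directory. The with statements close both files automatically, even if an error occurs partway through.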
Ideas for Similar Programs
The programs that you could write for CSV files are similar to the kinds you could write for Excel files, since they’re both spreadsheet files. You could write programs to do the following:
Compare data between different rows in a CSV file or between multiple CSV files.
Copy specific data from a CSV file to an Excel file, or vice versa.
Check for invalid data or formatting mistakes in CSV files and alert the user to these errors.
Read data from a CSV file as input for your Python programs.
JSON and APIs
JavaScript Object Notation is a popular way to format data as a single human-readable string. JSON is the native way that JavaScript programs write their data structures and usually resembles what Python’s pprint() function would produce. You don’t need to know JavaScript in order to work with JSON-formatted data.
Here’s an example of data formatted as JSON:
{"name": "Zophie", "isCat": true,
 "miceCaught": 0, "napsTaken": 37.5,
 "felineIQ": null}
JSON is useful to know, because many websites offer JSON content as a way for programs to interact with the website. This is known as providing an application programming interface (API). Accessing an API is the same as accessing any other web page via a URL. The difference is that the data returned by an API is formatted (with JSON, for example) for machines; APIs aren’t easy for people to read.
Many websites make their data available in JSON format. Facebook, Twitter, Yahoo, Google, Tumblr, Wikipedia, Flickr, Data.gov, Reddit, IMDb, Rotten Tomatoes, LinkedIn, and many other popular sites offer APIs for programs to use. Some of these sites require registration, which is almost always free. You’ll have to find documentation for what URLs your program needs to request in order to get the data you want, as well as the general format of the JSON data structures that are returned. This documentation should be provided by whatever site is offering the API; if they have a “Developers” page, look for the documentation there.
Using APIs, you could write programs that do the following:
Scrape raw data from websites. (Accessing APIs is often more convenient than downloading web pages and parsing HTML with Beautiful Soup.)
Automatically download new posts from one of your social network accounts and post them to another account. For example, you could take your Tumblr posts and post them to Facebook.
Create a “movie encyclopedia” for your personal movie collection by pulling data from IMDb, Rotten Tomatoes, and Wikipedia and putting it into a single text file on your computer.
You can see some examples of JSON APIs in the resources at http://nostarch.com/automatestuff/.
The JSON Module
Python’s json module handles all the details of translating between a string with JSON data and Python values with the json.loads() and json.dumps() functions. JSON can’t store every kind of Python value. It can contain values of only the following data types: strings, integers, floats, Booleans, lists, dictionaries, and NoneType. JSON cannot represent Python-specific objects, such as File objects, CSV Reader or Writer objects, Regex objects, or Selenium WebElement objects.
Reading JSON with the loads() Function
To translate a string containing JSON data into a Python value, pass it to the json.loads() function. (The name means “load string,” not “loads.”) Enter the following into the interactive shell:
>>> stringOfJsonData = '{"name": "Zophie", "isCat": true, "miceCaught": 0, "felineIQ": null}'
>>> import json
>>> jsonDataAsPythonValue = json.loads(stringOfJsonData)
>>> jsonDataAsPythonValue
{'isCat': True, 'miceCaught': 0, 'name': 'Zophie', 'felineIQ': None}
After you import the json module, you can call loads() and pass it a string of JSON data. Note that JSON strings always use double quotes. The call returns that data as a Python dictionary. In Python versions before 3.7, dictionaries are not ordered, so the key-value pairs may appear in a different order when you print jsonDataAsPythonValue.
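As a quick check of how JSON values map to Python types, the sketch below loads a slightly extended version of the Zophie data (the "toys" key is made up for this example) and records the type of each value. It also shows what happens when a string uses single quotes instead of the double quotes JSON requires.

```python
import json

# The "toys" key is a made-up addition to show a JSON array.
text = ('{"name": "Zophie", "isCat": true, "miceCaught": 0, '
        '"napsTaken": 37.5, "felineIQ": null, "toys": ["yarn", "mouse"]}')

value = json.loads(text)
types = {key: type(val).__name__ for key, val in value.items()}
print(types)

# JSON strings must use double quotes; single quotes are rejected.
try:
    json.loads("{'name': 'Zophie'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)
```

JSON's true, false, and null come back as Python's True, False, and None, and a JSON array becomes a Python list.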
Writing JSON with the dumps() Function
The json.dumps() function (which means “dump string,” not “dumps”) will translate a Python value into a string of JSON-formatted data. Enter the following into the interactive shell:
>>> pythonValue = {'isCat': True, 'miceCaught': 0, 'name': 'Zophie',
'felineIQ': None}
>>> import json
>>> stringOfJsonData = json.dumps(pythonValue)
>>> stringOfJsonData
'{"isCat": true, "felineIQ": null, "miceCaught": 0, "name": "Zophie"}'
The value can only be one of the following basic Python data types: dictionary, list, integer, float, string, Boolean, or None.
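The following sketch shows two things that go slightly beyond the shell session above: the optional indent and sort_keys keyword arguments of json.dumps(), which make the output easier to read, and the TypeError you get when a value is not one of the supported types. The set of toys is made-up example data.

```python
import json

cat = {'name': 'Zophie', 'isCat': True, 'felineIQ': None}

# indent and sort_keys are optional; they only affect how the string looks.
pretty = json.dumps(cat, indent=2, sort_keys=True)
print(pretty)

# A set is not one of the basic types JSON can represent.
try:
    json.dumps({'toys': {'yarn', 'mouse'}})
    set_was_accepted = True
except TypeError:
    set_was_accepted = False
print(set_was_accepted)
```

If you need to store a set, one simple option is to convert it to a list before calling json.dumps().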
Project: Fetching Current Weather Data
Checking the weather seems fairly trivial: Open your web browser, click the address bar, type the URL to a weather website (or search for one and then click the link), wait for the page to load, look past all the ads, and so on.
Actually, there are a lot of boring steps you could skip if you had a program that downloaded the weather forecast for the next few days and printed it as plaintext. This program uses the requests module from Chapter 11 to download data from the Web.
Overall, the program does the following:
Reads the requested location from the command line.
Downloads JSON weather data from OpenWeatherMap.org.
Converts the string of JSON data to a Python data structure.
Prints the weather for today and the next two days.
So the code will need to do the following:
Join strings in sys.argv to get the location.
Call requests.get() to download the weather data.
Call json.loads() to convert the JSON data to a Python data structure.
Print the weather forecast.
For this project, open a new file editor window and save it as quickWeather.py.
Step 1: Get Location from the Command Line Argument
The input for this program will come from the command line. Make quickWeather.py look like this:
#! python3
# quickWeather.py - Prints the weather for a location from the command line.

import json, requests, sys

# Compute location from command line arguments.
if len(sys.argv) < 2:
    print('Usage: quickWeather.py location')
    sys.exit()
location = ' '.join(sys.argv[1:])

# TODO: Download the JSON data from OpenWeatherMap.org's API.

# TODO: Load JSON data into a Python variable.
In Python, command line arguments are stored in the sys.argv list. After the #! shebang line and import statements, the program will check that there is more than one command line argument. (Recall that sys.argv will always have at least one element, sys.argv[0], which contains the Python script’s filename.) If there is only one element in the list, then the user didn’t provide a location on the command line, and a “usage” message will be provided to the user before the program ends.
Command line arguments are split on spaces. The command line argument San Francisco, CA would make sys.argv hold ['quickWeather.py', 'San', 'Francisco,', 'CA']. Therefore, call the join() method to join all the strings except for the first in sys.argv. Store this joined string in a variable named location.
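To try the joining logic without running the script from a terminal, you can wrap it in a function that takes the argument list as a parameter. The function name location_from_args is made up for this sketch.

```python
def location_from_args(argv):
    """Return the location typed after the script name, or None if absent."""
    if len(argv) < 2:
        return None   # only the script name was given
    # Rejoin the pieces that the command line split on spaces.
    return ' '.join(argv[1:])

print(location_from_args(['quickWeather.py', 'San', 'Francisco,', 'CA']))
print(location_from_args(['quickWeather.py']))
```

In the real program you would pass sys.argv to this function and print the usage message when it returns None.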
Step 2: Download the JSON Data
OpenWeatherMap.org provides real-time weather information in JSON format. Your program simply has to download the page at http://api.openweathermap.org/data/2.5/forecast/daily?q=<Location>&cnt=3, where <Location> is the name of the city whose weather you want. Add the following to quickWeather.py.
#! python3
# quickWeather.py - Prints the weather for a location from the command line.

--snip--

# Download the JSON data from OpenWeatherMap.org's API.
url = 'http://api.openweathermap.org/data/2.5/forecast/daily?q=%s&cnt=3' % (location)
response = requests.get(url)
response.raise_for_status()

# TODO: Load JSON data into a Python variable.
We have location from our command line arguments. To make the URL we want to access, we use the %s placeholder and insert whatever string is stored in location into that spot in the URL string. We store the result in url and pass url to requests.get(). The requests.get() call returns a Response object, which you can check for errors by calling raise_for_status(). If no exception is raised, the downloaded text will be in response.text.
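A location such as San Francisco, CA contains spaces and a comma, which are not plain URL characters. The sketch below is one way to build the same URL with those characters encoded, using urlencode() from the standard library's urllib.parse module. The build_url function and BASE_URL constant are names made up for this example, and no request is actually sent.

```python
from urllib.parse import urlencode

# The endpoint from this chapter, without its query string.
BASE_URL = 'http://api.openweathermap.org/data/2.5/forecast/daily'

def build_url(location, days=3):
    """Return the forecast URL with the location safely encoded."""
    query = urlencode({'q': location, 'cnt': days})
    return BASE_URL + '?' + query

print(build_url('San Francisco, CA'))
```

The requests module can do the same encoding for you if you pass the values as a dictionary through the params keyword argument of requests.get().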
Step 3: Load JSON Data and Print Weather
The response.text member variable holds a large string of JSON-formatted data. To convert this to a Python value, call the json.loads() function. The loaded data will look something like this:
{'city': {'coord': {'lat': 37.7771, 'lon': -122.42},
          'country': 'United States of America',
          'id': '5391959',
          'name': 'San Francisco',
          'population': 0},
 'cnt': 3,
 'cod': '200',
 'list': [{'clouds': 0,
           'deg': 233,
           'dt': 1402344000,
           'humidity': 58,
           'pressure': 1012.23,
           'speed': 1.96,
           'temp': {'day': 302.29,
                    'eve': 296.46,
                    'max': 302.29,
                    'min': 289.77,
                    'morn': 294.59,
                    'night': 289.77},
           'weather': [{'description': 'sky is clear',
                        'icon': '01d',
--snip--
You can see this data by passing weatherData to pprint.pprint(). You may want to check http://openweathermap.org/ for more documentation on what these fields mean. For example, the online documentation will tell you that the 302.29 after 'day' is the daytime temperature in Kelvin, not Celsius or Fahrenheit.
The weather descriptions you want are after 'main' and 'description'. To neatly print them out, add the following to quickWeather.py.
  #! python3
  # quickWeather.py - Prints the weather for a location from the command line.

  --snip--

  # Load JSON data into a Python variable.
  weatherData = json.loads(response.text)

  # Print weather descriptions.
➊ w = weatherData['list']
  print('Current weather in %s:' % (location))
  print(w[0]['weather'][0]['main'], '-', w[0]['weather'][0]['description'])
  print()
  print('Tomorrow:')
  print(w[1]['weather'][0]['main'], '-', w[1]['weather'][0]['description'])
  print()
  print('Day after tomorrow:')
  print(w[2]['weather'][0]['main'], '-', w[2]['weather'][0]['description'])
Notice how the code stores weatherData['list'] in the variable w to save you some typing ➊. You use w[0], w[1], and w[2] to retrieve the dictionaries for today, tomorrow, and the day after tomorrow’s weather, respectively. Each of these dictionaries has a 'weather' key, which contains a list value. You’re interested in the first list item, a nested dictionary with several more keys, at index 0. Here, we print the values stored in the 'main' and 'description' keys, separated by a hyphen.
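You can practice the nested lookups without downloading anything. The sketch below uses a made-up JSON string that contains only the keys the program reads (a real response has many more fields), and a loop in place of the three repeated print() calls. The summarize function is a name invented for this example.

```python
import json

# A trimmed-down, made-up response with only the keys the program uses.
sample = '''{"list": [
  {"weather": [{"main": "Clear", "description": "sky is clear"}]},
  {"weather": [{"main": "Clouds", "description": "few clouds"}]},
  {"weather": [{"main": "Clear", "description": "sky is clear"}]}
]}'''

def summarize(weather_data):
    """Return one 'main - description' string per forecast day."""
    lines = []
    for day in weather_data['list']:
        first = day['weather'][0]   # first item in the 'weather' list
        lines.append(first['main'] + ' - ' + first['description'])
    return lines

summaries = summarize(json.loads(sample))
for line in summaries:
    print(line)
```

Looping over weather_data['list'] also means the code keeps working if you later ask the API for more than three days.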
When this program is run with the command line argument quickWeather.py San Francisco, CA, the output looks something like this:
Current weather in San Francisco, CA:
Clear - sky is clear

Tomorrow:
Clouds - few clouds

Day after tomorrow:
Clear - sky is clear
(The weather is one of the reasons I like living in San Francisco!)
Ideas for Similar Programs
Accessing weather data can form the basis for many types of programs. You can create similar programs to do the following:
Collect weather forecasts for several campsites or hiking trails to see which one will have the best weather.
Schedule a program to regularly check the weather and send you a frost alert if you need to move your plants indoors. (Chapter 15 covers scheduling, and Chapter 16 explains how to send email.)
Pull weather data from multiple sites to show all at once, or calculate and show the average of the multiple weather predictions.
Summary
CSV and JSON are common plaintext formats for storing data. They are easy for programs to parse while still being human readable, so they are often used for simple spreadsheets or web app data. The csv and json modules greatly simplify the process of reading and writing to CSV and JSON files.
The last few chapters have taught you how to use Python to parse information from a wide variety of file formats. One common task is taking data from a variety of formats and parsing it for the particular information you need. These tasks are often specific to the point that commercial software is not optimally helpful. By writing your own scripts, you can make the computer handle large amounts of data presented in these formats.
In Chapter 15, you’ll break away from data formats and learn how to make your programs communicate with you by sending emails and text messages.
Practice Questions
Q: 1. What are some features Excel spreadsheets have that CSV spreadsheets don’t?
Q: 2. What do you pass to csv.reader() and csv.writer() to create Reader and Writer objects?
Q: 3. What modes do File objects for Reader and Writer objects need to be opened in?
Q: 4. What method takes a list argument and writes it to a CSV file?
Q: 5. What do the delimiter and lineterminator keyword arguments do?
Q: 6. What function takes a string of JSON data and returns a Python data structure?
Q: 7. What function takes a Python data structure and returns a string of JSON data?
Practice Project
For practice, write a program that does the following.
Excel-to-CSV Converter
Excel can save a spreadsheet to a CSV file with a few mouse clicks, but if you had to convert hundreds of Excel files to CSVs, it would take hours of clicking. Using the openpyxl module from Chapter 12, write a program that reads all the Excel files in the current working directory and outputs them as CSV files.
A single Excel file might contain multiple sheets; you’ll have to create one CSV file per sheet. The filenames of the CSV files should be <excel filename>_<sheet title>.csv, where <excel filename> is the filename of the Excel file without the file extension (for example, 'spam_data', not 'spam_data.xlsx') and <sheet title> is the string from the Worksheet object’s title variable.
This program will involve many nested for loops. The skeleton of the program will look something like this:
for excelFile in os.listdir('.'):
    # Skip non-xlsx files, load the workbook object.
    for sheetName in wb.sheetnames:
        # Loop through every sheet in the workbook.
        sheet = wb[sheetName]

        # Create the CSV filename from the Excel filename and sheet title.
        # Create the csv.writer object for this CSV file.

        # Loop through every row in the sheet.
        for rowNum in range(1, sheet.max_row + 1):
            rowData = []    # append each cell to this list
            # Loop through each cell in the row.
            for colNum in range(1, sheet.max_column + 1):
                # Append each cell's data to rowData.

            # Write the rowData list to the CSV file.

        csvFile.close()
Download the ZIP file excelSpreadsheets.zip from http://nostarch.com/automatestuff/, and unzip the spreadsheets into the same directory as your program. You can use these as the files to test the program on.
Chapter 15. Keeping Time, Scheduling Tasks, and Launching Programs
Running programs while you’re sitting at your computer is fine, but it’s also useful to have programs run without your direct supervision. Your computer’s clock can schedule programs to run code at some specified time and date or at regular intervals. For example, your program could scrape a website every hour to check for changes or do a CPU-intensive task at 4 AM while you sleep. Python’s time and datetime modules provide these functions.
You can also write programs that launch other programs on a schedule by using the subprocess and threading modules. Often, the fastest way to program is to take advantage of applications that other people have already written.
The time Module
Your computer’s system clock is set to a specific date, time, and time zone. The built-in time module allows your Python programs to read the system clock for the current time. The time.time() and time.sleep() functions are the most useful in the time module.
The time.time() Function
The Unix epoch is a time reference commonly used in programming: 12 AM on January 1, 1970, Coordinated Universal Time (UTC). The time.time() function returns the number of seconds since that moment as a float value. (Recall that a float is just a number with a decimal point.) This number is called an epoch timestamp. For example, enter the following into the interactive shell:
>>> import time
>>> time.time()
1425063955.068649
Here I’m calling time.time() on February 27, 2015, at 11:05 AM Pacific Standard Time, or 7:05 PM UTC. The return value is how many seconds have passed between the Unix epoch and the moment time.time() was called.
NOTE
The interactive shell examples will yield dates and times for when I wrote this chapter in February 2015. Unless you’re a time traveler, your dates and times will be different.
Epoch timestamps can be used to profile code, that is, measure how long a piece of code takes to run. If you call time.time() at the beginning of the code block you want to measure and again at the end, you can subtract the first timestamp from the second to find the elapsed time between those two calls. For example, open a new file editor window and enter the following program:
  import time
➊ def calcProd():
      # Calculate the product of the first 99,999 numbers.
      product = 1
      for i in range(1, 100000):
          product = product * i
      return product

➋ startTime = time.time()
  prod = calcProd()
➌ endTime = time.time()
➍ print('The result is %s digits long.' % (len(str(prod))))
➎ print('Took %s seconds to calculate.' % (endTime - startTime))
At ➊, we define a function calcProd() to loop through the integers from 1 to 99,999 and return their product. At ➋, we call time.time() and store it in startTime. Right after calling calcProd(), we call time.time() again and store it in endTime ➌. We end by printing the length of the product returned by calcProd() ➍ and how long it took to run calcProd() ➎.
Save this program as calcProd.py and run it. The output will look something like this:
The result is 456569 digits long.
Took 2.844162940979004 seconds to calculate.
NOTE
Another way to profile your code is to use the cProfile.run() function, which provides a much more informative level of detail than the simple time.time() technique. The cProfile.run() function is explained at https://docs.python.org/3/library/profile.html.
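If you find yourself timing several functions, the start-and-stop pattern from calcProd.py can be wrapped in a small helper. This is only a sketch: time_call and product_up_to are names made up for this example, and the product uses a much smaller range so it finishes almost instantly.

```python
import time

def time_call(func, *args):
    """Return (result, elapsed_seconds) for one call to func(*args)."""
    start = time.time()
    result = func(*args)
    elapsed = time.time() - start
    return result, elapsed

def product_up_to(n):
    """Multiply the integers from 1 up to, but not including, n."""
    product = 1
    for i in range(1, n):
        product = product * i
    return product

result, seconds = time_call(product_up_to, 1000)
print('The result is %s digits long.' % (len(str(result))))
print('Took %s seconds to calculate.' % (round(seconds, 4)))
```

The elapsed time you see will vary from run to run and from computer to computer.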
The time.sleep() Function
If you need to pause your program for a while, call the time.sleep() function and pass it the number of seconds you want your program to stay paused. Enter the following into the interactive shell:
>>> import time
>>> for i in range(3):
      ➊ print('Tick')
      ➋ time.sleep(1)
      ➌ print('Tock')
      ➍ time.sleep(1)

Tick
Tock
Tick
Tock
Tick
Tock
➎ >>> time.sleep(5)
The for loop will print Tick ➊, pause for one second ➋, print Tock ➌, pause for one second ➍, print Tick, pause, and so on until Tick and Tock have each been printed three times.
The time.sleep() function will block (that is, it will not return and release your program to execute other code) until after the number of seconds you passed to time.sleep() has elapsed. For example, if you enter time.sleep(5) ➎, you’ll see that the next prompt (>>>) doesn’t appear until five seconds have passed.
Be aware that pressing CTRL-C will not interrupt time.sleep() calls in IDLE. IDLE waits until the entire pause is over before raising the KeyboardInterrupt exception. To work around this problem, instead of having a single time.sleep(30) call to pause for 30 seconds, use a for loop to make 30 calls to time.sleep(1).
>>> for i in range(30):
        time.sleep(1)

If you press CTRL-C sometime during these 30 seconds, you should see the KeyboardInterrupt exception thrown right away.
Rounding Numbers
When working with times, you’ll often encounter float values with many digits after the decimal. To make these values easier to work with, you can shorten them with Python’s built-in round() function, which rounds a float to the precision you specify. Just pass in the number you want to round, plus an optional second argument representing how many digits after the decimal point you want to round it to. If you omit the second argument, round() rounds your number to the nearest whole integer. Enter the following into the interactive shell:
>>> import time
>>> now = time.time()
>>> now
1425064108.017826
>>> round(now, 2)
1425064108.02
>>> round(now, 4)
1425064108.0178
>>> round(now)
1425064108
After importing time and storing time.time() in now, we call round(now, 2) to round now to two digits after the decimal, round(now, 4) to round to four digits after the decimal, and round(now) to round to the nearest integer.
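One detail worth knowing when you rely on round(): in Python 3, a value exactly halfway between two integers is rounded to the nearest even integer, not always upward. The short sketch below uses the timestamp from the shell session above plus a few halfway values chosen for illustration.

```python
# The first value repeats the timestamp from the interactive shell example.
values = [
    round(1425064108.017826, 2),
    round(0.5),   # halfway between 0 and 1
    round(1.5),   # halfway between 1 and 2
    round(2.5),   # halfway between 2 and 3
]
print(values)
```

This rarely matters for timestamps, but it can be surprising if you expect 2.5 to round up to 3.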
Project: Super Stopwatch
Say you want to track how much time you spend on boring tasks you haven’t automated yet. You don’t have a physical stopwatch, and it’s surprisingly difficult to find a free stopwatch app for your laptop or smartphone that isn’t covered in ads and doesn’t send a copy of your browser history to marketers. (It says it can do this in the license agreement you agreed to. You did read the license agreement, didn’t you?) You can write a simple stopwatch program yourself in Python.
At a high level, here’s what your program will do:
Track the amount of time elapsed between presses of the ENTER key, with each key press starting a new “lap” on the timer.
Print the lap number, total time, and lap time.
This means your code will need to do the following:
Find the current time by calling time.time() and store it as a timestamp at the start of the program, as well as at the start of each lap.
Keep a lap counter and increment it every time the user presses ENTER.
Calculate the elapsed time by subtracting timestamps.
Handle the KeyboardInterrupt exception so the user can press CTRL-C to quit.
Open a new file editor window and save it as stopwatch.py.
Step 1: Set Up the Program to Track Times
The stopwatch program will need to use the current time, so you’ll want to import the time module. Your program should also print some brief instructions to the user before calling input(), so the timer can begin after the user presses ENTER. Then the code will start tracking lap times.
Enter the following code into the file editor, writing a TODO comment as a placeholder for the rest of the code:
#! python3
# stopwatch.py - A simple stopwatch program.

import time

# Display the program's instructions.
print('Press ENTER to begin. Afterwards, press ENTER to "click" the stopwatch. Press Ctrl-C to quit.')
input()                    # press Enter to begin
print('Started.')
startTime = time.time()    # get the first lap's start time
lastTime = startTime
lapNum = 1

# TODO: Start tracking the lap times.
This code displays the instructions, starts the first lap, notes the time, and sets our lap count to 1.
Step 2: Track and Print Lap Times
Now let’s write the code to start each new lap, calculate how long the previous lap took, and calculate the total time elapsed since starting the stopwatch. We’ll display the lap time and total time and increase the lap count for each new lap. Add the following code to your program:
  #! python3
  # stopwatch.py - A simple stopwatch program.

  import time

  --snip--

  # Start tracking the lap times.
➊ try:
➋     while True:
          input()
➌         lapTime = round(time.time() - lastTime, 2)
➍         totalTime = round(time.time() - startTime, 2)
➎         print('Lap #%s: %s (%s)' % (lapNum, totalTime, lapTime), end='')
          lapNum += 1
          lastTime = time.time()    # reset the last lap time
➏ except KeyboardInterrupt:
      # Handle the Ctrl-C exception to keep its error message from displaying.
      print('\nDone.')
If the user presses CTRL-C to stop the stopwatch, the KeyboardInterrupt exception will be raised, and the program will crash if its execution is not in a try statement. To prevent crashing, we wrap this part of the program in a try statement ➊. We’ll handle the exception in the except clause ➏, so when CTRL-C is pressed and the exception is raised, the program execution moves to the except clause to print Done, instead of the KeyboardInterrupt error message. Until this happens, the execution is inside an infinite loop ➋ that calls input() and waits until the user presses ENTER to end a lap. When a lap ends, we calculate how long the lap took by subtracting the start time of the lap, lastTime, from the current time, time.time() ➌. We calculate the total time elapsed by subtracting the overall start time of the stopwatch, startTime, from the current time ➍.
Since the results of these time calculations will have many digits after the decimal point (such as 4.766272783279419), we use the round() function to round the float value to two digits at ➌ and ➍.
At ➎, we print the lap number, total time elapsed, and the lap time. Since the user pressing ENTER for the input() call will print a newline to the screen, pass end='' to the print() function to avoid double-spacing the output. After printing the lap information, we get ready for the next lap by adding 1 to the count lapNum and setting lastTime to the current time, which is the start time of the next lap.
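The lap arithmetic is easier to check when it is separated from input() and the system clock. The sketch below takes a start timestamp and a list of timestamps for each ENTER press (all made-up numbers) and returns the same lap number, total time, and lap time that stopwatch.py would print. The lap_report function is a name invented for this example.

```python
def lap_report(start_time, press_times):
    """Return (lap_number, total_time, lap_time) for each ENTER timestamp."""
    report = []
    last_time = start_time
    for lap_num, now in enumerate(press_times, start=1):
        lap_time = round(now - last_time, 2)     # time since the previous press
        total_time = round(now - start_time, 2)  # time since the stopwatch began
        report.append((lap_num, total_time, lap_time))
        last_time = now                          # the next lap starts here
    return report

# Made-up timestamps: the stopwatch starts at 100.0 seconds.
laps = lap_report(100.0, [101.5, 104.25, 104.75])
for lap_num, total_time, lap_time in laps:
    print('Lap #%s: %s (%s)' % (lap_num, total_time, lap_time))
```

Each lap time is measured from the previous press, while the total time is always measured from the original start.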
Ideas for Similar Programs
Time tracking opens up several possibilities for your programs. Although you can download apps to do some of these things, the benefit of writing programs yourself is that they will be free and not bloated with ads and useless features. You could write similar programs to do the following:
Create a simple timesheet app that records when you type a person’s name and uses the current time to clock them in or out.
Add a feature to your program to display the elapsed time since a process started, such as a download that uses the requests module. (See Chapter 11.)
Intermittently check how long a program has been running and offer the user a chance to cancel tasks that are taking too long.
The datetime Module
The time module is useful for getting a Unix epoch timestamp to work with. But if you want to display a date in a more convenient format, or do arithmetic with dates (for example, figuring out what date was 205 days ago or what date is 123 days from now), you should use the datetime module.
The datetime module has its own datetime data type. datetime values represent a specific moment in time. Enter the following into the interactive shell:
  >>> import datetime
➊ >>> datetime.datetime.now()
➋ datetime.datetime(2015, 2, 27, 11, 10, 49, 55053)
➌ >>> dt = datetime.datetime(2015, 10, 21, 16, 29, 0)
➍ >>> dt.year, dt.month, dt.day
  (2015, 10, 21)
➎ >>> dt.hour, dt.minute, dt.second
  (16, 29, 0)
Calling datetime.datetime.now() ➊ returns a datetime object ➋ for the current date and time, according to your computer’s clock. This object includes the year, month, day, hour, minute, second, and microsecond of the current moment. You can also retrieve a datetime object for a specific moment by using the datetime.datetime() function ➌, passing it integers representing the year, month, day, hour, minute, and second of the moment you want. These integers will be stored in the datetime object’s year, month, day ➍, hour, minute, and second ➎ attributes.
A Unix epoch timestamp can be converted to a datetime object with the datetime.datetime.fromtimestamp() function. The date and time of the datetime object will be converted for the local time zone. Enter the following into the interactive shell:
>>> datetime.datetime.fromtimestamp(1000000)
datetime.datetime(1970, 1, 12, 5, 46, 40)
>>> datetime.datetime.fromtimestamp(time.time())
datetime.datetime(2015, 2, 27, 11, 13, 0, 604980)
Calling datetime.datetime.fromtimestamp() and passing it 1000000 returns a datetime object for the moment 1,000,000 seconds after the Unix epoch. Passing time.time(), the Unix epoch timestamp for the current moment, returns a datetime object for the current moment. So the expressions datetime.datetime.now() and datetime.datetime.fromtimestamp(time.time()) do the same thing; they both give you a datetime object for the present moment.
NOTE
These examples were entered on a computer set to Pacific Standard Time. If you’re in another time zone, your results will look different.
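If you want a result that looks the same on every computer, one option is to ask fromtimestamp() for UTC instead of the local time zone. The sketch below assumes you are happy working in UTC; the tz argument and datetime.timezone.utc are standard parts of the datetime module, but they are not used elsewhere in this chapter, and the name utcMoment is just for illustration.

```python
import datetime

# Passing tz= makes the result independent of the computer's local time zone.
utcMoment = datetime.datetime.fromtimestamp(1000000, tz=datetime.timezone.utc)

print(utcMoment.year, utcMoment.month, utcMoment.day)
print(utcMoment.hour, utcMoment.minute, utcMoment.second)
print(utcMoment.utcoffset())   # a zero offset, because the object is in UTC
```

The hour here is 8 hours later than in the Pacific Standard Time example, because PST is 8 hours behind UTC.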
datetime objects can be compared with each other using comparison operators to find out which one precedes the other. The later datetime object is the “greater” value. Enter the following into the interactive shell:
➊ >>> halloween2015 = datetime.datetime(2015, 10, 31, 0, 0, 0)
➋ >>> newyears2016 = datetime.datetime(2016, 1, 1, 0, 0, 0)
  >>> oct31_2015 = datetime.datetime(2015, 10, 31, 0, 0, 0)
➌ >>> halloween2015 == oct31_2015
  True
➍ >>> halloween2015 > newyears2016
  False
➎ >>> newyears2016 > halloween2015
  True
  >>> newyears2016 != oct31_2015
  True
Make a datetime object for the first moment (midnight) of October 31, 2015 and store it in halloween2015 ➊. Make a datetime object for the first moment of January 1, 2016 and store it in newyears2016 ➋. Then make another object for midnight on October 31, 2015 and store it in oct31_2015. Comparing halloween2015 and oct31_2015 shows that they’re equal ➌. Comparing newyears2016 and halloween2015 shows that newyears2016 is greater (later) than halloween2015 ➍ ➎.
The timedelta Data Type
The datetime module also provides a timedelta data type, which represents a duration of time rather than a moment in time. Enter the following into the interactive shell:
➊ >>> delta = datetime.timedelta(days=11, hours=10, minutes=9, seconds=8)
➋ >>> delta.days, delta.seconds, delta.microseconds
  (11, 36548, 0)
  >>> delta.total_seconds()
  986948.0
  >>> str(delta)
  '11 days, 10:09:08'
To create a timedelta object, use the datetime.timedelta() function. The datetime.timedelta() function takes keyword arguments weeks, days, hours, minutes, seconds, milliseconds, and microseconds. There is no month or year keyword argument because “a month” or “a year” is a variable amount of time depending on the particular month or year. A timedelta object has the total duration represented in days, seconds, and microseconds. These numbers are stored in the days, seconds, and microseconds attributes, respectively. The total_seconds() method will return the duration in number of seconds alone. Passing a timedelta object to str() will return a nicely formatted, human-readable string representation of the object.
In this example, we pass keyword arguments to datetime.timedelta() to specify a duration of 11 days, 10 hours, 9 minutes, and 8 seconds, and store the returned timedelta object in delta ➊. This timedelta object’s days attribute stores 11, and its seconds attribute stores 36548 (10 hours, 9 minutes, and 8 seconds, expressed in seconds) ➋. Calling total_seconds() tells us that 11 days, 10 hours, 9 minutes, and 8 seconds is 986,948 seconds. Finally, passing the timedelta object to str() returns a string clearly explaining the duration.
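Because a timedelta object stores only days, seconds, and microseconds, whatever keyword arguments you pass get converted into those three units. The short sketch below uses made-up values (36 hours and 90 minutes) to show the conversion.

```python
import datetime

# 36 hours + 90 minutes = 37.5 hours = 1 day, 13 hours, 30 minutes
delta = datetime.timedelta(hours=36, minutes=90)

print(delta.days)             # whole days
print(delta.seconds)          # leftover seconds (always less than one day)
print(delta.total_seconds())  # the entire duration as seconds
print(str(delta))
```

Note that the seconds attribute holds only the part of the duration left over after the whole days are removed; use total_seconds() when you want the full duration as one number.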
The arithmetic operators can be used to perform date arithmetic on datetime values. For example, to calculate the date 1,000 days from now, enter the following into the interactive shell:
>>> dt = datetime.datetime.now()
>>> dt
datetime.datetime(2015, 2, 27, 18, 38, 50, 636181)
>>> thousandDays = datetime.timedelta(days=1000)
>>> dt + thousandDays
datetime.datetime(2017, 11, 23, 18, 38, 50, 636181)
First, make a datetime object for the current moment and store it in dt. Then make a timedelta object for a duration of 1,000 days and store it in thousandDays. Add dt and thousandDays together to get a datetime object for the date 1,000 days from now. Python will do the date arithmetic to figure out that 1,000 days after February 27, 2015, will be November 23, 2017. This is useful because when you calculate 1,000 days from a given date, you have to remember how many days are in each month and factor in leap years and other tricky details. The datetime module handles all of this for you.
timedelta objects can be added or subtracted with datetime objects or other timedelta objects using the + and - operators. A timedelta object can be multiplied or divided by integer or float values with the * and / operators. Enter the following into the interactive shell:
➊ >>> oct21st = datetime.datetime(2015, 10, 21, 16, 29, 0)
➋ >>> aboutThirtyYears = datetime.timedelta(days=365 * 30)
  >>> oct21st
  datetime.datetime(2015, 10, 21, 16, 29)
  >>> oct21st - aboutThirtyYears
  datetime.datetime(1985, 10, 28, 16, 29)
  >>> oct21st - (2 * aboutThirtyYears)
  datetime.datetime(1955, 11, 5, 16, 29)
Here we make a datetime object for October 21, 2015 ➊ and a timedelta object for a duration of about 30 years (we’re assuming 365 days for each of those years) ➋. Subtracting aboutThirtyYears from oct21st gives us a datetime object for the date about 30 years before October 21, 2015. Subtracting 2 * aboutThirtyYears from oct21st returns a datetime object for the date about 60 years before October 21, 2015. The results land on October 28 and November 5 rather than October 21 because the 365-day assumption ignores the leap days in between.
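Subtraction also works between two datetime objects, in which case the result is a timedelta object for the gap between them. Here is a small sketch using the two dates from the comparison example; the names gap, halfGap, and midpoint are just for illustration.

```python
import datetime

halloween2015 = datetime.datetime(2015, 10, 31, 0, 0, 0)
newyears2016 = datetime.datetime(2016, 1, 1, 0, 0, 0)

gap = newyears2016 - halloween2015   # datetime - datetime gives a timedelta
halfGap = gap / 2                    # timedelta / number gives a timedelta
midpoint = halloween2015 + halfGap   # datetime + timedelta gives a datetime

print(gap.days)
print(midpoint)
```

This is handy when you need to know how far apart two moments are without counting days on a calendar yourself.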
Pausing Until a Specific Date
The time.sleep() function lets you pause a program for a certain number of seconds. By using a while loop, you can pause your programs until a specific date. For example, the following code will continue to loop until Halloween 2016:
import datetime
import time
halloween2016 = datetime.datetime(2016, 10, 31, 0, 0, 0)
while datetime.datetime.now() < halloween2016:
    time.sleep(1)
The time.sleep(1) call will pause your Python program so that the computer doesn’t waste CPU processing cycles simply checking the time over and over. Rather, the while loop will just check the condition once per second and continue with the rest of the program after Halloween 2016 (or whenever you program it to stop).
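The same loop can be wrapped in a function so that you can reuse it with any target time. The following is a sketch, not code from this chapter’s projects: sleepUntil() and its pollSeconds parameter are hypothetical names, and the target here is set a fraction of a second into the future so the example finishes quickly.

```python
import datetime
import time

def sleepUntil(target, pollSeconds=1):
    """Block until datetime.datetime.now() reaches the target datetime."""
    while True:
        remaining = (target - datetime.datetime.now()).total_seconds()
        if remaining <= 0:
            return
        # Never sleep past the target, even if pollSeconds is large.
        time.sleep(min(pollSeconds, remaining))

target = datetime.datetime.now() + datetime.timedelta(seconds=0.3)
sleepUntil(target, pollSeconds=0.1)
print('Target time reached.')
```

Sleeping for the smaller of pollSeconds and the time remaining means the function returns close to the target moment instead of up to a full polling interval late.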
Converting datetime Objects into Strings
Epoch timestamps and datetime objects aren’t very friendly to the human eye. Use the strftime() method to display a datetime object as a string. (The f in the name of the strftime() function stands for format.)
The strftime() method uses directives similar to Python’s string formatting. Table 15-1 has a full list of strftime() directives.
Table 15-1. strftime() Directives
strftime directive   Meaning
%Y                   Year with century, as in '2014'
%y                   Year without century, '00' to '99' (1969 to 2068)
%m                   Month as a decimal number, '01' to '12'
%B                   Full month name, as in 'November'
%b                   Abbreviated month name, as in 'Nov'
%d                   Day of the month, '01' to '31'
%j                   Day of the year, '001' to '366'
%w                   Day of the week, '0' (Sunday) to '6' (Saturday)
%A                   Full weekday name, as in 'Monday'
%a                   Abbreviated weekday name, as in 'Mon'
%H                   Hour (24-hour clock), '00' to '23'
%I                   Hour (12-hour clock), '01' to '12'
%M                   Minute, '00' to '59'
%S                   Second, '00' to '59'
%p                   'AM' or 'PM'
%%                   Literal '%' character
Pass strftime() a custom format string containing formatting directives (along with any desired slashes, colons, and so on), and strftime() will return the datetime object’s information as a formatted string. Enter the following into the interactive shell:
>>> oct21st = datetime.datetime(2015, 10, 21, 16, 29, 0)
>>> oct21st.strftime('%Y/%m/%d %H:%M:%S')
'2015/10/21 16:29:00'
>>> oct21st.strftime('%I:%M %p')
'04:29 PM'
>>> oct21st.strftime("%B of '%y")
"October of '15"
Here we have a datetime object for October 21, 2015 at 4:29 PM, stored in oct21st. Passing strftime() the custom format string '%Y/%m/%d %H:%M:%S' returns a string containing 2015, 10, and 21 separated by slashes and 16, 29, and 00 separated by colons. Passing '%I:%M %p' returns '04:29 PM', and passing "%B of '%y" returns "October of '15". Note that strftime() doesn’t begin with datetime.datetime, because it is a method called on a datetime object rather than a function in the module.
Converting Strings into datetime Objects
If you have a string of date information, such as '2015/10/21 16:29:00' or 'October 21, 2015', and need to convert it to a datetime object, use the datetime.datetime.strptime() function. The strptime() function is the inverse of the strftime() method. A custom format string using the same directives as strftime() must be passed so that strptime() knows how to parse and understand the string. (The p in the name of the strptime() function stands for parse.)
Enter the following into the interactive shell:
➊ >>> datetime.datetime.strptime('October 21, 2015', '%B %d, %Y')
  datetime.datetime(2015, 10, 21, 0, 0)
  >>> datetime.datetime.strptime('2015/10/21 16:29:00', '%Y/%m/%d %H:%M:%S')
  datetime.datetime(2015, 10, 21, 16, 29)
  >>> datetime.datetime.strptime("October of '15", "%B of '%y")
  datetime.datetime(2015, 10, 1, 0, 0)
  >>> datetime.datetime.strptime("November of '63", "%B of '%y")
  datetime.datetime(2063, 11, 1, 0, 0)
To get a datetime object from the string 'October 21, 2015', pass 'October 21, 2015' as the first argument to strptime() and the custom format string that corresponds to 'October 21, 2015' as the second argument ➊. The string with the date information must match the custom format string exactly, or Python will raise a ValueError exception.
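Since a mismatch raises ValueError, a program that accepts dates in more than one layout can try each format string in turn. The sketch below is one way to do that; parseDate() is a hypothetical helper, the two format strings are the ones used in the examples above, and it assumes month names are in English (the default locale).

```python
import datetime

def parseDate(text, formats=('%Y/%m/%d %H:%M:%S', '%B %d, %Y')):
    """Return a datetime for text using the first matching format, or None."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue   # this format did not match; try the next one
    return None

print(parseDate('October 21, 2015'))
print(parseDate('2015/10/21 16:29:00'))
print(parseDate('sometime next week'))   # no format matches this string
```

Returning None for unparseable text is a design choice made for this sketch; your own program might prefer to let the ValueError propagate so that bad input is not silently ignored.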
Review of Python’s Time Functions
Dates and times in Python can involve quite a few different data types and functions. Here’s a review of the three different types of values used to represent time:
A Unix epoch timestamp (used by the time module) is a float or integer value of the number of seconds since 12 AM on January 1, 1970, UTC.
A datetime object (of the datetime module) has integers stored in the attributes year, month, day, hour, minute, and second.
A timedelta object (of the datetime module) represents a time duration, rather than a specific moment.
Here’s a review of time functions and their parameters and return values:
The time.time() function returns an epoch timestamp float value of the current moment.
The time.sleep(seconds) function stops the program for the amount of seconds specified by the seconds argument.
The datetime.datetime(year, month, day, hour, minute, second) function returns a datetime object of the moment specified by the arguments. If hour, minute, or second arguments are not provided, they default to 0.
The datetime.datetime.now() function returns a datetime object of the current moment.
The datetime.datetime.fromtimestamp(epoch) function returns a datetime object of the moment represented by the epoch timestamp argument.
The datetime.timedelta(weeks, days, hours, minutes, seconds, milliseconds, microseconds) function returns a timedelta object representing a duration of time. The function’s keyword arguments are all optional and do not include month or year.
The total_seconds() method for timedelta objects returns the number of seconds the timedelta object represents.
The strftime(format) method returns a string of the time represented by the datetime object in a custom format that’s based on the format string. See Table 15-1 for the format details.
The datetime.datetime.strptime(time_string, format) function returns a datetime object of the moment specified by time_string, parsed using the format string argument. See Table 15-1 for the format details.
Multithreading
To introduce the concept of multithreading, let’s look at an example situation. Say you want to schedule some code to run after a delay or at a specific time. You could add code like the following at the start of your program:
import time, datetime
startTime = datetime.datetime(2029, 10, 31, 0, 0, 0)
while datetime.datetime.now() < startTime:
    time.sleep(1)
print('Program now starting on Halloween 2029')
--snip--
This code designates a start time of October 31, 2029, and keeps calling time.sleep(1) until the start time arrives. Your program cannot do anything while waiting for the loop of time.sleep() calls to finish; it just sits around until Halloween 2029. This is because Python programs by default have a single thread of execution.
To understand what a thread of execution is, remember the Chapter 2 discussion of flow control, when you imagined the execution of a program as placing your finger on a line of code in your program and moving to the next line or wherever it was sent by a flow control statement. A single-threaded program has only one finger. But a multithreaded program has multiple fingers. Each finger still moves to the next line of code as defined by the flow control statements, but the fingers can be at different places in the program, executing different lines of code at the same time. (All of the programs in this book so far have been single threaded.)
Rather than having all of your code wait until the time.sleep() function finishes, you can execute the delayed or scheduled code in a separate thread using Python’s threading module. The separate thread will pause for the time.sleep calls. Meanwhile, your program can do other work in the original thread.
To make a separate thread, you first need to make a Thread object by calling the threading.Thread() function. Enter the following code in a new file and save it as threadDemo.py:
  import threading, time
  print('Start of program.')
➊ def takeANap():
      time.sleep(5)
      print('Wake up!')
➋ threadObj = threading.Thread(target=takeANap)
➌ threadObj.start()
  print('End of program.')
At ➊, we define a function that we want to use in a new thread. To create a Thread object, we call threading.Thread() and pass it the keyword argument target=takeANap ➋. This means the function we want to call in the new thread is takeANap(). Notice that the keyword argument is target=takeANap, not target=takeANap(). This is because you want to pass the takeANap() function itself as the argument, not call takeANap() and pass its return value.
After we store the Thread object created by threading.Thread() in threadObj, we call threadObj.start() ➌ to create the new thread and start executing the target function in the new thread. When this program is run, the output will look like this:
Start of program.
End of program.
Wake up!
This can be a bit confusing. If print('End of program.') is the last line of the program, you might think that it should be the last thing printed. The reason Wake up! comes after it is that when threadObj.start() is called, the target function for threadObj is run in a new thread of execution. Think of it as a second finger appearing at the start of the takeANap() function. The main thread continues to print('End of program.'). Meanwhile, the new thread, which has been executing the time.sleep(5) call, pauses for 5 seconds. After it wakes from its 5-second nap, it prints 'Wake up!' and then returns from the takeANap() function. Chronologically, 'Wake up!' is the last thing printed by the program.
Normally a program terminates when the last line of code in the file has run (or the sys.exit() function is called). But threadDemo.py has two threads. The first is the original thread that began at the start of the program and ends after print('End of program.'). The second thread is created when threadObj.start() is called, begins at the start of the takeANap() function, and ends after takeANap() returns.
A Python program will not terminate until all its threads have terminated. When you ran threadDemo.py, even though the original thread had terminated, the second thread was still executing the time.sleep(5) call.
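You can watch the second thread’s lifetime from the main thread with the Thread object’s is_alive() and join() methods, both of which are part of the threading module. This sketch shortens the nap to 0.2 seconds so it runs quickly; the variable names aliveDuringNap and aliveAfterJoin are just for illustration.

```python
import threading
import time

def takeANap():
    time.sleep(0.2)   # a much shorter nap than threadDemo.py's 5 seconds

threadObj = threading.Thread(target=takeANap)
threadObj.start()

aliveDuringNap = threadObj.is_alive()   # checked right away, while the nap is in progress
threadObj.join()                        # wait here until takeANap() returns
aliveAfterJoin = threadObj.is_alive()   # the thread has finished by now

print(aliveDuringNap, aliveAfterJoin)
```

The first check depends on timing: it reports a running thread only because the main thread reaches it well before the 0.2-second nap ends.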
Passing Arguments to the Thread’s Target Function
If the target function you want to run in the new thread takes arguments, you can pass the target function’s arguments to threading.Thread(). For example, say you wanted to run this print() call in its own thread:
>>> print('Cats', 'Dogs', 'Frogs', sep=' & ')
Cats & Dogs & Frogs
This print() call has three regular arguments, 'Cats', 'Dogs', and 'Frogs', and one keyword argument, sep=' & '. The regular arguments can be passed as a list to the args keyword argument in threading.Thread(). The keyword argument can be specified as a dictionary to the kwargs keyword argument in threading.Thread().
Enter the following into the interactive shell:
>>> import threading
>>> threadObj = threading.Thread(target=print, args=['Cats', 'Dogs', 'Frogs'],
kwargs={'sep': ' & '})
>>> threadObj.start()
Cats & Dogs & Frogs
To make sure the arguments 'Cats', 'Dogs', and 'Frogs' get passed to print() in the new thread, we pass args=['Cats', 'Dogs', 'Frogs'] to threading.Thread(). To make sure the keyword argument sep=' & ' gets passed to print() in the new thread, we pass kwargs={'sep': ' & '} to threading.Thread().
The threadObj.start() call will create a new thread to call the print() function, and it will pass 'Cats', 'Dogs', and 'Frogs' as arguments and ' & ' for the sep keyword argument.
This is an incorrect way to create the new thread that calls print():
threadObj = threading.Thread(target=print('Cats', 'Dogs', 'Frogs', sep=' & '))
What this ends up doing is calling the print() function and passing its return value (print()’s return value is always None) as the target keyword argument. It doesn’t pass the print() function itself. When passing arguments to a function in a new thread, use the threading.Thread() function’s args and kwargs keyword arguments.
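The args and kwargs keyword arguments work the same way for functions you write yourself. In this sketch, greet() is a hypothetical function invented for the example; it is not part of any module.

```python
import threading

def greet(greeting, name, punctuation='.'):
    print(greeting + ', ' + name + punctuation)

# Equivalent to calling greet('Hello', 'Alice', punctuation='!') in a new thread.
threadObj = threading.Thread(target=greet, args=['Hello', 'Alice'],
                             kwargs={'punctuation': '!'})
threadObj.start()
threadObj.join()   # wait for the greeting to be printed
```

The items in the args list are matched to the function’s parameters in order, and each key in the kwargs dictionary must be the name of one of the function’s parameters.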
Concurrency Issues
You can easily create several new threads and have them all running at the same time. But multiple threads can also cause problems called concurrency issues. These issues happen when threads read and write variables at the same time, causing the threads to trip over each other. Concurrency issues can be hard to reproduce consistently, making them hard to debug.
Multithreaded programming is its own wide subject and beyond the scope of this book. What you have to keep in mind is this: To avoid concurrency issues, never let multiple threads read or write the same variables. When you create a new Thread object, make sure its target function uses only local variables in that function. This will avoid hard-to-debug concurrency issues in your programs.
NOTE
A beginner’s tutorial on multithreaded programming is available at http://nostarch.com/automatestuff/.
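If your threads do need to hand results back to the main thread, one common approach (not covered in this chapter, so treat it as an assumption) is the queue.Queue class from the standard library, which is designed to be used safely from several threads at once. In this sketch, sumRange() is a hypothetical target function that does all its work in local variables and only touches the shared queue to deliver its answer.

```python
import queue
import threading

def sumRange(start, end, results):
    total = 0                          # local variable: private to this thread
    for number in range(start, end):
        total += number
    results.put((start, end, total))   # put() is safe to call from many threads

results = queue.Queue()
threads = []
for start in range(0, 400, 100):       # four threads, 100 numbers each
    t = threading.Thread(target=sumRange, args=(start, start + 100, results))
    threads.append(t)
    t.start()
for t in threads:
    t.join()                           # wait until every thread has delivered

grandTotal = 0
while not results.empty():
    grandTotal += results.get()[2]
print(grandTotal)
```

The main thread reads the queue only after every thread has finished, so no ordinary variable is ever read and written by two threads at the same time.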
Project: Multithreaded XKCD Downloader
In Chapter 11, you wrote a program that downloaded all of the XKCD comic strips from the XKCD website. This was a single-threaded program: It downloaded one comic at a time. Much of the program’s running time was spent establishing the network connection to begin the download and writing the downloaded images to the hard drive. If you have a broadband Internet connection, your single-threaded program wasn’t fully utilizing the available bandwidth.
A multithreaded program that has some threads downloading comics while others are establishing connections and writing the comic image files to disk uses your Internet connection more efficiently and downloads the collection of comics more quickly. Open a new file editor window and save it as multidownloadXkcd.py. You will modify this program to add multithreading. The completely modified source code is available to download from http://nostarch.com/automatestuff/.
Step 1: Modify the Program to Use a Function
This program will mostly be the same downloading code from Chapter 11, so I’ll skip the explanation for the Requests and BeautifulSoup code. The main changes you need to make are importing the threading module and making a downloadXkcd() function, which takes starting and ending comic numbers as parameters.
For example, calling downloadXkcd(140, 280) would loop over the downloading code to download the comics at http://xkcd.com/140, http://xkcd.com/141, http://xkcd.com/142, and so on, up to http://xkcd.com/279. Each thread that you create will call downloadXkcd() and pass a different range of comics to download.
Add the following code to your multidownloadXkcd.py program:
  #! python3
  # multidownloadXkcd.py - Downloads XKCD comics using multiple threads.
  import requests, os, bs4, threading
➊ os.makedirs('xkcd', exist_ok=True) # store comics in ./xkcd
➋ def downloadXkcd(startComic, endComic):
➌     for urlNumber in range(startComic, endComic):
          # Download the page.
          print('Downloading page http://xkcd.com/%s…' % (urlNumber))
➍         res = requests.get('http://xkcd.com/%s' % (urlNumber))
          res.raise_for_status()
➎         soup = bs4.BeautifulSoup(res.text, 'html.parser')
          # Find the URL of the comic image.
➏         comicElem = soup.select('#comic img')
          if comicElem == []:
              print('Could not find comic image.')
          else:
➐             comicUrl = comicElem[0].get('src')
              # Download the image.
              print('Downloading image %s…' % (comicUrl))
➑             res = requests.get(comicUrl)
              res.raise_for_status()
              # Save the image to ./xkcd.
              imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
              for chunk in res.iter_content(100000):
                  imageFile.write(chunk)
              imageFile.close()
  # TODO: Create and start the Thread objects.
  # TODO: Wait for all threads to end.
After importing the modules we need, we make a directory to store comics in ➊ and start defining downloadXkcd() ➋. We loop through all the numbers in the specified range ➌ and download each page ➍. We use Beautiful Soup to look through the HTML of each page ➎ and find the comic image ➏. If no comic image is found on a page, we print a message. Otherwise, we get the URL of the image ➐ and download the image ➑. Finally, we save the image to the directory we created.
Step 2: Create and Start Threads
Now that we’ve defined downloadXkcd(), we’ll create the multiple threads that each call downloadXkcd() to download different ranges of comics from the XKCD website. Add the following code to multidownloadXkcd.py after the downloadXkcd() function definition:
#! python3
# multidownloadXkcd.py - Downloads XKCD comics using multiple threads.
--snip--
# Create and start the Thread objects.
downloadThreads = [] # a list of all the Thread objects
for i in range(0, 1400, 100): # loops 14 times, creates 14 threads
    start = max(i, 1) # there is no comic 0
    downloadThread = threading.Thread(target=downloadXkcd, args=(start, i + 100))
    downloadThreads.append(downloadThread)
    downloadThread.start()
First we make an empty list downloadThreads; the list will help us keep track of the many Thread objects we’ll create. Then we start our for loop. Each time through the loop, we create a Thread object with threading.Thread(), append the Thread object to the list, and call start() to start running downloadXkcd() in the new thread. Since the for loop sets the i variable from 0 to 1400 at steps of 100, i will be set to 0 on the first iteration, 100 on the second iteration, 200 on the third, and so on. Since we pass args=(start, i + 100) to threading.Thread(), the two arguments passed to downloadXkcd() will be 1 and 100 on the first iteration (there is no comic 0, so the first range begins at 1), 100 and 200 on the second iteration, 200 and 300 on the third, and so on. Because downloadXkcd() loops over range(startComic, endComic), the ending number itself is not downloaded by that thread; the next thread picks it up as its starting number.
As the Thread object’s start() method is called and the new thread begins to run the code inside downloadXkcd(), the main thread will continue to the next iteration of the for loop and create the next thread.
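Because downloadXkcd() loops over range(startComic, endComic), its ending number is exclusive, and it is easy to leave gaps or overlaps when splitting the work by hand. One way to check the split is to compute the ranges in a separate function first. This is a sketch only: comicRanges() is a hypothetical helper, and the comic numbers in the example call are arbitrary.

```python
def comicRanges(firstComic, lastComic, perThread):
    """Return (start, end) pairs covering firstComic through lastComic.

    Each end value is exclusive, to match range(startComic, endComic).
    """
    ranges = []
    for start in range(firstComic, lastComic + 1, perThread):
        end = min(start + perThread, lastComic + 1)
        ranges.append((start, end))
    return ranges

ranges = comicRanges(1, 250, 100)
print(ranges)
```

Each pair could then be passed as the args tuple when creating a Thread object, one thread per pair.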
Step 3: Wait for All Threads to End
The main thread moves on as normal while the other threads we create download comics. But say there’s some code you don’t want to run in the main thread until all the threads have completed. Calling a Thread object’s join() method will block until that thread has finished. By using a for loop to iterate over all the Thread objects in the downloadThreads list, the main thread can call the join() method on each of the other threads. Add the following to the bottom of your program:
#! python3
# multidownloadXkcd.py - Downloads XKCD comics using multiple threads.
--snip--
# Wait for all threads to end.
for downloadThread in downloadThreads:
    downloadThread.join()
print('Done.')
The 'Done.' string will not be printed until all of the join() calls have returned. If a Thread object has already completed when its join() method is called, then the method will simply return immediately. If you wanted to extend this program with code that runs only after all of the comics downloaded, you could replace the print('Done.') line with your new code.
Launching Other Programs from Python
Your Python program can start other programs on your computer with the Popen() function in the built-in subprocess module. (The P in the name of the Popen() function stands for process.) If you have multiple instances of an application open, each of those instances is a separate process of the same program. For example, if you open multiple windows of your web browser at the same time, each of those windows is a different process of the web browser program. See Figure 15-1 for an example of multiple calculator processes open at once.
Every process can have multiple threads. Unlike threads, a process cannot directly read and write another process’s variables. If you think of a multithreaded program as having multiple fingers following source code, then having multiple processes of the same program open is like having a friend with a separate copy of the program’s source code. You are both independently executing the same program.
If you want to start an external program from your Python script, pass the program’s filename to subprocess.Popen(). (On Windows, right-click the application’s Start menu item and select Properties to view the application’s filename. On OS X, CTRL-click the application and select Show Package Contents to find the path to the executable file.) The Popen() function will then immediately return. Keep in mind that the launched program is not run in the same thread as your Python program.
Figure 15-1. Six running processes of the same calculator program
On a Windows computer, enter the following into the interactive shell:
>>> import subprocess
>>> subprocess.Popen('C:\\Windows\\System32\\calc.exe')
<subprocess.Popen object at 0x0000000003055A58>
On Ubuntu Linux, you would enter the following:
>>> import subprocess
>>> subprocess.Popen('/usr/bin/gnome-calculator')
<subprocess.Popen object at 0x7f2bcf93b20>
On OS X, the process is slightly different. See Opening Files with Default Applications.
The return value is a Popen object, which has two useful methods: poll() and wait().
You can think of the poll() method as asking your friend if she’s finished running the code you gave her. The poll() method will return None if the process is still running at the time poll() is called. If the program has terminated, it will return the process’s integer exit code. An exit code is used to indicate whether the process terminated without errors (an exit code of 0) or whether an error caused the process to terminate (a nonzero exit code, generally 1, but it may vary depending on the program).
The wait() method is like waiting for your friend to finish working on her code before you keep working on yours. The wait() method will block until the launched process has terminated. This is helpful if you want your program to pause until the user finishes with the other program. The return value of wait() is the process’s integer exit code.
On Windows, enter the following into the interactive shell. Note that the wait() call will block until you quit the launched calculator program.
➊ >>> calcProc = subprocess.Popen('c:\\Windows\\System32\\calc.exe')
➋ >>> calcProc.poll() == None
  True
➌ >>> calcProc.wait()
  0
  >>> calcProc.poll()
  0
Here we open a calculator process ➊. While it’s still running, we check if poll() returns None ➋. It should, as the process is still running. Then we close the calculator program and call wait() on the terminated process ➌. wait() and poll() now return 0, indicating that the process terminated without errors.
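If you are not on Windows, or you want to see a nonzero exit code, you can try poll() and wait() on a second Python process instead of the calculator. This sketch relies on two things that are standard but not used elsewhere in this chapter: sys.executable, which holds the path of the Python interpreter that is currently running, and the interpreter’s -c option, which runs the code given in the next argument. The exit code 3 is an arbitrary value chosen for the example.

```python
import subprocess
import sys

# The child process sleeps for half a second and then exits with code 3.
childCode = 'import sys, time; time.sleep(0.5); sys.exit(3)'
proc = subprocess.Popen([sys.executable, '-c', childCode])

stillRunning = proc.poll() is None   # None means the process has not terminated yet
exitCode = proc.wait()               # blocks until the child process exits
codeFromPoll = proc.poll()           # after termination, poll() returns the exit code too

print(stillRunning, exitCode, codeFromPoll)
```

Because no one has to close a window by hand, the wait() call here returns on its own after about half a second.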
Passing Command Line Arguments to Popen()
You can pass command line arguments to processes you create with Popen(). To do so, you pass a list as the sole argument to Popen(). The first string in this list will be the executable filename of the program you want to launch; all the subsequent strings will be the command line arguments to pass to the program when it starts. In effect, this list will be the value of sys.argv for the launched program.
Most applications with a graphical user interface (GUI) don’t use command line arguments as extensively as command line–based or terminal-based programs do. But most GUI applications will accept a single argument for a file that the applications will immediately open when they start. For example, if you’re using Windows, create a simple text file called C:\hello.txt and then enter the following into the interactive shell:
>>> subprocess.Popen(['C:\\Windows\\notepad.exe', 'C:\\hello.txt'])
<subprocess.Popen object at 0x00000000032DCEB8>
This will not only launch the Notepad application but also have it immediately open the C:\hello.txt file.
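To see for yourself how the strings in the list arrive in the launched program, you can start a small Python process that prints its own arguments. This sketch goes slightly beyond the chapter: the stdout=subprocess.PIPE and universal_newlines=True arguments and the communicate() method are standard subprocess features used here to read the child’s output as a string, and the two argument strings are made up for the example.

```python
import subprocess
import sys

# The child prints every argument after the first entry of its sys.argv.
childCode = 'import sys; print(sys.argv[1:])'
proc = subprocess.Popen([sys.executable, '-c', childCode, 'hello.txt', '--verbose'],
                        stdout=subprocess.PIPE, universal_newlines=True)
output, _ = proc.communicate()   # wait for the child and collect what it printed

print(output.strip())
```

Each string after the program (and here, after the -c code) becomes a separate item in the child’s sys.argv list, exactly as if you had typed it on the command line.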
Task Scheduler, launchd, and cron
If you are computer savvy, you may know about Task Scheduler on Windows, launchd on OS X, or the cron scheduler on Linux. These well-documented and reliable tools all allow you to schedule applications to launch at specific times. If you’d like to learn more about them, you can find links to tutorials at http://nostarch.com/automatestuff/.
Using your operating system’s built-in scheduler saves you from writing your own clock-checking code to schedule your programs. However, use the time.sleep() function if you just need your program to pause briefly. Or instead of using the operating system’s scheduler, your code can loop until a certain date and time, calling time.sleep(1) each time through the loop.
Opening Websites with Python
The webbrowser.open() function can launch a web browser from your program to a specific website, rather than opening the browser application with subprocess.Popen().
See Project: mapit.py with the webbrowser Module for more details.
Running Other Python Scripts
You can launch a Python script from Python just like any other application. You just have to pass the python.exe executable to Popen() and the filename of the .py script you want to run as its argument. For example, the following would run the hello.py script from Chapter 1:
>>> subprocess.Popen(['C:\\python34\\python.exe', 'hello.py'])
<subprocess.Popen object at 0x000000000331CF28>
Pass Popen() a list containing a string of the Python executable’s path and a string of the script’s filename. If the script you’re launching needs command line arguments, add them to the list after the script’s filename. The location of the Python executable on Windows is C:\python34\python.exe. On OS X, it is /Library/Frameworks/Python.framework/Versions/3.3/bin/python3. On Linux, it is /usr/bin/python3.
Unlike importing the Python program as a module, when your Python program launches another Python program, the two are run in separate processes and will not be able to share each other’s variables.
Opening Files with Default Applications
Double-clicking a .txt file on your computer will automatically launch the application associated with the .txt file extension. Your computer will have several of these file extension associations set up already. Python can also open files this way with Popen().
Each operating system has a program that performs the equivalent of double-clicking a document file to open it. On Windows, this is the start program. On OS X, this is the open program. On Ubuntu Linux, this is the see program. Enter the following into the interactive shell, passing 'start', 'open', or 'see' to Popen() depending on your system:
>>> fileObj = open('hello.txt', 'w')
>>> fileObj.write('Hello world!')
12
>>> fileObj.close()
>>> import subprocess
>>> subprocess.Popen(['start', 'hello.txt'], shell=True)
Here we write Hello world! to a new hello.txt file. Then we call Popen(), passing it a list containing the program name (in this example, 'start' for Windows) and the filename. We also pass the shell=True keyword argument, which is needed only on Windows. The operating system knows all of the file associations and can figure out that it should launch, say, Notepad.exe to handle the hello.txt file.
On OS X, the open program is used for opening both document files and programs. Enter the following into the interactive shell if you have a Mac:
>>> subprocess.Popen(['open', '/Applications/Calculator.app/'])
<subprocess.Popen object at 0x10202ff98>
The Calculator app should open.
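If you want one script to work on all three systems, you can choose the program name based on sys.platform, which is 'win32' on Windows and 'darwin' on OS X. The following is a sketch under those assumptions: openCommand() is a hypothetical helper that only builds the arguments for Popen() and does not launch anything itself, and it treats every other platform as Linux with the see program installed.

```python
import sys

def openCommand(filename, platform=sys.platform):
    """Return (argumentList, useShell) for opening filename with its default app."""
    if platform.startswith('win'):
        return ['start', filename], True    # start needs shell=True on Windows
    elif platform == 'darwin':
        return ['open', filename], False    # OS X
    else:
        return ['see', filename], False     # assumes the see program is available

arguments, useShell = openCommand('hello.txt')
print(arguments, useShell)
```

To actually open the file, you would then call subprocess.Popen(arguments, shell=useShell).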
THE UNIX PHILOSOPHY
Programs well designed to be launched by other programs become more powerful than their code alone. The Unix philosophy is a set of software design principles established by the programmers of the Unix operating system (on which the modern Linux and OS X are built). It says that it’s better to write small, limited-purpose programs that can interoperate, rather than large, feature-rich applications. The smaller programs are easier to understand, and by being interoperable, they can be the building blocks of much more powerful applications.
Smartphone apps follow this approach as well. If your restaurant app needs to display directions to a café, the developers didn’t reinvent the wheel by writing their own map code. The restaurant app simply launches a map app while passing it the café’s address, just as your Python code would call a function and pass it arguments.
The Python programs you’ve been writing in this book mostly fit the Unix philosophy, especially in one important way: They use command line arguments rather than input() function calls. If all the information your program needs can be supplied up front, it is preferable to have this information passed as command line arguments rather than waiting for the user to type it in. This way, the command line arguments can be entered by a human user or supplied by another program. This interoperable approach will make your programs reusable as part of another program.
The sole exception is that you don’t want passwords passed as command line arguments, since the command line may record them as part of its command history feature. Instead, your program should call the input() function when it needs you to enter a password.
You can read more about Unix philosophy at https://en.wikipedia.org/wiki/Unix_philosophy/.
Project:SimpleCountdownProgramJustlikeit’shardtofindasimplestopwatchapplication,itcanbehardtofindasimplecountdownapplication.Let’swriteacountdownprogramthatplaysanalarmattheendofthecountdown.
Atahighlevel,here’swhatyourprogramwilldo:
Countdownfrom60.Playasoundfile(alarm.wav)whenthecountdownreacheszero.
Thismeansyourcodewillneedtodothefollowing:
Pauseforonesecondinbetweendisplayingeachnumberinthecountdownbycallingtime.sleep().Callsubprocess.Popen()toopenthesoundfilewiththedefaultapplication.
Openanewfileeditorwindowandsaveitascountdown.py.
Step1:CountDownThisprogramwillrequirethetimemoduleforthetime.sleep()functionandthesubprocessmoduleforthesubprocess.Popen()function.Enterthefollowingcodeandsavethefileascountdown.py:
#!python3
#countdown.py-Asimplecountdownscript.
importtime,subprocess
➊timeLeft=60
whiletimeLeft>0:
➋print(timeLeft,end='')
➌time.sleep(1)
➍timeLeft=timeLeft-1
#TODO:Attheendofthecountdown,playasoundfile.
After importing time and subprocess, make a variable called timeLeft to hold the number of seconds left in the countdown ➊. It can start at 60, or you can change the value here to whatever you need, or even have it set from a command line argument.
In a while loop, you display the remaining count ➋, pause for one second ➌, and then decrement the timeLeft variable ➍ before the loop starts over again. The loop will keep looping as long as timeLeft is greater than 0. After that, the countdown will be over.
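The text above mentions that timeLeft could be set from a command line argument. One way this might look is sketched below. The helper names parse_seconds() and countdown() are made up for illustration and are not part of the book’s program, and the sleep parameter is there only so the pause can be swapped out when experimenting.

```python
import sys
import time


def parse_seconds(argv, default=60):
    """Return the countdown length from argv[1], or default if it is absent."""
    if len(argv) > 1:
        return int(argv[1])  # raises ValueError for non-numeric input
    return default


def countdown(seconds, sleep=time.sleep):
    """Print each remaining second, pausing one second between numbers."""
    timeLeft = seconds
    while timeLeft > 0:
        print(timeLeft)
        sleep(1)
        timeLeft = timeLeft - 1


# Hypothetical usage from a terminal: python countdown.py 10
# countdown(parse_seconds(sys.argv))
```

With the last line uncommented, running the script with an argument of 10 would count down from 10 instead of 60.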
Step 2: Play the Sound File
While there are third-party modules to play sound files of various formats, the quick and easy way is to just launch whatever application the user already uses to play sound files. The operating system will figure out from the .wav file extension which application it should launch to play the file. This .wav file could easily be some other sound file format, such as .mp3 or .ogg.
You can use any sound file that is on your computer to play at the end of the countdown, or you can download alarm.wav from http://nostarch.com/automatestuff/.
Add the following to your code:
#! python3
# countdown.py - A simple countdown script.
import time, subprocess
--snip--
# At the end of the countdown, play a sound file.
subprocess.Popen(['start', 'alarm.wav'], shell=True)
After the while loop finishes, alarm.wav (or the sound file you choose) will play to notify the user that the countdown is over. On Windows, be sure to include 'start' in the list you pass to Popen() and pass the keyword argument shell=True. On OS X, pass 'open' instead of 'start' and remove shell=True.
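If the same script has to run on more than one operating system, the choice between 'start' and 'open' could be made at run time. The following is only a sketch: opener_command() and play_alarm() are hypothetical helper names, and the Linux branch assumes the see program mentioned in this chapter’s summary is available.

```python
import subprocess
import sys


def opener_command(filename, platform=sys.platform):
    """Return (args, use_shell) for opening filename with its default app."""
    if platform.startswith('win'):
        return ['start', filename], True   # Windows needs shell=True
    if platform == 'darwin':
        return ['open', filename], False   # OS X
    return ['see', filename], False        # assumed available on Linux


def play_alarm(filename='alarm.wav'):
    """Launch the default application for the sound file."""
    args, use_shell = opener_command(filename)
    return subprocess.Popen(args, shell=use_shell)
```

Keeping the decision in a small function that only returns a list makes it easy to check without actually launching anything.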
Instead of playing a sound file, you could save a text file somewhere with a message like Break time is over! and use Popen() to open it at the end of the countdown. This will effectively create a pop-up window with a message. Or you could use the webbrowser.open() function to open a specific website at the end of the countdown. Unlike some free countdown application you’d find online, your own countdown program’s alarm can be anything you want!
Ideas for Similar Programs
A countdown is a simple delay before continuing the program’s execution. This can also be used for other applications and features, such as the following:
Use time.sleep() to give the user a chance to press CTRL-C to cancel an action, such as deleting files. Your program can print a “Press CTRL-C to cancel” message and then handle any KeyboardInterrupt exceptions with try and except statements.
For a long-term countdown, you can use timedelta objects to measure the number of days, hours, minutes, and seconds until some point (a birthday? an anniversary?) in the future.
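Both ideas can be sketched in a few lines. The function names wait_for_cancel() and time_until() are invented for this illustration, and the sleep parameter is only a convenience for trying the code without really waiting.

```python
import datetime
import time


def wait_for_cancel(seconds, sleep=time.sleep):
    """Pause for seconds; return False if the user pressed CTRL-C, else True."""
    print('Press CTRL-C to cancel')
    try:
        sleep(seconds)
    except KeyboardInterrupt:
        return False
    return True


def time_until(target, now):
    """Split the timedelta from now to target into (days, hours, minutes, seconds)."""
    delta = target - now
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return delta.days, hours, minutes, seconds
```

A program that deletes files might call wait_for_cancel(5) first and go ahead only if it returns True.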
Summary
The Unix epoch (January 1, 1970, at midnight, UTC) is a standard reference time for many programming languages, including Python. While the time.time() function returns an epoch timestamp (that is, a float value of the number of seconds since the Unix epoch), the datetime module is better for performing date arithmetic and formatting or parsing strings with date information.
The time.sleep() function will block (that is, not return) for a certain number of seconds. It can be used to add pauses to your program. But if you want to schedule your programs to start at a certain time, the instructions at http://nostarch.com/automatestuff/ can tell you how to use the scheduler already provided by your operating system.
The threading module is used to create multiple threads, which is useful when you need to download multiple files or do other tasks simultaneously. But make sure the thread reads and writes only local variables, or you might run into concurrency issues.
Finally, your Python programs can launch other applications with the subprocess.Popen() function. Command line arguments can be passed to the Popen() call to open specific documents with the application. Alternatively, you can use the start, open, or see program with Popen() to use your computer’s file associations to automatically figure out which application to use to open a document. By using the other applications on your computer, your Python programs can leverage their capabilities for your automation needs.
Practice Questions
Q: 1. What is the Unix epoch?
Q: 2. What function returns the number of seconds since the Unix epoch?
Q: 3. How can you pause your program for exactly 5 seconds?
Q: 4. What does the round() function return?
Q: 5. What is the difference between a datetime object and a timedelta object?
Q: 6. Say you have a function named spam(). How can you call this function and run the code inside it in a separate thread?
Q: 7. What should you do to avoid concurrency issues with multiple threads?
Q: 8. How can you have your Python program run the calc.exe program located in the C:\Windows\System32 folder?
Practice Projects
For practice, write programs that do the following.
Prettified Stopwatch
Expand the stopwatch project from this chapter so that it uses the rjust() and ljust() string methods to “prettify” the output. (These methods were covered in Chapter 6.) Instead of output such as this:
Lap #1: 3.56 (3.56)
Lap #2: 8.63 (5.07)
Lap #3: 17.68 (9.05)
Lap #4: 19.11 (1.43)
… the output will look like this, with each column padded to the same width:
Lap # 1:   3.56 (  3.56)
Lap # 2:   8.63 (  5.07)
Lap # 3:  17.68 (  9.05)
Lap # 4:  19.11 (  1.43)
Note that you will need string versions of the lapNum, lapTime, and totalTime integer and float variables in order to call the string methods on them.
Next, use the pyperclip module introduced in Chapter 6 to copy the text output to the clipboard so the user can quickly paste the output to a text file or email.
Scheduled Web Comic Downloader
Write a program that checks the websites of several web comics and automatically downloads the images if the comic was updated since the program’s last visit. Your operating system’s scheduler (Scheduled Tasks on Windows, launchd on OS X, and cron on Linux) can run your Python program once a day. The Python program itself can download the comic and then copy it to your desktop so that it is easy to find. This will free you from having to check the website yourself to see whether it has updated. (A list of web comics is available at http://nostarch.com/automatestuff/.)
Chapter 16. Sending Email and Text Messages
Checking and replying to email is a huge time sink. Of course, you can’t just write a program to handle all your email for you, since each message requires its own response. But you can still automate plenty of email-related tasks once you know how to write programs that can send and receive email.
For example, maybe you have a spreadsheet full of customer records and want to send each customer a different form letter depending on their age and location details. Commercial software might not be able to do this for you; fortunately, you can write your own program to send these emails, saving yourself a lot of time copying and pasting form emails.
You can also write programs to send emails and SMS texts to notify you of things even while you’re away from your computer. If you’re automating a task that takes a couple of hours to do, you don’t want to go back to your computer every few minutes to check on the program’s status. Instead, the program can just text your phone when it’s done, freeing you to focus on more important things while you’re away from your computer.
SMTP
Much like HTTP is the protocol used by computers to send web pages across the Internet, Simple Mail Transfer Protocol (SMTP) is the protocol used for sending email. SMTP dictates how email messages should be formatted, encrypted, and relayed between mail servers, and all the other details that your computer handles after you click Send. You don’t need to know these technical details, though, because Python’s smtplib module simplifies them into a few functions.
SMTP just deals with sending emails to others. A different protocol, called IMAP, deals with retrieving emails sent to you and is described in IMAP.
Sending Email
You may be familiar with sending emails from Outlook or Thunderbird or through a website such as Gmail or Yahoo! Mail. Unfortunately, Python doesn’t offer you a nice graphical user interface like those services. Instead, you call functions to perform each major step of SMTP, as shown in the following interactive shell example.
NOTE
Don’t enter this example in IDLE; it won’t work because smtp.example.com, [email protected], MY_SECRET_PASSWORD, and alice@example.com are just placeholders. This code is just an overview of the process of sending email with Python.
>>> import smtplib
>>> smtpObj = smtplib.SMTP('smtp.example.com', 587)
>>> smtpObj.ehlo()
(250, b'mx.example.com at your service, [216.172.148.131]\nSIZE 35882577\n8BITMIME\nSTARTTLS\nENHANCEDSTATUSCODES\nCHUNKING')
>>> smtpObj.starttls()
(220, b'2.0.0 Ready to start TLS')
>>> smtpObj.login('[email protected]', 'MY_SECRET_PASSWORD')
(235, b'2.7.0 Accepted')
>>> smtpObj.sendmail('[email protected]', '[email protected]', 'Subject: So long.\nDear Alice, so long and thanks for all the fish. Sincerely, Bob')
{}
>>> smtpObj.quit()
(221, b'2.0.0 closing connection ko10sm23097611pbd.52 - gsmtp')
In the following sections, we’ll go through each step, replacing the placeholders with your information to connect and log in to an SMTP server, send an email, and disconnect from the server.
Connecting to an SMTP Server
If you’ve ever set up Thunderbird, Outlook, or another program to connect to your email account, you may be familiar with configuring the SMTP server and port. These settings will be different for each email provider, but a web search for <your provider> smtp settings should turn up the server and port to use.
The domain name for the SMTP server will usually be the name of your email provider’s domain name, with smtp. in front of it. For example, Gmail’s SMTP server is at smtp.gmail.com. Table 16-1 lists some common email providers and their SMTP servers. (The port is an integer value and will almost always be 587, the port commonly used with the TLS encryption standard.)
Table 16-1. Email Providers and Their SMTP Servers
Provider                   SMTP server domain name
Gmail                      smtp.gmail.com
Outlook.com/Hotmail.com    smtp-mail.outlook.com
Yahoo Mail                 smtp.mail.yahoo.com
AT&T                       smtp.mail.att.net (port 465)
Comcast                    smtp.comcast.net
Verizon                    smtp.verizon.net (port 465)
Once you have the domain name and port information for your email provider, create an SMTP object by calling smtplib.SMTP(), passing the domain name as a string argument, and passing the port as an integer argument. The SMTP object represents a connection to an SMTP mail server and has methods for sending emails. For example, the following call creates an SMTP object for connecting to Gmail:
>>> smtpObj = smtplib.SMTP('smtp.gmail.com', 587)
>>> type(smtpObj)
<class 'smtplib.SMTP'>
Entering type(smtpObj) shows you that there’s an SMTP object stored in smtpObj. You’ll need this SMTP object in order to call the methods that log you in and send emails. If the smtplib.SMTP() call is not successful, your SMTP server might not support TLS on port 587. In this case, you will need to create an SMTP object using smtplib.SMTP_SSL() and port 465 instead.
>>> smtpObj = smtplib.SMTP_SSL('smtp.gmail.com', 465)
NOTE
If you are not connected to the Internet, Python will raise a socket.gaierror: [Errno 11004] getaddrinfo failed or similar exception.
For your programs, the differences between TLS and SSL aren’t important. You only need to know which encryption standard your SMTP server uses so you know how to connect to it. In all of the interactive shell examples that follow, the smtpObj variable will contain an SMTP object returned by the smtplib.SMTP() or smtplib.SMTP_SSL() function.
Sending the SMTP “Hello” Message
Once you have the SMTP object, call its oddly named ehlo() method to “say hello” to the SMTP email server. This greeting is the first step in SMTP and is important for establishing a connection to the server. You don’t need to know the specifics of these protocols. Just be sure to call the ehlo() method first thing after getting the SMTP object or else the later method calls will result in errors. The following is an example of an ehlo() call and its return value:
>>> smtpObj.ehlo()
(250, b'mx.google.com at your service, [216.172.148.131]\nSIZE 35882577\n8BITMIME\nSTARTTLS\nENHANCEDSTATUSCODES\nCHUNKING')
If the first item in the returned tuple is the integer 250 (the code for “success” in SMTP), then the greeting succeeded.
Starting TLS Encryption
If you are connecting to port 587 on the SMTP server (that is, you’re using TLS encryption), you’ll need to call the starttls() method next. This required step enables encryption for your connection. If you are connecting to port 465 (using SSL), then encryption is already set up, and you should skip this step.
Here’s an example of the starttls() method call:
>>> smtpObj.starttls()
(220, b'2.0.0 Ready to start TLS')
starttls() puts your SMTP connection in TLS mode. The 220 in the return value tells you that the server is ready.
Logging in to the SMTP Server
Once your encrypted connection to the SMTP server is set up, you can log in with your username (usually your email address) and email password by calling the login() method.
>>> smtpObj.login('[email protected]', 'MY_SECRET_PASSWORD')
(235, b'2.7.0 Accepted')
GMAIL’S APPLICATION-SPECIFIC PASSWORDS
Gmail has an additional security feature for Google accounts called application-specific passwords. If you receive an Application-specific password required error message when your program tries to log in, you will have to set up one of these passwords for your Python script. Check out the resources at http://nostarch.com/automatestuff/ for detailed directions on how to set up an application-specific password for your Google account.
Pass a string of your email address as the first argument and a string of your password as the second argument. The 235 in the return value means authentication was successful. Python will raise an smtplib.SMTPAuthenticationError exception for incorrect passwords.
WARNING
Be careful about putting passwords in your source code. If anyone ever copies your program, they’ll have access to your email account! It’s a good idea to call input() and have the user type in the password. It may be inconvenient to have to enter a password each time you run your program, but this approach will prevent you from leaving your password in an unencrypted file on your computer where a hacker or laptop thief could easily get it.
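As a variation on the input() advice in the warning, the standard library’s getpass module can prompt for a password without showing it on the screen as it is typed. This swaps in a different function from the one the text recommends, so treat it as an optional sketch; ask_password() and its reader parameter are hypothetical names introduced here.

```python
import getpass


def ask_password(prompt='Email password: ', reader=getpass.getpass):
    """Ask for the password at run time instead of storing it in the source."""
    password = reader(prompt)
    if not password:
        raise ValueError('no password entered')
    return password
```

The returned string can then be passed as the second argument to login(), so the password never appears in the .py file.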
Sending an Email
Once you are logged in to your email provider’s SMTP server, you can call the sendmail() method to actually send the email. The sendmail() method call looks like this:
>>> smtpObj.sendmail('[email protected]', '[email protected]', 'Subject: So long.\nDear Alice, so long and thanks for all the fish. Sincerely, Bob')
{}
The sendmail() method requires three arguments.
Your email address as a string (for the email’s “from” address)
The recipient’s email address as a string or a list of strings for multiple recipients (for the “to” address)
The email body as a string
The email body string must begin with 'Subject: \n' for the subject line of the email. The '\n' newline character separates the subject line from the main body of the email.
The return value from sendmail() is a dictionary. There will be one key-value pair in the dictionary for each recipient for whom email delivery failed. An empty dictionary means all recipients were successfully sent the email.
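The two details above (the 'Subject: ' prefix and the dictionary of failures) can be wrapped in small helpers. This is a minimal sketch with made-up function names; it follows the chapter’s convention of a single '\n' after the subject line, and the address in the comment is a hypothetical example.

```python
def build_message(subject, body):
    """Join the subject line and body the way the sendmail() call above expects."""
    return 'Subject: ' + subject + '\n' + body


def failed_recipients(sendmail_result):
    """Return the addresses that sendmail() reported as failed, sorted."""
    # Each key is a refused recipient; smtplib stores a (code, message)
    # tuple as the value, e.g. {'nobody@example.com': (550, b'No such user')}.
    return sorted(sendmail_result.keys())
```

An empty list from failed_recipients() corresponds to the empty dictionary shown in the example, meaning every recipient was accepted.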
Disconnecting from the SMTP Server
Be sure to call the quit() method when you are done sending emails. This will disconnect your program from the SMTP server.
>>> smtpObj.quit()
(221, b'2.0.0 closing connection ko10sm23097611pbd.52 - gsmtp')
The 221 in the return value means the session is ending.
To review all the steps for connecting and logging in to the server, sending email, and disconnecting, see Sending Email.
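Put together in a script rather than the interactive shell, the steps might be arranged as in the sketch below. send_email() is a hypothetical helper, not something smtplib provides, and it assumes a server that uses TLS on port 587. The smtp_factory parameter is an illustration-only hook so the function can be tried with a stand-in object instead of a real server.

```python
import smtplib


def send_email(host, port, user, password, to_addr, message,
               smtp_factory=smtplib.SMTP):
    """Connect, greet, encrypt, log in, send, and disconnect, in that order."""
    smtpObj = smtp_factory(host, port)
    try:
        smtpObj.ehlo()
        smtpObj.starttls()  # for port 587; not needed with SMTP_SSL on 465
        smtpObj.login(user, password)
        return smtpObj.sendmail(user, to_addr, message)
    finally:
        smtpObj.quit()  # always disconnect, even if a step failed
```

The try and finally statements make sure quit() is called even when login() or sendmail() raises an exception.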
IMAP
Just as SMTP is the protocol for sending email, the Internet Message Access Protocol (IMAP) specifies how to communicate with an email provider’s server to retrieve emails sent to your email address. Python comes with an imaplib module, but in fact the third-party imapclient module is easier to use. This chapter provides an introduction to using IMAPClient; the full documentation is at http://imapclient.readthedocs.org/.
The imapclient module downloads emails from an IMAP server in a rather complicated format. Most likely, you’ll want to convert them from this format into simple string values. The pyzmail module does the hard job of parsing these email messages for you. You can find the complete documentation for PyzMail at http://www.magiksys.net/pyzmail/.
Install imapclient and pyzmail from a Terminal window. Appendix A has steps on how to install third-party modules.
Retrieving and Deleting Emails with IMAP
Finding and retrieving an email in Python is a multistep process that requires both the imapclient and pyzmail third-party modules. Just to give you an overview, here’s a full example of logging in to an IMAP server, searching for emails, fetching them, and then extracting the text of the email messages from them.
>>> import imapclient
>>> imapObj = imapclient.IMAPClient('imap.gmail.com', ssl=True)
>>> imapObj.login('[email protected]', 'MY_SECRET_PASSWORD')
'[email protected](Success)'
>>> imapObj.select_folder('INBOX', readonly=True)
>>> UIDs = imapObj.search(['SINCE 05-Jul-2014'])
>>> UIDs
[40032, 40033, 40034, 40035, 40036, 40037, 40038, 40039, 40040, 40041]
>>> rawMessages = imapObj.fetch([40041], ['BODY[]', 'FLAGS'])
>>> import pyzmail
>>> message = pyzmail.PyzMessage.factory(rawMessages[40041]['BODY[]'])
>>> message.get_subject()
'Hello!'
>>> message.get_addresses('from')
[('Edward Snowden', '[email protected]')]
>>> message.get_addresses('to')
[('Jane Doe', '[email protected]')]
>>> message.get_addresses('cc')
[]
>>> message.get_addresses('bcc')
[]
>>> message.text_part != None
True
>>> message.text_part.get_payload().decode(message.text_part.charset)
'Follow the money.\r\n\r\n-Ed\r\n'
>>> message.html_part != None
True
>>> message.html_part.get_payload().decode(message.html_part.charset)
'<div dir="ltr"><div>So long, and thanks for all the fish!<br><br></div>-Al<br></div>\r\n'
>>> imapObj.logout()
You don’t have to memorize these steps. After we go through each step in detail, you can come back to this overview to refresh your memory.
Connecting to an IMAP Server
Just like you needed an SMTP object to connect to an SMTP server and send email, you need an IMAPClient object to connect to an IMAP server and receive email. First you’ll need the domain name of your email provider’s IMAP server. This will be different from the SMTP server’s domain name. Table 16-2 lists the IMAP servers for several popular email providers.
Table 16-2. Email Providers and Their IMAP Servers
Provider                   IMAP server domain name
Gmail                      imap.gmail.com
Outlook.com/Hotmail.com    imap-mail.outlook.com
Yahoo Mail                 imap.mail.yahoo.com
AT&T                       imap.mail.att.net
Comcast                    imap.comcast.net
Verizon                    incoming.verizon.net
Once you have the domain name of the IMAP server, call the imapclient.IMAPClient() function to create an IMAPClient object. Most email providers require SSL encryption, so pass the ssl=True keyword argument. Enter the following into the interactive shell (using your provider’s domain name):
>>> import imapclient
>>> imapObj = imapclient.IMAPClient('imap.gmail.com', ssl=True)
In all of the interactive shell examples in the following sections, the imapObj variable will contain an IMAPClient object returned from the imapclient.IMAPClient() function. In this context, a client is the object that connects to the server.
Logging in to the IMAP Server
Once you have an IMAPClient object, call its login() method, passing in the username (this is usually your email address) and password as strings.
>>> imapObj.login('[email protected]', 'MY_SECRET_PASSWORD')
'[email protected](Success)'
WARNING
Remember, never write a password directly into your code! Instead, design your program to accept the password returned from input().
If the IMAP server rejects this username/password combination, Python will raise an imaplib.error exception. For Gmail accounts, you may need to use an application-specific password; for more details, see Gmail’s Application-Specific Passwords.
Searching for Email
Once you’re logged on, actually retrieving an email that you’re interested in is a two-step process. First, you must select a folder you want to search through. Then, you must call the IMAPClient object’s search() method, passing in a string of IMAP search keywords.
Selecting a Folder
Almost every account has an INBOX folder by default, but you can also get a list of folders by calling the IMAPClient object’s list_folders() method. This returns a list of tuples. Each tuple contains information about a single folder. Continue the interactive shell example by entering the following:
>>> import pprint
>>> pprint.pprint(imapObj.list_folders())
[(('\\HasNoChildren',), '/', 'Drafts'),
 (('\\HasNoChildren',), '/', 'Filler'),
 (('\\HasNoChildren',), '/', 'INBOX'),
 (('\\HasNoChildren',), '/', 'Sent'),
--snip--
 (('\\HasNoChildren', '\\Flagged'), '/', '[Gmail]/Starred'),
 (('\\HasNoChildren', '\\Trash'), '/', '[Gmail]/Trash')]
This is what your output might look like if you have a Gmail account. (Gmail calls its folders labels, but they work the same way as folders.) The three values in each of the tuples (for example, (('\\HasNoChildren',), '/', 'INBOX')) are as follows:
A tuple of the folder’s flags. (Exactly what these flags represent is beyond the scope of this book, and you can safely ignore this field.)
The delimiter used in the name string to separate parent folders and subfolders.
The full name of the folder.
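Since only the third value of each tuple is usually needed, a short helper can pull out just the folder names. The sketch below uses sample data shaped like the output shown above rather than a live server, and folder_names() is a made-up name.

```python
def folder_names(folders):
    """Return just the full folder names from list_folders()-style tuples."""
    return [name for flags, delimiter, name in folders]


# Sample data shaped like the pprint output above.
sample = [(('\\HasNoChildren',), '/', 'Drafts'),
          (('\\HasNoChildren',), '/', 'INBOX'),
          (('\\HasNoChildren', '\\Trash'), '/', '[Gmail]/Trash')]
print(folder_names(sample))
```

Any of the returned names can be passed as a string to select_folder().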
To select a folder to search through, pass the folder’s name as a string into the IMAPClient object’s select_folder() method.
>>> imapObj.select_folder('INBOX', readonly=True)
You can ignore select_folder()’s return value. If the selected folder does not exist, Python will raise an imaplib.error exception.
The readonly=True keyword argument prevents you from accidentally making changes or deletions to any of the emails in this folder during the subsequent method calls. Unless you want to delete emails, it’s a good idea to always set readonly to True.
Performing the Search
With a folder selected, you can now search for emails with the IMAPClient object’s search() method. The argument to search() is a list of strings, each formatted to the IMAP’s search keys. Table 16-3 describes the various search keys.
Table 16-3. IMAP Search Keys
Search key    Meaning
'ALL'    Returns all messages in the folder. You may run into imaplib size limits if you request all the messages in a large folder. See Size Limits.
'BEFORE date', 'ON date', 'SINCE date'    These three search keys return, respectively, messages that were received by the IMAP server before, on, or after the given date. The date must be formatted like 05-Jul-2015. Also, while 'SINCE 05-Jul-2015' will match messages on and after July 5, 'BEFORE 05-Jul-2015' will match only messages before July 5 but not on July 5 itself.
'SUBJECT string', 'BODY string', 'TEXT string'    Returns messages where string is found in the subject, body, or either, respectively. If string has spaces in it, then enclose it with double quotes: 'TEXT "search with spaces"'.
'FROM string', 'TO string', 'CC string', 'BCC string'    Returns all messages where string is found in the “from” email address, “to” addresses, “cc” (carbon copy) addresses, or “bcc” (blind carbon copy) addresses, respectively. If there are multiple email addresses in string, then separate them with spaces and enclose them all with double quotes: 'CC "[email protected]@example.com"'.
'SEEN', 'UNSEEN'    Returns all messages with and without the \Seen flag, respectively. An email obtains the \Seen flag if it has been accessed with a fetch() method call (described later) or if it is clicked when you’re checking your email in an email program or web browser. It’s more common to say the email has been “read” rather than “seen,” but they mean the same thing.
'ANSWERED', 'UNANSWERED'    Returns all messages with and without the \Answered flag, respectively. A message obtains the \Answered flag when it is replied to.
'DELETED', 'UNDELETED'    Returns all messages with and without the \Deleted flag, respectively. Email messages deleted with the delete_messages() method are given the \Deleted flag but are not permanently deleted until the expunge() method is called (see Deleting Emails). Note that some email providers, such as Gmail, automatically expunge emails.
'DRAFT', 'UNDRAFT'    Returns all messages with and without the \Draft flag, respectively. Draft messages are usually kept in a separate Drafts folder rather than in the INBOX folder.
'FLAGGED', 'UNFLAGGED'    Returns all messages with and without the \Flagged flag, respectively. This flag is usually used to mark email messages as “Important” or “Urgent.”
'LARGER N', 'SMALLER N'    Returns all messages larger or smaller than N bytes, respectively.
'NOT search-key'    Returns the messages that search-key would not have returned.
'OR search-key1 search-key2'    Returns the messages that match either the first or second search-key.
Note that some IMAP servers may have slightly different implementations for how they handle their flags and search keys. It may require some experimentation in the interactive shell to see exactly how they behave.
You can pass multiple IMAP search key strings in the list argument to the search() method. The messages returned are the ones that match all the search keys. If you want to match any of the search keys, use the OR search key. For the NOT and OR search keys, one and two complete search keys follow the NOT and OR, respectively.
Here are some example search() method calls along with their meanings:
imapObj.search(['ALL']). Returns every message in the currently selected folder.
imapObj.search(['ON 05-Jul-2015']). Returns every message sent on July 5, 2015.
imapObj.search(['SINCE 01-Jan-2015', 'BEFORE 01-Feb-2015', 'UNSEEN']). Returns every message sent in January 2015 that is unread. (Note that this means on and after January 1 and up to but not including February 1.)
imapObj.search(['SINCE 01-Jan-2015', 'FROM alice@example.com']). Returns every message from alice@example.com sent since the start of 2015.
imapObj.search(['SINCE 01-Jan-2015', 'NOT FROM alice@example.com']). Returns every message sent from everyone except alice@example.com since the start of 2015.
imapObj.search(['[email protected] [email protected]']) [email protected]@example.com.
imapObj.search(['[email protected]', '[email protected]']). Trick example! This search will never return any messages, because messages must match all search keywords. Since there can be only one “from” address, [email protected]@example.com.
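The date-based search keys need dates written like 05-Jul-2015, which is easy to get wrong by hand. The following sketch builds them from date objects; imap_date() and month_criteria() are hypothetical helpers, and the month abbreviations are spelled out in a list so the result does not depend on the computer’s language settings.

```python
import datetime

# Month abbreviations listed explicitly so the output is always in English.
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def imap_date(day):
    """Format a date like 05-Jul-2015, the form the date search keys expect."""
    return '%02d-%s-%d' % (day.day, MONTHS[day.month - 1], day.year)


def month_criteria(year, month):
    """Build search keys for unread messages within one calendar month."""
    first = datetime.date(year, month, 1)
    # First day of the following month; December rolls over to January.
    following = datetime.date(year + month // 12, month % 12 + 1, 1)
    return ['SINCE ' + imap_date(first),
            'BEFORE ' + imap_date(following),
            'UNSEEN']
```

The list returned by month_criteria(2015, 1) matches the January 2015 example above and could be passed straight to search().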
The search() method doesn’t return the emails themselves but rather unique IDs (UIDs) for the emails, as integer values. You can then pass these UIDs to the fetch() method to obtain the email content.
Continue the interactive shell example by entering the following:
>>> UIDs = imapObj.search(['SINCE 05-Jul-2015'])
>>> UIDs
[40032, 40033, 40034, 40035, 40036, 40037, 40038, 40039, 40040, 40041]
Here, the list of message IDs (for messages received July 5 onward) returned by search() is stored in UIDs. The list of UIDs returned on your computer will be different from the ones shown here; they are unique to a particular email account. When you later pass UIDs to other function calls, use the UID values you received, not the ones printed in this book’s examples.
Size Limits
If your search matches a large number of email messages, Python might raise an exception that says imaplib.error: got more than 10000 bytes. When this happens, you will have to disconnect and reconnect to the IMAP server and try again.
This limit is in place to prevent your Python programs from eating up too much memory. Unfortunately, the default size limit is often too small. You can change this limit from 10,000 bytes to 10,000,000 bytes by running this code:
>>> import imaplib
>>> imaplib._MAXLINE = 10000000
This should prevent this error message from coming up again. You may want to make these two lines part of every IMAP program you write.
USING IMAPCLIENT’S GMAIL_SEARCH() METHOD
If you are logging in to the imap.gmail.com server to access a Gmail account, the IMAPClient object provides an extra search function that mimics the search bar at the top of the Gmail web page, as highlighted in Figure 16-1.
Figure 16-1. The search bar at the top of the Gmail web page
Instead of searching with IMAP search keys, you can use Gmail’s more sophisticated search engine. Gmail does a good job of matching closely related words (for example, a search for driving will also match drive and drove) and sorting the search results by most significant matches. You can also use Gmail’s advanced search operators (see http://nostarch.com/automatestuff/ for more information). If you are logging in to a Gmail account, pass the search terms to the gmail_search() method instead of the search() method, like in the following interactive shell example:
>>> UIDs = imapObj.gmail_search('meaning of life')
>>> UIDs
[42]
Ah, yes, there’s that email with the meaning of life! I was looking for that.
Fetching an Email and Marking It As Read
Once you have a list of UIDs, you can call the IMAPClient object’s fetch() method to get the actual email content.
The list of UIDs will be fetch()’s first argument. The second argument should be the list ['BODY[]'], which tells fetch() to download all the body content for the emails specified in your UID list.
Let’s continue our interactive shell example.
>>> rawMessages = imapObj.fetch(UIDs, ['BODY[]'])
>>> import pprint
>>> pprint.pprint(rawMessages)
{40040: {'BODY[]': 'Delivered-To: [email protected]\r\n'
                   'Received: by 10.76.71.167 with SMTP id'
--snip--
                   '\r\n'
                   '------=_Part_6000970_707736290.1404819487066--\r\n',
         'SEQ': 5430}}
Import pprint and pass the return value from fetch(), stored in the variable rawMessages, to pprint.pprint() to “pretty print” it, and you’ll see that this return value is a nested dictionary of messages with UIDs as the keys. Each message is stored as a dictionary with two keys: 'BODY[]' and 'SEQ'. The 'BODY[]' key maps to the actual body of the email. The 'SEQ' key is for a sequence number, which has a similar role to the UID. You can safely ignore it.
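Working with that nested dictionary usually means looping over the UIDs and pulling out the 'BODY[]' value for each one. Here is a minimal sketch using stand-in data with the same shape. bodies_by_uid() is an invented helper, the sample body is made up, and the key is a parameter because, depending on the imapclient version installed, the keys may be bytes values such as b'BODY[]' rather than strings.

```python
def bodies_by_uid(raw_messages, key='BODY[]'):
    """Map each UID to its raw body, skipping messages that lack the key."""
    return {uid: parts[key]
            for uid, parts in raw_messages.items()
            if key in parts}


# Stand-in data shaped like the fetch() result above (body text is made up).
sample = {40040: {'BODY[]': 'Subject: Hello!\r\n\r\nHi\r\n', 'SEQ': 5430}}
print(bodies_by_uid(sample))
```

Each value in the resulting dictionary is the raw text that pyzmail parses later in this chapter.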
As you can see, the message content in the 'BODY[]' key is pretty unintelligible. It’s in a format called RFC 822, which is designed for IMAP servers to read. But you don’t need to understand the RFC 822 format; later in this chapter, the pyzmail module will make sense of it for you.
When you selected a folder to search through, you called select_folder() with the readonly=True keyword argument. Doing this will prevent you from accidentally deleting an email, but it also means that emails will not get marked as read if you fetch them with the fetch() method. If you do want emails to be marked as read when you fetch them, you will need to pass readonly=False to select_folder(). If the selected folder is already in readonly mode, you can reselect the current folder with another call to select_folder(), this time with the readonly=False keyword argument:
>>> imapObj.select_folder('INBOX', readonly=False)
Getting Email Addresses from a Raw Message
The raw messages returned from the fetch() method still aren’t very useful to people who just want to read their email. The pyzmail module parses these raw messages and returns them as PyzMessage objects, which make the subject, body, “To” field, “From” field, and other sections of the email easily accessible to your Python code.
Continue the interactive shell example with the following (using UIDs from your own email account, not the ones shown here):
>>> import pyzmail
>>> message = pyzmail.PyzMessage.factory(rawMessages[40041]['BODY[]'])
First, import pyzmail. Then, to create a PyzMessage object of an email, call the pyzmail.PyzMessage.factory() function and pass it the 'BODY[]' section of the raw message. Store the result in message. Now message contains a PyzMessage object, which has several methods that make it easy to get the email’s subject line, as well as all sender and recipient addresses. The get_subject() method returns the subject as a simple string value. The get_addresses() method returns a list of addresses for the field you pass it. For example, the method calls might look like this:
>>> message.get_subject()
'Hello!'
>>> message.get_addresses('from')
[('Edward Snowden', '[email protected]')]
>>> message.get_addresses('to')
[('Jane Doe', '[email protected]')]
>>> message.get_addresses('cc')
[]
>>> message.get_addresses('bcc')
[]
Notice that the argument for get_addresses() is 'from', 'to', 'cc', or 'bcc'. The return value of get_addresses() is a list of tuples. Each tuple contains two strings: The first is the name associated with the email address, and the second is the email address itself. If there are no addresses in the requested field, get_addresses() returns a blank list. Here, the 'cc' carbon copy and 'bcc' blind carbon copy fields both contained no addresses and so returned empty lists.
Getting the Body from a Raw Message
Emails can be sent as plaintext, HTML, or both. Plaintext emails contain only text, while HTML emails can have colors, fonts, images, and other features that make the email message look like a small web page. If an email is only plaintext, its PyzMessage object will have its html_part attribute set to None. Likewise, if an email is only HTML, its PyzMessage object will have its text_part attribute set to None.
Otherwise, the text_part or html_part value will have a get_payload() method that returns the email’s body as a value of the bytes data type. (The bytes data type is beyond the scope of this book.) But this still isn’t a string value that we can use. Ugh! The last step is to call the decode() method on the bytes value returned by get_payload(). The decode() method takes one argument: the message’s character encoding, stored in the text_part.charset or html_part.charset attribute. This, finally, will return the string of the email’s body.
Continue the interactive shell example by entering the following:
➊ >>> message.text_part != None
True
>>> message.text_part.get_payload().decode(message.text_part.charset)
➋ 'So long, and thanks for all the fish!\r\n\r\n-Al\r\n'
➌ >>> message.html_part != None
True
➍ >>> message.html_part.get_payload().decode(message.html_part.charset)
'<div dir="ltr"><div>So long, and thanks for all the fish!<br><br></div>-Al<br></div>\r\n'
The email we’re working with has both plaintext and HTML content, so the PyzMessage object stored in message has text_part and html_part attributes not equal to None ➊ ➌. Calling get_payload() on the message’s text_part and then calling decode() on the bytes value returns a string of the text version of the email ➋. Using get_payload() and decode() with the message’s html_part returns a string of the HTML version of the email ➍.
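Because either part can be None, a program that handles arbitrary emails has to check both before decoding. The helper below is a sketch of one way to do that. body_text() is a hypothetical name, it relies only on the text_part, html_part, charset, and get_payload() names described above, and the fallback to UTF-8 when charset is missing is an assumption, not documented pyzmail behavior.

```python
def body_text(message, fallback_charset='utf-8'):
    """Return the plaintext body if present, else the HTML body, else ''."""
    for part in (message.text_part, message.html_part):
        if part is not None:
            # Assumption: charset may be missing, so fall back to UTF-8.
            charset = part.charset or fallback_charset
            return part.get_payload().decode(charset)
    return ''
```

Preferring the plaintext part is a design choice for this sketch; a program that needs the formatting could check html_part first instead.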
Deleting Emails
To delete emails, pass a list of message UIDs to the IMAPClient object’s delete_messages() method. This marks the emails with the \Deleted flag. Calling the expunge() method will permanently delete all emails with the \Deleted flag in the currently selected folder. Consider the following interactive shell example:
➊ >>> imapObj.select_folder('INBOX', readonly=False)
➋ >>> UIDs = imapObj.search(['ON 09-Jul-2015'])
>>> UIDs
[40066]
>>> imapObj.delete_messages(UIDs)
➌ {40066: ('\\Seen', '\\Deleted')}
>>> imapObj.expunge()
('Success', [(5452, 'EXISTS')])
Here we select the inbox by calling select_folder() on the IMAPClient object and passing 'INBOX' as the first argument; we also pass the keyword argument readonly=False so that we can delete emails ➊. We search the inbox for messages received on a specific date and store the returned message IDs in UIDs ➋. Calling delete_messages() and passing it UIDs returns a dictionary; each key-value pair is a message ID and a tuple of the message’s flags, which should now include \Deleted ➌. Calling expunge() then permanently deletes messages with the \Deleted flag and returns a success message if there were no problems expunging the emails. Note that some email providers, such as Gmail, automatically expunge emails deleted with delete_messages() instead of waiting for an expunge command from the IMAP client.
Disconnecting from the IMAP Server
When your program has finished retrieving or deleting emails, simply call the IMAPClient’s logout() method to disconnect from the IMAP server.
>>> imapObj.logout()
If your program runs for several minutes or more, the IMAP server may time out, or automatically disconnect. In this case, the next method call your program makes on the IMAPClient object will raise an exception like the following:
imaplib.abort: socket error: [WinError 10054] An existing connection was forcibly closed by the remote host
In this event, your program will have to call imapclient.IMAPClient() to connect again.
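A long-running program might wrap its IMAP calls so that a dropped connection triggers a reconnect and a second attempt. The following is a sketch under several assumptions: call_with_reconnect() is a made-up helper, connect is a function you would write yourself to create a new IMAPClient object and log in again, and the exception to catch is taken to be imaplib.IMAP4.abort, the class behind the imaplib.abort message shown above.

```python
import imaplib


def call_with_reconnect(connect, action, retries=1,
                        error=imaplib.IMAP4.abort):
    """Run action(client); on a dropped connection, reconnect and retry."""
    client = connect()
    for attempt in range(retries + 1):
        try:
            return action(client)
        except error:
            if attempt == retries:
                raise  # give up after the last allowed retry
            client = connect()  # for example, a new IMAPClient plus login()
```

Here action would be a small function such as one that selects a folder and calls search() on the client it is given.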
Whew! That’s it. There were a lot of hoops to jump through, but you now have a way to get your Python programs to log in to an email account and fetch emails. You can always consult the overview in Retrieving and Deleting Emails with IMAP whenever you need to remember all of the steps.
Project:SendingMemberDuesReminderEmailsSayyouhavebeen“volunteered”totrackmemberduesfortheMandatoryVolunteerismClub.Thisisatrulyboringjob,involvingmaintainingaspreadsheetofeveryonewhohaspaideachmonthandemailingreminderstothosewhohaven’t.Insteadofgoingthroughthespreadsheetyourselfandcopyingandpastingthesameemailtoeveryonewhoisbehindondues,let’s—youguessedit—writeascriptthatdoesthisforyou.
At a high level, here’s what your program will do:
Read data from an Excel spreadsheet.
Find all members who have not paid dues for the latest month.
Find their email addresses and send them personalized reminders.
This means your code will need to do the following:
Open and read the cells of an Excel document with the openpyxl module. (See Chapter 12 for working with Excel files.)
Create a dictionary of members who are behind on their dues.
Log in to an SMTP server by calling smtplib.SMTP(), ehlo(), starttls(), and login().
For all members behind on their dues, send a personalized reminder email by calling the sendmail() method.
OpenanewfileeditorwindowandsaveitassendDuesReminders.py.
Step1:OpentheExcelFileLet’ssaytheExcelspreadsheetyouusetotrackmembershipduespaymentslookslikeFigure16-2andisinafilenamedduesRecords.xlsx.Youcandownloadthisfilefromhttp://nostarch.com/automatestuff/.
Figure16-2.Thespreadsheetfortrackingmemberduespayments
This spreadsheet has every member’s name and email address. Each month has a column tracking members’ payment statuses. The cell for each member is marked with the text paid once they have paid their dues.
TheprogramwillhavetoopenduesRecords.xlsxandfigureoutthecolumnforthelatestmonthbycallingtheget_highest_column()method.(YoucanconsultChapter12formoreinformationonaccessingcellsinExcelspreadsheetfileswiththeopenpyxlmodule.)Enterthefollowingcodeintothefileeditorwindow:
#! python3
# sendDuesReminders.py - Sends emails based on payment status in spreadsheet.

import openpyxl, smtplib, sys

# Open the spreadsheet and get the latest dues status.
➊ wb = openpyxl.load_workbook('duesRecords.xlsx')
➋ sheet = wb.get_sheet_by_name('Sheet1')
➌ lastCol = sheet.get_highest_column()
➍ latestMonth = sheet.cell(row=1, column=lastCol).value

# TODO: Check each member's payment status.

# TODO: Log in to email account.

# TODO: Send out reminder emails.
Afterimportingtheopenpyxl,smtplib,andsysmodules,weopenourduesRecords.xlsxfileandstoretheresultingWorkbookobjectinwb➊.ThenwegetSheet1andstoretheresultingWorksheetobjectinsheet➋.NowthatwehaveaWorksheetobject,wecanaccessrows,columns,andcells.WestorethehighestcolumninlastCol➌,andwethenuserownumber1andlastColtoaccessthecellthatshouldholdthemostrecentmonth.WegetthevalueinthiscellandstoreitinlatestMonth➍.
Step 2: Find All Unpaid Members
Once you’ve determined the column number of the latest month (stored in lastCol), you can loop through all rows after the first row (which has the column headers) to see which members have the text paid in the cell for that month’s dues. If the member hasn’t paid, you can grab the member’s name and email address from columns 1 and 2, respectively. This information will go into the unpaidMembers dictionary, which will track all members who haven’t paid in the most recent month. Add the following code to sendDuesReminders.py.
#!python3
#sendDuesReminders.py-Sendsemailsbasedonpaymentstatusinspreadsheet.
--snip--
# Check each member's payment status.
unpaidMembers = {}
➊ for r in range(2, sheet.get_highest_row() + 1):
  ➋ payment = sheet.cell(row=r, column=lastCol).value
    if payment != 'paid':
      ➌ name = sheet.cell(row=r, column=1).value
      ➍ email = sheet.cell(row=r, column=2).value
      ➎ unpaidMembers[name] = email
ThiscodesetsupanemptydictionaryunpaidMembersandthenloopsthroughalltherowsafterthefirst➊.Foreachrow,thevalueinthemostrecentcolumnisstoredinpayment➋.Ifpaymentisnotequalto'paid',thenthevalueofthefirstcolumnisstoredinname➌,thevalueofthesecondcolumnisstoredinemail➍,andnameandemailareaddedtounpaidMembers➎.
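To see the filtering logic on its own, without a spreadsheet file, here is a minimal sketch that works on plain tuples. The find_unpaid() function, the sample names, and the example.com addresses are all made up for illustration; each tuple stands in for one spreadsheet row with the header row already removed.

```python
def find_unpaid(rows, status_col):
    """Return {name: email} for every row whose status cell is not 'paid'.

    rows: sequences of cell values (name, email, then one cell per month).
    status_col: zero-based index of the latest month's column.
    """
    unpaid = {}
    for row in rows:
        name, email = row[0], row[1]
        if row[status_col] != 'paid':
            unpaid[name] = email
    return unpaid


# Hypothetical data: two months of dues, empty cells read as None.
rows = [
    ('Alice', 'alice@example.com', 'paid', 'paid'),
    ('Bob', 'bob@example.com', 'paid', None),
    ('Eve', 'eve@example.com', None, None),
]
print(find_unpaid(rows, status_col=3))
```

If you are using a recent openpyxl release, note that, as far as I know, get_sheet_by_name(), get_highest_row(), and get_highest_column() have been replaced by wb['Sheet1'], sheet.max_row, and sheet.max_column; check the documentation for the version you have installed.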
Step3:SendCustomizedEmailRemindersOnceyouhavealistofallunpaidmembers,it’stimetosendthememailreminders.Addthefollowingcodetoyourprogram,exceptwithyourrealemailaddressandproviderinformation:
#!python3
#sendDuesReminders.py-Sendsemailsbasedonpaymentstatusinspreadsheet.
--snip--
#Logintoemailaccount.
smtpObj=smtplib.SMTP('smtp.gmail.com',587)
smtpObj.ehlo()
smtpObj.starttls()
smtpObj.login('[email protected]',sys.argv[1])
CreateanSMTPobjectbycallingsmtplib.SMTP()andpassingitthedomainnameandportforyourprovider.Callehlo()andstarttls(),andthencalllogin()andpassityouremailaddressandsys.argv[1],whichwillstoreyourpasswordstring.You’llenterthepasswordasacommandlineargumenteachtimeyouruntheprogram,toavoidsavingyourpasswordinyoursourcecode.
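A password typed on the command line can end up in your shell history, so you may prefer a fallback. The sketch below is one possible approach, not part of the project code: get_password() is a hypothetical helper, and DUES_EMAIL_PASSWORD is an environment variable name invented for this example. It tries the command line argument first, then the environment, and finally prompts without echoing.

```python
import getpass
import os


def get_password(argv, environ=os.environ, prompt=getpass.getpass):
    """Pick the password from argv[1], an environment variable, or a prompt."""
    if len(argv) > 1:
        return argv[1]
    if 'DUES_EMAIL_PASSWORD' in environ:  # hypothetical variable name
        return environ['DUES_EMAIL_PASSWORD']
    return prompt('Email password: ')  # getpass hides what you type
```

In the script you would call get_password(sys.argv) and pass the result to login() in place of sys.argv[1].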
Onceyourprogramhasloggedintoyouremailaccount,itshouldgothroughtheunpaidMembersdictionaryandsendapersonalizedemailtoeachmember’semailaddress.AddthefollowingtosendDuesReminders.py:
#!python3
#sendDuesReminders.py-Sendsemailsbasedonpaymentstatusinspreadsheet.
--snip--
# Send out reminder emails.
for name, email in unpaidMembers.items():
  ➊ body = "Subject: %s dues unpaid.\nDear %s,\nRecords show that you have not paid dues for %s. Please make this payment as soon as possible. Thank you!" % (latestMonth, name, latestMonth)
  ➋ print('Sending email to %s…' % email)
  ➌ sendmailStatus = smtpObj.sendmail('[email protected]', email, body)

  ➍ if sendmailStatus != {}:
        print('There was a problem sending email to %s: %s' % (email, sendmailStatus))
smtpObj.quit()
This code loops through the names and emails in unpaidMembers. For each member who hasn’t paid, we customize a message with the latest month and the member’s name, and store the message in body ➊. We print output saying that we’re sending an email to this member’s email address ➋. Then we call sendmail(), passing it the from address, the member’s email address, and the customized message ➌. We store the return value in sendmailStatus.
Rememberthatthesendmail()methodwillreturnanonemptydictionaryvalueiftheSMTPserverreportedanerrorsendingthatparticularemail.Thelastpartoftheforloopat➍checksifthereturneddictionaryisnonempty,andifitis,printstherecipient’semailaddressandthereturneddictionary.
Aftertheprogramisdonesendingalltheemails,thequit()methodiscalledtodisconnectfromtheSMTPserver.
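As an alternative to formatting the Subject line and body into one string by hand, the standard library’s email.message.EmailMessage class can assemble the headers for you. This is a sketch of a different technique from the one used in the project; build_reminder() is a hypothetical helper and the addresses are placeholders.

```python
from email.message import EmailMessage


def build_reminder(sender, name, address, month):
    """Build one dues reminder as an EmailMessage object."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = address
    msg['Subject'] = '%s dues unpaid.' % month
    msg.set_content(
        'Dear %s,\nRecords show that you have not paid dues for %s. '
        'Please make this payment as soon as possible. Thank you!'
        % (name, month))
    return msg


# Placeholder values for illustration only.
msg = build_reminder('treasurer@example.com', 'Bob', 'bob@example.com', 'Jun 2015')
print(msg['Subject'])
```

An SMTP object can send such a message with smtpObj.send_message(msg), which returns the same kind of dictionary of refused recipients that sendmail() does, so the check at ➍ would still apply.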
When you run the program, the output will look something like this:
Sending email to [email protected]…
TherecipientswillreceiveanemailthatlookslikeFigure16-3.
Figure16-3.AnautomaticallysentemailfromsendDuesReminders.py
SendingTextMessageswithTwilioMostpeoplearemorelikelytobeneartheirphonesthantheircomputers,sotextmessagescanbeamoreimmediateandreliablewayofsendingnotificationsthanemail.Also,theshortlengthoftextmessagesmakesitmorelikelythatapersonwillgetaroundtoreadingthem.
Inthissection,you’lllearnhowtosignupforthefreeTwilioserviceanduseitsPythonmoduletosendtextmessages.TwilioisanSMSgatewayservice,whichmeansit’saservicethatallowsyoutosendtextmessagesfromyourprograms.AlthoughyouwillbelimitedinhowmanytextsyoucansendpermonthandthetextswillbeprefixedwiththewordsSentfromaTwiliotrialaccount,thistrialserviceisprobablyadequateforyourpersonalprograms.Thefreetrialisindefinite;youwon’thavetoupgradetoapaidplanlater.
Twilioisn’ttheonlySMSgatewayservice.IfyouprefernottouseTwilio,youcanfindalternativeservicesbysearchingonlineforfreesmsgateway,pythonsmsapi,oreventwilioalternatives.
BeforesigningupforaTwilioaccount,installthetwiliomodule.AppendixAhasmoredetailsaboutinstallingthird-partymodules.
NOTE
ThissectionisspecifictotheUnitedStates.TwiliodoesofferSMStextingservicesforcountriesoutsideoftheUnitedStates,butthosespecificsaren’tcoveredinthisbook.Thetwiliomoduleanditsfunctions,however,willworkthesameoutsidetheUnitedStates.Seehttp://twilio.com/formoreinformation.
SigningUpforaTwilioAccountGotohttp://twilio.com/andfilloutthesign-upform.Onceyou’vesignedupforanewaccount,you’llneedtoverifyamobilephonenumberthatyouwanttosendtextsto.(Thisverificationisnecessarytopreventpeoplefromusingtheservicetospamrandomphonenumberswithtextmessages.)
Afterreceivingthetextwiththeverificationnumber,enteritintotheTwiliowebsitetoprovethatyouownthemobilephoneyouareverifying.Youwillnowbeabletosendtextstothisphonenumberusingthetwiliomodule.
Twilioprovidesyourtrialaccountwithaphonenumbertouseasthesenderoftextmessages.Youwillneedtwomorepiecesofinformation:youraccountSIDandtheauth(authentication)token.YoucanfindthisinformationontheDashboardpagewhenyouareloggedintoyourTwilioaccount.ThesevaluesactasyourTwiliousernameandpasswordwhenlogginginfromaPythonprogram.
SendingTextMessagesOnceyou’veinstalledthetwiliomodule,signedupforaTwilioaccount,verifiedyourphonenumber,registeredaTwiliophonenumber,andobtainedyouraccountSIDandauthtoken,youwillfinallybereadytosendyourselftextmessagesfromyourPythonscripts.
Compared to all the registration steps, the actual Python code is fairly simple. With your computer connected to the Internet, enter the following into the interactive shell, replacing the accountSID, authToken, myTwilioNumber, and myCellPhone variable values with your real information:
➊ >>> from twilio.rest import TwilioRestClient
>>> accountSID = 'ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
>>> authToken = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
➋ >>> twilioCli = TwilioRestClient(accountSID, authToken)
>>> myTwilioNumber = '+14955551234'
>>> myCellPhone = '+14955558888'
➌ >>> message = twilioCli.messages.create(body='Mr. Watson - Come here - I want to see you.', from_=myTwilioNumber, to=myCellPhone)
Afewmomentsaftertypingthelastline,youshouldreceiveatextmessagethatreadsSentfromyourTwiliotrialaccount-Mr.Watson-Comehere-Iwanttoseeyou.
Becauseofthewaythetwiliomoduleissetup,youneedtoimportitusingfromtwilio.restimportTwilioRestClient,notjustimporttwilio➊.StoreyouraccountSIDinaccountSIDandyourauthtokeninauthTokenandthencallTwilioRestClient()andpassitaccountSIDandauthToken.ThecalltoTwilioRestClient()returnsaTwilioRestClientobject➋.Thisobjecthasamessagesattribute,whichinturnhasacreate()methodyoucanusetosendtextmessages.ThisisthemethodthatwillinstructTwilio’sserverstosendyourtextmessage.AfterstoringyourTwilionumberandcellphonenumberinmyTwilioNumberandmyCellPhone,respectively,callcreate()andpassitkeywordargumentsspecifyingthebodyofthetextmessage,thesender’snumber(myTwilioNumber),andtherecipient’snumber(myCellPhone)➌.
TheMessageobjectreturnedfromthecreate()methodwillhaveinformationaboutthetextmessagethatwassent.Continuetheinteractiveshellexamplebyenteringthefollowing:
>>>message.to
'+14955558888'
>>>message.from_
'+14955551234'
>>>message.body
'Mr.Watson-Comehere-Iwanttoseeyou.'
Theto,from_,andbodyattributesshouldholdyourcellphonenumber,Twilionumber,andmessage,respectively.Notethatthesendingphonenumberisinthefrom_attribute—withanunderscoreattheend—notfrom.ThisisbecausefromisakeywordinPython(you’veseenitusedinthefrommodulenameimport*formofimportstatement,forexample),soitcannotbeusedasanattributename.Continuetheinteractiveshellexamplewiththefollowing:
>>>message.status
'queued'
>>>message.date_created
datetime.datetime(2015,7,8,1,36,18)
>>>message.date_sent==None
True
The status attribute should give you a string. The date_created and date_sent attributes should give you a datetime object if the message has been created and sent. It may seem odd that the status attribute is set to 'queued' and the date_sent attribute is set to None when you’ve already received the text message. This is because you captured the Message object in the message variable before the text was actually sent. You will need to refetch the Message object in order to see its most up-to-date status and date_sent. Every Twilio message has a unique string ID (SID) that can be used to fetch the latest update of the Message object. Continue the interactive shell example by entering the following:
>>>message.sid
'SM09520de7639ba3af137c6fcb7c5f4b51'
➊>>>updatedMessage=twilioCli.messages.get(message.sid)
>>>updatedMessage.status
'delivered'
>>>updatedMessage.date_sent
datetime.datetime(2015,7,8,1,36,18)
Entering message.sid shows you this message’s long SID. By passing this SID to the Twilio client’s get() method ➊, you can retrieve a new Message object with the most up-to-date information. In this new Message object, the status and date_sent attributes are correct.
Thestatusattributewillbesettooneofthefollowingstringvalues:'queued','sending','sent','delivered','undelivered',or'failed'.Thesestatusesareself-explanatory,butformoreprecisedetails,takealookattheresourcesathttp://nostarch.com/automatestuff/.
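Because a message usually passes through several of these statuses, a program that cares about the outcome has to refetch more than once. Here is a minimal polling sketch. wait_for_final_status() and fetch_status are hypothetical names; in practice fetch_status might be a small function that refetches the message by its SID, as shown earlier, and returns its status attribute. Treating 'delivered', 'undelivered', and 'failed' as the final statuses is an assumption made for this example.

```python
import time

# Assumption: once a message reaches one of these, it will not change again.
FINAL_STATUSES = {'delivered', 'undelivered', 'failed'}


def wait_for_final_status(fetch_status, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch_status() up to `attempts` times, pausing between calls."""
    status = fetch_status()
    for _ in range(attempts - 1):
        if status in FINAL_STATUSES:
            break
        sleep(delay)
        status = fetch_status()
    return status
```

The function returns the last status it saw, so the caller can tell a delivered message apart from one that was still in progress when the attempts ran out.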
RECEIVINGTEXTMESSAGESWITHPYTHON
Unfortunately,receivingtextmessageswithTwilioisabitmorecomplicatedthansendingthem.Twiliorequiresthatyouhaveawebsiterunningitsownwebapplication.That’sbeyondthescopeofthisbook,butyoucanfindmoredetailsintheresourcesforthisbook(http://nostarch.com/automatestuff/).
Project:“JustTextMe”ModuleThepersonyou’llmostoftentextfromyourprogramsisprobablyyou.Textingisagreatwaytosendyourselfnotificationswhenyou’reawayfromyourcomputer.Ifyou’veautomatedaboringtaskwithaprogramthattakesacoupleofhourstorun,youcouldhaveitnotifyyouwithatextwhenit’sfinished.Oryoumayhavearegularlyscheduledprogramrunningthatsometimesneedstocontactyou,suchasaweather-checkingprogramthattextsyouaremindertopackanumbrella.
Asasimpleexample,here’sasmallPythonprogramwithatextmyself()functionthatsendsamessagepassedtoitasastringargument.Openanewfileeditorwindowandenterthefollowingcode,replacingtheaccountSID,authtoken,andphonenumberswithyourowninformation.SaveitastextMyself.py.
#! python3
# textMyself.py - Defines the textmyself() function that texts a message
# passed to it as a string.

# Preset values:
accountSID = 'ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
authToken = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
myNumber = '+15559998888'
twilioNumber = '+15552225678'

from twilio.rest import TwilioRestClient

➊ def textmyself(message):
  ➋ twilioCli = TwilioRestClient(accountSID, authToken)
  ➌ twilioCli.messages.create(body=message, from_=twilioNumber, to=myNumber)
This program stores an account SID, auth token, sending number, and receiving number. It then defines textmyself() to take one argument ➊, make a TwilioRestClient object ➋, and call create() with the message you passed ➌.
Ifyouwanttomakethetextmyself()functionavailabletoyourotherprograms,simplyplacethetextMyself.pyfileinthesamefolderasthePythonexecutable(C:\Python34onWindows,/usr/local/lib/python3.4onOSX,and/usr/bin/python3onLinux).Nowyoucanusethefunctioninyourotherprograms.Wheneveryouwantoneofyourprogramstotextyou,justaddthefollowing:
import textMyself
textMyself.textmyself('The boring task is finished.')
YouneedtosignupforTwilioandwritethetextingcodeonlyonce.Afterthat,it’sjusttwolinesofcodetosendatextfromanyofyourotherprograms.
SummaryWecommunicatewitheachotherontheInternetandovercellphonenetworksindozensofdifferentways,butemailandtextingpredominate.Yourprogramscancommunicatethroughthesechannels,whichgivesthempowerfulnewnotificationfeatures.Youcanevenwriteprogramsrunningondifferentcomputersthatcommunicatewithoneanotherdirectlyviaemail,withoneprogramsendingemailswithSMTPandtheotherretrievingthemwithIMAP.
Python’s smtplib module provides functions for using SMTP to send emails through your email provider’s SMTP server. Likewise, the third-party imapclient and pyzmail modules let you access IMAP servers and retrieve emails sent to you. Although IMAP is a bit more involved than SMTP, it’s also quite powerful and allows you to search for particular emails, download them, and parse them to extract the subject and body as string values.
Textingisabitdifferentfromemail,since,unlikeemail,morethanjustanInternetconnectionisneededtosendSMStexts.Fortunately,servicessuchasTwilioprovidemodulestoallowyoutosendtextmessagesfromyourprograms.Onceyougothroughaninitialsetupprocess,you’llbeabletosendtextswithjustacouplelinesofcode.
Withthesemodulesinyourskillset,you’llbeabletoprogramthespecificconditionsunderwhichyourprogramsshouldsendnotificationsorreminders.Nowyourprogramswillhavereachfarbeyondthecomputerthey’rerunningon!
PracticeQuestionsQ: 1.Whatistheprotocolforsendingemail?Forcheckingandreceivingemail?
Q: 2.Whatfoursmtplibfunctions/methodsmustyoucalltologintoanSMTPserver?
Q: 3.Whattwoimapclientfunctions/methodsmustyoucalltologintoanIMAPserver?
Q: 4.WhatkindofargumentdoyoupasstoimapObj.search()?
Q: 5.Whatdoyoudoifyourcodegetsanerrormessagethatsaysgotmorethan10000bytes?
Q: 6.TheimapclientmodulehandlesconnectingtoanIMAPserverandfindingemails.Whatisonemodulethathandlesreadingtheemailsthatimapclientcollects?
Q: 7.WhatthreepiecesofinformationdoyouneedfromTwiliobeforeyoucansendtextmessages?
PracticeProjectsForpractice,writeprogramsthatdothefollowing.
RandomChoreAssignmentEmailerWriteaprogramthattakesalistofpeople’semailaddressesandalistofchoresthatneedtobedoneandrandomlyassignschorestopeople.Emaileachpersontheirassignedchores.Ifyou’refeelingambitious,keeparecordofeachperson’spreviouslyassignedchoressothatyoucanmakesuretheprogramavoidsassigninganyonethesamechoretheydidlasttime.Foranotherpossiblefeature,scheduletheprogramtorunonceaweekautomatically.
Here’sahint:Ifyoupassalisttotherandom.choice()function,itwillreturnarandomlyselecteditemfromthelist.Partofyourcodecouldlooklikethis:
import random
chores = ['dishes', 'bathroom', 'vacuum', 'walk dog']
randomChore = random.choice(chores)
chores.remove(randomChore)  # this chore is now taken, so remove it
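If you would rather hand out every chore in one pass, one possible approach is to shuffle the list and deal the chores out in turn. This sketch is only one way to do it; assign_chores() is a hypothetical function and the addresses are placeholders.

```python
import random


def assign_chores(people, chores, rng=random):
    """Shuffle the chores, then deal them out to people one at a time."""
    shuffled = list(chores)  # copy so the caller's list is left alone
    rng.shuffle(shuffled)
    assignments = {person: [] for person in people}
    for i, chore in enumerate(shuffled):
        assignments[people[i % len(people)]].append(chore)
    return assignments


people = ['alice@example.com', 'bob@example.com']  # placeholder addresses
chores = ['dishes', 'bathroom', 'vacuum', 'walk dog']
print(assign_chores(people, chores))
```

Dealing the chores out in turn means nobody ends up with more than one chore above anyone else, which random.choice() alone does not guarantee.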
UmbrellaReminderChapter11showedyouhowtousetherequestsmoduletoscrapedatafromhttp://weather.gov/.Writeaprogramthatrunsjustbeforeyouwakeupinthemorningandcheckswhetherit’srainingthatday.Ifso,havetheprogramtextyouaremindertopackanumbrellabeforeleavingthehouse.
AutoUnsubscriberWriteaprogramthatscansthroughyouremailaccount,findsalltheunsubscribelinksinallyouremails,andautomaticallyopenstheminabrowser.Thisprogramwillhavetologintoyouremailprovider’sIMAPserveranddownloadallofyouremails.YoucanuseBeautifulSoup(coveredinChapter11)tocheckforanyinstancewherethewordunsubscribeoccurswithinanHTMLlinktag.
OnceyouhavealistoftheseURLs,youcanusewebbrowser.open()toautomaticallyopenalloftheselinksinabrowser.
You’llstillhavetomanuallygothroughandcompleteanyadditionalstepstounsubscribeyourselffromtheselists.Inmostcases,thisinvolvesclickingalinktoconfirm.
Butthisscriptsavesyoufromhavingtogothroughallofyouremailslookingforunsubscribelinks.Youcanthenpassthisscriptalongtoyourfriendssotheycanrunitontheiremailaccounts.(Justmakesureyouremailpasswordisn’thardcodedinthesourcecode!)
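The link-finding step can be tried out without an email account. The sketch below swaps in the standard library’s html.parser module for Beautiful Soup so that it needs no third-party install; UnsubscribeLinkFinder and find_unsubscribe_links() are hypothetical names, and the sample HTML is invented. It collects the href of every link whose visible text contains the word unsubscribe.

```python
from html.parser import HTMLParser


class UnsubscribeLinkFinder(HTMLParser):
    """Collect the href of each <a> tag whose text mentions 'unsubscribe'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text seen so far inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            if 'unsubscribe' in ''.join(self._text).lower():
                self.links.append(self._href)
            self._href = None


def find_unsubscribe_links(html):
    finder = UnsubscribeLinkFinder()
    finder.feed(html)
    return finder.links


sample = ('<p><a href="http://example.com/u?id=1">Unsubscribe</a> '
          '<a href="http://example.com/shop">Shop</a></p>')
print(find_unsubscribe_links(sample))
```

Each URL in the returned list could then be passed to webbrowser.open().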
Controlling Your Computer Through Email
Write a program that checks an email account every 15 minutes for any instructions you email it and executes those instructions automatically. For example, BitTorrent is a peer-to-peer downloading system. Using free BitTorrent software such as qBittorrent, you can download large media files on your home computer. If you email the program a (completely legal, not at all piratical) BitTorrent link, the program will eventually check its email, find this message, extract the link, and then launch qBittorrent to start downloading the file. This way, you can have your home computer begin downloads while you’re away, and the (completely legal, not at all piratical) download can be finished by the time you return home.
Chapter 15 covers how to launch programs on your computer using the subprocess.Popen() function. For example, the following call would launch the qBittorrent program, along with a torrent file:
qbProcess = subprocess.Popen(['C:\\Program Files (x86)\\qBittorrent\\qbittorrent.exe', 'shakespeare_complete_works.torrent'])
Ofcourse,you’llwanttheprogramtomakesuretheemailscomefromyou.Inparticular,youmightwanttorequirethattheemailscontainapassword,sinceitisfairlytrivialforhackerstofakea“from”addressinemails.Theprogramshoulddeletetheemailsitfindssothatitdoesn’trepeatinstructionseverytimeitcheckstheemailaccount.Asanextrafeature,havetheprogramemailortextyouaconfirmationeverytimeitexecutesacommand.Sinceyouwon’tbesittinginfrontofthecomputerthatisrunningtheprogram,it’sagoodideatousetheloggingfunctions(seeChapter10)towriteatextfilelogthatyoucancheckiferrorscomeup.
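One simple way to implement the password requirement is to expect the password on the first line of the email body. That convention, and the is_authorized() function below, are assumptions made for this sketch rather than anything the project prescribes. The comparison uses hmac.compare_digest() from the standard library, which is designed for comparing secrets.

```python
import hmac


def is_authorized(body, secret):
    """Return True if the first line of the email body matches the secret."""
    lines = body.strip().splitlines()
    first_line = lines[0].strip() if lines else ''
    return hmac.compare_digest(first_line.encode('utf-8'),
                               secret.encode('utf-8'))
```

Your program would call this on each message body it retrieves and skip, or simply delete, any email that fails the check.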
qBittorrent(aswellasotherBitTorrentapplications)hasafeaturewhereitcanquitautomaticallyafterthedownloadcompletes.Chapter15explainshowyoucandeterminewhenalaunchedapplicationhasquitwiththewait()methodforPopenobjects.Thewait()methodcallwillblockuntilqBittorrenthasstopped,andthenyourprogramcanemailortextyouanotificationthatthedownloadhascompleted.
Therearealotofpossiblefeaturesyoucouldaddtothisproject.Ifyougetstuck,youcandownloadanexampleimplementationofthisprogramfromhttp://nostarch.com/automatestuff/.
Chapter17.ManipulatingImagesIfyouhaveadigitalcameraorevenifyoujustuploadphotosfromyourphonetoFacebook,youprobablycrosspathswithdigitalimagefilesallthetime.Youmayknowhowtousebasicgraphicssoftware,suchasMicrosoftPaintorPaintbrush,orevenmoreadvancedapplicationssuchasAdobePhotoshop.Butifyouneedtoeditamassivenumberofimages,editingthembyhandcanbealengthy,boringjob.
EnterPython.Pillowisathird-partyPythonmoduleforinteractingwithimagefiles.Themodulehasseveralfunctionsthatmakeiteasytocrop,resize,andeditthecontentofanimage.WiththepowertomanipulateimagesthesamewayyouwouldwithsoftwaresuchasMicrosoftPaintorAdobePhotoshop,Pythoncanautomaticallyedithundredsorthousandsofimageswithease.
ComputerImageFundamentalsInordertomanipulateanimage,youneedtounderstandthebasicsofhowcomputersdealwithcolorsandcoordinatesinimagesandhowyoucanworkwithcolorsandcoordinatesinPillow.Butbeforeyoucontinue,installthepillowmodule.SeeAppendixAforhelpinstallingthird-partymodules.
ColorsandRGBAValuesComputerprogramsoftenrepresentacolorinanimageasanRGBAvalue.AnRGBAvalueisagroupofnumbersthatspecifytheamountofred,green,blue,andalpha(ortransparency)inacolor.Eachofthesecomponentvaluesisanintegerfrom0(noneatall)to255(themaximum).TheseRGBAvaluesareassignedtoindividualpixels;apixelisthesmallestdotofasinglecolorthecomputerscreencanshow(asyoucanimagine,therearemillionsofpixelsonascreen).Apixel’sRGBsettingtellsitpreciselywhatshadeofcoloritshoulddisplay.ImagesalsohaveanalphavaluetocreateRGBAvalues.Ifanimageisdisplayedonthescreenoverabackgroundimageordesktopwallpaper,thealphavaluedetermineshowmuchofthebackgroundyoucan“seethrough”theimage’spixel.
InPillow,RGBAvaluesarerepresentedbyatupleoffourintegervalues.Forexample,thecolorredisrepresentedby(255,0,0,255).Thiscolorhasthemaximumamountofred,nogreenorblue,andthemaximumalphavalue,meaningitisfullyopaque.Greenisrepresentedby(0,255,0,255),andblueis(0,0,255,255).White,thecombinationofallcolors,is(255,255,255,255),whileblack,whichhasnocoloratall,is(0,0,0,255).
Ifacolorhasanalphavalueof0,itisinvisible,anditdoesn’treallymatterwhattheRGBvaluesare.Afterall,invisibleredlooksthesameasinvisibleblack.
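To make the role of alpha concrete, here is a small arithmetic sketch of how one RGBA pixel might be combined with an opaque background color. The blend_over() function is hypothetical and uses a simple linear blend for illustration; it is not a description of what Pillow does internally.

```python
def blend_over(pixel, background):
    """Blend one RGBA pixel over an opaque RGB background color."""
    r, g, b, a = pixel
    return tuple(
        (channel * a + back * (255 - a)) // 255
        for channel, back in zip((r, g, b), background)
    )


blue = (0, 0, 255)
print(blend_over((255, 0, 0, 255), blue))  # fully opaque red
print(blend_over((255, 0, 0, 0), blue))    # invisible red
print(blend_over((0, 0, 0, 0), blue))      # invisible black
```

With an alpha of 255 only the pixel’s own color remains, and with an alpha of 0 only the background remains, which is why invisible red and invisible black give the same result.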
PillowusesthestandardcolornamesthatHTMLuses.Table17-1listsaselectionofstandardcolornamesandtheirvalues.
Table17-1.StandardColorNamesandTheirRGBAValues
Name RGBAValue Name RGBAValue
White (255,255,255,255) Red (255,0,0,255)
Green (0,128,0,255) Blue (0,0,255,255)
Gray (128,128,128,255) Yellow (255,255,0,255)
Black (0,0,0,255) Purple (128,0,128,255)
PillowofferstheImageColor.getcolor()functionsoyoudon’thavetomemorizeRGBAvaluesforthecolorsyouwanttouse.Thisfunctiontakesacolornamestringasitsfirstargument,andthestring'RGBA'asitssecondargument,anditreturnsanRGBAtuple.
CMYKANDRGBCOLORING
Ingradeschoolyoulearnedthatmixingred,yellow,andbluepaintscanformothercolors;forexample,youcanmixblueandyellowtomakegreenpaint.Thisisknownasthesubtractivecolormodel,anditappliestodyes,inks,andpigments.ThisiswhycolorprintershaveCMYKinkcartridges:theCyan(blue),Magenta(red),Yellow,andblacKinkcanbemixedtogethertoformanycolor.
However,thephysicsoflightuseswhat’scalledanadditivecolormodel.Whencombininglight(suchasthelightgivenoffbyyourcomputerscreen),red,green,andbluelightcanbecombinedtoformanyothercolor.ThisiswhyRGBvaluesrepresentcolorincomputerprograms.
To see how this function works, enter the following into the interactive shell:
➊ >>> from PIL import ImageColor
➋>>>ImageColor.getcolor('red','RGBA')
(255,0,0,255)
➌>>>ImageColor.getcolor('RED','RGBA')
(255,0,0,255)
>>>ImageColor.getcolor('Black','RGBA')
(0,0,0,255)
>>>ImageColor.getcolor('chocolate','RGBA')
(210,105,30,255)
>>>ImageColor.getcolor('CornflowerBlue','RGBA')
(100,149,237,255)
First,youneedtoimporttheImageColormodulefromPIL➊(notfromPillow;you’llseewhyinamoment).ThecolornamestringyoupasstoImageColor.getcolor()iscaseinsensitive,sopassing'red'➋andpassing'RED'➌giveyouthesameRGBAtuple.Youcanalsopassmoreunusualcolornames,like'chocolate'and'CornflowerBlue'.
Pillowsupportsahugenumberofcolornames,from'aliceblue'to'whitesmoke'.Youcanfindthefulllistofmorethan100standardcolornamesintheresourcesathttp://nostarch.com/automatestuff/.
CoordinatesandBoxTuplesImagepixelsareaddressedwithx-andy-coordinates,whichrespectivelyspecifyapixel’shorizontalandverticallocationinanimage.Theoriginisthepixelatthetop-leftcorneroftheimageandisspecifiedwiththenotation(0,0).Thefirstzerorepresentsthex-coordinate,whichstartsatzeroattheoriginandincreasesgoingfromlefttoright.Thesecondzerorepresentsthey-coordinate,whichstartsatzeroattheoriginandincreasesgoingdowntheimage.Thisbearsrepeating:y-coordinatesincreasegoingdownward,whichistheoppositeofhowyoumayremembery-coordinatesbeingusedinmathclass.Figure17-1demonstrateshowthiscoordinatesystemworks.
Figure 17-1. The x- and y-coordinates of a 27×26 image of some sort of ancient data storage device
Many of Pillow’s functions and methods take a box tuple argument. This means Pillow is expecting a tuple of four integer coordinates that represent a rectangular region in an image. The four integers are, in order, as follows:
Left: The x-coordinate of the leftmost edge of the box.
Top: The y-coordinate of the top edge of the box.
Right: The x-coordinate of one pixel to the right of the rightmost edge of the box. This integer must be greater than the left integer.
Bottom: The y-coordinate of one pixel lower than the bottom edge of the box. This integer must be greater than the top integer.
Figure17-2.Thearearepresentedbytheboxtuple(3,1,9,6)
Notethattheboxincludestheleftandtopcoordinatesandgoesuptobutdoesnotincludetherightandbottomcoordinates.Forexample,theboxtuple(3,1,9,6)representsallthepixelsintheblackboxinFigure17-2.
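The include-left-and-top, exclude-right-and-bottom rule is easy to check with plain arithmetic. The two helper functions below are hypothetical and do not come from Pillow; they only restate the rule in code.

```python
def box_size(box):
    """Return (width, height) of the region described by a box tuple."""
    left, top, right, bottom = box
    return (right - left, bottom - top)


def box_contains(box, x, y):
    """Return True if pixel (x, y) falls inside the box."""
    left, top, right, bottom = box
    return left <= x < right and top <= y < bottom


box = (3, 1, 9, 6)
print(box_size(box))            # (6, 5)
print(box_contains(box, 3, 1))  # True: the left and top edges are included
print(box_contains(box, 9, 6))  # False: the right and bottom edges are not
```

Because the right and bottom values are excluded, the width is simply right minus left and the height is bottom minus top, with no need to add 1.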
ManipulatingImageswithPillowNowthatyouknowhowcolorsandcoordinatesworkinPillow,let’susePillowtomanipulateanimage.Figure17-3istheimagethatwillbeusedforalltheinteractiveshellexamplesinthischapter.Youcandownloaditfromhttp://nostarch.com/automatestuff/.
Once you have the image file zophie.png in your current working directory, you’ll be ready to load the image of Zophie into Python, like so:
>>>fromPILimportImage
>>>catIm=Image.open('zophie.png')
Figure17-3.MycatZophie.Thecameraadds10pounds(whichisalotforacat).
To load the image, you import the Image module from Pillow and call Image.open(), passing it the image’s filename. You can then store the loaded image in a variable like catIm. The module name of Pillow is PIL to make it backward compatible with an older module called Python Imaging Library, which is why you must run from PIL import Image instead of from Pillow import Image. Because of the way Pillow’s creators set up the pillow module, you must use the from PIL import Image form of import statement, rather than simply import PIL.
Iftheimagefileisn’tinthecurrentworkingdirectory,changetheworkingdirectorytothefolderthatcontainstheimagefilebycallingtheos.chdir()function.
>>>importos
>>>os.chdir('C:\\folder_with_image_file')
TheImage.open()functionreturnsavalueoftheImageobjectdatatype,whichishowPillowrepresentsanimageasaPythonvalue.YoucanloadanImageobjectfromanimagefile(ofanyformat)bypassingtheImage.open()functionastringofthefilename.AnychangesyoumaketotheImageobjectcanbesavedtoanimagefile(alsoofanyformat)withthesave()method.Alltherotations,resizing,cropping,drawing,andotherimagemanipulationswillbedonethroughmethodcallsonthisImageobject.
Toshortentheexamplesinthischapter,I’llassumeyou’veimportedPillow’sImagemoduleandthatyouhavetheZophieimagestoredinavariablenamedcatIm.Besurethatthezophie.pngfileisinthecurrentworkingdirectorysothattheImage.open()functioncanfindit.Otherwise,youwillalsohavetospecifythefullabsolutepathinthestringargumenttoImage.open().
WorkingwiththeImageDataTypeAnImageobjecthasseveralusefulattributesthatgiveyoubasicinformationabouttheimagefileitwasloadedfrom:itswidthandheight,thefilename,andthegraphicsformat(suchasJPEG,GIF,orPNG).
For example, enter the following into the interactive shell:
>>> from PIL import Image
>>>catIm=Image.open('zophie.png')
>>>catIm.size
➊(816,1088)
➋>>>width,height=catIm.size
➌>>>width
816
➍>>>height
1088
>>>catIm.filename
'zophie.png'
>>>catIm.format
'PNG'
>>>catIm.format_description
'Portablenetworkgraphics'
➎>>>catIm.save('zophie.jpg')
After making an Image object from zophie.png and storing the Image object in catIm, we can see that the object’s size attribute contains a tuple of the image’s width and height in pixels ➊. We can assign the values in the tuple to width and height variables ➋ in order to access width ➌ and height ➍ individually. The filename attribute describes the original file’s name. The format and format_description attributes are strings that describe the image format of the original file (with format_description being a bit more verbose).
Finally,callingthesave()methodandpassingit'zophie.jpg'savesanewimagewiththefilenamezophie.jpgtoyourharddrive➎.Pillowseesthatthefileextensionis.jpgandautomaticallysavestheimageusingtheJPEGimageformat.Nowyoushouldhavetwoimages,zophie.pngandzophie.jpg,onyourharddrive.Whilethesefilesarebasedonthesameimage,theyarenotidenticalbecauseoftheirdifferentformats.
Pillow also provides the Image.new() function, which returns an Image object much like Image.open() does, except that the image represented by Image.new()’s object will be blank. The arguments to Image.new() are as follows:
The string 'RGBA', which sets the color mode to RGBA. (There are other modes that this book doesn’t go into.)
The size, as a two-integer tuple of the new image’s width and height.
The background color that the image should start with, as a four-integer tuple of an RGBA value. You can use the return value of the ImageColor.getcolor() function for this argument. Alternatively, Image.new() also supports just passing the string of the standard color name.
For example, enter the following into the interactive shell:
>>> from PIL import Image
➊>>>im=Image.new('RGBA',(100,200),'purple')
>>>im.save('purpleImage.png')
➋>>>im2=Image.new('RGBA',(20,20))
>>>im2.save('transparentImage.png')
HerewecreateanImageobjectforanimagethat’s100pixelswideand200pixelstall,withapurplebackground➊.ThisimageisthensavedtothefilepurpleImage.png.WecallImage.new()againtocreateanotherImageobject,thistimepassing(20,20)forthedimensionsandnothingforthebackgroundcolor➋.Invisibleblack,(0,0,0,0),isthedefaultcolorusedifnocolorargumentisspecified,sothesecondimagehasatransparentbackground;wesavethis20×20transparentsquareintransparentImage.png.
Cropping Images
Cropping an image means selecting a rectangular region inside an image and removing everything outside the rectangle. The crop() method on Image objects takes a box tuple and returns an Image object representing the cropped image. The cropping does not happen in place; that is, the original Image object is left untouched, and the crop() method returns a new Image object. Remember that a box tuple (in this case, the cropped section) includes the left column and top row of pixels but only goes up to and does not include the right column and bottom row of pixels.
Enter the following into the interactive shell:
>>> croppedIm = catIm.crop((335, 345, 565, 560))
>>>croppedIm.save('cropped.png')
ThismakesanewImageobjectforthecroppedimage,storestheobjectincroppedIm,andthencallssave()oncroppedImtosavethecroppedimageincropped.png.Thenewfilecropped.pngwillbecreatedfromtheoriginalimage,likeinFigure17-4.
Figure17-4.Thenewimagewillbejustthecroppedsectionoftheoriginalimage.
CopyingandPastingImagesontoOtherImagesThecopy()methodwillreturnanewImageobjectwiththesameimageastheImageobjectitwascalledon.Thisisusefulifyouneedtomakechangestoanimagebutalsowanttokeepanuntouchedversionoftheoriginal.Forexample,enterthefollowingintotheinteractiveshell:
>>>catIm=Image.open('zophie.png')
>>>catCopyIm=catIm.copy()
ThecatImandcatCopyImvariablescontaintwoseparateImageobjects,whichbothhavethesameimageonthem.NowthatyouhaveanImageobjectstoredincatCopyIm,youcanmodifycatCopyImasyoulikeandsaveittoanewfilename,leavingzophie.pnguntouched.Forexample,let’strymodifyingcatCopyImwiththepaste()method.
Thepaste()methodiscalledonanImageobjectandpastesanotherimageontopofit.Let’scontinuetheshellexamplebypastingasmallerimageontocatCopyIm.
>>>faceIm=catIm.crop((335,345,565,560))
>>>faceIm.size
(230,215)
>>>catCopyIm.paste(faceIm,(0,0))
>>>catCopyIm.paste(faceIm,(400,500))
>>>catCopyIm.save('pasted.png')
First we pass crop() a box tuple for the rectangular area in zophie.png that contains Zophie’s face. This creates an Image object representing a 230×215 crop, which we store in faceIm. Now we can paste faceIm onto catCopyIm. The paste() method takes two arguments: a “source” Image object and a tuple of the x- and y-coordinates where you want to paste the top-left corner of the source Image object onto the main Image object. Here we call paste() twice on catCopyIm, passing (0, 0) the first time and (400, 500) the second time. This pastes faceIm onto catCopyIm twice: once with the top-left corner of faceIm at (0, 0) on catCopyIm, and once with the top-left corner of faceIm at (400, 500). Finally, we save the modified catCopyIm to pasted.png. The pasted.png image looks like Figure 17-5.
Figure17-5.Zophiethecat,withherfacepastedtwice
NOTE
Despitetheirnames,thecopy()andpaste()methodsinPillowdonotuseyourcomputer’sclipboard.
Notethatthepaste()methodmodifiesitsImageobjectinplace;itdoesnotreturnanImageobjectwiththepastedimage.Ifyouwanttocallpaste()butalsokeepanuntouchedversionoftheoriginalimagearound,you’llneedtofirstcopytheimageandthencallpaste()onthatcopy.
SayyouwanttotileZophie’sheadacrosstheentireimage,asinFigure17-6.Youcanachievethiseffectwithjustacoupleforloops.Continuetheinteractiveshellexamplebyenteringthefollowing:
>>> catImWidth, catImHeight = catIm.size
>>> faceImWidth, faceImHeight = faceIm.size
➊ >>> catCopyTwo = catIm.copy()
➋ >>> for left in range(0, catImWidth, faceImWidth):
    ➌ for top in range(0, catImHeight, faceImHeight):
            print(left, top)
            catCopyTwo.paste(faceIm, (left, top))

0 0
0 215
0 430
0 645
0 860
0 1075
230 0
230 215
--snip--
690 860
690 1075
>>> catCopyTwo.save('tiled.png')
HerewestorethewidthofheightofcatImincatImWidthandcatImHeight.At➊wemakeacopyofcatImandstoreitincatCopyTwo.Nowthatwehaveacopythatwecanpasteonto,westartloopingtopastefaceImontocatCopyTwo.Theouterforloop’sleftvariablestartsat0andincreasesbyfaceImWidth(230)➋.Theinnerforloop’stopvariablestartat0andincreasesbyfaceImHeight(215)➌.ThesenestedforloopsproducevaluesforleftandtoptopasteagridoffaceImimagesoverthecatCopyTwoImageobject,asinFigure17-6.Toseeournestedloopsworking,weprintleftandtop.Afterthepastingiscomplete,wesavethemodifiedcatCopyTwototiled.png.
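To preview where the nested loops will paste before touching any image, the same two range() calls can be run on plain numbers. The following is only a sketch: tile_positions is a hypothetical helper name, and the 816×1088 image size is an assumption chosen to be consistent with the coordinates printed above.

```python
def tile_positions(image_size, tile_size):
    """Return the (left, top) corner of every tile, column by column."""
    image_width, image_height = image_size
    tile_width, tile_height = tile_size
    positions = []
    for left in range(0, image_width, tile_width):
        for top in range(0, image_height, tile_height):
            positions.append((left, top))
    return positions

# Assumed sizes: an 816x1088 image tiled with the 230x215 face crop.
positions = tile_positions((816, 1088), (230, 215))
print(len(positions))               # how many paste() calls the loops make
print(positions[0], positions[-1])  # first and last top-left corners
```

Note that the tiles in the last column and last row start inside the image but extend past its edge, which is why the loops step by the tile size rather than stopping one tile short.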
Figure 17-6. Nested for loops used with paste() to duplicate the cat’s face (a duplicat, if you will).
PASTING TRANSPARENT PIXELS
Normally transparent pixels are pasted as white pixels. If the image you want to paste has transparent pixels, pass the Image object as the third argument so that a solid rectangle isn’t pasted. This third argument is the “mask” Image object. A mask is an Image object where the alpha value is significant, but the red, green, and blue values are ignored. The mask tells the paste() method which pixels it should copy and which it should leave transparent. Advanced usage of masks is beyond this book, but if you want to paste an image that has transparent pixels, pass the Image object again as the third argument.
Resizing an Image
The resize() method is called on an Image object and returns a new Image object of the specified width and height. It accepts a two-integer tuple argument, representing the new width and height of the returned image. Enter the following into the interactive shell:
➊ >>> width, height = catIm.size
➋ >>> quartersizedIm = catIm.resize((int(width / 2), int(height / 2)))
  >>> quartersizedIm.save('quartersized.png')
➌ >>> svelteIm = catIm.resize((width, height + 300))
  >>> svelteIm.save('svelte.png')
Here we assign the two values in the catIm.size tuple to the variables width and height ➊. Using width and height instead of catIm.size[0] and catIm.size[1] makes the rest of the code more readable.
The first resize() call passes int(width / 2) for the new width and int(height / 2) for the new height ➋, so the Image object returned from resize() will be half the width and half the height of the original image, or one-quarter of the original image size overall. The resize() method accepts only integers in its tuple argument, which is why you needed to wrap both divisions by 2 in an int() call.
This resizing keeps the same proportions for the width and height. But the new width and height passed to resize() do not have to be proportional to the original image. The svelteIm variable contains an Image object that has the original width but a height that is 300 pixels taller ➌, giving Zophie a more slender look.
Note that the resize() method does not edit the Image object in place but instead returns a new Image object.
Rotating and Flipping Images
Images can be rotated with the rotate() method, which returns a new Image object of the rotated image and leaves the original Image object unchanged. The argument to rotate() is a single integer or float representing the number of degrees to rotate the image counterclockwise. Enter the following into the interactive shell:
>>> catIm.rotate(90).save('rotated90.png')
>>> catIm.rotate(180).save('rotated180.png')
>>> catIm.rotate(270).save('rotated270.png')
Note how you can chain method calls by calling save() directly on the Image object returned from rotate(). The first rotate() and save() call makes a new Image object representing the image rotated counterclockwise by 90 degrees and saves the rotated image to rotated90.png. The second and third calls do the same, but with 180 degrees and 270 degrees. The results look like Figure 17-7.
Figure 17-7. The original image (left) and the image rotated counterclockwise by 90, 180, and 270 degrees
Notice that the width and height of the image change when the image is rotated 90 or 270 degrees. If you rotate an image by some other amount, the original dimensions of the image are maintained. On Windows, a black background is used to fill in any gaps made by the rotation, like in Figure 17-8. On OS X, transparent pixels are used for the gaps instead.
The rotate() method has an optional expand keyword argument that can be set to True to enlarge the dimensions of the image to fit the entire rotated new image. For example, enter the following into the interactive shell:
>>> catIm.rotate(6).save('rotated6.png')
>>> catIm.rotate(6, expand=True).save('rotated6_expanded.png')
The first call rotates the image 6 degrees and saves it to rotated6.png (see the image on the left of Figure 17-8). The second call rotates the image 6 degrees with expand set to True and saves it to rotated6_expanded.png (see the image on the right of Figure 17-8).
Figure 17-8. The image rotated 6 degrees normally (left) and with expand=True (right)
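To get a feel for how much larger an expanded image becomes, the bounding box of a rotated rectangle can be estimated with basic trigonometry. This is a rough sketch rather than Pillow’s own calculation: expanded_size is a hypothetical helper, the 816×1088 size is an assumption, and Pillow’s rounding may differ from this estimate by a pixel or so.

```python
import math

def expanded_size(width, height, degrees):
    """Estimate the bounding box of a width x height image rotated by degrees."""
    angle = math.radians(degrees)
    cos_a = abs(math.cos(angle))
    sin_a = abs(math.sin(angle))
    # Each side of the bounding box is the sum of the projections of both sides.
    new_width = round(width * cos_a + height * sin_a)
    new_height = round(width * sin_a + height * cos_a)
    return new_width, new_height

# Assumed 816x1088 image: a small rotation grows both dimensions slightly,
# while a 90-degree rotation swaps them.
print(expanded_size(816, 1088, 6))
print(expanded_size(816, 1088, 90))
```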
You can also get a “mirror flip” of an image with the transpose() method. You must pass either Image.FLIP_LEFT_RIGHT or Image.FLIP_TOP_BOTTOM to the transpose() method. Enter the following into the interactive shell:
>>> catIm.transpose(Image.FLIP_LEFT_RIGHT).save('horizontal_flip.png')
>>> catIm.transpose(Image.FLIP_TOP_BOTTOM).save('vertical_flip.png')
Like rotate(), transpose() creates a new Image object. Here we pass Image.FLIP_LEFT_RIGHT to flip the image horizontally and then save the result to horizontal_flip.png. To flip the image vertically, we pass Image.FLIP_TOP_BOTTOM and save to vertical_flip.png. The results look like Figure 17-9.
Figure 17-9. The original image (left), horizontal flip (center), and vertical flip (right)
Changing Individual Pixels
The color of an individual pixel can be retrieved or set with the getpixel() and putpixel() methods. These methods both take a tuple representing the x- and y-coordinates of the pixel. The putpixel() method also takes an additional tuple argument for the color of the pixel. This color argument is a four-integer RGBA tuple or a three-integer RGB tuple. Enter the following into the interactive shell:
➊ >>> im = Image.new('RGBA', (100, 100))
➋ >>> im.getpixel((0, 0))
  (0, 0, 0, 0)
➌ >>> for x in range(100):
          for y in range(50):
➍             im.putpixel((x, y), (210, 210, 210))
  >>> from PIL import ImageColor
➎ >>> for x in range(100):
          for y in range(50, 100):
➏             im.putpixel((x, y), ImageColor.getcolor('darkgray', 'RGBA'))
  >>> im.getpixel((0, 0))
  (210, 210, 210, 255)
  >>> im.getpixel((0, 50))
  (169, 169, 169, 255)
  >>> im.save('putPixel.png')
At ➊ we make a new image that is a 100×100 transparent square. Calling getpixel() on some coordinates in this image returns (0, 0, 0, 0) because the image is transparent ➋. To color pixels in this image, we can use nested for loops to go through all the pixels in the top half of the image ➌ and color each pixel using putpixel() ➍. Here we pass putpixel() the RGB tuple (210, 210, 210), a light gray.
Say we want to color the bottom half of the image dark gray but don’t know the RGB tuple for dark gray. The putpixel() method doesn’t accept a standard color name like 'darkgray', so you have to use ImageColor.getcolor() to get a color tuple from 'darkgray'. Loop through the pixels in the bottom half of the image ➎ and pass putpixel() the return value of ImageColor.getcolor() ➏, and you should now have an image that is light gray in its top half and dark gray in the bottom half, as shown in Figure 17-10. You can call getpixel() on some coordinates to confirm that the color at any given pixel is what you expect. Finally, save the image to putPixel.png.
Figure 17-10. The putPixel.png image
Of course, drawing one pixel at a time onto an image isn’t very convenient. If you need to draw shapes, use the ImageDraw functions explained later in this chapter.
Project: Adding a Logo
Say you have the boring job of resizing thousands of images and adding a small logo watermark to the corner of each. Doing this with a basic graphics program such as Paintbrush or Paint would take forever. A fancier graphics application such as Photoshop can do batch processing, but that software costs hundreds of dollars. Let’s write a script to do it instead.
Say that Figure 17-11 is the logo you want to add to the bottom-right corner of each image: a black cat icon with a white border, with the rest of the image transparent.
Figure 17-11. The logo to be added to the image.
At a high level, here’s what the program should do:
Load the logo image.
Loop over all .png and .jpg files in the working directory.
Check whether the image is wider or taller than 300 pixels.
If so, reduce the width or height (whichever is larger) to 300 pixels and scale down the other dimension proportionally.
Paste the logo image into the corner.
Save the altered images to another folder.
This means the code will need to do the following:
Open the catlogo.png file as an Image object.
Loop over the strings returned from os.listdir('.').
Get the width and height of the image from the size attribute.
Calculate the new width and height of the resized image.
Call the resize() method to resize the image.
Call the paste() method to paste the logo.
Call the save() method to save the changes, using the original filename.
Step 1: Open the Logo Image
For this project, open a new file editor window, enter the following code, and save it as resizeAndAddLogo.py:
  #! python3
  # resizeAndAddLogo.py - Resizes all images in current working directory to fit
  # in a 300x300 square, and adds catlogo.png to the lower-right corner.
  import os
  from PIL import Image
➊ SQUARE_FIT_SIZE = 300
➋ LOGO_FILENAME = 'catlogo.png'
➌ logoIm = Image.open(LOGO_FILENAME)
➍ logoWidth, logoHeight = logoIm.size
  # TODO: Loop over all files in the working directory.
  # TODO: Check if image needs to be resized.
  # TODO: Calculate the new width and height to resize to.
  # TODO: Resize the image.
  # TODO: Add the logo.
  # TODO: Save changes.
By setting up the SQUARE_FIT_SIZE ➊ and LOGO_FILENAME ➋ constants at the start of the program, we’ve made it easy to change the program later. Say the logo that you’re adding isn’t the cat icon, or say you’re reducing the output images’ largest dimension to something other than 300 pixels. With these constants at the start of the program, you can just open the code, change those values once, and you’re done. (Or you can make it so that the values for these constants are taken from the command line arguments.) Without these constants, you’d instead have to search the code for all instances of 300 and 'catlogo.png' and replace them with the values for your new project. In short, using constants makes your program more generalized.
The logo Image object is returned from Image.open() ➌. For readability, logoWidth and logoHeight are assigned to the values from logoIm.size ➍.
The rest of the program is a skeleton of TODO comments for now.
Step 2: Loop Over All Files and Open Images
Now you need to find every .png file and .jpg file in the current working directory. Note that you don’t want to add the logo image to the logo image itself, so the program should skip any image with a filename that’s the same as LOGO_FILENAME. Add the following to your code:
  #! python3
  # resizeAndAddLogo.py - Resizes all images in current working directory to fit
  # in a 300x300 square, and adds catlogo.png to the lower-right corner.
  import os
  from PIL import Image
  --snip--
  os.makedirs('withLogo', exist_ok=True)
  # Loop over all files in the working directory.
➊ for filename in os.listdir('.'):
➋     if not (filename.endswith('.png') or filename.endswith('.jpg')) \
         or filename == LOGO_FILENAME:
➌         continue # skip non-image files and the logo file itself
➍     im = Image.open(filename)
      width, height = im.size
  --snip--
First, the os.makedirs() call creates a withLogo folder to store the finished images with logos, instead of overwriting the original image files. The exist_ok=True keyword argument will keep os.makedirs() from raising an exception if withLogo already exists.
While looping through all the files in the working directory with os.listdir('.') ➊, the long if statement ➋ checks whether each filename doesn’t end with .png or .jpg. If so (or if the file is the logo image itself), the loop should skip it and use continue ➌ to go to the next file. If filename does end with '.png' or '.jpg' (and isn’t the logo file), you can open it as an Image object ➍ and set width and height.
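The filename test can also be tried on its own, without any image files on disk. The sketch below wraps the same check in a hypothetical helper named should_process and relies on the fact that str.endswith() accepts a tuple of suffixes; the filenames in the list are made up for illustration.

```python
LOGO_FILENAME = 'catlogo.png'

def should_process(filename, logo_filename=LOGO_FILENAME):
    """Return True for .png/.jpg files other than the logo itself."""
    # str.endswith() accepts a tuple and is True if any suffix matches.
    if not filename.endswith(('.png', '.jpg')):
        return False
    return filename != logo_filename

# Made-up directory listing; only the first and fourth names should pass.
names = ['zophie.png', 'catlogo.png', 'notes.txt', 'photo.jpg', 'scan.PNG']
print([name for name in names if should_process(name)])
```

Like the program’s own check, this comparison is case sensitive, so scan.PNG is skipped.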
Step 3: Resize the Images
The program should resize the image only if the width or height is larger than SQUARE_FIT_SIZE (300 pixels, in this case), so put all of the resizing code inside an if statement that checks the width and height variables. Add the following code to your program:
  #! python3
  # resizeAndAddLogo.py - Resizes all images in current working directory to fit
  # in a 300x300 square, and adds catlogo.png to the lower-right corner.
  import os
  from PIL import Image
  --snip--
      # Check if image needs to be resized.
      if width > SQUARE_FIT_SIZE or height > SQUARE_FIT_SIZE:
          # Calculate the new width and height to resize to.
          if width > height:
➊             height = int((SQUARE_FIT_SIZE / width) * height)
              width = SQUARE_FIT_SIZE
          else:
➋             width = int((SQUARE_FIT_SIZE / height) * width)
              height = SQUARE_FIT_SIZE
          # Resize the image.
          print('Resizing %s…' % (filename))
➌         im = im.resize((width, height))
  --snip--
The outer if statement uses or rather than and, so an image is resized when either dimension exceeds SQUARE_FIT_SIZE; with and, a 400×200 image would be left too wide. If the image does need to be resized, you need to find out whether it is a wide or tall image. If width is greater than height, then the height should be reduced by the same proportion that the width would be reduced ➊. This proportion is the SQUARE_FIT_SIZE value divided by the current width. The new height value is this proportion multiplied by the current height value. Since the division operator returns a float value and resize() requires the dimensions to be integers, remember to convert the result to an integer with the int() function. Finally, the new width value will simply be set to SQUARE_FIT_SIZE.
If the height is greater than or equal to the width (both cases are handled in the else clause), then the same calculation is done, except with the height and width variables swapped ➋.
Once width and height contain the new image dimensions, pass them to the resize() method and store the returned Image object in im ➌.
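The size calculation can be checked on bare numbers before any image is resized. The following sketch is a variant, not the program’s exact code: fit_within is a hypothetical helper, it resizes when either dimension exceeds the limit (as the step’s description requires), and it uses integer arithmetic (multiply first, then floor-divide) instead of float division followed by int(), which sidesteps float rounding just below a whole number.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) down proportionally so neither side exceeds max_side."""
    if width <= max_side and height <= max_side:
        return width, height            # already fits; leave unchanged
    if width > height:
        # Wide image: pin the width, shrink the height by the same proportion.
        return max_side, max(1, height * max_side // width)
    # Tall or square image: pin the height, shrink the width.
    return max(1, width * max_side // height), max_side

print(fit_within(816, 1088, 300))   # tall image
print(fit_within(400, 200, 300))    # only the width is over the limit
print(fit_within(200, 100, 300))    # small image is left alone
```

The max(1, ...) guard is an extra assumption of this sketch; it keeps a very long, thin image from ending up with a zero-pixel side.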
Step 4: Add the Logo and Save the Changes
Whether or not the image was resized, the logo should still be pasted to the bottom-right corner. Where exactly the logo should be pasted depends on both the size of the image and the size of the logo. Figure 17-12 shows how to calculate the pasting position. The left coordinate for where to paste the logo will be the image width minus the logo width; the top coordinate for where to paste the logo will be the image height minus the logo height.
Figure 17-12. The left and top coordinates for placing the logo in the bottom-right corner should be the image width/height minus the logo width/height.
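The subtraction described above can be written as a small function and tried with sample numbers. This is only an illustration: bottom_right_position is a hypothetical helper, and the 60×80 logo size is invented for the example.

```python
def bottom_right_position(image_size, logo_size):
    """Return the (left, top) paste position that puts the logo in the bottom-right corner."""
    image_width, image_height = image_size
    logo_width, logo_height = logo_size
    return image_width - logo_width, image_height - logo_height

# A 225x300 image with an assumed 60x80 logo.
print(bottom_right_position((225, 300), (60, 80)))
```

If the logo is larger than the image, one or both coordinates come out negative, which is a sign that the logo will not fit.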
After your code pastes the logo into the image, it should save the modified Image object. Add the following to your program:
  #! python3
  # resizeAndAddLogo.py - Resizes all images in current working directory to fit
  # in a 300x300 square, and adds catlogo.png to the lower-right corner.
  import os
  from PIL import Image
  --snip--
      # Check if image needs to be resized.
      --snip--
      # Add the logo.
➊     print('Adding logo to %s…' % (filename))
➋     im.paste(logoIm, (width - logoWidth, height - logoHeight), logoIm)
      # Save changes.
➌     im.save(os.path.join('withLogo', filename))
The new code prints a message telling the user that the logo is being added ➊, pastes logoIm onto im at the calculated coordinates ➋, and saves the changes to a filename in the withLogo directory ➌. When you run this program with the zophie.png file as the only image in the working directory, the output will look like this:
Resizing zophie.png…
Adding logo to zophie.png…
The image zophie.png will be changed to a 225×300-pixel image that looks like Figure 17-13. Remember that the paste() method will not preserve the logo’s transparent pixels unless you also pass logoIm as the third argument. This program can automatically resize and “logo-ify” hundreds of images in just a couple of minutes.
Figure 17-13. The image zophie.png resized and the logo added (left). If you forget the third argument, the transparent pixels in the logo will be copied as solid white pixels (right).
Ideas for Similar Programs
Being able to composite images or modify image sizes in a batch can be useful in many applications. You could write similar programs to do the following:
Add text or a website URL to images.
Add timestamps to images.
Copy or move images into different folders based on their sizes.
Add a mostly transparent watermark to an image to prevent others from copying it.
Drawing on Images
If you need to draw lines, rectangles, circles, or other simple shapes on an image, use Pillow’s ImageDraw module. Enter the following into the interactive shell:
>>> from PIL import Image, ImageDraw
>>> im = Image.new('RGBA', (200, 200), 'white')
>>> draw = ImageDraw.Draw(im)
First, we import Image and ImageDraw. Then we create a new image, in this case, a 200×200 white image, and store the Image object in im. We pass the Image object to the ImageDraw.Draw() function to receive an ImageDraw object. This object has several methods for drawing shapes and text onto an Image object. Store the ImageDraw object in a variable like draw so you can use it easily in the following example.
Drawing Shapes
The following ImageDraw methods draw various kinds of shapes on the image. The fill and outline parameters for these methods are optional and will default to white if left unspecified.
Points
The point(xy, fill) method draws individual pixels. The xy argument represents a list of the points you want to draw. The list can be a list of x- and y-coordinate tuples, such as [(x, y), (x, y), ...], or a list of x- and y-coordinates without tuples, such as [x1, y1, x2, y2, ...]. The fill argument is the color of the points and is either an RGBA tuple or a string of a color name, such as 'red'. The fill argument is optional.
Lines
The line(xy, fill, width) method draws a line or series of lines. xy is either a list of tuples, such as [(x, y), (x, y), ...], or a list of integers, such as [x1, y1, x2, y2, ...]. Each point is one of the connecting points on the lines you’re drawing. The optional fill argument is the color of the lines, as an RGBA tuple or color name. The optional width argument is the width of the lines and defaults to 1 if left unspecified.
Rectangles
The rectangle(xy, fill, outline) method draws a rectangle. The xy argument is a box tuple of the form (left, top, right, bottom). The left and top values specify the x- and y-coordinates of the upper-left corner of the rectangle, while right and bottom specify the lower-right corner. The optional fill argument is the color that will fill the inside of the rectangle. The optional outline argument is the color of the rectangle’s outline.
Ellipses
The ellipse(xy, fill, outline) method draws an ellipse. If the width and height of the ellipse are identical, this method will draw a circle. The xy argument is a box tuple (left, top, right, bottom) that represents a box that precisely contains the ellipse. The optional fill argument is the color of the inside of the ellipse, and the optional outline argument is the color of the ellipse’s outline.
Polygons
The polygon(xy, fill, outline) method draws an arbitrary polygon. The xy argument is a list of tuples, such as [(x, y), (x, y), ...], or integers, such as [x1, y1, x2, y2, ...], representing the connecting points of the polygon’s sides. The last pair of coordinates will be automatically connected to the first pair. The optional fill argument is the color of the inside of the polygon, and the optional outline argument is the color of the polygon’s outline.
Drawing Example
Enter the following into the interactive shell:
  >>> from PIL import Image, ImageDraw
  >>> im = Image.new('RGBA', (200, 200), 'white')
  >>> draw = ImageDraw.Draw(im)
➊ >>> draw.line([(0, 0), (199, 0), (199, 199), (0, 199), (0, 0)], fill='black')
➋ >>> draw.rectangle((20, 30, 60, 60), fill='blue')
➌ >>> draw.ellipse((120, 30, 160, 60), fill='red')
➍ >>> draw.polygon(((57, 87), (79, 62), (94, 85), (120, 90), (103, 113)),
          fill='brown')
➎ >>> for i in range(100, 200, 10):
          draw.line([(i, 0), (200, i - 100)], fill='green')
  >>> im.save('drawing.png')
After making an Image object for a 200×200 white image, passing it to ImageDraw.Draw() to get an ImageDraw object, and storing the ImageDraw object in draw, you can call drawing methods on draw. Here we make a thin, black outline at the edges of the image ➊, a blue rectangle with its top-left corner at (20, 30) and bottom-right corner at (60, 60) ➋, a red ellipse defined by a box from (120, 30) to (160, 60) ➌, a brown polygon with five points ➍, and a pattern of green lines drawn with a for loop ➎. The resulting drawing.png file will look like Figure 17-14.
Figure 17-14. The resulting drawing.png image
There are several other shape-drawing methods for ImageDraw objects. The full documentation is available at http://pillow.readthedocs.org/en/latest/reference/ImageDraw.html.
Drawing Text
The ImageDraw object also has a text() method for drawing text onto an image. The text() method takes four arguments: xy, text, fill, and font.
The xy argument is a two-integer tuple specifying the upper-left corner of the text box. The text argument is the string of text you want to write. The optional fill argument is the color of the text. The optional font argument is an ImageFont object, used to set the typeface and size of the text. This is described in more detail in the next section.
Since it’s often hard to know in advance what size a block of text will be in a given font, the ImageDraw module also offers a textsize() method. Its first argument is the string of text you want to measure, and its second argument is an optional ImageFont object. The textsize() method will then return a two-integer tuple of the width and height that the text in the given font would be if it were written onto the image. You can use this width and height to help you calculate exactly where you want to put the text on your image.
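As one example of using a measured width and height, the upper-left corner that centers a block of text can be computed with a little arithmetic. This sketch does not call Pillow at all: centered_xy is a hypothetical helper, and the (50, 10) text size is made up. In practice the size would come from textsize(), or from textbbox() in Pillow 10 and later, where textsize() is no longer available.

```python
def centered_xy(image_size, text_size):
    """Return the upper-left corner that centers a text box within an image."""
    image_width, image_height = image_size
    text_width, text_height = text_size
    left = (image_width - text_width) // 2
    top = (image_height - text_height) // 2
    return left, top

# A 200x200 image and an assumed 50x10 block of text.
print(centered_xy((200, 200), (50, 10)))
```

The returned tuple is the kind of value you would pass as the xy argument to text().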
The first three arguments for text() are straightforward. Before we use text() to draw text onto an image, let’s look at the optional fourth argument, the ImageFont object.
Both text() and textsize() take an optional ImageFont object as their final arguments. To create one of these objects, first run the following:
>>> from PIL import ImageFont
Now that you’ve imported Pillow’s ImageFont module, you can call the ImageFont.truetype() function, which takes two arguments. The first argument is a string for the font’s TrueType file; this is the actual font file that lives on your hard drive. A TrueType file has the .ttf file extension and can usually be found in the following folders:
On Windows: C:\Windows\Fonts
On OS X: /Library/Fonts and /System/Library/Fonts
On Linux: /usr/share/fonts/truetype
You don’t actually need to enter these paths as part of the TrueType file string because Pillow knows to automatically search for fonts in these directories. But Python will display an error if it is unable to find the font you specified.
The second argument to ImageFont.truetype() is an integer for the font size in points (rather than, say, pixels). Keep in mind that Pillow creates PNG images that are 72 pixels per inch by default, and a point is 1/72 of an inch.
Enter the following into the interactive shell, replacing FONT_FOLDER with the actual folder name your operating system uses:
  >>> from PIL import Image, ImageDraw, ImageFont
  >>> import os
➊ >>> im = Image.new('RGBA', (200, 200), 'white')
➋ >>> draw = ImageDraw.Draw(im)
➌ >>> draw.text((20, 150), 'Hello', fill='purple')
  >>> fontsFolder = 'FONT_FOLDER' # e.g. '/Library/Fonts'
➍ >>> arialFont = ImageFont.truetype(os.path.join(fontsFolder, 'arial.ttf'), 32)
➎ >>> draw.text((100, 150), 'Howdy', fill='gray', font=arialFont)
  >>> im.save('text.png')
After importing Image, ImageDraw, ImageFont, and os, we make an Image object for a new 200×200 white image ➊ and make an ImageDraw object from the Image object ➋. We use text() to draw Hello at (20, 150) in purple ➌. We didn’t pass the optional fourth argument in this text() call, so the typeface and size of this text aren’t customized.
To set a typeface and size, we first store the folder name (like /Library/Fonts) in fontsFolder. Then we call ImageFont.truetype(), passing it the .ttf file for the font we want, followed by an integer font size ➍. Store the Font object you get from ImageFont.truetype() in a variable like arialFont, and then pass the variable to text() in the final keyword argument. The text() call at ➎ draws Howdy at (100, 150) in gray in 32-point Arial.
The resulting text.png file will look like Figure 17-15.
Figure 17-15. The resulting text.png image
Summary
Images consist of a collection of pixels, and each pixel has an RGBA value for its color and is addressable by x- and y-coordinates. Two common image formats are JPEG and PNG. The pillow module can handle both of these image formats and others.
When an image is loaded into an Image object, its width and height dimensions are stored as a two-integer tuple in the size attribute. Objects of the Image data type also have methods for common image manipulations: crop(), copy(), paste(), resize(), rotate(), and transpose(). To save the Image object to an image file, call the save() method.
If you want your program to draw shapes onto an image, use ImageDraw methods to draw points, lines, rectangles, ellipses, and polygons. The module also provides methods for drawing text in a typeface and font size of your choosing.
Although advanced (and expensive) applications such as Photoshop provide automatic batch processing features, you can use Python scripts to do many of the same modifications for free. In the previous chapters, you wrote Python programs to deal with plaintext files, spreadsheets, PDFs, and other formats. With the pillow module, you’ve extended your programming powers to processing images as well!
Practice Questions
Q: 1. What is an RGBA value?
Q: 2. How can you get the RGBA value of 'CornflowerBlue' from the Pillow module?
Q: 3. What is a box tuple?
Q: 4. What function returns an Image object for, say, an image file named zophie.png?
Q: 5. How can you find out the width and height of an Image object’s image?
Q: 6. What method would you call to get an Image object for a 100×100 image, excluding the lower-left quarter of it?
Q: 7. After making changes to an Image object, how could you save it as an image file?
Q: 8. What module contains Pillow’s shape-drawing code?
Q: 9. Image objects do not have drawing methods. What kind of object does? How do you get this kind of object?
Practice Projects
For practice, write programs that do the following.
Extending and Fixing the Chapter Project Programs
The resizeAndAddLogo.py program in this chapter works with PNG and JPEG files, but Pillow supports many more formats than just these two. Extend resizeAndAddLogo.py to process GIF and BMP images as well.
Another small issue is that the program modifies PNG and JPEG files only if their file extensions are set in lowercase. For example, it will process zophie.png but not zophie.PNG. Change the code so that the file extension check is case insensitive.
Figure 17-16. When the image isn’t much larger than the logo, the results look ugly.
Finally, the logo added to the bottom-right corner is meant to be just a small mark, but if the image is about the same size as the logo itself, the result will look like Figure 17-16. Modify resizeAndAddLogo.py so that the image must be at least twice the width and height of the logo image before the logo is pasted. Otherwise, it should skip adding the logo.
Identifying Photo Folders on the Hard Drive
I have a bad habit of transferring files from my digital camera to temporary folders somewhere on the hard drive and then forgetting about these folders. It would be nice to write a program that could scan the entire hard drive and find these leftover “photo folders.”
Write a program that goes through every folder on your hard drive and finds potential photo folders. Of course, first you’ll have to define what you consider a “photo folder” to be; let’s say that it’s any folder where more than half of the files are photos. And how do you define what files are photos?
First, a photo file must have the file extension .png or .jpg. Also, photos are large images; a photo file’s width and height must both be larger than 500 pixels. This is a safe bet, since most digital camera photos are several thousand pixels in width and height.
As a hint, here’s a rough skeleton of what this program might look like:
#! python3
# Import modules and write comments to describe this program.
for foldername, subfolders, filenames in os.walk('C:\\'):
    numPhotoFiles = 0
    numNonPhotoFiles = 0
    for filename in filenames:
        # Check if file extension isn't .png or .jpg.
        if TODO:
            numNonPhotoFiles += 1
            continue # skip to next filename
        # Open image file using Pillow.
        # Check if width & height are larger than 500.
        if TODO:
            # Image is large enough to be considered a photo.
            numPhotoFiles += 1
        else:
            # Image is too small to be a photo.
            numNonPhotoFiles += 1
    # If more than half of files were photos,
    # print the absolute path of the folder.
    if TODO:
        print(TODO)
When the program runs, it should print the absolute path of any photo folders to the screen.
Custom Seating Cards
Chapter 13 included a practice project to create custom invitations from a list of guests in a plaintext file. As an additional project, use the pillow module to create images for custom seating cards for your guests. For each of the guests listed in the guests.txt file from the resources at http://nostarch.com/automatestuff/, generate an image file with the guest name and some flowery decoration. A public domain flower image is available in the resources at http://nostarch.com/automatestuff/.
To ensure that each seating card is the same size, add a black rectangle on the edges of the invitation image so that when the image is printed out, there will be a guideline for cutting. The PNG files that Pillow produces are set to 72 pixels per inch, so a 4×5-inch card would require a 288×360-pixel image.
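The card-size arithmetic generalizes to any print size. The following sketch assumes the 72 pixels per inch stated above as its default; inches_to_pixels is a hypothetical helper name, not part of Pillow.

```python
def inches_to_pixels(width_in, height_in, pixels_per_inch=72):
    """Convert a print size in inches to image dimensions in pixels."""
    return round(width_in * pixels_per_inch), round(height_in * pixels_per_inch)

print(inches_to_pixels(4, 5))        # the 4x5-inch seating card at 72 ppi
print(inches_to_pixels(4, 5, 300))   # the same card at an assumed 300 ppi
```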
Chapter 18. Controlling the Keyboard and Mouse with GUI Automation
Knowing various Python modules for editing spreadsheets, downloading files, and launching programs is useful, but sometimes there just aren’t any modules for the applications you need to work with. The ultimate tools for automating tasks on your computer are programs you write that directly control the keyboard and mouse. These programs can control other applications by sending them virtual keystrokes and mouse clicks, just as if you were sitting at your computer and interacting with the applications yourself. This technique is known as graphical user interface automation, or GUI automation for short. With GUI automation, your programs can do anything that a human user sitting at the computer can do, except spill coffee on the keyboard.
Think of GUI automation as programming a robotic arm. You can program the robotic arm to type at your keyboard and move your mouse for you. This technique is particularly useful for tasks that involve a lot of mindless clicking or filling out of forms.
The pyautogui module has functions for simulating mouse movements, button clicks, and scrolling the mouse wheel. This chapter covers only a subset of PyAutoGUI’s features; you can find the full documentation at http://pyautogui.readthedocs.org/.
Installing the pyautogui Module
The pyautogui module can send virtual keypresses and mouse clicks to Windows, OS X, and Linux. Depending on which operating system you’re using, you may have to install some other modules (called dependencies) before you can install PyAutoGUI.
On Windows, there are no other modules to install.
On OS X, run sudo pip3 install pyobjc-framework-Quartz, sudo pip3 install pyobjc-core, and then sudo pip3 install pyobjc.
On Linux, run sudo pip3 install python3-xlib and sudo apt-get install scrot. (Scrot is a screenshot program that PyAutoGUI uses.)
After these dependencies are installed, run pip install pyautogui (or pip3 on OS X and Linux) to install PyAutoGUI.
Appendix A has complete information on installing third-party modules. To test whether PyAutoGUI has been installed correctly, run import pyautogui from the interactive shell and check for any error messages.
Staying on Track
Before you jump into a GUI automation, you should know how to escape problems that may arise. Python can move your mouse and type keystrokes at an incredible speed. In fact, it might be too fast for other programs to keep up with. Also, if something goes wrong but your program keeps moving the mouse around, it will be hard to tell what exactly the program is doing or how to recover from the problem. Like the enchanted brooms from Disney’s The Sorcerer’s Apprentice, which kept filling (and then overfilling) Mickey’s tub with water, your program could get out of control even though it’s following your instructions perfectly. Stopping the program can be difficult if the mouse is moving around on its own, preventing you from clicking the IDLE window to close it. Fortunately, there are several ways to prevent or recover from GUI automation problems.
Shutting Down Everything by Logging Out
Perhaps the simplest way to stop an out-of-control GUI automation program is to log out, which will shut down all running programs. On Windows and Linux, the logout hotkey is CTRL-ALT-DEL. On OS X, it is ⌘-SHIFT-OPTION-Q. By logging out, you’ll lose any unsaved work, but at least you won’t have to wait for a full reboot of the computer.
Pauses and Fail-Safes
You can tell your script to wait after every function call, giving you a short window to take control of the mouse and keyboard if something goes wrong. To do this, set the pyautogui.PAUSE variable to the number of seconds you want it to pause. For example, after setting pyautogui.PAUSE = 1.5, every PyAutoGUI function call will wait one and a half seconds after performing its action. Non-PyAutoGUI instructions will not have this pause.
PyAutoGUI also has a fail-safe feature. Moving the mouse cursor to the upper-left corner of the screen will cause PyAutoGUI to raise the pyautogui.FailSafeException exception. Your program can either handle this exception with try and except statements or let the exception crash your program. Either way, the fail-safe feature will stop the program if you quickly move the mouse as far up and left as you can. You can disable this feature by setting pyautogui.FAILSAFE = False. Enter the following into the interactive shell:
>>> import pyautogui
>>> pyautogui.PAUSE = 1
>>> pyautogui.FAILSAFE = True
Here we import pyautogui and set pyautogui.PAUSE to 1 for a one-second pause after each function call. We set pyautogui.FAILSAFE to True to enable the fail-safe feature.
Controlling Mouse Movement
In this section, you’ll learn how to move the mouse and track its position on the screen using PyAutoGUI, but first you need to understand how PyAutoGUI works with coordinates.
The mouse functions of PyAutoGUI use x- and y-coordinates. Figure 18-1 shows the coordinate system for the computer screen; it’s similar to the coordinate system used for images, discussed in Chapter 17. The origin, where x and y are both zero, is at the upper-left corner of the screen. The x-coordinates increase going to the right, and the y-coordinates increase going down. All coordinates are positive integers; there are no negative coordinates.
Figure 18-1. The coordinates of a computer screen with 1920×1080 resolution
Your resolution is how many pixels wide and tall your screen is. If your screen’s resolution is set to 1920×1080, then the coordinate for the upper-left corner will be (0, 0), and the coordinate for the bottom-right corner will be (1919, 1079).
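Because coordinates start at zero, the last valid pixel is always one less than the resolution in each direction. The sketch below illustrates this without calling PyAutoGUI; bottom_right_corner and is_on_screen are hypothetical helper names, and the width and height passed in are the kind of values a screen-size query would return.

```python
def bottom_right_corner(width, height):
    """Return the coordinate of the bottom-right pixel of a width x height screen."""
    return width - 1, height - 1

def is_on_screen(x, y, width, height):
    """Return True if (x, y) is a valid pixel coordinate on the screen."""
    return 0 <= x < width and 0 <= y < height

print(bottom_right_corner(1920, 1080))
print(is_on_screen(1919, 1079, 1920, 1080))   # last pixel is on the screen
print(is_on_screen(1920, 1080, 1920, 1080))   # one past the edge is not
```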
The pyautogui.size() function returns a two-integer tuple of the screen’s width and height in pixels. Enter the following into the interactive shell:
>>> import pyautogui
>>> pyautogui.size()
(1920, 1080)
>>> width, height = pyautogui.size()
pyautogui.size() returns (1920, 1080) on a computer with a 1920×1080 resolution; depending on your screen’s resolution, your return value may be different. You can store the width and height from pyautogui.size() in variables like width and height for better readability in your programs.
Moving the Mouse
Now that you understand screen coordinates, let’s move the mouse. The pyautogui.moveTo() function will instantly move the mouse cursor to a specified position on the screen. Integer values for the x- and y-coordinates make up the function’s first and second arguments, respectively. An optional duration integer or float keyword argument specifies the number of seconds it should take to move the mouse to the destination. If you leave it out, the default is 0 for instantaneous movement. (All of the duration keyword arguments in PyAutoGUI functions are optional.) Enter the following into the interactive shell:
>>> import pyautogui
>>> for i in range(10):
        pyautogui.moveTo(100, 100, duration=0.25)
        pyautogui.moveTo(200, 100, duration=0.25)
        pyautogui.moveTo(200, 200, duration=0.25)
        pyautogui.moveTo(100, 200, duration=0.25)
This example moves the mouse cursor clockwise in a square pattern among the four coordinates provided a total of ten times. Each movement takes a quarter of a second, as specified by the duration=0.25 keyword argument. If you hadn’t passed a third argument to any of the pyautogui.moveTo() calls, the mouse cursor would have instantly teleported from point to point.
Thepyautogui.moveRel()functionmovesthemousecursorrelativetoitscurrentposition.Thefollowingexamplemovesthemouseinthesamesquarepattern,exceptitbeginsthesquarefromwhereverthemousehappenstobeonthescreenwhenthecodestartsrunning:
>>>importpyautogui
>>>foriinrange(10):
pyautogui.moveRel(100,0,duration=0.25)
pyautogui.moveRel(0,100,duration=0.25)
pyautogui.moveRel(-100,0,duration=0.25)
pyautogui.moveRel(0,-100,duration=0.25)
pyautogui.moveRel()alsotakesthreearguments:howmanypixelstomovehorizontallytotheright,howmanypixelstomoveverticallydownward,and(optionally)howlongitshouldtaketocompletethemovement.Anegativeintegerforthefirstorsecondargumentwillcausethemousetomoveleftorupward,respectively.
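The relative offsets in the square example can also be written down as data. This is only a sketch: square_offsets() is a hypothetical helper, not a PyAutoGUI function. It shows that the four offsets cancel out, which is why the cursor finishes each lap where it began.

```python
def square_offsets(side):
    """Return the four (x_offset, y_offset) moves that trace a square clockwise."""
    return [
        (side, 0),    # right
        (0, side),    # down (y grows downward on the screen)
        (-side, 0),   # left
        (0, -side),   # up
    ]


offsets = square_offsets(100)
total_x = sum(dx for dx, dy in offsets)
total_y = sum(dy for dx, dy in offsets)
print(offsets)
print(total_x, total_y)  # both totals are zero, so the cursor returns to its start
```

In a real script you could loop over the list and call pyautogui.moveRel(dx, dy, duration=0.25) for each pair.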
Getting the Mouse Position
You can determine the mouse’s current position by calling the pyautogui.position() function, which will return a tuple of the mouse cursor’s x and y positions at the time of the function call. Enter the following into the interactive shell, moving the mouse around after each call:
>>> pyautogui.position()
(311, 622)
>>> pyautogui.position()
(377, 481)
>>> pyautogui.position()
(1536, 637)
Of course, your return values will vary depending on where your mouse cursor is.
Project: “Where Is the Mouse Right Now?”
Being able to determine the mouse position is an important part of setting up your GUI automation scripts. But it’s almost impossible to figure out the exact coordinates of a pixel just by looking at the screen. It would be handy to have a program that constantly displays the x- and y-coordinates of the mouse cursor as you move it around.
At a high level, here’s what your program should do:
Display the current x- and y-coordinates of the mouse cursor.
Update these coordinates as the mouse moves around the screen.
This means your code will need to do the following:
Call the position() function to fetch the current coordinates.
Erase the previously printed coordinates by printing \b backspace characters to the screen.
Handle the KeyboardInterrupt exception so the user can press CTRL-C to quit.
Open a new file editor window and save it as mouseNow.py.
Step 1: Import the Module
Start your program with the following:
#! python3
# mouseNow.py - Displays the mouse cursor's current position.
import pyautogui
print('Press Ctrl-C to quit.')
# TODO: Get and print the mouse coordinates.
The beginning of the program imports the pyautogui module and prints a reminder to the user that they have to press CTRL-C to quit.
Step 2: Set Up the Quit Code and Infinite Loop
You can use an infinite while loop to constantly print the current mouse coordinates from pyautogui.position(). As for the code that quits the program, you’ll need to catch the KeyboardInterrupt exception, which is raised whenever the user presses CTRL-C. If you don’t handle this exception, it will display an ugly traceback and error message to the user. Add the following to your program:
  #! python3
  # mouseNow.py - Displays the mouse cursor's current position.
  import pyautogui
  print('Press Ctrl-C to quit.')
  try:
      while True:
          # TODO: Get and print the mouse coordinates.
➊ except KeyboardInterrupt:
➋     print('\nDone.')
To handle the exception, enclose the infinite while loop in a try statement. When the user presses CTRL-C, the program execution will move to the except clause ➊ and Done. will be printed in a new line ➋.
Step 3: Get and Print the Mouse Coordinates
The code inside the while loop should get the current mouse coordinates, format them to look nice, and print them. Add the following code to the inside of the while loop:
#! python3
# mouseNow.py - Displays the mouse cursor's current position.
import pyautogui
print('Press Ctrl-C to quit.')
--snip--
        # Get and print the mouse coordinates.
        x, y = pyautogui.position()
        positionStr = 'X: ' + str(x).rjust(4) + ' Y: ' + str(y).rjust(4)
--snip--
Using the multiple assignment trick, the x and y variables are given the values of the two integers returned in the tuple from pyautogui.position(). By passing x and y to the str() function, you can get string forms of the integer coordinates. The rjust() string method will right-justify them so that they take up the same amount of space, whether the coordinate has one, two, three, or four digits. Concatenating the right-justified string coordinates with 'X: ' and ' Y: ' labels gives us a neatly formatted string, which will be stored in positionStr.
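To see the effect of rjust(4) without moving the mouse, you can wrap the same expression in a small function and feed it a few sample coordinates. The function name format_position() is just for this illustration; the sample coordinates are arbitrary.

```python
def format_position(x, y):
    """Build the fixed-width 'X: ... Y: ...' label used by mouseNow.py."""
    return 'X: ' + str(x).rjust(4) + ' Y: ' + str(y).rjust(4)


# Coordinates with one to four digits all produce strings of the same length.
for x, y in [(7, 42), (290, 424), (1536, 637)]:
    label = format_position(x, y)
    print(repr(label), len(label))
```

Because every label has the same length, erasing and reprinting it later always covers the previous text completely.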
At the end of the while loop’s body, right after the positionStr assignment, add the following code:
  #! python3
  # mouseNow.py - Displays the mouse cursor's current position.
  --snip--
          print(positionStr, end='')
➊         print('\b' * len(positionStr), end='', flush=True)
This actually prints positionStr to the screen. The end='' keyword argument to print() prevents the default newline character from being added to the end of the printed line. It’s possible to erase text you’ve already printed to the screen, but only for the most recent line of text. Once you print a newline character, you can’t erase anything printed before it.
To erase text, print the \b backspace escape character. This special character moves the cursor back one character on the current line, so whatever is printed next overwrites the old text. The line at ➊ uses string replication to produce a string with as many \b characters as the length of the string stored in positionStr, which has the effect of erasing the positionStr string that was last printed.
For a technical reason beyond the scope of this book, always pass flush=True to print() calls that print \b backspace characters. Otherwise, the screen might not update the text as desired.
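If you are curious what those two print() calls actually send to the screen, you can capture the characters instead of displaying them. This is a sketch under one assumption: redraw() is a helper invented for this example, and the io.StringIO buffer stands in for the Terminal window.

```python
import io
import sys


def redraw(text, stream=sys.stdout):
    """Print text, then back up over it so the next call overwrites it."""
    print(text, end='', file=stream)
    print('\b' * len(text), end='', file=stream, flush=True)


# Capture the raw characters in a buffer to inspect what was written.
buffer = io.StringIO()
redraw('X:  290 Y:  424', stream=buffer)
print(repr(buffer.getvalue()))
```

The captured string is the label followed by one \b for each of its characters, which is exactly what the loop in mouseNow.py prints on every pass.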
Since the while loop repeats so quickly, the user won’t actually notice that you’re deleting and reprinting the whole number on the screen. For example, if the x-coordinate is 563 and the mouse moves one pixel to the right, it will look like only the 3 in 563 is changed to a 4.
When you run the program, there will be only two lines printed. They should look something like this:
Press Ctrl-C to quit.
X:  290 Y:  424
The first line displays the instruction to press CTRL-C to quit. The second line with the mouse coordinates will change as you move the mouse around the screen. Using this program, you’ll be able to figure out the mouse coordinates for your GUI automation scripts.
Controlling Mouse Interaction
Now that you know how to move the mouse and figure out where it is on the screen, you’re ready to start clicking, dragging, and scrolling.
Clicking the Mouse
To send a virtual mouse click to your computer, call the pyautogui.click() function. By default, this click uses the left mouse button and takes place wherever the mouse cursor is currently located. You can pass x- and y-coordinates of the click as optional first and second arguments if you want it to take place somewhere other than the mouse’s current position.
If you want to specify which mouse button to use, include the button keyword argument, with a value of 'left', 'middle', or 'right'. For example, pyautogui.click(100, 150, button='left') will click the left mouse button at the coordinates (100, 150), while pyautogui.click(200, 250, button='right') will perform a right-click at (200, 250).
Enter the following into the interactive shell:
>>> import pyautogui
>>> pyautogui.click(10, 5)
You should see the mouse pointer move to near the top-left corner of your screen and click once. A full “click” is defined as pushing a mouse button down and then releasing it back up without moving the cursor. You can also perform a click by calling pyautogui.mouseDown(), which only pushes the mouse button down, and pyautogui.mouseUp(), which only releases the button. These functions have the same arguments as click(), and in fact, the click() function is just a convenient wrapper around these two function calls.
As a further convenience, the pyautogui.doubleClick() function will perform two clicks with the left mouse button, while the pyautogui.rightClick() and pyautogui.middleClick() functions will perform a click with the right and middle mouse buttons, respectively.
Dragging the Mouse
Dragging means moving the mouse while holding down one of the mouse buttons. For example, you can move files between folders by dragging the folder icons, or you can move appointments around in a calendar app.
PyAutoGUI provides the pyautogui.dragTo() and pyautogui.dragRel() functions to drag the mouse cursor to a new location or a location relative to its current one. The arguments for dragTo() and dragRel() are the same as moveTo() and moveRel(): the x-coordinate/horizontal movement, the y-coordinate/vertical movement, and an optional duration of time. (OS X does not drag correctly when the mouse moves too quickly, so passing a duration keyword argument is recommended.)
To try these functions, open a graphics-drawing application such as Paint on Windows, Paintbrush on OS X, or GNU Paint on Linux. (If you don’t have a drawing application, you can use the online one at http://sumopaint.com/.) I will use PyAutoGUI to draw in these applications.
With the mouse cursor over the drawing application’s canvas and the Pencil or Brush tool selected, enter the following into a new file editor window and save it as spiralDraw.py:
  import pyautogui, time
➊ time.sleep(5)
➋ pyautogui.click()    # click to put drawing program in focus
  distance = 200
  while distance > 0:
➌     pyautogui.dragRel(distance, 0, duration=0.2)   # move right
➍     distance = distance - 5
➎     pyautogui.dragRel(0, distance, duration=0.2)   # move down
➏     pyautogui.dragRel(-distance, 0, duration=0.2)  # move left
      distance = distance - 5
      pyautogui.dragRel(0, -distance, duration=0.2)  # move up
When you run this program, there will be a five-second delay ➊ for you to move the mouse cursor over the drawing program’s window with the Pencil or Brush tool selected. Then spiralDraw.py will take control of the mouse and click to put the drawing program in focus ➋. A window is in focus when it has an active blinking cursor, and the actions you take (like typing or, in this case, dragging the mouse) will affect that window. Once the drawing program is in focus, spiralDraw.py draws a square spiral pattern like the one in Figure 18-2.
Figure 18-2. The results from the pyautogui.dragRel() example
The distance variable starts at 200, so on the first iteration of the while loop, the first dragRel() call drags the cursor 200 pixels to the right, taking 0.2 seconds ➌. distance is then decreased to 195 ➍, and the second dragRel() call drags the cursor 195 pixels down ➎. The third dragRel() call drags the cursor -195 horizontally (195 to the left) ➏, distance is decreased to 190, and the last dragRel() call drags the cursor 190 pixels up. On each iteration, the mouse is dragged right, down, left, and up, and distance is slightly smaller than it was in the previous iteration. By looping over this code, you can move the mouse cursor to draw a square spiral.
You could draw this spiral by hand (or rather, by mouse), but you’d have to work slowly to be so precise. PyAutoGUI can do it in a few seconds!
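If you want to check the spiral’s geometry before letting a script take over your mouse, you can generate the offsets without dragging anything. The generator below is a sketch that mirrors the loop in spiralDraw.py; spiral_segments() is a name made up for this example.

```python
def spiral_segments(start=200, step=5):
    """Yield the (x_offset, y_offset) of each drag in the square spiral."""
    distance = start
    while distance > 0:
        yield (distance, 0)        # right
        distance = distance - step
        yield (0, distance)        # down
        yield (-distance, 0)       # left
        distance = distance - step
        yield (0, -distance)       # up


segments = list(spiral_segments())
print(segments[:4])   # the four drags of the first iteration
print(len(segments))  # total number of drags the loop would perform
```

Each tuple corresponds to one pyautogui.dragRel() call, so the first four values match the walkthrough above.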
NOTE
You could have your code draw the image using the pillow module’s drawing functions; see Chapter 17 for more information. But using GUI automation allows you to make use of the advanced drawing tools that graphics programs can provide, such as gradients, different brushes, or the fill bucket.
Scrolling the Mouse
The final PyAutoGUI mouse function is scroll(), which you pass an integer argument for how many units you want to scroll the mouse up or down. The size of a unit varies for each operating system and application, so you’ll have to experiment to see exactly how far it scrolls in your particular situation. The scrolling takes place at the mouse cursor’s current position. Passing a positive integer scrolls up, and passing a negative integer scrolls down. Run the following in IDLE’s interactive shell while the mouse cursor is over the IDLE window:
>>> pyautogui.scroll(200)
You’ll see IDLE briefly scroll upward, and then go back down. The downward scrolling happens because IDLE automatically scrolls down to the bottom after executing an instruction. Enter this code instead:
>>> import pyperclip
>>> numbers = ''
>>> for i in range(200):
      numbers = numbers + str(i) + '\n'
>>> pyperclip.copy(numbers)
This imports pyperclip and sets up an empty string, numbers. The code then loops through 200 numbers and adds each number to numbers, along with a newline. After pyperclip.copy(numbers), the clipboard will be loaded with 200 lines of numbers. Open a new file editor window and paste the text into it. This will give you a large text window to try scrolling in. Enter the following code into the interactive shell:
>>> import time, pyautogui
>>> time.sleep(5); pyautogui.scroll(100)
On the second line, you enter two commands separated by a semicolon, which tells Python to run the commands as if they were on separate lines. The only difference is that the interactive shell won’t prompt you for input between the two instructions. This is important for this example because we want the call to pyautogui.scroll() to happen automatically after the wait. (Note that while putting two commands on one line can be useful in the interactive shell, you should still have each instruction on a separate line in your programs.)
After pressing ENTER to run the code, you will have five seconds to click the file editor window to put it in focus. Once the pause is over, the pyautogui.scroll() call will cause the file editor window to scroll up.
Working with the Screen
Your GUI automation programs don’t have to click and type blindly. PyAutoGUI has screenshot features that can create an image file based on the current contents of the screen. These functions can also return a Pillow Image object of the current screen’s appearance. If you’ve been skipping around in this book, you’ll want to read Chapter 17 and install the pillow module before continuing with this section.
On Linux computers, the scrot program needs to be installed to use the screenshot functions in PyAutoGUI. In a Terminal window, run sudo apt-get install scrot to install this program. If you’re on Windows or OS X, skip this step and continue with the section.
Getting a Screenshot
To take screenshots in Python, call the pyautogui.screenshot() function. Enter the following into the interactive shell:
>>> import pyautogui
>>> im = pyautogui.screenshot()
The im variable will contain the Image object of the screenshot. You can now call methods on the Image object in the im variable, just like any other Image object. Enter the following into the interactive shell:
>>> im.getpixel((0, 0))
(176, 176, 175)
>>> im.getpixel((50, 200))
(130, 135, 144)
Pass getpixel() a tuple of coordinates, like (0, 0) or (50, 200), and it’ll tell you the color of the pixel at those coordinates in your image. The return value from getpixel() is an RGB tuple of three integers for the amount of red, green, and blue in the pixel. (There is no fourth value for alpha, because screenshot images are fully opaque.) This is how your programs can “see” what is currently on the screen.
Analyzing the Screenshot
Say that one of the steps in your GUI automation program is to click a gray button. Before calling the click() function, you could take a screenshot and look at the pixel where the script is about to click. If it’s not the same gray as the gray button, then your program knows something is wrong. Maybe the window moved unexpectedly, or maybe a pop-up dialog has blocked the button. At this point, instead of continuing (and possibly wreaking havoc by clicking the wrong thing), your program can “see” that it isn’t clicking on the right thing and stop itself.
PyAutoGUI’s pixelMatchesColor() function will return True if the pixel at the given x- and y-coordinates on the screen matches the given color. The first and second arguments are integers for the x- and y-coordinates, and the third argument is a tuple of three integers for the RGB color the screen pixel must match. Enter the following into the interactive shell:
  >>> import pyautogui
  >>> im = pyautogui.screenshot()
➊ >>> im.getpixel((50, 200))
  (130, 135, 144)
➋ >>> pyautogui.pixelMatchesColor(50, 200, (130, 135, 144))
  True
➌ >>> pyautogui.pixelMatchesColor(50, 200, (255, 135, 144))
  False
After taking a screenshot and using getpixel() to get an RGB tuple for the color of a pixel at specific coordinates ➊, pass the same coordinates and RGB tuple to pixelMatchesColor() ➋, which should return True. Then change a value in the RGB tuple and call pixelMatchesColor() again for the same coordinates ➌. This should return False. This function can be useful to call whenever your GUI automation programs are about to call click(). Note that the color at the given coordinates must exactly match. If it is even slightly different (for example, (255, 255, 254) instead of (255, 255, 255)), then pixelMatchesColor() will return False.
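If an exact match turns out to be too strict for your situation, one option is to compare the RGB tuple from getpixel() yourself and allow a small difference per channel. The helper below is a sketch of that idea; colors_match() and its tolerance parameter are names made up for this example and are not described in this chapter as part of PyAutoGUI.

```python
def colors_match(actual, expected, tolerance=0):
    """Return True if each RGB channel differs by at most `tolerance`."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


# With the default tolerance of 0, the comparison is exact.
print(colors_match((255, 255, 254), (255, 255, 255)))
# Allowing a difference of 1 per channel accepts the near-white pixel.
print(colors_match((255, 255, 254), (255, 255, 255), tolerance=1))
```

In a script, actual would come from a call such as pyautogui.screenshot().getpixel((x, y)).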
Project: Extending the mouseNow Program
You could extend the mouseNow.py project from earlier in this chapter so that it not only gives the x- and y-coordinates of the mouse cursor’s current position but also gives the RGB color of the pixel under the cursor. Modify the code inside the while loop of mouseNow.py to look like this:
#! python3
# mouseNow.py - Displays the mouse cursor's current position.
--snip--
        positionStr = 'X: ' + str(x).rjust(4) + ' Y: ' + str(y).rjust(4)
        pixelColor = pyautogui.screenshot().getpixel((x, y))
        positionStr += ' RGB: (' + str(pixelColor[0]).rjust(3)
        positionStr += ', ' + str(pixelColor[1]).rjust(3)
        positionStr += ', ' + str(pixelColor[2]).rjust(3) + ')'
        print(positionStr, end='')
--snip--
Now, when you run mouseNow.py, the output will include the RGB color value of the pixel under the mouse cursor.
Press Ctrl-C to quit.
X:  406 Y:   17 RGB: (161,  50,  50)
This information, along with the pixelMatchesColor() function, should make it easy to add pixel color checks to your GUI automation scripts.
Image Recognition
But what if you do not know beforehand where PyAutoGUI should click? You can use image recognition instead. Give PyAutoGUI an image of what you want to click and let it figure out the coordinates.
For example, if you have previously taken a screenshot to capture the image of a Submit button in submit.png, the locateOnScreen() function will return the coordinates where that image is found. To see how locateOnScreen() works, try taking a screenshot of a small area on your screen; then save the image and enter the following into the interactive shell, replacing 'submit.png' with the filename of your screenshot:
>>> import pyautogui
>>> pyautogui.locateOnScreen('submit.png')
(643, 745, 70, 29)
The four-integer tuple that locateOnScreen() returns has the x-coordinate of the left edge, the y-coordinate of the top edge, the width, and the height for the first place on the screen the image was found. If you’re trying this on your computer with your own screenshot, your return value will be different from the one shown here.
If the image cannot be found on the screen, locateOnScreen() will return None. Note that the image on the screen must match the provided image perfectly in order to be recognized. If the image is even a pixel off, locateOnScreen() will return None.
If the image can be found in several places on the screen, locateAllOnScreen() will return a Generator object, which can be passed to list() to return a list of four-integer tuples. There will be one four-integer tuple for each location where the image is found on the screen. Continue the interactive shell example by entering the following (and replacing 'submit.png' with your own image filename):
>>> list(pyautogui.locateAllOnScreen('submit.png'))
[(643, 745, 70, 29), (1007, 801, 70, 29)]
Each of the four-integer tuples represents an area on the screen. If your image is only found in one area, then using list() and locateAllOnScreen() just returns a list containing one tuple.
Once you have the four-integer tuple for the area on the screen where your image was found, you can click the center of this area by passing the tuple to the center() function to return x- and y-coordinates of the area’s center. Enter the following into the interactive shell, replacing the arguments with your own filename, four-integer tuple, and coordinate pair:
>>> pyautogui.locateOnScreen('submit.png')
(643, 745, 70, 29)
>>> pyautogui.center((643, 745, 70, 29))
(678, 759)
>>> pyautogui.click((678, 759))
Once you have center coordinates from center(), passing the coordinates to click() should click the center of the area on the screen that matches the image you passed to locateOnScreen().
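The arithmetic behind that center point is easy to reproduce by hand. The sketch below assumes integer division, which gives the same (678, 759) result shown in the example; box_center() is an illustrative stand-in, and PyAutoGUI’s own center() may handle rounding differently.

```python
def box_center(box):
    """Return the center (x, y) of a (left, top, width, height) tuple."""
    left, top, width, height = box
    return (left + width // 2, top + height // 2)


print(box_center((643, 745, 70, 29)))
```

Here 643 + 70 // 2 is 678 and 745 + 29 // 2 is 759, matching the coordinates passed to click() above.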
Controlling the Keyboard
PyAutoGUI also has functions for sending virtual keypresses to your computer, which enables you to fill out forms or enter text into applications.
Sending a String from the Keyboard
The pyautogui.typewrite() function sends virtual keypresses to the computer. What these keypresses do depends on what window and text field have focus. You may want to first send a mouse click to the text field you want in order to ensure that it has focus.
As a simple example, let’s use Python to automatically type the words Hello world! into a file editor window. First, open a new file editor window and position it in the upper-left corner of your screen so that PyAutoGUI will click in the right place to bring it into focus. Next, enter the following into the interactive shell:
>>> pyautogui.click(100, 100); pyautogui.typewrite('Hello world!')
Notice how placing two commands on the same line, separated by a semicolon, keeps the interactive shell from prompting you for input between running the two instructions. This prevents you from accidentally bringing a new window into focus between the click() and typewrite() calls, which would mess up the example.
Python will first send a virtual mouse click to the coordinates (100, 100), which should click the file editor window and put it in focus. The typewrite() call will send the text Hello world! to the window, making it look like Figure 18-3. You now have code that can type for you!
Figure 18-3. Using PyAutoGUI to click the file editor window and type Hello world! into it
By default, the typewrite() function will type the full string instantly. However, you can pass an optional second argument to add a short pause between each character. This second argument is an integer or float value of the number of seconds to pause. For example, pyautogui.typewrite('Hello world!', 0.25) will wait a quarter-second after typing H, another quarter-second after e, and so on. This gradual typewriter effect may be useful for slower applications that can’t process keystrokes fast enough to keep up with PyAutoGUI.
For characters such as A or !, PyAutoGUI will automatically simulate holding down the SHIFT key as well.
Key Names
Not all keys are easy to represent with single text characters. For example, how do you represent SHIFT or the left arrow key as a single character? In PyAutoGUI, these keyboard keys are represented by short string values instead: 'esc' for the ESC key or 'enter' for the ENTER key.
Instead of a single string argument, a list of these keyboard key strings can be passed to typewrite(). For example, the following call presses the A key, then the B key, then the left arrow key twice, and finally the X and Y keys:
>>> pyautogui.typewrite(['a', 'b', 'left', 'left', 'X', 'Y'])
Because pressing the left arrow key moves the keyboard cursor, this will output XYab. Table 18-1 lists the PyAutoGUI keyboard key strings that you can pass to typewrite() to simulate pressing any combination of keys.
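To convince yourself that the key list above really produces XYab, you can trace it with a tiny model of a text field. This is only a sketch: simulate_keys() is a made-up helper that understands single characters plus 'left' and 'right', and it sends no real keypresses.

```python
def simulate_keys(keys):
    """Apply single-character keys and 'left'/'right' to an empty text field."""
    text = []
    cursor = 0
    for key in keys:
        if key == 'left':
            cursor = max(0, cursor - 1)
        elif key == 'right':
            cursor = min(len(text), cursor + 1)
        elif len(key) == 1:
            text.insert(cursor, key)  # typing inserts at the cursor position
            cursor += 1
        else:
            raise ValueError('key not handled in this sketch: ' + repr(key))
    return ''.join(text)


print(simulate_keys(['a', 'b', 'left', 'left', 'X', 'Y']))
```

The two 'left' presses move the cursor back to the start of the field, so X and Y are inserted in front of ab.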
You can also examine the pyautogui.KEYBOARD_KEYS list to see all possible keyboard key strings that PyAutoGUI will accept. The 'shift' string refers to the left SHIFT key and is equivalent to 'shiftleft'. The same applies for 'ctrl', 'alt', and 'win' strings; they all refer to the left-side key.
Table 18-1. PyKeyboard Attributes
Keyboard key string                     Meaning
'a', 'b', 'c', 'A', 'B', 'C', '1', '2', '3', '!', '@', '#', and so on  The keys for single characters
'enter' (or 'return' or '\n')           The ENTER key
'esc'                                   The ESC key
'shiftleft', 'shiftright'               The left and right SHIFT keys
'altleft', 'altright'                   The left and right ALT keys
'ctrlleft', 'ctrlright'                 The left and right CTRL keys
'tab' (or '\t')                         The TAB key
'backspace', 'delete'                   The BACKSPACE and DELETE keys
'pageup', 'pagedown'                    The PAGE UP and PAGE DOWN keys
'home', 'end'                           The HOME and END keys
'up', 'down', 'left', 'right'           The up, down, left, and right arrow keys
'f1', 'f2', 'f3', and so on             The F1 to F12 keys
'volumemute', 'volumedown', 'volumeup'  The mute, volume down, and volume up keys (some keyboards do not have these keys, but your operating system will still be able to understand these simulated keypresses)
'pause'                                 The PAUSE key
'capslock', 'numlock', 'scrolllock'     The CAPS LOCK, NUM LOCK, and SCROLL LOCK keys
'insert'                                The INS or INSERT key
'printscreen'                           The PRTSC or PRINT SCREEN key
'winleft', 'winright'                   The left and right WIN keys (on Windows)
'command'                               The Command (⌘) key (on OS X)
'option'                                The OPTION key (on OS X)
Pressing and Releasing the Keyboard
Much like the mouseDown() and mouseUp() functions, pyautogui.keyDown() and pyautogui.keyUp() will send virtual keypresses and releases to the computer. They are passed a keyboard key string (see Table 18-1) for their argument. For convenience, PyAutoGUI provides the pyautogui.press() function, which calls both of these functions to simulate a complete keypress.
Run the following code, which will type a dollar sign character (obtained by holding the SHIFT key and pressing 4):
>>> pyautogui.keyDown('shift'); pyautogui.press('4'); pyautogui.keyUp('shift')
This line presses down SHIFT, presses (and releases) 4, and then releases SHIFT. If you need to type a string into a text field, the typewrite() function is more suitable. But for applications that take single-key commands, the press() function is the simpler approach.
Hotkey Combinations
A hotkey or shortcut is a combination of keypresses to invoke some application function. The common hotkey for copying a selection is CTRL-C (on Windows and Linux) or ⌘-C (on OS X). The user presses and holds the CTRL key, then presses the C key, and then releases the C and CTRL keys. To do this with PyAutoGUI’s keyDown() and keyUp() functions, you would have to enter the following:
pyautogui.keyDown('ctrl')
pyautogui.keyDown('c')
pyautogui.keyUp('c')
pyautogui.keyUp('ctrl')
This is rather complicated. Instead, use the pyautogui.hotkey() function, which takes multiple keyboard key string arguments, presses them in order, and releases them in the reverse order. For the CTRL-C example, the code would simply be as follows:
pyautogui.hotkey('ctrl', 'c')
This function is especially useful for larger hotkey combinations. In Word, the CTRL-ALT-SHIFT-S hotkey combination displays the Style pane. Instead of making eight different function calls (four keyDown() calls and four keyUp() calls), you can just call hotkey('ctrl', 'alt', 'shift', 's').
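The press-in-order, release-in-reverse rule can be spelled out as a list of events. The function below is a sketch of that ordering only; hotkey_events() is a hypothetical name, and it builds a list instead of sending keypresses.

```python
def hotkey_events(*keys):
    """List the key-down and key-up events for a hotkey, in order."""
    presses = [('down', key) for key in keys]
    releases = [('up', key) for key in reversed(keys)]
    return presses + releases


for event in hotkey_events('ctrl', 'alt', 'shift', 's'):
    print(event)
```

For four keys this yields eight events, matching the four keyDown() and four keyUp() calls that hotkey() saves you from writing.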
With a new IDLE file editor window in the upper-left corner of your screen, enter the following into the interactive shell (in OS X, replace 'alt' with 'ctrl'):
  >>> import pyautogui, time
  >>> def commentAfterDelay():
➊       pyautogui.click(100, 100)
➋       pyautogui.typewrite('In IDLE, Alt-3 comments out a line.')
        time.sleep(2)
➌       pyautogui.hotkey('alt', '3')
  >>> commentAfterDelay()
This defines a function commentAfterDelay() that, when called, will click the file editor window to bring it into focus ➊, type In IDLE, Alt-3 comments out a line. ➋, pause for 2 seconds, and then simulate pressing the ALT-3 hotkey (or CTRL-3 on OS X) ➌. This keyboard shortcut adds two # characters to the current line, commenting it out. (This is a useful trick to know when writing your own code in IDLE.)
Review of the PyAutoGUI Functions
Since this chapter covered many different functions, here is a quick summary reference:
moveTo(x, y). Moves the mouse cursor to the given x and y coordinates.
moveRel(xOffset, yOffset). Moves the mouse cursor relative to its current position.
dragTo(x, y). Moves the mouse cursor while the left button is held down.
dragRel(xOffset, yOffset). Moves the mouse cursor relative to its current position while the left button is held down.
click(x, y, button). Simulates a click (left button by default).
rightClick(). Simulates a right-button click.
middleClick(). Simulates a middle-button click.
doubleClick(). Simulates a double left-button click.
mouseDown(x, y, button). Simulates pressing down the given button at the position x, y.
mouseUp(x, y, button). Simulates releasing the given button at the position x, y.
scroll(units). Simulates the scroll wheel. A positive argument scrolls up; a negative argument scrolls down.
typewrite(message). Types the characters in the given message string.
typewrite([key1, key2, key3]). Types the given keyboard key strings.
press(key). Presses the given keyboard key string.
keyDown(key). Simulates pressing down the given keyboard key.
keyUp(key). Simulates releasing the given keyboard key.
hotkey(key1, key2, key3). Simulates pressing the given keyboard key strings down in order and then releasing them in reverse order.
screenshot(). Returns a screenshot as an Image object. (See Chapter 17 for information on Image objects.)
Project: Automatic Form Filler
Of all the boring tasks, filling out forms is the most dreaded of chores. It’s only fitting that now, in the final chapter project, you will slay it. Say you have a huge amount of data in a spreadsheet, and you have to tediously retype it into some other application’s form interface, with no intern to do it for you. Although some applications will have an Import feature that will allow you to upload a spreadsheet with the information, sometimes it seems that there is no other way than mindlessly clicking and typing for hours on end. You’ve come this far in this book; you know that of course there’s another way.
The form for this project is a Google Docs form that you can find at http://nostarch.com/automatestuff. It looks like Figure 18-4.
Figure 18-4. The form used for this project
At a high level, here’s what your program should do:
Click the first text field of the form.
Move through the form, typing information into each field.
Click the Submit button.
Repeat the process with the next set of data.
This means your code will need to do the following:
Call pyautogui.click() to click the form and Submit button.
Call pyautogui.typewrite() to enter text into the fields.
Handle the KeyboardInterrupt exception so the user can press CTRL-C to quit.
Open a new file editor window and save it as formFiller.py.
Step 1: Figure Out the Steps
Before writing code, you need to figure out the exact keystrokes and mouse clicks that will fill out the form once. The mouseNow.py script in Project: “Where Is the Mouse Right Now?” can help you figure out specific mouse coordinates. You need to know only the coordinates of the first text field. After clicking the first field, you can just press TAB to move focus to the next field. This will save you from having to figure out the x- and y-coordinates to click for every field.
Here are the steps for entering data into the form:
1. Click the Name field. (Use mouseNow.py to determine the coordinates after maximizing the browser window. On OS X, you may need to click twice: once to put the browser in focus and again to click the Name field.)
2. Type a name and then press TAB.
3. Type a greatest fear and then press TAB.
4. Press the down arrow key the correct number of times to select the wizard power source: once for wand, twice for amulet, three times for crystal ball, and four times for money. Then press TAB. (Note that on OS X, you will have to press the down arrow key one more time for each option. For some browsers, you may need to press the ENTER key as well.)
5. Press the right arrow key to select the answer to the Robocop question. Press it once for 2, twice for 3, three times for 4, or four times for 5; or just press the spacebar to select 1 (which is highlighted by default). Then press TAB.
6. Type an additional comment and then press TAB.
7. Press the ENTER key to “click” the Submit button.
8. After submitting the form, the browser will take you to a page where you will need to click a link to return to the form page.
Note that if you run this program again later, you may have to update the mouse click coordinates, since the browser window might have changed position. To work around this, always make sure the browser window is maximized before finding the coordinates of the first form field. Also, different browsers on different operating systems might work slightly differently from the steps given here, so check that these keystroke combinations work for your computer before running your program.
Step 2: Set Up Coordinates
Load the example form you downloaded (Figure 18-4) in a browser and maximize your browser window. Open a new Terminal or command line window to run the mouseNow.py script, and then mouse over the Name field to figure out its x- and y-coordinates.
These numbers will be assigned to the nameField variable in your program. Also, find out the x- and y-coordinates and RGB tuple value of the blue Submit button. These values will be assigned to the submitButton and submitButtonColor variables, respectively.
Next, fill in some dummy data for the form and click Submit. You need to see what the next page looks like so that you can use mouseNow.py to find the coordinates of the Submit another response link on this new page.
Make your source code look like the following, being sure to replace all the values in italics with the coordinates you determined from your own tests:
#! python3
# formFiller.py - Automatically fills in the form.
import pyautogui, time
# Set these to the correct coordinates for your computer.
nameField = (648, 319)
submitButton = (651, 817)
submitButtonColor = (75, 141, 249)
submitAnotherLink = (760, 224)
# TODO: Give the user a chance to kill the script.
# TODO: Wait until the form page has loaded.
# TODO: Fill out the Name Field.
# TODO: Fill out the Greatest Fear(s) field.
# TODO: Fill out the Source of Wizard Powers field.
# TODO: Fill out the Robocop field.
# TODO: Fill out the Additional Comments field.
# TODO: Click Submit.
# TODO: Wait until form page has loaded.
# TODO: Click the Submit another response link.
Now you need the data you actually want to enter into this form. In the real world, this data might come from a spreadsheet, a plaintext file, or a website, and it would require additional code to load into the program. But for this project, you’ll just hardcode all this data in a variable. Add the following to your program:
#! python3
# formFiller.py - Automatically fills in the form.
--snip--
formData = [{'name': 'Alice', 'fear': 'eavesdroppers', 'source': 'wand',
             'robocop': 4, 'comments': 'Tell Bob I said hi.'},
            {'name': 'Bob', 'fear': 'bees', 'source': 'amulet', 'robocop': 4,
             'comments': 'n/a'},
            {'name': 'Carol', 'fear': 'puppets', 'source': 'crystal ball',
             'robocop': 1,
             'comments': 'Please take the puppets out of the break room.'},
            {'name': 'Alex Murphy', 'fear': 'ED-209', 'source': 'money',
             'robocop': 5,
             'comments': 'Protect the innocent. Serve the public trust. Uphold the law.'},
           ]
--snip--
The formData list contains four dictionaries for four different names. Each dictionary has names of text fields as keys and responses as values. The last bit of setup is to set PyAutoGUI’s PAUSE variable to wait half a second after each function call. Add the following to your program after the formData assignment statement:
pyautogui.PAUSE = 0.5
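Step 2 mentions that real form data might come from a spreadsheet. As a rough sketch of what that additional loading code could look like, the following reads CSV text into the same list-of-dictionaries shape as formData. The load_form_data() function, the column names, and the responses.csv filename in the comment are all assumptions made for this example; the io.StringIO object stands in for a real file.

```python
import csv
import io


def load_form_data(csv_file):
    """Read rows into a list of dictionaries shaped like formData."""
    rows = []
    for row in csv.DictReader(csv_file):
        row = dict(row)
        row['robocop'] = int(row['robocop'])  # the rating is compared as an integer
        rows.append(row)
    return rows


# Stand-in for a real file opened with open('responses.csv', newline='').
sample = io.StringIO(
    'name,fear,source,robocop,comments\n'
    'Alice,eavesdroppers,wand,4,Tell Bob I said hi.\n'
    'Bob,bees,amulet,4,n/a\n'
)
form_data = load_form_data(sample)
print(form_data[0]['name'], form_data[0]['robocop'])
```

Converting the robocop column with int() matters because the values read from a CSV file are strings, while the hardcoded formData stores that rating as an integer.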
Step 3: Start Typing Data
A for loop will iterate over each of the dictionaries in the formData list, passing the values in the dictionary to the PyAutoGUI functions that will virtually type in the text fields.
Add the following code to your program:
  #! python3
  # formFiller.py - Automatically fills in the form.
  --snip--
  for person in formData:
      # Give the user a chance to kill the script.
      print('>>> 5 SECOND PAUSE TO LET USER PRESS CTRL-C <<<')
➊     time.sleep(5)
      # Wait until the form page has loaded.
➋     while not pyautogui.pixelMatchesColor(submitButton[0], submitButton[1],
                                            submitButtonColor):
          time.sleep(0.5)
  --snip--
As a small safety feature, the script has a five-second pause ➊ that gives the user a chance to hit CTRL-C (or move the mouse cursor to the upper-left corner of the screen to raise the FailSafeException exception) to shut the program down in case it’s doing something unexpected. Then the program waits until the Submit button’s color is visible ➋, letting the program know that the form page has loaded. Remember that you figured out the coordinate and color information in step 2 and stored it in the submitButton and submitButtonColor variables. To use pixelMatchesColor(), you pass the coordinates submitButton[0] and submitButton[1], and the color submitButtonColor.
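The waiting loop above keeps polling forever if the Submit button never appears. One possible variation, shown here only as a sketch, is to give up after a time limit. The wait_until() helper, its parameters, and the fake_page_loaded() stand-in are invented for this example and are not part of the form filler program.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it is true; return False if timeout elapses first."""
    deadline = clock() + timeout
    while not condition():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True


# Demo with a stand-in condition that becomes true on the third check.
checks = []

def fake_page_loaded():
    checks.append(1)
    return len(checks) >= 3

print(wait_until(fake_page_loaded, timeout=5.0, interval=0.01))
```

In formFiller.py, the condition could be a lambda that calls pyautogui.pixelMatchesColor(submitButton[0], submitButton[1], submitButtonColor), and the program could stop with a message when wait_until() returns False.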
After the code that waits until the Submit button’s color is visible, add the following:
  #! python3
  # formFiller.py - Automatically fills in the form.
  --snip--
➊     print('Entering %s info...' % (person['name']))
➋     pyautogui.click(nameField[0], nameField[1])
      # Fill out the Name field.
➌     pyautogui.typewrite(person['name'] + '\t')
      # Fill out the Greatest Fear(s) field.
➍     pyautogui.typewrite(person['fear'] + '\t')
  --snip--
We add an occasional print() call to display the program’s status in its Terminal window to let the user know what’s going on ➊.
Since the program knows that the form is loaded, it’s time to call click() to click the Name field ➋ and typewrite() to enter the string in person['name'] ➌. The '\t' character is added to the end of the string passed to typewrite() to simulate pressing TAB, which moves the keyboard focus to the next field, Greatest Fear(s). Another call to typewrite() will type the string in person['fear'] into this field and then tab to the next field in the form ➍.
Step 4: Handle Select Lists and Radio Buttons
The drop-down menu for the “wizard powers” question and the radio buttons for the Robocop field are trickier to handle than the text fields. To click these options with the mouse, you would have to figure out the x- and y-coordinates of each possible option. It’s easier to use the keyboard arrow keys to make a selection instead.
Add the following to your program:
#! python3
# formFiller.py - Automatically fills in the form.
--snip--
    # Fill out the Source of Wizard Powers field.
➊   if person['source'] == 'wand':
➋       pyautogui.typewrite(['down', '\t'])
    elif person['source'] == 'amulet':
        pyautogui.typewrite(['down', 'down', '\t'])
    elif person['source'] == 'crystal ball':
        pyautogui.typewrite(['down', 'down', 'down', '\t'])
    elif person['source'] == 'money':
        pyautogui.typewrite(['down', 'down', 'down', 'down', '\t'])
    # Fill out the Robocop field.
➌   if person['robocop'] == 1:
➍       pyautogui.typewrite([' ', '\t'])
    elif person['robocop'] == 2:
        pyautogui.typewrite(['right', '\t'])
    elif person['robocop'] == 3:
        pyautogui.typewrite(['right', 'right', '\t'])
    elif person['robocop'] == 4:
        pyautogui.typewrite(['right', 'right', 'right', '\t'])
    elif person['robocop'] == 5:
        pyautogui.typewrite(['right', 'right', 'right', 'right', '\t'])
--snip--
Once the drop-down menu has focus (remember that you wrote code to simulate pressing TAB after filling out the Greatest Fear(s) field), pressing the down arrow key will move to the next item in the selection list. Depending on the value in person['source'], your program should send a number of down arrow keypresses before tabbing to the next field. If the value at the 'source' key in this user’s dictionary is 'wand' ➊, we simulate pressing the down arrow key once (to select Wand) and pressing TAB ➋. If the value at the 'source' key is 'amulet', we simulate pressing the down arrow key twice and pressing TAB, and so on for the other possible answers.
The radio buttons for the Robocop question can be selected with the right arrow keys or, if you want to select the first choice ➌, by just pressing the spacebar ➍.
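The two if/elif chains follow a simple pattern: the number of arrow keypresses grows by one with each option. As a rough sketch of that pattern (an alternative arrangement, not the program’s own code), the key lists can be built by small helper functions. The names keys_for_dropdown, keys_for_radio, and SOURCE_OPTIONS are made up for this illustration.

```python
# Hypothetical helpers that build the key lists passed to pyautogui.typewrite().
# They do not call PyAutoGUI themselves.

# Assumed order of the drop-down options, matching the if/elif chain.
SOURCE_OPTIONS = ['wand', 'amulet', 'crystal ball', 'money']

def keys_for_dropdown(option_number):
    """Press down option_number times (1-based), then TAB."""
    return ['down'] * option_number + ['\t']

def keys_for_radio(choice):
    """Select radio button number choice (1-based), then TAB."""
    if choice == 1:
        return [' ', '\t']  # The spacebar selects the first radio button.
    return ['right'] * (choice - 1) + ['\t']

person = {'source': 'amulet', 'robocop': 3}  # Sample data for illustration.
print(keys_for_dropdown(SOURCE_OPTIONS.index(person['source']) + 1))
print(keys_for_radio(person['robocop']))
```

Each returned list could be passed straight to pyautogui.typewrite(), and adding a fifth drop-down option would then only require a new entry in SOURCE_OPTIONS.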
Step 5: Submit the Form and Wait
You can fill out the Additional Comments field with the typewrite() function by passing person['comments'] as an argument. You can type an additional '\t' to move the keyboard focus to the next field or the Submit button. Once the Submit button is in focus, calling pyautogui.press('enter') will simulate pressing the ENTER key and submit the form. After submitting the form, your program will wait five seconds for the next page to load.
Once the new page has loaded, it will have a Submit another response link that will direct the browser to a new, empty form page. You stored the coordinates of this link as a tuple in submitAnotherLink in step 2, so pass these coordinates to pyautogui.click() to click this link.
With the new form ready to go, the script’s outer for loop can continue to the next iteration and enter the next person’s information into the form.
Complete your program by adding the following code:
#! python3
# formFiller.py - Automatically fills in the form.
--snip--
    # Fill out the Additional Comments field.
    pyautogui.typewrite(person['comments'] + '\t')
    # Click Submit.
    pyautogui.press('enter')
    # Wait until form page has loaded.
    print('Clicked Submit.')
    time.sleep(5)
    # Click the Submit another response link.
    pyautogui.click(submitAnotherLink[0], submitAnotherLink[1])
Once the main for loop has finished, the program will have plugged in the information for each person. In this example, there are only four people to enter. But if you had 4,000 people, then writing a program to do this would save you a lot of time and typing!
Summary
GUI automation with the pyautogui module allows you to interact with applications on your computer by controlling the mouse and keyboard. While this approach is flexible enough to do anything that a human user can do, the downside is that these programs are fairly blind to what they are clicking or typing. When writing GUI automation programs, try to ensure that they will crash quickly if they’re given bad instructions. Crashing is annoying, but it’s much better than the program continuing in error.
You can move the mouse cursor around the screen and simulate mouse clicks, keystrokes, and keyboard shortcuts with PyAutoGUI. The pyautogui module can also check the colors on the screen, which can provide your GUI automation program with enough of an idea of the screen contents to know whether it has gotten off track. You can even give PyAutoGUI a screenshot and let it figure out the coordinates of the area you want to click.
You can combine all of these PyAutoGUI features to automate any mindlessly repetitive task on your computer. In fact, it can be downright hypnotic to watch the mouse cursor move on its own and see text appear on the screen automatically. Why not spend the time you saved by sitting back and watching your program do all your work for you? There’s a certain satisfaction that comes from seeing how your cleverness has saved you from the boring stuff.
Practice Questions
Q: 1. How can you trigger PyAutoGUI’s fail safe to stop a program?
Q: 2. What function returns the current screen resolution?
Q: 3. What function returns the coordinates for the mouse cursor’s current position?
Q: 4. What is the difference between pyautogui.moveTo() and pyautogui.moveRel()?
Q: 5. What functions can be used to drag the mouse?
Q: 6. What function call will type out the characters of "Hello world!"?
Q: 7. How can you do keypresses for special keys such as the keyboard’s left arrow key?
Q: 8. How can you save the current contents of the screen to an image file named screenshot.png?
Q: 9. What code would set a two-second pause after every PyAutoGUI function call?
Practice Projects
For practice, write programs that do the following.
Looking Busy
Many instant messaging programs determine whether you are idle, or away from your computer, by detecting a lack of mouse movement over some period of time, say, ten minutes. Maybe you’d like to sneak away from your desk for a while but don’t want others to see your instant messenger status go into idle mode. Write a script to nudge your mouse cursor slightly every ten seconds. The nudge should be small enough so that it won’t get in the way if you do happen to need to use your computer while the script is running.
Instant Messenger Bot
Google Talk, Skype, Yahoo Messenger, AIM, and other instant messaging applications often use proprietary protocols that make it difficult for others to write Python modules that can interact with these programs. But even these proprietary protocols can’t stop you from writing a GUI automation tool.
The Google Talk application has a search bar that lets you enter a username on your friend list and open a messaging window when you press ENTER. The keyboard focus automatically moves to the new window. Other instant messenger applications have similar ways to open new message windows. Write a program that will automatically send out a notification message to a select group of people on your friend list. Your program may have to deal with exceptional cases, such as friends being offline, the chat window appearing at different coordinates on the screen, or confirmation boxes that interrupt your messaging. Your program will have to take screenshots to guide its GUI interaction and adopt ways of detecting when its virtual keystrokes aren’t being sent.
NOTE
You may want to set up some fake test accounts so that you don’t accidentally spam your real friends while writing this program.
Game-Playing Bot Tutorial
There is a great tutorial titled “How to Build a Python Bot That Can Play Web Games” that you can find at http://nostarch.com/automatestuff/. This tutorial explains how to create a GUI automation program in Python that plays a Flash game called Sushi Go Round. The game involves clicking the correct ingredient buttons to fill customers’ sushi orders. The faster you fill orders without mistakes, the more points you get. This is a perfectly suited task for a GUI automation program, and a way to cheat to a high score! The tutorial covers many of the same topics that this chapter covers but also includes descriptions of PyAutoGUI’s basic image recognition features.
Appendix A. Installing Third-Party Modules
Beyond the standard library of modules packaged with Python, other developers have written their own modules to extend Python’s capabilities even further. The primary way to install third-party modules is to use Python’s pip tool. This tool securely downloads and installs Python modules onto your computer from https://pypi.python.org/, the website of the Python Software Foundation. PyPI, or the Python Package Index, is a sort of free app store for Python modules.
The pip Tool
The executable file for the pip tool is called pip on Windows and pip3 on OS X and Linux. On Windows, you can find pip at C:\Python34\Scripts\pip.exe. On OS X, it is in /Library/Frameworks/Python.framework/Versions/3.4/bin/pip3. On Linux, it is in /usr/bin/pip3.
While pip comes automatically installed with Python 3.4 on Windows and OS X, you must install it separately on Linux. To install pip3 on Ubuntu or Debian Linux, open a new Terminal window and enter sudo apt-get install python3-pip. To install pip3 on Fedora Linux, enter sudo yum install python3-pip into a Terminal window. You will need to enter the administrator password for your computer in order to install this software.
Installing Third-Party Modules
The pip tool is meant to be run from the command line: You pass it the command install followed by the name of the module you want to install. For example, on Windows you would enter pip install ModuleName, where ModuleName is the name of the module. On OS X and Linux, you’ll have to run pip3 with the sudo prefix to grant administrative privileges to install the module. You would need to type sudo pip3 install ModuleName.
If you already have the module installed but would like to upgrade it to the latest version available on PyPI, run pip install -U ModuleName (or pip3 install -U ModuleName on OS X and Linux).
After installing the module, you can test that it installed successfully by running import ModuleName in the interactive shell. If no error messages are displayed, you can assume the module was installed successfully.
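If you want to check several modules at once, the import test can also be done from a short script. The following is a minimal sketch using the standard library’s importlib.util.find_spec(). The helper name is_installed and the list of names are assumptions made for this example. Keep in mind that the name you import can differ from the name you give to pip (for example, beautifulsoup4 is imported as bs4, pillow as PIL, and python-docx as docx).

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name.

    is_installed is a hypothetical helper written for this example.
    """
    return importlib.util.find_spec(module_name) is not None

# Use import names here, not the names passed to pip install.
for name in ['json', 'requests', 'pyautogui']:
    print(name, 'is installed' if is_installed(name) else 'is missing')
```

A module reported as missing can then be installed with pip as described above.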
You can install all of the modules covered in this book by running the commands listed next. (Remember to replace pip with pip3 if you’re on OS X or Linux.)
pip install send2trash
pip install requests
pip install beautifulsoup4
pip install selenium
pip install openpyxl
pip install PyPDF2
pip install python-docx (install python-docx, not docx)
pip install imapclient
pip install pyzmail
pip install twilio
pip install pillow
pip install pyobjc-core (on OS X only)
pip install pyobjc (on OS X only)
pip install python3-xlib (on Linux only)
pip install pyautogui
NOTE
For OS X users: The pyobjc module can take 20 minutes or longer to install, so don’t be alarmed if it takes a while. You should also install the pyobjc-core module first, which will reduce the overall installation time.
Appendix B. Running Programs
If you have a program open in IDLE’s file editor, running it is a simple matter of pressing F5 or selecting the Run ▸ Run Module menu item. This is an easy way to run programs while writing them, but opening IDLE to run your finished programs can be a burden. There are more convenient ways to execute Python scripts.
Shebang Line
The first line of all your Python programs should be a shebang line, which tells your computer that you want Python to execute this program. The shebang line begins with #!, but the rest depends on your operating system.
On Windows, the shebang line is #! python3. On OS X, the shebang line is #! /usr/bin/env python3. On Linux, the shebang line is #! /usr/bin/python3.
You will be able to run Python scripts from IDLE without the shebang line, but the line is needed to run them from the command line.
Running Python Programs on Windows
On Windows, the Python 3.4 interpreter is located at C:\Python34\python.exe. Alternatively, the convenient py.exe program will read the shebang line at the top of the .py file’s source code and run the appropriate version of Python for that script. The py.exe program will make sure to run the Python program with the correct version of Python if multiple versions are installed on your computer.
To make it convenient to run your Python program, create a .bat batch file for running the Python program with py.exe. To make a batch file, make a new text file containing a single line like the following:
@py.exe C:\path\to\your\pythonScript.py %*
Replace this path with the absolute path to your own program, and save this file with a .bat file extension (for example, pythonScript.bat). This batch file will keep you from having to type the full absolute path for the Python program every time you want to run it. I recommend you place all your batch and .py files in a single folder, such as C:\MyPythonScripts or C:\Users\YourName\PythonScripts.
The C:\MyPythonScripts folder should be added to the system path on Windows so that you can run the batch files in it from the Run dialog. To do this, modify the PATH environment variable. Click the Start button and type Edit environment variables for your account. This option should auto-complete after you’ve begun to type it. The Environment Variables window that appears will look like Figure B-1.
From System variables, select the Path variable and click Edit. In the Value text field, append a semicolon, type C:\MyPythonScripts, and then click OK. Now you can run any Python script in the C:\MyPythonScripts folder by simply pressing WIN-R and entering the script’s name. Running pythonScript, for instance, will run pythonScript.bat, which in turn will save you from having to run the whole command py.exe C:\MyPythonScripts\pythonScript.py from the Run dialog.
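If the Run dialog can’t find your batch file, it may help to confirm that the folder really is in PATH. Here is a small sketch that reads the PATH environment variable from Python and looks for a folder in it. The function name folder_on_path is made up for this example, and you may need to open a new window after editing PATH before the change becomes visible to new programs.

```python
import os

def folder_on_path(folder, path_value=None):
    """Return True if folder is one of the entries in the PATH string.

    folder_on_path is a hypothetical helper. If path_value is not given,
    the current PATH environment variable is used.
    """
    if path_value is None:
        path_value = os.environ.get('PATH', '')
    target = os.path.normcase(os.path.normpath(folder))
    # PATH entries are separated by ';' on Windows and ':' on OS X and Linux.
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normcase(os.path.normpath(entry)) == target
               for entry in entries)

print(folder_on_path(r'C:\MyPythonScripts'))
```

The paths are normalized before they are compared so that a trailing slash or, on Windows, a difference in letter case does not cause a false mismatch.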
Running Python Programs on OS X and Linux
On OS X, selecting Applications ▸ Utilities ▸ Terminal will bring up a Terminal window. A Terminal window is a way to enter commands on your computer using only text, rather than clicking through a graphic interface. To bring up the Terminal window on Ubuntu Linux, press the WIN (or SUPER) key to bring up Dash and type in Terminal.
The Terminal window will begin in the home folder of your user account. If my username is asweigart, the home folder will be /Users/asweigart on OS X and /home/asweigart on Linux. The tilde (~) character is a shortcut for your home folder, so you can enter cd ~ to change to your home folder. You can also use the cd command to change the current working directory to any other directory. On both OS X and Linux, the pwd command will print the current working directory.
To run your Python programs, save your .py file to your home folder. Then, change the .py file’s permissions to make it executable by running chmod +x pythonScript.py. File permissions are beyond the scope of this book, but you will need to run this command on your Python file if you want to run the program from the Terminal window. Once you do so, you will be able to run your script whenever you want by opening a Terminal window and entering ./pythonScript.py. The shebang line at the top of the script will tell the operating system where to locate the Python interpreter.
Appendix C. Answers to the Practice Questions
This appendix contains the answers to the practice problems at the end of each chapter. I highly recommend that you take the time to work through these problems. Programming is more than memorizing syntax and a list of function names. As when learning a foreign language, the more practice you put into it, the more you will get out of it. There are many websites with practice programming problems as well. You can find a list of these at http://nostarch.com/automatestuff/.
Chapter 1
1. The operators are +, -, *, and /. The values are 'hello', -88.8, and 5.
2. The string is 'spam'; the variable is spam. Strings always start and end with quotes.
3. The three data types introduced in this chapter are integers, floating-point numbers, and strings.
4. An expression is a combination of values and operators. All expressions evaluate (that is, reduce) to a single value.
5. An expression evaluates to a single value. A statement does not.
6. The bacon variable is set to 20. The bacon + 1 expression does not reassign the value in bacon (that would need an assignment statement: bacon = bacon + 1).
7. Both expressions evaluate to the string 'spamspamspam'.
8. Variable names cannot begin with a number.
9. The int(), float(), and str() functions will evaluate to the integer, floating-point number, and string versions of the value passed to them.
10. The expression causes an error because 99 is an integer, and only strings can be concatenated to other strings with the + operator. The correct way is 'I have eaten ' + str(99) + ' burritos.'.
Chapter 2
1. True and False, using capital T and F, with the rest of the word in lowercase
2. and, or, and not
3. True and True is True.
   True and False is False.
   False and True is False.
   False and False is False.
   True or True is True.
   True or False is True.
   False or True is True.
   False or False is False.
   not True is False.
   not False is True.
4. False
   False
   True
   False
   False
   True
5. ==, !=, <, >, <=, and >=.
6. == is the equal to operator that compares two values and evaluates to a Boolean, while = is the assignment operator that stores a value in a variable.
7. A condition is an expression used in a flow control statement that evaluates to a Boolean value.
8. The three blocks are everything inside the if statement and the lines print('bacon') and print('ham').
   print('eggs')
   if spam > 5:
       print('bacon')
   else:
       print('ham')
   print('spam')
9. The code:
   if spam == 1:
       print('Hello')
   elif spam == 2:
       print('Howdy')
   else:
       print('Greetings!')
10. Press CTRL-C to stop a program stuck in an infinite loop.
11. The break statement will move the execution outside and just after a loop. The continue statement will move the execution to the start of the loop.
12. They all do the same thing. The range(10) call ranges from 0 up to (but not including) 10, range(0, 10) explicitly tells the loop to start at 0, and range(0, 10, 1) explicitly tells the loop to increase the variable by 1 on each iteration.
13. The code:
   for i in range(1, 11):
       print(i)
Chapter 3
1. Functions reduce the need for duplicate code. This makes programs shorter, easier to read, and easier to update.
2. The code in a function executes when the function is called, not when the function is defined.
3. The def statement defines (that is, creates) a function.
4. A function consists of the def statement and the code in its def clause. A function call is what moves the program execution into the function, and the function call evaluates to the function’s return value.
5. There is one global scope, and a local scope is created whenever a function is called.
6. When a function returns, the local scope is destroyed, and all the variables in it are forgotten.
7. A return value is the value that a function call evaluates to. Like any value, a return value can be used as part of an expression.
8. If there is no return statement for a function, its return value is None.
9. A global statement will force a variable in a function to refer to the global variable.
10. The data type of None is NoneType.
11. That import statement imports a module named areallyourpetsnamederic. (This isn’t a real Python module, by the way.)
12. This function can be called with spam.bacon().
13. Place the line of code that might cause an error in a try clause.
14. The code that could potentially cause an error goes in the try clause. The code that executes if an error happens goes in the except clause.
Chapter 4
1. The empty list value, which is a list value that contains no items. This is similar to how '' is the empty string value.
2. spam[2] = 'hello' (Notice that the third value in a list is at index 2 because the first index is 0.)
3. 'd' (Note that '3' * 2 is the string '33', which is passed to int() before being divided by 11. This eventually evaluates to 3. Expressions can be used wherever values are used.)
4. 'd' (Negative indexes count from the end.)
5. ['a', 'b']
6. 1
7. [3.14, 'cat', 11, 'cat', True, 99]
8. [3.14, 11, 'cat', True]
9. The operator for list concatenation is +, while the operator for replication is *. (This is the same as for strings.)
10. While append() will add values only to the end of a list, insert() can add them anywhere in the list.
11. The del statement and the remove() list method are two ways to remove values from a list.
12. Both lists and strings can be passed to len(), have indexes and slices, be used in for loops, be concatenated or replicated, and be used with the in and not in operators.
13. Lists are mutable; they can have values added, removed, or changed. Tuples are immutable; they cannot be changed at all. Also, tuples are written using parentheses, ( and ), while lists use the square brackets, [ and ].
14. (42,) (The trailing comma is mandatory.)
15. The tuple() and list() functions, respectively
16. They contain references to list values.
17. The copy.copy() function will do a shallow copy of a list, while the copy.deepcopy() function will do a deep copy of a list. That is, only copy.deepcopy() will duplicate any lists inside the list.
Chapter 5
1. Two curly brackets: {}
2. {'foo': 42}
3. The items stored in a dictionary are unordered, while the items in a list are ordered.
4. You get a KeyError error.
5. There is no difference. The in operator checks whether a value exists as a key in the dictionary.
6. 'cat' in spam checks whether there is a 'cat' key in the dictionary, while 'cat' in spam.values() checks whether there is a value 'cat' for one of the keys in spam.
7. spam.setdefault('color', 'black')
8. pprint.pprint()
Chapter 6
1. Escape characters represent characters in string values that would otherwise be difficult or impossible to type into code.
2. \n is a newline; \t is a tab.
3. The \\ escape character will represent a backslash character.
4. The single quote in Howl's is fine because you’ve used double quotes to mark the beginning and end of the string.
5. Multiline strings allow you to use newlines in strings without the \n escape character.
6. The expressions evaluate to the following:
   'e'
   'Hello'
   'Hello'
   'lo world!'
7. The expressions evaluate to the following:
   'HELLO'
   True
   'hello'
8. The expressions evaluate to the following:
   ['Remember,', 'remember,', 'the', 'fifth', 'of', 'November.']
   'There-can-be-only-one.'
9. The rjust(), ljust(), and center() string methods, respectively
10. The lstrip() and rstrip() methods remove whitespace from the left and right ends of a string, respectively.
Chapter 7
1. The re.compile() function returns Regex objects.
2. Raw strings are used so that backslashes do not have to be escaped.
3. The search() method returns Match objects.
4. The group() method returns strings of the matched text.
5. Group 0 is the entire match, group 1 covers the first set of parentheses, and group 2 covers the second set of parentheses.
6. Periods and parentheses can be escaped with a backslash: \., \(, and \).
7. If the regex has no groups, a list of strings is returned. If the regex has groups, a list of tuples of strings is returned.
8. The | character signifies matching “either, or” between two groups.
9. The ? character can either mean “match zero or one of the preceding group” or be used to signify nongreedy matching.
10. The + matches one or more. The * matches zero or more.
11. The {3} matches exactly three instances of the preceding group. The {3,5} matches between three and five instances.
12. The \d, \w, and \s shorthand character classes match a single digit, word, or space character, respectively.
13. The \D, \W, and \S shorthand character classes match a single character that is not a digit, word, or space character, respectively.
14. Passing re.I or re.IGNORECASE as the second argument to re.compile() will make the matching case insensitive.
15. The . character normally matches any character except the newline character. If re.DOTALL is passed as the second argument to re.compile(), then the dot will also match newline characters.
16. The .* performs a greedy match, and the .*? performs a nongreedy match.
17. Either [0-9a-z] or [a-z0-9]
18. 'X drummers, X pipers, five rings, X hens'
19. The re.VERBOSE argument allows you to add whitespace and comments to the string passed to re.compile().
20. re.compile(r'^\d{1,3}(,\d{3})*$') will create this regex, but other regex strings can produce a similar regular expression.
21. re.compile(r'[A-Z][a-z]*\sNakamoto')
22. re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', re.IGNORECASE)
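A regex for numbers with a comma every three digits is easy to get subtly wrong, so it is worth trying a candidate pattern against a few sample strings. The sketch below assumes the pattern ^\d{1,3}(,\d{3})*$, that is, one to three digits followed by zero or more groups of a comma and exactly three digits. The names number_regex and has_valid_commas and the sample strings are chosen just for this illustration.

```python
import re

# One to three digits, then any number of ",ddd" groups, and nothing else.
number_regex = re.compile(r'^\d{1,3}(,\d{3})*$')

def has_valid_commas(text):
    """Return True if text is a number with a comma every three digits."""
    return number_regex.search(text) is not None

for sample in ['42', '1,234', '6,368,745', '12,34,567', '1234']:
    print(sample, has_valid_commas(sample))
```

The ^ and $ anchors matter here: without them, search() would find a valid-looking piece inside an invalid string such as '12,34,567'.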
Chapter 8
1. Relative paths are relative to the current working directory.
2. Absolute paths start with the root folder, such as / or C:\.
3. The os.getcwd() function returns the current working directory. The os.chdir() function changes the current working directory.
4. The . folder is the current folder, and .. is the parent folder.
5. C:\bacon\eggs is the dir name, while spam.txt is the base name.
6. The string 'r' for read mode, 'w' for write mode, and 'a' for append mode
7. An existing file opened in write mode is erased and completely overwritten.
8. The read() method returns the file’s entire contents as a single string value. The readlines() method returns a list of strings, where each string is a line from the file’s contents.
9. A shelf value resembles a dictionary value; it has keys and values, along with keys() and values() methods that work similarly to the dictionary methods of the same names.
Chapter 9
1. The shutil.copy() function will copy a single file, while shutil.copytree() will copy an entire folder, along with all its contents.
2. The shutil.move() function is used for renaming files, as well as moving them.
3. The send2trash functions will move a file or folder to the recycle bin, while shutil functions will permanently delete files and folders.
4. The zipfile.ZipFile() function is equivalent to the open() function; the first argument is the filename, and the second argument is the mode to open the ZIP file in (read, write, or append).
Chapter 10
1. assert spam >= 10, 'The spam variable is less than 10.'
2. assert eggs.lower() != bacon.lower(), 'The eggs and bacon variables are the same!' or assert eggs.upper() != bacon.upper(), 'The eggs and bacon variables are the same!'
3. assert False, 'This assertion always triggers.'
4. To be able to call logging.debug(), you must have these two lines at the start of your program:
   import logging
   logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
5. To be able to send logging messages to a file named programLog.txt with logging.debug(), you must have these two lines at the start of your program:
   import logging
   logging.basicConfig(filename='programLog.txt', level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
6. DEBUG, INFO, WARNING, ERROR, and CRITICAL
7. logging.disable(logging.CRITICAL)
8. You can disable logging messages without removing the logging function calls. You can selectively disable lower-level logging messages. You can create logging messages. Logging messages provide a timestamp.
9. The Step button will move the debugger into a function call. The Over button will quickly execute the function call without stepping into it. The Out button will quickly execute the rest of the code until it steps out of the function it currently is in.
10. After you click Go, the debugger will stop when it has reached the end of the program or a line with a breakpoint.
11. A breakpoint is a setting on a line of code that causes the debugger to pause when the program execution reaches the line.
12. To set a breakpoint in IDLE, right-click the line and select Set Breakpoint from the context menu.
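One pitfall with assert statements is worth checking for yourself: the condition and the message must not be wrapped together in a single pair of parentheses. Doing so creates a two-item tuple, and any non-empty tuple is truthy, so the assertion could never fail. The short sketch below shows the difference. The variable as_tuple and the function check_spam are made up for this example.

```python
spam = 5

# Wrapping the condition and message in parentheses builds a tuple. A tuple
# with two items is truthy no matter what the condition inside evaluates to.
as_tuple = (spam >= 10, 'The spam variable is less than 10.')
print(bool(as_tuple))

def check_spam(value):
    # Without the parentheses, the condition is what gets tested.
    assert value >= 10, 'The spam variable is less than 10.'
    return value

try:
    check_spam(spam)
except AssertionError as err:
    print('AssertionError:', err)
```

Recent versions of Python also print a warning when they see an assert statement applied directly to a parenthesized tuple.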
Chapter 11
1. The webbrowser module has an open() method that will launch a web browser to a specific URL, and that’s it. The requests module can download files and pages from the Web. The BeautifulSoup module parses HTML. Finally, the selenium module can launch and control a browser.
2. The requests.get() function returns a Response object, which has a text attribute that contains the downloaded content as a string.
3. The raise_for_status() method raises an exception if the download had problems and does nothing if the download succeeded.
4. The status_code attribute of the Response object contains the HTTP status code.
5. After opening the new file on your computer in 'wb' “write binary” mode, use a for loop that iterates over the Response object’s iter_content() method to write out chunks to the file. Here’s an example:
   saveFile = open('filename.html', 'wb')
   for chunk in res.iter_content(100000):
       saveFile.write(chunk)
6. F12 brings up the developer tools in Chrome. Pressing CTRL-SHIFT-C (on Windows and Linux) or ⌘-OPTION-C (on OS X) brings up the developer tools in Firefox.
7. Right-click the element in the page, and select Inspect Element from the menu.
8. '#main'
9. '.highlight'
10. 'div div'
11. 'button[value="favorite"]'
12. spam.getText()
13. linkElem.attrs
14. The selenium module is imported with from selenium import webdriver.
15. The find_element_* methods return the first matching element as a WebElement object. The find_elements_* methods return a list of all matching elements as WebElement objects.
16. The click() and send_keys() methods simulate mouse clicks and keyboard keys, respectively.
17. Calling the submit() method on any element within a form submits the form.
18. The forward(), back(), and refresh() WebDriver object methods simulate these browser buttons.
Chapter 12
1. The openpyxl.load_workbook() function returns a Workbook object.
2. The get_sheet_names() method returns a list of strings of the names of the sheets in the workbook.
3. Call wb.get_sheet_by_name('Sheet1').
4. Call wb.get_active_sheet().
5. sheet['C5'].value or sheet.cell(row=5, column=3).value
6. sheet['C5'] = 'Hello' or sheet.cell(row=5, column=3).value = 'Hello'
7. cell.row and cell.column
8. They return the highest column and row with values in the sheet, respectively, as integer values.
9. openpyxl.cell.column_index_from_string('M')
10. openpyxl.cell.get_column_letter(14)
11. sheet['A1':'F1']
12. wb.save('example.xlsx')
13. A formula is set the same way as any value. Set the cell’s value attribute to a string of the formula text. Remember that formulas begin with the = sign.
14. When calling load_workbook(), pass True for the data_only keyword argument.
15. sheet.row_dimensions[5].height = 100
16. sheet.column_dimensions['C'].hidden = True
17. OpenPyXL 2.0.5 does not load freeze panes, print titles, images, or charts.
18. Freeze panes are rows and columns that will always appear on the screen. They are useful for headers.
19. openpyxl.charts.Reference(), openpyxl.charts.Series(), openpyxl.charts.BarChart(), chartObj.append(seriesObj), and add_chart()
Chapter 13
1. A File object returned from open()
2. Read-binary ('rb') for PdfFileReader() and write-binary ('wb') for PdfFileWriter()
3. Calling getPage(4) will return a Page object for About This Book, since page 0 is the first page.
4. The numPages variable stores an integer of the number of pages in the PdfFileReader object.
5. Call decrypt('swordfish').
6. The rotateClockwise() and rotateCounterClockwise() methods. The degrees to rotate is passed as an integer argument.
7. docx.Document('demo.docx')
8. A document contains multiple paragraphs. A paragraph begins on a new line and contains multiple runs. Runs are contiguous groups of characters within a paragraph.
9. Use doc.paragraphs.
10. A Run object has these variables (not a Paragraph).
11. True always makes the Run object bolded and False makes it always not bolded, no matter what the style’s bold setting is. None will make the Run object just use the style’s bold setting.
12. Call the docx.Document() function.
13. doc.add_paragraph('Hello there!')
14. The integers 0, 1, 2, 3, and 4
Chapter 14
1. In Excel, spreadsheets can have values of data types other than strings; cells can have different fonts, sizes, or color settings; cells can have varying widths and heights; adjacent cells can be merged; and you can embed images and charts.
2. You pass a File object, obtained from a call to open().
3. File objects need to be opened in read-binary ('rb') for Reader objects and write-binary ('wb') for Writer objects.
4. The writerow() method
5. The delimiter argument changes the string used to separate cells in a row. The lineterminator argument changes the string used to separate rows.
6. json.loads()
7. json.dumps()
Chapter 15
1. A reference moment that many date and time programs use. The moment is January 1st, 1970, UTC.
2. time.time()
3. time.sleep(5)
4. It returns the closest integer to the argument passed. For example, round(2.4) returns 2.
5. A datetime object represents a specific moment in time. A timedelta object represents a duration of time.
6. threadObj = threading.Thread(target=spam)
7. threadObj.start()
8. Make sure that code running in one thread does not read or write the same variables as code running in another thread.
9. subprocess.Popen('c:\\Windows\\System32\\calc.exe')
Chapter 16
1. SMTP and IMAP, respectively
2. smtplib.SMTP(), smtpObj.ehlo(), smtpObj.starttls(), and smtpObj.login()
3. imapclient.IMAPClient() and imapObj.login()
4. A list of strings of IMAP keywords, such as 'BEFORE <date>', 'FROM <string>', or 'SEEN'
5. Assign the variable imaplib._MAXLINE a large integer value, such as 10000000.
6. The pyzmail module reads downloaded emails.
7. You will need the Twilio account SID number, the authentication token number, and your Twilio phone number.
Chapter 17
1. An RGBA value is a tuple of 4 integers, each ranging from 0 to 255. The four integers correspond to the amount of red, green, blue, and alpha (transparency) in the color.
2. A function call to ImageColor.getcolor('CornflowerBlue', 'RGBA') will return (100, 149, 237, 255), the RGBA value for that color.
3. A box tuple is a tuple value of four integers: the left edge x-coordinate, the top edge y-coordinate, the right edge x-coordinate, and the bottom edge y-coordinate, respectively.
4. Image.open('zophie.png')
5. imageObj.size is a tuple of two integers, the width and the height.
6. imageObj.crop((0, 50, 50, 100)). Notice that you are passing a box tuple to crop(), not four separate integer arguments.
7. Call the imageObj.save('new_filename.png') method of the Image object.
8. The ImageDraw module contains code to draw on images.
9. ImageDraw objects have shape-drawing methods such as point(), line(), or rectangle(). They are returned by passing the Image object to the ImageDraw.Draw() function.
Chapter 18
1. Move the mouse to the top-left corner of the screen, that is, the (0, 0) coordinates.
2. pyautogui.size() returns a tuple with two integers for the width and height of the screen.
3. pyautogui.position() returns a tuple with two integers for the x- and y-coordinates of the mouse cursor.
4. The moveTo() function moves the mouse to absolute coordinates on the screen, while the moveRel() function moves the mouse relative to the mouse’s current position.
5. pyautogui.dragTo() and pyautogui.dragRel()
6. pyautogui.typewrite('Hello world!')
7. Either pass a list of keyboard key strings to pyautogui.typewrite() (such as 'left') or pass a single keyboard key string to pyautogui.press().
8. pyautogui.screenshot('screenshot.png')
9. pyautogui.PAUSE = 2
Appendix D. Resources
Visit http://nostarch.com/automatestuff/ for resources, errata, and more information.
IndexANOTEONTHEDIGITALINDEX
Alinkinanindexentryisdisplayedasthesectiontitleinwhichthatentryappears.Becausesomesectionshavemultipleindexmarkers,itisnotunusualforanentrytohaveseverallinkstothesamesection.Clickingonanylinkwilltakeyoudirectlytotheplaceinthetextinwhichthemarkerappears.
Symbols=(assignment)operator,StringConcatenationandReplication,ComparisonOperators
$(dollarsign),CharacterClasses,MatchingNewlineswiththeDotCharacter
.(dotcharacter),TheCaretandDollarSignCharacters
usinginpaths,TheCurrentWorkingDirectory
wildcardmatches,TheCaretandDollarSignCharacters
”(doublequotes),StringLiterals
**(exponent)operator,EnteringExpressionsintotheInteractiveShell
==(equalto)operator,BooleanValues,ComparisonOperators
/(forwardslash),FilesandFilePaths
divisionoperator,EnteringExpressionsintotheInteractiveShell,TheMultipleAssignmentTrick
\(backslash),StringLiterals,CreatingRegexObjects,MatchingNewlineswiththeDotCharacter,FilesandFilePaths
linecontinuationcharacter,ExampleProgram:Magic8BallwithaList
>(greaterthan)operator,BooleanValues
>=(greaterthanorequalto)operator,BooleanValues
#(hashcharacter),MultilineStringswithTripleQuotes
//(integerdivision/flooredquotient)operator,EnteringExpressionsintotheInteractiveShell
<(lessthan)operator,BooleanValues
<=(lessthanorequalto)operator,BooleanValues
%(modulus/remainder)operator,EnteringExpressionsintotheInteractiveShell,TheMultipleAssignmentTrick
*(multiplication)operator,EnteringExpressionsintotheInteractiveShell,GettingaList’sLengthwithlen(),TheMultipleAssignmentTrick
!=(notequalto)operator,BooleanValues
()(parentheses),MutableandImmutableDataTypes,ReviewofRegularExpressionMatching
|(pipecharacter),GroupingwithParentheses,ManagingComplexRegexes
+(plussign),OptionalMatchingwiththeQuestionMark,MatchingNewlineswiththeDotCharacter
additionoperator,EnteringExpressionsintotheInteractiveShell,TheInteger,Floating-Point,andStringDataTypes,GettingaList’sLengthwithlen(),TheMultipleAssignmentTrick
?(questionmark),MatchingMultipleGroupswiththePipe,MatchingNewlineswiththeDotCharacter
'(singlequote),StringLiterals
[](squarebrackets),TheListDataType,MatchingNewlineswiththeDotCharacter
*(star),MatchingNewlineswiththeDotCharacter
usingwithwildcardcharacter,TheWildcardCharacter
zeroormorematcheswith,OptionalMatchingwiththeQuestionMark
-(subtraction)operator,EnteringExpressionsintotheInteractiveShell,TheMultipleAssignmentTrick
^(caretsymbol),MatchingNewlineswiththeDotCharacter
matchingbeginningofstring,CharacterClasses
negativecharacterclasses,CharacterClasses
'''(triplequotes),EscapeCharacters,ManagingComplexRegexes
_(underscore),VariableNames
:(colon),BlocksofCode,whileLoopStatements,forLoopsandtherange()Function,NegativeIndexes,IndexingandSlicingStrings
{}(curlybrackets),DictionariesandStructuringData,MatchingNewlineswiththeDotCharacter
greedyvs.nongreedymatching,MatchingOneorMorewiththePlus
matchingspecificrepetitionswith,MatchingOneorMorewiththePlus
A
%Adirective,PausingUntilaSpecificDate
%adirective,PausingUntilaSpecificDate
absolutepaths,TheCurrentWorkingDirectory
abspath()function,Theos.pathModule
addition(+)operator,EnteringExpressionsintotheInteractiveShell,TheInteger,Floating-Point,andStringDataTypes,GettingaList’sLengthwithlen(),TheMultipleAssignmentTrick
additivecolormodel,ColorsandRGBAValues
add_heading()method,WritingWordDocuments
addPage()method,CreatingPDFs
add_paragraph()method,WritingWordDocuments
add_picture()method,AddingHeadings
add_run()method,WritingWordDocuments
algebraicchessnotation,PrettyPrinting
all_capsattribute,RunAttributes
ALLsearchkey,SelectingaFolder
alpha,defined,ComputerImageFundamentals
andoperator,ComparisonOperators
ANSWEREDsearchkey,PerformingtheSearch
API(applicationprogramminginterface),Step3:WriteOuttheCSVFileWithouttheFirstRow
append()method,Methods
application-specificpasswords,LoggingintotheSMTPServer
argskeyword,PassingArgumentstotheThread’sTargetFunction
arguments,function,Comments,defStatementswithParameters
keywordarguments,ReturnValuesandreturnStatements
passingtoprocesses,LaunchingOtherProgramsfromPython
passingtothreads,Multithreading
assertions,Assertions
assignment(=)operator,StringConcatenationandReplication,ComparisonOperators
AT&Tmail,ConnectingtoanSMTPServer,RetrievingandDeletingEmailswithIMAP
attributes,HTML,AQuickRefresher,GettingDatafromanElement’sAttributes
augmentedassignmentoperators,TheMultipleAssignmentTrick
B
\bbackspaceescapecharacter,Step3:GetandPrinttheMouseCoordinates
%Bdirective,PausingUntilaSpecificDate
%bdirective,PausingUntilaSpecificDate
back()method,SendingSpecialKeys
backslash(\),StringLiterals,CreatingRegexObjects,MatchingNewlineswiththeDotCharacter,FilesandFilePaths
BarChart()function,Charts
basename()function,HandlingAbsoluteandRelativePaths
BCCsearchkey,PerformingtheSearch
BeautifulSoup,ParsingHTMLwiththeBeautifulSoupModule
(seealsobs4module)
BeautifulSoupobjects,ParsingHTMLwiththeBeautifulSoupModule
BEFOREsearchkey,SelectingaFolder
binaryfiles,FindingFileSizesandFolderContents,WritingtoFiles
binaryoperators,ComparisonOperators
bitwiseoroperator,ManagingComplexRegexes
blankstrings,TheInteger,Floating-Point,andStringDataTypes
blockingexecution,Thetime.time()Function
blocksofcode,MixingBooleanandComparisonOperators
BODYsearchkey,SelectingaFolder
boldattribute,RunAttributes
Booleandatatype
binaryoperators,ComparisonOperators
flowcontroland,FlowControl
inoperator,TheinandnotinOperators
notinoperator,TheinandnotinOperators
“truthy”and“falsey”values,continueStatements
usingbinaryandcomparisonoperatorstogether,BinaryBooleanOperators
boxtuples,CoordinatesandBoxTuples
breakpoints,debuggingusing,DebuggingaNumberAddingProgram
breakstatements
overview,AnAnnoyingwhileLoop
usinginforloop,forLoopsandtherange()Function
browser,openingusingwebbrowsermodule,WebScraping
bs4module
creatingobjectfromHTML,ParsingHTMLwiththeBeautifulSoupModule
findingelementwithselect()method,CreatingaBeautifulSoupObjectfromHTML
gettingattribute,GettingDatafromanElement’sAttributes
overview,ParsingHTMLwiththeBeautifulSoupModule
built-infunctions,ImportingModules
bulletedlist,creatinginWikimarkup,Project:AddingBulletstoWikiMarkup
copyingandpastingclipboard,Project:AddingBulletstoWikiMarkup
joiningmodifiedlines,Step3:JointheModifiedLines
overview,Project:AddingBulletstoWikiMarkup
separatinglinesoftext,Step1:CopyandPastefromtheClipboard
C
callingfunctions,Comments
callstack,defined,RaisingExceptions
camelcase,VariableNames
caretsymbol(^),MatchingNewlineswiththeDotCharacter
matchingbeginningofstring,CharacterClasses
negativecharacterclasses,CharacterClasses
CascadingStyleSheets(CSS)
matchingwithseleniummodule,FindingElementsonthePage
selectors,CreatingaBeautifulSoupObjectfromHTML
casesensitivity,VariableNames,Case-InsensitiveMatching
CCsearchkey,PerformingtheSearch
Cellobjects,GettingSheetsfromtheWorkbook
cells,inExcelspreadsheets,WorkingwithExcelSpreadsheets
accessingCellobjectbyitsname,GettingSheetsfromtheWorkbook
mergingandunmerging,SettingRowHeightandColumnWidth
writingvaluesto,CreatingandRemovingSheets
center()method,Thejoin()andsplit()StringMethods,ImageRecognition
chainingmethodcalls,RotatingandFlippingImages
characterclasses,Thefindall()Method,MatchingNewlineswiththeDotCharacter
characterstyles,StylingParagraphandRunObjects
charts,Excel,FreezePanes
chdir()function,TheCurrentWorkingDirectory
Chrome,developertoolsin,ViewingtheSourceHTMLofaWebPage
clear()method,FindingElementsonthePage
click()function,ClickingtheMouse,ReviewofthePyAutoGUIFunctions,Project:AutomaticFormFiller
clickingmouse,ClickingtheMouse
click()method,FindingElementsonthePage
clipboard,usingstringfrom,Step3:HandletheClipboardContentandLaunchtheBrowser
CMYKcolormodel,ColorsandRGBAValues
colon(:),BlocksofCode,whileLoopStatements,forLoopsandtherange()Function,NegativeIndexes,IndexingandSlicingStrings
colorvalues
CMYKvs.RGBcolormodels,ColorsandRGBAValues
RGBAvalues,ComputerImageFundamentals
column_index_from_string()function,ConvertingBetweenColumnLettersandNumbers
columns,inExcelspreadsheets
settingheightandwidthof,Formulas
slicingWorksheetobjectstogetCellobjectsin,ConvertingBetweenColumnLettersandNumbers
Comcastmail,ConnectingtoanSMTPServer,RetrievingandDeletingEmailswithIMAP
comma-delimiteditems,TheListDataType
commandlinearguments,Step1:FigureOuttheURL
commentAfterDelay()function,PressingandReleasingtheKeyboard
comments
multiline,MultilineStringswithTripleQuotes
overview,Comments
comparisonoperators
overview,BooleanValues
usingbinaryoperatorswith,BinaryBooleanOperators
compile()function,CreatingRegexObjects,ReviewofRegularExpressionMatching,ManagingComplexRegexes
compressedfiles
backingupfolderinto,Step3:FormtheNewFilenameandRenametheFiles
creatingZIPfiles,ExtractingfromZIPFiles
extractingZIPfiles,ExtractingfromZIPFiles
overview,WalkingaDirectoryTree
readingZIPfiles,CompressingFileswiththezipfileModule
computerscreen
coordinatesof,PausesandFail-Safes
resolutionof,ControllingMouseMovement
concatenation
oflists,GettingaList’sLengthwithlen()
string,TheInteger,Floating-Point,andStringDataTypes
concurrencyissues,PassingArgumentstotheThread’sTargetFunction
conditions,defined,MixingBooleanandComparisonOperators
continuestatements
overview,continueStatements
usinginforloop,forLoopsandtherange()Function
CoordinatedUniversalTime(UTC),ThetimeModule
coordinates
ofcomputerscreen,PausesandFail-Safes
ofanimage,ColorsandRGBAValues
copy()function,PassingReferences,RemovingWhitespacewithstrip(),rstrip(),andlstrip(),OrganizingFiles,CopyingandPastingImagesontoOtherImages
copytree()function,OrganizingFiles
countdownproject,Project:SimpleCountdownProgram
countingdown,Project:SimpleCountdownProgram
overview,Project:SimpleCountdownProgram
playingsoundfile,Project:SimpleCountdownProgram
cProfile.run()function,Thetime.time()Function
crashes,program,EnteringExpressionsintotheInteractiveShell
create_sheet()method,CreatingandRemovingSheets
CRITICALlevel,LoggingLevels
cron,LaunchingOtherProgramsfromPython
croppingimages,WorkingwiththeImageDataType
CSS(CascadingStyleSheets)
matchingwithseleniummodule,FindingElementsonthePage
selectors,CreatingaBeautifulSoupObjectfromHTML
CSVfiles
defined,WorkingwithCSVFilesandJSONData
delimiterfor,ThedelimiterandlineterminatorKeywordArguments
formatoverview,WorkingwithCSVFilesandJSONData
lineterminatorfor,ThedelimiterandlineterminatorKeywordArguments
Readerobjects,ReaderObjects
readingdatainloop,ReadingDatafromReaderObjectsinaforLoop
removingheaderfrom,ThedelimiterandlineterminatorKeywordArguments
loopingthroughCSVfiles,Project:RemovingtheHeaderfromCSVFiles
overview,ThedelimiterandlineterminatorKeywordArguments
readinginCSVfile,Project:RemovingtheHeaderfromCSVFiles
writingoutCSVfile,Step2:ReadintheCSVFile
Writerobjects,ReadingDatafromReaderObjectsinaforLoop
curlybrackets({}),DictionariesandStructuringData,MatchingNewlineswiththeDotCharacter
greedyvs.nongreedymatching,MatchingOneorMorewiththePlus
matchingspecificrepetitionswith,MatchingOneorMorewiththePlus
currentworkingdirectory,TheCurrentWorkingDirectory
D
\Dcharacterclass,Thefindall()Method
\dcharacterclass,Thefindall()Method
%ddirective,PausingUntilaSpecificDate
datastructures
algebraicchessnotation,PrettyPrinting
tic-tac-toeboard,UsingDataStructurestoModelReal-WorldThings
datatypes
Booleans,FlowControl
defined,EnteringExpressionsintotheInteractiveShell
dictionaries,DictionariesandStructuringData
floating-pointnumbers,TheInteger,Floating-Point,andStringDataTypes
integers,TheInteger,Floating-Point,andStringDataTypes
list()function,TheTupleDataType
lists,TheListDataType
mutablevs.immutable,List-likeTypes:StringsandTuples
Nonevalue,ReturnValuesandreturnStatements
strings,TheInteger,Floating-Point,andStringDataTypes
tuple()function,TheTupleDataType
tuples,MutableandImmutableDataTypes
datetimemodule
arithmeticusing,ThetimedeltaDataType
convertingobjectstostrings,PausingUntilaSpecificDate
convertingstringstoobjects,ConvertingdatetimeObjectsintoStrings
fromtimestamp()function,ThedatetimeModule
now()function,ThedatetimeModule
overview,ThedatetimeModule,ReviewofPython’sTimeFunctions
pausingprogramuntiltime,PausingUntilaSpecificDate
timedeltadatatype,ThedatetimeModule
total_seconds()method,ThedatetimeModule
datetimeobjects,ThedatetimeModule
convertingtostrings,PausingUntilaSpecificDate
convertingfromstringsto,ConvertingdatetimeObjectsintoStrings
debug()function,UsingtheloggingModule
debugging
assertions,Assertions
defined,WhatIsPython?
gettingtracebackasstring,RaisingExceptions
inIDLE
overview,DisablingLogging
steppingthroughprogram,Over
usingbreakpoints,DebuggingaNumberAddingProgram
logging
disabling,LoggingLevels
tofile,DisablingLogging
levelsof,UsingtheloggingModule
loggingmodule,UsinganAssertioninaTrafficLightSimulation
print()functionand,UsingtheloggingModule
raisingexceptions,Debugging
DEBUGlevel,UsingtheloggingModule
decimalnumbers(seefloating-pointnumbers)
decode()method,GettingEmailAddressesfromaRawMessage
decryption,ofPDFfiles,ExtractingTextfromPDFs
deduplicatingcode,Functions
deepcopy()function,PassingReferences
defstatements,Functions
withparameters,defStatementswithParameters
DELETEDsearchkey,PerformingtheSearch
delete_messages()method,GettingtheBodyfromaRawMessage
deletingfiles/folders
permanently,MovingandRenamingFilesandFolders
usingsend2trashmodule,PermanentlyDeletingFilesandFolders
delstatements,RemovingValuesfromListswithdelStatements
dictionaries
copy()function,PassingReferences
deepcopy()function,PassingReferences
get()method,CheckingWhetheraKeyorValueExistsinaDictionary
inoperator,CheckingWhetheraKeyorValueExistsinaDictionary
items()method,Dictionariesvs.Lists
keys()method,Dictionariesvs.Lists
listsvs.,TheDictionaryDataType
nesting,ATic-Tac-ToeBoard
notinoperator,CheckingWhetheraKeyorValueExistsinaDictionary
overview,DictionariesandStructuringData
setdefault()method,Thesetdefault()Method
values()method,Dictionariesvs.Lists
directories
absolutevs.relativepaths,TheCurrentWorkingDirectory
backslashvs.forwardslash,FilesandFilePaths
copying,OrganizingFiles
creating,Absolutevs.RelativePaths
currentworkingdirectory,TheCurrentWorkingDirectory
defined,ReadingandWritingFiles
deletingpermanently,MovingandRenamingFilesandFolders
deletingusingsend2trashmodule,PermanentlyDeletingFilesandFolders
moving,CopyingFilesandFolders
os.pathmodule
absolutepathsin,Theos.pathModule
filesizes,HandlingAbsoluteandRelativePaths
foldercontents,HandlingAbsoluteandRelativePaths
overview,Theos.pathModule
pathvalidity,FindingFileSizesandFolderContents
relativepathsin,Theos.pathModule
renaming,CopyingFilesandFolders
walking,SafeDeleteswiththesend2trashModule
dirname()function,HandlingAbsoluteandRelativePaths
disable()function,LoggingLevels
division(/)operator,EnteringExpressionsintotheInteractiveShell,TheMultipleAssignmentTrick
Documentobjects,WordDocuments
dollarsign($),CharacterClasses,MatchingNewlineswiththeDotCharacter
dotcharacter(.),TheCaretandDollarSignCharacters
usinginpaths,TheCurrentWorkingDirectory
wildcardmatches,TheCaretandDollarSignCharacters
dot-starcharacter(.*),TheWildcardCharacter
doubleClick()function,ClickingtheMouse,ReviewofthePyAutoGUIFunctions
doublequotes("),StringLiterals
double_strikeattribute,RunAttributes
downloading
filesfromweb,SavingDownloadedFilestotheHardDrive
webpages,DownloadingFilesfromtheWebwiththerequestsModule
XKCDcomics,Step3:OpenWebBrowsersforEachResult,Project:MultithreadedXKCDDownloader
DRAFTsearchkey,PerformingtheSearch
draggingmouse,ClickingtheMouse
dragRel()function,ClickingtheMouse,DraggingtheMouse,ReviewofthePyAutoGUIFunctions
dragTo()function,ClickingtheMouse,ReviewofthePyAutoGUIFunctions
drawingonimages
ellipses,Lines
exampleprogram,Lines
ImageDrawmodule,IdeasforSimilarPrograms
lines,IdeasforSimilarPrograms
points,IdeasforSimilarPrograms
polygons,Lines
rectangles,Lines
text,DrawingExample
dumps()function,ReadingJSONwiththeloads()Function
durationkeywordarguments,ControllingMouseMovement
E
ehlo()method,ConnectingtoanSMTPServer,Step3:SendCustomizedEmailReminders
elements,HTML,SavingDownloadedFilestotheHardDrive
elifstatements,elseStatements
ellipse()method,Lines
elsestatements,ifStatements
emailaddresses,extracting,Combiningre.IGNORECASE,re.DOTALL,andre.VERBOSE
creatingregex,Project:PhoneNumberandEmailAddressExtractor
findingmatchesonclipboard,Step2:CreateaRegexforEmailAddresses
joiningmatchesintoastring,Step3:FindAllMatchesintheClipboardText
overview,Combiningre.IGNORECASE,re.DOTALL,andre.VERBOSE
emails
deleting,GettingtheBodyfromaRawMessage
disconnectingfromserver,GettingtheBodyfromaRawMessage
fetching
folders,ConnectingtoanIMAPServer
gettingmessagecontent,SizeLimits
loggingintoserver,ConnectingtoanIMAPServer
overview,DisconnectingfromtheSMTPServer
rawmessages,FetchinganEmailandMarkingItAsRead
gmail_search()method,SizeLimits
IMAP,DisconnectingfromtheSMTPServer
markingmessageasread,SizeLimits
searching,ConnectingtoanIMAPServer
sending
connectingtoSMTPserver,ConnectingtoanSMTPServer
disconnectingfromserver,DisconnectingfromtheSMTPServer
loggingintoserver,ConnectingtoanSMTPServer
overview,SMTP
reminder,DisconnectingfromtheIMAPServer
sending“hello”message,ConnectingtoanSMTPServer
sendingmessage,LoggingintotheSMTPServer
TLSencryption,ConnectingtoanSMTPServer
SMTP,SMTP
embossattribute,RunAttributes
encryption,ofPDFfiles,OverlayingPages
endswith()method,TheisXStringMethods
epochtimestamps,ThetimeModule,ThedatetimeModule,ReviewofPython’sTimeFunctions
equalto(==)operator,BooleanValues,ComparisonOperators
ERRORlevel,LoggingLevels
errors
crashesand,EnteringExpressionsintotheInteractiveShell
helpfor,StartingIDLE
escapecharacters,StringLiterals
evaluation,defined,EnteringExpressionsintotheInteractiveShell
Excelspreadsheets
applicationsupport,WorkingwithExcelSpreadsheets
chartsin,FreezePanes
columnwidth,Formulas
convertingbetweencolumnlettersandnumbers,ConvertingBetweenColumnLettersandNumbers
creatingdocuments,IdeasforSimilarPrograms
creatingworksheets,CreatingandRemovingSheets
deletingworksheets,CreatingandRemovingSheets
fontstyles,SettingtheFontStyleofCells
formulasin,FontObjects
freezingpanes,MergingandUnmergingCells
gettingcellvalues,GettingSheetsfromtheWorkbook
gettingrowsandcolumns,ConvertingBetweenColumnLettersandNumbers
gettingworksheetnames,GettingSheetsfromtheWorkbook
mergingandunmergingcells,SettingRowHeightandColumnWidth
openingdocuments,ReadingExcelDocuments
openpyxlmodule,WorkingwithExcelSpreadsheets
overview,WorkingwithExcelSpreadsheets
readingfiles
overview,GettingRowsandColumnsfromtheSheets
populatingdatastructure,Step1:ReadtheSpreadsheetData
readingdata,Project:ReadingDatafromaSpreadsheet
writingresultstofile,Step2:PopulatetheDataStructure
andreminderemailsproject,DisconnectingfromtheIMAPServer
rowheight,Formulas
savingworkbooks,IdeasforSimilarPrograms
updating,WritingValuestoCells
overview,WritingValuestoCells
setup,Project:UpdatingaSpreadsheet
workbooksvs.,WorkingwithExcelSpreadsheets
writingvaluestocells,CreatingandRemovingSheets
Exceptionobjects,RaisingExceptions
exceptions
assertionsand,Assertions
gettingtracebackasstring,RaisingExceptions
handling,TheglobalStatement
raising,Debugging
execution,program
defined,FlowControl
overview,BlocksofCode
pausinguntilspecifictime,PausingUntilaSpecificDate
terminatingprogramwithsys.exit(),ImportingModules
exists()function,FindingFileSizesandFolderContents
exitcodes,LaunchingOtherProgramsfromPython
expandkeyword,RotatingandFlippingImages
exponent(**)operator,EnteringExpressionsintotheInteractiveShell
expressions
conditionsand,MixingBooleanandComparisonOperators
ininteractiveshell,EnteringExpressionsintotheInteractiveShell
expunge()method,GettingtheBodyfromaRawMessage
extensions,file,ReadingandWritingFiles
extractall()method,ExtractingfromZIPFiles
extractingZIPfiles,ExtractingfromZIPFiles
extract()method,ExtractingfromZIPFiles
F
FailSafeExceptionexception,Step2:SetUpCoordinates
“falsey”values,continueStatements
fetch()method,PerformingtheSearch,SizeLimits
fileeditor,VariableNames
filemanagement
absolutevs.relativepaths,TheCurrentWorkingDirectory
backslashvs.forwardslash,FilesandFilePaths
compressedfiles
backingupto,Step3:FormtheNewFilenameandRenametheFiles
creatingZIPfiles,ExtractingfromZIPFiles
extractingZIPfiles,ExtractingfromZIPFiles
overview,WalkingaDirectoryTree
readingZIPfiles,CompressingFileswiththezipfileModule
creatingdirectories,Absolutevs.RelativePaths
currentworkingdirectory,TheCurrentWorkingDirectory
multiclipboardproject,Step4:WriteContenttotheQuizandAnswerKeyFiles
openingfiles,TheFileReading/WritingProcess
os.pathmodule
absolutepathsin,Theos.pathModule
filesizes,HandlingAbsoluteandRelativePaths
foldercontents,HandlingAbsoluteandRelativePaths
overview,Theos.pathModule
pathvalidity,FindingFileSizesandFolderContents
relativepathsin,Theos.pathModule
overview,ReadingandWritingFiles
paths,ReadingandWritingFiles
plaintextvs.binaryfiles,FindingFileSizesandFolderContents
readingfiles,OpeningFileswiththeopen()Function
renamingfiles,datestyles,CreatingandAddingtoZIPFiles
savingvariableswithpformat()function,SavingVariableswiththeshelveModule
send2trashmodule,PermanentlyDeletingFilesandFolders
shelvemodule,WritingtoFiles
shutilmodule
copyingfiles/folders,OrganizingFiles
deletingfiles/folders,MovingandRenamingFilesandFolders
movingfiles/folders,CopyingFilesandFolders
renamingfiles/folders,CopyingFilesandFolders
walkingdirectorytrees,SafeDeleteswiththesend2trashModule
writingfiles,ReadingtheContentsofFiles
filenames,defined,ReadingandWritingFiles
Fileobjects,OpeningFileswiththeopen()Function
findall()method,GreedyandNongreedyMatching
find_element_by_*methods,StartingaSelenium-ControlledBrowser
find_elements_by_*methods,StartingaSelenium-ControlledBrowser
Firefox,developertoolsin,OpeningYourBrowser’sDeveloperTools
FLAGGEDsearchkey,PerformingtheSearch
flippingimages,RotatingandFlippingImages
float()function,Thelen()Function
floating-pointnumbers
integerequivalence,Thestr(),int(),andfloat()Functions
overview,TheInteger,Floating-Point,andStringDataTypes
rounding,Thetime.sleep()Function
flowcontrol
binaryoperators,ComparisonOperators
blocksofcode,MixingBooleanandComparisonOperators
Booleanvaluesand,FlowControl
breakstatements,AnAnnoyingwhileLoop
comparisonoperators,BooleanValues
conditions,MixingBooleanandComparisonOperators
continuestatements,continueStatements
elifstatements,elseStatements
elsestatements,ifStatements
ifstatements,BlocksofCode
overview,FlowControl
usingbinaryandcomparisonoperatorstogether,BinaryBooleanOperators
whileloops,whileLoopStatements
folders
absolutevs.relativepaths,TheCurrentWorkingDirectory
backinguptoZIPfile,Step3:FormtheNewFilenameandRenametheFiles
creatingnewZIPfile,Step1:FigureOuttheZIPFile’sName
figuringoutZIPfilename,Project:BackingUpaFolderintoaZIPFile
walkingdirectorytree,Step1:FigureOuttheZIPFile’sName
backslashvs.forwardslash,FilesandFilePaths
copying,OrganizingFiles
creating,Absolutevs.RelativePaths
currentworkingdirectory,TheCurrentWorkingDirectory
defined,ReadingandWritingFiles
deletingpermanently,MovingandRenamingFilesandFolders
deletingusingsend2trashmodule,PermanentlyDeletingFilesandFolders
moving,CopyingFilesandFolders
os.pathmodule
absolutepathsin,Theos.pathModule
filesizes,HandlingAbsoluteandRelativePaths
foldercontents,HandlingAbsoluteandRelativePaths
overview,Theos.pathModule
pathvalidity,FindingFileSizesandFolderContents
relativepathsin,Theos.pathModule
renaming,CopyingFilesandFolders
walkingdirectorytrees,SafeDeleteswiththesend2trashModule
Fontobjects,SettingtheFontStyleofCells
fontstyles,inExcelspreadsheets,SettingtheFontStyleofCells
forloops
overview,continueStatements
usingdictionaryitemsin,Thekeys(),values(),anditems()Methods
usinglistswith,UsingforLoopswithLists
formatattribute,WorkingwiththeImageDataType
format_descriptionattribute,WorkingwiththeImageDataType
formDatalist,Step2:SetUpCoordinates
formfillerproject,ReviewofthePyAutoGUIFunctions
overview,ReviewofthePyAutoGUIFunctions
radiobuttons,Step3:StartTypingData
selectlists,Step3:StartTypingData
settingupcoordinates,Step1:FigureOuttheSteps
stepsinprocess,Project:AutomaticFormFiller
submittingform,Step4:HandleSelectListsandRadioButtons
typingdata,Step2:SetUpCoordinates
formulas,inExcelspreadsheets,FontObjects
forward()method,SendingSpecialKeys
forwardslash(/),FilesandFilePaths
FROMsearchkey,PerformingtheSearch
fromtimestamp()function,ThedatetimeModule,ReviewofPython’sTimeFunctions
functions,ReviewofthePyAutoGUIFunctions
(seealsonamesofindividualfunctions)
arguments,Comments,defStatementswithParameters
as“blackbox”,TheglobalStatement
built-in,ImportingModules
defstatements,defStatementswithParameters
exceptionhandling,TheglobalStatement
keywordarguments,ReturnValuesandreturnStatements
Nonevalueand,ReturnValuesandreturnStatements
overview,Functions
parameters,defStatementswithParameters
returnvalues,defStatementswithParameters
G
get_active_sheet()method,GettingSheetsfromtheWorkbook
get_addresses()method,GettingEmailAddressesfromaRawMessage
get_attribute()method,FindingElementsonthePage
getcolor()function,ComputerImageFundamentals,WorkingwiththeImageDataType
get_column_letter()function,ConvertingBetweenColumnLettersandNumbers
getcwd()function,TheCurrentWorkingDirectory
get()function
overview,CheckingWhetheraKeyorValueExistsinaDictionary
requestsmodule,DownloadingFilesfromtheWebwiththerequestsModule
get_highest_column()method,GettingCellsfromtheSheets,Step1:OpentheExcelFile
get_highest_row()method,GettingCellsfromtheSheets
get_payload()method,GettingEmailAddressesfromaRawMessage
getpixel()function,ChangingIndividualPixels,ScrollingtheMouse,AnalyzingtheScreenshot
get_sheet_by_name()method,GettingSheetsfromtheWorkbook
get_sheet_names()method,GettingSheetsfromtheWorkbook
getsize()function,HandlingAbsoluteandRelativePaths
get_subject()method,GettingEmailAddressesfromaRawMessage
getText()function,ReadingWordDocuments
GIFformat,WorkingwiththeImageDataType
globalscope,LocalandGlobalVariableswiththeSameName
Gmail,ConnectingtoanSMTPServer,LoggingintotheSMTPServer,RetrievingandDeletingEmailswithIMAP
gmail_search()method,SizeLimits
GoogleMaps,WebScraping
graphicaluserinterfaceautomation(seeGUI(graphicaluserinterface)automation)
greaterthan(>)operator,BooleanValues
greaterthanorequalto(>=)operator,BooleanValues
greedymatching
dot-starfor,TheWildcardCharacter
inregularexpressions,MatchingOneorMorewiththePlus
group()method,CreatingRegexObjects,ReviewofRegularExpressionMatching
groups,regularexpression
matching
greedy,MatchingOneorMorewiththePlus
nongreedy,GreedyandNongreedyMatching
oneormore,OptionalMatchingwiththeQuestionMark
optional,MatchingMultipleGroupswiththePipe
specificrepetitions,MatchingOneorMorewiththePlus
zeroormore,OptionalMatchingwiththeQuestionMark
usingparentheses,ReviewofRegularExpressionMatching
usingpipecharacterin,GroupingwithParentheses
GuesstheNumberprogram,ExceptionHandling
GUI(graphicaluserinterface)automation,ReviewofthePyAutoGUIFunctions
(seealsoformfillerproject)
controllingkeyboard,ImageRecognition
hotkeycombinations,PressingandReleasingtheKeyboard
keynames,SendingaStringfromtheKeyboard
pressingandreleasing,KeyNames
sendingstringfromkeyboard,ImageRecognition
controllingmouse,PausesandFail-Safes,Step3:GetandPrinttheMouseCoordinates
clickingmouse,ClickingtheMouse
draggingmouse,ClickingtheMouse
scrollingmouse,DraggingtheMouse
determiningmouseposition,MovingtheMouse
imagerecognition,Project:ExtendingthemouseNowProgram
installingpyautoguimodule,ControllingtheKeyboardandMousewithGUIAutomation
loggingoutofprogram,ControllingtheKeyboardandMousewithGUIAutomation
overview,ControllingtheKeyboardandMousewithGUIAutomation
screenshots,ScrollingtheMouse
stoppingprogram,ControllingtheKeyboardandMousewithGUIAutomation
H
%Hdirective,PausingUntilaSpecificDate
hashcharacter(#),MultilineStringswithTripleQuotes
headings,Worddocument,WritingWordDocuments
help
askingonline,HowtoFindHelp
forerrormessages,StartingIDLE
hotkeycombinations,PressingandReleasingtheKeyboard
hotkey()function,PressingandReleasingtheKeyboard,ReviewofthePyAutoGUIFunctions
Hotmail.com,ConnectingtoanSMTPServer,RetrievingandDeletingEmailswithIMAP
HTML(HypertextMarkupLanguage)
browserdevelopertoolsand,ViewingtheSourceHTMLofaWebPage
findingelements,UsingtheDeveloperToolstoFindHTMLElements
learningresources,SavingDownloadedFilestotheHardDrive
overview,SavingDownloadedFilestotheHardDrive
viewingpagesource,AQuickRefresher
I
%Idirective,PausingUntilaSpecificDate
idattribute,AQuickRefresher
IDLE(interactivedevelopmentenvironment)
creatingprograms,VariableNames
debuggingin
overview,DisablingLogging
steppingthroughprogram,Over
usingbreakpoints,DebuggingaNumberAddingProgram
expressionsin,EnteringExpressionsintotheInteractiveShell
overview,StartingIDLE
runningscriptsoutsideof,CopyingandPastingStringswiththepyperclipModule
starting,DownloadingandInstallingPython
ifstatements
overview,BlocksofCode
usinginwhileloop,whileLoopStatements
ImageDrawmodule,IdeasforSimilarPrograms
ImageDrawobjects,IdeasforSimilarPrograms
ImageFontobjects,DrawingExample
Imageobjects,ManipulatingImageswithPillow
images
addinglogoto,Project:AddingaLogo
attributesfor,WorkingwiththeImageDataType
boxtuples,CoordinatesandBoxTuples
colorvaluesin,ComputerImageFundamentals
coordinatesin,ColorsandRGBAValues
copyingandpastingin,CopyingandPastingImagesontoOtherImages
cropping,WorkingwiththeImageDataType
drawingon
exampleprogram,Lines
ellipses,Lines
ImageDrawmodule,IdeasforSimilarPrograms
lines,IdeasforSimilarPrograms
points,IdeasforSimilarPrograms
polygons,Lines
rectangles,Lines
text,DrawingExample
flipping,RotatingandFlippingImages
openingwithPillow,CoordinatesandBoxTuples
pixelmanipulation,ChangingIndividualPixels
recognitionof,Project:ExtendingthemouseNowProgram
resizing,CopyingandPastingImagesontoOtherImages
RGBAvalues,ComputerImageFundamentals
rotating,RotatingandFlippingImages
transparentpixels,CopyingandPastingImagesontoOtherImages
IMAP(InternetMessageAccessProtocol)
defined,DisconnectingfromtheSMTPServer
deletingmessages,GettingtheBodyfromaRawMessage
disconnectingfromserver,GettingtheBodyfromaRawMessage
fetchingmessages,SizeLimits
folders,ConnectingtoanIMAPServer
loggingintoserver,ConnectingtoanIMAPServer
searchingmessages,ConnectingtoanIMAPServer
imapclientmodule,DisconnectingfromtheSMTPServer
IMAPClientobjects,RetrievingandDeletingEmailswithIMAP
immutabledatatypes,List-likeTypes:StringsandTuples
importingmodules
overview,ImportingModules
pyautoguimodule,MovingtheMouse
imprintattribute,RunAttributes
imvariable,ScrollingtheMouse
indentation,ExampleProgram:Magic8BallwithaList
indexes
fordictionaries(seekeys,dictionary)
forlists
changingvaluesusing,GettingaList’sLengthwithlen()
gettingvalueusing,TheListDataType
negative,NegativeIndexes
removingvaluesfromlistusing,RemovingValuesfromListswithdelStatements
forstrings,MultilineStringswithTripleQuotes
IndexError,TheDictionaryDataType
index()method,Methods
infiniteloops,AnAnnoyingwhileLoop,continueStatements,Step1:ImporttheModule
INFOlevel,UsingtheloggingModule
inoperator
usingwithdictionaries,CheckingWhetheraKeyorValueExistsinaDictionary
usingwithlists,TheinandnotinOperators
usingwithstrings,IndexingandSlicingStrings
input()function
overview,Comments,Methods
usingforsensitiveinformation,LoggingintotheSMTPServer
installing
openpyxlmodule,WorkingwithExcelSpreadsheets
pyautoguimodule,ControllingtheKeyboardandMousewithGUIAutomation
Python,AboutThisBook
seleniummodule,Step4:SavetheImageandFindthePreviousComic
third-partymodules,InstallingThird-PartyModules
int,TheInteger,Floating-Point,andStringDataTypes
(seealsointegers)
integerdivision/flooredquotient(//)operator,EnteringExpressionsintotheInteractiveShell
integers
floating-pointequivalence,Thestr(),int(),andfloat()Functions
overview,TheInteger,Floating-Point,andStringDataTypes
interactivedevelopmentenvironment(seeIDLE(interactivedevelopmentenvironment))
interactiveshell(seeIDLE)
InternetExplorer,developertoolsin,ViewingtheSourceHTMLofaWebPage
InternetMessageAccessProtocol(seeIMAP(InternetMessageAccessProtocol))
interpreter,Python,DownloadingandInstallingPython
int()function,Thelen()Function
isabs()function,Theos.pathModule
isalnum()method,Theupper(),lower(),isupper(),andislower()StringMethods
isalpha()method,Theupper(),lower(),isupper(),andislower()StringMethods
isdecimal()method,Theupper(),lower(),isupper(),andislower()StringMethods
isdir()function,FindingFileSizesandFolderContents
is_displayed()method,FindingElementsonthePage
is_enabled()method,FindingElementsonthePage
isfile()function,FindingFileSizesandFolderContents
islower()method,Theupper(),lower(),isupper(),andislower()StringMethods
is_selected()method,FindingElementsonthePage
isspace()method,TheisXStringMethods
istitle()method,TheisXStringMethods
isupper()method,Theupper(),lower(),isupper(),andislower()StringMethods
italicattribute,RunAttributes
items()method,Dictionariesvs.Lists
iter_content()method,SavingDownloadedFilestotheHardDrive
J
%jdirective,PausingUntilaSpecificDate
join()method,TheisXStringMethods,FilesandFilePaths,Theos.pathModule,Step2:CreateandStartThreads
JPEGformat,WorkingwiththeImageDataType
JSONfiles
APIsfor,Step3:WriteOuttheCSVFileWithouttheFirstRow
defined,WorkingwithCSVFilesandJSONData
formatoverview,Step3:WriteOuttheCSVFileWithouttheFirstRow
reading,JSONandAPIs
andweatherdataproject,ReadingJSONwiththeloads()Function
writing,ReadingJSONwiththeloads()Function
justifyingtext,Thejoin()andsplit()StringMethods
K
keyboard
controlling,withPyAutoGUI
hotkeycombinations,PressingandReleasingtheKeyboard
pressingandreleasingkeys,KeyNames
sendingstringfromkeyboard,ImageRecognition
keynames,SendingaStringfromtheKeyboard
KeyboardInterruptexception,Step2:TrackandPrintLapTimes,MovingtheMouse,Step1:ImporttheModule
keyDown()function,KeyNames,PressingandReleasingtheKeyboard,ReviewofthePyAutoGUIFunctions
keys,dictionary,DictionariesandStructuringData
keys()method,Dictionariesvs.Lists
keyUp()function,KeyNames,PressingandReleasingtheKeyboard,ReviewofthePyAutoGUIFunctions
keywordarguments,ReturnValuesandreturnStatements
L
LARGERsearchkey,PerformingtheSearch
launchd,LaunchingOtherProgramsfromPython
launchingprograms
andcountdownproject,Project:SimpleCountdownProgram
openingfileswithdefaultapplications,TaskScheduler,launchd,andcron
openingwebsites,TaskScheduler,launchd,andcron
overview,Step2:CreateandStartThreads
passingcommandlineargumentstoprocesses,LaunchingOtherProgramsfromPython
poll()method,LaunchingOtherProgramsfromPython
runningPythonscripts,TaskScheduler,launchd,andcron
scheduling,LaunchingOtherProgramsfromPython
sleep()function,TaskScheduler,launchd,andcron
wait()method,LaunchingOtherProgramsfromPython
len()function,WordDocuments
findingnumberofvaluesinlist,GettingaList’sLengthwithlen()
overview,Theinput()Function
lessthan(<)operator,BooleanValues
lessthanorequalto(<=)operator,BooleanValues
LibreOffice,WorkingwithExcelSpreadsheets,Step4:SavetheResults
linebreaks,Worddocument,AddingHeadings
LineChart()function,Charts
linecontinuationcharacter(\),ExampleProgram:Magic8BallwithaList
line()method,IdeasforSimilarPrograms
linkedstyles,StylingParagraphandRunObjects
Linux
backslashvs.forwardslash,FilesandFilePaths
cron,LaunchingOtherProgramsfromPython
installingPython,DownloadingandInstallingPython
installingthird-partymodules,ThepipTool
launchingprocessesfromPython,LaunchingOtherProgramsfromPython
loggingoutofautomationprogram,ControllingtheKeyboardandMousewithGUIAutomation
openingfileswithdefaultapplications,TaskScheduler,launchd,andcron
piptoolon,InstallingThird-PartyModules
Pythonsupport,WhatIsPython?
runningPythonprogramson,RunningPythonProgramsonOSXandLinux
startingIDLE,StartingIDLE
Unixphilosophy,OpeningFileswithDefaultApplications
listdir()function,HandlingAbsoluteandRelativePaths
list_folders()method,ConnectingtoanIMAPServer
list()function,ReaderObjects,ImageRecognition
lists
append()method,Methods
augmentedassignmentoperators,TheMultipleAssignmentTrick
changingvaluesusingindex,GettingaList’sLengthwithlen()
concatenationof,GettingaList’sLengthwithlen()
copy()function,PassingReferences
deepcopy()function,PassingReferences
dictionariesvs.,TheDictionaryDataType
findingnumberofvaluesusinglen(),GettingaList’sLengthwithlen()
gettingsublistswithslices,NegativeIndexes
gettingvalueusingindex,TheListDataType
index()method,Methods
inoperator,TheinandnotinOperators
insert()method,Methods
list()function,TheTupleDataType
Magic8Ballexampleprogramusing,SortingtheValuesinaListwiththesort()Method
multipleassignmenttrick,TheinandnotinOperators
mutablevs.immutabledatatypes,List-likeTypes:StringsandTuples
negativeindexes,NegativeIndexes
nesting,ATic-Tac-ToeBoard
notinoperator,TheinandnotinOperators
overview,TheListDataType
remove()method,AddingValuestoListswiththeappend()andinsert()Methods
removingvaluesfrom,RemovingValuesfromListswithdelStatements
replicationof,GettingaList’sLengthwithlen()
sort()method,RemovingValuesfromListswithremove()
storingvariablesas,RemovingValuesfromListswithdelStatements
usingwithforloops,UsingforLoopswithLists
ljust()method,Thejoin()andsplit()StringMethods
load_workbook()function,ReadingExcelDocuments
loads()function,JSONandAPIs,Step2:DownloadtheJSONData
localscope,LocalandGlobalScope
locateAllOnScreen()function,ImageRecognition
locateOnScreen()function,Project:ExtendingthemouseNowProgram
locationattribute,FindingElementsonthePage
logging
disabling,LoggingLevels
tofile,DisablingLogging
levelsof,UsingtheloggingModule
print()functionand,UsingtheloggingModule
loggingmodule,UsinganAssertioninaTrafficLightSimulation
loggingout,ofautomationprogram,ControllingtheKeyboardandMousewithGUIAutomation
login()method,ConnectingtoanSMTPServer,ConnectingtoanIMAPServer,Step3:SendCustomizedEmailReminders
logo,addingtoanimage,Project:AddingaLogo
loopingoverfiles,Step1:OpentheLogoImage
openinglogoimage,Project:AddingaLogo
overview,Step3:ResizetheImages
resizingimage,Step2:LoopOverAllFilesandOpenImages
logout()method,GettingtheBodyfromaRawMessage
LogRecordobjects,UsinganAssertioninaTrafficLightSimulation
loops
breakstatements,AnAnnoyingwhileLoop
continuestatements,continueStatements
forloop,continueStatements
range()functionfor,AnEquivalentwhileLoop
readingdatafromCSVfile,ReadingDatafromReaderObjectsinaforLoop
usinglistswith,UsingforLoopswithLists
whileloop,whileLoopStatements
lower()method,Theupper(),lower(),isupper(),andislower()StringMethods
lstrip()method,JustifyingTextwithrjust(),ljust(),andcenter()
M
%Mdirective,PausingUntilaSpecificDate
%mdirective,PausingUntilaSpecificDate
MacOSX(seeOSX)
Magic8Ballexampleprogram,SortingtheValuesinaListwiththesort()Method
makedirs()function,Absolutevs.RelativePaths,Step2:LoopOverAllFilesandOpenImages
maps,openwhenlocationiscopied,WebScraping
figuringoutURL,WebScraping
handlingclipboardcontent,Step3:HandletheClipboardContentandLaunchtheBrowser
handlingcommandlineargument,Step1:FigureOuttheURL
launchingbrowser,Step3:HandletheClipboardContentandLaunchtheBrowser
overview,WebScraping
Matchobjects,CreatingRegexObjects
math
operatorsfor,EnteringExpressionsintotheInteractiveShell
programmingand,WhatIsPython?
mergePage()method,OverlayingPages
Messageobjects,SendingTextMessages
methods
chainingcalls,RotatingandFlippingImages
defined,Methods
dictionary
get()method,CheckingWhetheraKeyorValueExistsinaDictionary
items()method,Dictionariesvs.Lists
keys()method,Dictionariesvs.Lists
setdefault()method,Thesetdefault()Method
values()method,Dictionariesvs.Lists
list
append()method,Methods
index()method,Methods
insert()method,Methods
remove()method,AddingValuestoListswiththeappend()andinsert()Methods
sort()method,RemovingValuesfromListswithremove()
string
center()method,Thejoin()andsplit()StringMethods
copy()method,RemovingWhitespacewithstrip(),rstrip(),andlstrip()
endswith()method,TheisXStringMethods
isalnum()method,Theupper(),lower(),isupper(),andislower()StringMethods
isalpha()method,Theupper(),lower(),isupper(),andislower()StringMethods
isdecimal()method,Theupper(),lower(),isupper(),andislower()StringMethods
islower()method,Theupper(),lower(),isupper(),andislower()StringMethods
isspace()method,TheisXStringMethods
istitle()method,TheisXStringMethods
isupper()method,Theupper(),lower(),isupper(),andislower()StringMethods
join()method,TheisXStringMethods
ljust()method,Thejoin()andsplit()StringMethods
lower()method,Theupper(),lower(),isupper(),andislower()StringMethods
lstrip()method,JustifyingTextwithrjust(),ljust(),andcenter()
paste()method,RemovingWhitespacewithstrip(),rstrip(),andlstrip()
rjust()method,Thejoin()andsplit()StringMethods
rstrip()method,JustifyingTextwithrjust(),ljust(),andcenter()
split()method,TheisXStringMethods
startswith()method,TheisXStringMethods
strip()method,JustifyingTextwithrjust(),ljust(),andcenter()
upper()method,Theupper(),lower(),isupper(),andislower()StringMethods
MicrosoftWindows(seeWindowsOS)
middleClick()function,ClickingtheMouse,ReviewofthePyAutoGUIFunctions
modules
importing,ImportingModules
third-party,installing,ThepipTool
modulus/remainder(%)operator,EnteringExpressionsintotheInteractiveShell,TheMultipleAssignmentTrick
MontyPython,WhatIsPython?
mouse
controlling,PausesandFail-Safes,Step3:GetandPrinttheMouseCoordinates
clickingmouse,ClickingtheMouse
draggingmouse,ClickingtheMouse
scrollingmouse,DraggingtheMouse
determiningpositionof,MovingtheMouse
locating,MovingtheMouse
gettingcoordinates,Step1:ImporttheModule
handlingKeyboardInterruptexception,Step1:ImporttheModule
importingpyautoguimodule,Step1:ImporttheModule
infiniteloop,Step1:ImporttheModule
overview,MovingtheMouse
andpixels,identifyingcolorsof,AnalyzingtheScreenshot
mouseDown()function,ClickingtheMouse,ReviewofthePyAutoGUIFunctions
mouse.position()function,Step1:ImporttheModule
mouseUp()function,ReviewofthePyAutoGUIFunctions
move()function,CopyingFilesandFolders
moveRel()function,ControllingMouseMovement,MovingtheMouse,ClickingtheMouse,ReviewofthePyAutoGUIFunctions
moveTo()function,ControllingMouseMovement,ClickingtheMouse,ReviewofthePyAutoGUIFunctions
movingfiles/folders,CopyingFilesandFolders
multiclipboardproject,Step4:WriteContenttotheQuizandAnswerKeyFiles
listingkeywords,Step2:SaveClipboardContentwithaKeyword
loadingkeywordcontent,Step2:SaveClipboardContentwithaKeyword
overview,Step4:WriteContenttotheQuizandAnswerKeyFiles
savingclipboardcontent,Step1:CommentsandShelfSetup
settingupshelffile,Step1:CommentsandShelfSetup
multilinecomments,MultilineStringswithTripleQuotes
multilinestrings,EscapeCharacters
multipleassignmenttrick,TheinandnotinOperators
multiplication(*)operator,EnteringExpressionsintotheInteractiveShell,GettingaList’sLengthwithlen(),TheMultipleAssignmentTrick
multithreading
concurrencyissues,PassingArgumentstotheThread’sTargetFunction
downloadingmultipleimages,Project:MultithreadedXKCDDownloader
creatingandstartingthreads,Step1:ModifytheProgramtoUseaFunction
usingdownloadXkcd()function,Project:MultithreadedXKCDDownloader
waitingforthreadstoend,Step2:CreateandStartThreads
join()method,Step2:CreateandStartThreads
overview,Multithreading
passingargumentstothreads,Multithreading
start()method,Multithreading,PassingArgumentstotheThread’sTargetFunction
Thread()function,Multithreading
mutabledatatypes,List-likeTypes:StringsandTuples
N
NameError,RemovingValuesfromListswithdelStatements
namelist()method,CompressingFileswiththezipfileModule
negativecharacterclasses,CharacterClasses
negativeindexes,NegativeIndexes
nestedlistsanddictionaries,ATic-Tac-ToeBoard
newlinekeywordargument,ReadingDatafromReaderObjectsinaforLoop
Nonevalue,ReturnValuesandreturnStatements
nongreedymatching
dot,star,andquestionmarkfor,TheWildcardCharacter
inregularexpressions,GreedyandNongreedyMatching
notequalto(!=)operator,BooleanValues
notinoperator
usingwithdictionaries,CheckingWhetheraKeyorValueExistsinaDictionary
usingwithlists,TheinandnotinOperators
usingwithstrings,IndexingandSlicingStrings
notoperator,BinaryBooleanOperators
NOTsearchkey,PerformingtheSearch
now()function,ThedatetimeModule,ReviewofPython’sTimeFunctions
O
ONsearchkey,SelectingaFolder
open()function,TheFileReading/WritingProcess,WebScraping,TaskScheduler,launchd,andcron,ManipulatingImageswithPillow
openingfiles,TheFileReading/WritingProcess
OpenOffice,WorkingwithExcelSpreadsheets,Step4:SavetheResults
openprogram,TaskScheduler,launchd,andcron
openpyxlmodule,installing,WorkingwithExcelSpreadsheets
operators
augmentedassignment,TheMultipleAssignmentTrick
binary,ComparisonOperators
comparison,BooleanValues
defined,EnteringExpressionsintotheInteractiveShell
math,EnteringExpressionsintotheInteractiveShell
usingbinaryandcomparisonoperatorstogether,BinaryBooleanOperators
orderofoperations,EnteringExpressionsintotheInteractiveShell
oroperator,BinaryBooleanOperators
ORsearchkey,PerformingtheSearch
OSX
backslashvs.forwardslash,FilesandFilePaths
installingPython,DownloadingandInstallingPython
installingthird-partymodules,ThepipTool
launchd,LaunchingOtherProgramsfromPython
launchingprocessesfromPython,OpeningFileswithDefaultApplications
loggingoutofautomationprogram,ControllingtheKeyboardandMousewithGUIAutomation
openingfileswithdefaultapplications,TaskScheduler,launchd,andcron
piptoolon,InstallingThird-PartyModules
Pythonsupport,WhatIsPython?
runningPythonprogramson,RunningPythonProgramsonOSXandLinux
startingIDLE,StartingIDLE
Unixphilosophy,OpeningFileswithDefaultApplications
outlineattribute,RunAttributes
Outlook.com,ConnectingtoanSMTPServer,RetrievingandDeletingEmailswithIMAP
P
%pdirective,PausingUntilaSpecificDate
pagebreaks,Worddocument,AddingHeadings
Pageobjects,ExtractingTextfromPDFs
Paragraphobjects,WordDocuments
paragraphs,Worddocument,GettingtheFullTextfroma.docxFile
parameters,function,defStatementswithParameters
parentheses(),MutableandImmutableDataTypes,ReviewofRegularExpressionMatching
parsing,defined,ParsingHTMLwiththeBeautifulSoupModule
passingarguments,Comments
passingreferences,PassingReferences
passwords
application-specific,LoggingintotheSMTPServer
managingproject,CopyingandPastingStringswiththepyperclipModule
command-linearguments,Step1:ProgramDesignandDataStructures
copyingpassword,Step1:ProgramDesignandDataStructures
datastructures,CopyingandPastingStringswiththepyperclipModule
overview,CopyingandPastingStringswiththepyperclipModule
pastebin.com,Summary
paste()method,RemovingWhitespacewithstrip(),rstrip(),andlstrip(),CopyingandPastingImagesontoOtherImages
paths
absolutevs.relative,TheCurrentWorkingDirectory
backslashvs.forwardslash,FilesandFilePaths
currentworkingdirectory,TheCurrentWorkingDirectory
overview,ReadingandWritingFiles
os.pathmodule
absolutepathsin,Theos.pathModule
filesizes,HandlingAbsoluteandRelativePaths
foldercontents,HandlingAbsoluteandRelativePaths
overview,Theos.pathModule
pathvalidity,FindingFileSizesandFolderContents
relativepathsin,Theos.pathModule
PAUSEvariable,PausesandFail-Safes,Step2:SetUpCoordinates
PdfFileReaderobjects,ExtractingTextfromPDFs
PDFfiles
combiningpagesfrommultiplefiles,EncryptingPDFs
addingpages,EncryptingPDFs
findingPDFfiles,Step1:FindAllPDFFiles
openingPDFs,Step1:FindAllPDFFiles
overview,EncryptingPDFs
savingresults,Step2:OpenEachPDF
creating,DecryptingPDFs
decrypting,ExtractingTextfromPDFs
encrypting,OverlayingPages
extractingtextfrom,PDFDocuments
formatoverview,WorkingwithPDFandWordDocuments
pagesin
copying,CreatingPDFs
overlaying,OverlayingPages
rotating,CopyingPages
PdfFileWriterobjects,DecryptingPDFs
pformat()function
overview,Thesetdefault()Method
savingvariablesintextfilesusing,SavingVariableswiththeshelveModule
phonenumbers,extracting,Combiningre.IGNORECASE,re.DOTALL,andre.VERBOSE
creatingregex,Project:PhoneNumberandEmailAddressExtractor
findingmatchesonclipboard,Step2:CreateaRegexforEmailAddresses
joiningmatchesintoastring,Step3:FindAllMatchesintheClipboardText
overview,Combiningre.IGNORECASE,re.DOTALL,andre.VERBOSE
Pillow
copyingandpastinginimages,CopyingandPastingImagesontoOtherImages
croppingimages,WorkingwiththeImageDataType
drawingonimages
ellipses,Lines
exampleprogram,Lines
ImageDrawmodule,IdeasforSimilarPrograms
lines,IdeasforSimilarPrograms
points,IdeasforSimilarPrograms
polygons,Lines
rectangles,Lines
text,DrawingExample
flippingimages,RotatingandFlippingImages
imageattributes,WorkingwiththeImageDataType
module,ComputerImageFundamentals
openingimages,CoordinatesandBoxTuples
pixelmanipulation,ChangingIndividualPixels
resizingimages,CopyingandPastingImagesontoOtherImages
rotatingimages,RotatingandFlippingImages
transparentpixels,CopyingandPastingImagesontoOtherImages
pipecharacter(|),GroupingwithParentheses,ManagingComplexRegexes
piptool,InstallingThird-PartyModules
pixelMatchesColor()function,AnalyzingtheScreenshot,Step3:StartTypingData
pixels,ComputerImageFundamentals,ChangingIndividualPixels
plaintextfiles,FindingFileSizesandFolderContents
plussign(+),OptionalMatchingwiththeQuestionMark,MatchingNewlineswiththeDotCharacter
PNGformat,WorkingwiththeImageDataType
point()method,IdeasforSimilarPrograms
poll()method,LaunchingOtherProgramsfromPython
polygon()method,Lines
Popen()function,Step2:CreateandStartThreads
openingfileswithdefaultapplications,TaskScheduler,launchd,andcron
passingcommandlineargumentsto,LaunchingOtherProgramsfromPython
position()function,MovingtheMouse,Step3:GetandPrinttheMouseCoordinates
pprint()function,Thesetdefault()Method
precedenceofmathoperators,EnteringExpressionsintotheInteractiveShell
press()function,PressingandReleasingtheKeyboard,ReviewofthePyAutoGUIFunctions,Step4:HandleSelectListsandRadioButtons
print()function,Step3:StartTypingData
loggingand,UsingtheloggingModule
overview,Comments
passingmultipleargumentsto,KeywordArgumentsandprint()
usingvariableswith,Theinput()Function
processes
andcountdownproject,Project:SimpleCountdownProgram
defined,Step2:CreateandStartThreads
openingfileswithdefaultapplications,TaskScheduler,launchd,andcron
openingwebsites,TaskScheduler,launchd,andcron
passingcommandlineargumentsto,LaunchingOtherProgramsfromPython
poll()method,LaunchingOtherProgramsfromPython
Popen()function,Step2:CreateandStartThreads
wait()method,LaunchingOtherProgramsfromPython
profilingcode,ThetimeModule
programming
blocksofcode,MixingBooleanandComparisonOperators
comments,Comments
creativityneededfor,ProgrammingIsaCreativeActivity
deduplicatingcode,Functions
defined,Conventions
exceptionhandling,TheglobalStatement
execution,program,BlocksofCode
functionsas“blackboxes”,TheglobalStatement
globalscope,LocalandGlobalVariableswiththeSameName
indentation,ExampleProgram:Magic8BallwithaList
localscope,LocalandGlobalScope
mathand,WhatIsPython?
Python,WhatIsPython?
terminatingprogramwithsys.exit(),ImportingModules
projects
AddingBulletstoWikiMarkup,Project:AddingBulletstoWikiMarkup
AddingaLogo,Project:AddingaLogo
AutomaticFormFiller,ReviewofthePyAutoGUIFunctions
BackingUpaFolderintoaZIPFile,Step3:FormtheNewFilenameandRenametheFiles
CombiningSelectPagesfromManyPDFs,EncryptingPDFs
DownloadingAllXKCDComics,Step3:OpenWebBrowsersforEachResult
ExtendingthemouseNowProgram,AnalyzingtheScreenshot
FetchingCurrentWeatherData,ReadingJSONwiththeloads()Function
GeneratingRandomQuizFiles,SavingVariableswiththepprint.pformat()Function
“I’mFeelingLucky”GoogleSearch,GettingDatafromanElement’sAttributes
“JustTextMe”Module,Project:“JustTextMe”Module
mapIt.pywiththewebbrowserModule,WebScraping
Multiclipboard,Step4:WriteContenttotheQuizandAnswerKeyFiles
MultithreadedXKCDDownloader,Project:MultithreadedXKCDDownloader
PasswordLocker,CopyingandPastingStringswiththepyperclipModule
PhoneNumberandEmailAddressExtractor,Combiningre.IGNORECASE,re.DOTALL,andre.VERBOSE
ReadingDatafromaSpreadsheet,GettingRowsandColumnsfromtheSheets
RemovingtheHeaderfromCSVFiles,ThedelimiterandlineterminatorKeywordArguments
RenamingFileswithAmerican-StyleDatestoEuropean-StyleDates,CreatingandAddingtoZIPFiles
SendingMemberDuesReminderEmails,DisconnectingfromtheIMAPServer
SimpleCountdownProgram,Project:SimpleCountdownProgram
SuperStopwatch,Thetime.sleep()Function
UpdatingaSpreadsheet,WritingValuestoCells
“WhereIstheMouseRightNow?”,MovingtheMouse
putpixel()method,ChangingIndividualPixels
pyautogui.click()function,Project:AutomaticFormFiller
pyautogui.click()method,ClickingtheMouse
pyautogui.doubleClick()function,ClickingtheMouse
pyautogui.dragTo()function,ClickingtheMouse
pyautogui.FailSafeExceptionexception,PausesandFail-Safes
pyautogui.hotkey()function,PressingandReleasingtheKeyboard
pyautogui.keyDown()function,KeyNames
pyautogui.keyUp()function,KeyNames
pyautogui.middleClick()function,ClickingtheMouse
pyautoguimodule
formfillerproject,ReviewofthePyAutoGUIFunctions
controllingkeyboard,ImageRecognition
hotkeycombinations,PressingandReleasingtheKeyboard
keynames,SendingaStringfromtheKeyboard
pressingandreleasingkeys,KeyNames
sendingstringfromkeyboard,ImageRecognition
controllingmouse,PausesandFail-Safes,Step3:GetandPrinttheMouseCoordinates
clickingmouse,ClickingtheMouse
draggingmouse,ClickingtheMouse
scrollingmouse,DraggingtheMouse
documentationfor,ControllingtheKeyboardandMousewithGUIAutomation
fail-safefeature,PausesandFail-Safes
functions,ReviewofthePyAutoGUIFunctions
imagerecognition,Project:ExtendingthemouseNowProgram
importing,MovingtheMouse
installing,ControllingtheKeyboardandMousewithGUIAutomation
pausingfunctioncalls,PausesandFail-Safes
screenshots,ScrollingtheMouse
pyautogui.mouseDown()function,ClickingtheMouse
pyautogui.moveRel()function,ControllingMouseMovement,MovingtheMouse
pyautogui.moveTo()function,ControllingMouseMovement
pyautogui.PAUSEvariable,PausesandFail-Safes
pyautogui.position()function,Step3:GetandPrinttheMouseCoordinates
pyautogui.press()function,Step4:HandleSelectListsandRadioButtons
pyautogui.rightClick()function,ClickingtheMouse
pyautogui.screenshot()function,ScrollingtheMouse
pyautogui.size()function,ControllingMouseMovement
pyautogui.typewrite()function,ImageRecognition,SendingaStringfromtheKeyboard,Project:AutomaticFormFiller
py.exeprogram,ShebangLine
pyobjcmodule,ThepipTool
PyPDF2module
combiningpagesfrommultiplePDFs,EncryptingPDFs
creatingPDFs,DecryptingPDFs
decryptingPDFs,ExtractingTextfromPDFs
encryptingPDFs,OverlayingPages
extractingtextfromPDFs,PDFDocuments
formatoverview,WorkingwithPDFandWordDocuments
pagesinPDFs
copying,CreatingPDFs
overlaying,OverlayingPages
rotating,CopyingPages
pyperclipmodule,RemovingWhitespacewithstrip(),rstrip(),andlstrip()
Python
datatypes,EnteringExpressionsintotheInteractiveShell
downloading,AboutThisBook
exampleprogram,VariableNames
help,StartingIDLE
installing,AboutThisBook
interactiveshell,StartingIDLE
interpreter,defined,DownloadingandInstallingPython
mathand,WhatIsPython?
overview,WhatIsPython?
programmingoverview,Conventions
startingIDLE,DownloadingandInstallingPython
python-docxmodule,Step4:SavetheResults
pyzmailmodule,DisconnectingfromtheSMTPServer,FetchinganEmailandMarkingItAsRead
PyzMessageobjects,FetchinganEmailandMarkingItAsRead
Q
questionmark(?),MatchingMultipleGroupswiththePipe,MatchingNewlineswiththeDotCharacter
quit()method,SendingSpecialKeys,DisconnectingfromtheSMTPServer,Step3:SendCustomizedEmailReminders
quizgenerator,SavingVariableswiththepprint.pformat()Function
creatingquizfile,Step2:CreatetheQuizFileandShuffletheQuestionOrder
creatingansweroptions,Step3:CreatetheAnswerOptions
overview,SavingVariableswiththepprint.pformat()Function
shufflingquestionorder,Step2:CreatetheQuizFileandShuffletheQuestionOrder
storingquizdataindictionary,Step1:StoretheQuizDatainaDictionary
writingcontenttofiles,Step3:CreatetheAnswerOptions
R
radiobuttons,Step3:StartTypingData
raise_for_status()method,DownloadingaWebPagewiththerequests.get()Function
raisekeyword,Debugging
range()function,AnEquivalentwhileLoop
rawstrings,EscapeCharacters,CreatingRegexObjects
Readerobjects,ReaderObjects
readingfiles,OpeningFileswiththeopen()Function,CompressingFileswiththezipfileModule
readlines()method,OpeningFileswiththeopen()Function
read()method,OpeningFileswiththeopen()Function
rectangle()method,Lines
Reddit,HowtoFindHelp
Referenceobjects,Charts
references
overview,TheTupleDataType
passing,PassingReferences
refresh()method,SendingSpecialKeys
Regexobjects
creating,FindingPatternsofTextWithoutRegularExpressions
matching,CreatingRegexObjects
regularexpressions
beginningofstringmatches,CharacterClasses
casesensitivity,Case-InsensitiveMatching
characterclasses,Thefindall()Method
creatingRegexobjects,FindingPatternsofTextWithoutRegularExpressions
defined,PatternMatchingwithRegularExpressions
endofstringmatches,CharacterClasses
extractingphonenumbersandemailaddresses,Combiningre.IGNORECASE,re.DOTALL,andre.VERBOSE
findall()method,GreedyandNongreedyMatching
findingtextwithout,PatternMatchingwithRegularExpressions
greedymatching,MatchingOneorMorewiththePlus
grouping
matchingspecificrepetitions,MatchingOneorMorewiththePlus
oneormorematches,OptionalMatchingwiththeQuestionMark
optionalmatching,MatchingMultipleGroupswiththePipe
usingparentheses,ReviewofRegularExpressionMatching
usingpipecharacterin,GroupingwithParentheses
zeroormorematches,OptionalMatchingwiththeQuestionMark
HTMLand,OpeningYourBrowser’sDeveloperTools
matchingwith,CreatingRegexObjects
multipleargumentsforcompile()function,ManagingComplexRegexes
nongreedymatching,GreedyandNongreedyMatching
patternsfor,FindingPatternsofTextWithoutRegularExpressions
spreadingovermultiplelines,ManagingComplexRegexes
substitutingstringsusing,Case-InsensitiveMatching
symbolreference,MatchingNewlineswiththeDotCharacter
wildcardcharacter,TheCaretandDollarSignCharacters
relativepaths,TheCurrentWorkingDirectory
relpath()function,Theos.pathModule,HandlingAbsoluteandRelativePaths
remainder/modulus(%)operator,EnteringExpressionsintotheInteractiveShell,TheMultipleAssignmentTrick
remove()method,AddingValuestoListswiththeappend()andinsert()Methods
remove_sheet()method,CreatingandRemovingSheets
renamingfiles/folders,CopyingFilesandFolders
datestyles,CreatingandAddingtoZIPFiles
creatingregexfordates,CreatingandAddingtoZIPFiles
identifyingdatesinfilenames,Step1:CreateaRegexforAmerican-StyleDates
overview,CreatingandAddingtoZIPFiles
renamingfiles,Step3:FormtheNewFilenameandRenametheFiles
replication
oflists,GettingaList’sLengthwithlen()
string,StringConcatenationandReplication
requestsmodule
downloadingfiles,SavingDownloadedFilestotheHardDrive
downloadingpages,DownloadingFilesfromtheWebwiththerequestsModule
resolutionofcomputerscreen,ControllingMouseMovement
Responseobjects,DownloadingFilesfromtheWebwiththerequestsModule
returnvalues,function,defStatementswithParameters
reversekeyword,RemovingValuesfromListswithremove()
RGBAvalues,ComputerImageFundamentals
RGBcolormodel,ColorsandRGBAValues
rightClick()function,ClickingtheMouse,ReviewofthePyAutoGUIFunctions
rjust()method,Thejoin()andsplit()StringMethods,Step3:GetandPrinttheMouseCoordinates
rmdir()function,MovingandRenamingFilesandFolders
rmtree()function,MovingandRenamingFilesandFolders
rotateClockwise()method,CopyingPages
rotateCounterClockwise()method,CopyingPages
rotatingimages,RotatingandFlippingImages
roundingnumbers,Thetime.sleep()Function
rows,inExcelspreadsheets
settingheightandwidthof,Formulas
slicingWorksheetobjectstogetCellobjectsin,ConvertingBetweenColumnLettersandNumbers
rstrip()method,JustifyingTextwithrjust(),ljust(),andcenter()
rtlattribute,RunAttributes
Runobjects,StylingParagraphandRunObjects,RunAttributes
runningprograms
onLinux,RunningPythonProgramsonOSXandLinux
onOSX,RunningPythonProgramsonOSXandLinux
overview,RunningPrograms
onWindows,ShebangLine
shebangline,RunningPrograms
S
\Scharacterclass,Thefindall()Method
\scharacterclass,Thefindall()Method
%Sdirective,PausingUntilaSpecificDate
Safari,developertoolsin,OpeningYourBrowser’sDeveloperTools
save()method,ManipulatingImageswithPillow
scope
global,LocalandGlobalVariableswiththeSameName
local,LocalandGlobalScope
screenshot()function,ScrollingtheMouse,ReviewofthePyAutoGUIFunctions
screenshots
analyzing,AnalyzingtheScreenshot
getting,ScrollingtheMouse
scripts
runningfromPythonprogram,TaskScheduler,launchd,andcron
runningoutsideofIDLE,CopyingandPastingStringswiththepyperclipModule
scroll()function,DraggingtheMouse,ScrollingtheMouse,ReviewofthePyAutoGUIFunctions
scrollingmouse,DraggingtheMouse
searching
email,ConnectingtoanIMAPServer
theWeb,GettingDatafromanElement’sAttributes
findingresults,Step1:GettheCommandLineArgumentsandRequesttheSearchPage
gettingcommandlinearguments,Step1:GettheCommandLineArgumentsandRequesttheSearchPage
openingwebbrowserforresults,Step2:FindAlltheResults
overview,GettingDatafromanElement’sAttributes
requestingsearchpage,Step1:GettheCommandLineArgumentsandRequesttheSearchPage
search()method,CreatingRegexObjects
SEENsearchkey,PerformingtheSearch
seeprogram,TaskScheduler,launchd,andcron
select_folder()method,SelectingaFolder
selectlists,Step3:StartTypingData
select()method,bs4module,CreatingaBeautifulSoupObjectfromHTML
selectors,CSS,CreatingaBeautifulSoupObjectfromHTML,FindingElementsonthePage
seleniummodule
clickingbuttons,SendingSpecialKeys
findingelements,StartingaSelenium-ControlledBrowser
followinglinks,FindingElementsonthePage
installing,Step4:SavetheImageandFindthePreviousComic
sendingspecialkeystrokes,FillingOutandSubmittingForms
submittingforms,FindingElementsonthePage
usingFirefoxwith,Step4:SavetheImageandFindthePreviousComic
send2trashmodule,PermanentlyDeletingFilesandFolders
sendingreminderemails,DisconnectingfromtheIMAPServer
findingunpaidmembers,Step2:FindAllUnpaidMembers
openingExcelfile,DisconnectingfromtheIMAPServer
overview,DisconnectingfromtheIMAPServer
sendingemails,Step2:FindAllUnpaidMembers
send_keys()method,FindingElementsonthePage
sendmail()method,LoggingintotheSMTPServer,Step3:SendCustomizedEmailReminders
sequencenumbers,FetchinganEmailandMarkingItAsRead
sequences,UsingforLoopswithLists
setdefault()method,Thesetdefault()Method
shadowattribute,RunAttributes
shebangline,RunningPrograms
shelvemodule,WritingtoFiles
ShortMessageService(SMS)
sendingmessages,SendingTextMessages
Twilioservice,SendingTextMessageswithTwilio
shutilmodule
deletingfiles/folders,MovingandRenamingFilesandFolders
movingfiles/folders,CopyingFilesandFolders
renamingfiles/folders,CopyingFilesandFolders
SID(stringID),SendingTextMessages
SimpleMailTransferProtocol(seeSMTP(SimpleMailTransferProtocol))
SINCEsearchkey,SelectingaFolder
singlequote('),StringLiterals
single-threadedprograms,Multithreading
size()function,ControllingMouseMovement
sleep()function,Thetime.time()Function,PausingUntilaSpecificDate,ReviewofPython’sTimeFunctions,TaskScheduler,launchd,andcron
slices
gettingsublistswith,NegativeIndexes
forstrings,MultilineStringswithTripleQuotes
small_capsattribute,RunAttributes
SMALLERsearchkey,PerformingtheSearch
SMS(ShortMessageService)
sendingmessages,SendingTextMessages
Twilioservice,SendingTextMessageswithTwilio
SMTP(SimpleMailTransferProtocol)
connectingtoserver,ConnectingtoanSMTPServer
defined,SMTP
disconnectingfromserver,DisconnectingfromtheSMTPServer
loggingintoserver,ConnectingtoanSMTPServer
sending“hello”message,ConnectingtoanSMTPServer
sendingmessage,LoggingintotheSMTPServer
TLSencryption,ConnectingtoanSMTPServer
SMTPobjects,ConnectingtoanSMTPServer
sort()method,RemovingValuesfromListswithremove()
soundfiles,playing,Project:SimpleCountdownProgram
sourcecode,defined,Conventions
split()method,TheisXStringMethods,HandlingAbsoluteandRelativePaths,WorkingwithCSVFilesandJSONData
spreadsheets(seeExcelspreadsheets)
squarebrackets[],TheListDataType
StackOverflow,HowtoFindHelp
standardlibrary,ImportingModules
star(*),TheWildcardCharacter,MatchingNewlineswiththeDotCharacter
usingwithwildcardcharacter,TheWildcardCharacter
zeroormorematcheswith,OptionalMatchingwiththeQuestionMark
start()method,Multithreading,PassingArgumentstotheThread’sTargetFunction,Step1:ModifytheProgramtoUseaFunction
startprogram,TaskScheduler,launchd,andcron
startswith()method,TheisXStringMethods
starttls()method,ConnectingtoanSMTPServer,Step3:SendCustomizedEmailReminders
stepargument,AnEquivalentwhileLoop
stopwatchproject,Thetime.sleep()Function
overview,Thetime.sleep()Function
setup,Project:SuperStopwatch
trackinglaptimes,Project:SuperStopwatch
strftime()function,PausingUntilaSpecificDate,ReviewofPython’sTimeFunctions
str()function,Thelen()Function,TheTupleDataType,Step3:GetandPrinttheMouseCoordinates
strikeattribute,RunAttributes
stringID(SID),SendingTextMessages
strings
center()method,Thejoin()andsplit()StringMethods
concatenation,TheInteger,Floating-Point,andStringDataTypes
convertingdatetimeobjectsto,PausingUntilaSpecificDate
convertingtodatetimeobjects,ConvertingdatetimeObjectsintoStrings
copyingandpasting,RemovingWhitespacewithstrip(),rstrip(),andlstrip()
doublequotesfor,StringLiterals
endswith()method,TheisXStringMethods
escapecharacters,StringLiterals
extractingPDFtextas,PDFDocuments
gettingtracebackas,RaisingExceptions
indexesfor,MultilineStringswithTripleQuotes
inoperator,IndexingandSlicingStrings
isalnum()method,Theupper(),lower(),isupper(),andislower()StringMethods
isalpha()method,Theupper(),lower(),isupper(),andislower()StringMethods
isdecimal()method,Theupper(),lower(),isupper(),andislower()StringMethods
islower()method,Theupper(),lower(),isupper(),andislower()StringMethods
isspace()method,TheisXStringMethods
istitle()method,TheisXStringMethods
isupper()method,Theupper(),lower(),isupper(),andislower()StringMethods
join()method,TheisXStringMethods
literals,StringLiterals
ljust()method,Thejoin()andsplit()StringMethods
lower()method,Theupper(),lower(),isupper(),andislower()StringMethods
lstrip()method,JustifyingTextwithrjust(),ljust(),andcenter()
multiline,EscapeCharacters
mutablevs.immutabledatatypes,List-likeTypes:StringsandTuples
notinoperator,IndexingandSlicingStrings
overview,TheInteger,Floating-Point,andStringDataTypes
raw,EscapeCharacters
replicationof,StringConcatenationandReplication
rjust()method,Thejoin()andsplit()StringMethods
rstrip()method,JustifyingTextwithrjust(),ljust(),andcenter()
slicing,MultilineStringswithTripleQuotes
split()method,TheisXStringMethods
startswith()method,TheisXStringMethods
strip()method,JustifyingTextwithrjust(),ljust(),andcenter()
substitutingusingregularexpressions,Case-InsensitiveMatching
upper()method,Theupper(),lower(),isupper(),andislower()StringMethods
strip()method,JustifyingTextwithrjust(),ljust(),andcenter()
strptime()function,ConvertingdatetimeObjectsintoStrings,ReviewofPython’sTimeFunctions
strs,TheInteger,Floating-Point,andStringDataTypes
(seealsostrings)
Styleobjects,SettingtheFontStyleofCells
SUBJECTsearchkey,SelectingaFolder
sublists,gettingwithslices,NegativeIndexes
sub()method,Case-InsensitiveMatching
submitButtonColorvariable,Step1:FigureOuttheSteps,Step3:StartTypingData
submitButtonvariable,Step1:FigureOuttheSteps
submit()method,FillingOutandSubmittingForms
subprocessmodule,KeepingTime,SchedulingTasks,andLaunchingPrograms,Step2:CreateandStartThreads
subtraction(-)operator,EnteringExpressionsintotheInteractiveShell,TheMultipleAssignmentTrick
subtractivecolormodel,ColorsandRGBAValues
Sudokupuzzles,WhatIsPython?
sys.exit()function,ImportingModules
T
tag_name attribute, Finding Elements on the Page
Tagobjects,CreatingaBeautifulSoupObjectfromHTML
tags,HTML,SavingDownloadedFilestotheHardDrive
TaskScheduler,LaunchingOtherProgramsfromPython
termination,program,YourFirstProgram,ImportingModules
textattribute,ReadingWordDocuments,RunAttributes
textmessaging
automaticnotifications,Project:“JustTextMe”Module
sendingmessages,SendingTextMessages
Twilioservice,SendingTextMessageswithTwilio
text()method,DrawingExample
TEXTsearchkey,SelectingaFolder
textsize()method,DrawingText
third-partymodules,installing,InstallingThird-PartyModules
Thread()function,Multithreading,Step1:ModifytheProgramtoUseaFunction
threadingmodule,KeepingTime,SchedulingTasks,andLaunchingPrograms,Multithreading
Threadobjects,Multithreading
threads
concurrencyissues,PassingArgumentstotheThread’sTargetFunction
join()method,Step2:CreateandStartThreads
multithreading,Multithreading
imagedownloader,Project:MultithreadedXKCDDownloader
passingargumentsto,Multithreading
processesvs.,Step2:CreateandStartThreads
tic-tac-toeboard,UsingDataStructurestoModelReal-WorldThings
timedeltadatatype,ThedatetimeModule,ReviewofPython’sTimeFunctions
timedeltaobjects,ThedatetimeModule
timemodule
overview,ReviewofPython’sTimeFunctions
sleep()function,Thetime.time()Function,PausingUntilaSpecificDate
stopwatchproject,Thetime.sleep()Function
time()function,ThetimeModule
TLSencryption,ConnectingtoanSMTPServer
top-leveldomains,Step2:CreateaRegexforEmailAddresses
TOsearchkey,PerformingtheSearch
total_seconds()method,ThedatetimeModule,ReviewofPython’sTimeFunctions
traceback,gettingfromerror,RaisingExceptions
transparency,ComputerImageFundamentals,CopyingandPastingImagesontoOtherImages
transpose()method,RotatingandFlippingImages
triple quotes ('''), Escape Characters, Managing Complex Regexes
truetype()function,DrawingText
truthtables,ComparisonOperators
“truthy”values,continueStatements
tupledatatype
overview,MutableandImmutableDataTypes
tuple()function,TheTupleDataType
twiliomodule,SendingTextMessageswithTwilio
TwilioRestClientobjects,SendingTextMessages
Twilioservice
automatictextmessages,Project:“JustTextMe”Module
overview,SendingTextMessageswithTwilio
sendingtextmessages,SendingTextMessages
TypeError,GettingIndividualValuesinaListwithIndexes,List-likeTypes:StringsandTuples
typewrite()function,ImageRecognition,SendingaStringfromtheKeyboard,ReviewofthePyAutoGUIFunctions,Project:AutomaticFormFiller,Step3:StartTypingData,Step4:HandleSelectListsandRadioButtons
U
Ubuntu, Downloading and Installing Python
cron,LaunchingOtherProgramsfromPython
launchingprocessesfromPython,LaunchingOtherProgramsfromPython
openingfileswithdefaultapplications,TaskScheduler,launchd,andcron
Unixphilosophy,OpeningFileswithDefaultApplications
UNANSWEREDsearchkey,PerformingtheSearch
UNDELETEDsearchkey,PerformingtheSearch
underlineattribute,RunAttributes
underscore(_),VariableNames
UNDRAFTsearchkey,PerformingtheSearch
UNFLAGGEDsearchkey,PerformingtheSearch
Unicodeencodings,SavingDownloadedFilestotheHardDrive
Unixepoch,ThetimeModule,ThedatetimeModule,ReviewofPython’sTimeFunctions
Unixphilosophy,OpeningFileswithDefaultApplications
unlink()function,MovingandRenamingFilesandFolders
UNSEENsearchkey,PerformingtheSearch
upper()method,Theupper(),lower(),isupper(),andislower()StringMethods
UTC(CoordinatedUniversalTime),ThetimeModule
V
ValueError, The Multiple Assignment Trick, Converting datetime Objects into Strings
values,defined,EnteringExpressionsintotheInteractiveShell,FindingPatternsofTextWithoutRegularExpressions
values()method,Dictionariesvs.Lists
variables,Methods
(seealsolists)
assignmentstatements,StringConcatenationandReplication
defined,StringConcatenationandReplication
global,LocalandGlobalVariableswiththeSameName
initializing,AssignmentStatements
local,LocalandGlobalScope
naming,VariableNames
Nonevalueand,ReturnValuesandreturnStatements
overwriting,AssignmentStatements
references,TheTupleDataType
savingwithshelvemodule,WritingtoFiles
storingaslist,RemovingValuesfromListswithdelStatements
Verizonmail,ConnectingtoanSMTPServer,RetrievingandDeletingEmailswithIMAP
volumes,defined,FilesandFilePaths
W
\W character class, The findall() Method
\wcharacterclass,Thefindall()Method
%wdirective,PausingUntilaSpecificDate
walk()function,SafeDeleteswiththesend2trashModule,LaunchingOtherProgramsfromPython
WARNINGlevel,UsingtheloggingModule
weatherdata,fetching,ReadingJSONwiththeloads()Function
downloadingJSONdata,Step1:GetLocationfromtheCommandLineArgument
gettinglocation,Step1:GetLocationfromtheCommandLineArgument
loadingJSONdata,Step2:DownloadtheJSONData
overview,ReadingJSONwiththeloads()Function
webbrowsermodule
open()function,TaskScheduler,launchd,andcron
openingbrowserusing,WebScraping
WebDriverobjects,StartingaSelenium-ControlledBrowser
WebElementobjects,StartingaSelenium-ControlledBrowser
webscraping
bs4module
creatingobjectfromHTML,ParsingHTMLwiththeBeautifulSoupModule
findingelementwithselect()method,CreatingaBeautifulSoupObjectfromHTML
gettingattribute,GettingDatafromanElement’sAttributes
overview,ParsingHTMLwiththeBeautifulSoupModule
downloading
files,SavingDownloadedFilestotheHardDrive
images,Step3:OpenWebBrowsersforEachResult
pages,DownloadingFilesfromtheWebwiththerequestsModule
andGooglemapsproject,WebScraping
andGooglesearchproject,GettingDatafromanElement’sAttributes
HTML
browserdevelopertoolsand,ViewingtheSourceHTMLofaWebPage
findingelements,UsingtheDeveloperToolstoFindHTMLElements
learningresources,SavingDownloadedFilestotheHardDrive
overview,SavingDownloadedFilestotheHardDrive
viewingpagesource,AQuickRefresher
overview,WebScraping
requestsmodule,DownloadingFilesfromtheWebwiththerequestsModule
selenium module
clicking buttons, Sending Special Keys
findingelements,StartingaSelenium-ControlledBrowser
followinglinks,FindingElementsonthePage
installing,Step4:SavetheImageandFindthePreviousComic
sendingspecialkeystrokes,FillingOutandSubmittingForms
submittingforms,FindingElementsonthePage
usingFirefoxwith,Step4:SavetheImageandFindthePreviousComic
websites,openingfromscript,TaskScheduler,launchd,andcron
whileloops
gettingandprintingmousecoordinatesusing,Step1:ImporttheModule
infinite,Step1:ImporttheModule
overview,whileLoopStatements
whitespace,removing,JustifyingTextwithrjust(),ljust(),andcenter()
wildcardcharacter(.),TheCaretandDollarSignCharacters
WindowsOS
backslashvs.forwardslash,FilesandFilePaths
installingPython,AboutThisBook
installingthird-partymodules,ThepipTool
launchingprocessesfromPython,LaunchingOtherProgramsfromPython
loggingoutofautomationprogram,ControllingtheKeyboardandMousewithGUIAutomation
openingfileswithdefaultapplications,TaskScheduler,launchd,andcron
piptoolon,InstallingThird-PartyModules
Pythonsupport,WhatIsPython?
runningPythonprogramson,ShebangLine
startingIDLE,DownloadingandInstallingPython
TaskScheduler,LaunchingOtherProgramsfromPython
Worddocuments
addingheadings,WritingWordDocuments
creatingdocumentswithnondefaultstyles,StylingParagraphandRunObjects
formatoverview,Step4:SavetheResults
gettingtextfrom,ReadingWordDocuments
line/pagebreaks,AddingHeadings
picturesin,AddingHeadings
python-docxmodule,Step4:SavetheResults
reading,WordDocuments
Runobjectattributes,RunAttributes
stylingparagraphs,GettingtheFullTextfroma.docxFile
writingtofile,RunAttributes
Workbookobjects,ReadingExcelDocuments
workbooks,Excel,WorkingwithExcelSpreadsheets
creatingworksheets,CreatingandRemovingSheets
deletingworksheets,CreatingandRemovingSheets
opening,ReadingExcelDocuments
saving,IdeasforSimilarPrograms
Worksheetobjects,GettingSheetsfromtheWorkbook
write()method,ReadingtheContentsofFiles
Writerobjects,ReadingDatafromReaderObjectsinaforLoop
writerow()method,WriterObjects
X
XKCD comics
downloadingproject,Step3:OpenWebBrowsersforEachResult
designingprogram,Project:DownloadingAllXKCDComics
downloadingwebpage,Step1:DesigntheProgram
overview,Step3:OpenWebBrowsersforEachResult
savingimage,Step4:SavetheImageandFindthePreviousComic
multithreadeddownloadingproject,Project:MultithreadedXKCDDownloader
creatingandstartingthreads,Step1:ModifytheProgramtoUseaFunction
usingdownloadXkcd()function,Project:MultithreadedXKCDDownloader
waitingforthreadstoend,Step2:CreateandStartThreads
Y
%Y directive, Pausing Until a Specific Date
%ydirective,PausingUntilaSpecificDate
Yahoo!Mail,ConnectingtoanSMTPServer,RetrievingandDeletingEmailswithIMAP
Z
zipfile module
creatingZIPfiles,ExtractingfromZIPFiles
extractingZIPfiles,ExtractingfromZIPFiles
andfolders,Step3:FormtheNewFilenameandRenametheFiles
overview,WalkingaDirectoryTree
readingZIPfiles,CompressingFileswiththezipfileModule
ZipFileobjects,CompressingFileswiththezipfileModule
ZipInfoobjects,CompressingFileswiththezipfileModule
Automate the Boring Stuff with Python: Practical Programming for Total Beginners
Albert Sweigart
Copyright © 2015 AUTOMATE THE BORING STUFF WITH PYTHON.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
19 18 17 16 15   1 2 3 4 5 6 7 8 9
ISBN-10: 1-59327-599-4
ISBN-13: 978-1-59327-599-0
Publisher: William Pollock
Production Editor: Laurel Chun
Cover Illustration: Josh Ellingson
Interior Design: Octopod Studios
Developmental Editors: Jennifer Griffith-Delgado, Greg Poulos, and Leslie Shen
Technical Reviewer: Ari Lacenski
Copyeditor: Kim Wimpsett
Compositor: Susan Glinert Stevens
Proofreader: Lisa Devoto Farrell
Indexer: BIM Indexing and Proofreading Services
For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:
Library of Congress Control Number: 2014953114
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
No Starch Press
2015-04-16T12:10:03-07:00
Automate the Boring Stuff with Python: Practical Programming for Total Beginners
Table of Contents
DedicationAbouttheAuthorAbouttheTechReviewerAcknowledgmentsIntroduction
WhomIsThisBookFor?ConventionsWhatIsProgramming?
WhatIsPython?ProgrammersDon’tNeedtoKnowMuchMathProgrammingIsaCreativeActivity
AboutThisBookDownloadingandInstallingPythonStartingIDLE
TheInteractiveShell
HowtoFindHelpAskingSmartProgrammingQuestionsSummary
I.PythonProgrammingBasics
1.PythonBasics
EnteringExpressionsintotheInteractiveShellTheInteger,Floating-Point,andStringDataTypesStringConcatenationandReplicationStoringValuesinVariables
AssignmentStatementsVariableNames
YourFirstProgramDissectingYourProgram
CommentsTheprint()FunctionTheinput()FunctionPrintingtheUser’sNameThelen()FunctionThestr(),int(),andfloat()Functions
SummaryPracticeQuestions
2.FlowControl
BooleanValuesComparisonOperatorsBooleanOperators
BinaryBooleanOperatorsThenotOperator
MixingBooleanandComparisonOperatorsElementsofFlowControl
ConditionsBlocksofCode
ProgramExecutionFlowControlStatements
ifStatementselseStatementselifStatementswhileLoopStatements
AnAnnoyingwhileLoop
breakStatementscontinueStatementsforLoopsandtherange()Function
AnEquivalentwhileLoopTheStarting,Stopping,andSteppingArgumentstorange()
ImportingModules
fromimportStatements
EndingaProgramEarlywithsys.exit()SummaryPracticeQuestions
3.Functions
defStatementswithParametersReturnValuesandreturnStatementsTheNoneValueKeywordArgumentsandprint()LocalandGlobalScope
LocalVariablesCannotBeUsedintheGlobalScopeLocalScopesCannotUseVariablesinOtherLocalScopesGlobalVariablesCanBeReadfromaLocalScopeLocalandGlobalVariableswiththeSameName
TheglobalStatementExceptionHandlingAShortProgram:GuesstheNumberSummaryPracticeQuestionsPracticeProjects
TheCollatzSequenceInputValidation
4.Lists
TheListDataType
GettingIndividualValuesinaListwithIndexesNegativeIndexesGettingSublistswithSlicesGettingaList’sLengthwithlen()ChangingValuesinaListwithIndexesListConcatenationandListReplicationRemovingValuesfromListswithdelStatements
WorkingwithLists
UsingforLoopswithListsTheinandnotinOperatorsTheMultipleAssignmentTrick
AugmentedAssignmentOperatorsMethods
FindingaValueinaListwiththeindex()MethodAddingValuestoListswiththeappend()andinsert()MethodsRemovingValuesfromListswithremove()SortingtheValuesinaListwiththesort()Method
ExampleProgram:Magic8BallwithaListList-likeTypes:StringsandTuples
MutableandImmutableDataTypesTheTupleDataTypeConvertingTypeswiththelist()andtuple()Functions
References
PassingReferencesThecopyModule’scopy()anddeepcopy()Functions
SummaryPracticeQuestionsPracticeProjects
CommaCodeCharacterPictureGrid
5.DictionariesandStructuringData
TheDictionaryDataType
Dictionariesvs.ListsThekeys(),values(),anditems()MethodsCheckingWhetheraKeyorValueExistsinaDictionaryTheget()MethodThesetdefault()Method
PrettyPrintingUsingDataStructurestoModelReal-WorldThings
ATic-Tac-ToeBoardNestedDictionariesandLists
SummaryPracticeQuestionsPracticeProjects
FantasyGameInventoryListtoDictionaryFunctionforFantasyGameInventory
6.ManipulatingStrings
WorkingwithStrings
StringLiterals
DoubleQuotesEscapeCharactersRawStringsMultilineStringswithTripleQuotesMultilineComments
IndexingandSlicingStringsTheinandnotinOperatorswithStrings
UsefulStringMethods
Theupper(),lower(),isupper(),andislower()StringMethodsTheisXStringMethodsThestartswith()andendswith()StringMethodsThejoin()andsplit()StringMethodsJustifyingTextwithrjust(),ljust(),andcenter()RemovingWhitespacewithstrip(),rstrip(),andlstrip()CopyingandPastingStringswiththepyperclipModule
Project:PasswordLocker
Step1:ProgramDesignandDataStructuresStep2:HandleCommandLineArgumentsStep3:CopytheRightPassword
Project:AddingBulletstoWikiMarkup
Step1:CopyandPastefromtheClipboardStep2:SeparatetheLinesofTextandAddtheStarStep3:JointheModifiedLines
SummaryPracticeQuestionsPracticeProject
TablePrinter
II.AutomatingTasks
7.PatternMatchingwithRegularExpressions
FindingPatternsofTextWithoutRegularExpressions
FindingPatternsofTextwithRegularExpressions
CreatingRegexObjectsMatchingRegexObjectsReviewofRegularExpressionMatching
MorePatternMatchingwithRegularExpressions
GroupingwithParenthesesMatchingMultipleGroupswiththePipeOptionalMatchingwiththeQuestionMarkMatchingZeroorMorewiththeStarMatchingOneorMorewiththePlusMatchingSpecificRepetitionswithCurlyBrackets
GreedyandNongreedyMatchingThefindall()MethodCharacterClassesMakingYourOwnCharacterClassesTheCaretandDollarSignCharactersTheWildcardCharacter
MatchingEverythingwithDot-StarMatchingNewlineswiththeDotCharacter
ReviewofRegexSymbolsCase-InsensitiveMatchingSubstitutingStringswiththesub()MethodManagingComplexRegexesCombiningre.IGNORECASE,re.DOTALL,andre.VERBOSEProject:PhoneNumberandEmailAddressExtractor
Step1:CreateaRegexforPhoneNumbersStep2:CreateaRegexforEmailAddressesStep3:FindAllMatchesintheClipboardTextStep4:JointheMatchesintoaStringfortheClipboardRunningtheProgramIdeasforSimilarPrograms
SummaryPracticeQuestionsPracticeProjects
StrongPasswordDetectionRegexVersionofstrip()
8.ReadingandWritingFiles
FilesandFilePaths
BackslashonWindowsandForwardSlashonOSXandLinuxTheCurrentWorkingDirectoryAbsolutevs.RelativePathsCreatingNewFolderswithos.makedirs()
Theos.pathModule
HandlingAbsoluteandRelativePathsFindingFileSizesandFolderContentsCheckingPathValidity
TheFileReading/WritingProcess
OpeningFileswiththeopen()FunctionReadingtheContentsofFilesWritingtoFiles
SavingVariableswiththeshelveModuleSavingVariableswiththepprint.pformat()FunctionProject:GeneratingRandomQuizFiles
Step1:StoretheQuizDatainaDictionaryStep2:CreatetheQuizFileandShuffletheQuestionOrderStep3:CreatetheAnswerOptionsStep4:WriteContenttotheQuizandAnswerKeyFiles
Project:Multiclipboard
Step1:CommentsandShelfSetupStep2:SaveClipboardContentwithaKeywordStep3:ListKeywordsandLoadaKeyword’sContent
SummaryPracticeQuestionsPracticeProjects
ExtendingtheMulticlipboardMadLibsRegexSearch
9.OrganizingFiles
TheshutilModule
CopyingFilesandFoldersMovingandRenamingFilesandFoldersPermanentlyDeletingFilesandFoldersSafeDeleteswiththesend2trashModule
WalkingaDirectoryTreeCompressingFileswiththezipfileModule
ReadingZIPFilesExtractingfromZIPFilesCreatingandAddingtoZIPFiles
Project:RenamingFileswithAmerican-StyleDatestoEuropean-StyleDates
Step1:CreateaRegexforAmerican-StyleDatesStep2:IdentifytheDatePartsfromtheFilenamesStep3:FormtheNewFilenameandRenametheFilesIdeasforSimilarPrograms
Project:BackingUpaFolderintoaZIPFile
Step1:FigureOuttheZIPFile’sNameStep2:CreatetheNewZIPFileStep3:WalktheDirectoryTreeandAddtotheZIPFileIdeasforSimilarPrograms
SummaryPracticeQuestionsPracticeProjects
SelectiveCopyDeletingUnneededFilesFillingintheGaps
10.Debugging
RaisingExceptionsGettingtheTracebackasaStringAssertions
UsinganAssertioninaTrafficLightSimulationDisablingAssertions
Logging
UsingtheloggingModuleDon’tDebugwithprint()LoggingLevelsDisablingLoggingLoggingtoaFile
IDLE’sDebugger
GoStepOverOutQuitDebuggingaNumberAddingProgramBreakpoints
SummaryPracticeQuestionsPracticeProject
DebuggingCoinToss
11.WebScraping
Project:mapit.pywiththewebbrowserModule
Step1:FigureOuttheURLStep2:HandletheCommandLineArgumentsStep3:HandletheClipboardContentandLaunchtheBrowserIdeasforSimilarPrograms
DownloadingFilesfromtheWebwiththerequestsModule
DownloadingaWebPagewiththerequests.get()FunctionCheckingforErrors
SavingDownloadedFilestotheHardDrive
HTML
ResourcesforLearningHTMLAQuickRefresherViewingtheSourceHTMLofaWebPageOpeningYourBrowser’sDeveloperToolsUsingtheDeveloperToolstoFindHTMLElements
ParsingHTMLwiththeBeautifulSoupModule
CreatingaBeautifulSoupObjectfromHTMLFindinganElementwiththeselect()MethodGettingDatafromanElement’sAttributes
Project:“I’mFeelingLucky”GoogleSearch
Step1:GettheCommandLineArgumentsandRequesttheSearchPageStep2:FindAlltheResultsStep3:OpenWebBrowsersforEachResultIdeasforSimilarPrograms
Project:DownloadingAllXKCDComics
Step1:DesigntheProgramStep2:DownloadtheWebPageStep3:FindandDownloadtheComicImageStep4:SavetheImageandFindthePreviousComicIdeasforSimilarPrograms
ControllingtheBrowserwiththeseleniumModule
StartingaSelenium-ControlledBrowserFindingElementsonthePageClickingthePageFillingOutandSubmittingFormsSendingSpecialKeysClickingBrowserButtonsMoreInformationonSelenium
SummaryPracticeQuestionsPracticeProjects
CommandLineEmailerImageSiteDownloader2048LinkVerification
12.WorkingwithExcelSpreadsheets
ExcelDocumentsInstallingtheopenpyxlModuleReadingExcelDocuments
OpeningExcelDocumentswithOpenPyXLGettingSheetsfromtheWorkbookGettingCellsfromtheSheetsConvertingBetweenColumnLettersandNumbersGettingRowsandColumnsfromtheSheetsWorkbooks,Sheets,Cells
Project:ReadingDatafromaSpreadsheet
Step1:ReadtheSpreadsheetDataStep2:PopulatetheDataStructureStep3:WritetheResultstoaFileIdeasforSimilarPrograms
WritingExcelDocuments
CreatingandSavingExcelDocumentsCreatingandRemovingSheetsWritingValuestoCells
Project:UpdatingaSpreadsheet
Step1:SetUpaDataStructurewiththeUpdateInformationStep2:CheckAllRowsandUpdateIncorrectPricesIdeasforSimilarPrograms
SettingtheFontStyleofCellsFontObjectsFormulasAdjustingRowsandColumns
SettingRowHeightandColumnWidthMergingandUnmergingCellsFreezePanes
ChartsSummaryPracticeQuestions
PracticeProjects
MultiplicationTableMakerBlankRowInserterSpreadsheetCellInverterTextFilestoSpreadsheetSpreadsheettoTextFiles
13. Working with PDF and Word Documents
PDFDocuments
ExtractingTextfromPDFsDecryptingPDFsCreatingPDFs
CopyingPagesRotatingPagesOverlayingPagesEncryptingPDFs
Project:CombiningSelectPagesfromManyPDFs
Step1:FindAllPDFFilesStep2:OpenEachPDFStep3:AddEachPageStep4:SavetheResultsIdeasforSimilarPrograms
WordDocuments
ReadingWordDocumentsGettingtheFullTextfroma.docxFileStylingParagraphandRunObjectsCreatingWordDocumentswithNondefaultStylesRunAttributesWritingWordDocumentsAddingHeadingsAddingLineandPageBreaksAddingPictures
SummaryPracticeQuestionsPracticeProjects
PDFParanoiaCustomInvitationsasWordDocumentsBrute-ForcePDFPasswordBreaker
14.WorkingwithCSVFilesandJSONData
TheCSVModule
ReaderObjectsReadingDatafromReaderObjectsinaforLoopWriterObjectsThedelimiterandlineterminatorKeywordArguments
Project:RemovingtheHeaderfromCSVFiles
Step1:LoopThroughEachCSVFileStep2:ReadintheCSVFileStep3:WriteOuttheCSVFileWithouttheFirstRowIdeasforSimilarPrograms
JSONandAPIsTheJSONModule
ReadingJSONwiththeloads()FunctionWritingJSONwiththedumps()Function
Project:FetchingCurrentWeatherData
Step1:GetLocationfromtheCommandLineArgumentStep2:DownloadtheJSONDataStep3:LoadJSONDataandPrintWeatherIdeasforSimilarPrograms
SummaryPracticeQuestionsPracticeProject
Excel-to-CSVConverter
15.KeepingTime,SchedulingTasks,andLaunchingPrograms
ThetimeModule
Thetime.time()FunctionThetime.sleep()Function
RoundingNumbersProject:SuperStopwatch
Step1:SetUptheProgramtoTrackTimesStep2:TrackandPrintLapTimesIdeasforSimilarPrograms
ThedatetimeModule
ThetimedeltaDataTypePausingUntilaSpecificDateConvertingdatetimeObjectsintoStringsConvertingStringsintodatetimeObjects
ReviewofPython’sTimeFunctionsMultithreading
PassingArgumentstotheThread’sTargetFunctionConcurrencyIssues
Project:MultithreadedXKCDDownloader
Step1:ModifytheProgramtoUseaFunctionStep2:CreateandStartThreadsStep3:WaitforAllThreadstoEnd
LaunchingOtherProgramsfromPython
PassingCommandLineArgumentstoPopen()TaskScheduler,launchd,andcronOpeningWebsiteswithPythonRunningOtherPythonScriptsOpeningFileswithDefaultApplications
Project:SimpleCountdownProgram
Step1:CountDownStep2:PlaytheSoundFileIdeasforSimilarPrograms
SummaryPracticeQuestionsPracticeProjects
PrettifiedStopwatchScheduledWebComicDownloader
16.SendingEmailandTextMessages
SMTP
SendingEmail
ConnectingtoanSMTPServerSendingtheSMTP“Hello”MessageStartingTLSEncryptionLoggingintotheSMTPServerSendinganEmailDisconnectingfromtheSMTPServer
IMAPRetrievingandDeletingEmailswithIMAP
ConnectingtoanIMAPServerLoggingintotheIMAPServerSearchingforEmail
SelectingaFolderPerformingtheSearchSizeLimits
FetchinganEmailandMarkingItAsReadGettingEmailAddressesfromaRawMessageGettingtheBodyfromaRawMessageDeletingEmailsDisconnectingfromtheIMAPServer
Project:SendingMemberDuesReminderEmails
Step1:OpentheExcelFileStep2:FindAllUnpaidMembersStep3:SendCustomizedEmailReminders
SendingTextMessageswithTwilio
SigningUpforaTwilioAccountSendingTextMessages
Project:“JustTextMe”ModuleSummaryPracticeQuestionsPracticeProjects
RandomChoreAssignmentEmailerUmbrellaReminderAutoUnsubscriberControllingYourComputerThroughEmail
17.ManipulatingImages
ComputerImageFundamentals
ColorsandRGBAValuesCoordinatesandBoxTuples
ManipulatingImageswithPillow
WorkingwiththeImageDataTypeCroppingImagesCopyingandPastingImagesontoOtherImagesResizinganImageRotatingandFlippingImagesChangingIndividualPixels
Project:AddingaLogo
Step1:OpentheLogoImageStep2:LoopOverAllFilesandOpenImagesStep3:ResizetheImagesStep4:AddtheLogoandSavetheChangesIdeasforSimilarPrograms
DrawingonImages
DrawingShapes
PointsLinesRectanglesEllipsesPolygonsDrawingExample
DrawingText
SummaryPracticeQuestionsPracticeProjects
ExtendingandFixingtheChapterProjectProgramsIdentifyingPhotoFoldersontheHardDriveCustomSeatingCards
18.ControllingtheKeyboardandMousewithGUIAutomation
InstallingthepyautoguiModule
StayingonTrack
ShuttingDownEverythingbyLoggingOutPausesandFail-Safes
ControllingMouseMovement
MovingtheMouseGettingtheMousePosition
Project:“WhereIstheMouseRightNow?”
Step1:ImporttheModuleStep2:SetUptheQuitCodeandInfiniteLoopStep3:GetandPrinttheMouseCoordinates
ControllingMouseInteraction
ClickingtheMouseDraggingtheMouseScrollingtheMouse
WorkingwiththeScreen
GettingaScreenshotAnalyzingtheScreenshot
Project:ExtendingthemouseNowProgramImageRecognitionControllingtheKeyboard
SendingaStringfromtheKeyboardKeyNamesPressingandReleasingtheKeyboardHotkeyCombinations
ReviewofthePyAutoGUIFunctionsProject:AutomaticFormFiller
Step1:FigureOuttheStepsStep2:SetUpCoordinatesStep3:StartTypingDataStep4:HandleSelectListsandRadioButtonsStep5:SubmittheFormandWait
SummaryPracticeQuestions
PracticeProjects
LookingBusyInstantMessengerBotGame-PlayingBotTutorial
A.InstallingThird-PartyModules
ThepipToolInstallingThird-PartyModules
B.RunningPrograms
ShebangLineRunningPythonProgramsonWindowsRunningPythonProgramsonOSXandLinux
C.AnswerstothePracticeQuestions
Chapter1Chapter2Chapter3Chapter4Chapter5Chapter6Chapter7Chapter8Chapter9Chapter10Chapter11Chapter12Chapter13Chapter14Chapter15Chapter16Chapter17Chapter18
D.ResourcesIndexCopyright