lecture 14 - web.stanford.edu
TRANSCRIPT
Lecture 14: Greedy algorithms!
Announcements
• HW6 due Friday!
• Tons of practice on dynamic programming
Roadmap
[Figure: course roadmap. Labels: Last week, Graphs!, Asymptotic Analysis, Dynamic Programming, Greedy Algs, MIDTERM, The Future!]
More detailed schedule on the website!
This week
• Greedy algorithms!
• Builds on our ideas from dynamic programming
Greedy algorithms
• Make choices one-at-a-time.
• Never look back.
• Hope for the best.
Today
• One non-example of a greedy algorithm:
  • Knapsack again
• Three examples of greedy algorithms:
  • Activity Selection
  • Job Scheduling
  • Huffman Coding
Non-example
• Unbounded Knapsack.
  • (From pre-lecture exercise)
• Unbounded Knapsack:
  • Suppose I have infinite copies of all of the items.
  • What’s the most valuable way to fill the knapsack?
• “Greedy” algorithm for unbounded knapsack:
  • Tacos have the best Value/Weight ratio!
  • Keep grabbing tacos!
Item:      (pictures; the taco is the item with weight 3)
Weight:    6    2    4    3    11
Value:     20   8    14   13   35
Capacity:  10
Optimal:   Total weight: 10, Total value: 42
Greedy:    Total weight: 9, Total value: 39 (three tacos)
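To make the gap between 39 and 42 concrete, here is a minimal sketch comparing the "keep grabbing the best-ratio item" rule with a dynamic program. The function names are made up for illustration, and the item list assumes the table pairs weight 3 with value 13 (the taco) and weight 11 with value 35, which is the pairing consistent with the greedy total of weight 9 and value 39.

```python
def greedy_by_ratio(items, capacity):
    """Take as many copies as fit of the single item with the best value/weight ratio."""
    weight, value = max(items, key=lambda item: item[1] / item[0])
    copies = capacity // weight
    return copies * weight, copies * value


def best_unbounded_knapsack(items, capacity):
    """Dynamic program: best[c] is the best total value achievable with capacity c."""
    best = [0] * (capacity + 1)
    for c in range(1, capacity + 1):
        for weight, value in items:
            if weight <= c:
                best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# (weight, value) pairs; (3, 13) is assumed to be the taco.
ITEMS = [(6, 20), (2, 8), (4, 14), (3, 13), (11, 35)]

print(greedy_by_ratio(ITEMS, 10))          # (total weight, total value) for greedy
print(best_unbounded_knapsack(ITEMS, 10))  # best achievable total value
```

Greedy stops with one unit of capacity unused, which is exactly the room the optimal solution exploits.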
Example where greedy works: Activity selection
[Figure: overlapping activities laid out along a time axis: Frisbee Practice, Orchestra, CS161 study group, Sleep, CS110 Class, Theory Lunch, Theory Seminar, Combinatorics Seminar, Underwater basket weaving class, Math 51 Class, CS161 Class, CS166 Class, CS161 Section, CS161 Office Hours, Swimming lessons, Programming team meeting, Social activity.]
You can only do one activity at a time, and you want to maximize the number of activities that you do.
What to choose?
Activity selection
• Input:
  • Activities a1, a2, …, an
  • Start times s1, s2, …, sn
  • Finish times f1, f2, …, fn
• Output:
  • How many activities can you do today?
Greedy Algorithm
[Figure: activities a1, …, a7 drawn as intervals along a time axis.]
• Pick the activity you can add with the smallest finish time.
• Repeat.
At least it’s fast
• Running time:
  • O(n) if the activities are already sorted by finish time.
  • Otherwise O(n log(n)) if you have to sort them first.
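The greedy rule and its running time can be sketched in a few lines of Python. This is an illustration rather than the course's notebook code: the function name and the sample intervals are made up, and it assumes an activity may start exactly when the previous one finishes.

```python
def select_activities(activities):
    """activities: list of (start, finish) pairs. Returns the greedily chosen schedule."""
    chosen = []
    last_finish = float("-inf")
    # Sorting by finish time is the O(n log n) part; the scan afterwards is O(n).
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:  # do-able: it starts after the last chosen one ends
            chosen.append((start, finish))
            last_finish = finish
    return chosen


# Made-up intervals, for illustration only.
ACTIVITIES = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
              (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
print(select_activities(ACTIVITIES))
```

Because the scan only compares each start time with the most recent finish time, it never revisits an earlier choice.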
What makes it greedy?
• At each step in the algorithm, make a choice.
  • Hey, I can increase my activity set by one,
  • And leave lots of room for future choices,
  • Let’s do that and hope for the best!!!
• Hope that at the end of the day, this results in a globally optimal solution.
Three questions
1. Does this greedy algorithm for activity selection work?
2. In general, when are greedy algorithms a good idea?
3. The “greedy” approach is often the first you’d think of…
  • Why are we getting to it now, in Week 8?
Answers
1. Does this greedy algorithm for activity selection work?
  • Yes.
2. In general, when are greedy algorithms a good idea?
  • When they exhibit especially nice optimal substructure.
3. The “greedy” approach is often the first you’d think of…
  • Why are we getting to it now, in Week 8?
    • Related to dynamic programming! (Which we did in Week 7).
    • Proving that greedy algorithms work is often not so easy.
(Seems to: IPython notebook…) (But now let’s see why…)
Why does it work?
• Whenever we make a choice, we don’t rule out an optimal solution.
[Figure: activities a1, …, a7 along a time axis, annotated “Our next choice would be this one:” and “There’s some optimal solution that contains our next choice”.]
To see this, consider…
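Optimal Substructure
• Subproblem i:
  • A[i] = Number of activities you can do after Activity i finishes.
[Figure: activity ai on the time axis, followed by the activities that start after ai finishes, including ak.]
Want to show: when we make a choice ak, the optimal solution to the smaller sub-problem k will help us solve sub-problem i.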
Claim
• Let ak have the smallest finish time among activities do-able after ai finishes.
• Then A[i] = A[k] + 1.
[Figure: time axis with ai followed by ak. A[i]: how many activities can I do after ai finishes? A[k]: how many activities can I do after ak finishes?]
Proof
• Let ak have the smallest finish time among activities do-able after ai finishes.
• Then A[i] = A[k] + 1.
• Clearly A[i] ≥ A[k] + 1
  • Since we have a solution with A[k] + 1 activities: do ak, then A[k] more activities after ak finishes.
• Suppose toward a contradiction that A[i] > A[k] + 1.
  • There’s some better solution to subproblem (i) that doesn’t use ak.
  • Say aj ends first after ai in that better solution.
  • Remove aj from the better solution and add ak.
    • This is still a valid schedule: ak finishes no later than aj, so ak doesn’t overlap anything that came after aj.
  • Now you have a solution of the same size… but it includes ak so it must have size ≤ A[k] + 1.
  • That contradicts A[i] > A[k] + 1.
• So A[i] ≥ A[k] + 1, and we just showed A[i] ≤ A[k] + 1 by contradiction.
• That proves the claim.
[Figure: time axis with ai, ak, aj and the other activities. The activities that are not do-able after ai finishes don’t count for sub-problem (i), so they are greyed out.]
We never rule out an optimal solution
• We’ve shown:
  • If we choose ak to have the smallest finish time among activities do-able after ai finishes, then A[i] = A[k] + 1.
• That is:
  • Assume that we have an optimal solution up to ai
  • By adding ak we are still on track to hit that optimal value
[Figure: time axis with ai followed by ak and the remaining activities.]
So the algorithm is correct
• We never rule out an optimal solution
• At the end of the algorithm, we’ve got a solution.
• It’s not not optimal.
• So it must be optimal.
Lucky the Lackadaisical Lemur
So the algorithm is correct
• Inductive Hypothesis:
  • After adding the t’th thing, there is an optimal solution that extends the current solution.
• Base case:
  • After adding zero activities, there is an optimal solution extending that.
• Inductive step:
  • TODO
• Conclusion:
  • After adding the last activity, there is an optimal solution that extends the current solution.
  • The current solution is the only solution that extends the current solution.
  • So the current solution is optimal.
Plucky the Pedantic Penguin
Inductive step
• Suppose that after adding the t’th thing (Activity i), there is an optimal solution:
  • X activities done and A[i] activities left.
• Then we add the (t+1)’st thing (Activity k).
  • A[k] = A[i] - 1 (by the claim)
• Now:
  • X + 1 activities done and A[i] - 1 activities left.
  • Same number as before!
  • Still optimal.
Common strategy for greedy algorithms
• Make a series of choices.
• Show that, at each step, our choice won’t rule out an optimal solution at the end of the day.
• After we’ve made all our choices, we haven’t ruled out an optimal solution, so we must have found one.
Common strategy (formally) for greedy algorithms
• Inductive Hypothesis:
  • After greedy choice t, you haven’t ruled out success.
• Base case:
  • Success is possible before you make any choices.
• Inductive step:
  • TODO
• Conclusion:
  • If you reach the end of the algorithm and haven’t ruled out success then you must have succeeded.
DP view of activity selection
• This algorithm is most naturally viewed as a greedy algorithm.
  • Make greedy choices
  • Never rule out success
• But, we could view it as a DP algorithm
  • Take advantage of optimal sub-structure and fill in a table.
• We’ll do that now.
  • Just for pedagogy!
  • (This isn’t the best way to think about activity selection).
Recipe for applying Dynamic Programming
• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of the optimal solution.
• Step 3: Use dynamic programming to find the value of the optimal solution.
• Step 4: If needed, keep track of some additional info so that the algorithm from Step 3 can find the actual solution.
• Step 5: If needed, code this up like a reasonable person.
Optimal substructure
• Subproblem i:
  • A[i] = number of activities you can do after Activity i finishes.
[Figure: activity ai on the time axis, followed by the activities that start after ai finishes.]
We did that already
• Let ak have the smallest finish time among activities do-able after ai finishes.
• Then A[i] = A[k] + 1.
[Figure: time axis with ai followed by ak. A[i]: how many activities can I do after ai finishes? A[k]: how many activities can I do after ak finishes?]
Top-down DP
• Initialize a global array A to [None, …, None]
• Make a “dummy” activity that ends at time -1.
• def findNumActivities(i):
  • If A[i] != None:
    • Return A[i]
  • Let Activity k be the activity I can fit in my schedule after Activity i with the smallest finish time.
  • If there is no such activity k, set A[i] = 0
  • Else, A[i] = findNumActivities(k) + 1
  • Return A[i]
• Return findNumActivities(0)
This is a terrible way to write this! The only thing that matters here is that the highlighted lines are our recursive relationship.
See IPython notebook for implementation
Top-down DP
• Initialize a global array A to [None, …, None]
• Initialize a global array Next to [None, …, None]
• Make a “dummy” activity that ends at time -1.
• def findNumActivities(i):
  • If A[i] != None:
    • Return A[i]
  • Let Activity k be the activity I can fit in my schedule after Activity i with the smallest finish time.
  • If there is no such activity k, set A[i] = 0
  • Else, A[i] = findNumActivities(k) + 1 and Next[i] = k
  • Return A[i]
• findNumActivities(0)
• Step through “Next” array to get schedule.
This is a terrible way to write this! The only thing that matters here is that the highlighted lines are our recursive relationship.
See IPython notebook for implementation
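For readers without the notebook at hand, the pseudocode above can be sketched in runnable form as follows. This is not the notebook's implementation: the wrapper name `schedule_top_down` and the sample intervals are made up, the arrays are local rather than global, and it assumes non-negative start times and activities with finish strictly after start. Finding k by a linear scan makes this sketch O(n^2), which fits the "terrible way to write this" warning.

```python
def schedule_top_down(activities):
    """activities: list of (start, finish). Returns (number of activities, schedule)."""
    acts = [(-1, -1)] + list(activities)  # index 0 is the dummy activity ending at time -1
    n = len(acts)
    A = [None] * n      # A[i]: how many activities fit after activity i finishes
    Next = [None] * n   # Next[i]: the activity chosen right after activity i

    def find_num_activities(i):
        if A[i] is not None:
            return A[i]
        finish_i = acts[i][1]
        # Candidates: activities that start no earlier than activity i finishes.
        candidates = [k for k in range(1, n) if acts[k][0] >= finish_i]
        if not candidates:
            A[i] = 0
        else:
            k = min(candidates, key=lambda j: acts[j][1])  # smallest finish time
            Next[i] = k
            A[i] = find_num_activities(k) + 1
        return A[i]

    total = find_num_activities(0)
    schedule, i = [], Next[0]
    while i is not None:  # step through the Next array to recover the schedule
        schedule.append(acts[i])
        i = Next[i]
    return total, schedule


# Made-up intervals, for illustration only.
ACTIVITIES = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
              (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
print(schedule_top_down(ACTIVITIES))
```

Each call recurses on exactly one smaller sub-problem, so the memo table is never actually reused along the way.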
Let’s step through it. (See IPython notebook for code with some print statements)
[Figure: activities a1, …, a7 drawn as intervals along a time axis.]
• Start with the activity with the smallest finish time.
• Now find the next activity still do-able with the smallest finish time, and recurse after that.
• (The same step repeats for each remaining choice.)
• Ta-da!
This looks pretty familiar!!
It’s exactly the same* as the greedy solution!
*if you implement the top-down DP solution appropriately.
Sub-problem graph view
• Divide-and-conquer:
  [Figure: a big problem splits into sub-problems, each of which splits into its own sub-sub-problems.]
• Dynamic Programming:
  [Figure: a big problem depends on several sub-problems, which share sub-sub-problems.]
• Greedy algorithms:
  [Figure: a big problem depends on one sub-problem, which depends on one sub-sub-problem.]
  • Not only is there optimal sub-structure:
    • optimal solutions to a problem are made up from optimal solutions of sub-problems
  • but each problem depends on only one sub-problem.
Let’s see a few more examples
Another example: Scheduling
[Figure: an Overcommitted Stanford Student with a to-do list: CS161 HW! Call your parents! Math HW! Econ HW! Practice musical instrument! Read CLRS! Have a social life! Sleep! Administrative stuff for your student club! Do laundry! Meditate!]
Scheduling
• n tasks
• Task i takes ti hours
• Everything is already late!
• For every hour that passes until task i is done, pay ci
• Example:
  • CS161 HW! 10 hours. Cost: 2 units per hour until it’s done.
  • Sleep! 8 hours. Cost: 3 units per hour until it’s done.
  • CS161 HW, then Sleep: costs 10 ⋅ 2 + (10 + 8) ⋅ 3 = 74 units
  • Sleep, then CS161 HW: costs 8 ⋅ 3 + (10 + 8) ⋅ 2 = 60 units
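The cost of an ordering can be checked mechanically. The sketch below uses a made-up helper, `schedule_cost`, and represents each job as an assumed (hours, cost per hour) pair; it reproduces the two totals from the example.

```python
def schedule_cost(jobs):
    """jobs: list of (hours, cost_per_hour), in the order the jobs are done."""
    elapsed = 0
    total = 0
    for hours, cost_per_hour in jobs:
        elapsed += hours                  # time at which this job finishes
        total += elapsed * cost_per_hour  # it costs money every hour until then
    return total


CS161_HW = (10, 2)  # 10 hours, 2 units per hour until it's done
SLEEP = (8, 3)      # 8 hours, 3 units per hour until it's done

print(schedule_cost([CS161_HW, SLEEP]))  # CS161 HW, then Sleep
print(schedule_cost([SLEEP, CS161_HW]))  # Sleep, then CS161 HW
```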
Optimal substructure
• This problem breaks up nicely into sub-problems:
  • Suppose this is the optimal schedule: Job A, Job B, Job C, Job D.
  • Then Job B, Job C, Job D must be the optimal schedule on just jobs B, C, D.
Optimal substructure
• Seems amenable to a greedy algorithm:
  • Jobs A, B, C, D: take the best job first, then solve the problem on the remaining jobs.
  • Three jobs left: take the best job first, then solve the problem on the remaining jobs.
  • Two jobs left: take the best job first, then solve the problem on the remaining job.
  • One job left. (That one’s easy :) )
What does “best” mean?
• Recipe for greedy algorithm analysis:
  • We make a series of choices.
  • We show that, at each step, our choice won’t rule out an optimal solution at the end of the day.
  • After we’ve made all our choices, we haven’t ruled out an optimal solution, so we must have found one.
[Figure: Job A, Job B, Job C, Job D in a row, with the jobs after the first choice marked as the remaining problem.]
“Best” means: won’t rule out an optimal solution.
The optimal solution to this problem extends an optimal solution to the whole thing.
Head-to-head
• Of these two jobs, which should we do first?
  • Job A: x hours. Cost: z units per hour until it’s done.
  • Job B: y hours. Cost: w units per hour until it’s done.
• Cost(A then B) = x ⋅ z + (x + y) ⋅ w
• Cost(B then A) = y ⋅ w + (x + y) ⋅ z
A then B is better than B then A when:
$$xz + (x + y)w \le yw + (x + y)z$$
$$xz + xw + yw \le yw + xz + yz$$
$$wx \le yz$$
$$\frac{w}{y} \le \frac{z}{x}$$
What matters is the ratio: (cost of delay) / (time it takes).
Do the job with the biggest ratio first.
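Lemma
• Given jobs so that Job i takes time ti with cost ci,
• There is an optimal schedule so that the first job is the one that maximizes the ratio ci/ti
• Proof:
  • Say Job B maximizes this ratio, and it’s not first. Let Job A be the job scheduled immediately before it, so cB/tB ≥ cA/tA.
  • Switch A and B! Nothing else will change, and we showed on the previous slide that the cost won’t increase.
  • Repeat until B is first.
[Figure: a schedule containing Job A immediately followed by Job B, alongside Job C and Job D; after the switch, Job B comes immediately before Job A and the other jobs stay where they were.]
Choose greedily: Biggest cost/time ratio first
• Job i takes time ti with cost ci
• There is an optimal schedule so that the first job is the one that maximizes the ratio ci/ti
• So if we choose jobs greedily according to ci/ti, we never rule out success!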
Greedy Scheduling Solution
• scheduleJobs( JOBS ):
  • Sort JOBS by the ratio:
    • $r_i = \frac{c_i}{t_i} = \frac{\text{cost of delaying job } i}{\text{time job } i \text{ takes to complete}}$
  • Say that sorted_JOBS[i] is the job with the i’th biggest ri
  • Return sorted_JOBS
The running time is O(n log(n))
Now you can go about your schedule peacefully, in the optimal way.
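A small sketch of scheduleJobs, together with a brute-force check over all orderings on a tiny instance. The names `schedule_jobs`, `schedule_cost` and `brute_force_best_cost` are made up for illustration, each job is an assumed (hours, cost per hour) pair, and the last two entries of the job list are invented test data.

```python
from itertools import permutations


def schedule_jobs(jobs):
    """Sort jobs by the ratio cost/time, biggest ratio first."""
    return sorted(jobs, key=lambda job: job[1] / job[0], reverse=True)


def schedule_cost(order):
    """Total cost when jobs (hours, cost_per_hour) are done in the given order."""
    elapsed = total = 0
    for hours, cost_per_hour in order:
        elapsed += hours
        total += elapsed * cost_per_hour
    return total


def brute_force_best_cost(jobs):
    """Try every ordering; only feasible for a handful of jobs."""
    return min(schedule_cost(order) for order in permutations(jobs))


JOBS = [(10, 2), (8, 3), (3, 5), (6, 1)]  # first two match the example; rest are made up
greedy_order = schedule_jobs(JOBS)
print(greedy_order, schedule_cost(greedy_order), brute_force_best_cost(JOBS))
```

The brute-force comparison is only a sanity check on one instance; the lemma is what justifies the sort in general.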
Formally, use induction!
• Inductive hypothesis:
  • There is an optimal ordering so that the first t jobs are sorted_JOBS[:t].
• Base case:
  • When t = 0, this reads: “There is an optimal ordering so that the first 0 jobs are []”
  • That’s true.
• Inductive Step:
  • Boils down to: there is an optimal ordering on sorted_JOBS[t:] so that sorted_JOBS[t] is first.
  • This follows from the Lemma.
• Conclusion:
  • When t = n, this reads: “There is an optimal ordering so that the first n jobs are sorted_JOBS.”
  • aka, what we returned is an optimal ordering.
SLIDE SKIPPED IN CLASS
What have we learned?
• A greedy algorithm works for scheduling
• This followed the same outline as the previous example:
  • Identify optimal substructure:
    [Figure: Job A, Job B, Job C, Job D in a row.]
  • Find a way to make “safe” choices that won’t rule out an optimal solution.
    • largest ratios first.
One more example: Huffman coding
• everyday english sentence
  • 01100101 01110110 01100101 01110010 01111001 01100100 01100001 01111001 00100000 01100101 01101110 01100111 01101100 01101001 01110011 01101000 00100000 01110011 01100101 01101110 01110100 01100101 01101110 01100011 01100101
• qwertyui_opasdfg+hjklzxcv
  • 01110001 01110111 01100101 01110010 01110100 01111001 01110101 01101001 01011111 01101111 01110000 01100001 01110011 01100100 01100110 01100111 00101011 01101000 01101010 01101011 01101100 01111010 01111000 01100011 01110110
ASCII is pretty wasteful. If e shows up so often, we should have a more parsimonious way of representing it!
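As a quick illustration of the waste, the bit strings above can be reproduced with a few lines of Python. The helper name `to_ascii_bits` is made up; it assumes 8 bits per character, as in the strings shown.

```python
def to_ascii_bits(text):
    """Encode each character as its 8-bit ASCII code and concatenate the results."""
    return "".join(format(ord(ch), "08b") for ch in text)


sentence = "everyday english sentence"
bits = to_ascii_bits(sentence)

print(len(sentence), len(bits))  # characters vs. bits
print(sentence.count("e"))       # how often 'e' shows up
print(format(ord("e"), "08b"))   # the 8-bit pattern spent on every single 'e'
```

Every character costs the same 8 bits regardless of how often it appears, which is the inefficiency a frequency-aware code tries to remove.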
Suppose we have some distribution on characters
Letter:      A    B    C    D    E    F
Percentage:  45   13   12   16   9    5
For simplicity, let’s go with this made-up example
How to encode them as efficiently as possible?
Try 0 (like ASCII)
Letter:      A    B    C    D    E    F
Percentage:  45   13   12   16   9    5
Code:        000  001  010  011  100  101
• Every letter is assigned a binary string of three bits.
Wasteful!
• 110 and 111 are never used.
• We should have a shorter way of representing A.
Try 1
Letter:      A    B    C    D    E    F
Percentage:  45   13   12   16   9    5
Code:        0    00   01   1    10   11
• Every letter is assigned a binary string of one or two bits.
• The more frequent letters get the shorter strings.
• Problem:
  • Does 000 mean AAA or BA or AB?
Try 2: prefix-free coding
Letter:      A    B    C    D    E    F
Percentage:  45   13   12   16   9    5
Code:        01   101  110  00   111  100
• Every letter is assigned a binary string.
• More frequent letters get shorter strings.
• No encoded string is a prefix of any other.
Example: decode 10010101 from left to right. 100 is F, then 101 is B, then 01 is A, giving FBA.
Question: What is the most efficient way to do prefix-free coding? (This isn’t it).
Confusingly, “prefix-free codes” are also sometimes called “prefix codes” (including in CLRS).
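Decoding a prefix-free code needs no separators: read bits until they match a codeword, emit the letter, and start over. The sketch below illustrates that, with made-up names (`TRY2`, `TRY1`, `is_prefix_free`, `decode`); the code tables are one reading of the garbled slide tables, chosen so that 10010101 decodes to FBA as on the slide.

```python
TRY2 = {"A": "01", "B": "101", "C": "110", "D": "00", "E": "111", "F": "100"}
TRY1 = {"A": "0", "B": "00", "C": "01", "D": "1", "E": "10", "F": "11"}


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


def decode(bits, code):
    """Greedy left-to-right decoding; unambiguous only when the code is prefix-free."""
    by_word = {word: letter for letter, word in code.items()}
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in by_word:  # a complete codeword has been read
            letters.append(by_word[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a codeword")
    return "".join(letters)


print(is_prefix_free(TRY2), is_prefix_free(TRY1))
print(decode("10010101", TRY2))
```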
A prefix-free code is a tree
[Figure: binary tree with edges labeled 0 and 1. Leaves and their codewords: D:16 at 00, A:45 at 01, F:5 at 100, B:13 at 101, C:12 at 110, E:9 at 111.]
As long as all the letters show up as leaves, this code is prefix-free.
B:13 means that ‘B’ makes up 13% of the characters that ever appear.
Some trees are better than others
[Figure: the same tree, with leaves D:16 at 00, A:45 at 01, F:5 at 100, B:13 at 101, C:12 at 110, E:9 at 111.]
• Imagine choosing a letter at random from the language.
  • Not uniform, but according to our histogram!
• The cost of a tree is the expected length of the encoding of that letter.
$$\text{Cost} = \sum_{\text{leaves } x} P(x) \cdot \text{depth}(x)$$
  • P(x) is the probability of letter x
  • The depth in the tree is the length of the encoding
Expected cost of encoding a letter with this tree:
$$2(0.45 + 0.16) + 3(0.05 + 0.13 + 0.12 + 0.09) = 2.39$$
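The cost formula is easy to evaluate directly, since the depth of a leaf equals the length of its codeword. In the sketch below the names are made up, and the code table is an assumed reading of the tree (D and A at depth 2, the other four letters at depth 3).

```python
PERCENT = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
CODE = {"A": "01", "B": "101", "C": "110", "D": "00", "E": "111", "F": "100"}


def expected_code_length(percent, code):
    """Sum over letters of P(x) * depth(x), where depth(x) is the codeword length."""
    total_weight = sum(percent.values())
    return sum(percent[x] * len(code[x]) for x in percent) / total_weight


print(expected_code_length(PERCENT, CODE))
```

Working with integer percentages and dividing once at the end avoids accumulating floating-point error in the sum.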
Question
• Given a distribution P on letters, find the lowest-cost tree, where
$$\text{cost(tree)} = \sum_{\text{leaves } x} P(x) \cdot \text{depth}(x)$$
  • P(x) is the probability of letter x
  • The depth in the tree is the length of the encoding
Optimal sub-structure
• Suppose this is an optimal tree:
  [Figure: a tree whose root has a 0 branch and a 1 branch, with one sub-tree singled out.]
  • Then this (the singled-out sub-tree) is an optimal tree on fewer letters.
  • Otherwise, we could change this sub-tree and end up with a better overall tree.
In order to design a greedy algorithm
• Think about what letters belong in this sub-problem...
  • What’s a safe choice to make for these lower sub-trees?
  • Infrequent elements! We want them as low down as possible.
Solution: greedily build subtrees, starting with the infrequent letters
Start with one node per letter: D:16, A:45, B:13, F:5, C:12, E:9.
• Merge F:5 and E:9 into a subtree with key 14.
• Merge C:12 and B:13 into a subtree with key 25.
• Merge the subtree with key 14 and D:16 into a subtree with key 30.
• Merge the subtrees with keys 25 and 30 into a subtree with key 55.
• Merge A:45 and the subtree with key 55 into the root, with key 100.
[Figure: the finished tree, with edges labeled 0 and 1. A is at depth 1 with codeword 0; B, C and D are at depth 3 with codewords 100, 101, 110; F and E are at depth 4 with codewords 1110 and 1111.]
Expected cost of encoding a letter:
$$1 \cdot 0.45 + 3 \cdot 0.41 + 4 \cdot 0.14 = 2.24$$
What exactly was the algorithm?
• Create a node like D:16 for each letter/frequency
  • The key is the frequency (16 in this case)
• Let CURRENT be the list of all these nodes.
• while len(CURRENT) > 1:
  • X and Y ← the nodes in CURRENT with the smallest keys.
  • Create a new node Z with Z.key = X.key + Y.key
  • Set Z.left = X, Z.right = Y
  • Add Z to CURRENT and remove X and Y
• return CURRENT[0]
[Figure: X = F:5 and Y = E:9 are joined under a new node Z with key 14 (edges labeled 0 and 1); D:16, A:45, B:13, C:12 are still on their own.]
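The loop above can be sketched in Python with a binary heap standing in for CURRENT, so that finding the two smallest keys is cheap. This is one possible rendering, not the course's code: `huffman_code` is a made-up name, internal nodes are plain (left, right) tuples, and a counter is added only to break ties between equal keys. The exact bit patterns may differ from the slide's figure, since they depend on which child is labeled 0, but the codeword lengths do not.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """freqs: dict letter -> frequency. Returns dict letter -> codeword."""
    tiebreak = count()  # unique second field, so nodes themselves are never compared
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)  # X: smallest key
        fy, _, y = heapq.heappop(heap)  # Y: second-smallest key
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # Z = (left, right)
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):    # internal node: recurse on both children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                          # leaf: a single letter
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


PERCENT = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
codes = huffman_code(PERCENT)
cost = sum(PERCENT[x] * len(codes[x]) for x in PERCENT) / 100
print(codes, cost)
```

With a heap, each of the n - 1 merges costs O(log n), for O(n log n) overall.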
Does it work?
• Yes.
• Same strategy:
  • Show that at each step, the choices we are making won’t rule out an optimal solution.
  • Lemma:
    • Suppose that x and y are the two least-frequent letters. Then there is an optimal tree where x and y are siblings.
[Figure: F:5 and E:9 joined under a new node with key 14; D:16, A:45, B:13, C:12 are still on their own.]
Lemma proof idea
If x and y are the two least-frequent letters, there is an optimal tree where x and y are siblings.
• Say that an optimal tree looks like this:
  [Figure: x sits somewhere in the tree; a is one of the lowest-level sibling nodes. Lowest-level sibling nodes: at least one of them is neither x nor y.]
• What happens to the cost if we swap x for a?
  • The cost can’t increase; a was at least as frequent as x, and we just made a’s encoding shorter (and x’s longer).
• Repeat this logic until we get an optimal tree with x and y as siblings.
  • The cost never increased so this tree is still optimal.
  [Figure: after the swaps, x and y are the lowest-level sibling nodes.]
Proof strategy just like before
• Show that at each step, the choices we are making won’t rule out an optimal solution.
• Lemma:
  • Suppose that x and y are the two least-frequent letters. Then there is an optimal tree where x and y are siblings.
[Figure: F:5 and E:9 joined under a new node with key 14.]
That’s enough to show that we don’t rule out optimality after the first step.
What about once we start grouping stuff?
[Figure: a partially built forest with internal nodes 14, 25 and 30 over the letters D:16, A:45, B:13, F:5, C:12, E:9.]
Lemma 2: this distinction doesn’t really matter
[Figure: the first tree is the full tree on A, B, C, D, E, F with internal nodes 14, 25, 30, 55 and 100. The second tree has leaves A:45, G:25 and H:30 with internal nodes 55 and 100; G stands for the subtree with key 25 and H for the subtree with key 30.]
The first thing is an optimal tree on {A,B,C,D,E,F} if and only if the second thing is an optimal tree on {A,G,H}
• For a proof:
  • See CLRS, Lemma 16.3
    • Rigorous although presented in a slightly different way
  • See Lecture Notes 14
    • A bit sketchier, but presented in the same way as here
  • Prove it yourself!
    • This is the best!
Siggi the Studious Stork
Getting all the details isn’t that important, but you should convince yourself that this is true.
Together
• Lemma 1:
  • Suppose that x and y are the two least-frequent letters. Then there is an optimal tree where x and y are siblings.
• Lemma 2:
  • We may as well imagine that CURRENT contains only leaves.
• These imply:
  • At each step, our choice doesn’t rule out an optimal tree.
The whole argument
• Inductive hypothesis:
  • after the t’th step,
    • there is an optimal tree containing the current subtrees as “leaves”
• Base case:
  • after the 0’th step,
    • there is an optimal tree containing all the characters.
• Inductive step:
  • TODO
• Conclusion:
  • after the last step,
    • there is an optimal tree containing this whole tree as a subtree.
  • aka,
    • after the last step the tree we’ve constructed is optimal.
After the t’th step, we’ve got a bunch of current sub-trees. The inductive hypothesis asserts that our subtrees can be assembled into an optimal tree.
Inductive step
• Suppose that the inductive hypothesis holds for t-1
  • After t-1 steps, there is an optimal tree containing all the current sub-trees as “leaves.”
• Want to show:
  • After t steps, there is an optimal tree containing all the current sub-trees as leaves.
We’ve got a bunch of current sub-trees (w, x, y, z in the figures); say that x and y are the two smallest.
• By Lemma 2, may as well treat each current sub-tree as a single leaf.
  • In particular, optimal trees on this new alphabet correspond to optimal trees on the original alphabet.
• Our algorithm would do this at level t: join x and y under a new node a, with a = x + y.
• Lemma 1 implies that there’s an optimal sub-tree that looks like this; aka, what our algorithm did is okay.
• Lemma 2 again says that there’s an optimal tree that looks like this, with x and y expanded back into the sub-trees they stand for.
This is what we wanted to show for the inductive step.
What have we learned?
• ASCII isn’t an optimal way to encode English, since the distribution on letters isn’t uniform.
• Huffman Coding is an optimal way!
• To come up with an optimal scheme for any language efficiently, we can use a greedy algorithm.
• To come up with a greedy algorithm:
  • Identify optimal substructure
  • Find a way to make “safe” choices that won’t rule out an optimal solution.
    • Create subtrees out of the smallest two current subtrees.
Recap I
• Greedy algorithms!
• Three examples:
  • Activity Selection
  • Scheduling Jobs
  • Huffman Coding
Recap II
• Greedy algorithms!
  • Often easy to write down
  • But may be hard to come up with and hard to justify
  • The natural greedy algorithm may not always be correct.
• A problem is a good candidate for a greedy algorithm if:
  • it has optimal substructure
  • that optimal substructure is REALLY NICE
    • solutions depend on just one other sub-problem.
Next time
• Greedy algorithms for Minimum Spanning Tree!
Before next time
• Pre-lecture exercise: candidate greedy algorithms for MST