cse847 project description
CSE847 Project: Large Scale Image Classification
1. Introduction
The objective of this project is to build large scale image classifiers. You are required to build programs that efficiently learn classification models from one million high dimensional training examples, and apply the learned classifiers to make predictions for around 200,000 test examples. Although you are allowed to use off-the-shelf tools, you are encouraged to develop your own classification algorithms and learning programs. The course project will be evaluated in three aspects: the classification performance of your algorithms (70%), your presentation (20%) and your final report (10%).
To evaluate the performance of your algorithms and programs, you are required to submit the classification results for the testing data, which will be evaluated by the instructor using the metric described in section 4. The ranking of the evaluation results will be released at the final presentation. In your presentation, you need to report the running times of your programs for training and testing, and the maximum memory used in training and testing. You may also describe any special efforts you put into the course project to improve the efficiency and the accuracy of your learning programs. For instance, you can explain the strategy you used to efficiently train a classifier from a large number of training examples using only a limited amount of memory.
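As an illustration of one possible limited-memory strategy (not one prescribed by the project), the sketch below trains an online multiclass perceptron. It assumes the examples can be streamed one at a time, so that only the weight table is held in memory. The function name `train_perceptron` and its parameters are hypothetical.

```python
def train_perceptron(examples, num_classes, num_features, epochs=1):
    """Online multiclass perceptron; memory holds only the weight table.

    examples: a callable returning a fresh iterator of (features, label)
              pairs with labels in 1..num_classes, so the data can be
              re-read from disk on each epoch instead of loaded at once.
    Returns (weights, predict).
    """
    weights = [[0.0] * num_features for _ in range(num_classes)]

    def predict(features):
        # Score every class with a dot product and return the best label.
        scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
        return scores.index(max(scores)) + 1

    for _ in range(epochs):
        for features, label in examples():
            guess = predict(features)
            if guess != label:
                # Move the true class toward x and the wrong guess away.
                for j, x in enumerate(features):
                    weights[label - 1][j] += x
                    weights[guess - 1][j] -= x
    return weights, predict
```

For this project the weight table would have 164 rows of 900 values, which is small compared with the training data itself. A perceptron is only a baseline; the same streaming loop could drive other incremental learners.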
2. Dataset
The dataset used in this project is modified from the ImageNet Large Scale Visual Recognition Challenge 2010 (ILSVRC2010). For more details of the original dataset, you can visit the ILSVRC2010 website http://www.imagenet.org/challenges/LSVRC/2010/
The dataset used in the course project consists of 1,262,106 images that are distributed over 164 classes. Some of the classes are taken directly from the ImageNet dataset, while the others are generated by merging multiple classes in order to make the task more challenging. Each image in the dataset is represented by a vector of 900 dimensions, and is assigned to one of the 164 classes. All the features are integers. We randomly choose 1,000,000 images from the dataset to form the training set, and use the remaining 262,106 images as the testing set. Furthermore, we randomly select 125,000 images from the training set to create a small development set, which will be used for algorithm development.
For each set, the image features and the corresponding class assignments are saved in two plain text files, named xxx.txt and xxx_label.txt, respectively. Of course, the test_label.txt file is unavailable. Each line in xxx.txt is the feature vector of an image, and the values in each feature vector are separated by spaces. The line numbers are used as the index ids for images. For example, the image whose features are on the first line has index id 1. Each line in xxx_label.txt is the class label for the corresponding image in xxx.txt.
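The file layout described above lends itself to reading one image at a time. The following is a minimal Python sketch, assuming the feature and label files are paired line by line as described and that labels are written as integers. The function name `iter_examples` is hypothetical and not part of the course materials.

```python
def iter_examples(feature_path, label_path=None):
    """Yield (index_id, features, label) one image at a time.

    index_id is the 1-based line number, matching the index ids above.
    label is None when no label file is given (e.g. for the testing set).
    """
    with open(feature_path) as feats:
        labels = open(label_path) if label_path else None
        try:
            for index_id, line in enumerate(feats, start=1):
                # Feature values are integers separated by spaces.
                features = [int(v) for v in line.split()]
                label = int(labels.readline()) if labels else None
                yield index_id, features, label
        finally:
            if labels:
                labels.close()
```

Because the function is a generator, only one 900-dimensional vector is held in memory at a time, which may help when iterating over the one million training examples.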
In the course project, the development set will be distributed on 03/14/2013. It can be downloaded from http://www.cse.msu.edu/~cse847/project/development.rar. Both the training set and the testing set will be available on 04/04/2013 and can be downloaded from http://www.cse.msu.edu/~cse847/project/training.rar and http://www.cse.msu.edu/~cse847/project/testing.rar. You need to send your prediction results for the testing set by email to your instructor on 04/17/2011 (11:59pm).
3. Submissions
For each class, you need to return a list of the indices of 100 test images, in descending order of the classification scores, i.e., the first image index in the ranking list should be the one that is most likely to be assigned to the class, and so on. Please use the following format for each line in the submitted file:
class label	image index
where class label is the label of the predicted class, ranging from 1 to 164, and image index is the index of a test image. The two fields are separated by a Tab. Please put the 100 image indices of class 1, ordered by the classification scores, at the top of the file, followed by the 100 image indices of class 2, and so on. Below is an example of the file:
1	165
1	32464
(100 image indices for the first class, class 1)
...
164	3332
164	8476
(100 image indices for the last class, class 164)
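A file in this format could be produced along the following lines. This is a minimal sketch, assuming the classification scores are held in a dictionary keyed by class label. That in-memory layout and the function name `write_submission` are hypothetical choices, not something the project prescribes.

```python
import heapq

def write_submission(scores, path, per_class=100):
    """Write one 'class label<TAB>image index' line per ranked image.

    scores: dict mapping class label -> {image index: classification score}
            (a hypothetical layout chosen for this sketch).
    """
    with open(path, "w") as out:
        # Classes appear in ascending order: class 1 first, class 164 last.
        for label in sorted(scores):
            # Highest-scoring images first; nlargest avoids a full sort.
            top = heapq.nlargest(per_class, scores[label].items(),
                                 key=lambda item: item[1])
            for image_index, _score in top:
                out.write(f"{label}\t{image_index}\n")
```

With 164 classes and the default `per_class=100`, the resulting file has 16,400 lines.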
4. Evaluation metric:
The Mean Average Precision (MAP) is used to evaluate the performance, and is computed as follows:
$$\mathrm{MAP} = \frac{1}{C} \sum_{i=1}^{C} \frac{1}{K} \sum_{k=1}^{K} P_i(k)$$
where $C$ is the total number of classes (i.e., 164) and $K$ is the number of images returned for each class (i.e., 100). $P_i(k)$ is called precision, and is defined as the percentage of the first $k$ test images, returned by your programs, that belong to class $i$.
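To check a ranking against the development labels, the metric can be sketched in Python as below. This assumes one reading of the description above: for each class, precision over the first k returned images is averaged for k from 1 to 100, and the result is then averaged over classes. Precision is computed as a fraction rather than a percentage. The function name and argument layout are hypothetical, and the instructor's evaluation script may differ in details.

```python
def mean_average_precision(rankings, true_labels, k_max=100):
    """Average, over classes, of the mean precision at k = 1..k_max.

    rankings:    dict class label -> ranked list of image indices, best first.
    true_labels: dict image index -> true class label.
    """
    total = 0.0
    for label, ranked in rankings.items():
        hits = 0
        precision_sum = 0.0
        for k, image_index in enumerate(ranked[:k_max], start=1):
            if true_labels.get(image_index) == label:
                hits += 1
            # Fraction of the first k returned images that belong to the class.
            precision_sum += hits / k
        total += precision_sum / k_max
    return total / len(rankings)
```

For example, with two classes and `k_max=2`, a class whose first returned image is correct and second is wrong contributes (1.0 + 0.5) / 2 = 0.75 to the average.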