Generalized Information Theory Meets Human Cognition: Introducing a Unified Framework to Model Uncertainty and Information Search

Vincenzo Crupi (1), Jonathan D. Nelson (2, 3), Björn Meder (3), Gustavo Cevolani (4), and Katya Tentori (5)

(1) Center for Logic, Language, and Cognition, Department of Philosophy and Education, University of Turin, Italy
(2) School of Psychology, University of Surrey, Guildford, UK
(3) Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany
(4) IMT School for Advanced Studies, Lucca, Italy
(5) Center for Mind/Brain Sciences, University of Trento, Italy
Author Note

Correspondence concerning this article should be addressed to Vincenzo Crupi, Department of Philosophy and Education, University of Turin, via Sant'Ottavio 20, 10124, Torino (Italy), e-mail: [email protected]. This research was supported by grants CR 409/1-2, NE 1713/1-2, and ME 3717/2-2 from the Deutsche Forschungsgemeinschaft as part of the priority program New Frameworks of Rationality (SPP 1516). We thank Nick Chater, Laura Martignon, Andrea Passerini, and Paul Pedersen for helpful comments and exchanges.
Abstract

Searching for information is critical in many situations. In medicine, for instance, careful choice of a diagnostic test can help narrow down the range of plausible diseases that the patient might have. In a probabilistic framework, test selection is often modeled by assuming that people's goal is to reduce uncertainty about possible states of the world. In cognitive science, psychology, and medical decision making, Shannon entropy is the most prominent and most widely used model to formalize probabilistic uncertainty and the reduction thereof. However, a variety of alternative entropy metrics (Hartley, Quadratic, Tsallis, Rényi, and more) are popular in the social and the natural sciences, computer science, and philosophy of science. Particular entropy measures have been predominant in particular research areas, and it is often an open issue whether these divergences emerge from different theoretical and practical goals or are merely due to historical accident. Cutting across disciplinary boundaries, we show that several entropy and entropy reduction measures arise as special cases in a unified formalism, the Sharma-Mittal framework. Using mathematical results, computer simulations, and analyses of published behavioral data, we discuss four key questions: How do various entropy models relate to each other? What insights can be obtained by considering diverse entropy models within a unified framework? What is the psychological plausibility of different entropy models? What new questions and insights for research on human information acquisition follow? Our work provides several new pathways for theoretical and empirical research, reconciling apparently conflicting approaches and empirical findings within a comprehensive and unified information-theoretic formalism.

KEYWORDS: Entropy, Uncertainty, Value of information, Information search, Probabilistic models
Generalized Information Theory Meets Human Cognition: Introducing a Unified Framework to Model Uncertainty and Information Search

1. Introduction

A key topic in the study of rationality, cognition, and behavior is the effective search for relevant information or evidence. Information search is also closely connected to the notion of uncertainty. Typically, an agent will seek to acquire information to reduce uncertainty about an inference or decision problem. Physicians prescribe medical tests in order to handle arrays of possible diagnoses. Detectives seek witnesses in order to identify the culprit of a crime. And, of course, scientists gather data in order to discriminate among different hypotheses.

In psychology and cognitive science, most early work on information acquisition adopted a logical, deductive inference perspective. In the spirit of Popper's (1959) influential falsificationist philosophy of science, the idea was that learners should seek information that could help them falsify hypotheses (e.g., expressed as a conditional or a rule; Wason, 1960, 1966, 1968). However, many human reasoners did not seem to believe that information is useful if and only if it can potentially rule out (falsify) a hypothesis. From the 1980s, cognitive scientists started analyzing human information search with a closer look at inductive inference, using probabilistic models to quantify the value of information and endorsing them as normative benchmarks (e.g., Baron, 1985; Klayman & Ha, 1987; Skov & Sherman, 1986; Slowiaczek, Klayman, Sherman, & Skov, 1992; Trope & Bassok, 1982, 1983). This research was inspired by seminal work in philosophy of science (e.g., Good, 1950), statistics (e.g., Lindley, 1956), and decision theory (Savage, 1972). In this view, each outcome of a query could modify an agent's beliefs about the hypotheses being considered, thus providing some amount of information. For instance, the key theoretical point of Oaksford and Chater's (1994, 2003) analysis of Wason's selection task was to conceptualize information acquisition as a piece of probabilistic inductive reasoning, assuming that people's goal is to reduce uncertainty about whether a rule holds or not. In a similar vein, researchers in vision science have used measures of uncertainty reduction to predict visual queries for gathering information (i.e., eye movements; Legge, Klitz, & Tjan, 1997; Najemnik & Geisler, 2005, 2009; Nelson & Cottrell, 2007; Renninger, Coughlan, Verghese, & Malik, 2005), or to guide a robot's eye movements (Denzler & Brown, 2002). Probabilistic models of uncertainty reduction have also been used to predict human query selection in causal reasoning (Bramley, Lagnado, & Speekenbrink, 2015), hypothesis testing (Austerweil & Griffiths, 2011; Navarro & Perfors, 2011; Nelson, Divjak, Gudmundsdottir, Martignon, & Meder, 2014; Nelson, Tenenbaum, & Movellan, 2001), and categorization (Meder & Nelson, 2012; Nelson, McKenzie, Cottrell, & Sejnowski, 2010).

If reducing uncertainty is a major cognitive goal and motivation for information acquisition, a critical issue is how uncertainty and the reduction thereof can be represented in a rigorous manner. A fruitful approach to formalizing uncertainty is to use the mathematical notion of entropy, which in turn generates a corresponding model of the informational utility of an experiment as the expected reduction of entropy (uncertainty), sometimes called expected information gain.

In many disciplines, including psychology and neuroscience (Hasson, 2016), the most prominent model is Shannon (1948) entropy. However, a number of non-equivalent measures of entropy have been suggested, and are being used, in a variety of research domains. Examples include the application of Quadratic entropy in ecology (Lande, 1996), the family of Rényi (1961) entropies in computer science and image processing (Boztas, 2014; Sahoo & Arora, 2004), and Tsallis entropies in physics (Tsallis, 2011). It is currently unknown whether these other entropy models would have potential to address key theoretical and empirical questions in cognitive science. Here, we bring together these different models in a comprehensive theoretical framework, the Sharma-Mittal formalism (from Sharma & Mittal, 1975), which incorporates a large number of prominent entropy measures as special cases. Careful consideration of the formal properties of this family of entropy measures will reveal important implications for modeling uncertainty and information search behavior. Against this rich theoretical background, we will draw on existing behavioral data and novel simulations to explore how different models relate to each other, elucidate their psychological meaning and plausibility, and show how they can generate new testable predictions.
The remainder of this paper is organized as follows. We begin by spelling out what an entropy measure is and how it can be employed to represent uncertainty and the informational value of queries (questions, tests, experiments) (section 2). Subsequently, we review four representative and influential definitions of entropy, namely Quadratic, Hartley, Shannon, and Error entropy (section 3). These models have been, and continue to be, of importance in different areas of research. In the main theoretical section of the paper, we describe a unified formal framework generating a biparametric continuum of entropy measures. Drawing on work in generalized information theory, we show that many extant models of entropy and expected entropy reduction can be embedded in this comprehensive formalism (section 4). We provide a number of new mathematical results in this section. We also address the theoretical meaning of the parameters involved when the target domain of application is human reasoning, with implications for both normative and descriptive approaches. We then further elaborate on the connection with experimental research in several ways. First, we present simulation results from an extensive exploration of information search decision problems in which alternative models provide strongly diverging, empirically testable predictions (section 5). Second, we report and discuss an overarching analysis of the information-theoretic account of the most widely known experimental paradigm for the study of information gathering, i.e., Wason's (1966, 1968) abstract selection task (section 6.1). Then we investigate which models perform better against data from a range of experience-based studies on human information search behavior (Meder & Nelson, 2012; Nelson et al., 2010) (section 6.2). We also point out that some entropy models from this framework offer a potential explanation of human information search behavior in experiments where probabilities are conveyed through words and numbers, which to date have been perplexing to account for theoretically (section 6.3). Finally, we show that new models offer a theoretically satisfying and descriptively adequate unification of disparate results across different kinds of tasks (section 6.4). In the General Discussion (section 7), we outline and assess the prospects of a generalized information-theoretic framework for guiding the study of human inference and decision making.

Part of our discussion relies and elaborates on mathematical analyses, including novel results. Moreover, although a number of the mathematical points in the paper can be found scattered through the mathematics and physics literature, here we bring them together systematically. We provide Supplementary Materials where non-trivial derivations are given according to our unified notation. Throughout each section of the text, statements requiring a mathematical proof are flagged by square brackets [SupplMat], and the proof is then presented in the corresponding subsection of the Supplementary Materials file. Among the formal results provided that are novel to the best of our knowledge, the following we find especially important: the ordinal equivalence of Sharma-Mittal entropy measures of the same order (proof in SupplMat, section 4), the additivity of all Sharma-Mittal measures of expected entropy reduction for sequential tests (again SupplMat, 4), and the distinctive role of the degree parameter in information search tasks such as the Person Game (SupplMat, 5). Further novel results include the subsumption of diverse models such as the Arimoto (1971) and the Power entropies within the Sharma-Mittal framework (SupplMat, 3), and the specification of how a number of different entropy measures can be construed within the general theory of means (Table 4).
2. Entropies, uncertainty, and information search

According to a well-known anecdote, the origins of information theory were marked by a witty joke of John von Neumann. Claude Shannon was unsure what to call the key concept of his groundbreaking work on the "mathematical theory of communication" (Shannon, 1948). "You should call it entropy," von Neumann suggested. Of course, von Neumann must have been aware of the close connections between Shannon's formula and Boltzmann's definition of entropy in classical statistical mechanics. But the most important reason for his suggestion, von Neumann quipped, was that "nobody knows what entropy really is, so in a debate you will always have the advantage" (see Tribus & McIrvine, 1971). Shannon accepted the advice.

Several decades later, von Neumann's remark seems even more pointed, if anything. Influential observers have voiced caution and concern about the proliferation of mathematical analyses of entropy and related notions (Aczél, 1984, 1987). Meanwhile, many applications have been developed, for instance in physics and ecology (see, e.g., Beck, 2009; Keylock, 2005). But recurrent theoretical controversies have arisen, too, along with occasional complaints of conceptual confusion (see Cho, 2002, and Jost, 2006, respectively).
Luckily, these thorny issues will be tangential to our main concerns. Although a given formalization of entropy can be considered for the representation and measurement of different constructs in each of a variety of domains, we focus on one target concept for which entropies can be employed, namely the uncertainty concerning a variable X given a probability distribution P. In this regard, the key question is the following: How much uncertainty is conveyed about variable X by a given probability distribution P? This notion is central to the normative and descriptive study of human cognition.

Suppose, for instance, that an infection can be caused by three different types of virus, and label x1, x2, x3 the corresponding possibilities. Consider two different probability assignments, such as, say:

P(x1) = 0.49, P(x2) = 0.49, P(x3) = 0.02

and

P*(x1) = 0.70, P*(x2) = 0.15, P*(x3) = 0.15

Is the uncertainty about X = {x1, x2, x3} greater under P or under P*? An entropy measure enables us to give precise quantitative values in both cases, and hence a clear answer. Importantly, however, the answer will often be measure-dependent, for different entropy measures convey different ideas of uncertainty and exhibit distinct mathematical properties of theoretical interest. We will see this in detail later on.
Once uncertainty as our conceptual target has been outlined, we can turn to entropy as a mathematical object. Consider a finite set X of n mutually exclusive and jointly exhaustive possibilities x1, …, xn on which a probability distribution P(X) is defined, so that P(X) = {P(x1), …, P(xn)}, with P(xi) ≥ 0 for any i (1 ≤ i ≤ n) and $\sum_{x_i \in X} P(x_i) = 1$. The n elements in X = {x1, …, xn} can be taken as representing different kinds of entities, such as events, categories, or propositions. For our purposes, ent is an entropy measure if it is a function f of the relevant probability values only, i.e.:

$$ent_P(X) = f[P(x_1), \ldots, P(x_n)]$$

and function f satisfies a small number of basic properties (see below). Notice that, in general, an entropy function can be readily extended to the case of a conditional probability distribution given some datum y. In fact, under the conditional probability distribution P(X|y), one has $ent_P(X|y) = f[P(x_1|y), \ldots, P(x_n|y)]$.
Shannon entropy has been so prominent in cognitive science that some readers will ask: why not just stick with it? More specific objections in this vein include that Shannon entropy is uniquely axiomatically motivated, that Shannon entropy is already central to psychological theory of the value of information, or that Shannon entropy is optimal in certain applied situations. Each objection can be addressed separately. First, a number of entropy metrics in our generalized framework (not only Shannon) have been or can be uniquely derived from specific sets of axioms (see Csiszár, 2008). Second, although Shannon entropy has a number of intuitively desirable properties, it is not a seriously competitive descriptive psychological model of the value of information in some tasks (e.g., Nelson et al., 2010). Third, several published papers in applied domains report superior performance when other entropy measures are used (e.g., Ramírez-Reyes et al., 2016). Indeed, Shannon's (1948) own view was that although axiomatic characterization can lend plausibility to measures of entropy and information, "the real justification" (p. 393) rests on the measures' operational relevance. A generalized mathematical framework can increase our theoretical understanding of the relationships among different measures, unify diverse psychological findings, and generate novel questions for future research.
Scholars have used different properties as defining an entropy measure (see, e.g., Csiszár, 2008). Besides some usual technical requirements (like non-negativity), a key idea is that entropy should be appropriately sensitive to how even or uneven a distribution is, at least with respect to the extreme cases of a uniform probability function, U(X) = {1/n, …, 1/n}, or of a deterministic function V(X) where V(xi) = 1 for some i (1 ≤ i ≤ n) and 0 for all other xs. (In the latter case, the distribution actually reflects a truth-value assignment, in logical parlance.) In our setting, U(X) represents the highest possible degree of uncertainty about X, while under V(X) the true value of X is known for sure, and no uncertainty is left. Hence it must hold that, for any X and P(X), $ent_U(X) \geq ent_P(X) \geq ent_V(X)$, with at least one inequality strict. This basic and minimal condition we label evenness sensitivity. It is conveyed by Shannon entropy as well as many others, as we shall see, and it guarantees, for instance, that entropy is strictly higher for, say, a distribution like {1/3, 1/3, 1/3} than for {1, 0, 0}.
Once the idea of an entropy measure is characterized, one can study different measures of expected entropy reduction. This amounts to considering two variables X and Y, and defining the expected reduction of the initial entropy of X across the elements of Y. To illustrate, in the viral infection example mentioned above, X may concern the type of virus actually involved, while Y could be some clinically observable marker (like the result of a blood test) which is informationally relevant for X. Mathematically, given a joint probability distribution P(X,Y) over the combination of two variables X and Y (i.e., their Cartesian product X × Y), the actual change in entropy about X determined by an element y in Y can be represented as $\Delta ent_P(X, y) = ent_P(X) - ent_P(X|y)$. Accordingly, the expected reduction of the initial entropy of X across the elements of Y can be computed in a standard way, as follows:¹

$$R_P(X, Y) = \sum_{y_j \in Y} \Delta ent_P(X, y_j) \, P(y_j)$$

The notation $R_P(X, Y)$ is adapted from work on the foundations of Bayesian statistics, where the expected reduction in entropy is seen as measuring the dependence of variable X on variable Y, or the relevance of Y for X (see, e.g., Dawid & Musio, 2014).
Very much as for entropy itself, the expected reduction of entropy remains as general and neutral a notion as possible. R measures, too, can be given different interpretations in different domains. In many contexts, it is plausibly assumed that reduction of the uncertainty is a major dimension of the purely informational (or epistemic) value of the search for more data. We will thus consider a measure R as providing a formal approach to questions of the following kind: Given X as a target of investigation, what is the expected usefulness of finding out about Y from a purely informational point of view? Hence, the notion of uncertainty is tightly coupled to the rational assessment of the expected informational utility of pursuing a given search for additional evidence (performing a query, executing a test, running an experiment). (See Crupi & Tentori, 2014; Nelson, 2005, 2008. For more discussion, also see Evans & Over, 1996; Roche & Shogenji, 2016.)

¹ For technical reasons, we will assume P(yj) > 0 for any j. This is largely a safe proviso for our current purposes. In fact, in our setting with both X and Y finite sets, any zero probability outcome in Y could just be omitted.
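To make these definitions concrete, here is a minimal Python sketch (ours, purely for illustration; not from the paper or its Supplementary Materials) that computes the expected entropy reduction for an arbitrary entropy function, given a joint distribution P(X, Y):

```python
# Illustrative sketch (not the authors' code): expected entropy reduction
# R_P(X, Y) for an arbitrary entropy function, given the joint P(X, Y)
# as a nested list with joint[i][j] = P(x_i, y_j).

def expected_entropy_reduction(ent, joint):
    n_y = len(joint[0])
    p_y = [sum(row[j] for row in joint) for j in range(n_y)]   # P(y_j)
    prior = [sum(row) for row in joint]                        # P(x_i)
    total = 0.0
    for j in range(n_y):
        # assumes P(y_j) > 0 (see footnote 1): zero-probability outcomes omitted
        posterior = [row[j] / p_y[j] for row in joint]         # P(x_i | y_j)
        total += p_y[j] * (ent(prior) - ent(posterior))        # Delta-ent times P(y_j)
    return total
```

Any of the entropy functions reviewed in the next section can be passed as ent; the sum over outcomes weights each actual entropy change Δent_P(X, y_j) by P(y_j), exactly as in the definition above.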
Formally, X and Y can just be seen as partitions of possibilities. In this interpretation, however, they play quite different roles in $R_P(X, Y)$. The first argument, X, represents the overall goal of the inquiry, while the second, Y, is supposed to be directly accessible to the information seeker. In a typical application, Y will be a more or less useful test to learn about target X, although unable to conclusively establish what the true hypothesis in X is.
Table 1. Notation employed.

$H = \{h_1, \ldots, h_n\}$: A partition of n possibilities (or hypothesis space).
$P(H)$: Probability distribution P defined over the elements of H.
$P(H|e)$: Probability distribution P defined over the elements of H conditional on e.
$U(H)$: Uniform probability distribution over the elements of H.
$V(H)$: A probability distribution such that V(hi) = 1 for some i (1 ≤ i ≤ n) and 0 for all other hs.
$H \times E$: The variable obtained by the combination (Cartesian product) of variables H and E.
$P(H, E)$: Joint probability distribution over the combination of variables H and E.
$H \perp_P E$: Given P(H,E), variables H and E are statistically independent.
$H \perp_P E \mid F$: Given P(H,E,F), variables H and E are statistically independent conditional on each element in F.
$ent_P(H)$: Entropy of H given P(H).
$ent_P(H|e)$: Conditional entropy of H on e given P(H|e).
$\Delta ent_P(H, e)$: Reduction of the initial entropy of H provided by e, i.e., $ent_P(H) - ent_P(H|e)$.
$R_P(H, E)$: Expected reduction of the entropy of H across the elements of E, given P(H,E).
$R_P(H, E|f)$: Expected reduction of the entropy of H across the elements of E, given P(H,E|f).
$R_P(H, E|F)$: Expected value of $R_P(H, E|f)$ across the elements of F, given P(H,E,F).
$\ln_t(x)$: The Tsallis generalization of the natural logarithm (with parameter t).
$e_t(x)$: The Tsallis generalization of the ordinary exponential (with parameter t).
In general, the occurrence of one particular element y of Y does not need to reduce the initial entropy about X; it might just as well increase it, hence making $\Delta ent_P(X, y)$ negative. This quantity can be negative if (for instance) datum y changes probabilities from P(X) = {0.9, 0.1} to P(X|y) = {0.6, 0.4}. But can $R_P(X, Y)$, i.e., the expected informational usefulness of Y for learning about X, be negative? Some R measures are strictly non-negative, but others can in fact be negative in expectation; this depends on key properties of the underlying entropy measure, as we discuss later on.

To summarize, in the domain of human cognition, probability distributions can be employed to represent an agent's degrees of belief (be they based on objective statistical information or subjective confidence), with entropy $ent_P(X)$ providing a formalization of the uncertainty about X (given P). Relying on the reduction of uncertainty as an informational utility, $R_P(X, Y)$ is then interpreted as a measure of the expected usefulness of a query (test, experiment) Y relative to a target hypothesis space X. From now on, to emphasize this interpretation, we will often use H = {h1, …, hn} to denote a hypothesis set of interest and E = {e1, …, em} for a possible search for evidence. Table 1 summarizes our terminology in this respect as well as for the subsequent sections.
3. Four Influential Entropy Models

We will now briefly review four important models of entropy and the corresponding models of expected entropy reduction.

3.1. Quadratic entropy

Entropy/Uncertainty. Some interesting entropy measures were originally proposed long before the exchange between Shannon and von Neumann, when entropy was not yet a scientific term outside statistical thermodynamics. Here is one major instance:

$$ent_P^{Quad}(H) = 1 - \sum_{h_i \in H} P(h_i)^2$$

Labeled Quadratic entropy in Vajda and Zvárová (2007), this measure is widely known as the Gini (or Gini-Simpson) index, after Gini (1912) and Simpson (1949) (also see Gibbs & Martin, 1962). It is often employed as an index of biological diversity (see, e.g., Patil & Taille, 1982) and is sometimes spelled out in the following equivalent formulation:

$$ent_P^{Quad}(H) = \sum_{h_i \in H} P(h_i)(1 - P(h_i))$$

The above formula suggests a meaningful interpretation with H amounting to a partition of hypotheses considered by an uncertain agent. In this reading, $ent^{Quad}$ computes the average (expected) surprise that the agent would experience in finding out what the true element of H is, given 1 - P(h) as a measure of the surprise that arises in case h obtains (see Crupi & Tentori, 2014).²

Entropy reduction/Informational value of queries. Quadratic entropy reduction, namely, $\Delta ent_P^{Quad}(H, e) = ent_P^{Quad}(H) - ent_P^{Quad}(H|e)$, has been occasionally mentioned in philosophical analyses of scientific inference (Niiniluoto & Tuomela, 1973, p. 67). In turn, its associated expected reduction measure, $R_P^{Quad}(H, E) = \sum_{e_j \in E} \Delta ent_P^{Quad}(H, e_j) P(e_j)$, was applied by Horwich (1982, pp. 127-129), again in formal philosophy of science, and studied in computer science by Raileanu and Stoffel (2004).

² $ent^{Quad}$ also quantifies the overall expected inaccuracy of probability distribution P(H) as measured by the so-called Brier score (i.e., the squared Euclidean distance from the possible truth-value assignments over H; see Brier, 1950; Leitgeb & Pettigrew, 2010a, b; Pettigrew, 2013; Selten, 1998). Festa (1993, 137ff.) also gives a useful discussion of Quadratic entropy in the philosophy of science, including Carnap's (1952) classical work in inductive logic.
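As a concrete illustration (a sketch of ours, not from the paper), Quadratic entropy is a one-liner, and applying it to the two virus distributions of section 2 already yields a definite verdict for that example:

```python
# Quadratic (Gini-Simpson) entropy: ent^Quad_P(H) = 1 - sum_i P(h_i)^2,
# i.e., the expected surprise with surprise(h) = 1 - P(h).

def ent_quadratic(p):
    return 1.0 - sum(q * q for q in p)

print(ent_quadratic([0.49, 0.49, 0.02]))  # P  -> 0.5194
print(ent_quadratic([0.70, 0.15, 0.15]))  # P* -> 0.4650
```

Under Quadratic entropy, then, P carries more uncertainty than P*; as we will see shortly, Shannon entropy reverses this verdict.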
3.2. Hartley entropy

Entropy/Uncertainty. Gini's work did not play any apparent role in the development of Shannon's (1948) theory. A seminal paper by Hartley (1928), however, was a starting point for Shannon's analysis. One lasting insight of Hartley was the introduction of logarithmic functions, which have become ubiquitous in information theory ever since. As Hartley also realized, the choice of a base for the logarithm is a matter of conventionally setting a unit of measurement (Hartley, 1928, pp. 539-541). Throughout our discussion, we will employ the natural logarithm, denoted as ln.
Inspired by Hartley's (1928) original idea that the information provided by the observation of one among n possible values of a variable is increasingly informative the larger n is, and that it immediately reflects the entropy of that variable, one can define the Hartley entropy as follows (Aczél, Forte, & Ng, 1974):

$$ent_P^{Hartley}(H) = \ln\Big(\sum_{h_i \in H} P(h_i)^0\Big)$$

Under the convention 0⁰ = 0 (which is standard in the entropy literature), and given that P(hi)⁰ = 1 whenever P(hi) > 0, $ent^{Hartley}$ computes the logarithm of the number of all non-null probability elements in H.

Entropy reduction/Informational value of queries. When applied to the domain of reasoning and cognition, the implications of Hartley entropy reveal an interesting Popperian flavor. A piece of evidence e is useful, it turns out, only to the extent that it excludes ("falsifies") at least some of the hypotheses in H, for otherwise the reduction in Hartley entropy, $\Delta ent_P^{Hartley}(H, e) = ent_P^{Hartley}(H) - ent_P^{Hartley}(H|e)$, is just zero. An agent adopting such a measure of informational utility would then only value a test outcome, e, insofar as it conclusively rules out at least one hypothesis in H. If no possible outcome in E is potentially a "falsifier" for some hypothesis in H, then the expected reduction of Hartley entropy, $R^{Hartley}$, is also zero, implying that query E has no expected usefulness at all with respect to H.
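A brief sketch (again ours, for illustration) makes the Popperian behavior tangible: only conclusive falsification moves Hartley entropy.

```python
import math

# Hartley entropy: the log of the number of non-null probability
# elements in H (convention 0^0 = 0).

def ent_hartley(p):
    return math.log(sum(1 for q in p if q > 0))

# A drastic revision of probabilities leaves Hartley entropy unchanged...
print(ent_hartley([0.80, 0.10, 0.10]), ent_hartley([0.98, 0.01, 0.01]))  # ln 3, ln 3
# ...while conclusively ruling out one hypothesis reduces it.
print(ent_hartley([0.90, 0.10, 0.00]))                                   # ln 2
```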
3.3. Shannon entropy

Entropy/Uncertainty. In many contexts, the notion of entropy is simply and immediately equated to Shannon's formalism. Overall, such special consideration is well-deserved and motivated by countless applications spread over virtually all branches of science. The form of Shannon entropy is fairly well-known:

$$ent_P^{Shannon}(H) = \sum_{h_i \in H} P(h_i) \ln\Big(\frac{1}{P(h_i)}\Big)$$

Concerning the interpretation of the formula, many points made earlier for Quadratic entropy apply to Shannon entropy too, given relevant adjustments. In fact, ln(1/P(h)) is another measure of the surprise in finding out that a state of affairs h obtains, and thus $ent^{Shannon}$ is its overall expected value relative to H.³

Figure 1. A graphical illustration of Quadratic, Hartley, Shannon, and Error entropy as distinct measures of uncertainty over a binary hypothesis set $H = \{h, \bar{h}\}$ as a function of the probability of h.
Entropy reduction/Informational value of queries. The reduction of Shannon entropy, $\Delta ent_P^{Shannon}(H, e) = ent_P^{Shannon}(H) - ent_P^{Shannon}(H|e)$, is sometimes called information gain and it is often considered as a measure of the informational utility of a datum e. Its expected value, also called expected information gain, $R_P^{Shannon}(H, E) = \sum_{e_j \in E} \Delta ent_P^{Shannon}(H, e_j) P(e_j)$, is then viewed as a measure of the usefulness of query E for learning about H. (See, e.g., Austerweil & Griffiths, 2011; Bar-Hillel & Carnap, 1953; Lindley, 1956; Oaksford & Chater, 1994, 2003, and Ruggeri & Lombrozo, 2015; also see Benish, 1999, and Nelson, 2005, 2008, for more discussion.)

³ The quantity ln(1/P(h)) also characterizes a popular approach to the measurement of the inaccuracy of probability distribution P(H) when h is the true element in H (the so-called logarithmic score), and $ent^{Shannon}$ can be seen as computing the expected inaccuracy of P(H) accordingly (see Good, 1952; also see Gneiting & Raftery, 2007).
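In code (again an illustrative sketch of ours), Shannon entropy applied to the virus example of section 2 shows the measure-dependence announced there: Shannon ranks P* as more uncertain than P, the opposite of the Quadratic verdict.

```python
import math

# Shannon entropy: ent^Shannon_P(H) = sum_i P(h_i) ln(1 / P(h_i)),
# with zero-probability elements contributing nothing.

def ent_shannon(p):
    return sum(q * math.log(1.0 / q) for q in p if q > 0)

print(ent_shannon([0.49, 0.49, 0.02]))  # P  -> about 0.777
print(ent_shannon([0.70, 0.15, 0.15]))  # P* -> about 0.819
```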
3.4. Error entropy

Entropy/Uncertainty. Given a distribution P(H) and the goal of predicting the true element of H, a rational agent would plausibly select h* such that $P(h^*) = \max_{h_i \in H}[P(h_i)]$, and $1 - \max_{h_i \in H}[P(h_i)]$ would then be the probability of error. Since Fano's (1961) seminal work, this quantity has received considerable attention in information theory. As it is also known as Bayes's error, we will call this quantity Error entropy:

$$ent_P^{Error}(H) = 1 - \max_{h_i \in H}[P(h_i)]$$

Note that $ent^{Error}$ is only concerned with the largest value in the distribution P(H), namely $\max_{h_i \in H}[P(h_i)]$. The lower that value, the higher the chance of error were a guess to be made, thus the higher the uncertainty about H.

Entropy reduction/Informational value of queries. Unlike the other models above, Error entropy has seldom been considered in the natural or social sciences. However, it can be taken as a sound basis for the analysis of rational behavior. In the latter domain, it is quite natural to rely on the reduction of the expected probability of error, $\Delta ent_P^{Error}(H, e) = ent_P^{Error}(H) - ent_P^{Error}(H|e)$, as the utility of a datum (often labelled probability gain; see Baron, 1985; Nelson, 2005, 2008) and on its expected value, $R_P^{Error}(H, E) = \sum_{e_j \in E} \Delta ent_P^{Error}(H, e_j) P(e_j)$, as the usefulness of a query or test. Indeed, there are important occurrences of this model in the study of human cognition.⁴

⁴ An early example is Baron's (1985, ch. 4) presentation of $R^{Error}$, following Savage (1972, ch. 6). Experimental investigations on whether $R^{Error}$ can account for actual patterns of reasoning include Baron, Beattie, and Hershey (1988), Bramley, Lagnado, and Speekenbrink (2015), Meder and Nelson (2012), Nelson, McKenzie, Cottrell, and Sejnowski (2010), and Rusconi, Marelli, D'Addario, Russo, and Cherubini (2014), while Crupi, Tentori, and Lombardi (2009) relied on $R^{Error}$ in their critical analysis of so-called pseudodiagnosticity (also see Crupi & Girotto, 2014; Tweeney, Doherty, & Kleiter, 2010).
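For completeness, here is the same kind of sketch for Error entropy; fed to the expected_entropy_reduction helper of section 2, it yields probability gain.

```python
# Error entropy (Bayes's error): one minus the probability of the
# modal hypothesis.

def ent_error(p):
    return 1.0 - max(p)

print(ent_error([0.49, 0.49, 0.02]))  # P  -> 0.51
print(ent_error([0.70, 0.15, 0.15]))  # P* -> 0.30
```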
4. A Unified Framework for Uncertainty and Information Search

The set of models introduced above represents a diverse sample in historical, theoretical, and mathematical terms (see Figure 1 for a graphical illustration). Is the prominence of particular models due to fundamental distinctive properties, or largely due to historical accident? What are the relationships among these models? In this section we show how all of these models can be embedded in a unified mathematical formalism, providing new insight.
4.1. Sharma-Mittal entropies

Let us take Shannon entropy again as a convenient starting point. As noted above, Shannon entropy is an average, more precisely a self-weighted average, displaying the following structure:

$$\sum_{h_i \in H} P(h_i) \, inf[P(h_i)]$$

The label self-weighted indicates that each probability P(h) serves as a weight for the value of function inf having that same probability as its argument, namely, inf[P(h)]. The function inf can be seen as capturing a notion of atomic information (or surprise), assigning a value to each distinct element of H on the basis of its own probability (and nothing else). An obvious requirement here is that inf should be a decreasing function, because a finding that was antecedently highly probable (improbable) provides little (much) new information (an idea that Floridi, 2013, calls the "inverse relationship principle" after Barwise, 1997, p. 491). In Shannon entropy, one has inf(x) = ln(1/x). Given inf(x) = 1 - x, instead, Quadratic entropy arises from the very same scheme above.

A self-weighted average is a special case of a generalized (self-weighted) mean, which can be characterized as follows:

$$g^{-1}\Big(\sum_{h_i \in H} P(h_i) \, g\big\{inf[P(h_i)]\big\}\Big)$$

where g is a differentiable and strictly increasing function (see Wang & Jiang, 2005; also see Muliere & Parmigiani, 1993, for the fascinating history of these ideas). For different choices of g, different kinds of (self-weighted) means are instantiated. With g(x) = x, the weighted average above obtains once again. For another standard instance, g(x) = 1/x gives rise to the harmonic mean.
Let us now consider the form of generalized (self-weighted) means above and focus on the following setting:

$$g(x) = \ln_r[e_t(x)] \qquad inf(x) = \ln_t(1/x)$$

where

$$\ln_t(x) = \frac{x^{(1-t)} - 1}{1 - t} \qquad e_t(x) = [1 + (1 - t)x]^{\frac{1}{1-t}}$$

are generalized versions of the natural logarithm and exponential functions, respectively, often associated with Tsallis's (1988) work. Importantly, the ln_t function recovers the ordinary natural logarithm ln in the limit for t → 1, so that one can safely equate ln_t(x) = ln(x) for t = 1 and have a nice and smooth generalized logarithmic function.⁵ Similarly, it is assumed that $e_t(x) = e^x$ for t = 1, as this is the limit for t → 1 [SupplMat, section 1]. Negative values of parameters r and t will not need concern us here: we will be assuming r, t ≥ 0 throughout.
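A direct Python transcription (ours) of the two generalized functions, with the t = 1 limits handled explicitly:

```python
import math

# Tsallis generalized logarithm and exponential:
# ln_t(x) = (x^(1-t) - 1) / (1 - t),  e_t(x) = [1 + (1-t) x]^(1/(1-t)),
# with ln_1 = ln and e_1 = exp as the limits for t -> 1.

def ln_t(x, t):
    return math.log(x) if t == 1.0 else (x ** (1.0 - t) - 1.0) / (1.0 - t)

def e_t(x, t):
    return math.exp(x) if t == 1.0 else (1.0 + (1.0 - t) * x) ** (1.0 / (1.0 - t))

print(ln_t(2.0, 0.999), math.log(2.0))  # nearly identical: smooth limit at t = 1
print(e_t(ln_t(5.0, 2.0), 2.0))         # 5.0: e_t inverts ln_t
```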
Once fed into the generalized means equation, these specifications of inf(x) and g(x) yield a two-parameter family of entropy measures of order r and degree t [SupplMat, 2]:

$$ent_P^{SM(r,t)}(H) = \frac{1}{t-1}\left[1 - \Big(\sum_{h_i \in H} P(h_i)^r\Big)^{\frac{t-1}{r-1}}\right]$$

The label SM refers to Sharma and Mittal (1975), where this formalism was originally proposed (also see Masi, 2005, and Hoffmann, 2008). All functions in the Sharma-Mittal family are evenness sensitive (see section 2 above), thus in line with a basic characterization of entropies [SupplMat, 2]. Also, with $ent^{SM(r,t)}$ one can embed the whole set of four classic measures in our initial list. More precisely [SupplMat, 3]:

⁵ The idea of ln_t is often credited to Tsallis for his work in generalized thermodynamics (see Tsallis, 1988, and 2011). The mathematical point may well go back to Euler, however (see Hoffmann, 2008, p. 7). For more theory, also see Havrda and Charvát (1967), Daróczy (1970), Naudts (2002), Kaniadakis, Lissia, and Scarfone (2004).
– Quadratic entropy can be derived from the Sharma-Mittal family for r = t = 2, that is, $ent_P^{SM(2,2)}(H) = ent_P^{Quad}(H)$;
– Hartley entropy can be derived from the Sharma-Mittal family for r = 0 and t = 1, that is, $ent_P^{SM(0,1)}(H) = ent_P^{Hartley}(H)$;
– Shannon entropy can be derived from the Sharma-Mittal family for r = t = 1, that is, $ent_P^{SM(1,1)}(H) = ent_P^{Shannon}(H)$;
– Error entropy is recovered from the Sharma-Mittal family in the limit for r → ∞ when t = 2, so that we have $ent_P^{SM(\infty,2)}(H) = ent_P^{Error}(H)$.
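The following sketch (ours; the large finite order 50 merely stands in for the r → ∞ limit) implements the Sharma-Mittal formula with the r → 1 and t → 1 limiting cases made explicit, and numerically recovers the four classic measures:

```python
import math

# Sharma-Mittal entropy of order r and degree t (convention 0^0 = 0),
# with the limiting cases r -> 1 (Gaussian family) and t -> 1 (Renyi
# family) handled explicitly.

def sharma_mittal(p, r, t):
    p = [q for q in p if q > 0]
    shannon = sum(q * math.log(1.0 / q) for q in p)
    if r == 1.0:
        return shannon if t == 1.0 else (1.0 - math.exp((1.0 - t) * shannon)) / (t - 1.0)
    s = sum(q ** r for q in p)
    if t == 1.0:
        return math.log(s) / (1.0 - r)
    return (1.0 - s ** ((t - 1.0) / (r - 1.0))) / (t - 1.0)

P = [0.49, 0.49, 0.02]
print(sharma_mittal(P, 2, 2), 1.0 - sum(q * q for q in P))   # Quadratic
print(sharma_mittal(P, 0, 1), math.log(3))                   # Hartley
print(sharma_mittal(P, 1, 1))                                # Shannon
print(sharma_mittal(P, 50, 2), 1.0 - max(P))                 # ~ Error (large r)
```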
Figure 2. The Sharma-Mittal family of entropy measures is represented in a Cartesian quadrant with values of the order parameter r and of the degree parameter t lying on the x- and y-axis, respectively. Each point in the quadrant corresponds to a specific entropy measure; each line corresponds to a distinct one-parameter generalized entropy function. Several special cases are highlighted (Hartley, Shannon, Quadratic, Rényi, Effective number, Power, Error entropy, Tsallis, Gaussian, Arimoto, and Origin). (Relevant references and formulas are listed in Table 4.)
Figure 3. A graphical illustration of the generalized atomic information function ln_t(1/P(h)) for four different values of the parameter t (0, 1, 2, and 5, respectively, for the curves from top to bottom). Appropriately, the amount of information arising from finding out that h is the case is a decreasing function of P(h). For high values of t, however, such decrease is flattened: with t = 5 (the lowest curve in the figure), finding out that h is true provides almost the same amount of information for a large set of initial probability assignments.
A good deal more can be said about the scope of this approach: see Figures 2 and 3, Table 4, and SupplMat (section 3) for additional material. Here, we will only mention briefly three important further points about R-measures in the Sharma-Mittal framework and their meaning for modelling information search behavior. They are as follows.

Additivity of expected entropy reduction: For any H, E, F and P(H, E, F), $R_P^{SM(r,t)}(H, E \times F) = R_P^{SM(r,t)}(H, E) + R_P^{SM(r,t)}(H, F|E)$.

This statement means that, for any Sharma-Mittal R-measure, the informational utility of a combined test E × F for H amounts to the sum of the plain utility of E and the utility of F that is expected considering all possible outcomes of E [SupplMat, 4]. (Formally, $R_P^{SM(r,t)}(H, F|E) = \sum_{e_j \in E} R_P^{SM(r,t)}(H, F|e_j) P(e_j)$, where $R_P^{SM(r,t)}(H, F|e_j)$ denotes the expected entropy reduction of H provided by F as computed when all relevant probabilities are conditionalized on $e_j$.) According to Nelson's (2008) discussion, this elegant additivity property of expected entropy reduction is important and highly desirable as concerns the analysis of the rational assessment of tests or queries. Moreover, one can see that the additivity of expected entropy reduction can be extended to any finite chain of queries and thus be applied to sequential search tasks such as those experimentally investigated by Nelson et al. (2014).
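The additivity property is easy to check numerically; the sketch below (ours, with an arbitrarily chosen measure of order r = 2 and degree t = 3 and a random joint distribution) confirms that R(H, E × F) coincides with R(H, E) + R(H, F|E):

```python
import random

# Numerical check of additivity for a Sharma-Mittal R-measure (r = 2, t = 3).

def ent_sm(p, r=2.0, t=3.0):
    s = sum(q ** r for q in p if q > 0)
    return (1.0 - s ** ((t - 1.0) / (r - 1.0))) / (t - 1.0)

random.seed(1)
nH, nE, nF = 3, 2, 2
P = [[[random.random() for _ in range(nF)] for _ in range(nE)] for _ in range(nH)]
z = sum(P[i][j][k] for i in range(nH) for j in range(nE) for k in range(nF))
P = [[[P[i][j][k] / z for k in range(nF)] for j in range(nE)] for i in range(nH)]

prior = [sum(P[i][j][k] for j in range(nE) for k in range(nF)) for i in range(nH)]
def p_e(j): return sum(P[i][j][k] for i in range(nH) for k in range(nF))
def p_ef(j, k): return sum(P[i][j][k] for i in range(nH))
def h_e(j): return [sum(P[i][j][k] for k in range(nF)) / p_e(j) for i in range(nH)]
def h_ef(j, k): return [P[i][j][k] / p_ef(j, k) for i in range(nH)]

# R(H, E x F): reduction expected over the four joint outcomes (e_j, f_k)
R_ExF = sum(p_ef(j, k) * (ent_sm(prior) - ent_sm(h_ef(j, k)))
            for j in range(nE) for k in range(nF))
# R(H, E) plus R(H, F | E), the latter averaging over the outcomes of E
R_E = sum(p_e(j) * (ent_sm(prior) - ent_sm(h_e(j))) for j in range(nE))
R_F_E = sum(p_ef(j, k) * (ent_sm(h_e(j)) - ent_sm(h_ef(j, k)))
            for j in range(nE) for k in range(nF))

print(abs(R_ExF - (R_E + R_F_E)) < 1e-12)  # True
```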
Irrelevance: For any H, E and P(H, E), if either E = {e} or $H \perp_P E$, then $R_P^{SM(r,t)}(H, E) = 0$.

This statement says that two special kinds of queries can be known in advance to be of no use, that is, informationally inconsequential relative to the hypothesis set of interest. One is the case of an empty test E = {e} with a single possibility that is already known to obtain with certainty, so that P(e) = 1. As suggested vividly by Floridi (2009, p. 26), this would be like consulting the raven in Edgar Allan Poe's famous poem, which is known to give one and the same answer no matter what (it always spells out "Nevermore"). The other case is when variables H and E are unrelated, that is, statistically independent according to P ($H \perp_P E$ in our notation). In both of these circumstances, $R_P^{SM(r,t)}(H, E) = 0$ simply because the prior and posterior distributions on H are identical for each possible value of E, so that no entropy reduction can ever obtain.

By the irrelevance condition, empty and unrelated queries have zero expected utility; but can a query E have a negative expected utility? If so, a rational agent would be willing to pay a cost just for not being told what the true state of affairs is as concerns E, much as an abandoned lover who wants to be spared being told whether her/his beloved is or is not happy because s/he expects more harm than good. Note, however, that for the lover non-informational costs are clearly involved, while we are assuming queries or tests to be assessed in purely informational terms, bracketing all further factors (see, e.g., Raiffa & Schlaifer, 1961, Meder & Nelson, 2012, and Markant & Gureckis, 2012, for work involving situation-specific payoffs). In this perspective, it is reasonable and common to see irrelevance as the worst-case scenario and exclude the possibility of informationally harmful tests: an irrelevant test (whether empty or statistically unrelated) simply cannot tell us anything of interest, but that is as bad as it can get (see Good, 1967, and Goosens, 1976, for seminal analyses; also see Dawid, 1998).⁶

⁶ In theories of so-called imprecise probabilities, the notion arises of a detrimental experiment E in the sense that interval probability estimates for each element in a hypothesis set of interest H can be properly included in the corresponding interval probability estimates conditional on each element in E. This phenomenon is known as dilation: one's initial state of credence about H becomes less precise (thus more uncertain, under a plausible interpretation) no matter how an experiment turns out. The strongly unattractive character of this implication has sometimes been disregarded (see Tweeney et al., 2010, for an example in the psychology of reasoning), but the prevailing view is that appropriate moves are required to avoid it or dispel it (for recent discussions, see Bradley & Steele, 2014; Pedersen & Wheeler, 2014).
Interestingly, not all Sharma-Mittal measures of expected entropy reduction are non-negative. Some of them do allow for the controversial idea that there could exist detrimental tests in purely informational terms, such that an agent should rank them worse than an irrelevant search and take active measures to avoid them (despite them having, by assumption, no intrinsic cost). Mathematically, a non-negative measure $R_P(H, E)$ is generated if and only if the underlying entropy measure is a concave function [SupplMat, 4], and the conditions for concavity are as follows:

Concavity: $ent_P^{SM(r,t)}(H)$ is a concave function of {P(h1), …, P(hn)} just in case t ≥ 2 − 1/r.⁷

In terms of Figure 2, this means that any entropy (represented by a point) below the Arimoto curve is not generally concave (see Figure 4 for a graphical illustration of a strongly non-concave entropy measure). Thus, if the concavity of ent is required (to preserve the non-negativity of R), then many prominent special cases are retained (including Quadratic, Hartley, Shannon, and Error entropy), but a significant bit of the whole Sharma-Mittal parameter space is ruled out. This concerns, for instance, entropies of degree 1 and order higher than 1 (see Ben-Bassat & Raviv, 1978).

⁷ This important result is proven in Hoffmann (2008), and already mentioned in Taneja et al. (1989, p. 61), who in turn refer to van der Pyl (1978) for a proof. We did not posit concavity as a defining property of entropies, and that's how it should be, in our opinion. Concavity may definitely be convenient or even required in some applications, but barring non-concave functions would be overly restrictive as concerns the formal notion of entropy. In physics, for instance, concavity is taken as directly relevant for generalized thermodynamics (Beck, 2009, p. 499; Tsallis, 2004, p. 10). In biological applications, on the other hand, concavity was suggested by Lewontin (1972; also see Rao, 2010, p. 71), but seen as having "no intuitive motivation" by Patil and Taille (1982, p. 552).
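To illustrate (our sketch, using the probabilities of case 6 in Table 5 below), the Non-Concave entropy $ent^{SM(20,0)}$ can assign a test a negative expected entropy reduction, rating it worse than an irrelevant query:

```python
# Expected reduction of the non-concave entropy ent^SM(20,0) (cf. Figure 4)
# can be negative; probabilities as in case 6 of Table 5.

def ent_sm(p, r=20.0, t=0.0):
    s = sum(q ** r for q in p if q > 0)
    return (1.0 - s ** ((t - 1.0) / (r - 1.0))) / (t - 1.0)

prior = [0.66, 0.17, 0.17]
post_e, post_not_e, p_e = [1.0, 0.0, 0.0], [1/3, 1/3, 1/3], 0.49

R_E = (p_e * (ent_sm(prior) - ent_sm(post_e))
       + (1 - p_e) * (ent_sm(prior) - ent_sm(post_not_e)))
print(R_E)  # about -0.47: rated worse than an irrelevant test (R = 0)
```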
Table 4. A summary of the Sharma-Mittal framework and several of its special cases, including a specification of their structure in the general theory of means and a key reference for each. Each entry gives the (r, t)-setting, the algebraic form of $ent_P(H)$, and the generalized mean construction: the characteristic function g with its inverse, and the atomic information function inf.

Sharma-Mittal (Sharma & Mittal, 1975). r ≥ 0, t ≥ 0.
$$ent_P(H) = \frac{1}{t-1}\left[1 - \Big(\sum_{h_i \in H} P(h_i)^r\Big)^{\frac{t-1}{r-1}}\right]$$
$g(x) = \ln_r(e_t(x))$; $g^{-1}(x) = \ln_t(e_r(x))$; $inf(x) = \ln_t(1/x)$.

Effective Numbers (Hill, 1973). r ≥ 0, t = 0.
$$ent_P(H) = \Big(\sum_{h_i \in H} P(h_i)^r\Big)^{\frac{1}{1-r}} - 1$$
$g(x) = \ln_r(1 + x)$; $g^{-1}(x) = e_r(x) - 1$; $inf(x) = (1 - x)/x$.

Rényi (Rényi, 1961). r ≥ 0, t = 1.
$$ent_P(H) = \frac{1}{1-r} \ln\Big(\sum_{h_i \in H} P(h_i)^r\Big)$$
$g(x) = \ln_r(e^x)$; $g^{-1}(x) = \ln(e_r(x))$; $inf(x) = \ln(1/x)$.

Power entropies (Laakso & Taagepera, 1979). r ≥ 0, t = 2.
$$ent_P(H) = 1 - \Big(\sum_{h_i \in H} P(h_i)^r\Big)^{\frac{1}{r-1}}$$
$g(x) = \ln_r\big(\frac{1}{1-x}\big)$; $g^{-1}(x) = 1 - (e_r(x))^{-1}$; $inf(x) = 1 - x$.

Gaussian (Frank, 2004). r = 1, t ≥ 0.
$$ent_P(H) = \frac{1}{t-1}\left[1 - e^{(1-t)\sum_{h_i \in H} P(h_i)\ln\frac{1}{P(h_i)}}\right]$$
$g(x) = \ln(e_t(x))$; $g^{-1}(x) = \ln_t(e^x)$; $inf(x) = \ln_t(1/x)$.

Arimoto (Arimoto, 1971). r ≥ 1/2, t = 2 − 1/r.
$$ent_P(H) = \frac{r}{r-1}\left[1 - \Big(\sum_{h_i \in H} P(h_i)^r\Big)^{\frac{1}{r}}\right]$$
$g(x) = \ln_r\big(\big[1 + \frac{1-r}{r}x\big]^{\frac{r}{1-r}}\big)$; $g^{-1}(x) = \frac{r}{r-1}\big[1 - (e_r(x))^{\frac{1-r}{r}}\big]$; $inf(x) = \frac{r}{r-1}\big[1 - x^{\frac{r-1}{r}}\big]$.

Tsallis (Tsallis, 1988). r = t ≥ 0.
$$ent_P(H) = \frac{1}{t-1}\Big[1 - \sum_{h_i \in H} P(h_i)^t\Big]$$
$g(x) = x$; $g^{-1}(x) = x$; $inf(x) = \ln_t(1/x)$.

Quadratic (Gini, 1912). r = t = 2.
$$ent_P(H) = 1 - \sum_{h_i \in H} P(h_i)^2$$
$g(x) = x$; $g^{-1}(x) = x$; $inf(x) = 1 - x$.

Shannon (Shannon, 1948). r = t = 1.
$$ent_P(H) = \sum_{h_i \in H} P(h_i) \ln\Big(\frac{1}{P(h_i)}\Big)$$
$g(x) = x$; $g^{-1}(x) = x$; $inf(x) = \ln(1/x)$.

Hartley (Hartley, 1928). r = 0, t = 1.
$$ent_P(H) = \ln\Big(\sum_{h_i \in H} P(h_i)^0\Big)$$
$g(x) = e^x - 1$; $g^{-1}(x) = \ln(1 + x)$; $inf(x) = \ln(1/x)$.
Figure 4. Graphical illustration of the non-concave entropy $ent^{SM(20,0)}$ for a binary hypothesis set $H = \{h, \bar{h}\}$ as a function of the probability of h.
4.2. Psychological interpretation of the order and degree parameters

The order parameter r: Imbalance and continuity. What is the meaning of the order parameter in the Sharma-Mittal formalism when entropies and expected entropy reduction measures represent uncertainty and the value of queries, respectively? To clarify, let us consider what happens with extreme values of r, i.e., if r = 0 or goes to infinity, respectively [SupplMat, 3]:

$$ent_P^{SM(0,t)}(H) = \ln_t\Big(\sum_{h_i \in H} P(h_i)^0\Big)$$

$$ent_P^{SM(\infty,t)}(H) = \ln_t\Big(\frac{1}{\max_{h_i \in H}[P(h_i)]}\Big)$$

Given the convention 0⁰ = 0, $\sum_{h_i \in H} P(h_i)^0$ simply computes the number of all elements in H with a non-null probability. Accordingly, when r = 0, entropy becomes an (increasing) function of the mere number of the "live" (non-zero probability) options in H. When r goes to infinity, on the other hand, entropy becomes a (decreasing) function of the probability of a single element in H, i.e., the most likely hypothesis. This shows that the order parameter r is an index of the imbalance of the entropy function, which indicates how much the entropy measure discounts minor (low probability) hypotheses. For order-0 measures, the actual probability distribution is neglected: non-zero probability hypotheses are just counted, as if they were all equally important (see Gauvrit & Morsanyi, 2014). For order-∞ measures, on the other hand, only the most probable hypothesis matters, and all other hypotheses are disregarded altogether. For intermediate values of r, more likely hypotheses count more, but less likely hypotheses do retain some weight. The higher [lower] r is, the more [less] the likely hypotheses are regarded and the unlikely hypotheses are discounted. Importantly, for extreme values of the order parameter, an otherwise natural idea of continuity fails in the measurement of entropy: when r goes to either zero or infinity, it is not the case that small (large) changes in the probability distribution P(H) produce comparably small (large) changes in entropy values.

To see better how order-0 entropy measures behave, consider the simplest of them:

$$ent_P^{SM(0,0)}(H) = \hat{n} - 1$$

where $\hat{n} = \sum_{h_i \in H} P(h_i)^0$, so that n̂ denotes the number of hypotheses in H with a non-null (strictly positive) probability. Given the −1 correction, $ent^{SM(0,0)}$ can be interpreted as the "number of contenders" for each entity in set H, because it takes value 0 when only one element is left. For future reference, we will label $ent^{SM(0,0)}$ Origin entropy because it marks the origin of the graph in Figure 2. Importantly, the expected reduction of Origin entropy is just the expected number of hypotheses in H conclusively falsified by a test E.
To the extent that all details of the prior and posterior probability distribution over H are neglected, computational demands are significantly decreased with order-0 entropies. As a consequence, measures of the expected reduction of an order-0 entropy (and especially Origin entropy) also amount to comparably frugal, heuristic or quasi-heuristic models of information search (see Baron et al.'s model, 1988, p. 106). Lack of continuity, too, is associated with heuristic models, which often rely on discrete elements instead of continuous representations (see Gigerenzer, Hertwig, & Pachur, 2011; Katsikopoulos, Schooler, & Hertwig, 2010). More generally, when the order parameter approaches 0, entropy measures become more and more balanced, meaning that they treat all live hypotheses more and more equally. What happens to the associated expected entropy reduction measures is that they become more and more "Popperian" in spirit. In fact, for order-0 relevance measures, a test E will deliver some non-null expected informational utility about hypothesis set H if and only if some of the possible outcomes of E can conclusively rule out some element in H. Otherwise, the expected entropy reduction will be zero, no matter how large the changes in probability that might arise from E. Cognitively, relevance measures of low order would then describe the information search preferences of an agent who is distinctively eager to prune down the list of candidate hypotheses, an attitude which might prevail in earlier stages of an inquiry, when such a list can be sizable.

Among entropy measures of order infinity, we already know $ent^{SM(\infty,2)} = 1 - \max_{h_i \in H}[P(h_i)]$ as Error entropy. What this illustrates is that, when r goes to infinity, entropy measures become more and more decision-theoretic in a short-sighted kind of way: in the limit, they are only affected by the probability of a correct guess given the currently available information. A notable consequence for the associated measures of expected entropy reduction is that a test E can deliver some non-null expected informational utility only if some of the possible outcomes of E can alter the probability of the modal hypothesis in H. If that is not the case, then the expected utility will be zero, no matter how significant the changes in the probability distribution arising from E. Cognitively, then, R-measures of very high order would describe the information search preferences of an agent who is predominantly concerned with an estimate of the probability of error in an impending choice from set H.
The degree parameter t: Perfect tests and certainty. Let us now consider briefly the meaning of the degree parameter t in the Sharma-Mittal formalism when entropies and relevance measures represent uncertainty and the value of queries, respectively. A remarkable fact about the degree parameter t is that (unlike the order parameter r) it does not affect the ranking of entropy values. Indeed, one can show that any Sharma-Mittal entropy measure is a strictly increasing function of any other measure of the same order r, regardless of the degree (for any hypothesis set H and any probability distribution P) [SupplMat, 4]. Thus, concerning the ordinal comparison of entropy values, only if the order differs can divergences between pairs of SM entropy measures arise. On the other hand, the implications of the degree parameter for measures of expected entropy reduction are significant and have not received much attention.

As a useful basis for discussion, suppose that variables H and E are independent, in the standard sense that for any hi ∈ H and any ej ∈ E, P(hi ∩ ej) = P(hi)P(ej), denoted as $H \perp_P E$. Then we have [SupplMat, 4]:

$$R_P^{SM(r,t)}(E, E) - R_P^{SM(r,t)}(H \times E, E) = (t - 1) \, ent_P^{SM(r,t)}(H) \, ent_P^{SM(r,t)}(E)$$

If expected entropy reduction is interpreted as a measure of the informational utility of queries or tests, this equality governs the relationship between the computed utilities of E in case it is a "perfect" (conclusive) test and in case it is not. More precisely, the first term on the left, $R_P^{SM(r,t)}(E, E)$, measures the expected informational utility of a perfect test because the test itself and the target of investigation are the same, hence finding out the true value of E removes all relevant uncertainty. On the other hand, E is no longer a perfect test in the second term of the equation above, $R_P^{SM(r,t)}(H \times E, E)$, for here a more fine-grained hypothesis set H × E is at issue, thus a more demanding epistemic target; hence finding out the true value of E would not remove all relevant uncertainty. (Recall that, by assumption, H is statistically independent from E, so the uncertainty about H would remain untouched, as it were, after knowing about E.) With entropies of degree 1 (including Shannon), the associated measures of expected entropy reduction imply that E has exactly identical utility in both cases, because t = 1 nullifies the right-hand side of the equation, regardless of the order parameter r. With t > 1 the right-hand side is positive, so E is a strictly more useful test when it is conclusive than when it is not. With t < 1, on the contrary, the right-hand side is negative, so E is a strictly less useful test when it is conclusive than when it is not. Note that these are ordinal relationships (rankings). In comparing the expected informational utility of queries, the degree parameter t can thus play a crucial role. Crupi and Tentori (2014, p. 88) provided some simple illustrations which can be adapted as favoring an entropy with t > 1 as the basis for the R-measure of the expected utility of queries (here, we present an illustration in Figure 5).
Figure 5. Consider a standard 52-card playing deck, with Suit corresponding to the 4 equally probable suits, Value corresponding to the 13 equally probable numbers (or faces) that a card can take (2 through 10, Jack, Queen, King, Ace), and Suit × Value corresponding to the 52 equally probable individual cards in the deck. Suppose that you will be told the suit of a randomly chosen card. Is this more valuable to you if (i) (perfect test case) your goal is to learn the suit, i.e., $R_P(Suit, Suit)$, or (ii) (inconclusive test case) your goal is to learn the specific card, i.e., $R_P(Suit \times Value, Suit)$? What is the ratio of the value of the expected entropy reduction in (i) vs. (ii)? For degree 1, the information to be obtained has equal value in each case. For degrees greater than 1, the perfect test is more useful. For degrees less than one, the inconclusive test is more useful. Interestingly, as the figure shows, the degree parameter uniquely determines the relative value of $R_P(Suit, Suit)$ and $R_P(Suit \times Value, Suit)$, regardless of the order parameter. In the figure, values of the order parameter r and of the degree parameter t lie on the x- and y-axis, respectively. Color represents the log of the ratio between the conclusive test and the inconclusive test case in the card example above: black means that the information values of the tests are equal (log of the ratio is 0); warm/cool shades indicate that the conclusive test has a higher/lower value, respectively (log of the ratio is positive/negative).
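The card-deck example can be verified directly; the sketch below (ours) instantiates the degree identity with the Quadratic measure (r = t = 2), for which the perfect test should come out strictly more useful:

```python
# Degree identity on the card deck, with H = Value, E = Suit and the
# Quadratic measure (r = t = 2):
# R(Suit, Suit) - R(Suit x Value, Suit) = (t - 1) ent(Value) ent(Suit).

def ent_sm(p, r=2.0, t=2.0):
    s = sum(q ** r for q in p if q > 0)
    return (1.0 - s ** ((t - 1.0) / (r - 1.0))) / (t - 1.0)

suit, value, card = [1/4] * 4, [1/13] * 13, [1/52] * 52

R_perfect = ent_sm(suit) - 0.0                       # suit question fully settled
R_inconclusive = ent_sm(card) - ent_sm([1/13] * 13)  # 13 equiprobable cards remain

lhs = R_perfect - R_inconclusive
rhs = (2 - 1) * ent_sm(value) * ent_sm(suit)
print(lhs, rhs)  # both 9/13, about 0.692: the perfect test is worth more
```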
The meaning of a high degree parameter is of particular interest in the so-called Tsallis family of entropy measures, obtained from $ent^{SM(r,t)}$ when r = t (see Table 4). Consider Tsallis entropy of degree 30, that is, $ent^{SM(30,30)}$. With this measure, entropy remains very close to an upper bound value of 1/(t − 1) ≈ 0.0345 unless the probability distribution reflects near-certainty about the true element in the hypothesis set H. For instance, for as uneven a distribution as {0.90, 0.05, 0.05}, $ent^{Tsallis(30)}$ yields entropy 0.0330, still close to 0.0345, while it quickly approaches 0 when the probability of one hypothesis exceeds 0.99. Non-Certainty entropy seems a useful label for future reference, as measure $ent^{Tsallis(30)}$ essentially implies that entropy is almost invariant as long as an appreciable lack of certainty (a "reasonable doubt", as it were) endures. Accordingly, the entropy reduction from a piece of evidence e is largely negligible unless one is led to acquire a very high degree of certainty about H, and it approaches the upper bound of 1/(t − 1) as the posterior probability comes close to matching a truth-value assignment (with P(hi) = 1 for some i and 0 for all other hs). Up to the inconsequential normalizing constant t − 1, the expected reduction of this entropy, $R^{Tsallis(30)}$, amounts to a smooth variant of Nelson et al.'s (2010) "probability-of-certainty heuristic", where a datum ei ∈ E has informational utility 1 if it reveals the true element in H with certainty and utility 0 otherwise, so that the expected utility of E itself is just the overall probability that certainty about H is eventually achieved by that test. These remarks further illustrate that a larger degree t implies an increasing tendency of the corresponding R-measure to value highly the attainment of certainty or quasi-certainty about the target hypothesis set when assessing a test.
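A quick computation (our sketch) displays this near-threshold behavior: Tsallis-30 entropy barely moves until one hypothesis becomes nearly certain.

```python
# Tsallis entropy of degree 30 (r = t = 30): stays near the upper bound
# 1/(t - 1) ~ 0.0345 unless one hypothesis approaches certainty.

def ent_tsallis30(p, t=30.0):
    return (1.0 - sum(q ** t for q in p)) / (t - 1.0)

for p in ([1/3, 1/3, 1/3], [0.90, 0.05, 0.05],
          [0.99, 0.005, 0.005], [0.999, 0.0005, 0.0005]):
    print(max(p), round(ent_tsallis30(p), 4))
# max(p) = 1/3 -> 0.0345; 0.9 -> 0.033; 0.99 -> 0.009; 0.999 -> 0.001
```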
5. A systematic exploration of how key information search models diverge

Depending on different entropy functions, two measures R and R* of the expected reduction of entropy as the informational utility of tests may disagree in their rankings. Formally, there exist variables H, E, and F and probability distribution P(H, E, F) such that $R_P(H, E) > R_P(H, F)$ while $R^*_P(H, E) < R^*_P(H, F)$; thus, R-measures are not generally ordinally equivalent. In the following, we will focus on an illustrative sample of measures in the Sharma-Mittal framework and show that such divergences can be widespread, strong, and telling about the specific tenets of those measures. This means that different entropy measures can provide markedly divergent implications in the assessment of possible queries' expected usefulness. Depending on the interpretation of the models, this in turn implies conflicting empirical predictions and/or incompatible normative recommendations.

Our list will include three classical models that are standard at least in some domains, namely Shannon, Quadratic, and Error entropy. It also includes three measures which we previously labelled heuristic or quasi-heuristic in that they largely or completely disregard quantitative information conveyed by the relevant probability distribution P: these are Origin entropy (or the "number of contenders"), Hartley entropy, and Non-Certainty entropy, as defined above. For a wider coverage and comparison, we also include an entropy function lying well below the Arimoto curve in Figure 2, that is, $ent^{SM(20,0)}$, and thus labelled Non-Concave (see Figure 4).
We ran simulations to identify cases of strong disagreement between our seven measures of expected entropy reduction, on a pairwise basis, about which of two tests is taken to be more useful. In each simulation, we considered a scenario with a threefold hypothesis space H = {h1, h2, h3}, and two binary tests, E = {e, $\bar{e}$} and F = {f, $\bar{f}$}.⁸ The goal of each simulation was to find a case, that is, a specific joint probability distribution P(H, E, F), where two R-measures strongly disagree about which of two tests is most useful. The ideal scenario here is a case where the expected reduction of one kind of entropy (say, Origin) implies that E is as useful as can possibly be found, while F is as bad as it can be, and the expected reduction of another kind of entropy (say, Shannon) implies the opposite, with equal strength of conviction.

⁸ We used three-hypothesis scenarios to illustrate the differences among our selected sample of R measures, because scenarios of this kind appeared to offer a reasonable balance of being simple yet powerful enough to deliver divergences that are strong and intuitively clear. Note however that two-hypothesis scenarios can also clearly differentiate many of the R measures (see the review of behavioral research on binary classification tasks in the subsequent sections).

The quantification of the disagreement between two R-measures in a given case, for a given P(H, E, F), arises from three steps (also see Nelson et al., 2010). (i) Normalization: for each measure, we divide nominal values of expected entropy reduction (for each of E and F) by the expected entropy reduction of a conclusive test for three equally probable hypotheses, that is, by $R_U(H, H)$. (ii) Preference Strength: for each measure, we compute the simple difference between the (normalized) expected entropy reduction for test E and for test F, that is, $\frac{R_P(H, E)}{R_U(H, H)} - \frac{R_P(H, F)}{R_U(H, H)}$. (iii) Disagreement Strength (DS): if the two measures agree on whether E or F is most useful, DS is defined as zero; if they disagree, DS is defined as the geometric mean of those measures' respective absolute preference strengths in step (ii).
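The three steps are straightforward to implement; the sketch below (ours) runs them on case 1 of Table 5, comparing Shannon with the (Tsallis-30) Non-Certainty measure, which disagree about E and F:

```python
import math

# Steps (i)-(iii) for two R-measures on case 1 of Table 5.

def ent_shannon(p):
    return sum(q * math.log(1.0 / q) for q in p if q > 0)

def ent_noncert(p, t=30.0):                 # Tsallis-30 "Non-Certainty"
    return (1.0 - sum(q ** t for q in p)) / (t - 1.0)

def R(ent, prior, posteriors, probs):
    return sum(w * (ent(prior) - ent(post)) for w, post in zip(probs, posteriors))

prior = [0.50, 0.25, 0.25]
E = ([[0.5, 0.5, 0.0], [0.5, 0.0, 0.5]], [0.50, 0.50])  # P(H|e), P(H|e-bar); P(e), P(e-bar)
F = ([[1.0, 0.0, 0.0], [1/3, 1/3, 1/3]], [0.25, 0.75])

def pref(ent):
    norm = ent([1/3, 1/3, 1/3])             # (i) normalize by R_U(H, H)
    return (R(ent, prior, *E) - R(ent, prior, *F)) / norm   # (ii)

s1, s2 = pref(ent_shannon), pref(ent_noncert)
DS = math.sqrt(abs(s1 * s2)) if s1 * s2 < 0 else 0.0        # (iii)
print(round(s1, 3), round(s2, 3), round(DS, 3))             # 0.119 -0.25 0.173
```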
In the simulations, a variety of techniques were involved in order to maximize disagreement strength, including random generation of prior probabilities over H and of likelihoods for E and F, optimization of likelihoods alone, and joint optimization of likelihoods and priors. Each example reported here was found in the attempt to maximize DS for a particular pair of measures. We relied on the simulations largely as a heuristic tool, thus selecting and slightly adapting the numerical examples to make them more intuitive and improve clarity.⁹

⁹ It is important to note that the procedures we used do not guarantee finding globally maximal solutions; thus, a failure to find a case of strong disagreement does not necessarily entail that no such case exists.

For each pair of R-measures in our sample of seven, at least one case of moderate or strong disagreement was found (Table 5). Thus, for each pairwise comparison one can identify probabilities for which the models make diverging claims about which test is more useful. In what follows, we append a short discussion to the cases in which Shannon entropy strongly disagrees with each competing model. Such discussion is illustrative and qualitative, to intuitively highlight the underlying properties of different models. Similar explications could be provided for all other pairwise comparisons, but are omitted for the sake of brevity.
Shannon vs. Non-Certainty entropy (case 3 in Table 5; DS = 0.30). In its purest form, Non-Certainty entropy equals 0 if one hypothesis in H is known to be true with certainty, and 1 otherwise. As a consequence, the entropy reduction expected from a test E just amounts to the probability that full certainty will be achieved after E is performed. Within the Sharma-Mittal framework, this behavior can often be approximated by an entropy measure such as Tsallis of degree 30, as explained above.¹⁰ One example where the expected reductions of Shannon and Non-Certainty entropy disagree significantly involves a prior P(H) = {0.67, 0.10, 0.23}. The Non-Certainty measure rates very poorly a test E such that $P(H|e)$ = {0.899, 0.100, 0.001}, $P(H|\bar{e})$ = {0.001, 0.100, 0.899}, and P(e) = 0.74, and strongly prefers a test F such that $P(H|f)$ = {1, 0, 0}, $P(H|\bar{f})$ = {0.40, 0.18, 0.42}, and P(f) = 0.45, because the probability of attaining full certainty from F is sizable (45%). The expected reduction of Shannon entropy implies the opposite ranking, because test E, while unable to provide full certainty, will invariably yield a highly skewed posterior as compared to the prior.

¹⁰ One should note, however, that Tsallis-30, unlike pure Non-Certainty entropy, is a continuous function. As a consequence, the approximation described eventually fails when one gets very close to limiting cases. More precisely, Tsallis-30 entropy rapidly decreases for almost certain distributions such as, say, P(H) = {0.998, 0.001, 0.001}. In fact, Tsallis-30 entropy is sizable and almost constant if P(H) conveys a less-than-almost-certain state of belief, and becomes largely negligible otherwise.
Shannon vs. Origin and Hartley entropy (case 5 in Table 5; DS = 0.56 and DS = 0.48, respectively). The reductions of Origin and Hartley entropy share similar ideas of counting how many hypotheses are conclusively ruled out by the evidence. For example, with prior P(H) = {0.500, 0.499, 0.001}, the expected reduction of either Origin or Hartley entropy assigns value zero to a test E such that $P(H|e)$ = {0.998, 0.001, 0.001}, $P(H|\bar{e})$ = {0.001, 0.998, 0.001}, and P(e) = 0.501, because no hypothesis is ever ruled out conclusively, and rather prefers a test F such that $P(H|f)$ = {0.501, 0.499, 0}, $P(H|\bar{f})$ = {0, 0.499, 0.501}, and P(f) = 0.998. The expected reduction of Shannon entropy implies the opposite ranking, because F will almost always yield only a tiny change in overall uncertainty.
Shannon vs. Non-Concave Entropy (case 6 in Table 5; DS = 0.26). For non-concave entropies, the expected entropy reduction may turn out to be negative, thus indicating an allegedly detrimental query, that is, a test whose expected utility is lower than that of a completely irrelevant test. This feature yields cases of significant disagreement between the expected reduction of our illustrative Non-Concave entropy, $ent_{SM}^{(20,0)}$, and that of classical concave measures such as Shannon. With a prior P(H) = {0.66, 0.17, 0.17}, the Non-Concave measure rates a test E such that P(H|e) = {1, 0, 0}, P(H|¬e) = {1/3, 1/3, 1/3}, and P(e) = 0.49 much lower than an irrelevant test F such that P(H|f) = P(H|¬f) = P(H). Indeed, the non-concave R-measure assigns a significant negative value to test E. This critically depends on one interesting fact: for Non-Concave entropy, going from P(H) to a completely flat posterior, P(H|¬e), is an extremely aversive outcome (i.e., it implies a very large increase in uncertainty), while the 49% chance of achieving certainty from datum e is not highly valued (a feature of low-degree measures, as we know). The expected reduction of Shannon entropy implies the opposite ranking instead, as it conveys the principle that no test can be informationally less useful than an irrelevant test (such as F).
Shannon vs. Quadratic Entropy (case 8 in Table 5; DS = 0.09). Shannon and Quadratic entropies are similar in many ways, yet at least cases of moderate disagreement can be found. One arises with prior P(H) = {0.50, 0.14, 0.36}. Test E is such that P(H|e) = {0.72, 0.14, 0.14}, P(H|¬e) = {0.14, 0.14, 0.72}, and P(e) = 0.62, while with test F one has P(H|f) = {0.5, 0.5, 0}, P(H|¬f) = {0.5, 0, 0.5}, and P(f) = 0.28. Expected Quadratic entropy reduction ranks E over F, as it puts a particularly high value on posterior distributions where one single hypothesis comes to prevail. In comparison, this is less important for the reduction of Shannon entropy, as long as some hypotheses are completely (or largely) ruled out, as occurs with F. Accordingly, the Shannon measure prefers F over E.
Shannon vs. Error Entropy (case 9 in Table 5; DS = 0.20). A stronger disagreement arises between Shannon and Error entropy. Consider prior P(H) = {0.50, 0.18, 0.32}, a test E such that P(H|e) = {0.65, 0.18, 0.17}, P(H|¬e) = {0.17, 0.18, 0.65}, and P(e) = 0.69, and a test F such that P(H|f) = {0.5, 0.5, 0}, P(H|¬f) = {0.5, 0, 0.5}, and P(f) = 0.36. The expected reduction of Error entropy is significant with E but zero with F, because the latter will leave the modal probability untouched. (Note that it does not matter that the hypotheses with the maximum probability changed.) However, test F, unlike E, will invariably rule out a hypothesis that was a priori significantly probable, and for this reason is preferred by the Shannon R-measure.
Table 5. Cases of strong disagreement between seven measures of expected entropy reduction. Two binary tests E and F are considered for a ternary hypothesis set H. Preference strength is the difference between the (normalized) values of expected entropy reduction for E and F, respectively: it is positive if test E is strictly preferred, negative if F is strictly preferred, and null if they are rated equally. The most relevant preference values to be compared are highlighted in bold in the original table: they illustrate that, for each pair of R-measures in our sample of seven, the table includes at least one case of moderate or strong disagreement.
Case 1. P(H) = {0.50, 0.25, 0.25}
  Test E: P(H|e) = {0.5, 0.5, 0}; P(H|¬e) = {0.5, 0, 0.5}; P(e) = 0.5; P(¬e) = 0.5
  Test F: P(H|f) = {1, 0, 0}; P(H|¬f) = {1/3, 1/3, 1/3}; P(f) = 0.25; P(¬f) = 0.75
  Preference strengths: Non-Certainty -0.250; Origin 0.250; Hartley 0.119; Non-Concave 0.250; Shannon 0.119; Quadratic 0; Error 0
Case 2. P(H) = {0.67, 0.17, 0.17}
  Test E: P(H|e) = {0.82, 0.17, 0.01}; P(H|¬e) = {0.01, 0.17, 0.82}; P(e) = 0.8; P(¬e) = 0.2
  Test F: P(H|f) = {1, 0, 0}; P(H|¬f) = {1/3, 1/3, 1/3}; P(f) = 0.49; P(¬f) = 0.51
  Preference strengths: Non-Certainty -0.487; Origin -0.490; Hartley -0.490; Non-Concave 0.394; Shannon 0.046; Quadratic 0.062; Error 0.240
Case 3. P(H) = {0.67, 0.10, 0.23}
  Test E: P(H|e) = {0.899, 0.1, 0.001}; P(H|¬e) = {0.001, 0.1, 0.899}; P(e) = 0.74; P(¬e) = 0.26
  Test F: P(H|f) = {1, 0, 0}; P(H|¬f) = {0.40, 0.18, 0.42}; P(f) = 0.45; P(¬f) = 0.55
  Preference strengths: Non-Certainty -0.409; Origin -0.450; Hartley -0.450; Non-Concave 0.342; Shannon 0.218; Quadratic 0.249; Error 0.329
Case 4. P(H) = {0.6, 0.1, 0.3}
  Test E: P(H|e) = {1, 0, 0}; P(H|¬e) = {1/3, 1/6, 1/2}; P(e) = 0.4; P(¬e) = 0.6
  Test F: P(H|f) = {0.7, 0.3, 0}; P(H|¬f) = {0.55, 0, 0.45}; P(f) = 1/3; P(¬f) = 2/3
  Preference strengths: Non-Certainty 0.400; Origin -0.100; Hartley 0.031; Non-Concave 0.045; Shannon 0.051; Quadratic 0.155; Error 0.150
Case 5. P(H) = {0.5, 0.499, 0.001}
  Test E: P(H|e) = {0.998, 0.001, 0.001}; P(H|¬e) = {0.001, 0.998, 0.001}; P(e) = 0.501; P(¬e) = 0.499
  Test F: P(H|f) = {0.501, 0.499, 0}; P(H|¬f) = {0, 0.499, 0.501}; P(f) = 0.998; P(¬f) = 0.002
  Preference strengths: Non-Certainty 0.942; Origin -0.500; Hartley -0.369; Non-Concave 0.499; Shannon 0.617; Quadratic 0.744; Error 0.746
Case 6. P(H) = {0.66, 0.17, 0.17}
  Test E: P(H|e) = {1, 0, 0}; P(H|¬e) = {1/3, 1/3, 1/3}; P(e) = 0.49; P(¬e) = 0.51
  Test F: P(H|f) = {0.66, 0.17, 0.17}; P(H|¬f) = {0.66, 0.17, 0.17}; P(f) = 0.5; P(¬f) = 0.5
  Preference strengths: Non-Certainty 0.490; Origin 0.490; Hartley 0.490; Non-Concave -0.236; Shannon 0.288; Quadratic 0.250; Error 0
Case 7. P(H) = {0.53, 0.25, 0.22}
  Test E: P(H|e) = {1, 0, 0}; P(H|¬e) = {0.295, 0.375, 0.330}; P(e) = 1/3; P(¬e) = 2/3
  Test F: P(H|f) = {0.53, 0.25, 0.22}; P(H|¬f) = {0.53, 0.25, 0.22}; P(f) = 0.5; P(¬f) = 0.5
  Preference strengths: Non-Certainty 0.333; Origin 0.333; Hartley 0.333; Non-Concave -0.123; Shannon 0.261; Quadratic 0.249; Error 0.080
Case 8. P(H) = {0.50, 0.14, 0.36}
  Test E: P(H|e) = {0.72, 0.14, 0.14}; P(H|¬e) = {0.14, 0.14, 0.72}; P(e) = 0.62; P(¬e) = 0.38
  Test F: P(H|f) = {0.5, 0.5, 0}; P(H|¬f) = {0.5, 0, 0.5}; P(f) = 0.28; P(¬f) = 0.72
  Preference strengths: Non-Certainty 0; Origin -0.500; Hartley -0.369; Non-Concave 0.293; Shannon -0.085; Quadratic 0.086; Error 0.330
Case 9. P(H) = {0.50, 0.18, 0.32}
  Test E: P(H|e) = {0.65, 0.18, 0.17}; P(H|¬e) = {0.17, 0.18, 0.65}; P(e) = 0.69; P(¬e) = 0.31
  Test F: P(H|f) = {0.5, 0.5, 0}; P(H|¬f) = {0.5, 0, 0.5}; P(f) = 0.36; P(¬f) = 0.64
  Preference strengths: Non-Certainty 0; Origin -0.180; Hartley -0.133; Non-Concave 0.213; Shannon -0.179; Quadratic -0.024; Error 0.225
Case 10. P(H) = {0.42, 0.42, 0.16}
  Test E: P(H|e) = {0.5, 0.5, 0}; P(H|¬e) = {0, 0, 1}; P(e) = 0.84; P(¬e) = 0.16
  Test F: P(H|f) = {0.66, 0.24, 0.10}; P(H|¬f) = {0.10, 0.66, 0.24}; P(f) = 0.57; P(¬f) = 0.43
  Preference strengths: Non-Certainty 0.160; Origin 0.580; Hartley 0.470; Non-Concave -0.146; Shannon 0.241; Quadratic 0.115; Error -0.120
6. Model comparison: Prediction and behavior
Now that we have seen examples illustrating the theoretical properties of a variety of Sharma-Mittal relevance measures, we turn to addressing whether the Sharma-Mittal measures can inform psychological or normative theories of the value of information.
6.1. Comprehensive analysis of Wason's abstract selection task
The single most widely studied experimental information search paradigm is Wason's (1966) selection task. In the classical, abstract version, participants are presented with a conditional hypothesis (or "rule"), h = "if A [antecedent], then C [consequent]". The hypothesis concerns some cards, each of which has a letter on one side and a number on the other, for instance A = "the card has a vowel on one side" and C = "the card has an even number on the other side". One side is displayed for each of four cards: one instantiating A (e.g., showing letter E), one instantiating not-A (e.g., showing letter K), one instantiating C (e.g., showing number 4), and one instantiating not-C (e.g., showing number 7). Participants therefore have four information search options in order to assess the truth or falsity of hypothesis h: turning over the A, the not-A, the C, or the not-C card. They are asked to choose which ones they would pick up as useful to establish whether the hypothesis holds or not. All, none, or any subset of the four cards can be selected.
AccordingtoWason’s(1966)original,“Popperian”readingofthetask,theAandnot-C
searchoptionsareusefulbecausetheycouldfalsifyh(bypossiblyrevealingaevennumber
andavowel,respectively),soarationalagentshouldselectthem.Thenot-AandCoptions,
onthecontrary,couldnotprovideconclusivelyrefutingevidence,sothey’reworthlessin
thisinterpretation.However,observedchoicefrequenciesdepartmarkedlyfromthese
prescriptions.InOaksfordandChater’s(1994,p.613)metaanalysis,theywere89%,16%,
62%,and25%forA,not-A,C,andnot-C,respectively.OaksfordandChater(1994,2003)
devisedBayesianmodelsofthetaskinwhichagentstreatthefourcardsassampledfroma
largerdeckandareassumedtomaximizetheexpectedreductionofuncertainty,with
Shannonentropyasthestandardmeasure.OaksfordandChaterpostulatedafoil
hypothesisℎinwhichAandCarestatisticallyindependentandatargethypothesishunder
which C always (or almost always) follows A. In Oaksford and Chater's (1994) "deterministic" analysis, C always followed A under the dependence hypothesis h. A key innovation in Oaksford and Chater (2003, p. 291) was the introduction of an "exception" parameter, such that P(C|A) = 1 - P(exception) under h. The model also requires parameters a and g for the probabilities P(A) and P(C) of the antecedent and consequent of h. We implement Oaksford and Chater's (2003) model, positing a = 0.22 and g = 0.27 (according to the "rarity" assumption), and a uniform prior on H = {h, h̄}, as suggested in Oaksford and Chater (2003, p. 296). We explored the implications of calculating the expected usefulness of turning over each card, not only according to Shannon entropy reduction, but for the whole set of entropy measures from the Sharma-Mittal framework.11
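One way to implement this computationally is sketched below, continuing the Python helpers introduced earlier. The sketch assumes, as one common reading of Oaksford and Chater's model, that the marginals P(A) = a and P(C) = g hold under both hypotheses, with P(C|A) = 1 - P(exception) under the dependence hypothesis h; card_relevances is our own illustrative function, and the exact values it returns depend on such modeling details, so they may differ somewhat from the numbers reported in this section.

    def card_relevances(a=0.22, g=0.27, exception=0.0, r=1.0, t=1.0):
        # Expected Sharma-Mittal entropy reduction, over H = {h, h_bar}, of
        # turning each card; uses the relevance helper sketched earlier.
        pc_a = {"h": 1 - exception, "h_bar": g}             # P(C|A) under each hypothesis
        pc_na = {"h": (g - a * (1 - exception)) / (1 - a),  # P(C|not-A), keeping P(C) = g
                 "h_bar": g}
        prior = {"h": 0.5, "h_bar": 0.5}
        lik = {  # probability that the hidden side shows C (letter cards) or A (number cards)
            "A":     pc_a,
            "not-A": pc_na,
            "C":     {m: pc_a[m] * a / g for m in prior},
            "not-C": {m: (1 - pc_a[m]) * a / (1 - g) for m in prior},
        }
        out = {}
        for card, like in lik.items():
            p_pos = sum(prior[m] * like[m] for m in prior)
            posts, probs = [], []
            for prob, cond in ((p_pos, like),
                               (1 - p_pos, {m: 1 - like[m] for m in prior})):
                if prob > 0:
                    posts.append([prior[m] * cond[m] / prob for m in prior])
                    probs.append(prob)
            out[card] = relevance(list(prior.values()), posts, probs, r, t)
        return out

    print(card_relevances(r=1, t=1))   # Shannon-based card values
    print(card_relevances(r=0, t=0))   # Origin-based card values (0-order)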
Empirical Data. We first address how well different expected entropy reduction measures correspond to empirical aggregate card selection frequencies in the task, with respect to Oaksford and Chater's (2003) model. For the selection frequencies, we use the abstract selection task data as reported by Oaksford and Chater (1994, p. 613) and mentioned above (89%, 16%, 62%, and 25% for A, not-A, C, and not-C, respectively). Figure 6 (top row) shows the rank correlation between relevance values and empirical selection frequencies for each order and degree value from 0 to 20, in steps of 0.25. First consider results for the model with P(exception) = 0 (Figure 6, top left subplot). A wide range of measures, including the expected reduction of Shannon and Quadratic entropy, of some non-concave entropies (e.g., $ent_{SM}^{(10,1.5)}$), and of measures with fairly high degree, correlate perfectly with the rank of selection frequencies. However, if a high-degree measure with moderate or high order is used, the rank correlation is not perfect.
11 To fit the relevant patterns of responses, we pursued a variety of methods, including optimizing Hattori's "selection tendency function" (which maps expected entropy reduction onto the predicted probability that a card will be selected; see Hattori, 1999, 2002; also see Stringer, Borsboom, & Wagenmakers, 2011), or taking previously reported parameters for Hattori's selection tendency function; Spearman rank correlation coefficients; and Pearson correlations. Similar results were obtained across these methods. Because the rank correlations are simple to discuss, we focus on those here. Full simulation results for these and other measures, model variants with other values of P(exception), and Matlab code are available from J.D.N.
Consider for instance the Tsallis measure of degree 20 (i.e., $ent_{SM}^{(20,20)}$). This leads to relevance values for the A, not-A, C, and not-C cards of 0.0281, 0.0002, 0.0008, and 0.0084, respectively. Because the relative ordering of the C and the not-C card is incorrect (from the perspective of observed choices), the rank correlation is only 0.8. The same rank correlation of 0.8 is obtained, but for a different reason, from strongly non-concave relevance measures. $ent_{SM}^{(20,0)}$, for instance, gives values of 1.181, 0.380, 1.054, and 0.372 (again for the A, not-A, C, and not-C cards, respectively), so that the not-A card is deemed more informative than the not-C card by this relevance measure.
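Both of these rank correlations can be verified directly from the reported values (a quick check using scipy's spearmanr):

    from scipy.stats import spearmanr

    freqs = [89, 16, 62, 25]                       # A, not-A, C, not-C selection frequencies
    tsallis_20 = [0.0281, 0.0002, 0.0008, 0.0084]  # relevance under ent_SM^(20,20)
    sm_20_0 = [1.181, 0.380, 1.054, 0.372]         # relevance under ent_SM^(20,0)
    print(spearmanr(freqs, tsallis_20).correlation)  # 0.8
    print(spearmanr(freqs, sm_20_0).correlation)     # 0.8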
Let us now consider the expected reduction of Origin entropy, $ent_{SM}^{(0,0)}$, as an example of the 0-order measures. It gives relevance values of 0.527, 0, 0, and 0.159 for the A, not-A, C, and not-C cards, respectively. This is similar to Wason's analysis of the task: only the A and the not-C cards can falsify a hypothesis (namely, the dependence hypothesis h), thus only those two cards have value. The other cards could change the relative plausibility of h vs. h̄; however, according to 0-order measures, no informational value is achieved because no hypothesis is definitely ruled out. In this sense, 0-order measures can be thought of as bringing elements of the original logical interpretation of the selection task into the same unified information-theoretic framework that includes Shannon and generalized entropies (see below for more on this). Interestingly, this does not imply that the A and the not-C cards are equally valuable: in the model, the A card offers a higher chance of falsifying h than the not-C card, so it is more valuable, according to this analysis. Thus, while incorporating the basic idea of the importance of possible falsification, the 0-order Sharma-Mittal formalization of informational value offers something that the standard logical reading does not: a rationale for assessing the relative value among those queries (the A and the not-C card) providing the possibility of falsifying a hypothesis. The Origin entropy values and the empirical data agree that the A card is most useful and (up to a tie) that the not-A card is least useful, but disagree on virtually everything else; the rank correlation of $ent_{SM}^{(0,0)}$ to the empirical card selection frequencies is 0.6325.
WhatifOaksfordandChater’s(2003)modeliscombinedwithexceptionparameter
P(exception)=0.1,ratherthan0?Inthiscase,theempiricalselectionfrequenciesperfectly
correlatewiththetheoreticalvaluesforanevenwiderrangeofmeasuresthanforthe
“deterministic”model(Figure6,toprightplot).Forinstance,Tsallisofdegree11,i.e.#$%('',''),
which had a rank correlation of 0.8 with P(exception) = 0, has a perfect rank correlation with 0.1. This is due to the relative ordering of the not-A and C cards. For the P(exception) = 0 model, the A, not-A, C, and not-C cards had $ent_{SM}^{(11,11)}$ relevance of 0.059, 0.002, 0.012, and 0.016, respectively; with P(exception) = 0.1, the cards' respective relevance values are 0.019, 0.001, 0.007, and 0.005. In addition, a dramatic difference between P(exception) = 0 and P(exception) = 0.1 arises for the 0-order measures. If P(exception) > 0, even if very small, no amount of obtained data can ever lead to ruling out a hypothesis in the model. Therefore, with P(exception) = 0.1 all cards have zero value for 0-order measures, and the correlation with behavioral data is undefined (plotted black in Figure 6).
A probabilistic understanding of Wason's normative indications. Finally, we discuss how well the expected informational value of the cards, as calculated using Oaksford and Chater's (2003) model and various Sharma-Mittal measures, corresponds to Wason's original interpretation of the task. We thus conducted the same analyses as above, but instead of using the human selection frequencies we assumed that the A card was selected with 100%, the not-A card with 0%, the C card with 0%, and the not-C card with 100% probability. The 0-order relevance measures, again within Oaksford and Chater's (2003) model with P(exception) = 0, provide a probabilistic understanding of Wason's normative indications. Like Wason, the 0-order measures deem only the A and the not-C cards to be useful when P(exception) = 0. The rank correlation with theoretical selection frequencies from Wason's analysis is 0.94 (see Figure 6, bottom left plot). Why is the correlation not perfect? The probabilistic understanding proposed, as discussed above, goes beyond the logical analysis: because the A card offers a higher probability of falsification than the not-C card does in the probability model, the 0-order relevance measures value the former more than the latter. Recall that our hypothetical participants always select both cards that entail the possibility of falsifying the dependence hypothesis; thus, the correlation is less than one. The worst correlation with Wason's ranking is from the strongly non-concave measures, such as $ent_{SM}^{(20,0)}$; this correlation is exactly zero.
OaskfordandChater’s(2003)
modelwithP(exception)=0
OaskfordandChater’s(2003)
modelwithP(exception)=0.1
humandata
Wason'stheory
Figure6.PlotsofrankcorrelationvaluesfortheexpectedreductionofvariousSharma-Mittalentropiesin
OaksfordandChater’s(2003)modeloftheWasonselectiontask.Inthetoprow,modelsofexpected
entropyreductionarecomparedwithempiricalaggregatecardselectionfrequencies.Inthebottomrow,
instead,thecomparisoniswiththeoreticalchoicesimpliedbyWason’soriginalanalysisofthetask.Inthe
leftvs.rightcolumnstheconditionalprobabilityrepresentationof“ifvowel,thenevennumber”rulesout
expectionsorallowsforthem(withprobability0.1),respectively.
The Wason selection task illustrates the theoretical potential of the Sharma-Mittal framework. Whereas other authors noted the robustness of probabilistic analyses of the task across different measures of informational utility (see Fitelson & Hawthorne, 2010; Nelson, 2005, pp. 985-986; Oaksford & Chater, 2007), the variety of measures involved in those analyses arose in an ad hoc way. We extend those results, and show that even the traditional, allegedly anti-Bayesian reading of the task can be recovered smoothly in one overarching framework. In particular, the implications of Wason's Popperian interpretation can be represented well by the maximization of the expected reduction of an entirely balanced (order-0) Sharma-Mittal measure (such as Origin or Hartley entropy) in a deterministic reading of the task (i.e., with P(exception) = 0). Conversely, this means that adopting a probabilistic approach to Wason's task is not by itself sufficient to account for observed behavior. Even then, in fact, people's choices would still diverge from at least some theoretically viable models of information search.
6.2. Information search in experience-based studies
Is the same expected uncertainty reduction measure able to account for human behavior across a variety of tasks? To explore this issue, we reviewed experimental scenarios employed in experience-based investigations of information search behavior. In this experimental paradigm, participants learn the underlying statistical structure of an environment where items (plankton specimens) are visually displayed and subject to a binary classification (kind A vs. B) for which two binary features (yellow vs. black eye; dark vs. light claw) are potentially relevant. Immediate feedback is provided after each trial in a learning phase, until a performance criterion is reached, indicating adequate mastery of the environmental statistics. In a subsequent information-acquisition test phase of this procedure, both of the two features (eye and claw) are obscured, and participants have to select the most informative/useful feature relative to the target categories (kinds of plankton). (See Nelson et al., 2010, for a detailed description.) In our current terms, these scenarios concern a binary hypothesis space H = {specimen of kind A, specimen of kind B} and two binary tests E = {yellow eye, black eye} and F = {dark claw, light claw}. In each
case, the experience-based learning phase conveyed the structure of the joint probability distribution P(H, E, F) to participants. The test phase, in which either feature E or F can be viewed, represents a way to see whether the participants deemed R(H, E) or R(H, F) to be greater.
Table 6. Choices between two binary tests/experiments (E vs. F) for a binary classification problem (H) in experience-based experimental procedures. Cases 1-3 are taken from Nelson et al. (2010, Exp. 1); cases 4-5 from Exp. 3 in the same article; case 6 is an unpublished study using the same experimental procedure; cases 7-8 are from Meder and Nelson (2012, Exp. 1).
Case 1. P(H) = {0.7, 0.3}; Test E: P(H|e) = {0, 1}, P(H|¬e) = {0.754, 0.246}, P(e) = 0.072, P(¬e) = 0.928; Test F: P(H|f) = {1, 0}, P(H|¬f) = {0.501, 0.499}, P(f) = 0.399, P(¬f) = 0.601; observed choices of E: 82% (23/28)
Case 2. P(H) = {0.7, 0.3}; Test E: P(H|e) = {0, 1}, P(H|¬e) = {0.767, 0.233}, P(e) = 0.087, P(¬e) = 0.913; Test F: P(H|f) = {1, 0}, P(H|¬f) = {0.501, 0.499}, P(f) = 0.399, P(¬f) = 0.601; observed choices of E: 82% (23/28)
Case 3. P(H) = {0.7, 0.3}; Test E: P(H|e) = {0.109, 0.891}, P(H|¬e) = {0.978, 0.022}, P(e) = 0.320, P(¬e) = 0.680; Test F: P(H|f) = {1, 0}, P(H|¬f) = {0.501, 0.499}, P(f) = 0.399, P(¬f) = 0.601; observed choices of E: 97% (28/29)
Case 4. P(H) = {0.7, 0.3}; Test E: P(H|e) = {0, 1}, P(H|¬e) = {0.733, 0.267}, P(e) = 0.045, P(¬e) = 0.955; Test F: P(H|f) = {1, 0}, P(H|¬f) = {0.501, 0.499}, P(f) = 0.399, P(¬f) = 0.601; observed choices of E: 89% (8/9)
Case 5. P(H) = {0.7, 0.3}; Test E: P(H|e) = {0.201, 0.799}, P(H|¬e) = {0.780, 0.220}, P(e) = 0.139, P(¬e) = 0.861; Test F: P(H|f) = {1, 0}, P(H|¬f) = {0.501, 0.499}, P(f) = 0.399, P(¬f) = 0.601; observed choices of E: 70% (14/20)
Case 6. P(H) = {0.7, 0.3}; Test E: P(H|e) = {0.135, 0.865}, P(H|¬e) = {0.848, 0.152}, P(e) = 0.208, P(¬e) = 0.792; Test F: P(H|f) = {1, 0}, P(H|¬f) = {0.501, 0.499}, P(f) = 0.399, P(¬f) = 0.601; observed choices of E: 70% (14/20)
Case 7. P(H) = {0.44, 0.56}; Test E: P(H|e) = {0.595, 0.405}, P(H|¬e) = {0.331, 0.669}, P(e) = 0.414, P(¬e) = 0.586; Test F: P(H|f) = {0, 1}, P(H|¬f) = {0.502, 0.498}, P(f) = 0.123, P(¬f) = 0.877; observed choices of E: 60% (12/20)
Case 8. P(H) = {0.36, 0.64}; Test E: P(H|e) = {0.090, 0.910}, P(H|¬e) = {0.707, 0.293}, P(e) = 0.562, P(¬e) = 0.438; Test F: P(H|f) = {0, 1}, P(H|¬f) = {0.501, 0.499}, P(f) = 0.282, P(¬f) = 0.718; observed choices of E: 79% (15/19)
Overall, we found eight relevant experimental scenarios from the experimental paradigm described above (they are listed in Table 6) in which there was at least some interesting disagreement among the Sharma-Mittal measures about which feature is more useful. For each, we derived values of expected uncertainty reduction from Sharma-Mittal
measures of order and degree from 0 to 20, in increments of 0.25, and we computed the simple proportion of cases in which each measure's ranking of R(H, E) and R(H, F) matched the most prevalent observed choice.
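In code, this analysis reduces to the following sketch, reusing the relevance helper introduced earlier (only the first two rows of Table 6 are spelled out here; the remaining cases are entered in the same format):

    grid = np.arange(0, 20.25, 0.25)
    scenarios = [  # (prior, test E, test F); each test = (posteriors, outcome probs)
        ([0.7, 0.3],
         ([[0, 1], [0.754, 0.246]], [0.072, 0.928]),
         ([[1, 0], [0.501, 0.499]], [0.399, 0.601])),
        ([0.7, 0.3],
         ([[0, 1], [0.767, 0.233]], [0.087, 0.913]),
         ([[1, 0], [0.501, 0.499]], [0.399, 0.601])),
        # ... cases 3-8 of Table 6; E was the majority choice in every case
    ]
    accuracy = np.zeros((grid.size, grid.size))
    for i, r in enumerate(grid):
        for j, t in enumerate(grid):
            hits = sum(relevance(p, *E, r, t) > relevance(p, *F, r, t)
                       for p, E, F in scenarios)
            accuracy[i, j] = hits / len(scenarios)   # the shade plotted in Figure 7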
Nelson et al. (2010) devised their scenarios to dissociate predictions from a sample of competing and historically influential models of rational information search. Their conclusion was that the expected reduction of Error entropy (expected probability gain, in their terminology) accounted for participants' behavior and outperformed the expected reduction of Shannon entropy (expected information gain, in their terminology). A more comprehensive analysis within our current approach implies a richer picture. The dataset employed can be accurately represented in the Sharma-Mittal framework for a significant range of degree values, provided that the order parameter is high enough (the results are displayed in Figure 7, left side). Observed choices are especially consistent with the expected reduction of a quite unbalanced (e.g., r ≥ 4), concave or quasi-concave (t close to 2) Sharma-Mittal entropy measure. Importantly, there is overlap between results from modeling the Wason selection task and these experience-based learning data, giving hope to the idea that a unified theoretical explanation of human behavior may extend across several tasks.
6.3. Information search in words-and-numbers studies
The experience-based learning tasks discussed above were inspired by analogous tasks in which the prior probabilities of categories and the feature likelihoods were presented to participants using words and numbers (e.g., Skov and Sherman, 1986). We refer to such tasks as Planet Vuma experiments, reflecting the typically whimsical content, such as classifying species of aliens on Planet Vuma, designed not to conflict with people's experience with real object categories.
Whereas the expected reduction of Error entropy, and other models as discussed above, gives a plausible explanation of the experience-based learning task data, individual data in words-and-numbers studies are very noisy, and no attempt has been made to see whether a unified theory could account for the modal responses across these tasks. We therefore re-analyzed empirical data from several Planet Vuma experiments, in a manner analogous to our analyses of the experience-based learning data above (Figure 7). What do the results show? To our surprise, the results suggest that there may be a systematic explanation of people's behavior on words-and-numbers-based tasks.
[Figure 7 near here. Left panel: Experience-based learning (Plankton). Right panel: Words-and-numbers presentation (Planet Vuma).]
Figure 7. On the left, a graphical illustration of the empirical accuracy of Sharma-Mittal measures relative to binary information search choices in 8 experience-based experimental scenarios (described in Table 6). The shade at each point illustrates the proportion of choices (out of 8) correctly predicted by the expected reduction of the corresponding underlying entropy, with white and black indicating maximum (8/8) and minimum (0/8) accuracy, respectively. Results suggest that an Arimoto metric of moderate or high order is highly consistent with human choices. On the right, an illustration of the empirical accuracy of Sharma-Mittal measures in theoretically similar tasks, but where probabilistic information is presented in a standard explicit format (with numeric prior probabilities and test likelihoods). In these tasks, individual participants' test choices are highly noisy. Can a systematic theory still account for the modal results across tasks? We analyzed 13 cases (described in Table 7) of binary information search preferences. The shade at each point illustrates the proportion of comparisons (out of 13) correctly predicted by the expected reduction of the corresponding underlying entropy, with white and black again indicating maximum (13/13) and minimum (0/13) accuracy, respectively. Results show that a wide range of measures is consistent with the available experimental findings, including Shannon entropy as well as a variety of high-degree measures (degree much higher than the Arimoto curve).
Table 7. Choices between two binary tests/experiments (E vs. F) for a binary classification problem (H) in words-and-numbers (Planet Vuma) experiments. Cases 1-6 are from Nelson (2005); case 7 is from Skov & Sherman (1986); cases 8-10 are from Nelson et al. (2010, Exp. 1); cases 11-13 from Wu et al. (in press, Exp. 1-3). In each case, test E was deemed more useful than test F by the participants. We only report scenarios for which at least two Sharma-Mittal measures strictly disagree about which of the tests has higher expected usefulness. (Thus, not all feature queries involved in the original articles are listed here.) Nelson (2005) asked participants to give a rank ordering among four possible features' information values. Here we list the six corresponding pairwise comparisons, in each case labeling the feature that was ranked higher as the favorite one (E). Wu et al. (in press) studied 14 different probability, natural frequency, and graphical information formats for the presentation of relevant probabilities. For comparison with other studies, we take results only from the standard probability format here.
n.    P(H)            Test E: P(e|h)   P(e|h̄)    Test F: P(f|h)   P(f|h̄)
1     {0.5, 0.5}      0.70             0.30       0.99             1.00
2     {0.5, 0.5}      0.30             0.0001     0.99             1.00
3     {0.5, 0.5}      0.01             0.99       0.99             1.00
4     {0.5, 0.5}      0.30             0.0001     0.70             0.30
5     {0.5, 0.5}      0.01             0.99       0.30             0.0001
6     {0.5, 0.5}      0.01             0.99       0.70             0.30
7     {0.5, 0.5}      0.90             0.55       0.65             0.30
8     {0.7, 0.3}      0.57             0          0                0.24
9     {0.7, 0.3}      0.57             0          0                0.29
10    {0.7, 0.3}      0.05             0.95       0.57             0
11    {0.7, 0.3}      0.41             0.93       0.03             0.30
12    {0.7, 0.3}      0.43             1.00       0.04             0.37
13    {0.72, 0.28}    0.03             0.83       0.39             1.00
The degree of the most plausible measures is considerably above the Arimoto curve, although not as high as, for instance, Non-Certainty entropy (order 30). From a descriptive psychological standpoint, a plausible interpretation is that when confronted with words-and-numbers-type tasks, people have a strong focus on the chances of obtaining a certain or near-to-certain result, and are less concerned with (or, perhaps, attuned to) the details of the individual items in the probability distribution. The Sharma-Mittal framework provides a potential explanation for heretofore perplexing experimental results, while also highlighting key questions (e.g., how much preference for near-certainty, exactly, do subjects have?) for future empirical research on words-and-numbers tasks.
6.4. Unifying theory and intuition in the Person Game (Having your cake and eating it too)
In this section, we introduce another theoretical conundrum from the literature, and show how the Sharma-Mittal framework may help solve it. As pointed out above, the expected reduction of Error entropy had appeared initially to provide the best explanation of people's intuitions and behavior on experience-based information search tasks (Nelson et al., 2010). But this model leads to potentially counterintuitive behavior on another interesting kind of information search task, namely the Person Game (a variant of the Twenty Questions game). In this game, n cards (say, 20) with different faces are presented. One of those faces has been chosen at random (with equal probability) to be the correct face in a particular round of the game. The player's task is to find the true face in the smallest number of yes/no questions about physical features of the faces. For instance, asking whether the person has a beard would be a possible question, E = {e, ¬e}, with e = beard and ¬e = no beard. If k < n is the number of characters with a beard, then P(e) = k/n and P(¬e) = (n - k)/n. Moreover, a "yes" answer will leave k equiprobable guesses still in play, and a "no" answer n - k such guesses.
Several papers have reported (see Nelson et al., 2014, for references) that people preferentially ask about features that are possessed by close to 50% of the remaining possible items, thus with P(e) close to 0.5. This strategy can be labelled the split-half
heuristic. It is optimal for minimizing the expected number of questions needed under some task variants (Navarro & Perfors, 2011), although not in the general case (Nelson, Meder, & Jones, 2016), and can be accounted for using expected Shannon entropy reduction. But expected Shannon entropy reduction cannot account for people's behavior on experience-based information search tasks, as our analyses above show. Can expected Error entropy reduction account for these results and intuitions? Put more broadly, can the same entropy model provide a satisfying account of both the Person Game and the experience-based learning tasks? As it happens, Error entropy cannot account for the preference to split the remaining items close to 50%. In fact, every possible question (unless its answer is known already, because none or all of the remaining faces have the feature) has exactly the same expected Error entropy reduction, namely 1/k, where there are k items remaining (Nelson, Meder, & Jones, 2016). This might lead us to wonder whether we must have different entropy/information models to account for people's intuitions and behavior across these different tasks. Indeed, it would call into question the potential for a unified and general-purpose theory of the psychological value of information.
It turns out that the findings on why expected Shannon entropy reduction favors questions close to a 50:50 split, and why Error entropy has no such preference, apply much more generally than to Shannon and Error entropy. In fact, for all Sharma-Mittal measures, the ordinal evaluation of questions in the Person Game is solely a function of the degree of the entropy measure, and has nothing to do with the order of the measure [Suppl. Mat., 5]. Among other things, this implies that all entropy-based measures with degree t = 1 have the exact same preferences as expected Shannon entropy reduction, and all of them quantify the usefulness of querying a feature as a function of the proportion of remaining items that possess that feature. Similarly, all degree-2 measures, and not only Error entropy, deem all questions to be equally useful in the Person Game. The core of this insight stems from the fact that, if a probability distribution is uniform, then the entropy of that distribution depends only on the degree of a Sharma-Mittal entropy measure. More formally, for any set of hypotheses H = {h1, h2, ..., hn} with a uniform probability distribution U(H):

$ent_{SM}^{(r,t)}(U(H)) = \ln_t(n)$
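Numerically, this property is easy to confirm with the helpers sketched earlier: the value of a k:(n-k) question is unchanged by the order parameter, and the entropy of a uniform distribution equals $\ln_t(n)$ whatever the order (question_value is our own illustrative function):

    def question_value(n, k, r, t):
        # Expected entropy reduction of a k:(n-k) question in the Person Game;
        # the prior and both possible posteriors are uniform distributions.
        u = lambda m: [1.0 / m] * m
        return (sm_entropy(u(n), r, t)
                - (k / n) * sm_entropy(u(k), r, t)
                - ((n - k) / n) * sm_entropy(u(n - k), r, t))

    for r in (0.5, 1, 5, 20):                        # varying the order...
        print(question_value(40, 20, r, 1.0))        # ...leaves the value unchanged
    print(sm_entropy([0.25] * 4, 7, 3), ln_t(4, 3))  # uniform entropy = ln_t(n), any order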
Figure 8. The expected entropy reduction of a binary question E = {e, ¬e} in the Person Game with a hypothesis set H of size 40 (the possible guesses, that is, characters initially in play) as a function of the proportion of possible guesses remaining after getting datum e (e.g., a "yes" answer to "has the chosen person a beard?"). Questions are deemed most valuable with the zero-degree entropy measures (bottom right plot). Although the shape of the curve is similar for the degree t = 0 and degree t = 1 measures, the actual information value (see the y axis) decreases as the degree increases. For degree t = 2 (for example, for Error entropy), every question is equally useful (provided that there is some uncertainty about the answer; bottom left plot). If the degree is greater than 2, then the least-equally-split questions (e.g., 1:39 questions, in the case of 40 items) are deemed most useful (left column, top and middle row). The order parameter is irrelevant for purposes of evaluating questions' expected usefulness in the Person Game, because all prior and possible posterior probability distributions are uniform (see text).
Figure 8 shows how possible questions are valued, in the Person Game, as a function of the proportion of remaining items that possess a particular feature. We see that if t = 1, as for Shannon and all Rényi entropies, questions with close to a 50:50 split are preferred. If the degree t is greater than 1 but less than 2, questions with close to a 50:50 split are still preferred, but less so. If t = 2, then 1:99 and 50:50 questions are deemed equally useful. Remarkably, if the degree is greater than 2, then a 1:99 question is preferred to a 50:50 question.
While the choice of particular Sharma-Mittal measures is only partly constrained by observed preferences in the Person Game alone (and specifically the value of the order parameter r is not constrained at all), nothing in principle would guarantee that a joint and coherent account of such behavior and other findings exists. It is then important to point out that one can, in fact, pick an entropy measure whereby the experience-based data above follow, along with a greater informational value for 50:50 questions than for 1:99 questions in the Person Game. For instance, medium-order Arimoto entropies (such as $ent_{SM}^{(10,1.9)}$) will work.
7. General discussion
In this paper, we have presented a general framework for the formal analysis of uncertainty, the Sharma-Mittal entropy formalism. This framework generates a comprehensive approach to the informational value of queries (questions, tests, experiments) as the expected reduction of uncertainty. The amount of theoretical insight and unification achieved is remarkable, in our view. Moreover, such a framework can help us understand existing empirical results, and point out important research questions for future investigation of human intuition and reasoning processes as concerns uncertainty and information search.
Mathematically, the parsimony of the Sharma-Mittal formalism is appealing and yields decisive advantages in analytic manipulations, derivations, and calculations, too. Within the domain of cognitive science, no earlier attempt has been made to unify so many existing models concerning information search/acquisition behavior. Notably, this involves both popular candidate rational measures of informational utility (such as the expected
reduction of Shannon or Error entropy) and avowed heuristic models, such as Baron et al.'s (1988, p. 106) quasi-Popperian heuristic (maximization of the expected number of hypotheses ruled out, i.e., the expected reduction of Origin entropy) and Nelson et al.'s (2010, p. 962) "probability-of-certainty" heuristic (closely approximated by the expected reduction of a high-degree Tsallis entropy, or a similar measure). In addition, once applied to uncertainty and information search, the Sharma-Mittal parameters are not dumb mathematical construals, but rather capture cognitively and behaviorally meaningful ideas. Roughly, the order parameter, r, captures how much one disregards minor hypotheses (via the kind of mean applied to the probability values in P(H)). The degree parameter t, on the other hand, captures how much one cares about getting (very close) to certainty (via the behavior of the surprise/atomic information function; see Figure 3). Thus, a high order indicates a strong focus on the prevalent (most likely) element in the hypothesis set and a lack of consideration for minor possibilities. A very low order, on the other hand, implies a Popperian or quasi-Popperian attitude in the assessment of tests, with a marked appreciation of potentially falsifying or almost falsifying evidence. The degree parameter, in turn, has important implications for how much potentially conclusive experiments are valued, as compared to experiments that are informative but not conclusive. Moreover, for each particular order, if the degree is higher than that of the corresponding Arimoto entropy (and in any case if the order is less than 0.5 or the degree is at least 2), then the concavity of the entropy measure guarantees that no experiment will be rated as having negative expected usefulness.
Even according to fairly cautious views such as Aczél's (1984), the above remarks seem to provide a fairly strong motivation to consider pursuing a generalized approach. Here is another possible concern, however. Uncertainty and the informational value of tests may be involved in many arguments concerning human cognition. Now we see that those notions can be formalized in many different ways, such that different properties (say, additivity, or non-negativity) are or are not implied. Thus, the arguments at issue might be valid for some choices of the corresponding measures and not for others. This point has been labelled the issue of measure-sensitivity in related areas (Fitelson, 1999). Is it something to be worried about? Does it raise problems for our proposal?
It is not uncommon for measure-sensitivity to foster skeptical or dismissive reactions concerning the prospects of the formal analysis of the concept at issue (e.g., Hurlbert, 1971; Kyburg & Teng, 2001, pp. 98 ff.). However, measure-sensitivity is a widespread and mundane phenomenon. In areas related to the formal analysis of reasoning, the issue arises, for instance, for Bayesian theories of inductive confirmation (e.g., Brössel, 2013; Crupi & Tentori, 2016; Festa & Cevolani, 2016; Glass, 2013; Hájek & Joyce, 2008; Roche & Shogenji, 2014), scoring rules and measures of accuracy (e.g., D'Agostino & Sinigaglia, 2010; Leitgeb & Pettigrew, 2010a, b; Levinstein, 2012; Predd et al., 2009), and measures of causal strength (e.g., Griffiths & Tenenbaum, 2005, 2009; Fitelson & Hitchcock, 2011; Meder, Mayrhofer, & Waldmann, 2014; Sprenger, 2016). Our treatment contributes to making the same point explicit for measures of uncertainty and the informational value of experiments. This we see as a constructive contribution. The prominence of one specific measure in one research domain may well have been partly affected by historical contingencies. As a consequence, when a theoretical or experimental inference relies on the choice of one measure, it makes sense to check how robust it is across different choices or, alternatively, to acknowledge which measure-specific properties support the conclusion and how compelling they are. Having a plurality of related measures available is indeed an important opportunity. It prompts thorough investigation of the features of alternative options and their relationships (e.g., Crupi, Chater, & Tentori, 2013; Huber & Schmidt-Petri, 2009; Nelson, 2005, 2008), it can provide a rich source of tools for both theorizing and the design of new experimental investigations (e.g., Rusconi et al., 2014; Schupbach, 2011; Tentori et al., 2007), and it makes it possible to tailor specific models to varying tasks and contexts within an otherwise coherent approach (e.g., Crupi & Tentori, 2014; Dawid & Musio, 2014; Oaksford & Hahn, 2007).
Which Sharma-Mittal measures are most consistent with observed behavior overall? According to our analyses, a subset of Sharma-Mittal information search models receives a significant amount of convergent support. We found that measures of high but finite order accounting for the experience-based (plankton task) data (Figure 7, left side) are also empirically adequate for abstract selection task data (Figure 6, top row) and results from a Twenty-Questions kind of task such as the Person Game (Figure 8). On the other hand, the
best fit with words-and-numbers (Planet Vuma) information search tasks indicates a different kind of model within the Sharma-Mittal framework (Figure 7, right side). For these cases, our analysis thus suggests that people's behavior may comply with different measures in different situations, so a key question arises about the features of a task which affect such variation in a consistent way, such as a comparably stronger appreciation of certainty or quasi-certainty as prompted by an experimental procedure conveying environmental statistics by explicit verbal and numerical stimuli.
Beyond this broad outlook, our discussion also allows for the resolution of a number of puzzles. Let us mention a last one. Nelson et al. (2010) had concluded from their experimental investigations that human information search in an experience-based setting was appropriately accounted for by maximization of the expected reduction of Error entropy. This specific model, however, exhibits some questionable properties related to its lack of mathematical continuity: in particular, if the most likely hypothesis in H is not changed by any possible evidence in E, then the latter has no informational utility whatsoever according to $ent_{error}$, no matter whether it can rule out other non-negligible hypotheses in the set (see, e.g., cases 1 and 6 in Table 6). Findings from Baron et al. (1988) suggest that this might not describe human judgment adequately. In that study, participants were given a fictitious medical diagnosis scenario with P(H) = {0.64, 0.24, 0.12}, and a series of possible binary tests, including E such that P(H|e) = {0.47, 0.35, 0.18}, P(H|¬e) = {1, 0, 0}, and P(e) = 0.68, and another completely irrelevant test F (with an even chance of a positive/negative result on each one of the elements in H, so that P(H|f) = P(H|¬f) = P(H)). According to $ent_{error}$, tests E and F are both equally worthless, $R_{error}(H,E) = R_{error}(H,F) = 0$, because hypothesis h1 ∈ H remains the most likely no matter what. Participants' mean ratings of the usefulness of E and F were markedly different, however: 0.48 vs. 0.09 (on a 0-1 scale). Indeed, rating E higher than F seems at least reasonable, contrary to what $ent_{error}$ implies. In the Sharma-Mittal framework, reconciliation is possible: the expected reduction of a relatively high order (say, 10) entropy measure from the Arimoto family would account for Nelson et al.'s (2010) and similar findings (see Figure 7), and still would not put test E merely on a par with the entirely pointless test F. Indeed, given our theoretical background and the limited empirical
indications available, such a measure would count as a plausible choice in our view, had one to pick a specific entropy underlying a widely applicable model of the informational utility of experiments. Moreover, this kind of operation has wider scope. Origin entropy, for instance, may imply largely appropriate ratings in some contexts (say, biological) and yet not be well-behaved because of its discontinuities: a Sharma-Mittal measure such as $ent_{SM}^{(0.1,0.1)}$ would then closely approximate the former while avoiding the latter.
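The Baron et al. (1988) scenario can be checked directly with the helpers sketched earlier (r = 10^6 serves as a numeric stand-in for the infinite-order limit defining Error entropy; with the published rounded probabilities, the Error-based values come out as zero up to rounding):

    prior = [0.64, 0.24, 0.12]
    E = ([[0.47, 0.35, 0.18], [1, 0, 0]], [0.68, 0.32])
    F = ([prior, prior], [0.5, 0.5])           # the completely irrelevant test
    print(relevance(prior, *E, 1e6, 2))        # Error entropy: ~0 (up to rounding)
    print(relevance(prior, *F, 1e6, 2))        # Error entropy: 0
    print(relevance(prior, *E, 10, 1.9))       # Arimoto order 10: small positive value
    print(relevance(prior, *F, 10, 1.9))       # Arimoto order 10: 0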
Many further empirical issues can be addressed. For one instance, our analysis of human data in Tables 6-7 and Figure 7 provides relatively weak and indirect evidence against non-concave entropy measures as a basis for the assessment of the informational utility of queries by human agents. However, strongly diverging predictions can be generated from concave vs. non-concave measures (as illustrated in cases 6 and 7, Table 5), and hence put to empirical test. Moreover, our explanatory reanalysis of prior work was based on the aggregate data reported in earlier articles; but how does this extend to individual behavior? We are aware of no studies that address the question of whether there are meaningful individual differences in the psychology of information. Thus, while inferences about individuals should be the goal (Lee, 2011), this requires future research, perhaps with adaptive Bayesian experimental design techniques (Kim et al., 2014). Better models of individual-level psychology could also serve the goal of identifying the information that would be most informative for individual human learners (Gureckis & Markant, 2012), potentially enhancing automated tutor systems. Another idea concerns the direct assessment of uncertainty, e.g., whether more uncertainty is perceived in, say, P(H) = {0.49, 0.49, 0.02} vs. P*(H) = {0.70, 0.15, 0.15}. Judgments of this kind are likely to play a role in human reasoning and decision-making and may plausibly be modulated by a number of interesting factors. Moreover, an array of relevant predictions can be generated from the Sharma-Mittal framework to dissociate subsets of entropy measures. Yet as far as we know, and rather surprisingly, no established experimental procedure exists for a direct behavioral measurement of the judged overall uncertainty concerning a hypothesis set; this is another important area for future investigation.
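For this last example, different Sharma-Mittal measures already order the two distributions in opposite ways, which is what would make such direct judgments empirically diagnostic; a quick check with the sm_entropy helper sketched earlier:

    P1 = [0.49, 0.49, 0.02]
    P2 = [0.70, 0.15, 0.15]
    print(sm_entropy(P1, 1, 1) < sm_entropy(P2, 1, 1))      # True: Shannon deems P2 more uncertain
    print(sm_entropy(P1, 1e6, 2) > sm_entropy(P2, 1e6, 2))  # True: Error deems P1 more uncertain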
REFERENCES
Aczél, J. (1984). Measuring information beyond communication theory: Why some generalized information measures may be useful, others not. Aequationes Mathematicae, 27, 1-19.
Aczél, J. (1987). Characterizing information measures: Approaching the end of an era. Lecture Notes in Computer Science, 286, 357-384.
Aczél, J., Forte, B., & Ng, C. T. (1974). Why the Shannon and Hartley entropies are "natural". Advances in Applied Probability, 6, 131-146.
Arimoto, S. (1971). Information-theoretical considerations on estimation problems. Information and Control, 19, 181-194.
Austerweil, J. L., & Griffiths, T. L. (2011). Seeking confirmation is rational for deterministic hypotheses. Cognitive Science, 35, 499-526.
Bar-Hillel, Y., & Carnap, R. (1953). Semantic information. British Journal for the Philosophy of Science, 4, 147-157.
Baron, J. (1985). Rationality and Intelligence. New York: Cambridge University Press.
Baron, J., Beattie, J., & Hershey, J. C. (1988). Heuristics and biases in diagnostic reasoning II: Congruence, information, and certainty. Organizational Behavior and Human Decision Processes, 42, 88-110.
Barwise, J. (1997). Information and possibilities. Notre Dame Journal of Formal Logic, 38, 488-515.
Beck, C. (2009). Generalised information and entropy measures in physics. Contemporary Physics, 50, 495-510.
Ben-Bassat, M., & Raviv, J. (1978). Rényi's entropy and the probability of error. IEEE Transactions on Information Theory, 24, 324-331.
Benish, W. A. (1999). Relative entropy as a measure of diagnostic information. Medical Decision Making, 19, 202-206.
Boztas, S. (2014). On Rényi entropies and their applications to guessing attacks in cryptography. IEICE Transactions on Fundamentals of Electronics, Communication, and Computer Sciences, 97, 2542-2548.
Bradley, S., & Steele, K. (2014). Uncertainty, learning and the 'problem' of dilation. Erkenntnis, 79, 1287-1303.
Bramley, N. R., Lagnado, D., & Speekenbrink, M. (2015). Conservative forgetful scholars: How people learn causal structure through sequences of interventions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 708-731.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1-3.
Brössel, P. (2013). The problem of measure sensitivity redux. Philosophy of Science, 80, 378-397.
Carnap, R. (1952). The Continuum of Inductive Methods. Chicago: University of Chicago Press.
Cho, A. (2002). A fresh take on disorder, or disorderly science. Science, 297, 1268-1269.
Crupi, V., & Girotto, V. (2014). From is to ought, and back: How normative concerns foster progress in reasoning research. Frontiers in Psychology, 5, 219.
Crupi, V., & Tentori, K. (2014). Measuring information and confirmation. Studies in the History and Philosophy of Science, 47, 81-90.
Crupi, V., & Tentori, K. (2016). Confirmation theory. In A. Hájek & C. Hitchcock (eds.), Oxford Handbook of Philosophy and Probability (pp. 650-665). Oxford: Oxford University Press.
Crupi, V., Chater, N., & Tentori, K. (2013). New axioms for probability and likelihood ratio measures. British Journal for the Philosophy of Science, 64, 189-204.
Crupi, V., Tentori, K., & Lombardi, L. (2009). Pseudodiagnosticity revisited. Psychological Review, 116, 971-985.
Csiszár, I. (2008). Axiomatic characterizations of information measures. Entropy, 10, 261-273.
D'Agostino, M., & Sinigaglia, C. (2010). Epistemic accuracy and subjective probability. In M. Suárez, M. Dorato, & M. Rèdei (eds.), Epistemology and Methodology of Science (pp. 95-105). Berlin: Springer.
Daróczy, Z. (1970). Generalized information functions. Information and Control, 16, 36-51.
Dawid, A. P. (1998). Coherent measures of discrepancy, uncertainty and dependence, with applications to Bayesian predictive experimental design. Technical Report 139, Department of Statistical Science, University College London (http://www.ucl.ac.uk/Stats/research/pdfs/139b.zip).
Dawid, A. P., & Musio, M. (2014). Theory and applications of proper scoring rules. Metron, 72, 169-183.
Denzler, J., & Brown, C. M. (2002). Information theoretic sensor data selection for active object recognition and state estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 145-157.
Evans, J. St. B. T., & Over, D. E. (1996). Rationality in the selection task: Epistemic utility versus uncertainty reduction. Psychological Review, 103, 356-363.
Fano, R. (1961). Transmission of Information: A Statistical Theory of Communications. Cambridge: MIT Press.
Festa, R. (1993). Optimum Inductive Methods. Rijksuniversiteit Groningen.
Festa, R., & Cevolani, G. (2016). Unfolding the grammar of Bayesian confirmation: Likelihood and anti-likelihood principles. Philosophy of Science, forthcoming.
Fitelson, B. (1999). The plurality of Bayesian measures of confirmation and the problem of measure sensitivity. Philosophy of Science, 66, S362-S378.
Fitelson, B., & Hawthorne, J. (2010). The Wason task(s) and the paradox of confirmation. Philosophical Perspectives, 24 (Epistemology), 207-241.
Fitelson, B., & Hitchcock, C. (2011). Probabilistic measures of causal strength. In P. McKay Illari, F. Russo, & J. Williamson (eds.), Causality in the Sciences (pp. 600-627). Oxford: Oxford University Press.
Floridi, L. (2009). Philosophical conceptions of information. In G. Sommaruga (ed.), Formal Theories of Information (pp. 13-53). Berlin: Springer.
Floridi, L. (2013). Semantic conceptions of information. In E. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Spring 2013 Edition). URL = http://plato.stanford.edu/archives/spr2013/entries/information-semantic.
Frank, T. (2004). Complete description of a generalized Ornstein-Uhlenbeck process related to the nonextensive Gaussian entropy. Physica A, 340, 251-256.
Gauvrit, N., & Morsanyi, K. (2014). The equiprobability bias from a mathematical and psychological perspective. Advances in Cognitive Psychology, 10, 119-130.
Gibbs, J. P., & Martin, W. T. (1962). Urbanization, technology, and the division of labor. American Sociological Review, 27, 667-677.
Gigerenzer, G., Hertwig, R., & Pachur, T. (eds.) (2011). Heuristics: The Foundations of Adaptive Behavior. New York: Oxford University Press.
Gini, C. (1912). Variabilità e mutabilità. In Memorie di metodologia statistica, I: Variabilità e concentrazione (pp. 189-358). Milano: Giuffrè, 1939.
Glass, D. H. (2013). Confirmation measures of association rule interestingness. Knowledge-Based Systems, 44, 65-77.
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359-378.
Good, I. J. (1950). Probability and the Weight of Evidence. New York: Griffin.
Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society B, 14, 107-114.
Good, I. J. (1967). On the principle of total evidence. British Journal for the Philosophy of Science, 17, 319-321.
Goosens, W. K. (1976). A critique of epistemic utilities. In R. Bogdan (ed.), Local Induction (pp. 93-114). Dordrecht: Reidel.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 334-384.
Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116, 661-716.
Gureckis, T. M., & Markant, D. B. (2012). Self-directed learning: A cognitive and computational perspective. Perspectives on Psychological Science, 7, 464-481.
Hájek, A., & Joyce, J. (2008). Confirmation. In S. Psillos & M. Curd (eds.), Routledge Companion to the Philosophy of Science (pp. 115-129). New York: Routledge.
Hartley, R. (1928). Transmission of information. Bell System Technical Journal, 7, 535-563.
Hasson, U. (2016). The neurobiology of uncertainty: Implications for statistical learning. Philosophical Transactions B, 371, 20160048.
Hattori, M. (1999). The effects of probabilistic information in Wason's selection task: An analysis of strategy based on the ODS model. In Proceedings of the 16th Annual Meeting of the Japanese Cognitive Science Society (pp. 623-626).
Hattori, M. (2002). A quantitative model of optimal data selection in Wason's selection task. Quarterly Journal of Experimental Psychology, 55, 1241-1272.
Havrda, J., & Charvát, F. (1967). Quantification method of classification processes: Concept of structural a-entropy. Kybernetika, 3, 30-35.
Hill, M. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54, 427-431.
Hoffmann, S. (2008). Generalized distribution-based diversity measurement: Survey and unification. Faculty of Economics and Management Magdeburg, Working Paper 23 (http://www.ww.uni-magdeburg.de/fwwdeka/femm/a2008_Dateien/2008_23.pdf).
Horwich, P. (1982). Probability and Evidence. Cambridge, UK: Cambridge University Press.
Huber, F., & Schmidt-Petri, C. (eds.) (2009). Degrees of Belief. Dordrecht: Springer.
Hurlbert, S. H. (1971). The nonconcept of species diversity: A critique and alternative parameters. Ecology, 52, 577-586.
Jost, L. (2006). Entropy and diversity. Oikos, 113, 363-375.
Kaniadakis, G., Lissia, M., & Scarfone, A. M. (2004). Deformed logarithms and entropies. Physica A, 340, 41-49.
Katsikopoulos, K. V., Schooler, L. J., & Hertwig, R. (2010). The robust beauty of ordinary information. Psychological Review, 117, 1259-1266.
Keylock, J. C. (2005). Simpson diversity and the Shannon-Wiener index as special cases of a generalized entropy. Oikos, 109, 203-207.
Kim, W., Pitt, M. A., Lu, Z. L., Steyvers, M., & Myung, J. I. (2014). A hierarchical adaptive approach to optimal experimental design. Neural Computation, 26, 2465-2492.
Klayman, J., & Ha, Y.-W. (1987). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, 94, 211-228.
Kyburg, H. E., & Teng, C. M. (2001). Uncertain Inference. New York: Cambridge University Press.
Laakso, M., & Taagepera, R. (1979). "Effective" number of parties: A measure with application to West Europe. Comparative Political Studies, 12, 3-27.
Lande, R. (1996). Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos, 76, 5-13.
Lee, M. D. (2011). How cognitive modeling can benefit from hierarchical Bayesian models. Journal of Mathematical Psychology, 55, 1-7.
Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: An ideal observer model of reading. Psychological Review, 104, 524-553.
Leitgeb, H., & Pettigrew, R. (2010a). An objective justification of Bayesianism I: Measuring inaccuracy. Philosophy of Science, 77, 201-235.
Leitgeb, H., & Pettigrew, R. (2010b). An objective justification of Bayesianism II: The consequences of minimizing inaccuracy. Philosophy of Science, 77, 236-272.
Levinstein, B. (2012). Leitgeb and Pettigrew on accuracy and updating. Philosophy of Science, 79, 413-424.
Lewontin, R. C. (1972). The apportionment of human diversity. Evolutionary Biology, 6, 381-398.
Lindley, D. V. (1956). On a measure of the information provided by an experiment. Annals of Mathematical Statistics, 27, 986-1005.
Markant, D., & Gureckis, T. M. (2012). Does the utility of information influence sampling behavior? In N. Miyake, D. Peebles, & R. P. Cooper (eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 719-724). Austin, TX: Cognitive Science Society.
Masi, M. (2005). A step beyond Tsallis and Rényi entropies. Physics Letters A, 338, 217-224.
Meder, B., & Nelson, J. D. (2012). Information search with situation-specific reward functions. Judgment and Decision Making, 7, 119-148.
Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121, 277-301.
Muliere, P., & Parmigiani, G. (1993). Utility and means in the 1930s. Statistical Science, 8, 421-432.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387-391.
Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286-1294.
Naudts, J. (2002). Deformed exponentials and logarithms in generalized thermostatistics. Physica A, 316, 323-334.
Navarro, D. J., & Perfors, A. F. (2011). Hypothesis generation, sparse categories, and the positive test strategy. Psychological Review, 118, 120-134.
Nelson, J. D. (2005). Finding useful questions: On Bayesian diagnosticity, probability, impact, and information gain. Psychological Review, 112, 979-999.
Nelson, J. D. (2008). Towards a rational theory of human information acquisition. In M. Oaksford & N. Chater (eds.), The Probabilistic Mind: Prospects for Rational Models of Cognition (pp. 143-163). Oxford: Oxford University Press.
Nelson, J. D., & Cottrell, G. W. (2007). A probabilistic model of eye movements in concept formation. Neurocomputing, 70, 2256-2272.
Nelson, J. D., Divjak, B., Gudmundsdottir, G., Martignon, L., & Meder, B. (2014). Children's sequential information search is sensitive to environmental probabilities. Cognition, 130, 74-80.
Nelson, J. D., McKenzie, C. R. M., Cottrell, G. W., & Sejnowski, T. J. (2010). Experience matters: Information acquisition optimizes probability gain. Psychological Science, 21, 960-969.
Nelson, J. D., Meder, B., & Jones, M. (2016). On the fine line between "heuristic" and "optimal" sequential question strategies. Submitted.
Nelson, J. D., Tenenbaum, J. B., & Movellan, J. R. (2001). Active inference in concept learning. In J. D. Moore & K. Stenning (eds.), Proceedings of the 23rd Conference of the Cognitive Science Society (pp. 692-697). Mahwah, NJ: Erlbaum.
Niiniluoto, I., & Tuomela, R. (1973). Theoretical Concepts and Hypothetico-Inductive Inference. Dordrecht: Reidel.
Oaksford, M., & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608-631.
Oaksford, M., & Chater, N. (2003). Optimal data selection: Revision, review, and re-evaluation. Psychonomic Bulletin & Review, 10, 289-318.
Oaksford, M., & Hahn, U. (2007). Induction, deduction, and argument strength in human reasoning and argumentation. In A. Feeney & E. Heit (eds.), Inductive Reasoning: Experimental, Developmental, and Computational Approaches (pp. 269-301). Cambridge, UK: Cambridge University Press.
Oaksford, M., & Chater, N. (2007). Bayesian Rationality: The Probabilistic Approach to Human Reasoning. Oxford, UK: Oxford University Press.
Patil, G., & Taille, C. (1982). Diversity as a concept and its measurement. Journal of the American Statistical Association, 77, 548-561.
Pedersen, P., & Wheeler, G. (2014). Demystifying dilation. Erkenntnis, 79, 1305-1342.
Pettigrew, R. (2013). Epistemic utility and norms for credences. Philosophy Compass, 8, 897-908.
Popper, K. R. (1959). The Logic of Scientific Discovery. London: Routledge.
Predd, J. B., Seiringer, R., Lieb, E. J., Osherson, D., Poor, H. V., & Kulkarni, S. R. (2009). Probabilistic coherence and proper scoring rules. IEEE Transactions on Information Theory, 55, 4786-4792.
Raiffa, H., & Schlaifer, R. (1961). Applied Statistical Decision Theory. Boston: Clinton Press.
Raileanu, L. E., & Stoffel, K. (2004). Theoretical comparison between the Gini Index and Information Gain criteria. Annals of Mathematics and Artificial Intelligence, 41, 77-93.
Ramírez-Reyes, A., Hernández-Montoya, A. R., Herrera-Corral, G., & Domínguez-Jiménez, I. (2016). Determining the entropic index q of Tsallis entropy in images through redundancy. Entropy, 18, 299.
Rao, C. R. (2010). Quadratic entropy and analysis of diversity. Sankhya: The Indian Journal of Statistics, 72A, 70-80.
Renninger, L. W., Coughlan, J., Verghese, P., & Malik, J. (2005). An information maximization model of eye movements. Advances in Neural Information Processing Systems, 17, 1121-1128.
Rényi, A. (1961). On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability I (pp. 547-556). Berkeley, CA: University of California Press.
Ricotta, C. (2003). On parametric evenness measures. Journal of Theoretical Biology, 222, 189-197.
Roche, W., & Shogenji, T. (2014). Dwindling confirmation. Philosophy of Science, 81, 114-137.
Roche, W., & Shogenji, T. (2016). Information and inaccuracy. British Journal for the Philosophy of Science, forthcoming.
Ruggeri, A., & Lombrozo, T. (2015). Children adapt their questions to achieve efficient search. Cognition, 143, 203-216.
Rusconi, P., Marelli, M., D'Addario, M., Russo, S., & Cherubini, P. (2014). Evidence evaluation: Measure Z corresponds to human utility judgments better than measure L and optimal-experimental-design models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 703-723.
Sahoo, P. K., & Arora, G. (2004). A thresholding method based on two-dimensional Rényi's entropy. Pattern Recognition, 37, 1149-1161.
Savage, L. J. (1972). The Foundations of Statistics. New York: Wiley.
Selten, R. (1998). Axiomatic characterization of the quadratic scoring rule. Experimental Economics, 1, 43-61.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423 and 623-656.
Sharma, B., & Mittal, D. (1975). New non-additive measures of entropy for discrete probability distributions. Journal of Mathematical Sciences (Delhi), 10, 28-40.
Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688.
Skov, R. B., & Sherman, S. J. (1986). Information-gathering processes: Diagnosticity, hypothesis-confirmation strategies, and perceived hypothesis confirmation. Journal of Experimental Social Psychology, 22, 93-121.
Slowiaczek, L. M., Klayman, J., Sherman, S. L., & Skov, R. B. (1992). Information selection and use in hypothesis testing: What is a good question, and what is a good answer? Memory & Cognition, 20, 392-405.
Sprenger, J. (2016). Foundations for a probabilistic theory of causal strength. See: http://philsci-archive.pitt.edu/11927/1/GradedCausation-v2.pdf.
Stringer, S., Borsboom, D., & Wagenmakers, E.-J. (2011). Bayesian inference for the information gain model. Behavior Research Methods, 43, 297-309.
Taneja, I. J., Pardo, L., Morales, D., & Menéndez, M. L. (1989). On generalized information and divergence measures and their applications: A brief review. Qüestiió, 13, 47-73.
Tentori, K., Crupi, V., Bonini, N., & Osherson, D. (2007). Comparison of confirmation measures. Cognition, 103, 107-119.
Tribus, M., & McIrvine, E. C. (1971). Energy and information. Scientific American, 225, 179-188.
Trope, Y., & Bassok, M. (1982). Confirmatory and diagnosing strategies in social information gathering. Journal of Personality and Social Psychology, 43, 22-34.
Trope, Y., & Bassok, M. (1983). Information-gathering strategies in hypothesis testing. Journal of Experimental Social Psychology, 19, 560-576.
Tsallis, C. (2002). Entropic nonextensivity: A possible measure of complexity. Chaos, Solitons, and Fractals, 13, 371-391.
Tsallis, C. (2004). What should a statistical mechanics satisfy to reflect nature? Physica D, 193, 3-34.
Tsallis, C. (2011). The nonadditive entropy Sq and its applications in physics and elsewhere: Some remarks. Entropy, 13, 1765-1804.
Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52, 479-487.
Tweney, R. D., Doherty, M., & Kleiter, G. D. (2010). The pseudodiagnosticity trap: Should participants consider alternative hypotheses? Thinking & Reasoning, 16, 332-345.
Vajda, I., & Zvárová, J. (2007). On generalized entropies, Bayesian decisions, and statistical diversity. Kybernetika, 43, 675-696.
van der Pyl, T. (1978). Propriétés de l'information d'ordre α et de type β. In Théorie de l'information: Développements récents et applications (pp. 161-171), Colloques Internationales du CNRS, 276. Paris: Centre National de la Recherche Scientifique.
Wang, G., & Jiang, M. (2005). Axiomatic characterization of nonlinear homomorphic means. Journal of Mathematical Analysis and Applications, 303, 350-363.
Wason, P. (1960). On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology, 12, 129-140.
Wason, P. (1966). Reasoning. In B. Foss (ed.), New Horizons in Psychology (pp. 135-151). Harmondsworth, UK: Penguin.
Wason, P. (1968). Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20, 273-281.
Wu, C., Meder, B., Filimon, F., & Nelson, J. D. (in press). Asking better questions: How presentation formats guide information search. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43, 1274-1297.
This is the Supplementary Materials file of:
Crupi, V., Nelson, J. D., Meder, B., Cevolani, G., & Tentori, K., Generalized information theory meets human cognition: Introducing a unified framework to model uncertainty and information search. Cognitive Science, 2018.
1. Generalized logarithm and exponential

Consider the Tsallis logarithm, $\ln_t(x) = \frac{1}{1-t}\left[x^{1-t} - 1\right]$, and note that $1 + (1-t)\ln_t(x) = x^{1-t}$, therefore $x = \left[1 + (1-t)\ln_t(x)\right]^{\frac{1}{1-t}}$. This shows that the generalized exponential $e_t^{x} = \left[1 + (1-t)x\right]^{\frac{1}{1-t}}$ just is the inverse function of $\ln_t(x)$.

In order to show that the ordinary natural logarithm is recovered from $\ln_t(x)$ ($x > 0$) in the limit for $t \to 1$, we posit $x = 1 - y$ and first consider $x \le 1$, so that $|-y| < 1$. Then we have:
\[ \lim_{t \to 1} \ln_t(x) = \lim_{t \to 1} \ln_t(1-y) = \lim_{t \to 1} \frac{1}{1-t}\left[(1-y)^{1-t} - 1\right] \]
By the binomial expansion of $(1-y)^{1-t}$:
\[ = \lim_{t \to 1} \frac{1}{1-t}\left\{-1 + \left[1 + (1-t)(-y) + \frac{(1-t)(1-t-1)(-y)^2}{2!} + \frac{(1-t)(1-t-1)(1-t-2)(-y)^3}{3!} + \cdots\right]\right\} \]
\[ = \lim_{t \to 1} \left\{(-y) + \frac{(-t)(-y)^2}{2!} + \frac{(-t)(-t-1)(-y)^3}{3!} + \cdots\right\} \]
\[ = \lim_{t \to 1} \left\{(-y) - \frac{t(-y)^2}{2!} + \frac{t(t+1)(-y)^3}{3!} - \cdots\right\} \]
\[ = (-y) - \frac{(-y)^2}{2!} + \frac{2!\,(-y)^3}{3!} - \cdots \]
\[ = (-y) - \frac{(-y)^2}{2} + \frac{(-y)^3}{3} - \cdots \]
which is the series expansion of $\ln(1-y) = \ln(x)$ (recall that $|-y| < 1$). For the case $x > 1$, one can posit $x = 1/(1-y)$, so that again $|-y| < 1$, and compute
\[ \lim_{t \to 1} \frac{1}{1-t}\left[\left(\frac{1}{1-y}\right)^{1-t} - 1\right] = \lim_{t \to 1} \left\{-\frac{1}{t-1}\left[(1-y)^{t-1} - 1\right]\right\} \]
thus getting the same result from a similar derivation.

Just like the natural logarithm, $\ln_t(x)$ is non-negative if $x \ge 1$, because if $t < 1$, then $x^{1-t} \ge x^{0} = 1$, therefore $\frac{1}{1-t}\left[x^{1-t} - 1\right] \ge 0$, while if $t > 1$, then $x^{1-t} \le x^{0} = 1$, therefore again $\frac{1}{1-t}\left[x^{1-t} - 1\right] \ge 0$. If $0 < x < 1$, $\ln_t(x)$ is negative instead, again like the natural logarithm.

To show that the ordinary exponential is recovered from $e_t(x)$ in the limit for $t \to 1$, we again rely on the binomial expansion, as follows:
\[ \lim_{t \to 1} e_t(x) = \lim_{t \to 1} \left[1 + (1-t)x\right]^{\frac{1}{1-t}} \]
\[ = \lim_{t \to 1} \left\{1 + \tfrac{1}{1-t}(1-t)x + \tfrac{1}{1-t}\left(\tfrac{1}{1-t} - 1\right)\frac{\left[(1-t)x\right]^2}{2!} + \tfrac{1}{1-t}\left(\tfrac{1}{1-t} - 1\right)\left(\tfrac{1}{1-t} - 2\right)\frac{\left[(1-t)x\right]^3}{3!} + \cdots\right\} \]
\[ = \lim_{t \to 1} \left\{1 + x + \tfrac{1}{1-t}\cdot\tfrac{t}{1-t}\cdot\frac{(1-t)^2 x^2}{2!} + \tfrac{1}{1-t}\cdot\tfrac{t}{1-t}\cdot\tfrac{2t-1}{1-t}\cdot\frac{(1-t)^3 x^3}{3!} + \cdots\right\} \]
\[ = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = 1 + \sum_{n=1}^{\infty} \frac{x^n}{n!} = e^{x} \]
Just like the ordinary exponential, $e_t(x) \ge 1$ if $x \ge 0$, because if $t < 1$, then one has $\left[1 + (1-t)x\right] \ge 1$ to a positive power $\frac{1}{1-t}$, while if $t > 1$, then one has $\left[1 + (1-t)x\right] \le 1$ to a negative power $\frac{1}{1-t}$. If $x < 0$, $e_t(x) < 1$ instead, again like the ordinary exponential.
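As a quick numerical check of these two facts (a minimal Python sketch added for illustration, not part of the original derivations; NumPy and the helper names ln_t and e_t are our own choices):

import numpy as np

def ln_t(x, t):
    # Tsallis logarithm: (x**(1-t) - 1)/(1-t), with the ordinary ln at t = 1
    return np.log(x) if t == 1 else (x**(1 - t) - 1) / (1 - t)

def e_t(x, t):
    # generalized exponential, the inverse of ln_t
    return np.exp(x) if t == 1 else (1 + (1 - t) * x)**(1 / (1 - t))

x = 2.7
for t in [0.0, 0.5, 0.999, 1.001, 2.0]:
    assert abs(e_t(ln_t(x, t), t) - x) < 1e-6   # e_t inverts ln_t
    print(t, ln_t(x, t))                        # tends to ln(2.7), about 0.9933, as t -> 1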
2. Sharma-Mittal entropies

First we will derive the Sharma-Mittal formula from its generalized mean form. We have $g(x) = \ln_r e_t(x)$ and $\mathrm{inf}(x) = \ln_t(x)$. Let us find $g^{-1}(x)$, by solving $y = g(x)$ for $x$, as follows.
\[ y = \ln_r e_t(x) = \ln_r\left[\left(1 + (1-t)x\right)^{\frac{1}{1-t}}\right] = \frac{1}{1-r}\left[\left(1 + (1-t)x\right)^{\frac{1-r}{1-t}} - 1\right] \]
Therefore:
\[ 1 + (1-r)y = \left[1 + (1-t)x\right]^{\frac{1-r}{1-t}} \]
\[ \left[1 + (1-r)y\right]^{\frac{1}{1-r}} = \left[1 + (1-t)x\right]^{\frac{1}{1-t}} \]
\[ e_r(y) = \left[1 + (1-t)x\right]^{\frac{1}{1-t}} \]
\[ \left[e_r(y)\right]^{1-t} = 1 + (1-t)x \]
\[ \frac{1}{1-t}\left\{\left[e_r(y)\right]^{1-t} - 1\right\} = x \]
\[ x = \ln_t e_r(y) \]
So $g^{-1}(x) = \ln_t e_r(x)$. Now we have all the elements to derive the Sharma-Mittal formula.
\[ ent_P^{SM(r,t)}(H) = g^{-1}\left[\sum_{h_i \in H} p_i\, g\!\left(\mathrm{inf}\!\left(\tfrac{1}{p_i}\right)\right)\right] \]
\[ = \ln_t e_r\left[\sum_{h_i \in H} p_i \ln_r e_t\!\left(\ln_t\!\left(\tfrac{1}{p_i}\right)\right)\right] \]
\[ = \ln_t e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] \]
\[ = \ln_t e_r\left\{\frac{1}{1-r}\sum_{h_i \in H} p_i\left[\left(\tfrac{1}{p_i}\right)^{1-r} - 1\right]\right\} \]
\[ = \ln_t\left\{\left[1 + (1-r)\,\frac{1}{1-r}\sum_{h_i \in H} p_i\left(p_i^{\,r-1} - 1\right)\right]^{\frac{1}{1-r}}\right\} \]
\[ = \ln_t\left[1 + \sum_{h_i \in H} p_i^{\,r} - \sum_{h_i \in H} p_i\right]^{\frac{1}{1-r}} \]
\[ = \ln_t\left[\sum_{h_i \in H} p_i^{\,r}\right]^{\frac{1}{1-r}} = \frac{1}{1-t}\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{1-t}{1-r}} - 1\right] = \frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] \]
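The equality of the generalized mean form and the closed form can be checked numerically; a minimal sketch of ours (assuming NumPy and $r, t \ne 1$; the function names are illustrative):

import numpy as np

def ln_t(x, t):
    return (x**(1 - t) - 1) / (1 - t)          # assuming t != 1 here

def e_t(x, t):
    return (1 + (1 - t) * x)**(1 / (1 - t))

def sm_mean_form(p, r, t):
    # ent(H) = ln_t e_r [ sum_i p_i * ln_r(1/p_i) ]
    return ln_t(e_t(np.sum(p * ln_t(1 / p, r)), r), t)

def sm_closed_form(p, r, t):
    # ent(H) = (1/(t-1)) * [1 - (sum_i p_i^r)^((t-1)/(r-1))]
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

p = np.array([0.5, 0.3, 0.2])
for r, t in [(0.5, 2.0), (2.0, 0.5), (3.0, 2.0)]:
    assert abs(sm_mean_form(p, r, t) - sm_closed_form(p, r, t)) < 1e-9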
Let us note that $ent^{SM(r,t)}$ satisfies the basic properties of entropy measures. As pointed out above, the Tsallis logarithm $\ln_t(x)$ is always non-negative if $x \ge 1$, therefore so is $\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)$. Moreover, $e_r(x) \ge 1$ if $x \ge 0$ (see above), so $e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] \ge 1$ and finally $\ln_t e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] \ge 0$. This proves that non-negativity holds for $ent^{SM(r,t)}$.

Let us then consider evenness sensitivity. We already know that $ent^{SM(r,t)}$ is non-negative; also, $\sum_{h_i \in H} p_i^{\,r} = 1$ in case $p_i = 1$ for some $i$, so that for such a degenerate distribution $D(H)$ one has $ent_D^{SM(r,t)}(H) = 0$. As a consequence, for any $H$ and $P(H)$, $ent_P^{SM(r,t)}(H) \ge ent_D^{SM(r,t)}(H) = 0$. In order to complete the proof of evenness sensitivity, we will now study the maximization of $ent^{SM(r,t)}$ by means of so-called Lagrange multipliers. We have to maximize $\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right) = \frac{1}{1-r}\left[\sum_{h_i \in H} p_i^{\,r} - 1\right]$, so we study $f(x_1, \ldots, x_n) = \frac{1}{1-r}\left[\sum_{1 \le i \le n} x_i^{\,r} - 1\right]$ under the constraint $\sum_{1 \le i \le n} x_i = 1$. By the Lagrange multipliers method, we get a system of $n+1$ equations as follows:
\[ \begin{cases} \frac{r}{1-r}\, x_1^{\,r-1} = \lambda \\ \qquad \vdots \\ \frac{r}{1-r}\, x_n^{\,r-1} = \lambda \\ x_1 + \cdots + x_n = 1 \end{cases} \]
where $x_1 = \cdots = x_n = 1/n$ is the only solution. This means that $\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)$ is either maximized or minimized by the uniform distribution $U(H)$. But actually $ent_U^{SM(r,t)}(H)$ must be a maximum, so that, for any $H$ and $P(H)$, $ent_U^{SM(r,t)}(H) \ge ent_P^{SM(r,t)}(H)$. In fact, $ent_U^{SM(r,t)}(H)$ is strictly positive, because under $U(H)$ one has $\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right) = \ln_r(n) > 0$ (recall that $n > 1$). Hence, for any $H$, $ent_U^{SM(r,t)}(H) > ent_D^{SM(r,t)}(H) = 0$, and evenness sensitivity is shown to hold.
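A brute-force spot check of the maximization result (an illustrative sketch of ours, assuming NumPy): randomly sampled distributions never exceed the uniform one.

import numpy as np

def sm(p, r, t):
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

rng = np.random.default_rng(0)
n, r, t = 4, 0.5, 2.0
at_uniform = sm(np.full(n, 1 / n), r, t)
for _ in range(1000):
    p = rng.dirichlet(np.ones(n))              # a random distribution over n elements
    assert sm(p, r, t) <= at_uniform + 1e-12   # the uniform U(H) maximizes the entropy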
3. Some special cases of the Sharma-Mittal family

Given the above analysis of generalized logarithms and exponentials, we have Rényi (1961) entropy as a special case of the Sharma-Mittal family as follows:
\[ ent_P^{SM(r,1)}(H) = \ln\left\{e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right]\right\} \]
\[ = \ln\left\{\left[1 + (1-r)\,\frac{1}{1-r}\left(\sum_{h_i \in H} p_i^{\,r} - 1\right)\right]^{\frac{1}{1-r}}\right\} \]
\[ = \ln\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{1}{1-r}}\right] = \frac{1}{1-r}\ln\left(\sum_{h_i \in H} p_i^{\,r}\right) = ent_P^{R\acute{e}nyi(r)}(H) \]

For Shannon entropy, in particular, one only needs to note that
\[ ent_P^{SM(1,1)}(H) = \ln\left\{e\left[\sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right)\right]\right\} = \sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right). \]

For Tsallis (1988) entropy, we have:
\[ ent_P^{SM(t,t)}(H) = \ln_t e_t\left[\sum_{h_i \in H} p_i \ln_t\!\left(\tfrac{1}{p_i}\right)\right] = \sum_{h_i \in H} p_i \ln_t\!\left(\tfrac{1}{p_i}\right) = \frac{1}{t-1}\left[1 - \sum_{h_i \in H} p_i^{\,t}\right] = ent_P^{Tsallis(t)}(H) \]

For another generalization of Shannon entropy, i.e., Gaussian entropy (Frank, 2004), we have:
\[ ent_P^{SM(1,t)}(H) = \ln_t e\left[\sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right)\right] = \frac{1}{1-t}\left[e^{(1-t)\sum_{h_i \in H} p_i \ln(1/p_i)} - 1\right] = ent_P^{Gauss(t)}(H) \]
The way in which $ent^{Gauss(t)}$ recovers Shannon entropy for $t = 1$ again follows from the behavior of the generalized logarithm, because $ent_P^{Gauss(1)}(H) = \ln\left\{e\left[\sum_{h_i \in H} p_i \ln(1/p_i)\right]\right\} = \sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right)$.
For Power entropies, $ent_P^{SM(r,2)}(H) = ent_P^{Power(r)}(H)$ follows immediately from $ent_P^{SM(r,t)}(H) = \frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right]$, and the same holds for Quadratic entropy, i.e., $ent_P^{SM(2,2)}(H) = ent_P^{Quad}(H)$.

If we posit $t = 2 - 1/r$, we have
\[ ent_P^{SM(r,\,2-\frac{1}{r})}(H) = \frac{r}{r-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^{\,r}\right)^{\frac{1}{r}}\right], \]
which happens to be precisely Arimoto's (1971) entropy, under an inconsequential change of parametrization (Arimoto, 1971, used a parameter $b$ to be set to $1/r$ in our notation).
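These special cases can be verified numerically; a sketch of ours (assuming NumPy; limits in $r$ or $t$ are approximated by values close to 1):

import numpy as np

def sm(p, r, t):
    # Sharma-Mittal closed form, assuming r, t != 1
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

p = np.array([0.5, 0.25, 0.25])

# Rényi(r) entropy: degree t -> 1
r = 3.0
assert abs(sm(p, r, 1 + 1e-9) - np.log(np.sum(p**r)) / (1 - r)) < 1e-5

# Tsallis(t) entropy: r = t
t = 3.0
assert abs(sm(p, t, t) - (1 - np.sum(p**t)) / (t - 1)) < 1e-12

# Gaussian entropy: order r -> 1
shannon = np.sum(p * np.log(1 / p))
gauss = (np.exp((1 - t) * shannon) - 1) / (1 - t)
assert abs(sm(p, 1 + 1e-9, t) - gauss) < 1e-5

# Quadratic entropy: r = t = 2
assert abs(sm(p, 2.0, 2.0) - (1 - np.sum(p**2))) < 1e-12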
For Effective Number measures (Hill, 1973), we have:
\[ ent_P^{SM(r,0)}(H) = \frac{1}{-1}\left[1 - \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{-1}{r-1}}\right] = \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{1}{1-r}} - 1 = ent_P^{EN(r)}(H) \]

As a further point concerning Effective Numbers, consider a Sharma-Mittal measure $ent_P^{SM(r,t)}(H) = \ln_t e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right]$, for any choice of $r$ and $t$ (both non-negative). We ask what is the number $N$ of equiprobable elements in a partition $K$ such that $ent_P^{SM(r,t)}(H) = ent_U^{SM(r,t)}(K)$. We note that $ent_U^{SM(r,t)}(K) = \ln_t e_r\left\{\ln_r(N)\right\} = \ln_t(N)$, thus we posit:
\[ ent_P^{SM(r,t)}(H) = \ln_t e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] = \ln_t(N) \]
\[ N = e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] = \left[1 + (1-r)\,\frac{1}{1-r}\left(\sum_{h_i \in H} p_i^{\,r} - 1\right)\right]^{\frac{1}{1-r}} = \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{1}{1-r}} = ent_P^{EN(r)}(H) + 1 \]
This shows that, regardless of the degree parameter $t$, for any Sharma-Mittal measure of a specified order $r$, $ent_P^{EN(r)}(H) + 1$ computes the theoretical number $N$ of equally probable elements that would be just as entropic as $H$ under that measure and given $P(H)$.
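A short numerical illustration of this t-independence (our own sketch, assuming NumPy):

import numpy as np

def ln_t(x, t):
    return np.log(x) if t == 1 else (x**(1 - t) - 1) / (1 - t)

def sm(p, r, t):
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

p = np.array([0.6, 0.2, 0.1, 0.1])
r = 3.0
N = np.sum(p**r)**(1 / (1 - r))           # effective number: ent_EN(r)(H) + 1
for t in [0.0, 0.5, 2.0, 3.0]:            # the same N works for every degree t
    assert abs(sm(p, r, t) - ln_t(N, t)) < 1e-12
print(N)                                  # H is as entropic as N equiprobable elements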
The derivation of the form of $ent^{SM(0,t)}$ is as follows:
\[ ent_P^{SM(0,t)}(H) = \frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} p_i^{\,0}\right)^{\frac{t-1}{0-1}}\right] = \frac{1}{1-t}\left[(n^{+})^{1-t} - 1\right] = \ln_t(n^{+}) \]
where $n^{+}$ is the number of elements in $H$ with a non-null probability according to $P(H)$ (recall that we apply the convention $0^{0} = 0$, common in the entropy literature). Hartley entropy, $ent_P^{Hartley}(H) = \ln(n^{+})$, immediately follows as a special case for $t = 1$, just as Origin entropy, $ent_P^{Origin}(H) = n^{+} - 1$, for $t = 0$. $t = 2$ yields $ent_P^{SM(0,2)}(H) = -\left[(n^{+})^{-1} - 1\right] = \frac{n^{+} - 1}{n^{+}}$.
For the case of infinite order, we posit $p^{*} = \max_{h_i \in H}(p_i)$ and note the following ($n$ is again the overall size of $H$):
\[ (p^{*})^{r} \le \sum_{h_i \in H} p_i^{\,r} \le n\,(p^{*})^{r} \]
Assuming $\frac{t-1}{r-1} \ge 0$ involves no loss of generality in what follows:
\[ \left((p^{*})^{r}\right)^{\frac{t-1}{r-1}} \le \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}} \le \left(n\,(p^{*})^{r}\right)^{\frac{t-1}{r-1}} \]
\[ \ln\left[\left((p^{*})^{r}\right)^{\frac{t-1}{r-1}}\right] \le \ln\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] \le \ln\left[n^{\frac{t-1}{r-1}}\left((p^{*})^{r}\right)^{\frac{t-1}{r-1}}\right] \]
\[ \ln\left[\left((p^{*})^{r}\right)^{\frac{t-1}{r-1}}\right] \le \ln\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] \le \ln\left(n^{\frac{t-1}{r-1}}\right) + \ln\left[\left((p^{*})^{r}\right)^{\frac{t-1}{r-1}}\right] \]
\[ \lim_{r \to \infty} \ln\left[\left((p^{*})^{r}\right)^{\frac{t-1}{r-1}}\right] \le \lim_{r \to \infty} \ln\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] \le \lim_{r \to \infty} \frac{t-1}{r-1}\ln(n) + \lim_{r \to \infty} \ln\left[\left((p^{*})^{r}\right)^{\frac{t-1}{r-1}}\right] \]
where $\lim_{r \to \infty} \frac{t-1}{r-1}\ln(n) = 0$. Therefore:
\[ \lim_{r \to \infty} \ln\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] = \lim_{r \to \infty} \ln\left[\left((p^{*})^{r}\right)^{\frac{t-1}{r-1}}\right] = \lim_{r \to \infty} \ln\left[\left((p^{*})^{\frac{r}{r-1}}\right)^{t-1}\right] \]
The limit for $r \to \infty$ of the argument of the $\ln$ function exists and is finite: $\lim_{r \to \infty} \left((p^{*})^{\frac{r}{r-1}}\right)^{t-1} = (p^{*})^{t-1}$. For this reason, we can conclude:
\[ \ln\left[\lim_{r \to \infty} \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] = \ln\left[\lim_{r \to \infty} \left((p^{*})^{\frac{r}{r-1}}\right)^{t-1}\right] \]
\[ \lim_{r \to \infty} \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}} = (p^{*})^{t-1} \]
\[ \frac{1}{1-t}\left[\lim_{r \to \infty} \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}} - 1\right] = \frac{1}{1-t}\left[(p^{*})^{t-1} - 1\right] \]
\[ \lim_{r \to \infty} \left\{\frac{1}{1-t}\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}} - 1\right]\right\} = \frac{1}{1-t}\left[\left(\tfrac{1}{p^{*}}\right)^{1-t} - 1\right] \]
\[ \lim_{r \to \infty} ent_P^{SM(r,t)}(H) = \ln_t\!\left(\tfrac{1}{p^{*}}\right) \]
Error entropy is a special case for $t = 2$, because $\ln_2\!\left(\tfrac{1}{p^{*}}\right) = -\left[\left(\tfrac{1}{p^{*}}\right)^{-1} - 1\right] = 1 - p^{*} = 1 - \max_{h_i \in H}(p_i)$. $t = 1$ yields $\ln\left(\frac{1}{\max_{h_i \in H}(p_i)}\right)$, while, for $t = 0$, one immediately has $\frac{1}{\max_{h_i \in H}(p_i)} - 1$.
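The limiting behavior is easy to see numerically (our own sketch, assuming NumPy); already at $r = 200$ the Sharma-Mittal value is close to $\ln_t(1/p^{*})$:

import numpy as np

def ln_t(x, t):
    return np.log(x) if t == 1 else (x**(1 - t) - 1) / (1 - t)

def sm(p, r, t):
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

p = np.array([0.5, 0.3, 0.2])
for t in [0.5, 2.0, 3.0]:
    print(sm(p, 200.0, t), ln_t(1 / p.max(), t))   # e.g. for t = 2: both near 1 - 0.5 = 0.5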
4. Ordinal equivalence, additivity, and concavity

The ordinal equivalence of any pair of Sharma-Mittal measures $ent^{SM(r,t)}$ and $ent^{SM(r,t^{*})}$ with the same order $r$ and different degrees, $t$ and $t^{*}$, is easily proven on the basis of the inverse relationship of $\ln_t(x)$ and $e_t(x)$. In fact, for any $r$, $t$, $t^{*}$, and any $H$ and $P(H)$, $ent^{SM(r,t)}$ is a strictly increasing function of $ent^{SM(r,t^{*})}$:
\[ ent_P^{SM(r,t)}(H) = \ln_t e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] = \ln_t e_{t^{*}}\left\{\ln_{t^{*}} e_r\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right]\right\} = \ln_t e_{t^{*}}\left[ent_P^{SM(r,t^{*})}(H)\right] \]
For degrees $t, t^{*} \ne 1$, this implies that:
\[ ent_P^{SM(r,t)}(H) = \frac{1}{1-t}\left\{\left[1 + (1-t^{*})\,ent_P^{SM(r,t^{*})}(H)\right]^{\frac{1-t}{1-t^{*}}} - 1\right\} \]
whereas when $t = 1$ and/or $t^{*} = 1$, the limiting cases of the ordinary exponential and/or natural logarithm apply. This general result is novel in the literature to the best of our knowledge. However, a well-known special case is the relationship between Rényi entropies and the Effective Number measures (see Hill, 1973, p. 428, and Ricotta, 2003, p. 191):
\[ ent_P^{R\acute{e}nyi(r)}(H) = ent_P^{SM(r,1)}(H) = \ln\left\{e_0\left[ent_P^{SM(r,0)}(H)\right]\right\} = \ln\left[1 + ent_P^{SM(r,0)}(H)\right] = \ln\left[ent_P^{EN(r)}(H) + 1\right] \]
Another neat illustration involves Power entropy measures and Rényi entropies:
\[ ent_P^{Power(r)}(H) = ent_P^{SM(r,2)}(H) = \ln_2\left\{e\left[ent_P^{SM(r,1)}(H)\right]\right\} = 1 - e^{-ent_P^{R\acute{e}nyi(r)}(H)} \]
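The increasing transformation linking two degrees can be checked directly; a sketch of ours (assuming NumPy and $t, t^{*} \ne 1$):

import numpy as np

def sm(p, r, t):
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

p = np.array([0.4, 0.35, 0.25])
r, t, t_star = 3.0, 0.5, 2.0
lhs = sm(p, r, t)
# ent_SM(r,t) as a strictly increasing transform of ent_SM(r,t*):
rhs = ((1 + (1 - t_star) * sm(p, r, t_star))**((1 - t) / (1 - t_star)) - 1) / (1 - t)
assert abs(lhs - rhs) < 1e-12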
We will now derive the general additivity rule for Sharma-Mittal entropies concerning independent variables, i.e., when $X \perp_P Y$ holds. To simplify notation, below we will use $\Sigma(X)$ as a shorthand for $\left(\sum_{x_i \in X} P(x_i)^{r}\right)^{\frac{t-1}{r-1}}$ (the same for $Y$, and so on) and we will use $ent(X)$ as a shorthand for $ent_P^{SM(r,t)}(X)$ (the same for the expected reduction of entropy, $R$).
\[ ent(X) + ent(Y) - (t-1)\,ent(X)\,ent(Y) \]
\[ = \tfrac{1}{t-1}\left[1 - \Sigma(X)\right] + \tfrac{1}{t-1}\left[1 - \Sigma(Y)\right] - (t-1)\,\tfrac{1}{t-1}\left[1 - \Sigma(X)\right]\tfrac{1}{t-1}\left[1 - \Sigma(Y)\right] \]
\[ = \tfrac{1}{t-1}\left[1 - \Sigma(X)\right] + \tfrac{1}{t-1}\left[1 - \Sigma(Y)\right] - \tfrac{1}{t-1}\left[1 - \Sigma(X)\right]\left[1 - \Sigma(Y)\right] \]
\[ = \tfrac{1}{t-1} - \tfrac{1}{t-1}\Sigma(X) + \tfrac{1}{t-1} - \tfrac{1}{t-1}\Sigma(Y) - \tfrac{1}{t-1} + \tfrac{1}{t-1}\Sigma(X) + \tfrac{1}{t-1}\Sigma(Y) - \tfrac{1}{t-1}\Sigma(X)\Sigma(Y) \]
\[ = \tfrac{1}{t-1} - \tfrac{1}{t-1}\Sigma(X)\Sigma(Y) \]
\[ = \tfrac{1}{t-1}\left[1 - \left(\sum_{x_i \in X} P(x_i)^{r}\right)^{\frac{t-1}{r-1}}\left(\sum_{y_j \in Y} P(y_j)^{r}\right)^{\frac{t-1}{r-1}}\right] \]
\[ = \tfrac{1}{t-1}\left[1 - \left(\sum_{x_i \in X} P(x_i)^{r} \sum_{y_j \in Y} P(y_j)^{r}\right)^{\frac{t-1}{r-1}}\right] \]
\[ = \tfrac{1}{t-1}\left[1 - \left(\sum_{x_i \in X}\sum_{y_j \in Y} \left(P(x_i)P(y_j)\right)^{r}\right)^{\frac{t-1}{r-1}}\right] \]
\[ = \tfrac{1}{t-1}\left[1 - \left(\sum_{x_i \in X}\sum_{y_j \in Y} P(x_i \cap y_j)^{r}\right)^{\frac{t-1}{r-1}}\right] = ent(X \times Y) \]
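The additivity rule can again be verified numerically for an independent joint distribution (our own sketch, assuming NumPy):

import numpy as np

def sm(p, r, t):
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

r, t = 2.0, 3.0
x = np.array([0.7, 0.3])                 # P(X)
y = np.array([0.5, 0.3, 0.2])            # P(Y), with X and Y independent
joint = np.outer(x, y).ravel()           # P(X x Y) under independence
lhs = sm(joint, r, t)
rhs = sm(x, r, t) + sm(y, r, t) - (t - 1) * sm(x, r, t) * sm(y, r, t)
assert abs(lhs - rhs) < 1e-12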
This additivity rule in turn governs the relationship between the expected entropy reduction of a test in case it is a perfect (conclusive) experiment and in case it is not. More precisely, it implies that for independent variables $E$ and $H$:
\[ R(E, E) - R(H \times E, E) = (t-1)\,ent(H)\,ent(E) \]
In fact:
\[ R(E, E) - R(H \times E, E) = ent(E) - \sum_{e_j \in E}\left[ent(E|e_j)\right]P(e_j) - ent(H \times E) + \sum_{e_j \in E}\left[ent(H \times E|e_j)\right]P(e_j) \]
\[ = ent(E) - 0 - ent(H) - ent(E) + (t-1)\,ent(H)\,ent(E) + \sum_{e_j \in E}\left[ent(H \times E|e_j)\right]P(e_j) \]
\[ = -ent(H) + (t-1)\,ent(H)\,ent(E) + \sum_{e_j \in E}\left[ent(H|e_j) + ent(E|e_j) - (t-1)\,ent(H|e_j)\,ent(E|e_j)\right]P(e_j) \]
\[ = -ent(H) + (t-1)\,ent(H)\,ent(E) + \sum_{e_j \in E}\left[ent(H|e_j) + 0 - (t-1)\left(ent(H|e_j) \times 0\right)\right]P(e_j) \]
\[ = -ent(H) + (t-1)\,ent(H)\,ent(E) + ent(H) = (t-1)\,ent(H)\,ent(E) \]
Sharma-Mittal measures of expected entropy reduction are also generally additive for a combination of experiments, that is, for any $H$, $E$, $F$ and $P(H, E, F)$, it holds that $R(H, E \times F) = R(H, E) + R(H, F|E)$. To see this, let us first consider the entropy reduction of a specific datum $e$, $\Delta ent(H, e) = ent(H) - ent(H|e)$. $\Delta ent$ is clearly additive in the following way:
\[ \Delta ent(H, e \cap f) = ent(H) - ent(H|e \cap f) = ent(H) - ent(H|e) + ent(H|e) - ent(H|e \cap f) = \Delta ent(H, e) + \Delta ent(H, f|e) \]
But this pattern carries over to the expected value $R(H, E \times F)$:
\[ R(H, E \times F) = \sum_{e_j \in E}\sum_{f_k \in F}\left[\Delta ent(H, e_j \cap f_k)\right]P(e_j \cap f_k) \]
\[ = \sum_{e_j \in E}\sum_{f_k \in F}\left[\Delta ent(H, e_j) + \Delta ent(H, f_k|e_j)\right]P(f_k|e_j)P(e_j) \]
\[ = \sum_{e_j \in E}\sum_{f_k \in F}\left[\Delta ent(H, e_j)\right]P(f_k|e_j)P(e_j) + \sum_{e_j \in E}\sum_{f_k \in F}\left[\Delta ent(H, f_k|e_j)\right]P(f_k|e_j)P(e_j) \]
\[ = \sum_{e_j \in E}\left[\Delta ent(H, e_j)\right]P(e_j) + \sum_{e_j \in E}\left(\sum_{f_k \in F}\left[\Delta ent(H, f_k|e_j)\right]P(f_k|e_j)\right)P(e_j) \]
\[ = R(H, E) + \sum_{e_j \in E}\left[R(H, F|e_j)\right]P(e_j) = R(H, E) + R(H, F|E) \]
This result is novel in the literature to the best of our knowledge.
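Since the derivation relies only on the telescoping of $\Delta ent$, the chain rule holds for arbitrary joint distributions, not only independent ones; a numerical sketch of ours (assuming NumPy):

import numpy as np

def sm(p, r, t):
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

def R(joint, r, t):
    # expected entropy reduction R(H, D) for a joint array of shape (|H|, |D|)
    pH, pD = joint.sum(axis=1), joint.sum(axis=0)
    cond = sum(pD[j] * sm(joint[:, j] / pD[j], r, t) for j in range(len(pD)))
    return sm(pH, r, t) - cond

rng = np.random.default_rng(1)
r, t = 0.5, 2.0
P = rng.dirichlet(np.ones(12)).reshape(2, 3, 2)   # an arbitrary joint P(H, E, F)
pE = P.sum(axis=(0, 2))
R_HxEF = R(P.reshape(2, 6), r, t)                 # E x F as one compound experiment
R_HE = R(P.sum(axis=2), r, t)                     # R(H, E)
R_HF_given_E = sum(pE[j] * R(P[:, j, :] / pE[j], r, t) for j in range(3))
assert abs(R_HxEF - (R_HE + R_HF_given_E)) < 1e-9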
Finally, we will show that, for any $H$, $E$, and $P(H, E)$, $R(H, E) \ge 0$ if and only if $ent$ is concave. Let $E_P(v)$ be the expected value of a variable $v$ for some probability distribution $P = \{p_1, \ldots, p_m\}$, i.e., $E_P(v) = \sum_{i=1}^{m} v_i p_i$. According to a multivariate version of Jensen's inequality, $g(x_1, \ldots, x_n)$ is a concave function if and only if $g$ of the expected values of its arguments is greater than (or equal to) the expected value of $g$, that is:
\[ g\left[E_P(x_1), \ldots, E_P(x_n)\right] \ge E_P\left[g(x_1, \ldots, x_n)\right] \]
Now we set $g(x_1, \ldots, x_n) = ent(H|e)$ and we posit that $E_P$ be computed on the basis of $P(E)$, i.e., $E_P(v) = \sum_{e_j \in E} v_j P(e_j)$. Assuming that $ent$ is concave, we have:
\[ \frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H}\left(\sum_{e_j \in E} P(h_i|e_j)P(e_j)\right)^{r}\right)^{\frac{t-1}{r-1}}\right] \ge \sum_{e_j \in E}\left\{\frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i|e_j)^{r}\right)^{\frac{t-1}{r-1}}\right]\right\}P(e_j) \]
Since $\sum_{e_j \in E} P(h_i|e_j)P(e_j) = P(h_i)$, this amounts to:
\[ \frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^{r}\right)^{\frac{t-1}{r-1}}\right] - \sum_{e_j \in E}\left\{\frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i|e_j)^{r}\right)^{\frac{t-1}{r-1}}\right]\right\}P(e_j) \ge 0 \]
\[ \sum_{e_j \in E}\left\{\frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^{r}\right)^{\frac{t-1}{r-1}}\right] - \frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i|e_j)^{r}\right)^{\frac{t-1}{r-1}}\right]\right\}P(e_j) \ge 0 \]
\[ \sum_{e_j \in E} \Delta ent(H, e_j)\,P(e_j) \ge 0 \]
\[ R(H, E) \ge 0 \]
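For a concave member of the family, the non-negativity of $R$ can be spot-checked numerically; our own sketch uses quadratic entropy ($r = t = 2$), a concave case, with random joint distributions (assuming NumPy):

import numpy as np

def sm(p, r, t):
    return (1 - np.sum(p**r)**((t - 1) / (r - 1))) / (t - 1)

rng = np.random.default_rng(2)
r = t = 2.0                                   # quadratic entropy: 1 - sum p_i^2, concave
for _ in range(1000):
    P = rng.dirichlet(np.ones(6)).reshape(2, 3)      # a random joint P(H, E)
    pE = P.sum(axis=0)
    R = sm(P.sum(axis=1), r, t) - sum(
        pE[j] * sm(P[:, j] / pE[j], r, t) for j in range(3))
    assert R >= -1e-12                        # expected entropy reduction never negative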
5. Expected entropy reduction in the Person Game

To analyze the expected entropy reduction of one binary query in the person game, we will posit $H = \{h_1, \ldots, h_n\}$ (the set of possible guesses as to who the randomly selected character is) and $E = \{e, \bar{e}\}$ (the yes/no answers to a question such as "does the selected character have blue eyes?"; recall that "$\bar{e}$" denotes the complement or the negation of $e$). The joint probability distribution $P(H, E)$ is defined as follows: $P(h_i \cap e) = 1/n$ in case $i \le k$ (with $1 \le k < n$) and $P(h_i \cap e) = 0$ otherwise; $P(h_i \cap \bar{e}) = 0$ in case $i \le k$ and $P(h_i \cap \bar{e}) = 1/n$ otherwise. This implies that $P(h_i) = 1/n$ for each $i$ (all guesses are initially equiprobable), $P(h_i|e) = 1/k$ for each $i \le k$ (the posterior given $e$ is a uniform distribution over $k$ elements of $H$), and $P(h_i|\bar{e}) = 1/(n-k)$ for each $i > k$ (the posterior given $\bar{e}$ is a uniform distribution over $n-k$ elements of $H$). Moreover, $P(e) = k/n$. Given the general fact that $ent_U^{SM(r,t)}(H) = \ln_t(n)$, we have:
\[ R_P^{SM(r,t)}(H, E) = \left[\ln_t(n) - \ln_t(k)\right]P(e) + \left[\ln_t(n) - \ln_t(n-k)\right]P(\bar{e}) \]
Algebraic manipulations yield:
\[ R_P^{SM(r,t)}(H, E) = \frac{n^{1-t}}{1-t}\left[1 - \left(P(e)^{2-t} + P(\bar{e})^{2-t}\right)\right] \]
In the special case $t = 2$, one then has $R_P^{SM(r,2)}(H, E) = \frac{1}{n}$, so that the expected usefulness of query $E$ is constant, regardless of the value of $P(e)$. More generally, however, the first derivative of $R_P^{SM(r,t)}(H, E)$ with respect to $P(e)$ is
\[ \frac{n^{1-t}}{1-t}\left[(2-t)P(\bar{e})^{1-t} - (2-t)P(e)^{1-t}\right] \]
which equals zero for $P(e) = P(\bar{e})$, so that $R_P^{SM(r,t)}(H, E)$ has a maximum or a minimum for $P(e) = \tfrac{1}{2}$. The second derivative, in turn, is:
\[ n^{1-t}(t-2)\left[P(e)^{-t} + P(\bar{e})^{-t}\right] \]
which is strictly positive [negative] in case $t$ is strictly higher [lower] than 2. So, in the person game, $R_P^{SM(r,t)}(H, E)$ is a strictly concave function of $P(e)$ when $t < 2$, and $P(e) = \tfrac{1}{2}$ is then a maximum. When $t > 2$, on the contrary, $R_P^{SM(r,t)}(H, E)$ is a strictly convex function of $P(e)$, and $P(e) = \tfrac{1}{2}$ is then a minimum. This general result is novel in the literature to the best of our knowledge.
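A final numerical sketch of ours for the person game (assuming NumPy), checking the closed form against the direct computation and the flat profile at $t = 2$:

import numpy as np

def ln_t(x, t):
    return np.log(x) if t == 1 else (x**(1 - t) - 1) / (1 - t)

def person_game_R(n, k, t):
    # direct form: [ln_t(n) - ln_t(k)] P(e) + [ln_t(n) - ln_t(n-k)] P(not-e)
    pe = k / n
    return (ln_t(n, t) - ln_t(k, t)) * pe + (ln_t(n, t) - ln_t(n - k, t)) * (1 - pe)

n = 8
for t in [0.5, 1.5, 3.0]:
    for k in range(1, n):
        pe = k / n
        closed = n**(1 - t) / (1 - t) * (1 - (pe**(2 - t) + (1 - pe)**(2 - t)))
        assert abs(person_game_R(n, k, t) - closed) < 1e-12
print([round(person_game_R(n, k, 2.0), 6) for k in range(1, n)])   # constant 1/n = 0.125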