Generalized Information Theory Meets Human Cognition: Introducing a Unified Framework to Model Uncertainty and Information Search

Vincenzo Crupi 1, Jonathan D. Nelson 2,3, Björn Meder 3, Gustavo Cevolani 4, and Katya Tentori 5

1 Center for Logic, Language, and Cognition, Department of Philosophy and Education, University of Turin, Italy
2 School of Psychology, University of Surrey, Guildford, UK
3 Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany
4 IMT School for Advanced Studies, Lucca, Italy
5 Center for Mind/Brain Sciences, University of Trento, Italy

Author Note

Correspondence concerning this article should be addressed to Vincenzo Crupi, Department of Philosophy and Education, University of Turin, via Sant'Ottavio 20, 10124, Torino (Italy), [email protected]. This research was supported by grants CR 409/1-2, NE 1713/1-2, and ME 3717/2-2 from the Deutsche Forschungsgemeinschaft as part of the priority program New Frameworks of Rationality (SPP 1516). We thank Nick Chater, Laura Martignon, Andrea Passerini, and Paul Pedersen for helpful comments and exchanges.



Abstract

Searching for information is critical in many situations. In medicine, for instance, careful choice of a diagnostic test can help narrow down the range of plausible diseases that the patient might have. In a probabilistic framework, test selection is often modeled by assuming that people's goal is to reduce uncertainty about possible states of the world. In cognitive science, psychology, and medical decision making, Shannon entropy is the most prominent and most widely used model to formalize probabilistic uncertainty and the reduction thereof. However, a variety of alternative entropy metrics (Hartley, Quadratic, Tsallis, Rényi, and more) are popular in the social and the natural sciences, computer science, and philosophy of science. Particular entropy measures have been predominant in particular research areas, and it is often an open issue whether these divergences emerge from different theoretical and practical goals or are merely due to historical accident. Cutting across disciplinary boundaries, we show that several entropy and entropy reduction measures arise as special cases in a unified formalism, the Sharma-Mittal framework. Using mathematical results, computer simulations, and analyses of published behavioral data, we discuss four key questions: How do various entropy models relate to each other? What insights can be obtained by considering diverse entropy models within a unified framework? What is the psychological plausibility of different entropy models? What new questions and insights for research on human information acquisition follow? Our work provides several new pathways for theoretical and empirical research, reconciling apparently conflicting approaches and empirical findings within a comprehensive and unified information-theoretic formalism.

KEYWORDS: Entropy, Uncertainty, Value of information, Information search, Probabilistic models


Generalized Information Theory Meets Human Cognition: Introducing a Unified Framework to Model Uncertainty and Information Search

1. Introduction

A key topic in the study of rationality, cognition, and behavior is the effective search for relevant information or evidence. Information search is also closely connected to the notion of uncertainty. Typically, an agent will seek to acquire information to reduce uncertainty about an inference or decision problem. Physicians prescribe medical tests in order to handle arrays of possible diagnoses. Detectives seek witnesses in order to identify the culprit of a crime. And, of course, scientists gather data in order to discriminate among different hypotheses.

In psychology and cognitive science, most early work on information acquisition adopted a logical, deductive inference perspective. In the spirit of Popper's (1959) influential falsificationist philosophy of science, the idea was that learners should seek information that could help them falsify hypotheses (e.g., expressed as a conditional or a rule; Wason, 1960, 1966, 1968). However, many human reasoners did not seem to believe that information is useful if and only if it can potentially rule out (falsify) a hypothesis. From the 1980s, cognitive scientists started analyzing human information search with a closer look at inductive inference, using probabilistic models to quantify the value of information and endorsing them as normative benchmarks (e.g., Baron, 1985; Klayman & Ha, 1987; Skov & Sherman, 1986; Slowiaczek, Klayman, Sherman, & Skov, 1992; Trope & Bassok, 1982, 1983). This research was inspired by seminal work in philosophy of science (e.g., Good, 1950), statistics (e.g., Lindley, 1956), and decision theory (Savage, 1972). In this view, each outcome of a query could modify an agent's beliefs about the hypotheses being considered, thus providing some amount of information. For instance, the key theoretical point of Oaksford and Chater's (1994, 2003) analysis of Wason's selection task was to conceptualize information acquisition as a piece of probabilistic inductive reasoning, assuming that people's goal is to reduce uncertainty about whether a rule holds or not. In a similar vein, researchers in vision science have used measures of uncertainty reduction to predict visual queries for gathering


information (i.e., eye movements; Legge, Klitz, & Tjan, 1997; Najemnik & Geisler, 2005, 2009; Nelson & Cottrell, 2007; Renninger, Coughlan, Verghese, & Malik, 2005), or to guide a robot's eye movements (Denzler & Brown, 2002). Probabilistic models of uncertainty reduction have also been used to predict human query selection in causal reasoning (Bramley, Lagnado, & Speekenbrink, 2015), hypothesis testing (Austerweil & Griffiths, 2011; Navarro & Perfors, 2011; Nelson, Divjak, Gudmundsdottir, Martignon, & Meder, 2014; Nelson, Tenenbaum, & Movellan, 2001), and categorization (Meder & Nelson, 2012; Nelson, McKenzie, Cottrell, & Sejnowski, 2010).

If reducing uncertainty is a major cognitive goal and motivation for information acquisition, a critical issue is how uncertainty and the reduction thereof can be represented in a rigorous manner. A fruitful approach to formalize uncertainty is using the mathematical notion of entropy, which in turn generates a corresponding model of the informational utility of an experiment as the expected reduction of entropy (uncertainty), sometimes called expected information gain.

In many disciplines, including psychology and neuroscience (Hasson, 2016), the most prominent model is Shannon (1948) entropy. However, a number of non-equivalent measures of entropy have been suggested, and are being used, in a variety of research domains. Examples include the application of Quadratic entropy in ecology (Lande, 1996), the family of Rényi (1961) entropies in computer science and image processing (Boztas, 2014; Sahoo & Arora, 2004), and Tsallis entropies in physics (Tsallis, 2011). It is currently unknown whether these other entropy models would have potential to address key theoretical and empirical questions in cognitive science. Here, we bring together these different models in a comprehensive theoretical framework, the Sharma-Mittal formalism (from Sharma & Mittal, 1975), which incorporates a large number of prominent entropy measures as special cases. Careful consideration of the formal properties of this family of entropy measures will reveal important implications for modeling uncertainty and information search behavior. Against this rich theoretical background, we will draw on existing behavioral data and novel simulations to explore how different models relate to each other, elucidate their psychological meaning and plausibility, and show how they can generate new testable predictions.


The remainder of this paper is organized as follows. We begin by spelling out what an entropy measure is and how it can be employed to represent uncertainty and the informational value of queries (questions, tests, experiments) (section 2.). Subsequently, we review four representative and influential definitions of entropy, namely Quadratic, Hartley, Shannon, and Error entropy (3.). These models have been, and continue to be, of importance in different areas of research. In the main theoretical section of the paper, we describe a unified formal framework generating a biparametric continuum of entropy measures. Drawing on work in generalized information theory, we show that many extant models of entropy and expected entropy reduction can be embedded in this comprehensive formalism (4.). We provide a number of new mathematical results in this section. We also address the theoretical meaning of the parameters involved when the target domain of application is human reasoning, with implications for both normative and descriptive approaches. We then further elaborate on the connection with experimental research in several ways. First, we present simulation results from an extensive exploration of information search decision problems in which alternative models provide strongly diverging, empirically testable predictions (5.). Second, we report and discuss an overarching analysis of the information-theoretic account of the most widely known experimental paradigm for the study of information gathering, i.e., Wason's (1966, 1968) abstract selection task (6.1.). Then we investigate which models perform better against data from a range of experience-based studies on human information search behavior (Meder & Nelson, 2012; Nelson et al., 2010) (6.2.). We also point out that some entropy models from this framework offer potential explanation of human information search behavior in experiments where probabilities are conveyed through words and numbers, which to date have been perplexing to account for theoretically (6.3). Finally, we show that new models offer a theoretically satisfying and descriptively adequate unification of disparate results across different kinds of tasks (6.4.). In the General Discussion (7.), we outline and assess the prospects of a generalized information-theoretic framework for guiding the study of human inference and decision making.

Part of our discussion relies and elaborates on mathematical analyses, including novel results. Moreover, although a number of the mathematical points in the paper can be found scattered through the mathematics and physics literature, here we bring them together


systematically. We provide Supplementary Materials where non-trivial derivations are given according to our unified notation. Throughout each section of the text, statements requiring a mathematical proof are flagged by square brackets [SupplMat], and the proof is then presented in the corresponding subsection of the Supplementary Materials file. Among the formal results provided that are novel to the best of our knowledge, the following we find especially important: the ordinal equivalence of Sharma-Mittal entropy measures of the same order (proof in SupplMat, section 4), the additivity of all Sharma-Mittal measures of expected entropy reduction for sequential tests (again SupplMat, 4), and the distinctive role of the degree parameter in information search tasks such as the Person Game (SupplMat, 5). Further novel results include the subsumption of diverse models such as the Arimoto (1971) and the Power entropies within the Sharma-Mittal framework (SupplMat, 3), and the specification of how a number of different entropy measures can be construed within the general theory of means (Table 4).

2. Entropies, uncertainty, and information search

According to a well-known anecdote, the origins of information theory were marked by a witty joke of John von Neumann. Claude Shannon was doubtful how to call the key concept of his groundbreaking work on the "mathematical theory of communication" (Shannon, 1948). "You should call it entropy," von Neumann suggested. Of course, von Neumann must have been aware of the close connections between Shannon's formula and Boltzmann's definition of entropy in classical statistical mechanics. But the most important reason for his suggestion, von Neumann quipped, was that "nobody knows what entropy really is, so in a debate you will always have the advantage" (see Tribus & McIrvine, 1971). Shannon accepted the advice.

Several decades later, von Neumann's remark seems even more pointed, if anything. Influential observers have voiced caution and concern about the proliferation of mathematical analyses of entropy and related notions (Aczél, 1984, 1987). Meanwhile, many applications have been developed, for instance in physics and ecology (see, e.g., Beck, 2009; Keylock, 2005). But recurrent theoretical controversies have arisen, too, along with occasional complaints of conceptual confusion (see Cho, 2002, and Jost, 2006, respectively).


Luckily, these thorny issues will be tangential to our main concerns. Although a given formalization of entropy can be considered for the representation and measurement of different constructs in each of a variety of domains, we focus on one target concept for which entropies can be employed, namely the uncertainty concerning a variable X given a probability distribution P. In this regard, the key question is the following: How much uncertainty is conveyed about variable X by a given probability distribution P? This notion is central to the normative and descriptive study of human cognition.

Suppose, for instance, that an infection can be caused by three different types of virus, and label x1, x2, x3 the corresponding possibilities. Consider two different probability assignments, such as, say:

P(x1) = 0.49, P(x2) = 0.49, P(x3) = 0.02

and

P*(x1) = 0.70, P*(x2) = 0.15, P*(x3) = 0.15

Is the uncertainty about X = {x1, x2, x3} greater under P or under P*? An entropy measure enables us to give precise quantitative values in both cases, and hence a clear answer. Importantly, however, the answer will often be measure-dependent, for different entropy measures convey different ideas of uncertainty and exhibit distinct mathematical properties of theoretical interest. We will see this in detail later on.
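To make the measure dependence concrete, here is a minimal Python sketch (anticipating the Shannon and Error entropy formulas reviewed in section 3) applied to the two assignments above; the two measures rank P and P* in opposite ways.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats: expected surprise ln(1/P(x)), skipping zero probabilities."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

def error_entropy(probs):
    """Error (Bayes error) entropy: 1 minus the largest probability."""
    return 1.0 - max(probs)

P      = [0.49, 0.49, 0.02]
P_star = [0.70, 0.15, 0.15]

print(shannon_entropy(P), shannon_entropy(P_star))  # ~0.78 vs ~0.82: P* is more uncertain
print(error_entropy(P), error_entropy(P_star))      # 0.51 vs 0.30: P is more uncertain
```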

Once uncertainty as our conceptual target has been outlined, we can turn to entropy as a mathematical object. Consider a finite set X of n mutually exclusive and jointly exhaustive possibilities x1, …, xn on which a probability distribution P(X) is defined, so that P(X) = {P(x1), …, P(xn)}, with P(xi) ≥ 0 for any i (1 ≤ i ≤ n) and $\sum_{x_i \in X} P(x_i) = 1$. The n elements in X = {x1, …, xn} can be taken as representing different kinds of entities, such as events, categories, or propositions. For our purposes, ent is an entropy measure if it is a function f of the relevant probability values only, i.e.:

$ent_P(X) = f[P(x_1), \ldots, P(x_n)]$

and function f satisfies a small number of basic properties (see below). Notice that, in general, an entropy function can be readily extended to the case of a conditional probability


distribution given some datum y. In fact, under the conditional probability distribution P(X|y), one has $ent_P(X|y) = f[P(x_1|y), \ldots, P(x_n|y)]$.

Shannon entropy has been so prominent in cognitive science that some readers will ask: why do we not just stick with it? More specific objections in this vein include that Shannon entropy is uniquely axiomatically motivated, that Shannon entropy is already central to psychological theory of the value of information, or that Shannon entropy is optimal in certain applied situations. Each objection can be addressed separately. First, a number of entropy metrics in our generalized framework (not only Shannon) have been or can be uniquely derived from specific sets of axioms (see Csiszár, 2008). Second, although Shannon entropy has a number of intuitively desirable properties, it is not a seriously competitive descriptive psychological model of the value of information in some tasks (e.g., Nelson et al., 2010). Third, several published papers in applied domains report superior performance when other entropy measures are used (e.g., Ramírez-Reyes et al., 2016). Indeed, Shannon's (1948) own view was that although axiomatic characterization can lend plausibility to measures of entropy and information, "the real justification" (p. 393) rests on the measures' operational relevance. A generalized mathematical framework can increase our theoretical understanding of the relationships among different measures, unify diverse psychological findings, and generate novel questions for future research.

Scholars have used different properties as defining an entropy measure (see, e.g., Csiszár, 2008). Besides some usual technical requirements (like non-negativity), a key idea is that entropy should be appropriately sensitive to how even or uneven a distribution is, at least with respect to the extreme cases of a uniform probability function, U(X) = {1/n, …, 1/n}, or of a deterministic function V(X) where V(xi) = 1 for some i (1 ≤ i ≤ n) and 0 for all other xs. (In the latter case, the distribution actually reflects a truth-value assignment, in logical parlance.) In our setting, U(X) represents the highest possible degree of uncertainty about X, while under V(X) the true value of X is known for sure, and no uncertainty is left. Hence it must hold that, for any X and P(X), $ent_U(X) \geq ent_P(X) \geq ent_V(X)$, with at least one inequality strict. This basic and minimal condition we label evenness sensitivity. It is conveyed by Shannon entropy as well as many others, as we shall see, and it guarantees, for instance, that entropy is strictly higher for, say, a distribution like {1/3, 1/3, 1/3} than for {1, 0, 0}.


Once the idea of an entropy measure is characterized, one can study different measures of expected entropy reduction. This amounts to considering two variables X and Y, and defining the expected reduction of the initial entropy of X across the elements of Y. To illustrate, in the viral infection example mentioned above, X may concern the type of virus actually involved, while Y could be some clinically observable marker (like the result of a blood test) which is informationally relevant for X. Mathematically, given a joint probability distribution P(X, Y) over the combination of two variables X and Y (i.e., their Cartesian product X × Y), the actual change in entropy about X determined by an element y in Y can be represented as $\Delta ent_P(X, y) = ent_P(X) - ent_P(X|y)$. Accordingly, the expected reduction of the initial entropy of X across the elements of Y can be computed in a standard way, as follows:1

$R_P(X, Y) = \sum_{y_j \in Y} \Delta ent_P(X, y_j)\, P(y_j)$

The notation $R_P(X, Y)$ is adapted from work on the foundations of Bayesian statistics, where the expected reduction in entropy is seen as measuring the dependence of variable X on variable Y, or of the relevance of Y for X (see, e.g., Dawid & Musio, 2014).

Very much as for entropy itself, the expected reduction of entropy remains as general and neutral a notion as possible. R measures, too, can be given different interpretations in different domains. In many contexts, it is plausibly assumed that reduction of the uncertainty is a major dimension of the purely informational (or epistemic) value of the search for more data. We will thus consider a measure R as providing a formal approach to questions of the following kind: Given X as a target of investigation, what is the expected usefulness of finding out about Y from a purely informational point of view? Hence, the notion of uncertainty is tightly coupled to the rational assessment of the expected informational utility of pursuing a given search for additional evidence (performing a query, executing a test, running an experiment). (See Crupi & Tentori, 2014; Nelson, 2005, 2008. For more discussion, also see Evans & Over, 1996; Roche & Shogenji, 2016.)

1 For technical reasons, we will assume P(yj) > 0 for any j. This is largely a safe proviso for our current purposes. In fact, in our setting with both X and Y finite sets, any zero probability outcome in Y could just be omitted.
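As a computational illustration of this definition, the following sketch evaluates $R_P(X, Y)$ with Shannon entropy for the viral infection example; the binary marker Y and its likelihoods are invented for illustration only and are not taken from the paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats, skipping zero-probability elements."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

def expected_entropy_reduction(prior, likelihoods, entropy=shannon_entropy):
    """R_P(X, Y) = sum_j P(y_j) * [ent_P(X) - ent_P(X | y_j)].

    prior:       list of P(x_i)
    likelihoods: likelihoods[j][i] = P(y_j | x_i)
    """
    total = 0.0
    for lik_j in likelihoods:
        p_yj = sum(p_x * l for p_x, l in zip(prior, lik_j))
        if p_yj == 0:
            continue  # zero-probability outcomes can simply be omitted (footnote 1)
        posterior = [p_x * l / p_yj for p_x, l in zip(prior, lik_j)]
        total += p_yj * (entropy(prior) - entropy(posterior))
    return total

# Viral infection example: prior over three virus types, plus a hypothetical
# binary marker Y whose likelihoods are purely illustrative assumptions.
prior = [0.49, 0.49, 0.02]
likelihoods = [[0.90, 0.20, 0.50],   # P(y1 | x_i)
               [0.10, 0.80, 0.50]]   # P(y2 | x_i)
print(expected_entropy_reduction(prior, likelihoods))
```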


Formally, X and Y can just be seen as partitions of possibilities. In this interpretation, however, they play quite different roles in $R_P(X, Y)$. The first argument, X, represents the overall goal of the inquiry, while the second, Y, is supposed to be directly accessible to the information seeker. In a typical application, Y will be a more or less useful test to learn about target X, although unable to conclusively establish what the true hypothesis in X is.

Table 1. Notation employed.

H = {h1, …, hn} : A partition of n possibilities (or hypothesis space).
P(H) : Probability distribution P defined over the elements of H.
P(H|e) : Probability distribution P defined over the elements of H conditional on e.
U(H) : Uniform probability distribution over the elements of H.
V(H) : A probability distribution such that V(hi) = 1 for some i (1 ≤ i ≤ n) and 0 for all other hs.
H × E : The variable obtained by the combination (Cartesian product) of variables H and E.
P(H, E) : Joint probability distribution over the combination of variables H and E.
$H \perp_P E$ : Given P(H, E), variables H and E are statistically independent.
$H \perp_P E\,|\,F$ : Given P(H, E, F), variables H and E are statistically independent conditional on each element in F.
$ent_P(H)$ : Entropy of H given P(H).
$ent_P(H|e)$ : Conditional entropy of H on e given P(H|e).
$\Delta ent_P(H, e)$ : Reduction of the initial entropy of H provided by e, i.e., $ent_P(H) - ent_P(H|e)$.
$R_P(H, E)$ : Expected reduction of the entropy of H across the elements of E, given P(H, E).
$R_P(H, E|f)$ : Expected reduction of the entropy of H across the elements of E, given P(H, E|f).
$R_P(H, E|F)$ : Expected value of $R_P(H, E|f)$ across the elements of F, given P(H, E, F).
$\ln_t(x)$ : The Tsallis generalization of the natural logarithm (with parameter t).
$e_t(x)$ : The Tsallis generalization of the ordinary exponential (with parameter t).


In general, the occurrence of one particular element y of Y does not need to reduce the initial entropy about X; it might as much increase it, hence making $\Delta ent_P(X, y)$ negative. This quantity can be negative if (for instance) datum y changes probabilities from P(X) = {0.9, 0.1} to P(X|y) = {0.6, 0.4}. But can $R_P(X, Y)$, i.e., the expected informational usefulness of Y for learning about X, be negative? Some R measures are strictly non-negative, but others can in fact be negative in the expectation; this depends on key properties of the underlying entropy measure, as we discuss later on.

To summarize, in the domain of human cognition, probability distributions can be employed to represent an agent's degrees of belief (be they based on objective statistical information or subjective confidence), with entropy $ent_P(X)$ providing a formalization of the uncertainty about X (given P). Relying on the reduction of uncertainty as an informational utility, $R_P(X, Y)$ is then interpreted as a measure of the expected usefulness of a query (test, experiment) Y relative to a target hypothesis space X. From now on, to emphasize this interpretation, we will often use H = {h1, …, hn} to denote a hypothesis set of interest and E = {e1, …, em} for a possible search for evidence. Table 1 summarizes our terminology in this respect as well as for the subsequent sections.

3. Four Influential Entropy Models

We will now briefly review four important models of entropy and the corresponding models of expected entropy reduction.

3.1. Quadratic entropy

Entropy/Uncertainty. Some interesting entropy measures were originally proposed long before the exchange between Shannon and von Neumann, when entropy was not yet a scientific term outside statistical thermodynamics. Here is one major instance:

$ent_P^{Quad}(H) = 1 - \sum_{h_i \in H} P(h_i)^2$

Labeled Quadratic entropy in Vajda and Zvárová (2007), this measure is widely known as the Gini (or Gini-Simpson) index, after Gini (1912) and Simpson (1949) (also see Gibbs & Martin,


1962). It is often employed as an index of biological diversity (see, e.g., Patil & Taille, 1982) and sometimes spelled out in the following equivalent formulation:

$ent_P^{Quad}(H) = \sum_{h_i \in H} P(h_i)(1 - P(h_i))$

The above formula suggests a meaningful interpretation with H amounting to a partition of hypotheses considered by an uncertain agent. In this reading, $ent^{Quad}$ computes the average (expected) surprise that the agent would experience in finding out what the true element of H is, given 1 − P(h) as a measure of the surprise that arises in case h obtains (see Crupi & Tentori, 2014).2

Entropy reduction/Informational value of queries. Quadratic entropy reduction, namely, $\Delta ent_P^{Quad}(H, e) = ent_P^{Quad}(H) - ent_P^{Quad}(H|e)$, has been occasionally mentioned in philosophical analyses of scientific inference (Niiniluoto & Tuomela, 1973, p. 67). In turn, its associated expected reduction measure, $R_P^{Quad}(H, E) = \sum_{e_j \in E} \Delta ent_P^{Quad}(H, e_j)\, P(e_j)$, was applied by Horwich (1982, pp. 127-129), again in formal philosophy of science, and studied in computer science by Raileanu and Stoffel (2004).

3.2. Hartley entropy

Entropy/Uncertainty. Gini's work did not play any apparent role in the development of Shannon's (1948) theory. A seminal paper by Hartley (1928), however, was a starting point for Shannon's analysis. One lasting insight of Hartley was the introduction of logarithmic functions, which have become ubiquitous in information theory ever since. As Hartley also realized, the choice of a base for the logarithm is a matter of conventionally setting a unit of measurement (Hartley, 1928, pp. 539-541). Throughout our discussion, we will employ the natural logarithm, denoted as ln.

2 $ent^{Quad}$ also quantifies the overall expected inaccuracy of probability distribution P(H) as measured by the so-called Brier score (i.e., the squared Euclidean distance from the possible truth-value assignments over H; see Brier, 1950; Leitgeb & Pettigrew, 2010a, b; Pettigrew, 2013; Selten, 1998). Festa (1993, 137 ff.) also gives a useful discussion of Quadratic entropy in the philosophy of science, including Carnap's (1952) classical work in inductive logic.


InspiredbyHartley’s(1928)originalideathattheinformationprovidedbytheobservation

ofoneamongnpossiblevaluesofavariableisincreasinglyinformativethelargernis,and

thatitimmediatelyreflectstheentropyofthatvariable,onecandefinetheHartleyentropyas

follows(Aczél,Forte,andNg,1974):

-./0UPXYZVG(J) = [.\∑ "(ℎ%)]T'∈U ^

Undertheconvention00=0(whichisstandardintheentropyliterature),andgiventhat

P(hi)0=1wheneverP(hi)>0,-./UPXYZVG computesthelogarithmofthenumberofallnon-null

probabilityelementsinH.

Entropyreduction/Informationalvalueofqueries.Whenappliedtothedomainofreasoning

andcognition,theimplicationsofHartleyentropyrevealaninterestingPopperianflavor.A

pieceofevidenceeisuseful,itturnsout,onlytotheextentthatitexcludes(“falsifies”)atleast

someofthehypothesesinH,forotherwisethereductioninHartleyentropy,

∆-./0UPXYZVG(J, -) = -./0

UPXYZVG(J) − -./0UPXYZVG(J|-),isjustzero.Anagentadoptingsucha

measureofinformationalutilitywouldthenonlyvalueatestoutcome,e,insofarasit

conclusivelyrulesoutatleastonehypothesisinH.IfnopossibleoutcomeinEispotentiallya

“falsifier”forsomehypothesisinH,thentheexpectedreductionofHartleyentropy,@UPXYZVG ,

isalsozero,implyingthatqueryEhasnoexpectedusefulnessatallwithrespecttoH.

3.3.Shannonentropy

Entropy/Uncertainty.Inmanycontexts,thenotionofentropyissimplyandimmediately

equatedtoShannon’sformalism.Overall,suchspecialconsiderationiswell-deservedand

motivatedbycountlessapplicationsspreadovervirtuallyallbranchesofscience.Theformof

Shannonentropyisfairlywell-known:

-./0_TP99`9(J) = ∑ "(ℎ%)[. a6

0(T')bT'∈U

Concerningtheinterpretationoftheformula,manypointsmadeearlierforquadraticentropy

applytoShannonentropytoo,givenrelevantadjustments.Infact,ln(1/P(h))isanother


measure of the surprise in finding out that a state of affairs h obtains, and thus $ent^{Shannon}$ is its overall expected value relative to H.3

Figure 1. A graphical illustration of Quadratic, Hartley, Shannon, and Error entropy as distinct measures of uncertainty over a binary hypothesis set $H = \{h, \bar{h}\}$ as a function of the probability of h.

Entropy reduction/Informational value of queries. The reduction of Shannon entropy, $\Delta ent_P^{Shannon}(H, e) = ent_P^{Shannon}(H) - ent_P^{Shannon}(H|e)$, is sometimes called information gain and it is often considered as a measure of the informational utility of a datum e. Its expected value, also called expected information gain, $R_P^{Shannon}(H, E) = \sum_{e_j \in E} \Delta ent_P^{Shannon}(H, e_j)\, P(e_j)$, is then viewed as a measure of usefulness of query E for learning about H. (See, e.g., Austerweil & Griffiths, 2011; Bar-Hillel & Carnap, 1953; Lindley, 1956; Oaksford & Chater, 1994, 2003, and Ruggeri & Lombrozo, 2015; also see Benish, 1999, and Nelson, 2005, 2008, for more discussion.)

3 The quantity ln(1/P(h)) also characterizes a popular approach to the measurement of the inaccuracy of probability distribution P(H) when h is the true element in H (so-called logarithmic score), and $ent^{Shannon}$ can be seen as computing the expected inaccuracy of P(H) accordingly (see Good, 1952; also see Gneiting & Raftery, 2007).

3.4. Error entropy

Entropy/Uncertainty. Given a distribution P(H) and the goal of predicting the true element of H, a rational agent would plausibly select h* such that $P(h^*) = \max_{h_i \in H}[P(h_i)]$, and $1 - \max_{h_i \in H}[P(h_i)]$ would then be the probability of error. Since Fano's (1961) seminal work, this quantity has received considerable attention in information theory. Also known as Bayes's error, we will call this quantity Error entropy:

$ent_P^{Error}(H) = 1 - \max_{h_i \in H}[P(h_i)]$

Note that $ent^{Error}$ is only concerned with the largest value in the distribution P(H), namely $\max_{h_i \in H}[P(h_i)]$. The lower that value, the higher the chance of error were a guess to be made, thus the higher the uncertainty about H.

Entropy reduction/Informational value of queries. Unlike the other models above, Error entropy has seldom been considered in the natural or social sciences. However, it can be taken as a sound basis for the analysis of rational behavior. In the latter domain, it is quite natural to rely on the reduction of the expected probability of error $\Delta ent_P^{Error}(H, e) = ent_P^{Error}(H) - ent_P^{Error}(H|e)$ as the utility of a datum (often labelled probability gain; see Baron, 1985; Nelson, 2005, 2008) and on its expected value, $R_P^{Error}(H, E) = \sum_{e_j \in E} \Delta ent_P^{Error}(H, e_j)\, P(e_j)$, as the usefulness of a query or test. Indeed, there are important occurrences of this model in the study of human cognition.4

4 An early example is Baron's (1985, ch. 4) presentation of $R^{Error}$, following Savage (1972, ch. 6). Experimental investigations on whether $R^{Error}$ can account for actual patterns of reasoning include Baron, Beattie, and Hershey (1988), Bramley, Lagnado, and Speekenbrink (2015), Meder and Nelson (2012), Nelson, McKenzie, Cottrell, and Sejnowski (2010), and Rusconi, Marelli, D'Addario, Russo, and Cherubini (2014), while Crupi, Tentori, and Lombardi (2009) relied on $R^{Error}$ in their critical analysis of so-called pseudodiagnosticity (also see Crupi & Girotto, 2014; Tweeney, Doherty, & Kleiter, 2010).
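For concreteness, the four classical measures just reviewed can be written as small Python functions of a probability distribution (a minimal sketch; the example distributions are arbitrary):

```python
import math

def quadratic_entropy(probs):
    """Gini-Simpson index: 1 - sum of P(h)^2."""
    return 1.0 - sum(p ** 2 for p in probs)

def hartley_entropy(probs):
    """Log of the number of non-zero-probability hypotheses (convention 0^0 = 0)."""
    return math.log(sum(1 for p in probs if p > 0))

def shannon_entropy(probs):
    """Expected surprise ln(1/P(h)), in nats."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

def error_entropy(probs):
    """Bayes error: 1 minus the probability of the modal hypothesis."""
    return 1.0 - max(probs)

# The four measures on a maximally even versus a degenerate three-way distribution.
for dist in ([1/3, 1/3, 1/3], [1.0, 0.0, 0.0]):
    print([round(f(dist), 3) for f in
           (quadratic_entropy, hartley_entropy, shannon_entropy, error_entropy)])
```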


4. A Unified Framework for Uncertainty and Information Search

The set of models introduced above represents a diverse sample in historical, theoretical, and mathematical terms (see Figure 1 for a graphical illustration). Is the prominence of particular models due to fundamental distinctive properties, or largely due to historical accident? What are the relationships among these models? In this section we show how all of these models can be embedded in a unified mathematical formalism, providing new insight.

4.1. Sharma-Mittal entropies

Let us take Shannon entropy again as a convenient starting point. As noted above, Shannon entropy is an average, more precisely a self-weighted average, displaying the following structure:

$\sum_{h_i \in H} P(h_i)\, inf[P(h_i)]$

The label self-weighted indicates that each probability P(h) serves as a weight for the value of function inf having that same probability as its argument, namely, inf[P(h)]. The function inf can be seen as capturing a notion of atomic information (or surprise), assigning a value to each distinct element of H on the basis of its own probability (and nothing else). An obvious requirement here is that inf should be a decreasing function, because a finding that was antecedently highly probable (improbable) provides little (much) new information (an idea that Floridi, 2013, calls "inverse relationship principle" after Barwise, 1997, p. 491). In Shannon entropy, one has inf(x) = ln(1/x). Given inf(x) = 1 − x, instead, Quadratic entropy arises from the very same scheme above.

A self-weighted average is a special case of a generalized (self-weighted) mean, which can be characterized as follows:

$g^{-1}\left[\sum_{h_i \in H} P(h_i)\, g\{inf[P(h_i)]\}\right]$

where g is a differentiable and strictly increasing function (see Wang & Jiang, 2005; also see Muliere & Parmigiani, 1993, for the fascinating history of these ideas). For different choices of g, different kinds of (self-weighted) means are instantiated. With g(x) = x, the weighted average above obtains once again. For another standard instance, g(x) = 1/x gives rise to the


harmonic mean. Let us now consider the form of generalized (self-weighted) means above and focus on the following setting:

$g(x) = \ln_r[e_t(x)]$

$inf(x) = \ln_t(1/x)$

where

$\ln_t(x) = \frac{x^{(1-t)} - 1}{1 - t}$

$e_t^x = [1 + (1-t)x]^{\frac{1}{1-t}}$

are generalized versions of the natural logarithm and exponential functions, respectively, often associated with Tsallis's (1988) work. Importantly, the $\ln_t$ function recovers the ordinary natural logarithm ln in the limit for t → 1, so that one can safely equate $\ln_t(x) = \ln(x)$ for t = 1 and have a nice and smooth generalized logarithmic function.5 Similarly, it is assumed that $e_t^x = e^x$ for t = 1, as this is the limit for t → 1 [SupplMat, section 1]. Negative values of parameters r and t need not concern us here: we'll be assuming r, t ≥ 0 throughout.

5 The idea of $\ln_t$ is often credited to Tsallis for his work in generalized thermodynamics (see Tsallis, 1988, and 2011). The mathematical point may well go back to Euler, however (see Hoffmann, 2008, p. 7). For more theory, also see Havrda and Charvát (1967), Daróczy (1970), Naudts (2002), Kaniadakis, Lissia, and Scarfone (2004).

Once fed into the generalized means equation, these specifications of inf(x) and g(x) yield a two-parameter family of entropy measures of order r and degree t [SupplMat, 2]:

$ent_P^{SM(r,t)}(H) = \frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^r\right)^{\frac{t-1}{r-1}}\right]$

The label SM refers to Sharma and Mittal (1975), where this formalism was originally proposed (also see Masi, 2005, and Hoffmann, 2008). All functions in the Sharma-Mittal family are evenness sensitive (see 2. above), thus in line with a basic characterization of entropies [SupplMat, 2]. Also, with $ent^{SM(r,t)}$ one can embed the whole set of four classic measures in our initial list. More precisely [SupplMat, 3]:


– Quadratic entropy can be derived from the Sharma-Mittal family for r = t = 2, that is, $ent_P^{SM(2,2)}(H) = ent_P^{Quad}(H)$;

– Hartley entropy can be derived from the Sharma-Mittal family for r = 0 and t = 1, that is, $ent_P^{SM(0,1)}(H) = ent_P^{Hartley}(H)$;

– Shannon entropy can be derived from the Sharma-Mittal family for r = t = 1, that is, $ent_P^{SM(1,1)}(H) = ent_P^{Shannon}(H)$;

– Error entropy is recovered from the Sharma-Mittal family in the limit for r → ∞ when t = 2, so that we have $ent_P^{SM(\infty,2)}(H) = ent_P^{Error}(H)$.
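The following sketch implements the Sharma-Mittal formula directly from the definition above, handling the limiting cases r = 1 and t = 1 (and large r) numerically, and can be used to check the four special cases just listed; the example distribution is arbitrary.

```python
import math

def sharma_mittal_entropy(probs, r, t, eps=1e-9):
    """Sharma-Mittal entropy of order r and degree t (natural-log units).

    ent = (1/(t-1)) * [1 - (sum_i p_i^r)^((t-1)/(r-1))],
    with the limiting cases r -> 1 and t -> 1 handled explicitly.
    """
    probs = [p for p in probs if p > 0]                 # convention 0^0 = 0
    if abs(r - 1.0) < eps:                              # r = 1 (Gaussian family)
        shannon = sum(p * math.log(1.0 / p) for p in probs)
        if abs(t - 1.0) < eps:
            return shannon                              # r = t = 1: Shannon
        return (1.0 - math.exp((1.0 - t) * shannon)) / (t - 1.0)
    # log of sum_i p_i^r, computed stably for large r (log-sum-exp)
    logs = [r * math.log(p) for p in probs]
    m = max(logs)
    log_s = m + math.log(sum(math.exp(l - m) for l in logs))
    if abs(t - 1.0) < eps:                              # t = 1: Renyi family
        return log_s / (1.0 - r)
    return (1.0 - math.exp(log_s * (t - 1.0) / (r - 1.0))) / (t - 1.0)

dist = [0.49, 0.49, 0.02]
print(sharma_mittal_entropy(dist, 2, 2))     # Quadratic: 1 - sum p^2
print(sharma_mittal_entropy(dist, 0, 1))     # Hartley: ln(3)
print(sharma_mittal_entropy(dist, 1, 1))     # Shannon
print(sharma_mittal_entropy(dist, 1e6, 2))   # ~Error entropy: 1 - max p = 0.51
```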

Figure 2. The Sharma-Mittal family of entropy measures is represented in a Cartesian quadrant with values of the order parameter r and of the degree parameter t lying on the x- and y-axis, respectively. Each point in the quadrant corresponds to a specific entropy measure, each line corresponds to a distinct one-parameter generalized entropy function. Several special cases are highlighted. (Relevant references and formulas are listed in Table 4.)

[Figure 2 labels: Hartley, Shannon, Quadratic, Rényi, Effective number, Power, Error entropy, Tsallis, Gaussian, Arimoto, Origin; axes: Order r (x) and Degree t (y); title: Sharma-Mittal space of entropy measures.]


Figure 3. A graphical illustration of the generalized atomic information function $\ln_t(1/P(h))$ for four different values of the parameter t (0, 1, 2, and 5, respectively, for the curves from top to bottom). Appropriately, the amount of information arising from finding out that h is the case is a decreasing function of P(h). For high values of t, however, such decrease is flattened: with t = 5 (the lowest curve in the figure) finding out that h is true provides almost the same amount of information for a large set of initial probability assignments.

A good deal more can be said about the scope of this approach: see Figures 2 and 3, Table 4, and SupplMat (section 3) for additional material. Here, we will only mention briefly three important further points about R-measures in the Sharma-Mittal framework and their meaning for modelling information search behavior. They are as follows.

Additivity of expected entropy reduction: For any H, E, F and P(H, E, F), $R_P^{SM(r,t)}(H, E \times F) = R_P^{SM(r,t)}(H, E) + R_P^{SM(r,t)}(H, F|E)$.

This statement means that, for any Sharma-Mittal R-measure, the informational utility of a combined test E × F for H amounts to the sum of the plain utility of E and the utility of F that is expected considering all possible outcomes of E [SupplMat, 4]. (Formally, $R_P^{SM(r,t)}(H, F|E) = \sum_{e_j \in E} R_P^{SM(r,t)}(H, F|e_j)\, P(e_j)$, while $R_P^{SM(r,t)}(H, F|e_j)$ denotes the expected entropy reduction of H provided by F as computed when all relevant probabilities are conditionalized on ej.) According to Nelson's (2008) discussion, this elegant additivity property of expected entropy reduction is important and highly desirable as concerns the analysis of the rational assessment of tests or queries. Moreover, one can see that the additivity of expected entropy reduction can be extended to any finite chain of queries and thus be applied to sequential search tasks such as those experimentally investigated by Nelson et al. (2014).
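As a numerical sanity check of this additivity property, the sketch below draws a random joint distribution P(H, E, F) and compares $R(H, E \times F)$ with $R(H, E) + R(H, F|E)$; Quadratic entropy is used as the underlying measure here, but any Sharma-Mittal entropy can be swapped in. The random-sampling setup is our own illustration.

```python
import random

def quadratic_entropy(probs):
    """Quadratic entropy, i.e., SM(2,2); any Sharma-Mittal entropy could be used instead."""
    return 1.0 - sum(p ** 2 for p in probs)

def additivity_check(n_h=3, n_e=2, n_f=2, seed=1, ent=quadratic_entropy):
    """Compare R(H, ExF) with R(H, E) + R(H, F|E) on a random joint P(H, E, F)."""
    rng = random.Random(seed)
    raw = [[[rng.random() for _ in range(n_f)] for _ in range(n_e)] for _ in range(n_h)]
    z = sum(x for plane in raw for row in plane for x in row)
    p = [[[x / z for x in row] for row in plane] for plane in raw]   # joint P(h_i, e_j, f_k)

    p_h = [sum(sum(row) for row in p[i]) for i in range(n_h)]
    r_h_ef = ent(p_h)                       # R(H, E x F): condition on every pair (e_j, f_k)
    for j in range(n_e):
        for k in range(n_f):
            p_jk = sum(p[i][j][k] for i in range(n_h))
            r_h_ef -= p_jk * ent([p[i][j][k] / p_jk for i in range(n_h)])

    r_h_e, r_h_f_given_e = ent(p_h), 0.0    # R(H, E) and R(H, F | E)
    for j in range(n_e):
        p_j = sum(p[i][j][k] for i in range(n_h) for k in range(n_f))
        post_j = [sum(p[i][j][k] for k in range(n_f)) / p_j for i in range(n_h)]
        r_h_e -= p_j * ent(post_j)
        inner = ent(post_j)
        for k in range(n_f):
            p_jk = sum(p[i][j][k] for i in range(n_h))
            inner -= (p_jk / p_j) * ent([p[i][j][k] / p_jk for i in range(n_h)])
        r_h_f_given_e += p_j * inner

    print(r_h_ef, r_h_e + r_h_f_given_e)    # the two values should coincide

additivity_check()
```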

Irrelevance: For any H, E and P(H, E), if either E = {e} or $H \perp_P E$, then $R_P^{SM(r,t)}(H, E) = 0$.

This statement says that two special kinds of queries can be known in advance to be of no use, that is, informationally inconsequential relative to the hypothesis set of interest. One is the case of an empty test E = {e} with a single possibility that is already known to obtain with certainty, so that P(e) = 1. As suggested vividly by Floridi (2009, p. 26), this would be like consulting the raven in Edgar Allan Poe's famous poem, which is known to give one and the same answer no matter what (it always spells out "Nevermore"). The other case is when variables H and E are unrelated, that is, statistically independent according to P ($H \perp_P E$ in our notation). In both of these circumstances, $R_P^{SM(r,t)}(H, E) = 0$ simply because the prior and posterior distribution on H are identical for each possible value of E, so that no entropy reduction can ever obtain.

By the irrelevance condition, empty and unrelated queries have zero expected utility, but can a query E have a negative expected utility? If so, a rational agent would be willing to pay a cost just for not being told what the true state of affairs is as concerns E, much as an abandoned lover who wants to be spared being told whether her/his beloved is or is not happy because s/he expects more harm than good. Note, however, that for the lover non-informational costs are clearly involved, while we are assuming queries or tests to be assessed in purely informational terms, bracketing all further factors (see, e.g., Raiffa & Schlaifer, 1961, Meder & Nelson, 2012, and Markant & Gureckis, 2012, for work involving situation-specific payoffs). In this perspective, it is reasonable and common to see irrelevance as the worst-case scenario and exclude the possibility of informationally harmful tests: an irrelevant test (whether empty or statistically unrelated) simply cannot tell us anything of interest, but that is as bad as it can get (see Good, 1967, and Goosens, 1976, for seminal analyses; also see Dawid, 1998).6

Interestingly, not all Sharma-Mittal measures of expected entropy reduction are non-negative. Some of them do allow for the controversial idea that there could exist detrimental tests in purely informational terms, such that an agent should rank them worse than an irrelevant search and take active measures to avoid them (despite them having, by assumption, no intrinsic cost). Mathematically, a non-negative measure $R_P(H, E)$ is generated if and only if the underlying entropy measure is a concave function [SupplMat, 4], and the conditions for concavity are as follows:

Concavity: $ent_P^{SM(r,t)}(H)$ is a concave function of {P(h1), …, P(hn)} just in case t ≥ 2 − 1/r.7

In terms of Figure 2, this means that any entropy (represented by a point) below the Arimoto curve is not generally concave (see Figure 4 for a graphical illustration of a strongly non-concave entropy measure). Thus, if the concavity of ent is required (to preserve the non-negativity of R), then many prominent special cases are retained (including Quadratic, Hartley, Shannon, and Error entropy), but a significant bit of the whole Sharma-Mittal parameter space is ruled out. This concerns, for instance, entropies of degree 1 and order higher than 1 (see Ben-Bassat & Raviv, 1978).

6 In theories of so-called imprecise probabilities, the notion arises of a detrimental experiment E in the sense that interval probability estimates for each element in a hypothesis set of interest H can be properly included in the corresponding interval probability estimates conditional on each element in E. This phenomenon is known as dilation: one's initial state of credence about H becomes less precise (thus more uncertain, under a plausible interpretation) no matter how an experiment turns out. The strongly unattractive character of this implication has been sometimes disregarded (see Tweeney et al., 2010, for an example in the psychology of reasoning), but the prevailing view is that appropriate moves are required to avoid it or dispel it (for recent discussions, see Bradley & Steele, 2014; Pedersen & Wheeler, 2014).

7 This important result is proven in Hoffmann (2008), and already mentioned in Taneja et al. (1989, p. 61), who in turn refer to van der Pyl (1978) for a proof. We did not posit concavity as a defining property of entropies, and that's how it should be, in our opinion. Concavity may definitely be convenient or even required in some applications, but barring non-concave functions would be overly restrictive as concerns the formal notion of entropy. In physics, for instance, concavity is taken as directly relevant for generalized thermodynamics (Beck, 2009, p. 499; Tsallis, 2004, p. 10). In biological applications, on the other hand, concavity was suggested by Lewontin (1972; also see Rao, 2010, p. 71), but seen as having "no intuitive motivation" by Patil and Taille (1982, p. 552).
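To illustrate what can go wrong below the Arimoto curve, here is a small constructed example (our own toy numbers, not from the paper): for the non-concave measure $ent^{SM(20,0)}$, a binary test whose two equiprobable outcomes lead to posteriors {0, 1} and {0.5, 0.5} has a negative expected entropy reduction, whereas Shannon's measure on the same problem is positive.

```python
import math

def sm_entropy(probs, r, t):
    """Sharma-Mittal entropy for r != 1 and t != 1 (sufficient for this example)."""
    s = sum(p ** r for p in probs if p > 0)
    return (1.0 - s ** ((t - 1.0) / (r - 1.0))) / (t - 1.0)

def shannon_entropy(probs):
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

def expected_reduction(prior, outcome_probs, posteriors, ent):
    return ent(prior) - sum(q * ent(post) for q, post in zip(outcome_probs, posteriors))

# Binary H with prior P(h1) = 0.25; a test E whose two equiprobable outcomes lead to
# posteriors {0, 1} and {0.5, 0.5}, which average back to the prior (toy numbers).
prior = [0.25, 0.75]
outcome_probs = [0.5, 0.5]
posteriors = [[0.0, 1.0], [0.5, 0.5]]

non_concave = lambda p: sm_entropy(p, r=20, t=0)
print(expected_reduction(prior, outcome_probs, posteriors, non_concave))      # about -0.15
print(expected_reduction(prior, outcome_probs, posteriors, shannon_entropy))  # about +0.22
```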

Table 4. A summary of the Sharma-Mittal framework and several of its special cases, including a specification of their structure in the general theory of means and a key reference for each.

Each entry lists the (r, t)-setting, the algebraic form of $ent_P(H)$, the characteristic function g of the generalized mean construction with its inverse, and the atomic information function.

Sharma-Mittal (Sharma & Mittal, 1975). Setting: r ≥ 0, t ≥ 0. Algebraic form: $\frac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^r\right)^{\frac{t-1}{r-1}}\right]$. Characteristic function and inverse: $g(x) = \ln_r(e_t^x)$, $g^{-1}(x) = \ln_t(e_r^x)$. Atomic information: $inf(x) = \ln_t\left(\frac{1}{x}\right)$.

Effective Numbers (Hill, 1973). Setting: r ≥ 0, t = 0. Algebraic form: $\left(\sum_{h_i \in H} P(h_i)^r\right)^{\frac{1}{1-r}} - 1$. Characteristic function and inverse: $g(x) = \ln_r(1 + x)$, $g^{-1}(x) = e_r^x - 1$. Atomic information: $inf(x) = \frac{1-x}{x}$.

Rényi (Rényi, 1961). Setting: r ≥ 0, t = 1. Algebraic form: $\frac{1}{1-r} \ln\left(\sum_{h_i \in H} P(h_i)^r\right)$. Characteristic function and inverse: $g(x) = \ln_r(e^x)$, $g^{-1}(x) = \ln(e_r^x)$. Atomic information: $inf(x) = \ln\left(\frac{1}{x}\right)$.

Power entropies (Laakso & Taagepera, 1979). Setting: r ≥ 0, t = 2. Algebraic form: $1 - \left(\sum_{h_i \in H} P(h_i)^r\right)^{\frac{1}{r-1}}$. Characteristic function and inverse: $g(x) = \ln_r\left(\frac{1}{1-x}\right)$, $g^{-1}(x) = 1 - (e_r^x)^{-1}$. Atomic information: $inf(x) = 1 - x$.

Gaussian (Frank, 2004). Setting: r = 1, t ≥ 0. Algebraic form: $\frac{1}{t-1}\left[1 - e^{(1-t)\sum_{h_i \in H} P(h_i)\ln\left(\frac{1}{P(h_i)}\right)}\right]$. Characteristic function and inverse: $g(x) = \ln(e_t^x)$, $g^{-1}(x) = \ln_t(e^x)$. Atomic information: $inf(x) = \ln_t\left(\frac{1}{x}\right)$.

Arimoto (Arimoto, 1971). Setting: r ≥ ½, t = 2 − 1/r. Algebraic form: $\frac{r}{r-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^r\right)^{\frac{1}{r}}\right]$. Characteristic function and inverse: $g(x) = \ln_r\left(\left[1 + \frac{1-r}{r}x\right]^{\frac{r}{1-r}}\right)$, $g^{-1}(x) = \frac{r}{r-1}\left[1 - (e_r^x)^{\frac{1-r}{r}}\right]$. Atomic information: $inf(x) = \frac{r}{r-1}\left[1 - x^{\frac{r-1}{r}}\right]$.

Tsallis (Tsallis, 1988). Setting: r = t ≥ 0. Algebraic form: $\frac{1}{t-1}\left[1 - \sum_{h_i \in H} P(h_i)^t\right]$. Characteristic function and inverse: $g(x) = x$, $g^{-1}(x) = x$. Atomic information: $inf(x) = \ln_t\left(\frac{1}{x}\right)$.

Quadratic (Gini, 1912). Setting: r = t = 2. Algebraic form: $1 - \sum_{h_i \in H} P(h_i)^2$. Characteristic function and inverse: $g(x) = x$, $g^{-1}(x) = x$. Atomic information: $inf(x) = 1 - x$.

Shannon (Shannon, 1948). Setting: r = t = 1. Algebraic form: $\sum_{h_i \in H} P(h_i)\ln\left(\frac{1}{P(h_i)}\right)$. Characteristic function and inverse: $g(x) = x$, $g^{-1}(x) = x$. Atomic information: $inf(x) = \ln\left(\frac{1}{x}\right)$.

Hartley (Hartley, 1928). Setting: r = 0, t = 1. Algebraic form: $\ln\left[\sum_{h_i \in H} P(h_i)^0\right]$. Characteristic function and inverse: $g(x) = e^x - 1$, $g^{-1}(x) = \ln(1 + x)$. Atomic information: $inf(x) = \ln\left(\frac{1}{x}\right)$.


Figure 4. Graphical illustration of the non-concave entropy $ent^{SM(20,0)}$ for a binary hypothesis set $H = \{h, \bar{h}\}$ as a function of the probability of h.

4.2. Psychological interpretation of the order and degree parameter

The order parameter r: Imbalance and continuity. What is the meaning of the order parameter in the Sharma-Mittal formalism when entropies and expected entropy reduction measures represent uncertainty and the value of queries, respectively? To clarify, let us consider what happens with extreme values of r, i.e., if r = 0 or goes to infinity, respectively [SupplMat, 3]:

$ent_P^{SM(0,t)}(H) = \ln_t\left[\sum_{h_i \in H} P(h_i)^0\right]$

$ent_P^{SM(\infty,t)}(H) = \ln_t\left[\frac{1}{\max_{h_i \in H}[P(h_i)]}\right]$

Given the convention 0^0 = 0, $\sum_{h_i \in H} P(h_i)^0$ simply computes the number of all elements in H with a non-null probability. Accordingly, when r = 0, entropy becomes an (increasing) function of the mere number of the "live" (non-zero probability) options in H. When r goes to infinity, on the other hand, entropy becomes a (decreasing) function of the probability of a single element in H, i.e., the most likely hypothesis. This shows that the order parameter r is an index of the imbalance of the entropy function, which indicates how much the entropy measure discounts minor (low probability) hypotheses. For order-0 measures, the actual probability distribution is neglected: non-zero probability hypotheses are just counted, as if they were all equally important (see Gauvrit & Morsanyi, 2014). For order-∞ measures, on the other hand, only the most probable hypothesis matters, and all other hypotheses are disregarded altogether. For intermediate values of r, more likely hypotheses count more, but less likely hypotheses do retain some weight. The higher [lower] r is, the more [less] the likely hypotheses are regarded and the unlikely hypotheses are discounted. Importantly, for extreme values of the order parameter, an otherwise natural idea of continuity fails in the measurement of entropy: when r goes to either zero or infinity, it is not the case that small (large) changes in the probability distribution P(H) produce comparably small (large) changes in entropy values.

To see better how order-0 entropy measures behave, consider the simplest of them:

$ent_P^{SM(0,0)}(H) = n_P - 1$

where $n_P = \sum_{h_i \in H} P(h_i)^0$, so $n_P$ denotes the number of hypotheses in H with a non-null (strictly positive) probability. Given the −1 correction, $ent^{SM(0,0)}$ can be interpreted as the "number of contenders" for each entity in set H, because it takes value 0 when only one element is left. For future reference, we will label $ent^{SM(0,0)}$ Origin entropy because it marks the origin of the graph in Figure 2. Importantly, the expected reduction of Origin entropy is just the expected number of hypotheses in H conclusively falsified by a test E.
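A toy numerical check of this last claim, with made-up numbers: below, one outcome of a binary test falsifies one hypothesis and the other outcome falsifies two, and the expected reduction of Origin entropy coincides with the expected number of falsified hypotheses.

```python
def origin_entropy(probs):
    """Origin entropy SM(0,0): number of non-zero-probability hypotheses minus one."""
    return sum(1 for p in probs if p > 0) - 1

# Prior over H = {h1, h2, h3} and posteriors after a binary test E (illustrative
# numbers only): outcome e1 rules out h3, outcome e2 rules out h1 and h2.
prior = [0.4, 0.4, 0.2]
outcomes = [(0.8, [0.5, 0.5, 0.0]),   # (P(e1), P(H | e1))
            (0.2, [0.0, 0.0, 1.0])]   # (P(e2), P(H | e2))

expected_reduction = sum(p_e * (origin_entropy(prior) - origin_entropy(post))
                         for p_e, post in outcomes)
expected_falsified = sum(p_e * sum(1 for q, p in zip(prior, post) if q > 0 and p == 0)
                         for p_e, post in outcomes)
print(expected_reduction, expected_falsified)   # both equal 1.2
```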

To the extent that all details of the prior and posterior probability distribution over H are neglected, computational demands are significantly decreased with order-0 entropies. As a consequence, measures of the expected reduction of an order-0 entropy (and especially Origin entropy) also amount to comparably frugal, heuristic or quasi-heuristic models of information search (see Baron et al.'s model, 1988, p. 106). Lack of continuity, too, is associated with heuristic models, which often rely on discrete elements instead of continuous representations (see Gigerenzer, Hertwig, & Pachur, 2011; Katsikopoulos, Schooler, & Hertwig, 2010). More


generally, when the order parameter approaches 0, entropy measures become more and more balanced, meaning that they treat all live hypotheses more and more equally. What happens to the associated expected entropy reduction measures is that they become more and more "Popperian" in spirit. In fact, for order-0 relevance measures, a test E will deliver some non-null expected informational utility about hypothesis set H if and only if some of the possible outcomes of E can conclusively rule out some element in H. Otherwise, the expected entropy reduction will be zero, no matter how large the changes in probability that might arise from E. Cognitively, relevance measures of low order would then describe the information search preferences of an agent who is distinctively eager to prune down the list of candidate hypotheses, an attitude which might prevail in earlier stages of an inquiry, when such a list can be sizable.

Among entropy measures of order infinity, we already know $ent^{SM(\infty,2)}(H) = 1 - \max_{h_i \in H}[P(h_i)]$ as Error entropy. What this illustrates is that, when r goes to infinity, entropy measures become more and more decision-theoretic in a short-sighted kind of way: in the limit, they are only affected by the probability of a correct guess given the currently available information. A notable consequence for the associated measures of expected entropy reduction is that a test E can deliver some non-null expected informational utility only if some of the possible outcomes of E can alter the probability of the modal hypothesis in H. If that is not the case, then the expected utility will be zero, no matter how significant the changes in the probability distribution arising from E. Cognitively, then, R-measures of very high order would describe the information search preferences of an agent who is predominantly concerned with an estimate of the probability of error in an impending choice from set H.

The degree parameter t: Perfect tests and certainty. Let us now consider briefly the meaning of the degree parameter t in the Sharma-Mittal formalism when entropies and relevance measures represent uncertainty and the value of queries, respectively. A remarkable fact about the degree parameter t is that (unlike the order parameter r) it does not affect the ranking of entropy values. Indeed, one can show that any Sharma-Mittal entropy measure is a strictly increasing function of any other measure of the same order r, regardless of the degree (for any hypothesis set H and any probability distribution P) [SupplMat, 4]. Thus, concerning


the ordinal comparison of entropy values, only if the order differs can divergences between pairs of SM entropy measures arise. On the other hand, the implications of the degree parameter for measures of expected entropy reduction are significant and have not received much attention.

As a useful basis for discussion, suppose that variables H and E are independent, in the standard sense that for any hi ∈ H and any ej ∈ E, P(hi ∩ ej) = P(hi)P(ej), denoted as $H \perp_P E$. Then we have [SupplMat, 4]:

$R_P^{SM(r,t)}(E, E) - R_P^{SM(r,t)}(H \times E, E) = (t-1)\, ent_P^{SM(r,t)}(H)\, ent_P^{SM(r,t)}(E)$

If expected entropy reduction is interpreted as a measure of the informational utility of queries or tests, this equality governs the relationship between the computed utilities of E in case it is a "perfect" (conclusive) test and in case it is not. More precisely, the first term on the left, $R_P^{SM(r,t)}(E, E)$, measures the expected informational utility of a perfect test because the test itself and the target of investigation are the same, hence finding out the true value of E removes all relevant uncertainty. On the other hand, E is no longer a perfect test in the second term of the equation above, $R_P^{SM(r,t)}(H \times E, E)$, for here a more fine-grained hypothesis set H × E is at issue, thus a more demanding epistemic target; hence finding out the true value of E would not remove all relevant uncertainty. (Recall that, by assumption, H is statistically independent from E, so the uncertainty about H would remain untouched, as it were, after knowing about E.) With entropies of degree 1 (including Shannon), the associated measures of expected entropy reduction imply that E has exactly identical utility in both cases, because t = 1 nullifies the right-hand side of the equation, regardless of the order parameter r. With t > 1 the right-hand side is positive, so E is a strictly more useful test when it is conclusive than when it is not. With t < 1, on the contrary, the right-hand side is negative, so E is a strictly less useful test when it is conclusive than when it is not. Note that these are ordinal relationships (rankings). In comparing the expected informational utility of queries, the degree parameter t can thus play a crucial role. Crupi and Tentori (2014, p. 88) provided some simple illustrations which can be adapted as favoring an entropy with t > 1 as the basis for the R-measure of the expected utility of queries (here, we present an illustration in Figure 5).


Figure 5. Consider a standard 52-card playing deck, with Suit corresponding to the 4 equally probable suits, Value corresponding to the 13 equally probable numbers (or faces) that a card can take (2 through 10, Jack, Queen, King, Ace), and Suit × Value corresponding to the 52 equally probable individual cards in the deck. Suppose that you will be told the suit of a randomly chosen card. Is this more valuable to you if (i) (perfect test case) your goal is to learn the suit, i.e., $R_P(Suit, Suit)$, or (ii) (inconclusive test case) your goal is to learn the specific card, i.e., $R_P(Suit \times Value, Suit)$? What is the ratio of the value of the expected entropy reduction in (i) vs. (ii)? For degree 1, the information to be obtained has equal value in each case. For degrees greater than 1, the perfect test is more useful. For degrees less than one, the inconclusive test is more useful. Interestingly, as the figure shows, the degree parameter uniquely determines the relative value of $R_P(Suit, Suit)$ and $R_P(Suit \times Value, Suit)$, regardless of the order parameter. In the figure, values of the order parameter r and of the degree parameter t lie on the x- and y-axis, respectively. Color represents the log of the ratio between the conclusive test and the inconclusive test case in the card example above: black means that the information values of the tests are equal (log of the ratio is 0); warm/cool shades indicate that the conclusive test has a higher/lower value, respectively (log of the ratio is positive/negative).
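A minimal sketch of the card-deck computation follows. It relies on the fact that the Sharma-Mittal entropy of a uniform distribution over n elements works out to $\ln_t(n)$ regardless of the order r, so the two quantities in the caption reduce to $\ln_t(4)$ and $\ln_t(52) - \ln_t(13)$; the code simply evaluates these for t = 0, 1, 2.

```python
import math

def ln_t(x, t, eps=1e-9):
    """Tsallis generalized logarithm: (x^(1-t) - 1) / (1 - t), with ln_1 = ln."""
    if abs(t - 1.0) < eps:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

# For uniform distributions the Sharma-Mittal entropy is ln_t(n) whatever the order r,
# so the two quantities compared in Figure 5 reduce to:
#   R(Suit, Suit)         = ln_t(4)              (perfect test: all uncertainty removed)
#   R(Suit x Value, Suit) = ln_t(52) - ln_t(13)  (inconclusive test: 13 cards remain)
for t in (0, 1, 2):
    perfect, inconclusive = ln_t(4, t), ln_t(52, t) - ln_t(13, t)
    print(t, round(perfect, 4), round(inconclusive, 4))
# t = 0: 3.0 vs 39.0       -> the inconclusive test is ranked higher
# t = 1: 1.3863 vs 1.3863  -> equal value
# t = 2: 0.75 vs 0.0577    -> the perfect test is ranked higher
```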


The meaning of a high degree parameter is of particular interest in the so-called Tsallis family of entropy measures, obtained from ent_{SM(r,t)} when r = t (see Table 4). Consider Tsallis entropy of degree 30, that is, ent_{SM(30,30)}. With this measure, entropy remains very close to an upper bound value of 1/(t − 1) ≈ 0.0345 unless the probability distribution reflects near-certainty about the true element in the hypothesis set H. For instance, for as uneven a distribution as {0.90, 0.05, 0.05}, ent_{Tsallis(30)} yields an entropy of about 0.033, still close to 0.0345, while it quickly approaches 0 when the probability of one hypothesis exceeds 0.99. Non-Certainty entropy seems a useful label for future reference, as measure ent_{Tsallis(30)} essentially implies that entropy is almost invariant as long as an appreciable lack of certainty (a "reasonable doubt", as it were) endures. Accordingly, the entropy reduction from a piece of evidence e is largely negligible unless one is led to acquire a very high degree of certainty about H, and it approaches the upper bound of 1/(t − 1) as the posterior probability comes close to matching a truth-value assignment (with P(hi) = 1 for some i and 0 for all other hypotheses). Up to the inconsequential normalizing constant t − 1, the expected reduction of this entropy, R_{Tsallis(30)}, amounts to a smooth variant of Nelson et al.'s (2010) "probability-of-certainty heuristic", where a datum ei ∈ E has informational utility 1 if it reveals the true element in H with certainty and utility 0 otherwise, so that the expected utility of E itself is just the overall probability that certainty about H is eventually achieved by that test. These remarks further illustrate that a larger degree t implies an increasing tendency of the corresponding R-measure to value highly the attainment of certainty or quasi-certainty about the target hypothesis set when assessing a test.
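The near-invariance just described is easy to check numerically. The following short Python sketch (illustrative only; the helper tsallis is ours) prints ent_{Tsallis(30)} for a few three-hypothesis distributions and compares the values with the 1/(t − 1) bound.

def tsallis(p, t=30.0):
    """Tsallis entropy of degree t: (1 - sum_i p_i^t) / (t - 1)."""
    return (1.0 - sum(x ** t for x in p if x > 0)) / (t - 1.0)

upper_bound = 1.0 / (30.0 - 1.0)   # about 0.0345
for dist in ([1/3, 1/3, 1/3], [0.90, 0.05, 0.05], [0.998, 0.001, 0.001], [1.0, 0.0, 0.0]):
    print(dist, round(tsallis(dist), 4), "bound:", round(upper_bound, 4))

The uniform and the {0.90, 0.05, 0.05} distributions both come out close to the bound, while the nearly certain {0.998, 0.001, 0.001} distribution drops to roughly 0.002 and the certain one to exactly 0.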

5. A systematic exploration of how key information search models diverge

Depending on different entropy functions, two measures R and R* of the expected reduction of entropy as the informational utility of tests may disagree in their rankings. Formally, there exist variables H, E, and F and a probability distribution P(H, E, F) such that R_P(H, E) > R_P(H, F) while R*_P(H, E) < R*_P(H, F); thus, R-measures are not generally ordinally equivalent. In the following, we will focus on an illustrative sample of measures in the Sharma-Mittal framework and show that such divergences can be widespread, strong, and telling about the specific tenets of those measures. This means that different entropy measures can provide markedly divergent implications in the assessment of possible queries' expected usefulness. Depending on the interpretation of the models, this in turn implies conflicting empirical predictions and/or incompatible normative recommendations.

Our list will include three classical models that are standard at least in some domains, namely Shannon, Quadratic, and Error entropy. It also includes three measures which we previously labelled heuristic or quasi-heuristic in that they largely or completely disregard quantitative information conveyed by the relevant probability distribution P: these are Origin entropy (or the "number of contenders"), Hartley entropy, and Non-Certainty entropy, as defined above. For a wider coverage and comparison, we also include an entropy function lying well below the Arimoto curve in Figure 2, that is, ent_{SM(20,0)}, and thus labelled Non-Concave (see Figure 4).

We ran simulations to identify cases of strong disagreement between our seven measures of expected entropy reduction, on a pairwise basis, about which of two tests is taken to be more useful. In each simulation, we considered a scenario with a threefold hypothesis space H = {h1, h2, h3} and two binary tests, E = {e, ē} and F = {f, f̄}.[8] The goal of each simulation was to find a case, that is, a specific joint probability distribution P(H, E, F), where two R-measures strongly disagree about which of two tests is most useful. The ideal scenario here is a case where the expected reduction of one kind of entropy (say, Origin) implies that E is as useful as can possibly be found, while F is as bad as it can be, and the expected reduction of another kind of entropy (say, Shannon) implies the opposite, with equal strength of conviction.

[8] We used three-hypothesis scenarios to illustrate the differences among our selected sample of R-measures, because scenarios of this kind appeared to offer a reasonable balance of being simple yet powerful enough to deliver divergences that are strong and intuitively clear. Note however that two-hypothesis scenarios can also clearly differentiate many of the R-measures (see the review of behavioral research on binary classification tasks in the subsequent sections).

The quantification of the disagreement between two R-measures in a given case (for a given P(H, E, F)) arises from three steps (also see Nelson et al., 2010). (i) Normalization: for each measure, we divide nominal values of expected entropy reduction (for each of E and F) by the expected entropy reduction of a conclusive test for three equally probable hypotheses, that is, by R_U(H, H). (ii) Preference Strength: for each measure, we compute the simple difference between the (normalized) expected entropy reduction for test E and for test F, that is, R_P(H, E)/R_U(H, H) − R_P(H, F)/R_U(H, H). (iii) Disagreement Strength (DS): if the two measures agree on whether E or F is most useful, DS is defined as zero; if they disagree, DS is defined as the geometric mean of those measures' respective absolute preference strengths in step (ii).
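As an illustration of these three steps, the short Python sketch below (ours, not the authors' simulation code; sm_entropy and pref_strength are our own helper functions) recomputes the Shannon vs. Non-Certainty disagreement for case 3 of Table 5 and recovers the DS value of about 0.30 discussed below.

import numpy as np

def sm_entropy(p, r, t):
    # Sharma-Mittal entropy; zero probabilities dropped, r = 1 and t = 1 handled as limits.
    p = np.array([x for x in p if x > 0], dtype=float)
    r = r if abs(r - 1) > 1e-12 else 1 + 1e-9
    t = t if abs(t - 1) > 1e-12 else 1 + 1e-9
    return (np.sum(p ** r) ** ((1 - t) / (1 - r)) - 1) / (1 - t)

def pref_strength(prior, test_E, test_F, r, t):
    # Normalized R(H, E) minus normalized R(H, F); the normalizer is the entropy of the
    # uniform prior, i.e., the value of a conclusive test for three equiprobable hypotheses.
    norm = sm_entropy([1/3, 1/3, 1/3], r, t)
    def R(test):
        return sm_entropy(prior, r, t) - sum(pe * sm_entropy(post, r, t) for pe, post in test)
    return (R(test_E) - R(test_F)) / norm

# Case 3 of Table 5: prior and the two tests, each given as [(P(datum), posterior), ...].
prior = [0.67, 0.10, 0.23]
E = [(0.74, [0.899, 0.1, 0.001]), (0.26, [0.001, 0.1, 0.899])]
F = [(0.45, [1, 0, 0]), (0.55, [0.40, 0.18, 0.42])]

shannon = pref_strength(prior, E, F, 1, 1)           # about +0.22: Shannon prefers E
non_certainty = pref_strength(prior, E, F, 30, 30)   # about -0.41: Non-Certainty prefers F
ds = np.sqrt(abs(shannon) * abs(non_certainty)) if shannon * non_certainty < 0 else 0.0
print(round(shannon, 3), round(non_certainty, 3), round(ds, 2))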

In the simulations, a variety of techniques were involved in order to maximize disagreement strength, including random generation of prior probabilities over H and of likelihoods for E and F, optimization of likelihoods alone, and joint optimization of likelihoods and priors. Each example reported here was found in the attempt to maximize DS for a particular pair of measures. We relied on the simulations largely as a heuristic tool, thus selecting and slightly adapting the numerical examples to make them more intuitive and improve clarity.[9]

[9] It is important to note that the procedures we used do not guarantee finding globally maximal solutions; thus, a failure to find a case of strong disagreement does not necessarily entail that no such case exists.

For each pair of R-measures in our sample of seven, at least one case of moderate or strong disagreement was found (Table 5). Thus, for each pairwise comparison one can identify probabilities for which the models make diverging claims about which test is more useful. In what follows, we append a short discussion to the cases in which Shannon entropy strongly disagrees with each competing model. Such discussion is illustrative and qualitative, to intuitively highlight the underlying properties of different models. Similar explications could be provided for all other pairwise comparisons, but are omitted for the sake of brevity.

Shannon vs. Non-Certainty Entropy (case 3 in Table 5; DS = 0.30). In its purest form, Non-Certainty entropy equals 0 if one hypothesis in H is known to be true with certainty, and 1 otherwise. As a consequence, the entropy reduction expected from a test E just amounts to the probability that full certainty will be achieved after E is performed. Within the Sharma-Mittal framework, this behavior can often be approximated by an entropy measure such as Tsallis of degree 30, as explained above.[10] One example where the expected reduction of Shannon and Non-Certainty entropy disagree significantly involves a prior P(H) = {0.67, 0.10, 0.23}. The Non-Certainty measure rates very poorly a test E such that P(H|e) = {0.899, 0.100, 0.001}, P(H|ē) = {0.001, 0.100, 0.899}, and P(e) = 0.74, and strongly prefers a test F such that P(H|f) = {1, 0, 0}, P(H|f̄) = {0.40, 0.18, 0.42}, and P(f) = 0.45, because the probability to attain full certainty from F is sizable (45%). The expected reduction of Shannon entropy implies the opposite ranking, because test E, while unable to provide full certainty, will invariably yield a highly skewed posterior as compared to the prior.

[10] One should note, however, that Tsallis 30, unlike pure non-certainty entropy, is a continuous function. As a consequence, the approximation described eventually fails when one gets very close to limiting cases. More precisely, Tsallis 30 entropy rapidly decreases for almost certain distributions such as, say, P(H) = {0.998, 0.001, 0.001}. In fact, Tsallis 30 entropy is sizable and almost constant if P(H) conveys a less-than-almost-certain state of belief, and becomes largely negligible otherwise.

Shannon vs. Origin and Hartley Entropy (case 5 in Table 5; DS = 0.56 and DS = 0.48, respectively). The reductions of both Origin and Hartley entropy share the idea of counting how many hypotheses are conclusively ruled out by the evidence. For example, with prior P(H) = {0.500, 0.499, 0.001}, the expected reduction of either Origin or Hartley entropy assigns value zero to a test E such that P(H|e) = {0.998, 0.001, 0.001}, P(H|ē) = {0.001, 0.998, 0.001}, and P(e) = 0.501, because no hypothesis is ever ruled out conclusively, and rather prefers a test F such that P(H|f) = {0.501, 0.499, 0}, P(H|f̄) = {0, 0.499, 0.501}, and P(f) = 0.998. The expected reduction of Shannon entropy implies the opposite ranking, because F will almost always yield only a tiny change in overall uncertainty.

Shannon vs. Non-Concave Entropy (case 6 in Table 5; DS = 0.26). For non-concave entropies, the expected entropy reduction may turn out to be negative, thus indicating an allegedly detrimental query, that is, a test whose expected utility is lower than that of a completely irrelevant test. This feature yields cases of significant disagreement between the expected reduction of our illustrative Non-Concave entropy, ent_{SM(20,0)}, and of classical concave measures such as Shannon. With a prior P(H) = {0.66, 0.17, 0.17}, the Non-Concave measure rates a test E such that P(H|e) = {1, 0, 0}, P(H|ē) = {1/3, 1/3, 1/3}, and P(e) = 0.49 much lower than an irrelevant test F such that P(H|f) = P(H|f̄) = P(H). Indeed, the non-concave R-measure assigns a significant negative value to test E. This critically depends on one interesting fact: for Non-Concave entropy, going from P(H) to a completely flat posterior, P(H|ē), is an extremely aversive outcome (i.e., it implies a very large increase in uncertainty), while the 49% chance of achieving certainty by datum e is not highly valued (a feature of low degree measures, as we know). The expected reduction of Shannon entropy implies the opposite ranking instead, as it conveys the principle that no test can be informationally less useful than an irrelevant test (such as F).
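A quick numerical check of this case (again an illustrative Python sketch with our own helper functions, mirroring the normalization described above) shows the negative value assigned to E by the non-concave measure and the positive value assigned by Shannon.

import numpy as np

def sm_entropy(p, r, t):
    # Sharma-Mittal entropy (zeros dropped; r = 1 and t = 1 treated as numerical limits).
    p = np.array([x for x in p if x > 0], dtype=float)
    r = r if abs(r - 1) > 1e-12 else 1 + 1e-9
    t = t if abs(t - 1) > 1e-12 else 1 + 1e-9
    return (np.sum(p ** r) ** ((1 - t) / (1 - r)) - 1) / (1 - t)

def normalized_R(prior, test, r, t):
    # Expected entropy reduction, normalized by the uniform three-hypothesis value.
    R = sm_entropy(prior, r, t) - sum(pe * sm_entropy(post, r, t) for pe, post in test)
    return R / sm_entropy([1/3, 1/3, 1/3], r, t)

prior = [0.66, 0.17, 0.17]
E = [(0.49, [1, 0, 0]), (0.51, [1/3, 1/3, 1/3])]   # possibly conclusive, possibly "disastrous"
F = [(0.5, prior), (0.5, prior)]                    # completely irrelevant test

for label, (r, t) in [("Non-Concave SM(20,0)", (20, 0)), ("Shannon", (1, 1))]:
    print(label, round(normalized_R(prior, E, r, t), 3), round(normalized_R(prior, F, r, t), 3))
# The Non-Concave measure gives E about -0.236 (worse than the irrelevant F at 0),
# while Shannon gives E about +0.288.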

Shannon vs. Quadratic Entropy (case 8 in Table 5; DS = 0.09). Shannon and Quadratic entropies are similar in many ways, yet at least cases of moderate disagreement can be found. One is with prior P(H) = {0.50, 0.14, 0.36}. Test E is such that P(H|e) = {0.72, 0.14, 0.14}, P(H|ē) = {0.14, 0.14, 0.72}, and P(e) = 0.62, while with test F one has P(H|f) = {0.5, 0.5, 0}, P(H|f̄) = {0.5, 0, 0.5}, and P(f) = 0.28. Expected Quadratic entropy reduction ranks E over F, as it puts a particularly high value on posterior distributions where one single hypothesis comes to prevail. In comparison, this is less important for the reduction of Shannon entropy, as long as some hypotheses are completely (or largely) ruled out, as occurs with F. Accordingly, the Shannon measure prefers F over E.

Shannon vs. Error Entropy (case 9 in Table 5; DS = 0.20). A stronger disagreement arises between Shannon and Error entropy. Consider prior P(H) = {0.50, 0.18, 0.32}, a test E such that P(H|e) = {0.65, 0.18, 0.17}, P(H|ē) = {0.17, 0.18, 0.65}, and P(e) = 0.69, and a test F such that P(H|f) = {0.5, 0.5, 0}, P(H|f̄) = {0.5, 0, 0.5}, and P(f) = 0.36. The expected reduction of Error entropy is significant with E but zero with F, because the latter will leave the modal probability untouched. (Note that it does not matter that the hypotheses with the maximum probability changed.) However, test F, unlike E, will invariably rule out a hypothesis that was a priori significantly probable, and for this reason is preferred by the Shannon R-measure.


Table 5. Cases of strong disagreement between seven measures of expected entropy reduction. Two binary tests E and F are considered for a ternary hypothesis set H. Preference strength is the difference between (normalized) values of expected entropy reduction for E and F, respectively: it is positive if test E is strictly preferred, negative if F is strictly preferred, and null if they are rated equally. The preference values most relevant to each comparison illustrate that, for each pair of R-measures in our sample of seven, the table includes at least one case of moderate or strong disagreement.

Case 1. P(H) = {0.50, 0.25, 0.25}. Test E: P(H|e) = {0.5, 0.5, 0}, P(H|ē) = {0.5, 0, 0.5}, P(e) = 0.5, P(ē) = 0.5. Test F: P(H|f) = {1, 0, 0}, P(H|f̄) = {1/3, 1/3, 1/3}, P(f) = 0.25, P(f̄) = 0.75. Preference strengths: Non-Certainty −0.250; Origin 0.250; Hartley 0.119; Non-Concave 0.250; Shannon 0.119; Quadratic 0; Error 0.
Case 2. P(H) = {0.67, 0.17, 0.17}. Test E: P(H|e) = {0.82, 0.17, 0.01}, P(H|ē) = {0.01, 0.17, 0.82}, P(e) = 0.8, P(ē) = 0.2. Test F: P(H|f) = {1, 0, 0}, P(H|f̄) = {1/3, 1/3, 1/3}, P(f) = 0.49, P(f̄) = 0.51. Preference strengths: Non-Certainty −0.487; Origin −0.490; Hartley −0.490; Non-Concave 0.394; Shannon 0.046; Quadratic 0.062; Error 0.240.
Case 3. P(H) = {0.67, 0.10, 0.23}. Test E: P(H|e) = {0.899, 0.1, 0.001}, P(H|ē) = {0.001, 0.1, 0.899}, P(e) = 0.74, P(ē) = 0.26. Test F: P(H|f) = {1, 0, 0}, P(H|f̄) = {0.40, 0.18, 0.42}, P(f) = 0.45, P(f̄) = 0.55. Preference strengths: Non-Certainty −0.409; Origin −0.450; Hartley −0.450; Non-Concave 0.342; Shannon 0.218; Quadratic 0.249; Error 0.329.
Case 4. P(H) = {0.6, 0.1, 0.3}. Test E: P(H|e) = {1, 0, 0}, P(H|ē) = {1/3, 1/6, 1/2}, P(e) = 0.4, P(ē) = 0.6. Test F: P(H|f) = {0.7, 0.3, 0}, P(H|f̄) = {0.55, 0, 0.45}, P(f) = 1/3, P(f̄) = 2/3. Preference strengths: Non-Certainty 0.400; Origin −0.100; Hartley 0.031; Non-Concave 0.045; Shannon 0.051; Quadratic 0.155; Error 0.150.
Case 5. P(H) = {0.5, 0.499, 0.001}. Test E: P(H|e) = {0.998, 0.001, 0.001}, P(H|ē) = {0.001, 0.998, 0.001}, P(e) = 0.501, P(ē) = 0.499. Test F: P(H|f) = {0.501, 0.499, 0}, P(H|f̄) = {0, 0.499, 0.501}, P(f) = 0.998, P(f̄) = 0.002. Preference strengths: Non-Certainty 0.942; Origin −0.500; Hartley −0.369; Non-Concave 0.499; Shannon 0.617; Quadratic 0.744; Error 0.746.
Case 6. P(H) = {0.66, 0.17, 0.17}. Test E: P(H|e) = {1, 0, 0}, P(H|ē) = {1/3, 1/3, 1/3}, P(e) = 0.49, P(ē) = 0.51. Test F: P(H|f) = {0.66, 0.17, 0.17}, P(H|f̄) = {0.66, 0.17, 0.17}, P(f) = 0.5, P(f̄) = 0.5. Preference strengths: Non-Certainty 0.490; Origin 0.490; Hartley 0.490; Non-Concave −0.236; Shannon 0.288; Quadratic 0.250; Error 0.
Case 7. P(H) = {0.53, 0.25, 0.22}. Test E: P(H|e) = {1, 0, 0}, P(H|ē) = {0.295, 0.375, 0.330}, P(e) = 1/3, P(ē) = 2/3. Test F: P(H|f) = {0.53, 0.25, 0.22}, P(H|f̄) = {0.53, 0.25, 0.22}, P(f) = 0.5, P(f̄) = 0.5. Preference strengths: Non-Certainty 0.333; Origin 0.333; Hartley 0.333; Non-Concave −0.123; Shannon 0.261; Quadratic 0.249; Error 0.080.
Case 8. P(H) = {0.50, 0.14, 0.36}. Test E: P(H|e) = {0.72, 0.14, 0.14}, P(H|ē) = {0.14, 0.14, 0.72}, P(e) = 0.62, P(ē) = 0.38. Test F: P(H|f) = {0.5, 0.5, 0}, P(H|f̄) = {0.5, 0, 0.5}, P(f) = 0.28, P(f̄) = 0.72. Preference strengths: Non-Certainty 0; Origin −0.500; Hartley −0.369; Non-Concave 0.293; Shannon −0.085; Quadratic 0.086; Error 0.330.
Case 9. P(H) = {0.50, 0.18, 0.32}. Test E: P(H|e) = {0.65, 0.18, 0.17}, P(H|ē) = {0.17, 0.18, 0.65}, P(e) = 0.69, P(ē) = 0.31. Test F: P(H|f) = {0.5, 0.5, 0}, P(H|f̄) = {0.5, 0, 0.5}, P(f) = 0.36, P(f̄) = 0.64. Preference strengths: Non-Certainty 0; Origin −0.180; Hartley −0.133; Non-Concave 0.213; Shannon −0.179; Quadratic −0.024; Error 0.225.
Case 10. P(H) = {0.42, 0.42, 0.16}. Test E: P(H|e) = {0.5, 0.5, 0}, P(H|ē) = {0, 0, 1}, P(e) = 0.84, P(ē) = 0.16. Test F: P(H|f) = {0.66, 0.24, 0.10}, P(H|f̄) = {0.10, 0.66, 0.24}, P(f) = 0.57, P(f̄) = 0.43. Preference strengths: Non-Certainty 0.160; Origin 0.580; Hartley 0.470; Non-Concave −0.146; Shannon 0.241; Quadratic 0.115; Error −0.120.


6. Model comparison: Prediction and behavior

Now that we have seen examples illustrating the theoretical properties of a variety of Sharma-Mittal relevance measures, we turn to addressing whether the Sharma-Mittal measures can help with psychological or normative theory of the value of information.

6.1. Comprehensive analysis of Wason's abstract selection task

The single most widely studied experimental information search paradigm is Wason's (1966) selection task. In the classical, abstract version, participants are presented with a conditional hypothesis (or "rule"), h = "if A [antecedent], then C [consequent]". The hypothesis concerns some cards, each of which has a letter on one side and a number on the other, for instance A = "the card has a vowel on one side" and C = "the card has an even number on the other side". One side is displayed for each of four cards: one instantiating A (e.g., showing letter E), one instantiating not-A (e.g., showing letter K), one instantiating C (e.g., showing number 4), and one instantiating not-C (e.g., showing number 7). Participants therefore have four information search options in order to assess the truth or falsity of hypothesis h: turning over the A, the not-A, the C, or the not-C card. They are asked to choose which ones they would pick up as useful to establish whether the hypothesis holds or not. All, none, or any subset of the four cards can be selected.

According to Wason's (1966) original, "Popperian" reading of the task, the A and not-C search options are useful because they could falsify h (by possibly revealing an odd number and a vowel, respectively), so a rational agent should select them. The not-A and C options, on the contrary, could not provide conclusively refuting evidence, so they are worthless in this interpretation. However, observed choice frequencies depart markedly from these prescriptions. In Oaksford and Chater's (1994, p. 613) meta-analysis, they were 89%, 16%, 62%, and 25% for A, not-A, C, and not-C, respectively. Oaksford and Chater (1994, 2003) devised Bayesian models of the task in which agents treat the four cards as sampled from a larger deck and are assumed to maximize the expected reduction of uncertainty, with Shannon entropy as the standard measure. Oaksford and Chater postulated a foil hypothesis h̄ in which A and C are statistically independent and a target hypothesis h under which C always (or almost always) follows A. In Oaksford and Chater's (1994) "deterministic" analysis, C always followed A under the dependence hypothesis h. A key innovation in Oaksford and Chater (2003, p. 291) was the introduction of an "exception" parameter, such that P(C|A) = 1 − P(exception) under h. The model also requires parameters a and g for the probabilities P(A) and P(C) of the antecedent and consequent of h. We implement Oaksford and Chater's (2003) model, positing a = 0.22 and g = 0.27 (according to the "rarity" assumption), and a uniform prior on H = {h, h̄}, as suggested in Oaksford and Chater (2003, p. 296). We explored the implications of calculating the expected usefulness of turning over each card, not only according to Shannon entropy reduction, but for the whole set of entropy measures from the Sharma-Mittal framework.[11]

[11] To fit the relevant patterns of responses, we pursued a variety of methods, including optimizing Hattori's "selection tendency function" (which maps expected entropy reduction onto the predicted probability that a card will be selected, see Hattori, 1999, 2002; also see Stringer, Borsboom, & Wagenmakers, 2011), or taking previously reported parameters for Hattori's selection tendency function; Spearman rank correlation coefficients; and Pearson correlations. Similar results were obtained across these methods. Because the rank correlations are simple to discuss, we focus on those here. Full simulation results for these and other measures, model variants with other values of P(exception), and Matlab code, are available from J.D.N.
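For readers who want to reproduce this kind of analysis, the following Python sketch spells out one way to implement the model just described (this is our own simplified reconstruction, not the authors' Matlab code; in particular, the way the hidden-side probabilities are derived from a, g, and P(exception) is an assumption on our part). It computes the expected Shannon entropy reduction of turning over each card.

import numpy as np

a, g, exc = 0.22, 0.27, 0.0        # P(A), P(C), P(exception); prior over {h_dep, h_ind} is uniform

def joint(dep):
    """Joint distribution over the two card sides (A x C) under one hypothesis.
    Under independence P(A, C) = a * g; under dependence P(C|A) = 1 - exc, with the
    marginals P(A) = a and P(C) = g held fixed (our reading of the rarity assumption)."""
    pAC = a * (1 - exc) if dep else a * g
    return {("A", "C"): pAC, ("A", "nC"): a - pAC,
            ("nA", "C"): g - pAC, ("nA", "nC"): 1 - a - g + pAC}

def shannon(p):
    p = np.array([x for x in p if x > 0], dtype=float)
    return -np.sum(p * np.log(p))

def key(visible, hidden):
    return (visible, hidden) if visible in ("A", "nA") else (hidden, visible)

def marg(dep, visible):
    return sum(v for k, v in joint(dep).items() if visible in k)

def card_value(visible, outcomes):
    """Expected Shannon entropy reduction over {h_dep, h_ind} from turning a card
    whose visible side is `visible`; `outcomes` are the possible hidden sides."""
    prior = np.array([0.5, 0.5])
    value = shannon(prior)
    for o in outcomes:
        lik = np.array([joint(dep)[key(visible, o)] / marg(dep, visible) for dep in (True, False)])
        p_o = float(np.dot(prior, lik))
        if p_o > 0:
            value -= p_o * shannon(prior * lik / p_o)
    return value

for card, hidden in [("A", ("C", "nC")), ("nA", ("C", "nC")), ("C", ("A", "nA")), ("nC", ("A", "nA"))]:
    print(card, round(card_value(card, hidden), 3))
# With exc = 0 this yields the ordering A > C > not-C > not-A, matching the rank
# order of the observed selection frequencies (89%, 62%, 25%, 16%).

Replacing the Shannon helper with a Sharma-Mittal entropy of arbitrary order and degree gives the full grid of relevance values discussed in what follows.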

Empirical Data. We first address how well different expected entropy reduction measures correspond to empirical aggregate card selection frequencies in the task, with respect to Oaksford and Chater's (2003) model. For the selection frequencies, we use the abstract selection task data as reported by Oaksford and Chater (1994, p. 613) and mentioned above (89%, 16%, 62%, and 25% for A, not-A, C, and not-C, respectively). Figure 6 (top row) shows the rank correlation between relevance values and empirical selection frequencies for each order and degree value from 0 to 20, in steps of 0.25. First consider results for the model with P(exception) = 0 (Figure 6, top left subplot). A wide range of measures, including the expected reduction of Shannon and Quadratic entropy, of some non-concave entropies (e.g., of order 10 with degree below the Arimoto curve), and of measures with fairly high degree (e.g., of order 10 with degree well above it), correlate perfectly with the rank of selection frequencies. However, if a high degree measure with moderate or high order is used, the rank correlation is not perfect. Consider for instance the Tsallis measure of degree 20 (i.e., R_{SM(20,20)}). This leads to relevance values for the A, not-A, C, and not-C cards of 0.0281, 0.0002, 0.0008, and 0.0084, respectively. Because the relative ordering of the C and the not-C card is incorrect (from the perspective of observed choices), the rank correlation is only 0.8. The same rank correlation of 0.8 is obtained, but for a different reason, from strongly non-concave relevance measures. R_{SM(20,0)}, for instance, gives values of 1.181, 0.380, 1.054, and 0.372 (again for the A, not-A, C, and not-C cards, respectively), so that the not-A card is deemed more informative than the not-C card by this relevance measure.
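The rank-correlation comparison itself is straightforward to reproduce. The short Python sketch below (illustrative; Spearman's coefficient is computed by hand to keep it self-contained) recovers the value of 0.8 for the Tsallis degree-20 relevance values just quoted.

def spearman(x, y):
    """Spearman rank correlation for lists without ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

frequencies = [89, 16, 62, 25]                   # A, not-A, C, not-C
tsallis20 = [0.0281, 0.0002, 0.0008, 0.0084]     # R_SM(20,20) relevance values
print(spearman(tsallis20, frequencies))          # 0.8: C and not-C are swapped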

Let us now consider the expected reduction of Origin entropy, R_{SM(0,0)}, as an example of the 0-order measures. It gives relevance values of 0.527, 0, 0, and 0.159 for the A, not-A, C, and not-C cards, respectively. This is similar to Wason's analysis of the task: only the A and the not-C cards can falsify a hypothesis (namely, the dependence hypothesis h), thus only those two cards have value. The other cards could change the relative plausibility of h vs. h̄; however, according to 0-order measures, no informational value is achieved because no hypothesis is definitely ruled out. In this sense, 0-order measures can be thought of as bringing elements of the original logical interpretation of the selection task into the same unified information-theoretic framework including Shannon and generalized entropies (see below for more on this). Interestingly, this does not imply that the A and the not-C cards are equally valuable: in the model, the A card offers a higher chance of falsifying h than the not-C card, so it is more valuable, according to this analysis. Thus, while incorporating the basic idea of the importance of possible falsification, the 0-order Sharma-Mittal formalization of informational value offers something that the standard logical reading does not: a rationale for assessing the relative value among those queries (the A and the not-C card) providing the possibility of falsifying a hypothesis. The Origin entropy values and the empirical data agree that the A card is most useful and (up to a tie) that the not-A card is least useful, but disagree on virtually everything else; R_{SM(0,0)}'s rank correlation to empirical card selection frequencies is 0.6325.

What if Oaksford and Chater's (2003) model is combined with exception parameter P(exception) = 0.1, rather than 0? In this case, the empirical selection frequencies perfectly correlate with the theoretical values for an even wider range of measures than for the "deterministic" model (Figure 6, top right plot). For instance, Tsallis of degree 11, i.e., R_{SM(11,11)}, which had a rank correlation of 0.8 with P(exception) = 0, has a perfect rank correlation with 0.1. This is due to the relative ordering of the C and the not-C cards. For the P(exception) = 0 model, the A, not-A, C, and not-C cards had R_{SM(11,11)} relevance of 0.059, 0.002, 0.012, and 0.016, respectively; with P(exception) = 0.1, the cards' respective relevance values are 0.019, 0.001, 0.007, and 0.005. In addition, a dramatic difference between P(exception) = 0 and P(exception) = 0.1 arises for the 0-order measures. If P(exception) > 0, even if very small, no amount of obtained data can ever lead to ruling out a hypothesis in the model. Therefore, with P(exception) = 0.1 all cards have zero value for 0-order measures, and the correlation with behavioral data is undefined (plotted black in Figure 6).

A probabilistic understanding of Wason's normative indications. Finally, we discuss how well the expected informational value of the cards, as calculated using Oaksford and Chater's (2003) model and various Sharma-Mittal measures, corresponds to Wason's original interpretation of the task. We thus conducted the same analyses as above, but instead of using the human selection frequencies we assumed that the A card was selected with 100%, the not-A card with 0%, the C card with 0%, and the not-C card with 100% probability. The 0-order relevance measures, again within Oaksford and Chater's (2003) model with P(exception) = 0, provide a probabilistic understanding of Wason's normative indications. Like Wason, the 0-order measures deem only the A and the not-C cards to be useful when P(exception) = 0. The rank correlation with theoretical selection frequencies from Wason's analysis is 0.94 (see Figure 6, bottom left plot). Why is the correlation not perfect? The probabilistic understanding proposed, as discussed above, goes beyond the logical analysis: because the A card offers a higher probability of falsification than the not-C card does in the probability model, the 0-order relevance measures value the former more than the latter. Recall that our hypothetical participants always select both cards that entail the possibility of falsifying the dependence hypothesis; thus, the correlation is less than one. The worst correlation with Wason's ranking is from the strongly non-concave measures, such as R_{SM(20,0)}; this correlation is exactly zero.


Figure 6. Plots of rank correlation values for the expected reduction of various Sharma-Mittal entropies in Oaksford and Chater's (2003) model of the Wason selection task. (Panel columns: Oaksford and Chater's (2003) model with P(exception) = 0 vs. with P(exception) = 0.1; panel rows: human data vs. Wason's theory.) In the top row, models of expected entropy reduction are compared with empirical aggregate card selection frequencies. In the bottom row, instead, the comparison is with theoretical choices implied by Wason's original analysis of the task. In the left vs. right columns the conditional probability representation of "if vowel, then even number" rules out exceptions or allows for them (with probability 0.1), respectively.


The Wason selection task illustrates the theoretical potential of the Sharma-Mittal framework. Whereas other authors noted the robustness of probabilistic analyses of the task across different measures of informational utility (see Fitelson & Hawthorne, 2010; Nelson, 2005, pp. 985-986; Oaksford & Chater, 2007), the variety of measures involved in those analyses arose in an ad hoc way. We extend those results, and show that even the traditional, allegedly anti-Bayesian reading of the task can be recovered smoothly in one overarching framework. In particular, the implications of Wason's Popperian interpretation can be represented well by the maximization of the expected reduction of an entirely balanced (order-0) Sharma-Mittal measure (such as Origin or Hartley entropy) in a deterministic reading of the task (i.e., with P(exception) = 0). Conversely, this means that adopting a probabilistic approach to Wason's task is not by itself sufficient to account for observed behavior. Even then, in fact, people's choices would still diverge from at least some theoretically viable models of information search.

6.2. Information search in experience-based studies

Is the same expected uncertainty reduction measure able to account for human behavior across a variety of tasks? To explore this issue, we reviewed experimental scenarios employed in experience-based investigations of information search behavior. In this experimental paradigm, participants learn the underlying statistical structure of an environment where items (plankton specimens) are visually displayed and subject to a binary classification (kind A vs. B) for which two binary features (yellow vs. black eye; dark vs. light claw) are potentially relevant. Immediate feedback is provided after each trial in a learning phase, until a performance criterion is reached, indicating adequate mastery of the environmental statistics. In a subsequent information-acquisition test phase of this procedure, both of the two features (eye and claw) are obscured, and participants have to select the most informative/useful feature relative to the target categories (kinds of plankton). (See Nelson et al., 2010, for a detailed description.) In our current terms, these scenarios concern a binary hypothesis space H = {specimen of kind A, specimen of kind B} and two binary tests E = {yellow eye, black eye} and F = {dark claw, light claw}. In each case, the experience-based learning phase conveyed the structure of the joint probability distribution P(H, E, F) to participants. The test phase, in which either feature E or F can be viewed, represents a way to see whether the participants deemed R_P(H, E) or R_P(H, F) to be greater.

Table 6. Choices between two binary tests/experiments (E vs. F) for a binary classification problem (H) in experience-based experimental procedures. Cases 1-3 are taken from Nelson et al. (2010, Exp. 1); cases 4-5 from Exp. 3 in the same article; case 6 is an unpublished study using the same experimental procedure; cases 7-8 are from Meder and Nelson (2012, Exp. 1).

Case 1. P(H) = {0.7, 0.3}. Test E: P(H|e) = {0, 1}, P(H|ē) = {0.754, 0.246}, P(e) = 0.072, P(ē) = 0.928. Test F: P(H|f) = {1, 0}, P(H|f̄) = {0.501, 0.499}, P(f) = 0.399, P(f̄) = 0.601. Observed choices of E: 82% (23/28).
Case 2. P(H) = {0.7, 0.3}. Test E: P(H|e) = {0, 1}, P(H|ē) = {0.767, 0.233}, P(e) = 0.087, P(ē) = 0.913. Test F: P(H|f) = {1, 0}, P(H|f̄) = {0.501, 0.499}, P(f) = 0.399, P(f̄) = 0.601. Observed choices of E: 82% (23/28).
Case 3. P(H) = {0.7, 0.3}. Test E: P(H|e) = {0.109, 0.891}, P(H|ē) = {0.978, 0.022}, P(e) = 0.320, P(ē) = 0.680. Test F: P(H|f) = {1, 0}, P(H|f̄) = {0.501, 0.499}, P(f) = 0.399, P(f̄) = 0.601. Observed choices of E: 97% (28/29).
Case 4. P(H) = {0.7, 0.3}. Test E: P(H|e) = {0, 1}, P(H|ē) = {0.733, 0.267}, P(e) = 0.045, P(ē) = 0.955. Test F: P(H|f) = {1, 0}, P(H|f̄) = {0.501, 0.499}, P(f) = 0.399, P(f̄) = 0.601. Observed choices of E: 89% (8/9).
Case 5. P(H) = {0.7, 0.3}. Test E: P(H|e) = {0.201, 0.799}, P(H|ē) = {0.780, 0.220}, P(e) = 0.139, P(ē) = 0.861. Test F: P(H|f) = {1, 0}, P(H|f̄) = {0.501, 0.499}, P(f) = 0.399, P(f̄) = 0.601. Observed choices of E: 70% (14/20).
Case 6. P(H) = {0.7, 0.3}. Test E: P(H|e) = {0.135, 0.865}, P(H|ē) = {0.848, 0.152}, P(e) = 0.208, P(ē) = 0.792. Test F: P(H|f) = {1, 0}, P(H|f̄) = {0.501, 0.499}, P(f) = 0.399, P(f̄) = 0.601. Observed choices of E: 70% (14/20).
Case 7. P(H) = {0.44, 0.56}. Test E: P(H|e) = {0.595, 0.405}, P(H|ē) = {0.331, 0.669}, P(e) = 0.414, P(ē) = 0.586. Test F: P(H|f) = {0, 1}, P(H|f̄) = {0.502, 0.498}, P(f) = 0.123, P(f̄) = 0.877. Observed choices of E: 60% (12/20).
Case 8. P(H) = {0.36, 0.64}. Test E: P(H|e) = {0.090, 0.910}, P(H|ē) = {0.707, 0.293}, P(e) = 0.562, P(ē) = 0.438. Test F: P(H|f) = {0, 1}, P(H|f̄) = {0.501, 0.499}, P(f) = 0.282, P(f̄) = 0.718. Observed choices of E: 79% (15/19).

Overall, we found eight relevant experimental scenarios from the experimental paradigm described above (they are listed in Table 6) in which there was at least some interesting disagreement among the Sharma-Mittal measures about which feature is more useful. For each, we derived values of expected uncertainty reduction from Sharma-Mittal measures of order and degree from 0 to 20, in increments of 0.25, and we computed the simple proportion of cases in which each measure's ranking of R_P(H, E) and R_P(H, F) matched the most prevalent observed choice.
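A sketch of this scoring procedure, in Python and with our own helper functions (the authors' full grid analysis is in their Matlab code), applied to case 1 of Table 6: it checks, for a few (r, t) settings, whether the measure ranks E above F, which is what most participants chose.

import numpy as np

def sm_entropy(p, r, t):
    p = np.array([x for x in p if x > 0], dtype=float)
    r = r if abs(r - 1) > 1e-12 else 1 + 1e-9
    t = t if abs(t - 1) > 1e-12 else 1 + 1e-9
    return (np.sum(p ** r) ** ((1 - t) / (1 - r)) - 1) / (1 - t)

def R(prior, test, r, t):
    return sm_entropy(prior, r, t) - sum(pe * sm_entropy(post, r, t) for pe, post in test)

# Table 6, case 1 (the majority of participants chose E).
prior = [0.7, 0.3]
E = [(0.072, [0, 1]), (0.928, [0.754, 0.246])]
F = [(0.399, [1, 0]), (0.601, [0.501, 0.499])]

# Error entropy is the r -> infinity, t = 2 limit; r = 100 is used here as an approximation.
settings = {"Shannon (1, 1)": (1, 1), "Error-like (100, 2)": (100, 2),
            "Arimoto-like (10, 1.9)": (10, 1.9)}
for label, (r, t) in settings.items():
    prefers_E = R(prior, E, r, t) > R(prior, F, r, t)
    print(label, "prefers E" if prefers_E else "prefers F")
# Shannon prefers F here, whereas the high-order measures prefer E, in line with behavior;
# the full analysis repeats this check over all eight cases and a 0-20 grid of (r, t) values.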

Nelson et al. (2010) devised their scenarios to dissociate predictions from a sample of competing and historically influential models of rational information search. Their conclusion was that the expected reduction of Error entropy (expected probability gain, in their terminology) accounted for participants' behavior and outperformed the expected reduction of Shannon entropy (expected information gain, in their terminology). A more comprehensive analysis within our current approach implies a richer picture. The dataset employed can be accurately represented in the Sharma-Mittal framework for a significant range of degree values provided that the order parameter is high enough (the results are displayed in Figure 7, left side). Observed choices are especially consistent with the expected reduction of a quite unbalanced (e.g., r ≥ 4), concave or quasi-concave (t close to 2) Sharma-Mittal entropy measure. Importantly, there is overlap between results from modeling the Wason selection task and these experience-based learning data, giving hope to the idea that a unified theoretical explanation of human behavior may extend across several tasks.

6.3. Information search in words-and-numbers studies

The experience-based learning tasks discussed above were inspired by analogous tasks in which the prior probabilities of categories and feature likelihoods were presented to participants using words and numbers (e.g., Skov and Sherman, 1986). We refer to such tasks as Planet Vuma experiments, reflecting the typically whimsical content, such as classifying species of aliens on Planet Vuma, designed to not conflict with people's experience with real object categories.

Whereas expected reduction of Error entropy, and other models as discussed above, gives a plausible explanation of the experience-based learning task data, individual data in words-and-numbers studies are very noisy, and no attempt has been made to see whether a unified theory could account for the modal responses across these tasks. We therefore re-analyzed empirical data from several Planet Vuma experiments, in a manner analogous to our analyses of the experience-based learning data above (Figure 7). What do the results show? To our surprise, the results suggest that there may be a systematic explanation of people's behavior on words-and-numbers-based tasks.

Figure 7. (Left panel: experience-based learning, Plankton; right panel: words-and-numbers presentation, Planet Vuma.) On the left, a graphical illustration of the empirical accuracy of Sharma-Mittal measures relative to binary information search choices in 8 experience-based experimental scenarios (described in Table 6). The shade at each point illustrates the proportion of choices (out of 8) correctly predicted by the expected reduction of the corresponding underlying entropy, with white and black indicating maximum (8/8) and minimum (0/8) accuracy, respectively. Results suggest that an Arimoto metric of moderate or high order is highly consistent with human choices. On the right, an illustration of the empirical accuracy of Sharma-Mittal measures in theoretically similar tasks, but where probabilistic information is presented in a standard explicit format (with numeric prior probabilities and test likelihoods). In these tasks, individual participants' test choices are highly noisy. Can a systematic theory still account for the modal results across tasks? We analyzed 13 cases (described in Table 7) of binary information search preferences. The shade at each point illustrates the proportion of comparisons (out of 13) correctly predicted by the expected reduction of the corresponding underlying entropy, with white and black again indicating maximum (13/13) and minimum (0/13) accuracy, respectively. Results show that a wide range of measures is consistent with available experimental findings, including Shannon entropy as well as a variety of high-degree measures (degree much higher than the Arimoto curve).


Table 7. Choices between two binary tests/experiments (E vs. F) for a binary classification problem (H) in words-and-numbers (Planet Vuma) experiments. Cases 1-6 are from Nelson (2005); case 7 is from Skov & Sherman (1986); cases 8-10 are from Nelson et al. (2010, Exp. 1); cases 11-13 from Wu et al. (in press, Exp. 1-3). In each case, test E was deemed more useful than test F by the participants. We only report scenarios for which at least two Sharma-Mittal measures strictly disagree about which of the tests has higher expected usefulness. (Thus, not all feature queries involved in the original articles are listed here.) Nelson (2005) asked participants to give a rank ordering among four possible features' information values. Here we list the six corresponding pairwise comparisons, in each case labeling the feature that was ranked higher as the favorite one (E). Wu et al. (in press) studied 14 different probability, natural frequency, and graphical information formats for the presentation of relevant probabilities. For comparison with other studies, we take results only from the standard probability format here.

Case 1. P(H) = {0.5, 0.5}; Test E: P(e|h) = 0.70, P(e|h̄) = 0.30; Test F: P(f|h) = 0.99, P(f|h̄) = 1.00.
Case 2. P(H) = {0.5, 0.5}; Test E: P(e|h) = 0.30, P(e|h̄) = 0.0001; Test F: P(f|h) = 0.99, P(f|h̄) = 1.00.
Case 3. P(H) = {0.5, 0.5}; Test E: P(e|h) = 0.01, P(e|h̄) = 0.99; Test F: P(f|h) = 0.99, P(f|h̄) = 1.00.
Case 4. P(H) = {0.5, 0.5}; Test E: P(e|h) = 0.30, P(e|h̄) = 0.0001; Test F: P(f|h) = 0.70, P(f|h̄) = 0.30.
Case 5. P(H) = {0.5, 0.5}; Test E: P(e|h) = 0.01, P(e|h̄) = 0.99; Test F: P(f|h) = 0.30, P(f|h̄) = 0.0001.
Case 6. P(H) = {0.5, 0.5}; Test E: P(e|h) = 0.01, P(e|h̄) = 0.99; Test F: P(f|h) = 0.70, P(f|h̄) = 0.30.
Case 7. P(H) = {0.5, 0.5}; Test E: P(e|h) = 0.90, P(e|h̄) = 0.55; Test F: P(f|h) = 0.65, P(f|h̄) = 0.30.
Case 8. P(H) = {0.7, 0.3}; Test E: P(e|h) = 0.57, P(e|h̄) = 0; Test F: P(f|h) = 0, P(f|h̄) = 0.24.
Case 9. P(H) = {0.7, 0.3}; Test E: P(e|h) = 0.57, P(e|h̄) = 0; Test F: P(f|h) = 0, P(f|h̄) = 0.29.
Case 10. P(H) = {0.7, 0.3}; Test E: P(e|h) = 0.05, P(e|h̄) = 0.95; Test F: P(f|h) = 0.57, P(f|h̄) = 0.
Case 11. P(H) = {0.7, 0.3}; Test E: P(e|h) = 0.41, P(e|h̄) = 0.93; Test F: P(f|h) = 0.03, P(f|h̄) = 0.30.
Case 12. P(H) = {0.7, 0.3}; Test E: P(e|h) = 0.43, P(e|h̄) = 1.00; Test F: P(f|h) = 0.04, P(f|h̄) = 0.37.
Case 13. P(H) = {0.72, 0.28}; Test E: P(e|h) = 0.03, P(e|h̄) = 0.83; Test F: P(f|h) = 0.39, P(f|h̄) = 1.00.


The degree of the most plausible measures is considerably above the Arimoto curve, although not as high as, for instance, Non-Certainty entropy (order 30). From a descriptive psychological standpoint, a plausible interpretation is that when confronted with words-and-numbers-type tasks, people have a strong focus on the chances of obtaining a certain or near-to-certain result, and are less concerned with (or, perhaps, attuned to) the details of the individual items in the probability distribution. The Sharma-Mittal framework provides a potential explanation for heretofore perplexing experimental results, while also highlighting key questions (e.g., how much preference for near-certainty, exactly, do subjects have) for future empirical research on words-and-numbers tasks.

6.4. Unifying theory and intuition in the Person Game (Having your cake and eating it too)

In this section, we introduce another theoretical conundrum from the literature, and show how the Sharma-Mittal framework may help solve it. As pointed out above, the expected reduction of Error entropy had appeared initially to provide the best explanation of people's intuitions and behavior on experience-based-learning information search tasks (Nelson et al., 2010). But this model leads to potentially counterintuitive behavior on another interesting kind of information search task, namely the Person Game (a variant of the Twenty Questions game). In this game, n cards (say, 20) with different faces are presented. One of those faces has been chosen at random (with equal probability) to be the correct face in a particular round of the game. The player's task is to find the true face in the smallest number of yes/no questions about physical features of the faces. For instance, asking whether the person has a beard would be a possible question, E = {e, ē}, with e = beard and ē = no beard. If k < n is the number of characters with a beard, then P(e) = k/n and P(ē) = (n − k)/n. Moreover, a "yes" answer will leave k equiprobable guesses still in play, and a "no" answer n − k such guesses.

Several papers have reported (see Nelson et al., 2014, for references) that people preferentially ask about features that are possessed by close to 50% of the remaining possible items, thus with P(e) close to 0.5. This strategy can be labelled the split-half heuristic. It is optimal to minimize the expected number of questions needed under some task variants (Navarro & Perfors, 2011), although not in the general case (Nelson, Meder, & Jones, 2016), and can be accounted for using expected Shannon entropy reduction. But expected Shannon entropy reduction cannot account for people's behavior on experience-based learning information search tasks, as our above analyses show. Can expected Error entropy reduction account for these results and intuitions? Put more broadly, can the same entropy model provide a satisfying account for both the Person Game and the experience-based learning tasks? As it happens, Error entropy cannot account for the preference to split the remaining items close to 50%. In fact, every possible question (unless its answer is known already, because none or all of the remaining faces have the feature) has exactly the same expected Error entropy reduction, namely 1/k, where there are k items remaining (Nelson, Meder, & Jones, 2016). This might lead us to wonder whether we must have different entropy/information models to account for people's intuitions and behavior across these different tasks. Indeed, it would call into question the potential for a unified and general purpose theory of the psychological value of information.

It turns out that the findings on why expected Shannon entropy reduction favors questions close to a 50:50 split, and why Error entropy has no such preference, apply much more generally than to Shannon and Error entropy. In fact, for all Sharma-Mittal measures, the ordinal evaluation of questions in the Person Game is solely a function of the degree of the entropy measure, and has nothing to do with the order of the measure [Suppl Mat, 5]. Among other things, this implies that all entropy-based measures with degree t = 1 have the exact same preferences as expected Shannon entropy reduction, and all of them quantify the usefulness of querying a feature as a function of the proportion of remaining items that possess that feature. Similarly, all degree-2 measures, and not only Error entropy, deem all questions to be equally useful in the Person Game. The core of this insight stems from the fact that, if a probability distribution is uniform, then the entropy of that distribution depends only on the degree of a Sharma-Mittal entropy measure. More formally, for any set of hypotheses H = {h1, h2, …, hn} with a uniform probability distribution U(H):

ent_{SM(r,t)}(U(H)) = ln_t(n)
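A small numerical illustration (Python, with our own helpers; the deformed logarithm ln_t(x) = (x^(1−t) − 1)/(1 − t) is the standard one and is used here only as a check) confirms both the order-independence on uniform distributions and the degree-driven ranking of Person Game questions discussed next.

import numpy as np

def sm_entropy(p, r, t):
    p = np.array([x for x in p if x > 0], dtype=float)
    r = r if abs(r - 1) > 1e-12 else 1 + 1e-9
    t = t if abs(t - 1) > 1e-12 else 1 + 1e-9
    return (np.sum(p ** r) ** ((1 - t) / (1 - r)) - 1) / (1 - t)

def ln_t(x, t):
    return np.log(x) if abs(t - 1) < 1e-12 else (x ** (1 - t) - 1) / (1 - t)

# Order-independence on a uniform distribution: the same value for any order r.
n = 40
print([round(sm_entropy([1 / n] * n, r, 3.0), 6) for r in (0.5, 1, 2, 10)], round(ln_t(n, 3.0), 6))

def question_value(n, k, t):
    """Expected entropy reduction of a k : (n - k) question in the Person Game
    (any order r gives the same value, so only the degree t matters)."""
    r = 2.0  # arbitrary choice of order; it does not affect the result
    return sm_entropy([1 / n] * n, r, t) - (k / n) * sm_entropy([1 / k] * k, r, t) \
           - ((n - k) / n) * sm_entropy([1 / (n - k)] * (n - k), r, t)

for t in (1.0, 2.0, 3.0):
    even, uneven = question_value(40, 20, t), question_value(40, 1, t)
    print(f"t={t}: 20:20 split -> {even:.4f}, 1:39 split -> {uneven:.4f}")
# t = 1: the even split wins; t = 2: every split is worth exactly 1/40; t = 3: the 1:39 split wins.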


Figure 8. The expected entropy reduction of a binary question E = {e, ē} in the Person Game with a hypothesis set H of size 40 (the possible guesses, that is, characters initially in play) as a function of the proportion of possible guesses remaining after getting datum e (e.g., a "yes" answer to "has the chosen person a beard?"). Questions are deemed most valuable with the zero-degree entropy measures (bottom right plot). Although the shape of the curve is similar for the degree t = 0 and degree t = 1 measures, the actual information value (see the y axis) decreases as the degree increases. For degree t = 2 (for example for Error entropy), every question is equally useful (provided that there is some uncertainty about the answer; bottom left plot). If the degree is greater than 2, then the least-equally-split questions (e.g., 1:39 questions, in the case of 40 items) are deemed most useful (left column, top and middle row). The order parameter is irrelevant for purposes of evaluating questions' expected usefulness in the Person Game, because all prior and possible posterior probability distributions are uniform (see text).


Figure 8 shows how possible questions are valued, in the Person Game, as a function of the proportion of remaining items that possess a particular feature. We see that if t = 1, as for Shannon and all Rényi entropies, questions with close to a 50:50 split are preferred. If the degree t is greater than 1 but less than 2, questions with close to a 50:50 split are still preferred, but less so. If t = 2, then 1:99 and 50:50 questions are deemed equally useful. Remarkably, if the degree is greater than 2, then a 1:99 question is preferred to a 50:50 question.

While the choice of particular Sharma-Mittal measures is only partly constrained by observed preferences in the Person Game alone (and specifically the value of the order parameter r is not), nothing in principle would guarantee that a joint and coherent account of such behavior and other findings exists. It is then important to point out that one can, in fact, pick an entropy measure whereby the experience-based data above follow along with a greater informative value for 50:50 questions than for 1:99 questions in the Person Game. For instance, medium-order Arimoto entropies (such as ent_{SM(10,1.9)}) will work.

7. General discussion

In this paper, we have presented a general framework for the formal analysis of uncertainty, the Sharma-Mittal entropy formalism. This framework generates a comprehensive approach to the informational value of queries (questions, tests, experiments) as the expected reduction of uncertainty. The amount of theoretical insight and unification achieved is remarkable, in our view. Moreover, such a framework can help us understand existing empirical results, and point out important research questions for future investigation of human intuition and reasoning processes as concerns uncertainty and information search.

Mathematically, the parsimony of the Sharma-Mittal formalism is appealing and yields decisive advantages in analytic manipulations, derivations, and calculations, too. Within the domain of cognitive science, no earlier attempt has been made to unify so many existing models concerning information search/acquisition behavior. Notably, this involves both popular candidate rational measures of informational utility (such as the expected reduction of Shannon or Error entropy) and avowed heuristic models, such as Baron et al.'s (1988, p. 106) quasi-Popperian heuristic (maximization of the expected number of hypotheses ruled out, i.e., the expected reduction of Origin entropy) and Nelson et al.'s (2010, p. 962) "probability-of-certainty" heuristic (closely approximated by the expected reduction of a high degree Tsallis entropy, or a similar measure). In addition, once applied to uncertainty and information search, the Sharma-Mittal parameters are not mere mathematical constructs, but rather capture cognitively and behaviorally meaningful ideas. Roughly, the order parameter, r, captures how much one disregards minor hypotheses (via the kind of mean applied to the probability values in P(H)). The degree parameter t, on the other hand, captures how much one cares about getting (very close) to certainty (via the behavior of the surprise/atomic information function; see Figure 3). Thus, a high order indicates a strong focus on the prevalent (most likely) element in the hypothesis set and a lack of consideration for minor possibilities. A very low order, on the other hand, implies a Popperian or quasi-Popperian attitude in the assessment of tests, with a marked appreciation of potentially falsifying or almost falsifying evidence. The degree parameter, in turn, has important implications for how much potentially conclusive experiments are valued, as compared to experiments that are informative but not conclusive. Moreover, for each particular order, if the degree is higher than that of the corresponding Arimoto entropy (and in any case if the order is less than 0.5 or the degree is at least 2), then the concavity of the entropy measure guarantees that no experiment will be rated as having negative expected usefulness.

Even according to fairly cautious views such as Aczél's (1984), the above remarks seem to provide a fairly strong motivation to consider pursuing a generalized approach. Here is another possible concern, however. Uncertainty and the informational value of tests may be involved in many arguments concerning human cognition. Now we see that those notions can be formalized in many different ways, such that different properties (say, additivity, or non-negativity) are or are not implied. Thus, the arguments at issue might be valid for some choices of the corresponding measures and not for others. This point has been labelled the issue of measure-sensitivity in related areas (Fitelson, 1999): is it something to be worried about? Does it raise problems for our proposal?


It is not uncommon for measure-sensitivity to foster skeptical or dismissive reactions about the prospects of the formal analysis of the concept at issue (e.g., Hurlbert, 1971; Kyburg & Teng, 2001, pp. 98 ff.). However, measure-sensitivity is a widespread and mundane phenomenon. In areas related to the formal analysis of reasoning, the issue arises, for instance, for Bayesian theories of inductive confirmation (e.g., Brössel, 2013; Crupi & Tentori, 2016; Festa & Cevolani, 2016; Glass, 2013; Hájek & Joyce, 2008; Roche & Shogenji, 2014), scoring rules and measures of accuracy (e.g., D'Agostino & Sinigaglia, 2010; Leitgeb & Pettigrew, 2010a, b; Levinstein, 2012; Predd et al., 2009), and measures of causal strength (e.g., Griffiths & Tenenbaum, 2005, 2009; Fitelson & Hitchcock, 2011; Meder, Mayrhofer, & Waldmann, 2014; Sprenger, 2016). Our treatment contributes to making the same point explicit for measures of uncertainty and the informational value of experiments. This we see as a constructive contribution. The prominence of one specific measure in one research domain may well have been partly affected by historical contingencies. As a consequence, when a theoretical or experimental inference relies on the choice of one measure, it makes sense to check how robust it is across different choices or, alternatively, to acknowledge which measure-specific properties support the conclusion and how compelling they are. Having a plurality of related measures available is indeed an important opportunity. It prompts thorough investigation of the features of alternative options and their relationships (e.g., Crupi, Chater, & Tentori, 2013; Huber & Schmidt-Petri, 2009; Nelson, 2005, 2008), it can provide a rich source of tools for both theorizing and the design of new experimental investigations (e.g., Rusconi et al., 2014; Schupbach, 2011; Tentori et al., 2007), and it makes it possible to tailor specific models to varying tasks and contexts within an otherwise coherent approach (e.g., Crupi & Tentori, 2014; Dawid & Musio, 2014; Oaksford & Hahn, 2007).

Which Sharma-Mittal measures are more consistent with observed behavior overall? According to our analyses, a subset of Sharma-Mittal information search models receives a significant amount of convergent support. We found that measures of high but finite order accounting for the experience-based (plankton task) data (Figure 7, left side) are also empirically adequate for abstract selection task data (Figure 6, top row) and results from a Twenty Questions kind of task such as the Person Game (Figure 8). On the other hand, the best fit with words-and-numbers (Planet Vuma) information search tasks indicates a different kind of model within the Sharma-Mittal framework (Figure 7, right side). For these cases, our analysis thus suggests that people's behavior may comply with different measures in different situations, so a key question arises about the features of a task which affect such variation in a consistent way, such as a comparably stronger appreciation of certainty or quasi-certainty as prompted by an experimental procedure conveying environmental statistics by explicit verbal and numerical stimuli.

Beyond this broad outlook, our discussion also allows for the resolution of a number of puzzles. Let us mention a last one. Nelson et al. (2010) had concluded from their experimental investigations that human information search in an experience-based setting was appropriately accounted for by maximization of the expected reduction of Error entropy. This specific model, however, exhibits some questionable properties related to its lack of mathematical continuity: in particular, if the most likely hypothesis in H is not changed by any possible evidence in E, then the latter has no informational utility whatsoever according to ent_Error, no matter if it can rule out other non-negligible hypotheses in the set (see, e.g., cases 1 and 6 in Table 6). Findings from Baron et al. (1988) suggest that this might not describe human judgment adequately. In that study, participants were given a fictitious medical diagnosis scenario with P(H) = {0.64, 0.24, 0.12}, and a series of possible binary tests including E such that P(H|e) = {0.47, 0.35, 0.18}, P(H|ē) = {1, 0, 0}, and P(e) = 0.68, and another completely irrelevant test F (with an even chance of a positive/negative result on each one of the elements in H, so that P(H|f) = P(H|f̄) = P(H)). According to ent_Error, tests E and F are both equally worthless (R_Error(H, E) = R_Error(H, F) = 0), because hypothesis h1 ∈ H remains the most likely no matter what. Participants' mean ratings of the usefulness of E and F were markedly different, however: 0.48 vs. 0.09 (on a 0-1 scale). Indeed, rating E higher than F seems at least reasonable, contrary to what ent_Error implies. In the Sharma-Mittal framework, reconciliation is possible: the expected reduction of a relatively high order (say, 10) entropy measure from the Arimoto family would account for Nelson et al.'s (2010) and similar findings (see Figure 7), and still would not put test E on a par with the entirely pointless test F. Indeed, given our theoretical background and the limited empirical indications available, such a measure would count as a plausible choice in our view, had one to pick a specific entropy underlying a widely applicable model of the informational utility of experiments. Moreover, this kind of operation has wider scope. Origin entropy, for instance, may imply largely appropriate ratings in some contexts (say, biological) and yet not be well-behaved because of its discontinuities: a Sharma-Mittal measure such as ent_{SM(0.1,0.1)} would then closely approximate the former while avoiding the latter.
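For concreteness, the Baron et al. (1988) scenario just discussed can be checked with a few lines of Python (ours, not the authors' analysis; note that with the rounded probabilities reported here the Error-entropy value comes out as approximately rather than exactly zero):

import numpy as np

def sm_entropy(p, r, t):
    p = np.array([x for x in p if x > 0], dtype=float)
    r = r if abs(r - 1) > 1e-12 else 1 + 1e-9
    t = t if abs(t - 1) > 1e-12 else 1 + 1e-9
    return (np.sum(p ** r) ** ((1 - t) / (1 - r)) - 1) / (1 - t)

def R(prior, test, r, t):
    return sm_entropy(prior, r, t) - sum(pe * sm_entropy(post, r, t) for pe, post in test)

prior = [0.64, 0.24, 0.12]
E = [(0.68, [0.47, 0.35, 0.18]), (0.32, [1, 0, 0])]
F = [(0.5, prior), (0.5, prior)]                 # the irrelevant test

error = lambda p: 1 - max(p)                     # Error entropy (probability of error)
R_error = lambda test: error(prior) - sum(pe * error(post) for pe, post in test)
print(round(R_error(E), 4), round(R_error(F), 4))      # both ~0: Error entropy cannot separate E and F
print(round(R(prior, E, 10, 1.9), 4), round(R(prior, F, 10, 1.9), 4))
# The order-10 Arimoto measure SM(10, 1.9) gives E a small positive value and F exactly zero.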

Many further empirical issues can be addressed. For one instance, our analysis of human data in Tables 6-7 and Figure 7 provides relatively weak and indirect evidence against non-concave entropy measures as a basis for the assessment of the informational utility of queries by human agents. However, strongly diverging predictions can be generated from concave vs. non-concave measures (as illustrated in cases 6 and 7, Table 5), and hence put to empirical test. Moreover, our explanatory reanalysis of prior work was based on the aggregate data reported in earlier articles; but how does this extend to individual behavior? We are aware of no studies that address questions of whether there are meaningful individual differences in the psychology of information. Thus, while inferences about individuals should be the goal (Lee, 2011), this requires future research, perhaps with adaptive Bayesian experimental design techniques (Kim et al., 2014). Better models of individual-level psychology could also serve the goal of identifying the information that would be most informative for individual human learners (Gureckis & Markant, 2012), potentially enhancing automated tutor systems. Another idea concerns the direct assessment of uncertainty, e.g., whether more uncertainty is perceived in, say, P(H) = {0.49, 0.49, 0.02} vs. P*(H) = {0.70, 0.15, 0.15}. Judgments of this kind are likely to play a role in human reasoning and decision-making and may be plausibly modulated by a number of interesting factors. Moreover, an array of relevant predictions can be generated from the Sharma-Mittal framework to dissociate subsets of entropy measures. Yet as far as we know, and rather surprisingly, no established experimental procedure exists for a direct behavioral measurement of the judged overall uncertainty concerning a hypothesis set; this is another important area for future investigation.


REFERENCES

AczélJ.(1984).Measuringinformationbeyondcommunicationtheory:Whysomegeneralized

informationmeasuresmaybeuseful,othersnot.AequationesMathematicae,27,1-19.

AczélJ.(1987).Characterizinginformationmeasures:Approachingtheendofanera.LectureNotes

inComputerScience,286,357-384.

AczélJ.,ForteB.,&NgC.T.(1974).WhytheShannonandHartleyentropiesare“natural”.Advances

inAppliedProbability,6,131-146.

ArimotoS.(1971).Information-theoreticalconsiderationsonestimationproblems.Informationand

Control,19,181-194.

AusterweilJ.L.&GriffithsT.L.(2011).Seekingconfirmationisrationalfordeterministichypotheses.

CognitiveScience,35,499-526.

Bar-HillelY.&CarnapR.(1953).Semanticinformation.BritishJournalforthePhilosophyofScience,

4,147-157.

BaronJ.(1985).Rationalityandintelligence.NewYork:CambridgeUniversityPress.

BaronJ.,BeattieJ.,&HersheyJ.C.(1988).HeuristicsandbiasesindiagnosticreasoningII:

Congruence,information,andcertainty.OrganizationalBehaviorandHumanDecisionProcesses,

42,88-110.

BarwiseJ.(1997).Informationandpossibilities.NotreDameJournalofFormalLogic,38,488-515.

BeckC.(2009).Generalisedinformationandentropymeasuresinphysics.ContemporaryPhysics,

50,495-510.

Ben-BassatM.&RavivJ.(1978).Rényi’sentropyandtheprobabilityoferror.IEEETransactionson

InformationTheory,24,324-331.

BenishW.A.(1999).Relativeentropyasameasureofdiagnosticinformation.MedicalDecision

Making,19,202-206.

BoztasS.(2014).OnRényientropiesandtheirapplicationstoguessingattacksincryptography.

IEICETransactionsonFundamentalsofElectronics,Communication,andComputerSciences,97,

2542-2548.

BradleyS.&SteeleK.(2014).Uncertainty,learningandthe‘problem’ofdilation.Erkenntnis,79,

1287–1303.

Page 53: Generalized Information Theory Meets Human Cognition ...philsci-archive.pitt.edu/14436/7/entropies2018_final.pdf · Generalized Information Theory Meets Human Cognition: Introducing

GENERALIZEDINFORMATIONTHEORYANDHUMANCOGNITION

53

BramleyN.R.,LagnadoD.,&SpeekenbrinkM.(2015).Conservativeforgetfulscholars:Howpeople

learncausalstructurethroughsequencesofinterventions.JournalofExperimentalPsychology:

Learning,Memory,andCognition,41,708-731.

BrierG.W.(1950).Verificationofforecastsexpressedintermsofprobability.MonthlyWeather

Report,78,1-3.

BrösselP.(2013).Theproblemofmeasuresensitivityredux.PhilosophyofScience,80,378–397.

CarnapR.(1952).TheContinuumofInductiveMethods.Chicago:UniversityofChicagoPress.

ChoA.(2002).Afreshtakeondisorder,ordisorderlyscience.Science,297,1268–1269.

CrupiV.&GirottoV.(2014).Fromistoought,andback:Hownormativeconcernsforsterprogress

inreasoningresearch.FrontiersinPsychology,5,219.

CrupiV.&TentoriK.(2014).Measuringinformationandconfirmation.StudiesintheHistoryand

PhilosophyofScience,47,81-90.

CrupiV.&TentoriK.(2016).Confirmationtheory.InA.Hájek&C.Hitchcock(eds.),Oxford

HandbookofPhilosophyandProbability(pp.650-665).Oxford:OxfordUniversityPress.

CrupiV.,ChaterN.,&TentoriK.(2013).Newaxiomsforprobabilityandlikelihoodratiomeasures.

BritishJournalforthePhilosophyofScience,64,189–204.

CrupiV.,TentoriK.,&LombardiL.(2009).Pseudodiagnosticityrevisited.PsychologicalReview,116,

971-985.

CsizárI.(2008).Axiomaticcharacterizationsofinformationmeasures.Entropy,10,261-273.

D’AgostinoM.&SinigagliaC.(2010).Epistemicaccuracyandsubjectiveprobability.InM.Suárez,M.

Dorato,&M.Rèdei(eds.),EpistemologyandMethodologyofScience(pp.95-105).Berlin:

Springer.

DaróczyZ.(1970).Generalizedinformationfunctions.InformationandControl,16,36-51.

DawidA.P.(1998).Coherentmeasuresofdiscrepancy,uncertaintyanddependence,with

applicationstoBayesianpredictiveexperimentaldesign.TechnicalReport139,Departmentof

StatisticalScience,UniversityCollegeLondon

(http://www.ucl.ac.uk/Stats/research/pdfs/139b.zip).

DawidA.P.&MusioM.(2014).Theoryandapplicationsofproperscoringrules.Metron,72,169-

183.

DenzlerJ.&BrownC.M(2002).Informationtheoreticsensordataselectionforactiveobject

recognitionandstateestimation.IEEETransactionsonPatternAnalysisandMachineIntelligence,

24,145-157.

Evans, J. St. B. T. & Over, D. E. (1996). Rationality in the selection task: Epistemic utility versus uncertainty reduction. Psychological Review, 103, 356-363.
Fano, R. (1961). Transmission of Information: A Statistical Theory of Communications. Cambridge: MIT Press.
Festa, R. (1993). Optimum Inductive Methods. Rijksuniversiteit Groningen.
Festa, R. & Cevolani, G. (2016). Unfolding the grammar of Bayesian confirmation: Likelihood and anti-likelihood principles. Philosophy of Science, forthcoming.
Fitelson, B. (1999). The plurality of Bayesian measures of confirmation and the problem of measure sensitivity. Philosophy of Science, 66, S362-S378.
Fitelson, B. & Hawthorne, J. (2010). The Wason task(s) and the paradox of confirmation. Philosophical Perspectives, 24 (Epistemology), 207-241.
Fitelson, B. & Hitchcock, C. (2011). Probabilistic measures of causal strength. In P. McKay Illari, F. Russo, & J. Williamson (eds.), Causality in the Sciences (pp. 600-627). Oxford: Oxford University Press.
Floridi, L. (2009). Philosophical conceptions of information. In G. Sommaruga (ed.), Formal Theories of Information (pp. 13-53). Berlin: Springer.
Floridi, L. (2013). Semantic conceptions of information. In E. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Spring 2013 Edition). url = http://plato.stanford.edu/archives/spr2013/entries/information-semantic.
Frank, T. (2004). Complete description of a generalized Ornstein-Uhlenbeck process related to the non-extensive Gaussian entropy. Physica A, 340, 251-256.
Gauvrit, N. & Morsanyi, K. (2014). The equiprobability bias from a mathematical and psychological perspective. Advances in Cognitive Psychology, 10, 119-130.
Gibbs, J. P. & Martin, W. T. (1962). Urbanization, technology, and the division of labor. American Sociological Review, 27, 667-677.
Gigerenzer, G., Hertwig, R., & Pachur, T. (eds.) (2011). Heuristics: The Foundations of Adaptive Behavior. New York: Oxford University Press.
Gini, C. (1912). Variabilità e mutabilità. In Memorie di metodologia statistica, I: Variabilità e concentrazione (pp. 189-358). Milano: Giuffrè, 1939.
Glass, D. H. (2013). Confirmation measures of association rule interestingness. Knowledge-Based Systems, 44, 65-77.

Gneiting, T. & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359-378.
Good, I. J. (1950). Probability and the Weight of Evidence. New York: Griffin.
Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society B, 14, 107-114.
Good, I. J. (1967). On the principle of total evidence. British Journal for the Philosophy of Science, 17, 319-321.
Goosens, W. K. (1976). A critique of epistemic utilities. In R. Bogdan (ed.), Local Induction (pp. 93-114). Dordrecht: Reidel.
Griffiths, T. L. & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 334-384.
Griffiths, T. L. & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116, 661-716.
Gureckis, T. M. & Markant, D. B. (2012). Self-directed learning: A cognitive and computational perspective. Perspectives on Psychological Science, 7, 464-481.
Hájek, A. & Joyce, J. (2008). Confirmation. In S. Psillos & M. Curd (eds.), Routledge Companion to the Philosophy of Science (pp. 115-129). New York: Routledge.
Hartley, R. (1928). Transmission of information. Bell System Technical Journal, 7, 535-563.
Hasson, U. (2016). The neurobiology of uncertainty: Implications for statistical learning. Philosophical Transactions B, 371, 20160048.
Hattori, M. (1999). The effects of probabilistic information in Wason's selection task: An analysis of strategy based on the ODS model. In Proceedings of the 16th Annual Meeting of the Japanese Cognitive Science Society (pp. 623-626).
Hattori, M. (2002). A quantitative model of optimal data selection in Wason's selection task. Quarterly Journal of Experimental Psychology, 55, 1241-1272.
Havrda, J. & Charvát, F. (1967). Quantification method of classification processes: Concept of structural a-entropy. Kybernetika, 3, 30-35.
Hill, M. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54, 427-431.
Hoffmann, S. (2008). Generalized distribution-based diversity measurement: Survey and unification. Faculty of Economics and Management Magdeburg, Working Paper 23 (http://www.ww.uni-magdeburg.de/fwwdeka/femm/a2008_Dateien/2008_23.pdf).
Horwich, P. (1982). Probability and Evidence. Cambridge, UK: Cambridge University Press.

Huber, F. & Schmidt-Petri, C. (eds.) (2009). Degrees of Belief. Dordrecht: Springer.
Hurlbert, S. H. (1971). The non-concept of species diversity: A critique and alternative parameters. Ecology, 52, 577-586.
Jost, L. (2006). Entropy and diversity. Oikos, 113, 363-375.
Kaniadakis, G., Lissia, M., & Scarfone, A. M. (2004). Deformed logarithms and entropies. Physica A, 340, 41-49.
Katsikopoulos, K. V., Schooler, L. J., & Hertwig, R. (2010). The robust beauty of ordinary information. Psychological Review, 117, 1259-1266.
Keylock, J. C. (2005). Simpson diversity and the Shannon-Wiener index as special cases of a generalized entropy. Oikos, 109, 203-207.
Kim, W., Pitt, M. A., Lu, Z. L., Steyvers, M., & Myung, J. I. (2014). A hierarchical adaptive approach to optimal experimental design. Neural Computation, 26, 2465-2492.
Klayman, J. & Ha, Y.-W. (1987). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, 94, 211-228.
Kyburg, H. E. & Teng, C. M. (2001). Uncertain Inference. New York: Cambridge University Press.
Laakso, M. & Taagepera, R. (1979). "Effective" number of parties: A measure with application to West Europe. Comparative Political Studies, 12, 3-27.
Lande, R. (1996). Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos, 76, 5-13.
Lee, M. D. (2011). How cognitive modeling can benefit from hierarchical Bayesian models. Journal of Mathematical Psychology, 55, 1-7.
Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: An ideal observer model of reading. Psychological Review, 104, 524-553.
Leitgeb, H. & Pettigrew, R. (2010a). An objective justification of Bayesianism I: Measuring inaccuracy. Philosophy of Science, 77, 201-235.
Leitgeb, H. & Pettigrew, R. (2010b). An objective justification of Bayesianism II: The consequences of minimizing inaccuracy. Philosophy of Science, 77, 236-272.
Levinstein, B. (2012). Leitgeb and Pettigrew on accuracy and updating. Philosophy of Science, 79, 413-424.
Lewontin, R. C. (1972). The apportionment of human diversity. Evolutionary Biology, 6, 381-398.
Lindley, D. V. (1956). On a measure of the information provided by an experiment. Annals of Mathematical Statistics, 27, 986-1005.

Markant, D. & Gureckis, T. M. (2012). Does the utility of information influence sampling behavior? In N. Miyake, D. Peebles, & R. P. Cooper (eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 719-724). Austin, TX: Cognitive Science Society.
Masi, M. (2005). A step beyond Tsallis and Rényi entropies. Physics Letters A, 338, 217-224.
Meder, B. & Nelson, J. D. (2012). Information search with situation-specific reward functions. Judgment and Decision Making, 7, 119-148.
Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121, 277-301.
Muliere, P. & Parmigiani, G. (1993). Utility and means in the 1930s. Statistical Science, 8, 421-432.
Najemnik, J. & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387-391.
Najemnik, J. & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286-1294.
Naudts, J. (2002). Deformed exponentials and logarithms in generalized thermostatistics. Physica A, 316, 323-334.
Navarro, D. J. & Perfors, A. F. (2011). Hypothesis generation, sparse categories, and the positive test strategy. Psychological Review, 118, 120-134.
Nelson, J. D. (2005). Finding useful questions: On Bayesian diagnosticity, probability, impact, and information gain. Psychological Review, 112, 979-999.
Nelson, J. D. (2008). Towards a rational theory of human information acquisition. In M. Oaksford & N. Chater (eds.), The Probabilistic Mind: Prospects for Rational Models of Cognition (pp. 143-163). Oxford: Oxford University Press.
Nelson, J. D. & Cottrell, G. W. (2007). A probabilistic model of eye movements in concept formation. Neurocomputing, 70, 2256-2272.
Nelson, J. D., Divjak, B., Gudmundsdottir, G., Martignon, L., & Meder, B. (2014). Children's sequential information search is sensitive to environmental probabilities. Cognition, 130, 74-80.
Nelson, J. D., McKenzie, C. R. M., Cottrell, G. W., & Sejnowski, T. J. (2010). Experience matters: Information acquisition optimizes probability gain. Psychological Science, 21, 960-969.
Nelson, J. D., Meder, B., & Jones, M. (2016). On the fine line between "heuristic" and "optimal" sequential question strategies. Submitted.

Nelson, J. D., Tenenbaum, J. B., & Movellan, J. R. (2001). Active inference in concept learning. In J. D. Moore & K. Stenning (eds.), Proceedings of the 23rd Conference of the Cognitive Science Society (pp. 692-697). Mahwah (NJ): Erlbaum.
Niiniluoto, I. & Tuomela, R. (1973). Theoretical Concepts and Hypothetico-Inductive Inference. Dordrecht: Reidel.
Oaksford, M. & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608-631.
Oaksford, M. & Chater, N. (2003). Optimal data selection: Revision, review, and re-evaluation. Psychonomic Bulletin & Review, 10, 289-318.
Oaksford, M. & Chater, N. (2007). Bayesian Rationality: The Probabilistic Approach to Human Reasoning. Oxford (UK): Oxford University Press.
Oaksford, M. & Hahn, U. (2007). Induction, deduction, and argument strength in human reasoning and argumentation. In A. Feeney & E. Heit (eds.), Inductive Reasoning: Experimental, Developmental, and Computational Approaches (pp. 269-301). Cambridge (UK): Cambridge University Press.
Patil, G. & Taille, C. (1982). Diversity as a concept and its measurement. Journal of the American Statistical Association, 77, 548-561.
Pedersen, P. & Wheeler, G. (2014). Demystifying dilation. Erkenntnis, 79, 1305-1342.
Pettigrew, R. (2013). Epistemic utility and norms for credences. Philosophy Compass, 8, 897-908.
Popper, K. R. (1959). The Logic of Scientific Discovery. London: Routledge.
Predd, J. B., Seiringer, R., Lieb, E. J., Osherson, D., Poor, H. V., & Kulkarni, S. R. (2009). Probabilistic coherence and proper scoring rules. IEEE Transactions on Information Theory, 55, 4786-4792.
Raiffa, H. & Schlaifer, R. (1961). Applied Statistical Decision Theory. Boston: Clinton Press.
Raileanu, L. E. & Stoffel, K. (2004). Theoretical comparison between the Gini Index and Information Gain criteria. Annals of Mathematics and Artificial Intelligence, 41, 77-93.
Ramírez-Reyes, A., Hernández-Montoya, A. R., Herrera-Corral, G., & Domínguez-Jiménez, I. (2016). Determining the entropic index q of Tsallis entropy in images through redundancy. Entropy, 18, 299.
Rao, C. R. (2010). Quadratic entropy and analysis of diversity. Sankhya: The Indian Journal of Statistics, 72A, 70-80.
Renninger, L. W., Coughlan, J., Verghese, P., & Malik, J. (2005). An information maximization model of eye movements. Advances in Neural Information Processing Systems, 17, 1121-1128.

Rényi, A. (1961). On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability I (pp. 547-556). Berkeley (CA): University of California Press.
Ricotta, C. (2003). On parametric evenness measures. Journal of Theoretical Biology, 222, 189-197.
Roche, W. & Shogenji, T. (2014). Dwindling confirmation. Philosophy of Science, 81, 114-137.
Roche, W. & Shogenji, T. (2016). Information and inaccuracy. British Journal for the Philosophy of Science, forthcoming.
Ruggeri, A. & Lombrozo, T. (2015). Children adapt their questions to achieve efficient search. Cognition, 143, 203-216.
Rusconi, P., Marelli, M., D'Addario, M., Russo, S., & Cherubini, P. (2014). Evidence evaluation: Measure Z corresponds to human utility judgments better than measure L and optimal-experimental-design models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 703-723.
Sahoo, P. K. & Arora, G. (2004). A thresholding method based on two-dimensional Rényi's entropy. Pattern Recognition, 37, 1149-1161.
Savage, L. J. (1972). The Foundations of Statistics. New York: Wiley.
Selten, R. (1998). Axiomatic characterization of the quadratic scoring rule. Experimental Economics, 1, 43-61.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423 and 623-656.
Sharma, B. & Mittal, D. (1975). New non-additive measures of entropy for discrete probability distributions. Journal of Mathematical Sciences (Delhi), 10, 28-40.
Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688.
Skov, R. B. & Sherman, S. J. (1986). Information-gathering processes: Diagnosticity, hypothesis-confirmation strategies, and perceived hypothesis confirmation. Journal of Experimental Social Psychology, 22, 93-121.
Slowiaczek, L. M., Klayman, J., Sherman, S. L., & Skov, R. B. (1992). Information selection and use in hypothesis testing: What is a good question, and what is a good answer? Memory & Cognition, 20, 392-405.
Sprenger, J. (2016). Foundations for a probabilistic theory of causal strength. See: http://philsci-archive.pitt.edu/11927/1/GradedCausation-v2.pdf.

Stringer, S., Borsboom, D., & Wagenmakers, E.-J. (2011). Bayesian inference for the information gain model. Behavior Research Methods, 43, 297-309.
Taneja, I. J., Pardo, L., Morales, D., & Menéndez, M. L. (1989). On generalized information and divergence measures and their applications: A brief review. Qüestiió, 13, 47-73.
Tentori, K., Crupi, V., Bonini, N., & Osherson, D. (2007). Comparison of confirmation measures. Cognition, 103, 107-119.
Tribus, M. & McIrvine, E. C. (1971). Energy and information. Scientific American, 225, 179-188.
Trope, Y. & Bassok, M. (1982). Confirmatory and diagnosing strategies in social information gathering. Journal of Personality and Social Psychology, 43, 22-34.
Trope, Y. & Bassok, M. (1983). Information-gathering strategies in hypothesis testing. Journal of Experimental Social Psychology, 19, 560-576.
Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52, 479-487.
Tsallis, C. (2002). Entropic non-extensivity: A possible measure of complexity. Chaos, Solitons, and Fractals, 13, 371-391.
Tsallis, C. (2004). What should a statistical mechanics satisfy to reflect nature? Physica D, 193, 3-34.
Tsallis, C. (2011). The nonadditive entropy Sq and its applications in physics and elsewhere: Some remarks. Entropy, 13, 1765-1804.
Tweney, R. D., Doherty, M., & Kleiter, G. D. (2010). The pseudodiagnosticity trap: Should participants consider alternative hypotheses? Thinking & Reasoning, 16, 332-345.
Vajda, I. & Zvárová, J. (2007). On generalized entropies, Bayesian decisions, and statistical diversity. Kybernetika, 43, 675-696.
van der Pyl, T. (1978). Propriétés de l'information d'ordre a et de type b. In Théorie de l'information: Développements récents et applications (pp. 161-171), Colloques Internationaux du CNRS, 276. Paris: Centre National de la Recherche Scientifique.
Wang, G. & Jiang, M. (2005). Axiomatic characterization of non-linear homomorphic means. Journal of Mathematical Analysis and Applications, 303, 350-363.
Wason, P. (1960). On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology, 12, 129-140.
Wason, P. (1966). Reasoning. In B. Foss (ed.), New Horizons in Psychology (pp. 135-151). Harmondsworth (UK): Penguin.

Wason, P. (1968). Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20, 273-281.
Wu, C., Meder, B., Filimon, F., & Nelson, J. D. (in press). Asking better questions: How presentation formats guide information search. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43, 1274-1297.

This is the Supplementary Materials file of:

Crupi, V., Nelson, J. D., Meder, B., Cevolani, G., and Tentori, K., Generalized information theory meets human cognition: Introducing a unified framework to model uncertainty and information search. Cognitive Science, 2018.

1. Generalized logarithm and exponential

Consider the Tsallis logarithm, $\ln_t(x) = \frac{1}{1-t}\left[x^{1-t} - 1\right]$, and note that $1 + (1-t)\ln_t(x) = x^{1-t}$, therefore $x = \left[1 + (1-t)\ln_t(x)\right]^{\frac{1}{1-t}}$. This shows that the generalized exponential $e_t(x) = [1 + (1-t)x]^{\frac{1}{1-t}}$ just is the inverse function of $\ln_t(x)$.

In order to show that the ordinary natural logarithm is recovered from $\ln_t(x)$ ($x > 0$) in the limit for $t \to 1$, we posit $x = 1 - y$ and first consider $x \leq 1$, so that $|-y| < 1$. Then we have:

\[
\lim_{t \to 1}\{\ln_t(x)\} = \lim_{t \to 1}\{\ln_t(1-y)\} = \lim_{t \to 1}\left\{\tfrac{1}{1-t}\left[(1-y)^{1-t} - 1\right]\right\}
\]

By the binomial expansion of $(1-y)^{1-t}$:

\[
\lim_{t \to 1}\left\{\tfrac{1}{1-t}\left[-1 + \left(1 + (1-t)(-y) + \tfrac{(1-t)(1-t-1)(-y)^2}{2!} + \tfrac{(1-t)(1-t-1)(1-t-2)(-y)^3}{3!} + \cdots\right)\right]\right\}
\]
\[
= \lim_{t \to 1}\left\{(-y) + \tfrac{(-t)(-y)^2}{2!} + \tfrac{(-t)(-t-1)(-y)^3}{3!} + \cdots\right\}
= (-y) - \tfrac{(-y)^2}{2!} + \tfrac{2!\,(-y)^3}{3!} - \cdots
= (-y) - \tfrac{(-y)^2}{2} + \tfrac{(-y)^3}{3} - \cdots
\]

which is the series expansion of $\ln(1-y) = \ln(x)$ (recall that $|-y| < 1$). For the case $x > 1$, one can posit $x = 1/(1-y)$, so that again $|-y| < 1$, and compute

\[
\lim_{t \to 1}\left\{\tfrac{1}{1-t}\left[\left(\tfrac{1}{1-y}\right)^{1-t} - 1\right]\right\} = \lim_{t \to 1}\left\{-\tfrac{1}{t-1}\left[(1-y)^{t-1} - 1\right]\right\}
\]

thus getting the same result from a similar derivation.

Just like the natural logarithm, $\ln_t(x)$ is non-negative if $x \geq 1$, because if $t < 1$, then $x^{1-t} \geq x^0 = 1$, therefore $\frac{1}{1-t}\left[x^{1-t} - 1\right] \geq 0$, while if $t > 1$, then $x^{1-t} \leq x^0 = 1$, therefore again $\frac{1}{1-t}\left[x^{1-t} - 1\right] \geq 0$. If $0 < x < 1$, $\ln_t(x)$ is negative instead, again like the natural logarithm.

To show that the ordinary exponential is recovered from $e_t(x)$ in the limit for $t \to 1$, we again rely on the binomial expansion, as follows.

\[
\lim_{t \to 1}\{e_t(x)\} = \lim_{t \to 1}\left\{[1 + (1-t)x]^{\frac{1}{1-t}}\right\}
\]
\[
= \lim_{t \to 1}\left\{1 + \tfrac{1}{1-t}(1-t)x + \tfrac{1}{1-t}\left(\tfrac{1}{1-t} - 1\right)\tfrac{(1-t)^2 x^2}{2!} + \tfrac{1}{1-t}\left(\tfrac{1}{1-t} - 1\right)\left(\tfrac{1}{1-t} - 2\right)\tfrac{(1-t)^3 x^3}{3!} + \cdots\right\}
\]
\[
= \lim_{t \to 1}\left\{1 + x + \tfrac{1}{1-t}\cdot\tfrac{t}{1-t}\cdot\tfrac{(1-t)^2 x^2}{2!} + \tfrac{1}{1-t}\cdot\tfrac{t}{1-t}\cdot\tfrac{2t-1}{1-t}\cdot\tfrac{(1-t)^3 x^3}{3!} + \cdots\right\}
\]
\[
= 1 + x + \tfrac{x^2}{2!} + \tfrac{x^3}{3!} + \cdots = 1 + \sum_{k=1}^{\infty}\tfrac{x^k}{k!} = e(x)
\]

Just like the ordinary exponential, $e_t(x) \geq 1$ if $x \geq 0$, because if $t < 1$, then one has $[1 + (1-t)x] \geq 1$ raised to a positive power $1/(1-t)$, while if $t > 1$, then one has $[1 + (1-t)x] \leq 1$ raised to a negative power $1/(1-t)$. If $x < 0$, $e_t(x) < 1$ instead, again like the ordinary exponential.
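As an illustrative aid, the generalized logarithm and exponential and their limiting behavior can be checked numerically with the following Python sketch (the helper names lnt and et are ours and purely illustrative, not part of the formal material):

import math

def lnt(x, t):
    # Tsallis logarithm: ln_t(x) = (x**(1 - t) - 1) / (1 - t), with ln_1(x) = ln(x)
    if abs(t - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def et(x, t):
    # generalized exponential: e_t(x) = (1 + (1 - t) * x)**(1 / (1 - t)), with e_1(x) = exp(x)
    if abs(t - 1.0) < 1e-12:
        return math.exp(x)
    return (1.0 + (1.0 - t) * x) ** (1.0 / (1.0 - t))

# e_t inverts ln_t, and both approach the ordinary log/exp as t -> 1
for t in (0.0, 0.5, 2.0, 3.0):
    for x in (0.2, 1.0, 3.7):
        assert abs(et(lnt(x, t), t) - x) < 1e-9
for t in (0.999, 1.001):
    assert abs(lnt(2.5, t) - math.log(2.5)) < 1e-3
    assert abs(et(0.7, t) - math.exp(0.7)) < 1e-3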

2. Sharma-Mittal entropies

First we will derive the Sharma-Mittal formula from its generalized mean form. We have $g(x) = \ln_r e_t(x)$ and $\mathrm{inf}(x) = \ln_t(x)$. Let us find $g^{-1}(x)$, by solving $y = g(x)$ for $x$, as follows.

\[
y = \ln_r e_t(x) = \ln_r\!\left[(1 + (1-t)x)^{\frac{1}{1-t}}\right] = \tfrac{1}{1-r}\left[(1 + (1-t)x)^{\frac{1-r}{1-t}} - 1\right]
\]

Therefore:

\[
1 + (1-r)y = [1 + (1-t)x]^{\frac{1-r}{1-t}}
\]
\[
[1 + (1-r)y]^{\frac{1}{1-r}} = [1 + (1-t)x]^{\frac{1}{1-t}}
\]
\[
e_r(y) = [1 + (1-t)x]^{\frac{1}{1-t}}
\]
\[
[e_r(y)]^{1-t} = 1 + (1-t)x
\]
\[
\tfrac{1}{1-t}\left\{[e_r(y)]^{1-t} - 1\right\} = x
\]
\[
x = \ln_t e_r(y)
\]

So $g^{-1}(x) = \ln_t e_r(x)$. Now we have all the elements to derive the Sharma-Mittal formula.

\[
\mathrm{ent}_P^{SM(r,t)}(H) = g^{-1}\!\left[\sum_{h_i \in H} p_i\, g\!\left(\mathrm{inf}\!\left(\tfrac{1}{p_i}\right)\right)\right]
= \ln_t e_r\!\left[\sum_{h_i \in H} p_i \ln_r e_t\!\left(\ln_t\!\left(\tfrac{1}{p_i}\right)\right)\right]
= \ln_t e_r\!\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right]
\]
\[
= \ln_t e_r\!\left[\tfrac{1}{1-r}\sum_{h_i \in H} p_i\left(\left(\tfrac{1}{p_i}\right)^{1-r} - 1\right)\right]
= \ln_t\!\left(\left[1 + (1-r)\,\tfrac{1}{1-r}\sum_{h_i \in H} p_i\left(p_i^{\,r-1} - 1\right)\right]^{\frac{1}{1-r}}\right)
\]
\[
= \ln_t\!\left(\left[1 + \sum_{h_i \in H} p_i^{\,r} - \sum_{h_i \in H} p_i\right]^{\frac{1}{1-r}}\right)
= \ln_t\!\left(\left[\sum_{h_i \in H} p_i^{\,r}\right]^{\frac{1}{1-r}}\right)
\]
\[
= \tfrac{1}{1-t}\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{1-t}{1-r}} - 1\right]
= \tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right]
\]
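This derivation lends itself to a simple numerical cross-check. In the Python sketch below (ours, for illustration only, reusing the lnt and et helpers above), sharma_mittal_mean_form evaluates the generalized mean form and sharma_mittal the closed form just obtained, with the $r = 1$ and/or $t = 1$ cases handled via the corresponding limits; the two are compared on an arbitrary distribution.

def sharma_mittal_mean_form(p, r, t):
    # ln_t e_r [ sum_i p_i * ln_r(1 / p_i) ]  (generalized mean form)
    s = sum(pi * lnt(1.0 / pi, r) for pi in p if pi > 0)
    return lnt(et(s, r), t)

def sharma_mittal(p, r, t):
    # closed form (1/(t-1)) * [1 - (sum_i p_i**r)**((t-1)/(r-1))],
    # with r = 1 and/or t = 1 handled via the corresponding limits
    if abs(r - 1.0) < 1e-12 and abs(t - 1.0) < 1e-12:   # Shannon
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    if abs(r - 1.0) < 1e-12:                            # Gaussian (order 1)
        shannon_h = -sum(pi * math.log(pi) for pi in p if pi > 0)
        return lnt(math.exp(shannon_h), t)
    z = sum(pi ** r for pi in p if pi > 0)
    if abs(t - 1.0) < 1e-12:                            # Rényi (degree 1)
        return math.log(z) / (1.0 - r)
    return (1.0 - z ** ((t - 1.0) / (r - 1.0))) / (t - 1.0)

P = [0.5, 0.25, 0.125, 0.125]
for r in (0.5, 2.0, 3.0):
    for t in (0.0, 0.5, 2.0, 3.0):
        assert abs(sharma_mittal_mean_form(P, r, t) - sharma_mittal(P, r, t)) < 1e-9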

Let us note that $\mathrm{ent}^{SM(r,t)}$ satisfies the basic properties of entropy measures. As pointed out above, the Tsallis logarithm $\ln_t(x)$ is always non-negative if $x \geq 1$, therefore so is $\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)$. Moreover, $e_r(x) \geq 1$ if $x \geq 0$ (see above), so $e_r\!\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] \geq 1$ and finally $\ln_t e_r\!\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] \geq 0$. This proves that non-negativity holds for $\mathrm{ent}^{SM(r,t)}$.

Let us then consider evenness sensitivity. We already know that $\mathrm{ent}^{SM(r,t)}$ is non-negative; also, $\sum_{h_i \in H} p_i^{\,r} = 1$ in case $p_i = 1$ for some $i$, so that $\mathrm{ent}_P^{SM(r,t)}(H) = 0$ for any such degenerate distribution. As a consequence, for any $H$ and $P(H)$, $\mathrm{ent}_P^{SM(r,t)}(H) \geq 0$, with the value $0$ attained when some element of $H$ has probability 1. In order to complete the proof of evenness sensitivity, we will now study the maximization of $\mathrm{ent}^{SM(r,t)}$ by means of so-called Lagrange multipliers. We have to maximize $\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right) = \tfrac{1}{1-r}\left[\sum_{h_i \in H} p_i^{\,r} - 1\right]$, so we study $f(x_1, \ldots, x_n) = \tfrac{1}{1-r}\left[\sum_{1 \leq i \leq n} x_i^{\,r} - 1\right]$ under the constraint $\sum_{1 \leq i \leq n} x_i = 1$. By the Lagrange multipliers method, we get a system of $n+1$ equations as follows:

\[
\begin{cases}
\dfrac{r}{1-r}\, x_1^{\,r-1} = \lambda \\
\qquad\vdots \\
\dfrac{r}{1-r}\, x_n^{\,r-1} = \lambda \\
x_1 + \cdots + x_n = 1
\end{cases}
\]

where $x_1 = \cdots = x_n = 1/n$ is the only solution. This means that $\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)$ is either maximized or minimized for the uniform distribution $U(H)$. But actually $\mathrm{ent}_U^{SM(r,t)}(H)$ must be a maximum, so that, for any $H$ and $P(H)$, $\mathrm{ent}_U^{SM(r,t)}(H) \geq \mathrm{ent}_P^{SM(r,t)}(H)$. In fact, $\mathrm{ent}_U^{SM(r,t)}(H)$ is strictly positive, because under $U(H)$ one has $\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right) = \ln_r(n) > 0$ (recall that $n > 1$). Hence, for any $H$, $\mathrm{ent}_U^{SM(r,t)}(H) > 0 = \mathrm{ent}_P^{SM(r,t)}(H)$ whenever $P(H)$ assigns probability 1 to some element of $H$, and evenness sensitivity is shown to hold.
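A quick numerical sanity check of non-negativity and evenness sensitivity (our sketch, reusing the sharma_mittal helper above, on randomly drawn distributions):

import random
random.seed(1)
n = 5
uniform = [1.0 / n] * n
degenerate = [1.0] + [0.0] * (n - 1)
for _ in range(100):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    P = [wi / s for wi in w]
    for r in (0.5, 2.0, 3.0):
        for t in (0.0, 0.5, 2.0, 3.0):
            e_P = sharma_mittal(P, r, t)
            assert sharma_mittal(degenerate, r, t) <= 1e-12     # zero entropy at certainty
            assert e_P >= -1e-12                                # non-negativity
            assert e_P <= sharma_mittal(uniform, r, t) + 1e-12  # uniform distribution maximizes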

3. Some special cases of the Sharma-Mittal family

Given the above analysis of generalized logarithms and exponentials, we have Rényi (1961) entropy as a special case of the Sharma-Mittal family as follows:

\[
\mathrm{ent}_P^{SM(r,1)}(H) = \ln\!\left[e_r\!\left(\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right)\right]
= \ln\!\left(\left[1 + (1-r)\,\tfrac{1}{1-r}\left(\sum_{h_i \in H} p_i^{\,r} - 1\right)\right]^{\frac{1}{1-r}}\right)
\]
\[
= \ln\!\left(\left[\sum_{h_i \in H} p_i^{\,r}\right]^{\frac{1}{1-r}}\right)
= \tfrac{1}{1-r}\ln\!\left(\sum_{h_i \in H} p_i^{\,r}\right)
= \mathrm{ent}_P^{\text{Rényi}(r)}(H)
\]

For Shannon entropy, in particular, one only needs to note that $\mathrm{ent}_P^{SM(1,1)}(H) = \ln\!\left[e\!\left(\sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right)\right)\right] = \sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right)$.

For Tsallis (1988) entropy, we have:

\[
\mathrm{ent}_P^{SM(t,t)}(H) = \ln_t e_t\!\left[\sum_{h_i \in H} p_i \ln_t\!\left(\tfrac{1}{p_i}\right)\right] = \sum_{h_i \in H} p_i \ln_t\!\left(\tfrac{1}{p_i}\right) = \tfrac{1}{t-1}\left[1 - \sum_{h_i \in H} p_i^{\,t}\right] = \mathrm{ent}_P^{Tsallis(t)}(H)
\]

For another generalization of Shannon entropy, i.e. Gaussian entropy (Frank, 2004), we have:

\[
\mathrm{ent}_P^{SM(1,t)}(H) = \ln_t e\!\left[\sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right)\right] = \tfrac{1}{1-t}\left[e^{(1-t)\sum_{h_i \in H} p_i \ln\left(\frac{1}{p_i}\right)} - 1\right] = \mathrm{ent}_P^{Gauss(t)}(H)
\]

The way in which $\mathrm{ent}^{Gauss(t)}$ recovers Shannon entropy for $t = 1$ again follows from the behavior of the generalized logarithm, because $\mathrm{ent}_P^{Gauss(1)}(H) = \ln\!\left[e\!\left(\sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right)\right)\right] = \sum_{h_i \in H} p_i \ln\!\left(\tfrac{1}{p_i}\right)$.
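These special cases can also be probed numerically. The sketch below (ours, reusing the helpers above) compares the Sharma-Mittal generalized mean form against standalone textbook formulas for Rényi, Shannon, Tsallis, and Gaussian entropy on an arbitrary distribution.

def renyi(p, r):
    return math.log(sum(pi ** r for pi in p if pi > 0)) / (1.0 - r)

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def tsallis(p, t):
    return (1.0 - sum(pi ** t for pi in p if pi > 0)) / (t - 1.0)

def gaussian(p, t):
    return lnt(math.exp(shannon(p)), t)

P = [0.4, 0.3, 0.2, 0.1]
assert abs(sharma_mittal_mean_form(P, 2.0, 1.0) - renyi(P, 2.0)) < 1e-9    # order r, degree 1
assert abs(sharma_mittal_mean_form(P, 1.0, 1.0) - shannon(P)) < 1e-9       # order 1, degree 1
assert abs(sharma_mittal_mean_form(P, 3.0, 3.0) - tsallis(P, 3.0)) < 1e-9  # order = degree
assert abs(sharma_mittal_mean_form(P, 1.0, 2.0) - gaussian(P, 2.0)) < 1e-9 # order 1, degree t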

For Power entropies, $\mathrm{ent}_P^{SM(r,2)}(H) = \mathrm{ent}_P^{Power(r)}(H)$ follows immediately from $\mathrm{ent}_P^{SM(r,t)}(H) = \tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right]$, and the same for Quadratic entropy, i.e., $\mathrm{ent}_P^{SM(2,2)}(H) = \mathrm{ent}_P^{Quad}(H)$.

If we posit $t = 2 - \tfrac{1}{r}$, we have

\[
\mathrm{ent}_P^{SM\left(r,\,2-\frac{1}{r}\right)}(H) = \tfrac{r}{r-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^r\right)^{\frac{1}{r}}\right]
\]

which happens to be precisely Arimoto's (1971) entropy, under an inconsequential change of parametrization (Arimoto, 1971, used a parameter $b$ to be set to $1/r$ in our notation).

For Effective Number measures (Hill, 1973), we have:

\[
\mathrm{ent}_P^{SM(r,0)}(H) = \tfrac{1}{0-1}\left[1 - \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{0-1}{r-1}}\right] = \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{1}{1-r}} - 1 = \mathrm{ent}_P^{EN(r)}(H)
\]

As a further point concerning Effective Numbers, consider a Sharma-Mittal measure $\mathrm{ent}_P^{SM(r,t)}(H) = \ln_t e_r\!\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right]$, for any choice of $r$ and $t$ (both non-negative). We ask what is the number $N$ of equiprobable elements in a partition $K$ such that $\mathrm{ent}_P^{SM(r,t)}(H) = \mathrm{ent}_U^{SM(r,t)}(K)$. We note that $\mathrm{ent}_U^{SM(r,t)}(K) = \ln_t e_r\{\ln_r(N)\} = \ln_t(N)$, thus we posit:

\[
\mathrm{ent}_P^{SM(r,t)}(H) = \ln_t e_r\!\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right] = \ln_t(N)
\]
\[
N = e_r\!\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right]
= \left[1 + (1-r)\,\tfrac{1}{1-r}\left(\sum_{h_i \in H} p_i^{\,r} - 1\right)\right]^{\frac{1}{1-r}}
= \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{1}{1-r}}
= \mathrm{ent}_P^{EN(r)}(H) + 1
\]

This shows that, regardless of the degree parameter $t$, for any Sharma-Mittal measure of a specified order $r$, $\mathrm{ent}_P^{EN(r)}(H) + 1$ computes the theoretical number $N$ of equally probable elements that would be just as entropic as $H$ under that measure and given $P(H)$.
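A brief numerical illustration of this point (our sketch, reusing the helpers above): for a fixed order $r$, the generally non-integer effective number $N = \mathrm{ent}_P^{EN(r)}(H) + 1$ satisfies $\mathrm{ent}_P^{SM(r,t)}(H) = \ln_t(N)$ for every degree $t$.

def effective_number(p, r):
    # N = (sum_i p_i**r)**(1/(1-r)) = ent^{EN(r)}(H) + 1
    return sum(pi ** r for pi in p if pi > 0) ** (1.0 / (1.0 - r))

P = [0.6, 0.2, 0.1, 0.1]
r = 2.0
N = effective_number(P, r)          # about 2.38 "effective" equiprobable elements
for t in (0.0, 0.5, 2.0, 3.0):
    assert abs(sharma_mittal(P, r, t) - lnt(N, t)) < 1e-9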

The derivation of the form of $\mathrm{ent}^{SM(0,t)}$ is as follows:

\[
\mathrm{ent}_P^{SM(0,t)}(H) = \tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} p_i^{\,0}\right)^{\frac{t-1}{0-1}}\right] = \tfrac{1}{1-t}\left[(n^+)^{1-t} - 1\right] = \ln_t(n^+)
\]

where $n^+$ is the number of elements in $H$ with a non-null probability according to $P(H)$ (recall that we apply the convention $0^0 = 0$, common in the entropy literature). Hartley entropy, $\mathrm{ent}_P^{Hartley}(H) = \ln(n^+)$, immediately follows as a special case for $t = 1$, just as Origin entropy, $\mathrm{ent}_P^{Origin}(H) = n^+ - 1$, for $t = 0$. $t = 2$ yields $\mathrm{ent}_P^{SM(0,2)}(H) = -\left[(n^+)^{-1} - 1\right] = \frac{n^+ - 1}{n^+}$.
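This zero-order case can be checked in the same style (our sketch, reusing the helpers above); the convention $0^0 = 0$ is mirrored in the code by skipping null-probability elements.

P = [0.5, 0.5, 0.0, 0.0]
n_plus = sum(1 for pi in P if pi > 0)    # number of elements with non-null probability
for t in (0.0, 1.0, 2.0, 3.0):
    assert abs(sharma_mittal(P, 0.0, t) - lnt(n_plus, t)) < 1e-9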

For the case of infinite order, we posit $p^* = \max_{h_i \in H}(p_i)$ and note the following ($n$ is again the overall size of $H$):

\[
(p^*)^r \leq \sum_{h_i \in H} p_i^{\,r} \leq n\,(p^*)^r
\]

Assuming $\tfrac{t-1}{r-1} \geq 0$ involves no loss of generality in what follows:

\[
\left((p^*)^r\right)^{\frac{t-1}{r-1}} \leq \left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}} \leq \left(n\,(p^*)^r\right)^{\frac{t-1}{r-1}}
\]
\[
\ln\!\left[\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right] \leq \ln\!\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] \leq \ln\!\left[n^{\frac{t-1}{r-1}}\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right]
\]
\[
\ln\!\left[\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right] \leq \ln\!\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] \leq \ln\!\left(n^{\frac{t-1}{r-1}}\right) + \ln\!\left[\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right]
\]
\[
\lim_{r \to \infty}\ln\!\left[\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right] \leq \lim_{r \to \infty}\ln\!\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] \leq \lim_{r \to \infty}\left[\tfrac{t-1}{r-1}\ln(n)\right] + \lim_{r \to \infty}\ln\!\left[\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right]
\]
\[
\lim_{r \to \infty}\ln\!\left[\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right] \leq \lim_{r \to \infty}\ln\!\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] \leq 0 + \lim_{r \to \infty}\ln\!\left[\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right]
\]

Therefore:

\[
\lim_{r \to \infty}\ln\!\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] = \lim_{r \to \infty}\ln\!\left[\left((p^*)^r\right)^{\frac{t-1}{r-1}}\right] = \lim_{r \to \infty}\ln\!\left[\left((p^*)^{\frac{r}{r-1}}\right)^{t-1}\right]
\]

The limit for $r \to \infty$ of the argument of the $\ln$ function exists and is finite: $\lim_{r \to \infty}\left((p^*)^{\frac{r}{r-1}}\right)^{t-1} = (p^*)^{t-1}$. For this reason, we can conclude:

\[
\ln\!\left[\lim_{r \to \infty}\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right] = \ln\!\left[\lim_{r \to \infty}\left((p^*)^{\frac{r}{r-1}}\right)^{t-1}\right]
\]
\[
\lim_{r \to \infty}\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}} = (p^*)^{t-1}
\]
\[
\tfrac{1}{1-t}\left[\lim_{r \to \infty}\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}} - 1\right] = \tfrac{1}{1-t}\left[(p^*)^{t-1} - 1\right]
\]
\[
\lim_{r \to \infty}\left\{\tfrac{1}{1-t}\left[\left(\sum_{h_i \in H} p_i^{\,r}\right)^{\frac{t-1}{r-1}} - 1\right]\right\} = \tfrac{1}{1-t}\left[\left(\tfrac{1}{p^*}\right)^{1-t} - 1\right]
\]
\[
\lim_{r \to \infty}\left\{\mathrm{ent}_P^{SM(r,t)}(H)\right\} = \ln_t\!\left(\tfrac{1}{p^*}\right)
\]

Error entropy is a special case for $t = 2$, because $\ln_2\!\left(\tfrac{1}{p^*}\right) = -\left[\left(\tfrac{1}{p^*}\right)^{-1} - 1\right] = 1 - p^* = 1 - \max_{h_i \in H}(p_i)$. $t = 1$ yields $\ln\!\left(\tfrac{1}{\max_{h_i \in H}(p_i)}\right)$, while, for $t = 0$, one immediately has $\tfrac{1}{\max_{h_i \in H}(p_i)} - 1$.
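The convergence to $\ln_t(1/p^*)$ can be observed numerically by taking a large but finite order (our sketch, reusing the helpers above; the tolerance is loose since the limit is only approached as $r$ grows):

P = [0.5, 0.3, 0.2]
p_star = max(P)
for t in (0.0, 1.0, 2.0, 3.0):
    assert abs(sharma_mittal(P, 500.0, t) - lnt(1.0 / p_star, t)) < 1e-2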

4. Ordinal equivalence, additivity, and concavity

The ordinal equivalence of any pair of Sharma-Mittal measures $\mathrm{ent}^{SM(r,t)}$ and $\mathrm{ent}^{SM(r,t^*)}$ with the same order $r$ and different degrees, $t$ and $t^*$, is easily proven on the basis of the inverse relationship of $\ln_t(x)$ and $e_t(x)$. In fact, for any $r$, $t$, $t^*$, and any $H$ and $P(H)$, $\mathrm{ent}^{SM(r,t)}$ is a strictly increasing function of $\mathrm{ent}^{SM(r,t^*)}$:

\[
\mathrm{ent}_P^{SM(r,t)}(H) = \ln_t e_r\!\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right]
= \ln_t e_{t^*}\!\left\{\ln_{t^*} e_r\!\left[\sum_{h_i \in H} p_i \ln_r\!\left(\tfrac{1}{p_i}\right)\right]\right\}
= \ln_t e_{t^*}\!\left[\mathrm{ent}_P^{SM(r,t^*)}(H)\right]
\]

For degrees $t, t^* \neq 1$, this implies that:

\[
\mathrm{ent}_P^{SM(r,t)}(H) = \tfrac{1}{1-t}\left\{\left[1 + (1-t^*)\,\mathrm{ent}_P^{SM(r,t^*)}(H)\right]^{\frac{1-t}{1-t^*}} - 1\right\}
\]

whereas when $t = 1$ and/or $t^* = 1$, the limiting cases of the ordinary exponential and/or natural logarithm apply. This general result is novel in the literature to the best of our knowledge. However, a well-known special case is the relationship between Rényi entropies and the Effective Number measures (see Hill, 1973, p. 428, and Ricotta, 2003, p. 191):

\[
\mathrm{ent}_P^{\text{Rényi}(r)}(H) = \mathrm{ent}_P^{SM(r,1)}(H)
= \ln\!\left[e_0\!\left(\mathrm{ent}_P^{SM(r,0)}(H)\right)\right]
= \ln\!\left[1 + \mathrm{ent}_P^{SM(r,0)}(H)\right]
= \ln\!\left[\mathrm{ent}_P^{EN(r)}(H) + 1\right]
\]
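The monotone mapping between degrees can be checked numerically as well (our sketch, reusing the helpers above): applying $\ln_t e_{t^*}$ to a Sharma-Mittal value of degree $t^*$ recovers the value of degree $t$ for the same order, and the Rényi/Effective Number special case falls out as $\ln(\mathrm{ent}^{EN(r)} + 1)$.

P = [0.45, 0.25, 0.2, 0.1]
r = 2.0
for t in (0.0, 0.5, 2.0, 3.0):
    for t_star in (0.0, 0.5, 2.0, 3.0):
        mapped = lnt(et(sharma_mittal(P, r, t_star), t_star), t)
        assert abs(mapped - sharma_mittal(P, r, t)) < 1e-9
# Rényi entropy as the log of the effective number: ln(ent^{EN(r)} + 1)
assert abs(renyi(P, r) - math.log(sharma_mittal(P, r, 0.0) + 1.0)) < 1e-9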

Another neat illustration involves Power entropy measures and Rényi entropies:

\[
\mathrm{ent}_P^{Power(r)}(H) = \mathrm{ent}_P^{SM(r,2)}(H)
= \ln_2\!\left[e\!\left(\mathrm{ent}_P^{SM(r,1)}(H)\right)\right]
= 1 - e^{-\mathrm{ent}_P^{\text{Rényi}(r)}(H)}
\]
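This link, too, can be confirmed in one line (our sketch, same helpers as above):

P = [0.45, 0.25, 0.2, 0.1]
for r in (0.5, 2.0, 3.0):
    assert abs(sharma_mittal(P, r, 2.0) - (1.0 - math.exp(-renyi(P, r)))) < 1e-9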

We will now derive the general additivity rule for Sharma-Mittal entropies concerning independent variables, i.e., when $X \perp_P Y$ holds. To simplify notation, below we will use $\Sigma(X)$ as a shorthand for $\left(\sum_{x_i \in X} P(x_i)^r\right)^{\frac{t-1}{r-1}}$ (the same for $Y$, and so on), and we will use $\mathrm{ent}(X)$ as a shorthand for $\mathrm{ent}_P^{SM(r,t)}(X)$ (the same for the expected reduction of entropy, $R$).

\[
\mathrm{ent}(X) + \mathrm{ent}(Y) - (t-1)\,\mathrm{ent}(X)\,\mathrm{ent}(Y)
\]
\[
= \tfrac{1}{t-1}[1 - \Sigma(X)] + \tfrac{1}{t-1}[1 - \Sigma(Y)] - (t-1)\,\tfrac{1}{t-1}[1 - \Sigma(X)]\,\tfrac{1}{t-1}[1 - \Sigma(Y)]
\]
\[
= \tfrac{1}{t-1}[1 - \Sigma(X)] + \tfrac{1}{t-1}[1 - \Sigma(Y)] - \tfrac{1}{t-1}[1 - \Sigma(X)][1 - \Sigma(Y)]
\]
\[
= \tfrac{1}{t-1} - \tfrac{1}{t-1}\Sigma(X) + \tfrac{1}{t-1} - \tfrac{1}{t-1}\Sigma(Y) - \tfrac{1}{t-1} + \tfrac{1}{t-1}\Sigma(X) + \tfrac{1}{t-1}\Sigma(Y) - \tfrac{1}{t-1}\Sigma(X)\Sigma(Y)
\]
\[
= \tfrac{1}{t-1} - \tfrac{1}{t-1}\Sigma(X)\Sigma(Y)
\]
\[
= \tfrac{1}{t-1} - \tfrac{1}{t-1}\left(\sum_{x_i \in X} P(x_i)^r\right)^{\frac{t-1}{r-1}}\left(\sum_{y_j \in Y} P(y_j)^r\right)^{\frac{t-1}{r-1}}
\]
\[
= \tfrac{1}{t-1}\left[1 - \left(\sum_{x_i \in X} P(x_i)^r \sum_{y_j \in Y} P(y_j)^r\right)^{\frac{t-1}{r-1}}\right]
\]
\[
= \tfrac{1}{t-1}\left[1 - \left(\sum_{x_i \in X}\sum_{y_j \in Y}\left(P(x_i)P(y_j)\right)^r\right)^{\frac{t-1}{r-1}}\right]
\]
\[
= \tfrac{1}{t-1}\left[1 - \left(\sum_{x_i \in X}\sum_{y_j \in Y} P(x_i \cap y_j)^r\right)^{\frac{t-1}{r-1}}\right]
= \mathrm{ent}(X \times Y)
\]
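A numerical check of this additivity rule for an arbitrary pair of independent variables (our sketch, reusing the sharma_mittal helper above):

PX = [0.7, 0.3]
PY = [0.5, 0.3, 0.2]
PXY = [px * py for px in PX for py in PY]    # joint distribution under independence

r, t = 2.0, 3.0
lhs = sharma_mittal(PXY, r, t)
rhs = (sharma_mittal(PX, r, t) + sharma_mittal(PY, r, t)
       - (t - 1.0) * sharma_mittal(PX, r, t) * sharma_mittal(PY, r, t))
assert abs(lhs - rhs) < 1e-9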

This additivity rule in turn governs the relationship between the expected entropy reduction of a test in case it is a perfect (conclusive) experiment and in case it is not. More precisely, it implies that for independent variables $E$ and $H$:

\[
R(E, E) - R(H \times E, E) = (t-1)\,\mathrm{ent}(H)\,\mathrm{ent}(E)
\]

In fact:

\[
R(E, E) - R(H \times E, E) = \mathrm{ent}(E) - \sum_{e_j \in E}\left[\mathrm{ent}(E|e_j)\right]P(e_j) - \mathrm{ent}(H \times E) + \sum_{e_j \in E}\left[\mathrm{ent}(H \times E|e_j)\right]P(e_j)
\]
\[
= \mathrm{ent}(E) - 0 - \mathrm{ent}(H) - \mathrm{ent}(E) + (t-1)\,\mathrm{ent}(H)\,\mathrm{ent}(E) + \sum_{e_j \in E}\left[\mathrm{ent}(H \times E|e_j)\right]P(e_j)
\]
\[
= -\mathrm{ent}(H) + (t-1)\,\mathrm{ent}(H)\,\mathrm{ent}(E) + \sum_{e_j \in E}\left[\mathrm{ent}(H|e_j) + \mathrm{ent}(E|e_j) - (t-1)\,\mathrm{ent}(H|e_j)\,\mathrm{ent}(E|e_j)\right]P(e_j)
\]
\[
= -\mathrm{ent}(H) + (t-1)\,\mathrm{ent}(H)\,\mathrm{ent}(E) + \sum_{e_j \in E}\left[\mathrm{ent}(H|e_j) + 0 - (t-1)\left(\mathrm{ent}(H|e_j) \times 0\right)\right]P(e_j)
\]
\[
= -\mathrm{ent}(H) + (t-1)\,\mathrm{ent}(H)\,\mathrm{ent}(E) + \mathrm{ent}(H) = (t-1)\,\mathrm{ent}(H)\,\mathrm{ent}(E)
\]
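This relation also lends itself to a quick numerical check for independent H and E (our sketch, reusing the helpers above): for a conclusive experiment $\mathrm{ent}(E|e_j) = 0$, and under independence $\mathrm{ent}(H \times E|e_j) = \mathrm{ent}(H)$.

PH = [0.5, 0.3, 0.2]
PE = [0.6, 0.4]
PHE = [ph * pe for ph in PH for pe in PE]    # joint of H x E under independence

r, t = 2.0, 3.0
ent_H, ent_E, ent_HE = (sharma_mittal(p, r, t) for p in (PH, PE, PHE))
R_E_E = ent_E                 # conclusive experiment: ent(E | e_j) = 0 for every answer
R_HE_E = ent_HE - ent_H       # ent(H x E | e_j) = ent(H), by independence
assert abs((R_E_E - R_HE_E) - (t - 1.0) * ent_H * ent_E) < 1e-9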

Sharma-Mittal measures of expected entropy reduction are also generally additive for a combination of experiments, that is, for any $H$, $E$, $F$ and $P(H, E, F)$, it holds that $R(H, E \times F) = R(H, E) + R(H, F|E)$. To see this, let us first consider the entropy reduction of a specific datum $e$, $\Delta\mathrm{ent}(H, e) = \mathrm{ent}(H) - \mathrm{ent}(H|e)$. $\Delta\mathrm{ent}$ is clearly additive in the following way:

\[
\Delta\mathrm{ent}(H, e \cap f) = \mathrm{ent}(H) - \mathrm{ent}(H|e \cap f)
= \mathrm{ent}(H) - \mathrm{ent}(H|e) + \mathrm{ent}(H|e) - \mathrm{ent}(H|e \cap f)
= \Delta\mathrm{ent}(H, e) + \Delta\mathrm{ent}(H, f|e)
\]

But this pattern carries over to the expected value $R(H, E \times F)$:

\[
R(H, E \times F) = \sum_{e_j \in E}\sum_{f_k \in F}\left[\Delta\mathrm{ent}(H, e_j \cap f_k)\right]P(e_j \cap f_k)
\]
\[
= \sum_{e_j \in E}\sum_{f_k \in F}\left[\Delta\mathrm{ent}(H, e_j) + \Delta\mathrm{ent}(H, f_k|e_j)\right]P(f_k|e_j)P(e_j)
\]
\[
= \sum_{e_j \in E}\sum_{f_k \in F}\left[\Delta\mathrm{ent}(H, e_j)\right]P(f_k|e_j)P(e_j) + \sum_{e_j \in E}\sum_{f_k \in F}\left[\Delta\mathrm{ent}(H, f_k|e_j)\right]P(f_k|e_j)P(e_j)
\]
\[
= \sum_{e_j \in E}\left[\Delta\mathrm{ent}(H, e_j)\right]P(e_j) + \sum_{e_j \in E}\left\{\sum_{f_k \in F}\left[\Delta\mathrm{ent}(H, f_k|e_j)\right]P(f_k|e_j)\right\}P(e_j)
\]
\[
= R(H, E) + \sum_{e_j \in E}\left[R(H, F|e_j)\right]P(e_j)
= R(H, E) + R(H, F|E)
\]

This result is novel in the literature to the best of our knowledge.
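The chain rule for expected entropy reduction can also be verified numerically. The sketch below (ours, reusing the sharma_mittal helper above) defines a hypothetical expected_reduction helper that computes $R(H, E)$ from a joint distribution, builds an arbitrary illustrative joint distribution over $H \times E \times F$, and checks $R(H, E \times F) = R(H, E) + R(H, F|E)$.

def expected_reduction(joint, r, t):
    # joint: dict mapping (h, e) -> probability; R(H, E) = ent(H) - sum_e P(e) * ent(H | e)
    hs = sorted({h for h, _ in joint})
    es = sorted({e for _, e in joint})
    p_h = [sum(joint.get((h, e), 0.0) for e in es) for h in hs]
    prior_ent = sharma_mittal(p_h, r, t)
    expected_posterior = 0.0
    for e in es:
        p_e = sum(joint.get((h, e), 0.0) for h in hs)
        if p_e > 0:
            post = [joint.get((h, e), 0.0) / p_e for h in hs]
            expected_posterior += p_e * sharma_mittal(post, r, t)
    return prior_ent - expected_posterior

# arbitrary illustrative joint distribution over H x E x F (values normalized to sum to 1)
import itertools, random
random.seed(0)
raw = {(h, e, f): random.random() for h, e, f in itertools.product(range(3), range(2), range(2))}
total = sum(raw.values())
p_hef = {k: v / total for k, v in raw.items()}

r, t = 2.0, 0.5
# R(H, E x F): treat the pair (e, f) as a single answer
R_HxEF = expected_reduction({(h, (e, f)): p for (h, e, f), p in p_hef.items()}, r, t)
# R(H, E)
R_HE = expected_reduction({(h, e): sum(p for (h2, e2, f), p in p_hef.items() if h2 == h and e2 == e)
                           for h in range(3) for e in range(2)}, r, t)
# R(H, F|E) = sum_e P(e) * R(H, F | e)
R_HF_given_E = 0.0
for e in range(2):
    p_e = sum(p for (h, e2, f), p in p_hef.items() if e2 == e)
    cond = {(h, f): sum(p for (h2, e2, f2), p in p_hef.items() if h2 == h and e2 == e and f2 == f) / p_e
            for h in range(3) for f in range(2)}
    R_HF_given_E += p_e * expected_reduction(cond, r, t)

assert abs(R_HxEF - (R_HE + R_HF_given_E)) < 1e-9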

Finally, we will show that, for any $H$, $E$, and $P(H, E)$, $R(H, E) \geq 0$ if and only if ent is concave (we spell out the direction from concavity to non-negativity; the converse follows by reversing the steps, given that the Jensen characterization below is itself a biconditional). Let $E_P(v)$ be the expected value of a variable $v$ for some probability distribution $P = \{p_1, \ldots, p_m\}$, i.e. $E_P(v) = \sum_{i=1}^{m} v_i\, p_i$. According to a multivariate version of Jensen's inequality, $f(x_1, \ldots, x_n)$ is a concave function if and only if $f$ of the expected values of its arguments is greater than (or equal to) the expected value of $f$, that is:

\[
f\left[E_P(x_1), \ldots, E_P(x_n)\right] \geq E_P\left[f(x_1, \ldots, x_n)\right]
\]

Now we set $f(x_1, \ldots, x_n) = \mathrm{ent}(H|e)$ (with the arguments $x_i = P(h_i|e_j)$) and we posit that $E_P(\cdot)$ be computed on the basis of $P(E)$, i.e. $E_P(v) = \sum_{e_j \in E} v_j\, P(e_j)$. Assuming that ent is concave, we have:

\[
\tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H}\left(\sum_{e_j \in E} P(h_i|e_j)P(e_j)\right)^{r}\right)^{\frac{t-1}{r-1}}\right]
\geq \sum_{e_j \in E}\left\{\tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i|e_j)^r\right)^{\frac{t-1}{r-1}}\right]\right\}P(e_j)
\]

Since $\sum_{e_j \in E} P(h_i|e_j)P(e_j) = P(h_i)$, this amounts to:

\[
\tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^r\right)^{\frac{t-1}{r-1}}\right]
- \sum_{e_j \in E}\left\{\tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i|e_j)^r\right)^{\frac{t-1}{r-1}}\right]\right\}P(e_j) \geq 0
\]
\[
\sum_{e_j \in E}\left\{\tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i)^r\right)^{\frac{t-1}{r-1}}\right] - \tfrac{1}{t-1}\left[1 - \left(\sum_{h_i \in H} P(h_i|e_j)^r\right)^{\frac{t-1}{r-1}}\right]\right\}P(e_j) \geq 0
\]
\[
\sum_{e_j \in E}\Delta\mathrm{ent}(H, e_j)\,P(e_j) \geq 0
\]
\[
R(H, E) \geq 0
\]

5. Expected entropy reduction in the Person Game

To analyze the expected entropy reduction of one binary query in the person game, we will posit $H = \{h_1, \ldots, h_n\}$ (the set of possible guesses as to who the randomly selected character is) and $E = \{e, \bar{e}\}$ (the yes/no answers to a question such as "does the selected character have blue eyes?"; recall that $\bar{e}$ denotes the complement or the negation of $e$). The joint probability distribution $P(H, E)$ is defined as follows: $P(h_i \cap e) = 1/n$ in case $i \leq k$ (with $1 \leq k < n$) and $P(h_i \cap e) = 0$ otherwise; $P(h_i \cap \bar{e}) = 0$ in case $i \leq k$ and $P(h_i \cap \bar{e}) = 1/n$ otherwise. This implies that $P(h_i) = 1/n$ for each $i$ (all guesses are initially equiprobable), $P(h_i|e) = 1/k$ for each $i \leq k$ (the posterior given $e$ is a uniform distribution over $k$ elements of $H$), and $P(h_i|\bar{e}) = 1/(n-k)$ for each $i > k$ (the posterior given $\bar{e}$ is a uniform distribution over $n-k$ elements of $H$). Moreover, $P(e) = k/n$. Given the general fact that $\mathrm{ent}_U^{SM(r,t)}(H) = \ln_t(n)$, we have:

\[
R_P^{SM(r,t)}(H, E) = \left[\ln_t(n) - \ln_t(k)\right]P(e) + \left[\ln_t(n) - \ln_t(n-k)\right]P(\bar{e})
\]

Algebraic manipulations yield:

\[
R_P^{SM(r,t)}(H, E) = \frac{n^{1-t}}{1-t}\left[1 - \left(P(e)^{2-t} + P(\bar{e})^{2-t}\right)\right]
\]

In the special case $t = 2$, one then has $R_P^{SM(r,2)}(H, E) = \tfrac{1}{n}$, so that the expected usefulness of query $E$ is constant, regardless of the value of $P(e)$. More generally, however, the first derivative of $R_P^{SM(r,t)}(H, E)$ with respect to $P(e)$ is

\[
\frac{n^{1-t}}{1-t}\left[(2-t)\,P(\bar{e})^{1-t} - (2-t)\,P(e)^{1-t}\right]
\]

which equals zero for $P(e) = P(\bar{e})$, so that $R_P^{SM(r,t)}(H, E)$ has a maximum or a minimum for $P(e) = \tfrac{1}{2}$. The second derivative, in turn, is:

\[
n^{1-t}\,(t-2)\left[P(e)^{-t} + P(\bar{e})^{-t}\right]
\]

which is strictly positive [negative] in case $t$ is strictly higher [lower] than 2. So, in the person game, $R_P^{SM(r,t)}(H, E)$ is a strictly concave function of $P(e)$ when $t < 2$, and $P(e) = \tfrac{1}{2}$ is then a maximum. When $t > 2$, on the contrary, $R_P^{SM(r,t)}(H, E)$ is a strictly convex function of $P(e)$, and $P(e) = \tfrac{1}{2}$ is then a minimum. This general result is novel in the literature to the best of our knowledge.
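These claims are easy to probe numerically (our sketch, reusing the helpers above; the function names person_game_R and closed_form_R are purely illustrative): with n equiprobable characters and a question singling out k of them, the expected reduction computed from the entropies matches the closed form above, is constant at 1/n when t = 2, and is best [worst] for the even split when t < 2 [t > 2].

def person_game_R(n, k, r, t):
    # expected reduction of a yes/no query splitting n equiprobable guesses into k vs n - k
    prior = [1.0 / n] * n
    post_yes = [1.0 / k] * k
    post_no = [1.0 / (n - k)] * (n - k)
    p_yes = k / n
    return sharma_mittal(prior, r, t) - (p_yes * sharma_mittal(post_yes, r, t)
                                         + (1.0 - p_yes) * sharma_mittal(post_no, r, t))

def closed_form_R(n, k, t):
    # (n**(1-t)/(1-t)) * [1 - (P(e)**(2-t) + P(not-e)**(2-t))], valid for t != 1
    pe, pne = k / n, (n - k) / n
    return (n ** (1.0 - t) / (1.0 - t)) * (1.0 - (pe ** (2.0 - t) + pne ** (2.0 - t)))

n = 16
for k in range(1, n):
    assert abs(person_game_R(n, k, 3.0, 2.0) - 1.0 / n) < 1e-9     # t = 2: constant 1/n
    assert abs(person_game_R(n, k, 2.0, 3.0) - closed_form_R(n, k, 3.0)) < 1e-9
# t < 2: the even split k = n/2 is best; t > 2: it is worst
values_low_t = [person_game_R(n, k, 2.0, 0.5) for k in range(1, n)]
values_high_t = [person_game_R(n, k, 2.0, 3.0) for k in range(1, n)]
assert max(values_low_t) == values_low_t[n // 2 - 1]
assert min(values_high_t) == values_high_t[n // 2 - 1]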