the danger of a big data episteme and the need to evolve gis
7/27/2019 The Danger of a Big Data Episteme and the Need to Evolve GIS
http://slidepdf.com/reader/full/the-danger-of-a-big-data-episteme-and-the-need-to-evolve-gis 1/9
The Danger of a Big Data Episteme and the Need to Evolve GIS
Sean P. Gorman – Timbr LLC, Charlottesville, VA USA
Abstract
The emergence of “Big Data” as a dominant technology meme challenges Geography’s
technical underpinnings, found in GIS, while engaging the discipline in a conversation
about the meme’s impact on society. This allows scholars to engage collaboratively
from both a computationally quantitative and a critically qualitative perspective. For
Geography there is an opportunity to point out the meme’s shortcomings through critical
appraisals of “Big Data” and its reflection of society. Complementarily, this opens the
door to developing methodologies that will allow for a more realistic interpretation of
“Big Data” analysis in the context of an unfiltered societal view.
Big Data as a Meme
“Big Data” is a popular technological meme that has become pervasive in the language
discussing a variety of computing challenges and trends. The term “Big Data” has
several characteristics often associated with popular technology memes:
1) The component words “big” and “data” are both broad and general
2) The term is open to multiple interpretations
3) The words can easily be composited with other terms to further spread the
meme: Big Science, Big Complexity, Big Privacy
“Big Data” is similar in trajectory to the “open source” and “web 2.0” memes that led to a
plethora of “open” prefixes and “2.0” suffixes. While the semantics of the creation and
use of the term “Big Data” are a fascinating road to walk, this position paper will focus
more on the methodological than the critical issues of the meme. Specifically, the paper
will examine the role of geography in “Big Data” through the challenges it creates for
computation, methodology, and interpretation. Further, it will explore the impact of
“Big Data” on the discipline of Geography as seen through the lens of GIS. It should be
kept in mind that the impacts of “Big Data” go beyond just quantitative approaches.
They may also impact qualitative research, as seen through the emerging works in
“digital humanities”.
While it’s difficult to pin down a general definition of “Big Data”, it is useful to have a
starting point for understanding the role of Geography today and in the future. The
short definition of “Big Data” is that it encompasses “a collection of data sets so large
and complex that it becomes difficult to process using on-hand database management
tools or traditional data processing applications (Wikipedia 2013).” Further, the unique
characteristics of these data sets that make them difficult to manage with traditional
tools are:
• Volume – the size of data that must be managed
• Velocity – the speed at which that data arrives and needs to be
processed/analyzed
• Variety – the types of data handled include structured and unstructured data
(e.g. text, sensor output, GPS, video, audio, log files etc.)
• Veracity – the accuracy and precision of data is variable
In addressing the role of geography in “Big Data”, one of the key takeaways is that
geography is not only just one of several types of data, it is also often inconsistent
across a single data set. For instance, in many mobile and social applications users
decide whether to include their location or not. This is indicative of a larger trend seen
in “location as a feature”.
The Emergence of Location as Feature
One of the important producers of “Big Data” has been the growth of mobile, social and
location applications, often called SoLoMo (Meeker 2011). While geographic
information science has largely evolved along a path of increasing specialization and
complexity driven by professionals, SoLoMo has emerged as a general technology trend
quickly making location ubiquitous, driven by consumers. By its very nature SoLoMo has
been centered on self service, allowing the consumer masses to seamlessly interact
with location and geography. This technology shift has been driven by several evolving
factors and events. First, GPS enabled a larger number of people to create geographic
data. This was followed by the incorporation of GPS into commodity technologies like
mobile devices. In addition, location has permeated up the information technology stack
with W3C specifications for adding location to Web browsers, and even the inclusion of
location into desktop operating systems. The location component created by these
technologies is one data feature of an existing baseline, and not a standalone
technology as was developed with GIS. Further, the attributes of data went beyond
what the computational underpinnings of GIS were originally constructed for, now
integrating unstructured data and temporal attributes both at very large volumes and
high speeds.
The adoption of “location as a feature” has been massive in scale. The graphic below covers just the adoption of mobile/location technologies to drive social applications:
Figure 1: The GeoSocial Universe Adopting Location as a Feature (Jess3 2013)
One of the challenges for Geography as a discipline is that “location as a feature”
happened outside the paradigm of geographic information science. This occurred for several reasons. 1) GIS was built for working with geographic data and location as the
center of the operating system. For the rapidly growing SoLoMo space, location was just
one of many components being leveraged, and was not the center of the
operating system. 2) Data flowing from mobile devices through social networks is
dynamic and not static. While time-space has been an active area of study in
geographic information science, traditionally data sources have not been unbounded
and perpetually updating. This phenomenon is reinforced by the data characteristics
across these emerging services: data arrives at high velocity (big events like the Super Bowl and New Year’s Eve can
result in rates of 5,000-6,000 tweets per second from a location) and in massive
volumes (155 million tweets in a day) (Twitter 2011). Put into the larger perspective,
90% of all data in the history of humanity has been created in the last two years (Tofel
2011). This was not the technology paradigm when GIS emerged: data was static, in
mostly small volumes and intended for a relatively small audience. This is not to say GIS
has not evolved, but it has iteratively adapted to requirements in its niche of
practitioners, and not to the demands of the larger market that is served by “Big Data”.
Monolithic vs. Distributed
In its inception GIS was predominantly a monolithic application running in a mainframe
environment, and as it evolved from the command line to the desktop the same
monolithic structure persisted. Monolithic systems “help the user carry out a complete
task, end to end, and are ‘private data silos’ rather than parts of a larger system of applications that work together (Wikipedia 2011).” In the case of GIS the complete task
was performing geospatial analysis that resulted in an end product to be distributed to
viewers. Few other systems at this time generated spatial data, so monolithic structures
were not only a popular but also a practical solution. The same structures also
dominated word processing and personal finance applications for similar reasons
(Wikipedia 2011).
The monolithic structure also matched up well with the philosophical direction of
geographic information science. In this construct, GIS was viewed not only as a science
but as a profession, which required specialty skills and training within geography
departments. While this created a corpus of highly trained professionals, it also created
an insular approach that manifested itself in the technical architecture of GIS. Data
was created, managed, analyzed, visualized and published all within a single system, and
the result was considered authoritative and canonical. Operation of the system was by
trained professionals. Since data was not created externally, a monolithic design was
efficient and well suited to the customer set at the time.
As technology has evolved from monolithic to distributed systems, GIS has adapted and
evolved as well. First, GIS adapted to a client-server environment, and it increasingly
provides Application Programming Interfaces (APIs) to data and analysis, enabling
external consumption. Most recently there have been connections of “Big Data” computational platforms like Hadoop to GIS applications (Github 2013a and Github
2013b). The hooks into GIS have modernized and the structure has evolved toward a more
distributed architecture, although user workflows are still geared towards the entire
task being done end-to-end in the system.
Challenges of Scope and Scale
Computationally, “Big Data” and related trends have created a distributed ecosystem
with many components and users. As location data comes from an increasing variety of
devices and contributors, there is a challenge of what mechanism will manage the accuracy and veracity of the data. There are potential problems in both scale and scope in
applying the current GIS process for determining what is “authoritative data” to these
emerging sources of unbounded location data. From a scope perspective it requires the
GIS cadre of professionals to be experts in a massive number of subject matter areas:
anthropology, sociology, economics, political science, social media, disaster response,
etc. Is the disaster response professional on the ground in a better position to
determine the quality of data being reported by citizens, or are the GIS professionals
back at headquarters? Should an institution be dependent on having
geography/geomatics academic departments generate GIS curricula to create a new
generation of social media analysts before responding to the pressing need to analyze a
new source of information? The inherent problem of having the discipline of geography
create a specialty discipline for every aspect of science that has a location or geographic
component has long been recognized as “the recurring identity crisis that plagues modern geography and its practitioners (Tuason 1987)”.
This is where the problem of scale becomes potentially insurmountable. The monolithic
structure, which requires a GIS-trained person to designate data as authoritative, has an
inherent dependency on a trained person always being in the loop. As the
volume of location-enabled data increases at an exponential rate, it raises the real
problem of how the number of GIS professionals can scale to keep pace with the speed
and volume of the new data that must be verified. The structure of GIS as a technology
and profession was not built to handle massive volumes of externally authored data.
Because of its monolithic structure, data was to be generated by professionals solely
within the GIS workflow. Now, Twitter alone is generating millions of location-enabled
messages per day. Simply, there are not enough trained professionals to verify each
new piece of data even if they did have the tools. It is a problem of supply and demand.
The supply of data being generated has far outstripped the supply of trained
professionals to verify it, requiring a new paradigm in order to adapt. This is not to say
the concept of verified and unverified data is not critical to effective operations. It is
saying that in order to keep up with the rapidly growing volume of data, the verification
and validation of data cannot continue to be purely dependent on trained human
professionals doing this by hand or with current tools. Innovation, automation,
statistical inference and the use of crowdsourcing to enable verification and validation
of data are greatly needed in order for GIS to successfully adapt.
Issues of privacy and the potential of creating both government and corporate driven
surveillance states further complicate this challenge (Dodge and Kitchin 2007). As
humans are taken out of the loop and replaced with algorithmic regulation, the
application of ethics and governance is unclear. While this goes outside the scope of
this position paper, it is a useful connection of how technological challenges of “Big
Data” are directly linked to societal repercussions being focused on by other papers in
this journal edition.
Statistical Challenges of Data at Scale
The amount of data emerging from “Big Data”, where location is one feature of data, is
only going to increase at ever-higher velocities. This new reality is going to require
innovative concepts around not only leveraging the crowd for data, but also using the
crowd to ascertain the veracity of data. Traditional concepts like error bounds will
fundamentally change because data collection has expanded from just a periodic basis
to also include persistent collection from millions of globally distributed sensors. In this
context, error will be a fluid concept and not a static measure. Metadata also needs to
evolve to a fluid concept in these cases. The requirement for dedicated GIS metadata
librarians with hundreds of metadata elements will not scale for “Big Data”. The crowd
can be leveraged to verify and update metadata as one potential, if not entirely
sufficient, path. This has been done with great success for “point of interest” (POI) and
road data by projects such as Factual and OpenStreetMap, respectively.
Further, the concept of sample size and margin of error is being turned upside down.
Previously a small cadre of highly trained professionals made a small number of very
precise observations and these were extrapolated to an entire population. Now, sample
sizes come close to the size of the actual population, but are also incredibly biased (e.g.
Twitter provides a massive sample but it is biased to only those using Twitter). Recent
work by the Oxford Internet Institute found large biases just in different methods of
accessing Twitter to query data for analysis: search API vs. streaming API (Gonzales-Bailon
et al. 2012). There is still a lack of fundamental science in understanding what the
geographic and demographic biases are of the producers of “Big Data”, through the
variety of user-driven services that create the content.
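The contrast between a very large but self-selected sample and a small random one can be sketched with a short simulation. Everything in it is an assumption made for illustration: the synthetic population, the age-dependent adoption rates, and the names simulate and p_use are hypothetical and are not drawn from Twitter or any other real service.

```python
import random
from statistics import mean


def p_use(age):
    """Hypothetical probability that a person of this age uses the platform."""
    if age < 30:
        return 0.9
    if age < 50:
        return 0.5
    return 0.1


def simulate(pop_size=100_000, survey_n=500, seed=0):
    """Compare a large self-selected sample with a small random survey."""
    rng = random.Random(seed)
    # Synthetic population: ages spread evenly between 18 and 77.
    ages = [rng.randint(18, 77) for _ in range(pop_size)]
    # "Platform" sample: everyone who happens to use the service (self-selected).
    platform = [a for a in ages if rng.random() < p_use(a)]
    # Traditional sample: a small simple random draw from the whole population.
    survey = rng.sample(ages, survey_n)
    truth = mean(ages)
    return {
        "population_mean": truth,
        "platform_n": len(platform),
        "platform_error": mean(platform) - truth,
        "survey_n": survey_n,
        "survey_error": mean(survey) - truth,
    }


result = simulate()
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions the platform sample is many times larger than the survey, yet its estimate of mean age sits much further from the population value, because adding observations does nothing to correct for who chooses to participate.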
The Methodological Challenges of the Variety in Big Data
The emergence of “location as a feature” in mobile and web apps has not only
generated a large amount of new data, but also changed the defining characteristics of
the data. This emerging data is often unstructured, unverified, streaming and
unbounded; as noted above, this is a different world than the majority of structured GIS
data worked with traditionally.
Tackling this data means not only reimagining many current statistical techniques, but also dipping into other disciplines and tool-boxes like natural language processing,
statistical mechanics and machine learning, to name a few. Extending Geography to
work with these emergent sources of data means 1) evolving current disciplinary
approaches and 2) borrowing from other disciplines to solve new problems.
“Big Data” has several features that geographic information science has not
commonly focused on, and for which there is not a solid existing methodological framework.
Challenges in dealing with error, accuracy, and sample bias have been
addressed briefly in this paper. Expanding the list to deal with the unstructured
aspects of big data, unbounded data streams, location as a subset of a larger data set
and others goes well beyond the scope of the paper. It is useful, though, to give a trivial example
of how these challenges can make even a simple geographic analysis task challenging.
Thematic maps are one of the most common cartographic outputs, and selecting
the right classification for data is part of telling the appropriate story of a data analysis.
When the data for a map is static this is a fairly straightforward task. When data is
dynamic and unbounded the task becomes considerably more complex. Starting with
four of the most common approaches to binning data for thematic mapping (equal
interval, standard deviation, Jenks natural breaks and quantile), the challenges quickly
emerge. Both standard deviation and Jenks require the minimum and maximum values
of the data distribution to be known. In the case of an unbounded perpetually updating
stream, it is not possible to know what these values will be. The historical minimum and maximum values of the data stream could be used as a proxy, but these
could easily be exceeded in the future, causing the mapping to be inaccurate. Quantile
and equal interval can be calculated dynamically since they do not require the bounds of
data, but do not cover all data distributions accurately. Further, these data distributions
will change over time so the appropriate binning at time “x” might not also be the
correct binning at time “y”. This begins to provide some perspective on the challenges
“Big Data” holds for geographic methodologies, which only become more complex when
applied to more sophisticated geographic methodologies utilizing “Big Data”.
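As a minimal sketch of the drift described above, the following Python recomputes quartile class breaks over two successive windows of a stream. The window contents, the helper names quantile_breaks and classify, and the choice of four classes are assumptions made for illustration; they do not come from any particular GIS package.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_breaks(values, classes=4):
    """Return the interior breaks that split values into equal-count classes."""
    return quantiles(values, n=classes)


def classify(value, breaks):
    """Return the zero-based class index of value for the given breaks."""
    return bisect_right(breaks, value)


# Hypothetical per-cell counts observed in two successive windows of a stream.
window_x = list(range(1, 101))        # counts at time "x"
window_y = [2 * v for v in window_x]  # activity has doubled by time "y"

breaks_x = quantile_breaks(window_x)
breaks_y = quantile_breaks(window_y)

# The same observed count lands in a different class once the breaks are refreshed.
count = 80
class_x = classify(count, breaks_x)
class_y = classify(count, breaks_y)

# Bounds taken from history are exceeded by later data.
hist_min, hist_max = min(window_x), max(window_x)
out_of_range = [v for v in window_y if not hist_min <= v <= hist_max]

print(f"class at x: {class_x}, class at y: {class_y}")
print(f"{len(out_of_range)} of {len(window_y)} later values exceed the historical range")
```

In this toy case a legend built from the breaks at time “x” would place a count of 80 in the top class, while the refreshed breaks at time “y” place it in a lower one, so a map of streaming data has to recompute its breaks as the distribution moves.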
The Challenge of Interpretation when Big Data Equals the Perceived End of Theory
The methodological challenges imposed by “Big Data” make interpretation exceedingly
difficult. In spite of these obstacles there is a popular conception that “Big Data” will
not only be the end of theory (Anderson 2008), but even further the:
“Belief that big data, harnessed through collective intelligence, would allow us to
get at the right answer to every problem, making both representation and
deliberation unnecessary” (Morozov 2013).
The panacea aspects of “Big Data” have grown as popular perception, leading to beliefs
that results of analyses are applicable to society writ large. Haklay (2013) has written on the trend of tools and data generated by the technological elite and the biases (Haklay
and Budhathoki 2010) in the data generated. While the concept of a human-powered
sensor web driven by the adoption of mobile devices is compelling, there is little
understanding of the macro-scale dynamics. Who is and who is not connected? Who
contributes and who passively consumes? How does this break down by demographic
and geography? The digital divide is about much more than connectivity; it is also about
participation on the various services riding across networks that generate “Big Data”.
What are the “data shadows” created by the interactions of humans and machines across
networks that compress time and space (Graham 2013)? The creation of content that
feeds “Big Data” both actively and passively has its own distinct geographies and biases that are only beginning to be conceptualized. Without this parameterization it is
incredibly difficult to interpret the results of “Big Data” in the context of global society.
Conclusion
This paper began by discussing aspects of “Big Data” as a technological meme. Exploring
how “Big Data” has evolved points to its perceived emergence as an episteme. What
began as an evolution in computation has morphed in popular culture to be a field of
scientificity. Those that work with “Big Data” are even referred to as “data scientists”.
The reductionist methods of understanding reality in “Big Data” produce new
knowledge and methods for the control of reality. Yet it is not a reality that reflects the
larger society, but instead the small minority contributing content.
For Geography as a discipline there is an opportunity to point out these shortcomings
through critical appraisals of “Big Data” and its reflection of society. Further, there is
potentially an even larger opportunity in developing the methodologies that will allow
for a more realistic interpretation of “Big Data” analysis in the context of an unfiltered
societal view. To do so, the geographic information science aspect of the discipline will
need to evolve its approach to data and analysis. If successful, this provides a unique
opportunity for positivistic and post-positivistic scholars in Geography to collaborate in
pushing the discipline forward to an area in need of greater illumination.
References
Anderson, C. (2008) “The End of Theory: The Data Deluge Makes the Scientific Method
Obsolete”. Wired, June 23, 2008,
http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
Dodge, M. and Kitchin, R. (2007) “‘Outlines of a world coming into existence’: Pervasive
computing and the ethics of forgetting” Environment and Planning B 34:3, 431-445
Github (2013a) “GIS Tools for Hadoop” April 3, 2013 http://esri.github.com/gis-tools-for-hadoop/
Github (2013b) “SpatialHadoop” April 3, 2013
https://github.com/aseldawy/spatialhadoop
Gonzales-Bailon, S., Wang, N., Rivero, A., Borge-Holthoefer, J. and Moreno, Y. (2012)
“Assessing the Bias on Communications Networks Sampled from Twitter” arXiv.org,
December 7, 2012 http://arxiv.org/abs/1212.1684
Graham, M. (2013) “The Virtual Dimension.” In Global City Challenges: debating a
concept, improving the practice. eds. M. Acuto and W. Steele. London: Palgrave. (in
press)
Haklay, M. (2013) “Neogeography and the Delusion of Democratization” Environment
and Planning A, 45:1, 55-69
Haklay, M. and Budhathoki, N. (2010) “OpenStreetMap – Overview and Motivational
Factors” Horizon Infrastructure Challenge Theme Day, the University of Nottingham,
UK, March 19, 2010 http://www.academia.edu/2656557/OpenStreetMap-Overview_and_Motivational_Factors
Jess3 (2013) “The GeoSocial Universe 3.0” February 7, 2013
http://blog.jess3.com/2013/02/the-geosocial-universe-3-0.html
Meeker, M. (2011) “Top Mobile Internet Trends” February 10, 2011
http://www.slideshare.net/kleinerperkins/kpcb-top-10-mobile-trends-feb-2011
Morozov, E. (2013) “The Meme Hustler: Tim O’Reilly’s Crazy Talk” The Baffler, 22
http://thebaffler.com/past/the_meme_hustler
Tofel, Kevin C. (2011) “Reducing Data Latency Leads to Faster Decisions” March 23,
2011 http://gigaom.com/2011/03/23/reducing-data-latency-leads-to-faster-decisions/
Tuason, Julie A. (1987) “Reconciling the Unity and Diversity of Geography” Journal of
Geography, 86:5, 190-193
Twitter (2011) “Celebrating a New Year with a New Tweet Record” January 6, 2011
http://blog.twitter.com/2011/01/celebrating-new-year-with-new-tweet.html
Wikipedia (2013) “Big Data” April 2, 2013
http://en.wikipedia.org/w/index.php?title=Big_data&oldid=574362901
Wikipedia (2011) “Monolithic Application” April 20, 2011
http://en.wikipedia.org/w/index.php?title=Monolithic_application&oldid=572922086