7/27/2019 The Danger of a Big Data Episteme and the Need to Evolve GIS

http://slidepdf.com/reader/full/the-danger-of-a-big-data-episteme-and-the-need-to-evolve-gis 1/9

The Danger of a Big Data Episteme and the Need to Evolve GIS

Sean P. Gorman – Timbr LLC, Charlottesville, VA USA

Abstract

The emergence of “Big Data” as a dominant technology meme challenges Geography’s technical underpinnings, found in GIS, while engaging the discipline in a conversation about the meme’s impact on society. This allows scholars to engage collaboratively from both a computationally quantitative and a critically qualitative perspective. For Geography there is an opportunity to point out the meme’s shortcomings through critical appraisals of “Big Data” and its reflection of society. Complementarily, this opens the door to developing methodologies that will allow for a more realistic interpretation of “Big Data” analysis in the context of an unfiltered societal view.

Big Data as a Meme

“Big Data” is a popular technological meme that has become pervasive in the language discussing a variety of computing challenges and trends. The term “Big Data” has several characteristics often associated with popular technology memes:

1)  The component words “big” and “data” are both broad and general

2)  The term is open to multiple interpretations

3)  The words can easily be composited with other terms to further spread the meme – Big Science, Big Complexity, Big Privacy

“Big Data” is similar in trajectory to the “open source” and “web 2.0” memes that led to a plethora of “open” prefixes and “2.0” suffixes. While the semantics of the creation and use of the term “Big Data” are a fascinating road to walk, this position paper will focus more on the methodological than the critical issues of the meme. Specifically, the paper will examine the role of geography in “Big Data” through the challenges it creates for computation, methodology, and interpretation. Further, it will explore the impact of “Big Data” on the discipline of Geography as seen through the lens of GIS. It should be kept in mind that the impacts of “Big Data” go beyond just quantitative approaches. They may also affect qualitative research, as seen in the emerging works in “digital humanities”.

While it’s difficult to pin down a general definition of “Big Data”, it is useful to have a starting point for understanding the role of Geography today and in the future. The short definition of “Big Data” is that it encompasses “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications” (Wikipedia 2013). Further, the unique characteristics of these datasets that make them difficult to manage with traditional tools are:

•  Volume – the size of data that must be managed

•  Velocity – the speed at which that data arrives and needs to be processed/analyzed

•  Variety – the types of data handled include structured and unstructured data (e.g. text, sensor output, GPS, video, audio, log files)

•  Veracity – the accuracy and precision of data is variable

In addressing the role of geography in “Big Data”, one of the key takeaways is that not only is geography just one of several types of data, it is also often inconsistent across a single dataset. For instance, in many mobile and social applications users decide whether to include their location or not. This is indicative of a larger trend seen in “location as a feature”.

The Emergence of Location as Feature

One of the important producers of “Big Data” has been the growth of mobile, social and location applications, often called SoLoMo (Meeker 2011). While geographic information science has largely evolved along a path of increasing specialization and complexity driven by professionals, SoLoMo has emerged as a general technology trend quickly making location ubiquitous, driven by consumers. By its very nature SoLoMo has been centered on self-service, allowing the consumer masses to seamlessly interact with location and geography. This technology shift has been driven by several evolving factors and events. First, GPS enabled a larger number of people to create geographic data. This was followed by the incorporation of GPS into commodity technologies like mobile devices. In addition, location has permeated up the information technology stack with W3C specifications for adding location to Web browsers, and even the inclusion of location into desktop operating systems. The location component created by these technologies is one data feature of an existing baseline, and not a standalone technology as was developed with GIS. Further, the attributes of data went beyond what the computational underpinnings of GIS were originally constructed for, now integrating unstructured data and temporal attributes at both very large volumes and high speeds.

The adoption of “location as a feature” has been massive in scale. The graphic below covers just the adoption of mobile/location technologies to drive social applications:

Figure 1: The GeoSocial Universe Adopting Location as a Feature (Jess3 2013)

One of the challenges for Geography as a discipline is that “location as a feature” happened outside the paradigm of geographic information science. This occurred for several reasons. 1) GIS was built for working with geographic data and location as the center of the operating system. For the rapidly growing SoLoMo space, location was just one of many components being leveraged, and was not the center of the operating system. 2) Data flowing from mobile devices through social networks is dynamic and not static. While time-space has been an active area of study in geographic information science, traditionally data sources have not been unbounded and perpetually updating. This phenomenon is reinforced by the data characteristics across these emerging services: big events like the Super Bowl and New Year’s Eve can result in rates of 5,000-6,000 tweets per second from a location, and overall volumes are massive (155 million tweets in a day) (Twitter 2011). Put into the larger perspective, 90% of all data in the history of humanity has been created in the last two years (Tofel 2011). This was not the technology paradigm when GIS emerged: data was static, in mostly small volumes, and intended for a relatively small audience. This is not to say GIS has not evolved, but it has iteratively adapted to requirements in its niche of practitioners, and not to the demands of the larger market that is served by “Big Data”.

Monolithic vs. Distributed

In its inception GIS was predominantly a monolithic application running in a mainframe environment, and as it evolved from the command line to the desktop the same monolithic structure persisted. Monolithic systems “help the user carry out a complete task, end to end, and are ‘private data silos’ rather than parts of a larger system of applications that work together” (Wikipedia 2011). In the case of GIS the complete task was performing geospatial analysis that resulted in an end product to be distributed to viewers. Few other systems at this time generated spatial data, so monolithic structures were not only a popular but also a practical solution. The same structures also dominated word processing and personal finance applications for similar reasons (Wikipedia 2011).

The monolithic structure also matched up well with the philosophical direction of geographic information science. In this construct, GIS was viewed not only as a science but as a profession, which required specialty skills and training within geography departments. While this created a corpus of highly trained professionals, it also created an insular approach that manifested itself in the technical architecture of GIS. Data was created, managed, analyzed, visualized and published all within a single system, and the result was considered authoritative and canonical. Operation of the system was by trained professionals. Since data was not created externally, a monolithic design was efficient and well suited to the customer set at the time.

As technology has evolved from monolithic to distributed systems, GIS has adapted and evolved as well. First, GIS adapted to a client-server environment, and it increasingly provides Application Programming Interfaces (APIs) to data and analysis, enabling external consumption. Most recently there have been connections of “Big Data” computational platforms like Hadoop to GIS applications (Github 2013a and Github 2013b). The hooks into GIS have modernized and the structure has evolved to a more distributed architecture, although user workflows are still geared towards the entire task being done end-to-end in the system.

Challenges of Scope and Scale

Computationally, “Big Data” and related trends have created a distributed ecosystem with many components and users. As location data comes from an increasing variety of devices and contributors, there is a challenge of what mechanism will manage the accuracy and veracity of the data. There are potential problems in both scale and scope in applying the current GIS process for determining what is “authoritative data” to these emerging sources of unbounded location data. From a scope perspective it requires the GIS cadre of professionals to be experts in a massive number of subject matter areas: anthropology, sociology, economics, political science, social media, disaster response etc. Is the disaster response professional on the ground in a better position to determine the quality of data being reported by citizens, or are the GIS professionals back at headquarters? Should an institution be dependent on having geography/geomatics academic departments generate GIS curricula to create a new generation of social media analysts before responding to the pressing need to analyze a new source of information? The inherent problem of having the discipline of geography create a specialty discipline for every aspect of science that has a location or geographic component has long been recognized as “the recurring identity crisis that plagues modern geography and its practitioners” (Tuason 1987).

This is where the problem of scale becomes potentially insurmountable. The monolithic structure, which requires a GIS-trained person to designate data as authoritative, has an inherent dependency on a trained person always being in the loop. As the volume of location-enabled data increases at an exponential rate, it raises the real problem of how the number of GIS professionals can scale to keep pace with the speed and volume of the new data that must be verified. The structure of GIS as a technology and profession was not built to handle massive volumes of externally authored data. Because of its monolithic structure, data was to be generated by professionals solely within the GIS workflow. Now, Twitter alone is generating millions of location-enabled messages per day. Simply, there are not enough trained professionals to verify each new piece of data even if they did have the tools. It is a problem of supply and demand. The supply of data being generated has far outstripped the supply of trained professionals to verify it, requiring a new paradigm in order to adapt. This is not to say the concept of verified and unverified data is not critical to effective operations. It is saying that in order to keep up with the rapidly growing volume of data, the verification and validation of data cannot continue to be purely dependent on trained human professionals doing this by hand or with current tools. Innovation, automation, statistical inference and the use of crowdsourcing to enable verification and validation of data are greatly needed in order for GIS to successfully adapt.

Issues of privacy and the potential of creating both government and corporate driven surveillance states further complicate this challenge (Dodge and Kitchin 2007). As humans are taken out of the loop and replaced with algorithmic regulation, the application of ethics and governance is unclear. While this goes outside the scope of this position paper, it usefully illustrates how the technological challenges of “Big Data” are directly linked to the societal repercussions that other papers in this journal edition focus on.

Statistical Challenges of Data at Scale

The amount of data emerging from “Big Data”, where location is one feature of data, is only going to increase at ever-higher velocities. This new reality is going to require innovative concepts around not only leveraging the crowd for data, but also using the crowd to ascertain the veracity of data. Traditional concepts like error bounds will fundamentally change because data collection has expanded from just a periodic basis to also include persistent collection from millions of globally distributed sensors. In this context, error will be a fluid concept and not a static measure. Metadata also needs to evolve into a fluid concept in these cases. The requirement for dedicated GIS metadata librarians with hundreds of metadata elements will not scale for “Big Data”. The crowd can be leveraged to verify and update metadata as one potential, if not entirely sufficient, path. This has been done with great success for “point of interest” (POI) and road data by projects such as Factual and OpenStreetMap respectively.

Further, the concept of sample size and margin of error is being turned upside down. Previously a small cadre of highly trained professionals made a small number of very precise observations and these were extrapolated to an entire population. Now, sample sizes come close to the size of the actual population, but are also incredibly biased (e.g. Twitter provides a massive sample but it is biased to only those using Twitter). Recent work by the Oxford Internet Institute found large biases just in different methods of accessing Twitter to query data for analysis: search API vs. streaming API (Gonzales-Bailon et al 2012). There is still a lack of fundamental science in understanding what the geographic and demographic biases are of the producers of “Big Data”, through the variety of user driven services that create the content.
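The inversion of sample size and margin of error can be made concrete with a small calculation. The sketch below is only an illustration, not an analysis from any of the cited studies: it uses the standard normal-approximation margin of error for a proportion, and the population share, platform share and sample sizes are all invented numbers.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the ~95% normal-approximation interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# All figures below are invented for illustration only.
true_share = 0.30       # share of the whole population with some attribute
platform_share = 0.45   # share among users of one platform (a biased frame)

# A small survey drawn at random from the whole population.
small_random_moe = margin_of_error(true_share, n=1_000)

# A huge sample drawn only from platform users.
huge_biased_moe = margin_of_error(platform_share, n=10_000_000)

# Sampling noise shrinks with n; the gap between frame and population does not.
frame_bias = abs(platform_share - true_share)

print(f"small random sample: +/- {small_random_moe:.4f}")
print(f"huge biased sample:  +/- {huge_biased_moe:.4f}, bias {frame_bias:.2f}")
```

Under these assumed numbers the huge sample reports a far narrower margin of error than the small one, yet its estimate sits further from the population value, because the margin of error measures only sampling noise and says nothing about who is missing from the sample.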

The Methodological Challenges of the Variety in Big Data

The emergence of “location as a feature” in mobile and web apps has not only generated a large amount of new data, but also changed the defining characteristics of the data. This emerging data is often unstructured, unverified, streaming and unbounded. As noted above, this is a different world than the majority of structured GIS data worked with traditionally.

Tackling this data means not only reimagining many current statistical techniques, but also dipping into other disciplines and tool-boxes like natural language processing, statistical mechanics and machine learning, to name a few. Extending Geography to work with these emergent sources of data means 1) evolving current disciplinary approaches and 2) borrowing from other disciplines to solve new problems.

“Big Data” has several features that geographic information science has not commonly focused on, and for which there is not a solid existing methodological framework. Challenges in dealing with error, accuracy, and sample bias have been addressed briefly in this paper. Expanding the list to dealing with the unstructured aspects of big data, unbounded data streams, location as a subset of a larger dataset and others goes well beyond the scope of the paper. It is useful, though, to give a trivial example of how these challenges can make even a simple geographic analysis task challenging.

Creating thematic maps is one of the most common cartographic outputs, and selecting the right classification for data is part of telling the appropriate story of a data analysis. When the data for a map is static this is a fairly straightforward task. When data is dynamic and unbounded the task becomes considerably more complex. Starting with four of the most common approaches to binning data for thematic mapping (equal interval, standard deviation, Jenks natural breaks and quantile), the challenges quickly emerge. Both standard deviation and Jenks require the minimum and maximum values of the data distribution to be known. In the case of an unbounded, perpetually updating stream, it is not possible to know what these values will be. The historical minimum and maximum values of the data stream could be used as a proxy, but these could easily be exceeded in the future, causing the mapping to be inaccurate. Quantile and equal interval can be calculated dynamically since they do not require the bounds of data, but do not cover all data distributions accurately. Further, these data distributions will change over time, so the appropriate binning at time “x” might not also be the correct binning at time “y”. This begins to provide some perspective on the challenges “Big Data” holds for geographic methodologies, challenges which only become more complex when more sophisticated geographic methodologies are applied to “Big Data”.
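The drift between time “x” and time “y” can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from the paper: the `StreamBinner` class, its window size and the sample values are all hypothetical, and quantile breaks over a sliding window are just one possible way to re-derive classes as a stream moves.

```python
from collections import deque
from statistics import quantiles


class StreamBinner:
    """Hypothetical helper: quantile class breaks over a sliding window of a stream."""

    def __init__(self, window_size, classes=4):
        self.window = deque(maxlen=window_size)  # oldest values drop off automatically
        self.classes = classes
        self.hist_min = None  # smallest value seen so far
        self.hist_max = None  # largest value seen so far

    def add(self, value):
        """Add one observation; return True if it fell outside the range seen so far."""
        outside = self.hist_min is not None and not (
            self.hist_min <= value <= self.hist_max
        )
        self.hist_min = value if self.hist_min is None else min(self.hist_min, value)
        self.hist_max = value if self.hist_max is None else max(self.hist_max, value)
        self.window.append(value)
        return outside

    def breaks(self):
        """Cut points that split the current window into equal-count classes."""
        return quantiles(self.window, n=self.classes, method="inclusive")


binner = StreamBinner(window_size=8)

# Time "x": an invented batch of values.
for v in [1, 2, 3, 4, 5, 6, 7, 8]:
    binner.add(v)
breaks_x = binner.breaks()

# Time "y": the stream has shifted, and each new value exceeds the old maximum.
surprises = [binner.add(v) for v in [20, 40, 60, 80, 100, 120, 140, 160]]
breaks_y = binner.breaks()

print("breaks at x:", breaks_x)
print("breaks at y:", breaks_y)
print("values outside the historical range:", sum(surprises))
```

In this made-up stream every later value breaks the historical maximum, so a legend frozen at time “x” would place all of them in its top class. Recomputing over a window keeps the classes meaningful, at the cost of a legend that changes under the map reader.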

The Challenge of Interpretation when Big Data Equals the Perceived End of Theory

The methodological challenges imposed by “Big Data” make interpretation exceedingly difficult. In spite of these obstacles there is a popular conception that “Big Data” will not only be the end of theory (Anderson 2008), but even further the:

“Belief that big data, harnessed through collective intelligence, would allow us to get at the right answer to every problem, making both representation and deliberation unnecessary” (Morozov 2013).

The panacea aspects of “Big Data” have grown as popular perception, leading to beliefs that results of analyses are applicable to society writ large. Haklay (2013) has written on the trend of tools and data generated by the technological elite and the biases (Haklay and Budhathoki 2010) in the data generated. While the concept of a human powered sensor web driven by the adoption of mobile devices is compelling, there is little understanding of the macro-scale dynamics. Who is and who is not connected? Who contributes and who passively consumes? How does this break down by demographic and geography? The digital divide is about much more than connectivity; it is also about participation in the various services riding across networks that generate “Big Data”. What are the “data shadows” created by the interactions of humans and machines across networks that compress time and space (Graham 2013)? The creation of content that feeds “Big Data”, both actively and passively, has its own distinct geographies and biases that are only beginning to be conceptualized. Without this parameterization it is incredibly difficult to interpret the results of “Big Data” in the context of global society.

Conclusion

This paper began by discussing aspects of “Big Data” as a technological meme. Exploring how “Big Data” has evolved points to its perceived emergence as an episteme. What began as an evolution in computation has morphed in popular culture to be a field of scientificity. Those that work with “Big Data” are even referred to as “data scientists”. The reductionist methods of understanding reality in “Big Data” produce new knowledge and methods for the control of reality. Yet it is not a reality that reflects the larger society, but instead the small minority contributing content.

For Geography as a discipline there is an opportunity to point out these shortcomings through critical appraisals of “Big Data” and its reflection of society. Further, there is potentially an even larger opportunity in developing the methodologies that will allow for a more realistic interpretation of “Big Data” analysis in the context of an unfiltered societal view. To do so, the geographic information science aspect of the discipline will need to evolve its approach to data and analysis. If successful, this provides a unique opportunity for positivistic and post-positivistic scholars in Geography to collaborate in pushing the discipline forward to an area in need of greater illumination.

References

Anderson, C. (2008) “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” Wired, June 23, 2008 http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

Dodge, M. and Kitchin, R. (2007) “‘Outlines of a world coming into existence’: Pervasive computing and the ethics of forgetting” Environment and Planning B, 34:3, 431–445

Github (2013a) “GIS Tools for Hadoop” April 3, 2013 http://esri.github.com/gis-tools-for-hadoop/

Github (2013b) “Spatial Hadoop” April 3, 2013 https://github.com/aseldawy/spatialhadoop

Gonzales-Bailon, S., Wang, N., Rivero, A., Borge-Holthoefer, J. and Moreno, Y. (2012) “Assessing the Bias on Communications Networks Sampled from Twitter” arXiv.org, December 7, 2012 http://arxiv.org/abs/1212.1684

Graham, M. (2013) “The Virtual Dimension” In Global City Challenges: debating a concept, improving the practice, eds. M. Acuto and W. Steele. London: Palgrave. (in press)

Haklay, M. (2013) “Neogeography and the Delusion of Democratization” Environment and Planning A, 45:1, 55–69

Haklay, M. and Budhathoki, N. (2010) “OpenStreetMap – Overview and Motivational Factors” Horizon Infrastructure Challenge Theme Day, the University of Nottingham, UK, March 19, 2010 http://www.academia.edu/2656557/OpenStreetMap-Overview_and_Motivational_Factors

Jess3 (2013) “The GeoSocial Universe 3.0” February 7, 2013 http://blog.jess3.com/2013/02/the-geosocial-universe-3-0.html

Meeker, M. (2011) “Top Mobile Internet Trends” February 10, 2011 http://www.slideshare.net/kleinerperkins/kpcb-top-10-mobile-trends-feb-2011

Morozov, E. (2013) “The Meme Hustler: Tim O’Reilly’s Crazy Talk” The Baffler, 22 http://thebaffler.com/past/the_meme_hustler

Tofel, Kevin C. (2011) “Reducing Data Latency Leads to Faster Decisions” March 23, 2011 http://gigaom.com/2011/03/23/reducing-data-latency-leads-to-faster-decisions/

Tuason, Julie A. (1987) “Reconciling the Unity and Diversity of Geography” Journal of Geography, 86:5, 190–193

Twitter (2011) “Celebrating a New Year with a New Tweet Record” January 6, 2011 http://blog.twitter.com/2011/01/celebrating-new-year-with-new-tweet.html

Wikipedia (2013) “Big Data” April 2, 2013 http://en.wikipedia.org/w/index.php?title=Big_data&oldid=574362901

Wikipedia (2011) “Monolithic Application” April 20, 2011 http://en.wikipedia.org/w/index.php?title=Monolithic_application&oldid=572922086

