
Page 1: Limits of Learning in Incomplete Networks

Limits of Learning in Incomplete Networks

Timothy [email protected]

In collaboration with


Timothy Sakharov Sahely Bhadra Tina Eliassi-Rad

Supported by NSF CNS-1314603 & NSF IIS-1741197

Page 2

Background: Incomplete Networks

- Network data is often incomplete

- Acquiring more data is often expensive and/or hard

- Research question: Given a networked dataset and limited resources to collect more data, how can you get the most bang for your buck?

Page 3

Two general approaches to network completion

Collect more data

Don't collect more data

Page 4

Two general approaches to network completion

Collect more data

Don't collect more data:

- Assume a network model
- Combine the network model with the incomplete data to get a model of the network structure
- Infer missing data from this model
- [Kim et al. 2011], [Chen et al. 2018]

Page 5

Two general approaches to network completion

Collect more data:

- Estimate statistics from the partially observed network: [Soundarajan et al. 2015], [Soundarajan et al. 2016]
- Utilize an explore-exploit approach: [Pfeiffer III et al. 2014], [Soundarajan et al. 2017], [Murai et al. 2018], [Madhawa et al. 2018], and this work!

Don't collect more data:

- Assume a network model
- Combine the network model with the incomplete data to get a model of the network structure
- Infer missing data from this model
- [Kim et al. 2011], [Chen et al. 2018]

Page 6

Our solution: Network Online Learning (NOL)

- Learn to grow an incomplete network through sequential, optimal queries (to some API)

- Agnostic to both data distributions and sampling method

- Interpretable features that are computable online

- Research question: Given a networked dataset and limited resources to collect more data, how can you get the most bang for your buck?

Page 7

Assumptions

We assume...

- We know the API access model (complete vs. incomplete queries).
- The underlying network is static (probing the same node twice gives no new information).

We do not assume...

- A model of the underlying graph.
- How the initial sample was collected.

Page 8

Inputs:

- An incomplete network

- b : probing budget

- r : reward function


Probe b times to maximize cumulative reward

Output:

Network after b probes

Network Online Learning (NOL)

Example reward function: number of new nodes observed
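The example reward can be made concrete with a small sketch. This is an illustrative toy, not the talk's implementation; the names `probe` and `reward` and the adjacency-dict representation are ours.

```python
# Toy illustration of the example reward: the number of new nodes observed
# when a node is probed. All names here are hypothetical.

def probe(full_graph, u):
    """Query node u against the (hidden) full graph; return its neighbors."""
    return set(full_graph[u])

def reward(observed_nodes, revealed_neighbors):
    """Reward = number of nodes never seen before this probe."""
    return len(revealed_neighbors - observed_nodes)

# Hidden graph as an adjacency dict; we have observed only nodes 0 and 1.
G = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
observed = {0, 1}
nbrs = probe(G, 0)             # probing node 0 reveals {1, 2, 3}
print(reward(observed, nbrs))  # nodes 2 and 3 are new -> prints 2
```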

Page 9

NOL algorithm


Page 12

NOL algorithm


Observe the current state

Feature vector of node i; regression weights

Page 17


Observe the current state

Choose the next action

Take action, update network, collect reward

Update parameters

NOL algorithm

Online linear regression following Strehl et al., NIPS 2008.
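The parameter update can be pictured as a stochastic gradient step on the squared error between predicted and observed reward. This is a minimal sketch of online linear regression in general, not the specific variant of Strehl et al.; the learning rate and feature values are made up.

```python
# Minimal online linear regression sketch: after probing a node with feature
# vector x and receiving reward r, nudge the weights theta to reduce the
# squared prediction error (theta . x - r)^2. Values are illustrative.

def predict(theta, x):
    return sum(t * f for t, f in zip(theta, x))

def update(theta, x, r, alpha=0.1):
    err = predict(theta, x) - r                    # prediction error
    return [t - alpha * err * f for t, f in zip(theta, x)]

theta = [0.0, 0.0]
x_i = [1.0, 2.0]   # feature vector of the probed node
r = 3.0            # observed reward
for _ in range(200):                               # repeated updates converge
    theta = update(theta, x_i, r)
print(round(predict(theta, x_i), 6))               # approaches 3.0
```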

Page 19

NOL algorithm


1. Observe the current state

2. Choose the next action

3. Take action, update network, collect reward

4. Update parameters

5. Repeat until budget depleted
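Putting steps 1-5 together, a compressed NOL-style loop might look as follows. This is our reconstruction under simplifying assumptions (adjacency-dict graphs, reward = newly observed nodes, a single in-sample-degree feature), not the authors' code.

```python
def features(sample, node):
    # Bias term plus in-sample degree; the full method uses more features.
    return [1.0, float(len(sample[node]))]

def nol(full_graph, sample, budget, alpha=0.05):
    theta = [0.0, 0.0]                        # regression weights
    probed, total_reward = set(), 0
    for _ in range(budget):                   # 5. repeat until budget depleted
        candidates = [v for v in sample if v not in probed]
        if not candidates:
            break
        # 1-2. observe the state; choose the node with max predicted reward
        u = max(candidates,
                key=lambda v: sum(t * f for t, f in zip(theta, features(sample, v))))
        x = features(sample, u)
        # 3. take action: query u's true neighborhood, update the network
        nbrs = set(full_graph[u])
        r = len(nbrs - sample.keys())         # reward: newly observed nodes
        for v in nbrs:
            sample.setdefault(v, set()).add(u)
            sample[u].add(v)
        probed.add(u)
        total_reward += r
        # 4. update parameters with an online linear regression step
        err = sum(t * f for t, f in zip(theta, x)) - r
        theta = [t - alpha * err * f for t, f in zip(theta, x)]
    return sample, total_reward

# Toy run: hidden 5-node graph, starting from the single observed edge (0, 1).
full = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1, 4}, 4: {3}}
sample, total = nol(full, {0: {1}, 1: {0}}, budget=4)
print(sorted(sample), total)                  # all 5 nodes found, reward 3
```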

Page 20

Features


Page 21

Features

[Figure: node i in the observed sample]

Page 22

Feature: In-sample degree

[Figure: in-sample degree of node i]

Page 23

Feature: In-sample clustering coefficient

[Figure: in-sample clustering around node i]

Page 24

Feature: Normalized size of connected component

[Figure: connected component containing node i]

Page 25

Feature: Fraction of probed neighbors

[Figure: probed neighbors of node i]
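Each of the four features above can be computed from the observed sample alone. The following is an illustrative pure-Python sketch over an adjacency-dict sample; the function names are ours and the talk's exact definitions may differ slightly.

```python
# Illustrative computation of the four in-sample features.

def degree(sample, i):
    return len(sample[i])

def clustering(sample, i):
    """Fraction of pairs of i's neighbors that are themselves connected."""
    nbrs = list(sample[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k)
                if nbrs[b] in sample[nbrs[a]])
    return 2.0 * links / (k * (k - 1))

def component_fraction(sample, i):
    """Size of i's connected component, normalized by sample size."""
    seen, stack = {i}, [i]
    while stack:
        for v in sample[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) / len(sample)

def probed_fraction(sample, probed, i):
    return sum(1 for v in sample[i] if v in probed) / len(sample[i])

# Toy sample: triangle {0,1,2} with pendant node 3, plus a separate pair 4-5.
S = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {5}, 5: {4}}
print(degree(S, 0))                        # 3
print(round(clustering(S, 0), 3))          # one of three neighbor pairs -> 0.333
print(round(component_fraction(S, 0), 3))  # 4 of 6 sampled nodes -> 0.667
print(round(probed_fraction(S, {1, 3}, 0), 3))  # 2 of 3 neighbors probed
```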

Page 26

Feature: Lost Reward

[Figure: nodes u, v, and w]

Key idea: The order in which we probe nodes can impact the reward they yield.

Page 27

Feature: Lost Reward

[Figure: partially observed nodes u and v share an unobserved neighbor w, a potential reward for u or v]

Key idea: The order in which we probe nodes can impact the reward they yield.

Page 28

Research question: Given a networked dataset and limited resources to collect more data, how can you get the most bang for your buck?

- Potential bang for your buck depends on network structure!

Page 34

Heterogeneity in network properties creates a learning “spectrum”

Learning not useful: homogeneous degree distribution

Heuristics optimal¹: heterogeneous degree distribution

Potential for learning: heterogeneous degree distribution with rich structure

¹ Avrachenkov et al., IEEE INFOCOM WKSHPS (2014)

Page 35

Experiments


Page 36

Heuristic Baselines

- High degree: probe the unprobed node with maximum degree

- High degree w/ jump: probe the unprobed node with maximum degree; randomly jump with probability p

- Low degree: probe the unprobed node with minimum degree

- Random: probe a node chosen uniformly at random from the unprobed nodes
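For concreteness, the high-degree baseline (with its optional jump) might be sketched like this; the function name and adjacency-dict representation are illustrative, not from the talk.

```python
import random

# Sketch of the "high degree" baseline: probe the unprobed node with maximum
# in-sample degree; the "w/ jump" variant instead jumps to a random unprobed
# node with probability p_jump.

def high_degree_probe(sample, probed, p_jump=0.0, rng=random):
    candidates = [v for v in sample if v not in probed]
    if not candidates:
        return None
    if p_jump and rng.random() < p_jump:
        return rng.choice(candidates)                 # random jump
    return max(candidates, key=lambda v: len(sample[v]))

# Star sample: node 0 has degree 3, everything else degree 1.
S = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(high_degree_probe(S, probed=set()))             # picks the hub, node 0
```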

Page 37

kNN-UCB [Madhawa et al., arXiv preprint, 2018]

- k-Nearest Neighbors Upper Confidence Bound: a nonparametric multi-armed bandit approach

- Choose the node to probe by combining nearest-neighbor reward information (based on Euclidean distance between feature vectors) with the extent of previous exploration of similar actions
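The selection rule can be pictured as a score over candidate feature vectors. This is a loose reconstruction of a kNN-UCB-style acquisition, not Madhawa et al.'s exact formula; the exploration bonus and constant c are our choices.

```python
import math

# kNN-UCB-flavored score: mean reward of the k nearest previously probed
# nodes in feature space, plus an exploration bonus that grows with distance
# (unfamiliar regions of feature space score higher). Constants are ours.

def knn_ucb_score(x, history, k=3, c=1.0):
    """history: list of (feature_vector, reward) pairs from past probes."""
    if not history:
        return float("inf")              # nothing known: force exploration
    nearest = sorted((math.dist(x, xi), ri) for xi, ri in history)[:k]
    mean_r = sum(r for _, r in nearest) / len(nearest)
    bonus = c * nearest[-1][0]           # distance to the k-th neighbor
    return mean_r + bonus

history = [([0.0, 0.0], 1.0), ([1.0, 0.0], 2.0), ([5.0, 5.0], 0.0)]
print(round(knn_ucb_score([1.0, 1.0], history, k=2), 3))  # 1.5 + sqrt(2)
```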


Page 42

Heterogeneity in network properties creates a learning “spectrum”

Learning not useful: Erdos-Renyi model

Heuristics optimal: Barabasi-Albert model

Potential for learning: BTER model¹

¹ C. Seshadhri et al., Physical Review E (2012)
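To make the spectrum's endpoints tangible, here is an illustrative generation (not from the talk) of the two simpler models via their standard mechanisms, contrasting homogeneous with heterogeneous degrees; BTER is omitted for brevity.

```python
import random

def erdos_renyi(n, p, rng):
    """Each possible edge exists independently with probability p."""
    g = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                g[i].add(j)
                g[j].add(i)
    return g

def barabasi_albert(n, m, rng):
    """Each new node attaches m edges preferentially to high-degree nodes."""
    g = {i: set() for i in range(n)}
    targets = list(range(m))        # start from m isolated nodes
    repeated = []                   # nodes repeated proportionally to degree
    for v in range(m, n):
        for t in targets:
            g[v].add(t)
            g[t].add(v)
        repeated.extend(targets)
        repeated.extend([v] * m)
        chosen = set()
        while len(chosen) < m:      # preferential attachment sampling
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return g

rng = random.Random(0)
er = erdos_renyi(500, 8 / 499, rng)         # mean degree ~8
ba = barabasi_albert(500, 4, rng)           # mean degree ~8
print(max(len(v) for v in er.values()))     # modest maximum degree
print(max(len(v) for v in ba.values()))     # hubs: much larger maximum degree
```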


Page 47

Results - Learning Spectrum

Learning not useful: Erdos-Renyi model. All methods are indistinguishable; learning doesn't help or hurt!

Heuristics optimal: Barabasi-Albert model. The high-degree heuristic is optimal in the BA model, but NOL learns to probe like the heuristic!

Potential for learning: BTER model.

Page 48

Heterogeneity in network properties creates a learning “spectrum”


Learning Not Useful

Heuristics Optimal

Potential for learning

DBLP: coauthorship network · Cora: citation network · Enron: email communication network

N = 23k, E = 89k, △s = 78.7k · N = 36.7k, E = 184k, △s = 727k · N = 6.7k, E = 17k, △s = 21.6k

Page 49

Heterogeneity in network properties creates a learning “spectrum”


Learning Not Useful

Heuristics Optimal

Potential for learning

DBLP ✓ · Cora ✓ · Enron ✓

Page 50

Summary

- Network Online Learning (NOL) can learn to probe online with minimal assumptions

- Success is tied to properties of the underlying network:

  - Spectrum based on the objective function being maximized (degree distribution in these experiments)

- NOL can learn to behave like the optimal heuristic

- Preliminary experiments suggest some real-world complex networks fall in the "learnable" category

Page 51

References

Kim et al. (2011). The Network Completion Problem: Inferring Missing Nodes and Edges in Networks. SIAM International Conference on Data Mining.

Seshadhri et al. (2012). Community structure and scale-free collections of Erdos-Renyi graphs. Physical Review E.

Avrachenkov et al. (2014). Pay few, influence most: Online myopic network covering. Proceedings of IEEE INFOCOM.

Pfeiffer III et al. (2014). Active Exploration in Networks: Using Probabilistic Relationships for Learning and Inference. Proceedings of the 23rd ACM International Conference on Information and Knowledge Management.

Soundarajan et al. (2015). MaxOutProbe: An Algorithm for Increasing the Size of Partially Observed Networks. The 2015 NIPS Workshop on Networks in the Social and Information Sciences.

Soundarajan et al. (2016). MaxReach: Reducing network incompleteness through node probes. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

Soundarajan et al. (2017). ε-WGX: Adaptive Edge Probing for Enhancing Incomplete Networks. Proceedings of the 2017 ACM Web Science Conference.

Murai et al. (2018). Selective Harvesting over Networks. Data Mining and Knowledge Discovery.

Madhawa et al. (2018). Exploring Partially Observed Networks with Nonparametric Bandits. arXiv:1804.07059.

Chen et al. (2018). Flexible Model Selection for Mechanistic Network Models via Super Learner. arXiv:1804.00237.

Tim [email protected]

Thanks!

Slides at http://eliassi.org/larock_netsci18.pdf