Representation Learning for Network Embedding
Wayne Xin Zhao (赵鑫)
ir.sdu.edu.cn/~zhuminchen/rl/zhaoxin2016.pdf
TRANSCRIPT
Distributed Learning for Network Embedding
Renmin University of China
SMP 2016 @ Nanchang
What is social computing concerned about?
Our current topic
• There are many topics, but today we focus on network embedding
Outline
• Preliminaries
– word2vec
• Network Embedding Models
– DeepWalk
– Node2vec
– GENE
– LINE
– SDNE
• Applications of Network Embedding
– Basic applications
– Visualization
– Text classification
– Recommendation
• Conclusion
Preliminaries
• Softmax function
• Distributional semantics
• Word2vec
– CBOW
– Skip-gram
Preliminaries
• Representation learning
– Using machine learning techniques to derive data representations
• Distributed representation
– Different from one-hot representations, it uses dense vectors to represent data points
• Embedding
– Mapping information entities into a low-dimensional space
Softmax function
• It transforms a K-dimensional real vector into a probability distribution
– A common transformation function to derive objective functions for classification or discrete variable modeling
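As a quick illustration (not on the original slide), a minimal numerically stable softmax in Python with NumPy:

```python
import numpy as np

def softmax(z):
    """Transform a K-dimensional real vector into a probability distribution."""
    z = z - np.max(z)            # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Four class scores become probabilities that sum to 1.
print(softmax(np.array([2.0, 1.0, 0.1, -1.0])))
```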
Distributional semantics
• Target word = "stars"
Distributional semantics
• Collect the contextual words for "stars"
Word2Vec
• Input: a sequence of words from a vocabulary V
• Output: a fixed-length vector for each term in the vocabulary – v_w
It implements the idea of distributional semantics using a shallow neural network model.
Architecture 1: CBOW
• CBOW predicts the current word using surrounding contexts
– $\Pr(w_t \mid \text{context}(w_t))$
• Window size 2c
• $\text{context}(w_t) = [w_{t-c}, \dots, w_{t+c}]$
Architecture 1: CBOW
• CBOW predicts the current word using surrounding contexts
– $\Pr(w_t \mid \text{context}(w_t))$
– Using a K-dimensional vector to represent words
• $w_t \rightarrow \boldsymbol{v}_{w_t}$
• $\tilde{\boldsymbol{v}}_{w_t} = \sum_{i=t-c}^{t+c} \boldsymbol{v}_{w_i} \quad (i \neq t)$
Architecture 1: CBOW
• CBOW predicts the current word using surrounding contexts
– $\Pr(w_t \mid \text{context}(w_t))$
– Basic idea
• Given the context of the current word, $\tilde{\boldsymbol{v}}_{w_t}$
• $\text{Sim}(\tilde{\boldsymbol{v}}_{w_t}, \boldsymbol{v}_{w_t}) > \text{Sim}(\tilde{\boldsymbol{v}}_{w_t}, \boldsymbol{v}_{w'})$
Architecture 1: CBOW
• How to formulate the idea
– Using a softmax function
– Considered as a classification problem
• Each word is a classification label
$$P(w \mid \text{context}) = \frac{\exp(\mathrm{sim}(\tilde{\boldsymbol{v}}_w, \boldsymbol{v}_w))}{\sum_{w'} \exp(\mathrm{sim}(\tilde{\boldsymbol{v}}_w, \boldsymbol{v}_{w'}))}$$
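A minimal sketch of this CBOW probability, assuming dot-product similarity and toy NumPy arrays (illustrative names, not the slides' code):

```python
import numpy as np

K, V = 8, 10                          # embedding size, vocabulary size
rng = np.random.default_rng(0)
emb = rng.normal(size=(V, K))         # one K-dimensional vector per word

def cbow_prob(context_ids, target_id):
    """P(w | context) with sim(a, b) = a . b, plugged into a softmax."""
    v_ctx = emb[context_ids].sum(axis=0)   # sum of the context word vectors
    scores = emb @ v_ctx                   # sim(v_ctx, v_w') for every word w'
    scores -= scores.max()                 # numerical stability
    p = np.exp(scores) / np.exp(scores).sum()
    return p[target_id]

print(cbow_prob([1, 2, 4, 5], 3))          # window of size 2c = 4 around word 3
```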
Architecture 2: Skip-gram
• Skip-gram predicts surrounding words using the current word
– $\Pr(\text{context}(w_t) \mid w_t)$
• Window size 2c
• $\text{context}(w_t) = [w_{t-c}, \dots, w_{t+c}]$
Architecture 2: Skip-gram
• Skip-gram predicts surrounding words using the current word
– $\Pr(\text{context}(w_t) \mid w_t)$
• Window size 2c
• $\text{context}(w_t) = [w_{t-c}, \dots, w_{t+c}]$
$$P(w' \mid w) = \frac{\exp(\mathrm{sim}(\boldsymbol{v}_w, \boldsymbol{v}_{w'}))}{\sum_{w''} \exp(\mathrm{sim}(\boldsymbol{v}_w, \boldsymbol{v}_{w''}))}$$
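In practice one rarely implements this softmax by hand. A hedged usage sketch with gensim (assuming gensim 4.x; the sentences are toy data):

```python
from gensim.models import Word2Vec

sentences = [["the", "stars", "shine", "at", "night"],
             ["bright", "stars", "light", "the", "sky"]]

# sg=1 selects the skip-gram architecture; window=2 means context size 2c = 4.
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1)
print(model.wv["stars"][:5])                  # the learned vector v_w
print(model.wv.most_similar("stars", topn=2))
```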
Network Embedding Models
• DeepWalk
• Node2vec
• GENE
• LINE
• SDNE
Network Embedding Models
• DeepWalk (Perozzi et al., KDD 2014)
• Node2vec
• GENE
• LINE
• SDNE
What is network embedding?
• We map each node in a network into a low-dimensional space
– Distributed representations for nodes
– Similarity between nodes indicates the link strength
– Encode network information and generate node representations
Example
• Zachary's Karate Network:
DeepWalk
• DeepWalk learns a latent representation of adjacency matrices using deep learning techniques developed for language modeling
Language modeling
• Learning a representation of a word from documents (word co-occurrence):
– word2vec:
• The learned representations capture inherent structure
• Example:
From language modeling to graphs
• Idea:
– Nodes <--> Words
– Node sequences <--> Sentences
• Generating node sequences:
– Using random walks
• Short random walks = sentences
• Connection:
– Word frequency in a natural language corpus follows a power law.
– Vertex frequency in random walks on scale-free graphs also follows a power law.
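This correspondence makes DeepWalk essentially random walks plus word2vec. A minimal sketch assuming networkx and gensim (toy graph, illustrative parameters):

```python
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()                      # Zachary's Karate Network

def random_walk(G, start, length=10):
    """A short truncated random walk, serialized as a 'sentence' of node ids."""
    walk = [start]
    while len(walk) < length:
        walk.append(random.choice(list(G.neighbors(walk[-1]))))
    return [str(n) for n in walk]

# Short random walks play the role of sentences.
walks = [random_walk(G, n) for n in G.nodes() for _ in range(20)]
model = Word2Vec(walks, vector_size=32, window=5, sg=1, min_count=1)
print(model.wv["0"][:5])                        # embedding of node 0
```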
Framework
Representation Mapping
Deep Learning Structure: Skip-gram model
Skip-gram: the input to the model is $w_i$, and the output could be $w_{i-1}, w_{i-2}, w_{i+1}, w_{i+2}$
Experiments
• Node Classification
– Some nodes have labels, some don't
• Data Sets
– BlogCatalog
– Flickr
– YouTube
Results: BlogCatalog
Network Embedding Models
• DeepWalk
• Node2vec (Grover et al., KDD 2016)
• GENE
• LINE
• SDNE
Node2Vec
• A generalized version of DeepWalk, built on three ingredients (spelled out below)
– Objective function
– Conditional independence
– Symmetry in feature space
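For reference, the three ingredients as given in the node2vec paper (reproduced here; they were not spelled out on the original slide):

```latex
% Objective: maximize the log-probability of each node's sampled neighborhood
\max_f \; \sum_{u \in V} \log \Pr\big(N_S(u) \mid f(u)\big)

% Conditional independence across neighborhood nodes
\Pr\big(N_S(u) \mid f(u)\big) \;=\; \prod_{n_i \in N_S(u)} \Pr\big(n_i \mid f(u)\big)

% Symmetry in feature space: a softmax over dot products
\Pr\big(n_i \mid f(u)\big) \;=\;
  \frac{\exp\big(f(n_i) \cdot f(u)\big)}{\sum_{v \in V} \exp\big(f(v) \cdot f(u)\big)}
```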
Node2Vec
– $N_S(u)$: a network neighborhood of node u generated through a neighborhood sampling strategy S
– The key lies in how to find a neighbor on the graph
– How does DeepWalk solve this?
How does Node2vec do this?
• Motivation
– BFS: broader → homophily
– DFS: deeper → structural equivalence
How does Node2vec do this?
• Can we combine the merits of DFS and BFS?
– BFS: broader → homophily
– DFS: deeper → structural equivalence
How does Node2vec do this?
• Explaining the sampling strategy (a sketch follows)
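A minimal sketch of the biased second-order walk step with return parameter p and in-out parameter q, following the paper's transition weights (illustrative code, not the authors' implementation):

```python
import random
import networkx as nx

def node2vec_step(G, prev, cur, p=1.0, q=1.0):
    """Pick the next node of a walk that just moved prev -> cur."""
    neighbors = list(G.neighbors(cur))
    weights = []
    for x in neighbors:
        if x == prev:                 # distance d(prev, x) = 0: return
            weights.append(1.0 / p)
        elif G.has_edge(prev, x):     # d(prev, x) = 1: stay local
            weights.append(1.0)
        else:                         # d(prev, x) = 2: move outward
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

G = nx.karate_club_graph()
print(node2vec_step(G, prev=0, cur=1, p=0.25, q=4.0))
```

A small p makes backtracking likely, keeping the walk local (BFS-like); a small q pushes the walk outward (DFS-like exploration).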
Node2vec Algorithm
Comparison between DeepWalk and Node2vec
• They actually have the same objective function and formulation
• The difference lies in how they generate random walks
• BEAUTY: node → word, path → sentence
Network Embedding Models
• DeepWalk
• Node2vec
• GENE (Chen et al., CIKM 2016)
• LINE
• SDNE
GENE
• Incorporate Group Information to Enhance Network Embedding
– When group information is available, how do we model it?
• Group → control member
GENE
• Recall doc2vec
• How to use doc2vec to model group and member vectors (a sketch follows)
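A hedged sketch of the doc2vec analogy, assuming gensim 4.x with groups as document tags and member node ids as words (illustrative data, not the GENE implementation):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each "document" is a group; its "words" are the ids of the member nodes.
groups = {"group_a": ["n1", "n2", "n3"], "group_b": ["n3", "n4", "n5"]}
docs = [TaggedDocument(words=members, tags=[g]) for g, members in groups.items()]

# dm=1 gives the DM architecture; dm=0 would give DBOW.
model = Doc2Vec(docs, vector_size=16, min_count=1, epochs=50, dm=1)
print(model.dv["group_a"][:4])   # group vector
print(model.wv["n3"][:4])        # member (node) vector
```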
GENE
• Incorporate Group Information to Enhance Network Embedding
– When group information is available, how do we model it?
GENE
• Formulating the idea
Network Embedding Models
• DeepWalk
• Node2vec
• GENE
• LINE (Tang et al., WWW 2015)
• SDNE
First-order Proximity
• The local pairwise proximity between the vertices
– Determined by the observed links
• However, many links between the vertices are missing
– Not sufficient for preserving the entire network structure
[Figure: example network with vertices 1–10; vertices 6 and 7 have a large first-order proximity]
LINE
From Jian Tang's slides
Second-order Proximity
• The proximity between the neighborhood structures of the vertices
• Mathematically, the second-order proximity between each pair of vertices (u, v) is determined by their neighborhood vectors:
$$\hat{p}_u = (w_{u,1}, w_{u,2}, \dots, w_{u,|V|})$$
$$\hat{p}_v = (w_{v,1}, w_{v,2}, \dots, w_{v,|V|})$$
[Figure: example network with vertices 1–10; vertices 5 and 6 have a large second-order proximity]
$$\hat{p}_5 = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)$$
$$\hat{p}_6 = (1, 1, 1, 1, 0, 0, 5, 0, 0, 0)$$
LINE
From Jian Tang's slides
Preserving the First-order Proximity
• Given an undirected edge $(v_i, v_j)$, the joint probability of $v_i, v_j$:
$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\,\top} \cdot \vec{u}_j)}$$
$\vec{u}_i$: embedding of vertex $v_i$
• Empirical distribution over the observed edges:
$$\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{\sum_{(i',j')} w_{i'j'}}$$
• Objective: KL-divergence between the two distributions
$$O_1 = d(\hat{p}_1(\cdot,\cdot),\, p_1(\cdot,\cdot)) \propto -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$$
LINE
From Jian Tang's slides
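Before moving on, a minimal NumPy sketch of the first-order objective $O_1$ on a toy weighted graph (evaluation only; the paper optimizes this with edge sampling and SGD):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4
U = rng.normal(scale=0.1, size=(n, d))            # u_i: vertex embeddings
edges = [(0, 1, 1.0), (1, 2, 2.0), (3, 4, 1.0)]   # (i, j, w_ij)

def p1(i, j):
    """Joint probability of an undirected edge (v_i, v_j)."""
    return 1.0 / (1.0 + np.exp(-U[i] @ U[j]))

# O_1 up to a constant: weighted negative log-likelihood of observed edges.
O1 = -sum(w * np.log(p1(i, j)) for i, j, w in edges)
print(O1)
```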
Preserving the Second-order Proximity
• Given a directed edge $(v_i, v_j)$, the conditional probability of $v_j$ given $v_i$ is:
$$p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{\,\top} \cdot \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^{\,\top} \cdot \vec{u}_i)}$$
$\vec{u}_i$: embedding of vertex $i$ when $i$ is a source node; $\vec{u}_i'$: embedding of vertex $i$ when $i$ is a target node
• Empirical distribution:
$$\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{\sum_{k \in V} w_{ik}}$$
• Objective:
$$O_2 = \sum_{i \in V} \lambda_i \, d(\hat{p}_2(\cdot \mid v_i),\, p_2(\cdot \mid v_i)) \propto -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)$$
$\lambda_i$: prestige of vertex $i$ in the network, $\lambda_i = \sum_j w_{ij}$
LINE
From Jian Tang's slides
Preserving both Proximities
• Concatenate the embeddings individually learned for the two proximities:
– First-order
– Second-order
LINE
From Jian Tang's slides
Network Embedding Models
• DeepWalk
• Node2vec
• GENE
• LINE
• SDNE (Wang et al., KDD 2016)
SDNE
• Preliminary
– Autoencoder
SDNE
• Preliminary
– Autoencoder
• The simplest case: a single hidden layer
SDNE
• Preliminary
– Autoencoder
• The simplest case: a single hidden layer (a sketch follows)
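A minimal single-hidden-layer autoencoder in NumPy, as referenced above (an illustrative sketch; SDNE itself uses deeper encoders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((32, 10))                  # 32 samples, 10-dimensional inputs
W1 = rng.normal(scale=0.1, size=(10, 4))  # encoder weights: 10 -> 4
W2 = rng.normal(scale=0.1, size=(4, 10))  # decoder weights: 4 -> 10
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(500):                      # gradient descent on ||X_hat - X||^2
    Y = sigmoid(X @ W1)                   # hidden codes (the embeddings)
    X_hat = Y @ W2                        # reconstruction
    err = X_hat - X
    W2 -= 0.1 * Y.T @ err / len(X)
    W1 -= 0.1 * X.T @ ((err @ W2.T) * Y * (1 - Y)) / len(X)

print(np.mean((X_hat - X) ** 2))          # reconstruction error shrinks
```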
SDNE
• First-order proximity
– Linked nodes should be coded similarly
SDNE
• Second-order proximity
– The model should reconstruct the neighborhood vectors
– Similar nodes, even without links, can have similar codes
• Otherwise we cannot reconstruct the neighborhood (the combined loss is given below)
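For reference, the combined loss from the SDNE paper (reproduced here; $\odot$ is the element-wise product, and $b_i$ up-weights the non-zero entries of the adjacency row $x_i$):

```latex
% Second-order term: reconstruct each neighborhood vector x_i,
% penalizing errors on observed links more via b_{ij} > 1 when x_{ij} > 0.
\mathcal{L}_{2nd} = \sum_{i=1}^{n} \big\| (\hat{x}_i - x_i) \odot b_i \big\|_2^2

% First-order term: linked nodes should get similar codes y_i.
\mathcal{L}_{1st} = \sum_{i,j=1}^{n} w_{ij} \, \| y_i - y_j \|_2^2

% Joint objective, plus a weight regularizer:
\mathcal{L} = \mathcal{L}_{2nd} + \alpha \, \mathcal{L}_{1st} + \nu \, \mathcal{L}_{reg}
```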
SDNE
• Network reconstruction
• Link prediction
Network Embedding Models
• DeepWalk
– Node sentences + word2vec
• Node2vec
– DeepWalk + more sampling strategies
• GENE
– Group ~ document + doc2vec (DM, DBOW)
• LINE
– Shallow + first-order + second-order proximity
• SDNE
– Deep + first-order + second-order proximity
Applications of Network Embedding
• Basic applications
• Data Visualization
• Text classification
• Recommendation
Basic Applications
• Network reconstruction
• Link prediction
• Clustering
• Feature coding
– Node classification
• Demographic prediction
Applications of Network Embedding
• Basic applications
• Data Visualization (Tang et al., WWW 2016)
• Text classification
• Recommendation
Data Visualization
Data Visualization
• Construction of the KNN graph
Data Visualization
• Visualization-based embedding
Data Visualization
• Non-linear function
Data Visualization
• Accuracy
• Running time
Data Visualization
Applications of Network Embedding
• Basic applications
• Data Visualization
• Text classification (Tang et al., KDD 2015)
• Recommendation
Network embedding helps text modeling
[Figure: free text (example sentences about deep learning, the Skip-gram model, and information networks) is turned into a word co-occurrence network over words such as "network", "edge", "node", "word", "document", "classification", "text", and "embedding"; the network then feeds text representation, e.g., word and document representation]
If we have the word network, we can use a network embedding model to learn word representations.
Text Classification
From Jian Tang's slides
• Adapt the advantages of unsupervised text embedding approaches, but naturally utilize the labeled data for specific tasks
• Different levels of word co-occurrences: local context-level, document-level, label-level
Text corpora
[Figure: a heterogeneous text network built from free text, consisting of (a) a word-word network, (b) a word-document network, and (c) a word-label network; together they feed text representation, e.g., word and document representation]
Text Classification
From Jian Tang's slides
Bipartite Network Embedding
– Extend previous work, LINE (Tang et al., WWW 2015), on large-scale information network embedding
– Preserve the first-order and second-order proximity
– Only consider the second-order proximity here
Tang et al. LINE: Large-scale Information Network Embedding. WWW 2015
• For each edge $(v_i, v_j)$ between the two vertex sets $V_A$ and $V_B$, define a conditional probability:
$$p(v_j \mid v_i) = \frac{\exp(\vec{u}_j^{\,\top} \cdot \vec{u}_i)}{\sum_{j' \in A} \exp(\vec{u}_{j'}^{\,\top} \cdot \vec{u}_i)}$$
• Objective:
$$O = -\sum_{(i,j) \in E} w_{ij} \log p(v_j \mid v_i)$$
• Edge sampling and negative sampling for optimization
Text Classification
From Jian Tang's slides
Heterogeneous Text Network Embedding
• Heterogeneous text network: three bipartite networks
– Word-word (word-context), word-document, word-label networks
– Jointly embed the three bipartite networks
• Objective:
$$O_{pte} = O_{ww} + O_{wd} + O_{wl}$$
• where
$$O_{ww} = -\sum_{(i,j) \in E_{ww}} w_{ij} \log p(v_i \mid v_j) \qquad \text{(objective for the word-word network)}$$
$$O_{wd} = -\sum_{(i,j) \in E_{wd}} w_{ij} \log p(v_i \mid d_j) \qquad \text{(objective for the word-document network)}$$
$$O_{wl} = -\sum_{(i,j) \in E_{wl}} w_{ij} \log p(v_i \mid l_j) \qquad \text{(objective for the word-label network)}$$
Text Classification
From Jian Tang's slides
Results on Long Documents (Micro-F1 / Macro-F1 per dataset):
Type                    Algorithm           20newsgroup      Wikipedia        IMDB
Unsupervised embedding  LINE(G_wd)          79.73 / 78.40    80.14 / 80.13    89.14 / 89.14
Predictive embedding    CNN                 78.85 / 78.29    79.72 / 79.77    86.15 / 86.15
                        CNN(pretrain)       80.15 / 79.43    79.25 / 79.32    89.00 / 89.00
                        PTE(G_wl)           82.70 / 81.97    79.00 / 79.02    85.98 / 85.98
                        PTE(G_ww + G_wl)    83.90 / 83.11    81.65 / 81.62    89.14 / 89.14
                        PTE(G_wd + G_wl)    84.39 / 83.64    82.29 / 82.27    89.76 / 89.76
                        PTE(pretrain)       82.86 / 82.12    79.18 / 79.21    86.28 / 86.28
                        PTE(joint)          84.20 / 83.39    82.51 / 82.49    89.80 / 89.80
PTE(joint) > PTE(pretrain)
PTE(joint) > PTE(G_wl)
PTE(joint) > CNN / CNN(pretrain)
Text Classification
From Jian Tang's slides
Results on Short Documents (Micro-F1 / Macro-F1 per dataset):
Type                    Algorithm           DBLP             MR               Twitter
Unsupervised embedding  LINE(G_ww + G_wd)   74.22 / 70.12    71.13 / 71.12    73.84 / 73.84
Predictive embedding    CNN                 76.16 / 73.08    72.71 / 72.69    75.97 / 75.96
                        CNN(pretrain)       75.39 / 72.28    68.96 / 68.87    75.92 / 75.92
                        PTE(G_wl)           76.45 / 72.74    73.44 / 73.42    73.92 / 73.91
                        PTE(G_ww + G_wl)    76.80 / 73.28    72.93 / 72.92    74.93 / 74.92
                        PTE(G_wd + G_wl)    77.46 / 74.03    73.13 / 73.11    75.61 / 75.61
                        PTE(pretrain)       76.53 / 72.94    73.27 / 73.24    73.79 / 73.79
                        PTE(joint)          77.15 / 73.61    73.58 / 73.57    75.21 / 75.21
PTE(joint) > PTE(pretrain)
PTE(joint) > PTE(G_wl)
PTE(joint) ≈ CNN / CNN(pretrain)
Text Classification
From Jian Tang's slides
Applications of Network Embedding
• Basic applications
• Data Visualization
• Text classification
• Recommendation (Zhao et al., AIRS 2016; Xie et al., CIKM 2016)
Recommendation
• Learning Distributed Representations for Recommender Systems with a Network Embedding Approach
– Motivation
Zhao et al., AIRS 2016
Recommendation
• From training records to networks
Recommendation
• Given any edge in the network
Recommendation
• User-item recommendation
Recommendation
• User-item-tag recommendation
Graph-based POI Embedding
Xie et al., CIKM 2016
More works on recommendation
• How to utilize sequential embedding models to solve other application tasks
Sequential modeling for recommendation
• Deep learning for sequence modeling
– Token2vec
• POI recommendation
• Product recommendation
– Recurrent Neural Networks
• POI recommendation
Word2Vec
• Input: a sequence of words from a vocabulary V
• Output: a fixed-length vector for each term in the vocabulary – v_w
It implements the idea of distributional semantics using a shallow neural network model.
Token2Vec
• Input: a sequence of symbol tokens from a vocabulary V
• Output: a fixed-length vector for each symbol in the vocabulary – v_w
You can imagine that any sequence whose tokens are sensitive to their surrounding contexts can potentially be modeled with word2vec.
Check-in data
What information do these check-in data contain?
• User ID
• Location ID
• Check-in time
• Category label/name
• GPS information
Check-in data
What information do these check-in data contain?
• User ID
• Location ID
• Check-in time
• Category label/name
• GPS information
An example
UID 25821, Burger King @ BH Point, 2015-01-13 / 1:30pm, Restaurant
A Sequential Way to Model the Data
• Given a user u, a trajectory is a sequence of check-in records related to u

UserID   LocationID   Check-in Timestamp
u1       l181         2016-08-26 9:26am
u1       l32          2016-08-26 10:26am
u1       l323         2016-08-26 11:26am
u1       l32323       2016-08-26 1:26pm
u2       l345         2016-08-26 9:16am
u2       l13          2016-08-26 10:36am

A Sequential Way to Model the Data
• Given a user u, a trajectory is a sequence of check-in records related to u

UserID   LocationID   Check-in Timestamp
u1       l181         2016-08-26 9:26am
u1       l32          2016-08-26 10:26am
u1       l323         2016-08-26 11:26am
u1       l32323       2016-08-26 1:26pm
u2       l345         2016-08-26 9:16am
u2       l13          2016-08-26 10:36am

u1: l181 → l32 → l323 → l32323
u2: l345 → l13
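Since these trajectories are plain token sequences, the word2vec machinery applies directly. A hedged gensim sketch using the toy trajectories above:

```python
from gensim.models import Word2Vec

# Trajectories as "sentences": location ids play the role of "words".
trajectories = [["l181", "l32", "l323", "l32323"],   # user u1
                ["l345", "l13"]]                      # user u2

model = Word2Vec(trajectories, vector_size=16, window=2, sg=1, min_count=1)
print(model.wv.most_similar("l32", topn=2))           # locations with similar contexts
```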
Sequential modeling for recommendation
• Deep learning for sequence modeling
– Token2vec
• POI recommendation (Zhao et al., TKDE 2016)
• Product recommendation
– Recurrent Neural Networks
• POI recommendation
Task
• Input: check-in sequences
• Output: embedding representations for users, locations, and other related information
Generation of a Single Location in a Trajectory
• User interests
• Trajectory intents
• Surrounding locations
• Temporal contexts
Observations in text data
• King – man = Queen – woman
• What about trajectory data?
Qualitative examples
Sequential modeling for recommendation
• Deep learning for sequence modeling
– Token2vec
• POI recommendation
• Product recommendation (Zhao et al., TKDE 2016; Wang et al., SIGIR 2015)
– Recurrent Neural Networks
• POI recommendation
Token2vec for Product Recommendation
• Doc2vec
– Doc → user
– Word → product
Token2vec for Product Recommendation
• Preliminary results on the JingDong dataset
– All three simple embedding methods are comparable with the strong baseline BPR
Token2vec for Product Recommendation
Sequential modeling for recommendation
• Deep learning for sequence modeling
– Token2vec
• POI recommendation
• Product recommendation
– Recurrent Neural Networks
• POI recommendation (Yang et al., arXiv 2016)
RNN for trajectory sequences
• In a short window
RNN for trajectory sequences
• In a long range, RNNs tend to be less effective due to the "vanishing gradient" problem
– Long Short-Term Memory units (LSTM)
– Gated Recurrent Units (GRU), sketched below
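A minimal PyTorch sketch of a GRU scoring the next location of a check-in sequence (illustrative sizes and variable names; not the architecture of the cited paper):

```python
import torch
import torch.nn as nn

num_locations, emb_dim, hidden = 1000, 32, 64
emb = nn.Embedding(num_locations, emb_dim)     # location id -> vector
gru = nn.GRU(emb_dim, hidden, batch_first=True)
out = nn.Linear(hidden, num_locations)         # hidden state -> next-POI scores

seq = torch.tensor([[181, 32, 323]])           # one check-in sequence of location ids
h, _ = gru(emb(seq))                           # h: (batch, seq_len, hidden)
logits = out(h[:, -1])                         # score every location as the next POI
print(logits.argmax(dim=-1))                   # predicted next location id
```

The GRU's gating keeps gradients usable over longer ranges than a vanilla RNN, which is exactly the motivation above.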
RNN for trajectory sequences
• Combine short- and long-term dependence together
RNN for trajectory sequences
• Incorporate user interests and networks
Conclusions
• There are no boundaries between data types and research areas in terms of methodologies
– Data models are the core
• Even if the ideas are similar, we can move from shallow to deep models if the performance actually improves
Disclaimer
• For convenience, I directly copied some original slides and figures from the referenced papers. I apologize that I did not ask each author's permission. Thank you for these slides; I will not distribute your original slides.
References
• Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013: 3111-3119
• Bryan Perozzi, Rami Al-Rfou', Steven Skiena. DeepWalk: Online Learning of Social Representations. KDD 2014: 701-710
• Aditya Grover, Jure Leskovec. node2vec: Scalable Feature Learning for Networks. KDD 2016: 855-864
• Jifan Chen, Qi Zhang, Xuanjing Huang. Incorporate Group Information to Enhance Network Embedding. CIKM 2016
• Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, Qiaozhu Mei. LINE: Large-scale Information Network Embedding. WWW 2015: 1067-1077
• Daixin Wang, Peng Cui, Wenwu Zhu. Structural Deep Network Embedding. KDD 2016: 1225-1234
• Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei. Visualizing Large-scale and High-dimensional Data. WWW 2016: 287-297
• Jian Tang, Meng Qu, Qiaozhu Mei. PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks. KDD 2015: 1165-1174
• Wayne Xin Zhao, Jin Huang, Ji-Rong Wen. Learning Distributed Representations for Recommender Systems with a Network Embedding Approach. AIRS 2016
• Min Xie, Hongzhi Yin, Hao Wang, Fangjiang Xu, Weitong Chen, Sen Wang. Learning Graph-based POI Embedding for Location-based Recommendation. CIKM 2016
• Wayne Xin Zhao, Ningnan Zhou, Xiao Zhang, Ji-Rong Wen, Shan Wang. A General Multi-Context Embedding Model for Mining Human Trajectory Data. IEEE Trans. Knowl. Data Eng. 28(8): 1945-1958 (2016)
• Wayne Xin Zhao, Sui Li, Yulan He, Edward Y. Chang, Ji-Rong Wen, Xiaoming Li. Connecting Social Media to E-Commerce: Cold-Start Product Recommendation Using Microblogging Information. IEEE Trans. Knowl. Data Eng. 28(5): 1147-1159 (2016)
• Quoc V. Le, Tomas Mikolov. Distributed Representations of Sentences and Documents. ICML 2014: 1188-1196
• Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan, Xueqi Cheng. Learning Hierarchical Representation Model for Next Basket Recommendation. SIGIR 2015: 403-412
• Cheng Yang, Maosong Sun, Wayne Xin Zhao, Zhiyuan Liu. A Neural Network Approach to Jointly Modeling Social Networks and Mobile Trajectories. arXiv:1606.08154 (2016)
Advanced Readings
• Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, Thomas S. Huang. Heterogeneous Network Embedding via Deep Architectures. KDD 2015: 119-128
• Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, Wenwu Zhu. Asymmetric Transitivity Preserving Graph Embedding. KDD 2016: 1105-1114
• Thomas N. Kipf, Max Welling. Semi-Supervised Classification with Graph Convolutional Networks. CoRR abs/1609.02907 (2016)
• Mikael Henaff, Joan Bruna, Yann LeCun. Deep Convolutional Networks on Graph-Structured Data. CoRR abs/1506.05163 (2015)