The Network: A Data Structure that Links Domains

Marko A. Rodriguez
Los Alamos National Laboratory
Vrije Universiteit Brussel
University of California at Santa Cruz
[email protected]
http://www.soe.ucsc.edu/~okram

University of New Mexico, September 14, 2007


Page 2: About me.

• Marko Antonio Rodriguez.
• Bachelor of Science in Cognitive Science from U.C. San Diego.
• Minor in the Arts in Computer Music from U.C. San Diego.
• Master of Science in Computer Science from U.C. Santa Cruz.
• Visiting Researcher at the Center for Evolution, Complexity, and Cognition at the Free University of Brussels.
• Ph.D. in Computer Science from U.C. Santa Cruz [soon].
  o I defend November 15, 2007!
• Researcher at the Los Alamos National Laboratory since 2005.

Page 3: Research trends.

• MESUR: Metrics from Scholarly Usage of Resources. (http://www.mesur.org)
• Neno/Fhat: A Semantic Network Programming Language and Virtual Machine Architecture. (http://neno.lanl.gov)
• CDMS: Collective Decision Making Systems. (http://cdms.lanl.gov)

Page 4: What is a network?

• A network is a data structure that is used to connect vertices/nodes/dots by means of edges/links/lines.
• Networks are everywhere.
  o Social: friendship, trust, communication, collaboration.
  o Technological: web-pages, communication, software dependencies, circuits.
  o Scholarly: journals, authors, articles, institutions.
  o Natural: protein interaction, neural, food web.

Page 5: The undirected network.

• There is the undirected network of common knowledge.
  o Sometimes called an undirected single-relational network.
  o e.g. vertex i and vertex j are “related”.
• The semantic of the edge denotes the network type.
  o e.g. friendship network, collaboration network, etc.

[Figure: vertices i and j joined by an undirected edge.]

Page 6: Example undirected network.

[Figure: an undirected network over the vertices Herbert, Marko, Aric, Ed, Zhiwu, Alberto, Jen, Johan, Luda, Stephan, and Whenzong.]

Page 7: The directed network.

• Then there is the directed network of common knowledge.
  o Sometimes called a directed single-relational network.
  o For example, vertex i is related to vertex j, but j is not related to i.

[Figure: vertices i and j joined by a directed edge from i to j.]

Page 8: Example directed network.

[Figure: a directed network over the vertices Muskrat, Bear, Fish, Fox, Meerkat, Lion, Human, Wolf, Deer, Beetle, and Hyena.]

Page 9: The semantic network.

• Finally, there is the semantic network.
  o Sometimes called a directed multi-relational network.
  o For example, vertex i is related to vertex j by the semantic s, but j is not related to i by the semantic s.

[Figure: vertices i and j joined by a directed edge from i to j, labeled s.]
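The three network types introduced so far differ only in what an edge records. The following is a minimal Python sketch of that difference, not a representation taken from the slides; the helper function names are made up for illustration.

```python
# Undirected single-relational network: an edge is an unordered pair.
undirected = {frozenset(("i", "j"))}

# Directed single-relational network: an edge is an ordered (tail, head) pair.
directed = {("i", "j")}

# Semantic (directed multi-relational) network: an edge also carries a label s.
semantic = {("i", "s", "j")}


def related_undirected(a, b):
    """True if a and b are joined, regardless of order."""
    return frozenset((a, b)) in undirected


def related_directed(a, b):
    """True only if there is an edge from a to b."""
    return (a, b) in directed


def related_semantic(a, label, b):
    """True only if there is an edge from a to b with the given label."""
    return (a, label, b) in semantic
```

Under this sketch, `related_undirected("j", "i")` holds while `related_directed("j", "i")` does not, and `related_semantic` additionally distinguishes edges by their label.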

Page 10: Example semantic network.

[Figure: a semantic network over the vertices SantaFe, Marko, NewMexico, Ryan, California, UnitedStates, LANL, Cells, Atoms, Oregon, and Arnold, with edges labeled livesIn, worksWith, cityOf, originallyFrom, stateOf, locatedIn, hasLab, madeOf, researches, southOf, hasResident, governerOf, and northOf.]

Page 11: What are the techniques for analysis?

• Degree statistics
  o How many in- and out-edges does vertex i have?
  o What is the maximum and minimum in- and out-degree of the network?
• Shortest-path metrics
  o What is the smallest number of steps to get from vertex i to vertex j?
  o How many of the shortest-paths go through vertex i?
• Power metrics
  o What vertices are the most “influential”?
• Metadata distributions
  o What is the probability that a vertex of type x is connected to a vertex of type y?

Page 12: Degree statistics.

[Figure: the example directed network from Page 8, with each vertex annotated with its out- and in-degree. The annotations, in the order they appear in the transcript, are (out, in) = (4, 0), (1, 1), (0, 2), (2, 1), (1, 1), (1, 1), (3, 0), (0, 4), (0, 1), (1, 0), (1, 3).]

Max_out = 4, Max_in = 4, Min_out = 0, Min_in = 0
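The degree statistics above can be computed in a single pass over an edge list. This is a minimal sketch; the edges below are hypothetical, since the food-web edges on the slide are not recoverable from the transcript, and `degree_statistics` is an illustrative name.

```python
from collections import Counter

# Hypothetical directed edges (tail, head); not the edges from the slide.
edges = [("Lion", "Deer"), ("Wolf", "Deer"), ("Human", "Deer"),
         ("Human", "Fish"), ("Bear", "Fish")]


def degree_statistics(edges):
    """Return per-vertex (out, in) degrees and the network-wide extremes."""
    out_deg, in_deg = Counter(), Counter()
    vertices = set()
    for tail, head in edges:
        out_deg[tail] += 1
        in_deg[head] += 1
        vertices.update((tail, head))
    # A Counter returns 0 for vertices with no edges in that direction.
    per_vertex = {v: (out_deg[v], in_deg[v]) for v in vertices}
    outs = [o for o, _ in per_vertex.values()]
    ins = [i for _, i in per_vertex.values()]
    summary = {"max_out": max(outs), "max_in": max(ins),
               "min_out": min(outs), "min_in": min(ins)}
    return per_vertex, summary


per_vertex, summary = degree_statistics(edges)
```

Because every edge contributes one out-degree and one in-degree, the two totals are always equal, which is also true of the annotations on the slide (both sum to 14).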

Page 13: Shortest-path between Marko and Aric.

[Figure: the example undirected network from Page 6.]

Shortest path = 1

Page 14: Eccentricity of Marko.

[Figure: the example undirected network from Page 6.]

Eccentricity = 3

Page 15: Radius of the network.

[Figure: the example undirected network from Page 6.]

Radius = 3

Page 16: Diameter of the network.

[Figure: the example undirected network from Page 6.]

Diameter = 4

Page 17: Closeness of Marko.

[Figure: the example undirected network from Page 6.]

Closeness = 0.0526

Page 18: Shortest-path metrics.

• Shortest-path
• Eccentricity
• Diameter
• Radius
• Closeness
• Betweenness
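Most of the shortest-path metrics listed above follow from one breadth-first search per vertex. The sketch below assumes a connected, unweighted, undirected network and uses a hypothetical five-vertex graph, because the edges of the slides' example are not in the transcript. Closeness is taken as the reciprocal of the summed distances, which is one common definition and is consistent with the slide's value (0.0526 is approximately 1/19). Betweenness is omitted.

```python
from collections import deque

# Hypothetical undirected network as an adjacency map (not the slide's network).
graph = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def distances_from(graph, source):
    """Breadth-first search: steps from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def eccentricity(graph, v):
    """Longest shortest path starting at v."""
    return max(distances_from(graph, v).values())


def radius(graph):
    """Smallest eccentricity over all vertices."""
    return min(eccentricity(graph, v) for v in graph)


def diameter(graph):
    """Largest eccentricity over all vertices."""
    return max(eccentricity(graph, v) for v in graph)


def closeness(graph, v):
    """Reciprocal of the summed distances from v to all other vertices."""
    return 1 / sum(distances_from(graph, v).values())
```

The radius can never exceed the diameter, which matches the slides (radius 3, diameter 4).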

Page 19: Power metrics.

• Eigenvector Centrality
  o Rank vertices according to the primary eigenvector of the adjacency matrix representing the network.
  o In the language of Markov chains, find the stationary probability distribution of the chain.
• PageRank
  o Ensure a real-valued ranking by introducing a “teleportation-network” which ensures strong connectivity (used by Google).
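In the Markov-chain reading above, the ranking is the stationary distribution of a walker that leaves each vertex along a uniformly chosen out-edge. One way to approximate it is power iteration, sketched below. The four-vertex directed network is hypothetical, and the sketch assumes every vertex has at least one out-edge and that the network is strongly connected and aperiodic.

```python
def stationary_distribution(out_edges, iterations=200):
    """Power iteration on the row-normalized adjacency (transition) matrix."""
    vertices = sorted(out_edges)
    rank = {v: 1 / len(vertices) for v in vertices}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in vertices}
        for v in vertices:
            # Each vertex splits its current mass evenly over its out-edges.
            share = rank[v] / len(out_edges[v])
            for w in out_edges[v]:
                nxt[w] += share
        rank = nxt
    return rank


# Hypothetical strongly connected, aperiodic directed network.
out_edges = {"a": ["b"], "b": ["c", "d"], "c": ["a", "b"], "d": ["c"]}
pi = stationary_distribution(out_edges)
```

For this network the iteration settles on 1/6 for a and d and 1/3 for b and c.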

Page 20: The components to calculate a stationary probability distribution.

• Take a single “random walker”.
• Place that random walker on any random vertex in the network.
• At every time step, the random walker transitions from its current node to an adjacent node in the network (i.e. takes a random outgoing edge from its current node).
• Anytime the random walker is at a node, increment a “times visited” counter by 1.
• Let this algorithm run for an “infinite” amount of time.
• Normalize the “times visited” counters.
  o That is your centrality vector.

[Figure: vertex a, annotated with the values 1 and 0.0123.]

Pages 21-34: Random walker example.

[Figure sequence: a four-vertex network with vertices labeled a, c, b, and d. Each slide shows the four “times visited” counters after one more step of the random walker. The counter values, in the order they appear on each slide:]

Page 21: 0, 0, 0, 0
Page 22: 1, 0, 0, 0
Page 23: 1, 0, 1, 0
Page 24: 1, 0, 1, 1
Page 25: 1, 1, 1, 1
Page 26: 1, 1, 2, 1
Page 27: 1, 2, 2, 1
Page 28: 2, 2, 2, 1
Page 29: 2, 2, 3, 1
Page 30: 2, 2, 3, 2
Page 31: 2, 3, 3, 2
Page 32: 2, 3, 4, 2
Page 33: 66785, 133310, 133321, 66784
Page 34: 0.167, 0.332, 0.332, 0.167
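The walker procedure can be simulated directly. In the sketch below, the edge set is inferred from the order in which the counters increment across the slides, so it contains only the transitions the walker is seen to take. Mapping the labels a, c, b, d onto the four counters in transcript order is an assumption. The function name, step count, and seed are illustrative.

```python
import random

# Edges inferred from the counter sequence on the slides; the assignment of
# the labels a, c, b, d to the four counters is an assumption.
out_edges = {"a": ["b"], "b": ["c", "d"], "c": ["a", "b"], "d": ["c"]}


def random_walk_centrality(out_edges, steps, seed=0):
    """Count visits of a single random walker and normalize the counters."""
    rng = random.Random(seed)
    visits = {v: 0 for v in out_edges}
    current = rng.choice(sorted(out_edges))       # start on a random vertex
    for _ in range(steps):
        visits[current] += 1                      # increment "times visited"
        current = rng.choice(out_edges[current])  # take a random out-edge
    total = sum(visits.values())
    return {v: count / total for v, count in visits.items()}


centrality = random_walk_centrality(out_edges, steps=200_000)
```

With enough steps the normalized counters approach 1/6 for a and d and 1/3 for b and c, in line with the values on the final slide of the sequence.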

Page 35: PageRank.

• The random walker has a 0.85 probability of using G (the network itself) as its propagation network and a 0.15 probability of using H (the “teleportation-network”) as its propagation network (Google’s published alpha value).
• Every node is reachable from every other node, so the combined network is strongly connected.
• A strongly connected network guarantees a stationary probability distribution.
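The two-network walker can be sketched as a power iteration in which each step mixes G and H. Modeling H as a jump to a uniformly random vertex, and sending walkers at vertices without out-edges to a random vertex, are assumptions of this sketch rather than details given on the slide. The example network is hypothetical.

```python
def pagerank(out_edges, alpha=0.85, iterations=100):
    """Follow an out-edge of G with probability alpha; otherwise teleport
    to a uniformly random vertex (the assumed form of H)."""
    vertices = sorted(out_edges)
    n = len(vertices)
    rank = {v: 1 / n for v in vertices}
    for _ in range(iterations):
        # Every vertex receives the teleportation share first.
        nxt = {v: (1 - alpha) / n for v in vertices}
        for v in vertices:
            # A vertex with no out-edges spreads its mass over all vertices.
            targets = out_edges[v] or vertices
            share = alpha * rank[v] / len(targets)
            for w in targets:
                nxt[w] += share
        rank = nxt
    return rank


# Hypothetical network: "d" has no in-edges, so G alone is not strongly connected.
out_edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(out_edges)
```

Even though no edge of G leads to d, teleportation gives it a nonzero rank of (1 - alpha) / n, which is what makes the combined chain strongly connected.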

Page 36: Metadata distribution metrics.

• Scalar Assortativity
  o What’s the probability of encountering a node with degree x?
  o What’s the probability of encountering a node with degree x that is connected to a node of degree y?
  o What’s the probability of encountering a node with degree x that is connected to a node of degree y that is connected to a node of degree z?
  o …
• Discrete Assortativity
  o What’s the probability of encountering a node with metadata x?
  o What’s the probability of encountering a node with metadata x that is connected to a node of metadata y?
  o What’s the probability of encountering a node with metadata x that is connected to a node of metadata y that is connected to a node of metadata z?
  o …
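The first two scalar questions can be answered by counting. The sketch below estimates the 1st-order distribution (probability of a degree) and the 2nd-order distribution (probability of a degree pair across an edge) for a hypothetical undirected network. Counting each undirected edge in both directions is a choice made here so that the pair distribution is symmetric.

```python
from collections import Counter

# Hypothetical undirected edges.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# 1st order: probability that a vertex has degree x.
first_order = Counter(degree.values())
p_degree = {x: n / len(degree) for x, n in first_order.items()}

# 2nd order: probability that an edge joins a degree-x vertex to a
# degree-y vertex, counting each undirected edge in both directions.
second_order = Counter()
for u, v in edges:
    second_order[(degree[u], degree[v])] += 1
    second_order[(degree[v], degree[u])] += 1
p_pair = {xy: n / (2 * len(edges)) for xy, n in second_order.items()}
```

The same counting works for discrete metadata by replacing `degree[v]` with a lookup of the vertex's metadata value.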

Page 37: The CENS network dataset.

• Center for Embedded Network Sensing at U.C. Los Angeles.
• “An interdisciplinary and multi-institutional venture, CENS involves hundreds of faculty, engineers, graduate student researchers, and undergraduate students from multiple disciplines at the partner institutions of University of California at Los Angeles (UCLA), University of Southern California (USC), University of California Riverside (UCR), California Institute of Technology (Caltech), University of California at Merced (UCM), and California State University at Los Angeles (CSULA).”

Page 38: Everything is metadata.

Marko

Affiliation   LANL
Department    Research Library
Gender        Male
JobRank       Ph.D. Student
Building      P362
Lab           Prototyping Team
Advisor       Johan Bollen
Degree        2
….            ….

Page 39: The CENS coauthorship network.

• UCLA - red
• USC - orange
• Coventry - green
• …

• Ph.D. - ellipse
• Professor - hexagon
• …

Page 40: 1st-order degree distributions in CENS coauthorship network.

Page 41: 2nd-order degree distributions in CENS coauthorship network.

Page 42: 2nd-order degree assortativity in CENS coauthorship network.

• Pearson correlation on edge degrees.
  o r ∈ [-1, 1]
• r = 0.212
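A Pearson correlation on edge degrees can be computed as below. This is a sketch of the general technique, not necessarily the exact procedure behind the reported r. Counting each undirected edge in both directions is an assumption made to keep the measure symmetric, and the two test networks are hypothetical.

```python
from collections import Counter
from math import sqrt


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every undirected edge in both directions.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) when every vertex has the same degree.
    return cov / sqrt(var_x * var_y)


star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
path = [("a", "b"), ("b", "c"), ("c", "d")]
```

A star is the extreme disassortative case: every edge joins the high-degree hub to a degree-1 leaf, so r is -1.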

Page 43: 2nd-order degree assortativity in other networks.

• Physics coauthorship: 0.363
• Biology coauthorship: 0.127
• Mathematics coauthorship: 0.120
• Film actor collaborations: 0.208

• Internet: -0.189
• World Wide Web: -0.065

• Neural network: -0.163
• Marine food web: -0.247

• Random graph: 0.0
• Regular graph: 1.0

Newman, M.E.J., “Assortative Mixing in Networks”, Physical Review Letters, 89(20), 2002.

Page 44: Metadata path frequencies.

• 1064.0 UCLA UCLA
• 442.0 USC USC
• 336.0 USC UCLA
• 76.0 MIT UCLA
• 58.0 UCLA UCM
• 32.0 Caltech UCLA

• 376.0 Phd Professor
• 254.0 Phd Researcher
• 242.0 Researcher Professor
• 184.0 Phd Phd
• 142.0 Professor Professor

• 1186.0 Male Male
• 508.0 Male Female
• 78.0 Female Female

• 304.0 US US
• 156.0 India US
• 70.0 India India
• 58.0 US China
• 36.0 India China
• 28.0 China China
• 24.0 US Italy
• 18.0 Iran India
• 14.0 Iran US
• 12.0 Greece India

• 750.0 CS CS
• 388.0 EE CS
• 340.0 EE EE
• 84.0 CS CivilEng
• 78.0 Biology CS
• 74.0 CivilEng CivilEng
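Frequencies like these can be tallied by looking up one metadata field at both ends of every edge. The sketch below uses made-up authors and coauthorship edges, since the CENS data is not in the transcript. It counts each undirected edge once, which may differ from the counting convention behind the slide's numbers.

```python
from collections import Counter

# Hypothetical authors and coauthorship edges (not the CENS data).
affiliation = {"ann": "UCLA", "bo": "UCLA", "cy": "USC", "di": "MIT"}
coauthored = [("ann", "bo"), ("ann", "cy"), ("bo", "cy"), ("bo", "di")]


def metadata_path_frequencies(edges, metadata):
    """Count how often each unordered pair of metadata values shares an edge."""
    counts = Counter()
    for u, v in edges:
        # Sort so that (USC, UCLA) and (UCLA, USC) count as the same path.
        pair = tuple(sorted((metadata[u], metadata[v])))
        counts[pair] += 1
    return counts


freq = metadata_path_frequencies(coauthored, affiliation)
```

Swapping `affiliation` for a gender, origin, or department lookup yields the other groups of counts.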

Page 45: 2nd-order metadata assortativity in CENS coauthorship network.

• 0.696 Gender
• 0.641 Affiliation
• 0.513 Department
• 0.482 Advisor
• 0.426 Lab
• 0.319 Building
• 0.290 Origin
• 0.168 JobRank
• 0.042 Room

Page 46: 3rd-order metadata assortativity in CENS coauthorship network.

• 0.471 Gender
• 0.435 Affiliation
• 0.290 Department
• 0.225 Origin
• 0.207 Advisor
• 0.195 Lab
• 0.170 Building
• 0.032 JobRank
• 0.004 Room

Page 47: Metadata compression.

• 229.0 US:Male US:Female
• 228.0 US:Male US:Male
• 197.0 US:Male India:Male
• 121.0 India:Male India:Male
• 76.0 India:Male US:Female
• 36.0 US:Male China:Male
• 30.0 US:Female US:Female
• 21.0 Taiwan:Male China:Male
• 19.0 US:Male Italy:Male
• 18.0 India:Male South Korea:Male
• 17.0 India:Male China:Male
• 17.0 India:Male Australia:Male
• 16.0 China:Male China:Male
• 16.0 India:Male Greece:Male
• 16.0 US:Male Mexico:Male

Page 48: 3rd-order metadata compression.

• 1402.0 UCLA:Phd UCLA:Professor UCLA:Phd
• 879.0 UCLA:Researcher UCLA:Professor UCLA:Phd
• 605.0 UCLA:Professor UCLA:Professor UCLA:Phd
• 512.0 UCLA:Researcher UCLA:Professor UCLA:Researcher
• 380.0 UCLA:Researcher UCLA:Professor UCLA:Professor
• 304.0 USC:Phd UCLA:Professor UCLA:Phd
• 294.0 USC:Phd USC:Professor USC:Phd
• 272.0 UCLA:Phd UCLA:Phd UCLA:Phd
• 270.0 UCLA:Professor UCLA:Phd UCLA:Phd

Page 49: Breather.

Page 50: Example semantic network.

[Figure: the example semantic network from Page 10.]

Page 51: What is the Semantic Web?

• The figurehead of the Semantic Web initiative, Tim Berners-Lee, describes the Semantic Web as
  o “... an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
• Perhaps not the best definition. It implies a particular application space, namely the “web metadata and intelligent agents” space.
• My definition is that the Semantic Web is
  o “a highly-distributed, standardized semantic network data model: a URG (Uniform Resource Graph). It’s a uniform way of graphing resources.”

Page 52: What is a resource?

• Resource = Anything.
  o Anything that can be identified.
• The Uniform Resource Identifier (URI):
  o <scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
    - http://www.lanl.gov
    - urn:uuid:550e8400-e29b-41d4-a716-446655440000
    - urn:issn:0892-3310
    - http://www.lanl.gov#MarkoRodriguez
      – prefix it to make it easier on the eyes: lanl:MarkoRodriguez
• The Semantic Web
  o “first identify it, then relate it!”
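The URI components and the prefix shorthand can be explored with Python's standard `urllib.parse` module. In this sketch the mapping from `lanl` to `http://www.lanl.gov#` is assumed from the slide's `lanl:MarkoRodriguez` example, and `expand` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Namespace mapping assumed from the slide's lanl:MarkoRodriguez example.
PREFIXES = {"lanl": "http://www.lanl.gov#"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as lanl:MarkoRodriguez into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


uri = expand("lanl:MarkoRodriguez")
parts = urlsplit(uri)                  # scheme, network location, fragment
urn = urlsplit("urn:issn:0892-3310")   # a URN has a scheme but no location
```

Here `parts.scheme` is `http` and `parts.fragment` is `MarkoRodriguez`, while the URN splits into the scheme `urn` and the path `issn:0892-3310`.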

Page 53: The technologies of the Semantic Web.

• Resource Description Framework (RDF): The foundation technology of the Semantic Web. RDF is a highly-distributed, semantic network data model. In RDF, URIs and literals (e.g. ints, doubles, strings) are related to one another in triples.
  o <lanl:marko> <lanl:worksWith> <lanl:jhw>
  o <lanl:jhw> <lanl:wrote> <lanl:LAUR-07-2028>
  o <lanl:LAUR-07-2028> <lanl:hasTitle> “Web-Based Collective Decision Making Systems”^^<xsd:string>
• RDF Schema (RDFS): The ontology is to the Semantic Web as the schema is to the relational database.
  o “Anything of rdf:type lanl:Human can lanl:drive anything of rdf:type lanl:Car.”
• Triple-Store: The triple-store is to semantic networks what the relational database is to the data table.
  o a.k.a. semantic repository, graph database, RDF database.
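To make the triple model concrete, the three example statements can be held in a Python set and queried by pattern. This is a toy in-memory stand-in, not how a real triple-store is implemented; the `match` function is hypothetical, and the typed literal is simplified to a plain string.

```python
# Each statement is a (subject, predicate, object) tuple; the typed literal
# is kept as a plain Python string for simplicity.
triples = {
    ("lanl:marko", "lanl:worksWith", "lanl:jhw"),
    ("lanl:jhw", "lanl:wrote", "lanl:LAUR-07-2028"),
    ("lanl:LAUR-07-2028", "lanl:hasTitle",
     "Web-Based Collective Decision Making Systems"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


written_by_jhw = match(triples, s="lanl:jhw", p="lanl:wrote")
```

Leaving all three positions as wildcards returns the whole network, which mirrors the `?s ?p ?o` pattern used in triple-store queries.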

Page 54: RDF and RDFS.

[Figure: an instance layer in which lanl:marko is joined to lanl:cookie by lanl:isEating, and an ontology layer in which lanl:isEating has rdfs:domain lanl:Human and rdfs:range lanl:Food. rdf:type edges link lanl:marko to lanl:Human and lanl:cookie to lanl:Food.]

RDF is not a syntax. It’s a data model. Various syntaxes exist to encode RDF including RDF/XML, N-Triples, TRiX, N3, etc.

Page 55: The triple-store.

• There are two primary ways to distribute information on the Semantic Web.
  o 1.) publish an RDF/XML document on a web server.
  o 2.) expose a public interface to an RDF triple-store.
• The triple store is to semantic networks what the relational database is to data tables.
  o Storing and querying triples in a triple store.
  o SPARQL/Update query language.
    - like SQL, but for triple-stores.

SELECT ?a ?c
WHERE {
  ?a type human .
  ?a wrote ?b .
  ?b type article .
  ?c wrote ?b .
  ?c type human .
  FILTER (?a != ?c)
}

INSERT { ?a coauthor ?c }
WHERE {
  ?a type human .
  ?a wrote ?b .
  ?b type article .
  ?c wrote ?b .
  ?c type human .
  FILTER (?a != ?c)
}

DELETE { ?s ?p ?o }
WHERE { ?s ?p ?o }
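The coauthor queries can be read as a join over triples. The sketch below expresses the same pattern in plain Python over a small in-memory set; the data is hypothetical and uses the slide's shorthand vocabulary (type, wrote, human, article). It illustrates the query's logic and is not a SPARQL engine.

```python
# Hypothetical data in the slide's shorthand vocabulary.
triples = {
    ("marko", "type", "human"), ("johan", "type", "human"),
    ("chuck", "type", "human"), ("p1", "type", "article"),
    ("marko", "wrote", "p1"), ("johan", "wrote", "p1"),
}


def coauthor_pairs(triples):
    """Analogue of the SELECT: pairs (?a, ?c) of distinct humans who
    wrote the same article."""
    humans = {s for s, p, o in triples if p == "type" and o == "human"}
    articles = {s for s, p, o in triples if p == "type" and o == "article"}
    wrote = {(s, o) for s, p, o in triples if p == "wrote"}
    return {(a, c)
            for a, b in wrote if a in humans and b in articles
            for c, b2 in wrote if b2 == b and c in humans and a != c}


pairs = coauthor_pairs(triples)

# Analogue of the INSERT: materialize the implied coauthor edges.
triples |= {(a, "coauthor", c) for a, c in pairs}
```

The pair is returned in both directions because the pattern is symmetric in `?a` and `?c`; chuck wrote nothing and so never appears.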

Page 56: Triple-store vs. relational database.

Triple-store (SPARQL interface):

SELECT ?x4
WHERE {
  ?x1 dc:creator lanl:LAUR-06-2139 .
  ?x1 lanl:hasFriend ?x2 .
  ?x2 lanl:worksFor ?x3 .
  ?x3 lanl:collaboratesWith ?x4 .
  ?x4 lanl:hasEmployee ?x1 .
}

Relational database (SQL interface):

SELECT collaboratesWithTable.orgId2
FROM personTable, authorTable, articleTable, friendTable, hasEmployeeTable,
     organizationTable, worksForTable, collaboratesWithTable
WHERE personTable.id = authorTable.personId
  AND authorTable.articleId = "dc:creator LAUR-06-2139"
  AND personTable.id = friendTable.personId1
  AND friendTable.personId2 = worksForTable.personId
  AND worksForTable.orgId = collaboratesWithTable.orgId2
  AND collaboratesWithTable.orgId2 = personTable.id

Page 57: Semantic network metrics?

• What does it mean to run a shortest-path calculation on a semantic network?
  o Shortest-path along which semantic, i.e. which edge type(s)?
• What does it mean to calculate PageRank on a semantic network?
  o What are legal semantics for the random walker?

Page 58: Shortest-path metrics in a semantic network?

[Figure: a semantic network over lanl:marko, lanl:johan, lanl:bob, lanl:jill, and lanl:chuck, joined by four lanl:hasFriend edges and one lanl:livesInSameCityAs edge.]

“What is the shortest path between lanl:marko and lanl:jill by taking only lanl:hasFriend edges?”
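A question of this kind can be answered by a breadth-first search that ignores every edge whose label is not allowed. The triples below are hypothetical, since the figure's exact connections are not recoverable from the transcript, and `shortest_path` is an illustrative name.

```python
from collections import deque

# Hypothetical triples; not the exact edges from the figure.
triples = [
    ("lanl:marko", "lanl:hasFriend", "lanl:johan"),
    ("lanl:johan", "lanl:hasFriend", "lanl:bob"),
    ("lanl:bob", "lanl:hasFriend", "lanl:jill"),
    ("lanl:marko", "lanl:hasFriend", "lanl:chuck"),
    ("lanl:marko", "lanl:livesInSameCityAs", "lanl:jill"),
]


def shortest_path(triples, source, target, allowed):
    """Breadth-first search following only edges whose label is in `allowed`."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return paths[v]
        for s, p, o in triples:
            if s == v and p in allowed and o not in paths:
                paths[o] = paths[v] + [o]
                queue.append(o)
    return None  # no path using only the allowed edge types


friends_only = shortest_path(triples, "lanl:marko", "lanl:jill",
                             {"lanl:hasFriend"})
```

In this made-up network the friends-only path takes three steps, while also allowing `lanl:livesInSameCityAs` shortens it to one, which shows how the answer depends on the permitted semantics.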

Page 59: PageRank in a semantic network?

[Figure: a semantic network over lanl:marko, lanl:johan, lanl:chuck, lanl:p1, lanl:Article, and lanl:Human, with two lanl:wrote edges, one lanl:hasFriend edge, and four rdf:type edges. Question marks stand in for the ranks.]

Page 60: Grammar-based geodesics and random walkers.

• How do you port many of the undirected and directed single-relational network analysis algorithms over to the semantic network domain?
  o My solution is what I call a grammar.
• Nearly every network analysis algorithm can be represented in terms of a walker traversing a network.
  o Geodesics.
  o PageRank.
  o Metadata paths.
  o etc.

Page 61: Components of a grammar-based walker.

• A walker.
  o Discrete element.
• A grammar.
  o An abstract representation of the legal paths for the walker to take.
    - e.g. “you can traverse a lanl:friendOf edge from a lanl:Human to another lanl:Human.”
    - Also includes rules: “increment a counter.”, “don’t ever return to this vertex.”
• A data set that respects the ontological “expectations” of the grammar.

Pages 62-68: Grammar-based PageRank example.

[Figure sequence: the semantic network from Page 59, over lanl:marko, lanl:johan, lanl:chuck, lanl:p1, lanl:Article, and lanl:Human, with two lanl:wrote edges, one lanl:hasFriend edge, and four rdf:type edges. Each slide shows three counters as the walker moves.]

“Take only a lanl:wrote out-edge to a resource of rdf:type lanl:Article. Then take a lanl:wrote in-edge to a resource of rdf:type lanl:Human. Increment only lanl:Humans. Make sure that the lanl:Human seen before is not the same lanl:Human currently. Repeat infinitely.”

Counter values, in the order they appear on each slide:

Page 62: 0, 0, 0
Page 63: 1, 0, 0
Page 64: 1, 0, 0
Page 65: 1, 0, 1
Page 66: 1, 0, 1
Page 67: 2, 0, 1
Page 68: 2, 0, 1
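The quoted grammar can be sketched as a walker that is only allowed to make one kind of two-step move. In the code below, the lanl:wrote and rdf:type triples are read off the figure's labels. The subject of the lanl:hasFriend edge is an assumption, and `grammar_walk` is a hypothetical helper rather than the author's implementation.

```python
import random

# lanl:wrote and rdf:type triples follow the figure's labels; the subject of
# the lanl:hasFriend edge is an assumption.
triples = [
    ("lanl:marko", "lanl:wrote", "lanl:p1"),
    ("lanl:johan", "lanl:wrote", "lanl:p1"),
    ("lanl:johan", "lanl:hasFriend", "lanl:chuck"),
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:johan", "rdf:type", "lanl:Human"),
    ("lanl:chuck", "rdf:type", "lanl:Human"),
    ("lanl:p1", "rdf:type", "lanl:Article"),
]


def has_type(resource, rdf_class):
    return (resource, "rdf:type", rdf_class) in triples


def grammar_walk(start, steps, seed=0):
    """Walk Human -wrote-> Article <-wrote- other Human, counting Humans only."""
    rng = random.Random(seed)
    counts = {s: 0 for s, p, o in triples
              if p == "rdf:type" and o == "lanl:Human"}
    human = start
    for _ in range(steps):
        counts[human] += 1  # increment only lanl:Humans
        # Take a lanl:wrote out-edge to a lanl:Article.
        articles = [o for s, p, o in triples
                    if s == human and p == "lanl:wrote"
                    and has_type(o, "lanl:Article")]
        if not articles:
            break  # the grammar allows no further move
        article = rng.choice(articles)
        # Take a lanl:wrote in-edge from a different lanl:Human.
        coauthors = [s for s, p, o in triples
                     if o == article and p == "lanl:wrote"
                     and has_type(s, "lanl:Human") and s != human]
        if not coauthors:
            break
        human = rng.choice(coauthors)
    return counts


counts = grammar_walk("lanl:marko", steps=3)
```

Starting at lanl:marko, three increments give marko 2, johan 1, and chuck 0. That is consistent with the final counters in the slide sequence if the counter that stays at zero belongs to lanl:chuck, who wrote nothing and is therefore unreachable under this grammar.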

Page 69: Grammars create implicit relationships.

[Figure: the semantic network from Page 59, with an additional lanl:hasCoauthor edge.]

Page 70: Conclusions.

• Many data sets can be represented as a network of “actors”.
• There exist many network analysis algorithms.
  o Shortest-path metrics.
  o Eigenvector-based metrics.
  o Assortativity coefficients.
• The semantic network data structure is a less studied data model.
  o The Semantic Web community doesn’t take a network approach to their substrate.
• The grammar technique can be used to port many of the common network analysis algorithms to the semantic network domain.
  o Grammar-based geodesics.
  o Grammar-based random walkers.

Page 71: Related publications.

• Rodriguez, M.A., Watkins, J.H., Bollen, J., Gershenson, C., “Using RDF to Model the Structure and Process of Systems”, International Conference on Complex Systems, Boston, Massachusetts, LA-UR-07-5720, October 2007.
• Rodriguez, M.A., Bollen, J., Van de Sompel, H., “A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage”, 2007 ACM/IEEE Joint Conference on Digital Libraries, pages 278-287, Vancouver, Canada, ACM/IEEE Computing, doi:10.1145/1255175.1255229, LA-UR-07-0665, June 2007.
• Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms”, 2007 Hawaii International Conference on Systems Science (HICSS), pages 39-49, Waikoloa, Hawaii, IEEE Computer Society, ISSN: 1530-1605, doi:10.1109/HICSS.2007.487, LA-UR-06-2139, January 2007.
• Rodriguez, M.A., “A Multi-Relational Network to Support the Scholarly Communication Process”, International Journal of Public Information Systems, volume 2007, issue 1, pages 13-29, ISSN: 1653-4360, LA-UR-06-2416, March 2007.
• Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks”, LA-UR-07-5287, August 2007.
• Rodriguez, M.A., Watkins, J.H., “Grammar-Based Geodesics in Semantic Networks”, LA-UR-07-4042, June 2007.
• Rodriguez, M.A., Bollen, J., “Modeling Computations in a Semantic Network”, LA-UR-07-3678, May 2007.
• Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate”, LA-UR-07-2885, April 2007.
• Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, LA-UR-06-7791, November 2006.