tim estes - generating dynamic social networks from large scale unstructured data

Generating Dynamic Social Networks from Large Scale Unstructured Data: Enterprise Software to Make Sense of Really Junky Data. Tim Estes - CEO, Digital Reasoning

Upload: digital-reasoning

Post on 25-May-2015


Category:

Technology



DESCRIPTION

Tim Estes, CEO of Digital Reasoning, delivered this presentation at the Strata Conference (Feb 2011). It discusses how large scale blog data can be mined to yield social networks of influencers, connections, discussion topics, etc.

TRANSCRIPT

Page 1: Tim Estes - Generating dynamic social networks from large scale unstructured data

Generating Dynamic Social Networks from Large Scale Unstructured Data

Enterprise Software to Make Sense of Really Junky Data

Tim Estes - CEO, Digital Reasoning

Page 2: Tim Estes - Generating dynamic social networks from large scale unstructured data

What We’ll Discuss

• What is a social network?

• The web of relationships between entities that influences actions

• Why does it matter?

• To reference Aesop: “You are known by the company you keep.”

• What’s required to build one algorithmically?

• What’s similar, what’s the same, what’s connected

Page 3: Tim Estes - Generating dynamic social networks from large scale unstructured data

What’s similar?

We use patented algorithms for deducing related terms from the data…

[Figure: clouds of related terms deduced for the seed terms Bush, White House, Nashville, Justin Timberlake, and Britney Spears. Terms shown: president bush; president george w; administration; bush administration; george; george w; george bush; brown; american; clinton; house; gov; white; the administration; president-elect; barack obama; barack; tenn; the predators; predators; oakland; milwaukee; st louis; carolina; a season; baltimore; kentucky; miley cyrus; pussycat dolls; bob dylan; nine inch nails; rock star; the timberwolves; sean preston; lanarkshire; ticket prices; nme; britney spears; the album; x factor; my friends; mtv; madonna; lady gaga; singer; a student]
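The slide does not describe the patented algorithms themselves. As a rough illustration of how related terms can be deduced from raw text at all, here is a minimal sketch that swaps in a generic technique: co-occurrence vectors compared by cosine similarity. The function names, the window size, and the sample sentences are all assumptions made for this example, not Digital Reasoning's method.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(docs, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_terms(vectors, seed, top_n=3):
    """Return up to `top_n` terms whose contexts look most like the seed's."""
    seed_vec = vectors.get(seed, Counter())
    scores = [(cosine(seed_vec, vec), term)
              for term, vec in vectors.items() if term != seed]
    return [term for score, term in sorted(scores, reverse=True)[:top_n] if score > 0]


# Hypothetical sample text, invented for this sketch.
sample_docs = [
    "bush signed the bill",
    "obama signed the bill",
    "predators won the game",
]
sample_vectors = cooccurrence_vectors(sample_docs)
print(related_terms(sample_vectors, "bush", top_n=2))
```

Terms that appear in similar contexts score close to each other even when they never appear together, which is the general idea behind grouping "bush" with "obama" rather than with "predators".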

Page 4: Tim Estes - Generating dynamic social networks from large scale unstructured data

What’s the same?

Concept resolution:

Roll up similar things into groups of the same (again, algorithmically)

Example: Tony Blair
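To make the "roll up" step concrete, here is a minimal sketch of grouping mentions with a union-find structure. The matching rule (two mentions are merged when one's words are a subset of the other's) is a deliberately crude assumption for illustration only; the slide does not say how the actual resolution is done, and a real system would need far stronger evidence than shared words.

```python
def resolve_concepts(mentions):
    """Group mentions into concepts using union-find.

    Toy rule (an assumption of this sketch): merge two mentions when the
    word set of one is contained in the word set of the other.
    """
    parent = {m: m for m in mentions}

    def find(x):
        # Follow parent links to the group representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    items = list(parent)
    for i, a in enumerate(items):
        words_a = set(a.lower().split())
        for b in items[i + 1:]:
            words_b = set(b.lower().split())
            if words_a <= words_b or words_b <= words_a:
                union(a, b)

    groups = {}
    for m in items:
        groups.setdefault(find(m), set()).add(m)
    return list(groups.values())


# Hypothetical mention list, invented for this sketch.
for group in resolve_concepts(
    ["Tony Blair", "Blair", "Prime Minister Tony Blair", "Gordon Brown", "Brown"]
):
    print(sorted(group))
```

Union-find fits here because resolution is transitive: if "Blair" matches "Tony Blair" and "Tony Blair" matches "Prime Minister Tony Blair", all three end up in one group without comparing every pair again.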

Page 5: Tim Estes - Generating dynamic social networks from large scale unstructured data

What’s connected?

Link analysis:

Show who and what are connected (again, you guessed it, algorithmically)

Terrorist Leader Connections
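One simple way to approximate link analysis is to count how often two resolved entities are mentioned in the same document and treat that count as the strength of the link. The sketch below assumes entity extraction has already happened and that each document is reduced to a list of entity names; the function names and the placeholder entities are hypothetical, and the slide does not state that this is how the links are computed.

```python
from collections import Counter
from itertools import combinations


def build_links(docs_entities):
    """Count how many documents mention each unordered pair of entities."""
    links = Counter()
    for entities in docs_entities:
        # Sort so each pair has one canonical key; set() ignores repeat mentions.
        for a, b in combinations(sorted(set(entities)), 2):
            links[(a, b)] += 1
    return links


def connections(links, entity):
    """Return (neighbor, weight) pairs for `entity`, strongest link first."""
    found = []
    for (a, b), weight in links.items():
        if a == entity:
            found.append((b, weight))
        elif b == entity:
            found.append((a, weight))
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))


# Placeholder entity names, invented for this sketch.
sample = [
    ["person_a", "person_b"],
    ["person_a", "person_b", "org_c"],
    ["org_c", "person_d"],
]
print(connections(build_links(sample), "person_a"))
```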

Page 6: Tim Estes - Generating dynamic social networks from large scale unstructured data

Let’s Put an Idea to the Test...

With powerful analytics can you remove some or most of the need for a priori structure in designing and understanding social networks or other quasi-ontological schemas?

YES

and

Can you also do it with messy unstructured data?

YES

Page 7: Tim Estes - Generating dynamic social networks from large scale unstructured data

But first...

Why do we (Digital Reasoning) care?

Page 8: Tim Estes - Generating dynamic social networks from large scale unstructured data

Because it’s what we do for a living. We make sense of the senseless.

Our customers have critical needs

- Digital Reasoning works primarily in the Defense and Intelligence Community making sense of noisy, unstructured data and turning it into usable entity-centric systems supporting mission critical intelligence.

The data is big and bad

- Little structure in content, topics all over the place, and totally different ontologies/schemas across the community.

The times we live in create urgencies

- We care because the better and faster we are at making sense of this kind of data, the safer our country is.

Page 9: Tim Estes - Generating dynamic social networks from large scale unstructured data

Why did we take a data-centric, deployed software model?

Unique Environments

- Given who our customers are... we can’t host their data. No one can. The solution had to be a pure deployed software model.

Meaning in Hard to Reach Places

- The data is basically a bunch of pieces that don’t want to be connected. People that don’t want to be found.

Result?

- Imagine trying to turn that kind of data in that type of architecture from a bunch of loose communication into a social network that has patterns of life, weightings of influence, and projections of probable future actions...

Page 10: Tim Estes - Generating dynamic social networks from large scale unstructured data

Here’s what it looks like in an architecture…

Page 11: Tim Estes - Generating dynamic social networks from large scale unstructured data

Now let’s show what can be learned with a little application of Entity-Oriented Analytics to a bunch of web data.

Page 12: Tim Estes - Generating dynamic social networks from large scale unstructured data

Test Case

Web Blog+Wikipedia data (collected by Fetch)

- 6M blog URLs collected over 1+ year

- 16M unique blog messages

- No unifying theme, topic or author

- Tricky to get “good” big data from the open web. Ended up using 0.5% of that original source: 1TB became 4GB.

No a priori structure, sparse metadata, nearly all meaning emerges from analysis

Let’s see what we can find out...

Page 13: Tim Estes - Generating dynamic social networks from large scale unstructured data

Examining connections related to “Carl Icahn”

The data shows connections to and from Carl Icahn by:

• people

• periodicals

• topics

• companies

On closer examination the data tells us: Carl Icahn “is backing” a startup company that “would build” products related to Barack Obama

Page 14: Tim Estes - Generating dynamic social networks from large scale unstructured data

Let’s examine what connections we find to “Egypt”

Egypt is identified as a location, as an organization (country) and as an unassigned entity with all related connections

On closer examination we see interesting connections in the blogs for Egypt, Cairo, Issues and the phrase “powder keg”.

If we drill down into the actual blog entry we see the context of the connections

Page 15: Tim Estes - Generating dynamic social networks from large scale unstructured data

How about connections to “Steve Jobs”?

The entities and connections in the blog data are vast, which is not surprising.

The large number of authors and topics reflects the popularity of Steve Jobs as a blog subject

[Figure labels: Authors, Topics]

One connection is interesting: “Steve Jobs” to “Walt Mossberg” to “Kindle”

Synthesys shows the reason for connection as “pricing”

Clicking on this word we see the context of the connection
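A chain such as “Steve Jobs” to “Walt Mossberg” to “Kindle” can be pictured as a path through a graph whose edges carry a reason label. The sketch below is not how Synthesys works internally, which the slides do not describe; it is a plain breadth-first search over labeled, undirected edges. The slide also does not say which hop carries the “pricing” label, so attaching it to both edges in the sample data is an assumption.

```python
from collections import deque


def find_path(edges, start, goal):
    """Breadth-first search over undirected edges given as (a, b, reason).

    Returns the shortest chain as a list of (from, reason, to) hops,
    an empty list when start == goal, or None when no chain exists.
    """
    graph = {}
    for a, b, reason in edges:
        graph.setdefault(a, []).append((b, reason))
        graph.setdefault(b, []).append((a, reason))

    queue = deque([start])
    came_from = {start: None}  # node -> (previous node, reason) or None for start
    while queue:
        node = queue.popleft()
        if node == goal:
            hops = []
            while came_from[node] is not None:
                prev, reason = came_from[node]
                hops.append((prev, reason, node))
                node = prev
            return hops[::-1]
        for neighbor, reason in graph.get(node, []):
            if neighbor not in came_from:
                came_from[neighbor] = (node, reason)
                queue.append(neighbor)
    return None


# Sample edges: entity names come from the slide; which edge is labeled
# "pricing" is an assumption of this sketch.
sample_edges = [
    ("Steve Jobs", "Walt Mossberg", "pricing"),
    ("Walt Mossberg", "Kindle", "pricing"),
]
for hop in find_path(sample_edges, "Steve Jobs", "Kindle"):
    print(hop)
```

Keeping the reason on each edge is what lets a user click through from the connection to the text that produced it.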

Page 16: Tim Estes - Generating dynamic social networks from large scale unstructured data

Demo Platform

Synthesys Platform Beta

elastic

user-driven

entity-oriented-analytics on demand

Page 17: Tim Estes - Generating dynamic social networks from large scale unstructured data

Observations

New innovations will be algorithmic and focused on turning hard-to-use data into dynamic, evolving knowledge that can automate machine execution

Architectures/solutions will have to accommodate customers that don’t want to move their data to a Public Cloud

It is a true statement... “If you can connect the dots, you can connect the people”

Page 18: Tim Estes - Generating dynamic social networks from large scale unstructured data

So why should you care?

Because there is a lot of data that doesn’t belong on a shared grid, such as Top Secret data, Sensitive Corporate Data, and Personal Data.

Because people may want to own (Personal Computing model) vs. rent (Mainframe model) analytics

Because you may not want to convert your data to fit the model of the hosted solution or map to their ontology to get the answers you need.

Page 19: Tim Estes - Generating dynamic social networks from large scale unstructured data

To learn more…

See us at:

- Strata Science Fair (Wed evening 6:45PM)

- Digital Reasoning Booth #305

- www.digitalreasoning.com

Page 20: Tim Estes - Generating dynamic social networks from large scale unstructured data

Questions?

Automated Understanding, Trusted Decisions, True Intelligence