Factual 2011 Web 2.0 Presentation

Big Data Challenges: Getting Some March 31, 2011 Gil Elbaz @factual @gilelbaz

Uploaded by: factualteam

Posted on 27-Jan-2015

DESCRIPTION

Gil Elbaz, CEO and founder of Factual, gave a talk at the 2011 Web 2.0 Conference in San Francisco. His talk was entitled: "Big Data Challenges: Getting Some."

TRANSCRIPT

Page 1: Factual 2011 Web 2.0 Presentation

Big Data Challenges: Getting Some

March 31, 2011

Gil Elbaz @factual @gilelbaz

Page 2

Confidential

Road to Information Singularity

Page 3

Networks Underlying Information Flow

• Density: number of connecting paths

• Plasticity: ease of forming new paths

• Speed & Flow: rate of information transfer

www.bloggingforbusinessbook.com

Page 4

The Internet

http://www.amazon.com/Digital-Crossroads-American-Telecommunications-Internet/dp/0262140918

Page 5

Search Engines

Page 6

Social Networks: Facebook

600 million Facebook users

130 friends per user on average

8 friend requests / month

15 messages / day / user

http://2.bp.blogspot.com/

Page 7

Trending of Unfriending

Page 8

Page 9

Unfriending

Page 10

Another Network: The Brain

100 billion neurons

1,000 ‘hardwired’ synapses per neuron

Page 11

Web 3.0: Data Web

Page 12

Web Scale Data = More Pain

Findability

Access

Rights

Economics

Standards

Integration & Aggregation

Trust

Page 13

Web 2.0 Model: Scale-Free Networks

www.futureexploration.net
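The slide names scale-free networks as the Web 2.0 model without defining them. One standard way to illustrate the idea is the Barabási-Albert preferential-attachment model, in which each new node links to existing nodes with probability proportional to their current degree, so a few hubs end up holding a large share of the links. The Python below is a minimal sketch of that textbook model, not anything shown in the talk; the function name `preferential_attachment` and its parameters `n_nodes`, `m`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes: int, m: int, seed: int = 0) -> list[tuple[int, int]]:
    """Barabasi-Albert-style growth (hypothetical helper): each new node links to
    m distinct existing nodes, chosen with probability proportional to degree."""
    if m < 1 or n_nodes <= m:
        raise ValueError("need 1 <= m < n_nodes")
    rng = random.Random(seed)
    edges: list[tuple[int, int]] = []
    # Every node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints: list[int] = []

    # Seed graph: a fully connected core of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u):
            edges.append((u, v))
            endpoints.extend((u, v))

    # Grow the graph one node at a time.
    for new in range(m + 1, n_nodes):
        targets: set[int] = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(10_000, m=2)
degree = Counter(v for edge in edges for v in edge)
print("max degree:", max(degree.values()))
print("median degree:", sorted(degree.values())[len(degree) // 2])
```

Running it typically shows a maximum degree far above the median, the heavy-tailed, hub-dominated degree distribution that the term “scale-free” refers to.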

Page 14

Book Data: Progress Being Made

Google Book Search API, Open Library Books API, ISBNdb, Amazon API, LibraryThing, GoodReads, WorldCat

Page 15

Findability, Access, Rights, Economics, Standards, Trust

Google Book Search API, Open Library Books API, ISBNdb, WorldCat

Amazon API, LibraryThing, GoodReads

Page 16

Another Case Study: Local Data

http://stevecheney.posterous.com/

Page 17

Another Case Study: Local Data

[Diagram: Twitter, Facebook, Loopt, Foursquare, Yelp, Zillow, Google, HomeJunction]

Examine Twitter sentiment (avoid dirty coffee shops)

Identify areas of highest bike thefts

Correlate check-ins with property values

Page 18

HomeJunction

Page 19

Factual Is an Example of a New Information Network

[Diagram: Aggregate, Mash, Curate, Dedupe, Canonicalize; serving Developers, Search Engines, Publishers]

Page 20

Factual’s Open Data Model

Free, with access via APIs, SDKs, and downloads. BUT… we ask you to contribute back into the ecosystem.

Benefits

• Drive down costs

• Rapid iteration

• Differentiate on user experience

• Only a small percentage of the world needs to participate (e.g., Wikipedia)

Page 21

Equivalence Measurements

Subway Sandwiches, 52 E Court St, Cincinnati, OH, (513)-241-6699

Subway, 52 West Court St, 45202, (800)-653-2323

= ?
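One way to make the slide's question concrete is to score the two Subway records field by field and look at the evidence each field gives. The sketch below is an illustration, not Factual's matching method; the abbreviation table `ADDRESS_TOKENS` and the helper names (`tokens`, `name_overlap`, `compare`, and so on) are hypothetical.

```python
import re

# Hypothetical token table for a few US street-address abbreviations.
ADDRESS_TOKENS = {"e": "east", "w": "west", "n": "north", "s": "south",
                  "st": "street", "ave": "avenue"}
DIRECTIONS = {"east", "west", "north", "south"}


def tokens(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize_address(address: str) -> list[str]:
    return [ADDRESS_TOKENS.get(t, t) for t in tokens(address)]


def name_overlap(a: str, b: str) -> float:
    """Overlap coefficient: shared tokens divided by the smaller token set."""
    ta, tb = set(tokens(a)), set(tokens(b))
    return len(ta & tb) / min(len(ta), len(tb))


def phone_digits(phone: str) -> str:
    return re.sub(r"\D", "", phone)


def compare(a: dict, b: dict) -> dict:
    """Return per-field evidence; combining it into one verdict is left open."""
    addr_a, addr_b = normalize_address(a["address"]), normalize_address(b["address"])
    dirs_a = {t for t in addr_a if t in DIRECTIONS}
    dirs_b = {t for t in addr_b if t in DIRECTIONS}
    return {
        "name_overlap": name_overlap(a["name"], b["name"]),
        # Same house number and street name once directions are set aside.
        "street_match": [t for t in addr_a if t not in DIRECTIONS]
                        == [t for t in addr_b if t not in DIRECTIONS],
        "direction_conflict": bool(dirs_a and dirs_b and dirs_a != dirs_b),
        "phone_match": phone_digits(a["phone"]) == phone_digits(b["phone"]),
    }


rec1 = {"name": "Subway Sandwiches", "address": "52 E Court St", "phone": "(513)-241-6699"}
rec2 = {"name": "Subway", "address": "52 West Court St", "phone": "(800)-653-2323"}
print(compare(rec1, rec2))
# {'name_overlap': 1.0, 'street_match': True, 'direction_conflict': True, 'phone_match': False}
```

Name and street agree, but the street direction conflicts and the phone numbers differ, which is why the pair is hard to call; how such signals are weighted into a final yes/no is a modeling choice this sketch deliberately does not make.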

Page 22

Large-Scale Aggregation Technologies

Page 23

Large-Scale Aggregation Technologies

Apartments ↔ Apts
Center ↔ Ctr
Corp ↔ Corporation
Service ↔ Svc
Attorney ↔ Atty
Assoc ↔ Associates
Inc ↔ Incorporated
Assn ↔ Association
Co ↔ Company
Mount ↔ Mt
Bros ↔ Brothers
BBQ ↔ Barbeque
.....
?ori ↔ Ted

Page 24

Large-Scale Aggregation Technologies

Restaurant ↔ Rstrnt
Restaurant ↔ Restuarant
Hosp ↔ Hospital
Billards ↔ Billiards
Salon ↔ Sln
Buffet ↔ Buffett
Center ↔ Ctr
Apartments ↔ Apts
Boutique ↔ Btq
Jewelers ↔ Jewlers
Cleaners ↔ Clnrs
Market ↔ Mktz
.....
Kragen ↔ O'Reilly?
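Slides 23 and 24 pair abbreviations and misspellings of common business-name terms. A minimal way to use such pairs is to rewrite every token of a name through a lookup table before comparing names. The slides do not say which side of each pair is canonical; expanding to the full, correctly spelled word is an assumption of this sketch, and `CANONICAL` and `canonicalize` are hypothetical names.

```python
import re

# Variant -> canonical token, built from most of the pairs on slides 23 and 24.
# Treating the full word as canonical is an assumption of this sketch.
CANONICAL = {
    "apts": "apartments", "ctr": "center", "corp": "corporation",
    "svc": "service", "atty": "attorney", "assoc": "associates",
    "inc": "incorporated", "assn": "association",
    "co": "company",  # risky: "Co" can also be part of a place name
    "mt": "mount", "bros": "brothers", "bbq": "barbeque",
    "rstrnt": "restaurant", "restuarant": "restaurant",
    "hosp": "hospital", "billards": "billiards", "sln": "salon",
    "buffett": "buffet", "btq": "boutique", "jewlers": "jewelers",
    "clnrs": "cleaners",
}


def canonicalize(name: str) -> str:
    """Lowercase, tokenize (keeping apostrophes), and rewrite each token."""
    words = re.findall(r"[a-z0-9']+", name.lower())
    return " ".join(CANONICAL.get(w, w) for w in words)


print(canonicalize("Joe's BBQ Rstrnt"))  # joe's barbeque restaurant
print(canonicalize("Mt Zion Restuarant") == canonicalize("Mount Zion Restaurant"))  # True
```

Blind token rewriting has obvious limits: it would also turn the surname Buffett into “buffet”, and a case like the “Kragen O'Reilly?” slide, two different names that may refer to the same place, needs entity-level knowledge rather than a spelling table.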

Page 25

Kragen O'Reilly?

Page 26

Large-Scale Deduping

• Specialized data compression & folding techniques

• Eliminate redundant entities: endpoints and authority pages

• Improves precision & recall

• Enables real-time dedupe and crosswalks
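The slide does not describe Factual's “specialized data compression & folding techniques”, so what follows is a generic stand-in rather than their method. A common deduplication pattern groups records by a cheap blocking key so that only records sharing a key are compared, then merges matching pairs into clusters with a union-find structure. The blocking key (ZIP code plus first name token), the match rule (same non-empty phone digits or same normalized address), the sample records, and all function names are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def norm(text: str) -> str:
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def digits(phone: str) -> str:
    return re.sub(r"\D", "", phone)


def blocking_key(rec: dict) -> str:
    # Hypothetical key: ZIP code plus the first normalized name token.
    return rec["zip"] + "|" + norm(rec["name"]).split()[0]


def is_match(a: dict, b: dict) -> bool:
    # Hypothetical rule: same non-empty phone digits, or same normalized address.
    pa, pb = digits(a["phone"]), digits(b["phone"])
    if pa and pa == pb:
        return True
    return norm(a["address"]) == norm(b["address"])


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def dedupe(records: list[dict]) -> list[list[int]]:
    """Return clusters of record indices believed to be the same entity."""
    blocks: dict[str, list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[blocking_key(rec)].append(i)

    uf = UnionFind(len(records))
    for ids in blocks.values():
        # Only pairs inside a block are compared, never across blocks.
        for i, j in combinations(ids, 2):
            if is_match(records[i], records[j]):
                uf.union(i, j)

    clusters: dict[int, list[int]] = defaultdict(list)
    for i in range(len(records)):
        clusters[uf.find(i)].append(i)
    return sorted(clusters.values())


records = [
    {"name": "Joe's Pizza", "address": "10 Main St", "zip": "90401", "phone": "310-555-0100"},
    {"name": "JOE'S PIZZA", "address": "10 Main Street", "zip": "90401", "phone": "(310) 555-0100"},
    {"name": "Joe's Pizza", "address": "99 Ocean Ave", "zip": "90401", "phone": "310-555-0199"},
    {"name": "Joe's Pizza & Pasta", "address": "10 main st.", "zip": "90401", "phone": ""},
    {"name": "Joe's Pizza", "address": "10 Main St", "zip": "10001", "phone": "212-555-0100"},
]
print(dedupe(records))  # [[0, 1, 3], [2], [4]]
```

Records 1 and 3 never match each other directly, but both match record 0, so union-find places all three in one cluster. Blocking is what keeps this tractable at scale: comparisons grow with the square of each block's size rather than of the whole dataset, at the cost of missing duplicates whose keys differ.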

Page 27

Shared Foundational Data

• Commoditization of data

• Head attributes for people, places, things decreasing in value

• hCard data value driven to zero (visual of local data being identical on thousands of apps)

• Entertainment: IMDB exposed all their data for non-commercial use (link to site map)

• Yet there are still lots of errors in foundational data, hence the need for a “living” model

Page 28

LA Neighborhoods: Another Crowdsourcing Example

• LA Times started with 87 neighborhoods based on census tracts

• Incorporated 650+ user maps

• Ended with 114 neighborhoods for LA City

• Added 158 more neighborhoods for LA County

Page 29

Ownership & Rights: LA Neighborhoods

• Terms of Service: Creative Commons Attribution-NonCommercial-ShareAlike license

• Can share and remix as long as it’s for noncommercial uses, attributed to the LA Times, and shared under the same terms

Page 30

Evolving “Buy” Model

• Data Marketplaces (“iTunes of data”?)

• Data Search Engines

• Microformats / Semantic Web Markups / Other Standards

• Electronic Forms of T&Cs

Page 31

Summary: Road to the Information Singularity

• Rise in community storage and access

• New common schemas and standards

• Definitive, accountable sources of “open” data

• Trend toward sharing foundational data

• “Buy” models based on unique data, novel access methods, SLAs, value-added services

Page 32

Thank you! Questions?

Gil Elbaz @factual @gilelbaz