Factual 2011 Web 2.0 Presentation
DESCRIPTION
Gil Elbaz, CEO and founder of Factual, gave a talk at the 2011 Web 2.0 Conference in San Francisco. His talk was titled "Big Data Challenges: Getting Some."
TRANSCRIPT
Big Data Challenges: Getting Some
March 31, 2011
Gil Elbaz @factual @gilelbaz
Confidential
Road to Information Singularity
Networks Underlying Information Flow
• Density: number of connecting paths
• Plasticity: ease of forming new paths
• Speed & Flow: rate of information transfer
http://www.bloggingforbusinessbook.com/
The Internet
http://www.amazon.com/Digital-Crossroads-American-Telecommunications-Internet/dp/0262140918
Search Engines
Social Networks: Facebook
600 million Facebook users
130 average friends
8 friend requests / month
15 messages / day / user
http://2.bp.blogspot.com/
Trending of Unfriending
Unfriending
Another Network: The Brain
100 billion neurons
1,000 ‘hardwired’ synapses per neuron
!""#$%%&2)4*52"*G57/"'4*5%A@CC%@C
Web 3.0: Data Web
Web Scale Data = More Pain
Findability
Access
Rights
Economics
Standards
Integration & Aggregation
Trust
Web 2.0 Model: Scale-Free Networks
www.futureexploration.net
Book Data: Progress Being Made
Google Book Search API, Open Library Books API, ISBNdb, Amazon API, LibraryThing, GoodReads, WorldCat
Findability | Access | Rights | Economics | Standards | Trust
Google Book Search API, Open Library Books API, ISBNdb, WorldCat
Amazon API, LibraryThing, GoodReads
Another Case Study: Local Data
http://stevecheney.posterous.com/
Another Case Study: Local Data
Twitter
Facebook
Loopt
Foursquare
Yelp
Zillow
Google
HomeJunction
Examine Twitter sentiment (avoid dirty coffee shops)
Identify areas of highest bike thefts
Correlate check-ins with property values
!"##$%$&$'$(#)*+()(,-&(##)%.'/!"#$%"$&"'$()*$*!)$%+*+0
HomeJunction
Factual Is an Example of a New Information Network
"#$#%&'( )'$&*+*#(&(
Aggregate Mash Curate Dedupe Canonicalize
,-."'-%$%+*+
,-./#'&01&-*'&2
!"#$%"&'()"*+$,-.-/(0(1("*+$%231#-&"$4..*
345&*'6&'$
Developers, Search Engines, Publishers
Factual’s Open Data Model
Free, with access via APIs, SDKs, and downloads, BUT… we ask you to contribute back into the ecosystem.
Benefits
• Drive down costs
• Rapid iteration
• Differentiate on user experience
• Only a small % of the world needs to participate (e.g., Wikipedia)
Equivalence Measurements
Subway Sandwiches, 52 E Court St, Cincinnati, OH, (513)-241-6699
Subway, 52 West Court St, 45202, (800)-653-2323
= ?
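Deciding whether the two Subway listings describe the same place is a record-linkage question. A minimal sketch, assuming hypothetical field weights and using Python's standard `difflib` for string similarity (an illustration, not Factual's actual method), scores each field separately and then combines the scores. The city and ZIP fields are left out to keep it short.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Listing:
    name: str
    street: str
    phone: str


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def phone_digits(phone: str) -> str:
    """Keep only the digits so that formatting differences do not matter."""
    return "".join(ch for ch in phone if ch.isdigit())


# Hypothetical weights; a production system would tune or learn these.
WEIGHTS = {"name": 0.4, "street": 0.4, "phone": 0.2}


def equivalence_score(x: Listing, y: Listing) -> dict:
    """Per-field similarities plus a weighted total in [0, 1]."""
    scores = {
        "name": text_similarity(x.name, y.name),
        "street": text_similarity(x.street, y.street),
        "phone": 1.0 if phone_digits(x.phone) == phone_digits(y.phone) else 0.0,
    }
    scores["total"] = sum(w * scores[field] for field, w in WEIGHTS.items())
    return scores


rec1 = Listing("Subway Sandwiches", "52 E Court St", "(513)-241-6699")
rec2 = Listing("Subway", "52 West Court St", "(800)-653-2323")
print(equivalence_score(rec1, rec2))
```

For these two records the name score is 12/23 (about 0.52, since "subway" is a prefix of "subway sandwiches") and the phone score is 0, because one listing carries a toll-free 800 number. Plain string similarity also cannot tell whether "E" versus "West" is a typo or a genuinely different address, so the total is better treated as a signal for review than as a verdict.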
Large-Scale Aggregation Technologies
Large-Scale Aggregation Technologies
Apartments = Apts
Center = Ctr
Corp = Corporation
Service = Svc
Attorney = Atty
Assoc = Associates
Inc = Incorporated
Assn = Association
Co = Company
Mount = Mt
Bros = Brothers
BBQ = Barbeque
.....
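Most pairs on this slide are abbreviations that can be folded into one canonical form with a lookup table before any comparison. A minimal sketch, assuming a hypothetical table built only from the slide's pairs and a simple regex tokenizer:

```python
import re

# Hypothetical lookup table built from the pairs on the slide:
# each abbreviation maps to its expanded, canonical form.
CANONICAL = {
    "apts": "apartments",
    "ctr": "center",
    "corp": "corporation",
    "svc": "service",
    "atty": "attorney",
    "assoc": "associates",
    "inc": "incorporated",
    "assn": "association",
    "co": "company",
    "mt": "mount",
    "bros": "brothers",
    "bbq": "barbeque",
}

# Letters, digits and apostrophes form a token; everything else separates tokens.
TOKEN = re.compile(r"[a-z0-9']+")


def canonicalize(name: str) -> str:
    """Lowercase, tokenize, and expand known abbreviations."""
    tokens = TOKEN.findall(name.lower())
    return " ".join(CANONICAL.get(token, token) for token in tokens)


print(canonicalize("Smith Bros. BBQ Co"))  # smith brothers barbeque company
```

Both "Smith Bros. BBQ Co" and "Smith Brothers Barbeque Company" (made-up names) canonicalize to the same string, so an exact comparison now succeeds. A context-free table is blunt, though: "Co" can also mean County or Colorado and "Mt" can mean Montana, so separate tables per field (name vs. address) are likely safer.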
Large-Scale Aggregation Technologies
Restaurant = Rstrnt
Restaurant = Restuarant
Hosp = Hospital
Billards = Billiards
Salon = Sln
Buffet = Buffett
Center = Ctr
Apartments = Apts
Boutique = Btq
Jewelers = Jewlers
Cleaners = Clnrs
Market = Mktz
.....
Kragen = O'Reilly?
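Unlike pure abbreviations, several variants on this slide are misspellings (a transposed "Restuarant", a dropped letter in "Billards" and "Jewlers"), which a fixed table cannot anticipate. One common approach, sketched here with the standard library's `difflib.get_close_matches` and a hypothetical vocabulary and cutoff, snaps a token to the nearest known word:

```python
from difflib import get_close_matches

# Hypothetical vocabulary of correctly spelled category words.
VOCABULARY = [
    "restaurant", "hospital", "billiards", "salon", "buffet",
    "boutique", "jewelers", "cleaners", "market",
]


def correct_spelling(token: str, cutoff: float = 0.8) -> str:
    """Snap a token to its closest vocabulary word, or return it lowercased."""
    word = token.lower()
    # n=1 keeps only the best match whose similarity ratio is >= cutoff.
    matches = get_close_matches(word, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word


for raw in ["Restuarant", "Billards", "Jewlers", "Rstrnt"]:
    print(raw, "->", correct_spelling(raw))
```

"Restuarant" scores 0.9 against "restaurant" and is corrected, while a heavy abbreviation like "Rstrnt" scores only 0.75 and comes back unchanged. Fuzzy matching therefore complements an abbreviation table rather than replacing it, and the cutoff trades missed corrections against false ones.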
Kragen O'Reilly?
Large-Scale Deduping
• Specialized data compression & folding techniques
• Eliminate redundant entities: endpoints and authority pages
• Improves precision & recall
• Enables real-time dedupe and crosswalks
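Comparing every record with every other record is quadratic, so large-scale deduplication commonly partitions records into blocks first, compares only within each block, and merges matching pairs into clusters. The sketch below illustrates that general blocking-plus-union-find pattern; it is a swapped-in, textbook technique rather than the compression and folding approach named on the slide, and the blocking key, threshold and sample records are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, size: int) -> None:
        self.parent = list(range(size))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        self.parent[self.find(i)] = self.find(j)


def block_key(record: dict) -> str:
    # Hypothetical blocking key: ZIP code plus the first three letters of the name.
    return record["zip"] + ":" + record["name"].lower()[:3]


def dedupe(records: list, threshold: float = 0.8) -> list:
    """Return clusters (sorted lists of record indices) of likely duplicates."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[block_key(record)].append(idx)

    uf = UnionFind(len(records))
    for members in blocks.values():
        # Pairwise comparison happens only inside a block.
        for i, j in combinations(members, 2):
            a = records[i]["name"].lower()
            b = records[j]["name"].lower()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                uf.union(i, j)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[uf.find(idx)].append(idx)
    return sorted(clusters.values())


# Made-up sample records for illustration.
records = [
    {"name": "Joe's Restaurant", "zip": "45202"},
    {"name": "Joe's Restuarant", "zip": "45202"},
    {"name": "Joe's Hospital", "zip": "45202"},
    {"name": "Joe's Restaurant", "zip": "90210"},
]
print(dedupe(records))  # [[0, 1], [2], [3]]
```

Records 0 and 1 (a misspelling) merge into one cluster. Record 3 has the same name but a different ZIP, lands in another block, and is never compared: correct if it is a second location, but the same rule would also miss a true duplicate with a mistyped ZIP. That is the usual blocking trade-off of far fewer comparisons against some lost recall.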
Shared Foundational Data
• Commoditization of data
• Head attributes for people, places, things decreasing in value
• hCard data value driven to zero (local data is identical across thousands of apps)
• Entertainment: IMDB exposed all their data for non-commercial use (link to site map)
• Yet there are still many errors in foundational data, hence the need for a “living” model
LA Neighborhoods: Another Crowdsourcing Example
• LA Times started with 87 neighborhoods based on census tracts
• Incorporated 650+ user maps
• Ended with 114 neighborhoods for LA City
• Added 158 more neighborhoods for LA County
Ownership & Rights: LA Neighborhoods
• Terms of Service: Creative Commons Attribution, Noncommercial, Share-Alike license
• Can share and remix as long as it’s for noncommercial uses, attributed to the LA Times, and shared under the same terms
Evolving “Buy” Model
• Data Marketplaces (“iTunes of data?”)
• Data Search Engines
• Microformats / Semantic Web Markups / Other Standards
• Electronic Forms of T&Cs
Summary: Road to the Information Singularity
• Rise in community storage and access
• New common schemas and standards
• Definitive, accountable sources of “open” data
• Trends towards sharing of foundational data
• 'Buy' models based on unique data, novel access methods, SLAs, value-added services
Thank you! Questions?
Gil Elbaz @factual @gilelbaz