Building a super database from linked data
![Page 2: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/2.jpg)
Who is this NOT for?
Building a large database from a tiny team
Organizing the world's information
Information innovation
Who IS this for?
![Page 3: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/3.jpg)
About
Co-founder, CTO
Popular movie reviews web site
Aggregated reviews, comprehensive film database
![Page 4: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/4.jpg)
The Stone Age
Static HTML templates
Editors read articles and pull quotations
Only cover the newest movies
~1000 films
![Page 5: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/5.jpg)
Modern Times
Shift to LAMP
License long-tail database
Automated spiders, early UGC via critics
Use homegrown CMS for additional content
(How I felt maintaining Rotten Tomatoes' overloaded database servers)
![Page 6: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/6.jpg)
The Result
8 million unique visitors / month
Lean startup: 25x traffic with 7 staff
Great site for film lovers (including Steve Jobs)
![Page 7: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/7.jpg)
About
Co-founder, CTO
SNS for artists started with Daniel Wu 吴彦祖
Started with six artists, now 1,600 artists, 600K registered users
Also powers official web sites:
Jet Li (李连杰): JetLi.com
Jackie Chan (成龙): JackieChan.com
Karen Mok (莫文蔚): KarenMok.com
![Page 8: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/8.jpg)
Our LAMP stack: Not the best setup for...
Newsfeeds...
Viral loop analysis...
Multivariate testing...
The Problem?!?
Scalability issues with real-time data, but without traffic from public, long-tail content
![Page 9: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/9.jpg)
About
A better entertainment database
Providing the long-tail content
Still a part of alivenotdead.com
Still in alpha
![Page 10: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/10.jpg)
Features
Comprehensive info for celebrities, films, music, and TV
Searchable, structured data
Multilingual: English, Chinese, Japanese
Aggregated social media from inside/outside China
![Page 11: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/11.jpg)
Why use MongoDB?
Flexible schema for different data sources
Dozens of other sources...
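The flexible-schema point can be sketched without a database: each source keeps its own fields inside one topic document, and no shared column layout is required. The snippet below is a minimal illustration in plain Python. The function `merge_source`, the source names, and the sample records are all hypothetical and are not taken from the talk; a dict shaped like this could be stored as-is as a MongoDB document.

```python
from copy import deepcopy


def merge_source(topic, source_name, record):
    """Return a copy of `topic` with `record` stored under its source name.

    Each source keeps whatever fields it has; no common schema is enforced.
    """
    merged = deepcopy(topic)
    merged.setdefault("sources", {})[source_name] = dict(record)
    # Promote a display name the first time any source provides one.
    if "name" not in merged and "name" in record:
        merged["name"] = record["name"]
    return merged


# Hypothetical records with different shapes, as two sources might supply them.
film_db_record = {"name": "Example Film", "year": 2010, "runtime_min": 95}
wiki_record = {"name": "Example Film", "labels": {"en": "Example Film", "zh": "示例电影"}}

topic = {"_id": "film/example-film", "type": "film"}
topic = merge_source(topic, "film_db", film_db_record)
topic = merge_source(topic, "wiki", wiki_record)

print(sorted(topic["sources"]))
print(topic["name"])
```

Adding a new source later only means adding another key under `sources`; existing documents do not need to be migrated.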
![Page 12: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/12.jpg)
Why use MongoDB?
Scalable big data
2 million+ topics covered
500,000 translations
Next challenge: Aggregating and storing the social media firehose
![Page 13: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/13.jpg)
Why use MongoDB?
Crossing the border...
alive.tom.com in Tianjin
Alivenotdead.com in Hong Kong
Use replica sets/eventual consistency to overcome frequent cross-border network issues
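One way to picture the replica-set approach: an application lists members in both locations and allows reads from a secondary, accepting slightly stale data when the link to the primary is unreliable. The sketch below only builds a connection string. The hostnames, database name, and replica set name are placeholders, and the talk does not say how the real deployment was configured. `replicaSet` and `readPreference` are standard MongoDB connection string options in current drivers.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, database, replica_set,
                    read_preference="secondaryPreferred"):
    """Build a MongoDB connection string that names every replica set member.

    With `secondaryPreferred`, reads go to a secondary when one is available,
    so results may lag behind the primary (eventual consistency).
    """
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
    })
    return "mongodb://{}/{}?{}".format(",".join(hosts), database, options)


# Placeholder hosts standing in for one member on each side of the border.
uri = replica_set_uri(
    ["db-hk.example.com:27017", "db-tj.example.com:27017"],
    "entertainment",
    "rs0",
)
print(uri)
```

The resulting string could be passed to a driver such as `pymongo.MongoClient`; writes would still go to the primary.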
![Page 14: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/14.jpg)
Using Linked Open Data
Freebase
Wikipedia as structured data
Creative Commons license
Multiple CC sources
Organized taxonomy
Acquired by Google
No Chinese/Japanese yet!
![Page 15: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/15.jpg)
Using Linked Open Data
DBpedia
Wikipedia as structured data
Creative Commons license
Only Wikipedia
Messy taxonomy
Chinese/Japanese topic translations, but requires English topic link
![Page 16: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/16.jpg)
Using Linked Open Data
Use Freebase organized taxonomy, broad data
Expand DBpedia to Chinese-only topics
Same methodology across Chinese wiki sources
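The combination described above might look roughly like the following: topic types come from the organized taxonomy, translated labels are attached through the English topic link, and Chinese-only topics are added under their own keys. This is a simplified sketch and not the pipeline from the talk. The function `link_topics`, the input shapes, and the sample data are hypothetical, and real Freebase or DBpedia dumps are structured differently.

```python
def link_topics(taxonomy_topics, translated_labels, chinese_only_topics):
    """Merge three hypothetical inputs into one topic table.

    taxonomy_topics:     English title -> {"types": [...]}
    translated_labels:   English title -> {language code: label}
    chinese_only_topics: Chinese title -> {"types": [...]} (no English link)
    """
    merged = {}
    for en_title, topic in taxonomy_topics.items():
        entry = {"types": list(topic["types"]), "labels": {"en": en_title}}
        # Translations can only be attached via the English topic link.
        entry["labels"].update(translated_labels.get(en_title, {}))
        merged["en:" + en_title] = entry
    for zh_title, topic in chinese_only_topics.items():
        # Topics with no English page are keyed by their Chinese title.
        merged["zh:" + zh_title] = {
            "types": list(topic["types"]),
            "labels": {"zh": zh_title},
        }
    return merged


# Illustrative sample data only.
taxonomy_topics = {"Jet Li": {"types": ["person", "actor"]}}
translated_labels = {"Jet Li": {"zh": "李连杰"}}
chinese_only_topics = {"示例艺人": {"types": ["person"]}}

topics = link_topics(taxonomy_topics, translated_labels, chinese_only_topics)
print(sorted(topics))
print(topics["en:Jet Li"]["labels"])
```

Prefixing keys with a language code keeps Chinese-only topics from colliding with English titles.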
![Page 17: Building a super database from linked data](https://reader033.vdocument.in/reader033/viewer/2022052621/558b4349d8b42a821f8b457b/html5/thumbnails/17.jpg)
The Future
Developer API
Topic extraction
Real-time trends across languages
Other verticals
Already 10x more data than Rotten Tomatoes...
The complete sum of information from across the web...
Information not constrained by language...