The Technology Perspective - Staffan Truvé, Recorded Future
DESCRIPTION
Presentation by Staffan Truvé (Recorded Future) from the 'Perfect Swell' workshop on text and data mining on the 27th of September 2013.
TRANSCRIPT
Robots’ Rights and the Future of Web Intelligence
Staffan Truvé, Ph.D.
CTO, Recorded Future
[email protected]
Who are we?
• Founded in 2009
• US-Swedish startup
• 40 persons, 50/50 US/SE
• Boston (HQ), DC, Göteborg, and a few more places around the globe
• Focus on web intelligence, for governments and industry
• Backed by Google Ventures, Atlas Venture, Balderton, IA Ventures, and I-Q-T
[Figure: cost to acquire and analyze data versus amount of data]
Today’s Discussion, Tomorrow’s News
Silicon Valley executives head to Vail, Colo. next week for the annual Pacific Crest Technology Leadership Forum
The carrier may select partners to set up a new carrier as early as next month
“2010 is the year when Iran will kick out Islam. Ya Ahura we will.”
“... Dr Sarkar says the new facility will be operational by March 2014...”
Unilever will hold their UK launch event early next week in Manchester
“...opposition organizers plan to meet on Thursday to protest...”
“Excited to see Morsi speak this weekend...”
“According to TechCrunch China’s new 4G network will be deployed by mid-2010”
“Strange new Russian worm set to unleash botnet on 4/1/2013...”
Estimated study completion date: November 2014
…new facility will be operational by March 2013…
...the transaction is expected close in late 2013…
Organizing the Web for Analysis
250,000 Real-time Sources
10 Billion Time-tagged Facts
“Kuo expects that Apple will introduce an iPhone 5S around June or July of this year”
“...opposition organizers plan to meet on Thursday to protest...”
Drought and malnutrition hinder next spring’s expansion plans in Kabul...
A few minutes from publishing to analysis
Inside the Web Intelligence Machine
Web Intelligence – at Web Scale
• Processing 100s of millions of documents
• Sources from all over the world, in 8 languages – English, Arabic, Chinese x 2, Russian, Spanish, French, Farsi
• From government sites and big media to blogs and social media
• 10B “facts” - growing fast
• 25+ entity types
• 100+ event types
• Metrics, signals, alerts
• In real time!
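As a rough illustration of what a "time-tagged fact" could look like, here is a minimal Python sketch. It is not Recorded Future's pipeline: the `TimeTaggedFact` class, the `tag_fact` function, and the single month-plus-year pattern are all hypothetical, chosen only to show the idea of attaching an event time window to a sentence alongside its publication date.

```python
import calendar
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical lookup: month name -> month number.
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}

# Assumption: only expressions like "March 2014" are recognized.
# Relative expressions ("next week", "Thursday") are out of scope here.
MONTH_YEAR = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE
)


@dataclass(frozen=True)
class TimeTaggedFact:
    """A sentence plus the time window it refers to (hypothetical shape)."""
    text: str
    published: date
    event_start: date
    event_end: date

    @property
    def refers_to_future(self) -> bool:
        # True when the event window starts after the publication date.
        return self.event_start > self.published


def tag_fact(text: str, published: date) -> Optional[TimeTaggedFact]:
    """Return a time-tagged fact, or None if no month-year expression is found."""
    match = MONTH_YEAR.search(text)
    if match is None:
        return None
    month = MONTHS[match.group(1).lower()]
    year = int(match.group(2))
    last_day = calendar.monthrange(year, month)[1]
    return TimeTaggedFact(
        text=text,
        published=published,
        event_start=date(year, month, 1),
        event_end=date(year, month, last_day),
    )


fact = tag_fact(
    "... Dr Sarkar says the new facility will be operational by March 2014...",
    published=date(2013, 9, 27),
)
```

Here `fact` covers the window 1 to 31 March 2014 and `fact.refers_to_future` is true, while a sentence such as "Excited to see Morsi speak this weekend..." returns `None` because this sketch does not resolve relative dates.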
Evolution, Opportunities, and Threats
Without text mining, the web is useless!
• No way to find stuff without search engines – which all rely on text mining
• And all publishers realize this
• Search → Analysis
• A necessary evolution as the web grows
• Creating new value
• Aggregation, analysis
• Enabling media criticism
Drivers / Opportunities
• Moore’s Law!
• Advances in linguistics, algorithms, math
• Exponential growth of content
• The volume of information on the web is making traditional search worthless
Threats
• Appification / closed vertical silos
• Deep Web (& darkweb)
• IP protectionism / legislation
Restricting the right to analyze is absurd
• What is the borderline between reading and analyzing?
• Impossible to differentiate humans from ‘robots’
• Robots must have the same rights as human readers
Turing would have laughed!
Staffan Truvé [email protected]
The best way to predict the future is to invent it! (Alan Kay)
Dynabook, 1968