data mining and the innovation of the crowds jeff lynn 21 october 2011
TRANSCRIPT
Data Mining and the Innovation of the Crowds
Jeff Lynn
21 October 2011
Data Mining: The Problems We Face Today
Text/data mining already flourishes, but only the owners of the material and their licensees can participate
That creates two problems
o It means less material can be mined
• If relevant material is owned by a multitude of rights-holders, each may only be able to mine a portion
• This is bad, for obvious reasons
o It means fewer people can do the mining
• Only those with a direct connection to the rights-holder can get involved
• I see this as even worse, but the reasons may not be obvious
The Power of Crowdsourcing: A Case Study from P&G
Procter & Gamble had long relied on their internal product development staff of over 7,000 people.
◦ In 2000, they realised that 7,000 would not be nearly enough to innovate fast enough to meet customer demand
◦ The traditional approach would have been to hire more internal staff
Instead, they turned to crowdsourcing and developed a programme called Connect & Develop.
Connect & Develop
P&G posts product development tasks to a public website
◦ Includes the price they will pay for the project to be completed
◦ Members of P&G’s extended dev team respond with proposals
◦ The work is awarded to the best solution
As a result:
◦ There are now 1.5 million people in P&G’s extended network
◦ Over 50% of P&G’s product initiatives involve significant collaboration with outside innovators
◦ P&G remains one of the most successful consumer goods companies in the world, with its share price increasing ~150% since the programme started
Imagine This in Data Mining
P&G’s a great story, but at the end of the day it is just about making tastier Pringles
Think about what would happen if you had a pool of 1.5 million people using different techniques to mine data from thousands of:
o Biomedical research papers
o Historical newspaper articles
o Analyses of public sentiments
o Endless other data sources that are already in the public domain
Conclusion
As much as anything, crowdsourced data mining is what the digital economy is supposed to be about
◦ Utilising the low costs of communication to tap the talents of lots of people
◦ Improving collective human knowledge by taking advantage of the individual knowledge of people spread around the world
And it’s also what copyright is supposed to foster
o We have IP laws solely to promote innovation
o If the IP laws don’t allow crowdsourced data mining, then what are they for?