data mining and the innovation of the crowds jeff lynn 21 october 2011

Data Mining and the Innovation of the Crowds

Jeff Lynn

21 October 2011

http://www.coadec.com/



Data Mining: The Problems We Face Today

Text/data mining already flourishes, but only the owners of the material and their licensees can participate

That creates two problems

o It means less material can be mined

• If relevant material is owned by a multitude of rights-holders, each may only be able to mine a portion

• This is bad, for obvious reasons

o It means fewer people can do the mining

• Only those with a direct connection to the rights-holder can get involved

• I see this as even worse, but the reasons may not be obvious



The Power of Crowdsourcing: A Case Study from P&G

Procter & Gamble had long relied on their internal product development staff of over 7,000 people.

◦ In 2000, they realised that 7,000 would not be nearly enough to innovate fast enough to meet customer demand

◦ The traditional approach would have been to hire more internal staff

Instead, they turned to crowdsourcing and developed a programme called Connect & Develop.


Connect & Develop

P&G posts product development tasks to a public website

◦ Includes the price they will pay for the project to be completed

◦ Members of P&G’s extended dev team respond with proposals

◦ The work is awarded to the best solution


As a result:

◦ There are now 1.5 million people in P&G’s extended network

◦ Over 50% of P&G’s product initiatives involve significant collaboration with outside innovators

◦ P&G remains one of the most successful consumer goods companies in the world, with its share price increasing ~150% since the programme started



Imagine This in Data Mining

P&G’s is a great story, but at the end of the day it is just about making tastier Pringles

Think about what would happen if you had a pool of 1.5 million people using different techniques to mine data from thousands of:

o Biomedical research papers

o Historical newspaper articles

o Analyses of public sentiment

o Endless other data sources that are already in the public domain



Conclusion

As much as anything, crowdsourced data mining is what the digital economy is supposed to be about

◦ Utilising the low costs of communication to tap the talents of lots of people

◦ Improving collective human knowledge by taking advantage of the individual knowledge of people spread around the world

And it’s also what copyright is supposed to foster

o We have IP laws solely to promote innovation

o If the IP laws don’t allow crowdsourced data mining, then what are they for?

