Chris Boos, arago AG: Big Data Means New Programs
DESCRIPTION
Lightning talk given at the second CloudCamp Frankfurt on 24 May 2012 at the Brotfabrik in Hausen.
TRANSCRIPT
ANALYZING BIG DATA IS PROGRAMMING FOR THE CLOUD
Chris Boos (@boosc) [email protected]
CloudCamp Frankfurt 24.5.2012
Thursday, 24 May 2012
Data, lots of it
Even in simple datasets, common statistics (avg, min, max, distribution) fail
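One way to see how summary statistics can mislead is the small sketch below. The two series are made up for illustration: they share the same average, minimum, maximum and value distribution, yet one is a steady ramp and the other is noise. The helper names (`summary`, `mean_step`) are assumptions, not part of the talk.

```python
from collections import Counter
from statistics import mean

def summary(values):
    """Return avg, min, max and the value distribution (order is ignored)."""
    return mean(values), min(values), max(values), Counter(values)

def mean_step(values):
    """Average absolute change between neighbouring readings."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))

# Made-up data: a steady ramp and the same readings in scrambled order.
ramp = [1, 2, 3, 4, 5, 6, 7, 8]
scrambled = [5, 1, 8, 3, 6, 2, 7, 4]

# All four common statistics agree, although the series behave differently.
same_summary = summary(ramp) == summary(scrambled)

# Only a measure that looks at the ordering tells them apart.
steps = (mean_step(ramp), mean_step(scrambled))
```

Here `same_summary` is true, while the two entries of `steps` differ clearly, which is the kind of structure the common statistics cannot show.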
One iPhone has 79 times more CPU power than was used in the Apollo missions
Why you need big data
Stages of data use by decade (chart, plotted against yield):
• 1950s to 1960s: Data Processing (Data)
• 1970s to 1980s: Information Management (Information)
• 1990s: Knowledge Management (Knowledge)
• 2000s: Knowledge Ecology (Intelligence)
• 2010s: Systems Thinking (Wisdom) - You Are Here!
Finding clusters, evaluating outliers and interpreting white noise
You are not looking for patterns, you are looking for anomalies
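Looking for anomalies rather than patterns could be sketched as follows. This is a minimal illustration, assuming a robust outlier test based on the median absolute deviation; the talk does not name a method, and the readings, the function name `find_anomalies`, and the commonly used constants 0.6745 and 3.5 are assumptions.

```python
from statistics import median

def find_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Made-up sensor readings with one value that does not fit.
readings = [10, 11, 9, 10, 12, 10, 11, 58, 9, 10]
anomalies = find_anomalies(readings)
```

With these readings only the value 58 is flagged; the ordinary fluctuation between 9 and 12 is left alone.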
Cloud Computing 1.0 Is
When the IT guys are finally able to explain to business people what they were talking about 20 years ago!
=
Computation on demand
+ Pay as you go
Cloud Computing 2.0 Is
When the IT guys realize that using this scalable resource also calls for new ways of programming
=
go beyond IaaS and start thinking parallel
and
BASE (Basically Available, Soft State, Eventual Consistency)
not
ACID (Atomicity, Consistency, Isolation, Durability)
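What BASE means in practice can be illustrated with a toy key-value store. This is only a sketch under stated assumptions: the `Replica` class, the last-write-wins rule and the explicit timestamps are chosen for illustration and are not taken from the talk or from any particular database.

```python
class Replica:
    """A key-value replica that always accepts local writes."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last write wins: keep the entry with the newest timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Pull the other replica's entries; newer timestamps win."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)

a, b = Replica(), Replica()
a.write("likes", 1, timestamp=1)   # both replicas stay available for writes
b.write("likes", 2, timestamp=2)
stale = a.read("likes")            # soft state: replicas may disagree for a while
a.sync_from(b)
b.sync_from(a)                     # eventual consistency: both now agree
```

Before the sync, replica `a` still returns the older value; after it, both replicas return the same one. No transaction ever blocked a write, which is the trade BASE makes against ACID.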
How to scale (AWS Example)
• Do not allocate instances manually
• Each component needs to be independent
• Plan for failure
• Actively provoke failure
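The points above can be sketched without any cloud API. The example below is a minimal illustration and makes no AWS calls: the workers are plain functions standing in for independent instances, and the names `call_with_failover`, `healthy` and `broken` are assumptions.

```python
import random

def call_with_failover(task, workers, rng):
    """Try independent workers in random order until one succeeds."""
    errors = []
    for worker in rng.sample(workers, len(workers)):
        try:
            return worker(task)
        except RuntimeError as exc:
            errors.append(exc)  # planned for: note the failure and move on
    raise RuntimeError(f"all {len(workers)} workers failed: {errors}")

def healthy(task):
    return task.upper()

def broken(task):
    # Deliberately provoked failure, used to prove the caller survives it.
    raise RuntimeError("instance terminated")

# Two of the three stand-in workers are broken on purpose.
result = call_with_failover("job", [broken, healthy, broken], random.Random(0))
```

Because no worker depends on another and the caller never picks a fixed instance, the job completes as long as one worker is alive. Injecting the broken workers during testing is a small-scale version of actively provoking failure.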
Human Software
• Click Workers and Mechanical Turks are not just cheap labour
• They allow programmers to hand tasks they cannot handle algorithmically to humans
• Make use of it to
  • Do things too complicated for machine learning
  • Pre-populate machine learning spaces
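Handing tasks to humans can be treated like any other function call. The sketch below is an assumption-laden illustration: `toy_classifier` and `toy_human` are stand-ins (a real setup would call a trained model and a crowdsourcing service), and the confidence threshold is arbitrary.

```python
def route(items, classify, ask_human, min_confidence=0.8):
    """Label items by machine where possible, by a human otherwise."""
    labels, training_examples = {}, []
    for item in items:
        label, confidence = classify(item)
        if confidence < min_confidence:
            # Too hard for the algorithm: hand the task to a person.
            label = ask_human(item)
            # The human answer also pre-populates future training data.
            training_examples.append((item, label))
        labels[item] = label
    return labels, training_examples

def toy_classifier(text):
    # Stand-in model: confident only about one obvious keyword.
    return ("positive", 0.95) if "great" in text else ("unknown", 0.30)

def toy_human(text):
    # Stand-in for a click worker's answer.
    return "negative"

labels, examples = route(["great song", "meh, whatever"], toy_classifier, toy_human)
```

The second item falls below the threshold, so it is answered by the human stand-in and ends up in `examples`, ready to be used as training data.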
Old Style (Imperative) Programming
• Step-by-step explanation of what to do
• Explaining WHAT to do rather than the RESULTS you want
• Always necessary for basic algorithms
One New Style (Functional) Programming I
• Combine results to become a program
• Allows dynamic distribution
• Map-Reduce is only one way of doing it!
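As one possible illustration of combining results, here is a small map-and-reduce word count. It is a sketch only: the text chunks are made up, and a thread pool on one machine stands in for dynamic distribution across many.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def count_words(chunk):
    """Map step: independent of every other chunk, so it can run anywhere."""
    return Counter(chunk.split())

def merge(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right

# Made-up input, already split into chunks.
chunks = ["big data big", "data means new", "new programs"]

# The pool decides which worker handles which chunk.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(count_words, chunks))

totals = reduce(merge, partials, Counter())
```

Neither function says where or in which order the chunks are processed, which is what leaves the runtime free to distribute the work.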
Functional Programming II
F(G(H(A, B), C), D)
getMusicLikes(getFriends(facebookID))
instead of
for i in getFriends(facebookID): getMusicLikes(i)
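The contrast can be made runnable with stand-in data. In the sketch below, `get_friends` and `get_music_likes` mirror the names on the slide but are hypothetical local functions over made-up dictionaries, not calls to any real Facebook API.

```python
# Made-up data standing in for a social graph and its music likes.
FRIENDS = {"alice": ["bob", "carol"]}
LIKES = {"bob": {"jazz", "techno"}, "carol": {"jazz"}}

def get_friends(user_id):
    return FRIENDS.get(user_id, [])

def get_music_likes(user_ids):
    """Takes a whole collection, so lookups could be fanned out in parallel."""
    return set().union(*(LIKES.get(u, set()) for u in user_ids))

# Functional: one expression describing the result you want.
likes = get_music_likes(get_friends("alice"))

# Imperative: an explicit loop that fixes the order of the lookups.
likes_loop = set()
for friend in get_friends("alice"):
    likes_loop |= get_music_likes([friend])
```

Both variants produce the same set, but only the first leaves it to the implementation of `get_music_likes` how the individual lookups are scheduled.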
Check out my tool list: http://www.hcboos.net/100-links/
2 Examples
The AMP3 Platform at Senzari.com
Adaptable Music Parallel Processing Platform
MARS-o-Matic at arago.de
Big data based IT modelling and pricing app
Thank You for Your Time
Credits
• "Big Data Just Beginning to Explode" by CSC http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode
• "Social media network connections among Twitter users" by Marc Smith http://www.flickr.com/photos/marc_smith/
• Asteroid Datasets by Bruce Gary http://brucegary.net/POVENMIRE/x.htm