Big Data: The Second Year. Joerg Blumtritt

Post on 26-Jan-2015

Page 1: Zwei jahrebigdata

Big Data: The Second Year.

Joerg Blumtritt

Page 2: Zwei jahrebigdata

Page 3: Zwei jahrebigdata
Page 4: Zwei jahrebigdata
Page 5: Zwei jahrebigdata

Page 6: Zwei jahrebigdata

The Future of Market Research

Page 7: Zwei jahrebigdata
Page 8: Zwei jahrebigdata

Hardware

Traditional
• exotic hardware
• big central servers
• SAN
• RAID
• hardware reliability
• expensive
• limited scalability

Big Data
• commodity HW
• racks of pizza boxes
• Ethernet
• JBOD
• unreliable HW
• cost effective
• scales further

Page 9: Zwei jahrebigdata

Software

Traditional
• monolithic
• centralized storage
• RDBMS
• schema first
• proprietary

Big Data
• distributed
• storage & compute
• nodes
• raw data
• open source

Page 10: Zwei jahrebigdata

Quantification

Volume, Velocity, Variety

Data Science

Page 11: Zwei jahrebigdata

1. Volume
– Very large data sets
– Data Center → Data Warehouse → Internet Scale
– Typical dimensions: billions or trillions of records, millions or billions of variables
– e.g. Twitter: > 400 M Tweets per day
– Technologies: MapReduce, HDFS, Project Voldemort

... the first V

Page 12: Zwei jahrebigdata

Map-Reduce

http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html#Example%3A+WordCount+v2.0
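The word count example linked above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle, and reduce steps, not Hadoop code; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data science"]))
```

In a real cluster the map and reduce calls would run on different nodes and the shuffle would move data over the network; here everything runs in one process to keep the structure visible.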

Page 13: Zwei jahrebigdata

1. Volume
2. Velocity
– Very fast data streams
– Sensor data, smartphones, social media
– Typical dimensions: 15k-300k/s
– Real time inputs / real time outputs
– Stream/event processing
– Technologies: Storm, S4, Esper, HBase, Kafka

the second V
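Stream/event processing can be illustrated with a tumbling-window counter. The sketch below is plain Python and does not use any of the listed technologies; the function name `tumbling_window_counts` and the `(timestamp, key)` event shape are assumptions made for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, Counter) for each non-empty window.

    `events` is an iterable of (timestamp_seconds, key) pairs that arrive
    in non-decreasing timestamp order (an assumption of this sketch).
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is closed: emit it and start a new one.
            yield current_start, counts
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts


if __name__ == "__main__":
    stream = [(0, "a"), (1, "b"), (1, "a"), (5, "a"), (11, "b")]
    for start, window in tumbling_window_counts(stream, 5):
        print(start, dict(window))
```

Because the function is a generator, results are emitted as soon as a window closes, without holding the whole stream in memory.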

Page 14: Zwei jahrebigdata

Storm

http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html

Page 15: Zwei jahrebigdata

1. Volume
2. Velocity
3. Variety / Variability
– Manifold and highly variable data structures
– Data marketplaces, e.g. Datasift, GNIP, Enigma.io
– No schema / NoSQL
– Distributed storage
– Immutability

... and the last V

Page 16: Zwei jahrebigdata

{"created_at":"Sat Apr 13 08:07:34 +0000 2013", "id":322984390491774976, "id_str":"322984390491774976", "text":"getr\u00e4umt, ich h\u00e4tte \u00fcber den Skandal geblogt, dass wir immernoch geschirrsp\u00fchlen, genau wie zu Car\u00eames Zeiten.", "source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e", "truncated":false, "in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":10177792,"id_str":"10177792", "name":"Joerg Blumtritt", "screen_name":"jbenno", "location":"Stockdorf", "url":"http:\/\/slow-media.net", "description":"I just coined the word panfuturistic because it sounds cool. http:\/\/memeticturn.com\/declaration-of-liquid-culture", "protected":false,"followers_count":2671,"friends_count":1599,"listed_count":141,"created_at":"Mon Nov 12 11:16:15 +0000 2007", "favourites_count":3582,"utc_offset":3600, "time_zone":"Berlin", "geo_enabled":true,"verified":false,"statuses_count":30140,"lang":"en", "contributors_enabled":false,"is_translator":false,"profile_background_color":"FFFFFF", "profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/816896285\/688fcbc8df9391dfd71012d06ca34002.jpeg", "profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/816896285\/688fcbc8df9391dfd71012d06ca34002.jpeg", "profile_background_tile":false,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3315156408\/db719e7db02772e468179545fb06e7f9_normal.jpeg", "profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3315156408\/db719e7db02772e468179545fb06e7f9_normal.jpeg", "profile_banner_url":"https:\/\/si0.twimg.com\/profile_banners\/10177792\/1365261531", "profile_link_color":"0000FF", "profile_sidebar_border_color":"FFFFFF", "profile_sidebar_fill_color":"E0FF92", "profile_text_color":"000000", 
"profile_use_background_image":true,"default_profile":false,"default_profile_image":false, "following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"favorited":false,"retweeted":false,"lang":"de"}

Page 17: Zwei jahrebigdata

Instead of fixing the consistency of the data up front in the structure, a function is defined that checks every record against the given criteria:

function IsConsistent(Record, Schema) as Boolean
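A minimal Python sketch of such a check follows. The schema format (field name mapped to an expected type and a required flag) and the name `TWEET_SCHEMA` are assumptions chosen for illustration; the slide only gives the function signature.

```python
def is_consistent(record, schema):
    """Return True if `record` satisfies every rule in `schema`.

    `schema` maps a field name to a pair (expected_type, required).
    This schema format is hypothetical, not taken from the slides.
    """
    for field, (expected_type, required) in schema.items():
        if field not in record or record[field] is None:
            if required:
                return False
            continue  # optional field is absent or null: acceptable
        if not isinstance(record[field], expected_type):
            return False
    return True


# Hypothetical rules for a tweet-like record.
TWEET_SCHEMA = {
    "id": (int, True),
    "text": (str, True),
    "lang": (str, True),
    "geo": (dict, False),
}

if __name__ == "__main__":
    tweet = {"id": 322984390491774976, "text": "...", "lang": "de", "geo": None}
    print(is_consistent(tweet, TWEET_SCHEMA))
```

The raw record is stored unchanged; the check runs when the data is read, so the criteria can change later without migrating stored data.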

Page 18: Zwei jahrebigdata

Operation          SQL
Create             INSERT
Read (Retrieve)    SELECT
Update (Modify)    UPDATE
Delete (Destroy)   DELETE

"mutable"

"Each event happens at a particular time and is always true"

• Just Create and Read (C+R); nothing ever gets "updated"

• Records are stored as files. Each record is a new file.
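The two points above can be sketched as an append-only store in Python. The class name `EventStore`, the file naming scheme, and the follower-count events are hypothetical and serve only to illustrate the idea.

```python
import json
import tempfile
from pathlib import Path


class EventStore:
    """Append-only store: one JSON file per record, never rewritten."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self._next_id = len(list(self.directory.glob("*.json")))

    def create(self, event):
        """Write a new record as a new file."""
        path = self.directory / f"{self._next_id:012d}.json"
        # Mode "x" fails if the file exists, so nothing is ever overwritten.
        with open(path, "x", encoding="utf-8") as handle:
            json.dump(event, handle)
        self._next_id += 1
        return path

    def read_all(self):
        """Read every record in the order it was written."""
        for path in sorted(self.directory.glob("*.json")):
            with open(path, encoding="utf-8") as handle:
                yield json.load(handle)


def current_followers(store, user):
    """Derive the latest value from the events instead of updating in place."""
    latest = None
    for event in store.read_all():
        if event["user"] == user:
            latest = event["followers"]
    return latest


if __name__ == "__main__":
    store = EventStore(tempfile.mkdtemp())
    store.create({"user": "jbenno", "followers": 2670})  # illustrative value
    store.create({"user": "jbenno", "followers": 2671})
    print(current_followers(store, "jbenno"))
```

A change in the follower count is recorded as a second event rather than an UPDATE, so the full history stays available and the current state is computed on read.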

"immutable"

Page 19: Zwei jahrebigdata

Query

Precomputed View (Batch Mode)

Data Stream

All Data

Precomputed realtime view
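The labels on this slide suggest a design in which a query combines a view precomputed in batch mode over all data with a view kept up to date from the data stream. The Python sketch below assumes that reading; the names `build_batch_view`, `RealtimeView`, and `query`, and the event shape, are hypothetical.

```python
from collections import Counter


def build_batch_view(all_data):
    """Batch mode: recompute counts per key from the complete data set."""
    return Counter(event["key"] for event in all_data)


class RealtimeView:
    """Incrementally updated counts for events newer than the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["key"]] += 1


def query(key, batch_view, realtime_view):
    """Answer a query by merging the batch view with the realtime view."""
    return batch_view[key] + realtime_view.counts[key]


if __name__ == "__main__":
    all_data = [{"key": "tatort"}, {"key": "dsds"}, {"key": "tatort"}]
    batch_view = build_batch_view(all_data)

    realtime_view = RealtimeView()
    realtime_view.update({"key": "tatort"})  # arrived after the batch run

    print(query("tatort", batch_view, realtime_view))
```

In this sketch the realtime view would be reset whenever a new batch run has absorbed its events; that bookkeeping is left out for brevity.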

Page 20: Zwei jahrebigdata

Quantification

Volume, Velocity, Variety

Data Science

Page 21: Zwei jahrebigdata

known knowns, known unknowns, unknown unknowns

"data puking" (Dashboards)

"analysis throwing" (Modelling)

"data democracy" (Big Data)

Avinash Kaushik

As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say: we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.

Donald Rumsfeld

Page 22: Zwei jahrebigdata

Data Science

Page 23: Zwei jahrebigdata
Page 24: Zwei jahrebigdata
Page 25: Zwei jahrebigdata

• Text comparison of party programmes

• Cosine distance between term vectors
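A minimal Python sketch of cosine distance on word-count vectors is given below. Splitting on whitespace is a simplifying assumption; the slides do not say how the party programmes were tokenized or weighted.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lower-cased words with their counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cos(angle) between two term vectors (0 = same direction)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention for empty texts: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    programme_a = term_vector("more data more privacy")
    programme_b = term_vector("more growth less privacy")
    print(round(cosine_distance(programme_a, programme_b), 3))
```

Because the measure depends only on the angle between the vectors, a long and a short programme with the same word proportions have distance 0.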

Page 26: Zwei jahrebigdata

[Figure: time series by hour of day for Fri 8 Mar, Sat 9 Mar, and Sun 10 Mar; y-axis from 0 to 1500; series: DSDS, Tatort]

Page 27: Zwei jahrebigdata

Persona
http://twitter.com/FlaviaReil/statuses/308321057499144193
http://twitter.com/froschmann1968/statuses/308321920200364034
http://twitter.com/VeronikaTangen/statuses/308322141676388352
http://twitter.com/froschmann1968/statuses/308322188501602304
http://twitter.com/QWallyTy/statuses/308322522863128576
http://twitter.com/Duftlavendel/statuses/308322911444406272
http://twitter.com/kakakiri/statuses/308323144836456448
http://twitter.com/Chake/statuses/308323468179566592
http://twitter.com/RegulaAeppli/statuses/308323570386350083
http://twitter.com/Imissmycat1/statuses/308323602342764544
http://twitter.com/WorldNewsGerman/statuses/308323834749140995
http://twitter.com/Zoran2010/statuses/308324446035386368

Page 28: Zwei jahrebigdata

male / female / n.a.

Page 29: Zwei jahrebigdata

http://www.jasondavies.com/parallel-sets/

http://www.nytimes.com/interactive/2012/05/17/business/dealbook/how-the-facebook-offering-compares.html?_r=0

http://www.senchalabs.org/philogl/PhiloGL/examples/winds/

Page 30: Zwei jahrebigdata

Quantification

Volume, Velocity, Variety

Data Science

D3

Page 31: Zwei jahrebigdata

Page 32: Zwei jahrebigdata
Page 33: Zwei jahrebigdata

Page 34: Zwei jahrebigdata

Quantified Self

Page 35: Zwei jahrebigdata
Page 36: Zwei jahrebigdata
Page 37: Zwei jahrebigdata
Page 38: Zwei jahrebigdata
Page 39: Zwei jahrebigdata
Page 40: Zwei jahrebigdata
Page 41: Zwei jahrebigdata
Page 42: Zwei jahrebigdata
Page 43: Zwei jahrebigdata
Page 44: Zwei jahrebigdata
Page 45: Zwei jahrebigdata

Page 46: Zwei jahrebigdata

Digital Darwinism is the Evolution of Consumer Behavior when Society & Technology Evolve Faster Than the Ability To Adapt

Brian Solis

Page 47: Zwei jahrebigdata

{
  "name": "Joerg Blumtritt",
  "jobs": [
    {"title": "Strategy Consultant", "startdate": "2005", "enddate": null},
    {"title": "Chairman", "company": "Arbeitsgemeinschaft Social Media e.V.", "startdate": "2008", "enddate": null}
  ],
  "email": "[email protected]",
  "twitter": "@jbenno",
  "blogs": ["http://beautifuldata.net", "http://slow-media.net", "http://kuirjeo.net", "http://memeticturn.net"],
  "website": "http://mediagnosis.de",
  "image": "http://slow-media.net/wp-content/uploads/jb_creeper.jpg",
  "bio": "http://beautifuldata.net/Joerg-blumtritt/"
}