Hi-Tech Barbecue/Grilling @high mountain

Upload: suzanna-weaver

Post on 19-Jan-2016

TRANSCRIPT

Page 1: Hi-Tech Barbecue/Grilling @high mountain. Datafy Everything: What's Next in Digital Life Daniel HaoTien Lee danieleewww@gmail.com

Hi-Tech Barbecue/Grilling @high mountain

Datafy Everything: What's Next in Digital Life

Daniel HaoTien Lee danieleewww@gmail.com

http://danieleewww.yolasite.com/2015-mgb070.php

Datafication: a process of “taking all aspects of life and turning them into data”

• Google’s augmented-reality glasses datafy the gaze

• Twitter datafies stray thoughts

• LinkedIn datafies professional networks

• Facebook datafies social activities

• Pandora/Spotify datafies music feeling and sensibility

• Amazon datafies shopping

Once we datafy things, we can transform their purpose and turn the information into new forms of value.

What's Behind Datafication

• Statistics: from N = small (random sampling of carefully curated data) toward N = all (with some messiness)

• Data: from some to all, from clean to messy, from causation to correlation. This represents a move away from always trying to understand the deeper reasons behind how the world works to simply learning about an association among phenomena and using that to get things done.
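
The contrast between N = small and N = all can be sketched in a few lines. The example below is only an illustration on synthetic data: the population, the sample size, and the function name `sample_vs_all` are assumptions made for this sketch, not material from the slides.

```python
import random
import statistics


def sample_vs_all(population, sample_size, seed=0):
    """Return (mean of a random sample, mean of the whole population)."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    sample = rng.sample(population, sample_size)   # "N = small"
    return statistics.mean(sample), statistics.mean(population)  # vs. "N = all"


# Synthetic population (hypothetical values, not real measurements).
_rng = random.Random(42)
population = [_rng.gauss(50.0, 10.0) for _ in range(100_000)]

sample_mean, full_mean = sample_vs_all(population, sample_size=100)
print(f"sample of 100: {sample_mean:.2f}   all 100,000: {full_mean:.2f}")
```

The sample gives an estimate with some sampling error, while the full data set gives the value directly, at the cost of handling every (possibly messy) record.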

Datafying through the Air: Go-and-Fly

• http://www.airware.com/aerial-information-platform

• https://www.youtube.com/watch?v=6ZjwgSwXfMQ

Benefits of Telematics

• Greenhouse gas reduction

• Telematics outputs drive UPS’s planning, training, and maintenance activities.

• Mileage reduction: multi-million gallons of gasoline saved yearly.

• Fuel and emissions efficiency

• Operational improvement: even tiny operational improvements from telematics data can cut millions of miles from the total.

Small savings each time add up to a big total amount
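
The arithmetic behind this point can be written out as a small sketch. All numbers below (miles saved, fleet size, working days) are hypothetical placeholders chosen for illustration; they are not UPS figures.

```python
def total_saving(saving_per_vehicle_per_day, vehicles, days):
    """Scale a small per-vehicle, per-day saving up to a fleet-wide total."""
    return saving_per_vehicle_per_day * vehicles * days


# Hypothetical inputs: 1 mile saved per vehicle per day,
# 10,000 vehicles, 250 working days per year.
miles_saved = total_saving(1, 10_000, 250)
print(f"{miles_saved:,} miles per year")
```

With these assumed inputs, a saving of one mile per vehicle per day already reaches millions of miles per year.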

E.g., Outbreak Early Warning

http://www.google.org/

This serves as a reminder that predictions are only probabilities and are not always correct, especially when the basis for the prediction -- Internet searches -- is in a constant state of change and vulnerable to outside influences, such as media reports. Still, big data can hint at the general direction of an ongoing development, and Google’s system did just that.

The fact that Google decided not to update the model for 2012-13, and subsequently the model performed poorly in 2012-13, suggests that the procedure for deciding when an update is necessary may need to be reworked.

Datafication of Posteriors

• When a person is seated, the contours of the body, its posture, and its weight distribution can all be quantified and tabulated.

• Car Seat IDs Driver’s Rear End: Mr. Koshimizu, a mechanical engineering associate professor at the Advanced Institute of Industrial Technology in Tokyo, has developed an ultra-sensitive sheet that sometime down the line could make the contours of a driver’s rear end an integral part of a car’s security system.

FAST (Future Attribute Screening Technology)

Prediction Technology: Risk Prediction

• http://www.ubicna.com/en/technology/

Application of Prediction Technology:

• Although all frauds and misconduct cases differ, there is similarity in their progression from development to emergence.

• Risk prediction can be applied to all kinds of misconduct cases, e.g. cartels, FCPA violations (bribery), information leakage, research misconduct, etc.

Browser Fingerprinting

In the past, clearing cookies after each session or selecting your browser’s “Do Not Track” setting could prevent third-party tracking. But the advent of browser fingerprinting makes it very difficult to prevent others from monitoring your online activities. The diagram outlines how an online advertising network can track the sites you visit using fingerprinting.

Browser Fingerprinting

• Collecting identifying information about unique characteristics of the individual computers people use. Under the assumption that each user operates his or her own hardware, identifying a device is tantamount to identifying the person behind it.

• Unique characteristics include the user’s screen size, time zone, browser plug-ins, and set of installed system fonts.

• Users continue to be fingerprinted even if they have checked “Do Not Track” in their browser’s preferences.

http://spectrum.ieee.org/computing/software/browser-fingerprinting-and-the-onlinetracking-arms-race
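
As a rough illustration of the idea, the characteristics listed above can be combined into a single stable identifier by hashing them. This is a simplified sketch, not how any particular advertising network works; the attribute names, the sample values, and the `fingerprint` function are assumptions made for this example.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of device characteristics into one identifier."""
    # Sort list values so the order in which plug-ins or fonts are
    # reported does not change the result.
    normalized = {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in attributes.items()
    }
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical visitor: no cookie is involved anywhere in this sketch.
visitor = {
    "screen": "1920x1080",
    "timezone": "Asia/Taipei",
    "plugins": ["pdf-viewer", "media-player"],
    "fonts": ["Arial", "Helvetica", "Noto Sans"],
}
returning_visitor = dict(visitor)                      # same device, cookies cleared
other_device = dict(visitor, timezone="Europe/Berlin")

print(fingerprint(visitor) == fingerprint(returning_visitor))  # same device matches
print(fingerprint(visitor) == fingerprint(other_device))       # different device does not
```

Because the identifier is recomputed from the device itself on every visit, clearing cookies does not change it, which is why the slide notes that fingerprinting is hard to prevent.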

The future belongs to the companies and people that turn data into products/services

The Historical S, T & A Co-evolution Process Perspective: Age of Data Science

Courtesy of Byeongwon Park 2007

NBIC: Nanotechnology, Biotechnology, Information Technology, Cognitive Science

More stories here!

Big Data

• A data set(s) with characteristics (e.g. volume, velocity, variety, variability, veracity, etc.) that for a particular problem domain at a given point in time cannot be efficiently processed using current/existing/established/traditional technologies and techniques in order to extract value.

The value drivers of big data for enterprise

Big Data Market Forecast

Roles in Big Data Ecosystem

• Data Provider: introduces new data or information feeds into the ecosystem

• Big Data Application Provider: executes a life cycle (collection, processing, dissemination) controlled by the system orchestrator to implement specific vertical applications requirements and meet security and privacy requirements

• Big Data Framework Provider: establishes a computing fabric (computation and storage resources, platforms, and processing frameworks) in which to execute certain transformation applications while protecting the privacy and integrity of data

• Data Consumer: includes end users or other systems who utilize the results of the Big Data Application Provider

• System Orchestrator: defines and integrates the required data application activities into an operational vertical system

• Security and Privacy: the role of managing and auditing access to and control of the system and the underlying data including management and tracking of data provenance

• Management: the overarching control of the execution of a system, the deployment of the system, and its operational maintenance

Data Engineering

Data Science Process

http://www.youtube.com/watch?v=xbecGJlODPg

The data scientist is involved in every part of this process

Big Data Paradigm

• Consists of the distribution of data systems across horizontally coupled independent resources to achieve the scalability needed for the efficient processing of extensive data sets.

• With the new Big Data Paradigm, analytical functions can be executed against the entire data set or even in real time on a continuous stream of data. Analysis may even integrate multiple data sources from different organizations. For example, consider the question “What is the correlation between insect-borne diseases, temperature, precipitation, and changes in foliage?” To answer this question, an analysis would need to integrate data about incidence and location of diseases, weather data, and aerial photography.

• The Big Data paradigm has other implications from these technical innovations. The changes are not only in the logical data storage, but in the parallel distribution of data and code in the physical file system and direct queries against this storage.

Ref. http://www.iso.org/iso/big_data_report-jtc1.pdf
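
The disease and weather question on this slide amounts to joining several sources on a shared key and then measuring association. The sketch below shows that step on a toy scale; the regions, the numbers, and the helper names `join_on_region` and `pearson` are hypothetical and stand in for real data feeds from different organizations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def join_on_region(*sources):
    """Keep only regions present in every source, with aligned value lists."""
    shared = set(sources[0]).intersection(*sources[1:])
    regions = sorted(shared)
    return regions, [[source[r] for r in regions] for source in sources]


# Two hypothetical sources keyed by region (made-up values).
disease_cases = {"north": 12, "south": 40, "east": 25, "west": 18}
mean_temp_c = {"north": 18.0, "south": 29.0, "east": 24.0,
               "west": 21.0, "island": 27.0}

regions, (cases, temps) = join_on_region(disease_cases, mean_temp_c)
r = pearson(cases, temps)
print(regions, round(r, 3))
```

In line with the earlier slides, a high coefficient here only indicates an association between the series; it does not explain why they move together.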

Big Data Engineering

• The storage and data manipulation technologies that leverage a collection of horizontally coupled resources to achieve nearly linear scalability in performance.

• New engineering techniques in the data layer have been driven by the growing prominence of data types that cannot be handled efficiently in a traditional relational model. The need for scalable access to structured and unstructured data has led to software built on name-value/key-value pairs or columnar (big table), document-oriented, and graph (including triple-store) paradigms.
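
To make the four paradigms concrete, the sketch below stores the same two toy records in each shape using plain Python structures. It illustrates only the data layouts, not any specific database product; the record contents and function names are invented for this example.

```python
# Key-value: every fact sits behind one opaque key.
kv_store = {
    "user:1:name": "Ada", "user:1:city": "Taipei",
    "user:2:name": "Lin", "user:2:city": "Tokyo",
}

# Columnar: one list per attribute, positions line up across lists.
columns = {"id": [1, 2], "name": ["Ada", "Lin"], "city": ["Taipei", "Tokyo"]}

# Document-oriented: each record is a self-contained nested document.
documents = [
    {"id": 1, "name": "Ada", "address": {"city": "Taipei"}},
    {"id": 2, "name": "Lin", "address": {"city": "Tokyo"}},
]

# Graph as a triple store: (subject, predicate, object) statements.
triples = [
    ("user:1", "lives_in", "Taipei"),
    ("user:2", "lives_in", "Tokyo"),
    ("user:1", "knows", "user:2"),
]


def city_from_kv(user_id):
    return kv_store[f"user:{user_id}:city"]


def city_from_columns(user_id):
    return columns["city"][columns["id"].index(user_id)]


def city_from_documents(user_id):
    return next(d["address"]["city"] for d in documents if d["id"] == user_id)


def city_from_triples(user_id):
    return next(o for s, p, o in triples
                if s == f"user:{user_id}" and p == "lives_in")


print(city_from_kv(1), city_from_columns(1),
      city_from_documents(1), city_from_triples(1))
```

The same question ("which city does user 1 live in?") is answered by a key lookup, a positional lookup, a document scan, or a pattern match, which is one way to see why each layout suits different access patterns.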

Data Lifecycle

• The shift in thinking changes the traditional data lifecycle. One description of the end-to-end data lifecycle categorizes the steps as collection, preparation, analysis, and action. Different big data use cases can be characterized in terms of the data set characteristics (at rest or in motion) and in terms of the time window for the end-to-end data lifecycle.

• Data set characteristics change the data lifecycle processes in different ways, for example in the point of the lifecycle at which the data are placed in persistent storage.

• In a traditional relational model, the data are stored after preparation (for example, after the extract-transform-load and cleansing processes).

• In a high-velocity use case, the data are prepared and analysed for alerting, and only then are the data (or aggregates of the data) given persistent storage.

• In a volume use case, the data are often stored in the raw state in which they were produced, prior to the application of the preparation processes that cleanse and organize the data. The consequence of persisting data in its raw state is that a schema or model for the data is only applied when the data are retrieved, known as schema on read.
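
Schema on read can be sketched as follows: raw records are kept exactly as produced, and field selection and type casting happen only when a reader asks for them. The log lines, field names, and the `read_with_schema` helper are assumptions made for this illustration.

```python
import json

# Raw records stored as produced, including messy ones (hypothetical data).
RAW_LOG = [
    '{"sensor": "a1", "temp": "21.5", "ts": "2016-01-19T08:00:00"}',
    '{"sensor": "a2", "temp": "bad-reading", "ts": "2016-01-19T08:00:05"}',
    '{"sensor": "a1", "temp": "22.1"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> cast function) at read time.

    Records that lack a field or fail a cast are skipped in this sketch.
    """
    for line in raw_lines:
        record = json.loads(line)
        try:
            row = {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError):
            continue
        yield row


# Two readers impose two different schemas on the same raw storage.
temperature_rows = list(read_with_schema(RAW_LOG, {"sensor": str, "temp": float}))
timestamp_rows = list(read_with_schema(RAW_LOG, {"sensor": str, "ts": str}))
print(temperature_rows)
print(timestamp_rows)
```

Nothing was cleansed before storage, so each reader decides which records are usable for its own purpose. A schema-on-write pipeline would instead have rejected or repaired the messy records before storing them.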

Data Engineering …more

• A third consequence of big data engineering is often referred to as “moving the processing to the data, not the data to the processing”. The implication is that the data are too extensive to be queried and transmitted to another resource for analysis, so the analysis program is instead distributed to the data-holding resources, with only the results being aggregated on a different resource. Since I/O bandwidth is frequently the limiting resource in moving data, another approach would be to embed query/filter programs within the physical storage medium.
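
A minimal simulation of this pattern is shown below. Worker threads stand in for data-holding nodes, each runs the same small function against its own partition, and only a tiny (count, sum) pair travels back to be combined. The partition contents, node names, and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "node" holds its own partition; the raw values never leave it here.
PARTITIONS = {
    "node-a": [3, 8, 15, 4],
    "node-b": [22, 7],
    "node-c": [9, 1, 30],
}


def local_summary(values, threshold):
    """Runs where the data lives: filter locally, return a small summary."""
    matches = [v for v in values if v > threshold]
    return len(matches), sum(matches)


def distributed_mean_above(partitions, threshold):
    """Ship the function to every partition and combine the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda values: local_summary(values, threshold),
                     partitions.values())
        )
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count if count else None


result = distributed_mean_above(PARTITIONS, threshold=5)
print(result)
```

However large each partition grows, the amount of data sent back per node stays at two numbers, which is the bandwidth saving the slide describes.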

Machine Learning and Feature Engineering

• http://www.slideshare.net/dato-inc/overview-of-machine-learning-and-feature-engineering

Case Studies…

• TBD

Working with data at scale

Making data tell its story

The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be……..

Assignment for next week

• Case study and discussion: “Data Science and Machine Learning”. Search for material and prepare a 15-minute presentation and a 10-minute Q&A.