big data mining

Big Data Mining

Upload: thadsanamoorthy-kajavathanan

Post on 27-Jan-2015

Category:

Data & Analytics


DESCRIPTION

This is my presentation about Big data mining.

TRANSCRIPT

Page 1: Big data mining

Big Data Mining

Page 2: Big data mining

Overview

Introduction

Characteristics of Big Data

Big Data and its challenges

Big Data mining Tools

Big Data mining algorithm

Applications of Big Data

References

Q&A

Page 3: Big data mining

Introduction

Page 4: Big data mining

Interesting Facts

The volume of business data worldwide, across all companies, doubles every 1.2 years (previously every 1.5 years)

About 2500 quadrillion (2.5 quintillion) bytes of data are produced daily, and more than 90 percent of all data was produced within the past two years.

An average person today processes more data in a day than a 16th-century individual did in an entire lifetime

In recent years, the cost of storage and processing power has dropped significantly

Bad data or poor data quality costs US businesses $600 billion annually

By 2015, 4.4 million IT jobs globally will be created to support big data (Gartner)

Facebook processes 10 TB of data every day; Twitter processes 7 TB

As of 2012, Google had over 3 million servers processing over 2 trillion searches per year (compared with only 22 million in 2000)

Page 5: Big data mining

What is Big Data?

Page 6: Big data mining

The term Big data is used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques.

-Webopedia

Page 7: Big data mining

Characteristics of Big Data

Volume - the quantity of data

Variety - the categories (types) of data involved

Velocity - the speed at which data is generated or processed

Variability - inconsistency in the data

Complexity - the difficulty of managing the data

Page 8: Big data mining

DATA MINING CHALLENGES WITH BIG DATA

The main challenge for an intelligent database is handling Big Data.

The key task is scaling to the large volume of data and providing solutions to these problems, as characterized by the HACE theorem (Heterogeneous, Autonomous sources with Complex and Evolving relationships).

Page 9: Big data mining

Challenges

Location of Big Data sources - Big Data is commonly stored in different locations

Volume of the Big Data - the size of Big Data grows continuously

Hardware resources - RAM capacity

Privacy - medical reports, bank transactions

Having domain knowledge

Getting meaningful information

Solutions

Parallel computing programming

An efficient computing platform will not have centralized data storage; instead, the platform will be distributed across large-scale storage.

Restricting access to the data

Page 10: Big data mining

Big Data Mining Tools

Hadoop

Apache S4

Storm

Apache Mahout

MOA

Page 11: Big data mining

Hadoop

Hadoop is an Apache Software Foundation project: an open source software platform for scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

Hadoop provides fast and reliable analysis of both structured and unstructured data.

It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Hadoop uses the MapReduce programming model to mine data.

A MapReduce job splits the input dataset into independent subsets, which are processed in parallel by map tasks.

Map() - a procedure that performs filtering and sorting

Reduce() - a procedure that performs a summary operation
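The Map() and Reduce() roles described on this slide can be illustrated with a word count, the usual introductory MapReduce example. The following is a minimal single-process sketch in plain Python, not Hadoop's actual API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the grouping step that Hadoop performs between the two phases is simulated with a dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map(): emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce(): summarize all values for one key; here the summary is a sum."""
    return key, sum(values)


def word_count(splits):
    # Each split is mapped independently of the others, which is what
    # allows a cluster to run the map tasks in parallel.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


# Three independent input splits (made-up sample data).
splits = ["big data mining", "data mining tools", "big data"]
counts = word_count(splits)
print(counts)
```

Because no map call depends on another split, the same logic distributes naturally: only the grouped intermediate pairs have to move between machines.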

Page 12: Big data mining
Page 13: Big data mining

Big Data Mining Algorithm

Big Data applications gather information from many sources.

Mining this data conventionally would require gathering all of the distributed data at a centralized site, but that is prohibitive because of high data transmission costs and privacy concerns.

Mining at each level aims to discover correlations or patterns that emerge when a variety of sources are combined.

Global data mining is done through a two-step process:

Model level

Knowledge level

At the data level, each local site uses its local data to calculate data statistics and shares this information with the other sites to obtain a view of the global data distribution.
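One way to picture the data-level exchange is that each site shares only a few summary numbers rather than its raw records. The sketch below assumes the statistics of interest are the mean and variance of a single numeric attribute; the `LocalStats` class, the `summarize` and `combine` functions, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LocalStats:
    """Summary a site is willing to share instead of its raw data."""
    count: int
    total: float
    total_sq: float


def summarize(values):
    """Run at each local site: reduce local data to three numbers."""
    return LocalStats(len(values), sum(values), sum(v * v for v in values))


def combine(stats):
    """Merge the shared summaries into a global mean and population variance."""
    n = sum(s.count for s in stats)
    total = sum(s.total for s in stats)
    total_sq = sum(s.total_sq for s in stats)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


# Two sites with private local data (made-up values).
site_a = [2.0, 4.0, 6.0]
site_b = [8.0, 10.0]

global_mean, global_variance = combine([summarize(site_a), summarize(site_b)])
print(global_mean, global_variance)
```

The combined result matches what a centralized computation over all five values would give, yet only three numbers per site cross the network.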

Page 14: Big data mining

At the model level, each site mines its own local data to produce local patterns.

By sharing these local patterns with other local sites, we can produce a single global pattern.
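As a rough illustration of combining local patterns into a global one, the sketch below assumes the patterns are pairs of items that occur together in transactions. For simplicity each site shares its counts for every observed pair, not only the locally frequent ones. The function names, the `min_support` threshold, and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_patterns(transactions):
    """Run at one site: count co-occurring item pairs; only counts leave the site."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(set(items)), 2))
    return counts, len(transactions)


def global_patterns(site_results, min_support):
    """Merge per-site pair counts and keep pairs that are frequent overall."""
    merged = Counter()
    total = 0
    for counts, n in site_results:
        merged.update(counts)
        total += n
    return {pair: c / total for pair, c in merged.items() if c / total >= min_support}


# Transactions held privately at two sites (made-up data).
site_a = [["milk", "bread"], ["milk", "bread", "eggs"], ["eggs"]]
site_b = [["milk", "bread"], ["bread", "eggs"]]

frequent = global_patterns(
    [local_patterns(site_a), local_patterns(site_b)], min_support=0.5
)
print(frequent)
```

Here the pair (bread, milk) appears in 3 of the 5 transactions overall, so it is the only pair that reaches the 0.5 support threshold.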

At the knowledge level, model correlation analysis investigates the relevance between models generated from various data sources to determine how closely the data sources are correlated with each other, and how to form accurate decisions based on models built from autonomous sources.
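The slide does not say how model correlation is measured. One simple stand-in, shown purely as an assumption, is to run the local models on a shared set of probe inputs and record how often each pair agrees. The threshold classifiers, the probe set, and the `agreement` function below are all hypothetical.

```python
from itertools import combinations


def agreement(model_a, model_b, probes):
    """Fraction of probe inputs on which two models give the same output."""
    same = sum(1 for x in probes if model_a(x) == model_b(x))
    return same / len(probes)


# Hypothetical local models: threshold classifiers learned at three sites.
models = {
    "site_a": lambda x: x > 5,
    "site_b": lambda x: x > 6,
    "site_c": lambda x: x > 1,
}

# Shared, non-sensitive probe inputs (assumed to be available to every site).
probes = list(range(10))

# Pairwise agreement between every two sites' models.
scores = {
    (a, b): agreement(models[a], models[b], probes)
    for a, b in combinations(sorted(models), 2)
}
for pair, score in scores.items():
    print(pair, score)
```

In this toy setup the models from site_a and site_b agree on most probes, which would suggest their underlying data sources are closely related, while site_c behaves differently.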

Page 15: Big data mining

Applications of Big Data

Healthcare organizations can achieve better insight into disease trends and patient treatments.

Public sector agencies can catch fraud and other threats in real-time.

Applications of Multimedia data

Finding the travel patterns of travelers

CCTV camera footage

Photos and videos from social networks

Recommender systems

Integration and mining of biodata from various sources in biological networks, by the NSF (National Science Foundation).

Classifying Big Data streams at run time, by the Australian Research Council.

Page 16: Big data mining

References

[1] IEEE, Data Mining with Big Data, January 2014

[2] McKinsey Global Institute, Big Data: The next frontier for innovation, competition, and productivity, May 2011

[3] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding, 2013, Data Mining with Big Data

[4] Rezwan Ahmed, George Karypis, 2012, Algorithms for mining the evolution of conserved relational states in dynamic networks

[5] Wu X., 2000, Building Intelligent Learning Database Systems, AI Magazine

[6] Oracle, June 2013, Unstructured Data Management with Oracle Database 12c

[7] Valery A. Petrushin, Jia-Yu Pan, Cees G. M. Snoek, 2010, Tenth International Workshop on Multimedia Data Mining

[8] Big data [Online]. Available: www.en.wikipedia.org/wiki/Big_data

[9] Big data [Online]. Available: www.webopedia.com/TERM/B/big_data.html

[10] IBM big data and information management [Online]. Available: www-01.ibm.com/software/data/bigdata

[11] Big data [Online]. Available: www.adainbigdata.com

[12] Big Data Explained [Online]. Available: www.mongodb.com/big-data-explained

[13] Big data [Online]. Available: www.sas.com/en_us/insights/big-data/what-is-big-data.html

[14] Big Data Mining Tools [Online]. Available: www.albertbifet.com/big-data-mining-tools

Page 17: Big data mining
Page 18: Big data mining
Page 19: Big data mining

Cloud storage for Big Data

Processing