Big Data as a Service - sethuonline.com | Sathyabama University, Chennai


Uploaded by sethuraman-r on 21-Mar-2017.


Page 1: Big Data As a service - Sethuonline.com | Sathyabama University Chennai

R. Sethuraman, M.E., (Ph.D.), Assistant Professor,

Faculty of Computing, Dept. of Computer Science Engineering,

Sathyabama University, Chennai.

An Efficient Framework for Data as a Service in the Hadoop Ecosystem

www.sethuonline.com

Page 2

Agenda
• Introduction
  - Big Data Analytics
  - Hadoop Ecosystem
  - Data as a Service
• Literature Survey
• Inferences from the Survey
• Problem Defined
• Key Challenges of the Problem
• Proposed Methodology
• References

Page 3

Introduction

Big Data Analytics:
• The process of examining large data sets containing a variety of data types to uncover hidden patterns, unknown correlations and other useful business insights.

• Sources of these large data sets include server logs, social media, mobile devices and sensors. Such data are mostly unstructured or semi-structured.

• Traditional relational databases are designed for structured data with a fixed schema, so they are a poor fit for the unstructured and semi-structured data obtained from these sources.

• This makes it necessary to move to newer technologies such as Hadoop.
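To illustrate why a fixed schema struggles with such data, the sketch below profiles a few hypothetical records (the sample lines and field names are invented for illustration). Each source contributes different fields, only one field is shared by every parsed record, and some input cannot be parsed at all.

```python
import json
from collections import Counter

# Hypothetical input stream mixing server logs, social media posts,
# sensor readings and one line of unstructured noise.
RAW_LINES = [
    '{"source": "server", "status": 200, "path": "/index.html"}',
    '{"source": "social", "user": "a1", "text": "great service", "likes": 4}',
    '{"source": "sensor", "device": "t-17", "temp_c": 21.5}',
    'not valid json at all',
]

def profile_fields(lines):
    """Count how often each field appears and how many lines fail to parse."""
    field_counts = Counter()
    unparsed = 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            unparsed += 1  # unstructured text: no fields to extract
            continue
        field_counts.update(record.keys())
    return field_counts, unparsed

counts, bad = profile_fields(RAW_LINES)
print(sorted(counts.items()))
print("unparsed:", bad)
```

A relational table for these records would need a column for every field ever seen, most of them empty, which is one reason schema-flexible storage is preferred for such data.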

Page 4

Hadoop Ecosystem

• Hadoop is a framework that supports the processing of huge and diverse data sets across clusters of machines.

• It does so with the support of related components such as YARN, MapReduce and Hive.

• It serves as a central repository for all incoming streams of raw data.

• Hadoop is not a single product; instead, it is a collection of components.

• Its popularity comes from storing, analyzing and quickly retrieving unstructured data in a cost-effective manner.
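The MapReduce component processes data in a map phase, a shuffle (group-by-key) phase and a reduce phase. The sketch below simulates those three phases for a word count in a single Python process; it is not Hadoop's Java API, and in a real cluster the map and reduce tasks run in parallel on many nodes, with YARN scheduling them.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group every emitted value by its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

lines = ["Hadoop stores data", "Hadoop processes data in parallel"]
print(word_count(lines))
```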

Page 5

Data as a Service (DaaS)

• Data as a Service (DaaS) is the delivery of statistical analysis tools, or of information obtained from large data sets, to give an organization a competitive advantage.

• This is done over immense volumes of unstructured data that are updated on a regular basis.

HOW IT WORKS: the data obtained using web crawlers are sent into the Hadoop framework for the following processing:
* Data Storage
* Data Processing
* Data Management
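The three stages can be pictured with a toy, in-memory pipeline. This is a sketch under stated assumptions: the class name `DaaSPipeline`, the `pages_per_block` grouping and the term-count processing step are all hypothetical stand-ins, not the framework's actual design. Storage groups crawled pages into blocks (loosely echoing how HDFS stores files as blocks), processing computes term frequencies, and management exposes the results through a read-only query.

```python
from collections import Counter

class DaaSPipeline:
    """Hypothetical in-memory stand-in for storage, processing and management."""

    def __init__(self, pages_per_block=2):
        self.pages_per_block = pages_per_block  # hypothetical block granularity
        self.blocks = []                        # storage: crawled pages grouped into blocks
        self.term_counts = Counter()            # processing output

    def store(self, pages):
        """Data storage: split incoming pages into fixed-size blocks."""
        for i in range(0, len(pages), self.pages_per_block):
            self.blocks.append(pages[i:i + self.pages_per_block])

    def process(self):
        """Data processing: recompute term frequencies over every stored block."""
        self.term_counts = Counter()
        for block in self.blocks:
            for page in block:
                self.term_counts.update(page.lower().split())

    def query(self, term):
        """Data management/serving: read-only access to processed results."""
        return self.term_counts[term.lower()]

# Hypothetical crawler output.
crawled_pages = ["Hadoop cluster news", "Big data on Hadoop", "Cloud data service"]
pipeline = DaaSPipeline(pages_per_block=2)
pipeline.store(crawled_pages)
pipeline.process()
print(len(pipeline.blocks), pipeline.query("hadoop"), pipeline.query("data"))
```

Consumers of the service only see `query`, never the raw blocks; that separation is the essence of delivering data as a service.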

Page 6

Literature Survey

1. Base paper: “Service-Generated Big Data and Big Data-as-a-Service”, Zibin Zheng (Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China), Jieming Zhu, M. R. Lyu [2014 IEEE International Congress]
   Proposed work: BDaaS provides APIs to access service-generated big data and big data analytics results.
   Limitations: Heterogeneous data are not handled in BDaaS, and security is not considered in big data analysis.

2. Base paper: “Towards Cloud Based Analytics as a Service for Big Data Analytics in the Cloud”, Chanchal Yadav, Shullang Wang, Manoj Kumar [IJCSN, Vol. 2, Issue 3, 2014, ISSN: 2277-5420]
   Proposed work: Proposes the conceptual architecture of CLAaaS, a big data analytics service platform in the cloud.
   Limitations: Due to multi-tenancy, compromises are made at the design level, and an efficient text processing algorithm is needed for efficient data retrieval.

Page 7

Literature Survey

3. Base paper: Wei Fan, Albert Bifet, “Mining Big Data: Current Status, and Forecast to the Future”, SIGKDD Explorations, 2014, Volume 14, Issue 2
   Proposed work: An overview of the architecture and algorithms used on large data sets; these algorithms define the structures and methods implemented to handle big data.
   Limitations: Normalization, record linkage and quality measures need to be addressed.

4. Base paper: Priya P. Sharma, Chandrakant P. Navdeti, “Securing Big Data Hadoop: A Review of Security Issues, Threats and Solution”, IJCSIT, Vol. 5(2), 2014, pp. 2126-2131
   Proposed work: Big data security at the environment level, along with the security issues being dealt with today.
   Limitations: Security must be ensured for the data sources so that trustworthy data can be used for business insights.

Page 8

Literature Survey: “Service-Generated Big Data and Big Data-as-a-Service”

Zibin Zheng (Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China), Jieming Zhu, M. R. Lyu [2014 IEEE International Congress]

This paper describes research on storing and processing the growing volume of service-generated big data, and on the analysis performed by BDaaS for improved analytics.

Issues Addressed:
• A single infrastructure provides the functionality for storing and analyzing different types of service-generated big data.
• BDaaS provides APIs to access service-generated big data and big data analytics results.
• Service logs, service QoS and service relationships are exploited to enhance system performance.
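As an illustration of "APIs to access analytics results" built from service logs and QoS data, the sketch below aggregates hypothetical log entries into per-service QoS statistics and exposes them through an accessor. The log format and the function names `build_qos_index` and `get_service_qos` are assumptions for illustration, not the BDaaS interface described in the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical service log entries: (service name, response time in ms, success flag).
SERVICE_LOG = [
    ("search", 120, True),
    ("search", 80, True),
    ("payment", 300, False),
    ("payment", 200, True),
]

def build_qos_index(log):
    """Aggregate raw service logs into per-service QoS statistics."""
    times = defaultdict(list)
    successes = defaultdict(int)
    for service, rt_ms, ok in log:
        times[service].append(rt_ms)
        successes[service] += int(ok)
    return {
        service: {
            "avg_response_ms": mean(rts),
            "success_rate": successes[service] / len(rts),
        }
        for service, rts in times.items()
    }

QOS_INDEX = build_qos_index(SERVICE_LOG)

def get_service_qos(service):
    """API-style accessor: returns analytics results, never the raw logs."""
    return QOS_INDEX.get(service)

print(get_service_qos("payment"))
```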

Page 9

Survey Contd…

“Service-Generated Big Data and Big Data-as-a-Service”, Zibin Zheng (Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China), Jieming Zhu, M. R. Lyu [2014 IEEE International Congress]

Issues not Addressed:
• Heterogeneous data are not considered in service-generated big data while enhancing the quality of service-oriented systems.
• The technology road map for the APIs is not synchronized.
• Security issues are not addressed with respect to service providers and BDaaS.
• Pattern matching is not applied to heterogeneous data, which leads to inefficient data retrieval.

Page 10

Literature Survey Contd…

• “Towards Cloud Based Analytics as a Service for Big Data Analytics in the Cloud”, Chanchal Yadav, Shullang Wang, Manoj Kumar [IJCSN, Vol. 2, Issue 3, 2014, ISSN: 2277-5420]

Issues Addressed:
• This paper proposes the conceptual architecture of CLAaaS, a big data analytics service platform in the cloud.
• The platform is equipped with customizable, domain-specific software tools and a workflow management system to facilitate the execution of big data analytics.
  – Cognos is used for statistical, business and scientific data analysis.
  – BigInsights is used for visualization and predictive analytics.
  – Weka provides graphical user interfaces.
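A workflow management system must run each analytics step only after the steps it depends on have finished. As a minimal sketch, assuming a hypothetical workflow whose step names are invented for illustration, Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) computes a valid execution order:

```python
from graphlib import TopologicalSorter

# Hypothetical analytics workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "ingest": set(),
    "clean": {"ingest"},
    "statistics": {"clean"},
    "predictive_model": {"clean"},
    "visualize": {"statistics", "predictive_model"},
}

def run_workflow(workflow):
    """Execute each step only after all of its dependencies have completed."""
    executed = []
    for step in TopologicalSorter(workflow).static_order():
        # A real system would launch a tool here (a statistics or
        # visualization job, for example); this sketch only records the order.
        executed.append(step)
    return executed

order = run_workflow(WORKFLOW)
print(order)
```

`TopologicalSorter` also raises `graphlib.CycleError` if the dependencies contain a cycle, which is a useful sanity check for user-defined workflows.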

Page 11

Survey Contd…

• “Towards Cloud Based Analytics as a Service for Big Data Analytics in the Cloud”, Chanchal Yadav, Shullang Wang, Manoj Kumar [IJCSN, Vol. 2, Issue 3, 2014, ISSN: 2277-5420]

Issues not Addressed:
• Compromises are made at the design level due to multi-tenancy.
• Separating the data of different users requires new software.
• Promotion of web collaboration, and data privacy concerns in the cloud.
• Text processing is not handled for the retrieval of heterogeneous data.
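One common way to separate the data of different users in a shared (multi-tenant) store is to namespace every key by a tenant identifier. The sketch below is a hypothetical illustration of that idea (the `TenantStore` class and tenant names are invented), not part of CLAaaS; a production system would also need authentication, access control and usually encryption.

```python
class TenantStore:
    """Hypothetical shared key-value store that namespaces every key by tenant ID."""

    def __init__(self):
        self._data = {}

    def put(self, tenant_id, key, value):
        self._data[(tenant_id, key)] = value

    def get(self, tenant_id, key):
        # A lookup can only reach keys stored under the caller's own tenant ID.
        return self._data.get((tenant_id, key))

    def keys_for(self, tenant_id):
        return sorted(k for (t, k) in self._data if t == tenant_id)

store = TenantStore()
store.put("acme", "report", "Q1 sales")
store.put("globex", "report", "Q1 costs")
print(store.get("acme", "report"), store.get("globex", "report"))
```

Both tenants use the same key name, yet neither can read the other's value, which is the logical separation the survey says a shared platform has to guarantee.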

Page 12

Inferences from the Literature Survey

• Heterogeneous data are not handled in BDaaS, and security is not considered in big data analysis.

• Security must be ensured for the data sources so that trustworthy data can be used for business insights.

• Normalization, record linkage and quality measures need to be addressed.

• Due to multi-tenancy, compromises are made at the design level, and an efficient text processing algorithm is needed for efficient data retrieval.
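Normalization and record linkage can be illustrated together: records are first normalized (lowercased, punctuation stripped, whitespace collapsed) and then linked when their token sets are similar enough. This sketch uses Jaccard similarity on tokens with an arbitrary threshold of 0.5; it is one simple technique chosen for illustration, not the mining algorithm the proposed framework will use, and the sample records are hypothetical.

```python
import re

def normalize(text):
    """Text normalization: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())

def jaccard(a, b):
    """Jaccard similarity between the token sets of two normalized strings."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def link_records(left, right, threshold=0.5):
    """Pair each record in `left` with its most similar record in `right`, if similar enough."""
    links = []
    for lrec in left:
        nl = normalize(lrec)
        best = max(right, key=lambda r: jaccard(nl, normalize(r)))
        score = jaccard(nl, normalize(best))
        if score >= threshold:
            links.append((lrec, best, round(score, 2)))
    return links

left = ["Sathyabama University, Chennai", "Chinese Univ. of Hong Kong", "Cloudera Engineering"]
right = ["SATHYABAMA UNIVERSITY CHENNAI", "Chinese Univ of Hong Kong"]
links = link_records(left, right)
print(links)
```

Without the normalization step, the first pair would share no tokens at all because of case and punctuation differences, which shows why normalization comes before linkage.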

Page 13

Problem Defined

• Data retrieval for unstructured and semi-structured data can be made more effective by using algorithms such as PageRank and the C4.5 decision-tree learner.

• Normalization can be improved by implementing text processing.

• Record linkage can be performed through efficient mining algorithms for heterogeneous data.
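PageRank ranks nodes (for example, crawled pages) by the ranks of the nodes linking to them, which is why it helps order retrieval results. A minimal power-iteration sketch, assuming the commonly used damping factor of 0.85 and a small hypothetical link graph; nodes with no outgoing links spread their rank evenly over all nodes.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps each node to the nodes it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: distribute its rank evenly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = pagerank(graph)
print({node: round(score, 3) for node, score in sorted(rank.items())})
```

Node C ranks highest because it receives links from both A and B; the scores always sum to 1.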

Page 14

PROPOSED FRAMEWORK

Page 15

Proposed Methodology
• A new framework is proposed to achieve efficient DaaS using machine learning algorithms and text processing.

• The machine learning algorithm C4.5 helps in building decision trees.

• CART is a comparable decision-tree algorithm to C4.5.
• PageRank helps in basic graph analysis.
• The nodes of the graph are connected to each other by links.
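C4.5 chooses the attribute to split on by its gain ratio: the information gain of the split divided by the split information. The sketch below computes only this split-selection step for categorical attributes on a tiny hypothetical data set; full C4.5 also handles continuous attributes, missing values and pruning. CART, by contrast, builds binary splits and typically uses Gini impurity.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 split criterion for a categorical attribute: information gain / split information."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(p) / total * log2(len(p) / total) for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attrs):
    """Pick the attribute with the highest gain ratio, as C4.5 does at each node."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))

# Hypothetical training data: is a retrieved record relevant or noise?
rows = [
    {"source": "log", "format": "json"},
    {"source": "log", "format": "text"},
    {"source": "social", "format": "json"},
    {"source": "social", "format": "text"},
]
labels = ["relevant", "relevant", "noise", "noise"]
print(best_attribute(rows, labels, ["source", "format"]))
```

Here `source` separates the classes perfectly (gain ratio 1.0) while `format` carries no information (gain ratio 0.0), so the tree would split on `source` first.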

Page 16

List of References
• Assaf Yardeni, “How Treato Analyzes Health-related Social Media Big Data with Hadoop and HBase”, Cloudera Engineering [International Conference on Cloud, Big Data and Trust 2013, Nov 13-15, RGPV]

• Chanchal Yadav, Shullang Wang, Manoj Kumar, “Algorithm and Approaches to Handle Large Data - A Survey”, IJCSN, Vol. 2, Issue 3, 2013, ISSN: 2277-5420

• Koji Zettsu, Takashi Kimata, “Managing Heterogeneous Sensor Data on a Big Data Platform: IoT Services for Data-intensive Science”, Computer Software and Applications Conference Workshops (COMPSACW), 2014 IEEE 38th International

• Eugen Feller, Lavanya Ramakrishnan, Christine Morin, “Performance and Energy Efficiency of Big Data Applications in Cloud Environments: A Hadoop Case Study”, IJCSN, Volume 74, Issue 3, March 2014, pp. 2166-2179

• Zibin Zheng, Jieming Zhu, Michael R. Lyu, “Service-generated Big Data and Big Data-as-a-Service: An Overview”, University of Hong Kong, China [2014 IEEE International Congress]

• Prompt Cloud: a web data crawling and extraction company serving customers across the globe with data to suit their business needs.