Unlocking Big Data

Upload: orchestrate-technologies-llc

Post on 14-Jul-2015

Category: Technology


Building Big with Big Data

Introduction

Companies are in the middle of a transformation that forces them to become analytics-driven in order to stay competitive. Data analysis gives them comprehensive insight into their business and a significant advantage over competitors. Analytics-driven insights prompt businesses to act on service innovation, enhance the client experience, detect irregularities in processes and free up time for product or service marketing. To work in an analytics-driven way, companies need to gather, analyse and store information from all possible sources. They should put appropriate tools and workflows in place to analyse data rapidly and continuously, draw insight from the results, and change their business processes and practices accordingly. Doing so helps them become more agile than before.

A few noteworthy facts:

The Large Hadron Collider near Geneva generates around 15 petabytes of data per year.

The New York Stock Exchange produces about 1 terabyte of new trade data every day.

Facebook hosts around 10 billion photos, which take up 1 petabyte of storage.

Ancestry.com stores around 2.5 petabytes of data.

The Internet Archive stores approximately 2 petabytes of data and is growing at a rate of 20 terabytes per month.

According to a Gartner report, big data has already become a focal point of discussion for companies. Most organizations are now concentrating on finalizing their process for investing in big data.

Appropriate usage of Big Data

IT providers need to build strong big data knowledge and skills to stay relevant in an ever-changing industry. IT vendors do not deal with one distinct technology or one large sector; they work with several technologies across various industries. Companies are looking to vendors for business transformation capabilities, adopting big data for:

Research & Development

In research and development, the contribution of big data is more about diversity, practicality and sometimes quantity. The core data analytics competency is the ability to see associations and patterns in the available information. Enterprises should combine real-time data with clinical data, mine genetic data, and understand regional and population data. By doing this, organizations can quickly identify the reasons for research failures and design more efficient trials. Companies can also speed up discovery and get faster approval for new innovations, which reduces expenditure.

Customer Behavior Analysis

The ability to gather, interpret and take advantage of huge volumes of data from customers, social media and real-time information on product demand and supply is one of the most important aspects of business. Competitive advantage, improved sales, greater customer loyalty and product enhancement can be achieved by investing in appropriate technology to analyse important business information. Companies should improve their ability to store and rapidly analyse this enormous volume of data with the right tools and act on the resulting business insights.

Customer behavior data has changed drastically because of the internet and social media such as Facebook and Twitter. Earlier, cash registers and point-of-sale systems were the means of running a business, and they could not record every move a consumer made. These older systems have been replaced by e-commerce websites, which record every step of a consumer's purchase process. Product feedback used to be collected by phone call; now consumers express their opinions on purchased products or services through social media, where they are digitally recorded. All of this data can be analysed to enhance the product or service.

Threat Management

Precise risk assessment helps companies make high-quality decisions, reduce costs and comply with regulatory guidelines. There is an enormous amount of data available to analyze. Companies require a universal workflow and thought process to detect and evaluate every threat, known or unknown, that they might encounter, whether it is a threat to the company's brand image, a data breach or a violation of regulatory guidelines. After detecting a threat, the organization must analyze its impact on business opportunities. Big data analysis can help maintain a balance between threat and opportunity.

Business Analysis Tools

Enterprises struggle to manage the sheer amount and variety of data and the need for quick analysis to obtain actionable insights. Below are a few tools that can be used for data and business analysis:

Jaspersoft BI Suite

Jaspersoft is an open source package that creates reports from database columns. One of its most valuable features is the ability to convert SQL tables into PDFs, which companies use to present tables and discuss them in meetings. The JasperReports Server provides software to pull data from storage platforms such as:

MongoDB

Cassandra

Redis

Riak

CouchDB

Neo4j

Hadoop

Hadoop is a nine-year-old open source data processing platform. Cloudera started providing support for it in 2008, and MapR and Hortonworks now provide support as well. Hadoop jobs are written in Java.

Pentaho Business Analytics

Pentaho started as a report-generating engine. It is now moving into big data by making it simple to gather data from new sources. Pentaho's tool can be hooked up to NoSQL databases such as MongoDB and Cassandra. Once connected to a database, columns can be dragged and dropped into views, and the information is presented as if it had come from a SQL database.

Tableau Desktop and Server

With the Tableau Desktop visualization tool, data can be looked at in one way, then analysed and viewed in a different way. Tableau aims to provide a mechanism for slicing and dicing data again and again as required.

Splunk

Splunk is not exactly a report-producing tool or a collection of AI routines, although it generates reports along the way. It builds an index of the data, and this indexing is flexible. Splunk comes already tuned to one particular application, making sense of log files.

A few more tools, such as Karmasphere Studio and Analyst, Talend Open Studio and Skytree Server, can also be used for business and data analysis. Organizations will approach big data with their own unique thought process. Companies will focus on analytics and agility as they seek to take advantage of big data and IT. Conventional businesses themselves will not change, but innovative technologies will change business processes and practices, helping organizations become more agile.

Analyzing Unstructured Data

The digitization of information, combined with a high volume of multi-channel transactions, has resulted in a data flood. The ever-increasing pace of digital data has caused the world's combined data to double. According to a Gartner report, approximately 80% of the data captured by a company is unstructured. It includes data from consumer calls, emails and opinions on social platforms. In addition, a huge amount of data is generated through diagnostic information logged by various user devices. Structured data is itself so large that analysing it demands an enormous effort, and making sense of unstructured data is far more difficult.

Companies should understand structured, semi-structured and unstructured information in order to reach important business decisions. Enterprises can make the right decisions, such as gauging consumer sentiment or customizing offers, only after analysing all available data.

Going through a huge amount of data may seem a tough job, but in the end it is rewarding. By going through unstructured data sets, relationships and patterns can be found by detecting connections between seemingly unrelated data sources. The trends discovered through this method of analysis provide useful insight for a business.

Route to Analyze Unstructured Data

Use relevant data sources

To start, it is essential to understand which data sources are significant for the analysis. Streaming video, chat, emails, voice files and web logs all count as unstructured data sources. If information is only loosely connected to the issue, it should be set aside. Only relevant data sources should be used so that the analysis produces relevant outcomes.

Define analytics requirement

An analysis may be useless if the end requirement is not defined. It is key to know what kind of result is expected: volume, pattern, reason, impact or something else altogether. A roadmap for using the analysis results should also be laid out so that they can be used during predictive analysis, prior to segmentation and integration.

Pick technology stack for data incorporation and storage

Fresh data can be brought in from various data sources. The analysis results should be kept in a technology stack or in cloud storage so that data is easy to retrieve for analysis. The choice of data storage system depends on factors such as scalability, volume and velocity needs. It is essential to pick the right technology stack for data incorporation and storage. The project's information architecture can be set only after the final requirements have been evaluated against the technology stack.

Below are a few business needs and the corresponding technology stack:

Real-time: Real-time quotes are very important for e-commerce organizations. They require tracking real-time actions and presenting offers based on predictive analysis results. Storm, Flume and Lambda are some of the technologies that provide this.

Accessibility: This is vital when consuming data from social media. The technology should ensure that no data is lost from a real-time stream. A data redundancy plan should be built into the project. A messaging queue such as Apache Kafka can be used to hold incoming information.

Multi-tenancy: Another important aspect is the ability to isolate the information and resources of different user groups. Big data solutions must be capable of supporting multi-tenancy scenarios. Consumer data, feedback and insights are sensitive and extremely important, so data isolation is vital to fulfil confidentiality requirements.

Security logs: HBase or Cassandra, with their flexible column families, can be used to process unstructured web logs or security logs.
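
To illustrate why a flexible column layout suits web or security logs, the sketch below parses two log lines into rows that do not share the same set of columns. It uses only the Python standard library. The log format (Apache common log format with an optional referrer and user agent), the sample lines and the function name `parse_log_line` are assumptions made for illustration, and the dictionaries merely stand in for rows in a store such as HBase or Cassandra.

```python
import re

# Assumed format: Apache common log format, optionally followed by
# a quoted referrer and a quoted user agent (hypothetical sample data).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict holding only the columns present in this line, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # unparseable lines are left for the raw copy
    # Drop missing or placeholder values so each row stays sparse.
    return {k: v for k, v in match.groupdict().items() if v not in (None, "-")}


lines = [
    '10.0.0.1 - alice [14/Jul/2015:10:00:00 +0000] "GET /login HTTP/1.1" 200 512',
    '10.0.0.2 - - [14/Jul/2015:10:00:05 +0000] "POST /cart HTTP/1.1" 302 - '
    '"http://example.com/" "Mozilla/5.0"',
]
rows = [parse_log_line(line) for line in lines]
for row in rows:
    print(sorted(row))
```

The first row carries a user and a response size but no referrer, while the second carries a referrer and user agent but no user. A fixed relational schema would need nullable columns for every such field; a column-family store simply omits the cells that are absent.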

Use data lake to keep data before sending to data warehouse

Conventionally, a company gathered data, cleaned it and stored it. For example, if the data source was an HTML file, only the text was extracted and stored, and the other information in the HTML file was lost when the data was stored in the data warehouse. The appeal of this earlier approach was that the data was in a clean, usable format and could be used as required. With the arrival of big data, however, a data lake is used to store the data in its original format, so that it can be provided in that format whenever it is found to be useful. This preserves the data with all the information that might help in analysis.

Clean the Data

It is advisable to clean a copy of the data and keep the original file in its native format. For example, a text file can contain plenty of noise that obscures important information. It is good practice to remove noise such as whitespace and symbols and to convert casual text into formal language. The spoken language should be identified and categorized separately. Duplicate information should be removed.

Recover Valuable Data

Part-of-speech tagging can be used to find general entities such as persons, companies and locations, and the connections among them. These techniques belong to natural language processing and semantic analysis. From this, a frequency matrix can be built to understand word trends and patterns in the text.
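
A minimal sketch of the cleaning and frequency-matrix steps described above, using only the Python standard library. The sample feedback texts and the helper names `clean_text` and `frequency_matrix` are hypothetical. A real pipeline would typically add part-of-speech tagging with an NLP library, which is omitted here.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase, replace symbols with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # symbols become spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


def frequency_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in document i."""
    counts = [Counter(doc.split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


raw_feedback = [
    "Great   product!!! Fast delivery :)",
    "great product!!!  fast delivery :)",      # duplicate once cleaned
    "Delivery was slow, product is OK...",
]

# Clean a copy and leave raw_feedback untouched, as the text advises.
cleaned = [clean_text(t) for t in raw_feedback]
# Remove duplicates while preserving the original order.
unique = list(dict.fromkeys(cleaned))

vocabulary, matrix = frequency_matrix(unique)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row of the matrix is one document and each column is one word, so words that recur across documents (here "delivery" and "product") stand out as columns with several non-zero entries.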

Ontology Assessment

Through analysis, connections among sources and entities can be established to create a specific structured database. It may be a time-consuming task, but the insights obtained can be significant to any business.

Data Modeling and Text Mining

Similarities and differences in consumer behavior can be found with data modeling and text mining tools, which helps in designing campaigns. The nature of consumers can be identified through sentiment analysis of their opinions and feedback.

Data should be classified and segmented once the database has been created. This takes less time with supervised and unsupervised machine learning techniques such as:

K-means

Logistic Regression

Naïve Bayes

Support Vector Machine Algorithms
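
As a rough illustration of the unsupervised side of this list, here is a small K-means sketch in plain Python that segments customers into two groups. The customer figures (orders per year, average order value), the starting centroids and the function name `k_means` are all assumptions for the example. Production work would normally rely on a tested library implementation.

```python
import math


def k_means(points, centroids, iterations=10):
    """Basic K-means: alternate assignment and centroid update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for centroid, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                )
            else:
                new_centroids.append(centroid)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (orders per year, average order value)
customers = [(2, 20), (3, 25), (4, 22), (20, 90), (22, 95), (25, 100)]
centroids, clusters = k_means(customers, centroids=[(0, 0), (30, 100)])
print(centroids)
print(clusters)
```

With these sample figures the occasional low-value buyers and the frequent high-value buyers end up in separate clusters, which is the kind of segmentation the text refers to. The result of K-means depends on the starting centroids, so the fixed starting points here are a simplification.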

Impact Measurement

It is important that analysis results are shared in tabular and graphical formats and that they give actionable insights. Information should be rendered so that it can be accessed and used on a handheld device or a web-based tool, which helps end users make the most of the results. ROI should be measured in terms of investment and cost, and also in terms of improvement in process efficiency and effectiveness.

The real value lies in using data analysis for a 360-degree insight that combines the analysis of structured and unstructured data. Structured data can forecast consumer behavior, and unstructured data analysis can reveal the motive behind that behavior. Fresh data sources such as social platforms are vital to companies because they offer unique information that can be analyzed. Data scientists need to equip themselves with new and appropriate skills to analyse unstructured data.

www.orchestrate.com

1330, Capital Parkway, Carrollton, Texas 75006

[email protected] | Toll Free: 800-232-5130

About Orchestrate

Orchestrate is a US-based business process management organisation headquartered in Dallas, USA. Orchestrate caters to the diverse outsourcing requirements of clients in a wide range of businesses, including IT, finance, mortgage, utilities and healthcare. Orchestrate is continuously motivated to add value to clients' businesses through efficient back-office practices and significant cost savings.