
UNIT I

INTRODUCTION TO BIG DATA

Big Data – Definition, Characteristic Features – Big Data Applications – Big Data vs. Traditional Data – Risks of Big Data – Structure of Big Data – Challenges of Conventional Systems – Web Data – Evolution of Analytic Scalability – Evolution of Analytic Processes, Tools and Methods – Analysis vs. Reporting – Modern Data Analytic Tools.

1.1 DEFINITION:

Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.

1.2 CHARACTERISTICS OF BIG DATA:

(i) Volume

The name 'Big Data' itself refers to an enormous size. The size of data plays a crucial role in determining its value, and whether particular data can actually be considered big data depends on its volume. Hence, volume is one characteristic that must be considered when dealing with big data.

(ii) Variety

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only data sources considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses issues for storing, mining and analyzing data.

(iii) Velocity

The term 'velocity' refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines the real potential in the data. Big data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

(iv) Variability

This refers to the inconsistency that data can show at times, hampering the process of handling and managing the data effectively.


1.3 BIG DATA APPLICATIONS:

Some of the industries propelled by big data analytics are:

Public sector services, healthcare, learning services, insurance, manufacturing and natural resources, transportation, and banking and fraud detection.

Real-Time Big Data Applications:

a) Procurement with Big data

With big data, demand can be forecast properly under different market conditions.

b) Big data in Product development

Big data indicates what products should be developed to increase sales.

c) Big data in manufacturing sector

Big data can be used to identify machinery and process variations that may be indicators of quality problems.

d) Big data for product distribution

Analysis of the available data ensures proper distribution of products in the proper markets.

e) Big data in Marketing field

Big data helps in identifying better marketing strategies that can increase sales.

f) Price Management using Big data

To maintain position in the market, price management plays a key role, and big data helps businesses understand market trends for it.

g) Merchandising

Big data also plays a major role in merchandising and sales in the retail market.


h) Big data in Sales

It helps in increasing sales for the business. It also helps in optimizing the assignment of sales resources and accounts, the product mix and other operations.

i) Store Operations using Big Data

Different tools can be used to monitor store operations, reducing manual work. Big data helps in adjusting inventory levels on the basis of predicted buying patterns and the study of demographics, weather, key events and other factors.

j) Big data in Human Resources

Big data has changed the way recruitment and other HR operations are carried out. You can also find out the characteristics and behaviors of successful and effective employees, as well as other employee insights, to manage talent better.

k) Big data in Banking

Big data has given companies like Citibank the opportunity to see the big picture by delivering value to clients while prioritizing the privacy and protection of sensitive information. It has been fully adopted by many companies to drive business growth and enhance the services they provide to customers. Tax authorities, too, have benefited from big data.

l) Big data in Finance sector

Financial services have widely adopted big data analytics to inform better investment decisions with consistent returns. For financial services, the big data pendulum has swung from passing fad to large-scale deployment.

m) Big data in Telecom

A recent report, “Global Big Data Analytics Market in Telecom Industry 2014-2018,” found that the use of data analytics tools in the telecom sector is expected to grow at a compound annual growth rate of 28.28 percent over the next four years. Mobile telecom operators harness big data with combined Actuate and Hadoop solutions.

n) Big data in retail sector

Retailers harness big data to offer consumers personalized shopping experiences. Analyzing how a customer came to make a purchase, or the path to purchase, is one way big data technology is making a mark in retail. Sixty-six percent of retailers have made financial gains in customer relationship management through big data.


o) Big data in HealthCare

Big data is used for analyzing data in the electronic medical record (EMR) system with the goal of reducing costs and improving patient care. This data includes unstructured data from physician notes, pathology reports, etc. Big data and healthcare analytics have the power to predict, prevent and cure diseases.

p) Big data in Media and Entertainment

Big data is changing the media and entertainment industry, giving users and viewers a much more personalized and enriched experience. Big data is used for increasing revenues, understanding real-time customer sentiment, increasing marketing effectiveness and ratings and viewership.

q) Big Data in tourism

Big data is transforming the global tourism industry. People know more about the world than ever before, and with the help of big data they have much more detailed itineraries these days.

r) Big data in Airlines

Big data and analytics give wings to the aviation industry. An airline now knows where a plane is headed, where a passenger is sitting, and what a passenger is viewing on the in-flight entertainment (IFE) or connectivity system.

s) Big data in Social Media

Big data is a driving factor behind every marketing decision made by social media companies and it is driving personalization to the extreme.

1.4 BIG DATA VS TRADITIONAL DATA:

The major differences between traditional data and big data are discussed below.

Data architecture

Traditional data uses a centralized database architecture in which large and complex problems are solved by a single computer system. Centralized architecture is costly and ineffective for processing large amounts of data. Big data is based on a distributed database architecture, in which a large block of data is divided into several smaller blocks and the solution to a problem is computed by several different computers in a network. The computers communicate with each other to find the solution. The distributed database provides better computing and lower cost, and also improves performance compared to the centralized database system. This is because centralized architecture is based on mainframes, which are not as economical as the microprocessors used in distributed database systems. The distributed database also has more computational power than the centralized database system used to manage traditional data.
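To make this divide-and-combine idea concrete, here is a minimal Python sketch (the sample documents are made up) that splits a word count across worker processes and then merges the partial results. It simulates on one machine what a distributed framework does across many.

from collections import Counter
from multiprocessing import Pool

def count_words(block):
    # "Map" step: each worker counts words in its own block of data.
    return Counter(block.split())

if __name__ == "__main__":
    blocks = ["big data is big",
              "data is divided into smaller blocks",
              "blocks are processed by different computers"]
    with Pool(processes=3) as pool:
        partial_counts = pool.map(count_words, blocks)
    # "Reduce" step: merge the partial results into one answer.
    total = sum(partial_counts, Counter())
    print(total.most_common(3))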

Types of data

Traditional database systems are based on structured data, i.e. traditional data is stored in fixed formats or fields in a file. Examples of structured data stores include relational database management systems (RDBMS) and spreadsheets, which only answer questions about what happened. A traditional database therefore provides insight into a problem only at a small scale. To enhance an organization's ability to gain deeper insight into its data, and to learn from metadata, unstructured data is used. Big data uses semi-structured and unstructured data and improves the variety of data gathered from different sources such as customers, audiences or subscribers. After collection, big data transforms it into knowledge-based information.

Volume of data

A traditional database system can store only a relatively small amount of data, ranging from gigabytes to terabytes. Big data, however, helps to store and process very large amounts of data, consisting of hundreds of terabytes or petabytes and beyond. Storing this massive amount of data on commodity systems reduces the overall cost of storage and helps provide business intelligence.

Data schema

Big data uses a dynamic schema for data storage. Both unstructured and structured information can be stored, and any schema can be used, since the schema is applied only after a query is generated: big data is stored in raw form and the schema is applied only when the data is to be read (schema-on-read). This approach preserves the information present in the data. The traditional database, in contrast, is based on a fixed schema that is static in nature: the schema is applied and validated during write operations (schema-on-write), so the structure of stored data cannot easily be changed once it is saved.
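A few lines of Python sketch the contrast (the in-memory list standing in for a raw data store is a deliberate simplification): records of different shapes are accepted as-is at write time, and a schema chosen by the reader is imposed only when the data is read.

import json

raw_store = []  # stands in for a raw big data store (e.g. files in a data lake)

def write_raw(record_json):
    # Schema-on-read: nothing is validated or reshaped at write time.
    raw_store.append(record_json)

def read_with_schema(fields):
    # The schema is applied only now, when the data is read.
    for line in raw_store:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}

write_raw('{"user": "alice", "age": 30, "city": "Chennai"}')
write_raw('{"user": "bob", "device": "mobile"}')  # a different shape is still accepted

for row in read_with_schema(["user", "age"]):
    print(row)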

Data relationship

In a traditional database system, relationships between data items can be explored easily because the amount of information stored is small. Big data, however, contains massive, voluminous data, which increases the difficulty of figuring out the relationships between data items.

Scaling

Scaling refers to the demand for resources and servers required to carry out a computation. Big data is based on a scale-out architecture, in which distributed approaches to computing are employed across more than one server, so the load of the computation is shared among many machines rather than carried by a single application server. Achieving scalability in a traditional database, by contrast, is very difficult because the traditional database runs on a single server and requires expensive hardware to scale up.

Higher cost of traditional data

A traditional database system requires complex and expensive hardware and software in order to manage large amounts of data. Moving data from one system to another also requires additional hardware and software resources, which increases cost significantly. In the case of big data, the massive data set is partitioned across various systems, so the amount of data each system handles decreases. Big data systems are therefore comparatively simple, making use of commodity hardware and open source software to process the data.

Accuracy and confidentiality

Under a traditional database system it is very expensive to store massive amounts of data, so not all of the data can be stored. This decreases the amount of data available for analysis, which in turn decreases the accuracy and confidence of the results. In big data systems the cost of storing voluminous data is lower, so the data is retained in full and points of correlation are identified, which yields highly accurate results.

1.5 RISKS OF BIG DATA:

Data Security

This risk is obvious and often uppermost in our minds when we are considering the logistics of data collection and analysis. Data theft is a rampant and growing area of crime – and attacks are getting bigger and more damaging.

The bigger your data, the bigger the target it presents to criminals with the tools to steal and sell it. In the case of Target, hackers stole credit and debit card information of 40 million customers, as well as personal identifying information such as email and geographical addresses of up to 110 million people. In March, a federal judge approved a settlement in which Target would pay $10 million into a settlement fund, from which payments of up to $10,000 would be made to everyone affected by the breach.

Data Privacy

Closely related to the issue of security is privacy. But in addition to ensuring that people’s personal data are safe from criminals, you need to be sure that the sensitive information you are storing and collecting isn’t going to be divulged through less malevolent but equally damaging misuse by yourself or by people to whom you have delegated responsibility for analyzing and reporting on it.


Failing to follow applicable data protection laws can lead to expensive lawsuits and even prison, depending on what sort of data you are using and the jurisdiction you are in. Last year, private hire and car sharing service Uber stirred up controversy when one of its executives was caught using the service’s “God mode” to track the movements of BuzzFeed journalist Johana Bhuiyan.

Costs

Data collection, aggregation, storage, analysis, and reporting all cost money. On top of this, there will be compliance costs, incurred to avoid falling foul of the issues raised in the previous point. These costs can be mitigated by careful budgeting during the planning stages, but getting it wrong at that point can lead to spiralling costs, potentially negating any value added to your bottom line by your data-driven initiative. This is why “starting with strategy” is so vital. A well-developed strategy will clearly set out what you intend to achieve and the benefits that can be gained, so they can be balanced against the resources allocated to the project. One bank that I worked with was so worried about the costs of storing and maintaining all the data it was collecting that it considered pulling the plug on one particular analytics project, as the costs looked likely to exceed any potential savings. By identifying and eliminating irrelevant data from the project, the bank was able to bring costs back under control and achieve its objectives.

Bad Analytics

Aka “getting it wrong.” Misinterpreting the patterns shown by your data and drawing causal links where there is in fact merely random coincidence is an obvious pitfall. Sales data may show a rise following a major sporting event, prompting you to draw a link between sports fans and your products or services, when in fact the rise is based on there being more people in town, and the rise would be equally dramatic after a large live music event.

In addition, care must be taken to avoid confirmation bias – easily imposed when an analyst comes to a project with predetermined ideas about what they are looking for and is blinded to insights from the data that go against these preconceived notions. The only way to protect against this is to ensure that you are implementing all best practice procedures from top to bottom throughout your project.

Google’s Flu Trends project serves as a good example. Designed to produce accurate maps of flu outbreaks based on the searches being made by Google users, at first it provided compelling results. But as time went on, its predictions began to diverge increasingly from reality. It turned out that the algorithms behind the project just weren’t accurate enough to pick up anomalies such as the 2009 H1N1 pandemic, vastly reducing the value that could be gained from them.

Bad Data

I’ve come across many data projects that start off on the wrong foot by collecting irrelevant, out-of-date, or erroneous data. This usually comes down to insufficient time being spent on designing the project strategy. The big data gold rush has led to a “collect everything and think about analyzing it later” approach at many organizations. This not only adds to the growing cost of storing the data and ensuring compliance, it leads to large amounts of data that can become outdated very quickly.

The real danger here is falling behind your competition. If you are not analyzing the right data, you won’t be drawing the right insights that provide value. Meanwhile, your competitors most likely will be running their own data projects, and if they are getting it right, they’ll take the lead. A healthcare client I recently worked with created a 217-page report for senior management. A lot of the data in the report would have been useful, but it was drowned out by irrelevant background noise. Working with them, I was able to show them how to cut the report down to 20 pages, mostly infographics, which clearly showed the relevant data while omitting much of the noise.

That’s just a simple checklist of the risks that every big data project needs to account for before one cent is spent on infrastructure or data collecting. Businesses of all sizes should engage wholeheartedly with big data projects. If they don’t, they run the serious risk of being left behind. But they also should be aware of the risks and enter into big data projects with their eyes wide open.

1.6 STRUCTURE OF BIG DATA:

Figure: Big Data structures, models and their linkage at different processing stages.


1.7 CHALLENGES OF CONVENTIONAL SYSTEMS:

In the past, the term ‘analytics’ has been used in the business intelligence world to provide tools and intelligence to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information. Data mining has been used in enterprises to keep pace with the critical monitoring and analysis of mountains of data. The main challenge for the traditional approach is how to unearth all the hidden information within this vast amount of data.

Traditional analytics analyzes known data terrain, and only data that is already well understood. It cannot work on unstructured data efficiently.

Traditional analytics is built on top of the relational data model: relationships between the subjects of interest are created inside the system and the analysis is done based on them. This approach is not adequate for big data analytics.

Traditional analytics is batch-oriented: we need to wait for nightly ETL (extract, transform and load) and transformation jobs to complete before the required insight is obtained (a minimal sketch of such a batch job follows the list below).

Parallelism in a traditional analytics system is achieved through costly hardware such as MPP (Massively Parallel Processing) systems.

There is inadequate support for aggregated summaries of data.

Apart from these, the remaining challenges can be categorized as follows.

Data challenges:
- Volume, velocity, veracity, variety
- Data discovery and comprehensiveness
- Scalability

Process challenges:
- Capturing data
- Aligning data from different sources
- Transforming data into a form suitable for analysis
- Modeling data (mathematically, through simulation)
- Understanding output, visualizing results and display issues on mobile devices

Management challenges:
- Security
- Privacy
- Governance
- Ethical issues

Traditional/RDBMS challenges:
- Designed to handle only well-structured data
- Traditional storage vendor solutions are very expensive
- Shared block-level storage is too slow; data is read in 8 KB or 16 KB block sizes
- Schema-on-write requires data to be validated before it can be written to disk
- Software licenses are too expensive
- Getting data from disk and loading it into memory requires an application
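To make the nightly-ETL point above concrete, here is a minimal sketch of a batch extract-transform-load job in Python; the file name, column names and SQLite warehouse are hypothetical stand-ins.

import csv
import sqlite3

def extract(path):
    # Extract: read raw records from a source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: drop incomplete rows and reshape the rest.
    return [(r["id"], float(r["amount"]), r["date"])
            for r in rows if r.get("amount")]

def load(rows, db_path):
    # Load: write the cleaned rows into the analytics database.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id TEXT, amount REAL, date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()

load(transform(extract("sales.csv")), "warehouse.db")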


1.8 WEB DATA:

In the world of Big Data, there's a lot of talk about unstructured data -- after all, "variety" is one of the three Vs.  Often these discussions dwell on log file data, sensor output or media content.  But what about data on the Web itself -- not data from Web APIs, but data on Web pages that were designed more for eyeballing than machine-driven query and storage?  How can this data be read, especially at scale?  Recently, I had a chat with the CTO and Founder of Kapow Software, Stefan Andreasen, who showed me how the company's Katalyst product tames data-rich Web sites not designed for machine-readability.

Scraping the Web:

If you're a programmer, you know that Web pages are simply visualizations of HTML markup -- in effect every visible Web page is really just a rendering of a big string of text.   And because of that, the data you may want out of a Web page can usually be extracted by looking for occurrences of certain text immediately preceding and following that data, and taking what's in between.

Code that performs data extraction through this sort of string manipulation is sometimes said to be performing Web "scraping."  The term pays homage to "screen scraping," a similar, though much older, technique used to extract data from mainframe terminal screen text.  Web scraping has significant relevance to Big Data: even in cases where the bulk of a Big Data set comes from flat files or databases, augmenting it with up-to-date reference data from the Web can be very attractive, if not outright required.
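As a minimal illustration of delimiter-based scraping, the Python sketch below takes whatever text sits between a known "before" marker and a known "after" marker; the HTML snippet is invented for the example.

def scrape_between(html, before, after):
    # Return the text between the first occurrence of `before` and
    # the next occurrence of `after`, or None if either is missing.
    start = html.find(before)
    if start == -1:
        return None
    start += len(before)
    end = html.find(after, start)
    return html[start:end] if end != -1 else None

html = '<div>Price: <span class="price">42.50</span> USD</div>'
print(scrape_between(html, '<span class="price">', '</span>'))  # prints 42.50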

Unlocking Important Data:

But not all data is available through downloads, feeds or APIs.  This is especially true of government data, various Open Data initiatives notwithstanding.  Agencies like the US Patent and Trademark Office (USPTO) and the US Securities and Exchange Commission (SEC) have tons of data available online, but API access may require subscriptions from third parties.

Similarly, there's lots of commercial data available online that may not be neatly packaged in code-friendly formats either.  Consider airline and hotel frequent flyer/loyalty program promotions.  You can log into your account and read about them, but just try getting a list of all such promotions that may apply to a specific property or geographic area, and keeping the list up-to-date.  If you're an industry analyst wanting to perform ad hoc analytical queries across such offers, you may be really stuck.

Downside Risk:

So it's Web scraping to the rescue, right?  Not exactly, because Web scraping code can be brittle.  If the layout of a data-containing Web page changes -- even by just a little -- the text patterns being searched may be rendered incorrect, and a mission-critical process may completely break down.  Fixing the broken code may involve manual inspection of the page's new markup, then updating the delimiting text fragments, which would hopefully be stored in a database, but might even be in the code itself.

Such an approach is neither reliable, nor scalable.  Writing the code is expensive and updating it is too.  What is really needed for this kind of work is a scripting engine which determines the URLs it needs to visit, the data it needs to extract and the processing it must subsequently perform on the data.  What's more, allowing the data desired for extraction, and the delimiters around it, to be identified visually, would allow for far faster authoring and updating than would manual inspection of HTML markup.
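One way to sketch that idea is to move the URLs and delimiters out of the code and into data, so that a layout change means editing a configuration record rather than a program. The job definition and stub fetcher below are hypothetical, and Katalyst itself works very differently (visually and interactively); this only illustrates the principle.

EXTRACTION_JOBS = [
    {"url": "https://example.com/rates",   # hypothetical page
     "field": "usd_inr",
     "before": '<td id="usd-inr">',
     "after": "</td>"},
]

def run_job(job, fetch):
    # The engine is generic; everything page-specific lives in `job`.
    html = fetch(job["url"])
    start = html.find(job["before"]) + len(job["before"])
    end = html.find(job["after"], start)
    return {job["field"]: html[start:end]}

# A stub fetcher keeps the sketch self-contained and runnable.
fake_pages = {"https://example.com/rates": '<td id="usd-inr">83.20</td>'}
for job in EXTRACTION_JOBS:
    print(run_job(job, fake_pages.get))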

An engine like this has been needed for years, but the rise of Big Data has increased the urgency, because this data is no longer needed just for simple and quick updates.  In the era of Big Data, we need to collect lots of this data and analyze it.

Making it Real:

Kapow Software's Katalyst product meets the spec, and then some.  It provides all the wish list items above: visual and interactive declaration of desired URLs, data to extract and delimiting entities in the page.  So far, so good.  But Katalyst doesn't just build a black box that grabs the data for you.  Instead, it actually exposes an API around its extraction processes, thus enabling other code and other tools to extract the data directly. 

That's great for public Web sites that you wish to extract data from, but it's also good for adding an API to your own internal Web applications without having to write any code.  In effect, Katalyst builds data services around existing Web sites and Web applications, does so without requiring coding, and makes any breaking layout changes in those products minimally disruptive.

Maybe the nicest thing about Katalyst is that it's designed with data extraction and analysis in mind, and it provides a manageability layer atop all of its data integration processes, making it perfect for Big Data applications where repeatability, manageability, maintainability and scalability are all essential.

Web Data is BI, and Big Data:

Katalyst isn't just a tweaky programmer's toolkit.  It's a real, live data integration tool.  Maybe that's why Informatica, a big name in BI which just put out its 9.5 release this week, announced a strategic partnership with Kapow Software.  As a result, Informatica PowerExchange for Kapow Katalyst will be made available as part of Informatica 9.5.  Version 9.5 is the Big Data release of Informatica, with the ability to treat Hadoop as a standard data source and destination. Integrating with this version of Informatica makes the utility of Katalyst in Big Data applications not merely a provable idea, but a product reality.


1.9 ANALYSIS vs. REPORTING:

There are five differences between reporting and analysis:

1. Purpose

Reporting has helped companies monitor their data since before digital technology boomed. Various organizations depend on the information it brings to their business, as reporting extracts that information and makes it easier to understand.

Analysis interprets data at a deeper level. While reporting can link cross-channel data, provide comparisons, and make information easier to understand (think of dashboards, charts, and graphs, which are reporting tools, not analysis outputs), analysis interprets this information and provides recommendations on actions.

2. Tasks

As reporting and analysis have a very fine line dividing them, it's sometimes easy to label a task as analysis when all it really does is reporting. Hence, ensure that your analytics team keeps a healthy balance of both.

Here’s a great differentiator to keep in mind for deciding whether what you’re doing is reporting or analysis:

Reporting includes building, configuring, consolidating, organizing, formatting, and summarizing. This is very similar to the examples mentioned above, like turning data into charts and graphs and linking data across multiple channels.

Analysis consists of questioning, examining, interpreting, comparing, and confirming. With big data, predicting is possible as well.
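The difference is easy to see in a few lines of pandas (the figures are invented): the first output is reporting, a consolidated summary to monitor; the second is a step of analysis, questioning whether ad spend moves with sales.

import pandas as pd

df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "region": ["North", "South"] * 3,
    "sales": [100, 80, 120, 90, 150, 95],
    "ad_spend": [10, 8, 14, 9, 20, 10],
})

# Reporting: consolidate, organize, summarize.
print(df.groupby("region")["sales"].sum())

# Analysis: question and interpret, e.g. does ad spend track sales?
print(df["sales"].corr(df["ad_spend"]))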

3. Outputs

Reporting and analysis have push and pull effects on their users through their outputs. Reporting has a push approach: it pushes information to users, and outputs come in the form of canned reports, dashboards, and alerts.

Analysis has a pull approach, where a data analyst pulls information to probe further and to answer business questions. Outputs can take the form of ad hoc responses and analysis presentations. Analysis presentations comprise insights, recommended actions, and a forecast of their impact on the company, all in a language that's easy to understand at the level of the user who'll be reading and deciding on it.

This distinction is important if organizations are to truly realize the value of their data: a standard report is not the same as meaningful analytics.


4. Delivery

Considering that reporting involves repetitive tasks, often with truckloads of data, automation has been a lifesaver, especially now with big data. It's not surprising that data entry services are among the first things outsourced, since outsourcing companies are perceived as data reporting experts.

Analysis requires a more custom approach, with human minds doing superior reasoning and analytical thinking to extract insights, and technical skills to provide efficient steps towards accomplishing a specific goal. This is why data analysts and scientists are in such demand these days, as organizations depend on them for the recommendations on which leaders and business executives base decisions about their businesses.

5. Value

This isn’t about identifying which one brings more value, but rather about understanding that both are indispensable when looking at the big picture. Together they should help businesses grow, expand, move forward, and make more profit or increase their value.

1.10 MODERN DATA ANALYTIC TOOLS:

Following are some of the prominent big data analytics tools and techniques that are used by analytics developers.

Cassandra:

This is one of the most applauded and widely used big data tools because it offers effective management of large and intricate amounts of data. Cassandra is a database that offers high availability and scalability without compromising performance on commodity hardware and cloud infrastructure. Its advantages include fault tolerance, decentralization, durability, performance, professional support, elasticity, and scalability. Because of these qualities it is popular with analytics developers. Companies using Cassandra include eBay and Netflix.
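A minimal connection sketch using the DataStax Python driver (cassandra-driver) looks like the following; the keyspace and table are hypothetical and a Cassandra node is assumed to be running locally.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # contact point for the local node
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)")
session.execute("INSERT INTO demo.users (id, name) VALUES (%s, %s)", (1, "alice"))
for row in session.execute("SELECT id, name FROM demo.users"):
    print(row.id, row.name)
cluster.shutdown()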

Hadoop:

This is a striking product from Apache which has been used by many eminent companies. Hadoop is an open-source software framework, written in Java, for working with very large data sets. It is designed to scale up from a single server to hundreds of machines. The most prominent feature of this software library is its superior processing of voluminous data sets, which is why many companies choose Hadoop. The developers provide regular updates and improvements to the product.
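Hadoop jobs are typically written in Java, but Hadoop Streaming lets any language supply the map and reduce steps. The sketch below shows the classic word-count logic in Python; in a real cluster the mapper and reducer would be separate scripts fed to the hadoop-streaming jar, and the sorted-input guarantee would come from Hadoop's shuffle phase rather than the local sort used here.

from itertools import groupby

def mapper(lines):
    # Emit (word, 1) for every word in the input split.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Pairs arrive grouped by key (Hadoop sorts between map and reduce).
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

data = ["big data big", "data sets"]
for word, total in reducer(sorted(mapper(data))):
    print(word, total)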


Knime:

This is an open source big data analytics tool. Knime is a leading analytics platform which provides an open solution for data-driven innovation. With the help of this tool, you can discover the hidden potential of your data, mine for fresh insights, and predict new futures by analysing the data. With nearly 1000 modules, hundreds of ready-to-run examples, a complete range of integrated tools, and a collection of advanced algorithms, the Knime analytics platform is certainly a strong toolbox for any data scientist who wants to accomplish the job in a hassle-free way. The tool supports many data types, such as XML, JSON, images, documents, and more, and it includes advanced predictive and machine learning algorithms.

OpenRefine:

Are you stuck with large and voluminous data sets? Then this tool is ideal for you: it helps you explore huge and messy data sets easily. OpenRefine helps to organize data in a database that was previously a mess and muddle. The tool helps you clean data and transform it from one format into another, and it can also be used to link and extend your datasets with web services and other external data. OpenRefine was earlier known as Google Refine, but Google stopped supporting the project in 2012 and it was rebranded as OpenRefine.

R language:

R is an open source programming language which helps organizations manage and analyse large amounts of data effectively and aptly. The language was initially written by Ross Ihaka and Robert Gentleman, and it has received immense appreciation from the mathematicians, statisticians, data scientists and data miners working in data analytics. R is packed with a host of data analysis tools which make the analysis of data simpler for users. With R, businesses don't need to develop customized tools, and they can easily get rid of time-consuming code. R is a premier data analysis environment, consisting of innumerable algorithms designed for data retrieval, processing, analysis and high-end statistical graphics representation.

Plotly:

As a successful big data analytics tool, Plotly is used to create great dynamic visualizations even when the organization has inadequate time or skills for meeting big data needs. With the help of this tool, you can create stunning and informative graphics very effortlessly. Plotly is used for composing, editing, and sharing interactive data visualizations via the web.
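A minimal sketch with the open source Plotly Python library (the data points are invented); fig.show() renders the chart interactively in a browser or notebook.

import plotly.graph_objects as go

fig = go.Figure(
    data=go.Scatter(x=[1, 2, 3, 4], y=[10, 14, 9, 17], mode="lines+markers")
)
fig.update_layout(title="Monthly visits (sample data)")
fig.show()  # opens an interactive chart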

Bokeh:

This tool has many resemblances to Plotly and is very effective and useful if you want to create easy and informative visualizations. Bokeh is a Python interactive visualization library which helps you create astounding and meaningful visual presentations of data in web browsers. Thus, this tool is widely used by experienced big data analytics people to create interactive data applications, dashboards, and plots quickly and easily. Many data analytics experts regard Bokeh as one of the most progressive and effective visual data representation tools.
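A minimal Bokeh sketch (assuming a recent Bokeh release, where the keyword is legend_label); show() writes an interactive HTML page and opens it in the web browser.

from bokeh.plotting import figure, output_file, show

output_file("demo.html")  # the interactive plot is saved as HTML
p = figure(title="Sample trend", x_axis_label="day", y_axis_label="value")
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], legend_label="series A", line_width=2)
show(p)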

Neo4j:

Neo4j is one of the leading big data analytics tools, taking the big data business to the next level. Neo4j is a graph database management system developed by Neo4j Inc. The tool helps you work with data and the connections between data items. These connections drive modern intelligent applications, and Neo4j transforms them into competitive advantage. According to the DB-Engines ranking, Neo4j is the most popular graph database.
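A minimal sketch with the official neo4j Python driver; the bolt URI and credentials are assumptions, and a Neo4j server is assumed to be running.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Create two nodes and the connection (relationship) between them.
    session.run("CREATE (:Person {name: 'Alice'})-[:KNOWS]->(:Person {name: 'Bob'})")
    # Query the graph by following connections rather than joining tables.
    result = session.run(
        "MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS a, b.name AS b")
    for record in result:
        print(record["a"], "knows", record["b"])
driver.close()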

Rapidminer:

This is certainly one of the favourite tools of data specialists. Like Knime, RapidMiner is an open source data science platform which operates through visual programming. The tool can manipulate, analyse and model data and integrate it into business processes. RapidMiner helps data science teams become more productive by providing an open source platform for data preparation, machine learning, and model deployment. Its unified data science platform accelerates the building of complete analytical workflows: from data preparation to machine learning to model deployment, everything can be done in a single environment. This enhances efficiency and shortens the time needed for data science projects.

Wolfram Alpha:

If you want to do something new with your data, then this could be an ideal tool for you, as it gives you every minute detail of your data. This famous tool was developed by Wolfram Alpha LLC, a subsidiary of Wolfram Research. If you want to do advanced research in financial, historical, social, and other professional areas, then you should use this platform. For example, if you type "Microsoft", you will receive miscellaneous information including input interpretation, fundamentals, financials, latest trade, price, performance comparisons, return analysis, and much more relevant information.

Orange:

Orange is an open source data visualization and data analysis tool which can be used by both novices and experts in the field of data analytics. The tool provides interactive workflows with a large toolbox, with which you can create workflows to analyse and visualize data. Orange is crammed with many different visualizations: from scatter plots, bar charts and trees to dendrograms, networks and heat maps, you can find everything in this tool.


NodeXL:

This is a data visualization and analysis software tool for relationships and networks, offering exact calculations to its users. It is a free and open-source network analysis and visualization tool with a wide range of applications, and it is considered one of the best statistical tools for data analysis, providing advanced network metrics, automation, access to social media network data importers, and much more.

Storm:

Storm has made its name as one of the popular data analytics tools because of its superior capability for processing streaming data in real time. You can integrate this tool with many other tools, such as Apache Slider, to manage and secure your data. Storm can be used by an organization in many cases, such as data monetization, cybersecurity analytics, threat detection, operational dashboards, and real-time customer management. All these functions can enhance your business growth and give you many opportunities to improve your business.

The list above should give you enough information about some of the best data analytics tools, which are likely to remain prominent in the upcoming years. If you want to establish your business firmly, it is worth deepening your knowledge of these tools.
