An Early Adopter’s Guide to Hadoop

Insights: A SAS publication



Table of contents

How do you know if you’re ready for Hadoop?
Hadoop survey results reveal use cases, needs and trends
Big data management: 5 things you need to know
Hadoop: The new boardroom buzzword – An interview with SAS and Hortonworks
Using Hadoop and SAS® for network analytics to build a customer-centric telecom service
The real scoop on Hadoop – An interview with Cloudera

How do you know if you’re ready for Hadoop?

Hear from an early adopter about training, use cases and analyzing consumer data in Hadoop

By Alison Bolen, SAS Insights Editor


Without a doubt, Bob Zurek meets the definition of an early adopter.

The Amazon Echo in his living room can answer questions about his

morning commute or rattle off player stats from yesterday’s Red Sox

game. The Apple Watch™ on his wrist tracks the number of steps he

takes each day and chimes to remind him about his evening activities.

The Nest thermostat in his home adjusts automatically based on the

time of day and the location of everyone in the house.

Thinking about these new devices, the data streaming from each one, and all the advancements that are yet to come, Zurek likes to tell his kids, “You are in for an amazing adventure.”

That philosophy and excitement for new technology carries over into his job, where Zurek oversees a large Hadoop implementation at Epsilon. As Vice President of Products, Zurek is responsible for Epsilon’s line of digital marketing and customer loyalty offerings.

One of Epsilon’s most popular products is a digital messaging platform called Agility Harmony that some of the world’s largest brands use to store and process customer data, so they can create meaningful connections with their customers via mobile devices and other online channels. Combining the power of Hadoop and SAS, Agility Harmony gives Epsilon’s customers a complete view of individual consumers for digital campaigns.

I spoke with Zurek recently about his Hadoop experience and his advice for organizations who are considering a move to Hadoop as a storage and analytics platform.

Let’s start with your overall outlook on Hadoop and analytics. How are things starting to change?

Bob Zurek: Hadoop is becoming pretty mainstream pretty fast, and people are getting skilled up quickly. One thing that’s great to see is the amount of innovation that data scientists and technology leaders are applying to help solve problems with Hadoop. For example, the security around Hadoop has continued to improve.

Early on, Cloudera and Hortonworks turned their attention to making sure that organizations could rely on Hadoop day in and day out, and feel a sense of security around the ecosystem. We need to continue to grow the security capabilities, since security is becoming an important part of the partnership ecosystem for the pioneers in Hadoop.

Just as SAS has focused on interoperability with Hadoop, many vendors are driving new technologies to embrace the capabilities of Hadoop. Whether it’s data integration solutions that move data in and out, or BI technologies that allow you to reach into Hadoop and do data discovery, visualization and analytics, the big technology ecosystem players have all embraced it well.

Learn more about SAS and Hadoop

Hadoop has good use cases now in every industry. It’s moving away from being just an ETL alternative or an alternative to a data warehouse, and getting into supporting more interactive applications. It’s not just batch in nature. It can support interactive applications and pure analytic applications.

How are you using analytics with Hadoop?

Zurek: We use Hadoop to capture consumer profile data and segment customer data on behalf of the brands that we support. Brands are using Agility Harmony to email people who have opted in to get digital communications, and each brand collects information and preferences from opt-in customers who have a special interest in the brand.

Someone at the brand might say, “I want to send an email to females with a passion for soccer who live within a 150-mile radius around Boston.” We use Hadoop to power the segmentation that has to go through advanced querying of the system. The brand can then align the campaign and quickly target consumers with a special deal.

In the meantime, we get a lot of event data coming back for each mailing, like what time it was opened, who clicked what, and from what device. That event data is sent back into the system, so you can tie those events back into the segmentation rules and, for example, look for consumers that have a propensity to transact for a specific type of campaign. So the next mailing might target female soccer players who are looking for a good deal on gear.
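As a rough illustration of the segmentation rule Zurek describes (the field names, data layout and coordinates here are hypothetical, not Epsilon’s actual schema), the filter can be sketched in a few lines of Python:

```python
from math import radians, sin, cos, asin, sqrt

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3959 * 2 * asin(sqrt(a))  # Earth radius is roughly 3,959 miles

BOSTON = (42.36, -71.06)

def segment(profiles, interest, radius_miles, center=BOSTON):
    """Select opted-in female consumers with a given interest near a city."""
    return [
        p for p in profiles
        if p["opted_in"]
        and p["gender"] == "F"
        and interest in p["interests"]
        and miles_between(p["lat"], p["lon"], *center) <= radius_miles
    ]

profiles = [
    {"opted_in": True, "gender": "F", "interests": {"soccer"}, "lat": 42.65, "lon": -71.14},  # Lowell, MA
    {"opted_in": True, "gender": "F", "interests": {"soccer"}, "lat": 40.71, "lon": -74.01},  # New York, NY
    {"opted_in": False, "gender": "F", "interests": {"soccer"}, "lat": 42.36, "lon": -71.06},
]

print(len(segment(profiles, "soccer", 150)))  # prints 1 (only the Lowell profile; New York is ~190 miles out)
```

In a production system this predicate would be pushed into the cluster as a distributed query rather than run over a Python list, but the segmentation logic is the same.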

Our data science team is full of SAS users. It’s great for machine learning, advanced analytics and solving complex problems. And our customers ask very complex questions of the data. For example, they might decide they want to send a breakfast special to people who open emails on cellphones near a particular location at 7 a.m. And we can do that.

Hadoop is often synonymous with the term big data. How big is your data, and how important is scalability?

Zurek: When you make a platform like Agility Harmony available to the top 50 retailers, banks or quick-serve restaurants, there’s a lot going on. We work with a lot of brands, and a single brand might have 2 million consumers stored in Hadoop. This is a cloud-based system that’s managing 40 billion pieces of communication over the course of a month. It has to scale pretty quickly.

One benefit of Hadoop is its ability to scale at heights we’ve never seen before at a very economical price model. There’s this whole notion of performance and scale as we deal with more structured, unstructured and semistructured data.

As we gather and store more and more information, that information is going to be valuable to consumers in ways we’ve never seen before. As consumers store more and more information on mobile devices, they want that information to be easily accessible and consumable. You need the scalability and robustness of technologies like Hadoop to do that.

How did you know you were ready for Hadoop? How might other organizations know if they are ready for Hadoop?

Zurek: We have very smart IT operational experts globally. For them, it was like, “Hadoop, no problem. We’ll take it on and manage it.” The developers here got excited and began to embrace it immediately too. You can almost hear the buzz when they start applying their algorithms and techniques to the data in Hadoop.

Pick a reliable distribution partner. We selected Cloudera three years ago because, at the time, they were in the best position to support us. We’ve seen great success in working with them, but if you’re a huge Oracle, IBM or SAS shop and have investments in that technology, these vendors are in a position to help too. They all have teams of Hadoop experts, and a lot of technology services companies can help as well.

It’s very important to identify core Hadoop use cases for your modernization project. Pick the easy ones. Take a baby step with Hadoop. Make sure you understand if you’ll use HBase or a SQL interface on top of Hadoop. Is it compatible with tools and existing coding models in your platform? If you’re looking at modernization, make sure you understand the use case and that Hadoop will support that use case.

As a final thought, what one recommendation do you have for analytics and Hadoop?

Zurek: Embrace change. Hadoop is a vehicle for change for your organization in a good way. We are seeing significant ROI from our investment in Hadoop, in the 10x range, from increased efficiencies and reduced costs. It’s a very good, strong economic return.


Hadoop survey results reveal use cases, needs and trends

By Anne-Lindsay Beall, SAS Insights Editor


In a recent TDWI survey of more than 300 data management

professionals, respondents were unanimous in their zeal for Hadoop.

But with adoption rates still fairly low, the burning question is, why?

How are companies using Hadoop? And why is implementing

Hadoop important?

What are companies doing with Hadoop?

The TDWI Best Practices Report Hadoop for the Enterprise by Philip Russom answers both questions – first by asking survey respondents to name the most useful applications of Hadoop if their organizations were to implement it (they were to select four or fewer). Here are the top responses:

Not surprisingly, data warehousing and BI are well represented, but non-DW/BI applications – such as archiving traditional data (19 percent), content management (17 percent) and operational applications (11 percent) – also showed up on the list. Russom points out that these applications are becoming more common among Hadoop users, a sign that Hadoop usage is diversifying across enterprises.


Why is Hadoop important?

To get a broader picture of Hadoop usage – and users’ unvarnished opinions – the TDWI survey also asked an open-ended question: “In your own words, why is implementing Hadoop important (or not important)?”

As you’ll see from the excerpts below, the respondents’ comments reveal a number of benefits, needs and trends:

“Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.”
— BI architect, telecom, Europe

“Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.”
— Vice president, food and beverage, Asia

“Our existing infrastructure cannot handle the tenfold increase in data volumes.”
— Data strategy manager, hospitality, US

“It’s important to realize the potential of big data and to explore new business opportunities.”
— Data specialist, consulting, Asia

The bottom line? Hadoop has enormous potential that’s only beginning to be tapped, and it’s on its way to becoming mainstream. Of those surveyed, 44 percent expect to have HDFS in production within 12 months, and another 14 percent will have it up and running within 24 months.

To learn more about benefits, barriers and best practices for Hadoop, download the full TDWI Best Practices Report: Hadoop for the Enterprise.


Big data management: 5 things you need to know

By David Loshin, President, Knowledge Integrity Inc.


As more organizations adopt big data platforms, concern mounts that

application development may suffer from the lack of good practices

for managing the data powering those applications. When we talk

about big data management in relation to big data platforms (like

those combining commodity hardware with Hadoop), it’s clear that

big data technologies have created a need for new and different data

management tools and processes. Here are five things you need to

know about big data management that will help ensure consistency

and trust in your analytic results.

Business users can do some big data management by themselves

One of the mantras of big data is availability – enabling access to numerous massive data sets in their original formats. Today’s business users, who are more adept than their predecessors, often want to access and prepare the data in its raw format rather than having it fed to them through a chain of operational data stores, data warehouses and data marts. Business users want to scan the data sources and craft their reports and analyses around their own business needs.

Supporting business user self-service for big data has two big data management implications:

• To permit data discovery, users will have to be allowed to peruse the data independently.

• Users will need data preparation tools to assemble the information from the numerous data sets and present it for analysis.

It’s not your parents’ (or grandparents’) data model

Our conventional approach to capturing and storing data for reporting and analysis centers on absorbing data into a predefined structure. But in the big data management world, the expectation is that both structured and unstructured data sets can be ingested and stored in their original (or raw) formats, eschewing the use of predefined data models. The benefit is that different users can adapt the data sets in the ways that best suit their needs.

To reduce the risk of inconsistency and conflicting interpretations, though, this suggests the need for good practices in metadata management for big data sets. That means solid procedures for documenting the business glossary, mapping business terms to data elements, and maintaining a collaborative environment to share interpretations and methods of manipulating data for analytical purposes.
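A minimal sketch of that kind of shared metadata might map business glossary terms to the data elements and agreed-upon interpretation rules behind them (the terms, element names and rules below are invented for illustration):

```python
# A tiny, hypothetical business glossary: each term records the raw data
# element it maps to and the rule everyone agrees to use when interpreting it.
GLOSSARY = {
    "active_customer": {
        "element": "crm.last_purchase_date",
        "rule": "purchased within the trailing 12 months",
    },
    "net_revenue": {
        "element": "sales.amount",
        "rule": "gross amount minus returns and discounts",
    },
}

def describe(term):
    """Return the shared interpretation of a business term, or raise if undefined."""
    entry = GLOSSARY.get(term)
    if entry is None:
        raise KeyError(f"'{term}' is not in the business glossary; define it before use")
    return f"{term}: {entry['element']} ({entry['rule']})"

print(describe("net_revenue"))  # prints "net_revenue: sales.amount (gross amount minus returns and discounts)"
```

The point is not the data structure itself but the discipline: every analysis resolves a business term through one shared definition instead of each user inventing a private one.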

Quality is in the eye of the beholder

In conventional systems, data standardization and cleansing are applied prior to storing the data in its predefined model. One of the consequences of big data is that providing the data in its original format means no cleansing or standardizations are applied when the data sets are captured.

While this provides greater freedom in the way data is used, it becomes the users’ responsibility to apply any necessary data transformations. So, as long as user transformations don’t conflict with each other, data sets may be easily used for different purposes. This implies the need for methods to manage the different transformations and ways to ensure that they don’t conflict. Big data management must incorporate ways to capture user transformations and ensure that they are consistent and support coherent data interpretations.
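One simple way to capture user transformations so they stay consistent (a sketch of the idea, not any particular product’s mechanism; the names are hypothetical) is to register each named transformation once and refuse conflicting redefinitions:

```python
# Registry of named, shared transformations. Registering the same name twice
# with different logic is refused, which surfaces conflicting interpretations early.
TRANSFORMS = {}

def register(name, fn):
    if name in TRANSFORMS and TRANSFORMS[name] is not fn:
        raise ValueError(f"transformation '{name}' already registered with different logic")
    TRANSFORMS[name] = fn
    return fn

register("normalize_state", lambda s: s.strip().upper()[:2])

def apply(name, value):
    """Apply a registered transformation by name, so all users share one definition."""
    return TRANSFORMS[name](value)

print(apply("normalize_state", " massachusetts "))  # prints "MA"
```

Every analysis that needs the transformation calls it by name, so the definition lives in one place and conflicts become errors rather than silent inconsistencies.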

Understanding the architecture improves performance

Big data platforms rely on commodity processing and storage nodes for parallel computation using distributed storage. Yet if you remain unfamiliar with the details of a SQL-on-Hadoop engine’s query optimization and execution models, you may be unpleasantly surprised by unexpectedly poor response times.

For example, complex JOINs may require that chunks of distributed data sets be broadcast to all computing nodes – causing huge amounts of data to be injected into the network and creating a significant performance bottleneck. The upshot is that understanding how the big data architecture organizes data and how the database execution model optimizes queries will help you write data applications with reasonably high performance.
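The alternative to broadcasting the large side is the pattern a broadcast join exploits: copy only the small table to every node and join the large table locally. A toy single-process sketch (with made-up tables) of that hash-join step:

```python
# Hypothetical tables: a large "fact" table of events and a small "dimension"
# table of stores. Broadcasting the small side means only the small table
# crosses the network; the large fact table never moves between nodes.
facts = [
    {"store_id": 1, "amount": 10.0},
    {"store_id": 2, "amount": 7.5},
    {"store_id": 1, "amount": 3.0},
]
stores = [{"store_id": 1, "city": "Boston"}, {"store_id": 2, "city": "Athens"}]

# "Broadcast": build a lookup table once, as each worker node would.
store_by_id = {s["store_id"]: s for s in stores}

# Local hash join on each partition of the fact table; no shuffle of facts.
joined = [
    {**f, "city": store_by_id[f["store_id"]]["city"]}
    for f in facts
    if f["store_id"] in store_by_id
]

print(joined[0]["city"])  # prints "Boston"
```

Query optimizers typically choose between this plan and a shuffle join based on table size estimates, which is exactly why knowing the execution model helps you predict performance.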

It’s a streaming world

In the past, much of the data that was collected and consumed for analytical purposes originated within the organization and was stored in static data repositories. Today, there is an explosion of streaming data. We have human-generated content such as data streamed from social media channels, blogs and emails. We have machine-generated data from myriad sensors, devices, meters and other Internet-connected machines. And we have automatically generated streaming content such as web event logs. All of these sources stream massive amounts of data and are prime fodder for analysis.

This is the crux of the issue. Any big data management strategy must include technology to support stream processing that scans, filters and selects the meaningful information for capture, storage and subsequent access.

David Loshin, President of Knowledge Integrity Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. Loshin is a prolific author on data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers and web seminars.
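That scan-filter-select step can be sketched as a simple generator pipeline; the event shape and threshold here are invented for illustration:

```python
def meaningful(events, min_value=100):
    """Scan a stream of events and keep only the ones worth storing."""
    for event in events:
        if event.get("type") == "error" or event.get("value", 0) >= min_value:
            yield event

# A small simulated stream; in practice this would be an unbounded source
# such as a message queue of sensor readings or web event logs.
stream = [
    {"type": "reading", "value": 12},
    {"type": "error", "value": 0},
    {"type": "reading", "value": 250},
]

kept = list(meaningful(stream))
print(len(kept))  # prints 2
```

Because the filter is a generator, it processes events one at a time and never needs the full stream in memory, which is the property a real stream processor scales up across a cluster.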

Considerations for big data management

Managing big data not only subsumes many of the conventional approaches to data modeling and architecture, it entails a new cadre of technologies and processes to enable broader data accessibility and usability. A big data management strategy must embrace tools enabling data discovery, data preparation, self-service data accessibility, collaborative semantic metadata management, data standardization and cleansing, and stream processing engines. Being aware of these implications can dramatically speed the time-to-value of your big data program.


Hadoop: The new boardroom buzzword

An interview with SAS and Hortonworks


A few years ago, the open source data platform Hadoop was the domain of first mover technology companies and entrepreneurial analysts who, frustrated with the limitations of traditional data storage systems, experimented on their own desktops to create new analytical models.

Today, the quirky moniker is part of boardroom banter and forms the technology base for one of SAS’ latest strategic alliances. We sat down for a conversation with Simon Gregory, EMEA Business Development Director at Hortonworks, a software company specializing in developing applications for, and ensuring support of, the Hadoop platform. Along with Adrian Jones from SAS’ Global Center of Excellence for Information Management and Analytics, we set out to discover what the buzz is about.

Why is Hadoop different?

Adrian Jones: Hadoop shot to the fore after a difficult financial period, and people were looking to challenge the marketplace. Early adopters were looking to gain flexibility at an attractive price point, because trying new things was a challenge within the constraints of the traditional data storage systems. So whether it is working with new data types, or with greater volumes instead of just samples, Hadoop is flexible and can cope with that. You can onboard more data quicker.

Simon Gregory: The types of data our customers try to drive into Hadoop are not the types of data that fit nicely into traditional systems or rows and columns. Hadoop lets you supplement existing systems with a new style of data, a complementary platform that actually supports this diversity of data. And this works well with the emergence of mobile and social media, where the focus is on unstructured data.

Why were legacy data storage systems not able to adapt to these demands?

Jones: People, analysts especially, have been doing things on their desktops and own servers that management did not know about, running models on Hadoop to complement their work. I have met analysts who spent 60 percent of their time just doing data management, not analytics – the traditional systems were focused on SLAs and other structural limitations, and it made it hard to join data together. IT departments were blissfully unaware for a long time; so were data warehouse vendors, who missed what the business was trying to accomplish because they were too focused on what IT was telling them.

Gregory: There was a sort of purchasing mechanism in the past in which line-of-business managers went to IT and said, “I would like you to build me a program based on this business problem.” What happened was that an application would be built that was perfectly capable of handling the type and volume of data, but it would be created in a silo. So what we hear our customers saying is that they ended up having a lot of data silos across their estate, and what they really want is a single point of entry, which the current application points are not giving them.

Jones: It is not like IT intended not to serve its business lines’ needs or wanted to create silos – it just happened over time. It is all a cultural thing – we need to educate and address the company culture, because in most companies we work the way we do due to legacies and processes which were determined by IT systems. We need to educate IT departments to become a more flexible service model with a degree of structure around it, while we educate business managers that this flexibility is actually good: they can innovate more. So you need to address the fears.

But what I see is that when you let analysts play with this, they get it – and BANG – it is an overnight turning point. But first, there is a period of convincing managers, gaining sponsors and advocates. So it is about opening people’s minds up to changing the way they have worked for sometimes 30 years.

How is Hadoop changing the way businesses collaborate around big data?

Gregory: What is really interesting now is seeing the collaboration between the CTO and the IT infrastructure office – and more and more, between the CMO and the chief data officer. This is a really interesting shift. Three years ago, Hadoop was the domain of mostly technology companies, but now you are seeing traditional, even conservative, companies starting to use it.

The business is starting to see data as an asset, and the minute they do this, they can see how Hadoop can help them in moving forward. It is about accepting data that shifts your customers’ perception of your products, exposes information you did not have access to before. The cultural part is critical – when the business sees this as important, it becomes important. Also, analytics is an asset – not back office. Both of these worlds need to go hand in hand. Data is worthless without data analytics and data access. Hadoop alone is not the answer – but in the broader ecosystem, including analytics, that is where it fits together.

Jones: You are seeing companies revisiting their data strategies and seeing that some companies are unloading everything but the primary data into Hadoop to use their resources most appropriately. This way they can still maintain the integration to legacy systems without the cost profile. Customers are talking about how Hadoop fits into their data strategies – from all angles. Hadoop changes the way an analyst thinks and works: It sets the analyst free again, and that is going to be very powerful. Creative analysts will do very well in this new paradigm!


Using Hadoop and SAS® for network analytics to build a customer-centric telecom service


OTE Cosmote, the largest mobile and wireline fixed-network operator in Greece, has expanded its relationship with SAS to strengthen its service quality and benchmarking, customer experience and service-performance capabilities for both its mobile and wireline fixed network. SAS for network analytics facilitates analyzing and visualizing huge amounts of network and customer data quickly and efficiently, helping OTE Cosmote deliver the best customer experience and remain a leader in the Greek telecommunications market.

Konstantinos Vlahodimitropoulos, Service Quality and Technology Management Director at OTE Cosmote, explains the project developed with SAS that combines analytics with the Hadoop storage system. He highlights the benefits of network analytics and data visualization in a business revolutionized by the emergence of mobile devices.

The telecommunications landscape is undergoing a period of profound transformation. What are the main challenges and opportunities for communications service providers?

Konstantinos Vlahodimitropoulos: The proliferation of smartphones has dramatically changed the telecommunications market as well as the user lifestyle. Today people communicate, build relationships and execute transactions on their mobile devices. For telecommunications companies, the new scenario represents a complex challenge and a major opportunity at the same time. The key challenge is to ensure the continuity and quality of service provided through the broadband networks. Customers have high expectations of network efficiency and user experience. The progressive erosion of margins is a critical point for telecommunications companies, so the need for savings keeps growing. OTE Cosmote is extremely focused on reducing operating costs, and investments are weighted to avoid wasting money and to maximize returns.

How can SAS® for network analytics help in the new scenario?

Vlahodimitropoulos: Analyzing big data is crucial to anticipate customer needs, ensure loyalty and prevent churn. Data visualization allows us to deliver relevant information to the users with drill-down ability and self-service reporting. At OTE Cosmote, the main need was updating the outdated data warehouse architectures with innovative and agile solutions. Our goal was to analyze growing volumes of network and customer data with greater speed, in seconds instead of minutes, accelerating the decision-making process. Hadoop and SAS have provided that ability.

What are the main areas the solution is addressing, and who are the main users?

Vlahodimitropoulos: SAS for network analytics was deployed in our product management department to gain valuable insights on service quality for mobile service and performance monitoring for fixed-line networks. This includes customer experience management initiatives to discover how services are used by each customer, and to gain additional insights and drill down to detailed activities to quickly discover and address customer concerns. For example, a particular project focused on the broadband measurements obtained through drive tests and the Ookla Speedtest app by OTE Cosmote users themselves. In addition to improving service quality, OTE Cosmote is using SAS for network analytics for competitive benchmarking across geographical areas, for performing advanced root-cause analysis of selected incidents and managing faults, and for monitoring and analyzing crucial network KPIs.

When customers turn on their mobile phones, they expect – or demand – good service. Providing numerous services on a distributed network throughout large geographical areas requires sophisticated analytics to monitor service quality and performance.

SAS for network analytics extends to our marketing/CRM department for customer profiling, churn prediction modeling, and optimization of cross-sell and up-sell activities. Additionally, to ensure fee compliance with the telecommunications market regulations, OTE is using SAS Activity-Based Management for managerial reporting related to billing.

What benefits are you seeing?

Vlahodimitropoulos: Using SAS and Hadoop, OTE Cosmote provides competitive service levels and compelling marketing messages. Through our technology investments, we’re seeing increased speed of analysis, reduced from minutes to seconds, on increasing volumes of data; accelerated decision making; and the opportunity to access reports independently, with an immediate, detailed view. Previously, the reporting was time consuming and required significant IT support. Today business users can easily explore relevant data on their own, without taking time away from the IT department.

How will OTE Cosmote use analytics in the future?

Vlahodimitropoulos: The next step will be the adoption of the SaaS model for SAS technologies to quickly extend the solutions to other countries within the OTE group, taking advantage of big data cost-efficient scalability.

The results illustrated in this article are specific to the particular situations, business models, data input, and computing environments described herein. Each SAS customer’s experience is unique based on business and technical variables and all statements must be considered non-typical. Actual savings, results, and performance characteristics will vary depending on individual customer configurations and conditions. SAS does not guarantee or represent that every customer will achieve similar results. The only warranties for SAS products and services are those that are set forth in the express warranty statements in the written agreement for such products and services. Nothing herein should be construed as constituting an additional warranty. Customers have shared their successes with SAS as part of an agreed-upon contractual exchange or project success summarization following a successful implementation of SAS software. Brand and product names are trademarks of their respective companies.


The real scoop on Hadoop

Cloudera’s Mike Olson talks latest trends, changes and your formula for success

By Anne-Lindsay Beall, SAS Insights Editor


Mike Olson is unquestionably an expert in Hadoop.

After selling his startup to Oracle in 2008, Olson

co-founded Cloudera to sell a version of Hadoop

that’s packed with the features and support

businesses need to get value out of big data.

Fresh off his executive roundtable at The Premier Business Leadership

Series, we sat down with Olson to find out how Hadoop technology – and the way businesses are using it – is changing.

Let’s start with your overall outlook on Hadoop and analytics. How are things changing around these technologies?

Mike Olson: The overarching theme, and the one driving all others, is innovation. Hadoop’s 10 years old; since Doug Cutting and Mike Cafarella wrote the initial versions of HDFS and MapReduce, there’s been an explosion of new projects expanding Hadoop’s capabilities. The list reads like a strange sort of bestiary: Pig, Hive, Sentry, ZooKeeper, Impala and 20 or 30 more. The capabilities span ingest and filtering, data quality, SQL, data flow, web-scale data serving, security and multitenancy, fast text search – a long list of enterprise-grade capabilities and powerful new ways to work with data at scale.

Of particular note, lately, is the advent of interactive and real-time services. Original Hadoop was batch-mode (and was much maligned for that). Today, you can do streaming data ingest, filter and alert on events as they happen, and train models and score events in real time. Tens of terabytes of data used to be expensive and hard to handle; today, we have customers with tens of petabytes of data.

How are you seeing companies use analytics with Hadoop? Have you seen analytics projects evolve since companies began implementing Hadoop?

Olson: Two major trends.

First, companies are combining data sets that have long been segregated. Digesting user behavior from web and mobile interactions, combining that with transaction flows from in-store and e-commerce sites, and adding interactions from calls, chats or emails to customer support was impossible before. We had separate systems for all of those data sets. Now, we can land them in a single place, and use a variety of analytic tools to collect and analyze them together.

Second, really powerful new analytic techniques are available. SAS users have been at the forefront of analytics for a long time – machine learning and high-powered statistics are familiar to them. With Spark Streaming, we’re now seeing those and other techniques applied in concert to complex event processing flows. The business users who get real-time results from these systems may not know or care what algorithms are hidden behind the curtain, but they have great application user interfaces supporting them in making better decisions based on data analyzed using those tools.

How does Hadoop fit into business modernization plans?

Olson: The core idea behind Hadoop – the insight that Google had, when it invented the technique – was that you could gang together large numbers of inexpensive industry-standard servers, and use their combined storage and CPU to catch, process and analyze more data, at dramatically lower cost than ever before. None of us in the database industry believed, back then, you could build a system big enough to ingest and store the whole Internet. Google ignored the impossibility and did it.
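The idea Olson describes (split the data across many cheap machines, process each piece locally, then combine the results) is the MapReduce pattern at the heart of original Hadoop. A toy, single-process sketch of word counting in that style:

```python
from collections import Counter
from functools import reduce

def map_phase(chunk):
    """Each 'node' counts words in its own local chunk of the data."""
    return Counter(chunk.split())

def reduce_phase(counters):
    """Partial counts from all nodes are merged into the final result."""
    return reduce(lambda a, b: a + b, counters, Counter())

# In a real cluster, each chunk would live on a different server's disk,
# and the map phase would run where the data already sits.
chunks = ["big data big", "data at scale", "big scale"]
total = reduce_phase(map_phase(c) for c in chunks)

print(total["big"])  # prints 3
```

The economics Olson mentions follow directly: because each chunk is processed on the machine that stores it, adding more cheap servers adds both storage and compute at the same time.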


The reason that Hadoop is such a great solution is that it’s inexpensive, scalable and completely flexible. You don’t need to predict, in advance, what kinds of data you’ll capture and store. The system is able to handle any format, including new formats as they emerge. That’s crucial, since we can’t predict today what sensors, what systems and what data formats we’ll be using five or 10 years hence.

Hadoop is, by a considerable margin, the most successful platform for big data storage and processing in the world. Businesses looking to modernize certainly need to plan for big data, and they ought to choose Hadoop right now, just on that basis. But its flexibility means it’s the best choice for the long term, as well. It future-proofs the data center, adapting to new data and new analytic engines as they emerge.

Should all businesses be looking at Hadoop? Or only those with big data?

Olson: Every substantial enterprise should, for sure. Businesses have been using data for a long time to make better decisions – capturing information about customers and sales, exploring them with business intelligence tools, using great analytic products from SAS to understand history and the present, and to predict the future. Big data just means that you can bring more detail, more data, to that party. More detail means a finer-grained and more useful picture of today, and a more reliable look at the future.

Even small businesses and individuals that don’t want or use Hadoop themselves are getting plenty of it. You can’t shop online, plan an airplane trip, get map directions or enjoy programs on television or the Internet without kicking off Hadoop analytics, and benefiting from their output. We’ll see that march continue: More and more of the services we consume will be backed by big data. I could tell you great stories today about big data in health care, connected cars, the energy grid, agriculture, manufacturing and on and on.


SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2016 SAS Institute Inc. Cary, NC, USA. All rights reserved.


Learn more about the history of Hadoop and the state of Hadoop adoption on our Hadoop overview page.