Creating a Next-Generation Big Data Architecture

DESCRIPTION

If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding it are often complex to analyze and solve. The sheer volume, velocity, and variety change the way we think about data, including how enterprises approach data architecture. A significant reduction in the cost of processing, managing, and storing data, combined with the need for business agility and analytics, requires CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach to the complexities of Big Data. Creating that architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:

- Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
- How a next-generation architecture can be conceptualized
- The key components of a robust next-generation architecture
- How to incrementally transition to a next-generation data architecture

TRANSCRIPT
![Page 1: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/1.jpg)
Big Data Architectural Series: Creating a Next-Generation Big Data Architecture

facebook.com/perficient | twitter.com/Perficient | linkedin.com/company/perficient
![Page 2: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/2.jpg)
2
Perficient is a leading information technology consulting firm serving clients throughout
North America.
We help clients implement business-driven technology solutions that integrate business
processes, improve worker productivity, increase customer loyalty and create a more agile
enterprise to better respond to new business opportunities.
About Perficient
![Page 3: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/3.jpg)
3
• Founded in 1997
• Public, NASDAQ: PRFT
• 2013 revenue $373 million
• Major market locations:
• Allentown, Atlanta, Boston, Charlotte, Chicago, Cincinnati,
Columbus, Dallas, Denver, Detroit, Fairfax, Houston,
Indianapolis, Lafayette, Minneapolis, New York City,
Northern California, Oxford (UK), Philadelphia, Southern
California, St. Louis, Toronto, Washington, D.C.
• Global delivery centers in China and India
• >2,200 colleagues
• Dedicated solution practices
• ~90% repeat business rate
• Alliance partnerships with major technology vendors
• Multiple vendor/industry technology and growth awards
Perficient Profile
![Page 4: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/4.jpg)
BUSINESS SOLUTIONS
Business Intelligence
Business Process Management
Customer Experience and CRM
Enterprise Performance Management
Enterprise Resource Planning
Experience Design (XD)
Management Consulting
TECHNOLOGY SOLUTIONS
Business Integration/SOA
Cloud Services
Commerce
Content Management
Custom Application Development
Education
Information Management
Mobile Platforms
Platform Integration
Portal & Social
Our Solutions Expertise
![Page 5: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/5.jpg)
Our Speaker
Bill Busch
Sr. Solutions Architect, Enterprise Information Solutions, Perficient
• Leads Perficient's enterprise data practice
• Specializes in business-enabling BI solutions for the agile enterprise
• Responsible for executive data strategy, roadmap development, and the delivery of high-impact solutions that enable organizations to leverage enterprise data
• Bill has over 15 years of experience in executive leadership, business intelligence, data warehousing, data governance, master data management, information/data architecture and analytics
![Page 6: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/6.jpg)
Perficient’s Big Data Architectural Series

- Business Case
- Next-Generation Architecture (today’s webinar)
- Future Topics
  - Data Integration
  - Stream Processing
  - NoSQL
  - SQL on Hadoop
  - Data Quality
  - Governance
  - Use Cases & Case Studies
![Page 7: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/7.jpg)
Today’s Objectives

- 5 Architectural Roles for Hadoop
- Hadoop Ecosystem: Potential vs. Reality
- Realizing a Hadoop-Centric Architecture
![Page 9: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/9.jpg)
Three Views of Big Data

1. “Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
2. The convergence of structured, unstructured, and dark data.
3. Big Data is the evolution of data, creating data management issues similar to those IT has struggled to address for the last 20+ years.
![Page 11: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/11.jpg)
Common Big Data Business Use Cases

- Improve Strategic Decision Making
- Customer Experience Analysis
- Operational Optimization
- Risk and Fraud Reduction
- Data Monetization
- Security Event Detection and Analysis
- IT Cost Management
![Page 12: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/12.jpg)
Expanding Data Ecosystem

Business drivers:

- Customer Intelligence
- Operations
- Risk & Fraud
- Data Monetization
- Strategic Development
- Security Intelligence
- IT Optimization

Structured data makes up only 5-20% of the total. The wider data ecosystem also includes:

- Point-of-Sale
- Text Messages
- Contracts & Regulatory
- Preferences & Emotions
- Security Access
- Weather
- Machine Data
- Automobile
- Mobile Communications
- Geospatial
- Social
![Page 13: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/13.jpg)
Next-Generation Enterprise Data Architecture
![Page 14: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/14.jpg)
The Promise: Data Architecture Simplification

A single Hadoop cluster serving all roles at once: Data Integration, Data Hub, Analytics, Stream Processing, Data Warehouse, and Operational Data.
![Page 15: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/15.jpg)
The Reality: Maturity Limits the Use Cases

Realizing the full potential of Hadoop is constrained by maturity:

- Multi-tenancy is in its infancy
  - Hadoop 2.0 and YARN
  - Most third-party applications are just moving to YARN
- Hive (and other SQL-on-Hadoop solutions) are still maturing
- Robust enterprise functionality is evolving
  - Security
  - High availability
![Page 16: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/16.jpg)
Different Types of “Open Source Hadoop”

- Apache projects only
- Apache projects + proprietary add-ons
- Proprietary value-add and re-development
- Packaged and online solutions
  - IBM BigInsights
  - Oracle Big Data Appliance
  - HDInsight
  - Many others!

Choosing a Hadoop Distribution

- Company philosophy
- Current relationships
- Acceptable risk
- Specialized functionality
![Page 17: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/17.jpg)
Quick Primer on YARN

What is YARN?

- Yet Another Resource Negotiator
- Sometimes referred to as MapReduce 2.0
- A data operating system
- Fault-tolerant

Why is this important?

- Enables multi-tenancy on Hadoop
- Moves processing to the data

*Image provided by Hortonworks
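“Moving processing to the data” can be illustrated with a toy scheduler that prefers a node already holding a replica of the data block over shipping the block across the network. This is a hedged sketch only; the function, node names, and block size are invented for illustration, and YARN’s real scheduling is far more sophisticated.

```python
# Toy illustration of data locality: prefer running a task on a node
# that already holds the data block, rather than copying the block.
BLOCK_SIZE_MB = 128

def schedule(block_locations, free_nodes):
    """Return (node, network_cost_mb) for one task over one block."""
    # Prefer a node that both holds a replica and has free capacity.
    for node in block_locations:
        if node in free_nodes:
            return node, 0            # data-local: nothing moves
    # Otherwise fall back to any free node and pay to move the block.
    node = sorted(free_nodes)[0]
    return node, BLOCK_SIZE_MB

# Three replicas of a block live on nodes A, B, C.
replicas = ["A", "B", "C"]
print(schedule(replicas, {"B", "D"}))  # ('B', 0) -- data-local
print(schedule(replicas, {"D", "E"}))  # ('D', 128) -- block must move
```

The same preference, applied across thousands of tasks, is what lets a multi-tenant cluster avoid saturating the network.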
![Page 18: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/18.jpg)
Today’s Objectives
5 Architectural
Roles For Hadoop
Hadoop
Ecosystem
Potential
vs. Reality
Realizing A
Hadoop
Centric
Architecture
![Page 19: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/19.jpg)
Five Common Architectural Roles: Hadoop Big Data Use Cases

- Analytics
- Data Warehouse
- Stream Processing
- Data Factory
- Transactional Data Store
![Page 20: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/20.jpg)
Next-Generation Enterprise Data Architecture
![Page 21: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/21.jpg)
Five Common Architectural Roles: Hadoop Big Data Use Cases

- Analytics
- Data Warehouse
- Stream Processing
- Data Factory
- Transactional Data Store
![Page 22: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/22.jpg)
Analytical Processing

Analytical process: 1. Source, 2. Wrangle Data, 3. Model & Tune, 4. Operationalize

Architectural capabilities by step:

1. Source
   - Data Ingestion
   - Metadata Management
2. Wrangle Data
   - Data Access
   - Data Preparation Tools
   - Data Discovery & Visualization
   - Data Wrangling Tools
   - Business Glossary & Search
3. Model & Tune
   - Data Access
   - Data Discovery & Visualization
   - Analytical Tools
   - Analytical Sandbox
4. Operationalize
   - Business-Created Reporting
   - Model Execution & Management
   - Knowledge Management (Portal)
![Page 24: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/24.jpg)
Data Access

- There are many methods of accessing Big Data:
  - Direct HDFS
  - NoSQL / connector
  - Hive / SQL on Hadoop
- Align tools to access methods and file types:
  - Data preparation
  - Analytics

On the cluster, the data preparation tool reads source files/data and writes tidy data; the analytics tool then reads the tidy data and writes the analytical result.
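The read/write split between a preparation tool and an analytics tool can be sketched in a few lines: a wrangling step turns raw source records into “tidy” rows (one observation per row, one variable per column), and the analytics step only ever touches the tidy output. The file layout, delimiter, and column names below are invented examples, not anything from the webinar.

```python
# Minimal sketch of the prep-then-analyze pattern over a raw extract.
import csv
import io

raw = "region;2013;2014\nEast;100;120\nWest;90;95\n"

def tidy(raw_text):
    """Unpivot year columns into (region, year, value) rows."""
    rows = list(csv.reader(io.StringIO(raw_text), delimiter=";"))
    header, out = rows[0], []
    for rec in rows[1:]:
        for year, value in zip(header[1:], rec[1:]):
            out.append({"region": rec[0], "year": int(year),
                        "value": int(value)})
    return out

tidy_rows = tidy(raw)
# The analytics tool reads only the tidy output:
total_2014 = sum(r["value"] for r in tidy_rows if r["year"] == 2014)
print(total_2014)  # 215
```

Keeping the write path (preparation) and the read path (analytics) separate is what lets each tool be aligned to the access method and file type it handles best.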
![Page 25: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/25.jpg)
Five Common Architectural Roles: Hadoop Big Data Use Cases

- Analytics
- Data Warehouse
- Stream Processing
- Data Factory
- Transactional Data Store
![Page 26: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/26.jpg)
Data Warehouse Roles

- Two models for splitting processing:
  - Hot-Cold
  - Data Warehouse Layer
- Push high user loads to traditional data warehouses
- Fully investigate DW-Hadoop connector functionality
- Leverage the opportunity to use in-memory database solutions

Data Warehouse Layer approach: the Hadoop cluster feeds a traditional DW/DM.
Hot-Cold approach: cold data lives on the Hadoop cluster; hot data lives in the traditional DW/DM.
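The hot-cold split comes down to a routing rule: recent partitions are served from the warehouse, older ones from Hadoop. A minimal sketch, assuming a hypothetical 90-day hot window (the threshold is an invented example, not a recommendation from the webinar):

```python
# Route a query for a given data partition to the hot or cold store.
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=90)  # assumed cutoff for "hot" data

def route(partition_date, today):
    if today - partition_date <= HOT_WINDOW:
        return "warehouse"   # hot: traditional DW/DM handles user load
    return "hadoop"          # cold: Hadoop cluster stores and serves it

today = date(2014, 6, 1)
print(route(date(2014, 5, 15), today))  # warehouse (hot)
print(route(date(2013, 1, 1), today))   # hadoop (cold)
```

In practice the routing is often hidden behind a DW-Hadoop connector or a federated view, which is why connector functionality deserves a full investigation before committing to this model.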
![Page 27: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/27.jpg)
Data Warehouse: Organize Your Data

- Types of data stored on the cluster
- Analytical sandboxes
  - Team
  - Individual
  - Quotas
- Potential to replace information lifecycle management solutions
- No right answer; clearly define usage

Example zones on the Hadoop cluster:

- Raw Data: Streaming Queues, Deltas (Incremental), Consolidated Data
- Processed Data: Common Data (Dimensions, Master Data); Improved/Modeled Data; Published, Analytical and Aggregates
- Sandbox Zone
- Archived Data
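One way to make “clearly define usage” concrete is to encode the zones as a path convention that every job must follow. The directory layout, zone/area names, and dataset names below are hypothetical; the point is that the convention lives in one place and is validated, not scattered across scripts.

```python
# Toy zone registry and path builder for a cluster directory convention.
ZONES = {
    "raw":       ["streaming", "queues", "deltas", "consolidated"],
    "processed": ["common", "modeled", "published"],
    "sandbox":   ["team", "individual"],
    "archive":   [],
}

def zone_path(zone, dataset, area=None):
    """Build the canonical cluster path for a dataset in a zone."""
    if zone not in ZONES:
        raise ValueError("unknown zone: %s" % zone)
    if area is not None and area not in ZONES[zone]:
        raise ValueError("unknown area %r in zone %r" % (area, zone))
    parts = ["/data", zone] + ([area] if area else []) + [dataset]
    return "/".join(parts)

print(zone_path("raw", "pos_sales", "deltas"))
# /data/raw/deltas/pos_sales
print(zone_path("sandbox", "churn_model", "team"))
# /data/sandbox/team/churn_model
```

With the convention centralized, quotas and retention policies can be attached per zone rather than negotiated per dataset.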
![Page 28: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/28.jpg)
Five Common Architectural Roles: Hadoop Big Data Use Cases

- Analytics
- Data Warehouse
- Stream Processing
- Data Factory
- Transactional Data Store
![Page 29: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/29.jpg)
Stream and Event Processing

- Dedicated vs. shared model
- Persistence of messages, logs, etc.
  - Long-term storage
  - Queuing
- Pre-load (HDFS) vs. post-load processing
- Micro-batch vs. one-at-a-time
- Programming language support
- Processing guarantee
  - At most once
  - At least once
  - Exactly once

Let business requirements drive the need for streaming solutions. It is acceptable to use more than one solution as long as the roles/purposes of each are clearly defined.
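The three processing guarantees are easiest to see with a replayed message. An at-least-once channel may redeliver after a failure; if the consumer tracks message IDs it has already processed, the duplicates are dropped and the result is effectively exactly-once. This is a toy sketch of the idea, not any particular streaming engine’s API.

```python
# Simulate an at-least-once channel that redelivers message 2,
# with and without consumer-side deduplication.
def consume(deliveries, dedupe):
    seen, total = set(), 0
    for msg_id, amount in deliveries:
        if dedupe and msg_id in seen:
            continue            # duplicate redelivery: skip it
        seen.add(msg_id)
        total += amount
    return total

# Message 2 is redelivered after a failure/retry.
deliveries = [(1, 10), (2, 5), (2, 5), (3, 7)]
print(consume(deliveries, dedupe=False))  # 27 -- duplicate counted twice
print(consume(deliveries, dedupe=True))   # 22 -- effectively exactly-once
```

The cost of the stronger guarantee is the state (`seen`) the consumer must keep, which is why “exactly once” is the most expensive option in real streaming platforms.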
![Page 30: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/30.jpg)
Five Common Architectural Roles: Hadoop Big Data Use Cases

- Analytics
- Data Warehouse
- Stream Processing
- Data Factory
- Transactional Data Store
![Page 31: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/31.jpg)
The Data Integration Challenge

- Volume, variety, and velocity create unique challenges for data integration
- 10,000+ unique entities (or file groups) may have to be managed
- Batch windows are still the same or shrinking

Key point: Hadoop and Hadoop-related technologies can address these challenges. However, they must be architected and governed properly.
![Page 32: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/32.jpg)
Data Factory & Integration: Approaches to Big Data Integration

1. Hadoop Distributed Tools
   - Leverages tools included in the Hadoop distribution plus programming languages
   - Sqoop, Flume, Spark, Java, and MapReduce are examples
   - Tools can be implemented in many different modes:
     - Hand-coded/scripted
     - Runtime-configured
     - Generated
2. Data Integration Packages
   - Leverage commercial data integration packages to move and transform data
   - IBM InfoSphere BigInsights and Informatica are examples
   - Key questions: where is processing taking place, and does the tool use the YARN resource manager?
3. Hybrid (Both Hadoop and a Data Integration Package)
   - Based on the use case, leverages both Hadoop and COTS tools to move and transform data
![Page 33: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/33.jpg)
Define Pipelines and Stages

Sources (cloud sources, RDBMS, file hub/FTP, object DBMS, log data, stream/message bus) flow through a series of stages, each served by an appropriate tool:

1. Extract: Sqoop, FTP, a packaged tool, an ETL tool, or Kafka, depending on the source
2. HDFS Load & Formatting: Sqoop, Storm
3. Scraping & Normalization: MCF, Storm
4. Cleansing, Aggregation, Transformation: packaged ETL tool, Storm, custom code
5. Data Distribution: Sqoop, custom code, ETL tool, message bus
6. Data Access & Distribution: RDBMS/DW/IMDB, Hive, HBase, file extracts, NoSQL, stream output
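A pipeline of this shape is naturally modeled as an ordered list of stage functions, each taking the previous stage’s output. The sketch below mirrors the extract → format → cleanse → distribute flow; the stage bodies are trivial stand-ins for the real tools (Sqoop, Storm, an ETL package), and the record format is invented.

```python
# Toy pipeline: each stage is a function; running the pipeline threads
# the payload through the stages in order.
def extract(_):
    return ["  East,100 ", "West,90", "East,100"]   # raw, messy records

def format_records(records):
    return [r.strip() for r in records]             # HDFS load & formatting

def cleanse(records):
    return sorted(set(records))                     # dedupe / cleansing

def distribute(records):
    return {"rows_published": len(records)}         # data distribution

PIPELINE = [extract, format_records, cleanse, distribute]

def run(pipeline, payload=None):
    for stage in pipeline:
        payload = stage(payload)
    return payload

print(run(PIPELINE))  # {'rows_published': 2}
```

Keeping stages as separately swappable units is what lets one pipeline mix Sqoop for extraction with Storm for transformation without rewriting the whole flow.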
![Page 34: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/34.jpg)
Big Data Integration Framework: Typical Services

Key guidance:

- In lieu of using an ETL product, consider building a Big Data integration framework
- Apache Falcon provides pipeline management
- Focus on making all components run-time configurable with metadata
- Can offer significant cost savings over the long run

Typical components:

- Pipeline master (e.g., Falcon)
- Load utility: metadata collection, pipeline config files, metadata config files, Sqoop, Flume, HDFS shell
- Pipeline utilities: parser (delimiter), data standardization, Hive publishing, MF coding converters, file joiner & transport, logging, checksum, retention, replication, late-arriving data, exception handling, DB copy, archival, audit
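The “run-time configurable with metadata” idea can be sketched as a single generic parser driven entirely by a per-feed metadata record, so onboarding a new feed means adding configuration rather than code. The feed name, delimiter, and column metadata below are invented examples of what such a record might hold.

```python
# Toy metadata-driven parser: one utility, many feeds, zero per-feed code.
FEED_METADATA = {
    "pos_sales": {
        "delimiter": "|",
        "columns": ["store", "amount"],
        "types": {"amount": int},       # casts applied after splitting
    },
}

def parse(feed, line):
    """Parse one input line using only the feed's metadata record."""
    meta = FEED_METADATA[feed]
    values = line.split(meta["delimiter"])
    row = dict(zip(meta["columns"], values))
    for col, cast in meta["types"].items():
        row[col] = cast(row[col])
    return row

print(parse("pos_sales", "S042|1250"))
# {'store': 'S042', 'amount': 1250}
```

The long-run savings come from exactly this shape: the parser, standardizer, and publisher are written once, and each of the 10,000+ entities is just another metadata entry.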
![Page 35: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/35.jpg)
Five Common Architectural Roles: Hadoop Big Data Use Cases

- Analytics
- Data Warehouse
- Stream Processing
- Data Factory
- Transactional Data Store
![Page 36: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/36.jpg)
SQL on Hadoop

- SQL on Hadoop is changing
- Historically focused on read functionality for analytics
- A new breed of SQL on Hadoop supports:
  - BI and operational reporting
  - Transaction processing

*Image provided by Splice Machine
![Page 37: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/37.jpg)
Transactions In Hive
![Page 38: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/38.jpg)
Today’s Objectives
5 Architectural
Roles For Hadoop
Hadoop
Ecosystem
Potential
vs. Reality
Realizing A
Hadoop
Centric
Architecture
![Page 39: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/39.jpg)
Common Big Data Business Use Cases

- Improve Strategic Decision Making
- Customer Experience Analysis
- Operational Optimization
- Risk and Fraud Reduction
- Data Monetization
- Security Event Detection and Analysis
- IT Cost Management
![Page 40: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/40.jpg)
Architectural Scenarios

Mapping business use cases to architectural roles (Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store*), where P = primary use case and s = secondary use case:

- Strategic Decision Making: P s
- Customer Experience: P s P s
- Operational Optimization: P s s s
- Risk and Fraud Reduction: P s P
- Data Monetization: s s P
- Security Event Detection and Analysis: P s s s
- IT Cost Management: P s P P

* Transactional data store capability is just emerging within the Hadoop ecosystem. Consider this role only for isolated business cases and early adopters.
![Page 41: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/41.jpg)
Integrating Hadoop into the Enterprise

1. Determine business use cases
2. Understand current tools & architecture
3. Align business use case priorities
4. Build roadmap
5. Specify solution architecture
6. Implement roadmap
7. Update & maintain roadmap
![Page 42: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/42.jpg)
Final Thoughts

Do:

- Match the business use case to the Big Data role
- Clearly define a roadmap
- Establish clear architectural standards to drive consistency and re-use of resources
- Do your homework when defining a solution architecture

Don’t:

- Select an initial use case that relies on immature Hadoop functionality
- Leverage tools that move data off the cluster for processing and then store it back on the cluster
- Assume all Hadoop technologies integrate well together
![Page 43: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/43.jpg)
As a reminder, please submit your
questions in the chat box.
We will get to as many as possible.
![Page 44: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/44.jpg)
Daily unique content about content management, user experience, portals and other enterprise information technology solutions across a variety of industries:

Perficient.com/SocialMedia
Facebook.com/Perficient
Twitter.com/Perficient
![Page 45: Creating a Next-Generation Big Data Architecture](https://reader034.vdocument.in/reader034/viewer/2022052316/55991e291a28abde158b4636/html5/thumbnails/45.jpg)
Thank you for your participation today. Please fill out the survey at the close of this session.