Transforming Business in a Digital Era with
Big Data and Microsoft
facebook.com/perficient   twitter.com/Perficient_MSFT   linkedin.com/company/perficient
Perficient is a leading information technology consulting firm serving clients throughout North America.
We help clients implement business-driven technology solutions that integrate business processes, improve worker productivity, increase customer loyalty and create a more agile enterprise to better respond to new business opportunities.
ABOUT PERFICIENT
PERFICIENT PROFILE
Founded in 1997
Public, NASDAQ: PRFT
2014 revenue $456 million
Major market locations:
Allentown, Atlanta, Ann Arbor, Boston, Charlotte, Chicago, Cincinnati,
Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis,
Lafayette, Milwaukee, Minneapolis, New York City, Northern California,
Oxford (UK), Southern California, St. Louis, Toronto
Global delivery centers in China and India
>2,600 colleagues
Dedicated solution practices
~90% repeat business rate
Alliance partnerships with major technology vendors
Multiple vendor/industry technology and growth awards
INDUSTRIES
Healthcare
Financial Services
Life Sciences
Consumer Markets
Automotive & Transportation
High Tech
Telecom
Energy & Utilities
Manufacturing
Media & Entertainment
OUR SOLUTIONS PORTFOLIO
PORTAL
Portal Frameworks
Search, Security, Web Analytics, Web Content Management
Social & Collaboration, Mobility, Experience Design
INTEGRATION
Integration Frameworks
Cloud Architecture, Reference Architecture
Application Integration: Enterprise Application Integration, Service Oriented Architecture
Process & Content Integration: Business Process Management, Complex Event Processing, Rules Engines
DATA & CONTENT
Business Analytics: Business Intelligence, Predictive Analytics, Reporting
Structured Data Management: Data Integration, Quality & Governance; Enterprise Data Warehouse; Master Data Management; Product & Information Management
Unstructured Data Management: Big Data, Content Intelligence, Content Management, Enterprise Search
CUSTOMER EXPERIENCE
Customer 360: Multi Channel Enablement, Relationship Management, Social Engagement
Commerce: Marketing Strategy Implementation, Order Management, Supply Chain Management, Service & Support, Managed Hosting
Sales & Service Support: Customer Service, Sales Force Automation
Experience Design: Strategic Roadmaps & Envision Workshops, User Research & Metrics Analysis, Creative & Interaction Design, Custom & Responsive UI Development
Digital Marketing: Search Engine Marketing, Online Advertising, Content Strategy, Conversion Optimization
Management Consulting
BUSINESS OPERATIONS
Corporate Performance Management: Budgeting, Forecasting & Planning; Business Analysis & Predictive Analytics
Enterprise Business Solutions: Oracle EBS, Vertex Tax Solutions
Human Resource Solutions: Employee Portals, Human Resource Management, Talent Management
Enterprise Social Platforms: Social Strategy, Lync Unified Communications, Office 365
Management Consulting
SPEAKERS
Shankar RamaNathan, Perficient
Senior Enterprise Architect, Strategic Advisors Team
Andrew Tegethoff, Perficient
Practice Lead, Microsoft Business Intelligence
Introduction
Digital Transformation & Big Data
Big Data Challenges
Big Data & Microsoft
In the Cloud with HDInsight
In the Data Center with APS
AGENDA
BIG DATA CHALLENGES:
How to get value from Big Data?
Governance & Security Concerns/Issues
Analytical / Technology Talent
Integrating different sources of Data
Integrating Enterprise Data with Big Data
Defining Strategy
Funding
Customer Databases
Service Records
Text Documents
Product Databases
POS Data
Weblogs
Social Media
Clickpaths
Call Center
Payment Database
CUSTOMER EXPERIENCE
Omnichannel Strategy
Integrating Enterprise Data
Big Data: At Rest & in Motion
BIG DATA RISKS AND OPTIONS
Not able to show business value
Difficulty in finding the resources
IT/HW/SW installation bottleneck
Consulting on strategic and tactical aspects of BI with the Microsoft Data Platform
MICROSOFT BUSINESS INTELLIGENCE
• Volume
  – Terabytes, petabytes, exabytes
• Velocity
  – How much data is created every minute?
• Variety
  – Social, Web, Internet of Things, etc.
BIG DATA
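Velocity and volume are linked, because even a modest per-minute rate compounds quickly. The short Python sketch below converts an ingestion rate into daily and yearly storage. The rate of 50,000 events per minute and the size of 2,048 bytes per event are hypothetical figures chosen for illustration, not measurements.

```python
def daily_bytes(events_per_minute: int, bytes_per_event: int) -> int:
    """Bytes accumulated in one day at a constant ingestion rate."""
    minutes_per_day = 60 * 24
    return events_per_minute * bytes_per_event * minutes_per_day

# Hypothetical workload: 50,000 events per minute, about 2 KB per event.
per_day = daily_bytes(50_000, 2_048)
per_year_tb = per_day * 365 / 10**12  # decimal terabytes

print(per_day)                # 147456000000 bytes (about 147 GB) per day
print(round(per_year_tb, 2))  # 53.82 TB per year
```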
BIG DATA
What types of data are we talking about?
People to People
Online forums
Social networks
Blogs
SMS threads
Email threads
People to Machine
E-commerce
Bank cards
Credit cards
Mobile devices
Digital TV
Machine to Machine
Medical devices
GPS devices
Bar code scanners
Sensors
Surveillance cams
An open source framework for the storage and processing
of very large data sets.
The Hadoop ecosystem consists of many additional tools that perform functions like:
• Resource management
• Extract, Transform and Load (“ETL”) and/or Extract,
Load and Transform (“ELT”)
• Full text search
• Workflow scheduling
• SQL querying
ENTER HADOOP
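Hadoop's original processing model, MapReduce, splits a job into three stages:
- a map step;
- a shuffle that groups intermediate results by key;
- a reduce step.

The single-process Python sketch below mimics that flow for a word count. It only illustrates the data flow and does not use any Hadoop API. The sample input is made up.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit a (word, 1) pair for every whitespace-separated, lower-cased token.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts emitted for a single word.
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

docs = ["big data big value", "Data at rest and in motion"]
print(word_count(docs))
# {'big': 2, 'data': 2, 'value': 1, 'at': 1, 'rest': 1, 'and': 1, 'in': 1, 'motion': 1}
```

In real Hadoop, the map and reduce functions run on many machines in parallel, close to where the data blocks are stored.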
WHAT CAN
HADOOP DO?
• Allow you to keep pace with more volume, more variety, and greater velocity of data.
• Allow you to store all of your data in its raw form, so you can ask questions later that were not thought of when the data was captured.
• Enable you to ask questions of your data that previously couldn’t be answered – as well as capture data that previously couldn’t be captured.
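Storing data raw and deciding how to interpret it at query time is often called "schema on read". The Python sketch below keeps hypothetical clickstream records as unparsed JSON strings. It applies structure only when a question is asked. The records and field names are invented for illustration.

```python
import json

# Raw events kept exactly as captured (hypothetical sample records).
# Note that later records carry a "device" field the first one lacks.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": 340, "device": "mobile"}',
    '{"user": "u1", "page": "/cart", "ms": 200, "device": "desktop"}',
]

def query(raw_lines, predicate, field):
    """Parse each raw line only at query time, filter it, and project one field."""
    results = []
    for line in raw_lines:
        record = json.loads(line)
        if predicate(record):
            results.append(record.get(field))  # None when the field is absent
    return results

# A question nobody planned for at capture time: which devices reached /cart?
cart_devices = query(RAW_EVENTS, lambda r: r.get("page") == "/cart", "device")
print(cart_devices)  # ['mobile', 'desktop']
```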
INTRODUCING HDINSIGHT
• Key part of Microsoft’s Big Data/Hadoop story
• “PaaS” option for cloud Hadoop
• Azure wraps an Apache Hadoop distribution created through the Hortonworks and Microsoft partnership
• Uses Azure Storage (Blob storage) for scalable, persistent cloud storage
• Integrates Big Data into existing applications, BI
solutions, reporting environments, Excel
• Establish an Azure Storage account
• Set up an HDInsight cluster
• Cluster cost relates directly to cluster size and uptime!
• Upload data
• Using native JavaScript, Hadoop command line, Sqoop connection from
SQL Server or Azure SQL Database or a raft of third-party tools
• Connect and analyze
• Use SQL Server and/or Excel via ODBC
• Integrate with applications via Hadoop.NET or Azure SQL Database via
Sqoop
HOW DOES IT WORK?
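Billing tracks cluster size and uptime. Creating a cluster only when it is needed and deleting it afterwards (the data remains in Azure Storage) can therefore change the bill considerably. The Python sketch below models that relationship as nodes × hourly rate × hours. The $0.50 per node-hour rate is a hypothetical placeholder, not an Azure price, and storage charges are ignored.

```python
def cluster_cost(node_count, hourly_rate_per_node, hours_up):
    """Estimated compute cost: every node is billed for every hour the cluster exists."""
    if node_count < 0 or hourly_rate_per_node < 0 or hours_up < 0:
        raise ValueError("inputs must be non-negative")
    return node_count * hourly_rate_per_node * hours_up

RATE = 0.50  # hypothetical USD per node-hour, not a published Azure price

always_on = cluster_cost(8, RATE, 24 * 30)  # 8 nodes left running for a 30-day month
on_demand = cluster_cost(8, RATE, 4 * 22)   # 8 nodes created for 4 hours on 22 workdays
print(always_on, on_demand)                 # 2880.0 352.0
```

With identical cluster sizes, the whole difference comes from the hours term. This is why an idle cluster is the first thing to look for when costs run high.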
CLOUDERA ON AZURE
• CDH – Cloudera Hadoop distribution
• Installed on Azure Virtual Machines
running Linux
• Cloudera’s preferred cloud platform
• “IaaS” option for cloud-based Hadoop
ADVANCED ANALYTICS WITH AZURE ML
CLOUD-BASED DATA SCIENCE & PREDICTIVE ANALYTICS
• Fully-managed Azure offering
• Browser-based development environment
• Deploy predictive models as web services through the Azure ML API
• Data sources: HDInsight, Azure Storage, local data files, HTTP
• Includes best-in-class algorithms from Xbox and Bing
• Built-in support for the R language with over 350 packages included, or bring your own (“BYO”) R code
• Deploy in minutes
Connect to an HDInsight cluster using Power Query
Extract data into Power Pivot, join with other datasets
from a variety of sources to create powerful mashups
Easily translate Big Data into compelling
visualizations with Power View
ANALYZE BIG DATA WITH POWER BI
CLOUD HADOOP: THE VALUE PROP
• Delivers the Big Data/Hadoop value proposition on a scalable, pay-as-you-go basis
• Enhances analytical capability over “loosely structured” data
• Expands the scope and type of analysis possible across a wide variety of use cases
• Integrates easily with existing data systems
• Turnkey, on-premises Big Data analytics appliance
• Relational Data
• Massively Parallel Processing (MPP) with SQL
Server Parallel Data Warehouse (PDW)
• Non-Relational Data
• 100% Hadoop installation via on-premises version
of HDInsight
• Seamless Querying
• PolyBase – query Big Data using SQL
• Performance
• In-Memory Columnstore
• Scale up to 6 PB
ANALYTICS PLATFORM SYSTEM (APS)
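Columnstore indexes store each column separately. Repetitive values therefore compress well, and scans touch only the columns a query needs. Run-length encoding is one compression technique that columnar formats use. The Python sketch below shows the idea on a hypothetical region column. It is not SQL Server's actual implementation.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_count(encoded, target):
    """Count rows equal to target by reading run lengths, without decompressing."""
    return sum(length for value, length in encoded if value == target)

# Hypothetical nine-row column with long runs of repeated values.
region = ["East"] * 4 + ["West"] * 3 + ["East"] * 2
encoded = rle_encode(region)

print(encoded)                     # [('East', 4), ('West', 3), ('East', 2)]
print(rle_count(encoded, "East"))  # 6
```

Nine stored values shrink to three pairs. The count query also runs directly on the compressed form, which is part of why columnstores speed up analytic scans.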
– Massively Parallel Processing (MPP)
• Fundamentally different from the Symmetric Multi-Processing (SMP) of a typical RDBMS
• “Shared nothing” architecture
• Large number of dedicated processors
• Every CPU has its own storage
– Better query and load performance
• Amplified by inclusion of In-Memory Columnstores
– Fault-tolerant, inexpensive, yet comprehensive very large data warehouse (VLDW) solution
SQL SERVER PDW
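In a shared-nothing MPP system, rows are spread across nodes by hashing a distribution column. Each node aggregates its own slice, and a control node then merges the partial results. The Python sketch below imitates that pattern, with threads standing in for nodes. The node count, column names, and data are hypothetical, and this is not how PDW is implemented internally.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 4  # hypothetical number of compute nodes

def node_for(key, node_count=NODE_COUNT):
    # A deterministic hash of the distribution column picks the owning node.
    return zlib.crc32(str(key).encode("utf-8")) % node_count

def distribute(rows, key_column):
    # Each node receives only its own slice of the table ("shared nothing").
    slices = defaultdict(list)
    for row in rows:
        slices[node_for(row[key_column])].append(row)
    return slices

def local_sum(rows, group_column, value_column):
    # Runs against one node's slice only and returns a partial aggregate.
    partial = defaultdict(float)
    for row in rows:
        partial[row[group_column]] += row[value_column]
    return partial

def mpp_sum(rows, key_column, group_column, value_column):
    slices = distribute(rows, key_column)
    # Threads stand in for independent nodes working in parallel.
    with ThreadPoolExecutor(max_workers=NODE_COUNT) as pool:
        partials = list(pool.map(
            lambda s: local_sum(s, group_column, value_column), slices.values()))
    # The control node merges the small partial results.
    total = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)

sales = [
    {"customer": "c1", "region": "East", "amount": 10.0},
    {"customer": "c2", "region": "West", "amount": 5.0},
    {"customer": "c3", "region": "East", "amount": 2.5},
    {"customer": "c1", "region": "East", "amount": 7.5},
]
print(mpp_sum(sales, "customer", "region", "amount"))  # East: 20.0, West: 5.0
```

Distributing on `customer` while grouping on `region` means a region's rows can land on several nodes. That is why the final merge step is needed.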
ONCE AGAIN… HDInsight
– Fundamentally the same
product, but deployed within
the APS appliance
– 100% Apache Hadoop
– Query with SQL via PolyBase
– Fully integrated with Microsoft
platform
• User authentication with ADFS
• High availability
• Windows Server Failover Clustering
ON-PREMISES HADOOP: THE VALUE PROP
• Relational and non-relational data
management in one turnkey
solution
• Lowest cost per TB for a data
warehouse appliance in the industry
• Hardware choices: Dell, HP, Quanta
• Integrates into Windows infrastructure
• Performance, security, and scalability
PLANNING FOR THE FUTURE
• Establishing target problems
• Identifying resources (e.g., Azure, on-premises)
• Defining and acquiring required skillsets
• Bringing it all together
POLL
Where do you feel you need help with Big Data technology?
a. Establishing a business case
b. Transitioning from POC to production
c. Establishing a solution architecture
d. Hadoop/open source toolset
e. Microsoft toolset
f. Other