MAKING BIG DATA COME ALIVE
Adding Hadoop to Your Analytics Mix: Challenges and Strategies
Madina Kassengaliyeva
July 23, 2015
© 2015 Think Big, a Teradata Company 04/18/2023
Presenters
Madina Kassengaliyeva, Director, Client Services, Think Big
Madina Kassengaliyeva is responsible for ensuring successful delivery of Think Big’s service engagements. Madina has led strategy, engineering and data science engagements in a variety of areas, including recommendation engines, customer interaction optimization, marketing analytics and compliance. Madina holds an MBA from the University of Chicago and a BA in International Studies from American University.
Paul Barsch, Director, Services Marketing, Think Big
Paul Barsch directs marketing programs for Think Big, a Teradata Company. Paul has been in IT for 15+ years in a variety of roles for Teradata, HP Enterprise Services and KPMG Consulting.
Housekeeping
Use the widget bar below to…
Get valuable resources & complete exit survey
Ask Questions to the Presenters
Request online technical help
Go social…
…and follow the conversation
Agenda
• Hadoop Adoption Path
• Key Challenges – Data, Organization, Capabilities
• Ideas for Solutions
Common Hadoop Adoption Path
1. Address Immediate Needs
• Hadoop used to relieve a technology pain point
• Reduce data warehouse costs
• Speed up ETL
• The only users are in technology teams
2. Establish a Data Repository
• More and more data gets added to Hadoop as a result of Phase 1
• Greater data variety, more raw data, deeper history
• Initial data transfer, security, and governance practices are established
• Still perceived as largely a technology platform
3. Initial Analytics Exploration
• Limited number of people or teams conduct POCs using Hadoop
• Analytics techniques not available on traditional platforms are applied
• Early wins indicate promising business impact and excitement builds
4. Integrate Hadoop into the Analytics Capabilities
• Multiple teams use Hadoop as part of the analytics infrastructure
• Techniques, methods, best practices and access patterns get codified
• Business begins to capture consistent value
Transition from Phase 3 to Phase 4 is when key challenges emerge
Hadoop Adoption – Critical Point
Key Challenges
Data
• Impact of schema on read
• Consistent taxonomies and reference data
• Architecture - access patterns and flows
Organization
• Skills, roles and responsibilities
• Lack of common vocabulary
• Knowledge capture and sharing
Capabilities
• Foundational capabilities at the whim of changing business priorities
• Future that’s hard to envision is hard to build
Organization – Key Challenges
• Skills, roles and responsibilities
o Significant skills gaps between what’s currently available and what is needed
o Both business and technology do analytics and often engineering, blurring lines of responsibility or ownership
o “Throw over the wall” doesn’t work
• Lack of common vocabulary
o Every BU (and every leader) has their own understanding of the same words
o This is rarely discussed
• Knowledge capture and sharing
o Multiple teams work with the same data and similar techniques
o Organizational silos do not naturally support broad knowledge transfer
Organization – Ideas for Solutions
• Cross-BU committee to guide organizational change, define common vocabulary, defend the effort to executive leadership and share success
• Thorough, honest skills assessments to identify gaps, training needs and augmentation needs, and to map them to roles and responsibilities
• Documented tools requirements based on current and projected skills
• Collaboration architecture
• Plug into existing knowledge transfer practices and tools and allow for informal information exchange based on data access privileges
Organization – Key Functions
Strategy
Data Management & Governance
Architecture Tools Market Research
Roadmap Planning
Value Realization
Future Data Sources
Services
Support
Visualization & Reporting
Data SMEs
Core Platform Development Testing
Operations
Core Platform Management
Metrics Tracking & Reporting
Platform Integration
Program Management
Roadmap Execution
Cross Group Coordination
Financial Management
Small Project Prioritization
Communication & Change Management
Application Development
Analytic Sandbox
Data Science
Integration, Interfaces & Ingestion
Training
Incident Management
Config, Change, Release Management
Problem Management
Help Desk
Knowledge Management
Technology Governance
Data Quality & Metrics
Access Controls
Data Governance
Metadata Management
Capabilities – Key Challenges
• Foundational capabilities at the whim of changing business priorities
o Lack of consensus on what the foundational capabilities are
o Let’s be honest, the “Top Project” changes often and the resources go with it
o Foundational capabilities do not immediately impact the bottom line
• Future that’s hard to envision is hard to build
o Lack of shared vision
o Clarity needed at multiple levels – strategy, operational details, day to day
Capabilities – Ideas for Solutions
• Consolidate ownership in a team that has organizational influence and includes representatives from the business, infrastructure, architecture, data, and analytics
• Back to vocabulary – agree on what capabilities mean for your business unit and your technology partners
• Roadmaps are useful – visual representations of high-level goals against a timeline that should define your projects
• Dedicate resources to capabilities and protect them
• Check in with your roadmap – does it still reflect your vision?
Photo courtesy of Flickr. Creative Commons. By E.Bass.
Capabilities Pyramid
Capabilities: Roadmap Example
Analytics standardized methods, code, tools, team roles
Operations standardized processes, tools, team roles
Skills and roles matrix
Data Ingestion, Transfer, Structuring, and Governance approach
Unified Model Management
Integrated Data Science
Variables based on single source structured data
Variable selection in Hadoop
Integration with existing scoring engine
Batch data processing in Hadoop
Integration
Cross-channel and intraday variables generation
Batch scoring in Hadoop
Natural language processing to analyze text and voice
Initial real-time scoring
Execution Methodology and project management
Data and Models
Organization and Management
Analytics Knowledge Management
Scoring Architectural and Analytical design
Data Lifecycle Management
Real-time scoring design
Statistical and machine-learning-based modeling
Data Exploration of unstructured data components (e.g. URL, chat text)
Data Exploration of structured data components (e.g. page views,
Cross-channel variables, variables from unstructured data + intraday variables
Data – Key Challenges
• Impact of schema-on-read
o Hadoop supports a variety of data structures, which simplifies data ingestion and allows data users to define preferred schemas
o This shifts the burden of defining the schema to the data users
• Consistent taxonomies and reference data
o Meaningful data analysis requires a known and consistent taxonomy
o New taxonomies can get created by individual teams
o Reference data changes
• Architecture - access patterns and flows
o Data flows across platforms, regular updates, physical and virtual constraints
o Decisions on what should be done where
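To make the schema-on-read point concrete, here is a minimal sketch in plain Python (not Hadoop itself). The raw rows, the field names, and the `read_with_schema` helper are all hypothetical, chosen only for illustration. The same raw data is read by two teams, each applying its own schema at read time, so field names and types can drift apart.

```python
import csv
import io

# Hypothetical raw data landed as-is; no schema was enforced on write.
RAW = "2015-07-01,web,42,19.99\n2015-07-01,store,7,5.00\n"


def read_with_schema(raw, schema):
    """Apply a schema at read time.

    schema is a list of (field_name, cast_function) pairs, in column order.
    """
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        rows.append(
            {name: cast(value) for (name, cast), value in zip(schema, fields)}
        )
    return rows


# Two teams define their own (hypothetical) schemas over the same raw data.
marketing_schema = [("date", str), ("channel", str), ("visits", int), ("revenue", float)]
finance_schema = [("day", str), ("source", str), ("units", int), ("amount", str)]

marketing = read_with_schema(RAW, marketing_schema)
finance = read_with_schema(RAW, finance_schema)
```

Both reads succeed, yet the fourth column is a float called `revenue` for one team and a string called `amount` for the other. Nothing in the storage layer flags the mismatch, which is one reason shared taxonomies and reference data matter.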
Data – Ideas for Solutions
• Big issue with lots of opinions – see Data Lake et al.
• Test and define common data manipulation patterns for different use cases – aggregations, reductions, basic statistical derivations
• Centralize the responsibility for data governance, data architecture, taxonomy, and maintenance
• Establish knowledge sharing for data post-analytics
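As one possible reading of "common data manipulation patterns", the sketch below writes an aggregation, a reduction, and a basic statistical derivation as small reusable functions that teams could agree on and test once. The records and function names are hypothetical, and plain Python stands in for whatever processing engine is actually used on the cluster.

```python
from collections import defaultdict
from functools import reduce
from statistics import mean, pstdev

# Hypothetical records: (channel, page_views)
records = [("web", 10), ("web", 30), ("mobile", 20), ("mobile", 40)]


def aggregate_by_key(rows):
    """Aggregation: group values under their key."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return dict(grouped)


def total(values):
    """Reduction: fold a list of numbers into a single sum."""
    return reduce(lambda acc, v: acc + v, values, 0)


def summarize(values):
    """Basic statistical derivations for one group."""
    return {
        "count": len(values),
        "sum": total(values),
        "mean": mean(values),
        "stdev": pstdev(values),  # population standard deviation
    }


summary = {key: summarize(vals) for key, vals in aggregate_by_key(records).items()}
```

Keeping each pattern as a named, tested function gives teams a shared reference point, so the same derivation is not reimplemented slightly differently in every project.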
Photo courtesy of Flickr. Creative Commons. By Renzo Ferrante
Client Example – Centralized Data Group
• Data management, knowledge, architecture, and processing assurance
• Investment justification, research, knowledge sharing
• Data aggregation and enhancement
Data Source 1
Data Source 2
Data Source 3
Data Source 3
Business Group
Product Group
Central Tech Group
Conclusions
Data
Organization
Capabilities
• Centralize data management
• Knowledge of data = knowledge of business
• Technology is not enough – need the right people and processes
• Executive commitment is key
• Tough conversations can yield much better alignment
• Dedicate and protect resources to build capabilities
Who is Think Big?
• 100% Big Data Focus
• Founded in 2010 with 100+ engagements across 70 clients
• Unlock value of big data with data science and data engineering services
• Proven vendor-neutral open source integration expertise
• Agile team-based development methodology
• Think Big Academy for skills and organizational development
• Global delivery model
Questions and Answers
Thank You!