© 2016 MapR Technologies
Introduction to MapR
OUR GLOBAL REACH
• San Jose, California (HQ)
• United Kingdom
• Korea
• Netherlands
• Germany
• France
• India
• Singapore
• Australia
• Japan
MapR is Transforming Business with Data
WHAT WE DO
Bring together analytics and operations in next-generation Converged Applications for the business
WHY IT MATTERS
Empowers companies to grow revenue through innovation and to cut costs
HOW WE DO IT
Patented technology architecture with the world’s only complete Converged Data Platform
Leading companies around the world are transforming their business with the industry’s only Converged Data Platform
Enabling Transformation Through Converged Applications
OPERATIONAL APPLICATIONS
Immediate
ANALYTICAL APPLICATIONS
Historical
Complete access to real-time and historical data in one platform
Converged Applications
Our Customers Are Leading the Way
Financial Services, Telco & Media, Ad tech, Government, Retail
• Over 80 use cases, including payment efficiency and accuracy of claims processing; $2M/month reduction in payment errors and fraud.
• Provides 95% of Fortune 500 CPG companies and retailers with data and analytics; achieved $2.5M/year in savings from mainframe and data warehouse (DW) offload.
• Ported a credit scoring use case to MapR, resulting in 20X cost savings over DB2.
• Biometric identification system for more than 1.25 billion people in India; $1.3B yearly savings through fraud reduction.
• Developed a new self-service analytics platform to give their customers better market insights to help them operationalize their decisions.
• Protects $1 trillion in charge volume from fraud every year.
• Amex Offers program has saved card members over $180M.
INNOVATION
COST REDUCTION
Powered by the World’s Only Converged Data Platform
Breakthrough Reliability
Operate globally at enterprise grade for mission critical apps
Breakthrough Value
Radically cut costs of big data IT infrastructure
Breakthrough Innovation
Enable continuous innovation with proprietary technology and open source access
A platform engineered to support next-generation applications
Built with Breakthrough Technology
Innovative architecture delivers uncompromising scale, speed and availability
Optimized for Speed
Supports parallel processing of large-scale analytics and machine learning across data.
Optimized for Availability
Provides advanced capabilities, including self-healing and disaster recovery, to support continuous data access.
Optimized for Scale
Enables high-scale processing by organizing the underlying data into large distributed containers, scaling to trillions of files.
A Crisis of Complexity
Expensive to stitch together
Fragile, not agile
“Connected” and “Federated,” not converged
Limited in scale, no global
Many security models, points of failure
Stitched-together components: Hadoop & Spark cluster; Cassandra for event or content logging; classic data warehouse; message middleware; application server; document (JSON) DB; search server
BUSINESS MODEL: SUPPORT FREE SOFTWARE
vs. the Complete Data Platform
Engineered as a single platform
Powers legacy and next-gen apps
Enables continuous innovation
Supports all big data technologies
Multiple deployment environments
BUSINESS MODEL: ENTERPRISE SOFTWARE LICENSES
A Single Platform: On-Prem, In the Cloud, or InterCloud
Flexible processing where change is the norm
Distributed processing across clusters, data centers, public & private cloud environments
Supports global apps that can scale arbitrarily
MapR Customer Use Cases
Common Use Cases:
ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration
MARKETING OPTIMIZATION
• Recommendation engines & targeting
• Customer 360
• Click-stream analysis
• Social media analysis
• Ad optimization
RISK & SECURITY OPTIMIZATION
• Network security monitoring
• Security information & event management
• Fraudulent behavioral analysis
OPERATIONS INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Smart meter analysis
• Non-Productive Time mitigation
Exploration and Production Optimization
Find new sources of revenue and maximize revenue from existing sources
OBJECTIVES
• Make better and faster decisions on pursuing future production projects
• Make more accurate measurements of yield/cost ratio on existing projects
CHALLENGES
• Optimal predictive analytics requires massive data volumes, compute power, and input speed, leading to costly infrastructure
• Existing data loads limit the ability to run additional analytics for identifying new revenue opportunities
SOLUTION
• Cost-effective, high performance and scalable computing platform for capturing data from many sources
• Ability to run complex analytics over massive volumes of data to identify patterns that lead to new revenue sources
• High performance analytics to keep up with massive volumes of high velocity, granular data
• Scalable platform for more cost-effective, parallel processing of predictive analytics
Business Impact
More precise predictions for new sources of revenue, lower costs associated with exploration, and more efficient production use of existing revenue sources
Image credit: “Oilfields near Ramana” by Mark van Laere is licensed under CC BY-ND 2.0
NOV Avoids Oil Well Failure and Reduces Costly Downtime
Predictive analytics on oil well operations enables proactive repairs prior to failure
OBJECTIVES
• Reduce well failure rate by improving the predictability of the repair and replacement schedule of parts/equipment by analyzing higher resolution data
• Avoid costly downtime of revenue-generating operations
CHALLENGES
• High cost of managing huge volumes of high resolution data to predict failure
• Failures occur due to many different variables (usage patterns, usage conditions, etc.), so data on all factors must be captured and correlated
SOLUTION
• Efficiently collect/store huge volumes of sensor data (up to 1 TB of historical data per rig, PBs of total data) and scale out as data grows
• Use predictive analytics and anomaly detection to analyze all data inputs and, based on historical patterns, alert when equipment is likely to fail
• High performance MapR lets them store data at a higher frequency, with fewer resources
• Low latency enables faster and more advanced responsiveness for keeping assets running and productive
Business Impact
Customers save millions of dollars with predictive maintenance; gaining greater insights with higher resolution data provides a competitive advantage
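The solution above combines anomaly detection with historical patterns to alert before equipment fails. As a minimal illustration only (this is not NOV's model), a rolling z-score detector flags sensor readings that deviate sharply from recent history. The function name, window size, and threshold are hypothetical choices for the sketch.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_alerts(readings, window=20, threshold=3.0):
    """Return indices of readings whose z-score against the previous
    `window` readings exceeds `threshold` (hypothetical parameters)."""
    history = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip a perfectly flat history (sigma == 0) to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                alerts.append(i)
        # The current reading joins the history only after it has been scored.
        history.append(value)
    return alerts


if __name__ == "__main__":
    # Synthetic vibration signal: small periodic variation, then a spike at index 30.
    signal = [1.0 + 0.1 * (i % 3) for i in range(40)]
    signal[30] = 5.0
    print(rolling_zscore_alerts(signal, window=10, threshold=3.0))  # → [30]
```

A production system would learn thresholds per sensor and correlate several inputs, as the challenges above note; the rolling window here only shows the basic "compare against recent history" idea.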
Smart Meter Analysis
Make more accurate operations decisions from smart meter data
OBJECTIVES
• Better segmentation of consumer markets for optimized pricing
• Identify opportunities for value-added data services: alerts on anomalous usage, recommended power plans, allocation planning, etc.
CHALLENGES
• Cost-effectively managing high velocity data from millions of sources
• Scaling for growth expectations and higher resolution data
SOLUTION
• Fast ingestion/storage and large-scale analytics on a cost-effective, distributed computing platform
• Clustering/segmentation, proactive alerting, usage recommendations, graph analysis, pattern matching, etc. for customer billing optimization, demand response optimization and increased operational efficiency
• High performance analytics to get a better understanding of usage and behavior
• Integrated security and HA/DR capabilities to comply with regulations
Business Impact
Revenue opportunities around better resource allocation, special offers, customer analytics, etc.
Image credit: “Onzo Smart Energy Meter Kit Display” by Digitpedia Com is licensed under CC BY 2.0
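Clustering/segmentation is one of the analytics listed for smart meter data. A hypothetical sketch of the idea: one-dimensional k-means (Lloyd's algorithm) groups meters by average daily consumption. The function name, the sample kWh values, and k are illustrative assumptions; a real deployment would cluster on many usage features at scale.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on scalar values. Returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic init: spread the initial centroids across the sorted data.
    centroids = [ordered[(len(ordered) - 1) * j // max(k - 1, 1)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # Converged: assignments no longer move the centroids.
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical average daily kWh per meter: low-usage vs. high-usage homes.
    daily_kwh = [8.0, 9.5, 31.0, 29.0, 10.5, 33.0]
    centroids, labels = kmeans_1d(daily_kwh, k=2)
    print(centroids, labels)
```

With these sample values the meters split into a low-usage segment (centroid near 9.3 kWh) and a high-usage segment (centroid 31 kWh), the kind of segmentation that could feed pricing or plan recommendations.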
World’s Largest Biometric Database
Indian government agency creates biometric identification system for all citizens
INDIAN GOVERNMENT AGENCY
OBJECTIVES
• Increase % of citizens who have bank accounts and can access benefits
• Reduce corruption and fraud in government aid programs
CHALLENGES
• Issues with data replication and loss across clusters in competing distribution
• Weak disaster recovery strategy in competing distribution
• Complicated upgrade process and high availability issues
SOLUTION
• Complete data backup: snapshots and mirroring
• Lower maintenance overhead: rolling upgrades
• Fingerprint and retina scans with 200-millisecond response: MapR-DB
Business Impact
➢ Approximately 20% reduction in fraud and leakage of government aid programs ($50B)
➢ Average citizen’s life is transformed, as they can get access to various stipulated benefits
➢ 645 million citizens currently enrolled, providing identity for approx. 60% of the population
➢ 10x throughput; 4-6x lower latency; 1/3 the hardware of the previous Hadoop distribution
Next Steps
Put the Power of MapR to Work for Your Business
Wherever you are on your big data journey
Reduce costs to improve efficiency
Extend capabilities to grow revenues
Innovate for disruptive advantage
We Make It Easy to Get Started
1. Experimentation: Understand the capabilities of a big data platform
2. Implementation: Develop first use cases and put them into production
3. Expansion: Expand to multiple use cases across key lines of business
4. Optimization: Integrate and expand data-driven apps and analysis to all lines of business and more business functions
Take the MapR Big Data Maturity Model
Sullexis
Cost-Effective Data Archiving and Reporting with Big Data Tools and the Cloud
DAMA (Houston) – February 14, 2017
Tim Morgan - Managing Director
About Sullexis
• Sullexis is a professional services firm that specializes in helping its clients to
create, manage, and enhance data to accelerate and improve decision making
across the enterprise. We bring data and technology together to make our clients
measurably more effective.
• With industry experience ranging from energy and manufacturing to finance and
high tech, Sullexis brings the technology, processes, and strategies together to
make you more effective in what you do.
• Founded in 2006, Sullexis is headquartered in Houston, TX and has a delivery
center in Monterrey, MX.
• Our consultants have implemented solutions across the US, Caribbean, Europe
and Latin America.
Client Background
• Our client is one of North America’s largest Oilfield Services companies
providing well construction, completion and operating services to exploration
and production companies.
• A significant number of acquisitions over the last 10 years resulted in 18
different ERP applications running on 5 different platforms. To enable
future scalable growth, they embarked on an ERP standardization project.
The goal was to put the entire company on one technology stack with a
common process.
• Having decided to consolidate on a single ERP, the client still needed to
determine how best to handle compliance, regulatory and operational needs
associated with the legacy systems.
• Migrating transaction data to the new ERP would be cost-prohibitive and
risky, and market-ready data archiving solutions were costly and unable to
meet the defined business needs.
• This left retaining the legacy systems themselves, which would be very costly,
or finding a new approach that was cost effective, reliable and could meet
the business needs.
18 to 1
Key Requirements
Preserve and provide easy access to ALL data
• Preserve all structured and unstructured data (approx. 12 TB)
• Ability to run legacy reports to meet compliance, regulatory and ongoing business needs
• Easy for a business person to use, to minimize dependency on IT resources
• Ability to provide consolidated views across disparate data sets
Be cost effective
• Flexible and scalable compute/data storage options (e.g., use of cold storage)
• Provide access through existing BI and reporting tools (e.g., Hyperion, MS Power BI, SAP Lumira) to eliminate new purchases and training
• Enable 100% decommissioning of legacy systems
Enable the future
• Establish processes and tools that support future company acquisitions
• Provide a platform to enable new and innovative data applications and solutions
Solution Selection Process
Initial Analysis
• Market research
• Vendor presentations
Two-week POC ‘bake-off’ to demonstrate:
• Rapid integration of different data sources, both structured and unstructured
• Connectivity to SAP ECC and Oracle EBS
• Reporting capabilities re-using SAP Lumira
Winning POC Solution
• A MapR Converged Data Platform cluster installed in MapR’s private cloud
• Predefined adapters for Oracle used to extract and load structured data to MapR (<100 GB)
• Unstructured data (CSV, PDF and TXT files) loaded and made viewable through Elastic Search
• Apache Drill and a local install of SAP Lumira connected to the MapR cluster to demonstrate reporting capabilities
Solution Architecture
Project Considerations
Technology Factors
• Reliability and speed of connection to the cloud
• Count and category of machines in the cloud (CPU, RAM, storage)
• Volume of data (row size and count)
• Ongoing transactional use of the source system
• Variable needs for data (frequency, response, volume)
Project Factors
• Timeliness of and accessibility to various parties
• Cataloging of all data
• Evaluation of the transactional status of existing data sets, and how to address moving targets (blackout periods, iterative loads, journaling)
• Ability to validate data loads (row counts, samples)
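Validating loads by row counts and samples can be sketched in a few lines. This is a hypothetical illustration, not the Sullexis tooling: two SQLite connections stand in for the legacy source and the archive target, and the function name, table, and key column are assumptions. It first compares total row counts, then spot-checks a random sample of rows by key.

```python
import random
import sqlite3


def validate_load(source, target, table, key, sample_size=5, seed=0):
    """Compare row counts, then spot-check a random sample of rows by key.
    `table` and `key` are trusted identifiers here, not user input."""
    src_count = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    tgt_count = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    report = {"source_rows": src_count, "target_rows": tgt_count,
              "count_match": src_count == tgt_count, "mismatched_keys": []}
    keys = [row[0] for row in source.execute(f"SELECT {key} FROM {table}")]
    rng = random.Random(seed)  # Seeded so a validation run is reproducible.
    for k in rng.sample(keys, min(sample_size, len(keys))):
        query = f"SELECT * FROM {table} WHERE {key} = ?"
        src_row = source.execute(query, (k,)).fetchone()
        tgt_row = target.execute(query, (k,)).fetchone()
        # A missing target row (None) or any differing column is a mismatch.
        if src_row != tgt_row:
            report["mismatched_keys"].append(k)
    return report


if __name__ == "__main__":
    src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for db in (src, tgt):
        db.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
        db.executemany("INSERT INTO invoices VALUES (?, ?)",
                       [(i, i * 10.0) for i in range(1, 11)])
    tgt.execute("DELETE FROM invoices WHERE id = 7")  # Simulate a dropped row.
    print(validate_load(src, tgt, "invoices", "id", sample_size=10))
```

Row counts catch dropped or duplicated rows cheaply; the sample catches rows that arrived but were altered in transit (for example, by type mapping), which a count alone would miss.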
Solution Architecture
Data sources: PDF, CSV, XLS files; Oracle; Navision; SysPro; MS Excel; Great Plains (NFS access)
MapR Converged Data Platform:
• Web-scale storage: MapR-FS
• Database: MapR-DB
• Event streaming: MapR Streams
• Enterprise-grade platform: real time, unified security, multi-tenancy, disaster recovery, global namespace, high availability
Why Azure
• Sullexis and client both experienced with Azure and MSFT
• MapR Quick Start on Azure made it easy and fast to get started
• MapR already successfully running well on Azure (see blog)
• Client’s enterprise MSFT account made it simple to procure and administer
• Connectivity to Azure via ExpressRoute mitigated some of the reliability and latency concerns of the connection
Apache Drill - Flexible & Fast
Access to any data type, any data source
• Relational
• Nested data
• Schema-less
Rapid time to insights
• Query data in-situ
• No schemas required
• Easy to get started
Integration with existing tools
• ANSI SQL
• BI tool integration
Scale in all dimensions
• TB to PB scale
• Thousands of users
• Thousands of nodes
Granular security
• Authentication
• Row/column level controls
• De-centralized
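The "no schemas required" point means that structure is discovered when the data is read, not declared up front. The sketch below imitates that idea in plain Python rather than through Drill itself: it projects columns, including dotted nested paths, from JSON records whose fields vary, and returns None where a field is missing. The function name, records, and field names are hypothetical.

```python
import json


def project(json_lines, paths):
    """Schema-on-read projection: for each JSON record, pull the requested
    dotted paths (e.g. "vendor.name"); missing fields become None."""
    rows = []
    for line in json_lines:
        record = json.loads(line)
        row = []
        for path in paths:
            value = record
            # Walk one nested level per dotted segment.
            for part in path.split("."):
                if isinstance(value, dict) and part in value:
                    value = value[part]
                else:
                    value = None  # Field absent in this record.
                    break
            row.append(value)
        rows.append(tuple(row))
    return rows


if __name__ == "__main__":
    archive = [
        '{"invoice": 1001, "amount": 250.0, "vendor": {"name": "Acme"}}',
        '{"invoice": 1002, "amount": 75.5}',
        '{"invoice": 1003, "vendor": {"name": "Globex", "region": "TX"}}',
    ]
    for row in project(archive, ["invoice", "amount", "vendor.name"]):
        print(row)
```

No table definition is needed before the query: the second record simply yields None for the vendor, and the extra "region" field in the third record is ignored because it was not requested.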
Sqoop – Easy & Efficient
A Sullexis-developed direct-connect extract tool based on Sqoop was
seen as meeting all of the technology and project factors:
• Addresses all source data
• Support for both Oracle and SQL Server
• Import direct to Parquet
• Supports type mapping
• Supports incremental imports and merges
• Enables validation via row count matches
• Provides for parallel imports for enhanced speed (but also allows for throttling)
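Sqoop parallelizes an import by splitting the range of a numeric split-by column (by default the primary key) into one slice per mapper; the mapper count doubles as the throttle on source load. The sketch below imitates that scheme in Python with a thread pool. It is not Sqoop's code and not the Sullexis tool: the in-memory "source table" and the functions `compute_splits`, `extract_slice`, and `parallel_import` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_splits(lo, hi, num_splits):
    """Divide the inclusive key range [lo, hi] into contiguous slices."""
    bounds = [lo + (hi - lo + 1) * i // num_splits for i in range(num_splits + 1)]
    # Drop empty slices, which occur when there are fewer keys than splits.
    return [(bounds[i], bounds[i + 1] - 1)
            for i in range(num_splits) if bounds[i] <= bounds[i + 1] - 1]


def extract_slice(rows, lo, hi):
    # Stand-in for "SELECT * FROM t WHERE id BETWEEN lo AND hi" on the source.
    return [r for r in rows if lo <= r[0] <= hi]


def parallel_import(rows, num_mappers=4):
    keys = [r[0] for r in rows]
    splits = compute_splits(min(keys), max(keys), num_mappers)
    # max_workers caps concurrent source queries: this is the throttle.
    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        parts = pool.map(lambda s: extract_slice(rows, *s), splits)
        imported = [r for part in parts for r in part]
    return splits, imported


if __name__ == "__main__":
    source = [(i, f"row{i}") for i in range(1, 101)]
    splits, imported = parallel_import(source, num_mappers=4)
    print(splits)
    # Row-count match, in the spirit of validation via row counts.
    print(len(imported) == len(source))
```

Even slices only balance well when keys are roughly uniform; skewed keys give some workers far more rows, which is why Sqoop lets users choose the split-by column.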
Elastic Search – Simple & Transparent
[Diagram: Reporting Client (ODBC or JDBC) and Client Browser Web UI (HTTP(S)); edge node; POSIX client; nodes 0, 1 and 2, each storing PDF, TIFF and CSV files on MapR-FS]
Highlights
• Quick and easy startup
• Quick and easy startup
• Primary technical concerns around latency to the cloud can be successfully mitigated (e.g., the client’s cluster enabled transfer rates of 100-140 million records per hour)
• While it is still early, the base business case will result in payback within a few months, and business users have said that data access is easier now than it was in the legacy system
• This ERP legacy system decommissioning approach can be executed in as little as 2 months for a complete data archive, or up to 6 months with robust operational reporting
• Provides repeatable tools and processes for future system decommissioning needs
• The client is already experimenting with the platform for use as an IoT sensor data historian. So far, the results have been encouraging
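To put the quoted 100-140 million records per hour in perspective, the arithmetic below converts it to a per-second rate and to total load time. The 1-billion-row archive is a hypothetical size chosen for illustration, not a figure from this project, and the helper names are assumptions.

```python
def per_second(records_per_hour):
    """Convert an hourly transfer rate to records per second."""
    return records_per_hour / 3600


def load_hours(total_records, records_per_hour):
    """Hours needed to move `total_records` at a constant rate."""
    return total_records / records_per_hour


if __name__ == "__main__":
    hypothetical_archive = 1_000_000_000  # Assumed row count, for illustration.
    for rate in (100_000_000, 140_000_000):
        print(f"{rate:,} rec/h = {per_second(rate):,.0f} rec/s; "
              f"{load_hours(hypothetical_archive, rate):.1f} h for 1B rows")
```

At those rates the cluster moves roughly 28,000 to 39,000 records per second, so a billion-row archive would take about 7 to 10 hours of sustained transfer.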
Demonstration