Google at the Big Data Russia conference
Presentation by Google Russia at the Big Data Russia conference (http://bigdatarussia.ru/).
Google confidential │ Do not distribute
Big Data with Google Cloud Platform
Focus on insight, not infrastructure
Daniel Bergqvist, Solution Engineer, Big Data Technologies
Olga Strelova, Cloud Platform Sales, Tel: +7 495 734-71-41, [email protected]
Why Big Data?
Big Data is driving Big Value
Used data from telematic sensors in over 46K vehicles to:
● Reduce daily routes by 85 million miles
● Save 8.4 million gallons of fuel
● Save over $30 million in miles cut/driver/day
Created Snapshot device to collect data on driving habits and user behavior in real-time
Calculated applicable discount to driver’s monthly premium based on their individual behavior
Analyzed the activity of their entire customer base (over 7M customers and 19B images)
Uncovered trends that improved customer acquisition, retention and value through optimized marketing
Trends
Increasing Digitization of Human & Economic Activity
Falling Costs of Storage & Computing
Increasing Pace of Innovation
Opportunities with Big Data
1. Recognize and seize market trends before your competitors
2. Capture business value from information
3. Create a smarter, learning organization
Big Data remains inaccessible
Big Data is Hard
● Complex technical infrastructure to support distributed computing
● Requires specialized expertise
● Time consuming
Big Data is Expensive
● Storage costs scale with larger datasets
● Computing resources must be provisioned for peak loads
● Personnel are expensive
Google is making Big Data accessible
Big Data is Hard → Easy
● No complex data architecture required
● Use the technical and product skillsets you already have
● Query within seconds and get real-time results
Big Data is Expensive → Affordable
● Pay on-demand for only the resources you use
● Take advantage of falling prices & Moore’s Law
● Reduce infrastructure management burden
Where did these come from?
Cloud Storage, Cloud SQL, Cloud Datastore
To organize the world’s information and make it universally accessible and useful
Google Services in Numbers
● Search: 1B Searches/Month; >25% of F500 (GSA)
● Android: 1.5M+ activations per day; 900+ M devices
● YouTube: 100 hours of video uploaded per minute
● G+: 500M+ accounts; 135M+ active in stream
● Apps: 500M+ Gmail
● Chrome: 310M+ browser users
● Maps & Earth: 1B+ downloads; 200M+ mobile; 10M+ activations on iOS
● Cloud Platform: 4.75M+ apps; 250K+ developers
Google is a pioneer in Big Data
[Timeline figure, 2002 to 2013. Technologies shown: GFS, MapReduce, Big Table, Dremel, Colossus, Flume, Spanner, MillWheel]
We help you manage the entire lifecycle of Big Data
[Lifecycle figure. Stages: Capture, Store, Process, Analyze. Products shown: Pub/Sub, Cloud Storage, Cloud SQL, Cloud Datastore, Dataflow, BigQuery, Open Source Tools]
Computing Patterns
Our Big Data products
Cloud Dataflow
• Successor to MapReduce and based on Google technologies, including Flume and MillWheel
• Fully managed service
• Create data pipelines that ingest, transform and analyze in batch or streaming mode
• Takes care of deploying, maintaining and scaling infrastructure
BigQuery
• Interactive analysis of large-scale datasets, providing real-time insights
• Run fast SQL queries against virtually limitless datasets in seconds
• Full visibility and control with pricing, only pay for querying and storage
• No complex data architecture required
Cloud Pub/Sub
• Event management system that simplifies analytics application architecture
• Connect your services with reliable, many-to-many asynchronous messaging
• Guarantees that messages will be delivered whether or not all consumers are online
• Provides a single global ingestion point, not dependent on zone or regional availability
• Scales to what you need with no wasted capacity
Open Source Tools
• Run Hadoop and other FOSS on Cloud Platform; take advantage of performance, ease of use and cost efficiency
• Using cloud resources eliminates capital costs and reduces administration time
• With one command line, start a cluster running Hadoop, Hive, Pig, Spark or Shark in order to get up and running quickly and without worrying about configuration hassles
• Using GCP storage products allows you to take advantage of accessing data within any Hadoop deployment
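The many-to-many, asynchronous delivery described for Cloud Pub/Sub above can be illustrated with a small in-memory sketch. This is a toy model only, not the Cloud Pub/Sub API: the `InMemoryBroker` class, its methods, and the topic and subscription names are all hypothetical. It shows the one idea that each subscription keeps its own queue, so a consumer that pulls late still receives every message published after it subscribed.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy many-to-many broker (hypothetical, not a Google API)."""

    def __init__(self):
        # topic name -> names of the subscriptions attached to it
        self._topic_subs = defaultdict(list)
        # subscription name -> messages waiting to be pulled
        self._queues = {}

    def subscribe(self, topic, subscription):
        """Attach a named subscription to a topic."""
        self._queues.setdefault(subscription, deque())
        if subscription not in self._topic_subs[topic]:
            self._topic_subs[topic].append(subscription)

    def publish(self, topic, message):
        """Copy the message into every subscription queue; return the fan-out count."""
        subs = self._topic_subs[topic]
        for name in subs:
            self._queues[name].append(message)
        return len(subs)

    def pull(self, subscription, max_messages=10):
        """Return up to max_messages queued messages, oldest first."""
        queue = self._queues[subscription]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


broker = InMemoryBroker()
broker.subscribe("clicks", "dashboard")
broker.subscribe("clicks", "archiver")

# Two independent publishers write to the same topic.
broker.publish("clicks", {"source": "web", "page": "/home"})
broker.publish("clicks", {"source": "mobile", "page": "/cart"})

# The dashboard consumer pulls right away; the archiver consumer is
# "offline" and pulls later, yet still finds both messages queued.
dashboard_batch = broker.pull("dashboard")
archiver_batch = broker.pull("archiver")
print(len(dashboard_batch), len(archiver_batch))
```

A managed service adds durability, acknowledgements and scaling on top of this pattern; none of that is modeled here.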
Let's look at specific examples
1. Marketing Analytics
Using Google Cloud Platform for marketing analytics enables a deeper understanding of how marketing investments are performing
What Cloud Platform offers:
● Easily micro-segment by looking for discrete patterns in large sets of customer data
● Measure campaigns by combining multiple datasets that can track campaigns across channels and users across stages of the buying funnel
● Market-mix modeling to optimize spend across channels
● Identify patterns and trends in real time to improve customer acquisition and ROI
The Technology
Integration between Google Analytics Premium and BigQuery allows for data mashups, analysis of user interaction across multiple devices, and complex queries at lightning speed to gain deeper, broader insights
Cloud Dataflow helps you ingest and analyze data from live campaigns, existing CRM tools, and any other data sources you need
Open Source Tools and Connectors allow you to harness the power of many open-source tools such as Hadoop and Spark to provide flexibility when analyzing campaign data
BigQuery enables interactive analysis of unlimited amounts of data, allowing you to seize opportunities and optimize in a timely manner, thereby increasing acquisition and ROI
Boosting Sales While Improving Shopping Experience
Home furnishing retailer Rooms To Go simplifies the consumer shopping experience by offering completely designed room packages.
2. Sensor Data & IoT
Using Google Cloud Platform for sensor data & IoT enables use of diffuse data sources to optimize large-scale systems & improve production processes
What Cloud Platform offers:
● Scalable, reliable platform for capturing and managing IoT data
● Ability to run analytics (streaming and historical) over this data
● Improve customer experiences based on faster responses to events
● Cost-effective storage needed to process vast amounts of data
The Technology
Google Cloud Storage, Cloud SQL, and Datastore provide scalable and secure ways to store data
Pub/Sub provides a reliable system for event collection and management
Dataflow lets you filter, aggregate and enrich data for both streaming and batch analysis under one API
BigQuery allows for interactive analysis of unlimited data to uncover trends in large databases and across all customers in order to improve customer experience
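The filter, aggregate and enrich steps mentioned for Dataflow can be sketched in plain Python. This is not the Dataflow SDK: the function names, the `DEVICE_SITES` lookup table and the sample readings are hypothetical. The sketch shows one pipeline function accepting either a list (batch) or a generator (a simulated stream). A real streaming pipeline would aggregate over time windows rather than wait for the stream to end.

```python
from collections import defaultdict


def filter_valid(readings):
    """Filter: drop readings with a missing temperature value."""
    for reading in readings:
        if reading.get("temp_c") is not None:
            yield reading


def enrich(readings, device_sites):
    """Enrich: attach the site name looked up from a device-to-site table."""
    for reading in readings:
        site = device_sites.get(reading["device"], "unknown")
        yield {**reading, "site": site}


def mean_temp_by_site(readings):
    """Aggregate: average temperature per site."""
    totals = defaultdict(lambda: [0.0, 0])  # site -> [sum, count]
    for reading in readings:
        bucket = totals[reading["site"]]
        bucket[0] += reading["temp_c"]
        bucket[1] += 1
    return {site: total / count for site, (total, count) in totals.items()}


def run_pipeline(readings, device_sites):
    """Chain the three steps; accepts any iterable (list or generator)."""
    return mean_temp_by_site(enrich(filter_valid(readings), device_sites))


def as_stream(items):
    """Simulate a stream by yielding one reading at a time."""
    for item in items:
        yield item


DEVICE_SITES = {"a1": "plant-north", "b2": "plant-south"}  # hypothetical lookup

batch = [
    {"device": "a1", "temp_c": 20.0},
    {"device": "a1", "temp_c": 22.0},
    {"device": "b2", "temp_c": None},  # dropped by the filter step
    {"device": "b2", "temp_c": 30.0},
    {"device": "c3", "temp_c": 10.0},  # device with no site on record
]

batch_result = run_pipeline(batch, DEVICE_SITES)
stream_result = run_pipeline(as_stream(batch), DEVICE_SITES)
print(batch_result)
```

Because every step is a generator over an iterable, the same code path serves both inputs, which is the "one API" idea in miniature.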
Connected Equipment/Devices
Lennox International Inc. is an American company. Through its subsidiaries, it is a provider of climate control products for the heating, ventilation, air conditioning, and refrigeration markets in housing and commercial sectors around the world.
Goal: Capture detailed product performance data and ambient conditions from the installed units for better innovation and customer service
● Innovation: Finding out areas for product improvements and new designs
● Customer Delight: Proactively providing energy settings advice to customers based on usage, weather conditions, etc.
● Customer Service: Predictive maintenance to avoid major breakdowns
● Cost Savings: Better understanding of failure points feeding back into better design, helping reduce warranty and replacement costs
3. Log Data
Using Google Cloud Platform for Log Data enables easy management of massive log files constantly ingesting real-time data with much shorter response times
What Cloud Platform offers:
● Better management of massive log files
● An efficient platform for capturing, managing and analyzing IoT infrastructure
● The ability to continuously identify customer trends and take timely actions
The Technology
BigQuery handles log files of massive volume, constantly ingesting real-time data with much shorter response times
Pub/Sub provides a fully managed service for reliable event ingestion, distribution and notifications, which automatically scales to what you need with no wasted capacity
Dataflow is a pipeline management system that allows you to examine a real-time stream of data as well as compare it to historical data in order to capture significant patterns and activities
Apps running in Compute Engine and App Engine benefit from advanced log analytics based on data streaming with real-time alerts
Motorola
[Architecture diagram. Components shown: Phones; App Engine; Compute Engine; Cloud Storage; Hadoop MapReduce Workflows; BigQuery Storage; BigQuery Workflows; BigQuery serving Business Analysts, Applications and Visualizations]
4. SaaS
Using Google Cloud Platform for SaaS enables ease of management for analytics
What Cloud Platform offers:
● Ease of integration with open source tools
● A platform to capture, process and analyze large-scale analytics without needing to worry about building a complex infrastructure
● Technology that scales and requires minimal administration
● The most cost-effective, fastest way to store and analyze data
The Technology
Connectors and Tools for Hadoop data sources allow you to easily install different open source processing frameworks such as Spark, Shark, Hive and Pig to take advantage of interoperability and portability within all these frameworks as well as other Google Cloud Platform products under one system
Dataflow takes care of ingestion, transformation and analysis of data, providing real-time access to application and consumer data across a set of devices
Compute Engine allows you to easily scale up and down depending on your workload. Also, per-minute billing lets you pay for exactly what you use, and sustained-use discounts automatically reward you for running steady-state workloads
BigQuery provides a 99.9% uptime SLA and you only pay for the storage you need and queries you run, giving you full visibility and control
Cloud Storage and BigQuery require no hardware or software, eliminating capital expenditure and the need to build complex infrastructure
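The effect of per-minute billing mentioned above can be shown with simple arithmetic. The sketch below compares rounding usage up to the minute with rounding up to the hour. The `HOURLY_RATE` value is a made-up figure, not a Google price, and minimum charges and sustained-use discounts are deliberately left out.

```python
import math

HOURLY_RATE = 0.10  # hypothetical price per VM-hour, not a real list price


def cost_per_minute_billing(runtime_minutes, hourly_rate=HOURLY_RATE):
    """Charge for each started minute of runtime."""
    billed_minutes = math.ceil(runtime_minutes)
    return billed_minutes * hourly_rate / 60


def cost_per_hour_billing(runtime_minutes, hourly_rate=HOURLY_RATE):
    """Charge for each started hour of runtime."""
    billed_hours = math.ceil(runtime_minutes / 60)
    return billed_hours * hourly_rate


# Short or uneven jobs are where the two schemes differ most.
for minutes in (15, 61, 120):
    per_minute = cost_per_minute_billing(minutes)
    per_hour = cost_per_hour_billing(minutes)
    print(f"{minutes:>4} min  per-minute: {per_minute:.4f}  per-hour: {per_hour:.4f}")
```

Under these assumptions a 61-minute job is billed for 61 minutes in the first scheme and for two full hours in the second, while a job that runs exactly 120 minutes costs the same either way.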
Streak - CRM in email
Managing millions of interactions and recommendations/day with Prediction API and BigQuery
5. Traditional Hadoop Workloads
Using Google Cloud Platform for Hadoop Workloads enables an easy and effective way to unlock the power of the Apache Hadoop framework
What Cloud Platform offers:
● Quick startup times
● Unmatched value with per-minute billing to optimize for scale and speed
● Agility to mix and match data with multiple open source software and cloud services without worrying about configuration
● Greater stability for running Hadoop
● Flexibility and control of resizing your cluster depending on workload
● An easy way to leverage the Hadoop framework without worrying about investing in costly infrastructure and administration
The Technology
Compute Engine virtual machines start in seconds
bdutil allows you to easily deploy and use the best tools from the open-source ecosystem. With one command line, you can start a cluster running Hadoop, Hive, Pig, Spark or Shark in order to get up and running quickly without worrying about configuration hassles
Cloud Storage frees you from the burden of investing in complex disks and machines and provides flexibility to scale up and down when needed
Connectors provide access to Cloud Storage, BigQuery and Datastore, which allow you to turn down your cluster without losing any of your data and take advantage of accessing your data within any of your Hadoop deployments
Cdiscount.com
France's largest e-commerce site, Cdiscount.com, is using Compute Engine because it's 15x faster than their on-premises data warehouse.
Google probably processes more information than any company on the planet and tends to have to invent tools to cope with the data. As a result its technology runs a good five to 10 years ahead of the competition.
Bloomberg Businessweek, June 2014