TRANSCRIPT
Zenko Webinar: Enabling Data Control in a Multi-Cloud World
Thursday, August 3, 2017
Giorgio Regni, Scality Co-founder & CTO
Laure Vergeron, Software Engineer
We’ll shed some light on a few questions:
• What does multi-cloud mean?
• How can you leverage the efficiency of both public and private clouds?
• How can this multi-cloud data controller perform search and discovery across clouds?
• How can you get involved with Zenko?
Zenko Webinar: Multi-cloud, hybrid stores, open-source
Freedom to leverage multiple, different cloud infrastructures, private or public
Acknowledge that each application has its own infrastructure requirements that will evolve over time
Acknowledge that each cloud service has its own domain of expertise, and leverage the native services it offers
What does multi-cloud mean?
To you, what does multi-cloud enable? (Here is one of our answers.)
Examples of Use Cases for Multi-Cloud?
▪ Content Distribution
  ▪ Media companies have tens of thousands of movies, which they store in a private cloud for control. When it is time to publish a movie, it makes sense to copy it to a public cloud to use its transcoding and CDN services.
▪ Compute Bursting
  ▪ Banks run risk analyses on thousands of CPUs every night. These intensive computations run for only a few hours. Rather than leaving servers idle for the rest of the day, it makes sense to use public cloud services for the computation.
▪ Analytics
  ▪ E-commerce companies do more and more machine learning on their very large data lakes. Rather than setting up Hadoop infrastructure in-house, a company can copy just the relevant data set to a Hadoop cloud, run the appropriate algorithm, retrieve the result, and delete the cloud copy of the data to save on storage costs.
▪ Long-Term Archival / Cold Storage
  ▪ Regularly accessed data is cheaper to store in a private cloud, but never-accessed data is cheaper to keep in a long-term archive cloud offering. Automatically archiving never-accessed data can save a lot of money.
Examples of Use Cases for Multi-Cloud
The Zenko Vision
Control and Freedom for data in a Multi-Cloud world.
• Single interface to any cloud
  ▪ S3 API as a single API set to any cloud
• Allow reuse in the cloud
  ▪ Maintain the native cloud format
• Always know your data and where it is
  ▪ Metadata search
• Trigger actions based on data
  ▪ Data workflow to manage replication, location
The Zenko Multi-Cloud Data Controller
[Diagram: One S3 API for any cloud, with Native Format, Data Management, and Data Insight]
• Allow reuse in the cloud
  ▪ Maintain the native cloud format
• Full compatibility with S3
  ▪ IAM, policies, request syntax...
The Zenko Multi-Cloud Data Controller
What was Zenko CloudServer’s previous name? (Check your answer on Docker Hub.)
Zenko is not Scality’s first open-source project...
Open Source Scality CloudServer Adoption
▪ Launched June 2016
▪ Open-source implementation of the AWS S3 API
▪ Code available on GitHub under the Apache 2.0 license
▪ Packaged in a Docker container for easy deployment
▪ Seamless upgrade to the S3 Connector for the RING
Now Over 700,000
“Scality provides our backend storage and gives us a single interface for developers to code within any cloud on a common API set. With Scality, we can write an application once and deploy anywhere on any cloud.”
Mathias Herberts, co-founder and CTO at Cityzen Data
…Developers Are Seeing the Benefits
“We are big users of Docker in our production environment and implemented the Docker version of Scality S3 Server. It is efficient, secure with encryption and S3 authentication, and very easy to maintain.”
Christian Patry, System Engineer, BlueSolutions by Polyconseil
• The Amazon S3 API is the de facto standard
  ▪ Extended to support multiple cloud backends
  ▪ Provides a full-featured S3 interface, independent of the backend cloud store’s capabilities
• Native cloud data format
  ▪ Data stored in the cloud must be accessible in a standard format
  ▪ Access via standard keys and methods enables use by cloud services without change
• Highly available cloud service
  ▪ Integrated into HA services through Docker
  ▪ Extensible to the RING if a higher level of availability is required
CloudServer: Amazon S3 & Native Format
[Diagram: an app writes through the S3 API to a cloud gateway, which stores opaque data in the cloud]
Many gateway products store data in a closed "black box" format that is unreadable by native cloud services and apps in the cloud.
Gateways store "opaque data" in the cloud as a black box. Cloud apps try to access it, but cannot due to the proprietary format.
What is the default location constraint for AWS S3? (The AWS S3 documentation has the answer.)
Do you know your AWS basics?
• Zenko manages the bucket metadata namespace
  ▪ Decoupled from the underlying data location
  ▪ Native APIs used to store data
• Control of bucket location
  ▪ Provides the default storage location for objects stored (PUT) into that bucket
  ▪ Buckets can be managed across multiple RINGs & public cloud regions
  ▪ Location mapping is managed through a configuration file
Native Cloud Data & Extended Location Control
METADATA: Zenko Namespace
PUT Bucket1 LocationConstraint: “RING-West” → REST/Sproxyd → Location: RING-West
PUT Bucket2 LocationConstraint: “S3-US-East-1” → S3 API → Location: us-east-1
PUT Bucket3 LocationConstraint: “Azure-US” → Blob Storage API → Location: Windows.AzureStorage.US
DATA: via native drivers/APIs
Innovation: Extended Location Control across clouds
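The PUT Bucket examples above can be sketched in a few lines, under stated assumptions. On the wire, a standard S3 PUT Bucket request carries its location in a `CreateBucketConfiguration` XML body with a `LocationConstraint` element; that part is the regular S3 API. The `LOCATIONS` map, its fields, and the `RING-West` default are hypothetical stand-ins for Zenko's location configuration file, whose real format is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"

# Hypothetical location map modeled on the slide's examples; Zenko's real
# configuration file format is NOT reproduced here.
LOCATIONS = {
    "RING-West": {"backend": "ring", "api": "REST/Sproxyd", "target": "RING-West"},
    "S3-US-East-1": {"backend": "aws_s3", "api": "S3 API", "target": "us-east-1"},
    "Azure-US": {"backend": "azure", "api": "Blob Storage API",
                 "target": "Windows.AzureStorage.US"},
}
DEFAULT_LOCATION = "RING-West"  # assumption: used when a PUT Bucket has no constraint


def create_bucket_body(location: str) -> bytes:
    """Client side: build a standard S3 CreateBucketConfiguration body."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    root = ET.Element("CreateBucketConfiguration", xmlns=S3_NS)
    ET.SubElement(root, "LocationConstraint").text = location
    return ET.tostring(root, encoding="utf-8")


def resolve_location(body: Optional[bytes]) -> dict:
    """Server side: map the requested LocationConstraint to a backend entry."""
    name = DEFAULT_LOCATION
    if body:
        node = ET.fromstring(body).find(f"{{{S3_NS}}}LocationConstraint")
        if node is not None and node.text:
            name = node.text
    if name not in LOCATIONS:
        raise ValueError(f"unknown location: {name}")
    return LOCATIONS[name]


if __name__ == "__main__":
    body = create_bucket_body("Azure-US")
    print(body.decode())
    print(resolve_location(body)["api"])  # Blob Storage API
```

Keeping the bucket-to-location decision in the metadata namespace is what lets each backend receive objects through its own native API, as the diagram shows.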
• Replication supports copying objects across RINGs
  ▪ Exposed through the AWS S3 API for Cross-Region Replication (CRR)
  ▪ Configure replication on a source bucket and assign the target bucket in any cloud
  ▪ Objects are asynchronously replicated to the target, keeping the native key/data format
Backbeat - Policy-Based Replication Across Clouds
[Diagram: OBJ1 is PUT into RING1 in the Zenko namespace; Backbeat (async replication) copies it through the S3 bucket CRR API to S3-us-east-1 in the AWS S3 namespace, where Amazon CloudFront, Amazon EC2, and Amazon EMR can use it]
Backbeat Architecture (Details)
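Because replication is driven through the standard S3 bucket replication (CRR) API, configuring it amounts to sending a `ReplicationConfiguration` document to the source bucket. Below is a minimal sketch that builds such a document with one enabled rule, followed by a toy in-memory model of asynchronous replication. The role ARN, rule ID, and bucket names are hypothetical, and the `ToyReplicator` class only illustrates the write-now, replicate-later behavior with an unchanged key; it is not how Backbeat is implemented. (On AWS itself, CRR also requires versioning on both buckets.)

```python
import xml.etree.ElementTree as ET
from collections import deque

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def replication_config(role_arn: str, rule_id: str, target_bucket: str,
                       prefix: str = "") -> bytes:
    """Build an S3 PUT Bucket replication body with one enabled rule."""
    root = ET.Element("ReplicationConfiguration", xmlns=S3_NS)
    ET.SubElement(root, "Role").text = role_arn
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = rule_id
    ET.SubElement(rule, "Status").text = "Enabled"
    ET.SubElement(rule, "Prefix").text = prefix  # empty prefix: every object matches
    dest = ET.SubElement(rule, "Destination")
    ET.SubElement(dest, "Bucket").text = f"arn:aws:s3:::{target_bucket}"
    return ET.tostring(root, encoding="utf-8")


class ToyReplicator:
    """In-memory illustration of async replication (NOT Backbeat's design)."""

    def __init__(self, source: dict, target: dict, prefix: str = ""):
        self.source, self.target, self.prefix = source, target, prefix
        self.pending = deque()  # keys written to the source but not yet copied

    def put(self, key: str, data: bytes) -> None:
        self.source[key] = data  # the write completes on the source first
        if key.startswith(self.prefix):
            self.pending.append(key)  # replication is deferred

    def drain(self) -> int:
        """Copy queued objects to the target under the same key; return the count."""
        copied = 0
        while self.pending:
            key = self.pending.popleft()
            self.target[key] = self.source[key]  # same key, same bytes: native format
            copied += 1
        return copied
```

Until `drain()` runs, the target lags behind the source; that window is the price of not blocking writes on a remote cloud.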
Clueso - Metadata Search
• Federated search on metadata
  ▪ Search across cloud namespaces, independent of location
  ▪ Applications can attach extended metadata as key/value pairs through optional S3 “x-amz-meta-” headers
  ▪ Search on one or multiple attributes, including fuzzy searches
• Programmatic access to search
  ▪ Accessed through a RESTful API with attribute parameters
  ▪ Natural fit with S3 semantics, e.g.
    GET /bucketName?search&attributeKey='attributeValue'
Clueso Search Engine
Examples:
  SELECT key WHERE ContentType = 'PDF'
  SELECT key WHERE Title LIKE 'Mathematics%'
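To make the request shapes concrete, here is a minimal sketch that turns user attributes into `x-amz-meta-` headers (the standard S3 mechanism for user metadata) and builds a search path following the `GET /bucketName?search&attributeKey='attributeValue'` pattern shown on the slide. The exact query syntax Clueso accepts may differ; the helper names and the quoting rules are assumptions for illustration only.

```python
from urllib.parse import quote


def user_metadata_headers(attributes: dict) -> dict:
    """Map user attributes to S3 user-metadata headers for a PUT Object request."""
    # S3 user metadata travels as x-amz-meta-<name> request headers.
    return {f"x-amz-meta-{name}": value for name, value in attributes.items()}


def search_path(bucket: str, **attributes: str) -> str:
    """Hypothetical helper: build /bucketName?search&key='value'.

    Values are wrapped in single quotes as on the slide, then percent-encoded.
    """
    params = ["search"]
    for key, value in attributes.items():
        quoted_value = quote("'" + value + "'", safe="")
        params.append(f"{quote(key, safe='')}={quoted_value}")
    return f"/{quote(bucket, safe='')}?" + "&".join(params)


if __name__ == "__main__":
    print(user_metadata_headers({"title": "Mathematics I"}))
    print(search_path("docs", ContentType="PDF"))  # /docs?search&ContentType=%27PDF%27
```

Percent-encoding matters here because wildcard characters such as `%` would otherwise be misread as the start of an escape sequence.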
• The goals are to allow users to retrieve keys:
  ▪ By user metadata (x-amz-meta headers and tags), which is not predefined
  ▪ By object owner
  ▪ By creation date: before or after a given date, or within a range of dates
  ▪ By multiple attributes combined with AND and OR conditions
  ▪ With fuzzy search on attribute values, such as partial strings and wildcards (as with SQL “LIKE” statements)
  ▪ With programmatic access to search functionality via an API
Clueso - Overview and Goals
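The retrieval goals above map naturally onto SQL. Clueso relies on Spark SQL; in the minimal sketch below, Python's built-in `sqlite3` stands in for it purely to illustrate the query semantics. User metadata is stored as (key, attribute, value) rows rather than columns, so new attributes need no schema change; owner and creation date are ordinary columns; LIKE provides wildcard matching; and conditions combine with AND or OR. The table layout, sample objects, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical sample objects: (key, owner, creation date, user metadata).
OBJECTS = [
    ("calc.pdf", "juliet", "2017-06-01", {"contenttype": "PDF", "title": "Mathematics I"}),
    ("algebra.pdf", "romeo", "2017-07-15", {"contenttype": "PDF", "title": "Mathematics II"}),
    ("poem.txt", "juliet", "2017-07-20", {"contenttype": "TXT", "title": "Sonnet"}),
]


def load(conn: sqlite3.Connection) -> None:
    """Create the tables and load the sample objects."""
    conn.execute("CREATE TABLE objects (obj_key TEXT PRIMARY KEY, owner TEXT, created TEXT)")
    # One row per attribute: any object may carry any attribute, no schema change.
    conn.execute("CREATE TABLE meta (obj_key TEXT, attr TEXT, value TEXT)")
    for key, owner, created, meta in OBJECTS:
        conn.execute("INSERT INTO objects VALUES (?, ?, ?)", (key, owner, created))
        conn.executemany(
            "INSERT INTO meta VALUES (?, ?, ?)",
            [(key, attr, value) for attr, value in meta.items()],
        )


def search(conn, owner=None, created_after=None, created_before=None, attr_like=None):
    """Return matching keys; every condition that is given is ANDed."""
    sql = ("SELECT DISTINCT o.obj_key FROM objects o "
           "LEFT JOIN meta m ON m.obj_key = o.obj_key WHERE 1 = 1")
    params = []
    if owner is not None:
        sql += " AND o.owner = ?"
        params.append(owner)
    if created_after is not None:  # ISO dates compare correctly as strings
        sql += " AND o.created >= ?"
        params.append(created_after)
    if created_before is not None:
        sql += " AND o.created <= ?"
        params.append(created_before)
    if attr_like is not None:
        attr, pattern = attr_like  # e.g. ("title", "Mathematics%")
        sql += " AND m.attr = ? AND m.value LIKE ?"
        params += [attr, pattern]
    rows = conn.execute(sql + " ORDER BY o.obj_key", params)
    return [row[0] for row in rows]


def any_attr_like(conn, *conditions):
    """Return keys matching at least one (attribute, LIKE pattern) pair (OR)."""
    if not conditions:
        return []
    clause = " OR ".join("(attr = ? AND value LIKE ?)" for _ in conditions)
    params = [part for cond in conditions for part in cond]
    rows = conn.execute(
        f"SELECT DISTINCT obj_key FROM meta WHERE {clause} ORDER BY obj_key", params
    )
    return [row[0] for row in rows]
```

For example, `search(conn, owner="juliet", attr_like=("contenttype", "PDF"))` combines an owner filter with a metadata match. Note that SQLite's LIKE is case-insensitive for ASCII by default; a production engine may behave differently.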
Why Spark?
• Distributed search by design
• SQL semantics
  ▪ Not a separate database
  ▪ No need to pre-index (Juliet can create new attributes on the fly)
  ▪ No need to store separate indexes
• Fast processing: up to 100x faster than Hadoop MapReduce
• Flexible engine
  ▪ The same Spark cluster can run Athena-like searches on the actual data
• Largest open-source community in big data: over 1,000 contributors
  ▪ Ecosystem of contributors and companies
Open Source Code on GitHub under Apache 2.0 License
Zenko Open Source: Features & Capabilities
[Diagram: apps send S3 calls to the Zenko S3 API; Zenko keeps METADATA (bucket location), with CLUESO metadata search and the BACKBEAT data policy engine (CRR); DATA goes to the storage back-ends: shared local storage, DMD REST/Sproxyd, AWS S3 API, Azure Blob API]
Zenko Open Source
• S3 API: single API set and 360° access to any cloud
• Native format: data written through Zenko is stored in the native format of the target cloud storage and can be read directly, without going through Zenko
• Project Backbeat for data workflow: policy-based data management engine
• Project Clueso for metadata search: Apache Spark-based metadata search tool for optimal data insight
• HA/Failover: deployed as dual containers managed by Docker Swarm for HA, but not full scale-out
• Simple security: single-tenant credentials managed locally
• Data storage back-ends
  ▪ Existing interest in integrating NAS filers
  ▪ Other public clouds: Oracle, Backblaze, and OpenStack-based clouds
• Clueso search plugins
  ▪ GDPR discovery (find data that is not compliant)
  ▪ Data analytics
  ▪ eDiscovery for legal documents
• Backbeat plugins for data management & mobility
  ▪ Migration
  ▪ Compliance
Ecosystem Extensions: Community & Partner Driven
• Community meetups
  ▪ Initiated before our S3 Server launch
  ▪ Participating in open-source events for Docker, Node.js, etc.
• Developer “hackathons”
  ▪ Paris and San Francisco in 2015 & 2016
  ▪ Co-sponsored with partners, focused on a specific project goal (e.g., IP drives, S3 API)
  ▪ Great for building visibility & community participation
• Next hackathon to develop creative extensions to Zenko!
  ▪ 42 Silicon Valley (free coding university)
  ▪ August 14-18 in Fremont, CA: https://www.zenko.io/hackathon
Building a Developer community
Zenko Installation & Portal
Demo
• Open Source Community Edition: July 11th
  ▪ Available through GitHub and Docker
  ▪ Common API through the S3 API
  ▪ Dual-server HA configuration (not scale-out)
  ▪ Back-end stores: volumes, AWS S3
• Open Source Community Edition: September
  ▪ Clueso metadata search
  ▪ Backbeat data workflow
  ▪ MS Azure
• Enterprise Edition (EE): targeted for the beginning of 2018
  ▪ Scale-out solution
  ▪ Enterprise support
[Diagram: S3, File, Search Engine, Search, Management UI, BACKBEAT Data Management Engine, Location Control]
Release Plans
Zenko EE: Enterprise Security, File & Scale-Out
[Diagram: enterprise and legacy apps reach Zenko EE through a scaled-out S3 API (S3 calls), NFS/SMB, and the Google CS API; identity & access management (IAM) via SAML 2.0/SSO with AD/LDAP; METADATA in an HA/consistency cluster, with CLUESO metadata search and the BACKBEAT data policy engine (CRR); DATA storage: shared local storage, DMD REST/Sproxyd, AWS S3 API, Azure Blob API, each at its own location]
Zenko Enterprise Edition
• Multi-tenancy & enterprise security: full IAM support for multiple accounts, users, groups, and policies, plus single sign-on (SSO) to AD & LDAP security servers
• Scale-out: N-way scale-out to any number of servers to deliver capacity AND performance for massive workloads; leverages the metadata engine cluster from the S3 Connector
• File & S3 shared access: bidirectional file & object sharing with NFS v4/v3 & SMB for legacy apps
Enables full scale-out for all key Zenko services:
• Native cloud storage: support for multiple public clouds and Scality RING in native data format
• Backbeat for data workflow: policy-based data management engine
• Clueso for metadata search: Apache Spark-based metadata search tool for optimal data insight
Getting involved with Zenko
How can I get involved with Zenko?
• Let us know what you do with the Zenko stack!
  ▪ [email protected]
  ▪ Get your project/company featured in a quote on the website
• Contribute tutorials
  ▪ Get a blog post introducing your tutorial
  ▪ Become part of our documentation hosted on Read the Docs
• Contribute code
  ▪ It’s an opportunity to drive the roadmap with us!
  ▪ Join the team and be part of the Zenko craze!
  ▪ We have contributing guidelines on the GitHub repos, and we’ll answer your questions via GitHub issues or our forum, forum.scality.com
• Meet us at Microsoft Ignite, AWS re:Invent, meetups...
  ▪ All info is on www.zenko.io