Zenko Webinar: Enabling Data Control in a Multi-Cloud World

Thursday, August 3, 2017

Giorgio Regni, Scality Co-founder & CTO

Laure Vergeron, Software Engineer

Zenko Webinar: Multi-cloud, hybrid stores, open-source

We’ll shed some light on a few questions:

• What does multicloud mean?

• How can you leverage the efficiency of both public and private clouds?

• How can this multi-cloud data controller do search and discovery across clouds?

• How can you get involved with Zenko?

What does multicloud mean?

Freedom to leverage multiple, different cloud infrastructures, private or public

Acknowledge that each application has its own infrastructure requirements, which will evolve over time

Acknowledge that each cloud service has its own domain of expertise, and leverage the native services it offers

Examples of Use-Cases For Multi-Cloud?

To you, what does multi-cloud enable? (Here is one of our answers.)

http://www.zenko.io/blog/need-open-source-multi-cloud-data-controller/

Examples of Use-Cases for Multi-Cloud

▪ Content Distribution
  ▪ Media companies have tens of thousands of movies, which they store in a private cloud for control. When it is time to publish a movie, it makes sense to copy it to a public cloud to use its transcoding and CDN services.

▪ Compute Bursting
  ▪ Banks run risk analysis on thousands of CPUs every night. These intense computations only run for a few hours. Rather than leaving servers idle for the rest of the day, it makes sense to use public cloud services for the computation.

▪ Analytics
  ▪ E-commerce companies do more and more machine learning on their very large data lakes. Rather than setting up Hadoop infrastructure in-house, a company can copy just one data set to a Hadoop cloud, run the appropriate algorithm, get back the result, and destroy the cloud copy of the data to save on storage cost.

▪ Long-term Archival / Cold Storage
  ▪ While regularly accessed data is cheaper to store in a private cloud, data that is never accessed is cheaper to keep in a long-term archive cloud offering. Automatic archival of never-accessed data would save a lot of money.
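The automatic archival idea in the last use case can be sketched as a small policy function. This is only an illustration: the object listing, the one-year idle threshold, and the name `select_for_archive` are assumptions made for the example, not part of Zenko.

```python
from datetime import datetime, timedelta


def select_for_archive(last_access, now, max_idle=timedelta(days=365)):
    """Return the keys whose last access is older than max_idle.

    last_access maps object key -> datetime of the last read
    (a hypothetical listing, not a Zenko data structure).
    """
    return sorted(key for key, seen in last_access.items() if now - seen > max_idle)


now = datetime(2017, 8, 3)
listing = {
    "movies/trailer.mp4": datetime(2017, 7, 30),
    "movies/rushes-2014.mov": datetime(2015, 1, 12),
    "logs/2013.tar": datetime(2013, 12, 31),
}
# Keys that have been idle for more than a year are candidates for cold storage.
to_archive = select_for_archive(listing, now)
```

A real policy would also need to move the data and record the new location; the sketch only covers the selection step.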

The Zenko Multi-Cloud Data Controller

The Zenko Vision

Control and Freedom for data in a Multi-Cloud world.

• Single Interface to any Cloud
  ▪ S3 API as a single API set to any cloud

• Allow reuse in the Cloud
  ▪ Maintain the native cloud format

• Always know your data and where it is
  ▪ Metadata search

• Trigger actions based on data
  ▪ Data Workflow to manage replication, location

[Diagram labels: One S3 API for any cloud; Native format; Data Management; Data Insight]

The Zenko Multi-Cloud Data Controller

• Allow reuse in the Cloud
  ▪ Maintain the native cloud format

• Full compatibility with S3
  ▪ IAM, policies, request syntax...

Zenko is not Scality’s first open-source project...

What was the previous name of Zenko’s CloudServer? (Check your answer on DockerHub.)

https://hub.docker.com/r/scality/s3server/

Open Source Scality CloudServer Adoption

▪ Launched June 2016

▪ Open-source implementation of AWS S3 API

▪ Code available on GitHub under Apache 2.0 license

▪ Packaged in Docker container for easy deployment

▪ Seamless upgrade to S3 Connector for the RING

Now Over 700,000

…Developers Are Seeing the Benefits

“Scality provides our backend storage and gives us a single interface for developers to code within any cloud on a common API set. With Scality, we can write an application once and deploy anywhere on any cloud.”
Mathias Herberts, co-founder and CTO at Cityzen Data

“We are big users of Docker in our production environment and implemented the Docker version of Scality S3 Server. It is efficient, secure with encryption and S3 authentication, and very easy to maintain.”
Christian Patry, System Engineer, BlueSolutions by Polyconseil

CloudServer – Amazon S3 & Native Format

• Amazon S3 API is the de facto standard
  ▪ Extended to support multiple cloud backends
  ▪ Provides a full-featured S3 interface independent of the backend cloud store’s capabilities

• Native Cloud Data Format
  ▪ Data stored in the cloud must be accessible in standard format
  ▪ Access via standard keys and methods, which enables use by cloud services without change

• Highly Available Cloud Service
  ▪ Integrated in HA services through Docker
  ▪ If a higher level of availability is required, extensible to RING

[Diagram labels: S3 API; cloud gateway; opaque data]

Many gateway products store data in a closed "black box" format that is unreadable by native cloud services and apps in the cloud.

Gateways store "opaque data" in the cloud as a black box. Cloud apps try to access it, but cannot due to the proprietary format.
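The difference between opaque and native storage can be illustrated with a toy model. Everything here is hypothetical: the dictionaries stand in for a cloud bucket, and `gateway_put_opaque` is an invented example of a proprietary layout, not a description of any specific gateway product.

```python
import hashlib


def gateway_put_opaque(cloud, key, data, chunk_size=4):
    """Hypothetical gateway: stores chunks under hashed names only it understands."""
    for offset in range(0, len(data), chunk_size):
        name = hashlib.sha256("{}:{}".format(key, offset).encode()).hexdigest()
        cloud[name] = data[offset:offset + chunk_size]


def native_put(cloud, key, data):
    """Native format: same key, same bytes, readable by any cloud service."""
    cloud[key] = data


def cloud_app_read(cloud, key):
    """A cloud-side service that only knows the object's original key."""
    return cloud.get(key)


opaque_bucket, native_bucket = {}, {}
gateway_put_opaque(opaque_bucket, "movie.mp4", b"0123456789")
native_put(native_bucket, "movie.mp4", b"0123456789")

opaque_result = cloud_app_read(opaque_bucket, "movie.mp4")  # not found under its key
native_result = cloud_app_read(native_bucket, "movie.mp4")  # original bytes
```

In the opaque case the data is present in the bucket but split across three chunks that a transcoding or analytics service cannot locate by key.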

Do you know your AWS basics?

What is the default location constraint for AWS S3? (The AWS S3 documentation answers...)

http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

Native Cloud Data & Extended Location Control

Zenko Manages Bucket Metadata Namespace
• Decoupled from the underlying data location
• Native APIs used to store data

Control of Buckets location
• Provides the default storage location for objects stored (PUT) into that Bucket
• Buckets can be managed across multiple RINGs & Public Cloud regions
• Location Mapping is managed through a configuration file

Innovation: Extended Location Control across clouds

METADATA: Zenko Namespace

PUT Bucket1 LocationConstraint: “RING-West” → REST/Sproxyd → Location: RING-West

PUT Bucket2 LocationConstraint: “S3-US-East-1” → S3 API → Location: us-east-1

PUT Bucket3 LocationConstraint: “Azure-US” → Blob Storage API → Location: Windows.AzureStorage.US

DATA: via native drivers/APIs
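The location mapping in the diagram can be sketched as a lookup from a bucket's LocationConstraint to a backend. This is a minimal sketch under stated assumptions: the `LOCATIONS` dictionary and the `Namespace` class are hypothetical and do not reproduce Zenko's actual configuration file schema; only the location names and APIs come from the slide.

```python
# Hypothetical location map mirroring the diagram; fields are illustrative only.
LOCATIONS = {
    "RING-West": {"api": "REST/Sproxyd", "location": "RING-West"},
    "S3-US-East-1": {"api": "S3 API", "location": "us-east-1"},
    "Azure-US": {"api": "Blob Storage API", "location": "Windows.AzureStorage.US"},
}


class Namespace:
    """Keeps bucket metadata separate from where the data is stored."""

    def __init__(self, locations):
        self.locations = locations
        self.bucket_location = {}

    def put_bucket(self, bucket, location_constraint):
        # Reject constraints that are not declared in the location map.
        if location_constraint not in self.locations:
            raise ValueError("unknown location: " + location_constraint)
        self.bucket_location[bucket] = location_constraint

    def route_put(self, bucket, key):
        """Return which native API and location a PUT to this bucket would use."""
        target = self.locations[self.bucket_location[bucket]]
        return {"key": key, "api": target["api"], "location": target["location"]}


ns = Namespace(LOCATIONS)
ns.put_bucket("Bucket1", "RING-West")
ns.put_bucket("Bucket2", "S3-US-East-1")
ns.put_bucket("Bucket3", "Azure-US")
route = ns.route_put("Bucket3", "report.pdf")
```

The point of the design is that applications only ever name a bucket; the location decision lives in the namespace and can differ per bucket.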

Backbeat - Policy-Based Replication Across Clouds

• Replication supports copy of objects across RINGs
  ▪ Externalized via the AWS S3 API for Cross-Region Replication (CRR)
  ▪ Configure replication on a source Bucket and assign the Target Bucket in any cloud
  ▪ Objects are asynchronously replicated to target, with native key/data format

[Diagram: OBJ1 is PUT through the S3 Bucket CRR API into RING1 in the Zenko namespace; Backbeat (async replication) copies OBJ1 to S3-us-east-1 in the AWS S3 namespace, alongside Amazon CloudFront, Amazon EC2, and Amazon EMR.]
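Asynchronous replication of the kind shown above can be sketched with a queue and a background worker. This is not Backbeat's implementation: the `Store` and `ReplicatedBucket` classes are hypothetical in-memory stand-ins, used only to show that the PUT returns before the copy happens and that the target ends up with the same key and bytes.

```python
import queue
import threading


class Store:
    """In-memory stand-in for one storage location (a RING or a cloud bucket)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}


class ReplicatedBucket:
    """Writes go to the source; a background worker copies them to the target."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.pending = queue.Queue()
        threading.Thread(target=self._replicate_forever, daemon=True).start()

    def put(self, key, data):
        # The caller returns as soon as the source write and enqueue are done.
        self.source.objects[key] = data
        self.pending.put(key)

    def _replicate_forever(self):
        while True:
            key = self.pending.get()
            # Same key, same bytes: the target copy stays in native format.
            self.target.objects[key] = self.source.objects[key]
            self.pending.task_done()

    def wait_until_replicated(self):
        self.pending.join()


ring1 = Store("RING1")
s3_east = Store("S3-us-east-1")
bucket = ReplicatedBucket(ring1, s3_east)
bucket.put("OBJ1", b"movie bytes")
bucket.wait_until_replicated()
```

After `wait_until_replicated()` returns, both stores hold `OBJ1` under the same key, which is what lets cloud services read the replica directly.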

Backbeat Architecture (Details)

Clueso - Metadata Search

Clueso Search Engine

Federated Search on Metadata
• Search across cloud namespaces independent of location
• Applications can attach extended metadata as key/value pairs through optional S3 “x-amz-meta-” headers
• Search on one or multiple attributes, fuzzy searches

Programmatic Access to Search
• Accessed through RESTful API with attribute parameters
• Natural fit with S3 semantics, e.g. GET /bucketName?search&attributeKey=’attributeValue’

Examples:
“SELECT key where ContentType = ‘PDF’”
“SELECT key where Title=‘Mathematics%’”
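The two mechanisms on this slide, attaching metadata through “x-amz-meta-” headers and building a search request, can be sketched as below. The helper names `metadata_headers` and `search_path` are hypothetical, and the exact query syntax Clueso accepts is an assumption based only on the URL pattern shown on the slide.

```python
from urllib.parse import quote

META_PREFIX = "x-amz-meta-"


def metadata_headers(attributes):
    """Turn user attributes into S3 user-metadata headers (lowercased names)."""
    return {META_PREFIX + name.lower(): str(value) for name, value in attributes.items()}


def search_path(bucket, **attributes):
    """Build /bucket?search&key='value', percent-encoding the quoted value."""
    query = "".join(
        "&{}={}".format(quote(name, safe=""), quote("'{}'".format(value), safe=""))
        for name, value in attributes.items()
    )
    return "/{}?search{}".format(quote(bucket, safe=""), query)


# Headers an application would send with its PUT request.
headers = metadata_headers({"Title": "Mathematics 101"})
# Request path following the pattern from the slide.
path = search_path("library", ContentType="PDF")
```

Here `headers` carries the attribute that later becomes searchable, and `path` is the request an application would issue with GET.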

Clueso - Overview and Goals

• The goal is to let users retrieve keys:
  ▪ By user metadata (x-amz-meta headers and tags), which is not predefined
  ▪ By object owner
  ▪ By creation date: before or after a given date, or within a range of dates
  ▪ By multiple attributes combined with AND and OR conditions
  ▪ By fuzzy search on attribute values, such as partial strings and wildcards, as with SQL “LIKE” statements
  ▪ Through programmatic access to search functionality via an API
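The query types listed in these goals can be tried out with plain SQL. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a SQL engine; Clueso itself is described as Spark-based, and the table layout, column names, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (key TEXT, owner TEXT, created TEXT, content_type TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?, ?)",
    [
        ("docs/algebra.pdf", "juliet", "2017-06-01", "PDF", "Mathematics I"),
        ("docs/calculus.pdf", "juliet", "2017-07-15", "PDF", "Mathematics II"),
        ("video/intro.mp4", "romeo", "2017-07-20", "MP4", "Welcome"),
    ],
)


def search(where, params=()):
    """Return matching keys; `where` is trusted SQL here, values are bound parameters."""
    cursor = conn.execute(
        "SELECT key FROM objects WHERE {} ORDER BY key".format(where), params
    )
    return [row[0] for row in cursor]


# Exact match on one attribute.
pdf_keys = search("content_type = ?", ("PDF",))
# Wildcard match, as with SQL LIKE.
math_keys = search("title LIKE ?", ("Mathematics%",))
# Owner AND date range (ISO dates compare correctly as strings).
recent_by_juliet = search(
    "owner = ? AND created BETWEEN ? AND ?", ("juliet", "2017-07-01", "2017-07-31")
)
# OR across two attributes.
mp4_or_juliet = search("content_type = ? OR owner = ?", ("MP4", "juliet"))
```

The same WHERE clauses would carry over to any engine with SQL semantics, which is the property the goals rely on.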

Why Spark?

• Distributed Search by design

• SQL Semantics
  ▪ Not a separate database
  ▪ No need to pre-index (Juliet can create new attributes on the fly)
  ▪ No need to store separate indexes

• Fast Processing - up to 100x faster than Hadoop MapReduce for in-memory workloads

• Flexible Engine
  ▪ Can use the same Spark cluster to do Athena-like search on the actual data

• Largest open-source community in big data - over 1000 contributors
  ▪ Ecosystem of contributors and companies

Open Source Code on GitHub under Apache 2.0 License

Zenko Open Source: Features & Capabilities

[Diagram labels: APP; S3 CALLS; S3 API; METADATA; DATA STORAGE (DMD, REST/Sproxyd, AWS S3 API, Azure Blob API); Shared Local Storage]

Zenko Open Source

S3 API: Single API set and 360° access to any cloud

Native format: Data written through Zenko is stored in the native format of the target cloud storage and can be read directly, without going through Zenko.

Project Backbeat for data workflow: Policy-based data management engine

Project Clueso for metadata search: Apache Spark-based metadata search tool for optimal data insight

HA/Failover: Deployed as dual containers managed by Docker Swarm for HA, but not full scale-out

Simple Security: Single-tenant credentials managed locally

[Diagram labels: S3 API; S3 CALLS; METADATA; DATA; CLUESO Metadata Search; BACKBEAT Data Policy Engine; Bucket LOCATION; CRR/DATA]

Ecosystem Extensions: Community & Partner Driven

Data Storage Back-ends
- Existing interest in integration of NAS filers
- Other public clouds: Oracle, Backblaze and OpenStack based

Clueso Search Plugins
- GDPR discovery (find data that is not compliant)
- Data analytics
- eDiscovery for legal documents

Backbeat plugins for Data Management & Mobility
- Migration
- Compliance

Building a Developer community

Community Meetups
• Initiated prior to our S3 Server launch
• Participating at open-source events for Docker, Node.js, etc.

Developer “Hackathons”
• Paris and San Francisco in 2015 & 2016
• Co-sponsoring with partners, focused on a specific project goal (e.g., IP Drives, S3 API)
• Great for building visibility & community participation

Next hackathon to develop creative extensions to Zenko!
• 42 Silicon Valley (free coding university)
• August 14-18 in Fremont, CA: https://www.zenko.io/hackathon

Zenko Installation & Portal

Demo

Release Plans

• Open Source Community Edition – July 11th
  ▪ Available through GitHub and Docker
  ▪ Common API through S3 API
  ▪ Dual-server HA configuration (non scale-out)
  ▪ Backend store as volumes, AWS S3

• Open Source Community Edition – September
  ▪ Clueso Metadata Search
  ▪ Backbeat Data Workflow
  ▪ MS Azure as an additional backend store

• Enterprise Edition (EE) – target beginning 2018
  ▪ Scale-out solution
  ▪ Enterprise support

[Diagram labels: S3; Search Engine; Search; File; Management UI; BACKBEAT Data Management Engine; Location Control]

Zenko EE: Enterprise Security, File & Scale-Out

[Diagram labels: APP; S3 CALLS; S3 API; METADATA: HA/Consistency Cluster; CLUESO Metadata Search; DATA; DATA STORAGE (DMD, REST/Sproxyd, AWS S3 API, Azure Blob API); Shared Local Storage]

Zenko Enterprise Edition

Multi-tenancy & Enterprise Security – Full IAM support of Multi Accounts, Users, Groups, Policies & Single Sign-On (SSO) to AD & LDAP security servers

Scale-Out – N-Way scale-out to any number of servers to deliver capacity AND performance for massive workloads; leverages the Metadata engine cluster from S3 Connector

File & S3 Shared Access – Bi-directional file & object sharing with NFS v4/v3 & SMB for legacy apps

Enables full Scale-Out for all key Zenko Services:

• Native Cloud Storage: Support for multiple public clouds and Scality RING in native data format

• Backbeat for data workflow: Policy-based data management engine

• Clueso for metadata search: Apache Spark-based metadata search tool for optimal data insight

[Diagram labels: Enterprise Apps; Legacy App; S3 API (scale-out); S3 CALLS (scale-out); NFS/SMB; Google CS API; Identity & Access Management (IAM): SAML 2.0/SSO with AD/LDAP; BACKBEAT Data Policy Engine; METADATA; DATA; CRR/DATA; LOCATION]

Getting involved with Zenko

How can I get involved with Zenko?

• Let us know what you do with the Zenko stack!
  ▪ [email protected]
  ▪ Get your project/company featured on the website in a quote

• Contribute tutorials
  ▪ Get a blog post featuring an introduction to your tutorial
  ▪ Become part of our Read the Docs hosted documentation

• Contribute code
  ▪ It’s an opportunity to drive the roadmap with us!
  ▪ Join the team and be part of the Zenko craze!
  ▪ We have Contributing Guidelines on the GitHub repos, and we’ll answer your questions via GitHub issues or our forum, forum.scality.com

• Meet us at Microsoft Ignite, AWS re:Invent, Meetups...
  ▪ All info is on www.zenko.io

Email: [email protected]

Thank You
