Schema Registry - set your data free

Posted on 13-Apr-2017

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Schema Registry

Satish Duggana, Hortonworks
DataWorks Summit 2017, Munich

Introduction

What is Schema Registry?
• A shared repository of schemas that allows applications to interact with each other flexibly

What value does Schema Registry provide?
– Data governance
• Provides reusable schemas
• Defines relationships between schemas
• Enables generic format conversion and generic routing
– Operational efficiency
• Avoids attaching the schema to every piece of data
• Lets producers and consumers evolve at different rates

Example use
– Register schemas for Kafka topics so that consumers of those topics (e.g. NiFi, StreamLine) can use them

Schema Registry Concepts

• Schema Group: a logical grouping/container for schemas of a similar type, or for schemas grouped by any criteria the customer uses to manage them

• Schema Metadata: metadata associated with a named schema

• Schema Version: the actual versioned schema associated with a schema metadata definition

[Diagram: a Schema Group (group name) contains Schema Metadata (schema name, schema type, description, compatibility policy, serializers, deserializers); each Schema Metadata has Schema Versions 1, 2 and 3, each holding a version number, the schema text and a fingerprint.]
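The group, metadata and version hierarchy can be sketched as a small in-memory model. This is an illustrative Python sketch, not the registry's (Java) API: the class and method names are hypothetical, and the fingerprint is assumed to be a SHA-256 hash of the schema JSON re-serialized with sorted keys, so that whitespace or key-order changes do not create a new version. The real registry's fingerprinting may differ.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SchemaVersion:
    version: int
    text: str
    fingerprint: str


@dataclass
class SchemaMetadata:
    # Hypothetical model of the metadata fields named on the slide.
    name: str
    schema_type: str
    group: str
    description: str = ""
    compatibility: str = "BACKWARD"
    versions: List[SchemaVersion] = field(default_factory=list)


def fingerprint(schema_text: str) -> str:
    # Assumption: SHA-256 over the JSON re-serialized with sorted keys and no
    # extra whitespace, so formatting-only changes keep the same fingerprint.
    canonical = json.dumps(json.loads(schema_text), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryRegistry:
    def __init__(self) -> None:
        self.metadata: Dict[str, SchemaMetadata] = {}  # schema name -> metadata

    def register_metadata(self, meta: SchemaMetadata) -> None:
        self.metadata.setdefault(meta.name, meta)

    def add_version(self, name: str, schema_text: str) -> int:
        meta = self.metadata[name]
        fp = fingerprint(schema_text)
        # An identical schema (same fingerprint) maps to its existing version.
        for existing in meta.versions:
            if existing.fingerprint == fp:
                return existing.version
        version = len(meta.versions) + 1
        meta.versions.append(SchemaVersion(version, schema_text, fp))
        return version

    def schemas_in_group(self, group: str) -> List[str]:
        return sorted(m.name for m in self.metadata.values() if m.group == group)
```

Registering the same schema twice, even with different formatting or key order, returns the existing version number instead of creating a duplicate.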

Schema Registry

Schema Registry component architecture

[Diagram: the SR web server hosts the Schema Registry web app, which exposes a REST API. The Schema Registry client (Java client) talks to the REST API and is used by the integrations: NiFi processors, Kafka ser/des and StreamLine. Storage is pluggable: schema storage (MySQL, Postgres, in-memory) and serializer/deserializer jar storage (local file system, HDFS).]

Writer/Reader schemas

• Writer schema: senders/producers serialize payloads according to this schema, i.e. the writer's schema

• Reader/projection schema: receivers use this schema to project a received payload that was written with a writer schema

[Diagram: the sender writes a payload with the writer schema; the receiver reads it using the writer schema and projects it onto its own projection schema.]
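Projection with a reader schema can be illustrated with a simplified, Avro-style resolution for flat records. The function name project is hypothetical, and the sketch assumes fields are matched by name only: fields the reader does not know are dropped, and fields the writer did not write take the reader's default. Avro's real resolution rules also cover type promotion, aliases, unions and nested types, which are ignored here.

```python
def project(record: dict, writer_schema: dict, reader_schema: dict) -> dict:
    """Project a record written with writer_schema onto reader_schema.

    Simplified sketch: fields are matched by name, writer-only fields are
    dropped, and reader-only fields take their default (or fail if none).
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    projected = {}
    for f in reader_schema["fields"]:
        name = f["name"]
        if name in writer_fields:
            projected[name] = record[name]  # value written by the producer
        elif "default" in f:
            projected[name] = f["default"]  # reader fills in its default
        else:
            raise ValueError(f"reader field {name!r} is missing from the writer schema and has no default")
    return projected
```

For example, a record written with the book schema's V1 (id, color) and read with a V2 that adds pages with a default of -1 comes back with pages set to -1; in the other direction, a V1 reader simply drops pages.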

Schema evolution

[Diagram: producers (v1, v2, v4) and consumers (v2, v5, v7) on different schema versions evolve independently while exchanging data.]

Schema Compatibility Policies

What is a compatibility policy?
– Defines the rules for how a schema can evolve
– Subsequent version updates have to honor the schema's original compatibility policy

Policies supported
– Backward
– Forward
– Both
– None

Backward compatibility

A new version of a schema is compatible with earlier versions of that schema: data written with an earlier version of the schema can be read with the new version.

V1:
{ "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" } ]}

V2:
{ "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" }, { "name": "pages", "type": "int", "default": -1 } ]}

Forward compatibility

An existing schema is compatible with future versions of the schema: data written with a new version of the schema can still be read with the old version.

V1:
{ "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" } ]}

V2:
{ "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" }, { "name": "pages", "type": "int" } ]}

Both/Full compatibility

A new version of the schema provides both backward and forward compatibility.

V1:
{ "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" } ]}

V2:
{ "type": "record", "name": "book", "namespace": "registry.example", "fields": [ { "name": "id", "type": "string" }, { "name": "color", "type": "string", "default": "blue" }, { "name": "pages", "type": "int", "default": -1 }, { "name": "title", "type": "string", "default": "" } ]}
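Under simplified resolution rules, each policy reduces to checking which direction of reading must succeed: backward means new readers can read old data, forward means old readers can read new data, and both requires each. The sketch below uses hypothetical function names and assumes fields are matched by name, shared fields must have identical types, and a new version is checked only against the previous one; the registry's actual checks are based on Avro's schema resolution and may be stricter.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """Simplified check: can data written with `writer` be read with `reader`?"""
    writer_types = {f["name"]: f["type"] for f in writer["fields"]}
    for f in reader["fields"]:
        if f["name"] in writer_types:
            # Assumption: no type promotion, so shared fields must match exactly.
            if writer_types[f["name"]] != f["type"]:
                return False
        elif "default" not in f:
            # The reader needs a value the writer never wrote and has no fallback.
            return False
    return True


def is_compatible(old: dict, new: dict, policy: str) -> bool:
    backward = can_read(reader=new, writer=old)  # new readers, old data
    forward = can_read(reader=old, writer=new)   # old readers, new data
    return {
        "BACKWARD": backward,
        "FORWARD": forward,
        "BOTH": backward and forward,
        "NONE": True,
    }[policy]
```

Applied to the slides' book schemas, the backward example's V2 also passes the forward check, because a V1 reader simply ignores the extra pages field. The forward example's V2 fails the backward check, because pages has no default for V2 readers to use on V1 data.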

Schema composition

• Schemas can be shared and reused within existing schemas
• Built-in support in the default serializer/deserializer for building effective schemas

{ "name": "account", "namespace": "com.hortonworks.example.types", "includeSchemas": [ { "name": "utils" } ], "type": "record", "fields": [ { "name": "name", "type": "string" }, { "name": "id", "type": "com.hortonworks.datatypes.uuid" } ]}

{ "name": "uuid", "type": "record", "namespace": "com.hortonworks.datatypes", "doc": "A Universally Unique Identifier, in canonical form in lowercase. This is generated from java.util.UUID Example: de305d54-75b4-431b-adb2-eb6b9e546014", "fields": [ { "name": "value", "type": "string", "default": "" } ]}
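Building an effective schema from includeSchemas can be sketched as inlining the named types that the included schemas provide. The shape of the registry argument (a mapping from an included schema name such as "utils" to the type definitions it provides) and the function names are assumptions for illustration; only top-level field types are resolved, and the default serializer/deserializer may resolve includes differently.

```python
import copy

AVRO_PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(schema: dict) -> str:
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]


def effective_schema(schema: dict, registry: dict) -> dict:
    """Return a copy of `schema` with types from included schemas inlined.

    `registry` is a hypothetical mapping: included schema name -> list of
    type definitions it provides (e.g. "utils" -> [uuid schema]).
    """
    known = {}
    for include in schema.get("includeSchemas", []):
        for type_def in registry[include["name"]]:
            known[full_name(type_def)] = type_def
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    result.pop("includeSchemas", None)
    for f in result["fields"]:
        t = f["type"]
        if isinstance(t, str) and t not in AVRO_PRIMITIVES:
            if t not in known:
                raise KeyError(f"unresolved type {t!r}")
            f["type"] = copy.deepcopy(known[t])  # inline the referenced record
    return result
```

With the account and uuid schemas above, the id field's type reference com.hortonworks.datatypes.uuid is replaced by the full uuid record definition, yielding a self-contained schema that a standard Avro parser can handle.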

Sender/Receiver flow

[Diagram: Sender → Serializer → Message Store → Deserializer → Receiver. The serializer and the deserializer each have a local schema/serdes cache and a Schema Registry client that talks to the Schema Registry instances, which are backed by schema storage and serdes storage. Each message carries the schema version along with the payload.]
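The sender/receiver flow can be simulated with a fake registry and a local cache on each side. All classes here are hypothetical stand-ins: the payload is JSON rather than Avro binary, and each message handed to the message store is a (version id, payload) pair. The registry is contacted once per new schema on the sender side and once per unseen version on the receiver side; every other message is served from the local cache.

```python
import json
from typing import Dict, Tuple


class FakeRegistryClient:
    """Hypothetical stand-in for the remote registry; counts remote calls."""

    def __init__(self) -> None:
        self.schemas: Dict[int, dict] = {}  # version id -> schema
        self.remote_calls = 0

    def register(self, schema: dict) -> int:
        self.remote_calls += 1
        for version_id, existing in self.schemas.items():
            if existing == schema:
                return version_id
        version_id = len(self.schemas) + 1
        self.schemas[version_id] = schema
        return version_id

    def get_schema(self, version_id: int) -> dict:
        self.remote_calls += 1
        return self.schemas[version_id]


class CachingClient:
    """Local schema cache in front of the registry client."""

    def __init__(self, remote: FakeRegistryClient) -> None:
        self.remote = remote
        self.by_version: Dict[int, dict] = {}
        self.by_schema: Dict[str, int] = {}

    def version_for(self, schema: dict) -> int:
        key = json.dumps(schema, sort_keys=True)
        if key not in self.by_schema:  # only unseen schemas reach the registry
            version_id = self.remote.register(schema)
            self.by_schema[key] = version_id
            self.by_version[version_id] = schema
        return self.by_schema[key]

    def schema_for(self, version_id: int) -> dict:
        if version_id not in self.by_version:  # only unseen versions reach the registry
            self.by_version[version_id] = self.remote.get_schema(version_id)
        return self.by_version[version_id]


def serialize(client: CachingClient, schema: dict, record: dict) -> Tuple[int, bytes]:
    # The message store receives (schema version id, payload); JSON stands in for Avro.
    return client.version_for(schema), json.dumps(record).encode("utf-8")


def deserialize(client: CachingClient, message: Tuple[int, bytes]) -> Tuple[dict, dict]:
    version_id, payload = message
    return client.schema_for(version_id), json.loads(payload.decode("utf-8"))
```

Sending three records with the same schema through separate sender and receiver caches costs two registry calls in total: one registration on the sender and one lookup on the receiver.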

Serializers/Deserializers

Snapshot-based serializer/deserializer
– Serializes the complete payload
– Deserializes the payload into the respective type

Pull-based serializer/deserializer
– Serializes only the required elements and ignores the others
– Pulls out only the elements required to build the desired object

Push-based deserializer
– Invokes callbacks with parsing events for the respective fields in the schema
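A push-based deserializer hands each field to the application as a parsing event instead of returning a whole object. The sketch below assumes a hypothetical callback signature on_field(path, value) and walks an already-decoded dict for simplicity; a real push parser would raise these events while decoding the binary stream.

```python
from typing import Callable


def push_parse(record: dict, schema: dict, on_field: Callable[[str, object], None]) -> None:
    """Invoke on_field(path, value) for every leaf field, in schema order.

    Nested records are reported with dotted paths such as "id.value".
    """

    def walk(rec: dict, sch: dict, prefix: str) -> None:
        for f in sch["fields"]:
            path = prefix + f["name"]
            value = rec[f["name"]]
            if isinstance(f["type"], dict) and f["type"].get("type") == "record":
                walk(value, f["type"], path + ".")  # descend into the nested record
            else:
                on_field(path, value)  # one parsing event per leaf field

    walk(record, schema, "")
```

The application decides what to do with each event, for example collecting only the fields it cares about without ever materializing the full object.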

Schema registry client

REST-based client

Caching
– Metadata
– Schema versions
– Ser/des libraries and class loaders

URL selectors
– Round robin
– Failover
– Custom
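The round robin and failover URL selectors can be sketched as two classes sharing a small interface: select() picks the URL for the next request, and url_with_error() reports a URL that failed. These method names are hypothetical, not the registry client's actual interface; a custom selector would implement the same two methods with its own policy.

```python
import itertools
from typing import List


class RoundRobinSelector:
    """Cycle through registry URLs, spreading requests across instances."""

    def __init__(self, urls: List[str]) -> None:
        self._cycle = itertools.cycle(urls)

    def select(self) -> str:
        return next(self._cycle)

    def url_with_error(self, url: str) -> None:
        pass  # round robin moves on to the next URL anyway


class FailoverSelector:
    """Stick to one URL until it fails, then switch to the next one."""

    def __init__(self, urls: List[str]) -> None:
        self._urls = list(urls)
        self._index = 0

    def select(self) -> str:
        return self._urls[self._index]

    def url_with_error(self, url: str) -> None:
        # Only advance if the failing URL is the one currently in use.
        if url == self._urls[self._index]:
            self._index = (self._index + 1) % len(self._urls)
```

Round robin balances load across healthy instances, while failover keeps traffic on one instance (useful for cache locality) and only moves when that instance reports an error.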

HA

Storage provider
– Depends on the transactional support of the underlying SQL store
– Spin up as many Schema Registry instances as required

HA at the Schema Registry level
– Uses ZooKeeper/Curator
– Automatic failover of the master
– The master receives all writes
– Slaves receive only reads

[Diagram: multiple Schema Registry instances in front of the storage layer.]
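The Schema Registry level HA routing can be modeled in a few lines. This toy Python model rests on stated assumptions: leader election is faked as "the live instance with the lowest id is master", standing in for what ZooKeeper/Curator does in a real deployment, and reads are spread round robin over all live instances. The class and method names are hypothetical.

```python
from typing import Iterable


class HACluster:
    """Toy model of Schema Registry HA routing (hypothetical, for illustration).

    Leader election is faked as "lowest live instance id is master"; in a real
    deployment ZooKeeper/Curator elects the master.
    """

    def __init__(self, instance_ids: Iterable[str]) -> None:
        self.live = set(instance_ids)
        self._reads = 0

    @property
    def master(self) -> str:
        if not self.live:
            raise RuntimeError("no live Schema Registry instances")
        return min(self.live)

    def route(self, operation: str) -> str:
        if operation == "write":
            return self.master  # all writes go to the master
        ordered = sorted(self.live)  # reads may be served by any live instance
        target = ordered[self._reads % len(ordered)]
        self._reads += 1
        return target

    def fail(self, instance_id: str) -> None:
        # Removing the master makes the next-lowest id the master: automatic failover.
        self.live.discard(instance_id)
```

Funnelling writes through a single master avoids conflicting concurrent version assignments, while reads, which dominate once clients cache schemas, scale with the number of instances.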

Integration of Schema Registry

Kafka
– Serializer/deserializer plugged in through the producer/consumer API

NiFi processors for Schema Registry
– Fetch schema
– Serialize/deserialize with schema

StreamLine
– Look up the schema of a Kafka topic

Kafka integration

[Diagram: Producer → KafkaAvroSerializer → Kafka → KafkaAvroDeserializer → Consumer. The serializer and the deserializer each use a Schema Registry client with a local schema/serdes cache to talk to the Schema Registry instances; each Kafka message carries the schema version along with the payload.]

Kafka Avro ser/des protocol

• Ser/des can be implemented with different protocols
• The default ser/des send the protocol and schema version as part of the binary payload of Kafka messages
– This can be enhanced to use headers/metadata instead of the message payload
• Custom ser/des can be registered for schemas
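Putting a protocol id in the first byte of the payload lets different ser/des protocols coexist on one topic. The protocol ids and header layouts below are hypothetical, chosen for illustration rather than taken from the registry's default ser/des: protocol 0x01 carries a 4-byte version id and 0x02 an 8-byte one, both big-endian, followed by the serialized payload.

```python
import struct
from typing import Tuple

# Hypothetical protocol ids, each selecting a header layout for the version id.
PROTOCOLS = {
    0x01: ">I",  # 4-byte unsigned big-endian version id
    0x02: ">Q",  # 8-byte unsigned big-endian version id
}


def encode(protocol_id: int, version_id: int, payload: bytes) -> bytes:
    header = struct.pack(">B", protocol_id) + struct.pack(PROTOCOLS[protocol_id], version_id)
    return header + payload


def decode(message: bytes) -> Tuple[int, int, bytes]:
    protocol_id = message[0]  # first byte selects the protocol
    fmt = PROTOCOLS.get(protocol_id)
    if fmt is None:
        raise ValueError(f"unknown ser/des protocol {protocol_id}")
    size = struct.calcsize(fmt)
    (version_id,) = struct.unpack(fmt, message[1:1 + size])
    return protocol_id, version_id, message[1 + size:]
```

With protocol 0x01 the per-message overhead is 5 bytes, compared with attaching the full schema text to every message.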

NiFi integration

NiFi controller service

NiFi processors
– Transforms
• Avro – CSV
• Avro – JSON
• JSON – CSV
– Extracting Avro fields
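A JSON to CSV transform can be driven by a registered schema: the schema fixes the column order and supplies defaults for missing fields. This is a stdlib-only Python sketch of the idea with a hypothetical function name, not the NiFi processor's implementation; it assumes flat records in newline-delimited JSON.

```python
import csv
import io
import json


def json_to_csv(json_lines: str, schema: dict) -> str:
    """Convert newline-delimited JSON records to CSV using the schema's field order.

    Fields missing from a record fall back to the schema default (or empty).
    """
    columns = [f["name"] for f in schema["fields"]]
    defaults = {f["name"]: f.get("default", "") for f in schema["fields"]}
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for line in json_lines.splitlines():
        if line.strip():
            record = json.loads(line)
            # Only schema fields become columns; unknown JSON keys are dropped.
            writer.writerow({c: record.get(c, defaults[c]) for c in columns})
    return out.getvalue()
```

Using the book schema with defaults, a record that only has an id still produces a complete row, with color and pages filled in from the schema.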

Schema Registry UI

WIP/Future enhancements

Security
– Kerberos support
– Default authorizers and Apache Ranger support

Archiving schemas

Notifications
– New versions
– Archiving

Converters

Try it out!

https://github.com/hortonworks/registry
https://groups.google.com/forum/#!forum/registry
Open sourced under the Apache license
Apache incubation soon
Contributions are welcome

Q & A
