Real-time data in a microservice architecture: handout
TRANSCRIPT
Real-time & fast data in a microservice architecture
Mark van Gool
Technical Product Owner
Data Architect
DevOn Summit 2019
30% of Dutch households already shop with wehkamp today and we keep growing to serve even more
• 400,000 different products
• >500,000 visitors every day
• 614 million customer sales in 16/17
• 10 million packages sent
• 900 employees
• 30% of Dutch households shop at wehkamp
• 70% of visitors are female
• Over 2,000 brands: C&A, Hunkemöller, Mango, Tommy Hilfiger, River Island, Hugo Boss, Scotch & Soda, HK Living, Bloomingville
• Online market position: Fashion #1, Home & Garden #1/2, Electronics #3, Entertainment #3, Home Appliances #3, Sports & Leisure #1, Beauty & Wellness #1
The customer journey

[Diagram: the customer consumes data (product, price, stock, delivery options, reviews/ratings, account, profile) and produces data (wish list, basket, orders, payments, returns, repairs, clickstream, reviews/ratings, sessions, customer cases) through channels such as email, social and SEA, feeding accounting, fulfillment and analytics.]

Customer as consumer and producer of data
[Diagram: wehkamp employees produce data (product, price, stock, accounting, purchase orders, content, photos, payments) and consume data (analytics, assortment info, supplier prices, stock data, weather) via e-mails, feeds and SEA.]

Employees as consumer and producer of data
We’re a data company (and online retailer, tech, …)
[Diagram: a dozen independent systems, all connected through a single Data Platform.]

One platform to rule them all
[Diagram: the same systems wired together point-to-point, without a central platform.]

What it used to be…
[Diagram: the front-end (website) and back-end systems, plus external parties, feed data warehouses through batch loads; Business Intelligence, Web Analytics, Analytics and Campaign Management all run on those batches.]

Or on a detailed level…
• from yesterday's data to (near) real-time
• from batch to streaming
• from manual to automated
• being smart: machine learning, AI
• more data, but structured
• data governance & security
• more insights with the right tools

Goals: operational efficiency, more relevance to the customer, increased customer satisfaction.

The ambition
Wait! ESB, SOA and DWHs solve this, right?

[Diagram: monoliths and services sending and receiving messages through an Enterprise Service Bus, with Extract-Transform-Load jobs feeding a data warehouse.]

• Service-to-service communication: an anti-pattern
• New messages have to be implemented by sender, receiver and ESB
• The ESB is expensive and heavyweight
• Data reaches the data warehouse via ETL processes
• Senders and receivers 'know' each other
• Orchestration is needed
New (old) pattern: publish/subscribe

[Diagram: microservices, each with a local data store, publish messages; other microservices and the data lake subscribe.]

The publisher does not 'know' who the subscribers are; subscribers do not 'know' who the publisher is.
Pub/sub: data can be consumed by anyone!

• Consumers are responsible for keeping track of their progress consuming messages
• Producers can also be consumers (of other topics)
• New message? Create a new topic!

[Diagram: producers publish to topics; consumers, organized in consumer groups, subscribe to them; a service can be both producer and consumer.]
Enter: Apache Kafka

• high speed
• scalable
• lightweight
• distributed
• persistent
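The mechanics Kafka builds on (append-only topics, producers that just publish, consumers that track their own offsets) can be sketched as a toy in-memory model. This is an illustration only, not Kafka's API; the class names here are made up.

```python
# Toy in-memory pub/sub, illustrating the Kafka model:
# a topic is an append-only log, producers only append,
# and every consumer tracks its own read offset.

class Topic:
    def __init__(self, name):
        self.name = name
        self.log = []  # append-only message log

    def publish(self, message):
        self.log.append(message)  # the producer never knows who reads this

class SimpleConsumer:
    def __init__(self, topic):
        self.topic = topic
        self.offset = 0  # each consumer owns its own progress

    def poll(self):
        if self.offset < len(self.topic.log):
            msg = self.topic.log[self.offset]
            self.offset += 1
            return msg
        return None  # nothing new yet

prices = Topic("prices")
website = SimpleConsumer(prices)   # e.g. the website
datalake = SimpleConsumer(prices)  # e.g. the data lake

prices.publish({"sku": "123", "price": 19.99})
prices.publish({"sku": "456", "price": 4.99})
print(website.poll())   # each consumer reads independently,
print(datalake.poll())  # starting from its own offset
```

A slow subscriber never blocks a fast one: each just moves its own offset along the log, which is exactly why the pattern decouples producers from consumers.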
How it works in practice

[Diagram: the product, pricing, stock, customer & order, customer service and e-mail marketing systems publish to Kafka; the website and apps consume through services such as search, search suggest, product fetcher, navigation, content, account, recommender, basket and wish list; the clickstream and all topics also flow into big data & data warehouses for data science, intelligence and analytics. A few direct service calls remain.]
What about data warehouses, BI and web analytics?

[Diagram, the classic setup: source systems feed a data warehouse through batch ETL processes, which serves analytics, campaign management and visualization tools with a typical delay of 2-24 hours; web analytics runs separately via Google Tag Manager, Google Analytics and BigQuery.]
Enter: Apache Kafka & Data Lake

[Diagram, the new setup: source systems, microservices and the website clickstream publish data entities; the data lake and other systems subscribe; streaming data processing turns the streams into real-time data products for analytics, campaign management, web analytics and visualization tools.]
Example: stock and price to the e-mail system

Old situation: products feed → folder → e-mail system → customer, taking a couple of hours.
New situation: data platform → Kafka → consuming service → e-mail system API → e-mail → customer, taking a couple of seconds.
Example: real-time product data from PIM to website

PIM (Product Information Management) is the producer, the website the consumer. Old situation: < 24 hours ✗. New situation: < 5 seconds.
Example: data-driven search suggestions

[Diagram: the clickstream from website and apps flows into the data lake & data warehouses; data processing (data science, intelligence, analytics) feeds suggestions back to the search box, so they adapt to the season: winter vs. summer.]
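One simple way such data-driven suggestions can work (a hypothetical sketch, not wehkamp's actual algorithm) is to rank completions by how often recent clickstream queries start with the typed prefix; as the clickstream shifts with the seasons, so do the suggestions.

```python
# Hypothetical suggestion ranking: count recent queries that
# start with the typed prefix and return the most frequent ones.
from collections import Counter

def suggest(prefix, recent_queries, k=3):
    counts = Counter(q for q in recent_queries if q.startswith(prefix))
    return [q for q, _ in counts.most_common(k)]

# The same prefix yields different suggestions per season:
winter_clicks = ["winter coat", "winter boots", "winter coat", "wine rack"]
summer_clicks = ["swim shorts", "swimsuit", "swimsuit", "swing set"]

print(suggest("wi", winter_clicks))  # ['winter coat', 'winter boots', 'wine rack']
print(suggest("sw", summer_clicks))  # ['swimsuit', 'swim shorts', 'swing set']
```

In a streaming setup the `recent_queries` window would be maintained continuously from the clickstream topic instead of being a fixed list.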
Example: recommendations

[Diagram: clickstream data and real-time product data are processed in the data lake & data warehouses into item-item, top-list and category/item recommendations, served to the website and apps through a recommendation gateway.]
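A minimal sketch of how item-item recommendations can be derived from clickstream sessions (illustrative only; the real pipeline runs as data processing in the data lake): count items viewed together in a session and recommend the most frequently co-viewed ones.

```python
# Hypothetical item-item recommender over clickstream sessions:
# items viewed in the same session co-occur; the most frequent
# co-viewed items become the recommendations.
from collections import defaultdict, Counter
from itertools import permutations

def build_item_item(sessions):
    co = defaultdict(Counter)
    for session in sessions:
        unique = list(dict.fromkeys(session))  # dedupe, keep order
        for a, b in permutations(unique, 2):
            co[a][b] += 1
    return co

def recommend(co, item, k=2):
    return [other for other, _ in co[item].most_common(k)]

sessions = [
    ["jeans", "belt", "sneakers"],
    ["jeans", "belt"],
]
co = build_item_item(sessions)
print(recommend(co, "jeans"))  # ['belt', 'sneakers']
```

The top-list and category/item variants from the diagram are simpler still: global or per-category view counts instead of pairwise ones.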
The Data Lake

Filling the data lake

[Diagram: microservices and other systems feed the data lake through Kafka and REST data ingestion.]
Structure in the Data Lake: zones

Just dumping data sources (customer, product metrics, clickstream) into the lake creates a data swamp. Instead, data arrives via Kafka, Kafka Connect and REST data ingestion, and the lake is organized into zones, connected by streaming data processing, with a sandbox, a data catalog, and data processing (e.g. with Spark). Discover, classify and protect sensitive data (GDPR).
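One lightweight way to enforce zones is a path naming convention, so every dataset lands under a zone prefix. A sketch in Python; the zone names and layout are illustrative assumptions, not the actual wehkamp configuration.

```python
# Illustrative data-lake zone layout; zone names are assumptions.
ZONES = ("raw", "cleansed", "curated", "sandbox")

def lake_path(zone, source, dataset, date):
    """Build an object-store key like <zone>/<source>/<dataset>/dt=<date>/"""
    if zone not in ZONES:
        raise ValueError("unknown zone: " + zone)
    return "{0}/{1}/{2}/dt={3}/".format(zone, source, dataset, date)

print(lake_path("raw", "clickstream", "pageviews", "2019-04-13"))
# raw/clickstream/pageviews/dt=2019-04-13/
```

With a convention like this, a data catalog and GDPR classification tooling can reason about sensitivity per zone instead of per file.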
The road we took

• New architecture (microservices, pub/sub, etc.)
• New technology (.NET Core, Scala, NodeJS, Kafka, Spark, etc.)
• New way of working (Agile)
• New team setup: DevOps product teams
• New hosting: AWS (automate everything!!!)
• New culture: room for experimentation, responsibilities as low as possible, less hierarchy
• No architect functions: every engineer can take the role of an architect (Open Space discussions)
• Measure everything, don't decide on gut feeling
Logo Bingo
Think big, act small….
Demo data pipeline: the idea

[Diagram: a streaming Twitter API reader acts as Kafka producer; four Kafka consumers react to the messages, and Kafka Connect ships them to the data lake for analytics, reporting and data science.]
Demo data pipeline: let's tweet!

Create an arbitrary tweet that at least mentions @kafkademo.
Demo data pipeline: ingredients

• 1x Twitter (+ app + keys)
• 4x Raspberry Pi 3 + Raspbian OS
• 4x Blinkt LED strip + Python library
• 1x Memobird G2 thermal printer
• 1x Confluent Kafka
• 1x Python (for the Python consumer)
• 1x .NET (for the .NET consumer)
• 1x .NET (for handling tweets and producing to Kafka)
• 1x debugged and translated Chinese .NET code for the Memobird printer
• 1x lots of config and testing
• + extra till rolls…
糟糕的代码 ("terrible code")
Demo data pipeline: architecture

[Diagram: Twitter clients tweet to the Streaming API; the Twitter-Kafka handler (.NET) posts each tweet via kafka-rest to the topic "kafkademo" (partitions 0, 1 and 2); three Python consumers in one consumer group light up the LED strips, a .NET Kafka consumer in its own consumer group drives the printer, and Kafka Connect ships the tweets onward.]
Handling streaming tweets

Tweetinvi.Streaming.IFilteredStream stream;

public void Authenticate()
{
    ITwitterCredentials creds = new TwitterCredentials(
        "<consumerKey>", "<consumerSecret>", "<accessToken>", "<accessTokenSecret>");
    Auth.SetCredentials(creds);
}

public void InitializeFilteredStream(string filterString, Func<ITweet, string, Task> handleMethod)
{
    stream = Stream.CreateFilteredStream();
    stream.AddTrack(filterString); // in our case @kafkademo
    stream.MatchingTweetReceived += async (sender, tweetargs) =>
        await handleMethod(tweetargs.Tweet, filterString);
    Task.Factory.StartNew(() =>
    {
        stream.StartStreamMatchingAllConditions();
    });
}
Calling Kafka REST on a specific topic

public static async Task<bool> PublishToKafkaAsync(string key, string value)
{
    client.BaseAddress = new Uri("<name or ip address of Kafka REST>");
    client.DefaultRequestHeaders.Accept.Clear();
    client.DefaultRequestHeaders.Accept.Add(
        new MediaTypeWithQualityHeaderValue("application/vnd.kafka.v2+json"));
    string topicValue = @"{ ""records"":[{ ""value"":""" + value + @"""}]}"; // ugly, no schema, just for demo
    HttpRequestMessage httpRequest = new HttpRequestMessage(HttpMethod.Post, @"/topics/kafkademo")
    {
        Content = new StringContent(topicValue, Encoding.UTF8, "application/vnd.kafka.json.v2+json")
    };
    try
    {
        var result = await client.SendAsync(httpRequest);
        return result.IsSuccessStatusCode;
    }
    catch (…)
}
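The string-concatenated JSON above is, as its comment admits, ugly: it breaks as soon as the tweet contains a double quote. A safer way to build the same Confluent REST Proxy v2 body, sketched here in Python with json.dumps:

```python
import json

def kafka_rest_payload(value, key=None):
    # Build the Confluent REST Proxy v2 body: {"records": [{"value": ...}]}
    record = {"value": value}
    if key is not None:
        record["key"] = key
    return json.dumps({"records": [record]})

# Quotes and unicode inside the tweet are escaped correctly:
print(kafka_rest_payload('He said "hi" @kafkademo'))
```

The same fix in .NET would mean serializing an object with a JSON library instead of concatenating strings.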
Consuming from the topic and lighting up LEDs

from confluent_kafka import Consumer, KafkaError  # Kafka lib
from blinkt import set_clear_on_exit, set_brightness, set_pixel, show  # Blinkt LED strip lib for Raspberry Pi
# (…)
settings = {
    'group.id': 'GroupPython',  # consumer group id for the 3-part consumer group
    # rest of the Kafka consumer settings
    # (…)
}
c = Consumer(settings)
c.subscribe(['kafkademo'])
print('Subscribed to topic "kafkademo"')
try:
    while True:
        msg = c.poll(0.1)
        if msg is None:
            continue
        elif not msg.error():
            print('Received message: {0}'.format(msg.value()))
            # light up the LED strip in a nice way
        elif msg.error().code() == KafkaError._PARTITION_EOF:
            print('End of partition reached {0}/{1}'.format(msg.topic(), msg.partition()))
        else:
            print('Error occurred: {0}'.format(msg.error().str()))
except KeyboardInterrupt:
    pass
finally:
    c.close()
Consuming from the topic and calling the printer API

static void Main(string[] args)
{
    string brokerList = "<list of Kafka brokers>";
    string topic = "kafkademo";
    MemoBirdApi memoBird = new MemoBirdApi();
    var config = new Dictionary<string, object> {
        { "group.id", "kafkademogroup" },
        { "bootstrap.servers", brokerList },
    };
    using (var consumer = new Consumer<Null, string>(config, null, new StringDeserializer(Encoding.UTF8)))
    {
        consumer.Assign(new List<TopicPartitionOffset> {
            new TopicPartitionOffset(topic, 0, 0),
            new TopicPartitionOffset(topic, 1, 0),
            new TopicPartitionOffset(topic, 2, 0) });
        consumer.OnError += (_, error) => Console.WriteLine($"Error: {error}");
        while (true)
        {
            if (consumer.Consume(out Message<Null, string> msg, 10))
            {
                Console.WriteLine($"Topic: {msg.Topic} Partition: {msg.Partition} - {msg.Value}");
                memoBird.PrintText(msg.Value);
            }
        }
    }
}
Demo data pipeline: extension with a button

[Diagram: the same architecture as before, extended with a battery-powered Raspberry Pi Zero W running Python that posts a simulated tweet to kafka-rest when its button is pressed.]
Handling the red button press and posting to Kafka REST

# Handler for the button press
def handleButtonPress(pin):
    print("Button pushed. Calling Kafka REST endpoint...")
    postOnKafka()

# Do a REST call to the Kafka REST endpoint
def postOnKafka():
    # Create payload and post a default message on Kafka
    (…)

# Register the GPIO pin
GPIO.setmode(GPIO.BCM)
GPIO.setup(4, GPIO.IN, pull_up_down = GPIO.PUD_UP)

# Detect the button press
GPIO.add_event_detect(4, GPIO.FALLING, callback = handleButtonPress, bouncetime = 200)

print("Standing by for button press....")
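The bouncetime = 200 above exists because a physical button "bounces": one press produces a burst of falling edges within a few milliseconds. The idea can be sketched without hardware, using millisecond timestamps; only edges arriving after a quiet interval fire the handler.

```python
# Software debounce sketch: ignore edges that arrive within `bounce_ms`
# of the last accepted one, mirroring bouncetime=200 in the GPIO call.
def make_debouncer(bounce_ms=200):
    last = {"t": None}
    def accept(t_ms):
        if last["t"] is None or t_ms - last["t"] >= bounce_ms:
            last["t"] = t_ms
            return True   # a genuine press: fire the handler
        return False      # contact bounce: ignore
    return accept

accept = make_debouncer(200)
edges = [0, 5, 12, 300, 305, 650]  # raw edge timestamps in ms
print([t for t in edges if accept(t)])  # [0, 300, 650]
```

Without this, a single press could post several simulated tweets to Kafka.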
Demo data pipeline: more ideas

• Kafka Streams application with the Google Translate API: real-time, streaming translations of tweets
• Kafka Connect configuration towards an Amazon S3 data lake
• Extending tweets with images
• Any more ideas?
[Diagram: Kafka Streams feeding the data lake.]
[Closing slide, keyword cloud: microservices, loose coupling, CI/CD, synergy, autonomous teams, automation, Agile, DevOps, (big) data technology, pub/sub, cloud.]
Useful links

• Kafka home page: https://kafka.apache.org/
• Kafka free e-books and papers: https://www.confluent.io/apache-kafka-stream-processing-book-bundle
• Confluent platform quickstart: https://docs.confluent.io/current/quickstart.html
• Online training (cheap): https://www.udemy.com/apache-kafka-series-kafka-from-beginner-to-intermediate/
• GDPR: https://nl.wikipedia.org/wiki/Algemene_verordening_gegevensbescherming
• Data lakes: https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
• Wehkamp Tech Blog: https://medium.com/wehkamp-techblog

Mark van Gool · [email protected] · linkedin.com/in/mark-van-gool
Thank you!