lessons learnt re-writing a pubsub system · about 3 • founded in 2000 • located in...

34
1

Upload: others

Post on 21-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

1

Page 2: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

2

Lessons learnt re-writing a PubSub systemChandru Mullaparthi - Principal Software Architect at bet365

Page 3: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

About

3

• Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers • One of the largest private companies in the UK • Employs more than 2,000 people • 2013-2014: Over £26 billion was staked

– Last year is likely to be around 25% up • Business growing very rapidly!

• Very technology focused company

Page 4: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

History of Erlang use

• Started experimenting with Erlang/OTP circa 2011

• 3 major systems written in Erlang

• 4th one in progress

• Code base of varied quality

4

Page 5: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Publisher

5

Data sources

Data Pump Publisher

Page 6: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Why rewrite?

6

Complicated process structure

socket handling process

decode stage 1 decode stage 2 decode stage 3

decode stage 4 decode stage 5 decode stage 6

Pool of processes

Pool of processes

Pool of processes

Message queue hotspots

Data Pump

Page 7: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

New architecture

7

• Exploits tree structure of data shipped to clients

• Total number of nodes in the tree ~2.2million

• Approx 130K top level nodes just below root

Page 8: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

1st attempt

• One process to take a message off the socket

• Decode

• Update global-state ETS table

• Send message to one of 130K top level procs

8

Page 9: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

New architecture

9

socket handling process

+ decoding process

Data Pump

ETS

update handling process

update handling process

Approx. 130K procs update handling process

client socket handler

client socket handler

client socket handler

1 process per TCP connection

Global tree node data store

Result: message queue build up at source

Page 10: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

2nd attempt• One process to take a message off the

socket • Decoding 1st stage • Insert into ETS table • Another process pops message off ETS

table • Update ETS table • Send message to one of 130K top level

procs10

Page 11: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

New architecture

11

socket handling process ETS

decoding process

Data Pump

ETS

update handling process

update handling process

Approx. 130K procs update handling process

client socket handler

client socket handler

client socket handler

1 process per TCP connection

Global tree node data store

Page 12: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Results

12

It worked well with 1.8 million nodes in the tree

BUT

throughput fell and latency increased for 2.2m nodes in the tree

Page 13: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Garbage collection

• GC is per process

• Regular GC of a process is “generational”. Data that survives at least one GC will be put on the old heap

• When there is no space on the old heap, a “fullsweep” is performed

13

Page 14: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

GC continued

• The time taken to GC eats into the 2000 reductions available for a process

• So a “busy process” (with lots of incoming messages) with lots of garbage produced will quickly spiral out of control

14

Page 15: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Reduction count penalty

• Sending message to a process with a “large” message queue

• Long GC • Proportional to size of message sent to another node • Proportional to size of binaries created • …

15

Nice to have Better documentation about reduction count

consumption and “penalties”

Page 16: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Process priority flag• process_flag(priority, high) does NOT

mean more reductions are available

• It just means it gets scheduled before others, if it has messages in its queue

16

Nice to have Support for processes which have a dedicated

execution thread may be useful?

Page 17: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Diagnosis

17

[root@publisher ~]# for x in $(seq 10000); \ do ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | \ grep D | \ egrep -v "\-|WCHAN"; \ done;

2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed

-e : Select all processes -o : user defined format

ppid : parent process id pid : process id stat : process state pcpu : CPU utilisation comm : command name wchan : kernel function name where process is sleeping

Page 18: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

THP

• Transparent Huge Pages are a feature in 64-bit RHEL

• Enabled by default for all applications

• Abstraction over “huge page” support in Linux

• Most database vendors recommend turning this off

18

Page 19: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

What are huge pages?

• Typical size of page in memory is 4KB

• 20GB of memory equates to 5,242,880 pages and that many entries in the page table.

• RHEL kernel allows page sizes of 2MB and 1GB on x86_64

19

Page 20: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

How does THP work?

• Using the huge page feature requires explicit code change to the application (in this case ‘beam’)

• THP aims to make it “just happen”

• It also does some memory defragmentation which seems to be the root cause of the issue

20

Page 21: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Disabling THP

21

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag

Page 22: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Permanently disable THP

22

#!/bin/sh

if [[ $EUID -ne 0 ]]; then echo "This script must be run as root" 1>&2 exit 1 fi

for KERNEL in /boot/vmlinuz-*; do grubby --update-kernel="$KERNEL" --args='transparent_hugepage=never' done

source: http://unix.stackexchange.com/questions/99154/disable-transparent-hugepages

Page 23: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Message passing

• Double edged sword – Easy to write reasonably complex code,

quickly – Enough rope to hang yourself easily

• Developed a process message queue with {active, once} semantics

• No improvement in performance, but no noticeable degradation either!

23

Page 24: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Message queue

24

Page 25: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Results

25

Mill

isec

onds

(log

sca

le)

100K clients 1x production load

Page 26: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Results

26

Mill

isec

onds

100K clients 1x production load

Page 27: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Results

27

Mill

isec

onds

(log

sca

le)

100K clients 4x production load

Page 28: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Bottlenecks

28

socket handling process ETS

decoding process

Data Pump

ETS

update handling process

update handling process

update handling process

client socket handler

client socket handler

client socket handler

1 process per TCP connection

Global tree node data store

Queues build up here at very high

load

Some of these processes become

“hot”

Page 29: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Work in progress

29

Page 30: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Distributed Data Grid

30

• An mnesia like database

• Without the storage backend issues

• And without the “partitioned network” problems with mnesia

• Something similar to Infinispan

Page 31: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Bounded message q

• Long term itch

• Seems fairly straightforward to achieve ➡ Additional option max_msg_q_length in

process_flag/2 BIF ➡ Modify process_info/2 BIF ➡ Kill process if message queue length

exceeds configured limit

31

Page 32: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

Deep packet inspection

• Analyse traffic patterns in real-time

• Proof of concept being worked on using Erlang

• Speed will be an issue here so an interesting challenge

32

Page 33: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

COME WORK WITH US!

33

Page 34: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers

34