lessons learnt re-writing a pubsub system · about 3 • founded in 2000 • located in...
TRANSCRIPT
![Page 1: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/1.jpg)
1
![Page 2: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/2.jpg)
2
Lessons learnt re-writing a PubSub systemChandru Mullaparthi - Principal Software Architect at bet365
![Page 3: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/3.jpg)
About
3
• Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers • One of the largest private companies in the UK • Employs more than 2,000 people • 2013-2014: Over £26 billion was staked
– Last year is likely to be around 25% up • Business growing very rapidly!
• Very technology focused company
![Page 4: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/4.jpg)
History of Erlang use
• Started experimenting with Erlang/OTP circa 2011
• 3 major systems written in Erlang
• 4th one in progress
• Code base of varied quality
4
![Page 5: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/5.jpg)
Publisher
5
Data sources
Data Pump Publisher
![Page 6: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/6.jpg)
Why rewrite?
6
Complicated process structure
socket handling process
decode stage 1 decode stage 2 decode stage 3
decode stage 4 decode stage 5 decode stage 6
Pool of processes
Pool of processes
Pool of processes
Message queue hotspots
Data Pump
![Page 7: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/7.jpg)
New architecture
7
• Exploits tree structure of data shipped to clients
• Total number of nodes in the tree ~2.2million
• Approx 130K top level nodes just below root
![Page 8: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/8.jpg)
1st attempt
• One process to take a message off the socket
• Decode
• Update global-state ETS table
• Send message to one of 130K top level procs
8
![Page 9: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/9.jpg)
New architecture
9
socket handling process
+ decoding process
Data Pump
ETS
update handling process
update handling process
Approx. 130K procs update handling process
client socket handler
client socket handler
client socket handler
1 process per TCP connection
Global tree node data store
Result: message queue build up at source
![Page 10: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/10.jpg)
2nd attempt• One process to take a message off the
socket • Decoding 1st stage • Insert into ETS table • Another process pops message off ETS
table • Update ETS table • Send message to one of 130K top level
procs10
![Page 11: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/11.jpg)
New architecture
11
socket handling process ETS
decoding process
Data Pump
ETS
update handling process
update handling process
Approx. 130K procs update handling process
client socket handler
client socket handler
client socket handler
1 process per TCP connection
Global tree node data store
![Page 12: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/12.jpg)
Results
12
It worked well with 1.8 million nodes in the tree
BUT
throughput fell and latency increased for 2.2m nodes in the tree
![Page 13: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/13.jpg)
Garbage collection
• GC is per process
• Regular GC of a process is “generational”. Data that survives at least one GC will be put on the old heap
• When there is no space on the old heap, a “fullsweep” is performed
13
![Page 14: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/14.jpg)
GC continued
• The time taken to GC eats into the 2000 reductions available for a process
• So a “busy process” (with lots of incoming messages) with lots of garbage produced will quickly spiral out of control
14
![Page 15: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/15.jpg)
Reduction count penalty
• Sending message to a process with a “large” message queue
• Long GC • Proportional to size of message sent to another node • Proportional to size of binaries created • …
15
Nice to have Better documentation about reduction count
consumption and “penalties”
![Page 16: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/16.jpg)
Process priority flag• process_flag(priority, high) does NOT
mean more reductions are available
• It just means it gets scheduled before others, if it has messages in its queue
16
Nice to have Support for processes which have a dedicated
execution thread may be useful?
![Page 17: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/17.jpg)
Diagnosis
17
[root@publisher ~]# for x in $(seq 10000); \ do ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | \ grep D | \ egrep -v "\-|WCHAN"; \ done;
2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed 2 263 root DN 0.0 khugepaged call_rwsem_down_write_failed
-e : Select all processes -o : user defined format
ppid : parent process id pid : process id stat : process state pcpu : CPU utilisation comm : command name wchan : kernel function name where process is sleeping
![Page 18: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/18.jpg)
THP
• Transparent Huge Pages are a feature in 64-bit RHEL
• Enabled by default for all applications
• Abstraction over “huge page” support in Linux
• Most database vendors recommend turning this off
18
![Page 19: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/19.jpg)
What are huge pages?
• Typical size of page in memory is 4KB
• 20GB of memory equates to 5,242,880 pages and that many entries in the page table.
• RHEL kernel allows page sizes of 2MB and 1GB on x86_64
19
![Page 20: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/20.jpg)
How does THP work?
• Using the huge page feature requires explicit code change to the application (in this case ‘beam’)
• THP aims to make it “just happen”
• It also does some memory defragmentation which seems to be the root cause of the issue
20
![Page 21: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/21.jpg)
Disabling THP
21
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
![Page 22: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/22.jpg)
Permanently disable THP
22
#!/bin/sh
if [[ $EUID -ne 0 ]]; then echo "This script must be run as root" 1>&2 exit 1 fi
for KERNEL in /boot/vmlinuz-*; do grubby --update-kernel="$KERNEL" --args='transparent_hugepage=never' done
source: http://unix.stackexchange.com/questions/99154/disable-transparent-hugepages
![Page 23: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/23.jpg)
Message passing
• Double edged sword – Easy to write reasonably complex code,
quickly – Enough rope to hang yourself easily
• Developed a process message queue with {active, once} semantics
• No improvement in performance, but no noticeable degradation either!
23
![Page 24: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/24.jpg)
Message queue
24
![Page 25: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/25.jpg)
Results
25
Mill
isec
onds
(log
sca
le)
100K clients 1x production load
![Page 26: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/26.jpg)
Results
26
Mill
isec
onds
100K clients 1x production load
![Page 27: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/27.jpg)
Results
27
Mill
isec
onds
(log
sca
le)
100K clients 4x production load
![Page 28: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/28.jpg)
Bottlenecks
28
socket handling process ETS
decoding process
Data Pump
ETS
update handling process
update handling process
update handling process
client socket handler
client socket handler
client socket handler
1 process per TCP connection
Global tree node data store
Queues build up here at very high
load
Some of these processes become
“hot”
![Page 29: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/29.jpg)
Work in progress
29
![Page 30: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/30.jpg)
Distributed Data Grid
30
• An mnesia like database
• Without the storage backend issues
• And without the “partitioned network” problems with mnesia
• Something similar to Infinispan
![Page 31: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/31.jpg)
Bounded message q
• Long term itch
• Seems fairly straightforward to achieve ➡ Additional option max_msg_q_length in
process_flag/2 BIF ➡ Modify process_info/2 BIF ➡ Kill process if message queue length
exceeds configured limit
31
![Page 32: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/32.jpg)
Deep packet inspection
• Analyse traffic patterns in real-time
• Proof of concept being worked on using Erlang
• Speed will be an issue here so an interesting challenge
32
![Page 33: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/33.jpg)
COME WORK WITH US!
33
![Page 34: Lessons learnt re-writing a PubSub system · About 3 • Founded in 2000 • Located in Stoke-on-Trent • The largest online sports betting company • Over 19 million customers](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6d7f3c65fcc501a21ef87/html5/thumbnails/34.jpg)
34