No bid left behind
My day-to-day handling a resilient real-time bidding platform in a JVM environment.
Marc de Palol, Trovit
Hey hi,
• Studied here (good to be back)
• Some research on supercomputing
• Moved to London, discovered Hadoop & data-intensive systems.
• Came back, still in the ‘Data Engineering’ stuff.
A classified search engine for property, jobs, cars, products and holiday rentals
• 180 Million ads
• 170 TB in the cluster
• 65 Million uniques / 170 Million visits
• 10 apps (iOS, Android)
• Cool office in Barcelona.
Have a look at http://www.trovit.es
Real Time Bidding
It’s about selling ads.
• Per impression basis.
• Programmatic instantaneous auction
We are using ‘DoubleClick Ad Exchange’ (Google)
• Response under 100 ms.
• If 15% of our responses are invalid or timed out, the exchange progressively stops sending us bid requests.
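A minimal sketch of what the 100 ms budget can mean in code, assuming Java 9+ and a bid represented as a plain String. The class name BidDeadline, the NO_BID sentinel and the bidWithin helper are hypothetical, not the talk's actual code. The idea is that a late or failed computation still turns into a valid "no bid" answer instead of a timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BidDeadline {
    // Hypothetical sentinel meaning "we decline to bid on this impression".
    static final String NO_BID = "NO_BID";

    // Runs the bid computation asynchronously. If it has not finished within
    // budgetMillis, or if it throws, the caller gets NO_BID instead.
    static String bidWithin(long budgetMillis, Supplier<String> computeBid) {
        return CompletableFuture
                .supplyAsync(computeBid)
                .completeOnTimeout(NO_BID, budgetMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> NO_BID)
                .join();
    }

    // Small helper so the demo lambdas do not need to handle InterruptedException.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Budget kept below the 100 ms limit to leave room for network time (assumed value).
        System.out.println(bidWithin(80, () -> "bid: 0.42"));
        System.out.println(bidWithin(80, () -> { sleep(500); return "too late"; }));
    }
}
```

Note that the slow computation keeps running in the background after the timeout; this sketch only guarantees that the answer goes out on time.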
This system, literally, spends money. So, it must be rock solid.
Our system is coded carefully, with love and tests.
• Logging with ‘mailAppender’
log4j.appender.mail=org.apache.log4j.net.SMTPAppender
log4j.appender.mail.SMTPHost=localhost
log4j.appender.mail.From=Error <[email protected]>
log4j.appender.mail.To=[email protected], [email protected]
log4j.appender.mail.Subject=[ERROR] WE ARE GOING TO DIE
log4j.appender.mail.layout=org.apache.log4j.PatternLayout
log4j.appender.mail.threshold=ERROR
You probably won't get an e-mail when you've got an OOM (OutOfMemoryError).
Let’s talk about OOM for a minute.
ps ax | grep java
JVMOpts="-XX:OnOutOfMemoryError=/usr/local/bin/slack-msg.sh"
🚫
👍
Some cool ideas for improving memory usage
• byte[] serialization in objects ❗
• Varying Memory Conditions ❗
• Logging with ‘mailAppender’
• Bad when OOM.
• Heartbeat
• Doing some real work
• Supervision with actors
• If you’re using Akka
• control flow != data flow
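One way to read "heartbeat doing some real work", sketched under assumptions: instead of just proving that the process is up, each beat runs a probe through a real code path and only a successful probe refreshes the timestamp. WorkingHeartbeat, the probe and the injected clock are hypothetical names for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

final class WorkingHeartbeat {
    private final BooleanSupplier probe;   // hypothetical: e.g. push a synthetic bid through the pipeline
    private final LongSupplier nanoClock;  // injected so the logic can be tested without sleeping
    private final AtomicLong lastSuccessNanos;

    WorkingHeartbeat(BooleanSupplier probe, LongSupplier nanoClock) {
        this.probe = probe;
        this.nanoClock = nanoClock;
        this.lastSuccessNanos = new AtomicLong(nanoClock.getAsLong());
    }

    // One beat: run the real probe and record the time only if it succeeded.
    void beat() {
        try {
            if (probe.getAsBoolean()) {
                lastSuccessNanos.set(nanoClock.getAsLong());
            }
        } catch (RuntimeException e) {
            // A failing probe simply leaves the timestamp stale.
        }
    }

    // True while the last successful probe is recent enough.
    boolean isAlive(long maxSilenceMillis) {
        long silentNanos = nanoClock.getAsLong() - lastSuccessNanos.get();
        return silentNanos <= TimeUnit.MILLISECONDS.toNanos(maxSilenceMillis);
    }

    // Schedules beats on a background thread; the caller shuts the scheduler down.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::beat, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

In production the clock would typically be System::nanoTime, and whatever watches isAlive should live outside the code path being probed.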
Data errors.
Roll back (when possible)
• Keeping different versions in the DB.
• Keep the old version around.
• Know how to do a rollback.
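A minimal in-memory sketch of "keep the old version around", assuming the data fits in one object. The talk keeps versions in the DB; VersionedData and its methods are hypothetical and only illustrate the publish/rollback shape.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class VersionedData<T> {
    private final Deque<T> versions = new ArrayDeque<>(); // newest first, nulls not allowed
    private final int maxKept;

    VersionedData(T initial, int maxKept) {
        if (maxKept < 2) {
            throw new IllegalArgumentException("keep at least 2 versions to allow a rollback");
        }
        this.maxKept = maxKept;
        versions.push(initial);
    }

    synchronized T current() {
        return versions.peek();
    }

    // Makes next the current version and drops the oldest ones beyond maxKept.
    synchronized void publish(T next) {
        versions.push(next);
        while (versions.size() > maxKept) {
            versions.removeLast();
        }
    }

    // Goes back one version. Returns false if there is nothing older to go back to.
    synchronized boolean rollback() {
        if (versions.size() < 2) {
            return false;
        }
        versions.pop();
        return true;
    }
}
```

Usage: publish the new data set, watch the metrics, and call rollback() if the new version turns out to be bad.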
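Checks & asserts with Google Guava.
import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;
import static com.google.common.base.Preconditions.checkState;
checkArgument(i >= 0, "Argument was %s but expected nonnegative", i);
checkArgument(i < j, "Expected i < j, but %s >= %s", i, j);
checkNotNull(myList, "List should not be null");
checkState(object.isValid(), "Object is not valid");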
System errors
These happen mostly at the integration points between systems.
• Your code and the DB.
• Your code and the 3rd party library.
• Your code and the queue.
DBs, a necessary supervillain
• Lost connection.
• Timeouts
• Can give you corrupted data.
• Can give you 0 data.
• Can give you too much data.
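A hedged sketch of guarding against those DB failure modes when refreshing data, assuming Java 9+ and that serving slightly stale data is acceptable. GuardedLoad, loadOrKeep and the row limits are hypothetical. A lost connection, an empty result or an oversized result all end the same way: keep the last known good data.

```java
import java.util.List;
import java.util.function.Supplier;

final class GuardedLoad {
    // Returns the fresh rows only if the query succeeded and the row count is
    // within [minRows, maxRows]; otherwise keeps serving lastGood.
    static <T> List<T> loadOrKeep(List<T> lastGood, Supplier<List<T>> query,
                                  int minRows, int maxRows) {
        try {
            List<T> fresh = query.get();
            if (fresh != null && fresh.size() >= minRows && fresh.size() <= maxRows) {
                return fresh;
            }
        } catch (RuntimeException e) {
            // Lost connection, timeout, etc.: fall through to the last good data.
        }
        return lastGood;
    }

    public static void main(String[] args) {
        List<String> campaigns = List.of("campaign-1", "campaign-2");
        // An empty result is treated as suspicious, so the old list is kept.
        System.out.println(loadOrKeep(campaigns, List::of, 1, 1000));
    }
}
```

A real system would also notify someone when the fresh data is rejected; that part is left out here.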
Once the circuit breaker is open:
• Notify
• Try again! maybe.
• Try to avoid DoS-ing your own system.
• Exponential backoff between retries.
• Failover
• Restart
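The points above can be sketched as a tiny circuit breaker with exponential backoff. This is an illustration under assumptions, not the talk's implementation: BackoffBreaker, its thresholds and the injected clock are hypothetical. After failureThreshold consecutive failures the breaker opens and callers get the fallback immediately; each failed retry doubles the wait, up to maxDelayMillis.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class BackoffBreaker {
    private final int failureThreshold;
    private final long baseDelayMillis;
    private final long maxDelayMillis;
    private final LongSupplier clockMillis; // injected so the logic can be tested without sleeping
    private int consecutiveFailures = 0;
    private long retryAtMillis = 0;

    BackoffBreaker(int failureThreshold, long baseDelayMillis,
                   long maxDelayMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
        this.clockMillis = clockMillis;
    }

    // Open means: too many failures, and the backoff period has not elapsed yet.
    synchronized boolean isOpen() {
        return consecutiveFailures >= failureThreshold
                && clockMillis.getAsLong() < retryAtMillis;
    }

    // Runs the action unless the breaker is open. Returns fallback when the
    // call is skipped or fails. Holding the lock during the call keeps the
    // sketch short; a real breaker would not serialize callers like this.
    synchronized <T> T call(Supplier<T> action, T fallback) {
        if (isOpen()) {
            return fallback; // fail fast, do not hammer the broken dependency
        }
        try {
            T result = action.get();
            consecutiveFailures = 0; // success closes the breaker
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                // Delay doubles with every failure past the threshold, capped.
                int doublings = Math.min(consecutiveFailures - failureThreshold, 20);
                long delay = Math.min(maxDelayMillis, baseDelayMillis << doublings);
                retryAtMillis = clockMillis.getAsLong() + delay;
            }
            return fallback;
        }
    }
}
```

In production the clock would typically be System::currentTimeMillis, and the place where the breaker opens is a natural spot for the "Notify" step.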
Some other bits and pieces:
• Tight coupling leads to fast propagation of errors.
• Event driven stuff
• Complete parameter checking
• Avoid SPOFs (single points of failure). Pretty please.
• Stateless is better.
• Bounded queues!
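A short sketch of the bounded queue point, using java.util.concurrent.ArrayBlockingQueue. BoundedInbox and its drop counter are hypothetical names. The idea is that when the consumer falls behind, new work is rejected and counted instead of piling up until the heap is gone.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class BoundedInbox<T> {
    private final BlockingQueue<T> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedInbox(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking: returns false and counts the drop when the queue is full.
    boolean submit(T item) {
        boolean accepted = queue.offer(item);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Returns the next item, or null if the queue is empty.
    T poll() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

Dropping is only one policy; blocking the producer with put() or a timed offer() are alternatives when losing work is worse than slowing down.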