living with garbage


Upload: gregg-donovan

Post on 19-Jun-2015


DESCRIPTION

"Living With Garbage" talk by Gregg Donovan at the NYC Search and Discovery Meetup on 12/12/2013.

TRANSCRIPT

Page 1: Living With Garbage

Senior Software Engineer

LIVING WITH GARBAGE!

Gregg Donovan

etsy.com

Page 2: Living With Garbage

4 years Solr & Lucene at etsy.com

3 years Solr & Lucene at TheLadders.com

Page 3: Living With Garbage
Page 4: Living With Garbage

10+ million members

Page 5: Living With Garbage

24+ million items

Page 6: Living With Garbage

1+ million active sellers

Page 7: Living With Garbage

10+ billion pageviews per month

Page 8: Living With Garbage
Page 9: Living With Garbage
Page 10: Living With Garbage
Page 11: Living With Garbage
Page 12: Living With Garbage
Page 13: Living With Garbage
Page 14: Living With Garbage
Page 15: Living With Garbage

CodeAsCraft.etsy.com

Page 16: Living With Garbage

Understanding GC

Monitoring GC

Debugging Memory Leaks

Design for Partial Availability

Page 17: Living With Garbage
Page 18: Living With Garbage

public class BuzzwordDetector {
    static String[] prefixes = { "synergy", "win-win" };
    static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

    public static void main(String[] args) {
        args = myArgs;

        int buzzwords = 0;
        for (int i = 0; i < args.length; i++) {
            String lc = args[i].toLowerCase();
            for (int j = 0; j < prefixes.length; j++) {
                if (lc.contains(prefixes[j])) {
                    buzzwords++;
                }
            }
        }
        System.out.println("Found " + buzzwords + " buzzwords");
    }
}

Page 19: Living With Garbage

New():
    ref <- allocate()
    if ref = null                /* Heap is full */
        collect()
        ref <- allocate()
        if ref = null            /* Heap is still full */
            error "Out of memory"
    return ref

atomic collect():
    markFromRoots()
    sweep(HeapStart, HeapEnd)

From Garbage Collection Handbook

Page 20: Living With Garbage

markFromRoots():
    initialise(worklist)
    for each fld in Roots
        ref <- *fld
        if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
            mark()

initialise(worklist):
    worklist <- empty

mark():
    while not isEmpty(worklist)
        ref <- remove(worklist)        /* ref is marked */
        for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
                setMarked(child)
                add(worklist, child)

From Garbage Collection Handbook
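
The mark-and-sweep pseudocode on the last two slides can be tried out as a toy in Java. This is only a sketch under stated assumptions: the `Node` class, the `heap` and `roots` lists, and the `ToyMarkSweep` name are invented for illustration and say nothing about how the JVM lays out or collects its real heap. Unlike the pseudocode, it marks all roots first and then drains the worklist once, which reaches the same set of objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class ToyMarkSweep {
    /** Hypothetical heap object: a mark bit plus outgoing pointers. */
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    final List<Node> heap = new ArrayList<>();   // every allocated object
    final List<Node> roots = new ArrayList<>();  // stand-in for stacks and statics

    Node allocate(String name) {
        Node n = new Node(name);
        heap.add(n);
        return n;
    }

    /** markFromRoots() and mark(): mark everything reachable via a worklist. */
    void markFromRoots() {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node ref : roots) {
            if (ref != null && !ref.marked) {
                ref.marked = true;
                worklist.add(ref);
            }
        }
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove();          // ref is already marked
            for (Node child : ref.fields) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    /** sweep(): drop unmarked objects, clear mark bits on survivors. */
    int sweep() {
        int freed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    int collect() {
        markFromRoots();
        return sweep();
    }

    public static void main(String[] args) {
        ToyMarkSweep gc = new ToyMarkSweep();
        Node a = gc.allocate("a");
        Node b = gc.allocate("b");
        Node c = gc.allocate("c");
        a.fields.add(b);   // a -> b, reachable from the root
        c.fields.add(c);   // c only points at itself: an unreachable cycle
        gc.roots.add(a);
        System.out.println("freed " + gc.collect() + ", live " + gc.heap.size());
        // prints "freed 1, live 2"
    }
}
```

The self-referencing `c` is the interesting case: a tracing collector reclaims it because reachability is decided from the roots, not from whether anything points at the object.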

Page 21: Living With Garbage

Trivia: Who invented the first GC and Mark-and-Sweep?

Page 22: Living With Garbage

Weak Generational Hypothesis

Page 23: Living With Garbage

Where do objects in common Solr application live?

AtomicReaderContext?

SolrIndexSearcher?

SolrRequest?

Page 24: Living With Garbage

GC Terminology: Concurrent vs Parallel

Page 25: Living With Garbage

JVM Collectors

Page 26: Living With Garbage

Serial

Page 27: Living With Garbage

Trivia: How does System.identityHashCode() work?

Page 28: Living With Garbage

Throughput

Page 29: Living With Garbage

CMS

Page 30: Living With Garbage

Garbage First (G1)

Page 31: Living With Garbage

Continuously Concurrent Compacting Collector (C4)

Page 32: Living With Garbage

IBM, Dalvik, etc.?

Page 33: Living With Garbage

Why Throughput?

Page 34: Living With Garbage

Questions so far?

Page 35: Living With Garbage

Monitoring

Page 36: Living With Garbage

GC time per Solr request

Page 37: Living With Garbage

...
import java.lang.management.*;
...

public static long getCollectionTime() {
    long collectionTime = 0;
    for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
        collectionTime += mbean.getCollectionTime();
    }
    return collectionTime;
}

Available via JMX
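
One way to turn the cumulative counter above into "GC time per request" is to sample it before and after each request and log the difference. The sketch below assumes that approach; the `timed` wrapper, the `GcTimePerRequest` class name, and the fake request in `main` are hypothetical, not taken from the talk. Note that the counter is JVM-wide, so the delta is GC time that overlapped the request, including collections triggered by other threads.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

public class GcTimePerRequest {
    /** Total accumulated GC time in milliseconds across all collectors. */
    static long getCollectionTime() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime();   // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Hypothetical wrapper: runs a request and logs wall time and overlapping GC time. */
    static <T> T timed(String label, Supplier<T> request) {
        long gcBefore = getCollectionTime();
        long start = System.nanoTime();
        try {
            return request.get();
        } finally {
            long wallMs = (System.nanoTime() - start) / 1_000_000;
            long gcMs = getCollectionTime() - gcBefore;
            System.out.println(label + ": wall=" + wallMs + "ms gc=" + gcMs + "ms");
        }
    }

    public static void main(String[] args) {
        // A fake "request" that allocates short-lived garbage.
        long bytes = timed("fake-request", () -> {
            long sum = 0;
            for (int i = 0; i < 200_000; i++) {
                sum += new byte[1024].length;
            }
            return sum;
        });
        System.out.println("allocated about " + bytes + " bytes");
    }
}
```

In a Solr setup the same before/after sampling could live in whatever code wraps request handling, with the delta written to the request log next to the query time.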

Page 38: Living With Garbage

Visual GC

Page 39: Living With Garbage
Page 40: Living With Garbage

export GC_DEBUG="-verbose:gc \
    -XX:+PrintGCDateStamps \
    -XX:+PrintHeapAtGC \
    -XX:+PrintGCApplicationStoppedTime \
    -XX:+PrintGCApplicationConcurrentTime \
    -XX:+PrintAdaptiveSizePolicy \
    -XX:AdaptiveSizePolicyOutputInterval=1 \
    -XX:+PrintTenuringDistribution \
    -XX:+PrintGCDetails \
    -XX:+PrintCommandLineFlags \
    -XX:+PrintSafepointStatistics \
    -Xloggc:/var/log/search/gc.log"

Page 41: Living With Garbage

2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
AdaptiveSizeStop: collection: 213
[PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
Heap after GC invocations=213 (full 210):
 PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
  eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
  from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
  to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
 ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
  object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
 PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
  object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
}
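
For a quick look at pause times without a full log analyzer, the `real=... secs` field in the `[Times: ...]` section of a log entry like the one above can be pulled out with a regular expression. This is a minimal sketch that assumes the log format shown on this slide; the `GcPauseParser` class and `realSeconds` method are hypothetical names, and other collectors or JVM versions may format their logs differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPauseParser {
    // Matches the wall-clock field of a "[Times: user=... sys=..., real=... secs]" section.
    private static final Pattern REAL = Pattern.compile("real=(\\d+\\.\\d+) secs");

    /** Returns the wall-clock pause in seconds, or -1 if the line has no real= field. */
    static double realSeconds(String line) {
        Matcher m = REAL.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : -1.0;
    }

    public static void main(String[] args) {
        String line = "[PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] "
                + "[Times: user=137.36 sys=0.03, real=8.77 secs]";
        System.out.println("pause seconds: " + realSeconds(line));
        // prints "pause seconds: 8.77"
    }
}
```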

Page 42: Living With Garbage

GC Log Analyzers?

GCHisto

GCViewer

garbagecat

Page 43: Living With Garbage

Graphing with Logster

github.com/etsy/logster

Page 44: Living With Garbage
Page 45: Living With Garbage

GC Dashboard

github.com/etsy/dashboard

Page 46: Living With Garbage
Page 47: Living With Garbage

YourKit.com

Page 48: Living With Garbage

Designing for Partial Availability

Page 49: Living With Garbage

JVMTI GC Hook?

Page 50: Living With Garbage

How can a client ignore GC-ing hosts?

Page 51: Living With Garbage

Server lies to clients about availability

TCP socket receive buffer

TCP write buffer

Page 52: Living With Garbage

“Banner” protocol

1. Connect via TCP

2. Wait ~1-10ms

3. Either receive magic four byte header or try another host

4. Only send query after receiving header from server

Page 53: Living With Garbage

0xC0DEA5CF
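
The client side of the banner protocol might look roughly like the sketch below. It assumes the four header bytes are the magic value above in big-endian order, which the slides do not state; `BannerClient`, `isBanner`, and `connectToHealthyHost` are hypothetical names, and the timeout parameters are placeholders in the spirit of the ~1-10ms wait.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.util.List;

public class BannerClient {
    static final int MAGIC = 0xC0DEA5CF;   // magic four-byte header from the slide

    /** True if the four bytes equal the magic value (big-endian is an assumption). */
    static boolean isBanner(byte[] header) {
        return header.length == 4 && ByteBuffer.wrap(header).getInt() == MAGIC;
    }

    /**
     * Tries each host in order and returns a socket for the first one that sends
     * the banner within bannerTimeoutMs, or null if none does.
     */
    static Socket connectToHealthyHost(List<InetSocketAddress> hosts,
                                       int connectTimeoutMs, int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket s = new Socket();
            try {
                s.connect(host, connectTimeoutMs);   // the kernel may accept even mid-GC
                s.setSoTimeout(bannerTimeoutMs);     // bound the wait for the banner
                byte[] header = new byte[4];
                new DataInputStream(s.getInputStream()).readFully(header);
                if (isBanner(header)) {
                    return s;                        // only now is it safe to send the query
                }
            } catch (IOException e) {
                // Connect failure, read timeout, or early close: treat the host as unavailable.
            }
            try {
                s.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] good = { (byte) 0xC0, (byte) 0xDE, (byte) 0xA5, (byte) 0xCF };
        System.out.println(isBanner(good));          // prints "true"
    }
}
```

The point of waiting for the banner is that a successful TCP connect only proves the kernel is alive; a server process that is stopped in a collection cannot write the header, so the client moves on to another host.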

Page 54: Living With Garbage

What if GC happens mid-request?

Page 55: Living With Garbage

Backup requests

Page 56: Living With Garbage

Jeff Dean: Achieving Rapid Response Times in Large Online Services
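
A backup request can be sketched as: send the query to one replica, and if no answer arrives within a short delay, send the same query to a second replica and take whichever answers first. The code below is an illustration under those assumptions, not the approach used at Etsy; `BackupRequest`, `firstOf`, and `replyAfter` are hypothetical names, and the suppliers stand in for real network calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class BackupRequest {
    /** Returns the primary's answer, or the first of primary/backup once the delay passes. */
    static <T> T firstOf(Supplier<T> primary, Supplier<T> backup,
                         long backupDelayMs, ExecutorService pool) throws Exception {
        CompletableFuture<T> first = CompletableFuture.supplyAsync(primary, pool);
        try {
            return first.get(backupDelayMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            // Primary is slow (perhaps paused in GC): hedge with a second request.
            CompletableFuture<T> second = CompletableFuture.supplyAsync(backup, pool);
            return first.applyToEither(second, r -> r).get();
        }
    }

    /** Test helper simulating a replica that answers after the given delay. */
    static Supplier<String> replyAfter(String name, long ms) {
        return () -> {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The primary "pauses" for 500 ms, so the backup sent after 20 ms should win.
            String winner = firstOf(replyAfter("primary", 500), replyAfter("backup", 0), 20, pool);
            System.out.println("answered by: " + winner);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The delay is the main tuning knob: too short and every query is sent twice, too long and the backup does little to cut tail latency.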

Page 57: Living With Garbage

Solr sharding?

Right now, only as fast as the slowest shard.

Page 58: Living With Garbage

“Make a reliable whole out of unreliable parts.”

Page 59: Living With Garbage

Memory Leaks

Page 60: Living With Garbage

Solr API hooks for custom code

QParserPlugin

SearchComponent

SolrRequestHandler

SolrEventListener

SolrCache

ValueSourceParser

FieldType

etc.

Page 61: Living With Garbage

PSA: Are you sure you need custom code?

Page 62: Living With Garbage

CoreContainer#getCore()

RefCounted<SolrIndexSearcher>
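
Reference-counted handles like these leak when custom code acquires one and an early return or exception skips the release. The sketch below shows the acquire/try/finally pattern on a self-contained stand-in; `ToyRefCounted` is a hypothetical class written for this example and is not Solr's `RefCounted`, and the `String` resource stands in for a searcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountDemo {
    /** Hypothetical stand-in for a reference-counted holder; not Solr's class. */
    static final class ToyRefCounted<T> {
        private final T resource;
        private final Runnable onClose;
        private final AtomicInteger refs = new AtomicInteger(0);

        ToyRefCounted(T resource, Runnable onClose) {
            this.resource = resource;
            this.onClose = onClose;
        }

        ToyRefCounted<T> incref() {
            refs.incrementAndGet();
            return this;
        }

        T get() {
            return resource;
        }

        /** Releases one reference; the resource is closed when the count hits zero. */
        void decref() {
            if (refs.decrementAndGet() == 0) {
                onClose.run();
            }
        }

        int count() {
            return refs.get();
        }
    }

    /** A "request" that borrows the resource and always gives it back. */
    static int search(ToyRefCounted<String> holder, boolean fail) {
        holder.incref();
        try {
            if (fail) {
                throw new IllegalStateException("query failed");
            }
            return holder.get().length();
        } finally {
            holder.decref();   // runs on success and on exception alike
        }
    }

    public static void main(String[] args) {
        ToyRefCounted<String> holder =
                new ToyRefCounted<>("searcher-1", () -> System.out.println("closed"));
        holder.incref();                        // the owner's reference
        search(holder, false);
        try {
            search(holder, true);
        } catch (IllegalStateException expected) {
            // The failed request still released its reference.
        }
        System.out.println("refs after requests: " + holder.count());   // prints 1
        holder.decref();                        // owner lets go; prints "closed"
    }
}
```

Without the `finally`, the failed request would leave the count at 2, the owner's `decref()` would never reach zero, and the old resource would stay reachable indefinitely.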

Page 63: Living With Garbage

SolrIndexSearcher generation marking with YourKit triggers

Page 64: Living With Garbage
Page 65: Living With Garbage

Questions so far?

Page 66: Living With Garbage

Miscellaneous Topics

Page 67: Living With Garbage

System.gc()?

Page 68: Living With Garbage

-XX:+UseCompressedOops

Page 69: Living With Garbage

-XX:+UseNUMA

Page 70: Living With Garbage

Paging

Page 71: Living With Garbage

#!/usr/bin/env bash

# This script is designed to be run every minute by cron.

host=$(hostname -s)

psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
min_flt=$(echo $psout | awk '{print $1}') # minor page faults
maj_flt=$(echo $psout | awk '{print $2}') # major page faults

epoch_s=$(date +%s)

echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003

Page 72: Living With Garbage

Solution 1: Buy more RAM

Ideally enough RAM to:

Keep index in OS file buffers

AND ensure no paging of VM memory

AND whatever else happens on the box

~$5-10/GB

Page 73: Living With Garbage

echo "0" > /proc/sys/vm/swappiness

Page 74: Living With Garbage

mlock()/mlockall()

github.com/LucidWorks/mlockall-agent

Page 75: Living With Garbage

echo "-17" > /proc/$PID/oom_adj

Mercy from the OOM Killer

Page 76: Living With Garbage

Huge Pages

Page 77: Living With Garbage

-XX:+AlwaysPreTouch

Page 78: Living With Garbage

Possible Future Directions

Page 79: Living With Garbage

Many small VMs instead of one large VM

microsharding

Page 80: Living With Garbage

In-memory Lucene codecs

I.e. custom DirectPostingsFormat

Page 81: Living With Garbage

Off-heap memory with sun.misc.Unsafe?

Page 82: Living With Garbage

Try G1 again

Page 83: Living With Garbage

Try C4 again

Page 84: Living With Garbage

Resources

Page 85: Living With Garbage

gchandbook.org

Page 86: Living With Garbage
Page 87: Living With Garbage

bit.ly/mmgcb

Mark Miller’s GC Bootcamp

Page 88: Living With Garbage

bit.ly/giltene

Gil Tene: Understanding Java Garbage Collection

Page 89: Living With Garbage

bit.ly/cpumemory

Ulrich Drepper: What Every Programmer Should Know About Memory

Page 90: Living With Garbage

github.com/pingtimeout/jvm-options

Page 91: Living With Garbage

Read the JVM Source

(Not as scary as it sounds.)

hg.openjdk.java.net/jdk7/jdk7

Page 92: Living With Garbage

Mechanical Sympathy Google Group

bit.ly/mechsym

Page 93: Living With Garbage

Questions?

Thanks for coming!
