Profiling JVM Applications in Production
Sasha Goldshtein, CTO, Sela Group
@goldshtn
github.com/goldshtn
https://s.sashag.net/srecon0318
Workshop Introduction
• Mission: Apply modern, low-overhead, production-ready tools to monitor and improve JVM application performance on Linux
• Objectives:
  ☐ Identifying overloaded resources
  ☐ Profiling for CPU bottlenecks
  ☐ Visualizing and exploring stack traces using flame graphs
  ☐ Recording system events (I/O, network, GC, etc.)
  ☐ Profiling for heap allocations
Course Introduction
• Target audience: Application developers, system administrators, production engineers
• Prerequisites: Understanding of JVM fundamentals, experience with Linux system administration, familiarity with OS concepts
• Lab environment: EC2, delivered through the browser during the course dates
• Course hands-on labs: https://github.com/goldshtn/linux-tracing-workshop
Course Plan
• JVM and Linux performance information sources
• CPU sampling
• Flame graphs and symbols
• Lab: Profiling with perf and async-profiler
• eBPF
• BCC tools
• Lab: Tracing file opens
• GC tracing and allocation profiling
• Lab: Allocation profiling
The Lab Environment
• Follow the link provided by the instructor
• Sign up or log in with Google
• Enter the classroom token
• Click the beaker-in-a-cloud icon to get your own lab instance
• Wait for the terminal to initialize
JVM and Linux Performance Sources
Performance Information Sources
[Diagram: layers of the system and the information sources available for them]
• Layers shown: Java applications, other applications, JVM (GC, JIT, class loader), system libraries, syscall interface, kernel (filesystem, TCP/IP, block I/O, Ethernet, scheduler, memory, device drivers), CPU, other devices
• JVM sources: USDT (dtrace) probes, JMX MBeans, JVMTI agents, Serviceability API, +PrintCompilation, +PrintGC and other flags, Java Flight Recorder, attach interface (jcmd), hsperf (jstat)
• OS and hardware sources: tracepoints, kprobes, uprobes, "software events", PMU events
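USE Checklist for Linux Systems
http://www.brendangregg.com/USEmethod/use-linux.html
[Diagram: hardware components (CPU cores, LLC, memory controller, RAM, FSB, GFX, I/O controller, PCIe, SSD, E1000) annotated with tools for Utilization (U), Saturation (S), and Errors (E)]
• U: perf
• U: mpstat -P 0, S: vmstat 1, E: perf
• U: sar -n DEV 1, S: ifconfig, E: ifconfig
• U: free -m, S: sar -B, E: dmesg
• U: iostat 1, S: iostat -xz 1, E: …/ioerr_cnt
• U: perf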
USE Checklist for JVM Applications
[Diagram: Java threads allocating on the heap, GC, JIT, JNI and native libraries, and syscalls into the kernel, annotated with tools]
• U: jmap, jhat, jstat
• U: top, jstack, E: JVMTI
• U: JVMTI, USDT
• U: +PrintCompilation, jstat, USDT
• U: +PrintGC, jstat, NMT, USDT
⚠ Mind the Overhead
• Any observation can change the state of the system, but some observations are worse than others
• Performance tools have overhead
• Check the docs
• Try on a test system first
• Measure degradation introduced by the tool
OVERHEAD
This traces various kernel page cache functions and maintains in-kernel counts, which are asynchronously copied to user-space. While the rate of operations can be very high (>1G/sec) we can have up to 34% overhead, this is still a relatively efficient way to trace these events, and so the overhead is expected to be small for normal workloads. Measure in a test environment.
Source: man cachestat (from BCC)
CPU Sampling
Sampling vs. Tracing
• Sampling works by getting a snapshot or a call stack every N occurrences of an interesting event
• For most events, implemented in the PMU using overflow counters and interrupts
• Tracing works by getting a message or a call stack at every occurrence of an interesting event
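The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only, not how the PMU or any real profiler is implemented: the event stream, the stack names, and the choice of N = 7 are all made up for the example. The countdown that fires every N events stands in for an overflow counter.

```python
from collections import Counter

def trace(events):
    """Tracing: record every occurrence of the event."""
    return Counter(events)

def sample(events, n):
    """Sampling: record only every n-th occurrence, like a counter overflow."""
    counts = Counter()
    remaining = n
    for stack in events:
        remaining -= 1
        if remaining == 0:       # the simulated overflow "interrupt" fires
            counts[stack] += 1
            remaining = n        # re-arm the counter
    return counts

# Hypothetical event stream: 90% of events occur in hot(), 10% in cold().
events = (["main;hot"] * 9 + ["main;cold"]) * 100

traced = trace(events)           # 1000 records
sampled = sample(events, n=7)    # 142 records, roughly the same proportions
```

Tracing keeps all 1000 records, while sampling keeps 142 and still attributes about 90% of them to `main;hot`. N is chosen so that it does not divide the period of the repeating pattern; a sampler in lockstep with the workload would see the same position every time.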
[Diagram: timeline of CPU time and system time for pids 121, 408, and 188, with CPU sample and disk write events marked]
JVM Stack Sampling
• Traditional CPU profilers sample all thread stacks periodically (e.g. 100 times per second)
• Typically use the JVMTI GetAllStackTraces API
• jstack, JVisualVM, YourKit, JProfiler, and a lot of others
[Diagram: thread timelines (running, blocked, GC) with the sample points marked]
Safepoint Bias
• Samples are captured only at safepoints
• The research paper "Evaluating the Accuracy of Java Profilers" by Mytkowicz, Diwan, Hauswirth, and Sweeney shows a wild variety of results between profilers due to safepoint bias
• Additionally, capturing a full stack trace for all threads is quite expensive (think Spring)
perf
• perf is a Linux multi-tool for performance investigations
• Capable of both tracing and sampling
• Developed in the kernel tree, must match the running kernel's version
• Debian-based: apt install linux-tools-common
• RedHat-based: yum install perf
Recording CPU Stacks With perf
• To find a CPU bottleneck, record stacks at timed intervals:
# system-wide
perf record -ag -F 97
# specific process
perf record -p 188 -g -F 97
# specific workload
perf record -g -F 97 -- ./myapp
Legend
-a   all CPUs
-p   specific process
--   run workload and capture it
-g   capture call stacks
-F   frequency of samples (Hz)
-c   # of events in each sample
A Single Stack
# perf script
parprimes 13393 248974.821897:   10309278 cpu-clock:
             92b is_prime+0xffffffffff800035 (/…/parprimes)
             96c primes_loop+0xffffffffff800021 (/…/parprimes)
             9d4 primes_thread+0xffffffffff800020 (/…/parprimes)
            75ca start_thread+0xffff011d4ae720ca (/…/libpthread-2.23.so)
…
# perf script | wc -l
7214
Stack Report
# perf report --stdio
# Children      Self  Command    Shared Object      Symbol
# ........  ........  .........  .................  ........
#
    72.02%    71.53%  parprimes  parprimes          [.] is_prime
            |
            --71.53%--start_thread
                      primes_thread
                      primes_loop
                      is_prime
...truncated
    27.86%     0.00%  dd         [kernel.kallsyms]  [k] vfs_read
            |
            ---vfs_read
               |
               --27.80%--__vfs_read
...truncated
Flame Graphs and Missing Symbols
Symbols
• perf needs symbols to display function names (beyond modules and addresses)
• For compiled languages (C, Go, …) these are often embedded in the binary
• Or installed as separate debuginfo (usually /usr/lib/debug)
$ objdump -tT /usr/bin/bash | grep readline
0000000000306bf8 g    DO .bss   0000000000000004  Base  rl_readline_state
00000000000a46c0 g    DF .text  00000000000001d4  Base  readline_internal_char
00000000000a3cc0 g    DF .text  0000000000000126  Base  readline_internal_setup
0000000000078b80 g    DF .text  0000000000000044  Base  posix_readline_initialize
00000000000a4de0 g    DF .text  0000000000000081  Base  readline
00000000003062d0 g    DO .bss   0000000000000004  Base  bash_readline_initialized
…
Report Without Symbols
# perf report --stdio
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  ..................
#
   100.00%     0.00%  hello    hello          [.] 0xffffffffffc0051d
            |
            ---0x51d
               |
               |--54.91%--0x4f7
               |
               |--27.97%--0x4eb
               |
               |--8.73%--0x4e3
               |
               --7.97%--0x4ff
Java App Report
# perf report --stdio
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  ..................
#
   100.00%     0.00%  java     perf-2318.map  [.] 0x00007f82b50004e7
            |
            ---0x7f82b50004e7
               |
               |--8.15%--0x7f82b510d63e
               |
               |--7.97%--0x7f82b510d6ca
               |
               |--7.07%--0x7f82b510d6c2
               |
               |--6.88%--0x7f82b510d686
               |
               |--6.16%--0x7f82b510d68e
perf-PID.map Files
• When symbols are missing in the binary, perf will look for a file named /tmp/perf-PID.map by default
$ cat /tmp/perf-1882.map
7f2cd1108880 1e8 Ljava/lang/System;::arraycopy
7f2cd1108c00 200 Ljava/lang/String;::hashCode
7f2cd1109120 2e0 Ljava/lang/String;::indexOf
7f2cd1109740 1c0 Ljava/lang/String;::charAt
…
7f2cd110ce80 120 LHello;::doStuff
7f2cd110d280 140 LHello;::fidget
7f2cd110d5c0 120 LHello;::fidget
7f2cd110d8c0 120 LHello;::fidget
…
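Each line of the map file is a hexadecimal start address, a hexadecimal size, and a symbol name. The Python sketch below shows how a sampled address could be turned into a name from such a file. It only illustrates the lookup and is not perf's own code; the function names `parse_perf_map` and `resolve` are invented for the example, and the entries are copied from the listing above.

```python
import bisect

# A few entries copied from the /tmp/perf-1882.map listing above.
MAP_TEXT = """\
7f2cd1108880 1e8 Ljava/lang/System;::arraycopy
7f2cd1108c00 200 Ljava/lang/String;::hashCode
7f2cd1109120 2e0 Ljava/lang/String;::indexOf
7f2cd1109740 1c0 Ljava/lang/String;::charAt
7f2cd110ce80 120 LHello;::doStuff
7f2cd110d280 140 LHello;::fidget
"""

def parse_perf_map(text):
    """Parse 'START SIZE NAME' lines (hex start and size) into sorted tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, size, name = line.split(" ", 2)  # the name may contain spaces
        entries.append((int(start, 16), int(size, 16), name))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [entry[0] for entry in entries]
    i = bisect.bisect_right(starts, addr) - 1   # last entry starting <= addr
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if addr < start + size else None

entries = parse_perf_map(MAP_TEXT)
```

With these entries, `resolve(entries, 0x7f2cd1108c10)` falls inside the `String.hashCode` range, while an address in a gap between two methods resolves to `None` and would still be displayed as a raw address.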
Generating Map Files
• For interpreted or JIT-compiled languages, map files need to be generated at runtime
• Java: perf-map-agent
create-java-perf-map.sh $(pidof java)
• This is a JVMTI agent that attaches on demand to the Java process
• Additional options include dottedclass, unfoldall, sourcepos
• Consider -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints for more accurate inline info
• Other runtimes:
• Node: node --perf-basic-prof-only-functions app.js
• Mono: mono --jitmap ...
• .NET Core: export COMPlus_PerfMapEnabled=1
Fixed Report; Still Broken
# perf report --stdio
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  .........
#
   100.00%     0.00%  java     perf-3828.map  [.] call_stub
            |
            ---call_stub
               LHello;::fidget
…
Walking Stacks
• To successfully walk stacks, perf requires* FPO (frame pointer omission) to be disabled
• This is an optimization that uses EBP/RBP as a general-purpose register rather than a frame pointer
• C/C++: -fno-omit-frame-pointer
• Java: -XX:+PreserveFramePointer since Java 8u60
* When debug information is present, perf can use libunwind and figure out FPO-enabled stacks, but not for dynamic languages
Fixed Report
# perf report --stdio
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  ...............
#
   100.00%    99.65%  java     perf-4005.map  [.] LHello;::fidget
            |
            --99.65%--start_thread
                      JavaMain
                      jni_CallStaticVoidMethod
                      jni_invoke_static
                      JavaCalls::call_helper
                      call_stub
                      LHello;::main
                      LHello;::doStuff
                      LHello;::identifyWidget
                      LHello;::fidget
…
Real-World Stack Reports
# perf report --stdio | wc -l
14823
Flame Graphs
• A visualization method (adjacency graph), very useful for stack traces, invented by Brendan Gregg
• http://www.brendangregg.com/flamegraphs.html
• Turns 1000s of stack trace pages into a single interactive graph
• Example scenarios:
• Identify CPU hotspots on the system/application
• Show stacks that perform heavy disk accesses
• Find threads that block for a long time and the stack where they do it
Reading a Flame Graph
• Each rectangle is a function
• Y-axis: stack depth
• X-axis: sorted stacks (not time)
• Wider frames are more common
• Supports zoom, find
• Filter with grep 😎
Generating a Flame Graph
$ git clone https://github.com/BrendanGregg/FlameGraph
$ sudo perf record -F 97 -g -p `pidof java` -- sleep 10
$ sudo perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flame.svg
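The middle stage of that pipeline collapses each sampled stack into one line, with frames joined by semicolons and followed by a sample count; flamegraph.pl draws its rectangles from those counts. The Python sketch below illustrates that folding step only. It is not the stackcollapse-perf.pl script, and the sample data is invented, reusing function names from the parprimes report shown earlier.

```python
from collections import Counter

def fold(stacks):
    """Collapse leaf-first stacks into sorted 'root;...;leaf count' lines."""
    counts = Counter(";".join(reversed(stack)) for stack in stacks)
    return [f"{folded} {n}" for folded, n in sorted(counts.items())]

# Hypothetical samples, listed leaf first as perf script prints them.
samples = [
    ["is_prime", "primes_loop", "primes_thread", "start_thread"],
    ["is_prime", "primes_loop", "primes_thread", "start_thread"],
    ["primes_loop", "primes_thread", "start_thread"],
    ["is_prime", "primes_loop", "primes_thread", "start_thread"],
]

folded = fold(samples)
# folded[0] is "start_thread;primes_thread;primes_loop 1"
# folded[1] is "start_thread;primes_thread;primes_loop;is_prime 3"
```

Identical stacks merge into a single line, which is why thousands of lines of perf report output shrink to one graph. Sorting the folded lines places stacks that share a prefix next to each other, consistent with the x-axis showing sorted stacks rather than time.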
Not Just For Methods
• For just a package-level understanding of where your time goes, use pkgsplit-perf.pl and generate a package-level flame graph:
[Figure: package-level flame graph]
From http://www.brendangregg.com/blog/2017-06-30/package-flame-graph.html
Lab: CPU Investigation With perf And Flame Graphs 💻
Problems with perf
• Only Java 8u60 and later is supported (to disable FPO)
• Disabling FPO has a small performance impact (up to 10% in pathological cases)
• Symbol resolution requires an additional agent
• Interpreter frames can't be resolved (shown as "Interpreter")
• Recompiled methods can be misreported (appear more than once in the perf map)
• Stack depth is usually limited to 127 (again, think Spring)
• Can be configured since Linux 4.8 using /proc/sys/kernel/perf_event_max_stack
async-profiler
JVMTI Agents
• A JVMTI (JVM Tool Interface) agent can be loaded with -agentpath or attached through the JVM attach interface
• Examples of functionality:
• Trace thread start and stop events
• Count monitor contentions and wait times
• Aggregate class load and unload information
• Full event reference: http://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html
AsyncGetCallTrace
• Internal API introduced to support lightweight profiling in Oracle Developer Studio
• Produces a single thread's stack without waiting for a safepoint
• Designed to be called from a signal handler
• Used by Honest Profiler (by Richard Warburton and contributors): https://github.com/jvm-profiling-tools/honest-profiler
async-profiler
• Open source profiler by Andrei Pangin and contributors: https://github.com/jvm-profiling-tools/async-profiler
[Diagram labels. Kernel: perf_events, CPU PMU, cyclic perf buffer, perf fd. User: JVM threads, native thread, libasyncProfiler.so. Arrows: inotify, signal, AsyncGetCallTrace, sample stack]
Profilers, Compared
perf
• Java ≥ 8u60 to disable FPO
• Disabling FPO has a perf penalty
• Need a map file
• Interpreter frames are not supported
• System-wide profiling is possible
• Can profile containers from the host (or from a sidecar)
async-profiler
• Works on older Java versions
• FPO can stay on
• No map file is required
• Interpreter frames are supported
• In theory, native and Java stacks don't always sync
• Profiling runs in-process (so, in-container)
Lab: Profiling With async-profiler 💻
eBPF
What's Wrong With perf?
• perf relies on pushing a lot of data to user space, through files, for analysis
• Downloading a file at ∼1 Gb/s produces ∼89K netif_receive_skb events/s (19 MB/s including stacks)
[Diagram. Kernel: e1000, netif_receive_skb, perf_events, perf.data. User: perf | awk | …, with the monitor showing "average packet size: 189 bytes"]
BPF: 1990
• Invented by McCanne and Jacobson at Berkeley, 1990-1992: instruction set, representation, implementation of packet filters
$ tcpdump -d 'ip and dst 186.173.190.239'
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 5
(002) ld       [30]
(003) jeq      #0xbaadbeef      jt 4    jf 5
(004) ret      #262144
(005) ret      #0
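To make the six instructions concrete, here is a toy Python interpreter for exactly the operations that appear in the listing above (ldh, ld, jeq, ret). It sketches the idea of a packet filter running as a small program and is not the kernel's BPF implementation. The tuple encoding, `run_filter`, and `make_packet` are invented for the example, and the test packet contains only the fields the filter reads.

```python
import socket
import struct

# (opcode, operand, jump-if-true, jump-if-false), transcribed from tcpdump -d.
PROGRAM = [
    ("ldh", 12, None, None),        # load 16-bit EtherType at offset 12
    ("jeq", 0x800, 2, 5),           # IPv4? continue at 2, else reject at 5
    ("ld", 30, None, None),         # load 32-bit IPv4 destination at offset 30
    ("jeq", 0xBAADBEEF, 4, 5),      # 186.173.190.239 is 0xbaadbeef
    ("ret", 262144, None, None),    # accept: keep up to 262144 bytes
    ("ret", 0, None, None),         # reject: keep nothing
]

def run_filter(program, packet):
    """Run the toy filter; return how many bytes of the packet to keep."""
    acc, pc = 0, 0
    try:
        while True:
            op, k, jt, jf = program[pc]
            if op == "ldh":
                (acc,) = struct.unpack_from("!H", packet, k)
                pc += 1
            elif op == "ld":
                (acc,) = struct.unpack_from("!I", packet, k)
                pc += 1
            elif op == "jeq":
                pc = jt if acc == k else jf
            elif op == "ret":
                return k
            else:
                raise ValueError(f"unsupported opcode: {op}")
    except struct.error:
        return 0                    # load past the end of the packet: reject

def make_packet(dst_ip, ethertype=0x0800):
    """Build a minimal frame: zeroed MACs, EtherType, bare IPv4 header."""
    ethernet = bytes(12) + struct.pack("!H", ethertype)
    ipv4 = bytes(12) + socket.inet_aton("10.0.0.1") + socket.inet_aton(dst_ip)
    return ethernet + ipv4
```

`run_filter(PROGRAM, make_packet("186.173.190.239"))` returns 262144, and any other destination or EtherType returns 0. Offset 30 is the 14-byte Ethernet header plus the 16-byte offset of the destination address within the IPv4 header.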
BPF: Today
• Supports a wide spectrum of usages
• Has a JIT for maximum efficiency
[Diagram. User: control programs, BPF compiler, BPF programs. Kernel: verifier & JIT, BPF runtime, BPF map, attached to probes, sockets, and syscalls]
BPF Tracing
[Diagram. Kernel: BPF runtime, BPF program, map, perf output, event sources (kprobes, tracepoints, perf_events). User: control program, application (USDT, uprobes)]
① The control program installs the BPF program and attaches it to events
② Events invoke the BPF program
③ The BPF program updates a map or pushes a new event to a buffer shared with user-space
④ The user-space program is invoked with data from the shared buffer
⑤ The user-space program reads statistics from the map and clears it if necessary
BPF Tracing Features in The Linux Kernel
Version  Feature                 Scenarios
4.1      kprobes/uprobes attach  Dynamic tracing with BPF becomes possible
4.1      bpf_trace_printk        BPF programs can print output to ftrace pipe
4.3      perf_events output      Efficient tracing of large amounts of data for analysis in user-space
4.6      Stack traces            Efficient aggregation of call stacks for profiling or tracing
4.7      Tracepoints support     API stability for tracing programs
4.9      perf_events attach      Low-overhead profiling and PMU sampling
[Distribution labels on slide: 16.04, 24, 16.10, 25]
The Old Way And The New Way
[Diagram, old way. Kernel: VFS, k{,ret}probe:vfs_read, perf_events, perf.data. User: perf | awk | …, monitor]
[Diagram, new way. Kernel: VFS, k{,ret}probe:vfs_read, BPF program, BPF map. User: control program, monitor]
Monitor output in both cases:
LAT μs    #    distribution
0 - 1     …    |@@@@        |
1 - 2     …    |@           |
2 - 4     …    |@@@@@@@@    |
BCC Performance Checklist
The BCC BPF Front-End
• https://github.com/iovisor/bcc
• BPF Compiler Collection (BCC) is a BPF frontend library and a massive collection of performance tools
• Contributors from Facebook, PLUMgrid, Netflix, Sela
• Helps build BPF-based tools in high-level languages
• Python, Lua, C++
[Diagram, BCC architecture. User: BCC tools, BCC compiler frontend (Clang + LLVM), BCC loader library. Kernel: BPF runtime, event sources]
[Diagram: BCC tools placed on the system stack (applications, JVM, system libraries, syscall interface, filesystem, TCP/IP, block I/O, Ethernet, scheduler, memory, device drivers, CPU). Tool groups shown:]
• profile, llcstat
• hardirqs, softirqs, ttysnoop
• runqlat, cpudist
• offcputime, offwaketime, cpuunclaimed
• memleak, oomkill
• slabratetop, tcptop, tcplife
• tcpconnect, tcpaccept
• biotop, biolatency, biosnoop, bitesize
• filetop, filelife, fileslower, vfscount, vfsstat, cachestat, cachetop, mountsnoop, *fsslower, *fsdist, dcstat, dcsnoop, mdflush
• execsnoop, opensnoop, killsnoop, statsnoop, syncsnoop, setuidsnoop
• mysqld_qslower, bashreadline, dbslower, dbstat
• mysqlsniff
• memleak, sslsniff
• gethostlatency, deadlock_detector
• ustat, ugc
• ucalls
• uthreads, uobjnew, uflow
• argdist, trace
• funccount, funclatency, stackcount
BCC Linux Performance Checklist
1. execsnoop
2. opensnoop
3. ext4slower (or btrfs*, xfs*, zfs*)
4. biolatency
5. biosnoop
6. cachestat
7. tcpconnect
8. tcpaccept
9. tcptop
10. gethostlatency
11. cpudist
12. runqlat
13. profile
Some BCC Tools
# ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME     COMM   PID   T BYTES  OFF_KB  LAT(ms) FILENAME
06:49:17 bash   3616  R 128    0          7.75 cksum
06:49:17 cksum  3616  R 39552  0          1.34 [
06:49:17 cksum  3616  R 96     0          5.36 2to3-2.7
06:49:17 cksum  3616  R 96     0         14.94 2to3-3.4
^C
# execsnoop
PCOMM    PID    RET ARGS
bash     15887    0 /usr/bin/man ls
preconv  15894    0 /usr/bin/preconv -e UTF-8
man      15896    0 /usr/bin/tbl
man      15897    0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
^C
Some BCC Tools
# runqlat -p `pidof java` 10 1
Tracing run queue latency... Hit Ctrl-C to end.
     usecs          : count    distribution
     0 -> 1         : 11       |*                                       |
     2 -> 3         : 7        |                                        |
     4 -> 7         : 133      |******************                      |
     8 -> 15        : 288      |****************************************|
    16 -> 31        : 205      |****************************            |
    32 -> 63        : 38       |*****                                   |
    64 -> 127       : 11       |*                                       |
   128 -> 255       : 5        |                                        |
   256 -> 511       : 3        |                                        |
   512 -> 1023      : 1        |                                        |
  1024 -> 2047      : 3        |                                        |
  2048 -> 4095      : 0        |                                        |
  4096 -> 8191      : 3        |                                        |
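The power-of-two buckets in this output can be reproduced with a short Python sketch. This is an illustration rather than BCC's implementation: the function names are invented, and the bar-scaling rule (count × 40 / largest count, rounded down) is an assumption that happens to match the bar lengths shown above.

```python
from collections import Counter

def bucket_bounds(value):
    """Return the (low, high) power-of-two bucket for a non-negative integer."""
    if value < 2:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)   # largest power of two <= value
    return (low, 2 * low - 1)

def histogram(values):
    """Count values per bucket, as an in-kernel map would accumulate them."""
    return Counter(bucket_bounds(v) for v in values)

def bar(count, max_count, width=40):
    """Scale a count to a row of stars (assumed rule: round down)."""
    return "*" * (count * width // max_count)

def render(hist, width=40):
    """Format a non-empty histogram in the style of the output above."""
    max_count = max(hist.values())
    lines = []
    for (low, high), count in sorted(hist.items()):
        stars = bar(count, max_count, width)
        lines.append(f"{low:>6} -> {high:<6} : {count:<6} |{stars:<{width}}|")
    return "\n".join(lines)

# Hypothetical latencies in microseconds.
latencies = [0, 1, 2, 3, 5, 9, 10, 1500]
hist = histogram(latencies)
```

Only one counter per bucket has to be kept and copied to user space, however many events occur, which is what makes this kind of summary cheap compared with recording every event.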
BCC's profile Tool
# profile 10 -F 97 -K    # kernel stacks only
…
    ffffffffa4818691 __lock_text_start
    ffffffffa45b0341 ata_scsi_queuecmd
    ffffffffa458813d scsi_dispatch_cmd
    ffffffffa458b021 scsi_request_fn
    ffffffffa43be643 __blk_run_queue
    ffffffffa43c3bc1 blk_queue_bio
    ffffffffa43c1cf2 generic_make_request
    ffffffffa43c1e4d submit_bio
    ffffffffa43b825d submit_bio_wait
    ffffffffa43c5c65 blkdev_issue_flush
    ffffffffa4309b4d ext4_sync_fs
    ffffffffa428b260 sync_fs_one_sb
    ffffffffa425a553 iterate_supers
    ffffffffa428b374 sys_sync
    ffffffffa4003c17 do_syscall_64
    ffffffffa4818bab return_from_SYSCALL_64
    -                stress (3303)
        14
BCC's profile Tool
[Diagram, perf. Kernel: PMU cpu-clocks, perf_events, perf.data. User: perf script | fold | flamegraph, monitor]
[Diagram, BCC profile. Kernel: PMU cpu-clocks, BPF program, BPF map, BPF stacks. User: profile -f | flamegraph, monitor]
Lab: Snooping File Opens 💻
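General-Purpose BCC Tools
Tracing Sources For BCC Tools
[Diagram: a BPF program in the kernel attached to event sources in user and kernel space]
• USDT (application), e.g. hotspot:class_loaded
• uprobes (application), e.g. mysqld:…mysql_parse…
• kprobes, e.g. tcp_sendmsg
• tracepoints, e.g. sched:sched_switch
• perf_events, e.g. cpu-clocks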
USDT Probes in (Some) High-Level Languages
Legend on slide: OOTB (out of the box), build flag, not supported
• OpenJDK: hotspot:gc_begin, hotspot:thread_start, hotspot:method_entry
• Oracle JDK
• Node.js: node:http_server_request, node:http_client_request, node:gc_begin
• Python: python:function_entry, python:function_return, python:gc_start
• Ruby: ruby:method_entry, ruby:object_create, ruby:load_entry
• libc/libpthread: libc:memory_malloc_retry, libpthread:pthread_start, libpthread:mutex_acquired
• MySQL: mysql:query_start, mysql:connection_start, mysql:query_parse_start
• PHP: php:request_startup, php:function_entry, php:error
USDT Probes and Uprobes in the JVM
• OpenJDK HotSpot has a large number of static (USDT) probes in various subsystems; display with tplist or readelf:
$ tplist -p $(pidof java) | grep 'hotspot.*gc'
.../libjvm.so hotspot:mem__pool__gc__begin
.../libjvm.so hotspot:mem__pool__gc__end
.../libjvm.so hotspot:gc__begin
.../libjvm.so hotspot:gc__end
• All JVM native methods can be used with dynamic probes; discover with objdump or nm:
$ nm -C $(find /usr/lib/debug -name libjvm.so.debug) | grep 'card.*table'
0000000000854751 t PSScavenge::card_table()
00000000016dd778 b PSScavenge::_card_table
...
BCC trace
• trace is a multi-purpose logging tool; think of it as a dynamic log at arbitrary locations in the system (can also print call stacks)
# trace 'SyS_write (arg3 > 100000) "large write: %d bytes", arg3'
PID    TID    COMM  FUNC       -
9353   9353   dd    SyS_write  large write: 1048576 bytes
9353   9353   dd    SyS_write  large write: 1048576 bytes
9353   9353   dd    SyS_write  large write: 1048576 bytes
^C
# trace 'r:/usr/bin/bash:readline "%s", retval'
TIME     PID    COMM  FUNC      -
02:02:26 3711   bash  readline  ls -la
02:02:36 3711   bash  readline  wc -l src.c
^C
BCC funccount/stackcount
• funccount counts the number of invocations of a particular method, while stackcount also aggregates the call stacks
# LIBJVM=$(find /usr/lib -name libjvm.so)
# funccount -p $(pidof java) "$LIBJVM:*do_collection*"
Tracing 5 functions for ".../libjvm.so:*do_collection*"... Hit Ctrl-C to end.
^C
FUNC                                        COUNT
_ZN16GenCollectedHeap13do_collectionEbbmbi    848
Detaching...
Lab: Tracing Database Accesses 💻
Heap Allocation Profiling
Approaches for Allocation Profiling
• Allocation profiling can help reduce GC pressure and pause times
• Tracing each object allocation is extremely expensive, though
• Use -XX:+ExtendedDTraceProbes and sample hotspot:object__alloc probes (expect a significant overhead)
• Trace HotSpot allocation tracing callbacks designed for JFR
• send_allocation_in_new_tlab_event: when a new TLAB is allocated for a thread because the old one was exhausted
• send_allocation_outside_tlab_event: when an object is allocated outside a TLAB (e.g. because it's too big, or because the TLAB is exhausted)
async-profiler
• When used with the heap mode, instruments the JFR TLAB allocation events and reports objects allocated and stack samples
• Requires JDK debuginfo to be installed (to find the relevant symbols)
$ ./profiler.sh -d 10 -e alloc -o summary,flat `pidof java`
HEAP profiling started
...
696470120 (75.33%) [C
226075184 (24.45%) [B
425600 (0.05%) [Ljava/util/HashMap$Node;
193592 (0.02%) com/sun/org/apache/xerces/internal/dom/ElementImpl
185536 (0.02%) com/sun/org/apache/xml/internal/serializer/NamespaceMappings$MappingRecord
162176 (0.02%) java/util/Stack
BCC Tools With Extended Probes
# funccount -p `pidof java` u:$LIBJVM:object__alloc
Tracing 1 functions for "u:.../libjvm.so:object__alloc"... Hit Ctrl-C to end.
FUNC            COUNT
object__alloc 4000987
Detaching...
# argdist -p `pidof java` -C "u:$LIBJVM:object__alloc():char*:arg2"
605018 arg2 = java/lang/String
609801 arg2 = java/util/HashMap$Nod
908716 arg2 = com/sun/org/apache/xml/internal/serializer/NamespaceMappings$MappingRecord
908778 arg2 = java/util/Stack
909348 arg2 = [Ljava/lang/Object;
910097 arg2 = [C
grav
• Collection of performance visualization tools by Mark Price and Amir Langer: https://github.com/epickrram/grav
• Includes a Python wrapper on top of object__alloc probes with sampling support, flame graph generation, and filtering specific types
$ sudo python src/heap/heap_profile.py -p `pidof java` -d 10 > alloc.stacks
$ FlameGraph/flamegraph.pl < alloc.stacks > alloc.svg
Lab: Excessive GC And Allocation Profiling 💻
Course Wrap-Up
Objectives Review
• Mission: Apply modern, low-overhead, production-ready tools to monitor and improve JVM application performance on Linux
• Objectives:
  ✓ Identifying overloaded resources
  ✓ Profiling for CPU bottlenecks
  ✓ Visualizing and exploring stack traces using flame graphs
  ✓ Recording system events (I/O, network, GC, etc.)
  ✓ Profiling for heap allocations
References
• JVM observability tools
  • http://openjdk.java.net/groups/hotspot/docs/Serviceability.html
  • http://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html
  • http://cr.openjdk.java.net/~minqi/6830717/raw_files/new/agent/doc/index.html
  • https://docs.oracle.com/javase/8/docs/technotes/guides/management/jconsole.html
• perf and flame graphs
  • https://perf.wiki.kernel.org/index.php/Main_Page
  • http://www.brendangregg.com/flamegraphs.html
• AGCT profilers
  • https://github.com/jvm-profiling-tools/async-profiler
  • https://github.com/jvm-profiling-tools/honest-profiler
• BCC and BPF
  • https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
  • http://www.brendangregg.com/ebpf.html
  • http://blogs.microsoft.co.il/sasha/2016/03/31/probing-the-jvm-with-bpfbcc/
  • http://blogs.microsoft.co.il/sasha/2016/03/30/usdt-probe-support-in-bpfbcc/
• Containers and JVM
  • https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
  • http://www.brendangregg.com/blog/2017-05-15/container-performance-analysis-dockercon-2017.html
  • http://batey.info/docker-jvm-flamegraphs.html
Questions?
Sasha Goldshtein, CTO, Sela Group
@goldshtn
github.com/goldshtn