node interactive debugging node.js in production

Post on 15-Apr-2017


Debugging Node.js in Production

Yunong Xiao

@yunongx Software Engineer

Node Platform

Node.js @ Netflix

❖ 65+ Million Subscribers

❖ Website (netflix.com)

❖ Dynamic asset packager

❖ PaaS on Node

❖ Internal Services

“Let's work the problem, people. Let's not make things any worse by guessing.”

–Gene Kranz, Flight Director, Apollo 13

Apply the Scientific Method

1. Construct a Hypothesis

2. Collect data

3. Analyze data and draw a conclusion

4. Repeat

Production Crisis

❖ Runtime Performance

❖ Runtime Crashes

❖ Memory Leaks

Netflix is “Slow”

Gather Request Data

http://restify.com

http://github.com/restify/node-restify

Observable REST Framework

to the Rescue

[2014-12-09T14:07:26.293Z] INFO: shakti/restify-audit/20067: handled: 200, latency=1402 (req_id=b3fa3820-7fac-11e4-8908-a5c7b70d676f, latency=1435)
    GET / HTTP/1.1
    host: www.netflix.com
    --
    HTTP/1.1 200 OK
    x-netflix.client.instance: i-057e47ef
    x-frame-options: DENY
    content-type: text/html
    --
    req.timers: {
      "parseBody": 700123,
      "apiRpc": 701911,
      "render": 400031
    }

req.timers: {
  "parseBody": 700123,
  "apiRPC": 301911,
  "render": 400031,
}

On CPU

CPU is Critical

❖ Node is essentially “single threaded”

❖ Cascading effect on ALL requests in process

req.timers: {
  "parseBody": 700123,
  "apiRPC": 301911,
  "render": 400031,
}

Can’t process ANY other request for 1.1 seconds
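The 1.1 second figure can be reproduced with a little arithmetic. The sketch below assumes the timer values are microseconds (they sum to the 1402 ms latency in the audit log) and that parseBody and render are the CPU-bound handlers, while apiRPC mostly waits on I/O. The helper name `onCpuSeconds` is made up for illustration.

```javascript
// Timer values copied from the req.timers slide (assumed to be microseconds).
const timers = { parseBody: 700123, apiRPC: 301911, render: 400031 };

// Assumption: parseBody and render run on CPU; apiRPC mostly waits on I/O.
const onCpuHandlers = ['parseBody', 'render'];

// Sum the time spent in the handlers that block the event loop.
function onCpuSeconds(timers, handlers) {
  const micros = handlers.reduce((sum, name) => sum + timers[name], 0);
  return micros / 1e6;
}

const totalMs = Object.values(timers).reduce((a, b) => a + b, 0) / 1000;
console.log(`total latency: ${Math.round(totalMs)} ms`);
console.log(`event loop blocked for about ${onCpuSeconds(timers, onCpuHandlers).toFixed(1)} s`);
```

While those handlers run, every other request in the same process waits, which is why on-CPU time matters more than total latency here.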

On CPU

How Much Code?

$ find . -name "*.js*" | xargs cat | wc -l

6 042 301

Statistically Sample Stack Traces

Snapshot What’s Currently Executing

Stacktrace: A stack trace is a report of the active stack frames at a certain point in time during the execution of a program.

> console.log(ex, ex.stack.split("\n"))
ReferenceError: ex is not defined
    at repl:1:13
    at REPLServer.defaultEval (repl.js:132:27)
    at bound (domain.js:254:14)
    at REPLServer.runBound [as eval] (domain.js:267:12)
    at REPLServer.<anonymous> (repl.js:279:12)
    at REPLServer.emit (events.js:107:17)
    at REPLServer.Interface._onLine (readline.js:214:10)
    at REPLServer.Interface._line (readline.js:553:8)
    at REPLServer.Interface._ttyWrite (readline.js:830:14)
    at ReadStream.onkeypress (readline.js:109:10)

Two Problems

1) How to sample stack traces from a running process?

2) How to do 1) without affecting the process?

Linux Perf Events

PERF(1) perf Manual PERF(1)

NAME perf - Performance analysis tools for Linux

SYNOPSIS perf [--version] [--help] COMMAND [ARGS]

DESCRIPTION Performance counters for Linux are a new kernel-based subsystem that provide a framework for all things performance analysis. It covers hardware level (CPU/PMU, Performance Monitoring Unit) features and software features (software counters, tracepoints) as well.

Sample Stack Traces w/ perf(1)

# perf record -F 99 -p `pgrep -n node` -g -- sleep 30
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.524 MB perf.data (~22912 samples) ]

Sample Stack Trace

ab2fee v8::internal::Heap::DeoptMarkedAllocationSites() (/apps/node/bin/
a69754 v8::internal::StackGuard::HandleInterrupts() (/apps/node/bin/node)
c9f13b v8::internal::Runtime_StackGuard(int, v8::internal::Object**
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map)
3c793e3060bb (/tmp/perf-5382.map)
(repeated 30 more lines)
8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>,...
df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node)
fb1597 uv__run_check (/apps/node/bin/node)
fabcee uv_run (/apps/node/bin/node)
dfaa50 node::Start(int, char**) (/apps/node/bin/node)
7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so)

Missing JS Frames

Why? V8 generates JS code and its symbols just in time (JIT), so perf(1) cannot find them in the node binary

node --perf_basic_prof_only_functions

“outputs the files in a format that the existing perf tool can consume.”

node --perf_basic_prof_only_functions

Available right now in Node v5.x

Coming soon to Node v4.3: https://github.com/nodejs/node/pull/3609
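To make the role of the map file concrete: with this flag V8 writes /tmp/perf-<pid>.map, where each line has the form "start size name" with start and size in hex. The sketch below shows how a sampled address could be matched to a JS function name. perf(1) does this lookup itself; the helper names `parsePerfMap` and `resolve` and the sample entries are invented for illustration.

```javascript
// Parse lines of the form "<hex start> <hex size> <symbol name>".
function parsePerfMap(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const [start, size, ...name] = line.trim().split(' ');
      return { start: parseInt(start, 16), size: parseInt(size, 16), name: name.join(' ') };
    });
}

// Find the symbol whose [start, start + size) range contains the address.
function resolve(entries, address) {
  const hit = entries.find((e) => address >= e.start && address < e.start + e.size);
  return hit ? hit.name : '[unknown]';
}

// Hypothetical map contents; real files are written by V8 at run time.
const sampleMap = [
  '3c793e306000 200 LazyCompile:*fetch /app/fetcher.js:32',
  '3c793e306200 180 LazyCompile:~render /app/render.js:10',
].join('\n');

const entries = parsePerfMap(sampleMap);
console.log(resolve(entries, 0x3c793e3060bb)); // falls inside the first entry
console.log(resolve(entries, 0x1000));         // not covered by the map
```

Without the map file, every JIT-compiled frame looks like the bare addresses in the sample stack trace shown earlier.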

Results

node 5382 cpu-clock:
    3c793e38b0c1 LazyCompile:DELETE native runtime.js:349 (/tmp/perf-5382.map)
    3c793e31981d Builtin:JSConstructStubGeneric (/tmp/perf-5382.map)
    3c793ff2ca94 (/tmp/perf-5382.map)
    3c793e98a10f LazyCompile:~AtlasClient._run /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:85 (/tmp/perf-5382.map)
    3c793f47de29 LazyCompile:*AtlasClient.timer /apps/node/webapp/node_modules/nf-atlas-client/lib/client/AtlasClient.js:70 (/tmp/perf-5382.map)
    3c793e9eee38 LazyCompile:~fetchSingleGetCallback /apps/node/webapp/singletons/ShaktiFetcher.js:120 (/tmp/perf-5382.map)
    3c793f6cffee LazyCompile:*Model.get /apps/node/webapp/node_modules/nf-models/lib/Model.js:90 (/tmp/perf-5382.map)
    3c793ed3e2ad (/tmp/perf-5382.map)
    3c7940e4357b Handler:ca (/tmp/perf-5382.map)
    3c793f060e3c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:134 (/tmp/perf-5382.map)
    3c79404edbfa (/tmp/perf-5382.map)
    3c79401fd3f7 (/tmp/perf-5382.map)
    3c79400e307b LazyCompile:*fetchMulti /apps/node/webapp/singletons/ShaktiFetcher.js:50 (/tmp/perf-5382.map)
    3c793fb9a59f LazyCompile:*fetch /apps/node/webapp/singletons/ShaktiFetcher.js:32 (/tmp/perf-5382.map)
    3c793e896697 (/tmp/perf-5382.map)
    3c7943aaabbe (/tmp/perf-5382.map)
    3c793ef4c53c Function:~ /apps/node/webapp/node_modules/vasync/lib/vasync.js:245 (/tmp/perf-5382.map)
    3c793eaf4f01 LazyCompile:* /apps/node/webapp/node_modules/nf-packager/lib/index.js:194 (/tmp/perf-5382.map)
    3c793eab130a LazyCompile:processImmediate timers.js:352 (/tmp/perf-5382.map)
    3c793e319f7d Builtin:JSEntryTrampoline (/tmp/perf-5382.map)
    3c793e3189e2 Stub:JSEntryStub (/tmp/perf-5382.map)
    a65baf v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/apps/node/bin/node)
    8e6b2f v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
    8f2281 v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
    df599a node::MakeCallback(node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*) (/apps/node/bin/node)
    df5ccb node::CheckImmediate(uv_check_s*) (/apps/node/bin/node)
    fb1597 uv__run_check (/apps/node/bin/node)
    fabcee uv_run (/apps/node/bin/node)
    dfaa50 node::Start(int, char**) (/apps/node/bin/node)
    7fcc3ef6876d __libc_start_main (/lib/x86_64-linux-gnu/libc-2.15.so)

JS Frames

Native Frames

Problem: Too Many Traces

$ cat out.nodestacks01 | grep cpu-clock | wc -l
744
$ wc -l out.nodestacks01
58116

Too Many Traces

Solution: Flame Graphs

Flamegraph

❖ Each box represents a function in the stack (stack frame)

❖ x-axis: percent of time on CPU

❖ y-axis: stack depth

❖ colors: random, or can be a dimension

❖ https://github.com/brendangregg/FlameGraph

[Figure: flame graph with regions labeled v8, libc, JS, and built-ins]

Flame Graph Interpretation

[Figure: example flame graph with stack frames a(), b(), c(), d(), e(), f(), g(), h(), i()]

❖ Top edge shows who is running on-CPU, and how much (width)

❖ Top-down shows ancestry, e.g., from g()

❖ Widths are proportional to presence in samples, e.g., comparing b() to h() (incl. children)
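As a rough sketch of where the widths come from: sampled stacks are first collapsed into "frame;frame;frame count" lines, the folded format that the FlameGraph repository's stackcollapse scripts emit and flamegraph.pl consumes. The sample stacks and the helper names `foldStacks` and `widthOf` below are hypothetical and only illustrate the counting.

```javascript
// Each sample is one stack, listed root first (a called b, and so on).
// Hypothetical samples for illustration only.
const samples = [
  ['a', 'b', 'c', 'd', 'e'],
  ['a', 'b', 'c', 'd', 'f', 'g'],
  ['a', 'b', 'c', 'd', 'f', 'g'],
  ['a', 'h', 'i'],
];

// Collapse identical stacks into "frame;frame;frame count" lines.
function foldStacks(samples) {
  const counts = new Map();
  for (const stack of samples) {
    const key = stack.join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, count]) => `${key} ${count}`);
}

// Fraction of samples in which a frame appears anywhere in the stack,
// which corresponds to the width of its box.
function widthOf(samples, frame) {
  const present = samples.filter((stack) => stack.includes(frame)).length;
  return present / samples.length;
}

console.log(foldStacks(samples).join('\n'));
console.log(`b: ${widthOf(samples, 'b') * 100}%  h: ${widthOf(samples, 'h') * 100}%`);
```

In these made-up samples b() is present in three of four stacks and h() in one, so b() would be drawn three times as wide as h().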

> 50% time on CPU

lodash!

function merge(object) { var args = arguments, length = 2;...

Use _.assign() Instead
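The difference between the two calls can be sketched without lodash. Below, `deepMerge` is a simplified stand-in for a recursive merge and `Object.assign` stands in for a shallow assign; neither is lodash's actual implementation. Note that they are not interchangeable: the swap is only safe where nested objects do not need to be combined.

```javascript
// Simplified stand-in for a recursive merge (NOT lodash's actual code).
function deepMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      const base = target[key] !== null && typeof target[key] === 'object' ? target[key] : {};
      target[key] = deepMerge(base, value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

const defaults = { cache: { ttl: 60, size: 100 }, debug: false };
const overrides = { cache: { ttl: 5 } };

// Deep merge walks every nested property, so cache.size survives.
const merged = deepMerge(deepMerge({}, defaults), overrides);

// Shallow assign copies top-level references only, so cache is replaced wholesale.
const assigned = Object.assign({}, defaults, overrides);

console.log(merged.cache);   // { ttl: 5, size: 100 }
console.log(assigned.cache); // { ttl: 5 }
```

The recursive walk is the extra work that shows up as on-CPU time when the merge runs on every request.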

Before

After

Flame Graphs

Helps you find 1 LoC out of 6 Million

Results

❖ Dramatically reduced request latency

❖ Reduced CPU utilization

❖ Increased throughput

Runtime Performance Technique

❖ Sample stack traces via perf(1)

❖ Visualize code distribution with CPU flame graphs

❖ Identify candidate code paths for performance improvement

❖ Repeat

Runtime Crashes

“The method described in this article was designed to provide a core dump… with a minimal impact on the spacecraft… as the resumption of data acquisition from the spacecraft is the highest priority.”

- Chafin, R. "Pioneer F & G Telemetry and Command Processor Core Dump Program." JPL Technical Report XVI, no. 32-1526 (1971): 174.

Core Dumps — A Brief History

❖ Magnetic core memory

❖ Dump out the contents of “core” memory for debugging

❖ “Core dump” was born

❖ Initially printed on paper!

❖ Postmortem debugging was born!

Production Constraints

❖ Uptime is critical

❖ Not easily reproducible

❖ Can’t simulate environment

❖ Resume normal operations ASAP

Postmortem Debugging

1. Take core dump

2. Restart app and continue serving traffic

3. Load core dump elsewhere

4. Engineer debugs and fixes

Configure Node to Dump Core on Error

$ node --abort_on_uncaught_exception throw.js
Uncaught Error

FROM
Object.<anonymous> (/Users/yunong/throw.js:1:63)
Module._compile (module.js:435:26)
Object.Module._extensions..js (module.js:442:10)
Module.load (module.js:356:32)
Function.Module._load (module.js:311:12)
Function.Module.runMain (module.js:467:10)
startup (node.js:134:18)
node.js:961:3

[1] 4131 illegal hardware instruction (core dumped) node --abort_on_uncaught_exception throw.js

Node Post Mortem Tooling

❖ Netflix uses Linux in Prod

❖ Linux — Work in progress

❖ https://github.com/tjfontaine/lldb-v8

❖ https://github.com/indutny/llnode

❖ Solaris — Full featured, compatible with Linux cores

❖ https://github.com/joyent/mdb_v8

Socks & Duct Tape: Setup a Debug Solaris Instance

EC2: http://omnios.omniti.com/wiki.php/Installation#IntheCloud

VM: http://omnios.omniti.com/wiki.php/Installation#Quickstart

Post Mortem Methodology

❖ Where: Inspect stack trace

❖ Why: Inspect heap and stack variable state

mdb(1) JS commands

❖ ::help <cmd>

❖ ::jsstack

❖ ::jsprint

❖ ::jssource

❖ ::jsconstructor

❖ ::findjsobjects

❖ ::jsfunctions

Load the Core Dump

# mdb ./node-v4.2.2-linux/node-v4.2.2-linux-x64/bin/node ./core.7186
> ::load ./mdb_v8_amd64.so
mdb_v8 version: 1.1.1 (release, from 28cedf2)
V8 version: 143.156.132.195
Autoconfigured V8 support from target
C++ symbol demangling enabled

linux node binary

core dump

load mdb_v8 module

::jsstack

> ::jsstack
js: test
js: storeHeader
js: <anonymous> (as OutgoingMessage._storeHeader)
js: <anonymous> (as ServerResponse.writeHead)
js: restifyWriteHead
js: _cb
js: send
js: <anonymous> (as <anon>)
js: <anonymous> (as ReactRenderer._renderLayout)
js: <anonymous> (as <anon>)
js: <anonymous> (as <anon>)
js: <anonymous> (as dispatchHandler)
js: <anonymous> (as <anon>)
js: runHooks
js: runTransitionToHooks
js: <anonymous> (as assign.to)
js: <anonymous> (as <anon>)
js: runHooks
js: runTransitionFromHooks
js: <anonymous> (as assign.from)
js: <anonymous> (as React.createClass.statics.dispatch)
native: _ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0...
native: v8::internal::Execution::Call+0xc8
native: v8::internal::Runtime_Apply+0x1ce
js: <anonymous> (as b)

frame type

func name

Always name your functions!

var foo = function foo() {};

Foo.prototype.bar = function bar() {};

foo(function bar() {});
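The same effect is visible in ordinary Error stacks, not only in ::jsstack output. In this small sketch the function names `run` and `loadUser` are invented for illustration; the named callback shows up by name in the stack, while the anonymous one appears only as a file position.

```javascript
// Runs a callback and returns whatever it produces.
function run(callback) {
  return callback();
}

// Named function expression: the name appears in the captured stack.
const namedStack = run(function loadUser() {
  return new Error('boom').stack;
});

// Anonymous callback: the top frame carries no function name.
const anonStack = run(function () {
  return new Error('boom').stack;
});

// Print the innermost frame of each stack for comparison.
console.log(namedStack.split('\n')[1]);
console.log(anonStack.split('\n')[1]);
```

A name costs nothing at run time and saves guessing which of many `<anonymous>` frames is which when reading a core dump.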

::jsstack -v Frame Source

> ::jsstack -v
js: storeHeader
    file: http.js
    posn: position 18774
    this: 2ad561306c91 (<unknown>)
    arg1: 3bd67e0669b9 (JSObject: ServerResponse)
    arg2: 3dfe966ae299 (JSObject: Object)
    arg3: 34d5391d8859 (SeqAsciiString)
    arg4: 34d5391d8881 (SeqAsciiString)

    652
    653 function storeHeader(self, state, field, value) {
    654   // Protect against response splitting. The if statement is there to
    655   // minimize the performance impact in the common case.
    656   if (/[\r\n]/.test(value))
    657     value = value.replace(/[\r\n]+[ \t]*/g, '');
    658
    659   state.messageHeader += field + ': ' + value + CRLF;
    660
    661   if (connectionExpression.test(field)) {
    662     state.sentConnectionHeader = true;
    663     if (closeExpression.test(value)) {
    664       self._last = true;
    665     } else {
    666       self.shouldKeepAlive = true;
    667     }
    668
    669   } else if (transferEncodingExpression.test(field)) {

::jsstack -vn0 Frame and Function Args

> ::jsstack -vn0
js: test
    file: native regexp.js
    posn: position 2677
    this: 2421205bd4d9 (JSRegExp)
    arg1: 34d5391d8859 (SeqAsciiString)
js: storeHeader
    file: http.js
    posn: position 18774
    this: 2ad561306c91 (<unknown>)
    arg1: 3bd67e0669b9 (JSObject: ServerResponse)
    arg2: 3dfe966ae299 (JSObject: Object)
    arg3: 34d5391d8859 (SeqAsciiString)
    arg4: 34d5391d8881 (SeqAsciiString)
js: <anonymous> (as OutgoingMessage._storeHeader)
    file: http.js
    posn: position 15652
    this: 3bd67e0669b9 (JSObject: ServerResponse)
    arg1: 3dfe966ae271 (ConsString)
    arg2: 3dfe966add99 (JSObject: Object)
js: restifyWriteHead
    file: /apps/node/webapp/node_modules/restify/lib/response.js
    posn: position 6964
    this: 3bd67e0669b9 (JSObject: ServerResponse)
    (1 internal frame elided)
js: _cb
    file: /apps/node/webapp/node_modules/restify/lib/response.js

Func Name

JS File

Line #

Func Args

::jsstack Function Args

Memory Address of Var

Var Type

::jsprint Print JS Objects

> 3bd67e0669b9::jsprint
{
    "_time": 1437690472539,
    "_headers": {
        "content-type": "text/html",
        "req_id": "5b7f18f2-7f12-4c68-b07f-3cd75698ba65",
        "set-cookie": "CENSORED; Domain=.netflix.com; Expires=Fri, 24 Jul 2015 10:27:52 GMT",
        "x-frame-options": "DENY",
        "x-ua-compatible": "IE=edge",
        "x-netflix.client.instance": "i-c420596c",
    },
    "output": [],
    "_last": false,
    "_hangupClose": false,
    "_hasBody": true,
    "socket": {
        "_connecting": false,
        "_handle": [...],
        "_readableState": [...],
        "readable": true,
        "domain": null,
        "_events": [...],
        "_maxListeners": 10,
        "_writableState": [...],
        "writable": true,
        "allowHalfOpen": true,
        "onend": function <anonymous> (as socket.onend),

Actual JS Object Instance

::jsconstructor Show Object Constructor

> 3bd67e0669b9::jsconstructor -v
ServerResponse (JSFunction: 2421205bced9)

::jssource Print f() Source

> 2421205bced9::jssource
file: http.js

 1066 function ServerResponse(req) {
 1067   OutgoingMessage.call(this);
 1068
 1069   if (req.method === 'HEAD') this._hasBody = false;
 1070
 1071   this.sendDate = true;
 1072
 1073   if (req.httpVersionMajor < 1 || req.httpVersionMinor < 1) {
 1074     this.useChunkedEncodingByDefault = chunkExpression.test(req.headers.te);
 1075     this.shouldKeepAlive = false;
 1076   }
 1077 }
 1078 util.inherits(ServerResponse, OutgoingMessage);

Core Dump === Complete Process State

Memory Leaks

Generate Core Dump Ad-hoc

gcore(1) GNU Tools gcore(1)

NAME gcore - Generate a core file for a running process

SYNOPSIS gcore [-o filename] pid

Take a Core Dump!

root@demo:~# gcore `pgrep node`
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7facaeffd700 (LWP 5650)]
[New Thread 0x7facaf7fe700 (LWP 5649)]
[New Thread 0x7facaffff700 (LWP 5648)]
[New Thread 0x7facbc967700 (LWP 5647)]
[New Thread 0x7facbd168700 (LWP 5617)]
[New Thread 0x7facbd969700 (LWP 5616)]
[New Thread 0x7facbe16a700 (LWP 5615)]
[New Thread 0x7facbe96b700 (LWP 5614)]
0x00007facbea5b5a9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
Saved corefile core.5602

Problem: Find Leaking Objects

::findjsobjects

NAME findjsobjects - find JavaScript objects

SYNOPSIS [ addr ] ::findjsobjects [-vb] [-r | -c cons | -p prop]

::findjsobjects Find ALL JS Objects on Heap

> ::findjsobjects
OBJECT        #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
...
3dfe97453121        18    6721  Array
157a020e01        1304     101  <anonymous> (as Constructor): ...
8f1a53211        13879      12  ReactDOMComponent: _tag, tagName, props, ...
8f1a05691        85776       2  Array
3dfe97451a99        36    5589  Array
23e5d7d44351         1  218020  Object: .2f5hpw2hgjk.1.0.3, ...
8f1a05f31        40533       6  <anonymous> (as ReactElement): type, ...
8f1a04da1       252133       1  Array
8f1a04dc1       125869       7  Array
8f1a04f01       114914       8  Array
8f1a04d39       230924       7  Module: id, exports, parent, filename, ...

Memory Leak Strategy

❖ Look at objects on heap for suspicious objects

❖ Take successive core dumps and compare object counts

❖ Growing object counts are likely leaking

❖ Inspect object for more context

❖ Walk reverse references to find root object
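Comparing object counts by hand gets tedious with thousands of rows, so the comparison step can be scripted. This is a minimal sketch: the helper names `parseRows` and `growth` are made up, rows are grouped by constructor name plus property count (an assumption about what identifies "the same" group across dumps), and the two sample rows use the Module counts from this talk's core dumps at 45 and 90 minutes of uptime.

```javascript
// Parse "OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS" rows from ::findjsobjects.
function parseRows(text) {
  const rows = [];
  for (const line of text.split('\n')) {
    const match = line.trim().match(/^([0-9a-f]+)\s+(\d+)\s+(\d+)\s+([^:]+)/);
    if (!match) continue; // skips the header and "..." lines
    rows.push({ count: Number(match[2]), props: Number(match[3]), ctor: match[4].trim() });
  }
  return rows;
}

// Assumption: constructor name plus property count identifies a group.
const keyOf = (row) => `${row.ctor}/${row.props}`;

// Return the groups whose object count grew, largest growth first.
function growth(beforeText, afterText) {
  const before = new Map(parseRows(beforeText).map((r) => [keyOf(r), r.count]));
  return parseRows(afterText)
    .map((r) => ({ key: keyOf(r), delta: r.count - (before.get(keyOf(r)) || 0) }))
    .filter((r) => r.delta > 0)
    .sort((a, b) => b.delta - a.delta);
}

const at45 = '8f1a04d39 230924 7 Module: id, exports, parent, filename, ...';
const at90 = '8f1a04d39 323454 7 Module: id, exports, parent, filename, ...';
console.log(growth(at45, at90));
```

For these two rows the script reports 92530 additional Module objects, which makes Module the first suspect to inspect.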

Look at Object Delta Between Successive Core Dumps

Uptime = 45 mins

> ::findjsobjects
OBJECT        #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
...
8f1a04d39       230924       7  Module: id, exports, parent, filename, ...

Uptime = 90 mins

> ::findjsobjects
OBJECT        #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
...
8f1a04d39       323454       7  Module: id, exports, parent, filename, ...

Analyze Leaked Objects

Representative Object

> ::findjsobjects
OBJECT        #OBJECTS  #PROPS  CONSTRUCTOR: PROPS
...
8f1a04d39       323454       7  Module: id, exports, parent, filename, ...

Representative Object, 1 of 323454

Look Closer

> 8f1a04d39::jsprint
{
    "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
    "exports": {},
    "parent": {
        "id": "/apps/node/webapp/middleware/autoClientStrings.js",
        "exports": function autoExposeClientStrings,
        "parent": [...],
        "filename": "/apps/node/webapp/middleware/autoClientStrings.js",
        "loaded": true,
        "children": [...],
        "paths": [...],
    },
    "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",

Use ::findjsobjects to Find All “Module” Objects

> 8f1a04d39::findjsobjects
8f1a04d39
3fd996bffb39
3fd996bfcff1
3fd996bfbac1
3fd996bf8a19
3fd996bf7949
3fd996bf3ce9
3fd996bf0f19
3fd996bead71
3fd996bea821
3fd996bea001
3fd996be92b1
3fd996be73d1
3fd996be58d1
3fd996bd88b1
3fd996bcb459
3fd996bcaa41
3fd996bc7009
3fd996bc3321

Analyze All 320K+ Objects?

Custom Querying With Pipes and Unix Tools

8f1a04d39::findjsobjects | ::jsprint ! grep filename | sort | uniq -c

Results

...
      1 "filename": "/apps/node/webapp/ui/js/akira/components/messaging/paymentHold.js",
      2 "filename": "/apps/node/webapp/ui/js/common/commonCore.js",
      1 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js",
      3 "filename": "/apps/node/webapp/ui/js/common/presentationTracking/presentationTracking.js",
 111061 "filename": "/apps/node/webapp/ui/js/common/playPrediction/playPrediction.js",
   7103 "filename": "/apps/node/webapp/ui/js/pages/reactClientRender.js",
 111061 "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
 118257 "filename": "/apps/node/webapp/middleware/autoClientStrings.js",
...

Client Side Modules

What’s holding on to these modules?

Aim: Find Root Object

Walk Reverse Refs with ::findjsobjects -r

> 8f1a04d39::findjsobjects -r

8f1a04d39 referred to by 14fd6c5b13c1.parent

Root Object

> 1f313791bb41::jsprint
[
    {
        "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "exports": [...],
        "parent": [...],
        "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "loaded": false,
        "children": [...],
        "paths": [...],
    },
    {
        "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "exports": [...],
        "parent": [...],
        "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "loaded": false,
        "children": [...],
        "paths": [...],
    },
    {
        "id": "/apps/node/webapp/ui/js/pages/akiraClient.js",
        "exports": [...],
        "parent": [...],
        "filename": "/apps/node/webapp/ui/js/pages/akiraClient.js",

Spot the Leak

var cache = {};

function checkCache(someModule) {
  var mod = cache[someModule];
  if (!mod) {
    try {
      mod = require(someModule);
      cache[someModule] = mod;
      return mod;
    } catch (e) {
      return {};
    }
  }

  return mod;
}

Module could be client only, must catch

Should cache the fact we caught an exception here

Root Cause

❖ Node caches metadata for each module

❖ If require process throws an exception, the module metadata is leaked (bug?)

❖ Client side module meant we were throwing during every request, and not caching the fact we tried to require it

❖ Each request leaks 3+ module metadata objects
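One way to act on this root cause is to cache the failure as well as the success, so require is attempted at most once per module. The following is a sketch of that idea, not the fix Netflix shipped: the `MISSING` placeholder and the injectable `load` parameter are assumptions added so the example can run with a fake loader.

```javascript
const cache = new Map();

// Shared placeholder returned for modules that failed to load.
const MISSING = Object.freeze({});

// `load` defaults to require; it is a parameter only so the sketch can be
// exercised with a fake loader (an assumption, not part of the original code).
function checkCache(someModule, load = require) {
  if (cache.has(someModule)) {
    return cache.get(someModule);
  }
  let mod;
  try {
    mod = load(someModule);
  } catch (e) {
    // Remember the failure so the next request does not call require again.
    mod = MISSING;
  }
  cache.set(someModule, mod);
  return mod;
}

// Usage with a fake loader that always throws, like a client-only module.
let attempts = 0;
const failingLoad = () => {
  attempts += 1;
  throw new Error('client-only module');
};
checkCache('./clientOnly.js', failingLoad);
checkCache('./clientOnly.js', failingLoad);
console.log(attempts);
```

With the failure cached, the loader runs once no matter how many requests ask for the module, so no new module metadata is created per request.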

Memory Leaks

❖ Take successive core dumps (gcore(1))

❖ Compare object counts (::findjsobjects)

❖ Growing objects are likely leaking

❖ Inspect object for more context (::jsprint)

❖ Walk reverse references to find root obj (::findjsobjects -r)

Post Mortem Debugging is Critical to Large Scale Prod Node Deployments

More State than Just Logs

❖ Detailed stack trace (::jsstack)

❖ Function args for each frame (::jsstack -vn0)

❖ Get state of any object and its provenance (::jsprint, ::jsconstructor)

❖ Get source code of any function (::jssource)

❖ Find arbitrary JS objects (::findjsobjects)

❖ Unmodified Node binary!

Production Failures are Inevitable

But We Can Learn From Them

Production Debugging

❖ Runtime Performance

❖ CPU profiling/flame graphs

❖ Runtime Crashes

❖ Inspect program state with core dumps and mdb

❖ Memory leaks

❖ Analyze objects and references with core dumps and mdb

Use the Scientific Method

Epilogue — State of Tooling

❖ Join Working Group https://github.com/nodejs/post-mortem

❖ Help make mdb_v8 cross platform https://github.com/joyent/mdb_v8

❖ Contribute to https://github.com/tjfontaine/lldb-v8 and https://github.com/indutny/llnode

Acknowledgements

❖ mdb_v8

❖ Dave Pacheco, TJ Fontaine, Julien Gilli, Bryan Cantrill

❖ CPU Profiling/Flamegraphs

❖ Brendan Gregg, Google V8 team, Ali Ijaz Sheikh

❖ Linux Perf

❖ Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Peter Zijlstra

❖ lldb-v8

❖ TJ Fontaine

❖ llnode

❖ Fedor Indutny

Get Involved!

THANKS

❖ Questions? We’re Hiring!

❖ yunong@netflix.com

❖ @yunongx

Citations

❖ Slides 29-32 used with permission from “Java Mixed-Mode Flame Graphs”, Brendan Gregg, Oct 2015

❖ Slide 26 used with permission from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
