when it all goes wrong - postgresql€¦ · 2019-10-17 when postgres goes wrong pgconfeu created...

69
When it all Goes Wrong

Upload: others

Post on 21-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

When it all Goes Wrong

Page 2: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

Will Leinweber@leinweber

Citus Data (Microsoft)

bitfission.com(warning autoplays midi)

Page 3: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

coming fromcitus cloud

heroku postgres

Page 4: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

special thankscitus cloud— dan farina (@danfarina)

heroku postgres— maciek sakrejda (@uhoh_itsmaciek)

Page 5: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

same sorts of problemsfrom pages & alerts

from support tickets

Page 6: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

this talkmore app dev who uses postgresrather than dba

Page 7: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

the problem with Postgresit’s pretty good

you don’t get experience with how it breaks

Page 8: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

what to do for a problem

Page 9: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

what to do for a problem

Page 10: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

complicated systemnetwork

hardware

o/s

postgres

Page 11: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

using the database (too much)95% application

4% auto vacuum

1% everything else

Page 12: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

hard to convinceall the graphs saying DB is slow

and nothing has changed

…must be the database!

Page 13: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

https://upload.wikimedia.org/wikipedia/commons/9/98/Survivorship-bias.png

Page 14: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

“but I didn’t change anything”no deploys!

no database migrations!

no scaling!

Page 15: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

“but I didn’t change anything”

https://upload.wikimedia.org/wikipedia/commons/0/09/Redherring.gif

Page 16: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

“but I didn’t change anything”more traffic?

change in access patterns?

one big user logged in?

Page 17: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

run out of a resource

Page 18: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

snowball

Page 19: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

examplemanageable user 1s query => 2x expensive

frequent, small queries 3ms => 12ms

Page 20: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

assumptions

hardwaremaintenance

app

Page 21: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

assumptionspostgres should not crash

…with overcommit off and no containers

large extensions increase chance

Page 22: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

if not postgres, then what

Page 23: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

system resourcescpu

memory

disk

parallelism / backends

locks

Page 24: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismcpu mem disk parallelism

Page 25: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismcredentials wrong

networking broken

locking issue, check pg_locks

idle in transaction

Page 26: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismapplication submitting backlogged workload

connection leak

pool sizes set too large

pg_lock issue + application backlog

Page 27: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismworkload skew causing thrashing

unusual sequential scan workload

failover or restart => no cache

pg_prewarm

Page 28: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismsame as just disk,

but also the application is piling on

Page 29: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismlarge GROUP BYs

high disk latency due to unusual page dispersion pattern in the workload

Page 30: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismworkload has high mem (GROUP BY) + app adding backlog

lock contention slowing mem release

Page 31: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismlarge GROUP BYs + paging in unusual data

Page 32: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismLook for what is causing disk access

Page 33: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismsmall, in-memory workload

lots of seq scans on small table

index scan w/ filter dropping lots

Page 34: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismapp backlog + too much processing on small data

simply a lot of work

Page 35: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismlarge seq scans

Page 36: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismloading cold data + application backlog

Page 37: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismsmall # of backends doing a lot more work

Page 38: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismentity, workload, entity*workload

soft deletes and non-conditional indexes

Page 39: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismreporting query

Page 40: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

cpu mem disk parallelismapp backlog, but with CPU/mem problems

Page 41: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade

Page 42: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the tradeC symbols

Page 43: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: perfperf record -p <pid> && perf report

Page 44: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: perfperf top

Page 45: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: perfwww.brendangregg.com/perf.html

Page 46: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: gdbgdb -batch -ex 'bt' -p <pid>

Page 47: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

Page 48: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

Page 49: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: iostatiostat -xm 10

Page 50: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: iotop

Page 51: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: htop

Page 52: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

Tools of the trade: bwm-ng

Page 53: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: backendspgrep -lf postgres + grep + wc

select * from pg_stat_activity

Page 54: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: pg_s_sselect * from pg_stat_statements

Page 55: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

tools of the trade: summarycpu mem disk parallelism network

perf x

gdb x

iostat x

iotop x

htop x x

bwm x

pgrep x

Page 56: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

what to do

Page 57: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

what to doconfiguration change

Page 58: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

what to dodb change

Page 59: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

what to docode change

Page 60: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disasterVelocity NY 2013: Richard Cook"Resilience In Complex Adaptive Systems”

Jens Rasmussen:Risk management in a dynamic society: a modeling problem

Page 61: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disaster

economicboundary

Page 62: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disaster

economicboundary

workloadboundary

Page 63: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

Page 64: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

errormargin

Page 65: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

Page 66: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

errormargin

Page 67: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

errormargin

Page 68: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

flirting with disasterVelocity NY 2013: Richard Cook"Resilience In Complex Adaptive Systems”

Jens Rasmussen: Risk management in a dynamic society: a modeling problem

Page 69: When it all Goes Wrong - PostgreSQL€¦ · 2019-10-17 when postgres goes wrong pgconfeu Created Date: 10/17/2019 10:46:17 AM

@leinweber

thank youWill Leinweber

@leinweber citusdata.com