data antipatterns nyc devops - 2014
DESCRIPTION
Are you running a Database in the cloud? Worried that you’re doing it wrong? Engine Yard supports a broad set of databases, with lots of flexibility for customers to modify and configure their installations. But the freedom to adapt and extend standard functionality comes at the risk of unexpected negative consequences: seemingly benign modifications can seriously affect durability and performance. During our years of helping customers, I've had the unique opportunity to observe common problems, patterns, and best practices with big (and not so big) data. In this talk I'll highlight the most common pitfalls and tell you how to avoid them (I will most likely rant about other things too!). http://www.meetup.com/nycdevops/events/181986002/
TRANSCRIPT
ANTI PATTERNS
DATA
Updated!
ines @ Engine Yard.com @Randommood
And I’m a happy dog!
Ines Sombra
I work with Databases
I can get a little ragey
sometimes
Disclaimer #1
I’m sorry!
Ask me to slow down.
Disclaimer #2
ZOMG, the horror!
BACKUPS
yes, we are going there
“I know you. You know you. And I know you
know that I know you”
White Goodman
Boring Definition #1
Backups
Copy and archiving of data
Goal is to restore the state of a DB
Many types - blah
Anti-Pattern #1
Taking too many
backups
Not free: they require resources
Full backup every hour, really? What about backup retention?
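A retention policy is one way to answer the "every hour, really?" question. Below is a minimal sketch, assuming a made-up policy (newest backup per day for the last 7 days, newest per ISO week for the last 4 weeks). The function name `backups_to_keep` and its parameters are hypothetical, not part of any backup tool.

```python
from datetime import datetime, timedelta


def backups_to_keep(timestamps, now, daily=7, weekly=4):
    """Return the set of backup timestamps to retain.

    Hypothetical policy: keep the newest backup of each of the last
    `daily` calendar days and of each ISO week in the last `weekly` weeks.
    """
    newest_per_day = {}
    newest_per_week = {}
    for ts in sorted(timestamps):
        # Later timestamps overwrite earlier ones, so the newest wins.
        newest_per_day[ts.date()] = ts
        iso = ts.isocalendar()
        newest_per_week[(iso[0], iso[1])] = ts

    day_cutoff = (now - timedelta(days=daily)).date()
    week_cutoff = now - timedelta(weeks=weekly)

    keep = {ts for day, ts in newest_per_day.items() if day > day_cutoff}
    keep |= {ts for ts in newest_per_week.values() if ts > week_cutoff}
    return keep


# Thirty days of hourly full backups (illustrative data only).
now = datetime(2014, 6, 30, 23, 0)
hourly = [now - timedelta(hours=i) for i in range(24 * 30)]
keep = backups_to_keep(hourly, now)
print(f"{len(hourly)} backups taken, {len(keep)} retained")
```

Everything outside the returned set is a candidate for pruning. The right numbers depend on how much data loss you can tolerate.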
Anti-Pattern #2
Taking too few
backups
Enough to minimize the risk of data loss due to corrupted backup files
yes, this totes happens!
The untested backup
Anti-Pattern #3
Doing backups right
Logically test
backups
Errorless restore is not enough. Test logical data too
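Here is a rough sketch of what "test logical data too" can look like. It uses SQLite from the Python standard library purely as a stand-in so the example runs anywhere. The `users`/`orders` schema and the `logical_checks` helper are assumptions made for illustration; a real check would run against your own restored database and your own invariants.

```python
import sqlite3


def logical_checks(conn, expected_counts):
    """Return a list of problems found in a restored database.

    `expected_counts` maps trusted table names to a minimum row count.
    """
    problems = []
    # Structural check (SQLite-specific pragma).
    status = conn.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        problems.append(f"integrity_check: {status}")
    # Logical check 1: tables hold at least as much data as expected.
    for table, minimum in expected_counts.items():
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if count < minimum:
            problems.append(f"{table}: {count} rows, expected >= {minimum}")
    # Logical check 2: no orders pointing at users that do not exist.
    orphans = conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN users u ON u.id = o.user_id WHERE u.id IS NULL"
    ).fetchone()[0]
    if orphans:
        problems.append(f"orders: {orphans} orphan rows without a user")
    return problems


# "Production" database with a little illustrative data.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO orders VALUES (1, 1, 9.99), (2, 2, 20.00);
""")

# Back up, "restore" into a fresh database, then test the restored copy.
restored = sqlite3.connect(":memory:")
source.backup(restored)
problems = logical_checks(restored, {"users": 2, "orders": 2})
print(problems or "restored data passes logical checks")
```

The point is that the restore finishing without errors is only the first assertion, not the last one.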
Doing backups right
Know your types &
tools
Take logical and binary backups
Continuous archiving & hot backup utilities
Doing backups right
Practice restores
Backups alone do not constitute DR. Have a plan & practice it
Server extensions and configuration matter when restoring
“I want a ridiculously
good looking
Database” Derek Zoolander
(honestly, Ben Stiller rules)
Obvious statement #1
Many DB choices
Cargo culting your
database
Anti-Pattern #4
Failure to understand use case, strengths & weaknesses of a new database
RDBMS for Session
Data
Anti-Pattern #5
Often means at least one write per request
Any DB issue/task may cause app to hang
Tables have a tendency to bloat
Modeling, it’s all the same
Anti-Pattern #6
Data Model
Consistency needs
Availability needs
Scaling needs
Operational story & cost
Doing it right
Know your needs
Doing it right
Spike it, forealsies
Spike it with your data and traffic. Best way to gain operational experience
Doing it right
Leverage new
features
Relational databases are getting quite versatile
Evaluate clustered MySQL options
Are we doing
ok?
We have a cloud deployment!
Happy team on shipping day, lmfao if you don’t celebrate like this
Cloud-based databases,
they are real
Obvious statement #2
Databases can live in the cloud quite well
Many IaaS, PaaS, & DBaaS options
Easy to get started & may be economical
Where did my instance go?
Anti-Pattern #7
Anti-Pattern #8
Cloud, it’s just like
hardware
It’s not. Cloud resources are virtualized
Capacity planning and monitoring matter. A lot
Anti-Pattern #9
Shit doesn’t happen
You are not immune to infrastructure failures. Plan for it
Anti-Pattern #10
Storage is the same
Instance storage is not persistent (use EBS for data that must outlive the instance)
Data locality matters
Don’t run your cloud DBs too hot!
Doing cloud right
Know your cloud
deployments
Replication in the cloud is a must-have
Put DB master & replicas in different AZs
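A placement rule like this is easy to check automatically. The sketch below assumes you can get an inventory of database nodes as plain dictionaries. The field names (`name`, `role`, `az`) and the zone names are made up for illustration and do not come from any cloud provider's API.

```python
def az_placement_problems(nodes):
    """Flag deployments where a master has no replica in another AZ.

    `nodes` is a list of dicts with hypothetical keys: name, role, az.
    """
    problems = []
    masters = [n for n in nodes if n["role"] == "master"]
    replicas = [n for n in nodes if n["role"] == "replica"]
    if not replicas:
        problems.append("no replicas configured")
    for master in masters:
        if replicas and all(r["az"] == master["az"] for r in replicas):
            problems.append(
                f"all replicas share {master['az']} with master {master['name']}"
            )
    return problems


# Illustrative inventory: one replica sits in a different zone, so this passes.
inventory = [
    {"name": "db-master", "role": "master", "az": "us-east-1a"},
    {"name": "db-replica-1", "role": "replica", "az": "us-east-1a"},
    {"name": "db-replica-2", "role": "replica", "az": "us-east-1b"},
]
print(az_placement_problems(inventory) or "placement looks fine")
```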
Doing cloud right
Learn high availability &
disaster recovery
Get good at replica promotions (some work involved)
Understand and invest in DR/HA. Know your options
Doing cloud right
Know your system
Invest in monitoring
Know your data distribution & querying patterns
Know baseline behavior
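"Know baseline behavior" can start as simply as recording a metric during normal operation and comparing new readings against it. This is a minimal sketch, assuming a single metric (say, query latency in milliseconds) and a plain standard-deviation threshold. The sample values and the threshold of 3 are illustrative assumptions, and real monitoring tools do considerably more.

```python
import statistics


def baseline(samples):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomalous(value, mean, stdev, threshold=3.0):
    """True when `value` is more than `threshold` deviations from the mean."""
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


# Illustrative latency samples (ms) collected during a normal week.
normal_latency_ms = [10, 11, 9, 10, 12, 10, 11, 9]
mean, stdev = baseline(normal_latency_ms)

for reading in (11, 45):
    label = "ANOMALY" if is_anomalous(reading, mean, stdev) else "ok"
    print(f"{reading} ms: {label}")
```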
And there’s more!
Boring Definition #2
Indexes (or indices, I prefer indexes)
Improves speed of data retrieval
Used in random & ordered lookups
Imply additional writes & storage
Anti-Pattern #12
Too few Indexes
Room for query optimization & increased speed
Analyze, slow logs & monitoring tools are your friends
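To see why the query planner is your friend here, the sketch below compares the plan for the same lookup before and after adding an index. SQLite's `EXPLAIN QUERY PLAN` stands in for the `EXPLAIN`/`ANALYZE` output of whichever database you actually run. The `users` table and the index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT * FROM users WHERE email = ?"
args = ("user42@example.com",)

plan_before = query_plan(conn, lookup, args)  # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)   # index search

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, so read it rather than parse it.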
Anti-Pattern #13
Too many Indexes
They are not free. Your DB maintains them.
Too many will impact your write throughput.
Doing Indexes right
Many index types
Many types. Learn how your DB does them.
You want the right amount.
Doing Indexes right
Postgres Indexes
summarized
B-Trees: Default. Numeric, text, NULL
Hash: Equality. Don’t use
GIN: Array values & FTS
GiST: Geometric data & FTS
They Can Be Created Concurrently!
Doing Indexes right
Postgres Indexes are more baller
Partial indexes: Index + WHERE clause
Expression indexes: Match on function/modification
Unique indexes: Prevents dupes
Sorted indexes: Alter B-Tree from ASC to DESC
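The four index flavors above can be sketched in a few lines of SQL. To keep the example runnable without a server, it uses SQLite from the Python standard library as a stand-in. SQLite accepts these particular statements, but this is not a demonstration of Postgres itself, and the table and index names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT,
    status TEXT,
    created_at TEXT
);
-- Partial index: index + WHERE clause, covers only the rows queried often.
CREATE INDEX idx_users_active ON users (created_at) WHERE status = 'active';
-- Expression index: match on a function of the column.
CREATE INDEX idx_users_lower_email ON users (lower(email));
-- Unique index: prevents dupes.
CREATE UNIQUE INDEX idx_users_email_unique ON users (email);
-- Sorted index: entries stored in descending instead of ascending order.
CREATE INDEX idx_users_newest ON users (created_at DESC);
""")

insert = "INSERT INTO users (email, status, created_at) VALUES (?, ?, ?)"
conn.execute(insert, ("ines@example.com", "active", "2014-06-01"))

# The unique index rejects a second row with the same email.
try:
    conn.execute(insert, ("ines@example.com", "inactive", "2014-06-02"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print("duplicate rejected:", duplicate_rejected)
print("indexes:", index_names)
```

Every one of these indexes is also maintained on each write, which is the "not free" part from Anti-Pattern #13.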
tl;dr
Backups
Prioritize them, take them regularly. For the love of sweet baby jesus, routinely test them
Choices
Know a DB’s use case, strengths, & weaknesses. How well does it fit your needs?
DB Indexes
Have the right amount. Properly maintain them.
The Cloud
Not the same as real hardware. Plan for failures.
Questions
Thank you!
I’m @randommood on the twitters.