Data Integrity in Stateful Services Velocity, China, 2016

Post on 19-Jun-2020

TRANSCRIPT

Page 1: Data Integrity in Stateful Services - O'Reillyvelocity.oreilly.com.cn/2016/ppts/DBIntegrityOReilly.pdf · Data Integrity is cultural • Design, Build, Test, Deploy - each stage is

Data Integrity in Stateful Services

Velocity, China, 2016

Data Integrity

Bringing Sexy Back

“Protect the Data.”

- Every DBA who doesn’t want to be fired

Breaking Integrity Down

• Physical Integrity - Help, my data files are gone!

• Logical Integrity - Help, my emails disappeared!

“Data Integrity is the mission of the entire organization.”

When to plan for integrity?

If you are planning for recovery after your application and infrastructure are built, you’re guarding a henhouse that already has a fox in it.

Goals for Integrity

Elimination

Empowerment

Detection

Flexibility

Elimination

Where possible, eliminate the potential for corruption and data loss.

Optimize for durability based on your user needs

ACID vs BASE

Consistency vs Availability

Velocity Levers

Empowerment

Help people and systems to recover rapidly from their own mistakes.

don’t trust destructive requests

soft deletes w/recovery API

data versioning
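
The three ideas on this slide can be sketched together in a few lines. This is a minimal in-memory illustration, not the speakers' implementation: the `SoftDeleteStore` class and its method names are hypothetical, and a real service would persist records and expire soft-deleted data after a retention window.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    # Every value ever written, oldest first (data versioning).
    versions: List[str] = field(default_factory=list)
    # None means the record is live; a timestamp means soft-deleted.
    deleted_at: Optional[float] = None


class SoftDeleteStore:
    """Hypothetical key-value store that never destroys data on request."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def put(self, key: str, value: str) -> None:
        # Writes append a new version instead of overwriting the old one.
        rec = self._records.setdefault(key, Record())
        rec.versions.append(value)
        rec.deleted_at = None

    def get(self, key: str) -> Optional[str]:
        rec = self._records.get(key)
        if rec is None or rec.deleted_at is not None:
            return None
        return rec.versions[-1]

    def delete(self, key: str) -> None:
        # A destructive request only marks the record; nothing is removed.
        rec = self._records.get(key)
        if rec is not None:
            rec.deleted_at = time.time()

    def undelete(self, key: str) -> bool:
        # Recovery API: clear the deletion marker if one is set.
        rec = self._records.get(key)
        if rec is None or rec.deleted_at is None:
            return False
        rec.deleted_at = None
        return True

    def rollback(self, key: str) -> bool:
        # Re-write the previous value as a new version, so the history
        # is preserved and the rollback itself can be undone.
        rec = self._records.get(key)
        if rec is None or len(rec.versions) < 2:
            return False
        rec.versions.append(rec.versions[-2])
        return True
```

Because `delete` only sets a marker and `rollback` appends rather than truncates, a user or a buggy client can undo either operation without a restore from backup.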

Detection

Early detection of corruption is as important as the ability to recover from it.

unit and regression testing

data validation pipelines

tools for investigation

Flexibility

You cannot predict all of the ways you can lose data. Focus on flexibility in your toolbox.

Tiered Storage

Replication and Data Portability

What could go wrong?

Famous Last Words

Planned Recovery

• Production deployments

• Environment duplication

• Downstream services (analytics, compliance)

• Operational tests

Unplanned Recovery

Category

Scope

Impact

“Google estimates 24 combinations of data integrity failures possible”

Scenario Scope

• Small:

• Localized or single instance in redundant scenarios.

• Small subset of data (1000 customers)

• Medium:

• Cluster-Wide or a full Zone

• A full dataset (all customers in a shard)

• Large:

• Multiple clusters, or a full DC

• Multiple datasets (full data loss, all customers across shards)

Scenario Impact

• Small:

• Some features impacted, non-SLO threatening.

• Small subset of users impacted.

• Medium:

• SLO threatening.

• Moderate subset of users impacted.

• Large:

• SLO impacting, application down.

• Majority of users impacted.

Operator Error

• Data Deletion

• Data Corruption

• Relaxed Constraints

• Storage removal

Application Errors

• Removing pointers to assets in external storage

• Character set mutilation

• Duplication of data

Infrastructure Services

• Orchestration got frisky?

• Configuration management changed some durability parameters?

• Proxies or DNS pointing to the wrong node?

OS and Hardware Errors

• Silent corruption due to failed ECC error checks?

• Filesystem corruption

• Data loss during a power down

Hardware Failures

• Disk Failures

• Memory Failures

• Controller Failures

Datacenter Failures

• Catastrophic power loss

• Wiped out storage

• Fires, catastrophic events

Anatomy of a Recovery Strategy

Building Block 1

• A culture of unit and regression testing

• Data validation test suite

• Example: Storing external media

• Tools and analytics to investigate errors

“Early Detection, Bad Data Propagates”
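
For the external media example, one plausible validation check compares the asset pointers held in the database against the keys that actually exist in external storage. The sketch below assumes both sides can be listed cheaply. The function name, the row layout, and the key format are hypothetical and stand in for whatever the real schema and object store provide.

```python
from typing import Dict, Iterable, List, Set, Tuple


def validate_media_pointers(
    rows: Dict[int, str], stored_keys: Iterable[str]
) -> Tuple[List[int], Set[str]]:
    """Compare database pointers with the contents of external storage.

    rows maps a row id to the storage key that row points at.
    Returns (dangling, orphaned):
      dangling: row ids whose asset is missing from storage
      orphaned: storage keys that no row points at any more
    """
    stored = set(stored_keys)
    dangling = sorted(row_id for row_id, key in rows.items() if key not in stored)
    orphaned = stored - set(rows.values())
    return dangling, orphaned


# Example data: row 2 points at an asset that is gone, and "c.png" has
# lost the row that referenced it.
rows = {1: "a.png", 2: "b.png"}
dangling, orphaned = validate_media_pointers(rows, ["a.png", "c.png"])
```

Run on a schedule, a check like this surfaces a removed pointer or a missing asset soon after it happens, before the bad state spreads into backups and downstream systems.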

Building Block 2

• Fast, expensive storage for dataset portability

• Slow, inexpensive storage for long-term backups

• Long-term storage (tape, offsite)

• Object storage for versioning

• Distributed logs (e.g., Kafka) for versioning

“Tiered Storage”

Building Block 3

• Full and incremental online backups

• Full and incremental offline/long-term backups

• APIs for soft deletion/undeletion

• APIs for version rollback/play forward

• Producers for event streams to recreate objects

“Toolbox”
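
Version rollback and play forward both fall out of replaying an ordered event stream up to a chosen point. The following is a rough sketch under simple assumptions: events are `(sequence, op, key, value)` tuples with only `set` and `delete` operations, and the whole log fits in memory. None of these names come from the talk or from any particular log system.

```python
from typing import Any, Dict, List, Tuple

# (sequence number, operation, key, value); hypothetical event shape.
Event = Tuple[int, str, str, Any]


def replay(events: List[Event], up_to: int) -> Dict[str, Any]:
    """Rebuild object state by applying events with sequence <= up_to."""
    state: Dict[str, Any] = {}
    for seq, op, key, value in sorted(events, key=lambda e: e[0]):
        if seq > up_to:
            break  # everything later is outside the requested point in time
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return state


events: List[Event] = [
    (1, "set", "a", 1),
    (2, "set", "b", 2),
    (3, "delete", "a", None),  # suppose this delete was a mistake
    (4, "set", "b", 5),
]

before_mistake = replay(events, up_to=2)  # rollback: state just before event 3
latest = replay(events, up_to=4)          # play forward to the newest event
```

Choosing a smaller `up_to` rolls back and a larger one plays forward, so the same producer-fed log serves both recovery directions.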

Building Block 4

• Daily use as testing - incorporate recovery into daily work

• Continuous testing of less-used recovery methods

• Regular game days - team scenario testing

“Testing”
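
Continuous testing of a recovery method can be as small as restoring a backup to a scratch location and checking it against a known checksum. In the sketch below the file copy is only a stand-in for a real restore procedure, and `restore_and_verify` is a hypothetical helper. A production check would also start the datastore on the restored files and run validation queries.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large backups fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_and_verify(backup: Path, expected_sha256: str) -> bool:
    """Restore a backup into a scratch directory and verify its checksum."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup.name
        # Assumption: copying the file stands in for the real restore step.
        shutil.copy2(backup, restored)
        return sha256_of(restored) == expected_sha256
```

Scheduling a check like this and alerting on a `False` result keeps the less-used recovery paths exercised between game days.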

Final Considerations

Data Integrity is cultural

• Design, Build, Test, Deploy - each stage is an opportunity to think about data integrity

• Checks and balances between teams keep us honest and focused on the goal

• Data becomes too complex for one person or team to understand. We must help each other.

Data Integrity Must be Continuous

• These processes are crucial and cannot be allowed to gather dust.

• Humans will not do this on their own; integration and automation are required.

• This must be put into project functional requirements.

You cannot plan for everything

• New and interesting things will occur that will challenge your plans.

• Flexibility and multiple options must be made available.

• Early detection is crucial for ensuring problems do not propagate out of control.

Good luck!

“@lainevcampbell, [email protected]

Check out our book

“Laine Campbell and Charity Majors”