Data Integrity in Rails Strategies and techniques to make your data bulletproof

Upload: rizwan-reza

Post on 21-Apr-2017



Proactive vs. Reactive

Simple / Complex

Email Validations / NOT NULL constraint

In Application / In Database

Proactive Techniques

NOT NULL Constraint

The “NoMethodError”

@blog.body.to_html

Returns nil

undefined method `to_html' for nil:NilClass

Fix in view

@blog.body.try(:to_html)

Fix in view

@blog.body.to_s.to_html

Fix in view

unless @blog.body.nil?
  @blog.body.to_html
end

Learnings

Likely to encounter the same problem in other places

Messy code littered with checks and guards

These are all band-aid fixes
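The three view-level guards do not even agree on what they return when the body is nil, which is part of why they spread through a codebase. A minimal sketch in plain Ruby, assuming a stand-in `Blog` struct and using `upcase` in place of `to_html` (both are illustrative and not from the slides). The safe-navigation operator `&.` stands in for `try`, since it behaves the same way for a nil receiver:

```ruby
# Stand-in for the model; in the slides this would be an ActiveRecord object.
Blog = Struct.new(:body)

# Guard 1: safe navigation (like `try` on a nil receiver) yields nil.
def body_with_safe_navigation(blog)
  blog.body&.upcase
end

# Guard 2: coercing with to_s yields an empty string instead of nil.
def body_with_to_s(blog)
  blog.body.to_s.upcase
end

# Guard 3: an explicit nil check; the expression is nil when the body is nil.
def body_with_nil_check(blog)
  blog.body.upcase unless blog.body.nil?
end

blank = Blog.new(nil)
p body_with_safe_navigation(blank) # prints nil
p body_with_to_s(blank)            # prints ""
p body_with_nil_check(blank)       # prints nil
```

Two of the guards pass nil further down the line and one turns it into a blank string, so downstream code has to be ready for both.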

Fix using Rails Validation

validates :body, presence: true, if: ->(record) { record.body.nil? }

WTH?!?

Validations can still be bypassed

blog = Blog.new
blog.body = nil
blog.save(validate: false)

Learnings

Code is unnecessarily hard to read

Validations can be bypassed, resulting in incoherent data

Fix in Database

change_column :blogs, :body, :text, null: false, default: ''

Learnings

No Code Modification

Less complexity: you never have to deal with both nils and blank strings

Work on the assumption that body is never nil

The missing Parent

The “NoMethodError”

@post.author.name

Returns nil

undefined method `name' for nil:NilClass

Deleting parent but not children results in this error

Fix in view

@post.author.name if @post.author

Fix in view

@post.author.try(:name)

Learnings

Likely to encounter the same problem in other places

Messy code littered with checks and guards

These are all band-aid fixes

Fix using ActiveRecord

has_one :author, dependent: :destroy

Inefficient if lots of records

Fix using ActiveRecord

has_one :author, dependent: :delete

Does only one query, but doesn’t run callbacks

Fix using ActiveRecord

has_one :author, dependent: :restrict_with_exception

Blows up if you try to delete a parent with children in DB

Fix using ActiveRecord

has_one :author, dependent: :restrict_with_error

Shows an error if you try to delete a parent with children in DB

These strategies can still be bypassed

Post.find(1).delete

Learnings

This is better than fixing locally in views

But this can still introduce bad data

Fix in Database

add_foreign_key :authors, :posts

Rails 4.2 Feature

Fix in Database

ALTER TABLE `authors`
  ADD CONSTRAINT `authors_post_id_fk`
  FOREIGN KEY (`post_id`) REFERENCES `posts`(id);

Fix in Database

add_foreign_key :authors, :posts, on_delete: :cascade

Removes all authors when a post is deleted

Fix in Database

add_foreign_key :authors, :posts, on_delete: :restrict

:restrict is the default behavior of foreign keys

Ideal fix

has_one :author, dependent: :delete

add_foreign_key :authors, :posts, on_delete: :restrict

Learnings

The ideal fix never allows someone to directly introduce orphan data, but still does the optimized cascading behavior when deleted in ActiveRecord.

Duplicate Data

Uniqueness Validation

validates :name, uniqueness: true

Author.where(name: "Mr. Duplicate").count # => 2

Uniqueness Validation

author = Author.new
author.name = "Mr. Duplicate"
author.save(validate: false)

Unique Index

add_index :authors, :name, unique: true

Unique Index

PG::Error: ERROR: could not create unique index "index_authors_on_name"

DETAIL: Key (name)=(Mr. Duplicate) is duplicated.

Ways of Removing Duplicate Data

Use SQL to arbitrarily remove duplicates

Use scripts to automatically merge content in rows

Manually merge content/remove duplicate rows
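As a rough illustration of the first option (arbitrarily removing duplicates), the sketch below picks which rows to drop by keeping the lowest id for each name. The rows are in-memory hashes standing in for the authors table, and `duplicate_ids_to_remove` is a hypothetical helper, not something from the slides:

```ruby
# In-memory stand-ins for rows of the authors table (hypothetical data).
AUTHORS = [
  { id: 1, name: "Mr. Duplicate" },
  { id: 2, name: "Ms. Unique" },
  { id: 3, name: "Mr. Duplicate" }
].freeze

# Returns the ids to delete: for each duplicated name, keep the row with
# the lowest id and mark the rest for removal.
def duplicate_ids_to_remove(rows)
  rows.group_by { |row| row[:name] }
      .values
      .flat_map { |group| group.sort_by { |row| row[:id] }.drop(1) }
      .map { |row| row[:id] }
end

p duplicate_ids_to_remove(AUTHORS) # prints [3]
```

Once the listed rows are gone, the unique index can be created without the "could not create unique index" error.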

Unique Index Protects Data from having Duplicates

PG::Error: ERROR: duplicate key value violates unique constraint "index_authors_on_name"
DETAIL: Key (name)=(Mr. Duplicate) already exists.

This error is raised every time a duplicate slips past the Active Record validation

Unique Index Protects Data from having Duplicates

def save_with_retry_on_unique(*args)
  retry_on_exception(ActiveRecord::RecordNotUnique) do
    save(*args)
  end
end

Retries saving when error is thrown, so the validation can take over

Retries only once

Calls the block only once
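`retry_on_exception` is not part of Rails, so the slides presumably rely on a helper defined elsewhere. Below is a minimal sketch of what such a helper could look like, retrying once as the slide notes. The `RecordNotUnique` class is a local stand-in for `ActiveRecord::RecordNotUnique` so the example runs without Rails, and `FlakySaver` is purely hypothetical:

```ruby
# Local stand-in so the sketch runs without Rails; in an application this
# would be ActiveRecord::RecordNotUnique.
class RecordNotUnique < StandardError; end

# Runs the block; if it raises exception_class, runs it again up to
# `retries` more times. A failure after the last retry propagates.
def retry_on_exception(exception_class, retries: 1)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue exception_class
    retry if attempts <= retries
    raise
  end
end

# Hypothetical object whose first `failures` saves collide with a
# concurrent insert.
class FlakySaver
  attr_reader :calls

  def initialize(failures)
    @failures = failures
    @calls = 0
  end

  def save
    @calls += 1
    raise RecordNotUnique, "duplicate key" if @calls <= @failures
    true
  end
end

saver = FlakySaver.new(1)
p retry_on_exception(RecordNotUnique) { saver.save } # prints true
p saver.calls                                         # prints 2
```

In a real model the second call to `save` would run the uniqueness validation again, find the row that now exists, and return false with a normal validation error.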

One-to-One Relationships

add_index :authors, :post_id, unique: true

Protects from associating multiple records to the parent

Learnings

Active Record validations are not meant for data integrity.

Incoherent data can still be introduced.

A database-level unique index makes sure data is never duplicated.

Under concurrency, two requests can both pass the uniqueness validation before either one inserts, so always handle the underlying ActiveRecord::RecordNotUnique error.

Don’t forget to add unique index on one-to-one relationships.
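The concurrency problem can be shown without a database. The sketch below interleaves two requests by hand so that both run the uniqueness check before either inserts. `AuthorsTable` is a hypothetical in-memory stand-in for the table, and its `unique:` flag mimics a unique index:

```ruby
class DuplicateKey < StandardError; end

# Tiny in-memory table. With unique: true, insert raises on a repeated
# name, mimicking a unique index.
class AuthorsTable
  attr_reader :names

  def initialize(unique:)
    @unique = unique
    @names = []
  end

  # What a uniqueness validation does: a lookup before the insert.
  def name_taken?(name)
    @names.include?(name)
  end

  def insert(name)
    raise DuplicateKey, name if @unique && @names.include?(name)
    @names << name
  end
end

# Two requests interleaved: both validate before either one inserts.
def simulate_race(table, name)
  first_ok = !table.name_taken?(name)
  second_ok = !table.name_taken?(name)
  table.insert(name) if first_ok
  table.insert(name) if second_ok
  table.names.count(name)
end

p simulate_race(AuthorsTable.new(unique: false), "Mr. Duplicate") # prints 2

begin
  simulate_race(AuthorsTable.new(unique: true), "Mr. Duplicate")
rescue DuplicateKey => e
  puts "second insert rejected: #{e.message}"
end
```

Both checks pass in either case; only the table with the index refuses the second row.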

Polymorphic Associations

Polymorphic Association

class Post < ActiveRecord::Base
  has_many :comments, as: :commentable
end

class Comment < ActiveRecord::Base
  belongs_to :commentable, polymorphic: true
end

Both commentable_type and commentable_id are stored in the database.

There is no way to add foreign keys to polymorphic associations.

Learnings

There is no SQL standard way of adding polymorphic associations.

Referential Integrity is compromised when we use this ActiveRecord pattern.

Expensive to index.

The data distribution isn’t usually uniform.

Harder to JOIN in SQL.
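Because the database cannot check a `commentable_type` / `commentable_id` pair, finding orphans falls to application code. Here is a sketch of such a check, using in-memory hashes as stand-ins for tables. All of the data, the `Photo` and `Video` types, and the `orphaned_comments` helper are hypothetical:

```ruby
# Stand-ins for parent tables, keyed by the class name that would be
# stored in commentable_type.
PARENT_IDS = {
  "Post"  => [1, 2],
  "Photo" => [10]
}.freeze

COMMENTS = [
  { id: 1, commentable_type: "Post",  commentable_id: 1 },
  { id: 2, commentable_type: "Photo", commentable_id: 99 }, # parent missing
  { id: 3, commentable_type: "Video", commentable_id: 5 }   # unknown type
].freeze

# Returns comments whose (type, id) pair points at no existing parent.
def orphaned_comments(comments, parent_ids)
  comments.reject do |comment|
    parent_ids.fetch(comment[:commentable_type], [])
              .include?(comment[:commentable_id])
  end
end

p orphaned_comments(COMMENTS, PARENT_IDS).map { |c| c[:id] } # prints [2, 3]
```

With a real foreign key, neither of the two orphaned rows could have been written in the first place.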

Database-friendly Polymorphic Associations

class Post < ActiveRecord::Base
  has_many :comments, class_name: 'PostComment'
end

class PostComment < ActiveRecord::Base
  include Commentable
  belongs_to :post
end

Learnings

Adding one table for each child type maintains data integrity.

Foreign keys can be added.

Extract similar behaviors using modules in Ruby in the application.

Create a non-table backed Ruby class for creating comments

Use class_name option to designate which class name to use when retrieving records.
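The slides do not show the contents of the `Commentable` module. As a sketch of the idea of sharing behavior through a module rather than through a shared table, the example below uses plain structs in place of table-backed models. The `excerpt` method and the `PhotoComment` class are assumptions made for illustration:

```ruby
# Shared behavior lives in a module rather than in a shared table.
module Commentable
  # Short preview of the comment body.
  def excerpt(length = 20)
    body.length > length ? "#{body[0, length]}..." : body
  end
end

# Plain structs stand in for the table-backed models, one per parent type.
PostComment = Struct.new(:post_id, :body) do
  include Commentable
end

PhotoComment = Struct.new(:photo_id, :body) do
  include Commentable
end

p PostComment.new(1, "Short").excerpt                          # prints "Short"
p PhotoComment.new(7, "A much longer comment body").excerpt(6) # prints "A much..."
```

Each comment type keeps its own foreign key column (`post_id`, `photo_id`), which is what allows a real foreign key constraint on each table.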

Learnings

Easier to grok and operate.

Harder to aggregate over all comments regardless of type.

More expensive to add another parent type.

Use specific tables if you care for data integrity.

If data integrity is a non-issue, use polymorphic associations. Event logging or activity feeds are good examples.

Reactive Techniques

Data Integrity Test Suite

MAX_ERRORS = 50

def test_posts_are_valid
  errors = []
  Post.find_each do |post|
    next if post.valid?
    errors << [post.id, post.errors.full_messages]
    # Stop early so a badly broken table does not produce an endless report
    break if errors.size >= MAX_ERRORS
  end
  assert_equal [], errors
end

Data Integrity Test Suite

def test_post_bodies_are_not_nil
  assert_equal 0, Post.where(body: nil).count
end

Learnings

Proactive techniques work best

They’re not always feasible if you have bad data already

Reactive integrity checks are a good alternative

Run these regularly against production data to surface errors.

Avoid using complex constraints.

Recap

Not null constraints

Unique indexes

Foreign keys

Refactor Polymorphic association into separate tables

Reactive integrity checks

Thanks!

@rizwanreza