postgresql materialized views with active record

25
PostgreSQL Materialized Views and Active Record David Roberts

Upload: david-roberts

Post on 09-Jun-2015

1.838 views

Category:

Software


4 download

DESCRIPTION

How to use PostgreSQL Materialized Views with to simplify code and improve performance of your Ruby on Rails application

TRANSCRIPT

Page 1: PostgreSQL Materialized Views with Active Record

PostgreSQL Materialized Views

and Active Record

David Roberts

Page 2: PostgreSQL Materialized Views with Active Record

The ProblemHow do you quickly filter data

represented by multiple ActiveRecord associations and calculations?

Page 3: PostgreSQL Materialized Views with Active Record

Data ModelCity%

&%Philly%&%Boston%

Technology%&%Ruby%&%Python%

Club%&%Philly.rb%

Talk%&%PostgreSQL%Materialized%Views%

Feedback%&%“Super%rad%talk!!”%&%“The%whole%world%is%now%dumber”%

Author%&%David%Roberts%

Page 4: PostgreSQL Materialized Views with Active Record

View all Commentsclass Feedback < ActiveRecord::Base belongs_to :talk INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable']

scope :filled_out, -> { where.not(comment: INVALID_COMMENTS) }end

Feedback.filled_out

Feedback Load (2409.6ms) SELECT "feedbacks".* FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable'))

Page 5: PostgreSQL Materialized Views with Active Record

Highest Scoring Talk with Valid Comments

Feedback.filled_out \ .select('talk_id, avg(score) as overall_score') \ .group('talk_id').order('overall_score desc') \ .limit(10)

# 665msSELECT talk_id, avg(score) as overall_score FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) GROUP BY talk_id ORDER BY overall_score desc LIMIT 10;

Page 6: PostgreSQL Materialized Views with Active Record

Highest Scoring Talk with Valid Comments

results = Feedback.filled_out \ .select('talk_id, avg(score) as overall_score') \ .group('talk_id').order('overall_score desc') \ .limit(10)

results.first.inspect

=> "#<Feedback id: nil, talk_id: 24510>"

Page 7: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks in PA by Authors named Parker

Feedback.filled_out.joins(talk: [:author, { club: :city }] ) \ .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') \ .where("cities.state_abbr = ?", 'PA') \ .where("authors.name LIKE '%?%'", 'Parker') \ .group('feedbacks.talk_id') \ .order('overall_score desc') \ .limit(10)

# 665msSELECT feedbacks.talk_id, avg(feedbacks.score) as overall_score FROM "feedbacks" INNER JOIN "talks" ON "talks"."id" = "feedbacks"."talk_id" INNER JOIN "authors" ON "authors"."id" = "talks"."author_id" INNER JOIN "clubs" ON "clubs"."id" = "talks"."club_id" INNER JOIN "cities" ON "cities"."id" = "clubs"."city_id" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) AND (cities.state_abbr = 'PA') AND (authors.name LIKE '%Parker%') GROUP BY feedbacks.talk_id ORDER BY overall_score desc LIMIT 10;

Page 8: PostgreSQL Materialized Views with Active Record

What’s Wrong with these Examples?

• Long ugly queries

• Slow queries are bad for Web Applications

• Fighting ActiveRecord framework

• SQL aggregates are difficult to access

• Returned object no longer corresponds to model

• Repetitive code to setup joins and Filter invalid comments

Page 9: PostgreSQL Materialized Views with Active Record

Using Database Views

to clean up ActiveRecord models

Page 10: PostgreSQL Materialized Views with Active Record

Views• Uses a stored / pre-defined Query

• Uses live data from corresponding tables when queried

• Can reference data from many tables

• Great for hiding complex SQL statements

• Allows you to push functionality to database

Page 11: PostgreSQL Materialized Views with Active Record

But this is a Ruby talk!

Page 12: PostgreSQL Materialized Views with Active Record

class CreateTalkView < ActiveRecord::Migration def up connection.execute <<-SQL CREATE VIEW v_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id SQL end

def down connection.execute 'DROP VIEW IF EXISTS v_talks_report' endend

Page 13: PostgreSQL Materialized Views with Active Record

Encapsulate in ActiveRecord Modelclass TalkReport < ActiveRecord::Base # Use associations just like any other ActiveRecord object belongs_to :author belongs_to :talk belongs_to :club belongs_to :city belongs_to :technology # take advantage of talks has_many relationship delegate :feedbacks, to: :talk

self.table_name = 'v_talks_report'

# views cannot be changed since they are virtual def readonly true endend

Page 14: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks with Valid Comments

results = TalkReport.order(overall_score: :desc) \ .limit(10)

results.first.inspect => "#<TalkReport city_id: 16, city_name: \"Burlington\", state_abbr: \"VT\", technology_id: 2, club_id: 73, club_name: \"Python - Burlington\", talk_id: 6508, talk_name: \"The SQL protocol is down, reboot the optical syste...\", author_id: 100, author_name: \"Jerrell Gleichner\", overall_score: #<BigDecimal:7ff3f80f1678,'0.4E1',9(27)>>"

Page 15: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks in PA by Authors named Parker

TalkReport.where(state_abbr: 'PA') \ .where("author_name LIKE '%Parker%'") \ .order(overall_score: :desc).limit(10)

Page 16: PostgreSQL Materialized Views with Active Record

Using Materialized Views

to Improve Performance

Page 17: PostgreSQL Materialized Views with Active Record

Materialized Views• Acts similar to a Database View, but results persist for

future queries

• Creates a table on disk with the Result set

• Can be indexed

• Ideal for capturing frequently used joins and aggregations

• Allows optimization of tables for updating and Materialized Views for reporting

• Must be refreshed to be updated with most recent data

Page 18: PostgreSQL Materialized Views with Active Record

class CreateTalkReportMv < ActiveRecord::Migration def up connection.execute <<-SQL CREATE MATERIALIZED VIEW mv_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id; CREATE INDEX ON mv_talks_report (overall_score); SQL end

def down connection.execute 'DROP MATERIALIZED VIEW IF EXISTS mv_talks_report' endend

Page 19: PostgreSQL Materialized Views with Active Record

ActiveRecord Model Changeclass TalkReport < ActiveRecord::Base # No changes to associations belongs_to …

self.table_name = 'mv_talks_report'

def self.repopulate connection.execute("REFRESH MATERIALIZED VIEW #{table_name}") end # Materialized Views cannot be modified def readonly true endend

Page 20: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks99% reduction in runtime

577msFeedback.filled_out \ .select('talk_id, avg(score) as score') \ .group('talk_id').order('score desc').limit(10)

1msTalkReport.order(overall_score: :desc).limit(10)

Page 21: PostgreSQL Materialized Views with Active Record

Highest Scoring Talks in PA by Authors named “Parker”

400msFeedback.filled_out.joins(talk: [:author, { club: :city }] ) \ .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') \ .where("cities.state_abbr = ?", 'PA') \ .where("authors.name LIKE '%?%'", 'Parker') \ .group('feedbacks.talk_id') \ .order('overall_score desc') \ .limit(10)

19msTalkReport.where(state_abbr: 'PA') \ .where("author_name LIKE '%?%'", 'Parker') \ .order(overall_score: :desc).limit(10)

95% reduction in runtime

Page 22: PostgreSQL Materialized Views with Active Record

Why Use Materialized Views in your Rails application?

• ActiveRecord models allow for easy representation in Rails

• Capture commonly used joins / filters

• Allows for fast, live filtering and sorting of complex associations or calculated fields

• Push data intensive processing out of Ruby to Database

• Make use of advanced Database functions

• Can Optimize Indexes for reporting only

• When Performance is more important than Storage

Page 23: PostgreSQL Materialized Views with Active Record

Downsides

• Requires PostgreSQL 9.3

• Entire Materialized View must be refreshed to update

• Bad when Live Data is required

• For this use case, roll your own Materialized View using standard tables

Page 24: PostgreSQL Materialized Views with Active Record

Downsides• Migrations are painful!

• Recommend writing in SQL, so no using scopes

• Entire Materialized View must be dropped and redefined for any changes to the View or referring tables

• Hard to read and track what changed

Page 25: PostgreSQL Materialized Views with Active Record

Resources

• Source Code used in talk

• https://github.com/droberts84/materialized-view-demo

• PostgreSQL Documentation

• https://wiki.postgresql.org/wiki/Materialized_Views