Exploring Ruby on Rails and PostgreSQL
DESCRIPTION
An overview of Ruby, jRuby, Rails, Torquebox, and PostgreSQL that was presented as a 3-hour class to other programmers at The Ironyard (http://theironyard.com) in Greenville, SC in July 2013. The Rails-specific sections are mostly code samples that were explained during the session, so the real focus of the slides is Ruby, "the Rails way" (workflow and differentiators), and PostgreSQL.
TRANSCRIPT
Who am I?
• I’m Barry Jones
• Application developer since ’98
  – Java, PHP, Groovy, Ruby, Perl, Python
  – MySQL, PostgreSQL, SQL Server, Oracle, MongoDB
• Efficiency and infrastructure nut
• Believer in “right tool for the job”
  – There is no silver bullet; programming is about tradeoffs
The Silver Bullet
Ruby on Rails and PostgreSQL
j/k But it’s close
What do we look for in a language?
• Balance
  – Can it do what I need it to do?
    • Web: Ruby/Python/PHP/Perl/Java/C#/C/C++
  – Is it efficient to develop with?
    • Ruby/Python/PHP
  – Are there libraries/tools/an ecosystem so I don’t reinvent the wheel?
    • Ruby/Python/PHP/Java/Perl
  – Is it fast?
    • Ruby/Python/Java/C#/C/C++
  – Is it stable?
    • Ruby/Python/PHP/Perl/Java/C#/C/C++
  – Do other developers use it?
    • At my company? In the area? Globally?
  – Is it cost effective?
    • Ruby/Python/PHP/Perl/C/C++
  – Can it handle my architectural approach well?
    • Ruby/Python/Java/C# handle just about everything
    • CGI languages (PHP/Perl/C/C++) are very poor fits for frameworks, long polling, and evented programming
  – Will it scale?
    • Yes. The question is mostly moot because web servers naturally scale horizontally
  – Will my boss let me use it?
    • .NET shop? C#
    • Java shop? Java (Groovy, Clojure, Scala), jRuby, Jython
    • *nix shop? Ruby, Python, Perl, PHP, C, C++
• Probable winners: Ruby and Python
What stands out about Ruby?
• Malleability
  – Everything is an object
  – Objects can be monkey patched
• Great for writing Domain Specific Languages
  – Puppet
  – Chef
  – Capistrano
  – Rails

"this is a string object".length   # => 23

class String
  def palindrome?
    self == self.reverse
  end
end

"radar".palindrome?   # => true
How is monkey patching good?
• Rails adds web-specific capabilities to Ruby
  – " ".blank? == true
• Makes using third-party libraries much easier
  – Aspect-oriented programming
    • Not dependent on built-in hooks
  – Queued processing
      record = Record.find(id)
      record.delay.some_intense_logic
    • DelayedJob
    • Resque
    • Sidekiq
    • Stalker
  – Cross integrations
      Email.deliver
    • MailHopper: automatically delivers all email in the background
    • Gems that specifically enhance other gems
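The `record.delay.some_intense_logic` pattern works because a gem can add a `delay` method to every object. The following is a minimal sketch of that idea. It assumes an in-process `Queue` in place of the Redis- or database-backed queues that DelayedJob, Resque, or Sidekiq actually use. `DelayProxy`, `JOB_QUEUE`, and `work_off_jobs` are hypothetical names and are not any gem's API.

```ruby
# Hypothetical stand-in for a job queue (real gems use Redis or a DB table).
JOB_QUEUE = Queue.new

# Records method calls instead of running them. Subclassing BasicObject keeps
# the proxy nearly method-free, so almost any call lands in method_missing.
class DelayProxy < BasicObject
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args)
    # The leading :: is needed because BasicObject subclasses don't see top-level constants
    ::JOB_QUEUE << [@target, name, args]
    nil
  end
end

# The monkey patch itself: every object gains #delay.
class Object
  def delay
    DelayProxy.new(self)
  end
end

# A real worker process would loop forever; here we drain the queue inline.
def work_off_jobs
  until JOB_QUEUE.empty?
    target, name, args = JOB_QUEUE.pop
    target.public_send(name, *args)
  end
end
```

Calling `record.delay.some_intense_logic` returns immediately, and the work happens whenever `work_off_jobs` runs. In the real gems, that happens in a separate worker process.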
How is monkey patching…bad?
• If any behavior is modified by a monkey patch, there is a chance something will break
• On a positive note, if you’re writing tests and following TDD or BDD, the tests should catch any problems
• On another positive note, the Ruby community is very big on testing
Why was Ruby created?
• Created by Yukihiro Matsumoto
• “I wanted a scripting language that was more powerful than Perl, and more object-oriented than Python.”
• “I hope to see Ruby help every programmer in the world to be productive, and to enjoy programming, and to be happy. That is the primary purpose of Ruby language.” – Google Tech Talk in 2008
Ruby Version Manager
• cd into a project directory and RVM automatically selects the correct Ruby version and gemset
• Makes running multiple projects with different Ruby versions and gem dependencies on one machine dead simple
.rvmrc file
rvm rubytype-version-patch@gemset
Examples:
rvm ruby-1.9.3-p327@myproject
rvm jruby-1.7.4@myjrubyproject
rvm ree-1.8.7@oldproject
Bundler and Gemfile
$ bundle install
Using rake (10.0.4)
Using i18n (0.6.1)
Using multi_json (1.7.2)
Using activesupport (3.2.13)
Using builder (3.0.4)
Using activemodel (3.2.13)
Using erubis (2.7.0)
Using journey (1.0.4)
Using rack (1.4.5)
Using rack-cache (1.2)
Using rack-test (0.6.2)
…
Your bundle is complete! Use `bundle show [gemname]` to see where a bundled gem is installed.

Gemfile
source 'https://rubygems.org'
source 'http://gems.github.com'

# Application infrastructure
gem 'rails', '3.2.13'
gem 'devise'
gem 'simple_form'
gem 'slim'
gem 'activerecord-jdbc-adapter'
gem 'activerecord-jdbcpostgresql-adapter'
gem 'jdbc-postgres'
gem 'jruby-openssl'
gem 'jquery-rails'
gem 'torquebox', '2.3.0'
gem 'torquebox-server', '~> 2.3.0'
Foreman
• Not Ruby-specific, but written in Ruby
• Used with Heroku
• Drop in a Procfile
  $ foreman start
• CTRL+C to stop everything
Procfile
web: bundle exec thin start -p $PORT
worker: bundle exec rake resque:work QUEUE=*
clock: bundle exec rake resque:scheduler
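Here is a rough sketch of what `foreman start` does with a Procfile. It reads each `name: command` line, starts every command, and stops them all on CTRL+C. This is a simplified, hypothetical illustration (`parse_procfile` and `run_all` are made-up names), not Foreman's source. The real tool also sets `$PORT`, prefixes and colors each process's output, and more.

```ruby
# Hypothetical sketch of Foreman-style process management (not Foreman's code).

# Parse "name: command" lines into a { name => command } hash.
def parse_procfile(text)
  text.each_line.with_object({}) do |line, procs|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    name, command = line.split(':', 2)   # split on the first colon only
    next if command.nil?
    procs[name.strip] = command.strip
  end
end

# Start every command, then turn CTRL+C (SIGINT) into TERM for all children.
def run_all(procs)
  pids = procs.values.map { |command| Process.spawn(command) }
  trap('INT') do
    pids.each do |pid|
      begin
        Process.kill('TERM', pid)
      rescue Errno::ESRCH
        # child already exited
      end
    end
  end
  Process.waitall
end
```

Splitting on the first colon matters here, because commands such as `rake resque:work` contain colons of their own.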
jRuby: Why?
Ruby isn’t perfect
• Some gems can create memory leaks
  – Especially gems that include native C extensions
• MRI has a Global Interpreter Lock, so its threads cannot run Ruby code in parallel
• Because everything is an object, even simple operations like adding numbers involve extra processing, which costs performance
jRuby: So how does it fix things?
I hate writing Java…but the JVM is a work of art
• Java infrastructure is virtually bulletproof
  – Most mature way to deploy a web application
  – Enterprisey
• The JVM’s garbage collector is best of breed and helps avoid the potential memory leak issues
• The JVM’s Just-In-Time compiler keeps optimizing code the longer it runs, making it faster
• The JVM gives Ruby kernel-level threads without a Global Interpreter Lock, so threads run in parallel
• jRuby inspects your Ruby code for features that are expensive to support and turns that support off when you aren’t using them
  – E.g. if you aren’t overloading the + operator on integers, it can treat them as primitive types instead of full objects
• Include and use very mature Java libraries directly in your Ruby code
  – Significantly expands your toolbelt
  – Allows easy integration into existing Java environments
The Sidekiq Test
Sidekiq is a multithreaded background worker that provides tremendous concurrency benefits.
Creating 1,000,000 objects in 50 concurrent threads (chart: Ruby vs. jRuby)
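The benchmark's code isn't in the slides. The following sketch is a hypothetical reconstruction of its general shape only (`allocate_in_threads` and `Payload` are made-up names). You can run the same script under MRI and under jRuby to compare them. Under MRI, the Global Interpreter Lock lets only one thread execute Ruby code at a time. jRuby's threads can spread across cores.

```ruby
require 'benchmark'

Payload = Struct.new(:id, :label)

# Spread `total` object allocations across `threads` threads and return how
# many objects were created, so callers can sanity-check the split.
def allocate_in_threads(total:, threads:)
  base, extra = total.divmod(threads)
  workers = Array.new(threads) do |t|
    count = base + (t < extra ? 1 : 0)   # the first `extra` threads take one more
    Thread.new(count) do |n|
      n.times { |i| Payload.new(i, 'object') }
      n
    end
  end
  workers.sum(&:value)   # Thread#value joins the thread and returns its block's result
end

if __FILE__ == $PROGRAM_NAME
  elapsed = Benchmark.realtime { allocate_in_threads(total: 1_000_000, threads: 50) }
  puts format('%s %s: %.2fs', RUBY_ENGINE, RUBY_VERSION, elapsed)
end
```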
The App Server Test: CPU Usage (chart)
The App Server Test: Free Memory (chart)
The App Server Test: Latency (chart)
The App Server Test: Throughput (chart)
Update and Clarification
• As of this posting to Slideshare, Torquebox has a mature version 3 and a prototype version 4 that operates in a “web server only” mode. Ruby is at version 2.1.0, with dramatic improvements to memory performance when forking, which allows higher concurrency.
• At this time, jRuby still wins, but it’s much closer. Based on chatter in the #jruby IRC channel, major new releases of both jRuby and Torquebox are expected to dramatically improve their performance thanks to recent Java updates. The expected timeline was late 2014, last I heard.
• Independent benchmarks can be found here: http://www.techempower.com/benchmarks/#section=data-r9&hw=peak&test=json
RUBY ON RAILS
Let’s take a break before covering…
What do we look for in a framework?
• Please don’t suck
  – Rails does not suck
• Does it follow Model-View-Controller?
  – Yes. Since Rails 1 it’s been the standard bearer for how to do MVC on the web, copied in almost every language
• Does it help me avoid repeating myself (DRY)?
  – Yes
• Is it self-documenting?
  – Yes. It has a set of conventions that make most documentation unnecessary
• Is it flexible enough to bend to my application’s needs?
  – Yes
• Do other people use it?
  – Good gosh, yes
• Will it work with my database?
  – Yes
• Is it still going to be around in X years?
  – Ruby has Rails
  – Python has Django
  – Groovy has Grails
  – C# has ASP.NET MVC
  – PHP has fragmented framework hell (i.e., who knows?)
  – Java has a few major players (Struts 2, Play, etc.)
Rails: The Basics
Browser
Rack
Router
Controller + Models
View
Rails: Rack
Watch this excellent walkthrough of Rack Middleware:
http://railscasts.com/episodes/151-rack-middleware
Summary: Rack is a layer of Ruby code that passes requests into your app and sends responses back out. You can add layers (middleware) to do pre- and post-processing on every request before ANY of your application code runs.
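Under the Rack specification, an application is any object that responds to `call(env)` and returns `[status, headers, body]`. Middleware is just an object that wraps the next application in the chain. The sketch below builds a tiny stack by hand, with no gems. `HelloApp`, `BlockAdmin`, and `Timing` are hypothetical examples. In a Rails app you would register middleware with `config.middleware.use` rather than nesting `new` calls yourself.

```ruby
# A Rack app: any object whose #call(env) returns [status, headers, body].
class HelloApp
  def call(env)
    [200, { 'content-type' => 'text/plain' }, ["Hello from #{env['PATH_INFO']}"]]
  end
end

# Middleware that short-circuits before the app ever runs (pre-processing).
class BlockAdmin
  def initialize(app)
    @app = app
  end

  def call(env)
    if env['PATH_INFO'].to_s.start_with?('/admin')
      return [403, { 'content-type' => 'text/plain' }, ['Forbidden']]
    end
    @app.call(env)
  end
end

# Middleware that wraps the rest of the stack (pre- and post-processing).
class Timing
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)   # pass the request inward
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
    [status, headers.merge('x-runtime-ms' => format('%.2f', elapsed_ms)), body]
  end
end
```

With `stack = Timing.new(BlockAdmin.new(HelloApp.new))`, every request passes through `Timing` first. So even a blocked `/admin` request still gets the timing header.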
Rails: Models / ActiveRecord
class Post < ActiveRecord::Base
  belongs_to :category
  has_many :posts_tags
  has_many :tags, through: :posts_tags

  validates :title, presence: true
  before_create :create_slug

  scope :newest_first, -> { order('created_at DESC') }
  scope :active, -> { where('active = ?', true) }
  scope :newest_active, -> { newest_first.active }
  scope :search, ->(text) { where('title LIKE ?', "%#{text}%") }

  def create_slug
    self.slug = title.downcase.squish.gsub(' ', '-')
  end
end
post = Post.new(title: 'Some title')
post.save!
# OR
post = Post.create(title: 'Some title')

post.slug        # "some-title"
post.id          # 1
post.created_at  # created datetime
post.updated_at  # updated datetime

post.title = 'New title'
post.save!

# Relations
post.tags.first
post.tags.count
post.category.name
posts = Post.includes(:tags)   # eager load

post = Post.search('some').newest_active.first
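Scopes chain because each one returns a relation that still knows about every other scope, and no SQL is built until it is needed. The toy query builder below illustrates that idea in plain Ruby. `ToyRelation` and `ToyPost` are hypothetical and far simpler than ActiveRecord's real `Relation`. Its naive `?` substitution is for illustration only and is not safe quoting.

```ruby
# Toy, hypothetical query builder showing why scopes chain.
# NOT ActiveRecord's implementation.
class ToyRelation
  def initialize(model, wheres = [], orders = [], limit = nil)
    @model = model
    @wheres = wheres
    @orders = orders
    @limit = limit
  end

  # Each builder method returns a NEW relation, so chaining never mutates.
  def where(sql, *binds)
    clause = binds.reduce(sql) { |s, value| s.sub('?') { quote(value) } }
    ToyRelation.new(@model, @wheres + [clause], @orders, @limit)
  end

  def order(sql)
    ToyRelation.new(@model, @wheres, @orders + [sql], @limit)
  end

  def limit(n)
    ToyRelation.new(@model, @wheres, @orders, n)
  end

  # SQL is only assembled when asked for, like a lazy relation.
  def to_sql
    sql = "SELECT * FROM #{@model.table_name}"
    sql += " WHERE #{@wheres.map { |w| "(#{w})" }.join(' AND ')}" unless @wheres.empty?
    sql += " ORDER BY #{@orders.join(', ')}" unless @orders.empty?
    sql += " LIMIT #{@limit}" if @limit
    sql
  end

  # Named scopes are looked up on the model and evaluated against THIS
  # relation, which is what lets search(...).newest_active compose.
  def method_missing(name, *args, &block)
    body = @model.scopes[name]
    return super unless body
    instance_exec(*args, &body)
  end

  def respond_to_missing?(name, include_private = false)
    @model.scopes.key?(name) || super
  end

  private

  def quote(value)
    case value
    when String then "'#{value.gsub("'", "''")}'"
    when true then 'TRUE'
    when false then 'FALSE'
    else value.to_s
    end
  end
end

class ToyPost
  def self.table_name
    'posts'
  end

  def self.scopes
    @scopes ||= {}
  end

  # Register a scope and also expose it as a class method (ToyPost.active).
  def self.scope(name, body)
    scopes[name] = body
    define_singleton_method(name) { |*args| ToyRelation.new(self).public_send(name, *args) }
  end

  scope :newest_first, -> { order('created_at DESC') }
  scope :active, -> { where('active = ?', true) }
  scope :newest_active, -> { newest_first.active }
  scope :search, ->(text) { where('title LIKE ?', "%#{text}%") }
end
```

For example, `ToyPost.search('some').newest_active.limit(1).to_sql` yields one combined query rather than three separate ones.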
Rails: Migrations
class CreateInitialTables < ActiveRecord::Migration
  def up
    create_table :posts do |t|
      t.string  :title
      t.text    :body
      t.string  :slug
      t.integer :category_id
      t.timestamps
    end

    # … create more tables…
    add_index :tags, [:name, :something], unique: true

    execute "UPDATE posts SET field = 'value' WHERE stuff = 'happens'"
  end

  def down
    drop_table :posts
  end
end

# If a migration defines `change`, Rails runs it instead of `up`/`down`,
# so the reversible `change` style belongs in its own migration.
class AddUserIdToPosts < ActiveRecord::Migration
  def change
    add_column :posts, :user_id, :integer
  end
end

$ rake db:migrate
Rails: Controllers
class PostsController < ApplicationController
  before_filter :authenticate, only: :destroy

  def index     # GET /posts
  end

  def new       # GET /posts/new
  end

  def create    # POST /posts
  end

  def show      # GET /posts/:id
  end

  def edit      # GET /posts/:id/edit
  end

  def update    # PUT /posts/:id
  end

  def destroy   # DELETE /posts/:id
  end
end

# Routes
resources :posts

OR limit it

resources :posts, only: [:create, :new]
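`resources :posts` expands into the seven RESTful routes listed in the controller comments. The hypothetical helper below reproduces that table and the effect of `only:`. `resource_routes` is a made-up name and is not the Rails router API. It uses PUT for `update` as Rails 3.2 does. Rails 4 and later also route PATCH to `update`.

```ruby
# Hypothetical helper (not the Rails router API) listing the routes that
# `resources` generates for a controller.
RESTFUL_ROUTES = [
  [:index,   'GET',    '/%s'],
  [:new,     'GET',    '/%s/new'],
  [:create,  'POST',   '/%s'],
  [:show,    'GET',    '/%s/:id'],
  [:edit,    'GET',    '/%s/:id/edit'],
  [:update,  'PUT',    '/%s/:id'],
  [:destroy, 'DELETE', '/%s/:id']
].freeze

# Returns [verb, path, "controller#action"] triples, honoring `only:`.
def resource_routes(name, only: nil)
  RESTFUL_ROUTES
    .select { |action, _verb, _path| only.nil? || only.include?(action) }
    .map { |action, verb, path| [verb, format(path, name), "#{name}##{action}"] }
end
```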
Rails: Views
/app/views
  /layouts
    /application.html.erb
  /posts
    /new.html.slim
    /new.json.rabl
    /index.xml.erb
    /_widget.html.erb

# Slim example
.post
  h2 = post.title
  .body.grid-8 = post.body

# ERB example
<div class="post">
  <h2><%= post.title %></h2>
  <div class="body grid-8">
    <%= post.body %>
  </div>
</div>
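ERB ships with Ruby's standard library, so an ERB view like the one above can also be rendered outside Rails. The following is a minimal sketch that assumes a `Struct` stand-in for the `Post` model. `PostStub` and `PostView` are hypothetical names. Rails escapes `<%= %>` output automatically, but plain ERB does not, so the sketch calls `h` (`ERB::Util.html_escape`) explicitly.

```ruby
require 'erb'

PostStub = Struct.new(:title, :body)   # hypothetical stand-in for the Post model

POST_TEMPLATE = <<~'HTML'
  <div class="post">
    <h2><%= h post.title %></h2>
    <div class="body grid-8">
      <%= h post.body %>
    </div>
  </div>
HTML

class PostView
  include ERB::Util   # provides h / html_escape
  attr_reader :post   # lets the template say `post.title`, as in the view above

  def initialize(post)
    @post = post
  end

  def render
    ERB.new(POST_TEMPLATE).result(binding)
  end
end
```

For example, `PostView.new(PostStub.new('Rails & PG', 'Hello')).render` produces the same markup as the template, with `&` escaped as `&amp;`.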
Rails: Testing with RSpec
describe Post do
  describe 'a basic test' do
    subject { FactoryGirl.build(:post, title: 'Some title') }

    it 'should be valid' do
      should_not be_nil
      subject.valid?.should be_true
    end
  end

  describe 'something with a complicated dependency' do
    before do
      Post.stub(:function_to_override) { true }
    end
  end

  describe 'a test with API hits' do
    use_vcr_cassette 'all_a_twitter', record: :new_episodes
  end
end
POSTGRESQL
Let’s take a break before we talk about…
How do you pronounce it?
Answer            Responses   Percentage
post-gres-q-l          2379          45%
post-gres              1611          30%
pahst-grey               24           0%
pg-sequel                50           0%
post-gree               350           6%
postgres-sequel         574          10%
p-g                      49           0%
database                230           4%
Total                  5267
What IS PostgreSQL?
• Fully ACID compliant
• Feature rich and extensible
• Fast, scalable, and leverages multicore processors very well
• Enterprise class, with quality corporate support options
• Free as in beer
• It’s kind of nifty
Laundry List of Features
• Multi-version Concurrency Control (MVCC)
• Point in Time Recovery
• Tablespaces
• Asynchronous replication
• Nested transactions
• Online/hot backups
• Genetic query optimizer
• Multiple index types
• Write ahead logging (WAL)
• Internationalization: character sets, locale-aware sorting, case sensitivity, formatting
• Full subquery support
• Multiple index scans per query
• ANSI-SQL:2008 standard conformant
• Table inheritance
• LISTEN / NOTIFY event system
• Ability to make a PowerPoint slide run out of room
What are we covering today?
• Full-text search
• Built-in data types
• User-defined data types
• Automatic data compression
• A look at some other cool features and extensions, depending on how we’re doing on time
Full-text Search
• What about…?
  – Solr
  – Elastic Search
  – Sphinx
  – Lucene
  – MySQL
• All have their purpose
  – Distributed search of multiple document types
    • Sphinx
  – Client search performance is all that matters
    • Solr
  – Search constantly incoming data with streaming index updates
    • Elastic Search excels
  – You really like Java
    • Lucene
  – You want terrible search results that don’t even make sense to you, much less your users
    • MySQL full-text search = the worst thing in the world
Full-text Search
• Complications of standalone search engines
  – Data synchronization
    • Managing deltas, index updates
    • Filtering/deleting/hiding expired data
    • Search server outages, redundancy
  – Learning curve
  – Do the character sets match up with my database?
  – Additional hardware / servers just for search
  – Can feel like a black box when you get a support question asking “why is/isn’t this showing up?”
Full-text Search
• But what if your needs are more like:
  – Search within my database
  – Avoid syncing data with outside systems
  – Avoid maintaining outside systems
  – Less black box, more control
Full-text Search
• tsvector
  – The text to be searched
• tsquery
  – The search query
• to_tsvector('the church is AWESOME') @@ to_tsquery(SEARCH)
  – @@ to_tsquery('church') == true
  – @@ to_tsquery('churches') == true
  – @@ to_tsquery('awesome') == true
  – @@ to_tsquery('the') == false
  – @@ to_tsquery('churches & awesome') == true
  – @@ to_tsquery('church & okay') == false
• to_tsvector('the church is awesome')
  – 'awesom':4 'church':2
• to_tsvector('simple', 'the church is awesome')
  – 'awesome':4 'church':2 'is':3 'the':1
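The results above follow from a few steps: normalize case, drop stop words, reduce words to a stem, and require every query term to be present. The toy model below imitates those steps to build intuition for what `to_tsvector` and `to_tsquery` do. It is a deliberately crude, hypothetical sketch (`toy_tsvector`, `toy_match?`, and the tiny stop-word list are made up). PostgreSQL's `english` configuration uses a full stop-word list and the Snowball stemmer, so real results will differ on other inputs.

```ruby
# Toy imitation of to_tsvector / to_tsquery matching, for intuition only.
# The tiny stop-word list and crude suffix stripping are stand-ins for
# PostgreSQL's 'english' dictionary and Snowball stemmer.
STOP_WORDS = %w[a an and is of or the to].freeze

# Crude stemmer: strip a trailing "es", "s", or "e".
def toy_stem(word)
  word.sub(/(?:es|s|e)\z/, '')
end

# Returns { lexeme => [positions] }. Positions count every word (1-based),
# so stop words still occupy a slot, as in 'awesom':4 'church':2.
def toy_tsvector(text)
  vector = {}
  text.downcase.scan(/[a-z]+/).each_with_index do |word, i|
    next if STOP_WORDS.include?(word)
    (vector[toy_stem(word)] ||= []) << i + 1
  end
  vector
end

# Supports only "term" and "term & term" queries.
def toy_match?(text, query)
  vector = toy_tsvector(text)
  terms = query.split('&')
               .map { |t| t.strip.downcase }
               .reject { |t| STOP_WORDS.include?(t) }
               .map { |t| toy_stem(t) }
  return false if terms.empty?   # a query made only of stop words matches nothing
  terms.all? { |t| vector.key?(t) }
end
```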
Full-text Search
• ALTER TABLE mytable ADD COLUMN search_vector tsvector;
• UPDATE mytable SET search_vector = to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,'') || ' ' || coalesce(tags,''));
• CREATE INDEX search_text ON mytable USING gin(search_vector);
• SELECT some, columns, we, need
  FROM mytable
  WHERE search_vector @@ to_tsquery('english', 'Jesus & awesome')
  ORDER BY ts_rank(search_vector, to_tsquery('english', 'Jesus & awesome')) DESC;
• CREATE TRIGGER search_update BEFORE INSERT OR UPDATE
  ON mytable FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body, tags);
Full-text Search
• CREATE FUNCTION search_trigger() RETURNS trigger AS $$
  begin
    new.search_vector :=
      setweight(to_tsvector('english', coalesce(new.title,'')), 'A') ||
      setweight(to_tsvector('english', coalesce(new.body,'')), 'D') ||
      setweight(to_tsvector('english', coalesce(new.tags,'')), 'B');
    return new;
  end
  $$ LANGUAGE plpgsql;
• CREATE TRIGGER search_vector_update BEFORE INSERT OR UPDATE OF title, body, tags
  ON mytable FOR EACH ROW EXECUTE PROCEDURE search_trigger();
Full-text Search
• A variety of dictionaries
  – Various languages
  – Thesaurus
  – Snowball, Stem, Ispell, Synonym
  – Write your own
• ts_headline
  – Snippet extraction and highlighting
Datatypes: ranges
• int4range, int8range, numrange, tsrange, tstzrange, daterange
• SELECT int4range(10,20) @> 3 == false
• SELECT numrange(11.1,22.2) && numrange(20.0,30.0) == true
• SELECT int4range(10,20) * int4range(15,25) == [15,20)
• CREATE INDEX res_index ON schedule USING gist(during)
• ALTER TABLE schedule ADD EXCLUDE USING gist (during WITH &&)
ERROR: conflicting key value violates exclusion constraint "schedule_during_excl"
DETAIL: Key (during)=(["2010-01-01 14:45:00","2010-01-01 15:45:00")) conflicts with existing key (during)=(["2010-01-01 14:30:00","2010-01-01 15:30:00")).
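PostgreSQL ranges default to `[)` bounds: the lower bound is inclusive and the upper bound is exclusive. That is the same shape as Ruby's three-dot `a...b`. The hypothetical helpers below (`range_contains?`, `range_overlaps?`, `range_intersection`, and a toy `Schedule`) mirror `@>`, `&&`, `*`, and the exclusion constraint in plain Ruby, to make the semantics concrete. They are an illustration, not a replacement. An application-level check like `Schedule#book` can race between two concurrent requests, and that race is exactly what the database's exclusion constraint prevents.

```ruby
# Hypothetical helpers mirroring PostgreSQL range operators on Ruby's
# exclusive ranges (a...b), which share the default [a, b) bounds.

def range_contains?(range, value)   # range @> element
  range.cover?(value)
end

def range_overlaps?(a, b)           # range && range
  a.begin < b.end && b.begin < a.end
end

def range_intersection(a, b)        # range * range
  lo = [a.begin, b.begin].max
  hi = [a.end, b.end].min
  lo < hi ? (lo...hi) : nil         # PostgreSQL returns 'empty' instead of nil
end

# App-level imitation of EXCLUDE USING gist (during WITH &&).
class Schedule
  def initialize
    @bookings = []
  end

  def book(during)
    conflict = @bookings.find { |existing| range_overlaps?(existing, during) }
    raise ArgumentError, "#{during} conflicts with existing #{conflict}" if conflict
    @bookings << during
    during
  end
end
```

Because the upper bound is exclusive, a booking that starts exactly when another ends (15:30 after 14:30...15:30) does not overlap.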
Datatypes: hstore
• properties
  – "author"=>"John Grisham", "pages"=>"535"
  – "director"=>"Jon Favreau", "runtime"=>"126"
• SELECT … FROM mytable WHERE properties -> 'director' LIKE '%Favreau'
  – Does not use an index
• WHERE properties ? 'author' AND properties -> 'author' LIKE '%Grisham'
  – Uses the index so that only rows whose properties have an 'author' key are checked
• CREATE INDEX table_properties ON mytable USING gin(properties)
Datatypes: arrays
• CREATE TABLE sal_emp (name text, pay_by_quarter integer[], schedule text[][])
• CREATE TABLE tictactoe (squares integer[3][3])
• INSERT INTO tictactoe VALUES ('{{1,2,3},{4,5,6},{7,8,9}}')
• SELECT squares[1:2][1:1] FROM tictactoe == {{1},{4}}
• SELECT squares[2:3][2:3] FROM tictactoe == {{5,6},{8,9}}
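PostgreSQL array subscripts start at 1, and both ends of a slice are inclusive. This trips up people used to 0-based languages. The hypothetical `pg_slice` helper below (not a PostgreSQL or Ruby API) reproduces the two `tictactoe` queries above on a nested Ruby array.

```ruby
# Hypothetical helper: PostgreSQL slices are 1-based and inclusive at both
# ends, so squares[1:2][1:1] maps to Ruby's 0-based rows 0..1, columns 0..0.
def pg_slice(matrix, rows, cols)
  matrix[(rows.begin - 1)..(rows.end - 1)].map do |row|
    row[(cols.begin - 1)..(cols.end - 1)]
  end
end

SQUARES = [[1, 2, 3], [4, 5, 6], [7, 8, 9]].freeze
```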
Datatypes: JSON
• Validates JSON structure
• Converts rows to JSON
• Functions and operators very similar to hstore
Datatypes: XML
• Validates well-formed XML
• Stored like a TEXT field
• XML operations like XPath
• Can’t index an XML column directly, but you can index the result of an XPath function
Data compression with TOAST
• TOAST = The Oversized-Attribute Storage Technique
• TOASTable data is automatically TOASTed
• Example:
  – Stored a 2.2 MB XML document
  – Storage size was 81 KB
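TOAST compresses a value, and moves it out of line if needed, once a row grows past a threshold that is normally about 2 kB. Markup-heavy text like XML tends to compress very well. The slide's example works out to roughly 27:1 (2.2 MB / 81 KB). The sketch below only illustrates that effect. It uses Ruby's `Zlib` as a stand-in, whereas PostgreSQL actually uses its own pglz algorithm. `sample_xml` and `compression_ratio` are hypothetical helpers.

```ruby
require 'zlib'

# Hypothetical, deliberately repetitive XML (not the document from the slide).
def sample_xml(count)
  rows = (1..count).map { |i| %(<book id="#{i}"><title>Title #{i}</title></book>) }
  "<books>#{rows.join}</books>"
end

# Original size divided by compressed size. Zlib stands in for pglz,
# PostgreSQL's own compressor, so real TOAST ratios will differ.
def compression_ratio(text)
  text.bytesize.fdiv(Zlib::Deflate.deflate(text).bytesize)
end
```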
User-created datatypes
• Built-in types
  – Numerics, monetary, binary, time, date, interval, boolean, enumerated, geometric, network address, bit string, text search, UUID, XML, JSON, array, composite, range
  – Add-ons for more, such as UPC and ISBN
• Create your own types
  – Address (contains 2 streets, city, state, zip, country)
  – Define how your datatype is indexed
  – GIN and GiST indexes are used by custom datatypes
Further exploration: PostGIS
• Adds geographic datatypes
• Distance, area, union, intersection, perimeter
• Spatial indexes
• Tools to load available geographic data
• Distance, Within, Overlaps, Touches, Equals, Contains, Crosses
• SELECT name, ST_AsText(geom)
  FROM nyc_subway_stations
  WHERE name = 'Broad St'
• SELECT name, boroname
  FROM nyc_neighborhoods
  WHERE ST_Intersects(geom, ST_GeomFromText('POINT(583571 4506714)', 26918))
• SELECT sub.name, nh.name, nh.boroname
  FROM nyc_neighborhoods AS nh
  JOIN nyc_subway_stations AS sub ON ST_Contains(nh.geom, sub.geom)
  WHERE sub.name = 'Broad St'
Further exploration: Functions
• Can be used in queries
• Can be used in stored procedures and triggers
• Can be used to build indexes
• Can be used as table defaults
• Can be written in PL/pgSQL, PL/Tcl, PL/Perl, PL/Python out of the box
• PL/V8 is available as an extension to use JavaScript
Further exploration: PLV8
• CREATE OR REPLACE FUNCTION plv8_test(keys text[], vals text[])
  RETURNS text AS $$
    var o = {};
    for (var i = 0; i < keys.length; i++) {
      o[keys[i]] = vals[i];
    }
    return JSON.stringify(o);
  $$ LANGUAGE plv8 IMMUTABLE STRICT;

  SELECT plv8_test(ARRAY['name','age'], ARRAY['Tom','29']);
• CREATE TYPE rec AS (i integer, t text);
  CREATE FUNCTION set_of_records() RETURNS SETOF rec AS $$
    plv8.return_next({"i": 1, "t": "a"});
    plv8.return_next({"i": 2, "t": "b"});
  $$ LANGUAGE plv8;

  SELECT * FROM set_of_records();
Further exploration: Async commands / indexes
• Fine-grained control through libpq’s asynchronous client functions
  – PQsendQuery
  – PQsendQueryParams
  – PQsendPrepare
  – PQsendQueryPrepared
  – PQsendDescribePrepared
  – PQgetResult
  – PQconsumeInput
• Per-connection asynchronous commits
  – SET synchronous_commit = off
• Concurrent index creation to avoid blocking large tables
  – CREATE INDEX CONCURRENTLY big_index ON mytable (things)
ARCHITECTURE
And finally…
Biggest Issue with Frameworks
• Framework dependency
• Trying to do everything in application code
• Race conditions
• Package dependency
Old School
• Service Oriented Architecture
  – Getting more popular because of REST
  – Had been happening for years prior with WSDL
• The database managed your data
  – Constraints, triggers, functions, stored procedures
  – If it was in the database…it was valid
• Nothing has changed…this is still the best way
If you really leverage your database…
• You can easily break your application into logical parts
• You don’t need to build APIs through your core code base when direct DB access is already available
• You can use a different language for certain things if it makes sense to do so
  – Node.js is great for APIs
  – Using a library that only runs on Windows
• The database can provide granular access controls
Architecture: Before
Architecture: After
Architecture: Scaled
THANKS!
Credits / Sources
• NOTE: Some code samples in this presentation have minor alterations for presentation clarity (such as leaving out dictionary specifications on some search calls, etc.)
• http://www.postgresql.org/docs/9.2/static/index.html
• http://workshops.opengeo.org/postgis-intro/
• http://stackoverflow.com/questions/15983152/how-can-i-find-out-how-big-a-large-text-field-is-in-postgres
• https://devcenter.heroku.com/articles/heroku-postgres-extensions-postgis-full-text-search
• http://railscasts.com/episodes/345-hstore?view=asciicast
• http://www.slideshare.net/billkarwin/full-text-search-in-postgresql
• http://sourceforge.net/apps/mediawiki/postgres-xc/index.php?title=Main_Page
• http://railscasts.com/episodes/151-rack-middleware
• http://joshrendek.com/2012/11/sidekiq-vs-resque/
• http://torquebox.org/news/2011/10/06/torquebox-2x-performance/
• http://jruby.org/
• https://rvm.io/
• http://ddollar.github.io/foreman/
• http://en.wikipedia.org/wiki/Ruby_(programming_language)
• http://bundler.io/
• http://www.techempower.com/benchmarks/#section=data-r9&hw=peak&test=json