Demystifying PostgreSQL (ZendCon 2010)
DESCRIPTION
LAMP was originally Linux, Apache, MySQL, PHP. While the L and A parts have become more flexible, most still use MySQL. With MySQL's recent acquisition by Oracle, there's no better time to demystify PostgreSQL. For years PostgreSQL has had a reputation for being difficult, but this is far from the truth. This presentation by Asher Snyder will cover installation, basic queries, stored procedures, triggers, full-text search, and more.
TRANSCRIPT
Demystifying
Presented by
Asher Snyder
@ashyboy
co-founder of
Latest PostgreSQL version: 9.0.1
www.postgresql.org
What is PostgreSQL?
• Completely Open Source Database System
  – Started in 1995
  – Open source project
  – Not owned by any one company
  – Controlled by the community
  – Can’t be bought (Looking at you, ORACLE)
• ORDBMS?
  – Object-relational database management system
• RDBMS with object-oriented database model
  – That’s right, it’s not just an RDBMS
  – You can create your own objects
  – You can inherit
• Fully ACID compliant
  – Atomicity, Consistency, Isolation, Durability. Guarantees that database transactions are processed reliably.
• ANSI SQL compliant
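To make the "you can inherit" point concrete, here is a minimal sketch of table inheritance. The cities and capitals tables and their values are hypothetical and purely illustrative; they are not part of the schema used in the rest of the talk.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE cities (
    name       TEXT,
    population INTEGER
);

-- capitals inherits every column of cities and adds one of its own.
CREATE TABLE capitals (
    country TEXT
) INHERITS (cities);

-- Made-up sample values; inherited columns come first in capitals.
INSERT INTO cities   VALUES ('Example Town', 1000);
INSERT INTO capitals VALUES ('Example Capital', 5000, 'Exampleland');

-- Returns rows stored in cities and in its child table capitals.
SELECT name, population FROM cities;

-- ONLY restricts the query to rows stored in cities itself.
SELECT name, population FROM ONLY cities;
```

Querying the parent table includes child rows by default, which is why the ONLY keyword exists.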
Notable Features
• Transactions
• Functions (Stored procedures)
• Rules
• Views
• Triggers
• Inheritance
• Custom Types
• Referential Integrity
• Array Data Types
• Schemas
• Hot Standby (as of 9.0)
• Streaming Replication (as of 9.0)
Support
• Excellent Personal Support!
  – Vibrant Community
    • Active Mailing List
    • Active IRC - #postgresql on irc.freenode.net
  – Absolutely amazing
• Complete, Extensive and Detailed Documentation
  – http://www.postgresql.org/docs/
• Regular and frequent releases and updates
  – Public Roadmap
• Support for older builds
  – Currently support and release updates to builds as old as 5 years
Support
• Numerous Shared Hosts
  – Such as A2hosting (http://www.a2hosting.com)
• Numerous GUI Administration Tools
  – pgAdmin (http://www.pgadmin.org/)
  – phpPgAdmin (http://phppgadmin.sourceforge.net/)
  – Notable commercial tools
    • Navicat (http://www.navicat.com)
    • EMS (http://sqlmanager.net)
  – Many many more
    • http://wiki.postgresql.org/wiki/Community_Guide_to_PostgreSQL_GUI_Tools
Installation
• Despite what you’ve heard, Postgres is NOT hard to install
On Ubuntu or Debian:
$ apt-get install postgresql
On Gentoo:
$ emerge postgresql
Manual Installation:
$ adduser postgres
$ mkdir /usr/local/pgsql/data
$ chown postgres /usr/local/pgsql/data
$ su - postgres
$ /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
$ /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
Installation (cont)
• Regardless of which installation method you choose, make sure to modify the postgresql.conf and pg_hba.conf configuration files if you want to allow outside access.
pg_hba.conf - Controls which hosts are allowed to connect
# TYPE  DATABASE  USER  CIDR-ADDRESS  METHOD
host    all       all   0.0.0.0/0     md5
Allows connections from any outside address, authenticated with md5 passwords
postgresql.conf - PostgreSQL configuration file
# - Connection Settings
listen_addresses = '*'            # IP address to listen on
#listen_addresses = 'localhost'   # default
Allows PostgreSQL to listen on any address
Getting Started
Start from distro
$ /etc/init.d/postgresql start
Alternatively, you can start it manually
$ /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
Creating Your Database
Launch psql
$ psql
Create your first PostgreSQL database
CREATE DATABASE first_db;
Connect to your newly created database
\c first_db
Tables
Tables - Creating
CREATE TABLE users (
  "user_id" SERIAL,
  "email" TEXT,
  "firstname" TEXT,
  "lastname" TEXT,
  "password" TEXT,
  PRIMARY KEY("user_id")
);
• What’s a SERIAL?
  – Short for INTEGER NOT NULL with default value of nextval('users_user_id_seq')
• Similar to AUTO_INCREMENT property of other databases.
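As a rough sketch of what SERIAL stands for, the same column can be written out by hand. The table name users_manual, its sequence name, and the sample email are hypothetical, chosen only so the example does not clash with the users table above.

```sql
-- Roughly what "user_id" SERIAL expands to, spelled out manually.
-- users_manual and its sequence are hypothetical names for illustration.
CREATE SEQUENCE users_manual_user_id_seq;

CREATE TABLE users_manual (
    "user_id" INTEGER NOT NULL DEFAULT nextval('users_manual_user_id_seq'),
    "email"   TEXT,
    PRIMARY KEY ("user_id")
);

-- Tie the sequence to the column so it is dropped together with it.
ALTER SEQUENCE users_manual_user_id_seq OWNED BY users_manual.user_id;

-- The default fires whenever user_id is omitted from an INSERT.
INSERT INTO users_manual (email) VALUES ('someone@example.com');

-- Inspect the value most recently handed out in this session.
SELECT currval('users_manual_user_id_seq');
```

Because the counter lives in an ordinary sequence object, it can be inspected or reset independently of the table.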
Tables - Inserting
We can explicitly name each column and the value associated with it. This will set user_id to its default value.
INSERT INTO users (email, firstname, lastname, password)
VALUES ('[email protected]', 'Asher', 'Snyder', 'Pass');
Alternatively, we can omit the column names and insert based on the column order, using DEFAULT for the user_id.
INSERT INTO users
VALUES (DEFAULT, '[email protected]', 'John', 'Doe', 'OtherPass');
Let's see the results
SELECT * FROM users;
 user_id |       email       | firstname | lastname | password
---------+-------------------+-----------+----------+-----------
       1 | [email protected] | Asher     | Snyder   | Pass
       2 | [email protected] | John      | Doe      | OtherPass
Views
Views
• Views allow you to store a query for easy retrieval later.
  – Query a view as if it were a table
  – Allows you to name your query
  – Use views as types
  – Abstract & encapsulate table structure changes
    • Allows for easier modification & extension of your database
Create basic view
CREATE VIEW v_get_all_users AS SELECT * FROM users;
Query the view
SELECT * FROM v_get_all_users;
 user_id |       email       | firstname | lastname | password
---------+-------------------+-----------+----------+-----------
       1 | [email protected] | Asher     | Snyder   | Pass
       2 | [email protected] | John      | Doe      | OtherPass
Views (cont)
Alter Table
ALTER TABLE users ADD COLUMN "time_created" TIMESTAMP WITHOUT TIME ZONE DEFAULT
Now, if we were to query the table we would see a timestamp showing us when the user was created.
SELECT * FROM users;
 user_id |       email       | firstname | lastname | password  |        time_created
---------+-------------------+-----------+----------+-----------+----------------------------
       1 | [email protected] | Asher     | Snyder   | Pass      | 2010-10-27 14:30:07.335936
       2 | [email protected] | John      | Doe      | OtherPass | 2010-10-27 14:30:07.335936
As you can see, this is not very useful for humans. This is where a view can come in to make your life easier.
Views (cont)
Alter View
CREATE OR REPLACE VIEW v_get_all_users
AS
SELECT user_id,
       email,
       firstname,
       lastname,
       password,
       time_created,
       to_char(time_created, 'FMMonth FMDDth, YYYY FMHH12:MI:SS AM') AS friendly_time
FROM users;
Views (cont)
Now, when we query the view we can actually interpret time_created
SELECT * FROM v_get_all_users;
 user_id |       email       | firstname | lastname | password  |        time_created        |         friendly_time
---------+-------------------+-----------+----------+-----------+----------------------------+-------------------------------
       1 | [email protected] | Asher     | Snyder   | Pass      | 2010-10-27 14:30:07.335936 | October 27th, 2010 2:30:07 PM
       2 | [email protected] | John      | Doe      | OtherPass | 2010-10-27 15:20:05.235936 | October 27th, 2010 3:20:05 PM
Views (cont) – Joined View
Finally, let's create a joined view
Create companies table
CREATE TABLE companies (
  "company_id" SERIAL,
  "company_name" TEXT,
  PRIMARY KEY("company_id")
);
Add company
INSERT INTO companies VALUES (DEFAULT, 'NOLOH LLC.');
Add company_id to users
ALTER TABLE users ADD COLUMN company_id INTEGER;
Update users
UPDATE users SET company_id = 1;
Views (cont) – Joined View
Alter view
CREATE OR REPLACE VIEW v_get_all_users
AS
SELECT user_id,
       email,
       firstname,
       lastname,
       password,
       time_created,
       to_char(time_created, 'FMMonth FMDDth, YYYY FMHH12:MI:SS AM') AS friendly_time,
       t2.company_id,
       t2.company_name
FROM users t1
LEFT JOIN companies t2 ON (t1.company_id = t2.company_id);
Views (cont)
Query view
SELECT * FROM v_get_all_users;
 user_id |       email       | firstname | lastname | password  |        time_created        |         friendly_time         | company_id | company_name
---------+-------------------+-----------+----------+-----------+----------------------------+-------------------------------+------------+--------------
       1 | [email protected] | Asher     | Snyder   | Pass      | 2010-10-27 14:30:07.335936 | October 27th, 2010 2:30:07 PM |          1 | NOLOH LLC.
       2 | [email protected] | John      | Doe      | OtherPass | 2010-10-27 15:20:05.235936 | October 27th, 2010 3:20:05 PM |          1 | NOLOH LLC.
Nice! Now, instead of having to modify a query each time, we can just use v_get_all_users. We can even use this VIEW as a return type when creating our own database functions.
Functions
Functions
• Also known as Stored Procedures
• Allows you to carry out operations that would normally take several queries and round-trips in a single function within the database
• Allows database re-use as other applications can interact directly with your stored procedures instead of a middle-tier or duplicating code
• Can be used in other functions
• First class citizens
  – Query functions like tables
  – Create functions in language of your choice
    • SQL, PL/pgSQL, C, Python, etc.
  – Allowed to modify tables and perform multiple operations
  – Defaults
  – In/Out parameters
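The "Defaults", "In/Out parameters" and "query functions like tables" points can be sketched together. The function name f_company_stats and its parameter names are hypothetical; the sketch assumes the users table already has the company_id and time_created columns added in the Views section.

```sql
-- Hypothetical helper, not part of the talk: summarizes the users of a company.
-- p_company_id has a default; the two OUT parameters define the result row.
CREATE FUNCTION f_company_stats(
    p_company_id    INTEGER DEFAULT 1,
    OUT user_count  BIGINT,
    OUT newest_user TIMESTAMP WITHOUT TIME ZONE
) AS
$func$
    SELECT count(*), max(time_created)
    FROM users
    WHERE company_id = $1;
$func$
LANGUAGE SQL;

-- Query the function like a table; the default company_id of 1 is used.
SELECT * FROM f_company_stats();

-- Override the default and pick a single output column.
SELECT user_count FROM f_company_stats(2);
```

With OUT parameters there is no RETURNS clause to write; the result row is made up of the OUT columns.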
Functions (cont)
Create basic function
CREATE FUNCTION f_add_company(p_name TEXT) RETURNS INTEGER AS
$func$
  INSERT INTO companies (company_name) VALUES ($1) RETURNING company_id;
$func$
LANGUAGE SQL;
Call f_add_company
SELECT f_add_company('Google');
 f_add_company
---------------
             2
(1 row)
Functions (cont) – Syntax Explained
• Return type
  – In this case we used INTEGER
    • Can be anything you like, including your views and own custom types
  – Can even return an ARRAY of types, such as INTEGER[]
    • Do not confuse this with returning a SET of types.
    • Returning multiple rows of a particular type is done through RETURNS SETOF. For example, RETURNS SETOF v_get_all_users.
• $func$ is our delimiter separating the function body. It can be any string you like. For example $body$ instead of $func$.
• $1 refers to the parameter corresponding to that number. In PL/pgSQL we can also use the parameter name instead, which for our parameter would be p_name.
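A set-returning function that uses the view as its return type might look like the following sketch. The function name f_get_company_users is hypothetical, and the sketch assumes the joined version of v_get_all_users (with a company_id column) is in place.

```sql
-- Hypothetical function: returns every row of the view for one company.
-- The view name doubles as the row type of the result set.
CREATE FUNCTION f_get_company_users(p_company_id INTEGER)
RETURNS SETOF v_get_all_users AS
$func$
    SELECT * FROM v_get_all_users WHERE company_id = $1;
$func$
LANGUAGE SQL;

-- A set-returning function is queried in the FROM clause, like a table.
SELECT email, company_name FROM f_get_company_users(1);
```

Returning INTEGER[] instead would produce a single row containing an array, which is the distinction the bullets above draw between ARRAY and SETOF.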
Functions (cont) – PL/pgSQL
• If you’re using PostgreSQL < 9.0 make sure you add PL/pgSQL support
CREATE TRUSTED PROCEDURAL LANGUAGE "plpgsql"
  HANDLER "plpgsql_call_handler"
  VALIDATOR "plpgsql_validator";
Functions (cont) – PL/pgSQL
CREATE OR REPLACE FUNCTION f_add_company(p_name TEXT) RETURNS INTEGER
AS
$func$
DECLARE
  return_var INTEGER;
BEGIN
  -- Look for an existing company, ignoring case
  SELECT INTO return_var company_id FROM companies
    WHERE lower(company_name) = lower($1);
  -- Only insert when no match was found
  IF NOT FOUND THEN
    INSERT INTO companies (company_name) VALUES ($1)
      RETURNING company_id INTO return_var;
  END IF;
  RETURN return_var;
END;
$func$
LANGUAGE plpgsql;
Functions (cont)
Call f_add_company
SELECT f_add_company('Zend');
 f_add_company
---------------
             3
(1 row)
Call f_add_company with repeated entry
SELECT f_add_company('Google');
 f_add_company
---------------
             2
(1 row)
• We can see that using functions in our database lets us build safeguards and business logic into the database itself, allowing for increased modularity and re-use.
Triggers
Triggers
• Specifies a function to be called BEFORE or AFTER any INSERT, UPDATE, or DELETE operation
  – Similar to an event handler (but clearly confined) in event-based programming
  – BEFORE triggers fire before the data is actually inserted.
  – AFTER triggers fire after the statement is executed and data is inserted into the row
• Function takes NO parameters and returns type TRIGGER
• Most commonly used for logging or validation
• Can be defined on entire statement or a per-row basis
  – Statement level triggers should always return NULL
• As of 9.0 can be specified for specific columns and specific WHEN conditions
Triggers (cont)
Create logs table
CREATE TABLE logs (
  "log_id" SERIAL,
  "log" TEXT,
  PRIMARY KEY("log_id")
);
Create trigger handler
CREATE FUNCTION "tr_log_handler"()
RETURNS trigger AS
$func$
DECLARE
  log_string TEXT;
BEGIN
  -- OLD holds the row before the update, NEW the row after it
  log_string := 'User ' || OLD.user_id || ' changed ' || CURRENT_TIMESTAMP || ' ';
  IF NEW.email != OLD.email THEN
    log_string := log_string || 'email changed from ' || OLD.email || ' to ' || NEW.email || '. ';
  END IF;
  IF NEW.firstname != OLD.firstname THEN
    log_string := log_string || 'firstname changed from ' || OLD.firstname || ' to ' || NEW.firstname || '. ';
  END IF;
  IF NEW.lastname != OLD.lastname THEN
    log_string := log_string || 'lastname changed from ' || OLD.lastname || ' to ' || NEW.lastname || '. ';
  END IF;
  INSERT INTO logs (log) VALUES (log_string);
  RETURN NEW;
END;
$func$
LANGUAGE plpgsql;
Triggers (cont)
Create trigger
CREATE TRIGGER "tr_log" AFTER UPDATE ON users
  FOR EACH ROW EXECUTE PROCEDURE "tr_log_handler"();
Update user
UPDATE users SET firstname = 'Ash' WHERE user_id = 1;
Display logs
SELECT * FROM logs;
 log_id | log
--------+-------------------------------------------------------------------------------------
      1 | User 1 changed 2010-08-30 23:43:23.771347-04 firstname changed from Asher to Ash.
(1 row)
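The 9.0 column and WHEN support mentioned on the Triggers slide could be used roughly as follows. This is only a sketch: the trigger name tr_log_email is hypothetical, and it reuses the tr_log_handler function defined above.

```sql
-- Fires only when email is a target column of the UPDATE statement
-- and its value actually changes (both clauses need PostgreSQL 9.0+).
CREATE TRIGGER "tr_log_email"
    AFTER UPDATE OF email ON users
    FOR EACH ROW
    WHEN (OLD.email IS DISTINCT FROM NEW.email)
    EXECUTE PROCEDURE "tr_log_handler"();
```

IS DISTINCT FROM is used so that a change to or from NULL also counts as a change. If tr_log is still defined on the table, an email update fires both triggers and writes two log rows, so in practice one would replace the other.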
Full-Text Search
Full-Text Search
• Allows documents to be preprocessed
• Search through text to find matches
• Sort matches based on relevance
  – Apply weights to certain attributes to increase/decrease relevance
• Faster than LIKE, ILIKE, and LIKE with regular expressions
• Define your own dictionaries
  – Define your own words, synonyms, phrase relationships, word variations
Full-Text Search
• tsvector
  – Vectorized text data
• tsquery
  – Search predicate
• Converted to normalized lexemes
  – Ex. run, runs, ran and running are forms of the same lexeme
• @@
  – Match operator
Full-Text Search (cont)
Match Test 1
SELECT 'zendcon 2010 is a great conference'::tsvector @@ 'conference & great'::tsquery;
 ?column?
----------
 t
(1 row)
Match Test 2
SELECT 'zendcon 2010 is a great conference' @@ 'conference & bad';
 ?column?
----------
 f
(1 row)
Full-Text Search (cont) – to_
to_tsvector & plainto_tsquery
SELECT to_tsvector('zendcon 2010 is a great conference') @@ plainto_tsquery('conference & great');
 ?column?
----------
 t
(1 row)
Search through tables
SELECT * FROM companies WHERE to_tsvector(company_name) @@ to_tsquery('unlimited');
 company_id | company_name
------------+--------------
(0 rows)
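For larger tables, a table search like the one above is usually paired with an index. The following is a sketch under assumptions: the index name idx_companies_name_fts is hypothetical, and 'english' is assumed to be the desired text search configuration.

```sql
-- Expression index over the normalized company names.
-- The two-argument to_tsvector is used because an index expression
-- must not depend on the session's default_text_search_config.
CREATE INDEX idx_companies_name_fts
    ON companies
    USING gin (to_tsvector('english', company_name));

-- The query has to repeat the same expression, configuration name
-- included, for the planner to consider the index.
SELECT *
FROM companies
WHERE to_tsvector('english', company_name) @@ to_tsquery('english', 'unlimited');
```

On a table as small as companies the planner will most likely still choose a sequential scan; the index pays off as the row count grows.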
Full-Text Search
• setweight
  – Possible weights are A, B, C, D
    • equivalent to 1.0, 0.4, 0.2, 0.1 respectively
• ts_rank
  – Normal ranking function
• ts_rank_cd
  – Uses the cover density method of ranking, as specified in Clarke, Cormack, and Tudhope’s “Relevance Ranking for One to Three Term Queries” in the journal “Information Processing and Management”, 1999.
Full-Text Search (cont) - Ranking
setweight & ts_rank_cd
SELECT case_id, file_name,
       ts_rank_cd(setweight(to_tsvector(coalesce(t1.file_name, '')), 'A') ||
                  setweight(to_tsvector(coalesce(t1.contents, '')), 'B'), query) AS rank
FROM cases.cases t1, plainto_tsquery('copyright') query
WHERE to_tsvector(file_name || ' ' || contents) @@ query
ORDER BY rank DESC LIMIT 10;
 case_id | file_name                                            | rank
---------+------------------------------------------------------+------
     101 | Harper & Row v Nation Enter 471 US 539.pdf           | 84.8
     113 | IN Re Literary Works Elec Databases 509 F3d 522.pdf  | 76
     215 | Lexmark Intl v Static Control 387 F3d 522.pdf        | 75.2
     283 | Feist Pubs v Rural Tel 499 US 340.pdf                | 67.2
     216 | Lucks Music Library v Ashcroft 321 F Supp 2d 107.pdf | 59.2
     342 | Blue Nile v Ice 478 F Supp 2d 1240.pdf               | 50.8
     374 | Perfect 10 v Amazon 487 F3d 701.pdf                  | 43.6
      85 | Pillsbury v Milky Way 215 USPQ 124.pdf               | 43.6
     197 | St Lukes Cataract v Sanderson 573 F3d 1186.pdf       | 42
     272 | Religious Technology v Netcom 907 F Supp 1361.pdf    | 42
(10 rows)
Communication
Communication - Native
• Use native pgsql functions
  – pg_connect, pg_query, pg_fetch_row, etc.
Connect to database
$dbconn = pg_connect("dbname=mary");
// alternatively connect with host & port information
$dbconn2 = pg_connect("host=localhost port=5432 dbname=mary");
Query database
$users = pg_query('SELECT * FROM users');
while ($user = pg_fetch_row($users))
    print_r($user);
Communication – 3rd Party
• PDO
• Doctrine
• Propel
• Redbean
• Zend Framework
• phpDataMapper
• PHP Framework Implementation
  – Most PHP Frameworks provide some sort of communication and connection functionality
Communication –
Create Connection
$firstDB = new DataConnection(Data::Postgres, 'first_db');
Query database
$users = $firstDB->ExecSQL('SELECT * FROM users');
Query database with params
$firstDB->ExecSQL('INSERT INTO companies (company_name) VALUES($1)', $company);
Query function
$user = $firstDB->ExecFunction('f_add_user', $email, $first, $last, $pass, $company);
Query view
$users = $firstDB->ExecView('v_get_all_users');
Data::$Links – Persistent method to connect and access your database
Data::$Links->FirstDb = new DataConnection(Data::Postgres, 'first_db');
Data::$Links->FirstDb->ExecSQL('SELECT * FROM users');
Questions?
@ashyboy
Presented by
Asher Snyder
co-founder of