algebra y sql
DESCRIPTION
AR Y SQLTRANSCRIPT
Wolf-Tilo Balke
Joachim Selke
Institut für Informationssysteme
Technische Universität Braunschweig
www.ifis.cs.tu-bs.de
Relational
Database Systems 1
• Implementing the relational model
• SQL
– Queries
• SELECT
– Data manipulation language (next lecture)
• INSERT
• UPDATE
• DELETE
– Data definition language (next lecture)
• CREATE TABLE
• ALTER TABLE
• DROP TABLE
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2
Overview
• A hotel database:
Hotel(hotelNo, hotelName, city)
Room(roomNo, hotelNo → Hotel, type, price)
Booking(hotelNo → Hotel/Room, guestNo → Guest,
dateFrom, dateTo, roomNo → Room)
Guest(guestNo, guestName, guestAddress)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3
Exercise 6.1
• Give relational algebra expressions for the following queries in natural language:
a) Create a list of all hotels that includes the total number of rooms for each hotel.
πhotelName, count roomNo(
hotelNo, hotelName𝔉count(roomNo) (Hotel ⋈ Room))
• Of course, this does not work for hotels without any rooms. However, let‟s assume that there is no such hotel ...
b) What‟s the average room price at the Balke Inn?
πaverage price(𝔉average(price) (
ςhotelName = „Balke Inn‟(Hotel) ⋈ Room))
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4
Exercise 6.1
c) How many different people made a reservation for a
room whose price is higher than the average room
price at the respective hotel?
AvgPrice ≔ πhotelNo, average price(
hotelNo𝔉average(price) (Hotel ⋈ Room))
ExpensiveRooms ≔πhotelNo, roomNo(ςprice > average price(Room ⋈ AvgPrice))
RichGuests = πguestNo(Booking ⋈ ExpensiveRooms)
Result = 𝔉count(guestNo)RichGuests
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 5
Exercise 6.1
• Describe (using natural language) the relations
that would be produced by the following
tuple relational calculus expressions:
a) { H.hotelName | hotel(H) ∧ H.city = „Braunschweig‟ }
List the names of all hotels in Braunschweig!
b) { H.hotelName | Hotel(H) ∧ ∃R (
Room(R) ∧ H.hotelNo = R.hotelNo ∧ R.price > 50)}
List the names of all hotels offering a room that
costs more than €50!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 6
Exercise 6.2
c) { H.hotelName | Hotel(H) ∧ ∃B ∃G (
Booking(B) ∧ Guest(G) ∧ H.hotelNo = B.hotelNo
∧ B.guestNo = G.guestNo
∧ G.guestName = „Wolf-Tilo Balke‟)}
List the names of all hotels in that Wolf-Tilo Balke
has or had a reservation!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7
Exercise 6.2
d) { H.hotelName, G.guestName, B1.dateFrom, B2.dateFrom |Hotel(H) ∧ Guest(G) ∧ Booking(B1) ∧ Booking(B2)
∧ H.hotelNo = B1.hotelNo ∧ G.guestNo = B1.guestNo
∧ B2.hotelNo = B1.hotelNo ∧ B2.guestNo = B1.guestNo
∧ B2.dateFrom ≠ B1.dateFrom}
List the names of all hotels and guests that
have or had different reservations at the same hotel;
for each hotel and guest, also list all possible pairs
of starting dates for the corresponding reservations.
• Hint: (hotelNo, guestNo, dateFrom) is the primary key of
Booking, which makes each result line a pair of reservations
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8
Exercise 6.2
• Generate the relational algebra, tuple relational
calculus, and domain relational calculus
expressions for the following queries:
a) List all hotels.
RA: πhotelName(Hotel)
TRC: { h.hotelName | Hotel(h) }
DRC: { h2 | ∃h1, h3 Hotel(h1, h2, h3) }
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9
Exercise 6.3
b) List all single rooms with a price below €20 per night.
RA: πhotelName, city, roomNo(
ςtype = „single‟ ∧ price < 20 (Room ⋈ Hotel))
TRC: { h.hotelName, h.city, r.roomNo |
Hotel(h) ∧ Room(r) ∧ r.type = „single‟
∧ r.price < 20 ∧ h.hotelNo = r.hotelNo }
DRC: { h2, h3, r1 | ∃r4, h1 (Room(r1, h1, „single‟, r4)
∧ r4 < 20 ∧ Hotel(h1, h2, h3)) }
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 10
Exercise 6.3
c) List the price and type of all rooms at the Balke Inn.
RA: πroomNo, type, price(Room ⋈ ςhotelName = „Balke Inn‟(Hotel))
TRC: { r.roomNo, r.roomType, r.price | Room(r)
∧ ∃h (Hotel(h) ∧ h.hotelName = „Balke Inn‟
∧ h.hotelNo = r.hotelNo) }
DRC: { r1, r3, r4 | ∃h1, h3 (Room(r1, h1, r3, r4)
∧ Hotel(h1, „Balke Inn‟, h3)) }
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 11
Exercise 6.3
d) List the details (guestNo, guestName, and guestAddress)
of all guests currently staying at the Balke Inn.
RA: Guest ⋉ ςdateFrom ≤ TODAY ∧ dateTo ≥ TODAY
∧ hotelName = „Balke Inn‟(Hotel ⋈ Booking)
TRC: { g | Guest(g) ∧ ∃b, h (Booking(b) ∧ Hotel(h)
∧ b.guestNo = g.guestNo ∧ b.hotelNo = h.hotelNo
∧ b.dateFrom ≤ TODAY ∧ b.dateTo ≥ TODAY
∧ h.hotelName = „Balke Inn‟) }
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12
Exercise 6.3
DRC: { g1, g2, g3 | Guest(g1, g2, g3) ∧ ∃h1, h3, b3, b4, b5 ( Hotel(h1, „Balke Inn‟, h3)
∧ Booking(h1, g1, b3, b4, b5) ∧ b3 ≤ TODAY
∧ b4 ≥ TODAY) }
e) List the details of all rooms at the Balke Inn,including the name of the guest currently stayingin the room, if the room is occupied.
RA: πroomNo, type, price, guestName(Room
⎧ (ςhotelName = „Balke Inn‟(Hotel)⋈ ςdateFrom ≤ TODAY ∧ dateTo ≥ TODAY(Booking)
⋈ Guest))
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 13
Exercise 6.3
TRC:
BalkeRoom(r) ≔ Room(r) ∧ ∃h (Hotel(h)
∧ h.hotelName = „Balke Inn‟ ∧ h.hotelNo = r.hotelNo)
CurrentBooking(g, r) ≔ Guest(g) ∧ ∃b (Booking(b)
∧ b.hotelNo = r.hotelNo ∧ b.roomNo = r.roomNo
∧ b.dateFrom ≤ TODAY ∧ b.dateTo ≥ TODAY
∧ b.guestNo = g.guestNo)
{ r.roomNo, r.type, r.price, g.guestName | BR(r) ∧
(CB(g, r) ∨ (≦∃g‟ CB(g‟, r) ∧ g = (null, null, null, null))) }
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 14
Exercise 6.3
DRC:
BalkeRoom(r1, r2, r3, r4) ≔ Room(r1, r2, r3, r4)
∧ ∃h3 Hotel(r2, „Balke Inn‟, h3)
CurrentBooking(g2, r1, r2) ≔ ∃g1, g3 (Guest(g1, g2, g3)
∧ ∃b3, b4 (Booking(r2, g1, b3, b4, r1)
∧ b3 ≤ TODAY ∧ b4 ≥ TODAY))
{ r1 r3, r4, g2 | ∃r2 BR(r1, r2, r3, r4) ∧
(CB(g2, r1, r2) ∨ (≦∃g2‟ CB(g2‟, r1, r2) ∧ g2 = null)) }
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 15
Exercise 6.3
• In the early 1970s, the relational modelbecame a “hot topic” database research
– Based on set theory
– A relation is a subset of theCartesian product over a list of domains
• Early “query interfaces” for the relational model:
– Relational algebra
– Tuple relational calculus (SQUARE, SEQUEL)
– Domain relational calculus (QBE)
• Question: How to build a working database management system using this theory?
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 16
7.1 FromTheory to Practice
• System R was the first working prototype of
a relational database system (starting 1973)
– Most design decisions taken during the
development of System R substantially influenced
the design of subsequent systems
• Questions
– How to store and represent data?
– How to query for data?
– How to manipulate data?
– How do you do all this with good performance?
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 17
7.1 FromTheory to Practice
• The challenge of the System R project was to create a working prototype system
– Theory is good
– But developers were willing to sacrifice theoretical beauty and clarity for the sake of usability and performance
• Vocabulary change
– Mathematical terms were too unfamiliar for most people
– Table = relation
– Row = tuple
– Column = attribute
– Data type, domain = domain
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 18
7.1 FromTheory to Practice
• Design decisions:
During the development of System R, two major
and very controversial decisions had been made
– Allow duplicate tuples
– Allow NULL values
• Those decisions are still subject to discussions…
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 19
7.1 FromTheory to Practice
• Duplicates
– In a relation, there cannot be any duplicate tuples
– Also, query results cannot contain duplicates
• Remember: The relational algebra and relational calculi
all had implicit duplicate elimination
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 20
7.1 FromTheory to Practice
• Practical considerations
– You want to query for name and birth year of all students of TU Braunschweig
– The result returns roughly 13,000 tuples
– Probably there are some duplicates
– It‟s 1973, and your computer has 16 kilobytes ofmain memory and a very slow external storage device…
– To eliminate duplicates, you need to store the result,sort it, and scan for adjacent duplicate lines• System R engineers concluded that this effort
is not worth the effect
• Duplicate elimination in result sets happens only on-request
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 21
7.1 FromTheory to Practice
• Decision: Don‟t eliminate duplicates in results
• What about the tables?
– Again: Ensuring that no duplicates end up in the tables requires some work
– Engineers also concluded that there is actually no need in enforcing the no-duplicate policy
• If the user wants duplicates and is willing to deal with all the arising problems – then that‟s fine
• Decision: Allow duplicates in tables
• As a result, the theory underlying relational databases shifted from set theory to multi-set theory
– Straightforward, only notation is more complicated
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 22
7.1 FromTheory to Practice
• Sometimes, an attribute value is not known or
an attribute does not apply for an entity
– Example: What value should the attribute
universityDegree take for the entity Heinz Müller,
if Heinz Müller does not have any degree?
– Example: You regularly observe the weather and
store temperature, wind strength, and
air pressure every hour – and then
your barometer breaks... What now?
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 23
7.1 FromTheory to Practice
• Possible solution:
For each domain, define a value indicating that
data is not available, not known, not applicable, …
– For example, use none for Heinz Müller‟s degree,
use −1 for missing pressure data, ...
– Problem:
• You need such a special value for each domain or use case
• You need special failure handling for queries, e.g.
“compute average of all pressure values that are not −1”
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 24
7.1 FromTheory to Practice
• Again, system designers chose the simplest solution
(regarding implementation): NULL values
– NULL is a special value which is usable in any domain
and represents that data is just there
• There are many interpretations of what NULL actually means
– Systems have some default rules how to deal with
NULL values
• Aggregation functions usually ignore rows with NULL values
(which is good in most, but not all cases)
• Three-valued logic
• However, creates some strange anomalies
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 25
7.1 FromTheory to Practice
• Another tricky problem:
How should users query the DB?
• Classical answer:
– Relational algebra and relational calculi
– Problem: More and more non-expert users
• More “natural” query interfaces:
– QBE (query by example)
– SEQUEL (structured English query language)
– SQL: the current standard; derived from SEQUEL
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 26
7.1 FromTheory to Practice
• The design of relational query languages
– Donald D. Chamberlin and Raymond F. Boyce
worked on this task
– Both of IBM Research in San Jose, California
– Main concern: “Querying relational databases
is too difficult with current paradigms”
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 27
7.2 SQUARE & SEQUEL
• “Current paradigms” at the time:
– Relational algebra
• Requires users to define how and in which order data
should be retrieved
• The specific choice of a sequence of operations has an
enormous influence on the system‟s performance
– Relational calculi (tuple, domain)
• Provide declarative access to data, which is good
• Just state what you want and not how to get it
• Relational calculi are quite complex:
many variables and quantifiers
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 28
7.2 SQUARE & SEQUEL
• Chamberlin and Boyce‟s first result was a
query language called SQUARE
– “Specifying queries as relational expressions”
– Based directly on tuple relational calculus
– Main observations:
• Most database queries are rather simple
• Complex queries are rarely needed
• Quantification confuses people
• Under the closed-world assumption,
any TRC expression with quantifiers can be replaced
by a join of quantifier-free expressions
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 29
7.2 SQUARE & SEQUEL
• SQUARE is a notation for (or interface to) TRC
– No quantifiers, implicit notation of variables
– Adds additional functionality needed in practice
(grouping, aggregating, among others)
– Solves safety problem by introducing the
closed world assumption
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 30
7.2 SQUARE & SEQUEL
• Retrieve the names of all female students
– TRC: { t.name | students(t) ⋀ t.sex = „f ‟ }
– SQUARE: namestudentssex („f ‟)
• Get all exam results better than 2.0 in course 101
– TRC:{ t.result | exams(t) ∧ t.crsNr = 101 ∧ t.result < 2.0}
– SQUARE: resultexamscrsNr, result (101, <2.0)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 31
What part of the
result tuples
should be returned?The range relation of
the result tuples
Attributes with conditions
Conditions
7.2 SQUARE & SEQUEL
• Get a list of all exam results better than 2.0
along with the according student name
– TRC:
{ t1.name, t2.result | students(t1) ⋀ exams(t2)
⋀ t1.matNr = t2.matNr ⋀ t2.result < 2.0 }
– SQUARE:
name result students matNr ⃘ matNr examsresult (<2.0)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 32
Join of two SQUARE queries
7.2 SQUARE & SEQUEL
Also, ∪, ∩, and ∖ can be used
to combine SQUARE queries.
• Also, SQUARE is relationally complete
– You do not need explicit quantifiers
– Everything you need can be done using
conditions and query combining
• However, SQUARE was not well received
– Syntax was difficult to read and parse, especially
when using text console devices:
• name result students matNr ⃘ matNr exams crsNr result (102, <2.0)
– SQUARE‟s syntax is too mathematical and artificial
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 33
7.2 SQUARE & SEQUEL
• In 1974, Chamberlin & Boyce proposed SEQUEL
– Structured English Query Language
– Based on SQUARE
• Guiding principle:
– Use natural English keywords to structure queries
– Supports “fluent” vocalization and notation
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 34
7.2 SQUARE & SEQUEL
• Fundamental keywords
– SELECT: What attributes should be retrieved?
– FROM: What relations are involved?
– WHERE: What conditions should hold?
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 35
7.2 SQUARE & SEQUEL
• Get all exam results better than 2.0 for course 101
– SQUARE:
resultexamscrsNr result (101, < 2.0)
– SEQUEL:
SELECT result FROM exams
WHERE crsNr = 101 AND result < 2.0
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 36
7.2 SQUARE & SEQUEL
• Get a list of all exam results better than 2.0,
along with the according student names
– SQUARE:
name result students matNr ⃘ matNr result examsresult (< 2.0)
– SEQUEL:
SELECT name, result
FROM students, exams
WHERE students.matNr = exams.matNr
AND result < 2.0
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 37
7.2 SQUARE & SEQUEL
• IBM integrated SEQUEL into System R
• It proved to be a huge success
– Unfortunately, the name SEQUEL already
has been registered as a trademark
by the Hawker Siddeley aircraft company
– Name has been changed to SQL (spoken: Sequel)
• Structured query language
– Patented in 1985 by IBM
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 38
7.2 SQUARE & SEQUEL
• Since then, SQL has been adopted by all(?)
relational database management systems
• This created a need for standardization:
– 1986: SQL-86 (ANSI standard, ISO standard)
– SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008
– The official pronunciation is “es queue el”
• However, most database vendors treat the
standard as some kind of “recommendation”
– More on this later (next lecture)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 39
7.2 SQUARE & SEQUEL
• There are three major classes of DB operations:
– Defining relations, attributes, domains, constraints, ...• Data definition language (DDL)
– Adding, deleting and modifying tuples
• Data manipulation language (DML)
– Asking queries
• Often part of the DML
• SQL covers all these classes
• In this lecture, we will use IBM DB2’s SQL dialect
– DB2 is used in our SQL lab
– Similar notation in other RDBMS(at least for the part of SQL we teach you in this lecture)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 40
7.3 Overview of SQL
• According to the SQL standard, relations and other database objects exist in an environment
– Think “environment = RDBMS”
• Each environment consists of catalogs
– Think “catalog = database”
• Each catalog consists of a set of schemas
– Think “schema = group of tables (and other stuff)”
• A schema is a collection of database objects (tables, domains, constraints, ...)
– Each database object belongs to exactly one schema
– Schemas are used to group related database objects
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 41
7.3 Overview of SQL
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 42
7.3 Overview of SQL
Environment
“BigCompany database server”
Catalog
“humanResources”
Schema
“people”
SchemaSchema
“training”
Catalog
“production”
Schema “products”Schema “products”
Schema “testing”
Schema “taxes”Schema “taxes”
Table “staff”
Table
“has Office”
...
...
...
...
• When working with the environment,users connect to a single catalog andhave access to all database objects in this catalog
– However, accessing/combining data objects from different catalogs usually is not possible
– Thus, typically, catalogs are the maximum scopeover that SQL queries can be issued
– In fact, the SQL standard defines an additional layerin the hierarchy on top of catalogs
• Clusters are used to group related catalogs
• According to the standard, they provide the maximum scope
• However, hardly any vendor supports clusters
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 43
7.3 Overview of SQL
• After connecting to a catalog, database objects can be referenced using their qualified name
– Qualified name = schemaname.objectname
• However, when working only with objects from asingle schema, using unqualified names would be nice
– Unqualified name = objectname
• One schema always is defined to be the default schema
– SQL implicitly treats objectname as defaultschema.objectname
– The default schema can be set to schemaname as follows:SET SCHEMA schemaname
– Initially, after login: default schema = current user name• Remember to change the default schema accordingly!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 44
7.3 Overview of SQL
• SQL queries
– Simple SELECT
– Joins
– Set operations
– Aggregation and grouping
– Subqueries
– Writing “good” SQL code
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 45
7.4 SQL Queries
• SQL queries are statements that
retrieve information from a DBMS
– Simplest case: From a single table,
return all rows matching some given criterion
– SQL queries may return multi-sets (bags) of rows
• Duplicates are allowed by default
(but can be eliminated on request)
• Even just a single value is a row set
(one row with one column)
• However, often it‟s just called a result set…
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 46
7.4 SQL Queries
• Basic structure of SQL queries:
–SELECT <attribute list> FROM <table list> WHERE <condition>
– Attribute list: Attributes to be returned (projection)
– Table list: All tables involved
– Condition: A Boolean expression that
intensionally defines the result set
• If no condition is provided, it is implicitly replaced by TRUE
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 47
7.4 SQL Queries
• SQL performs duplicate elimination
of rows in result set
– May be expensive (due to sorting)
– DISTINCT keyword is used
• Example:
– SELECT DISTINCT name FROM staff
• Returns all different names of staff members,
without duplicates
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 48
7.4 Enforcing Sets in SQL
• The SELECT keyword is sometimes a bit
confusing, since it actually denotes a projection
• To return all attributes under consideration,
the wildcard “*” may be used
• Examples:
– SELECT * FROM …Return all attributes of the tables in the FROM clause
– SELECT movies.* FROM movies, persons WHERE ...Return all attributes of the movies table
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 49
7.4 Attribute Names
• Attribute names are qualified or unqualified
– Unqualified: Just the attribute name
• Only possible, if attribute name is unique among
the tables given in the FROM clause
– Qualified: tablename.attributename
• Necessary if tables share identical attribute names
• If tables in the FROM clause share identical attribute names
and also identical table names, we need even more
qualification:
schemaname.tablename.attributename
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 50
7.4 Attribute Names
• The the result set/table consists ofthe attributes given in the SELECT clause
• However, result attributes can be renamed
– Remember the renaming operator ρ fromrelational algebra…
– SQL uses the AS keyword for renaming
– The new names can also be used in the WHERE clause
• Example:
– SELECT persons.personName AS nameFROM persons, … WHERE name = ’Smith’ AND ...• personName is now called name in the result schema
• name = ’Smith’ means persons.personName = ’Smith’
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 51
7.4 Attribute Names
• Table names can be referenced in the same wayas attribute names (qualified or unqualified)
• However, renaming works slightly different
– The result table of an SQL query has no name
– But tables can be given alias names to simplify queries(also called tuple variables or just aliases)
– Indicated by the AS keyword
• Example:SELECT title, genreFROM movies AS m, genres AS gWHERE m.id = g.id
• The AS keyword is optional: FROM movies m, genres g
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 52
7.4 Table Names
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 53
7.4 Basic Select
SELECTSELECT expression
ALLALL
DISTINCTDISTINCT
,,
table name
**
..
column nameASAS
FROMFROM table name
alias name
ASAS
,,
query ))((
WHEREWHERE search condition GROUP BYGROUP BY column namecolumn name
,,
HAVINGHAVING search condition
attribute names
table names
query block
• Usually, SQL queries return exactly those tuples
matching a given search condition
– Indicated by the WHERE keyword
– The condition is a logical expression which can be
applied to each row and may have one of three values
TRUE, FALSE, and NULL
• Again, NULL might mean “unknown,” “does not apply,”
“does not matter,” ...
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 54
7.4 Conditions
• Search conditions are conjunctions of predicates
– Each predicate evaluates to TRUE, FALSE, or NULL
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 55
7.4 Conditions
search condition
NOTNOT
predicate
search condition(( ))
ANDAND
OROR
• Why TRUE, FALSE, and NULL?
– SQL uses so-called ternary (three-valued) logic
– When a predicate cannot be evaluated because itcontains some NULL value, the result will be NULL• Example: powerStrength > 10 evalutes to NULL
iff powerStrength is NULL
• NULL = NULL also evaluates to NULL
• Handling of NULL by the operators AND and OR:
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 56
7.4 Conditions
AND TRUE FALSE NULL
TRUE TRUE FALSE NULL
FALSE FALSE FALSE FALSE
NULL NULL FALSE NULL
OR TRUE FALSE NULL
TRUE TRUE TRUE TRUE
FALSE TRUE FALSE NULL
NULL TRUE NULL NULL
NOT NULL
TRUE FALSE
FALSE TRUE
? NULL
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 57
7.4 Conditionspredicate
expressionexpression
expressionexpression
expressionexpression
expressionexpression
NOTNOT
comparisoncomparison expressionexpression
expressionexpression expressionexpressionBETWEENBETWEEN ANDAND
NOTNOT
NULLNULLISIS
expressionexpression
NOTNOT
LIKELIKE
expressionexpressionESCAPEESCAPE
expressionexpression
NOTNOT
ININ
queryquery(( ))
(( ))expressionexpression
,,
EXISTSEXISTS queryquery(( ))
expressionexpression comparisoncomparison
SOMESOME
ANYANYALLALL
queryquery(( ))
• Predicates usually contain expressions
– Column names or constants
– Additionally, SQL provides some special expressions
• Functions
• CASE expressions
• CAST expressions
• Scalar subqueries
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 58
7.4 Conditions
• Expressions can be combined usingexpression operators
– Arithmetic operators:+, -, *, and / with the usual semantics
• age + 2
• price * quantity
– String concatenation ||: (also written as CONCAT ) Combines two strings into one
• firstName || ’ ’ || lastname || ’ (aka ’ || alias || ’)’
• ’Hello’ CONCAT ’ World’ → ’Hello World’
– Parenthesis:Used to modify the evaluation order
• (price + 10) * 20
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 59
7.4 Conditions
• Simple comparisons:– Valid comparison operators are
• =, <, <=, >=, and >
• <> (meaning: not equal)
– Data types of expressions need to be compatible (if not, CAST has to be used)
– Character values are usually compared lexicographically (while ignoring case)
– Examples:• powerStrength > 10• name = ’Magneto’• ’Magneto’ <= ’Professor X’
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 60
7.4 Conditions
expressionexpression comparisoncomparison expressionexpression
• BETWEEN predicate:
– X BETWEEN Y AND Z is a shortcut for
Y <= X AND X =< Z
– Note that you cannot reverse the order of Z and Y
• X BETWEEN Y AND Z is different from
X BETWEEN Z AND Y
• The expression can never be true if Y > Z
– Examples:
• year BETWEEN 2000 AND 2008
• score BETWEEN targetScore - 10 AND targetScore + 10
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 61
7.4 Conditions
expressionexpression
NOTNOT
expressionexpression expressionexpressionBETWEENBETWEEN ANDAND
• IS NULL predicate:
– The only way to check if a value is NULL or not
– Returns either TRUE or FALSE
– Examples:
• realName IS NOT NULL
• powerStrength IS NULL
• NULL IS NULL
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 62
7.4 Conditions
expressionexpression
NOTNOT
NULLNULLISIS
• LIKE predicate:– The predicate is for matching character strings to patterns
– Match expression must be a character string
– Pattern expression is a (usually constant) string• May not contain column names
– Escape expression is just a single character
– During evaluation, the match expression is compared to the pattern expression with following additions• _ in the pattern expression represents any single character
• % represents any number of arbitrary characters
• The escape character prevents the special semantics of _ and %
– Most modern database nowadays also support more powerful regular expressions (introduced in SQL-99)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 63
7.4 Conditionsmatch
expression
match
expression
pattern
expression
pattern
expressionLIKELIKE
escape
expression
escape
expressionESCAPEESCAPENOTNOT
• Examples:– address LIKE ’%City%’
• ’Manhattan’ → FALSE• ’Gotham City Prison’ → TRUE
– name LIKE ’M%_t_’• ’Magneto’ → TRUE• ’Matt’ → TRUE• ’Mtt’ → FALSE
– status LIKE ’_/_%’ ESCAPE ’/’• ’1_inPrison’ → TRUE• ’1–inPrison’ → FALSE• ’%_%’ → TRUE• ’%%%’ → FALSE
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 64
7.4 Conditions
match
expression
match
expression
pattern
expression
pattern
expressionLIKELIKE
escape
expression
escape
expressionESCAPEESCAPENOTNOT
• IN predicate:
– Evaluates to true if the value of the test expression is
within a given set of values
– Particularly useful when used with a subquery (later)
– Examples:
• name IN (’Magneto’, ’Batman’, ’Dr. Doom’)
• name IN (SELECT title FROM movies)– Those people having a film named after them…
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 65
7.4 Conditions
expressionexpression
NOTNOT
ININ
queryquery(( ))
(( ))expressionexpression
,,
• EXISTS predicate:
– Evaluates to TRUE if a given subquery
returns at least one result row
• Always returns either TRUE or FALSE
– Examples
• EXISTS (SELECT * FROM heroes)
– EXISTS may also be used to express semi-joins
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 66
7.4 Conditions
EXISTSEXISTS queryquery(( ))
• SOME/ANY and ALL:
– Compares an expression to each value provided bya subquery
– TRUE if
• SOME/ANY: At least one comparison returns TRUE
• ALL: All comparisons return TRUE
– Examples:
• result <= ALL (SELECT result FROM results)– TRUE if the current result is the smallest one
• result < SOME (SELECT result FROM results)– TRUE if the current result is not the largest one
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 67
7.4 Conditions
expressionexpression comparisoncomparison
SOMESOME
ANYANYALLALL
queryquery(( ))
• Also, SQL can do joins of multiple tables
• Traditionally, this is performed by simply stating
multiple tables in the FROM clause
– Result contains all possible combinations of all
rows of all tables such that the search condition holds
– If there is no WHERE clause, it‟s a Cartesian product
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 68
7.5 Joins
• Example:
– SELECT * FROM hero, hasAlias
• hero × hasAlias
– SELECT * FROM hero, hasAliasWHERE hero.id = hasAlias.hero_id
• hero ⋈id = hero_id hasAlias
• Besides this common implicit notation of joins,
SQL also supports explicit joins
– Inner joins
– Left/right/full outer joins
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 69
7.5 Joins
• Explicit joins are specified in the FROM clause– SELECT * FROM table1 JOIN table2 ON <join condition>
WHERE <some other condition>
– Often, attributes in joined tables have the same names,
so qualified attributes are needed
– INNER JOIN and JOIN are equivalent
• Explicit joins improve readability of your SQL code!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 70
7.5 Joins
table name
INNER JOININNER JOIN
table name
JOINJOIN
LEFT JOINLEFT JOINRIGHT JOINRIGHT JOIN
FULL JOINFULL JOIN
joined table
ONON condition
Natural join: List students and their exam results– π lastName, crsNr, result students ⋈ exams
– SELECT lastName, crsNr, resultFROM students AS s JOIN exams AS e ON s.matNr = e.matNr
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 71
7.5 Joins
matNr firstName lastName sex
1005 Clark Kent m
2832 Louise Lane f
4512 Lex Luther m
5119 Charles Xavier m
matNr crsNr result
9876 100 3.7
2832 102 2.0
1005 101 4.0
1005 100 1.3
students exams
lastName crsNr result
Kent 100 1.3
Kent 101 4.0
Lane 102 2.0
πlastName, crsNr, result students ⋈exams
We lost Lex Luther and Charles Xavier
because they didn’t take any exams!
Also information on student 9876 disappears…
Left outer join: List students and their exam results– π lastName, crsNr, result students ⎧ exams
– SELECT lastName, crsNr, resultFROM students AS s LEFT JOIN exams AS e ON s.matNr = e.MatNr
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 72
7.5 Joins
matNr firstName lastName sex
1005 Clark Kent m
2832 Louise Lane f
4512 Lex Luther m
5119 Charles Xavier m
matNr crsNr result
9876 100 3.7
2832 102 2.0
1005 101 4.0
1005 100 1.3
students exams
lastName crsNr result
Kent 100 1.3
Kent 101 4.0
Lane 102 2.0
Luther NULL NULL
Xavier NULL NULL
π lastName, crsNr, result students ⎧exams
Right outer join:– π lastName, crsNr, result students ⎨ exams
– SELECT lastName, crsNr, resultFROM students s RIGHT JOIN exams e ON s.matNr = e.MatNr
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 73
7.5 Joins
matNr firstName lastName sex
1005 Clark Kent m
2832 Louise Lane f
4512 Lex Luther m
5119 Charles Xavier m
matNr crsNr result
9876 100 3.7
2832 102 2.0
1005 101 4.0
1005 100 1.3
students exams
lastName crsNr result
Kent 100 1.3
Kent 101 4.0
Lane 102 2.0
NULL 100 3.7
π lastName, crsNr, result students ⎨ exams
Full outer join:– π lastName, crsNr, result students ⎩ exams
– SELECT lastName, crsNr, resultFROM students s FULL JOIN exams e ON s.matNr = e.MatNr
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 74
7.5 Joins
matNr firstName lastName sex
1005 Clark Kent m
2832 Louise Lane f
4512 Lex Luther m
5119 Charles Xavier m
matNr crsNr result
9876 100 3.7
2832 102 2.0
1005 101 4.0
1005 100 1.3
students exams
lastName crsNr result
Kent 100 1.3
Kent 101 4.0
Lane 102 2.0
Luther NULL NULL
NULL 100 3.7
Xavier NULL NULL
π lastName, crsNr, result students ⎩ exams
• SQL also supports the common set operators
– Set union ∪: UNION
– Set intersection ∩ : INTERSECT
– Set difference ∖: EXCEPT
• By default, set operators eliminate duplicates unlessthe ALL modifier is used
• Sets need to be union-compatible to use set operators
– Row definition must match (data types)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 75
7.6 Set Operators
queryINTERSECTINTERSECT
queryquery-block
literal table
(( ))
EXCEPTEXCEPT
UNIONUNION
ALLALL queryquery
query-blockquery-block
literal tableliteral table
(( ))
• Example:
( SELECT course, result FROM exams WHERE course= 100
EXCEPTSELECT course, result FROM exams WHERE result IS NULL)
UNIONVALUES (100, 2.3), (100, 1.7)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 76
7.6 Set Operators
student course result
9876 100 3.7
2832 102 2.0
1005 101 4.0
1005 100 NULL
6676 102 4.3
3412 NULL NULL
exams
course result
100 3.7
100 2.3
100 1.7
query block query
}literal table
• Column functions are used to performstatistical computations
– Similar to aggregate function in relational algebra
– Column functions are expressions
– They compute a scalar value for a set of values
• Examples:
– Compute the average score over all exams
– Count the number of exams each student has taken
– Retrieve the best student
– ...
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 77
7.7 Column Function
• Column functions in the SQL standard:– MIN, MAX, AVG, COUNT, SUM:
Each of these are applied to some other expression• NULL values are ignored
• Function columns in result set just get their column number as name
– If DISTINCT is specified, duplicates are eliminated in advance• By default, duplicates are not eliminated (ALL)
– COUNT may also be applied to *• Simply counts the number of rows
– Typically, there are many more column functions available in your RDBMS (e.g. in DB2: CORRELATION, STDDEV, VARIANCE, ...)
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 78
7.7 Column Function
function
name
function
name
column functionexpression
COUNT(*)COUNT(*)
(( ))
ALLALL
DISTINCTDISTINCT
expression
• Examples:
– SELECT COUNT(*) FROM heroes
• Returns the number of rows of the heroes table
– SELECT COUNT(name), COUNT(DISTINCT name) FROM heroes
• Returns the number of rows in the heroes table for that
name is not null and the number of non-null unique names
– SELECT MIN(strength), MAX(strength), AVG(strength) FROM powers
• Returns the minimal, maximal, and average power strength
in the powers table
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 79
7.7 Column Function
• Similar to aggregation in relation algebra,
SQL supports grouping
– GROUP BY <column names>
– Creates a group for each combination of
distinct values within the provided columns
– A query containing GROUP BY can access
non-group-by-attributes only by column functions
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 80
7.7 Grouping
Examples:
– SELECT course, AVG(result), COUNT(*), COUNT(result) FROM exams GROUP BY course
• For each course, list the average result,
the number of results, and the number of non-null results
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 81
7.7 Grouping
student course Result
9876 100 3.7
2832 102 2.0
1005 101 4.0
1005 100 NULL
6676 102 4.3
3412 NULL NULL
exams
course 2 3 4
100 3,7 2 1
101 4.0 1 1
102 3.15 2 2
NULL NULL 1 0
Examples:
– SELECT course, AVG(result), COUNT(*)FROM exams WHERE course IS NOT NULLGROUP BY course
• The where clause is evaluated before the groups are formed!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 82
7.7 Grouping
student course Result
9876 100 3.7
2832 102 2.0
1005 101 4.0
1005 100 NULL
6676 102 4.3
3412 NULL NULL
exams
course 2 3
100 3.7 2
101 4.0 1
102 3.15 2
• Additionally, there may be restrictions on
the groups themselves
– HAVING <condition>
– The condition may involve group properties
– Only those groups are created that fulfill the
HAVING condition
– A query may have a WHERE and a HAVING clause
– Also, it is possible to have HAVING without GROUP BY
• Then, the whole table is treated as a single group
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 83
7.7 Grouping
• Examples:
– SELECT course, AVG(result), COUNT(*)FROM exams WHERE course <> 100GROUP BY course HAVING count(*) > 1
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 84
7.7 Grouping
student course Result
9876 100 3.7
2832 102 2.0
1005 101 4.0
1005 100 NULL
6676 102 4.3
3412 NULL NULL
exams
course 1 2
102 3.15 2
• As last step in the processing pipeline,(unordered) result sets may be converted into lists– Impose an order on the rows
– This concludes the SELECT statement
– ORDER BY keyword
• Please note:Ordering completely breaks with set calculus/algebra– Result after ordering is a list, not a (multi-)set!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 85
7.8 Ordering
SELECT statement
query
column namecolumn nameODER BYODER BY
integer
ASCASC
DESCDESC
,,
• ORDER BY may order ascending or descending
– Default: ascending
• Ordering on multiple columns possible
• Columns used for ordering arereferences by their name
• Example
– SELECT * FROM resultsORDER BY student, crsNo DESC
– Returns all results ordered by student id (ascending)
– If student ids are identical, we sort in descending orderby course number
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 86
7.8 Ordering
• When working with result lists, often only the
top-k results are of interest (e.g. k = 10)
• How can we limit the number of result rows?
– Since SQL:2008, the SQL standard offers the
FETCH FIRST clause (supported e.g. in DB2)
– Example:
SELECT name, salary FROM salariesORDER BY salary FETCH FIRST 10 ROWS ONLY
– FETCH FIRST can also be used without ORDER BY
• Get a quick impression of the result set
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 87
7.8 Ordering
• SQL queries are evaluated in this sequence:
5. SELECT <attribute list>
1. FROM <table list>
2. [WHERE <condition>]
3. [GROUP BY <attribute list>]
4. [HAVING <condition>]
6. [UNION/INTERSECT/EXCEPT <query>]
7. [ORDER BY <attribute list>]
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 88
7.8 Evaluation Order of SQL
• In SQL, you may embed a query block within
a query block (so called subquery, or nested query)
– Subqueries are written in parenthesis
– Literal subqueries can be used as expressions
• Return just one single value
– Subqueries may create the test set for the IN-condition
– Subqueries may create the test set for the
EXISTS-condition
– Each subquery within the table list creates
a temporary source table
• Called inline view
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 89
7.9 Subqueries
• Subqueries may either be correlated
or uncorrelated
– If the WHERE clause of the inner query uses an
attributes within a table declared in the outer query,
the two queries are correlated
• The inner query needs to be re-evaluated for every tuple
in the outer query
• This is rather inefficient, so avoid correlated subqueries
whenever possible!
– Otherwise, the queries are uncorrelated
• The inner queries needs to be evaluated just once
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 90
7.9 Subqueries
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 91
7.9 Subqueries
id realNameHero
hero_id aliasNameHasAlias
hero_id power_id powerStrengthHasPower
id name descriptionPower
• Expressions:
– SELECT hero.* FROM hero, hasPower pWHERE hero.id = p.hero_id
AND powerStrenth =(SELECT MAX(powerStrength) FROM hasPower)• Select all those heroes having powers with maximal strength
• Uncorrelated! Subquery is evaluated only once
• IN-condition:
– SELECT * FROM hero WHERE id IN(SELECT hero_id FROM hasAlias
WHERE aliasName LIKE ’Super%’)• Select all those heroes having an alias starting with “Super”
• Uncorrelated
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 92
7.9 Subqueries
• EXISTS-condition:
– SELECT * FROM hero hWHERE EXISTS
(SELECT * FROM hasAlias a WHERE h.id = a.hero_id)• Select heroes having at least one alias
• Correlated! Subquery is evaluated for each tuple in hero
• Inline view
– SELECT h.realName, a.aliasNameFROM hasAlias a,(SELECT * FROM heroes WHERE realName LIKE ’A%’) h WHERE h.id < 100 AND a.hero_id = h.id• Get realName–alias pairs for all heroes with a real name staring
with “A” and an id smaller than 100
• Uncorrelated
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 93
7.9 Subqueries
• What is “good” SQL code?
– Easy to read
– Easy to write
– Easy to understand!
• There is no “official” SQL style guide,
but here are some general hints
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 94
7.10 Writing Good SQL Code
1. Write SQL keywords in uppercase,
names in lowercase!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 95
7.10 Writing Good SQL Code
GOOD
BADSELECT MOVIE_TITLEFROM MOVIESWHERE MOVIE_YEAR = 2009
SELECT movie_titleFROM moviesWHERE movie_year = 2009
2. Use proper qualification!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 96
7.10 Writing Good SQL Code
GOOD
BADSELECT imdbraw.movies.movie_title,imdbraw.movies.movie_yearFROM imdbraw.moviesWHERE imdbraw.movies.movie_year = 2009
SET SCHEMA imdbraw
SELECT movie_title, movie_yearFROM moviesWHERE movie_year = 2009
3. Use aliases to keep your code short
and the result clear!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 97
7.10 Writing Good SQL Code
GOOD
BADSELECT movie_title, movie_yearFROM movies, genresWHERE movies.movie_id = genres.movie_idAND genres.genre = ’Action’
SELECT movie_title title, movie_year yearFROM movies m, genres gWHERE m.movie_id = g.movie_idAND g.genre = ’Action’
4. Separate joins from conditions!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 98
7.10 Writing Good SQL Code
GOOD
BAD
SELECT movie_title title, movie_year yearFROM movies m, genres g, actors aWHERE m.movie_id = g.movie_idAND g.genre = ’Action’AND m.movie_id = a.movie_idAND a.person_name LIKE ’%Schwarzenegger%’
SELECT movie_title title, movie_year yearFROM movies mJOIN genres g ON m.movie_id = g.movie_idJOIN actors a ON g.movie_id = a.movie_idWHERE g.genre = ’Action’AND a.person_name LIKE ’%Schwarzenegger%’
5. Use proper indentation!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 99
7.10 Writing Good SQL Code
GOOD
BAD
SELECT movie_title title, movie_year yearFROM movies m, genres g, actors aWHERE m.movie_id = g.movie_idAND g.genre = ’Action’AND m.movie_id = a.movie_idAND a.person_name LIKE ’%Schwarzenegger%’
SELECT movie_title title, movie_year yearFROM movies m
JOIN genres g ON m.movie_id = g.movie_idJOIN actors a ON g.movie_id = a.movie_id
WHERE g.genre = ’Action’AND a.person_name LIKE ’%Schwarzenegger%’
6. Extract uncorrelated subqueries!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 100
7.10 Writing Good SQL Code
GOOD
BAD
WITH recent_actors (person_id) AS(SELECT DISTINCT person_id
FROM actors aJOIN movies m ON a.movie_id = m.movie_id
WHERE movie_year >= 2007)SELECT DISTINCT person_name nameFROM directors dWHERE d.person_id IN (SELECT * FROM recent_actors)
SELECT DISTINCT person_name nameFROM directors dWHERE d.person_id IN(SELECT DISTINCT person_id
FROM actors aJOIN movies m ON a.movie_id = m.movie_id
WHERE movie_year >= 2007)
7. Avoid subqueries (use joins if possible)!
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 101
7.10 Writing Good SQL Code
GOOD
BAD
SELECT DISTINCT d.person_name nameFROM directors d
JOIN actors a ON d.person_id = a.person_idJOIN movies m ON a.movie_id = m.movie_id
WHERE movie_year >= 2007
WITH recent_actors (person_id) AS(SELECT DISTINCT person_id
FROM actors aJOIN movies m ON a.movie_id = m.movie_id
WHERE movie_year >= 2007)SELECT DISTINCT person_name nameFROM directors dWHERE d.person_id IN (SELECT * FROM recent_actors)