midterm summary


Upload: miriam

Post on 08-Jan-2016



TRANSCRIPT

Page 1: Midterm summary


Page 2: Midterm summary

Midterm summary

• Types of data on the web: unstructured, semi-structured, structured
• Scale and uncertainty are key features
• Main goals are to model, search (query) and rank
• Unstructured: Text
• Semi-structured: XML, Web graph, Web Applications (TBD)
• Structured: Relational Databases (Tables)
  – We will now focus on that!

Page 3: Midterm summary


Relational Databases and Query Languages

Page 4: Midterm summary

Tables on the Web?

• Lots of them!
• Go to a random sports-related Wikipedia page...
• Each table may be small
  – Especially if manually created
• However, integrating all the tables on the web results in a huge database
  – See Google BigTables project
• Much of the data has questionable credibility
  – Integration also leads to uncertainty

Page 5: Midterm summary

Relational model

• A declarative method for specifying data and queries

• Data is represented as a set of tables

• A schema is used to specify the types of tables and their connections


Page 6: Midterm summary


Data

1. Atomic types, a.k.a. data types

2. Tables built from atomic types

Unlike XML, no nested tables, only flat tables are allowed!

– We will see later how to decompose complex structures into multiple flat tables

Page 7: Midterm summary

Data Types

• Characters:
  – CHAR(20) -- fixed length
  – VARCHAR(40) -- variable length
• Numbers:
  – BIGINT, INT, SMALLINT, TINYINT
  – REAL, FLOAT -- differ in precision
  – MONEY
• Times and dates:
  – DATE
  – DATETIME -- SQL Server
• Others... All are simple

Page 8: Midterm summary

Tables

Table name: Product

Attribute names:

PName        Price    Category     Manufacturer

Tuples or rows:

Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Page 9: Midterm summary

Tables Explained

• A tuple = a record
  – Restriction: all attributes are of atomic type
• A table = a set of tuples
  – Like a list…
  – …but it is unordered: no first(), no next(), no last().

Page 10: Midterm summary

Tables Explained

• The schema of a table is the table name and its attributes:

Product(PName, Price, Category, Manufacturer)

• A key is an attribute whose values are unique; we underline a key. Here PName is the key:

Product(PName, Price, Category, Manufacturer)
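As a minimal sketch of what declaring a key means in practice, the following uses Python's built-in sqlite3 module with an in-memory database. The choice of SQLite, the REAL type for prices, and the second "Gizmo" row are assumptions made for illustration; they are not part of the slides. Declaring PName as PRIMARY KEY makes the system reject a second tuple with the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE Product (
        PName        VARCHAR(40) PRIMARY KEY,  -- the key: values must be unique
        Price        REAL,
        Category     VARCHAR(40),
        Manufacturer VARCHAR(40)
    )
""")
conn.execute("INSERT INTO Product VALUES ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks')")

# A second tuple with the same key value violates the key constraint.
# (This row is hypothetical; it exists only to trigger the error.)
try:
    conn.execute("INSERT INTO Product VALUES ('Gizmo', 5.00, 'Toys', 'OtherCo')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(duplicate_rejected, row_count)
```

The first row stays in the table; only the offending insert is refused.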

Page 11: Midterm summary

SQL Query

Basic form: (plus many many more bells and whistles)

SELECT attributes
FROM relations (possibly multiple)
WHERE conditions (selections)

Page 12: Midterm summary

Simple SQL Query

Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

SELECT *
FROM Product
WHERE category='Gadgets'

Result (“selection”):

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks

Page 13: Midterm summary

Simple SQL Query

Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100

Result (“selection” and “projection”):

PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi
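The two queries above can be tried out with Python's built-in sqlite3 module. This is a sketch under a few assumptions that are not in the slides: prices are stored as plain numbers without the dollar sign, column types are SQLite's TEXT and REAL, and an ORDER BY PName is added only so that the printed order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# Selection only: keep whole rows that satisfy the WHERE condition.
gadgets = conn.execute(
    "SELECT * FROM Product WHERE Category = 'Gadgets' ORDER BY PName").fetchall()

# Selection and projection: filter rows, then keep three of the four columns.
expensive = conn.execute(
    "SELECT PName, Price, Manufacturer FROM Product "
    "WHERE Price > 100 ORDER BY PName").fetchall()

for row in gadgets:
    print(row)
for row in expensive:
    print(row)
```

The first result keeps all four columns of the two Gadgets rows; the second keeps only three columns of the two rows priced above 100.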

Page 14: Midterm summary

A Notation for SQL Queries

SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100

Input Schema: Product(PName, Price, Category, Manufacturer)

Output Schema: Answer(PName, Price, Manufacturer)

Page 15: Midterm summary

Selections

What goes in the WHERE clause:
• x = y, x < y, x <= y, etc.
  – For numbers, they have the usual meanings
  – For CHAR and VARCHAR: lexicographic ordering
    • Expected conversion between CHAR and VARCHAR
  – For dates and times, what you expect...
• Pattern matching on strings...

Page 16: Midterm summary

The LIKE operator

• s LIKE p: pattern matching on strings
• p may contain two special symbols:
  – % = any sequence of characters
  – _ = any single character

Product(PName, Price, Category, Manufacturer)

Find all products whose name mentions ‘gizmo’:

SELECT *
FROM Product
WHERE PName LIKE '%gizmo%'
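A small sketch of both wildcards, again using Python's sqlite3 module as an assumed test bed. One caveat worth stating: SQLite's LIKE ignores case for ASCII letters by default, which is why '%gizmo%' also matches 'Gizmo' here; other database systems may treat case differently. The second pattern, '_izmo', is an illustration added here and is not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# % stands for any sequence of characters, including the empty one.
mentions_gizmo = [r[0] for r in conn.execute(
    "SELECT PName FROM Product WHERE PName LIKE '%gizmo%' ORDER BY PName")]

# _ stands for exactly one character, so the whole name must be five characters long.
five_letters = [r[0] for r in conn.execute(
    "SELECT PName FROM Product WHERE PName LIKE '_izmo' ORDER BY PName")]

print(mentions_gizmo)
print(five_letters)
```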

Page 17: Midterm summary

Eliminating Duplicates

SELECT DISTINCT category
FROM Product

Category
Gadgets
Photography
Household

Compare to:

SELECT category
FROM Product

Category
Gadgets
Gadgets
Photography
Household

What happens if more attributes are selected?
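One way to explore the closing question is to run DISTINCT over more than one attribute. In this sketch (sqlite3 is an assumed test bed, and the two-column queries are illustrations, not from the slides), DISTINCT compares whole output tuples, so a duplicate is removed only when every selected attribute matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

def count_rows(sql):
    """Run a query and return how many rows it produces."""
    return len(conn.execute(sql).fetchall())

plain = count_rows("SELECT Category FROM Product")                        # duplicates kept
distinct_one = count_rows("SELECT DISTINCT Category FROM Product")        # one per category
# Both Gadgets rows share a manufacturer, so they still collapse into one tuple.
distinct_cat_man = count_rows("SELECT DISTINCT Category, Manufacturer FROM Product")
# Every (Category, PName) pair is different, so nothing is eliminated.
distinct_cat_name = count_rows("SELECT DISTINCT Category, PName FROM Product")

print(plain, distinct_one, distinct_cat_man, distinct_cat_name)
```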

Page 18: Midterm summary

Ordering the Results

SELECT pname, price, manufacturer
FROM Product
WHERE category='gizmo' AND price > 50
ORDER BY price, pname

Ordering is ascending, unless you specify the DESC keyword.

Ties are broken by the second attribute on the ORDER BY list, etc.
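The sample Product table has no rows in a 'gizmo' category, so this sketch uses hypothetical rows (names, prices and manufacturers are invented for illustration) in which two products tie on price. It runs the slide's query once ascending and once with DESC on price, using Python's sqlite3 module as an assumed test bed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)")
# Hypothetical rows, not from the slides: Alpha and Beta tie on price.
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Beta", 60.0, "gizmo", "M1"),
    ("Alpha", 60.0, "gizmo", "M2"),
    ("Gamma", 75.0, "gizmo", "M1"),
    ("Delta", 40.0, "gizmo", "M2"),  # removed by the price > 50 condition
])

query = """
    SELECT pname, price, manufacturer
    FROM Product
    WHERE category = 'gizmo' AND price > 50
    ORDER BY price {direction}, pname
"""
# Only the fixed keywords ASC/DESC are substituted, never user input.
ascending = [r[0] for r in conn.execute(query.format(direction="ASC"))]
descending = [r[0] for r in conn.execute(query.format(direction="DESC"))]

print(ascending)
print(descending)
```

In both runs the tie at price 60 is broken by pname in ascending order, because DESC applies only to the attribute it follows.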

Page 19: Midterm summary

Ordering the Results

SELECT category
FROM Product
ORDER BY pname

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

?

Page 20: Midterm summary

Ordering the Results

SELECT DISTINCT category
FROM Product
ORDER BY category

Category
Gadgets
Household
Photography

Compare to:

SELECT category
FROM Product
ORDER BY pname

?
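To check an answer to the open question, both queries can be run against the sample table. This is a sketch using Python's sqlite3 module (an assumed test bed); it relies on SQLite's default byte-wise string comparison, under which Gizmo < MultiTouch < Powergizmo < SingleTouch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# Sorted by an attribute that is not in the output: the categories
# follow the order of the product names, duplicates included.
by_pname = [r[0] for r in conn.execute(
    "SELECT Category FROM Product ORDER BY PName")]

# Sorted by the output attribute itself, duplicates removed.
distinct_sorted = [r[0] for r in conn.execute(
    "SELECT DISTINCT Category FROM Product ORDER BY Category")]

print(by_pname)
print(distinct_sorted)
```

The first result looks unsorted to a reader who sees only the Category column, since the sort key is projected away.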

Page 21: Midterm summary

Joins in SQL

• Connect two or more tables:

Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company

Cname        StockPrice  Country
GizmoWorks   25          USA
Canon        65          Japan
Hitachi      15          Japan

What is the connection between them?

Page 22: Midterm summary

Joins

Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)

Find all products under $200 manufactured in Japan; return their names and prices.

SELECT pname, price
FROM Product, Company
WHERE manufacturer=cname AND country='Japan' AND price <= 200

(The condition manufacturer=cname is the join between Product and Company.)

Page 23: Midterm summary

Joins in SQL

Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company

Cname        StockPrice  Country
GizmoWorks   25          USA
Canon        65          Japan
Hitachi      15          Japan

SELECT pname, price
FROM Product, Company
WHERE manufacturer=cname AND country='Japan' AND price <= 200

Result:

PName        Price
SingleTouch  $149.99
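The join above can be reproduced with Python's sqlite3 module (an assumed test bed; prices are stored as plain numbers). The sketch also runs the same query written with the explicit JOIN ... ON syntax, which is not used in the slides but expresses the same condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)")
conn.execute("CREATE TABLE Company (cname TEXT, stockPrice REAL, country TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", [
    ("GizmoWorks", 25, "USA"),
    ("Canon", 65, "Japan"),
    ("Hitachi", 15, "Japan"),
])

# Form used in the slides: list both tables, put the join condition in WHERE.
implicit = conn.execute("""
    SELECT pname, price
    FROM Product, Company
    WHERE manufacturer = cname AND country = 'Japan' AND price <= 200
""").fetchall()

# Equivalent form with an explicit JOIN ... ON clause.
explicit = conn.execute("""
    SELECT pname, price
    FROM Product JOIN Company ON manufacturer = cname
    WHERE country = 'Japan' AND price <= 200
""").fetchall()

print(implicit, explicit)
```

MultiTouch is also made in Japan but is dropped by the price condition, which leaves SingleTouch as the only answer.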

Page 24: Midterm summary

Joins

Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)

Find all countries that manufacture some product in the ‘Gadgets’ category.

SELECT country
FROM Product, Company
WHERE manufacturer=cname AND category='Gadgets'

Page 25: Midterm summary

Joins in SQL

Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company

Cname        StockPrice  Country
GizmoWorks   25          USA
Canon        65          Japan
Hitachi      15          Japan

SELECT country
FROM Product, Company
WHERE manufacturer=cname AND category='Gadgets'

Result:

Country
??
??

What is the problem? What’s the solution?
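Running the query on the sample data shows what fills the two ?? rows. This sketch uses Python's sqlite3 module (an assumed test bed) and offers DISTINCT, introduced earlier in the deck, as one possible solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)")
conn.execute("CREATE TABLE Company (cname TEXT, stockPrice REAL, country TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", [
    ("GizmoWorks", 25, "USA"),
    ("Canon", 65, "Japan"),
    ("Hitachi", 15, "Japan"),
])

# Each matching (product, company) pair contributes one output row,
# so a country appears once per gadget made there.
with_duplicates = [r[0] for r in conn.execute("""
    SELECT country FROM Product, Company
    WHERE manufacturer = cname AND category = 'Gadgets'
""")]

# DISTINCT collapses the repeated rows.
without_duplicates = [r[0] for r in conn.execute("""
    SELECT DISTINCT country FROM Product, Company
    WHERE manufacturer = cname AND category = 'Gadgets'
""")]

print(with_duplicates, without_duplicates)
```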

Page 26: Midterm summary

Joins

Product (pname, price, category, manufacturer)
Purchase (buyer, seller, store, product)
Person (persname, phoneNumber, city)

Find the names of people living in Seattle who bought some product in the ‘Gadgets’ category, and the names of the stores they bought such products from.

SELECT DISTINCT persname, store
FROM Person, Purchase, Product
WHERE persname=buyer AND product = pname AND city='Seattle' AND category='Gadgets'

Page 27: Midterm summary

When are two tables related?

• You guess they are
• I tell you so
• Foreign keys are a method for schema designers to tell you so (7.1)
  – A foreign key states that a column is a reference to the key of another table
    ex: Product.manufacturer is a foreign key referencing Company
  – Gives information and enforces a constraint
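The "enforces a constraint" part can be sketched with Python's sqlite3 module. Several details here are assumptions for illustration: SQLite as the system, the REFERENCES column syntax, and the invented 'Mystery' row. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE Company (cname TEXT PRIMARY KEY, stockPrice REAL, country TEXT)")
conn.execute("""
    CREATE TABLE Product (
        pname        TEXT PRIMARY KEY,
        price        REAL,
        category     TEXT,
        manufacturer TEXT REFERENCES Company(cname)  -- foreign key into Company
    )
""")
conn.execute("INSERT INTO Company VALUES ('Canon', 65, 'Japan')")
# Allowed: 'Canon' exists as a key value in Company.
conn.execute("INSERT INTO Product VALUES ('SingleTouch', 149.99, 'Photography', 'Canon')")

# Rejected: the manufacturer does not exist in Company (hypothetical row).
try:
    conn.execute("INSERT INTO Product VALUES ('Mystery', 9.99, 'Gadgets', 'NoSuchCompany')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

product_count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(dangling_rejected, product_count)
```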

Page 28: Midterm summary

Disambiguating Attributes

• Sometimes two relations have the same attribute:

Person(pname, address, worksfor)
Company(cname, address)

SELECT DISTINCT pname, address
FROM Person, Company
WHERE worksfor = cname

Which address?

SELECT DISTINCT Person.pname, Company.address
FROM Person, Company
WHERE Person.worksfor = Company.cname
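A sketch of how a system may react to the unqualified query, using Python's sqlite3 module as an assumed test bed. The single Person and Company rows are hypothetical. SQLite refuses the first query with an "ambiguous column name" error, while the qualified version runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (pname TEXT, address TEXT, worksfor TEXT)")
conn.execute("CREATE TABLE Company (cname TEXT, address TEXT)")
# Hypothetical rows, not from the slides.
conn.execute("INSERT INTO Person VALUES ('Alice', '1 Home St', 'Acme')")
conn.execute("INSERT INTO Company VALUES ('Acme', '9 Office Rd')")

# Unqualified: the system cannot tell which table's address is meant.
try:
    conn.execute(
        "SELECT DISTINCT pname, address FROM Person, Company WHERE worksfor = cname")
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualified: each attribute names its relation explicitly.
qualified = conn.execute("""
    SELECT DISTINCT Person.pname, Company.address
    FROM Person, Company
    WHERE Person.worksfor = Company.cname
""").fetchall()

print(ambiguous_error)
print(qualified)
```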

Page 29: Midterm summary

Tuple Variables

Product (pname, price, category, manufacturer)
Purchase (buyer, seller, store, product)
Person (persname, phoneNumber, city)

Find all stores that sold at least one product that the store ‘BestBuy’ also sold:

SELECT DISTINCT x.store
FROM Purchase AS x, Purchase AS y
WHERE x.product = y.product AND y.store = 'BestBuy'

Answer (store)
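The self-join can be tried on a few hypothetical Purchase rows (the buyers, sellers and the stores other than BestBuy are invented for illustration), with Python's sqlite3 module as an assumed test bed. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchase (buyer TEXT, seller TEXT, store TEXT, product TEXT)")
# Hypothetical rows, not from the slides.
conn.executemany("INSERT INTO Purchase VALUES (?, ?, ?, ?)", [
    ("Ann", "Joe", "BestBuy", "Gizmo"),
    ("Bob", "Joe", "Walmart", "Gizmo"),       # same product as a BestBuy sale
    ("Cat", "Sue", "Target", "MultiTouch"),   # product never sold at BestBuy
])

# x and y range independently over the same table: y picks a BestBuy sale,
# x picks any sale of the same product.
stores = [r[0] for r in conn.execute("""
    SELECT DISTINCT x.store
    FROM Purchase AS x, Purchase AS y
    WHERE x.product = y.product AND y.store = 'BestBuy'
    ORDER BY x.store
""")]

print(stores)
```

BestBuy itself is in the answer because x and y may refer to the same tuple; adding x.store <> 'BestBuy' to the WHERE clause would exclude it.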

Page 30: Midterm summary

Tuple Variables

General rule: tuple variables are introduced automatically by the system:

Product (name, price, category, manufacturer)

SELECT name
FROM Product
WHERE price > 100

Becomes:

SELECT Product.name
FROM Product AS Product
WHERE Product.price > 100

Doesn’t work when Product occurs more than once: in that case the user needs to define variables explicitly.

Page 31: Midterm summary

Meaning (Semantics) of SQL Queries

SELECT a1, a2, …, ak
FROM R1 AS x1, R2 AS x2, …, Rn AS xn
WHERE Conditions

1. Nested loops:

Answer = {}
for x1 in R1 do
  for x2 in R2 do
    …..
      for xn in Rn do
        if Conditions
          then Answer = Answer ∪ {(a1,…,ak)}
return Answer

Page 32: Midterm summary

Meaning (Semantics) of SQL Queries

SELECT a1, a2, …, ak
FROM R1 AS x1, R2 AS x2, …, Rn AS xn
WHERE Conditions

2. Parallel assignment

Doesn’t impose any order!

Answer = {}
for all assignments x1 in R1, …, xn in Rn do
  if Conditions
    then Answer = Answer ∪ {(a1,…,ak)}
return Answer
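Both readings of the semantics can be written out directly in Python for the earlier "products under $200 manufactured in Japan" query. This is only a sketch: the namedtuple row types and the conditions helper are names introduced here, and the answer is a Python set, which follows the set-based pseudocode above, whereas an actual SQL system returns duplicates unless DISTINCT is given.

```python
from collections import namedtuple
from itertools import product as all_assignments  # Cartesian product of relations

ProductRow = namedtuple("ProductRow", "pname price category manufacturer")
CompanyRow = namedtuple("CompanyRow", "cname stockprice country")

Product = [
    ProductRow("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ProductRow("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ProductRow("SingleTouch", 149.99, "Photography", "Canon"),
    ProductRow("MultiTouch", 203.99, "Household", "Hitachi"),
]
Company = [
    CompanyRow("GizmoWorks", 25, "USA"),
    CompanyRow("Canon", 65, "Japan"),
    CompanyRow("Hitachi", 15, "Japan"),
]

def conditions(x1, x2):
    """The WHERE clause of the example query."""
    return x1.manufacturer == x2.cname and x2.country == "Japan" and x1.price <= 200

# 1. Nested loops: one loop per relation in the FROM clause.
answer_loops = set()
for x1 in Product:
    for x2 in Company:
        if conditions(x1, x2):
            answer_loops |= {(x1.pname, x1.price)}

# 2. Parallel assignment: consider every pair (x1, x2); the result is a set,
#    so it does not depend on the order in which pairs are visited.
answer_parallel = {
    (x1.pname, x1.price)
    for x1, x2 in all_assignments(Product, Company)
    if conditions(x1, x2)
}

print(answer_loops, answer_parallel)
```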

Page 33: Midterm summary

Exercises

Product (pname, price, category, manufacturer)
Purchase (buyer, seller, store, product)
Company (cname, stockPrice, country)
Person (persname, phoneNumber, city)

Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products.
Ex #3: Find names of people who bought American products and live in Seattle.
Ex #4: Find people who have both bought and sold something.
Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

Page 34: Midterm summary

Solution

#1

SELECT DISTINCT PU.buyer
FROM Purchase PU, Product PR
WHERE PU.product = PR.pname AND
      PR.category = 'telephony'

#2

SELECT DISTINCT PU.buyer
FROM Purchase PU, Product PR, Company C
WHERE PU.product = PR.pname AND
      PR.manufacturer = C.cname AND
      C.country = 'USA'

#3

SELECT DISTINCT PU.buyer
FROM Purchase PU, Product PR, Company C, Person P
WHERE PU.product = PR.pname AND
      PR.manufacturer = C.cname AND
      C.country = 'USA' AND
      PU.buyer = P.persname AND
      P.city = 'Seattle'

Page 35: Midterm summary

Solution

#4

SELECT DISTINCT buyer
FROM Purchase
WHERE buyer IN (SELECT seller FROM Purchase)

#5

SELECT DISTINCT PU.buyer
FROM Purchase PU, Product PR, Company C
WHERE PU.product = PR.pname AND
      PR.manufacturer = C.cname AND
      (PU.seller = 'Joe' OR C.stockPrice > 50)