20-751 ECOMMERCE TECHNOLOGY
SUMMER 2003
COPYRIGHT © 2003 MICHAEL I. SHAMOS
eCommerce Technology 20-751
Databases
Concepts
• Relational model
• SQL
• DB construction
  – Normalization
  – ER diagrams
• Transactions
• Web support
Critical Role of Data
• Without data, an organization cannot function
  – especially in eCommerce
• Initially, data was prepared for specific applications
  – payroll data for the payroll system
  – parts lists for the bill of materials system
  – sales data for statistical analysis
• By 1970, clear that data had common properties
• Data for many applications could be stored together in an organized way
  – database instead of separate collections
What is a Database?
• No formal definition
• A collection of related data allowing:
  – insert (add new data)
  – delete (delete existing data)
  – update (change existing data = delete + insert)
  – query (retrieve all data having a certain property)
• What does “related” mean?
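The four operations can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table name, column names and the room value 151 are assumptions made for illustration; they do not come from the slides.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course TEXT PRIMARY KEY, room TEXT)")

# insert: add new data
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [("20-751", "152"), ("20-753", "150"), ("46-870", "152")],
)

# update: change existing data (room 151 is a made-up value)
conn.execute("UPDATE courses SET room = '151' WHERE course = '20-753'")

# delete: remove existing data
conn.execute("DELETE FROM courses WHERE course = '46-870'")

# query: retrieve all data having a certain property
rows = conn.execute(
    "SELECT course, room FROM courses WHERE room = '151'"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM courses").fetchone()[0]
print(rows, remaining)
```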
Database Management System
• Based on a data model, e.g. relational, object, hierarchical
• Has data definition language (DDL) to identify data
• Has data manipulation language (DML) for queries and updates
• Separates structure of data from DB implementation
• Enforces data structure and content rules
• Handles transactions, concurrent operations
• Allows backup and recovery from errors
• Connects to other software
The Relational Model
• A set is a collection of unique items
  { CS, HCII, ISRI, RI, LTI, CALD } Divisions of SCS
  { CS, HCII, CS, HCII, RI, CS } NOT A SET (repeated elements)
• A relation on two sets A, B is a set of pairs of elements, one from A and one from B
  A = { 46-870, 20-751, 46-749, 20-753, 20-770 }
  B = { GSIA, SCS }
  R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, SCS), (46-749, GSIA) }
• Relations can be defined on any number of sets
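Python's built-in set type follows the same rules, so the definitions can be checked directly. This is only a sketch: the variable names are illustrative, and 20-770 is paired with SCS as in the course tables of the slides.

```python
from itertools import product

# A set holds unique items; repeated elements collapse automatically.
divisions = {"CS", "HCII", "CS", "HCII", "RI", "CS"}

A = {"46-870", "20-751", "46-749", "20-753", "20-770"}
B = {"GSIA", "SCS"}

# A relation on A and B is a set of (a, b) pairs, i.e. a subset of A x B.
# (20-770 is paired with SCS, matching the COURSE/SCHOOL table.)
R = {
    ("46-870", "GSIA"), ("20-751", "SCS"), ("20-753", "SCS"),
    ("20-770", "SCS"), ("46-749", "GSIA"),
}

cartesian = set(product(A, B))   # every possible (course, school) pair
is_relation = R <= cartesian     # subset test
print(len(divisions), len(cartesian), is_relation)
```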
The Relational Model of Data
A = { 46-870, 20-751, 46-749, 20-753, 20-770 }
B = { GSIA, SCS }
R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, SCS), (46-749, GSIA) }
This is the graph of the relation R (figure: the courses 46-870, 46-749, 20-753, 20-751, 20-770 on one side, the schools SCS and GSIA on the other, with one edge per pair in R)
The Relational Model of Data
A = { 46-870, 20-751, 46-749, 20-753, 20-770 }
B = { GSIA, SCS }
R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, SCS), (46-749, GSIA) }
This is a table of the relation R:
COURSE   SCHOOL
20-770   SCS
46-749   GSIA
20-753   SCS
20-751   SCS
46-870   GSIA
The COURSE column contains only course numbers; the SCHOOL column contains only school names.
The Relational Model of Data
Relations are not necessarily binary. May involve many sets:
COURSE   SCHOOL   REQ'D   ROOM   #    FACULTY        DEPT
20-770   SCS      Y       152    64   STEENKISTE     CS
46-749   GSIA     N       150    62   GOETTLER       GSIA
20-753   SCS      N       150    57   NYBERG         LTI
20-751   SCS      Y       152    64   SHAMOS         LTI
46-870   GSIA     Y       152    64   MUKHOPADHYAY   GSIA
• Each row is a 7-tuple. Relation on 7 sets.
• No implied ordering of either rows or columns. Sorting is irrelevant
• Note: bad table design since “DEPT” is an attribute of “FACULTY”, not “COURSE”
Tables
• A relation can be represented as a table
• One row for each tuple in the relation
• Easier to draw than a graph
• Table has implicit order (of rows and columns)
  – But: a relation has no ordering, either of tuples or attributes
• The cardinality C(R) of a relation R is the number of tuples it contains = # of rows in its table
• Relational model represents data as a collection of unordered two-dimensional tables
Keys
• Key: an attribute (or minimal set of attributes) that uniquely identifies a tuple
  – In the example relation, “Course” is a key
• A relation may have more than one key.
• A set of attributes that can serve as a key is a candidate key.
• One is chosen as the primary key.
• Keys are used to reference (retrieve) tuples.
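As a rough illustration, the sketch below searches the course table for minimal attribute sets whose values are unique in every row. The column names are shortened and the "#" column is left out; both are choices made here, not in the slides. Uniqueness in one snapshot of the data only suggests a candidate key; whether it really is a key depends on what the data means (a faculty member could later teach two courses).

```python
from itertools import combinations

# Rows copied from the course table; "#" column omitted for brevity.
COLUMNS = ("course", "school", "reqd", "room", "faculty")
ROWS = [
    ("20-770", "SCS", "Y", "152", "STEENKISTE"),
    ("46-749", "GSIA", "N", "150", "GOETTLER"),
    ("20-753", "SCS", "N", "150", "NYBERG"),
    ("20-751", "SCS", "Y", "152", "SHAMOS"),
    ("46-870", "GSIA", "Y", "152", "MUKHOPADHYAY"),
]

def is_unique(attrs):
    """True if the attributes take a different value combination in every row."""
    idx = [COLUMNS.index(a) for a in attrs]
    seen = {tuple(row[i] for i in idx) for row in ROWS}
    return len(seen) == len(ROWS)

def candidate_keys():
    """Minimal attribute sets that are unique in this instance, smallest first."""
    keys = []
    for size in range(1, len(COLUMNS) + 1):
        for attrs in combinations(COLUMNS, size):
            # Skip supersets of a key already found: they are not minimal.
            if is_unique(attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

keys = candidate_keys()
print(keys)
```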
Foreign Keys
• A key from one relation that is an attribute of another relation is a foreign key.
• If we had a “Faculty” relation, then “Faculty” would be a foreign key in the “Courses” relation.
• Foreign keys connect relations together.
Faculty relation (FACULTY is the primary key):
FACULTY        DEPT
GOETTLER       GSIA
MUKHOPADHYAY   GSIA
NYBERG         LTI
SHAMOS         LTI
STEENKISTE     CS
Courses relation (COURSE is the primary key; FACULTY is a foreign key):
COURSE   SCHOOL   REQD   RM    #    FACULTY
20-770   SCS      Y      152   64   STEENKISTE
46-749   GSIA     N      150   62   GOETTLER
20-753   SCS      N      150   57   NYBERG
20-751   SCS      Y      152   64   SHAMOS
46-870   GSIA     Y      152   64   MUKHOPADHYAY
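A minimal sketch of how a DBMS can enforce such a link, here with SQLite through Python's sqlite3 module. The schema is a cut-down version of the two relations, and the course 99-999 taught by NOBODY is a made-up row used only to trigger the rejection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE faculty (faculty TEXT PRIMARY KEY, dept TEXT)")
conn.execute(
    """CREATE TABLE courses (
           course  TEXT PRIMARY KEY,
           school  TEXT,
           faculty TEXT REFERENCES faculty(faculty)
       )"""
)
conn.execute("INSERT INTO faculty VALUES ('SHAMOS', 'LTI')")
conn.execute("INSERT INTO courses VALUES ('20-751', 'SCS', 'SHAMOS')")

# A course naming a faculty member absent from the faculty table is rejected.
try:
    conn.execute("INSERT INTO courses VALUES ('99-999', 'SCS', 'NOBODY')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

course_count = conn.execute("SELECT COUNT(*) FROM courses").fetchone()[0]
print(rejected, course_count)
```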
Operations on Relations
• Projection
  – List specific attributes L (columns) of R, written π_L(R)
  – E.g. show course number and room
Courses:
COURSE   SCHOOL   REQD   RM    #    FACULTY        DEPT
20-770   SCS      Y      152   64   STEENKISTE     CS
46-749   GSIA     N      150   62   GOETTLER       GSIA
20-753   SCS      N      150   57   NYBERG         LTI
20-751   SCS      Y      152   64   SHAMOS         LTI
46-870   GSIA     Y      152   64   MUKHOPADHYAY   GSIA
π_{Course, Room}(Courses):
COURSE   RM
20-770   152
46-749   150
20-753   150
20-751   152
46-870   152
Operations on Relations
• Selection (extract horizontal slices)
  – List all tuples of relation R whose attributes satisfy condition C, written σ_C(R)
  – E.g. show all tuples with Room = 152, σ_{Room=152}(R)
• Projection & Selection are unary (1-table) operations
COURSE   SCHOOL   REQD   RM    #    FACULTY        DEPT
20-770   SCS      Y      152   64   STEENKISTE     CS
20-751   SCS      Y      152   64   SHAMOS         LTI
46-870   GSIA     Y      152   64   MUKHOPADHYAY   GSIA
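Both unary operations are short to write by hand. The sketch below treats a relation as a list of dicts, one per tuple; the function names project and select and the reduced set of columns are assumptions made for this example.

```python
# Each tuple of the relation is a dict; rows follow the Courses table.
COURSES = [
    {"course": "20-770", "school": "SCS", "room": "152"},
    {"course": "46-749", "school": "GSIA", "room": "150"},
    {"course": "20-753", "school": "SCS", "room": "150"},
    {"course": "20-751", "school": "SCS", "room": "152"},
    {"course": "46-870", "school": "GSIA", "room": "152"},
]

def project(relation, attrs):
    """Projection: keep only the named columns, dropping duplicate tuples."""
    result = []
    for row in relation:
        slim = {a: row[a] for a in attrs}
        if slim not in result:   # a relation is a set, so no repeats
            result.append(slim)
    return result

def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

rooms = project(COURSES, ["room"])
in_152 = select(COURSES, lambda row: row["room"] == "152")
print(rooms)
print([row["course"] for row in in_152])
```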
Structured Query Language (SQL)
• A data manipulation language for manipulating relational databases
• SELECT queries the database
• UPDATE modifies relations
• DELETE removes tuples
Syntax of the SQL SELECT command:
SELECT { attributes }
FROM { table }
WHERE { attribute-conditions };
Structured Query Language (SQL)
• Projection
  – SQL: SELECT CourseNo, Room FROM Courses;
    (yields distinct tuples, since CourseNo is a key)
  – SQL: SELECT DISTINCT Room FROM Courses;
    (must ask for distinct tuples, since Room is not a key)
• Selection
  – SELECT * FROM Courses WHERE Room = '152';
  – Gives a table of all courses that meet in 152
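The effect of DISTINCT can be seen by running the queries against a small SQLite table. This is a sketch only: the two-column schema is a reduced version of Courses, and ORDER BY is added just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (CourseNo TEXT PRIMARY KEY, Room TEXT)")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?)",
    [("20-770", "152"), ("46-749", "150"), ("20-753", "150"),
     ("20-751", "152"), ("46-870", "152")],
)

# Without DISTINCT, a non-key column comes back with repeated values.
all_rooms = conn.execute(
    "SELECT Room FROM Courses ORDER BY Room").fetchall()
distinct_rooms = conn.execute(
    "SELECT DISTINCT Room FROM Courses ORDER BY Room").fetchall()

# Selection: all courses that meet in room 152.
in_152 = conn.execute(
    "SELECT CourseNo FROM Courses WHERE Room = '152' ORDER BY CourseNo"
).fetchall()
print(len(all_rooms), distinct_rooms, in_152)
```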
Join
• The natural join A * B consists of tuples with matching attributes (names & values) in A and B
• Natural join is a way of obtaining information across tables
Natural Join A * B
• Attribute names and values must match
• Also called “inner join”: Statistics * Geography
• Cartesian product and join are binary operations
Statistics:
City      State   Pop       % For
Seattle   WA      532900    13.1
Detroit   MI      1027974   3.4
Geography:
City      Area   Elevation   Radio
Seattle   84     14          40
Atlanta   130    1050        27
Detroit   139    601         59
Statistics * Geography:
City      State   Pop       % For   Area   Elevation   Radio
Seattle   WA      532900    13.1    84     14          40
Detroit   MI      1027974   3.4     139    601         59
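A natural join can be sketched in a few lines by pairing rows that agree on every attribute name the two relations share. The function name natural_join and the list-of-dicts representation are assumptions for this example; a real DBMS would use indexes rather than comparing every pair of rows.

```python
STATISTICS = [
    {"City": "Seattle", "State": "WA", "Pop": 532900, "% For": 13.1},
    {"City": "Detroit", "State": "MI", "Pop": 1027974, "% For": 3.4},
]
GEOGRAPHY = [
    {"City": "Seattle", "Area": 84, "Elevation": 14, "Radio": 40},
    {"City": "Atlanta", "Area": 130, "Elevation": 1050, "Radio": 27},
    {"City": "Detroit", "Area": 139, "Elevation": 601, "Radio": 59},
]

def natural_join(left, right):
    """Combine rows whose shared attributes (same name) have equal values."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])   # attribute names in both relations
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in common):
                joined.append({**l_row, **r_row})
    return joined

result = natural_join(STATISTICS, GEOGRAPHY)
print([row["City"] for row in result])
```

Atlanta has no matching row in Statistics, so it does not appear in the result.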
Joins in SQL
• SELECT Statistics.City, State, Radio
  FROM Statistics
  INNER JOIN Geography
  ON Statistics.City = Geography.City;
  (City is qualified with its table name because the column appears in both tables)
Statistics:
City      State   Pop       % For
Seattle   WA      532900    13.1
Detroit   MI      1027974   3.4
Geography:
City      Area   Elevation   Radio
Seattle   84     14          40
Atlanta   130    1050        27
Detroit   139    601         59
Result of Query:
City      State   Radio
Seattle   WA      40
Detroit   MI      59
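The join query can be run as-is against SQLite. In this sketch the "% For" column is omitted to keep the schema short, and ORDER BY Radio is added only so the row order is predictable; both are choices made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Statistics (City TEXT, State TEXT, Pop INTEGER)")
conn.execute(
    "CREATE TABLE Geography (City TEXT, Area INTEGER, Elevation INTEGER, Radio INTEGER)"
)
conn.executemany(
    "INSERT INTO Statistics VALUES (?, ?, ?)",
    [("Seattle", "WA", 532900), ("Detroit", "MI", 1027974)],
)
conn.executemany(
    "INSERT INTO Geography VALUES (?, ?, ?, ?)",
    [("Seattle", 84, 14, 40), ("Atlanta", 130, 1050, 27), ("Detroit", 139, 601, 59)],
)

# City exists in both tables, so it must be qualified in the SELECT list.
rows = conn.execute(
    """SELECT Statistics.City, State, Radio
       FROM Statistics
       INNER JOIN Geography ON Statistics.City = Geography.City
       ORDER BY Radio"""
).fetchall()
print(rows)
```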
Other SQL Constructs
• ORDER BY (sorting)
  – SELECT Company, OrderNumber FROM Orders ORDER BY Company;
• BETWEEN
  – SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen';
• ALTER TABLE (changes table structure)
• Functions
  – SUM()
  – COUNT()
  – MAX()
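A short sketch of these constructs in SQLite. The Orders rows (companies Acme and Zenith, order numbers 1 to 3) are hypothetical data invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Company TEXT, OrderNumber INTEGER)")
# Hypothetical rows, not taken from the slides.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("Acme", 3), ("Zenith", 1), ("Acme", 2)],
)

# ORDER BY sorts the result; a second sort column breaks ties.
ordered = conn.execute(
    "SELECT Company, OrderNumber FROM Orders ORDER BY Company, OrderNumber"
).fetchall()

# BETWEEN includes both end points.
between = conn.execute(
    "SELECT OrderNumber FROM Orders WHERE OrderNumber BETWEEN 2 AND 3 "
    "ORDER BY OrderNumber"
).fetchall()

# Aggregate functions collapse many rows into one value each.
totals = conn.execute(
    "SELECT COUNT(*), SUM(OrderNumber), MAX(OrderNumber) FROM Orders"
).fetchone()
print(ordered, between, totals)
```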
Database Constraints
• Domain (data validity) constraints
  – All values in a column must be from the same domain
  – Example: all salaries are positive numeric dollar amounts. “Monthly” is invalid.
• Entity Integrity
  – Every entity must have a unique primary key. (Otherwise, can’t access the entity)
• Referential Integrity
  – Every foreign key value in a relation must match a primary key in the foreign relation table
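Domain and entity-integrity constraints can be declared in the schema so that the DBMS rejects bad rows itself. The sketch below uses SQLite; the staff table, the names and the salary figures are assumptions for illustration. The typeof() test is needed because SQLite would otherwise store the text 'Monthly' in a numeric column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives entity integrity; CHECK expresses the salary domain.
conn.execute(
    """CREATE TABLE staff (
           name   TEXT PRIMARY KEY,
           salary REAL NOT NULL
                  CHECK (typeof(salary) IN ('integer', 'real') AND salary > 0)
       )"""
)

def try_insert(name, salary):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?)", (name, salary))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("Smith", 50000),      # valid row
    try_insert("Jones", "Monthly"),  # domain violation: not a number
    try_insert("Brown", -5),         # domain violation: not positive
    try_insert("Smith", 60000),      # entity integrity: duplicate primary key
]
print(results)
```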
Functional Dependency
• Attribute B is functionally dependent on attribute A if the value of A uniquely determines B
  – One-to-one relationship: two functional dependencies: A depends on B; B depends on A
  – Many-to-one relationship: one functional dependency: B depends on A
  – Many-to-many relationship: no dependencies: neither A nor B depends on the other
• Functional dependencies are constraints between attributes or sets of attributes. They must be maintained, or errors and inconsistencies will result.
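Whether one attribute determines another can be tested on sample data. This sketch uses the course and room columns from the earlier course table; the function name determines is an assumption. A check on sample rows can disprove a dependency, but it cannot prove that the dependency holds for all future data.

```python
ROWS = [
    {"course": "20-770", "room": "152"},
    {"course": "46-749", "room": "150"},
    {"course": "20-753", "room": "150"},
    {"course": "20-751", "room": "152"},
    {"course": "46-870", "room": "152"},
]

def determines(rows, a, b):
    """True if, in these rows, each value of a always comes with the same b."""
    seen = {}
    for row in rows:
        # setdefault stores the first b seen for this a, then returns it.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True

course_to_room = determines(ROWS, "course", "room")   # many-to-one: holds
room_to_course = determines(ROWS, "room", "course")   # fails: a room hosts several courses
print(course_to_room, room_to_course)
```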
Normalization
• A relation is well-structured if it is non-redundant and allows INSERT, MODIFY and DELETE without error or inconsistency.
• Normalization assists in maintaining functional dependencies and preventing errors and inconsistencies.
• DELETE anomaly:
Student   Email           Course   Room
Smith     smithj@andrew   20-751   152
Smith     smithj@andrew   20-753   152
Jones     jonesj@cs       46-870   150
• Deleting “Jones” removes all information about course 46-870 (namely that its room is 150)
• If the information is in another table, it shouldn’t be here also.
Normalization
• MODIFY anomaly:
• Suppose Smith’s email address changes. Every line in the table corresponding to Smith must be changed or data will be inconsistent.
• An attribute unique to a key should be entered only once in the database.
Student   Email           Course   Room
Smith     smithj@andrew   20-751   152
Smith     smithj@andrew   20-753   152
Jones     jonesj@cs       46-870   150
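The MODIFY anomaly, and one way of avoiding it, can be sketched with plain Python structures. The split into three structures (students, courses, enrolled) and the new address smith@example.org are assumptions made for this illustration, not a design given in the slides.

```python
# Redundant design: the email is repeated on every enrollment row.
enrollments = [
    {"student": "Smith", "email": "smithj@andrew", "course": "20-751", "room": "152"},
    {"student": "Smith", "email": "smithj@andrew", "course": "20-753", "room": "152"},
    {"student": "Jones", "email": "jonesj@cs", "course": "46-870", "room": "150"},
]

# Updating only one of Smith's rows leaves two different emails for Smith.
enrollments[0]["email"] = "smith@example.org"   # hypothetical new address
smith_emails = {r["email"] for r in enrollments if r["student"] == "Smith"}

# Restructured design: each fact is stored once and linked by keys.
students = {"Smith": "smithj@andrew", "Jones": "jonesj@cs"}      # student -> email
courses = {"20-751": "152", "20-753": "152", "46-870": "150"}    # course -> room
enrolled = [("Smith", "20-751"), ("Smith", "20-753"), ("Jones", "46-870")]

students["Smith"] = "smith@example.org"   # one update covers every enrollment

# The original wide table can still be rebuilt on demand.
rebuilt = [
    {"student": s, "email": students[s], "course": c, "room": courses[c]}
    for s, c in enrolled
]
rebuilt_smith_emails = {r["email"] for r in rebuilt if r["student"] == "Smith"}
print(len(smith_emails), len(rebuilt_smith_emails))
```

The same split also removes the DELETE anomaly: dropping Jones from enrolled leaves the room of 46-870 in courses.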
Normalization
• Restructuring to produce smaller, well-structured equivalent relations, reduce data replication
• First Normal Form. Make all attributes atomic. No multiple values.
Multiple values (not in first normal form):
Name        Phone          Dept           Bldg
Carbonell   83064, 87279   CS, LTI        WeH, NSH
Reddy       82597, 87170   CS, Robotics   WeH
First normal form:
Name        Phone   Dept       Bldg
Carbonell   83064   CS         WeH
Carbonell   87279   LTI        NSH
Reddy       82597   CS         WeH
Reddy       87170   Robotics   WeH
Second Normal Form
• Eliminate partial functional dependencies. Every non-key attribute must depend on all key attributes (or redundancy can result).
NOT IN 2NF: key is (City, State); Capital depends on State only, not City
City           State   Capital      City Pop.
Philadelphia   PA      Harrisburg   1478002
Pittsburgh     PA      Harrisburg   1336449
Detroit        MI      Lansing      1027974
2NF: decompose into two tables
City           State   City Pop.
Philadelphia   PA      1478002
Pittsburgh     PA      1336449
Detroit        MI      1027974
State   Capital
PA      Harrisburg
MI      Lansing
There are many other normal forms and normalization rules
Entity-Relationship (ER) Diagrams
• Must specify:
  – Entities (things to be represented in the database)
  – Attributes (properties of entities)
  – Relationships (relations among entities)
• These can be modeled by entity-relationship diagrams
• The diagrams are used as a guide to designing the database
Entity-Relationship (ER) Diagrams
• Entity types
  – Entity type: Store
  – Entities: Downtown Store, Squirrel Hill Store, Oakland Store
• Relationships between entity types:
• This is the “Has” relationship
• Direction of arrow is important (“Branch has Staff,” not “Staff Has Branch”)
EXAMPLE FROM CONNOLLY & BEGG
Entity-Relationship (ER) Diagrams
• Relationships need not be binary:
• This is the “Arranges” relationship; it can be thought of as a 4-tuple (Solicitor, Bid, Buyer, Institution)
EXAMPLE FROM CONNOLLY & BEGG
Entity-Relationship (ER) Diagrams
EXAMPLE FROM CONNOLLY & BEGG
Entity-Relationship (ER) Diagrams
EXAMPLE FROM CONNOLLY & BEGG
Web Database Connectivity
JDBC = Java Database Connectivity
SQLJ = Java-Embedded SQL
SOURCE: CONNOLLY & BEGG
Distributed Databases
• Databases in which data is stored in more than one location but appears local to the user
  – Replicated: multiple copies of database
  – Partitioned: data is split among locations
• Fragmentation
  – Information about fragments is stored in a distributed data catalog (DDC)
  – Horizontal v. vertical fragmentation
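The slide only names the two kinds of fragmentation. Under the usual reading (horizontal splits a table by rows, vertical splits it by columns while repeating the key), the difference can be sketched as follows. The site names and the choice of the course table are assumptions for illustration.

```python
COURSES = [
    {"course": "20-770", "school": "SCS", "room": "152"},
    {"course": "46-749", "school": "GSIA", "room": "150"},
    {"course": "20-751", "school": "SCS", "room": "152"},
]

# Horizontal fragmentation: each site stores a subset of the rows.
site_scs = [r for r in COURSES if r["school"] == "SCS"]
site_gsia = [r for r in COURSES if r["school"] == "GSIA"]

# Vertical fragmentation: each site stores a subset of the columns,
# and every fragment keeps the key so rows can be matched up again.
site_a = [{"course": r["course"], "school": r["school"]} for r in COURSES]
site_b = [{"course": r["course"], "room": r["room"]} for r in COURSES]

# Reassembly: union the horizontal fragments; join the vertical ones on the key.
from_horizontal = site_scs + site_gsia
rooms = {r["course"]: r["room"] for r in site_b}
from_vertical = [{**r, "room": rooms[r["course"]]} for r in site_a]

by_course = lambda r: r["course"]
print(sorted(from_horizontal, key=by_course) == sorted(COURSES, key=by_course))
print(from_vertical == COURSES)
```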
Distributed Databases
• Advantages
  – Reduced load on central DB
  – Lower cost (data spread among small machines)
  – Reliability (machine failure is not fatal)
  – Fast access to local data
  – Ease of growth
• Disadvantages
  – Complexity. Difficult to maintain consistency
  – Security (many access points)
  – Telecommunications required
Distributed Databases
• Products
  – PeerDirect
• Issues
  – Updating of information in a distributed database is a form of transaction processing
What is a Transaction?
• An action requiring a series of steps and database updates.
• Transactions are the basis of eCommerce.
• Transactions may be distributed. Steps processed on different computers.
• Transactions may fail. One or more steps may be unsuccessful.
• Transaction systems must be recoverable. Data and “state” must be restored after failure.
ATM Transaction Database
ACCOUNT MASTER
ACCT       PIN     BALANCE
536-0178   AEZ22   $ 1013.22
536-9112   71BZZ   $ 25561.18
557-0308   82A6Z   $ 3278.08
599-8133   K8LL0   $ 1622.77
632-0012   R3TTP   $ 12016.45
STOLEN CARDS
STOLEN
421-2254
536-9112
542-1613
599-0028
613-4299
667-0033
RECENT ACTIVITY
ACCT       RECENT
557-0308   $ 150.00
536-9112   $ 6700.00
542-0061   $ 240.00
599-8133   $ 5500.00
610-0518   $ 50.00
POSTING LOG
ACCT       ATM   TIME       UID       AMOUNT
536-0178   543   10:08:32   0005148   -200.00
536-0178   543   10:09:21   0005154   -200.00
107-0003   391   10:10:06   0005167   300.00
599-8133   422   10:11:15   0005174   -75.00
ATM Withdrawal
1   Check STOLEN                 If card is stolen, ABORT
2   Check PIN                    If wrong, retry 3 times, ABORT
3   Check RECENT v. BALANCE      Too much activity, ABORT
4   Check ATM reserve            If not enough money, ABORT
5   Update RECENT                Indicate new activity
6   Update BALANCE               Debit bank account
7   Write to Log                 Record transaction
8   Update ATM reserve           Debit ATM balance
9   Tell ATM to dispense money   Pay the man
10  Check dispensing status      If failed, ABORT
11  Make updates permanent       COMMIT the transaction (think: commit “to memory”)
ACID
• Four minimum requirements of a transaction T in a transaction system:
• Atomic. T executes completely or not at all.
• Consistent. T preserves database consistency and integrity.
• Isolated. T executes as if it were running alone. Not affected by other concurrent transactions.
• Durable. T’s results preserved during failure.
Atomicity
• Transaction: John pays Mary $100.
• Take $100 out of John’s account.
• Add $100 to Mary’s account
• Problems:
  – John or Mary might not have an account
  – John might not have $100
  – System might fail after subtracting $100 from John
• If failure occurs, must undo partial results
• “Commit”: successful recording of a transaction
• “Abort”: failure of a transaction.
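The John-pays-Mary example can be sketched with SQLite, whose connection object commits a block of statements on success and rolls them back if an exception escapes. The starting balances (150 and 20), the CHECK constraint and the function name transfer are assumptions for illustration. Mary is credited first on purpose, so that a failure on John's side leaves a partial result that must be undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (owner TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("John", 150), ("Mary", 20)])
conn.commit()

def transfer(payer, payee, amount):
    """Move amount from payer to payee; both updates commit, or neither does."""
    try:
        with conn:  # COMMIT on normal exit, ROLLBACK if an exception escapes
            for owner, delta in ((payee, amount), (payer, -amount)):
                cur = conn.execute(
                    "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                    (delta, owner),
                )
                if cur.rowcount != 1:          # the account does not exist
                    raise LookupError(owner)
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False                           # transaction aborted

ok_first = transfer("John", "Mary", 100)    # succeeds
ok_second = transfer("John", "Mary", 100)   # John has only 50 left: aborted
ok_third = transfer("John", "Nobody", 10)   # payee has no account: aborted
balances = dict(conn.execute("SELECT owner, balance FROM accounts").fetchall())
print(ok_first, ok_second, ok_third, balances)
```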
Consistency
• Maintain database constraints
  – data validity
  – unique primary keys
  – referential integrity
  – conservation conditions (debits = credits, total cash = sum of cash in all accounts, etc.)
Isolation
• Two transactions T1, T2 are interleaved if some steps of one are performed after the other starts but before it completes.
  T1 has steps A B C D E F; T2 has steps P Q R S
  A B P C D Q R S E F is an interleaved schedule.
  P Q R S A B C D E F is not interleaved.
• A sequence of transactions is isolated if their steps can be interleaved without affecting the result. Transactions are blind to simultaneous execution.
Durability
• Results must survive failure.
• Logging. Maintaining a record of all data updates so databases can be repaired if failure occurs.
• Updates must be logged before they are performed. If failure occurs, transaction can complete from failure point.
• If abort is necessary, can undo logged transactions.
• Without logging, can’t recover from some types of failures.
Simultaneous Transactions
• If all TP were done by one single-threaded process, it would be easy. Just execute one step at a time.
• With just two threads (or processes) it’s complicated.
  2 transactions T1, T2: READ A; A = A+1; WRITE A;
A in DB   5        5        5       5       6         6
T1 STEP   READ A            A=A+1           WRITE A
A in T1   5        5        6       6       6         6
T2 STEP            READ A           A=A+1             WRITE A
A in T2            5        5       6       6
• Value of A is 6, but it should be 7!
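The lost update can be reproduced deterministically by writing each transaction as a generator that pauses after every step, so a schedule decides who runs next. This is a simulation under that assumption, not real concurrency; the names transaction and run are illustrative.

```python
db = {"A": 5}

def transaction():
    """READ A; A = A + 1; WRITE A, pausing after each step."""
    local = db["A"]      # READ A into a private copy
    yield
    local = local + 1    # A = A + 1 on the private copy
    yield
    db["A"] = local      # WRITE A back to the database
    yield

def run(schedule):
    """Execute one step of the named transaction per schedule entry."""
    db["A"] = 5
    txns = {"T1": transaction(), "T2": transaction()}
    for name in schedule:
        next(txns[name])
    return db["A"]

interleaved = run(["T1", "T2", "T1", "T2", "T1", "T2"])
serial = run(["T1", "T1", "T1", "T2", "T2", "T2"])
print(interleaved, serial)
```

Both transactions read 5 in the interleaved schedule, so the second write overwrites the first with the same value.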
Locking
• A solution is LOCKING. Associate a variable with one process at a time. LOCK the others out.
• LOCK A; READ A; A = A+1; WRITE A; UNLOCK A;
• If T1 starts first, it locks A.
• When T2 tries to lock A, it can’t. It has to wait.
• T1 finishes completely before T2 can lock A.
• After T1 finishes, A = 6
• After T2 finishes, A = 7, the correct value
• Locking achieves atomicity
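With real threads the same discipline looks as follows. This is a minimal sketch using Python's threading.Lock in place of a database lock manager; the names store, lock_a and increment are assumptions.

```python
import threading

store = {"A": 5}
lock_a = threading.Lock()   # the lock associated with variable A

def increment():
    with lock_a:                # LOCK A (released automatically: UNLOCK A)
        value = store["A"]      # READ A
        value = value + 1       # A = A + 1
        store["A"] = value      # WRITE A

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store["A"])
```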
Deadlock
T1: LOCK A; LOCK B; B=A+B; UNLOCK A; UNLOCK B;
T2: LOCK B; LOCK A; B=A - B; UNLOCK B; UNLOCK A;
A in DB   17   17   17   17   17
B in DB   3    3    3    3    3
T1 STEP   LOCK A, then LOCK B (CAN'T EXECUTE)
T2 STEP   LOCK B, then LOCK A (CAN'T EXECUTE)
This is deadlock. Neither transaction can complete.
Ways to Eliminate Deadlock
1. Require each transaction to request all locks at the same time. System either grants them all or none.
   • Problem: very restrictive. Transactions cannot be interleaved. Essentially serial execution.
2. Assign an ordering to the variables: A=1; B=2; C=3 …
   Require transactions to request locks in that order.
3. Do nothing. Periodically check for deadlock. If it exists, cancel out a transaction.
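Option 2 can be sketched with Python threads: whatever order a transaction names its variables in, the helper acquires the locks in one global order, so two transactions can never each hold a lock the other is waiting for. The helper name with_locks and the use of alphabetical order as the global ordering are assumptions; the values 17 and 3 follow the deadlock example.

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}
data = {"A": 17, "B": 3}

def with_locks(names, action):
    """Acquire the named locks in one global (alphabetical) order, then run action."""
    ordered = sorted(names)
    for name in ordered:
        locks[name].acquire()
    try:
        action()
    finally:
        for name in reversed(ordered):
            locks[name].release()

def t1():   # B = A + B
    data["B"] = data["A"] + data["B"]

def t2():   # B = A - B
    data["B"] = data["A"] - data["B"]

threads = [
    threading.Thread(target=with_locks, args=(["A", "B"], t1)),
    threading.Thread(target=with_locks, args=(["B", "A"], t2)),  # asks for B first
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(data["B"])
```

Both threads finish. The final B is -3 if T1 runs first and 31 if T2 runs first; either is a valid serial outcome.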
Q&A