SQL – Ch 10 – Data Integrity

Prof. Mukesh N. Tekwani [9869 488 356]

10. DATA INTEGRITY

1. What does the term data integrity mean?

The term data integrity refers to the correctness and completeness of the data in a database. When the

contents of a database are modified with the INSERT, DELETE, or UPDATE statements, the integrity of the

stored data can be lost in many different ways.

For example:

• Invalid data may be added to the database, e.g., order for a nonexistent product.

• Existing data may be modified to an incorrect value, e.g., reassigning a salesperson to a nonexistent

office.

• Changes to the database may be lost due to a system error or power failure.

• Changes may be partially applied, such as adding an order for a product without adjusting the quantity

available for sale.

To preserve the consistency and correctness of its data, an RDBMS imposes one or more data integrity constraints. These constraints restrict the data values that can be inserted into the database or created by a database update.

The different types of data integrity constraints are:

• Required data.

• Validity checking.

• Entity integrity.

• Referential integrity.

• Other data relationships.

• Business rules.

• Consistency.

2. Explain the different data integrity constraints.

The term data integrity refers to the correctness and completeness of the data in a database. When the

contents of a database are modified with the INSERT, DELETE, or UPDATE statements, the integrity of the

stored data can be lost in many different ways.

The different types of data integrity constraints are:

• Required data. Some columns in a database must contain a valid data value in every row; they are not allowed to contain missing or NULL values. E.g., every order must have an associated customer who placed the order. Therefore, the CUST column in the ORDERS table is a required column. The DBMS can be asked to prevent NULL values in this column.

• Validity checking. Every column in a database has a domain, or a set of data values that are permitted for that column. E.g., order numbers begin at 100,001, so the domain of the ORDER_NUM column is positive integers greater than 100,000. Similarly, employee numbers in the EMPL_NUM column must fall within the numeric range of 101 to 999. The DBMS can be asked to prevent other data values in these columns.

• Entity integrity. The primary key of a table must contain a unique value in each row, which is different from the values in all other rows. E.g., each row of the PRODUCTS table has a unique set of values in its MFR_ID and PRODUCT_ID columns. Duplicate values are illegal. The DBMS can be asked to enforce this unique values constraint.

• Referential integrity. A foreign key in a relational database links each row in the child table containing the foreign key to the row of the parent table containing the matching primary key value. In the sample database,

• Other data relationships. Other constraints may be enforced on the database. For example, the quota target for each office must not exceed the total of the quota targets for the salespeople in that office. The DBMS can be asked to check modifications to the office and salesperson quota targets to make sure that their values are constrained in this way.

• Business rules. Updates to a database may be prevented by business rules governing the real-world transactions that are represented by the updates. E.g., there may be a business rule that forbids accepting an order for which there is an inadequate product inventory. The DBMS can be asked to check each new row added to the ORDERS table to make sure that the value in its QTY column does not violate this business rule.

• Consistency. Some transactions can cause multiple updates to a database. That is, if data in one table is updated, there should be a corresponding change in other linked tables. For example, accepting a customer order may involve adding a row to the ORDERS table, increasing the SALES column in the SALESREPS table for the person who took the order, and increasing the SALES column in the OFFICES table for the office where that salesperson is assigned. The INSERT and both UPDATEs must all take place in order for the database to remain in a consistent, correct state. The DBMS can be asked to enforce this type of consistency rule or to support applications that implement such rules.
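The consistency rule above (one INSERT plus two UPDATEs that must succeed or fail together) can be sketched with a transaction. This is only an illustrative sketch using Python's built-in sqlite3 module; the reduced table layouts, the helper names accept_order and totals, and the sample values are assumptions made for the example, not part of the sample database definition.

```python
import sqlite3

# Reduced, assumed versions of the three tables involved in accepting an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OFFICES   (OFFICE INTEGER PRIMARY KEY, SALES REAL NOT NULL);
    CREATE TABLE SALESREPS (EMPL_NUM INTEGER PRIMARY KEY,
                            REP_OFFICE INTEGER NOT NULL,
                            SALES REAL NOT NULL);
    CREATE TABLE ORDERS    (ORDER_NUM INTEGER PRIMARY KEY,
                            REP INTEGER NOT NULL,
                            AMOUNT REAL NOT NULL);
    INSERT INTO OFFICES   VALUES (11, 1000.0);
    INSERT INTO SALESREPS VALUES (106, 11, 400.0);
""")


def accept_order(order_num, rep, amount):
    """Apply the INSERT and both UPDATEs as one all-or-nothing unit."""
    try:
        with conn:  # commits if the block finishes, rolls back if it raises
            conn.execute("INSERT INTO ORDERS VALUES (?, ?, ?)",
                         (order_num, rep, amount))
            cur = conn.execute(
                "UPDATE SALESREPS SET SALES = SALES + ? WHERE EMPL_NUM = ?",
                (amount, rep))
            if cur.rowcount != 1:
                # No salesperson was updated, so the order must not be kept.
                raise ValueError("no such salesperson")
            conn.execute(
                "UPDATE OFFICES SET SALES = SALES + ? WHERE OFFICE = "
                "(SELECT REP_OFFICE FROM SALESREPS WHERE EMPL_NUM = ?)",
                (amount, rep))
        return True
    except (sqlite3.IntegrityError, ValueError):
        return False


def totals():
    """Return (number of orders, SALES of rep 106, SALES of office 11)."""
    orders = conn.execute("SELECT COUNT(*) FROM ORDERS").fetchone()[0]
    rep = conn.execute(
        "SELECT SALES FROM SALESREPS WHERE EMPL_NUM = 106").fetchone()[0]
    office = conn.execute(
        "SELECT SALES FROM OFFICES WHERE OFFICE = 11").fetchone()[0]
    return orders, rep, office


first = accept_order(100001, 106, 250.0)   # valid: all three changes are kept
second = accept_order(100002, 999, 50.0)   # unknown rep: the INSERT is undone
```

After both calls, only the first order remains in ORDERS and both SALES totals have grown by exactly 250, because the failed second attempt was rolled back as a whole.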

3. What are the techniques of simple validity checking?

SQL provides a data validation capability by allowing us to create a rule that determines what data can be

entered into a particular column. SQL checks the rule each time an INSERT or UPDATE statement is

attempted for the table that contains the column.

Ex 1: To create a rule for the QUOTA column in the SALESREPS table:

CREATE RULE QUOTA_LIMIT

AS @VALUE BETWEEN 0.00 AND 500000.00

VALIDITY CHECKING TECHNIQUES

There are two techniques for simple validity checking: column check constraints and domains.

Column Check Constraints:

A check constraint is a search condition, which produces a true/false value. When a check constraint is specified for a column, the DBMS automatically checks the value of that column each time a new row is inserted or a row is updated to ensure that the search condition is true. If the search condition is not true, the INSERT or UPDATE statement fails. A column check constraint is given as part of the column definition within the CREATE TABLE statement.

Ex:

CREATE TABLE SALESREPS
   (EMPL_NUM INTEGER NOT NULL
       CHECK (EMPL_NUM BETWEEN 101 AND 199),
    AGE INTEGER
       CHECK (AGE >= 21),
    .
    QUOTA MONEY
       CHECK (QUOTA >= 0.0)
    .

Consider the first constraint: CHECK (EMPL_NUM BETWEEN 101 AND 199). This constraint requires that valid employee numbers be three-digit numbers between 101 and 199.

Now consider the constraint on the AGE column: CHECK (AGE >= 21). This constraint requires that every salesperson be at least 21 years old.

The third constraint (on the QUOTA column) is CHECK (QUOTA >= 0.0). It requires that a quota never be negative. Note that a check constraint must refer to the column name (QUOTA), not to the data type (MONEY).
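The behaviour of these column check constraints can be tried out with a small sketch. It uses Python's built-in sqlite3 module, which is an assumption of this example rather than the DBMS used in these notes; REAL stands in for the MONEY type, and the helper name try_insert and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SALESREPS (
        EMPL_NUM INTEGER NOT NULL CHECK (EMPL_NUM BETWEEN 101 AND 199),
        AGE      INTEGER CHECK (AGE >= 21),
        QUOTA    REAL    CHECK (QUOTA >= 0.0)   -- REAL stands in for MONEY
    )
""")


def try_insert(empl_num, age, quota):
    """Attempt one INSERT and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO SALESREPS VALUES (?, ?, ?)",
                         (empl_num, age, quota))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "valid row":         try_insert(105, 30, 5000.0),
    "EMPL_NUM too high": try_insert(250, 30, 5000.0),
    "AGE below 21":      try_insert(106, 18, 5000.0),
    "negative QUOTA":    try_insert(107, 40, -1.0),
    "NULL EMPL_NUM":     try_insert(None, 40, 100.0),
    # A CHECK whose condition evaluates to NULL does not fail, so a missing
    # AGE is accepted; only NOT NULL makes a column required.
    "NULL AGE":          try_insert(108, None, 100.0),
}
```

Only the first and the last rows are stored. The last case shows why required data (NOT NULL) and validity checking (CHECK) are listed as separate kinds of constraint.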



Domains:

A domain is a collection of permitted values. These permitted values can be applied not just to one column but to many columns.

We first create a domain by using the CREATE DOMAIN statement, as follows:

CREATE DOMAIN VALID_EMPLOYEE_ID INTEGER

CHECK (VALUE BETWEEN 101 AND 199)

Once the VALID_EMPLOYEE_ID domain has been defined, it may be used to define columns in database tables instead of a data type. Now we can write the CREATE TABLE statement for the SALESREPS table as follows:

CREATE TABLE SALESREPS
   (EMPL_NUM VALID_EMPLOYEE_ID,
    AGE INTEGER CHECK (AGE >= 21),
    .
    .
    QUOTA MONEY
       CHECK (QUOTA >= 0.0)

Advantages of using Domains:

1. The advantage of using the domain is that if other columns in other tables also contain employee numbers, the domain name can be used repeatedly, thus simplifying the table definitions.

2. The definition of "valid data" (such as valid employee numbers in this example) is stored in one, central place within the database. If the definition changes later (for example, if the company grows and employee numbers in the range 200-299 must be allowed), it is much easier to change one domain definition than to change many column constraints scattered throughout the database.
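The "one central definition" advantage can be imitated even on a DBMS without CREATE DOMAIN. The sketch below is not a real domain: SQLite (used here through Python's built-in sqlite3 module) does not implement CREATE DOMAIN, so a hypothetical Python helper, valid_employee_id, generates the same CHECK for every column that holds an employee number. The ORDERS layout and the helper is_accepted are also assumptions for the example.

```python
import sqlite3


def valid_employee_id(column):
    """Single place that defines what a valid employee number is."""
    return f"{column} INTEGER CHECK ({column} BETWEEN 101 AND 199)"


conn = sqlite3.connect(":memory:")
# Both tables reuse the same definition instead of repeating the range.
conn.execute(
    f"CREATE TABLE SALESREPS ({valid_employee_id('EMPL_NUM')}, NAME TEXT)")
conn.execute(
    f"CREATE TABLE ORDERS (ORDER_NUM INTEGER, {valid_employee_id('REP')})")


def is_accepted(sql, params):
    """Run one statement and report whether the constraint allowed it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


rep_ok = is_accepted("INSERT INTO SALESREPS VALUES (?, ?)", (150, "rep A"))
rep_bad = is_accepted("INSERT INTO SALESREPS VALUES (?, ?)", (250, "rep B"))
order_ok = is_accepted("INSERT INTO ORDERS VALUES (?, ?)", (100001, 150))
order_bad = is_accepted("INSERT INTO ORDERS VALUES (?, ?)", (100002, 300))
```

If employee numbers in the range 200-299 had to be allowed later, only valid_employee_id would change, which mirrors the second advantage listed above. Unlike a true domain, however, tables that already exist would have to be re-created.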

4. Explain what is meant by “entity integrity”.

A table's primary key must have a unique value for each row of the table. For example, two rows of the SALESREPS table cannot have the value 106 in their EMPL_NUM column. Therefore we impose the restriction that the primary key must have a unique value. This is called the entity integrity constraint. When a primary key is specified for a table, the DBMS automatically checks the uniqueness of the primary key value for every INSERT and UPDATE statement performed on the table. If we attempt to insert a row with a duplicate primary key value, or to update a row so that its primary key would be a duplicate, the statement fails and generates an error message.
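A minimal sketch of the entity integrity check, assuming Python's built-in sqlite3 module as the DBMS. The NAME column, its values and the helper attempt are hypothetical and only serve the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALESREPS (EMPL_NUM INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute("INSERT INTO SALESREPS VALUES (106, 'first rep')")
conn.execute("INSERT INTO SALESREPS VALUES (107, 'second rep')")
conn.commit()


def attempt(sql):
    """Run one statement; return 'ok' or the DBMS error message."""
    try:
        with conn:
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"error: {exc}"


# A second row with EMPL_NUM 106 would duplicate the primary key.
duplicate_insert = attempt("INSERT INTO SALESREPS VALUES (106, 'third rep')")
# Changing 107 to 106 would also create a duplicate.
duplicate_update = attempt(
    "UPDATE SALESREPS SET EMPL_NUM = 106 WHERE EMPL_NUM = 107")
# A new, unused key value is accepted.
unique_insert = attempt("INSERT INTO SALESREPS VALUES (108, 'third rep')")
```

Both duplicate attempts return an error and leave the table unchanged; only the row with the new key 108 is added.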

5. What is referential integrity?

A set of columns in a table that corresponds to the primary key in another table is called a foreign key. For example, consider EmpNumber, the primary key of the Employees table. These values are also used in the Orders table, where the column is a foreign key. Any value used in the foreign key column of the Orders table must refer to an existing primary key value in the Employees table. Hence this type of integrity is called referential integrity.

This rule enforces the integrity of the parent/child relationship which is created by the primary key / foreign key combination.

6. In what ways can referential integrity of a database be affected?

1. Inserting a new child row. When an INSERT statement adds a new row to the child table, its foreign key value must match one of the primary key values in the parent table. If the foreign key value does not match any primary key, inserting the row will corrupt the database, because there will be a child without a parent (an "orphan"). Inserting a row in the parent table never creates any problem, because the new row simply becomes a parent without any children.

This problem is handled by checking the values of the foreign key columns before the INSERT statement is permitted. If they don't match a primary key value, the INSERT statement is rejected with an error message.

2. Updating the foreign key in a child row. If the foreign key is modified by an UPDATE statement, the new value must match a primary key value in the parent table. Otherwise, the updated row will be an orphan. This problem is handled by checking the updated foreign key value. If there is no matching primary key value, the UPDATE statement is rejected with an error message.

3. Deleting a parent row. If a row of the parent table that has one or more children is deleted, the child rows will become orphans. The foreign key values in these rows will no longer match any primary key value in the parent table. Deleting a row from the child table will not create any problem because the parent of this row simply has one less child after the deletion.

This problem requires a different approach. We can do one of the following:

a) Prevent the deletion of the parent row until all foreign keys are reassigned a new value.

b) Automatically delete the dependent child rows.

c) Set the foreign key value of such records to NULL.

d) Set the foreign key value of such records to some default value.

4. Updating the primary key in a parent row. If the primary key of a row in the parent table is modified, all of the current children of that row become orphans because their foreign keys no longer match a primary key value. This problem has similar complexity. Again, there are four logical possibilities:

a) Prevent the primary key from being changed until the foreign keys are reassigned.

b) Automatically update the foreign key.

c) Set the foreign key to NULL.

d) Set the foreign key to some default value.
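The four situations above can be reproduced with a small parent/child pair. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, reduced OFFICES (parent) and SALESREPS (child) tables with an assumed REP_OFFICE foreign key column, and a hypothetical helper named allowed. No delete or update rule is declared, so the DBMS simply rejects any change that would create an orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE OFFICES (OFFICE INTEGER PRIMARY KEY);
    CREATE TABLE SALESREPS (
        EMPL_NUM   INTEGER PRIMARY KEY,
        REP_OFFICE INTEGER REFERENCES OFFICES (OFFICE)
    );
    INSERT INTO OFFICES   VALUES (11);
    INSERT INTO SALESREPS VALUES (106, 11);
""")


def allowed(sql):
    """Run one statement and report whether referential integrity allowed it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


checks = {
    # 1. New child row pointing at a nonexistent parent (an orphan).
    "insert orphan child": allowed("INSERT INTO SALESREPS VALUES (107, 99)"),
    # 2. Foreign key of an existing child changed to a nonexistent parent.
    "update child FK": allowed(
        "UPDATE SALESREPS SET REP_OFFICE = 99 WHERE EMPL_NUM = 106"),
    # 3. Parent row that still has a child.
    "delete parent": allowed("DELETE FROM OFFICES WHERE OFFICE = 11"),
    # 4. Primary key of a parent row that still has a child.
    "update parent PK": allowed(
        "UPDATE OFFICES SET OFFICE = 12 WHERE OFFICE = 11"),
    # Changes that never create orphans.
    "insert parent": allowed("INSERT INTO OFFICES VALUES (13)"),
    "delete child": allowed("DELETE FROM SALESREPS WHERE EMPL_NUM = 106"),
}
```

The first four statements are rejected, while inserting a childless parent and deleting a child row are accepted, matching the four cases described above.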

7. What are the delete rules to enforce database integrity?

Whenever a parent/child relationship is created by a foreign key in a database, we can specify an associated delete rule. The delete rule tells the DBMS what to do when a user tries to delete a row of the parent table. The four delete rules are:

1. The RESTRICT delete rule prevents us from deleting a row from the parent table if the row has any children. A DELETE statement that attempts to delete such a parent row generates an error message. Only those rows can be deleted from the parent that have no child rows.

2. The CASCADE delete rule tells the DBMS that when a parent row is deleted, all of its child rows should also automatically be deleted from the child table.

3. The SET NULL delete rule tells the DBMS that when a parent row is deleted, the foreign key values in all of its child rows should automatically be set to NULL. Therefore when a row is deleted from the parent table it causes a "set to NULL" update on selected columns of the child table.

4. The SET DEFAULT delete rule tells the DBMS that when a parent row is deleted, the foreign key values in all of its child rows should automatically be set to the default value for that particular column. Thus, deletions from the parent table cause a "set to DEFAULT" update on selected columns of the child table.
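The effect of each delete rule can be compared side by side. The sketch below assumes Python's built-in sqlite3 module, reduced OFFICES and SALESREPS tables, a default office of 10 for the foreign key column, and a hypothetical helper children_after_parent_delete that rebuilds the tables with the requested rule and then deletes office 11.

```python
import sqlite3


def children_after_parent_delete(rule):
    """Delete parent row 11 under the given ON DELETE rule.

    Returns the remaining child rows, or 'delete rejected' if the DBMS
    refused the DELETE.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")
        conn.execute("CREATE TABLE OFFICES (OFFICE INTEGER PRIMARY KEY)")
        # `rule` comes from the fixed list below, never from user input.
        conn.execute(f"""
            CREATE TABLE SALESREPS (
                EMPL_NUM   INTEGER PRIMARY KEY,
                REP_OFFICE INTEGER DEFAULT 10
                    REFERENCES OFFICES (OFFICE) ON DELETE {rule}
            )
        """)
        conn.executemany("INSERT INTO OFFICES VALUES (?)", [(10,), (11,)])
        conn.executemany("INSERT INTO SALESREPS VALUES (?, ?)",
                         [(106, 11), (107, 11)])
        try:
            conn.execute("DELETE FROM OFFICES WHERE OFFICE = 11")
        except sqlite3.IntegrityError:
            return "delete rejected"
        return conn.execute(
            "SELECT EMPL_NUM, REP_OFFICE FROM SALESREPS ORDER BY EMPL_NUM"
        ).fetchall()
    finally:
        conn.close()


outcome = {rule: children_after_parent_delete(rule)
           for rule in ("RESTRICT", "CASCADE", "SET NULL", "SET DEFAULT")}
```

With RESTRICT the DELETE fails; with CASCADE both child rows disappear; with SET NULL they remain with a NULL office; with SET DEFAULT they are moved to office 10, which must itself exist in the parent table.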



8. What are the update rules to enforce database integrity?

The update rule tells the DBMS what to do when a user tries to update the value of one of the primary key columns in the parent table. There are four possibilities:

1. The RESTRICT update rule prevents you from updating the primary key of a row in the parent table if that row has any children. An UPDATE statement that attempts to modify the primary key of such a parent row is rejected with an error message.

2. The CASCADE update rule tells the DBMS that when a primary key value is changed in a parent row, the corresponding foreign key value in all of its child rows should also automatically be changed in the child table, to match the new primary key.

3. The SET NULL update rule tells the DBMS that when a primary key value in a parent row is updated, the foreign key values in all of its child rows should automatically be set to NULL. Primary key changes in the parent table cause a "set to NULL" update on selected columns of the child table.

4. The SET DEFAULT update rule tells the DBMS that when a primary key value in a parent row is updated, the foreign key values in all of its child rows should automatically be set to the default value for that particular column. Primary key changes in the parent table cause a "set to DEFAULT" update on selected columns of the child table.
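The update rules can be observed in the same way as the delete rules. Again this is only a sketch: it assumes Python's built-in sqlite3 module, reduced OFFICES and SALESREPS tables, a default office of 10, and a hypothetical helper children_after_key_update that renumbers office 11 to 21 under the requested rule.

```python
import sqlite3


def children_after_key_update(rule):
    """Change parent key 11 to 21 under the given ON UPDATE rule.

    Returns the child rows afterwards, or 'update rejected' if the DBMS
    refused the UPDATE.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")
        conn.execute("CREATE TABLE OFFICES (OFFICE INTEGER PRIMARY KEY)")
        # `rule` comes from the fixed list below, never from user input.
        conn.execute(f"""
            CREATE TABLE SALESREPS (
                EMPL_NUM   INTEGER PRIMARY KEY,
                REP_OFFICE INTEGER DEFAULT 10
                    REFERENCES OFFICES (OFFICE) ON UPDATE {rule}
            )
        """)
        conn.executemany("INSERT INTO OFFICES VALUES (?)", [(10,), (11,)])
        conn.executemany("INSERT INTO SALESREPS VALUES (?, ?)",
                         [(106, 11), (107, 11)])
        try:
            conn.execute("UPDATE OFFICES SET OFFICE = 21 WHERE OFFICE = 11")
        except sqlite3.IntegrityError:
            return "update rejected"
        return conn.execute(
            "SELECT EMPL_NUM, REP_OFFICE FROM SALESREPS ORDER BY EMPL_NUM"
        ).fetchall()
    finally:
        conn.close()


outcome = {rule: children_after_key_update(rule)
           for rule in ("RESTRICT", "CASCADE", "SET NULL", "SET DEFAULT")}
```

Under CASCADE both salespeople follow the office to its new number 21, whereas SET NULL and SET DEFAULT detach them from it and RESTRICT refuses the change altogether.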

9. What is a trigger?

1. Triggers are stored procedures that are executed automatically when a particular event occurs. A trigger can also be defined as a piece of code which is activated by the DBMS if a specific operation is executed on the database, and only when a certain condition holds.

2. Triggers are used to enforce data integrity.

3. They are similar to constraints.

4. The following three events can trigger an action: INSERT, DELETE and UPDATE.

5. The action triggered by an event is given as a sequence of SQL statements.

6. Triggers provide an alternative way to enforce referential integrity.

7. Triggers are activated automatically. They cannot be called by the user.

8. Triggers are created using the CREATE TRIGGER command.

Syntax:

CREATE TRIGGER trigger-name
ON table-name
FOR which-event (INSERT | DELETE | UPDATE)
AS
trigger-code

9. Triggers are removed by using the DROP TRIGGER command.

Example: DROP TRIGGER trigger-name

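Example 1: Create a trigger to disallow any rows with a budget of over 100 in the Movies table (SQL Server syntax):

CREATE TRIGGER movies_insert
ON Movies
FOR INSERT
AS
BEGIN
    -- "inserted" is the pseudo-table that holds the rows added by the
    -- INSERT statement that fired this trigger
    IF EXISTS (SELECT 1 FROM inserted WHERE budget > 100)
    BEGIN
        ROLLBACK TRANSACTION
        PRINT 'Transaction not permitted for budget over 100'
    END
END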

Now consider the following INSERT query (the column holding the title is assumed to be named movie_title):

INSERT INTO Movies (movie_id, movie_title, studio_id, director_id, gross, budget, release_date)
VALUES (15, 'Test Movie', 3, 5, 50, 101, GETDATE())

Note that the budget here is 101. Because this exceeds 100, the trigger rolls back the insertion and prints the message 'Transaction not permitted for budget over 100'. The trigger is fired after the execution of the statement is finished.
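The same rule can be tried in a self-contained form. The sketch below is not the SQL Server trigger from the example: it assumes Python's built-in sqlite3 module, where a trigger signals failure with RAISE(ABORT, ...) instead of ROLLBACK TRANSACTION and PRINT, and it uses a reduced Movies table whose movie_title column and helper insert_movie are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        movie_id    INTEGER PRIMARY KEY,
        movie_title TEXT,
        budget      INTEGER
    )
""")
# Fires after each inserted row; RAISE(ABORT) undoes the INSERT statement.
conn.execute("""
    CREATE TRIGGER movies_insert
    AFTER INSERT ON Movies
    WHEN NEW.budget > 100
    BEGIN
        SELECT RAISE(ABORT, 'Transaction not permitted for budget over 100');
    END
""")


def insert_movie(movie_id, title, budget):
    """Try to add a movie; return 'inserted' or the trigger's message."""
    try:
        with conn:
            conn.execute("INSERT INTO Movies VALUES (?, ?, ?)",
                         (movie_id, title, budget))
        return "inserted"
    except sqlite3.DatabaseError as exc:
        return str(exc)


within_limit = insert_movie(14, "Small Movie", 100)
over_limit = insert_movie(15, "Test Movie", 101)
row_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
```

The movie with a budget of 100 is stored, while the one with a budget of 101 is refused by the trigger and leaves no row behind.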

10. State the advantages and disadvantages of triggers.

Advantages:

1. The major advantage of triggers is that business rules can be stored in the database and enforced consistently with each update to the database. This can reduce the complexity of application programs that access the database.

2. Triggers can be used to enforce referential integrity.

Disadvantages:

1. Database complexity. When the rules are moved into the database, setting up the database becomes a more complex task.

2. Hidden rules. Since the rules are hidden away inside the database, programs may generate an enormous amount of database activity. The programmer no longer has total control over what happens to the database. A program-initiated database action may cause other, hidden actions.

3. Hidden performance implications. Since triggers are stored inside the database, the consequences of executing a SQL statement are no longer completely visible to the programmer.
