1.1 - relational data model.odp

21
© 2010 IBM Corporation Information Management Information Management Ecosystem Partnerships IBM Canada Lab Summer/Fall 2010 The Relational Data Model

Upload: selvaraj-villy

Post on 10-Apr-2016

223 views

Category:

Documents


1 download

TRANSCRIPT

© 2010 IBM Corporation

Information Management

Information Management Ecosystem PartnershipsIBM Canada Lab

Summer/Fall 2010

The Relational Data Model

2 © 2010 IBM Corporation

Information Management

Agenda

■ Data and Databases

■ Database Management Systems (DBMS)

■ Information Model & Data Model

■ Relational Data Model & Relational Databases

■ Normalization

3 © 2010 IBM Corporation

Information Management

What is Data?

■ Collection of facts or numbers

■ Can be quantitative or qualitative

■ Describes a variable or set of variables

■ Essentially data can be thought of as the result of observations based on things like:

– measurements– statistics

4 © 2010 IBM Corporation

Information Management

Data and Information

■ Data is simply facts

■ Why is data important?–By relating different pieces of data

we are able to extract valuable information by presenting it in meaningful context

■ For that we need to be able to:

● Store data → so it can be persisted● Structure data → so it is easier to manipulate● Organize data → in a meaningful way● Process data → to derive data value from it

5 © 2010 IBM Corporation

Information Management

Why Databases?

■ Data can be stored using multiple methods such as:–Text files–Comma delimited data files –Spreadsheets–Databases

■ Why database?–The way data is accessed –The way data is handled

6 © 2010 IBM Corporation

Information Management

Managing Data

■ Using a database provides:–a standard interface for accessing data –multiple users with simultaneous ability to insert, update and

delete data–changes to the data without risk of losing data and its

consistency–the capability to handle huge volumes of data and users–tools for data backup, restore and recovery–security–reduce redundancy–data independence

7 © 2010 IBM Corporation

Information Management

Database Management Systems

■ Database Management System (DBMS)– It is the software system that manages databases– Provides an interface of access to the databases– Provides data services to applications

• Efficient data querying and update mechanisms• Data integrity – guarantees data is always right even in case

of software and hardware errors• Others: backup, compression, security, replication, etc.

■ DB2 is a Database Management System

8 © 2010 IBM Corporation

Information Management

Information Model

■ Information Model –Abstract management of objects at a conceptual level–Independent of specific implementations and protocols –Hides all protocol and implementation details –Defines relationships between managed objects.

■ Multiple implementations of an information model exists–Data models

9 © 2010 IBM Corporation

Information Management

Data Model

■ A model is a representation of an object or concept of the real world

– 3D model: a graphical representation of an object– Scale model: a replica or copy of an object in a smaller size– Business model: describes how a company operates

■ Data Model– Define how data is to be represented and structured– It can be used to map how data from the real world is to be

represented in a software system

■ Characteristics of a Data Model– Lower level of abstraction– Intended for the software developer– Includes specific implementation and protocol details

10 © 2010 IBM Corporation

Information Management

Types of Data Models

■ Extended Relational

■ Entity-Relationship

■ Hierarchical

■ Network

■ Object-oriented

■ Object-relational

■ Relational

■ Semantic

■ Semi-structured (XML)

11 © 2010 IBM Corporation

Information Management

The Relational Data Model

■ Proposed by E.F. Codd in 1970.

■ It is mathematical model that describes data as a collection of Relations (sets) and the values of the data is defined by Domains.

■ Focuses on providing better data independence

■ Data are operated upon by means of a relational calculus or relational algebra

■ Advantages– Based on a formal theoretical model and proven in practice– Provides logical view of the data

■ It is implemented by most DBMS in the market, such as DB2.– There are called Relational Database Management

Systems

12 © 2010 IBM Corporation

Information Management

Components of the Relational Data Model■ The relational data model has its own unique terms used to

define its concepts.

Attribute

Relation

Tuple

(ID, int) (NAME, text) (EXT, int) (Active, boolean)

1 John S 54213 Y

2 Michael B 52137 Y

3 Jeremy W 50603 Y

4 Leah E 58963 N

■ Domain (or data type) defines the set of possible values that data can assume

■ Relation is composed by a heading and a body

– Heading: a set of attributes

– Body: a set of tuples

■ Attribute is composed by a name and a domain (type)

■ A tuple is a set of attribute values

Domain

13 © 2010 IBM Corporation

Information Management

Components of a Relational Database

■ Concepts from Relational Data Model can be mapped to their implementation found in a Relational Database

■ Relational databases store data using tables– A table consists of columns and rows– Each column has a specific data type – Each row features a certain value for each column

Column

Table

Row

ID NAME EXTENSION MANAGER

1 John S 54213 Y

2 Susan P 59867 N

4 Andrew J 55935 N

5 Michael B 52137 Y

6 Jeremy W 50603 Y

7 Leah E 58963 N

14 © 2010 IBM Corporation

Information Management

Tables

■ A Table is the counterpart of a Relation from the Relational Data Model

■ A table stores data in rows and columns– Rows are the same as Tuples– Columns are the same as Attributes

■ There can be multiple tables for different types of data to reduce redundant information

– Normalization (more on this later)

■ For example: –You want to store data about a company

• Data about branch offices will be stored in a table• Employee data for specific branches will be stored in its own

table• Product data will be stored in another table

15 © 2010 IBM Corporation

Information Management

Columns

■ Columns are also known as fields

■ Each field contains a specific type of information such as name, extension, position and so on

■ Columns must be designated a specific data type such as DATE, VARCHAR, INTEGERand so on

Specific Information

Data type: Char

16 © 2010 IBM Corporation

Information Management

Domains and Data Types

■ Data Type– This is counterpart of Domains in the Relational Model,

which defines the smallest unit of data that can be stored– Columns always have a data type

Manager Domain = (Y, N)

A domain is the set of all possible values for the specific attribute

Smallest unit of Data (atomic value)

Attribute/Colum

17 © 2010 IBM Corporation

Information Management

Components of a Relational Database

■ Primary Keys– Uniquely identifies each tuple (row) of the relation (table)– Relations must always have a primary key– Although it is recommended, tables in a relational database

are not required to have a Primary Key

■ Examples– Driver's license of a person– ISBN of a book– Serial number of a product

Primary Key

18 © 2010 IBM Corporation

Information Management

Components of a Relational Database

■ Foreign Keys– Attribute in one relation whose values match a primary key

of another relation – Defines the relationship between two tables

ID(PK)

NAMEDEPT_ID

(FK)

1 John S A

2 Michael B B

3 Jeremy W B

4 Leah E C

ID(PK)

DEPT_NAME

A Finance

B Design

C Business

D Development

19 © 2010 IBM Corporation

Information Management

Why Normalization?

■ Why do we need normalization?

■ Consider the following table:–Lists of task an employee is involved in:

■ Example operation: if John moves to a new city, all entries related to John must be updated

– redundancy– anomalies

ID Name Office City Extension Task

1 John S Toronto 54213 Planning

1 John S Toronto 54213 Marketing

1 John S Toronto 54213 Testing

2 Susan P New York 59867 Marketing

3 Jennifer L Chicago 59415 Planning

3 Jennifer L Chicago 59415 Testing

20 © 2010 IBM Corporation

Information Management

Normalization

■ No anomalies, no redundancy

■ No loss of information

Employee ID Task ID

1 1

1 2

1 3

2 2

3 1

3 3

ID Name Office City Extension

1 John S Toronto 54213

2 Susan P New York 59867

3 Jennifer L Chicago 59415

ID Task

1 Planning

2 Marketing

3 Testing

Task TableEmployee Table

Employee Tasks Table

© 2010 IBM Corporation

Information Management

Information Management Ecosystem PartnershipsIBM Canada Lab

Summer/Fall 2010Questions?

E-mail: [email protected]: “DB2 Academic Workshop”