relational database intro for marketers

Relational Database Principles for Non-IT People A Grumpy Old Marketer October 2014

Upload: steve-finlay

Post on 15-Jul-2015

Relational Database Principles

• What is a relational database?

• The First and Second Commandments

• What all this means

What Is a Relational Database?

• At the most basic level, most of the critical information that we use in our IT systems consists of entities and information about them

• It is useful to think about information as tables, in which each row is an entity, and each column or “field” provides information about that entity. The example below is an imaginary list of employees:

First name | Last name | Employee ID | Department   | etc.
Annie      | Oakley    | 340955      | Collections  | etc.
Noam       | Chomsky   | 409102      | Graphic Arts | etc.
etc.       | etc.      | etc.        | etc.         | etc.

• But in any real world situation, some of the columns/ fields in such a table are actually references to a different kind of entity:

First name | Last name | Employee ID | Department   | etc.
Annie      | Oakley    | 340955      | Collections  | etc.
Noam       | Chomsky   | 409102      | Graphic Arts | etc.
etc.       | etc.      | etc.        | etc.         | etc.

Each employee has a unique ID, and may have a unique name…

… but each employee does not have a unique department. Departments are a different kind of entity, and there are far fewer departments than employees.

• Suppose that there are 500 employees, and 10 departments. If the Department field is simply a text field into which someone types or copies the department name, then each department name must be re-entered an average of 50 times. This wastes time and allows errors:

First name | Last name | Employee ID | Department      | etc.
Annie      | Oakley    | 340955      | Collections     | etc.
Noam       | Chomsky   | 409102      | Graphic Arts    | etc.
Marvin     | Hagler    | 268003      | Bill collection | etc.
William    | Cody      | 550254      | Graphic Rats    | etc.
Walt       | Disney    | 027851      | Bill collecting | etc.

In fact, these five employees belong to only two departments.
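The problem with the free-text Department field can be seen by simply counting the values. The following is a minimal sketch in Python, using the five rows from the table above; the variable names are illustrative and are not part of the original slides.

```python
from collections import Counter

# Department values exactly as typed into the free-text field above.
employees = [
    ("Annie", "Oakley", "340955", "Collections"),
    ("Noam", "Chomsky", "409102", "Graphic Arts"),
    ("Marvin", "Hagler", "268003", "Bill collection"),
    ("William", "Cody", "550254", "Graphic Rats"),
    ("Walt", "Disney", "027851", "Bill collecting"),
]

# Count how many employees appear under each spelling of a department.
headcount = Counter(dept for _, _, _, dept in employees)

for dept, n in sorted(headcount.items()):
    print(f"{dept}: {n}")

print("Distinct department values:", len(headcount))
```

To the computer, every spelling is a separate department, so a headcount report on this table shows five departments with one employee each.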

• Relational database management systems (RDBMS) achieve vast improvements in efficiency and data accuracy by linking each such field to a separate table which describes that kind of entity:

First name | Last name | Employee ID | Department ID | etc.
Annie      | Oakley    | 340955      | 6             | etc.
Noam       | Chomsky   | 409102      | 9             | etc.
Marvin     | Hagler    | 268003      | 6             | etc.
William    | Cody      | 550254      | 9             | etc.
Walt       | Disney    | 027851      | 6             | etc.

Department ID | Department name | Department VP emp ID | etc.
6             | Collections     | 711203               | etc.
9             | Graphic Arts    | 488030               | etc.
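For readers who want to see the linkage in action, here is a minimal sketch of the two tables above, using Python's built-in sqlite3 module. The table and column names, and the test employee with ID 000001, are assumptions made for illustration; the slides do not name a particular database product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces links when asked to

conn.executescript("""
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        vp_emp_id     TEXT
    );
    CREATE TABLE employee (
        employee_id   TEXT PRIMARY KEY,  -- text, so that 027851 keeps its leading zero
        first_name    TEXT,
        last_name     TEXT,
        department_id INTEGER NOT NULL REFERENCES department(department_id)
    );
""")

conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [(6, "Collections", "711203"), (9, "Graphic Arts", "488030")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("340955", "Annie", "Oakley", 6),
        ("409102", "Noam", "Chomsky", 9),
        ("268003", "Marvin", "Hagler", 6),
        ("550254", "William", "Cody", 9),
        ("027851", "Walt", "Disney", 6),
    ],
)

# Department names are stored once and looked up through the ID.
rows = conn.execute("""
    SELECT e.first_name, e.last_name, d.name
    FROM employee AS e JOIN department AS d USING (department_id)
    ORDER BY e.last_name
""").fetchall()
for row in rows:
    print(row)

# A hypothetical employee pointing at department 7, which does not exist.
try:
    conn.execute("INSERT INTO employee VALUES ('000001', 'Test', 'Person', 7)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("Unknown department rejected:", rejected)
```

The last step shows the point of the link: the database refuses a department that is not in the Department table, so "Graphic Rats" can never get in.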

• An RDBMS reduces data entry work enormously

– In the example on the previous two slides, 500 department name entries must be copied or typed if an RDBMS is not used

– If an RDBMS is used, only 10 department name entries are needed. When data is entered in the Employee table, the system presents the user with a list of departments for the Department field, and the user selects rather than typing or pasting.

• An RDBMS eliminates a huge number of opportunities for error

– The system presents a list of selections for any linked field, rather than requiring someone to type or paste accurately. The only possible error is an incorrect selection, which is far less likely than a spelling or pasting error.

• As the size and complexity of a dataset increase, the benefits of an RDBMS increase in proportion

• The benefits of an RDBMS are even greater when attributes that belong in linked tables are taken into account.

– Suppose that each department has its own cost centre code, and that we want to see which cost centre each employee belongs to

First name | Last name | Employee ID | Department   | Cost centre
Annie      | Oakley    | 340955      | Collections  | 7684
Noam       | Chomsky   | 409102      | Graphic Arts | 2417
etc.       | etc.      | etc.        | etc.         | etc.

– Because the cost centre code is determined by which department the employee is in, we do not need to enter this information in the table of employees at all. The information should be set up as shown on the next slide:

First name | Last name | Employee ID | Department ID | etc.
Annie      | Oakley    | 340955      | 6             | etc.
Noam       | Chomsky   | 409102      | 9             | etc.
Marvin     | Hagler    | 268003      | 6             | etc.
William    | Cody      | 550254      | 9             | etc.
Walt       | Disney    | 027851      | 6             | etc.

Department ID | Department name | Department VP emp ID | Department cost centre | etc.
6             | Collections     | 711203               | 7684                   | etc.
9             | Graphic Arts    | 488030               | 2417                   | etc.

After the correct department has been selected for each employee record, the linkage between the Department ID fields (the “lookup fields”) guarantees that the cost centre code, and every other datum in the Department table, is looked up from a single place. If it is correct in the Department table, it is correct for every employee in that department.
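The cost centre lookup can be sketched as follows, again with Python's built-in sqlite3 module. The table and column names are illustrative, and the change of the Collections cost centre to 7700 is a made-up event used only to show the effect of a single update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        cost_centre   TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id   TEXT PRIMARY KEY,
        last_name     TEXT,
        department_id INTEGER REFERENCES department(department_id)
    );
    INSERT INTO department VALUES
        (6, 'Collections', '7684'), (9, 'Graphic Arts', '2417');
    INSERT INTO employee VALUES
        ('340955', 'Oakley', 6), ('409102', 'Chomsky', 9), ('268003', 'Hagler', 6),
        ('550254', 'Cody', 9), ('027851', 'Disney', 6);
""")


def cost_centres(conn):
    """Return {last name: cost centre}, looked up through the Department ID."""
    return dict(conn.execute("""
        SELECT e.last_name, d.cost_centre
        FROM employee AS e JOIN department AS d USING (department_id)
    """).fetchall())


before = cost_centres(conn)

# Hypothetical change (not from the slides): Collections moves to cost centre 7700.
changed = conn.execute(
    "UPDATE department SET cost_centre = '7700' WHERE department_id = 6"
).rowcount

after = cost_centres(conn)
print("Rows changed:", changed)
print(before)
print(after)
```

No cost centre is ever typed into the Employee table. One row changes in the Department table, and every employee in that department reports the new code.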

These benefits are the reason for the First Commandment of RDBMS:

Store each data item only once!

Let’s take another look at our example

First name | Last name | Employee ID | Department ID | etc.
Annie      | Oakley    | 340955      | 6             | etc.
Noam       | Chomsky   | 409102      | 9             | etc.
Marvin     | Hagler    | 268003      | 6             | etc.
William    | Cody      | 550254      | 9             | etc.
Walt       | Disney    | 027851      | 6             | etc.

Department ID | Department name | Department VP emp ID | etc.
6             | Collections     | 711203               | etc.
9             | Graphic Arts    | 488030               | etc.

Why don’t we just make the user select the department name, and store the department name in this field? Then the system could look up the department by name, and we wouldn’t need the Department ID field at all.

Because information changes!

• Suppose that:

– The responsibilities of the Graphic Arts department are expanded, and it becomes Design and Animation.

– The Collections department is renamed Revenue Assurance.

• What happens when department names are used in the lookup fields instead of IDs?

– If names are used, we must update the Department Name field in the table of departments, and we must also update the Department Name field for each and every employee in that department.

– If IDs are used, we update the Department Name field in the table of departments. AND WE’RE DONE.

• If we store meaningful information (e.g., names) in the lookup fields, data maintenance is many times harder: in our example of 500 employees and 10 departments, one rename means changing an average of 50 employee records as well as the one department record.
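The difference in maintenance work can be counted directly. The sketch below sets up both designs side by side in Python's built-in sqlite3 module and applies the two renames described above; all table and column names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Design A: the department name itself is the lookup value
    CREATE TABLE dept_by_name (name TEXT PRIMARY KEY);
    CREATE TABLE emp_by_name (employee_id TEXT PRIMARY KEY, department TEXT);
    INSERT INTO dept_by_name VALUES ('Collections'), ('Graphic Arts');
    INSERT INTO emp_by_name VALUES
        ('340955', 'Collections'), ('409102', 'Graphic Arts'),
        ('268003', 'Collections'), ('550254', 'Graphic Arts'),
        ('027851', 'Collections');

    -- Design B: a meaningless ID is the lookup value
    CREATE TABLE dept_by_id (department_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp_by_id (employee_id TEXT PRIMARY KEY, department_id INTEGER);
    INSERT INTO dept_by_id VALUES (6, 'Collections'), (9, 'Graphic Arts');
    INSERT INTO emp_by_id VALUES
        ('340955', 6), ('409102', 9), ('268003', 6), ('550254', 9), ('027851', 6);
""")

renames = {
    "Graphic Arts": "Design and Animation",
    "Collections": "Revenue Assurance",
}

rows_a = 0  # rows touched when names are the lookup values
rows_b = 0  # rows touched when IDs are the lookup values
for old, new in renames.items():
    # Design A: fix the department table AND every employee record.
    rows_a += conn.execute(
        "UPDATE dept_by_name SET name = ? WHERE name = ?", (new, old)
    ).rowcount
    rows_a += conn.execute(
        "UPDATE emp_by_name SET department = ? WHERE department = ?", (new, old)
    ).rowcount
    # Design B: fix the department table. Employee records are untouched.
    rows_b += conn.execute(
        "UPDATE dept_by_id SET name = ? WHERE name = ?", (new, old)
    ).rowcount

print("Rows changed with names as keys:", rows_a)
print("Rows changed with IDs as keys:", rows_b)
```

With only five employees the count is 7 rows against 2. The second number stays at 2 no matter how many employees there are; the first grows with every person hired.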

The need to maintain data efficiently is the reason for the Second Commandment of RDBMS:

NEVER put meaning in your key (lookup field) values!

What All This Means

• If the information you are handling is inherently relational (which it almost always is), you cannot afford the wasted effort and errors that result from NOT using an RDBMS.

• Your RDBMS must obey the commandments:

– Store each data item only once.

– Never put meaning in your key values.

If you do not use an RDBMS that follows the commandments, your data will go to Hell and will take your business with it.

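Da ogni bocca dirompea co’ denti

un peccatore, a guisa di maciulla,

sì che tre ne facea così dolenti

(From every mouth he was crushing a sinner with his teeth, like a flax brake, so that he kept three of them in such pain.)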