lecture 3 - pdf - full


Upload: albert-wang

Post on 20-Oct-2015


DESCRIPTION

Information Management

TRANSCRIPT

Page 1: Lecture 3 - PDF - Full

2014-01-18

AP/ADMS 2511 Management Information Systems

Session 3 – Chapter 3

Data, Information, and Knowledge Management

Learning Objectives

1. Describe the difficulties of managing data
2. Explain how data governance is facilitated by master data management
3. Use the data hierarchy and build E-R (entity relationship) diagrams
4. Explain the characteristics of relational database management systems and their role in information reporting
5. Explain the nature of data warehouses and data marts, their advantages and disadvantages and their role in data mining
6. Describe the knowledge management system cycle and discuss types of knowledge

What can “go wrong” with data?

• It can be:
– Lost
– Copied
– Erased
– Changed
– Stored in multiple copies (that are slightly different!)
– Hard to find
– Overwhelming!


Data quality problems almost cost Library and Archives Canada $200,000


• Library and Archives Canada wanted to buy an old map worth $200,000

• Then, they found that they already had two copies!

• Culprits: translation errors, incomplete data entry and data entry errors

Managing Data

Difficulties in Managing Data

– Amount of data increases exponentially

– Data are scattered and collected by many individuals using various methods and devices

– Data come from many sources

– Data security, quality, and integrity are critical

Examples of Data Sources

E-mails

Credit card swipes

RFID tags

Digital video surveillance

Radiology scans

Blogs


Note the difference between transaction and master data

• Master data (kept in master files) are semi-permanent data, such as employee name, address, customer name, customer credit limit

• Transaction data represent business activities or events, such as a payroll cheque or a customer invoice

• Importance of master data management
– Video: http://www.youtube.com/watch?v=MRgjLKufu34

Data Governance

Data Governance encompasses the people, processes and procedures to create a consistent, enterprise view of your data in order to:

– Improve data security

– Increase consistency & confidence in decision making

– Decrease the risk of regulatory fines

Video: http://www.youtube.com/watch?v=tylX6GvTu5o

Relationship between data governance and master data management

Video: http://www.youtube.com/watch?v=0uR_-w-UQI0

Practice question 1

Ace Hardware


Catch [Figure 6-2]

Organizing Data in a Traditional File Environment

File organization concepts
• Bit: Smallest unit of data; binary digit (0,1)
• Byte: Group of bits that represents a single character
• Field: Group of characters forming a word, a group of words, or a complete number
• Record: Group of related fields
• File: Group of records of same type
• Database: Group of related files
• Entity: Person, place, thing, event about which information is maintained
• Attribute: Description of a particular entity
• Key field: Identifier field used to retrieve, update, sort a record
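The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the class name, field names, and the second record are assumptions made for the example, while the values 137, Door Latch, 22.00 and 8259 are the part example used elsewhere in these slides.

```python
from dataclasses import dataclass


@dataclass
class PartRecord:
    """A record: a group of related fields describing one part."""
    part_number: int       # key field: identifies the record
    part_name: str         # field
    unit_price: float      # field
    supplier_number: int   # field


# A file: a group of records of the same type.
parts_file = [
    PartRecord(137, "Door Latch", 22.00, 8259),
    PartRecord(145, "Door Handle", 26.25, 8444),  # invented second record
]

# A byte represents a single character and is made of 8 bits.
first_char = "Door Latch"[0]
bits = format(ord(first_char), "08b")


def find_by_key(records, part_number):
    """Use the key field to retrieve a single record (None if absent)."""
    for record in records:
        if record.part_number == part_number:
            return record
    return None
```

Calling `find_by_key(parts_file, 137)` returns the Door Latch record, which is the role the key field plays when a record is retrieved, updated, or sorted.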

Data design for database and file-based systems

Data term for database systems | Examples | Data term for file-based systems
Entity OR instance (information about one part) | 137, Door Latch, $22.00, 8259 | Record
Entity class (OR flat file if extracted from the database) | (all of the information for a particular entity type) | File (or table)

Data design for database and file-based systems (p. 73)

Data term for database systems | Examples | Data term for file-based systems
Attribute | Part_Name: Door Latch | Field
Primary key OR identifier | 137 (unique part number) | Same: Primary key
Secondary key or Foreign key | 8259 (Supplier_Number) | Same: Secondary key


An Example of a Table

[Figure: a sample table for the entity class Part; the rows are labelled Records and the columns are labelled Attributes]

Primary Keys & Foreign Keys

To ensure that each record is unique in each table, we can set one field to be a Primary Key field.

A Primary Key is a field that will contain no duplicates and no blank values.
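A minimal sketch of both rules, using Python's built-in sqlite3 module with an in-memory database. The table and column names and the helper `try_insert` are assumptions for illustration. One SQLite-specific detail: a non-integer primary key column only rejects blank (NULL) values when NOT NULL is declared as well, so the sketch spells it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# part_number is the primary key: no duplicates, and (with NOT NULL) no blanks.
conn.execute(
    "CREATE TABLE part ("
    " part_number TEXT NOT NULL PRIMARY KEY,"
    " part_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO part VALUES ('137', 'Door Latch')")


def try_insert(part_number, part_name):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO part VALUES (?, ?)", (part_number, part_name))
        return "accepted"
    except sqlite3.IntegrityError:
        # Raised for a duplicate key and for a blank (NULL) key.
        return "rejected"
```

With this setup, `try_insert("137", "Another latch")` and `try_insert(None, "No key")` are both rejected, while a new key such as `"138"` is accepted.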

Example

Based on the report above, find the entity classes, attributes and primary key that are in the database which the report issued from.

Entity class | Attributes | Primary Key
Employee | Employee id, employee name | Employee #
Department | Dept. Num., Dept. Name, Num of employees | Dept. #
Job | Job #, Job name, Hours | Job #

Designing Databases – Data Model
• A map or diagram that represents entities and their relationships
• Used by Database Administrators to design tables with their corresponding associations

Entity-Relationship Modeling (ER)

Database designers plan the database design in a process called entity-relationship (ER) modeling.

ER diagrams consist of entities, attributes and relationships.

Entities:
– Entity instance: person, place, object, event, concept (often corresponds to a row in a table)
– Entity class: collection of entities (often corresponds to a table)

Relationships:
– Relationship instance: link between entities (corresponds to primary key-foreign key equivalencies in related tables)
– Relationship type: category of relationship; a link between entity types

Attribute: property or characteristic of an entity or relationship type (often corresponds to a field (column) in a table)

E-R Model Constructs

Cardinality of Relationships

• One-to-One
– Each entity in the relationship will have exactly one related entity

• One-to-Many
– An entity on one side of the relationship can have many related entities, but an entity on the other side will have a maximum of one related entity

• Many-to-Many
– Entities on both sides of the relationship can have many related entities on the other side
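One common way to realise the three cardinalities in relational tables is sketched below with Python's sqlite3 module. All table names, column names, and rows are invented for the example; the slides do not prescribe this schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One-to-many: each employee row points at exactly one department.
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);

-- One-to-one: UNIQUE stops two parking spots from sharing an employee.
CREATE TABLE parking_spot (
    spot_id INTEGER PRIMARY KEY,
    emp_id  INTEGER UNIQUE REFERENCES employee(emp_id)
);

-- Many-to-many: a junction table pairs employees with projects.
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE assignment (
    emp_id     INTEGER REFERENCES employee(emp_id),
    project_id INTEGER REFERENCES project(project_id),
    PRIMARY KEY (emp_id, project_id)
);

INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (10, 'Ana', 1), (11, 'Ben', 1);
INSERT INTO project VALUES (100, 'Website'), (101, 'Audit');
INSERT INTO assignment VALUES (10, 100), (10, 101), (11, 100);
""")


def count(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]


employees_in_sales = count("SELECT COUNT(*) FROM employee WHERE dept_id = 1")
projects_for_ana = count("SELECT COUNT(*) FROM assignment WHERE emp_id = 10")
people_on_website = count("SELECT COUNT(*) FROM assignment WHERE project_id = 100")
```

The foreign key sits on the "many" side of a one-to-many relationship, and a many-to-many relationship needs its own table because neither side can hold a single foreign key value.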

Entity-Relationship diagram models also document what is happening with data (p. 118)


In Class Entity-relationship diagram practice

Let's design an E-R diagram for a movie database
• Movies have: title, year of release, length (minutes), genre (e.g. comedy, action), directors, actors, rating (e.g. PG, A), studio
• Studios (with a name and address) produce one or more movies
• Actors (name and date of birth) have roles in one or more movies
• Directors (name and date of birth) direct one or more movies

Video explaining how to do E-R diagrams: http://www.youtube.com/watch?gl=CA&v=mQ4D0drMrYI

Solution
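One possible reading of the movie exercise, written as plain Python data rather than a drawing. It is a sketch under stated assumptions, not necessarily the diagram shown in class: the cardinalities are inferred from the wording of the exercise (a movie lists one studio but several actors and directors), and the identifier names are invented.

```python
# Entity classes and their attributes, taken from the exercise text.
entities = {
    "Movie": ["title", "year_of_release", "length_minutes", "genre", "rating"],
    "Studio": ["name", "address"],
    "Actor": ["name", "date_of_birth"],
    "Director": ["name", "date_of_birth"],
}

# (entity, relationship, entity, assumed cardinality)
relationships = [
    ("Studio", "produces", "Movie", "one-to-many"),
    ("Actor", "has role in", "Movie", "many-to-many"),
    ("Director", "directs", "Movie", "many-to-many"),
]


def needs_junction_table(relationship):
    """A many-to-many relationship becomes its own table in a relational design."""
    return relationship[3] == "many-to-many"


junction_relationships = [r[1] for r in relationships if needs_junction_table(r)]
```

Under these assumptions the Movie table would carry a foreign key to Studio, while the acting and directing relationships would each get a separate linking table.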


Problems with the traditional file environment

• Data redundancy - The presence of duplicate data in multiple data files so that the same data are stored in more than one place or location.

• Data inconsistency - The same attribute may have different values.

• Program-data dependence - The coupling of data stored in files and the specific programs required to update and maintain those files such that changes in programs require changes to the data

• Lack of flexibility - A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad-hoc reports or respond to unanticipated information requirements in a timely fashion

• Poor security - Management may have no knowledge of who is accessing or making changes to the organization’s data

• Lack of data sharing and availability - Information cannot flow freely across different functional areas or different parts of the organization.
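Redundancy and inconsistency are easy to reproduce. The sketch below assumes two departments that each keep their own customer file; the file names, customer, and addresses are invented for the example.

```python
# The same customer is stored twice (redundancy) ...
sales_file = {"C001": {"name": "Pat Lee", "address": "12 King St"}}
# ... and the two copies have drifted apart (inconsistency).
billing_file = {"C001": {"name": "Pat Lee", "address": "12 King Street"}}


def find_inconsistencies(file_a, file_b):
    """List (key, attribute, value_a, value_b) where shared records disagree."""
    problems = []
    for key in sorted(file_a.keys() & file_b.keys()):
        shared_attributes = file_a[key].keys() & file_b[key].keys()
        for attribute in sorted(shared_attributes):
            value_a = file_a[key][attribute]
            value_b = file_b[key][attribute]
            if value_a != value_b:
                problems.append((key, attribute, value_a, value_b))
    return problems


conflicts = find_inconsistencies(sales_file, billing_file)
```

Here `conflicts` holds one entry for the address of customer C001. With a single shared table there would be only one copy of the address, so nothing could disagree.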

Organizing Data in a Traditional File Environment

Database management systems (DBMS)
• How a DBMS solves the problems of the traditional file environment

The Database Approach to Data Management

A Database Management System (DBMS) is a set of computer programs that controls the creation, maintenance, and use of the database for an organization and its end users.

Disadvantages of Database Management Systems


• More complex (and costly) to set up and maintain

• Complex structures may be slower for processing high volume periodic transaction updates

• How a photo-sharing site manages its data (video): http://news.cnet.com/1606-2_3-23122.html

Using the relational database model

• Based on linked two-dimensional tables
• Comprises:
– Query language (SQL)
– Data dictionary
– Normalization process

• How would our student information be organized in a relational table?

H.W: Practice Question ABM Inc. looks at data management and relational database aspects.

Relationships


Practice Question 2

DBMS at China Sports Lottery


What is a data warehouse?
• A specialized form of database
• An architectural structure that defines how historical data is stored
• Linking components of the data warehouse via data communications increases the scope of data available

Advantages of data warehouses

• Data is:
– Organized and consistent
– Integrated (and possibly ‘cleansed’)
– Historical and nonvolatile
– Optimized for access (for OLAP use, multi-dimensional)

Disadvantages/constraints of data warehouses

VERY costly and complex to establish (hardware, software and people)

Requires continual maintenance as supporting applications change

Requires high levels of security to restrict access to authorized users

What Is a Data Warehouse Used for?

• Knowledge discovery
– Making consolidated reports
– Finding relationships and correlations, trends and patterns of behavior.
– Data mining
– Examples
  • Banks identifying credit risks
  • Insurance companies searching for fraud
  • Medical research

Data Marts

• How does a data mart differ from a data warehouse?

• Data Mart – A logical subset of the complete data warehouse. Often viewed as a restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.


Data life cycle

Data: elementary descriptions of transactions that are recorded, classified, and stored but not organized to convey any specific meaning.

Information: data that have been organized so that they have meaning and value to the recipient.

Knowledge: data and/or information that have been organized and processed to convey understanding, experience, accumulated learning, and expertise.

Wisdom: knowledge accumulated and applied

The data life cycle in an organization, Fig 3.1, p. 114


Knowledge Management

Knowledge management (KM) is a process that helps organizations manipulate important knowledge that is part of the organization’s memory, usually in an unstructured format

http://www.youtube.com/watch?v=aIM9hmI-t6w&feature=related

Two Kinds of Knowledge

Knowledge is intangible, dynamic, and difficult to measure, but without it no organization can survive.

• Tacit: tacit (unarticulated) knowledge is more personal, experiential, context specific, and hard to formalize; it is difficult to communicate or share with others, and is generally in the heads of individuals and teams.

• Explicit: explicit knowledge can easily be written down and codified.

People need to know what to do with data

• Data mining / data analytics
– http://www.youtube.com/watch?v=BjznLJcgSFI

• Big data, TEDxUofM - Jameson Toole - Big Data for Tomorrow
– http://www.youtube.com/watch?v=HSVQ5RDBEJs

Knowledge Management System Cycle

Create knowledge

Capture knowledge

Refine knowledge

Store knowledge

Manage knowledge

Disseminate knowledge

http://www.youtube.com/watch?v=9vm77Ge2Kxs

http://www.youtube.com/watch?v=LYq9jmVtQU8

SAP LAB

SAPGUI - System ID GB2


Add the “GB2” connection if it is missing.

• Highlight (single left click) the “Connections” entry in the left hand pane of the Logon screen.
• While pointing at the highlighted entry, right click.
• Select “Add a New Entry” from the floating menu.
• Click on “Next”.
• Fill in the GB2 connection details:
  Description: “GB2”
  Application Server: “GB2.UCC.UWM.EDU”
  Instance Number: “00” [two zeros]
  System ID: “GB2”
• Click on “Finish”.

Client Number: 527

Student user IDs: GBI-### (see the posted list for your ###). Case sensitive.

Do not try to delete the asterisks. Enter the password.

“###” in the SAP instruction document means your specific GBI #, i.e. the unique number you have after the “GBI-” in your user ID login.

Example: for GBI-222, the ### in the assignment will be 222.

When you log on for the first time the system will request you to change your password.

Enter the initial password provided here and change it to the LAST SIX DIGITS of your STUDENT ID. Maintain this password throughout the course and don’t change it.

Student initial Passwords: Fall2511*case sensitive

• Next, the following window will pop up. Click on

Introduction to Navigation in SAP Solutions and Products

Open the file: Navigation exercise.doc in the SAP (Session 3) tab.

Assignment 1

Now you can start working on Assignment 1. Please go to SAP Assignment 1 in the Session 4 tab.

Please note that I will not be able to assist with your SAP concerns over email; therefore the next session will include a SAP session as well.

Extra Reading

Normalization

Normalization is a method for analyzing and reducing a relational database to its most streamlined form for:
• Minimum redundancy
• Maximum data integrity
• Best processing performance

Data are normalized when the attributes in a table depend only on the primary key.
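A small sketch of the idea, assuming a flat order list in which the customer's name is repeated on every order. The customer name depends on the customer ID rather than on the order ID, so it is moved into its own table. All field names and rows are invented for illustration.

```python
# Non-normalized: customer_name is repeated and depends on customer_id,
# not on the primary key of this table (order_id).
flat_orders = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Pat Lee", "item": "Door Latch"},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Pat Lee", "item": "Hinge"},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Sam Roy", "item": "Door Latch"},
]


def normalize(rows):
    """Split flat rows into a customer table and an order table."""
    customers = {}  # keyed by customer_id (primary key)
    orders = []     # each row keeps customer_id as a foreign key
    for row in rows:
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": row["customer_id"],
                "item": row["item"],
            }
        )
    return customers, orders


customers, orders = normalize(flat_orders)
```

After the split each customer name is stored once, so correcting a name means changing one row instead of every order that customer ever placed.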

Non-Normalized Relation


Normalizing the Database (part A)

Normalizing the Database (part B)

Normalization Produces Order

What is the ultimate purpose of a database management system?

It's to transform: Data → Information → Knowledge → Action/wisdom

Data driven decision making

http://www.youtube.com/watch?v=M3-4lT1K1Fc&feature=related


Metadata

The term "meta" comes from a Greek word that denotes something of a higher or more fundamental nature. Metadata, then, is data about other data. The term refers to any data used to aid the identification, description and location of networked electronic resources

What is Metadata?

Metadata is analogous to product labeling. Examples of metadata elements: title, author, abstract, sources, time period, (file) size, supplemental information, entity, attributes.

©2005 CSC Brands, L.P. All Rights Reserved

Query Languages

The most popular way to request information from a database is by using query languages.

SQL: short for Structured Query Language. A standard language used to request information from databases. Servers which can handle SQL are known as SQL servers.

Sample of an SQL query.
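As an illustrative stand-in, here is a small query run through Python's built-in sqlite3 module. The table layout and the second and third rows are assumptions made for the example; only the Door Latch row echoes the part example used in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part ("
    " part_number INTEGER PRIMARY KEY,"
    " part_name TEXT,"
    " unit_price REAL,"
    " supplier_number INTEGER)"
)
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?, ?)",
    [
        (137, "Door Latch", 22.00, 8259),
        (145, "Door Handle", 26.25, 8444),  # invented row
        (150, "Door Seal", 6.00, 8259),     # invented row
    ],
)

# SELECT names the columns wanted, FROM names the table,
# WHERE filters the rows, ORDER BY sorts the result.
query = """
SELECT part_name, unit_price
FROM part
WHERE unit_price > 20
ORDER BY unit_price
"""
rows = conn.execute(query).fetchall()
```

The query describes what information is wanted (parts priced above 20, cheapest first) and leaves it to the DBMS to work out how to retrieve it.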

Online Analytical Processing (OLAP)

Supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions
• Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension
• E.g. Comparing sales in East in June vs. May and July

Enables users to obtain online answers to ad hoc questions such as these fairly rapidly

The view that is showing is product versus region.

If you rotate the cube 90 degrees, the face that will show is product versus actual and projected sales.

If you rotate the cube 90 degrees again, you will see region versus actual and projected sales.

Other views are possible.
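Rotating the cube amounts to choosing which two dimensions to lay out as rows and columns. The sketch below assumes a tiny fact list with three dimensions (product, region, and actual versus projected sales); the products, regions, and amounts are all invented.

```python
from collections import defaultdict

# Each fact: (product, region, scenario, sales amount). Invented values.
facts = [
    ("Nuts", "East", "actual", 50), ("Nuts", "East", "projected", 60),
    ("Nuts", "West", "actual", 40), ("Nuts", "West", "projected", 45),
    ("Bolts", "East", "actual", 30), ("Bolts", "East", "projected", 35),
    ("Bolts", "West", "actual", 20), ("Bolts", "West", "projected", 25),
]
DIMENSIONS = {"product": 0, "region": 1, "scenario": 2}


def view(row_dim, col_dim, keep=None):
    """Return one face of the cube as {(row value, column value): total}.

    `keep` optionally fixes other dimensions, e.g. {"scenario": "actual"};
    anything not fixed is summed over.
    """
    keep = keep or {}
    table = defaultdict(int)
    for fact in facts:
        if any(fact[DIMENSIONS[dim]] != value for dim, value in keep.items()):
            continue
        table[(fact[DIMENSIONS[row_dim]], fact[DIMENSIONS[col_dim]])] += fact[3]
    return dict(table)


product_by_region = view("product", "region", keep={"scenario": "actual"})
product_by_scenario = view("product", "scenario")  # summed over regions
region_by_scenario = view("region", "scenario")    # summed over products
```

The three results correspond to the three faces described above; the underlying facts never change, only the pair of dimensions used to present them.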

Multidimensional Data Model


Data warehousing at HBC (see Fig 4.11, p. 121)

• HBC spent three years implementing data marts and data query processes

• It was difficult to get business users to give up high levels of detail

• Expected to improve access to data for over 2,000 employees

Example - From Exam

The Board of Directors of RTAL agreed that setting up a DBMS for client and candidate records is the way to go. They would like to know what data analysis tools are available to them once a DBMS is set up. You suggest data mining. Define data mining and provide two examples of how RTAL can use data mining to improve its operational efficiencies.

Legislation can affect data archiving requirements

• Two relatively recent legislative changes that have an effect on storage:
– In Canada, PIPEDA (Personal Information Protection and Electronic Documents Act) came into effect January 1, 2004
– In the U.S., Sarbanes-Oxley, July 2002

• More local bills also provide requirements for information system implementation and management
– For example, Texas State Legislature Bill 3740 has requirements for data management, systems implementation and data governance. See: http://www.legis.state.tx.us/tlodocs/81R/billtext/html/HB03740I.htm

Team work created this course

• The materials for this course were developed by Cristobal Sanchez-Rodriguez and Ingrid Splettstoesser, with the help of:
– Hila Cohen
– Ken Cudeck
– Marius Dobre
– John Kucharczuk
– Carl Lapp
– Donna Rex
– Mario Vasilkovs