Managing Data Resources, 9th Edition

Upload: emory-page

Posted on 04-Jan-2016


Page 1: Managing Data Resources 9 Th Edition. Problems with the Traditional File Environment Data redundancy and inconsistency: the presences of duplicate data

Problems with the Traditional File Environment

• Data redundancy and inconsistency: the presence of duplicate data in multiple data files, so that the same data are stored in more than one place or location
– Data inconsistency: the same attribute may have different values in different files

• Program-data dependence: the coupling of data stored in files to the specific programs required to update and maintain those files

• Lack of flexibility: traditional file systems can deliver routine scheduled reports, but cannot deliver ad hoc reports or respond to unanticipated requirements.
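To make redundancy and program-data dependence concrete, here is a minimal Python sketch. It assumes two hypothetical departmental files (plain lists of dictionaries), and each one stores the same customer's address. The sales program updates only its own file, so the two copies drift apart.

```python
# Hypothetical example: two departments keep their own flat "files"
# (lists of dicts), and each stores a copy of the same customer's address.
sales_file = [{"customer_id": 101, "name": "Ann Lee", "address": "12 Oak St"}]
billing_file = [{"customer_id": 101, "name": "Ann Lee", "address": "12 Oak St", "balance": 250.0}]

def update_address(file, customer_id, new_address):
    """Update the address in ONE file; this program only knows about its own file."""
    for record in file:
        if record["customer_id"] == customer_id:
            record["address"] = new_address

# The sales program records a move; the billing file is never touched.
update_address(sales_file, 101, "98 Elm Ave")

def find_inconsistencies(file_a, file_b, key, attribute):
    """Return keys whose attribute value differs between the two files."""
    values_b = {r[key]: r[attribute] for r in file_b}
    return [r[key] for r in file_a if r[key] in values_b and values_b[r[key]] != r[attribute]]

print(find_inconsistencies(sales_file, billing_file, "customer_id", "address"))  # [101]
```

In a database environment, the address would be stored once and shared, so this kind of drift could not arise from a single update.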

Problems with the Traditional File Environment (Continued)

• Lack of data sharing and availability: Information cannot flow freely across different functional areas or different parts of the organization. Users find different values of the same piece of information in two different systems.

• Poor security: Because there is little control or management of data, management may have no knowledge of who is accessing or even changing the organization’s data.

Other Database Concepts

• Object-oriented database model
– Successor to the relational model
– Integration of data and programs
– Handles a wider variety of field types

• Entity-relationship diagrams
– Graphical method of displaying relationships between tables
– Tool for IS professionals

Types of Database Models

• Hierarchical

• Network

• Relational

• Object-oriented
– Extension of the relational model
– Stores both data and the procedures that act on the data
– Stores more complex types of information (e.g., graphics)

An Entity-Relationship Diagram

CREATING A DATABASE ENVIRONMENT

Figure 7-12

Physical versus Logical Views

• In managing information, the physical view deals with how information is structured as it resides on various storage media.

• The logical view deals with how knowledge workers view their information needs, and includes such terms as:
– CHARACTER - the smallest unit of information
– FIELD - a group of related characters
– RECORD - a group of related fields
– FILE - a group of related records
– DATABASE - a group of logically associated files
– DATA WAREHOUSE - information drawn from many databases
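The logical data hierarchy can be sketched with Python dataclasses. The class names and sample records are hypothetical. A field is modeled as a string value, and a character is one letter of that value. The data-warehouse level is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Record:
    """A record groups related fields (field name -> value), e.g. one student."""
    fields: Dict[str, str]

@dataclass
class File:
    """A file groups related records of the same kind."""
    name: str
    records: List[Record] = field(default_factory=list)

@dataclass
class Database:
    """A database groups logically associated files."""
    files: Dict[str, File] = field(default_factory=dict)

    def add_file(self, f: File) -> None:
        self.files[f.name] = f

students = File("students", [Record({"id": "1001", "name": "Ann Lee"}),
                             Record({"id": "1002", "name": "Raj Patel"})])
courses = File("courses", [Record({"course_id": "IS101", "title": "Intro to IS"})])

db = Database()
db.add_file(students)
db.add_file(courses)

name_field = db.files["students"].records[0].fields["name"]  # a FIELD value
first_character = name_field[0]                              # a CHARACTER
print(name_field, first_character)  # Ann Lee A
```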

Other Logical Structures in a Database

• DATA DICTIONARY - contains the logical structure of information in a database.

• An INTEGRITY CONSTRAINT is a rule that helps ensure the quality of the information in a database.
– A registration database at your school includes integrity constraints concerning prerequisites for certain classes.
– Designating primary keys, enforcing referential integrity, using input masks, and applying validation rules are ways to establish integrity constraints.
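Here is a minimal sketch of three of these constraints, using Python's built-in sqlite3 module. The course and prerequisite tables are hypothetical and are loosely modeled on the registration example. Input masks are a front-end feature (for example, in Access forms), so they are not shown. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,                                -- primary key
    credits   INTEGER NOT NULL CHECK (credits BETWEEN 1 AND 6) -- validation rule
);
CREATE TABLE prerequisite (
    course_id   TEXT NOT NULL REFERENCES course(course_id),    -- referential integrity
    requires_id TEXT NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (course_id, requires_id)
);
""")
conn.execute("INSERT INTO course VALUES ('IS101', 3)")
conn.execute("INSERT INTO course VALUES ('IS201', 3)")
conn.execute("INSERT INTO prerequisite VALUES ('IS201', 'IS101')")

def try_insert(sql):
    """Run one statement and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert("INSERT INTO course VALUES ('IS101', 3)"))            # duplicate primary key
print(try_insert("INSERT INTO course VALUES ('IS301', 12)"))           # violates CHECK rule
print(try_insert("INSERT INTO prerequisite VALUES ('IS201', 'XX9')"))  # unknown course
```

The exact error wording varies by SQLite version, so only the accepted/rejected outcome should be relied on.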

Sample Data Dictionary Report
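The sample report image is not reproduced here. As a rough stand-in, note that many DBMSs keep their data dictionary in system tables that can be queried. The sketch below builds a small report from SQLite's `sqlite_master` catalog and `PRAGMA table_info`. SQLite is an assumption, since the slides name no product, and the student table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    major      TEXT DEFAULT 'Undeclared'
);
""")

def data_dictionary(conn):
    """Build a simple data-dictionary report from SQLite's own catalog."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for cid, col, col_type, notnull, default, pk in conn.execute(f"PRAGMA table_info({table})"):
            rows.append((table, col, col_type,
                         "yes" if notnull else "no", default, "yes" if pk else "no"))
    return rows

print(("table", "field", "type", "required", "default", "key"))
for row in data_dictionary(conn):
    print(row)
```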

Components of a DBMS

DBMS engine - accepts logical requests from the various other DBMS subsystems, converts them to their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.

DATA DEFINITION SUBSYSTEM - helps you create and maintain the data dictionary and define the structure of the files in a database.

You use this subsystem to define the logical structure of your information when you first create a database. Once you’ve created a database, you use this subsystem to define new fields, delete fields, or change field properties.
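Assuming SQLite as the DBMS, the data definition step looks like this. `CREATE TABLE` defines the structure when the database is first created, and `ALTER TABLE` changes it later. Deleting a field with `ALTER TABLE ... DROP COLUMN` needs SQLite 3.35 or newer, so only adding a field is shown. The employee table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the structure when the database is first created.
conn.execute("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
)""")

# Later, change the structure: define a new field on the existing table.
conn.execute("ALTER TABLE employee ADD COLUMN hire_date TEXT")

columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(columns)  # ['emp_id', 'name', 'hire_date']
```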

More Components of a DBMS

• DATA MANIPULATION SUBSYSTEM - helps you add, change, and delete information in a database and mine it for valuable information
– Tools in this subsystem include views, report generators, and query languages (QBE and SQL)
– SQL serves as both a data manipulation language (DML) and a data definition language (DDL)

• APPLICATION GENERATION SUBSYSTEM - contains facilities to help you develop transaction-intensive applications
– Programming languages specific to a particular DBMS
– Interfaces to commonly used programming languages (e.g., COBOL or C++)
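Here is a short sqlite3 sketch of data manipulation on a hypothetical product table. It uses INSERT, UPDATE, DELETE, and SELECT, plus a view that serves as a stored query. `CREATE VIEW` is itself a DDL statement, which shows SQL playing both roles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL, qty INTEGER)")

# DML: add, change, and delete information.
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?)", [
    ("A1", "Stapler", 12.50, 40),
    ("B2", "Paper", 4.25, 500),
    ("C3", "Toner", 89.00, 3),
])
conn.execute("UPDATE product SET price = ? WHERE sku = ?", (4.50, "B2"))
conn.execute("DELETE FROM product WHERE sku = ?", ("A1",))

# A view (defined with DDL) saves a query: items that need reordering.
conn.execute("CREATE VIEW low_stock AS SELECT sku, name, qty FROM product WHERE qty < 10")

print(conn.execute("SELECT * FROM low_stock").fetchall())                     # [('C3', 'Toner', 3)]
print(conn.execute("SELECT price FROM product WHERE sku = 'B2'").fetchone())  # (4.5,)
```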

More Components of a DBMS

• DATA ADMINISTRATION SUBSYSTEM - helps you manage the overall database environment by providing facilities for:
– Backup and recovery
– Security management
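SQLite has no user accounts, so this sketch covers only backup and recovery. It uses the `sqlite3.Connection.backup` method, which is available in Python 3.7 and later. The account table and the simulated failure are hypothetical.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
live.execute("INSERT INTO account VALUES (1, 100.0)")
live.commit()

# Backup: copy the whole live database into a separate database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate a failure that damages the live data...
live.execute("DELETE FROM account")
live.commit()

# ...then recover by copying the backup back over the live database.
backup.backup(live)
print(live.execute("SELECT balance FROM account WHERE id = 1").fetchone())  # (100.0,)
```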

Database Architectures- Centralized

• A centralized database uses a single central processor or multiple processors in a client/server network. Its major feature is that the database resides in a single physical location.
– Advantages of this design are that security tends to be higher and risks are lower.
– When demands for data access are highly decentralized, this design tends to be costly and inflexible.

Database Architectures- Distributed

• Databases can be decentralized either by partitioning or by replicating (duplicating) them.

• Partitioned database: the database is divided into segments or regions. For example, a customer database can be divided into Eastern customers and Western customers, with two separate databases maintained in the two regions.

• Duplicated (replicated) database: the database is duplicated at two or more locations, and the separate copies are synchronized during off hours on a batch basis.
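Here is a plain-Python sketch of the two approaches, using hypothetical customer rows. The batch sync is one-directional: the primary copy overwrites the replica. That is a simplifying assumption, because real replication products must also reconcile changes made at both sites.

```python
from copy import deepcopy

customers = [
    {"id": 1, "name": "Acme", "region": "East"},
    {"id": 2, "name": "Bolt", "region": "West"},
    {"id": 3, "name": "Crane", "region": "East"},
]

def partition_by_region(rows):
    """Partitioning: each regional database holds only its own segment of the data."""
    segments = {}
    for row in rows:
        segments.setdefault(row["region"], []).append(row)
    return segments

def batch_sync(primary, replica):
    """Replication: overwrite the replica with the primary's current copy (e.g., nightly)."""
    replica.clear()
    replica.extend(deepcopy(primary))

partitions = partition_by_region(customers)
print(sorted(partitions))                     # ['East', 'West']
print([c["id"] for c in partitions["East"]])  # [1, 3]

east_site = deepcopy(customers)   # full copy at one location
west_site = deepcopy(customers)   # full copy at another location
east_site.append({"id": 4, "name": "Delta", "region": "East"})  # change during the day
stale = len(west_site)            # the replica lags until the batch run
batch_sync(east_site, west_site)  # off-hours synchronization
print(stale, len(west_site))      # 3 4
```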

Distributed Databases

Ensuring Data Quality

• Corporate and government databases have unexpectedly poor levels of data quality.

• National consumer credit reporting databases have error rates of 20-35%.

• 32% of the records in the FBI’s Computerized Criminal History file are inaccurate, incomplete, or ambiguous.

• Gartner Group estimates that consumer data in corporate databases degrades at the rate of 2% a month.
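A 2% monthly rate compounds over time. Assume that each month 2% of the records that are still accurate go stale (an assumption about how the estimate applies). Under that assumption, the share degraded after a year is 1 - 0.98^12, or about 0.215. That is a little under the 24% you would get by simply multiplying 12 × 2%.

```python
def fraction_degraded(monthly_rate, months):
    """Share of initially accurate records that are stale after `months`,
    assuming `monthly_rate` of the still-accurate records degrade each month."""
    return 1 - (1 - monthly_rate) ** months

print(round(fraction_degraded(0.02, 12), 3))  # 0.215
```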

Ensuring Data Quality (Continued)

• The quality of decision making in a firm is directly related to the quality of data in its databases.

• Data Quality Audit: a structured survey of the accuracy and level of completeness of the data in an information system

• Data Cleansing: consists of activities for detecting and correcting data in a database or file that are incorrect, incomplete, improperly formatted, or redundant

• Integrity constraints (mentioned earlier)
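Here is a minimal sketch of an audit followed by cleansing, using hypothetical customer rows. The audit measures completeness. The cleansing step standardizes names and phone formats, flags records that cannot be fixed automatically, and drops redundant copies. The rules it uses are assumptions for illustration: a phone number needs 10 digits, and name plus phone serves as the duplicate key.

```python
import re

raw = [
    {"id": 1, "name": " ann lee ", "phone": "(555) 123-4567", "state": "ny"},
    {"id": 2, "name": "Raj Patel", "phone": "555.987.6543", "state": "CA"},
    {"id": 3, "name": "Ann Lee", "phone": "5551234567", "state": "NY"},
    {"id": 4, "name": "Mia Chen", "phone": "", "state": "TX"},
]

def audit_completeness(rows, fields):
    """Data quality audit: share of rows with a non-blank value in every listed field."""
    complete = sum(all(str(r[f]).strip() for f in fields) for r in rows)
    return complete / len(rows)

def cleanse(rows):
    """Standardize formats, flag incomplete rows, and drop redundant ones."""
    seen, clean, incomplete = set(), [], []
    for r in rows:
        name = " ".join(r["name"].split()).title()  # fix spacing and capitalization
        digits = re.sub(r"\D", "", r["phone"])       # keep digits only
        if len(digits) != 10:
            incomplete.append(r["id"])               # needs manual correction
            continue
        key = (name, digits)
        if key in seen:                              # redundant copy of a kept record
            continue
        seen.add(key)
        clean.append({"id": r["id"], "name": name,
                      "phone": f"{digits[:3]}-{digits[3:6]}-{digits[6:]}",
                      "state": r["state"].upper()})
    return clean, incomplete

print(audit_completeness(raw, ["name", "phone", "state"]))  # 0.75
clean, incomplete = cleanse(raw)
print([r["id"] for r in clean], incomplete)                # [1, 2] [4]
```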

Data Warehouse

• Definition - a database, with supporting tools, that stores current and historical data and is designed to support the business analysis activities and decision-making tasks of managers; typically the relational database model is used

• Benefits
– Improved access
– Improved information
– Isolation from operational systems
– Tools permit advanced data analysis

• Users

• Data marts

Comparison of Data in a Data Warehouse and Operational Data

• Operational Data
– Data are spread across many systems
– Current operational data
– Inconsistent data definitions
– Functionally organized data
– Data are constantly changing

• Warehouse Data
– Integrated in one enterprise-wide system
– Recent and historical data
– Consistent data definitions
– Data are organized around business entities
– Data are stabilized

Building a Data Warehouse (ETL)

• Extraction phase – create files on the computer that will store the data warehouse and move transaction data to this machine; data may come from many sources or parts of the organization

• Transformation phase – cleanse and standardize the data. Why is this necessary?

• Load phase – transfer the data from the transformation phase into the data warehouse

• The ETL process becomes automated to make regular transfers of transaction data into the data warehouse
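Here is a compact ETL sketch. It assumes two hypothetical source systems that format dates and product codes differently, and SQLite stands in for the warehouse. It also suggests one answer to "why is transformation necessary?": without standardization, the same product would be counted as two different ones.

```python
import sqlite3

# Extract: rows pulled from two hypothetical source systems with different conventions.
orders_east = [("2015-12-01", "WIDGET", 3, "19.99")]
orders_west = [("12/02/2015", "widget ", 2, "19.99")]

def to_iso(date_text):
    """Transform: standardize MM/DD/YYYY dates to ISO YYYY-MM-DD."""
    if "/" in date_text:
        month, day, year = date_text.split("/")
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return date_text

def transform(rows):
    """Cleanse and standardize: ISO dates, trimmed upper-case codes, numeric prices."""
    return [(to_iso(d), product.strip().upper(), qty, float(price))
            for d, product, qty, price in rows]

# Load: write the standardized rows into the warehouse's fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, product TEXT, qty INTEGER, unit_price REAL)")
warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                      transform(orders_east) + transform(orders_west))
warehouse.commit()

print(warehouse.execute(
    "SELECT product, SUM(qty) FROM sales_fact GROUP BY product").fetchall())  # [('WIDGET', 5)]
```

Automating this process, as the last bullet notes, would mean running the same extract, transform, and load steps on a schedule.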

Data-Mining and Data-Mining Tools

• Data-mining is the process of selecting, exploring, and modeling large amounts of data to discover previously unknown relationships that support decision making.

• Traditional data-mining tools answer questions about variables that we think are related
– Query languages (QBE or SQL)
– Report generators
– Multidimensional analysis tools (OLAP and pivot tables)
– Standard statistical procedures (regression, ANOVA)

• Knowledge-discovery data-mining tools look for relationships that are not discernible to the human eye (see next slide)
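Association-rule (market-basket) mining is one common knowledge-discovery technique, chosen here purely for illustration. It is not necessarily what the next slide depicts. The sketch runs on hypothetical baskets and computes support and confidence for item pairs only. Production tools use algorithms such as Apriori to handle larger item sets.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "diapers"},
    {"bread", "butter"},
    {"beer", "diapers", "milk"},
]

def pair_rules(baskets, min_support=0.4):
    """Return sorted rules (lhs, rhs, support, confidence) for item pairs meeting min_support."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                   # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):     # confidence = P(rhs | lhs)
            rules.append((lhs, rhs, support, count / item_counts[lhs]))
    return sorted(rules)

for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support {support:.1f}, confidence {confidence:.2f}")
```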

Data-Mining

Multidimensionality

• Multidimensional data analysis (OLAP) enables users to view data using various dimensions, measures, and time frames:

– dimensions: products, business units, country, industry (categories)

– measures: money, unit sales, head count, variances

– time: daily, weekly, monthly, quarterly, yearly

• This type of analysis also provides the ability to view data in different ways (tables, charts, 3-D, geographically)

• OLAP tools provide for this

• Pivot tables in Excel or Access
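A pivot table sums a measure across two dimensions. This plain-Python sketch uses hypothetical sales rows and pivots the same data two ways: first product by region, then region by quarter. It shows the "view data in different ways" idea in miniature.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, unit_sales)
sales = [
    ("Nuts", "East", "Q1", 50), ("Nuts", "West", "Q1", 60),
    ("Bolts", "East", "Q1", 40), ("Bolts", "East", "Q2", 45),
    ("Nuts", "East", "Q2", 55), ("Bolts", "West", "Q2", 30),
]

def pivot(rows, row_dim, col_dim, measure=3):
    """Sum the measure for every (row_dim, col_dim) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[measure]
    return {r: dict(cols) for r, cols in table.items()}

by_product_region = pivot(sales, 0, 1)  # dimensions: product x region
by_region_quarter = pivot(sales, 1, 2)  # same data, another view: region x quarter
print(by_product_region)  # {'Nuts': {'East': 105, 'West': 60}, 'Bolts': {'East': 85, 'West': 30}}
print(by_region_quarter)  # {'East': {'Q1': 90, 'Q2': 100}, 'West': {'Q1': 60, 'Q2': 30}}
```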

A Data Cube

Examples of OLAP Tools

• Go to www.fedscope.opm.gov
– Under data cubes on the entry page, click on employment
– Demonstrate drill down and adding charts
– Data for this example come from the Central Personnel Data File (CPDF) of the federal government
– The OLAP tool used to build this site is from a company named Cognos (PowerPlay)

• OLAP tools based on Excel
– http://wLCubed.com
– http://www.cubularity.com

Databases and the Web

• Physical relationship of the hardware

• The role of middleware (converting requests from HTML forms into SQL queries, and converting query results back into HTML)

• Using the Web
– The browser is a de facto standard and easy to use
– The browser does not require training in a database query tool
– Using the browser requires no change to the internal database; this enables firms to provide access to internal databases at little cost, thus leveraging their investment in older systems.
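Here is a minimal sketch of the middleware role. It assumes a hypothetical `handle_search` function that receives already-parsed form fields as a dict, and the parts table is also hypothetical. A real deployment would sit behind a web server or framework. The query is parameterized, so user input is treated as data rather than SQL, and `html.escape` protects the HTML that goes back to the browser.

```python
import html
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE part (part_no TEXT, description TEXT, on_hand INTEGER)")
db.executemany("INSERT INTO part VALUES (?, ?, ?)",
               [("P-1", "Hex bolt", 120), ("P-2", "Wing nut <small>", 0)])

def handle_search(form):
    """Middleware sketch: form field -> SQL query -> HTML table for the browser."""
    # Parameterized query: the user's text is bound as a value, never spliced into SQL.
    rows = db.execute(
        "SELECT part_no, description, on_hand FROM part WHERE description LIKE ?",
        ("%" + form.get("q", "") + "%",)).fetchall()
    cells = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows)
    return f"<table>{cells}</table>"

print(handle_search({"q": "nut"}))
```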

Linking Internal Databases to the Web

Management Opportunities and Challenges

• Effectively managing an organization’s data resources requires more than selecting a logical database design
– Ongoing commitment requiring discipline
– Requires organizational and conceptual changes
– Management commitment and understanding required
– Huge opportunities to improve performance by managing data better

• Obstacles
– Cost/benefit analysis is difficult; costs are upfront and benefits come in the future

Solutions

• Data administration function
– Data are the property of the organization
– Establish a group to administer data

• Data-planning and modeling methodology
– Enterprise planning for data using a common methodology

• Database technology, management, and users
– New software requires new personnel trained on the software
– Database administration
– Increased training for end users

Key Organizational Elements in the Database Environment

Spreadsheets Versus DBMS

• Linkage between elements
– Spreadsheet: between cells in the same table
– DBMS: between elements in different tables

• Orientation
– Spreadsheet: toward calculations
– DBMS: toward organization and linkage of data elements in different tables

• Capabilities
– DBMS has extensive querying and reporting power
– Spreadsheet is limited

• Memory requirements
– The entire spreadsheet table must be in memory
– This is not true for a database table