Files and Databases Chapter 12 Organizing and Managing Digital Data © The McGraw-Hill Companies, Inc., 2000

Upload: cecily-bailey

Post on 29-Dec-2015


Files and Databases

Chapter 12

Organizing and Managing Digital Data

© The McGraw-Hill Companies, Inc., 2000


Overview

• Database and database administrator

• Hierarchy

• File handling

• File management systems

• DBMS

• Ethics


Databases

• Collection of related files

• Range in size from those on your PC to terabytes of digital photographs of the world on a large series of servers
– http://teraserver.microsoft.com

• Examples
– online services, virtual art museums, libraries


Database Administration

• Managing a database

• Database administrator (DBA)
– design, implementation, integration
– coordination with users
– system security
– backup and recovery
– performance monitoring


Hierarchy and Key Field

• Data storage hierarchy
– levels of data
    • bits, bytes, fields, records, files, and databases
– definitions
    • character, field, record, file, database
– key field
    • unique data used to identify a record
    • used for sorting
    • often numerically generated
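The hierarchy and the role of the key field can be sketched in Python. This is only an illustration: the student records, field names, and the `student_id` key are made up, not taken from the chapter.

```python
# Hypothetical records: each dict is a record, each entry is a field.
records = [
    {"student_id": 1003, "name": "Lee", "major": "Biology"},
    {"student_id": 1001, "name": "Garcia", "major": "History"},
    {"student_id": 1002, "name": "Lee", "major": "Physics"},
]

# A file is a collection of related records; index it by the key field.
file_by_key = {rec["student_id"]: rec for rec in records}

# The key field is unique even when other fields (name) repeat.
assert len(file_by_key) == len(records)

# The key field is also what the file is sorted on.
sorted_records = sorted(records, key=lambda rec: rec["student_id"])
print([rec["student_id"] for rec in sorted_records])
```

Two records share the name "Lee", which is why a name would be a poor key field while the generated number works.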


Basic Concepts

• Types of files
– program files
    • software instructions
– data files
    • data order and organization should be logical and consistent


Types of Data Files

• Master
– relatively permanent records updated periodically
– currently accurate

• Transaction
– temporary holding file used for additions, deletions, modifications
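One way to picture how a transaction file updates a master file is the sketch below. The account numbers, the `("add" | "modify" | "delete")` action names, and the `apply_batch` helper are assumptions for illustration only.

```python
# Hypothetical master file: account number (key field) -> balance.
master = {101: 250.0, 102: 80.0, 103: 0.0}

# Hypothetical transaction file holding the pending changes.
transactions = [
    ("add", 104, 40.0),      # new record
    ("modify", 101, 300.0),  # change an existing record
    ("delete", 103, None),   # remove a record
]

def apply_batch(master, transactions):
    """Return an updated copy of the master file."""
    updated = dict(master)
    for action, key, value in transactions:
        if action in ("add", "modify"):
            updated[key] = value
        elif action == "delete":
            updated.pop(key, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return updated

new_master = apply_batch(master, transactions)
print(new_master)
```

The old master is left untouched here, so it can serve as a backup until the new one has been checked.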


Batch vs. Online

• Batch
– collect data, then process all at once
– advantage
    • very efficient processing; checking for data validity occurs at the originating batch site

• Online
– real-time processing
– example: an airline reservation system, where real-time booking prevents duplicate reservations


Offline vs. Online

• Offline
– not directly accessible to the CPU, such as tapes or disks that need to be loaded

• Online
– storage is direct and fast
– generally disk


File Organization

• Sequential access storage
– stores one record after another
– alphabetic or numeric

• Direct access storage
– can access the data using direct methods such as addressing
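The difference between the two can be sketched with an in-memory "file" of fixed-length records. The record layout (a 4-byte id plus a 12-byte name) and the helper names are assumptions chosen for the example.

```python
import io
import struct

# Assumed layout: 4-byte integer id followed by a 12-byte name.
RECORD = struct.Struct("<i12s")

def pack(rec_id, name):
    return RECORD.pack(rec_id, name.encode("ascii"))

# Build an in-memory "file" of records stored one after another.
storage = io.BytesIO(b"".join(pack(i, f"name{i}") for i in range(5)))

def read_sequential(f, wanted_id):
    """Read records in order from the start until the id matches."""
    f.seek(0)
    while chunk := f.read(RECORD.size):
        rec_id, name = RECORD.unpack(chunk)
        if rec_id == wanted_id:
            return name.rstrip(b"\x00").decode("ascii")
    return None

def read_direct(f, position):
    """Jump straight to a record by computing its address."""
    f.seek(position * RECORD.size)
    rec_id, name = RECORD.unpack(f.read(RECORD.size))
    return name.rstrip(b"\x00").decode("ascii")

print(read_sequential(storage, 3), read_direct(storage, 3))
```

Sequential access has to pass over records 0 to 2 first, while direct access computes the byte offset and reads only the record it needs.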


Organizing Methods

• Sequential file organization
– records can be retrieved in the sequence that they were stored
– useful when a large group of records needs to be accessed most of the time
– example: catalog mailing


Organizing Methods (continued)

• Direct file organization
– also called random file organization
– records stored in no particular sequence
– hashing algorithm applied to the key field generates a number that identifies where the record is stored
– faster for finding a specific record
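A minimal sketch of hashing a key field to a storage location follows. The division-remainder hash, the slot count of 7, and the use of a list per slot to handle keys that land on the same number are all assumptions; the chapter does not say which hashing algorithm or collision strategy is meant.

```python
NUM_SLOTS = 7  # assumed number of storage locations

def slot_for(key):
    """Hash the key field to a storage address (division-remainder method)."""
    return key % NUM_SLOTS

# Each slot holds a list so that keys hashing to the same slot can coexist.
slots = [[] for _ in range(NUM_SLOTS)]

def store(key, record):
    slots[slot_for(key)].append((key, record))

def fetch(key):
    """Go directly to the slot for this key, then match the exact key."""
    for stored_key, record in slots[slot_for(key)]:
        if stored_key == key:
            return record
    return None

store(1001, "Garcia")
store(1008, "Lee")     # 1001 % 7 == 1008 % 7, so these two collide
store(1003, "Chen")
print(fetch(1008))
```

In practice a simple hash does not always yield a unique number, which is why the sketch still compares the stored key before returning a record.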


Organizing Methods

• Indexed-sequential file organization
– or indexed file organization
– files stored in sequential order
– indexes records according to key field
– requires magnetic or optical disk
– slower overall than direct access
– example: bank has up-to-date record information, but prints sequentially (monthly statements)
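The bank example can be sketched as records kept in key order plus an index over the key field. The account data and the `lookup` and `statements` helpers are hypothetical, and a binary search over a list stands in for a real on-disk index.

```python
import bisect

# Records kept in key order, as in a sequential file (hypothetical accounts).
records = [
    (1001, "Garcia", 250.0),
    (1004, "Lee", 80.0),
    (1009, "Chen", 0.0),
    (1015, "Okafor", 1200.0),
]

# The index holds the key fields in order; position i matches records[i].
index = [key for key, _, _ in records]

def lookup(key):
    """Use the index to jump to one record (direct-style access)."""
    i = bisect.bisect_left(index, key)
    if i < len(index) and index[i] == key:
        return records[i]
    return None

def statements():
    """Walk the whole file in key order (sequential access)."""
    return [f"{key}: {name} balance {balance:.2f}" for key, name, balance in records]

print(lookup(1009))
print(statements()[0])
```

The same file serves both needs: a teller looks up one account through the index, and the monthly statement run reads every record in order.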


File Management System

• Disadvantages of file management systems
– data redundancy
– lack of integrity
– lack of program independence


Database Management Systems

• DBMS

• Controls the structure of the database and access to the data


Advantages of DBMS

• Reduced data redundancy

• Improved data integrity

• More program independence

• Increased user productivity

• Increased security


Disadvantages of DBMS

• Cost issues

• Data vulnerability issues

• Privacy issues


Database Organization

• Hierarchical

• Network

• Relational

• Object-oriented


Hierarchical

• Records arranged in related groups, or a tree

• Lower level record called a child

• Parent record at the top of the tree is called a root record

• In a hierarchical database, a parent may have more than one child, but a child has only one parent (a one-to-many relationship)

• Simple and fast
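The parent and child relationship can be sketched as a small tree in Python. The `Node` class and the college, department, and course names are invented for the example.

```python
class Node:
    """A record in a hierarchical database: one parent, many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Follow the single parent link up to the root record."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))

# Hypothetical tree: a college (root), its departments, their courses.
root = Node("College")
science = Node("Science", root)
arts = Node("Arts", root)
biology = Node("Biology101", science)

print(biology.path())
```

Because every child has exactly one parent, there is only one path from any record to the root, which is part of why this structure is simple and fast.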


Network Database

• Similar to a hierarchical database, but a child can have more than one parent

• More flexible because relationships can be established between different parents

• Limits to the number of links

• Retains some of the speed of access of a hierarchical database
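A rough sketch of the difference from the hierarchical model: here a child record keeps a set of parents instead of a single one. The course and student names and the `link` helper are assumptions for illustration.

```python
from collections import defaultdict

# Links run both ways, so a child record may have several parents.
parents_of = defaultdict(set)
children_of = defaultdict(set)

def link(parent, child):
    parents_of[child].add(parent)
    children_of[parent].add(child)

# Hypothetical data: one student (child) enrolled under two courses (parents).
link("Biology101", "Garcia")
link("History210", "Garcia")
link("Biology101", "Lee")

print(sorted(parents_of["Garcia"]))
print(sorted(children_of["Biology101"]))
```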


Relational Database

• Relates data through a key field

• More flexible

• Advantages
– user does not have to be aware of structure
– easily add, modify, delete records
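Relating data through a key field can be sketched with Python's built-in `sqlite3` module. SQLite is used only because it ships with Python; the table names, columns, and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollments (student_id INTEGER, course TEXT);
    INSERT INTO students VALUES (1001, 'Garcia'), (1002, 'Lee');
    INSERT INTO enrollments VALUES (1001, 'Biology101'), (1001, 'History210'),
                                   (1002, 'Biology101');
""")

# The shared key field relates rows in the two tables.
rows = conn.execute("""
    SELECT s.name, e.course
    FROM students AS s
    JOIN enrollments AS e ON e.student_id = s.student_id
    ORDER BY s.name, e.course
""").fetchall()
print(rows)

# Adding, modifying, and deleting records are single statements.
conn.execute("INSERT INTO students VALUES (1003, 'Chen')")
conn.execute("UPDATE students SET name = 'Lee-Park' WHERE student_id = 1002")
conn.execute("DELETE FROM enrollments WHERE course = 'History210'")
conn.commit()
```

The query names the tables and the key field but says nothing about how rows are physically stored, which is the sense in which the user need not be aware of the structure.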


Relational Database (continued)

• Disadvantage
– searching for data can be time consuming


Object-Oriented Database

• OODBMS

• Numeric, text, graphics, audio

• Important part of technology merge

• Uses
– medical information systems
– engineering information systems
– geographic databases
– training and education


DBMS Features

• Data dictionary
– also called encyclopedia and repository
– stores data definitions

• Utilities
– assist in maintaining databases by filtering acceptable data for input, editing data, and monitoring
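As an illustration of how stored data definitions can drive input filtering, the sketch below checks a record against a small data dictionary. The dictionary layout, field names, and `validate` helper are hypothetical and do not reflect any particular DBMS.

```python
# Hypothetical data dictionary: field name -> (type, required, description).
DATA_DICTIONARY = {
    "student_id": (int, True, "unique key field"),
    "name": (str, True, "family name"),
    "gpa": (float, False, "grade point average, 0.0 to 4.0"),
}

def validate(record):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    for field, (ftype, required, _) in DATA_DICTIONARY.items():
        if field not in record:
            if required:
                problems.append(f"missing {field}")
        elif not isinstance(record[field], ftype):
            problems.append(f"{field} should be {ftype.__name__}")
    for field in record:
        if field not in DATA_DICTIONARY:
            problems.append(f"unknown field {field}")
    return problems

print(validate({"student_id": 1001, "name": "Garcia", "gpa": 3.4}))
print(validate({"student_id": "1001", "nickname": "G"}))
```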


Query Language

• Data manipulation language

• Used to make database queries that do not require command language

• Most popular is SQL (“see quill”), or Structured Query Language
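A short example of what an SQL query looks like, run through Python's built-in `sqlite3` module. The `books` table and its placeholder titles and authors are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Title A", "Author X", 1994),
        ("Title B", "Author Y", 1998),
        ("Title C", "Author X", 1999),
    ],
)

# The query states what is wanted, not how to find it.
query = "SELECT title FROM books WHERE author = ? AND year > ? ORDER BY year"
titles = [row[0] for row in conn.execute(query, ("Author X", 1990))]
print(titles)
```

The `?` placeholders keep the search values separate from the query text, a habit worth keeping whenever values come from user input.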


SQL

• Used in Oracle, Sybase, dBase, Paradox, and Microsoft Access

• Some query languages use a natural or spoken English method of information gathering


Report Generator

• Produces on-screen or printed reports

• User can customize appearance


Access Security

• Can be tailored for group access or individual access

• Physical security is as critical as data security


System Recovery

• Recovery types
– full and partial
– match backup techniques

• Techniques
– mirroring
– reprocessing
– rollforward
– rollback


Types of Recovery

• Mirroring
– frequent simultaneous copying of data to two or more places

• Reprocessing
– goes back to a point of database activity where the database was correct and reprocesses data to bring it up to date


Types of Recovery (continued)

• Rollforward: forward recovery
– recreates current database using a previously stored database
– uses after-image records with processing information

• Rollback: backward recovery
– undoes unwanted changes, for example, if only half a transaction was processed
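Both techniques can be sketched in a few lines of Python. The in-memory dictionary standing in for a database, the log format, and the use of saved before-images for rollback are assumptions made for the example; real systems keep these logs on disk.

```python
import copy

# Hypothetical backup taken when the database was known to be correct.
backup = {"A": 100, "B": 50}

# After-image log: the value each record had after a completed change.
after_images = [("A", 70), ("B", 80)]

def rollforward(backup, after_images):
    """Rebuild the current database from the backup plus after-images."""
    db = copy.deepcopy(backup)
    for key, value in after_images:
        db[key] = value
    return db

def transfer(db, src, dst, amount, fail_halfway=False):
    """Move an amount between records; undo everything if it fails midway."""
    before_images = {src: db[src], dst: db[dst]}  # saved for rollback
    try:
        db[src] -= amount
        if fail_halfway:
            raise RuntimeError("crash after the first half of the transaction")
        db[dst] += amount
    except RuntimeError:
        db.update(before_images)  # rollback: restore the before-images
        return False
    return True

current = rollforward(backup, after_images)
ok = transfer(current, "A", "B", 30, fail_halfway=True)
print(current, ok)
```

Rollforward starts from an old copy and moves toward the present, while rollback starts from the present and moves back to the last consistent state.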


Mining, Warehouses, “Siftware”

• Data mining
– DM, or knowledge discovery
– sifts through a large database to uncover trends and predict future trends
– helps in marketing, health, and science


Data Warehousing

• Requires data preparation
– identification of all data sources
– fuse data and clean or scrub data to ensure accuracy
– metadata shows the origins of data, the transformations, and summary data

• Data warehouse
– combination of cleaned data and metadata
– often uses massively parallel processing (MPP)
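The preparation steps can be sketched as follows. The two sales sources, the cleaning rules in `scrub`, and the shape of the metadata are all assumptions for illustration; real warehouses apply far more elaborate rules.

```python
# Two hypothetical sources with inconsistent formatting.
store_sales = [{"customer": " Garcia ", "amount": "25.50"},
               {"customer": "LEE", "amount": "n/a"}]
web_sales = [{"customer": "garcia", "amount": "10.00"}]

def scrub(row):
    """Normalize one row; return None if it cannot be cleaned."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return {"customer": row["customer"].strip().title(), "amount": amount}

def build_warehouse(sources):
    """Fuse and clean the sources, recording metadata along the way."""
    rows, metadata = [], {"sources": [], "rejected": 0, "total": 0.0}
    for name, source in sources.items():
        metadata["sources"].append(name)
        for raw in source:
            clean = scrub(raw)
            if clean is None:
                metadata["rejected"] += 1
            else:
                rows.append({**clean, "origin": name})
                metadata["total"] += clean["amount"]
    return rows, metadata

rows, metadata = build_warehouse({"store": store_sales, "web": web_sales})
print(rows)
print(metadata)
```

The `origin` field on each row and the `metadata` summary play the role the slide gives to metadata: where the data came from, what was done to it, and summary figures.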


Siftware for Finding and Analyzing

• Query-and-reporting tools
– Focus Reporter, Seagate Crystal Reports, Esperant
– specific questions to verify hypotheses

• Multidimensional-analysis tools
– MDA
– Essbase, Lightship
– data surfing to explore dimensions of subset


Siftware for Finding and Analyzing (continued)

• Intelligent agents
– roam networks and perform complex tasks
– DataEngine, Data, Logic
– help turn up unexpected relationships and patterns

• Data mining
– combines facts from all parts of a business
– cash registers, shipping documents, credit-card files


Ethics of Using Databases

• Misinformation explosion
– data is found, but little effort is made to ensure that the data is updated
– reliance on anecdotal evidence
– causes inaccuracies that can be harmful


Information Accuracy

• More facts, faster facts, but not necessarily better facts

• Database is not necessarily updated with current information

• Computer sources not necessarily accurate


Information Completeness

• Know the boundaries, as no information service has it all

• Know the complete iterations of key words

• History is limited
– most databases go back only to 1980
– frequently assessment is unthinkingly extended to years beyond 1980


Privacy Issues

• Right not to reveal information about one’s self

• Credit card, shopping habits, harassment

• Fair Information Practices
– U.S. Department of Health, Education, and Welfare


Privacy Enactment

• Privacy Act of 1974
– limits the government and its contractors
– right to see and correct inaccurate data about one’s self

• Freedom of Information Act
– personal access to data gathered on self

• Computer Matching and Privacy Protection Act
– prevents government from comparing some records to other records of individuals


Finance Privacy

• Fair Credit Reporting Act of 1970
– right to access and challenge credit records
– if denied credit, the record must be given free of charge

• Right to Financial Privacy Act of 1978
– restrictions on federal agencies wanting to search records in banks


Health Privacy

• No federal laws protect medical records in the United States
– except drug and alcohol abuse and psychiatric care

• A strategy is to decline to fill out medical histories or questionnaires unless there is a clear need for them

• Can always ask for a copy of your medical records


Employment Privacy

• Nongovernmental employers are the least regulated by privacy legislation

• Employers may verify
– education
– employment
– credit
– driving record
– workers’ compensation claims
– criminal record, if any


Commerce Issues

• Marketing gathers data about age, buying habits, favorite charities

• No prohibition of gathering data for one reason and using it for another
– except Video Privacy Protection Act of 1988
    • prevents video rental records from being given out without a court order or the individual’s consent


Communications Privacy

• Some constraints in acquiring and disseminating information, listening, and encryption use

• Some argue that you must be willing to give up some privacy for safety and security