Penn State Student Chapter of the Association for Computing Machinery
We welcome all interested students to our 4th general meeting of the Spring 2005 semester!
When: Monday, April 11th, 2005 from 7-8 pm
Where: Cybertorium (213 IST)
Agenda:
• Brief overview of our ACM chapter
• New officer introductions
• Special topic presentation: No Pain, No Game (presented by IST Professor Brian K. Smith)
• Co-op/Intern presentation: Working at IBM (presented by Rick Osowski)
Free refreshments will be provided
IST 210
Data Warehousing, Data Mining, and Advanced Applications
Data Rich, but Information Poor
• Data is stored, not explored: by its volume and complexity it represents a burden, not a support
• Data overload results in uninformed decisions, contradictory information, higher overhead, wrong decisions, increased costs
• Data is not designed and is not structured for successful management decision making
Improving Decision Making
[Diagram labels: Data, Information, Decisions, Data Warehouse]
Data Warehouse Concepts
What’s a Data Warehouse?
A data warehouse is a single, integrated source of decision support information formed by collecting data from multiple sources, internal to the organization as well as external, and transforming and summarising this information to enable improved decision making.
A data warehouse is designed for easy access by users to large amounts of information, and data access is typically supported by specialized analytical tools and applications.
Data Warehouse Characteristics
Key Characteristics of a Data Warehouse
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
Subject Oriented
• Example for an insurance company:
[Diagram: Applications Area (Commercial and Life Insurance Systems, Auto and Fire Policy Processing Systems, Accounting System, Claims Processing System, Billing System) versus Data Warehouse subjects (Policy, Customer, Losses, Premium)]
Integrated
• Data is stored once in a single integrated location (e.g. insurance company)
[Diagram: customer data stored in several databases (Auto Policy Processing System; Fire Policy Processing System; FACTS, LIFE, Commercial, Accounting Applications) is brought together in the Data Warehouse Database under Subject = Customer]
Time-Variant
• Data is tagged with some element of time: creation date, as-of date, etc.
• Data is available on-line for long periods of time (for example, five or more years) for trend analysis and forecasting.
• Data is stored as a series of snapshots or views which record how it is collected across time.
[Diagram labels: Data Warehouse Data, Key, Time, Data]
Non-Volatile
• Existing data in the warehouse is not overwritten or updated.
[Diagram: Production Applications update, insert, and delete data in the Production Databases; data from the Production Databases and External Sources is loaded into the Data Warehouse Database, which is read-only within the Data Warehouse Environment]
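The time-variant and non-volatile characteristics can be illustrated with a small in-memory sketch. It is only an illustration under stated assumptions: the class name `SnapshotWarehouse`, its methods, and the sample keys are hypothetical and do not come from the slides. Loads only append rows tagged with an as-of date, and nothing is ever updated or deleted.

```python
from datetime import date


class SnapshotWarehouse:
    """Hypothetical append-only store: every load adds time-tagged snapshot rows."""

    def __init__(self):
        # Each row is (as_of, key, value); rows are appended, never changed.
        self._rows = []

    def load(self, as_of, records):
        """Batch-load a snapshot; existing rows are left untouched."""
        for key, value in records.items():
            self._rows.append((as_of, key, value))

    def value_as_of(self, key, when):
        """Return the most recent value recorded on or before `when`, or None."""
        candidates = [(as_of, value) for as_of, k, value in self._rows
                      if k == key and as_of <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]

    def history(self, key):
        """Return all snapshots for a key in date order, e.g. for trend analysis."""
        return sorted((as_of, value) for as_of, k, value in self._rows if k == key)


if __name__ == "__main__":
    warehouse = SnapshotWarehouse()
    warehouse.load(date(2005, 1, 31), {"policy-1": 100})
    warehouse.load(date(2005, 2, 28), {"policy-1": 120})
    print(warehouse.history("policy-1"))
```

Because there is deliberately no update or delete method, the earlier snapshot stays available after the later load, which is what makes trend analysis across time possible.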
Transaction System vs. Data Warehouse
Transaction-Based Reporting System
[Diagram labels: On-line, real-time update into disparate systems (Unix, VMS, MVS, Other); Day-to-day operations; System Experts; Users; Data Manipulation]
Warehouse-Based Reporting System
• BENEFIT: Reduce data processing costs
• BENEFIT: Integrated, consistent data available for analysis
• BENEFIT: Improve Network Reporting processes and analytical capabilities
[Diagram labels: source systems (Unix, VMS, MVS, Other); Interfaces; Data Staging, Transformation and Cleansing; Data Warehouse; Summarization; OLAP; Executive Reporting and On-Line Analysis Environment]
Transaction - Warehouse Process
• “Transaction Based Process”: on-line, real-time update; day-to-day operations; detailed information to operational systems.
• Batch load: transform, summarize & refine.
• “Warehouse Based Process”: decision support for management use.
Transaction System vs. Data Warehouse
Transaction System:
• Supports day-to-day operational processes
• Contains raw, detailed data that has not been refined or cleansed
• Volatile -- data changes from day-to-day, with frequent updates
• Technical issues drive the data structure and system design
• Disparate data structures, physical locations, query types, etc.
• Users rely on technical analysts for reporting needs
• Operational processes impacted by queries run off of system
Data Warehouse:
• Supports management analysis and decision-making processes
• Contains summarized, refined, and cleansed information
• Non-volatile -- provides a data “snapshot”; adjustments are not permitted, or are limited
• Business analysis requirements drive the data structure and system design
• Integrated, consistent information on a single technology platform
• Users have direct, fast access via On-line Analytical Processing tools
• Minimal impact on operational processes
Data Warehouse Architecture
Data Warehouse Architecture
[Diagram labels: Operational System; Conversion & Interface; Data Warehouse (ODS, Staging Area, Data Marts); OLAP Cubes; Ad-hoc Reporting; Canned Reports]
Data Warehouse Architecture: Conversion and Cleansing Activities
• Map source data to target
• Data scrubbing
• Derive new data
• Data extraction
• Transform / convert data
• Create / modify metadata
[Diagram label: Conversion & Cleansing]
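The conversion and cleansing activities listed on this slide can be sketched as a small pipeline. This is a minimal illustration, not the process any particular tool uses: the source field names in `FIELD_MAP`, the derived `deductions` field, and all function names are assumptions made for the example.

```python
# Hypothetical mapping from source field names to warehouse field names.
FIELD_MAP = {
    "cust_nm": "customer_name",
    "st": "state",
    "gross": "gross_sales",
    "net": "net_sales",
}


def map_fields(source_row):
    """Map source data to target: rename known fields, drop unmapped ones."""
    return {FIELD_MAP[name]: value
            for name, value in source_row.items() if name in FIELD_MAP}


def scrub(row):
    """Data scrubbing: trim stray whitespace and normalize the state code."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
    cleaned["state"] = cleaned["state"].upper()
    return cleaned


def convert(row):
    """Transform / convert: turn '50,000'-style strings into integers."""
    converted = dict(row)
    for field in ("gross_sales", "net_sales"):
        converted[field] = int(str(converted[field]).replace(",", ""))
    return converted


def derive(row):
    """Derive new data: add a field that no source system stores directly."""
    derived = dict(row)
    derived["deductions"] = derived["gross_sales"] - derived["net_sales"]
    return derived


def convert_and_cleanse(source_rows):
    """Run the steps in order and return (target_rows, metadata)."""
    target_rows = [derive(convert(scrub(map_fields(r)))) for r in source_rows]
    # Create / modify metadata: record what this load did.
    metadata = {
        "rows_extracted": len(source_rows),
        "rows_loaded": len(target_rows),
        "target_fields": sorted(FIELD_MAP.values()) + ["deductions"],
    }
    return target_rows, metadata
```

Keeping each activity in its own function mirrors the slide's list and makes it easy to test or replace one step without touching the others.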
Data Warehouse Architecture: Data Warehouse Components
• Ranges from detailed to summarized data
• Contains metadata
• Many views of the data
• Subject-oriented
• Time-variant
[Diagram labels: Detailed Data, Summary Data, Metadata]
Requirements Gathering Process: Business Measure Definition
• Standard definition and related business rules and formulas
• Source data element(s), including quality constraints
• Data granularity levels (e.g., county detail for state)
• Data retention (e.g., one month, one quarter, one year, multiple years)
• Priority of the information (for example, is the information necessary to derive other business measures?)
• Data load frequency (e.g., monthly, quarterly, etc.)
Star Join Schema

Fact Table: Monthly_Sales_Summary_Table
month    prod_id  region_id  account_id  vend_id  net_sales  gross_sales
01-1996  100      SW         100000      100      30,000     50,000
02-1996  140      NE         110000      200      23,000     42,000
03-1996  220      SW         100000      300      32,000     49,000

Dimension Tables

Time_Dimension_Table
month    mo_in_fiscal_yr  month_name
01-1996  4                January
02-1996  5                February
03-1996  6                March

Region_Dimension_Table
region_id  region_doc
NE         Northeast
NW         Northwest
SE         Southeast
SW         Southwest

Account_Dimension_Table
account_id  account_doc
100000      ABC Electronics
110000      Midway Electric
120000      Victor Components
130000      Washburn, Inc.
140000      Zerox

Product_Dimension_Table
prod_id  prod_desc     prod_grp_id  prod_grp_desc
100      Power supply  10           Fewer devices
140      Motherboard   20           Circuit boards
220      Co-processor  30           Components

Vendor_Dimension_Table
vend_id  vendor_desc
100      PowerAge, Inc.
200      Advanced Micro Devices
300      Farad Incorporated
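A star join can be tried out with Python's built-in `sqlite3` module. The sketch below loads the three sample fact rows and the region dimension from this slide into an in-memory database and joins them. The shortened table names (`monthly_sales_summary`, `region_dim`), the `net_sales` column spelling, and the function names are assumptions made for the example.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory database holding the slide's fact rows and region dimension."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE region_dim (region_id TEXT PRIMARY KEY, region_doc TEXT)"
    )
    conn.execute(
        """CREATE TABLE monthly_sales_summary (
               month TEXT, prod_id INTEGER, region_id TEXT,
               account_id INTEGER, vend_id INTEGER,
               net_sales INTEGER, gross_sales INTEGER)"""
    )
    conn.executemany(
        "INSERT INTO region_dim VALUES (?, ?)",
        [("NE", "Northeast"), ("NW", "Northwest"),
         ("SE", "Southeast"), ("SW", "Southwest")],
    )
    conn.executemany(
        "INSERT INTO monthly_sales_summary VALUES (?, ?, ?, ?, ?, ?, ?)",
        [("01-1996", 100, "SW", 100000, 100, 30000, 50000),
         ("02-1996", 140, "NE", 110000, 200, 23000, 42000),
         ("03-1996", 220, "SW", 100000, 300, 32000, 49000)],
    )
    conn.commit()
    return conn


def net_sales_by_region(conn):
    """Join the fact table to the region dimension and total net sales per region."""
    return conn.execute(
        """SELECT r.region_doc, SUM(f.net_sales)
           FROM monthly_sales_summary AS f
           JOIN region_dim AS r ON r.region_id = f.region_id
           GROUP BY r.region_doc
           ORDER BY r.region_doc"""
    ).fetchall()


if __name__ == "__main__":
    print(net_sales_by_region(build_star_schema()))
```

The fact table holds only keys and measures, so every descriptive label in a report comes from a join to a dimension table; adding the other dimensions would follow the same pattern.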
Multi-Dimensional Analysis
Business Measure: Net Sales
• Geography Dimension: Zip Code, County, Region, State
• Customer Dimension: Client Type, Account, Store, Class of Trade
• Product Dimension: Product Family, Product Line, Brand, Category, Group, Item
Example: Net Sales by Brand by Region by Client Type
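A view such as "Net Sales by Brand by Region by Client Type" amounts to summing one business measure over a chosen combination of dimension levels. The following is a rough sketch under stated assumptions: the record layout, the sample values, and the function name `net_sales_by` are all hypothetical.

```python
from collections import defaultdict


def net_sales_by(records, *dimensions):
    """Total the net_sales measure for each combination of the given dimension levels."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record["net_sales"]
    return dict(totals)


# Hypothetical detail rows, invented for illustration only.
SALES = [
    {"brand": "A", "region": "NE", "client_type": "Retail", "net_sales": 100},
    {"brand": "A", "region": "NE", "client_type": "Retail", "net_sales": 50},
    {"brand": "A", "region": "SW", "client_type": "Wholesale", "net_sales": 70},
    {"brand": "B", "region": "SW", "client_type": "Retail", "net_sales": 30},
]

if __name__ == "__main__":
    print(net_sales_by(SALES, "brand", "region", "client_type"))
    print(net_sales_by(SALES, "region"))  # rolling up to a single dimension
```

Passing fewer dimensions rolls the measure up to a coarser view, and passing more drills down, which is the basic operation OLAP tools perform at much larger scale.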
Application Solution Classes
• Executive Information Systems (EIS): Present information at the highest level of summarization using corporate business measures. They are designed for extreme ease of use and, in many cases, only a mouse is required. Graphics are usually generously incorporated to provide at-a-glance indications of performance.
• Decision Support Systems (DSS): They ideally present information in graphical and tabular form, providing the user with the ability to drill down on selected information. Note the increased detail and data manipulation options presented.
Data Mining
Data Mining
• The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions (Simoudis, 1996).
• Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.
Data Mining
• Reveals information that is hidden and unexpected, as there is little value in finding patterns and relationships that are already intuitive.
• Patterns and relationships are identified by examining the underlying rules and features in the data.
• Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing.
• Relatively new technology, but already used in a number of industries.
Examples of Applications of Data Mining
Retail / Marketing:
• Identifying buying patterns of customers
• Finding associations among customer demographic characteristics
• Predicting response to mailing campaigns
• Market basket analysis
Banking:
• Detecting patterns of fraudulent credit card use
• Identifying loyal customers
• Predicting customers likely to change their credit card affiliation
• Determining credit card spending by customer groups
Examples of Applications of Data Mining
Insurance:
• Claims analysis
• Predicting which customers will buy new policies
Medicine:
• Characterizing patient behavior to predict surgery visits
• Identifying successful medical therapies for different illnesses
Data Mining Operations and Associated Techniques
Database Segmentation
• Aim is to partition a database into an unknown number of segments, or clusters, of similar records.
• Uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles.
• Less precise than other operations, and thus less sensitive to redundant and irrelevant features.
• Sensitivity can be reduced by ignoring a subset of the attributes that describe each instance or by assigning a weighting factor to each variable.
• Applications of database segmentation include customer profiling, direct marketing, and cross selling.
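One way to make segmentation and attribute weighting concrete is a small k-means-style clustering sketch. This is an assumption-laden illustration: the slides do not name an algorithm, k-means needs the number of segments fixed in advance (unlike the "unknown number" the slide describes), and the function names, starting centroids, and sample records are invented for the example. A weight of 0 ignores an attribute entirely.

```python
def weighted_sq_dist(a, b, weights):
    """Squared distance in which each attribute is scaled by its weighting factor."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def segment(records, initial_centroids, weights, max_iter=100):
    """Assign each record to its nearest centroid, then recompute centroids, until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: weighted_sq_dist(record, centroids[k], weights))
            for record in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical records: two meaningful attributes plus one irrelevant, noisy attribute.
RECORDS = [(1, 1, 100), (1, 2, 0), (2, 1, 100), (9, 9, 0), (9, 8, 100), (8, 9, 0)]
START = [RECORDS[0], RECORDS[3]]

if __name__ == "__main__":
    print(segment(RECORDS, START, weights=(1, 1, 0))[0])  # noisy attribute ignored
    print(segment(RECORDS, START, weights=(1, 1, 1))[0])  # noisy attribute dominates
```

With the third attribute weighted at 0 the records split into the two obvious groups; with equal weights the irrelevant attribute dominates the distance and the segments change, which is the sensitivity the slide refers to.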
Scatterplot
Visualization
Data Mining and Data Warehousing
• A major challenge in exploiting data mining is identifying suitable data to mine.
• Data mining requires a single, separate, clean, integrated, and self-consistent source of data.
• A data warehouse is well equipped for providing data for mining.
• Data quality and consistency are prerequisites for mining, to ensure the accuracy of the predictive models. Data warehouses are populated with clean, consistent data.
Data Mining and Data Warehousing
• It is advantageous to mine data from multiple sources to discover as many interrelationships as possible. Data warehouses contain data from a number of sources.
• Selecting the relevant subsets of records and fields for data mining requires the query capabilities of the data warehouse.
• The results of a data mining study are useful if there is some way to further investigate the uncovered patterns. Data warehouses provide the capability to go back to the data source.
Advanced Database Topics
A Little History
• Prior to the 1980s: hierarchical and network databases.
• Hardware: dumb terminals using private networks
• Database: centralized and stored on disk packs
• End-user terminals: simply input/output devices
• Processing: at the mainframe
• Data: text data
• Networks had to handle text data
• No access from outside to the organization's private network.
New Needs
• Microcomputers enabled workstation processing power.
• Satellite and network technology provided for very high speed, high traffic, and low cost long distance communications networks.
• The Internet in the late 1990s and the corresponding phenomenal growth in electronic commerce (E-commerce) necessitated public access to data from people's homes.
• The volume of data that needed to be transmitted increased greatly.
New Needs
• The business environment changed during the last two decades
• Information stored at different locations, on different hardware and operating systems, with different commercial DBMS products, and with different underlying data models had to be combined
• A centralized database was no longer a feasible way to handle these new demands
Distributed Database Scenario
There are many advantages to using a distributed database rather than a centralized database. They are:
• Improved performance, because high traffic data are stored locally.
• More efficient data management, because the DBA workload is shared.
• Better network integrity, because the whole system does not stop if one computer goes down.
• Expansion of the database is facilitated when the organization grows, since new data does not have to be centralized. It can remain and be administered in the original location.
• Data for the whole organization can still be accessed from any location.
Distributed Database
• Data administration is improved (??)
• In a distributed database system even a simple task like creating a backup copy of the database can take a considerable amount of time.
• If the database is divided among several locations the time and workload for this task can be shared.
Replication of Data
• System failure in one location should not stop processing in other locations
• Replicate all or parts of the database in more than one location.
• Database replication improves performance and provides a fail-safe option, but it involves considerable complexity
• Replication of frequently used data improves response time and reduces network traffic
• If the data changes at one location it must be changed at all locations
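The trade-off described on this slide can be sketched with a toy replicated table. It assumes a simple synchronous scheme in which a write must reach every replica; the class name `ReplicatedTable`, its methods, and the site names are hypothetical, and real distributed DBMSs use far more elaborate protocols.

```python
class ReplicatedTable:
    """Hypothetical table that keeps a full copy of the data at every site."""

    def __init__(self, site_names):
        self.replicas = {name: {} for name in site_names}
        self.down = set()

    def fail(self, site):
        """Simulate a system failure at one location."""
        self.down.add(site)

    def recover(self, site):
        self.down.discard(site)

    def read(self, site, key):
        """Reads are served locally, so a failure elsewhere does not stop them."""
        if site in self.down:
            raise RuntimeError(f"site {site} is down")
        return self.replicas[site].get(key)

    def write(self, key, value):
        """A change made at one location must be applied at all locations."""
        if self.down:
            raise RuntimeError("cannot update every replica while a site is down")
        for replica in self.replicas.values():
            replica[key] = value


if __name__ == "__main__":
    table = ReplicatedTable(["site_a", "site_b"])
    table.write("cust-1", "Midway Electric")
    table.fail("site_a")
    print(table.read("site_b", "cust-1"))  # still readable at the surviving site
```

Reads keep working at surviving sites, but in this simple scheme writes are refused while any site is down, which hints at the complexity the slide mentions.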
Distributed Systems in an Ideal World
• C. J. Date established rules for the ideal distributed DBMS
• The rules are a goal that distributed systems strive toward, but have not yet reached
According to Date's rules:
• Each site is responsible for its own portion of the distributed database, including security, backup, and recovery.
• Each site has equal capabilities and does not rely on any other site.
• The system should work regardless of the computer hardware, operating system, or network installed at any site.
Date's Rules of Distributed Databases:
1. Local site independence
2. Central site independence
3. Failure independence
4. Location transparency
5. Fragmentation transparency
6. Replication transparency
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
Complexities of Distributed Databases
There also are many complications involved in the management of distributed database systems.
The distributed database must be carefully designed to ensure the following:
• Store data as close as possible to where it is used most often.
• Make the location of the data transparent to the end user.
• Make the system easy to expand.
• Optimize queries to improve response time in the distributed environment.
Database Design
The designer must analyze the organization's needs and business processes to determine the best way to distribute the database.
There are several possibilities for storing the data in more than one location:
• Centralized master database
• Replication of the entire database, or part of it, in several locations
• Horizontal partitions
• Vertical partitions
• Mixture of the above
Fragmentation
• Horizontal fragmentation of the database means that the rows of one or more tables may be stored in different locations
• Similar to the separation of the customer table in the retailing example above.
• Vertical fragmentation means that columns of a table (i.e., attributes or groups of attributes of an entity) are stored in different locations.
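The two kinds of fragmentation can be sketched on a small customer table held as a list of dictionaries. The column names, sample rows, and function names below are hypothetical and serve only to show the row-wise versus column-wise split.

```python
def horizontal_fragments(rows, site_of):
    """Split a table by rows: each row goes to the site chosen by `site_of`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(site_of(row), []).append(row)
    return fragments


def vertical_fragment(rows, key, columns):
    """Split a table by columns: keep the key plus the listed attributes."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def rejoin(left, right, key):
    """Rebuild full rows from two vertical fragments by matching on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Hypothetical customer table, invented for illustration only.
CUSTOMERS = [
    {"cust_id": 1, "name": "ABC Electronics", "region": "NE", "credit_limit": 5000},
    {"cust_id": 2, "name": "Midway Electric", "region": "SW", "credit_limit": 9000},
]

if __name__ == "__main__":
    print(horizontal_fragments(CUSTOMERS, lambda row: row["region"]))
    print(vertical_fragment(CUSTOMERS, "cust_id", ["credit_limit"]))
```

Each vertical fragment repeats the key column so the original rows can be reconstructed with a join, while horizontal fragments can simply be concatenated.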
Query Formulation
• Distributed databases require a considerable amount of network overhead
• A poorly formulated query may cause unnecessary data retrieval from the database
• Query optimization is ideally performed by the distributed database management system
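The cost of a poorly formulated query can be illustrated by counting how many rows cross the network under two strategies. This is a simplified sketch: the row count stands in for network overhead, and the function names and sample rows are assumptions made for the example.

```python
def ship_then_filter(remote_rows, predicate):
    """Poorly formulated: fetch every remote row, then filter at the requesting site."""
    shipped = list(remote_rows)  # every row crosses the network
    return [row for row in shipped if predicate(row)], len(shipped)


def filter_then_ship(remote_rows, predicate):
    """Better: apply the selection at the remote site and ship only the matches."""
    shipped = [row for row in remote_rows if predicate(row)]
    return shipped, len(shipped)


# Hypothetical rows stored at a remote site.
REMOTE_SALES = [
    {"region": "NE", "net_sales": 23000},
    {"region": "SW", "net_sales": 30000},
    {"region": "SW", "net_sales": 32000},
]

if __name__ == "__main__":
    wanted = lambda row: row["region"] == "NE"
    print(ship_then_filter(REMOTE_SALES, wanted)[1])  # rows shipped without pushdown
    print(filter_then_ship(REMOTE_SALES, wanted)[1])  # rows shipped with pushdown
```

Both strategies return the same answer; the difference is only in how much data is moved, which is the kind of decision a distributed query optimizer is expected to make on the user's behalf.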
OODB
• In traditional relational databases, E-R modeling and normalization focus on identifying entities, their attributes, and the relationships between entities
• This works well for most organizational data, especially business data
• The advent of the microcomputer brought processing power to the desktop
• Computer-aided design (CAD) became the norm for engineering work, so it became necessary to store drawings
• Powerful multimedia PCs with sound cards and color monitors enabled the manipulation of sound and video files
• Many other applications were developed that required more than just text and numeric processing
Why??
• These new applications were facilitated by the development of Object-Oriented Programming
• Object-oriented data modeling, object-oriented databases, and object-oriented database management systems are still evolving
• OODBMS and O/R DBMS are two types of database management systems that are currently available
• O/R DBMS uses the basic theory of relational database management systems with object-oriented features added
• OODBMS is more object-oriented and was developed separately from the relational products
• OODBMS suffers from a lack of the standardization that is available with relational database systems