data management & data warehouses
Post on 23-Feb-2016
PPT Slides by Dr. Craig Tyran & Kraig Pencil
Data Management & Data Warehouses
MIS 320
Kraig Pencil
Summer 2014
Game Plan
• Introduction
• Why use a relational database?
• Database management systems
• Data warehouses
• Data mining
• Data marts
A. Why use a relational database?
1. A database sounds great, but why don’t we just store all our data in one big table in an Excel spreadsheet?
  – Example: Can you foresee any hassles or potential difficulties associated with entering/storing order information in the following Excel table?
A. Why use a relational database? (cont.)
1. Why don’t we just store our data in one spreadsheet table?
  – Potential problems
    • May have “redundant” data entry
    • Potential for data entry errors (different/wrong phone numbers)
    • Updates can be a hassle/inefficient (e.g., changing a phone number)
  – Solution
    • “Normalize” the data: break up the table into a set of linked tables in a database (instead of having one spreadsheet)
  – See example
Example: Normalized Tables (and the advantages of a database)
Questions:
a) Any unneeded redundancy?
b) Is it now efficient to update customer info?
c) Where is the foreign key?
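The questions above can be explored with a minimal sketch. The slide's own tables are not reproduced in this transcript, so the `customers` and `orders` tables, their columns, and the sample rows below are hypothetical. Python's built-in `sqlite3` module stands in for Access.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- the foreign key
    item        TEXT NOT NULL
);
""")

# The customer's details are entered once, no matter how many orders exist.
conn.execute("INSERT INTO customers VALUES (1, 'Pat Lee', '555-0100')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "Widget"), (1, "Gadget"), (1, "Gizmo")],
)

# One UPDATE changes the phone number seen by every order.
conn.execute("UPDATE customers SET phone = '555-0199' WHERE customer_id = 1")

# A join rebuilds the "one big table" view on demand.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.phone, o.item
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
for row in rows:
    print(row)
```

In this sketch the foreign key is `orders.customer_id`, and the phone number lives in exactly one row, which is why a single update is enough.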
Example: Non-Normalized Data Table for an Auto Shop (Rainer & Turban, Fig 4.6)
Examples of redundancy
B. Database Management Systems
1. What is a “database management system” (DBMS)?
  Software that allows one to create, store, organize, manage, and use data
  • Example of a DBMS?
2. Key components
  – Data Definition subsystem
  – Data Manipulation subsystem
  – Application Generation subsystem
  – Data Administration subsystem
  – DBMS engine
DBMS Components
Lab Tutorials 1,2
Lab Tutorials 3,5
Lab Tutorials 4,6
B. Database Management Systems
3. Examples of DBMS components in Access
  Data Definition subsystem
    – Data dictionary (“Design view” for a table)
  Data Manipulation subsystem: Move, change, and “ask questions”
    – View of a table (“Datasheet view” for a table)
    – Query-by-example (QBE) tool
    – Structured query language (SQL)
  Application Generation subsystem: the “front end”
    – Design of forms and reports
  Data Administration subsystem
    – Optimize query performance
    – Security settings with password
B. Database Management Systems
4. What aspects of data need to be specified?
  – Lots of aspects!
    • Recall table creation in MS Access (Tutorials 1 & 2)
  – Common data properties
    • Data “type” (number, text, date, etc.)
    • Description
    • Field size
    • Required/not required
    • Etc.
  – An important reference for a database system: Data dictionary
    – Stores information about the data in a database
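As a rough illustration of what a data dictionary holds, the sketch below asks a database to describe one of its own tables. It uses SQLite through Python's `sqlite3` module rather than Access, and the `employees` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only to have something to describe.
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL,
        gender      TEXT,
        hire_date   TEXT
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull flag, default value, primary-key position)
columns = conn.execute("PRAGMA table_info(employees)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    # Treat primary-key columns as required even without an explicit NOT NULL.
    required = "required" if (notnull or pk) else "optional"
    print(f"{name:12} {col_type:8} {required}")
```

The output lists each field's name, data type, and whether it is required, which is the same kind of information Access shows in Design view.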
Access Example:
Information about the “Gender” field is specified in “Field Properties” section
Data “type” (number, text, date, etc.)
Description
Field size
Required/not required
Access Example: Data Manipulation Subsystem (Low Stock Products query)
QBE or SQL may be used to prepare a query. Which approach would be easier for most people?
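For comparison with the QBE grid, here is a sketch of what a low-stock query might look like in SQL. The Access query design is not shown in this transcript, so the `products` table, its columns, and the rule "stock below reorder level" are assumptions. The example runs on SQLite via Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical products table; names and sample data are illustrative only.
conn.execute(
    "CREATE TABLE products (name TEXT, units_in_stock INTEGER, reorder_level INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Chai", 39, 10), ("Syrup", 5, 25), ("Tofu", 0, 10)],
)

# Assumed definition of "low stock": fewer units on hand than the reorder level.
low_stock = conn.execute("""
    SELECT name, units_in_stock
    FROM products
    WHERE units_in_stock < reorder_level
    ORDER BY name
""").fetchall()
print(low_stock)  # [('Syrup', 5), ('Tofu', 0)]
```

A QBE tool builds a statement like this behind the scenes from the fields and criteria picked in the grid.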
Access Example: Application Generation Subsystem (Employer Information Form)
Access Example: Data Administration (Performance Analysis for a Database)
B. Database Management Systems (cont.)
5. DBMS: Example products
  – You are very likely to work with, and possibly help develop, a database using one or more of the following:
    • Small-Midsize DBMS: Microsoft Access, dBase, Paradox
    • Mid-to-Large DBMS: Microsoft SQL Server, Oracle, DB2, Informix, IMS
C. Data Warehouses
1. Business problem:
  • Difficult for larger organizations to analyze organizational data from multiple sources
  • Solution: Data warehouse
2. Gather/integrate information from existing operational databases into a “warehouse”
  • Create “Business Intelligence” system
  • See next figure
Create a Data Warehouse from Operational Databases
From Haag, et al., MIS for the Information Age, 2004
C. Data Warehouses (cont.)
3. Data warehouse features
  • Designed to support business decision making
  • Not transactions!
  • Supports OLAP
    – On-line Analytical Processing
  • Crosses functional boundaries of an organization
  • Can be very large
  • Note: Warehouse is “read only”
    • Why?
  • Can be a significant strategic resource for a company
    Can yield a high ROI
4. Examples
  • ???
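To make the OLAP feature listed above a little more concrete, the sketch below summarizes the same sales facts at two levels of detail. This is a simplified, assumed illustration using a hypothetical `sales` table in SQLite. Real OLAP tools offer much richer slicing and drill-down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table; regions, quarters, and amounts are made up.
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("West", "Q1", 100.0), ("West", "Q2", 150.0),
        ("East", "Q1", 200.0), ("East", "Q2", 50.0), ("East", "Q2", 25.0),
    ],
)

# Detailed view: totals for each region and quarter.
by_region_quarter = conn.execute("""
    SELECT region, quarter, SUM(amount)
    FROM sales
    GROUP BY region, quarter
    ORDER BY region, quarter
""").fetchall()

# "Rolled up" view: the same data summarized by region only.
by_region = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

print(by_region_quarter)
print(by_region)
```

Both queries only read data, which fits the "read only", decision-support role of a warehouse.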
C. Data Warehouses (cont.)
5. Implementation issues
  • People may be reluctant to share information
  • “ETL” process is not easy
    • Extraction, transformation, load
  • Expensive
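A toy sketch of the three ETL steps hints at why the process is hard: even two small sources can disagree on field names and formats. The source systems, field names, and cleaning rules below are all hypothetical.

```python
import sqlite3

# Extract: records from two hypothetical operational systems with inconsistent formats.
sales_system = [{"cust": "ACME Corp ", "amount": "1,200.50"}]
web_orders = [{"customer_name": "acme corp", "total_usd": 300.0}]

def transform(name, amount):
    """Standardize the customer name and convert the amount to a float."""
    clean_name = " ".join(name.split()).title()          # trim spaces, unify capitalization
    clean_amount = float(str(amount).replace(",", ""))   # drop thousands separators
    return clean_name, clean_amount

# Transform: map both sources onto one common (customer, amount) layout.
rows = [transform(r["cust"], r["amount"]) for r in sales_system]
rows += [transform(r["customer_name"], r["total_usd"]) for r in web_orders]

# Load: write the cleaned rows into a warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)

totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_sales GROUP BY customer"
).fetchall()
print(totals)  # [('Acme Corp', 1500.5)]
```

Without the transform step, "ACME Corp " and "acme corp" would be counted as two different customers.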
D. Data Mining
1. Provides a means to extract patterns and relationships from large amounts of data (e.g., a data warehouse)
2. Mining analogy
  – Sift through raw dirt/rock to find something of value
  – Large volumes of data are sifted in an attempt to find something worthwhile
3. Example: market basket analysis
  – Identify products that may be attractive to a customer
    • See next slide: Amazon.com buyer suggestions
Data Mining: Example of pattern discovered via mining
D. Data Mining (cont.)
4. Identify previously unknown patterns
  – e.g., What are characteristics of customers likely to default on a bank loan?
  “Target knows before it shows”
  – How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did
  – How Companies Learn Your Secrets: NYTimes
  – e.g., Suppose you discovered that beer and diapers* were often found in the same purchase?
    • “Market basket analysis”
    • What could you do with that information to improve sales of one, the other, or both?
*This is a common example, not an actual case.
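The beer-and-diapers idea can be sketched with two common market basket measures, support and confidence. The baskets below are made up for illustration, and real tools use more scalable algorithms than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase baskets, for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
]

# Count how many baskets contain each item and each pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]

print(support("beer", "diapers"))     # 0.5
print(confidence("beer", "diapers"))  # 2 of the 3 beer baskets also have diapers
```

A high confidence for "beer, then diapers" is the kind of pattern a store might act on, for example through product placement or paired promotions.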
E. Data Marts
5. Data marts
  • Warehouses can be overwhelming/difficult to implement …
    Some organizations create “data marts”
  • A subset of a data warehouse
  • Simpler, scaled-down version
  • Focuses on/integrates a specific area (e.g., Sales department)
  • Provides useful decision-making tools
Haggen photo from: www.callhugh.com/ferndale.php
MiniMart photo from: http://www.ae.gatech.edu/research/controls/pictures/f020801_gtar/Mini%20Mart.JPG
Data Marts: Subsets of Data Warehouse
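The "subset" idea can be sketched as a view over a warehouse table that exposes only one department's rows. This is a simplification: the `warehouse_facts` table and its contents are hypothetical, and in practice a data mart is often a separate, physically stored database rather than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table that spans several departments.
conn.execute("CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("Sales", "revenue", 500.0), ("Sales", "returns", 20.0),
        ("HR", "headcount", 42.0), ("Finance", "budget", 900.0),
    ],
)

# The "data mart": only the Sales department's slice of the warehouse.
conn.execute("""
    CREATE VIEW sales_mart AS
    SELECT metric, value
    FROM warehouse_facts
    WHERE department = 'Sales'
""")

mart_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
print(mart_rows)  # [('returns', 20.0), ('revenue', 500.0)]
```

Users of `sales_mart` see a smaller, simpler data set focused on their own area.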
Data Mining – Business Intelligence
• A few videos to watch and think about …
• http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM
• http://www.youtube.com/user/ibm?#p/c/13/fFdITHMuy2w
• http://www.youtube.com/user/SASsoftware?v=2677nWVNg9M&feature=pyv&ad=4782551166&kw=business%20analytics
• http://www.youtube.com/watch?v=El_lSd6G5WU
• http://www.youtube.com/watch?v=uP89kaDU40c
• http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM#p/u/35/ecqk0JUKvAI
Big Data
• Big data[1][2] is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. (Wikipedia)
• (Image)
Global Big Data: + 2.5 exabytes/day
• The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s[15]
• As of 2012, every day 2.5 quintillion (2.5×10^18) bytes of data were created.[16]
• (Wikipedia)
• (Image)
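A quick back-of-the-envelope check of the figures above. The 730 MB-per-CD unit is borrowed from the storage chart later in the deck, and treating 1 MB as 10^6 bytes is an assumption made here.

```python
BYTES_PER_DAY = 2.5e18        # 2.5 quintillion bytes per day, as quoted above
BYTES_PER_EXABYTE = 10**18    # SI definition of an exabyte
BYTES_PER_CD = 730 * 10**6    # assumes 730 MB per CD with 1 MB = 10**6 bytes

# Convert the daily volume into exabytes and into an equivalent number of CDs.
exabytes_per_day = BYTES_PER_DAY / BYTES_PER_EXABYTE
cds_per_day = BYTES_PER_DAY / BYTES_PER_CD

# Doubling every 40 months corresponds to a yearly growth factor of 2^(12/40).
annual_growth = 2 ** (12 / 40)

print(f"{exabytes_per_day} exabytes per day")
print(f"about {cds_per_day / 1e9:.1f} billion CDs per day")
print(f"about {(annual_growth - 1) * 100:.0f}% growth per year")
```

So 2.5 quintillion bytes is 2.5 exabytes, or roughly 3.4 billion CDs' worth of data per day, and doubling every 40 months works out to about 23% growth per year.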
Big Data
• The next frontier in data?
• http://www.eweek.com/c/a/Data-Storage/Big-Data-Analytics-Is-Just-Starting-to-Reach-Its-Potential-10-Reasons-Why-457684/?kc=EWKNLEAU07102012STR1
• Some terms:
  – Hadoop (distributed file organization)
  – Distributed databases and server clusters
  – Cassandra (“Not only SQL” DBMS)
  – MapReduce (breaking computation into smaller pieces, then combining the results of each computation)
(Chart: Total World Data Storage Capacity, in CDs @ 730MB/CD; x-axis: 1993, 2000, 2007, 2014; y-axis: 0 to 3,000,000,000,000)
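The MapReduce idea from the list of terms can be sketched on a single machine as a word count. This is only an illustration of the pattern in plain Python, not Hadoop's actual API. The documents and function names are made up, and a real system would spread the map and reduce work across many servers.

```python
from collections import defaultdict
from functools import reduce

# Made-up input "documents"; in practice each could live on a different server.
documents = ["big data is big", "data warehouses store data"]

def map_phase(doc):
    """Map: break one document into small (word, 1) pieces."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a single result."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Each document is mapped independently, which is what makes it possible to divide the work among many machines before the results are combined.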