data management & data warehouses
DESCRIPTION
Data Management & Data Warehouses. MIS 320, Kraig Pencil, Summer 2014. Game Plan: Introduction; Why use a relational database?; Database management systems; Data warehouses; Data mining; Data marts.
![Page 1: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/1.jpg)
PPT Slides by Dr. Craig Tyran & Kraig Pencil
Data Management & Data Warehouses
MIS 320
Kraig Pencil
Summer 2014
![Page 2: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/2.jpg)
Game Plan
• Introduction
• Why use a relational database?
• Database management systems
• Data warehouses
• Data mining
• Data marts
![Page 3: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/3.jpg)
A. Why use a relational database?
1. A database sounds great, but why don’t we just store all our data in one big table in an Excel spreadsheet?
– Example: Can you foresee any hassles or potential difficulties associated with entering/storing order information in the following Excel table?
![Page 4: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/4.jpg)
A. Why use a relational database?
1. Why don’t we just store our data in one spreadsheet table? (cont.)
– Potential problems
• May have “redundant” data entry
• Potential for data entry errors (different/wrong phone numbers)
• Updates can be a hassle/inefficient (e.g., changing a phone number)
– Solution
• “Normalize” the data: break up the table into a set of linked tables in a database (instead of having one spreadsheet)
– See example
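The normalization idea can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are assumptions, not the example shown on the slides. Customer details live in one table, each order points to its customer through a foreign key, and a phone number change becomes a single update.

```python
import sqlite3

# Hypothetical schema for illustration; names and data are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    item        TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Pat Lee', '555-0100')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "Widget"), (1, "Gadget"), (1, "Gizmo")],
)

# The phone number is stored once, so one UPDATE corrects it for every order.
conn.execute("UPDATE customers SET phone = '555-0199' WHERE customer_id = 1")

# Join the linked tables back together when a combined view is needed.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.phone, o.item
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
for row in rows:
    print(row)
```

In the single-spreadsheet version, the same change would mean editing the phone number on every order row, which is where inconsistent entries tend to creep in.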
![Page 5: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/5.jpg)
Example: Normalized Tables (and the advantages of a database)
Questions:
a) Any unneeded redundancy?
b) Is it now efficient to update customer info?
c) Where is the foreign key?
![Page 6: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/6.jpg)
Example: Non-Normalized Data Table for an Auto Shop (Rainer & Turban, Fig 4.6)
Examples of redundancy
![Page 7: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/7.jpg)
B. Database Management Systems
1. What is a “database management system” (DBMS)?
– Software that allows one to create, store, organize, manage, and use data
• Example of a DBMS?
2. Key components
– Data Definition subsystem
– Data Manipulation subsystem
– Application Generation subsystem
– Data Administration subsystem
– DBMS engine
![Page 8: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/8.jpg)
DBMS Components
Lab Tutorials 1,2
Lab Tutorials 3,5
Lab Tutorials 4,6
![Page 9: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/9.jpg)
B. Database Management Systems
3. Examples of DBMS components in Access
Data Definition subsystem
– Data dictionary (“Design view” for a table)
Data Manipulation subsystem: Move, change, and “ask questions”
– View of a table (“Datasheet view” for a table)
– Query-by-example (QBE) tool
– Structured query language (SQL)
Application Generation subsystem: the “front end”
– Design of forms and reports
Data Administration subsystem
– Optimize query performance
– Security settings with password
![Page 10: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/10.jpg)
B. Database Management Systems
4. What aspects of data need to be specified?
– Lots of aspects!
• Recall table creation in MS Access (Tutorials 1 & 2)
– Common data properties
• Data “type” (number, text, date, etc.)
• Description
• Field size
• Required/not required
• Etc.
– An important reference for a database system: Data dictionary
• Stores information about the data in a database
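As a rough illustration of a data dictionary outside Access, the sketch below uses Python's sqlite3 module. The employees table and its columns are hypothetical. SQLite's `PRAGMA table_info` reports each column's name, declared type, and whether it is required; note that SQLite records a declared size such as `VARCHAR(30)` but does not enforce it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; column names, types, and sizes are assumptions.
conn.execute("""
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    last_name   VARCHAR(30) NOT NULL,
    gender      CHAR(1),
    hire_date   DATE
)
""")

# PRAGMA table_info returns one row per column:
# (position, name, declared type, not-null flag, default value, primary-key flag)
data_dictionary = conn.execute("PRAGMA table_info(employees)").fetchall()
for position, name, col_type, notnull, default, pk in data_dictionary:
    print(f"{name:12} type={col_type:12} required={bool(notnull)}")
```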
![Page 11: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/11.jpg)
Access Example:
Information about the “Gender” field is specified in “Field Properties” section
Data “type” (number, text, date, etc.)
Description
Field size
Required/not required
![Page 12: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/12.jpg)
Access Example: Data Manipulation Subsystem (Low Stock Products query)
QBE or SQL may be used to prepare a query. Which approach would be easier for most people?
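For comparison with the QBE grid, here is a minimal sketch of what a "low stock" query might look like in SQL, run through Python's sqlite3 module. The products table, its columns, and the rule "stock below reorder level" are assumptions for illustration; the tutorial's actual query may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical products table and sample data; not the tutorial's schema.
conn.execute("""
CREATE TABLE products (
    product_name   TEXT,
    units_in_stock INTEGER,
    reorder_level  INTEGER
)
""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Chai", 39, 10), ("Syrup", 5, 25), ("Tofu", 0, 5)],
)

# "Ask a question": which products have fallen below their reorder level?
low_stock = conn.execute("""
    SELECT product_name, units_in_stock
    FROM products
    WHERE units_in_stock < reorder_level
    ORDER BY product_name
""").fetchall()
print(low_stock)  # [('Syrup', 5), ('Tofu', 0)]
```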
![Page 13: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/13.jpg)
Access Example: Application Generation Subsystem (Employer Information Form)
![Page 14: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/14.jpg)
Access Example: Data Administration (Performance Analysis for a Database)
![Page 15: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/15.jpg)
B. Database Management Systems (cont.)
5. DBMS: Example products
– You are very likely to work with (and possibly help develop) a database using one or more of the following:
• Small-Midsize DBMS: Microsoft Access, dBase, Paradox
• Mid-to-Large DBMS: Microsoft SQL Server, Oracle, DB2, Informix, IMS
![Page 16: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/16.jpg)
C. Data Warehouses
1. Business problem:
• Difficult for larger organizations to analyze organizational data from multiple sources
• Solution: Data warehouse
2. Gather/integrate information from existing operational databases into a “warehouse”
• Create “Business Intelligence” system
• See next figure
![Page 17: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/17.jpg)
Create a Data Warehouse from Operational Databases
From Haag, et al., MIS for the Information Age, 2004
![Page 18: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/18.jpg)
C. Data Warehouses (cont.)
3. Data warehouse features
• Designed to support business decision making
• Not transactions!
• Supports OLAP
– On-line Analytical Processing
• Crosses functional boundaries of an organization
• Can be very large
• Note: Warehouse is “read only”
• Why?
• Can be a significant strategic resource for a company
• Can yield a high ROI
4. Examples
• ???
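To give a feel for the kind of question OLAP supports, the sketch below summarizes a tiny, made-up fact table at two levels of detail using Python's sqlite3 module. The table name, columns, and figures are assumptions; real OLAP tools work on far larger data and offer much richer slicing. Note that the analysis only reads the data and never changes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse fact table with made-up revenue figures.
conn.execute(
    "CREATE TABLE sales_facts (region TEXT, quarter TEXT, product TEXT, revenue REAL)"
)
conn.executemany("INSERT INTO sales_facts VALUES (?, ?, ?, ?)", [
    ("West", "Q1", "Bikes", 120.0),
    ("West", "Q1", "Helmets", 30.0),
    ("West", "Q2", "Bikes", 150.0),
    ("East", "Q1", "Bikes", 90.0),
    ("East", "Q2", "Helmets", 45.0),
])
conn.commit()

# Detailed view: revenue by region and quarter.
by_region_quarter = conn.execute("""
    SELECT region, quarter, SUM(revenue)
    FROM sales_facts
    GROUP BY region, quarter
    ORDER BY region, quarter
""").fetchall()

# "Roll up" to a coarser view: revenue by region only.
by_region = conn.execute("""
    SELECT region, SUM(revenue)
    FROM sales_facts
    GROUP BY region
    ORDER BY region
""").fetchall()

print(by_region_quarter)
print(by_region)
```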
![Page 19: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/19.jpg)
C. Data Warehouses (cont.)
5. Implementation issues
• People may be reluctant to share information
• “ETL” process is not easy
• Extraction, transformation, load
• Expensive
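A toy version of the ETL steps helps show why the process is not easy: each source system stores the same kind of fact in its own format. Everything in this sketch is hypothetical (the two source systems, their field names, and the order_facts table); it only illustrates the extract, transform, and load stages using Python's standard library.

```python
import sqlite3

# Extract: rows as they might arrive from two operational systems (made-up data).
sales_system = [{"cust": "ACME Corp ", "amount": "1,200.50", "date": "03/15/2014"}]
web_system = [{"customer_name": "acme corp", "total_cents": 9900, "day": "2014-03-16"}]

# Transform: bring both sources to one agreed format
# (upper-case customer name, amount in dollars, ISO date).
def transform_sales(row):
    month, day, year = row["date"].split("/")
    return (
        row["cust"].strip().upper(),
        float(row["amount"].replace(",", "")),
        f"{year}-{month}-{day}",
    )

def transform_web(row):
    return (
        row["customer_name"].strip().upper(),
        row["total_cents"] / 100,
        row["day"],
    )

# Load: insert the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE order_facts (customer TEXT, amount REAL, order_date TEXT)"
)
cleaned = [transform_sales(r) for r in sales_system] + [transform_web(r) for r in web_system]
warehouse.executemany("INSERT INTO order_facts VALUES (?, ?, ?)", cleaned)

loaded = warehouse.execute(
    "SELECT customer, amount, order_date FROM order_facts ORDER BY order_date"
).fetchall()
print(loaded)
```

Even in this tiny case the two systems disagree on name spelling, currency units, and date format; multiply that by dozens of sources and the effort (and expense) becomes clear.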
![Page 20: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/20.jpg)
D. Data Mining
1. Provides a means to extract patterns and relationships from large amounts of data (e.g., a data warehouse)
2. Mining analogy
– Sift through raw dirt/rock to find something of value
– Large volumes of data are sifted in an attempt to find something worthwhile
3. Example: market basket analysis
– Identify products that may be attractive to a customer
• See next slide: Amazon.com buyer suggestions
![Page 21: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/21.jpg)
Data Mining: Example of pattern discovered via mining
![Page 22: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/22.jpg)
D. Data Mining (cont.)
4. Identify previously unknown patterns
– e.g., What are characteristics of customers likely to default on a bank loan?
“Target knows before it shows”
– How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did
– How Companies Learn Your Secrets: NYTimes
– e.g., Suppose you discovered that beer and diapers* were often found in the same purchase?
• “Market basket analysis”
• What could you do with that information to improve sales of one, the other, or both?
*This is a common example, not an actual case.
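Market basket analysis can be sketched in a few lines of Python. The baskets below are invented for illustration (in keeping with the note that beer and diapers is a teaching example, not a real case). The sketch counts how often pairs of items appear together and computes two common measures: support (share of all baskets containing both items) and confidence (share of baskets with the first item that also contain the second).

```python
from collections import Counter
from itertools import combinations

# Made-up purchase data: each set is one customer's basket.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))

def support(a, b):
    """Fraction of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(a, b):
    """Fraction of baskets containing a that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("beer", "diapers"))     # 2 of 5 baskets
print(confidence("beer", "diapers"))  # 2 of the 3 beer baskets
```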
![Page 23: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/23.jpg)
E. Data Marts
5. Data marts
• Warehouses can be overwhelming/difficult to implement …
Some organizations create “data marts”
• A subset of a data warehouse
• Simpler, scaled-down version
• Focuses on/integrates a specific area (e.g., Sales department)
• Provides useful decision making tools
Haggen photo from: www.callhugh.com/ ferndale.php MiniMart photo from: http://www.ae.gatech.edu/research/controls/pictures/f020801_gtar/Mini%20Mart.JPG
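The "subset of a data warehouse" idea can be illustrated with a database view. This is a simplification under stated assumptions: the warehouse_facts table and its contents are hypothetical, and in practice a data mart is often a separate database rather than a view. The point is only that the Sales mart exposes the Sales slice of the warehouse and nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table covering several departments (made-up data).
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)"
)
conn.executemany("INSERT INTO warehouse_facts VALUES (?, ?, ?)", [
    ("Sales", "revenue", 500.0),
    ("Sales", "returns", 20.0),
    ("HR", "headcount", 42.0),
    ("Finance", "payables", 310.0),
])

# The "data mart": a scaled-down view focused on one area.
conn.execute("""
    CREATE VIEW sales_mart AS
    SELECT metric, value
    FROM warehouse_facts
    WHERE department = 'Sales'
""")

sales_mart = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
print(sales_mart)  # [('returns', 20.0), ('revenue', 500.0)]
```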
![Page 24: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/24.jpg)
Data Marts: Subsets of Data Warehouse
![Page 25: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/25.jpg)
Data Mining – Business Intelligence
• A few videos to watch and think about …
• http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM
• http://www.youtube.com/user/ibm?#p/c/13/fFdITHMuy2w
• http://www.youtube.com/user/SASsoftware?v=2677nWVNg9M&feature=pyv&ad=4782551166&kw=business%20analytics
• http://www.youtube.com/watch?v=El_lSd6G5WU
• http://www.youtube.com/watch?v=uP89kaDU40c
• http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM#p/u/35/ecqk0JUKvAI
![Page 26: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/26.jpg)
Big Data
• Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. (Wikipedia)
• (Image)
![Page 27: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/27.jpg)
Global Big Data: + 2.5 exabytes/day
• The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s.
• As of 2012, 2.5 quintillion (2.5×10^18) bytes of data were created every day.
• (Wikipedia)
• (Image)
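The two figures above can be checked with simple arithmetic. The sketch below assumes a decimal (SI) exabyte of 10^18 bytes and treats "doubling every 40 months" as smooth exponential growth, which is an idealization of the quoted statistic.

```python
BYTES_PER_DAY = 2.5e18   # figure quoted above (2012)
EXABYTE = 10**18         # decimal (SI) exabyte, an assumption about the unit

# 2.5 quintillion bytes per day expressed in exabytes.
exabytes_per_day = BYTES_PER_DAY / EXABYTE

DOUBLING_MONTHS = 40

def growth_factor(years):
    """How many times larger capacity becomes after `years`, if it doubles every 40 months."""
    return 2 ** (years * 12 / DOUBLING_MONTHS)

print(exabytes_per_day)            # 2.5
print(growth_factor(10))           # 8.0 (120 months = 3 doublings)
print(round(growth_factor(1), 2))  # 1.23, i.e. roughly 23% growth per year
```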
![Page 28: Data Management & Data Warehouses](https://reader036.vdocument.in/reader036/viewer/2022062305/56816388550346895dd4780f/html5/thumbnails/28.jpg)
Big Data
• The next frontier in data?
• http://www.eweek.com/c/a/Data-Storage/Big-Data-Analytics-Is-Just-Starting-to-Reach-Its-Potential-10-Reasons-Why-457684/?kc=EWKNLEAU07102012STR1
• Some terms:
– Hadoop (distributed file organization)
– Distributed databases and server clusters
– Cassandra (“Not only SQL” DBMS)
– MapReduce (breaking computation into smaller pieces, then combining the results of each computation)
(Chart: Total World Data Storage Capacity, in CDs @ 730MB/CD; years 1993, 2000, 2007, 2014; vertical axis up to 3,000,000,000,000)
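The MapReduce idea listed among the terms above can be sketched on a single machine with a word count, the usual teaching example. This is not Hadoop code: the function names and sample documents are made up, and a real framework would spread the map and reduce steps across many servers.

```python
from collections import defaultdict
from functools import reduce

# Made-up input: each string stands in for one chunk of a large data set.
documents = ["big data is big", "data warehouses store data"]

def map_phase(doc):
    """Break one chunk into small (key, value) pieces: (word, 1)."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine each key's values into a single result (here, a sum)."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each chunk is mapped independently, the map step could run on different machines at the same time; only the grouped results need to be brought together for the reduce step.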