CS 157B: Database Management Systems II
March 13 Class Meeting
Department of Computer Science
San Jose State University
Spring 2013
Instructor: Ron Mak
www.cs.sjsu.edu/~mak
Department of Computer Science, Spring 2013: March 13
CS 157B: Database Management Systems II © R. Mak
Midterm Question 1.a

<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:complexType name="investor_type">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="level">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="beginner"/>
                        <xs:enumeration value="intermediate"/>
                        <xs:enumeration value="advanced"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>

    <xs:element name="experience">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="investor" type="investor_type"
                            minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

experience.xsd
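
As a rough illustration of what experience.xsd constrains, the following Python sketch parses a sample instance document and checks the same rules by hand. It is not a real XSD validator (the standard library does not include one), and the sample investors and the helper name check_experience are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the investor names are made up.
EXPERIENCE_XML = """
<experience>
  <investor><name>Alice</name><level>beginner</level></investor>
  <investor><name>Bob</name><level>advanced</level></investor>
</experience>
"""

# The three values enumerated for <level> in experience.xsd.
ALLOWED_LEVELS = {"beginner", "intermediate", "advanced"}


def check_experience(xml_text):
    """Return a list of problems; an empty list means the document
    satisfies the constraints that experience.xsd expresses."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "experience":
        problems.append("root element must be <experience>")
    for child in root:
        if child.tag != "investor":
            problems.append(f"unexpected element <{child.tag}>")
            continue
        # xs:sequence means <name> must come first, then <level>.
        if [e.tag for e in child] != ["name", "level"]:
            problems.append("investor must contain name, then level")
            continue
        level = child.findtext("level")
        if level not in ALLOWED_LEVELS:
            problems.append(f"invalid level: {level}")
    return problems


print(check_experience(EXPERIENCE_XML))
```

Because minOccurs="0", an empty experience element with no investors also passes the check.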
Midterm Question 1.b

<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:complexType name="stock_type">
        <xs:sequence>
            <xs:element name="symbol" type="xs:string"/>
            <xs:element name="company" type="xs:string"/>
            <xs:element name="price" type="xs:decimal"/>
            <xs:element name="change" type="xs:decimal"/>
        </xs:sequence>
        <xs:attribute name="exchange">
            <xs:simpleType>
                <xs:restriction base="xs:string">
                    <xs:enumeration value="NASDAQ"/>
                    <xs:enumeration value="NYSE"/>
                </xs:restriction>
            </xs:simpleType>
        </xs:attribute>
    </xs:complexType>

    <xs:element name="prices">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="stock" type="stock_type"
                            minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

prices.xsd
Midterm Question 2.a

<winners>
{
    for $stk in doc("prices.xml")//stock
    for $inv in doc("investors.xml")//investor
    let $change := $stk/change
    where ($inv/portfolio/symbol = $stk/symbol)
      and ($stk/change > 0)
    order by $stk/symbol ascending
    return
        <stock>
        {
            ($stk/symbol, $stk/price, $stk/change, $inv/name)
        }
        </stock>
}
</winners>

winners.xql
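
To make the join in winners.xql easier to follow, here is a Python sketch of the same logic using the standard library's ElementTree. The sample documents are invented for illustration: only the element names are taken from the query and from prices.xsd, and the layout of investors.xml (name plus portfolio/symbol) is inferred from the query's paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; symbols, companies, and prices are made up.
PRICES_XML = """
<prices>
  <stock exchange="NASDAQ">
    <symbol>AAA</symbol><company>Alpha</company>
    <price>28.00</price><change>0.25</change>
  </stock>
  <stock exchange="NYSE">
    <symbol>BBB</symbol><company>Beta</company>
    <price>210.00</price><change>-1.10</change>
  </stock>
  <stock exchange="NASDAQ">
    <symbol>CCC</symbol><company>Gamma</company>
    <price>43.00</price><change>2.50</change>
  </stock>
</prices>
"""
INVESTORS_XML = """
<investors>
  <investor>
    <name>Alice</name>
    <portfolio><symbol>CCC</symbol><symbol>BBB</symbol></portfolio>
  </investor>
  <investor>
    <name>Bob</name>
    <portfolio><symbol>AAA</symbol></portfolio>
  </investor>
</investors>
"""


def winners(prices_xml, investors_xml):
    """Pair each stock whose change is positive with every investor
    who holds it, ordered by symbol (mirrors the two nested for clauses)."""
    investors = list(ET.fromstring(investors_xml).iter("investor"))
    rows = []
    for stk in ET.fromstring(prices_xml).iter("stock"):
        symbol = stk.findtext("symbol")
        change = float(stk.findtext("change"))
        if change <= 0:
            continue  # where ($stk/change > 0)
        for inv in investors:
            held = {s.text for s in inv.findall("portfolio/symbol")}
            if symbol in held:  # where $inv/portfolio/symbol = $stk/symbol
                rows.append((symbol, stk.findtext("price"), change,
                             inv.findtext("name")))
    return sorted(rows, key=lambda row: row[0])  # order by symbol


for row in winners(PRICES_XML, INVESTORS_XML):
    print(row)
```

As in the XQuery version, a stock held by several investors produces one result per investor.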
Midterm Question 2.b

<losers>
{
    for $inv in doc("investors.xml")//investor
    for $exp in doc("experience.xml")//investor
    for $stk in doc("prices.xml")//stock
    let $change := $stk/change
    where ($inv/name = $exp/name)
      and ($inv/portfolio/symbol = $stk/symbol)
      and ($exp/level != "beginner")
      and ($stk/change < 0)
    return
        <investor>
        {
            ($inv/name, $exp/level, $stk/symbol, $stk/price, $stk/change)
        }
        </investor>
}
</losers>

losers.xql
Midterm Question 3
An application that combines information from the various data sources.
    an online mash-up
    a new web service
Operation: Schedule air travel between cities.
    Web service client code that connects to the airline schedules web service and the weather forecasts web service.
    Need to do data unmarshalling if the web services provide results in XML.
    Use XQuery to query the flight distance and flight duration XML documents.
    Unmarshal the XQuery results and combine with data from the weather forecasts web service to estimate flight duration.
Midterm Question 3
Operation: Allow airline seat selection.
    Use XSLT to convert the airline seating charts from XML to a more display-friendly form, such as an HTML-based web page.
Operation: Book lodging.
    Use Hibernate object-relational mapping to access the lodging database.
    Map hierarchical data:
        Campsite and hotel are subclasses of lodging.
        Fancy, Midpriced, and Budget are subclasses of hotel.
    Use the Criteria API to do the queries.
Project #3 Presentations Next Monday
Present and demo your web services mashups in 15 minutes.
Section 1
    Team INVIKO
    Team Lasers
    + 2 other teams
Section 2
    Team C
    Team Unlimited Data
    + 2 other teams
Data Warehouse
A data warehouse is where a company keeps its important data assets.
Three categories of a company’s data:
    Operational data generated by ongoing business activities.
        Examples: customer orders, shipping and receiving, financial transactions, etc.
    Integrated data from different parts of the business.
        Combine data from disparate corporate applications that weren’t meant to work together, in order to better leverage the data and to improve synchronicity among departments.
    Monitoring data to keep track of how the business is doing and to understand and evaluate business processes.
        Examples: reports, online “dashboards”
Data Warehouse
Data assets come from transforming operational data into integrated data or monitoring data.
    The data warehouse should contain only high-quality data.
The data is periodically extracted from its original data sources, transformed into a higher-quality, more useful form, and then loaded into the data warehouse.
    ETL: extract, transform, and load
Data warehousing is a well-architected information management solution ... that enables analytical and information processing ... while overcoming application, organizational, and other barriers.
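
The ETL cycle described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the operational records, the orders_fact table, and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational records, as they might arrive from an order system.
RAW_ORDERS = [
    {"order_id": "1001", "customer": " alice ", "amount": "19.99", "date": "2013-03-12"},
    {"order_id": "1002", "customer": "BOB", "amount": "n/a", "date": "2013-03-12"},
    {"order_id": "1003", "customer": "Carol", "amount": "5.00", "date": "2013-03-13"},
]


def extract():
    """Extract: pull rows from the operational source (here, a list)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: reject rows with unusable amounts, normalize names."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # keep low-quality data out of the warehouse
        clean.append((int(row["order_id"]),
                      row["customer"].strip().title(),
                      amount,
                      row["date"]))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the (assumed) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_fact ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount REAL, order_date TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO orders_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM orders_fact ORDER BY order_id").fetchall())
```

In practice a job like this would run on a schedule, for example nightly, against real source systems.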
Purpose of a Data Warehouse
Businesses today are deluged by data.
    A company that can make use of its key data assets has a better chance to succeed.
A data warehouse provides the platform, tools, and processes to manage and deliver the key data assets.
    Decision-makers can rely on the analysis of high-quality data and not just on hunches.
Predecessors to data warehousing include “decision-support systems” (DSS) from the mid-1970s.
    DSS evolved into “executive information systems” (EIS) in the mid-1980s.
Data Warehousing Thought Leaders
Bill Inmon
    The “Father of Data Warehousing”
    Wrote the book Building the Data Warehouse in 1993.
    Advocates an approach to data warehousing called the “Corporate Information Factory”.
Ralph Kimball
    Wrote the first edition of The Data Warehouse Toolkit in 1996.
    Promotes dimensional modeling.
        developed the star schema
        dimension tables and fact tables
        denormalized tables
Star Schema Example
Facts: sales data
Dimensions: date, store, product
http://en.wikipedia.org/wiki/Star_schema
Star Schema Example
How many TV sets have been sold, for each brand and country, in 1997?

SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
    INNER JOIN Dim_Date D    ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S   ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.YEAR = 1997
  AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
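
To try the star schema query, the sketch below builds the fact table and the three dimension tables in an in-memory SQLite database and runs the query from Python. Only the table and column names used by the query are taken from the slide; the column types, the sample rows, and the added ORDER BY (for a stable result order) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT,
                          Product_Category TEXT);
-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE Fact_Sales (
    Date_Id    INTEGER REFERENCES Dim_Date(Id),
    Store_Id   INTEGER REFERENCES Dim_Store(Id),
    Product_Id INTEGER REFERENCES Dim_Product(Id),
    Units_Sold INTEGER
);

-- Hypothetical sample rows.
INSERT INTO Dim_Date    VALUES (1, 1997), (2, 1998);
INSERT INTO Dim_Store   VALUES (1, 'USA'), (2, 'Japan');
INSERT INTO Dim_Product VALUES (1, 'BrandA', 'tv'),
                               (2, 'BrandB', 'tv'),
                               (3, 'BrandA', 'radio');
INSERT INTO Fact_Sales  VALUES (1, 1, 1, 10), (1, 1, 1, 5),
                               (1, 2, 2, 7), (2, 1, 1, 99),
                               (1, 1, 3, 42);
""")

QUERY = """
SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
    INNER JOIN Dim_Date D    ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S   ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.Year = 1997
  AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
ORDER BY P.Brand, S.Country
"""

tv_sales_1997 = conn.execute(QUERY).fetchall()
print(tv_sales_1997)
```

Every join goes from the fact table directly to a dimension table, which is what gives the schema its star shape.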
Why Denormalization?
Why do we normalize tables in a standard database?
    Support data updates.
    Prevent update anomalies.
    Disadvantages:
        many tables, many joins
        slow queries
A data warehouse, on the other hand ...
    Requires rapid data access
    Updates occur infrequently
        Example: nightly ETL
Therefore, data warehouses can contain denormalized tables.
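
The trade-off can be seen in a small SQLite sketch. The table names (brand, product, sale, sale_flat) and the rows are hypothetical. The normalized tables need two joins to total units by brand, while the denormalized copy answers the same question from a single table at the cost of repeating the brand name in every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: each brand name is stored once, so an update touches one row.
CREATE TABLE brand   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT,
                      brand_id INTEGER REFERENCES brand(id));
CREATE TABLE sale    (product_id INTEGER REFERENCES product(id),
                      units INTEGER);
INSERT INTO brand   VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO product VALUES (1, 'TV-100', 1), (2, 'TV-200', 2);
INSERT INTO sale    VALUES (1, 3), (1, 4), (2, 5);

-- Denormalized: brand and product names are copied into every sale row.
-- A periodic ETL job would rebuild this table.
CREATE TABLE sale_flat AS
    SELECT b.name AS brand, p.name AS product, s.units
    FROM sale s
        JOIN product p ON s.product_id = p.id
        JOIN brand b   ON p.brand_id = b.id;
""")

# Two joins against the normalized tables.
normalized = conn.execute("""
    SELECT b.name, SUM(s.units)
    FROM sale s
        JOIN product p ON s.product_id = p.id
        JOIN brand b   ON p.brand_id = b.id
    GROUP BY b.name ORDER BY b.name
""").fetchall()

# No joins against the denormalized table.
denormalized = conn.execute("""
    SELECT brand, SUM(units) FROM sale_flat
    GROUP BY brand ORDER BY brand
""").fetchall()

print(normalized)
print(denormalized)
```

Both queries return the same totals; the redundancy in sale_flat is acceptable when updates are rare.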
Related to Data Warehousing
Data mart
    A small data warehouse.
    A subset of (or a view into) a data warehouse for a particular set of users, such as financial analysts.
        (the preferred definition)
Operational data store (ODS)
    A database specifically for integrating data from disparate sources, especially a company’s operational data (e.g., transactional data).
    Perform some operations on the data, such as cleansing and conforming.
    Feed the data into the data warehouse.
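
As a rough sketch of the cleansing and conforming an ODS might perform, the Python example below merges customer records from two source systems that use different field names and spellings. The two sources, their field names, and the lookup table are all hypothetical.

```python
# Hypothetical records from two operational systems that describe
# the same customers in different ways.
CRM_ROWS = [
    {"cust_id": "C-7", "state": "calif."},
    {"cust_id": "C-9", "state": "NY"},
]
BILLING_ROWS = [
    {"customer": "7", "st": "CA"},
    {"customer": "12", "st": " tx "},
]

# Assumed lookup table from free-form spellings to standard codes.
STATE_CODES = {"calif.": "CA", "ca": "CA", "ny": "NY", "tx": "TX"}


def cleanse_state(value):
    """Cleansing: trim and normalize a free-form state value to a code."""
    return STATE_CODES.get(value.strip().lower())


def conform(crm_rows, billing_rows):
    """Conforming: map both sources onto one (customer_id, state) shape."""
    conformed = {}
    for row in crm_rows:
        customer_id = int(row["cust_id"].replace("C-", ""))
        conformed[customer_id] = cleanse_state(row["state"])
    for row in billing_rows:
        # Keep the CRM value when both systems know the customer.
        conformed.setdefault(int(row["customer"]), cleanse_state(row["st"]))
    return sorted(conformed.items())


print(conform(CRM_ROWS, BILLING_ROWS))
```

The conformed rows are what would then be fed into the data warehouse.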
Data Warehousing and Business Intelligence (DW/BI)
Business intelligence is a primary reason to have a data warehouse!
Several levels of BI:
    querying and reporting
    dashboards and scorecards
    business analytics
    data mining
Business Intelligence
Querying and reporting
    “Canned” queries that generate formatted results.
    Generate reports on demand, or automatically and periodically.
    “What happened recently in my company?”
Dashboards and scorecards
    Graphical dashboards contain charts and gauges that monitor a company’s key performance indicators (KPI).
        Examples: transactions/second, number of web hits, etc.
    Scorecards show performance measured against a plan or set of objectives.
    “What is happening right now and how are we doing?”
Business Intelligence
Business analytics
    Online analytical processing (OLAP)
        Do not confuse with an older term, online transaction processing (OLTP).
    Visualize data in a multidimensional manner.
    Analytical processes that involve manipulating data along different dimensions.
    “What happened recently in my company, and why?”
Data mining
    Use statistics to do predictive analysis.
    Discover patterns and relationships in the data.
    Use artificial intelligence (AI) techniques to find answers even if you don’t know the questions.
    “What can happen? What’s interesting?”
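
To illustrate what manipulating data along different dimensions means in OLAP, the Python sketch below sums the same set of sales facts grouped by whichever dimensions the analyst chooses. The facts, the dimension names, and the roll_up helper are assumptions for this example; a real OLAP tool would do this against the warehouse.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, country, brand, units_sold).
FACTS = [
    (1997, "USA", "BrandA", 10),
    (1997, "Japan", "BrandA", 4),
    (1997, "USA", "BrandB", 6),
    (1998, "USA", "BrandA", 8),
]

# Position of each dimension within a fact tuple.
DIMENSIONS = {"year": 0, "country": 1, "brand": 2}


def roll_up(facts, *dims):
    """Sum units_sold grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)


print(roll_up(FACTS, "year"))              # totals per year
print(roll_up(FACTS, "country", "brand"))  # totals per country and brand
```

Choosing fewer dimensions gives a coarser summary, and choosing more gives a finer breakdown of the same facts.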