CS 157B: Database Management Systems II March 13 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron Mak www.cs.sjsu.edu/~mak




Department of Computer Science, Spring 2013: March 13

CS 157B: Database Management Systems II © R. Mak


Midterm Question 1.a

<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:complexType name="investor_type">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="level">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="beginner"/>
                        <xs:enumeration value="intermediate"/>
                        <xs:enumeration value="advanced"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>

    <xs:element name="experience">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="investor" type="investor_type"
                            minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

experience.xsd


Midterm Question 1.b

<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:complexType name="stock_type">
        <xs:sequence>
            <xs:element name="symbol" type="xs:string"/>
            <xs:element name="company" type="xs:string"/>
            <xs:element name="price" type="xs:decimal"/>
            <xs:element name="change" type="xs:decimal"/>
        </xs:sequence>
        <xs:attribute name="exchange">
            <xs:simpleType>
                <xs:restriction base="xs:string">
                    <xs:enumeration value="NASDAQ"/>
                    <xs:enumeration value="NYSE"/>
                </xs:restriction>
            </xs:simpleType>
        </xs:attribute>
    </xs:complexType>

    <xs:element name="prices">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="stock" type="stock_type"
                            minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

prices.xsd
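The constraints that prices.xsd places on a document can also be checked by hand. Python's standard library has no XSD validator (the third-party lxml package provides one as `lxml.etree.XMLSchema`), so the following is only a minimal sketch, assuming the constraint tables below, transcribed by hand from the schema, are kept in sync with prices.xsd. It approximates the schema's rules (child order, xs:decimal values, the exchange enumeration) rather than implementing full XSD validation.

```python
import re
import xml.etree.ElementTree as ET

# Constraints transcribed by hand from prices.xsd (this sketch does not parse the .xsd).
CHILD_ORDER = ["symbol", "company", "price", "change"]   # the xs:sequence
DECIMAL_FIELDS = {"price", "change"}                     # type="xs:decimal"
EXCHANGES = {"NASDAQ", "NYSE"}                           # the xs:enumeration values
# Lexical form of xs:decimal: optional sign, digits, optional fraction, no exponent.
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def check_prices(xml_text):
    """Return a list of violations of the prices.xsd constraints (empty if none)."""
    root = ET.fromstring(xml_text)
    if root.tag != "prices":
        return [f"root is <{root.tag}>, expected <prices>"]
    errors = []
    for i, stock in enumerate(root, start=1):
        if stock.tag != "stock":
            errors.append(f"child {i}: unexpected <{stock.tag}>")
            continue
        tags = [child.tag for child in stock]
        if tags != CHILD_ORDER:
            errors.append(f"stock {i}: children {tags}, expected {CHILD_ORDER}")
            continue
        for child in stock:
            text = (child.text or "").strip()
            if child.tag in DECIMAL_FIELDS and not DECIMAL_RE.fullmatch(text):
                errors.append(f"stock {i}: <{child.tag}> value {text!r} is not a decimal")
        exchange = stock.get("exchange")  # the attribute is optional in the schema
        if exchange is not None and exchange not in EXCHANGES:
            errors.append(f"stock {i}: exchange {exchange!r} is not allowed")
    return errors
```

An empty list means the document satisfied every rule the sketch checks; each string otherwise names the offending stock and the rule it broke.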


Midterm Question 2.a

<winners>
{
    for $stk in doc("prices.xml")//stock
    for $inv in doc("investors.xml")//investor
    let $change := $stk/change
    where ($inv/portfolio/symbol = $stk/symbol)
      and ($change > 0)
    order by $stk/symbol ascending
    return
        <stock>
        {
            ($stk/symbol, $stk/price, $stk/change, $inv/name)
        }
        </stock>
}
</winners>

winners.xql
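To make the FLWOR semantics of winners.xql concrete, here is a rough Python equivalent using xml.etree.ElementTree. It is a sketch, not the course's solution: the layout of investors.xml (an investor element with a name and a portfolio of symbol elements) is assumed from the paths the query uses. The nested loops play the role of the two `for` clauses (a join), the set-membership test mirrors the general comparison `=`, which is true if any portfolio symbol matches, and `float()` mirrors XQuery casting the untyped change value to a number before comparing it with 0.

```python
import xml.etree.ElementTree as ET

def winners(prices_xml, investors_xml):
    """Rows of (symbol, price, change, investor name), as winners.xql builds them."""
    stocks = ET.fromstring(prices_xml).iter("stock")
    investors = list(ET.fromstring(investors_xml).iter("investor"))
    rows = []
    for stk in stocks:                              # for $stk in doc("prices.xml")//stock
        change = float(stk.findtext("change"))      # let $change := $stk/change (as a number)
        for inv in investors:                       # for $inv in doc("investors.xml")//investor
            held = {s.text for s in inv.findall("portfolio/symbol")}
            # where ($inv/portfolio/symbol = $stk/symbol) and ($change > 0)
            if stk.findtext("symbol") in held and change > 0:
                rows.append((stk.findtext("symbol"), stk.findtext("price"),
                             stk.findtext("change"), inv.findtext("name")))
    rows.sort(key=lambda row: row[0])               # order by $stk/symbol ascending
    return rows
```

Each matching (stock, investor) pair yields one row, just as each pair yields one <stock> element in the query's output.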


Midterm Question 2.b

<losers>
{
    for $inv in doc("investors.xml")//investor
    for $exp in doc("experience.xml")//investor
    for $stk in doc("prices.xml")//stock
    let $change := $stk/change
    where ($inv/name = $exp/name)
      and ($inv/portfolio/symbol = $stk/symbol)
      and ($exp/level != "beginner")
      and ($change < 0)
    return
        <investor>
        {
            ($inv/name, $exp/level, $stk/symbol, $stk/price, $stk/change)
        }
        </investor>
}
</losers>

losers.xql


Midterm Question 3

An application that combines information from various data sources:
  an online mash-up
  a new web service

Operation: Schedule air travel between cities.
  Write web service client code that connects to the airline schedules web service and the weather forecasts web service.
  Unmarshal the data if the web services provide their results in XML.
  Use XQuery to query the flight distance and flight duration XML documents.
  Unmarshal the XQuery results and combine them with data from the weather forecasts web service to estimate flight duration.
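Unmarshalling, as used in the answer above, means turning an XML response into program objects. A minimal sketch follows; the <flights> response format, every element and attribute name in it, and the units are hypothetical, since the question does not specify what the airline schedules web service returns.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical response format (not specified by the question):
# <flights>
#   <flight number="UA100"><from>SFO</from><to>JFK</to>
#           <distance>2586</distance><duration>330</duration></flight>
# </flights>

@dataclass
class Flight:
    number: str
    origin: str
    destination: str
    distance_miles: int      # assumed unit
    duration_minutes: int    # assumed unit

def unmarshal_flights(xml_text):
    """Convert the XML text of a <flights> response into a list of Flight objects."""
    root = ET.fromstring(xml_text)
    return [
        Flight(
            number=f.get("number"),
            origin=f.findtext("from"),
            destination=f.findtext("to"),
            distance_miles=int(f.findtext("distance")),
            duration_minutes=int(f.findtext("duration")),
        )
        for f in root.findall("flight")
    ]
```

Once the results are typed objects, combining them with the weather data becomes ordinary program logic instead of string handling.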


Midterm Question 3

Operation: Allow airline seat selection.
  Use XSLT to convert the airline seating charts from XML to a more display-friendly form, such as an HTML-based web page.

Operation: Book lodging.
  Use Hibernate object-relational mapping to access the lodging database.
  Map hierarchical data:
    Campsite and Hotel are subclasses of Lodging.
    Fancy, Midpriced, and Budget are subclasses of Hotel.
  Use the Criteria API to do the queries.
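Hibernate (like JPA in general) can map a class hierarchy in several ways, including one table for the whole hierarchy, a joined table per subclass, or one table per concrete class; the question does not say which to use. The sketch below is not Hibernate code. It is a hypothetical Python and sqlite3 illustration of the single-table strategy: every row carries a discriminator column naming its class, and a query for Hotel also returns Fancy, Midpriced, and Budget rows, loosely mirroring the polymorphic results of a Criteria query on the Hotel class. The table and column names are assumptions.

```python
import sqlite3

# Hypothetical classes for the hierarchy in the question.
class Lodging:
    def __init__(self, name, city):
        self.name = name
        self.city = city

class Campsite(Lodging): pass
class Hotel(Lodging): pass
class Fancy(Hotel): pass
class Midpriced(Hotel): pass
class Budget(Hotel): pass

# Discriminator value stored in the table -> Python class.
CLASSES = {cls.__name__: cls
           for cls in (Lodging, Campsite, Hotel, Fancy, Midpriced, Budget)}

def create_schema(conn):
    # One table for the whole hierarchy; "kind" is the discriminator column.
    conn.execute("CREATE TABLE lodging (id INTEGER PRIMARY KEY, "
                 "kind TEXT NOT NULL, name TEXT, city TEXT)")

def save(conn, obj):
    conn.execute("INSERT INTO lodging (kind, name, city) VALUES (?, ?, ?)",
                 (type(obj).__name__, obj.name, obj.city))

def find(conn, base=Lodging, city=None):
    """Polymorphic query: rows whose class is `base` or any subclass of it,
    optionally restricted by city, rebuilt as instances of the stored class."""
    kinds = [k for k, cls in CLASSES.items() if issubclass(cls, base)]
    sql = ("SELECT kind, name, city FROM lodging WHERE kind IN (%s)"
           % ", ".join("?" * len(kinds)))
    params = list(kinds)
    if city is not None:
        sql += " AND city = ?"
        params.append(city)
    sql += " ORDER BY name"
    return [CLASSES[kind](name, c) for kind, name, c in conn.execute(sql, params)]
```

The single-table layout needs no joins for polymorphic queries; its cost is that subclass-specific columns must be nullable, which is one reason the other strategies exist.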


Project #3 Presentations Next Monday

Present and demo your web services mashups in 15 minutes.

Section 1: Team INVIKO, Team Lasers, and 2 other teams

Section 2: Team C, Team Unlimited Data, and 2 other teams


Data Warehouse

A data warehouse is where a company keeps its important data assets.

Three categories of a company’s data:

Operational data generated by ongoing business activities.
  Examples: customer orders, shipping and receiving, financial transactions, etc.

Integrated data from different parts of the business.
  Combine data from disparate corporate applications that weren’t meant to work together, in order to better leverage the data and to keep departments better in sync.

Monitoring data to keep track of how the business is doing and to understand and evaluate business processes.
  Examples: reports, online “dashboards”


Data Warehouse

Data assets come from transforming operational data into integrated data or monitoring data. The data warehouse should contain only high-quality data.

The data is periodically extracted from its original sources, transformed into a higher-quality, more useful form, and then loaded into the data warehouse.
  ETL: extract, transform, and load

Data warehousing is a well-architected information management solution ... that enables analytical and information processing ... while overcoming application, organizational, and other barriers.
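The ETL cycle described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumed names: the source table orders, the warehouse table fact_orders, and the cleansing rules (trim and upper-case customer names, drop rows whose amount cannot be parsed or whose customer is blank) are all hypothetical.

```python
import sqlite3

def etl(source, warehouse):
    """One ETL pass: extract raw orders, transform them, load the clean rows."""
    # Extract: read the raw operational rows.
    raw = source.execute("SELECT order_id, customer, amount FROM orders").fetchall()

    # Transform: normalize names, parse amounts, and drop rows that cannot be
    # repaired, so that only high-quality data reaches the warehouse.
    clean = []
    for order_id, customer, amount in raw:
        try:
            value = round(float(amount), 2)
        except (TypeError, ValueError):
            continue
        if not customer or not customer.strip():
            continue
        clean.append((order_id, customer.strip().upper(), value))

    # Load: INSERT OR REPLACE keyed on order_id, so rerunning the job is safe.
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders (order_id, customer, amount) VALUES (?, ?, ?)",
        clean)
    warehouse.commit()
    return len(clean)
```

Because the load replaces rows by primary key, assuming fact_orders declares order_id as its primary key, running the periodic job twice over the same source data does not duplicate warehouse rows.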


Purpose of a Data Warehouse

Businesses today are deluged by data. A company that can make use of its key data assets has a better chance to succeed.

A data warehouse provides the platform, tools, and processes to manage and deliver the key data assets.

Decision-makers can rely on the analysis of high-quality data and not just on hunches.

Predecessors to data warehousing include “decision-support systems” (DSS) from the mid-1970s. DSS evolved into “executive information systems” (EIS) in the mid-1980s.


Data Warehousing Thought Leaders

Bill Inmon
  The “Father of Data Warehousing”
  Wrote the book Building the Data Warehouse in 1993.
  Advocates an approach to data warehousing called the “Corporate Information Factory”.

Ralph Kimball
  Wrote the first edition of The Data Warehouse Toolkit in 1996.
  Promotes dimensional modeling.
  Developed the star schema: dimension tables and fact tables (denormalized tables).


Star Schema Example

Facts: sales data
Dimensions: date, store, product

http://en.wikipedia.org/wiki/Star_schema


Star Schema Example

How many TV sets have been sold, for each brand and country, in 1997?

SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.YEAR = 1997
  AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
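To try the star-schema query, the following sketch builds the four tables it references in an in-memory SQLite database and loads a few made-up rows. The column lists are only the minimum the query needs (the real dimension tables would carry many more attributes), and the ORDER BY clause is an addition so that the output order is predictable.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);
CREATE TABLE Fact_Sales  (Date_Id    INTEGER REFERENCES Dim_Date(Id),
                          Store_Id   INTEGER REFERENCES Dim_Store(Id),
                          Product_Id INTEGER REFERENCES Dim_Product(Id),
                          Units_Sold INTEGER);
"""

QUERY = """
SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D    ON F.Date_Id = D.Id
INNER JOIN Dim_Store S   ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.YEAR = 1997 AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
ORDER BY P.Brand, S.Country
"""

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO Dim_Date VALUES (?, ?)", [(1, 1997), (2, 1998)])
    conn.executemany("INSERT INTO Dim_Store VALUES (?, ?)", [(1, "USA"), (2, "Japan")])
    conn.executemany("INSERT INTO Dim_Product VALUES (?, ?, ?)",
                     [(1, "Sony", "tv"), (2, "Philips", "tv"), (3, "Sony", "radio")])
    conn.executemany("INSERT INTO Fact_Sales VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 10), (1, 1, 1, 5),  # Sony TVs sold in the USA in 1997
        (1, 2, 2, 7),                 # Philips TVs sold in Japan in 1997
        (2, 1, 1, 99),                # 1998: filtered out by the WHERE clause
        (1, 1, 3, 4),                 # a radio: filtered out by the WHERE clause
    ])
    return conn

def tv_sales_1997(conn):
    return conn.execute(QUERY).fetchall()
```

Calling `tv_sales_1997(build_demo())` returns `[('Philips', 'Japan', 7), ('Sony', 'USA', 15)]`: the 1998 sale and the radio sale are filtered out, and the two Sony rows are summed into one group.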


Why Denormalization?

Why do we normalize tables in a standard database?
  To support data updates.
  To prevent update anomalies.
  Disadvantages: many tables, many joins, slow queries.

A data warehouse, on the other hand ...
  Requires rapid data access.
  Updates occur infrequently (example: a nightly ETL).

Therefore, data warehouses can contain denormalized tables.
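The trade-off can be seen in a small sqlite3 sketch with hypothetical tables: a normalized category/product/sale design is flattened once, as a nightly load might do, into a wide sale_wide table that answers the report without any joins.

```python
import sqlite3

# Hypothetical normalized (OLTP-style) tables.
NORMALIZED = """
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT,
                       category_id INTEGER REFERENCES category(id));
CREATE TABLE sale     (id INTEGER PRIMARY KEY, qty INTEGER,
                       product_id INTEGER REFERENCES product(id));
"""

def denormalize(conn):
    """Run once per load: pay for the joins here, not at query time."""
    conn.executescript("""
        DROP TABLE IF EXISTS sale_wide;
        CREATE TABLE sale_wide AS
        SELECT s.id AS sale_id, s.qty, p.name AS product, c.name AS category
        FROM sale s
        JOIN product p  ON s.product_id = p.id
        JOIN category c ON p.category_id = c.id;
    """)

def units_by_category(conn):
    # Query time: a single-table scan, no joins.
    return conn.execute(
        "SELECT category, SUM(qty) FROM sale_wide GROUP BY category ORDER BY category"
    ).fetchall()
```

The price is redundancy: each category name is copied into every sale row, so renaming a category means rebuilding or updating sale_wide. That is the update anomaly normalization prevents, and it is tolerable here only because warehouse updates are infrequent.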


Related to Data Warehousing

Data mart
  A small data warehouse.
  A subset of (or a view into) a data warehouse for a particular set of users, such as financial analysts. This is the preferred definition.

Operational data store (ODS)
  A database specifically for integrating data from disparate sources, especially a company’s operational data (e.g., transactional data).
  Perform some operations on the data, such as cleansing and conforming.
  Feed the data into the data warehouse.


Data Warehousing and Business Intelligence (DW/BI)

Business intelligence is a primary reason to have a data warehouse!

Several levels of BI:
  querying and reporting
  dashboards and scorecards
  business analytics
  data mining


Business Intelligence

Querying and reporting
  “Canned” queries that generate formatted results.
  Generate reports on demand, or automatically and periodically.
  “What happened recently in my company?”

Dashboards and scorecards
  Graphical dashboards contain charts and gauges that monitor a company’s key performance indicators (KPIs).
    Examples: transactions/second, number of web hits, etc.
  Scorecards show performance measured against a plan or set of objectives.
  “What is happening right now and how are we doing?”


Business Intelligence

Business analysis
  Online analytical processing (OLAP)
    Do not confuse with an older term, online transaction processing (OLTP).
  Visualize data in a multidimensional manner.
  Analytical processes that involve manipulating data along different dimensions.
  “What happened recently in my company, and why?”

Data mining
  Use statistics to do predictive analysis.
  Discover patterns and relationships in the data.
  Use artificial intelligence (AI) techniques to find answers even if you don’t know the questions.
  “What can happen? What’s interesting?”
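Manipulating data along different dimensions, as OLAP does, can be approximated with SQL GROUP BY. This sqlite3 sketch over a hypothetical sales table shows a roll-up (grouping by fewer dimensions), a drill-down (grouping by more), and a slice (fixing one dimension to a single value). Dedicated OLAP tools typically precompute or cache such aggregates; this only illustrates the operations themselves.

```python
import sqlite3

# Hypothetical cube: one measure (units) and three dimensions.
DIMENSIONS = ("year", "region", "product")

def create_sales(conn):
    conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, "
                 "product TEXT, units INTEGER)")

def aggregate(conn, dims, slice_on=None):
    """Total units grouped by `dims`. Fewer dims = roll up, more = drill down.
    slice_on=(dim, value) first fixes one dimension to a single value (a slice)."""
    used = list(dims) + ([slice_on[0]] if slice_on else [])
    for d in used:
        if d not in DIMENSIONS:  # column names cannot be bound as ? parameters
            raise ValueError(f"unknown dimension: {d}")
    cols = ", ".join(dims)
    sql = "SELECT " + (cols + ", " if dims else "") + "SUM(units) FROM sales"
    params = []
    if slice_on:
        sql += f" WHERE {slice_on[0]} = ?"
        params.append(slice_on[1])
    if dims:
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql, params).fetchall()
```

For example, `aggregate(conn, ["year"])` gives yearly totals, `aggregate(conn, ["year", "region"])` drills down into regions within each year, and `aggregate(conn, ["region"], slice_on=("product", "tv"))` looks only at TV sales.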