
Center for African Studies: The MasterCard Foundation Scholars Program Database Project Final Presentation

Presented by Team 3: Min Lin, Minchao Lin, Tian Liu, Yifai Ng, Haochen Shan, Yigang Wang, Guo Yu, Ying Yang, Ye Zhong

Upload: beck-stevens

Posted on 04-Jan-2016


TRANSCRIPT

Page 1:

Center for African Studies
The MasterCard Foundation Scholars Program
Database Project Final Presentation

Presented by Team 3
Min Lin, Minchao Lin, Tian Liu, Yifai Ng, Haochen Shan, Yigang Wang, Guo Yu, Ying Yang, Ye Zhong

Page 2:

Client Overview

• UC Berkeley Center for African Studies (CAS)

• The MasterCard Foundation Program
  • Provides comprehensive support for education
  • For economically disadvantaged students from developing countries in Africa
  • Financial, academic, social, and career counseling
  • 2012-2020, 25 new students per year

• Client Need
  • A new database to help manage the program better
  • Specialized in tracking financial transactions
  • Also stores data and information related to the program: financial, academic, social, and career

Page 3:

Project Review

DP I

• Became familiar with the clients and understood their needs

• Created the first version of a simplified EER diagram

• Set up a schedule for the project

DP II

• Expanded and revised the EER diagram

• Conceptualized 5 queries

• Developed the relational schema

DP III

• Finalized the EER diagram

• Created the database in Access and implemented the relations in Access

• Revised the queries and developed SQL code

Final Presentation

• Implemented the queries in Access

• Modified the relational schema

• Performed normalization analysis

Page 4:

EER Diagram

Page 5:

Access Relationships

Page 6:

Relational Schema

1. Person (PID, Lname, Fname, MI, Birth_Date, Nationality, Gender, Primary_Phone_No, Address)

1a. Alumni (Alumni_PID1, Occupation, Company, Degree, Graduation_Date, Admission_Date)

1b. Student (Student_PID1, SID)

1bb. Pre_Student (Student_PID1, Financial_Background, SAT_Score, TOFEL_Score, High_School_GPA, No_HighSchool_Honor, No_HighSchool_Awards, HID9)

1bc. Student_current_status (SID, Mentor_Employee_PID1, Class_Level, Expected_Graduation_Date, App_Date, College_GPA)

1c. Employee (Employee_PID1, Position)

2. TransitionAccount (TID, Amount, Time, Year, Month, Date, PC_ID3)

2a. Withdrawal (W_TID2)

2aa. Check (Check_TID2, Description, Responsible_Person, Account_No)

2aaa. Account_Detail (Account_No, Routing_No)

2ab. Bearbuy (Bear_TID2)

2ac. Cash (C_TID2, Responsible_Person)

2ad. BlueCard (BC_TID2)

2b. Deposit (D_TID2)

3. Personal_Card (PC_ID, Year, Month, Day, Time, Amount, Reimbursed, Employee_PID1)

4. Expense (EID, Amount, Year, Month, Day, Description, PMID5)

4a. InternalExpense (I_EID4)

4ab. Tuition (T_EID4, Semester, Degree_Class)

4ac. CourseMaterial (C_EID4, Course_Name, ISBN)

4b. ExternalExpense (E_EID4, OID5)

4ba. Summer_Winter_Housing (SWH_EID4, Address, Start_Date, Finish_Date)

4baa. Summer_Winter_Housing_Timeline (Start_Date, Finish_Date, Duration)

4bab. Housing_Information (Address, Landlord, Agent, Contact_Phone_No)

4bb. External_Course_Material (EC_EID4, Course_Name, ISBN)

4bc. SEVIS_Fee (S_EID4)

4bd. Travel_Airline_Ticket (TAT_EID4, Flight_No, Airline_Name)

4bda. Ticket_Detail (Flight_No, Airline_Name, Date, Departure_Location, Destination, Class)

4be. Office_Supplies (O_EID4, Supply_Name, Quantity, Lead_Time, Price, Discount)

5. Order (OID, Total_Amount, Description, Product_Name, Quantity, Price, Date, Req_Student_PID1)

5a. OnlineOrder (OOID5, Tracking_No, Discount, OSID)

5b. NormalOrder (NOID5)

6. Item (NOID5, IID, Iname, Description)

7. OnlineShop (OSID7, Website, Shipping_Method)

7a. Website_Email_Address (Website, Supplier_Email_Address, Description)

Page 7:

Relational Schema (Continued)

8. Course (CID, Semester, Professor, Final_Date, Description, Ctitle)

8a. GE (GE_CID8, Category)

8b. LowerDivision (L_CID8)

8c. UpperDivision (U_CID8)

8d. Sections (CID, Section_Number)

8e. Ctitle (CID, Ctitle)

9. High_School (HID, Name, School_Type, Year_Founded, Religious_Affiliation, Academic_Calendar, Setting, Student_Faculty_Ratio, Total_Enrollment_No, Gender_Ratio, College_Enrollment_Rate)

10. Job (JID, Position, No_Employee, Start_Date, Finish_date, Work_On_Alumni_PID1, Work_on_Student_PID1)

10a. Job_Detail (Position, Job_function, Location, Salary, Major_Req, Skill_Need, Description, Hour_Per_Week, Language_Need, Visa_Req)

10b. Job_Timeline (Start_Date, Finish_date, Duration)

10c. Full_Time (F_JID10, Degree_Level, Exp_Req, Pre_I_JID10c)

10d. Part_Time (P_JID10, Degree_Level, Exp_Req)

10e. Internship (I_JID10, Class_Level, Referrer)

11. Company (Company_ID, Company_Name, Description, Company_Type, Contact_Phone_No, Size, Industry_Type, Email_Address, Website, City, Street, Zip_Code)

12. Country (Name, No_Student)

13. Event (Event_ID, title, Start_Date, Finish_date, Duration, No_People, Sponsor, City)

13a. CAS_Event (CAS_Event_ID13, Description, Cost)

13b. Other_Event (O_Event_ID13, Description)

13ba. Recruiting_Event (R_Event_ID13, Industry_Type, Job_Class, Major_Preferred)

14. Grade (Student_PID1, CID8, Grade_Option, Grade, Semester)

15. Survey (Survey_ID, title, date, description)

15a. Alumni_Survey (A_Survey_ID15, Alumni_PID1)

15b. Student_Survey (S_Survey_ID15, Student_PID1)

N to N Relationships:

16. Withdrawal_Pay_Expense (W_TID2, EID4)

17. Employee_Check_Inventory (Employee_PID1, OID5)

18. Student_Take_Course (Student_PID1, CID8, Grade, Grade_Option, Semesters)

18a. Ctitle (CID8, Ctitle)

19. Student_Participate_Event (Student_PID1, Event_ID13)

20. Company_Provide_Job (Company_ID11, JID10)

21. Company_Participate_Recruiting_Event (Company_ID11, Recruiting_Event_ID13)

22. Company_Located_Country (Company_ID11, Country_Name12)

23. Event_Located_Country (Event_ID13, Country_Name12)

Multivalued attributes:

24. Person_Email (PID1, E-mail_Address)

25. Alumni_Major (Alumni_PID1a, Major)

26. Student_Major (Student_PID1b, Major)

27. Course_Midterm_Date (CID8, Year, Month, Day)

Page 8:

NORMALIZATION ANALYSIS

Page 9:

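Decomposing to 1NF and 2NF

Course (CID, Semester, Professor, Final_Date, Section_Number, Description, Ctitle)

To 1NF: Course (CID, Semester, Professor, Final_Date, Description, Ctitle) and Sections (CID, Section_Number). A course can have several sections, so the repeating Section_Number moves to its own relation.

Student_Take_Course (Student_PID, CID, Grade, Grade_Option, Semester, Ctitle)

To 2NF: Student_Take_Course (Student_PID, CID, Grade, Grade_Option, Semesters) and CourseTitle (CID, Ctitle). Ctitle depends on CID alone, which is only part of the key, so it moves to a relation keyed by CID.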

Page 10:

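Decomposing to 3NF

Check (Check_TID2, Description, Responsible_Person, Account_No, Routing_No)

To 3NF: Check (Check_TID2, Description, Responsible_Person, Account_No) and Account_Detail (Account_No, Routing_No). Routing_No depends on Account_No rather than directly on the key Check_TID2 (a transitive dependency), so it moves to its own relation keyed by Account_No.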
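A functional dependency X → Y holds in a table instance when no two rows agree on X but differ on Y. The following is a minimal sketch of how the dependency Account_No → Routing_No behind this decomposition could be spot-checked against sample rows. Rows are modeled as column-name-to-value maps, and the class and method names are hypothetical. A check like this can refute a dependency on real data, but it cannot prove the dependency holds in general.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: tests whether the functional dependency lhs -> rhs holds in a set of sample rows. */
final class FdCheck {
    private FdCheck() {}

    static boolean holds(List<Map<String, String>> rows, List<String> lhs, List<String> rhs) {
        // Remembers, for each LHS value combination, the RHS combination first seen with it.
        Map<List<String>, List<String>> seen = new HashMap<>();
        for (Map<String, String> row : rows) {
            List<String> rhsValues = project(row, rhs);
            List<String> previous = seen.putIfAbsent(project(row, lhs), rhsValues);
            if (previous != null && !previous.equals(rhsValues)) {
                return false; // same LHS values, different RHS values: dependency violated
            }
        }
        return true;
    }

    // Extracts the values of the given columns, in order.
    private static List<String> project(Map<String, String> row, List<String> columns) {
        List<String> values = new ArrayList<>();
        for (String column : columns) {
            values.add(row.get(column));
        }
        return values;
    }
}
```

For example, `FdCheck.holds(rows, List.of("Account_No"), List.of("Routing_No"))` returns false as soon as one account number appears with two different routing numbers.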

Page 11:

Decomposing to BCNF

Course (CID, Semester, Professor, Final_Date, Description, Ctitle)

To BCNF: Course (CID, Semester, Professor, Final_Date, Description) and Ctitle (CID, Ctitle)

Page 12:

QUERIES

Page 13:

Query 1: Demand Forecasting and EOQ

Object: Forecast students' demand for each type of order from historical records, then use the forecast to determine the best order quantity and the time interval for placing orders with online shops.

Benefits:
1. Understand students' needs for each type of supply in a timely way.
2. Predict the order period and quantity so that CAS can plan ahead.

Page 14:

Query 1: Demand Forecasting - SQL

SELECT [Order].ProductName AS Product, Sum([Order].Quantity) AS SepTotalQuantity
FROM [Order]
WHERE Year([Order].[Date]) = 2013 AND Month([Order].[Date]) = 8
GROUP BY [Order].ProductName;

Step 1: Extract the data from Access: get the order quantity of each product over a period.

Sample output: a 3×3 table with every product's order quantity for a specific period of a specific year

Page 15:

Query 1: Demand Forecasting - Process

Step 2: Calculate the seasonal factors and the monthly demand forecast with a calculator program written in Java.

Step 3: Feed the results back into SQL to apply the economic order quantity (EOQ) model.

Page 16:

Query 1: Demand Forecasting - Process

Part of Java code
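The slide's Java code is not reproduced in this transcript. As an illustration only, here is a minimal sketch of one common way to compute seasonal factors, the simple-average method. Each month's average demand is divided by the overall monthly average, and the forecast for a month is an expected average monthly demand scaled by that month's factor. The class name, method names, and the choice of method are assumptions, not the team's program.

```java
/** Sketch: simple-average seasonal factors and a seasonal monthly forecast. */
final class SeasonalForecast {
    private SeasonalForecast() {}

    /**
     * history[y][m] = quantity ordered in month m of year y (every year has the same number of months).
     * Returns one factor per month; a factor above 1 marks a busier-than-average month.
     */
    static double[] seasonalFactors(double[][] history) {
        int years = history.length;
        int months = history[0].length;
        double[] monthMeans = new double[months];
        double total = 0;
        for (double[] year : history) {
            for (int m = 0; m < months; m++) {
                monthMeans[m] += year[m] / years; // accumulates the mean of month m across years
                total += year[m];
            }
        }
        double overallMean = total / (years * months); // assumed non-zero
        double[] factors = new double[months];
        for (int m = 0; m < months; m++) {
            factors[m] = monthMeans[m] / overallMean;
        }
        return factors;
    }

    /** Forecast for month m = expected average monthly demand * seasonal factor of month m. */
    static double[] forecast(double averageMonthlyDemand, double[] factors) {
        double[] result = new double[factors.length];
        for (int m = 0; m < factors.length; m++) {
            result[m] = averageMonthlyDemand * factors[m];
        }
        return result;
    }
}
```

More elaborate variants (for example, factors computed against a trend line or a moving average) follow the same pattern. They differ only in how the baseline demand is estimated.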

Page 17:

Query 1: EOQ - SQL

SELECT DISTINCT [Order].ProductName,
  IIf(Item.Quantity=0, "Yes", "No") AS StockOut,
  Round(Sqr(2*OnlineOrder.ShippingFee*OnlineOrder.MonthlyDemand/(0.1*[Order].Price))) AS OptimalOrderQ,
  IIf(OnlineOrder.ShippingTime>0, Round(OnlineOrder.ShippingTime*OnlineOrder.MonthlyDemand/(0.1*[Order].Price)), 0) AS ReorderPointQuantity,
  Round((Sqr(2*OnlineOrder.ShippingFee*OnlineOrder.MonthlyDemand/(0.1*[Order].Price))/10)*30) AS OrderCycleDays,
  OnlineShop.Website AS Website,
  [Order].[Date] AS OrderDate,
  [Order].[Date] + Round((Sqr(2*OnlineOrder.ShippingFee*OnlineOrder.MonthlyDemand/(0.1*[Order].Price))/10)*30) AS NextOrderDate
FROM [Order], OnlineOrder, OnlineShop, Item
WHERE [Order].OID=OnlineOrder.OID AND OnlineOrder.OSID=OnlineShop.OSID AND Item.OID=[Order].OID
ORDER BY [Order].[Date];

• Check whether the item is out of stock
• Calculate the optimal order quantity
• Calculate the reorder point
• Calculate the order cycle in days
• Calculate the next order date
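The standard EOQ relations behind these columns can be sketched as follows. The sketch assumes D is monthly demand, S is the cost per order (the SQL uses ShippingFee), H is the holding cost per unit per month (the SQL uses 10% of the price), and the shipping lead time is in days. It uses the textbook forms, which differ from the slide's SQL in two places. The textbook reorder point is lead-time demand, whereas the slide's ReorderPointQuantity divides by the holding cost. The textbook order cycle is Q/D, whereas the slide's OrderCycleDays divides by a constant 10. Readers reusing the SQL may want to check those two expressions. The class and method names are hypothetical.

```java
/** Sketch of the textbook EOQ relations; all rates use the same time unit (here: months). */
final class EoqSketch {
    private EoqSketch() {}

    static final double DAYS_PER_MONTH = 30.0; // same 30-day convention as the slide's SQL

    /** EOQ = sqrt(2 * D * S / H). */
    static double optimalOrderQuantity(double monthlyDemand, double costPerOrder, double holdingCostPerUnitMonth) {
        return Math.sqrt(2.0 * monthlyDemand * costPerOrder / holdingCostPerUnitMonth);
    }

    /** Reorder point = demand expected during the shipping lead time. */
    static double reorderPoint(double monthlyDemand, double leadTimeDays) {
        return monthlyDemand / DAYS_PER_MONTH * leadTimeDays;
    }

    /** Days between orders = Q / D months, converted to days. */
    static double orderCycleDays(double orderQuantity, double monthlyDemand) {
        return orderQuantity / monthlyDemand * DAYS_PER_MONTH;
    }
}
```

For instance, with D = 100 units per month, S = 25, and an item priced at 20 (so H = 0.1 × 20 = 2), the EOQ is √(2·100·25/2) = 50 units, ordered every 15 days.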

Page 18:

Query 1: EOQ - Output

Page 19:

Query 2: Academic Performance

Object: Which factors affect students' academic performance (i.e., GPA), and to what extent? Candidate factors include classes, parents' education level, travel expenses, course material fees, etc.

Benefits:
1. Understand the factors that may affect students' performance.
2. Allocate funds wisely according to the analysis.
3. Help students succeed by planning events accordingly.

Page 20:

Query 2: Academic Performance - SQL

SELECT Student.SID,
  1.714 + 0.589*A.Indicator - 0.00632*IIf(Student.Degree="Undergraduate",1,0) + 0.0165*IIf(Student.Gender="Male",1,0) + 0.000644*Student.SATScore - 0.0147*B.NumberEventAttend + 0.0528*IIf(Student.Research="Yes",1,0) AS ExpGPA
FROM Student,
  (SELECT Student.SID, IIf(Count(Internship.JID)>0,1,0) AS Indicator
   FROM Student, Job, Internship
   WHERE Student.PID=Job.PID AND Job.JID=Internship.JID
   GROUP BY Student.SID
   UNION
   SELECT Student.SID, 0
   FROM Student
   WHERE Student.PID NOT IN (SELECT Student.PID FROM Student, Job, Internship WHERE Student.PID=Job.PID AND Job.JID=Internship.JID)) AS A,

Select the relevant variables and combine them with the coefficients of the fitted linear regression model to obtain the expected GPA

Determine whether a student has ever done an internship (binary variable), defined as table A

Page 21:

Query 2: Academic Performance - SQL (Cont.)

  (SELECT Student.SID, Count(StudentParticipateEvent.EventID) AS NumberEventAttend
   FROM Student, StudentParticipateEvent
   WHERE Student.PID=StudentParticipateEvent.PID
   GROUP BY Student.SID
   UNION
   SELECT Student.SID, 0
   FROM Student
   WHERE Student.PID NOT IN (SELECT StudentParticipateEvent.PID FROM StudentParticipateEvent)) AS B
WHERE A.SID=Student.SID AND B.SID=Student.SID;

Determine how many events a student has attended (numerical value), defined as table B

Step 2: Run a linear regression on all candidate variables, then use the Akaike Information Criterion (AIC) to reduce it to the most parsimonious model.

Implemented with R
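Written out as an equation, the SELECT list of the Query 2 SQL evaluates the fitted linear model below. Here I_intern is table A's Indicator and N_events is table B's NumberEventAttend. Each other I is 1 when the corresponding IIf condition holds and 0 otherwise. The coefficients are the slide's; the symbol names are illustrative.

```latex
\widehat{\mathrm{GPA}} = 1.714 + 0.589\, I_{\mathrm{intern}} - 0.00632\, I_{\mathrm{undergrad}} + 0.0165\, I_{\mathrm{male}} + 0.000644\, \mathrm{SAT} - 0.0147\, N_{\mathrm{events}} + 0.0528\, I_{\mathrm{research}}
```

Consider a hypothetical male undergraduate with an internship, an SAT score of 2000, three events attended, and research experience. His predicted GPA is 1.714 + 0.589 - 0.00632 + 0.0165 + 1.288 - 0.0441 + 0.0528 ≈ 3.61.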

Page 22:

Query 3: High School Comparison

Object: Is there a significant difference between two high schools in each aspect?

Benefits:
1. Compare the quality of two high schools based on the records of admitted students.
2. Support decision-making when comparing applicants with similar qualifications, from the perspective of their high schools' strengths.

Page 23:

Query 3: High School Comparison - SQL

SELECT Student.PID, Student.HID, Student.CollegeGPA, Student.SATScore, Student.TOFELScore, Student.[HighSchoolEvents#], Student.[HighSchoolAward#]
FROM Student
WHERE Student.HID IN (1, 2);

Step 1: Extract the data from Access with SQL

Step 2: Calculate the mean of each category for all admitted students from the two high schools, then use a t-test with unequal variances (Welch's t-test) to get the p-value

Implemented with Excel
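Welch's t-test compares two means without assuming equal variances. It computes t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂) and uses the Welch-Satterthwaite approximation for the degrees of freedom. Below is a minimal sketch of the statistic; the class and method names are hypothetical. Turning t and df into a p-value requires the Student t distribution. Excel's T.TEST(array1, array2, 2, 3) does this for the two-tailed, unequal-variance case.

```java
/** Sketch: Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom. */
final class WelchTTest {
    private WelchTTest() {}

    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] x) {
        double m = mean(x);
        double ss = 0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    /** Returns {t, degreesOfFreedom} for the difference mean(a) - mean(b). */
    static double[] welch(double[] a, double[] b) {
        double va = sampleVariance(a) / a.length; // squared standard error of mean(a)
        double vb = sampleVariance(b) / b.length; // squared standard error of mean(b)
        double t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
        double df = (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
        return new double[] {t, df};
    }
}
```

The sketch would be applied once per category (college GPA, SAT score, and so on), comparing the admitted students from high school 1 with those from high school 2.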

Page 24:

Query 3: High School Comparison - Output

Step 3: Use the Holm-Bonferroni method to determine whether each difference is significant.

Implemented with MATLAB.

[corrected_p, h] = bonf_holm([0.38 0.414 0.0513 0.334 0.257], 0.5)
corrected_p = 1.0020 0.7600 0.2565 1.0280 1.0280
h = 0 0 1 0 0

Step 4: Sample output from MATLAB
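The Holm-Bonferroni step-down procedure works in four steps. It sorts the m p-values, multiplies the k-th smallest by (m - k + 1), and forces the adjusted values never to decrease. It then rejects every hypothesis whose adjusted p-value is at most α. Below is a minimal sketch with hypothetical names. bonf_holm is not a built-in MATLAB function (it appears to be user-contributed), so its output format may differ. Run on the five p-values above with α = 0.5, this sketch also rejects only the third hypothesis; with the more conventional α = 0.05 it rejects none. The sketch caps adjusted values at 1, so its numbers need not match the MATLAB printout digit for digit.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch of the Holm-Bonferroni step-down procedure. */
final class HolmBonferroni {
    private HolmBonferroni() {}

    /** Holm-adjusted p-values (non-decreasing in rank, capped at 1), in the original order. */
    static double[] adjust(double[] p) {
        int m = p.length;
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> p[i])); // indices by ascending p-value
        double[] adjusted = new double[m];
        double runningMax = 0;
        for (int rank = 0; rank < m; rank++) {
            int i = order[rank];
            double scaled = (m - rank) * p[i];         // k-th smallest times (m - k + 1)
            runningMax = Math.max(runningMax, scaled); // keep adjusted values non-decreasing
            adjusted[i] = Math.min(1.0, runningMax);
        }
        return adjusted;
    }

    /** Hypothesis i is rejected at family-wise level alpha when its adjusted p-value is <= alpha. */
    static boolean[] reject(double[] p, double alpha) {
        double[] adjusted = adjust(p);
        boolean[] h = new boolean[p.length];
        for (int i = 0; i < p.length; i++) h[i] = adjusted[i] <= alpha;
        return h;
    }
}
```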

Page 25:

Query 4: Category Expense

Object: What is the distribution of program-paid spending across the four major categories of external expenses (i.e., course material, travel, office supplies, and housing)? Generate distribution charts and use statistical tools to analyze these distributions.

Benefits:
1. Track expenses related to students.
2. Decide the expense constraints for students in each category.
3. Generate a clear expense report that could be included in the annual report for the MasterCard Foundation.

Page 26:

Query 4: Category Expense - SQL

Step 1: Find the total expense of each student (this example uses the office supply category)

SELECT Student.PID, Sum(Expense.Amount) AS AmountOfSum
FROM (OfficeSupply INNER JOIN Expense ON OfficeSupply.EID = Expense.EID) INNER JOIN Student ON Expense.PID = Student.PID
GROUP BY Student.PID
ORDER BY Student.PID;

Step 2: Generate the data for the histogram

SELECT Count([OfficeSupply Query].PID) AS CountOfPID, Partition([AmountOfSum],0,1100,100) AS Expr1
FROM [OfficeSupply Query]
GROUP BY Partition([AmountOfSum],0,1100,100);
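Access's Partition(AmountOfSum, 0, 1100, 100) groups each student's total into 100-wide ranges so that the counts can be charted. Below is a minimal sketch of the same bucketing idea in Java, with hypothetical names. It is a simplification: it folds Access's top and overflow ranges into one bucket keyed by the stop value, and it uses numeric keys instead of Access's range strings.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Sketch: fixed-width histogram bucketing, loosely mirroring Access's Partition function. */
final class ExpenseHistogram {
    private ExpenseHistogram() {}

    /**
     * Counts amounts per bucket of the given width, keyed by each bucket's lower bound.
     * Amounts at or above stop all fall into a single overflow bucket keyed by stop.
     */
    static SortedMap<Integer, Integer> bucketCounts(double[] amounts, int start, int stop, int width) {
        SortedMap<Integer, Integer> counts = new TreeMap<>();
        for (double amount : amounts) {
            int lower = amount >= stop
                    ? stop
                    : start + (int) Math.floor((amount - start) / width) * width;
            counts.merge(lower, 1, Integer::sum); // increment this bucket's count
        }
        return counts;
    }
}
```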

Step 3: Use the Report function to generate graphs, and use the Access toolbox to compute the mean and standard deviation of the distribution.

Step 4: If the distribution is roughly bell-shaped in most cases, we could use the 68-95-99.7 rule, also known as the three-sigma rule, to set expense constraints for students.
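Steps 3 and 4 can be sketched as follows, with hypothetical names. The sketch computes the mean and sample standard deviation of per-student totals. It then checks what fraction of students actually fall within k standard deviations of the mean. For a roughly normal distribution, that fraction is about 68%, 95%, and 99.7% for k = 1, 2, and 3. Finally, it derives a mean + k·SD cap. Which k to use as an expense constraint is a policy choice the slides leave open.

```java
/** Sketch: mean, sample standard deviation, empirical coverage, and a mean + k*SD cap. */
final class ThreeSigma {
    private ThreeSigma() {}

    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double stdDev(double[] x) {
        double m = mean(x);
        double ss = 0;
        for (double v : x) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (x.length - 1));
    }

    /** Fraction of values within mean +/- k*SD; about 0.68, 0.95, 0.997 for k = 1, 2, 3 if near-normal. */
    static double fractionWithin(double[] x, double k) {
        double m = mean(x);
        double limit = k * stdDev(x);
        int inside = 0;
        for (double v : x) {
            if (Math.abs(v - m) <= limit) inside++;
        }
        return inside / (double) x.length;
    }

    /** A possible per-student expense cap: mean + k*SD. */
    static double upperLimit(double[] x, double k) {
        return mean(x) + k * stdDev(x);
    }
}
```

Comparing fractionWithin against 0.68, 0.95, and 0.997 is a quick way to judge whether the bell-shape assumption in Step 4 is reasonable before relying on the cap.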

Page 27:

Query 4: Category Expense - Output

Results after the second step: count the number of PIDs to generate a histogram

Query results after the first step, and the toolbox that can be used to calculate the average and variance quickly

Page 28:

Query 4: Category Expense - Output

Expense Summary Report (Based on Sample Data)

Page 29:

Query 5: Alumni & Employment

Object: What are the relationships among an alumnus's employment status, GPA, education level, number of events attended, etc.?

Benefits:
1. Based on the analysis results, help current students find jobs.
2. Find the most important factor that affects a student's future employment.

Page 30:

Query 5: Alumni & Employment - SQL

SELECT Alumni.PID,
  IIf(Alumni.Degree="Undergraduate",1,0) AS UndergradDegree,
  (SELECT Count(*) FROM StudentParticipateEvent WHERE StudentParticipateEvent.PID=Alumni.PID) AS NumEventAttend,
  IIf(AlumniMajor.school_of_medicine="Yes",1,0) AS Sch_Medicine,
  IIf(AlumniMajor.school_of_law="Yes",1,0) AS Sch_Law,
  IIf(AlumniMajor.college_of_engineering="Yes",1,0) AS Sch_Engi,
  IIf(AlumniMajor.school_of_optometry="Yes",1,0) AS Sch_Opt,
  IIf(AlumniMajor.college_of_natural_resource="Yes",1,0) AS Sch_Nat,
  IIf(AlumniMajor.college_of_letter_science="Yes",1,0) AS Sch_Science,
  IIf(AlumniMajor.school_of_information="Yes",1,0) AS Sch_Inf,
  IIf(AlumniMajor.school_of_social_welfare="Yes",1,0) AS Sch_welfare,
  IIf(AlumniMajor.haas_business_school="Yes",1,0) AS Sch_has,
  Company.c_latitude, Company.c_longitude
FROM Alumni, AlumniMajor, Company, Job, FullTime
WHERE FullTime.JID=Job.JID AND Company.CompanyID=Job.CompanyID AND Job.PID=Alumni.PID AND AlumniMajor.PID=Alumni.PID;

Step 1: Extract the data from Access with SQL

Page 31:

Query 5: Alumni & Employment - Process

Step 2: Fit a logistic regression model with R

• The model gives the probability that the response variable equals 1 (i.e., that the person is employed)

• Potentially 20 candidate models
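In standard notation, a logistic regression model like the one in Step 2 expresses the probability of employment as a function of predictors x_1, ..., x_k, such as GPA, the number of events attended, and degree. In R it is typically fitted with glm(..., family = binomial). The β coefficients are estimated from the alumni data.

```latex
P(\mathrm{Employment} = 1 \mid x) = \frac{1}{1 + \exp\!\left(-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)\right)}
```

Equivalently, the log-odds ln(p / (1 - p)) are linear in the predictors, which is why each candidate model below is written as a sum of terms.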

Step 3: Model Selection

• Cross-validation: Employment = GPA + Event + Degree

• AIC (Akaike Information Criterion): Employment = GPA + Event + Degree + School of Information

• BIC (Bayesian Information Criterion): Employment = GPA

• Deviance selection: Employment = GPA + Event + Degree
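For reference, the two information criteria used above are defined below. Here k is the number of estimated parameters, n is the number of observations, and \hat{L} is the maximized likelihood. The model with the lowest value is preferred.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Since ln n > 2 once n ≥ 8, BIC penalizes each extra parameter more heavily than AIC. This is consistent with BIC selecting the smallest model (GPA only) while AIC also kept School of Information.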

Page 32:

Query 5: Alumni & Employment - Process

Step 4: Cut-off Selection

• Find a cut-off for the predicted probability that decides whether the predicted value is 1 or 0

• Method: build a confusion matrix
  • Choose a candidate cut-off probability first
  • Use the confusion matrix to find the best cut-off

• Based on the results, we chose 0.58
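Below is a minimal sketch of the cut-off search, with hypothetical names. It builds the confusion matrix at each candidate cut-off on a grid and keeps the cut-off with the highest accuracy. The slide does not say which criterion picked 0.58. Accuracy is an assumption here; other criteria, such as maximizing sensitivity plus specificity, are equally common and can pick a different cut-off.

```java
/** Sketch: confusion matrix at a cut-off, and a grid search for the cut-off with the best accuracy. */
final class CutoffSelection {
    private CutoffSelection() {}

    /** Returns {truePositives, falsePositives, trueNegatives, falseNegatives}. */
    static int[] confusionMatrix(double[] probabilities, boolean[] employed, double cutoff) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < probabilities.length; i++) {
            boolean predicted = probabilities[i] >= cutoff; // predict "employed" at or above the cut-off
            if (predicted && employed[i]) tp++;
            else if (predicted) fp++;
            else if (!employed[i]) tn++;
            else fn++;
        }
        return new int[] {tp, fp, tn, fn};
    }

    /** Tries cut-offs 1/steps, 2/steps, ..., (steps-1)/steps; keeps the first with the highest accuracy. */
    static double bestCutoff(double[] probabilities, boolean[] employed, int steps) {
        double best = Double.NaN;
        double bestAccuracy = -1;
        for (int k = 1; k < steps; k++) {
            double cutoff = k / (double) steps;
            int[] m = confusionMatrix(probabilities, employed, cutoff);
            double accuracy = (m[0] + m[2]) / (double) probabilities.length;
            if (accuracy > bestAccuracy) {
                bestAccuracy = accuracy;
                best = cutoff;
            }
        }
        return best;
    }
}
```

With steps = 100, the grid includes 0.58, the value the team reports.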

Page 33:

Query 5: Alumni & Employment - Process

Step 6: Plot

Page 34:

Query 5: Alumni & Employment - Process

Step 7: Generate KML (an XML-based format) to plot alumni's companies on Google Earth

• Create a KML plot for the alumni who graduated from each college
• Plot those coordinates on Google Earth
• Get a geographic view of employment for each college at UC Berkeley, which suggests where employment rates are higher for that college's current students

• Example for School of Medicine
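KML is the XML dialect that Google Earth reads. Each alumnus's company becomes a Placemark whose coordinates are written longitude first, then latitude. Below is a minimal sketch of generating such a file from the c_latitude and c_longitude columns returned by the Query 5 SQL. The class names and sample values are hypothetical. One file per college could be produced by first filtering rows on the Sch_* flags.

```java
import java.util.List;

/** Sketch: writes a minimal KML 2.2 document with one Placemark per company location. */
final class KmlWriter {
    private KmlWriter() {}

    /** One point to plot; latitude and longitude in decimal degrees. */
    static final class Placemark {
        final String name;
        final double latitude;
        final double longitude;

        Placemark(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    static String toKml(List<Placemark> placemarks) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<kml xmlns=\"http://www.opengis.net/kml/2.2\">\n<Document>\n");
        for (Placemark p : placemarks) {
            sb.append("  <Placemark>\n");
            sb.append("    <name>").append(escape(p.name)).append("</name>\n");
            // KML lists coordinates as longitude,latitude (altitude optional).
            sb.append("    <Point><coordinates>")
              .append(p.longitude).append(',').append(p.latitude)
              .append("</coordinates></Point>\n");
            sb.append("  </Placemark>\n");
        }
        sb.append("</Document>\n</kml>\n");
        return sb.toString();
    }

    // Escapes the characters that would break XML text content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Saving the returned string with a .kml extension produces a file that Google Earth can open directly.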

Page 35:

Query 5: Alumni & Employment - Output

Step 8: Load into Google Earth

Page 36:

Future Work & Improvements

• Create forms & reports
  • Make our database more user-friendly

• Create additional queries
  • Other useful queries, such as a monthly balance check

Page 37:

Q&A

Thank you for listening.