1 design data retrieval and manipulation for subset of ‘gombe’ database using qbe durga gumaste...

23
1 Design data retrieval and manipulation for subset of ‘Gombe’ database using QBE Durga Gumaste Advisor: Dr. Shashi Shekhar

Post on 19-Dec-2015

220 views

Category:

Documents


0 download

TRANSCRIPT

1

Design data retrieval and manipulation for subset of ‘Gombe’ database using QBE

Durga GumasteAdvisor: Dr. Shashi

Shekhar

2

Agenda

Objective Background and Motivation Query description Query optimization Conclusion

3

Objective

Design and implement queries to access andmanipulated ‘Gombe’ chimpanzees data subset,

suchthat queries can be modified by the user having

no background of any Data Manipulation

Language(DML)

4

Background and Motivation

Dr. Jane Goodall has done active research of Gombe chimpanzee for last 38 years

Data retrieval for analysis on the data

Frequent query modification

Ease of modification

5

Project structure

Database: Microsoft Access 2000 Desktop database Microsoft office suite

Microsoft access- QBE vs. VB script GUI for single SQL queries Execution time Support SQL 92

6

About data

Data about “Gombe” chimpanzees Collected since 1953 Behavioral

Food habits How do they travel (in a group or alone)

15 tables Average size: 15-20 MB

7

Tables used for queries

Name Description No of records

chimp Chimps observed by biologists 212

Follow Each time a chimp is observed by biologists

8463

Follow_arrival_ new

Any chimp arriving with the focal chimp

230743

Food_bout What and where a focal chimp eats during the follow

68871

Follow_map_position

Location of the focal chimp during the follow

393615

8

Relationship between tables

9

Queries

Multi-join, nested, range Q1: Find all chimps arriving alone

Q2: Include mothers arriving with off springs Q3: Include siblings

Q4: Exclude mothers and siblings Q5: Find chimps arriving together with other chimp

Single table, aggregate, pointQ6: Find food count of food items in a particular month of a year (Find % food counts)Q7: Find duration for which food items are eaten in a particular month of a year(Find % food duration)

10

Nested, join, range query

Chimps arriving alone Chimps are said to be alone when arrival time

between 2 chimps is more than 5 minutes.(Q1) Mothers arriving with off springs are counted as

arriving alone (Q2) Chimps which arrive with their siblings are

counted as arriving alone(Q3) Both Q2 and Q3 (Q4)

11

Implementation (Q4)

follow_arrival (A)

follow_arrival (B)

Inner join on A and B (self join)

A.date=B.date A.follow=B.follow A.chimp<>B.chimp

Result Set

Arrival time difference between A.chimp and B.chimp > 5 minutes orA and B are mother child orA and B are siblings

Chimprelationships

Follow_map_position (F)

Inner join with F

A.date=F.date A.follow=F.follow A.chimp<>F.focal A.seq = F.seq

1. Inner join (self join) on follow_arrival

2. Select chimps having fa_time_start difference more than 5 minutes for a particular follow on a particular date

3. Take location coordinates for such chimps from follow_map_position table by joining follow_arrival table with follow_map_position table

12

Comparison - PL/SQL & QBE for Q1

13

Comparison - PL/SQL & QBE for Q1

14

Query optimization in Microsoft Access

Cost bases query optimization MS Jet 3.0 Table statistics

Rushmore optimization Efficient use of indexes Index intersection,union,minus

15

Performance evaluation

Execution times

15 12 13 26 10

300

600

300

420480

0

100

200

300

400

500

600

700

Q1(13135) Q2(16754) Q3(13560) Q4(19699) Q5(187038)

Queries

Tim

e(s) index

no index

16

Compact database

Compact database using Compact utility provided by

Microsoft Access De-fragmentation Reordering database pages Reclaim unused space

Original size: 1.1 GB After compaction: 284 MB

Flags queries needing recompilation

17

Derived table

Follow Date Time Map Grpsize

------

chimp_id

AL

AO

AP

AR

AT

Follow_ArrivalFollow_map_time

Time interval adjustment10:03 10:0010:11 10:15

Follow_arrival

Certainty Value 1 1 0 0 uncertain blank

Sum of certainties

AL740101 1/1/1974

10:00 AM 2 1 1 0 13

AL AO AP AR

Group_composition_table

18

Conclusion

Query modification using QBE Ease of writing and modifying queries GUI

Indexes on join attributes improve the performance by 90-95%

Inner join queries are not displayed in QBE Base queries in SQL and sub-queries in QBE Access uses dyna-sets

Derived tables created in VB Multiple queries Onetime queries

19

Future work

Optimize derived table queries in VB Updating group composition table if new

chimp gets added to chimp table

20

Acknowledgement

21

Questions?

22

Desktop Databases

Advantages: Desktop databases are inexpensive Desktop databases are user-friendly Desktop databases offer web solutions

Limitations: Desktop databases generally support only one user. Desktop databases have weak security Desktop databases are not designed for the Internet

23

Access Features (product summary)

Advantages:•  Powerful database engine•  Integration with MS Office XP•  Intuitive Graphical User Interface (GUI)

Disadvantages •  Relatively expensive The Bottom Line - If