Posted on 19-Dec-2015
1
Design data retrieval and manipulation for subset of ‘Gombe’ database using QBE
Durga Gumaste
Advisor: Dr. Shashi Shekhar
3
Objective
Design and implement queries to access and manipulate a subset of the ‘Gombe’ chimpanzee data,
such that the queries can be modified by users who have
no background in any Data Manipulation
Language (DML)
4
Background and Motivation
Dr. Jane Goodall has actively researched the Gombe chimpanzees for the last 38 years
Data retrieval for analysis
Frequent query modification
Ease of modification
5
Project structure
Database: Microsoft Access 2000 desktop database (part of the Microsoft Office suite)
Microsoft Access: QBE vs. VB script
GUI for single SQL queries
Execution time
Supports SQL-92
6
About data
Data about the “Gombe” chimpanzees, collected since 1953:
Behavior
Food habits
How they travel (in a group or alone)
15 tables; average size: 15-20 MB
7
Tables used for queries
Name | Description | No. of records
chimp | Chimps observed by biologists | 212
Follow | Each time a chimp is observed by biologists | 8463
Follow_arrival_new | Any chimp arriving with the focal chimp | 230743
Food_bout | What and where a focal chimp eats during the follow | 68871
Follow_map_position | Location of the focal chimp during the follow | 393615
9
Queries
Multi-join, nested, range queries:
Q1: Find all chimps arriving alone
Q2: Include mothers arriving with offspring
Q3: Include siblings
Q4: Exclude mothers and siblings
Q5: Find chimps arriving together with another chimp
Single-table, aggregate, point queries:
Q6: Find the count of each food item in a particular month of a year (% food counts)
Q7: Find the duration for which each food item is eaten in a particular month of a year (% food duration)
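Q6 and Q7 are single-table aggregates over Food_bout. A minimal sketch of both, assuming a simplified, hypothetical schema (`fb_date`, `food`, `fb_start`, `fb_end` with times in minutes since midnight) and using Python's built-in sqlite3 rather than Access/Jet. The percentages are each food item's share of all bouts (Q6) and of total feeding minutes (Q7) in the chosen month.

```python
import sqlite3

# Hypothetical, simplified stand-in for Food_bout; not the real schema.
SCHEMA = """
CREATE TABLE food_bout (
    fb_date  TEXT,     -- ISO date of the bout, e.g. '1974-01-15'
    food     TEXT,     -- food item eaten
    fb_start INTEGER,  -- bout start, minutes since midnight
    fb_end   INTEGER   -- bout end, minutes since midnight
);
"""

# Per food item in one month of one year: bout count and % of all bouts (Q6),
# total minutes and % of all feeding minutes (Q7).
MONTHLY_FOOD_SHARE = """
SELECT food,
       COUNT(*) AS bouts,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM food_bout
                           WHERE strftime('%Y-%m', fb_date) = :ym) AS pct_count,
       SUM(fb_end - fb_start) AS minutes,
       100.0 * SUM(fb_end - fb_start) /
           (SELECT SUM(fb_end - fb_start) FROM food_bout
            WHERE strftime('%Y-%m', fb_date) = :ym) AS pct_duration
FROM food_bout
WHERE strftime('%Y-%m', fb_date) = :ym
GROUP BY food
ORDER BY food;
"""


def monthly_food_share(conn, year, month):
    """Run the Q6/Q7 aggregate for one month; returns a list of row tuples."""
    ym = f"{year:04d}-{month:02d}"
    return conn.execute(MONTHLY_FOOD_SHARE, {"ym": ym}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO food_bout VALUES (?, ?, ?, ?)", [
    ("1974-01-03", "figs", 600, 630),
    ("1974-01-03", "termites", 640, 650),
    ("1974-01-20", "figs", 700, 760),
    ("1974-02-01", "figs", 600, 700),  # different month: excluded
])
for row in monthly_food_share(conn, 1974, 1):
    print(row)
```

Changing the month or year only changes the `:ym` parameter, which mirrors the "particular month of a year" that a QBE user would edit.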
10
Nested, join, range query
Chimps arriving alone: a chimp is said to arrive alone when the difference between its arrival time and that of every other chimp is more than 5 minutes (Q1)
Mothers arriving with offspring are counted as arriving alone (Q2)
Chimps arriving with their siblings are counted as arriving alone (Q3)
Both Q2 and Q3 apply (Q4)
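The four definitions differ only in which pairs of chimps are exempt from the 5-minute test. A minimal Python sketch of that logic, assuming arrival times for one follow are already in memory; the function name `arrivals_alone`, the pair-set format for mother/offspring and sibling relationships, and the sample chimps are hypothetical.

```python
from datetime import datetime

GAP_MINUTES = 5  # threshold from the slide: "alone" means more than 5 minutes apart


def arrivals_alone(arrivals, mother_pairs=(), sibling_pairs=(),
                   ignore_mothers=False, ignore_siblings=False):
    """Return the sorted ids of chimps arriving alone in one follow.

    arrivals: {chimp_id: arrival datetime} for one follow on one date.
    mother_pairs / sibling_pairs: iterables of frozenset({a, b}) (hypothetical format).
    ignore_mothers=True gives Q2, ignore_siblings=True gives Q3, both give Q4.
    """
    exempt = set()
    if ignore_mothers:
        exempt.update(mother_pairs)
    if ignore_siblings:
        exempt.update(sibling_pairs)

    alone = []
    for a, ta in arrivals.items():
        # Partners that arrived within GAP_MINUTES and are not exempt relatives
        close = [
            b for b, tb in arrivals.items()
            if b != a
            and frozenset((a, b)) not in exempt
            and abs((ta - tb).total_seconds()) <= GAP_MINUTES * 60
        ]
        if not close:
            alone.append(a)
    return sorted(alone)


def at(hh, mm):
    """Helper: a time on one (hypothetical) follow date."""
    return datetime(1974, 1, 1, hh, mm)


arrivals = {"AL": at(10, 0), "AO": at(10, 3), "AP": at(10, 20),
            "AR": at(10, 22), "AT": at(10, 40)}
mothers = {frozenset(("AL", "AO"))}   # hypothetical: AL is AO's mother
siblings = {frozenset(("AP", "AR"))}  # hypothetical: AP and AR are siblings

q1 = arrivals_alone(arrivals, mothers, siblings)
q2 = arrivals_alone(arrivals, mothers, siblings, ignore_mothers=True)
q3 = arrivals_alone(arrivals, mothers, siblings, ignore_siblings=True)
q4 = arrivals_alone(arrivals, mothers, siblings,
                    ignore_mothers=True, ignore_siblings=True)
# Q5 read here (an assumption) as the complement of Q1
q5 = sorted(set(arrivals) - set(q1))
```

Keeping the exemptions as flags means Q1 to Q4 share one code path, which is the same kind of small, local change a user would make when editing the query.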
11
Implementation (Q4)
Query flow:
follow_arrival (A) inner join follow_arrival (B) (self join) on A.date = B.date, A.follow = B.follow, A.chimp <> B.chimp
Result set filtered on: arrival time difference between A.chimp and B.chimp > 5 minutes, or A and B are mother and child, or A and B are siblings (from Chimp relationships)
Inner join with Follow_map_position (F) on A.date = F.date, A.follow = F.follow, A.chimp <> F.focal, A.seq = F.seq
1. Inner join (self join) on follow_arrival
2. Select chimps whose fa_time_start differs by more than 5 minutes from other chimps in the same follow on the same date
3. Take the location coordinates for these chimps from the follow_map_position table by joining follow_arrival with follow_map_position
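Steps 1-3 can be sketched as a single SQL statement. This version runs on Python's built-in sqlite3, not Access/Jet, and every table and column name other than `fa_time_start` is a hypothetical simplification (times in minutes since midnight; a `chimp_relationship` table standing in for Chimp relationships). To match the "alone" definition, it keeps chimp A only when no partner B from the self join is both within 5 minutes and unrelated, using GROUP BY ... HAVING over the self-join result. Chimps with no other arrival in the follow drop out of the inner self join, as they would in the flow above.

```python
import sqlite3

# Hypothetical, simplified schema (times in minutes since midnight).
SCHEMA = """
CREATE TABLE follow_arrival (fa_date TEXT, follow TEXT, chimp TEXT,
                             seq INTEGER, fa_time_start INTEGER);
CREATE TABLE chimp_relationship (chimp_a TEXT, chimp_b TEXT, kind TEXT);
CREATE TABLE follow_map_position (fmp_date TEXT, follow TEXT, focal TEXT,
                                  seq INTEGER, x REAL, y REAL);
"""

Q4_SQL = """
WITH alone AS (
    -- Step 1: self join follow_arrival on date and follow, different chimps
    SELECT A.fa_date, A.follow, A.chimp, A.seq
    FROM follow_arrival AS A
    JOIN follow_arrival AS B
      ON A.fa_date = B.fa_date
     AND A.follow  = B.follow
     AND A.chimp  <> B.chimp
    LEFT JOIN chimp_relationship AS r
      ON (r.chimp_a = A.chimp AND r.chimp_b = B.chimp)
      OR (r.chimp_a = B.chimp AND r.chimp_b = A.chimp)
    GROUP BY A.fa_date, A.follow, A.chimp, A.seq
    -- Step 2: keep A only if no unrelated partner arrived within 5 minutes
    HAVING SUM(CASE WHEN ABS(A.fa_time_start - B.fa_time_start) <= 5
                     AND r.chimp_a IS NULL THEN 1 ELSE 0 END) = 0
)
-- Step 3: attach location coordinates from follow_map_position
SELECT al.chimp, F.x, F.y
FROM alone AS al
JOIN follow_map_position AS F
  ON al.fa_date = F.fmp_date
 AND al.follow  = F.follow
 AND al.chimp  <> F.focal
 AND al.seq     = F.seq
ORDER BY al.chimp;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
day, follow = "1974-01-01", "AL740101"
conn.executemany("INSERT INTO follow_arrival VALUES (?, ?, ?, ?, ?)", [
    (day, follow, "AO", 1, 600),  # 10:00
    (day, follow, "AP", 2, 603),  # 10:03, sibling of AO (hypothetical)
    (day, follow, "AR", 3, 620),  # 10:20
    (day, follow, "AT", 4, 623),  # 10:23, unrelated to AR
])
conn.execute("INSERT INTO chimp_relationship VALUES ('AO', 'AP', 'sibling')")
conn.executemany("INSERT INTO follow_map_position VALUES (?, ?, ?, ?, ?, ?)",
                 [(day, follow, "AL", s, 10.0 + s, 20.0 + s) for s in range(1, 5)])
rows = conn.execute(Q4_SQL).fetchall()
print(rows)
```

AO and AP arrive 3 minutes apart but count as alone because they are siblings; AR and AT arrive 3 minutes apart and are unrelated, so neither qualifies.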
14
Query optimization in Microsoft Access
Cost-based query optimization (MS Jet 3.0) using table statistics
Rushmore optimization: efficient use of indexes (index intersection, union, minus)
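Rushmore-style index set operations can be illustrated with sets of row ids: each index lookup yields the ids of matching rows, and AND, OR and AND NOT predicates become intersection, union and difference, so only the surviving rows need to be read from the table. This is a toy sketch of the idea, not Jet's actual implementation; `SimpleIndex` and the sample rows are hypothetical.

```python
from collections import defaultdict


class SimpleIndex:
    """Toy secondary index: maps each value of one column to a set of row ids."""

    def __init__(self, rows, column):
        self.postings = defaultdict(set)
        for row_id, row in enumerate(rows):
            self.postings[row[column]].add(row_id)

    def lookup(self, value):
        # .get avoids inserting empty entries for unseen values
        return self.postings.get(value, set())


rows = [
    {"date": "1974-01-01", "chimp": "AL"},  # row 0
    {"date": "1974-01-01", "chimp": "AO"},  # row 1
    {"date": "1974-01-02", "chimp": "AL"},  # row 2
    {"date": "1974-01-02", "chimp": "AP"},  # row 3
]
by_date = SimpleIndex(rows, "date")
by_chimp = SimpleIndex(rows, "chimp")

day1 = by_date.lookup("1974-01-01")
al = by_chimp.lookup("AL")

both = day1 & al          # date = '1974-01-01' AND chimp = 'AL'      (intersection)
either = day1 | al        # date = '1974-01-01' OR chimp = 'AL'       (union)
day1_not_al = day1 - al   # date = '1974-01-01' AND NOT chimp = 'AL'  (minus)

# Only the surviving row ids are fetched from the table.
matching = [rows[i] for i in sorted(both)]
```

The point of the technique is that the set operations run on small id sets from the indexes, so the table itself is touched only for the final matches.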
15
Performance evaluation
Execution times (time in seconds per query, with and without indexes):
Query | With index (s) | No index (s)
Q1 (13135) | 15 | 300
Q2 (16754) | 12 | 600
Q3 (13560) | 13 | 300
Q4 (19699) | 26 | 420
Q5 (187038) | 10 | 480
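One way to check whether a join can use an index on its join attributes is to inspect the query plan. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 as a stand-in for Access, which exposes plans differently; the table and index names are hypothetical. SQLite may also build a temporary automatic index for an unindexed join, so its plans and timings will not mirror Jet's.

```python
import sqlite3

JOIN_SQL = """
SELECT A.chimp, B.chimp
FROM follow_arrival AS A
JOIN follow_arrival AS B
  ON A.fa_date = B.fa_date AND A.follow = B.follow AND A.chimp <> B.chimp
"""


def plan_details(conn, sql):
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follow_arrival (fa_date TEXT, follow TEXT, chimp TEXT)")
conn.executemany("INSERT INTO follow_arrival VALUES (?, ?, ?)",
                 [("1974-01-01", "AL740101", c) for c in ("AO", "AP", "AR")])

plan_without = plan_details(conn, JOIN_SQL)

# Index on the join attributes (date, follow)
conn.execute("CREATE INDEX idx_fa_date_follow ON follow_arrival (fa_date, follow)")
plan_with = plan_details(conn, JOIN_SQL)

rows = conn.execute(JOIN_SQL).fetchall()  # same result set with or without the index
print("without index:", plan_without)
print("with index:   ", plan_with)
```

The index changes only how the rows are found, not which rows are returned.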
16
Compact database
Compact the database using the Compact utility provided by Microsoft Access:
De-fragmentation
Reordering database pages
Reclaiming unused space
Original size: 1.1 GB; after compaction: 284 MB
Compaction flags saved queries for recompilation
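Access's Compact utility has a rough analogue in SQLite's `VACUUM`, which rebuilds the database file into the minimum space needed. The sketch below uses that swapped-in SQLite technique, not the Access utility. It deletes half of a hypothetical table, which leaves free space inside the file, and then rebuilds the file to reclaim that space.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gombe_demo.db")  # hypothetical demo file
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE follow_map_position (seq INTEGER, note TEXT)")
    conn.executemany("INSERT INTO follow_map_position VALUES (?, ?)",
                     ((i, "x" * 200) for i in range(20000)))
    conn.commit()

    # Deleting rows leaves free pages inside the file; the file does not shrink.
    conn.execute("DELETE FROM follow_map_position WHERE seq % 2 = 0")
    conn.commit()
    size_before = os.path.getsize(path)

    # VACUUM rebuilds the file, repacking rows and dropping unused pages.
    conn.execute("VACUUM")
    size_after = os.path.getsize(path)
    conn.close()

print(f"before VACUUM: {size_before} bytes, after: {size_after} bytes")
```

As with compaction in Access, the rebuild runs outside any open transaction and is best done as occasional maintenance rather than after every change.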
17
Derived table
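Group_composition_table, derived from Follow_Arrival and Follow_map_time:
Columns: Follow, Date, Time, Map, Grpsize, plus one column per chimp_id (AL, AO, AP, AR, AT, ...)
Time interval adjustment: arrival times are aligned to map time intervals (e.g. 10:03 → 10:00, 10:11 → 10:15)
Certainty values from Follow_arrival are converted to 1 or 0 (uncertain or blank entries map to 0)
Grpsize: sum of certainties
Example row: Follow AL740101, Date 1/1/1974, Time 10:00 AM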
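The derivation above can be sketched in Python. It assumes map times fall every 15 minutes, which is consistent with the 10:03 → 10:00 and 10:11 → 10:15 examples but is not stated. It also assumes a hypothetical certainty coding in which "uncertain" or blank maps to 0 and anything else maps to 1. The sketch records a chimp only in the interval where it arrives. The slides do not describe how presence is carried forward between intervals or how the Map column is filled, so both are left out.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)  # assumption: map times fall every 15 minutes
CHIMPS = ["AL", "AO", "AP", "AR", "AT"]


def snap_to_interval(t, interval=INTERVAL):
    """Round t to the nearest interval boundary (10:03 -> 10:00, 10:11 -> 10:15)."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    steps = round((t - midnight) / interval)
    return midnight + steps * interval


def certainty_value(code):
    """Hypothetical mapping: 'uncertain' or blank count as 0, anything else as 1."""
    return 0 if code in (None, "", "uncertain") else 1


def group_composition(follow, arrivals, chimp_ids=CHIMPS):
    """Build one row per snapped time slot for a single follow.

    arrivals: iterable of (chimp_id, arrival datetime, certainty code).
    Each row has Follow, Date, Time, one 0/1 column per chimp, and Grpsize.
    """
    rows = {}
    for chimp, t, code in arrivals:
        slot = snap_to_interval(t)
        row = rows.setdefault(slot, {"Follow": follow, "Date": slot.date(),
                                     "Time": slot.time(),
                                     **{c: 0 for c in chimp_ids}})
        row[chimp] = max(row[chimp], certainty_value(code))
    for row in rows.values():
        row["Grpsize"] = sum(row[c] for c in chimp_ids)  # sum of certainties
    return rows


arrivals = [
    ("AL", datetime(1974, 1, 1, 10, 3), "certain"),
    ("AO", datetime(1974, 1, 1, 10, 2), "certain"),
    ("AP", datetime(1974, 1, 1, 10, 1), "uncertain"),
    ("AR", datetime(1974, 1, 1, 10, 11), "certain"),
]
table = group_composition("AL740101", arrivals)
```

Keying rows by the snapped time keeps one row per follow and interval, matching the one-row-per-time layout of the derived table.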
18
Conclusion
Queries can be modified using QBE: its GUI makes queries easy to write and modify
Indexes on join attributes improve performance by 90-95%
Inner join queries are not displayed in QBE, so base queries are written in SQL and sub-queries in QBE
Access uses dynasets
Derived tables are created in VB (multiple queries, one-time queries)
19
Future work
Optimize derived table queries in VB
Update the group composition table when a new chimp is added to the chimp table
22
Desktop Databases
Advantages:
Desktop databases are inexpensive
Desktop databases are user-friendly
Desktop databases offer web solutions
Limitations:
Desktop databases generally support only one user
Desktop databases have weak security
Desktop databases are not designed for the Internet