1CIDR’03
AIMS: An Immersidata Management System
Cyrus Shahabi
Computer Science Department & Integrated Media Systems Center
University of Southern California
Los Angeles, CA 90089-0781
shahabi@usc.edu
http://infolab.usc.edu
Outline
Definitions and Motivating Applications
Immersive Data Types (focus: immersidata)
AIMS Architecture
Subsystems: Acquisition, Storage & Querying
Current Status (demo, if time permits)
Conclusion and Future Work
Immersive Environments
Immersive Environments allow a user to become immersed within an augmented or virtual reality environment in order to interact with people, objects, places, and databases.
Examples:
• Office of the Future (UNC)
• Fire Fighter Training System (Georgia Tech)
• Planetary Exploration (JPL)
• Physical/Occupational Therapy System (Haifa Univ.)
• Virtual Classroom and Office (USC IMSC)
• Haptic Museum (USC IMSC)
• MRE: Mission Rehearsal Exercise (USC ICT)
Thesis (1)
• It is absolutely critical to understand the data generated by and for immersive environments
• For example, from the data acquired from a user’s interactions with an immersive environment (i.e., immersidata), we can learn about the user’s behavior to:
  • Study human factor issues
  • Measure the effectiveness of the environment
  • Customize the information delivery
  • Identify pitfalls in the system
  • Better understand the user’s intentions
  • Improve the system performance
• The points above are for the immersive and multimedia community!
• For the database community: immersive sensors are the user interfaces of the future; as a research community we should study the data they generate or we will miss the boat.
Application (1): Immersive Sensor Pattern Recognition
On-Line Query & Analysis
[Figure: an immersive environment, a recognition system, and a DB of labeled patterns. The recognized command is one of Play, Run, Stop, Zoom-In, Zoom-Out; the figure shows the scores 0.72, 0.15, 0.63, 0.92, 0.25.]
Application (1): American Sign Language (ASL) as well-defined patterns
[Figure, numbered steps:
1. User makes ASL signs with a glove
2. Sensor values are sampled over time
3. Semantic description of the hand
4. ASL signs are recognized (e.g., C, E, F)
Components shown: Acquisition Module; Immersidata Database; Spatio-Temporal (moving sensors) Query Evaluation.
Recognition modules: SVD, Bayesian classifiers, neural net.]
Application (1): ASL On-Line Q&A …
On-line query and analysis challenges:
• A hand sign is composed of a sequence of data samples across multiple sensor streams
• A sequence for one sign has no fixed length (i.e., we can’t tell when one sign ends and the next starts!)
Two problems with interdependent solutions (a chicken-and-egg problem) must be addressed:
• Isolate signs
• Recognize the isolated sign
An example statement in American Sign Language (ASL): “I like yellow shoes”
Application (2) : Immersive Classroom
Off-Line Query & Analysis
Study attention performance for Normal & ADHD-Diagnosed Children
A classroom as a virtual environment (virtual students, a virtual teacher, desks, a blackboard, a window to the playground, doors)
Presence of distracters:
• Paper airplane
• Ambient classroom noise
• Students walking
• Cars passing outside, visible through the window
Application (2): IC Off-Line Q&A …
• The user, wearing an HMD, is immersed in the class
• Trackers monitor body movements and stream data to the database
• Task: pressing a button when a particular letter pattern is seen on the virtual blackboard (e.g., AX)
[Figure: head, arm, and leg sensor data, mouse clicks, displayed characters, and distracters are all recorded in the DB.]
Application (2): IC Off-Line Q&A …
Off-line query and analysis:
• Range-sum queries
  • Sum of body movements
  • Average reaction time to the patterns
  • Number of correct hits
• Classification and clustering
  • Use a classification technique (e.g., SVM) to differentiate between normal and ADHD-diagnosed subjects
Distinguishing hyperactive children from normal ones by automatically analyzing tracker data could have a major impact in psychotherapy: it can discriminate and specify a diagnosis in a manner not possible with existing traditional methods.
[Video clip]
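The off-line range-sum queries listed on this slide can be illustrated with a minimal sketch. The `Trial` record, its fields, and the sample values are hypothetical (the slides do not give a schema); the sketch only shows that SUM, COUNT, and AVERAGE over a time range all reduce to sums of a measure over the selected records.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trial:
    """One target presentation in the virtual classroom (hypothetical schema)."""
    t_start: float   # seconds since session start when the pattern appeared
    reaction: float  # seconds until the button press (0.0 if missed)
    correct: bool    # True if the press was a correct hit
    movement: float  # total tracked body displacement during the trial


def range_sum(trials: List[Trial], lo: float, hi: float,
              measure: Callable[[Trial], float]) -> float:
    """Sum measure(trial) over all trials with lo <= t_start <= hi."""
    return sum(measure(tr) for tr in trials if lo <= tr.t_start <= hi)


trials = [
    Trial(1.0, 0.50, True, 2.0),
    Trial(3.0, 0.75, True, 1.0),
    Trial(6.0, 0.00, False, 4.0),
    Trial(9.0, 0.60, True, 3.0),
]

# Sum of body movements in the first 7 seconds.
total_movement = range_sum(trials, 0.0, 7.0, lambda tr: tr.movement)
# Number of correct hits in the same range (a COUNT is a sum of ones).
hits = range_sum(trials, 0.0, 7.0, lambda tr: 1.0 if tr.correct else 0.0)
# Average reaction time over correct hits = SUM / COUNT.
reaction_sum = range_sum(trials, 0.0, 7.0,
                         lambda tr: tr.reaction if tr.correct else 0.0)
avg_reaction = reaction_sum / hits
```

Expressing AVERAGE as a ratio of two range-sums is what lets a single range-sum engine answer all three queries.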
Thesis (2)
• Immersive applications in training and simulation domains share common data storage and analysis requirements (i.e., dealing with sensor data streams, aka immersidata)
• Hence, instead of building customized systems for the “acquisition, storage and querying” needs of each immersive application, one can design a general-purpose system addressing many of the shared requirements
Focus: Immersidata [MIS’99]
• Data acquired from the user’s interaction with the immersive environment
  • Subject body positions
  • Subject recognized gestures
• Can be analyzed to learn about the user’s behavior
• Specifications
  • Multidimensional: <si, x, y, z, t, v>
  • Spatio-temporal
  • Continuous Data Streams (CDS)
  • Potentially large in size and bandwidth requirements
  • Noisy
Example stream: …, <sn,xn,yn,zn,hn,pn,rn,tn>, …, <s1,x1,y1,z1,h1,p1,r1,t1>, …
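As a rough illustration of the tuple format on this slide, the sketch below models one sample <s, x, y, z, h, p, r, t> and splits an interleaved stream into per-sensor streams. The class and function names are hypothetical, and reading h, p, r as heading, pitch, and roll is an assumption; the slides do not define those fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Sample(NamedTuple):
    """One immersidata tuple <s, x, y, z, h, p, r, t>."""
    s: str     # sensor identifier
    x: float   # position
    y: float
    z: float
    h: float   # assumed: heading
    p: float   # assumed: pitch
    r: float   # assumed: roll
    t: float   # timestamp


def split_by_sensor(stream: Iterable[Sample]) -> Dict[str, List[Sample]]:
    """Group an interleaved stream into one time-ordered list per sensor."""
    per_sensor: Dict[str, List[Sample]] = defaultdict(list)
    for sample in stream:
        per_sensor[sample.s].append(sample)
    for samples in per_sensor.values():
        samples.sort(key=lambda smp: smp.t)
    return dict(per_sensor)


stream = [
    Sample("head", 0.0, 1.6, 0.0, 10.0, 0.0, 0.0, 0.02),
    Sample("arm", 0.3, 1.1, 0.2, 0.0, 5.0, 0.0, 0.01),
    Sample("head", 0.0, 1.6, 0.1, 12.0, 0.0, 0.0, 0.01),
]
streams = split_by_sensor(stream)
```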
AIMS: An Immersidata Management System
[Architecture figure: sensor data streams enter the system, which consists of four modules.
1. Acquisition module: DWPT basis selection for each dimension; transformation
2. Storage module: wavelets packing into disk blocks or DB BLOBs; immersidata storage (file system + OR-DBMS)
3. User interaction module: pattern isolation heuristic; pattern matching (SVD-based measure)
4. Query & analysis module: application-specific GUI; ProPolyne [web] services
Also shown: users’ states and contexts.]
Challenges of AIMS Subsystems
• Acquisition [SIGMETRICS’01, ICME’02]
  • Data should be filtered and transformed (similar to signals)
  • Database-friendly signal processing techniques are required
• Storage [SIGMOD’03?]
  • The physical level of the storage system should be designed to store transformed data (e.g., wavelet coefficients)
  • Block allocation strategies considering query patterns
• Offline Query and Analysis [EDBT’02, PODS’02]
  • Approximate, progressive, and efficient polynomial analytical queries on large amounts of multidimensional data
• Online Query and Analysis [MMM’03]
  • Common challenges with querying continuous data streams
  • Real-time pattern recognition on the aggregation of multiple data streams that are incrementally completing
  • Data from all streams together form the meaningful data
1. Acquisition Module
Approaches:
• Receives multidimensional sensor streams
• In real time, selects a different basis per dimension (optimally) from the DWPT (Discrete Wavelet Packet Transform) library
• Applies a multidimensional transformation to the data (generates multi-resolution representations of the data)
• NOTE: no compression is applied; no data is lost by this process
INPUT: Multidimensional streams
OUTPUT: Wavelet coefficients
2. Storage Module
Approaches:
• Optimally packs related wavelet coefficients into disk blocks (to reduce future I/O cost) and stores them in the file system or within the OR-DBMS
• Records the corresponding disk block information in the DBMS (Database Management System) for future queries
INPUT: Wavelet coefficients
OUTPUT: Disk blocks; metadata records
3. User Interaction Module
Approaches:
• Receives data from various input devices (beyond keyboard and mouse) used by the user (e.g., for data visualization purposes)
• Understands the set of requested actions (SVD + mutual information)
• Translates actions to application-specific commands and/or database queries (takes user profile and context into account)
• Also stores a history of the user’s interactions to be mined off-line and/or on-line to extract user state/behavior and application context, which facilitates future interactions by the same user (e.g., personalization/customization)
INPUT: Camera/speech/tracker/immersive-sensor data
OUTPUT: Application commands and queries; user profile/state and application context
4. Query & Analysis Module
Approaches:
• Transforms queries into the same wavelet domain as the data
• Performs queries efficiently (and perhaps approximately or progressively) in the wavelet domain
• Displays the correct resolution/granularity of aggregate value(s) and/or events to the user based on the user profile (e.g., tolerable latency), system requirements, and/or data availability
• An event is tagged with space (e.g., latitude, longitude, and altitude), time, and a bag of attributes
INPUT: Range and point queries
OUTPUT: Aggregate values / integrated events
AIMS Main Theme: Data Manipulation, Query & Analysis in the WAVELET Domain
• Main idea/distinction: storage is cheap and queries are ad hoc; let’s keep all the wavelet coefficients! (no data compression)
• Intuition: at data population time, we don’t know which coefficients are more or less important
  • Different from the signal-processing objective of reconstructing the entire signal as well as possible
  • This has been observed by [Garofalakis & Gibbons, SIGMOD’02], but they proposed other ways to drop coefficients assuming a uniform workload
• Opportunity: at query time, however, we know what is important to the pending query
AIMS Main Theme: Q&A of Wavelets
• Define a range-sum query as the dot product of a query vector and a data vector (also observed by [Gilbert et al., VLDB’2001], but without query transformation)
• Offline: multidimensional wavelet transform of the data
• At query time: “lazy” wavelet transform of the query vector (very fast)
• Dot product of query and data vectors in the transformed domain → exact result
• Choose high-energy query coefficients only → fast approximate result (90% accuracy by retrieving < 10% of data)
• Choose query coefficients in order of energy → progressive result
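The exact, approximate, and progressive modes listed here can be sketched in one dimension. The code below is an illustrative toy, not the ProPolyne implementation: it assumes the orthonormal Haar transform, a COUNT/SUM-style indicator query, and hypothetical data values. It visits query coefficients in decreasing order of magnitude, so each step retrieves one data coefficient and refines the estimate; after the last nonzero query coefficient the result is exact.

```python
import math


def haar(a):
    """Orthonormal Haar DWT of a list whose length is a power of two."""
    out = [float(v) for v in a]
    n = len(out)
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        summary = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:n] = summary + detail
        n = half
    return out


def progressive_range_sum(data, lo, hi):
    """Yield successively better estimates of sum(data[lo..hi])."""
    query = [1.0 if lo <= i <= hi else 0.0 for i in range(len(data))]
    q_hat = haar(query)   # transformed at query time
    d_hat = haar(data)    # would be computed offline
    # Visit nonzero query coefficients from highest to lowest energy.
    order = sorted((i for i, q in enumerate(q_hat) if abs(q) > 1e-12),
                   key=lambda i: -abs(q_hat[i]))
    estimate = 0.0
    for i in order:
        estimate += q_hat[i] * d_hat[i]   # one data coefficient retrieved
        yield estimate


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]  # hypothetical values
estimates = list(progressive_range_sum(data, 5, 12))
exact = sum(data[5:13])
```

Stopping the loop early gives the approximate mode; how close an early estimate is depends on the data, so no accuracy figure is implied by this toy.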
AIMS with a Twist!
[Architecture figure: remote sensor data streams of the form <x, y, z, t, value>, e.g., <lat, long, altitude, t, temperature>, enter the system, which consists of four modules.
1. Acquisition module: DWPT basis selection for each dimension; transformation
2. Storage module: wavelets packing into disk blocks or DB BLOBs; sensor data storage (file system + DBMS)
3. User interaction module: pattern isolation heuristic; pattern matching (SVD-based measure)
4. Query & analysis module: application-specific GUI; ProPolyne [web] services
Also shown: users’ states and contexts.]
Conclusion and Future Work
• A new application domain, immersive applications, and one of its data sets, immersidata, were introduced
• Database challenges involved in managing immersidata were discussed:
  • Some direct adoption of typical database research techniques (e.g., OLAP)
  • Some modifications/extensions of current research contributions (e.g., in the area of data streams) that are not immediately applicable
• The design of AIMS, an innovative data systems architecture, was reported
• Future work
  • I/O-efficient ways for wavelet transformation and incremental update
  • Hybrid sorting of both data and query coefficients
  • Prototypical implementation of an end-to-end application using AIMS
  • Performance evaluation
Application (3): Physical/Occupational Therapy
Both On-Line and Off-Line Q&A
• Rehabilitation research using virtual environments and gaming technologies
• Enables individuals with severe physical disabilities to use their residual motor abilities in more efficient and less fatiguing ways
• Patient watches her video projected on a 2-D virtual environment
• Video cameras track body movements
• Animated target characters are manipulated within the environment
• Patient is asked to hit the targets to increase her score
• Potential data analysis tasks
  • Offline analysis of user performance in order to find specific motor disabilities
  • Online analysis of body movements to add more targets in the directions that need more exercise
Haptic Data Acquisition [SIGMETRICS’01]
• Temporal aspect: at what rate should the sensor values be sampled?
• Trade-off between accuracy and bandwidth utilization
• Fixed sampling: sampling at a constant rate; the maximum speed is a function of system speed and/or the haptic glove
• Group sampling: intuitive grouping of sensors; a different sampling rate for each group
• Adaptive sampling: dynamic sampling; within a window of the session, every sensor is sampled at its individual optimal rate
ProPolyne Features
• “Measure” can be any polynomial on any combination of attributes
  • Can support COUNT, SUM, AVERAGE
  • Also supports covariance, kurtosis, etc.
  • All using one set of pre-computed aggregates
• Independent of how well the data set can be compressed/approximated by wavelets
  • Because: we show that “range-sum queries” can always be approximated well by wavelets (not always Haar, though!)
• Low update cost: $O(\log^d N)$
• Can be used for exact, approximate, and progressive range-sum query evaluation
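Polynomial Range-Sum Queries
Polynomial range-sum queries: $Q(R, f, I)$
• $I$ is a finite instance of schema $F$
• $R \subseteq \mathrm{Dom}(F)$ is the range
• $f : \mathrm{Dom}(F) \to \mathbb{R}$ is a polynomial of degree $\delta$

$$Q(R, f, I) = \sum_{x \in R \cap I} f(x)$$

Example: F = (Age, Salary), R: (25 < age < 40) & (55k < salary < 150k)

Instance I:
Age   Salary
25    $50k
28    $55k
30    $58k
50    $100k
55    $130k
57    $120k

COUNT: $f(x) = \mathbf{1}(x) = 1$

$$Q(R, \mathbf{1}, I) = \sum_{x \in R \cap I} \mathbf{1}(x) = \mathbf{1}(28, 55k) + \mathbf{1}(30, 58k) = 2$$

SUM: $f(x) = \mathrm{salary}(x)$

$$Q(R, \mathrm{salary}, I) = \sum_{x \in R \cap I} \mathrm{salary}(x) = \mathrm{salary}(28, 55k) + \mathrm{salary}(30, 58k) = 113k$$

Covariance: $f(x) = \mathrm{salary}(x) \times \mathrm{age}(x)$

$$Q(R, \mathrm{salary} \times \mathrm{age}, I) = \sum_{x \in R \cap I} \mathrm{salary}(x)\,\mathrm{age}(x) = f(28, 55k) + f(30, 58k) = 3280k$$

$$\mathrm{Cov}(\mathrm{age}, \mathrm{salary}) = \frac{Q(R, \mathrm{salary} \times \mathrm{age}, I)}{Q(R, \mathbf{1}, I)} - \frac{Q(R, \mathrm{age}, I)\, Q(R, \mathrm{salary}, I)}{\left(Q(R, \mathbf{1}, I)\right)^2}$$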
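The worked example on this slide can be checked with a few lines of code. This is only a direct, brute-force evaluation of the definition (no wavelets involved); the function names are illustrative. One assumption is flagged in the code: the range bounds are treated as inclusive, because the slide's own results count the tuple (28, $55k) even though the range is written with strict inequalities.

```python
# Instance I of schema F = (Age, Salary); salaries are in thousands of dollars.
I = [(25, 50), (28, 55), (30, 58), (50, 100), (55, 130), (57, 120)]


def in_range(x):
    """Range R; bounds taken as inclusive (an assumption, see lead-in)."""
    age, salary = x
    return 25 <= age <= 40 and 55 <= salary <= 150


def Q(R, f, I):
    """Polynomial range-sum: sum f(x) over the tuples of I that fall in R."""
    return sum(f(x) for x in I if R(x))


count = Q(in_range, lambda x: 1, I)
sum_salary = Q(in_range, lambda x: x[1], I)
sum_age = Q(in_range, lambda x: x[0], I)
sum_product = Q(in_range, lambda x: x[0] * x[1], I)
average_salary = sum_salary / count
cov = sum_product / count - (sum_age * sum_salary) / count ** 2

print(count, sum_salary, sum_product, cov)  # 2 113 3280 1.5
```

COUNT, SUM, AVERAGE, and covariance all come from the same four range-sums, which is the sense in which one set of aggregates supports many measures.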
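Polynomial Range-Sum Queries as “Vector Queries”
• The data frequency distribution of $I$ is the function $\Delta_I : \mathrm{Dom}(F) \to \mathbb{Z}$ that maps a point $x$ to the number of times it occurs in $I$
• Example: $\Delta_I(25, 50) = \Delta_I(28, 55) = \dots = \Delta_I(57, 120) = 1$ and $\Delta_I(x) = 0$ otherwise
• To emphasize the fact that a query is an operator on the data frequency distribution, we write $Q(R, f, I) = Q(R, f, \Delta_I)$

Instance I:
Age   Salary
25    $50k
28    $55k
30    $58k
50    $100k
55    $130k
57    $120k

$$Q(R, f, \Delta_I) = \sum_{x \in \mathrm{Dom}(F)} f(x)\, \chi_R(x)\, \Delta_I(x)$$

where $\chi_R(x) = 1$ if $x \in R$ and $\chi_R(x) = 0$ if $x \notin R$.

Hence (vector query, with $f\chi_R$ as the query vector and $\Delta_I$ as the data vector):

$$Q(R, f, \Delta_I) = \langle f \chi_R, \Delta_I \rangle$$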
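The vector-query view can be made concrete with a small sketch: the data becomes a frequency distribution over the whole domain, the query becomes a vector over the same domain, and the answer is their dot product. The finite domain chosen below (ages 20 to 60, salaries $40k to $150k) is hypothetical, and the range bounds are taken as inclusive, which is an assumption made so that the tuple (28, $55k) is counted.

```python
from collections import Counter

I = [(25, 50), (28, 55), (30, 58), (50, 100), (55, 130), (57, 120)]
delta = Counter(I)  # data frequency distribution: point -> multiplicity

# Hypothetical finite domain Dom(F): ages 20..60, salaries 40..150 ($k).
domain = [(age, sal) for age in range(20, 61) for sal in range(40, 151)]


def chi_R(x):
    """Characteristic function of R (inclusive bounds assumed)."""
    age, sal = x
    return 1 if 25 <= age <= 40 and 55 <= sal <= 150 else 0


def vector_query(f, chi, delta, domain):
    """Dot product of the query vector f*chi_R with the data vector delta."""
    query_vec = [f(x) * chi(x) for x in domain]
    data_vec = [delta[x] for x in domain]  # Counter gives 0 for absent points
    return sum(q * d for q, d in zip(query_vec, data_vec))


count = vector_query(lambda x: 1, chi_R, delta, domain)
sum_salary = vector_query(lambda x: x[1], chi_R, delta, domain)
```

Both vectors here have one entry per domain point, which is why the naive form is expensive and why transforming them into a sparser representation is attractive.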
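Overview of Wavelets
• H operator: computes a local average of array a at every other point to produce an array of summary coefficients: Ha
  • Example (Haar): $h = [1/\sqrt{2},\ 1/\sqrt{2}]$
• G operator: measures how much the values in array a vary inside each of the summarized blocks to compute an array of detail coefficients: Ga
  • Example (Haar): $g = [1/\sqrt{2},\ -1/\sqrt{2}]$
[Figure: filter cascade for an array a of length $2^j$.
a[i], $0 \le i < 2^j$, is split into Ha[i] and Ga[i], $0 \le i < 2^{j-1}$;
Ha is split into H²a[i] and GHa[i], $0 \le i < 2^{j-2}$;
H²a is split into H³a[i] and GH²a[i], $0 \le i < 2^{j-3}$.
Labels: summary coefficients of a at level 2 (H²a); detail coefficients of a at level 2 (GHa).
The DWT of a, aka the wavelet coefficients of a, is written $\hat{a}$.]

$$\sum_i a[i]\, b[i] = \sum_\eta \hat{a}[\eta]\, \hat{b}[\eta]$$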
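The H and G operators and the inner-product identity on this slide can be checked numerically. The sketch below assumes the orthonormal Haar filters (coefficients scaled by 1/√2), since that scaling is what makes the dot product of two arrays equal the dot product of their transforms; the input arrays are arbitrary illustrative values.

```python
import math

S = math.sqrt(2.0)


def H(a):
    """Summary coefficients: scaled pairwise sums (Haar low-pass filter)."""
    return [(a[2 * i] + a[2 * i + 1]) / S for i in range(len(a) // 2)]


def G(a):
    """Detail coefficients: scaled pairwise differences (Haar high-pass filter)."""
    return [(a[2 * i] - a[2 * i + 1]) / S for i in range(len(a) // 2)]


def dwt(a):
    """Full Haar DWT, ordered [H^j a, G H^(j-1) a, ..., GHa, Ga]."""
    details = []
    while len(a) > 1:
        details = G(a) + details  # coarser details are placed in front
        a = H(a)
    return list(a) + details


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 0.0, 3.0, 1.0]
lhs = dot(a, b)            # dot product in the original domain
rhs = dot(dwt(a), dwt(b))  # dot product in the wavelet domain
```

The two dot products agree up to floating-point rounding, which is the property that lets a range-sum be evaluated entirely on transformed data.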
Naive Evaluation of Vector Queries Using Wavelets
Hence, vector queries can be computed in the wavelet-transformed space as:

$$Q(R, f, \Delta_I) = \langle \widehat{f\chi_R}, \hat{\Delta}_I \rangle = \sum_{\eta_0, \dots, \eta_{d-1} = 0}^{N-1} \widehat{f\chi_R}(\eta_0, \dots, \eta_{d-1})\, \hat{\Delta}_I(\eta_0, \dots, \eta_{d-1})$$

Algorithm:
• Off-line transformation of the data vector (or “data distribution function”, i.e., $\Delta_I$, to be exact)
  • $O(|I|\, l^d \log^d N)$ for sparse data, $O(|I|) = O(N^d)$ for dense data
• Transform the query vector at submission
  • $O(N^d)$!
• Sum up the products of the corresponding elements of the data and query vectors
  • Retrieving elements of the data vector: $O(N^d)$!
Fast Evaluation of Vector Queries Using Wavelets
Main intuitions:
• The “query vector” can be transformed quickly because most of the coefficients are known in advance
• The “transformed query vector” has a large number of negligible (e.g., zero) values (independent of how well the data can be approximated by wavelets)
Example: Haar filter and COUNT function on R = [5,12] over the domain of integers from 0 to 15:

$$\chi_R = \{0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0\}$$

$$\hat{\chi}_R = \left\{2,\ -\tfrac{1}{2},\ -\tfrac{3}{2\sqrt{2}},\ \tfrac{3}{2\sqrt{2}},\ 0,\ -\tfrac{1}{2},\ 0,\ \tfrac{1}{2},\ 0,\ 0,\ -\tfrac{1}{\sqrt{2}},\ 0,\ 0,\ 0,\ \tfrac{1}{\sqrt{2}},\ 0\right\}$$

The coefficients are grouped, left to right, as H⁴a, GH³a, GH²a, GHa, Ga. At each step, you know the zeros.
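The sparsity claimed in this example is easy to reproduce. The sketch below transforms the COUNT query vector for R = [5,12] on the domain 0 to 15 with an orthonormal Haar transform and lists the nonzero coefficients. It computes the full transform for simplicity; it is not the "lazy" transform itself, which would skip the positions already known to be zero. The sign convention (first element minus second) is an assumption.

```python
import math


def haar(a):
    """Orthonormal Haar DWT, ordered [H^4 a, GH^3 a, GH^2 a, GHa, Ga] for 16 inputs."""
    out = [float(v) for v in a]
    n = len(out)
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        summary = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:n] = summary + detail
        n = half
    return out


def count_query_vector(lo, hi, n):
    """Characteristic vector of the range [lo, hi] on the domain 0..n-1."""
    return [1.0 if lo <= i <= hi else 0.0 for i in range(n)]


chi = count_query_vector(5, 12, 16)
chi_hat = haar(chi)
nonzero = [(i, round(c, 4)) for i, c in enumerate(chi_hat) if abs(c) > 1e-12]
print(len(nonzero), "of", len(chi_hat), "coefficients are nonzero")
```

Only the coefficients whose support straddles a boundary of the range survive at each detail level, so the number of nonzero query coefficients grows with the number of levels rather than with the domain size.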
Exact Evaluation of Vector Queries
Query: SUM(salary) when (25 < age < 40) & (55k < salary < 150k)
# of wavelet coefficients: 837
# of nonzero coordinates: 4380
Optimal Disk Placement for Wavelet Data
• The goal is to store wavelet coefficients efficiently
• “Efficiently” means fast access to stored data, low I/O complexity, and few disk accesses
• How to achieve this: create a principle of locality of reference
• Designed for wavelet overlap queries, but can be extended to polynomial range-sum queries over multidimensional data
Optimal Disk Placement for Wavelet Data
Discrete Wavelet Transform
[Figure: the DWT maps the time-domain samples x0, x1, …, x7 to the wavelet-domain coefficients indexed 0 to 7.]
SVD Background
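The idea of SVD is based on the following theorem of linear algebra: if matrix $X \in \mathbb{R}^{m \times n}$, then there exist column-orthonormal matrices $U$ and $V$ such that $X = U A V^T$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$, and $A \in \mathbb{R}^{r \times r}$ is a diagonal matrix $A = \mathrm{diag}(a_1, a_2, \dots, a_p)$ such that $a_1 \ge a_2 \ge \dots \ge a_p$.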
Weighted-Sum SVD
• Each data sequence can be represented as a matrix whose columns are the sensors, so the number of columns (r) is fixed
• The similarity metric of two data sequences is defined on the “square” matrices (i.e., each matrix multiplied by its transpose, giving an r × r matrix); this eliminates the effect of the two matrices having different numbers of rows (i.e., different lengths in the time dimension)
Weighted-Sum SVD
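Problem: Obtain the similarity of the input sequence and the pattern
[Figure: each of the two sequence matrices is squared into an r × r matrix (entries q11 … qrr and p11 … prr) and then SVD-decomposed.
One decomposition yields the vectors e1, e2, …, er with values c1, c2, …, cr; the other yields f1, f2, …, fr with values d1, d2, …, dr.
The values are turned into weights: cw1, cw2, …, cwr with cw1 + cw2 + … + cwr = 1, and dw1, dw2, …, dwr with dw1 + dw2 + … + dwr = 1.]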
Weighted-Sum SVD
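Problem: Obtain the similarity of the input sequence and the pattern
[Figure: vectors e1, e2, …, er with weights cw1, cw2, …, cwr, and vectors f1, f2, …, fr with weights dw1, dw2, …, dwr.]

$$\Theta_1 = \sum_{i=1}^{r} cw_i \,(e_i \cdot f_i), \qquad \Theta_2 = \sum_{i=1}^{r} dw_i \,(e_i \cdot f_i)$$

The similarity of the input sequence and the pattern = min(Θ1, Θ2)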
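The weighted-sum measure can be sketched once the two decompositions are available. The code below takes the vectors and values as given inputs (it does not perform the SVD) and rests on two assumptions that the slides do not spell out: the weights are the values normalized to sum to 1, and the absolute value of each inner product is used so that the arbitrary sign of a decomposition vector does not affect the result. All names are illustrative.

```python
import math


def weights(values):
    """Normalize values so they sum to 1 (assumed weighting scheme)."""
    total = sum(values)
    return [v / total for v in values]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def weighted_sum_similarity(e, c, f, d):
    """min(Theta1, Theta2) for vectors e, f with values c, d."""
    cw, dw = weights(c), weights(d)
    # abs() guards against the sign ambiguity of the vectors (an assumption).
    inner = [abs(dot(ei, fi)) for ei, fi in zip(e, f)]
    theta1 = sum(w * x for w, x in zip(cw, inner))
    theta2 = sum(w * x for w, x in zip(dw, inner))
    return min(theta1, theta2)


e = [[1.0, 0.0], [0.0, 1.0]]
c = [3.0, 1.0]
h = math.sqrt(0.5)
rotated = [[h, h], [-h, h]]  # same axes rotated by 45 degrees

same = weighted_sum_similarity(e, c, e, c)
turned = weighted_sum_similarity(e, c, rotated, [3.0, 1.0])
```

Identical decompositions give a similarity of 1, and the value falls as the corresponding vectors of the two sequences point in more different directions.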
The Ridge-Climbing Heuristic
Procedure:
• Compute the accumulated similarity values (ASVs) between the input sequence and all vocabulary sequences
• Keep track of all ASVs
• For each vocabulary sequence, check whether the ASV is monotonically increasing, and whether a maximum is reached
  • Yes: put this vocabulary sequence into the candidate pool
• Choose the vocabulary sequence from the candidate pool with the largest maximal value
• Isolate the recognized stream
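The procedure can be sketched as follows. This is one possible reading of the heuristic, not the authors' code: a vocabulary entry becomes a candidate when its ASV history climbs strictly for at least one step and then stops increasing, which is taken as "a maximum is reached". The ASV values and the sign names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def find_peak(asvs: List[float]) -> Optional[int]:
    """Index of the maximum if the ASV rose monotonically and then stopped rising."""
    for k in range(1, len(asvs)):
        if asvs[k] > asvs[k - 1]:
            continue                      # still climbing the ridge
        return k - 1 if k >= 2 else None  # peak, provided there was a climb
    return None                           # never dropped: maximum not reached yet


def ridge_climb(history: Dict[str, List[float]]) -> Optional[Tuple[str, int]]:
    """Return (recognized entry, sample index of its peak), or None."""
    candidates = {}
    for word, asvs in history.items():
        peak = find_peak(asvs)
        if peak is not None:
            candidates[word] = peak
    if not candidates:
        return None
    best = max(candidates, key=lambda w: history[w][candidates[w]])
    return best, candidates[best]


history = {
    "I": [0.2, 0.6, 0.5, 0.4],         # peaks at 0.6
    "like": [0.1, 0.4, 0.8, 0.6],      # peaks at 0.8
    "yellow": [0.2, 0.3, 0.5, 0.55],   # still rising, not a candidate
    "shoes": [0.5, 0.3, 0.2, 0.1],     # never rose, not a candidate
}
result = ridge_climb(history)
```

The returned peak index marks where the recognized sign ends, so the input stream can be cut there and the procedure restarted on the remainder.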