![Page 1: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/1.jpg)
Research of Database Group @ UNSW
Some slides are taken from members @DBG
Wenjie Zhang
![Page 2: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/2.jpg)
Group Overview
• Research Field: core topics in DB, DM, IR, MM
• Group Size: 8 staff members; 20+ PhD students
• Research support: Consistent success in government research grant applications
![Page 3: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/3.jpg)
Some recent research projects
• Xuemin Lin and Wenjie Zhang, Efficiently Processing Pattern-based Structure Queries over Large Graphs, ARC Discovery Grant (2015-2017), $397,500
• Wenjie Zhang and Lei Chen, Continuous Loyalty-based Similarity Queries over Moving Objects, ARC Discovery Project (2015-2017), $266,300
• Lijun Chang, Efficient Cohesive-Subgraph Search over Large Graphs, ARC Early Career Research Award (2015-2017), $372,000
• Xuemin Lin, Probabilistic Search Over Large-Scale Uncertain Graphs, ARC Discovery Project (2014-2016), $413,000
• Xuemin Lin and Wenjie Zhang, Ranking Complex Objects in a Multi-dimensional Space, ARC Discovery Project (2012-2014), $350,000
• Wenjie Zhang, Continuously Monitoring Uncertain Objects in a Multi-dimensional Space, ARC Early Career Research Award (2012-2014), $375,000
![Page 4: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/4.jpg)
What is research?
• Research comprises "creative work undertaken on a systematic basis in order to increase the stock of knowledge, including knowledge of humans, culture and society, and the use of this stock of knowledge to devise new applications". (Source: Wikipedia)
![Page 5: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/5.jpg)
Research degrees & projects
• Master by Research
• PhD
• Research projects: 18 UoC / 24 UoC
![Page 6: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/6.jpg)
Some research topics
• Location based services
• Preference queries on multi-dimensional data
![Page 7: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/7.jpg)
Location based services
• Services that integrate a user’s location with other information to provide added value to a user.
![Page 8: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/8.jpg)
Examples
• Navigation and travel
• Geo-social networking
• Gaming
• Retail
• Advertisement
and many more…
![Page 9: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/9.jpg)
Location-based services have a bright future
• Number of mobiles > world’s population
• 24% use LBS, and 94% of these find LBS valuable
• LBS are a bonanza for start-ups (estimated market: $13B in 2014, $21B in 2015)
![Page 10: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/10.jpg)
Past Research
• Shortest Path Query
• Range Query
• k-Nearest Neighbors Query
• Reverse Nearest Neighbors Query
• k-Closest Pairs Query
and other similar queries…
![Page 11: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/11.jpg)
Shortest path query
• What is the shortest path from here to the airport?
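A minimal sketch of how such a query could be answered, assuming the road network is a graph with non-negative edge lengths and using Dijkstra's algorithm (the slides do not name a specific algorithm). The `roads` graph and its edge lengths are made up for illustration.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a graph with non-negative edge weights.

    graph maps each node to a dict {neighbor: edge_length}.
    Returns (distance, path), or (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

# Hypothetical road network; edge lengths (km) are illustrative only.
roads = {
    "here": {"a": 2.0, "b": 5.0},
    "a": {"b": 1.0, "airport": 7.0},
    "b": {"airport": 3.0},
    "airport": {},
}
distance, path = shortest_path(roads, "here", "airport")
```

On this toy network the route goes through `a` and `b` with total length 6.0.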
![Page 12: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/12.jpg)
Range Query
• Return the coffee shops within 300 meters.
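As a rough illustration, a range query under Euclidean distance can be written as a linear scan. The shop names and coordinates below are hypothetical; practical systems usually replace the scan with a spatial index such as an R-tree.

```python
import math

def range_query(points, q, radius):
    """Return the names of all points within `radius` of q (Euclidean distance)."""
    return sorted(name for name, p in points.items() if math.dist(p, q) <= radius)

# Hypothetical coffee shops; coordinates in meters relative to the user at (0, 0).
coffee_shops = {"c1": (100, 50), "c2": (250, 250), "c3": (-200, 100), "c4": (0, 400)}
nearby = range_query(coffee_shops, (0, 0), 300)
```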
![Page 13: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/13.jpg)
K Nearest Neighbor Queries
• Return the k closest fuel stations.
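A minimal sketch of a k-nearest-neighbor query under Euclidean distance, again as a plain scan. The station names and coordinates are assumptions made for the example.

```python
import heapq
import math

def k_nearest(points, q, k):
    """Return the names of the k points closest to q, nearest first."""
    return heapq.nsmallest(k, points, key=lambda name: math.dist(points[name], q))

# Hypothetical fuel stations on a plane; the user is at (0, 0).
stations = {"s1": (1, 1), "s2": (4, 0), "s3": (0, 2), "s4": (5, 5)}
closest_two = k_nearest(stations, (0, 0), 2)
```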
![Page 14: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/14.jpg)
Reverse Nearest Neighbor Query
• Return the cars for which my fuel station is the nearest fuel station.
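The reverse query can be sketched by checking, for every car, which station is nearest to it and keeping the cars whose answer is the query station. This is a brute-force illustration under Euclidean distance; all names and coordinates are hypothetical.

```python
import math

def reverse_nearest(cars, stations, mine):
    """Return the cars whose nearest station (Euclidean distance) is `mine`."""
    result = []
    for car, pos in cars.items():
        nearest = min(stations, key=lambda s: math.dist(stations[s], pos))
        if nearest == mine:
            result.append(car)
    return result

# Hypothetical stations and car positions.
stations = {"mine": (0, 0), "other": (10, 0)}
cars = {"car1": (2, 1), "car2": (8, 0), "car3": (4, -3), "car4": (6, 5)}
my_customers = reverse_nearest(cars, stations, "mine")
```

Note the asymmetry with the k-nearest-neighbor query: a car can be far from the query station and still be in the result, as long as every other station is even farther away.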
![Page 15: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/15.jpg)
K-Closest Pairs
• Return the closest pair of McDonald’s.
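A brute-force sketch of the k-closest-pairs query: enumerate every pair and keep the k with the smallest Euclidean distance. The restaurant identifiers and coordinates are invented for the example, and the quadratic enumeration is only suitable for small inputs.

```python
import heapq
import math
from itertools import combinations

def k_closest_pairs(points, k):
    """Return the k pairs of point names with the smallest pairwise distance."""
    pairs = combinations(sorted(points), 2)
    return heapq.nsmallest(
        k, pairs, key=lambda ab: math.dist(points[ab[0]], points[ab[1]])
    )

# Hypothetical restaurant locations.
restaurants = {"m1": (0, 0), "m2": (3, 4), "m3": (1, 0), "m4": (10, 10)}
closest_pair = k_closest_pairs(restaurants, 1)
```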
![Page 16: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/16.jpg)
Variations
• Static queries vs. continuous queries
• Euclidean distance vs. network distance
![Page 17: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/17.jpg)
Some research topics
• Location based services
• Preference queries on multi-dimensional data
![Page 18: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/18.jpg)
Preference queries on massive multi-dimensional data
DBG@UNSW 18
Massive multi-dimensional data are collected every day.
Location data from various observational mechanisms:
- Smartphone
0.36 billion this year in China, the largest smartphone market, with 0.45 billion expected next year. Baidu's location-based service receives 3.5 billion location requests per day on average.
- Sensor
- Radio Frequency Identification (RFID)
- Global Positioning System (GPS)
![Page 19: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/19.jpg)
Background
Other multi-dimensional data from various applications
- Environment monitoring: measurements of light, temperature, humidity, …
- Finance and economic data: purchase transactions, stock transactions, …
- User behavior data: click streams, shopping records, …
- Network data: network monitoring data
- etc.
![Page 20: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/20.jpg)
Problems Investigated
Given a large number of multi-dimensional objects, we investigate the following representative and fundamental queries.
• Rank-based Queries
Top k query, Quantile query, Influence maximization
• Dominance-based Queries
Skyline query, representative skyline query, dominating queries
• Spatial Keyword queries
![Page 21: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/21.jpg)
Rank-based queries
1. Top k query
[Figure: points p1 to p8 plotted by X: academic score and Y: research score, ranked by f(p) = x + y]
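The top-k query with the scoring function f(p) = x + y can be sketched as follows. The point coordinates are hypothetical, since the figure's actual values are not recoverable from the transcript.

```python
import heapq

def top_k(points, k, score=lambda p: p[0] + p[1]):
    """Return the names of the k points with the highest score, best first.

    The default score is f(p) = x + y, as on the slide.
    """
    return heapq.nlargest(k, points, key=lambda name: score(points[name]))

# Hypothetical (academic score, research score) pairs.
students = {"p1": (9, 4), "p2": (3, 9), "p3": (6, 5), "p4": (2, 2), "p5": (7, 8)}
best_two = top_k(students, 2)
```

Passing a different `score` function, for example a weighted sum, changes the ranking without changing the query code.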
![Page 22: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/22.jpg)
Rank-based queries (cont.)
2. Φ-quantile: summarize score distribution
The first element in a sorted list with the cumulative weight not smaller than Φ, where Φ is a number in (0, 1].
Sorted elements:
3 3 6 7 8 9 12 13 15 20
With each element carrying equal weight 1/10: 0.5-quantile (median) = 8; 0.8-quantile = 13
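A small sketch of the Φ-quantile definition above, assuming every element carries the same weight 1/n so that the cumulative weight of the first r elements is r/n. The small tolerance guards against floating-point error in Φ·n and is an implementation detail of this sketch.

```python
import math

def quantile(sorted_values, phi):
    """First element whose cumulative weight is not smaller than phi.

    Assumes equal weight 1/n per element and phi in (0, 1].
    """
    if not 0 < phi <= 1:
        raise ValueError("phi must be in (0, 1]")
    n = len(sorted_values)
    # Smallest rank r with r / n >= phi; the tolerance absorbs rounding error.
    rank = max(1, math.ceil(phi * n - 1e-9))
    return sorted_values[rank - 1]

data = [3, 3, 6, 7, 8, 9, 12, 13, 15, 20]
median = quantile(data, 0.5)
q80 = quantile(data, 0.8)
```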
![Page 23: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/23.jpg)
Rank-based queries (cont.)
• Other Statistics
[Figure: axis over elements 1 to 20]
– Find all elements with frequency > 0.1%
– Top-k most frequent elements
– What is the frequency of element 3?
– What is the total frequency of elements between 8 and 14?
– How many elements have non-zero frequency?
![Page 24: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/24.jpg)
Rank-based queries (cont.)
• Reverse rank-based queries (ongoing…)
– How can an object be the top-1 result?
– For most users?
– With minimum cost?
![Page 25: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/25.jpg)
Dominance-based queries
Given an n-dimensional numeric space D = (D1, …, Dn) with a user preference ≺ defined on each dimension, a point u dominates a point v (u ≺ v) if
- ∀ Di (1 ≤ i ≤ n), u.Di ⪯ v.Di, and
- ∃ Dj (1 ≤ j ≤ n), u.Dj ≺ v.Dj
[Figure: points p1 to p8 plotted by X: academic score and Y: research score]
![Page 26: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/26.jpg)
Dominance-based queries (cont.)
Skyline: points not dominated by other points.
- Candidates for the best options in multi-criteria decision applications.
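A minimal sketch of dominance and skyline computation, assuming that larger values are preferred on every dimension (a natural reading for scores, but an assumption nonetheless). The points are hypothetical, and the quadratic comparison is for illustration only.

```python
def dominates(u, v):
    """u dominates v: no worse on every dimension and strictly better on one.

    Assumes larger values are preferred on every dimension.
    """
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def skyline(points):
    """Return the names of the points not dominated by any other point."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    ]

# Hypothetical (academic score, research score) pairs.
students = {"p1": (9, 4), "p2": (3, 9), "p3": (6, 5), "p4": (2, 2), "p5": (7, 8)}
sky = skyline(students)
```

Here `p3` and `p4` drop out because `p5` is at least as good as each of them on both scores and strictly better on one.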
![Page 27: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/27.jpg)
Dominance-based queries (cont.)
• Top-k dominating queries: objects with the highest dominating ability
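One common way to read "dominating ability" is the number of other points an object dominates; the sketch below uses that reading, which is an assumption rather than a definition given on the slide. It also assumes larger values are preferred, and the data are hypothetical.

```python
import heapq

def dominates(u, v):
    """u dominates v, assuming larger values are preferred on every dimension."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def top_k_dominating(points, k):
    """Return the k (name, count) pairs with the most dominated points."""
    counts = {
        name: sum(dominates(p, q) for q in points.values())
        for name, p in points.items()
    }
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])

# Hypothetical (academic score, research score) pairs.
students = {"p1": (9, 4), "p2": (3, 9), "p3": (6, 5), "p4": (2, 2), "p5": (7, 8)}
best = top_k_dominating(students, 1)
```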
![Page 28: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/28.jpg)
New challenges (1)
Massive streaming data: data arrive at high speed, and the volume of the data is extremely large.
- Twitter: 140 million users and over 340 million tweets per day
- 200 Mb/sec from a single sensor node reading weather data
- AT&T collects 600-800 gigabytes of NetFlow data each day
- Square Kilometre Array (SKA) project: a few exabytes (10^18 bytes) of data per day for a single beam per square kilometer
![Page 29: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/29.jpg)
Streaming Algorithm
[Diagram: Data Streams → Stream Processing Engine (Synopses in Memory) → (Approximate) Answer]
- One scan only
- Processing time (fast)
- Synopsis size (small)
- Accuracy (a good tradeoff with synopsis size)
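To make the one-scan, small-synopsis, approximate-answer idea concrete, here is a sketch of the Misra-Gries frequent-items summary. This is a well-known streaming technique chosen for illustration; the slides do not say which algorithms the group uses. It keeps at most k - 1 counters, and each reported count underestimates the true count by at most n/k for a stream of length n.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary with at most k - 1 counters.

    Any item occurring more than n/k times in a stream of length n
    is guaranteed to survive in the returned summary.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Toy stream: "a" occurs 6 times out of 10.
stream = list("aabacadaae")
summary = misra_gries(stream, 3)
```

The synopsis size is fixed by k regardless of stream length, which is the tradeoff the slide points at: a smaller synopsis gives a looser error bound.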
![Page 30: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/30.jpg)
New Challenges (2)
The data may be uncertain for various reasons:
- Limits of the measuring devices
- Noise
- Delay or loss in data transfer
- Privacy
- Data integration
The uncertainty of the data may be described continuously or discretely.
![Page 31: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/31.jpg)
New Challenges (3)
Enriched spatial data
- Textual data: Twitter, Weibo, Foursquare
- User profile: age, gender, preference, etc.
- Multimedia data: photos, videos
![Page 32: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/32.jpg)
Spatial keyword search
Spatial-Textual Objects
An enormous number of spatio-textual objects are available in many applications
• Online local search, e.g., online yellow pages
• Social network services, e.g., Facebook, Flickr, Twitter
![Page 33: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/33.jpg)
Top k spatial keyword search
p1 (pizza,coffee,sushi)
p3 (pizza,sushi)
p2 (pizza,coffee,steak)
p4 (coffee,sushi)
p5 (pizza,steak,seafood)
Query keywords: pizza, coffee
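A sketch of one possible reading of this query: keep only the objects that contain all query keywords, then return the k nearest of them. The keyword sets come from the slide; the coordinates and the query location are hypothetical, and other formulations rank by a combination of text relevance and distance instead.

```python
import heapq
import math

def top_k_spatial_keyword(objects, q_loc, q_keywords, k):
    """Return the k nearest objects that contain every query keyword."""
    required = set(q_keywords)
    matches = [name for name, (loc, kws) in objects.items() if required <= kws]
    return heapq.nsmallest(
        k, matches, key=lambda name: math.dist(objects[name][0], q_loc)
    )

# Keywords as on the slide; locations are made up for illustration.
objects = {
    "p1": ((1, 1), {"pizza", "coffee", "sushi"}),
    "p2": ((4, 3), {"pizza", "coffee", "steak"}),
    "p3": ((0.5, 0.5), {"pizza", "sushi"}),
    "p4": ((2, 0), {"coffee", "sushi"}),
    "p5": ((3, 3), {"pizza", "steak", "seafood"}),
}
result = top_k_spatial_keyword(objects, (0, 0), ["pizza", "coffee"], 1)
```

With these made-up locations `p3` is the closest object, but it lacks "coffee", so it is filtered out before ranking.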
![Page 34: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/34.jpg)
A little bit about BIG Data
• What is big data?
– Four Vs: Value, Velocity, Variety, Veracity
• How big?
– Even scanning (a linear algorithm) is not applicable
• How to handle it?
– New computational paradigms
![Page 35: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/35.jpg)
A little bit about BIG Data
• A recent McKinsey Global Institute report forecasts a serious shortage of data science and engineering professionals in 2018.
• Data scientist: the sexiest job of the 21st century
![Page 36: Research of Database Group @ UNSW Some slides are taken from memebers @DBG Wenjie Zhang](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bf7b1a28abf838c83883/html5/thumbnails/36.jpg)
Thank you!
Questions?