![Page 1: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/1.jpg)
Mining for Empty Rectangles in Large Data Sets
Jeff Edmonds
Jarek Gryz
Dongming Liang
Renee Miller
![Page 2: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/2.jpg)
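Matrix representation
π_{A,B}(R ⋈ S)
[Figure: 0-1 matrix over A values 1, 2, 3 and B values 6, 7, 8, with a 1 for each tuple (A, B) of the join result, (3, 6), (1, 7), (3, 8), and a 0 elsewhere.]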
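The matrix representation can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `to_matrix`, its parameters, and the choice of A values as columns and B values as rows are assumptions made for the example.

```python
def to_matrix(tuples, x_values, y_values):
    """Return a 0-1 matrix with a 1 wherever the pair (a, b) occurs in `tuples`.

    Assumed layout (not stated on the slide): one column per A value,
    one row per B value.
    """
    col = {value: i for i, value in enumerate(x_values)}
    row = {value: j for j, value in enumerate(y_values)}
    matrix = [[0] * len(x_values) for _ in y_values]
    for a, b in tuples:
        matrix[row[b]][col[a]] = 1
    return matrix


# Tuples and axis values as they appear on the slide.
example = to_matrix([(3, 6), (1, 7), (3, 8)], [1, 2, 3], [6, 7, 8])
for line in example:
    print(line)
```

Passing the axis values explicitly keeps values such as A = 2, which occur in no tuple, as all-zero columns.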
![Page 3: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/3.jpg)
Find All Maximal 0-Rectangles
π_{A,B}(R ⋈ S)
[Figure: the 0-1 matrix over A values 1, 2, 3 and B values 6, 7, 8 (tuples (3, 6), (1, 7), (3, 8)), with its maximal 0-rectangles outlined.]
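To make the target precise, the definition of a maximal 0-rectangle can be checked by brute force on small inputs. The sketch below is only a reference checker, not the algorithm of the talk: `all_zero` and `brute_force_maximal` are hypothetical names, coordinates are 0-based, and a rectangle is reported as (x_left, y_top, x_right, y_bottom).

```python
def all_zero(matrix, x1, y1, x2, y2):
    """True if the rectangle lies inside the matrix and holds only 0 entries."""
    if x1 < 0 or y1 < 0 or y2 >= len(matrix) or x2 >= len(matrix[0]):
        return False
    return all(matrix[y][x] == 0
               for y in range(y1, y2 + 1)
               for x in range(x1, x2 + 1))


def brute_force_maximal(matrix):
    """List every 0-rectangle that cannot grow by one row or one column."""
    if not matrix or not matrix[0]:
        return []
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = []
    for y1 in range(n_rows):
        for y2 in range(y1, n_rows):
            for x1 in range(n_cols):
                for x2 in range(x1, n_cols):
                    if not all_zero(matrix, x1, y1, x2, y2):
                        continue
                    # The four one-step extensions: left, up, right, down.
                    grown = [(x1 - 1, y1, x2, y2), (x1, y1 - 1, x2, y2),
                             (x1, y1, x2 + 1, y2), (x1, y1, x2, y2 + 1)]
                    if not any(all_zero(matrix, *r) for r in grown):
                        found.append((x1, y1, x2, y2))
    return found


matrix = [[0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
rectangles = sorted(brute_force_maximal(matrix))
print(rectangles)
```

This checker is far slower than a single scan, so it is useful only for validating a faster implementation on small matrices.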
![Page 4: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/4.jpg)
Example
π_{A,B}(R ⋈ S)
[Figure: 0-1 matrix of Car (BMW Z3, Honda L2, Toyota 6A) against Year (95, 96, 97); the 0 entries for BMW Z3 in 95 and 96 form an empty rectangle.]
…
First BMW Z3 series cars were made in 1997.
![Page 5: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/5.jpg)
Relation to Previous Work
Problem: Find all maximal empty rectangles
• [Lui, Ku, Hsu] & [Orlowski]: between points in real plane
• Our Work: within a 0-1 matrix
Purpose:
• [Lui, Ku, Hsu] & [Orlowski]: Machine Learning, Computational Geometry
• Our Work: Query Optimization
# of maximal 0-rectangles:
• [Namaad, Hsu, Lee]: O((# 1’s)^2)
• Our Work: O(# 0’s)
![Page 6: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/6.jpg)
Relation to Previous Work
Time:
• [Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]: O(# 1’s log(# 1’s) + # rectangles) = O(|X||Y|)
• Our Work: O(# 0’s) = O(|X||Y|)
Space:
• [Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]: O(|X||Y|)
• Our Work: O(min(|X|, |Y|)), only two rows of matrix kept in memory
![Page 7: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/7.jpg)
Relation to Previous Work
Practical Implementation:
• [Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]: Intensive random memory access
• Our Work: Requires a single scan of the sorted data
Scalable:
• [Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]: Scales badly
• Our Work: Scales well wrt # of tuples in join, # of maximal rectangles, # of values |X| & |Y|
Practical?
• Our Work: IBM paid us $25,000 to patent it!
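The single scan with two rows in memory can be pictured as follows. This is a hedged sketch, not the patented implementation: `scan_rows` and `row_pairs` are hypothetical helpers, and the input is assumed to be (a, b) pairs already sorted by b in the order of `y_values`.

```python
from itertools import groupby


def scan_rows(sorted_tuples, x_values, y_values):
    """Yield one 0-1 row per B value from (a, b) pairs sorted by b."""
    col = {value: i for i, value in enumerate(x_values)}
    groups = groupby(sorted_tuples, key=lambda t: t[1])
    pending = next(groups, None)
    for b in y_values:
        row = [0] * len(x_values)
        if pending is not None and pending[0] == b:
            for a, _ in pending[1]:
                row[col[a]] = 1
            pending = next(groups, None)
        yield row


def row_pairs(rows):
    """Yield (current_row, next_row) so only two rows are held at a time."""
    previous = None
    for row in rows:
        if previous is not None:
            yield previous, row
        previous = row
    if previous is not None:
        yield previous, None  # the last row has no row below it


pairs = list(row_pairs(scan_rows([(3, 6), (1, 7), (3, 8)],
                                 [1, 2, 3], [6, 7, 8])))
```

Each pair gives the current row together with the row below it, which is the lookahead needed to decide whether a rectangle can still grow downward.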
![Page 8: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/8.jpg)
Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner
        • Maintain the loop invariant
Timing
O(1) amortized time per <x,y>
[Figure: 0-1 matrix with X and Y axes; the current entry <x,y> is marked *.]
![Page 9: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/9.jpg)
Designing an Algorithm
• Define Problem
• Define Loop Invariants
• Define Measure of Progress
• Define Step
• Define Exit Condition
• Maintain Loop Inv
• Make Progress
• Initial Conditions
• Ending
[Figure: illustrations for these steps, labelled "to school", "79 km", "75 km", "0 km" and "Exit".]
![Page 10: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/10.jpg)
Define the Loop Invariant
• We have read the matrix up to <x,y> and cannot reread the matrix.
• We must output all maximal 0-rectangles with <x,y> as bottom-right corner.
• What must we remember?
[Figure: 0-1 matrix with X and Y axes; the current entry <x,y> is marked *.]
![Page 11: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/11.jpg)
Stack of steps
[Figure: 0-1 matrix with X and Y axes showing the staircase of 0 entries for <x,y> (marked *); its steps (x_1,y_1), (x_2,y_2), (x_3,y_3), (x_4,y_4), (x_5,y_5) and (x_r,y_r) are labelled, and x* and y* are marked.]
![Page 12: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/12.jpg)
Constructing Maximal Rectangles
[Figure: 0-1 matrix with X and Y axes showing the staircase for <x,y> (marked *) and its steps (x_1,y_1), (x_i,y_i) and (x_r,y_r).]
![Page 13: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/13.jpg)
Constructing Maximal Rectangles
• Too narrow
• Maximal
• Too short
[Figure: 0-1 matrix with X and Y axes showing the staircase for <x,y> (marked *) and its steps (x_1,y_1), (x_i,y_i) and (x_r,y_r), with the labels above.]
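The three labels can be read as a test on each step of the staircase. The sketch below assumes that x* is the leftmost column of the run of 0s in row y+1 ending at column x, and that y* is the topmost row of the run of 0s in column x+1 ending at row y (with x* = x+1 or y* = y+1 when that neighbouring entry is a 1 or lies outside the matrix). These definitions are an assumed reading of the labels, since the transcript does not spell them out, and `classify_steps` is a hypothetical name.

```python
def classify_steps(steps, x_star, y_star):
    """Label each step (x_i, y_i), the top-left corner of a 0-rectangle.

    too narrow: x_i >= x_star, the rectangle can still be extended down.
    too short:  y_i >= y_star, the rectangle can still be extended right.
    maximal:    neither extension is possible.
    A step that is both too narrow and too short is reported as too narrow.
    """
    labels = []
    for x_i, y_i in steps:
        if x_i >= x_star:
            labels.append("too narrow")
        elif y_i >= y_star:
            labels.append("too short")
        else:
            labels.append("maximal")
    return labels


# Hypothetical staircase for the entry <6,7>, widest step first.
labels = classify_steps([(2, 7), (4, 5), (6, 2)], x_star=5, y_star=6)
print(labels)
```

Because steps get narrower and taller from left to right, the maximal steps form one contiguous run between the too-short steps and the too-narrow ones.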
![Page 14: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/14.jpg)
Constructing staircase(x,y) from staircase(x-1,y)
Case 1
[Figure: 0-1 matrix with the entries <x-1,y> and <x,y> each marked *.]
![Page 15: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/15.jpg)
Constructing staircase(x,y) from staircase(x-1,y)
Case 2
[Figure: 0-1 matrix with X and Y axes showing the staircase for <x-1,y> (marked *) and its steps (x_1,y_1), (x_i,y_i) and (x_r,y_r).]
![Page 16: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/16.jpg)
Constructing staircase(x,y) from staircase(x-1,y)
• Delete
• Keep
[Figure: 0-1 matrix with X and Y axes showing the staircase for <x-1,y> and the entry <x,y> (each marked *), the steps (x_1,y_1), (x_i,y_i) and (x_r,y_r), and the labels too narrow, maximal and too short.]
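The delete/keep rule can be sketched as a stack update. This is an illustration under stated assumptions, not the authors' code: steps are stored as (x_i, y_i) top-left corners with the widest step at the bottom of the stack, `y_r` is assumed to be the topmost row of the run of 0s in column x ending at row y, and `next_staircase` is a hypothetical name. Which branch the slides call Case 1 or Case 2 cannot be recovered from the transcript, so the code does not use those names.

```python
def next_staircase(steps, x, entry, y_r):
    """Return the steps of staircase(x, y) given those of staircase(x-1, y)."""
    if entry == 1:
        return []            # a 1 at <x,y> leaves no 0-rectangle ending here
    kept = list(steps)
    x_new = x
    # Delete every step that reaches at least as high as column x allows.
    while kept and kept[-1][1] <= y_r:
        x_new = kept.pop()[0]
    # At most one step is created: it starts at the leftmost deleted step,
    # or at column x itself when nothing was deleted.
    kept.append((x_new, y_r))
    return kept


previous = [(2, 7), (4, 5), (5, 2)]   # hypothetical staircase(5, 7)
print(next_staircase(previous, 6, 0, 4))
```

The update deletes a suffix of the stack and creates at most one step, which is what the later timing argument relies on.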
![Page 17: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/17.jpg)
Constructing x* & y*
[Figure: 0-1 matrix showing the staircase for <x,y> (marked *), its steps (x_1,y_1), (x_i,y_i) and (x_r,y_r), and the positions x* and y*.]
![Page 18: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/18.jpg)
Location of last 1 seen in each column
[Figure: 0-1 matrix with X and Y axes; the current entry <x,y> is marked *.]
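Remembering the last 1 seen in each column takes one array of length |X|. The sketch below is a minimal illustration with hypothetical names (`topmost_zero_rows`, `last_one`) and 0-based rows: for every row y it reports, per column, the topmost row of the run of 0s ending at row y, which is one more than the row of the last 1 seen.

```python
def topmost_zero_rows(matrix):
    """Yield, for each row y, the topmost row of the 0-run in every column.

    A value of y + 1 means the entry in that column of row y is itself a 1,
    so there is no run of 0s ending at row y.
    """
    n_cols = len(matrix[0]) if matrix else 0
    last_one = [-1] * n_cols          # row of the last 1 seen in each column
    for y, row in enumerate(matrix):
        for x, entry in enumerate(row):
            if entry == 1:
                last_one[x] = y
        yield [last_one[x] + 1 for x in range(n_cols)]


tops = list(topmost_zero_rows([[0, 0, 1],
                               [1, 0, 0],
                               [0, 0, 1]]))
print(tops)
```

The array is updated in constant time per entry and never requires rereading earlier rows.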
![Page 19: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/19.jpg)
Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        • Construct staircase(x,y)
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner
Timing
O(1) amortized time per <x,y>
[Figure: 0-1 matrix with X and Y axes; the current entry <x,y> is marked *; a further label reads "Third".]
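Putting the pieces together, the two loops can be sketched end to end. This is one possible reading of the slides rather than the authors' implementation: the name `maximal_zero_rectangles`, the 0-based coordinates, the (x_left, y_top, x_right, y_bottom) output format, and the exact definitions of x* and y* are assumptions. The matrix is passed as a list of rows for brevity, but each iteration touches only row y, row y+1, the per-column array and the stack.

```python
def maximal_zero_rectangles(matrix):
    """Return all maximal 0-rectangles as (x_left, y_top, x_right, y_bottom)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    last_one = [-1] * n_cols          # row of the last 1 seen in each column
    found = []
    for y in range(n_rows):
        row = matrix[y]
        below = matrix[y + 1] if y + 1 < n_rows else None
        for x in range(n_cols):
            if row[x] == 1:
                last_one[x] = y
        steps = []                    # staircase: (x_i, y_i), widest step first
        x_star = 0                    # leftmost 0 of the run in row y+1
        for x in range(n_cols):
            # Construct staircase(x, y) from staircase(x-1, y).
            if row[x] == 1:
                steps.clear()
            else:
                y_r = last_one[x] + 1
                x_new = x
                while steps and steps[-1][1] <= y_r:
                    x_new = steps.pop()[0]
                steps.append((x_new, y_r))
            # A 1 below <x,y>, or no row below, blocks every downward extension.
            if below is None or below[x] == 1:
                x_star = x + 1
            # y* is the topmost 0 of the run in column x+1 ending at row y.
            y_star = last_one[x + 1] + 1 if x + 1 < n_cols else y + 1
            # Scan from the tallest step; stop at the first too-short one.
            for x_i, y_i in reversed(steps):
                if y_i >= y_star:
                    break
                if x_i < x_star:      # otherwise the step is too narrow
                    found.append((x_i, y_i, x, y))
    return found


result = sorted(maximal_zero_rectangles([[0, 0, 1],
                                         [1, 0, 0],
                                         [0, 0, 1]]))
print(result)
```

The output scan visits only steps that are either reported or about to be deleted at the next column, which is why it fits within the amortized bound stated on the slide.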
![Page 20: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/20.jpg)
Timing
Delete: only work that is not constant time
[Figure: 0-1 matrix with X and Y axes showing the staircase for <x,y> (marked *), its steps (x_1,y_1), (x_i,y_i) and (x_r,y_r), the labels too narrow, maximal and too short, and the steps to delete.]
![Page 21: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/21.jpg)
Timing
Amortized # of steps deleted (per <x,y>) = # of steps created (per <x,y>) ≤ 1
[Figure: 0-1 matrix with the entry <x-1,y> marked *.]
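The amortized bound can be written out as a short counting argument. The symbols c(x,y) and d(x,y), for the number of steps created and deleted while constructing staircase(x,y), are introduced here for illustration and do not appear on the slides. The argument assumes that at most one step is created per entry and that a step can be deleted only after it has been created.

```latex
\[
  \sum_{y=1}^{|Y|} \sum_{x=1}^{|X|} d(x,y)
  \;\le\;
  \sum_{y=1}^{|Y|} \sum_{x=1}^{|X|} c(x,y)
  \;\le\;
  |X|\,|Y|,
  \qquad \text{since } c(x,y) \le 1 .
\]
```

Dividing the total by the |X||Y| entries gives at most one deletion per <x,y> on average, so deletions cost O(1) amortized time per entry.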
![Page 22: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/22.jpg)
Number of Maximal Rectangles
# of maximal 0-rectangles:
• ≤ O((# 1’s)^2) [Namaad, Hsu, Lee]
• ≤ Running time of alg = O(# 0’s)
![Page 23: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller](https://reader030.vdocument.in/reader030/viewer/2022032703/56649f555503460f94c78ba3/html5/thumbnails/23.jpg)
Designing an Algorithm
• Define Problem
• Define Loop Invariants
• Define Measure of Progress
• Define Step
• Define Exit Condition
• Maintain Loop Inv
• Make Progress
• Initial Conditions
• Ending
[Figure: illustrations for these steps, labelled "to school", "79 km", "75 km", "0 km" and "Exit".]