Incremental Data Transformations on Wide-Column Stores with NotaQL
![Page 1: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/1.jpg)
Incremental Data Transformations on Wide-Column Stores with NotaQL
M. Sc. Johannes Schildgen
2015-06-29
![Page 2: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/2.jpg)
"A DBA walks into a NoSQL bar, but turns and leaves because he couldn't find a table"
![Page 3: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/3.jpg)
Column Families
RowId info children
![Page 4: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/4.jpg)
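Column Families
RowId Peter, info: born = 1965, cmpny = IBM, salary = 70k
RowId Lisa, info: born = 1997, school = BSIT
children (column name = value): Lisa = €5, Carl = €0, Susi = €10, Toni = €7
Stored per column family:
info: (Peter, 1965, IBM, 70k), (Lisa, 1997, BSIT)
children: €10, €5, €0, €7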
![Page 5: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/5.jpg)
HBase API
put 'pers', 'Carl', 'info:born', '1982'
put 'pers', 'Carl', 'info:school', 'BSIT'
put 'pers', 'Carl', 'info:school', 'BUIT'
get 'pers', 'Carl'
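The effect of the two puts to info:school can be illustrated with a toy in-memory model. This is only a sketch: the class `WideColumnTable` and its methods are hypothetical names invented for illustration and are not the HBase client API. It assumes that a later put to the same cell supersedes the earlier value when the row is read.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model (hypothetical): row id -> 'family:qualifier' -> list of versions."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_id, column, value):
        # A put does not overwrite in place; it appends a newer version.
        self._rows[row_id][column].append(value)

    def get(self, row_id):
        # Return only the newest version of every column in the row.
        return {col: versions[-1] for col, versions in self._rows[row_id].items()}


pers = WideColumnTable()
pers.put("Carl", "info:born", "1982")
pers.put("Carl", "info:school", "BSIT")
pers.put("Carl", "info:school", "BUIT")
print(pers.get("Carl"))  # {'info:born': '1982', 'info:school': 'BUIT'}
```

Note that no schema is declared for the columns: each row simply carries whatever columns were put into it.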
![Page 6: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/6.jpg)
Jaspersoft HBase QL
{
  "tableName": "pers",
  "deserializerClass": "com.jaspersoft…DefaultDeserializer",
  "filter": {
    "SingleColumnValueFilter": {
      "family": "info",
      "qualifier": "school",
      "compareOp": "EQUAL",
      "comparator": {
        "SubstringComparator": { "substr": "BSIT" }
      }
    }
  }
}
$\sigma_{school = 'BSIT'}(pers)$
http://community.jaspersoft.com/wiki/jaspersoft-hbase-query-language
![Page 7: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/7.jpg)
Phoenix
SELECT * FROM pers WHERE school = 'BSIT'
$\sigma_{school = 'BSIT'}(pers)$
https://github.com/forcedotcom/phoenix
"Parent of each person?"
![Page 8: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/8.jpg)
![Page 9: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/9.jpg)
Input Table Output Table
![Page 10: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/10.jpg)
Input Cell: RowID, Column, Value
Output Cell: RowID, Column, Value
![Page 11: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/11.jpg)
Input Cell: _r (RowID), _c (Column), _v (Value)
Output Cell: _r (RowID), _c (Column), _v (Value)
![Page 12: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/12.jpg)
Input Cell: _r = Lisa, _c = born, _v = 1997
Output Cell: _r = Lisa, _c = born, _v = 1997
OUT._r <- IN._r, OUT.born <- IN.born;
$\pi_{born}(pers)$
![Page 13: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/13.jpg)
Input Cell: _r = Lisa, _c = born, _v = 1997
Output Cell: _r = Lisa, _c = born, _v = 1997
OUT._r <- IN._r, OUT.born <- IN.born, OUT.school <- IN.school;
$\pi_{born, school}(pers)$
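A minimal sketch of what this projection does to a schema-flexible table, written in Python rather than NotaQL. The function name `project` and the dictionary representation of the table are assumptions made for illustration; the sample rows are taken from the pers table shown earlier.

```python
def project(table, columns):
    """Copy the row id and only the listed columns; rows keep whatever subset they have."""
    out = {}
    for row_id, row in table.items():
        for column in columns:
            if column in row:
                # OUT._r <- IN._r, OUT.<column> <- IN.<column>
                out.setdefault(row_id, {})[column] = row[column]
    return out


pers = {
    "Peter": {"born": 1965, "cmpny": "IBM", "salary": "70k"},
    "Lisa": {"born": 1997, "school": "BSIT"},
}
projected = project(pers, ["born", "school"])
print(projected)
# {'Peter': {'born': 1965}, 'Lisa': {'born': 1997, 'school': 'BSIT'}}
```

Peter has no school column, so his output row only contains born. No null value is written.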
![Page 14: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/14.jpg)
Input Cell: _r = Lisa, _c = born, _v = 1997
Output Cell: _r = Lisa, _c = born, _v = 1997
OUT._r <- IN._r, OUT.$(IN._c) <- IN._v;
$\sigma_{school = 'BSIT'}(pers)$
![Page 15: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/15.jpg)
Input Cell: _r = Lisa, _c = born, _v = 1997
Output Cell: _r = Lisa, _c = born, _v = 1997
IN-FILTER: school='BSIT', OUT._r <- IN._r, OUT.$(IN._c) <- IN._v;
The IN-FILTER clause is the row predicate.
$\sigma_{school = 'BSIT'}(pers)$
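The combination of a row predicate with the copy-every-cell rule can be sketched as follows. This is an illustration in Python under the assumption that a table is a dictionary of rows; `transform` and `row_predicate` are hypothetical names, not part of NotaQL.

```python
def transform(table, row_predicate=lambda row: True):
    """Copy every cell of every row that satisfies the row predicate."""
    out = {}
    for row_id, row in table.items():
        if not row_predicate(row):
            continue  # the IN-FILTER drops the whole row
        for column, value in row.items():
            # OUT._r <- IN._r, OUT.$(IN._c) <- IN._v
            out.setdefault(row_id, {})[column] = value
    return out


pers = {
    "Peter": {"born": 1965, "cmpny": "IBM", "salary": "70k"},
    "Lisa": {"born": 1997, "school": "BSIT"},
}
selected = transform(pers, lambda row: row.get("school") == "BSIT")
print(selected)  # {'Lisa': {'born': 1997, 'school': 'BSIT'}}
```

The predicate is evaluated once per row, while the copy rule is evaluated once per cell, which matches the cell-wise reading of the Input Cell and Output Cell slides.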
![Page 16: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/16.jpg)
That was: Selection and Projection
![Page 17: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/17.jpg)
Now: Grouping
![Page 18: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/18.jpg)
Salary sum of each company.
Input Cell: _r = Peter, _c = cmpny, _v = IBM
Output Cell: _r = IBM, _c = salsum, _v = 645k
OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
![Page 19: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/19.jpg)
OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
Input rows (column family info):
Eve: born = 1965, cmpny = IBM, salary = 70k
Carl: born = 1966, cmpny = IBM, job = intern
Julia: born = 1967, cmpny = IBM, salary = 80k
Lisa: born = 1997, school = BSIT, salary = 1k
Intermediate output cells: (IBM, salsum) = 70k and (IBM, salsum) = 80k
Output row IBM: salsum = 150k
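The grouping on this slide can be reproduced with a short Python sketch. The function name is hypothetical, salaries are written as plain numbers in thousands, and the pairing of names with info cells follows the order in which they are listed on the slide, which is an assumption.

```python
from collections import defaultdict


def salary_sum_per_company(table):
    """OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary), sketched over dict rows."""
    sums = defaultdict(int)
    for row in table.values():
        # A row contributes only if it has both fetched columns.
        if "cmpny" in row and "salary" in row:
            sums[row["cmpny"]] += row["salary"]
    return {company: {"salsum": total} for company, total in sums.items()}


pers = {
    "Eve": {"born": 1965, "cmpny": "IBM", "salary": 70},
    "Carl": {"born": 1966, "cmpny": "IBM", "job": "intern"},
    "Julia": {"born": 1967, "cmpny": "IBM", "salary": 80},
    "Lisa": {"born": 1997, "school": "BSIT", "salary": 1},
}
grouped = salary_sum_per_company(pers)
print(grouped)  # {'IBM': {'salsum': 150}}
```

Carl has no salary and Lisa has no cmpny, so neither produces an output cell. Only Eve and Julia contribute, giving 70k + 80k = 150k.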
![Page 20: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/20.jpg)
Advanced Transformations: More Filters
![Page 21: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/21.jpg)
Input table pers as on page 4: rows Peter and Lisa with the column families info and children.
![Page 22: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/22.jpg)
Input table pers as on page 4: rows Peter and Lisa with the column families info and children.
OUT._r <- IN._r, OUT.$(IN._c) <- IN._v;
![Page 23: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/23.jpg)
Input table pers as on page 4: rows Peter and Lisa with the column families info and children.
IN-FILTER: COL_COUNT(children)>0, OUT._r <- IN._r, OUT.$(IN._c) <- IN._v;
![Page 24: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/24.jpg)
Input table pers as on page 4: rows Peter and Lisa with the column families info and children.
IN-FILTER: COL_COUNT(children)>0, OUT._r <- IN._r, OUT.$(IN.children._c) <- IN._v;
![Page 25: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/25.jpg)
Input table pers as on page 4: rows Peter and Lisa with the column families info and children.
IN-FILTER: COL_COUNT(children)>0, OUT._r <- IN._r, OUT.$(IN.children._c?(@>5)) <- IN._v;
?(@>5) is a cell predicate.
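How a row predicate and a cell predicate interact can be sketched in Python. Several things here are assumptions made only for illustration: the function name `copy_children`, the nested-dictionary table layout, reading `@` as the value of the current cell, and in particular the assignment of all four children cells to row Peter, which the transcript does not state.

```python
def copy_children(table, cell_predicate):
    """Row predicate COL_COUNT(children) > 0, then a per-cell filter on the children family."""
    out = {}
    for row_id, families in table.items():
        children = families.get("children", {})
        if len(children) == 0:
            continue  # row predicate violated: skip the whole row
        for column, value in children.items():
            if cell_predicate(value):
                # OUT._r <- IN._r, OUT.$(IN.children._c) <- IN._v
                out.setdefault(row_id, {})[column] = value
    return out


# Hypothetical assignment of the children cells to rows (amounts in euros).
pers = {
    "Peter": {
        "info": {"born": 1965, "cmpny": "IBM", "salary": "70k"},
        "children": {"Lisa": 5, "Carl": 0, "Susi": 10, "Toni": 7},
    },
    "Lisa": {"info": {"born": 1997, "school": "BSIT"}, "children": {}},
}
filtered = copy_children(pers, lambda value: value > 5)
print(filtered)  # {'Peter': {'Susi': 10, 'Toni': 7}}
```

The row predicate decides whether a row is processed at all, while the cell predicate decides for each remaining cell whether it produces an output cell.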
![Page 26: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/26.jpg)
Input table pers as on page 4: rows Peter and Lisa with the column families info and children.
IN-FILTER: COL_COUNT(children)>0, OUT._r <- IN._r, OUT.$(IN.children._c?(!Carl)) <- IN._v;
?(!Carl) is a cell predicate.
![Page 27: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/27.jpg)
NotaQL Transformation Platform: MapReduce
![Page 28: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/28.jpg)
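map(rowId, row)
1. Row violates row pred.? If yes: Stop.
2. Has more columns? If no: Stop.
3. Cell violates cell pred.? If yes: continue with step 2.
4. Map IN.{_r,_c,_v}, fetched columns and constants to r, c and v.
5. emit((r, c), v), then continue with step 2.
Example: the input row Peter (info: born = 1965, cmpny = IBM, salary = 70k) yields the output cell salsum = 70k in row IBM, emitted as ((IBM, salsum), 70k).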
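The map flow above can be sketched as a Python generator. This is not the NotaQL implementation: `notaql_map`, `salsum_mapping` and the convention that a mapping returns `None` when a cell produces no output are hypothetical choices made for this illustration. Salaries are plain numbers in thousands.

```python
def notaql_map(row_id, row, row_pred, cell_pred, mapping):
    """Yield ((r, c), v) pairs for one input row, following the flow on the slide."""
    if not row_pred(row):
        return  # row violates the row predicate: stop
    for column, value in row.items():  # loop while the row has more columns
        if not cell_pred(column, value):
            continue  # cell violates the cell predicate: go to the next column
        out = mapping(row_id, column, value, row)
        if out is not None:
            r, c, v = out
            yield (r, c), v  # emit((r, c), v)


def salsum_mapping(row_id, column, value, row):
    # OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary): the salary cell is
    # emitted under the row's company, which is a fetched column.
    if column == "salary" and "cmpny" in row:
        return row["cmpny"], "salsum", value
    return None


peter = {"born": 1965, "cmpny": "IBM", "salary": 70}
pairs = list(
    notaql_map("Peter", peter, lambda row: True, lambda c, v: True, salsum_mapping)
)
print(pairs)  # [(('IBM', 'salsum'), 70)]
```

The emitted key is the pair of output row id and output column, so all values destined for the same output cell meet in one reduce call.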
![Page 29: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/29.jpg)
reduce((r, c), {v})
put(r, c, aggregateAll(v)), then Stop.
Example: ((IBM, salsum), {70k, 80k, 10k}) yields ((IBM, salsum), 160k).
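The reduce step can be sketched in the same spirit. The function name `notaql_reduce` and the use of a dictionary as output table are assumptions for illustration. `sum` stands in for aggregateAll because the example query uses SUM.

```python
def notaql_reduce(key, values, aggregate=sum):
    """Aggregate all values that were emitted for one output cell (r, c)."""
    r, c = key
    return r, c, aggregate(values)


output_table = {}
r, c, v = notaql_reduce(("IBM", "salsum"), [70, 80, 10])
output_table.setdefault(r, {})[c] = v  # put(r, c, aggregateAll(v))
print(output_table)  # {'IBM': {'salsum': 160}}
```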
![Page 30: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/30.jpg)
Incremental Transformations: Self-Maintainability
![Page 31: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/31.jpg)
Salary sum of all people born before 1980 per company.
IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
17:45 Execute job
17:47 New people are added
17:50 Execute job again
former result + Δ = new result
![Page 32: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/32.jpg)
Salary sum of all people born before 1980 per company.
IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
17:45 Execute job: salsum of IBM = 150k
17:47 New people are added: Melissa (born = 1989, cmpny = IBM, salary = 50k) and Nora (born = 1977, cmpny = IBM, salary = 80k)
17:50 Execute job again: salsum of IBM = 150k + 80k = 230k, because only Nora satisfies born<1980.
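The insertion case can be checked with a few lines of Python. This is a sketch under the assumption that the former result and the inserted rows are available as plain dictionaries; `apply_insertions` is a hypothetical helper, and salaries are numbers in thousands.

```python
def apply_insertions(former_result, inserted_rows):
    """Add the salaries of newly inserted rows that satisfy born < 1980."""
    result = dict(former_result)
    for row in inserted_rows:
        if row["born"] < 1980:  # IN-FILTER: born<1980
            result[row["cmpny"]] = result.get(row["cmpny"], 0) + row["salary"]
    return result


former = {"IBM": 150}
inserted = [
    {"name": "Melissa", "born": 1989, "cmpny": "IBM", "salary": 50},
    {"name": "Nora", "born": 1977, "cmpny": "IBM", "salary": 80},
]
new_result = apply_insertions(former, inserted)
print(new_result)  # {'IBM': 230}
```

The job only touches the two new rows and the former result. The rows that produced the 150k are not read again, which works because SUM can be maintained from the old value and the delta alone.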
![Page 33: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/33.jpg)
Salary sum of all people born before 1980 per company.
IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
17:50 Execute job: salsum of IBM = 230k
17:52 People are deleted: Melissa (born = 1989, cmpny = IBM, salary = 50k) and Nora (born = 1977, cmpny = IBM, salary = 80k)
17:55 Execute job again: salsum of IBM = 230k - 80k = 150k, because only Nora had contributed to the sum.
![Page 34: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/34.jpg)
Salary sum of all people born before 1980 per company.
IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
17:50 Execute job
17:52 People are updated: Peter (born = 1965, cmpny = IBM, salary = 70k) changes born from 1965 to 1990
17:55 Execute job again: salsum -= 70k
![Page 35: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/35.jpg)
Salary sum of all people born before 1980 per company.
IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
17:50 Execute job
17:52 People are updated: Peter (cmpny = IBM, salary = 70k) changes born from 1990 to 1965
17:55 Execute job again: salsum += 70k
![Page 36: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/36.jpg)
Salary sum of all people born before 1980 per company.
IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
17:50 Execute job
17:52 People are updated: Peter (born = 1965, cmpny = IBM) changes salary from 70k to 75k
17:55 Execute job again: salsum += 75k - 70k
![Page 37: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/37.jpg)
Salary sum of all people born before 1980 per company.
IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
17:50 Execute job
17:52 People are updated: Peter (born = 1965, salary = 70k) changes cmpny from IBM to SAP
17:55 Execute job again: salsum of IBM -= 70k
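The update cases on the preceding slides follow one pattern: subtract what the old version of the row contributed, then add what the new version contributes. The Python sketch below illustrates that pattern. The helper names are hypothetical, salaries are numbers in thousands, and the `+70` for SAP in the last case is an inference, since the slide only shows the `-= 70k` for IBM.

```python
def contribution(row):
    """What a row adds to salsum under IN-FILTER: born<1980 (empty if filtered out)."""
    if row is not None and row["born"] < 1980:
        return {row["cmpny"]: row["salary"]}
    return {}


def update_delta(old_row, new_row):
    """Change per company caused by replacing old_row with new_row."""
    delta = {}
    for company, salary in contribution(old_row).items():
        delta[company] = delta.get(company, 0) - salary  # inverted old value
    for company, salary in contribution(new_row).items():
        delta[company] = delta.get(company, 0) + salary  # new value
    return {company: change for company, change in delta.items() if change != 0}


peter = {"born": 1965, "cmpny": "IBM", "salary": 70}
print(update_delta(peter, {**peter, "born": 1990}))   # {'IBM': -70}
print(update_delta({**peter, "born": 1990}, peter))   # {'IBM': 70}
print(update_delta(peter, {**peter, "salary": 75}))   # {'IBM': 5}
print(update_delta(peter, {**peter, "cmpny": "SAP"}))  # {'IBM': -70, 'SAP': 70}
```

Passing `None` as the new row models a deletion and passing `None` as the old row models an insertion, so the same function covers all three kinds of change.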
![Page 38: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/38.jpg)
map and reduce flow as on pages 28 and 29, with this addition:
Just read the Delta (ts > ts').
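Reading only the delta can be sketched as a timestamp filter over cells. The cell layout, the integer timestamps and the name `read_delta` are assumptions for illustration; `last_run_ts` plays the role of ts', read here as the time of the previous execution.

```python
def read_delta(cells, last_run_ts):
    """Keep only cells written after the previous execution (ts > ts')."""
    return [cell for cell in cells if cell["ts"] > last_run_ts]


# Hypothetical cells with clock times encoded as integers (17:45 -> 1745).
cells = [
    {"row": "Peter", "column": "salary", "value": 70, "ts": 1744},
    {"row": "Nora", "column": "salary", "value": 80, "ts": 1747},
]
delta = read_delta(cells, last_run_ts=1745)
print([cell["row"] for cell in delta])  # ['Nora']
```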
![Page 39: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/39.jpg)
map and reduce flow as on pages 28 and 29, with this addition:
...and the former result.
![Page 40: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/40.jpg)
map and reduce flow as on pages 28 and 29, with this addition:
Was the predicate satisfied on the previous execution? If yes: invert v.
![Page 41: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/41.jpg)
map and reduce flow as on pages 28 and 29, with this addition:
Unchanged?
![Page 42: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/42.jpg)
Evaluation: Performance
![Page 43: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/43.jpg)
[Bar chart, axis from 0 to 100]
Workloads: TPC-H 1 (selection, projection, sum of prices per status code); TPC-H 6 (selection, projection, sum of all prices); Reverse Web-Link Graph
Series: Native Hadoop; NotaQL (non-inc.); NotaQL (0.1% changes, timestamp-based CDC); NotaQL (10% changes, manual CDC)
![Page 44: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/44.jpg)
Conclusion
• Selection, Projection
• Grouping, Aggregation
• Schema-Flexible
• Horizontal Aggregation
• Metadata ↔ Data
• Graph Processing
• Text Processing
SQL
Incremental!
![Page 45: Incremental Data Transformations on Wide-Column Stores with NotaQL](https://reader035.vdocument.in/reader035/viewer/2022062220/55c5ecb3bb61ebf3158b472a/html5/thumbnails/45.jpg)
Thank you!