BREAKOUT SESSION - BIGBENCH
3rd Workshop on Big Data Benchmarking
July 16-17
Xi'an, China
BIGBENCH – FURTHER DEVELOPMENT (1)
• Late binding needs to be addressed in BigBench
  – Pre- or post-queries
  – Workload has to deal with missing values
  – Possibly start with weblogs
  – Add columns to tables dynamically
• Scaling factor needs to be proven for data generation rate and query result size
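As an illustration of the late-binding points above (schema resolved at read time, missing values, columns appearing dynamically), here is a minimal sketch in Python. The weblog records, field names, and the `late_bind` helper are hypothetical and are not part of the BigBench data model; the session did not specify an implementation.

```python
import json

# Hypothetical weblog lines; the field names are illustrative assumptions,
# not the BigBench schema.
RAW_WEBLOG = [
    '{"user": "c1", "page": "/item/42", "ts": 1373932800}',
    '{"user": "c2", "page": "/cart", "ts": 1373932860, "referrer": "search"}',
    '{"user": "c3", "ts": 1373932920}',
]


def late_bind(lines):
    """Parse records and derive the column set while reading the data."""
    records = [json.loads(line) for line in lines]
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)  # a new column is added dynamically
    # Fields absent from a record become None, so the workload must
    # handle missing values explicitly.
    rows = [{col: rec.get(col) for col in columns} for rec in records]
    return columns, rows


columns, rows = late_bind(RAW_WEBLOG)
```

In this sketch the "referrer" column only exists because one record carries it, and the third record has no "page" value, which is the kind of gap a late-binding workload would have to tolerate.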
Late Binding ::= the schema information is evaluated at runtime.
BIGBENCH – FURTHER DEVELOPMENT (2)
Data model specific:
• Integration of media resources considered, but excluded
• Localization (WGS84) aspect for Customer (potentially for reviews; considered of minor importance since the postal code is available)
Support for graph structures:
• Integration of hash-tag functionality
• (Re-)Tweet-like methods on recommendation of Customer
• On-the-fly analysis will end in graph structures (e.g., "give me all Customers retweeting a positive review of product XY")
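The example query on graph structures (all Customers retweeting a positive review of product XY) could be sketched roughly as follows. The review and retweet data, the sentiment labels, and the function name are assumptions made for illustration only; the session did not define a data layout for this.

```python
# Hypothetical reviews keyed by review id; the sentiment labels are assumed
# to come from some earlier analysis step.
REVIEWS = {
    "r1": {"product": "XY", "sentiment": "positive"},
    "r2": {"product": "XY", "sentiment": "negative"},
    "r3": {"product": "AB", "sentiment": "positive"},
}

# Hypothetical retweet edges: (customer, review id).
RETWEETS = [
    ("alice", "r1"),
    ("bob", "r2"),
    ("carol", "r1"),
    ("alice", "r3"),
    ("dave", "r3"),
]


def customers_retweeting_positive(product, reviews, retweets):
    """Return customers with a retweet edge to a positive review of product."""
    positive = {
        rid
        for rid, review in reviews.items()
        if review["product"] == product and review["sentiment"] == "positive"
    }
    return sorted({customer for customer, rid in retweets if rid in positive})


result = customers_retweeting_positive("XY", REVIEWS, RETWEETS)
```

Here the customer-to-review edges form a small bipartite graph, and the query is a one-hop traversal filtered on review attributes.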
BIGBENCH – OPEN ISSUES
Open Issues:
• Is localization an issue for a benchmark?
• Do images/other media add value to a data benchmark?
BIGBENCH – FURTHER STEPS
• Big Data Challenge
  – Have people implement BigBench
  – Hive version will be out soon
  – Discussion later
• Big Data Pipeline
  – BigBench somewhere in the middle/end?
  – Discussion later