Retrieving Objects from Toroidal Mesh Data Using FastBit Technology – A Progress Report
Post on 04-Jan-2016
U.S. Department of Energy Contract No. DE-AC03-76SF00098
Retrieving Objects from Toroidal Mesh Data Using FastBit Technology
– A Progress Report
Outline
Overview of FastBit technology
Recent progress
John Wu
Scientific Data Management, Berkeley Lab
http://sdm.lbl.gov/fastbit
FastBit Started In a Big Smash
Searching for clues to Quark-Gluon Plasma (QGP) in a large set of high-energy collisions
High-Energy Physics experiment STAR
600 participants / 50 institutions / 12 countries
Data rate: 200 MB/s
Data collected: 5 PB, ~1 billion collision events, 5 MB per event (equivalent to having millions of variables)
Challenge: finding 100 or so events with the best evidence of QGP
FastBit 10x Faster than DBMS
Queries on the 12 most queried attributes (2.2 million records) from the STAR High-Energy Physics experiment; average attribute cardinality 222,000
Experiments confirm that:
WAH compressed indexes are 10X faster than bitmap indexes from a DBMS, and 5X faster than our own implementation of BBC
Size of WAH compressed indexes is only 30% of raw data size (a B+-tree index from a popular DBMS is 3-4X the raw data size)
[Figure panels: 2-D queries, 5-D queries]
[Wu, Otoo, Shoshani 2001]
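To make the idea behind WAH (Word-Aligned Hybrid) compression concrete, here is a minimal sketch, not FastBit's actual code. It assumes 32-bit words in which the top bit marks a fill word, the next bit holds the repeated bit value, and the remaining 30 bits count how many 31-bit groups the fill covers; any other word is a literal carrying 31 raw bits. The function names `wah_encode` and `wah_decode` are made up for this illustration.

```python
GROUP = 31                      # payload bits per 32-bit word
ALL_ONES = (1 << GROUP) - 1
FILL_FLAG = 1 << 31             # top bit set marks a fill word
FILL_VALUE = 1 << 30            # second bit holds the repeated bit value
MAX_RUN = (1 << 30) - 1         # largest group count one fill word can hold


def wah_encode(bits):
    """Compress a list of 0/1 values into 32-bit words (simplified WAH)."""
    padded = bits + [0] * (-len(bits) % GROUP)
    words = []
    for start in range(0, len(padded), GROUP):
        group = 0
        for b in padded[start:start + GROUP]:
            group = (group << 1) | b
        if group in (0, ALL_ONES):
            value_bit = FILL_VALUE if group else 0
            last = words[-1] if words else 0
            same_fill = (last & FILL_FLAG) and (last & FILL_VALUE) == value_bit
            if same_fill and (last & MAX_RUN) < MAX_RUN:
                words[-1] = last + 1          # extend the current run by one group
            else:
                words.append(FILL_FLAG | value_bit | 1)
        else:
            words.append(group)               # literal word, top bit stays 0
    return words, len(bits)


def wah_decode(words, nbits):
    """Expand the words back into the original list of 0/1 values."""
    bits = []
    for w in words:
        if w & FILL_FLAG:
            value = 1 if w & FILL_VALUE else 0
            bits.extend([value] * (GROUP * (w & MAX_RUN)))
        else:
            bits.extend((w >> shift) & 1 for shift in range(GROUP - 1, -1, -1))
    return bits[:nbits]


bitmap = [0] * 1000 + [1, 0, 1] + [1] * 500
words, nbits = wah_encode(bitmap)
print(len(bitmap), "bits stored in", len(words), "words")
```

Long runs of identical bits collapse into single fill words, which is why sparse bitmaps from high-cardinality attributes compress well under this kind of scheme.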
FastBit Grew with a Big Boom
Searching for a more fuel-efficient combustion engine (Homogeneous-Charge Compression Ignition engine)
Requires detailed numerical simulation with hundreds of variables
Simulation mesh: 1000 x 1000 x 1000
1000s of time steps per simulation
Challenge: finding and tracking ignition kernels
FastBit Finds Volumes Faster Than Best Isocontour Finder
FastBit finds volume of interest efficiently with compressed representation of the volume
In theory, FastBit identifies volumes of interest as efficiently as the best algorithms that identify only the surface (isocontouring)
FastBit is three times faster than the best isocontouring algorithm in VTK
[Figure: time (sec) for vtkKitwareContourFilter vs. DEX, annotated "3X"]
[Wu, Koegler, Chen, Shoshani 2003]
[Stockinger, Shalf, Bethel, Wu 2005]
FastBit Milestones
2007/08: FastBit speeds up drug discovery tool (first publication not involving any FastBit developers)
2007/08: First public release, version a0.7
2007/06: Physical design reviewed
2007/06: First PhD thesis involving FastBit completed
2006/03: Formal optimality proved
2006/02: Work on Enron data made headlines at PRIMEUR
2005/05: Appeared in ACM TechNews
2005/05: Grid Collector wins ISC Award
2005/01: CRD news report on FastBit
2004/12: WAH patent issued
FastBit Progress Report
Two-level encoding
Feature identification on toroidal mesh
http://sdm.lbl.gov/fastbit
Two Levels Are Better Than One
The most commonly used bitmap index is one-level equality-encoded (e1)
Multi-level encoding was postulated to improve query performance [Wu, Otoo, Shoshani, 2000] [Sinha, Winslett, 2007]
Through extensive analyses, we found the correct number of coarse-level bins to use, which ensures that two-level encoding always performs better [Wu, Stockinger, Shoshani]
bn = binary encoding
e1 = one-level equality
ee = equality-equality
re = range-equality
ie = interval-equality
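The difference between e1 and a two-level scheme such as ee can be sketched as follows. This is an illustrative toy, not FastBit's implementation: it assumes integer attribute values, uses Python integers as uncompressed bitmaps, and picks a fixed `bin_width` for the coarse level rather than the analytically chosen number of bins mentioned above. All function names are hypothetical.

```python
from collections import defaultdict


def build_equality_index(values):
    """Fine level (e1): one bitmap per distinct value; bit i marks row i."""
    index = defaultdict(int)
    for row, v in enumerate(values):
        index[v] |= 1 << row
    return dict(index)


def build_coarse_index(fine, bin_width):
    """Coarse level: OR together the fine bitmaps that fall in each bin."""
    coarse = defaultdict(int)
    for v, bitmap in fine.items():
        coarse[v // bin_width] |= bitmap
    return dict(coarse)


def range_query(fine, coarse, bin_width, lo, hi):
    """Rows with lo <= value < hi; returns (bitmap, number of bitmaps read)."""
    result, ops = 0, 0
    v = lo
    while v < hi:
        if v % bin_width == 0 and v + bin_width <= hi:
            result |= coarse.get(v // bin_width, 0)   # whole bin lies inside the range
            v += bin_width
        else:
            result |= fine.get(v, 0)                  # edge bin: one value at a time
            v += 1
        ops += 1
    return result, ops


values = [(i * 37) % 100 for i in range(1000)]
fine = build_equality_index(values)
coarse = build_coarse_index(fine, bin_width=10)
hits, ops = range_query(fine, coarse, 10, 13, 78)
print(bin(hits).count("1"), "rows found using", ops, "bitmaps")
```

With only the fine level, the same query would read one bitmap per value in the range (65 here); the coarse level answers the interior of the range with a handful of bitmaps and leaves only the two edge bins to the fine level.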
Feature Identification on Toroidal Mesh
Defines connectivity based on the distances computed from (x, y, z) coordinates
Two ways to speed up the feature identification:
work with lines instead of points
use an efficient connected component labeling algorithm
10 – 100 times faster than working with points [Sinha, Winslett, Wu]
Better Approach – Redefine Connectivity
Redefine connectivity based on toroidal coordinates
Node A is connected to B and C on the same circle
To D and E on the circle just below the current one in the same plane
To F and G on the circle of the same radius in the plane just before
By symmetry, there are four more points on the circles above and after
A total of 10 neighbors for every node, more than in the previous approach
Advantages of such a connectivity definition
Neighbors of consecutive nodes on a circle, i.e., an arc, also form arcs
These neighboring arcs fall on four different circles
Our labeling algorithm examines only two out of four circles
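The connectivity above can be sketched as a node-by-node labeling pass with union-find. The slides do not say which two nodes on a neighboring circle are adjacent to A, nor whether every circle holds the same number of nodes, so this sketch assumes a regular mesh indexed as (plane p, circle r, position t) and assumes the two neighbors sit at positions t and t+1. The function names are hypothetical. Each node looks only at the circle below and the plane before (plus its predecessor on the same circle); the other half of the 10 neighbors is covered when those nodes look back.

```python
def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def label_regions(selected, nplanes, ncircles, npts):
    """Label connected selected nodes; selected[p][r][t] is True or False.

    Returns a dict mapping (p, r, t) to a region label 0, 1, 2, ...
    """
    def idx(p, r, t):
        return (p * ncircles + r) * npts + t

    parent = list(range(nplanes * ncircles * npts))

    def union(a, b):
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for p in range(nplanes):
        for r in range(ncircles):
            for t in range(npts):
                if not selected[p][r][t]:
                    continue
                # Backward half of the neighborhood (assumed offsets t, t+1).
                back = [(p, r, (t - 1) % npts)]
                if r > 0:
                    back += [(p, r - 1, t), (p, r - 1, (t + 1) % npts)]
                q = (p - 1) % nplanes          # planes wrap around the torus
                back += [(q, r, t), (q, r, (t + 1) % npts)]
                for pp, rr, tt in back:
                    if selected[pp][rr][tt]:
                        union(idx(p, r, t), idx(pp, rr, tt))

    labels, roots = {}, {}
    for p in range(nplanes):
        for r in range(ncircles):
            for t in range(npts):
                if selected[p][r][t]:
                    root = find(parent, idx(p, r, t))
                    labels[(p, r, t)] = roots.setdefault(root, len(roots))
    return labels


# Tiny made-up mesh: 4 planes, 3 circles per plane, 8 nodes per circle.
sel = [[[False] * 8 for _ in range(3)] for _ in range(4)]
for node in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (3, 0, 0), (2, 2, 4), (2, 2, 5)]:
    sel[node[0]][node[1]][node[2]] = True
print(label_regions(sel, 4, 3, 8))
```

Because the selected mask is known up front, the wrap-around edge between the last plane and plane 0 is handled by the same backward look as every other edge.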
New Connectivity Improves Region Finding
[Figure: labeling time (s) vs. number of nodes, logarithmic axes; series: XYZ, torus - 1, torus - 2]
Preliminary results
Three different labeling methods shown
XYZ: a nearest-neighbor mesh constructed from (x, y, z) coordinates
Torus – 1: connectivity described on previous page, label nodes
Torus – 2: connectivity described previously, label arcs
Speedup = ratio of total time used by two methods
Speedup
Torus – 1 v. XYZ: 25
Torus – 2 v. XYZ: 150
Torus – 2 v. Torus – 1: 6
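As a quick consistency check on the three quoted speedups: since each is a ratio of total times, the Torus – 2 v. XYZ figure should equal the product of the other two. The snippet below uses only the numbers on the slide; the helper name `speedup` is introduced here for illustration.

```python
def speedup(baseline_time, new_time):
    """Speedup of the new method = baseline total time / new total time."""
    return baseline_time / new_time


# Ratios quoted on the slide.
torus1_vs_xyz = 25
torus2_vs_torus1 = 6

# T_xyz / T_torus2 = (T_xyz / T_torus1) * (T_torus1 / T_torus2)
torus2_vs_xyz = torus1_vs_xyz * torus2_vs_torus1
print(torus2_vs_xyz)  # 150, matching the Torus - 2 v. XYZ entry
```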
New Approach Scales Well
Approach torus – 1 scales linearly with the number of nodes in the regions of interest
Approach torus – 2 scales linearly with the number of arcs in the regions of interest
Number of arcs <= number of nodes on the boundaries of the regions
O(|arcs|) <= O(|boundary|)
For regions defined with simple range conditions such as “potential >= 1e-8”, where the boundaries of the regions are isocontours, approach torus – 2 scales as well as the best isocontouring algorithms
Needs formal proof
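A rough sketch of the arc-based variant (torus – 2) follows. It is an illustration under assumptions, not the authors' algorithm: it assumes a regular mesh indexed as (plane, circle, position), assumes a node at position t touches positions t and t+1 on the circle below and on the plane before, and compares arcs on neighboring circles with a simple nested loop rather than a linear merge. All names are hypothetical. An arc is stored as (start, length) with positions taken modulo the circle size.

```python
def extract_arcs(row):
    """Runs of consecutive selected nodes on one circle, with wrap-around."""
    n = len(row)
    if all(row):
        return [(0, n)]
    first_gap = row.index(False)
    arcs, t = [], 0
    while t < n:                       # scan once around, starting after a gap
        pos = (first_gap + 1 + t) % n
        if row[pos]:
            length = 0
            while row[(pos + length) % n]:
                length += 1
            arcs.append((pos, length))
            t += length
        else:
            t += 1
    return arcs


def overlaps(a, b, n):
    """Two circular arcs intersect iff one starts inside the other."""
    return (b[0] - a[0]) % n < a[1] or (a[0] - b[0]) % n < b[1]


def label_arcs(selected, nplanes, ncircles, npts):
    """Return {(p, r, start, length): region label} for every arc."""
    arcs, ids, parent = {}, {}, []
    for p in range(nplanes):
        for r in range(ncircles):
            arcs[(p, r)] = extract_arcs(selected[p][r])
            for s, l in arcs[(p, r)]:
                ids[(p, r, s, l)] = len(parent)
                parent.append(len(parent))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (p, r), here in arcs.items():
        behind = [(p, r - 1)] if r > 0 else []
        behind.append(((p - 1) % nplanes, r))
        for s, l in here:
            reach = (s, min(l + 1, npts))      # neighbors of an arc form an arc
            for cp, cr in behind:
                for os_, ol in arcs[(cp, cr)]:
                    if overlaps(reach, (os_, ol), npts):
                        a, b = find(ids[(p, r, s, l)]), find(ids[(cp, cr, os_, ol)])
                        parent[max(a, b)] = min(a, b)

    roots = {}
    return {key: roots.setdefault(find(i), len(roots)) for key, i in ids.items()}


# Tiny made-up mesh: 4 planes, 3 circles per plane, 8 nodes per circle.
sel = [[[False] * 8 for _ in range(3)] for _ in range(4)]
for p, r, t in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (3, 0, 0), (2, 2, 4), (2, 2, 5)]:
    sel[p][r][t] = True
print(label_arcs(sel, 4, 3, 8))
```

In this toy case six selected nodes collapse into four arcs, so the union-find works on fewer items than the node-based pass would; the gap widens as regions grow, since only the ends of each arc touch the boundary.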
[Figure: torus - 1, label time vs. number of nodes, linear fit y = 2E-06x]
[Figure: torus - 2, label time vs. number of arcs, linear fit y = 3E-06x]
Future Plans
GTC data
Wrap up the current work on 3D GTC data
Prepare for new 5D data
Add visualization front-end
Work with particles
FastBit software
Python API?
Other applications
Visualization?
Contact Information
FastBit website: http://sdm.lbl.gov/fastbit
John’s email: John.Wu@nersc.gov
Arie’s email: Arie@lbl.gov
FastBit is an efficient searching tool for data-driven science.
Key techniques in FastBit have been extensively exercised.
If you have an application that requires searching operations, feel free to contact us.