System Support for Data-Intensive Applications
![Page 1: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/1.jpg)
Slide 1
System Support for Data-Intensive Applications
Katherine Yelick U.C. Berkeley, EECS
![Page 2: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/2.jpg)
Slide 2
The “Post PC” Generation
Two technologies will likely dominate:
1) Mobile Consumer Electronic Devices
– e.g., PDAs, cell phones, wearable computers, with cameras, recorders, sensors
– make the computing “invisible” through reliability and simple interfaces
2) Infrastructure to Support such Devices
– e.g., successors to Big Fat Web Servers, Database Servers
– make these “utilities” with reliability and new economic models
![Page 3: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/3.jpg)
Slide 3
Open Research Issues
• Human-computer interaction
– uniformity across devices
• Distributed computing
– coordination across independent devices
• Power
– low power designs and renewable power sources
• Information retrieval
– finding useful information amidst a flood of data
• Scalability
– scaling devices down
– scaling services up
• Reliability and maintainability
![Page 4: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/4.jpg)
Slide 4
The problem space: big data
• Big demand for enormous amounts of data
– today: enterprise and internet applications
» online applications: e-commerce, mail, web, archives
» enterprise decision-support, data mining databases
– future: richer data and more of it
» computational & storage back-ends for mobile devices
» more multimedia content
» more use of historical data to provide better services
• Two key application domains:
– storage: public, private, and institutional data
– search: building static indexes, dynamic discovery
![Page 5: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/5.jpg)
Slide 5
Reliability/Performance Trade-off
• Techniques for reliability:
– High level languages with strong types
» avoid memory leaks, wild pointers, etc.
» C vs. Java
– Redundant storage, computation, etc.
» adds storage and bandwidth overhead
• Techniques for performance:
– Optimize for a specific machine
» e.g., cache or memory hierarchy
– Minimize redundancy
• These two goals work against each other
![Page 6: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/6.jpg)
Slide 6
Specific Projects
• ISTORE
– A reliable, scalable, maintainable storage system
• Data-intensive applications for “backend” servers
– Modeling the real world
– Storing and finding information
• Titanium
– A high level language (Java) with high performance
– A domain-specific language and optimizing compiler
• Sparsity
– Optimization using partial program input
![Page 7: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/7.jpg)
Slide 7
ISTORE: Reliable Storage System
• 80-node x86-based cluster, 1.4TB storage
– cluster nodes are plug-and-play, intelligent, network-attached storage “bricks”
» a single field-replaceable unit to simplify maintenance
– each node is a full x86 PC w/256MB DRAM, 18GB disk
– 2-node system running now; full system in next quarter
ISTORE Chassis
• 80 nodes, 8 per tray
• 2 levels of switches
» 20 100 Mbit/s
» 2 1 Gbit/s
• Environment monitoring: UPS, redundant PS, fans, heat and vibration sensors...
Intelligent Disk “Brick”
• Portable PC CPU: Pentium II/266 + DRAM
• Redundant NICs (4 100 Mb/s links)
• Diagnostic processor
• Disk
• Half-height canister
![Page 8: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/8.jpg)
Slide 8
A glimpse into the future?
• System-on-a-chip enables computer, memory, redundant network interfaces without significantly increasing size of disk
• ISTORE HW in 5-7 years:
– building block: 2006 MicroDrive integrated with IRAM
» 9GB disk, 50 MB/sec from disk
» connected via crossbar switch
– 10,000 nodes fit into one rack!
• O(10,000) scale is our ultimate design point
![Page 9: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/9.jpg)
Slide 9
Specific Projects
• ISTORE
– A reliable, scalable, maintainable storage system
• Data-intensive applications for “backend” servers
– Modeling the real world
– Storing and finding information
• Titanium
– A high level language (Java) with high performance
– A domain-specific language and optimizing compiler
• Sparsity
– Optimization using partial program input
![Page 10: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/10.jpg)
Slide 10
Heart Modeling
• A computer simulation of a human heart
– Used to design artificial heart valves
– Simulations run for days on a C90 supercomputer
– Done by Peskin and McQueen at NYU
• Modern machines are faster but harder to use
– working with NYU
– using Titanium
• Shown here: close-up of aortic valve during ejection
• Images from the Pittsburgh Supercomputing Center
![Page 11: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/11.jpg)
Slide 11
Simulation of a Beating Heart
• Shown here:
– Aortic valve (yellow); mitral valve (purple)
– Mitral valve closes when left ventricle pumps
• Future: virtual surgery?
![Page 12: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/12.jpg)
Slide 12
Earthquake Simulation
• Earthquake modeling
– Used for retrofitting buildings, emergency preparedness, construction policies
– Done by Bielak (CMU); also by Fenves (Berkeley)
– Problems: grid (graph) generation; using images
![Page 13: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/13.jpg)
Slide 13
Earthquake Simulation
• Movie shows a simulated aftershock following the 1994 Northridge earthquake in California
• Future: sensors everywhere; tied to central system
![Page 14: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/14.jpg)
Slide 14
Pollution Standards
• Simulation of ozone layer
– Done by Russell (CMU) and McRae (MIT)
– Used to influence automobile emissions policy
• Los Angeles Basin shown at 8am (left) and 2pm (right)
• The “cloud” shows areas where ozone levels are above federal ambient air quality standards (0.12 parts per million)
![Page 15: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/15.jpg)
Slide 15
Information Retrieval
• Finding useful information amidst huge data sets
– I/O intensive application
• Today’s example: web search engines
– 10 million documents in typical matrix
– Web storage increasing 2x every 5 months
– One class of techniques based on sparse matrices
• Problem: Can you make this run faster, without writing hand-optimized, non-portable code?
[Figure: sparse matrix with # keywords ~100K and # documents ~= 10 M, multiplied by a vector x]
• Matrix is compressed
• “Random” memory access
• Cache miss per 2 flops
• Runs at 1-5% of machine peak
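The kernel behind the figure is a sparse matrix-vector product. A minimal sketch in plain Java follows, assuming the compressed sparse row (CSR) layout; the slide only says the matrix is "compressed", so the format, the class name `CsrSpmv`, and the tiny test matrix are illustrative assumptions. Each inner iteration does two flops and one indirect load of `x`, which is where the "random" memory access comes from.

```java
import java.util.Arrays;

public class CsrSpmv {
    // Computes y = A * x for a matrix A stored in CSR form (an assumed layout):
    // rowPtr[i]..rowPtr[i+1]-1 index the nonzeros of row i in colIdx/val.
    public static double[] multiply(int[] rowPtr, int[] colIdx, double[] val, double[] x) {
        int rows = rowPtr.length - 1;
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                // Two flops (multiply, add) per indirect, data-dependent load of x.
                sum += val[k] * x[colIdx[k]];
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        // Hypothetical 3x4 matrix: [[1,0,2,0],[0,0,3,0],[4,5,0,6]]
        int[] rowPtr = {0, 2, 3, 6};
        int[] colIdx = {0, 2, 2, 0, 1, 3};
        double[] val = {1, 2, 3, 4, 5, 6};
        double[] x = {1, 1, 1, 1};
        System.out.println(Arrays.toString(multiply(rowPtr, colIdx, val, x)));
    }
}
```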
![Page 16: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/16.jpg)
Slide 16
Image-Based Retrieval
• Digital library problem:
– retrieval on images
– content-based
• Computer vision problem
– uses sparse matrix
• Future: search in medical image databases; diagnosis; epidemiological studies
![Page 17: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/17.jpg)
Slide 17
Object Based Image Description
![Page 18: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/18.jpg)
Slide 18
Specific Projects
• ISTORE
– A reliable, scalable, maintainable storage system
• Data-intensive applications for “backend” servers
– Modeling the real world
– Storing and finding information
• Titanium
– A high level language (Java) with high performance
– A domain-specific language and optimizing compiler
• Sparsity
– Optimization using partial program input
![Page 19: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/19.jpg)
Slide 19
Titanium Goals
• Help programmers write reliable software
– Retain safety properties of Java
– Extend to parallel programming constructs
• Performance
– Sequential code comparable to C/C++/Fortran
– Parallel performance comparable to MPI
• Portability
• How?
– Domain-specific language and compiler
– No JVM
– Optimizing compiler
– Explicit parallelism and other language constructs for high performance
![Page 20: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/20.jpg)
Slide 20
Titanium Overview: Sequential
Object-oriented language based on Java with:
• Immutable classes
– user-definable non-reference types for performance
• Unordered loops
– compiler is free to run iterations in any order
– useful for cache optimizations and others
• Operator overloading
– by demand from our user community
• Multidimensional arrays
– points and index sets as first-class values
– specific to an application domain: scientific computing with block-structured grids
![Page 21: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/21.jpg)
Slide 21
Titanium Overview: Parallel
Extensions of Java for scalable parallelism:
• Scalable parallelism
– SPMD model with global address space
• Global communication library
– E.g., broadcast, exchange (all-to-all)
– Used to build data structures in the global address space
• Parallel optimizations
– Pointer operations
– Communication (underway)
• Bulk asynchronous I/O
– speed with safety
![Page 22: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/22.jpg)
Slide 22
Implementation
• Strategy
– Compile Titanium into C
– Communicate through shared memory on SMPs
– Lightweight communication for distributed memory
• Titanium currently runs on:
– Uniprocessors
– SMPs with Posix or Solaris threads
– Berkeley NOW, SP2 (distributed memory)
– Tera MTA (multithreaded, hierarchical)
– Cray T3E (global address space)
– SP3 (cluster of SMPs, e.g., Blue Horizon at SDSC)
![Page 23: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/23.jpg)
Slide 23
Sequential Performance
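Ultrasparc:

| Benchmark | C/C++/FORTRAN | Titanium Arrays | Overhead |
|---|---|---|---|
| DAXPY | 1.4s | 1.5s | 7% |
| 3D multigrid | 12s | 22s | 83% |
| 2D multigrid | 5.4s | 6.2s | 15% |
| EM3D | 0.7s | 1.0s | 42% |

Java Arrays (Ultrasparc): 1.8s for EM3D; a second Java timing of 6.8s also appears on the slide, but its row is not recoverable from the transcript.

Pentium II:

| Benchmark | C/C++/FORTRAN | Titanium Arrays | Overhead |
|---|---|---|---|
| DAXPY | 1.8s | 2.3s | 27% |
| 3D multigrid | 23.0s | 20.0s | -13% |
| 2D multigrid | 7.3s | 5.5s | -25% |
| EM3D | 1.0s | 1.6s | 60% |

Java Arrays (Pentium II): timings are not recoverable from the transcript.

Performance results from 98; new IR and optimization framework almost complete.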
![Page 24: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/24.jpg)
Slide 24
SPMD Execution Model
• Java programs can be run as Titanium, but the result will be that all processors do all the work
• E.g., parallel hello world
    class HelloWorld {
        public static void main (String [] argv) {
            System.out.println("Hello from proc " + Ti.thisProc());
        }
    }
• Any non-trivial program will have communication and synchronization
![Page 25: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/25.jpg)
Slide 25
SPMD Execution Model
• A common style is compute/communicate
• E.g., in each timestep within a particle simulation with gravitational attraction
    read all particles and compute forces on mine
    Ti.barrier();
    write to my particles using new forces
    Ti.barrier();
• This basic model is used on the large-scale parallel simulations described earlier
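The compute/communicate pattern can be emulated in plain Java, with threads standing in for Titanium processes and `java.util.concurrent.CyclicBarrier` standing in for `Ti.barrier()`. This is only a sketch under those assumptions: the class name, the one-particle-per-thread layout, and the "move halfway toward the mean position" update (a stand-in for a real force computation) are all hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class ComputeCommunicateSketch {
    // Stand-in for Ti.barrier(): block until every thread has arrived.
    private static void barrier(CyclicBarrier b) {
        try {
            b.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new RuntimeException(e);
        }
    }

    // Runs `steps` timesteps on `procs` threads; thread p owns particle p.
    public static double[] simulate(int procs, int steps) {
        final double[] pos = new double[procs];
        for (int p = 0; p < procs; p++) pos[p] = p;
        final CyclicBarrier b = new CyclicBarrier(procs);
        Thread[] threads = new Thread[procs];
        for (int p = 0; p < procs; p++) {
            final int me = p;
            threads[p] = new Thread(() -> {
                for (int t = 0; t < steps; t++) {
                    // Compute phase: read all particles, compute the update for mine.
                    double sum = 0.0;
                    for (double v : pos) sum += v;
                    double next = pos[me] + 0.5 * (sum / procs - pos[me]);
                    barrier(b); // nobody writes until everyone has finished reading
                    pos[me] = next;
                    barrier(b); // nobody reads until everyone has finished writing
                }
            });
            threads[p].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate(4, 1)));
    }
}
```

Dropping either barrier would let one thread overwrite a particle while another is still reading it, which is the race the two-barrier structure on the slide avoids.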
![Page 26: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/26.jpg)
Slide 26
SPMD Model
• All processors start together and execute the same code, but not in lock-step
• Basic control done using
– Ti.numProcs() total number of processors
– Ti.thisProc() number of executing processor
• Sometimes they do something independent
    if (Ti.thisProc() == 0) { ….. do setup ..… }
    System.out.println("Hello from " + Ti.thisProc());
    double [1d] a = new double [Ti.numProcs()];
![Page 27: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/27.jpg)
Slide 27
Barriers and Single
• Common source of bugs is barriers or other global operations inside branches or loops
    barrier, broadcast, reduction, exchange
• A “single” method is one called by all procs
    public single static void allStep(...)
• A “single” variable has same value on all procs
    int single timestep = 0;
• The compiler uses “single” type annotations to ensure there are no synchronization bugs with barriers
![Page 28: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/28.jpg)
Slide 28
Explicit Communication: Exchange
• To create shared data structures
– each processor builds its own piece
– pieces are exchanged (for objects, just exchange pointers)
• Exchange primitive in Titanium
    int [1d] single allData;
    allData = new int [0:Ti.numProcs()-1];
    allData.exchange(Ti.thisProc()*2);
• E.g., on 4 procs, each will have copy of allData:
    0 2 4 6
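What exchange does can be emulated with Java threads: every process deposits its contribution in its own slot, all wait at a barrier, then each takes a private copy of the full array. This is a sketch of the semantics only, not of how Titanium implements the primitive; `ExchangeSketch` and the shared staging array are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class ExchangeSketch {
    // Returns result[p] = the copy of allData that emulated process p ends up with.
    public static int[][] run(int procs) {
        final int[] staging = new int[procs];   // one slot per process
        final int[][] result = new int[procs][];
        final CyclicBarrier b = new CyclicBarrier(procs);
        Thread[] threads = new Thread[procs];
        for (int p = 0; p < procs; p++) {
            final int me = p;
            threads[p] = new Thread(() -> {
                staging[me] = me * 2;            // my contribution, as on the slide
                try {
                    b.await();                   // wait until every slot is filled
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
                result[me] = staging.clone();    // each process keeps its own copy
            });
            threads[p].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] copy : run(4)) System.out.println(Arrays.toString(copy));
    }
}
```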
![Page 29: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/29.jpg)
Slide 29
Exchange on Objects
• More interesting example:
    class Boxed {
        public Boxed (int j) {
            val = j;
        }
        public int val;
    }
    Object [1d] single allData;
    allData = new Object [0:Ti.numProcs()-1];
    allData.exchange(new Boxed(Ti.thisProc()));
![Page 30: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/30.jpg)
Slide 30
Use of Global / Local
• As seen, references (pointers) may be remote
– easy to port shared-memory programs
• Global pointers are more expensive than local
– True even when data is on the same processor
– Use local declarations in critical sections
• Costs of global:
– space (processor number + memory address)
– dereference time (check to see if local)
• May declare references as local
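The two costs listed above can be made concrete with a small model. In this sketch a global reference carries a processor number plus an address, and every dereference pays a locality check, even when the data turns out to be local. The class and method names, and the use of an `int[][]` as per-processor memory, are assumptions for illustration and not Titanium's actual runtime representation.

```java
public class GlobalPointerSketch {
    // A global reference: owning processor number + address within that processor's memory.
    public static final class GlobalRef {
        public final int proc;
        public final int addr;

        public GlobalRef(int proc, int addr) {
            this.proc = proc;
            this.addr = addr;
        }
    }

    private final int[][] heaps;   // heaps[p] models the memory owned by processor p
    private int remoteReads = 0;   // counts dereferences that had to leave the processor

    public GlobalPointerSketch(int procs, int wordsPerProc) {
        heaps = new int[procs][wordsPerProc];
    }

    public void writeLocal(int proc, int addr, int value) {
        heaps[proc][addr] = value;
    }

    // Local dereference: a bare address, no check needed.
    public int readLocal(int thisProc, int addr) {
        return heaps[thisProc][addr];
    }

    // Global dereference: the locality check is paid on every access.
    public int readGlobal(int thisProc, GlobalRef ref) {
        if (ref.proc == thisProc) {
            return heaps[thisProc][ref.addr];
        }
        remoteReads++;                       // stand-in for a network fetch
        return heaps[ref.proc][ref.addr];
    }

    public int getRemoteReads() {
        return remoteReads;
    }

    public static void main(String[] args) {
        GlobalPointerSketch m = new GlobalPointerSketch(2, 4);
        m.writeLocal(0, 3, 42);
        GlobalRef gv = new GlobalRef(0, 3);
        System.out.println(m.readGlobal(0, gv) + " remote reads: " + m.getRemoteReads());
        System.out.println(m.readGlobal(1, gv) + " remote reads: " + m.getRemoteReads());
    }
}
```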
![Page 31: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/31.jpg)
Slide 31
Global Address Space
• Processes allocate locally
• References can be passed to other processes
    class C { int val; ….. }
    C gv;        // global pointer
    C local lv;  // local pointer
    if (thisProc() == 0) {
        lv = new C();
    }
    gv = broadcast lv from 0;
    gv.val = …..;
    ….. = gv.val;
[Figure: Process 0 and the other processes, each holding its own lv and gv]
![Page 32: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/32.jpg)
Slide 32
Local Pointer Analysis
• Compiler can infer many uses of local
– “Local Qualification Inference” (Liblit’s work)
• Data structures must be well partitioned
[Figure: “Effect of LQI”; running time (sec), Original vs. After LQI, for the applications cannon, lu, sample, gsrb, poison]
![Page 33: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/33.jpg)
Slide 33
Bulk Asynchronous I/O Performance
[Figure: throughput (MB/sec) vs. file size (MB, 0 to 60) for async, bulkds, bulkraf, dsb, ds, raf]
External sort benchmark on NOW
• raf: random access file (Java)
• ds: unbuffered stream (Java)
• dsb: buffered stream (Java)
• bulkraf: bulk random access (Titanium)
• bulkds: bulk sequential (Titanium)
• async: asynchronous (Titanium)
![Page 34: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/34.jpg)
Slide 34
Performance Heterogeneity
• System performance limited by the weakest link
• Performance heterogeneity is the norm
– disks: inner vs. outer track (50%), fragmentation
– processors: load (1.5-5x)
• Virtual Streams: dynamically off-load I/O work from slower disks to faster ones
[Figure: minimum per-process bandwidth (MB/sec) vs. efficiency of single slow disk (100%, 67%, 39%, 29%), for Ideal, Virtual Streams, and Static]
Slide 35
Parallel performance on an SMP
• Speedup on Ultrasparc SMP (shared memory multiprocessor)
• EM3D performance linear
– simple kernel
• AMR largely limited by
– problem size
– 2 levels, with top one serial
[Figure: speedup vs. number of processors (1, 2, 4, 8) for em3d and amr]
![Page 36: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/36.jpg)
Slide 36
Parallel Performance on a NOW
• MLC for Finite-Differences by Balls and Colella
• Poisson equation with infinite boundaries
– arise in astrophysics, some biological systems, etc.
• Method is scalable
– Low communication
• Performance on
– SP2 (shown) and T3E
– scaled speedups
– nearly ideal (flat)
• Currently 2D and non-adaptive
[Figure: time per fine-patch iteration per processor vs. processors (1, 4, 16), for 129x129/65x65, 129x129/33x33, 257x257/129x129, 257x257/65x65]
![Page 37: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/37.jpg)
Slide 37
Performance on CLUMPs
• Clusters of SMPs (CLUMPs) have two levels of communication
– BH at SDSC has 144 nodes, each with 8 processors
– 8th processor cannot be used effectively
[Figure: GSRB performance with 700x700 patches; time (s) vs. processes (0 to 35), for 1, 2, 4, 7, and 8 p/node]
![Page 38: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/38.jpg)
Slide 38
Cluster of SMPs
• Communication within a node is shared-memory
• Communication between nodes uses LAPI
– for large messages, a separate thread is created by LAPI
– interferes with computation performance
[Figure: aggregate bandwidth with multiple processes; bandwidth (MB/s) vs. data size (bytes, 0 to 70000), for 1, 2, 4, 7, and 8 p/node]
![Page 39: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/39.jpg)
Slide 39
Optimizing Parallel Programs
• Would like compiler to introduce asynchronous communication, which is a form of possible reordering
• Hardware also reorders
– out-of-order execution
– write buffered with read by-pass
– non-FIFO write buffers
• Software already reorders too
– register allocation
– any code motion
• System provides enforcement primitives
– volatile: at the language level not well-defined
– tend to be heavy weight, unpredictable
• Can the compiler hide all this?
![Page 40: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/40.jpg)
Slide 40
Semantics: Sequential Consistency
• When compiling sequential programs, the statements
    x = expr1;
    y = expr2;
  may be reordered to
    y = expr2;
    x = expr1;
  Valid if y not in expr1 and x not in expr2 (roughly)
• When compiling parallel code, this test is not sufficient.
    Initially flag = data = 0
    Proc A              Proc B
    data = 1;           while (flag != 1);
    flag = 1;           ... = ...data...;
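The flag/data handshake can be written in plain Java as a sketch. Under the Java memory model of Java 5 and later (which postdates this talk), declaring `flag` as `volatile` forbids the two reorderings that would break the handshake: the writer's store to `data` cannot move after its store to `flag`, and the reader's load of `data` cannot move before its load of `flag`. The class name and the spin-wait structure are illustrative assumptions; without `volatile`, the reader could legally observe `data == 0` or spin forever.

```java
public class FlagDataSketch {
    private int data = 0;
    private volatile int flag = 0; // volatile orders the accesses around it

    // Runs Proc A and Proc B once and returns the value of data that Proc B read.
    public int run() {
        final int[] seen = new int[1];
        Thread procB = new Thread(() -> {
            while (flag != 1) {
                // spin until Proc A signals
            }
            seen[0] = data; // ordered after the volatile read that saw flag == 1
        });
        Thread procA = new Thread(() -> {
            data = 1;
            flag = 1;       // ordered after the write to data
        });
        procB.start();
        procA.start();
        try {
            procA.join();
            procB.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(new FlagDataSketch().run());
    }
}
```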
![Page 41: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/41.jpg)
Slide 41
Cycle Detection: Dependence Analog
• Processors define a “program order” on accesses from the same thread
    P is the union of these total orders
• Memory system defines an “access order” on accesses to the same variable
    A is access order (read/write & write/write pairs)
• A violation of sequential consistency is a cycle in P U A.
• Intuition: time cannot flow backwards.
[Figure: the accesses write data and write flag on one processor, read flag and read data on the other]
![Page 42: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/42.jpg)
Slide 42
Cycle Detection
• Generalizes to arbitrary numbers of variables and processors
• Cycles may be arbitrarily long, but it is sufficient to consider only cycles with 1 or 2 consecutive stops per processor [Shasha & Snir]
[Figure: example cycle over the accesses write x, write y, read y, read y, write x]
![Page 43: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/43.jpg)
Slide 43
Static Analysis for Cycle Detection
• Approximate P by the control flow graph
• Approximate A by undirected “dependence” edges
• Let the “delay set” D be all edges from P that are part of a minimal cycle
• The execution order of D edges must be preserved; other P edges may be reordered (modulo usual rules about serial code)
• Synchronization analysis also critical [Krishnamurthy]
[Figure: example over the accesses write z, read x, read y, write z, write y, read x]
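A simplified version of the delay-set computation can be sketched as a graph search. Here a program-order edge u -> v is marked as a delay if v can reach u again through program edges (followed forward) and conflict edges (followed in either direction), i.e. if the edge lies on some cycle in P U A. This is a conservative over-approximation chosen for brevity: it does not restrict itself to minimal cycles as the slide's definition does, and the class name and integer encoding of accesses are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DelaySetSketch {
    // n accesses numbered 0..n-1; programEdges are directed {u, v}; conflictEdges are undirected {u, v}.
    public static List<int[]> delaySet(int n, int[][] programEdges, int[][] conflictEdges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : programEdges) adj.get(e[0]).add(e[1]);
        for (int[] e : conflictEdges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        List<int[]> delays = new ArrayList<>();
        for (int[] e : programEdges) {
            // u -> v is on a cycle exactly when v can get back to u.
            if (reachable(adj, e[1], e[0])) delays.add(e);
        }
        return delays;
    }

    private static boolean reachable(List<List<Integer>> adj, int from, int to) {
        boolean[] seen = new boolean[adj.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(from);
        seen[from] = true;
        while (!stack.isEmpty()) {
            int u = stack.pop();
            if (u == to) return true;
            for (int v : adj.get(u)) {
                if (!seen[v]) {
                    seen[v] = true;
                    stack.push(v);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Flag/data example: 0 = write data, 1 = write flag (Proc A); 2 = read flag, 3 = read data (Proc B).
        int[][] p = {{0, 1}, {2, 3}};
        int[][] a = {{1, 2}, {0, 3}};
        System.out.println(delaySet(4, p, a).size() + " delay edges");
    }
}
```

On the flag/data example both program edges come out as delays, so neither processor's pair of accesses may be reordered; if the conflict on data is removed, no cycle remains and the delay set is empty.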
![Page 44: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/44.jpg)
Slide 44
Automatic Communication Optimization
• Implemented in subset of C with limited pointers
• Experiments on the NOW; 3 synchronization styles
• Future: pointer analysis and optimizations
[Figure: time (normalized)]
![Page 45: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/45.jpg)
Slide 45
Specific Projects
• ISTORE
– A reliable, scalable, maintainable storage system
• Data-intensive applications for “backend” servers
– Modeling the real world
– Storing and finding information
• Titanium
– A high level language (Java) with high performance
– A domain-specific language and optimizing compiler
• Sparsity
– Optimization using partial program input
![Page 46: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/46.jpg)
Slide 46
Sparsity: Sparse Matrix Optimizer
• Several data mining or web search algorithms use sparse matrix-vector multiplication
– use for documents, images, video, etc.
– irregular, indirect memory patterns perform poorly on memory hierarchies
• Performance improvements possible, but depend on:
– sparsity structure, e.g., keywords within documents
– machine parameters without analytical models
• Good news:
– operation repeated many times on similar matrix
– Sparsity: automatic code generator based on matrix structure and machine
![Page 47: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/47.jpg)
Slide 47
Sparsity: Sparse Matrix Optimizer
![Page 48: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/48.jpg)
Slide 48
Summary
• Future
– small devices + larger servers
– reliability increasingly important
• Reliability techniques include
– hardware: redundancy, monitoring
– software: better languages, many others
• Performance trades off against safety in languages
– use of domain-specific features (e.g., Titanium)
![Page 49: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/49.jpg)
Slide 49
Backup Slides
![Page 50: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/50.jpg)
Slide 50
The Big Motivators for Programming Systems Research
• Ease of Programming
– Hardware costs -> 0
– Software costs -> infinity
• Correctness
– Increasing reliance on software increases cost of software errors (medical, financial, etc.)
• Performance
– Increasing machine complexity
– New languages and applications
» Enabling Java; network packet filters
![Page 51: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/51.jpg)
Slide 51
The Real Scalability Problems: AME
• Availability
– systems should continue to meet quality of service goals despite hardware and software failures and extreme load
• Maintainability
– systems should require only minimal ongoing human administration, regardless of scale or complexity
• Evolutionary Growth
– systems should evolve gracefully in terms of performance, maintainability, and availability as they are grown/upgraded/expanded
• These are problems at today’s scales, and will only get worse as systems grow
![Page 52: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/52.jpg)
Slide 52
Research Principles
• Redundancy everywhere, no single point of failure
• Performance secondary to AME
– Performance robustness over peak performance
– Dedicate resources to AME
» biological systems use > 50% of resources on maintenance
– Optimizations viewed as AME-enablers
» e.g., use of (slower) safe languages like Java with static and dynamic optimizations
• Introspection
– reactive techniques to detect and adapt to failures, workload variations, and system evolution
– proactive techniques to anticipate and avert problems before they happen
![Page 53: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/53.jpg)
Slide 53
Outline
• Motivation
• Hardware Techniques
– general techniques
– ISTORE projects
• Software Techniques
• Availability Benchmarks
• Conclusions
![Page 54: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/54.jpg)
Slide 54
Hardware techniques
• Fully shared-nothing cluster organization
– truly scalable architecture, automatic redundancy
– tolerates partial hardware failure
• No Central Processor Unit: distribute processing with storage
– Most storage servers limited by speed of CPUs; why does this make sense?
– Amortize sheet metal, power, cooling infrastructure for disk to add processor, memory, and network
• On-demand network partitioning/isolation
– Applications must tolerate these anyway
– Allows testing, repair of online system
![Page 55: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/55.jpg)
Slide 55
Hardware techniques
• Heavily instrumented hardware
– sensors for temp, vibration, humidity, power
• Independent diagnostic processor on each node
– remote control of power, console, boot code
– collects, stores, processes environmental data
– connected via independent network
• Built-in fault injection capabilities
– Used for proactive hardware introspection
» automated detection of flaky components
» controlled testing of error-recovery mechanisms
– Important for AME benchmarking
![Page 56: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/56.jpg)
Slide 56
ISTORE-2 Hardware Proposal
• Smaller disks
– replace 3.5” disks with 2.5” or 1” drives
» 340MB available now in 1”, 1 GB next year (?)
• Smaller, more highly integrated processors
– E.g., Transmeta Crusoe includes processor and Northbridge (interface) functionality in 1 Watt
– Xilinx FPGA for Southbridge, diagnostic proc, etc.
• Larger scale
– Roughly 1000 nodes, depending on support
» ISTORE-1 built with donated disks, memory, processors
» Paid for network, board design, enclosures (discounted)
![Page 57: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/57.jpg)
Slide 57
Outline
• Motivation
• Hardware Techniques
• Software Techniques
– general techniques
– Titanium: a high performance Java dialect
– Sparsity: using dynamic information
– Virtual streams: performance robustness
• Availability Benchmarks
• Conclusions
![Page 58: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/58.jpg)
Slide 58
Software Techniques
• Fault tolerant data structures
– Application controls replication, checkpointing, and consistency policy
– Self-scrubbing used to identify software errors that have corrupted application state
• Encourage use of safe languages
– Type safety and automatic memory management avoid a host of application errors
– Use of static and dynamic information to meet performance needs
• Runtime adaptation to performance heterogeneity
– e.g., outer vs. inner track (1.5X), fragmentation
– Evolution of systems adds to this problem
![Page 59: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/59.jpg)
Slide 59
Software Techniques
• Reactive introspection
– Use statistical techniques to identify normal behavior and detect deviations from it
» e.g., network activity, response time, program counter (?)
– Semi-automatic response to abnormal behavior
» initially, rely on human administrator
» eventually, system learns to set response parameters
• Proactive introspection
– Continuous online self-testing
» in deployed systems!
» goal is to shake out bugs in failure response code on isolated subset
» use of fault-injection and stress testing
![Page 60: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/60.jpg)
Slide 60
Techniques for Safe Languages
Titanium: A high performance dialect of Java
• Scalable parallelism
– A global address space, but not shared memory
– For tightly-coupled applications, e.g., mining
– Safe, region-based memory management
• Scalar performance enhancements, some specific to application domain
– immutable classes (avoids indirection)
– multidimensional arrays with subarrays
• Application domains
– scientific computing on grids
» typically +/-20% of C++/F in this domain
– data mining in progress
![Page 61: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/61.jpg)
Slide 61
Use of Static Information
• Titanium compiler performs parallel optimizations
– communication overlap (40%) and aggregation
• Uses two new analyses
– synchronization analysis: the parallel analog to control flow analysis
» identifies code segments that may execute in parallel
– shared variable analysis: the parallel analog to dependence analysis
» recognize when reordering can be observed by another processor
» necessary for any code motion or use of relaxed memory models in hardware => missed or illegal optimizations
![Page 62: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/62.jpg)
Slide 62
Conclusions
• Two key application domains
– Storage: loosely coupled
– Search: tightly coupled, computation important
• Key challenges to future servers are:
– Availability, Maintainability, and Evolutionary growth
• Use of self-monitoring to satisfy AME goals
– Proactive and reactive techniques
• Use of static techniques for high performance and reliable software
– Titanium extension of Java
• Use of dynamic information for performance robustness
– Sparsity and Virtual Streams
• Availability benchmarks a powerful tool?
![Page 63: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/63.jpg)
Slide 63
Projects and Participants
ISTORE: iram.cs.berkeley.edu/istore
With James Beck, Aaron Brown, Daniel Hettena, David Oppenheimer, Randi Thomas, Noah Treuhaft, David Patterson, John Kubiatowicz
Titanium: www.cs.berkeley.edu/projects/titanium
With Greg Balls, Dan Bonachea, David Gay, Ben Liblit, Chang-Sun Lin, Peter McCorquodale, Carleton Miyamoto, Geoff Pike, Alex Aiken, Phil Colella, Susan Graham, Paul Hilfinger
Sparsity: www.cs.berkeley.edu/~ejim/sparsity
With Eun-Jin Im
![Page 64: System Support for Data-Intensive Applications](https://reader035.vdocument.in/reader035/viewer/2022062718/56812ced550346895d91b3bd/html5/thumbnails/64.jpg)
Slide 64
History of Programming Language Research
[Figure: timeline across the 70s, 80s, 90s, and 2K of research topics: Flop Optimization; General Purpose Language Design; Parsing Theory; Domain-Specific Language Design; Type Systems Theory; Memory Optimizations; Garbage Collection; Threads; Program Verification; Program Checking Tools; Data and Control Analysis; Type-Based Analysis]