Robust NEST Systems Minitask Report
Lockheed Martin, MIT, Mitre, OSU, UC Berkeley, University of Virginia, Vanderbilt
Santosh Kumar @ OSU December 2003
Robustness Questions
Given all NEST demos fielded this year, we are in a position to consider:
• What robustness properties can be claimed of extant NEST (middleware and) systems?
• What robustness issues have been observed by various teams that need to be resolved?
• What low technology/cost strategies would defeat/diminish robustness of NEST systems?
List of Sources
• Field experiment robustness experience reports (centred on the Mica2 platform):
  MIT: Fort Benning Grid
  OSU: A Line in the Sand
  UVa: Waking Up Big Brother
  Vandy: Shooter Localization (includes many suggestions)
• Lockheed Martin categorization of various failure scenarios that need to be handled by applications
• Mitre evaluation of various short-, medium-, and long-term robustness issues
Our Job
When I said, “The Red Team should ‘shoot down’ the Blue Team’s proposal”, I didn’t have paintball guns in mind.
• Our task was to search out robustness problems.
• We focused on transition and deployment impact. The program’s success metric is transition.
• We did not design an anti-netted-sensor system. We did think about countermeasures. We don’t think this is the top problem. Maybe 2 years from now.
Multiple Scales of Consideration
• When viewed on a short time scale, robustness issues have a different character than when viewed on a long time scale.
One (of many) taxonomies:
• Flaws and implementation issues.
• Engineering issues.
• Technology issues.
• Fundamental science issues.
[Figure: relative effort versus time scale (1 week, 1 month, 1 year, 5 years, 10 years) for debugging, engineering, technology, and science.]
Flaws and Bugs
Timer Module (9):
• UCB timer module can exhibit up to 10% error. May have been fixed? UCB clock is dead-on.
• VU timer changes the semantics; less general. VU clock is finer grained than UCB clock (a rare problem).
• UO timer not yet available.
• Timers are blocked by tasks. Almost impossible to be sure no task will be running when the timer event occurs.
Antennas (9):
• Monopole antenna with no ground plane.
• Antenna connectors require special tools (easily broken).
• We think there’s also an impedance mismatch.
MICA2 MAC layer (8):
• Abstracts away needed timing control.
• Have retry bugs been fixed?
Anti-Aliasing Filter (8):
• Acoustic sensor can’t be used without an anti-aliasing filter.
Flaws and Bugs (cont.)
Flash (8):
• SRAM is about the right size for stack and local variables. Everything else should go in non-volatile memory.
• Non-volatile memory uses negligible power when sleeping. Can be 100 or 10k times bigger for same power level. Assuming low duty cycle.
• External flash is too slow. There exists a “fast” write method, but it is not used.
• External flash is also too small.
Test Suite (7):
• Some better than others.
• Few showed up with adequate test suites.
Fault Identification (6):
• Need better methods of identifying faults.
• Hardware and software faults.
Flaws and Bugs (cont.)
Wireless Reprogramming (6):
• Even the single-hop version corrupts the memory too often. No wireless recovery possible.
• GenericBase chokes on high volume of data.
• Does not support the 38.4 kbps transfer rate of MICA2.
Misc (5):
• Degaussing circuit.
• Wind guard on the microphone.
• Temperature sensitivity.
• General maintenance rate. 5% per day?
Battery Management (4):
• I have a drawer full of batteries that are about 15% used. Don’t work in motes, but do work in most other devices. Mote battery death occurs when voltage drops.
• Software measurement of remaining capacity (not useful).
Programming Boards (3):
• Burn motes if turned on and external power is connected.
• Old board often fails to reprogram.
Engineering Challenges
Sensor Range (9):
• Disruptive tech. usually offers something fundamentally new, in exchange for lower performance according to the legacy metrics. The “legacy metric” seems to be “sensing area per dollar”.
• We will be allowed a higher cost per area if we offer new capabilities. This is good, since bigger nodes tend to be cheaper per area. Similarities to Grosch’s Law in computer architecture.
• However, jumping to a 3 m range is too big a jump.
[Figure: log of performance versus time for Generation 0 and Generation 1, with a customer-needs line, an “Overly Aggressive Transition Target”, and an “Appropriate Transition Target”.]
Engineering Challenges (cont.)
Sensor Range (cont.):
• Concealment needs limit node size per coverage area. Mica2 ~20 sq cm. One every 15 m might be “lost” in the environment. One every meter easily found. Useful area ratios 1e5 to 1e6.
• Sensor range must be greater than the average node spacing (~1.3x).
Node      Area        10 ppm   1 ppm
Laptop    85e-3 m2    52 m     165 m
4” x 6”   15e-3 m2    22 m     70 m
Mica2     2e-3 m2     8 m      25 m
Marble    200e-6 m2   2.5 m    8 m
Density requirement stemming from concealment criteria.
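The table values appear consistent with a simple reading: one node per circle of radius r, where the node footprint is a fixed fraction (10 ppm or 1 ppm) of the circle's area. The sketch below is only that reading, not the authors' stated method; the function and variable names are hypothetical.

```python
import math

# Node footprints in square metres, taken from the table above.
NODE_AREAS_M2 = {"Laptop": 85e-3, '4" x 6"': 15e-3, "Mica2": 2e-3, "Marble": 200e-6}


def concealment_radius_m(node_area_m2: float, area_ratio: float) -> float:
    """Radius r such that node_area / (pi * r^2) equals area_ratio.

    Assumption: the table's distances are radii of the area "owned" by
    each node; the slides do not say this explicitly.
    """
    covered_area_m2 = node_area_m2 / area_ratio
    return math.sqrt(covered_area_m2 / math.pi)


def concealment_table() -> dict:
    """Map node name -> (radius at 10 ppm, radius at 1 ppm), in metres."""
    return {
        name: (concealment_radius_m(area, 10e-6), concealment_radius_m(area, 1e-6))
        for name, area in NODE_AREAS_M2.items()
    }


if __name__ == "__main__":
    for name, (r10, r1) in concealment_table().items():
        print(f"{name:8s} {r10:6.1f} m {r1:6.1f} m")
```

Under this reading the computed radii land within rounding distance of the table entries, which suggests the 10 ppm and 1 ppm columns correspond to the 1e5 and 1e6 area ratios quoted in the bullet.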
Engineering Challenges (cont.)
Synchronization Metric (9):
• 30 sec to achieve 8 μs sync.
• Drifts 30 μs every sec.
• Good for ~1/4 sec. Assuming ±8 μs drift, i.e., total error ±16 μs.
• For many synchronization models, accuracy is proportional to synchronization rate, i.e., over some region.
• Desirable metrics are:
1. Accuracy per duty cycle.
2. Range of applicability.
• Alternative metrics, e.g., if the common model doesn’t apply:
1. Accuracy at 0.5% duty cycle.
2. Duty cycle at 1 ms accuracy.
[Figure: duty cycle versus worst-case error in seconds (log-log axes), with curves for 10 ms, 100 ms, and 1 s sync overhead and a marker for the current metric.]
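The trade-off between accuracy and duty cycle can be sketched with a linear drift model. This is an assumed model, not one given in the slides: worst-case error grows as drift rate times the time since the last sync, and duty cycle is sync overhead divided by the resync period. Function names are hypothetical.

```python
def hold_time_s(error_budget_s: float, drift_s_per_s: float) -> float:
    """How long a clock stays within the error budget after a sync.

    Assumes purely linear drift (an illustrative simplification).
    """
    return error_budget_s / drift_s_per_s


def duty_cycle(sync_overhead_s: float, worst_case_error_s: float,
               drift_s_per_s: float) -> float:
    """Fraction of time spent synchronizing to hold a given worst-case error.

    The resync period is the hold time for that error; a result above 1.0
    means resynchronization cannot keep up with the drift.
    """
    resync_period_s = hold_time_s(worst_case_error_s, drift_s_per_s)
    return sync_overhead_s / resync_period_s


if __name__ == "__main__":
    drift = 30e-6  # 30 microseconds per second, as quoted above
    print(hold_time_s(8e-6, drift))          # roughly a quarter second
    for overhead in (0.01, 0.1, 1.0):        # the three overheads in the figure
        print(overhead, duty_cycle(overhead, 1e-3, drift))
```

With the quoted 30 μs/s drift, an 8 μs budget holds for about 0.27 s, matching the "good for ~1/4 sec" bullet. If the 30 s figure is read as the sync overhead, the ratio comes out far above 1, which may be why the current metric is singled out.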
Engineering Challenges (cont.)
Flash-Based Data Store (8):
• Read cost comparable to SRAM. Not with the 3-bit serial interface used in the Mote.
• Write cost ~6 times read cost.
• Erase cost ~400 times read cost. Cost per byte can be made low with larger blocks.
• Well-known secret: use log-structured file systems. Always write to end of log. Update by writing new copy. Clean blocks before erasing. Well suited for garbage collectors.
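The log-structured recipe above (append only, update by writing a new copy, clean a block before erasing it) can be sketched as a toy key/value store. This is a minimal illustration, not mote code: the class, its block layout, and the cleaning policy are all hypothetical, and the cost constants are just the relative ratios quoted in the bullets.

```python
READ_COST, WRITE_COST, ERASE_COST = 1, 6, 400  # relative costs from the bullets


class LogStore:
    """Toy log-structured store over fixed-size erase blocks (illustrative)."""

    def __init__(self, num_blocks=4, slots_per_block=4):
        self.blocks = [[] for _ in range(num_blocks)]  # slots hold (key, value)
        self.slots_per_block = slots_per_block
        self.head = 0     # block currently being appended to
        self.index = {}   # key -> (block, slot) of the live copy
        self.cost = 0     # accumulated relative cost

    def _live(self, b):
        """Entries in block b that are still the newest copy of their key."""
        return [(k, v) for s, (k, v) in enumerate(self.blocks[b])
                if self.index.get(k) == (b, s)]

    def _advance_head(self):
        """Find a block with reclaimable space, clean it, and append there."""
        n = len(self.blocks)
        for step in range(1, n + 1):
            b = (self.head + step) % n
            live = self._live(b)
            if len(live) < self.slots_per_block:
                self.cost += READ_COST * len(live)   # copy out live entries
                if self.blocks[b]:
                    self.cost += ERASE_COST          # one erase per block
                self.blocks[b] = []
                self.head = b
                for k, v in live:                    # rewrite survivors
                    self.put(k, v)
                return
        raise RuntimeError("store full: no reclaimable block")

    def put(self, key, value):
        """Append a new copy; older copies become garbage."""
        if len(self.blocks[self.head]) == self.slots_per_block:
            self._advance_head()
        block = self.blocks[self.head]
        block.append((key, value))
        self.index[key] = (self.head, len(block) - 1)
        self.cost += WRITE_COST

    def get(self, key):
        b, s = self.index[key]
        self.cost += READ_COST
        return self.blocks[b][s][1]
```

Because every erase reclaims a whole block, the erase cost is amortized over all the slots in that block, which is one way to read the "cost per byte can be made low with larger blocks" remark.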
Engineering Challenges (cont.)
Service Composition Model (8):
• Services developed and demonstrated in isolation can’t be combined.
• Key problem is timing conflicts. Timing knowledge is implicit.
• May have heard a viable standard: Use pseudo-random timing. Low duty cycle service. Timing collisions result in clean event loss. All servers can handle occasional event loss.
• If this is “the” composition method, it’s underused.
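A rough feel for how much event loss pseudo-random timing costs can be had from a slotted model. The model is an assumption made for illustration only: time is divided into slots, each service is independently active in a slot with probability equal to its duty cycle, and an event is lost whenever any other service is active in the same slot. The function names are hypothetical.

```python
import random


def expected_loss_fraction(other_duty_cycles):
    """Probability that at least one other service is active in a given slot."""
    p_clear = 1.0
    for d in other_duty_cycles:
        p_clear *= (1.0 - d)
    return 1.0 - p_clear


def simulate_loss_fraction(own_duty, other_duty_cycles, slots=200_000, seed=0):
    """Monte Carlo check of the same model with a seeded generator."""
    rng = random.Random(seed)
    events = lost = 0
    for _ in range(slots):
        if rng.random() < own_duty:          # this service fires in the slot
            events += 1
            if any(rng.random() < d for d in other_duty_cycles):
                lost += 1                    # collision: clean event loss
    return lost / events if events else 0.0


if __name__ == "__main__":
    others = [0.01, 0.01]
    print(expected_loss_fraction(others))
    print(simulate_loss_fraction(0.05, others))
```

Under these assumptions the loss seen by one service is roughly the sum of the other services' duty cycles when those are small, so low-duty-cycle services that tolerate occasional loss compose without explicit timing coordination.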
LPI and LPD (7):
• All active signals must be below the noise floor (at receiver). Must track noise level.
• SNR determines the ratio of signal range to discovery range. Typically need coding gain plus SNR to be about +6 dB.
• Detectability: 1 spot in 1000 might need 36 dB coding gain. 1 s interval; 17 min. per node.
Signal range   0 dB    -10 dB   -20 dB
10 m           10 m    3.1 m    1 m
30 m           30 m    9.3 m    3 m
100 m          100 m   32 m     10 m
Area           100%    10%      1%
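The table is roughly what an inverse-square power falloff would give: each 10 dB drop in SNR at the intended receiver cuts the discoverable area by 10x and the discovery range by about 3.2x. The sketch below assumes free-space propagation, which the slides do not state; the function names are hypothetical.

```python
import math


def discovery_range_m(signal_range_m: float, snr_db: float) -> float:
    """Range at which the signal rises to the noise floor.

    Assumes received power falls off as 1/r^2, so range scales as
    10^(snr_db / 20).
    """
    return signal_range_m * 10 ** (snr_db / 20.0)


def discoverable_area_fraction(snr_db: float) -> float:
    """Discoverable area relative to the area covered at 0 dB."""
    return 10 ** (snr_db / 10.0)


def coding_gain_db(area_fraction: float, margin_db: float = 6.0) -> float:
    """Coding gain needed so that coding gain plus SNR equals the margin."""
    return -10.0 * math.log10(area_fraction) + margin_db


if __name__ == "__main__":
    for rng_m in (10, 30, 100):
        print(rng_m, [round(discovery_range_m(rng_m, s), 1) for s in (0, -10, -20)])
    print(coding_gain_db(1e-3))  # the "1 spot in 1000" case
```

With a +6 dB margin, restricting detectability to one spot in 1000 (an area fraction of 1e-3, or -30 dB) gives the 36 dB coding gain quoted in the bullet.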
Engineering Challenges (cont.)
Packaging (7):
• Concealment.
• Sensor/environment interfaces.
• Hydrophones.
Better Non-Volatile Memory (6):
• Several new non-volatile memory technologies. Ferroelectric memories. Magneto-resistive memories. Ovonic unified memories.
• Ideal for low-duty-cycle designs.
Debugging Harness (4):
• Useful for development; not used in deployment.
• Distributed debugging may require far higher comm. rate than the actual application.
• Development cycles use far more power than deployment.
Technology Challenges
Over-the-air Programming (10):
• Efficient reliable multicast.
• OS-style security. An errant program should not be able to prevent loading.
• Dynamic linking and loading. Make incremental structure explicit rather than trying to discover it after the fact. Extra finer grained.
• Multiple levels of security: Factory approved loads. Platforms (third party code).
Over-the-air Programming (cont.):
• Platform grade security: Protection from live-lock. Protection from dead-lock. Protection from corruption. May require preemption?
• Issue will be around for years. Phenomenal progress. Good enough for FY04. We’re far from commercial.
Technology Challenges (cont.)
Almost Always Off Comms (9):
• Tends to violate traditional comm. assumptions.
• Complex trade space. Higher power routes may be lower latency. Wakeup rate vs. latency. Different tasks may require different wakeup rates. Emergent scheduling vs. centralized scheduling. Randomized schedules. Combining multiple services yields complex schedules. Different states will require different wakeup rates.
Node Localization (8):
• Must be very robust.
• Multipath is key problem.
• Improved ranging.
• Not sure it’s really this hard.
Technology Challenges (cont.)
New Sensor Modes (IFF) (7):
• What is the best way to detect people? Nature suggests not sound. Really want electronic olfactory.
• Are there any sensor modes for identifying combatants?
• Which environments require or benefit from proximal sensing?
• What quantities are most useful in more congested environments? Chem., bio., speech, power usage, civilian flight, …
Tragedy of the Commons (6):
• No incentive to be a responsible user of shared resources.
• No enforcement.
• Not even a widely agreed upon definition of what is fair use.
• Want better wealth distribution, but not communism.
Technology Challenges (cont.)
Byzantine Behavior Model (6):
• Unavoidable: Some nodes jabber continuously. Some sensors “see ‘reds’ under their beds”. In a real deployment a few nodes may be compromised.
• Need mixed probabilistic and worst-case analysis framework, e.g., detection and tracking.
• Also need robustness with respect to event loss.
• Would greatly improve the viability of security.
Technology Challenges (cont.)
Small Antennas (5):
• We built a proper monopole antenna (per user manual). It worked great. In almost all environments. Of course, it’s useless.
• Building small antennas is hard. Need to adapt to near-field environment (e.g., loading). Incentive for longer wavelength (e.g., foliage). Software adjustable antennas.
• High dielectric antennas.
Fundamental Scientific Issues
Signal Processing Power (8):
• For a technology family, joules-per-bit-op is nearly constant over a wide range of node size.
• Key to the vision of small nodes. Given a fixed energy supply, not much incentive to use a more powerful node. Mica2 gets ~3 GbOPJ. State of the art ~20 GbOPJ. Not ~300 GbOPJ.
• Moore’s law helps this metric slowly (compared to intuition). Doubles every ~4 to 6 years.
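To see how slowly a 4 to 6 year doubling time moves the efficiency metric, the wait for a given improvement can be computed directly. This is plain exponential-growth arithmetic under the doubling times quoted above; the function name is hypothetical.

```python
import math


def years_to_improve(current: float, target: float, doubling_years: float) -> float:
    """Years for a metric that doubles every doubling_years to grow from
    current to target (same units for both, e.g. GbOPJ)."""
    return math.log2(target / current) * doubling_years


if __name__ == "__main__":
    # Figures quoted in the slide: ~3 GbOPJ (Mica2), ~20 and ~300 GbOPJ.
    for start, goal in ((3, 20), (20, 300)):
        low = years_to_improve(start, goal, 4)
        high = years_to_improve(start, goal, 6)
        print(f"{start} -> {goal} GbOPJ: {low:.0f} to {high:.0f} years")
```

On these numbers, scaling alone takes on the order of a decade or more to close either gap, which is the motivation for the architectural wins discussed next.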
Signal Processing Power (cont.):
• Big wins can be had. ASICs are typically 400x more efficient. Fully custom designs may be 1000x more efficient.
• Requires use of non-GPP. DSPs 4 to 12x. PolyMorphic computing. FPGAs 100x.
• The alternative is to use complex signal processing algorithms. Clearly part of the vision. May not be enough.
Fundamental Scientific Issues (cont.)
N-Log-N (8):
• Taxonomy of scaling: O(N); centralized/monolithic. O(sqrt(N)) or O(cbrt(N)); non-scalable. O(N Ln(N)); quasi-scalable. O(N); absolutely scalable.
• However, can’t implement a spanning tree with motes. Comm. range limits size.
• N ln(N) implies log2(N) layers, i.e., 10 to 20 layers (not 2).
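The "10 to 20 layers" figure follows from counting how many doublings it takes to span N nodes. A minimal sketch, assuming each layer aggregates a fixed fan-out of the layer below (fan-out 2 gives the log2(N) count in the bullet); the function name and the network sizes in the example are illustrative.

```python
def hierarchy_layers(num_nodes: int, fanout: int = 2) -> int:
    """Smallest number of layers L such that fanout**L >= num_nodes.

    Uses integer arithmetic to avoid floating-point rounding in log().
    """
    layers, reach = 0, 1
    while reach < num_nodes:
        reach *= fanout
        layers += 1
    return layers


if __name__ == "__main__":
    for n in (1_000, 1_000_000):  # illustrative network sizes
        print(n, hierarchy_layers(n))
```

A thousand nodes needs 10 binary layers and a million needs 20, which brackets the range quoted above.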
N-Log-N (cont.):
• Not going to build 13 different sized motes.
• But we could build configurable motes. SDR allows you to vary the comm. range. You can vary the clock rate. You may even be able to reconfigure the CPU. Use a sentry-like service to vary the power rate.
• Still need 2 or 3 types of hardware, but not 10 to 20.
Fundamental Scientific Issues (cont.)
Programmable Analog Triggers (7):
• Negligible-power wakeup triggers can be built for most <sensor, target, app> triplets. Allow lower duty cycles. Simplify software; no polling of environment.
• However, such triggers would have to be configurable in situ.
• Need the analog counterpart to FPGAs.
Signal Processing Methods (7):
• Legacy signal processing methods assume: Far-field. Essence of problem is extracting signal in low SNR. Abundant computation.
• Next generation of signal processing will need to assume: Near-field. Special structure of signals is observable and useful. Essence of problem is finding the few high SNR signals. Computation is battery.
Fundamental Scientific Issues (cont.)
Non-Radio Comms (6):
• Short range and low bandwidth may not favor RF comm. Acoustic comm. may work. E-field comm. will work. Laser comm. IrDA.
Chaos Theory (5):
• A lot of work; not many answers.
• Controlling emergent behavior will eventually be a critical problem.
Laundry List Effect
• A major problem with “top-10” lists is they invite a “Laundry List” response from the audience.
• I’m not trying to create a checklist; I’m suggesting priorities.
Weighting Functions
[Figure: the relative-effort versus time-scale chart (debugging, engineering, technology, science; 1 week to 10 years) repeated for three perspectives. Researcher's Perspective: regions labeled “Too Commercial” and “Too Easy”. System Integrator Perspective: region labeled “Too High Risk”. DOD Transition Perspective: regions labeled “Not Yet Ready” and “Low Risk”.]
Disruptive Technology 101
• Positive feedback exists in technology adoption. Sales volume -> lower costs -> sales volume.
• If the feedback is strong enough, the timing of the technology transition becomes chaotic. Sensitive to such small events as to approach randomness.
• New technology may be extremely difficult. The P4 design team was bigger than the Manhattan Project design team.
• The pace of change is fast.
• It’s not always clear which technology will win.
• The new technology may be in an unrelated field. Thin film disks required replacing lots of MEs with EEs.
This may be why technologists fail; it’s not why (properly managed) technology companies fail.
Disruptive Technology 101 (cont.)
• Most abrupt changes in technology are not disruptive. Most of the time the leader in the old technology is the first mover and the eventual leader in the new technology.
• The disruptive transitions occur when performance outpaces customers’ needs.
• Successful customers and companies anticipate the sustaining changes.
• Disruption occurs when the “low-tech” solution wins.
• New tech. under-performs, but has other advantages. Usually fundamentally different advantages.
• New market becomes large and subsumes old market.
[Figure: performance versus time for Tech. A, Tech. B, Tech. C, and Tech. D, with a customer-needs line.]
Driving force is that Moore’s law outruns any sensible growth in demand.