
Robust NEST Systems Minitask Report

Lockheed Martin
MIT
Mitre
OSU
UC Berkeley
University of Virginia
Vanderbilt

Santosh Kumar @ OSU December 2003


Robustness Questions

Given all NEST demos fielded this year, we are in a position to consider:

• What robustness properties can be claimed of extant NEST (middleware and) systems?

• What robustness issues have been observed by various teams that need to be resolved?

• What low technology/cost strategies would defeat/diminish robustness of NEST systems?


List of Sources

• Field experiment robustness experience reports (centred on the Mica2 platform):
    MIT: Fort Benning Grid
    OSU: A Line in the Sand
    UVa: Waking Up Big Brother
    Vandy: Shooter Localization (includes many suggestions)

• Lockheed Martin categorization of various failure scenarios that need to be handled by applications

• Mitre evaluation of various short-, medium-, and long-term robustness issues

Robustness MiniTask
Red Team Perspective

Kenneth W. Parker, Ph.D.

December 17, 2003


Our Job

When I said, “The Red-Team should ‘shoot down’ the Blue-Team’s proposal”, I didn’t have paint-ball guns in mind.

• Our task was to search out robustness problems.

• We focused on transition and deployment impact. The program’s success metric is transition.

• Did not design an anti-netted-sensor system. We did think about counter-measures. Don’t think this is the top problem. Maybe 2 years from now.


Multiple Scales of Consideration

• When viewed on a short time-scale robustness issues have a different character than when viewed on a long time-scale.

One (of many) taxonomies:
• Flaws and implementation issues.
• Engineering issues.
• Technology issues.
• Fundamental science issues.

Figure: relative effort versus time scale (1 week, 1 month, 1 year, 5 years, 10 years) for debugging, engineering, technology, and science.


Flaws and Bugs

Timer Module (9):
• UCB timer module can exhibit up to 10% error. May have been fixed ??? UCB clock is dead-on.
• VU timer changes the semantics; less general. VU clock is finer grained than UCB clock (a rare problem).
• UO timer not yet available.
• Timers are blocked by tasks. Almost impossible to be sure no task will be running when the timer event occurs.

Antennas (9):
• Monopole antenna with no ground plane.
• Antenna connectors require special tools (easily broken).
• We think there’s also an impedance mismatch.

MICA2 MAC layer (8):
• Abstracts away needed timing control.
• Have retry bugs been fixed ???

Anti-Aliasing Filter (8):
• Acoustic sensor can’t be used without an anti-aliasing filter.


Flaws and Bugs (cont.)

Flash (8):
• SRAM is about the right size for stack and local variables. Everything else should go in non-volatile memory.
• Non-volatile memory uses negligible power when sleeping. Can be 100 or 10k times bigger for same power level. Assuming low duty cycle.
• External flash is too slow. There exists a “fast” write method, but it is not used.
• External flash is also too small.

Test Suite (7):
• Some better than others.
• Few showed up with adequate test suites.

Fault Identification (6):
• Need better methods of identifying faults.
• Hardware and software faults.


Flaws and Bugs (cont.)

Wireless Reprogramming (6):
• Even the single-hop version corrupts the memory too often. No wireless recovery possible.
• GenericBase chokes on high volume of data.
• Does not support the 38.4 kbps transfer rate of MICA2.

Misc (5):
• Degaussing circuit.
• Wind guard on the microphone.
• Temperature sensitivity.
• General maintenance rate. 5% per day?

Battery management (4):
• I have a drawer full of batteries that are about 15% used. Don’t work in motes, but do work in most other devices. Mote battery death occurs when voltage drops.
• Software measurement of remaining capacity (not useful).

Programming Boards (3):
• Burn motes if turned on and external power is connected.
• Old board often fails to reprogram.


Engineering Challenges

Sensor Range (9):
• Disruptive tech. usually offers something fundamentally new, in exchange for lower performance according to the legacy metrics. The “legacy metric” seems to be, “sensing area per dollar”.
• We will be allowed a higher cost per area if we offer new capabilities. This is good, since bigger nodes tend to be cheaper per area. Similarities to Grosch’s Law in computer architecture.
• However, jumping to a 3 m range is too big a jump.

Figure: log of performance versus time for Generation 0 and Generation 1, compared with customer needs, marking an overly aggressive transition target and an appropriate transition target.


Engineering Challenges (cont.)

Sensor Range (cont):
• Concealment needs limit node size per coverage area. Mica2 ~20 sq cm. One every 15 m might be “lost” in the environment. One every meter easily found. Useful area ratios 1e5 to 1e6.
• Sensor range must be greater than average density (~ 1.3x).

Node      Area        10 ppm   1 ppm
Laptop    85e-3 m2    52 m     165 m
4” x 6”   15e-3 m2    22 m     70 m
Mica2     2e-3 m2     8 m      25 m
Marble    200e-6 m2   2.5 m    8 m

Density requirement stemming from concealment criteria.
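
The distances in the concealment table appear consistent with treating each node as the centre of a circular cell whose area is the node footprint divided by the allowed area ratio (10 ppm or 1 ppm). A minimal Python sketch under that assumption; the circular-cell model and the name `concealment_radius_m` are illustrative and not taken from the slides:

```python
import math

def concealment_radius_m(node_area_m2, area_ratio):
    """Radius of the circle whose area equals node_area / area_ratio.

    Assumption: one node per circular cell; area_ratio is the fraction
    of the ground covered by hardware (10 ppm = 10e-6).
    """
    coverage_area_m2 = node_area_m2 / area_ratio
    return math.sqrt(coverage_area_m2 / math.pi)

# Node footprints (m^2) as listed in the table.
nodes = {"Laptop": 85e-3, "4 x 6 in": 15e-3, "Mica2": 2e-3, "Marble": 200e-6}

for name, area in nodes.items():
    r10 = concealment_radius_m(area, 10e-6)
    r1 = concealment_radius_m(area, 1e-6)
    print(f"{name:10s} {r10:6.1f} m at 10 ppm  {r1:6.1f} m at 1 ppm")
```

For the Mica2 footprint this gives roughly 8 m at 10 ppm and roughly 25 m at 1 ppm.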



Synchronization Metric (9):
• 30 sec to achieve 8 μs sync.
• Drifts 30 μs every sec.
• Good for ~1/4 sec. Assuming ±8 μs drift, i.e., total error ±16 μs.
• For many synchronization models, accuracy is proportional to synchronization rate, i.e., over some region.
• Desirable metrics are:
    1. Accuracy per duty cycle.
    2. Range of applicability.
• Alternative metrics, e.g., if the common model doesn’t apply:
    1. Accuracy at 0.5% duty cycle.
    2. Duty cycle at 1 ms accuracy.
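
The "good for ~1/4 sec" figure and an accuracy-per-duty-cycle metric can be worked through numerically. This is a sketch under a simple assumed model, not the model from the slides: clocks drift linearly after each sync, and duty cycle is taken as sync overhead divided by the time the sync stays within the allowed drift. The function names are hypothetical.

```python
def valid_interval_s(allowed_drift_s, drift_rate_s_per_s):
    """Time after a sync before drift alone exceeds the allowed amount."""
    return allowed_drift_s / drift_rate_s_per_s

def sync_duty_cycle(sync_overhead_s, allowed_drift_s, drift_rate_s_per_s):
    """Ratio of time spent synchronizing to time the result stays valid.

    Assumption: a value above 1 means the sync cannot be sustained,
    because re-syncing takes longer than the result remains usable.
    """
    return sync_overhead_s / valid_interval_s(allowed_drift_s, drift_rate_s_per_s)

# Figures quoted on the slide: 30 s to sync, 30 us/s drift, +/-8 us allowed drift.
interval = valid_interval_s(8e-6, 30e-6)
duty = sync_duty_cycle(30.0, 8e-6, 30e-6)
print(f"valid for {interval:.2f} s, duty cycle {duty:.1f}")

# Alternative metric: duty cycle at 1 ms accuracy for several sync overheads.
for overhead_s in (0.01, 0.1, 1.0):
    print(overhead_s, sync_duty_cycle(overhead_s, 1e-3, 30e-6))
```

Under these assumptions the quoted numbers give a valid interval of about 0.27 s, which matches the "~1/4 sec" on the slide.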

Figure: duty cycle versus worst-case error in seconds (log-log axes), with curves for 10 ms, 100 ms, and 1 s sync overhead, plus the current metric.



Flash-Based Data Store (8):
• Read cost comparable to SRAM. Not with the 3-bit serial interface used in the Mote.
• Write cost ~6 times read cost.
• Erase cost ~400 times read cost. Cost per byte can be made low with larger blocks.
• Well known secret: use log-structured file systems. Always write to end of log. Update by writing new copy. Clean blocks before erasing. Well suited for garbage collectors.
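
The log-structured idea (append only, update by writing a new copy, clean a block before erasing it) can be illustrated with a toy in-memory model. This is a sketch only: the class `LogStore`, its block layout, and the RAM-held index are hypothetical simplifications, and a real flash store would copy live records to a spare block rather than hold them in RAM across the erase.

```python
class LogStore:
    """Toy log-structured key-value store over fixed-size erase blocks."""

    def __init__(self, num_blocks=3, block_size=2):
        self.block_size = block_size
        self.blocks = [[] for _ in range(num_blocks)]  # lists of (key, value)
        self.head = 0      # block currently being appended to
        self.index = {}    # key -> (block, slot) of the live copy
        self.erases = 0    # number of non-empty blocks erased

    def get(self, key):
        block, slot = self.index[key]
        return self.blocks[block][slot][1]

    def put(self, key, value):
        """Append a new copy at the end of the log; old copies go stale."""
        tries = 0
        while len(self.blocks[self.head]) >= self.block_size:
            if tries == len(self.blocks):
                raise RuntimeError("store is full of live data")
            self._clean_next_block()
            tries += 1
        block = self.blocks[self.head]
        block.append((key, value))
        self.index[key] = (self.head, len(block) - 1)

    def _clean_next_block(self):
        """Save live records from the next block, erase it, rewrite them."""
        nxt = (self.head + 1) % len(self.blocks)
        live = [(k, v) for slot, (k, v) in enumerate(self.blocks[nxt])
                if self.index[k] == (nxt, slot)]
        if self.blocks[nxt]:
            self.erases += 1
        self.blocks[nxt] = []  # models the block erase
        self.head = nxt
        for slot, (k, v) in enumerate(live):
            self.blocks[nxt].append((k, v))
            self.index[k] = (nxt, slot)


store = LogStore()
store.put("a", 1)
store.put("b", 2)
for value in range(3, 8):
    store.put("a", value)  # each update is a fresh append
print(store.get("a"), store.get("b"), store.erases)
```

Because stale copies are dropped only during cleaning, one expensive erase is shared by every record in the block, which is the point behind making cost per byte low with larger blocks.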



Service Composition Model (8):
• Services developed and demonstrated in isolation can’t be combined.
• Key problem is timing conflicts. Timing knowledge is implicit.
• May have heard a viable standard: Use pseudo-random timing. Low duty cycle service. Timing collisions result in clean event loss. All servers can handle occasional event loss.
• If this is “the” composition method it’s underused.

LPI and LPD (7):
• All active signals must be below the noise floor (at receiver). Must track noise level.
• SNR determines the ratio of signal range to discovery range. Typically need coding gain plus SNR to be about +6 dB.
• Detectability: 1 spot in 1000 might need 36 dB coding gain. 1 s interval; 17 min. per node.

SNR      0 dB     -10 dB   -20 dB
10 m     10 m     3.1 m    1 m
30 m     30 m     9.3 m    3 m
100 m    100 m    32 m     10 m
Area     100%     10%      1%
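
The range table and the 36 dB figure can be approximated with a short calculation. This sketch assumes inverse-square (free-space) propagation, so range scales as 10^(dB/20) and area as 10^(dB/10). The function names, and the reading of 36 dB as a 6 dB margin plus 30 dB for a 1-in-1000 area, are assumptions rather than statements from the slides.

```python
import math

def discovery_range_m(signal_range_m, snr_db):
    """Range at which the signal reaches the noise floor.

    Assumption: inverse-square path loss, so range scales as 10^(dB/20).
    snr_db is the (negative) SNR at the intended receiver.
    """
    return signal_range_m * 10 ** (snr_db / 20)

def discoverable_area_fraction(snr_db):
    """Fraction of the communication area inside the discovery range."""
    return 10 ** (snr_db / 10)

def coding_gain_db(area_fraction, margin_db=6.0):
    """Coding gain so that coding gain plus SNR equals margin_db."""
    snr_db = 10 * math.log10(area_fraction)
    return margin_db - snr_db

for signal_range in (10, 30, 100):
    row = [discovery_range_m(signal_range, snr) for snr in (0, -10, -20)]
    print(signal_range, [f"{r:.1f} m" for r in row])

print(f"{coding_gain_db(1e-3):.0f} dB for 1 spot in 1000")
```

The computed ranges come out close to, but not always identical to, the values in the table.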



Packaging (7):
• Concealment.
• Sensor/environment interfaces.
• Hydrophones.

Better Non-Volatile Memory (6):
• Several new non-volatile memory technologies. Ferroelectric memories. Magneto-resistive memories. Ovonic unified memories.
• Ideal for low-duty cycle designs.

Debugging Harness (4):
• Useful for development; not used in deployment.
• Distributed debugging may require far higher comm. rate than the actual application.
• Development cycles use far more power than deployment.


Technology Challenges

Over-the-air Programming (10):
• Efficient reliable multi-cast.
• OS-style security. An errant program should not be able to prevent loading.
• Dynamic linking and loading. Make incremental structure explicit rather than trying to discover it after the fact. Extra finer grained.
• Multiple levels of security: Factory approved loads. Platforms (third party code).

Over-the-air Programming (cont.):
• Platform grade security: Protection from live-lock. Protection from dead-lock. Protection from corruption. May require preemption?
• Issue will be around for years. Phenomenal progress. Good enough for FY04. We’re far from commercial.


Technology Challenges (cont.)

Almost Always Off Comms (9):
• Tends to violate traditional comm. assumptions.
• Complex trade space. Higher power routes may be lower latency. Wakeup rate vs. latency. Different tasks may require different wakeup rates. Emergent scheduling vs. centralized scheduling. Randomized schedules. Combining multiple services yields complex schedules. Different states will require different wakeup rates.

Node Localization (8):
• Must be very robust.
• Multipath is key problem.
• Improved ranging.
• Not sure it’s really this hard.



New Sensor Modes (IFF) (7):
• What is the best way to detect people? Nature suggests not sound. Really want electronic olfactory.
• Are there any sensor modes for identifying combatants?
• Which environments require or benefit from proximal sensing?
• What quantities are most useful in more congested environments? Chem., bio., speech, power usage, civilian flight, …

Tragedy of the Commons (6):
• No incentive to be a responsible user of shared resources.
• No enforcement.
• Not even a widely agreed upon definition of what is fair use.
• Want better wealth distribution, but not communism.



Byzantine Behavior Model (6):
• Unavoidable: Some nodes jabber continuously. Some sensors “see ‘reds’ under their beds”. In a real deployment a few nodes may be compromised.
• Need mixed probabilistic and worst case analysis framework, e.g., detection and tracking.
• Also need robustness with respect to event loss.
• Would greatly improve the viability of security.




Small Antennas (5):
• We built a proper monopole antenna (per user manual). It worked great. In almost all environments. Of course, it’s useless.
• Building small antennas is hard. Need to adapt to near-field environment (e.g., loading). Incentive for longer wavelength (e.g., foliage). Software adjustable antennas.
• High dielectric antennas.


Fundamental Scientific Issues

Signal Processing Power (8):
• For a technology family, joules per bit-op is nearly constant over a wide range of node sizes.
• Key to the vision of small nodes. Given a fixed energy supply, not much incentive to use a more powerful node. Mica2 gets ~3 GbOPJ. State of the art ~20 GbOPJ. Not ~300 GbOPJ.
• Moore’s law helps this metric slowly (compared to intuition). Doubles every ~4 to 6 years.

Signal Processing Power (cont):
• Big wins can be had. ASICs are typically 400x more efficient. Fully custom designs may be 1000x more efficient.
• Requires use of non-GPP. DSPs 4 to 12x. Polymorphic computing. FPGAs 100x.
• The alternative is to use complex signal processing algorithms. Clearly part of the vision. May not be enough.


Fundamental Scientific Issues (cont.)


N-Log-N (8):
• Taxonomy of scaling: O(N); centralized/monolithic. O(sqrt(N)) or O(cbrt(N)); non-scalable. O(N Ln(N)); quasi-scalable. O(N); absolutely scalable.
• However, can’t implement a spanning tree with motes. Comm. range limits size.
• N ln(N) implies log_2(N) layers, i.e., 10 to 20 layers (not 2).
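
A quick check of the "10 to 20 layers" figure, assuming a binary hierarchy and network sizes from a thousand to a million nodes. Both the sizes and the name `hierarchy_layers` are illustrative assumptions, not values from the slides:

```python
import math

def hierarchy_layers(num_nodes):
    """Layers in a balanced binary hierarchy over num_nodes leaves."""
    return math.ceil(math.log2(num_nodes))

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,d} nodes -> {hierarchy_layers(n)} layers")
```

Under these assumptions a thousand nodes needs 10 layers and a million nodes needs 20.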

N-Log-N (cont.):
• Not going to build 13 different sized motes.
• But we could build configurable motes. SDR allows you to vary the comm. range. You can vary the clock rate. You may even be able to reconfigure the CPU. Use a sentry-like service to vary the power rate.
• Still need 2 or 3 types of hardware, but not 10 to 20.


Fundamental Scientific Issues (cont.)

Programmable Analog Triggers (7):
• Negligible-power wakeup triggers can be built for most <sensor, target, app> triplets. Allow lower duty cycles. Simplify software; no polling of environment.
• However, such triggers would have to be configurable in-situ.
• Need the analog counterpart to FPGAs.

Signal Processing Methods (7):
• Legacy signal processing methods assume: Far-field. Essence of problem is extracting signal in low SNR. Abundant computation.
• Next generation of signal processing will need to assume: Near-field. Special structure of signals is observable and useful. Essence of problem is finding the few high SNR signals. Computation is battery.


Fundamental Scientific Issues

Non-Radio Comms (6):
• Short range and low bandwidth may not favor RF comm. Acoustic comm. may work. E-field comm. will work. Laser comm. IrDA.

Chaos Theory (5):
• A lot of work; not many answers.
• Controlling emergent behavior will eventually be a critical problem.


Laundry List Effect

• A major problem with “top-10” lists is they invite a “Laundry List” response from the audience.

• I’m not trying to create a checklist; I’m suggesting priorities.


Weighting Functions



Figure: relative-effort weighting curves from three perspectives. Researcher's Perspective (regions labeled Too Commercial and Too Easy); System Integrator Perspective (Too High Risk); DOD Transition Perspective (Not Yet Ready, Low Risk).

Frequently Asked Questions

Appendix I


Disruptive Technology 101

• Positive feedback exists in technology adoption. Sales volume -> lower costs -> sales volume.

• If the feedback is strong enough, the timing of the technology transition becomes chaotic. Sensitive to such small events as to approach randomness.

• New technology may be extremely difficult. The P4 design team was bigger than the Manhattan Project design team.

• The pace of change is fast.

• It’s not always clear which technology will win.

• The new technology may be in an unrelated field. Thin film disks required replacing lots of MEs with EEs.

This may be why technologists fail; it’s not why (properly managed) technology companies fail.


Disruptive Technology 101 (cont.)

• Most abrupt changes in technology are not disruptive. Most of the time the leader in the old technology is the first-mover and the eventual leader in the new technology.

• The disruptive transitions occur when performance outpaces customer’s needs.

• Successful customers and companies anticipate the sustaining changes.

• Disruption occurs when the “low-tech” solution wins.

• New tech. under-performs, but has other advantages. Usually fundamentally different advantages.

• New market becomes large and subsumes old market.

Figure: performance versus time for Tech. A, Tech. B, Tech. C, and Tech. D, compared with customer needs.

Driving force is that Moore’s law outruns any sensible growth in demand.


Model of Synchronization
