challenges managing large-scale wireless networks
DESCRIPTION
Challenges managing large-scale wireless networks. Lakshminarayanan Subramanian Courant Institute of Mathematical Sciences New York University Joint work with many others. Management Complexity Ladder. Indoor Wireless Access Point Network Multi-hop indoor wireless networks - PowerPoint PPT PresentationTRANSCRIPT
Challenges managing large-scale wireless networks
Lakshminarayanan SubramanianCourant Institute of Mathematical Sciences
New York University
Joint work with many others
Management Complexity Ladder Indoor Wireless Access Point Network
Multi-hop indoor wireless networks
Outdoor Mesh Networks
Rural Wireless Networks (Long-distance +mesh)
2
Why is management hard? Potential causes for performance degradation
External problems Interference, Channel fluctuations
Network performance issues Incorrect ETX, Channel assignment. Routing problems
Physical issues Radio separation issues Unreliable power (Huge problem in rural wireless)
Software issues + Configuration Forwarding problems, unexpected packet drops
Mundane problems Loose pigtail, Card misbehavior, Card stops working
3
4
Why is it hard to fix? Potential causes are huge and interdependent No back channels (in multi-hop cases) Measurements vary by the second Environmental fluctuations Power fluctuations Software behavior on wireless boards is not
very predictable Climbing street poles and towers is actually
not fun!
5
Some experiences ROMA: Multi-radio indoor wireless network (Aditya,
Jinyang)
CitySense: Outdoor wireless mesh network (Matt T, Matt W)
WiLDNet: Long-distance WiFi networks (Rabin, Sergiu, Sonesh, Eric, Manuel)
WiRE architecture (Matt T, Aditya)
Multi-radio mesh promises greater throughput
Cannot transmit concurrently
Intra-path interferenceEliminate
gateway
Cannot transmit concurrently
Inter-path interferencegateway
gateway
Reduce
Physical constraints Compact nodes few radios per node
Link losses, link variability, external load
Link variabilities
Two radios report with diff channel conditions ETX measurements are skewed Channel 1 works very poorly, channel 11 works well!!!
ROMA: basic ideaSingle-radio gateway
Each radio in a multi-radio gateway acts as an independent gateway
<C2,C3>
<C1,C2>
<C2,C3>
<C3,C4>
C1
C2
C3
<C1,C2>
<C2,C3>
<C3,C4>
<C1>
Routing metric must consider worst link
Single-radio route metric:
€
ETTi∑1 2 1
1 2 1 Multi-radio route metric
€
ETTi∑
Path throughput is limited by worst link
ETT over-estimates link performance Link metric should incorporate:
Link variability highly variable links result in unpredictable throughput
External load
€
f (1
pa − pv) * (1+ L)
Conservative metric CETT =
Link metric
€
f (1
pa)ETT =
: average delivery ratio
: deviation of delivery ratio
: fraction of time channel is busy with external traffic €
pa
€
pv
€
L
Our Indoor Testbed
NSC Geode Processors, 128MB RAM, 1GB Flash Implemented on the Click Modular Router Patched Madwifi 0.9.3.3
Aggregate performance
Aggregate throughput (Mbps)
CD
F
2 identical channels1 common, 1 assigned channel ROMA
Setup: 9 UDP flows from 3 gateways to non-gateway nodes
ROMA is able to utilize more channels to reduce inter-path interferenceROMA’s median aggregate throughput is 1.4X or 2.1X of alternative designs
14
WiFi-based Long Distance Networks
WiLD links use standard 802.11 radios
Longer range up to 150km Directional antennas (24dBi) Line of Sight (LOS)
Why choose WiFi: Low cost of $500/node
Volume manufacturing No spectrum costs Customizable using open-
source drivers Good datarates
11Mbps (11b), 54Mbps (11g)
15
AirJaldi Network
• Tibetan Community• WiLD links + APs• Links 10 – 40 Kms • Achieve 4 – 5 Mbps • VoIP + Internet• 10,000 users
Routers used: (a) Linksys WRT54GL, (b) PC Engines Wrap Boards, Costs: (a) $50, (b) $140
16
Aravind Eye Hospital Network
• South India• Tele-ophthalmology• All WiLD links• Links 1 – 15 Kms long• Achieve 4 – 5 Mbps • Video-conferencing• 3000 consultations/month
Routers used: PC Engines Wrap boards, 266 Mhz CPU, 512 MB Cost: $140
17
New World Record – 382 Kms
Pico El Aguila, Venezuela
Elev: 4200 meters
18
Deployment
19
Overall Impact
Both networks financially sustainable 50000 patients/year being scaled to 500000
patients/year Over 30000 patients have recovered sight
20
Experience with WiLD Networks In the field, point-to-point performance is bad On a 60km link in Ghana
We get 0.6 Mbps TCP vs 6 Mbps UDP On a relay (single channel)
We get only 2 Mbps TCP
21
Problem: Propagation Delay Large propagation delay high collision probability
A B
22
Design Choices for WiLDNet
Use Sliding Window flow control 802.11 MAC ACKs disabled Packet batches sent every slot Slot allocation determined by demand
Replace CSMA with TDMA on every link Alternate send and receive slots
23
Inter-Link Interference
A
B
C
1
2
Simultaneous Receive
A
B
C
1
2
Simultaneous Send Send & Receive
Disable CCA 12dB isolation
A
B
C
1
2
24
Channel Loss: From external traffic
Strong correlation between loss and external traffic
Source (A) and interferer (I) do not hear each other
A BI
25
Sustainability Challenges
Bad quality grid power Higher component failures, more downtimes
Limited local expertise Local operation, maintenance, and diagnosis difficult
Lack of alternate connectivity Complicates remote diagnosis and management
Remote locations Traveling is difficult and infrequent (often once in 6
months)
26
27
28
Poor Quality Power
Spikes and Swells:
• Lost 50 power adapters
• Burned 30 PoE ports
Low Voltages:
• Incomplete boots
• HW watchdog fails
Frequent Fluctuations:
• CF corruptions
• Battery Damage
Volt
ag
e
Ran
ge
Number of Instances seen over 6 weeks
29
HW Faults
Instances* Description Total Downtime
63 Router board not powered 63 days
7 Router powered but hung 10 days
21 Router powered but not connected to remote LAN (burned ethernet ports)
34 days
3 Router on, but wireless cards not transmitting (low voltage)
2 days
3 Router on, but pigtails not connected 45 days
1 Router on, but antenna Line-of-Sight blocked
8 weeks
Hardware Faults at Aravind (in 2006)
*Conservative Estimate>90% of faults are power-related
30
SW Faults
Instances* Description Total Downtime
4 No default gateway specified 4 days
3 Wrong ESSID, channel, mode 3 days
2 Wrong IP address 2 days
2 Misconfigured routing 3 days
Software Faults at Aravind (in 2006)
*Conservative Estimate
31
Solutions
1. Power1.1 Low Voltage Disconnect1.2 Low-cost Solar Power Controller
2. Data Collection and Monitoring
3. Alternate Network Entry Points
4. Recovery Mechanisms
5. Safe Software
32
Power: Low Voltage Disconnect
Low Voltage Disconnect Circuit (LVD)
Disconnect load at low voltage
Prevent battery over-discharge and hung routers
Without LVDs, roughly 50 visits per week for manual reboots at AirJaldi
Off-the-shelf LVDs oscillate too much Too many automatic reboots
We designed new LVD circuit with better delay No more manual visits or reboots!
33
Power: Low-cost Solar Power Controller
Tackle spikes, swells and enable power at remote sites
Features PPT (peak power tracking) => 15% more power draw LVD + trickle charging => Doubles battery life Voltage regulator => No spikes and swells Power-over-Ethernet => Remote Mgmt $70 (compared to $300 commercial units)
Have not lost any routers yet in 1 year
34
Operational ResultsFault Incident Counts
0
10
20
30
40
50
60
WeeklyManualReboots(AirJaldi)
Number ofPower-relatedRouter Faults
(Aravind)
ProlongedDowntimesgreater than
1day (Aravind)
CF CardCorruptions
(Aravind)
Incidents
Count
Before: Jan 07 - Jun 07
After: Jul 07 - Dec 07
35
Operational Results
Equipment Supply
Installation
Management
Maintenance
Our support
Aravind
Local Vendor
Jan’06 – Jun’06
Migration at Aravind
2007: 5 more clinic links
Jun’07 – Dec’07
Jul’06 – Dec’06
Jan’07 – Jun’07
WiRE Architecture
36
Questions?
Thank you!