1
NETWORKED EMBEDDED SYSTEMS
SRIKANTH SUBRAMANIAN
2
Agenda
• Overview of Networked Embedded Systems (NES)
• NES built on ASIPs
• NES built on General Purpose Processors
• Quantitative performance comparison of various NES for TCP and UDP
• Online HW/SW partitioning of NES
3
Overview
• NES are employed in devices that form the backbone of communication networks: routers, network bridges (switches), telephone switches, etc.
• They perform the tasks of data processing, network connectivity and service delivery
4
Overview
• NES were mostly implemented on Single Purpose Processors (SPPs)
• SPPs offer no flexibility and pose problems when the requirements change
• Alternatives: general purpose processors or ASIPs (network processors)
5
Network Processors
• ICs built specifically for networking applications
• Software-programmable devices
• Features optimized for networking applications: pattern matching, queue management, data bit-field manipulation, etc.
6
NES based on ASIPs
Stareast:
• Consists of a baseboard and two daughter boards
• The baseboard contains an Intel IXP425 (533 MHz) processor
7
Intel IXP425 Processor
• Scalable performance, reduced power consumption, low cost
• Delivers a range of data, voice, security and I/O features
• Distributed processing architecture
• Combination of an Intel XScale core (an ARM processor) and 3 Network Processing Engines (NPEs)
8
Intel IXP425 Processor
• XScale: control plane
• NPEs: computationally intensive data-plane tasks
• Parallel operation of XScale and NPEs
9
NES based on ASIPs
Netgear WAG302:
• Most commonly used wireless access point
• Based on the Intel IXP422B processor (266 MHz)
10
NES based on GPPs
Soekris net4826-50:
• Based on the AMD Geode processor (266 MHz)
11
NES based on GPPs
• Used to create fully customized routers and access points
• Low cost, advanced communication features
• The AMD Geode processor belongs to the x86 processor family
12
Quantitative Performance Comparison
Objectives:
a) Performance comparison between an NES based on a GPP and an NES based on an ASIP
b) Performance comparison between two ASIP-based NESs, one running a commercial operating system and the other running an open source operating system
13
Experimental Setup
• Three Stareast boards:
a) One running MontaVista 4.0 OS
b) The other two running SnapGear versions 3.1 and 3.3
• Netgear WAG302 running OpenWrt
• Soekris net4826 running the Voyage Linux (Debian) distribution for x86 processors
14
Experimental Setup
15
Experimental Setup
• To study the behavior of the NESs, the D-ITG traffic generator is used
• It can generate IPv4/IPv6 traffic, replicating the appropriate stochastic processes for both IDT (Inter Departure Time) and PS (Packet Size)
• It collects statistics on Quality of Service (QoS) parameters: throughput, jitter, packet loss and delay (latency)
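The four QoS parameters listed above can be illustrated with a minimal sketch. It assumes a hypothetical per-packet log (`PacketRecord`) and one common definition of each metric (for instance, jitter as the mean absolute difference between consecutive one-way delays). It does not reproduce D-ITG's own log format or its exact formulas.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PacketRecord:
    """Hypothetical per-packet log entry (not the D-ITG log format)."""
    size_bytes: int
    sent_at: float                 # send timestamp in seconds
    received_at: Optional[float]   # receive timestamp, or None if the packet was lost


def qos_summary(records: List[PacketRecord]) -> dict:
    """Compute throughput, mean delay, jitter and packet loss for one flow."""
    received = [r for r in records if r.received_at is not None]
    if not received:
        raise ValueError("no packets were received")

    # One-way delay of every packet that arrived.
    delays = [r.received_at - r.sent_at for r in received]

    # Assumption: the measurement interval runs from the first send to the last receive.
    duration = max(r.received_at for r in received) - min(r.sent_at for r in records)
    throughput_bps = 8 * sum(r.size_bytes for r in received) / duration

    # Jitter as the mean absolute difference between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0

    return {
        "throughput_bps": throughput_bps,
        "mean_delay_s": sum(delays) / len(delays),
        "jitter_s": jitter,
        "loss_ratio": 1 - len(received) / len(records),
    }


example = [
    PacketRecord(875, 0.0, 0.125),
    PacketRecord(875, 0.25, 0.5),
    PacketRecord(875, 0.5, None),    # lost packet
    PacketRecord(875, 0.75, 0.875),
]
summary = qos_summary(example)
```

For the four example packets, one of which is lost, the loss ratio is 0.25 and the throughput is 24000 bit/s over the 0.875 s interval.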
16
Experimental Analysis
• NES boards are connected back to back with the workstation
• Testing is performed using both TCP and UDP at the transport layer
• Two types of tests are performed:
a) Discover the number of packets per second the devices are able to generate for a fixed packet size
b) Measure bit rate, jitter and packet loss for different packet rates and sizes
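As a rough illustration of how packet rate and packet size relate to bit rate in the second test type, the sketch below computes the offered bit rate for a grid of rates and sizes. The function names, the optional header overhead and the example values are assumptions for illustration only; they are not the test parameters used in the experiments.

```python
def offered_bitrate_bps(packets_per_second: float, payload_bytes: int,
                        header_bytes: int = 0) -> float:
    """Offered bit rate = packet rate x bytes per packet x 8 bits per byte."""
    return packets_per_second * (payload_bytes + header_bytes) * 8


def sweep(rates, sizes):
    """Offered bit rate for every (packet rate, packet size) combination."""
    return {(r, s): offered_bitrate_bps(r, s) for r in rates for s in sizes}


# Hypothetical sweep values, chosen only to show the arithmetic.
grid = sweep(rates=[1000, 5000], sizes=[64, 512])
```

Comparing the measured bit rate against this offered value shows at which rate and size a device stops keeping up.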
17
Results
[Figure: packet rate]
18
Results
[Figure: bitrate for UDP]
19
Results
[Figure: bitrate for TCP]
20
Results
[Figure: jitter for UDP]
21
Results
[Figure: jitter for TCP]
22
Results
[Figure: packet loss for UDP]
23
Conclusions
• An NES based on an ASIP running a commercial OS provides better performance than an NES based on a GPP running a commercial OS
• NES based on ASIPs running an open source OS are still less efficient than those running a commercial OS
• Hence, NES using network processors can play a major role in data-intensive real-time applications
24
Online HW/SW Partitioning
Need:
a) Optimal partitioning of the load into HW and SW at compile time is difficult
b) New tasks arrive during execution time
c) A node may fail at run time
25
Network Architecture
Graph theory:
• Structure of the network: $g_a = (N, C)$, with nodes $N$ and connections $C$
26
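Network Architecture
Laplacian matrix: Given a graph G with n vertices (without loops or multiple edges), its Laplacian matrix is defined as $L = D - A$, where $D$ is the diagonal matrix of vertex degrees and $A$ is the adjacency matrix. For example, for the fully connected graph on 4 vertices:
$$L = \begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{pmatrix}$$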
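A minimal sketch of how such a Laplacian matrix can be built from an edge list, using the standard definition (vertex degree on the diagonal, -1 for each pair of adjacent vertices). The function name `laplacian` and the edge-list representation are illustrative choices.

```python
from itertools import combinations
from typing import List, Tuple


def laplacian(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Laplacian of an undirected simple graph with vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("loops are not allowed")
        if L[i][j] != 0:
            raise ValueError("multiple edges are not allowed")
        L[i][j] = L[j][i] = -1   # off-diagonal entry: -1 for adjacent vertices
        L[i][i] += 1             # diagonal entry: vertex degree
        L[j][j] += 1
    return L


# Fully connected network with 4 nodes: every pair of nodes is linked.
k4 = laplacian(4, list(combinations(range(4), 2)))
for row in k4:
    print(row)
```

For the fully connected 4-node network, this yields 3 on the diagonal and -1 everywhere else.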
27
Network Architecture
Assumptions w.r.t. the network:
a) The architecture graph is undirected
b) Each computational task may be assigned to any node in the network without restriction
c) HW reconfiguration
Temporal partition: The temporal HW/SW partition at time t is an assignment of each task $p \in P(t)$ to a resource in $N(t)$, together with an indication of whether the task is implemented in HW or SW.
28
Network Architecture
Workload characterization:
• Each task $p_j \in P(t)$ causes a unique load $w^h_j$ on resource $n_i \in N(t)$ if implemented in HW, and a load $w^s_j$ if implemented in SW
Load:
• For HW: the fraction of the total area occupied by the task
• For SW: the ratio of execution time to period
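The two load definitions can be sketched as follows. The `Task` fields, the area units and the `node_loads` helper are hypothetical names introduced only to make the arithmetic concrete.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; field names are illustrative."""
    name: str
    hw_area: float     # area the task occupies when implemented in HW
    exec_time: float   # execution time when implemented in SW
    period: float      # period of the task

    def hw_load(self, total_area: float) -> float:
        """HW load: fraction of the node's total area occupied by the task."""
        return self.hw_area / total_area

    def sw_load(self) -> float:
        """SW load: execution time divided by period."""
        return self.exec_time / self.period


def node_loads(assignment: List[Tuple[Task, str]], total_area: float) -> Tuple[float, float]:
    """Return (HW load, SW load) of one node for tasks tagged 'HW' or 'SW'."""
    hw = sum(t.hw_load(total_area) for t, impl in assignment if impl == "HW")
    sw = sum(t.sw_load() for t, impl in assignment if impl == "SW")
    return hw, sw


filter_task = Task("filter", hw_area=250.0, exec_time=2.0, period=8.0)
codec_task = Task("codec", hw_area=500.0, exec_time=1.0, period=4.0)
hw, sw = node_loads([(filter_task, "HW"), (codec_task, "SW")], total_area=1000.0)
```

Here the node carries a HW load of 0.25 (250 of 1000 area units) and a SW load of 0.25 (1 time unit every 4).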
29
Diffusion Algorithm
• Load exchanges between two adjacent nodes are determined in each iteration as:
$$y_c^{k-1} = \beta \left( w_i^{k-1} - w_j^{k-1} \right) \quad \text{for all } c = \{n_i, n_j\} \in C$$
$$w_i^{k} = w_i^{k-1} - \sum_{c = \{n_i, n_j\} \in C} y_c^{k-1}$$
• Changing $\beta$ in each iteration k has been shown to improve the convergence speed drastically, to exactly $m - 1$ iterations
• Choose $\beta = 1/\lambda_k$ for $1 \le k \le m - 1$, where $\lambda_k$ is the k-th distinct nonzero eigenvalue and m is the number of distinct eigenvalues of the Laplacian matrix
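A minimal sketch of this iteration on a 4-node ring, under two assumptions: the distinct nonzero Laplacian eigenvalues are supplied by the caller rather than computed, and each edge (i, j) is given a fixed orientation so that a positive flow moves load from i to j. For the 4-node ring the Laplacian eigenvalues are 0, 2, 2 and 4, so m = 3 and two iterations are needed.

```python
from typing import List, Tuple


def diffuse(loads: List[float], edges: List[Tuple[int, int]],
            nonzero_eigenvalues: List[float]) -> List[float]:
    """Run one diffusion step per distinct nonzero eigenvalue, with beta = 1/lambda_k."""
    w = list(loads)
    for lam in nonzero_eigenvalues:
        beta = 1.0 / lam
        # All flows of an iteration are computed from the loads of the previous iteration.
        flows = [beta * (w[i] - w[j]) for i, j in edges]
        for (i, j), y in zip(edges, flows):
            w[i] -= y   # positive y moves load from node i ...
            w[j] += y   # ... to its neighbour j
    return w


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
# All load starts on node 0; the distinct nonzero eigenvalues of the ring are 2 and 4.
balanced = diffuse([8.0, 0.0, 0.0, 0.0], ring, [2.0, 4.0])
print(balanced)  # [2.0, 2.0, 2.0, 2.0]
```

After the first iteration the loads are [0, 4, 0, 4]; the second iteration reaches the balanced state, matching the m - 1 = 2 iterations stated above.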
30
Optimization Flow
31
Optimization Flow
Objectives:
• Find a bi-partition such that the load is balanced between HW and SW, i.e., minimize
$$\left| \sum_{i=1}^{|N|} w_i^S - \sum_{i=1}^{|N|} w_i^H \right|$$
• Effective load balance, i.e., minimize
$$\left| w' - \max\left\{ \max_{i:\, n_i \in N} \{ w_i^S \},\ \max_{i:\, n_i \in N} \{ w_i^H \} \right\} \right|$$
by using an Evolutionary Algorithm applied to encode the implementation selection
• The Diffusion Algorithm only balances load between nodes, not between HW and SW
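The two objectives can be evaluated for a candidate partition as in the sketch below. It assumes that the HW/SW imbalance is measured as an absolute difference and that the reference load w' is passed in by the caller as `target`. The function names are illustrative, and the evolutionary search itself is not shown.

```python
from typing import List


def hw_sw_imbalance(sw_loads: List[float], hw_loads: List[float]) -> float:
    """Objective 1: absolute difference between total SW load and total HW load."""
    return abs(sum(sw_loads) - sum(hw_loads))


def peak_load_error(sw_loads: List[float], hw_loads: List[float], target: float) -> float:
    """Objective 2: distance between the reference load w' and the largest
    SW or HW load found on any node."""
    return abs(target - max(max(sw_loads), max(hw_loads)))


# Per-node loads of a hypothetical two-node network.
sw = [0.5, 0.25]
hw = [0.25, 0.25]
imbalance = hw_sw_imbalance(sw, hw)                    # |0.75 - 0.5| = 0.25
peak_error = peak_load_error(sw, hw, target=0.3125)    # |0.3125 - 0.5| = 0.1875
```

A candidate implementation selection that lowers both values is better balanced between HW and SW and has a less loaded worst node.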
32
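Discrete Diffusion Algorithm
Need:
• It is advisable not to split one process and distribute it across multiple nodes, since this increases the data traffic in the network
• Let $y_c^{\mathrm{cont},k}$ be the real-valued continuous flow and $y_c^{\mathrm{disc},k}$ the discrete flow on one edge c in iteration k, such that the discrete flow does not exceed the continuous flow plus the error carried over from the previous iteration:
$$y_c^{\mathrm{disc},k} \le y_c^{\mathrm{cont},k} + e_c^{k-1} \quad \text{with } e_c^{0} = 0$$
$$e_c^{k} = y_c^{\mathrm{cont},k} + e_c^{k-1} - y_c^{\mathrm{disc},k} \quad \text{for all } c = \{n_i, n_j\} \in C$$
• An additional adjustment step is introduced:
$$e_c^{m} = e_c^{m-1} - y_c^{\mathrm{adj}}$$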
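A minimal sketch of the error bookkeeping on a single edge. The greedy, largest-first choice of whole tasks is an assumption made for illustration, not necessarily the selection rule of the presented algorithm. Flows in the reverse direction and the final adjustment step are not handled.

```python
from typing import List, Tuple


def discretize_flow(y_cont: float, e_prev: float,
                    task_loads: List[float]) -> Tuple[List[float], float, float]:
    """Pick whole tasks to send over one edge without exceeding y_cont + e_prev.

    Returns (loads of the chosen tasks, discrete flow y_disc, new error e).
    """
    remaining = y_cont + e_prev          # upper bound on the discrete flow
    chosen = []
    # Assumption: greedy selection, largest task first.
    for load in sorted(task_loads, reverse=True):
        if load <= remaining:
            chosen.append(load)
            remaining -= load
    y_disc = sum(chosen)
    e_new = y_cont + e_prev - y_disc     # error carried into the next iteration
    return chosen, y_disc, e_new


# Iteration 1: the continuous flow is 0.5, but only the 0.375 task fits.
sent1, y1, e1 = discretize_flow(0.5, 0.0, [0.375, 0.25])
# Iteration 2: the carried error lets the remaining 0.25 task move.
sent2, y2, e2 = discretize_flow(0.125, e1, [0.25])
```

In the example, the first iteration sends a discrete flow of 0.375 and carries an error of 0.125; in the second iteration the carried error plus the new continuous flow of 0.125 is just enough to move the 0.25 task, and the error returns to 0.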
33
Experimental Analysis
• Evaluation of the discrete diffusion algorithm for different types of networks: meshes with 3x3 or 4x4 nodes, a ring, and a chordal ring with 8 nodes
• In the beginning, all tasks are mapped onto a single resource node
• The focus is on the load error $|w' - w_i|$ and the congestion in the network
34
Experimental Analysis
35
Experimental Analysis
36
Questions?
Thank You