Low-Power Wireless Video System
Advisor: Professor Alex Doboli
Students: Christian Austin
Artur Kasperek
Edward Safo
Objective
Establish a low-power wireless client/server streaming video system.
Use a multimedia standard amenable to wireless networks.
Apply hardware/software co-design techniques to reduce the power used by the system’s clients.
Figure: system overview. A server runs the video server software and maintains a database of MPEG-4 files. It streams MPEG-4 video over wireless communication, through an access point (802.11b, 11 Mbit/s), to a laptop or PDA, each running MPEG-4 client software.
Hardware/Software Co-Design
A design methodology that splits a computer system’s design between hardware and software in an effort to improve some feature of the system.
In this design, partitioning targets low power consumption.
This is achieved by relocating the functionality of high-power sections of code to specialized hardware.
Project Flow
Decide on a multimedia standard.
Software: design software from scratch, or find and analyze existing software; isolate high-power sections of the software for a hardware port.
Hardware: determine a hardware architecture; tune the hardware for lower power consumption.
Functional testing and hardware power analysis.
Multimedia Standard
MPEG-4 was a good match for the system’s requirements.
What is MPEG-4?
An object-based video compression and decoding standard.
Its new object-based compression technique compresses objects rather than frames.
Objects are distinct entities in a scene; information can be associated with each one.
It builds on the previous MPEG and H.263 standards.
MPEG-4 Framework
Figure: MPEG-4 client framework. MPEG-4 data streams, arriving as local or network streams, are presented by the DMIF layer as protocol-independent streams to the Sync Layer, which delivers the audio, video, and scene description streams. The audio and video decoders produce audio and video objects, which the scene compositor combines, using the scene description information, into the multimedia content shown on the monitor.
Why Use MPEG-4?
Non-proprietary standard.
High compression makes streaming over low-bandwidth networks (e.g. wireless) practical.
Adjustable-resolution coding allows a trade-off between video continuity and quality: a higher bit rate yields better-quality video at the expense of lost frames…
Robust error resilience over noisy channels.
Emerging standard; a superset of previous MPEG standards.
Object Based Compression
Video scenes are defined as a composition of objects in space at an instant in time.
An object’s color is defined by pixel chrominance and luminance values; its shape is defined by an alpha mask.
An object together with its bounding rectangle is called a Video Object Plane (VOP).
Each object is compressed separately, which is the main reason for the improved compression.
The block-based encoding scheme is extended to handle arbitrarily shaped objects.
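Deriving a VOP’s bounding rectangle from its alpha mask is straightforward. The following is a minimal sketch, assuming a binary alpha mask stored as a list of rows (nonzero means inside the object). It finds the smallest rectangle holding every object sample and rounds its size up to whole 16×16 macroblocks so that it can be tiled for block-based coding. The function name and the rounding choice are illustrative assumptions, not taken from the MPEG-4 specification text.

```python
def bounding_rectangle(alpha, mb_size=16):
    """Return (left, top, width, height) of the smallest box that holds every
    nonzero alpha sample, with width and height rounded up to whole
    macroblocks. Returns None when the mask contains no object samples."""
    rows = [y for y, row in enumerate(alpha) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(alpha[0])) if any(row[x] for row in alpha)]
    left, top = cols[0], rows[0]
    width = cols[-1] - left + 1
    height = rows[-1] - top + 1
    # Round up to a multiple of mb_size (ceiling division) so the rectangle
    # tiles into macroblocks; this alignment is an assumption of the sketch.
    width = -(-width // mb_size) * mb_size
    height = -(-height // mb_size) * mb_size
    return left, top, width, height
```

For example, for an object occupying columns 5 through 24 and rows 10 through 19 of a 40×40 mask, the function returns `(5, 10, 32, 16)`: the 20×10 object box is padded out to 2×1 macroblocks.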
Compression Illustration
Transparent macroblocks: carry no information.
Boundary macroblocks: compressed using the block-based scheme after padding.
Opaque macroblocks: compressed as is using the block-based scheme.
Figure: an object inside its bounding rectangle, divided into boundary, transparent, and opaque macroblocks.
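The three macroblock types can be told apart from the alpha mask alone. Here is a minimal sketch, assuming a binary mask whose width and height are already multiples of 16 (for example, a VOP bounding rectangle). It counts how many samples of each 16×16 macroblock lie inside the object. The function name and string labels are hypothetical.

```python
def classify_macroblocks(alpha, mb_size=16):
    """Label each mb_size x mb_size macroblock of a binary alpha mask whose
    dimensions are multiples of mb_size (e.g. the VOP bounding rectangle)."""
    height, width = len(alpha), len(alpha[0])
    labels = []
    for top in range(0, height, mb_size):
        row_labels = []
        for left in range(0, width, mb_size):
            samples = [alpha[y][x]
                       for y in range(top, top + mb_size)
                       for x in range(left, left + mb_size)]
            inside = sum(1 for s in samples if s)
            if inside == 0:
                row_labels.append("transparent")  # no object samples: nothing to code
            elif inside == len(samples):
                row_labels.append("opaque")       # fully inside: coded as is
            else:
                row_labels.append("boundary")     # partly inside: padded, then coded
        labels.append(row_labels)
    return labels
```

In a 32×32 mask where the object covers columns 0 to 19 of rows 0 to 15, the top-left macroblock is opaque, the top-right one is a boundary macroblock, and the bottom row is transparent.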
Software Decisions
Used open-source MPEG-4 client and server software:
Darwin Streaming Server by Apple.
MPEG4IP, an open-source project at SourceForge.
Why open source?
Implementing a video server was not an objective.
Designing the software from scratch was not practical given the time constraints.
Locating Power-Intensive Code
Hardware power measurement: accurate measurement requires expensive hardware.
Power measurement using software: instruction-level power estimation, e.g. SimplePower, developed at Penn State.
Software profiling: provides no direct power measurements, but the search for high-power sections of code can begin in computationally intensive areas, located with GPROF or Visual Studio.
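Instruction-level estimation can be sketched as bookkeeping. Multiply each instruction class’s dynamic count by a per-instruction energy cost, sum the results per function, and rank the functions. This is a deliberately simplified base-cost model, not SimplePower’s own method. The cost table, function names, and counts are hypothetical placeholders; real counts would come from a simulator or an instrumented run.

```python
# Hypothetical per-instruction energy costs in nanojoules; these numbers only
# illustrate the bookkeeping and are not measured values.
BASE_COST_NJ = {"alu": 1.0, "mul": 3.0, "load": 4.0, "store": 4.5, "branch": 1.5}


def estimate_energy(instruction_counts):
    """Sum count * base cost over instruction classes (simple base-cost model)."""
    return sum(count * BASE_COST_NJ[kind] for kind, count in instruction_counts.items())


def rank_hotspots(profile):
    """profile: {function_name: {instruction_class: dynamic_count}}.
    Return (function, estimated energy, share of total) sorted by energy."""
    energies = {fn: estimate_energy(counts) for fn, counts in profile.items()}
    total = sum(energies.values())
    ranked = sorted(energies.items(), key=lambda item: item[1], reverse=True)
    return [(fn, e, e / total if total else 0.0) for fn, e in ranked]
```

A function that dominates this ranking, such as a multiply- and memory-heavy transform, is a natural candidate for a hardware port.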
The Inverse Discrete Cosine Transform (IDCT)
Highly utilized code: used each time a macroblock is decoded.
Computationally intensive: inherent nested loop structure.
High frequency of memory accesses, resulting in elevated power consumption.
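$$f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} c_u \, c_v \, F(u,v) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$
$$c_u = \begin{cases} \frac{1}{\sqrt{2}} & u = 0 \\ 1 & \text{otherwise} \end{cases} \qquad c_v = \begin{cases} \frac{1}{\sqrt{2}} & v = 0 \\ 1 & \text{otherwise} \end{cases}$$
where $F(u,v)$ are the IDCT coefficients and $f(x,y)$ the reconstructed samples of an $N \times N$ block ($N = 8$ in MPEG-4).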
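Evaluating the 2-D IDCT formula literally makes its cost visible. For N = 8, each of the 64 outputs sums 64 terms, for 4,096 multiply-accumulates per block. Below is a minimal floating-point reference sketch, assuming coefficients indexed as `F[u][v]`. The function names are hypothetical, and a real decoder would use fixed-point arithmetic.

```python
import math


def c(k):
    """Normalization factor: 1/sqrt(2) for k == 0, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def idct_2d_direct(F):
    """Evaluate the N x N 2-D IDCT formula term by term (O(N^4) work)."""
    N = len(F)
    f = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * F[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            f[x][y] = 2 / N * total
    return f
```

As a sanity check, a block whose only nonzero coefficient is a DC value of 8 decodes to a flat block of 1.0, because (2/8) · (1/2) · 8 = 1.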
IDCT in an MPEG-4 Decoder
An MPEG-4 decoder consists of more than the IDCT.
Figure: MPEG-4 video decoder. The texture stream (macroblock data) passes through variable-length decoding, inverse scan, DC and AC prediction, inverse quantization, and the IDCT; the shape stream (alpha mask data) feeds the shape decoder; the motion stream drives motion compensation; VOP construction combines the results.
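The inverse scan stage undoes the coefficient ordering used by the encoder. It places a 1-D run of 64 decoded coefficients back into an 8×8 block before the block reaches the IDCT. The following minimal sketch implements only the zigzag pattern. MPEG-4 also defines alternate horizontal and vertical scans, which are not shown, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions in zigzag scan order for an n x n block."""
    order = []
    for s in range(2 * n - 1):              # s = row + col indexes an anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()                 # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order


def inverse_scan(coeffs, n=8):
    """Place a 1-D list of n*n coefficients back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (row, col) in zip(coeffs, zigzag_order(n)):
        block[row][col] = value
    return block
```

The first six positions visited are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2). This ordering lets the low-frequency coefficients, which are most likely to be nonzero, come first in the bitstream.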
Hardware Requirements
An economical FPGA with a large equivalent gate count.
A fast interface to the FPGA, since the hardware will implement a time-critical function of an MPEG-4 decoder.
Peripheral memory that the FPGA can use as a buffer for IDCT blocks.
Spartan-II 200 PCI Board
200,000-gate-equivalent Xilinx Spartan-II FPGA.
32-bit PCI interface.
8 MB on-board memory.
JTAG interface
ISP PROM
PCI Core
PCI was the best solution for a high-transfer-rate interface.
The IDCT design must interface to the PCI bus.
The Xilinx LogiCORE PCI core provides a PCI front end for the IDCT design, abstracting the details of the PCI specification away from it.
Hardware Implementation
IDCT hardware design considerations: low power is the primary concern, but design size and speed are also important.
Procedure:
Design an IDCT architecture in terms of a functional-unit block diagram.
Code the design in VHDL.
Write a driver with an API that maps to the hardware’s functions.
Synthesize, place, and route the design.
IDCT Architecture
Decodes an 8×8 block of IDCT coefficients.
Uses on-board memory as a buffer for fetching inputs and storing results, which reduces CPU intervention.
Performs two 1-D IDCTs:
The first half of the data path performs a 1-D IDCT on each row vector of the 8×8 input block.
The row results are stored in an 8×8 transpose register and used as inputs to the second half of the data path.
The second half performs another 1-D IDCT on each column vector of its 8×8 input matrix, completing the 2-D IDCT of the block.
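The row/column procedure works because the 2-D IDCT is separable. A 1-D IDCT on each row, a transpose, and a 1-D IDCT on each resulting row give the same result as the direct formula. Sixteen 8-point 1-D IDCTs at 64 multiplies each need 1,024 multiplies, versus 4,096 for direct evaluation. Below is a minimal floating-point sketch of this dataflow, assuming orthonormal 1-D scaling √(2/N), so that the two passes together supply the 2/N factor. The function names are hypothetical.

```python
import math


def idct_1d(G):
    """N-point 1-D IDCT with scaling sqrt(2/N)."""
    N = len(G)
    out = []
    for x in range(N):
        total = 0.0
        for u in range(N):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            total += cu * G[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(math.sqrt(2 / N) * total)
    return out


def transpose(M):
    return [list(col) for col in zip(*M)]


def idct_2d_separable(F):
    rows_done = [idct_1d(row) for row in F]   # first half: 1-D IDCT of each row
    T = transpose(rows_done)                  # role of the 8x8 transpose register
    cols_done = [idct_1d(row) for row in T]   # second half: 1-D IDCT of each column
    return transpose(cols_done)               # back to f[x][y] orientation
```

In hardware, the final transpose can be absorbed into the order in which results are written back to memory.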
Architecture Block Diagram
Figure: IDCT architecture block diagram. An IDCT control block, with its control inputs and outputs, sequences the data path. Memory input control loads data from on-board memory into Reg 1. Eight 4 mult/3 add units, fed with either the even-indexed inputs (d0, d2, d4, d6) or the odd-indexed inputs (d1, d3, d5, d7) and with constants from a coefficient table, produce the even terms A0.even to A3.even and the odd terms A0.odd to A3.odd, held in Reg 2 as x0 to x7. Four butterflies pair (x0, x4), (x1, x5), (x2, x6), and (x3, x7) to produce the outputs (D0, D7), (D1, D6), (D2, D5), and (D3, D4), which enter an 8×8 transpose register. A second, identical stage (4 mult/3 add units, Reg 3, butterflies, Reg 4) processes the transposed data, and memory output control writes the results back; both memory controllers use control, address, and data lines.
Detail: each 4 mult/3 add unit multiplies four coefficient/data pairs (Coeff 1 to 4, Data 1 to 4) in four multipliers and sums the products with adders; each butterfly takes In1 and In2 and outputs their sum (adder) and their difference (subtractor).
Architecture Features
Pipelined design for increased throughput and power reduction.
Exploits the symmetry of the IDCT coefficient matrix, breaking the 8×8 matrix operation into two 4×4 matrix operations plus butterfly operations.
Parallel multiply and add operations perform the two 4×4 matrix multiplications in parallel, speeding up the IDCT’s repetitive matrix operations.
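The symmetry being exploited is that row 7−n of the 8-point IDCT matrix equals row n with the sign of every odd-indexed column flipped. This means out[n] = E[n] + O[n] and out[7−n] = E[n] − O[n], where E uses only the even inputs and O only the odd inputs. The following floating-point sketch mirrors that structure, with eight 4-multiply/3-add dot products feeding four butterflies. It needs 32 multiplies instead of 64. The names, and the mapping of them onto the block-diagram labels, are illustrative assumptions; the actual hardware presumably uses fixed-point constants.

```python
import math

N = 8
# Coefficient table, only the first four rows are needed thanks to symmetry:
# COEFF[n][k] = sqrt(2/N) * c_k * cos((2n+1) k pi / 2N), n = 0..3, k = 0..7.
COEFF = [[math.sqrt(2 / N) * (1 / math.sqrt(2) if k == 0 else 1.0)
          * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for k in range(N)]
         for n in range(N // 2)]


def mult4_add3(coeffs, data):
    """One '4 mult/3 add' unit: four products summed with three additions."""
    p0, p1, p2, p3 = (c * d for c, d in zip(coeffs, data))
    return (p0 + p1) + (p2 + p3)


def butterfly(in1, in2):
    """Return (sum, difference), like the adder/subtractor pair."""
    return in1 + in2, in1 - in2


def idct_1d_even_odd(d):
    even_in, odd_in = d[0::2], d[1::2]               # d0,d2,d4,d6 and d1,d3,d5,d7
    out = [0.0] * N
    for n in range(N // 2):
        a_even = mult4_add3(COEFF[n][0::2], even_in)  # An.even
        a_odd = mult4_add3(COEFF[n][1::2], odd_in)    # An.odd
        out[n], out[N - 1 - n] = butterfly(a_even, a_odd)  # Dn and D(7-n)
    return out
```

Applying `idct_1d_even_odd` to every row, transposing, and applying it again reproduces the two-stage data path.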
Power Reduction
Clock isolation: add logic to isolate sections of logic from the clock when they are not in use.
Glitch reduction:
Balance the number of synthesized logic levels.
Duplicate resources instead of sharing them.
Increase the number of pipeline registers.
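Both techniques target the switching component of CMOS power. The standard first-order model of dynamic power, stated here for reference and not taken from the project’s measurements, is

$$P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f$$

where $\alpha$ is the switching activity factor, $C_L$ the switched load capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency. Clock isolation drives $\alpha$ toward zero for idle logic and its local clock net. Glitch reduction removes spurious transitions that raise $\alpha$ without doing useful work.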
Goals and Applications
Demonstrate that a low-power wireless video system is practical.
Design for a power-constrained, low-bandwidth PDA.
Applications:
Interactive shopping: request video of product information while shopping.
Multimedia preview: preview a movie before buying or renting it; watch a music video while previewing a new album.
Any Questions?