Wavelet Spectral Dimension Reduction of Hyperspectral Imagery on a Reconfigurable Computer
Tarek El-Ghazawi1, Esam El-Araby1, Abhishek Agarwal1, Jacqueline Le Moigne2, and Kris Gaj3
1The George Washington University, 2NASA/Goddard Space Flight Center, 3George Mason University
{tarek, esam, agarwala}@gwu.edu, [email protected], [email protected]
El-Ghazawi 2 E229 / MAPLD2004
Objectives and Introduction
Investigate Use of Reconfigurable Computing for On-Board Automatic Processing of Remote Sensing Data
Remote Sensing Image Classification
Applications: Land Classification, Mining, Geology, Forestry, Agriculture, Environmental Management, Global Atmospheric Profiling (e.g. water vapor and temperature profiles), and Planetary Space missions
Types of Carriers: Airborne, Spaceborne
Types of Sensing
Mono-Spectral Imagery: 1 band (SPOT ≡ panchromatic)
Multi-Spectral Imagery: 10s of bands (MODIS ≡ 36 bands, SeaWiFS ≡ 8 bands, IKONOS ≡ 5 bands)
Hyperspectral Imagery: 100s-1000s of bands (AVIRIS ≡ 224 bands, AIRS ≡ 2378 bands)
[Figure: Multispectral / Hyperspectral Imagery Comparison]
Different Airborne Hyperspectral Systems
[Figure: AISA, AURORA, AVIRIS, GER]
Why On-Board Processing?
Problems
  Complex Pre-processing Steps: Image Registration / Fusion
  Large Data Volumes
    Large cost and complexity of the On-The-Ground / Earth processing systems
    Large critical decisions latency
    Large data downlink bandwidth requirements
Solutions
  Automatic On-Board Processing
    Reduces the cost and the complexity of the On-The-Ground / Earth processing system: larger utilization for broader community, including educational institutions
    Enables autonomous decisions to be taken on-board: faster critical decisions
    Applications:
      » Future reconfigurable web sensors missions
      » Future Mars and planetary exploration missions
  Dimension Reduction*
    Reduction of communication bandwidth
    Simpler and faster subsequent computations
* Investigated Pre-Processing Step
Why Reconfigurable Computers?
On-Board Processing Problems
  High Computational Complexities
    Low performance for traditional processing platforms
    High form / wrap factors (size and weight) for parallel computing systems
    Low flexibility for traditional ASIC-Based solutions
    High costs and long design cycles for traditional ASIC-Based solutions
Solutions
  Reconfigurable Computers (RCs)
    Higher performance (throughput and processing power) compared to conventional processors
    Lower form / wrap factors compared to parallel computers
    Higher flexibility (reconfigurability) compared to ASICs
    Lower costs and shorter time-to-solution compared to ASICs
Introduction
Data Arrangement
[Figure: the hyper image cube (512 pixels of rows x 512 pixels of columns x 224 bands) rearranged into Matrix Form, with Pixels ≡ (Rows x Columns) along one axis and Bands along the other. The pixel axis is labeled "Parallel Computing Scope, Reconfigurable Computing 2nd Scope"; the band axis is labeled "Reconfigurable Computing 1st Scope".]
Data Arrangement (cnt’d)
[Figure: the hyper image cube (Rows x Columns x Bands) stored in Array Form as a sequence of 8-bit samples. Pixel 0 is (0,0), pixel 1 is (0,1), and so on up to pixel (Pixels-1), which is (rows-1,cols-1); each pixel holds its samples for bands 0, 1, 2, .., Bands-1.]
Pixels = Rows X Columns
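The array form above amounts to a pixel-major layout in which all band samples of one pixel sit next to each other. A minimal sketch of the index arithmetic follows, assuming row-major pixel order and one array element per 8-bit sample; the function names are hypothetical and not taken from the original design.

```python
def sample_index(row, col, band, cols, bands):
    """Offset of one sample in the flat array: pixels in row-major order, bands contiguous per pixel."""
    pixel = row * cols + col
    return pixel * bands + band


def pixel_spectrum(data, row, col, cols, bands):
    """Return the 1-D spectral signature (all bands) of the pixel at (row, col)."""
    start = sample_index(row, col, 0, cols, bands)
    return data[start:start + bands]


# Tiny 2 x 3 image with 4 bands; each sample stores its own flat offset,
# which makes the layout easy to check by eye.
rows, cols, bands = 2, 3, 4
data = bytes(pixel * bands + band
             for pixel in range(rows * cols)
             for band in range(bands))

# Spectrum of the last pixel, (rows-1, cols-1).
last = pixel_spectrum(data, rows - 1, cols - 1, cols, bands)
```

Because a pixel's spectrum is one contiguous slice, a per-pixel 1-D transform can stream through the array without strided access.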
Examples of Hyperspectral Datasets
AVIRIS: SALINAS’98 (217x512 by 192 bands)
AVIRIS: INDIAN PINES’92 (400x400 by 192 bands)
Dimension Reduction Techniques
Principal Component Analysis (PCA):
  Most Common Method of Dimension Reduction
  Does Not Preserve Spectral Signatures
  Complex and Global computations: difficult for parallel processing and hardware implementations
Wavelet-Based Dimension Reduction:
  Preserves Spectral Signatures
  High-Performance Implementation
  Simple and Local Operations
Multi-Resolution Wavelet Decomposition of Each Pixel 1-D Spectral Signature (Preservation of Spectral Locality)
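To illustrate what a multi-resolution decomposition of one pixel's 1-D spectral signature looks like, here is a minimal sketch. It assumes a Haar filter pair (pairwise mean and half-difference); the slides do not name the wavelet filter actually used, so treat the filter choice and the function names as assumptions.

```python
def haar_step(signal):
    """One decomposition level: low-pass (pairwise mean) and high-pass
    (pairwise half-difference), each downsampled by 2."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / 2.0 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / 2.0 for i in pairs]
    return approx, detail


def reduce_spectrum(spectrum, levels):
    """Keep only the approximation after `levels` decomposition levels."""
    approx = list(spectrum)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


# Each value of the result summarizes a local group of adjacent bands,
# which is what preserves spectral locality.
spectrum = [10, 12, 20, 22, 40, 44, 30, 34]
reduced = reduce_spectrum(spectrum, 2)   # [16.0, 37.0]
```

Each level halves the number of spectral components, so a 192-band pixel decomposed to level 5 keeps 6 components, which is consistent with the "6/5" through "96/1" labels used in the timing comparison.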
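2-D DWT (1-level Decomposition)
[Figure: the 1-D DWT block (low-pass filter L and high-pass filter H, each followed by downsampling by 2) applied along both image dimensions, producing the LL, LH, HL and HH subbands.]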
2-D DWT (2-level Decomposition)
[Figure: filter bank of L and H filters, each followed by downsampling by 2, arranged as a First Level and a Second Level; the LH, HL and HH subbands are labeled.]
Wavelet-Based vs. PCA (Execution Time, 500 MHz P3)
[Figure: bar chart "Timer-Salinas98" of Time (sec) versus No. of PC/Level of Decomp. (6/5, 12/4, 24/3, 48/2, 96/1). Wavelet: 7.696, 7.677, 7.631, 7.715, 9.003. PCA: 90.634, 94.824, 104.178, 122.173, 158.583.]
Complexity: Wavelet-Based = O(MN) ; PCA = O(MN^2 + N^3)
Wavelet-Based vs. PCA (cnt’d) (Execution Time, 500 MHz P3)

WAVELET (time in sec)
No. of PC / Level of Wavelet Decomposition
                6/5       12/4      24/3      48/2      96/1
Timer GLOBAL    7.696     7.677     7.631     7.715     9.003
IO_R            0.406     0.412     0.411     0.412     0.41
Comp.           7.253     7.19      7.069     6.692     7.939
IO_W            0.037     0.075     0.151     0.311     0.654

PCA (time in sec)
No. of PC / Level of Wavelet Decomposition
                6/5       12/4      24/3      48/2      96/1
Timer GLOBAL    90.634    94.824    104.178   122.173   158.583
IO_R            0.423     0.395     0.395     0.394     0.394
Comp.           90.173    94.355    103.633   121.478   157.568
IO_W            0.038     0.074     0.15      0.301     0.621

Complexity: Wavelet-Based = O(MN) ; PCA = O(MN^2 + N^3)

[Figure: pie charts of the execution time distribution. Wavelet-Based: IO_R 5%, Comp. 92%, IO_W 3%. PCA: IO_R 0%, Comp. 100%, IO_W 0%.]
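The gap between the two methods is easier to read as a ratio. The following sketch divides the PCA "Timer GLOBAL" totals by the wavelet totals listed on this slide; the variable and function names are illustrative only.

```python
# "Timer GLOBAL" totals in seconds, copied from the slide, per
# (No. of PC / Level of Wavelet Decomposition) setting.
labels = ["6/5", "12/4", "24/3", "48/2", "96/1"]
wavelet_total = [7.696, 7.677, 7.631, 7.715, 9.003]
pca_total = [90.634, 94.824, 104.178, 122.173, 158.583]


def time_ratios(slow, fast):
    """Element-wise ratio slow[i] / fast[i]."""
    return [s / f for s, f in zip(slow, fast)]


ratios = time_ratios(pca_total, wavelet_total)
for label, ratio in zip(labels, ratios):
    print(f"{label}: PCA takes {ratio:.1f}x the wavelet time")
```

On these figures PCA takes roughly 12 to 18 times as long as the wavelet-based reduction, and the gap widens as more components are kept.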
Wavelet-Based vs. PCA (cnt’d) (Classification Accuracy)
Implemented on the HIVE (8 Pentium Xeon, Beowulf-type system): 6.5 times faster than the sequential implementation
Classification Accuracy Similar or Better than PCA
Faster than PCA
The Algorithm
OVERALL
  1. Read Data
  2. Read Threshold (Th)
  3. Compute Level for Each Individual Pixel (PIXEL LEVEL)
  4. Remove Outlier Pixels
  5. Get Lowest Level (L) from Global Histogram
  6. Decompose Each Pixel to Level L
  7. Write Data
PIXEL LEVEL
  1. Decompose Spectral Pixel
  2. Save Current Level [a] of Wavelet Coefficients: DWT Coefficients (the Approximation)
  3. Reconstruct Individual Pixel to Original Stage: Reconstructed Approximation
  4. Compute Correlation (Corr) between Orig and Recon.
  5. Corr < Th?
     No: Get Current Level [a] of Wavelet Coefficients and decompose again (step 1)
     Yes: Add Contribution of the Pixel to Global Histogram
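The PIXEL LEVEL loop can be sketched in software as follows. This is only an illustration under stated assumptions: a Haar low-pass filter (pairwise mean), reconstruction with all detail coefficients set to zero, Pearson correlation as "Corr", and a pixel's level taken as the deepest level whose correlation has not yet dropped below Th. The outlier removal and the selection of L from the histogram are not specified in enough detail on the slide and are left out. All function names are hypothetical.

```python
from math import sqrt


def haar_approx(signal):
    """One level of low-pass filtering and downsampling by 2 (pairwise mean)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def reconstruct(approx, level):
    """Upsample an approximation by 2**level with details set to zero."""
    out = list(approx)
    for _ in range(level):
        out = [value for value in out for _repeat in range(2)]
    return out


def correlation(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return (n * sxy - sx * sy) / den if den else 0.0


def pixel_level(spectrum, threshold, max_level=5):
    """Deepest level whose reconstruction still correlates >= threshold."""
    level, approx = 0, list(spectrum)
    while level < max_level and len(approx) >= 2:
        candidate = haar_approx(approx)
        recon = reconstruct(candidate, level + 1)
        if correlation(spectrum[:len(recon)], recon) < threshold:
            break                      # Corr < Th: stop decomposing
        approx, level = candidate, level + 1
    return level


def level_histogram(pixels, threshold, max_level=5):
    """Global histogram: histogram[k] counts pixels whose level is k."""
    histogram = [0] * (max_level + 1)
    for spectrum in pixels:
        histogram[pixel_level(spectrum, threshold, max_level)] += 1
    return histogram


# A staircase spectrum: 8 plateaus of 8 bands each.
steps = [float(i // 8) for i in range(64)]
```

With the staircase input, the first three levels reconstruct the spectrum exactly, so a strict threshold stops at level 3 while a looser one allows level 4.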
Prototyping Wavelet-Based Dimension Reduction of Hyperspectral Imagery
on a Reconfigurable Computer, the SRC-6E
Hardware Architecture of SRC-6E
SRC Compilation Process
[Figure: compilation flow. Application sources (.c or .f files) pass through the µP Compiler, producing object files (.o files), and through the MAP Compiler, producing .o files and .v files. Macro sources (HDL sources, .vhd or .v files) and the .v files go through Logic synthesis to netlists (.ngo files), then through Place & Route to configuration bitstreams (.bin files). The Linker combines the object files and the configuration bitstreams into the application executable.]
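Top Hierarchy Module
[Figure: block diagram. The input X feeds the DWT_IDWT module, which produces the approximations L1:L5 and the reconstructions Y1:Y5. The Correlator takes X, Y1:Y5, TH and N and produces GTE_1:GTE_5. The Histogram module turns these into Level, and a MUX selects Llevel from L1:L5 according to Level.]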
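Decomposition and Reconstruction Levels of Dimension Reduction (DWT_IDWT)
[Figure: the input X (L0) passes through a cascade of five stages, Level_1 to Level_5, each a low-pass filter L followed by downsampling by 2, producing L1 to L5. Each Li is brought back to the original resolution by i stages of upsampling by 2 and reconstruction filtering L’, giving Y1 to Y5; Y1 to Y4 pass through a delay D.]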
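FIR Filters (L, L’) Implementation
[Figure: FIR filter structure. Samples of the Input Image D(i) are multiplied (X) by the coefficients held in Register C(1), Register C(2), Register C(3), …, Register C(n), and the products are summed (+) to give the Output Image F(i).]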
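The multiply-and-accumulate structure in the figure is a direct-form FIR filter. A minimal software sketch follows, assuming zero-based indexing, zero padding before the first sample, and Haar-like low-pass coefficients (0.5, 0.5) for the usage example; none of these details are given on the slide.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR: F(i) = sum over k of C(k) * D(i - k).

    Samples before the start of the input are treated as zero (assumption).
    """
    output = []
    for i in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if i - k >= 0:
                acc += c * samples[i - k]
        output.append(acc)
    return output


def downsample2(samples):
    """Keep every second sample (odd indices), as after a filter L."""
    return samples[1::2]


data = [10, 12, 20, 22]
filtered = fir_filter(data, [0.5, 0.5])   # [5.0, 11.0, 16.0, 21.0]
approx = downsample2(filtered)            # [11.0, 21.0]
```

In hardware, the inner loop becomes n multipliers working in parallel on a shift register of input samples, which is where the fine-grain parallelism comes from.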
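Correlator Module
[Figure: datapath. X and Yi feed the computation of termxx, termyy and termxy. MULT blocks form termxx x termyy and term2xy = termxy x termxy, and TH2 = TH x TH. term2xy is shifted left (32 bits) and compared with TH2 x termxx x termyy; the Compare output GTE_i increments Histogram_i.]
term_{AB} = N \sum_{i}^{N} A_i B_i - \sum_{i}^{N} A_i \sum_{i}^{N} B_i , width (16 + 2 \log_2 N) bits
\rho^2(x, y) = \frac{term_{xy}^2}{term_{xx} \, term_{yy}} \ge \left( \frac{TH}{2^{16}} \right)^2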
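The correlator avoids division and square roots by comparing squared, cross-multiplied integer terms. The sketch below mirrors that idea in software, assuming the threshold TH is a fixed-point value scaled by 2^16 (an interpretation of the 32-bit left shift in the figure). Like the squared form itself, it ignores the sign of the correlation. The function names are hypothetical.

```python
def term(a, b):
    """term_AB = N * sum(A_i * B_i) - sum(A_i) * sum(B_i), in exact integers."""
    n = len(a)
    return n * sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b)


def correlation_gte(x, y, th_q16):
    """True if the squared correlation of x and y is >= (th_q16 / 2**16)**2.

    Cross-multiplied so that only multiplies, a shift and a compare are
    needed: (term_xy^2 << 32) >= TH^2 * term_xx * term_yy.
    """
    term_xx = term(x, x)
    term_yy = term(y, y)
    term_xy = term(x, y)
    lhs = (term_xy * term_xy) << 32
    rhs = th_q16 * th_q16 * term_xx * term_yy
    return lhs >= rhs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]                      # correlation with x is 0.8
th_090 = round(0.9 * 65536)           # threshold 0.9 in 16-bit fixed point
th_070 = round(0.7 * 65536)           # threshold 0.7 in 16-bit fixed point
passes_strict = correlation_gte(x, y, th_090)   # 0.64 >= 0.81 is False
passes_loose = correlation_gte(x, y, th_070)    # 0.64 >= 0.49 is True
```

Python integers are unbounded, so the sketch does not model the hardware word widths; a fixed-width version would need the bit growth of each term accounted for.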
Histogram Module
[Figure: the flags GTE_1 to GTE_5 drive the Update Histogram Counters block, which maintains cnt_1 to cnt_5; the Level Selector derives Level from the counters.]
Resource Utilization and Operating Frequency
Measurements Scenarios
[Figure: timeline of one run. µP Functions: Read Data, MAP Alloc. (allocation time), MAP Function, MAP Free (release time), Write Data. Inside the MAP Function, repeated nstreams times: Transfer-In (CM to OBM), Compute (computations), Transfer-Out (OBM to CM). Three intervals are marked: End-to-End time (HW), Configuration + End-to-End time (SW), and End-to-End time with I/O.]
SRC Experiment Setup and Results
Salinas’98
  217 X 512 Pixels, 192 Bands = 162.75 MB
  Number of Streams = 41
  Stream Size = 2730 voxels ≈ 4 MB
Non-Overlapped Streams
  TDMA-IN = 13.040 msec
  TCOMP = 0.62428 msec
  TDMA-OUT = 22.712 msec
  TTotal = 1.49 sec
  Throughput = 109.23 MB/Sec
Overlapped Streams
  TDMA = 35.752 msec
  TCOMP = 0.62428 msec
  Xc = 0.0175
  Throughput = 111.14 MB/Sec
  Speedup over non-overlapped = (1 + Xc) = 1.0175 (insignificant)
[Figure: timing diagrams. Non-overlapped: each stream performs DMA-IN, Compute and DMA-OUT in sequence (TDMA-IN, TCOMPUTATIONS, TDMA-OUT), and the streams add up to TTOTAL. Overlapped: the Compute of a stream runs concurrently with the DMA-IN / DMA-OUT transfers of other streams.]
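The setup figures can be cross-checked with a few lines of arithmetic. The sketch below makes two assumptions that the slide does not state: each sample occupies 8 bytes in memory (the only size for which 217 x 512 x 192 samples come to 162.75 MB), and in the overlapped case the computation is fully hidden behind the DMA transfers.

```python
import math

MB = 1024 * 1024

rows, cols, bands = 217, 512, 192
bytes_per_sample = 8      # assumption: inferred from the stated 162.75 MB
stream_pixels = 2730      # "Stream Size = 2730 voxels"

total_mb = rows * cols * bands * bytes_per_sample / MB        # 162.75
stream_mb = stream_pixels * bands * bytes_per_sample / MB     # about 4 MB
n_streams = math.ceil(rows * cols / stream_pixels)            # 41

# Per-stream timings from the slide, converted to seconds.
t_dma_in, t_comp, t_dma_out = 13.040e-3, 0.62428e-3, 22.712e-3
t_dma = t_dma_in + t_dma_out                                  # 35.752 msec

# Non-overlapped: every stream pays DMA-IN + Compute + DMA-OUT.
t_non_overlapped = n_streams * (t_dma + t_comp)
# Overlapped (assumed model): compute is hidden, only DMA time remains.
t_overlapped = n_streams * t_dma

xc = t_comp / t_dma                                           # about 0.0175
speedup = t_non_overlapped / t_overlapped                     # equals 1 + xc
throughput_non_overlapped = total_mb / t_non_overlapped       # MB/sec
```

Since computation is under 2% of the transfer time per stream, hiding it cannot buy more than that 2%, which is why the slide calls the overlap speedup insignificant.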
Execution Time
Salinas'98, Time (sec)
Level of Decomposition       1        2        3        4        5
P3 (500MHz)                  14.27    20.23    23.21    30.22    33.05
Intel Xeon (1.8GHz)          8.60     12.34    16.16    20.44    20.21
SRC-6E (Non-Overlapped)      1.49     1.49     1.49     1.49     1.49
SRC-6E (Overlapped)          1.47     1.47     1.47     1.47     1.47
Distribution of Execution Times
Speedup Results
[Figure: chart "Salinas'98" of Speedup (axis from 0.00 to 25.00) versus Level of Decomposition (1 to 5), with four series: No Overlapping Speedup (P3, 500MHz), No Overlapping Speedup (Xeon, 1.8GHz), Overlapping Speedup (P3, 500MHz), Overlapping Speedup (Xeon, 1.8GHz).]
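The speedup values themselves did not survive in this transcript, but they can be recomputed from the execution times on the Execution Time slide. The sketch below assumes that the first series of chart values belongs to the P3 and the second to the Xeon, as the legend order suggests; the names are illustrative.

```python
# Execution times in seconds for decomposition levels 1 to 5 (Salinas'98).
levels = [1, 2, 3, 4, 5]
p3_500mhz = [14.27, 20.23, 23.21, 30.22, 33.05]
xeon_1800mhz = [8.60, 12.34, 16.16, 20.44, 20.21]
src_non_overlapped = 1.49   # same at every level
src_overlapped = 1.47       # same at every level


def speedups(cpu_times, src_time):
    """Speedup of the SRC-6E over a microprocessor at each level."""
    return [t / src_time for t in cpu_times]


p3_non = speedups(p3_500mhz, src_non_overlapped)
p3_ovl = speedups(p3_500mhz, src_overlapped)
xeon_non = speedups(xeon_1800mhz, src_non_overlapped)
xeon_ovl = speedups(xeon_1800mhz, src_overlapped)

for level, a, b in zip(levels, p3_non, xeon_non):
    print(f"level {level}: {a:.1f}x vs P3, {b:.1f}x vs Xeon (non-overlapped)")
```

Under that assumption the speedup over the P3 grows from just under 10x at level 1 to about 22x at level 5, because the microprocessor time increases with the level while the SRC-6E time stays flat.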
Concluding Remarks
We prototyped the automatic wavelet-based dimension reduction algorithm on a reconfigurable architecture
Both coarse-grain and fine-grain parallelism are exploited
We observed a 10x speedup using the P3 version of the SRC-6E. From our previous experience, we expect this speedup to double using the P4 version of the SRC machine
These speedup figures were obtained while I/O still dominates the execution time. The speedup can be increased by improving the I/O bandwidth of the reconfigurable platforms