03/12/2010 1 Analysis of FPGA based Kalman Filter Architectures Arvind Sudarsanam Dissertation Defense 12 March 2010

TRANSCRIPT

  • Slide 1
  • Analysis of FPGA based Kalman Filter Architectures. Arvind Sudarsanam. Dissertation Defense, 12 March 2010
  • Slide 2
  • Outline: Introduction; Literature review; PolyFSA architecture; Architecture analysis (Area analysis, Error analysis, Performance analysis); Contributions; Future work
  • Slide 3
  • Outline: Introduction; Literature review; PolyFSA architecture; Architecture analysis (Area analysis, Error analysis, Performance analysis); Contributions; Future work
  • Slide 4
  • Kalman filters for spacecraft navigation
  • Slide 5
  • Kalman filters
  • Slide 6
  • Research overview: An FPGA-based polymorphic systolic array architecture is proposed to accelerate Kalman filters; portions of this architecture can be reused for other applications at run time. A comprehensive architecture analysis is presented, with results given in terms of area savings for varying performance and precision error.
  • Slide 7
  • Outline: Introduction; Literature review; PolyFSA architecture; Architecture analysis (Area analysis, Error analysis, Performance analysis); Contributions; Future work
  • Slide 8
  • Hardware design for Kalman filters. Systolic arrays: Yeh [7], M. Lu [8] and P. Rao [9] proposed systolic array architectures for Kalman filters based on the Faddeev algorithm. Cardoso et al. [11] proposed a hardware/software co-processor system. Profiling is used to guide partitioning by the designer, and the C2H tool [12] from Altera is used to generate RTL designs. However, these architectures are not scalable. Some efforts [15-20] target individual linear algebra operations, such as matrix inversion.
  • Slide 9
  • Error analysis: Initial efforts [28-35] targeted the analysis of variable-precision fixed-point arithmetic. Constantinides [36-45] proposed multiple approaches to error analysis for fixed-point arithmetic. The availability of FPGAs has caused a surge in work on variable-precision architectures, especially in the floating-point domain [46-53].
  • Slide 10
  • Performance and area analysis: Existing performance and area estimation approaches target a parameter-specific architecture [72]. Parameters include overall data path width, memory size, and number of processing elements. The proposed research is also parameter-specific, but considers the latency, precision and input rates of floating-point arithmetic units.
  • Slide 11
  • Outline: Introduction; Literature review; PolyFSA architecture (Application analysis, Mapping to systolic array, Architecture details); Architecture analysis; Contributions; Future work
  • Slide 12
  • Extended Kalman Filter
  • Slide 13
  • Faddeev algorithm: The Faddeev algorithm is a method for efficiently computing the Schur complement $D - CA^{-1}B$. Given matrices A, B, C and D, arrange them in the block matrix $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$. Reduce M to row echelon form, and $D - CA^{-1}B$ will result in the lower right corner.
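The reduction described on this slide can be illustrated in software. The following is a minimal sketch, not the PolyFSA hardware design: it assumes the block layout M = [A B; C D], uses exact rational arithmetic from the Python standard library, and the function name `faddeev_schur` is hypothetical.

```python
from fractions import Fraction

def faddeev_schur(A, B, C, D):
    """Return D - C A^{-1} B by row-reducing the block matrix [[A, B], [C, D]].

    A must be square and nonsingular; matrices are lists of row lists.
    """
    n = len(A)
    # Build M = [A B; C D] with exact rational entries (assumed layout).
    M = [[Fraction(x) for x in ra + rb] for ra, rb in zip(A, B)]
    M += [[Fraction(x) for x in rc + rd] for rc, rd in zip(C, D)]

    for k in range(n):
        # Pivot only among the rows of A so the rows of [C D] stay in place.
        p = next(r for r in range(k, n) if M[r][k] != 0)
        M[k], M[p] = M[p], M[k]
        # Eliminate column k from every row below the pivot, including [C D].
        for r in range(k + 1, len(M)):
            f = M[r][k] / M[k][k]
            M[r] = [a - f * b for a, b in zip(M[r], M[k])]

    # The lower-right block now holds the Schur complement.
    return [row[n:] for row in M[n:]]

A = [[2, 1], [1, 3]]
B = [[1], [2]]
C = [[4, 5]]
D = [[6]]
print(faddeev_schur(A, B, C, D))  # [[Fraction(11, 5)]]
```

Choosing B and C as identity matrices and D as zero yields the negated inverse of A, which shows how a single reduction routine can serve several linear algebra operations.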
  • Slide 14
  • Faddeev algorithm
  • Slide 15
  • Faddeev algorithm: single node, boundary node, internal node
  • Slide 16
  • Mapping to systolic array: simplify the data flow; map to a 1-D systolic array; fold to make the systolic array scalable
  • Slide 17
  • Architecture details for the boundary PE (details for the internal PE are similar)
  • Slide 18
  • Control flow
  • Slide 19
  • Results: The target FPGA is a Xilinx Virtex 4 SX35. The test case is derived from [Ronnback-2000]. Performance is compared against a software implementation on a Virtutech Simics PowerPC 750 simulator (thanks: Rob Barnes [79]).
  • Slide 20
  • Performance of proposed PolyFSA: overall execution time of the EKF on the PolyFSA-based system architecture and on the PowerPC; estimated execution time of the Faddeev algorithm for varying numbers of PEs and Faddeev parameters
  • Slide 21
  • Outline: Introduction; Literature review; PolyFSA architecture; Architecture analysis (Area analysis, Error analysis, Performance analysis); Contributions; Future work
  • Slide 22
  • Architecture analysis: At design time, each PE in the proposed PolyFSA is designed for the best performance and the highest precision. Question: by allowing performance to degrade and/or tolerating precision error, can we reconfigure the existing PE with a set of smaller PEs?
  • Slide 23
  • Design parameters that can be varied: precision of the adder unit (madd), multiplier unit (mmul) and divider unit (mdiv); latency of the adder unit (LatAdd), multiplier unit (LatMul) and divider unit (LatDiv); input rate of the divider (c_rate)
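The seven parameters on this slide define a design space that the later analysis sweeps. A minimal sketch of how such a space could be enumerated follows; the class `PEConfig`, the function `design_space`, the snake_case field names for LatAdd, LatMul and LatDiv, the units (bits and cycles) and the sample values are all assumptions made for illustration, not taken from the dissertation.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class PEConfig:
    madd: int     # precision of the adder unit (assumed: mantissa bits)
    mmul: int     # precision of the multiplier unit
    mdiv: int     # precision of the divider unit
    lat_add: int  # LatAdd: latency of the adder unit (assumed: clock cycles)
    lat_mul: int  # LatMul: latency of the multiplier unit
    lat_div: int  # LatDiv: latency of the divider unit
    c_rate: int   # input rate of the divider

def design_space(precisions, latencies, c_rates):
    """Yield every combination of the seven design parameters."""
    for madd, mmul, mdiv in product(precisions, repeat=3):
        for lat_add, lat_mul, lat_div in product(latencies, repeat=3):
            for c_rate in c_rates:
                yield PEConfig(madd, mmul, mdiv, lat_add, lat_mul, lat_div, c_rate)

# Hypothetical sample values: 2 precisions, 2 latencies, 2 divider input rates.
configs = list(design_space([16, 24], [2, 4], [1, 2]))
print(len(configs))  # 2**3 * 2**3 * 2 = 128
```

Even two candidate values per parameter give 128 configurations, which is why the analysis summarizes the space with trade-off curves instead of listing every point.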
  • Slide 24
  • Area analysis: adder unit
  • Slide 25
  • Area analysis: multiplier unit
  • Slide 26
  • Area analysis: divider unit
  • Slide 27
  • Area analysis: divider unit
  • Slide 28
  • Error analysis: top-level flow
  • Slide 29
  • Faddeev algorithm: error vs. precision
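The relationship between precision and error can be explored in software by rounding each arithmetic result to a reduced number of significant bits. The sketch below is an illustration under stated assumptions, not the error-analysis flow from the dissertation: it treats madd, mmul and mdiv as significand widths in bits, applies them to the scalar case d - c*b/a of the Schur complement, and the helper names `round_mantissa` and `schur_scalar` are hypothetical.

```python
import math

def round_mantissa(x, bits):
    """Round x to `bits` significant binary digits (nearest, via Python's round)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)

def schur_scalar(a, b, c, d, madd, mmul, mdiv):
    """Scalar Schur complement d - c*b/a, each operation at its own precision."""
    q = round_mantissa(b / a, mdiv)   # divider unit
    p = round_mantissa(c * q, mmul)   # multiplier unit
    return round_mantissa(d - p, madd)  # adder unit

# Compare reduced-precision results against double precision.
a, b, c, d = 3.0, 1.0, 7.0, 5.0
exact = d - c * b / a
for bits in (8, 12, 16, 24):
    approx = schur_scalar(a, b, c, d, bits, bits, bits)
    print(bits, abs(approx - exact))
```

Each rounding step introduces a relative error of at most 2**-bits, so the three precisions can be lowered independently to see which unit dominates the final error.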
  • Slide 30
  • Error analysis for EKF
  • Slide 31
  • EKF area savings vs. error
  • Slide 32
  • Performance analysis: major portion of execution time
  • Slide 33
  • Calculation of T_faddeev: The execution time of the Faddeev algorithm on the proposed PolyFSA is computed using a simulation model. We are interested in observing the impact of performance degradation on resource utilization. Results are shown for the overall execution of the EKF.
  • Slide 34
  • Performance analysis: vary latency
  • Slide 35
  • Performance analysis: vary c_rate
  • Slide 36
  • Area versus performance
  • Slide 37
  • 3-D Pareto curves
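A three-objective Pareto set can be extracted from a list of evaluated designs with a simple dominance filter. The sketch below assumes each design is an (area, execution time, error) tuple in which all three values are to be minimized; the function name `pareto_front` and the sample numbers are hypothetical and do not come from the dissertation's results.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimized)."""
    def dominates(p, q):
        # p dominates q if it is no worse everywhere and strictly better somewhere.
        no_worse = all(a <= b for a, b in zip(p, q))
        better = any(a < b for a, b in zip(p, q))
        return no_worse and better
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical designs: (area, execution time, error).
designs = [
    (100, 10, 1e-6),
    (80, 12, 1e-6),
    (80, 12, 1e-4),   # same area and time as the previous design, larger error
    (120, 10, 1e-6),  # same time and error as the first design, larger area
    (60, 20, 1e-3),
]
print(pareto_front(designs))
```

The filter compares every pair of points, which is quadratic in the number of designs but adequate for design spaces of a few thousand configurations.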
  • Slide 38
  • Summary: An FPGA-based Polymorphic Faddeev Systolic Array (PolyFSA) architecture is proposed to accelerate the compute-intensive kernels of Kalman filters. A hierarchical analysis is presented of the error that reduced precision introduces into the results of Kalman filter computations. A simulation model is proposed to estimate the overall execution time of the Kalman filter algorithm. Results of the architecture analysis are presented as Pareto curves.
  • Slide 39
  • Future work: The proposed methodology (architecture design supported by analysis) can be applied to designs for other applications. Design goals can be extended to incorporate power consumption. Design parameters can be extended to include other options, such as implementation type and FPGA family.