Performance Analysis of Processor
![Page 1: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/1.jpg)
Performance Analysis of Processor
Final Presentation
Performed by: Alexei Iolin 307724211, Alexander Faingersh 306966912
Instructor: Evgeny Fiksman
![Page 2: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/2.jpg)
Agenda
• Project Goals
• MicroBlaze architecture
• OPB timer/counter and interrupt controller
• Connecting Customized IP to FSL bus
• Our Customized IP
• Time performance measurements
• Current measurements
• Conclusions
![Page 3: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/3.jpg)
Project Goals
• Examine the computational performance of MicroBlaze by timing running applications, and examine its interrupt-handling performance.
• Measure power consumption by sampling the supply current during program execution.
• Implement an arbitrary application (IDCT) in hardware and use it as a hardware accelerator for MicroBlaze.
• Implement the same functionality in C and compare the results with the hardware version.
• Add self-written C code for testing the FPU.
MicroBlaze is a soft-core processor developed by Xilinx that meets performance, area-efficiency, and low-cost targets. Although MicroBlaze enables fast system development on a single FPGA, some "special" applications run slower on it than in a dedicated hardware IP. We examine this in the EDK environment.
![Page 4: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/4.jpg)
Hardware
![Page 5: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/5.jpg)
EDK and MicroBlaze
• The Embedded Development Kit (EDK) is a set of microprocessor design tools and common software platforms. The EDK includes the Platform Studio tool suite, the MicroBlaze core, and a library of peripheral IP cores.
• The MicroBlaze embedded soft core is a 32-bit Reduced Instruction Set Computer (RISC) optimized for implementation in FPGAs, operating at up to 200 MHz.
• MicroBlaze gives complete flexibility in selecting peripherals, memory, and interface features on a single FPGA.
![Page 6: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/6.jpg)
MicroBlaze Architecture
MicroBlaze Hardware Options and Functions
• Hardware Barrel Shifter
• Hardware Divider
• Machine Status Set and Clear Instructions
• Hardware Exception Support
• Pattern Compare Instructions
• Floating-Point Unit (FPU)
• Hardware Multiplier Enable
Bus Infrastructure
• Data-side On-chip Peripheral Bus (DOPB)
• Instruction-side On-chip Peripheral Bus (IOPB)
• Data-side Local Memory Bus (DLMB)
• Instruction-side Local Memory Bus (ILMB)
• Fast Simplex Link (FSL)
![Page 7: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/7.jpg)
Overall view on OPB peripherals
![Page 8: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/8.jpg)
OPB Timer/Counter
The TC (Timer/Counter) is a 32-bit timer module that attaches to the OPB.
• Two programmable interval timers with interrupt, event generation, and event capture capabilities.
• Each timer has three 32-bit registers:
1. TCSR - Control/Status Register
2. TLR - Load Register
3. TCR - Timer/Counter Register
• Both timer/counter modules can be used in a Generate Mode, a Capture Mode, or a Pulse Width Modulation (PWM) Mode.
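As a rough illustration of what a 32-bit counter width means for measurements, the sketch below computes how long a free-running counter can count before it wraps. The 100 MHz clock is an assumption taken from the MB frequency quoted in the time-measurement slide; the function name is hypothetical.

```python
COUNTER_BITS = 32  # width of the TCR counter register, per the slide


def max_interval_seconds(clock_hz: float, bits: int = COUNTER_BITS) -> float:
    """Longest interval a free-running counter can time before it wraps."""
    return (2 ** bits) / clock_hz


# Assumed clock: 100 MHz (the MB frequency quoted for the time measurements).
print(f"{max_interval_seconds(100e6):.2f} sec")
```

Under that assumption a single timer can cover a little under 43 seconds, which is far longer than any of the program runs measured in this project.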
![Page 9: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/9.jpg)
OPB Interrupt Controller
![Page 10: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/10.jpg)
Continuing INTC…
INTC Features
• Priority between interrupt requests is determined by vector position.
• Supports OPB data bus widths of 8, 16, or 32 bits.
• The number of interrupt inputs is configurable up to the width of the data bus.
• Interrupt Enable Register (IER) for selectively disabling individual interrupt inputs.
• Master Enable Register for disabling interrupt request output and choosing software or hardware interrupts.
• Each input is configurable for edge or level sensitivity.
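A minimal sketch of how priority by vector position can be resolved in software terms. It is a simplified model, not the INTC hardware: it assumes that the lowest bit position has the highest priority, and the function and parameter names are hypothetical.

```python
def highest_priority_pending(status: int, enable: int, width: int = 32):
    """Return the index of the winning interrupt input, or None if none is pending.

    status: bit i is 1 when interrupt input i is active.
    enable: bit i is 1 when input i is enabled (the role of the IER).
    Assumption: the lowest bit position has the highest priority.
    """
    pending = status & enable & ((1 << width) - 1)
    if pending == 0:
        return None
    # pending & -pending isolates the lowest set bit; bit_length gives its index + 1.
    return (pending & -pending).bit_length() - 1


print(highest_priority_pending(0b1010, 0b1111))  # inputs 1 and 3 active, all enabled
print(highest_priority_pending(0b1010, 0b1000))  # input 1 masked by the enable bits
```

Masking with the enable bits before picking a winner mirrors the role of the IER described above: a disabled input never wins, regardless of its position.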
![Page 11: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/11.jpg)
Connecting Customized IP to FSL BUS
• MicroBlaze can use its dedicated FSL bus interface to integrate a customized IP core into a MicroBlaze soft-processor-based system.
• Generally, there are two ways to integrate a customized IP core into a MicroBlaze system:
1. Connect the IP to the On-chip Peripheral Bus (OPB).
2. Connect the user IP to the MicroBlaze dedicated Fast Simplex Link (FSL) bus system.
• If the application is time-critical, the delays of the standard bus must be taken into account, so the user IP should be connected to the FSL bus system. Otherwise, it can be connected as a slave or master on the OPB.
![Page 12: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/12.jpg)
Continuing Our Customized IP…
The whole embedded system consists of the MicroBlaze itself, two FSL bus systems, the user core, an OPB on-chip bus, two OPB peripherals (UART Lite, Timer and Interrupt Controller), and the on-chip block RAM. The application program is stored in the on-chip block RAM.
![Page 13: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/13.jpg)
• More than two dynamic inputs and more than one output can be used, because up to 16 FSL bus interfaces are provided.
• The user IP is independent: it does not affect the internal MB RISC architecture and therefore does not lower the MB clock frequency.
• Implementing the IP outside the processor allows custom calculations to run in parallel with the main application.
• The new hardware does not require inline assembler code, because the FSL interface has predefined C macros for I/O to the IP.
![Page 14: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/14.jpg)
Our Customized IP
• We connected a 1-dimensional IDCT HW block to a specially configured FSL.
• A 1-dimensional IDCT realized in software has a long execution time because the C program executes many loops sequentially.
• Implementing the application as a hardware module greatly reduces the execution time thanks to parallel processing.
• The software application writes 8 values from memory to the FSL. The IDCT core receives the data, calculates the result, and returns the result data (8 words) to the MB through the FSL.
• The HW implementation of the IDCT is written in VHDL (the code is available in the EDK IP wizard).
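To make the "many loops executed sequentially" point concrete, here is a floating-point reference sketch of an 8-point 1-dimensional IDCT. It uses the common orthonormal (JPEG-style) definition, which is an assumption: the slides do not show the authors' C code or the scaling used by the VHDL core, and `idct_1d` is a hypothetical name.

```python
import math


def idct_1d(coeffs):
    """1-D inverse DCT of a coefficient block (orthonormal, JPEG-style scaling)."""
    n_points = len(coeffs)
    out = []
    for n in range(n_points):            # one pass per output sample
        total = 0.0
        for k, x_k in enumerate(coeffs):  # one multiply-accumulate per coefficient
            scale = math.sqrt(0.5) if k == 0 else 1.0
            angle = (2 * n + 1) * k * math.pi / (2 * n_points)
            total += scale * x_k * math.cos(angle)
        out.append(total * math.sqrt(2.0 / n_points))
    return out


# A block holding only a DC coefficient decodes to a constant signal.
dc_only = [5 * 2 * math.sqrt(2)] + [0.0] * 7
print([round(v, 6) for v in idct_1d(dc_only)])
```

For 8 inputs the nested loops perform 64 multiply-accumulate steps one after another, whereas a hardware block can evaluate many of those products in parallel.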
![Page 15: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/15.jpg)
HDL Structure
![Page 16: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/16.jpg)
Time measurements
• The time measurement relies on accessing the Timer/Counter OPB peripheral of the MB and controlling the timer's counting registers.
• The counter is set to count CPU clocks during the program execution periods.
• Time is calculated from the counter value and the system clock frequency.
• Example measurement result:
The measured time is T = 9366019 clocks. The MB frequency is 100 MHz, so T = 9366019 / (100 × 10^6) sec = 93.66 msec.
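The conversion from a counter value to elapsed time can be sketched as follows. The clock count and the 100 MHz frequency are the values quoted above; the function name is hypothetical.

```python
def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Convert a clock-cycle count to elapsed time in seconds."""
    return cycles / clock_hz


elapsed = cycles_to_seconds(9_366_019, 100e6)
print(f"{elapsed * 1e3:.2f} msec")  # 93.66 msec
```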
![Page 17: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/17.jpg)
Interrupt controller time measurements
1. Measure the clean program time (T).
2. Load the timer register with A = 0xFFFFFFFF - T + 0xFF.
3. Measure the final time including the INTC (T1).
4. Delta = T1 - (A + 0xFF).
![Page 18: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/18.jpg)
Total time measurement results

| Application | Time |
|---|---|
| IDCT SW, fixed point | 320.31 usec |
| IDCT SW, floating point | 93.66 msec |
| IDCT HW, fixed point (FSL) | 12.31 usec |
| Empty interrupt handler | 1.12 usec |
| Printing a character to RS232 | 53.85 msec |
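The relative speedups implied by these measurements can be worked out with a short calculation. The times are the values reported above; the variable names are illustrative only.

```python
# Measured execution times from the results above, in microseconds.
sw_fixed_us = 320.31
sw_float_us = 93.66e3   # 93.66 msec
hw_fixed_us = 12.31

hw_vs_sw_fixed = sw_fixed_us / hw_fixed_us   # FSL hardware vs fixed-point software
fixed_vs_float = sw_float_us / sw_fixed_us   # fixed-point vs floating-point software

print(f"HW (FSL) is about {hw_vs_sw_fixed:.0f}x faster than fixed-point SW")
print(f"Fixed-point SW is about {fixed_vs_float:.0f}x faster than floating-point SW")
```

The hardware block is therefore roughly 26 times faster than the fixed-point software version, and the floating-point software version is close to 300 times slower than the fixed-point one.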
![Page 19: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/19.jpg)
Current measurements
The current measurement relies on connecting a power resistor in series with the MB 1.5 V power block and recording the voltage across the resistor by sampling it with an NI DAQ LabVIEW system. The program was repeated 5000 times for better statistics. The sampling frequency was 20 kHz. For example:
[Figure: Current measurement of fixed point SW. Current [A] (0.08 to 0.24) versus time [sec] (0 to 2), with three marked phases: initial position, program execution, eternity loop.]
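A minimal sketch of how sampled resistor voltages turn into current figures, using Ohm's law (I = V / R). The resistor value and the sample voltages below are hypothetical, since the slides do not state them; only the method follows the description above.

```python
from statistics import mean


def shunt_currents(voltages, shunt_ohms):
    """Convert voltages sampled across the series resistor to currents in amperes."""
    return [v / shunt_ohms for v in voltages]


def summarize(voltages, shunt_ohms):
    """Return (mean current, max current) for one recorded phase."""
    currents = shunt_currents(voltages, shunt_ohms)
    return mean(currents), max(currents)


# Hypothetical values: a 0.5 ohm resistor and three voltage samples in volts.
mean_current, max_current = summarize([0.080, 0.085, 0.110], shunt_ohms=0.5)
print(f"mean = {mean_current:.4f} A, max = {max_current:.4f} A")
```

Applying the same summary separately to the samples of each phase (initial position, program execution, eternity loop) gives per-phase mean and maximum currents of the kind reported in the results.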
![Page 20: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/20.jpg)
Current measurement results

| Application | Mean initial and eternity current [A] | Mean program current [A] | Max program current [A] | Mean power [mW] |
|---|---|---|---|---|
| IDCT SW, fixed point | 0.1683 | 0.1649 | 0.2253 | ~250 |
| IDCT SW, floating point | 0.2504 | 0.1953 | 0.2714 | ~350 |
| IDCT HW, fixed point (FSL) | 0.2113 | 0.1614 | 0.2225 | ~280 |
![Page 21: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/21.jpg)
Conclusions

| Application | Mean initial and eternity current [A] | Mean program current [A] | Max program current [A] | Time | Mean power [mW] | Mean energy [uJ] |
|---|---|---|---|---|---|---|
| IDCT SW, fixed point | 0.1683 | 0.1649 | 0.2253 | 320.31 usec | ~250 | 80 |
| IDCT SW, floating point | 0.2504 | 0.1953 | 0.2714 | 93.66 msec | ~350 | 32781 |
| IDCT HW, fixed point (FSL) | 0.2113 | 0.1614 | 0.2225 | 12.31 usec | ~280 | 3.45 |
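The energy figures appear to follow from mean power multiplied by execution time. The sketch below reproduces them from the reported values; the helper name is hypothetical.

```python
def energy_uj(power_mw: float, time_s: float) -> float:
    """Energy in microjoules from mean power in milliwatts and time in seconds."""
    return power_mw * time_s * 1e3  # mW * s = mJ, and 1 mJ = 1000 uJ


# (mean power [mW], execution time [s]) as reported in the table.
runs = {
    "IDCT SW, fixed point": (250, 320.31e-6),
    "IDCT SW, floating point": (350, 93.66e-3),
    "IDCT HW, fixed point (FSL)": (280, 12.31e-6),
}

for name, (power_mw, time_s) in runs.items():
    print(f"{name}: {energy_uj(power_mw, time_s):.2f} uJ")
```

Because the three power figures are of similar magnitude, the energy per run is dominated by the execution time, which is why the hardware version comes out far ahead.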
• MicroBlaze is a very effective solution for systems that must be easily configurable.
• Identifying critical paths and implementing them in HW can greatly decrease execution time.
• FSL is the most effective way, in both energy and time, to connect HW acceleration to the MB.
• The FPU is the most problematic consumer. Heavy use of the FPU is not recommended on the MB.
• The maximum interrupt input frequency is about 890 kHz.
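The interrupt frequency limit appears consistent with the 1.12 usec measured for an empty interrupt handler: if each interrupt occupies the processor for that long, the handler time bounds the sustainable rate. A small sketch of that reasoning, with illustrative names:

```python
def max_interrupt_rate_hz(handler_time_s: float) -> float:
    """Upper bound on interrupt rate when each interrupt costs handler_time_s."""
    return 1.0 / handler_time_s


rate_hz = max_interrupt_rate_hz(1.12e-6)  # empty interrupt handler time
print(f"{rate_hz / 1e3:.0f} kHz")
```

This gives a bound of roughly 890 kHz for an empty handler; any real handler body would lower the sustainable rate further.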
![Page 22: Performance Analysis of Processor](https://reader036.vdocument.in/reader036/viewer/2022062520/56815931550346895dc664e6/html5/thumbnails/22.jpg)
Questions?