Advanced Microcontrollers
Grzegorz Budzyń
Lecture 11: Digital Signal Controllers & Digital Signal Processors
Plan
• Digital Signal Controllers
– Introduction
• Digital Signal Controllers vs Microcontrollers
• Digital Signal Controllers vs Digital Signal Processors
– DSC by Texas Instruments
– DSC by Freescale
• Digital Signal Processors
Introduction
A Digital Signal Controller (DSC) is a combination
of a Microcontroller and a Digital Signal
Processor (DSP)
Introduction
- Like microcontrollers, DSCs have:
- Fast interrupt response
- Control-oriented peripherals (PWM, watchdog,
etc.)
- Usually programmed in C (although
assembly programming is possible)
Introduction
- Like digital signal processors, DSCs have:
- Single-cycle multiply-accumulate (MAC)
instructions
- Barrel shifters
- Large accumulators
Introduction
- Main applications of DSCs:
- Motor control
- Power conversion
- Sensor processing applications
dsPIC - Architecture
• dsPIC: a family of 16-bit RISC controllers with
DSP features
• Two subfamilies:
– dsPIC30F – smaller, slower
– dsPIC33F – highest performance
• The only digital signal controller on the
market available in QFN-28 packages, at prices
down to $3
dsPIC - Architecture
• Main features:
– Modified Harvard architecture,
– Optimized for C-compilers
– Two 40-bit accumulators with rounding
– Memory options as in PIC24
– Many single-cycle MAC operations
– Packages with 18 to 110 pins
DSC - TI portfolio
• Four main subfamilies:
– 24x 16-bit Series
– 28x Fixed point Series
– 28x Piccolo Series
– 28x Delfino Floating-point Series
Piccolo Series
• TMS320F2802x:
– fixed point microcontrollers
– 40-60MHz performance
– up to 64KB of on-chip flash
– small 38-pin package options
– feature rich peripherals:
• 150-ps high resolution enhanced pulse width modulators
(ePWMs)
• 4.6 MSPS 12-bit ADC
• high precision on-chip oscillators, analog comparators
• high speed 12-bit ADC
• support for I2C, SPI, and SCI.
Piccolo Series
- TMS320F2803x:
- fixed point 32-bit microcontrollers
- 60 MHz speed
- up to 128KB flash memory
- 64 or 80-pin packages
- peripherals and features of the 2802x devices plus:
- control law accelerator (CLA) for high efficiency control loops
- QEP module
- CAN and LIN interfaces
Delfino Series
• TMS320F2833x:
– Integrated floating point unit simplifies
development and speeds control applications up
by an average of 50%
– F2833x devices run at up to 150 MHz (300
MFLOPS) with two package offerings that are pin-
for-pin compatible within all F2833x and F2823x
controllers
– Features up to 512KB of on-chip flash and a DMA
for high speed memory access.
Delfino Series
• TMS320C2834x:
– delivers up to 600 MFLOPS of floating-point
performance
– up to 516KB of single-access RAM
– PWMs with 65-ps resolution
– Direct Memory Access and a low-latency core
make the C2834x an excellent solution for
performance-hungry real-time control
applications.
28x Fixed-Point Series
• TMS320F2823x:
– F2823x generation of controllers is a fixed point
version of the F2833x devices
– Pin-to-pin compatible with the F2833x series, all
of the peripherals and features remain the same
except for the floating point unit.
28x Fixed-Point Series
• TMS320F280x:
– devices offer 60-100 MHz performance
– they were the first generation to feature:
• the on-chip 12.5 MSPS 12-bit ADC
• multiple high resolution PWM peripherals
• QEP (quadrature encoder pulse)
– F280xx devices have up to 256KB of flash
memory.
28x Fixed-Point Series
• TMS320F281x:
– F281x device generation features:
• 150 MHz core
• flexible Event Managers that provide access to timers,
compare/PWM units, captures, and quadrature-
encoder units
C2000 core features
• Main features:
– Efficient C engine with hardware that allows a C
compiler to generate compact code, resulting in
industry-leading code density
– Single cycle read-modify-write instructions, single
cycle 32-bit multiply.
– Fast interrupt service time (down to 9 cycles) with
automatic zero-cycle context save.
– 96 dedicated interrupt vectors that require no
software decision making
C2000 core features
• Main features:
– 32-bit floating-point unit on Delfino controllers
– On select Piccolo devices, an independent Control Law
Accelerator (CLA) processes floating-point control loops to
free the CPU for other purposes.
– Three 32-bit general-purpose CPU timers bring accuracy
and flexibility to any application.
– Code Security Module prevents reverse engineering and
protects valuable intellectual property
MC56F8357
• Main features:
– On-chip memory includes high-speed volatile and
nonvolatile components:
• 512 KB of Program Flash
• 4 KB of Program RAM (836X Devices)
• 32 KB of Data RAM
• 32 KB of Data Flash (836X Devices)
• 32 KB of Boot Flash
– Access up to 4MB of off-chip program and 32MB
of data memory
– Up to 60 MIPS at 60 MHz execution frequency
MC56F8357
• Main features:
– Four 12-bit Analog-to-Digital Converters
– Temperature Sensor
– Up to two FlexCAN (CAN Version 2.0 B-compliant)
– Two Serial Communication Interfaces (SCIs)
– Up to two Serial Peripheral Interfaces (SPIs)
– Two dedicated external interrupt pins
– Software-programmable Phase-Locked Loop (PLL)
MC56F8357 - Core
• Main features 1/2:
– Efficient 16-bit 56800E family controller engine
with dual Harvard architecture
– Single-cycle 16 × 16-bit parallel Multiplier-
Accumulator (MAC)
– Four 36-bit accumulators, including extension bits
– Arithmetic and logic multi-bit shifter
– Parallel instruction set with unique DSP
addressing modes
– Hardware DO and REP loops
MC56F8357 - Core
• Main features 2/2:
– Three internal address buses and one external
address bus
– Four internal data buses and one external data bus
– Instruction set supports both DSP and controller
functions
– Controller-style addressing modes and instructions for
compact code
– Efficient C compiler and local variable support
– Software subroutine and interrupt stack with depth
limited only by memory
DSP Introduction
- Digital Signal Processing: application of
mathematical operations to digitally
represented signals
- Signals represented digitally as sequences of
samples
- Digital signals obtained from physical signals
via transducers (e.g., microphones) and
analog-to-digital converters (ADCs)
DSP Introduction
- Digital signals converted back to physical
signals via digital-to-analog converters (DAC)
- Digital Signal Processor (DSP): electronic
system that processes digital signals
DSP Introduction
- Most DSP tasks require:
- Repetitive numeric computations
- Attention to numeric fidelity
- High memory bandwidth, mostly via array
accesses
- Real-time processing
DSP Introduction
- DSPs must perform these tasks efficiently
while minimizing:
- Cost
- Power
- Memory use
- Development time
Common DSP applications
- Applications:
- Instrumentation and measurement
- Communications
- Audio and video processing
- Graphics, image enhancement, 3-D rendering
- Navigation, radar, GPS
- Control: robotics, machine vision, guidance
Common DSP algorithms
- Algorithms
- Frequency domain filtering - FIR and IIR
- Frequency-time transformations - FFT
- Correlation
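As an illustration of the filtering algorithms listed above, a direct-form FIR filter can be sketched in plain C. This is a minimal sketch that is not tied to any particular DSP: the function name fir_filter and its parameters are assumptions made for this example, and floating-point samples are assumed for readability.

```c
#include <stddef.h>

/* Direct-form FIR filter: y[n] = sum over k of h[k] * x[n-k].
 * Samples before the start of x are treated as zero. */
void fir_filter(const float *h, size_t taps,
                const float *x, float *y, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        float acc = 0.0f;                 /* accumulator for one output sample */
        for (size_t k = 0; k < taps && k <= n; k++) {
            acc += h[k] * x[n - k];       /* one multiply-accumulate per tap */
        }
        y[n] = acc;
    }
}
```

The inner loop is a chain of multiply-accumulate steps over an array, which is the access pattern that the DSP hardware features discussed in this lecture are meant to speed up.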
Fast data access
- Data must be transferred to and from memory
or DSP peripherals
- Instructions must be fetched from memory
- Three main techniques:
- high-bandwidth memory architectures
- specialized addressing modes
- direct memory access
High-bandwidth memory architectures
- Only Harvard (b) and Super-Harvard (c) architectures are used in DSPs
• Super-Harvard modification: a small bank of fast memory, called the ‘instruction cache’, is added to the DSP core
• Data may also be stored in the program memory
• The most recently executed program instructions are placed in the instruction cache at run time
High-bandwidth memory architectures
- Cache drawbacks:
– Cache hits are not fully predictable, which makes
execution time harder to guarantee
– A cache miss occurs when the data or
instructions needed by the DSP are not stored in
cache memory; they then have to be fetched from
slower memory, with an execution speed penalty
– A typical cause of a cache miss is a change in
program flow due to a branch instruction
Specialized addressing modes
• Address generator blocks control address
generation for:
– specialized addressing modes such as indexed
addressing, circular buffers, and bit-reversed
addressing
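Bit-reversed addressing is typically used to reorder data for radix-2 FFTs. A DSP address generator does the reordering in hardware; the following is only a software sketch of the index mapping, with the function name bit_reverse chosen for this example.

```c
#include <stdint.h>

/* Reverse the lowest `bits` bits of index i.
 * Example for bits = 3: index 1 (001) maps to index 4 (100). */
uint32_t bit_reverse(uint32_t i, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For an 8-point transform (bits = 3), indices 0..7 map to 0, 4, 2, 6, 1, 5, 3, 7.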
Specialized addressing modes
• Circular buffers: used, for example, in the
implementation of digital filters
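A circular buffer used as a filter delay line might look like the sketch below. On a DSP the wrap-around is handled by the address generator at no cost; here it is written out with the modulo operator. The names delay_line, delay_push, delay_get and the length DELAY_LEN are hypothetical and exist only for this example.

```c
#include <stddef.h>

#define DELAY_LEN 4   /* hypothetical delay-line length (number of filter taps) */

typedef struct {
    float  data[DELAY_LEN];
    size_t head;            /* index of the most recent sample */
} delay_line;

/* Store a new sample, overwriting the oldest one. */
void delay_push(delay_line *d, float sample)
{
    d->head = (d->head + 1) % DELAY_LEN;   /* wrap-around done in software */
    d->data[d->head] = sample;
}

/* Read the sample that is `age` steps old (0 = newest, age < DELAY_LEN). */
float delay_get(const delay_line *d, size_t age)
{
    return d->data[(d->head + DELAY_LEN - age) % DELAY_LEN];
}
```

Because only the head index moves, no samples are copied when a new one arrives, which is why this structure suits filters that need the last N inputs.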
Direct memory access
• The DMA controller is a second processor
working in parallel with the DSP core
• It is dedicated to transferring information
between two memory areas or between
peripherals and memory
• The DMA controller frees the DSP core for
other processing tasks
Fast computation – MAC centered
• The MAC operation is used by many digital
processing algorithms
• The basic DSP arithmetic processing blocks are:
– a) many registers
– b) one or more multipliers
– c) one or more Arithmetic Logic Units (ALUs)
– d) one or more shifters
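The role of a wide accumulator in MAC-centered arithmetic can be sketched in C as follows. This is an illustrative model rather than the behaviour of a specific device: a 64-bit integer stands in for the hardware accumulator with its extension bits, and the function name mac_dot is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors using a wide accumulator. */
int32_t mac_dot(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;                            /* extra headroom: no overflow mid-sum */
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* multiply, then accumulate */
    }
    /* Saturate when the result is narrowed back to 32 bits. */
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}
```

Saturating at the end, instead of letting the value wrap around, keeps an overflow from turning a large positive result into a negative one.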
Instruction pipelining
• Instruction pipelining consists of:
– dividing the execution of each instruction into
stages
– executing different stages of successive
instructions in parallel
• The net result is an increased throughput of the
instruction execution.
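The throughput gain can be estimated with a simple cycle count. The sketch below assumes an ideal pipeline (one cycle per stage, no stalls or branches), which real devices only approximate; both function names are made up for this example.

```c
/* Cycles needed to run n instructions on an ideal k-stage pipeline:
 * k cycles until the first instruction completes, then one more per instruction. */
unsigned long pipelined_cycles(unsigned long n, unsigned long k)
{
    return (n == 0) ? 0 : k + (n - 1);
}

/* Cycles without pipelining: each instruction occupies all k stages alone. */
unsigned long sequential_cycles(unsigned long n, unsigned long k)
{
    return n * k;
}
```

For 100 instructions and 4 stages this gives 103 cycles instead of 400, so the speed-up approaches the number of stages as the instruction count grows.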
Parallel architectures
• Parallel-enhanced DSP architectures started to
appear on the market in the mid-1990s and
are based on:
– instruction-level parallelism (VLIW),
– data-level parallelism (SIMD),
– a combination of both
Parallel architectures - VLIW
• In VLIW, many instructions are issued at the same time and are executed in parallel by multiple execution units
• Characteristics of VLIW architectures include simple and regular instruction sets
• Instruction scheduling is done at compile time, not at run time
• Writing assembly code for a VLIW architecture is very complex, and optimization is often better left to the compiler
Parallel architectures - SIMD
• SIMD architectures are based on data-level
parallelism
• Only one instruction is issued at a time
• The same operation specified by the instruction
is performed on multiple data sets
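The lane behaviour of a SIMD operation can be emulated in portable C. Real SIMD hardware performs the whole operation in a single instruction; this sketch only shows the semantics for two 16-bit lanes packed into a 32-bit word, and the function name add2x16 is an assumption.

```c
#include <stdint.h>

/* Add two pairs of 16-bit lanes packed into 32-bit words.
 * Each lane wraps around independently; no carry crosses the lane boundary. */
uint32_t add2x16(uint32_t a, uint32_t b)
{
    uint32_t low  = (a + b) & 0x0000FFFFu;            /* lane 0: low 16 bits  */
    uint32_t high = ((a >> 16) + (b >> 16)) << 16;    /* lane 1: high 16 bits */
    return high | low;
}
```

A single call therefore applies the same addition to two independent data items, which is the data-level parallelism described above.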
Numerical fidelity
• It is essential that the numerical fidelity be
maximized
• The errors due to the finite number of bits used
in the number representation and in the
arithmetic operations should be minimized
• Improving numerical fidelity can be done by
changing the numeric representation or by
dedicated hardware features
Numerical fidelity
• DSPs can be categorized as:
– Fixed point (up to 64-bit, fractional arithmetic)
– Floating point (32- or 64-bit)
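Fractional fixed-point arithmetic is commonly done in the Q15 format, where a 16-bit integer represents raw / 32768. The following is a minimal sketch of a Q15 multiply with rounding and saturation; the function name q15_mul is an assumption, and the code assumes that right-shifting a negative integer is an arithmetic shift, which is implementation-defined in C but usual on common compilers.

```c
#include <stdint.h>

/* Multiply two Q15 fractions (value = raw / 32768), rounding to nearest. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */
    p = (p + (1 << 14)) >> 15;             /* add half an LSB, rescale to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;      /* only -1.0 * -1.0 can overflow */
    return (int16_t)p;
}
```

For example, 0.5 times 0.5 (raw 16384 times 16384) yields raw 8192, which is 0.25. Fixed-point DSPs usually provide the rounding and saturation steps in hardware.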
Fast execution control
• It is important that the program in the DSP is
executed in a deterministic way
• Interrupts have to be serviced with minimal
latency
• An important DSP feature is the
implementation by hardware of looping
constructs, referred to as ‘zero-overhead
hardware loop’ - e.g. RPT #2 || NOP
TI 66AK2H12
• Up to 5.6 GHz of ARM and 9.6 GHz of DSP
processing coupled with:
– security,
– packet processing,
– Ethernet
• The raw computational performance is 38.4
GMACS/core and 19.2 Gflops/core (@ 1.2 GHz
operating frequency)
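These headline figures can be checked with simple arithmetic. The check assumes that the aggregate "GHz" values are the core count multiplied by the maximum clock, using the four ARM cores at 1.4 GHz and eight DSP cores at 1.2 GHz given in the device feature lists; the per-cycle values are derived here from the quoted numbers, not taken from the datasheet.

```latex
\begin{align*}
4 \times 1.4\ \text{GHz} &= 5.6\ \text{GHz (ARM)} \\
8 \times 1.2\ \text{GHz} &= 9.6\ \text{GHz (DSP)} \\
38.4\ \text{GMACS} \div 1.2\ \text{GHz} &= 32\ \text{MACs per cycle per core} \\
19.2\ \text{GFLOPS} \div 1.2\ \text{GHz} &= 16\ \text{floating-point operations per cycle per core}
\end{align*}
```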
TI 66AK2H12
• Eight TMS320C66x™ DSP Core Subsystems
Each With
– Up to 1.2 GHz C66x Fixed/Floating-Point DSP
Cores
• 38.4 GMacs/Core for Fixed Point @ 1.2 GHz
• 19.2 GFlops/Core for Floating Point @ 1.2 GHz
– Memory
• 32K Byte L1P Per Core
• 32K Byte L1D Per Core
• 1024K Byte Local L2 Per Core
TI 66AK2H12
• ARM® Cortex™-A15 MPCore™ Processors Containing
Four ARM Cortex-A15 Cores
– Up to 1.4-GHz Cortex-A15 Processor Core Speed
– 4MB L2 Cache Memory Shared by All ARM Cores
– Full Implementation of ARMv7-A Architecture Instruction
Set
– 32KB L1 Instruction Cache and Data Cache per Cortex-A15
Processor Core
– AMBA 4.0 AXI Coherency Extension (ACE) Master Port,
Connected to MSMC (Multicore Shared Memory
Controller) for Low Latency Access to Shared MSMC SRAM
TI 66AK2H12
• Network Coprocessor
– Packet Accelerator Enables Support for
• Transport Plane IPsec, GTP-U, SCTP, PDCP
• L2 User Plane PDCP (RoHC, Air Ciphering)
• 1 Gbps Wire Speed Throughput at 1.5 MPackets Per Second
– Security Accelerator Engine Enables Support for
• IPSec, SRTP, 3GPP and WiMAX Air Interface, and SSL/TLS Security
• ECB, CBC, CTR, F8, A5/3, CCM, GCM, HMAC, CMAC, GMAC, AES,
DES, 3DES, Kasumi, SNOW 3G, SHA-1, SHA-2 (256-bit Hash), MD5
• Up To 6.4 Gbps IPSec and 3 Gbps Air Ciphering
– Ethernet Subsystem
• Five SGMII Port Switch
TI 66AK2H12
• Peripherals
– Four Lanes of SRIO 2.1
• 5 Gbps Operation Per Lane
• Supports Direct I/O, Message Passing
– Two Lanes PCIe Gen2
• Supports Up To 5 GBaud
– Two HyperLink
• Supports Connections to Other KeyStone Architecture Devices
• Supports Up To 50 GBaud
– Five Enhanced Direct Memory Access (EDMA) Modules
TI 66AK2H12
• Peripherals
– Two 72-Bit DDR3 Interfaces with Speeds Up To 1600 MHz
– USB 3.0
– Two UART Interfaces
– Three I2C Interfaces
– 32 GPIO Pins
– Three SPI Interfaces
– Semaphore Module
– Twenty 64-Bit Timers
– Five On-Chip PLLs
Keystone architecture
• High performance structure for integrating
RISC and DSP cores with application-specific
coprocessors and I/O
• Four main hardware elements:
– Multicore Navigator,
– TeraNet,
– Multicore Shared Memory Controller
– HyperLink
Keystone architecture
• Multicore Navigator:
– A packet-based manager that controls 16k queues
– When tasks are allocated to the queues, Multicore
Navigator provides hardware-accelerated dispatch
that directs tasks to the appropriate available
hardware
• TeraNet:
– central resource to move packets with 2 Tbps
capacity!
Keystone architecture
• Multicore Shared Memory Controller:
– enables processing cores to access shared memory directly without drawing on the TeraNet’s capacity, so memory accesses do not block packet movement
• HyperLink:
– provides a 50-GBaud chip-level interconnect
– Working with Multicore Navigator, HyperLink dispatches tasks to tandem devices transparently and executes tasks as if they were running on local resources
References
[1] dsPIC family documentation; www.microchip.com
[2] www.ti.com
[3] C2000 family documentation; www.ti.com
[4] www.freescale.com
[5] 56F8000 family documentation; www.freescale.com
[6] http://www.coe.pku.edu.cn/tpic/2010913102418831.pdf
[7] http://www.dspguide.com/CH28.PDF
[8] http://www.cs.berkeley.edu/~pattrsn/252S98/Lec08-dsp.pdf
[9] http://www.ti.com/lit/ds/symlink/66ak2h12.pdf