8/6/2019 Notes for Digital Signal Processor
Cliff Notes for Digital Signal Processor
C54x
1. With a neat diagram explain the important features of TMS320C54x Processors?
Overview
The C54x DSP has a high degree of operational flexibility and speed. It combines an advanced modified
Harvard architecture (with one program memory bus, three data memory buses, and four address
buses), a CPU with application-specific hardware logic, on-chip memory, on-chip peripherals, and a
highly specialized instruction set.
The C54x devices offer these advantages:
• Enhanced Harvard architecture built around one program bus, three data buses, and four address buses for increased performance and versatility
• Advanced CPU design with a high degree of parallelism and application-specific hardware logic for increased performance
• A highly specialized instruction set for faster algorithms and for optimized high-level language operation
• Modular architecture design for fast development of spinoff devices
• Advanced IC processing technology for increased performance and low power consumption
• Low power consumption and increased radiation hardness because of new static design techniques
Key Features
This section lists the key features of the C54x DSPs.
Key Features - CPU
• Advanced multibus architecture with one program bus, three data buses, and four address buses
• 40-bit arithmetic logic unit (ALU), including a 40-bit barrel shifter and two independent 40-bit accumulators
• 17-bit × 17-bit parallel multiplier coupled to a 40-bit dedicated adder for nonpipelined single-cycle multiply/accumulate (MAC) operation
• Compare, select, store unit (CSSU) for the add/compare selection of the Viterbi operator
• Exponent encoder to compute the exponent of a 40-bit accumulator value in a single cycle
• Two address generators, including eight auxiliary registers and two auxiliary register arithmetic units
• Multiple-CPU/core architecture on some devices
Key Features - Memory
• 192K words × 16-bit addressable memory space (64K-words program, 64K-words data, and 64K-words I/O), with extended program memory in the C548, C549, C5402, C5410, and C5420
Key Features - Instruction set
• Single-instruction repeat and block repeat operations
• Block memory move instructions for better program and data management
• Instructions with a 32-bit long operand
• Instructions with 2- or 3-operand simultaneous reads
• Arithmetic instructions with parallel store and parallel load
• Conditional-store instructions
• Fast return from interrupt
Key Features - On-chip peripherals
• Software-programmable wait-state generator
• Programmable bank-switching logic
• On-chip phase-locked loop (PLL) clock generator with internal oscillator or external clock source
• External bus-off control to disable the external data bus, address bus, and control signals
• Data bus with a bus holder feature
• Programmable timer
• Available ports: HPI (Host Port Interface), Synchronous Serial Ports, Buffered Serial Port, Multichannel Buffered Serial Port, TDM (Time Division Multiplexed Serial Port)
• Speed supported: 25/20/15/12.5/10 ns execution time for a single-cycle fixed-point instruction (40 MIPS/50 MIPS/66 MIPS/80 MIPS/100 MIPS)
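The cycle times and MIPS ratings quoted for the C54x are two views of the same number. A minimal sketch of the conversion, assuming one instruction completes per cycle (the function name is only illustrative):

```python
# Cycle times quoted in the notes, in nanoseconds.
CYCLE_TIMES_NS = [25, 20, 15, 12.5, 10]

def mips_from_cycle_time(cycle_ns):
    """Peak MIPS for a device that completes one instruction per cycle."""
    # 1 ns per instruction corresponds to 1000 million instructions per second.
    return 1000.0 / cycle_ns

for t in CYCLE_TIMES_NS:
    print(f"{t} ns -> {mips_from_cycle_time(t):.1f} MIPS")
```

The 15 ns figure works out to about 66.7 MIPS, which the notes quote as 66 MIPS.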
Key Features - Power
• Power consumption control with IDLE 1, IDLE 2, and IDLE 3 instructions for power-down modes
• Control to disable the CLKOUT signal
Key Features - Emulation
• IEEE Standard 1149.1 boundary scan logic interfaced to on-chip scan-based emulation logic
References: TMS320C54x DSP Reference Set Volume 1: CPU and Peripherals, Chapter 1
2. With a neat diagram explain the bus-architecture of TMS320C54x processors?
The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses.
Separate program and data spaces allow simultaneous access to program instructions and data,
providing a high degree of parallelism. For example, three reads and one write can be performed in a
single cycle. Instructions with parallel store and application-specific instructions fully utilize this
architecture. In addition, data can be transferred between data and program spaces.
The C54x DSP architecture is built around eight major 16-bit buses (four program/data buses and four address buses):
• The program bus (PB) carries the instruction code and immediate operands from program memory.
• Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.
  o The CB and DB carry the operands that are read from data memory.
  o The EB carries the data to be written to memory.
• Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.
The C54x DSP can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1).
The PB can carry data operands stored in program space (for instance, a coefficient table) to the multiplier and adder for multiply/accumulate operations or to a destination in data space for data move instructions (MVPD and READA). This capability, in conjunction with the feature of dual-operand read, supports the execution of single-cycle, 3-operand instructions such as the FIRS instruction.
The C54x DSP also has an on-chip bidirectional bus for accessing on-chip peripherals. This bus is connected to DB and EB through the bus exchanger in the CPU interface. Accesses that use this bus can require two or more cycles for reads and writes, depending on the peripheral's structure.
Reference: TMS320C54x DSP Reference Set Volume 1: CPU and Peripherals, Chapter 1
3. With a neat diagram explain the architecture of TMS320C54x processors?
The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses.
Separate program and data spaces allow simultaneous access to program instructions and data,
providing a high degree of parallelism. For example, three reads and one write can be performed in a
single cycle. Instructions with parallel store and application-specific instructions fully utilize this
architecture. In addition, data can be transferred between data and program spaces. Such parallelism
supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed
in a single machine cycle. Also, the C54x DSP includes the control mechanisms to manage interrupts,
repeated operations, and function calling.
Bus Structure:
The C54x DSP architecture is built around eight major 16-bit buses (four program/data buses and four
address buses):
• The program bus (PB) carries the instruction code and immediate operands from program memory.
• Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.
  o The CB and DB carry the operands that are read from data memory.
  o The EB carries the data to be written to memory.
• Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.
Internal Memory Organization:
The C54x DSP memory is organized into three individually selectable spaces: program, data, and I/O
space. The C54x devices can contain random access memory (RAM) and read-only memory (ROM).
Among the devices, the following types of RAM are represented: dual-access RAM (DARAM), single-
access RAM (SARAM), and two-way shared RAM. The DARAM or SARAM can be shared within
subsystems of a multiple-CPU core device. You can configure the DARAM and SARAM as data memory or program/data memory. The C54x DSP also has 26 CPU registers plus peripheral registers that are
mapped in data-memory space.
Central Processing Unit:
The CPU is common to all C54x devices. The C54x CPU contains:
• 40-bit arithmetic logic unit (ALU)
• Two 40-bit accumulators
• Barrel shifter
• 17 × 17-bit multiplier
• 40-bit adder
• Compare, select, and store unit (CSSU)
• Data address generation unit
• Program address generation unit
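How the multiplier, the 40-bit adder, and a 40-bit accumulator cooperate in a multiply/accumulate (MAC) loop can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, and the sketch ignores the saturation and fractional modes of the real hardware. It shows why a 40-bit accumulator leaves headroom above a 32-bit product.

```python
ACC_BITS = 40  # accumulator width stated in the notes

def wrap_signed(value, bits):
    """Wrap an integer into a two's-complement field of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    # If the sign bit is set, reinterpret the field as a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value

def mac(acc, x, y):
    """One multiply/accumulate step: acc + x*y, kept to 40 bits."""
    return wrap_signed(acc + x * y, ACC_BITS)

def dot_product(xs, ys):
    """Sum of products, one MAC per sample pair (as in an FIR filter tap loop)."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc

# Four full-scale 16-bit products overflow 32 bits but still fit in 40 bits.
print(dot_product([32767] * 4, [32767] * 4))
```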
Data Addressing:
The C54x DSP offers seven basic data addressing modes:
• Immediate addressing uses the instruction to encode a fixed value.
• Absolute addressing uses the instruction to encode a fixed address.
• Accumulator addressing uses accumulator A to access a location in program memory as data.
• Direct addressing uses seven bits of the instruction to encode the lower seven bits of an address. The seven bits are used with the data page pointer (DP) or the stack pointer (SP) to determine the actual memory address.
• Indirect addressing uses the auxiliary registers to access memory.
• Memory-mapped register addressing uses the memory-mapped registers without modifying either the current DP value or the current SP value.
• Stack addressing manages adding and removing items from the system stack.
During the execution of instructions using direct, indirect, or memory-mapped register addressing, the data-address generation logic (DAGEN) computes the addresses of data-memory operands.
Pipeline Operation
An instruction pipeline consists of a sequence of operations that occur during the execution of an
instruction. The C54x DSP pipeline has six levels: prefetch, fetch, decode, access, read, and execute. At each of the levels, an independent operation occurs. Because these operations are independent, from
one to six instructions can be active in any given cycle, each instruction at a different stage of
completion. Typically, the pipeline is full with a sequential set of instructions, each at one of the six
stages. When a PC discontinuity occurs, such as during a branch, call, or return, one or more stages of
the pipeline may be temporarily unused.
On-chip Peripherals
All the C54x devices have a common CPU, but different on-chip peripherals are connected to their CPUs.
The C54x devices may have these, or other, on-chip peripheral options:
• General-purpose I/O pins
• Software-programmable wait-state generator
• Programmable bank-switching logic
• Clock generator
• Timer
• Direct memory access (DMA) controller
• Standard serial port
• Time-division multiplexed (TDM) serial port
• Buffered serial port (BSP)
• Multichannel buffered serial port (McBSP)
• Host-port interface
  o 8-bit standard (HPI)
  o 8-bit enhanced (HPI8)
  o 16-bit enhanced (HPI16)
External Bus Interface
The interface's external ready input signal and software-generated wait states allow the processor to
interface with memory and I/O devices of many different speeds. The interface's hold modes allow an
external device to take control of the C54x DSP buses; in this way, an external device can access the
resources in the program, data, and I/O spaces.
IEEE Standard 1149.1 Scanning Logic
The IEEE Standard 1149.1 scanning-logic circuitry is used for emulation and testing purposes only. This
logic provides the boundary scan to and from the interfacing devices. Also, it can be used to test pin-to-
pin continuity as well as to perform operational tests on devices peripheral to the C54x DSP. The IEEE
Standard 1149.1 scanning logic is interfaced to internal scanning-logic circuitry that has access to all of
the on-chip resources. Thus, the C54x DSP can perform on-board emulation using the IEEE Standard
1149.1 serial scan pins and the emulation-dedicated pins.
4. Explain the pipeline stages and phases of any of the DSP?
What is meant by pipelining? Describe briefly the pipeline operation of TMS320C54x
processors?
Processors with pipelining are organized internally into stages which can semi-independently work on
separate jobs. Each stage is organized and linked into a 'chain' so each stage's output is fed to another
stage until the job is done. This organization of the processor allows overall processing time to be
significantly reduced.
A deeper pipeline means that there are more stages in the pipeline, and therefore, fewer logic gates in
each stage. This generally means that the processor's frequency can be increased as the cycle time is
lowered. This happens because there are fewer components in each stage of the pipeline, so the
propagation delay is decreased for the overall stage.
Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is
said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.
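The gain from a fully pipelined design, and the cost of wait cycles, can be estimated with a simple cycle count. The sketch below assumes an idealized machine in which every stage takes exactly one cycle; the function names and the stall_cycles parameter are illustrative, not taken from any data sheet.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages, stall_cycles=0):
    """The first instruction fills the pipeline; each later one finishes a cycle apart."""
    return n_stages + (n_instructions - 1) + stall_cycles

def speedup(n_instructions, n_stages, stall_cycles=0):
    """Ratio of unpipelined to pipelined cycle counts for the same program."""
    return (unpipelined_cycles(n_instructions, n_stages)
            / pipelined_cycles(n_instructions, n_stages, stall_cycles))

print(speedup(100, 6))        # close to the stage count for long programs
print(speedup(100, 6, 20))    # wait cycles reduce the benefit
```

For long instruction sequences the ideal speedup approaches the number of stages, and every wait cycle moves it back toward 1.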
Advantages of Pipelining
• The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases.
• Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.
Disadvantages of Pipelining
• A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.
• The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip flops must be added to the data path of a pipelined processor.
• A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.
C54x Pipeline
The C54x DSP has a six-level deep instruction pipeline. The six stages of the pipeline are independent of
each other, which allows overlapping execution of instructions. During any given cycle, from one to six
different instructions can be active, each at a different stage of completion.
The six levels and functions of the pipeline structure are:
Program prefetch: Program address bus (PAB) is loaded with the address of the next instruction to be
fetched.
Program fetch: An instruction word is fetched from the program bus (PB) and loaded into the
instruction register (IR). This completes an instruction fetch sequence that consists of this and the
previous cycle.
Decode: The contents of the instruction register (IR) are decoded to determine the type of memory
access operation and the control sequence at the data-address generation unit (DAGEN) and the CPU.
Access: DAGEN outputs the read operand's address on the data address bus, DAB. If a second operand is
required, the other data address bus, CAB, is also loaded with an appropriate address. Auxiliary registers
in indirect addressing mode and the stack pointer (SP) are also updated. This is considered the first of
the 2-stage operand read sequence.
Read: The read data operand(s), if any, are read from the data buses, DB and CB. This completes the two-stage operand read sequence. At the same time, the two-stage operand write sequence begins. The
data address of the write operand, if any, is loaded into the data write address bus (EAB). For memory-
mapped registers, the read data operand is read from memory and written into the selected memory-
mapped registers using the DB.
Execute: The operand write sequence is completed by writing the data using the data write bus (EB).
The instruction is executed in this phase.
The first two stages of the pipeline, prefetch and fetch, are the instruction fetch sequence. In one cycle,
the address of a new instruction is loaded. In the following cycle, an instruction word is read. In case of
multiword instructions, several such instruction fetch sequences are needed.
During the third stage of the pipeline, decode, the fetched instruction is decoded so that appropriate
control sequences are activated for proper execution of the instruction.
The next two pipeline stages, access and read, are an operand read sequence. If required by the
instruction, the data addresses of one or two operands are loaded in the access phase and the operand or
operands are read in the following read phase.
Any write operation is spread over two stages of the pipeline, the read and execute stages. During the
read phase, the data address of the write operand is loaded onto EAB. In the following cycle, the
operand is written to memory using EB.
Each memory access is performed in two phases by the C54x DSP pipeline. In the first phase, an address
bus is loaded with the memory address. In the second phase, a corresponding data bus reads from or
writes to that memory address.
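The overlap described above, with up to six instructions active at once, can be visualized with a small occupancy model. This is a sketch under simplifying assumptions: single-word instructions, no stalls or branches, and instruction i entering prefetch in cycle i. The helper names are hypothetical.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]

def stage_of(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in a cycle, or None."""
    position = cycle - instr_index
    return STAGES[position] if 0 <= position < len(STAGES) else None

def active_instructions(cycle, n_instructions):
    """Map each in-flight instruction index to its current stage."""
    active = {}
    for i in range(n_instructions):
        stage = stage_of(i, cycle)
        if stage is not None:
            active[i] = stage
    return active

# Print one row per cycle showing which instruction sits in which stage.
for cycle in range(8):
    print(cycle, active_instructions(cycle, 10))
```

In cycle 0 only one instruction is active; from cycle 5 onward the pipeline is full, with six instructions each at a different stage.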
5. Explain briefly all the different addressing modes of C54x Processor?
Data addressing
The TMS320C54x DSP offers seven basic addressing modes:
• Immediate addressing uses the instruction to encode a fixed value.
• Absolute addressing uses the instruction to encode a fixed address.
• Accumulator addressing uses an accumulator to access a location in program memory as data.
• Direct addressing uses seven bits of the instruction to encode an offset relative to DP or to SP. The offset plus DP or SP determines the actual address in data memory.
• Indirect addressing uses the auxiliary registers to access memory.
• Memory-mapped register addressing modifies the memory-mapped registers without affecting either the current DP value or the current SP value.
• Stack addressing manages adding and removing items from the system stack.
Data Addressing - Immediate Addressing
In immediate addressing, the instruction syntax contains the specific value of the operand. Two types of
values can be encoded in an instruction:
• Short immediate values can be 3, 5, 8, or 9 bits in length.
• Long immediate values are always 16 bits in length.
Immediate values can be encoded in 1-word or 2-word instructions. The 3-, 5-, 8-, or 9-bit values are
encoded into 1-word instructions; 16-bit values are encoded into 2-word instructions.
Data Addressing - Absolute Addressing
There are four types of absolute addressing:
• Data-memory address (dmad) addressing
• Program-memory address (pmad) addressing
• Port address (PA) addressing
• *(lk) addressing, which is used with all instructions that support the use of a single data-memory (Smem) operand
Data Addressing - Accumulator Addressing
Accumulator addressing uses the value in the accumulator as an address. This addressing mode is used
to address program memory as data.
Data Addressing - Direct Addressing
In direct addressing, the instruction contains the lower seven bits of the data-memory address (dma).
The 7-bit dma is an address offset that is combined with a base address, taken from either the data-page
pointer (DP) or the stack pointer (SP), to form a 16-bit data-memory address. Using this form of addressing,
you can access any of 128 locations in random order without changing the DP or the SP.
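The address formation can be sketched as follows. The notes do not say how the base is selected; the sketch assumes the behavior described in TI's reference set, where a compiler-mode bit (CPL) chooses between a 9-bit DP concatenated with the 7-bit offset and the SP plus the offset. The function and parameter names are illustrative.

```python
def direct_address(dma, dp=0, sp=0, cpl=0):
    """Form a 16-bit data-memory address from a 7-bit offset.

    Assumption (from TI documentation, not these notes):
    cpl == 0 -> 9-bit DP supplies the upper bits, dma the lower 7 bits
    cpl == 1 -> dma is added to the 16-bit SP
    """
    offset = dma & 0x7F                      # only seven bits come from the instruction
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | offset  # one of 512 pages of 128 words
    return (sp + offset) & 0xFFFF            # stack-relative access

print(hex(direct_address(0x05, dp=1)))               # word 5 of data page 1
print(hex(direct_address(0x10, sp=0x2000, cpl=1)))   # 16 words above SP
```

Either way, the seven offset bits select one of 128 locations relative to the chosen base.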
Data Addressing - Indirect Addressing
In indirect addressing, any location in the 64K-word data space can be accessed using the 16-bit address
contained in an auxiliary register. The C54x DSP has eight 16-bit auxiliary registers (AR0-AR7). Indirect
addressing is used mainly when there is a need to step through sequential locations in memory in fixed-
size steps.
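Stepping through memory with an auxiliary register can be modeled as a pointer that is modified after each access. A minimal sketch, assuming a post-increment style of update and a dictionary standing in for data memory (both are illustrative choices, not the only modes the hardware offers):

```python
def read_sequential(memory, ar, count, step=1):
    """Read `count` words starting at address `ar`, advancing by `step` each time.

    Returns the values read and the final auxiliary-register value.
    """
    values = []
    for _ in range(count):
        values.append(memory[ar])
        ar = (ar + step) & 0xFFFF   # post-modify with 16-bit wrap-around
    return values, ar

# Hypothetical data memory holding a four-word table at address 0x100.
data_memory = {0x100 + i: i * 10 for i in range(4)}
samples, ar_after = read_sequential(data_memory, 0x100, 4)
print(samples, hex(ar_after))
```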
Data Addressing - Memory-Mapped Register Addressing
Memory-mapped register addressing is used to modify the memory-mapped registers without affecting
either the current data-page pointer (DP) value or the current stack-pointer (SP) value. Because DP and
SP do not need to be modified in this mode, the overhead for writing to a register is minimal. Memory-
mapped register addressing works for both direct and indirect addressing.
Data Addressing - Stack Addressing
The system stack is used to automatically store the program counter during interrupts and subroutines.
It can also be used at your discretion to store additional items of context or to pass data values. The
stack is filled from the highest to the lowest memory address. The processor uses a 16-bit memory-
mapped register, the stack pointer (SP), to address the stack. SP always points to the last element stored
onto the stack.
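The stack behavior described above (filling from high to low addresses, with SP pointing at the last element stored) can be sketched as a small class. The class and its dictionary-backed memory are illustrative only.

```python
class Stack:
    """Descending stack: SP always points at the last element stored."""

    def __init__(self, top_address):
        self.memory = {}                  # stands in for data memory
        self.sp = top_address & 0xFFFF    # 16-bit stack pointer

    def push(self, value):
        self.sp = (self.sp - 1) & 0xFFFF  # move down first ...
        self.memory[self.sp] = value & 0xFFFF  # ... then store at the new SP

    def pop(self):
        value = self.memory[self.sp]      # read the last element stored ...
        self.sp = (self.sp + 1) & 0xFFFF  # ... then move back up
        return value

stack = Stack(0x1000)
stack.push(0x1234)   # for example, a return address saved on a call
print(hex(stack.sp), hex(stack.pop()), hex(stack.sp))
```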
Program Memory Addressing
The following program control operations affect the value loaded into the PC:
• Branches
• Calls
• Returns
• Conditional operations
• Repeats of an instruction or a block of instructions
• Hardware reset
• Interrupts
6. Explain with a neat diagram the architecture of 6x series of processors?
The C6000 devices execute up to eight 32-bit instructions per cycle. The C674x CPU consists of 64
general-purpose 32-bit registers and eight functional units. These eight functional units contain:
• Two multipliers
• Six ALUs
Features of the C6000 devices
• Advanced VLIW CPU with eight functional units, including two multipliers and six arithmetic units
  o Executes up to eight instructions per cycle for up to ten times the performance of typical DSPs
  o Allows designers to develop highly effective RISC-like code for fast development time
• Instruction packing
  o Gives code size equivalence for eight instructions executed serially or in parallel
  o Reduces code size, program fetches, and power consumption
• Conditional execution of most instructions
  o Reduces costly branching
  o Increases parallelism for higher sustained performance
• Efficient code execution on independent functional units
• 8/16/32-bit data support, providing efficient memory support for a variety of applications
• 40-bit arithmetic options add extra precision for vocoders and other computationally intensive applications
• Saturation and normalization provide support for key arithmetic operations
• Field manipulation and instruction extract, set, clear, and bit counting support common operations found in control and data manipulation applications
The VelociTI architecture of the C6000 platform of devices makes them the first off-the-shelf DSPs to use
advanced VLIW to achieve high performance through increased instruction-level parallelism. A
traditional VLIW architecture consists of multiple execution units running in parallel, performing
multiple instructions during a single clock cycle. Parallelism is the key to extremely high performance,
taking these DSPs well beyond the performance capabilities of traditional superscalar designs. VelociTI is
a highly deterministic architecture, having few restrictions on how or when instructions are fetched,
executed, or stored. It is this architectural flexibility that is key to the breakthrough efficiency levels of
the TMS320C6000 Optimizing compiler.
The C674x CPU contains:
Program fetch unit
16/32-bit instruction dispatch unit, advanced instruction packing
Instruction decode unit
Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)
Two load-from-memory data paths (LD1 and LD2)
Two store-to-memory data paths (ST1 and ST2)
Two data address paths (DA1 and DA2)
Two register file data cross paths (1X and 2X)
Two general-purpose register files (A and B)
Control registers
Control logic
Test, emulation, and interrupt logic
Internal DMA (IDMA) for transfers between internal memories
The program fetch, instruction dispatch, and instruction decode units can deliver up to eight 32-bit
instructions to the functional units every CPU clock cycle. The processing of instructions occurs in each
of the two data paths (A and B), each of which contains four functional units (.L, .S, .M, and .D) and 32
32-bit general-purpose registers.
General-Purpose Register Files
There are two general-purpose register files (A and B) in the CPU data paths. Each of these files contains
32 32-bit registers (A0-A31 for file A and B0-B31 for file B). The general-purpose registers can be used
for data, data address pointers, or condition registers.
Functional Units
The eight functional units in the C6000 data paths can be divided into two groups of four; each
functional unit in one data path is almost identical to the corresponding unit in the other data path.
Each functional unit has its own 32-bit write port into a general-purpose register file, so all eight units
can be used in parallel every cycle. All units ending in 1 (for example, .L1) write to register file A, and all units ending in 2 write to register file B. Each functional unit has two 32-bit read ports for source
operands src1 and src2. Four units (.L1, .L2, .S1, and .S2) have an extra 8-bit-wide port for 40-bit long
writes, as well as an 8-bit input for 40-bit long reads. Since each DSP multiplier can return up to a 64-bit
result, an extra write port has been added from the multipliers to the register file.
Register File Cross Paths
Each functional unit reads directly from and writes directly to the register file within its own data path.
That is, the .L1, .S1, .D1, and .M1 units write to register file A and the .L2, .S2, .D2, and .M2 units write to
register file B. The register files are connected to the opposite-side register file's functional units via the
1X and 2X cross paths. These cross paths allow functional units from one data path to access a 32-bit operand from the opposite side register file. The 1X cross path allows the functional units of data path A
to read their source from register file B, and the 2X cross path allows the functional units of data path B
to read their source from register file A.
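The rule for when a cross path is needed can be captured in two small functions. This is a sketch of the routing rule as stated in the text, not a model of scheduling limits on the real device; the function names are hypothetical.

```python
def home_file(unit):
    """Register file on the unit's own side: units ending in 1 use A, in 2 use B."""
    return "A" if unit.endswith("1") else "B"

def cross_path_for(unit, source_file):
    """Return None if the source is on the unit's own side, else '1X' or '2X'."""
    if source_file == home_file(unit):
        return None                     # direct read, no cross path involved
    # Side-A units reach file B through 1X; side-B units reach file A through 2X.
    return "1X" if home_file(unit) == "A" else "2X"

for unit, src in [(".L1", "A"), (".M1", "B"), (".S2", "A"), (".D2", "B")]:
    print(unit, "reads from file", src, "via", cross_path_for(unit, src))
```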
Memory, Load, and Store Paths
The DSP supports double word loads and stores. There are four 32-bit paths for loading data from
memory to the register file. For side A, LD1a is the load path for the 32 LSBs and LD1b is the load path
for the 32 MSBs. For side B, LD2a is the load path for the 32 LSBs and LD2b is the load path for the 32
MSBs. There are also four 32-bit paths for storing register values to memory from each register file. For
side A, ST1a is the write path for the 32 LSBs and ST1b is the write path for the 32 MSBs. For side B, ST2a
is the write path for the 32 LSBs and ST2b is the write path for the 32 MSBs.
Data Address Paths
The data address paths (DA1 and DA2) are each connected to the .D units in both data paths. This allows
data addresses generated by any one path to access data to or from any register. The DA1 and DA2
resources and their associated data paths are specified as T1 and T2, respectively. T1 consists of the DA1
address path and the LD1 and ST1 data paths. For the DSP, LD1 is comprised of LD1a and LD1b to
support 64-bit loads; ST1 is comprised of ST1a and ST1b to support 64-bit stores. Similarly, T2 consists of
the DA2 address path and the LD2 and ST2 data paths. For the DSP, LD2 is comprised of LD2a and LD2b
to support 64-bit loads; ST2 is comprised of ST2a and ST2b to support 64-bit stores.
7. What is pipelining? Explain the pipeline stages of TMS320C6x Processors?
Processors with pipelining are organized internally into stages which can semi-independently work on
separate jobs. Each stage is organized and linked into a 'chain' so each stage's output is fed to another
stage until the job is done. This organization of the processor allows overall processing time to be
significantly reduced.
A deeper pipeline means that there are more stages in the pipeline, and therefore, fewer logic gates in
each stage. This generally means that the processor's frequency can be increased as the cycle time is
lowered. This happens because there are fewer components in each stage of the pipeline, so the
propagation delay is decreased for the overall stage.
Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is
said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully
pipelined has wait cycles that delay the progress of the pipeline.
Advantages of Pipelining
• The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases.
• Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.
Disadvantages of Pipelining
• A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.
• The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip flops must be added to the data path of a pipelined processor.
• A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.
Highlights of C6000 Pipeline
• The pipeline can dispatch eight parallel instructions every cycle.
• Parallel instructions proceed simultaneously through each pipeline phase.
• Serial instructions proceed through the pipeline with a fixed relative phase difference between instructions.
• Load and store addresses appear on the CPU boundary during the same pipeline phase, eliminating read-after-write memory conflicts.
Pipeline Operation Overview
The pipeline phases are divided into three stages:
Fetch
Decode
Execute
All instructions in the DSP instruction set flow through the fetch, decode, and execute stages of the
pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has
two phases for all instructions. The execute stage of the pipeline requires a varying number of phases,
depending on the type of instruction.
Fetch Phase
The fetch phases of the pipeline are:
PG: Program address generate
PS: Program address send
PW: Program access ready wait
PR: Program fetch packet receive
The DSP uses a fetch packet (FP) of eight words. All eight of the words proceed through fetch processing
together, through the PG, PS, PW, and PR phases.
During the PG phase, the program address is generated in the CPU. In the PS phase, the program
address is sent to memory. In the PW phase, a memory read occurs. Finally, in the PR phase, the fetch
packet is received at the CPU.
Decode Phase
The decode phases of the pipeline are:
DP: Instruction dispatch
DC: Instruction decode
In the DP phase of the pipeline, the fetch packets are split into execute packets. Execute packets consist
of one instruction or from two to eight parallel instructions. During the DP phase, the instructions in an
execute packet are assigned to the appropriate functional units. In the DC phase, the source registers,
destination registers, and associated paths are decoded for the execution of the instructions in the
functional units.
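The splitting of a fetch packet into execute packets can be sketched as a grouping step. The notes do not say how parallel instructions are marked; the sketch assumes the p-bit convention from TI's C6000 documentation, where a p-bit of 1 means the next instruction executes in parallel with the current one. The function name is hypothetical, and the sketch simply closes any open group at the end of the fetch packet.

```python
def split_fetch_packet(p_bits):
    """Group instruction indices into execute packets, given each instruction's p-bit."""
    packets, current = [], []
    for index, p in enumerate(p_bits):
        current.append(index)
        if p == 0:              # chain ends: the next instruction starts a new packet
            packets.append(current)
            current = []
    if current:                 # trailing p-bit of 1: close the group at the packet edge
        packets.append(current)
    return packets

print(split_fetch_packet([0] * 8))                    # fully serial: eight execute packets
print(split_fetch_packet([1] * 7 + [0]))              # fully parallel: one execute packet
print(split_fetch_packet([1, 1, 0, 0, 1, 0, 0, 0]))   # mixed case
```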
Execution Phase
The execute portion of the pipeline is subdivided into five phases (E1-E5). Different types of instructions
require different numbers of these phases to complete their execution.
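Putting the three stages together, the full phase sequence for one instruction can be listed as below. This is a minimal sketch; the number of execute phases is passed in as a parameter because the notes do not say how many phases each instruction type needs.

```python
FETCH_PHASES = ["PG", "PS", "PW", "PR"]   # four fetch phases for all instructions
DECODE_PHASES = ["DP", "DC"]              # two decode phases for all instructions

def pipeline_phases(execute_phases):
    """Phases one instruction passes through, given its execute-phase count (1 to 5)."""
    if not 1 <= execute_phases <= 5:
        raise ValueError("execute_phases must be between 1 and 5")
    execute = [f"E{i}" for i in range(1, execute_phases + 1)]
    return FETCH_PHASES + DECODE_PHASES + execute

print(pipeline_phases(1))   # shortest case: 7 phases
print(pipeline_phases(5))   # longest case: 11 phases
```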
8. With a neat diagram explain the core architecture of ADSP 21xx DSP?
The ADSP-21xx has the following architectural features:
• Computation units: multiplier, ALU, shifter, and data register file
• Program sequencer with related instruction cache, interval timer, and Data Address Generators (DAG1 and DAG2)
• Dual-blocked SRAM
• External ports for interfacing to off-chip memory, peripherals, and hosts
• Input/Output (I/O) processor with integrated DMA controllers, serial ports (SPORTs), serial peripheral interface (SPI) ports, and a UART port
• JTAG Test Access Port for board test and emulation
ADSP 21xx Bus
The ADSP-21xx has three on-chip buses: the PM bus, the DM bus, and the DMA bus. The PM bus provides access to
either instructions or data. During a single cycle, these buses let the processor access two data operands
(one from PM and one from DM), and access an instruction (from the cache).
How ADSP addresses DSP requirements
• Fast, flexible arithmetic computation units
  o The ADSP-219x family DSPs execute all computational instructions in a single cycle. They provide both fast cycle times and a complete set of arithmetic operations.
• Unconstrained data flow to and from the computation units. The ADSP-219x has a modified Harvard architecture combined with a data register file. In every cycle, the DSP can:
  o Read two values from memory or write one value to memory
  o Complete one computation
  o Write up to three values back to the register file
• Extended precision and dynamic range in the computation units
  o 40-Bit Extended Precision. The DSP handles 16-bit integer and fractional formats (two's-complement and unsigned). The processors carry extended precision through result registers in their computation units, limiting intermediate data truncation errors.
• Dual address generators with circular buffering support
  o Dual Address Generators. The DSP has two data address generators (DAGs) that provide immediate or indirect (pre- and post-modify) addressing. Modulus and bit-reverse operations are supported with memory page constraints on data buffer placement only.
• Efficient program sequencing
  o Efficient Program Sequencing. In addition to zero-overhead loops, the DSP supports quick setup and exit for loops. Loops are both nestable (eight levels in hardware) and interruptible. The processors support both delayed and non-delayed branches.
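The modulus (circular buffer) and bit-reverse address updates mentioned for the data address generators can be sketched in software. The functions below are a generic illustration of the two techniques, with hypothetical names and parameters; they do not reproduce the DAG register set of any particular device.

```python
def circular_next(index, modify, base, length):
    """Advance an address by `modify`, wrapping inside [base, base + length)."""
    return base + (index - base + modify) % length

def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of value (as used for FFT data ordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)   # shift in the lowest remaining bit
        value >>= 1
    return result

# A 4-word delay line at address 0x100: stepping past the end wraps to the start.
print(hex(circular_next(0x103, 1, 0x100, 4)))
# Bit-reversed order for an 8-point FFT.
print([bit_reverse(i, 3) for i in range(8)])
```

In hardware both updates happen as part of the address generation itself, so a filter or FFT loop spends no extra instructions on pointer wrap-around or reordering.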