ARM Processor September 2005
Introduction
The ARM processor core originates within a British computer
company called Acorn. In the mid-1980s they were looking for a replacement for
the 6502 processor used in their BBC computer range, which was widely used
in UK schools. None of the 16-bit architectures becoming available at that time
met their requirements, so they designed their own 32-bit processor.
Other companies became interested in this processor, including
Apple who were looking for a processor for their PDA project (which became the
Newton). After much discussion this led to Acorn’s processor design team
splitting off from Acorn at the end of 1990 to become Advanced RISC Machines
Ltd, now just ARM Ltd.
Thus ARM Ltd now designs the ARM family of RISC processor
cores, together with a range of other supporting technologies. One important
point about ARM is that it does not fabricate silicon itself, but instead just
produces the design.
The ARM processor is a powerful, low-cost, efficient, low-power
(consumption, that is) RISC processor. Its design was originally for the
Archimedes desktop computer, but somewhat ironically numerous factors about
its design make it unsuitable for use in a desktop machine (for example, the
MMU and cache are the wrong way around). However, many factors about its
design make it an exceptional choice for embedded applications. The ARM
architecture enjoys the widest choice of embedded operating systems (OS) for
system development. OS choice is critical in producing a winning system design
that meets the needs of the developer's chosen market. ARM enables choice by
partnering with many leading suppliers of embedded OS and development
environments. ARM offers a broad range of processor cores to address a wide
Dept. of Computer Science Model Engineering College 1
variety of applications while delivering optimum performance, power consumption
and system cost. These cores are designed to meet the needs of three system
categories:
Embedded real-time systems: storage, automotive body and power-train,
industrial and networking applications.
Application platforms: devices running open operating systems including Linux,
Palm OS, Symbian OS and Windows CE in wireless, consumer entertainment and
digital imaging applications.
Secure applications: smart cards, SIM cards and payment terminals.
ARM CPU cores cover a wide range of performance and features
enabling system designers to create solutions that meet their precise
requirements. ARM offers both synthesizable and hard macro products, together
with a range of coprocessors and debug facilities.
“ATAP” stands for ARM Technology Access Program. It creates a
network of independent design service companies and equips them to deliver
ARM-powered designs. Members get access to ARM technology, expertise and
support, and are sometimes referred to as “Approved Design Centers”.
Why ARM
The main features that make the ARM processor outstanding are:
Built-in architecture extensions: more efficient processing of algorithms to save
CPU overhead, memory and power. The technologies it uses are:
  Thumb®-2: greatly improved code density
  DSP: signal processing directly in the RISC core
  Jazelle®: Java acceleration
  TrustZone™: hardware/software environment for maximum security
Core performance: a wide range of functionality and power, with parts
running from 1MHz to 1GHz and architectural performance enhancements for
media and Java.
Tools of choice: ARM has the widest range of hardware and software tool
support of any 32-bit architecture.
Extensive ecosystem of networking ASICs and standard products/ASSPs: more
than 125 standard networking devices for quick time-to-market design cycles.
Wide support: ARM is the best supported microprocessor architecture available.
A wide range of OS, middleware and tools support, and an extensive choice of
multimedia codec solutions optimized for ARM processors, are available from the
ARM Connected Community.
Physical IP: leading edge for high performance systems.
Design notes
The ARM instruction set follows the 6502 in concept, but includes a number
of features designed to allow the CPU to better pipeline instructions for
execution. In keeping with traditional RISC concepts, this included tuning the
instructions to execute in well-defined times, typically one cycle. A more
interesting addition to the ARM design is the use of a 4-bit condition code on
the front of every instruction, meaning that every instruction can be made
conditional.
This cuts down significantly on the space available for, for example,
displacements in memory access instructions, but on the other hand it does
make it possible to avoid branch instructions when generating code for small if
statements. The standard example of this is Euclid’s GCD algorithm:
(This example is in the C programming language)
    int gcd(int i, int j)
    {
        while (i != j)
            if (i > j)
                i -= j;
            else
                j -= i;
        return i;
    }
Expressed in ARM assembly, the loop, with a little rotation, might look something
like
        b     test
loop    subgt Ri,Ri,Rj
        suble Rj,Rj,Ri
test    cmp   Ri,Rj
        bne   loop
which avoids the branches around the then and else clauses that one would
typically have to emit.
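As a rough illustration of what the condition suffixes do, the loop can be modelled in C with both subtractions issued on every pass and each one gated by the result of the preceding compare. This is only a sketch: the name gcd_predicated and the gt/le variables are hypothetical stand-ins for the processor flags, and a C compiler is free to turn the code back into branches.

```c
#include <assert.h>

/* Models the ARM loop: the compare sets the flags once, then SUBGT and
 * SUBLE are both issued and each takes effect only if its condition holds.
 * All names here are illustrative. */
int gcd_predicated(int i, int j)
{
    assert(i > 0 && j > 0);   /* subtraction-based GCD needs positive inputs */

    while (i != j) {          /* test: cmp Ri,Rj / bne loop */
        int gt = (i > j);     /* flags captured by the compare */
        int le = !gt;         /* since i != j, "le" can only mean "less than" */

        i -= gt ? j : 0;      /* subgt Ri,Ri,Rj */
        j -= le ? i : 0;      /* suble Rj,Rj,Ri */
    }
    return i;
}
```

Calling gcd_predicated(12, 18) walks through (12, 6) and (6, 6) before returning, with no branch inside the loop body in the model.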
Another unique feature of the instruction set is the ability to fold shifts
and rotates into the "data processing" (arithmetic, logical, and register-register
move) instructions, so that, for example, the C statement "a += (j << 2);" could be
rendered as a single instruction on the ARM, register allocation permitting.
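To make the folded shift concrete, here is the statement from the text wrapped in a small C function. The function name add_shifted is hypothetical, and the instruction in the comment is what a compiler could plausibly emit, not a guarantee of what any particular compiler produces.

```c
/* a += (j << 2), written as a function so it can be inspected in isolation.
 * Unsigned operands are used so the shift is well defined for all inputs. */
unsigned add_shifted(unsigned a, unsigned j)
{
    /* On ARM the shift can ride along with the add in one instruction:
     *     ADD Ra, Ra, Rj, LSL #2
     */
    return a + (j << 2);
}
```

The same pattern is what makes indexing an array of 32-bit words cheap: base plus index shifted left by two.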
This results in the typical ARM program being denser than what would
normally be expected of a RISC processor. This implies that there is less need
for load/store operations and that the pipeline is being used more efficiently.
Even though the ARM runs at what many would consider to be low speeds, it
nevertheless competes quite well with much more complex CPU designs.
The ARM processor also has some features rarely seen on other
architectures that are considered RISC, such as PC-relative addressing (indeed,
on the ARM the PC is one of its 16 registers) and pre- and post-increment
addressing modes.
Another item of note is that the ARM has been around for a while, with
the instruction set increasing somewhat over time. Some early ARM processors
(prior to ARM7TDMI), for example, have no instruction to load a two-byte
quantity, so that, strictly speaking, for them it's not possible to generate code that
would behave the way one would expect for C objects of type "volatile short".
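The following sketch shows the kind of code a core without a two-byte load is pushed towards: the 16-bit value is assembled from two separate byte loads. It assumes little-endian byte order, and the function name load_u16_bytewise is hypothetical. The point is that the device sees two accesses instead of one, which is why "volatile short" cannot be honoured exactly on such a core.

```c
#include <stdint.h>

/* Reads a 16-bit little-endian value using two separate byte loads.
 * Illustrative only: a memory-mapped device would observe two accesses. */
uint16_t load_u16_bytewise(const volatile uint8_t *p)
{
    uint16_t lo = p[0];                  /* first access: low byte  */
    uint16_t hi = p[1];                  /* second access: high byte */
    return (uint16_t)(lo | (hi << 8));
}
```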
Many things determine the power consumption of a
processor. The most influential at the transistor level are the supply voltage,
clock speed, number of switching transistors and, to a lesser extent, the
transistor leakage. By lowering supply voltage, the power requirements drop dramatically.
The maximum work frequency drops as well, further lowering power. By only
powering the parts of the chip that are actually doing some work, you save even
more. If you have a simple implementation with shallow pipelines, using few
transistors, you are in an even better position.
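The scaling argument above can be made concrete with the usual first-order CMOS approximation, in which dynamic power is proportional to switched capacitance times supply voltage squared times clock frequency. That formula is not stated in the text and is brought in here as an assumption; leakage is ignored, and the function name is hypothetical.

```c
#include <assert.h>

/* Relative dynamic power after scaling voltage and frequency, using the
 * first-order model P ~ C * V^2 * f with the switched capacitance held
 * constant. A result of 1.0 means "same power as before". */
double relative_dynamic_power(double v_scale, double f_scale)
{
    assert(v_scale > 0.0 && f_scale > 0.0);
    return v_scale * v_scale * f_scale;
}
```

Under this model, halving both the voltage and the clock leaves one eighth of the original dynamic power, which is why the voltage term dominates the savings.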
The low power consumption is because it has approximately 1/25th of
the number of gates of a Pentium. The high performance is because it's designed
better than the Pentium. It doesn't have all the excess baggage the Pentium
carries around with it to make it backwards-compatible with the 486, 386, 286,
186 and 8086. The 8086 was a CISC design anyway, as are all its successors,
whilst the ARM is a RISC design. RISC design is about implementing those
instructions that are used frequently and anything else can be synthesized from
them. CISC design is about throwing in a kitchen-sink instruction and anything else
you can think of, just in case somebody might want to use it. With RISC
design you can make certain simplifications that speed things up - you can
design the instruction decode using hardwired gates but CISC is so complicated
that you have to use microcode which is inherently slower. As far as RISC goes,
the ARM has some wrinkles of its own that add to its performance. The ability to
place a conditional flag on any instruction and to determine whether instructions
can or cannot affect processor flags means that you can often avoid branches
which result in instruction stalls or other slowdowns (processors that lack
this ability have to add a lot of power-consuming extra logic to try
to compensate for branch stalls).
Programmers Model
Data size and instruction sets
The ARM is a 32-bit architecture. A common cause of confusion here is the
term “word”, which will mean 16 bits to people with a 16-bit background. In the
ARM world 16 bits is a “half word”, as the architecture is a 32-bit one, whereas
“word” means 32 bits.
Jazelle cores can also execute Java byte code. Java byte codes are
8-bit instructions designed to be architecture independent. Jazelle
transparently executes most byte codes in hardware and some in highly
optimized ARM code. This is due to a tradeoff between hardware complexity
(power consumption and silicon area) and speed.
Most ARM cores implement two instruction sets: the 32-bit ARM instruction
set and the 16-bit Thumb instruction set.
Processor Modes
The ARM has seven basic operating modes:
User: unprivileged mode under which most tasks run.
FIQ: entered when a high priority (fast) interrupt is raised.
IRQ: entered when a low priority (normal) interrupt is raised.
Supervisor: entered on reset and when a Software Interrupt instruction is executed.
Abort: used to handle memory access violations.
Undef: used to handle undefined instructions.
System: privileged mode using the same registers as user mode.
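The seven modes can be written down as a C enumeration. The numeric values are the standard encodings of the mode field in the low five bits of the CPSR; they are not given in the text above and are included here as background. The helper names are hypothetical.

```c
/* Mode field encodings, CPSR bits [4:0]. */
enum arm_mode {
    MODE_USR = 0x10,   /* User */
    MODE_FIQ = 0x11,   /* Fast interrupt */
    MODE_IRQ = 0x12,   /* Normal interrupt */
    MODE_SVC = 0x13,   /* Supervisor */
    MODE_ABT = 0x17,   /* Abort */
    MODE_UND = 0x1B,   /* Undefined instruction */
    MODE_SYS = 0x1F    /* System */
};

/* Every mode except User is privileged. */
int mode_is_privileged(enum arm_mode m)
{
    return m != MODE_USR;
}

/* User and System share one register set and have no SPSR of their own. */
int mode_has_spsr(enum arm_mode m)
{
    return m != MODE_USR && m != MODE_SYS;
}
```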
The Programmers Model can be split into two elements - first of all,
the processor modes and secondly, the processor registers. So let’s start by
looking at the modes.
Now the typical application will run in an unprivileged mode known as
“User” mode, whereas the various exception types will be dealt with in one of the
privileged modes: Fast Interrupt, Supervisor, Abort, Normal Interrupt and
Undefined.
What is the difference between the privileged
and unprivileged modes? In reality, very little: the ARM core has an
output signal (nTRANS on ARM7TDMI, InTRANS, DnTRANS on 9, or encoded
as part of HPROT or BPROT in AMBA) which indicates whether the current mode
is privileged or unprivileged, and this can be used, for instance, by a memory
controller to only allow IO access in a privileged mode. In addition some
operations are only permitted in a privileged mode, such as directly changing the
mode and enabling of interrupts. All current ARM cores implement system mode
(added in architecture v4). This is simply a privileged version of user mode. It is
important for re-entrant exceptions because no exception can cause system
mode to be entered.
The ARM Register Set
The ARM architecture provides a total of 37 registers, all of which are
32 bits long: 1 dedicated program counter, 1 dedicated current program status
register, 5 dedicated saved program status registers and 30 general purpose
registers. However, these are arranged into several banks, with the accessible
bank being governed by the current processor mode. In summary, in
each mode the core can access:
a particular set of 13 general purpose registers (r0 - r12);
a particular r13, which is typically used as a stack pointer. Each exception
mode has its own r13, so allowing each exception type to have its own stack;
a particular r14, which is used as a link (or return address) register. Again,
each exception mode has its own r14;
r15, whose only use is as the program counter;
the CPSR (Current Program Status Register), which stores additional
information about the state of the processor;
and finally, in privileged modes (except system), a particular SPSR (Saved
Program Status Register). This stores a copy of the previous CPSR value when an
exception occurs. This, combined with the link register, allows exceptions to
return without corrupting processor state.
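One way to see how the 30 general purpose registers plus the program counter are shared between modes is to map each (mode, register number) pair onto a physical slot. The sketch below assumes the classic banking scheme, in which FIQ mode has its own r8 to r14 and the IRQ, Supervisor, Abort and Undef modes each have their own r13 and r14. The slot numbering and all names are hypothetical.

```c
#include <assert.h>

enum bank_mode { BANK_USR, BANK_FIQ, BANK_IRQ, BANK_SVC, BANK_ABT, BANK_UND, BANK_SYS };

/* Maps register rN as seen from a given mode onto a physical slot 0..30.
 * Hypothetical layout:
 *    0..15  r0-r15 as seen from User/System mode (slot 15 is the PC)
 *   16..22  FIQ copies of r8-r14
 *   23..30  r13 and r14 for IRQ, SVC, Abort and Undef, in that order
 * 31 slots plus the CPSR and five SPSRs gives the 37 registers. */
int physical_reg(enum bank_mode mode, int n)
{
    assert(n >= 0 && n <= 15);

    if (mode == BANK_FIQ && n >= 8 && n <= 14)
        return 16 + (n - 8);

    if ((mode == BANK_IRQ || mode == BANK_SVC ||
         mode == BANK_ABT || mode == BANK_UND) && (n == 13 || n == 14))
        return 23 + 2 * ((int)mode - (int)BANK_IRQ) + (n - 13);

    return n;   /* unbanked registers, and everything in User/System mode */
}
```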
Program Status Registers
Bit layout:
Bit:    31 30 29 28 27  26-25      24  23-8       7  6  5  4-0
Field:  N  Z  C  V  Q   Undefined  J   Undefined  I  F  T  mode
Byte fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
Condition code flags:
N = Negative result from ALU
Z = Zero result from ALU
C = ALU operation Carried out
V = ALU operation oVerflowed
Sticky overflow flag: the Q flag indicates whether saturation has occurred. It is
used only in Architecture 5TE/J.
J bit: used in Architecture 5TEJ only. J = 1 indicates that the processor is in
Jazelle state.
Interrupt disable bits:
I = 1: Disables the IRQ.
F = 1: Disables the FIQ.
T bit: used in Architecture xT only. T = 0: processor in ARM state; T = 1:
processor in Thumb state.
Mode bits: specify the processor mode.
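The fields described above can be pulled out of a 32-bit status register value with shifts and masks. This is a minimal sketch using the bit positions from the layout (N, Z, C, V in bits 31 to 28, Q in bit 27, J in bit 24, I, F, T in bits 7 to 5 and the mode in bits 4 to 0); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Decoded view of a program status register value. */
struct psr_fields {
    unsigned n, z, c, v;   /* condition code flags */
    unsigned q;            /* sticky overflow (5TE/J only) */
    unsigned j;            /* Jazelle state (5TEJ only) */
    unsigned i, f;         /* IRQ and FIQ disable bits */
    unsigned t;            /* Thumb state */
    unsigned mode;         /* processor mode, bits [4:0] */
};

struct psr_fields decode_psr(uint32_t psr)
{
    struct psr_fields p;
    p.n = (psr >> 31) & 1u;
    p.z = (psr >> 30) & 1u;
    p.c = (psr >> 29) & 1u;
    p.v = (psr >> 28) & 1u;
    p.q = (psr >> 27) & 1u;
    p.j = (psr >> 24) & 1u;
    p.i = (psr >> 7) & 1u;
    p.f = (psr >> 6) & 1u;
    p.t = (psr >> 5) & 1u;
    p.mode = psr & 0x1Fu;
    return p;
}
```

For example, the value 0x600000D3 decodes as Z and C set, both interrupts disabled, ARM state, and mode field 0x13.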
Program Counter
ARM runs in three different states, which are ARM state, Thumb state and
Jazelle state.
ARM is designed to efficiently access memory using a single memory
access cycle. So word accesses must be on a word address boundary, and
half-word accesses must be on a half-word address boundary. This includes
instruction fetches. Strictly, the bottom bits of the PC simply do not
exist within the ARM core, hence they are ‘undefined’. The memory system must
ignore these for instruction fetches.
When the processor is executing in ARM state, all instructions are 32 bits
wide and all instructions must be word aligned. Therefore the pc value is stored
in bits [31:2] with bits [1:0] undefined (as an instruction cannot be half-word or
byte aligned).
When the processor is executing in Thumb state all instructions are 16 bits
wide, all instructions must be half-word aligned. Therefore the pc value is stored
in bits [31:1] with bit [0] undefined (as instruction cannot be byte aligned).
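The alignment rules for the two states amount to masking off the low bits of the PC before a fetch. A minimal sketch, with hypothetical names, that mirrors what the memory system is expected to do with the undefined bits:

```c
#include <assert.h>
#include <stdint.h>

enum exec_state { STATE_ARM, STATE_THUMB };

/* Returns the address actually used for an instruction fetch:
 * ARM state ignores bits [1:0], Thumb state ignores bit [0]. */
uint32_t fetch_address(uint32_t pc, enum exec_state state)
{
    assert(state == STATE_ARM || state == STATE_THUMB);
    return state == STATE_ARM ? (pc & ~UINT32_C(3))
                              : (pc & ~UINT32_C(1));
}
```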
When the processor is executing in Jazelle state, all instructions are 8 bits
wide. The processor doesn’t perform 8-bit fetches from memory, however.
Instead it does aligned 32-bit fetches (4-byte prefetching), reading 4
instructions at once, which is more efficient.
The PC is not described for Jazelle state because the ‘Jazelle PC’ is
actually stored in r14. This is a technical detail that is completely hidden by
the Jazelle support code.
Exception Handling
When an exception occurs, the ARM:
  Copies CPSR into SPSR_<mode>
  Sets appropriate CPSR bits: change to ARM state, change to exception mode,
  disable interrupts (if appropriate)
  Stores the return address in LR_<mode>
  Sets PC to vector address
To return, the exception handler needs to:
  Restore CPSR from SPSR_<mode>
  Restore PC from LR_<mode>
This can only be done in ARM state.
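The entry and return sequence can be sketched as a small software model. Everything here is an illustrative assumption and not a description of the hardware: the struct layout, the names, and the simplification that the caller passes in the return address (the real core applies exception-specific offsets to LR). The vector offsets and mode encodings are the standard ones, and the interrupt masking follows the rule that IRQ is disabled on every entry while FIQ is disabled only on Reset and FIQ.

```c
#include <assert.h>
#include <stdint.h>

/* Exceptions in vector order, so the vector address is 4 * value. */
enum exc { EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PABT, EXC_DABT,
           EXC_RESERVED, EXC_IRQ, EXC_FIQ };

/* Mode entered for each exception (CPSR bits [4:0]); 0 marks the unused slot. */
static const uint32_t exc_mode[8] = {
    0x13, 0x1B, 0x13, 0x17, 0x17, 0x00, 0x12, 0x11
};

struct cpu {
    uint32_t cpsr;
    uint32_t pc;
    uint32_t spsr[32];   /* banked SPSR, indexed by mode bits (model only) */
    uint32_t lr[32];     /* banked LR, indexed by mode bits (model only) */
};

void take_exception(struct cpu *c, enum exc e, uint32_t return_addr)
{
    assert(e != EXC_RESERVED);
    uint32_t mode = exc_mode[e];

    c->spsr[mode] = c->cpsr;               /* copy CPSR into SPSR_<mode> */
    c->cpsr = (c->cpsr & ~0x3Fu) | mode;   /* clear T (ARM state), set new mode */
    c->cpsr |= 0x80u;                      /* I = 1: IRQ disabled on every entry */
    if (e == EXC_RESET || e == EXC_FIQ)
        c->cpsr |= 0x40u;                  /* F = 1: FIQ disabled on Reset and FIQ */
    c->lr[mode] = return_addr;             /* store return address in LR_<mode> */
    c->pc = 4u * (uint32_t)e;              /* set PC to the vector address */
}

void return_from_exception(struct cpu *c)
{
    uint32_t mode = c->cpsr & 0x1Fu;       /* read the mode before CPSR changes */
    c->pc = c->lr[mode];                   /* restore PC from LR_<mode> */
    c->cpsr = c->spsr[mode];               /* restore CPSR from SPSR_<mode> */
}
```

Taking an IRQ from user mode in this model lands at vector 0x18 in IRQ mode with IRQ masked, and the return restores both the original PC and the original CPSR.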
Exception handling on the ARM is controlled through the use of an area of
memory called the vector table. This lives (normally) at the bottom of the memory
map from 0x0 to 0x1c. Within this table one word is allocated to each of the
various exception types.
This word will contain some form of ARM instruction that should perform a
branch. It does not contain an address.
Vector table, from the top of the table down (one word per entry):
0x1C  FIQ
0x18  IRQ
0x14  (Reserved)
0x10  Data Abort
0x0C  Prefetch Abort
0x08  Software Interrupt
0x04  Undefined Instruction
0x00  Reset
Reset: executed on power on.
Undef: when an invalid instruction reaches the execute stage of the pipeline.
SWI: when a software interrupt instruction is executed.
Prefetch: when an instruction is fetched from memory that is invalid for some
reason, if it reaches the execute stage then this exception is taken.
Data: if a load/store instruction tries to access an invalid memory location,
then this exception is taken.
IRQ: normal interrupt.
FIQ: fast interrupt.
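Because each vector holds an instruction and not an address, filling the table means encoding branches. The sketch below builds the 32-bit encoding of an unconditional ARM branch placed at a given vector; the function name is hypothetical, and it assumes both addresses are word aligned and the target is within the branch range of roughly plus or minus 32MB.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes "B target" for an instruction located at address `at`.
 * The offset is measured from at + 8 because the PC reads two
 * instructions ahead, and is stored as a signed 24-bit word count. */
uint32_t encode_branch(uint32_t at, uint32_t target)
{
    assert((at & 3u) == 0 && (target & 3u) == 0);

    uint32_t offset = (target - (at + 8u)) >> 2;   /* wraps for backward branches */
    return 0xEA000000u | (offset & 0x00FFFFFFu);   /* condition AL, opcode B */
}
```

A branch to itself comes out as 0xEAFFFFFE, the familiar "b ." idiom often used as a placeholder in unused vectors.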
When one of these exceptions is taken, the ARM goes through a low-
overhead sequence of actions in order to invoke the appropriate exception
handler. The current instruction is always allowed to complete (except in case of
Reset).
IRQ is disabled on entry to all exceptions; FIQ is also disabled on entry to Reset
and FIQ.
System Design
Here is a very generic ARM based design that is actually fairly
representative of the designs that we see being done.
Figure – Example of an ARM based System
On-chip there will be an ARM core (obviously) together with a number
of system dependent peripherals. Also required will be some form of interrupt
controller which receives interrupts from the peripherals and raises the IRQ or
FIQ input to the ARM as appropriate. This interrupt controller may also provide
hardware assistance for prioritizing interrupts.
As far as memory is concerned there is likely to be some (cheap)
narrow off-chip ROM (or flash) used to boot the system from. There is also likely
to be some 16-bit wide RAM used to store most of the runtime data and perhaps
some code copied out of the flash. Then on-chip there may well be some 32-bit
memory used to store the interrupt handlers and perhaps stacks.
[Figure blocks: ARM core, interrupt controller (nIRQ, nFIQ), peripherals, 32 bit RAM, 16 bit RAM, 8 bit ROM]
Processor Types
ARM 1 (v1)
This was the very first ARM processor. Actually, when it was first
manufactured in April 1985, it was the very first commercial RISC processor.
Ever. As a testament to the design team, it was "working silicon" in its first
incarnation, it exceeded its design goals, and it used fewer than 25,000
transistors.
The ARM 1 was used in a few evaluation systems on the BBC
micro (Brazil - BBC interfaced ARM), and a PC machine (Springboard - PC
interfaced ARM). It is believed a large proportion of Arthur was developed on the
Brazil hardware. In essence, it is very similar to an ARM 2 - the differences being
that R8 and R9 are not banked in IRQ mode, there's no multiply instruction, no
LDR/STR with register-specified shifts, and no co-processor gubbins.
ARM evaluation system for BBC Master
ARM 2 (v2)
Experience with the ARM 1 suggested improvements that could
be made. Such additions as the MUL and MLA instructions allowed for real-time
digital signal processing. Back then, it was to aid in generating sounds. Who
could have predicted exactly how suitable to DSP the ARM would be, some
fifteen years later?
In 1985, Acorn hit hard times which led to it being taken over by
Olivetti. It took two years from the arrival of the ARM to the launch of a computer
based upon it. When the first ARM-based machines rolled out, Acorn could gladly
announce to the world that they offered the fastest RISC processor around.
Indeed, the ARM processor kicked ass across the computing league tables, and
for a long time was right up there in the 'fastest processors' listings. But Acorn
faced numerous challenges. The computer market was in disarray, with some
people backing IBM's PC, some the Amiga, and all sorts of little itty-bitty things.
Then Acorn go and launch a machine offering Arthur (which was about as nice
as the first release of Windows) which had no user base, precious little software,
and not much third party support. But they succeeded. The ARM 2 processor
was the first to be used within the RISC OS platform, in the A305, A310, and
A4x0 range. It is an 8MHz processor that was used on all of the early machines,
including the A3000. The ARM 2 is clocked at 8MHz, which translates to
approximately four and a half million instructions per second (0.56 MIPS/MHz).
ARM 3 (v2as)
Launched in 1989, this processor built on the ARM 2 by offering
4K of cache memory and the SWP instruction. The desktop computers based
upon it were launched in 1990. Internally, via the dedicated co-processor
interface, CP15 was 'created' to provide processor control and identification.
Several speeds of ARM 3 were produced. The A540 runs a
26MHz version, and the A4 laptop runs a 24MHz version. By far the most
common is the 25MHz version used in the A5000, though those with the 'alpha
variant' have a 33MHz version.
At 25MHz, with 12MHz memory (a la A5000), you can expect around
14 MIPS (0.56 MIPS/MHz). It is interesting to notice that the ARM3 doesn't
'perform' faster - both the ARM2 and the ARM3 average 0.56 MIPS/MHz. The
speed boost comes from the higher clock speed, and the cache.
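The per-clock figures quoted in this section are simply the MIPS rating divided by the clock speed in MHz. A trivial helper (the name is hypothetical) makes the comparison between parts easy to reproduce from the numbers in the text:

```c
#include <assert.h>

/* Instructions per clock expressed as MIPS per MHz. */
double mips_per_mhz(double mips, double mhz)
{
    assert(mhz > 0.0);
    return mips / mhz;
}
```

With the figures given above, 14 MIPS at 25MHz works out to 0.56, and 4.5 MIPS at 8MHz for the ARM 2 gives roughly the same value.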
And just to correct a common misunderstanding, the A4 is not a squashed down
version of the A5000. The A4 actually came first, and some of the design choices
were reflected in the later A5000 design.
ARM3 with FPU
ARM 250 (v2as)
The 'Electron' of ARM processors, this is basically a second level
revision of the ARM 3 design which removes the cache, and combines the
primary chipset (VIDC, IOC, and MEMC) into the one piece of silicon, making the
creation of a cheap'n'cheerful RISC OS computer a simple thing indeed. This
was clocked at 12MHz (the same as the main memory), and offers approximately
7 MIPS (0.58 MIPS/MHz).
This processor isn't as terrible as it might seem. That the A30x0 range
was built with the ARM250 was probably more a cost-cutting exercise than
intention. The ARM250 was designed for low power consumption and low cost,
both important factors in devices such as portables, PDAs, and organisers -
several of which were developed and, sadly, none of which actually made it to a
release.
ARM 250 mezzanine
This is not actually a processor. It is included here for historical
interest. It seems the machines that would use the ARM250 were ready before
the processor, so early releases of the machine contained a 'mezzanine' board
which held the ARM 2, IOC, MEMC, and VIDC.
ARM 4 and ARM 5
These processors do not exist.
More and more people began to be interested in the RISC
concept, as at the same sort of time common Intel (and clone) processors
showed a definite trend towards higher power consumption and greater need for
heat dissipation, neither of which are friendly to devices that are supposed to be
running off batteries. The ARM design was seen by several important players as
being the epitome of sleek, powerful RISC design. It was at this time a deal was
struck between Acorn, VLSI (long-time manufacturers of the ARM chipset), and
Apple. This led to the death of the Acorn RISC Microprocessor, as Advanced
RISC Machines Ltd was born. This new company was committed to design and
support specifically for the processor, without the hassle and baggage of RISC
OS (the main operating system for the processor and the desktop machines).
Both of those would be left to Acorn. In the change from being a part of Acorn to
being ARM Ltd in its own right, the whole numbering scheme for the processors
was altered.
ARM 610 (v3)
This processor brought with it two important 'firsts'. The first 'first' was
full 32 bit addressing, and the second 'first' was the opening for a new generation
of ARM based hardware.
Acorn responded by making the RiscPC. In the past, critics were none-
too-keen on the idea of slot-in cards for things like processors and memory (as
used in the A540), and by this time many people were getting extremely annoyed
with the inherent memory limitations in the older hardware: the MEMC can only
address 4Mb of memory, and you can add more by daisy-chaining MEMCs - an
idea that not only sounds hairy, it is hairy!
The RiscPC brought back the slot-in processor with a vengeance.
Future 'better' processors were promised, and a second slot was provided for
alien processors such as the 80486 to be plugged in. As for memory, two SIMM
slots were provided, and the memory was expandable to 256Mb. This does not
sound much as modern PCs come with half that as standard. However you can
get a lot of mileage from a RiscPC fitted with a puny 16Mb of RAM. But, always,
we come back to the 32 bit. Because it has been with us and known about ever
since the first RiscPC rolled out, but few people noticed, or cared. Now as the
new generation of ARM processors drop the 26 bit 'emulation' modes, the RISC
OS users are faced with the option of getting ourselves sorted, or dying.
Ironically, the other mainstream operating systems for the RiscPC hardware -
namely ARMLinux and netbsd/arm32 are already fully 32 bit.
Several speeds were produced: 20MHz, 30MHz, and the 33MHz part
used in the RiscPC. The ARM610 processor features an on-board MMU to
handle memory, a 4K cache, and it can even switch itself from little-endian
operation to big-endian operation. The 33MHz version offers around 28MIPS
(0.84 MIPS/MHz).
The RiscPC ARM610 processor card
ARM 710 (v3)
As an enhancement of the ARM610, the ARM 710 offers an increased
cache size (8K rather than 4K), clock frequency increased to 40MHz, improved
write buffer and larger TLB in the MMU.
Additionally, it supports CMOS/TTL inputs, Fastbus, and 3.3V power
but these features are not used in the RiscPC.
Clocked at 40MHz, it offers about 36MIPS (0.9 MIPS/MHz). Combining this higher
per-clock throughput with the additional clock speed, it runs an appreciable
amount faster than the ARM 610.
ARM710 side by side with an 80486, the coin is a British 10 pence coin.
ARM 7500
The ARM7500 is a RISC based single-chip computer with memory
and I/O control on-chip to minimise external components. The ARM7500 can
drive LCD panels/VDUs if required, and it features power management. The
video controller can output up to a 120MHz pixel rate, 32bit sound, and there are
four A/D convertors on-chip for connection of joysticks etc.
The processor core is basically an ARM710 with a smaller (4K) cache.
The video core is a VIDC2. The IO core is based upon the IOMD.
The memory/clock system is very flexible, designed for maximum uses
with minimum fuss. Setting up a system based upon the ARM7500 should be
fairly simple.
ARM 7500FE
A version of the ARM 7500 with hardware floating point support.
ARM7500FE, as used in the Bush Internet box.
StrongARM / SA110 (v4)
The StrongARM took the RiscPC from around 40MHz to 200-
300MHz and showed a speed boost that was more than the hardware should
have been able to support. Still severely bottlenecked by the memory and I/O,
the StrongARM made the RiscPC fly. The processor was the first to feature
different instruction and data caches, and this caused quite a lot of self-modifying
code to fail including, amusingly, Acorn's own runtime compression system. But
on the whole, the incompatibilities were not more painful than an OS upgrade
(anybody remember the RISC OS 2 to RISC OS 3 upgrade, and all the programs
that used SYS OS_UpdateMEMC, 64, 64 for a speed boost froze the machine
solid!).
In instruction terms, the StrongARM can offer half-word loads and
stores, and signed half-word and byte loads and stores. Also provided are
instructions for multiplying two 32 bit values (signed or unsigned) and replying
with a 64 bit result. This is documented in the ARM assembler user guide as only
working in 32-bit mode, however experimentation will show you that they work in
26-bit mode as well.
Later documentation confirms this. The cache has been split into
separate instruction and data cache (Harvard architecture), with both of these
caches being 16K, and the pipeline is now five stages instead of three.
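Returning to the long multiply instructions mentioned above: in C, the operation they perform is a 32-bit by 32-bit multiply whose operands are widened first so that the full 64-bit product is kept. The sketch below shows the unsigned and signed forms; the function names are hypothetical, and whether a given compiler maps them onto the long multiply instructions is an assumption.

```c
#include <stdint.h>

/* Unsigned 32 x 32 -> 64 multiply: widen one operand before multiplying
 * so the upper half of the product is not discarded. */
uint64_t mul_u32_to_u64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Signed 32 x 32 -> 64 multiply. */
int64_t mul_s32_to_s64(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

Without the cast, the multiplication would be carried out in 32 bits and the upper half of the result would be lost.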
In terms of performance... at 100MHz, it offers 11
A StrongARM mounted on a LART board.
In order to squeeze the maximum from a RiscPC, the Kinetic includes
fast RAM on the processor card itself, as well as a version of RISC OS that
installs itself on the card. Apparently it flies due to removing the memory
bottleneck, though this does cause 'issues' with DMA expansion cards.
A Kinetic processor card.
SA1100 variant
This is a version of the SA110 designed primarily for portable
applications. I mention it here as I am reliably informed that the SA1100 is the
processor inside the 'faster' Panasonic satellite digibox. It contains the
StrongARM core, MMU, cache, PCMCIA, general I/O controller (including two
serial ports), and a colour/greyscale LCD controller. It runs at 133MHz or
200MHz and it consumes less than half a watt of power.
Thumb
The Thumb instruction set is a reworking of the ARM set, with a few
things omitted. Thumb instructions are 16 bits (instead of the usual 32 bit). This
allows for greater code density in places where memory is restricted. The Thumb
set can only address the first eight registers, and there are no conditional
execution instructions. Also, the Thumb cannot do a number of things required
for low-level processor exceptions, so the Thumb instruction set will always come
alongside the full ARM instruction set. Exceptions and the like can be handled in
ARM code, with Thumb used for the more regular code.
Other versions
M variants
This is an extension of the version three designs (ARM 6 and ARM 7)
that provides the extended 64 bit multiply instructions. These instructions
became a main part of the instruction set in the ARM version 4 (StrongARM, etc).
T variants
These processors include the Thumb instruction set (and, hence, no 26
bit mode).
E variants
These processors include a number of additional instructions which
provide improved performance in typical DSP applications. The 'E' standing for
"Enhanced DSP".
The future
The future is here. Newer ARM processors exist, but they are 32 bit
devices. This means, basically, that RISC OS won't run on them until all of RISC
OS is modified to be 32 bit safe. As long as BASIC is patched, a reasonable
software base will exist. However all C programs will need to be recompiled. All
relocatable modules will need to be altered. And pretty much all assembler code
will need to be repaired. In cases where source isn't available (ie, anything
written by Computer Concepts), it will be a tedious slog. It is truly one of the
situations that could make or break the platform.
ARM Technologies
Jazelle
The ARM926EJ-S core incorporates the Jazelle instructions for Java
acceleration. This is complemented by the Jazelle Support Code that manages
the interface between the core hardware, the virtual machine and the operating
system.
ARM have implemented a technology that allows certain of their
architectures to execute Java byte code natively in hardware, in another
execution mode alongside the existing ARM and Thumb modes and accessed in
a similar fashion to ARM/Thumb interworking. The first processor with Jazelle
technology was the ARM926EJ-S: Jazelle being denoted by the 'J' in the CPU
name.
ARM Jazelle® technology for acceleration of execution environments
provides ARM's connected community with high quality, class leading solutions for
the ARM architecture that offer the optimum combination of high performance,
low power and low cost.
ARM Jazelle DBX (Direct Bytecode eXecution)
technology for direct bytecode execution of Java™ delivers an unparalleled
combination of Java performance and the world's leading 32-bit embedded RISC
architecture - giving platform developers the freedom to run Java applications
alongside established OS, middleware and application code on a single
processor. The single-processor solution offers higher performance, lower
system cost and lower power than coprocessors and dual-processor (dedicated
Java processor and native applications processor) solutions.
ARM Jazelle RCT (Runtime Compilation Target) technology
supports efficient ahead-of-time (AOT) and just-in-time (JIT) compilation with
Java and other execution environments. By ensuring that techniques such as
AOT and JIT can be implemented with up to 3x smaller memory footprint, Jazelle
RCT technology enables high performance with a significant reduction in
application memory footprint and power consumption. Jazelle RCT is applicable
to Java and other execution environments such as Microsoft .NET Compact
Framework. ARM’s Jazelle DBX and Jazelle RCT technologies offer flexibility
and choice in a roadmap for efficient and high performance implementation of
execution environments across a wide range of ARM platforms.
Thumb-2
Thumb-2 technology made its debut in the ARM1156 core,
announced in 2003. Thumb-2 extends the limited 16-bit instruction set of Thumb
with additional 32-bit instructions to give the instruction set more breadth. As a
result the stated aim for Thumb-2 is to achieve code density that is similar to
Thumb with performance similar to the ARM instruction set on 32-bit memory.
Thumb-2 also extends both the ARM and Thumb instruction sets with
yet more instructions, including bit-field manipulation, table branches, and
conditional execution.
Thumb-2EE
Thumb-2EE, marketed as Jazelle RCT, was announced in 2005,
appearing in the Cortex series of cores. Thumb-2EE provides a small extension
to Thumb-2, making the instruction set particularly suited to code generated at
runtime (e.g. by JIT compilation) in managed Execution Environments. Thumb-
2EE is a target for languages such as Java, C#, Perl and Python, and allows JIT
compilers to output smaller compiled code without impacting performance.
New features provided by Thumb-2EE include automatic null pointer checks on
every load and store instruction, an instruction to perform an array bounds check,
and the ability to branch to handlers, which are small sections of frequently called
code, commonly used to implement a feature of a high level language, such as
allocating memory for a new object.
NEON
NEON technology is a combined 64- and 128-bit SIMD (Single
Instruction Multiple Data) instruction set that provides standardized acceleration
for media and signal processing applications. NEON can run an MP3 audio
decoder using less than 10 MHz of CPU clock, and the GSM AMR (Adaptive
Multi-Rate) speech codec using only 13 MHz. It features a comprehensive
instruction set, separate register files and independent execution hardware.
NEON supports 8-, 16-, 32- and 64-bit integer and single-precision floating-point
data, and uses SIMD operations for audio/video processing as
well as graphics and gaming. SIMD is a crucial element in vector
supercomputers, which perform multiple operations simultaneously. In NEON,
SIMD supports up to 16 operations at the same time.
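The figure of 16 simultaneous operations follows from packing 8-bit elements into a 128-bit register. The Python sketch below shows that arithmetic and a lane-wise add. It is only a model: the function names are hypothetical, and the wrap-around (modulo 256) behaviour is an assumption chosen for simplicity.

```python
def lane_count(register_bits, element_bits):
    """How many elements of a given width fit in one SIMD register."""
    return register_bits // element_bits


def simd_add_u8(a, b):
    """Add two equal-length byte vectors lane by lane, wrapping modulo 256."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    return bytes((x + y) & 0xFF for x, y in zip(a, b))
```

A real SIMD unit performs all the lane additions in one instruction; the loop here only imitates the result, not the parallelism.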
VFP
VFP technology is a coprocessor extension to the ARM architecture. It
provides low-cost single-precision and double-precision floating-point
computation that is fully compliant with the ANSI/IEEE Std 754-1985 Standard for
Binary Floating-Point Arithmetic. VFP provides floating-point computation
suitable for a wide spectrum of applications such as PDAs, smartphones, voice
compression and decompression, three-dimensional graphics and digital audio,
printers, set-top boxes, and automotive applications. The VFP architecture also
supports execution of short vector instructions allowing SIMD (Single Instruction
Multiple Data) parallelism. This is useful in graphics and signal-processing
applications by reducing code size and increasing throughput.
ARM CoreSight Technology
ARM CoreSight technology is designed to meet the wide range of needs of
embedded developers and silicon manufacturers, such as providing wide system
visibility with minimal overhead, thus reducing processor costs.
ARM CoreSight technology provides a complete debug and trace solution for the
entire system-on-chip (SoC). It makes single ARM core and complex, multi-core
SoCs easy to debug and thus speeds development of more reliable, higher
performance ARM Powered products.
By providing system-wide visibility through the smallest port, CoreSight
technology provides the highest standard of debug and trace capabilities and can
be leveraged for all cores and complex peripherals. CoreSight technology builds
on ARM's current Embedded Trace Macrocell™ (ETM) products, which are
widely licensed and supported by ARM RealView® development tools and all
other leading tool vendors.
Key Benefits
• Higher visibility of complete system operation through fewer pins
• Direct access by debugger to system memory, for visibility without affecting
CPU operation and for faster code download
• Ability to debug any of multiple cores, even if other cores are in sleep mode or
powered down
• Standard solution across all silicon vendors for widest tools support
• Re-usable for single ARM core, multi-core or core and DSP systems
• Enables faster time-to-market with greater reliability and higher performance
products
• Supports highest frequency processors, including ARM's new Cortex cores
• Builds on proven ARM ETM technology
Devices implementing CoreSight technology can comply at four independent
levels:
• CoreSight Debug
o Debug Access Port
o Embedded Cross Trigger
• CoreSight ETM
o Embedded Trace Macrocells
• CoreSight Multi-source Trace
o Trace Funnel
o Embedded Trace Buffer
o Trace Port Interface Unit
o AHB Trace Macrocell
o Instrumentation Trace
• CoreSight Single Wire
o Single Wire Debug
o Single Wire Viewer
Memory Management
Introduction
The RISC OS machines work with two different types of memory -
logical and physical. The logical memory is the memory as seen by the OS, and
the programmer. Your application begins at &8000 and continues until &xxxxx.
The physical memory is the actual memory in the machine.
Under RISC OS, memory is broken into pages. Older machines have
a page of 8/16/32K (depending on installed memory), and newer machines have
a fixed 4K page. If you were to examine the pages in your application workspace,
you would most likely see that the pages were seemingly random, not in order.
The pages relate to physical memory, combined to provide you with xxxx bytes of
logical memory. The memory controller is constantly shuffling memory around so
that each task that comes into operation 'believes' it is loaded at &8000. Write a
little application to count how many Wimp polls occur every second, and you'll begin
to appreciate how much is going on in the background.
MEMC: Older systems
In ARM 2, ARM 250, and ARM 3 machines, the memory is controlled by the
MEMC (Memory Controller). This unit can cope with an address space of 64MB,
but in reality can only access 4MB of physical memory. The 64MB space is split
into three sections:
0MB - 32MB : Logical RAM
32MB - 48MB : Physical RAM
48MB - 64MB : System ROMs and I/O
Parts of the system ROMs and I/O are mapped over each other, so
reading from such an address gives you code from ROM, and writing to it updates
things like the VIDC (video/sound).
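The three-way split of the 64MB space can be expressed as a simple address classifier. The Python sketch below follows the boundaries listed above; the function name is hypothetical, and the region labels are taken straight from the list. Note that 64MB is 2 to the power 26, which matches the 26-bit addressing used on these machines.

```python
MB = 1 << 20  # one megabyte in bytes


def memc_region(address):
    """Return which section of the 64MB MEMC address space an address falls in."""
    if not 0 <= address < 64 * MB:
        raise ValueError("address is outside the 64MB MEMC address space")
    if address < 32 * MB:
        return "Logical RAM"
    if address < 48 * MB:
        return "Physical RAM"
    return "System ROMs and I/O"
```

For example, the application start address &8000 falls in the logical RAM section.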
It is possible to fit up to 16MB of memory to an older machine, but
you will need a matched MEMC for each 4MB. People have reported that simply
fitting two MEMCs (to give 8MB) is either hairy or unreliable, or both. In practice,
the hardware to do this properly only really existed for the A540 machine, where
each 4MB was a slot-in memory card with an on-board MEMC.
The MEMC is capable of restricting access to pages of memory in
certain ways, either complete access, no access, no access in USR mode, or
read-only access. Older versions of RISC OS only implemented this loosely, so
you need to be in SVC mode to access hardware directly but you could quite
easily trample over memory used by other applications.
MMU: Newer systems
The newer systems, with an ARM6 or later processor, have an MMU
built into the processor. This consists of the translation look-aside buffer (TLB),
access control logic, and translation table walk logic. The MMU supports memory
accesses based upon 1MB sections or 4K pages. The MMU also provides
support for up to 16 'domains', areas of memory with specific access rights.
The TLB caches 64 translated entries. If the TLB holds an entry for the virtual
address, the access control logic determines whether access is permitted. If it
is, the MMU outputs the appropriate physical address; otherwise it signals the
processor to abort.
If the TLB misses (it doesn't contain an entry for the virtual address), the walk
logic retrieves the translation information from the (full) translation table in
physical memory. If the MMU is disabled, the virtual address is output
directly as the physical address.
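The lookup sequence described above (TLB hit, access check, table walk on a miss, pass-through when disabled) can be sketched in Python as follows. This is a simplified model, not the real hardware: it assumes 4K pages only, a flat dictionary standing in for the translation table, a single 'writable' permission bit, and oldest-first eviction once the 64-entry TLB is full. All class and attribute names are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096   # 4K pages
TLB_ENTRIES = 64   # the TLB caches 64 translations


class Abort(Exception):
    """Raised when the MMU signals the processor to abort."""


class SimpleMMU:
    def __init__(self, table, enabled=True):
        # table maps virtual page number -> (physical page number, writable)
        self.table = table
        self.enabled = enabled
        self.tlb = OrderedDict()
        self.walks = 0  # number of translation table walks performed

    def translate(self, vaddr, write=False):
        if not self.enabled:
            return vaddr  # MMU disabled: virtual address passes straight through
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.tlb.get(vpn)
        if entry is None:  # TLB miss: walk the full table
            self.walks += 1
            entry = self.table.get(vpn)
            if entry is None:
                raise Abort("no translation for address %#x" % vaddr)
            if len(self.tlb) >= TLB_ENTRIES:
                self.tlb.popitem(last=False)  # evict the oldest entry (assumed policy)
            self.tlb[vpn] = entry
        ppn, writable = entry
        if write and not writable:  # access control check
            raise Abort("write to read-only page at %#x" % vaddr)
        return ppn * PAGE_SIZE + offset
```

A second access to the same page is served from the TLB, so the `walks` counter only increases on a miss.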
It gets a lot more complicated. Suffice it to say that more access rights
are possible, you can specify memory to be bufferable and/or cacheable (or
not), and the page size is fixed at 4K. A normal RiscPC offers two banks of RAM,
and is capable of addressing up to 256MB of RAM in fairly standard PC-style
SIMMs, plus up to 2MB of VRAM double-ported with the VIDC, plus
hardware/ROM addressing.
On the RiscPC, the maximum address space of an application is 28MB.
This is not a restriction of the MMU but a restriction of the 26-bit processor mode
used by RISC OS. A 32-bit processor mode could, in theory, allocate the entire
256MB to a single task. All current versions of RISC OS are 26-bit.
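To see why the 26-bit mode limits addressing, recall that in this mode the program counter shares R15 with the processor status flags. The Python sketch below assumes the classic 26-bit R15 layout (flags N, Z, C, V, I, F in bits 31 to 26, the word-aligned PC in bits 25 to 2, and the mode in bits 1 to 0). The function and constant names are hypothetical.

```python
PC_MASK = 0x03FFFFFC  # bits 25..2 hold the word-aligned program counter
MODE_MASK = 0x3       # bits 1..0 hold the processor mode


def unpack_r15(r15):
    """Split a 26-bit-mode R15 value into flags, program counter and mode."""
    return {
        "N": bool(r15 & (1 << 31)),
        "Z": bool(r15 & (1 << 30)),
        "C": bool(r15 & (1 << 29)),
        "V": bool(r15 & (1 << 28)),
        "I": bool(r15 & (1 << 27)),
        "F": bool(r15 & (1 << 26)),
        "pc": r15 & PC_MASK,
        "mode": r15 & MODE_MASK,
    }


# The highest instruction address plus one word gives the reachable code space.
ADDRESSABLE_BYTES = PC_MASK + 4
```

With only 24 bits of word address available, code can live in at most 64MB, regardless of how much memory the MMU could map.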
System limitations
Consider a RiscPC with an ARM610 processor. The cache is 4K. The
bus speed is 16MHz (note, only slightly faster than the A5000!), and the
hardware does not support burst-mode for memory accesses. Upon a context
switch (i.e., making an application 'active') you need to remap its memory to begin
at &8000 and flush the cache.
The Pipeline
A conventional processor executes instructions one at a time, just
as you expect it to when you write your code. Each execution can be broken
down into three parts, which anybody who has learned this stuff at college will
have burned into their memory: fetch, decode, execute.
In English...
1. Fetch
Retrieve the instruction from memory. Don't get all techie -
whether the instruction comes from system memory or the processor
cache is irrelevant, the instruction is not loaded 'into' the processor until it
is specifically requested. The cache simply serves to speed things up. By
loading chunks of system memory into the cache, the processor can
satisfy many more of its instruction fetches by pulling instructions from the
cache. This is necessary because processors are very fast (StrongARMs,
200MHz+; Pentiums up to GHz!) and system memory is not (33, 66, or
133MHz). To see the effect the cache has on your processor, use *Cache
Off.
2. Decode
Figure out what the instruction is, and what is supposed to be
done.
3. Execute
Perform the requested operation.
Each of these operations is performed along with the electronic
'heartbeat', the clock rate. Example clock rates for several microprocessors
used in Acorn products are:
Machine                  Processor    Clock rate
BBC microcomputer        6502         2MHz
Acorn A310-A3000         ARM 2        8MHz
Acorn A5000              ARM 3        25MHz
Acorn A5000/I            ARM 3        30MHz
RiscPC600                ARM610       33MHz
RiscPC700                ARM710       40MHz
Early PC co-processor    486SXL-40    33MHz (not 40!)
RiscPC (StrongARM)       SA110        202MHz - 278MHz+
As shown in the PC world, processors are running into GHz speeds
(1,000,000,000 ticks/sec), which necessitates much in the way of speed
tweaks (huge amounts of cache, extremely optimized pipelines) because there is
no way the rest of the system can keep up. Indeed, the rest of the system is likely
to be operating at a quarter of the speed of the processor. The RiscPC is
designed to work, I believe, at 33MHz. That is why people thought the StrongARM
wouldn't give much of a speed boost. However, the small size of ARM
programs, coupled with a rather large cache, made the StrongARM a viable
proposition in the RiscPC. It bottlenecked horribly, but other factors meant that
this wasn't so visible to the end-user, so the result was a system which is much
faster than the ARM710. More recently, there is the Kinetic StrongARM processor
card. This attempts to alleviate bottlenecks by installing a big wedge of memory
directly on the processor card and using that. It even goes so far as to install the
entirety of RISC OS into that memory so you aren't kept waiting for the ROMs
(which are slower even than RAM).
There is an obvious solution. Since these three stages (fetch,
decode, execute) are fairly independent, would it not be possible to:
fetch instruction #3
decode instruction #2
execute instruction #1
...then, on the next clock tick...
fetch instruction #4
decode instruction #3
execute instruction #2
...tick...
fetch instruction #5
decode instruction #4
execute instruction #3
In practice, the answer is yes. And this is exactly what a pipeline is.
Simply by doing this, you have just made your processor up to three times faster!
Now, it isn't a perfect solution.
• When it comes to a branch, the pipeline is dumped, as the instructions fetched
after a branch are not required. This is why it is preferable to use conditional
execution rather than branching.
• You have to keep in mind that the program counter is ahead of the
instruction that is currently being executed. So if you see an error at 'x',
then the real error is quite possibly at 'x-8' (or 'x-12' for StrongARM).
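The overlap of the three stages, the cost of a dumped pipeline, and the program counter offset can all be captured in a few lines of Python. The sketch below is an idealized model: it assumes every stage takes exactly one clock tick and that each taken branch costs a full refill of the earlier stages. The function names are hypothetical.

```python
def stage_contents(tick):
    """Instruction number (1-based) in each stage on a given clock tick (1-based)."""
    def valid(number):
        return number if number >= 1 else None  # stage is still empty
    return {"fetch": tick, "decode": valid(tick - 1), "execute": valid(tick - 2)}


def cycles_unpipelined(instructions, stages=3):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions, stages=3, taken_branches=0):
    """One instruction completes per tick once the pipeline is full."""
    if instructions == 0:
        return 0
    refill = stages - 1  # ticks lost filling (or refilling) the pipeline
    return instructions + refill + taken_branches * refill


def pc_value_seen(executing_address, ahead=8):
    """The PC reads ahead of the executing instruction (8 bytes with three stages)."""
    return executing_address + ahead
```

For 100 instructions this model gives 300 ticks unpipelined against 102 pipelined, which is why the speed-up approaches, but never quite reaches, a factor of three, and why every taken branch eats into it.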
RISC vs CISC
In the early days of computing, you had a lump of silicon which
performed a number of instructions. As time progressed, more and more facilities
were required, so more and more instructions were added. However, according
to the 20-80 rule, 20% of the available instructions are likely to be used 80% of
the time, with some instructions only used very rarely. Some of these instructions
are very complex, so creating them in silicon is a very arduous task. Instead, the
processor designer uses microcode. To illustrate this, we shall consider a
modern CISC processor (such as a Pentium or 68000 series processor). The
core, the base level, is a fast RISC processor. On top of that is an interpreter
which 'sees' the CISC instructions, and breaks them down into simpler RISC
instructions.
Already, we can see a pretty clear picture emerging. Why, if the
processor is a simple RISC unit, don't we use that? Well, the answer lies more in
politics than design. However, Acorn saw this and, not being constrained by the
need to remain totally compatible with earlier technologies, decided to
implement their own RISC processor.
Up until now, we've not really considered the real differences between
RISC and CISC, so...
A Complex Instruction Set Computer (CISC) provides a large and
powerful range of instructions, which is less flexible to implement. For example,
the 8086 microprocessor family has these instructions:
JA Jump if Above
JAE Jump if Above or Equal
JB Jump if Below
...
JPO Jump if Parity Odd
JS Jump if Sign
JZ Jump if Zero
There are 32 jump instructions in the 8086, and the 80386 adds more.
I've not read a spec sheet for the Pentium-class processors, but I suspect it (and
MMX) would give me a heart attack!
By contrast, the Reduced Instruction Set Computer (RISC) concept is to
identify the sub-components and use those. As these are much simpler, they can
be implemented directly in silicon, so will run at the maximum possible speed.
Nothing is 'translated'. There are only two Jump instructions in the ARM
processor - Branch and Branch with Link. The "if equal, if carry set, if zero" type
of selection is handled by condition options, so for example:
BLNV Branch with Link NeVer (useful!)
BLEQ Branch with Link if EQual
and so on. The BL part is the instruction, and the following part is the condition.
This is made more powerful by the fact that conditional execution can be applied
to most instructions! This has the benefit that you can test something, then only
do the next few commands if the criteria of the test matched. No branching off,
you simply add conditional flags to the instructions you require to be conditional:
SWI "OS_DoSomethingOrOther" ; call the SWI
MVNVS R0, #0 ; If failed, set R0 to -1
MOVVC R0, #0 ; Else set R0 to 0
Or, for the 80486:
    INT $...whatever... ; call the interrupt
    CMP AX, 0 ; did it return zero?
    JE failed ; if so, it failed, jump to fail code
    MOV DX, 0 ; else set DX to 0
return:
    RET ; and return
failed:
    MOV DX, 0FFFFH ; failed - set DX to -1
    JMP return
The odd flow in that example is designed to allow the fastest
non-branching throughput in the 'did not fail' case. This is at the expense of two
branches in the 'failed' case.
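To make the ARM version of the example concrete, the Python sketch below models conditional execution: each instruction carries a condition, and it is simply skipped when the flags do not satisfy it. Only a handful of conditions and two operations (MOV and MVN with an immediate) are modelled. The tuple format and all names are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

# Each condition is a test on the processor flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"],
    "NE": lambda f: not f["Z"],
    "VS": lambda f: f["V"],
    "VC": lambda f: not f["V"],
    "AL": lambda f: True,
    "NV": lambda f: False,
}


def run(program, regs, flags):
    """Execute (condition, operation, register, immediate) tuples in order."""
    for cond, op, rd, imm in program:
        if not CONDITIONS[cond](flags):
            continue  # condition failed: the instruction has no effect
        if op == "MOV":
            regs[rd] = imm & MASK32
        elif op == "MVN":
            regs[rd] = ~imm & MASK32  # move the bitwise NOT of the immediate
        else:
            raise ValueError("unsupported operation: " + op)
    return regs


# MVNVS R0, #0 followed by MOVVC R0, #0, as in the SWI example
SET_RESULT = [("VS", "MVN", "R0", 0), ("VC", "MOV", "R0", 0)]
```

Whichever way the V flag is set, exactly one of the two instructions takes effect and no branch is needed.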
Most modern CISC processors, such as the Pentium, use a fast
RISC core with an interpreter sitting between the core and the instruction. So
when you are running Windows 95 on a PC, it is not that much different to trying
to get W95 running on the software PC emulator. Just imagine the power hidden
inside the Pentium...
Another benefit of RISC is that it contains a large number of
registers, most of which can be used as general purpose registers.
This is not to say that CISC processors cannot have a large number
of registers; some do. However, for its use, a typical RISC processor requires
more registers to give it additional flexibility. Gone are the days when you had
two general purpose registers and an 'accumulator'.
One thing RISC does offer, though, is register independence. As you
have seen above the ARM register set defines at minimum R15 as the program
counter, and R14 as the link register (although, after saving the contents of R14
you can use this register as you wish). R0 to R13 can be used in any way you
choose, although the Operating System defines R13 as the stack pointer.
You can, if you don't require a stack, use R13 for your own purposes. APCS
applies firmer rules and assigns more functions to registers (such as Stack Limit).
However, none of these - with the exception of R15 and sometimes R14 - is a
constraint applied by the processor. You do not need to worry about saving your
accumulator in long instructions, you simply make good use of the available
registers.
The 8086 offers you fourteen registers, but with caveats: The first four
(A, B, C, and D) are Data registers (a.k.a. scratch-pad registers). They are 16-bit
and can each be accessed as two 8-bit registers; thus register A is really AH (A,
high-order byte) and AL (A, low-order byte). These can be used as general purpose
registers, but they also have dedicated functions - Accumulator, Base,
Count, and Data. The next four registers are Segment registers for Code, Data,
Extra, and Stack. Then come the five Offset registers: Instruction Pointer (IP, the
program counter), SP and BP for the stack, then SI and DI for indexing data. Finally, the flags
register holds the processor state. As you can see, most of the registers are tied
up with the bizarre memory addressing scheme used by the 8086. So only four
general purpose registers are available, and even they are not as flexible as
ARM registers.
The ARM processor differs again in that it has a reduced number of
instruction classes (Data Processing, Branching, Multiplying, Data Transfer, and
Software Interrupts).
A final example of minimal registers is the 6502 processor, which
offers you:
Accumulator - for results of arithmetic instructions
X register - First general purpose register
Y register - Second general purpose register
PC - Program Counter
SP - Stack Pointer, offset into page one (at &01xx).
PSR - Processor Status Register - the flags.
While it might seem like utter madness to only have two general
purpose registers, the 6502 was a very popular processor in the '80s. Many
famous computers have been built around it. For the Europeans: consider the
Acorn BBC Micro, Master, Electron... For the Americans: consider the Apple II and
the Commodore PET. The ORIC uses a 6502, and the C64 uses a variant of the
6502. (In case you were wondering, the Speccy uses the other popular processor -
the ever bizarre and freaky Z80.)
So if entire systems could be created with a 6502, imagine the
flexibility of the ARM processor. It has been said that the 6502 is the bridge
between CISC design and RISC. Acorn chose the 6502 for their original
machines such as the Atom and the System# units. They went from there to
design their own processor - the ARM.
To summarize the above, the advantages of a RISC processor are:
• Quicker time-to-market. A smaller processor will have fewer instructions,
and the design will be less complicated, so it may be produced more
rapidly.
• Smaller 'die size' - the RISC processor requires fewer transistors than
comparable CISC processors. This in turn leads to a smaller silicon size
which, in turn again, leads to less heat dissipation. Most of the heat around the
ARM710 in a RiscPC is actually generated by the 80486 in the slot beside it (and
that's when it is supposed to be in 'standby').
• Related to all of the above, it is a much lower power chip. ARM design
processors in static form so that the processor clock can be stopped
completely, rather than simply slowed down. The Solo computer
(designed for use in third world countries) is a system that will run from a
12V battery, charging from a solar panel.
• Internally, a RISC processor has a number of hardwired instructions.
This was also true of the early CISC processors, but these days a typical
CISC processor has a heart which executes microcode instructions which
correlate to the instructions passed into the processor. Ironically, this heart
tends to be RISC.
• A RISC processor's simplicity does not necessarily refer to a simple
instruction set. Consider a conditional load-multiple instruction with the '^'
suffix that reloads registers and the return address from the stack. The stack
is adjusted accordingly. The '^' pushes the
processor flags into R15 as well as the return address. And it is
conditionally executed. This allows a tidy 'exit from routine' to be
performed in a single instruction. The RISC concept, however, does not
state that all the instructions are simple. If that were true, the ARM would
not have a MUL, as you can do the exact same thing by looping over ADD.
No, the RISC concept means the silicon is simple. It is a simple processor
to implement.
RISC vs ARM
You shouldn't call it "RISC vs CISC" but "ARM vs CISC". For
example, conditional execution of (almost) any instruction isn't a typical feature of
RISC processors but can only(?) be found on ARMs. Furthermore, there are quite
a few people claiming that an ARM isn't really a RISC processor as it doesn't
provide only a simple instruction set; for example, you'll hardly find any CISC
processor which provides a single instruction as powerful as
LDREQ R0,[R1,R2,LSR #16]!
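To show how much that one instruction does, the Python sketch below spells out its steps: test the condition, shift R2, add it to R1, load from the resulting address, and write the address back to R1 (the '!' suffix). Memory is modelled as a dictionary keyed by address, which is an assumption for illustration, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ldreq_preindexed(regs, memory, flags):
    """Model LDREQ R0,[R1,R2,LSR #16]! on a register dictionary."""
    if not flags["Z"]:
        return regs  # EQ condition failed: the instruction does nothing
    offset = (regs["R2"] & MASK32) >> 16           # R2, LSR #16 (barrel shifter)
    address = (regs["R1"] + offset) & MASK32       # pre-indexed address
    regs["R0"] = memory[address]                   # the load itself
    regs["R1"] = address                           # '!' writes the address back
    return regs
```

A conditional test, a shift, an add, a load and a register update are all folded into a single instruction.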
Today it is wrong to claim that CISC processors execute complex
instructions more slowly; modern processors can execute most complex
instructions in one cycle. They may need very long pipelines to do so (up to 25
stages or so with a Pentium III), but nonetheless they can. And complex
instructions provide a big potential for optimization: if you have an instruction
which took 10 cycles with the old model and get the new model to execute it in 5
cycles, you end up with a speed increase of 100% (without a higher clock
frequency). On the other hand, ARM processors executed most instructions in a
single cycle right from the start and thus don't have this optimization potential
(except for the MUL instruction).
The argument that RISC processors provide more registers than
CISC processors isn't right. Just take a look at the (good old) 68000, it has about
the same number of registers as the ARM has. And that 80x86 compatible
processors don't provide more registers is just a matter of compatibility. But this
argument isn't completely wrong: RISC processors are much simpler than CISC
processors and thus take up much less space, thus leaving space for additional
functionality like more registers. On the other hand, a RISC processor with only
three or so registers would be a pain to program; RISC processors simply
need more registers than CISC processors for the same job.
And the argument that RISC processors have pipelining whereas
CISCs don't is plainly wrong. E.g. the ARM2 hadn't whereas the Pentium has...
Today, the advantages of RISC over CISC are these:
• RISC processors are much simpler to build, which in turn results in the
following advantages:
o easier to build, i.e. you can use already existing production facilities
o much less expensive, just compare the price of an XScale with that
of a Pentium III at 1 GHz...
o less power consumption, which again gives two advantages:
    o much longer use of battery driven devices
    o no need for cooling of the device, which again gives two
    advantages:
        o smaller design of the whole device
        o no noise
• RISC processors are much simpler to program, which doesn't only help the
assembler programmer, but the compiler designer, too. You'll hardly find
any compiler which uses all the functions of a Pentium III optimally.
And then there are the benefits of the ARM processors:
• Conditional execution of most instructions, which is a very powerful thing
especially with long pipelines, as you have to refill the whole pipeline every
time a branch is taken; that's why CISC processors make a huge effort for
branch prediction.
• The shifting of registers while other instructions are executed, which means
that shifts take up no time at all (the 68000 took one cycle per bit to shift).
• The conditional setting of flags, i.e. ADD and ADDS, which becomes
extremely powerful together with the conditional execution of instructions.
KEY APPLICATIONS
ARM and Bluetooth Wireless Technology
The Bluetooth specification is controlled and issued by the SIG
(Special Interest Group), which has approximately 2500 members at time of
writing, including ARM which is an Associate Member.
ARM Architecture in Bluetooth Applications
ARM has a leading position as the 'CPU of choice' for Bluetooth applications, as
shown by the IP vendors and silicon vendors that target the ARM architecture
below:
ARM aims to:
• Encourage and assist all Bluetooth IP vendors to target ARM
• Enable all Bluetooth SoC designers to design in ARM technology
• Bring leading Bluetooth IP to the ARM partnership
ARM's Bluetooth activity provides a focal point for third parties wanting to work
with ARM or for partners and OEMs wishing to access Bluetooth IP.
3D Graphics Acceleration
The anticipated growth of 3D graphics in a wide variety of consumer
products from mobile phones to set top boxes has resulted in a market
requirement for a complete 3D graphics rendering sub-system suitable for
integration in embedded ARM core-based SoC devices.
The launch of 3G mobile networks and the growth of Java™ enabled mobile
devices with large colour displays are together expected to lead to a dramatic
growth in wireless gaming over the next few years. Industry analysts predict that
the number of wireless gamers around the world will grow to between 53
million and 360 million by 2006.
The ARM range of 3D hardware acceleration solutions has been designed to
meet this market requirement and support rich multimedia applications on a wide
variety of portable and consumer products. The family currently features two
products: the ARM MBX R-S™ and MBX HR-S™ for integration with all ARM
processor families.
The ARM 3D graphics acceleration technology is based around the PowerVR®
MBX graphics processor from Imagination Technologies, a low-power and
efficient implementation of the PowerVR Series 3 architecture. Combined with
ARM’s industry leading embedded RISC processor cores, MBX enables complex
3D, 2D and video graphics to be accessed on mobile and consumer platforms.
VoIP
VoIP (Voice over Internet Protocol) is the ability to packetize voice and
send it through the internet infrastructure. With significant cost and feature
benefits over traditional telephony, VoIP is gaining momentum in both the
residential and enterprise markets. IDC estimates that at the end of 2004 the US
had over 1M residential VoIP subscribers. This number is expected to reach 6 or 7M
by 2006.
Support and processing for VoIP falls into a wide spectrum of product types,
from line cards in infrastructure devices to desktop phones. ARM’s full range of
processor cores is ideal for meeting the wide performance requirements of this
market. For higher-end VoIP products, including voice infrastructure gateways,
there are a number of chipsets combining an ARM core with digital signal
processing engines or a DSP processor to enable multiple channels of voice. For
low cost phones and terminal adapters, the combination of ARM cores with
built-in DSP extensions and partner software solutions utilizing this digital signal
processing capability enables low cost, low power VoIP implementations.
Hard Disk Drives
Hard disk drives may well be the ultimate real-time control system. Managing the
combination of high rotation speeds and extreme actuator precision, while dealing
with turbulence caused by fast disk speeds and external effects such as shock,
demands high-performance, computationally intensive embedded processing.
The market requires power efficiency, small die size and debug capability. ARM is
now widely accepted as the architecture of choice for this demanding market.
Through years of working with HDD partners, ARM has perfected its cores and
developed leading-edge real-time debug solutions to meet the needs of HDD
designers. ARM cores can be found in over 30% of all shipments, with
ARM-based designs shipping, or in development, at all major OEMs.
Printers
ARM is proving to be an excellent choice for printer applications. In the
laser market, reduced ASIC integration risk is balanced against high
performance requirements. In the ink market, lower costs are achieved while
still boosting image quality and throughput.
Conclusion
The ARM architecture has a simple, powerful, yet compact instruction set
that is easy to compile to. Furthermore, most ARM implementations use (almost)
fully associative caches and 3 to 5 stage pipelines, and have a narrow and
relatively slow external bus (without L2 caches). They all support powering
down the parts that are not doing any work. Finally, the newest ARMs use the
latest process technology to decrease supply voltage rather than to crank up
clock speed.
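The preference for lowering supply voltage over raising clock speed can be
motivated with the standard first-order model of dynamic power in CMOS logic.
The symbols and the two example voltages below are illustrative assumptions and
do not come from any specific ARM part:

```latex
% First-order dynamic power of CMOS logic (alpha = activity factor,
% C = switched capacitance, V_dd = supply voltage, f = clock frequency)
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f

% Illustrative example: same design and clock, supply lowered from 3.3 V to 1.8 V
\frac{P_{\mathrm{dyn}}(1.8\,\mathrm{V})}{P_{\mathrm{dyn}}(3.3\,\mathrm{V})}
  = \left(\frac{1.8}{3.3}\right)^{2} \approx 0.30
```

Under this model power falls with the square of the supply voltage but only
linearly with frequency, so spending a process shrink on a lower voltage saves
more power than it would cost to spend it on a higher clock.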
Power consumption is low because the ARM has approximately 1/25th of
the number of gates of a Pentium. Performance is high because it is designed
better than the Pentium. With a RISC design you can make certain simplifications
that speed things up; for example, you can build the instruction decoder from
hardwired logic.
As far as RISC goes, the ARM has some wrinkles of its own that add to
its performance. The ability to place a condition code on any instruction, and
to choose whether an instruction does or does not update the processor flags,
means that you can often avoid branches that result in pipeline stalls or other
slowdowns (processors without this ability need a lot of power-consuming extra
logic to compensate for branch stalls). The barrel shifter allows much more
flexibility than ALU shifting and lets a single ARM instruction do a lot more
than might first appear. Overall, the ARM is a better design than the Pentium.
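Both features can be illustrated with a small C sketch. The function names
(gcd, word_address, times_five) are hypothetical and chosen only for this
example, and the ARM assembly in the comments shows one possible hand-written
encoding; a real compiler may generate different code.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor by repeated subtraction.
 * Assumption: both inputs are strictly positive.
 * With conditional execution the loop body needs no branch for if/else:
 *
 *   loop:  CMP    r0, r1          ; set flags from a - b
 *          SUBGT  r0, r0, r1      ; if a > b:  a = a - b
 *          SUBLT  r1, r1, r0      ; if a < b:  b = b - a
 *          BNE    loop            ; repeat while a != b
 *
 * The SUBs have no S suffix, so they leave the flags from CMP untouched.
 */
int32_t gcd(int32_t a, int32_t b)
{
    assert(a > 0 && b > 0);
    while (a != b) {
        if (a > b)
            a -= b;
        else
            b -= a;
    }
    return a;
}

/* Address of the index-th 32-bit word in an array starting at base.
 * The shift is folded into the add by the barrel shifter:
 *
 *   ADD  r0, r0, r1, LSL #2       ; r0 = base + (index << 2)
 */
uint32_t word_address(uint32_t base, uint32_t index)
{
    return base + (index << 2);
}

/* Multiply by 5 without a multiply instruction:
 *
 *   ADD  r0, r0, r0, LSL #2       ; r0 = x + (x << 2)
 */
uint32_t times_five(uint32_t x)
{
    return x + (x << 2);
}
```

For example, gcd(12, 18) returns 6, word_address(0x1000, 3) returns 0x100C, and
times_five(7) returns 35. In each case the work inside the function body maps
onto one or two data-processing instructions with no extra shift instruction
and no branch other than the loop itself.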