
ARM Processor September 2005

Introduction

The ARM processor core originated at a British computer company called Acorn. In the mid-1980s Acorn was looking for a replacement for the 6502 processor used in its BBC computer range, which was widely used in UK schools. None of the 16-bit architectures becoming available at that time met its requirements, so the company designed its own 32-bit processor.

Other companies became interested in this processor, including Apple, which was looking for a processor for its PDA project (which became the Newton). After much discussion this led to Acorn’s processor design team splitting off from Acorn at the end of 1990 to become Advanced RISC Machines Ltd, now just ARM Ltd.

Thus ARM Ltd now designs the ARM family of RISC processor

cores, together with a range of other supporting technologies. One important

point about ARM is that it does not fabricate silicon itself, but instead just

produces the design.

The ARM processor is a powerful, low-cost, efficient, low-power (consumption, that is) RISC processor. It was originally designed for the Archimedes desktop computer but, somewhat ironically, several aspects of its design make it unsuitable for use in a desktop machine (for example, the MMU and cache are the wrong way around). However, many aspects of its design make it an exceptional choice for embedded applications. The ARM architecture enjoys the widest choice of embedded operating systems (OS) for system development. OS choice is critical in producing a system design that meets the needs of the developer's chosen market. ARM enables this choice by partnering with many leading suppliers of embedded OS and development environments. ARM offers a broad range of processor cores to address a wide

Dept. of Computer Science Model Engineering College 1


variety of applications while delivering optimum performance, power consumption

and system cost. These cores are designed to meet the needs of three system

categories:

Embedded real-time systems - storage, automotive body and power-train, industrial and networking applications

Application platforms - devices running open operating systems including Linux, Palm OS, Symbian OS and Windows CE in wireless, consumer entertainment and digital imaging applications

Secure applications - smart cards, SIM cards and payment terminals

ARM CPU cores cover a wide range of performance and features

enabling system designers to create solutions that meet their precise

requirements. ARM offers both synthesizable and hard macro products, together

with a range of coprocessors and debug facilities.

“ATAP” stands for ARM Technology Access Program. The program creates a network of independent design service companies and equips them to deliver ARM-powered designs. Members get access to ARM technology, expertise and support, and are sometimes referred to as “Approved Design Centers”.


Why ARM

The main features that make the ARM processor outstanding are:

Built-in architecture extensions - more efficient processing of algorithms to save CPU overhead, memory and power. The technologies it uses are:

Thumb®-2 - greatly improved code density

DSP - signal processing directly in the RISC core

Jazelle® - Java acceleration

TrustZone™ - hardware/software environment for maximum security

Core performance - a wide range of functionality and power, with parts running from 1 MHz to 1 GHz and architectural performance enhancements for media and Java.

Tools of choice - ARM has the widest range of hardware and software tools support of any 32-bit architecture.

Extensive ecosystem of networking ASICs and standard products/ASSPs - more than 125 standard networking devices for quick time-to-market design cycles.

Wide support - ARM is the best supported microprocessor architecture available. A wide range of OS, middleware and tools support, together with an extensive choice of multimedia codec solutions optimized for ARM processors, is available from the ARM Connected Community.

Physical IP - leading edge for high performance systems.


Design notes

The ARM instruction set follows the 6502 in concept, but includes a number of features designed to allow the CPU to pipeline instructions better for execution. In keeping with traditional RISC concepts, this included tuning the instructions to execute in well-defined times, typically one cycle. A more interesting addition to the ARM design is the use of a 4-bit condition code on the front of every instruction, meaning that every instruction can be made conditional.

This cuts down significantly on the space available for, for example, displacements in memory access instructions, but on the other hand it makes it possible to avoid branch instructions when generating code for small if statements. The standard example of this is Euclid’s GCD algorithm:

(This example is in the C programming language.)

    int gcd(int i, int j)
    {
        while (i != j)
            if (i > j)
                i -= j;
            else
                j -= i;
        return i;
    }

Expressed in ARM assembly, the loop, with a little rotation, might look something like

            b       test
    loop    subgt   Ri,Ri,Rj
            suble   Rj,Rj,Ri
    test    cmp     Ri,Rj
            bne     loop

which avoids the branches around the then and else clause that one would typically have to emit.
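To make the effect of the conditional instructions concrete, the rotated loop can be mimicked in C by evaluating one comparison and letting both subtractions depend on its result. This is only an illustrative sketch: the function name `gcd_predicated` is hypothetical, and a real compiler decides for itself whether to emit conditional instructions.

```c
#include <assert.h>

/* Hypothetical helper (not from the original text): mirrors the
   SUBGT/SUBLE pair, where one compare sets the flags and each
   subtraction only takes effect if its condition holds. */
int gcd_predicated(int i, int j)
{
    assert(i > 0 && j > 0);          /* subtraction-based GCD needs positive inputs */
    while (i != j) {                 /* test: cmp Ri,Rj ; bne loop */
        int gt = (i > j);            /* the "flags" left by the single compare */
        int new_i = gt ? i - j : i;  /* subgt Ri,Ri,Rj */
        int new_j = gt ? j : j - i;  /* suble Rj,Rj,Ri */
        i = new_i;
        j = new_j;
    }
    return i;
}
```

Calling `gcd_predicated(48, 18)` walks through (30, 18), (12, 18), (12, 6), (6, 6) and returns 6, the same sequence the assembly loop would produce.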

Another unique feature of the instruction set is the ability to fold shifts

and rotates into the "data processing" (arithmetic, logical, and register-register

move) instructions, so that, for example, the C statement "a += (j << 2);" could be

rendered as a single instruction on the ARM, register allocation permitting.
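As a small illustration of shift folding, the statement from the text is wrapped in a function below. The function name `add_shifted` and the use of `uint32_t` are assumptions made for the example; the instruction in the comment uses placeholder register names and shows one plausible encoding, not guaranteed compiler output.

```c
#include <stdint.h>

/* Hypothetical helper: the C statement "a += (j << 2);" as a function.
   Unsigned 32-bit values are assumed so the shift is well defined. */
uint32_t add_shifted(uint32_t a, uint32_t j)
{
    /* On ARM the shift and the add can be folded into one instruction:
           ADD  Ra, Ra, Rj, LSL #2
       (Ra and Rj stand for whatever registers hold a and j). */
    a += (j << 2);
    return a;
}
```

For example, `add_shifted(10, 3)` yields 22, since 3 shifted left by two places is 12.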

This results in the typical ARM program being denser than what would

normally be expected of a RISC processor. This implies that there is less need

for load/store operations and that the pipeline is being used more efficiently.

Even though the ARM runs at what many would consider to be low speeds, it

nevertheless competes quite well with much more complex CPU designs.

The ARM processor also has some features rarely seen on other

architectures that are considered RISC, such as PC-relative addressing (indeed,

on the ARM the PC is one of its 16 registers) and pre- and post-increment

addressing modes.

Another item of note is that the ARM has been around for a while, with

the instruction set increasing somewhat over time. Some early ARM processors

(prior to ARM7TDMI), for example, have no instruction to load a two-byte

quantity, so that, strictly speaking, for them it's not possible to generate code that

would behave the way one would expect for C objects of type "volatile short".

Many things determine the power consumption of a processor. The most influential at transistor level are the supply voltage, the clock speed, the number of switching transistors and, to a lesser extent, the transistor leakage. Lowering the supply voltage reduces the power requirements dramatically. The maximum working frequency drops as well, further lowering power. By only powering the parts of the chip that are actually doing some work, you save even more. If you have a simple implementation with shallow pipelines that uses few transistors, you are in an even better position.
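The dependence on voltage and clock speed can be made concrete with the usual first-order model for CMOS dynamic power, P = activity x C x V^2 x f. That formula is a standard textbook approximation rather than something stated in this report, it ignores leakage, and the function names below are hypothetical.

```c
/* First-order CMOS dynamic power model (an assumption, not taken from
   the report): P = activity * C * V^2 * f. Leakage is ignored. */
double dynamic_power(double activity, double capacitance_farads,
                     double volts, double hertz)
{
    return activity * capacitance_farads * volts * volts * hertz;
}

/* Relative power after scaling supply voltage and clock frequency:
   the other terms cancel, leaving v_scale^2 * f_scale. */
double scaled_power_ratio(double v_scale, double f_scale)
{
    return v_scale * v_scale * f_scale;
}
```

Under this model, halving both the supply voltage and the clock frequency gives `scaled_power_ratio(0.5, 0.5)`, one eighth of the original dynamic power, which is why lowering the voltage pays off so strongly.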

The low power consumption is because it has approximately 1/25th of the number of gates of a Pentium. The high performance comes from a cleaner design: it doesn't have all the excess baggage the Pentium carries around with it to remain backwards-compatible with the 486, 386, 286, 186 and 8086. The 8086 was a CISC design, as are all its successors, whilst the ARM is a RISC design. RISC design is about implementing those instructions that are used frequently; anything else can be synthesized from them. CISC design is about throwing in a kitchen-sink instruction and anything else you can think of, just in case somebody might want to use it. With RISC design you can make certain simplifications that speed things up: you can implement the instruction decode using hardwired gates, whereas CISC is complicated enough that it typically uses microcode, which is slower. As far as RISC goes, the ARM has some wrinkles of its own that add to its performance. The ability to place a condition on any instruction, and to determine whether or not an instruction affects the processor flags, means that you can often avoid branches, which result in instruction stalls or other slowdowns (processors that don't have this ability have to add loads of power-consuming extra logic to try to compensate for branch stalls).


Programmers Model

Data size and instruction sets

The ARM is a 32-bit architecture. A common cause of confusion is the term “word”, which means 16 bits to people with a 16-bit background. In the ARM world, 16 bits is a “half word”, as the architecture is a 32-bit one, whereas “word” means 32 bits.

Jazelle cores can also execute Java byte code. Java byte codes are 8-bit instructions designed to be architecture independent. Jazelle transparently executes most byte codes in hardware and some in highly optimized ARM code. This is due to a tradeoff between hardware complexity (power consumption and silicon area) and speed.

Most ARMs implement two instruction sets: the 32-bit ARM instruction set and the 16-bit Thumb instruction set.

Processor Modes

The ARM has seven basic operating modes:

User: unprivileged mode under which most tasks run.

FIQ: entered when a high priority (fast) interrupt is raised.

IRQ: entered when a low priority (normal) interrupt is raised.

Supervisor: entered on reset and when a Software Interrupt instruction is executed.

Abort: used to handle memory access violations.

Undef: used to handle undefined instructions.

System: privileged mode using the same registers as user mode.



The Programmers Model can be split into two elements - first of all,

the processor modes and secondly, the processor registers. So let’s start by

looking at the modes.

Typical application code runs in an unprivileged mode known as “User” mode, whereas the various exception types are dealt with in one of the privileged modes: Fast Interrupt, Supervisor, Abort, Normal Interrupt and Undefined.

What is the difference between the privileged and unprivileged modes? In reality, very little. The ARM core has an output signal (nTRANS on ARM7TDMI, InTRANS and DnTRANS on ARM9, or encoded as part of HPROT or BPROT in AMBA) which indicates whether the current mode is privileged or unprivileged, and this can be used, for instance, by a memory controller to allow IO access only in a privileged mode. In addition, some operations are only permitted in a privileged mode, such as directly changing the mode and enabling interrupts. All current ARM cores implement System mode (added in architecture v4). This is simply a privileged version of User mode. It is important for re-entrant exceptions because no exception can cause System mode to be entered.
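The seven modes and the privileged/unprivileged split can be sketched as a small C enumeration. The numeric values are the mode-field encodings (CPSR bits [4:0]) defined by the ARM architecture; they are not given in this report, and the type and function names are hypothetical.

```c
#include <stdbool.h>

/* Mode field encodings, CPSR bits [4:0]. The values come from the ARM
   architecture, not from this report; the names are illustrative. */
enum arm_mode {
    MODE_USR = 0x10,  /* User: unprivileged */
    MODE_FIQ = 0x11,  /* Fast interrupt */
    MODE_IRQ = 0x12,  /* Normal interrupt */
    MODE_SVC = 0x13,  /* Supervisor: reset and SWI */
    MODE_ABT = 0x17,  /* Abort: memory access violations */
    MODE_UND = 0x1B,  /* Undefined instruction */
    MODE_SYS = 0x1F   /* System: privileged, uses the User registers */
};

/* Every mode except User is privileged. */
bool mode_is_privileged(enum arm_mode m)
{
    return m != MODE_USR;
}

/* System mode is never entered by an exception, so like User mode it
   has no saved program status register of its own. */
bool mode_has_spsr(enum arm_mode m)
{
    return m != MODE_USR && m != MODE_SYS;
}
```

A memory controller model could, for instance, gate IO accesses on `mode_is_privileged`, mirroring what the nTRANS-style signal conveys in hardware.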

The ARM Register Set

The ARM architecture provides a total of 37 registers, all of which are 32 bits long: 1 dedicated program counter, 1 dedicated current program status register, 5 dedicated saved program status registers, and 30 general-purpose registers. However, these are arranged into several banks, with the accessible bank being governed by the current processor mode. In summary, in each mode the core can access:

a particular set of 13 general-purpose registers (r0 - r12);

a particular r13, which is typically used as a stack pointer. Each exception mode has its own r13 (User and System modes share one), allowing each exception type to have its own stack;

a particular r14, which is used as a link (or return address) register. Again, each exception mode has its own r14;

r15, whose only use is as the program counter;

the CPSR (Current Program Status Register), which stores additional information about the state of the processor;

and finally, in privileged modes (except System), a particular SPSR (Saved Program Status Register). This stores a copy of the previous CPSR value when an exception occurs. Combined with the link register, this allows exceptions to return without corrupting processor state.
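The total of 37 can be checked by counting the banked copies of each register. The sketch below assumes the classic banking scheme of the ARM architecture, which this report does not spell out: r0 to r7 are shared by all modes, FIQ mode has private copies of r8 to r12, and each of the five exception modes has private copies of r13 and r14. The function name is hypothetical.

```c
/* Counts physical registers under the assumed classic banking scheme. */
int arm_register_count(void)
{
    int shared_low = 8;      /* r0-r7: one copy, visible in every mode */
    int r8_to_r12  = 5 * 2;  /* r8-r12: one shared copy plus one FIQ-only copy */
    int r13_r14    = 2 * 6;  /* r13, r14: User/System share one pair; FIQ, IRQ,
                                Supervisor, Abort and Undef each have their own */
    int general    = shared_low + r8_to_r12 + r13_r14;  /* 30 general purpose */
    int pc         = 1;      /* r15 */
    int cpsr       = 1;
    int spsr       = 5;      /* one per exception mode */
    return general + pc + cpsr + spsr;
}
```

The intermediate sum gives the 30 general-purpose registers quoted in the text, and the function returns 37 overall.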

Program Status Registers

Condition code flags

N = Negative result from ALU

Z = Zero result from ALU

C = ALU operation Carried out

V = ALU operation oVerflowed

Sticky overflow flag - the Q flag indicates whether saturation has occurred. It is present only in architecture 5TE/J.

J bit - present in architecture 5TEJ only; J = 1 indicates that the processor is in Jazelle state.

Interrupt disable bits

I = 1: disables the IRQ.

F = 1: disables the FIQ.

Figure: Program status register layout. Bit 31 = N, bit 30 = Z, bit 29 = C, bit 28 = V, bit 27 = Q, bit 24 = J, bit 7 = I, bit 6 = F, bit 5 = T, bits 4-0 = mode; the remaining bits are undefined. The four byte-wide fields are named f (bits 31-24), s (bits 23-16), x (bits 15-8) and c (bits 7-0).

T bit - present in architecture xT only; T = 0: processor in ARM state, T = 1: processor in Thumb state.

Mode bits - specify the processor mode.
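The status register fields described above can be picked apart with simple masks. This is a minimal sketch using the standard bit positions (N, Z, C, V in bits 31 to 28, Q in bit 27, J in bit 24, I, F, T in bits 7 to 5, mode in bits 4 to 0); the macro, struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_N     (UINT32_C(1) << 31)
#define PSR_Z     (UINT32_C(1) << 30)
#define PSR_C     (UINT32_C(1) << 29)
#define PSR_V     (UINT32_C(1) << 28)
#define PSR_Q     (UINT32_C(1) << 27)
#define PSR_J     (UINT32_C(1) << 24)
#define PSR_I     (UINT32_C(1) << 7)
#define PSR_F     (UINT32_C(1) << 6)
#define PSR_T     (UINT32_C(1) << 5)
#define PSR_MODE  UINT32_C(0x1F)

/* Decoded view of a program status register value. */
struct psr_fields {
    bool n, z, c, v, q, j;
    bool irq_disabled, fiq_disabled, thumb;
    unsigned mode;
};

struct psr_fields psr_decode(uint32_t psr)
{
    struct psr_fields f;
    f.n = (psr & PSR_N) != 0;
    f.z = (psr & PSR_Z) != 0;
    f.c = (psr & PSR_C) != 0;
    f.v = (psr & PSR_V) != 0;
    f.q = (psr & PSR_Q) != 0;
    f.j = (psr & PSR_J) != 0;
    f.irq_disabled = (psr & PSR_I) != 0;
    f.fiq_disabled = (psr & PSR_F) != 0;
    f.thumb = (psr & PSR_T) != 0;
    f.mode = (unsigned)(psr & PSR_MODE);
    return f;
}
```

For example, decoding the value 0x600000D3 reports Z and C set, both interrupts disabled, ARM state, and a mode field of 0x13.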

Program Counter

ARM runs in three different states, which are ARM state, Thumb state and

Jazelle state.

ARM is designed to access memory efficiently using a single memory access cycle. Word accesses must therefore be on a word address boundary, and half-word accesses must be on a half-word address boundary. This includes instruction fetches. Strictly, the bottom bits of the PC simply do not exist within the ARM core, hence they are ‘undefined’. The memory system must ignore these bits for instruction fetches.

When the processor is executing in ARM state, all instructions are 32 bits wide and must be word aligned. Therefore the PC value is stored in bits [31:2], with bits [1:0] undefined (as an instruction cannot be half-word or byte aligned).

When the processor is executing in Thumb state, all instructions are 16 bits wide and must be half-word aligned. Therefore the PC value is stored in bits [31:1], with bit [0] undefined (as an instruction cannot be byte aligned).

When the processor is executing in Jazelle state, all instructions are 8 bits wide. The processor doesn’t perform 8-bit fetches from memory; instead it performs aligned 32-bit fetches (4-byte prefetching), reading 4 instructions at once, which is more efficient. The PC is not discussed for Jazelle state because the ‘Jazelle PC’ is actually stored in r14; this is a technical detail that is completely hidden by the Jazelle support code.
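The alignment rules for ARM and Thumb state amount to ignoring the low bits of the PC. A minimal sketch of how a memory system model might do that follows; the enum and function names are hypothetical, and Jazelle state is left out because its program counter is handled by the support code.

```c
#include <stdint.h>

enum arm_state { STATE_ARM, STATE_THUMB };

/* Returns the address actually used for an instruction fetch:
   bits [1:0] are ignored in ARM state, bit [0] in Thumb state. */
uint32_t fetch_address(uint32_t pc, enum arm_state state)
{
    if (state == STATE_ARM)
        return pc & ~UINT32_C(3);   /* word aligned */
    return pc & ~UINT32_C(1);       /* half-word aligned */
}
```

With this model, a PC value of 0x8003 fetches from 0x8000 in ARM state and from 0x8002 in Thumb state.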


Exception Handling

When an exception occurs, the ARM:

Copies CPSR into SPSR_<mode>

Sets appropriate CPSR bits:

    Change to ARM state

    Change to exception mode

    Disable interrupts (if appropriate)

Stores the return address in LR_<mode>

Sets PC to vector address

To return, the exception handler needs to:

Restore CPSR from SPSR_<mode>

Restore PC from LR_<mode>

This can only be done in ARM state.

Exception handling on the ARM is controlled through the use of an area of

memory called the vector table. This lives (normally) at the bottom of the memory

map from 0x0 to 0x1c. Within this table one word is allocated to each of the

various exception types.

This word will contain some form of ARM instruction that should perform a

branch. It does not contain an address.

Reset - executed on power on.

Undef - when an invalid instruction reaches the execute stage of the pipeline.

SWI - when a software interrupt instruction is executed.

Prefetch - when an instruction is fetched from memory that is invalid for some reason; if it reaches the execute stage then this exception is taken.

Data - if a load/store instruction tries to access an invalid memory location, then this exception is taken.

IRQ - normal interrupt.

FIQ - fast interrupt.

Vector Table

    0x1C    FIQ
    0x18    IRQ
    0x14    (Reserved)
    0x10    Data Abort
    0x0C    Prefetch Abort
    0x08    Software Interrupt
    0x04    Undefined Instruction
    0x00    Reset

When one of these exceptions is taken, the ARM goes through a low-

overhead sequence of actions in order to invoke the appropriate exception

handler. The current instruction is always allowed to complete (except in case of

Reset).

IRQ is disabled on entry to all exceptions; FIQ is also disabled on entry to Reset

and FIQ.
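The entry sequence can be sketched as a small C model. This is a simplification under stated assumptions: the struct holds only the banked SPSR and LR of the mode being entered, the return address is passed in by the caller because the exact offset depends on the exception type, and the mode encodings are the architectural values rather than anything given in this report. All names are hypothetical.

```c
#include <stdint.h>

#define PSR_MODE_MASK UINT32_C(0x1F)
#define PSR_T_BIT     (UINT32_C(1) << 5)
#define PSR_F_BIT     (UINT32_C(1) << 6)
#define PSR_I_BIT     (UINT32_C(1) << 7)

enum exception { EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PREFETCH,
                 EXC_DATA, EXC_IRQ, EXC_FIQ };

/* Vector address and target mode for each exception, indexed by enum. */
static const uint32_t vector_of[] = { 0x00, 0x04, 0x08, 0x0C, 0x10, 0x18, 0x1C };
static const uint32_t mode_of[]   = { 0x13, 0x1B, 0x13, 0x17, 0x17, 0x12, 0x11 };

/* Simplified core state: spsr and lr stand for SPSR_<mode> and
   LR_<mode> of the exception mode being entered. */
struct core {
    uint32_t cpsr, pc;
    uint32_t spsr, lr;
};

void take_exception(struct core *c, enum exception e, uint32_t return_address)
{
    c->spsr = c->cpsr;                               /* copy CPSR into SPSR_<mode> */
    c->cpsr &= ~(PSR_MODE_MASK | PSR_T_BIT);         /* clear mode, change to ARM state */
    c->cpsr |= mode_of[e];                           /* change to exception mode */
    c->cpsr |= PSR_I_BIT;                            /* IRQ disabled on every exception */
    if (e == EXC_RESET || e == EXC_FIQ)
        c->cpsr |= PSR_F_BIT;                        /* FIQ disabled on Reset and FIQ */
    c->lr = return_address;                          /* store return address in LR_<mode> */
    c->pc = vector_of[e];                            /* set PC to vector address */
}
```

Taking an IRQ from User mode in Thumb state (CPSR 0x30), for example, leaves SPSR holding 0x30, CPSR holding 0x92 (IRQ mode, ARM state, IRQ disabled) and the PC at 0x18.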


System Design

Here is a very generic ARM based design that is actually fairly

representative of the designs that we see being done.

Figure – Example of an ARM based System

On-chip there will be an ARM core (obviously) together with a number of system-dependent peripherals. Also required will be some form of interrupt controller, which receives interrupts from the peripherals and raises the IRQ or FIQ input to the ARM as appropriate. This interrupt controller may also provide hardware assistance for prioritizing interrupts.

As far as memory is concerned there is likely to be some (cheap)

narrow off-chip ROM (or flash) used to boot the system from. There is also likely

to be some 16-bit wide RAM used to store most of the runtime data and perhaps

some code copied out of the flash. Then on-chip there may well be some 32-bit

memory used to store the interrupt handlers and perhaps stacks.

Figure components: ARM core, peripherals, interrupt controller (driving nIRQ and nFIQ), 32 bit RAM, 16 bit RAM, 8 bit ROM.

Processor Types

  ARM 1 (v1)

This was the very first ARM processor. Actually, when it was first manufactured in April 1985, it was the very first commercial RISC processor. Ever. As a testament to the design team, it was "working silicon" in its first incarnation, it exceeded its design goals, and it used fewer than 25,000 transistors.

The ARM 1 was used in a few evaluation systems on the BBC micro (Brazil - BBC interfaced ARM), and a PC machine (Springboard - PC interfaced ARM). It is believed that a large proportion of Arthur was developed on the Brazil hardware. In essence, it is very similar to an ARM 2, the differences being that R8 and R9 are not banked in IRQ mode, there is no multiply instruction, no LDR/STR with register-specified shifts, and no co-processor support.

ARM evaluation system for BBC Master


  ARM 2 (v2)

Experience with the ARM 1 suggested improvements that could

be made. Such additions as the MUL and MLA instructions allowed for real-time

digital signal processing. Back then, it was to aid in generating sounds. Who

could have predicted exactly how suitable to DSP the ARM would be, some

fifteen years later?

In 1985, Acorn hit hard times, which led to it being taken over by Olivetti. It took two years from the arrival of the ARM to the launch of a computer based upon it. When the first ARM-based machines rolled out, Acorn could gladly announce to the world that it offered the fastest RISC processor around. Indeed, the ARM processor topped the computing league tables, and for a long time was right up there in the 'fastest processors' listings. But Acorn faced numerous challenges. The computer market was in disarray, with some people backing IBM's PC, some the Amiga, and others all sorts of smaller platforms. Into this market Acorn launched a machine running Arthur (which was about as nice as the first release of Windows), with no user base, precious little software, and not much third party support. But it succeeded. The ARM 2 processor was the first to be used within the RISC OS platform, in the A305, A310, and A4x0 range, and it was used on all of the early machines, including the A3000. The ARM 2 is clocked at 8MHz, which translates to approximately four and a half million instructions per second (0.56 MIPS/MHz).


ARM 3 (v2as)

Launched in 1989, this processor built on the ARM 2 by offering 4K of cache memory and the SWP instruction. The desktop computers based upon it were launched in 1990. Internally, via the dedicated co-processor interface, CP15 was 'created' to provide processor control and identification.

Several speeds of ARM 3 were produced. The A540 runs a

26MHz version, and the A4 laptop runs a 24MHz version. By far the most

common is the 25MHz version used in the A5000, though those with the 'alpha

variant' have a 33MHz version.

At 25MHz, with 12MHz memory (a la A5000), you can expect around 14 MIPS (0.56 MIPS/MHz). It is interesting to notice that the ARM3 doesn't 'perform' faster per clock: both the ARM2 and the ARM3 average 0.56 MIPS/MHz. The speed boost comes from the higher clock speed, and the cache.
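The MIPS figures quoted for these processors follow from multiplying the clock speed by the per-clock rating. A tiny sketch of that arithmetic, with a hypothetical function name, might look like this:

```c
/* Estimated throughput in MIPS from clock speed and per-clock rating. */
double estimated_mips(double clock_mhz, double mips_per_mhz)
{
    return clock_mhz * mips_per_mhz;
}
```

Using the figures from the text, 8MHz at 0.56 MIPS/MHz gives about 4.5 MIPS for the ARM 2, and 25MHz at the same rating gives about 14 MIPS for the ARM 3, so the gain between the two comes entirely from the clock rate.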

And just to correct a common misunderstanding, the A4 is not a squashed down

version of the A5000. The A4 actually came first, and some of the design choices

were reflected in the later A5000 design.

ARM3 with FPU

 

ARM 250 (v2as)

The 'Electron' of ARM processors, this is basically a second level

revision of the ARM 3 design which removes the cache, and combines the

primary chipset (VIDC, IOC, and MEMC) into the one piece of silicon, making the

creation of a cheap'n'cheerful RISC OS computer a simple thing indeed. This

was clocked at 12MHz (the same as the main memory), and offers approximately

7 MIPS (0.58 MIPS/MHz).


This processor isn't as terrible as it might seem. That the A30x0 range

was built with the ARM250 was probably more a cost-cutting exercise than

intention. The ARM250 was designed for low power consumption and low cost,

both important factors in devices such as portables, PDAs, and organisers -

several of which were developed and, sadly, none of which actually made it to a

release.


  ARM 250 mezzanine

This is not actually a processor. It is included here for historical

interest. It seems the machines that would use the ARM250 were ready before

the processor, so early releases of the machine contained a 'mezzanine' board

which held the ARM 2, IOC, MEMC, and VIDC.

 

ARM 4 and ARM 5

These processors do not exist.

More and more people began to be interested in the RISC concept, as at around the same time common Intel (and clone) processors showed a definite trend towards higher power consumption and greater need for heat dissipation, neither of which is friendly to devices that are supposed to run off batteries. The ARM design was seen by several important players as being the epitome of sleek, powerful RISC design. It was at this time that a deal was struck between Acorn, VLSI (long-time manufacturers of the ARM chipset), and Apple. This led to the death of the Acorn RISC Microprocessor, as Advanced RISC Machines Ltd was born. This new company was committed to design and support specifically for the processor, without the hassle and baggage of RISC OS (the main operating system for the processor) and the desktop machines. Both of those would be left to Acorn. In the change from being a part of Acorn to being ARM Ltd in its own right, the whole numbering scheme for the processors was altered.

 

ARM 610 (v3)

This processor brought with it two important 'firsts'. The first 'first' was

full 32 bit addressing, and the second 'first' was the opening for a new generation

of ARM based hardware.

Acorn responded by making the RiscPC. In the past, critics were none too keen on the idea of slot-in cards for things like processors and memory (as used in the A540), and by this time many people were getting extremely annoyed with the inherent memory limitations of the older hardware: the MEMC can only address 4MB of memory, and you can only add more by daisy-chaining MEMCs, an idea that not only sounds hairy, it is hairy!

The RiscPC brought back the slot-in processor with a vengeance. Future 'better' processors were promised, and a second slot was provided for alien processors such as the 80486 to be plugged in. As for memory, two SIMM slots were provided, and the memory was expandable to 256MB. This does not sound like much when modern PCs come with half that as standard. However, you can get a lot of mileage from a RiscPC fitted with a puny 16MB of RAM. But, always, we come back to 32 bit addressing. It has been with us, and known about, ever since the first RiscPC rolled out, but few people noticed or cared. Now, as the new generation of ARM processors drops the 26 bit 'emulation' modes, RISC OS users are faced with the option of getting sorted, or dying. Ironically, the other mainstream operating systems for the RiscPC hardware, namely ARMLinux and netbsd/arm32, are already fully 32 bit.

Several speeds were produced: 20MHz, 30MHz, and the 33MHz part used in the RiscPC. The ARM610 processor features an on-board MMU to handle memory, a 4K cache, and it can even switch itself from little-endian operation to big-endian operation. The 33MHz version offers around 28 MIPS (0.84 MIPS/MHz).

The RiscPC ARM610 processor card

 

ARM 710 (v3)

As an enhancement of the ARM610, the ARM 710 offers an increased

cache size (8K rather than 4K), clock frequency increased to 40MHz, improved

write buffer and larger TLB in the MMU.

Additionally, it supports CMOS/TTL inputs, Fastbus, and 3.3V power

but these features are not used in the RiscPC.

Clocked at 40MHz, it offers about 36 MIPS (0.9 MIPS/MHz); the better per-clock performance, combined with the additional clock speed, makes it run an appreciable amount faster than the ARM 610.


ARM710 side by side with an 80486, the coin is a British 10 pence coin.

 

ARM 7500

The ARM7500 is a RISC based single-chip computer with memory

and I/O control on-chip to minimise external components. The ARM7500 can

drive LCD panels/VDUs if required, and it features power management. The
video controller can output at up to a 120MHz pixel rate, there is 32 bit sound,
and there are four A/D converters on-chip for connection of joysticks etc.

The processor core is basically an ARM710 with a smaller (4K) cache.
The video core is a VIDC2. The I/O core is based upon the IOMD.

The memory/clock system is very flexible, designed for maximum usefulness
with minimum fuss. Setting up a system based upon the ARM7500 should be
fairly simple.

  ARM 7500FE

A version of the ARM 7500 with hardware floating point support.

ARM7500FE, as used in the Bush Internet box.

 

StrongARM / SA110 (v4)

The StrongARM took the RiscPC from around 40MHz to 200-300MHz and showed a
speed boost that was more than the hardware should have been able to support.
Although still severely bottlenecked by the memory and I/O, the StrongARM made
the RiscPC fly. The processor was the first to feature separate instruction and
data caches, and this caused quite a lot of self-modifying code to fail
including, amusingly, Acorn's own runtime compression system. But on the whole,
the incompatibilities were no more painful than an OS upgrade (anybody remember
the RISC OS 2 to RISC OS 3 upgrade, when all the programs that used
SYS OS_UpdateMEMC, 64, 64 for a speed boost froze the machine solid?).

In instruction terms, the StrongARM can offer half-word loads and stores, and
signed half-word and byte loads and stores. Also provided are instructions for
multiplying two 32 bit values (signed or unsigned) and returning a 64 bit
result. This is documented in the ARM assembler user guide as only working in
32-bit mode; however, experimentation will show you that they work in
26-bit mode as well.

Later documentation confirms this. The cache has been split into

separate instruction and data cache (Harvard architecture), with both of these

caches being 16K, and the pipeline is now five stages instead of three.
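The long multiply described above takes two 32 bit operands and delivers a 64 bit result in a pair of registers. The sketch below models that behaviour in Python. The mnemonics UMULL and SMULL are the usual ARM names for these instructions and are not taken from the text, and the helper names are ours, so treat this as an illustration of the arithmetic and not a description of the hardware.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(value):
    """Interpret a 32 bit pattern as a two's complement signed number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def umull(rm, rs):
    """Unsigned 32x32 -> 64 multiply; returns (low word, high word)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32

def smull(rm, rs):
    """Signed 32x32 -> 64 multiply; returns (low word, high word)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, (product >> 32) & MASK32
```

The same bit patterns give different high words depending on whether they are treated as signed, which is why both forms are provided.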

In terms of performance... at 100MHz, it offers 11

A StrongARM mounted on a LART board.

In order to squeeze the maximum from a RiscPC, the Kinetic includes

fast RAM on the processor card itself, as well as a version of RISC OS that

installs itself on the card. Apparently it flies due to removing the memory

bottleneck, though this does cause 'issues' with DMA expansion cards.

A Kinetic processor card.

SA1100 variant

This is a version of the SA110 designed primarily for portable

applications. I mention it here as I am reliably informed that the SA1100 is the

processor inside the 'faster' Panasonic satellite digibox. It contains the

StrongARM core, MMU, cache, PCMCIA, general I/O controller (including two

serial ports), and a colour/greyscale LCD controller. It runs at 133MHz or

200MHz and it consumes less than half a watt of power.

  Thumb

The Thumb instruction set is a reworking of the ARM set, with a few

things omitted. Thumb instructions are 16 bits (instead of the usual 32 bit). This

allows for greater code density in places where memory is restricted. The Thumb

set can only address the first eight registers, and there are no conditional

execution instructions. Also, the Thumb cannot do a number of things required

for low-level processor exceptions, so the Thumb instruction set will always come

alongside the full ARM instruction set. Exceptions and the like can be handled in

ARM code, with Thumb used for the more regular code.

Other versions

M variants

This is an extension of the version three designs (ARM 6 and ARM 7)

that provides the extended 64 bit multiply instructions. These instructions

became a main part of the instruction set in the ARM version 4 (Strong-Arm, etc).

T variants

These processors include the Thumb instruction set (and, hence, no 26

bit mode).

E variants

These processors include a number of additional instructions which

provide improved performance in typical DSP applications. The 'E' standing for

"Enhanced DSP".

 

The future

The future is here. Newer ARM processors exist, but they are 32 bit

devices. This means, basically, that RISC OS won't run on them until all of RISC

OS is modified to be 32 bit safe. As long as BASIC is patched, a reasonable

software base will exist. However all C programs will need to be recompiled. All

relocatable modules will need to be altered. And pretty much all assembler code

will need to be repaired. In cases where source isn't available (ie, anything

written by Computer Concepts), it will be a tedious slog. It is truly one of the

situations that could make or break the platform.

ARM Technologies

Jazelle

The ARM926EJ-S core incorporates the Jazelle instructions for Java

acceleration. This is complemented by the Jazelle Support Code that manages

the interface between the core hardware, the virtual machine and the operating

system.

ARM have implemented a technology that allows certain of their

architectures to execute Java byte code natively in hardware, in another

execution mode alongside the existing ARM and Thumb modes and accessed in

a similar fashion to ARM/Thumb interworking. The first processor with Jazelle

technology was the ARM926EJ-S: Jazelle being denoted by the 'J' in the CPU

name.

ARM Jazelle® technology for acceleration of execution environments

provides our connected community with high quality, class leading solutions for

the ARM architecture that offer the optimum combination of high performance,

low power and low cost. ARM Jazelle DBX (Direct Byte code eXecution)

technology for direct byte code execution of Java™ delivers an unparalleled

combination of Java performance and the world's leading 32-bit embedded RISC

architecture - giving platform developers the freedom to run Java applications

alongside established OS, middleware and application code on a single

processor. The single-processor solution offers higher performance, lower

system cost and lower power than coprocessors and dual-processor (dedicated

Java processor and native applications processor) solutions.

ARM Jazelle RCT (Runtime Compilation Target) technology

supports efficient ahead-of-time (AOT) and just-in-time (JIT) compilation with

Java and other execution environments. By ensuring that techniques such as

AOT and JIT can be implemented with up to 3x smaller memory footprint, Jazelle

RCT technology enables high performance with a significant reduction in

application memory footprint and power consumption. Jazelle RCT is applicable

to Java and other execution environments such as Microsoft .NET Compact

Framework. ARM’s Jazelle DBX and Jazelle RCT technologies offer flexibility

and choice in a roadmap for efficient and high performance implementation of

execution environments across a wide range of ARM platforms.

Thumb-2

Thumb-2 technology made its debut in the ARM1156 core,

announced in 2003. Thumb-2 extends the limited 16-bit instruction set of Thumb

with additional 32-bit instructions to give the instruction set more breadth. As a

result the stated aim for Thumb-2 is to achieve code density that is similar to

Thumb with performance similar to the ARM instruction set on 32-bit memory.

Thumb-2 also extends both the ARM and Thumb instruction set with

yet more instructions, including bit-field manipulation, table branches, and

conditional execution.
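To give a feel for what "bit-field manipulation" means here, the following Python sketch models an unsigned bit-field extract and a bit-field insert on 32 bit values. The function names and argument order are ours; the real instructions are not named in the text, so this only illustrates the operation.

```python
MASK32 = 0xFFFFFFFF

def bitfield_extract(value, lsb, width):
    """Return the unsigned field of `width` bits starting at bit `lsb`."""
    return (value >> lsb) & ((1 << width) - 1)

def bitfield_insert(dest, src, lsb, width):
    """Replace `width` bits of `dest`, starting at bit `lsb`, with the low bits of `src`."""
    mask = ((1 << width) - 1) << lsb
    return ((dest & ~mask) | ((src << lsb) & mask)) & MASK32
```

Without such instructions each of these takes a short sequence of shifts and masks, which is exactly the sort of code that bloats a 16 bit instruction stream.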

Thumb-2EE

Thumb-2EE, marketed as Jazelle RCT, was announced in 2005,

appearing in the Cortex series of cores. Thumb-2EE provides a small extension

to Thumb-2, making the instruction set particularly suited to code generated at

runtime (e.g. by JIT compilation) in managed Execution Environments. Thumb-

2EE is a target for languages such as Java, C#, Perl and Python, and allows JIT

compilers to output smaller compiled code without impacting performance.

New features provided by Thumb-2EE include automatic null pointer checks on

every load and store instruction, an instruction to perform an array bounds check,

and the ability to branch to handlers, which are small sections of frequently called

code, commonly used to implement a feature of a high level language, such as

allocating memory for a new object.
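The checks listed above are the ones a managed runtime would otherwise have to emit as explicit instructions around every access. A rough Python sketch of the behaviour, assuming a flat `memory` dictionary and exception classes that we have invented for illustration (in hardware the failure case would branch to a handler instead):

```python
class NullPointerError(Exception):
    """Stands in for the branch to a null-pointer handler."""

class IndexOutOfBounds(Exception):
    """Stands in for the branch to an array-bounds handler."""

def checked_load(memory, base, offset=0):
    """Load from base+offset, refusing a null (zero) base address first."""
    if base == 0:
        raise NullPointerError("load through null pointer")
    return memory[base + offset]

def check_array(length, index):
    """Array bounds check: the index must lie in 0..length-1."""
    if not 0 <= index < length:
        raise IndexOutOfBounds(f"index {index} outside 0..{length - 1}")
```

Because the checks are implicit in Thumb-2EE, a JIT compiler can leave them out of its generated code, which is where the smaller compiled output comes from.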

NEON

NEON technology is a combined 64 and 128 bit SIMD (Single Instruction
Multiple Data) instruction set that provides standardized acceleration
for media and signal processing applications. NEON can run an MP3 audio
decoder in less than 10 CPU MHz and can run the GSM AMR (Adaptive Multi-Rate)
speech codec using only 13 CPU MHz. It features a comprehensive
instruction set, separate register files and independent execution hardware.
NEON supports 8-, 16-, 32- and 64-bit integer and single precision floating-point
data, and uses SIMD operations for audio/video processing as
well as graphics and gaming. SIMD is a crucial element in vector
supercomputers, which perform multiple operations simultaneously. In NEON,
SIMD supports up to 16 operations at the same time.
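The figure of 16 simultaneous operations follows from the register and element sizes: a 128 bit register holds sixteen 8 bit values. The Python sketch below shows the lane arithmetic and a lane-by-lane add. It models the idea only; the function names are ours and wrap-around on overflow is an assumption made for this example.

```python
def lane_count(register_bits, element_bits):
    """How many elements of a given size fit in one SIMD register."""
    return register_bits // element_bits

def simd_add_u8(a, b):
    """Add two vectors of 8 bit values lane by lane, wrapping modulo 256."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    return [(x + y) & 0xFF for x, y in zip(a, b)]

# One 128 bit register's worth of 8 bit lanes.
lanes = lane_count(128, 8)
result = simd_add_u8([250] * lanes, [10] * lanes)
```

A single SIMD instruction performs all the lane additions at once, where a scalar loop would need one add per element.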

VFP

VFP technology is a coprocessor extension to the ARM architecture. It

provides low-cost single-precision and double-precision floating-point

computation that is fully compliant with the ANSI/IEEE Std 754-1985 Standard for

Binary Floating-Point Arithmetic. VFP provides floating-point computation

suitable for a wide spectrum of applications such as PDAs, smart phones, voice

compression and decompression, three-dimensional graphics and digital audio,

printers, set-top boxes, and automotive applications. The VFP architecture also

supports execution of short vector instructions allowing SIMD (Single Instruction

Multiple Data) parallelism. This is useful in graphics and signal-processing

applications by reducing code size and increasing throughput.

ARM CoreSight Technology

ARM CoreSight technology is designed to meet the wide range of needs of
embedded developers and silicon manufacturers, such as providing wide system
visibility with minimal overhead, thus reducing processor costs.

ARM CoreSight technology provides a complete debug and trace solution for the
entire system-on-chip (SoC). It makes single ARM core and complex, multi-core
SoCs easy to debug and thus speeds development of more reliable, higher
performance ARM Powered products.

By providing system-wide visibility through the smallest port, CoreSight
technology provides the highest standard of debug and trace capabilities and can
be leveraged for all cores and complex peripherals. CoreSight technology builds
on ARM's current Embedded Trace Macrocell™ (ETM) products, which are
widely licensed and supported by ARM RealView® development tools and all
other leading tool vendors.

Key Benefits

• Higher visibility of complete system operation through fewer pins

• Direct access by debugger to system memory, for visibility without affecting

CPU operation and for faster code download

• Ability to debug any of multiple cores, even if other cores are in sleep mode or

powered down

• Standard solution across all silicon vendors for widest tools support

• Re-usable for single ARM core, multi-core or core and DSP systems

• Enables faster time-to-market with greater reliability and higher performance

products

• Supports highest frequency processors, including ARM's new Cortex cores

• Builds on proven ARM ETM technology

Devices implementing CoreSight technology can comply at four independent
levels:

• CoreSight Debug
  o Debug Access Port
  o Embedded Cross Trigger
• CoreSight ETM
  o Embedded Trace Macrocells
• CoreSight Multi-source Trace
  o Trace Funnel

  o Embedded Trace Buffer
  o Trace Port Interface Unit
  o AHB Trace Macrocell
  o Instrumentation Trace
• CoreSight Single Wire
  o Single Wire Debug
  o Single Wire Viewer

  Memory Management

Introduction

The RISC OS machines work with two different types of memory -

logical and physical. The logical memory is the memory as seen by the OS, and

the programmer. Your application begins at &8000 and continues until &xxxxx.

The physical memory is the actual memory in the machine.

Under RISC OS, memory is broken into pages. Older machines have
a page size of 8/16/32K (depending on installed memory), and newer machines have
a fixed 4K page. If you were to examine the pages in your application workspace,
you would most likely see that the pages were seemingly random, not in order.
The pages relate to physical memory, combined to provide you with xxxx bytes of
logical memory. The memory controller is constantly shuffling memory around so
that each task that comes into operation 'believes' it is loaded at &8000. Write a
little application to count how many wimp polls occur every second, and you'll
begin to appreciate how much is going on in the background.
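The logical-to-physical mapping described above can be pictured as a small lookup table per task. The Python sketch below assumes the fixed 4K page size of the newer machines and an application base of &8000. The page numbers in `task_pages` are invented for the example, and the function name is ours.

```python
PAGE_SIZE = 4 * 1024     # fixed 4K page on newer machines
APP_BASE = 0x8000        # where every application 'believes' it is loaded

def logical_to_physical(logical_addr, page_table):
    """Translate an application address using a list of physical page numbers.

    page_table[i] is the physical page backing the i-th logical page
    of the application, counting from APP_BASE.
    """
    offset = logical_addr - APP_BASE
    if offset < 0:
        raise ValueError("address is below the application base")
    page_index, page_offset = divmod(offset, PAGE_SIZE)
    if page_index >= len(page_table):
        raise ValueError("address is beyond the application's memory")
    return page_table[page_index] * PAGE_SIZE + page_offset

# Hypothetical task: three logical pages backed by scattered physical pages.
task_pages = [37, 5, 112]
```

Consecutive logical addresses can land in quite unrelated physical pages, which is the 'seemingly random' ordering you would see if you examined them.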

 

MEMC: Older systems

In ARM 2, 250, and 3 machines, the memory is controlled by the
MEMC (Memory Controller). This unit can cope with an address space of 64Mb,
but in reality can only access 4Mb of physical memory. The 64Mb space is split
into three sections:

     0Mb - 32Mb : Logical RAM
    32Mb - 48Mb : Physical RAM
    48Mb - 64Mb : System ROMs and I/O

Parts of the system ROMs and I/O are mapped over each other, so

reading from it gives you code from ROM, and writing to it updates things like the

VIDC (video/sound).
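The three-way split of the MEMC address space can be written down as a simple range table. This Python sketch uses the boundaries given above; the function name `memc_region` is ours. Note that 64Mb is exactly what a 26 bit address can reach.

```python
MB = 1024 * 1024

# (start, end, description), with end exclusive, as listed in the text.
REGIONS = [
    (0 * MB, 32 * MB, "Logical RAM"),
    (32 * MB, 48 * MB, "Physical RAM"),
    (48 * MB, 64 * MB, "System ROMs and I/O"),
]

def memc_region(address):
    """Return which section of the 64Mb MEMC address space an address falls in."""
    for start, end, name in REGIONS:
        if start <= address < end:
            return name
    raise ValueError("address is outside the 64Mb address space")
```

For example, an application address such as &8000 falls in logical RAM, while anything from 48Mb upwards reads ROM or pokes hardware.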

It is possible to fit up to 16Mb of memory to an older machine, but

you will need a matched MEMC for each 4Mb. People have reported that simply

fitting two MEMCs (to give 8Mb) is either hairy or unreliable, or both. In practice,

the hardware to do this properly only really existed for the A540 machine, where

each 4Mb was a slot-in memory card with an on-board MEMC.

The MEMC is capable of restricting access to pages of memory in

certain ways, either complete access, no access, no access in USR mode, or

read-only access. Older versions of RISC OS only implemented this loosely, so

you need to be in SVC mode to access hardware directly but you could quite

easily trample over memory used by other applications.

 

MMU: Newer systems

The newer systems, with ARM6 or later processor, have an MMU

built into the processor. This consists of the translation look-aside buffer (TLB),

access control logic, and translation table walk logic. The MMU supports memory

accesses based upon 1Mb sections or 4K pages. The MMU also provides

support for up to 16 'domains', areas of memory with specific access rights.

The TLB caches 64 translated entries. If the TLB holds an entry for the virtual
address, the control logic determines if access is permitted. If it is, the MMU
outputs the appropriate physical address; otherwise it signals the processor to
abort.

If the TLB misses (it doesn't contain an entry for the virtual address), the walk
logic will retrieve the translation information from the (full) translation table in
physical memory. If the MMU is disabled, the virtual address is output
directly as the physical address.
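The lookup sequence just described (TLB hit, permission check, table walk on a miss, pass-through when disabled) can be modelled in a few lines of Python. This is a sketch under stated assumptions: 4K pages only, a dictionary standing in for the translation table in physical memory, and oldest-first eviction from the TLB, which the text does not specify. All class and method names are ours.

```python
from collections import OrderedDict

PAGE_SIZE = 4096
TLB_ENTRIES = 64

class AccessAbort(Exception):
    """Stands in for the MMU signalling the processor to abort."""

class SimpleMMU:
    def __init__(self, translation_table, enabled=True):
        # translation_table maps virtual page number ->
        # (physical page number, access permitted)
        self.table = translation_table
        self.enabled = enabled
        self.tlb = OrderedDict()
        self.walks = 0  # how many times the table walk logic was needed

    def translate(self, vaddr):
        if not self.enabled:
            return vaddr  # MMU disabled: virtual address is the physical address
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.tlb.get(vpn)
        if entry is None:
            entry = self._walk(vpn)  # TLB miss
        ppn, permitted = entry
        if not permitted:
            raise AccessAbort(f"access to {vaddr:#x} not permitted")
        return ppn * PAGE_SIZE + offset

    def _walk(self, vpn):
        self.walks += 1
        if vpn not in self.table:
            raise AccessAbort(f"no translation for page {vpn:#x}")
        if len(self.tlb) >= TLB_ENTRIES:
            self.tlb.popitem(last=False)  # assumed policy: evict the oldest entry
        self.tlb[vpn] = self.table[vpn]
        return self.tlb[vpn]
```

Repeated accesses to the same page are served from the TLB, so `walks` stays put; only the first touch of a page pays for a trip to the translation table.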

It gets a lot more complicated; suffice to say that more access rights
are possible and you can specify memory to be bufferable and/or cacheable (or
not), and the page size is fixed at 4K. A normal RiscPC offers two banks of RAM,
and is capable of addressing up to 256Mb of RAM in fairly standard PC-style
SIMMs, plus up to 2Mb of VRAM double-ported with the VIDC, plus
hardware/ROM addressing.

On the RiscPC, the maximum address space of an application is 28Mb.
This is not a restriction of the MMU but a restriction in the 26-bit processor mode
used by RISC OS. A 32-bit processor mode could, in theory, allocate the entire
256Mb to a single task. All current versions of RISC OS are 26-bit.

 

System limitations

Consider a RiscPC with an ARM610 processor. The cache is 4K. The
bus speed is 16MHz (note, only slightly faster than the A5000!), and the
hardware does not support burst-mode for memory accesses. Upon a context
switch (ie, making an application 'active') you need to remap its memory to begin
at &8000 and flush the cache.

The Pipeline

A conventional processor executes instructions one at a time, just
as you expect it to when you write your code. Each execution can be broken
down into three parts, which anybody who has learned this stuff at college will
have burned into their memory: fetch, decode, execute.

In English...

1. Fetch

Retrieve the instruction from memory. Don't get all techie -

whether the instruction comes from system memory or the processor

cache is irrelevant, the instruction is not loaded 'into' the processor until it

is specifically requested. The cache simply serves to speed things up. By

loading chunks of system memory into the cache, the processor can

satisfy many more of its instruction fetches by pulling instructions from the

cache. This is necessary because processors are very fast (StrongARMs,

200MHz+; Pentiums up to GHz!) and system memory is not (33, 66, or

133MHz). To see the effect the cache has on your processor, use *Cache

Off.

 

2. Decode

Figure out what the instruction is, and what is supposed to be

done.

3. Execute

Perform the requested operation.

Each of these operations is performed along with the electronic
'heartbeat', the clock rate. Example clock rates for several microprocessors
included in Acorn products are given here:

    Machine                 Processor    Clock rate
    BBC microcomputer       6502         2MHz
    Acorn A310-A3000        ARM 2        8MHz
    Acorn A5000             ARM 3        25MHz
    Acorn A5000/I           ARM 3        30MHz
    RiscPC600               ARM610       33MHz
    RiscPC700               ARM710       40MHz
    Early PC co-processor   486SXL-40    33MHz (not 40!)
    RiscPC (StrongARM)      SA110        202MHz - 278MHz+

As shown in the PC world, processors are running into GHz speeds
(1,000,000,000 ticks/sec), which necessitates much in the way of speed
tweaks (huge amounts of cache, an extremely optimized pipeline) because there is
no way the rest of the system can keep up. Indeed, the rest of the system is likely
to be operating at a quarter of the speed of the processor. The RiscPC is
designed to work, I believe, at 33MHz. That is why people thought the StrongARM
wouldn't give much of a speed boost. However, the small size of ARM
programs, coupled with a rather large cache, made the StrongARM a viable
proposition in the RiscPC. It bottlenecked horribly, but other factors meant that
this wasn't so visible to the end-user, so the result was a system which is much
faster than the ARM710. More recently came the Kinetic StrongARM processor card.
This attempts to alleviate bottlenecks by installing a big wedge of memory
directly on the processor card and using that. It even goes so far as to install the
entirety of RISC OS into that memory so you aren't kept waiting for the ROMs
(which are slower even than RAM).

There is an obvious solution. Since these three stages (fetch,
decode, execute) are fairly independent, would it not be possible to:

    fetch instruction #3
    decode instruction #2
    execute instruction #1

...then, on the next clock tick...

    fetch instruction #4
    decode instruction #3
    execute instruction #2

...tick...

    fetch instruction #5
    decode instruction #4
    execute instruction #3

In practice, the answer is yes. And this is exactly what a pipeline is.
Simply by doing this, you have just made your processor three times faster!

Now, it isn't a perfect solution.

• When it comes to a branch, the pipeline is dumped, as instructions after a
branch are not required. This is why it is preferable to use conditional
execution and not branching.

• You have to keep in mind that the program counter is ahead of the
instruction that is currently being executed. So if you see an error at 'x',
then the real error is quite possibly at 'x-8' (or 'x-12' for StrongARM).
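The overlap of fetch, decode and execute, and the resulting gap between the program counter and the executing instruction, can be checked with a small Python model. It assumes an ideal three-stage pipeline with no branches or stalls and 4 byte instructions; the function names are ours.

```python
STAGES = ("fetch", "decode", "execute")
INSTRUCTION_BYTES = 4

def sequential_cycles(n):
    """Clock ticks for n instructions with no overlap between stages."""
    return n * len(STAGES)

def pipelined_cycles(n):
    """Clock ticks for n instructions in an ideal pipeline (no branches)."""
    return n + len(STAGES) - 1 if n else 0

def pipeline_schedule(n):
    """For each tick, map each stage to the instruction number it holds (or None)."""
    ticks = []
    for tick in range(pipelined_cycles(n)):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = tick - depth + 1
            row[stage] = instr if 1 <= instr <= n else None
        ticks.append(row)
    return ticks

# While one instruction executes, the fetch stage is two instructions ahead.
pc_offset_bytes = (len(STAGES) - 1) * INSTRUCTION_BYTES
```

For a long run of instructions the pipelined count approaches one tick per instruction, which is where the 'three times faster' comes from. Every branch that dumps the pipeline costs the ticks needed to refill it, and the two-instruction lead of the fetch stage is the 8 bytes behind the 'x-8' rule of thumb.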

RISC vs CISC

In the early days of computing, you had a lump of silicon which

performed a number of instructions. As time progressed, more and more facilities

were required, so more and more instructions were added. However, according

to the 20-80 rule, 20% of the available instructions are likely to be used 80% of

the time, with some instructions only used very rarely. Some of these instructions

are very complex, so creating them in silicon is a very arduous task. Instead, the

processor designer uses microcode. To illustrate this, we shall consider a

modern CISC processor (such as a Pentium or 68000 series processor). The

core, the base level, is a fast RISC processor. On top of that is an interpreter

which 'sees' the CISC instructions, and breaks them down into simpler RISC

instructions.

Already, we can see a pretty clear picture emerging. Why, if the

processor is a simple RISC unit, don't we use that? Well, the answer lies more in

politics than design. However Acorn saw this and not being constrained by the

need to remain totally compatible with earlier technologies, they decided to

implement their own RISC processor.

Up until now, we've not really considered the real differences between

RISC and CISC, so...

A Complex Instruction Set Computer (CISC) provides a large and

powerful range of instructions, which is less flexible to implement. For example,

the 8086 microprocessor family has these instructions:

    JA    Jump if Above
    JAE   Jump if Above or Equal
    JB    Jump if Below
    ...
    JPO   Jump if Parity Odd
    JS    Jump if Sign
    JZ    Jump if Zero

There are 32 jump instructions in the 8086, and the 80386 adds more.

I've not read a spec sheet for the Pentium-class processors, but I suspect it (and

MMX) would give me a heart attack!

By contrast, the Reduced Instruction Set Computer (RISC) concept is to

identify the sub-components and use those. As these are much simpler, they can

be implemented directly in silicon, so will run at the maximum possible speed.

Nothing is 'translated'. There are only two Jump instructions in the ARM

processor - Branch and Branch with Link. The "if equal, if carry set, if zero" type

of selection is handled by condition options, so for example:

    BLNV  Branch with Link NeVer (useful!)
    BLEQ  Branch with Link if EQual

and so on. The BL part is the instruction, and the following part is the condition.

This is made more powerful by the fact that conditional execution can be applied

to most instructions! This has the benefit that you can test something, then only

do the next few commands if the criteria of the test matched. There is no
branching off; you simply add condition codes to the instructions you require to
be conditional:

    SWI    "OS_DoSomethingOrOther"   ; call the SWI
    MVNVS  R0, #0                    ; If failed, set R0 to -1
    MOVVC  R0, #0                    ; Else set R0 to 0

Or, for the 80486:

            INT  $...whatever...     ; call the interrupt
            CMP  AX, 0               ; did it return zero?
            JE   failed              ; if so, it failed, jump to fail code
            MOV  DX, 0               ; else set DX to 0
    return:
            RET                      ; and return
    failed:
            MOV  DX, 0FFFFH          ; failed - set DX to -1
            JMP  return

The odd flow in that example is designed to allow the fastest non-branching
throughput in the 'did not fail' case. This is at the expense of two
branches in the 'failed' case.
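The ARM version of the example works because every instruction carries a condition that is tested against the processor flags before it is allowed to take effect. The Python sketch below models just enough of that to run the two conditional instructions that follow the SWI. It covers only a handful of condition codes and two operations, and the tuple format for instructions is our own invention for the example.

```python
# Condition code -> test on the flags (only a few of the codes are modelled).
CONDITIONS = {
    "EQ": lambda f: f["Z"],        # equal (zero flag set)
    "NE": lambda f: not f["Z"],    # not equal
    "VS": lambda f: f["V"],        # overflow set
    "VC": lambda f: not f["V"],    # overflow clear
    "AL": lambda f: True,          # always
    "NV": lambda f: False,         # never
}

def run(program, flags, registers):
    """Execute (condition, operation, register, immediate) tuples in order.

    An instruction whose condition fails is skipped; there is no branch.
    """
    for cond, op, reg, imm in program:
        if not CONDITIONS[cond](flags):
            continue
        if op == "MOV":
            registers[reg] = imm & 0xFFFFFFFF
        elif op == "MVN":
            registers[reg] = ~imm & 0xFFFFFFFF   # move NOT: #0 becomes &FFFFFFFF (-1)
        else:
            raise ValueError(f"operation {op} is not modelled")
    return registers

# The two instructions after the SWI in the ARM example above.
after_swi = [("VS", "MVN", "R0", 0), ("VC", "MOV", "R0", 0)]
```

Both instructions pass through the processor whatever the outcome, but exactly one of them changes R0, so nothing is ever dumped from the pipeline.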

Most modern CISC processors, such as the Pentium, use a fast
RISC core with an interpreter sitting between the core and the instruction. So
when you are running Windows95 on a PC, it is not that much different to trying
to get W95 running on the software PC emulator. Just imagine the power hidden
inside the Pentium...

Another benefit of RISC is that it contains a large number of

registers, most of which can be used as general purpose registers.

This is not to say that CISC processors cannot have a large number
of registers; some do. However, for its use, a typical RISC processor requires
more registers to give it additional flexibility. Gone are the days when you had
two general purpose registers and an 'accumulator'.

One thing RISC does offer, though, is register independence. As you
have seen above, the ARM register set defines at minimum R15 as the program
counter, and R14 as the link register (although, after saving the contents of R14,
you can use this register as you wish). R0 to R13 can be used in any way you
choose, although the Operating System defines R13 as the stack pointer.
You can, if you don't require a stack, use R13 for your own purposes. APCS
applies firmer rules and assigns more functions to registers (such as Stack Limit).
However, none of these - with the exception of R15 and sometimes R14 - is a
constraint applied by the processor. You do not need to worry about saving your
accumulator in long instructions; you simply make good use of the available
registers.

The 8086 offers you fourteen registers, but with caveats: The first four

(A, B, C, and D) are Data registers (a.k.a. scratch-pad registers). They are 16bit

and accessed as two 8bit registers, thus register A is really AH (A, high-order

byte) and AL (A low-order byte). These can be used as general purpose

registers, but they can also have dedicated functions - Accumulator, Base,

Count, and Data. The next four registers are Segment registers for Code, Data,

Extra, and Stack. Then come the five Offset registers: Instruction Pointer (PC),

SP and BP for the stack, then SI and DI for indexing data. Finally, the flags

register holds the processor state. As you can see, most of the registers are tied

up with the bizarre memory addressing scheme used by the 8086. So only four

general purpose registers are available, and even they are not as flexible as

ARM registers.

The ARM processor differs again in that it has a reduced number of

instruction classes (Data Processing, Branching, Multiplying, Data Transfer, and

Software Interrupts).

A final example of minimal registers is the 6502 processor, which
offers you:

    Accumulator  - for results of arithmetic instructions
    X register   - First general purpose register
    Y register   - Second general purpose register
    PC           - Program Counter
    SP           - Stack Pointer, offset into page one (at &01xx).
    PSR          - Processor Status Register - the flags.

While it might seem like utter madness to only have two general
purpose registers, the 6502 was a very popular processor in the '80s. Many
famous computers have been built around it. For the Europeans: consider the
Acorn BBC Micro, Master, Electron... For the Americans: consider the Apple II and
the Commodore PET. The ORIC uses a 6502, and the C64 uses a variant of the
6502. (In case you were wondering, the Speccy uses the other popular processor -
the ever bizarre and freaky Z80.)

So if entire systems could be created with a 6502, imagine the

flexibility of the ARM processor. It has been said that the 6502 is the bridge

between CISC design and RISC. Acorn chose the 6502 for their original

machines such as the Atom and the System# units. They went from there to

design their own processor - the ARM.

  To summarize the above, the advantages of a RISC processor are:

• Quicker time-to-market. A smaller processor will have fewer instructions, and the design will be less complicated, so it may be produced more rapidly.

• Smaller 'die size'. The RISC processor requires fewer transistors than comparable CISC processors. This in turn leads to a smaller silicon size which, in turn again, leads to less heat dissipation. Most of the heat in an ARM710 machine is actually generated by the 80486 in the slot beside it (and that's when it is supposed to be in 'standby').

• Related to all of the above, it is a much lower power chip. ARM designs its processors in static form so that the processor clock can be stopped completely, rather than simply slowed down. The Solo computer (designed for use in third world countries) is a system that will run from a 12V battery, charging from a solar panel.

 

Internally, a RISC processor has a number of hardwired instructions.

This was also true of the early CISC processors, but these days a typical

CISC processor has a heart which executes microcode instructions which

correlate to the instructions passed into the processor. Ironically, this heart

tends to be RISC.

A RISC processor's simplicity does not necessarily refer to a simple instruction set. Consider an ARM load-multiple (LDM) instruction used to return from a routine: it reloads several registers and the return address from the stack, and the stack is adjusted accordingly. The '^' suffix restores the processor flags into R15 along with the return address. And the whole instruction is conditionally executed. This allows a tidy 'exit from routine' to be performed in a single instruction. The RISC concept, however, does not state that all the instructions are simple. If that were true, the ARM would not have a MUL, as you can do the exact same thing by ADDing in a loop. No, the RISC concept means the silicon is simple. It is a simple processor to implement.
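The remark about MUL can be made concrete with a minimal sketch of multiplication built from nothing but repeated addition. It is written in Python for readability rather than in ARM assembler, and the function name is made up for this example.

```python
def mul_by_adding(a, b):
    """Multiply a by a non-negative integer b using only addition in a loop."""
    result = 0
    for _ in range(b):        # one ADD per loop iteration
        result += a
    return result


print(mul_by_adding(6, 7))    # 42
```

The loop needs b additions, so a dedicated MUL instruction is a convenience and a speed-up, not a necessity.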

RISC vs ARM

You shouldn't call it "RISC vs CISC" but "ARM vs CISC". For example, conditional execution of (almost) any instruction isn't a typical feature of RISC processors but is found, as far as we know, only on ARMs. Furthermore, quite a few people claim that an ARM isn't really a RISC processor as it doesn't

provide only a simple instruction set: you'll hardly find any CISC processor which provides a single instruction as powerful as LDREQ R0,[R1,R2,LSR #16]!
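To show how much work that one instruction does, here is a hedged Python model of it. Registers are held in a dictionary and memory is a dictionary keyed by address. Both are simplifications chosen for this sketch, and the function name is invented.

```python
def ldreq_r0_r1_r2_lsr16_wb(regs, z_flag, memory):
    """Model of LDREQ R0,[R1,R2,LSR #16]! on 32-bit register values."""
    if not z_flag:                                 # EQ: execute only if Z is set
        return
    offset = (regs["R2"] & 0xFFFFFFFF) >> 16       # LSR #16 via the barrel shifter
    address = (regs["R1"] + offset) & 0xFFFFFFFF   # pre-indexed address
    regs["R0"] = memory[address]                   # the load itself
    regs["R1"] = address                           # '!' writes the address back


regs = {"R0": 0, "R1": 0x8000, "R2": 0x00040000}
memory = {0x8004: 0xCAFE}
ldreq_r0_r1_r2_lsr16_wb(regs, True, memory)
print(hex(regs["R0"]), hex(regs["R1"]))            # 0xcafe 0x8004
```

A condition test, a shift, an addition, a memory load and a base register update are all folded into a single instruction.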

Today it is wrong to claim that CISC processors execute complex instructions more slowly: modern processors can execute most complex instructions in one cycle. They may need very long pipelines to do so (20 stages or more with a Pentium 4), but nonetheless they can. And complex instructions offer a big potential for optimization: if an instruction took 10 cycles on the old model and the new model executes it in 5 cycles, you end up with a speed increase of 100% (without a higher clock frequency). ARM processors, on the other hand, executed most instructions in a single cycle right from the start and thus don't have this optimization potential (except for the MUL instruction).
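The 100% figure follows from simple arithmetic, sketched below. The function name is hypothetical, and it assumes an unchanged clock frequency, as the text does.

```python
def speed_increase_percent(old_cycles, new_cycles):
    """Percentage gain in instructions per unit time at a fixed clock frequency."""
    return (old_cycles / new_cycles - 1.0) * 100.0


print(speed_increase_percent(10, 5))   # 100.0
print(speed_increase_percent(1, 1))    # 0.0
```

Halving the cycle count doubles the throughput, while an instruction that already takes one cycle leaves nothing to gain this way.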

The argument that RISC processors provide more registers than CISC processors isn't right. Just take a look at the (good old) 68000: it has about the same number of registers as the ARM. And the fact that 80x86-compatible processors don't provide more registers is just a matter of compatibility. But this argument isn't completely wrong: RISC processors are much simpler than CISC processors and thus take up much less space, leaving room for additional functionality such as more registers. On the other hand, a RISC processor with only three or so registers would be a pain to program, so RISC processors simply need more registers than CISC processors for the same job.

And the argument that RISC processors have pipelining whereas CISCs don't is plainly wrong: the Pentium is pipelined too, and more deeply than the three-stage ARM2.

Today, the advantages of RISC over CISC are these:

• RISC processors are much simpler to build, which in turn brings the following advantages:

o easier to build, i.e. you can use already existing production facilities

o much less expensive: just compare the price of an XScale with that of a Pentium III at 1 GHz...

o less power consumption, which again gives two advantages:

   ▪ much longer use of battery-driven devices

   ▪ no need for cooling of the device, which again gives two advantages:

      ▪ smaller design of the whole device

      ▪ no noise

• RISC processors are much simpler to program, which helps not only the assembler programmer but the compiler designer, too. You'll hardly find any compiler which uses all the functions of a Pentium III optimally.

And then there are the benefits of the ARM processors:

• Conditional execution of most instructions, which is very powerful, especially with long pipelines: the whole pipeline has to be refilled every time a branch is taken, which is why CISC processors make a huge effort at branch prediction.

• Shifting of registers as part of other instructions, which means that shifts take up no extra time at all (the 68000 needed extra cycles for every bit shifted).

• Optional setting of flags, i.e. ADD versus ADDS, which becomes extremely powerful together with the conditional execution of instructions.
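The interplay of conditional execution and optional flag setting can be sketched with the classic greatest-common-divisor loop. The Python below is only a model: the comments name the ARM instructions each step stands for, and the function name is an assumption of this example. It expects positive integers.

```python
def gcd_conditional(r0, r1):
    """Euclid's GCD in the style of ARM conditional execution.

    One compare sets the flags; both subtractions are then issued and each
    takes effect only if its condition matches those flags."""
    while True:
        gt = r0 > r1           # flags from the compare (like CMP r0, r1)
        lt = r0 < r1
        if gt:                 # like SUBGT r0, r0, r1 (no S suffix: flags kept)
            r0 = r0 - r1
        if lt:                 # like SUBLT r1, r1, r0 (still the CMP flags)
            r1 = r1 - r0
        if not (gt or lt):     # equal: the loop-back branch (BNE) is not taken
            return r0


print(gcd_conditional(12, 18))  # 6
```

Because the subtractions do not update the flags, the single compare governs both of them, and the only branch left is the one that closes the loop.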

 

KEY APPLICATIONS

ARM and Bluetooth Wireless Technology

The Bluetooth specification is controlled and issued by the SIG

(Special Interest Group), which has approximately 2500 members at time of

writing, including ARM which is an Associate Member.

ARM Architecture in Bluetooth Applications

ARM has a leading position as the 'CPU of choice' for Bluetooth applications, as shown by the IP vendors and silicon vendors that target the ARM architecture.

ARM aims to:

• Encourage and assist all Bluetooth IP vendors to target ARM

• Enable all Bluetooth SoC designers to design in ARM technology

• Bring leading Bluetooth IP to the ARM partnership

ARM's Bluetooth activity provides a focal point for third parties wanting to work

with ARM or for partners and OEMs wishing to access Bluetooth IP.

3D Graphics Acceleration

The anticipated growth of 3D graphics in a wide variety of consumer

products from mobile phones to set top boxes has resulted in a market

requirement for a complete 3D graphics rendering sub-system suitable for

integration in embedded ARM core-based SoC devices.

The launch of 3G mobile networks and the growth of Java™ enabled mobile devices with large colour displays are together expected to lead to a dramatic growth in wireless gaming over the next few years. Industry analysts predict that the number of wireless gamers around the world will grow to between 53 million and 360 million by 2006.

The ARM range of 3D hardware acceleration solutions has been designed to

meet this market requirement and support rich multimedia applications on a wide

variety of portable and consumer products. The family currently features two

products: the ARM MBX R-S™ and MBX HR-S™ for integration with all ARM

processor families.

The ARM 3D graphics acceleration technology is based around the PowerVR®

MBX graphics processor from Imagination Technologies, a low-power and

efficient implementation of the PowerVR Series 3 architecture. Combined with

ARM’s industry leading embedded RISC processor cores, MBX enables complex

3D, 2D and video graphics to be accessed on mobile and consumer platforms.

VoIP

VoIP (Voice over Internet Protocol) is the ability to packetize voice and send it through the internet infrastructure. With significant cost and feature benefits over traditional telephony, VoIP is gaining momentum in both the residential and enterprise markets. At the end of 2004, IDC estimated that the US had over 1M residential VoIP subscribers. This number is expected to reach 6 or 7M by 2006.

Support and processing for VoIP falls into a wide spectrum of product types,

from line cards in infrastructure devices to desktop phones. ARM’s full range of

processor cores is ideal for meeting the wide performance requirements of this

market. For higher-end VoIP products, including voice infrastructure gateways, there are a number of chipsets combining an ARM core with digital signal processing engines or a DSP processor to enable multiple channels of voice. For low cost phones and terminal adapters, the combination of ARM cores with built-in DSP extensions and partner software solutions utilizing this digital signal processing capability enables low cost, low power VoIP implementations.

Hard Disk Drives

Hard disk drives may well be the ultimate real-time control system. Managing the combination of high rotation speeds, extreme actuator precision, turbulence caused by fast disk speeds, and external effects such as shock demands high-performance, computationally intensive embedded processing. The market requires power efficiency, small die size and debug capability. ARM is now widely accepted as the architecture of choice for this demanding market. Through years of working with HDD partners, ARM has perfected its cores and developed a leading-edge real-time debug solution to meet the needs of HDD designers. ARM cores can be found in over 30% of all shipments, with ARM-based designs shipping, or in development, at all major OEMs.

Printers

ARM is proving to be an excellent choice for printer

applications. ASIC integration risk reduction is balanced with the high

performance requirements of the laser market. Lower costs are achieved in the

ink market while still boosting image quality and throughput.

Conclusion

Basically, the ARM architecture has a simple, powerful, yet compact instruction set which is easy to compile to. Furthermore, most ARM implementations use (almost) fully associative caches and 3- to 5-stage pipelines, and have a narrow and relatively slow external bus (without L2 caches). They all support powering down the parts that don't do any work. Finally, the newest ARMs use the latest process technology to decrease supply voltage rather than to crank up clock speed.

The low power consumption is because it has approximately 1/25th of the number of gates of a Pentium. The high performance is because it's designed better than the Pentium. With RISC design you can make certain simplifications that speed things up: you can design the instruction decode using hardwired logic.

As far as RISC goes, the ARM has some wrinkles of its own that add to its performance. The ability to place a condition on (almost) any instruction and to determine whether instructions can or cannot affect the processor flags means that you can often avoid branches, which result in instruction stalls or other slowdowns (on processors that don't have this ability, you have to add loads of power-consuming extra logic to try to compensate for branch stalls). The barrel shifter allows much more flexibility than ALU shifting and makes ARM instructions capable of doing a lot more than you might first think. Basically, the ARM is a better design than the Pentium.
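As a small illustration of the barrel shifter point, the sketch below models an instruction of the form ADD Rd, Rn, Rm, LSL #n, in which the second operand is shifted before the addition. The Python function names are invented for this example, and values are truncated to 32 bits as they would be in a register.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32


def add_with_shift(rn, rm, shift):
    """Model of ADD Rd, Rn, Rm, LSL #shift: shift and add in one instruction."""
    return (rn + lsl(rm, shift)) & MASK32


print(add_with_shift(7, 7, 2))   # 35, i.e. 7 * 5
```

Using the same register for both operands with a shift of 2 multiplies by five in a single step, with no separate shift instruction and no MUL.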

Reference

www.arm.com

http://en.wikipedia.org/
