j. fisher // hp labs, 7/99 the mass customization of instruction-set architectures josh fisher hp...

31
J. Fisher // HP Labs, 7/99 The Mass Customization of The Mass Customization of Instruction-set Instruction-set Architectures Architectures Josh Fisher HP Labs Cambridge [email protected] (This talk has been influenced by the work of Paolo Faraboschi, Vas Bala, Stefan Freudenberger, Giuseppe Desoli, and Geoffrey Brown.) ENTIRE PRESENTATION COPYRIGHT © HEWLETT-PACKARD, 1998, 1999

Post on 21-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

J. Fisher // HP Labs, 7/99

The Mass Customization of The Mass Customization of Instruction-set ArchitecturesInstruction-set ArchitecturesThe Mass Customization of The Mass Customization of Instruction-set ArchitecturesInstruction-set Architectures

Josh FisherHP Labs [email protected]

(This talk has been influenced by the work of Paolo Faraboschi, Vas Bala,

Stefan Freudenberger, Giuseppe Desoli, and Geoffrey Brown.)

ENTIRE PRESENTATION COPYRIGHT © HEWLETT-PACKARD, 1998, 1999

J. Fisher // HP Labs, 7/99

John Hennessy once told me:

““When Dave and I were revising our book, several When Dave and I were revising our book, several people suggested that we leave out the material on people suggested that we leave out the material on ISA design for general-purpose systems. It’s ISA design for general-purpose systems. It’s basically a dead subject.”basically a dead subject.”

Conventional Wisdom: No More Variety Conventional Wisdom: No More Variety in CPU Architecturesin CPU Architectures

J. Fisher // HP Labs, 7/99

Architecturally Visible CustomizationArchitecturally Visible CustomizationArchitecturally Visible CustomizationArchitecturally Visible Customization

This talk is about:

Customizing CPUs in ways that are visible in the object code

I’ll emphasize embedded processors, but will also talk about general-purpose processors

What are the barriers? How do we overcome them?

J. Fisher // HP Labs, 7/99

2x performance routinely

Often much higher factors

Why do this? In a word: performance.

My opinion of the performance increases available through visible customization, at least for embedded processors:

Why Customize?Why Customize?Why Customize?Why Customize?

J. Fisher // HP Labs, 7/99

Pressures For Embedded CustomizationPressures For Embedded CustomizationPressures For Embedded CustomizationPressures For Embedded Customization

OpportunityDesktop

ProcessorEmbeddedProcessor

Performance ByTaking Advantage

Of Known Mix

SometimesRelevant (Known

Mix??)Huge Opportunity

ReducedSize

LargelyUnimportant Huge

ReducedPower

LargelyUnimportant Huge

Lowest PossibleCost

SomewhatImportant Huge

J. Fisher // HP Labs, 7/99

Opportunities and Pressures That Opportunities and Pressures That Encourage Embedded CustomizationEncourage Embedded CustomizationOpportunities and Pressures That Opportunities and Pressures That Encourage Embedded CustomizationEncourage Embedded Customization

Add things: We can improve performance by using special operations and mixes tailored to that code

Remove things: We can eliminate superfluous circuits. Thus we can reduce silicon area, cost, power and size, while still equaling or exceeding the performance of a CPU much worse in those respects.

In an embedded environment we know a lot more about the code that will run.

This opens up two complementary kinds of advantages:

J. Fisher // HP Labs, 7/99

The Opportunity Is Everywhere In The Opportunity Is Everywhere In Embedded ProcessingEmbedded ProcessingThe Opportunity Is Everywhere In The Opportunity Is Everywhere In Embedded ProcessingEmbedded Processing

They need processor performance because of:

• Huge data sets &/or big computations (embedded supercomputing)

• Low power requirements; chip area requirements

• Flexibility and time to market: sometimes ASICs are inappropriate

• Cellphones• Printers• Disk Controllers• Digital still image

• Network routers• Video• Medical devices• …and on and on

J. Fisher // HP Labs, 7/99

What Visible Changes Can Be Useful?What Visible Changes Can Be Useful?What Visible Changes Can Be Useful?What Visible Changes Can Be Useful?

Multiple visible ALUs

Number of registers

Register “clusters”

Types of ALUs (float vs. not, special ops, ...)

Latencies

Power saving through visible control

Instruction compression

… and lots more

J. Fisher // HP Labs, 7/99

Existing binaries barrier—A general-purpose processor must be compatible; This is even becoming more relevant to embedded processing

Lost savings/higher chip cost due to lower volumes—You lose the ability to leverage sales to many different users

Toolchain development and maintenance costs; user familiarity—Visible ISAs imply a new software toolchain for each processor

Hardware development costs —Each variant processor needs a new chip design

The product development cycle for embedded products—Users don’t develop products in a way that is very friendly to customization

Barriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible Variety

So why not do this? I can think of five big barriers:

Most of this talk is a discussion of these barriers and some solutions.

J. Fisher // HP Labs, 7/99

Until recently, new companies, or companies that weren't previously selling computers, entered the general-purpose computer business as follows:

Build and sell computers on which no existing binaries would run efficiently. True of IBM, DEC, HP, Prime, Data General, Univac, CDC, ...

Probably more than 100 companies made “significant dents” in the market this way.

IT SEEMS YOU CAN’T DO THIS ANYMORE!

The ISA Barrier—The ProblemThe ISA Barrier—The Problem

J. Fisher // HP Labs, 7/99

ApproximateApproximate Number of New Computer Number of New Computer Companies Fitting Barrier Condition Per YearCompanies Fitting Barrier Condition Per Year

NE

XT

MIP

S

CE

LE

RIT

Y?

, S

ILIC

ON

GR

AP

HIC

SC

OL

EC

O A

DA

M

PY

RA

MID

?,

RID

GE

?,

SU

N

AP

OL

LO

, T

HR

EE

RIV

ER

SA

CO

RN

AT

AR

I

TA

ND

Y,

CO

MM

OD

OR

EA

PP

LE

, S

INC

LA

IR,

…M

ITS

/AL

TA

IRU

NID

AT

A, N

OR

SK

DA

TA

PR

IME

WA

NG

, A

MD

AH

L

DA

TA

GE

NE

RA

L

0

1

2

3

4

5

6

7

8

1948

1950

1952

1954

1956

1958

1960

1962

1964

1966

1968

1970

1972

1974

1976

1978

1980

1982

1984

1986

1988

1990

1992

1994

1996

Year Of Introduction Of Company's First Computer

Nu

mb

er O

f N

ew C

om

pa

nie

s

J. Fisher // HP Labs, 7/99

A new PC demonstrated by former Apple Computer executive Jean-Louis Gassee was impressive enough to elicit a standing ovation from the typically skeptical crowd of some 500 technology executives at the Agenda conference. With its own software operating system, Gassee's BeBox computer will be will be incompatible with everything now on the incompatible with everything now on the market, but the audience still seemed market, but the audience still seemed enthusiasticenthusiastic

New Computer Dazzles a Jaded Industry Crowd New York Times (10/04/95) P.D6; John Markoff

BeBox: The Last Serious Attempt?BeBox: The Last Serious Attempt?

1995

J. Fisher // HP Labs, 7/99

We've taken the decision to focus purely on software development and to cease production of the BeBox. We want to explain what led to this decision and what this implies for you, our developers.

BeBox: The Last Serious Attempt?BeBox: The Last Serious Attempt?

Dear Be Developers, (1997) 1997

J. Fisher // HP Labs, 7/99

““ISA Drift”ISA Drift”““ISA Drift”ISA Drift”

I believe that the constraint of total binary compatibility is too severe for anyone to live with.

For example, in Intel’s 32-bit architectures we have seen or will see the following binary incompatibilities:

1. Hardware floating-point vs. software floating point

2. MMX vs. not MMX

3. And, soon, Katmai New Instructions:

These are, of necessity, dealt with in a way I find quite clumsy.

J. Fisher // HP Labs, 7/99

Techniques that will jimmy binaries after distribution are maturing. This will happen even more because:

““ISA Drift”ISA Drift”““ISA Drift”ISA Drift”

• Even mainstream ISA’s can’t resist some visible variety

• We’re going to see a lot of really lousy binaries (like Java VM code) distributed over the web

This can (and I believe certainly will) lead to the opportunity to customize, and will have the slow effect of causing ISA Drift.

I believe architectures will become families of ISAs that are what we would today call mutually incompatible.

J. Fisher // HP Labs, 7/99

Techniques that operate on binaries after they are distributed have exploded in the past 3 years, some examples are:

What Will Permit “ISA Drift”What Will Permit “ISA Drift”What Will Permit “ISA Drift”What Will Permit “ISA Drift”

1. OCT/Software Migration

2. Code caching

3. Dynamic compiling

4. Dynamic optimization

5. Statistical profiling

6. Link/load/install time techniques

… and so on

J. Fisher // HP Labs, 7/99

Even in the Embedded Market, Binary Even in the Embedded Market, Binary Compatibility Can Be ImportantCompatibility Can Be ImportantEven in the Embedded Market, Binary Even in the Embedded Market, Binary Compatibility Can Be ImportantCompatibility Can Be Important

Mostly, one designs a whole new system:

The trend today is for this to be an increasingly important effect, but I predict that where it is relevant, ISA drift will occur. We will see a lot of ISA variety, just as in the case of general-purpose processors.

However, when the ISA changes, certain 3rd party programs, especially RTOS’s, will change in places, or even require assembly code, to run properly

Most of the effect of this is upon the toolchain (discussed later), but there are other techniques that must be brought to bear

Recompiling is an obvious requirement

J. Fisher // HP Labs, 7/99

Existing binaries barrier—A general-purpose processor must be compatible; This is even becoming more relevant to embedded processing

Lost savings/higher chip cost due to lower volumes—You lose the ability to leverage sales to many different users

Toolchain development and maintenance costs; user familiarity—Visible ISAs imply a new software toolchain for each processor

Hardware development costs —Each variant processor needs a new chip design

The product development cycle for embedded products—Users don’t develop products in a way that is very friendly to customization

Barriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible Variety

So why not do this? I can think of five big barriers:

J. Fisher // HP Labs, 7/99

How Can Low Volume Customized How Can Low Volume Customized Processors Be Competitive?Processors Be Competitive?How Can Low Volume Customized How Can Low Volume Customized Processors Be Competitive?Processors Be Competitive?

Suppose a product designer is choosing between: A simple customized processor as I’m describing, or

A more complex, much larger mass-market, high performance embedded microprocessor

The mass-market processor gets a big volume advantage If successful, many other buyers will select the same chip Silicon processes can be tuned up, big runs can be done Experience (the learning curve) has a dramatic effect on yield Chips can be cranked out like jelly-beans

In similar small volumes, the mass-market processor might cost twice as much or more than a simpler customized processor

But with its much larger volumes it might cost less.

J. Fisher // HP Labs, 7/99

System-On-Chip Changes the EquationSystem-On-Chip Changes the EquationSystem-On-Chip Changes the EquationSystem-On-Chip Changes the Equation

Before: Core was a separate component. A big, complex core that should have been far more expensive might be a large multiple cheaper because of large volumes.

Now: Every chip is made for the anticipated use only.That means that the volume of the actual part will be the same whether one uses a mass-market core, or a customized core.

Customized processors are now on an equal footing!

Core

`

CoreOther logic

J. Fisher // HP Labs, 7/99

Existing binaries barrier—A general-purpose processor must be compatible; This is even becoming more relevant to embedded processing

Lost savings/higher chip cost due to lower volumes—You lose the ability to leverage sales to many different users

Toolchain development and maintenance costs; user familiarity—Visible ISAs imply a new software toolchain for each processor

Hardware development costs —Each variant processor needs a new chip design

The product development cycle for embedded products—Users don’t develop products in a way that is very friendly to customization

Barriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible Variety

So why not do this? I can think of five big barriers:

J. Fisher // HP Labs, 7/99

The solution is to automate all aspects of the variation of ISAs: treat the automation itself as a science.

This then follows a common pattern for highly automated, customized manufacturing:

Toolchain Costs and FamiliarityToolchain Costs and FamiliarityToolchain Costs and FamiliarityToolchain Costs and Familiarity

1. At first, everything that’s built is one-of-a-kind

2. After manufacturing becomes highly automated, only a few different styles are manufactured

3. Later, automation is introduced within the manufacturing process itself, and a great deal of variety becomes possible

In other fields, this is called “mass customization”.

J. Fisher // HP Labs, 7/99

Mass Customization

J. Fisher // HP Labs, 7/99

Model #1

Model #2

Model #3

Model #4

Model #5 ...

ARCHITECTURAL DESCRIPTIONS

Software tool base doesn’t vary with new ISAs--presents single family view to programmers.

(Compiler, assembler, OS, debuggers, simulators, …)

Most difficult part is

compilerthat doesn’t compromise

ILPbecause ofscalability.

Software development is done relative to toolchain, not the hardware.

This is easy to talk about, but hard to implement!

It’s especially hard to write a state-of-the-art compiler that finds ILP and does all other desirably optimizations, while maintaining this philosophy.

Applying “Mass Customization” Applying “Mass Customization” — Build BuildScalable and Customizable ToolchainsScalable and Customizable Toolchains

J. Fisher // HP Labs, 7/99

1. All toolchain changes support all architectures in range

2. Testing methodology uses architectures as if they were test programs (thus NxM tests)

3. Preserve C semantics as best you can (even when you can't really)

4. Fast and accurate simulation of everything. (Direct execution simulation—at HPLC we call it “compiled simulation”)

This requires enormous discipline. You’re always faced with this choice:

• Do a quick thing you can do to make it work on this architecture, or

• Do something painstaking to preserve the generality.

Scalable and Customizable ToolchainsScalable and Customizable Toolchains

Here’s what you probably want to do:

J. Fisher // HP Labs, 7/99

Existing binaries barrier—A general-purpose processor must be compatible; This is even becoming more relevant to embedded processing

Lost savings/higher chip cost due to lower volumes—You lose the ability to leverage sales to many different users

Toolchain development and maintenance costs; user familiarity—Visible ISAs imply a new software toolchain for each processor

Hardware development costs —Each variant processor needs a new chip design

The product development cycle for embedded products—Users don’t develop products in a way that is very friendly to customization

Barriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible Variety

So why not do this? I can think of five big barriers:

J. Fisher // HP Labs, 7/99

The Hardware Design Cost BarrierThe Hardware Design Cost BarrierThe Hardware Design Cost BarrierThe Hardware Design Cost Barrier

(Of these barriers, I have the least I can/will say about hardware design cost.)

Some random points:

1. Not having to pay much attention to binary compatibility allows simpler and more flexible design.

2. As with toolchains, a key is to have the “religion” of designing for families of ISAs.

3. It’s important to leverage the design of family members.

4. I believe that reconfigurable hardware is an important research area. But it seems to be huge factors away from success—I don’t see it making a dent for a long time.

J. Fisher // HP Labs, 7/99

Existing binaries barrier—A general-purpose processor must be compatible; This is even becoming more relevant to embedded processing

Lost savings/higher chip cost due to lower volumes—You lose the ability to leverage sales to many different users

Toolchain development and maintenance costs; user familiarity—Visible ISAs imply a new software toolchain for each processor

Hardware development costs —Each variant processor needs a new chip design

The product development cycle for embedded products—Users don’t develop products in a way that is very friendly to customization

Barriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible VarietyBarriers to Visible Variety

So why not do this? I can think of five big barriers:

J. Fisher // HP Labs, 7/99

Here’s how we wish we could build “appliance” products:

Product Development Cycles and CustomizationProduct Development Cycles and CustomizationProduct Development Cycles and CustomizationProduct Development Cycles and Customization

1. Determine market needs

2. Do the application development

3. Design the final physical product sans final processor (use a similar, but different, processor in this phase).

4. Design and plug in the microprocessor very late in the design process, sort of as if it were the logo, or the power cord.

J. Fisher // HP Labs, 7/99

But instead, here’s what seems to happen (not at HP in particular, everywhere we’ve looked):

1. Do application development while designing/specifying the hardware

2. Start Determining market needs

3. Design final hardware, including selecting microprocessor (continue to do application development)

4. Debug and certify hardware for ~8-18 months (continue to do application development)

5. Determine market needs again, now that you’ve seen hardware and time has passed (continue to do application development)

6. Ship the product (continue to do application development)

Product Development Cycles and CustomizationProduct Development Cycles and CustomizationProduct Development Cycles and CustomizationProduct Development Cycles and Customization

J. Fisher // HP Labs, 7/99

Existing binaries barrier—Someone’s going to jimmy your code at run-time anyway, will cause “ISA drift”; For embedded processors, solutions below apply

Lost savings/higher chip cost due to lower volumes—At least for embedded processors, system-on-chip technology changes this dramatically

Toolchain development and maintenance costs; user familiarity—Toolchains must treat ISAs like data. With respect to toolchains, processors can & should look like members of a superscalar family

Hardware development costs—Design with this paradigm in mind; simpler processors that don’t have to wrestle with binary compatibility are more plastic

The product development cycle for embedded products—Get a better class of users ; ok, that won’t work, so customize a little less precisely when necessary

Solutions to the Barriers to Visible VarietySolutions to the Barriers to Visible VarietySolutions to the Barriers to Visible VarietySolutions to the Barriers to Visible Variety