Advanced Processor Technologies group overview
1
Advanced Processor Technologies
group overview
2
APT mission
“To explore novel architectures and techniques that will enable the effective exploitation of the billion transistor chips of the near-future”
3
APT group
• Focus:
  – Moore’s Law will soon deliver billion transistor chips
  – how do we make best use of a billion transistors?
    • parallel processing
    • systems-on-chip
    • novel architectures
    • …?
4
Strategy/Vision
• Industry shift to multicore processors
  – directly addressed by our CMP work
• Power/heat is performance-limiting
  – asynchronous and low-power design have growing importance
• Timing closure is a critical problem
  – acceptance of mixed timing and GALS
• Design automation is vital
  – async automation must be competitive
5
Strategy/Vision
• Can university groups design state-of-the-art digital silicon?
  – probably not in conventional processors
  – few academic groups still fab digital chips
• Is trying to take designs through to fabrication still a good idea?
  – we believe so, because ‘reality’ matters!
  – but the game is very tough indeed
6
Many-core Architecture and Software
Mikel Lujan
7
Buying a single-core processor is difficult!
Multi-cores bring fundamental changes for Computer Science
[applications, programming languages, compilers, runtime systems (OS), computer architecture]
8
Active projects
• Managed Runtime Environments and Low-Power Many-core Architectures
  – DOME: Delaying and Overcoming Microprocessor Errors
• TeraFlux
  – On the search for a “good” parallel computational model
• AXLE
  – Accelerating Analytics of Big Data
9
Managed Runtime Environments
• Java and .NET are examples of managed runtime environments (JVM, CLR)
• Key elements: JIT compilation and control of memory allocation
• Research opportunities:
  – Scaling MREs for many-core architectures (GPUs)
  – Hardware acceleration of MREs
  – Use MREs for low-power computing
  – Use MREs for dealing with faults and transistor wearout -> DOME
10
TeraFlux Project
• Major focus of current ‘General Purpose’ Many-Core research.
• Three major goals
  – To define the hardware architecture of a highly extensible, general purpose multi-core system
  – To develop a simple to use parallel programming approach based on programming with
    • side-effect-free computations + transactions
  – How do we simulate/prototype many-core architectures?
11
Starting Assumptions
• Requiring strongly consistent shared memory is a major impediment to extensibility
• The efficient scheduling of control-flow based threads is hard
• The major complexity in parallel programming is the handling of shared state (locks etc.)
12
Simulate/Prototype many-core architectures
• Designing a chip is expensive and time consuming
• Computer architects build software models to simulate new architectures
• Simulation can be slow (months to run one application)
• How can we accelerate this process? Research opportunities:
  – New modelling techniques
  – FPGA prototyping
13
AXLE & Big Data
• Collaboration with Dr. Gavin Brown (MLO group)
• Amount of data generated in scientific experiments or social web keeps growing!
• Graph-based data -> complex computation
• How can we make sense of this data deluge?
  – New learning techniques capable of working at scale
  – Redesign architectures (clusters/data centres) and software for low power analytics
  – Accelerate software (JIT adaptation) for data processing
  – Hardware acceleration for low-power learning algorithms
14
For more background info
• "Future Multi-core Computing" (COMP6062b)
  – Learn by directed reading and group discussions of research papers
  – Practice parallel programming in the labs
• Watch out for the organised ARM & Intel school seminars in Nov and Dec
15
Communication Architectures
Javier Navaridas
16
Interconnection Networks
• On-chip networks
  – Tile-based systems
  – Heterogeneous systems
• High performance computing networks
  – Massively Parallel Processing systems
  – Compute Clusters
  – Datacentres
17
Topics
• Topologies
  – Routing
  – Wiring
  – Fault resilience
  – Deadlock avoidance
• Router microarchitecture
  – Congestion control
  – Quality of Service
  – Fault tolerance
• Scheduling and resource management
  – Task placement
• System and workload modelling
  – Analytical modelling
  – Simulation
18
Virtualization
Alasdair Rawsthorne
19
Unifying System and Process Virtualization
• Potential benefits: performance, power, design time, security
• Impacts design of future compilers, OS, CPU and runtimes
[Figure: four software stacks compared]
• Unvirtualized: Application, Operating System, CPU
• System Virtualization (e.g. Xen, VMware, VirtualBox): adds a Hypervisor/VMM
• Process Virtualization (e.g. JVM, Rosetta, DynamoRIO, Valgrind): adds a Dynamic Runtime
• Unified Virtualization: adds an Optimizing VMM
20
Neural Systems Engineering
Steve Furber, Jim Garside, Dave Lester
21
The SpiNNaker project
• Multi-core CPU node
  – 18 ARM968 processors
  – to model large-scale systems of spiking neurons
  – in biological real time
• Scalable up to systems with 10,000s of nodes
  – over a million processors
  – >10^8 MIPS total
22
Current status…
• Full 18-core chip: arrived 20 May 2011
• Test card: 4 chips, 72 processors
  – Cards can be linked together
• Neuron models: LIF, Izhikevich, MLP
• Synapse models: STDP, NMDA
• Networks: PyNN -> SpiNNaker, various small tools to build Router tables, etc
• 48-chip 10^3 machine
…and the next steps:
• 500-chip 10^4 machine (Q4 2012), 5,000-chip 10^5 machine (H1 2013), 50,000-chip 10^6 machine (H2 2013).
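The power-of-ten machine names appear to track the approximate processor count rather than the chip count. A quick check, assuming every chip contributes all 18 of its cores (the dictionary labels and the function name are chosen here for illustration):

```python
CORES_PER_CHIP = 18  # from the slides: each SpiNNaker chip has 18 ARM968 processors

# Chip counts quoted on the slides; the keys are labels chosen for this sketch.
MACHINES = {
    "test card": 4,
    "48-chip machine": 48,
    "500-chip machine": 500,
    "5,000-chip machine": 5_000,
    "50,000-chip machine": 50_000,
}


def processor_count(chips, cores_per_chip=CORES_PER_CHIP):
    """Total processors in a machine built from identical chips."""
    return chips * cores_per_chip


for name, chips in MACHINES.items():
    print(f"{name}: {processor_count(chips):,} processors")
```

The test card comes out at 72 processors, matching the slide, and the four machines at 864, 9,000, 90,000 and 900,000 processors, each close to its power-of-ten label.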
23
PhD projects
• Recent:
  – SpiNNaker monitoring
  – PyNN -> SpiNNaker
  – Real-time neural learning algorithms
  – Modelling the rat barrel cortex
  – Technology scaling on SpiNNaker
  – Error correction with CRC
Technology Scaling
• 90nm SpiNNaker CPU node
• SP library is faster
• requires 128k DTCM
• LL library better overall?
(work by Eustace Painkras, UoM PhD)
25
PyNN -> SpiNN
• LIF
• Izhikevich
26
PhD projects
• Future:
  – System software
    • run-time fault-tolerance, scaling, …
  – SpiNNaker2 architecture exploration
  – Neural network models
    • learning algorithms, rewiring
  – Robotics using SpiNNaker
  – Non-neural algorithms
    • graphics, physics modelling, …
27
Emerging Technologies for Integrated Circuits and Systems
Let’s do some hard(ware) work
Vasilis Pavlidis
www.cs.man.ac.uk/~pavlidiv
28
3-D Integration Opportunities
• The same total area for the two circuits
• R_TSV = 170 mΩ, C_TSV = 2 fF
• RC values for 65 nm*; delay improvement: 54%
• Integrate disparate technologies/components
[Figure: 2-D global wire of 20 mm vs. 3-D global wire of 12 mm]
* “ASU Predictive Technology Model.” [Online]. Available: http://www.eas.asu.edu/~ptm/
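A back-of-the-envelope view of why the shorter 3-D wire helps, using two idealised scaling rules that are assumptions here and not taken from the slides: delay proportional to length for an optimally repeated wire, and delay proportional to length squared for an unrepeated distributed RC wire. TSV parasitics and driver/load effects are ignored, so this does not reproduce the slide's own calculation.

```python
def delay_reduction(len_2d_mm, len_3d_mm, exponent):
    """Fractional delay reduction when wire delay scales as length**exponent."""
    return 1.0 - (len_3d_mm / len_2d_mm) ** exponent


# Wire lengths from the slide: 20 mm in 2-D, 12 mm in 3-D.
repeated = delay_reduction(20.0, 12.0, exponent=1)    # delay ~ L (assumed)
unrepeated = delay_reduction(20.0, 12.0, exponent=2)  # delay ~ L^2 (assumed)

print(f"repeated wire:   {repeated:.0%}")
print(f"unrepeated wire: {unrepeated:.0%}")
```

The two idealised limits come out at 40% and 64%; the 54% improvement quoted on the slide lies between them.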
29
Three-Dimensional (3-D) Integrated Circuits and Systems
• Develop design methodologies for 3-D ICs
• New models are required to consider the third physical dimension
• Diverse technologies
  – SiP, interposer, TSVs
• Many challenges exist down the road!!!
  – Be the first to address them
• Opportunities to tape-out do exist!
  – CMP/Tezzaron - cmp.imag.fr
  – Cadence PDK - 3-D Encounter
[Figure: Xilinx FPGA, Virtex 7]
30
A New Circuit Design Paradigm (Safe Projects)
• (Re-)Design and assess SpiNNaker-based 3-D architectures
  – Power, area, performance, cost/yield
  – Interposer and TSV technologies
• Research methodology
  – Use available resources
  – Differentiate only where required
• Other topics
  – Can resonance improve energy efficiency of GALS based architectures?
  – Design for manufacturability for GALS systems 2-D/3-D
    • Considering process, voltage, and temperature (PVT) variations
    • PVT behavior is substantially different in 3-D systems
• Develop/extend CAD tools for the physical design of 3-D systems
  – Special focus on interposer technologies
31
3-D Integration as a System Integration Approach (High-Return Projects)
• Heterogeneous 3-D integration
  – Preached a lot but not explored (at all)!
• Memory on logic is a single application
• Develop techniques and methods for “Mix-and-Match” systems
  – How do you model…?
  – How do you evaluate…?
  – How do you integrate…?
  – How do you manufacture…?
• The physical proximity of diverse systems may not come for free!
Interdisciplinary research is a prerequisite for such systems
Rather application driven
32
PhD Guidelines
• Persistence, Persistence, Persistence!
• Manage rejection
• Be there early!
• Citations count for more than publications
• Presentation and writing skills
PhD is NOT an end in itself but a means to an end!
33
Asynchronous Logic Design Tools
[Doug Edwards,] Jim Garside, Steve Furber, Alasdair Rawsthorne
34
Previous Projects
• Balsa
  – world-leading public asynchronous synthesis tool
  – used for complete microprocessors
• SEDATE
  – delay-insensitive datapath synthesis
• GALSA
  – framework for heterogeneous GALS
• ...
35
GAELS
• Globally Asynchronous Elastic Logic Synthesis
  – modern SoCs comprise numerous, semi-autonomous subsystems
  – shrinking transistors have hard-to-predict variations
• Address using Elastic Logic
  – new, delay tolerant paradigm
  – new project!
36
Reconfigurable Processing
Jim Garside
37
Current Computing
• Energy use is a problem
• Software
  – offers processing flexibility
  – highly inefficient, big overheads
• Hardware
  – limited programmability
  – greater efficiency
  – expensive to develop
38
A Solution?
• Compile an algorithm into a mixture of hardware and software
  – how to partition the 'code'?
  – dynamic adaptation
• Existing solutions tend towards static partitioning
  – require wide skills from developers
  – sacrifice potential flexibility
  – intolerant of differing hardware
39
Dynamic Reconfiguration
• Keep algorithm in common 'object' format
• Identify, 'compile' and run repeating sections in available hardware
• Adapt to facilities of any given chip
  – allow for future portability
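The slides do not say how repeating sections are identified. One common heuristic, shown here purely as an illustrative assumption, is to count taken backward branches in an execution trace and treat frequently hit branch targets as loop heads worth offloading. The function name, trace format and threshold below are all hypothetical.

```python
from collections import Counter


def find_hot_loops(branch_trace, threshold=1000):
    """Return loop-head addresses that are hit at least `threshold` times.

    branch_trace: iterable of (source_pc, target_pc) pairs for taken branches
    (a hypothetical trace format). A branch whose target is at or before its
    source is treated as a loop back-edge.
    """
    back_edge_counts = Counter(
        target for source, target in branch_trace if target <= source
    )
    return sorted(pc for pc, hits in back_edge_counts.items() if hits >= threshold)


# Synthetic trace: one hot loop, one cold loop, and a frequent forward branch.
trace = [(0x40, 0x10)] * 1500 + [(0x80, 0x60)] * 10 + [(0x20, 0x90)] * 5000
print([hex(pc) for pc in find_hot_loops(trace)])
```

Only the loop head at 0x10 passes the threshold; the forward branch is ignored however often it is taken, because it does not close a loop.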
40
To date ...
• Can identify critical loops and recompile them to hardware
  – using pre-existing code
• Developing tool flow
• Have reasonable reconfigurable hardware architecture
Results
• Promising, not 'earth shattering'
41
Future
• Want:
• Means of expressing algorithms allowing easy compilation into software or hardware
• Extract/exploit sensible parallelism
  – 'fine grain' for hardware
  – 'coarse grain' (?) for software
• Get (some of) the available speed/power efficiency
42
Mobile Systems Architecture
Nick Filer with help from Barry Cheetham
43
Nick Filer
• Interests:
  – Wireless networks of all types. Mainly:
    • Ad-hoc,
    • Voice over IP,
    • Sensors (data collection),
    • Pocket networks (e.g. mobile phones, PDAs),
    • Information dissemination.
  – Supported by:
    • Simulation, analysis, software generation tools.
  – eLearning tools for science.
44
Current Interest - 1
• Pocket Networks
  – Based on clusters of mobile users.
  – Person to person transport.
  – What applications are useful, will work, when and how will applications work?
    • Voice?
    • Video?
    • Delay tolerant text messages?
45
Current Interest - 2
• Low power Wireless Sensor Networks
  – Algorithms for reduced power usage, mainly getting it low by design.
  – Intelligent transport/routing protocols driving low power packet routing.
  – Smart dust:
    • Current cost $100+, needs to be cheaper.
    • Ultra-low power (NEW): processor, memory, design.
    • Nano scale. E.g. for use down oil wells!
46
Current Interest - 3
• Hand-over in mobile wireless networks.
  – Pretty much a solved problem (even if not always ideal) for mobile phones.
  – Close to solutions for WiFi, WiMAX, Bluetooth, Zigbee etc. Still lots to learn though.
  – Currently a 3 layer hierarchy: infrastructure, Wide Area, Personal Area.
  – What happens with more layers?
    • Macro scale to nano scale?
    • Fixed infrastructure interacting with mobile autonomous agents?
    • Just how inefficient are these mechanisms currently?
47
Current Interest - 4
• Information dissemination in mobile ad-hoc networks.
  – P2P technologies.
  – P2P optimization for task, availability, handover, low energy, access latency…
  – P2P to aid DNS-like queries (information retrieval) in mobile, changing topology networks.
  – Delay tolerant P2P. Opportunistic communications, e.g. send 100,000 sensors down an oil well, get 1 back, what does it know? Own data, others' data?
48
Current Interest - 5
• Real time distributed systems (sound and video)
  – Internet choir
    • Very tight audio constraints (max 50ms)
    • Demands of latency & bandwidth
  – Singing together
    • Less constrained internet choir but synchronization very difficult.
  – Broadcast simulcasts
    • Mixed video and sound from various locations.
    • Broadcast over multiple media types with different delay etc. characteristics.
  – Major Obstacles:
    • Media types and standards, protocols, congestion, error handling, signal processing, links to hand-over problems ....
Joint with Barry Cheetham
49
Current Interest - 6
• Support for adaptable network stacks
  – Writing or changing software is time consuming, error prone, …
  – Models can capture semantics of software: Purpose, usage, transformation knowledge ...
  – Hence: Use models to generate implementations.
    • Use in teaching/learning, simulation, network stack implementation.
50
Current Interest - 7
• eLearning for Complex Systems
  – Most eLearning tools you have seen are not much more than Content Management Systems.
  – There is currently little or no evidence they improve student grades!
  – We have on-going work looking at improving understanding of wireless systems.
  – Also, interested in science teaching for awkward adolescents.
Joint with Barry Cheetham
51
Arithmetic and Control Theory
Dave Lester
52
Arithmetic and Control Theory
• Exact Arithmetic
  – NASA/Boeing
• Correctness of Control Theory Applications
  – Airbus
• Formalisation and Mechanisation of Probabilistic Reasoning