[harvard cs264] 15a - the onset of parallelism, changes in computer architecture and...
http://cs264.org
http://goo.gl/mBWaO
The Onset of Parallelism
Changes in computer architecture and Microsoft’s role in the transition
David Rich April 2011
Your introduction – some questions…
! What kind of software do you see yourself working on in the future? Scientific? Web? Games? Business?
! Have you worked on a distributed app? MPI?
! Have you used Visual Studio?
! Which will limit performance in the future: Power consumption? Latency? Lack of parallelism? Bugs?
! Made in 1922 by Robert Flaherty
! Considered to be the first full-length documentary, though some scenes were staged
! http://en.wikipedia.org/wiki/Nanook_of_the_north
Job Specialization
Bricklayer / Masonry; Carpenter; Caulker / Pointer / Cleaners; Cement Mason; Construction Lineman; Drywall Finisher/Taper; Electrician, Elevator Mechanic; Electrician, HVAC--Environmental Control System Servicer & Installer; Electrician, General Journeyman (Inside); Electrician, Limited Energy Technician A; Electrician, Limited Energy Technician B; Electrician, Limited Renewable Energy Technician; Electrician, Limited Residential; Electrician, Sign Maker-Erector / Sign Hanger / Sign Assembler-Fabricator; Exterior/Interior Specialist (metal framing & drywall); Finisher, Masonry; Floorcoverer; Glazier (construction); Heat / Frost Insulator; Heavy Duty Repairer; Industrial Pipefitter (construction); Industrial Welder (construction); Ironworker, Structural; Laborer; Marble Setter, Masonry; Millwright Construction Machinery Erector; Operating Engineer; Painter--Decorator / Traffic Control Painter; Pile Driver; Pipefitter; Plasterer; Plumber; Renewable Energy Technician; Roofer; Scaffold Erector; Sheet Metal Worker; Solar Heating/Cooling Systems Installer; Sprinkler Fitter; Steamfitter; Technical Engineer; Terrazzo Worker, Masonry; Tilesetter, Masonry; Tree Trimmer, Power Line; Truck Driver (Heavy)
• What about?
– Architect
– Surveyor
– Inspector
• Or people that work in the companies that produce pre-fab components?
– Pipes, wires, windows, fixtures, etc.
Guggenheim Museum in Bilbao
Acorn pre-fab house
Preparing for the Future – What Will Your Machine Look Like in 5 to 10 Years?
! Look at the Top500, predict and divide:
1. At any point in time, most organizations can afford a machine which is 1/1000th the size of the #1 machine on the Top500
2. Exaflop comes from 2x efficiency, 2x frequency and 100x the cores
         Today’s #1         Test: Is this within       Exaflop        Your Future
         Tianhe-1A          your budget? (1/1000th)                   Platform
Perf:    2.5 PFs            250 TFs                    1000 PFs       1 PF
Nodes:   7,168              7                          500,000?       500?
Cores:   X86: 86,016        X86: 86 (~14 Xeons)        130 Million    130 Thousand
         GPU: 3,211,164     GPU: 3,211 (~7 Tesla)
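The "predict and divide" arithmetic above can be sketched in a few lines. This is only a back-of-envelope illustration using the slide's own round numbers; the function names are made up for the example and are not from any library.

```python
# Back-of-envelope "predict and divide" arithmetic, using the slide's round numbers.
# Function names are illustrative only.

def exaflop_scaling(efficiency=2, frequency=2, cores=100):
    """Rule 2: the three improvement factors multiply."""
    return efficiency * frequency * cores

def affordable_share(top_value, divisor=1000):
    """Rule 1: most organizations can afford about 1/1000th of the #1 machine."""
    return top_value / divisor

today_pflops = 2.5                                         # Tianhe-1A, rounded
factor = exaflop_scaling()                                 # 2 * 2 * 100
future_top_pflops = today_pflops * factor                  # projected #1 machine
your_future_pflops = affordable_share(future_top_pflops)   # 1/1000th of that

print(f"scaling factor:       {factor}x")
print(f"future #1 machine:    {future_top_pflops:.0f} PF")
print(f"your future platform: {your_future_pflops:.0f} PF")
print(f"nodes today / 1000:   {affordable_share(7168):.0f}")
```

The point of the exercise is that most of the projected gain (100x out of 400x) has to come from more cores, not from faster ones.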
Number of Cores in Top500 #1 Over Time
[Chart: number of cores in the #1 system over time, y-axis 0 to 250,000 cores; systems labeled: ASCI Red, Earth Simulator, Fujitsu, Blue Gene, RoadRunner, Jaguar, ASCI White]
Core Counts On the Rise
[Chart: number of cores in the Top500 #1 system, Jun 93 to Nov 10, y-axis 0 to 3,500,000 cores. Annotation: Tianhe-1A, GPUs Get to #1...]
Good News: Everybody gets a Petaflop!
Bad News: You have to find 200,000-way parallelism!
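One way to see why this is bad news is Amdahl's law, which the slides do not name; it is brought in here only as an illustration. If a fraction s of the work is serial, the speedup on n cores is at most 1 / (s + (1 - s) / n). The sketch below evaluates that bound for n = 200,000; the serial fractions are arbitrary example values.

```python
# Amdahl's law: upper bound on speedup when a fraction of the work stays serial.
# The serial fractions below are arbitrary examples, not measurements.

def amdahl_speedup(serial_fraction, n_cores):
    """Return the ideal speedup on n_cores for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

N_CORES = 200_000

for s in (0.0, 1e-6, 1e-4, 1e-2):
    speedup = amdahl_speedup(s, N_CORES)
    print(f"serial fraction {s:>8.6f}: speedup at most {speedup:,.0f}x")
```

Even a serial fraction of one part in ten thousand caps the speedup at under 10,000x, a small fraction of what 200,000 cores could deliver.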
Caveat: No biology since high school…
Niche vs. Commodity Computing in HPC
80’s 90’s 00’s 10’s 20’s
Vertically Integrated Single Machines IBM, Digital, Cray, HP (Apollo, Data General, Prime, Masscomp, Gould…)
Cluster of SMP RISC + *nix + MPI IBM, Digital, SGI…
Commodity Clusters Horizontal Industry 64bit x86 + Linux HP, IBM, Dell–many others
Commodity Clusters Plus: GPU, Multicore, Cloud, FPGA, “big data” & Windows!
Homogeneity
?
?
“Perfect Predator” Performance growth with decreasing cost and no code changes.
www.calxeda.com
15 years 450 MM users
13 years 550 MM users/mo.
12 years 40 Petabytes/mo.
11 years 12 Bil queries/mo.
7 years 5 Bil conf mins/yr.
6 years 2 Bil emails/day
2 years 12 MM users
Update
500 Million active Windows Live IDs
9.9 Billion messages / day via WL Messenger
Over 1 Million BPOS Users in 36 Countries
Microsoft’s Datacenter Evolution
Generation 1: Datacenter Co-Location
Generation 2: Quincy and San Antonio
Generation 3: Chicago and Dublin
Generation 4: Modular Datacenter
[Diagram labels: Server, Capacity, Time to Market, Lower TCO, Facility PAC]
Generation 3 - Chicago Data Center
$500M+ investment
707,000 sq ft
1.5 million person hours-of-labor
3000 construction related jobs
7.5 miles of chilled water piping
3400 tons of steel
2400 tons of copper
26,000 cubic yards of concrete
190 miles of conduit
Visual Studio
! Visual Studio is used by over half of the professional programmers in the world
! VS2010 – released a year ago – has been downloaded over 7 million times (more than 4 million extension downloads)
! Main point: when we release a new capability into Visual Studio it automatically gets large adoption
! (story about the ISC developers)
Microsoft and GPUs
The volume business….
GPU Hardware Evolution
Year   Version     Defining Feature
1996   DirectX3    Hardware rasterization
1997   DirectX5    2 Shading options to select
1998   DirectX6    Multi-texture operations
1999   DirectX7    Vertex Processing in hardware
2000   DirectX8    Programmable Shaders: Vertex and Pixel
2002   DirectX9    High Level Shading Language, 32 instr
2003   DirectX9c   1000s of instructions per shader
2006   DirectX10   Unified Shaders: consistent shader models
2009   DirectX11   Compute Shader: explicit SIMD, random I/O
The GPGPU Software Stack
! Windows has broad support at all levels:
• Supports all HW
• Each of CUDA, OpenCL and DirectCompute
• Almost all high level tools and libraries
Hardware: GPU: AMD & NVIDIA; Multicore x86: AMD & Intel
Low Level Programming: CUDA, OpenCL, DirectCompute
High level tools and libraries: PGI “x86 CUDA”, CAPS, Culatools, Volara, Acceleware
DirectCompute
! What is DirectCompute?
• Microsoft’s GPGPU Programming Solution
• API of the DirectX Family
• Component of the Direct3D API
! Why Use DirectCompute Over Other APIs?
• Interoperability with rest of 2D, 3D, Video rendering APIs (display computed results)
• Cross-hardware compatibility
• Feature compatibility guarantees
• Access to fixed-function hardware
! Used extensively by the gaming community
http://msdn.microsoft.com/directx
GPGPU Development on Windows
! Choice: CUDA, OpenCL or DirectCompute
! Tools and libraries: Nsight and Visual Studio, PGI, CAPS, MATLAB, Jacket, PyCUDA, Quantifi, CUDA.NET, Culatools, NAG, Scicomp… many others
! NVIDIA reports that over 80% of CUDA SDK downloads are for Windows
Microsoft and NVIDIA
NVIDIA’s Parallel Nsight is integrated with Microsoft’s Visual Studio
[Diagram: Desktop Computer running MATLAB with Parallel Computing Toolbox; Computer Cluster running MATLAB Distributed Computing Server workers on Windows HPC Server]
HPC Edition
SOA
MPI Cluster SOA Excel ISV / OSS
Applications
Operating Systems
HPC Middleware Pack
HPC Applications
On Premise Cluster Computing
*Note that in SP1 support for MPI applications on Azure does not exist.
Performance Parity Between Linux and Windows
Elapsed Time [secs]
Cores            1        2        4        6        8        16       24      32      48
RedHat 5 U3      5200.43  3385.17  3095.72  2281.25  1790.59  1014.42  776.71  638.43  621.42
Win HPC R2 SP1   5404.38  3298.55  3175.9   2171.37  1736.11  992.82   745.43  610.88  549.74
[Chart: Elapsed Time [secs] versus number of cores, y-axis 500 to 5500. 1 Million active Cells, 1000 wells, Blackoil]
Make your choice based on features and TCO…
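The parity claim can be checked directly from the elapsed times in the Linux/Windows table. The sketch below is not part of the original benchmark; it only derives the speedup over the 1-core run and the relative difference between the two systems at each core count, and the helper names are made up for the example.

```python
# Derived metrics from the elapsed-time table (seconds); data copied from the slide.
cores = [1, 2, 4, 6, 8, 16, 24, 32, 48]
redhat = [5200.43, 3385.17, 3095.72, 2281.25, 1790.59, 1014.42, 776.71, 638.43, 621.42]
win_hpc = [5404.38, 3298.55, 3175.9, 2171.37, 1736.11, 992.82, 745.43, 610.88, 549.74]

def speedups(times):
    """Speedup of each run relative to the 1-core run of the same system."""
    return [times[0] / t for t in times]

def relative_diffs(baseline, other):
    """(other - baseline) / baseline; negative means 'other' finished sooner."""
    return [(o - b) / b for b, o in zip(baseline, other)]

linux_speedup = speedups(redhat)
windows_speedup = speedups(win_hpc)
win_vs_linux = relative_diffs(redhat, win_hpc)

for n, ls, ws, d in zip(cores, linux_speedup, windows_speedup, win_vs_linux):
    print(f"{n:>2} cores  Linux {ls:5.2f}x  Windows {ws:5.2f}x  Windows vs Linux {d:+.1%}")
```

Neither system scales linearly on this workload, so the per-core-count difference between them is small compared with the gap to ideal speedup.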
NEW
Excel Workbook on the Cluster
! Run multiple instances of Excel 2010 on an HPC Cluster
! Each instance runs an iteration of the same workbook
! Can be launched from Excel 2010 or a Windows program
! Excel Dialog Suppression
Excel UDF on the Cluster
! Run User Defined Functions in parallel on a cluster
! Excel 2010 includes a new API and options for HPC cluster
! Support for .XLL files developed through Excel SDK
! Easy to develop on a desktop and then deploy to a cluster
Excel SOA Client
! Connects to the cluster as a SOA client
! VSTO code in workbook calls out to SOA Service
! Input and output managed by Excel developer
NEW
NEW
! Use Azure servers to run HPC compute Jobs
! Can be used to “burst-out” to the cloud to handle peak demand
! Can create clusters that include dedicated on-premise servers, non-dedicated workstations and shared Azure servers
! Jobs can run unchanged across all 3 types of compute nodes (no support for MPI in SP1)
! Azure nodes are added to cluster using the Administration console (just like Workstation nodes)
Jobs Requests
Head & Broker Nodes
Azure Gateway
Azure
HPC Clients
On-premise
• “Burst” into cloud on-demand while keeping control over data and corporate policies
• Pay only for what you use
• A stepping stone to hybrid and public clouds.
• Dynamically adjust how much runs on-premise and in the cloud
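The burst idea in the bullets above can be illustrated with a toy placement policy: fill on-premise cores first, send the overflow to the cloud up to some cap, and leave the rest queued. This is a hypothetical sketch only; the function, its parameters and the policy itself are assumptions for illustration and are not the Windows HPC Server scheduler or any Azure API.

```python
# Toy "burst" policy: on-premise first, overflow to the cloud, remainder waits.
# Everything here is hypothetical; it is not the HPC Server scheduler.

def split_demand(demand_cores, on_prem_cores, max_azure_cores):
    """Return (on_premise, azure, queued) core counts for a given demand."""
    on_premise = min(demand_cores, on_prem_cores)
    overflow = demand_cores - on_premise
    azure = min(overflow, max_azure_cores)   # pay only for what the overflow needs
    queued = overflow - azure                # demand beyond the cloud cap waits
    return on_premise, azure, queued

for demand in (100, 400, 1000):
    on_premise, azure, queued = split_demand(demand, on_prem_cores=256, max_azure_cores=512)
    print(f"demand {demand:>4}: on-premise {on_premise}, Azure {azure}, queued {queued}")
```

Changing `max_azure_cores` is the knob that corresponds to adjusting how much runs on-premise versus in the cloud.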
Azure
Compute Nodes
Desktops HPC Head Node
Broker Node
Compute Nodes On-Premise and in Azure Simultaneously
Azure
Compute Instances
Compute Proxies
“Combined with Intel Parallel Studio, I think it is reasonable to say that Windows has the richest and most complete set of tools for multicore programming”. -- James Reinders, Intel, 12-April-2010
Parallel Development
Solution Begins with DEVELOPERS
Make it easier to express and manage the correctness, efficiency and maintainability of parallelism on Microsoft platforms for developers of all skill levels
Enable developers to express parallelism easily and focus on the problem to be solved
Improve the efficiency and scalability of parallel applications
Simplify the process of designing and testing parallel applications
Visual Studio 2010 Tools, Programming Models, Runtimes
Tools (Visual Studio IDE): Parallel Debugger Tool Windows, Profiler Concurrency Analysis
Programming Models, Managed (.NET Framework 4): Task Parallel Library, Parallel LINQ, Data Structures
Programming Models, Native (Visual C++ 10): Parallel Pattern Library, Agents Library, Data Structures
Runtime, Managed: ThreadPool (Task Scheduler, Resource Manager)
Runtime, Native: Concurrency Runtime (Task Scheduler, Resource Manager)
Operating System (Windows): Threads, UMS Threads
World’s Fastest House Construction Three and a Half Hours
http://www.microsoft.com/hpc
David Rich darich at microsoft.com
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.