Announcing Amazon EC2 F1 Instances with Custom FPGAs


Page 1: Announcing Amazon EC2 F1 Instances with Custom FPGAs

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

David Pellerin, Business Development Principal

December 15, 2016

Announcing Amazon EC2 F1 Instances with Custom FPGAs

Hardware-Accelerated Computing on AWS F1

Page 2: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Agenda

1. Accelerated Computing Concepts

2. Introducing F1 FPGA Instances

3. Examples of FPGA Use-Cases

4. FPGA Development Process

Page 3: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Accelerated Computing on EC2

Page 4: Announcing Amazon EC2 F1 Instances with Custom FPGAs

EC2 Compute Instance Types

[Figure: Timeline of EC2 instance type announcements, 2010–2016 – General purpose (M3, M4, T2), Compute optimized (CC2, C3, C4, C5), Storage and I/O optimized (HS1, I2, D2, I3), Memory optimized (R3, R4, X1), GPU and FPGA accelerated (CG1, G2, P2, and F1, announced in preview in 2016)]

Page 5: Announcing Amazon EC2 F1 Instances with Custom FPGAs

NVIDIA Tesla GPU Card

P2: GPU-accelerated computing
• Enabling a high degree of parallelism – each GPU has thousands of cores
• Consistent, well-documented set of APIs (CUDA, OpenACC, OpenCL)
• Supported by a wide variety of ISVs and open source frameworks

Xilinx UltraScale+ FPGA

F1: FPGA-accelerated computing
• Massively parallel – each FPGA includes millions of parallel system logic cells
• Flexible – no fixed instruction set, can implement wide or narrow datapaths
• Programmable using available, cloud-based FPGA development tools

GPU and FPGA for Accelerated Computing

Page 6: Announcing Amazon EC2 F1 Instances with Custom FPGAs

CPU: High speed, lower efficiency

GPU/FPGA: High throughput, higher efficiency

GPUs and FPGAs can provide massive parallelism and higher efficiency than CPUs for certain categories of applications

Accelerated Computing Concepts: More parallelism for higher throughput…

Page 7: Announcing Amazon EC2 F1 Instances with Custom FPGAs

A GPU is effective at processing the same set of operations in parallel – single instruction, multiple data (SIMD). A GPU has a well-defined instruction-set and fixed word sizes – for example 32-bit integers, or single-, double-, or half-precision floating point values.

An FPGA is effective at processing the same or different operations in parallel – multiple instructions, multiple data (MIMD). An FPGA does not have a predefined instruction-set, or a fixed data width.
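As a rough illustration of that flexibility, the following Verilog sketch (module and signal names are hypothetical, not taken from the F1 HDK) performs two unrelated operations, on two differently sized datapaths, in the same clock cycle – something a fixed-width SIMD instruction stream cannot express as a single operation:

module mimd_example (
  input  wire        clock,
  input  wire [8:0]  a, b,       // two 9-bit operands
  input  wire [47:0] c, d,       // two 48-bit operands
  output reg  [9:0]  sum,        // 10-bit result of the narrow adder
  output reg  [95:0] product     // 96-bit result of the wide multiplier
);
  // Both operations complete every clock cycle, side by side, in dedicated
  // hardware; there is no shared instruction pipeline.
  always @(posedge clock) begin
    sum     <= a + b;
    product <= c * d;
  end
endmodule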

[Figure: block diagrams of a CPU core (control, ALUs, cache, DRAM), a GPU (thousands of cores with DRAM), and an FPGA (logic cells with Block RAM and DRAM). Each GPU in P2 has 2880 of these cores; each FPGA in F1 has more than 2M of these cells.]

Parallel Processing in GPUs and FPGAs

Page 8: Announcing Amazon EC2 F1 Instances with Custom FPGAs

module filter1 (clock, rst, strm_in, strm_out);
  // Fragment from the slide, tidied into legal Verilog; parameter and
  // signal declarations are elided for brevity.
  integer i, j;                        // index for loops
  always @(posedge clock) begin
    for (i = 0; i < NUMUNITS; i = i + 1) begin
      tmp_kernel[j] = k[i*OFFSETX];    // copy kernel coefficients (illustrative)
    end
  end
endmodule

Within an application, the FPGA handles the compute-intensive, deeply pipelined, hardware-accelerated operations; the CPU handles the rest.

How FPGA Acceleration Works

Page 9: Announcing Amazon EC2 F1 Instances with Custom FPGAs

[Figure: many Process blocks running concurrently, each consuming and producing Data – a parallel system built for a parallel problem]

Hardware-Accelerated Computing: Building parallel systems for parallel problems

Page 10: Announcing Amazon EC2 F1 Instances with Custom FPGAs

An FPGA is effective at processing data of many types in parallel, for example creating a complex pipeline of parallel, multistage operations on a video stream, or performing massive numbers of dependent or independent calculations for a complex financial model…

An FPGA does not have an instruction-set! Data can be any bit-width (9-bit integer? No problem!). Complex control logic (such as a state machine) is easy to implement in an FPGA.
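As a concrete, purely illustrative sketch, the Verilog below combines a 9-bit datapath with a small two-state machine; neither the odd bit-width nor the custom control logic maps naturally onto a fixed instruction set, but both synthesize directly into FPGA logic (module and signal names are hypothetical):

// Hypothetical example: 9-bit accumulator controlled by a two-state FSM.
module acc9 (
  input  wire       clock, rst, start,
  input  wire [8:0] din,          // 9-bit input data
  output reg  [8:0] acc           // 9-bit accumulator
);
  localparam IDLE = 1'b0, RUN = 1'b1;
  reg state;
  always @(posedge clock) begin
    if (rst) begin
      state <= IDLE;
      acc   <= 9'd0;
    end else begin
      case (state)
        IDLE: if (start) state <= RUN;
        RUN:  acc <= acc + din;   // accumulate every cycle while running
      endcase
    end
  end
endmodule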

[Figure: FPGA logic cell array – each FPGA in F1 has more than 2M of these cells]

Parallel Processing in FPGAs

Page 11: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Introducing F1 FPGA Instances

Page 12: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Make FPGA acceleration available to a larger community of developers, and to millions of potential end-customers

Provide dedicated and large amounts of FPGA logic in a single EC2 instance, using multiple FPGAs

Simplify the development process by providing cloud-based FPGA development tools

Allow developers to focus on algorithm design, by abstracting FPGA I/O using well-defined interfaces

Provide access to a growing ecosystem of FPGA programming tools and applications

Provide a Marketplace for FPGA applications, providing more choice and easy access for all AWS customers

FPGA Acceleration in the AWS Cloud: Goals

Page 13: Announcing Amazon EC2 F1 Instances with Custom FPGAs

New EC2 FPGA instance type for accelerated computing, with up to 8 Xilinx UltraScale+ 16nm VU9P FPGA devices in a single instance. The f1.16xlarge size provides:

• 8 FPGAs, each with over 2 million customer-accessible FPGA programmable logic cells and over 5,000 programmable DSP blocks

• 4 DDR-4 interfaces on each of the 8 FPGAs, with each interface accessing a 16 GiB, 72-bit wide, ECC-protected memory (512 GiB of FPGA-attached DDR-4 in total)

Instance Size | FPGAs | DDR-4 (GiB) | FPGA Link | FPGA Direct | vCPUs | Instance Memory (GiB) | NVMe Instance Storage (GB) | Network Bandwidth*
f1.2xlarge    | 1     | 4 x 16      | -         | -           | 8     | 122                   | 1 x 480                    | 10 Gbps peak
f1.16xlarge   | 8     | 32 x 16     | Yes       | Yes         | 64    | 976                   | 4 x 960                    | 30 Gbps

*In a placement group

F1 FPGA Instance Types on AWS

Page 14: Announcing Amazon EC2 F1 Instances with Custom FPGAs

System Logic Block: Each FPGA in F1 provides over 2M of these logic blocks

DSP (Math) Block: Each FPGA in F1 has more than 5000 of these blocks

I/O Blocks: Used to communicate externally, for example to DDR-4, PCIe, or the FPGA Link ring

Block RAM: Each FPGA in F1 has over 60Mb of internal Block RAM, and over 230Mb of embedded UltraRAM
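As a purely illustrative sketch of how these resources get used, the following Verilog describes a pipelined signed multiply-accumulate; synthesis tools typically map this kind of pattern onto a hard DSP block rather than building it from general-purpose logic cells (the operand widths and module name here are illustrative assumptions, not an AWS or Xilinx API):

// Hypothetical example: a pipelined 27x18-bit multiply-accumulate.
// Synthesis will normally place this pattern into a DSP block.
module mac_dsp (
  input  wire               clock,
  input  wire signed [26:0] a,    // 27-bit operand
  input  wire signed [17:0] b,    // 18-bit operand
  output reg  signed [47:0] acc   // 48-bit running sum (reset omitted for brevity)
);
  reg signed [44:0] prod;
  always @(posedge clock) begin
    prod <= a * b;        // multiply stage
    acc  <= acc + prod;   // accumulate stage
  end
endmodule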

[Figure: F1 FPGA floorplan showing Block RAM columns, I/O blocks, four DDR-4 interfaces, PCIe, and the FPGA Link]

What’s Inside the F1 FPGA?

Page 15: Announcing Amazon EC2 F1 Instances with Custom FPGAs

AWS FPGA Shell: FPGA I/O is provided using pre-configured, pre-tested, and secure I/O components, allowing FPGA developers to focus on their differentiating value

The FPGA Shell allows for faster coding of core acceleration functions by removing the need to develop I/O related FPGA hardware
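To make this division of labor concrete, the sketch below shows the general shape of a custom-logic module sitting behind the Shell. This is a minimal, hypothetical illustration – the port names and interface style are invented for clarity and do not reflect the actual Shell interface definitions in the AWS HDK:

// Illustrative only: the Shell terminates the PCIe- and DRAM-facing
// interfaces, so the developer writes only the acceleration kernel.
// All port names below are hypothetical.
module my_custom_logic (
  input  wire         clock, rst_n,
  // Host-facing data delivered by the Shell over PCIe (hypothetical ports)
  input  wire [511:0] host_wr_data,
  input  wire         host_wr_valid,
  // DRAM-facing data handed to the Shell's DDR-4 controllers (hypothetical)
  output reg  [511:0] ddr_wr_data,
  output reg          ddr_wr_valid
);
  // Example kernel: invert the data on its way to DRAM
  always @(posedge clock) begin
    if (!rst_n) begin
      ddr_wr_valid <= 1'b0;
    end else begin
      ddr_wr_data  <= ~host_wr_data;
      ddr_wr_valid <= host_wr_valid;
    end
  end
endmodule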

[Figure: the AWS FPGA Shell surrounds the customer logic region and manages the Block RAM, DDR-4, FPGA Link, and PCIe interfaces]

Abstracting FPGA I/O

Page 16: Announcing Amazon EC2 F1 Instances with Custom FPGAs

[Figure: An EC2 F1 instance combines an Amazon Machine Image (AMI), running the CPU application, with an Amazon FPGA Image (AFI) loaded into the FPGA. The FPGA's DDR controllers connect to DDR-4 attached memory, and the FPGA also connects to the FPGA Link and to the host over PCIe. Workflow: launch the instance, then load the AFI.]

An F1 instance can have any number of AFIs

An AFI can be loaded into the FPGA in less than 1 second

FPGA Acceleration Using F1

Page 17: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Example F1 Use-Cases

Page 18: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Highly Efficient
• Algorithms Implemented in Hardware
• Gate-Level Circuit Design
• No Instruction Set Overhead

Massively Parallel
• Massively Parallel Circuits
• Multiple Compute Engines
• Rapid FPGA Reconfigurability


Speeds Analysis of Whole Human Genomes from Hours to Minutes
Unprecedented Low Cost for Compute and Compressed Storage

F1 for Genomics Processing

Page 19: Announcing Amazon EC2 F1 Instances with Custom FPGAs

F1 for Financial Computing: Modeling Counterparty Risk (CVA) and Regulatory Capital Requirements

Page 20: Announcing Amazon EC2 F1 Instances with Custom FPGAs

F1 for Video Processing: Next Generation Video Compression for Broadcast Quality 4K Content

Successfully ported to F1 in just 3 weeks

Page 21: Announcing Amazon EC2 F1 Instances with Custom FPGAs

F1 for Accelerated Analytics: Heterogeneous Compute Acceleration for Faster Data Discovery

Page 22: Announcing Amazon EC2 F1 Instances with Custom FPGAs

FPGA Development Process

Page 23: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Development steps:

1. Launch the AWS-provided FPGA Developer AMI, which includes all needed FPGA design and programming software, as well as the AWS FPGA Hardware Development Kit (HDK).

2. Use Xilinx Vivado or SDAccel software and a hardware description language (Verilog, VHDL, or OpenCL) with the HDK to describe and simulate your custom FPGA logic (a minimal simulation sketch follows this list).

3. After successful simulation, use Vivado or SDAccel to synthesize and place/route the FPGA logic to create an FPGA Design Check Point (DCP), encrypt it, and generate an Amazon FPGA Image (AFI).

4. Launch an F1 instance and load the AFI to the FPGA, using AFI management tools provided by AWS.
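For step 2, the simulation can be as simple as a small HDL testbench wrapped around the custom logic. The example below is only a sketch of what such a simulation looks like – it reuses the hypothetical acc9 module sketched earlier in this deck and is not part of the HDK's own simulation environment:

`timescale 1ns / 1ps
module tb_acc9;
  reg        clock = 0, rst = 1, start = 0;
  reg  [8:0] din = 9'd3;
  wire [8:0] acc;

  // Device under test: the acc9 accumulator sketched earlier
  acc9 dut (.clock(clock), .rst(rst), .start(start), .din(din), .acc(acc));

  always #5 clock = ~clock;              // 100 MHz clock

  initial begin
    repeat (2) @(posedge clock);         // hold reset for two cycles
    rst = 0; start = 1;                  // then accumulate 3 per cycle
    repeat (10) @(posedge clock);
    $display("final acc = %0d", acc);
    $finish;
  end
endmodule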

Developing Applications for F1


Page 24: Announcing Amazon EC2 F1 Instances with Custom FPGAs

FPGA Logic Design using Xilinx Vivado on a C4 or M4 instance → FPGA Place-and-Route using Xilinx Vivado on a C4 or M4 instance → Generate an Amazon FPGA Image (AFI) → Securely deploy the AFI on one or more F1 instances

Developing Applications for F1

Page 25: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Choose and launch the AWS-provided FPGA Developer AMI, which includes all needed FPGA design and programming software, as well as the AWS FPGA Hardware Development Kit (HDK)

Developing Applications for F1

Page 26: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Developing Applications for F1

Page 27: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Use Xilinx Vivado or SDAccel software and a hardware description language (Verilog, VHDL, or OpenCL) with the HDK to describe and simulate your custom FPGA logic. After successful simulation, use scripts provided with the HDK to encrypt, synthesize, and place/route the FPGA logic to create a final FPGA Design Check Point (DCP) and generate a secure, encrypted Amazon FPGA Image (AFI).

Developing Applications for F1

Page 28: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Launch an F1 instance and load the AFI to the FPGA, using AFI management tools provided by AWS

Generate an Amazon FPGA Image (AFI) → Deploy the AFI on one or more F1 instances

Developing Applications for F1

Page 29: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Amazon EC2 FPGA Deployment via Marketplace

[Figure: Sellers publish an Amazon Machine Image (AMI) and an Amazon FPGA Image (AFI) to the AWS Marketplace, from which customers deploy them. The AFI is secured, encrypted, and dynamically loaded into the FPGA – it can't be copied or downloaded.]

Delivering FPGA Partner Solutions on AWS via AWS Marketplace

Page 30: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Delivering FPGA Partner Solutions on AWS

AWS Marketplace Benefits
• Streamlined delivery of FPGA-accelerated solutions: Offer software as a managed Amazon Machine Image (AMI) and one or more Amazon FPGA Images (AFI), with secure 1-click purchasing.

• Discover new customers: Allow customers to launch directly from AWS Marketplace, decreasing the length of sales cycles. Sellers can also offer free trials with no additional engineering effort.

• Simplified billing & payments: Customers pay for AWS Marketplace software as part of the regular AWS billing cycle. AWS manages the complexity of AMI and AFI security, metering, billing, payment collection, and financial reporting.

• Secure your FPGA-based products: FPGA custom logic is deployed to customers in a secure way, with no ability to view, copy, or edit the AFI logic.

• Provide Seamless Product Support: AWS Marketplace Product Support Connection makes it easy to support your customers on AWS Marketplace.

Page 31: Announcing Amazon EC2 F1 Instances with Custom FPGAs

FPGA: A Field Programmable Gate Array is a device that consists of very large numbers of configurable logic and memory elements interconnected by configurable routing resources. FPGAs differ from CPUs and GPUs by having no fixed instruction set, and in their ability to implement operations and processes that are pipelined and parallelized in an almost unlimited number of ways, using arbitrarily sized bit-widths.

AFI (Amazon FPGA Image): a file containing the binary image for an FPGA bitstream. Loading an AFI onto an FPGA “programs” that device, within seconds, to perform one or more application-specific functions.

HDL (Hardware Description Language): a low-level programming language designed for describing logic functions for the purposes of simulation and for conversion (via synthesis) to an FPGA or ASIC.

Vivado and SDAccel: a set of design tools produced by Xilinx (provider of the F1 FPGA devices) for development of FPGA logic, pre-integrated and provided at no charge by AWS.

Verilog: a commonly-used HDL for FPGA design and simulation, supported by Vivado.

VHDL: another commonly-used HDL for FPGA design, also supported by Vivado.

F1 Glossary

Page 32: Announcing Amazon EC2 F1 Instances with Custom FPGAs

OpenCL (Open Computing Language): a higher-level alternative to HDL programming based on the C language, and supported in the Xilinx SDAccel design tools. OpenCL can be used to target either FPGAs or GPUs.

HDK (Hardware Development Kit): a set of tools, documentation, and associated FPGA libraries provided by AWS to assist FPGA developers with more rapid FPGA development, in particular to simplify the use of I/O from the FPGA to the host EC2 instance via PCIe, from FPGA to memory, and from FPGA to FPGA.

AXI: an FPGA-internal bus format providing standardized interfaces for memory-mapped communications and for high-speed streaming data. AXI is used in the F1 HDK to define interfaces between AWS-provided interface logic and custom logic provided by FPGA developers.

Developer AMI: a preconfigured AMI, available in the AWS Marketplace, that includes all necessary software and libraries for FPGA development, including the Vivado software and the HDK libraries enabling HDL design and simulation.

F1 Glossary (cont)

Page 33: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Synthesis: the process, using software tools provided with Vivado, of converting an HDL or OpenCL application into a lower-level format (sometimes referred to as a “netlist”) representing the individual logic elements of the application, for example AND, OR, and XOR gates, adders and multipliers, shift registers, etc. This netlist must be further processed, using place-and-route software, to create a downloadable bitstream.

Place-and-Route: the process, using software tools provided with Vivado, of mapping individual logic elements to precise locations in the target FPGA, and specifying their interconnections. Place-and-route is an iterative process that can require hours to complete for larger applications and larger FPGAs.

Bitstream: a binary format representing the synthesized, placed, and routed FPGA application, ready for downloading to an FPGA.

Design Check Point (DCP): a binary file format containing the FPGA bitstream, ready for ingestion during the creation of an Amazon FPGA Image (AFI).

F1 Glossary (cont)

Page 34: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Additional Resources

AWS F1 details: https://aws.amazon.com/ec2/instance-types/f1/

AWS Marketplace: https://aws.amazon.com/marketplace/

AWS Educate: https://aws.amazon.com/education/awseducate/

Edico Genome: http://www.edicogenome.com/

NGCODEC: http://www.ngcodec.com

Maxeler: http://www.maxeler.com/

Ryft: https://www.ryft.com

Page 35: Announcing Amazon EC2 F1 Instances with Custom FPGAs

Thank you!