building multi-processor fpga systems hands-on tutorial to using fpgas and linux chris martin member...

99
Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin <[email protected]> Member Technical Staff Embedded Applications

Upload: josephine-powell

Post on 22-Dec-2015

233 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Building Multi-Processor FPGA SystemsHands-on Tutorial to Using FPGAs and Linux

Chris Martin <[email protected]>Member Technical Staff Embedded Applications

Page 2: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Agenda

2

Introduction

Problem: How to Integrate Multi-Processor Subsystems

Why…– Why would you do this?– Why use FPGAs?

Lab 1: Getting Started - Booting Linux and Boot-strapping NIOS

Lab 2: Inter-Processor Communication and Shared Peripherals

Lab 3: Locking and Tetris

Building Hardware: FPGA Hardware Tools & Build Flow

Building/Debugging Software: Software Tools & Build Flow

References

Q&A – All through out.

Page 3: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Subsystem1

3

The Problem – Integrating Multi-Processor Subsystems

Given a system with multiple processor sub-systems, these architecture decisions must be considered:

Inter-processor communication

Partitioning/sharing Peripherals (locking required)

Bandwidth & Latency Requirements

Processor

Periph 1 Periph 2 Periph 3

Subsystem2Processor

Periph 1 Periph 2 Periph 3

???

Page 4: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

4

Why Do We Need to Integrate Multi-Processor Subsystems?

May have inherited processor subsystem from another development team or 3rd party

– Risk Mitigation by reducing change

Fulfill Latency and Bandwidth Requirements

– Real-time Considerations– If main processor not Real-Time

enabled, can add a real-time processor subsystem

Design partition / Sandboxing– Break the system into smaller

subsystems to service task– Smaller task can be designed easily

Leverage Software Resources– Sometimes problem is resolved in less

time by Processor/Software rather than Hardware design

– Sequencers, State-machines

Page 5: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

5

Why do we want to integrate with FPGA?(or rather, HOW can FPGAs help?)

Bandwidth & Latency can be tailored

– Addresses Real-time aspects of System Solution

– FPGA logic has flexible interconnect

– Trade Data width with clock frequency with latency

Experimentation– Many processor subsystems

can be implemented– Allows you to experiment

changing microprocessor subsystem hardware designs

– Altera FPGA under-the-hood– However: Generic Linux

interfaces used and can be applied in any Linux system.

NIOS

ARM

APeripheral

N Peripheral

SharedPeripheral

Mailbox

Simple Multiprocessor System

And, why is Altera involved with Embedded Linux…

Page 6: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Why is Altera Involved with Embedded Linux?

6

More than 50% of FPGA designs include an embedded processor, and growing.Many embedded designs using LinuxOpen-source re-use.

– Altera Linux Development Team actively contributes to Linux Kernel

0

20,000

40,000

60,000

80,000

100,000

120,000

Without CPU With CPU

Source: Gartner September 2010

50%

Des

ign

Sta

rts

With Embedded ProcessorWithout Embedded Processor

Page 7: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

SoCKit Board Architecture Overview

Lab focus- UART- DDR3- LEDs- Buttons

7

Page 8: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

FPGA Fabric “Soft Logic”

SoC/FPGA Hardware Architecture Overview

ARM-to-FPGA Bridges- Data Width

configurable FPGA

- 42K Logic Macros

- Using no more than 14%

8

A9

I$ D$

A9

I$ D$

L2

EMIF

DDR

ROM

RAM

DMA

UART

SD/MMC

AXI BridgeFPGA2HPS

AXI BridgeHPS2FPGA

AXI BridgeLWHPS2FPGA

NIOS

RAMGPIO

SYS ID

32 32/64/128

32

32/64/128

Page 9: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 1: Getting StartedBooting Linux and Boot-strapping NIOS

9

Topics Covered:– Configuring FPGA from SD/MMC and U-Boot– Booting Linux on ARM Cortex-A9– Configuring Device Tree– Resetting and Booting NIOS Processor– Building and compiling simple Linux Application

Key Example Code Provided:– C code for downloading NIOS code and resetting NIOS from ARM– Using U-boot to set ARM peripheral security bits

Full step-by-step instructions are included in lab manual.

Page 10: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 1: Hardware Design Overview

10

NIOS Subsystem– 1 NIOS Gen 2 processor– 64k combined instruction/data

RAM (On-Chip RAM)– GPIO peripheral

ARM Subsystem– 2 Cortex-A9 (only using 1)– DDR3 External Memory– SD/MMC Peripheral– UART Peripheral

Shared Peripherals Dedicated Peripherals

Subsystem 1

Subsystem 2

Cortex-A9

GPIO

UART

SD/MMC

NIOS 0

RAM

EMIF

Page 11: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab1: Programmer View - Processor Address Maps

Address Base Peripheral

0xFFC0_2000 ARM UART

0x0003_0000 GPIO (LEDs)

0x0002_0000 System ID

0x0000_0000 On-chip RAM

Address Base Peripheral

0xFFC0_2000 UART

0xC003_0000 GPIO (LEDs)

0xC002_0000 System ID

0xC000_0000 On-chip RAM

11

NIOS ARM Cortex-A9

Page 12: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 1: Peripheral Registers

Peripheral AddressOffset

Access Bit Definitions

Sys ID 0x0 RO [31:0] – System ID. Lab Default = 0x00001ab1

GPIO 0x0 R/W [31:0] – Drive GPIO output.Lab Uses for LED control, push button status and NIOS processor resets (from ARM). [3:0] - LED 0-3 Control. ‘0’ = LED off . ‘1’ = LED on[4] – NIOS 0 Reset[5] – NIOS 1 Reset[1:0] – Push Button Status

UART 0x14 RO Line Status Register[5] – TX FIFO Empty[0] – Data Ready (RX FIFO not-Empty)

UART 0x30 R/W Shadow Receive Buffer Register[7:0] – RX character from serial input

UART 0x34 R/W Shadow Transmit Register[7:0] – TX character to serial output

12

Page 13: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 1: Processor Resets Via Standard Linux GPIO Interface

NIOS resets connected to GPIO

GPIO driver uses /sys/class/gpio interface

int main(int argc, char** argv){ int fd, gpio=168; char buf[MAX_BUF];

/* Export: echo ### > /sys/class/gpio/export */ fd = open("/sys/class/gpio/export", O_WRONLY); sprintf(buf, "%d", gpio); write(fd, buf, strlen(buf)); close(fd);

/* Set direction to Out: */ /* echo "out“ > /sys/class/gpio/gpio###/direction */ sprintf(buf, "/sys/class/gpio/gpio%d/direction", gpio); fd = open(buf, O_WRONLY); write(fd, "out", 3); /* write(fd, "in", 2); */ close(fd);

/* Set GPIO Output High or Low */ /* echo 1 > /sys/class/gpio/gpio###/value */ sprintf(buf, "/sys/class/gpio/gpio%d/value", gpio); fd = open(buf, O_WRONLY); write(fd, "1", 1); /* write(fd, "0", 1); */ close(fd);

/* Unexport: echo ### > /sys/class/gpio/unexport */ fd = open("/sys/class/gpio/unexport", O_WRONLY); sprintf(buf, "%d", gpio); write(fd, buf, strlen(buf)); close(fd);}13

Page 14: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 1: Loading External Processor Code Via Standard Linux shared memory (mmap)

NIOS RAM address accessed via mmap()

Can be shared with other processes

R/W during load Read-only protection

after load

/* Map Physical address of NIOS RAM to virtual address segment with Read/Write Access */fd = open("/dev/mem", O_RDWR);load_address = mmap(NULL, 0x10000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0xc0000000);

/* Set size of code to load */ load_size = sizeof(nios_code)/sizeof(nios_code[0]);

/* Load NIOS Code */for(i=0; i < load_size ;i++){ *(load_address+i) = nios_code[i]; }

/* Set load address segment to Read-Only */mprotect(load_address, 0x10000, PROT_READ);

/* Un-map load address segment */munmap(load_address, 0x10000);

14

Page 15: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 2: MailboxesNIOS/ARM Communication

15

Topics Covered:– Altera Mailbox Hardware IP

Key Example Code Provided:– C code for sending/receiving messages via hardware Mailbox IP

NIOS & ARM C Code– Simple message protocol– Simple Command parser

Full step-by-step instructions are included in lab manual. – User to add second NIOS processor mailbox control.

Page 16: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Subsystem 1

Subsystem 2

Lab 2: Hardware Design Overview

16

NIOS 0 & 1 Subsystems– NIOS Gen 2 processor– 64k combined instruction/data

RAM– GPIO (4 out, LED)– GPIO (2 in, Buttons)– Mailbox

ARM Subsystem– 2 Cortex-A9 (only using 1)– DDR3 External Memory– SD/MMC Peripheral– UART Peripheral

Cortex-A9

GPIO

UART

SD/MMC

NIOS 0

RAM

Shared Peripherals Dedicated Peripherals

MBox

Subsystem 3

GPIO

NIOS 1

RAMMBox

GPIO

EMIF

Page 17: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab2: Programmer View - Processor Address Maps

Address Base Peripheral

0xFFC0_2000 ARM UART

0x0007_8000 Mailbox (from ARM)

0x0007_0000 Mailbox (to ARM)

0x0005_0000 GPIO (In Buttons)

0x0003_0000 GPIO (Out LEDs)

0x0002_0000 System ID

0x0000_0000 On-chip RAM

Address Base Peripheral

0xFFC0_2000 UART

0x0007_8000 Mailbox (to NIOS 1)

0x0007_0000 Mailbox (from NIOS 1)

0x0006_8000 Mailbox (to NIOS 0)

0x0006_0000 Mailbox (from NIOS 0)

0xC003_0000 GPIO (LEDs)

0xC002_0000 System ID

0xC001_0000 NIOS 1 RAM

0xC000_0000 NIOS 0 RAM

17

NIOS 0 & 1 ARM Cortex-A9

Page 18: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 2: Additional Peripheral (Mailbox) Registers

Peripheral AddressOffset

Access Bit Definitions

Mailbox 0x0 R/W [31:0] – RX/TX Data

Mailbox 0x8 R/W [1] – RX Message Queue Has Data

[0] – TX Message Queue Empty

18

Page 19: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

LAB 2: Designing a Simple Message Protocol

Design Decisions:- Short Length: A single 32-bit word- Human Readable- Message transactions are closed-

loop. Includes ACK/NACK Format:

- Message Length: Four Bytes- First Byte is ASCII character

denoting message type.- Second Byte is ASCII char from

0-9 denoting processor number. - Third Byte is ASCII char from 0-9

denoting message data, except for ACK/NACK.

- Fourth Byte is always null character ‘\0’ to terminate string (human readable).

Message Types:- “G00”: Give Access to UART

(Push)- “A1A”: ACK- “N1A”:NACK

Can be Extended:- “L00”: LED Set/Ready- “B00”: Button Pressed- “R00”: Request UART Access

(Pull)

19

Byte 0 Byte 1 Byte 2 Byte3

‘L’ ‘0’ ‘0’ ‘\0’

‘A’ ‘0’ ‘0’ ‘\0’

Cortex-A9 NIOS 0

“G00”

“A0A”“N0N”

Page 20: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 2: Inter-Processor Communication with Mailbox HWVia Standard Linux Shared Memory (mmap)

Wait for Mailbox Hardware message empty flag

Send message (4 bytes) Disable ARM/Linux

Access to UART Wait for RX message

received flag Re-enable ARM/Linux

UART Access

/* Map Physical address of Mailbox to virtual address segment with Read/Write Access */fd = open("/dev/mem", O_RDWR);mbox0_address = mmap(NULL, 0x10000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0xff260000);<snip>/* Waiting for Message Queue to empty */while((*(volatile int*)(mbox0_address+0x2000+2) & 1) != 0 ) {} /* Send Granted/Go message to NIOS */send_message = "G00";*(mbox0_address+0x2000) = *(int *)send_message;

/* Disable ARM/Linux Access to UART (be careful here)*/config.c_cflag &= ~CREAD;if(tcsetattr(fd, TCSAFLUSH, &config) < 0) { }

/* Wait for Received Message */while((*(volatile int*)(mbox0_address+2) & 2) == 0 ) {}

/* Re-enable UART Access */config.c_cflag |= CREAD;tcsetattr(fd, TCSAFLUSH, &config);/* Read Received Message */printf(" - Message Received. DATA = '%s'.\n", (char*)(mbox0_address));

20

Page 21: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 3: Putting It All Together – Tetris!Combining Locking and Communication

21

Topics Covered:– Linux Mutex

Key Example Code Provided:– C code showcasing using Mutexes for locking shared peripheral access– C code for multiple processor subsystem bringup and shutdown

Full step-by-step instructions are included in lab manual. – User to add code for second NIOS processor bringup, shutdown and

locking/control.

Page 22: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Subsystem 1

Subsystem 2

Lab 3: Hardware Design Overview (Same As Lab 2)

22

NIOS 0 & 1 Subsystems– NIOS Gen 2 processor– 64k combined instruction/data

RAM– GPIO (4 out, LED)– GPIO (2 in, Buttons)– Mailbox

ARM Subsystem– 2 Cortex-A9 (only using 1)– DDR3 External Memory– SD/MMC Peripheral– UART Peripheral

Cortex-A9

GPIO

UART

SD/MMC

NIOS 0

RAM

Shared Peripherals Dedicated Peripherals

MBox

Subsystem 3

GPIO

NIOS 1

RAMMBox

GPIO

EMIF

Page 23: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 3: Programmer View - Processor Address Maps

Address Base Peripheral

0xFFC0_2000 ARM UART

0x0007_8000 Mailbox (from ARM)

0x0007_0000 Mailbox (to ARM)

0x0005_0000 GPIO (In Buttons)

0x0003_0000 GPIO (Out LEDs)

0x0002_0000 System ID

0x0000_0000 On-chip RAM

Address Base Peripheral

0xFFC0_2000 UART

0x0007_8000 Mailbox (to NIOS 1)

0x0007_0000 Mailbox (from NIOS 1)

0x0006_8000 Mailbox (to NIOS 0)

0x0006_0000 Mailbox (from NIOS 0)

0xC003_0000 GPIO (LEDs)

0xC002_0000 System ID

0xC001_0000 NIOS 1 RAM

0xC000_0000 NIOS 0 RAM

23

NIOS 0 & 1 ARM Cortex-A9

Page 24: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Available Linux Locking/Synchronization Mechanisms

24

Need to share peripherals– Choose a Locking Mechanism

Available in Linux– Mutex <- Chosen for this Lab– Completions– Spinlocks – Semaphores – Read-copy-update (decent for multiple

readers, single writer)– Seqlocks (decent for multiple readers, single

writer)

Available for Linux– MCAPI - openmcapi.org

Page 25: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

NIOS 1

Tetris Message Protocol – Extended from Lab 2

25

NIOS Control Flow: – Wait for button press– Send Button press message– Wait for ACK (Free to write

to LED GPIO)– Write to LED GPIO– Send LED ready msg– Wait for ACK

ARM Control Flow: – Wait for button press

message– Lock LED GPIO Peripheral– Send ACK (Free to write to

LED GPIO)– Wait for LED ready msg– Send ACK– Read LED value– Release Lock/Mutex

Cortex-A9NIOS 0 “B00”

“A0A”

“A0A”

“L00”

“B10”

“A1A”

“A1A”

“L10”

Page 26: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Lab 3: Locking Hardware Peripheral AccessVia Linux Mutex

In this example, LED GPIO is accessed by multiple processors

Wrap LED critical section (LED status reads) with:- pthread_mutex_lock()- pthread_mutex_unlock()

Also need Mutex init/destroy:- pthread_mutex_init()- pthread_mutex_destroy()

pthread_mutex_t lock;<snip – Initialize/create/start> /* Initialize Mutex */ err = pthread_mutex_init(&lock, NULL);

/* Create 2 Threads */ i=0; while(i < 1) { err = pthread_create(&(tid[i]), NULL, &nios_buttons_get, &(nios_num[i])); i++; }

<snip – Critical Section> pthread_mutex_lock(&lock); /* Critical Section */ pthread_mutex_unlock(&lock);

<snip Stop/Destroy> /* Wait for threads to complete */ pthread_join(tid[0], NULL); pthread_join(tid[1], NULL);

/* Destroy/remove lock */ pthread_mutex_destroy(&lock);

26

Page 27: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

References

27

Page 28: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Altera References

System Design Tutorials:

– http://www.alterawiki.com/wiki/Designing_with_AXI_for_Altera_SoC_ARM_Devices_Workshop_Lab_-_Creating_Your_AXI

3_Component

– Designing_with_AXI_for_Altera_SoC_ARM_Devices_Workshop_Lab

– Simple_HPS_to_FPGA_Comunication_for_Altera_SoC_ARM_Devices_Workshop

– http://www.alterawiki.com/wiki/Simple_HPS_to_FPGA_Comunication_for_Altera_SoC_ARM_Devices_Workshop_-_LAB2

Multiprocessor NIOS-only Tutorial:

– http://www.altera.com/literature/tt/tt_nios2_multiprocessor_tutorial.pdf

Quartus Handbook:

– https://www.altera.com/en_US/pdfs/literature/hb/qts/quartusii_handbook.pdf

Qsys:

– System Design with Qsys (PDF) section in the Handbook

– Qsys Tutorial: Step-by-step procedures and design example files to create and verify a system in Qsys

– Qsys 2-day instructor-led class: System Integration with Qsys

– Qsys webcasts and demonstration videos

SoC Embedded Design Suite User Guide:

– https://www.altera.com/en_US/pdfs/literature/ug/ug_soc_eds.pdf

Page 29: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Related Articles

29

Performance Analysis of Inter-Processor Communication Methods– http

://www.design-reuse.com/articles/24254/inter-processor-communication-multi-core-processors-reconfigurable-device.html

Communicating Efficiently between QorlQ Cores in Medical Applications

– https://cache.freescale.com/files/32bit/doc/brochure/PWRARBYNDBITSCE.pdf

Linux Inter-Process Communication:– http://www.tldp.org/LDP/tlk/ipc/ipc.html

Linux locking mechanisms (from ARM):– http://infocenter.arm.com/help/index.jsp?topic=/

com.arm.doc.dai0425/ch04s07s03.html

OpenMCAPI:– https://bitbucket.org/hollisb/openmcapi/wiki/Home

Mutex Examples:– http://www.thegeekstuff.com/2012/05/c-mutex-examples/

Page 30: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Thank YouThank You Full Tutorial Resources Online

- Project Wiki Page: http://rocketboards.org/foswiki/Projects/BuildingMultiProcessorSystems

Includes:- Source code- Hardware source- Hardware Quartus Projects- Software Eclipse Projects

Page 31: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

BACKUP SLIDES

Page 32: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Post-Lab 1 Additional Topics

Hardware Design Flow and FPGA Boot with U-boot and SD/MMC

32

Page 33: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Building Hardware:Qsys (Hardware System Design Tool) User Interface

33

Connections between cores

Interfaces ExportedIn/out of system

Page 34: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Hardware and Software Work Flow Overview

34

Inputs:– Hardware Design (Qsys or RTL or Both)

Outputs (to load on boot media):– Preloader and U-boot Images– FPGA Programmation File: Raw Binary Format (RBF)– Device Tree Blob

Quartus&

Qsys

RBF

EclipseDS-5 & Debug Tools

Device Tree

Preloader & U-Boot

Page 35: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

SDCARD Layout

35

Partition 1: FAT– Uboot scripts– FPGA HW Designs (RBF)– Device Tree Blobs– zImage– Lab material

Partition 2: EXT3 – Rootfs

Partition 3: Raw – Uboot/preloader

Partition 4: EXT3 – Kernel src

Page 36: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Updating SD Cards

36

More info found on Rocketboards.org– http://www.rocketboards.org/foswiki/Documentation/GSRD141SdCard

Automated Python Script to build SD Cards:– make_sdimage.py

File Update Procedure

zImage Mount DOS SD card partition 1 and replace file with new one:$ sudo mkdir sdcard$ sudo mount /dev/sdx1 sdcard/ $ sudo cp <file_name> sdcard/ $ sudo umount sdcard

soc_system.rbf

soc_system.dtb

u-boot.scr

preloader-mkpimage.bin $ sudo dd if=preloader-mkpimage.bin of=/dev/sdx3 bs=64k seek=0

u-boot-socfpga_cyclone5.img $ sudo dd if=u-boot-socfpga_cyclone5.img of=/dev/sdx3 bs=64k seek=4

root filesystem $ sudo dd if=altera-gsrd-image-socfpga_cyclone5.ext3 of=/dev/sdx2

Page 37: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Post-Lab 2 Additional TopicUsing Eclipse to Debug: NIOS Software Build Tools

37

Page 38: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Altera NIOS Software Design and Debug Tools

38

Nios II SBT for Eclipse key features:

– New project wizards and software templates

– Compiler for C and C++ (GNU)

– Source navigator, editor, and debugger

– Eclipse project-based tools– Download code to hardware

Page 39: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Key Multi-Processor System Design Points

39

Startup/Shutdown– Processor – Peripheral– Covered in Lab 1.

Communication between processors– What is the physical link?– What is the protocol & messaging method?– Message Bandwidth & Latency– Covered in Lab 2

Partitioning peripherals– Declare dedicated peripherals – only connected/controlled by one

processor– Declare shared peripherals – Connected/controlled by multiple

processors– Decide Upon Locking Mechanism– Covered in Lab 3

Page 40: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Post Lab 3 Additional TopicAltera SoC Embedded Design Suite

Page 41: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Altera Software Development Tools

41

Eclipse– For ARM Cortex-A9 (ARM Development Studio 5 – Altera Edition)– For NIOS

Pre-loader/U-Boot Generator

Device Tree Generator

Bare-metal Libraries

Compilers– GCC (for ARM and NIOS)– ARMCC (for ARM with license)

Linux Specific– Kernel Sources– Yocto & Angstrom recipes: http://

rocketboards.org/foswiki/Documentation/AngstromOnSoCFPGA_1 – Buildroot: http://

rocketboards.org/foswiki/Documentation/BuildrootForSoCFPGA

Page 42: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

System Development Flow

FPGA Design Flow Software Design Flow

42

HardwareDevelopment

SoftwareDevelopment

Release Release• Quartus II Programmer• In-system Update

• Flash Programmer

Simulate Simulate• ModelSim, VCS, NCSim, etc.• AMBA-AXI and Avalon bus

functional models (BFMs)

Debug Debug• SignalTap™ II logic

analyzer• System Console

• GDB, Lauterbach, Eclipse

• Quartus II design software• Qsys system integration

tool• Standard RTL flow• Altera and partner IP

• Eclipse• GNU toolchain• OS/BSP: Linux, VxWorks• Hardware Libraries• Design Examples

Design Design

Page 43: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Inside the Golden System Reference Design

43

Complete system example design with Linux software support

Target Boards:– Altera SoC Development Kits– Arrow SoC Development Kits– Macnica SoC Development Kits

Hardware Design:– Simple custom logic design in

FPGA– All source code and Quartus II /

Qsys design files for reference

Software Design:– Includes Linux Kernel and

Application Source code– Includes all compiled binaries

Page 44: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

---Topics – Back Up---

44

Introductions: Altera and SoC FPGAs

Development Tools– How to Build Hardware: FPGA Hardware Tools & Build Flow– How to Build Software: Software Tools & Build Flow– How to Debug all-of-the-above: Debug Tools

Key Multi-processor System Design Points

Hardware design– Shared peripherals– Available Hardware IP

Software design– Message Protocols– Linux tools/mechanism available today

Page 45: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Quartus – Hardware Development Tool

Page 46: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Quartus II User Interface

Quartus II main window provides a high level of visibility to each stage of the design flow- Project navigator provides direct

visual access to most of the key project information

- Tasks window allows you to use the tools and features of the Quartus II software and monitor their progress from a flow-based layout

- Tool View window shows various tools and design files

- Messages window outputs messages from each process of the run

46

Project Navigator

Tasks window

Tool View window

Messages window

Page 47: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

47

Typical Hardware Design Flow

Design entry/RTL coding and early pin planning• Behavioral or structural description of design• Early pin planning allows board development in parallel

Functional verification

• Verify design behavior

Synthesis (mapping)• Translate design into device-specific primitives• Optimization to meet required area and

performance constraints

Placement and routing (fitting)• Place design in specific device resources with reference to

area and performance constraints• Connect resources with routing lines

Timing analysis• Verify performance specifications were met• Static timing analysis

Functional verification• Verify design will work in

target technology

PC board simulation and test• Simulate board design• Program and test device on board• On-chip tools for debuggingIn-system debugIn-system debug

Functional verificationFunctional verification

Functional verificationFunctional verification

Design creationDesign creation

Project creationProject creation

Project definition

Design compilationDesign compilation

Logic

I/O

Memory

Page 48: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Quartus II Feature Overview

48

Fully integrated development tool– Multiple design entry methods

Includes intellectual property- (IP-) based system design

– Up-front I/O assignment and validationEnables printed circuit board (PCB) layout early in the design process

– Incremental compilationReduces design compilation and improves timing closure

– Logic synthesisIncludes comprehensive integrated synthesis solutionAdvanced integration with third-party EDA synthesis software

– Timing-driven placement and routing

– Physical synthesisImproves performance without user intervention

– Verification solutionTimeQuest timing analyzerPowerPlay power analysis and optimizationFunctional simulation

– On-chip debug and verification suite In-system debugIn-system debug

Functional verificationFunctional verification

Functional verificationFunctional verification

Design creationDesign creation

Project creationProject creation

Project definition

Design compilationDesign compilation

Logic

I/O

Memory

Page 49: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

49

Quartus II Feature Overview (1/2)Feature Quartus II Software

Project creation New project wizard

Design entry

HDL editor Schematic editor State machine editor MegaWizard™ Plug-In Manager

– Customization and generation of IP Qsys system integration tool

Design constraint assignments Assignment editor Pin planner Synopsys Design Constraint (SDC) editor

Synthesis Quartus II Integrated Synthesis (QIS) Third-party EDA synthesis Design assistant

Fitting and placing design into FPGA to meet user requirements

Fitter (including physical synthesis)

Design analysis and debug Netlist viewers Advisors

Power analysis PowerPlay power analyzer

Page 50: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

50

Quartus II Feature Overview (2/2)

Feature Quartus II Software

Static timing analysis on post-fitted design TimeQuest timing analyzer

Viewing and editing design placement Chip Planner

Functional verification ModelSim®-Altera edition Third-party EDA simulation tools

Generation of device programming file Assembler

On-chip debug and verification

SignalTapTM II (embedded logic analyzer) In-system memory content editor Logic analyzer interface editor In-system sources and probes editor SignalProbe pins Transceiver Toolkit External memory interface toolkit

Technique to optimize design and improve productivity

Quartus II incremental compilation Physical synthesis optimization Design Space Explorer (DSE)

Page 51: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Quartus II Subscription Edition vs. Web Edition

51

Subscription Edition Web Edition

Device supported

MAX series devices: All (Excluding MAX7000 / 3000)

Cyclone III/IV/V FPGAs: All Arria II/V FPGAs: All Stratix III, IV, V FPGAs: All Cyclone V SoCs: All

MAX series devices: All (Excluding MAX7000 / 3000)

Cyclone V FPGAs: All (Excluding 5CEA9, 5CGXC9, and 5CGTD9)

Cyclone III/IV FPGAs: All Arria II GX FPGA: EP2AGX45 Cyclone V SoCs: All

Software features: Incremental compilation

and team-based design SSN Analyzer Transceiver Toolkit

Yes No

SignalTap II, SignalProbe Yes If TalkBack feature is enabled

Multi-processor support Yes If TalkBack feature is enabled

IP Base Suite MegaCore® functions Yes

No license required for OpenCore Plus hardware evaluation

License fee required for production use

Platform support Windows 32/64-bitLinux 32/64-bit

Windows 32/64-bitLinux 32/64-bit

License and maintenance terms

Perpetual(continues to work after

expiration)No license required except for IP core

Price $ Free

Page 52: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

52

How to Get Started Using Quartus II Software

Download Quartus II software today and start designing with Altera programmable logic devices

Quartus II Handbook - http://www.altera.com/literature/lit-qts.jsp – Guides you through the programmable logic design cycle from design to

verification– Also covers third-party EDA vendor tool interfaces

Online demonstrations - http://www.altera.com/quartusdemos – Easiest way to learn about the latest Quartus II software features and

design flows

Training classes - https://mysupport.altera.com/etraining– Offers online training classes and live presentation coupled with hands-on

exercises to learn about Quartus II features and design flows

Agenda

Page 53: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Qsys – System Integration Platform

Page 54: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Qsys System Integration Platform

54

Avalon® Interfaces

AMBA® AXI, APB, AHB

High-Performance Interconnect

Based on Network-on-a-Chip (NoC)Architecture

Hierarchy Design Reuse

DesignSystem

Add toLibrary

Package as IP

Real-Time System Debug

Industry-Standard Interfaces

®

Qsys is Altera’s design environment for- Deployment of IP, with hierarchal support- Development platform for Altera custom solutions - Design platform for customers to quickly create system designs

Automated Testbench Generation

Page 55: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Qsys User Interface

55

Toolbar

Interfaces Exportedfor Hierarchy

Improved Validation Display

Page 56: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

56

Qsys Benefits

Raises the level of design abstraction– System-level design and system visualization

Simplifies complex hierarchal system development– Automated interconnect generation

Provides a standard platform– IP integration, custom IP authoring, IP verification

Enables design re-use

Reduces time to market– System-level design reduces development time– Facilitates verification

Qsys improves productivity

Page 57: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

57

Transport Layer Transfers packets to destination

Transaction Layer Converts command

packets to transactions and responses to response packets

Transaction Layer Converts transactions to

command packets and responses packets to responses

Network-on-Chip Architecture

MasterNetworkInterface

MasterNetworkInterface

Avalon-STAvalon-MMAXI-MM

Avalon-MMAXI-MM

MasterInterface

MasterInterface

Avalon STNetwork

(Response)

Avalon STNetwork

(Command)

SlaveNetworkInterface

SlaveNetworkInterface

SlaveInterface

SlaveInterface

Page 58: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

58

Benefits of Network-On-Chip Approach

See white paper: Applying the Benefits of NoC Architecture to FPGA System Design

Independent implementation of transaction/transport layers– Different transport layer network topologies can be implemented without

transaction layer modificatione.g. High performance components on a wide high-frequency crossbar network

Supports standard interface interoperability– Mix and match interface types on transaction layer without transport layer

modification

Scalability– Segment network into sub-networks using

Bridges

Clock crossing logic

Page 59: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Industry-Standard Interfaces

59

Developer Standard Interface Protocol

Avalon® Interfaces

AMBA® AXI, AMBA APB, and AMBA AHB®

Qsys supports mixing of different interfaces

Page 60: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Target Qsys Applications

Qsys can be used in almost every FPGA design

Designs fall into two categories– Control plane

Memory mapped

Reading and writing to control and status registers

– Data planeStreaming

Data switching (muxing, demuxing), aggregation, bridges

Page 61: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

61

“Packets………I care about Latency!”

Qsys packet format is wide– Packet format contains a complete transaction in a single clock cycle– Supports:

Writes with 0 cycles of latency

Reads with a round-trip latency of 1 cycle– You can control latency via Qsys configuration

Separate command and response network– Increases concurrency

Command traffic and Response traffic don’t compete for resources

Page 62: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

62

Qsys: Wide Range of Compliant IP

Wide range of plug-and-play intellectual property (IP):

– Interface protocol IPE.g. PCIe, Ethernet 10/100/1000 Mbps (Triple-Speed Ethernet), Interlaken, JTAG, UART, SPI

– External memory interface IPE.g. DDR/DDR2/DDR3

– Video and imaging processing (VIP) IPE.g. VIP Suite including scaler, switch, deinterlacer, and alpha blending mixer

– Embedded processor IPE.g. Hardened ARM processor system, Nios II processor

– Verification IPE.g. Avalon-MM/-ST, AXI4, APB

>100 Qsys compliant IP available

Page 63: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Qsys as a Platform for System Integration

63

Connect IP and Systems

Automate Error-Prone Integration Tasks

Simplify Integration

Accelerate Development HDL

IP 1Custom 1

IP 2IP 3Custom 2

Interface protocols Memory DSP Embedded Bridges PLL Custom systems

Library of Available IP

Page 64: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

64

Additional Resources

Watch online demos (3-5 min) www.altera.com/qsys

Complete the Qsys tutorial (2-3 hrs) www.altera.com/qsys

Watch free webcasts (10-15 mins)www.altera.com/qsys

Sign up for Qsys training www.altera.com/training

Page 65: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

In-system Verification

Page 66: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Debug Challenges

66

Accessing and viewing internal signals

Not enough pins to use as test points

Capabilities in creating trigger conditions that correctly capture data

Verification of standard or proprietary protocol interfaces

Overall design process bottleneck

Debug Can Be Costly

Page 67: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

On-chip Debug

67

Access and view internal signals

Store captured data in FPGA embedded memory

Use JTAG interface as debug ports

Incrementally add internal signals to view

Reduce Debug Cycles by

Using On-chip Debug Tools

Page 68: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

On-chip Debug Technology

68

Debug tools communicate with the FPGA via standard JTAG interface

Multiple debug functions can share the JTAG interface simultaneously

– Altera’s system-level debugging (SLD) hub technology makes this possible

– All Altera tools and some third-party tools support the SLD hub JTAG interface

Download Cable

FPGA

JTAGTap

Controller

Node2

Node1

User's Design

(Core Logic)

NodeNSLD

Hub

NodeN-1

Page 69: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

On-chip Debug Tools in Quartus II Software

69

SignalTap II logic analyzer– Captures and displays hardware events, fast turnaround times– Incrementally creates trigger conditions and adds signals to view– Uses captured data stored in on-chip RAM and JTAG interface for communication

In-system memory content editor– Displays content of on-chip memory– Enables modification of memory content in a running system

External logic analyzer interface– Uses external logic analyzer to view internal signals– Dynamically switches internal signals to output

In-system sources and probes– Stimulate and monitor internal signals without using on-chip RAM

Exception: SignalProbe incremental routing feature does not use JTAG interface (i.e. SLD hub technology)

– Quickly routes an internal node to a pin for observation

Page 70: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

SignalTap II Logic Analyzer

70

Provides the most advanced triggering capabilities available in an FPGA-embedded logic analyzer

Proven to be invaluable in the lab– Captures bugs that would take weeks of simulation to uncover

Has broad customer adoptionFeatures and benefits

– An embedded logic analyzerUses available internal memory

– Probes state of internal signals without using external equipment or extra I/O pins

– Incremental compilation supportFast turnaround time when adding signals to view

– Advanced triggering for capturing difficult events/transactions– Power-up trigger support

Debug the initialization code– Megafunction support

Optionally, instantiate in HDL

Page 71: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

71

In-system Memory Content Editor

Enables FPGA memory content and design constants to be updated in-system, via JTAG interface, without recompiling a design or reconfiguring the rest of the FPGA

– Fault injection into system– Update memory while system is running – Change value of coefficients in DSP applications– Easily perform “what if?” type experiments in-system in just seconds

Supports MIF and HEX formats for data interchange

Megafunctions supported– LPM_CONSTANT, LPM_ROM, LPM_RAM_DQ, ALTSYNCRAM (ROM and single-

port RAM mode)

Enable memory content editor

Enable memory content editor

Page 72: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

72

In-system Memory Content Editor

Under Tools menu In-system Memory Content Editor

Page 73: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Altera SoC Embedded Design Suite

Page 74: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Included in SoC Embedded Design Suite (EDS)

74

Development Studio 5 Altera Edition– Awesome debugger, especially when combined with

USB Blaster II

Altera SoC FPGA System Trace Macrocells– Application development environment– Streamline system analyzer

Hardware Libraries

GNU-based bare-metal (EABI) compiler tools

U-Boot

Root file system to jump start software development

Pre-built Linux kernel– http://www.rocketboards.org for source trees and community access

Page 75: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

System Development Flow

FPGA Design Flow Software Design Flow

75

HardwareDevelopment

SoftwareDevelopment

Release Release• Quartus II Programmer• In-system Update

• Flash Programmer

Simulate Simulate• ModelSim, VCS, NCSim, etc.• AMBA-AXI and Avalon bus

functional models (BFMs)

Debug Debug• SignalTap™ II logic

analyzer• System Console

• GNU, Lauterbach, DS5

• Quartus II design software• Qsys system integration

tool• Standard RTL flow• Altera and partner IP

• ARM Development Studio 5

• GNU toolchain• OS/BSP: Linux, VxWorks• Hardware Libraries• Design Examples

Design Design

Page 76: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Altera SoC Embedded Design Suite

FPGA Design Flow Software Design Flow

76

HardwareDevelopment

SoftwareDevelopment

Release Release• Quartus II Programmer• In-system Update

• Flash Programmer

Simulate Simulate• ModelSim, VCS, NCSim, etc.• AMBA-AXI and Avalon bus

functional models (BFMs)• Virtual Target

Debug Debug• SignalTap™ II logic

analyzer• System Console

• GNU, Lauterbach, DS5

• Quartus II design software• Qsys system integration

tool• Standard RTL flow• Altera and partner IP

• ARM Development Studio 5

• GNU toolchain• OS/BSP: Linux, VxWorks• Hardware Libraries• Design Examples

Design Design

Software Development

HW/SW Handoff

FPGA-Adaptive Debugging

Page 77: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Altera SoC Embedded Design Suite

77

Comprehensive Suite SW Dev Tools

Hardware / software handoff tools

Linux application development– Yocto Linux build environment– Pre-built binaries for Linux / U-Boot– Work in conjunction with the Community Portal

Bare-metal application development– SoC Hardware Libraries– Bare-metal compiler tools

FPGA-adaptive debugging– ARM DS-5 Altera Edition Toolkit

Design examples

Firmware Development

Hardware-to-Software Handoff

Linux Application

Development

FPGA-Adaptive

Debugging

Free Web Edition Subscription Edition Free 30-day Eval

Page 78: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Hardware-to-Software Handoff

78

system.sopcinfosystem.iswinfo

Device Tree Generator

Preloader Generator

.c & .h source files

Linux Device Tree

Qsys system info, SDRAM calibration files, ID / timestamp, HPS IOCSR data

Hardware

Software

Page 79: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Hardware / Software Handoff Tools

79

Allow hardware and software teams to work independently and follow their familiar design flows

Take Altera Quartus® II / Qsys output files and generate handoff files for the software design flow

Device Tree standard specifies hardware connectivity so that Linux kernel can boot up correctly

Page 80: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Linux Application Development

80

Yocto build support for Linux– Yocto standard enables open, versatile, and

cost-effective embedded software development– Allows a smooth transition to commercial Linux distributions

Pre-built Linux kernel, U-Boot, and root file system to jump start software development

– Link to community portal for source trees and community access

Page 81: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

81

Bare-metal Application Development

Hardware Libraries– Software interface to all system

registers

– Functions to configure some basic system operations (e.g. clock speed settings, cache settings, FPGA configuration, etc.)

– Support board bring-up and diagnostics development

– Can be used by bare-metal application, device drivers, or RTOS

GNU-based bare-metal (EABI) compiler tools

Operating System

SoC FPGA

Application

HALBMALPAL

BSP

Bare-metal App

Hardware Libraries

Page 82: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Golden System Reference Design

82

Complete system design with Linux software support

– Simple custom logic design in FPGA

– All source code and Quartus II / Qsys design files for reference

– Include all compiled binaries- example can run on an Altera SoC Development Kit to jumpstart development

Page 83: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

DS-5 Altera Edition- One Tool, Three Usages

83

• JTAG-Based Debugging

• Board Bring-up

• OS porting, Drivers Dev,

• Kernel Debug

• Application Debugging

• Linux User Space Code

• RTOS App Code

• System Integration

• System Debug

• FPGA-Adaptive Debugging

1

2

3

Page 84: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

84

One Device, Two Debugging Tools?

Dedicated JTAG connection Visualize & control CPU

subsystem

JTAG

Dedicated JTAG connection Visualize & control FPGA

ARM® DS-5™ Toolkit Altera Quartus™ II Software

JTAGDSTREAM™

Page 85: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

85

One Device, Two Debugging Tools?

Dedicated JTAG connection Visualize & control CPU

subsystem

JTAG

Dedicated JTAG connection Visualize & control FPGA

ARM® DS-5™ Toolkit Altera Quartus™ II Software

JTAGDSTREAM™

Debugging

Barrier

No single tool/cable to visualize

and control both CPU and

FPGA domains

No way for CPU and FPGA to

cross trigger and correlate

hardware and software events

No “fixed” debugger can

address the needs of

“changing” FPGA hardware

Page 86: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Industry First: FPGA-Adaptive Debugging

86

ARM® Development Studio 5 (DS-5™) Altera® Edition ToolkitRemoves debugging barrier between CPUs and FPGA Exclusive OEM agreement between Altera and ARMResult of innovation in silicon, software, and business model

Altera USB-Blaster™II

Connection

Page 87: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

87

FPGA-Adaptive Debugging Features

Single USB-Blaster II cable for

simultaneous SW and HW debug

Automatic discovery of FPGA peripherals

and creation of register views

Hardware cross-triggering between the CPU and FPGA domains

Correlation of CPU software instructions and FPGA hardware events

Simultaneous debug and trace for Cortex-A9 cores and CoreSight™-compliant cores in FPGA

Statistical analysis of software load and bus traffic spanning the CPUs and FPGA

Page 88: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

88

DS-5 Altera Edition Productivity-Boosting Features

Industry’s most advanced multicore debugger for ARM

JTAG based system-level debugging, gdbserver-based application debuggingin one package

Yocto plugin to enableLinux based application development

Integrated OS-aware analysisand debug capability

Page 89: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Visualization of SoC Peripherals

89

Register views assist the debug ofFPGA peripherals

– File generated by FPGA tool flow

– Automatically imported in DS-5 Debugger

Debug views for debug of software drivers

– Self-documenting– Grouped by peripheral,

register and bit-field

CMSIS

Peripheral register descriptions

Page 90: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

90

FPGA-Adaptive, Unified DebuggingFPGA connected to debug and trace buses for non-intrusive capture and visualization of signal events

Simultaneous debug and trace connection to CPU coresand compatible IP

CorrelateFPGA signalevents withsoftware eventsand CPUinstructiontrace usingtriggers andtimestamps

Page 91: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Cross-Domain Debug 1

91

Trigger from software world to FPGA world

HARDWARE TRIGGER!

SOFTWARE TRIGGER

Page 92: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Cross-Domain Debug 2

92

Trigger from FPGA world to software world

HARDWARE TRIGGER

EXECUTION STOPOR

SW TRACE TRIGGER

EXECUTION STOPOR

HW TRACE TRIGGER

Page 93: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Correlate HW and SW Events

93

Debug event trigger point set from either:

SignalTap™ II Logic Analyzer

or DS-5 debugger

Captured trace can then be analyzed using timestamp-correlated events

SignalTap II Logic Analyzer

ARM® DS-5™ Toolkit

Timestamp Correlated

Page 94: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

System-LevelPerformance Analysis

94

Performance bottlenecks in SoCs often come from the CPU interaction with the rest of the SoC

Streamline visualizes software activity with performance counters from the SoC and FPGA to enable full system-level analysis

Streamline only requires a TCP/IP connection to the SoC

ARM® DS-5™ Streamline

Linux OS Counters

Processor Counters,Aggregated, or Per Core

FPGA Block Counters

Power Consumption

Process/Thread Heat Map

Application Events

Page 95: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

95

Altera SoC EDS- Key Benefits

One-stop shop from Altera

All the tools and examples for rapid starts

Familiar tools interface, easy to use

Share tools and knowledge to increase team productivity

Best multicore debugger tools for ARM architectureUnprecedented visibility and control acrossprocessor cores and across CPU, FPGA domains

Faster time to market, lower development costs!

Page 96: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Target Users and Usages

96

Web Edition

Subscription Edition

Board Bring-up Yes

Device Drivers Dev Yes

OS Porting Yes

Baremetal Programming Yes

RTOS Based App Dev Yes

Linux Based App Dev Yes Yes

Multicore App Debugging Yes

System Debugging Yes

Page 97: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

SoC EDS Editions Summary

97

Component Key Feature Web Edition

SubscriptionEdition

30-DayEvaluation

Hardware/Software Handoff Tools

Preloader Image Generator x x x

Flash Image Creator x x x

Device Tree Generator (Linux) x x x

ARM DS-5 Altera Edition

Eclipse IDE x x x

ARM Compiler* x

Debugging over Ethernet (Linux) x x x

Debugging over USB-Blaster II JTAG x x

Automatic FPGA Register Views x x

Hardware Cross-triggering x x

CPU/FPGA Event Correlation x x

Compiler Tool Chains Linaro Tool Chain (Linux) x x x

CodeBench Lite EABI (Bare-metal) x x x

Hardware Libraries Bare-metal programming Support x x x

SoC Programming Examples

Golden System Reference Design x x x

*ARM Compiler is available in DS-5 Professional Edition, available directly from ARM

Page 98: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Coordinated Multi-Channel Delivery

98

Altera.com

Quartus II Programmer SignalTap II

Altera.com

Pre-built Binaries• Kernel• U-Boot• Yocto• Minimal RFS• Tool chains• Handoff tools• HW Libraries• Examples• Documentation

RocketBoards.org

Frequent Updates• Kernel source• U-Boot source• Yocto source• RFS source• Toolchain

source• Public git• Wiki• Mailman

Partners

BSPsMiddleware3rd Party Tools

Page 99: Building Multi-Processor FPGA Systems Hands-on Tutorial to Using FPGAs and Linux Chris Martin Member Technical Staff Embedded Applications

Altera NIOS Software Design Tools

99

Nios II SBT for Eclipse key features:

– New project wizards and software templates

– Compiler for C and C++ (GNU)

– Source navigator, editor, and debugger

– Eclipse project-based tools