Post on 21-Dec-2015

KeyStone ARM-DSP Interaction
KeyStone Training
Multicore Applications
Literature Number: SPRP###

Agenda

• MPM
• Memory management
• ARM-DSP Communication Architecture
• Resource management

Typical Keystone II model

[Figure: four Cortex-A15 ARM cores running SMP Linux alongside eight C66x DSP cores (Core0 through Core7); the diagram shows an MPM block for each DSP core.]

MPM – Multi-processor manager

MPM Operation

• The MPM server daemon maintains a state machine for each slave core.
• The MPM command-line (client) utility provides a command-line interface to the MPM server. It can be called from a terminal or from an application.
• MPM can reset a core, load a core with an executable, run a core, collect messages from a core, and collect information after a core crash (if there is an exception).
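The per-core state machine can be pictured with a small model. This is only a sketch: the state names (idle, loaded, running, crashed) and the transition table are assumptions made for illustration and are not taken from the MPM sources; only the reset, load, and run operations come from the slide.

```python
from enum import Enum


class CoreState(Enum):
    # Hypothetical state names; the real MPM state machine may differ.
    IDLE = "idle"
    LOADED = "loaded"
    RUNNING = "running"
    CRASHED = "crashed"


# (current state, command) -> next state
TRANSITIONS = {
    (CoreState.IDLE, "load"): CoreState.LOADED,
    (CoreState.LOADED, "run"): CoreState.RUNNING,
    (CoreState.RUNNING, "crash"): CoreState.CRASHED,
}


class SlaveCore:
    """Tracks the state of one slave core, as the MPM server does per core."""

    def __init__(self, name):
        self.name = name
        self.state = CoreState.IDLE

    def apply(self, command):
        # In this model a reset is accepted from any state.
        if command == "reset":
            self.state = CoreState.IDLE
            return self.state
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{self.name}: '{command}' not allowed in state {self.state.value}"
            )
        self.state = TRANSITIONS[key]
        return self.state
```

In this model, running a core that has not been loaded raises an error, which mirrors why a server-side state machine is useful: it rejects commands that make no sense for the current state of the core.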

Core state machine

Managing a core

• From a terminal
– mpmcl load dsp0 program.out
– The image must be in ELF format
– Part of the lab exercises
• From an application
– The include file is part of the MCSDK release at /mpm_2_00_01_01/include/mpmclient.h
– The library is part of the MCSDK release at /mpm_2_00_01_01/lib/libmpmclient.a

DSP Image requirements

• The DSP image must be in ELF format.
• MPM must know about the memories that the image uses, and it must not overwrite ARM-dedicated memories.
– More about memory management later
• Special sections must be defined to facilitate communication between the DSP core and the ARM.
– This is done by the RTSC tools if IPC or MPM is used:

    var Resource = xdc.useModule('ti.ipc.remoteproc.Resource');

– The next slide shows a project map file with the resource section

Mpm_example map file

ARM accessing core information

• The MPM server monitors the resource table section.
• System_printf writes messages to the resource table.
• The user (or application) can access the messages in /sys/kernel/debug/remoteproc/remoteprocN/trace0
– Where N is the DSP core number
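As a minimal sketch of how an ARM-side application might collect those messages, the helper below builds the trace path for core N and reads it as text. The function names and the `root` parameter are hypothetical (the parameter exists only so the code can be exercised against a scratch directory); reading debugfs normally requires root privileges and a mounted debugfs.

```python
from pathlib import Path

DEBUGFS_ROOT = "/sys/kernel/debug/remoteproc"


def trace_path(core, root=DEBUGFS_ROOT):
    """Return the trace0 path for DSP core number `core`."""
    return Path(root) / f"remoteproc{core}" / "trace0"


def read_trace(core, root=DEBUGFS_ROOT):
    """Return the System_printf output of one core as a list of lines."""
    # errors="replace" keeps the read from failing on stray non-UTF-8 bytes.
    text = trace_path(core, root).read_text(errors="replace")
    return text.splitlines()
```

Calling `read_trace(0)` on the target would return the messages of DSP core 0, one list entry per line.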

ARM accessing core dump

• MPM can monitor crash events from the DSP and get a core dump.
– The DSP code needs an exception hook
– A special memory section must be defined
• A fault sample test application is part of the PDK release at pdk_keystone2_3_00_04_18/packages/ti/instrumentation/fault_mgmt/test

MPM Configuration

• The file mpm_config.json is a JavaScript Object Notation (JSON) file that describes the DSP-accessible memory segments to the ARM.
• 10 memory segments are defined:
– Eight segments, one for each DSP core's L2 local memory
– One segment for MSM memory
– One segment for the part of DDR that is used by the MPM as shared memory
• mpm_config.json definition of Core 0 L2 memory:

    {
      "name": "local-core0-l2",
      "localaddr": "0x00800000",
      "globaladdr": "0x10800000",
      "length": "0x100000",
      "devicename": "/dev/dsp0"
    },

MPM Configuration

• The two shared memory definitions show that the DSP dedicated memory in DDR starts at 0xa0000000 and has a size of 512M minus 1K bytes (TI default).
• 1K of memory is needed for MPM management.

    {
      "name": "local-msmc",
      "globaladdr": "0x0c000000",
      "length": "0x600000",
      "devicename": "/dev/dspmem"
    },
    {
      "name": "local-ddr",
      "globaladdr": "0xa0000000",
      "length": "0x1FFFFC00",
      "devicename": "/dev/dspmem"
    }
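The arithmetic behind these segment entries can be checked with a short sketch. It parses the three entries quoted on the slides, computes the address span of each, and translates a core-local L2 address to its global alias. Wrapping the entries in a bare JSON list is an assumption (the overall layout of mpm_config.json is not shown here), and `span` and `local_to_global` are hypothetical helper names.

```python
import json

# The three segment entries shown on the slides, wrapped in a list for parsing.
MPM_SEGMENTS = json.loads("""
[
  {"name": "local-core0-l2", "localaddr": "0x00800000",
   "globaladdr": "0x10800000", "length": "0x100000",
   "devicename": "/dev/dsp0"},
  {"name": "local-msmc", "globaladdr": "0x0c000000",
   "length": "0x600000", "devicename": "/dev/dspmem"},
  {"name": "local-ddr", "globaladdr": "0xa0000000",
   "length": "0x1FFFFC00", "devicename": "/dev/dspmem"}
]
""")
BY_NAME = {seg["name"]: seg for seg in MPM_SEGMENTS}

KIB = 1 << 10
MIB = 1 << 20


def span(segment):
    """Return (first, last) global address covered by a segment."""
    start = int(segment["globaladdr"], 16)
    return start, start + int(segment["length"], 16) - 1


def local_to_global(segment, local_addr):
    """Translate a core-local address to the global alias of the segment."""
    base = int(segment["localaddr"], 16)
    offset = local_addr - base
    if not 0 <= offset < int(segment["length"], 16):
        raise ValueError(f"{local_addr:#x} is outside {segment['name']}")
    return int(segment["globaladdr"], 16) + offset
```

With these definitions, the DDR length 0x1FFFFC00 equals 512M minus 1K, the MSMC span ends at 0x0c5fffff, and the Core 0 local address 0x00800000 maps to the global address 0x10800000.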

Last word about MPM

• The U-Boot variable mem_reserve defines the DDR area that MPM uses to load the DSP image.
– More about it later

Managing Keystone II Memories

KeyStone ARM-DSP Interaction

Disclaimer

• The following slides show how the TI implementation that runs on the TCIEVM6638K2K works.
• Other implementations may be different.

Keystone II shared memories: physical addresses

Keystone II Device
• MSMC memory: 00 0c00 0000 to 00 0c5f ffff
• DDRA: 08 0000 0000 to 09 ffff ffff
• DDRB: 00 8000 0000 to 00 ffff ffff

For a complete description of possible memory aliasing, see the device data manual.
The DDR3A_REMAP_EN pin determines the mapping of 00 8000 0000 to DDRA or DDRB.

Translating Logical memory to physical memory

• DSP and all other TeraNet masters: MPAX registers
– Static translation (until the MPAX register is changed)
• ARM with LPAE
– MMU dynamic translation to 40 bits, can access 8G of DDRA
– Controlled by U-Boot environment variable mem_lpae=1 (default)
• ARM without LPAE
– Disabled MMU, static, can access only 2G of DDRA
– Controlled by U-Boot environment variable mem_lpae=0
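A static, MPAX-style translation amounts to "find the segment that contains the logical address, then add the offset to the segment's physical base". The sketch below models only that idea. The `Segment` type and the single default entry (logical 0x8000 0000 mapped to physical 0x08 0000 0000) are assumptions for illustration; real MPAX registers also carry permissions and power-of-two segment sizes, which are ignored here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    logical_base: int    # 32-bit address seen by the DSP
    size: int            # segment size in bytes
    physical_base: int   # wider physical address on the device


# Assumed default: the 2G logical window at 0x8000_0000 maps to DDR3A.
DEFAULT_DDR = Segment(0x8000_0000, 0x8000_0000, 0x8_0000_0000)


def to_physical(segments, logical):
    """Translate a logical address using the first matching segment."""
    for seg in segments:
        if seg.logical_base <= logical < seg.logical_base + seg.size:
            return seg.physical_base + (logical - seg.logical_base)
    raise LookupError(f"no segment maps logical address {logical:#x}")
```

Under this assumed mapping, the logical address 0xA000 0000 translates to the physical address 0x08 2000 0000.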

DDRA Size for the ARM

• The U-Boot environment variable ddr3a_size tells the system how much memory is available:
– 0: 2GB (default)
– 4: 4GB
– 8: 8GB
• Memory is used by the Linux kernel, the Linux user domain, and the DSP cores. The next slides describe TI's partition of the DDRA memory.
• U-Boot uses the device tree and the parameters to create memory segments.
• For more information on how to configure a system with 8GB, see http://processors.wiki.ti.com/index.php/MCSDK_UG_Chapter_Exploring#Using_more_than_2GB_of_DDR3A_memory

DDR3A partition

• DDR3A is partitioned into two segments
• Memory size of 8G
– The first segment starts at physical address 0x08 0000 0000 and has a size of 2G.
– The second segment starts at 0x08 8000 0000 and has a size of 6G.
– Part of the first segment is reserved for the DSP memory. It is used to load programs and data from the ARM user domain to the DSP memory.
– Part of the first segment is used by the kernel.
• A smaller DDR3A size may have a different partition (see next slides)

6638K2K Memory Architecture (8G DDRA)

[Figure: DDR3A memory map. Segment 0 (size 2G) starts at 0x08 0000 0000 and holds the DSP dedicated area together with ARM Linux user-mode and kernel memory. Segment 1 (size 6G) starts at 0x08 8000 0000, ends at 0x0A 0000 0000, and holds ARM Linux user-mode memory.]

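6638K2K Memory Architecture (2G DDRA, larger DSP memory)

[Figure: DDR3A memory map. Segment 0 (size 2G) runs from physical 0x08 0000 0000 to 0x08 8000 0000 and holds ARM kernel and user-mode memory plus a 1536M DSP dedicated area. Logical memory, assuming default MPAX registers: 0x8000 0000 at the start of the segment, 0xA000 0000 at the start of the DSP dedicated area, 0xFFFFFFFF at the end.]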

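6638K2K Memory Architecture (1G DDRA) (32-bit DDR)

[Figure: DDR3A memory map. Segment 0 (size 1G) runs from physical 0x08 0000 0000 to 0x08 4000 0000 and holds ARM Linux user-mode and kernel memory plus a 512M DSP dedicated area. Logical memory, assuming default MPAX registers: 0x8000 0000 at the start of the segment, 0xA000 0000 at the start of the DSP dedicated area, 0xC000 0000 at the end.]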

Define Memories Available To MMU

• The TI Linux U-Boot Keystone source release (git) has the file board.c in u-boot-keystone/board/ti/tci6638_evm. This file sets the memory architecture for Linux.
• The same directory has other files that are used to configure DDR3A and DDR3B, and the POST code.
• The next slides show parts of the file board.c.
• Kernel drivers get information about resources (including memories) from the device tree. The device tree will be discussed later.

Board.c (1)

    /*
     * Copyright (C) 2012 Texas Instruments Inc.
     *
     * TCI6638 EVM : Board initialization
     *
     * See file CREDITS for list of people who contributed to this
     * project.
     *
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation; either version 2 of the License, or
     * (at your option) any later version.
     *
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     * GNU General Public License for more details.
     *
     * You should have received a copy of the GNU General Public License
     * along with this program; if not, write to the Free Software
     * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
     */

Board.c (2)

    #if defined(CONFIG_OF_LIBFDT) && defined(CONFIG_OF_BOARD_SETUP)
    #define K2_DDR3_START_ADDR 0x80000000
    void ft_board_setup(void *blob, bd_t *bd)
    {
        u64 start[2];
        u64 size[2];
        char name[32], *env, *endp;
        int lpae, nodeoffset;
        u32 ddr3a_size;
        int nbanks;

        env = getenv("mem_lpae");
        lpae = env && simple_strtol(env, NULL, 0);

        ddr3a_size = 0;
        if (lpae) {
            env = getenv("ddr3a_size");
            if (env)
                ddr3a_size = simple_strtol(env, NULL, 10);
            if ((ddr3a_size != 8) && (ddr3a_size != 4))
                ddr3a_size = 0;
        }

Board.c (3)

        nbanks = 1;
        start[0] = bd->bi_dram[0].start;
        size[0] = bd->bi_dram[0].size;

        /* adjust memory start address for LPAE */
        if (lpae) {
            start[0] -= K2_DDR3_START_ADDR;
            start[0] += CONFIG_SYS_LPAE_SDRAM_BASE;
        }   // segment 0

        if ((size[0] == 0x80000000) && (ddr3a_size != 0)) {
            size[1] = ((u64)ddr3a_size - 2) << 30;
            start[1] = 0x880000000;
            nbanks++;
        }   // segment 1
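To make the bank computation easier to follow, here is a hedged Python restatement of the logic in the excerpt above. It is a model for checking the arithmetic, not the U-Boot code itself: the function name `memory_banks` is hypothetical, and the value used for CONFIG_SYS_LPAE_SDRAM_BASE (0x8 0000 0000) is an assumption that matches the segment start address given on the DDR3A partition slide.

```python
K2_DDR3_START_ADDR = 0x8000_0000
# Assumed value of CONFIG_SYS_LPAE_SDRAM_BASE (not shown in the excerpt).
LPAE_SDRAM_BASE = 0x8_0000_0000


def memory_banks(bank0_start, bank0_size, mem_lpae, ddr3a_size_env):
    """Return [(start, size), ...] the way ft_board_setup derives them."""
    lpae = bool(mem_lpae)

    # ddr3a_size is honored only with LPAE, and only for the values 4 and 8.
    ddr3a_size = ddr3a_size_env if lpae and ddr3a_size_env in (4, 8) else 0

    # Segment 0: rebase the 32-bit start address when LPAE is on.
    start0 = bank0_start
    if lpae:
        start0 = start0 - K2_DDR3_START_ADDR + LPAE_SDRAM_BASE
    banks = [(start0, bank0_size)]

    # Segment 1: everything above the first 2G, placed at 0x8_8000_0000.
    if bank0_size == 0x8000_0000 and ddr3a_size != 0:
        banks.append((0x8_8000_0000, (ddr3a_size - 2) << 30))
    return banks
```

With mem_lpae=1 and ddr3a_size=8, the model yields a 2G bank at 0x08 0000 0000 and a 6G bank at 0x08 8000 0000, which matches the 8G partition described earlier. With mem_lpae=0, only the single 32-bit bank remains.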

Linux Device Tree

• The Linux device tree is an ASCII file, XX.dts, that describes the resources available to Linux. A compiled version of the file, XX.dtb, is used by the Linux kernel.
• Device tree source code has a well-defined syntax.
• The information in the device tree is used by device drivers.

Standard Device Tree Example

k2hk-evm.dts is from the public git server

    /dts-v1/;

    /include/ "keystone.dtsi"
    /include/ "k2hk.dtsi"

    / {
        compatible = "ti,k2hk-evm", "ti,keystone";

        aliases {
            ethernet1 = &interface1;
            mdio-gpio0 = <&mdiox0>;
        };

Device Tree Defines Available CPU

    cpus {
        interrupt-parent = <&gic>;

        cpu@0 {
            compatible = "arm,cortex-a15";
        };
        cpu@1 {
            compatible = "arm,cortex-a15";
        };
        cpu@2 {
            compatible = "arm,cortex-a15";
        };
        cpu@3 {
            compatible = "arm,cortex-a15";
        };
    };

Memory Defined in Device Tree

• The device tree defines which memory is used by Linux and which is used by the DSP.

• The device tree for the EVMK2H is k2hk-evm.dts. This tree defines several memories, including the total logical memory and the part of it that will be used by the kernel. It also defines which memories will be reserved for the DSP.

Memory Definitions for 6638K2K-Device Tree

    dspmem: dspmem {
        compatible = "linux,rproc-user";
        mem = <0x0c000000 0x000600000
               0xa0000000 0x20000000>;
        label = "dspmem";
    };

    memory {
        reg = <0x00000000 0x80000000 0x00000000 0x20000000>;
    };

NOTES:
• linux-keystone/arch/arm/boot/dts/k2hk-evm.dts includes two files, keystone.dtsi and k2hk.dtsi. The memories are defined in these files.
• The start address of the DSP DDR is determined by the U-Boot parameters.
• When building DSP code, one must be aware of the start DDR address for the DSP.
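The cell lists in these two nodes can be decoded with a short sketch. It assumes (this is not stated on the slide) that the memory node uses two address cells and two size cells, and that the dspmem `mem` property is a flat list of 32-bit address and size pairs; `decode_cells` is a hypothetical helper name.

```python
def decode_cells(cells, address_cells, size_cells):
    """Group 32-bit device tree cells into (address, size) tuples."""
    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("cell count is not a multiple of the entry width")

    def join(words):
        # Concatenate big-endian 32-bit cells into one integer.
        value = 0
        for word in words:
            value = (value << 32) | word
        return value

    return [
        (join(cells[i:i + address_cells]), join(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]


# memory { reg = <...>; }, assumed #address-cells = 2 and #size-cells = 2
MEMORY_REG = [0x00000000, 0x80000000, 0x00000000, 0x20000000]
# dspmem { mem = <...>; }, assumed one address cell and one size cell per entry
DSPMEM = [0x0C000000, 0x000600000, 0xA0000000, 0x20000000]
```

Under these assumptions the memory node describes 512M starting at 0x8000 0000, and dspmem describes 6M of MSMC at 0x0c00 0000 plus 512M of DDR at 0xa000 0000, which lines up with the MPM configuration entries shown earlier.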

DSP Definition in Device Tree

• For each C66x CorePac, seven memory definitions, including:
– Address of core control registers (boot address, power)
– L1P global memory address
– L1D global memory address
– L2 global memory address
• In addition, the MSM memory address and the DDR addresses that are dedicated to DSP usage are defined.
• DSP code that uses DDR must use ONLY the DDR addresses that are assigned to it.

Memory Definitions from 6638K2K Device Tree

    dsp7: dsp7 {
        compatible = "linux,rproc-user";
        reg = <0x0262005C 4
               0x02350858 4
               0x02350a58 4
               0x0262025C 4
               0x17e00000 0x00008000
               0x17f00000 0x00008000
               0x17800000 0x00100000>;
        reg-names = "boot-address", "psc-mdstat", "psc-mdctl",
                    "ipcgr", "l1pram", "l1dram", "l2ram";
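The seven `reg` entries pair up one-to-one with the seven `reg-names`. The sketch below makes that pairing explicit and also checks the L2 address against the usual C66x global-alias pattern (0x10000000 plus the core number times 0x01000000 plus the local address). Treating each `reg` entry as one address cell and one size cell is an assumption, and both helper names are hypothetical.

```python
REG = [
    0x0262005C, 4,
    0x02350858, 4,
    0x02350A58, 4,
    0x0262025C, 4,
    0x17E00000, 0x00008000,
    0x17F00000, 0x00008000,
    0x17800000, 0x00100000,
]
REG_NAMES = ["boot-address", "psc-mdstat", "psc-mdctl",
             "ipcgr", "l1pram", "l1dram", "l2ram"]


def named_regions(reg, names):
    """Map each reg-names entry to its (address, size) pair."""
    pairs = list(zip(reg[0::2], reg[1::2]))
    if len(pairs) != len(names):
        raise ValueError("reg and reg-names do not line up")
    return dict(zip(names, pairs))


def corepac_global(core, local_addr):
    """Global alias of a CorePac-local address for the given core number."""
    return 0x1000_0000 + core * 0x0100_0000 + local_addr
```

For core 7, the local L2 address 0x00800000 becomes 0x17800000, which is the `l2ram` address in the node above; for core 0 the same formula gives 0x10800000, the `globaladdr` used in the mpm_config.json entry.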

U-BOOT and mem_reserve

• The size of the DSP DDR reserved memory is defined in U-Boot as mem_reserve. The default size is 512M (0x2000 0000).
• To change the size of the reserved memory, change mem_reserve in U-Boot using setenv mem_reserve value
• NOTE: The U-Boot code uses the function ustrtoul to convert the ASCII value into a numeric value. It understands notations such as 512M.
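The suffix handling can be illustrated with a small stand-in. This is not the U-Boot function: `parse_size` is a hypothetical Python helper that accepts a decimal or 0x-prefixed number followed by an optional K, M, or G suffix, which is the behaviour the note describes for values such as 512M.

```python
SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30}


def parse_size(text):
    """Convert strings such as '512M' or '0x20000000' to a byte count."""
    text = text.strip()
    shift = 0
    if text and text[-1].upper() in SUFFIX_SHIFT:
        shift = SUFFIX_SHIFT[text[-1].upper()]
        text = text[:-1]
    # Base 0 lets int() accept both decimal and 0x-prefixed input.
    return int(text, 0) << shift
```

Both "512M" and "0x20000000" evaluate to the same value, the default reserve size quoted above.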

U-BOOT and mem_reserve

• Question: Is changing the mem_reserve value in U-Boot enough to change the memory segment that is dedicated to the DSPs for MPM?
– No. The file mpm_config.json tells MPM what memories are available. It must agree with the device tree and with U-Boot.
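One way to picture "must agree" is a consistency check across the three sources. The sketch below is an illustration only: the function name, the argument layout, and the two specific checks are assumptions, and the values in the usage note are the defaults quoted elsewhere in this deck (the local-ddr segment, the dspmem DDR range, and the 512M reserve).

```python
def check_dsp_ddr(mpm_segment, dt_start, dt_size, mem_reserve):
    """Return a list of mismatches between MPM config, device tree, and U-Boot."""
    start = int(mpm_segment["globaladdr"], 16)
    length = int(mpm_segment["length"], 16)
    problems = []
    # The MPM segment has to sit inside the range the device tree gives the DSP.
    if start < dt_start or start + length > dt_start + dt_size:
        problems.append("MPM segment falls outside the device tree dspmem range")
    # It also cannot be larger than what U-Boot reserved.
    if length > mem_reserve:
        problems.append("MPM segment is larger than mem_reserve")
    return problems


LOCAL_DDR = {"name": "local-ddr", "globaladdr": "0xa0000000",
             "length": "0x1FFFFC00", "devicename": "/dev/dspmem"}
```

With the defaults (`dt_start=0xA0000000`, `dt_size=0x20000000`, `mem_reserve=0x20000000`) the list is empty. Shrinking mem_reserve to 256M without editing mpm_config.json produces a mismatch, which is the situation the question on this slide warns about.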

Building DSP Code for MPM

• DSP projects that use RTSC must define a platform.
• The standard TI platform (standard = in the release) was not built to work with MPM if DDR is used by the DSP.
• If the DSP code uses only L2 memory, no action is needed. But if the DSP code uses DDR, a new platform must be defined.
• Projects that do not use RTSC must have a linker command file to define the memory structure. The linker command file must be modified to work with MPM.

Standard K2H Platform Definition for DSP RTSC Build

Define New DSP Platform: 2G DDR, 512M Dedicated ARM Memory

ARM-DSP Communication Architecture
KeyStone ARM-DSP Interaction

ARM-DSP Collaboration

• MPM: Managing the DSP cores from the ARM
– DSP executables are in the ARM file system
– ARM can reset, load, run, and get messages and a core dump out of a DSP core
• IPC: Exchanging data and messages between ARM and DSP
– User space libraries
– Applications that use IPC: OpenCL, OpenMP

User Mode ARM and DSP IPC Issues

• Logical and physical memory
– Contiguous memory
– Different translation types
• Linux protection
– Bypass the MMU, get the physical address from kernel space
• Linux and DSP coherency
– There is no coherency between the ARM memory and DSP direct access
• Freeing messages and data
– How does the ARM know when it can reuse the memory?

Current solution (release 4_18): IPCv3

• From ARM to DSP
– Copy the data from user space to kernel space memory
– Copy the data from kernel space memory to memory shared with the DSP
– Solves memory issues
– Solves coherency issues on the ARM (the DSP does not have hardware coherency anyhow)
– Solves the protection issue
• Needs a closed-loop protocol to reuse shared memory
• Involves two copies and requires CPU resources (control path)

IPC Types: IPCv3
Control Path: IPCv3

– Standard APIs agree with older versions of IPC
– General-purpose control path supports reliable delivery
– Designed to deliver short messages, but can be used for “unlimited” data movement
– Uses the RPMSG kernel driver for a clean partition between user and kernel space

HPC solution (release 4_19): Data path

• Used under the hood for OpenCL and OpenMP systems
• Uses cmem to get a contiguous buffer in the user domain
• Uses the Navigator to move data: one copy by the Navigator PktDMA
• The Navigator takes care of freeing memory
• Faster than the IPCv3 solution

Future solution Navigator based IPCv3

• Use the system that was developed in HPC release for genuine IPC messages between ARM and DSP

• Will be available in future releases (as of July 2014)

Support for User Develop IPC

Fast Path: PktIO and QMSS

• Contiguous memory is provided by cmem.
• On the ARM side, the netapi library supports creating, sending, and receiving packets from the ARM user space.
• Send is fire and forget; receive on the ARM is by polling. On the DSP, receive is by polling, interrupt, or accumulators (using QMSS DLL).
• Navigator-based transaction, sending packets (descriptors). Up to 64 memory regions can be defined in KeyStone II.

ARM IPC Support

• Remote Processor Messaging (RPMsg) is an open-source friendly Inter Processor Communication (IPC) framework

• SysLink (Part of the IPC release) is a runtime library that provides software connectivity between multiple processors. Each processor may run either an HLOS (such as Linux, QNX, etc.) or an RTOS (such as SYS/BIOS).

IPC Options

[Figure: chart with axes "Features and speed" and "Complexity". Labels: IPC V3, Notify, MessageQ, OpenCL and OpenMP solutions, user-defined PKTIO library (QMSS on the DSP side).]

IPC Examples

• MCSDK release has several examples that show IPC properties

• Instructions on how to install IPC and build these examples on the Linux side and the DSP side are given in the release.

• The out-of-box example is described in the next few slides.

Release IPC Examples

Managing Peripherals and IP in a Heterogeneous Device
KeyStone ARM-DSP Interaction

Configure and Use Peripherals in a Heterogeneous Device

• DSP: Chip Support Library (CSL) and Low-Level Drivers (LLD)
• ARM: Linux drivers
• Sharing resource configuration, control, and usage between different cores is done by resource management
– Protects resources from conflicting usage

DSP View of Peripherals and IP

• The Chip Support Library (CSL) provides access to the peripherals and other IP.
– CSL translates physical MMR locations into symbols, and provides functions to manipulate the MMR
• The Low-Level Drivers (LLD) are an abstraction layer that simplifies the usage of peripherals.
• Some peripherals have higher-layer libraries (on top of the LLD) that further abstract peripheral usage details from the application.

DSP: Interface via LLD and CSL Layers

[Figure: three software layers (CSL Registers Layer, CSL Function Layer, LLD Layer) over the following peripherals and IP.]

• Antenna Interface 2 (AIF2)
• Bit-rate Coprocessor (BCP)
• EDMA
• EMAC
• FFTC
• HyperLink
• NETCP: Packet Accelerator (PA)
• NETCP: Security Accelerator (SA)
• PCIe
• Packet DMA (PKTDMA)
• Queue Manager (QMSS)
• Resource Manager
• SRIO
• TSIP
• Turbo Decoder (TCPD)
• Turbo Encoder (TCPE)
• Semaphores
• GPIO
• I2C
• UART
• SPI
• EMIF 16
• McBSP
• UPP
• IPC Registers
• Timers
• Other IP

Linux Control Peripherals and IP

• The MMU controls memory access for user mode in Linux. Applications do not see physical addresses.
• Device drivers can be called by the applications. They can access physical memory.
• Linux device drivers provide:
– Modularity
– Standard interface
– Standard structure
• The Linux kernel modularity scheme enables new device drivers to be easily added to the kernel.

Linux Application API

• Device drivers can be loaded during boot time or loaded (as modules) during run time.

• Driver classification:
– Character device
– Block device
– Network interface
• Each driver type has a standard API. For example, character devices have open and close as well as read and write functions.

[Figure: layering from top to bottom. User space: application, that is, an operating system utility or application driver (the "what"). Kernel space: device driver (the "how"). Bottom: hardware registers.]

KeyStone Drivers Structure
Example - SRIO

• Device-dependent code: u-boot-keystone/drivers/rapidio/keystone_rio.h (where the u-boot-keystone directory is cloned from the public git)
• Generic driver file: linux-keystone/drivers/rapidio/rio-driver.c
• API to the application: linux-keystone/drivers/rapidio/rio.h (where the linux-keystone directory is cloned from the public git)

Linux Drivers

linux-keystone/drivers (cloned from the public git)

Resource Management

KeyStone ARM-DSP Interaction

Keystone II RM: Major Requirements

• Dynamically manage resources
• Enable management of resources at all levels within the system software architecture
– Core, task, application component (LLD)
– During initialization and during run time, from any thread
• Runtime modification of resource permissions
• Automate reservation of resources taken by the Linux kernel
• Use a generic, processor-independent transport interface that allows RM instances to communicate regardless of device hardware architecture

Keystone II RM – Overview (1)

• Instance-based client/server architecture:
– Three-instance hierarchy:
    RM Server: global management of resources and permission policies
    RM Client: provides resource services to system software elements
    RM Client Delegate (CD): offloads management of resource subsets from the Server; manages a sub-pool of resources
– Resource services provided via instance service API
• RM instances communicate over a generic transport interface
– The application must set up data paths between RM instances
– Allows RM to run on any device architecture without modification to the RM source

Keystone II RM – Overview (2)

• The RM server is a Linux process.
• Two files define the behavior of the RM: the global resource list and the policy file.
• Both files are written in the same syntax as the device tree and are compiled the same way.
• From the user's point of view, the RM calls are transparent (meaning that when you call open, init, and so on, RM is called implicitly).

Keystone II RM – Overview (3)

• Global Resource List (GRL)
– The GRL captures all resources that will be tracked for a given device
– Facilitates automatic extraction of resources used by ARM Linux from the Linux DTB
• Policies specify RM instance resource privileges
– Resource initialization, usage, and exclusive-right privileges assigned to RM instances
– Runtime modification of policy privileges
• APIs and Linux CLI (planned)

Keystone II RM: Overview

[Figure: RM instance topology. An RM Server instance in ARM user mode holds the resource allocators (QMSS, CPPI, PA, memory allocator, etc.) and the allocation policies. Its inputs are the Global Resource List (GRL), the resource policies, and the Linux DTB (available resources are the inverse of the Linux DTB). RM Client Delegate (CD) instances hold resources allocated from the Server and run a CD service transaction handler. RM Client instances on other ARM/DSP cores (n, n+1, n+2, n+3) run a client service transaction handler and expose service ports. Instances connect through the transport API over transport-specific data paths (ARM-DSP and DSP-DSP transports).]

Keystone II RM: Services

• RM services:
– Allocate (initialization, usage)
– Free
– Map resource(s) to a NameServer name
– Get resource(s) tied to an existing NameServer name
– Unmap resource(s) from an existing NameServer name
• Non-blocking service requests directly return the result
• Blocking service requests return an ID to the system
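The five services can be pictured with a toy model. This is not the TI RM API: the class, its method names, and the ownership rule are all hypothetical, and the model ignores policies, client delegates, and transports. It only shows what allocate, free, and the three NameServer operations mean for a resource defined by a base and a length.

```python
class ToyResourceManager:
    """Toy model of the RM services listed above (not the TI RM API)."""

    def __init__(self, resources):
        # resources: name -> (base, length), similar to a GRL resource range
        self._ranges = dict(resources)
        self._owners = {}   # (resource, index) -> instance name
        self._names = {}    # NameServer name -> (resource, base, length)

    def allocate(self, instance, resource, index):
        base, length = self._ranges[resource]
        if not base <= index < base + length:
            raise ValueError("index outside the resource range")
        if (resource, index) in self._owners:
            return False                    # already taken: request denied
        self._owners[(resource, index)] = instance
        return True

    def free(self, instance, resource, index):
        if self._owners.get((resource, index)) != instance:
            return False                    # only the owner may free
        del self._owners[(resource, index)]
        return True

    def map_name(self, ns_name, resource, base, length):
        self._names[ns_name] = (resource, base, length)

    def get_by_name(self, ns_name):
        return self._names[ns_name]

    def unmap_name(self, ns_name):
        del self._names[ns_name]
```

In this model a second client asking for an index that is already owned is simply refused, which is the kind of conflict protection that resource management exists to provide.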

Keystone II RM: Global Resource List (GRL)

• Specified in Device Tree Source (DTS) format
– Open-source, dual GPL/BSD-licensed LIBFDT is used for parsing the GRL
• Input to the server on initialization
• The server instantiates an allocator for each resource specified in the GRL
• A GRL specification for a resource includes:
– Resource name
– Resource range (base + length)
– Linux DTB alias path (if applicable)
– Resource NameServer assignments (if applicable)
• Permissions are not specified in the GRL; they are specified in the policies

GRL Example

• An example of the Global Resource List and policy files can be found in the MCSDK:

/MCSDK_3_00_00_XX/pdk_keystone2_1_00_00_XX/packages/ti/drv/rm/device/k2h

• The first few lines of the file are shown in the next slide.
• In the same directory there are two policy files:
– policy_dsp_arm.dts
– policy_dsp-only.dts

global-resource-list-arm-dsp.dts

    /dts-v1/;

    / {
        /* Device resource definitions based on current supported QMSS, CPPI, and
         * PA LLD resources */

        qmss {
            /* Number of descriptors inserted by ARM */
            ns-assignment = "ARM_Descriptors", <0 4096>;

            /* QMSS in joint mode affects only -qm1 resource */
            control-qm1 {
                resource-range = <0 1>;
            };
            control-qm2 {
                resource-range = <0 1>;
            };

            /* QMSS in joint mode affects only -qm1 resource */
            linkram-control-qm1 {
                resource-range = <0 1>;
            };

Policy Example: policy_dsp_arm.dts (1)

    /dts-v1/;

    /* Keystone II policy containing reserving resources used by Linux Kernel */

    / {
        /* Valid instance list contains instance names used within TI example projects
         * utilizing RM. The list can be modified as needed by applications integrating
         * RM. For an RM instance to be given permissions the name used to initialize it
         * must be present in this list */
        valid-instances = "RM_Server",
                          "RM_Client0", "RM_Client1", "RM_Client2", "RM_Client3",
                          "RM_Client4", "RM_Client5", "RM_Client6", "RM_Client7";

Policy Example: policy_dsp_arm.dts (2)

        qmss {
            control-qm1 {
                assignments = <0 1>, "iu = (*)";
            };
            control-qm2 {
                assignments = <0 1>, "iu = (*)";
            };

            linkram-control-qm1 {
                assignments = <0 1>, "(*)"; /* Used by Kernel */
            };
            linkram-control-qm2 {
                assignments = <0 1>, "(*)"; /* Used by Kernel */
            };

            linkram-qm1 {
                assignments = <0x00000000 0xFFFFFFFF>, "iu = (*)";
            };
            linkram-qm2 {

For More Information

• Software downloads and device-specific Data Manuals for the KeyStone II SoCs can be found at TI.com/multicore.
• For articles related to multicore software and tools, refer to the Embedded Processors Wiki for the KeyStone Device Architecture.
• For questions regarding topics covered in this training, visit the support forums at the TI E2E Community website.

Backup – PktLib Utility Libraries

For More Information

• Software downloads and device-specific Data Manuals for the KeyStone SoCs can be found at TI.com/multicore.
• Multicore articles, tools, and software are available at the Embedded Processors Wiki for the KeyStone Device Architecture.
• View the complete C66x Multicore SOC Online Training for KeyStone Devices, including details on the individual modules.
• For questions regarding topics covered in this training, visit the support forums at the TI E2E Community website.
