
Qualcomm® Snapdragon™ Heterogeneous Compute SDK
Documentation and Interface Specification

80-P2432-1 B

August 28, 2019

Qualcomm® Snapdragon™ Heterogeneous Compute SDK is a product of Qualcomm Technologies, Inc. Other Qualcomm products referenced herein are products of Qualcomm Technologies, Inc. or its other subsidiaries.

This technical data may be subject to U.S. and international export, re-export, or transfer ("export") laws. Diversion contrary to U.S. and international law is strictly prohibited.

© 2017 Qualcomm Technologies, Inc. All rights reserved.


Submit technical questions at:

[email protected]

Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries. All Qualcomm Incorporated trademarks are used with permission. Other product and brand names may be trademarks or registered trademarks of their respective owners.

Qualcomm Technologies, Inc.
5775 Morehouse Drive
San Diego, CA 92121-1714
U.S.A.


Revision History

Revision    Date            Description
A           March 2018      Version 1.0
B           October 2018    Version 1.1

80-P2432-1 B MAY CONTAIN U.S. AND INTERNATIONAL EXPORT CONTROLLED INFORMATION 3


Qualcomm® Snapdragon™ Heterogeneous Compute SDK CONTENTS

Contents

1 Introduction 16
1.1 Purpose 16
1.2 Scope 16
1.3 Conventions 16
1.4 References 17
1.5 Technical Assistance 17
1.6 Acronyms 17

2 Installing Snapdragon™ Heterogeneous Compute SDK 18
2.1 Verifying your installation 18
2.2 Integrating HetCompute with Android NDK Applications 19
2.3 Hexagon DSP Support 20
2.4 OpenCL C++ Support 21

3 Getting Started 22
3.1 Writing your first HetCompute program 22
3.1.1 Building a HetCompute program using ndk-build 23

4 User Guide 24
4.1 Overview 24
4.1.1 Writing a HetCompute Application 27
4.1.1.1 Parallel vector addition 28
4.1.1.2 Parallel sort 28
4.1.1.3 Parallelism using tasks 29
4.1.2 Executing a HetCompute Application 31
4.2 Parallel Programming Patterns 32
4.2.1 Overview of HetCompute Patterns 32
4.2.2 Parallel Iteration 32
4.2.2.1 Parallel For Loop 33
4.2.2.2 Parallel Transformation 33
4.2.3 Parallel Reduction 34
4.2.4 Parallel Scan 37
4.2.5 Parallel Divide-and-Conquer 37
4.2.6 Parallel Sorting 40
4.2.7 Advanced Topics for Patterns 41
4.2.7.1 Pattern Object 41
4.2.7.2 Tuner 42
4.2.8 Pipeline 43
4.2.8.1 Overview 44
4.2.8.2 HetCompute Pipeline Example 46
4.2.8.3 HetCompute Pipeline Details 47
4.2.8.4 Launch the HetCompute pipeline 51
4.2.8.5 Heterogeneous Pipeline (HetCompute Beta Feature) 53
4.2.8.6 Heterogeneous Pipeline Details 56
4.3 Introduction to Tasks 57
4.3.1 Kernels: The Path to Heterogeneity 60


4.3.1.1 Revisiting Hello World 60
4.3.1.2 Creating a Kernel 61
4.3.1.3 Setting Kernel Attributes 64
4.3.1.4 Kernels: Advanced Topics 65
4.3.1.5 Poly-kernels (Beta feature) 66
4.3.1.6 hetcompute::range and hetcompute::index 68
4.3.1.7 Using hetcompute::range<N> to represent ND-Range (in OpenCL) 68
4.3.1.8 hetcompute::range<1> 68
4.3.1.9 hetcompute::range<2> 69
4.3.1.10 hetcompute::range<3> 70
4.3.1.11 Strided Ranges 70
4.3.2 Creating Tasks 70
4.3.2.1 Create Tasks Using Lambda Expressions 72
4.3.2.2 Create Tasks Using Classes 74
4.3.2.3 Create Tasks Using Function Pointers 76
4.3.3 Task Pointers 77
4.3.4 Life of a HetCompute Task 79
4.3.4.1 The Green Line to Successful Completion 80
4.3.4.2 The Red Line to Cancellation 80
4.3.5 Launching Tasks 81
4.3.6 Task Dependencies 82
4.3.6.1 Control Dependencies 83
4.3.6.2 Data Dependencies 83
4.3.6.3 Heterogeneous Task Graphs 85
4.3.7 Task Groups 88
4.3.7.1 Group Creation 88
4.3.7.2 Launching Tasks or Kernels to Groups 89
4.3.7.3 Group Cancellation 96
4.3.8 Waiting for Tasks 97
4.3.9 Exceptions and Cancellation 98
4.3.9.1 Aggregate Exception 100
4.3.9.2 GPU/DSP Exception 101
4.3.9.3 Cancellation 101
4.3.9.4 Synchronization Points where Exceptions are Observable 102
4.3.9.5 Canceling a Task 102
4.3.10 Blocking Tasks 109
4.3.10.1 Blocking Kernel 109
4.3.10.2 hetcompute::blocking 110
4.3.11 Algebraic Operations on Tasks 111
4.3.12 Task-Pointer Collapsing 113
4.3.13 Unleashing Asynchrony 114
4.3.13.1 finish_after 115
4.3.13.2 Asynchronous APIs 118
4.3.13.3 Cancellation 120
4.3.13.4 Summary 122
4.4 Buffers 122
4.4.1 Basic Usage of Buffers 122
4.4.2 Using Buffers with Tasks 124
4.4.2.1 Buffers with CPU Tasks 124
4.4.2.2 Buffers with GPU Tasks 125


4.4.2.3 Buffers with DSP Tasks 126
4.4.3 Synchronized and Concurrent Use 127
4.4.3.1 Synchronized Access to Buffers Across Host Code and Tasks 127
4.4.3.2 Concurrent Access by Tasks 127
4.4.4 Creating Buffers 127
4.4.4.1 With Storage Fully Managed by HetCompute 127
4.4.4.2 With User-provided Initial Storage and Data 127
4.4.4.3 With a Memory Region 127
4.4.5 Performance and Storage Optimizations When Using Buffers 128
4.4.5.1 Explicit Synchronization with Host Code 128
4.4.5.2 Providing Device Hints 130
4.4.6 Memory Regions 130
4.5 Textures 131
4.5.1 QCOM Extended Image format 133
4.6 Data Structures 134
4.6.1 HetCompute Lock-Free Queue 134
4.7 Storage 135
4.7.1 Task-Local Storage 135
4.7.2 Scheduler-Local Storage 136
4.7.3 Thread-Local Storage 138
4.8 Affinity 139
4.8.1 Overriding Local Affinity Settings 141
4.9 Heterogeneous Computing in Action 142
4.10 Interoperability 145
4.10.1 Safe Points 145
4.10.2 Using HetCompute with the Fork() System Call 145
4.10.3 Using HetCompute with TLS-aware Libraries 146
4.10.4 Distributed Computing using HetCompute 147
4.10.5 Avoid the Use of C++ iostream and stringstream Libraries 147

5 Parallel Processing Tutorial 148
5.1 Abstract 148
5.2 Parallel Speedups 148
5.3 Parallel Programming Paradigms 149
5.3.1 Data parallelism (SIMD) 149
5.3.2 Task parallelism (MIMD) 150
5.3.3 Braided parallelism 150
5.3.4 Pipeline parallelism or Streaming 151
5.4 Parallel Programming Patterns 151
5.5 Optimizations 152
5.5.1 Cache locality 152
5.5.2 Minimizing wait time and synchronization 153
5.5.3 Load balancing 154
5.6 Conclusions 154

6 Image Processing Tutorial 155
6.1 Abstract 155
6.2 Image Processing Filter 155
6.3 Parallel Image Processing using HetCompute 156
6.3.1 Naive Parallelization 156


6.3.2 Tiling for Parallelization 156
6.3.3 Parallelization using patterns 158

7 Point Kernels (Beta feature) 159

8 Patterns Reference API 160
8.1 Parallel For Loop 161
8.1.1 Class Documentation 162
8.1.1.1 class hetcompute::pattern::pfor 162
8.1.1.2 class hetcompute::pattern::pfor< hetcompute::internal::pointkernel::pointkernel< RT, PKType...>, T2 > 163
8.1.1.3 class hetcompute::pattern::pfor< T1, void > 164
8.1.2 Function Documentation 165
8.1.2.1 create_pfor_each 165
8.1.2.2 pfor_each 165
8.1.2.3 pfor_each 166
8.1.2.4 pfor_each 167
8.1.2.5 pfor_each_async 167
8.1.2.6 pfor_each_async 168
8.1.2.7 pfor_each_async 168
8.1.2.8 pfor_each_async 168
8.2 Parallel Transformation 170
8.2.1 Class Documentation 170
8.2.1.1 class hetcompute::pattern::ptransformer 170
8.2.2 Function Documentation 172
8.2.2.1 create_ptransform 172
8.2.2.2 ptransform 172
8.2.2.3 ptransform 173
8.2.2.4 ptransform 174
8.2.2.5 ptransform_async 175
8.2.2.6 ptransform_async 175
8.2.2.7 ptransform_async 175
8.3 Parallel Reduction 177
8.3.1 Class Documentation 177
8.3.1.1 class hetcompute::pattern::preducer 177
8.3.2 Function Documentation 178
8.3.2.1 create_preduce 178
8.3.2.2 preduce 179
8.3.2.3 preduce 180
8.3.2.4 preduce 181
8.3.2.5 preduce_async 181
8.3.2.6 preduce_async 182
8.3.2.7 preduce_async 182
8.4 Parallel Scan 184
8.4.1 Class Documentation 184
8.4.1.1 class hetcompute::pattern::pscan 184
8.4.2 Function Documentation 185
8.4.2.1 create_pscan_inclusive 185
8.4.2.2 pscan_inclusive 185
8.4.2.3 pscan_inclusive_async 186


8.5 Parallel Divide-and-Conquer 187
8.5.1 Class Documentation 187
8.5.1.1 class hetcompute::pattern::pdivide_and_conquerer 187
8.5.2 Function Documentation 188
8.5.2.1 create_pdivide_and_conquer 189
8.5.2.2 pdivide_and_conquer 189
8.5.2.3 pdivide_and_conquer 191
8.5.2.4 pdivide_and_conquer 191
8.5.2.5 pdivide_and_conquer_async 193
8.5.2.6 pdivide_and_conquer_async 193
8.5.2.7 pdivide_and_conquer_async 194
8.6 Parallel Sorting 195
8.6.1 Class Documentation 195
8.6.1.1 class hetcompute::pattern::psorter 195
8.6.2 Function Documentation 196
8.6.2.1 create_psort 196
8.6.2.2 psort 197
8.6.2.3 psort 197
8.6.2.4 psort_async 198
8.6.2.5 psort_async 198
8.7 Pipeline 199
8.7.1 Class Documentation 199
8.7.1.1 class hetcompute::iteration_lag 200
8.7.1.2 class hetcompute::iteration_rate 201
8.7.1.3 class hetcompute::parallel_stage 202
8.7.1.4 class hetcompute::pattern::pipeline 203
8.7.1.5 class hetcompute::pipeline_context< UserData > 211
8.7.1.6 class hetcompute::pipeline_context<> 212
8.7.1.7 class hetcompute::pipeline_context_base 213
8.7.1.8 class hetcompute::serial_stage 217
8.7.1.9 class hetcompute::sliding_window_size 218
8.7.1.10 class hetcompute::stage_input 219
8.7.2 Typedef Documentation 221
8.7.2.1 serial_stage_type 221
8.7.3 Enumeration Type Documentation 221
8.7.3.1 serial_stage_type 222
8.8 Tuner 223
8.8.1 Class Documentation 223
8.8.1.1 class hetcompute::pattern::tuner 223

9 Tasks Reference API 229
9.1 Groups 230
9.1.1 Class Documentation 231
9.1.1.1 class hetcompute::group 231
9.1.1.2 class hetcompute::group_ptr 245
9.1.2 Function Documentation 250
9.1.2.1 create_group 250
9.1.2.2 create_group 251
9.1.2.3 create_group 252
9.1.2.4 finish_after 252


9.1.2.5 finish_after 253
9.1.2.6 intersect 253
9.1.2.7 operator& 254
9.2 Kernels 256
9.2.1 Class Documentation 257
9.2.1.1 struct hetcompute::beta::call_tuple 257
9.2.1.2 struct hetcompute::beta::call_tuple< Dim, gpu_kernel< Args...> > 258
9.2.1.3 class hetcompute::beta::cl_t 258
9.2.1.4 class hetcompute::cpu_kernel 258
9.2.1.5 class hetcompute::cpu_kernel< FReturnType(FArgs...)> 261
9.2.1.6 class hetcompute::dsp_kernel 263
9.2.1.7 class hetcompute::dsp_kernel< int(*)(Args...)> 263
9.2.1.8 class hetcompute::beta::gl_t 265
9.2.1.9 class hetcompute::gpu_kernel 265
9.2.1.10 class hetcompute::local 268
9.2.2 Function Documentation 269
9.2.2.1 create_cpu_kernel 269
9.2.2.2 create_cpu_kernel 269
9.2.2.3 create_dsp_kernel 270
9.2.2.4 create_gpu_kernel 270
9.2.2.5 create_gpu_kernel 270
9.2.2.6 create_gpu_kernel 271
9.2.2.7 create_gpu_kernel 271
9.2.2.8 create_gpu_kernel 272
9.2.3 Variable Documentation 272
9.2.3.1 cl 272
9.2.3.2 gl 272
9.3 Indices 273
9.3.1 Class Documentation 273
9.3.1.1 class hetcompute::index 273
9.3.1.2 class hetcompute::index< 1 > 273
9.3.1.3 class hetcompute::index< 2 > 273
9.3.1.4 class hetcompute::index< 3 > 273
9.3.1.5 class hetcompute::index_base 274
9.4 Ranges 280
9.4.1 Class Documentation 280
9.4.1.1 class hetcompute::range 280
9.4.1.2 class hetcompute::range< 1 > 280
9.4.1.3 class hetcompute::range< 2 > 282
9.4.1.4 class hetcompute::range< 3 > 285
9.4.1.5 class hetcompute::range_base 289
9.5 Tasks 292
9.5.1 Class Documentation 299
9.5.1.1 struct hetcompute::do_not_collapse_t 299
9.5.1.2 class hetcompute::task< ReturnType > 299
9.5.1.3 class hetcompute::task< ReturnType(Args...)> 301
9.5.1.4 class hetcompute::task< void > 305
9.5.1.5 class hetcompute::task<> 305
9.5.1.6 class hetcompute::task_ptr< ReturnType > 316
9.5.1.7 class hetcompute::task_ptr< ReturnType(Args...)> 322


9.5.1.8 class hetcompute::task_ptr< void > 326
9.5.1.9 class hetcompute::task_ptr<> 329
9.5.2 Typedef Documentation 335
9.5.2.1 collapsed_task_type 335
9.5.2.2 non_collapsed_task_type 335
9.5.3 Function Documentation 335
9.5.3.1 abort_on_cancel 335
9.5.3.2 abort_task 337
9.5.3.3 bind_as_data_dependency 339
9.5.3.4 bind_by_value 339
9.5.3.5 blocking 339
9.5.3.6 create_task 340
9.5.3.7 create_task 341
9.5.3.8 create_value_task 342
9.5.3.9 finish_after 344
9.5.3.10 launch 344
9.5.3.11 launch 345
9.5.3.12 operator!= 346
9.5.3.13 operator!= 346
9.5.3.14 operator!= 347
9.5.3.15 operator% 347
9.5.3.16 operator% 347
9.5.3.17 operator% 347
9.5.3.18 operator& 348
9.5.3.19 operator& 348
9.5.3.20 operator& 348
9.5.3.21 operator* 349
9.5.3.22 operator* 349
9.5.3.23 operator* 349
9.5.3.24 operator+ 349
9.5.3.25 operator+ 350
9.5.3.26 operator+ 351
9.5.3.27 operator+ 351
9.5.3.28 operator- 352
9.5.3.29 operator- 353
9.5.3.30 operator- 353
9.5.3.31 operator- 354
9.5.3.32 operator/ 354
9.5.3.33 operator/ 354
9.5.3.34 operator/ 354
9.5.3.35 operator== 355
9.5.3.36 operator== 355
9.5.3.37 operator== 355
9.5.3.38 operator>> 355
9.5.3.39 operator^ 356
9.5.3.40 operator^ 356
9.5.3.41 operator^ 356
9.5.3.42 operator| 356
9.5.3.43 operator| 357
9.5.3.44 operator| . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . 357

9.5.3.45 operator~
9.5.4 Variable Documentation
9.5.4.1 do_not_collapse
10 Buffers Reference API
10.1 Heterogeneous Compute Device Types
10.1.1 Class Documentation
10.1.1.1 class hetcompute::device_set
10.1.2 Enumeration Type Documentation
10.1.2.1 device_type
10.1.3 Function Documentation
10.1.3.1 to_string
10.2 Buffers
10.2.1 Class Documentation
10.2.1.1 class hetcompute::buffer_const_iterator
10.2.1.2 class hetcompute::buffer_iterator
10.2.1.3 class hetcompute::buffer_ptr
10.2.1.4 struct hetcompute::in
10.2.1.5 struct hetcompute::inout
10.2.1.6 struct hetcompute::out
10.2.1.7 class hetcompute::scope_acquire_ro
10.2.1.8 class hetcompute::scope_acquire_rw
10.2.1.9 class hetcompute::scope_acquire_wi
10.2.2 Function Documentation
10.2.2.1 create_buffer
10.2.2.2 create_buffer
10.2.2.3 create_buffer
10.3 Memory Regions
10.3.1 Class Documentation
10.3.1.1 class hetcompute::glbuffer_memregion
10.3.1.2 class hetcompute::ion_memregion
10.3.1.3 class hetcompute::main_memregion
10.3.1.4 class hetcompute::memregion
10.3.1.5 class hetcompute::svm_memregion
10.3.2 Function Documentation
10.3.2.1 glbuffer_memregion
10.3.2.2 ion_memregion
10.3.2.3 ion_memregion
10.3.2.4 ion_memregion
10.3.2.5 main_memregion
10.3.2.6 main_memregion
10.3.2.7 svm_memregion
10.3.2.8 svm_memregion
10.3.2.9 get_fd
10.3.2.10 get_id
10.3.2.11 get_num_bytes
10.3.2.12 get_ptr
10.3.2.13 get_ptr
10.3.2.14 get_ptr
10.3.2.15 is_cacheable

10.3.3 Variable Documentation
10.3.3.1 s_default_alignment
11 Graphics Reference API
11.1 Texture APIs
11.1.1 Function Documentation
11.1.1.1 create_derivative_texture
11.1.1.2 create_sampler
11.1.1.3 create_texture
11.1.1.4 create_texture
11.1.1.5 is_supported
11.1.1.6 map
11.1.1.7 unmap
11.2 Texture Data Types
11.2.1 Class Documentation
11.2.1.1 struct hetcompute::graphics::image_size
11.2.1.2 struct hetcompute::graphics::image_size< 1 >
11.2.1.3 struct hetcompute::graphics::image_size< 2 >
11.2.1.4 struct hetcompute::graphics::image_size< 3 >
11.2.2 Enumeration Type Documentation
11.2.2.1 addressing_mode
11.2.2.2 extended_format_plane_type
11.2.2.3 filter_mode
11.2.2.4 image_format
12 Data Structures Reference API
12.1 Bounded Lock-Free Queue
12.1.1 Class Documentation
12.1.1.1 class hetcompute::bounded_lfqueue
12.1.2 Function Documentation
12.1.2.1 bounded_lfqueue
12.1.2.2 pop
12.1.2.3 push
12.2 Unbounded Lock-Free Queue
12.2.1 Class Documentation
12.2.1.1 class hetcompute::lfqueue
12.2.2 Function Documentation
12.2.2.1 lfqueue
12.2.2.2 pop
12.2.2.3 push
13 Data Sharing and Storage Reference API
13.1 Data Sharing Synchronization
13.2 Scheduler Storage
13.2.1 Class Documentation
13.2.1.1 class hetcompute::scheduler_storage_ptr
13.3 Scoped Storage
13.3.1 Class Documentation
13.3.1.1 class hetcompute::scoped_storage_ptr
13.4 Task Storage

13.4.1 Class Documentation
13.4.1.1 class hetcompute::task_storage_ptr
13.5 Thread Storage
13.5.1 Class Documentation
13.5.1.1 class hetcompute::thread_storage_ptr
14 Exceptions Reference API
14.1 Exceptions
14.1.1 Class Documentation
14.1.1.1 class hetcompute::abort_task_exception
14.1.1.2 class hetcompute::aggregate_exception
14.1.1.3 class hetcompute::api_exception
14.1.1.4 class hetcompute::canceled_exception
14.1.1.5 class hetcompute::dsp_exception
14.1.1.6 class hetcompute::error_exception
14.1.1.7 class hetcompute::gpu_exception
14.1.1.8 class hetcompute::hetcompute_exception
14.1.1.9 class hetcompute::tls_exception
14.2 ErrorCodes
14.2.1 Enumeration Type Documentation
14.2.1.1 hc_error
15 Affinity Management API
15.1 Affinity Settings
15.1.1 Class Documentation
15.1.1.1 struct hetcompute_affinity_settings_t
15.1.1.2 class hetcompute::affinity::settings
15.1.2 Typedef Documentation
15.1.2.1 hetcompute_func_ptr_t
15.1.3 Enumeration Type Documentation
15.1.3.1 cores
15.1.3.2 hetcompute_affinity_cores_t
15.1.3.3 hetcompute_affinity_mode_t
15.1.3.4 hetcompute_affinity_pin_threads_t
15.1.3.5 mode
15.1.4 Function Documentation
15.1.4.1 settings
15.1.4.2 ~settings
15.1.4.3 execute
15.1.4.4 get
15.1.4.5 get_cores
15.1.4.6 get_mode
15.1.4.7 get_pin_threads
15.1.4.8 hetcompute_affinity_execute
15.1.4.9 hetcompute_affinity_get
15.1.4.10 hetcompute_affinity_reset
15.1.4.11 hetcompute_affinity_set
15.1.4.12 is_this_big_core
15.1.4.13 operator!=
15.1.4.14 operator==

15.1.4.15 reset
15.1.4.16 reset_pin_threads
15.1.4.17 set
15.1.4.18 set_cores
15.1.4.19 set_mode
15.1.4.20 set_pin_threads
16 Miscellaneous
16.1 Interoperability
16.2 Legacy
16.2.1 Function Documentation
16.2.1.1 init
16.2.1.2 shutdown
17 Class Documentation
17.1 cpu_kernel Class Reference
17.2 dsp_kernel Class Reference
17.3 HetComputeApp::features Class Reference
17.4 hetcompute::beta::pattern::pipeline< UserData > Class Template Reference
17.4.1 Member Typedef Documentation
17.4.1.1 context
17.4.2 Constructors and Destructors
17.4.2.1 pipeline
17.4.2.2 ~pipeline
17.4.2.3 pipeline
17.4.2.4 pipeline
17.4.3 Member Function Documentation
17.4.3.1 add_stage
17.4.3.2 add_stage
17.4.3.3 operator=
17.4.3.4 operator=
17.5 hetcompute::internal::pointkernel::pointkernel< RT, Args > Class Template Reference
17.6 stage_input_base Class Reference
17.7 hetcompute::internal::task_factory< X, Y, Z > Struct Template Reference
17.8 hetcompute::internal::task_factory_dispatch< X, Y > Struct Template Reference
Alphabetical Index
Bibliography

List of Tables
1-1 Reference documents and standards
1-2 Acronyms

1 Introduction

1.1 Purpose

This document describes the Qualcomm® Snapdragon™ Heterogeneous Compute SDK programming model and API.

1.2 Scope

This document is for system developers using the Qualcomm Heterogeneous Compute SDK to develop domain-specific libraries for high-performance applications. Qualcomm Heterogeneous Compute SDK handles core management, providing the ability to port an application across multiple cores. Speed is determined by the number of processors on the device.

This document provides the public interfaces necessary to use the features provided by the Qualcomm Heterogeneous Compute SDK. A functional overview and information on leveraging the interface functionality are also provided.

1.3 Conventions

Function declarations, function names, type declarations, and code samples appear in a different font. For example, #include.

Code variables appear in angle brackets. For example, <number>.

Commands and command variables appear in a different font. For example, {copy a:*.* b:}.

1.4 References

The following table lists reference documents, which may include Qualcomm documents and non-Qualcomm standards and resources. Reference documents that are no longer applicable are deleted from this table; therefore, reference numbers might not be sequential. This document also includes a Bibliography at the end, with linkable citations throughout.

Table 1-1 Reference documents and standards

Ref.  Document
Qualcomm
Q1    Application Note: Software Glossary for Customers    CL93-V3077-1

1.5 Technical Assistance

For assistance or clarification on information in this guide, send email to Qualcomm Technologies, Inc. [email protected].

1.6 Acronyms

For definitions of commonly used terms and abbreviations, refer to Q1. The following terms are specific to this document.

Table 1-2 Acronyms

Acronym              Definition
API                  application programming interface
DAG                  directed acyclic graph
GPGPU                general purpose GPU
Qualcomm HetCompute  Qualcomm® Snapdragon™ Heterogeneous Compute SDK
MIMD                 multiple instruction, multiple data
MPI                  message passing interface
NDEBUG               C/C++ preprocessor macro for NO DEBUG
NDK                  Native Development Kit
SAXPY                scalar vector multiply
SIMD                 single instruction, multiple data
SMP                  symmetric multiprocessing
SoC                  system-on-a-chip
TLS                  thread local storage
aDSP                 Application/Audio DSP
cDSP                 Compute DSP
SVM                  Shared Virtual Memory

2 Installing Snapdragon™ Heterogeneous Compute SDK

This chapter explains how to configure an application to use HetCompute given the binary distribution. The installer package available from the Qualcomm Developer Network contains precompiled dynamic libraries for Android (32-bit and 64-bit ARM). Install the distribution on your system following the installer prompts, and then see the appropriate section below on how to verify the installation and integrate it with your application.

2.1 Verifying your installation

By default, the binary installer places the HetCompute library, headers, and samples in /opt/Qualcomm/SnapdragonHeterogeneousComputeSDK/<version>/<platform> on Linux and macOS, and in C:\Qualcomm\SnapdragonHeterogeneousComputeSDK\<version>\<platform> on Windows. This directory is called HETCOMPUTE_DIR throughout. Substitute <platform> with either the 32-bit (armeabi-v7a) or the 64-bit (arm64-v8a) variant. If the SDK is installed in a different location, that location becomes HETCOMPUTE_DIR.

Android 32-bit (armeabi-v7a):

• CPU, GPU, and Hexagon DSP support: libhetCompute-1.1.0.so

Android 64-bit (arm64-v8a):

• CPU, GPU, and Hexagon DSP support: libhetCompute-1.1.0.so

HetCompute assumes the existence of a working Android NDK and SDK. We recommend using NDK r13b or later.

Note: The Qualcomm Hexagon SDK (available on Qualcomm Developer Network) is needed to enable support for the Hexagon DSP in the Qualcomm HetCompute library. The recommended version of the Qualcomm Hexagon SDK for use with HetCompute is 3.4.0 or later.

Before compiling the samples, specify the path to the root of the OpenCL directory containing the headers and the library by initializing QSHETCOMPUTE_OPENCL_PATH in $HETCOMPUTE_DIR/samples/build/android/jni/Android.mk.

To verify the installation, perform the following:

Using ndk-build:

    cd $HETCOMPUTE_DIR/samples/build/android/jni
    $ANDROID_NDK/ndk-build

    # Create a directory on the device to push the executable
    $ANDROID_SDK/adb shell mkdir /data/local/tmp/hetcompute

    # Push the executable to the device
    # Replace armeabi-v7a by arm64-v8a for 64 bit devices
    $ANDROID_SDK/adb push ../obj/local/armeabi-v7a/hetcompute_sample_helloworld /data/local/tmp/hetcompute

    # Push the hetcompute dynamic library to the device.
    # The 32-bit library should be pushed to /system/vendor/lib,
    # while the 64-bit library should be pushed to /system/vendor/lib64
    # Make sure to replace QSHETCOMPUTE_VERSION and Target Architecture with appropriate values in the below command.
    $ANDROID_SDK/adb push $HETCOMPUTE_DIR/lib/$(TARGET_ARCH_ABI)/libhetCompute-$(QSHETCOMPUTE_VERSION).so /system/vendor/lib
    $ANDROID_SDK/adb shell /data/local/tmp/hetcompute/hetcompute_sample_helloworld

The above ndk-build invocation builds both the 32-bit and 64-bit variants of the samples.

Note that some of the HetCompute GPU samples require image files; the samples assume that the image files are under /mnt/sdcard on the device. Sample image files can be found in the HETCOMPUTE_DIR/samples/src directory.
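
The library path and the device directory used in the adb push commands above follow a fixed pattern. As a minimal sketch only, the C++ helpers below spell that pattern out. The function names are hypothetical and are not part of the SDK; the version string 1.1.0 is taken from the library names listed earlier in this section.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an SDK function): host-side path of the HetCompute
// shared library, mirroring the pattern in the adb push command:
//   $HETCOMPUTE_DIR/lib/<abi>/libhetCompute-<version>.so
std::string hetcompute_library_path(const std::string& hetcompute_dir,
                                    const std::string& abi,
                                    const std::string& version)
{
    return hetcompute_dir + "/lib/" + abi + "/libhetCompute-" + version + ".so";
}

// Hypothetical helper (not an SDK function): device directory for the library.
// The 32-bit library goes to /system/vendor/lib, the 64-bit one to
// /system/vendor/lib64, as stated in the comments of the commands above.
std::string device_library_dir(const std::string& abi)
{
    return abi == "arm64-v8a" ? "/system/vendor/lib64" : "/system/vendor/lib";
}

// Usage example with an assumed install directory /opt/hc.
void library_path_sketch_usage()
{
    assert(hetcompute_library_path("/opt/hc", "armeabi-v7a", "1.1.0")
           == "/opt/hc/lib/armeabi-v7a/libhetCompute-1.1.0.so");
    assert(device_library_dir("arm64-v8a") == "/system/vendor/lib64");
}
```

Keeping the ABI as the single input for both the host path and the device directory avoids pushing a 64-bit library into the 32-bit directory by mistake.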

2.2 Integrating HetCompute with Android NDK Applications

The precompiled HetCompute libraries can be easily integrated with an existing native Android application. These libraries have been compiled with the Google NDK r13b, using the clang toolchain and linked against the c++_static runtime. The default Android build platform is android-21. Using the same NDK, compiler, and runtime C++ library is recommended.

You will need to make the following changes to your project files to use the HetCompute libraries. First, edit your project’s jni/Application.mk file to include the following entries:

    # APP_STL defines the C++ runtime to use
    APP_STL := c++_static
    # For 64-bit android, APP_ABI := arm64-v8a
    APP_ABI := armeabi-v7a
    NDK_TOOLCHAIN_VERSION := clang
    # set the APP_PLATFORM to match your platform version
    APP_PLATFORM := android-21

Next, edit your project’s jni/Android.mk to define the location of the HetCompute libraries and headers and to declare the prebuilt shared library.

    # Heterogeneous Compute SDK prebuilt
    include $(CLEAR_VARS)

    LOCAL_MODULE := qshetcompute
    LOCAL_SRC_FILES := $(HETCOMPUTE_DIR)/$(TARGET_ARCH_ABI)/libhetCompute-$(QSHETCOMPUTE_VERSION).so
    LOCAL_EXPORT_C_INCLUDES := $(HETCOMPUTE_DIR)/include

    include $(PREBUILT_SHARED_LIBRARY)

If an application needs to disable exceptions, make the following changes in the Android.mk and Application.mk files.

    # Add the following CFLAGS in Android.mk
    LOCAL_CFLAGS := -DHETCOMPUTE_DISABLE_EXCEPTIONS

    # Disable exceptions for the app in Application.mk
    APP_CPPFLAGS += -fno-exceptions
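
With -fno-exceptions, application code that contains try, catch, or throw no longer compiles, so application sources usually need a guard of their own. The following is a minimal sketch of one way to do that, assuming the application reuses the HETCOMPUTE_DISABLE_EXCEPTIONS macro from LOCAL_CFLAGS for its own code. The helper functions are hypothetical application code, not SDK functions, and the sketch does not describe how the SDK itself reports errors in this mode.

```cpp
#include <cassert>
#include <string>
#ifndef HETCOMPUTE_DISABLE_EXCEPTIONS
#include <stdexcept>
#endif

// Hypothetical application helper: parses a non-negative decimal number.
// Reports failure through the return value, so it works in both build modes.
// Overflow is not checked in this sketch.
bool parse_count(const std::string& text, unsigned& out)
{
    if (text.empty()) return false;
    unsigned value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return false;
        value = value * 10 + static_cast<unsigned>(c - '0');
    }
    out = value;
    return true;
}

// Returns the parsed value, or fallback if the text is not a valid count.
unsigned parse_count_or_default(const std::string& text, unsigned fallback)
{
    unsigned value = 0;
#ifdef HETCOMPUTE_DISABLE_EXCEPTIONS
    // Exceptions disabled: use the error-code path only.
    return parse_count(text, value) ? value : fallback;
#else
    // Exceptions enabled: the same logic expressed with throw/catch.
    try {
        if (!parse_count(text, value)) throw std::invalid_argument("not a count");
        return value;
    } catch (const std::invalid_argument&) {
        return fallback;
    }
#endif
}

// Usage example: both build modes give the same results.
void exceptions_sketch_usage()
{
    assert(parse_count_or_default("42", 7) == 42);
    assert(parse_count_or_default("oops", 7) == 7);
}
```

Routing all error handling through one guarded function keeps the rest of the application identical in both configurations.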

Here is a sample Android.mk file that is used to build the shipped samples:

    define hetcompute_add_sample
    include $(CLEAR_VARS)
    LOCAL_MODULE := hetcompute_sample_$1
    LOCAL_C_INCLUDES := $(QSHETCOMPUTE_CORE_INCLUDE_PATH) \
                        $(QSHETCOMPUTE_OPENCL_INC_PATH) \
                        $(QSHETCOMPUTE_DSP_STUB_PATH)

    LOCAL_SHARED_LIBRARIES := qshetcompute libhetcompute-hexagon-prebuilt libOpenCL-prebuilt
    LOCAL_CPPFLAGS := -pthread -std=c++11
    LOCAL_LDLIBS := -llog -lGLESv3 -lEGL
    LOCAL_CFLAGS := -DHAVE_CONFIG_H=1 -DHAVE_ANDROID_LOG_H=1 -DHETCOMPUTE_HAVE_RTTI=1 -DHETCOMPUTE_HAVE_OPENCL=1 -DHETCOMPUTE_HAVE_GPU=1 -DHETCOMPUTE_HAVE_GLES=1 -DHETCOMPUTE_HAVE_QTI_DSP=1 -DHETCOMPUTE_THROW_ON_API_ASSERT=1 -DHETCOMPUTE_LOG_FIRE_EVENT=1

    ifeq ($(TARGET_ARCH_ABI), arm64-v8a)
    LOCAL_LDFLAGS := -Wl,-allow-shlib-undefined
    endif
    LOCAL_SRC_FILES := $(QSHETCOMPUTE_SAMPLES_SRC_PATH)/$1.cc
    include $(BUILD_EXECUTABLE)
    endef

To build your application, run:

$ANDROID_NDK/ndk-build

HetCompute SDK supports Heterogeneous Compute with offload on the GPU and DSP. GPU offload is supported using OpenCL and OpenGL kernels. The offload mechanism is either OpenCL 1.2 or later, or the native Qualcomm GPU driver. The community has been collecting a list of Android devices that support OpenCL. In addition, a Qualcomm-based platform that provides OpenCL support, such as the Qualcomm DragonBoard, can be used.

Using HetCompute with OpenCL as the GPU backend requires the OpenCL C++ header file cl.hpp, which needs to be patched. For details on how to patch cl.hpp, see OpenCL C++ Support. In the Android.mk file, set LOCAL_C_INCLUDES to include the path to the OpenCL headers. In the above Android.mk, this is referred to by QSHETCOMPUTE_OPENCL_INC_PATH. libOpenCL-prebuilt refers to the corresponding OpenCL library for the 32-bit or 64-bit variant.

To build an application with the Hexagon-enabled library, set LOCAL_C_INCLUDES to include the DSP stub headers generated using the Hexagon SDK. This is referred to by QSHETCOMPUTE_DSP_STUB_PATH in the samples Android.mk file.

2.3 Hexagon DSP Support

Using HetCompute with Hexagon DSP tasks requires a working installation of the Hexagon SDK. This section assumes that the programmer is familiar with the Hexagon SDK.

Building the HetCompute DSP samples is a two-step process. First, build the hetcompute_dsp stub and skel libraries with the Hexagon SDK. Then compile and build the HetCompute DSP samples with the previously generated stub library. For ease of use, the installation distributes both the aDSP and cDSP stub and skel libraries used by the samples. These libraries can be found in $HETCOMPUTE_DIR/external/dsp/.

Both the skel and stub libraries need to be pushed to the device before running the HetCompute DSP samples. Refer to the Hexagon SDK documentation to determine where the libraries need to be installed on the device.

Running the HetCompute DSP samples:

• A README file is provided in $HETCOMPUTE_DIR/external/dsp/ that explains how to compile the DSP skel and stub libraries in case you want to add new functions to them. This step can be skipped if you use the precompiled DSP skel and stub libraries distributed in the HetCompute SDK package.

• Push the libraries into the device and run the dsp samples.

    $ANDROID_SDK/adb shell mkdir -p /data/local/tmp/hetcompute/
    cd $HETCOMPUTE_DIR
    $ANDROID_SDK/adb push lib/armeabi-v7a/libhetcompute-@[email protected] /system/vendor/lib
    $ANDROID_SDK/adb push samples/build/android/libs/armeabi-v7a/hetcompute_sample_hexagon_is_prime /data/local/tmp/hetcompute

    $ANDROID_SDK/adb push external/dsp/lib/libhetcompute_dsp_skel.so /system/lib/rfsa/adsp
    $ANDROID_SDK/adb push external/dsp/lib/libhetcompute_adsp.so /system/vendor/lib

    $ANDROID_SDK/adb shell chmod 0755 /data/local/tmp/hetcompute/hetcompute_sample_hexagon_is_prime
    $ANDROID_SDK/adb shell /data/local/tmp/hetcompute/hetcompute_sample_hexagon_is_prime

• To offload task execution to the cDSP, follow the steps below to run a cDSP sample.

    $ANDROID_SDK/adb shell mkdir -p /data/local/tmp/hetcompute/
    cd $HETCOMPUTE_DIR
    $ANDROID_SDK/adb push lib/armeabi-v7a/libhetcompute-@[email protected] /system/vendor/lib
    $ANDROID_SDK/adb push samples/build/android/libs/armeabi-v7a/hetcompute_sample_hexagon_is_prime_cdsp /data/local/tmp/hetcompute
    $ANDROID_SDK/adb push external/dsp/lib/libhetcompute_dsp_skel.so /system/lib/rfsa/adsp
    $ANDROID_SDK/adb push external/dsp/lib/libhetcompute_cdsp.so /system/vendor/lib
    $ANDROID_SDK/adb shell chmod 0755 /data/local/tmp/hetcompute/hetcompute_sample_hexagon_is_prime_cdsp
    $ANDROID_SDK/adb shell /data/local/tmp/hetcompute/hetcompute_sample_hexagon_is_prime_cdsp

Note

If issues are encountered, verify that the calculator example shipped with the Hexagon SDK works properly on your device and that your device properly supports DSP execution.

2.4 OpenCL C++ Support

Using HetCompute with OpenCL as the GPU backend requires the OpenCL C++ header file from Khronos (the version must match your OpenCL driver installation):

http://www.khronos.org/registry/cl/

The header file should be installed in a subdirectory ${includedir}/OpenCL/ that is searched by the compiler (e.g., /usr/local/include). Alternatively, additional options need to be passed as compilation flags (e.g., -I${includedir}).

3 Getting Started

3.1 Writing your first HetCompute program

Let’s explore a simple example using HetCompute:

 1 #include <vector>
 2
 3 #include <hetcompute/hetcompute.hh>
 4
 5 int
 6 main()
 7 {
 8     hetcompute::runtime::init();
 9     // initialize the input vector
10     std::vector<size_t> vin(1024, 0);
11
12     // in-place update of the input vector
13     // equivalent to the following code
14     // for (size_t i = 0; i < vin.size(); ++i) {
15     //     vin[i] = 2 * i;
16     // }
17     hetcompute::pfor_each(size_t(0), vin.size(), [&vin](size_t i) { vin[i] = 2 * i; });
18
19     hetcompute::runtime::shutdown();
20     return 0;
21 }

The above program does the following: Given an input vector vin containing 1024 elements, all of which are initialized to 0, every element is updated to store 2∗i, where i is the index of that element.

In line 3, the hetcompute.hh header is included, which is needed for any HetCompute program. All the HetCompute classes and functions are declared in the hetcompute namespace.

This simple example illustrates the use of the HetCompute pfor_each pattern, which allows the elements of a collection to be processed in parallel. Because there are no dependencies between iterations (termed inter-iteration dependencies), the values can be computed and updated in parallel. This pattern can be used to replace any loop in the user’s program that does not have inter-iteration dependencies. HetCompute provides a variety of other patterns, which are described in Parallel Programming Patterns.
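To make the idea of independent iterations concrete, the following is a minimal sketch of a parallel loop written with only the C++ standard library. It is not HetCompute's implementation: the helper name chunked_parallel_for, the default of four threads, and the static one-chunk-per-thread split are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a HetCompute API): call fn(i) for every i in
// [first, last), giving each thread one contiguous chunk of the range.
// This is only safe because no iteration depends on another iteration.
template <typename Fn>
void chunked_parallel_for(std::size_t first, std::size_t last, Fn fn, std::size_t num_threads = 4)
{
    assert(first <= last && num_threads > 0);
    const std::size_t total = last - first;
    const std::size_t chunk = (total + num_threads - 1) / num_threads; // ceiling division
    std::vector<std::thread> workers;
    for (std::size_t begin = first; begin < last; begin += chunk)
    {
        const std::size_t end = std::min(begin + chunk, last);
        workers.emplace_back([begin, end, &fn] {
            for (std::size_t i = begin; i < end; ++i)
                fn(i);
        });
    }
    // Wait for every chunk before returning, so fn outlives all workers.
    for (auto& w : workers)
        w.join();
}
```

A call such as chunked_parallel_for(0, vin.size(), [&vin](std::size_t i) { vin[i] = 2 * i; }) would produce the same vector as the serial loop shown in the comments of the example above.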

HetCompute also provides programmers with another layer of abstraction, allowing them to think about algorithms in terms of concurrent tasks and letting the HetCompute runtime schedule them onto available resources in the system. Programmers can create dynamic task graphs by setting dependencies between tasks that the runtime enforces. Another key HetCompute abstraction (not shown in the example) is the group. Groups allow the programmer to easily manage sets of tasks. Tasks and groups are discussed in more detail in Introduction to Tasks.

3.1.1 Building a HetCompute program using ndk-build

The pfor_helloworld program above can be built using ndk-build. Refer to $HETCOMPUTE_DIR/samples/build/android/jni/Android.mk for building the pfor_helloworld sample. Snippets from the key build steps are listed below.

The following is the project’s jni/Android.mk file:

    include $(CLEAR_VARS)
    LOCAL_MODULE := pfor_helloworld
    LOCAL_SHARED_LIBRARIES := libhetcompute
    LOCAL_SRC_FILES := pfor_helloworld.cc
    include $(BUILD_EXECUTABLE)

The jni/Application.mk file is shown below:

    APP_STL := c++_static
    APP_ABI := armeabi-v7a arm64-v8a
    NDK_TOOLCHAIN_VERSION := clang
    # Set APP_PLATFORM to match your platform version.
    APP_PLATFORM := android-21

The example can then be built by typing the following at the command prompt:

$ ndk-build

4 User Guide

4.1 Overview

All current hardware platforms, from desktops to smartphones, are built around multicore and heterogeneous systems-on-a-chip (SoC). Servers and supercomputers are also using specialized cores, such as GPUs, to improve performance and power efficiency.

Qualcomm Heterogeneous Compute SDK (HetCompute) enables the full utilization of the hardware at the user application level, in the following ways:

• By providing a parallel programming model that allows programmers to express the concurrency in their applications. HetCompute’s powerful abstractions ease the burden of parallel programming through a design that builds on dynamic concurrency from the ground up. At the high level, HetCompute provides a set of parallel programming patterns that capture many of the existing parallel building blocks, and adds dataflow and work cancellation as first-class primitives that improve programmer productivity.

• By seamlessly integrating heterogeneous execution into a concurrent task graph and removing the burden of managing data transfers and explicit data copies between kernels executing on different devices. At the low level, HetCompute provides state-of-the-art algorithms for work stealing and power optimizations that hide hardware idiosyncrasies and allow the development of portable applications. In addition, HetCompute is designed to support dynamic mapping to heterogeneous execution units. Moreover, expert programmers can take charge of the execution through a carefully designed system of attributes and directives that provide the runtime system with additional semantic information about the patterns, tasks, and buffers that HetCompute uses as building blocks.

• By embedding the programming model in C++ and providing a C++ library API. C++ is a familiar language for a large number of performance-oriented programmers, thus making it easy for programmers to pick up the abstractions quickly. C++ embedding also allows incremental development of existing applications, because HetCompute interoperates with existing libraries, such as pthreads and OpenGL.

HetCompute runs on top of a runtime system that executes the concurrent applications on all the available computational resources on the SoC. The HetCompute runtime system is essentially a resource manager for threads, address spaces, and devices. It builds on a set of state-of-the-art algorithms to free programmers from the need to manage these resources explicitly and to provide the best performance for the HetCompute execution model.

The remaining sections of this chapter provide a high-level overview of the HetCompute parallel patterns and concurrent abstractions, and its execution model. The rest of the User Guide provides additional details on the design decisions in HetCompute, which will allow the programmer to choose the right level of primitives to use in the application. The Reference Manual includes the API details.

The following figure illustrates the HetCompute architecture:

Figure 4-1 HetCompute overview

HetCompute is a user-level library that integrates with OS services to hide the complexity of hardware as much as possible, while still providing programmers with control over performance. HetCompute takes advantage of existing standards to enable execution on the entire SoC: POSIX and C++11 for exploiting multicore, OpenCL to dispatch onto GPUs, and the Hexagon SDK to dispatch to the Qualcomm Hexagon™ DSP. The advantage of using HetCompute is that it provides a seamless interface for all these devices, therefore enabling the programmer to focus on the application being developed, rather than managing hardware, different execution models, and data transfers.

HetCompute’s execution model is a concurrent task graph, with acyclic control dependencies and/or data dependencies that define which tasks should execute concurrently. Tasks (defined formally in sections Introduction to Tasks and Tasks Reference API) are units of independent work. They are an intuitive way of specifying chunks of computation that can map to different execution units. Dependencies (control and data) provide the mechanism to dynamically build a concurrent task graph. The tasks execute in parallel on as many execution units as are available on the platform at that moment. Note that on a mobile device, because of power and thermal constraints, some execution units may not be available, or may even disappear dynamically. Therefore, it is best that programmers focus on expressing the concurrency using HetCompute tasks and let the runtime map them to all the available resources. In HetCompute, heterogeneous execution is no different than multicore execution. However, to provide the best performance, HetCompute requires programmers to write specialized kernels. The current version of HetCompute supports writing GPU kernels in the OpenCL language and DSP kernels in C99.
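As a rough illustration of a task graph with dependencies, the sketch below builds a three-node graph using std::async and std::future from the C++ standard library. It is an analogy only, not the HetCompute task API: the function name sum_of_halves and the choice of futures to model dependencies are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical example (standard C++ only, not the HetCompute API):
// two independent tasks feed a third task that depends on both.
long sum_of_halves(const std::vector<int>& data)
{
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);

    // The two leaf tasks do not depend on each other, so they may run concurrently.
    std::future<long> left =
        std::async(std::launch::async, [&] { return std::accumulate(data.begin(), mid, 0L); });
    std::future<long> right =
        std::async(std::launch::async, [&] { return std::accumulate(mid, data.end(), 0L); });

    // The join task has both leaves as predecessors: it cannot produce its
    // result until left and right have finished.
    std::future<long> join =
        std::async(std::launch::deferred, [&] { return left.get() + right.get(); });
    return join.get();
}
```

The shape of the graph (two leaves, one join) is the same as the one built by the merge sort example later in this chapter, where the runtime, rather than blocking calls to get(), enforces the ordering.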

The HetCompute runtime manages tasks and maps them to platform resources using a state-of-the-art work scheduler. The scheduler implements pervasive work stealing and dynamic mapping of tasks to execution units, based on heuristics that are driven by the programmer using novel high-level APIs. Later in this guide, several examples show how the programmer can control the behavior of the runtime, such as by using pattern tuners and task attributes. These are particularly relevant for mobile devices.

HetCompute provides two levels of APIs:

• A set of high-level APIs that includes parallel programming patterns and basic task and group creation and launch. These APIs are intended for programmers who focus first and foremost on productivity. Using these APIs provides you with the best performance in most instances, with relatively little coding effort. The semantics of these APIs is precisely defined, and the HetCompute type system is designed to catch many concurrent programming errors.

• A set of low-level APIs that allow expert programmers finer control over the parallel execution. These APIs may offer better performance, at the cost of removing some of the guarantees that the high-level APIs provide. Direct access to task pointer objects, task attributes and pattern tuners, specialized allocators, buffer consistency and synchronization, and storage classes are some examples of these APIs. These are the foundation for the high-level APIs, and thus the two levels work in concert. However, using the low-level APIs requires a good understanding of parallel programming and the side effects that concurrent execution can have on your program; therefore, use them with caution.

The target audience for HetCompute is programmers who require performance. HetCompute is envisioned to be used by application programmers and library programmers to build high-performance applications and domain-specific libraries. It is designed to make composing libraries easy: HetCompute tasks can be launched from any application thread (no need to join a particular thread pool), tasks can be launched hierarchically and synchronized individually or as a group, and patterns and tasks share a unified representation. These novel characteristics make HetCompute uniquely positioned as a framework for heterogeneous execution. Many other application programmers can benefit from HetCompute by embedding such Qualcomm HetCompute-enabled libraries and thus indirectly benefit from parallel and heterogeneous execution without the burden of parallel programming.

4.1.1 Writing a HetCompute Application

Integrating HetCompute in your application is quite easy, as long as you understand the principles of parallel programming and have a good idea of where the concurrency is in your application. The fundamental design goal of HetCompute is to easily express a parallel algorithm and incrementally build a parallel application.

The figure below illustrates how to write an application using HetCompute:

Figure 4-2 HetCompute workflow

The workflow from start to completion of a HetCompute application is as follows:

• Identify the algorithm to be parallelized and design a parallel version of the algorithm.

• Encode the algorithm using HetCompute abstractions:

– If the algorithm matches one of the HetCompute patterns, use the pattern directly and enjoy the speedups.

– More complex applications either require the use of multiple patterns or exhibit parallelism that does not match one of the existing patterns. In this case, use the HetCompute building blocks of task and group to partition the algorithm into tasks, set dependencies between the tasks (building the execution task graph), and launch the tasks for execution. Also consider partitioning the data to allow concurrent access.

• Patterns and tasks are interoperable, as the HetCompute library maps patterns to tasks. Thus, a HetCompute application consists of a forest of DAGs. The runtime system schedules the tasks once their dependencies are satisfied.

• HetCompute task graphs execute across different devices when the programmer provides device kernels. To execute on the GPU, write kernels in OpenCL. To run on the DSP, write kernels in C99. These kernels are integrated into the task graph just like other tasks that are designed for the CPU. Kernels: The Path to Heterogeneity has details on how to design and build heterogeneous kernels.

The fastest way to build a HetCompute application is by using the HetCompute patterns. If your parallel algorithm matches one of the parallel programming patterns in HetCompute (pfor_each, preduce, ptransform, pscan, psort, pdivide_and_conquer, or pipeline), directly using the pattern is recommended. The HetCompute runtime understands the semantics of these constructs and optimizes for their concurrent execution.

Below are several examples on how to use several of these patterns.

4.1.1.1 Parallel vector addition

One of the most common operations for parallel programming is parallel iteration. HetCompute provides the pfor_each pattern to support parallel iteration. Below is an example:

 1 #include <cstdlib>
 2 #include <vector>
 3
 4 #include <hetcompute/hetcompute.hh>
 5
 6 using namespace std;
 7
 8 int
 9 main()
10 {
11     hetcompute::runtime::init();
12     const size_t N = 100;
13     vector<float> a(N), b(N), c(N);
14
15     // Initialize the source arrays with random numbers
16     for (size_t i = 0; i < N; i++)
17     {
18         a[i] = static_cast<float>(rand()) / ((1ULL << 31) - 1);
19         b[i] = static_cast<float>(rand()) / ((1ULL << 31) - 1);
20     }
21     float alpha = 0.2f;
22
23     // add the two vectors concurrently
24     hetcompute::pfor_each(size_t(0), N, [&](size_t i) { c[i] = alpha * a[i] + b[i]; });
25
26     hetcompute::runtime::shutdown();
27     return 0;
28 }

The use of HetCompute is highlighted in this example. Line 4 includes the HetCompute library headers. Up to and including line 20, the code is standard C++11 that initializes two vectors, a and b, of size N. In line 24 the hetcompute::pfor_each construct is invoked. It is very similar to a for loop, except that the iterations are executed in parallel on as many execution units as are available on the platform. A more detailed description of these patterns is in Patterns Reference API.

4.1.1.2 Parallel sort

Another example of a common operation that benefits from concurrent execution is sorting of large arrays. Here is an example of how to sort in parallel in HetCompute:

 1 #include <random>
 2 #include <sstream>
 3 #include <vector>
 4
 5 #include <hetcompute/hetcompute.hh>
 6
12
13 int
14 main(int argc, const char* argv[])
15 {
16     hetcompute::runtime::init();
17     std::vector<long> input;
18     size_t n_def = 20;
19     size_t n = n_def;
20
21     if (argc >= 2)
22     {
23         std::istringstream istr(argv[1]);
24         istr >> n;
25     }
26
27     std::random_device rd;
28     std::mt19937 generator(rd());
29     std::uniform_int_distribution<long> dis;
30     const size_t num_ints = 1ULL << n;
31     // Create a random array of integers
32     for (size_t i = 0; i < num_ints; i++)
33     {
34         input.push_back(dis(generator));
35     }
36
37     hetcompute::psort(input.begin(), input.end());
38
39     if (!std::is_sorted(input.begin(), input.end()))
40     {
41         std::cerr << "psorting failed\n";
42     }
43
44     hetcompute::runtime::shutdown();
45     return 0;
46 }

Most of the code in this example is standard C++ to initialize the data structures. In line 37, the hetcompute::psort parallel sorting function is invoked. It takes two iterators, the beginning and the end of the list, and it sorts (in place) the input vector in the interval [begin, end).

Hopefully, you are convinced of how easy it is to introduce parallel programming in your application if it fits one of the predefined HetCompute patterns.

4.1.1.3 Parallelism using tasks

HetCompute exposes the fundamental building blocks, tasks and groups, to parallelize algorithms that do not fit into one of the HetCompute patterns. Below is an example of sorting (using merge sort) that is parallelized using HetCompute tasks:

 1 #include <algorithm>
 2 #include <functional>
 3 #include <iostream>
 4 #include <iterator>
 5 #include <sstream>
 6 #include <vector>
 7
 8 #include <hetcompute/hetcompute.hh>
 9
10 // Parallel mergesort using recursive fork-join parallelism.
11 // hetcompute::task<>::finish_after allows easy expression of the parallelism in the
12 // algorithm in a non-blocking manner, yielding better performance than
13 // blocking parallelization using hetcompute::task<>::wait_for.
14
16 const size_t GRANULARITY = 8192;
17
18 // Asynchronous mergesort, to be invoked in a task
19 template <typename Iterator, typename Compare>
20 void
21 mergesort(Iterator begin, Iterator end, Compare cmp)
22 {
23     size_t n = std::distance(begin, end);
24     if (n <= GRANULARITY)
25     {
26         std::sort(begin, end, cmp);
27     }
28     else
29     {
30         auto middle = begin;
31         std::advance(middle, n / 2);
32         auto left = hetcompute::launch([=] { mergesort(begin, middle, cmp); });
33         auto right = hetcompute::launch([=] { mergesort(middle, end, cmp); });
34         auto merge = hetcompute::create_task([=] { std::inplace_merge(begin, middle, end, cmp); });
35         // The left subtree and right subtree tasks must finish before the merge
36         // task can execute
37         left->then(merge);
38         right->then(merge);
39         merge->launch();
40         // mergesort(begin, end, cmp) logically finishes after the merge task
41         // finishes
42         merge->finish_after();
43     }
44 }
45
46 int
47 main(int argc, const char* argv[])
48 {
49     hetcompute::runtime::init();
50     std::vector<long> input;
51     size_t n_def = 1 << 16;
52     size_t n = n_def;
53
54     if (argc >= 2)
55     {
56         std::istringstream istr(argv[1]);
57         istr >> n;
58     }
59
60     // Create a random array of integers
61     for (size_t i = 0; i < n; i++)
62     {
63         input.push_back(rand());
64     }
65
66     // Launch mergesort inside a task since it has an asynchronous interface (due
67     // to use of hetcompute::task::finish_after)
68     auto t = hetcompute::launch([&] { mergesort(input.begin(), input.end(), std::less<long>()); });
69     t->wait_for();
70
71     if (!std::is_sorted(input.begin(), input.end()))
72     {
73         std::cerr << "parallel mergesorting failed\n";
74     }
75
76     hetcompute::runtime::shutdown();
77     return 0;
78 }

Note how much easier it is to just use the HetCompute patterns. In this example, a dynamic DAG of tasks is constructed by splitting the array into halves, sorting each half in parallel, and then merging the results using another task. Lines 32-33 create and launch the recursive sorting tasks. Line 34 creates the merge task. Lines 37-38 set the dependencies between the tasks, thus building the DAG. Line 39 launches the merge task into the runtime, and the function logically finishes when the merge task finishes (line 42). In the main function, a task is created for the sort by passing the entire array (line 68). It has no dependencies, so it is launched directly (lines 66-68), and main then waits for it to complete (line 69).
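For comparison, the same fork-join structure can be sketched in blocking form with std::async from the C++ standard library. This is not HetCompute code and does not reproduce the non-blocking finish_after behavior described in the comments above: each call here waits for its child before merging. The function name blocking_mergesort and the granularity parameter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Hypothetical stand-in using only the C++ standard library. Unlike the
// finish_after version, each call blocks until its child task completes.
template <typename Iterator, typename Compare>
void blocking_mergesort(Iterator begin, Iterator end, Compare cmp, std::size_t granularity = 8192)
{
    assert(granularity > 0);
    const auto n = static_cast<std::size_t>(std::distance(begin, end));
    if (n <= granularity)
    {
        std::sort(begin, end, cmp); // small range: sort serially
        return;
    }
    Iterator middle = begin;
    std::advance(middle, n / 2);

    // Fork: sort the left half on another thread and the right half on this one.
    auto left = std::async(std::launch::async,
                           [=] { blocking_mergesort(begin, middle, cmp, granularity); });
    blocking_mergesort(middle, end, cmp, granularity);

    // Join: the merge depends on both halves being sorted.
    left.get();
    std::inplace_merge(begin, middle, end, cmp);
}
```

Because every level of the recursion holds a thread while it waits in get(), this form ties up more threads than a dependency-based version in which the merge is scheduled only once both halves are done.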

This is a quick illustration of the power of HetCompute’s abstractions. The rest of this guide provides a walkthrough with details on the design to help you use HetCompute to extract the most benefit for your application.

4.1.2 Executing a HetCompute Application

The figure below illustrates how the HetCompute runtime executes HetCompute applications.

Figure 4-3 HetCompute execution

The HetCompute runtime fundamentally implements a thread pool over which tasks are scheduled at the user level. When the application starts running, the thread pool is initialized such that it makes optimal use of the existing hardware contexts on the device. The scheduler is throughput-oriented. Tasks are scheduled in a non-preemptive manner as they become ready for execution (that is, when their dependencies are satisfied). They are mapped to devices based on the kernel type. The runtime performs additional optimizations for performance and energy efficiency based on the pattern semantics, pattern tuning, and task attributes.
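The following is a minimal sketch of a user-level thread pool with non-preemptive task execution, written with only the C++ standard library. It is not the HetCompute scheduler, which this document does not show: the class name tiny_thread_pool, the single shared FIFO queue, and the drain-on-destruction behavior are assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool (standard C++ only). A task that has
// started runs to completion before its worker picks up another one.
class tiny_thread_pool
{
public:
    explicit tiny_thread_pool(std::size_t num_threads)
    {
        assert(num_threads > 0);
        for (std::size_t i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Lets the workers drain the queue, then stops and joins them.
    ~tiny_thread_pool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& w : workers_)
            w.join();
    }

    // Enqueue a task that is ready to run (its dependencies are satisfied).
    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void worker_loop()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return; // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // non-preemptive: runs until it returns
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A single shared queue is the simplest design but becomes a contention point as the number of workers grows, which is one motivation for the per-worker queues used by work-stealing schedulers.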

Parallel Programming Patterns

Introduction to Tasks

Buffers

Textures

Data Structures

Storage

Affinity

Heterogeneous Computing in Action

Interoperability

4.2 Parallel Programming Patterns

Overview of HetCompute Patterns

Parallel Iteration

Parallel Reduction

Parallel Scan

Parallel Divide-and-Conquer

Parallel Sorting

Advanced Topics for Patterns

Pipeline

4.2.1 Overview of HetCompute Patterns

One of HetCompute’s main goals is to simplify parallel programming. To this end, it provides several constructs that encapsulate commonly used parallel programming patterns. These patterns reflect how parallel programming experts think about parallel algorithms, and capture these essential understandings with ready-to-use HetCompute APIs and performance tuning facilities. HetCompute’s parallel programming patterns incorporate a variety of algorithmic styles exploiting concurrency. Examples include: data parallelism operations (parallel loop, parallel reduce, parallel transform, and parallel scan), multi-branched recursion (parallel divide and conquer), and staged computation often encountered in streaming applications (pipeline). Programmers are encouraged to use these patterns as basic building blocks to construct complex concurrent applications such as physical simulation, image/video processing, or linear algebra routines. More patterns that are representative of key computational requirements will be developed. In addition, because these patterns are layered on top of the Qualcomm HetCompute abstractions, programmers are welcome to add to the library of patterns themselves.

When selecting a HetCompute tool to parallelize an algorithm, first stop at the pattern warehouse, as one of the existing patterns may meet your needs. If so, use the pattern, measure the performance and efficiency, and refine the implementation using the pattern performance tuner. Use other HetCompute constructs only if the algorithm does not map well to the existing patterns, or the pattern implementation does not satisfy the intended performance criteria.

This section provides a high-level overview of what to expect from the parallel patterns. First, the basic uses of the parallel patterns are demonstrated. Next, some advanced topics related to patterns are discussed. Finally, the pipeline pattern is presented as a relatively complex pattern. Note that all patterns presented herein only apply to the CPU. Heterogeneous patterns will be explored and included in future releases. Readers may also refer to Kernels: The Path to Heterogeneity and Heterogeneous Computing in Action for heterogeneous computing tutorials in HetCompute.

4.2.2 Parallel Iteration

The most commonly used pattern is data-parallel computing, in which the same function is applied to different pieces of data. In this chapter, two closely related parallel programming patterns are introduced which express data parallelism: hetcompute::pfor_each and hetcompute::ptransform.

4.2.2.1 Parallel For Loop

The parallel for loop pattern, hetcompute::pfor_each, supports concurrent application of a given function object on each element in the input collection returned by the input iterator taken as an argument. The input iterator can be expressed as a pair of integers (lower bound, upper bound), or a pair of random access iterators (begin, end). This pattern is mostly suitable to replace a serial loop where loop-carried dependence (dependence that exists across iterations) does not occur. The following example illustrates the use of the parallel iteration pattern for a simple computation.

#include <vector>

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // initialize the input vector
    std::vector<size_t> vin(1024, 0);

    // in-place update of the input vector
    // equivalent to the following code
    // for (size_t i = 0; i < vin.size(); ++i) {
    //     vin[i] = 2 * i;
    // }
    hetcompute::pfor_each(size_t(0), vin.size(), [&vin](size_t i) { vin[i] = 2 * i; });

    hetcompute::runtime::shutdown();
    return 0;
}

Despite the simple look, the underlying implementation is highly efficient in workload parallelization and load balancing. A lock-free work-stealing algorithm is employed to balance workload, i.e., iterations to work on, across multiple computational cores. It attempts to exploit the maximum degree of concurrency available in the loop computation, and has a very low synchronization overhead.
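To give a feel for dynamic load balancing, the sketch below uses a simpler technique than the work stealing described above: self-scheduling, in which every thread claims its next block of iterations from one shared atomic counter. It is written with only the C++ standard library and is not HetCompute's algorithm. The helper name self_scheduled_for and the default block size and thread count are assumptions made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: threads repeatedly claim the next block of iterations
// from a shared atomic counter, so threads that finish early claim more blocks.
template <typename Fn>
void self_scheduled_for(std::size_t first, std::size_t last, Fn fn,
                        std::size_t block = 64, std::size_t num_threads = 4)
{
    assert(first <= last && block > 0 && num_threads > 0);
    std::atomic<std::size_t> next(first);
    auto worker = [&] {
        for (;;)
        {
            const std::size_t begin = next.fetch_add(block); // claim a block
            if (begin >= last)
                return; // nothing left to claim
            const std::size_t end = std::min(begin + block, last);
            for (std::size_t i = begin; i < end; ++i)
                fn(i);
        }
    };
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
}
```

Each index is visited exactly once because fetch_add hands out disjoint blocks. The cost is one atomic operation per block, so a larger block lowers synchronization overhead while a smaller block balances uneven iterations better.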

The API also takes two optional parameters: stride and tuner. Pattern tuners are covered in Tuner. The stride parameter represents the step size of the incremental iterator, and has a default value of one. For example, the parallel version of the following code snippet

for (size_t i = 0; i < vin.size(); i += 2)
    vin[i] = 2 * i;

is the following statement.

hetcompute::pfor_each(size_t(0), 2, vin.size(), [&vin](size_t i){ vin[i] = 2 * i; });
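The set of indices that a strided loop touches can be checked with a small serial helper. The sketch below is standard C++ and not part of HetCompute. The names strided_indices and strided_count are hypothetical, and the (first, stride, last) argument order simply mirrors the call shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: list the indices visited by a loop of the form
// for (i = first; i < last; i += stride).
std::vector<std::size_t> strided_indices(std::size_t first, std::size_t stride, std::size_t last)
{
    assert(stride > 0);
    std::vector<std::size_t> visited;
    for (std::size_t i = first; i < last; i += stride)
        visited.push_back(i);
    return visited;
}

// Number of iterations of the same loop: ceil((last - first) / stride).
std::size_t strided_count(std::size_t first, std::size_t stride, std::size_t last)
{
    assert(stride > 0);
    return first < last ? (last - first + stride - 1) / stride : 0;
}
```

With a stride of 2, only the even indices of vin are updated and the odd elements keep their initial values.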

The parallel iteration pattern can be nested. However, it is usually sufficient to decorate only the outermost loop with hetcompute::pfor_each, provided that the outermost loop has enough iterations to keep all cores busy.

4.2.2.2 Parallel Transformation

The parallel transformation pattern, hetcompute::ptransform, has three versions. The first two versions apply a given function object to a range and store the result in another range. They are essentially parallel versions of std::transform (one applies a unary function and the other applies a binary function). The third version, similar to hetcompute::pfor_each, performs in-place transformation. The major difference between the two patterns is that hetcompute::ptransform passes the dereferenced input iterator to the function object, whereas hetcompute::pfor_each passes the input iterator directly to the function object. Therefore, the input iterator passed to hetcompute::ptransform is restricted to random access iterators. Integral iterators are not allowed because they cannot be dereferenced.

The parallel transformation pattern is useful when the programmer wishes to directly manipulate the dereferenced input iterator in the function object. For example, the following code performs a binary operation on different segments of the input range, and stores the result in the output range.

#include <functional>
#include <vector>

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Initialize input vector: vin[i] = i
    std::vector<int> vin(1024);
    int j = 0;
    for (auto& i : vin)
        i = j++;

    // vout[i] = vin[i] + vin[i+1]
    std::vector<int> vout(vin.size() - 1);
    hetcompute::ptransform(begin(vin),
                           begin(vin) + vout.size(), // first input range
                           begin(vin) + 1,           // start of the second input range
                           begin(vout),              // start of the output range
                           std::plus<int>());

    hetcompute::runtime::shutdown();
    return 0;
}
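Since the text describes the binary version of ptransform as a parallel counterpart of std::transform, a serial reference with the same argument layout can be useful for checking results. The sketch below is standard C++ only, and the function name adjacent_sums is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Serial reference for the example above: out[i] = in[i] + in[i + 1].
std::vector<int> adjacent_sums(const std::vector<int>& in)
{
    assert(!in.empty());
    std::vector<int> out(in.size() - 1);
    std::transform(in.begin(),
                   in.begin() + static_cast<std::ptrdiff_t>(out.size()), // first input range
                   in.begin() + 1,                                       // start of the second input range
                   out.begin(),                                          // start of the output range
                   std::plus<int>());
    return out;
}
```

For the input vin[i] = i used above, every output element should equal 2 * i + 1.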

4.2.3 Parallel Reduction

Programmers often need to compute a reduction over a range, for example a sum, a minimum, or a maximum, or something as complex as the multiplication of a chain of matrices. The parallel reduction pattern, hetcompute::preduce, processes a list of elements using a join function object and computes a return value. A join function object is applied to two elements and produces a result, which can be combined using the join function with the remaining elements in the range. Parallelizing a reduction in HetCompute is simple, in that the programmer only needs to pass the input container to work on, or a pair of random access iterators specifying the range. An initial value (the identity element) also needs to be specified for the reduction. The binary operation defining the reduction is expected to be associative, but not necessarily commutative. Putting it all together, the following example demonstrates a parallel sum implementation in HetCompute:

    #include <functional>
    #include <iostream>
    #include <numeric>
    #include <vector>

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // initialize the input vector
        std::vector<int> vin(1024, 0);
        int val = 1;
        for (auto& i : vin)
        {
            i = val++;
        }

        const int identity = 0;
        // parallel_sum = 1 + 2 + 3 + ... + 1024
        int parallel_sum = hetcompute::preduce(vin, identity, std::plus<int>());

        // check result
        int serial_sum = std::accumulate(vin.begin(), vin.end(), 0);
        if (parallel_sum != serial_sum)
        {
            std::cout << "Parallel reduction failed!" << std::endl;
        }
        hetcompute::runtime::shutdown();
        return 0;
    }

If the programmer passes a pair of dereferenceable input iterators, the example becomes:

    int parallel_sum = hetcompute::preduce(vin.begin(), vin.end(), identity, std::plus<int>());

Or, with pointers into an array arr of 1024 elements:

    int parallel_sum = hetcompute::preduce(arr, arr + 1024, identity, std::plus<int>());

However, if the programmer passes a pair of integral input iterators, which are not dereferenceable, the programmer needs another function object to capture the input container and to define the accumulation operation for a subrange starting with some initial value. This is necessary because the join function cannot dereference the input iterators. The parallel sum example then becomes the following:

    #include <functional>
    #include <iostream>
    #include <numeric>
    #include <vector>

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // initialize the input vector
        std::vector<int> vin(1024, 0);
        int val = 1;
        for (auto& i : vin)
        {
            i = val++;
        }

        const int identity = 0;
        // parallel_sum = 1 + 2 + 3 + ... + 1024
        int parallel_sum = hetcompute::preduce(size_t(0),
                                               vin.size(),
                                               identity,
                                               // aggregate subrange
                                               [&vin](size_t f, size_t l, int& init) {
                                                   for (size_t k = f; k < l; ++k)
                                                   {
                                                       init += vin[k];
                                                   }
                                               },
                                               // join intermediate results
                                               std::plus<int>());

        // check result
        int serial_sum = std::accumulate(vin.begin(), vin.end(), 0);
        if (parallel_sum != serial_sum)
        {
            std::cout << "Parallel reduction failed!" << std::endl;
        }

        hetcompute::runtime::shutdown();

        return 0;
    }

Internally, an efficient work stealing algorithm has been implemented to parallelize the reduction computation. The algorithm builds a reduction tree and first performs accumulation of subranges in a top-down manner. The intermediate values are then joined together bottom-up to obtain a final result.
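The two phases of a tree-shaped reduction can be illustrated without the SDK. The following is a minimal serial sketch, not the HetCompute implementation: it splits the range recursively, accumulates each leaf subrange starting from the identity, and joins the partial results on the way back up. The names tree_reduce and leaf_size are assumptions for illustration. A non-commutative join such as string concatenation only gives the serial result if the left-to-right order of the partial results is preserved, which is why the requirement is associativity rather than commutativity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: reduce v[first, last) with an associative join.
// Leaves of at most leaf_size elements are accumulated from the identity;
// the two halves are then joined in left-to-right order.
template <typename T, typename Join>
T tree_reduce(const std::vector<T>& v, std::size_t first, std::size_t last,
              const T& identity, Join join, std::size_t leaf_size)
{
    assert(first <= last && last <= v.size() && leaf_size > 0);
    if (last - first <= leaf_size)
    {
        T acc = identity; // accumulation of one subrange
        for (std::size_t k = first; k < last; ++k)
        {
            acc = join(acc, v[k]);
        }
        return acc;
    }
    const std::size_t mid = first + (last - first) / 2;
    T left  = tree_reduce(v, first, mid, identity, join, leaf_size);
    T right = tree_reduce(v, mid, last, identity, join, leaf_size);
    return join(left, right); // bottom-up join keeps operand order
}
```

In a parallel runtime the two recursive calls are the natural candidates for concurrent execution, since they touch disjoint subranges; here they simply run one after the other.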

Because of the work stealing implementation, programmers can put some in-place transformation computation ahead of the reduction to build more complex algorithms. In this sense, hetcompute::preduce can be viewed as the combination of the parallel for loop pattern and the parallel reduction pattern. The in-place transformations are completed during the top-down accumulation process, as exhibited by the following code snippet.

    #include <functional>
    #include <iostream>
    #include <numeric>
    #include <vector>

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // initialize the input vector
        std::vector<int> vin(1024, 0);
        int val = 1;
        for (auto& i : vin)
        {
            i = val++;
        }

        const int identity = 0;
        // parallel_sum = 2 + 4 + 6 + ... + 2048
        int parallel_sum = hetcompute::preduce(size_t(0),
                                               vin.size(),
                                               identity,
                                               // aggregate subrange
                                               [&vin](size_t f, size_t l, int& init) {
                                                   for (size_t k = f; k < l; ++k)
                                                   {
                                                       // some transformation func applied to vin
                                                       vin[k] *= 2;
                                                       init += vin[k];
                                                   }
                                               },
                                               // join intermediate results
                                               std::plus<int>());

        // check result (vin now holds the transformed values)
        int serial_sum = std::accumulate(vin.begin(), vin.end(), 0);
        if (parallel_sum != serial_sum)
        {
            std::cout << "Parallel reduction failed!" << std::endl;
        }

        hetcompute::runtime::shutdown();
        return 0;
    }

4.2.4 Parallel Scan

Parallel scan is a useful building block for many parallel algorithms. HetCompute implements hetcompute::pscan_inclusive, a Sklansky-style, in-place parallel prefix operation. Example applications include stream compaction and sorting. The scan is inclusive because it generates a new range where each element i is the result of the prefix operation on all elements up to and including i. If the scan result of each element includes operations on all previous elements, but not the element itself, it is called an exclusive scan. An exclusive scan can be easily generated from an inclusive scan by shifting the resulting range right by one element and inserting the identity element at the leftmost place. A hetcompute::pscan_exclusive API may be provided in a future release.
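The conversion from an inclusive to an exclusive scan can be sketched with the standard library alone. In the sketch below, std::partial_sum stands in for the parallel scan, and the helper names inclusive_sum and exclusive_from_inclusive are assumptions for illustration, not SDK functions.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: convert an inclusive scan result into an exclusive one
// by shifting right by one element and inserting the identity at the front.
// The last element of the inclusive result is dropped by the shift.
template <typename T>
std::vector<T> exclusive_from_inclusive(const std::vector<T>& inclusive, const T& identity)
{
    std::vector<T> exclusive(inclusive.size());
    if (inclusive.empty())
    {
        return exclusive;
    }
    exclusive[0] = identity;
    for (std::size_t i = 1; i < inclusive.size(); ++i)
    {
        exclusive[i] = inclusive[i - 1];
    }
    return exclusive;
}

// Serial inclusive prefix sum, standing in for the parallel scan.
std::vector<int> inclusive_sum(std::vector<int> v)
{
    std::partial_sum(v.begin(), v.end(), v.begin(), std::plus<int>());
    return v;
}
```

For the input {3, 1, 4, 1, 5}, the inclusive sums are {3, 4, 8, 9, 14} and the derived exclusive sums are {0, 3, 4, 8, 9}.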

The most commonly used prefix scan operation is prefix sum, which computes an output range consisting of all sums of prefixes of some input range. An example of parallel prefix sum is given below (the prefix scan operation is in-place, that is, vin will include the prefix sum of its original values after execution):

    #include <functional>
    #include <vector>

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Initialize input vector: vin[i] = 1
        std::vector<int> vin(1024, 1);

        // After the scan, vin[i] == i + 1
        hetcompute::pscan_inclusive(vin.begin(), vin.end(), std::plus<int>());

        hetcompute::runtime::shutdown();
        return 0;
    }

4.2.5 Parallel Divide-and-Conquer

A common parallel pattern arising in various domains is divide-and-conquer. Examples are quicksort and tree building/traversal. Use hetcompute::pdivide_and_conquer to solve an abstract problem p by splitting it into subproblems solved in parallel. For example, in the case of the Fibonacci problem, p is simply an int representing the Fibonacci term to compute in parallel. HetCompute internally uses a high-performance non-blocking algorithm to solve this problem.

In the following example, the hetcompute::pdivide_and_conquer pattern is used to calculate a term of the Fibonacci sequence:

    #include <iostream>
    #include <sstream>
    #include <vector>

    #include <hetcompute/hetcompute.hh>

    static size_t
    fibonacci_s(size_t n)
    {
        if (n == 0 || n == 1)
        {
            return n;
        }
        else
        {
            return fibonacci_s(n - 1) + fibonacci_s(n - 2);
        }
    }

    static const size_t GRANULARITY = 20;

    static size_t
    fibonacci(size_t n)
    {
        return hetcompute::pdivide_and_conquer<size_t, size_t>(
            // Problem is to compute the n-th Fibonacci term
            n,
            // When should an arbitrary Fibonacci term, represented by 'm', be
            // computed sequentially?
            // Note that programmer chooses to compute Fibonacci terms 20 and lower
            // sequentially for best performance.
            [](size_t& m) { return m <= GRANULARITY; },
            // How to compute the term sequentially
            [](size_t& m) { return fibonacci_s(m); },
            // Split problem into independent subproblems
            [](size_t& m) {
                return std::vector<size_t>({ m - 1, m - 2 });
            },
            // Merge solutions to subproblems.
            // Note that the first parameter (size_t, corresponding to the split
            // problem) is unused in this case, but may be useful while merging in
            // other cases.
            [](size_t, std::vector<size_t>& sols) { return sols[0] + sols[1]; });
    }

    int
    main(int argc, const char* argv[])
    {
        hetcompute::runtime::init();
        size_t n_def = 24;
        size_t n     = n_def;

        if (argc >= 2)
        {
            std::istringstream istr(argv[1]);
            istr >> n;
        }

        size_t out = fibonacci(n);

        if (out != fibonacci_s(n))
        {
            std::cerr << "parallel fibonacci failed\n";
        }
        hetcompute::runtime::shutdown();
        return 0;
    }

The Fibonacci example demonstrates one form of hetcompute::pdivide_and_conquer, which returns a solution. There also exist problems that do not expect a returned solution and/or do not have a merge stage after the splitting phase. The hetcompute::pdivide_and_conquer pattern offers both combinations. Refer to Patterns Reference API for the complete reference. The following example shows quicksort, which has neither a merge phase nor a returned solution.

    #include <algorithm>
    #include <array>
    #include <cstdlib>
    #include <functional>
    #include <iostream>
    #include <sstream>
    #include <utility>
    #include <vector>

    #include <hetcompute/hetcompute.hh>

    template <typename Iterator>
    struct QuickSort
    {
        QuickSort(Iterator _begin, Iterator _end) : begin(_begin), end(_end), middle() {}
        Iterator begin, end, middle;
    };

    const size_t GRANULARITY = 8192;

    template <typename Iterator, typename Compare>
    void
    quicksort(Iterator begin, Iterator end, Compare cmp)
    {
        typedef QuickSort<Iterator> QuickSort;
        hetcompute::pdivide_and_conquer(
            // Main problem
            QuickSort(begin, end),
            // When should an arbitrary array, represented by 'q', be sorted
            // sequentially?
            // Note that programmer chooses to sort arrays smaller than size 8192
            // sequentially for best performance.
            [&](QuickSort& q) {
                size_t n = std::distance(q.begin, q.end);
                if (n <= GRANULARITY)
                {
                    return true;
                }
                // Choice of first element as pivot is arbitrary
                auto pivot = *q.begin;
                typedef decltype(pivot) value_type;
                // A lambda replaces std::bind2nd, which is deprecated in C++11
                // and removed in C++17.
                q.middle = std::partition(q.begin, q.end,
                                          [&](const value_type& x) { return cmp(x, pivot); });
                // If middle == begin, elements in [begin, end) are greater than or
                // equal to pivot. We could either find a new pivot or as we do here,
                // just sort sequentially.
                return q.middle == q.begin;
            },
            // Sequential sort used
            [&](QuickSort& q) { std::sort(q.begin, q.end, cmp); },
            // Split problem into two subproblems
            [&](QuickSort& q) {
                std::array<QuickSort, 2> subarrays{ { QuickSort(q.begin, q.middle), QuickSort(q.middle, q.end) } };
                return subarrays;
            });
    }

    int
    main(int argc, const char* argv[])
    {
        hetcompute::runtime::init();
        std::vector<long> input;
        size_t n_def = 1 << 16;
        size_t n     = n_def;

        if (argc >= 2)
        {
            std::istringstream istr(argv[1]);
            istr >> n;
        }

        // Create a random array of integers
        for (size_t i = 0; i < n; i++)
        {
            input.push_back(rand());
        }

        quicksort(input.begin(), input.end(), std::less<long>());

        if (!std::is_sorted(input.begin(), input.end()))
        {
            std::cerr << "parallel quicksorting failed\n";
        }

        hetcompute::runtime::shutdown();

        return 0;
    }

The parallel divide-and-conquer pattern, like the other patterns, is built using the basic HetCompute constructs of tasks and groups. However, the patterns are optimized using knowledge about the pattern structure and the operations in the runtime to minimize the amount of synchronization, and to avoid other bookkeeping operations that are needed for more generic use.
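The roles of the four function objects used in the Fibonacci example (base-case test, sequential solver, split, and merge) can be made explicit with a serial skeleton. The sketch below is only an illustration of that structure under stated assumptions: the function name divide_and_conquer and its signature are hypothetical, it is not the SDK's implementation, and it solves the subproblems one after another where a parallel runtime could solve them concurrently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial skeleton with four roles: is_base decides when to stop
// splitting, base solves a small problem directly, split produces the
// subproblems, and merge combines their solutions.
template <typename Solution, typename Problem,
          typename IsBase, typename Base, typename Split, typename Merge>
Solution divide_and_conquer(Problem p, IsBase is_base, Base base, Split split, Merge merge)
{
    if (is_base(p))
    {
        return base(p);
    }
    std::vector<Solution> sols;
    for (Problem& sub : split(p))
    {
        sols.push_back(divide_and_conquer<Solution>(sub, is_base, base, split, merge));
    }
    return merge(p, sols);
}

// Usage: n-th Fibonacci term, with terms up to 10 computed iteratively.
std::size_t fib(std::size_t n)
{
    return divide_and_conquer<std::size_t>(
        n,
        [](std::size_t& m) { return m <= 10; },
        [](std::size_t& m) {
            std::size_t a = 0, b = 1;
            for (std::size_t i = 0; i < m; ++i)
            {
                std::size_t t = a + b;
                a = b;
                b = t;
            }
            return a;
        },
        [](std::size_t& m) { return std::vector<std::size_t>{ m - 1, m - 2 }; },
        [](std::size_t, std::vector<std::size_t>& sols) { return sols[0] + sols[1]; });
}
```

The base-case threshold plays the same role as GRANULARITY in the examples above: it bounds how small a subproblem may become before further splitting costs more than it gains.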

4.2.6 Parallel Sorting

HetCompute provides a parallel sorting utility, hetcompute::psort, for programmers. It performs an unstable in-place comparison sort of an input range. The programmer may either provide a customized compare function, or use the default compare function (std::less<T>(), where T is the value type of the iterators). An example of HetCompute parallel sort is listed below.

    #include <iostream>
    #include <random>
    #include <sstream>
    #include <vector>

    #include <hetcompute/hetcompute.hh>

    int
    main(int argc, const char* argv[])
    {
        hetcompute::runtime::init();
        std::vector<long> input;
        size_t n_def = 20;
        size_t n     = n_def;

        if (argc >= 2)
        {
            std::istringstream istr(argv[1]);
            istr >> n;
        }

        std::random_device rd;
        std::mt19937 generator(rd());
        std::uniform_int_distribution<long> dis;
        const size_t num_ints = 1ULL << n;
        // Create a random array of integers
        for (size_t i = 0; i < num_ints; i++)
        {
            input.push_back(dis(generator));
        }

        hetcompute::psort(input.begin(), input.end());

        if (!std::is_sorted(input.begin(), input.end()))
        {
            std::cerr << "psorting failed\n";
        }

        hetcompute::runtime::shutdown();
        return 0;
    }

4.2.7 Advanced Topics for Patterns

HetCompute patterns are more than a collection of convenient methods. Programmers can create HetCompute tasks from the patterns and launch them into the asynchronous runtime environment. HetCompute also offers performance tuners to fine-tune the patterns with knowledge of the system and the algorithm. This section covers some advanced topics to further facilitate the understanding of HetCompute patterns. Note that the topics covered in this section apply to all the aforementioned patterns. The semantics of the HetCompute pipeline pattern differ considerably from those of the other patterns. Therefore, advanced topics such as asynchronous launch are covered separately for it in Pipeline.

4.2.7.1 Pattern Object

Programmers can create a pattern object and invoke the pattern by using the run method or the () operator with arguments, as illustrated in the following code example.

    #include <vector>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // initialize the input vector
        std::vector<size_t> vin(1024, 0);

        // declare function object to be applied
        auto func = [&vin](size_t i) { vin[i] = 2 * i; };

        auto pfor = hetcompute::pattern::create_pfor_each(func);
        pfor.run(size_t(0), vin.size());

        hetcompute::runtime::shutdown();
        return 0;
    }

Patterns are blocking by default, meaning that execution of the caller stops until the pattern call returns, which is sometimes undesirable. The programmer might want patterns to run asynchronously, like other HetCompute tasks. Fortunately, all patterns in HetCompute define a corresponding asynchronous API that does not wait for termination. These APIs are named after the original patterns with the suffix _async. In HetCompute, the most common way to launch a pattern asynchronously is to (1) create a pattern object and (2) use hetcompute::create_task and hetcompute::launch to invoke the pattern. As such, programmers can utilize the rich semantics defined for HetCompute tasks and groups (that is, dependencies, wait_for, finish_after, etc.) for pattern manipulation.

    #include <vector>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // initialize the input vector
        std::vector<size_t> vin(1024, 0);

        // declare function object to be applied
        auto func = [&vin](size_t i) { vin[i] = 2 * i; };
        auto pfor = hetcompute::pattern::create_pfor_each(func);

        // create a pfor task and launch!
        auto t1 = hetcompute::create_task(pfor, size_t(0), vin.size());
        t1->launch();
        t1->wait_for();

        // launch pfor directly
        auto t2 = hetcompute::launch(pfor, size_t(0), vin.size());
        t2->wait_for();
        hetcompute::runtime::shutdown();
        return 0;
    }

4.2.7.2 Tuner

The default pattern implementations should cover the majority of use cases. However, no single implementation is the best fit for all workload types. For that reason, HetCompute offers programmers a collection of commonly used algorithm parameters that serve as performance tuning knobs (tuners). In particular, the programmer can declare a HetCompute tuner object, set its properties up front, and pass the tuner object to the pattern API for the purpose of performance tuning. This is illustrated by the following example:

    // declare hetcompute::tuner object and use the static chunking algorithm for parallelization.
    hetcompute::pattern::tuner t;
    t.set_static();

    // start pfor
    hetcompute::pfor_each(size_t(0), vin.size(), [&vin](size_t i) { vin[i] = 2 * i; }, t);

Performance settings can be chained in tuner declaration.

    // create a pfor object
    auto func = [&vin](size_t i) { vin[i] = 2 * i; };
    auto pfor = hetcompute::pattern::create_pfor_each(func);

    // start pfor
    pfor.run(size_t(0),
             vin.size(),
             hetcompute::pattern::tuner()
                 .set_max_doc(8)     // Use 8 tasks for load balancing
                 .set_chunk_size(16) // The minimum stealing granularity is 16
    );

Some settings have no effect on a given pattern because the pattern has no mapping for that setting. The current HetCompute release focuses on performance tuning for hetcompute::pfor_each. The most useful settings for tuning hetcompute::pfor_each are listed below, with an explanation of their usage.

Setting Name     Explanation

set_max_doc      Maximum degree of concurrency. Defaults to the number of available cores, which defines the maximum number of tasks launched internally for load balancing. A higher number indicates oversubscription, which might be beneficial in certain usage scenarios.

set_chunk_size   Work stealing granularity. Defaults to one (each task tries to steal work after finishing one iteration). For long loops with a tiny computational kernel, the default setting is problematic because of the large synchronization overhead. If the default setting does not meet the performance requirement, a gradual increase in the chunk size should be attempted until the saddle point is located.

set_static       Use the simple static chunking algorithm for parallelization.

set_dynamic      Use the dynamic work stealing algorithm for parallelization (default).

set_serial       Set to serial execution; convenient for performance comparison and for calculating speedup.
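To make the effect of the chunk size concrete, the sketch below splits an index range into chunks of a given size, a chunk being the unit of work a task would claim at one time. It only illustrates the granularity trade-off; make_chunks is a hypothetical helper, and the SDK's scheduler may partition work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split [first, last) into half-open chunks of at most
// chunk_size indices. Fewer, larger chunks mean fewer synchronization points
// but coarser load balancing.
std::vector<std::pair<std::size_t, std::size_t>>
make_chunks(std::size_t first, std::size_t last, std::size_t chunk_size)
{
    assert(first <= last && chunk_size > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t b = first; b < last; b += chunk_size)
    {
        chunks.emplace_back(b, std::min(b + chunk_size, last));
    }
    return chunks;
}
```

For a 1024-iteration loop, a chunk size of 1 yields 1024 units of work, while a chunk size of 16 yields 64, so the number of points at which tasks must synchronize drops by the same factor.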

4.2.8 Pipeline

The HetCompute Pipeline pattern supports the pipeline parallel programming model, which is often used in streaming applications.

The HetCompute Pipeline API allows the programmer to describe a linear chain of processing stages such that the output of each stage is the input of the next. The programmer associates a C++ stage function with each stage, and can specify a basic C++ type or a user-defined data type for handing over data between stages. Once launched, each Pipeline stage repeatedly executes its stage function over a data stream. A successor stage starts executing on one data unit after its predecessor stage finishes processing the same unit. While the stages in the pipeline process any one data unit sequentially (from the first stage to the last), they can process different data units at the same time.

Note that in contrast to a typical pipeline model, where all the stages always execute exactly the same number of iterations, HetCompute supports stages executing a different number of total iterations in one Pipeline (by using hetcompute::iteration_rate). Also in contrast to the standard pipeline model, where the successor stage can start executing immediately after its predecessor finishes one iteration, HetCompute Pipeline supports iteration delays between stages so that the predecessor can run at least n iterations ahead of its immediate successor (by using hetcompute::iteration_lag).

HetCompute Pipeline is compatible with the HetCompute asynchronous semantics, so that it can be launched and waited on just like any other task.

Algorithms for streaming applications can be expected to map to Pipeline in a straightforward manner.
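The overlap between stages in a typical pipeline can be pictured with an idealized lock-step model. The sketch below is an assumption made for illustration only: every stage takes one time step per data unit, there is no iteration rate or lag, and the function name lockstep_schedule is hypothetical. It does not describe how the HetCompute scheduler orders work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Idealized lock-step model: in time step t, stage s works on data unit
// t - s, if that unit exists. Returns, for each time step, the unit handled
// by each stage (-1 if the stage is idle in that step).
std::vector<std::vector<long>> lockstep_schedule(std::size_t num_stages, std::size_t num_units)
{
    assert(num_stages > 0);
    std::vector<std::vector<long>> steps;
    if (num_units == 0)
    {
        return steps;
    }
    const std::size_t total_steps = num_units + num_stages - 1;
    for (std::size_t t = 0; t < total_steps; ++t)
    {
        std::vector<long> row(num_stages, -1);
        for (std::size_t s = 0; s < num_stages; ++s)
        {
            if (t >= s && t - s < num_units)
            {
                row[s] = static_cast<long>(t - s);
            }
        }
        steps.push_back(row);
    }
    return steps;
}
```

Under this model, 3 stages and 5 data units finish in 7 steps instead of the 15 that fully sequential processing would take, because each unit still passes through the stages in order while different units occupy different stages at the same time.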

4.2.8.1 Overview

The HetCompute Pipeline API is designed with the following intents:

• The Pipeline is created dynamically by a C++ program as a HetCompute pipeline object. Pipeline stages should be added sequentially to the pipeline prior to launching, that is, once launched, the Pipeline can no longer be modified.

• The Pipeline allows arbitrary C++ code in a stage function, though the parameter list is dictated by the pipeline context (mandatory, hetcompute::pattern::pipeline<UserData...>::context) and the data packet (hetcompute::stage_input) that is expected to be handed over between stages.

• The data between stages can be of any data type that is copyable (assignable and constructible) and default constructible.

– One stage iteration can produce at most 1 data unit.

– One stage iteration can consume 0 to n data units, depending on the features of the stage (see Iteration Rate (hetcompute::iteration_rate)).

• The HetCompute pipeline can control the memory footprint of the stages. By default, the pipeline runs in a special execution mode (hetcompute::pattern::pipeline::enable_sliding_window) which favors pipeline throughput: instead of allowing a stage to proceed freely with many future iterations and storing the produced data, the Pipeline schedules the successor stage to consume the data as soon as possible so as to save memory. Thus, a pipeline stage can specify a fixed amount of memory as a circular buffer to store the produced data. HetCompute calls this circular buffer the Sliding Window. Note that this execution mode is pipeline-specific rather than a stage feature. A pipeline can also run in a freer mode (hetcompute::pattern::pipeline::disable_sliding_window) if no sliding window is used in any of its stages. This mode may lead to a higher level of parallelism at runtime, but gives no control over the memory footprint. It is mostly useful for performance tuning when the level of parallelism is critical and memory footprint control is not. Moreover, the pipeline internally uses different buffer data structures for inter-stage data transfers under the two modes: a static circular buffer when sliding window mode is enabled, and a dynamic pool bucket buffer when it is disabled. The implementation details of the inter-stage buffer are transparent to the user; they are mentioned here as a side note for advanced users who need to reason about the performance and memory usage of their applications under the different modes.

• The user can set the following parameters for each Pipeline stage:

– Stage Type: the execution order of the iterations for a stage

◦ Serial Stage (hetcompute::serial_stage): runs every iteration sequentially.

◦ Parallel Stage (hetcompute::parallel_stage): can run multiple consecutive iterations concurrently.

· Degree of Concurrency (doc): number of consecutive iterations that can run in parallel.

· A parallel stage with doc = 1 is equivalent to a serial stage.

– Iteration Lag (hetcompute::iteration_lag): minimum number of iterations that a stage should run ahead of its successor (should be ≥ 0).

– Iteration Rate (hetcompute::iteration_rate): rate of iterations between two consecutive stages

◦ Each stage can have different iteration numbers

◦ Iteration lag will be scaled up according to the iteration rate

– Sliding Window Size (hetcompute::sliding_window_size): the unit size of the circular buffer between stages in a Pipeline so as to control the memory footprint

◦ HetCompute Pipeline supports a sanity check (hetcompute::pattern::pipeline::is_valid()) before launching the pipeline. An exception will be raised if it fails.

– Pipeline Context (hetcompute::pattern::pipeline::context): the information that is available to all user-defined stage functions for a specific pipeline:

◦ Contains user-defined pipeline-specific data

◦ Contains pipeline execution information: stage id, stage iteration id, etc.

◦ Provides control to the pipeline when possible, that is, stops the pipeline on the fly.

• The default pipeline pattern implementation should cover most basic use cases. However, no single implementation is the best fit for all pipeline workload types. User knowledge of the application can sometimes help the pattern perform better. The user can also use the HetCompute tuners for performance tuning (Tuner):

– Stage Tuner: provided when adding a pipeline stage. Currently, the HetCompute pipeline supports stage iteration chunking, which lets the user adjust the granularity of work that the scheduler performs together. Iteration chunking is also known as iteration fusion: multiple iterations are executed sequentially as one unit to avoid scheduling overhead between iterations. This is very helpful if the pipeline stage has a very light workload with many iterations to perform. On the other hand, iteration chunking can reduce the level of parallelism because of the serialization. For parallel stages, the chunk size should not be larger than the degree of concurrency; otherwise, it is ignored by the pattern scheduler. For serial stages, the chunk size can be larger than 1. This helps reduce scheduling overhead, but the user needs to make sure that chunking with a specific size is safe for application correctness. Chunking for serial stages is allowed mainly for performance tuning in some special cases, although it violates the philosophy of pattern tuning, which is that tuners are only supposed to affect performance and never the correctness of the pattern.

– Pipeline Tuner: provided when launching the pipeline. Currently, the HetCompute pipeline supports the maximum degree of concurrency, that is, the number of concurrent tasks initially launched for pipeline scheduling. This helps if the first stage is a parallel stage. The user may want to control the level of parallelism of the pipeline because other applications may occupy some of the computational resources in the system.

Here is an example of pipeline and stage tuner usage:

    #include <iostream>

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Define a pipeline skeleton, with pipeline context data of type size_t.
        hetcompute::pattern::pipeline<size_t> p;

        // Pipeline context type.
        typedef hetcompute::pattern::pipeline<size_t>::context context;

        // Add a parallel stage with degree of concurrency of 8.
        p.add_stage(hetcompute::parallel_stage(8),
                    hetcompute::pattern::tuner().set_chunk_size(2),
                    [](context& ctx) {
                        size_t iter = ctx.get_iter_id();
                        size_t data = *ctx.get_data();
                        // some usage of iter and data here
                        HETCOMPUTE_ILOG("iter: %zu, data: %zu", iter, data);
                    });

        // Add a serial stage.
        p.add_stage(hetcompute::serial_stage(), [](context& ctx) {
            // size_t iter = ctx.get_iter_id();
            // size_t data = *ctx.get_data();
            auto dp = ctx.get_data();
            *dp     = *dp + 1;
            // some usage of iter and data here
        });

        // Define the context data.
        size_t num = 0;

        // Run the pipeline with 10 iterations with tuner.
        p.run(&num, 10, hetcompute::pattern::tuner().set_max_doc(4));

        std::cout << "pipeline runs " << num << " iters" << std::endl;

        hetcompute::runtime::shutdown();
        return 0;
    }

Note that tuners are only suggestions from the user to the scheduler. It is up to the implementation of the scheduler whether to respect or ignore the tuning hints.
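The sliding window described in this overview is, conceptually, a fixed-capacity circular buffer between a producer stage and its consumer. A minimal single-threaded sketch follows. The class name ring_buffer and its interface are assumptions for illustration and do not reflect the SDK's internal data structure, which is not exposed to the user.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity circular buffer. A producer can run at most
// capacity() items ahead of the consumer, which bounds the memory footprint.
template <typename T>
class ring_buffer
{
public:
    explicit ring_buffer(std::size_t capacity) : _slots(capacity), _head(0), _count(0)
    {
        assert(capacity > 0);
    }

    std::size_t capacity() const { return _slots.size(); }
    std::size_t size() const { return _count; }

    // Returns false when the window is full: the producer must wait.
    bool try_push(const T& value)
    {
        if (_count == _slots.size())
        {
            return false;
        }
        _slots[(_head + _count) % _slots.size()] = value;
        ++_count;
        return true;
    }

    // Returns false when there is nothing to consume yet.
    bool try_pop(T& out)
    {
        if (_count == 0)
        {
            return false;
        }
        out   = _slots[_head];
        _head = (_head + 1) % _slots.size();
        --_count;
        return true;
    }

private:
    std::vector<T> _slots; // requires T to be default constructible and copyable
    std::size_t    _head;  // index of the oldest item
    std::size_t    _count; // number of stored items
};
```

Preallocating the slots is one plausible reason for requiring the inter-stage data type to be copyable and default constructible, although the source does not state the motivation for that requirement.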

4.2.8.2 HetCompute Pipeline Example

This section demonstrates how to express a simple video processing application using the HetCompute Pipeline pattern.

The video processing application (as shown in Figure 4-4) contains three stages:

• Stage 0: Read the video stream frame-by-frame (sequentially)

• Stage 1: Process the frames.

– No intra-stage data dependency, that is, previously processed frames are not required to process the current frame. This qualifies the stage to be "parallel".

– Inter-stage dependency, which requires that processing the current frame refer to the previous two original frames. Note that the inter-stage data dependency does not disqualify the stage from being parallel. This dependency can be ensured by setting the stage lag to 2.

• Stage 2: Save every other frame to the output file.

Figure 4-4 HetCompute Pipeline Example

The application in the HetCompute Pipeline:

// Define the FileInfo struct.
// This type is used for the pipeline-specific data.
typedef struct {
    File*  InputFile;      // Input video file
    File*  OutputFileOdd;  // Output video file for the odd frames
    File*  OutputFileEven; // Output video file for the even frames
    size_t num_frames;     // Total number of frames in the input file
} FileInfo;

// Create pipeline with pipeline-specific data (context data) of type FileInfo
hetcompute::pattern::pipeline<FileInfo> p;

// alias to the pipeline context
using context = hetcompute::pattern::pipeline<FileInfo>::context;

// define stage function lambdas
auto read_frame_from_stream = [](context& ctx) {
    // read frame ctx.get_iter_id() from ctx.get_data()->InputFile;
    return ctx.get_iter_id();
};

auto process_one_frame = [](context& ctx, hetcompute::stage_input<size_t>& in) {
    // process frame in[0];
};

auto save_every_other_frame = [](context& ctx) {
    // save frame ctx.get_iter_id() * 2 to ctx.get_data()->OutputFileEven;
    // save frame ctx.get_iter_id() * 2 + 1 to ctx.get_data()->OutputFileOdd;
};

// Add the first stage to read the frames from the video stream
p.add_stage(hetcompute::serial_stage(),          // serial stage
            hetcompute::sliding_window_size(16), // use a sliding window for 16 frames
            read_frame_from_stream);

// Add the second parallel stage to process the frames
p.add_stage(hetcompute::parallel_stage(4),       // parallel stage and doc = 4
            hetcompute::iteration_lag(2),        // iteration delay of 2
            hetcompute::sliding_window_size(16), // use a sliding window for 16 frames
            process_one_frame);

// Add the last serial stage to save every other frame back to a new output video file
p.add_stage(hetcompute::serial_stage(),          // serial stage
            hetcompute::iteration_rate(2, 1),    // Every 2 iterations in the 2nd stage map to 1 here
            save_every_other_frame);

// define the pipeline-specific data
FileInfo finfo;
// initialize the content for the pipeline-specific data
init_file_info(&finfo);
// launch the pipeline and process finfo.num_frames frames
// control the memory footprint of the pipeline execution
p.enable_sliding_window();
p.run(&finfo, finfo.num_frames);

4.2.8.3 HetCompute Pipeline Details


4.2.8.3.1 Stage Function

• The stage function of a HetCompute Pipeline can be any of the following user-defined C++ entities:

– lambda expression

– callable object

– function pointer

• The return type of the stage function:

– void means no data needs to be passed over to the next stage;

– Non-void means one object of the return type will be handed over to the next stage after each iteration.

– The last stage in the pipeline should always return void.

• The parameter list of the stage function:

– The 1st parameter of the stage function is mandatory. It should be a reference to the context type of the pipeline the stage belongs to (hetcompute::pattern::pipeline::context). This makes it possible for the user-defined stage function to get access to the pipeline state.

– The 2nd parameter is optional. When needed, the 2nd parameter of a pipeline stage function has to be a reference to hetcompute::stage_input<type>, where type should match the return type of the stage function for the predecessor stage.

– The stage function for the first stage should always take one parameter since it does not have a predecessor stage.

– For any other stage, the stage function takes one parameter if its predecessor stage returns void; otherwise, it takes two parameters.

– Before launching the pipeline, the HetCompute pipeline sanity check verifies that the return type of the predecessor stage matches the stage_input type of the successor stage. An exception will be raised if the types do not match.

Code snippet for defining stage function in HetCompute:

// alias to the context that belongs to a pipeline without specific data
using context = hetcompute::pattern::pipeline<>::context;

// stage function as a function pointer
// @param context& pipeline context
// @return size_t
size_t s0(context& ctx) {
    // do something for iteration ctx.get_iter_id();
    return ctx.get_iter_id();
}

// stage function as a lambda expression
// @param context& pipeline context
// @param hetcompute::stage_input<size_t>& input from the previous stage
// @return void
auto s1 = [](context& ctx, hetcompute::stage_input<size_t>& in) {
    // do something for iteration ctx.get_iter_id() and use the data in[0];
};

// stage function as a functor
class S2 {
public:
    // stage function using a callable object
    // @param context& pipeline context
    // @return void
    void operator()(context& ctx) {
        // 2nd parameter is not needed here because the previous stage returns void
        // do something for iteration ctx.get_iter_id()
    }
};

S2 s2;

// define and run the pipeline
void foo() {
    // define the pipeline object
    hetcompute::pattern::pipeline<> p;

    // add the first stage, where ... are other possible stage features, such as
    // lag, rate, sliding window size
    p.add_stage(hetcompute::serial_stage(), ..., s0);

    // add the second stage, where ... are other possible stage features, such as
    // lag, rate, sliding window size
    p.add_stage(hetcompute::parallel_stage(8), ..., s1);

    // add the third stage, where ... are other possible stage features, such as
    // lag, rate, sliding window size
    p.add_stage(hetcompute::serial_stage(), ..., s2);

    // launch the pipeline
    ...
}

4.2.8.3.2 Iteration Rate (hetcompute::iteration_rate):

Rate of iterations between two consecutive stages.

• Allows different Pipeline stages to run a different number of iterations

• Iteration Rate is always defined in the successor stage as hetcompute::iteration_rate(r1, r2) for a pair of consecutive stages. Here r1/r2 is a positive rational number.

• HetCompute uses the simplest form of r1/r2 (both divided by their greatest common factor), that is, (4 : 8) → (1 : 2), (21 : 9) → (7 : 3)

• The default value for iteration rate is (1, 1)

• Execution Semantics:

– Stage s1 is followed by s2 and the iteration rate between them is (r1, r2)

– Each iteration of s1 has the weight of r2, while each iteration of s2 has the weight of r1

– Banking analogy for the execution semantics of the HetCompute Pipeline stage iteration rate:

◦ stage s1 puts r2 coins into the bank when it finishes one iteration

◦ stage s2 needs at least r1 coins in the bank and takes r1 coins out once it is launched

– Mathematical definition: The ith iteration of s2 cannot be launched until s1 finishes its ⌈i ∗ r1/r2⌉th iteration

– Only specify the number of iterations for the FIRST stage when launching the pipeline

Code snippet for defining stage iteration rate in HetCompute:

// stage s1 is followed by s2
// iteration rate between s1 and s2 is (r1, r2)
p.add_stage(..., stage1_body);
p.add_stage(..., hetcompute::iteration_rate(r1, r2), stage2_body);
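The arithmetic behind the iteration rate can be checked with a few lines of standalone C++. The sketch below does not use the HetCompute API; the helper names gcd_of, simplest_rate, and required_s1_iters are hypothetical, and it assumes that iterations are counted from 1, as the "ith iteration" wording suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Greatest common divisor (Euclid), used to reduce a rate to simplest form.
inline std::size_t gcd_of(std::size_t a, std::size_t b) {
    while (b != 0) {
        std::size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Reduce (r1, r2) to its simplest form, e.g. (4, 8) -> (1, 2).
inline std::pair<std::size_t, std::size_t> simplest_rate(std::size_t r1, std::size_t r2) {
    assert(r1 > 0 && r2 > 0);
    const std::size_t g = gcd_of(r1, r2);
    return std::make_pair(r1 / g, r2 / g);
}

// Number of s1 iterations that must finish before the i-th (1-based, assumed)
// iteration of s2 may launch: ceil(i * r1 / r2), computed in integers.
inline std::size_t required_s1_iters(std::size_t i, std::size_t r1, std::size_t r2) {
    assert(r2 > 0);
    return (i * r1 + r2 - 1) / r2;
}
```

With hetcompute::iteration_rate(2, 1), as in the video example, required_s1_iters(3, 2, 1) is 6: the third iteration of the successor stage has to wait for six iterations of its predecessor, which matches the coin analogy (six deposits of one coin fund three withdrawals of two coins).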

4.2.8.3.3 Iteration Lag (hetcompute::iteration_lag):

Minimum number of iterations that a stage runs ahead of its successor (lag ≥ 0)

• Iteration Lag is always described in the successor stage for a pair of consecutive stages.

• Iteration Lag will be scaled up according to stage rates (s1 is followed by s2), that is, L iterations in s2 correspond to ⌈L ∗ r1/r2⌉ iterations in s1.

• The default value for iteration lag is 0

• Execution Semantics:

– At any time, let i1 and i2 be the number of iterations that stages s1 and s2 have finished, respectively. The following inequality holds: i1 ∗ r2 ≥ (i2 + L) ∗ r1

Code snippet for defining stage iteration lag in HetCompute:

// stage s1 is followed by s2
// iteration rate between s1 and s2 is (r1, r2)
// the lag between them is L
p.add_stage(..., stage1_body);
p.add_stage(..., hetcompute::iteration_lag(L),
            hetcompute::iteration_rate(r1, r2), stage2_body);
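To make the lag inequality concrete, the following standalone sketch evaluates it directly. It is not part of the HetCompute API; lag_invariant_holds and s2_iters_allowed are hypothetical helper names, and the sketch assumes the inequality is read as a bound on how far s2 may progress once s1 has finished i1 iterations.

```cpp
#include <cassert>
#include <cstddef>

// Checks the inequality from the text: i1 * r2 >= (i2 + L) * r1.
inline bool lag_invariant_holds(std::size_t i1, std::size_t i2, std::size_t L,
                                std::size_t r1, std::size_t r2) {
    return i1 * r2 >= (i2 + L) * r1;
}

// Largest number of finished s2 iterations that still satisfies the
// inequality after s1 has finished i1 iterations (0 if none does).
inline std::size_t s2_iters_allowed(std::size_t i1, std::size_t L,
                                    std::size_t r1, std::size_t r2) {
    assert(r1 > 0);
    // (i2 + L) * r1 <= i1 * r2  <=>  i2 + L <= floor(i1 * r2 / r1)
    const std::size_t q = (i1 * r2) / r1;
    return q > L ? q - L : 0;
}
```

For the video example (lag 2, default rate (1, 1)), s2_iters_allowed(5, 2, 1, 1) is 3: after the reading stage has finished five frames, the processing stage can have finished at most three, so the two most recent original frames are always available.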

4.2.8.3.4 Sliding Window Size (hetcompute::sliding_window_size):

The unit size of the circular buffer between stages in a Pipeline that is needed to control the memory footprint.

• Sliding window size is always defined in the current stage

• Controls the memory footprint between stages and favors pushing the execution towards the last stage.

• If not specified, HetCompute assumes the stage does not use a sliding window

• Sizing the stage sliding window

– In order to be algorithmically correct, the minimum sliding window size depends on the value of iteration lag, iteration rate and the degree of concurrency of a stage.

– Minimum size = ⌈L ∗ r1/r2⌉ + max{doc, 8, ⌈r1/r2⌉} in HetCompute

– HetCompute performs a sanity check before launching if the sliding window size is explicitly specified in a pipeline. An exception will be raised if the check fails.

Code snippet for defining stage sliding window size in HetCompute:

// stage s1 is followed by stage s2
// iteration rate between s1 and s2 is (r1, r2)
// the lag between them is L
// the sliding window size for s1 is sws1
p.add_stage(..., hetcompute::sliding_window_size(sws1), stage1_body);
p.add_stage(..., hetcompute::iteration_lag(L),
            hetcompute::iteration_rate(r1, r2), stage2_body);
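The minimum-size formula can be evaluated ahead of time to pick a window that passes the sanity check. The sketch below is plain C++ rather than HetCompute code; min_sliding_window and ceil_div are hypothetical names, and which stage's lag, rate, and degree of concurrency enter the formula is taken as given by the caller, since the text does not spell it out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Integer ceiling of a / b.
inline std::size_t ceil_div(std::size_t a, std::size_t b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Minimum size = ceil(L * r1 / r2) + max{doc, 8, ceil(r1 / r2)}.
inline std::size_t min_sliding_window(std::size_t L, std::size_t r1,
                                      std::size_t r2, std::size_t doc) {
    const std::size_t lag_part = ceil_div(L * r1, r2);
    const std::size_t rate_part = ceil_div(r1, r2);
    return lag_part + std::max(std::max(doc, static_cast<std::size_t>(8)), rate_part);
}
```

With the values from the video example (lag 2, rate (1, 1), doc 4), min_sliding_window(2, 1, 1, 4) is 10, so the window of 16 frames used there is above the minimum.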


4.2.8.3.5 Pipeline Context:

The information that is available to the user-defined stage functions for a specific pipeline

• Contains user-defined pipeline-specific data if needed

• Contains pipeline stage execution information: iteration id, stage id, etc.

• Access to control the pipeline: stop the pipeline to terminate execution on the fly

• Access to cancel the pipeline: cancel the pipeline execution. Note that hetcompute::abort_on_cancel() needs to be called in the user-defined stage functions for proper pipeline cancellation. A pipeline can be cancelled in any stage; however, the internal state of a cancelled pipeline is non-deterministic.

Code snippet for defining pipeline context in HetCompute:

// User-defined pipeline-specific data
typedef struct FileInfo {
    File* infile;
    File* outfile;
} FileInfo;

// define a HetCompute pipeline object with data of type FileInfo
hetcompute::pattern::pipeline<FileInfo> p;

// get the type of the pipeline context
using finfo_pcontext = hetcompute::pattern::pipeline<FileInfo>::context;

// define the stage function, which returns a size_t value and takes no stage input
auto stage_body = [](finfo_pcontext& ctx) -> size_t {
    foo(ctx.get_data()->infile, ctx.get_iter_id());
    bar(ctx.get_data()->outfile, ctx.get_stage_id());
    // stop the pipeline on the fly
    if(...)
        ctx.stop_pipeline();
    else if(...)
        ctx.cancel_pipeline();
    return 0;
};
// add the stage to the pipeline
p.add_stage(..., stage_body);

// define two pipeline-specific data objects
FileInfo info1, info2;

// run the pipeline for file1 with info1
p.run(&info1, 0);
// run the pipeline for file2 with info2
p.run(&info2, 0);

4.2.8.4 Launch the HetCompute pipeline

4.2.8.4.1 Launch with a known total number of pipeline iterations

• The total number of pipeline iterations is known before launching

• Only the total iteration number for the first stage needs to be provided. The iteration numbers for the following stages are propagated according to the stage iteration rates.

Code snippet for launching a pipeline with number of iterations:

// define a HetCompute pipeline object
hetcompute::pattern::pipeline<> p;

// add stages
...
// synchronous launch with 10 iterations for the first stage
// launch without sliding window
p.disable_sliding_window();
p.run(10);

4.2.8.4.2 Launch without knowing the total number of pipeline iterations

• The total number of pipeline iterations is not known before launching

• Stop the pipeline on the fly during execution (only the first stage can stop a pipeline through the pipeline context)

• May incur a performance penalty compared to launching with a known number of iterations

Code snippet for launching a pipeline and stopping it on the fly:

// define a HetCompute pipeline object
hetcompute::pattern::pipeline<> p;

using context = hetcompute::pattern::pipeline<>::context;

// add the first stage that will stop the pipeline at some point
p.add_stage(hetcompute::serial_stage(), // stage type
            ...,                        // other features of the stage, that is, lag, rate, sliding window size
            [](context& ctx) {          // stage function
                // do something
                // when condition is true, stop the pipeline
                if(condition) {
                    ctx.stop_pipeline();
                }
            });

// add other stages
...
// using the launch type without control on the memory footprint
p.disable_sliding_window();
// synchronous launch and stop it on the fly
p.run(0);

4.2.8.4.3 Synchronous Launch:

Launches the pipeline and blocks until the pipeline finishes execution.

Code snippet for synchronously launching a pipeline:

// define a HetCompute pipeline object
hetcompute::pattern::pipeline<> p;
// add stages
...
// synchronous launch with known number of iterations
p.run(10);
// use HetCompute free function for synchronous launch with known number of iterations
hetcompute::launch(p, 10);
// pipeline finishes execution at this point

4.2.8.4.4 Asynchronous Launch:

Pipeline execution behaves like a normal HetCompute task and supports all the asynchronous task semantics, that is, launch, wait_for, dependency, finish_after, etc. Take the same precautions with the life-cycle of the data used in the pipeline as are needed for the asynchronous semantics of HetCompute tasks.

Code snippet for pipeline asynchronous semantics

// define a HetCompute pipeline object
hetcompute::pattern::pipeline<> p;

// add stages
...
// launch without control of the memory footprint
p.disable_sliding_window();
// asynchronous launch with a known number of iterations
auto t1 = p.create_task(10); // t1 is of type hetcompute::task_ptr<>
// use t1 as a regular HetCompute task: set dependency, or launch, or wait_for, or finish_after
...

// launch with control of the memory footprint
p.enable_sliding_window();
// use HetCompute free function for creating an asynchronous task of the pattern
// t2 is of type hetcompute::task_ptr<void(size_t)>
auto t2 = hetcompute::create_task(p);
// bind the argument to the task with a known number of iterations
t2->bind(10);
// use t2 as a regular HetCompute task: set dependency, or launch, or wait_for, or finish_after
...

4.2.8.5 Heterogeneous Pipeline (HetCompute Beta Feature)

HetCompute pipeline supports a BETA feature of heterogeneous pipeline execution. A heterogeneous pipeline can have predefined CPU or GPU pipeline stages so that they can run concurrently on the heterogeneous computational components to fully exploit the underlying hardware platform. The heterogeneous pipeline is desirable for pipeline applications with some stages that are data intensive and most suitable for GPU processing, while the others can run efficiently on CPU cores at the same time.

Here is a simple code example of a heterogeneous pipeline with two CPU stages (S0 and S2) and one GPU stage (S1) for vector addition in the middle.

1   #include <cmath>
2   #include <assert.h>
3   #include <hetcompute/hetcompute.hh>
4
5   using namespace hetcompute;
6   using namespace std;
7
8   const size_t num_iters = 32;
9   std::vector<hetcompute::buffer_ptr<float>> a_bufs;
10  std::vector<hetcompute::buffer_ptr<float>> b_bufs;
11  std::vector<hetcompute::buffer_ptr<float>> c_bufs;
12
13  void init_bufs(size_t size);
14  void reset_bufs(size_t size);
15  void cleanup_bufs();
16
17  void
18  init_bufs(size_t size)
19  {
20      for (size_t j = 0; j < num_iters; j++)
21      {
22          // Create input buffers.
23          auto buf_a = hetcompute::create_buffer<float>(size);
24          auto buf_b = hetcompute::create_buffer<float>(size);
25          auto buf_c = hetcompute::create_buffer<float>(size);
26
27          a_bufs.push_back(buf_a);
28          b_bufs.push_back(buf_b);
29          c_bufs.push_back(buf_c);
30      }
31  }
32
33  void
34  reset_bufs(size_t size)
35  {
36      // Reset the initial value of the buffers.
37      for (size_t j = 0; j < num_iters; j++)
38      {
39          a_bufs[j].acquire_wi();
40          b_bufs[j].acquire_wi();
41          c_bufs[j].acquire_wi();
42
43          for (size_t i = 0; i < size; ++i)
44          {
45              a_bufs[j][i] = i;
46              b_bufs[j][i] = size - i;
47              c_bufs[j][i] = j + 1;
48          }
49
50          a_bufs[j].release();
51          b_bufs[j].release();
52          c_bufs[j].release();
53      }
54  }
55
56  // Release the memory of the buffers.
57  void
58  cleanup_bufs()
59  {
60      a_bufs.clear();
61      b_bufs.clear();
62      c_bufs.clear();
63  }
64
65  // A GPU kernel string which does the vector addition.
66  #define OCL_KERNEL(name, k) std::string const name##_string = #k
67  OCL_KERNEL(vadd_kernel, __kernel void vadd(__global float* a, __global float* b, __global float* c, unsigned int size) {
68      unsigned int i = get_global_id(0);
69      if (i < size)
70          c[i] = a[i] + b[i];
71  });
72
73  int
74  main()
75  {
76      hetcompute::runtime::init();
77      const size_t size = 32;
78
79      // Initialize the buffers.
80      init_bufs(size);
81      // Reset the buffer values.
82      reset_bufs(size);
83
84      // Define a hetcompute heterogeneous pipeline and its context.
85      hetcompute::beta::pattern::pipeline<> p;
86      using context = hetcompute::beta::pattern::pipeline<>::context;
87
88      // S0: A CPU stage.
89      // Add a serial cpu stage.
90      p.add_stage(hetcompute::serial_stage(), [](context& ctx) -> size_t {
91          // Return the iteration id.
92          return ctx.get_iter_id();
93      });
94
95      // S1: A GPU stage.
96      // Define a GPU kernel for the GPU stage.
97      std::string kernel_name("vadd");
98      auto gpu_vadd = hetcompute::create_gpu_kernel<hetcompute::in<hetcompute::buffer_ptr<float>>,
99                                                    hetcompute::in<hetcompute::buffer_ptr<float>>,
100                                                   hetcompute::out<hetcompute::buffer_ptr<float>>,
101                                                   unsigned int>(vadd_kernel_string, kernel_name);
102
103     // Define the return type of the gpu stage before lambda.
104     // Can also use the utility template to get the return type (beta feature):
105     // using gk_tuple_type =
106     //     typename hetcompute::beta::call_tuple<1, decltype(gpu_vadd)>::type;
107     using gk_tuple_type =
108         std::tuple<hetcompute::range<1>, hetcompute::buffer_ptr<float>, hetcompute::buffer_ptr<float>, hetcompute::buffer_ptr<float>, unsigned int>;
109
110     // Add a parallel gpu stage.
111     p.add_stage(hetcompute::parallel_stage(4),
112                 hetcompute::iteration_lag(0),
113                 hetcompute::iteration_rate(1, 1),
114                 // before lambda
115                 // (prepare the parameters for the GPU kernel as a return tuple)
116                 [&](context& ctx, hetcompute::stage_input<size_t>&) -> gk_tuple_type {
117
118                     // Get the iteration id to index the buffers.
119                     size_t y = ctx.get_iter_id();
120
121                     auto r     = hetcompute::range<1>(size);
122                     auto buf_a = a_bufs[y];
123                     auto buf_b = b_bufs[y];
124                     auto buf_c = c_bufs[y];
125
126                     return std::make_tuple(r, buf_a, buf_b, buf_c, size);
127                 },
128                 // The GPU kernel for the stage.
129                 gpu_vadd,
130
131                 // after lambda (optional)
132                 [&](context&, gk_tuple_type&) -> size_t { return 0; });
133
134     // S2: A CPU stage.
135     // Add a serial cpu stage.
136     p.add_stage(hetcompute::serial_stage(hetcompute::in_order), [&](context& ctx, hetcompute::stage_input<size_t>&) {
137
138         size_t y = ctx.get_iter_id();
139
140         // Verify the GPU stage outputs.
141         a_bufs[y].acquire_ro();
142         b_bufs[y].acquire_ro();
143         c_bufs[y].acquire_ro();
144
145         for (size_t i = 0; i < size; ++i)
146         {
147             assert(size == c_bufs[y][i]);
148             assert((a_bufs[y][i] + b_bufs[y][i]) == c_bufs[y][i]);
149             if (size != c_bufs[y][i])
150                 HETCOMPUTE_ILOG("The output of the GPU stage is incorrect.");
151             if (a_bufs[y][i] + b_bufs[y][i] != c_bufs[y][i])
152                 HETCOMPUTE_ILOG("The inputs of the GPU stage are incorrect.");
153         }
154
155         a_bufs[y].release();
156         b_bufs[y].release();
157         c_bufs[y].release();
158     });
159
160     // Launch hetero-pipeline through hetero-pipeline pattern.
161     p.run(num_iters);
162
163     // Clean up the buffers.
164     cleanup_bufs();
165
166     hetcompute::runtime::shutdown();
167     return 0;
168 }

4.2.8.6 Heterogeneous Pipeline Details

Besides the GPU stage, the heterogeneous pipeline maintains the same features as a regular homogeneous CPU pipeline (see HetCompute Pipeline Details).

4.2.8.6.1 GPU Pipeline Stages

To define the functionality of a GPU stage, we need three components:

• a before lambda

• a GPU kernel

• an after lambda (optional).

The before lambda (which can also be a callable object or a function pointer) connects the stage with its previous stage and provides the arguments to the GPU kernel for the current stage. The before lambda takes the same parameters as a CPU stage function, i.e., a reference to hetcompute::beta::pattern::pipeline::context and, optionally, a reference to hetcompute::stage_input<type> (see Stage Function). It should return a std::tuple of the range for the GPU kernel and the variables that will be fed as the arguments to the GPU kernel. In the simple example above (lines 94 to 129), the GPU kernel is 1-d and takes three buffer pointers and one unsigned int as its input arguments. Therefore, the before lambda of the GPU stage returns a std::tuple<hetcompute::range<1>, hetcompute::buffer_ptr<float>, hetcompute::buffer_ptr<float>, hetcompute::buffer_ptr<float>, unsigned int>, with the range as the first element in the tuple followed by the kernel arguments in order (see Basic Usage of Buffers and Kernels: The Path to Heterogeneity).

The GPU kernel defines the main functionality of the stage. It will be launched as a GPU task and executes on the GPU component.

The after lambda (optional; it can also be a callable object or a function pointer) takes the output of the GPU kernel and performs post-processing (if necessary) to prepare for the following stages. It takes two parameters:

• A reference to hetcompute::beta::pattern::pipeline::context, as with regular pipeline CPU stage functions.

• A reference to the return tuple of the before lambda. Because the tuple contains both the inputs and outputs of the GPU kernel, this reference makes it possible for the after lambda to access the output results of the GPU kernel for potential post-processing. The after lambda can return any type, just as a regular CPU stage function (see Stage Function).

The before lambda and after lambda of a GPU stage synchronize the data between the current GPU stage and its predecessor/successor stage. In the current design, the synchronization only happens on the CPU side. Since the GPU kernel wraps both its inputs and outputs as arguments, we have to use the std::tuple approach to provide the inputs to and get the outputs from the GPU kernel, instead of passing parameters and returning values as with regular functions.

A GPU pipeline stage has the same stage parameters as a regular CPU stage, such as hetcompute::serial_stage or hetcompute::parallel_stage, hetcompute::iteration_rate, hetcompute::iteration_lag, and hetcompute::sliding_window_size (see HetCompute Pipeline Details).
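The before/kernel/after flow around a shared std::tuple can be modeled without a GPU. The sketch below is a CPU-only illustration and not the HetCompute API: buffer_model, kernel_args, before_step, kernel_step, after_step, and run_gpu_stage_model are hypothetical stand-ins, with a std::vector<float> in place of hetcompute::buffer_ptr<float> and a plain size in place of hetcompute::range<1>.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for the buffer and the kernel argument tuple.
typedef std::vector<float> buffer_model;
// Layout mirrors the example: range, input a, input b, output c, size.
typedef std::tuple<std::size_t, buffer_model, buffer_model, buffer_model, unsigned int> kernel_args;

// "Before" step: packs the range and the kernel arguments for one iteration.
inline kernel_args before_step(std::size_t iter, unsigned int size) {
    buffer_model a(size), b(size);
    buffer_model c(size, static_cast<float>(iter + 1)); // placeholder output values
    for (unsigned int i = 0; i < size; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = static_cast<float>(size - i);
    }
    return kernel_args(size, a, b, c, size);
}

// CPU stand-in for the vadd kernel: inputs and outputs both live in the tuple.
inline void kernel_step(kernel_args& args) {
    const std::size_t range = std::get<0>(args);
    const buffer_model& a = std::get<1>(args);
    const buffer_model& b = std::get<2>(args);
    buffer_model& c = std::get<3>(args);
    const unsigned int size = std::get<4>(args);
    for (std::size_t i = 0; i < range; ++i) {
        if (i < size) {
            c[i] = a[i] + b[i];
        }
    }
}

// "After" step: reads the kernel output back out of the same tuple.
inline float after_step(const kernel_args& args) {
    const buffer_model& c = std::get<3>(args);
    float sum = 0.0f;
    for (std::size_t i = 0; i < c.size(); ++i) {
        sum += c[i];
    }
    return sum;
}

// One iteration of the modeled stage: before -> kernel -> after.
inline float run_gpu_stage_model(std::size_t iter, unsigned int size) {
    kernel_args args = before_step(iter, size);
    kernel_step(args);
    return after_step(args);
}
```

Because the output buffer travels inside the same tuple as the inputs, the after step needs no separate return channel from the kernel, which is the design point the text makes about GPU stages.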

4.3 Introduction to Tasks

HetCompute programmers partition their applications into independent units of work that can be executed asynchronously on the CPU, the GPU or the Qualcomm Hexagon DSP. These units of work are called tasks. The simplest way to create and launch a task into the HetCompute runtime is by using hetcompute::launch(Code&&, Args&&...):

1  #include <hetcompute/hetcompute.hh>
2  #include <stdio.h>
3
4  int
5  main()
6  {
7      hetcompute::runtime::init();
8      // Create task t that prints "Hello World!"
9      auto t = hetcompute::launch([] { HETCOMPUTE_ILOG("Hello World!\n"); });
10
11     HETCOMPUTE_ILOG("This is Qualcomm HetCompute!\n");
12
13     // Wait for t to complete
14     t->wait_for();
15
16     hetcompute::runtime::shutdown();
17     return 0;
18 }

The example above shows a HetCompute application that writes Hello World! and This is Qualcomm HetCompute! concurrently. In line 9, hetcompute::launch(Code&&) creates and launches a task t that prints Hello World!. hetcompute::launch(Code&&) takes as a parameter a lambda expression (really just an anonymous function), which defines the work that the task executes. As will be seen in the remainder of the guide, hetcompute::launch(Code&&, Args&&...) is extremely versatile, and accepts function pointers, CPU kernels, OpenCL kernels, etc. hetcompute::launch(Code&&) returns a pointer to the task immediately and the execution proceeds to the next statement, HETCOMPUTE_ILOG("This is Qualcomm HetCompute!\n"), while the HetCompute runtime schedules task t on the first available CPU core. Because t might run concurrently with main, the following two program outputs are feasible:

Hello World!
This is Qualcomm HetCompute!

This is Qualcomm HetCompute!
Hello World!

Note that HETCOMPUTE_ILOG("This is Qualcomm HetCompute!\n") might execute before t does (perhaps the system is busy). The t->wait_for() statement in line 14 ensures that the program finishes only after t has completed its execution.

In HetCompute, tasks can have predecessors and successors, forming directed acyclic task graphs. The predecessors of a task t are the tasks that must complete before t can execute. Conversely, the successors of t are the set of tasks that will execute only after t has completed its execution. Programmers can specify a predecessor/successor relationship between two tasks by creating a dependency between them using the hetcompute::task<>::then() member function:

1  #include <hetcompute/hetcompute.hh>
2  #include <stdio.h>
3
4  int
5  main()
6  {
7      hetcompute::runtime::init();
8
9      // Create task that prints "Hello "
10     auto hello = hetcompute::launch([] { HETCOMPUTE_ILOG("Hello "); });
11
12     // Create task that prints "World!"
13     auto world = hetcompute::create_task([] { HETCOMPUTE_ILOG("World!\n"); });
14
15     // Make sure that "World!" prints after "Hello"
16     hello->then(world);
17
18     // Launch world
19     world->launch();
20
21     // Wait for world to complete
22     world->wait_for();
23
24     hetcompute::runtime::shutdown();
25     return 0;
26 }

The example creates two tasks, hello and world, and sets up a dependency between them (line 16) to ensure that World! is printed after Hello. Without this dependency, the HetCompute runtime could execute world first, or concurrently. Note that the order in which tasks are launched does not reflect the order in which the HetCompute runtime executes them. In line 22, the example waits for world to finish. There is no need to explicitly wait for hello because the HetCompute runtime guarantees that hello completes before world executes.

It is important to notice the use of two separate API calls to create and launch world (lines 13 and 19, respectively). This allows the dependency between hello and world to be created before the latter is launched: it is not possible to add a predecessor to a launched task, because the task might have already executed by the time the program calls hetcompute::task<>::then(). This is the reason why, in HetCompute, task creation, launch, and execution are different operations; hetcompute::launch(Code&&, Args&&...) combines them for the sake of programmatic convenience and efficiency. In HetCompute, tasks must be launched in order to be executed. Launching a task t means that the programmer has finished adding predecessors to t and wants the HetCompute runtime to execute t as soon as its predecessors have completed and there are execution units available.

These two simple samples illustrate two basic HetCompute abstractions: tasks and dependencies. In HetCompute, programmers think about algorithms in terms of concurrent tasks and let the HetCompute runtime schedule them onto available resources in the system. Programmers can create dynamic task graphs by setting dependencies between tasks that the runtime enforces.

Most parallel algorithms launch more than one task, unlike the example above. Waiting for each individual task can get cumbersome: the programmer would need to store the task pointers in some container (e.g., queue, list), and call hetcompute::task<>::wait_for() on each of them. For programmatic convenience and efficiency, HetCompute provides another asynchronous abstraction called groups. A group is a set of tasks that can be waited for as a unit.
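The way dependencies constrain execution order can be illustrated with a toy, single-threaded model of a task graph. This is only a sketch and not how the HetCompute runtime is implemented: toy_graph and hello_world_order are hypothetical names, tasks run sequentially rather than concurrently, and cycles are not detected.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy, single-threaded model of a task graph (NOT the HetCompute API).
class toy_graph {
public:
    // Register a task body and return its id.
    std::size_t create_task(std::function<void()> body) {
        bodies_.push_back(std::move(body));
        successors_.push_back(std::vector<std::size_t>());
        pending_preds_.push_back(0);
        return bodies_.size() - 1;
    }

    // pred must complete before succ may run (models pred->then(succ)).
    void then(std::size_t pred, std::size_t succ) {
        assert(pred < bodies_.size() && succ < bodies_.size());
        successors_[pred].push_back(succ);
        ++pending_preds_[succ];
    }

    // Run each task once, only after all of its predecessors have run.
    // Returns the number of tasks executed.
    std::size_t run_all() {
        std::vector<std::size_t> pending = pending_preds_;
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < pending.size(); ++i) {
            if (pending[i] == 0) ready.push(i);
        }
        std::size_t executed = 0;
        while (!ready.empty()) {
            const std::size_t id = ready.front();
            ready.pop();
            bodies_[id]();
            ++executed;
            for (std::size_t k = 0; k < successors_[id].size(); ++k) {
                const std::size_t succ = successors_[id][k];
                if (--pending[succ] == 0) ready.push(succ);
            }
        }
        return executed;
    }

private:
    std::vector<std::function<void()> > bodies_;
    std::vector<std::vector<std::size_t> > successors_;
    std::vector<std::size_t> pending_preds_;
};

// Builds the hello/world graph; world is created first on purpose to show
// that creation order does not decide execution order.
inline std::string hello_world_order() {
    std::string out;
    toy_graph g;
    const std::size_t world = g.create_task([&out] { out += "World!"; });
    const std::size_t hello = g.create_task([&out] { out += "Hello "; });
    g.then(hello, world);
    g.run_all();
    return out;
}
```

In this model a task becomes ready only when its count of unfinished predecessors reaches zero, which is the property the dependency in the hello/world sample relies on.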

1  #include <hetcompute/hetcompute.hh>
2  #include <stdio.h>
3
4  int
5  main()
6  {
7      hetcompute::runtime::init();
8      // Create group g
9      auto g = hetcompute::create_group();
10
11     // Launch 10 tasks into g
12     for (int i = 0; i < 10; i++)
13     {
14         g->launch([i] { HETCOMPUTE_ILOG("Hello World! I'm task #%d\n", i); });
15     }
16
17     // Wait for tasks to complete and exit group
18     g->wait_for();
19     hetcompute::runtime::shutdown();
20
21     return 0;
22 }

The example above creates a group g in line 9. Then, rather than creating tasks out of lambda functions and launching them explicitly as in the earlier samples, this example launches the lambda functions directly into group g (lines 12-15). HetCompute internally creates optimized tasks that execute as part of the group. Finally, all the tasks are waited for as a unit in line 18.

With this basic introduction to the fundamental units of asynchrony in HetCompute, tasks and groups, read through the next several chapters to discover the operations and capabilities of these asynchronous abstractions, including heterogeneous execution, dataflow, non-blocking parallelization, cancellation, exception handling, and algebraic operations.

Kernels: The Path to Heterogeneity

Creating Tasks

Task Pointers

Life of a HetCompute Task

Launching Tasks

Task Dependencies

Task Groups

Waiting for Tasks

Exceptions and Cancellation

Blocking Tasks

Algebraic Operations on Tasks

Task-Pointer Collapsing

Unleashing Asynchrony


4.3.1 Kernels: The Path to Heterogeneity

As mentioned in Introduction to Tasks, a HetCompute task contains work that can be executed on any device in a system: the CPU, the GPU, or the Qualcomm Hexagon DSP. This allows HetCompute programmers to write applications composed of a variety of tasks, fully exploiting the performance and power efficiency of the heterogeneous devices available in modern computing systems.

HetCompute tasks use kernels to achieve this computational heterogeneity. A kernel contains the computation, that is, the actual device code, that a task executes. This could be CPU code, GPU code, or Qualcomm Hexagon DSP code (aDSP or cDSP), resulting in three different types of kernels. In the current HetCompute release, every task contains exactly one kernel, dictating which device this task executes on.

4.3.1.1 Revisiting Hello World

The hello world example in the previous section uses hetcompute::launch to create and launch a CPU task in one step. However, hetcompute::launch is in fact one of many convenient methods that combine multiple steps into one method. In particular, hetcompute::launch creates anonymous kernels and tasks as necessary along the way.

The general steps to write a HetCompute program are as follows:

1. Create a kernel using a CPU, GPU, or DSP function

2. Set attributes on the kernel such as blocking or not

3. Use the kernel to create one or more tasks

4. Repeat the previous steps to create more tasks

5. Set dependencies between tasks and launch them

Note

Some of the steps above can be interleaved with other steps. For example, new tasks can still be created, and dependencies set up for them, even after some tasks have already been launched. Meanwhile, many convenient methods exist to combine multiple steps into one; refer to other sections in this chapter to see some of these in action.

As an example, the following code rewrites the hello world program by explicitly carrying out the steps listed above:

    #include <hetcompute/hetcompute.hh>
    #include <stdio.h>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create a cpu_kernel k that prints "Hello World!"
        auto k = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("Hello World!\n"); });

        // Use k to create a task t
        auto t = hetcompute::create_task(k);

        // Launch the task t
        t->launch();

        // Print another line after t is asynchronously launched
        HETCOMPUTE_ILOG("This is HETCOMPUTE!\n");

        // Wait for t to complete
        t->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

Note

Although the modified hello world example above creates a task using a CPU kernel (which itself is created from a lambda expression, that is, an anonymous CPU function), hetcompute::create_task can take other types of kernels created from various device functions: a CPU kernel, a GPU kernel, or a DSP kernel. The details on how to create these heterogeneous kernels follow.

4.3.1.2 Creating a Kernel

Currently, there are three types of kernels in HetCompute. Use the following methods to create each of them:

1. To create a CPU kernel, use hetcompute::create_cpu_kernel. It takes either a function or a function object (that is, a lambda expression or a functor).

2. To create a GPU kernel, use hetcompute::create_gpu_kernel. There are two variants: one for wrapping OpenCL C kernels and another for wrapping OpenGL ES compute shaders. The arguments for the OpenCL variant are 1) a string containing the source of an OpenCL GPU device function, and 2) a string containing the name of the device function. The OpenGL variant takes as its argument a string containing a compute shader program source. hetcompute::create_gpu_kernel must be explicitly typed by the programmer according to the GPU function’s signature; see Buffers regarding how to translate arrays in GPU function parameters to hetcompute::buffer_ptr.

3. To create a DSP kernel, use hetcompute::create_dsp_kernel. It takes a DSP-compatible C function. The HetCompute SDK makes a best effort to honor the DSP kernel attributes (set_adsp, set_cdsp) set by the developer. DSP kernel execution defaults to the aDSP if the chipset does not have a cDSP.

Note

The three kernel creation methods have different semantics, due to the difference in how CPU, GPU, and DSP functions are written. Refer to Kernels in the API Reference Manual for details regarding each method.

The following example shows how to use the methods in practice:

    #include <hetcompute/hetcompute.hh>
    #include <string>

    static int
    f1(int x)
    {
        return x * 2;
    }

    static auto f2 = [](int x) -> int { return x * 2; };

    static struct
    {
        int operator()(int x) { return x * 2; }
    } f3;

    // Source string for an OpenCL C kernel
    static std::string const f4_string = "__kernel void f4(__global int *x, __global int *y) {"
                                         " int i = get_global_id(0);"
                                         " y[i] = x[i];"
                                         "}";

    static int
    f5(int* x, int* y, int l)
    {
        int i = 0;
        for (i = 0; i < l; i++)
            y[i] = x[i] * 2;
        return 0;
    }

    int
    main()
    {
        hetcompute::runtime::init();

        // Create a cpu_kernel from a function
        auto k1 = hetcompute::create_cpu_kernel(f1);

        // Create a cpu_kernel from a lambda expression
        auto k2 = hetcompute::create_cpu_kernel(f2);

        // Create a cpu_kernel from a functor
        auto k3 = hetcompute::create_cpu_kernel(f3);

        // Create a gpu_kernel from an OpenCL C GPU function
        auto k4 = hetcompute::create_gpu_kernel<hetcompute::buffer_ptr<int>,
                                                hetcompute::buffer_ptr<int>>(f4_string, "f4");

        // Create a hexagon_kernel from a DSP function
        auto k5 = hetcompute::create_dsp_kernel<>(f5);

        hetcompute::runtime::shutdown();
        return 0;
    }

Once a kernel is created, it can be used by hetcompute::create_task to create a task. Moreover, a kernel can be used to create multiple independent tasks. (Kernels: Advanced Topics explains what this independence means.)

4.3.1.2.1 GPU kernels for OpenCL and OpenGL ES

The previous example in Creating a Kernel showed the basic usage of hetcompute::create_gpu_kernel. A source string was passed, which was implicitly assumed to contain an OpenCL C function. hetcompute::create_gpu_kernel also allows the user to provide an OpenGL ES compute shader program. The user must explicitly pass hetcompute::beta::gl as the first parameter to hetcompute::beta::create_gpu_kernel to indicate an OpenGL ES compute shader, as shown in the following example.

    #include <hetcompute/hetcompute.hh>

    #define LOCAL_SIZE 16 // This should match the local_size_x value in the shader

    const char* shader_code = R"GLCODE(

    #version 310 es
    precision highp float;
    layout(local_size_x = 16) in;
    layout(std430) buffer;

    layout(binding = 2) writeonly buffer Output {
        float elements[];
    } output_data;

    layout(binding = 0) readonly buffer Input0 {
        float elements[];
    } input_data0;

    layout(binding = 1) readonly buffer Input1 {
        float elements[];
    } input_data1;

    void main()
    {
        uint ident = gl_GlobalInvocationID.x;
        output_data.elements[ident] = input_data0.elements[ident] + input_data1.elements[ident];
    }

    )GLCODE";

    int
    main()
    {
        hetcompute::runtime::init();
        auto buf_a = hetcompute::create_buffer<float>(1024);
        auto buf_b = hetcompute::create_buffer<float>(buf_a.size());

        buf_a.acquire_wi();
        buf_b.acquire_wi();
        // Initialize the input vectors
        for (size_t i = 0; i < buf_a.size(); ++i)
        {
            buf_a[i] = i;
            buf_b[i] = buf_a.size() - i;
        }
        buf_a.release();
        buf_b.release();

        auto buf_c = hetcompute::create_buffer<float>(buf_a.size());

        auto gl_vadd = hetcompute::beta::create_gpu_kernel<hetcompute::in<hetcompute::buffer_ptr<float>>,
                                                           hetcompute::in<hetcompute::buffer_ptr<float>>,
                                                           hetcompute::out<hetcompute::buffer_ptr<float>>>(hetcompute::beta::gl, shader_code);

        hetcompute::range<1> global_range(buf_a.size());

        hetcompute::range<1> local_range(LOCAL_SIZE);

        // Create a task
        auto gpu_task = hetcompute::create_task(gl_vadd,      // gpu kernel
                                                global_range, // global range
                                                local_range,  // local range
                                                buf_a,        // rest correspond to gl_vadd template parameters
                                                buf_b,
                                                buf_c);
        gpu_task->launch();

        gpu_task->wait_for();
        hetcompute::runtime::shutdown();
    }

Correspondingly, when passing OpenCL C functions, the user may optionally pass hetcompute::beta::cl to make the use of OpenCL explicit, as illustrated below.

    auto k4 = hetcompute::beta::create_gpu_kernel<hetcompute::buffer_ptr<int>,
                                                  hetcompute::buffer_ptr<int>>(hetcompute::beta::cl, f4_string, "f4");


4.3.1.2.2 Global and Local Ranges

When launching a GPU kernel, a global range parameter must always be provided. The global range identifies the total number and shape of GPU threads that will be executed. Optionally, a local range parameter may also be provided. When omitted, the local range is assumed to be of size 1 (of the 1D/2D/3D dimensionality corresponding to the global range).

With GPU kernels created from OpenCL C, the programmer typically doesn’t face correctness issues when omitting the local range parameter. However, OpenGL ES shader program sources specify a local range. When launching GPU kernels created from OpenGL ES shaders, the programmer must ensure that the local range is provided and matches the local range used in the compute shader program. For example, the sample program in GPU kernels for OpenCL and OpenGL ES uses a 1D local range of size 16.
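The arithmetic behind the matching rule can be sketched in plain C++. This is only an illustration, not part of the HetCompute API: work_group_count and local_range_matches are hypothetical helper names, and the sketch assumes a 1D launch whose global size is a multiple of the local size.

```cpp
#include <cassert>
#include <cstddef>

// Number of work-groups produced by a 1D launch.
// Assumption: the global size is a multiple of the local size.
inline std::size_t work_group_count(std::size_t global_size, std::size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

// True when the local range passed on the host side equals the
// local_size_x value declared in the compute shader source.
inline bool local_range_matches(std::size_t host_local_size, std::size_t shader_local_size_x)
{
    return host_local_size == shader_local_size_x;
}
```

With the sample's values, a global range of 1024 and a local range of 16 give 64 work-groups, and local_range_matches(16, 16) holds. Omitting the local range corresponds to local_range_matches(1, 16), which is false and is exactly the mismatch the paragraph above warns about.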

4.3.1.3 Setting Kernel Attributes

After a kernel is created (from a CPU, GPU, or DSP function), Qualcomm HetCompute users cannot swap the underlying function for another one in the kernel. However, users can change the following attributes of the kernel:

1. Blocking: denotes whether a CPU task made from this kernel is expected to block on external events such as I/O activities (Blocking Tasks). The blocking attribute can be set and queried using the set_blocking and is_blocking methods of a kernel object.

2. Big: denotes that the CPU kernel is preferably executed on a big core (i.e. has affinity to the big core) in a big.LITTLE SoC. Users can override the kernel affinity setting through the HetCompute affinity APIs (Affinity). The big attribute can be set and queried using the set_big and is_big methods of a kernel object.

3. Little: denotes that the CPU kernel is preferably executed on a LITTLE core (i.e. has affinity to the LITTLE core) in a big.LITTLE SoC. Users can override the kernel affinity setting through the HetCompute affinity APIs (Affinity). The little attribute can be set and queried using the set_little and is_little methods of a kernel object.

4. aDSP: denotes that the DSP kernel should be executed on aDSP.

5. cDSP: denotes that the DSP kernel should be executed on cDSP.

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        auto k1 = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("big task executed"); });
        // inform the Hetcompute runtime that the kernel is best executed on a big
        // core in a big.LITTLE SoC
        k1.set_big();

        auto t1 = hetcompute::launch(k1);
        t1->wait_for();

        auto k2 = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("LITTLE task executed"); });
        // inform the Hetcompute runtime that the kernel is best executed on a
        // LITTLE core in a big.LITTLE SoC
        k2.set_little();

        auto t2 = hetcompute::launch(k2);
        t2->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

Note

While the HetCompute runtime system will try to enforce the big and little kernel attributes, dynamic system conditions such as offline cores may prevent it from doing so.

Changing a kernel’s attributes does not affect tasks created from this kernel prior to the change.

4.3.1.4 Kernels: Advanced Topics

Because object lifetime management is generally tricky in asynchronous and parallel programming, at task creation time, HetCompute copies a kernel into the resulting task. This has a few implications:

1. The task can exist independent of the kernel’s lifetime. This is particularly useful when a kernel is created inside a scope such as a function, but the tasks created from this kernel can live beyond the end of this scope and get launched later, even after the original kernel object has already been destroyed.

2. This also means programmers should be particularly careful with kernel-owned data. As shown in the example below, the copying of kernel objects during task creation may lead to non-obvious results.

    #include <hetcompute/hetcompute.hh>
    #include <stdio.h>

    // A functor that stores some internal state
    struct foo
    {
        int n;

        foo() : n(1) {}

        void operator()()
        {
            HETCOMPUTE_ILOG("n = %d\n", n);
            n++;
        }
    };

    int
    main()
    {
        hetcompute::runtime::init();

        // Create a cpu_kernel from the functor
        foo bar;
        auto k = hetcompute::create_cpu_kernel(bar);

        // Create two independent tasks from k
        auto t1 = hetcompute::create_task(k);
        auto t2 = hetcompute::create_task(k);

        // Set dependency and launch the tasks
        t1->then(t2);
        t1->launch();
        t2->launch();

        // Expected output: 1 and 1, not 1 and 2
        t2->wait_for();

        hetcompute::runtime::shutdown();

        return 0;
    }
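The same effect can be reproduced without the SDK. The following is a minimal standard-library sketch, not HetCompute code: counting_functor and make_task_model are hypothetical names, and std::function stands in for a task that owns a private copy of its kernel. The only behavior it assumes is the one stated above, namely that task creation copies the kernel.

```cpp
#include <cassert>
#include <functional>

// A functor with internal state, analogous to foo in the example above.
struct counting_functor
{
    int n = 1;

    // Returns the value it would have printed, then increments its own copy.
    int operator()()
    {
        int seen = n;
        ++n;
        return seen;
    }
};

// Hypothetical stand-in for task creation. The by-value parameter and the
// std::function both copy the functor, so each "task" owns separate state.
inline std::function<int()> make_task_model(counting_functor kernel)
{
    return std::function<int()>(kernel);
}
```

Creating two models from the same counting_functor k and invoking each once yields 1 from both, and k.n is still 1 afterwards, because neither invocation touches the original object or the other copy.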


Kernel objects can generally be moved and copied in construction and assignment, similar to regular objects. However, one exception is that cpu_kernel objects created from lambda functions cannot be copy-assigned, because lambda functions do not have copy-assignment operators.
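The language rule behind this exception can be checked with standard type traits. The sketch below does not use the HetCompute API: kernel_model is a hypothetical wrapper that merely stores a callable by value, which is an assumption about how a kernel object holds its function. It uses a capturing lambda in the usage, since such closure types have a deleted copy-assignment operator in every C++ standard.

```cpp
#include <type_traits>

// Hypothetical wrapper that stores a callable by value.
template <typename F>
struct kernel_model
{
    F fn;
};

// An ordinary functor type, which keeps its implicit copy assignment.
struct doubler
{
    int operator()(int x) const { return x * 2; }
};

// Whether a kernel_model holding a callable of type F can be copy-constructed.
template <typename F>
constexpr bool is_copy_constructible_kernel()
{
    return std::is_copy_constructible<kernel_model<F>>::value;
}

// Whether a kernel_model holding a callable of type F can be copy-assigned.
template <typename F>
constexpr bool is_copy_assignable_kernel()
{
    return std::is_copy_assignable<kernel_model<F>>::value;
}
```

For a lambda such as auto lam = [scale](int x) { return x * scale; };, is_copy_constructible_kernel<decltype(lam)>() is true while is_copy_assignable_kernel<decltype(lam)>() is false. For doubler both are true, which is why functor-based kernels do not share the restriction.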

Finally, it is possible to create kernel objects by directly instantiating them via their constructors, instead of using the factory methods (that is, hetcompute::make_cpu_kernel, hetcompute::make_gpu_kernel, and hetcompute::make_dsp_kernel). This may be useful for passing kernel pointers around and extending kernel classes. However, in those situations, one usually cannot use the auto keyword, and it is important to explicitly and correctly type the kernel objects.

4.3.1.5 Poly-kernels (Beta feature)

The HetCompute runtime automatically balances load across the various devices in a heterogeneous system: the CPU (big or LITTLE), GPU, and DSP. Developers can exploit this feature by constructing "poly-kernels" (kernels having multiple implementations of the same interface) and have the HetCompute runtime system dynamically pick the most suitable device on which to execute the task constructed from the poly-kernel.

     1 #include <hetcompute/hetcompute.hh>
     2
     3 // Macro which creates a string containing the OpenCL C kernel.
     4 #define OCL_KERNEL(name, k) std::string const name##_string = #k
     5
     6 OCL_KERNEL(vadd_kernel, __kernel void vadd(__global float* A, __global float* B, __global float* C, unsigned int size) {
     7     unsigned int i = get_global_id(0);
     8     if (i < size)
     9         C[i] = A[i] + B[i];
    10 });
    11
    12 int
    13 main(void)
    14 {
    15     hetcompute::runtime::init();
    16
    17     {
    18         // Create input buffers, automatically host accessible
    19         auto buf_a = hetcompute::create_buffer<float>(1024);
    20         auto buf_b = hetcompute::create_buffer<float>(buf_a.size());
    21
    22         buf_a.acquire_wi();
    23         buf_b.acquire_wi();
    24         // Initialize the input buffers
    25         for (size_t i = 0; i < buf_a.size(); ++i)
    26         {
    27             buf_a[i] = i;
    28             buf_b[i] = buf_a.size() - i;
    29         }
    30         buf_a.release();
    31         buf_b.release();
    32
    33         // Create an output buffer in relaxed mode: not automatically accessible by host
    34         auto buf_c = hetcompute::create_buffer<float>(buf_a.size());
    35
    36         // Name of the OpenCL C kernel.
    37         std::string kernel_name("vadd");
    38
    39         // Create a gpu kernel. Note the optional in/out directions that allow HETCOMPUTE
    40         // to perform copy optimizations. By default, the buffers are treated as
    41         // inout.
    42         auto gpu_vadd = hetcompute::create_gpu_kernel<hetcompute::in<hetcompute::buffer_ptr<float>>,
    43                                                       hetcompute::in<hetcompute::buffer_ptr<float>>,
    44                                                       hetcompute::out<hetcompute::buffer_ptr<float>>,
    45                                                       unsigned int>(vadd_kernel_string, kernel_name);
    46         unsigned int size = buf_a.size();
    47
    48         // Create a hetcompute::range object, 1D in this case.
    49         hetcompute::range<1> range_1d(buf_a.size());
    50
    51         // Create cpu kernel with the same interface as the gpu kernel
    52         auto cpu_vadd = [](hetcompute::range<1> r,
    53                            hetcompute::buffer_ptr<float> a,
    54                            hetcompute::buffer_ptr<float> b,
    55                            hetcompute::buffer_ptr<float> c,
    56                            unsigned int) {
    57             HETCOMPUTE_ILOG("running on the CPU");
    58             hetcompute::pfor_each(r, [&](hetcompute::index<1> i) { c[i[0]] = a[i[0]] + b[i[0]]; });
    59         };
    60
    61         // Create a task out of a poly-kernel.
    62         // The HetCompute runtime will automatically choose the appropriate gpu or cpu
    63         // variant to dispatch based on runtime load on the devices.
    64         auto poly_task = hetcompute::beta::create_task(std::make_tuple(gpu_vadd, cpu_vadd),
    65                                                        range_1d, // global range
    66                                                        buf_a,    // rest correspond to gpu_vadd template parameters
    67                                                        buf_b,
    68                                                        buf_c,
    69                                                        size);
    70         // Launch the task
    71         poly_task->launch();
    72
    73         // Wait for task completion.
    74         poly_task->wait_for();
    75
    76         buf_a.acquire_ro();
    77         buf_b.acquire_ro();
    78         buf_c.acquire_ro();
    79
    80         // Access the results on the host and verify their correctness.
    81         for (size_t i = 0; i < buf_a.size(); ++i)
    82         {
    83             HETCOMPUTE_INTERNAL_ASSERT(buf_a[i] + buf_b[i] == buf_c[i] && buf_c[i] == buf_a.size(),
    84                                        "comparison failed at ix %zu: %f + %f == %f == %zu",
    85                                        i,
    86                                        buf_a[i],
    87                                        buf_b[i],
    88                                        buf_c[i],
    89                                        buf_a.size());
    90         }
    91
    92         buf_a.release();
    93         buf_b.release();
    94         buf_c.release();
    95     }
    96
    97     hetcompute::runtime::shutdown();
    98 }

In the example above, the programmer is interested in adding two vectors. A GPU kernel that performs the vector addition is created on line 42. A CPU implementation with the same interface (using hetcompute::pfor_each) is defined on line 52. Both alternatives are exposed to the HetCompute runtime system by constructing a task out of a poly-kernel, expressed as a std::tuple, on line 64. The constructed poly_task is like any other task: the programmer can wait for the task, add the task to groups, set dependencies, etc. Once the task is launched (on line 71), the HetCompute runtime performs "late binding" of function to device, wherein either the CPU or the GPU may execute the task depending on their relative load.
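The idea of late binding over a tuple of interchangeable implementations can be modeled in a few lines of standard C++. Everything in this sketch is hypothetical: the names vadd_variant_a, vadd_variant_b, pick_device, and run_poly, and in particular the "lower load wins" rule, are illustrative assumptions. The source only states that the choice depends on relative load, and the actual HetCompute policy is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

using vec = std::vector<float>;

// Two implementations with the same interface (stand-ins for the GPU and CPU variants).
inline void vadd_variant_a(const vec& a, const vec& b, vec& c)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        c[i] = a[i] + b[i];
}

inline void vadd_variant_b(const vec& a, const vec& b, vec& c)
{
    // Walks the data backwards: same result, different execution order.
    for (std::size_t i = a.size(); i-- > 0;)
        c[i] = a[i] + b[i];
}

enum class device { gpu, cpu };

// Assumed policy for illustration only: pick the device with the lower load.
inline device pick_device(double gpu_load, double cpu_load)
{
    return gpu_load <= cpu_load ? device::gpu : device::cpu;
}

// Binds the call to one element of the tuple only at execution time.
template <typename Variants>
void run_poly(const Variants& variants, device d, const vec& a, const vec& b, vec& c)
{
    assert(a.size() == b.size() && b.size() == c.size());
    if (d == device::gpu)
        std::get<0>(variants)(a, b, c);
    else
        std::get<1>(variants)(a, b, c);
}
```

Because both variants share one interface, the caller builds the tuple once (for example with std::make_tuple(&vadd_variant_a, &vadd_variant_b)) and gets the same result whichever variant is selected.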


Note

The poly-kernel is a beta feature in this HetCompute release, and its API may change in the future.

4.3.1.6 hetcompute::range and hetcompute::index

hetcompute::range<N> provides an abstraction to specify the bounds of an N-dimensional space (N is currently at most 3). Similarly, hetcompute::index<N> represents a point in N-dimensional space. Together, these can be used to iterate over an N-dimensional space. They are primarily used for the creation of GPU tasks in HetCompute because they map very well to the OpenCL NDRange; however, they can also be used with CPU tasks if the iteration space has more than one dimension, or if there is a need for strided access.

4.3.1.7 Using hetcompute::range<N> to represent ND-Range (in OpenCL)

hetcompute::range<N> is useful to represent an ND-Range (in OpenCL). The table below shows different ways to create the hetcompute::range<N> object and how it maps to OpenCL constructs.

hetcompute::range                                    OpenCL NDRange
hetcompute::range<1>(w)                              cl::NDRange(w)
hetcompute::range<2>(w, h)                           cl::NDRange(w, h)
hetcompute::range<3>(w, h, d)                        cl::NDRange(w, h, d)
hetcompute::range<1>(off_x, w)                       cl::NDRange(off_x): used as the offset in the GPU kernel launch
                                                     cl::NDRange(w): used as the global size in the GPU kernel launch
hetcompute::range<2>(off_x, w, off_y, h)             cl::NDRange(off_x, off_y): used as the offset in the GPU kernel launch
                                                     cl::NDRange(w, h): used as the global size in the GPU kernel launch
hetcompute::range<3>(off_x, w, off_y, h, off_z, d)   cl::NDRange(off_x, off_y, off_z): used as the offset in the GPU kernel launch
                                                     cl::NDRange(w, h, d): used as the global size in the GPU kernel launch

4.3.1.8 hetcompute::range<1>

Represents a 1D range and is useful if your problem space is linear, as in vector addition. The example below shows a simple vector addition where each work item reads one element from each of the two input vectors and produces one element in the output vector.

    // Macro which creates a string containing the OpenCL C kernel.
    #define OCL_KERNEL(name, k) std::string const name##_string = #k

    OCL_KERNEL(vadd_kernel, __kernel void vadd(__global float* A, __global float* B, __global float* C, unsigned int size) {
        unsigned int i = get_global_id(0);
        if (i < size)
            C[i] = A[i] + B[i];
    });

    // Name of the OpenCL C kernel.
    std::string kernel_name("vadd");

    // Create a gpu kernel object. Note the optional in/out directions that allow HetCompute to perform
    // copy optimizations. By default, the buffers are treated as inout.
    auto gpu_vadd = hetcompute::create_gpu_kernel<hetcompute::in<hetcompute::buffer_ptr<float>>,
                                                  hetcompute::in<hetcompute::buffer_ptr<float>>,
                                                  hetcompute::out<hetcompute::buffer_ptr<float>>,
                                                  unsigned int>(vadd_kernel_string, kernel_name);

    hetcompute::range<1> range_1d(buf_a.size());

    // Create a task
    auto gpu_task = hetcompute::create_task(gpu_vadd, // gpu kernel
                                            range_1d, // global range
                                            buf_a,    // rest correspond to gpu_vadd template parameters
                                            buf_b,
                                            buf_c,
                                            size);

vadd_kernel_string represents a string containing the OpenCL C code for vector addition.

4.3.1.9 hetcompute::range<2>

Represents a 2D range and is useful if your problem space is two-dimensional, similar to matrix multiplication or many image processing applications. The example below shows a simple matrix multiplication, where each work item computes one element in the output matrix.

    // Macro which creates a string containing the OpenCL C kernel.
    #define OCL_KERNEL(name, k) std::string const name##_string = #k

    OCL_KERNEL(matrix_multiply_kernel,
               __kernel void matrix_multiply(__global float* a, __global float* b, __global float* c, int M, int P, int N) {
                   int i = get_global_id(1);
                   int j = get_global_id(0);
                   if (i >= M || j >= N)
                       return;

                   c[i * N + j] = 0;
                   for (int k = 0; k < P; k++)
                   {
                       c[i * N + j] += a[i * P + k] * b[k * N + j];
                   }
               });

    // Name of the OpenCL C kernel.
    std::string kernel_name("matrix_multiply");

    // Create a gpu kernel object. Note the optional in/out directions that allow HetCompute to perform
    // copy optimizations. By default, the buffers are treated as inout.
    auto gpu_mm = hetcompute::create_gpu_kernel<hetcompute::in<hetcompute::buffer_ptr<float>>,
                                                hetcompute::in<hetcompute::buffer_ptr<float>>,
                                                hetcompute::out<hetcompute::buffer_ptr<float>>,
                                                unsigned int,
                                                unsigned int,
                                                unsigned int>(matrix_multiply_kernel_string, kernel_name);

    hetcompute::range<2> range_2d(n, m);

    // Create a task
    auto gpu_task = hetcompute::create_task(gpu_mm,   // gpu kernel
                                            range_2d, // global range
                                            buf_a,    // rest correspond to gpu_mm template parameters
                                            buf_b,
                                            buf_c,
                                            m,
                                            p,
                                            n);

matrix_multiply_kernel_string represents a string containing the OpenCL C code for matrix multiplication.
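To see how the 2D range maps onto the kernel's indices, it can help to write the same computation as nested CPU loops. The sketch below is plain C++ and not part of the SDK; matrix_multiply_reference is a hypothetical name. The outer loop plays the role of get_global_id(1) (rows, extent m) and the inner loop that of get_global_id(0) (columns, extent n), matching range_2d(n, m) in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference: c (M x N) = a (M x P) * b (P x N), all stored row-major.
inline std::vector<float> matrix_multiply_reference(const std::vector<float>& a,
                                                    const std::vector<float>& b,
                                                    std::size_t M,
                                                    std::size_t P,
                                                    std::size_t N)
{
    assert(a.size() == M * P && b.size() == P * N);
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) // row index: get_global_id(1) in the kernel
    {
        for (std::size_t j = 0; j < N; ++j) // column index: get_global_id(0) in the kernel
        {
            float sum = 0.0f;
            for (std::size_t k = 0; k < P; ++k)
                sum += a[i * P + k] * b[k * N + j];
            c[i * N + j] = sum; // one output element per (i, j) work item
        }
    }
    return c;
}
```

Each (i, j) iteration corresponds to one GPU work item, so a reference like this can also be used on the host to check the contents of the output buffer after the GPU task completes.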


4.3.1.10 hetcompute::range<3>

Represents a 3D range and is useful if your problem space is three-dimensional.

4.3.1.11 Strided Ranges

Consider the example of a denoise_kernel described in more detail in Parallelization using patterns.

    void denoise_image()
    {
        // initialization, etc.
        hetcompute::range<2> r(0, w, TILE_SIZE, 0, h, TILE_SIZE);
        // r is captured by reference so that the lambda can call r.linear_to_index()
        hetcompute::pfor_each(r, [input, &output, &r](size_t index) {
            hetcompute::index<2> idx = r.linear_to_index(index);
            denoise_kernel(input, idx[0], TILE_SIZE, idx[1], TILE_SIZE, output);
        });
    }

hetcompute::range<2> above defines a two-dimensional range, [0, w) x [0, h), with a stride of TILE_SIZE in each dimension. Each dimension can have a different stride. We use this range as our iteration space for the parallel loop.

hetcompute::index<2> defines a two-dimensional index. HetCompute ranges know how to iterate to the appropriate points. hetcompute::pfor_each provides a linear index to the lambda (size_t index) in the code above; the linear_to_index call returns a hetcompute::index<2> object that has the appropriate coordinates in each dimension of the range. We can directly access the dimensions using the [] operator on the object.
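One way to picture the linear-to-index conversion is the following standalone sketch. It is not the HetCompute implementation: strided_range_2d and index_2d are hypothetical types, and the sketch assumes that dimension 0 varies fastest as the linear index grows, which the source does not specify.

```cpp
#include <cassert>
#include <cstddef>

struct index_2d
{
    std::size_t x;
    std::size_t y;
};

// Hypothetical model of a strided 2D range: [begin, end) with a stride per dimension.
struct strided_range_2d
{
    std::size_t begin_x, end_x, stride_x;
    std::size_t begin_y, end_y, stride_y;

    // Number of points along one dimension: ceil((end - begin) / stride).
    static std::size_t points(std::size_t begin, std::size_t end, std::size_t stride)
    {
        assert(stride > 0 && end >= begin);
        return (end - begin + stride - 1) / stride;
    }

    // Total number of points in the range.
    std::size_t size() const
    {
        return points(begin_x, end_x, stride_x) * points(begin_y, end_y, stride_y);
    }

    // Assumption: dimension 0 (x) varies fastest.
    index_2d linear_to_index(std::size_t linear) const
    {
        assert(linear < size());
        std::size_t nx = points(begin_x, end_x, stride_x);
        return {begin_x + (linear % nx) * stride_x, begin_y + (linear / nx) * stride_y};
    }
};
```

For a range {0, 8, 4, 0, 6, 2} there are 2 points along x (0 and 4) and 3 along y (0, 2, and 4), so size() is 6; under the stated ordering assumption, linear index 1 maps to (4, 0) and linear index 5 maps to (4, 4).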

Note

Tasks which run on the GPU should have a stride of 1.

4.3.2 Creating Tasks

HetCompute offers multiple templated methods to create tasks in order to cover several use case scenarios:

1. Create and launch the HetCompute task in a group:

• template<typename Code, typename...Args> void hetcompute::group::launch(Code&& code, Args&& ...args)

• template<typename Code, typename...Args> void hetcompute::group::launch(do_not_collapse_t, Code&& code, Args&& ...args)

Launching tasks in a group is the fastest method to create and execute tasks in HetCompute. Launching into a group simplifies the lifetime maintenance of the task pointers, because the runtime takes care of it. If the task logically belongs with other tasks, consider creating a group and launching the tasks into it.

2. Create a HetCompute task and launch:

• template<typename Code, typename...Args> collapsed_task_type<Code> hetcompute::launch(Code&& code, Args&& ...args)

• template<typename Code, typename...Args> non_collapsed_task_type<Code> hetcompute::launch(do_not_collapse_t, Code&& code, Args&& ...args)

If the tasks are not part of a group, creating and launching a task in one step allows the programmer to directly pass the task to the runtime. The runtime knows that the task does not have any predecessors and is ready for execution.

3. Create a HetCompute task without launching:

• template<typename ReturnType, typename... Args> hetcompute::task_ptr<ReturnType> hetcompute::create_value_task(Args&& ...args)

• template<typename Code, typename... Args> collapsed_task_type<Code> hetcompute::create_task(Code&&, Args&&...)

• template<typename Code, typename... Args> non_collapsed_task_type<Code> hetcompute::create_task(do_not_collapse_t, Code&&, Args&&...)

Creating a task using the create_task methods returns a pointer to the task. The task has not been launched at this point, and the programmer has the opportunity to set up dependencies (that is, make this task part of a task graph). The runtime maintains the validity of the pointer while the task is executing. However, because the programmer has a pointer to it, the lifetime of that pointer also affects how long the task is maintained in the system. It is recommended that the programmer reset the pointer when finished with the task.

There are three essential parts for creating a HetCompute task (non-value task):

1. Type of the task, i.e., collapsed or non-collapsed. By default, collapsed tasks are created. Otherwise, pass hetcompute::do_not_collapse to force the creation of non-collapsed tasks. For more information about collapsed and non-collapsed tasks, see Task-Pointer Collapsing.

2. Computation of the task, that is, Code&& code, in the templated methods. HetCompute supports the following computation constructs to encapsulate in tasks:

• C++ basic function-like blocks, that is, lambda expressions (preferred), function objects, or function pointers.

• HetCompute Kernels (see Kernels: The Path to Heterogeneity)

• HetCompute Patterns (see Parallel Programming Patterns)

3. Arguments bound to the task, that is, Args&&... args, in the templated methods (optional). If the computation construct of a task takes some arguments, the arguments can be bound to the task when creating the task. Argument binding can also happen after the creation of the task (hetcompute::task<ReturnType(Args...)>::bind_all), but has to happen before launching.

The methods in the third group (create without launching) create a task and return a pointer to it (hetcompute::task_ptr<...>). The task_ptr is typed according to the type of the computation construct encapsulated in the task. For example, the following code creates three HetCompute tasks.

    auto t1 = hetcompute::create_task([&] { i++; });
    auto t2 = hetcompute::create_task([&] { return 42; });
    auto t3 = hetcompute::create_task([&](bool b, float f) { return b ? int(f) : 0; });

The type of t1 is hetcompute::task_ptr<>.

The type of t2 is hetcompute::task_ptr<int>.


The type of t3 is hetcompute::task_ptr<int(bool, float)>.

The type of a task_ptr determines which operations are possible on the task through that pointer:

1. hetcompute::task_ptr<> can only set up control dependencies, e.g., t1->then(t2).

2. hetcompute::task_ptr<ReturnType> can set up control dependencies and be the predecessor of data dependencies, e.g., t2->then(t1), t3->bind_all(t2). However, this type of task_ptr cannot bind any data because it carries no argument list information.

3. hetcompute::task_ptr<ReturnType(Args...)> can set up control and data dependencies (as predecessor or successor), and bind data to its computation construct, e.g., t3->then(t1), t3->bind_all(t2).

For more information about task data dependency, see Task Dependencies.

Note: hetcompute::task_ptr<ReturnType> is-a hetcompute::task_ptr<>.

Note: hetcompute::task_ptr<ReturnType(Args...)> is-a hetcompute::task_ptr<ReturnType>.

A HetCompute value task is a launched task that has no computation construct, only a return value. It is useful for task algebra; see Algebraic Operations on Tasks. For more information about hetcompute::task_ptr<...> and HetCompute value tasks, see Tasks and Unleashing Asynchrony.

The methods in the second group create a task, return a pointer to it, and launch the task in the same method. Each is a combination of hetcompute::create_task(...) and hetcompute::task::launch(...). If the task computation construct takes any arguments, they must be bound when using this group of methods.

The methods in the third group create a task and launch it into a group. Using hetcompute::group::launch(...) is the fastest way to create tasks in HetCompute and should be used as often as possible. For more information about groups, see Task Groups.

Note: The first group of templated methods makes it possible to decouple task creation and task launching (Launching Tasks) so that task dependencies can be set between these two steps.

Note: The second group of templated methods is recommended when there are no predecessors to the created task. It is more performant than decoupled creation and launching.

Note: The third group of templated methods should be used as often as possible when the created task needs no dependencies. It is the most performant way to create and launch a task.

4.3.2.1 Create Tasks Using Lambda Expressions

Lambda expressions are a new feature in C++11, and the preferred argument type to create hetcompute tasks. Lambda expressions are unnamed function objects that are able to capture variables from the enclosing scopes. A description of this C++11 feature is outside the scope of this document. Find detailed information about lambda expressions in the following links:

• C++11 Tutorial: Lambda Expressions -- The Nuts and Bolts of Functional Programming

• Lambda functions

• Lambda Functions in C++11 - the Definitive Guide

• Michael Caisse: Lambda Functions

The following code uses a lambda expression to create a task t1 that prints ’Hello World!’:

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        // Create a task that prints Hello World!
        auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World!\n"); });

        // Launch the task.
        t1->launch();
        // Wait for the task to finish.
        t1->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

Alternatively, you could create the task using hetcompute::launch(...) or hetcompute::group::launch(...).

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        // Create and launch a task that prints Hello World!
        auto t = hetcompute::launch([] { HETCOMPUTE_ILOG("Hello World!\n"); });
        // Wait for t to finish.
        t->wait_for();

        // Create a group.
        auto g = hetcompute::create_group();
        // Launch a task in g.
        g->launch([] { HETCOMPUTE_ILOG("Hello World!\n"); });
        // Wait for g to finish.
        g->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

The lambda expression in the previous example is very simple, as it does not capture any variables. Let's suppose that you want to capture a string with the user name to do a proper greeting:

    #include <hetcompute/hetcompute.hh>
    #include <string>

    int
    main()
    {
        hetcompute::runtime::init();
        auto g = hetcompute::create_group();
        std::string name = "HETCOMPUTE";

        // Launching a task in the group.
        g->launch([name] { HETCOMPUTE_ILOG("Hello World, %s!\n", name.c_str()); });

        // Wait for g to finish.
        g->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

By capturing name by value in the lambda expression, the task gets its own copy of the string, so it can use it when the task executes, which happens outside the scope where the task is created. If you capture variables by reference instead, make sure that the original object still exists when the task executes. For example, consider the following code:

     1 #include <hetcompute/hetcompute.hh>
     2
     3 int
     4 main()
     5 {
     6     hetcompute::runtime::init();
     7     auto g = hetcompute::create_group();
     8
     9     {
    10         std::string name = "HETCOMPUTE";
    11
    12         // Launching a task in the group; "name" is captured by reference.
    13         g->launch([&name] { HETCOMPUTE_ILOG("Hello World, %s!\n", name.c_str()); });
    14     } // "name" goes out of scope here.
    15
    16     // Wait for g to finish.
    17     g->wait_for();
    18     hetcompute::runtime::shutdown();
    19
    20     return 0;
    21 }

The string name goes out of scope in line 14, and its destructor is then called. If the scheduler executes the task after that happens, the task accesses a destroyed object and the program will most likely crash.
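The lifetime rule above is a property of C++ lambdas in general, so it can be checked without the SDK. The following is a minimal sketch that uses only the standard library: std::function stands in for a deferred task body, and make_greeting and greet_after_scope_exit are hypothetical helpers, not HetCompute APIs. Because name is captured by value, the closure owns its own copy and stays valid after the enclosing scope ends.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper (not a HetCompute API): returns a callable that is
// invoked only after the local variable "name" has been destroyed.
std::function<std::string()> make_greeting()
{
    std::string name = "HETCOMPUTE";

    // Capture by value: the closure stores its own copy of "name".
    // Capturing by reference ([&name]) here would leave a dangling reference.
    return [name] { return "Hello World, " + name + "!"; };
}

// Runs the deferred callable after the scope that created it has ended.
std::string greet_after_scope_exit()
{
    std::function<std::string()> task = make_greeting(); // "name" is gone now
    std::string text = task();                            // safe: uses the copy
    assert(!text.empty());
    return text;
}
```

Calling greet_after_scope_exit() produces the greeting even though the original string no longer exists, which is the behavior a by-value capture is meant to guarantee.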

Never capture a hetcompute::task_ptr<...> by reference. Refer to Task Pointers for more information.

Warning

Default capture by copy ([=]) or by reference ([&]) implicitly captures every enclosing-scope variable that the lambda body uses. This makes it easy to capture more than intended, for example a large object by copy, which may increase the size of your tasks considerably. It is recommended that you explicitly capture only the variables that your lambda expression needs.
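The effect of captures on size can be observed with sizeof on the closure object. The sketch below uses only standard C++; the function names and the 4096-byte array are illustrative choices, not part of HetCompute. It compares a closure that explicitly copies one int with a closure whose default capture also copies a large array because the body happens to use it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Size of a closure that explicitly copies only a small variable.
std::size_t small_closure_size()
{
    std::array<char, 4096> big{};
    int small = 7;
    (void)big; // in scope, but not captured
    auto task = [small] { return small; };
    return sizeof(task);
}

// Size of a closure whose default capture also copies the large array,
// because the lambda body uses it.
std::size_t large_closure_size()
{
    std::array<char, 4096> big{};
    int small = 7;
    auto task = [=] { return small + big[0]; };
    assert(sizeof(task) >= sizeof(big));
    return sizeof(task);
}
```

Listing the captures explicitly makes such a copy visible at the point where the task is created.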

4.3.2.2 Create Tasks Using Classes

You can use any custom class as <typename Code> by overloading the class's operator(). The following code shows how to create a task from a class instance. When the HetCompute scheduler executes the task, the operator() method is called.

    #include <hetcompute/hetcompute.hh>

    class user_class
    {
    public:
        explicit user_class(int value) : x(value) {}

        void operator()(int y) { HETCOMPUTE_ILOG("x = %d, y = %d\n", x, y); }

        void set_x(int value) { x = value; }

    private:
        int x;
    };

    int
    main()
    {
        hetcompute::runtime::init();
        // Create a hetcompute group.
        auto g = hetcompute::create_group();

        // Create and launch a task into group g.
        g->launch(user_class(42), 27);

        // Wait for the group to finish.
        g->wait_for();
        hetcompute::runtime::shutdown();
        return 0;
    }

It is also possible to create an object from user_class and then create a task using that object:

     1 #include <hetcompute/hetcompute.hh>
     2
     3 class user_class
     4 {
     5 public:
     6     explicit user_class(int value) : x(value) {}
     7
     8     void operator()(int y) { HETCOMPUTE_ILOG("x = %d, y = %d\n", x, y); }
     9
    10     void set_x(int value) { x = value; }
    11
    12 private:
    13     int x;
    14 };
    15
    16 int
    17 main()
    18 {
    19     hetcompute::runtime::init();
    20     // Create a hetcompute group.
    21     auto g = hetcompute::create_group();
    22
    23     // Instantiate an object of user_class.
    24     user_class obj(42);
    25
    26     // Create a hetcompute task.
    27     auto t = hetcompute::create_task(obj);
    28
    29     // Launch the task into group g.
    30     g->launch(t, 27);
    31
    32     // Wait for the group to finish.
    33     g->wait_for();
    34     hetcompute::runtime::shutdown();
    35     return 0;
    36 }

The previous example raises an interesting question: What would the task print if obj.set_x(100) was called between lines 27 and 30?

     1 #include <hetcompute/hetcompute.hh>
     2
     3 class user_class
     4 {
     5 public:
     6     explicit user_class(int value) : x(value) {}
     7
     8     void operator()(int y) { HETCOMPUTE_ILOG("x = %d, y = %d\n", x, y); }
     9
    10     void set_x(int value) { x = value; }
    11
    12 private:
    13     int x;
    14 };
    15
    16 int
    17 main()
    18 {
    19     hetcompute::runtime::init();
    20     // Create a hetcompute group.
    21     auto g = hetcompute::create_group();
    22
    23     // Instantiate an object of user_class.
    24     user_class obj(42);
    25
    26     // Create a hetcompute task.
    27     auto t = hetcompute::create_task(obj);
    28
    29     obj.set_x(100);
    30     // Launch the task into group g.
    31     g->launch(t, 27);
    32
    33     // Wait for the group to finish.
    34     g->wait_for();
    35     hetcompute::runtime::shutdown();
    36     return 0;
    37 }

As always, the answer to the ultimate question of life, the universe, and everything (including HetCompute task execution) is 42. The reason is that HetCompute makes a copy of obj when it creates the task in line 27. Otherwise, users would need to keep track of the lifetime of the objects used to create tasks. However, if you were to construct the object in-place, no copies would be made:

    #include <hetcompute/hetcompute.hh>

    class user_class
    {
    public:
        explicit user_class(int value) : x(value) {}

        void operator()(int y) { HETCOMPUTE_ILOG("x = %d, y = %d\n", x, y); }

        void set_x(int value) { x = value; }

    private:
        int x;
    };

    int
    main()
    {
        hetcompute::runtime::init();
        // Create a hetcompute group.
        auto g = hetcompute::create_group();

        // Create a hetcompute task.
        auto t = hetcompute::create_task(user_class(42));

        // Launch the task into group g.
        g->launch(t, 27);

        // Wait for the group to finish.
        g->wait_for();
        hetcompute::runtime::shutdown();
        return 0;
    }
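The copy semantics described above for function objects can be reproduced with standard C++ alone. In the following sketch, std::function plays the role of the task that stores a copy of the callable; value_holder and value_seen_by_stored_copy are hypothetical names introduced for illustration and are not HetCompute APIs. The class mirrors user_class but returns x so that the result can be inspected.

```cpp
#include <cassert>
#include <functional>

// Same shape as user_class in the examples above, but operator() returns x.
class value_holder
{
public:
    explicit value_holder(int value) : x(value) {}
    int operator()() const { return x; }
    void set_x(int value) { x = value; }

private:
    int x;
};

// Stores a copy of obj, modifies the original, then invokes the copy.
int value_seen_by_stored_copy()
{
    value_holder obj(42);
    std::function<int()> task = obj; // copies obj into the wrapper
    obj.set_x(100);                  // changes only the original object
    int seen = task();               // the stored copy still holds 42
    assert(seen != 100);
    return seen;
}
```

As with the task in the example, the stored callable is unaffected by later changes to obj, so the caller does not need to keep obj alive or unchanged.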

4.3.2.3 Create Tasks Using Function Pointers

The last way to create a task is by using a function pointer.

    #include <hetcompute/hetcompute.hh>

    void foo();
    void
    foo()
    {
        HETCOMPUTE_ILOG("Hello World!\n");
    }

    int
    main()
    {
        hetcompute::runtime::init();
        // Create a task that executes foo().
        auto t = hetcompute::create_task(foo);

        // Launch and wait for the task.
        t->launch();
        t->wait_for();
        hetcompute::runtime::shutdown();
        return 0;
    }

Warning

Due to limitations in the Visual Studio C++ compiler, this does not work on Visual Studio. You can get around it by using a lambda function:

    #include <hetcompute/hetcompute.hh>

    void foo();
    void
    foo()
    {
        HETCOMPUTE_ILOG("Hello World!\n");
    }

    int
    main()
    {
        hetcompute::runtime::init();
        // Create a task that executes foo().
        auto t = hetcompute::create_task([] { foo(); });

        // Launch and wait for the task.
        t->launch();
        t->wait_for();
        hetcompute::runtime::shutdown();
        return 0;
    }

4.3.3 Task Pointers

Managing the lifetime of an object in a parallel application can be challenging. This is why HetCompute provides shared_ptr-type access to tasks (and several other HetCompute constructs) to automatically manage task lifetime. HetCompute automatically destroys tasks once they finish and all references are no longer in scope.

The various methods to create tasks, e.g., hetcompute::create_task(Body&&), return an appropriately typed custom smart pointer to the task object. The task_ptr may be of the following types depending on the type of computation encapsulated by the task:

• hetcompute::task_ptr<>: points to a task that neither accepts any arguments nor returns a value

• hetcompute::task_ptr<void>: points to a task that returns void

• hetcompute::task_ptr<ReturnType>: points to a task that returns a value

• hetcompute::task_ptr<ReturnType(Args...)>: points to a task that accepts arguments and returns a value

The above types constitute a type hierarchy, such that a hetcompute::task_ptr<> can point to a task that returns a value or accepts arguments. Each of the above task_ptr types permits different operations on tasks, with hetcompute::task_ptr<> providing most of the common operations on tasks and the other two types providing more advanced operations:

• A task referenced through hetcompute::task_ptr<> can be launched, canceled, waited for, or finished after, and can serve as the source of a control dependency, etc.

• In addition to the above, hetcompute::task_ptr<ReturnType> can serve as the source of a data dependency.

• In addition to the above, hetcompute::task_ptr<ReturnType(Args...)> can have its arguments bound through hetcompute::task<ReturnType(Args...)>::bind_all.

Tasks are reference-counted, so they are automatically destroyed when no more hetcompute::task_ptr<>s reference them. When a task is launched, the HetCompute runtime increases the reference count of the task. This prevents the task from being destroyed, even if all pointers referencing the task are reset. The HetCompute runtime decrements the reference count of the task after it finishes (completes execution, throws an exception, or is canceled). The task reference count requires atomic operations. Copying a hetcompute::task_ptr<> causes an atomic increment, and the new copy of the hetcompute::task_ptr<> causes an atomic decrement when it goes out of scope. For best results, minimize the number of times your application copies hetcompute::task_ptr<>s.
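Since the text describes task_ptr as providing shared_ptr-type access, the counting behavior can be illustrated with std::shared_ptr. The sketch below is an analogy under that assumption, not the HetCompute implementation; reference_counts is a hypothetical helper and the int payload merely stands in for a task object.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Records use_count() with one owner, with two owners, and with one owner again.
std::vector<long> reference_counts()
{
    std::vector<long> counts;
    auto task = std::make_shared<int>(42);  // stand-in for a task object
    counts.push_back(task.use_count());     // one owner

    {
        std::shared_ptr<int> copy = task;   // copying increments the count
        counts.push_back(task.use_count()); // two owners
    }                                       // copy destroyed: count decremented

    counts.push_back(task.use_count());     // one owner again

    int* raw = task.get();                  // raw pointer: no counting at all
    assert(raw != nullptr && *raw == 42);
    return counts;
}
```

Every copy costs an increment and a later decrement, which is why passing a reference-counted pointer around frequently is more expensive than passing a raw pointer whose validity is guaranteed by some other owner.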

Some algorithms require constantly passing hetcompute::task_ptr<>s. To maintain high performance, HetCompute provides another task pointer type that does not perform reference counting: hetcompute::task<>* (and likewise hetcompute::task<void>*, hetcompute::task<ReturnType>*, and hetcompute::task<ReturnType(Args...)>*). The examples at the end of this section demonstrate how to point a hetcompute::task<>* to a task.

Note

Task lifetime is determined by the number of hetcompute::task_ptr<>s referencing it. Programmers must ensure that there is always a valid hetcompute::task_ptr<> while using a hetcompute::task<>*; otherwise memory corruption and/or segmentation faults may result. Any operation permitted on a hetcompute::task_ptr<> is also permitted on a hetcompute::task<>*.

It is incorrect to capture a hetcompute::task_ptr<> by reference, as in the following example:

    hetcompute::task_ptr<> t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!"); });

    hetcompute::task_ptr<> t2 = hetcompute::create_task([&t1] { HETCOMPUTE_ILOG("Hello World from t2!"); });

Instead, copy the pointer:

    hetcompute::task_ptr<> t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!"); });

    hetcompute::task_ptr<> t2 = hetcompute::create_task([t1] { HETCOMPUTE_ILOG("Hello World from t2!"); });

Or use a hetcompute::task<>* (of course, make sure that t1 does not go out of scope):

    hetcompute::task_ptr<> t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!"); });

    hetcompute::task<>* unsafe_t1 = t1.get();

    hetcompute::task_ptr<> t2 = hetcompute::create_task([unsafe_t1] { HETCOMPUTE_ILOG("Hello World from t2!"); });

    hetcompute::task_ptr<> t3 = hetcompute::create_task([&unsafe_t1] { HETCOMPUTE_ILOG("Hello World from t3!"); });

4.3.4 Life of a HetCompute Task

A HetCompute task transitions through various states, from the time it is created to the time it finishes (that is, completes execution successfully or is canceled; the combined state is called Finished). The figure below presents a high-level view of a task's lifetime. The nodes represent task states and the numbered edges represent transitions from one state to the next. Most of the operations on tasks (through hetcompute::task_ptr<> or hetcompute::task<>*) result in a task transitioning from one state to another.

Figure 4-5 Lifetime of a HetCompute Task

The state of a task determines the operations permitted on it. Understanding task states and transitions is useful to debug both correctness and performance issues with your HetCompute program. For example, as shown below, the program may hang because it waits for a task that is never launched.

    auto t = hetcompute::create_task([] {});
    // Wait for the task without launching it.
    t->wait_for(); // never returns

Stepping through the operations on a task with the above task state diagram in mind may help to identify and fix errors such as the one in the code snippet.

Note

Neither the task states nor the transitions between them are part of the HetCompute API. They are described herein merely for the sake of understanding.

Most HetCompute tasks take the green line to successful completion, while some take the red line for a variety of reasons, e.g., throwing an exception. Some tasks may be created directly in the Launched state (via hetcompute::launch(Code&&, Args&&...)), while others may be created and launched directly in the Ready state (via hetcompute::launch(Code&&, Args&&...) with all arguments ready).

4.3.4.1 The Green Line to Successful Completion

A task created using hetcompute::create_task(Code&&), hetcompute::launch(Code&&), etc. has at least one hetcompute::task_ptr<> alive in user code. This hetcompute::task_ptr<> may be used to perform operations on the task. The state transitions along the green line are described below:

• (1) After setting up any control dependencies (via hetcompute::task<>::then()) or data dependencies (via hetcompute::task<ReturnType(Args...)>::bind_all()) from other tasks, use hetcompute::task<>::launch() to register the task with the HetCompute runtime system. No further dependencies may be added to a Launched task.

• (2) After all tasks on which a task is control- or data-dependent have transitioned to Completed, the task becomes Ready for execution.

• (3) When an appropriate execution resource (CPU, GPU, or DSP) becomes available and any other resources such as hetcompute::buffers that may be used by the task become available, the task transitions to the Running state.

• (4) Finally, after the task completes execution successfully, and any other tasks after which it is set to finish (via hetcompute::task<>::finish_after()) have also finished, the task transitions to the Completed state.

4.3.4.2 The Red Line to Cancellation

Some tasks may not execute successfully (e.g., they throw an exception) or may be canceled programmatically; such tasks take the red line and end up in the Canceled state. See Exceptions and Cancellation for more details. The state transitions along the red line are described below:

• (5) If a task or any task on which it is control- or data-dependent is canceled via hetcompute::task<>::cancel(), the task transitions to the Canceled state.

• (6) If a task or any task on which it is control- or data-dependent is canceled via hetcompute::task<>::cancel(), the task transitions to the Canceled state.

• (7) If a task or any task on which it is control- or data-dependent is canceled via hetcompute::task<>::cancel(), the task transitions to the Canceled state.

• (8) If a task or any task on which it is control- or data-dependent is canceled via hetcompute::task<>::cancel(), the task transitions to the Canceled state. Additionally, an executing task may throw an exception, as a result of which the task and its successor tasks are canceled. An executing task may choose to ignore a cancellation request and may complete successfully. Alternatively, an executing task may accept a cancellation request and choose to abort (e.g., via hetcompute::abort_on_cancel()) and transition to the Canceled state.

• (9) Finally, some created tasks may never be launched. When the last hetcompute::task_ptr<> pointing to such a task goes out of scope, the task is automatically canceled. Any successor tasks are also canceled.
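The transitions listed in the two subsections above can be summarized as a small state table. The sketch below is illustrative only: the task states are explicitly not part of the HetCompute API, the enum and function names are hypothetical, and because the figure is not reproduced here, the assumption that edges (5) to (9) lead to Canceled from every unfinished state is an inference from the text rather than a documented fact.

```cpp
#include <cassert>

// State names follow the text; the enum itself is illustrative only.
enum class task_state { created, launched, ready, running, completed, canceled };

// Returns true if the text describes a transition from "from" to "to".
bool can_transition(task_state from, task_state to)
{
    // Finished states (Completed, Canceled) are terminal.
    if (from == task_state::completed || from == task_state::canceled)
        return false;

    // Red line, edges (5) to (9): assumed reachable from every unfinished state.
    if (to == task_state::canceled)
        return true;

    // Green line, edges (1) to (4).
    switch (from)
    {
    case task_state::created:  return to == task_state::launched;
    case task_state::launched: return to == task_state::ready;
    case task_state::ready:    return to == task_state::running;
    case task_state::running:  return to == task_state::completed;
    default:                   return false;
    }
}

// Walks the green line from Created to Completed, checking every step.
task_state follow_green_line()
{
    const task_state path[] = {task_state::created, task_state::launched,
                               task_state::ready, task_state::running,
                               task_state::completed};
    for (int i = 0; i + 1 < 5; ++i)
        assert(can_transition(path[i], path[i + 1]));
    return path[4];
}
```

A table like this can be handy when stepping through a hang: a task that is waited for while still in the Created state has no edge that leads it to a Finished state unless it is launched or its last pointer is released.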

4.3.5 Launching Tasks

Tasks do not execute unless they are launched. Creating Tasks describes how to create and launch tasks using hetcompute::launch(...) and hetcompute::group::launch(...). Tasks launched in this way execute as soon as hardware contexts are available.

Note: hetcompute::group::launch(Code&&, Args&&...) does not return a hetcompute::task_ptr<...>; the task is therefore anonymous and cannot be part of a DAG.

Tasks that are part of a DAG must be created using:

1. template<typename ReturnType, typename... Args> hetcompute::task_ptr<ReturnType> hetcompute::create_value_task(Args&&... args)

2. template<typename Code, typename... Args> collapsed_task_type<Code> hetcompute::create_task(Code&&, Args&&...)

3. template<typename Code, typename... Args> non_collapsed_task_type<Code> hetcompute::create_task(do_not_collapse_t, Code&&, Args&&...)

These template methods return a task pointer that can be used to set up dependencies between tasks. Once the control dependencies of a task t are set, use:

1. t->launch(args...) to launch the task and bind its arguments (including any data dependencies) in one step;

2. t->launch() to launch the task when its arguments have already been bound by t->bind_all(args...). For more information about task argument binding, see Unleashing Asynchrony and Tasks.

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create a task.
        auto t1 = hetcompute::create_task([](int x) { HETCOMPUTE_ILOG("Hello World! x = %d\n", x); });

        // t1 is ready, launch it and bind the argument.
        t1->launch(42);

        // Wait for t1 to finish.
        t1->wait_for();

        // Create a task.
        auto t2 = hetcompute::create_task([](int x) { HETCOMPUTE_ILOG("Hello World! x = %d\n", x); });

        // Bind the task argument to t2.
        t2->bind_all(73);

        // t2 is ready, launch it.
        t2->launch();

        // Wait for t2 to finish.
        t2->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }
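The two launch styles above differ only in when the argument is bound. The following toy model, written in standard C++, makes that ordering explicit. toy_int_task and run_both_styles are hypothetical names and not HetCompute types; how the real runtime reacts to a launch with unbound arguments is not described here, so the toy simply refuses to run and returns false as an assumption.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a task that takes one int argument; not a HetCompute type.
class toy_int_task
{
public:
    explicit toy_int_task(std::function<void(int)> body) : body_(std::move(body)) {}

    // Mirrors bind_all(arg): store the argument for a later launch().
    void bind_all(int arg)
    {
        arg_ = arg;
        bound_ = true;
    }

    // Mirrors launch(): runs only if the argument has been bound before.
    bool launch()
    {
        if (!bound_)
            return false; // assumption: an unbound task is not executed
        body_(arg_);
        return true;
    }

    // Mirrors launch(arg): bind and launch in one step.
    bool launch(int arg)
    {
        bind_all(arg);
        return launch();
    }

private:
    std::function<void(int)> body_;
    int arg_ = 0;
    bool bound_ = false;
};

// Runs one task per launch style and records the arguments the bodies saw.
std::vector<int> run_both_styles()
{
    std::vector<int> seen;
    toy_int_task t1([&seen](int x) { seen.push_back(x); });
    toy_int_task t2([&seen](int x) { seen.push_back(x); });

    t1.launch(42);         // bind at launch time

    t2.bind_all(73);       // bind first...
    bool ok = t2.launch(); // ...then launch without arguments
    assert(ok);
    (void)ok;
    return seen;
}
```

Both styles deliver the bound value to the body; the separate bind step is useful when the argument becomes known before all dependencies have been set up.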

The methods hetcompute::launch(...) and hetcompute::group::launch(...) inform the HetCompute runtime that the task is ready to execute as soon as a hardware context is available and all its predecessors have executed. In the following example, task t2 is launched, but it never executes because its predecessor t1 is never launched:

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create task t1.
        auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!"); });

        // Create task t2.
        auto t2 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t2!"); });

        // Set dependency between t1 and t2 so that t2 won't execute until t1 finishes.
        t1->then(t2);

        // Launch t2.
        t2->launch();

        // If uncommented, the wait_for below will never return
        // because t2 will not execute until t1
        // does. However, t1 has not been launched.
        // t2->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

Notice that launching a task means that it is not possible to add any new predecessors, although you can add successors. The reason is that, by launching the task, the programmer is asking the HetCompute runtime to execute the task as soon as possible. By the time the programmer tries to add a new predecessor to the task, the task might have already executed, and adding a predecessor to an already-executed task is not allowed.

Tasks can launch only once. Any subsequent calls to hetcompute::task<>::launch(...) or hetcompute::group::launch(...) on a task do not cause the task to execute again. Calls of hetcompute::group::launch(...) on a task might, however, cause the task to be added to new groups. See Task Groups.

4.3.6 Task Dependencies

Often, the concurrency in an application is not regularly structured or is highly dynamic, and therefore cannot be easily expressed through any composition of the HetCompute Parallel Programming Patterns. To express such irregular or dynamic concurrency, HetCompute provides a means to specify control and data dependencies among tasks. The programmer can construct rich acyclic task graphs that span the CPU, GPU, and DSP, in a unified fashion. A task may have multiple predecessors and successors in the task graph. A task becomes ready for execution only after all its (control and data dependency) predecessors have completed successfully.

Note

While control dependencies can be set up among any CPU, GPU, or DSP tasks, data dependencies can be set up only among CPU tasks in the current HetCompute release.

4.3.6.1 Control Dependencies

Control dependencies may be set up among tasks using hetcompute::task<>::then() to specify the relative order of task execution. The following example shows how to ensure that task t1 executes before task t2. The HetCompute runtime guarantees that t2 will not begin execution until t1 completes execution, regardless of how many hardware execution contexts are available in the system.

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello "); });

        auto t2 = hetcompute::create_task([] { HETCOMPUTE_ILOG("World!"); });

        // Ensure that t1 executes before t2
        t1->then(t2);

        t1->launch();
        t2->launch();

        t2->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

Note

In the example above, the statement t1->then(t2) guarantees that t1 finishes before t2 begins execution. Consequently, it suffices to just wait_for t2 to finish to ensure that both t1 and t2 finish.
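The ordering guarantee of a control dependency can be modeled in a few lines of single-threaded standard C++. The sketch below is a toy model, not the HetCompute scheduler: toy_task, run_hello_world, and runs_without_predecessor are hypothetical names, and tasks run synchronously on the calling thread. It shows that a launched successor waits for its predecessor, and that it never runs if the predecessor is never launched.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of a task with control dependencies; not a HetCompute type.
struct toy_task
{
    std::function<void()> body;
    std::vector<toy_task*> successors;
    int pending_predecessors = 0;
    bool launched = false;
    bool executed = false;

    explicit toy_task(std::function<void()> b) : body(std::move(b)) {}

    // Mirrors t1->then(t2): succ may not run before this task has run.
    void then(toy_task& succ)
    {
        successors.push_back(&succ);
        ++succ.pending_predecessors;
    }

    // Mirrors launch(): run now if possible, otherwise wait for predecessors.
    void launch()
    {
        launched = true;
        try_run();
    }

private:
    void try_run()
    {
        if (!launched || executed || pending_predecessors > 0)
            return;
        body();
        executed = true;
        for (toy_task* s : successors)
        {
            --s->pending_predecessors; // release the successor
            s->try_run();
        }
    }
};

// t2 is launched first, yet its output still comes after t1's output.
std::string run_hello_world()
{
    std::string out;
    toy_task t1([&out] { out += "Hello "; });
    toy_task t2([&out] { out += "World!"; });
    t1.then(t2);
    t2.launch(); // must wait: t1 has not run yet
    t1.launch(); // runs t1, which then releases t2
    assert(t2.executed);
    return out;
}

// A launched successor does not run if its predecessor is never launched.
bool runs_without_predecessor()
{
    toy_task t1([] {});
    toy_task t2([] {});
    t1.then(t2);
    t2.launch();
    return t2.executed;
}
```

The second function corresponds to the hang described under Launching Tasks: waiting for such a successor would never return, because nothing ever releases it.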

4.3.6.2 Data Dependencies

A task t2 can be data-dependent on another task t1 as the example below shows:

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        auto t1 = hetcompute::create_task([] { return 42; });

        auto t2 = hetcompute::create_task([](int i) { HETCOMPUTE_ILOG("The answer to life the universe and everything = %d", i); });

        // Set up data dependency from t1 to t2
        t2->bind_all(t1);

        t1->launch();
        t2->launch();

        t2->wait_for();

        hetcompute::runtime::shutdown();

        return 0;
    }

In the example, the data dependency is specified using hetcompute::task<ReturnType(Args...)>::bind_all(). Binding a task argument to a hetcompute::task_ptr<ReturnType> (or hetcompute::task_ptr<ReturnType(Args...)>) implicitly creates a data dependency from the latter to the former. Specifying data dependencies in this manner takes the same form as the normal binding of values to function arguments, as the following example illustrates:

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        auto t1 = hetcompute::create_task([] { return 42; });

        auto t2 = hetcompute::create_task([](int i, float f) { HETCOMPUTE_ILOG("int %d float %f", i, f); });

        // Bind i to t1: data dependency from t1 to t2
        // Bind f to value: normal function argument binding
        t2->bind_all(t1, 42.0f);

        t1->launch();
        t2->launch();

        t2->wait_for();

        hetcompute::runtime::shutdown();

        return 0;
    }

To set up a data dependency, bind_all must be invoked on a task_ptr of type hetcompute::task_ptr<ReturnType(Args...)> with an argument of type hetcompute::task_ptr<ReturnType> or hetcompute::task_ptr<ReturnType(Args...)>. bind_all cannot be called on plain hetcompute::task_ptr<>s, and a plain hetcompute::task_ptr<> cannot be passed as the source of the data:

    auto t1 = hetcompute::create_task([] { return 42; });
    auto t2 = hetcompute::create_task([](int i) { /* do something */ });
    hetcompute::task_ptr<> t = t2;
    t2->bind_all(t1); // Allowed
    t->bind_all(t1);  // Not allowed

    auto t1 = hetcompute::create_task([] { return 42; });
    hetcompute::task_ptr<> t = t1;
    auto t2 = hetcompute::create_task([](int i) { /* do something */ });
    t2->bind_all(t1); // Allowed
    t2->bind_all(t);  // Not allowed

Note

In this release of HetCompute, task arguments cannot be references.

For programmatic convenience, data dependencies can be specified in the same way as task argument binding, at any one of the following points:

• hetcompute::create_task()

• hetcompute::task<ReturnType(Args...)>::bind_all()

• hetcompute::task<ReturnType(Args...)>::launch()

The following example illustrates the above:

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        auto t1 = hetcompute::create_task([] { return 42; });

        t1->launch();

        auto t2 = hetcompute::create_task([](int i) { HETCOMPUTE_ILOG("The answer to life the universe and everything = %d", i); },
                                          t1); // Set up data dependency from t1 to t2 when t2 is created
        t2->launch();

        auto t3 = hetcompute::create_task([](int i) { HETCOMPUTE_ILOG("The answer to life the universe and everything = %d", i); });
        // Set up data dependency from t1 to t3
        t3->bind_all(t1);
        t3->launch();

        auto t4 = hetcompute::create_task([](int i) { HETCOMPUTE_ILOG("The answer to life the universe and everything = %d", i); });
        // Set up data dependency from t1 to t4 at launch-time
        t4->launch(t1);

        t2->wait_for();
        t3->wait_for();
        t4->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

Tasks t2, t3, and t4 are all data-dependent on task t1, but each data dependency is specified through a different HetCompute API call.

Use HetCompute data dependencies to easily and automatically manage the lifetime of data accessed by tasks. The data returned by a task is kept alive until the task finishes execution and all references to the task (through hetcompute::task_ptr<>) go out of scope.

4.3.6.3 Heterogeneous Task Graphs

As mentioned previously, tasks can encapsulate computation to be executed on the CPU, GPU, or DSP. This allows the creation of directed acyclic graphs (DAGs) of tasks whose execution spans all three processing units:

 1  #include <hetcompute/hetcompute.hh>
 2
 3  // header to include the dsp bindings, it is generated by the Hexagon SDK
 4  #include <include/hetcompute_dsp.h>
 5  #include "dsp_task_helper.hh"
 6
 7  static std::string const gpu_fn_string = "__kernel void gpu_fn() {"
 8                                           "}";
 9
10  int
11  main()
12  {
13      void* adsp_domain_handle = dlopen("libhetcompute_adsp.so", RTLD_NOW);
14      void* cdsp_domain_handle = dlopen("libhetcompute_cdsp.so", RTLD_NOW);
15
16      hetcompute::runtime::init();
17
18      // This scope ensures all task objects are freed before we call shutdown
19      {
20          int i = 0;
21
22          // CPU task
23          auto t1 = hetcompute::create_task([] {});
24
25          // Create a dsp_kernel from a DSP function for aDSP
26          auto adsp_kernel = create_dsp_kernel_by_domain<const int>(adsp_domain_handle, "hetcompute_dsp_return_input");
27          adsp_kernel.set_adsp();
28
29          auto t2 = hetcompute::create_task(adsp_kernel, i);
30
31          // Create a dsp_kernel from a DSP function for cDSP
32          auto cdsp_kernel = create_dsp_kernel_by_domain<const int>(cdsp_domain_handle, "hetcompute_dsp_return_input");
33          cdsp_kernel.set_cdsp();
34
35          auto t3 = hetcompute::create_task(cdsp_kernel, i);
36
37          // Create a GPU kernel from an OpenCL string
38          auto gpu_kernel = hetcompute::create_gpu_kernel<>(gpu_fn_string, "gpu_fn");
39          auto t4 = hetcompute::create_task(gpu_kernel, hetcompute::range<1>(1024));
40
41          // CPU task
42          auto t5 = hetcompute::create_task([i] { return i; });
43
44          // CPU task
45          auto t6 = hetcompute::create_task([](int) { /*do something*/ });
46
47          // Create a heterogeneous task DAG consisting of control and data
48          // dependencies
49          t1->then(t2)->then(t3)->then(t5);
50          t3->then(t4)->then(t6);
51          t6->bind_all(t5);
52
53          // Launch tasks for execution
54          t1->launch();
55          t2->launch();
56          t3->launch();
57          t4->launch();
58          t5->launch();
59          t6->launch();
60
61          // Wait for the DAG to finish
62          t6->wait_for();
63          // Note that task dependencies ensure that tasks t1, t2, t3, t4 and t5 have also
64          // finished by this point.
65      }
66
67      hetcompute::runtime::shutdown();
68
69      dlclose(adsp_domain_handle);
70      dlclose(cdsp_domain_handle);
71      return 0;
72  }

In the above example, the programmer creates CPU tasks on lines 23, 42, and 45; a Hexagon aDSP task on line 29; a cDSP task on line 35; and a GPU task on line 39. The programmer then sets up a combination of control and data dependencies among these tasks on lines 49 through 51. These dependencies create a heterogeneous task graph, as the figure below shows.


Figure 4-6 Heterogeneous task graph

Subsequently, on lines 54 through 59, the programmer launches the task graph for execution and waits for its completion on line 62.

Warning

A cycle in the DAG may cause deadlock. For performance reasons, HetCompute does not check whether there are cycles in the DAG. The programmer is responsible for avoiding them.

All the predecessor dependencies of a task must be specified before launching the task. Specifying inter-task dependencies in this manner, ahead of task execution, gives the HetCompute runtime system the information it needs to schedule the tasks intelligently, optimizing for performance and power. While this is the preferred method, alternative means to specify inter-task dependencies exist: after a task starts execution, it can invoke hetcompute::task<>::wait_for() (see Waiting for Tasks) or hetcompute::task<>::finish_after() (see finish_after) on some hetcompute::task_ptr<>. Using these methods limits the scope of optimization in the HetCompute runtime task scheduler. The following example illustrates the two different methods:

#include <hetcompute/hetcompute.hh>

void task_dependency();
void task_waiting();

void
task_dependency()
{
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello "); });

    auto t2 = hetcompute::create_task([] { HETCOMPUTE_ILOG("World!"); });

    // Ensure that t1 executes before t2
    // *PREFERRED METHOD* of specifying task dependency
    t1->then(t2);

    t1->launch();
    t2->launch();

    t2->wait_for();
}

void
task_waiting()
{
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello "); });

    auto t2 = hetcompute::create_task([t1] {
        // Wait for t1 to finish execution
        // *LESS PREFERRED METHOD* of specifying task dependency
        t1->wait_for();
        HETCOMPUTE_ILOG("World!");
    });

    t1->launch();
    t2->launch();

    t2->wait_for();
}

int
main()
{
    hetcompute::runtime::init();
    // preferred
    task_dependency();
    // less preferred
    task_waiting();
    hetcompute::runtime::shutdown();
}

4.3.7 Task Groups

One of the most common parallel programming patterns is fork and join. The idea is quite intuitive. At the fork, the application splits the job to be done into many tasks. At the join, the application waits for all of them to complete before continuing with its execution. An example of fork and join could be styling a website. During the styling phase, a parallel web browser could traverse the DOM tree and spawn one task per DOM element. Each of the tasks would be responsible for styling an element. The browser can only render the page once all the styling tasks complete.

In this scenario, waiting for each individual task would be cumbersome: the programmer would have to store the task pointers in some container and call hetcompute::task::wait_for for each of them after traversing the DOM tree. A group is a HetCompute abstraction that allows the programmer to wait for a set of tasks (each possibly running on a different kind of device) to complete, relieving the programmer from having to wait for each task separately.

In addition to the styling tasks, the parallel browser would have launched other tasks to do HTML parsing, JavaScript execution, etc. A logical design decision would be to have all tasks working on the same page belong to the same group. Notice that the styling tasks would need to belong to two groups: the "Page XYZ" group and the "Page XYZ-styling" group. HetCompute supports tasks that belong to multiple groups.

Cancellation is another important operation that can be done with groups. The method hetcompute::group::cancel() cancels all the tasks in a group. In our parallel browser example, this would allow the browser to easily cancel all the tasks in the "Page XYZ" group when the user decides to navigate to a new page before the current one displays.

In summary, groups are sets of tasks that can be canceled or waited for as a unit.

4.3.7.1 Group Creation

Use hetcompute::create_group() to create a new unnamed group. Use hetcompute::create_group(std::string const& name) to create a new group called <name>. Group names are only used for debugging applications; HetCompute does not check for duplicate group names. These functions return a hetcompute::group_ptr pointing to the created group. Use hetcompute::group::get_name() to get the group name. The following code uses hetcompute::create_group and hetcompute::group::get_name to create one unnamed and two named groups and then display their names.

#include <cassert>
#include <string>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create group named "Example 1"
    auto g1 = hetcompute::create_group("Example 1");

    // Create group named "Example 2"
    std::string g2_name("Example 2");
    auto g2 = hetcompute::create_group(g2_name);

    // Create unnamed group
    auto g3 = hetcompute::create_group();

    HETCOMPUTE_ILOG("g1 name = %s\n", g1->get_name().c_str());
    HETCOMPUTE_ILOG("g2 name = %s\n", g2->get_name().c_str());
    HETCOMPUTE_ILOG("g3 name = %s\n", g3->get_name().c_str());

    hetcompute::runtime::shutdown();
    return 0;
}

4.3.7.1.1 The group_ptr pointer

The method hetcompute::group_ptr hetcompute::create_group(std::string const& name) returns an object of type hetcompute::group_ptr, which is a custom smart pointer to the group object. hetcompute::group_ptr pointers behave similarly to hetcompute::task_ptr. Therefore, groups are reference counted, and they are automatically destroyed when there are no more hetcompute::group_ptr pointers pointing to them and there is no task in the group (an empty group). This means that even if the user has no pointers to the group, HetCompute will not destroy the group until all its tasks complete.

4.3.7.2 Launching Tasks or Kernels to Groups

There are three ways to add a task or kernel to a group: by creating and launching it, by launching an existing task, or by adding it directly without launching it.

Creating and Launching

By creating a new task and immediately launching it using void hetcompute::group::launch(Code&& code, Args&&... args). Use this method when the task (or kernel) has no predecessors or successors. This is the most performant way to create and launch a task in HetCompute, because the HetCompute runtime knows that the programmer has no pointer to the task and can perform aggressive optimizations. It is for this reason that this method does not return a hetcompute::task_ptr. Use this method as much as you can. This method also binds its arguments args to the arguments of the task or kernel that is being launched. The following example creates multiple tasks and binds their arguments when launching them all into a group:

#include <hetcompute/hetcompute.hh>

static void
do_something(int, int)
{
}

int
main()
{
    hetcompute::runtime::init();
    // Create group g
    auto g = hetcompute::create_group();

    // Create tasks from do_something and launch them into g
    for (int i = 0; i < 10; i++)
        for (int j = 10; j < 20; j++)
            g->launch(do_something, i, j);

    // Wait for all the tasks in group g to complete
    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

Launching

By launching an existing task using void hetcompute::group::launch(hetcompute::task_ptr<> const& task). If the task has arguments that need to be bound when launched, the arguments can be bound by using hetcompute::task::bind_all before launching. Alternatively, you can use void hetcompute::group::launch(hetcompute::task<TaskType>* task, FirstArg&& first_arg, RestArgs&&... rest_args), which binds the arguments as you launch. In the following example, the constant 42 will be bound to the argument x of the task when launched into the group:

#include <stdio.h>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create group
    auto g = hetcompute::create_group();

    // hello is a fully-typed task pointer of type
    // hetcompute::task_ptr<void(int)>
    auto hello = hetcompute::create_task([](int x) { HETCOMPUTE_ILOG("Hello World %d!\n", x); });

    // Bind hello to 42 and launch task into g
    g->launch(hello, 42);

    // Wait for g to be empty
    g->wait_for();

    hetcompute::runtime::shutdown();
}

Adding

By adding an existing task to a group using hetcompute::group::add(task_ptr<> const& task). This function does not launch the task; the launch has to be done separately.

#include <stdio.h>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create group g.
    auto g = hetcompute::create_group();

    // Create task t1. Its type is hetcompute::task_ptr<void()>
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!\n"); });

    // Add task t1 to group g, but do not launch it.
    g->add(t1);

    auto t2 = hetcompute::launch([t1] {
        // Launch t1. Because it already belongs to group g, there is no
        // reason to use hetcompute::group::launch.
        t1->launch();
    });

    // Wait for tasks in group g to complete.
    g->wait_for();
    hetcompute::runtime::shutdown();

    return 0;
}

Use these methods when the task is part of a DAG. For example, there is a dependency between t1 and t2 in the following example:

#include <stdio.h>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create Example group
    auto g = hetcompute::create_group();

    // Create tasks
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!\n"); });
    auto t2 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t2!\n"); });

    // Launch t1 into g
    g->launch(t1);

    // Make t2 a successor of t1
    t1->then(t2);

    // Launch t2 into g
    g->launch(t2);

    // Wait for tasks to complete
    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

Regardless of the method used, the following rules always apply:

• Tasks stay in the group until they complete execution. Once a task is added to a group, there is no way to remove it from the group.

• Once a task belonging to multiple groups completes execution, HetCompute removes it from all the groups it belongs to.

• Neither completed nor canceled tasks can join groups.

• Tasks cannot be added to a canceled group.

4.3.7.2.1 Adding a Task to Multiple Groups

There are two ways to add a task to multiple groups:

• By launching it into each of the groups. For example, to add task t to groups g1 and g2, the following would be performed:

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create task
    auto t = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World!\n"); });

    // Create groups
    auto g1 = hetcompute::create_group("Example 1");
    auto g2 = hetcompute::create_group("Example 2");

    // Launch t into g1 and g2
    g1->launch(t);
    g2->launch(t);

    t->wait_for();
    hetcompute::runtime::shutdown();
}

Notice that, in the example above, t joins both g1 and g2, but it only executes once. Therefore, the code snippet outputs a single 'Hello World!'. However, t might never join g2, because it might complete execution before the first launch returns. Remember that completed tasks can never join groups.

• By creating a new group that is the intersection of all the groups where the task needs to launch, and then launching the task into it. hetcompute::group_ptr intersect(hetcompute::group_ptr const& a, hetcompute::group_ptr const& b) returns a group pointer to a group that represents the intersection of the two groups passed as arguments. This method is more performant than repeatedly launching the same task into different groups.

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create task
    auto t = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World!\n"); });

    // Create groups
    auto g1 = hetcompute::create_group("Example 1");
    auto g2 = hetcompute::create_group("Example 2");

    auto g12 = hetcompute::intersect(g1, g2);

    // Launch t into g1 and g2
    g12->launch(t);
    hetcompute::runtime::shutdown();
}


4.3.7.2.1.1 Group Intersection

It is important to understand what group intersection really means, because it might appear counterintuitive. hetcompute::intersect returns a pointer to a group that represents the intersection of two or more groups. Launching a task into the intersection group means simultaneously launching it into all the groups that are part of the intersection.

For example, the following code snippet shows an application with two groups, g1 and g2, with 3000 and 2000 tasks, respectively.

 1  #include <hetcompute/hetcompute.hh>
 2
 3  int
 4  main()
 5  {
 6      hetcompute::runtime::init();
 7      // Create groups
 8      auto g1 = hetcompute::create_group("Group 1");
 9      auto g2 = hetcompute::create_group("Group 2");
10
11      auto g12 = hetcompute::intersect(g1, g2);
12
13      for (int i = 0; i < 3000; i++)
14          g1->launch([] {
15              //... Do something
16          });
17
18      for (int i = 0; i < 2000; i++)
19          g2->launch([] {
20              //... Do something
21          });
22
23      // Returns immediately. g12 is empty
24      g12->wait_for();
25
26      // Return only after tasks in g1 and g2 complete
27      g1->wait_for();
28      g2->wait_for();
29
30      g12->launch([] {
31          //... Calculate the Ultimate Question of Life,
32          // the Universe, and Everything
33          HETCOMPUTE_ILOG("42\n");
34      });
35
36      // All will return after the task prints 42
37      g1->wait_for();
38      g2->wait_for();
39      hetcompute::runtime::shutdown();
40
41      return 0;
42  }

In line 11, g1 and g2 are intersected into g12. The returned pointer, g12, points to an empty group because no task belongs to both g1 and g2 yet. Therefore, g12->wait_for() in line 24 returns immediately. The wait_for calls in lines 27 and 28 only return when their tasks complete. In line 30, a new task is launched into g12. The wait_for calls in lines 37 and 38 only return after that task completes execution, because it belongs to both g1 and g2 (and, of course, g12).

Note: You can use the & operator instead of hetcompute::intersect.

Keep in mind that group intersection is a somewhat expensive operation. If you need to intersect groups repeatedly, do it once and keep the pointer to the group intersection alive, as in the following snippet:

#include <memory>
#include <hetcompute/hetcompute.hh>

static void
do_something()
{
}

int
main()
{
    hetcompute::runtime::init();
    // Create groups
    auto g1 = hetcompute::create_group("Example group 1");
    auto g2 = hetcompute::create_group("Example group 2");
    auto g12 = g1 & g2;

    // Launch 10 tasks into g1 and g2
    for (int i = 0; i < 10; i++)
    {
        g12->launch(do_something);
    }

    g12->wait_for();
    hetcompute::runtime::shutdown();
    return 0;
}

Therefore, the code snippet above is faster than the one below:

#include <cassert>
#include <hetcompute/hetcompute.hh>

static void
do_something()
{
}

int
main()
{
    hetcompute::runtime::init();
    // Create groups
    auto g1 = hetcompute::create_group("Example group 1");
    auto g2 = hetcompute::create_group("Example group 2");

    // Launch 10 tasks into g1 and g2
    for (int i = 0; i < 10; i++)
    {
        (g1 & g2)->launch(do_something);
    }

    (g1 & g2)->wait_for();
    hetcompute::runtime::shutdown();
    return 0;
}

Consecutive calls to hetcompute::intersect with the same group pointers as arguments return a pointer to the same group. In addition, group intersection is commutative:

#include <cassert>
#include <stdio.h>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create groups
    auto g1 = hetcompute::create_group("Group 1");
    auto g2 = hetcompute::create_group("Group 2");

    // Get pointer to intersection groups:
    auto g12 = g1 & g2;
    auto g21 = g2 & g1;

    // This assert will never fire
    assert(g12 == g21);
    hetcompute::runtime::shutdown();
}

and associative:

#include <cassert>
#include <stdio.h>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create groups
    auto g1 = hetcompute::create_group("Group 1");
    auto g2 = hetcompute::create_group("Group 2");
    auto g3 = hetcompute::create_group("Group 3");

    // Get pointers to intersection groups:
    auto g12_3 = (g1 & g2) & g3;
    auto g1_23 = g1 & (g2 & g3);
    auto g2_13 = g2 & (g1 & g3);

    // These asserts will never fire
    assert(g12_3 == g1_23);
    assert(g12_3 == g2_13);
    hetcompute::runtime::shutdown();
}

4.3.7.2.2 Waiting For a Group

hetcompute::group::wait_for() does not return until all the tasks in the group have completed execution or have been canceled.

#include <hetcompute/hetcompute.hh>
#include <stdio.h>

int
main()
{
    hetcompute::runtime::init();
    // Create group g
    auto g = hetcompute::create_group();

    // Launch 10 tasks into g
    for (int i = 0; i < 10; i++)
    {
        g->launch([i] { HETCOMPUTE_ILOG("Hello World! I'm task #%d\n", i); });
    }

    // Wait for tasks to complete and exit group
    g->wait_for();
    hetcompute::runtime::shutdown();

    return 0;
}

Note: As in hetcompute::task::wait_for(), if hetcompute::group::wait_for() is called from within a task, HetCompute context-switches the task and finds another task to run. If called from outside a task, it blocks the calling thread until it returns.

Waiting for an intersection group means that HetCompute returns once the tasks in the intersection group have completed or been canceled.

For example, g12->wait_for() in the following code returns immediately, because there are no tasks in g12. Neither g1->wait_for() nor g2->wait_for() would return.

// Create and launch two tasks that never end
g1->launch([]{
    while(1) {}
});

g2->launch([]{
    while(1) {}
});

// Returns immediately because there are no
// tasks that belong to both g1 and g2
g12->wait_for();

// Never returns
g1->wait_for();
g2->wait_for();

hetcompute::group::wait_for() is a safe point. For information about safe points, see Interoperability.

hetcompute::group::finish_after is a non-blocking alternative to wait_for. hetcompute::group::finish_after returns immediately, but the task calling it is guaranteed not to finish before all the tasks in the group are finished.

4.3.7.3 Group Cancellation

Use hetcompute::group::cancel() to cancel all the tasks in a group. Canceling a group means that:

• The group tasks that have not started execution will never execute.

• The group tasks that are executing will be canceled only when they call hetcompute::abort_on_cancel. If any of these executing tasks is a blocking task, HetCompute will execute its cancel handler if it has not been executed before.

• Any tasks added to the group after the group is canceled will also be canceled.

In the following example, 2000 tasks are launched, and each one sleeps briefly, so that by the time the group is canceled a few of those 2000 tasks are done, a few others are executing, and a large majority are waiting to be executed. In line 26 the group is canceled. This means that the next time the running tasks execute hetcompute::abort_on_cancel they will see that their group has been canceled and will abort. group->wait_for() will not return before the running tasks end their execution, either because they call hetcompute::abort_on_cancel() or because they run to completion.

 1  #include <atomic>
 2  #include <hetcompute/hetcompute.hh>
 3
 4  using namespace std;
 5
 6  int
 7  main()
 8  {
 9      hetcompute::runtime::init();
10      // Counts the number of tasks that execute before the group gets
11      // canceled
12      atomic<size_t> counter(0);
13
14      auto group = hetcompute::create_group();
15
16      // Create 2000 tasks that increase an atomic counter
17      for (int i = 0; i < 2000; i++)
18      {
19          group->launch([&counter] {
20              counter++;
21              usleep(7);
22          });
23      }
24
25      // Cancel group
26      group->cancel();
27
28      // Wait for group to cancel
29      try
30      {
31          group->wait_for();
32      }
33      catch (const hetcompute::aggregate_exception& e)
34      {
35          // If many tasks were canceled, they each propagate a
36          // hetcompute::canceled_exception to the group, all of which get aggregated into
37          // a single hetcompute::aggregate_exception.
38          std::cout << "threw " << e.what() << " due to group cancellation " << std::endl;
39      }
40      catch (const hetcompute::canceled_exception& e)
41      {
42          // If all but one task finished by the time group cancellation took effect,
43          // then the one remaining task which was canceled will propagate a single
44          // hetcompute::canceled_exception.
45          std::cout << "threw " << e.what() << " due to group cancellation " << std::endl;
46      }
47      catch (...)
48      {
49          // Never reached
50      }
51      HETCOMPUTE_ILOG("wait_for returned after %zu tasks executed", counter.load());
52      hetcompute::runtime::shutdown();
53      return 0;
54  }

Like hetcompute::cancel(hetcompute::task_ptr const&), hetcompute::group::cancel() returns immediately. Use hetcompute::group::wait_for after hetcompute::group::cancel() to block execution until the group is empty.

Warning

Once a group is canceled, it cannot be "uncanceled".

4.3.8 Waiting for Tasks

Waiting for tasks is a necessary evil in parallel programming. Because computation is launched asynchronously, most algorithms will need a mechanism to guarantee that the computation is completed. There are several methods to ensure that in Qualcomm HetCompute, such as setting dependencies between successive computations or using other forms of nonblocking synchronization (see Unleashing Asynchrony). While it is recommended to avoid blocking waits, sometimes you will just have to wait. This section shows how.

The method hetcompute::task<>::wait_for() does not return until the task completes execution. It returns as soon as the task completes or is canceled.

Note: If hetcompute::task<>::wait_for() is called from within a task, HetCompute context-switches the task and finds another one to run. If called from outside a task (that is, the main thread), HetCompute blocks the thread until hetcompute::task<>::wait_for() returns (see Interoperability).

Note: It is helpful to use HetCompute primitives for blocking, because you communicate your intent to the runtime. Therefore, it can take actions to continue to fully utilize the available resources, rather than just spinning.

Both hetcompute::task<>::wait_for() and hetcompute::group::wait_for() (see Task Groups) are safe points. For information about safe points, see Interoperability.

4.3.9 Exceptions and Cancellation

In C++, exceptions provide a means to transfer control flow to special code blocks to gracefully handle exceptional circumstances (e.g. runtime errors). In the sequential example below, function get_char throws an exception due to illegal string access, and the exception is propagated up the call stack to main, which catches the exception and responds appropriately. The purpose of the exception in this example is to notify callers of get_char that the function may not have completed successfully and therefore all of its normal side effects (the write to the global variable c in the example) may not have taken effect.

#include <iostream>
#include <stdexcept>
#include <string>

#include <hetcompute/hetcompute.hh>

void get_char();

char c;

void
get_char()
{
    c = std::string().at(1); // Illegal access of empty string
}

int
main()
{
    hetcompute::runtime::init();
    try
    {
        get_char();
        std::cout << "got character " << c << " from string " << std::endl;
    }
    catch (const std::out_of_range&)
    {
        // Deal with exception
        std::cerr << "illegal string access" << std::endl;
    }
    catch (...)
    {
        // Should never get here
    }
    hetcompute::runtime::shutdown();
    return 0;
}

Now consider what happens when function get_char is executed asynchronously, possibly by a thread different from the one that executes main. In the example below, the call stack of the thread executing get_char does not contain the continuation of task t, because the continuation is possibly in one or more (different) threads that synchronize with t through operations such as wait_for. Consequently, normal C++ exception propagation up the thread's call stack is not sufficient for asynchronous programs. In the example below, in the absence of a well-defined exceptions model for asynchronous programs, the main thread waiting for task t on line 20 may resume execution without being aware that task t finished unsuccessfully due to the exception.

#include <iostream>
#include <string>

#include <hetcompute/hetcompute.hh>

void get_char();

char c;

void
get_char()
{
    c = std::string().at(1); // Illegal access of empty string
}

int
main()
{
    auto t = hetcompute::launch([] { get_char(); }); // Executed asynchronously (possibly in a different thread)
    t->wait_for();                                   // Synchronization point
    std::cout << "got character " << c << " from string " << std::endl;
    return 0;
}

HetCompute provides a well-defined exceptions model for asynchronous programs. If an asynchronous HetCompute construct (such as a task, group, or pattern) does not successfully finish due to exceptional circumstances, HetCompute captures the thrown exceptions and rethrows them at sync points of the construct; these sync points are the dynamic program points at which synchronization operations on the construct are invoked. Recall that a properly synchronized parallel program will invoke synchronization operations on, e.g., a task t (such as wait_for, copy_value, move_value, or an inter-task dependency) in order to ensure that any side effects of task t, such as its return value and stores to global variables, are visible to instructions after the synchronization point. Below is the example from before, modified to handle exceptions in the face of asynchronous execution. Note how copy_value acts as a synchronization point to observe exceptions, similar to wait_for.

#include <iostream>
#include <string>

#include <hetcompute/hetcompute.hh>

char get_char();

char
get_char()
{
    auto c = std::string().at(1); // Illegal access of empty string
    return c;
}

int
main()
{
    hetcompute::runtime::init();
    auto t = hetcompute::launch([] { return get_char(); }); // Executed asynchronously (possibly in a different thread)
    try
    {
        auto c = t->copy_value(); // Synchronization point
        std::cout << "got character " << c << " from string " << std::endl;
    }
    catch (const std::out_of_range&)
    {
        // Deal with exception
        std::cerr << "illegal string access" << std::endl;
    }
    catch (...)
    {
        // Should never get here
    }
    hetcompute::runtime::shutdown();
    return 0;
}
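The capture-and-rethrow behavior described above can be pictured with standard C++ alone. The following is a minimal sketch and not HetCompute's implementation: deferred_task, its members, and observe are hypothetical names, and the task body runs inline rather than on another thread. The sketch stores any exception thrown by the body in a std::exception_ptr and rethrows it only when the caller reaches the sync point.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an asynchronous task; not part of HetCompute.
class deferred_task
{
public:
    explicit deferred_task(std::function<char()> body)
    {
        try
        {
            _value = body(); // A real runtime would run the body on another thread
        }
        catch (...)
        {
            _error = std::current_exception(); // Capture instead of propagating
        }
    }

    // Sync point: rethrows the captured exception, if any.
    char copy_value() const
    {
        if (_error)
        {
            std::rethrow_exception(_error);
        }
        return _value;
    }

private:
    char               _value = '\0';
    std::exception_ptr _error;
};

// Returns a description of what the caller observes at the sync point.
std::string observe()
{
    deferred_task t([] { return std::string().at(1); }); // Body throws std::out_of_range
    try
    {
        t.copy_value();
        return "value";
    }
    catch (const std::out_of_range&)
    {
        return "out_of_range";
    }
}
```

Constructing t does not throw even though its body failed; the failure only becomes visible inside the try block that calls copy_value, which mirrors where the handler sits in the HetCompute example.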

4.3.9.1 Aggregate Exception

More than one task in a group of tasks or a pattern executed using a group of tasks may throw an exception. HetCompute captures all such exceptions in a hetcompute::aggregate_exception, which is thrown at sync points of the group or pattern. The example below illustrates its use. Note the use of hetcompute::aggregate_exception::has_next and hetcompute::aggregate_exception::next to iterate through all the exceptions contained in hetcompute::aggregate_exception. The exceptions may be rethrown in any order due to the asynchronous nature of the constructs generating the exceptions.

#include <iostream>
#include <string>

#include <hetcompute/hetcompute.hh>

void get_char();
void get_num();

char c;
int n;

#ifdef _MSC_VER
#pragma warning(disable : 4702)
#endif

void
get_char()
{
    c = std::string().at(1); // Illegal access of empty string
}

void
get_num()
{
    throw std::exception();
    n = 1;
    n = 2;
}

int
main()
{
    hetcompute::runtime::init();
    auto g = hetcompute::create_group();
    g->launch([] { get_char(); }); // Executed asynchronously (possibly in a different thread)
    g->launch([] { get_num(); });  // Executed asynchronously (possibly in a different thread)
    try
    {
        g->wait_for(); // Synchronization point
        std::cout << "got character " << c << " from string " << std::endl;
        std::cout << "got num " << n << std::endl;
    }
    catch (hetcompute::aggregate_exception& e)
    {
        // Deal with all exceptions
        while (e.has_next())
        {
            try
            {
                e.next(); // throws contained exceptions one-by-one
            }
            catch (const std::out_of_range&)
            {
                std::cerr << "illegal string access" << std::endl;
            }
            catch (const std::exception& s)
            {
                std::cerr << s.what() << std::endl;
            }
            catch (...)
            {
                // Should never get here
            }
        }
    }
    catch (...)
    {
        // Should never get here
    }
    hetcompute::runtime::shutdown();
    return 0;
}

#ifdef _MSC_VER
#pragma warning(default : 4702)
#endif
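The has_next/next iteration pattern above can be approximated in standard C++. The sketch below is an illustration under stated assumptions, not the layout of hetcompute::aggregate_exception: exception_bag, run_all, summarize, and demo are hypothetical names, and the bodies run sequentially instead of in a group.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container of captured exceptions (not hetcompute::aggregate_exception).
class exception_bag
{
public:
    void add(std::exception_ptr e) { _errors.push_back(e); }
    bool has_next() const { return _index < _errors.size(); }

    // Rethrows the next stored exception and advances the cursor.
    void next()
    {
        std::exception_ptr e = _errors[_index++];
        std::rethrow_exception(e);
    }

private:
    std::vector<std::exception_ptr> _errors;
    std::size_t                     _index = 0;
};

// Runs every body, capturing whatever each one throws instead of stopping at the first failure.
exception_bag run_all(const std::vector<std::function<void()>>& bodies)
{
    exception_bag bag;
    for (const auto& body : bodies)
    {
        try
        {
            body();
        }
        catch (...)
        {
            bag.add(std::current_exception());
        }
    }
    return bag;
}

// Drains the bag and reports how many exceptions of each kind it held.
std::string summarize(exception_bag bag)
{
    int out_of_range = 0;
    int other        = 0;
    while (bag.has_next())
    {
        try
        {
            bag.next();
        }
        catch (const std::out_of_range&)
        {
            ++out_of_range;
        }
        catch (const std::exception&)
        {
            ++other;
        }
    }
    return "out_of_range=" + std::to_string(out_of_range) + " other=" + std::to_string(other);
}

// Two failing bodies and one that succeeds.
std::string demo()
{
    std::vector<std::function<void()>> bodies;
    bodies.push_back([] { (void)std::string().at(1); });        // Throws std::out_of_range
    bodies.push_back([] { throw std::runtime_error("boom"); }); // Throws std::runtime_error
    bodies.push_back([] {});                                    // Succeeds
    return summarize(run_all(bodies));
}
```

Because next() rethrows, each stored exception is handled with ordinary catch clauses, which is why the inner try block in the HetCompute example lists the most specific type first.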

4.3.9.2 GPU/DSP Exception

If a GPU/DSP task encounters a runtime error, a corresponding hetcompute::gpu_exception or hetcompute::dsp_exception is thrown at the sync points of that task.

4.3.9.3 Cancellation

While exceptions are thrown from within an executing asynchronous construct, there is another, external reason why an asynchronous construct may finish unsuccessfully. In HetCompute, tasks, groups, and patterns may all be canceled programmatically. Programmatic cancellation is very useful in a variety of scenarios:

• If a background task, such as fetching data from a remote server, takes too long, the user may cancel it through the UI.

• If a group of tasks is engaged in searching a database, and one of them finds the intended data, the other tasks may be canceled to avoid unnecessary work and save energy.

While the source of exceptions is intrinsic to the asynchronous construct and the source of cancellation is extrinsic to the asynchronous construct, exceptions and cancellations both result in the asynchronous construct finishing unsuccessfully. Therefore, HetCompute deals with both exceptions and cancellation in an identical fashion. When a task, group, or pattern is canceled or throws an exception, HetCompute records the fact. Subsequently, at sync points of the asynchronous construct, HetCompute throws hetcompute::canceled_exception or rethrows the original exception thrown by the asynchronous construct. If the exception is not handled at a sync point, then that exception propagates up the call stack of the thread executing the sync point as per normal C++ exception propagation.

When a task is canceled or throws an exception, HetCompute cancels all of its successors in the task graph. Any synchronization with the now-canceled successor tasks will throw the appropriate exception(s).
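The propagation rule in the previous paragraph amounts to a walk over the successor edges of the task graph. The sketch below shows that walk on a toy graph; node, cancel, and cancel_middle are hypothetical names, and HetCompute's internal task representation is not described in this guide.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical task-graph node; not a HetCompute type.
struct node
{
    bool               canceled = false;
    std::vector<node*> successors;
};

// Marks n as canceled and propagates the cancellation to all transitive successors.
void cancel(node& n)
{
    if (n.canceled)
    {
        return; // Already visited, for example through another predecessor
    }
    n.canceled = true;
    for (node* s : n.successors)
    {
        cancel(*s);
    }
}

// Builds t1 -> t2 -> t3 plus an unrelated t4, cancels t2, and reports the
// canceled flags of t1..t4 as a string of '0' and '1' characters.
std::string cancel_middle()
{
    node t1, t2, t3, t4;
    t1.successors.push_back(&t2); // t1 -> t2
    t2.successors.push_back(&t3); // t2 -> t3
    cancel(t2);

    std::string flags;
    for (const node* n : {&t1, &t2, &t3, &t4})
    {
        flags += n->canceled ? '1' : '0';
    }
    return flags;
}
```

Cancellation flows forward only: the predecessor t1 and the unrelated t4 stay untouched, while t2 and everything reachable from it are marked.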

4.3.9.4 Synchronization Points where Exceptions are Observable

The following is a comprehensive list of program points at which HetCompute may throw an exception due to an asynchronous construct throwing an exception or getting canceled:

• for task t, t->wait_for()

• for task t, t->copy_value()

• for task t, t->move_value()

• for group g, g->wait_for()

Note

To observe an exception thrown by a task, group, or pattern, the programmer must synchronize with it through one of the above methods. If a sync point is not reached and the asynchronous construct (task_ptr or group_ptr) goes out of scope, the asynchronous construct is deemed useless and the exception is never rethrown anywhere.

4.3.9.5 Canceling a Task

There are three main ways to cancel an individual task t:

• If you have a pointer to the task, you can use t->cancel().

• To cancel a running task from within the task body, call hetcompute::abort_task().

• An unlaunched task is canceled when every hetcompute::task_ptr pointing to the task goes out of scope.

In this section, each of these cancellation methods is examined in detail.

4.3.9.5.1 t->cancel()

Use t->cancel() to cancel a task t and its successors. What happens to the task when the programmer calls t->cancel() depends on the status of the task.

4.3.9.5.1.1 Canceling a Task Before It Executes

If a task is canceled before it is launched, it never executes, even if it is launched later. In addition, the runtime will then cancel all successors in the task’s successor graph. In the following example, two tasks, t1 and t2, are created with a dependency between them. Notice that if either task executes, it raises an assertion. The call to t1->cancel() cancels t1, which causes t2 to be canceled as well. The subsequent call to t2->launch() has no effect: the task will not execute, because it was canceled when t1 propagated its cancellation.

#include <cassert>
#include <iostream>

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    auto t1 = hetcompute::create_task([] { assert(false); });

    auto t2 = hetcompute::create_task([] { assert(false); });

    // Create control dependency.
    t1->then(t2);

    // Cancel t1, which propagates cancellation to t2
    t1->cancel();

    // Launch t2. Does nothing, t2 got canceled via cancellation propagation
    t2->launch();

    // Returns immediately, t2 is canceled.
    try
    {
        t2->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << e.what() << ": t2 was canceled" << std::endl;
    }
    catch (...)
    {
        // Never reached
    }

    hetcompute::runtime::shutdown();
    return 0;
}

Similarly, if a task is canceled after it is launched, but before it starts executing, it never executes and will propagate the cancellation request to its successors. In the following example, three tasks, t1, t2, and t3, are created and chained. When t2 is launched, it cannot execute because its predecessor t1 has not yet executed. The call to t2->cancel() then cancels t2, which means that it will never execute. Because t3 is t2’s successor, it is also canceled; if t3 had a successor, it would also be canceled.

#include <cassert>

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!\n"); });

    auto t2 = hetcompute::create_task([] { assert(false); });

    auto t3 = hetcompute::create_task([] { assert(false); });

    // Create dependencies
    t1->then(t2)->then(t3);

    // Launch t2. It cannot execute as yet because t1 has not been launched.
    t2->launch();

    // Cancel t2, which propagates cancellation to t3
    t2->cancel();

    // Launch t1. It will execute because no one canceled it.
    t1->launch();

    // Returns after t1 completes execution
    t1->wait_for();
    hetcompute::runtime::shutdown();

    return 0;
}

4.3.9.5.1.2 Canceling a Task While It Executes

Canceling a task that is executing is more involved because HetCompute uses cooperative multitasking. This means that, once a task is executing, it is not preempted unless it voluntarily cedes the processor (e.g., by calling hetcompute::task<>::wait_for()). Thus, it is up to the task to check periodically whether or not it has been canceled. Use hetcompute::abort_on_cancel() inside a task body to abort the task immediately if the task, or any of the groups to which it belongs, has been canceled.

#include <cassert>
#include <iostream>

#include <unistd.h> // sleep, usleep

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    auto t = hetcompute::create_task([] {
        while (1)
        {
            hetcompute::abort_on_cancel();
            HETCOMPUTE_ILOG("Waiting to be canceled.\n");
            usleep(100);
        }
        assert(false); // This will never fire.
    });

    // Launch t.
    t->launch();

    // Wait for 2 seconds.
    sleep(2);

    // Cancel task. Returns immediately.
    t->cancel();

    try
    {
        // Wait for the task.
        t->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << e.what() << " thrown" << std::endl;
    }
    catch (...)
    {
        // Never reached.
    }

    hetcompute::runtime::shutdown();
    return 0;
}

In the example above, task t will never finish unless it is canceled. After launching t with t->launch(), the main thread sleeps for 2 seconds to make sure that t is scheduled and prints its messages. The call to t->cancel() then asks HetCompute to cancel the task, which should be running by now. The method t->cancel() returns immediately after it marks the task as "pending for cancellation". This means that t might still be executing after t->cancel() returns. That is why t->wait_for() is called afterwards: to make sure that t has completed its execution.

Note

A task does not know whether someone has requested its cancellation unless it calls hetcompute::abort_on_cancel() during its execution.
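One consequence of this polling model is that work already in progress still completes after a cancellation request arrives; the request is only noticed at the next poll. The sketch below simulates this on a single thread. It is an illustration only: cancel_token, canceled, poll_cancel, and run are hypothetical names, and the "external" request is simulated inside the loop rather than issued by another thread.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical cancellation flag; HetCompute tracks this state internally.
struct cancel_token
{
    bool requested = false;
};

// Hypothetical exception type used to unwind out of the task body.
struct canceled : std::runtime_error
{
    canceled() : std::runtime_error("canceled") {}
};

// Plays the role of hetcompute::abort_on_cancel(): throws if cancellation was requested.
void poll_cancel(const cancel_token& token)
{
    if (token.requested)
    {
        throw canceled();
    }
}

// Runs up to `steps` units of work, polling once per unit. The cancellation request
// arrives while unit `cancel_during` is in progress. Returns the number of units
// that ran to completion.
int run(int steps, int cancel_during)
{
    cancel_token token;
    int          done = 0;
    try
    {
        for (int i = 0; i < steps; ++i)
        {
            poll_cancel(token); // The only place where cancellation is noticed
            if (i == cancel_during)
            {
                token.requested = true; // Simulated external request, mid-unit
            }
            ++done; // The unit in progress still completes
        }
    }
    catch (const canceled&)
    {
        // A real runtime would catch this outside the task body
    }
    return done;
}
```

With a request arriving during unit 3 (zero-based), four units complete before the next poll aborts the loop. Polling more often shortens the delay between a request and the abort, at the cost of more checks.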

The method hetcompute::abort_on_cancel() never returns if the task has indeed been canceled because it throws an exception that the Qualcomm HetCompute runtime catches. For this reason, it is recommended that you use Resource Acquisition Is Initialization (RAII) to allocate and deallocate the resources used inside a task. If using RAII in your code is not an option, surround hetcompute::abort_on_cancel() with try-catch, and call throw from within the catch block after the cleanup code:

#include <cassert>
#include <iostream>

#include <unistd.h> // usleep

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    auto t = hetcompute::create_task([] {
        while (1)
        {
            try
            {
                hetcompute::abort_on_cancel();
            }
            catch (const hetcompute::abort_task_exception&)
            {
                //..do cleanup
                throw;
            }
            catch (...)
            {
                //..do cleanup
                throw;
            }
            // HETCOMPUTE_ILOG("Waiting to be canceled.\n");
            usleep(10);
        }
        assert(false); // This will never fire
    });

    // Launch t
    t->launch();

    // Wait for 20 micro-seconds.
    usleep(20);

    // Cancel task. Returns immediately.
    t->cancel();

    try
    {
        // Wait for the task to complete.
        t->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << e.what() << " thrown" << std::endl;
    }
    catch (...)
    {
        // Never reached
    }

    hetcompute::runtime::shutdown();
    return 0;
}

Warning

If throw is replaced with return in the catch blocks of the previous example, the exception would not propagate to the runtime, HetCompute would not consider the task as canceled, and, therefore, its successors (if any) would not be canceled.
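For comparison with the try-catch approach, the RAII recommendation above can be sketched as follows. This is a minimal illustration in standard C++: scoped_counter, abort_if, and live_after_abort are hypothetical names, and abort_if merely stands in for hetcompute::abort_on_cancel() by throwing when asked to.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource guard: acquires in the constructor and releases in the
// destructor, which also runs during stack unwinding.
class scoped_counter
{
public:
    explicit scoped_counter(int& live) : _live(live) { ++_live; }
    ~scoped_counter() { --_live; }

    scoped_counter(const scoped_counter&) = delete;
    scoped_counter& operator=(const scoped_counter&) = delete;

private:
    int& _live;
};

// Stand-in for hetcompute::abort_on_cancel(): throws when cancellation was requested.
void abort_if(bool canceled)
{
    if (canceled)
    {
        throw std::runtime_error("canceled");
    }
}

// Returns the number of resources still held after the task body exits by exception.
int live_after_abort()
{
    int live = 0;
    try
    {
        scoped_counter guard(live); // Acquire
        abort_if(true);             // Throws; guard's destructor runs during unwinding
    }
    catch (const std::runtime_error&)
    {
        // A real runtime would catch the exception outside the task body
    }
    return live;
}
```

No catch block is needed inside the task body, so there is no throw statement to forget: the exception always reaches the runtime, and cleanup still happens.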

4.3.9.5.1.3 Canceling a Task After It Completes Execution

Canceling a task after it has executed has no effect on the task, nor on its successors. In the following example, a dependency is set up between t1 and t2, and t1 is launched. The call to t1->cancel() is made after t1->wait_for() has returned, so t1 has already finished execution and the cancellation has no effect. Thus, nobody cancels t2. If the commented-out calls to t2->launch() and t2->wait_for() were enabled, t2->wait_for() would never return, because t2 would remain stuck in an infinite loop.

#include <unistd.h> // usleep

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!\n"); });

    auto t2 = hetcompute::create_task([] {
        while (1)
        {
            hetcompute::abort_on_cancel();
            HETCOMPUTE_ILOG("Hello World from t2!\n");
            usleep(100);
        }
    });

    // Create dependencies.
    t1->then(t2);

    // Launch tasks.
    t1->launch();

    // Wait for t1 to complete.
    t1->wait_for();

    // Cancel t1.
    // Because it has already completed, it does not propagate its cancellation.
    t1->cancel();

    // If the two lines below are uncommented the wait_for will never return.
    // t2->launch();
    // t2->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

4.3.9.5.2 hetcompute::abort_task()

Running tasks call hetcompute::abort_task() to cancel themselves and their successors. Consider the following example. Two tasks, t1 and t2, are created with a dependency between them. The body of t1 is very simple: it prints a message ten times and then it aborts. Both tasks are launched, and the main thread waits for t1 to complete its execution with t1->wait_for(). Because t1 calls hetcompute::abort_task(), it is canceled and propagates its cancellation to its successor, t2.

#include <cassert>
#include <iostream>

#include <unistd.h> // sleep

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    auto t1 = hetcompute::create_task([] {
        int i = 0;
        while (true)
        {
            HETCOMPUTE_ILOG("Hello World %d\n", i);
            sleep(1);
            i++;
            if (i == 10)
            {
                hetcompute::abort_task();
            }
        }
        // This will never fire
        assert(false);
    });

    auto t2 = hetcompute::create_task([] {
        // This will never fire
        assert(false);
    });

    t1 >> t2;

    // Launch tasks
    t1->launch();
    t2->launch();

    try
    {
        // Wait for t1 to complete.
        t1->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << e.what() << " thrown when syncing with t1" << std::endl;
    }
    catch (...)
    {
        // Never reached
    }

    try
    {
        // Returns immediately, t2 is canceled.
        t2->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << e.what() << " thrown when syncing with t2" << std::endl;
    }
    catch (...)
    {
        // Never reached
    }

    hetcompute::runtime::shutdown();
    return 0;
}

4.3.9.5.3 Cancellation by Abandonment

When all the hetcompute::task_ptrs referencing an unlaunched task go out of scope, the task is canceled and it propagates the cancellation to its successors. The reasoning is simple: a task t cannot launch without a task pointer, and none of its successors will ever be able to execute because t never executed.

#include <cassert>

#include <unistd.h> // sleep

#include <hetcompute/hetcompute.hh>

void foo();

void
foo()
{
    auto t1 = hetcompute::create_task([] {
        HETCOMPUTE_ILOG("Hello World from t1\n");
        // This will never fire
        assert(false);
    });

    auto t2 = hetcompute::create_task([] {
        HETCOMPUTE_ILOG("Hello World from t2\n");
        // This will never fire
        assert(false);
    });

    auto t3 = hetcompute::create_task([] {
        int i = 0;
        while (i++ < 10)
        {
            HETCOMPUTE_ILOG("Hello World from t3\n");
            sleep(1);
        }
    });

    t1 >> t2;

    t2->launch();
    t3->launch();

    // t1, t2, and t3 go out of scope
}

int
main()
{
    hetcompute::runtime::init();
    foo();
    hetcompute::runtime::shutdown();
    return 0;
}

In the snippet above, three tasks, t1, t2, and t3, are created, with a dependency between the first two. t2 and t3 are then launched. t2 cannot run because t1 has not yet executed. When foo() returns, the three pointers go out of scope. t1 is canceled because it has not been launched. t2 is canceled because t1 propagated its cancellation. t3 does not get canceled and will run even after foo() returns.

4.3.10 Blocking Tasks

Programmers can mark CPU kernels with attributes that help HetCompute make better scheduling decisions. The current HetCompute release supports just one attribute (to indicate that the kernel may execute blocking code), but there are plans to include others in future releases. The following example shows how to create a CPU kernel object that is marked as blocking.

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    auto k = hetcompute::create_cpu_kernel([] {
        // execute some blocking I/O request
    });
    // inform the HetCompute runtime that the kernel may block on some external event
    k.set_blocking();

    auto t = hetcompute::launch(k);
    t->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

4.3.10.1 Blocking Kernel

A blocking kernel (and a task created out of that kernel) consists of computation that depends on external (non-HetCompute) synchronization to make guaranteed forward progress. Typically, the external synchronization includes completing I/O requests, other OS syscalls with indefinite run-time, and busy-waiting. It does not include waiting on HetCompute tasks or groups using hetcompute::task<>::wait_for, hetcompute::task<ReturnType>::copy_value, hetcompute::task<ReturnType>::move_value, or hetcompute::group::wait_for.

There are two problems with blocking tasks. The first is that once a blocking task executes, it will take over a thread in a HetCompute thread pool, thus preventing other tasks from executing in the same thread. Because a blocking task spends most of its time blocking on an event, essentially one of the threads in the thread pool is wasted. When the programmer marks the task kernel as blocking, HetCompute ensures that the thread pool does not wastefully dedicate a thread to the task.

The second problem has to do with cancellation. If a blocking task is canceled while it is blocked on an external event, HetCompute needs to be able to unblock the task so that it can respond to the cancellation signal (e.g., by calling hetcompute::abort_on_cancel()). As the code snippet below shows, there is often a well-defined means to unblock:

// blocking call
{
    x = network_fetch(network_handle);
}

// means to unblock network_fetch above
{
    write_spurious(network_handle);
}

write_spurious(network_handle) can be called asynchronously while network_fetch is executing to unblock the latter. HetCompute captures the above idea through the hetcompute::blocking() construct to enable efficient cancellation of blocking tasks.

4.3.10.2 hetcompute::blocking

hetcompute::blocking() enables a task to enter and exit multiple blocking sections of code, and provides means for the programmer to precisely and efficiently specify how to unblock each blocking section. The hetcompute::blocking() construct takes two programmer-supplied arguments:

• blocking function: a C++ function, functor, or lambda that contains the blocking code to execute, and

• cancel function: a C++ function, functor, or lambda that contains the code to unblock the task if it is blocked, so that the task may respond to cancellation.

In response to a task being canceled asynchronously, e.g., via hetcompute::task<>::cancel(), the cancel function will be executed exactly once, and only if the task is running.

In the following example, a CPU kernel that executes blocking code is created. The blocking statement is wrapped in hetcompute::blocking() and specified using two lambdas, including cancel_function. The kernel is marked as blocking with k.set_blocking(). After launching task t and sleeping for a second, the main thread cancels t with t->cancel(). Most likely, by the time t is canceled, it will be waiting on the condition variable inside the blocking function. The cancel function wakes up the task body so that it can abort at its next call to hetcompute::abort_on_cancel().

#include <condition_variable>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    static std::mutex mutex;
    static std::condition_variable cv;

    // create CPU kernel and mark it as blocking
    auto k = hetcompute::create_cpu_kernel([] {
        auto cancel_function = [] {
            HETCOMPUTE_ILOG("CANCEL blocking task");
            std::lock_guard<std::mutex> lock(mutex);
            cv.notify_all();
        };

        HETCOMPUTE_ILOG("START blocking task");
        std::unique_lock<std::mutex> lock(mutex);
        for (;;)
        {
            hetcompute::abort_on_cancel();
            // enter hetcompute::blocking construct
            hetcompute::blocking([&lock] { cv.wait(lock); }, // blocking function
                                 cancel_function);           // cancel function
        }
        HETCOMPUTE_ILOG("STOP blocking task");
    });
    k.set_blocking();

    auto t = hetcompute::launch(k);

    // wait for task to block
    sleep(1);

    // cancel task; it will call t's cancel function
    t->cancel();

    try
    {
        // wait for t to finish
        t->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        HETCOMPUTE_ILOG("task threw %s", e.what());
    }
    catch (...)
    {
        // Never reached
    }

    hetcompute::runtime::shutdown();
    return 0;
}

Note

If a task receives a cancellation request prior to entering a hetcompute::blocking() section, the task throws hetcompute::canceled_exception immediately upon entering the section. Neither the blocking function nor the cancel function is executed.

4.3.11 Algebraic Operations on Tasks

As a programmatic convenience, HetCompute provides operator overloads on tasks with return values. A task represents a value that will eventually materialize. The overloaded operators provide a means to express computation using such eventual values. In the example below, greet creates two tasks and returns their sum (through operator +). HetCompute internally creates a task that depends on tasks a and b, and launches it.

#include <iostream>
#include <string>

#include <hetcompute/hetcompute.hh>

hetcompute::task_ptr<std::string> greet();

hetcompute::task_ptr<std::string>
greet()
{
    auto a = hetcompute::launch([] { return std::string("hello"); });
    auto b = hetcompute::launch([] { return std::string(" world"); });
    return a + b; // returns a launched task which is
                  // data-dependent on tasks a and b
}

int
main()
{
    hetcompute::runtime::init();
    auto t = greet();
    std::cout << t->move_value() << std::endl;
    hetcompute::runtime::shutdown();
}

Later, at runtime, when tasks a and b finish, their return values are propagated to task a + b, which then concatenates the two strings through the + operator overloaded on the std::string datatype. The concatenated string is then retrieved through t->move_value in main.

HetCompute supports the following non-blocking algebraic operations on the return values of collapsed tasks:

• Unary arithmetic and bitwise operations: +, -, ~
  The return value of the task (pointed to by hetcompute::task_ptr) can be any type (built-in or user-defined) as long as the operation is applicable to it (for user-defined types, the corresponding operator needs to be defined).

• Binary arithmetic and bitwise operations: +, -, *, /, %, &, |, ^, which can take the following three combinations of operands:

  – operand1: hetcompute::task_ptr, operand2: hetcompute::task_ptr

  – operand1: hetcompute::task_ptr, operand2: value

  – operand1: value, operand2: hetcompute::task_ptr

  The value operand and the return value of the task (pointed to by hetcompute::task_ptr) can be any type (built-in or user-defined) as long as the operation is applicable between them (for user-defined types, the corresponding operator needs to be defined). The type of the return value of the new task will be the promoted type between the types of the two operands.

• Binary compound arithmetic and bitwise assignment operations: +=, -=, *=, /=, %=, &=, |=, ^=, which can take the following two combinations of operands:

  – operand1: hetcompute::task_ptr, operand2: hetcompute::task_ptr

  – operand1: hetcompute::task_ptr, operand2: value

  The value operand and the return value of the task (pointed to by hetcompute::task_ptr) can be any type (built-in or user-defined) as long as the operation is applicable between them (for user-defined types, the corresponding operator needs to be defined). Also, the type of the result will be type-cast to the type of the return value of the original task so that the new task can be pointed to by the original hetcompute::task_ptr.

The result of these operations is a newly launched task whose return value is the result of the corresponding operations. If any of the operands are tasks, the resulting task will be data dependent (see Task Dependencies) on the operand tasks, and be launched when the operand values are ready.

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    auto a = hetcompute::launch([] { return 11; });
    auto b = hetcompute::launch([] { return 4; });
    auto c = hetcompute::launch([] { return 24; });
    auto e = (a * b) - (c / 12); // e is a task that will eventually compute
    HETCOMPUTE_ILOG("The answer to life, the universe and everything = %d", e->copy_value()); // copy_value waits for e to compute
    hetcompute::runtime::shutdown();
}

In the example above, e is an expression tree composed of values, tasks, and operations on tasks. The expression tree will be evaluated when tasks in each sub-expression finish and return their values.
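The idea of building an expression tree out of eventual values can be sketched in standard C++ with operator overloads. This is an analogy under stated assumptions rather than HetCompute's mechanism: eventual and answer are hypothetical names, and the sketch evaluates lazily on the calling thread when get() is called, whereas HetCompute returns an already launched task that runs once its operand tasks finish.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical eventual value; evaluation is deferred until get() is called.
template <typename T>
class eventual
{
public:
    explicit eventual(std::function<T()> compute) : _compute(std::move(compute)) {}
    T get() const { return _compute(); }

private:
    std::function<T()> _compute;
};

// eventual * eventual: the result type is the promoted type of the two operands.
template <typename A, typename B>
auto operator*(const eventual<A>& a, const eventual<B>& b)
    -> eventual<decltype(std::declval<A>() * std::declval<B>())>
{
    using R = decltype(std::declval<A>() * std::declval<B>());
    return eventual<R>([a, b] { return a.get() * b.get(); });
}

// eventual - eventual
template <typename A, typename B>
auto operator-(const eventual<A>& a, const eventual<B>& b)
    -> eventual<decltype(std::declval<A>() - std::declval<B>())>
{
    using R = decltype(std::declval<A>() - std::declval<B>());
    return eventual<R>([a, b] { return a.get() - b.get(); });
}

// eventual / plain value
template <typename A, typename V>
auto operator/(const eventual<A>& a, V v)
    -> eventual<decltype(std::declval<A>() / std::declval<V>())>
{
    using R = decltype(std::declval<A>() / std::declval<V>());
    return eventual<R>([a, v] { return a.get() / v; });
}

// Mirrors the expression from the example above.
int answer()
{
    eventual<int> a([] { return 11; });
    eventual<int> b([] { return 4; });
    eventual<int> c([] { return 24; });
    auto e = (a * b) - (c / 12); // Builds the tree; nothing is computed yet
    return e.get();              // Evaluates the whole tree: 11 * 4 - 24 / 12
}
```

Each operator captures its operands by value and returns a new node, so the tree stays valid after the operands go out of scope, much as the task returned by greet remains usable after a and b are gone.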

Note

Algebraic operators return launched tasks; do not attempt to re-launch them.

Use the algebraic operator when it is known that the operation is computationally expensive.

The following operators are not supported in the current version:

• Comparison and logical operators: ==, !=, >, <, >=, <=, !, &&, ||

• Bitwise shift operators: <<, >>, <<=, >>=

• Increment and decrement arithmetic operators: ++, --

• Other meaningless operators for task_ptr: [], *, ->*, (), comma, ""_, type, etc.

4.3.12 Task-Pointer Collapsing

As shown in the example below, a task may return another task. Recall that a task’s type is determined by its return value. The type of the return value determines the tasks to which the returned data may flow through data dependencies. In the example, task t has type task_ptr<int> (actually task_ptr<int(void)>, but the void argument is irrelevant for this discussion). By the same reasoning, task t1 should have type task_ptr<task_ptr<int>>. However, if a task is viewed as a construct that eventually computes a value, and therefore is merely a placeholder for that value, then task_ptr<int> represents an int that will eventually materialize and therefore task_ptr<task_ptr<int>> is really just task_ptr<int>. The degeneration of task_ptr<task_ptr<int>> to task_ptr<int> is called return type collapsing. When a task is created or launched, HetCompute performs return type collapsing by default. Consequently, in the example, task t1 can be bound as a data dependency to task t2, which expects an int as its argument.

    #include <iostream>

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        auto t1 = hetcompute::launch([] {
            auto t = hetcompute::launch([] { return 42; });
            return t;
        });
        auto t2 = hetcompute::launch([](int i) { std::cout << i << std::endl; }, t1);
        t2->wait_for();
        hetcompute::runtime::shutdown();
        return 0;
    }
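The collapsing rule can also be stated at the type level. The following sketch uses a hypothetical handle<T> in place of task_ptr<T> and a hypothetical trait handle_for<R>; it illustrates the rule in standard C++ and is not the mechanism HetCompute uses internally. It assumes collapsing applies recursively to any nesting depth, which the text above does not state explicitly.

```cpp
#include <type_traits>

// Hypothetical stand-in for hetcompute::task_ptr<T>; it carries only the type.
template <typename T>
struct handle
{
};

// handle_for<R>::type is the handle type given to a task whose body returns R.
template <typename R>
struct handle_for
{
    using type = handle<R>;
};

// Collapsing rule: a body that returns handle<T> is treated as returning T.
template <typename T>
struct handle_for<handle<T>> : handle_for<T>
{
};

// A body returning int yields handle<int>.
static_assert(std::is_same<handle_for<int>::type, handle<int>>::value,
              "plain value");
// A body returning handle<int> also yields handle<int>, not handle<handle<int>>.
static_assert(std::is_same<handle_for<handle<int>>::type, handle<int>>::value,
              "one level collapses");
// Assumption of this sketch: deeper nesting collapses all the way down.
static_assert(std::is_same<handle_for<handle<handle<int>>>::type, handle<int>>::value,
              "nesting collapses fully");
```

Under this model, the outer task t1 in the example above would have the type handle<int>, which is why it can feed a consumer that expects an int.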

There may be certain situations in which the intent is to pass a task_ptr through dataflow. For instance, one task may create a task t and some other task may launch task t after all of task t’s data is available. For such situations, use hetcompute::do_not_collapse to indicate that return type collapsing should not be performed.

    #include <iostream>

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        auto t1 = hetcompute::launch(hetcompute::do_not_collapse, [] {
            auto t = hetcompute::create_task([] { return 42; });
            return t;
        });
        auto t2 = hetcompute::launch(
            [](hetcompute::task_ptr<int> t) {
                t->launch();
                return t;
            },
            t1);
        std::cout << t2->copy_value() << std::endl;
        hetcompute::runtime::shutdown();
        return 0;
    }

In the above example, task t1 creates task t and passes it as a data dependency to task t2, which accepts a task_ptr<int> as its argument. hetcompute::do_not_collapse ensures that the return type of task t1 is task_ptr<task_ptr<int>> and that task_ptr t flows to task t2. Task t2 in turn launches task t and returns it. Notice that because task t2 is not created with hetcompute::do_not_collapse, t2->copy_value() accesses the int eventually computed by task t.

4.3.13 Unleashing Asynchrony

Applications should be parallelized or made asynchronous in a modular fashion, with the different modules being composable with each other both programmatically and in terms of performance gains from the independently parallelized modules. Using HetCompute, functions in an application may be split into tasks, with tasks specified as related to each other through dependencies. Dependencies are typically specified statically using hetcompute::task<>::then (control dependency) or hetcompute::task<ReturnType(Args...)>::bind_all (data dependency), or dynamically using the blocking function call hetcompute::task<>::wait_for. For best performance using HetCompute, dependencies among tasks should be statically specified ahead of execution time using hetcompute::task<>::then or hetcompute::task<ReturnType(Args...)>::bind_all, while avoiding dynamic discovery of dependencies during execution through use of hetcompute::task<>::wait_for as much as possible. However, as the following example shows, achieving modular and composable parallelism without blocking is hard using just the above APIs.

The application pseudocode below calls compose_webpages to build a composite display of multiple webpages. compose_webpages calls display_webpage to display each webpage; display_webpage in turn fetches data to be displayed and the styling of the data, both of which are used to render the webpage on the composite display.

    void display_webpage(string url) {
        fetch(url, "data");
        fetch(url, "style");
        render();
    }

    void compose_webpages(vector<string> urls) {
        for (auto url : urls) {
            display_webpage(url);
        }
    }

    compose_webpages(urls);

Assume that individual webpages can be rendered independently of each other, and that the fetching of data and its styling can also be done in parallel. This informs the following parallel implementation of the composite display application. The for loop in the compose_webpages function is executed in parallel. The code to fetch data and style is launched as tasks that can execute in parallel, while the render function is launched as a task scheduled to execute after the fetchdata and fetchstyle tasks have finished.

In HetCompute, tasks are not related to each other except through task dependencies specified through hetcompute::task<>::then or hetcompute::task<ReturnType(Args...)>::bind_all. A notable consequence of this is that although the display_webpage task created in the for loop in compose_webpages creates and launches three more tasks, the latter tasks are in no way related to the display_webpage task that created them. Therefore, the display_webpage function must explicitly wait_for the render task to finish before it returns, so that the display_webpage task that created them finishes only when all tasks created and launched by it have finished. Similarly, the compose_webpages function also waits for all tasks it creates to finish before it returns. While such a parallelization is desirable because each function was locally parallelized in a modular fashion, the use of blocking hetcompute::task<>::wait_for to enforce the synchronous function call interface can significantly hinder performance when there are many outstanding hetcompute::task<>::wait_for calls in the application.

4.3.13.1 finish_after

To enable modular and composable parallelization in a high-performance non-blocking style, HetCompute provides a unique method called hetcompute::task<>::finish_after (and hetcompute::group::finish_after).

A task t1 can register itself as finishing after another task t2 by calling t2->finish_after(). Doing so informs the HetCompute runtime system that although task t1 may have completed execution, the task will logically finish only after task t2 has finished. Therefore, just as the use of t2->wait_for() extends the lifetime of task t1 to encapsulate that of task t2, t2->finish_after() achieves the same but without blocking the thread executing task t1.

    extern task_ptr<> t2;

    void foo() {
        ...
        t2->finish_after();
        ...
    }

    int main() {
        auto t1 = hetcompute::create_task(foo);
        ...
    }
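The difference between completing execution and logically finishing can be modeled in a few lines of standard C++. The sketch below is a single-threaded toy model with hypothetical names (toy_task, register_finish_after); it does not use or reproduce the HetCompute runtime, and only captures the rule that a task counts as finished once its body has run and every task it registered on has finished.

```cpp
#include <cassert>
#include <vector>

// Toy model (NOT the HetCompute runtime): a task finishes logically only when
// its own body has run and every task it registered to finish after has finished.
struct toy_task
{
    bool body_done = false;
    std::vector<const toy_task*> finish_after_targets;

    // Models calling target->finish_after() from inside this task's body.
    void register_finish_after(const toy_task& target)
    {
        assert(&target != this); // a task cannot finish after itself
        finish_after_targets.push_back(&target);
    }

    bool finished() const
    {
        if (!body_done)
        {
            return false;
        }
        for (const toy_task* target : finish_after_targets)
        {
            if (!target->finished())
            {
                return false;
            }
        }
        return true;
    }
};

// Returns true if t1 stays unfinished until t2 completes.
bool t1_outlasts_t2()
{
    toy_task t1, t2;
    t1.register_finish_after(t2);
    t1.body_done = true;                     // t1's body returned early
    bool unfinished_before = !t1.finished(); // still tied to t2, nothing blocked
    t2.body_done = true;                     // t2 completes
    return unfinished_before && t1.finished();
}
```

In this model, registering never waits; it only records a relationship that is consulted when someone asks whether t1 has finished.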

A task can register itself as finishing after a group g by calling hetcompute::group::finish_after on group g. Note that a task can register itself as finishing after any number of tasks and groups.

    extern task_ptr<> t2;
    extern group_ptr g;

    void foo() {
        ...
        t2->finish_after();
        g->finish_after();
        ...
    }

    int main() {
        auto t1 = hetcompute::create_task(foo);
        ...
    }

Both function calls are non-blocking, lightweight, and return immediately. Note that the non-blocking parallelization using finish_after below is nearly identical to the blocking parallelization, with the sole difference being the use of finish_after in place of wait_for. Furthermore, note that the display_webpage and compose_webpages functions now return early, before any of the tasks they launched finish. Consequently, these functions have become asynchronous and must be invoked from within tasks. Therefore, in main(), compose_webpages is called from within task t rather than as a synchronous function call.

    #include <iostream>
    #include <string>

    #include <hetcompute/hetcompute.hh>

    void display_webpage(char*);
    void compose_webpages(int num_urls, char* urls[]);

    void
    display_webpage(char* url)
    {
        auto fetchdata = hetcompute::create_task([=] {
            /*fetch(url, "fetchdata");*/
            return std::string(url) + " data";
        });
        auto fetchstyle = hetcompute::create_task([=] {
            /*fetch(url, "fetchstyle");*/
            return std::string(url) + " style";
        });
        auto render = hetcompute::create_task([](std::string data, std::string style) {
            /*render();*/
            std::cout << data + " " + style << std::endl;
        });
        // Render task may start executing only after data and style have been
        // fetched
        render->bind_all(fetchdata, fetchstyle);
        fetchdata->launch();
        fetchstyle->launch();
        render->launch();
        // Mark display_webpage as logically finishing after the render task finishes
        render->finish_after();
        // Return from function call even before any of the fetchdata, fetchstyle, or render
        // tasks finish. Such an early return makes the function asynchronous.
    }

    void
    compose_webpages(int num_urls, char* urls[])
    {
        auto g = hetcompute::create_group();
        // urls is argv here, so index 0 (the program name) is skipped
        for (int i = 1; i < num_urls; i++)
        {
            g->launch([=] { display_webpage(urls[i]); });
        }
        // Mark compose_webpages as logically finishing after all webpages have been
        // composed and displayed
        g->finish_after();
        // Return from function call before any of the tasks finish
    }

    int
    main(int argc, char* argv[])
    {
        hetcompute::runtime::init();

        // Launch compose_webpages as a task since it is an asynchronous function
        // call
        auto t = hetcompute::launch([=, &argv] { compose_webpages(argc, argv); });
        // Waits for the composite display to be rendered!
        t->wait_for();

        // Shut the runtime down before returning
        hetcompute::runtime::shutdown();
        return 0;
    }

The single, global hetcompute::task<>::wait_for in main ensures that the composite display is correctly rendered before the application terminates. Note that there are no other blocking calls necessary to specify all the parallelism in the application.

hetcompute::task<>::finish_after or hetcompute::group::finish_after can be invoked only from within a task. Note that a task can register itself as finishing after an arbitrary number of other tasks and groups. A task can register itself as finishing after tasks or groups it did not create or launch. finish_after is a means to semantically relate tasks to each other, e.g., in a parent-child relationship. In the parallel mergesort example below, every node in the mergesort tree is specified to finish after the merge step corresponding to that node finishes (the merge->finish_after() call at the end of the recursive branch).

    #include <algorithm>
    #include <cstdlib>
    #include <functional>
    #include <iostream>
    #include <iterator>
    #include <sstream>
    #include <vector>

    #include <hetcompute/hetcompute.hh>

    // Parallel mergesort using recursive fork-join parallelism.
    // hetcompute::task<>::finish_after allows easy expression of the parallelism in the
    // algorithm in a non-blocking manner, yielding better performance than
    // blocking parallelization using hetcompute::task<>::wait_for.

    const size_t GRANULARITY = 8192;

    // Asynchronous mergesort, to be invoked in a task
    template <typename Iterator, typename Compare>
    void
    mergesort(Iterator begin, Iterator end, Compare cmp)
    {
        size_t n = std::distance(begin, end);
        if (n <= GRANULARITY)
        {
            std::sort(begin, end, cmp);
        }
        else
        {
            auto middle = begin;
            std::advance(middle, n / 2);
            auto left  = hetcompute::launch([=] { mergesort(begin, middle, cmp); });
            auto right = hetcompute::launch([=] { mergesort(middle, end, cmp); });
            auto merge = hetcompute::create_task([=] { std::inplace_merge(begin, middle, end, cmp); });
            // The left subtree and right subtree tasks must finish before the merge
            // task can execute
            left->then(merge);
            right->then(merge);
            merge->launch();
            // mergesort(begin, end, cmp) logically finishes after the merge task
            // finishes
            merge->finish_after();
        }
    }

    int
    main(int argc, const char* argv[])
    {
        hetcompute::runtime::init();
        std::vector<long> input;
        size_t n_def = 1 << 16;
        size_t n     = n_def;

        if (argc >= 2)
        {
            std::istringstream istr(argv[1]);
            istr >> n;
        }

        // Create a random array of integers
        for (size_t i = 0; i < n; i++)
        {
            input.push_back(rand());
        }

        // Launch mergesort inside a task since it has an asynchronous interface (due
        // to use of hetcompute::task::finish_after)
        auto t = hetcompute::launch([&] { mergesort(input.begin(), input.end(), std::less<long>()); });
        t->wait_for();

        if (!std::is_sorted(input.begin(), input.end()))
        {
            std::cerr << "parallel mergesorting failed\n";
        }

        hetcompute::runtime::shutdown();
        return 0;
    }

4.3.13.2 Asynchronous APIs

As stated previously, calling finish_after in a function implicitly makes the function asynchronous. In some cases, e.g., when the function is set to finish_after a single task, the asynchronous nature of the function may be made explicit by modifying it to return that task, instead of calling finish_after. This results in a lightweight, asynchronous, non-blocking API. For illustration, first consider the synchronous sequential implementation below:

    // synchronous sequential implementation
    size_t
    fibonacci_seq(size_t n)
    {
        if (n < 2)
        {
            return n;
        }
        else
        {
            auto a = fibonacci_seq(n - 1);
            auto b = fibonacci_seq(n - 2);
            return a + b;
        }
    }

This is trivially converted into the following fully asynchronous implementation in three easy steps:

1. Convert sequential function calls into task launches (hetcompute::launch(...))

2. Change return type of function to hetcompute::task_ptr<size_t>

3. Convert integer n into a value task using hetcompute::create_value_task<size_t>

    // fully asynchronous non-blocking implementation
    hetcompute::task_ptr<size_t>
    fibonacci(size_t n)
    {
        if (n < 2)
        {
            return hetcompute::create_value_task<size_t>(n);
        }
        else
        {
            // task_ptr collapsing
            // typeof(a) is task_ptr<size_t>, not task_ptr<task_ptr<size_t>>
            auto a = hetcompute::launch(fibonacci, n - 1);
            auto b = hetcompute::launch(fibonacci, n - 2);
            return a + b; // task algebra
        }
    }

The snippet illustrates how Task-Pointer Collapsing and Algebraic Operations on Tasks are synergistically combined to enable the asynchronous fibonacci API. Note the close correspondence with the synchronous sequential implementation shown above. The HetCompute API enables the programmer to easily and elegantly express the concurrency in the algorithm.

As a performance optimization, the programmer may coarsen the size of tasks so that small Fibonacci terms are computed sequentially while large ones are computed in parallel. Below is the full example with both the optimized and unoptimized versions.

    #include <iostream>
    #include <sstream>

    #include <hetcompute/hetcompute.hh>

    // synchronous sequential Fibonacci API
    size_t fibonacci_seq(size_t n);

    // asynchronous Fibonacci API
    hetcompute::task_ptr<size_t> fibonacci(size_t n);
    hetcompute::task_ptr<size_t> fibonacci_opti(size_t n);

    // synchronous sequential implementation
    size_t
    fibonacci_seq(size_t n)
    {
        if (n < 2)
        {
            return n;
        }
        else
        {
            auto a = fibonacci_seq(n - 1);
            auto b = fibonacci_seq(n - 2);
            return a + b;
        }
    }

    // fully asynchronous non-blocking implementation
    hetcompute::task_ptr<size_t>
    fibonacci(size_t n)
    {
        if (n < 2)
        {
            return hetcompute::create_value_task<size_t>(n);
        }
        else
        {
            // task_ptr collapsing
            // typeof(a) is task_ptr<size_t>, not task_ptr<task_ptr<size_t>>
            auto a = hetcompute::launch(fibonacci, n - 1);
            auto b = hetcompute::launch(fibonacci, n - 2);
            return a + b; // task algebra
        }
    }

    // optimized asynchronous non-blocking version that dispatches to sequential
    // implementation for small Fibonacci terms
    const size_t GRANULARITY = 16;
    hetcompute::task_ptr<size_t>
    fibonacci_opti(size_t n)
    {
        if (n < GRANULARITY)
        {
            return hetcompute::create_value_task<size_t>(fibonacci_seq(n));
        }
        else
        {
            // task_ptr collapsing
            // typeof(a) is task_ptr<size_t>, not task_ptr<task_ptr<size_t>>
            auto a = hetcompute::launch(fibonacci_opti, n - 1);
            auto b = hetcompute::launch(fibonacci_opti, n - 2);
            return a + b; // task algebra
        }
    }

    // e.g., ./hetcompute_examples_async_fibonacci 30
    int
    main(int argc, char* argv[])
    {
        hetcompute::runtime::init();

        size_t n_def = 20;
        size_t n     = n_def;

        if (argc >= 2)
        {
            std::istringstream istr(argv[1]);
            istr >> n;
        }

        // fibonacci_opti is typically much faster than fibonacci
        // std::cout << "Fibonacci term " << n << " is " << fibonacci(n)->copy_value() << std::endl;
        std::cout << "Fibonacci term " << n << " is " << fibonacci_opti(n)->copy_value() << std::endl;

        hetcompute::runtime::shutdown();
        return 0;
    }
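To get a feel for what coarsening saves, the launches can simply be counted. The helper below is a hypothetical, standard C++ calculation (count_launches is not part of HetCompute): it mirrors the recursion shape of fibonacci and fibonacci_opti and counts only the hetcompute::launch calls each would make, ignoring the additional tasks created by task algebra.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many task launches a recursive Fibonacci of the shape shown above
// would perform for term n, when terms below `granularity` are handled without
// launching anything. Pure arithmetic: no tasks are created here.
std::size_t count_launches(std::size_t n, std::size_t granularity)
{
    assert(granularity >= 2); // keeps n - 1 and n - 2 from wrapping around
    if (n < granularity)
    {
        return 0; // base case: value task only, no launch
    }
    // two launches at this level, plus whatever the two subproblems launch
    return 2 + count_launches(n - 1, granularity) + count_launches(n - 2, granularity);
}
```

With these assumptions, count_launches(20, 2) corresponds to fibonacci(20) and yields 21890 launches, while count_launches(20, 16) corresponds to fibonacci_opti(20) with GRANULARITY = 16 and yields 24.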

4.3.13.3 Cancellation

As discussed in Cancellation, a task or group may be canceled. Consequently, any task registered to finish_after the canceled task or group may be subject to cancellation.

    #include <hetcompute/hetcompute.hh>

    void foo();
    void bar();

    void
    foo()
    {
        auto tl = hetcompute::launch([] {
            while (true)
            {
                hetcompute::abort_on_cancel();
                // do something
            }
        });
        tl->finish_after();
        // do something
        tl->cancel();
    }

    void
    bar()
    {
        // do something
    }

    int
    main()
    {
        hetcompute::runtime::init();
        auto t1 = hetcompute::create_task(foo);
        auto t2 = hetcompute::create_task(bar);

        t1->then(t2);

        t1->launch();
        t2->launch();

        try
        {
            t2->wait_for();
        }
        catch (const hetcompute::canceled_exception&)
        {
            HETCOMPUTE_ILOG("t2 was canceled");
        }
        catch (...)
        {
            // Never reached
        }
        hetcompute::runtime::shutdown();

        return 0;
    }

Referring to the above example, task t1 launches task tl, registers itself as finishing after task tl, and subsequently cancels task tl. As a result, task t1 is itself canceled by the HetCompute runtime system and cancellation is propagated to successors of task t1, which in this case is only task t2.

Canceling a task does not result in cancellation of other tasks or groups it is registered to finish_after. As the example below shows, canceling task t does not cancel tl that t is registered to finish_after.

    #include <atomic>
    #include <cassert>

    #include <hetcompute/hetcompute.hh>

    void foo();

    hetcompute::task_ptr<> tl;
    std::atomic<bool>      tl_running(false);
    std::atomic<bool>      stop_tl(false);

    void
    foo()
    {
        tl = hetcompute::launch([] {
            while (!stop_tl)
            {
                hetcompute::abort_on_cancel();
                tl_running = true;
                // do something
            }
        });
        tl->finish_after();
    }

    int
    main()
    {
        hetcompute::runtime::init();
        auto t = hetcompute::launch(foo);

        // Spin until tl has started running
        while (!tl_running)
        {
        }

        t->cancel(); // Does not cancel tl

        stop_tl = true;

        try
        {
            t->wait_for();
        }
        catch (const hetcompute::canceled_exception&)
        {
            // Will never reach here since t->cancel is issued only after task t starts
            // running and task t never acknowledges cancellation
        }
        catch (...)
        {
            // Never reached
        }

        assert(!tl->canceled());
        HETCOMPUTE_ILOG("tl was not canceled");

        tl.reset();

        hetcompute::runtime::shutdown();
        return 0;
    }
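The two examples in this section describe a one-way rule, which the following toy model tries to make explicit. It is a single-threaded sketch in standard C++ with hypothetical names (cancelable_task, register_finish_after), not the HetCompute runtime. As a simplification it always propagates cancellation from a canceled task to the tasks registered on it, whereas the text only says such tasks may be subject to cancellation.

```cpp
#include <cassert>
#include <vector>

// Toy model (NOT the HetCompute runtime) of the direction in which
// cancellation travels across a finish_after relationship.
struct cancelable_task
{
    bool canceled = false;
    std::vector<cancelable_task*> registrants; // tasks registered to finish after this one

    // Models this task calling target->finish_after() from its body.
    void register_finish_after(cancelable_task& target)
    {
        assert(&target != this);
        target.registrants.push_back(this);
    }

    void cancel()
    {
        if (canceled)
        {
            return; // already canceled, nothing more to propagate
        }
        canceled = true;
        for (cancelable_task* r : registrants)
        {
            r->cancel(); // reaches tasks that finish after this one
        }
    }
};

// First example: canceling the target also cancels the registered task.
bool target_cancel_reaches_registrant()
{
    cancelable_task t1, tl;
    t1.register_finish_after(tl);
    tl.cancel();
    return tl.canceled && t1.canceled;
}

// Second example: canceling the registered task leaves the target alone.
bool registrant_cancel_spares_target()
{
    cancelable_task t, tl;
    t.register_finish_after(tl);
    t.cancel();
    return t.canceled && !tl.canceled;
}
```

Both helper functions return true in this model, matching the direction of propagation described for the two HetCompute examples.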

4.3.13.4 Summary

The non-blocking APIs discussed above, including hetcompute::task<>::finish_after, hetcompute::group::finish_after, hetcompute::task<ReturnType(Args...)>::bind_all, and algebraic operators on hetcompute::task_ptrs, enable the design and composition of asynchronous applications in a modular fashion. Furthermore, in applications with many outstanding wait_for calls, the performance gains from using finish_after in place of wait_for can be very significant. Application developers are encouraged to use this approach to implement highly responsive parallel applications.

4.4 Buffers

Basic Usage of Buffers

Using Buffers with Tasks

Synchronized and Concurrent Use

Creating Buffers

Performance and Storage Optimizations When Using Buffers

Memory Regions

4.4.1 Basic Usage of Buffers

The HetCompute buffers API provides the user with a runtime-managed heterogeneous data structure. Tasks on the CPU, GPU, and DSP can share data using a HetCompute buffer. A HetCompute buffer is a contiguous array of a user-defined data-type T. Each buffer can have one or more buffer pointers of type hetcompute::buffer_ptr<T> or hetcompute::buffer_ptr<const T> pointing to it. The buffer is ref-counted: the HetCompute runtime will automatically deallocate the buffer when there are no more buffer pointers pointing to it. hetcompute::buffer_ptr<T> allows mutable access to the buffer data, while hetcompute::buffer_ptr<const T> allows only immutable access (similar to int* versus const int* access to an instance of int). The following code illustrates the most basic API call for the creation of buffers of an intended number of elements.

    hetcompute::buffer_ptr<float> b1 = hetcompute::create_buffer<float>(100);

    hetcompute::buffer_ptr<const float> b2 = hetcompute::create_buffer<const float>(100);

    hetcompute::buffer_ptr<const float> b3 = hetcompute::create_buffer<float>(100);
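The relationship between the mutable and const buffer pointers resembles that between std::shared_ptr<T> and std::shared_ptr<const T>. The sketch below uses only the standard library as an analogy (mutable_ptr, const_ptr, and make_storage are hypothetical names); it models reference counting and read-only access, not the device backing stores that HetCompute manages.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-ins built from the standard library (NOT hetcompute::buffer_ptr).
using mutable_ptr = std::shared_ptr<std::vector<float>>;
using const_ptr   = std::shared_ptr<const std::vector<float>>;

// Allocates n zero-initialized floats behind a reference-counted pointer.
mutable_ptr make_storage(std::size_t n)
{
    return std::make_shared<std::vector<float>>(n);
}

// Writes through the mutable pointer and reads the same element back through
// a const pointer that shares the storage.
float write_then_read(std::size_t n)
{
    assert(n > 0);
    mutable_ptr writer = make_storage(n);
    const_ptr   reader = writer;     // like buffer_ptr<float> -> buffer_ptr<const float>
    assert(writer.use_count() == 2); // two pointers, one allocation
    (*writer)[0] = 1.5f;
    // (*reader)[0] = 2.0f;          // would not compile: read-only access
    return (*reader)[0];
}
// The storage is freed when the last pointer (writer or reader) goes away.
```

The commented-out line marks the kind of write that a const pointer is meant to reject at compile time.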

The runtime transparently manages the movement of the buffer data between specialized device-specific backing stores. For example, the runtime allocates ION memory as the backing store for the optimal sharing of buffer data between the CPU and DSP. Similarly, the runtime uses an OpenCL buffer as the backing store to synchronize the buffer data between the CPU and GPU. If the OpenCL library supports shared virtual memory (SVM), HetCompute uses SVM as the backing memory instead of cl_mem. Additionally, the runtime tries to take advantage of any available advance knowledge of which devices may access a buffer to optimize the allocation of backing stores from specialized device memories and to minimize the copying of data between the backing stores.

Please also refer to Textures for a GPU-only data structure suitable for image data.

There are four entities that may access a buffer’s data.

1. A CPU task

2. A GPU task

3. A DSP task

4. CPU host code

Task access: A task may access a buffer by taking the corresponding buffer pointer as an argument. A task may access the buffer as an input, an output, or input-output, referred to as the direction of access. Recall that a task is created using a device-specific kernel (hetcompute::cpu_kernel, hetcompute::gpu_kernel, and hetcompute::dsp_kernel). The kernel’s signature may explicitly declare the direction for each buffer pointer parameter or the direction may be implicitly inferred based on the mutability of the buffer pointer (hetcompute::buffer_ptr<T> versus hetcompute::buffer_ptr<const T>).

Note that a CPU task may be created directly with a lambda, functor, or function parameter without involving a CPU kernel. For such a CPU task, the access directions of any buffer pointer arguments are inferred implicitly from the mutability of the buffer pointer parameters to the lambda, functor, or function.
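Implicit inference from mutability can be pictured as a compile-time mapping from parameter type to direction. The sketch below is an assumption-laden illustration in standard C++: buf_ptr, direction, and inferred_direction are hypothetical names, and the code does not claim to show how HetCompute implements the inference. It encodes the rule used in this guide's examples, where a pointer to const data is treated as input and a pointer to mutable data as input-output.

```cpp
// Access directions that can be inferred without an explicit annotation.
enum class direction
{
    in,
    inout
};

// Hypothetical stand-in for hetcompute::buffer_ptr<T>; only the type matters.
template <typename T>
struct buf_ptr
{
};

// Primary template left undefined: only buf_ptr parameters have a direction.
template <typename Param>
struct inferred_direction;

// Mutable element type: the task may read and write, so input-output.
template <typename T>
struct inferred_direction<buf_ptr<T>>
{
    static constexpr direction value = direction::inout;
};

// Const element type: the task can only read, so input.
template <typename T>
struct inferred_direction<buf_ptr<const T>>
{
    static constexpr direction value = direction::in;
};

static_assert(inferred_direction<buf_ptr<const int>>::value == direction::in,
              "const buffer pointer is inferred as input");
static_assert(inferred_direction<buf_ptr<int>>::value == direction::inout,
              "mutable buffer pointer is inferred as input-output");
```

Note that no output-only direction can be inferred this way, which is consistent with the explicit hetcompute::out wrapper being a separate mechanism.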

Host code access: The application code on the CPU may directly access a buffer’s data using its buffer pointer. A host code access refers to any access from the application code that is either not enclosed within a task or uses a buffer pointer that is not a parameter to the enclosing task.

The following example illustrates the difference between task and host code access.

    auto b1 = hetcompute::create_buffer<int>(3);
    auto b2 = hetcompute::create_buffer<int>(3);

    auto t = hetcompute::launch(
        [=](hetcompute::buffer_ptr<int> x) {
            // This is *task access* to b1's buffer
            // via task parameter x.
            for (size_t i = 0; i < x.size(); i++)
                x[i] = int(i);

            b2.acquire_wi();
            // This is *host code access* to b2's buffer.
            for (size_t i = 0; i < b2.size(); i++)
                b2[i] = 1000 + int(i);
            b2.release();
        },
        b1);

    t->wait_for();

    // This is host code access to b1's buffer.
    b1.acquire_ro();
    for (size_t i = 0; i < b1.size(); i++)
        printf("b1[%zu]=%d\n", i, b1[i]);
    b1.release();

    // This is host code access to b2's buffer.
    b2.acquire_ro();
    for (size_t i = 0; i < b2.size(); i++)
        printf("b2[%zu]=%d\n", i, b2[i]);
    b2.release();

Please see Using Buffers with Tasks for more details.

4.4.2 Using Buffers with Tasks

The following sections elaborate on the use of buffers with tasks on each device type.

4.4.2.1 Buffers with CPU Tasks

The following example illustrates buffer access by a CPU task created directly from a user function. Note that the access directions are implicitly inferred from the mutability of the corresponding buffer pointer parameters.

    void user_function(hetcompute::buffer_ptr<const int> x, // x is input only
                       hetcompute::buffer_ptr<int>       y) // y is input-output
    {
        for (size_t i = 0; i < x.size(); i++)
            y[i] = x[i] * 2;
    }

    int
    main()
    {
        hetcompute::runtime::init();
        auto b1 = hetcompute::create_buffer<int>(10);
        auto b2 = hetcompute::create_buffer<int>(10);

        b1.acquire_wi();
        for (size_t i = 0; i < b1.size(); i++)
            b1[i] = int(i);
        b1.release();

        // launch a CPU task with a user-function:
        // b1 is inferred as input
        // b2 is inferred as input-output
        auto t = hetcompute::launch(user_function, b1, b2);
        t->wait_for();
        // elements of b2 are now double the corresponding elements of b1

        b2.acquire_ro();
        for (size_t i = 0; i < b2.size(); i++)
            HETCOMPUTE_ILOG("b2[%zu]=%d", i, b2[i]);
        b2.release();

        hetcompute::runtime::shutdown();
        return 0;
    }

The following example illustrates buffer access by a CPU task created using a CPU kernel.

    void user_function(hetcompute::buffer_ptr<const int> x, // x is input only
                       hetcompute::buffer_ptr<int>       y) // y is input-output
    {
        for (size_t i = 0; i < x.size(); i++)
            y[i] = x[i] * 2;
    }

    int
    main()
    {
        hetcompute::runtime::init();

        auto b1 = hetcompute::create_buffer<int>(10);
        auto b2 = hetcompute::create_buffer<int>(10);

        b1.acquire_wi();
        for (size_t i = 0; i < b1.size(); i++)
            b1[i] = int(i);
        b1.release();

        // The CPU kernel infers the access directions of the
        // buffer parameters from the user function's signature.
        auto ck = hetcompute::create_cpu_kernel(user_function);

        // launch a CPU task with a CPU kernel:
        // b1 is inferred as input
        // b2 is inferred as input-output
        auto t = hetcompute::launch(ck, b1, b2);
        t->wait_for();
        // elements of b2 are now double the corresponding elements of b1

        b2.acquire_ro();
        for (size_t i = 0; i < b2.size(); i++)
            HETCOMPUTE_ILOG("b2[%zu]=%d", i, b2[i]);
        b2.release();

        hetcompute::runtime::shutdown();
        return 0;
    }

In the examples above, the user function accessed the buffer data by indexing the buffer pointer as an array. The host code accesses the buffer data in a similar manner. The host code and CPU tasks may also request a pointer to manipulate the entire contents of the buffer, as shown below.

    auto b = hetcompute::create_buffer<int>(100);

    void*  ptr           = b.host_data();
    size_t size_in_bytes = b.size() * sizeof(int);

    // manipulate the size_in_bytes bytes starting at ptr directly.
    // The semantics of accessing the ptr data are the same
    // as host code access on the buffer_ptr b

Limitation: The HetCompute SDK does not yet support explicit specification of access directions with CPU kernels. The access directions can only be implicitly inferred.

4.4.2.2 Buffers with GPU Tasks

The following example illustrates creation of a GPU task with implicitly inferred access directions, similar to the CPU task examples above.

    // Create a string containing OpenCL C kernel code.
    #define OCL_KERNEL(name, k) std::string const name##_string = #k

    OCL_KERNEL(vdouble_kernel, __kernel void vdouble(__global float* A, __global float* B) {
        unsigned int i = get_global_id(0);
        B[i] = 2.0 * A[i];
    });

    int
    main()
    {
        hetcompute::runtime::init();
        auto buf_a = hetcompute::create_buffer<float>(3);
        auto buf_b = hetcompute::create_buffer<float>(buf_a.size());

        // Initialize the input
        buf_a.acquire_wi();
        for (size_t i = 0; i < buf_a.size(); ++i)
            buf_a[i] = i;
        buf_a.release();

        // Create a kernel object
        auto gpu_vdouble = hetcompute::create_gpu_kernel<hetcompute::buffer_ptr<const float>, // inferred as in direction
                                                         hetcompute::buffer_ptr<float>>       // inferred as inout direction
            (vdouble_kernel_string, "vdouble");

        auto gpu_task = hetcompute::launch(gpu_vdouble,
                                           hetcompute::range<1>(buf_a.size()),
                                           buf_a,  // accessed as 'in'
                                           buf_b); // accessed as 'inout'

        gpu_task->wait_for();

        buf_b.acquire_ro();
        for (size_t i = 0; i < buf_b.size(); i++)
            HETCOMPUTE_ILOG("buf_b[%zu] = %f", i, buf_b[i]);
        buf_b.release();

        hetcompute::runtime::shutdown();
    }

Access directions may be explicitly specified by wrapping the buffer pointer template parameters of the kernel with hetcompute::in, hetcompute::out, and hetcompute::inout, as illustrated below.

// Create a kernel object
auto gpu_vdouble = hetcompute::create_gpu_kernel<hetcompute::in<hetcompute::buffer_ptr<float>>,   // explicit in direction
                                                 hetcompute::out<hetcompute::buffer_ptr<float>>>  // explicit out direction
    (vdouble_kernel_string, "vdouble");

auto gpu_task = hetcompute::launch(gpu_vdouble,
                                   hetcompute::range<1>(buf_a.size()),
                                   buf_a,  // accessed as 'in'
                                   buf_b); // accessed as 'out'

Note

A GPU kernel may be created using either an OpenCL C function or an OpenGL ES compute shader. However, buffers interact with GPU kernels of either type in an identical manner. See GPU kernels for OpenCL and OpenGL ES for an example program that uses a kernel created from an OpenGL ES compute shader with buffers.

4.4.2.3 Buffers with DSP Tasks

Consider a DSP function with the following IDL signature. The IDL signature explicitly identifies the in and out access directions for the parameters.

long array_is_prime(in sequence<long> numbers, rout sequence<long> primes);

HetCompute recognizes the in and out access directions coming from the IDL signature when a hetcompute::dsp_kernel instance is created from the DSP function, as illustrated in the following example.

// dsp kernel creation
auto hex_kernel = create_dsp_kernel_by_domain<const int*, int, int*, int>(adsp_domain_handle,
                                                                         "hetcompute_dsp_array_is_prime");
// Set DSP Kernel attribute to aDSP
hex_kernel.set_adsp();

// create the dsp task that will be executed on the DSP
auto hex_task = hetcompute::create_task(hex_kernel,
                                        in_buf,   // in access recognized
                                        out_buf); // out access recognized


4.4.3 Synchronized and Concurrent Use

4.4.3.1 Synchronized Access to Buffers Across Host Code and Tasks

When a task completes execution, HetCompute no longer automatically synchronizes all the buffers accessed by the task. The host code must explicitly synchronize to access the buffer data updated by the task (see Host Access, Explicit Synchronization with Host Code).

HetCompute has deprecated the use of buffer_mode::synchronized and buffer_mode::relaxed during buffer creation. Instead, all buffers are created with the semantics of relaxed mode.

4.4.3.2 Concurrent Access by Tasks

The HetCompute runtime allows multiple tasks to concurrently access a buffer provided they access the buffer only as input. The runtime ensures that a task accessing the buffer as output or as input-output does not execute concurrently with other tasks accessing that buffer. HetCompute likewise disallows concurrent host access to a buffer while the buffer is being modified: a host acquisition blocks while a concurrent task/pattern has acquired the buffer for read-write or write-invalidate access. In rare situations, the acquisition may also block when a concurrent task/pattern has read-only access but HetCompute is unable to synchronize the buffer data for host access until the concurrent task/pattern completes.
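The access rule described here is the familiar multiple-readers/single-writer discipline. The following is a minimal sketch of that discipline in standard C++17 using std::shared_mutex. It does not use any HetCompute API; the guarded_buffer class and its member names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for a buffer whose accesses are serialized the way
// the text describes: many concurrent readers, or exactly one writer.
class guarded_buffer
{
public:
    explicit guarded_buffer(std::size_t n) : _data(n, 0)
    {
        assert(n > 0); // an empty buffer is not useful for this sketch
    }

    // "Input" access: any number of callers may hold the shared lock at once.
    long sum() const
    {
        std::shared_lock<std::shared_mutex> lock(_mutex);
        long total = 0;
        for (int v : _data)
            total += v;
        return total;
    }

    // "Output" or "input-output" access: the exclusive lock waits until all
    // readers and any other writer have finished.
    void add_to_all(int delta)
    {
        std::unique_lock<std::shared_mutex> lock(_mutex);
        for (int& v : _data)
            v += delta;
    }

private:
    mutable std::shared_mutex _mutex;
    std::vector<int>          _data;
};
```

In this model, calls to sum() from several threads may overlap, whereas a call to add_to_all() never overlaps with any other access.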

4.4.4 Creating Buffers

A buffer can be created in three basic ways. Each of the ways may take additional parameters covered in Providing Device Hints.

4.4.4.1 With Storage Fully Managed by HetCompute

The user specifies the datatype and number of elements needed in the buffer. HetCompute internally manages the allocation of all the storage needed for the buffer.

auto b = hetcompute::create_buffer<int>(100);

4.4.4.2 With User-provided Initial Storage and Data

The initial storage and data for the buffer can be provided by the user. The HetCompute runtime may allocate additional backing stores as needed, and will handle the synchronization between the user-provided storage and any internal backing stores.

// user creates storage
std::vector<int> v;
for (int i = 0; i < 100; i++)
    v.push_back(i);

// create buffer with initial storage:
// v will serve as the main-memory backing store for the buffer
auto b = hetcompute::create_buffer(v.data(), v.size());

4.4.4.3 With a Memory Region

hetcompute::memregion allows the user to allocate specialized memory or create inter-operability with data from other frameworks (see Memory Regions).

The user may create a buffer from a previously created memory region. The memory region may also contain initial data for the buffer.

For example, the following code creates a buffer with an ION memregion:

// user creates storage in a specialized "memory region" -- ION memory in this case
hetcompute::ion_memregion imr(100 * sizeof(int)); // allocate storage in ION memory

// user optionally initializes the data
int* p = imr.get_ptr();
for (int i = 0; i < 100; i++)
    p[i] = i;

// create buffer with specialized storage:
// imr will serve as the ION backing store for the buffer
auto b = hetcompute::create_buffer<int>(imr);

4.4.5 Performance and Storage Optimizations When Using Buffers

The runtime tries to take advantage of any advance knowledge of which devices may access a buffer's data. The knowledge may be gained from HetCompute's internal scheduling graph when the user creates tasks for specific devices, and also directly from the user as hints provided at the time of buffer creation. HetCompute uses the current best knowledge to judiciously allocate limited resources (such as ION memory), and minimizes the use of data copy and synchronization steps between specialized backing stores.

For example, if it is known up-front that the buffer will be accessed by a DSP task, HetCompute can allocate ION memory at the time of buffer creation even if the first tasks accessing the buffer run only on the CPU and GPU. The allocation of ION as the initial backing store eliminates all data copies between devices. However, if it was not known up-front that the buffer would be accessed by a Hexagon task, the runtime would initially allocate the backing store from the much cheaper system main memory. Later, the execution of a Hexagon task would force the allocation of an ION backing store, followed by data copies between the main memory and the ION backing stores.

As a second example, if the CPU is not expected to access the buffer data, the allocation of the main memory backing store can be skipped entirely, reducing the number of backing stores whose contents have to be kept synchronized.

4.4.5.1 Explicit Synchronization with Host Code

HetCompute v1.0 introduces the following host synchronization calls in the buffer_ptr class:

1. acquire_ro(): The host code gains read-only access. Results from a prior task become visible to the host code.

2. acquire_rw(): The host code gains read-write access. Results from a prior task become visible to the host code. Modifications to the buffer data by the host code will be made visible to any subsequent tasks.

3. acquire_wi(): The host code invalidates (clobbers) the prior contents of the buffer. Results from a prior task may be lost. The entire contents of the buffer should be treated as undefined, save for the new contents written by the host code subsequent to this synchronization call. It is valid for the host code to read back any new contents written by the host code subsequent to this call. The new contents of the buffer will be made visible to subsequent tasks.

The buffer synchronization allows the host code to access the buffer data updated by the task (see Host Access). acquire_ro() acquires the underlying buffer for read-only access by the host code. The host code may also modify the buffer data by acquiring the underlying buffer for write access using the acquire_wi() or acquire_rw() buffer APIs.


All acquire_*() calls block until any conflicting operations complete (e.g., a task concurrently performing read-write access to the buffer), after which the buffer is acquired for access by the host code and the call unblocks. However, if the buffer has already been acquired for the host code by a preceding acquire_*() call, the call returns immediately.

The host code may recursively acquire the buffer using a combination of acquire_ro(), acquire_wi() and acquire_rw() calls. The first acquire_*() call establishes the access type (read-only, write-invalidate, or read-write) of the buffer for the host code. Subsequent recursive acquire_*() calls succeed only if they are compatible with the previously established access type. Subsequent recursive acquire_wi() and acquire_rw() calls return with failure if the first acquisition was acquire_ro(), as the access type of these calls is incompatible with the established read-only access. However, any subsequent recursive acquire_*() call succeeds if the first acquisition was either write-invalidate or read-write. When the established access type is write-invalidate, subsequent recursive read-only or read-write acquisitions are considered to get access to any data written to the buffer after the original write-invalidate. When the established access type is read-write, a subsequent recursive write-invalidate does not destroy any prior data, as there is no additional synchronization required between device memories to access the latest data.

The host code releases the buffer only when the number of release() calls equals the number of successful recursive acquire_*() calls.

Note that access by concurrent threads of the host code is also considered recursive, even when the acquire-release calls do not properly nest across threads. The first acquire by any one thread establishes the host access type for all threads of the host code, until the host code releases.

// Relaxed host synchronization:
// Select during buffer creation to get better performance
auto buf_a = hetcompute::create_buffer<float>(3);
auto buf_b = hetcompute::create_buffer<float>(buf_a.size());

buf_a.acquire_wi();
// Initialize the input
for (size_t i = 0; i < buf_a.size(); ++i)
    buf_a[i] = i;
buf_a.release();

// Create a kernel object
auto gpu_vdouble = hetcompute::create_gpu_kernel<hetcompute::in<hetcompute::buffer_ptr<float>>,   // explicit in direction
                                                 hetcompute::out<hetcompute::buffer_ptr<float>>>  // explicit out direction
    (vdouble_kernel_string, "vdouble");

// Execute the GPU kernel
auto gpu_task1 = hetcompute::launch(gpu_vdouble,
                                    hetcompute::range<1>(buf_a.size()), buf_a, buf_b);
gpu_task1->wait_for();

buf_b.acquire_ro();
for (size_t i = 0; i < buf_b.size(); i++)
    HETCOMPUTE_ILOG("buf_b[%zu] = %f", i, buf_b[i]);
buf_b.release();

buf_a.acquire_rw();
// Read buf_a
for (size_t i = 0; i < buf_a.size(); i++)
    HETCOMPUTE_ILOG("buf_a[%zu] = %f", i, buf_a[i]);

// Read and modify buf_a
for (size_t i = 0; i < buf_a.size(); ++i)
    buf_a[i] = buf_a[i] + 5;
buf_a.release();

// Execute the GPU kernel again
auto gpu_task2 = hetcompute::launch(gpu_vdouble,
                                    hetcompute::range<1>(buf_a.size()), buf_a, buf_b);
gpu_task2->wait_for();

// Relaxed host synchronization:
// Host explicitly requests permission to read buf_b.
buf_b.acquire_ro();
for (size_t i = 0; i < buf_b.size(); i++)
    HETCOMPUTE_ILOG("buf_b[%zu] = %f", i, buf_b[i]);
buf_b.release();

The buffer contents become undefined if the host code accesses the buffer data without explicit synchronization when the synchronization was required (such as after a task access), or if the host code performs accesses incompatible with the type of access chosen (e.g., writing to the buffer after invoking acquire_ro()).
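The recursive acquisition rules of this section (the first acquire establishes the access type, later acquires must be compatible with it, and releases must balance successful acquires) can be modelled with a small piece of bookkeeping. The sketch below is standard C++ and models only the compatibility and counting rules, not blocking or data movement. The host_acquire_state class and the access enumeration are hypothetical names, not part of the SDK.

```cpp
#include <cassert>

enum class access { none, read_only, write_invalidate, read_write };

// Hypothetical bookkeeping for host-side acquisition of a single buffer.
class host_acquire_state
{
public:
    // Returns true if the acquisition succeeds under the recursive rules.
    bool acquire(access requested)
    {
        assert(requested != access::none);
        if (_depth == 0)
        {
            _established = requested; // the first acquire fixes the access type
        }
        else if (_established == access::read_only && requested != access::read_only)
        {
            return false; // write requests are incompatible with read-only
        }
        // Any request is compatible with write-invalidate or read-write.
        ++_depth;
        return true;
    }

    // Returns true when this call actually releases the buffer from the host,
    // i.e. when releases have balanced all successful acquires.
    bool release()
    {
        assert(_depth > 0);
        if (--_depth == 0)
        {
            _established = access::none;
            return true;
        }
        return false;
    }

    access established() const { return _established; }

private:
    access _established = access::none;
    int    _depth       = 0;
};
```

For instance, in this model an acquire of read_write following an acquire of read_only fails, while an acquire of read_only following an acquire of write_invalidate succeeds and leaves the established type unchanged.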

4.4.5.2 Providing Device Hints

Each of the variants of hetcompute::create_buffer() optionally takes a list of devices likely to access the buffer. The upfront knowledge of likely devices allows the HetCompute runtime to allocate backing storage more optimally. For example, the total ION memory available on a platform tends to be much more limited than the size of the main memory. However, if a buffer will be accessed by the CPU, GPU, and Hexagon, allocating the buffer's backing store in ION memory ensures that tasks on each device can access the same ION backing store, and no copies or further allocation/deallocation of backing stores will be needed. If the likely-devices knowledge for a buffer is not available, the HetCompute runtime will allocate the least costly storage based on the known tasks: a GPU task will cause the allocation of an OpenCL AHP buffer as a backing store if OpenCL does not support SVM. If OpenCL supports SVM, a GPU task will cause the allocation of an OpenCL SVM buffer as a backing store, a CPU task will allocate a main memory backing store to which the data will be copied, and finally, when a Hexagon task executes, a separate ION backing store will be allocated and the data copied into it.

The following illustrates how to provide likely-device hints during buffer creation.

auto b = hetcompute::create_buffer<int>(100, {hetcompute::gpu, hetcompute::dsp});

The likely devices information is used merely as an optimization hint. If the information turns out to be partial or incorrect, the only penalty will be some avoidable backing store allocations and a performance hit due to the avoidable copying of the buffer data between the backing stores.

4.4.6 Memory Regions

hetcompute::memregion allows abstract allocation of specialized device memory regions (such as ION memory) and easy inter-operability with data allocated in other frameworks (such as the use of existing OpenGL buffers or pre-allocated ION memory with HetCompute). hetcompute::memregion provides RAII semantics over the specialized memory or framework data it is wrapping: the user constructs the object to allocate the corresponding memory or set up the interop, and controls the lifetime of the object to control the lifetime of the allocated memory or interop.

Currently, HetCompute provides the following four kinds of memory regions:

1. Main memory mem-region hetcompute::main_memregion: Allows a convenient mechanism for allocating aligned memory.

2. OpenCL SVM Memory mem-region hetcompute::svm_memregion: Creates a wrapper around an existing or new OpenCL SVM buffer.

3. ION memory mem-region hetcompute::ion_memregion: Allocates ION memory. The user can choose if the memory will be cacheable (default) or non-cacheable. The user may also wrap pre-allocated ION memory into a hetcompute::ion_memregion.

4. GL Buffer interop mem-region hetcompute::glbuffer_memregion: Creates a wrapper around an existing OpenGL buffer.

The above four specializations derive from the hetcompute::memregion base class. A mem-region can be passed to hetcompute::create_buffer() to create a buffer with the mem-region as a backing store (see Creating Buffers). It is the user's responsibility to ensure that the backing mem-region of a buffer is kept alive while the buffer itself is going to be accessed. The user is not required to keep the mem-region object alive beyond the point of last access to the buffer.
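The RAII idea behind memory regions (construct the object to allocate, let the object's lifetime bound the lifetime of the memory) can be sketched in standard C++17 as follows. This is not the SDK's main_memregion; the aligned_region class is a hypothetical illustration, and only its accessor names are borrowed from the get_ptr()/get_num_bytes() calls used elsewhere in this guide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical RAII wrapper around aligned main memory.
class aligned_region
{
public:
    aligned_region(std::size_t num_bytes, std::size_t alignment)
        : _ptr(nullptr), _num_bytes(num_bytes), _alignment(alignment)
    {
        // Alignment must be a non-zero power of two.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        _ptr = ::operator new(num_bytes, std::align_val_t(alignment));
    }

    // The memory lives exactly as long as the object does.
    ~aligned_region() { ::operator delete(_ptr, std::align_val_t(_alignment)); }

    // Non-copyable: exactly one object owns the allocation.
    aligned_region(aligned_region const&)            = delete;
    aligned_region& operator=(aligned_region const&) = delete;

    void*       get_ptr() const { return _ptr; }
    std::size_t get_num_bytes() const { return _num_bytes; }

private:
    void*       _ptr;
    std::size_t _num_bytes;
    std::size_t _alignment;
};
```

Any consumer holding the raw pointer must stop using it before the aligned_region object is destroyed, which mirrors the requirement that a mem-region outlive the last access to a buffer created from it.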

4.5 Textures

Texture APIs in HetCompute are useful for image processing tasks. These APIs allow the user to create image objects from data residing in host memory and provide them to a kernel for processing. These APIs also significantly simplify the programming of parallel image processing, such as filtering.

When a user is processing 2D or 3D image data, HetCompute texture APIs are useful as they provide a multitude of image formats, filtering modes and addressing modes for accessing and manipulating pixel data in a GPU kernel effectively.

All APIs are accessible by including the following header:

#include <hetcompute/texture.hh>

Note that HetCompute internally handles the initialization of device- and platform-dependent contexts, so programmers do not need to query or create these contexts themselves. To begin with, programmers can use the following code to create HetCompute textures directly:

input_tex = hetcompute::graphics::create_texture<img_format, 2>({ width, height },
                                                                 static_cast<unsigned char const*>(input_img_data));

output_tex = hetcompute::graphics::create_texture<img_format, 2>({ width, height }, output_img_data);

The hetcompute::graphics::create_texture API takes an image format, image dimensions, and a valid host pointer to raw pixel data in memory as inputs. We create an input texture and an output texture for our GPU filter example.

To use a HetCompute texture, a GPU kernel can be created as follows:

auto boxfilter_gpukernel =
    hetcompute::create_gpu_kernel<mytextureptrtype, mytextureptrtype, mysamplerptrtype>(source_string,
                                                                                        "box_filter");

The mytextureptrtype and mysamplerptrtype are the corresponding types of textures and samplers in HetCompute, which are defined as follows:

typedef hetcompute::graphics::texture_ptr<img_format, 2> mytextureptrtype;

typedef hetcompute::graphics::sampler_ptr<addr_mode, fil_mode> mysamplerptrtype;

Please note that source_string contains the actual OpenCL kernel code that takes textures as kernel function arguments and uses them. In addition, the template parameters must match the signature of the kernel function. The kernel source code provided in source_string is shown below:

__kernel void box_filter(__read_only image2d_t source,
                         __write_only image2d_t dest,
                         sampler_t sampler)
{
    // image dimensions
    int img_width  = get_global_size(0);
    int img_height = get_global_size(1);

    int2 out_coord = (int2) ( get_global_id(0), get_global_id(1) );

    if( out_coord.x < img_width && out_coord.y < img_height )
    {
        int2 in_coord = out_coord;

        // sample an 8x8 region and average the results
        float4 sum = 0.0f;
        for( int i = 0; i < 8; i++ )
        {
            for( int j = 0; j < 8; j++ )
            {
                sum += read_imagef(source, sampler, in_coord + (int2) (i - 4, j - 4));
            }
        }

        // compute the average
        float4 avg_color = sum / 64.0f;
        // write the result to the output image
        write_imagef( dest, out_coord, avg_color );
    }
}

The kernel above applies an 8-by-8 box filter to an input image and generates a smoothed output image.
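When validating the GPU result, a CPU reference of the same filter can be handy. The following is a minimal sketch in standard C++17 for a single-channel float image stored row-major. It assumes that out-of-range reads are clamped to the nearest edge pixel, whereas the real behavior depends on the addressing mode of the sampler passed to the kernel; the function name box_filter_reference is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the 8x8 box filter (single channel, row-major layout).
// Assumption: clamp-to-edge addressing for reads outside the image.
std::vector<float> box_filter_reference(std::vector<float> const& src, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            float sum = 0.0f;
            // Same window as the kernel: offsets -4..3 in both dimensions.
            for (int i = 0; i < 8; ++i)
            {
                for (int j = 0; j < 8; ++j)
                {
                    int const sx = std::clamp(x + i - 4, 0, width - 1);
                    int const sy = std::clamp(y + j - 4, 0, height - 1);
                    sum += src[static_cast<std::size_t>(sy) * width + sx];
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum / 64.0f;
        }
    }
    return dst;
}
```

Note that the window is not centered: it covers four pixels to the left of and above the output pixel and three to the right and below, exactly as the (i - 4, j - 4) offsets in the kernel do.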

The kernel can be executed in the style of general HetCompute tasks:

// launch GPU kernel over 2D range
hetcompute::range<2> r(0, width, 0, height);

auto t = hetcompute::create_task(boxfilter_gpukernel, r, input_tex, output_tex, sampler);

t->launch();

t->wait_for();

Internally, the kernel call is directed to the OpenCL driver for execution.

The processed result image can be read back using the following hetcompute::graphics::map API:

// read back result to CPU
auto ptr = static_cast<unsigned char*>(hetcompute::graphics::map(output_tex));

if (ptr != output_img_data)
    HETCOMPUTE_FATAL("mapped addr does not match the original one.\n");

hetcompute::graphics::unmap(output_tex);

Please note that, as a sanity check, ptr should match the data pointer output_img_data that we used to create the output texture.

HetCompute also handles release of the OpenCL contexts and HetCompute texture objects. However, programmers should still call hetcompute::graphics::unmap to release the mapping between CPU memory and GPU memory, so the same HetCompute texture object can be reused for subsequent kernel calls.

4.5.1 QCOM Extended Image format

HetCompute supports creation of textures in the QCOM Extended Image formats, namely TP10, NV12, and P010. Both linear and UBWC variants of these formats are supported. These image formats are based on YUV formats and restrict direct writes to the parent plane image; however, they do support writes to derivative Y and UV planes.

One can create derivative planes by invoking hetcompute::graphics::create_derivative_texture and passing the parent texture, width and height. hetcompute::graphics::create_derivative_texture is a wrapper around newer OpenCL QCOM extensions to support QCOM extended formats and vector operations. The code snippet below shows a simple Compressed TP10 to Compressed TP10 copy using HetCompute parent textures and derivative textures.

hetcompute::ion_memregion* src_ion_mem = new hetcompute::ion_memregion(buffer_size, false);
memset(src_ion_mem->get_ptr(), 0, src_ion_mem->get_num_bytes());

hetcompute::ion_memregion* dst_ion_mem = new hetcompute::ion_memregion(buffer_size, false);
memset(dst_ion_mem->get_ptr(), 0, dst_ion_mem->get_num_bytes());

if (memcmp(src_ion_mem->get_ptr(), dst_ion_mem->get_ptr(), buffer_size) != 0)
{
    HETCOMPUTE_DLOG("Initial checking of memcmp is failing");
}

// Copy image data into source ion buffer
memcpy(src_ion_mem->get_ptr(), image.data(), image.size());

typedef hetcompute::graphics::texture_ptr<hetcompute::graphics::image_format::CompressedTP10unorm_int10, 2>
    mytextureptrtype;

typedef hetcompute::graphics::sampler_ptr<hetcompute::graphics::addressing_mode::ADDRESS_NONE,
                                          hetcompute::graphics::filter_mode::FILTER_NEAREST>
    mysamplerptrtype;

auto sampler = hetcompute::graphics::create_sampler<hetcompute::graphics::addressing_mode::ADDRESS_NONE,
                                                    hetcompute::graphics::filter_mode::FILTER_NEAREST>(false);

auto hetcomputegpukernel =
    hetcompute::create_gpu_kernel<mytextureptrtype, mytextureptrtype, mytextureptrtype, mysamplerptrtype>(
        default_source_string, "copy_tp10_yuv_image_to_y_image_and_uv_image");

// Create src parent texture for TP10 compressed
auto input_tex =
    hetcompute::graphics::create_texture<hetcompute::graphics::image_format::CompressedTP10unorm_int10, 2>(
        { width, height }, *(src_ion_mem), true);

// Create dst parent texture for TP10 compressed
auto output_tex =
    hetcompute::graphics::create_texture<hetcompute::graphics::image_format::CompressedTP10unorm_int10, 2>(
        { width, height }, *(dst_ion_mem), true);

// Create dst derivative Y & UV plane
auto output_tex_y =
    hetcompute::graphics::create_derivative_texture<hetcompute::graphics::image_format::CompressedTP10unorm_int10, 2>(
        output_tex, hetcompute::graphics::extended_format_plane_type::ExtendedFormatYPlane);

auto output_tex_uv =
    hetcompute::graphics::create_derivative_texture<hetcompute::graphics::image_format::CompressedTP10unorm_int10, 2>(
        output_tex, hetcompute::graphics::extended_format_plane_type::ExtendedFormatUVPlane);

The above example creates source and destination ION memory regions. The source ION memory is populated with the image data being read. The creation of the GPU kernel and sampler follows the previous example. The snippet then creates parent UBWC TP10 textures. Since we are writing UBWC TP10 data to output_tex, the example creates the derivative Y and UV planes. Both of these derivative textures are later passed into the GPU kernel, which copies the data to the output Y and UV textures.
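To make the notion of Y and UV planes concrete, the sketch below computes the plane sizes for the simplest case: a tightly packed, linear NV12 image with 8-bit samples and 4:2:0 chroma subsampling. This is an illustration only. It does not describe TP10 or any UBWC variant, whose layouts involve packing, alignment and metadata that are not covered here, and the nv12_layout type and make_nv12_layout function are hypothetical names rather than SDK APIs.

```cpp
#include <cassert>
#include <cstddef>

// Plane geometry for a tightly packed, linear NV12 image (no row padding).
struct nv12_layout
{
    std::size_t y_bytes;   // full-resolution luma plane, one byte per pixel
    std::size_t uv_bytes;  // interleaved U/V plane at half resolution
    std::size_t uv_offset; // byte offset of the UV plane within the image
};

nv12_layout make_nv12_layout(std::size_t width, std::size_t height)
{
    // 4:2:0 subsampling requires even dimensions.
    assert(width % 2 == 0 && height % 2 == 0);

    nv12_layout l;
    l.y_bytes   = width * height;
    l.uv_bytes  = (width / 2) * (height / 2) * 2; // one U and one V byte per 2x2 block
    l.uv_offset = l.y_bytes;                      // UV plane directly follows Y
    return l;
}
```

For a 640x480 image this gives a 307200-byte Y plane followed by a 153600-byte UV plane, so the whole image occupies 1.5 bytes per pixel.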

4.6 Data Structures

4.6.1 HetCompute Lock-Free Queue

HetCompute provides two variants of a concurrent lock-free first-in first-out (FIFO) queue data structure: a fixed-size implementation denoted by bounded_lfqueue, and an unbounded version denoted by lfqueue. Both variants support two operations: push and pop. The push operation inserts a value into the queue and returns true if it was successful. A push operation may only return false in the case of the bounded_lfqueue, when the queue is full.

A pop operation removes a value from a non-empty queue, and returns true. If the queue is empty, a pop operation returns false. Multiple threads can execute push and pop operations in parallel on the queues, and synchronization is achieved without the use of locks.

All lfqueue APIs are accessible by including the following header:

#include <hetcompute/lfqueue.hh>

The bounded_lfqueue APIs are accessible by including the following header:

#include <hetcompute/bounded_lfqueue.hh>

At a high level, the bounded_lfqueue is implemented as a fixed-size circular array, whose size is defined by the user through an input parameter. Specifically, in HetCompute, the size of the array is forced to be a power of two, by taking the log (to the base 2) of the size as input. Consider the following example of a bounded_lfqueue instantiation:

hetcompute::bounded_lfqueue<size_t> q(8);

In this example, the size of the bounded_lfqueue is set to 2^8 = 256 entries. When the queue is full, a push operation cannot add a new value into the queue until one has been popped.

The lfqueue can be thought of as a linked list, where each node is a bounded_lfqueue, and is of unbounded size. As in the case of the bounded_lfqueue, the size parameter passed during instantiation gives its initial size. The lfqueue then extends itself in chunks of this size whenever needed.
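The power-of-two sizing lets a circular array wrap its indices with a bit mask instead of a modulo operation. The sketch below shows that sizing and wrapping in a plain single-threaded ring buffer. It is not lock-free and is not the SDK's implementation; the pow2_ring class is a hypothetical name, and only its push/pop return conventions follow the description of bounded_lfqueue above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-threaded ring buffer whose capacity is 2^log2_size.
template <typename T>
class pow2_ring
{
public:
    explicit pow2_ring(std::size_t log2_size)
    {
        assert(log2_size < sizeof(std::size_t) * 8);
        _mask = (std::size_t(1) << log2_size) - 1; // e.g. log2_size 8 -> mask 255
        _slots.resize(_mask + 1);
    }

    std::size_t capacity() const { return _mask + 1; }

    // Returns false when the ring is full.
    bool push(T const& value)
    {
        if (_tail - _head == capacity())
            return false;
        _slots[_tail & _mask] = value; // wrap with a mask instead of modulo
        ++_tail;
        return true;
    }

    // Returns false when the ring is empty.
    bool pop(T& result)
    {
        if (_head == _tail)
            return false;
        result = _slots[_head & _mask];
        ++_head;
        return true;
    }

private:
    std::size_t    _mask = 0;
    std::vector<T> _slots;
    std::size_t    _head = 0; // total number of pops so far
    std::size_t    _tail = 0; // total number of pushes so far
};
```

Constructing pow2_ring<int> q(3) yields a capacity of 8; a ninth push returns false until a pop frees a slot.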

The following is a simple example that illustrates the use of the lfqueue.

 1 #include <hetcompute/lfqueue.hh>
 2
 3 #include <hetcompute/hetcompute.hh>
 4
 5 int
 6 main()
 7 {
 8     hetcompute::runtime::init();
 9     hetcompute::lfqueue<size_t> q(8);
10
11     // Create groups for producer and consumer tasks
12     auto producer = hetcompute::create_group("Producer");
13     auto consumer = hetcompute::create_group("Consumer");
14
15     // Launch 2 tasks into the producer group,
16     // each of which pushes 100 values into q
17     for (size_t p = 0; p < 2; ++p)
18     {
19         producer->launch([&]() {
20             for (size_t i = 0; i < 100; i++)
21             {
22                 q.push(i);
23             }
24         });
25     }
26
27     // Launch 2 tasks into the consumer group,
28     // each of which pops 100 values from q
29     for (size_t c = 0; c < 2; ++c)
30     {
31         consumer->launch([&]() {
32             size_t j = 0;
33             while (j < 100)
34             {
35                 size_t result;
36                 if (q.pop(result))
37                 {
38                     // The popped value is stored in result
39                     ++j;
40                 }
41             }
42         });
43     }
44
45     // wait for consumer group to finish
46     consumer->wait_for();
47
48     hetcompute::runtime::shutdown();
49 }

In the above example, two HetCompute groups, producer and consumer, are created first (lines 12 and 13). Two HetCompute tasks are then launched into each group. Each task in the producer group pushes 100 size_t values (lines 19-24) into the queue q (instantiated in line 9), and the tasks in the consumer group concurrently pop the values (lines 32-41) from q. The program terminates only when all 200 values pushed into the queue have been popped. Therefore, it suffices to wait for the consumer group to finish (line 46), as the consumer tasks will complete only after each one has popped 100 values.

4.7 Storage

4.7.1 Task-Local Storage

Tasks, much like threads, can be associated with task-local storage, via hetcompute::task_storage_ptr. The usage pattern consists of declaring a global variable, say storage, which holds a pointer to the actual task-local data. Then, within task t, that variable is assigned a pointer to a (usually) local variable, or a chunk of freshly allocated memory. After that, storage can be used within the dynamic extent of task t:

#include <hetcompute/hetcompute.hh>

namespace
{
    hetcompute::task_storage_ptr<int> storage;
}; // namespace

void func();

void
func()
{
    HETCOMPUTE_ILOG("%d", *storage);
    ++*storage;
}

int
main()
{
    hetcompute::runtime::init();
    auto g = hetcompute::create_group();
    for (int i = 0; i < 10; ++i)
    {
        g->launch([i] {
            int v = i;
            storage = &v;
            func();
            if (v != i + 1)
            {
                HETCOMPUTE_ILOG("error");
            }
            func();
            if (v != i + 2)
            {
                HETCOMPUTE_ILOG("error");
            }
        });
    }
    g->wait_for();
    hetcompute::runtime::shutdown();
    return 0;
}

Note that accessing the value of storage affects only the current task. Attempting to modify the value ofa task_storage_ptr outside of a task yields undefined behaviour.

Optionally, a destructor (or rather, a finalizer) can be employed to dispose of resources. The destructor will run within each task that has a value assigned to the global variable.
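Since the text compares tasks to threads, the same usage pattern can be tried out with plain threads and the C++ thread_local keyword. The sketch below is only an analogy: it uses std::thread rather than HetCompute tasks, and the names bump and run_demo are hypothetical. Each thread points its own copy of storage at a local variable, so increments made through the pointer never affect another thread.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

namespace
{
    // One pointer per thread, analogous to one task_storage_ptr value per task.
    thread_local int* storage = nullptr;
}

// Increments whatever the calling thread's pointer refers to.
void bump()
{
    assert(storage != nullptr);
    ++*storage;
}

// Starts num_threads threads; thread i starts from the value i and bumps it
// twice. Returns the final per-thread values.
std::vector<int> run_demo(int num_threads)
{
    std::vector<int>         results(static_cast<std::size_t>(num_threads), 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
    {
        threads.emplace_back([i, &results] {
            int v   = i;
            storage = &v; // affects only this thread's copy of the pointer
            bump();
            bump();
            results[static_cast<std::size_t>(i)] = v; // each thread writes its own slot
        });
    }
    for (auto& t : threads)
        t.join();
    return results;
}
```

After run_demo(n), element i of the result holds i + 2, matching the checks performed in the task-based example above.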

4.7.2 Scheduler-Local Storage

Another use case is scratchpads: data that is persistent across task boundaries, usually to avoid per-task memory allocation or initialization. HetCompute can avoid synchronizing access to scratchpads if each scheduler creates its own scratchpad (which can then be used like task-local storage). As a further optimization, hetcompute::scheduler_storage_ptr<T>s are created lazily when they are written to inside of a task. Note that variable initialization and destruction happens through the constructor and destructor of T:

#include <hetcompute/hetcompute.hh>

namespace
{
    const hetcompute::scheduler_storage_ptr<size_t> s_sls_state;
}; // namespace

int
main()
{
    hetcompute::runtime::init();
    auto g = hetcompute::create_group();

    for (size_t i = 0; i < 200; ++i)
    {
        g->launch([i] {
            size_t c = ++*s_sls_state;
            // values for c are consecutive on a per-scheduler basis
            (void)c;
        });
    }

    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

Scheduler-local storage is unaffected by context switches (e.g., via hetcompute::task<>::wait_for).

#include <hetcompute/hetcompute.hh>

namespace
{
    const hetcompute::scheduler_storage_ptr<size_t> s_sls_state;
}; // namespace

int
main()
{
    hetcompute::runtime::init();
    auto g = hetcompute::create_group();
    auto t = hetcompute::create_task([] {});

    for (size_t i = 0; i < 200; ++i)
    {
        g->launch([=] {
            size_t c1 = ++*s_sls_state;
            t->launch();
            t->wait_for();
            size_t c2 = ++*s_sls_state;
            if (c1 + 1 != c2)
            {
                HETCOMPUTE_ILOG("error: mismatch");
            }
        });
    }

    g->wait_for();
    hetcompute::runtime::shutdown();

    return 0;
}

A complete example

#include <algorithm>
#include <iterator>

#include <hetcompute/hetcompute.hh>

template <size_t N>
struct image_scratchpad
{
    image_scratchpad() { std::fill(std::begin(edge_image), std::end(edge_image), 0); }
    char edge_image[N];
};

namespace
{
    const hetcompute::scheduler_storage_ptr<image_scratchpad<4096>> image_buffers;
}; // namespace

int
main()
{
    hetcompute::runtime::init();
    int const N = 200;

    auto g = hetcompute::create_group();
    for (int i = 1; i < N; ++i)
    {
        g->launch([i] {
            // fill image buffer, which is reused across tasks
            for (auto& slot : image_buffers->edge_image)
                slot = i & 0xff;
            hetcompute::internal::yield(); // context-switch, we expect SLS to survive this
            // check contents
            for (auto const& slot : image_buffers->edge_image)
            {
                if (slot != char(i & 0xff))
                {
                    HETCOMPUTE_ILOG("mismatch at position %d", i);
                }
            }
        });
    }
    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

4.7.3 Thread-Local Storage

If a group of tasks needs scratchpads, but does not require that data persists across context-switching, hetcompute::thread_storage_ptr is a viable alternative to hetcompute::scheduler_storage_ptr. Because HetCompute Thread-Local Storage is tied to HetCompute’s device thread, HetCompute allocates fewer instances of T (compared to hetcompute::scheduler_storage_ptr, see earlier example), at most one per device thread.

#include <hetcompute/hetcompute.hh>

namespace
{
    const hetcompute::thread_storage_ptr<size_t> s_tls_state;
}; // namespace

int
main()
{
    hetcompute::runtime::init();
    auto g = hetcompute::create_group("test");
    auto t = hetcompute::create_task([] {});

    for (size_t i = 0; i < 200; ++i)
    {
        g->launch([=] {
            size_t* p1 = s_tls_state.get();
            t->launch();
            t->wait_for();
            size_t* p2 = s_tls_state.get();
            // cannot assume that p1 == p2
            (void)p1;
            (void)p2;
        });
    }

    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}
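For comparison only, the following sketch shows the same idea in standard C++ without the HetCompute API: a thread_local scratchpad is constructed lazily the first time a thread touches it and is then reused by every unit of work that runs on that thread. The names scratchpad, local_scratchpad, and run_batch are hypothetical, and plain std::thread workers stand in for device threads; this is not how the SDK implements hetcompute::thread_storage_ptr.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical scratchpad type; counts how many instances get constructed.
struct scratchpad
{
    static std::atomic<int> instances;
    scratchpad() { ++instances; }
    char bytes[4096] = {};
};
std::atomic<int> scratchpad::instances{0};

// One lazily constructed scratchpad per thread (function-local thread_local).
inline scratchpad& local_scratchpad()
{
    thread_local scratchpad pad;
    return pad;
}

// Runs `work_items` units of work spread across `num_threads` plain threads.
// Returns the number of scratchpads that were constructed.
inline int run_batch(std::size_t num_threads, std::size_t work_items)
{
    assert(num_threads > 0);
    scratchpad::instances = 0;
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t)
    {
        workers.emplace_back([&next, work_items] {
            // Each thread pulls work items until none are left.
            while (next.fetch_add(1) < work_items)
            {
                local_scratchpad().bytes[0] = 1; // reused, no per-item allocation
            }
        });
    }
    for (auto& w : workers)
        w.join();
    return scratchpad::instances.load();
}
```

With 4 threads and 200 work items, run_batch constructs at most 4 scratchpads, however many items each thread happens to process.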

4.8 Affinity

The affinity APIs enable the programmer to change execution properties of program statements (arbitrary functions), HetCompute tasks, and device threads. These properties include:

• location: to set the CPUs where the program constructs should run.

• pinning: to set whether HetCompute device threads should migrate freely among cores (also known as thread binding).

• mode: to override local affinity settings.

Programmers can benefit from these APIs to improve performance and even to save power. All APIs are defined in hetcompute/affinity.hh and are accessible by including the main HetCompute header:

#include <hetcompute/hetcompute.hh>

Note

For setting the affinity of individual tasks (rather than all tasks using the above APIs) to big or LITTLE in a big.LITTLE SoC, use CPU kernel attributes (Setting Kernel Attributes).

Regarding the capabilities of the APIs, location enables targeting clusters of cores in a heterogeneous System-On-Chip (SoC), such as Qualcomm Snapdragon 845 or 835, where not all cores are equal, providing different performance/power points. For example, in a Snapdragon 845, a programmer may choose to run only in the LITTLE cluster as illustrated in the following example, which demonstrates all other affinity APIs as well.

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    auto fn = [](int i) { HETCOMPUTE_ILOG("Function executed with specified affinity on arg %d", i); };
    auto aff_settings =
        hetcompute::affinity::settings(hetcompute::affinity::cores::big,
                                       false,
                                       hetcompute::affinity::mode::allow_local_setting);
    // In a big.LITTLE SoC, function fn executes on a big core.
    hetcompute::affinity::execute(aff_settings, fn, 42);

    auto g = hetcompute::create_group(__FUNCTION__);

    auto k_wout_attrib = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("Task without kernel affinity attribute."); });

    auto k_with_attrib = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("Task with kernel affinity attribute"); });
    k_with_attrib.set_little();

    // k_with_attrib kernel will run in a LITTLE core
    g->launch(k_with_attrib);

    // k_wout_attrib can run in any core
    g->launch(k_wout_attrib);

    g->wait_for();

    // Set the affinity to the LITTLE cores without pinning in
    // allow_local_setting mode
    hetcompute::affinity::set(
        hetcompute::affinity::settings(hetcompute::affinity::cores::little,
                                       false,
                                       hetcompute::affinity::mode::allow_local_setting));

    // k_wout_attrib task will run in a LITTLE core because the kernel has no
    // individual affinity specification
    g->launch(k_wout_attrib);

    // Set the affinity to the big cores with pinning in allow_local_setting mode
    // by reading the current affinity and then updating the different fields
    auto affinity = hetcompute::affinity::get();

    // Update the cores from LITTLE to big
    affinity.set_cores(hetcompute::affinity::cores::big);

    // Enable thread pinning
    affinity.set_pin_threads();

    // Update the mode from allow_local_setting to override_local_setting in the
    // settings
    affinity.set_mode(hetcompute::affinity::mode::override_local_setting);

    // Update the affinity with the modified affinity object
    hetcompute::affinity::set(affinity);

    // The second run of k_with_attrib will run on a big core because the
    // affinity mode is override_local_setting and global affinity settings are
    // obeyed
    g->launch(k_with_attrib);

    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

The example illustrates three different ways of setting affinity to program constructs and also shows how one can override the others.

1. hetcompute::affinity::execute is the easiest, most portable, and most efficient way to enforce affinity for program statements expressible as function calls, function objects, or C++ lambdas. In the example, the programmer calls fn with argument 42, and HetCompute ensures that the function executes on a big core. Note that the programmer need not be concerned about whether the code is executing on a big.LITTLE SoC, whether the thread calling hetcompute::affinity::execute(..., fn, ...) is executing on a big or a LITTLE core, etc. HetCompute determines the most efficient way to execute the function with the desired affinity.

2. The programmer can set the affinity of individual kernels. In the example, the HetCompute CPU kernel k_with_attrib is marked as having affinity for the LITTLE cores through the statement:

k_with_attrib.set_little();

HetCompute will run the CPU kernel k_with_attrib on a LITTLE core, in contrast to k_wout_attrib, which may run on any core.

3. The programmer can set the affinity of all program statements via

hetcompute::affinity::set(
    hetcompute::affinity::settings(hetcompute::affinity::cores::little,
                                   false,
                                   hetcompute::affinity::mode::allow_local_setting));

The above creates an affinity settings object containing the indicated location, pinning, and mode. The call to hetcompute::affinity::set will update the HetCompute affinity settings. In this case, HetCompute’s device threads will run only in the LITTLE CPUs and can migrate freely among CPUs. This setting is very useful to save power because it guarantees that all CPU tasks are executed in the low-power cluster of the SoC. Tasks constructed from CPU kernels without the big/LITTLE attribute will automatically be routed to the appropriate CPU cores through the affinity setting (LITTLE cores in the example).

Note

Specifying a big or LITTLE location in an SoC with homogeneous cores, such as Snapdragon 805, will have no effect. However, the pinning request will still be fulfilled.

To update individual aspects of the current affinity settings, use the following API:

auto affinity = hetcompute::affinity::get();

// Update core affinity from LITTLE to big
affinity.set_cores(hetcompute::affinity::cores::big);

4.8.1 Overriding Local Affinity Settings

There will be situations where programmers would like to pin the device threads to the big cores in order to maximize locality and performance, e.g., in a highly optimized linear algebra library. They can achieve this goal by setting the location to big and pinning to true. To guarantee that all tasks run on the big cores, i.e., that big/LITTLE CPU kernel affinity attributes and hetcompute::affinity::execute() affinity settings are discarded, they can also set the mode to override_local_setting.

// Update the mode from allow_local_setting to override_local_setting in the settings
affinity.set_mode(hetcompute::affinity::mode::override_local_setting);

// Update the affinity with the modified affinity object
hetcompute::affinity::set(affinity);

In our example, the k_with_attrib kernel executes on a LITTLE core the first time it is launched; the second time, after the mode has been set to override_local_setting, it runs on a big core.

To respect local affinity settings, set mode to hetcompute::affinity::mode::allow_local_setting.
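The interaction between the two modes can be summarized with a small decision model. The sketch below is a simplified illustration in plain C++, not SDK code: the types cores, mode, and global_settings and the function effective_cores are hypothetical stand-ins, and the initial global location (cores::all) is an assumption made for the sake of the example.

```cpp
#include <cassert>

// Hypothetical stand-ins for the concepts in the text; these are not SDK types.
enum class cores { all, big, little };
enum class mode { allow_local_setting, override_local_setting };

struct global_settings
{
    cores where;      // location requested globally
    bool pin_threads; // carried along for completeness; unused in this model
    mode how;         // whether local settings are honored
};

// Returns where a kernel would run in this simplified model. `local` points to
// the kernel's own big/LITTLE attribute, or is nullptr when it has none.
inline cores effective_cores(const global_settings& g, const cores* local)
{
    if (g.how == mode::override_local_setting || local == nullptr)
        return g.where; // the global setting decides
    return *local;      // the local attribute is honored
}

// Replays the launches from the affinity example against the model.
inline void check_example()
{
    const cores little = cores::little;
    const global_settings initial{cores::all, false, mode::allow_local_setting};
    const global_settings save_power{cores::little, false, mode::allow_local_setting};
    const global_settings forced_big{cores::big, true, mode::override_local_setting};

    assert(effective_cores(initial, &little) == cores::little);    // k_with_attrib, first launch
    assert(effective_cores(save_power, nullptr) == cores::little); // k_wout_attrib after set()
    assert(effective_cores(forced_big, &little) == cores::big);    // k_with_attrib, overridden
}
```

In this model the local attribute matters only in allow_local_setting mode; once override_local_setting is active, every kernel follows the global location.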


Note

Very few situations will benefit from pinning; due to the thermally constrained environment of mobile packages, CPUs can go online/offline unannounced. When requesting pinning, if there are offline CPUs, HetCompute will pin device threads as much as possible to a single CPU; however, some device threads may remain unpinned.

Reset affinity settings when they are no longer required using:

hetcompute::affinity::reset();

The system then returns to the default state, in which device threads can move freely across CPUs.

All of the free functions above are thread-safe, and programmers may call the affinity APIs at any point of execution, even within CPU tasks.

4.9 Heterogeneous Computing in Action

This section demonstrates how to write a simple heterogeneous HetCompute application that executes tasks on multiple devices in a system, coherently bringing together many of the HetCompute constructs described previously.

In general, the steps for writing a heterogeneous HetCompute application are as follows:

1. Write device functions for the devices that will be used.

2. Declare HetCompute buffers that pass data between devices.

3. Create HetCompute kernels from appropriate device functions and buffers.

4. Use the kernels to create HetCompute tasks.

5. Construct and launch a HetCompute program using the tasks.

In particular, the last step in the list above is the same for any program, whether it is CPU-only or heterogeneous. Therefore, the remainder of this section focuses on the first four steps.

To explain those steps, the remainder of this section develops a simple HetCompute application step-by-step. This application takes in an array of 10 floating-point numbers, x[i], and computes x[i] ∗ x[i] + 1 / x[i] for each element. That is, an input of 1.0, 2.0, ..., 10.0 would produce an output of 2.0, 4.5, ..., 100.1.
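As a sanity check on the expected values, the computation can first be written as a sequential CPU-only reference. The sketch below uses plain standard C++ and no HetCompute API; the function names reference_result and make_input are hypothetical helpers introduced here for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sequential reference: out[i] = x[i] * x[i] + 1 / x[i].
inline std::vector<float> reference_result(const std::vector<float>& x)
{
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        assert(x[i] != 0.0f); // the reciprocal is undefined for zero
        out[i] = x[i] * x[i] + 1.0f / x[i];
    }
    return out;
}

// Builds the input 1.0, 2.0, ..., n used in the text.
inline std::vector<float> make_input(std::size_t n)
{
    std::vector<float> x(n);
    for (std::size_t i = 0; i < n; ++i)
        x[i] = static_cast<float>(i) + 1.0f;
    return x;
}
```

For the ten-element input, the first two results are 1 + 1 = 2.0 and 4 + 0.5 = 4.5, and the last is 100 + 0.1 = 100.1 (up to single-precision rounding). A reference like this is also a convenient oracle for checking the output of the heterogeneous version.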

First, the device functions must be written. As explained in Kernels: The Path to Heterogeneity, at present, CPUs, GPUs, and DSPs are programmed in different languages in HetCompute. The following example lists three device functions that will be used in the example program, showing different device programming styles.

#pragma once

#include <string>

// A CPU function that initializes an array
void
f1(float* b, int N)
{
    for (int i = 0; i < N; i++)
        b[i] = i + 1.0f;
}

// A GPU function that computes squares
std::string const f2_string = "__kernel void f2(__global float *in, __global float *out) {"
                              " int i = get_global_id(0);"
                              " out[i] = in[i] * in[i];"
                              "}";

// A DSP function that computes reciprocals
int
f3(float* in, int lin, float* out, int lout)
{
    int i;
    for (i = 0; i < lin && i < lout; i++)
        out[i] += 1 / in[i];
    return 0;
}

After the device functions are written, the next step is to consider how data is passed between these functions and create data containers accordingly. HetCompute buffers serve this purpose. In particular, buffers abstract away much of the manual data marshaling in traditional heterogeneous programming environments, greatly simplifying multi-device programming.

Finally, the methods described in Kernels: The Path to Heterogeneity and Creating Tasks can be used to create kernels and tasks, set dependencies between tasks, and launch them. This yields the following program.

#include <hetcompute/hetcompute.hh>
#include "heterogeneous.hh"

int
main()
{
    hetcompute::runtime::init();
    // The number of elements to compute x^2 + 1/x for
    constexpr int N = 10;

    // Create buffers for input and output data
    auto b1 = hetcompute::create_buffer<float>(N);
    auto b2 = hetcompute::create_buffer<float>(N);

    // The CPU initializes the input data first
    auto t1 = hetcompute::create_task(f1);

    // The GPU squares every input element
    auto k2 = hetcompute::create_gpu_kernel<hetcompute::buffer_ptr<float>, hetcompute::buffer_ptr<float>>(f2_string, "f2");
    auto t2 = hetcompute::create_task(k2, hetcompute::range<1>{ N }, b1, b2);

    // The DSP adds the reciprocals to the result
    auto k3 = hetcompute::create_dsp_kernel<>(f3);
    auto t3 = hetcompute::create_task(k3, b1, b2);

    // Run all the tasks
    t1 >> t2 >> t3;
    t1->launch(b1, N);
    t2->launch();
    t3->launch();
    t3->wait_for();

    // Output the result
    for (int i = 0; i < N; i++)
        HETCOMPUTE_ILOG("%f\n", b2[i]);

    hetcompute::runtime::shutdown();
    return 0;
}

Note

It is worth emphasizing again that HetCompute tasks are universal across different devices. While tasks may contain kernels customized for different devices, at the task level and above, a programmer should not need to distinguish between these tasks.


While the example above is functionally correct, its performance can be improved by a few simple techniques.

1. The GPU and DSP kernels in the example above are sequentially executed. However, through the use of additional buffers, they can be launched asynchronously, giving them a chance to execute concurrently depending on scheduling results. The caveat is to ensure that they converge in the same host thread by using a group wait_for.

2. While the default buffer constructors need very few arguments to produce functionally correct behavior, their performance can be improved by providing additional hints to the constructors. For example, hetcompute::in, hetcompute::out, and hetcompute::inout can be used to qualify the buffers and avoid unnecessary copies. Additionally, hints about likely devices can guide storage allocation for buffers.

The optimizations above produce the following result:

#include <hetcompute/hetcompute.hh>
#include "heterogeneous2.hh"

int
main()
{
    hetcompute::runtime::init();
    // The number of elements to compute x^2 + 1/x for
    constexpr int N = 10;

    // Create buffers for input and output data
    auto b1 = hetcompute::create_buffer<float>(N);
    auto b2 = hetcompute::create_buffer<float>(N);
    auto b3 = hetcompute::create_buffer<float>(N);
    auto b4 = hetcompute::create_buffer<float>(N);

    // The CPU initializes the input data first
    auto t1 = hetcompute::create_task(f1);

    // The GPU squares every input element
    auto k2 =
        hetcompute::create_gpu_kernel<hetcompute::in<hetcompute::buffer_ptr<float>>,
                                      hetcompute::out<hetcompute::buffer_ptr<float>>>(f2_string, "f2");
    auto t2 = hetcompute::create_task(k2, hetcompute::range<1>{ N }, b1, b2);

    // The DSP computes the reciprocals into a separate buffer
    auto k3 = hetcompute::create_dsp_kernel<>(f3);
    auto t3 = hetcompute::create_task(k3, b1, b3);

    // Run all the tasks; t2 and t3 both depend only on t1
    auto g = hetcompute::create_group();
    t1 >> t2;
    t1 >> t3;
    t1->launch(b1, N);
    g->launch(t2);
    g->launch(t3);
    g->wait_for();

    // Combine the results for all N elements
    for (int i = 0; i < N; i++)
        b4[i] = b2[i] + b3[i];

    // Output the combined result
    for (int i = 0; i < N; i++)
        HETCOMPUTE_ILOG("%f\n", b4[i]);

    hetcompute::runtime::shutdown();

    return 0;
}
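The fork-join shape of the optimized program (two independent stages that read the same input, write separate outputs, and meet at a single wait) can also be sketched in standard C++. The version below uses std::async in place of HetCompute tasks and groups, and ordinary functions in place of the GPU and DSP kernels; all names are hypothetical, and it illustrates only the dependency structure, not the SDK's scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Stand-in for the GPU kernel: writes squares into its own output vector.
inline std::vector<float> squares(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] * in[i];
    return out;
}

// Stand-in for the DSP kernel: writes reciprocals into its own output vector.
inline std::vector<float> reciprocals(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = 1.0f / in[i];
    return out;
}

inline std::vector<float> squares_plus_reciprocals(const std::vector<float>& in)
{
    // Both stages only read `in` and write separate outputs, so they may overlap.
    auto sq = std::async(std::launch::async, [&in] { return squares(in); });
    auto rc = std::async(std::launch::async, [&in] { return reciprocals(in); });
    const std::vector<float> a = sq.get(); // join point, like the group wait_for
    const std::vector<float> b = rc.get();
    assert(a.size() == b.size());

    std::vector<float> sum(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) // combine every element, from index 0
        sum[i] = a[i] + b[i];
    return sum;
}
```

Because the two stages no longer share an output buffer, neither has to wait for the other; the only synchronization is the pair of get() calls before the results are combined.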

4.10 Interoperability

The HetCompute programming model isolates programmers from threads; however, HetCompute applications are multithreaded. This section discusses some of the interoperability issues that arise from using threads in your application and how they interact with the HetCompute runtime.

A HetCompute application starts in a main thread and creates a thread pool. The number of threads in the thread pool depends on the platform. Additional threads may be created based on application behavior in order to keep all the resources busy. A HetCompute task may be created in any thread, and other operations, such as launching and executing it, may happen on any other thread. Heterogeneous tasks on the GPU and DSP behave similarly. Any application thread may call hetcompute::create_task(Code &&code, Args &&...args) and hetcompute::task<>::launch(). The task will be executed by one of the threads in the HetCompute thread pool. This distinction matters because programmers need to ensure that data accessed in the task remains available throughout the lifetime of the task (even if it was allocated in a different thread) and that the data is accessed in a thread-safe manner.
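One common way to meet both requirements in C++ is to let the asynchronous work co-own its data through a std::shared_ptr and to guard mutable state with a mutex. The sketch below shows the idea with std::async standing in for a HetCompute task; shared_state, start_sum, and check_start_sum are hypothetical names, and nothing here is taken from the SDK.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <mutex>
#include <numeric>
#include <utility>
#include <vector>

// Data shared between the creating thread and the thread that runs the work.
struct shared_state
{
    std::mutex lock;        // guards `total`
    long total = 0;
    std::vector<int> input; // read-only once the work has started
};

// Starts asynchronous work that co-owns its data. The local handle `state`
// goes out of scope when this function returns, possibly before the work runs.
inline std::future<long> start_sum(std::vector<int> values)
{
    auto state = std::make_shared<shared_state>();
    state->input = std::move(values);
    // Capturing the shared_ptr by value keeps *state alive for as long as the
    // lambda exists, regardless of which thread allocated it.
    return std::async(std::launch::async, [state] {
        const long sum = std::accumulate(state->input.begin(), state->input.end(), 0L);
        std::lock_guard<std::mutex> guard(state->lock);
        state->total += sum;
        return state->total;
    });
}

// Small usage check: the result is available even though the creator's
// handle to the data is long gone.
inline void check_start_sum()
{
    assert(start_sum({1, 2, 3}).get() == 6);
}
```

Capturing a raw pointer or a reference to a local variable instead would leave the work with a dangling reference as soon as the creating function returned.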

HetCompute provides certain guarantees with respect to task execution, as defined below.

4.10.1 Safe Points

Safe points are HetCompute API methods where the following property holds: the thread on which the task executes before the API call might not be the same as the thread on which the task executes after the API call. The APIs in HetCompute that may switch threads are:

• hetcompute::task<>::wait_for()

• hetcompute::group<>::wait_for()

• hetcompute::condition_variable::wait()

4.10.2 Using HetCompute with the Fork() System Call

The Unix fork() system call is designed to duplicate the current process as a new process. This call is commonly used within shells to start new commands, within web servers to handle new connections in a separate process, and within web browsers to implement security between different browser tabs.

An important limitation of fork() is that it copies the memory of the process, but starts the child with only one thread, cloned from the thread that made the fork() system call. This is a known problem for multithreaded programs, and HetCompute is no exception. Calling fork() from a task running in the thread pool starts the new process with only one thread from the pool, and no other threads would exist. Tasks running on the other threads would be copied in an inconsistent state, and the output would be indeterminate. HetCompute implements various features to prevent this misuse of fork().

No calls to fork() are allowed after any HetCompute function is invoked. This includes functions like hetcompute::create_task and hetcompute::create_group. If the application requires the use of fork(), it should be called before any HetCompute routines are invoked.
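The required ordering can be sketched with POSIX calls and a plain std::thread standing in for the HetCompute runtime: fork while the process is still single-threaded, and start any threaded work only afterwards. This is a POSIX-only illustration; fork_then_start_threads and the exit code passed to it are hypothetical, and a real child process would typically call one of the exec functions rather than exit immediately.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Forks while the process is still single-threaded, waits for the child, and
// only then starts a worker thread. Returns the child's exit status, or -1 on
// failure.
inline int fork_then_start_threads(int child_exit_code)
{
    const pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_exit_code); // child: leave without running parent cleanup

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    // From here on the process may go multithreaded; in a HetCompute
    // application this is where the first HetCompute call would be made.
    int answer = 0;
    std::thread worker([&answer] { answer = 42; });
    worker.join();
    assert(answer == 42);
    return WEXITSTATUS(status);
}
```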


4.10.3 Using HetCompute with TLS-aware Libraries

Other libraries, such as those that use thread-local storage (TLS), also interact with the HetCompute runtime in non-intuitive ways. Remember that when HetCompute tasks execute, they stay on the same thread until they complete execution or they arrive at a safe point. In most cases, this is not a problem; however, when a task makes calls to libraries that use TLS, there are a number of issues, as discussed below.

Following are examples of common TLS-aware libraries:

Xlib

The Xlib libraries are typically not thread-safe, although each implementation is different. It is not possible to perform display operations from two threads at the same time because this could corrupt internal data structures. While multiple threads can be used, the programmer must ensure that only one thread can be using Xlib at any time.

UI Toolkits

User interface toolkits, such as Qt, typically have a main thread which is dedicated to processing input events, manipulating a display, and then sleeping until more input occurs. It is important that control is returned to the UI toolkit as soon as possible to ensure that the user experience is smooth and uninterrupted. If a call from a different thread is made to a function that manipulates the UI or triggers an event, the toolkit may corrupt a data structure, or detect this and generate an error message.

OpenGL

Each OpenGL implementation varies in how it can be used with multiple threads. In typical usage, you create an OpenGL context in the thread where you intend to use it. The OpenGL library then sets internal state information into TLS. This internal state is used so that when calls are made to OpenGL, you do not need to pass the context around each time. However, the TLS is set for only one thread. So if you try to make an OpenGL call from a different thread, the implementation may fail. Some implementations allow multiple contexts, with each context being created on the thread where it will be used. With multiple contexts, some implementations also allow parallel access to the OpenGL library, although this support varies depending on the vendor. Hardware implementing OpenGL typically uses some kind of command buffer, which can force a sequential ordering of commands. Therefore, trying to implement calls to OpenGL in parallel may not provide any benefit, and may actually slow things down due to contention on the mutex used to protect the command buffer. An OpenGL application is typically used with some kind of user interface and event handler, which will be running on the main thread. So it is recommended that you perform your OpenGL calls in the same thread as the user interface.

While these are limitations that need to be taken into consideration, it is still possible to exploit parallelism using HetCompute in these types of applications. For example, consider the case of a game with physics simulation, where the user can click on the display to launch spheres into a room. In a sequential implementation, the user touches the display, which generates a UI event. The UI thread wakes up and processes the UI event, which needs to generate the new sphere in the physics simulation. The physics simulation runs for the time required to compute the result. The location of all objects in the physics simulation is then traversed, and OpenGL calls are made to draw the scene. The OpenGL buffers are then swapped onto the display, and the thread goes back to sleep to wait for either a UI event, or a timeout to refresh the display with no change.

When analyzing the previous example, the bulk of the calculations are performed in the physics engine. This part is computationally very expensive and is where the most optimization work can be applied. So HetCompute can be used here to perform the computation in parallel, assuming the underlying implementation supports this. The user breaks down the parts of the simulation into a suitable number of tasks, specifies dependencies, and then launches them with HetCompute. When the tasks are launched, the thread does a wait_for() until the tasks have completed. In the meantime, the HetCompute thread pool begins executing the tasks, which spreads the computational load across all available processors. When the tasks are complete, the wait_for() will return, and execution on the main thread can continue with the calls to OpenGL for rendering. With this arrangement, you can see that the operations that are thread-sensitive are performed in one thread, guaranteeing safe use of libraries such as OpenGL and the user-interface toolkit. Many computationally intensive OpenGL applications are written with an event loop very similar to that described above, so these changes should be relatively simple to implement to take advantage of HetCompute.
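The structure of such a frame loop can be sketched in standard C++. In the sketch below, plain std::thread workers stand in for HetCompute tasks and join() plays the role of wait_for(); the body type and the functions step_slice, parallel_step, and run_frame are hypothetical, and the rendering step is only a placeholder comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct body
{
    float position;
    float velocity;
};

// Advances bodies[begin, end) by one time step; slices never overlap.
inline void step_slice(std::vector<body>& bodies, std::size_t begin, std::size_t end, float dt)
{
    for (std::size_t i = begin; i < end; ++i)
        bodies[i].position += bodies[i].velocity * dt;
}

// Splits the simulation step across `workers` threads and waits for all of them.
inline void parallel_step(std::vector<body>& bodies, std::size_t workers, float dt)
{
    assert(workers > 0);
    const std::size_t chunk = (bodies.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < bodies.size(); begin += chunk)
    {
        const std::size_t end = std::min(begin + chunk, bodies.size());
        pool.emplace_back([&bodies, begin, end, dt] { step_slice(bodies, begin, end, dt); });
    }
    for (auto& t : pool)
        t.join(); // plays the role of wait_for()
}

// One frame: compute in parallel, then "render" on the calling thread only.
// Returns the id of the thread that performed the render step.
inline std::thread::id run_frame(std::vector<body>& bodies, float dt)
{
    parallel_step(bodies, 4, dt);
    // A real application would issue its OpenGL and UI toolkit calls here.
    return std::this_thread::get_id();
}
```

The worker threads touch only simulation data, and every thread-sensitive call stays on the thread that called run_frame, which is the arrangement described above.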

4.10.4 Distributed Computing using HetCompute

HetCompute may be used to provide SMP concurrency in an MPI (Message Passing Interface) application. HetCompute has been tested with MPICH2. All the caveats above about fork() and threads apply.

4.10.5 Avoid the Use of C++ iostream and stringstream Libraries

Warning

The C++11 standard indicates that the iostream library should be thread safe. As of this writing, programmers have experienced stability issues on some platforms, such as Android and OS X. In order to maximize portability, HetCompute applications should avoid using cout and cerr to perform asynchronous writes, especially to the console. It is recommended to use the C-based stdio printf routines. On Android, programmers have experienced additional issues with stringstream objects.
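A minimal sketch of this recommendation, using only the C standard I/O functions available in C++: each message is formatted into a local buffer and then emitted as one complete line with a single stdio call. The function names format_line and log_line and the message layout are hypothetical choices for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats "[task <id>] result=<value>\n" into a std::string with snprintf.
inline std::string format_line(int task_id, double value)
{
    char buf[64];
    const int n = std::snprintf(buf, sizeof(buf), "[task %d] result=%.2f\n", task_id, value);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf)); // no truncation in this sketch
    return std::string(buf, static_cast<std::size_t>(n));
}

// Emits the whole line with a single stdio call, so that lines written from
// different threads are less likely to interleave mid-line.
inline void log_line(int task_id, double value)
{
    const std::string line = format_line(task_id, value);
    std::fputs(line.c_str(), stdout);
}
```

Inside HetCompute tasks, the HETCOMPUTE_ILOG macro used throughout the earlier examples serves a similar purpose.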


5 Parallel Processing Tutorial

5.1 Abstract

In this tutorial, general principles of parallel programming are introduced with an emphasis on task-based parallel programming models. Scaling is first introduced as a metric of evaluating the potential speedup that an algorithm can obtain. Different parallel programming paradigms are introduced with a number of optimizations for parallel code. Examples are provided to illustrate the HetCompute programming model.

5.2 Parallel Speedups

Amdahl [2] put forward an argument that the maximum speedup that can be obtained by a parallel algorithm is bounded by the serial fraction of the program. Intuitively, even if HetCompute could execute the parallel fraction infinitely fast (zero time), the serial fraction will determine the total execution time. This argument, commonly known as Amdahl’s Law, can be summarized by the following equation, when considering N parallel processors:

$$\text{ParallelSpeedup} = \frac{s + p}{s + p/N} = \frac{1}{s + p/N},$$

where s + p = 1, representing the serial and parallel fractions of the program, respectively. Using Amdahl’s law, the speedup that can be obtained with eight processors as a function of the serial fraction is illustrated in Figure 5-1.

Figure 5-1 Theoretical speedup on eight processors using Amdahl’s Law.

Note that even if the serial fraction is only 10%, the maximum theoretical speedup achievable on eight processors is 1 / (0.1 + 0.9/8) ≈ 4.71. In practice, however, hardware architecture characteristics, such as caching, allow programmers to obtain much better performance from multicore systems. Amdahl’s law expresses performance increase for constant problem size (strong scaling). Gustafson [9] demonstrates that parallel processing can be used to perform more work in the same amount of time by increasing the problem size, thus improving scalability. This technique is called weak scaling. Architectural artifacts [11] also play an important role; additional processors come with additional cache and memory resources, often enabling applications to obtain super-linear speedup. A number of optimizations that take advantage of architectural features are discussed in Section Optimizations.
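To make the two scaling notions concrete, the following sketch evaluates Amdahl's formula from the equation above and, for contrast, the commonly stated form of Gustafson's scaled speedup, s + (1 - s) N. The Gustafson formula does not appear in this document and is included here as an assumption for illustration; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's Law: speedup for a fixed problem size with serial fraction s on n processors.
inline double amdahl_speedup(double s, int n)
{
    assert(s >= 0.0 && s <= 1.0 && n >= 1);
    const double p = 1.0 - s; // parallel fraction
    return 1.0 / (s + p / n);
}

// Gustafson's scaled speedup: the parallel part of the work grows with n.
inline double gustafson_speedup(double s, int n)
{
    assert(s >= 0.0 && s <= 1.0 && n >= 1);
    return s + (1.0 - s) * n;
}
```

For s = 0.1 and eight processors, amdahl_speedup evaluates to 1 / 0.2125, roughly 4.71, while gustafson_speedup evaluates to 0.1 + 0.9 × 8 = 7.3, which shows how much more favorable the picture becomes when the problem size is allowed to grow with the machine.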

5.3 Parallel Programming Paradigms

When discussing parallel programming, practitioners classify the different types of parallelism loosely following machine organizations [7]:

5.3.1 Data parallelism (SIMD)

SIMD machines include vector units, array processors, and GPUs. In this model, the program is executing the same code on different data elements. Data parallel algorithms are typically expressed as operations on a multi-dimensional array. Control flow is uniform; however, operations on certain elements may be masked out. Image processing algorithms are prototypical for data parallelism. In the current version of HetCompute, one can exploit data parallelism by using vector intrinsics [15] to target the NEON units, or by calling OpenGL functions to execute on the GPU. Future versions of HetCompute will support SIMD compute on the GPU as part of the programming model. In the code example below, a simple example is shown of a scalar vector multiply (SAXPY) using the HetCompute pfor_each pattern (see Parallel Programming Patterns) and vector operations.

void saxpy(float* y, float a, float* x, int n) { // Y += a * X
    // for simplicity, only vector lengths that are multiples of 4 are handled
    assert(n % 4 == 0);

    float32x4_t av = vmovq_n_f32(a); // initialize all lanes to a
    // implicitly waits for the loop to complete
    hetcompute::pfor_each(hetcompute::range<1>(0, n, 4), // create an iteration for each vector op
        [&av, x, y](const hetcompute::index<1>& i) {
            float32x4_t xv, yv;
            auto sz = 4 * sizeof(float);
            memcpy(&xv, &x[i[0]], sz); // initialize the vector regs
            memcpy(&yv, &y[i[0]], sz);
            yv = vmlaq_f32(yv, av, xv); // yv += a * x[i];
            memcpy(&y[i[0]], &yv, sz); // copy result back into y[i]
        });
}
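For readers without a NEON target or the HetCompute runtime at hand, the same data-parallel decomposition can be sketched with the C++ standard library alone. This is a minimal illustration under stated assumptions, not the HetCompute implementation: the function name saxpy_chunked and the workers parameter are hypothetical, and plain std::thread objects stand in for the runtime's tasks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// y[i] += a * x[i] for i in [0, n), split into contiguous chunks, one per worker.
// The chunks are disjoint, so the workers need no synchronization beyond join().
// Hypothetical helper: std::thread stands in for the HetCompute runtime.
void saxpy_chunked(float* y, float a, const float* x, std::size_t n, unsigned workers)
{
    workers = std::max(1u, workers);
    const std::size_t chunk = (n + workers - 1) / workers; // ceiling division
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break; // nothing left to assign
        pool.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) y[i] += a * x[i];
        });
    }
    for (auto& t : pool) t.join(); // wait for every chunk, like the synchronous pattern
}
```

Every element is touched by exactly one worker, which is what makes the loop data parallel; a compiler is free to vectorize the inner loop of each chunk.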

5.3.2 Task parallelism (MIMD)

MIMD machines are multiprocessors. In this model, different hardware execution contexts (e.g., threads, cores, processors) execute different code on different data elements. Tasks are either independent or cooperate on processing over a shared data structure. Thus, tasks may have control or data dependencies. The irregular structure of the computation complicates the handling of dependencies and synchronization. Typical applications include physical simulations, computer simulations, and browsers [5]. Task parallelism is supported in HetCompute and samples/src are provided in the HetCompute User’s Manual [16].

In the example below, a simple parallel depth-first traversal is demonstrated for an n-ary tree data structure (each node has a variable number of children stored in the children collection). The hetcompute::pfor_each construct recursively launches tasks for each of the children of a node. Examples with more complicated task graphs are provided in Unleashing Asynchrony.

void depth_first_traversal(node* root)
{
    // process all the children of the node in parallel
    hetcompute::pfor_each(root->children.begin(), root->children.end(), [](childIterator it) {
        depth_first_traversal(*it);
    });
    // process the node
    root->mark_as_traversed();
}

5.3.3 Braided parallelism

Modern machines combine CPUs and GPUs for heterogeneous general-purpose computation. Recently, there has been a significant increase in GPGPU (general purpose GPU) programming. The braided parallelism model combines task parallel computation with data parallel execution [8]. This unified model is used to dynamically exploit data parallelism on SIMD units and GPUs from within concurrent tasks executing on the MIMD units. Examples include gaming applications, which have many concurrent tasks (physics, AI, UI) that are composed from data-parallel computations, such as particle simulations, image processing and rendering.

The following example (in pseudocode) shows how HetCompute can execute tasks on CPU, GPU, and DSP at the same time. Additional samples/src are available in Kernels: The Path to Heterogeneity.


// Create a set of input buffers for kernels
auto buf_a = hetcompute::create_buffer(1024);
auto buf_b = ...; auto buf_c = ...;
auto g = hetcompute::create_group(); // aggregate of bound kernels/tasks

// Create a CPU kernel that takes an int as an argument
auto ck = hetcompute::create_cpu_kernel<>([](int){...});
// Bind 42 as the argument to kernel; executes on CPU
g->launch(ck, 42);

// Create a GPU kernel that takes three buffer arguments
auto gk = hetcompute::create_gpu_kernel<hetcompute::buffer_ptr<float>,
                                        hetcompute::buffer_ptr<float>,
                                        hetcompute::buffer_ptr<float>>(vadd_kernel_string, "vadd");
// Bind the buffer arguments and execute on GPU
g->launch(gk, hetcompute::range<1>(1024), buf_a, buf_b, buf_c);

// Create a Qualcomm Hexagon DSP kernel with a buffer and a long ptr parameter
auto hk = hetcompute::create_dsp_kernel<hetcompute::buffer<float>, long *>(math_sum);
long sum;
// Bind arguments and execute on DSP
g->launch(hk, buf_a, &sum);
g->wait_for();

5.3.4 Pipeline parallelism or Streaming

Distributed processor architectures, such as IBM’s Cell processor [12], are composed of separate hardware contexts and designed to exploit small working sets and/or algorithms with little locality. Computation organized as pipeline stages allows code to reside on each unit while the data is streamed across. The pattern of dependencies is fixed, simplifying the parallel execution. Tuning and balancing is complicated by the predefined computation structure, and may require significant reorganization. Many algorithms are amenable to pipeline parallelism when partitioned. Examples include computer vision algorithms, search, etc. HetCompute supports the pipeline pattern, and samples/src illustrating this pattern are shown in Pipeline.

HetCompute supports all the models described above: data parallelism through the use of hetcompute::pfor_each and GPU tasks, task parallelism using tasks and groups, braided parallelism by mapping tasks to different execution units, and pipeline parallelism using the hetcompute::pipeline pattern. Programmers are encouraged to think in terms of patterns. If the algorithm matches one of the parallel patterns, the path to speedup is short. Otherwise, task graphs are easy to construct in HetCompute, and they can express essentially any parallel computation. The HetCompute User Guide provides a detailed description of the HetCompute parallelization constructs, their design philosophy, and best practices.

5.4 Parallel Programming Patterns

Parallel patterns are the shipping industry’s containers for parallel programming. If your algorithm fits into a parallel pattern, it can be packaged quickly, letting you exploit multiple resources on your platform. Parallel Programming Patterns describes the HetCompute parallel patterns, and [14] is a good resource to learn more about parallel programming using patterns.

Moreover, the HetCompute patterns are built upon the basic APIs to create and manage tasks. Therefore, your own patterns can be built and they will seamlessly interoperate with the rest of the HetCompute runtime system. Patterns can also be composed with other patterns. For example, a stage in a hetcompute::pipeline pattern may contain any of the other HetCompute patterns or other HetCompute tasks.


5.5 Optimizations

Other than algorithmic decomposition of work and data, a parallel program requires tuning to a specific platform to achieve optimal performance. As a general rule, the following process should be used:

• Serial tuning: The code executed by each task should be optimized using classical optimization techniques: loop optimizations, strength reduction, and cache locality optimizations.

• Synchronization tuning: Coordinating parallel execution is typically considered overhead: the program executes additional instructions that are not necessarily part of the effective work. Such overhead includes serialization in critical sections, waiting for dependencies to be satisfied and/or condition variables to be signaled, etc. A well-tuned parallel program spends most of its time executing work as opposed to managing work. However, it may be necessary to replicate computation in order to minimize synchronization. Using HetCompute asynchronous patterns is one way of avoiding synchronization.

• Parallel efficiency: A parallel execution is optimal when all execution units are equally busy, performing minimal redundant work. Therefore, it is important to balance the computation across all processors. This can be achieved by combining algorithmic decomposition (finer-grain tasks allow better load balancing) with attention to architectural characteristics, such as resource sharing and the overhead of spawning tasks (coarser-grain tasks typically incur less overhead). In HetCompute, one can tune a number of parameters to control the granularity of tasks that execute the pattern.

These topics are discussed briefly in the next sections.

5.5.1 Cache locality

There are two types of memory reference locality: temporal locality, in which program references to a given memory address are clustered together in time, and spatial locality, in which program references to neighboring addresses are clustered in time [10]. Caches transparently take advantage of both types of locality: replacement policies exploit temporal locality, while wide cache lines and prefetching techniques exploit spatial locality. Moreover, current architectures provide several levels of caches, with different sharing patterns. For example, level one caches (L1) are typically split between instructions and data, and are private to a core (shared by the hyperthreads in the case of an SMT architecture), and level two caches (L2) are shared by multiple cores.

Although programmers do not have direct control over caching, code and data can be structured to improve the locality of reference, thus making effective use of cache mechanisms [18]. In a multicore system, caches are a shared resource, thus programmers should consider the following:

• Consistency: Most multicore shared memory systems provide hardware coherency [10]. However, architectures implement different consistency models [1], thereby affecting the way shared memory updates are visible to different threads. In particular, the ARM architecture defines a weak memory consistency model. The C++11 standard defines primitives to enforce the ordering of memory operations for all atomic accesses. Experienced programmers can exploit orderings weaker than sequential consistency to obtain better performance on such systems.

• False sharing: False sharing [19] arises when independent data items used by two tasks executing concurrently on two different cores are co-located in the same cache line. Because the unit of coherence is the cache line, if the items are accessed by both tasks, the line will be forced by the coherence protocol to bounce between caches. False sharing can be avoided by separating data items accessed by different concurrent tasks into separate cache lines, using techniques such as padding [17] and/or allocation to cache line boundaries. To improve locality of reference and limit memory fragmentation, programmers should group data items accessed by a single task as close together as possible, preferably in contiguous blocks of memory addresses.

• Cache interference: In serial applications, cache optimizations are tuned to the entire cache. However, in a parallel application, caches are shared by execution units. A carefully tuned parallel program should maximize the utilization of the cache by ensuring reuse of truly shared data. For example, by maintaining a single copy of read-only shared data and referencing it simultaneously, one will exploit temporal locality and minimize the amount of cache used. To minimize contention and interference, the working set sizes of the tasks should fit in the cache. Tiling and cache blocking [6], [20] parameters must be tuned considering the capacity available when the caches are shared.
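The padding technique mentioned under false sharing can be sketched in standard C++ as follows. This is an illustration under stated assumptions rather than HetCompute code: the 64-byte line size is assumed and should be checked against the target core, and the names PaddedCounter and count_in_parallel are hypothetical. Over-aligned elements in a std::vector are only guaranteed to be allocated on their alignment boundary from C++17 onward.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64; // assumed line size; verify for the target core

// alignas pads the struct to a full cache line, so neighbouring counters
// never share a line and concurrent updates do not invalidate each other.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

// Every thread increments only its own counter `iterations` times.
std::vector<long> count_in_parallel(unsigned threads, long iterations)
{
    std::vector<PaddedCounter> counters(threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&counters, t, iterations] {
            for (long i = 0; i < iterations; ++i) ++counters[t].value;
        });
    for (auto& th : pool) th.join();
    std::vector<long> result;
    for (const auto& c : counters) result.push_back(c.value);
    return result;
}
```

Removing the alignas specifier leaves the results unchanged but packs several counters into one line, which is the layout that the false sharing item warns against.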

Many other cache locality optimizations are described in the literature.

5.5.2 Minimizing wait time and synchronization

As mentioned, efficient parallel execution implies balanced execution of non-redundant work on all computing units, while minimizing management overhead. As shown in Parallel Speedups, the fraction of serial execution (Amdahl’s law) limits the effectiveness of the parallel application. Therefore, programmers should carefully consider different factors that serialize execution, including:

• Avoid waiting for single tasks: Long chains of dependencies, and/or frequently waiting for the results of single tasks, limit the level of parallelism available in applications [13]. HetCompute groups can be used to wait for sets of tasks, thus potentially minimizing the overall amount of stalling.

• Data synchronization: Synchronizing shared memory accesses may introduce considerable serialization or cache conflict overhead. Such overhead can be reduced by the following optimizations:

– Privatize data [3]: mutually exclusive partitioning of shared data. For example, partitioning an image into tiles, where each task works on a different tile. In cases where the partitioning is not obvious, programmers can copy shared data into private buffers, work on the private data, and then synchronize changes to the shared copy. Parallel reductions, and parallel gather and scatter operations are helpful in reshaping the private and shared data formats.

– Avoid large critical sections: Because critical sections guarantee mutual exclusion, they serialize the execution of tasks that are accessing these areas. Minimizing the time spent in critical sections, in particular when they are highly contended, will reduce the synchronization overhead.

– Use atomic operations: Atomic operations rely on hardware capabilities for efficient shared data accesses, and choosing the appropriate memory ordering further reduces the synchronization overhead.
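The three optimizations above combine naturally in a reduction. The sketch below uses only the C++ standard library and is an assumption-laden illustration, not HetCompute code: parallel_sum is a hypothetical name and std::thread stands in for runtime tasks. Each worker accumulates into a private variable and touches the shared atomic exactly once, with relaxed ordering.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sums `data` with `workers` threads using privatized partial sums.
long long parallel_sum(const std::vector<int>& data, unsigned workers)
{
    workers = std::max(1u, workers);
    std::atomic<long long> total{0};
    const std::size_t chunk = (data.size() + workers - 1) / workers; // ceiling division
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&data, &total, begin, end] {
            long long local = 0; // privatized partial result, no sharing in the loop
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            // One atomic update per worker; relaxed ordering is sufficient here
            // because join() below orders these updates before the final load.
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& t : pool) t.join();
    return total.load();
}
```

Updating the shared total inside the inner loop would produce the same result, but every iteration would then contend for the same cache line.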

HetCompute encourages an asynchronous programming style. Besides providing parallel programming patterns with asynchronous semantics, the execution model of HetCompute is one in which fine-grained tasks are placed in a dependence graph, which minimizes the need for waits. By contrast, fork-join models spawn a large set of work that needs to complete before control flows past the join. Asynchronous concurrency is also preferable in the case of heterogeneous computing because resources need not be blocked waiting for an off-load device to complete the work.


5.5.3 Load balancing

Serialization is one of the potential pitfalls of parallel programming. Another is the under-utilization of compute units. If the parallel computation is unbalanced, some processors will be idle, thereby cutting into the potential performance gains. To avoid such scenarios, programmers should pay attention to balancing the work. This can be achieved in several ways, the most popular being:

• Tuning the task granularity: Task granularity represents the amount of work in a task. Ideally, if the amount of work is known, one can balance the computation manually. However, this is not the case for irregular applications, in which case overdecomposition and relying on the dynamic scheduling of the HetCompute runtime is a better option. Task granularity also plays an important role in managing the overhead. As task granularity decreases, the overhead of managing the parallel execution becomes a larger fraction of the total time. Therefore, coarser tasks are preferred to minimize the overhead. This is an important trade-off that programmers need to weigh. HetCompute makes it easy to explore these trade-offs by providing a set of flexible APIs to create tasks.

• Overdecomposition: Overdecomposition is the mechanism by which programmers ensure that there is enough parallel work in the system, so that the runtime always has work to schedule. Overdecomposition is defined as creating more tasks than the number of computation units available, such that if a task blocks or waits for dependencies to be satisfied, other independent tasks continue to make progress. The more independent tasks are provided, the better the load balancing that can be achieved. Of course, one needs to take into consideration the task granularity and manage the overhead.
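The interplay between granularity and overdecomposition can be sketched with a small dynamic scheduler written against the C++ standard library. This is not how the HetCompute runtime is implemented; dynamic_for and its grain parameter are hypothetical names used only for illustration. The range is cut into many more chunks than there are workers, and idle workers claim the next unclaimed chunk from a shared counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs body(i) for every i in [0, n). `grain` is the number of iterations
// per chunk: smaller values balance better, larger values cost less overhead.
void dynamic_for(std::size_t n, std::size_t grain, unsigned workers,
                 const std::function<void(std::size_t)>& body)
{
    grain = std::max<std::size_t>(1, grain);
    std::atomic<std::size_t> next{0}; // index of the first unclaimed iteration
    auto worker = [&] {
        for (;;) {
            const std::size_t begin = next.fetch_add(grain); // claim one chunk
            if (begin >= n) return;                          // no work left
            const std::size_t end = std::min(n, begin + grain);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < std::max(1u, workers); ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

With n = 1000 and grain = 16 there are 63 chunks for, say, 4 workers, so a worker that draws an expensive chunk simply claims fewer chunks overall.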

5.6 Conclusions

Parallel programming is fun and intellectually challenging. Many factors come into play when building a parallel application, and they may not be obvious. The techniques described in this tutorial will help you reach the main goal of parallel programming: speeding up the execution of the application. HetCompute is designed to ease this task and provide abstractions that make it convenient to express parallel computation. The hard work of creating a parallel algorithm remains; however, HetCompute and these techniques will help encode these algorithms into an efficient solution.


6 Image Processing Tutorial

6.1 Abstract

The goal of this tutorial is to illustrate how to use the HetCompute programming model to process images using task parallelism and shared memory.

6.2 Image Processing Filter

As an example, the non-local means (NL-means) image denoising algorithm [4] is explored. In this algorithm, the estimated value of a pixel is computed as a weighted average of all pixels in the image. The weights depend on the similarity between pairs of pixels, which is defined as a decreasing function of the weighted Euclidean distance. Pixels with a similar gray level neighborhood have, on average, larger weights. For a practical computational algorithm, the search for similar windows is restricted to an SxS search window. In [4], a 21x21 search window with a 7x7 similarity square neighborhood is considered robust enough to denoise, while taking care of the finer details.

The weights are computed using the following equation:

$$w(i, j) = \frac{1}{Z(i)}\, e^{-\frac{\|v(N_i) - v(N_j)\|_{2,a}^{2}}{h^{2}}},$$

where Z(i) is the normalizing constant:

$$Z(i) = \sum_{j} e^{-\frac{\|v(N_i) - v(N_j)\|_{2,a}^{2}}{h^{2}}}$$

Pseudo-code for implementing this algorithm is shown below.

#define SEARCH_WINDOW_SIZE 21
#define SIMILARITY_WINDOW_SIZE 7

void compute_weights(Pixel* __restrict input, Point point,
                     int weights[SEARCH_WINDOW_SIZE][SEARCH_WINDOW_SIZE])
{
    // compute similarity using Euclidean distance in the similarity window,
    // using the equation above.
    // reads from the input, writes into the array weights
}

void denoise_image(Pixel* __restrict input, int width, int height, Pixel* output)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) { // iterate through all pixel points in the input image
            Point point(x, y);
            // compute weights for pixel points in the search window
            int w[SEARCH_WINDOW_SIZE][SEARCH_WINDOW_SIZE];
            compute_weights(input, point, w);
            // denoise: compute the weighted average for this pixel point
            for (int i = 0; i < SEARCH_WINDOW_SIZE; i++) {
                for (int j = 0; j < SEARCH_WINDOW_SIZE; j++) {
                    Point neighbor(point.x - SEARCH_WINDOW_SIZE / 2 + i,
                                   point.y - SEARCH_WINDOW_SIZE / 2 + j);
                    output[point] += w[i][j] * input[neighbor];
                }
            }
        }
    }
}
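The body of compute_weights is left open in the pseudo-code. One possible way to evaluate the weight equation is sketched below in standard C++. Several simplifications are assumptions made here for brevity: patches are passed as flattened vectors of gray levels, all patches are assumed to have the same length, the Gaussian kernel a is omitted (an unweighted squared distance is used), and the names Patch and nl_means_weights are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Patch = std::vector<float>; // flattened gray levels of one similarity window

// Returns w(i, j) for a reference patch v(Ni) against every candidate v(Nj):
// exp(-||v(Ni) - v(Nj)||^2 / h^2), divided by Z(i) so that the weights sum to 1.
// Assumes a non-empty candidate list and equally sized patches.
std::vector<double> nl_means_weights(const Patch& reference,
                                     const std::vector<Patch>& candidates, double h)
{
    std::vector<double> w;
    double z = 0.0; // normalizing constant Z(i)
    for (const Patch& c : candidates) {
        double dist2 = 0.0; // squared Euclidean distance between the two patches
        for (std::size_t k = 0; k < reference.size(); ++k) {
            const double d = reference[k] - c[k];
            dist2 += d * d;
        }
        w.push_back(std::exp(-dist2 / (h * h)));
        z += w.back();
    }
    for (double& v : w) v /= z;
    return w;
}
```

A candidate identical to the reference patch has distance zero and therefore receives the largest weight, matching the statement that similar neighborhoods contribute more to the average.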

6.3 Parallel Image Processing using HetCompute

The denoising algorithm presented above is embarrassingly parallel. Because the implementation is not in-place (the output is written to a separate image), each pixel can be processed in parallel with all other pixels.

6.3.1 Naive Parallelization

Given this algorithm, a naive implementation simply parallelizes the loop nest over pixels in denoise_image, creating a task for each pixel and launching it asynchronously:

void denoise_image(Pixel* __restrict input, int width, int height, Pixel* output)
{
    auto g = create_group("denoise");

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            // iterate through all pixel points in the input image
            g->launch([=] {
                Point point(x, y);
                // compute weights for pixel points in the search window
                int w[SEARCH_WINDOW_SIZE][SEARCH_WINDOW_SIZE];
                compute_weights(input, point, w);
                // denoise: compute the weighted average for this pixel point
                for (int i = 0; i < SEARCH_WINDOW_SIZE; i++) {
                    for (int j = 0; j < SEARCH_WINDOW_SIZE; j++) {
                        Point neighbor(point.x - SEARCH_WINDOW_SIZE / 2 + i,
                                       point.y - SEARCH_WINDOW_SIZE / 2 + j);
                        output[point] += w[i][j] * input[neighbor];
                    }
                }
            });
        }
    }
    g->wait_for(); // wait for all the tasks to complete
}

While such a parallelization strategy is very simple and easy to implement in HetCompute, the performance of such an implementation may not be optimal, for several reasons, as discussed in the Parallel Processing Tutorial. In particular, this implementation is too fine-grained to overcome the parallel overhead and does not exploit cache locality.

6.3.2 Tiling for Parallelization

A simple method to coarsen the granularity of tasks is to tile the image and spawn a task for each tile. This can be done either by tiling the loop directly:

void denoise_image(Pixel* __restrict input, int width, int height, Pixel* output)
{
    auto g = create_group("denoise");
    for (int y = 0; y < height; y += TILE_SIZE_ROW) {
        for (int x = 0; x < width; x += TILE_SIZE_COL) {
            // iterate through all pixel points in the input tile
            g->launch([=] {
                for (int ty = y; ty < y + TILE_SIZE_ROW; ty++) {
                    // iterate through pixel points in the tile
                    for (int tx = x; tx < x + TILE_SIZE_COL; tx++) {
                        Point point(tx, ty);
                        // compute weights for pixel points in the search window
                        int w[SEARCH_WINDOW_SIZE][SEARCH_WINDOW_SIZE];
                        compute_weights(input, point, w);
                        // denoise: compute the weighted average for this pixel point
                        for (int i = 0; i < SEARCH_WINDOW_SIZE; i++) {
                            for (int j = 0; j < SEARCH_WINDOW_SIZE; j++) {
                                Point neighbor(point.x - SEARCH_WINDOW_SIZE / 2 + i,
                                               point.y - SEARCH_WINDOW_SIZE / 2 + j);
                                output[point] += w[i][j] * input[neighbor];
                            }
                        }
                    }
                }
            });
        }
    }
    g->wait_for(); // wait for all the tasks to complete
}

or by restructuring the code so that a denoise kernel identical to the serial implementation is preserved and only its invocation is parallelized:

void denoise_kernel(Pixel* __restrict input, int startX, int width, int startY, int height, Pixel* output)
{
    for (int y = startY; y < startY + height; y++) {
        for (int x = startX; x < startX + width; x++) {
            // iterate through all pixel points in the input tile
            Point point(x, y);
            // compute weights for pixel points in the search window
            int w[SEARCH_WINDOW_SIZE][SEARCH_WINDOW_SIZE];
            compute_weights(input, point, w);
            // denoise: compute the weighted average for this pixel point
            for (int i = 0; i < SEARCH_WINDOW_SIZE; i++) {
                for (int j = 0; j < SEARCH_WINDOW_SIZE; j++) {
                    Point neighbor(point.x - SEARCH_WINDOW_SIZE / 2 + i,
                                   point.y - SEARCH_WINDOW_SIZE / 2 + j);
                    output[point] += w[i][j] * input[neighbor];
                }
            }
        }
    }
}

void denoise_image(Pixel* __restrict input, int width, int height, Pixel* output)
{
    // initialization, etc
    auto g = create_group("denoise");
    for (int y = 0; y < height; y += TILE_SIZE_ROW) {
        for (int x = 0; x < width; x += TILE_SIZE_COL) {
            g->launch([=] {
                denoise_kernel(input, x, TILE_SIZE_COL, y, TILE_SIZE_ROW, output);
            });
        }
    }
    g->wait_for(); // wait for all the tasks to complete
}
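The tiled loops in this section iterate over full tiles, which implicitly assumes that the image width and height are multiples of the tile dimensions. When that does not hold, the tiles on the right and bottom edges need to be clamped. The sketch below shows one way to enumerate clamped tiles in standard C++; the Tile struct and the make_tiles helper are hypothetical names introduced for illustration and are not part of HetCompute. Positive tile sizes are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Tile { int x, y, width, height; }; // top-left corner and extent in pixels

// Splits a width x height image into tiles of at most tile_w x tile_h pixels.
// Edge tiles are shrunk so that no tile extends beyond the image.
std::vector<Tile> make_tiles(int width, int height, int tile_w, int tile_h)
{
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += tile_h)
        for (int x = 0; x < width; x += tile_w)
            tiles.push_back({x, y, std::min(tile_w, width - x), std::min(tile_h, height - y)});
    return tiles;
}
```

Each resulting tile could then be handed to a kernel with the signature of denoise_kernel, passing tile.width and tile.height instead of the fixed tile sizes.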


6.3.3 Parallelization using patterns

The tiling optimization takes care of locality, because each task computes at a larger granularity. However, as you may notice, the driver launches a task for every tile and then waits for all of them to finish. While HetCompute is efficient in launching many tasks, it also has a pattern API that provides better load balancing and management of resources. In particular, this example uses the hetcompute::pfor_each construct.

The kernel remains the same, but the driver function is rewritten:

void denoise_image(Pixel* __restrict input, int width, int height, Pixel* output)
{
    // initialization, etc
    hetcompute::range<2> r(0, width, TILE_SIZE_COL, 0, height, TILE_SIZE_ROW);
    // r is captured so that the lambda can convert the linear index
    hetcompute::pfor_each(r, [&r, input, output](size_t index) {
        hetcompute::index<2> idx = r.linear_to_index(index);
        denoise_kernel(input, idx[0], TILE_SIZE_COL, idx[1], TILE_SIZE_ROW, output);
    });
}

In analyzing the code:

hetcompute::range<2> defines a 2-dimensional range, [0, width) x [0, height), with a stride of TILE_SIZE_COL in the first dimension and TILE_SIZE_ROW in the second. Each dimension can have a different stride. This range is used as the iteration space for the parallel loop.

The hetcompute::pfor_each() construct expresses the fact that this is a parallel loop, that is, all iterations can execute concurrently. A lambda expression is passed that encapsulates the kernel (just as when launching tasks in the previous section). In this case, the lambda takes an argument, the linear index in the range. The hetcompute::pfor_each pattern allows HetCompute to aggregate tasks as appropriate for the platform.

hetcompute::index<2> defines a 2-dimensional index. HetCompute ranges know how to iterate to the appropriate points. The linear_to_index call returns an object that has the appropriate coordinates in each dimension of the range. These can be directly accessed using the [] operator on the object.

Note that there is no hetcompute::wait_for() at the end of the loop. hetcompute::pfor_each is a synchronous operation and will wait internally for all tasks in the iteration to complete. HetCompute also provides an asynchronous version of pfor_each, namely hetcompute::pfor_each_async(), which should be used when subsequent computation does not require the data produced in the loop and can therefore proceed, for example, to overlap the denoising of several images. See the HetCompute reference manual for the hetcompute::pfor_each_async() specification.


7 Point Kernels (Beta feature)

A Point Kernel, combined with the hetcompute::pfor_each pattern (see Parallel Iteration), is automatically scheduled for heterogeneous execution across the CPU, GPU, and DSP by the HetCompute runtime. A Point Kernel is written in C99 with some minor restrictions. The programmer is encouraged to use the hetcompute::pattern::tuner to experiment with and set the distribution of workload across the CPU, GPU, and DSP. For example, hetcompute::pattern::tuner().set_cpu_load(30).set_gpu_load(50).set_dsp_load(20) instructs the HetCompute runtime to partition the range of iterations such that 30% of the iterations are assigned to the CPU, 50% to the GPU, and the remaining 20% to the DSP. In the present release, there are a few constraints on the code inside a Point Kernel:

1. The pfor_each with a Point Kernel shall write to only one output buffer.

2. Each iteration i of the pfor_each range shall write only to index i of the output buffer.

A Point Kernel captures the operations performed at a point in an iteration space. For example, a vector-add point kernel computes the sum of two vector elements A[i] and B[i] and stores the result in C[i], for every point i in a hetcompute::range of iterations. In contrast to OpenCL kernels, which can synchronize across work-items in a work-group using, e.g., barriers, a Point Kernel captures pure data-parallelism such that no two points in the iteration space can synchronize with each other during a kernel’s execution; all synchronization is deferred until after the kernel’s execution. In practice, this is not a significant limitation, as several algorithms in multiple domains such as image processing, video encoding, and simultaneous localization and mapping can be expressed as Point Kernels.
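The effect of a 30/50/20 load setting on an iteration range can be pictured with a small sketch in standard C++. How the HetCompute runtime actually partitions the range is not specified here, so the scheme below (contiguous sub-ranges, with the DSP receiving the remainder) is purely an assumption, and Span and split_by_load are hypothetical names. The CPU and GPU percentages are assumed to sum to at most 100.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Span { std::size_t begin, end; }; // half-open interval [begin, end)

// Splits [0, n) into CPU, GPU, and DSP sub-ranges by percentage.
// The DSP span receives whatever remains, so the three spans always cover [0, n).
std::array<Span, 3> split_by_load(std::size_t n, unsigned cpu_pct, unsigned gpu_pct)
{
    const std::size_t cpu_end = n * cpu_pct / 100;
    const std::size_t gpu_end = cpu_end + n * gpu_pct / 100;
    return {{Span{0, cpu_end}, Span{cpu_end, gpu_end}, Span{gpu_end, n}}};
}
```

Because every iteration writes only to its own index of the single output buffer, such sub-ranges can be processed on different devices without any synchronization between them until the kernel completes.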


8 Patterns Reference API

The Qualcomm HetCompute parallel patterns API provides programmers with a high-level interface to express commonly used parallel programming idioms, such as parallel loops, parallel prefix operations, parallel map and reduce operations, etc. We recommend considering one of the patterns before trying to implement custom task graphs, as the Qualcomm HetCompute runtime optimizes the execution of patterns. You can fine-tune the execution as explained in Section Tuner.


8.1 Parallel For Loop

Classes

• class hetcompute::pattern::pfor< T1, T2 >

• class hetcompute::pattern::pfor< hetcompute::internal::pointkernel::pointkernel< RT, PKType...>, T2 >

• class hetcompute::pattern::pfor< T1, void >

Functions

• template<typename UnaryFn >

pfor< UnaryFn, void > hetcompute::pattern::create_pfor_each (UnaryFn &&fn)

• template<typename KernelTuple , typename ArgTuple >

hetcompute::pattern::pfor< KernelTuple, ArgTuple > hetcompute::beta::pattern::create_pfor_each_helper (KernelTuple &&ktpl, ArgTuple &&atpl)

• template<class InputIterator , typename UnaryFn >

void hetcompute::pfor_each (InputIterator first, InputIterator last, UnaryFn &&fn, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<class InputIterator , typename UnaryFn >

void hetcompute::pfor_each (InputIterator first, const size_t stride, InputIterator last, UnaryFn &&fn, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<size_t Dims, typename UnaryFn >

void hetcompute::pfor_each (const hetcompute::range< Dims > &r, UnaryFn &&fn, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

Parallel version of std::for_each.

• template<class InputIterator , typename UnaryFn >

hetcompute::task_ptr< void()> hetcompute::pfor_each_async (InputIterator first, InputIterator last, UnaryFn fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<class InputIterator , typename UnaryFn >

hetcompute::task_ptr< void()> hetcompute::pfor_each_async (InputIterator first, const size_t stride, InputIterator last, UnaryFn fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<size_t Dims, typename UnaryFn >

hetcompute::task_ptr< void()> hetcompute::pfor_each_async (const hetcompute::range< Dims > &r, UnaryFn fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<size_t Dims, typename UnaryFn >

hetcompute::task_ptr< void()> hetcompute::pfor_each_async (const hetcompute::range< Dims > &r, const size_t stride, UnaryFn fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

80-P2432-1 B MAY CONTAIN U.S. AND INTERNATIONAL EXPORT CONTROLLED INFORMATION 161

8.1.1 Class Documentation

8.1.1.1 class hetcompute::pattern::pfor

template<typename T1, typename T2>class hetcompute::pattern::pfor< T1, T2 >

Public member functions

• pfor (T1 &&ktpl, T2 &&atpl)

• void add_run ()

• uint64_t get_cpu_task_time () const

• uint64_t get_dsp_task_time () const

• uint64_t get_gpu_task_time () const

• template<size_t Dims>

void operator() (const hetcompute::range< Dims > &r, hetcompute::pattern::tuner &t=hetcompute::pattern::tuner().set_cpu_load(100))

• double query_dsp_profile () const

• double query_gpu_profile () const

• template<size_t Dims>

void run (const hetcompute::range< Dims > &r, hetcompute::pattern::tuner &t=hetcompute::pattern::tuner().set_cpu_load(100))

• void set_cpu_task_time (uint64_t ct)

• void set_dsp_profile (double dp)

• void set_dsp_task_time (uint64_t ht)

• void set_gpu_profile (double gp)

• void set_gpu_task_time (uint64_t gt)

Public Attributes

• T2 _atpl

• uint64_t _cpu_task_time

• double _dsp_profile

• uint64_t _dsp_task_time

• double _gpu_profile

• uint64_t _gpu_task_time

• T1 _ktpl

• size_t _num_runs

Friends

• template<typename KT , typename AT >

pfor< KT, AT > hetcompute::beta::pattern::create_pfor_each_helper (KT &&ktpl, AT &&atpl)

• template<size_t Dims, typename KernelTuple , typename ArgTuple , typename KernelFirst , typename... KernelRest,

typename... Args, typename Boolean , typename Buf_Tuple >

void hetcompute::internal::pfor_each (hetcompute::pattern::pfor< KernelTuple, ArgTuple > *const p, const hetcompute::range< Dims > &r, std::tuple< KernelFirst, KernelRest...> &klist, hetcompute::pattern::tuner &tuner, const Boolean called_with_pointkernel, Buf_Tuple &&buf_tup, Args &&...args)

8.1.1.2 class hetcompute::pattern::pfor<hetcompute::internal::pointkernel::pointkernel< RT, PKType...>, T2 >

template<typename RT, typename... PKType, typename T2>class hetcompute::pattern::pfor<hetcompute::internal::pointkernel::pointkernel< RT, PKType...>, T2 >

Public member functions

• pfor (pointkernel_type &pk, T2 &&atpl)

• void add_run ()

• uint64_t get_cpu_task_time () const

• uint64_t get_dsp_task_time () const

• uint64_t get_gpu_task_time () const

• template<size_t Dims>

void operator() (const hetcompute::range< Dims > &r, hetcompute::pattern::tuner &t=hetcompute::pattern::tuner().set_cpu_load(100))

• double query_dsp_profile () const

• double query_gpu_profile () const

• template<size_t Dims>

void run (const hetcompute::range< Dims > &r, hetcompute::pattern::tuner &t=hetcompute::pattern::tuner().set_cpu_load(100))

• void set_cpu_task_time (uint64_t ct)

• void set_dsp_profile (double dp)

• void set_dsp_task_time (uint64_t ht)

• void set_gpu_profile (double gp)

• void set_gpu_task_time (uint64_t gt)

Public Attributes

• T2 _atpl

• uint64_t _cpu_task_time

• double _dsp_profile

• uint64_t _dsp_task_time

• double _gpu_profile

• uint64_t _gpu_task_time

• size_t _num_runs

• pointkernel_type & _pk

Friends

• template<typename RetType , typename... PKArgs, typename ArgTuple >

pfor< hetcompute::internal::pointkernel::pointkernel< RetType, PKArgs...>, ArgTuple > hetcompute::beta::pattern::create_pfor_each_helper (hetcompute::internal::pointkernel::pointkernel< RetType, PKArgs...> &pk, ArgTuple &&atpl)

• template<size_t Dims, typename KernelTuple , typename ArgTuple , typename KernelFirst , typename... KernelRest,

typename... Args, typename Boolean , typename Buf_Tuple >

void hetcompute::internal::pfor_each (hetcompute::pattern::pfor< KernelTuple, ArgTuple > *const p, const hetcompute::range< Dims > &r, std::tuple< KernelFirst, KernelRest...> &klist, hetcompute::pattern::tuner &tuner, const Boolean called_with_pointkernel, Buf_Tuple &&buf_tup, Args &&...args)

8.1.1.3 class hetcompute::pattern::pfor< T1, void >

template<typename T1>class hetcompute::pattern::pfor< T1, void >

Public member functions

• pfor (T1 &&fn)

• template<typename InputIterator >

void operator() (InputIterator first, InputIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename InputIterator >

void operator() (InputIterator first, const size_t stride, InputIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<size_t Dims>

void operator() (const hetcompute::range< Dims > &r, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename InputIterator >

void run (InputIterator first, InputIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename InputIterator >

void run (InputIterator first, const size_t stride, InputIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<size_t Dims>

void run (const hetcompute::range< Dims > &r, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

Public Attributes

• T1 _fn

Friends

• pfor create_pfor_each (T1 &&fn)

• template<typename Fn , typename... Args>

hetcompute::task_ptr< void()> hetcompute::create_task (const pfor< Fn, void > &pf, Args &&...args)

8.1.2 Function Documentation

8.1.2.1 template<typename UnaryFn > pfor< UnaryFn, void > hetcompute::pattern::create_pfor_each ( UnaryFn && fn )

Create a pattern object from a function object fn.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

auto l = [](size_t) {};
auto pfor = hetcompute::pattern::create_pfor_each(l);
pfor(0, 100);

Parameters

fn Function object to apply to the range.

8.1.2.2 template<class InputIterator , typename UnaryFn > void hetcompute::pfor_each ( InputIterator first, InputIterator last, UnaryFn && fn, const hetcompute::pattern::tuner & t = hetcompute::pattern::tuner() )

Parallel version of std::for_each.

Applies function object fn in parallel to every iterator in the range [first, last). It has a default step size of one.

Note

An "iterator" refers to an object that enables a programmer to traverse a container. It can be an index of integral type or a pointer of RandomAccessIterator type.

In contrast to std::for_each and ptransform, the iterator is passed to the function, instead of the element.

It is permissible to modify the elements of the range from fn, provided that InputIterator is a mutable iterator.

Note

This function returns only after fn has been applied to the whole iteration range.

The usual rules for cancellation apply, i.e., within fn the cancellation must be acknowledged using abort_on_cancel.

Complexity

Exactly std::distance(first,last) applications of fn.

See Also

ptransform(InputIterator, InputIterator, UnaryFn)
abort_on_cancel()

Examples

[...]
// Parallel for-loop using indices
pfor_each(size_t(0), vin.size(),
          [=, &vin] (size_t i) {
              vin[i] = 2 * vin[i];
          });
[...]

Parameters

first Start of the range to which to apply fn.
last End of the range to which to apply fn.
fn Unary function object to be applied.
t Qualcomm HETCOMPUTE pattern tuner object (optional).
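The index-passing behaviour described in this section can be illustrated without the SDK. The following is a minimal sketch that uses only the C++ standard library; sketch_pfor_each, its num_workers parameter, and its chunking scheme are hypothetical stand-ins chosen for illustration, not the HetCompute implementation. It shows that fn receives the index itself rather than the element, and that the call returns only after the whole range has been processed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel for-loop over the index range
// [first, last). The range is split into contiguous chunks, and each
// chunk is handled by its own worker thread.
template <typename UnaryFn>
void sketch_pfor_each(std::size_t first, std::size_t last, UnaryFn fn,
                      std::size_t num_workers = 4)
{
    assert(first <= last && num_workers > 0);
    const std::size_t n     = last - first;
    const std::size_t chunk = (n + num_workers - 1) / num_workers;

    std::vector<std::thread> workers;
    for (std::size_t begin = first; begin < last; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, last);
        // fn is called with the index i, not with the element at i.
        workers.emplace_back([begin, end, &fn] {
            for (std::size_t i = begin; i < end; ++i) fn(i);
        });
    }
    // Return only after fn has been applied to the whole range.
    for (auto& w : workers) w.join();
}

// Doubles every element, mirroring the index-based SDK example.
inline std::vector<int> double_all(std::vector<int> v)
{
    sketch_pfor_each(0, v.size(), [&v](std::size_t i) { v[i] = 2 * v[i]; });
    return v;
}
```

Because each index is visited exactly once and the chunks are disjoint, the lambda may safely write to v[i] from different threads.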

8.1.2.3 template<class InputIterator , typename UnaryFn > void hetcompute::pfor_each ( InputIterator first, const size_t stride, InputIterator last, UnaryFn && fn, const hetcompute::pattern::tuner & t = hetcompute::pattern::tuner() )

Parallel version of std::for_each with step size.

Applies function object fn in parallel to every iterator in the range [first, last) with step size defined by the stride parameter.

Note

The function object will be applied to iterators with an incremental step size (iter += stride).

Parameters

first Start of the range to which to apply fn.
stride Step size. The overloads without this parameter use a step size of one.
last End of the range to which to apply fn.
fn Unary function object to be applied.
t Qualcomm HETCOMPUTE pattern tuner object (optional).
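To make the step-size rule concrete, the sketch below lists which indices a strided loop over [first, last) visits when the iterator is advanced with iter += stride. The helper name strided_indices is hypothetical, and the code is a sequential model written with the standard library only; it assumes, following the half-open range notation, that no index at or beyond last is visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns first, first + stride, first + 2 * stride, ...
// for as long as the index stays below last.
inline std::vector<std::size_t> strided_indices(std::size_t first,
                                                std::size_t stride,
                                                std::size_t last)
{
    assert(stride > 0);  // a zero stride would never reach last
    std::vector<std::size_t> visited;
    for (std::size_t i = first; i < last; i += stride) {
        visited.push_back(i);
    }
    return visited;
}
```

For example, strided_indices(0, 3, 10) yields 0, 3, 6 and 9, so fn would be applied four times rather than ten.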

8.1.2.4 template<size_t Dims, typename UnaryFn > void hetcompute::pfor_each ( const hetcompute::range< Dims > & r, UnaryFn && fn, const hetcompute::pattern::tuner & t = hetcompute::pattern::tuner() )

Instead of passing in a pair of iterators, this form accepts a hetcompute::range object. Internally the indices are linearized before passing to the kernel function. It has a default step size of one.

Parameters

r Range object (1D, 2D or 3D) representing the iteration space.
fn Unary function object to be applied.
t Qualcomm HETCOMPUTE pattern tuner object (optional).
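As a rough illustration of what linearizing a multi-dimensional index means, the sketch below maps a 2D index to a single linear index and back. The type extent2d and both functions are hypothetical, and the ordering (x varies fastest) is an assumption made for this example; this section does not state which ordering the SDK uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2D iteration space: x in [0, width), y in [0, height).
struct extent2d {
    std::size_t width;
    std::size_t height;
};

// Map (x, y) to one linear index. Assumption: x varies fastest.
inline std::size_t linearize(const extent2d& e, std::size_t x, std::size_t y)
{
    assert(x < e.width && y < e.height);
    return y * e.width + x;
}

// Inverse mapping: recover (x, y) from the linear index.
inline std::array<std::size_t, 2> delinearize(const extent2d& e,
                                              std::size_t linear)
{
    assert(linear < e.width * e.height);
    return {linear % e.width, linear / e.width};
}
```

With such a mapping, a multi-dimensional iteration space can be treated as one contiguous 1D range for the purpose of splitting work.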

8.1.2.5 template<class InputIterator , typename UnaryFn > hetcompute::task_ptr<void()> hetcompute::pfor_each_async ( InputIterator first, InputIterator last, UnaryFn fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::pfor_each pattern.

Returns a task that represents the pattern’s execution. Operations on the task translate into operations on the executing pattern. The caller must launch the task. This API has a default step size of one.

Parameters

first Start of the range to which to apply fn.
last End of the range to which to apply fn.
fn Unary function object to be applied.
tuner Qualcomm HETCOMPUTE pattern tuner object (optional).

Returns

task_ptr Unlaunched task representing the pattern’s execution.
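The phrase "unlaunched task" means that creating the task does not run the loop; nothing happens until the caller launches it. The sketch below models only that contract with the standard library. The class sketch_task and the function sketch_pfor_each_async are hypothetical, run the loop inline on the calling thread, and are not the SDK's task types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, single-threaded model of a task that is created unlaunched.
class sketch_task {
public:
    explicit sketch_task(std::function<void()> body) : body_(std::move(body)) {}

    // Runs the stored work once; in this model it runs inline.
    void launch()
    {
        if (!done_) {
            body_();
            done_ = true;
        }
    }

    bool finished() const { return done_; }

private:
    std::function<void()> body_;
    bool                  done_ = false;
};

// Packages the loop over [first, last) into a task without running it.
template <typename UnaryFn>
std::shared_ptr<sketch_task> sketch_pfor_each_async(std::size_t first,
                                                    std::size_t last,
                                                    UnaryFn fn)
{
    assert(first <= last);
    return std::make_shared<sketch_task>([first, last, fn]() mutable {
        for (std::size_t i = first; i < last; ++i) fn(i);
    });
}
```

The data touched by fn is unchanged between creating the task and calling launch(), which is the property the caller relies on when composing tasks before starting them.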

8.1.2.6 template<class InputIterator , typename UnaryFn > hetcompute::task_ptr<void()> hetcompute::pfor_each_async ( InputIterator first, const size_t stride, InputIterator last, UnaryFn fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::pfor_each pattern (with step size).

Parameters

first Start of the range to which to apply fn.
stride Step size.
last End of the range to which to apply fn.
fn Unary function object to be applied.
tuner Qualcomm HETCOMPUTE pattern tuner object (optional).

Returns

task_ptr Unlaunched task representing the pattern’s execution.

8.1.2.7 template<size_t Dims, typename UnaryFn > hetcompute::task_ptr<void()> hetcompute::pfor_each_async ( const hetcompute::range< Dims > & r, UnaryFn fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::pfor_each pattern.

Returns a task that represents the pattern’s execution. Operations on the task translate into operations on the executing pattern. The caller must launch the task. This API has a default step size of one.

Parameters

r Range object (1D, 2D or 3D) representing the iteration space.
fn Unary function object to be applied.
tuner HETCOMPUTE pattern tuner object (optional).

Returns

task_ptr Unlaunched task representing the pattern’s execution.

8.1.2.8 template<size_t Dims, typename UnaryFn > hetcompute::task_ptr<void()> hetcompute::pfor_each_async ( const hetcompute::range< Dims > & r, const size_t stride, UnaryFn fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::pfor_each pattern (with step size).

Parameters

r Range object (1D, 2D or 3D) representing the iteration space.
stride Step size.
fn Unary function object to be applied.
tuner HETCOMPUTE pattern tuner object (optional).

Returns

task_ptr Unlaunched task representing the pattern’s execution.

8.2 Parallel Transformation

Classes

• class hetcompute::pattern::ptransformer< Fn >

Functions

• template<typename Fn >

ptransformer< Fn > hetcompute::pattern::create_ptransform (Fn &&fn)

• template<typename InputIterator , typename OutputIterator , typename UnaryFn >

std::enable_if<!std::is_same< hetcompute::pattern::tuner, typename std::remove_reference< UnaryFn >::type >::value, void >::type hetcompute::ptransform (InputIterator first, InputIterator last, OutputIterator d_first, UnaryFn &&fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename InputIterator , typename OutputIterator , typename BinaryFn >

std::enable_if<!std::is_same< hetcompute::pattern::tuner, typename std::remove_reference< BinaryFn >::type >::value, void >::type hetcompute::ptransform (InputIterator first1, InputIterator last1, InputIterator first2, OutputIterator d_first, BinaryFn &&fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename InputIterator , typename UnaryFn >

void hetcompute::ptransform (InputIterator first, InputIterator last, UnaryFn &&fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename InputIterator , typename OutputIterator , typename UnaryFn >

hetcompute::task_ptr< void()> hetcompute::ptransform_async (InputIterator first, InputIterator last, OutputIterator d_first, UnaryFn &&fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename InputIterator , typename OutputIterator , typename BinaryFn >

hetcompute::task_ptr< void()> hetcompute::ptransform_async (InputIterator first1, InputIterator last1, InputIterator first2, OutputIterator d_first, BinaryFn &&fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename InputIterator , typename UnaryFn >

hetcompute::task_ptr< void()> hetcompute::ptransform_async (InputIterator first, InputIterator last, UnaryFn &&fn, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

8.2.1 Class Documentation

8.2.1.1 class hetcompute::pattern::ptransformer

template<typename Fn>class hetcompute::pattern::ptransformer< Fn >

Public member functions

• template<typename InputIterator >

void operator() (InputIterator first, InputIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename InputIterator , typename OutputIterator >

void operator() (InputIterator first, InputIterator last, OutputIterator d_first, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename InputIterator1 , typename InputIterator2 , typename OutputIterator >

void operator() (InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, OutputIterator d_first, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename InputIterator >

void run (InputIterator first, InputIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename InputIterator , typename OutputIterator >

void run (InputIterator first, InputIterator last, OutputIterator d_first, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename InputIterator1 , typename InputIterator2 , typename OutputIterator >

void run (InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, OutputIterator d_first, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

Friends

• ptransformer create_ptransform (Fn &&fn)

• template<typename F , typename... Args>

hetcompute::task_ptr< void()> hetcompute::create_task (const ptransformer< F > &ptf, Args &&...args)

8.2.1.1.1 Related Function Documentation

8.2.1.1.1.1 template<typename Fn > ptransformer create_ptransform ( Fn && fn ) [friend]

Create a pattern object from a function object fn.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

auto l = [](size_t) {};
auto ptransform = hetcompute::pattern::create_ptransform(l);
ptransform(vin.begin(), vin.end());

Parameters

fn Function object to be applied.

8.2.2 Function Documentation

8.2.2.1 template<typename Fn > ptransformer< Fn > hetcompute::pattern::create_ptransform ( Fn && fn )

Create a pattern object from a function object fn.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

auto l = [](size_t) {};
auto ptransform = hetcompute::pattern::create_ptransform(l);
ptransform(vin.begin(), vin.end());

Parameters

fn Function object to be applied.

8.2.2.2 template<typename InputIterator , typename OutputIterator , typename UnaryFn > std::enable_if<!std::is_same<hetcompute::pattern::tuner, typename std::remove_reference<UnaryFn>::type>::value, void>::type hetcompute::ptransform ( InputIterator first, InputIterator last, OutputIterator d_first, UnaryFn && fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel version of std::transform.

Applies function object fn in parallel to every dereferenced iterator in the range [first, last) and stores the return value in another range, starting at d_first.

Note

This function returns only after fn has been applied to the whole iteration range.

In contrast to pfor_each, arguments specifying ranges are restricted to RandomAccessIterator, whereas pfor_each allows them to be of integral type representing indices.

Complexity

Exactly std::distance(first,last) applications of fn.

See Also

ptransform(InputIterator, InputIterator, UnaryFn&&, const hetcompute::pattern::tuner&)
pfor_each(InputIterator, InputIterator, UnaryFn&&, const hetcompute::pattern::tuner&)

Examples

// arr[i] == 2*vin[i]
size_t arr[vin.size()];
ptransform(begin(vin), end(vin), arr,
           [=] (size_t const& v) {
               return 2*v;
           });

Parameters

first Start of the range to which to apply fn.
last End of the range to which to apply fn.
d_first Start of the destination range.
fn Unary function object to be applied.
tuner Qualcomm HetCompute pattern tuner object (optional).
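The input-to-output mapping of this overload can be modelled sequentially with std::transform. The sketch below is a standard-library-only illustration; sketch_ptransform, its chunk parameter, and the helper doubled are hypothetical names, and the chunks are processed one after another here rather than in parallel. It shows that each chunk reads a slice of the input and writes the matching slice of the destination, which is addressed with random access arithmetic on the iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical sequential model: applies fn to [first, last) chunk by chunk
// and stores each result at the same offset from d_first.
template <typename InIt, typename OutIt, typename UnaryFn>
void sketch_ptransform(InIt first, InIt last, OutIt d_first, UnaryFn fn,
                       std::ptrdiff_t chunk = 2)
{
    assert(chunk > 0);
    const std::ptrdiff_t n = std::distance(first, last);
    for (std::ptrdiff_t begin = 0; begin < n; begin += chunk) {
        const std::ptrdiff_t end = std::min(begin + chunk, n);
        // Chunks use disjoint input and output slices, so they do not
        // depend on each other.
        std::transform(first + begin, first + end, d_first + begin, fn);
    }
}

// Returns a vector whose i-th element is 2 * vin[i].
inline std::vector<std::size_t> doubled(const std::vector<std::size_t>& vin)
{
    std::vector<std::size_t> out(vin.size());  // destination is pre-sized
    sketch_ptransform(vin.begin(), vin.end(), out.begin(),
                      [](const std::size_t& v) { return 2 * v; });
    return out;
}
```

Note that the destination must already have room for std::distance(first, last) elements before the call, as in the SDK example.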

8.2.2.3 template<typename InputIterator , typename OutputIterator , typename BinaryFn > std::enable_if<!std::is_same<hetcompute::pattern::tuner, typename std::remove_reference<BinaryFn>::type>::value, void>::type hetcompute::ptransform ( InputIterator first1, InputIterator last1, InputIterator first2, OutputIterator d_first, BinaryFn && fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel version of std::transform.

Applies function object fn in parallel to every pair of dereferenced input iterators from the ranges [first1, last1) and [first2, ...), and stores the return value in another range, starting at d_first.

Note

This function returns only after fn has been applied to the whole iteration range.

In contrast to pfor_each, arguments specifying ranges are restricted to RandomAccessIterator, whereas pfor_each allows them to be of integral type representing indices.

Complexity

Exactly std::distance(first1,last1) applications of fn.

See Also

ptransform(InputIterator, InputIterator, UnaryFn&&, const hetcompute::pattern::tuner&)
pfor_each(InputIterator, InputIterator, UnaryFn&&, const hetcompute::pattern::tuner&)

Examples

// vout[i] == vin[i] + vin[i+1]
vector<size_t> vout(vin.size()-1);
ptransform(begin(vin), begin(vin)+vout.size(),
           begin(vin)+1,
           begin(vout),
           [=] (size_t const& op1, size_t const& op2) {
               return op1 + op2;
           });

Parameters

first1 Start of the range to which to apply fn.
last1 End of the range to which to apply fn.
first2 Start of the second range to which to apply fn.
d_first Start of the destination range.
fn Binary function object to be applied.
tuner Qualcomm HetCompute pattern tuner object (optional).
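The roles of the four iterator arguments can be checked against the sequential std::transform, which takes them in the same order. The sketch below reproduces the adjacent-sum example with the standard library only; the function name adjacent_sums is hypothetical, and the guard for inputs shorter than two elements is an addition for this sketch, since vin.size() - 1 would wrap around for an empty vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Computes vout[i] == vin[i] + vin[i + 1] with the sequential std::transform.
inline std::vector<std::size_t> adjacent_sums(const std::vector<std::size_t>& vin)
{
    if (vin.size() < 2) return {};  // no adjacent pairs to add

    std::vector<std::size_t> vout(vin.size() - 1);
    assert(vout.size() + 1 == vin.size());

    const auto count = static_cast<std::ptrdiff_t>(vout.size());
    std::transform(vin.begin(), vin.begin() + count,  // [first1, last1)
                   vin.begin() + 1,                   // first2
                   vout.begin(),                      // d_first
                   std::plus<std::size_t>());         // binary fn
    return vout;
}
```

The second input range has no explicit end: it must provide at least std::distance(first1, last1) elements starting at first2, which is why the first range stops one element short of the end of vin.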

8.2.2.4 template<typename InputIterator , typename UnaryFn > void hetcompute::ptransform ( InputIterator first, InputIterator last, UnaryFn && fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel version of std::transform.

Applies function object fn in parallel to every dereferenced iterator in the range [first, last).

Note

In contrast to pfor_each, the dereferenced iterator is passed to the function.

In contrast to pfor_each, arguments specifying ranges are restricted to RandomAccessIterator, whereas pfor_each allows them to be of integral type representing indices.

It is permissible to modify the elements of the range from fn, assuming that InputIterator is a mutable iterator.

Note

This function returns only after fn has been applied to the whole iteration range.

Complexity

Exactly std::distance(first,last) applications of fn.

See Also

pfor_each(InputIterator, InputIterator, UnaryFn&&, const hetcompute::pattern::tuner&)

Examples

// In-place double the value for all elements in range
ptransform(begin(vin), end(vin),
           [] (size_t& v) {
               v *= 2;
           });

Parameters

first Start of the range to which to apply fn.
last End of the range to which to apply fn.
fn Unary function object to be applied.
tuner Qualcomm HetCompute pattern tuner object (optional).
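The difference between the element-passing style of the in-place ptransform and the index-passing style of pfor_each can be seen side by side in a sequential model. The sketch below uses only the standard library; both function names are hypothetical, and neither is parallel. It is meant to show only what the callable receives in each style.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Element style (as in the in-place ptransform): the callable receives a
// reference to the element and modifies it directly.
inline void double_by_element(std::vector<std::size_t>& v)
{
    std::for_each(v.begin(), v.end(), [](std::size_t& e) { e *= 2; });
}

// Index style (as in pfor_each): the callable receives the index and has to
// look the element up itself.
inline void double_by_index(std::vector<std::size_t>& v)
{
    auto fn = [&v](std::size_t i) {
        assert(i < v.size());
        v[i] *= 2;
    };
    for (std::size_t i = 0; i < v.size(); ++i) fn(i);
}
```

Both functions leave the vector in the same state; the index style is the more convenient one when the callable needs the position itself, for example to read a neighbouring element.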

8.2.2.5 template<typename InputIterator , typename OutputIterator , typename UnaryFn > hetcompute::task_ptr<void()> hetcompute::ptransform_async ( InputIterator first, InputIterator last, OutputIterator d_first, UnaryFn && fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::ptransform pattern.

Note

The usual rules for cancellation apply, i.e., within fn the cancellation must be acknowledged using abort_on_cancel.

See Also

ptransform(InputIterator, InputIterator, OutputIterator, UnaryFn&&, const hetcompute::pattern::tuner&)

Examples

// create an async task from the ptransform pattern
auto t = ptransform_async(begin(vin), end(vin), arr,
                          [=] (size_t const& i) {
                              return 2 * i;
                          });
t->launch();
t->wait_for();

8.2.2.6 template<typename InputIterator , typename OutputIterator , typename BinaryFn > hetcompute::task_ptr<void()> hetcompute::ptransform_async ( InputIterator first1, InputIterator last1, InputIterator first2, OutputIterator d_first, BinaryFn && fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::ptransform pattern.

Note

The usual rules for cancellation apply, i.e., within fn the cancellation must be acknowledged using abort_on_cancel.

See Also

ptransform(InputIterator, InputIterator, InputIterator, OutputIterator, BinaryFn&&, const hetcompute::pattern::tuner&)

8.2.2.7 template<typename InputIterator , typename UnaryFn > hetcompute::task_ptr<void()> hetcompute::ptransform_async ( InputIterator first, InputIterator last, UnaryFn && fn, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::ptransform pattern.

Note

The usual rules for cancellation apply, i.e., within fn the cancellation must be acknowledged using abort_on_cancel.

See Also

ptransform(InputIterator, InputIterator, UnaryFn&&, const hetcompute::pattern::tuner&)

8.3 Parallel Reduction

Classes

• class hetcompute::pattern::preducer< Reduce, Join >

Functions

• template<typename Reduce , typename Join >

preducer< Reduce, Join > hetcompute::pattern::create_preduce (Reduce &&r, Join &&j)

• template<typename T , class InputIterator , typename Reduce , typename Join >

T hetcompute::preduce (InputIterator first, InputIterator last, const T &identity, Reduce &&reduce, Join &&join, hetcompute::pattern::tuner tuner=hetcompute::pattern::tuner())

• template<typename T , typename Container , typename Join >

T hetcompute::preduce (Container &c, const T &identity, Join &&join, hetcompute::pattern::tuner tuner=hetcompute::pattern::tuner())

Perform parallel reduction on some container c.

• template<typename T , typename Iterator , typename Join >

T hetcompute::preduce (Iterator first, Iterator last, const T &identity, Join &&join, hetcompute::pattern::tuner tuner=hetcompute::pattern::tuner())

Perform parallel reduction on range defined by iterators.

• template<typename T , class InputIterator , typename Reduce , typename Join >

hetcompute::task_ptr< T > hetcompute::preduce_async (InputIterator first, InputIterator last, const T &identity, Reduce &&reduce, Join &&join, hetcompute::pattern::tuner tuner=hetcompute::pattern::tuner())

• template<typename T , typename Container , typename Join >

hetcompute::task_ptr< T > hetcompute::preduce_async (Container &c, const T &identity, Join &&join, hetcompute::pattern::tuner tuner=hetcompute::pattern::tuner())

• template<typename T , typename Iterator , typename Join >

hetcompute::task_ptr< T > hetcompute::preduce_async (Iterator first, Iterator last, const T &identity, Join &&join, hetcompute::pattern::tuner tuner=hetcompute::pattern::tuner())

8.3.1 Class Documentation

8.3.1.1 class hetcompute::pattern::preducer

template<typename Reduce, typename Join>class hetcompute::pattern::preducer< Reduce, Join >

Public member functions

• template<typename T , typename InputIterator >

T operator() (InputIterator first, InputIterator last, const T &identity, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename T , typename InputIterator >

T run (InputIterator first, InputIterator last, const T &identity, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

Friends

• preducer create_preduce (Reduce &&r, Join &&j)

• template<typename T , typename R , typename J , typename... Args>

hetcompute::task_ptr< T > hetcompute::create_task (const preducer< R, J > &p, Args &&...args)

8.3.1.1.1 Related Function Documentation

8.3.1.1.1.1 template<typename Reduce , typename Join > preducer create_preduce ( Reduce && r, Join && j ) [friend]

Create a pattern object from the function objects Reduce and Join.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

// parallel sum over a vector of elements
vector<size_t> vec(100, 1);
const size_t identity = 0;
typedef vector<size_t>::iterator iter_type;

auto reduce = [&vec](iter_type it1, iter_type it2, size_t init) {
    return std::accumulate(it1, it2, init);
};
auto join = std::plus<size_t>();
auto preduce = hetcompute::pattern::create_preduce(reduce, join);

// result == 100
auto result = preduce(vec.begin(), vec.end(), identity);

Parameters

r Function object accumulating the result for a subrange.
j Function object combining two subrange results.

8.3.2 Function Documentation

8.3.2.1 template<typename Reduce , typename Join > preducer< Reduce, Join >

hetcompute::pattern::create_preduce ( Reduce && r, Join && j )

Create pattern object from function objects: Reduce and Join.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

// parallel sum over a vector of elements
vector<size_t> vec(100, 1);
const size_t identity = 0;
typedef vector<size_t>::iterator iter_type;

auto reduce = [&vec](iter_type it1, iter_type it2, size_t init) {
    return std::accumulate(it1, it2, init);
};
auto join = std::plus<size_t>();
auto preduce = hetcompute::pattern::create_preduce(reduce, join);

// result == 100
auto result = preduce(vec.begin(), vec.end(), identity);

Parameters

r Function object accumulating result for subrange.
j Function object combining two subrange results.

8.3.2.2 template<typename T , class InputIterator , typename Reduce , typename Join > T hetcompute::preduce ( InputIterator first, InputIterator last, const T & identity, Reduce && reduce, Join && join, hetcompute::pattern::tuner tuner = hetcompute::pattern::tuner() )

Performs parallel reduction by reducing the results using binary operator join. Returns the result of the reduction.

Note

The Qualcomm HetCompute parallel reduction pattern operates in two stages. In the first stage, it applies the reduction operation (reduce) to a set of subranges. In the second stage, the reduction results of all the subranges are aggregated (join) into the final result.

The binary operation is expected to be associative, but not necessarily commutative, as the algorithm does not swap operands of the reduce operation while working on the range.

InputIterator can be either a random-access iterator type or an integral type that represents iteration indices.

For a tiny iteration range and/or a trivial binary operator, it may not be worthwhile to parallelize the reduction operation.

The reduce function requires pass-by-reference semantics.

To achieve best performance, it is recommended to implement a move constructor and move assignment operator for a user-defined to-be-reduced type.
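The two-stage scheme described in the note can be modeled with a purely sequential sketch. The code below is not the SDK implementation and does not use any HetCompute API; the names two_stage_reduce and concat_words and the chunk parameter are hypothetical and exist only for illustration. It assumes the reduce function accumulates into a by-reference argument, as the note requires, and it joins subrange results left to right so that an associative but non-commutative operation such as string concatenation still gives the expected result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sequential model of a two-stage reduction (hypothetical helper, not SDK API).
// Stage 1: apply `reduce` to each contiguous chunk of [first, last), starting
//          from a copy of `identity`.
// Stage 2: fold the per-chunk results with `join`, left to right, so that the
//          operand order is never swapped.
template <typename T, typename Reduce, typename Join>
T two_stage_reduce(std::size_t first, std::size_t last, const T& identity,
                   Reduce reduce, Join join, std::size_t chunk)
{
    assert(chunk > 0);
    std::vector<T> partial;
    for (std::size_t lo = first; lo < last; lo += chunk)
    {
        const std::size_t hi = (last - lo < chunk) ? last : lo + chunk;
        T acc = identity;     // every subrange starts from the identity element
        reduce(lo, hi, acc);  // accumulator is passed by reference
        partial.push_back(std::move(acc));
    }
    T result = identity;
    for (const T& p : partial)
    {
        result = join(result, p);  // combine subrange results in range order
    }
    return result;
}

// Example: string concatenation is associative but not commutative.
inline std::string concat_words(const std::vector<std::string>& words, std::size_t chunk)
{
    return two_stage_reduce<std::string>(
        0, words.size(), std::string(),
        [&words](std::size_t i, std::size_t j, std::string& acc) {
            for (std::size_t k = i; k < j; ++k)
            {
                acc += words[k];
            }
        },
        [](const std::string& x, const std::string& y) { return x + y; },
        chunk);
}
```

Because the operation is associative and the identity leaves operands unchanged, the result of concat_words does not depend on the chunk size chosen, which is the property that lets a parallel runtime pick subranges freely.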

Complexity

Exactly last-first-1 applications of join.

Examples

[...]
const int identity = 0;
// Parallel sum
auto p_sum = hetcompute::preduce(size_t(0), vin.size(), identity,
    // reduce func
    [&vin](size_t i, size_t j, int& init) {
        for (size_t k = i; k < j; ++k)
            init += vin[k];
    },
    // join func
    [](int x, int y) { return x + y; });


Parameters

first Start of the range to parallel reduce.
last End of the range to parallel reduce.
identity A special element that leaves other elements unchanged when combined with them, i.e., x = identity ∗ x and x = x ∗ identity. For example, 0 is the identity element under addition for the real numbers.
reduce Reduce function that is applied to each subrange and returns the result.
join Joins the calculated results from two subranges.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

T Returns the reduction result of type T.

8.3.2.3 template<typename T , typename Container , typename Join > T hetcompute::preduce ( Container & c, const T & identity, Join && join, hetcompute::pattern::tuner tuner = hetcompute::pattern::tuner() )

A convenient API to use for applying parallel reduction on a passed-in container.

Note: Container must define size() and be indexable with operator[].

See Also

T preduce(InputIterator, InputIterator, const T&, Reduce&&, Join&&)

Examples

[...]
vector<int> vin;
// Initialize vin
[...]
const int identity = 0;
// Parallel sum
auto p_sum = hetcompute::preduce(vin, identity, std::plus<int>());

Parameters

c Container on which parallel reduce is applied.
identity A special element that leaves other elements unchanged when combined with them, i.e., x = identity ∗ x and x = x ∗ identity. For example, 0 is the identity element under addition for the real numbers.
join Joins the calculated results from two subranges.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

T Returns the reduction result of type T.
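The container requirement above (size() plus operator[]) and the role of the identity element can be illustrated with a small sequential sketch. Nothing here is SDK API: ring3 is a hypothetical minimal container and reduce_container is a hypothetical stand-in that touches the container only through size() and operator[]. It is assumed, for illustration only, that the container overload needs nothing beyond those two members.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical minimal container: exposes only size() and operator[].
struct ring3
{
    int data[3];
    std::size_t size() const { return 3; }
    int& operator[](std::size_t i) { return data[i]; }
    const int& operator[](std::size_t i) const { return data[i]; }
};

// Sequential stand-in for a container-based reduction (not SDK API).
// Only c.size() and c[i] are used, mirroring the documented requirement.
template <typename T, typename Container, typename Join>
T reduce_container(Container& c, const T& identity, Join join)
{
    T result = identity;
    for (std::size_t i = 0; i < c.size(); ++i)
    {
        result = join(result, c[i]);
    }
    return result;
}
```

With addition the identity is 0 and with multiplication it is 1; passing the wrong identity (for example 0 with multiplication) would change the result, which is why the identity must match the join operation.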


8.3.2.4 template<typename T , typename Iterator , typename Join > T hetcompute::preduce ( Iterator first, Iterator last, const T & identity, Join && join, hetcompute::pattern::tuner tuner = hetcompute::pattern::tuner() )

A convenient API to apply parallel reduction on a range specified by iterators.

Note: The developer must ensure that the iterators are valid.

See Also

T preduce(InputIterator, InputIterator, const T&, Reduce&&, Join&&)

Examples

[...]
vector<int> vin;
// Initialize vector
[...]
const int identity = 0;
// Parallel sum
auto p_sum = hetcompute::preduce(vin.begin(), vin.end(), identity, std::plus<int>());

Parameters

first Iterator to range start.
last Iterator to range end.
identity A special element that leaves other elements unchanged when combined with them, i.e., x = identity ∗ x and x = x ∗ identity. For example, 0 is the identity element under addition for the real numbers.
join Joins the calculated results from two subranges.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

T Returns the reduction result of type T.

8.3.2.5 template<typename T , class InputIterator , typename Reduce , typename Join > hetcompute::task_ptr<T> hetcompute::preduce_async ( InputIterator first, InputIterator last, const T & identity, Reduce && reduce, Join && join, hetcompute::pattern::tuner tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::preduce pattern.

Parameters

first Start of the range to parallel reduce.
last End of the range to parallel reduce.
identity A special element that leaves other elements unchanged when combined with them, i.e., x = identity ∗ x and x = x ∗ identity. For example, 0 is the identity element under addition for the real numbers.
reduce Reduce function that is applied to each subrange and returns the result.
join Joins the calculated results from two subranges.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

hetcompute::task_ptr<T> Task representing the pattern's execution; its result is the reduction result of type T.

8.3.2.6 template<typename T , typename Container , typename Join > hetcompute::task_ptr<T> hetcompute::preduce_async ( Container & c, const T & identity, Join && join, hetcompute::pattern::tuner tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::preduce pattern.

Parameters

c Container on which parallel reduce is applied.
identity A special element that leaves other elements unchanged when combined with them, i.e., x = identity ∗ x and x = x ∗ identity. For example, 0 is the identity element under addition for the real numbers.
join Joins the calculated results from two subranges.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

hetcompute::task_ptr<T> Task representing the pattern's execution; its result is the reduction result of type T.

8.3.2.7 template<typename T , typename Iterator , typename Join > hetcompute::task_ptr<T> hetcompute::preduce_async ( Iterator first, Iterator last, const T & identity, Join && join, hetcompute::pattern::tuner tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::preduce pattern.

Parameters

first Iterator to range start.
last Iterator to range end.
identity A special element that leaves other elements unchanged when combined with them, i.e., x = identity ∗ x and x = x ∗ identity. For example, 0 is the identity element under addition for the real numbers.
join Joins the calculated results from two subranges.
tuner Qualcomm HetCompute pattern tuner object (optional).


Returns

hetcompute::task_ptr<T> Task representing the pattern's execution; its result is the reduction result of type T.


8.4 Parallel Scan

Classes

• class hetcompute::pattern::pscan< BinaryFn >

Functions

• template<typename BinaryFn >

pscan< BinaryFn > hetcompute::pattern::create_pscan_inclusive (BinaryFn &&fn)

• template<typename RandomAccessIterator , typename BinaryFn >

void hetcompute::pscan_inclusive (RandomAccessIterator first, RandomAccessIterator last, BinaryFn &&fn, hetcompute::pattern::tuner tuner=hetcompute::pattern::tuner())

• template<typename RandomAccessIterator , typename BinaryFn >

hetcompute::task_ptr< void()> hetcompute::pscan_inclusive_async (RandomAccessIterator first, RandomAccessIterator last, BinaryFn fn, hetcompute::pattern::tuner tuner=hetcompute::pattern::tuner())

8.4.1 Class Documentation

8.4.1.1 class hetcompute::pattern::pscan

template<typename BinaryFn> class hetcompute::pattern::pscan< BinaryFn >

Public member functions

• template<typename RandomAccessIterator >

void operator() (RandomAccessIterator first, RandomAccessIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename RandomAccessIterator >

void run (RandomAccessIterator first, RandomAccessIterator last, hetcompute::pattern::tuner t=hetcompute::pattern::tuner())

Friends

• pscan create_pscan_inclusive (BinaryFn &&fn)

• template<typename Fn , typename... Args>

hetcompute::task_ptr< void()> hetcompute::create_task (const pscan< Fn > &ps, Args &&...args)

8.4.1.1.1 Related Function Documentation

8.4.1.1.1.1 template<typename BinaryFn > pscan create_pscan_inclusive ( BinaryFn && fn )[friend]

Create pattern object from function object fn.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.


Examples

auto l = std::plus<size_t>();
auto pscan = hetcompute::pattern::create_pscan_inclusive(l);
pscan(vec.begin(), vec.end());

8.4.2 Function Documentation

8.4.2.1 template<typename BinaryFn > pscan< BinaryFn > hetcompute::pattern::create_pscan_inclusive ( BinaryFn && fn )

Create pattern object from function object fn.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

auto l = std::plus<size_t>();
auto pscan = hetcompute::pattern::create_pscan_inclusive(l);
pscan(vec.begin(), vec.end());

8.4.2.2 template<typename RandomAccessIterator , typename BinaryFn > void hetcompute::pscan_inclusive ( RandomAccessIterator first, RandomAccessIterator last, BinaryFn && fn, hetcompute::pattern::tuner tuner = hetcompute::pattern::tuner() )

Sklansky-style parallel inclusive scan.

Performs an in-place parallel prefix computation using the function object fn for the range [first, last).

fn should be associative, because the order of applications is not fixed.

Note

This function returns only after fn has been applied to the whole iteration range.

Similar to hetcompute::ptransform, range iterators are restricted to type RandomAccessIterator.

The usual rules for cancellation apply, i.e., within fn the cancellation must be acknowledged using abort_on_cancel.

Examples

// After: v' = { v[0], v[0] x v[1], v[0] x v[1] x v[2], ... }
pscan_inclusive(begin(v), end(v),
    [] (size_t const& i, size_t const& j) {
        return i + j;
    });
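To make the phrase "Sklansky-style" concrete, the following sequential sketch walks through the level-by-level structure commonly associated with a Sklansky prefix network. It is an illustration under stated assumptions, not the SDK's implementation: the function name sklansky_scan_inclusive is hypothetical, the input is restricted to std::vector, and the loops run sequentially. At each level the updates inside the innermost loop are independent of each other, which is what a parallel runtime could exploit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Sequential model of a Sklansky-style in-place inclusive scan
// (hypothetical helper, not SDK API).
// At the level with half-width `half`, the range is viewed as blocks of
// 2 * half elements. The last element of each block's first half (the pivot)
// already holds the combination of that whole first half, and it is combined
// into every element of the second half. The pivot is always the left
// operand, so the operand order is preserved for non-commutative fn.
template <typename T, typename BinaryFn>
void sklansky_scan_inclusive(std::vector<T>& v, BinaryFn fn)
{
    const std::size_t n = v.size();
    for (std::size_t half = 1; half < n; half *= 2)
    {
        for (std::size_t block = 0; block + half < n; block += 2 * half)
        {
            const std::size_t pivot = block + half - 1;
            const std::size_t end = (n - block < 2 * half) ? n : block + 2 * half;
            // These updates do not depend on each other within one level.
            for (std::size_t i = block + half; i < end; ++i)
            {
                v[i] = fn(v[pivot], v[i]);
            }
        }
    }
}
```

Because the grouping of operations differs from a left-to-right loop, the result matches a sequential prefix computation only when fn is associative, which is the requirement stated above.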

Parameters

first Start of the range to which to apply fn.
last End of the range to which to apply fn.
fn Binary function object to be applied.
tuner HetCompute pattern tuner object (optional).

8.4.2.3 template<typename RandomAccessIterator , typename BinaryFn >

hetcompute::task_ptr<void()> hetcompute::pscan_inclusive_async (RandomAccessIterator first, RandomAccessIterator last, BinaryFn fn,hetcompute::pattern::tuner tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::pscan_inclusive pattern.

Returns a task that represents the pattern's execution. Operations on the task translate into operations on the executing pattern. The caller must launch the task.

Parameters

first Start of the range to which to apply fn.
last End of the range to which to apply fn.
fn Binary function object to be applied.
tuner HetCompute pattern tuner object (optional).


8.5 Parallel Divide-and-Conquer

Classes

• class hetcompute::pattern::pdivide_and_conquerer< IsBaseFn, BaseFn, SplitFn, MergeFn >

Functions

• template<typename IsBaseFn , typename BaseFn , typename SplitFn , typename MergeFn >

pdivide_and_conquerer< IsBaseFn, BaseFn, SplitFn, MergeFn > hetcompute::pattern::create_pdivide_and_conquer (IsBaseFn &&isbase, BaseFn &&base, SplitFn &&split, MergeFn &&merge)

• template<typename Problem , typename Solution , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 >

Solution hetcompute::pdivide_and_conquer (Problem p, Fn1 &&is_base, Fn2 &&base, Fn3 &&split, Fn4 &&merge, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 >

void hetcompute::pdivide_and_conquer (Problem p, Fn1 &&is_base, Fn2 &&base, Fn3 &&split, Fn4 &&merge, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 >

void hetcompute::pdivide_and_conquer (Problem p, Fn1 &&is_base, Fn2 &&base, Fn3 &&split, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename Problem , typename Solution , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 >

hetcompute::task_ptr< Solution > hetcompute::pdivide_and_conquer_async (Problem p, Fn1 &&is_base, Fn2 &&base, Fn3 &&split, Fn4 &&merge, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 >

hetcompute::task_ptr hetcompute::pdivide_and_conquer_async (Problem p, Fn1 &&is_base, Fn2 &&base, Fn3 &&split, Fn4 &&merge, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 >

hetcompute::task_ptr hetcompute::pdivide_and_conquer_async (Problem p, Fn1 &&is_base, Fn2 &&base, Fn3 &&split, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

8.5.1 Class Documentation

8.5.1.1 class hetcompute::pattern::pdivide_and_conquerer

template<typename IsBaseFn, typename BaseFn, typename SplitFn, typename MergeFn> class hetcompute::pattern::pdivide_and_conquerer< IsBaseFn, BaseFn, SplitFn, MergeFn >

Public member functions

• _Solution operator() (_Problem &p, const hetcompute::pattern::tuner &pt=hetcompute::pattern::tuner())

• _Solution run (_Problem &p, const hetcompute::pattern::tuner &pt=hetcompute::pattern::tuner())


Friends

• pdivide_and_conquerer create_pdivide_and_conquer (IsBaseFn &&isbase, BaseFn &&base, SplitFn &&split, MergeFn &&merge)

• template<typename Problem , typename Solution , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 >

hetcompute::task_ptr< Solution > hetcompute::create_task (const pdivide_and_conquerer< Fn1, Fn2, Fn3, Fn4 > &pdnc, Problem p, const hetcompute::pattern::tuner &t)

• template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 >

hetcompute::task_ptr hetcompute::create_task (const pdivide_and_conquerer< Fn1, Fn2, Fn3, Fn4 > &pdnc, Problem p, const hetcompute::pattern::tuner &t)

• template<typename Code , typename IsPattern , class Enable >

struct hetcompute::internal::task_factory

8.5.1.1.1 Related Function Documentation

8.5.1.1.1.1 template<typename IsBaseFn , typename BaseFn , typename SplitFn , typename MergeFn >

pdivide_and_conquerer create_pdivide_and_conquer ( IsBaseFn && isbase, BaseFn &&base, SplitFn && split, MergeFn && merge ) [friend]

Create a pattern object from function objects isbase, base, split, and merge.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

// Calculate Fibonacci sequence in parallel
auto fib = hetcompute::pattern::create_pdivide_and_conquer(
    [](size_t m) { return m <= 1; },
    [](size_t m) { return m; },
    [](size_t m) { return std::vector<size_t>({m - 1, m - 2}); },
    [](size_t, std::vector<size_t>& sols) { return sols[0] + sols[1]; });

size_t input = 10;
auto par = fib.run(input);

Parameters

isbase Indicates if the problem is a base case problem.
base Solves a base case problem.
split Splits the problem into subproblems and returns them.
merge Merges the solutions to subproblems of p returned by split.

8.5.2 Function Documentation


8.5.2.1 template<typename IsBaseFn , typename BaseFn , typename SplitFn , typename MergeFn > pdivide_and_conquerer< IsBaseFn, BaseFn, SplitFn, MergeFn > hetcompute::pattern::create_pdivide_and_conquer ( IsBaseFn && isbase, BaseFn && base, SplitFn && split, MergeFn && merge )

Create a pattern object from function objects isbase, base, split, and merge.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

// Calculate Fibonacci sequence in parallel
auto fib = hetcompute::pattern::create_pdivide_and_conquer(
    [](size_t m) { return m <= 1; },
    [](size_t m) { return m; },
    [](size_t m) { return std::vector<size_t>({m - 1, m - 2}); },
    [](size_t, std::vector<size_t>& sols) { return sols[0] + sols[1]; });

size_t input = 10;
auto par = fib.run(input);

Parameters

isbase Indicates if the problem is a base case problem.
base Solves a base case problem.
split Splits the problem into subproblems and returns them.
merge Merges the solutions to subproblems of p returned by split.

8.5.2.2 template<typename Problem , typename Solution , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 > Solution hetcompute::pdivide_and_conquer ( Problem p, Fn1 && is_base, Fn2 && base, Fn3 && split, Fn4 && merge, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel divide-and-conquer

Solve a problem by splitting it into independent subproblems, which may be solved in parallel, and merging the solutions to the subproblems. A subproblem may recursively be split into yet more problems, yielding significant parallelism, e.g., Fibonacci.

Note: For best performance, make split and merge relatively inexpensive compared to base.
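The roles of the four function objects can be modeled with a sequential recursion. The sketch below is not the SDK implementation and uses no HetCompute API; divide_and_conquer and fib are hypothetical names. It assumes, based on the Fibonacci examples in this section, that split returns a collection of subproblems and that merge receives the original problem together with a vector of subproblem solutions. In the parallel pattern the recursive calls on independent subproblems are the part that may run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of divide-and-conquer (hypothetical helper, not SDK API).
//   is_base(p)     -> true if p should be solved directly
//   base(p)        -> solution of a base case problem
//   split(p)       -> collection of independent subproblems
//   merge(p, sols) -> solution of p from the subproblem solutions
template <typename Problem, typename Solution,
          typename IsBase, typename Base, typename Split, typename Merge>
Solution divide_and_conquer(Problem p, IsBase is_base, Base base, Split split, Merge merge)
{
    if (is_base(p))
    {
        return base(p);
    }
    std::vector<Solution> sols;
    for (auto&& sub : split(p))
    {
        // Subproblems are independent; a parallel runtime could solve them concurrently.
        sols.push_back(divide_and_conquer<Problem, Solution>(sub, is_base, base, split, merge));
    }
    return merge(p, sols);
}

// n-th Fibonacci term expressed through the four function objects.
inline std::size_t fib(std::size_t n)
{
    return divide_and_conquer<std::size_t, std::size_t>(
        n,
        [](std::size_t m) { return m <= 1; },
        [](std::size_t m) { return m; },
        [](std::size_t m) { return std::vector<std::size_t>{m - 1, m - 2}; },
        [](std::size_t, std::vector<std::size_t>& sols) { return sols[0] + sols[1]; });
}
```

In this model split and merge run once per non-base problem while base runs once per leaf, which is why the note above recommends keeping split and merge cheap relative to base.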

Example: Compute n-th Fibonacci term

#include <iostream>
#include <sstream>
#include <vector>

#include <hetcompute/hetcompute.hh>

static size_t
fibonacci_s(size_t n)
{
    if (n == 0 || n == 1)
    {
        return n;
    }
    else
    {
        return fibonacci_s(n - 1) + fibonacci_s(n - 2);
    }
}

static const size_t GRANULARITY = 20;

static size_t
fibonacci(size_t n)
{
    return hetcompute::pdivide_and_conquer<size_t, size_t>(
        // Problem is to compute the n-th Fibonacci term
        n,
        // When should an arbitrary Fibonacci term, represented by 'm', be
        // computed sequentially?
        // Note that programmer chooses to compute Fibonacci terms 20 and lower
        // sequentially for best performance.
        [](size_t& m) { return m <= GRANULARITY; },
        // How to compute the term sequentially
        [](size_t& m) { return fibonacci_s(m); },
        // Split problem into independent subproblems
        [](size_t& m) {
            return std::vector<size_t>({ m - 1, m - 2 });
        },
        // Merge solutions to subproblems.
        // Note that the first parameter (size_t, corresponding to the split
        // problem) is unused in this case, but may be useful while merging in
        // other cases.
        [](size_t, std::vector<size_t>& sols) { return sols[0] + sols[1]; });
}

int
main(int argc, const char* argv[])
{
    hetcompute::runtime::init();
    size_t n_def = 24;
    size_t n = n_def;

    if (argc >= 2)
    {
        std::istringstream istr(argv[1]);
        istr >> n;
    }

    size_t out = fibonacci(n);

    if (out != fibonacci_s(n))
    {
        std::cerr << "parallel fibonacci failed\n";
    }
    hetcompute::runtime::shutdown();
    return 0;
}

Parameters

p Problem data structure operated on by the functions.
is_base Returns TRUE if p is a base case problem, else FALSE.
base Solves a base case problem.
split Splits p into subproblems and returns them.
merge Merges the solutions to subproblems of p returned by split.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

Solution data structure representing solution to the problem.

8.5.2.3 template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 > void hetcompute::pdivide_and_conquer ( Problem p, Fn1 && is_base, Fn2 && base, Fn3 && split, Fn4 && merge, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel divide-and-conquer specialized for not returning a solution, e.g., mergesort.

Parameters

p Problem data structure operated on by the functions.
is_base Returns TRUE if p is a base case problem, else FALSE.
base Solves a base case problem, but does not return any solution.
split Splits p into subproblems and returns them.
merge Merges subproblems returned by split.
tuner Qualcomm HetCompute pattern tuner object (optional).
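Since this overload is described with mergesort as its motivating case but no example is given, the following sequential sketch shows how mergesort could map onto the four function objects when no solution is returned. It is not SDK code: Range, divide_and_conquer_void, and merge_sort are hypothetical names, the base-case threshold of 4 elements is arbitrary, and the assumption that merge receives the original problem plus the subproblems returned by split is made for illustration only, because the exact signature the SDK expects is not stated here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using Iter = std::vector<int>::iterator;

// Hypothetical problem type: a half-open range [begin, end) sorted in place.
struct Range
{
    Iter begin, end;
};

// Sequential model of divide-and-conquer without a returned solution
// (hypothetical helper, not SDK API). All work happens in place on p.
template <typename Problem, typename IsBase, typename Base, typename Split, typename Merge>
void divide_and_conquer_void(Problem p, IsBase is_base, Base base, Split split, Merge merge)
{
    if (is_base(p))
    {
        base(p);
        return;
    }
    auto subs = split(p);
    for (auto& sub : subs)
    {
        // Subranges do not overlap, so these calls are independent.
        divide_and_conquer_void(sub, is_base, base, split, merge);
    }
    merge(p, subs);  // assumed signature: (problem, subproblems)
}

inline void merge_sort(std::vector<int>& v)
{
    divide_and_conquer_void(
        Range{v.begin(), v.end()},
        // Small ranges are sorted directly.
        [](Range& r) { return std::distance(r.begin, r.end) <= 4; },
        [](Range& r) { std::sort(r.begin, r.end); },
        // Split into two halves.
        [](Range& r) {
            Iter mid = r.begin + std::distance(r.begin, r.end) / 2;
            return std::array<Range, 2>{{Range{r.begin, mid}, Range{mid, r.end}}};
        },
        // Merge the two sorted halves in place.
        [](Range& r, std::array<Range, 2>& subs) {
            std::inplace_merge(r.begin, subs[0].end, r.end);
        });
}
```

The difference from the value-returning overload is that base and merge communicate only through side effects on the shared data, so the subproblems must refer to disjoint parts of it.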

8.5.2.4 template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 > void hetcompute::pdivide_and_conquer ( Problem p, Fn1 && is_base, Fn2 && base, Fn3 && split, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel divide-and-conquer specialized for no merge of subproblems and not returning a solution, e.g., quicksort.

Example: Quicksort n random integers

#include <algorithm>
#include <array>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <iterator>
#include <sstream>
#include <utility>
#include <vector>

#include <hetcompute/hetcompute.hh>

template <typename Iterator>
struct QuickSort
{
    QuickSort(Iterator _begin, Iterator _end) : begin(_begin), end(_end), middle() {}
    Iterator begin, end, middle;
};

const size_t GRANULARITY = 8192;

template <typename Iterator, typename Compare>
void
quicksort(Iterator begin, Iterator end, Compare cmp)
{
    typedef QuickSort<Iterator> QuickSort;
    hetcompute::pdivide_and_conquer(
        // Main problem
        QuickSort(begin, end),
        // When should an arbitrary array, represented by 'q', be sorted
        // sequentially?
        // Note that programmer chooses to sort arrays smaller than size 8192
        // sequentially for best performance.
        [&](QuickSort& q) {
            size_t n = std::distance(q.begin, q.end);
            if (n <= GRANULARITY)
            {
                return true;
            }
            // Choice of first element as pivot is arbitrary
            auto pivot = *q.begin;
            q.middle = std::partition(q.begin, q.end, std::bind2nd(cmp, pivot));
            // If middle == begin, elements in [begin, end) are greater than or
            // equal to pivot. We could either find a new pivot or as we do here,
            // just sort sequentially.
            return q.middle == q.begin;
        },
        // Sequential sort used
        [&](QuickSort& q) { std::sort(q.begin, q.end, cmp); },
        // Split problem into two subproblems
        [&](QuickSort& q) {
            std::array<QuickSort, 2> subarrays{ { QuickSort(q.begin, q.middle), QuickSort(q.middle, q.end) } };
            return subarrays;
        });
}

int
main(int argc, const char* argv[])
{
    hetcompute::runtime::init();
    std::vector<long> input;
    size_t n_def = 1 << 16;
    size_t n = n_def;

    if (argc >= 2)
    {
        std::istringstream istr(argv[1]);
        istr >> n;
    }

    // Create a random array of integers
    for (size_t i = 0; i < n; i++)
    {
        input.push_back(rand());
    }

    quicksort(input.begin(), input.end(), std::less<long>());

    if (!std::is_sorted(input.begin(), input.end()))
    {
        std::cerr << "parallel quicksorting failed\n";
    }

    hetcompute::runtime::shutdown();

    return 0;
}


Parameters

p Problem data structure operated on by the functions.
is_base Returns TRUE if p is a base case problem, else FALSE.
base Solves a base case problem, but does not return any solution.
split Splits p into subproblems and returns them.
tuner Qualcomm HetCompute pattern tuner object (optional).

8.5.2.5 template<typename Problem , typename Solution , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 > hetcompute::task_ptr<Solution> hetcompute::pdivide_and_conquer_async ( Problem p, Fn1 && is_base, Fn2 && base, Fn3 && split, Fn4 && merge, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Asynchronous parallel divide-and-conquer.

Returns a task that represents the pattern's execution. Operations on the task translate into operations on the executing pattern. The caller must launch the task.

Parameters

p Problem data structure operated on by the functions.
is_base Returns TRUE if p is a base case problem, else FALSE.
base Solves a base case problem.
split Splits p into subproblems and returns them.
merge Merges solutions to subproblems of p returned by split.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

task_ptr Unlaunched task representing pattern’s execution.

8.5.2.6 template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 , typename Fn4 > hetcompute::task_ptr hetcompute::pdivide_and_conquer_async ( Problem p, Fn1 && is_base, Fn2 && base, Fn3 && split, Fn4 && merge, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Asynchronous parallel divide-and-conquer specialized for not returning a solution, e.g., mergesort.

Returns a task that represents the pattern's execution. Operations on the task translate into operations on the executing pattern. The caller must launch the task.

Parameters

p Problem data structure operated on by the functions.
is_base Returns TRUE if p is a base case problem, else FALSE.
base Solves a base case problem.
split Splits p into subproblems and returns them.
merge Merges solutions to subproblems of p returned by split.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

task_ptr Unlaunched task representing pattern’s execution.

8.5.2.7 template<typename Problem , typename Fn1 , typename Fn2 , typename Fn3 > hetcompute::task_ptr hetcompute::pdivide_and_conquer_async ( Problem p, Fn1 && is_base, Fn2 && base, Fn3 && split, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Asynchronous parallel divide-and-conquer specialized for no merge of subproblems and not returning a solution, e.g., quicksort.

Returns a task that represents the pattern's execution. Operations on the task translate into operations on the executing pattern. The caller must launch the task.

Parameters

p Problem data structure operated on by the functions.
is_base Returns TRUE if p is a base case problem, else FALSE.
base Solves a base case problem, but does not return any solution.
split Splits p into subproblems and returns them.
tuner Qualcomm HetCompute pattern tuner object (optional).

Returns

task_ptr Unlaunched task representing pattern's execution.


8.6 Parallel Sorting

Classes

• class hetcompute::pattern::psorter< Compare >

Functions

• template<typename Compare >

psorter< Compare > hetcompute::pattern::create_psort (Compare &&cmp)

• template<class RandomAccessIterator , class Compare >

void hetcompute::psort (RandomAccessIterator first, RandomAccessIterator last, Compare cmp, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<class RandomAccessIterator >

void hetcompute::psort (RandomAccessIterator first, RandomAccessIterator last, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<class RandomAccessIterator , class Compare >

hetcompute::task_ptr< void()> hetcompute::psort_async (RandomAccessIterator first, RandomAccessIterator last, Compare &&cmp, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

• template<class RandomAccessIterator >

hetcompute::task_ptr< void()> hetcompute::psort_async (RandomAccessIterator first, RandomAccessIterator last, const hetcompute::pattern::tuner &tuner=hetcompute::pattern::tuner())

8.6.1 Class Documentation

8.6.1.1 class hetcompute::pattern::psorter

template<typename Compare>class hetcompute::pattern::psorter< Compare >

Public member functions

• psorter (Compare &&cmp)

• template<typename RandomAccessIterator >

void operator() (RandomAccessIterator first, RandomAccessIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

• template<typename RandomAccessIterator >

void run (RandomAccessIterator first, RandomAccessIterator last, const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner())

Friends

• psorter create_psort (Compare &&cmp)

• template<typename Cmp , typename... Args>

hetcompute::task_ptr< void()> hetcompute::create_task (const psorter< Cmp > &p, Args&&...args)


8.6.1.1.1 Related Function Documentation

8.6.1.1.1.1 template<typename Compare > psorter create_psort ( Compare && cmp ) [friend]

Creates a hetcompute::psort pattern object from the function object cmp.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

    vector<int> vin(100000);
    Rand_int rnd{0, int(vin.size() - 1)};

    // Generate 100,000 random integers
    for (auto& i : vin)
        i = rnd();

    // Sort the vector using hetcompute::psort
    auto p = hetcompute::pattern::create_psort([](int l, int r) { return r < l; });
    p(vin.begin(), vin.end());

Parameters

cmp User-customized compare function object to be applied.
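The shape of the returned pattern object (a comparator captured once, then applied through either run or the () operator) can be sketched in standard C++. This is an illustration only: sorter_sketch and make_sorter_sketch are hypothetical names, std::sort stands in for the parallel sort, and the tuner argument and asynchronous invocation are omitted.

```cpp
#include <algorithm>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sorter pattern object that owns its comparator.
template <typename Compare>
class sorter_sketch {
public:
    explicit sorter_sketch(Compare cmp) : cmp_(std::move(cmp)) {}

    // Synchronous invocation through a named member function.
    template <typename RandomAccessIterator>
    void run(RandomAccessIterator first, RandomAccessIterator last) const {
        std::sort(first, last, cmp_);  // sequential stand-in for the parallel sort
    }

    // Synchronous invocation through the call operator; same effect as run.
    template <typename RandomAccessIterator>
    void operator()(RandomAccessIterator first, RandomAccessIterator last) const {
        run(first, last);
    }

private:
    Compare cmp_;
};

// Factory that deduces the comparator type, mirroring the create_* style.
template <typename Compare>
sorter_sketch<typename std::decay<Compare>::type> make_sorter_sketch(Compare&& cmp) {
    return sorter_sketch<typename std::decay<Compare>::type>(std::forward<Compare>(cmp));
}
```

With the comparator from the example above, `[](int l, int r) { return r < l; }`, the sorted range ends up in descending order, because the comparator reports l as ordered before r when l is the larger value.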

8.6.2 Function Documentation

8.6.2.1 template<typename Compare > psorter< Compare > hetcompute::pattern::create_psort ( Compare && cmp )

Creates a hetcompute::psort pattern object from the function object cmp.

Returns a pattern object which can be invoked (1) synchronously, using the run method or the () operator with arguments; or (2) asynchronously, using hetcompute::create_task or hetcompute::launch.

Examples

    vector<int> vin(100000);
    Rand_int rnd{0, int(vin.size() - 1)};

    // Generate 100,000 random integers
    for (auto& i : vin)
        i = rnd();

    // Sort the vector using hetcompute::psort
    auto p = hetcompute::pattern::create_psort([](int l, int r) { return r < l; });
    p(vin.begin(), vin.end());

Parameters

cmp User-customized compare function object to be applied.


8.6.2.2 template<class RandomAccessIterator , class Compare > void hetcompute::psort ( RandomAccessIterator first, RandomAccessIterator last, Compare cmp, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel version of std::sort.

Performs an unstable in-place comparison sort of a container using the supplied cmp function.

Examples

    // Sort vin using hetcompute::psort
    psort(vin.begin(), vin.end(), [](int l, int r) { return r < l; });

See Also

psort(RandomAccessIterator, RandomAccessIterator)

Parameters

first Start of the range to sort.

last End of the range to sort.

cmp User-customized compare function object to be applied.

tuner Qualcomm HetCompute pattern tuner object (optional).

8.6.2.3 template<class RandomAccessIterator > void hetcompute::psort ( RandomAccessIterator first, RandomAccessIterator last, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel version of std::sort.

Performs an unstable in-place comparison sort of a container. Equivalent to psort(first, last, std::less<T>()) where T is the value type of the iterators.

See Also

psort(RandomAccessIterator, RandomAccessIterator, Compare)

Parameters

first Start of the range to sort.

last End of the range to sort.

tuner Qualcomm HetCompute pattern tuner object (optional).
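The relationship between the two psort overloads (the comparator-free form forwards std::less<T> for the iterator's value type) can be sketched with std::sort as a sequential stand-in. The name sort_sketch is hypothetical, and the tuner parameter is left out of this illustration.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Stand-in for the overload that takes a caller-supplied comparator.
template <class RandomAccessIterator, class Compare>
void sort_sketch(RandomAccessIterator first, RandomAccessIterator last, Compare cmp) {
    std::sort(first, last, cmp);  // unstable, in place, like the documented sort
}

// Stand-in for the overload without a comparator: forwards std::less<T>,
// where T is the value type of the iterators, giving ascending order.
template <class RandomAccessIterator>
void sort_sketch(RandomAccessIterator first, RandomAccessIterator last) {
    using T = typename std::iterator_traits<RandomAccessIterator>::value_type;
    sort_sketch(first, last, std::less<T>());
}
```

Since the sort is unstable, elements that compare equivalent under cmp may appear in any relative order in the result.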


8.6.2.4 template<class RandomAccessIterator , class Compare > hetcompute::task_ptr<void()> hetcompute::psort_async ( RandomAccessIterator first, RandomAccessIterator last, Compare && cmp, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Create an asynchronous task from the hetcompute::psort pattern.

Parameters

first Start of the range to sort.

last End of the range to sort.

cmp User-customized compare function object to be applied.

tuner Qualcomm HetCompute pattern tuner object (optional).

8.6.2.5 template<class RandomAccessIterator > hetcompute::task_ptr<void()> hetcompute::psort_async ( RandomAccessIterator first, RandomAccessIterator last, const hetcompute::pattern::tuner & tuner = hetcompute::pattern::tuner() )

Parallel version of std::sort (asynchronous).

Returns a task that represents the pattern’s execution. Operations on the task translate into operations on the executing pattern. The caller must launch the task.

See Also

psort(RandomAccessIterator, RandomAccessIterator)

Parameters

first Start of the range to sort.

last End of the range to sort.

tuner Qualcomm HetCompute pattern tuner object (optional).


8.7 Pipeline

Classes

• class hetcompute::iteration_lag

Pipeline stage iteration lag.

• class hetcompute::iteration_rate

Pipeline stage iteration match rate.

• class hetcompute::parallel_stage

Parallel pipeline stage for specifying the type of the stages when adding to the pipeline.

• class hetcompute::pattern::pipeline< UserData >

Pipeline class.

• class hetcompute::pipeline_context< UserData >

Pipeline_context with one user data.

• class hetcompute::pipeline_context<>

Pipeline_context with no user data.

• class hetcompute::pipeline_context_base

Pipeline context class.

• class hetcompute::serial_stage

Serial stage for specifying the type of the stages when adding to the pipeline.

• class hetcompute::sliding_window_size

Pipeline stage sliding window size.

• class hetcompute::stage_input< InputType >

Pipeline stage input class.

Typedefs

• typedef enum hetcompute::serial_stage_type hetcompute::serial_stage_type

Serial pipeline stage types.

Enumerations

• enum hetcompute::serial_stage_type { in_order = 0 }

Serial pipeline stage types.

8.7.1 Class Documentation


8.7.1.1 class hetcompute::iteration_lag

Pipeline stage iteration lag

Public member functions

• iteration_lag (size_t lag)

Constructor.

• iteration_lag (iteration_lag const &other)

Copy constructor.

• iteration_lag (iteration_lag &&other)

Move constructor.

• size_t get_iter_lag () const

Get the stage lag.

• HETCOMPUTE_DELETE_METHOD (iteration_lag &operator=(iteration_lag &&other))

• iteration_lag & operator= (iteration_lag const &other)

Copy assignment operator.

8.7.1.1.1 Constructors and Destructors

8.7.1.1.1.1 hetcompute::iteration_lag::iteration_lag ( size_t lag ) [explicit]

Constructor.

8.7.1.1.1.2 hetcompute::iteration_lag::iteration_lag ( iteration_lag const & other )

Copy constructor.

8.7.1.1.1.3 hetcompute::iteration_lag::iteration_lag ( iteration_lag && other ) [explicit]

Move constructor.

8.7.1.1.2 Member Function Documentation

8.7.1.1.2.1 size_t hetcompute::iteration_lag::get_iter_lag ( ) const

Get the stage lag.

Returns

size_t Stage lag.


8.7.1.1.2.2 iteration_lag& hetcompute::iteration_lag::operator= ( iteration_lag const & other )

Copy assignment operator.

8.7.1.2 class hetcompute::iteration_rate

Pipeline stage iteration match rate.

Public member functions

• iteration_rate (size_t p, size_t c)

Constructor.

• iteration_rate (iteration_rate const &other)

Copy constructor.

• iteration_rate (iteration_rate &&other)

Move constructor.

• size_t get_iter_rate_curr () const

Get the iteration rate for the current stage.

• size_t get_iter_rate_pred () const

Get the iteration rate for the previous (predecessor) stage.

• HETCOMPUTE_DELETE_METHOD (iteration_rate &operator=(iteration_rate &&other))

• iteration_rate & operator= (iteration_rate const &other)

Copy assignment operator.

8.7.1.2.1 Constructors and Destructors

8.7.1.2.1.1 hetcompute::iteration_rate::iteration_rate ( size_t p, size_t c )

Constructor.

8.7.1.2.1.2 hetcompute::iteration_rate::iteration_rate ( iteration_rate const & other )

Copy constructor.

8.7.1.2.1.3 hetcompute::iteration_rate::iteration_rate ( iteration_rate && other ) [explicit]

Move constructor.

8.7.1.2.2 Member Function Documentation


8.7.1.2.2.1 size_t hetcompute::iteration_rate::get_iter_rate_curr ( ) const

Get the iteration rate for the current stage.

Returns

size_t Iteration rate for the current stage.

8.7.1.2.2.2 size_t hetcompute::iteration_rate::get_iter_rate_pred ( ) const

Get the iteration rate for the previous (predecessor) stage.

Returns

size_t Iteration rate for the previous stage.

8.7.1.2.2.3 iteration_rate& hetcompute::iteration_rate::operator= ( iteration_rate const & other )

Copy assignment operator.

8.7.1.3 class hetcompute::parallel_stage

Parallel pipeline stage for specifying the type of the stages when adding to the pipeline.

Public member functions

• parallel_stage (size_t doc)

Constructor.

• parallel_stage (parallel_stage const &other)

Copy constructor.

• size_t get_degree_of_concurrency () const

Get the degree of concurrency for a parallel stage.

• HETCOMPUTE_DELETE_METHOD (parallel_stage &operator=(parallel_stage &&other))

• parallel_stage & operator= (parallel_stage const &other)

Copy assignment operator.

8.7.1.3.1 Constructors and Destructors

8.7.1.3.1.1 hetcompute::parallel_stage::parallel_stage ( size_t doc ) [explicit]

Constructor.


Parameters

doc Degree of concurrency for the parallel stage. Must be a positive integer. It specifies the maximum number of consecutive stage iterations that can run in parallel for this stage. When doc = 1, the parallel stage behaves like a serial stage.
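The effect of doc can be made concrete with a simplified timing model. The sketch below is an assumption-laden illustration, not the SDK scheduler: it assumes iteration i of the stage may start only once every iteration up to i - doc has finished, ignores dependencies on other stages, and assumes unlimited cores. The name stage_makespan_sketch is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns the time at which the last iteration of a single stage finishes,
// given per-iteration durations and a degree of concurrency.
// Model assumption: iteration i waits for all iterations 0 .. i - doc.
inline double stage_makespan_sketch(const std::vector<double>& durations, std::size_t doc) {
    if (doc == 0) {
        throw std::invalid_argument("doc must be a positive integer");
    }
    std::vector<double> finish(durations.size(), 0.0);
    double released = 0.0;  // latest finish time among iterations 0 .. i - doc
    double makespan = 0.0;
    for (std::size_t i = 0; i < durations.size(); ++i) {
        if (i >= doc) {
            released = std::max(released, finish[i - doc]);
        }
        finish[i] = released + durations[i];  // start as soon as the window allows
        makespan = std::max(makespan, finish[i]);
    }
    return makespan;
}
```

Under this model, four iterations of one time unit each finish after 4 units with doc = 1 (serial behavior), after 2 units with doc = 2, and after 1 unit with doc = 4.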

8.7.1.3.1.2 hetcompute::parallel_stage::parallel_stage ( parallel_stage const & other )

Copy constructor.

8.7.1.3.2 Member Function Documentation

8.7.1.3.2.1 size_t hetcompute::parallel_stage::get_degree_of_concurrency ( ) const

Get the degree of concurrency for a parallel stage.

Returns

size_t Degree of concurrency.

8.7.1.3.2.2 parallel_stage& hetcompute::parallel_stage::operator= ( parallel_stage const & other )

Copy assignment operator.

8.7.1.4 class hetcompute::pattern::pipeline

template<typename... UserData>class hetcompute::pattern::pipeline< UserData >

Pipeline class.

Template Parameters

UserData The type of the pipeline context data, or empty for a pipeline without context data, e.g., hetcompute::pattern::pipeline<size_t> or hetcompute::pattern::pipeline<>.

Public Types

• using context = pipeline_context< UserData...>

Context type for the pipeline.

Public member functions

• pipeline ()

Constructor.

• pipeline (pipeline const &other)


Copy constructor.

• pipeline (pipeline &&other)

Move constructor.

• virtual ∼pipeline ()

Destructor.

• template<typename... Confs>

hetcompute::task_ptr create_task (UserData ∗...context_data, size_t num_iterations, Confs&&...confs) const

Create a task for the pipeline for asynchronous execution.

• hetcompute::task_ptr< void(UserData∗..., size_t)> create_task (const hetcompute::pattern::tuner &t=hetcompute::pattern::tuner()) const

Create a task for the pipeline for asynchronous execution.

• void disable_sliding_window ()

Disable the pipeline sliding window launch type.

• void enable_sliding_window ()

Enable the pipeline launch type to be with sliding window.

• bool is_valid ()

Pipeline sanity check for stage IO types and sliding window size.

• pipeline & operator= (pipeline const &other)

Copy assignment operator.

• pipeline & operator= (pipeline &&other)

Move assignment operator.

• template<typename... Confs>

void run (UserData ∗...context_data, size_t num_iterations, Confs &&...confs) const

Launch and wait for the pipeline.

8.7.1.4.1 Member Typedef Documentation

8.7.1.4.1.1 template<typename... UserData> using hetcompute::pattern::pipeline< UserData>::context = pipeline_context<UserData...>

Context type for the pipeline.

8.7.1.4.2 Constructors and Destructors

8.7.1.4.2.1 template<typename... UserData> hetcompute::pattern::pipeline< UserData >::pipeline ( )

Constructor.


8.7.1.4.2.2 template<typename... UserData> virtual hetcompute::pattern::pipeline< UserData>::∼pipeline ( ) [virtual]

Destructor.

Reimplemented in hetcompute::beta::pattern::pipeline< UserData >.

8.7.1.4.2.3 template<typename... UserData> hetcompute::pattern::pipeline< UserData >::pipeline (pipeline< UserData > const & other )

Copy constructor.

8.7.1.4.2.4 template<typename... UserData> hetcompute::pattern::pipeline< UserData >::pipeline (pipeline< UserData > && other )

Move constructor.

8.7.1.4.3 Member Function Documentation

8.7.1.4.3.1 template<typename... UserData> template<typename... Confs> hetcompute::task_ptr hetcompute::pattern::pipeline< UserData >::create_task ( UserData ∗... context_data, size_t num_iterations, Confs &&... confs ) const

Create a task for the pipeline for asynchronous execution. Do not call this member function if the pipeline has no stages; doing so causes a fatal error.

Parameters

context_data Pointer to the data for the pipeline context if the pipeline is defined as having one, i.e., sizeof...(UserData) == 1.

num_iterations The total number of iterations for the first stage. Note: if num_iterations == 0, the pipeline runs an unbounded number of iterations until the first stage stops the pipeline.

confs Other configurations for launching a task out of the pipeline. Currently, only one tuner object for the pipeline is supported (optional).

Returns

hetcompute::task_ptr<> The pointer to the task in which the pipeline is running.

    #include <iostream>

    #include <hetcompute/hetcompute.hh>

    //
    // Pipeline without context data (wocd),
    // Known iterations before launch(iter)
    // Launch by creating tasks (ct)
    // Through the pipeline object (obj)
    //
    int
    main()
    {
        hetcompute::runtime::init();
        // Define a pipeline skeleton, without pipeline context data.
        hetcompute::pattern::pipeline<> p;

        // Pipeline context type.
        typedef hetcompute::pattern::pipeline<>::context context;

        // Add a serial first stage.
        p.add_stage(hetcompute::serial_stage(), [](context& ctx) {
            size_t iter = ctx.get_iter_id();
            // some usage of iter
            HETCOMPUTE_ILOG("iter: %zu", iter);
        });

        // Add a parallel stage with degree of concurrency of 4.
        p.add_stage(hetcompute::parallel_stage(4), [](context&) {});

        // Add a serial stage.
        p.add_stage(hetcompute::serial_stage(), [](context&) {});

        // Asynchronous launch.
        // Create a task of a pipeline that runs for 20 iterations.
        // Run the pipeline as if the stages are using sliding windows.
        p.enable_sliding_window();
        auto t1 = p.create_task(20);
        // Launch the pipeline and do not block.
        t1->launch();

        // Create a task of a pipeline that runs for 10 iterations.
        // Run the pipeline as if the stages are not using sliding windows.
        p.disable_sliding_window();
        auto t2 = p.create_task(10);
        // Launch the pipeline and do not block.
        t2->launch();

        // Wait for the first pipeline to stop.
        t1->wait_for();
        // Wait for the second pipeline to stop.
        t2->wait_for();

        std::cout << "pipeline1 runs 20 iters" << std::endl;
        std::cout << "pipeline2 runs 10 iters" << std::endl;

        hetcompute::runtime::shutdown();
        return 0;
    }

8.7.1.4.3.2 template<typename... UserData> hetcompute::task_ptr<void(UserData∗..., size_t)> hetcompute::pattern::pipeline< UserData >::create_task ( const hetcompute::pattern::tuner & t = hetcompute::pattern::tuner() ) const

Create a task for the pipeline for asynchronous execution. The task arguments need to be bound later. Do not call this member function if the pipeline has no stages; doing so causes a fatal error.

Parameters

t One tuner object for the pipeline (optional).

Returns

hetcompute::task_ptr<void(UserData∗..., size_t num_iterations)> The pointer to the task in which the pipeline is running. Here, UserData∗... is for the pipeline context data (if there is one), and size_t is for specifying the number of iterations. Both need to be bound before launching the task.
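The idea of a task whose arguments are bound after creation can be sketched in standard C++. This is a hypothetical model, not the hetcompute::task_ptr implementation: deferred_task_sketch is an invented name, it fixes UserData to size_t, it runs the body synchronously in the caller, and throwing when arguments are unbound is this sketch's own choice rather than documented SDK behavior.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a task of type void(size_t*, size_t):
// the body is fixed at creation, the arguments are supplied later.
class deferred_task_sketch {
public:
    using body_type = std::function<void(std::size_t*, std::size_t)>;

    explicit deferred_task_sketch(body_type body) : body_(std::move(body)) {}

    // Store the arguments; nothing runs yet.
    void bind_all(std::size_t* context_data, std::size_t num_iterations) {
        context_data_ = context_data;
        num_iterations_ = num_iterations;
        bound_ = true;
    }

    // Run the body with the previously bound arguments.
    void launch() {
        if (!bound_) {
            throw std::logic_error("task arguments are not bound");
        }
        body_(context_data_, num_iterations_);
    }

    // Bind the arguments and launch in one step.
    void launch(std::size_t* context_data, std::size_t num_iterations) {
        bind_all(context_data, num_iterations);
        launch();
    }

private:
    body_type body_;
    std::size_t* context_data_ = nullptr;
    std::size_t num_iterations_ = 0;
    bool bound_ = false;
};
```

The two launch forms in the sketch correspond to the two call shapes, one that passes the arguments at launch time and one that binds them first and launches afterwards.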


    #include <iostream>

    #include <hetcompute/hetcompute.hh>

    //
    // Pipeline with context data (wcd),
    // On the fly stop (ofs)
    // Create task and launch later (ctlchl)
    // Through the pipeline object (obj)
    //
    int
    main()
    {
        hetcompute::runtime::init();
        // Define a pipeline skeleton, with pipeline context data of type size_t.
        hetcompute::pattern::pipeline<size_t> p;

        // pipeline context type
        typedef hetcompute::pattern::pipeline<size_t>::context context;

        // Add a serial first stage.
        p.add_stage(hetcompute::serial_stage(), [](context& ctx) {
            size_t iter = ctx.get_iter_id();
            size_t data = *ctx.get_data();
            if (iter == data - 1)
            {
                ctx.stop_pipeline();
            }
        });

        // Add a parallel stage with degree of concurrency of 4.
        p.add_stage(hetcompute::parallel_stage(4), [](context& ctx) {
            size_t iter = ctx.get_iter_id();
            size_t data = *ctx.get_data();
            // some usage of iter and data here
            HETCOMPUTE_ILOG("iter: %zu, data: %zu", iter, data);
        });

        // Add a serial stage.
        p.add_stage(hetcompute::serial_stage(), [](context&) {});

        // Define the context data.
        size_t num1 = 20;
        size_t num2 = 10;

        // Asynchronous launch.
        // Create a task of a pipeline that runs for num1 iterations.
        // Run the pipeline as if the stages are using sliding windows.
        //
        // Here the total number of iterations is set to be 0 (infinite number of runs).
        // The first stage of the pipeline does dynamic checking to stop the pipeline on the fly.
        // The total number of pipeline iterations is specified by using the pipeline context data.
        p.enable_sliding_window();
        auto t1 = p.create_task();
        // Launch the pipeline, bind the arguments, and do not block.
        t1->launch(&num1, 0);

        // Create a task of a pipeline that runs for num2 iterations.
        // Run the pipeline as if the stages are not using sliding windows.
        //
        // Here the total number of iterations is set to be 0 (infinite number of runs).
        // The first stage of the pipeline does dynamic checking to stop the pipeline on the fly.
        // The total number of pipeline iterations is specified by using the pipeline context data.
        p.disable_sliding_window();
        auto t2 = p.create_task();
        // Bind the arguments to the task.
        t2->bind_all(&num2, 0);
        // Launch the pipeline and do not block.
        t2->launch();

        // Wait for the first pipeline to stop.
        t1->wait_for();
        // Wait for the second pipeline to stop.
        t2->wait_for();

        std::cout << "pipeline1 runs " << num1 << " iters" << std::endl;
        std::cout << "pipeline2 runs " << num2 << " iters" << std::endl;

        hetcompute::runtime::shutdown();
        return 0;
    }

    #include <iostream>

    #include <hetcompute/hetcompute.hh>

    //
    // Pipeline with context data (wcd),
    // Known iterations before launch(iter)
    // Create task and launch later (ctlchl)
    // Through the pipeline object (obj)
    //
    int
    main()
    {
        hetcompute::runtime::init();
        // Define a pipeline skeleton, with pipeline context data of type size_t.
        hetcompute::pattern::pipeline<size_t> p;

        // pipeline context type
        typedef hetcompute::pattern::pipeline<size_t>::context context;

        // Add a serial first stage.
        p.add_stage(hetcompute::serial_stage(), [](context& ctx) {
            size_t iter = ctx.get_iter_id();
            size_t data = *ctx.get_data();
            if (iter == data - 1)
            {
                ctx.stop_pipeline();
            }
        });

        // Add a parallel stage with degree of concurrency of 4.
        p.add_stage(hetcompute::parallel_stage(4), [](context& ctx) {
            size_t iter = ctx.get_iter_id();
            size_t data = *ctx.get_data();
            // some usage of iter and data here
            HETCOMPUTE_ILOG("iter: %zu, data: %zu", iter, data);
        });

        // Add a serial stage.
        p.add_stage(hetcompute::serial_stage(), [](context&) {});

        // Define the context data.
        size_t num1 = 20;
        size_t num2 = 10;

        // Asynchronous launch.
        // Create a task of a pipeline that runs for num1 iterations.
        // Run the pipeline as if the stages are using sliding windows.
        //
        // Here the total number of iterations is set to be 0 (infinite number of runs).
        // The first stage of the pipeline does dynamic checking to stop the pipeline on the fly.
        // The total number of pipeline iterations is specified by using the pipeline context data.
        p.enable_sliding_window();
        auto t1 = p.create_task();
        // Launch the pipeline, bind the arguments, and do not block.
        t1->launch(&num1, 0);

        // Create a task of a pipeline that runs for num2 iterations.
        // Run the pipeline as if the stages are not using sliding windows.
        //
        // Here the total number of iterations is set to be 0 (infinite number of runs).
        // The first stage of the pipeline does dynamic checking to stop the pipeline on the fly.
        // The total number of pipeline iterations is specified by using the pipeline context data.
        p.disable_sliding_window();
        auto t2 = p.create_task();
        // Bind the arguments to the task.
        t2->bind_all(&num2, 0);
        // Launch the pipeline and do not block.
        t2->launch();

        // Wait for the first pipeline to stop.
        t1->wait_for();
        // Wait for the second pipeline to stop.
        t2->wait_for();

        std::cout << "pipeline1 runs " << num1 << " iters" << std::endl;
        std::cout << "pipeline2 runs " << num2 << " iters" << std::endl;
        hetcompute::runtime::shutdown();
        return 0;
    }

8.7.1.4.3.3 template<typename... UserData> void hetcompute::pattern::pipeline< UserData>::disable_sliding_window ( )

Disable the pipeline sliding window launch type. With the sliding window disabled, there is no control over the pipeline's memory footprint.

8.7.1.4.3.4 template<typename... UserData> void hetcompute::pattern::pipeline< UserData>::enable_sliding_window ( )

Enable the pipeline launch type to be with sliding window.

8.7.1.4.3.5 template<typename... UserData> bool hetcompute::pattern::pipeline< UserData >::is_valid( )

Pipeline sanity check for stage IO types and sliding window size.

Returns

TRUE (pass) or FALSE (fail)

    #include <array>

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        const size_t num_iters = 100;

        hetcompute::pattern::pipeline<std::array<size_t, num_iters>> p;
        typedef hetcompute::pattern::pipeline<std::array<size_t, num_iters>>::context context;

        // Add a parallel stage which behaves like a serial stage.
        p.add_stage(hetcompute::parallel_stage(1), [](context&) {});

        // Add a parallel stage with doc = 8, no lag.
        p.add_stage(hetcompute::parallel_stage(8), [](context&) {});

        // Add a serial stage.
        p.add_stage(hetcompute::serial_stage(), [](context& ctx) {
            size_t i = ctx.get_iter_id();
            (*ctx.get_data())[i] = i;
        });

        // Sanity check.
        if (p.is_valid())
        {
            HETCOMPUTE_ILOG("The pipeline settings are valid.");
        }
        else
        {
            HETCOMPUTE_ILOG("The pipeline settings are not valid.");
        }

        hetcompute::runtime::shutdown();
        return 0;
    }

8.7.1.4.3.6 template<typename... UserData> pipeline& hetcompute::pattern::pipeline< UserData>::operator= ( pipeline< UserData > const & other )

Copy assignment operator.

8.7.1.4.3.7 template<typename... UserData> pipeline& hetcompute::pattern::pipeline< UserData>::operator= ( pipeline< UserData > && other )

Move assignment operator.

8.7.1.4.3.8 template<typename... UserData> template<typename... Confs> void hetcompute::pattern::pipeline< UserData >::run ( UserData ∗... context_data, size_t num_iterations, Confs &&... confs ) const

Launch and wait for the pipeline.

Parameters

context_data Pointer to the data for the pipeline context if the pipeline is defined as having one, i.e., sizeof...(UserData) == 1.

num_iterations The total number of iterations for the first stage.

confs Other configurations for running a pipeline. Currently, only one tuner object for the pipeline is supported (optional).

Note: if num_iterations == 0, the pipeline runs an unbounded number of iterations until the first stage stops the pipeline.
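The num_iterations == 0 rule can be modeled as a loop over first-stage iterations. The sketch below is a sequential approximation in standard C++ with hypothetical names (first_stage_context_sketch, count_first_stage_iterations); it covers only the first stage and assumes that an iteration which requests a stop still counts as executed, and that a stop request is also honored when num_iterations is nonzero.

```cpp
#include <cstddef>

// Hypothetical minimal context: only what a first stage needs in this model.
class first_stage_context_sketch {
public:
    explicit first_stage_context_sketch(std::size_t iter_id) : iter_id_(iter_id) {}

    std::size_t get_iter_id() const { return iter_id_; }
    void stop_pipeline() { stopped_ = true; }
    bool stopped() const { return stopped_; }

private:
    std::size_t iter_id_;
    bool stopped_ = false;
};

// Returns how many first-stage iterations ran.
// num_iterations == 0 means "no fixed bound": loop until the stage asks to stop.
template <typename FirstStage>
std::size_t count_first_stage_iterations(std::size_t num_iterations, FirstStage&& stage) {
    std::size_t executed = 0;
    for (std::size_t i = 0; num_iterations == 0 || i < num_iterations; ++i) {
        first_stage_context_sketch ctx(i);
        stage(ctx);
        ++executed;
        if (ctx.stopped()) {
            break;  // the first stage stopped the pipeline on the fly
        }
    }
    return executed;
}
```

A first stage that calls stop_pipeline() when get_iter_id() equals data - 1 therefore runs exactly data iterations in this model, while a stage that never stops runs exactly num_iterations iterations when that value is nonzero.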

    #include <iostream>

    #include <hetcompute/hetcompute.hh>

    //
    // Pipeline without context data (wocd),
    // Known iterations before launch(iter)
    // Launch by using hetcompute free function (lch)
    // Through the pipeline object (obj)
    //
    int
    main()
    {
        hetcompute::runtime::init();
        // Define a pipeline skeleton, without pipeline context data.
        hetcompute::pattern::pipeline<> p;

        // Pipeline context type.
        typedef hetcompute::pattern::pipeline<>::context context;

        // Add a serial first stage.
        p.add_stage(hetcompute::serial_stage(), [](context& ctx) {
            size_t iter = ctx.get_iter_id();
            // some usage of iter
            HETCOMPUTE_ILOG("iter: %zu", iter);
        });

        // Add a parallel stage with degree of concurrency of 4.
        p.add_stage(hetcompute::parallel_stage(4), [](context&) {});

        // Add a serial stage.
        p.add_stage(hetcompute::serial_stage(), [](context&) {});

        // Copy the pipeline.
        hetcompute::pattern::pipeline<> p1(p);

        // Launch using free functions.
        // Launch and wait for pipeline for 15 iterations.
        // Run the pipeline as if the stages are using sliding windows.
        p1.enable_sliding_window();
        p1.run(15);

        // Launch and wait for pipeline for 25 iterations.
        // Run the pipeline as if the stages are not using sliding windows.
        p.disable_sliding_window();
        p.run(25);

        std::cout << "pipeline1 runs 15 iters" << std::endl;
        std::cout << "pipeline2 runs 25 iters" << std::endl;

        hetcompute::runtime::shutdown();
        return 0;
    }

8.7.1.5 class hetcompute::pipeline_context< UserData >

template<typename UserData>class hetcompute::pipeline_context< UserData >

Pipeline_context with one user data.

Template Parameters

UserData The type for the pipeline context data.

Note: This is the pipeline_context type for the pipeline with context data of type UserData, i.e., hetcompute::pattern::pipeline<UserData>. Do not use this type directly. Instead, get the member type from the pipeline that the context is associated with, i.e., using context = hetcompute::pattern::pipeline<UserData>::context.

Public member functions

• virtual ~pipeline_context ()

Destructor.

• UserData * get_data () const

Get the pointer to the programmer-defined context data.

• HETCOMPUTE_DELETE_METHOD (pipeline_context(pipeline_context const &other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context(pipeline_context &&other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context &operator=(pipeline_context const &other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context &operator=(pipeline_context &&other))

8.7.1.5.1 Constructors and Destructors

8.7.1.5.1.1 template<typename UserData > virtual hetcompute::pipeline_context< UserData >::~pipeline_context ( ) [virtual]

Destructor.

8.7.1.5.2 Member Function Documentation

8.7.1.5.2.1 template<typename UserData > UserData* hetcompute::pipeline_context< UserData >::get_data ( ) const

Get the pointer to the programmer-defined context data.

Returns

UserData* The pointer to the user-defined context data, which is provided by the user when launching the pipeline.

8.7.1.6 class hetcompute::pipeline_context<>

template<> class hetcompute::pipeline_context<>

Pipeline context without user data.

Note: This is the pipeline_context type for a pipeline without context data, i.e., hetcompute::pattern::pipeline<>. Do not use this type directly. Instead, get the member type from the pipeline that the context is associated with, i.e., using context = hetcompute::pattern::pipeline<>::context.

Public member functions

• virtual ~pipeline_context ()

Destructor.

• HETCOMPUTE_DELETE_METHOD (pipeline_context(pipeline_context const &other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context(pipeline_context &&other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context &operator=(pipeline_context const &other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context &operator=(pipeline_context &&other))

8.7.1.6.1 Constructors and Destructors

8.7.1.6.1.1 virtual hetcompute::pipeline_context<>::~pipeline_context ( ) [virtual]

Destructor.

8.7.1.7 class hetcompute::pipeline_context_base

Pipeline context class.

Through this class, a user-defined pipeline stage function can query information about the pipeline and exercise limited control over it. The stage function can read the stage id and the iteration id during execution through pipeline_context, and it can influence the execution of the underlying pipeline, for example by stopping the pipeline during execution. When defining a pipeline stage function (function, lambda, or callable object), the first parameter should always be of type pipeline_context.

Public member functions

• virtual ~pipeline_context_base ()

Destructor.

• void cancel_pipeline ()

Cancel the pipeline.

• size_t get_iter_id () const

Get the current iteration id.

• size_t get_max_stage_iter () const

Get the maximum number of iterations for this stage.

• size_t get_stage_id () const

Get the current stage id.

• bool has_iter_limit () const

Check whether the maximum number of iterations for this stage is set.

• HETCOMPUTE_DELETE_METHOD (pipeline_context_base(pipeline_context_base const &other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context_base(pipeline_context_base &&other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context_base &operator=(pipeline_context_base const &other))

• HETCOMPUTE_DELETE_METHOD (pipeline_context_base &operator=(pipeline_context_base &&other))

• void stop_pipeline ()

Stop the pipeline.

8.7.1.7.1 Constructors and Destructors

8.7.1.7.1.1 virtual hetcompute::pipeline_context_base::~pipeline_context_base ( ) [virtual]

Destructor.

8.7.1.7.2 Member Function Documentation

8.7.1.7.2.1 void hetcompute::pipeline_context_base::cancel_pipeline ( )

Use this method to cancel a pipeline. Note that hetcompute::abort_on_cancel() needs to be called in the user-defined pipeline stage functions for proper pipeline cancellation. A pipeline can be canceled from any stage; however, the internal state of the pipeline at the point of cancellation can be non-deterministic.

#include <array>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    const size_t num_iters   = 100;
    const size_t cancel_iter = 50;
    const size_t doc         = 8;

    hetcompute::pattern::pipeline<std::array<size_t, num_iters>> p;
    typedef hetcompute::pattern::pipeline<std::array<size_t, num_iters>>::context context;

    // Add a serial stage
    p.add_stage(hetcompute::serial_stage(), [](context&) { hetcompute::abort_on_cancel(); });

    // Add a parallel stage with doc = 8, no lag
    p.add_stage(hetcompute::parallel_stage(doc), [cancel_iter](context& ctx) {
        size_t i = ctx.get_iter_id();
        (*ctx.get_data())[i] = i;
        if (ctx.get_iter_id() == cancel_iter - 1)
            ctx.cancel_pipeline();
        hetcompute::abort_on_cancel();
    });

    // Add a serial stage
    p.add_stage(hetcompute::serial_stage(), [](context&) { hetcompute::abort_on_cancel(); });

    // define and reset the output array
    std::array<size_t, num_iters> out_array;
    for (size_t i = 0; i < num_iters; i++)
    {
        out_array[i] = 0;
    }

    // launch with sliding window
    p.enable_sliding_window();
    p.run(&out_array, num_iters);

    // check the results
    for (size_t i = 0; i < cancel_iter - doc; i++)
    {
        if (out_array[i] != i)
        {
            HETCOMPUTE_ILOG("The pipeline cancellation is not correct.");
            return -1;
        }
    }
    for (size_t i = cancel_iter + doc - 1; i < num_iters; i++)
    {
        if (out_array[i] != 0)
        {
            HETCOMPUTE_ILOG("The pipeline cancellation is not correct.");
            return -1;
        }
    }

    // parallel launching
    // reset the output arrays
    std::array<size_t, num_iters> out1_array;
    std::array<size_t, num_iters> out2_array;
    for (size_t i = 0; i < num_iters; i++)
    {
        out1_array[i] = 0;
        out2_array[i] = 0;
    }

    p.enable_sliding_window();
    auto t1 = hetcompute::create_task(p, &out1_array, num_iters);
    t1->launch();

    p.disable_sliding_window();
    auto t2 = p.create_task(&out2_array, num_iters);
    t2->launch();

    try
    {
        t1->wait_for();
    }
    catch (const hetcompute::aggregate_exception& e)
    {
        HETCOMPUTE_ILOG("threw %s due to group cancellation. \n", e.what());
    }
    catch (const hetcompute::canceled_exception& e)
    {
        HETCOMPUTE_ILOG("threw %s due to group cancellation. \n", e.what());
    }
    catch (...)
    {
        // unreachable.
        return -1;
    }

    try
    {
        t2->wait_for();
    }
    catch (const hetcompute::aggregate_exception& e)
    {
        HETCOMPUTE_ILOG("threw %s due to group cancellation. \n", e.what());
    }
    catch (const hetcompute::canceled_exception& e)
    {
        HETCOMPUTE_ILOG("threw %s due to group cancellation. \n", e.what());
    }
    catch (...)
    {
        // unreachable.
        return -1;
    }

    // checking the results
    for (size_t i = 0; i < cancel_iter - doc; i++)
    {
        if (out1_array[i] != i || out2_array[i] != i)
        {
            HETCOMPUTE_ILOG("The pipeline cancellation is not correct.");
            return -1;
        }
    }
    for (size_t i = cancel_iter + doc - 1; i < num_iters; i++)
    {
        if (out1_array[i] != 0 || out2_array[i] != 0)
        {
            HETCOMPUTE_ILOG("The pipeline cancellation is not correct.");
            return -1;
        }
    }
    hetcompute::runtime::shutdown();
}
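The result checks in the listing above split the iteration range into three parts: iterations below cancel_iter - doc are expected to be complete, iterations from cancel_iter + doc - 1 onward are expected to be untouched, and the iterations in between are not checked because their state is non-deterministic. The following standalone sketch only restates that arithmetic. The names iter_state and expected_state are hypothetical and not part of the SDK, and the bounds are those verified by this example, not a documented guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of an iteration after cancellation.
enum class iter_state { completed, indeterminate, skipped };

// Mirrors the two checking loops of the example:
//   [0, cancel_iter - doc)            -> result must have been written
//   [cancel_iter + doc - 1, num_iters) -> result must still be zero
//   everything in between             -> not checked (non-deterministic)
iter_state expected_state(std::size_t i, std::size_t cancel_iter, std::size_t doc)
{
    assert(doc >= 1 && doc <= cancel_iter);
    if (i < cancel_iter - doc)
    {
        return iter_state::completed;
    }
    if (i >= cancel_iter + doc - 1)
    {
        return iter_state::skipped;
    }
    return iter_state::indeterminate;
}
```

With cancel_iter = 50 and doc = 8 as in the example, iterations 0 to 41 are expected to be complete, iterations 57 to 99 untouched, and iterations 42 to 56 may be in either state.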

8.7.1.7.2.2 size_t hetcompute::pipeline_context_base::get_iter_id ( ) const

Get the current iteration id (begins from 0).

Returns

size_t Stage iteration id.

8.7.1.7.2.3 size_t hetcompute::pipeline_context_base::get_max_stage_iter ( ) const

Get the maximum number of iterations for this stage.

Returns

size_t Maximum number of iterations for this stage. 0 means the maximum number is unknown and the pipeline will be stopped or canceled dynamically during execution.

8.7.1.7.2.4 size_t hetcompute::pipeline_context_base::get_stage_id ( ) const

Get the current stage id.

Returns

size_t Stage id.

8.7.1.7.2.5 bool hetcompute::pipeline_context_base::has_iter_limit ( ) const

Check whether the maximum number of iterations for this stage is set.

Returns

true - The pipeline has an iteration limit known before running. false - The pipeline does not have an iteration limit known before running.

See Also

pipeline_context_base::stop_pipeline()

8.7.1.7.2.6 void hetcompute::pipeline_context_base::stop_pipeline ( )

Use this method to stop a pipeline launched with an iteration limit. Calling this method on a pipeline without an iteration limit will cause a fatal error. This method can only be called from the first stage of the pipeline.

See Also

pipeline_context_base::has_iter_limit()

8.7.1.8 class hetcompute::serial_stage

Serial stage for specifying the type of the stages when adding to the pipeline.

Public member functions

• serial_stage (serial_stage_type t=serial_stage_type::in_order)

Constructor.

• serial_stage (serial_stage const &other)

Copy constructor.

• serial_stage (serial_stage &&other)

Move constructor.

• serial_stage_type get_type () const

Get the type of the serial stage.

• HETCOMPUTE_DELETE_METHOD (serial_stage &operator=(serial_stage &&other))

• serial_stage & operator= (serial_stage const &other)

Copy assignment operator.

8.7.1.8.1 Constructors and Destructors

8.7.1.8.1.1 hetcompute::serial_stage::serial_stage ( serial_stage_type t = serial_stage_type::in_order ) [explicit]

Constructor.

Parameters

t hetcompute::in_order (default) or

8.7.1.8.1.2 hetcompute::serial_stage::serial_stage ( serial_stage const & other )

Copy constructor.

8.7.1.8.1.3 hetcompute::serial_stage::serial_stage ( serial_stage && other ) [explicit]

Move constructor.

8.7.1.8.2 Member Function Documentation

8.7.1.8.2.1 serial_stage_type hetcompute::serial_stage::get_type ( ) const

Get the type of the serial stage.

Returns

serial_stage_type The type for the serial stage.

8.7.1.8.2.2 serial_stage& hetcompute::serial_stage::operator= ( serial_stage const & other )

Copy assignment operator.

8.7.1.9 class hetcompute::sliding_window_size

Pipeline stage sliding window size.

Public member functions

• sliding_window_size (size_t size)

Constructor.

• sliding_window_size (sliding_window_size const &other)

Copy constructor.

• sliding_window_size (sliding_window_size &&other)

Move constructor.

• size_t get_size () const

Get the size of the sliding window.

• HETCOMPUTE_DELETE_METHOD (sliding_window_size &operator=(sliding_window_size &&other))

• sliding_window_size & operator= (sliding_window_size const &other)

Copy assignment operator.

8.7.1.9.1 Constructors and Destructors

8.7.1.9.1.1 hetcompute::sliding_window_size::sliding_window_size ( size_t size ) [explicit]

Constructor.

8.7.1.9.1.2 hetcompute::sliding_window_size::sliding_window_size ( sliding_window_size const &other )

Copy constructor.

8.7.1.9.1.3 hetcompute::sliding_window_size::sliding_window_size ( sliding_window_size && other ) [explicit]

Move constructor.

8.7.1.9.2 Member Function Documentation

8.7.1.9.2.1 size_t hetcompute::sliding_window_size::get_size ( ) const

Get the size of the sliding window.

Returns

size_t Sliding window size.

8.7.1.9.2.2 sliding_window_size& hetcompute::sliding_window_size::operator= ( sliding_window_size const & other )

Copy assignment operator.

8.7.1.10 class hetcompute::stage_input

template<typename InputType> class hetcompute::stage_input< InputType >

Pipeline stage input class.

Template Parameters

InputType The data type for the stage_input, which should match the return type of the previous stage.

Public Types

• typedef InputType input_type

Type of the input data.

Public member functions

• virtual ~stage_input ()

Destructor.

• size_t get_first_elem_iter_id () const

Get the iter_id for the stage iteration that generates the first element.

• InputType & get_ith_element (size_t i)

Get the ith element from the input.

• HETCOMPUTE_DELETE_METHOD (stage_input(stage_input const &other))

• HETCOMPUTE_DELETE_METHOD (stage_input(stage_input &&other))

• HETCOMPUTE_DELETE_METHOD (stage_input &operator=(stage_input const &other))

• HETCOMPUTE_DELETE_METHOD (stage_input &operator=(stage_input &&other))

• InputType & operator[ ] (size_t i)

[] operator to get the ith element from the input.

• size_t size () const

Get the number of elements of type InputType in the stage input.

8.7.1.10.1 Member Typedef Documentation

8.7.1.10.1.1 template<typename InputType > typedef InputType hetcompute::stage_input< InputType>::input_type

Type of the input data.

8.7.1.10.2 Constructors and Destructors

8.7.1.10.2.1 template<typename InputType > virtual hetcompute::stage_input< InputType >::~stage_input ( ) [virtual]

Destructor.

8.7.1.10.3 Member Function Documentation

8.7.1.10.3.1 template<typename InputType > size_t hetcompute::stage_input< InputType>::get_first_elem_iter_id ( ) const

Get the iter_id for the stage iteration that generates the first element.

Returns

size_t The iteration id in the previous stage that generates the first element in the stage_input.

8.7.1.10.3.2 template<typename InputType > InputType& hetcompute::stage_input< InputType>::get_ith_element ( size_t i )

Get the ith element from the input.

Parameters

i The index of the element to retrieve.

Returns

InputType The ith element in the input.

8.7.1.10.3.3 template<typename InputType > InputType& hetcompute::stage_input< InputType>::operator[ ] ( size_t i )

[] operator to get the ith element from the input.

Parameters

i The index of the element to retrieve.

Returns

InputType The ith element in the input.

8.7.1.10.3.4 template<typename InputType > size_t hetcompute::stage_input< InputType >::size ( ) const

Get the number of elements of type InputType in the stage input.

Returns

size_t The number of elements in the input for current iteration.

8.7.2 Typedef Documentation

8.7.2.1 typedef enum hetcompute::serial_stage_type hetcompute::serial_stage_type

Serial pipeline stage types.

8.7.3 Enumeration Type Documentation

8.7.3.1 enum hetcompute::serial_stage_type

Serial pipeline stage types.

8.8 Tuner

Classes

• class hetcompute::pattern::tuner

8.8.1 Class Documentation

8.8.1.1 class hetcompute::pattern::tuner

Public Types

• using load_type = size_t

Public member functions

• tuner ()

• size_t get_chunk_size () const

• load_type get_cpu_load () const

• size_t get_doc () const

• load_type get_dsp_load () const

• load_type get_gpu_load () const

• bool has_profile () const

• bool is_serial () const

• bool is_static () const

• tuner & set_chunk_size (size_t sz)

• tuner & set_cpu_load (load_type load)

• tuner & set_dsp_load (load_type load)

• tuner & set_dynamic ()

• tuner & set_gpu_load (load_type load)

• tuner & set_max_doc (size_t doc)

• tuner & set_profile ()

• tuner & set_serial ()

• tuner & set_static ()

8.8.1.1.1 Constructors and Destructors

8.8.1.1.1.1 hetcompute::pattern::tuner::tuner ( )

Tuner constructor. A tuner holds parameters that fine-tune various execution settings in HETCOMPUTE patterns. Note that tuner settings are hints that the HETCOMPUTE runtime takes into account while scheduling a pattern. Constraining factors may cause HETCOMPUTE to ignore the hints.

8.8.1.1.2 Member Function Documentation

8.8.1.1.2.1 size_t hetcompute::pattern::tuner::get_chunk_size ( ) const

Query the granularity of work stealing.

Returns

size_t minimum chunk size.

8.8.1.1.2.2 load_type hetcompute::pattern::tuner::get_cpu_load ( ) const

For patterns executable heterogeneously on multiple devices (e.g. CPU, GPU, DSP), get fraction of pattern work to be executed on the CPU.

Returns

number of units out of total work (cpu_load + gpu_load + dsp_load) to be executed on the CPU

8.8.1.1.2.3 size_t hetcompute::pattern::tuner::get_doc ( ) const

Query the maximum number of tasks launched in parallel.

Returns

size_t degree of concurrency.

8.8.1.1.2.4 load_type hetcompute::pattern::tuner::get_dsp_load ( ) const

For patterns executable heterogeneously on multiple devices (e.g. CPU, GPU, DSP), get fraction of pattern work to be executed on the DSP.

Returns

number of units out of total work (cpu_load + gpu_load + dsp_load) to be executed on the DSP

8.8.1.1.2.5 load_type hetcompute::pattern::tuner::get_gpu_load ( ) const

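For patterns executable heterogeneously on multiple devices (e.g. CPU, GPU, DSP), get fraction of pattern work to be executed on the GPU.

Returns

number of units out of total work (cpu_load + gpu_load + dsp_load) to be executed on the GPU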

8.8.1.1.2.6 bool hetcompute::pattern::tuner::has_profile ( ) const

Check if hetero pfor_each pattern is profiled.

Returns

bool true if execution is profiled and false otherwise.

8.8.1.1.2.7 bool hetcompute::pattern::tuner::is_serial ( ) const

Check if pattern execution is serialized.

Returns

bool true if execution is serialized and false otherwise.

8.8.1.1.2.8 bool hetcompute::pattern::tuner::is_static ( ) const

Check if the parallelization algorithm is static chunking.

Returns

bool true if using static chunking and false if using dynamic work stealing.

8.8.1.1.2.9 tuner& hetcompute::pattern::tuner::set_chunk_size ( size_t sz )

Defines the granularity of work stealing. In data parallel patterns, Qualcomm HETCOMPUTE launches multiple tasks (the number is defined by doc) in parallel. Each task steals iterations from other tasks once its own assigned iterations are completed. The chunk size parameter controls the minimum number of iterations a task must finish before a stealer task can steal work from it. It is recommended to increase the chunk size when each iteration performs little computation.

Parameters

sz Minimum chunk size.

Returns

tuner& reference to the tuner object.
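To illustrate why a larger chunk size helps when each iteration is cheap, the sketch below counts how often a worker has to touch shared state for a given chunk size. It uses chunked self-scheduling from a shared atomic counter, which is a deliberately simpler stand-in for work stealing and is not the SDK's algorithm. The function run_chunked and its parameters are hypothetical names introduced only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical worker loop (not an SDK function). Each call claims `chunk`
// consecutive iterations from the shared counter `next` until the range
// [0, n) is exhausted, and runs body(i) for every claimed iteration.
// Returns how many non-empty chunks this worker claimed, i.e. how many
// times it paid the synchronization cost for useful work.
template <typename Body>
std::size_t run_chunked(std::atomic<std::size_t>& next, std::size_t n, std::size_t chunk, Body body)
{
    assert(chunk > 0);
    std::size_t claims = 0;
    for (;;)
    {
        // One atomic operation per chunk, not per iteration.
        const std::size_t begin = next.fetch_add(chunk);
        if (begin >= n)
        {
            break;
        }
        const std::size_t end = (n - begin < chunk) ? n : begin + chunk;
        for (std::size_t i = begin; i < end; ++i)
        {
            body(i);
        }
        ++claims;
    }
    return claims;
}
```

With n = 100, a single worker makes 13 claims for chunk = 8 but 100 claims for chunk = 1, so the synchronization cost per iteration grows as the chunk shrinks. Several threads could call run_chunked on the same counter; the sketch leaves thread creation out.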

8.8.1.1.2.10 tuner& hetcompute::pattern::tuner::set_cpu_load ( load_type load )

For patterns executable heterogeneously on multiple devices (e.g. CPU, GPU, DSP), set fraction of pattern work to execute on the CPU.

Parameters

load Number of units out of total work (cpu_load + gpu_load + dsp_load) to execute on the CPU.

Returns

tuner& reference to the tuner object.

8.8.1.1.2.11 tuner& hetcompute::pattern::tuner::set_dsp_load ( load_type load )

For patterns executable heterogeneously on multiple devices (e.g. CPU, GPU, DSP), set fraction of pattern work to execute on the DSP.

Parameters

load Number of units out of total work (cpu_load + gpu_load + dsp_load) to execute on the DSP.

Returns

tuner& reference to the tuner object.

8.8.1.1.2.12 tuner& hetcompute::pattern::tuner::set_dynamic ( )

Set the parallelization algorithm to dynamic work stealing (default).

Qualcomm HETCOMPUTE implements a highly efficient work-stealing algorithm and uses it as the common backend for parallel iteration patterns. It works well with most workload types, especially for uneven workload distribution across iterations. To improve performance further, consider tuning chunk size (set_chunk_size) and degree of concurrency (set_max_doc).

See Also

set_chunk_size(size_t)
set_max_doc(size_t)

Returns

tuner& reference to the tuner object.

8.8.1.1.2.13 tuner& hetcompute::pattern::tuner::set_gpu_load ( load_type load )

For patterns executable heterogeneously on multiple devices (e.g. CPU, GPU, DSP), set fraction of pattern work to execute on the GPU.

Parameters

load Number of units out of total work (cpu_load + gpu_load + dsp_load) to execute on the GPU.

Returns

tuner& reference to the tuner object.
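The load values are relative units: each device's share is its load divided by cpu_load + gpu_load + dsp_load. The standalone sketch below works through that arithmetic for a given iteration count. The names device_split and split_by_load are hypothetical and not part of the SDK, and assigning the rounding remainder to the CPU is an assumption made only for this illustration. Because tuner settings are hints, the distribution chosen by the runtime may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: number of iterations assigned to each device.
struct device_split
{
    std::size_t cpu;
    std::size_t gpu;
    std::size_t dsp;
};

// Hypothetical helper (not an SDK function): divide n iterations in
// proportion to the three load units.
device_split split_by_load(std::size_t n, std::size_t cpu_load, std::size_t gpu_load, std::size_t dsp_load)
{
    const std::size_t total = cpu_load + gpu_load + dsp_load;
    assert(total > 0 && "at least one device needs a nonzero load");

    device_split s;
    s.gpu = n * gpu_load / total;  // rounded down
    s.dsp = n * dsp_load / total;  // rounded down
    s.cpu = n - s.gpu - s.dsp;     // assumption: CPU takes the remainder
    return s;
}
```

For example, split_by_load(1000, 1, 3, 1) yields 200 iterations for the CPU, 600 for the GPU, and 200 for the DSP. The multiplication can overflow for very large n or load values; the sketch does not guard against that.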

8.8.1.1.2.14 tuner& hetcompute::pattern::tuner::set_max_doc ( size_t doc )

Defines the maximum number of tasks in parallel (degree of concurrency) for load balancing. A higher number indicates over-subscription, which might be beneficial in certain usage scenarios. doc must be larger than zero. Otherwise, it will cause a fatal error.

Parameters

doc Degree of concurrency, set to the number of available cores by default.

Returns

tuner& reference to the tuner object

8.8.1.1.2.15 tuner& hetcompute::pattern::tuner::set_profile ( )

Enable profiling within pattern execution. Currently this is only meaningful for the hetero pfor_each pattern, where it is used to generate an auto-tuned work distribution across heterogeneous devices.

Returns

tuner& reference to the tuner object.

8.8.1.1.2.16 tuner& hetcompute::pattern::tuner::set_serial ( )

Execute pattern sequentially

Returns

tuner& reference to the tuner object.

8.8.1.1.2.17 tuner& hetcompute::pattern::tuner::set_static ( )

Set the parallelization algorithm to static chunking.

The static chunking algorithm simply divides the iteration range equally and allocates the chunks to parallel launched tasks. This algorithm has lower synchronization overhead compared with the default dynamic work stealing algorithm, and features good locality. However, it does not provide load balancing, and in most cases is outperformed by the dynamic work stealing algorithm (default).

Returns

tuner& reference to the tuner object.
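The equal division that set_static describes can be sketched as follows. This is a minimal standalone illustration, not the SDK's implementation: static_chunks is a hypothetical helper, and handing one extra iteration to the first chunks when the range does not divide evenly is an assumption, since the rounding rule is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not an SDK function): split the half-open range
// [first, last) into at most `doc` contiguous chunks of near-equal size.
// Each pair holds the [begin, end) bounds of one chunk.
std::vector<std::pair<std::size_t, std::size_t>>
static_chunks(std::size_t first, std::size_t last, std::size_t doc)
{
    assert(doc > 0);
    assert(first <= last);

    const std::size_t n     = last - first;
    const std::size_t base  = n / doc;  // minimum chunk length
    const std::size_t extra = n % doc;  // the first `extra` chunks get one more

    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    std::size_t begin = first;
    for (std::size_t t = 0; t < doc && begin < last; ++t)
    {
        const std::size_t len = base + (t < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + len);
        begin += len;
    }
    return chunks;
}
```

Splitting [0, 10) with doc = 4 gives the chunks [0, 3), [3, 6), [6, 8), and [8, 10). Each task receives its chunk once and never rebalances, which is why this scheme has low overhead but cannot compensate when some iterations take longer than others.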

9 Tasks Reference API

Tasks represent independent units of work that can be executed asynchronously. Qualcomm HetCompute programmers are responsible for partitioning their application into tasks and organizing them into a task graph using dependencies. This chapter documents the interfaces to create tasks, set up dependencies, and launch (execute) tasks. It also discusses task synchronization (waiting) and cancellation. Grouping is the mechanism for waiting on and canceling a set of tasks. Finally, attributes are a more advanced feature that allows programmers to pass additional information about task behavior to the Qualcomm HetCompute runtime system.

9.1 Groups

Classes

• class hetcompute::group

Groups represent sets of tasks, which are used to simplify waiting and canceling multiple tasks. More...

• class hetcompute::group_ptr

Smart pointer to a group object. More...

Functions

• group_ptr hetcompute::create_group (const char *name)

Creates a named group and returns a group_ptr that points to the group.

• group_ptr hetcompute::create_group (std::string const &name)

Creates a named group and returns a group_ptr that points to the group.

• group_ptr hetcompute::create_group ()

Creates a group and returns a group_ptr that points to the group.

• void hetcompute::finish_after (group *g)

• void hetcompute::finish_after (group_ptr const &g)

Specifies that the task invoking this function should be deemed to finish only after tasks in group g finish.

• group_ptr hetcompute::intersect (group_ptr const &a, group_ptr const &b)

Returns a pointer to a group that represents the intersection of two groups.

• bool hetcompute::operator!= (group_ptr const &g, std::nullptr_t)

Compares group g to nullptr.

• bool hetcompute::operator!= (std::nullptr_t, group_ptr const &g)

Compares nullptr to group g.

• bool hetcompute::operator!= (group_ptr const &a, group_ptr const &b)

Compares group a to group b.

• group_ptr hetcompute::operator& (group_ptr const &a, group_ptr const &b)

Returns a pointer to a group that represents the intersection of two groups.

• bool hetcompute::operator== (group_ptr const &g, std::nullptr_t)

Compares group g to nullptr.

• bool hetcompute::operator== (std::nullptr_t, group_ptr const &g)

Compares nullptr to group g.

• bool hetcompute::operator== (group_ptr const &a, group_ptr const &b)

Compares group a to group b.

9.1.1 Class Documentation

9.1.1.1 class hetcompute::group

Public member functions

• void add (task_ptr<> const &task)

Adds a task to group without launching it.

• void add (task<> *task)

Adds a task to group without launching it.

• void cancel ()

Cancels group.

• bool canceled () const

Checks whether the group is canceled.

• void finish_after ()

Specifies that the task invoking this function should be deemed to finish only after the tasks in the group finish. This method returns immediately.

• std::string get_name () const

Returns the group name.

• group_ptr intersect (group_ptr const &other)

Returns a pointer to a group that represents the intersection of two groups.

• group_ptr intersect (group *other)

• template<typename FullType , typename FirstArg , typename... RestArgs>

void launch (hetcompute::task_ptr< FullType > const &task, FirstArg &&first_arg, RestArgs &&...rest_args)

Binds arguments to task and launches it into the group.

• template<typename TaskType , typename FirstArg , typename... RestArgs>

void launch (hetcompute::task< TaskType > *task, FirstArg &&first_arg, RestArgs &&...rest_args)

Binds arguments to task and launches it into the group.

• void launch (hetcompute::task_ptr<> const &task)

Launches task into group.

• void launch (hetcompute::task<> *task)

Launches task into group.

• template<typename Code , typename... Args>

void launch (Code &&code, Args &&...args)

Creates a new task, binds arguments (if given) and launches it into a group.

• hc_error wait_for ()

Blocks until all tasks in the group complete execution or are canceled (that is, the group is empty).

9.1.1.1.1 Member Function Documentation

9.1.1.1.1.1 void hetcompute::group::add ( task_ptr<> const & task )

Use add to add a task to a group without launching it. For performance reasons, it is recommended that tasks be added to groups at the time they are launched, using hetcompute::group::launch. Use add when your algorithm requires that the task belong to a group but you are not yet ready to launch the task. For example, you may want to prevent the group from being empty so that you can wait on it somewhere else.

It is possible, though not recommended for performance reasons, to use add repeatedly to add a task to multiple groups. Repeatedly adding a task to the same group is not an error; Qualcomm HetCompute ignores the subsequent additions. If the task has previously been launched, hetcompute::group::launch(task_ptr<> const&) and hetcompute::group::add(task_ptr<> const&) are equivalent. For more information about tasks joining multiple groups, see Task Groups.

Regardless of the method used to add tasks to a group, the following rules always apply:

• Tasks stay in the group until they finish execution (successfully or unsuccessfully due to exceptions or cancellation). Once a task is added to a group, there is no way to remove it from the group.

• Once a task belonging to multiple groups completes execution, Qualcomm HetCompute removes it from all the groups to which it belongs.

• Neither completed nor canceled tasks can join groups.

• Tasks cannot be added to a canceled group.

Do not call this method if task is nullptr. This would cause a fatal error.

Parameters

task Base task-pointer.

Example 1

    #include <stdio.h>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group g.
        auto g = hetcompute::create_group();

        // Create task t1. Its type is hetcompute::task_ptr<void()>
        auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!\n"); });

        // Add task t1 to group g, but do not launch it.
        g->add(t1);

        auto t2 = hetcompute::launch([t1] {
            // Launch t1. Because it already belongs to group g, there is no
            // reason to use hetcompute::group::launch.
            t1->launch();
        });

        // Wait for tasks in group g to complete.
        g->wait_for();
        hetcompute::runtime::shutdown();

        return 0;
    }

Example 2

    #include <stdio.h>
    #include <unistd.h> // sleep()
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create groups g1, g2, g3
        auto g1 = hetcompute::create_group();
        auto g2 = hetcompute::create_group();
        auto g3 = hetcompute::create_group();

        // Create task t. Its type is hetcompute::task_ptr<void(int)>
        auto t = hetcompute::create_task([](int seconds) {
            HETCOMPUTE_ILOG("Hello World from t! I'll sleep for %d seconds\n", seconds);
            sleep(seconds);
            HETCOMPUTE_ILOG("Good bye from t\n");
        });

        // Launch t into g1, let it sleep for 4 seconds.
        g1->launch(t, 4);

        // t is launched, possibly running, let's add it to g2 as well.
        g2->add(t);

        // Equivalent to g3->add(t).
        g3->launch(t);

        // Wait for g2 to be empty
        g2->wait_for();

        HETCOMPUTE_ILOG("**%s**\n", g3->get_name().c_str());
        hetcompute::runtime::shutdown();
        return 0;
    }

See Also

hetcompute::group::launch(task_ptr<>const&)

9.1.1.1.1.2 void hetcompute::group::add ( task<> ∗ task )

Similar to add(task_ptr<> const&) except that it takes a pointer to a base task instead of a base task-pointer.

Do not call this method if task is nullptr. This would cause a fatal error.

Parameters

task Pointer to base task.

See Also

hetcompute::group::launch(task_ptr<> const&)
hetcompute::group::add(task_ptr<> const&)

9.1.1.1.1.3 void hetcompute::group::cancel ( )

Marks the group as canceled and returns immediately. Once a group is canceled, it cannot revert to a non-canceled state. Canceling a group means that:

• The tasks in the group that have not started execution will never execute.

• The tasks in the group that are executing will be canceled only when they call hetcompute::abort_on_cancel. If any of these executing tasks is executing a hetcompute::blocking construct, Qualcomm HetCompute executes the construct's cancellation handler if it has not been executed before.

• Any tasks added to the group after the group is canceled are also canceled.

cancel returns immediately. Call hetcompute::group::wait_for() afterwards to wait for all the running tasks to be completed. For more information about cancellation, see Tasks.
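The cooperative nature of cancellation described above can be illustrated without the SDK. The following is a minimal sketch in standard C++, not HetCompute code: cancel_flag, checkpoint, and run_steps are hypothetical names that stand in for a group's canceled state, for hetcompute::abort_on_cancel, and for a task body, respectively. It only shows that a running task stops at the next checkpoint it reaches after the cancel request, not at the moment the request is made.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a group's canceled state.
struct cancel_flag
{
    std::atomic<bool> canceled{false};

    // Marks the flag and returns immediately; it does not stop anything itself.
    void cancel() { canceled = true; }
};

// Hypothetical stand-in for hetcompute::abort_on_cancel(): the running
// code aborts only when it reaches this call and the flag is set.
inline void checkpoint(const cancel_flag& f)
{
    if (f.canceled.load())
        throw std::runtime_error("canceled");
}

// Runs up to total_steps units of work. The flag is set once
// cancel_after_step units have completed, simulating a cancel request
// that arrives while the task is running. Returns the units completed.
inline int run_steps(cancel_flag& f, int total_steps, int cancel_after_step)
{
    assert(total_steps >= 0);
    int done = 0;
    try
    {
        for (int i = 0; i < total_steps; ++i)
        {
            checkpoint(f); // observe cancellation, if any
            ++done;        // one unit of work
            if (done == cancel_after_step)
                f.cancel(); // request arrives mid-execution
        }
    }
    catch (const std::runtime_error&)
    {
        // The task aborted at a checkpoint.
    }
    return done;
}
```

Calling run_steps(f, 5, 2) completes two units and then aborts at the third checkpoint; if the cancel request never arrives, all units complete.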

Example 1

    #include <iostream>
    #include <unistd.h> // sleep(), usleep()
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        // Create group
        auto g = hetcompute::create_group();

        // Create lambda for task body.
        auto l = [](int task_id) {
            HETCOMPUTE_ILOG("Task %d begins execution.\n", task_id);
            for (int i = 0; i < 2; ++i)
            {
                hetcompute::abort_on_cancel();
                usleep(400000);
            }
            HETCOMPUTE_ILOG("Task %d ends execution normally.\n", task_id);
        };

        // Launch many tasks
        for (int j = 0; j < 10000; ++j)
        {
            g->launch(l, j);
        }

        // Sleep for a little while, to give some tasks
        // time to completely execute.
        sleep(1);

        // Cancel group and wait for the running tasks to complete
        g->cancel();
        try
        {
            g->wait_for();
        }
        catch (const hetcompute::aggregate_exception& e)
        {
            std::cout << "threw " << e.what() << " due to group cancellation " << std::endl;
        }
        catch (const hetcompute::canceled_exception& e)
        {
            std::cout << "threw " << e.what() << " due to group cancellation " << std::endl;
        }
        catch (...)
        {
            // Never reached
        }
        hetcompute::runtime::shutdown();
        return 0;
    }

In the example above, 10000 tasks are launched into group g. Each task prints a message when it starts execution and another one right before it ends execution. The latter prints only if the task does not notice that the group has been canceled. (See hetcompute::abort_on_cancel.)

Right after launching the tasks, main sleeps for a second before canceling the group. This means that the next time the running tasks execute hetcompute::abort_on_cancel(), they will see that their group has been canceled and will abort. wait_for will not return before the running tasks end their execution, either because they call hetcompute::abort_on_cancel() or because they complete their execution without being canceled.

Example 2

    #include <atomic>
    #include <iostream>
    #include <unistd.h> // usleep()
    #include <hetcompute/hetcompute.hh>

    using namespace std;

    int
    main()
    {
        hetcompute::runtime::init();
        // Counts the number of tasks that execute before the group gets
        // canceled. Initialized explicitly so that the count starts at zero.
        atomic<size_t> counter(0);

        auto group = hetcompute::create_group();

        // Create 2000 tasks that increase an atomic counter
        for (int i = 0; i < 2000; i++)
        {
            group->launch([&counter] {
                counter++;
                usleep(7);
            });
        }

        // Cancel group
        group->cancel();

        // Wait for group to cancel
        try
        {
            group->wait_for();
        }
        catch (const hetcompute::aggregate_exception& e)
        {
            // If many tasks were canceled, they each propagate a
            // hetcompute::canceled_exception to the group, all of which get aggregated into
            // a single hetcompute::aggregate_exception.
            std::cout << "threw " << e.what() << " due to group cancellation " << std::endl;
        }
        catch (const hetcompute::canceled_exception& e)
        {
            // If all but one task finished by the time group cancellation took effect,
            // then the one remaining task which was canceled will propagate a single
            // hetcompute::canceled_exception.
            std::cout << "threw " << e.what() << " due to group cancellation " << std::endl;
        }
        catch (...)
        {
            // Never reached
        }
        HETCOMPUTE_ILOG("wait_for returned after %zu tasks executed", counter.load());
        hetcompute::runtime::shutdown();
        return 0;
    }

Output

wait_for returned after 87 tasks executed.

Note that this is an example output. Actual output is timing dependent.

See Also

hetcompute::abort_on_cancel()
hetcompute::group::wait_for()
hetcompute::task<>::cancel()

9.1.1.1.1.4 bool hetcompute::group::canceled ( ) const

Returns true if the group has been canceled; otherwise, returns false. For more about cancellation, see Tasks.

Returns

true – The group is canceled.
false – The group is not canceled.

See Also

hetcompute::group::cancel()

9.1.1.1.1.5 void hetcompute::group::finish_after ( )

Exceptions

api_exception If invoked from outside a task or from within a hetcompute::pfor_each, or if 'task' points to null.

Note

If exceptions are disabled by the application, this API terminates the app if the pointer to task is nullptr, or if it is invoked from outside a task or from within a hetcompute::pfor_each.

Example

    #include <iostream>
    #include <string>

    #include <hetcompute/hetcompute.hh>

    void display_webpage(char*);
    void compose_webpages(int num_urls, char* urls[]);

    void
    display_webpage(char* url)
    {
        auto fetchdata = hetcompute::create_task([=] {
            /*fetch(url, "fetchdata");*/
            return std::string(url) + " data";
        });
        auto fetchstyle = hetcompute::create_task([=] {
            /*fetch(url, "fetchstyle");*/
            return std::string(url) + " style";
        });
        auto render = hetcompute::create_task([](std::string data, std::string style) {
            /*render();*/
            std::cout << data + " " + style << std::endl;
        });
        // Render task may start executing only after data and style have been
        // fetched
        render->bind_all(fetchdata, fetchstyle);
        fetchdata->launch();
        fetchstyle->launch();
        render->launch();
        // Mark display_webpage as logically finishing after the render task finishes
        render->finish_after();
        // Return from function call even before any of the fetchdata, fetchstyle, or render
        // tasks finish. Such an early return makes the function asynchronous.
    }

    void
    compose_webpages(int num_urls, char* urls[])
    {
        auto g = hetcompute::create_group();
        for (int i = 1; i < num_urls; i++)
        {
            g->launch([=] { display_webpage(urls[i]); });
        }
        // Mark compose_webpages as logically finishing after all webpages have been
        // composed and displayed
        g->finish_after();
        // Return from function call before any of the tasks finish
    }

    int
    main(int argc, char* argv[])
    {
        hetcompute::runtime::init();

        // Launch compose_webpages as a task since it is an asynchronous function
        // call
        auto t = hetcompute::launch([=, &argv] { compose_webpages(argc, argv); });
        // Waits for the composite display to be rendered!
        t->wait_for();

        // Shut down the runtime before returning (the original listing
        // returned first, which left this call unreachable).
        hetcompute::runtime::shutdown();
        return 0;
    }

9.1.1.1.1.6 std::string hetcompute::group::get_name ( ) const

Returns string with the name of the group. If the group has no name, the returned string is empty.

Returns

std::string containing the name of the group.

See Also

hetcompute::create_group(std::string const&)

9.1.1.1.1.7 group_ptr hetcompute::group::intersect ( group_ptr const & other )

Returns a pointer to a group that represents the intersection of the group managed by ∗this and other.

Some applications require that tasks join more than one group. It is possible, though not recommended for performance reasons, to use hetcompute::group::launch(hetcompute::task_ptr<> const&) or hetcompute::group::add(hetcompute::task_ptr<> const&) repeatedly to add a task to several groups. Instead, use hetcompute::group::intersect(group_ptr const&) to create a new group that represents the intersection of all the groups where the tasks need to launch. This method is more performant than repeatedly launching the same task into different groups.

Launching a task into the intersection group also simultaneously launches it into all the groups that are part of the intersection.

Consecutive calls to hetcompute::group::intersect with the same group pointer as argument return a pointer to the same group.

Group intersection is a commutative operation.

You can use the & operator instead of hetcompute::group::intersect.

Parameters

other group pointer to the group to intersect with.

Returns

group_ptr – Group pointer that points to a group that represents the intersection of ∗this and other.
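To make the membership rule concrete, the sketch below models groups as plain sets of task IDs in standard C++. It is not HetCompute code: model_group, launch_into_intersection, in_intersection, and the integer task IDs are hypothetical. It only illustrates that launching into an intersection places the task in every constituent group, and that a task belongs to the intersection only if it belongs to all of them.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: a group is just the set of task IDs it contains.
using model_group = std::set<int>;

// Models launching a task into the intersection of several groups:
// the task joins every group that is part of the intersection.
inline void launch_into_intersection(const std::vector<model_group*>& groups, int task_id)
{
    for (model_group* g : groups)
    {
        assert(g != nullptr);
        g->insert(task_id);
    }
}

// A task is in the intersection only if it is in all constituent groups.
inline bool in_intersection(const std::vector<model_group*>& groups, int task_id)
{
    for (const model_group* g : groups)
    {
        if (g->count(task_id) == 0)
            return false;
    }
    return true;
}
```

In this model, a task that was added to only one of the groups is not a member of the intersection, which is why waiting on an intersection does not wait for such tasks.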

See Also

hetcompute::intersect(hetcompute::group_ptr const&, hetcompute::group_ptr const&)
hetcompute::operator&(hetcompute::group_ptr const&, hetcompute::group_ptr const&)

9.1.1.1.1.8 template<typename FullType , typename FirstArg , typename... RestArgs> void hetcompute::group::launch ( hetcompute::task_ptr< FullType > const & task, FirstArg && first_arg, RestArgs &&... rest_args )

Binds arguments to task and launches it into the group. task must be a fully-typed task-pointer to allow argument binding, and it should not be bound already. Otherwise, launch causes a runtime error. For more information about binding, see Tasks.

Tasks do not execute unless they are launched. By launching a task, the programmer informs the Qualcomm HetCompute runtime that the task is ready to execute as soon as all its (control and data) dependencies have been satisfied, required buffers (if any) are available, and a hardware context is available. For more information about task launching, see Tasks.

Tasks can launch only once. Any subsequent calls to g->launch() do not cause the task to execute again. Instead, they cause the task to be added to group g, if the task was not part of that group already. When launching a task into many groups, remember that group intersection is a somewhat expensive operation. If you need to launch into multiple groups several times, intersect the groups once and launch the tasks into the intersection. For more information about tasks joining multiple groups, see Task Groups.

Template Parameters

FullType Task pointer type. Should be a full type (for example, void(int, float)).
FirstArg Type of the first argument to be bound to the task.
RestArgs Type of the rest of the arguments to be bound to the task.

Parameters

task Fully-typed task pointer.
first_arg First task argument.
rest_args Rest of the task arguments.

Exceptions

api_exception If task pointer is nullptr.

Note

If exceptions are disabled by application, this API will terminate the app if pointer to task is nullptr

Example

    #include <stdio.h>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group
        auto g = hetcompute::create_group();

        // hello is a fully-typed task pointer of type
        // hetcompute::task_ptr<void(int)>
        auto hello = hetcompute::create_task([](int x) { HETCOMPUTE_ILOG("Hello World %d!\n", x); });

        // Bind hello to 42 and launch task into g
        g->launch(hello, 42);

        // Wait for g to be empty
        g->wait_for();

        hetcompute::runtime::shutdown();
    }

See Also

hetcompute::group::launch(hetcompute::task_ptr<> const&)

9.1.1.1.1.9 template<typename TaskType , typename FirstArg , typename... RestArgs> void hetcompute::group::launch ( hetcompute::task< TaskType > ∗ task, FirstArg && first_arg, RestArgs &&... rest_args )

Similar to hetcompute::group::launch(hetcompute::task_ptr<FullType> const&, FirstArg&& first_arg, RestArgs&& ...rest_args) except it takes a pointer to a base task instead of a base task pointer.

Template Parameters

TaskType Task type. Should be a full type (for example, void(int, float)).
FirstArg Type of the first argument to be bound to the task.
RestArgs Type of the rest of the arguments to be bound to the task.

Parameters

task Pointer to task.
first_arg First argument value to bind.
rest_args The rest of the argument values to bind.

Exceptions

api_exception If pointer to task is nullptr.

Note

If exceptions are disabled by application, this API will terminate the app if pointer to task is nullptr

See Also

hetcompute::group::launch(hetcompute::task_ptr<FullType> const&, FirstArg&& first_arg, RestArgs&& ...rest_args)

9.1.1.1.1.10 void hetcompute::group::launch ( hetcompute::task_ptr<> const & task )

Launches task into the group. Tasks do not execute unless they are launched. By launching a task, the programmer informs the Qualcomm HetCompute runtime that the task is ready to execute as soon as all its (control and data) dependencies have been satisfied, required buffers (if any) are available, and a hardware context is available. For more information about task launching, see Tasks.

A task executes only once regardless of how many times it has been launched. Therefore, any subsequent call to launch does not cause the task to execute again. Instead, it causes the task to be added to a new group, if the task was not part of that group already. For more information about tasks joining multiple groups, see Task Groups.

Parameters

task Base task pointer.

Exceptions

api_exception If task pointer is nullptr.

Note

If exceptions are disabled by application, this API will terminate the app if task pointer is nullptr

Example

    #include <stdio.h>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group
        auto g = hetcompute::create_group();

        // hello is a fully-typed task pointer of type
        // hetcompute::task_ptr<void()>.
        auto hello = hetcompute::create_task([]() { HETCOMPUTE_ILOG("Hello World!\n"); });

        // Launch hello into g.
        g->launch(hello);

        // Wait for g to be empty.
        g->wait_for();
        hetcompute::runtime::shutdown();
    }

9.1.1.1.1.11 void hetcompute::group::launch ( hetcompute::task<> ∗ task )

Similar to hetcompute::group::launch(hetcompute::task_ptr<> const&) except it takes a pointer to a base task instead of a base task-pointer.

Parameters

task Pointer to base task.

Exceptions

api_exception If task is nullptr.

Note

If exceptions are disabled by application, this API will terminate the app if task is nullptr

Example

    #include <stdio.h>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group
        auto g = hetcompute::create_group();

        // hello is a fully-typed task pointer of type
        // hetcompute::task_ptr<void()>.
        auto hello = hetcompute::create_task([]() { HETCOMPUTE_ILOG("Hello World!\n"); });

        // get regular pointer to task.
        auto hello_ptr = hello.get();

        // Launch hello into g.
        g->launch(hello_ptr);

        // Wait for g to be empty.
        g->wait_for();
        hetcompute::runtime::shutdown();
    }

9.1.1.1.1.12 template<typename Code , typename... Args> void hetcompute::group::launch ( Code &&code, Args &&... args )

Creates a new task, binds arguments (if given), and launches it into a group. This is the fastest way to create and launch a task into a group, and it is recommended that it be used as much as possible. Note, however, that this method does not return a pointer to the task. Therefore, only use this method if the new task will not be part of a task graph. The Qualcomm HetCompute runtime will execute the task as soon as all its (control and data) dependencies have been satisfied, required buffers (if any) are available, and a hardware context is available. For more information about task launching, see Tasks.

The new task executes the Code passed as an argument to this method.

When creating a task that will execute on the CPU, the preferred types for Code are C++11 lambda and hetcompute::cpu_kernel, although it is possible to use other types such as function objects and function pointers. Use hetcompute::dsp_kernel or hetcompute::gpu_kernel to create a task that runs on the Qualcomm Hexagon DSP or on the GPU. Regardless of the Code type, it can take up to 31 arguments.

Notice that launch makes a copy of code so that the programmer does not need to worry about the lifetime of the code object.

launch can launch a hetcompute pattern object directly, just like a regular task, as shown in the following example:

Examples

    // increment every element in input vector vin
    auto l = [&vin](size_t i) { vin[i]++; };
    auto pfor = hetcompute::pattern::create_pfor_each(l);
    g->launch(pfor, size_t(0), vin.size());

Notice that launch does not support launching patterns with a non-void return value. Therefore, programmers cannot launch preduce or pdivide_and_conquer using this group launch semantic.

Template Parameters

Code Code that the task will execute. It can be a lambda expression, function pointer, functor, pattern, cpu_kernel, gpu_kernel, or dsp_kernel.

Parameters

code Task body.
args Arguments to bind to the parameters.

Example

    #include <stdio.h>
    #include <unistd.h> // sleep()
    #include <hetcompute/hetcompute.hh>

    static void
    foo()
    {
        HETCOMPUTE_ILOG("Hello World! from foo()\n");
        sleep(1);
        HETCOMPUTE_ILOG("Bye from foo()\n");
    }

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group g
        auto g = hetcompute::create_group();

        // Create cpu_kernel that executes foo
        auto k1 = hetcompute::create_cpu_kernel(foo);

        // Create a task from a kernel and launch it
        g->launch(k1);

        // Create lambda expression l that takes two arguments
        auto l = [](int x, int y) { HETCOMPUTE_ILOG("Hello World! %d + %d = %d\n", x, y, x + y); };

        // Create tasks from l and launch them into g
        for (int i = 0; i < 3; i++)
            for (int j = 42; j < 44; j++)
                g->launch(l, i, j);

        // Wait for all the tasks in group g to complete
        g->wait_for();
        hetcompute::runtime::shutdown();

        return 0;
    }

See Also

hetcompute::launch(Code&& code, Args&& ...args)
hetcompute::group::launch(hetcompute::task_ptr<FullType> const&, FirstArg&& first_arg, RestArgs&& ...rest_args)
hetcompute::group::launch(hetcompute::task_ptr<> const&)

9.1.1.1.1.13 hc_error hetcompute::group::wait_for ( )

Blocks until all tasks in the group complete execution or are canceled. If new tasks are added to the group while wait_for is blocking, wait_for does not return until all those new tasks also complete.

If wait_for is called from within a task, Qualcomm HetCompute context switches the task and finds another task to run. If called from outside a task, wait_for blocks the calling thread until it returns.

Note

If exceptions are disabled by the application, wait_for returns hetcompute::hc_error instead of throwing exceptions.

wait_for is a safe point.

Example 1

    #include <hetcompute/hetcompute.hh>
    #include <stdio.h>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group g
        auto g = hetcompute::create_group();

        // Launch 10 tasks into g
        for (int i = 0; i < 10; i++)
        {
            g->launch([i] { HETCOMPUTE_ILOG("Hello World! I'm task #%d\n", i); });
        }

        // Wait for tasks to complete and exit group
        g->wait_for();
        hetcompute::runtime::shutdown();

        return 0;
    }

Waiting for a group intersection means that wait_for returns once the tasks in the intersection group have completed execution or been canceled.

Example 2

    #include <stdio.h>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        // Create groups
        hetcompute::group_ptr g1 = hetcompute::create_group("Example 1");
        hetcompute::group_ptr g2 = hetcompute::create_group("Example 2");
        hetcompute::group_ptr g12 = g1 & g2;

        // Create and launch two tasks that never end
        g1->launch([] {
            while (1)
            {
            }
        });

        g2->launch([] {
            while (1)
            {
            }
        });

        // Returns immediately because there are no
        // tasks that belong to both g1 and g2
        g12->wait_for();

        // Never returns
        // g1->wait_for();
        // g2->wait_for();

        g1->cancel();
        g2->cancel();

        return 0;
    }

See Also

hetcompute::group::finish_after
hetcompute::task::finish_after
hetcompute::task::wait_for
hetcompute::intersection

9.1.1.2 class hetcompute::group_ptr

Smart pointer to a group object, similar to std::shared_ptr.

Public member functions

• group_ptr ()

Default constructor. Constructs a group_ptr with no group.

• group_ptr (std::nullptr_t)

Constructs a group_ptr with no group from nullptr.

• group_ptr (group_ptr const &other)

Copy constructor. Constructs a group_ptr that manages the same group as other.

• group_ptr (group_ptr &&other)

Move constructor. Move-constructs a group_ptr that manages the same group as other.

• group ∗ get () const

Returns pointer to managed group.

• operator bool () const

Checks whether pointer is not nullptr.

• group ∗ operator-> () const

Dereference operator. Returns pointer to the managed group.

• group_ptr & operator= (group_ptr const &other)

Assignment operator. Assigns the group managed by other to ∗this.

• group_ptr & operator= (std::nullptr_t)

Assignment operator. Resets ∗this.

• group_ptr & operator= (group_ptr &&other)

Move-assignment operator. Move-assigns the group managed by other to ∗this.

• void reset ()

Resets pointer to managed group.

• void swap (group_ptr &other)

Exchanges managed groups between ∗this and other.

• bool unique () const

Checks whether ∗this is the only group_ptr managing the same group object.

• size_t use_count () const

Returns the number of group_ptr objects managing the same object (including ∗this).

9.1.1.2.1 Constructors and Destructors

9.1.1.2.1.1 hetcompute::group_ptr::group_ptr ( )

Constructs a group_ptr that manages no group. group_ptr::get returns nullptr.

9.1.1.2.1.2 hetcompute::group_ptr::group_ptr ( std::nullptr_t )

Constructs a group_ptr that manages no group. group_ptr::get returns nullptr.

9.1.1.2.1.3 hetcompute::group_ptr::group_ptr ( group_ptr const & other )

Constructs a group_ptr object that manages the same group as other. If other points to nullptr, the newly built object points to nullptr as well.

Parameters

other Group pointer to copy.

9.1.1.2.1.4 hetcompute::group_ptr::group_ptr ( group_ptr && other )

Constructs a group_ptr object that manages the same group as other and resets other. If other points to nullptr, the newly built object points to nullptr as well.

Parameters

other Group pointer to move from.

9.1.1.2.2 Member Function Documentation

9.1.1.2.2.1 group∗ hetcompute::group_ptr::get ( ) const

Returns pointer to the managed group. Remember that the lifetime of the group is defined by the lifetime of the group_ptr objects managing it. If all group_ptr objects managing a group g go out of scope, all group∗ pointing to g may be invalid.

Returns

Pointer to managed group object.

9.1.1.2.2.2 hetcompute::group_ptr::operator bool ( ) const [explicit]

Checks whether ∗this manages a group.

Returns

true – The pointer is not nullptr (∗this manages a group).
false – The pointer is nullptr (∗this does not manage a group).

9.1.1.2.2.3 group∗ hetcompute::group_ptr::operator-> ( ) const

Returns pointer to the managed group. Do not call this member function if ∗this does not manage a group. This would cause a fatal error.

Returns

Pointer to managed group object.

9.1.1.2.2.4 group_ptr& hetcompute::group_ptr::operator= ( group_ptr const & other )

Assigns the group managed by other to ∗this. If, before the assignment, ∗this was the last group_ptr pointing to a group g, then the assignment will cause g to be destroyed. If other manages no object, ∗this will not manage an object either after the assignment.

Parameters

other Group pointer to copy.

Returns

∗this.
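Because group_ptr is described as similar to std::shared_ptr, the ownership rule above can be sketched with std::shared_ptr itself. This is an analogy in standard C++, not HetCompute code: tracked is a hypothetical type whose destructor records when the managed object is destroyed, and destroyed_by_assignment is a hypothetical helper.

```cpp
#include <cassert>
#include <memory>

// Hypothetical managed object that counts destructions.
struct tracked
{
    explicit tracked(int& destroyed) : destroyed_(destroyed) {}
    ~tracked() { ++destroyed_; }
    int& destroyed_;
};

// Reassigns pointer a to the object managed by b and returns how many
// objects that assignment destroyed. If keep_copy is true, a second owner
// of a's original object is kept alive across the assignment.
inline int destroyed_by_assignment(bool keep_copy)
{
    int destroyed = 0;
    auto a = std::make_shared<tracked>(destroyed);
    auto b = std::make_shared<tracked>(destroyed);
    std::shared_ptr<tracked> copy;
    if (keep_copy)
        copy = a; // a is no longer the last owner of its object

    a = b; // a releases its original object and shares b's object

    assert(a.get() == b.get());
    return destroyed; // value observed right after the assignment
}
```

When a is the last owner, the assignment destroys the original object; when another owner exists, the object survives the assignment.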

9.1.1.2.2.5 group_ptr& hetcompute::group_ptr::operator= ( std::nullptr_t )

Resets ∗this so that it manages no object. If, before the assignment, ∗this was the last group_ptr pointing to a group g, then the assignment will cause g to be destroyed.

Returns

∗this.

9.1.1.2.2.6 group_ptr& hetcompute::group_ptr::operator= ( group_ptr && other )

Move-assigns the group managed by other to ∗this. other will manage no group after the assignment.

If, before the assignment, ∗this was the last group_ptr pointing to a group g, then the assignment will cause g to be destroyed. If other manages no object, ∗this will not manage an object either after the assignment.

Parameters

other Group pointer to move from.

Returns

∗this.
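The three assignment forms above follow the usual shared-ownership pattern. The sketch below reproduces that pattern with std::shared_ptr as a stand-in for group_ptr; this is an assumption for illustration only, and counted and destructions_after_assignments are hypothetical names. It counts how many managed objects are destroyed as owners are reassigned.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in that counts destructions; not an SDK type.
struct counted
{
    explicit counted(int& counter) : destroyed(counter) {}
    ~counted() { ++destroyed; }
    int& destroyed;
};

// Returns the total number of objects destroyed by the end of the scenario.
int destructions_after_assignments()
{
    int destroyed = 0;
    {
        auto a = std::make_shared<counted>(destroyed);
        auto b = std::make_shared<counted>(destroyed);

        a = b; // copy assignment: a was the last owner of the first object
        assert(destroyed == 1);

        b = nullptr; // nullptr assignment: a still owns the second object
        assert(destroyed == 1);

        auto c = std::move(a); // move: ownership transfers, a becomes empty
        assert(!a && c);
        assert(destroyed == 1);
    } // c goes out of scope, so the second object is destroyed
    return destroyed;
}
```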

9.1.1.2.2.7 void hetcompute::group_ptr::reset ( )

Resets the pointer to the managed group. If ∗this was the last group_ptr pointing to a group g, then reset() causes g to be destroyed.

Exceptions

api_exception If group pointer is nullptr.


9.1.1.2.2.8 void hetcompute::group_ptr::swap ( group_ptr & other )

Exchanges managed groups between ∗this and other.

Parameters

other Group pointer to exchange with.

9.1.1.2.2.9 bool hetcompute::group_ptr::unique ( ) const

Checks whether ∗this is the only group_ptr managing the same group object. If ∗this does not manage any group, unique() returns false.

It is equivalent to checking whether use_count is 1, except that it is more efficient.

Returns

true – The pointer is the only group_ptr managing the group.
false – The pointer is not the only group_ptr managing the group or ∗this is nullptr.

9.1.1.2.2.10 size_t hetcompute::group_ptr::use_count ( ) const

Returns the number of group_ptr objects managing the same object (including ∗this). Notice that the HETCOMPUTE runtime keeps one internal group_ptr to a group if the group contains one or more tasks. This is to prevent a group from disappearing while it has tasks.

    #include <atomic>
    #include <cassert>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group g.
        auto g = hetcompute::create_group();

        // g's use_count should be 1
        HETCOMPUTE_ILOG("After construction: g.use_count() = %zu\n", g.use_count());

        // Copy-construct g2 from g. g and g2's use_count is 2.
        auto g2 = g;
        HETCOMPUTE_ILOG("After copy-construction: g2.use_count() = %zu\n", g2.use_count());

        std::atomic<bool> running(false);
        std::atomic<bool> finish(false);

        // Launch a task into g and wait until it starts running.
        g->launch([&running, &finish] {
            running = true;
            while (!finish)
            {
            };
        });

        while (!running)
        {
        };

        // The runtime holds an internal group_ptr while the task runs,
        // so g's use_count is 3.
        HETCOMPUTE_ILOG("Task in g running. g.use_count() = %zu\n", g.use_count());

        // Call g.get to get pointer to managed group. g's use_count is still 3.
        auto g3 = g.get();
        HETCOMPUTE_ILOG("After calling g.get(). g.use_count() = %zu\n", g.use_count());

        finish = true;

        g->wait_for();

        // g's use_count should be 2
        HETCOMPUTE_ILOG("After g->wait_for: g.use_count() = %zu\n", g.use_count());

        assert(g3 != nullptr);
        HETCOMPUTE_UNUSED(g3);

        hetcompute::runtime::shutdown();
        return 0;
    }

Output

    After construction: g.use_count() = 1
    After copy-construction: g2.use_count() = 2
    Task in g running. g.use_count() = 3
    After calling g.get(). g.use_count() = 3
    After g->wait_for: g.use_count() = 2

Returns

Total number of group_ptr objects managing the same group.
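The counting behavior described for unique() and use_count() can be reproduced without the SDK. The sketch below again uses std::shared_ptr as a stand-in for group_ptr, which is an assumption for illustration; fake_group, is_unique, use_count_trace, and runtime_ref are hypothetical names. The extra copy named runtime_ref models the internal reference that the runtime is described as holding while a group has tasks.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a group; not an SDK type.
struct fake_group
{
};

// Mirrors unique(): true only for a non-empty pointer with exactly one owner.
bool is_unique(std::shared_ptr<fake_group> const& p)
{
    return p.use_count() == 1;
}

// Records the owner count at each step of a scenario like the one above.
std::vector<long> use_count_trace()
{
    std::vector<long> trace;
    auto g = std::make_shared<fake_group>();
    assert(is_unique(g));
    trace.push_back(g.use_count()); // only g

    auto g2 = g;
    trace.push_back(g2.use_count()); // g and g2
    {
        // Models the internal reference held while the group has tasks.
        auto runtime_ref = g;
        trace.push_back(g.use_count()); // g, g2, and runtime_ref
    }
    trace.push_back(g.use_count()); // internal reference released

    std::shared_ptr<fake_group> empty;
    assert(!is_unique(empty)); // an empty pointer has a use count of 0
    return trace;
}
```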

9.1.2 Function Documentation

9.1.2.1 group_ptr hetcompute::create_group ( const char ∗ name )

Creates a named group and returns a group_ptr that points to it. Named groups can facilitate debugging of complex applications. Keep in mind that Qualcomm HetCompute makes a copy of name, which may cause a slight overhead if you repeatedly create and destroy groups.

name does not have to be unique. Qualcomm HetCompute does not enforce uniqueness, so two or more groups can share the same name.

Parameters

name Group name.

Returns

group_ptr – Pointer to the new group.

Example

    #include <cassert>
    #include <string>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group named "Example 1"
        auto g1 = hetcompute::create_group("Example 1");

        // Create group named "Example 2"
        std::string g2_name("Example 2");
        auto g2 = hetcompute::create_group(g2_name);

        // Create unnamed group
        auto g3 = hetcompute::create_group();

        HETCOMPUTE_ILOG("g1 name = %s\n", g1->get_name().c_str());
        HETCOMPUTE_ILOG("g2 name = %s\n", g2->get_name().c_str());
        HETCOMPUTE_ILOG("g3 name = %s\n", g3->get_name().c_str());

        hetcompute::runtime::shutdown();
        return 0;
    }

See Also

hetcompute::create_group()
hetcompute::create_group(std::string const&)

9.1.2.2 group_ptr hetcompute::create_group ( std::string const & name )

Creates a named group and returns a group_ptr that points to it. Named groups can facilitate debugging of complex applications. Keep in mind that Qualcomm HetCompute makes a copy of name, which may cause a slight overhead if you repeatedly create and destroy groups.

name does not have to be unique. Qualcomm HetCompute does not enforce uniqueness, so two or more groups can share the same name.

Parameters

name Group name.

Returns

group_ptr – Pointer to the new group.

Example

    #include <cassert>
    #include <string>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group named "Example 1"
        auto g1 = hetcompute::create_group("Example 1");

        // Create group named "Example 2"
        std::string g2_name("Example 2");
        auto g2 = hetcompute::create_group(g2_name);

        // Create unnamed group
        auto g3 = hetcompute::create_group();

        HETCOMPUTE_ILOG("g1 name = %s\n", g1->get_name().c_str());
        HETCOMPUTE_ILOG("g2 name = %s\n", g2->get_name().c_str());
        HETCOMPUTE_ILOG("g3 name = %s\n", g3->get_name().c_str());

        hetcompute::runtime::shutdown();
        return 0;
    }

See Also

hetcompute::create_group()
hetcompute::create_group(const char∗)

9.1.2.3 group_ptr hetcompute::create_group ( )

Creates a group and returns a group_ptr that points to it.

Returns

group_ptr – Pointer to the new group.

Example

    #include <cassert>
    #include <string>
    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();
        // Create group named "Example 1"
        auto g1 = hetcompute::create_group("Example 1");

        // Create group named "Example 2"
        std::string g2_name("Example 2");
        auto g2 = hetcompute::create_group(g2_name);

        // Create unnamed group
        auto g3 = hetcompute::create_group();

        HETCOMPUTE_ILOG("g1 name = %s\n", g1->get_name().c_str());
        HETCOMPUTE_ILOG("g2 name = %s\n", g2->get_name().c_str());
        HETCOMPUTE_ILOG("g3 name = %s\n", g3->get_name().c_str());

        hetcompute::runtime::shutdown();
        return 0;
    }

See Also

hetcompute::create_group()
hetcompute::create_group(const char∗)

9.1.2.4 void hetcompute::finish_after ( group ∗ g )

PRIVATE


9.1.2.5 void hetcompute::finish_after ( group_ptr const & g )

Specifies that the task invoking this function should be deemed to finish only after tasks in g finish. This method returns immediately.

If the invoking task is multi-threaded, the programmer must ensure that concurrent calls to finish_after from within the task are properly synchronized.

Parameters

g Group pointer.

Exceptions

api_exception If invoked from outside a task, from within a hetcompute::pfor_each, or if g points to nullptr.

Note

If exceptions are disabled by the application, the API terminates in the above listed error conditions.

9.1.2.6 group_ptr hetcompute::intersect ( group_ptr const & a, group_ptr const & b )

Returns a pointer to a group that represents the intersection of two groups. Some applications require that tasks join more than one group. It is possible, though not recommended for performance reasons, to use hetcompute::group::launch(hetcompute::task_ptr<> const&) or hetcompute::group::add(hetcompute::task_ptr<> const&) repeatedly to add a task to several groups. Instead, use hetcompute::intersect(group_ptr const&, group_ptr const&) to create a new group that represents the intersection of all the groups where the tasks need to launch. This approach is more performant than repeatedly launching the same task into different groups.

Launching a task into the intersection group also simultaneously launches it into all the groups that are part of the intersection.

Consecutive calls to hetcompute::intersect with the same group pointers as arguments return a pointer to the same group.

Group intersection is a commutative operation.

You can use the & operator instead of hetcompute::intersect.

Parameters

a Group pointer to the first group.
b Group pointer to the second group.

Returns

group_ptr – Group pointer that points to a group that represents the intersection of a and b.

Example

     1 #include <hetcompute/hetcompute.hh>
     2
     3 int
     4 main()
     5 {
     6     hetcompute::runtime::init();
     7     // Create groups
     8     auto g1 = hetcompute::create_group("Group 1");
     9     auto g2 = hetcompute::create_group("Group 2");
    10
    11     auto g12 = hetcompute::intersect(g1, g2);
    12
    13     for (int i = 0; i < 3000; i++)
    14         g1->launch([] {
    15             //... Do something
    16         });
    17
    18     for (int i = 0; i < 2000; i++)
    19         g2->launch([] {
    20             //... Do something
    21         });
    22
    23     // Returns immediately. g12 is empty
    24     g12->wait_for();
    25
    26     // Return only after tasks in g1 and g2 complete
    27     g1->wait_for();
    28     g2->wait_for();
    29
    30     g12->launch([] {
    31         //... Calculate the Ultimate Question of Life,
    32         // the Universe, and Everything
    33         HETCOMPUTE_ILOG("42\n");
    34     });
    35
    36     // All will return after the task prints 42
    37     g1->wait_for();
    38     g2->wait_for();
    39     hetcompute::runtime::shutdown();
    40
    41     return 0;
    42 }

The example above shows an application with three groups: g1, g2, and their intersection g12. We launch thousands of tasks on both g1 and g2. We then wait for g12 (line 24), but g12->wait_for() returns immediately because g12 is empty. This is because at this point no task belongs to both g1 and g2. We then launch a task into g12 (line 30). g1->wait_for() and g2->wait_for() return only after the task in g12 completes execution because it belongs to g1, g2, and g12.
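The membership rules described above can be checked with a small toy model that does not use the SDK. In this sketch, which is only an illustration and not how HetCompute implements groups, each group is represented by the set of base-group labels a task must carry to belong to it, so the intersection of two groups is the union of their label sets. All names (toy_group, toy_intersect, toy_runtime, intersect_demo) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// A toy group is the set of base-group labels a task must carry to be in it.
using toy_group = std::set<int>;

// A task in the intersection must carry the labels of both groups.
toy_group toy_intersect(toy_group const& a, toy_group const& b)
{
    toy_group result = a;
    result.insert(b.begin(), b.end());
    return result;
}

class toy_runtime
{
public:
    // Launching into g tags the task with every label of g.
    void launch(toy_group const& g) { tasks_.push_back(g); }

    // Number of tasks that a wait on g would have to wait for.
    std::size_t pending(toy_group const& g) const
    {
        return static_cast<std::size_t>(
            std::count_if(tasks_.begin(), tasks_.end(), [&g](toy_group const& labels) {
                return std::includes(labels.begin(), labels.end(), g.begin(), g.end());
            }));
    }

private:
    std::vector<toy_group> tasks_;
};

// Returns {pending(g12) before, pending(g12), pending(g1), pending(g2)}.
std::vector<std::size_t> intersect_demo()
{
    toy_group const g1{1};
    toy_group const g2{2};
    toy_group const g12 = toy_intersect(g1, g2);
    assert(g12 == toy_intersect(g2, g1)); // commutative, same group each time

    toy_runtime rt;
    for (int i = 0; i < 3; ++i) rt.launch(g1);
    for (int i = 0; i < 2; ++i) rt.launch(g2);
    std::size_t const before = rt.pending(g12); // no task is in both groups yet

    rt.launch(g12); // one launch places the task in g1, g2, and g12
    return {before, rt.pending(g12), rt.pending(g1), rt.pending(g2)};
}
```

In this model a single launch into g12 increases the pending count of g1 and g2 as well, which matches the statement that waiting on either group returns only after the task in the intersection completes.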

See Also

hetcompute::operator&(hetcompute::group_ptr const&, hetcompute::group_ptr const&)

9.1.2.7 group_ptr hetcompute::operator& ( group_ptr const & a, group_ptr const & b )

Parameters

a Group pointer to the first group to intersect.
b Group pointer to the second group to intersect.


Returns

group_ptr – Group pointer that points to a group that represents the intersection of a and b.

See Also

hetcompute::intersect(hetcompute::group_ptr const&, hetcompute::group_ptr const&)


9.2 Kernels

Classes

• struct hetcompute::beta::call_tuple< Dim, Args >

Utility base template to get the tuple type. More...

• struct hetcompute::beta::call_tuple< Dim, gpu_kernel< Args...> >

Utility template to get the tuple type for GPU pipeline stages. More...

• class hetcompute::beta::cl_t

Type used for declaring the constant hetcompute::cl. More...

• class hetcompute::cpu_kernel< Fn >

A wrapper around a function object. More...

• class hetcompute::cpu_kernel< FReturnType(FArgs...)>

A wrapper around a function. More...

• class hetcompute::dsp_kernel< Fn >

• class hetcompute::dsp_kernel< int(∗)(Args...)>

• class hetcompute::beta::gl_t

Type used for declaring the constant hetcompute::gl. More...

• class hetcompute::gpu_kernel< Args >

A wrapper around OpenCL C kernels and OpenGL ES compute shaders for GPU compute. More...

• class hetcompute::local< T >

Used as a template parameter to hetcompute::gpu_kernel to indicate a locally allocated parameter. More...

Functions

• template<typename FReturnType , typename... FArgs>

hetcompute::cpu_kernel< FReturnType(FArgs...)> hetcompute::create_cpu_kernel (FReturnType(∗fn)(FArgs...))

Create a cpu_kernel object from a function.

• template<typename Fn >

hetcompute::cpu_kernel< typename std::remove_reference< Fn >::type > hetcompute::create_cpu_kernel (Fn &&fn)

Create a cpu_kernel object from a function object.

• template<typename... Args>

hetcompute::dsp_kernel< int(∗)(Args...)> hetcompute::create_dsp_kernel (int(∗fn)(Args...))

• template<typename... Args>

gpu_kernel< Args...> hetcompute::create_gpu_kernel (std::string const &cl_kernel_str, std::string const &cl_kernel_name, std::string const &cl_build_options="")


• template<typename... Args>

gpu_kernel< Args...> hetcompute::beta::create_gpu_kernel (beta::cl_t const &, std::string const &cl_kernel_str, std::string const &cl_kernel_name, std::string const &cl_build_options="")

• template<typename... Args>

gpu_kernel< Args...> hetcompute::beta::create_gpu_kernel (beta::gl_t const &, std::string const &gl_kernel_str)

• template<typename... Args>

gpu_kernel< Args...> hetcompute::create_gpu_kernel (void const ∗cl_kernel_bin, size_t cl_kernel_len, std::string const &cl_kernel_name, std::string const &cl_build_options="")

• template<typename... Args>

gpu_kernel< Args...> hetcompute::beta::create_gpu_kernel (beta::cl_t const &, void const ∗cl_kernel_bin, size_t cl_kernel_len, std::string const &cl_kernel_name, std::string const &cl_build_options="")

• template<typename... Args>

gpu_kernel< Args...> hetcompute::beta::create_gpu_kernel (beta::gl_t const &, void const ∗gl_kernel_bin, size_t gl_kernel_len)

Variables

• static HETCOMPUTE_CONSTEXPR_CONST size_type hetcompute::dsp_kernel< int(∗)(Args...)>::arity = parent::arity

• static HETCOMPUTE_CONSTEXPR_CONST size_type hetcompute::cpu_kernel< Fn >::arity = parent::arity

• static HETCOMPUTE_CONSTEXPR_CONST size_type hetcompute::cpu_kernel< FReturnType(FArgs...)>::arity = parent::arity

• cl_t const hetcompute::beta::cl

Used to explicitly indicate creation of an OpenCL kernel.

• gl_t const hetcompute::beta::gl {}

Used to explicitly indicate creation of an OpenGL ES compute kernel.

9.2.1 Class Documentation

9.2.1.1 struct hetcompute::beta::call_tuple

template<size_t Dim, typename... Args>
struct hetcompute::beta::call_tuple< Dim, Args >

Utility template to get the tuple type of hetcompute::range, and other types.

The wrapped type can be accessed through call_tuple<...>::type.


Template Parameters

Dim Dimension of hetcompute::range.
Args... Other types.
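A type-level utility of this kind can be sketched with standard C++ alone. The sketch below assumes that the wrapped type is a std::tuple whose first element is the range and whose remaining elements are the other types; the reference text does not spell out the exact layout, so treat this as a guess. toy_range, toy_gpu_kernel, toy_call_tuple, and gpu_stage_tuple_size are hypothetical names, not SDK types.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for hetcompute::range and hetcompute::gpu_kernel.
template <std::size_t Dim>
struct toy_range
{
    std::size_t extent[Dim];
};

template <typename... Args>
struct toy_gpu_kernel
{
};

// Base template: tuple of the range followed by the other types (assumed layout).
template <std::size_t Dim, typename... Args>
struct toy_call_tuple
{
    using type = std::tuple<toy_range<Dim>, Args...>;
};

// Specialization: unpack the kernel's argument list instead of storing the kernel.
template <std::size_t Dim, typename... Args>
struct toy_call_tuple<Dim, toy_gpu_kernel<Args...>>
{
    using type = std::tuple<toy_range<Dim>, Args...>;
};

static_assert(std::is_same<toy_call_tuple<1, int, float>::type,
                           std::tuple<toy_range<1>, int, float>>::value,
              "base template wraps the range and the listed types");

// Builds a value of the wrapped type for a 2-D GPU stage and returns its size.
std::size_t gpu_stage_tuple_size()
{
    using stage_args = toy_call_tuple<2, toy_gpu_kernel<int, float>>::type;
    stage_args args(toy_range<2>{{4, 8}}, 42, 1.5f);
    assert(std::get<0>(args).extent[1] == 8);
    return std::tuple_size<stage_args>::value;
}
```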

9.2.1.2 struct hetcompute::beta::call_tuple< Dim, gpu_kernel< Args...> >

template<size_t Dim, typename... Args>
struct hetcompute::beta::call_tuple< Dim, gpu_kernel< Args...> >

Utility template to get the tuple type of hetcompute::range, and GPU kernel argument types. Use case: get the return type of the before-synchronization lambda for a GPU pipeline stage.

The wrapped type can be accessed through call_tuple<...>::type.

Template Parameters

Dim Dimension of hetcompute::range.
Args... GPU kernel argument type list.

See Also

template<typename... Args> void add_gpu_stage(Args&&... args)

Data fields

Type        Field       Description
            type

9.2.1.3 class hetcompute::beta::cl_t

See Also

hetcompute::gpu_kernel

9.2.1.4 class hetcompute::cpu_kernel

template<typename Fn>
class hetcompute::cpu_kernel< Fn >

A cpu_kernel object contains CPU executable code. It can be used to create tasks. When such a task runs, it executes the function object in its cpu_kernel.

See Also

cpu_kernel<FReturnType(FArgs...)>

Public Types

• using args_tuple = typename parent::args_tuple

• using collapsed_task_type = typename parent::collapsed_task_type

• using non_collapsed_task_type = typename parent::non_collapsed_task_type


• using return_type = typename parent::return_type

• using size_type = typename parent::size_type

Public member functions

• cpu_kernel (Fn const &fn)

Constructor.

• cpu_kernel (Fn &&fn)

Constructor.

• cpu_kernel (cpu_kernel const &other)

Copy constructor.

• cpu_kernel (cpu_kernel &&other)

Move constructor.

• ∼cpu_kernel ()

Destructor.

• bool is_big () const

Returns whether a cpu_kernel object is meant for a big core in a big.LITTLE SoC.

• bool is_blocking () const

Returns whether this cpu_kernel object is blocking.

• bool is_little () const

Returns whether a cpu_kernel object is meant for a LITTLE core in a big.LITTLE SoC.

• bool is_prime () const

Returns whether a cpu_kernel object is meant for a prime core in a tricluster SoC.

• cpu_kernel & operator= (cpu_kernel const &other)

Copy assignment.

• cpu_kernel & operator= (cpu_kernel &&other)

Move assignment.

• cpu_kernel & set_big ()

Set this cpu_kernel object as meant for a big core in a big.LITTLE SoC.

• cpu_kernel & set_blocking ()

Set this cpu_kernel object as blocking.

• cpu_kernel & set_little ()

Set this cpu_kernel object as meant for a LITTLE core in a big.LITTLE SoC.

• cpu_kernel & set_prime ()

Set this cpu_kernel object as meant for a prime core in a tricluster SoC.


Static Public Attributes

• static HETCOMPUTE_CONSTEXPR_CONST size_type arity = parent::arity

Friends

• struct ::hetcompute::internal::cpu_kernel_caller

• template<typename X , typename Y , typename Z >

struct ::hetcompute::internal::task_factory

• template<typename X , typename Y >

struct ::hetcompute::internal::task_factory_dispatch

9.2.1.4.1 Constructors and Destructors

9.2.1.4.1.1 template<typename Fn > hetcompute::cpu_kernel< Fn >::cpu_kernel ( Fn const & fn ) [explicit]

Parameters

fn An lvalue function object.

9.2.1.4.1.2 template<typename Fn > hetcompute::cpu_kernel< Fn >::cpu_kernel ( Fn && fn ) [explicit]

Parameters

fn An rvalue function object.

9.2.1.4.1.3 template<typename Fn > hetcompute::cpu_kernel< Fn >::cpu_kernel ( cpu_kernel< Fn > const & other )

Parameters

other Another cpu_kernel object.

9.2.1.4.1.4 template<typename Fn > hetcompute::cpu_kernel< Fn >::cpu_kernel ( cpu_kernel< Fn > && other )

Parameters

other Another cpu_kernel object.
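Two details of the cpu_kernel interface listed above lend themselves to a small illustration: the factory's return type strips the reference from the deduced Fn, and the setters return cpu_kernel &, which permits chaining. The sketch below models both with a hypothetical stand-in class; toy_cpu_kernel, make_toy_cpu_kernel, and chained_setters_demo are not SDK names, and whether the real attributes can be combined in this way is an assumption.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for cpu_kernel<Fn>; stores the function object by value.
template <typename Fn>
class toy_cpu_kernel
{
public:
    explicit toy_cpu_kernel(Fn const& fn) : fn_(fn) {}
    explicit toy_cpu_kernel(Fn&& fn) : fn_(std::move(fn)) {}

    // Setters return *this so that calls can be chained.
    toy_cpu_kernel& set_big() { big_ = true; return *this; }
    toy_cpu_kernel& set_blocking() { blocking_ = true; return *this; }

    bool is_big() const { return big_; }
    bool is_blocking() const { return blocking_; }

private:
    Fn   fn_;
    bool big_ = false;
    bool blocking_ = false;
};

// Mirrors the listed factory signature: the reference is removed from Fn.
template <typename Fn>
toy_cpu_kernel<typename std::remove_reference<Fn>::type> make_toy_cpu_kernel(Fn&& fn)
{
    return toy_cpu_kernel<typename std::remove_reference<Fn>::type>(std::forward<Fn>(fn));
}

bool chained_setters_demo()
{
    int const offset = 1;
    auto add_offset = [offset](int x) { return x + offset; };

    // Lvalue argument: Fn deduces to a reference, which remove_reference strips.
    auto kernel = make_toy_cpu_kernel(add_offset);
    static_assert(std::is_same<decltype(kernel), toy_cpu_kernel<decltype(add_offset)>>::value,
                  "the kernel type is parameterized on the plain function object type");

    kernel.set_big().set_blocking();
    return kernel.is_big() && kernel.is_blocking();
}
```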


9.2.1.5 class hetcompute::cpu_kernel< FReturnType(FArgs...)>

template<typename FReturnType, typename... FArgs>
class hetcompute::cpu_kernel< FReturnType(FArgs...)>

A cpu_kernel object contains CPU executable code. It can be used to create tasks. When such a task runs, it executes the function in its cpu_kernel.

See Also

cpu_kernel<FReturnType(FArgs...)>

Public Types

• using args_tuple = typename parent::args_tuple

• using collapsed_task_type = typename parent::collapsed_task_type

• using non_collapsed_task_type = typename parent::non_collapsed_task_type

• using return_type = typename parent::return_type

• using size_type = typename parent::size_type

Public member functions

• cpu_kernel (FReturnType(∗fn)(FArgs...))

Constructor.

• cpu_kernel (cpu_kernel const &other)

Copy constructor.

• cpu_kernel (cpu_kernel &&other)

Move constructor.

• ∼cpu_kernel ()

Destructor.

• bool is_big () const

Returns whether a cpu_kernel object is meant for a big core in a big.LITTLE SoC.

• bool is_blocking () const

Returns whether a cpu_kernel object is blocking.

• bool is_little () const

Returns whether a cpu_kernel object is meant for a LITTLE core in a big.LITTLE SoC.

• bool is_prime () const

Returns whether a cpu_kernel object is meant for a prime core in a tricluster SoC.

• cpu_kernel & operator= (cpu_kernel const &other)

Copy assignment.

• cpu_kernel & operator= (cpu_kernel &&other)


Move assignment.

• cpu_kernel & set_big ()

Set this cpu_kernel object as meant for a big core in a big.LITTLE SoC.

• cpu_kernel & set_blocking ()

Set a cpu_kernel object as blocking.

• cpu_kernel & set_little ()

Set this cpu_kernel object as meant for a LITTLE core in a big.LITTLE SoC.

• cpu_kernel & set_prime ()

Set this cpu_kernel object as meant for a prime core in a tricluster SoC.

Static Public Attributes

• static HETCOMPUTE_CONSTEXPR_CONST size_type arity = parent::arity

Friends

• struct ::hetcompute::internal::cpu_kernel_caller

• template<typename X , typename Y , typename Z >

struct ::hetcompute::internal::task_factory

• template<typename X , typename Y >

struct ::hetcompute::internal::task_factory_dispatch

9.2.1.5.1 Constructors and Destructors

9.2.1.5.1.1 template<typename FReturnType , typename... FArgs> hetcompute::cpu_kernel<FReturnType(FArgs...)>::cpu_kernel ( FReturnType(∗)(FArgs...) fn ) [explicit]

Parameters

fn A function name or function pointer.

9.2.1.5.1.2 template<typename FReturnType , typename... FArgs> hetcompute::cpu_kernel<FReturnType(FArgs...)>::cpu_kernel ( cpu_kernel< FReturnType(FArgs...)> const & other )

Parameters

other Another cpu_kernel object.


9.2.1.5.1.3 template<typename FReturnType , typename... FArgs> hetcompute::cpu_kernel<FReturnType(FArgs...)>::cpu_kernel ( cpu_kernel< FReturnType(FArgs...)> && other )

Parameters

other Another cpu_kernel object.

9.2.1.6 class hetcompute::dsp_kernel

template<typename Fn>
class hetcompute::dsp_kernel< Fn >

9.2.1.7 class hetcompute::dsp_kernel< int(∗)(Args...)>

template<typename... Args>
class hetcompute::dsp_kernel< int(∗)(Args...)>

For this DSP kernel, the template signature corresponds to the DSP kernel’s parameter list.

Template Parameters

Args Arguments of the DSP function run by the kernel.

See Also

create_dsp_kernel() for creating a dsp_kernel.

Public Types

• using args_tuple = typename parent::args_tuple

• using collapsed_task_type = typename parent::collapsed_task_type

• using fn_type = typename parent::dsp_code_type

• using non_collapsed_task_type = typename parent::non_collapsed_task_type

• using return_type = typename parent::return_type

• using size_type = typename parent::size_type

Public member functions

• dsp_kernel (fn_type const &fn)

• dsp_kernel (fn_type &&fn)

• dsp_kernel (dsp_kernel const &other)

• dsp_kernel (dsp_kernel &&other)

• ∼dsp_kernel ()

• dsp_kernel & operator= (dsp_kernel const &other)

• dsp_kernel & operator= (dsp_kernel &&other)

• void set_adsp ()


• void set_cdsp ()

Static Public Attributes

• static HETCOMPUTE_CONSTEXPR_CONST size_type arity = parent::arity

Friends

• template<typename X , typename Y , typename Z >

struct ::hetcompute::internal::task_factory

• template<typename X , typename Y >

struct ::hetcompute::internal::task_factory_dispatch

9.2.1.7.1 Constructors and Destructors

9.2.1.7.1.1 template<typename... Args> hetcompute::dsp_kernel< int(∗)(Args...)>::dsp_kernel (fn_type const & fn ) [explicit]

Constructor

Parameters

fn The DSP function to be called.

9.2.1.7.1.2 template<typename... Args> hetcompute::dsp_kernel< int(∗)(Args...)>::dsp_kernel (fn_type && fn ) [explicit]

Constructor

Parameters

fn The DSP function to be called.

9.2.1.7.1.3 template<typename... Args> hetcompute::dsp_kernel< int(∗)(Args...)>::dsp_kernel (dsp_kernel< int(∗)(Args...)> && other )

Move constructor.

9.2.1.7.1.4 template<typename... Args> hetcompute::dsp_kernel< int(∗)(Args...)>::∼dsp_kernel ( )

Destructor.


9.2.1.7.2 Member Function Documentation

9.2.1.7.2.1 template<typename... Args> dsp_kernel& hetcompute::dsp_kernel< int(∗)(Args...)>::operator= ( dsp_kernel< int(∗)(Args...)> const & other )

Copy assignment operator.

9.2.1.7.2.2 template<typename... Args> dsp_kernel& hetcompute::dsp_kernel< int(∗)(Args...)>::operator= ( dsp_kernel< int(∗)(Args...)> && other )

Move assignment operator.

9.2.1.8 class hetcompute::beta::gl_t

See Also

hetcompute::gpu_kernel

9.2.1.9 class hetcompute::gpu_kernel

template<typename... Args>
class hetcompute::gpu_kernel< Args >

A wrapper around OpenCL C kernels and OpenGL ES compute shaders for GPU compute. The template signature corresponds to the GPU kernel parameter list.

See Also

hetcompute::create_gpu_kernel for creating a gpu_kernel.

Public member functions

• gpu_kernel (std::string const &cl_kernel_str, std::string const &cl_kernel_name, std::string const &cl_build_options="")

Constructor, implicit for OpenCL kernel.

• gpu_kernel (beta::cl_t const &, std::string const &cl_kernel_str, std::string const &cl_kernel_name, std::string const &cl_build_options="")

Constructor, explicit for OpenCL kernel.

• gpu_kernel (beta::gl_t const &, std::string const &gl_kernel_str)

Constructor, explicit for OpenGL ES compute kernel.

• gpu_kernel (void const ∗cl_kernel_bin, size_t cl_kernel_len, std::string const &cl_kernel_name, std::string const &cl_build_options="")

Constructor, implicit for precompiled OpenCL kernel.

• gpu_kernel (beta::cl_t const &, void const ∗cl_kernel_bin, size_t cl_kernel_len, std::string const &cl_kernel_name, std::string const &cl_build_options="")

Constructor, explicit for precompiled OpenCL kernel.

Constructor, explicit for precompiled OpenCL kernel.

• std::pair< void const ∗, size_t > get_cl_kernel_binary () const


Extracts the CL binary. It is an error to invoke this on a non-OpenCL gpu_kernel.

• HETCOMPUTE_DEFAULT_METHOD (gpu_kernel(gpu_kernel const &))

• HETCOMPUTE_DEFAULT_METHOD (gpu_kernel &operator=(gpu_kernel const &))

• HETCOMPUTE_DEFAULT_METHOD (gpu_kernel(gpu_kernel &&))

• HETCOMPUTE_DEFAULT_METHOD (gpu_kernel &operator=(gpu_kernel &&))

• bool is_cl () const

Identifies if kernel type is OpenCL.

• bool is_gl () const

Identifies if kernel type is OpenGL ES.

Friends

• template<typename GPUKernel , typename... CallArgs>

void hetcompute::internal::execute_gpu (GPUKernel const &gk, CallArgs &&...args)

• template<typename GPUKernel , size_t Dims, typename... CallArgs>

struct hetcompute::internal::executor

• template<typename Code , class Enable >

struct hetcompute::internal::task_factory_dispatch

9.2.1.9.1 Constructors and Destructors

9.2.1.9.1.1 template<typename... Args> hetcompute::gpu_kernel< Args >::gpu_kernel ( std::string const & cl_kernel_str, std::string const & cl_kernel_name, std::string const & cl_build_options = "" )

Constructor, implicit for OpenCL kernel.

Parameters

cl_kernel_str The OpenCL C kernel code as a string.
cl_kernel_name The name of the kernel function to be called.
cl_build_options Build options to pass to OpenCL (optional).

9.2.1.9.1.2 template<typename... Args> hetcompute::gpu_kernel< Args >::gpu_kernel ( beta::cl_t const & , std::string const & cl_kernel_str, std::string const & cl_kernel_name, std::string const & cl_build_options = "" )

Constructor, explicit for OpenCL kernel.

• Pass hetcompute::beta::cl to explicitly select this OpenCL kernel constructor.


Parameters

cl_kernel_str The OpenCL C kernel code as a string.
cl_kernel_name The name of the kernel function to be called.
cl_build_options Build options to pass to OpenCL (optional).

9.2.1.9.1.3 template<typename... Args> hetcompute::gpu_kernel< Args >::gpu_kernel ( beta::gl_t const & , std::string const & gl_kernel_str )

Constructor, explicit for OpenGL ES compute kernel.

• Pass hetcompute::beta::gl to explicitly select this OpenGL ES kernel constructor.

Parameters

gl_kernel_str The OpenGL ES shader code as a string.

9.2.1.9.1.4 template<typename... Args> hetcompute::gpu_kernel< Args >::gpu_kernel ( void const ∗ cl_kernel_bin, size_t cl_kernel_len, std::string const & cl_kernel_name, std::string const & cl_build_options = "" )

Constructor, implicit for precompiled OpenCL kernel.

Parameters

cl_kernel_bin The pointer to a precompiled OpenCL C kernel binary.
cl_kernel_len The length of the precompiled kernel in bytes.
cl_kernel_name The name of the kernel function to be called.
cl_build_options Build options to pass to OpenCL (optional).
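One possible way to obtain the cl_kernel_bin and cl_kernel_len arguments is to read a previously saved binary from disk. The following is a minimal sketch under that assumption; the helper name read_binary_file is hypothetical and not part of the SDK, and the sketch does not call the SDK itself.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the SDK): reads a whole file into memory.
std::vector<char> read_binary_file(std::string const& path)
{
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    std::streamsize const len = in.tellg();
    in.seekg(0, std::ios::beg);

    std::vector<char> bytes(static_cast<std::size_t>(len));
    if (len > 0 && !in.read(bytes.data(), len))
        throw std::runtime_error("cannot read " + path);
    return bytes;
}
```

Under this assumption, bytes.data() would be passed as cl_kernel_bin and bytes.size() as cl_kernel_len.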

9.2.1.9.1.5 template<typename... Args> hetcompute::gpu_kernel< Args >::gpu_kernel ( beta::cl_t const & , void const ∗ cl_kernel_bin, size_t cl_kernel_len, std::string const & cl_kernel_name, std::string const & cl_build_options = "" )

Constructor, explicit for precompiled OpenCL kernel.

• Pass hetcompute::beta::cl to explicitly select this OpenCL kernel constructor.

Parameters

cl_kernel_bin The pointer to a precompiled OpenCL C kernel binary.
cl_kernel_len The length of the precompiled kernel in bytes.
cl_kernel_name The name of the kernel function to be called.
cl_build_options Build options to pass to OpenCL (optional).

9.2.1.9.2 Member Function Documentation

9.2.1.9.2.1 template<typename... Args> std::pair<void const∗, size_t> hetcompute::gpu_kernel< Args>::get_cl_kernel_binary ( ) const

Extracts the CL binary. It is an error to invoke this on a non-OpenCL gpu_kernel.

Returns

std::pair consisting of a pointer to an allocated buffer holding the CL binary and the size of the allocated buffer (sized to hold the binary) in bytes.

Note

Each invocation of this function internally allocates a new buffer of an appropriate size using new[]. The user code is responsible for deleting the buffer after use by calling delete[].

9.2.1.9.2.2 template<typename... Args> bool hetcompute::gpu_kernel< Args >::is_cl ( ) const

Identifies if kernel type is OpenCL.

Returns

true if this is an OpenCL kernel, false otherwise.

9.2.1.9.2.3 template<typename... Args> bool hetcompute::gpu_kernel< Args >::is_gl ( ) const

Identifies if kernel type is OpenGL ES.

Returns

true if this is an OpenGL ES kernel, false otherwise.

9.2.1.10 class hetcompute::local

template<typename T>class hetcompute::local< T >

Used as a template parameter to hetcompute::gpu_kernel to indicate a locally allocated parameter. Corresponds to a __local parameter of an OpenCL kernel. During task creation or launch, the corresponding argument takes the size of the local allocation in number of elements of type T.

See Also

hetcompute::gpu_kernel

Example:

const char* kernel_string = "__kernel void k(__global int *a,"
                            "                __global int *b,"
                            "                __local int *c){"
                            "  ...}";

hetcompute::gpu_kernel<hetcompute::buffer_ptr<int>,
                       hetcompute::buffer_ptr<int>,
                       hetcompute::local<int>> gk(kernel_string, "k");

// pass __local size in number of elements (not number of bytes as for OpenCL)
int number_of_ints = number_of_bytes / sizeof(int);
auto t = hetcompute::create_task(gk, r, buf_a, buf_b, number_of_ints);
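The byte-to-element conversion used for the hetcompute::local argument can be sketched in isolation. The helper name local_elements_from_bytes is hypothetical (not part of the SDK); it only illustrates the arithmetic and checks that the byte count is a whole number of elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the SDK): converts a byte count, as used
// for OpenCL __local arguments, into a count of elements of type T.
template <typename T>
std::size_t local_elements_from_bytes(std::size_t number_of_bytes)
{
    // The byte count is expected to be a whole number of elements.
    assert(number_of_bytes % sizeof(T) == 0);
    return number_of_bytes / sizeof(T);
}
```

For example, a local allocation of 64 * sizeof(int) bytes corresponds to 64 elements of type int.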

9.2.2 Function Documentation

9.2.2.1 template<typename FReturnType , typename... FArgs> hetcompute::cpu_kernel<FReturnType(FArgs...)> hetcompute::create_cpu_kernel ( FReturnType(∗)(FArgs...) fn )

Create a cpu_kernel object that executes a given function. This kernel object can then be used to create a task.

Parameters

fn A function name or function pointer.

Returns

cpu_kernel A kernel object that executes the given function.

See Also

create_cpu_kernel(Fn&& fn)

9.2.2.2 template<typename Fn > hetcompute::cpu_kernel<typename std::remove_reference<Fn>::type> hetcompute::create_cpu_kernel ( Fn && fn )

Create a cpu_kernel object that executes a given function object. A function object (also called a functor) is any object with the () operator defined, such as a lambda expression. This kernel object can then be used to create a task.

Parameters

fn A function object such as a lambda expression.

Returns

cpu_kernel A kernel object that executes the given function object.

See Also

create_cpu_kernel(FReturnType(∗fn)(FArgs...))

9.2.2.3 template<typename... Args> hetcompute::dsp_kernel<int (∗)(Args...)> hetcompute::create_dsp_kernel ( int(∗)(Args...) fn )

This template creates a DSP kernel executable by Qualcomm HetCompute SDK. The template signature corresponds to the DSP kernel parameter list.

The kernel code is specified as a C language function. The function returns an int that corresponds to the status. When the function returns something other than 0, the Qualcomm HetCompute runtime will trigger a hetcompute::dsp_exception().

Template Parameters

dsp_function The DSP function pointer.

// dsp kernel creation
auto hex_kernel = create_dsp_kernel_by_domain<const int*, int, int*, int>(adsp_domain_handle, "hetcompute_dsp_array_is_prime");
// Set DSP Kernel attribute to aDSP
hex_kernel.set_adsp();

// create the DSP task that will be executed on the DSP
auto hex_task = hetcompute::create_task(hex_kernel,
                                        in_buf,   // in access recognized
                                        out_buf); // out access recognized
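The status convention described for DSP kernels (0 means success, any other value raises an exception) can be illustrated without the SDK. In this sketch, dsp_status_error is a stand-in for hetcompute::dsp_exception, and array_sum and run_and_check are hypothetical names; the sketch only mimics the documented convention and makes no claim about how the runtime implements it.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for hetcompute::dsp_exception so the sketch builds without the SDK.
struct dsp_status_error : std::runtime_error
{
    explicit dsp_status_error(int status)
        : std::runtime_error("DSP kernel returned status " + std::to_string(status)) {}
};

// A DSP-style C function: returns 0 on success, nonzero on failure.
int array_sum(const int* in, int len, int* out)
{
    if (in == nullptr || out == nullptr || len < 0)
        return -1; // nonzero status signals failure
    int sum = 0;
    for (int i = 0; i < len; ++i)
        sum += in[i];
    *out = sum;
    return 0;
}

// Calls the function and turns a nonzero status into an exception.
template <typename... Params, typename... Args>
void run_and_check(int (*fn)(Params...), Args&&... args)
{
    int const status = fn(std::forward<Args>(args)...);
    if (status != 0)
        throw dsp_status_error(status);
}
```

A caller would wrap the launch in a try block and handle the exception type; with the SDK that type would be hetcompute::dsp_exception rather than the stand-in used here.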

9.2.2.4 template<typename... Args> gpu_kernel<Args...> hetcompute::create_gpu_kernel ( std::string const & cl_kernel_str, std::string const & cl_kernel_name, std::string const & cl_build_options = "" )

Creates a GPU kernel executable by HETCOMPUTE, implicitly for OpenCL. The template signature corresponds to the GPU kernel parameter list.

The kernel code is specified as a string of OpenCL C code.

Equivalent to calling the hetcompute::gpu_kernel constructor directly.

Parameters

cl_kernel_str The OpenCL C kernel code as a string.
cl_kernel_name The name of the kernel function to be called.
cl_build_options The build options to pass to OpenCL (optional).

Returns

A gpu_kernel object.

9.2.2.5 template<typename... Args> gpu_kernel<Args...> hetcompute::beta::create_gpu_kernel ( beta::cl_t const & , std::string const & cl_kernel_str, std::string const & cl_kernel_name, std::string const & cl_build_options = "" )

Creates a GPU kernel executable by HETCOMPUTE, explicitly for OpenCL. The template signature corresponds to the GPU kernel parameter list.

The kernel code is specified as a string of OpenCL C code.

Equivalent to calling the hetcompute::gpu_kernel constructor directly.

• Pass hetcompute::beta::cl to explicitly select OpenCL.

Parameters

cl_kernel_str The OpenCL C kernel code as a string.
cl_kernel_name The name of the kernel function to be called.
cl_build_options The build options to pass to OpenCL (optional).

Returns

A gpu_kernel object.

9.2.2.6 template<typename... Args> gpu_kernel<Args...> hetcompute::beta::create_gpu_kernel ( beta::gl_t const & , std::string const & gl_kernel_str )

Creates a GPU kernel executable by HETCOMPUTE, explicitly for OpenGL ES. The template signature corresponds to the GPU kernel parameter list.

The OpenGL ES shader code is specified as a string.

Equivalent to calling the hetcompute::gpu_kernel constructor directly.

• Pass hetcompute::beta::gl to explicitly select OpenGL ES.

Parameters

gl_kernel_str The OpenGL ES shader code as a string.

Returns

A gpu_kernel object.

9.2.2.7 template<typename... Args> gpu_kernel<Args...> hetcompute::create_gpu_kernel ( void const ∗ cl_kernel_bin, size_t cl_kernel_len, std::string const & cl_kernel_name, std::string const & cl_build_options = "" )

Creates a GPU kernel executable by HETCOMPUTE, implicitly for OpenCL, using a precompiled OpenCL kernel. The template signature corresponds to the GPU kernel parameter list.

The kernel code is specified as a prebuilt OpenCL kernel binary.

Equivalent to calling the hetcompute::gpu_kernel constructor directly.

Parameters

cl_kernel_bin The pointer to a precompiled OpenCL C kernel binary.
cl_kernel_len The length of the precompiled kernel in bytes.
cl_kernel_name The name of the kernel function to be called.
cl_build_options The build options to pass to OpenCL (optional).

Returns

A gpu_kernel object.

9.2.2.8 template<typename... Args> gpu_kernel<Args...> hetcompute::beta::create_gpu_kernel ( beta::cl_t const & , void const ∗ cl_kernel_bin, size_t cl_kernel_len, std::string const & cl_kernel_name, std::string const & cl_build_options = "" )

Creates a GPU kernel executable by HETCOMPUTE, explicitly for OpenCL, using a precompiled OpenCL kernel. The template signature corresponds to the GPU kernel parameter list.

The kernel code is specified as a prebuilt OpenCL kernel binary.

Equivalent to calling the hetcompute::gpu_kernel constructor directly.

• Pass hetcompute::beta::cl to explicitly select OpenCL.

Parameters

cl_kernel_bin The pointer to a precompiled OpenCL C kernel binary.
cl_kernel_len The length of the precompiled kernel in bytes.
cl_kernel_name The name of the kernel function to be called.
cl_build_options The build options to pass to OpenCL (optional).

Returns

A gpu_kernel object.

9.2.3 Variable Documentation

9.2.3.1 cl_t const hetcompute::beta::cl

See Also

hetcompute::gpu_kernel

9.2.3.2 gl_t const hetcompute::beta::gl {}

See Also

hetcompute::gpu_kernel

9.3 Indices

Classes

• class hetcompute::index< Dims >

• class hetcompute::index< 1 >

• class hetcompute::index< 2 >

• class hetcompute::index< 3 >

• class hetcompute::index_base< Dims >

9.3.1 Class Documentation

9.3.1.1 class hetcompute::index

template<size_t Dims>class hetcompute::index< Dims >

9.3.1.2 class hetcompute::index< 1 >

template<>class hetcompute::index< 1 >

Public member functions

• index (const std::array< size_t, 1 > &rhs)

• index (size_t i)

• void print () const

Additional Inherited Members

9.3.1.3 class hetcompute::index< 2 >

template<>class hetcompute::index< 2 >

Public member functions

• index (const std::array< size_t, 2 > &rhs)

• index (size_t i, size_t j)

• void print () const

Additional Inherited Members

9.3.1.4 class hetcompute::index< 3 >

template<>class hetcompute::index< 3 >

Public member functions

• index (const std::array< size_t, 3 > &rhs)

• index (size_t i, size_t j, size_t k)

• void print () const

Additional Inherited Members

9.3.1.5 class hetcompute::index_base

template<size_t Dims>class hetcompute::index_base< Dims >

Methods common to 1D, 2D, and 3D index objects are listed here. The value for Dims can be 1, 2, or 3.

Public member functions

• index_base (const std::array< size_t, Dims > &rhs)

• const std::array< size_t, Dims > & data () const

• bool operator!= (const index_base< Dims > &rhs) const

• index_base< Dims > operator+ (const index_base< Dims > &rhs)

• index_base< Dims > & operator+= (const index_base< Dims > &rhs)

• index_base< Dims > operator- (const index_base< Dims > &rhs)

• index_base< Dims > & operator-= (const index_base< Dims > &rhs)

• bool operator< (const index_base< Dims > &rhs) const

• bool operator<= (const index_base< Dims > &rhs) const

• index_base< Dims > & operator= (const index_base< Dims > &rhs)

• bool operator== (const index_base< Dims > &rhs) const

• bool operator> (const index_base< Dims > &rhs) const

• bool operator>= (const index_base< Dims > &rhs) const

• size_t & operator[ ] (size_t i)

• const size_t & operator[ ] (size_t i) const

Protected Attributes

• std::array< size_t, Dims > _data

9.3.1.5.1 Constructors and Destructors

9.3.1.5.1.1 template<size_t Dims> hetcompute::index_base< Dims >::index_base ( const std::array< size_t, Dims > & rhs ) [explicit]

Constructs an index_base object from a std::array.

Parameters

rhs std::array to be used for constructing a new object.

9.3.1.5.2 Member Function Documentation

9.3.1.5.2.1 template<size_t Dims> const std::array<size_t, Dims>& hetcompute::index_base< Dims>::data ( ) const

Returns a reference to an std::array of all coordinates of index_base object.

Returns

Const reference to an std::array of all coordinates of an index_base object.

9.3.1.5.2.2 template<size_t Dims> bool hetcompute::index_base< Dims >::operator!= ( const index_base< Dims > & rhs ) const

Checks for inequality of this with another index_base object.

Parameters

rhs Reference to index_base to be compared with this.

Returns

TRUE – The two indices have different values.
FALSE – The two indices have the same values.

9.3.1.5.2.3 template<size_t Dims> index_base<Dims> hetcompute::index_base< Dims >::operator+ (const index_base< Dims > & rhs )

Sums the corresponding values of the current index_base object and another index_base object and returns a new index_base object.

Parameters

rhs index_base object to be used for summing with the values of the current object.

9.3.1.5.2.4 template<size_t Dims> index_base<Dims>& hetcompute::index_base< Dims >::operator+= ( const index_base< Dims > & rhs )

Sums the corresponding values of the current index_base object and another index_base object and returns a reference to the current index_base object.

Parameters

rhs index_base object to be used for summing with the values of the current object.

9.3.1.5.2.5 template<size_t Dims> index_base<Dims> hetcompute::index_base< Dims >::operator- (const index_base< Dims > & rhs )

Subtracts the values of another index_base object from the corresponding values of the current index_base object and returns a new index_base object.

Parameters

rhs index_base object whose values are subtracted from the values of the current object.

9.3.1.5.2.6 template<size_t Dims> index_base<Dims>& hetcompute::index_base< Dims >::operator-= ( const index_base< Dims > & rhs )

Subtracts the values of another index_base object from the corresponding values of the current index_base object and returns a reference to the current index_base object.

Parameters

rhs index_base object whose values are subtracted from the values of the current object.
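The element-wise behavior described for operator+ and operator- can be sketched on plain std::array values. This is an illustration only: the coords alias and the add and subtract functions are hypothetical stand-ins, not the SDK's index_base implementation.

```cpp
#include <array>
#include <cstddef>

// Stand-in for the coordinates held by index_base<Dims>.
template <std::size_t Dims>
using coords = std::array<std::size_t, Dims>;

// Adds corresponding coordinates and returns a new value.
template <std::size_t Dims>
coords<Dims> add(coords<Dims> lhs, coords<Dims> const& rhs)
{
    for (std::size_t d = 0; d < Dims; ++d)
        lhs[d] += rhs[d];
    return lhs;
}

// Subtracts rhs from lhs coordinate by coordinate and returns a new value.
// Coordinates are unsigned, so a larger rhs coordinate wraps around.
template <std::size_t Dims>
coords<Dims> subtract(coords<Dims> lhs, coords<Dims> const& rhs)
{
    for (std::size_t d = 0; d < Dims; ++d)
        lhs[d] -= rhs[d];
    return lhs;
}
```

With lhs = (4, 5, 6) and rhs = (1, 2, 3), add yields (5, 7, 9) and subtract yields (3, 3, 3).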

9.3.1.5.2.7 template<size_t Dims> bool hetcompute::index_base< Dims >::operator< ( const index_base< Dims > & rhs ) const

Checks if this object is less than another index_base object. Performs a lexicographical comparison of two index_base objects, similar to std::lexicographical_compare().

Parameters

rhs Reference to the index_base to be compared with this.

Returns

TRUE – If this is lexicographically smaller than rhs.
FALSE – If this is lexicographically larger than or equal to rhs.

9.3.1.5.2.8 template<size_t Dims> bool hetcompute::index_base< Dims >::operator<= ( const index_base< Dims > & rhs ) const

Checks if this object is less than or equal to another index_base object. Performs a lexicographical comparison of two index_base objects, similar to std::lexicographical_compare().

Parameters

rhs Reference to index_base to be compared with this.

Returns

TRUE – If this is lexicographically smaller than or equal to rhs.
FALSE – If this is lexicographically larger than rhs.

9.3.1.5.2.9 template<size_t Dims> index_base<Dims>& hetcompute::index_base< Dims >::operator=( const index_base< Dims > & rhs )

Replaces the contents of the current index_base object with those of another index_base object.

Parameters

rhs index_base object to be used for replacing the contents of current object.

9.3.1.5.2.10 template<size_t Dims> bool hetcompute::index_base< Dims >::operator== ( const index_base< Dims > & rhs ) const

Compares this with another index_base object.

Parameters

rhs Reference to index_base to be compared with this.

Returns

TRUE – The two indices have the same values.
FALSE – The two indices have different values.

9.3.1.5.2.11 template<size_t Dims> bool hetcompute::index_base< Dims >::operator> ( const index_base< Dims > & rhs ) const

Checks if this object is greater than another index_base object. Performs a lexicographical comparison of two index_base objects, similar to std::lexicographical_compare().

Parameters

rhs Reference to index_base to be compared with this.

Returns

TRUE – If this is lexicographically larger than rhs.
FALSE – If this is lexicographically smaller than or equal to rhs.

9.3.1.5.2.12 template<size_t Dims> bool hetcompute::index_base< Dims >::operator>= ( const index_base< Dims > & rhs ) const

Checks if this object is greater than or equal to another index_base object. Performs a lexicographical comparison of two index_base objects, similar to std::lexicographical_compare().

Parameters

rhs Reference to index_base to be compared with this.

Returns

TRUE – If this is lexicographically larger than or equal to rhs.
FALSE – If this is lexicographically smaller than rhs.
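The lexicographical ordering that the relational operators are described as following can be sketched with std::lexicographical_compare on plain std::array coordinates. The function names lex_less and lex_less_equal are hypothetical; the sketch shows the ordering itself, not the SDK's implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// True if lhs comes before rhs when coordinates are compared from the
// first dimension onward.
template <std::size_t Dims>
bool lex_less(std::array<std::size_t, Dims> const& lhs,
              std::array<std::size_t, Dims> const& rhs)
{
    return std::lexicographical_compare(lhs.begin(), lhs.end(),
                                        rhs.begin(), rhs.end());
}

// "Less than or equal" expressed through lex_less: lhs <= rhs exactly
// when rhs is not less than lhs.
template <std::size_t Dims>
bool lex_less_equal(std::array<std::size_t, Dims> const& lhs,
                    std::array<std::size_t, Dims> const& rhs)
{
    return !lex_less(rhs, lhs);
}
```

Under this ordering (1, 9) is less than (2, 0), because the first coordinate decides the comparison before the second is considered.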

9.3.1.5.2.13 template<size_t Dims> size_t& hetcompute::index_base< Dims >::operator[ ] ( size_t i )

Returns a reference to i-th coordinate of index_base object. No bounds checking is performed.

Parameters

i Specifies which coordinate to return.

Returns

Reference to i-th coordinate of the index_base object.

9.3.1.5.2.14 template<size_t Dims> const size_t& hetcompute::index_base< Dims >::operator[ ] (size_t i ) const

Returns a const reference to i-th coordinate of index_base object. No bounds checking is performed.

Parameters

i Specifies which coordinate to return.

Returns

Const reference to i-th coordinate of the index_base object.

9.4 Ranges

Classes

• class hetcompute::range< Dims >

• class hetcompute::range< 1 >

• class hetcompute::range< 2 >

• class hetcompute::range< 3 >

• class hetcompute::range_base< Dims >

9.4.1 Class Documentation

9.4.1.1 class hetcompute::range

template<size_t Dims>class hetcompute::range< Dims >

9.4.1.2 class hetcompute::range< 1 >

template<>class hetcompute::range< 1 >

A 1-dimensional range.

// 1d vector sum using hetcompute::range<1>
constexpr size_t N = 100; // size of vector
std::vector<size_t> v(N);
hetcompute::range<1> r(0, N);
std::atomic<size_t> sum(0);

// initialize the vector
for (size_t i = 0; i < N; i++)
    v[i] = i + 1;

// compute the sum in parallel
hetcompute::pfor_each(r, [v, &sum](hetcompute::index<1> idx) {
    sum += v[idx[0]];
});

Public member functions

• range ()

• range (const std::array< size_t, 1 > &bb, const std::array< size_t, 1 > &ee)

• range (const std::array< size_t, 1 > &bb, const std::array< size_t, 1 > &ee, const std::array< size_t, 1 > &ss)

• range (size_t b0, size_t e0, size_t s0)

• range (size_t b0, size_t e0)

• range (size_t e0)

• size_t index_to_linear (const hetcompute::index< 1 > &it) const

• bool is_empty () const

• hetcompute::index< 1 > linear_to_index (size_t idx) const

• size_t linearized_distance () const

• void print () const

• size_t size () const

Additional Inherited Members

9.4.1.2.1 Constructors and Destructors

9.4.1.2.1.1 hetcompute::range< 1 >::range ( )

Creates an empty 1D range.

9.4.1.2.1.2 hetcompute::range< 1 >::range ( size_t b0, size_t e0, size_t s0 )

Creates a 1D range that spans [b0, e0) and is incremented in steps of s0. It will cause a fatal error if b0 is greater than or equal to e0. s0 should be greater than 0.

Parameters

b0 Beginning of 1D range.
e0 End of 1D range.
s0 Stride of 1D range.
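To make the half-open, strided interpretation concrete, the following sketch lists the points covered by [b0, e0) with stride s0. The function strided_points is a hypothetical helper and does not use the SDK; its assertions mirror the documented preconditions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the SDK): lists the points of the
// half-open range [b0, e0) visited with stride s0.
std::vector<std::size_t> strided_points(std::size_t b0, std::size_t e0, std::size_t s0)
{
    assert(b0 < e0); // documented as a fatal error otherwise
    assert(s0 > 0);  // documented requirement on the stride

    std::vector<std::size_t> points;
    for (std::size_t i = b0; i < e0; i += s0)
        points.push_back(i);
    return points;
}
```

For b0 = 0, e0 = 10, s0 = 3 the points are 0, 3, 6, and 9; the end value 10 is excluded.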

9.4.1.2.1.3 hetcompute::range< 1 >::range ( size_t b0, size_t e0 ) [explicit]

Creates a 1D range, spans from [b0, e0).

It will cause a fatal error if b0 is greater than or equal to e0.

Parameters

b0 Beginning of 1D range.
e0 End of 1D range.

9.4.1.2.1.4 hetcompute::range< 1 >::range ( size_t e0 ) [explicit]

Creates a 1D range, spans from [0, e0).

Parameters

e0 End of 1D range.

9.4.1.2.2 Member Function Documentation

9.4.1.2.2.1 size_t hetcompute::range< 1 >::index_to_linear ( const hetcompute::index< 1 > & it ) const

Converts a hetcompute::index<1> object to a linear number with respect to the current range object.

Parameters

it hetcompute::index<1> object

Returns

A linear ID with respect to the current range object.

9.4.1.2.2.2 hetcompute::index<1> hetcompute::range< 1 >::linear_to_index ( size_t idx ) const

Converts a linear ID to a hetcompute::index<1> with respect to the current range object.

Parameters

idx Linear ID to be converted to an hetcompute::index<1> object.

Returns

hetcompute::index<1> with respect to the current range object.

9.4.1.2.2.3 size_t hetcompute::range< 1 >::linearized_distance ( ) const

Returns the linearized distance of the range.

Returns

Linearized distance of the range, the product of length() in each dimension.

9.4.1.2.2.4 size_t hetcompute::range< 1 >::size ( ) const

Returns the size of the range.

Returns

Size of the range, product of the number of elements in each dimension.

9.4.1.3 class hetcompute::range< 2 >

template<>class hetcompute::range< 2 >

A 2-dimensional range.

// fill in a 2d matrix by tiles
constexpr size_t N = 20; // size of matrix
constexpr size_t TILE_SIZE = 5;
size_t a[N][N];

// define a 2d range with stride
hetcompute::range<2> r(0, N, TILE_SIZE, 0, N, TILE_SIZE);

// fill in each tile with the linear index of the tile
hetcompute::pfor_each(r, [&a, r, TILE_SIZE](hetcompute::index<2> idx) {
    size_t id = r.index_to_linear(idx);
    // iterate through the tile
    hetcompute::range<2> rt(idx[0], idx[0] + TILE_SIZE, idx[1], idx[1] + TILE_SIZE);
    for (size_t i = rt.begin(0); i < rt.end(0); i++)
        for (size_t j = rt.begin(1); j < rt.end(1); j++)
            a[i][j] = id;
});

Public member functions

• range ()

• range (const std::array< size_t, 2 > &bb, const std::array< size_t, 2 > &ee)

• range (const std::array< size_t, 2 > &bb, const std::array< size_t, 2 > &ee, const std::array<size_t, 2 > &ss)

• range (size_t b0, size_t e0, size_t s0, size_t b1, size_t e1, size_t s1)

• range (size_t b0, size_t e0, size_t b1, size_t e1)

• range (size_t e0, size_t e1)

• size_t index_to_linear (const hetcompute::index< 2 > &it) const

• bool is_empty () const

• hetcompute::index< 2 > linear_to_index (size_t idx) const

• size_t linearized_distance () const

• void print () const

• size_t size () const

Additional Inherited Members

9.4.1.3.1 Constructors and Destructors

9.4.1.3.1.1 hetcompute::range< 2 >::range ( )

Creates an empty 2D range.

9.4.1.3.1.2 hetcompute::range< 2 >::range ( size_t b0, size_t e0, size_t s0, size_t b1, size_t e1, size_t s1 )

Creates a 2D range comprising the points of the cross product [b0:e0:s0) x [b1:e1:s1).

It will cause a fatal error if b0 is greater than or equal to e0 or if b1 is greater than or equal to e1. s0 and s1 should be greater than 0.

Parameters

b0 First coordinate of the beginning of 2D range.
e0 First coordinate of the end of 2D range.
s0 Stride of the first dimension of 2D range.
b1 Second coordinate of the beginning of 2D range.
e1 Second coordinate of the end of 2D range.
s1 Stride of the second dimension of 2D range.
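The cross product of two strided half-open intervals, [b0:e0:s0) x [b1:e1:s1), can be enumerated with a pair of nested loops. The sketch below is an illustration only: cross_product_points and point2d are hypothetical names, and the visiting order is an arbitrary choice that is not taken from the SDK.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using point2d = std::array<std::size_t, 2>;

// Hypothetical helper (not part of the SDK): enumerates the points of
// [b0:e0:s0) x [b1:e1:s1). The visiting order is an arbitrary choice here.
std::vector<point2d> cross_product_points(std::size_t b0, std::size_t e0, std::size_t s0,
                                          std::size_t b1, std::size_t e1, std::size_t s1)
{
    // Mirror the documented preconditions.
    assert(b0 < e0 && b1 < e1);
    assert(s0 > 0 && s1 > 0);

    std::vector<point2d> points;
    for (std::size_t i = b0; i < e0; i += s0)
        for (std::size_t j = b1; j < e1; j += s1)
            points.push_back(point2d{{i, j}});
    return points;
}
```

With both dimensions running over [0:20:5) this yields 16 points, from (0, 0) to (15, 15), which matches one point per 5 x 5 tile of a 20 x 20 matrix.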

9.4.1.3.1.3 hetcompute::range< 2 >::range ( size_t b0, size_t e0, size_t b1, size_t e1 )

Creates a 2D range comprising the points of the cross product [b0, e0) x [b1, e1).

It will cause a fatal error if b0 is greater than or equal to e0 or if b1 is greater than or equal to e1.

Parameters

b0 First coordinate of the beginning of 2D range.
e0 First coordinate of the end of 2D range.
b1 Second coordinate of the beginning of 2D range.
e1 Second coordinate of the end of 2D range.

9.4.1.3.1.4 hetcompute::range< 2 >::range ( size_t e0, size_t e1 )

Creates a 2D range comprising the points of the cross product [0, e0) x [0, e1).

Parameters

e0 First coordinate of the end of 2D range.
e1 Second coordinate of the end of 2D range.

9.4.1.3.2 Member Function Documentation

9.4.1.3.2.1 size_t hetcompute::range< 2 >::index_to_linear ( const hetcompute::index< 2 > & it ) const

Converts a hetcompute::index<2> object to a linear number with respect to the current range object.

Parameters

it hetcompute::index<2> object

Returns

A linear ID with respect to the current range object.

9.4.1.3.2.2 hetcompute::index<2> hetcompute::range< 2 >::linear_to_index ( size_t idx ) const

Converts a linear ID to a hetcompute::index<2> with respect to the current range object.

Parameters

idx Linear ID to be converted to a hetcompute::index<2> object.

Returns

hetcompute::index<2> with respect to the current range object.

9.4.1.3.2.3 size_t hetcompute::range< 2 >::linearized_distance ( ) const

Returns the linearized distance of the range.

Returns

Linearized distance of the range, the product of length() in each dimension.

9.4.1.3.2.4 size_t hetcompute::range< 2 >::size ( ) const

Returns the size of the range.

Returns

Size of the range, product of the number of elements in each dimension.

9.4.1.4 class hetcompute::range< 3 >

template<>class hetcompute::range< 3 >

A 3-dimensional range.

// 6-point stencil in 3-d
constexpr size_t N = 10; // size of matrix
constexpr size_t TILE_SIZE = 2;
float a[N][N][N];

// define a 3d range
hetcompute::range<3> r(0, N, TILE_SIZE, 0, N, TILE_SIZE, 0, N, TILE_SIZE);

// initialize all the tiles in parallel
hetcompute::pfor_each(r, [&a, TILE_SIZE](hetcompute::index<3> idx) {
    // iterate through the tile
    hetcompute::range<3> rt(idx[0], idx[0] + TILE_SIZE, idx[1], idx[1] + TILE_SIZE,
                            idx[2], idx[2] + TILE_SIZE);
    for (size_t i = rt.begin(0); i < rt.end(0); i++)
        for (size_t j = rt.begin(1); j < rt.end(1); j++)
            for (size_t k = rt.begin(2); k < rt.end(2); k++)
                if (i == 0 || j == 0 || k == 0)
                    a[i][j][k] = 1;
                else
                    a[i][j][k] = 0;
});

// stencil: define a range excluding the border elements
hetcompute::range<3> ri(1, N - 1, TILE_SIZE, 1, N - 1, TILE_SIZE, 1, N - 1, TILE_SIZE);
hetcompute::pfor_each(ri, [&a, TILE_SIZE](hetcompute::index<3> idx) {
    // iterate through the tile
    hetcompute::range<3> rt(idx[0], idx[0] + TILE_SIZE, idx[1], idx[1] + TILE_SIZE,
                            idx[2], idx[2] + TILE_SIZE);
    for (size_t i = rt.begin(0); i < rt.end(0); i++)
        for (size_t j = rt.begin(1); j < rt.end(1); j++)
            for (size_t k = rt.begin(2); k < rt.end(2); k++)
            {
                a[i][j][k] = (a[i - 1][j][k] + a[i + 1][j][k] + a[i][j - 1][k] +
                              a[i][j + 1][k] + a[i][j][k - 1] + a[i][j][k + 1]) / 6.0f;
            }
});

Public member functions

• range ()

• range (const std::array< size_t, 3 > &bb, const std::array< size_t, 3 > &ee)

• range (const std::array< size_t, 3 > &bb, const std::array< size_t, 3 > &ee, const std::array<size_t, 3 > &ss)

• range (size_t b0, size_t e0, size_t s0, size_t b1, size_t e1, size_t s1, size_t b2, size_t e2, size_t s2)

• range (size_t b0, size_t e0, size_t b1, size_t e1, size_t b2, size_t e2)

• range (size_t e0, size_t e1, size_t e2)

• size_t index_to_linear (const hetcompute::index< 3 > &it) const

• bool is_empty () const

• hetcompute::index< 3 > linear_to_index (size_t idx) const

• size_t linearized_distance () const

• void print () const

• size_t size () const

Additional Inherited Members

9.4.1.4.1 Constructors and Destructors

9.4.1.4.1.1 hetcompute::range< 3 >::range ( )

Creates an empty 3D range.

9.4.1.4.1.2 hetcompute::range< 3 >::range ( size_t b0, size_t e0, size_t s0, size_t b1, size_t e1, size_t s1, size_t b2, size_t e2, size_t s2 )

Creates a 3D range comprising the points of the cross product [b0:e0:s0) x [b1:e1:s1) x [b2:e2:s2).

It will cause a fatal error if b0 is greater than or equal to e0, if b1 is greater than or equal to e1, or if b2 is greater than or equal to e2. s0, s1, and s2 should be greater than 0.

Parameters

b0 First coordinate of the beginning of the 3D range.
e0 First coordinate of the end of the 3D range.
s0 Stride of the first dimension of the 3D range.
b1 Second coordinate of the beginning of the 3D range.
e1 Second coordinate of the end of the 3D range.
s1 Stride of the second dimension of the 3D range.
b2 Third coordinate of the beginning of the 3D range.
e2 Third coordinate of the end of the 3D range.
s2 Stride of the third dimension of the 3D range.

9.4.1.4.1.3 hetcompute::range< 3 >::range ( size_t b0, size_t e0, size_t b1, size_t e1, size_t b2, size_t e2 )

Creates a 3D range comprising the points of the cross product [b0, e0) x [b1, e1) x [b2, e2).

It will cause a fatal error if b0 is greater than or equal to e0, if b1 is greater than or equal to e1, or if b2 is greater than or equal to e2.

Parameters

b0 First coordinate of the beginning of the 3D range.
e0 First coordinate of the end of the 3D range.
b1 Second coordinate of the beginning of the 3D range.
e1 Second coordinate of the end of the 3D range.
b2 Third coordinate of the beginning of the 3D range.
e2 Third coordinate of the end of the 3D range.

9.4.1.4.1.4 hetcompute::range< 3 >::range ( size_t e0, size_t e1, size_t e2 )

Creates a 3D range comprising the points of the cross product [0, e0) x [0, e1) x [0, e2).

Parameters

e0 First coordinate of the end of the 3D range.
e1 Second coordinate of the end of the 3D range.
e2 Third coordinate of the end of the 3D range.
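The three coordinate-based constructors differ only in which values are left to defaults. The following standalone sketch models that relationship; it is not SDK code. `range3_model` is a hypothetical type, the defaults (begin 0, stride 1) are read off the half-open intervals described above, and `assert` merely stands in for the fatal error the SDK reports.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for hetcompute::range<3>; it only records the bounds.
struct range3_model
{
    std::array<std::size_t, 3> b, e, s;

    // Full form: [b0:e0:s0) x [b1:e1:s1) x [b2:e2:s2).
    range3_model(std::size_t b0, std::size_t e0, std::size_t s0,
                 std::size_t b1, std::size_t e1, std::size_t s1,
                 std::size_t b2, std::size_t e2, std::size_t s2)
        : b{{b0, b1, b2}}, e{{e0, e1, e2}}, s{{s0, s1, s2}}
    {
        for (std::size_t d = 0; d < 3; ++d)
        {
            assert(b[d] < e[d]); // the SDK documents a fatal error in this case
            assert(s[d] > 0);    // strides should be greater than 0
        }
    }

    // [b0, e0) x [b1, e1) x [b2, e2): stride assumed to be 1 in every dimension.
    range3_model(std::size_t b0, std::size_t e0, std::size_t b1,
                 std::size_t e1, std::size_t b2, std::size_t e2)
        : range3_model(b0, e0, 1, b1, e1, 1, b2, e2, 1)
    {
    }

    // [0, e0) x [0, e1) x [0, e2): every dimension begins at 0.
    range3_model(std::size_t e0, std::size_t e1, std::size_t e2)
        : range3_model(0, e0, 0, e1, 0, e2)
    {
    }
};
```

Under these assumptions, `range3_model(4, 5, 6)` describes the same box as `range3_model(0, 4, 1, 0, 5, 1, 0, 6, 1)`.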

9.4.1.4.2 Member Function Documentation

9.4.1.4.2.1 size_t hetcompute::range< 3 >::index_to_linear ( const hetcompute::index< 3 > & it ) const

Converts a hetcompute::index<3> object to a linear ID with respect to the current range object.

Parameters

it hetcompute::index<3> object.

Returns

A linear ID with respect to the current range object.

9.4.1.4.2.2 hetcompute::index<3> hetcompute::range< 3 >::linear_to_index ( size_t idx ) const

Converts a linear ID to a hetcompute::index<3> with respect to the current range object.

Parameters

idx Linear ID to be converted to a hetcompute::index<3> object.

Returns

hetcompute::index<3> with respect to the current range object.
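index_to_linear() and linear_to_index() are inverse mappings with respect to one range. The sketch below illustrates that round trip in plain C++. It is a model under stated assumptions, not the SDK's implementation: `box3` and `index3` are hypothetical names, the range has unit stride, and the first coordinate is assumed to vary fastest. This section does not specify the ordering the SDK actually uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using index3 = std::array<std::size_t, 3>; // stand-in for hetcompute::index<3>

// Hypothetical unit-stride box [b0, e0) x [b1, e1) x [b2, e2).
struct box3
{
    index3 b, e;

    std::size_t extent(std::size_t d) const { return e[d] - b[d]; }

    // Offsets from the beginning, combined with the first coordinate fastest.
    std::size_t index_to_linear(const index3& it) const
    {
        for (std::size_t d = 0; d < 3; ++d)
            assert(it[d] >= b[d] && it[d] < e[d]);
        return (it[0] - b[0]) + extent(0) * ((it[1] - b[1]) + extent(1) * (it[2] - b[2]));
    }

    // Inverse mapping: peel off one coordinate at a time.
    index3 linear_to_index(std::size_t id) const
    {
        assert(id < extent(0) * extent(1) * extent(2));
        index3 it;
        it[0] = b[0] + id % extent(0);
        id /= extent(0);
        it[1] = b[1] + id % extent(1);
        it[2] = b[2] + id / extent(1);
        return it;
    }
};
```

Whatever ordering is chosen, the property worth checking is that converting a linear ID to an index and back yields the same ID for every point of the range.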

9.4.1.4.2.3 size_t hetcompute::range< 3 >::linearized_distance ( ) const

Returns the linearized distance of the range.

Returns

Linearized distance of the range: the product of length() in each dimension.

9.4.1.4.2.4 size_t hetcompute::range< 3 >::size ( ) const

Returns the size of the range.

Returns

Size of the range, product of the number of elements in each dimension.

9.4.1.5 class hetcompute::range_base

template<size_t Dims> class hetcompute::range_base< Dims >

Methods common to 1D, 2D, and 3D ranges.

Public member functions

• range_base (const std::array< size_t, Dims > &bb, const std::array< size_t, Dims > &ee)

• range_base (const std::array< size_t, Dims > &bb, const std::array< size_t, Dims > &ee, const std::array< size_t, Dims > &ss)

• size_t begin (const size_t i) const

• const std::array< size_t, Dims > & begin () const

• size_t dims () const

• size_t end (const size_t i) const

• const std::array< size_t, Dims > & end () const

• size_t length (const size_t i) const

• size_t num_elems (const size_t i) const

• size_t stride (const size_t i) const

• const std::array< size_t, Dims > & stride () const

Protected Attributes

• std::array< size_t, Dims > _b

• std::array< size_t, Dims > _e

• std::array< size_t, Dims > _s

9.4.1.5.1 Member Function Documentation

9.4.1.5.1.1 template<size_t Dims> size_t hetcompute::range_base< Dims >::begin ( const size_t i ) const

Returns the beginning of the range in the i-th coordinate. It will cause a fatal error if i is greater than or equal to Dims.

Returns

The beginning of the range in the i-th coordinate (i < Dims).

9.4.1.5.1.2 template<size_t Dims> const std::array<size_t, Dims>& hetcompute::range_base< Dims >::begin ( ) const

Returns all the coordinates of the beginning of the range.

Returns

An array of all the coordinates at the beginning of the range.

9.4.1.5.1.3 template<size_t Dims> size_t hetcompute::range_base< Dims >::dims ( ) const

Returns the number of dimensions of the range.

Returns

Number of dimensions of the range, in [1, 3].

9.4.1.5.1.4 template<size_t Dims> size_t hetcompute::range_base< Dims >::end ( const size_t i ) const

Returns the end of the range in the i-th coordinate. It will cause a fatal error if i is greater than or equal to Dims.

Returns

End of the range in the i-th coordinate (i < Dims).

9.4.1.5.1.5 template<size_t Dims> const std::array<size_t, Dims>& hetcompute::range_base< Dims >::end ( ) const

Returns all the coordinates of the end of the range.

Returns

An array of all the coordinates of the end of the range.

9.4.1.5.1.6 template<size_t Dims> size_t hetcompute::range_base< Dims >::length ( const size_t i ) const

Returns the length of the range in the i-th coordinate. num_elems(i) counts only the indices selected by the stride, whereas length(i) returns the total length in the i-th coordinate.

Returns

The length of the range in the i-th coordinate.

9.4.1.5.1.7 template<size_t Dims> size_t hetcompute::range_base< Dims >::num_elems ( const size_t i ) const

The number of elements in the i-th coordinate. Equivalent to size(i).

Returns

the number of elements in the i-th coordinate. (i < Dims)
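The difference between length(i) and num_elems(i) only shows up for strides greater than 1. The sketch below models one coordinate of a range in plain C++. `dim_model` is a hypothetical type, and both formulas are assumptions inferred from the descriptions above (total span versus indices reached by the stride), not taken from the SDK sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical one-dimensional slice [b:e:s) of a range.
struct dim_model
{
    std::size_t b, e, s;

    // Total span of the coordinate, including indices the stride skips.
    std::size_t length() const { return e - b; }

    // Indices actually visited: b, b + s, b + 2s, ... while below e.
    std::size_t num_elems() const
    {
        assert(s > 0 && b < e);
        return (e - b + s - 1) / s; // ceiling of length() / s
    }
};
```

For the interior range used in the stencil example, [1:9:2), this model gives a length of 8 and 4 elements (1, 3, 5, 7). If the model holds, size() and linearized_distance() would coincide only when every stride is 1.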

9.4.1.5.1.8 template<size_t Dims> size_t hetcompute::range_base< Dims >::stride ( const size_t i ) const

Returns the stride of the range in the i-th coordinate. It will cause a fatal error if i is greater than or equal to Dims.

Returns

Stride of the range in the i-th coordinate. (i < Dims)

9.4.1.5.1.9 template<size_t Dims> const std::array<size_t, Dims>& hetcompute::range_base< Dims >::stride ( ) const

Returns all the coordinates of the stride of the range.

Returns

An array of all the coordinates of the stride of the range.

9.5 Tasks

Classes

• struct hetcompute::do_not_collapse_t

• class hetcompute::task< ReturnType >

Tasks with a function of non-void return type. More...

• class hetcompute::task< ReturnType(Args...)>

Tasks with the full function signature (return type + parameter list). More...

• class hetcompute::task< void >

Tasks with a function of void return type. More...

• class hetcompute::task<>

Tasks as the basic unit of work. More...

• class hetcompute::task_ptr< ReturnType >

Smart pointer to a task object with function with non-void return type. More...

• class hetcompute::task_ptr< ReturnType(Args...)>

Smart pointer to a task object with full-function signature (return type + parameter list). More...

• class hetcompute::task_ptr< void >

Smart pointer to a task object with function with void return type. More...

• class hetcompute::task_ptr<>

Smart pointer to a task object without function information. More...

Typedefs

• template<typename Fn >

using hetcompute::collapsed_task_type = typename ::hetcompute::internal::task_factory< Fn >::collapsed_task_type

• template<typename Fn >

using hetcompute::non_collapsed_task_type = typename ::hetcompute::internal::task_factory< Fn >::non_collapsed_task_type

Functions

• void hetcompute::abort_on_cancel ()

Aborts execution of the calling task if any of its groups is canceled or if the task itself has been canceled by a call to hetcompute::cancel().

• void hetcompute::abort_task ()

Aborts execution of the calling task.

• template<typename Task >

hetcompute::internal::by_data_dep_t< Task && > hetcompute::bind_as_data_dependency (Task &&t)

Explicitly bind a hetcompute::task_ptr<...> or hetcompute::task<...>* as a data dependency.

• template<typename Task >

hetcompute::internal::by_value_t< Task && > hetcompute::bind_by_value (Task &&t)

Explicitly bind a hetcompute::task_ptr<...> or hetcompute::task<...>* by value.

• template<typename BlockingFunction , typename CancelFunction >

void hetcompute::blocking (BlockingFunction &&bf, CancelFunction &&cf)

Enclose user-code that blocks on external activity.

• template<typename Code , typename... Args>

collapsed_task_type< Code > hetcompute::create_task (Code &&code, Args &&...args)

• template<typename Code , typename... Args>

non_collapsed_task_type< Code > hetcompute::create_task (do_not_collapse_t, Code &&code, Args &&...args)

• template<typename ReturnType , typename... Args>

::hetcompute::task_ptr< ReturnType > hetcompute::create_value_task (Args &&...args)

• void hetcompute::finish_after (::hetcompute::task<> *task)

Specifies that the task invoking this function should be deemed to finish only after the given task finishes.

• void hetcompute::finish_after (::hetcompute::task_ptr<> const &task)

• template<typename Code , typename... Args>

collapsed_task_type< Code > hetcompute::launch (Code &&code, Args &&...args)

• template<typename Code , typename... Args>

non_collapsed_task_type< Code > hetcompute::launch (do_not_collapse_t, Code &&code, Args &&...args)

• bool hetcompute::operator!= (::hetcompute::task_ptr<> const &t, std::nullptr_t)

Compare tasks.

• bool hetcompute::operator!= (std::nullptr_t, ::hetcompute::task_ptr<> const &t)

Compare tasks.

• bool hetcompute::operator!= (::hetcompute::task_ptr<> const &a, ::hetcompute::task_ptr<> const &b)

Compare tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() % std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator% (const ::hetcompute::task_ptr< T1 > &t1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator % for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() % std::declval< T2 >())> hetcompute::operator% (const ::hetcompute::task_ptr< T1 > &t1, T2 &&op2)

Algebraic binary operator % for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< T1 >() % std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator% (T1 &&op1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator % for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() & std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator& (const ::hetcompute::task_ptr< T1 > &t1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator & for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() & std::declval< T2 >())> hetcompute::operator& (const ::hetcompute::task_ptr< T1 > &t1, T2 &&op2)

Algebraic binary operator & for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< T1 >() & std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator& (T1 &&op1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator & for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() * std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator* (const ::hetcompute::task_ptr< T1 > &t1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator * for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() * std::declval< T2 >())> hetcompute::operator* (const ::hetcompute::task_ptr< T1 > &t1, T2 &&op2)

Algebraic binary operator * for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< T1 >() * std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator* (T1 &&op1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator * for tasks.

• template<typename T >

inline ::hetcompute::task_ptr< typename ::hetcompute::task_ptr< T >::return_type > hetcompute::operator+ (const ::hetcompute::task_ptr< T > &t)

Algebraic unary operator + for task.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() + std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator+ (const ::hetcompute::task_ptr< T1 > &t1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator + for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() + std::declval< T2 >())> hetcompute::operator+ (const ::hetcompute::task_ptr< T1 > &t1, T2 &&op2)

Algebraic binary operator + for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< T1 >() + std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator+ (T1 &&op1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator + for tasks.

• template<typename T >

inline ::hetcompute::task_ptr< typename ::hetcompute::task_ptr< T >::return_type > hetcompute::operator- (const ::hetcompute::task_ptr< T > &t)

Algebraic unary operator - for task.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() - std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator- (const ::hetcompute::task_ptr< T1 > &t1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator - for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() - std::declval< T2 >())> hetcompute::operator- (const ::hetcompute::task_ptr< T1 > &t1, T2 &&op2)

Algebraic binary operator - for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< T1 >() - std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator- (T1 &&op1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator - for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() / std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator/ (const ::hetcompute::task_ptr< T1 > &t1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator / for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() / std::declval< T2 >())> hetcompute::operator/ (const ::hetcompute::task_ptr< T1 > &t1, T2 &&op2)

Algebraic binary operator / for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< T1 >() / std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator/ (T1 &&op1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator / for tasks.

• bool hetcompute::operator== (task_ptr<> const &t, std::nullptr_t)

Compare tasks.

• bool hetcompute::operator== (std::nullptr_t, task_ptr<> const &t)

Compare tasks.

• bool hetcompute::operator== (::hetcompute::task_ptr<> const &a, ::hetcompute::task_ptr<> const &b)

Compare tasks.

• inline ::hetcompute::task_ptr<> & hetcompute::operator>> (::hetcompute::task_ptr<> &pred, ::hetcompute::task_ptr<> &succ)

Chain tasks: adds a control dependency so that succ executes after pred, as pred->then(succ) does.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() ^ std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator^ (const ::hetcompute::task_ptr< T1 > &t1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator ^ for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() ^ std::declval< T2 >())> hetcompute::operator^ (const ::hetcompute::task_ptr< T1 > &t1, T2 &&op2)

Algebraic binary operator ^ for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< T1 >() ^ std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator^ (T1 &&op1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator ^ for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() | std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator| (const ::hetcompute::task_ptr< T1 > &t1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator | for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< typename ::hetcompute::task_ptr< T1 >::return_type >() | std::declval< T2 >())> hetcompute::operator| (const ::hetcompute::task_ptr< T1 > &t1, T2 &&op2)

Algebraic binary operator | for tasks.

• template<typename T1 , typename T2 >

inline ::hetcompute::task_ptr< decltype(std::declval< T1 >() | std::declval< typename ::hetcompute::task_ptr< T2 >::return_type >())> hetcompute::operator| (T1 &&op1, const ::hetcompute::task_ptr< T2 > &t2)

Algebraic binary operator | for tasks.

• template<typename T >

inline ::hetcompute::task_ptr< typename ::hetcompute::task_ptr< T >::return_type > hetcompute::operator~ (const ::hetcompute::task_ptr< T > &t)

Algebraic unary operator ~ for task.

Variables

• const do_not_collapse_t hetcompute::do_not_collapse {}
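The algebraic operators listed above all compute their result type the same way: decltype is applied to the operator expression over std::declval values of the operand types. The sketch below reproduces that pattern in plain C++ for operator+ only. `lazy_value` is a hypothetical stand-in for a task handle, and the deferred evaluation is an assumption; this section does not describe how the SDK operators combine their operands at run time.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a task handle: a value that is computed on demand.
template <typename T>
struct lazy_value
{
    using return_type = T;
    std::function<T()> compute;
    T get() const { return compute(); }
};

// The result type is whatever "T1 + T2" yields, deduced with decltype and
// std::declval in the same way as the operator signatures listed above.
template <typename T1, typename T2>
lazy_value<decltype(std::declval<T1>() + std::declval<T2>())>
operator+(const lazy_value<T1>& a, const lazy_value<T2>& b)
{
    using result = decltype(std::declval<T1>() + std::declval<T2>());
    // Nothing is evaluated here; the sum is formed when get() is called.
    return lazy_value<result>{[a, b]() -> result { return a.get() + b.get(); }};
}
```

With this model, adding a `lazy_value<int>` to a `lazy_value<double>` yields a `lazy_value<double>`, mirroring the usual arithmetic conversions of the underlying types.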

9.5.1 Class Documentation

9.5.1.1 struct hetcompute::do_not_collapse_t

Helper struct for specifying do_not_collapse task.

9.5.1.2 class hetcompute::task< ReturnType >

template<typename ReturnType> class hetcompute::task< ReturnType >

Template Parameters

ReturnType Return type of the task function.

Note: An object of this class should not be instantiated. It is a facade to the internal implementation.

Public Types

• using return_type = ReturnType

Public member functions

• return_type const & copy_value ()

Returns a const reference to the value returned by the task.

• return_type && move_value ()

Moves the value returned by the task.

Additional Inherited Members

9.5.1.2.1 Member Typedef Documentation

9.5.1.2.1.1 template<typename ReturnType > using hetcompute::task< ReturnType >::return_type = ReturnType

Type returned by the task.

9.5.1.2.2 Member Function Documentation

9.5.1.2.2.1 template<typename ReturnType > return_type const& hetcompute::task< ReturnType>::copy_value ( )

Returns a const reference to the value returned by the task.

This method behaves like hetcompute::task<>::wait_for(): it does not return until the task completes its execution, and it returns immediately if the task has already finished. Unlike hetcompute::task<ReturnType>::move_value(), hetcompute::task<ReturnType>::copy_value() can be called multiple times.

Example

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // Define a lambda.
    auto l = [](int x) -> int { return x * 2; };
    // Create a task out of the lambda and launch.
    auto t1 = hetcompute::launch(l, 42);

    // Wait for t1 to finish and assign the return value to val.
    int val = t1->copy_value();

    HETCOMPUTE_ILOG("return value of t1 is: %d", val);

    hetcompute::runtime::shutdown();
}

See Also

hetcompute::task<ReturnType>::move_value()
hetcompute::task<>::wait_for()

9.5.1.2.2.2 template<typename ReturnType > return_type&& hetcompute::task< ReturnType>::move_value ( )

Moves the value returned by the task.

This method behaves like hetcompute::task<>::wait_for(): it does not return until the task completes its execution, and it returns immediately if the task has already finished.

hetcompute::task<ReturnType>::move_value() can only be called once. Any subsequent call may raise an exception.

Example

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Define a lambda.
    auto l = [](int x) -> int { return x * 2; };
    // Create a task out of the lambda and launch.
    auto t1 = hetcompute::launch(l, 42);

    int val = t1->move_value();
    HETCOMPUTE_ILOG("return value of t1 is: %d", val);

    hetcompute::runtime::shutdown();
    // Error! value might not be there anymore!!
    // int val_error = t1->move_value();
    return 0;
}

See Also

hetcompute::task<ReturnType>::copy_value()
hetcompute::task<>::wait_for()
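The contrast between copy_value() and move_value() can be modeled without the runtime. In the sketch below, `result_slot` is a hypothetical holder for a value that a finished task has already produced. Throwing std::logic_error on reuse is an assumption made for the model (the text above only says a later call may raise an exception), as is rejecting copy_value() after the value has been moved out.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical holder for a finished task's return value.
template <typename T>
class result_slot
{
public:
    explicit result_slot(T v) : _value(std::move(v)), _moved(false) {}

    // May be called any number of times; the value stays in place.
    const T& copy_value() const
    {
        if (_moved)
            throw std::logic_error("value already moved out");
        return _value;
    }

    // Hands the value over to the caller; only the first call is valid.
    T&& move_value()
    {
        if (_moved)
            throw std::logic_error("value already moved out");
        _moved = true;
        return std::move(_value);
    }

private:
    T    _value;
    bool _moved;
};
```

Reading through a const reference suits results that several consumers inspect, while moving avoids a copy for large results, such as a std::string or a container, that have a single consumer.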

9.5.1.3 class hetcompute::task< ReturnType(Args...)>

template<typename ReturnType, typename... Args> class hetcompute::task< ReturnType(Args...)>

Template Parameters

Args... Parameter type list of the task function.
ReturnType Return type of the task function.

Note: An object of this class should not be instantiated. It is a facade to the internal implementation.

Public Types

• using args_tuple = std::tuple< Args...>

• using return_type = ReturnType

• using size_type = task<>::size_type

Public member functions

• template<typename... Arguments>

void bind_all (Arguments &&...args)

Bind all arguments to a task with a full-function signature.

• template<typename... Arguments>

void launch (Arguments &&...args)

Launches task and (optionally) binds arguments.

Static Public Attributes

• static HETCOMPUTE_CONSTEXPR_CONST size_type arity = sizeof...(Args)

Friends

• template<typename R , typename... As>

friend ::hetcompute::internal::cputask_arg_layer< R(As...)> * c_ptr2 (::hetcompute::task< R(As...)> &t)

Additional Inherited Members

9.5.1.3.1 Member Typedef Documentation

9.5.1.3.1.1 template<typename ReturnType , typename... Args> using hetcompute::task<ReturnType(Args...)>::args_tuple = std::tuple<Args...>

Tuple whose types are the types of the task arguments.

9.5.1.3.1.2 template<typename ReturnType , typename... Args> using hetcompute::task<ReturnType(Args...)>::return_type = ReturnType

Type returned by the task.

9.5.1.3.1.3 template<typename ReturnType , typename... Args> using hetcompute::task<ReturnType(Args...)>::size_type = task<>::size_type

Unsigned integral type.

9.5.1.3.2 Member Function Documentation

9.5.1.3.2.1 template<typename ReturnType , typename... Args> template<typename... Arguments> void hetcompute::task< ReturnType(Args...)>::bind_all ( Arguments &&... args )

Bind all arguments to a task with a full-function signature.

The arguments should be provided in the same order as the task’s parameter list. If an arg is not a hetcompute::task_ptr or hetcompute::task*, its type should match the corresponding task parameter (bound by value).

If an arg is a hetcompute::task_ptr or hetcompute::task*, either its return type should be C++-compatible with the corresponding task parameter (bound as a data dependency), or the type of the arg itself should be C++-compatible with the corresponding task parameter type (bound by value). When an arg can be bound either as a data dependency or by value, the binding type must be specified explicitly using hetcompute::bind_as_data_dependency or hetcompute::bind_by_value to avoid ambiguity.

The number of arguments must equal the task’s arity.

Template Parameters

Arguments... Task arguments type.

Parameters

args Arguments to the task.

Example

#include <atomic>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create a group.
    auto g = hetcompute::create_group();

    std::atomic<size_t> value;

    // Create a non-collapsing task that returns a hetcompute::task_ptr<size_t>;
    // t1 has type hetcompute::task_ptr<hetcompute::task_ptr<size_t>()>.
    auto t1 = hetcompute::create_task(hetcompute::do_not_collapse, [&value]() -> hetcompute::task_ptr<size_t> {
        return hetcompute::create_task([&value]() -> size_t {
            value = 27;
            return value;
        });
    });

    // Create a task.
    auto t2 = hetcompute::create_task([&value] { value = 42; });

    // Create a task that takes two parameters of type hetcompute::task_ptr<>.
    auto t3 = hetcompute::create_task([g](hetcompute::task_ptr<> ta, hetcompute::task_ptr<> tb) {
        // Set task dependency and launch.
        ta->then(tb);  // t1->result() >> t2
        g->launch(ta); // t1->result()->launch()
        g->launch(tb); // t2->launch();
    });

    // Create a task that takes one parameter of type hetcompute::task_ptr<>.
    auto t4 = hetcompute::create_task([g](hetcompute::task_ptr<> ta) {
        // Launch the task.
        g->launch(ta); // t1->launch();
    });

    // Bind the arguments for t3.
    // Bind t1 to the first argument as data dependency (explicitly, due to ambiguity).
    // Bind t2 to the second argument (by value, no ambiguity).
    t3->bind_all(hetcompute::bind_as_data_dependency(t1), t2);

    // Bind the argument for t4.
    // Bind t1 to the argument by value (explicitly, due to ambiguity).
    t4->bind_all(hetcompute::bind_by_value(t1));

    // Launch the tasks into the group (t1 and t2 will be launched in t3 and t4).
    g->launch(t3);
    g->launch(t4);

    // Wait for the group to finish.
    g->wait_for();

    HETCOMPUTE_ILOG("%zu", value.load());

    hetcompute::runtime::shutdown();
    return 0;
}

See Also

hetcompute::bind_as_dependency(), hetcompute::bind_by_value()

9.5.1.3.2.2 template<typename ReturnType , typename... Args> template<typename... Arguments> void hetcompute::task< ReturnType(Args...)>::launch ( Arguments &&... args )

Launches task and (optionally) binds arguments.

This method informs the Qualcomm HetCompute runtime that the task is ready to execute as soon as there is an available hardware context and after all its predecessors (both data- and control-dependent) have executed.

launch() can have arguments. If so, the number of arguments in launch() should be the same as arity.

Example

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    auto t1 = hetcompute::create_task([](int x) { HETCOMPUTE_ILOG("Hello World %d!\n", x); });

    // ...
    // Set up dependencies if needed.
    // ...

    // t1 is ready, launch and bind it.
    t1->launch(42);

    // Wait for t1 to finish.
    t1->wait_for();
    hetcompute::runtime::shutdown();
    return 0;
}

See Also

hetcompute::group::launch(Code)

9.5.1.3.3 Member Data Documentation

9.5.1.3.3.1 template<typename ReturnType , typename... Args> HETCOMPUTE_CONSTEXPR_CONST size_type hetcompute::task< ReturnType(Args...)>::arity = sizeof...(Args) [static]

Number of task arguments.
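The relationship between a task signature and arity can be sketched in standard C++. The sketch below is an illustration only: toy_task_sig and launch_args_ok are hypothetical names that are not part of the SDK. They mirror the sizeof...(Args) definition above and the rule that launch() takes either no arguments or exactly arity arguments.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for hetcompute::task<ReturnType(Args...)>.
// Only the arity member is modeled; nothing here calls the SDK.
template <typename Signature>
struct toy_task_sig;

template <typename ReturnType, typename... Args>
struct toy_task_sig<ReturnType(Args...)>
{
    using size_type = std::size_t;

    // Number of task arguments, computed at compile time.
    static constexpr size_type arity = sizeof...(Args);

    // Assumed rule: launch() accepts zero arguments or exactly arity arguments.
    template <typename... Arguments>
    static constexpr bool launch_args_ok(Arguments&&...)
    {
        return sizeof...(Arguments) == 0 || sizeof...(Arguments) == arity;
    }
};

static_assert(toy_task_sig<void()>::arity == 0, "a task without parameters has arity 0");
static_assert(toy_task_sig<int(int, double)>::arity == 2, "two parameters give arity 2");
```

Because arity is a compile-time constant, a mismatch between the signature and the number of bound arguments can in principle be diagnosed before the program runs.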

9.5.1.4 class hetcompute::task< void >

template<>class hetcompute::task< void >

Note: An object of this class should not be instantiated. It is a facade to the internal implementation.

Additional Inherited Members

9.5.1.5 class hetcompute::task<>

template<>class hetcompute::task<>

Note: This is the basic task without function signature information.

Note: An object of this class should not be instantiated. It is a facade to the internal implementation.

Public Types

• using size_type = ::hetcompute::internal::task::size_type

Public member functions

• void cancel ()

Cancels task.

• bool canceled () const

Checks whether a task is canceled.

• void finish_after ()

Specifies that the current task finishes only after this task finishes.

• bool is_bound () const

Checks whether a task is bound.

• void launch ()

Launches task.

• task_ptr & then (task_ptr<> &succ)

Creates a control dependency between two tasks.

• task ∗ then (task<> ∗succ)

• hc_error wait_for ()

Waits for the task to complete execution.

Protected Member Functions

• internal_raw_task_ptr get_raw_ptr () const

• HETCOMPUTE_DELETE_METHOD (task())

• HETCOMPUTE_DELETE_METHOD (task(task const &))

• HETCOMPUTE_DELETE_METHOD (task(task &&))

• HETCOMPUTE_DELETE_METHOD (task &operator=(task const &))

• HETCOMPUTE_DELETE_METHOD (task &operator=(task &&))

Protected Attributes

• internal_raw_task_ptr _ptr

Friends

• friend::hetcompute::internal::task ∗ hetcompute::internal::c_ptr (::hetcompute::task<> ∗t)

9.5.1.5.1 Member Typedef Documentation

9.5.1.5.1.1 using hetcompute::task<>::size_type = ::hetcompute::internal::task::size_type

Unsigned integral type.

9.5.1.5.2 Member Function Documentation

9.5.1.5.2.1 void hetcompute::task<>::cancel ( )

Cancels task.

Use hetcompute::task<>::cancel() to cancel a task and its successors. The effects of hetcompute::task<>::cancel() depend on the task status:

• If a task is canceled before it launches, it never executes – even if it is launched afterwards. In addition, it propagates the cancellation to the task’s successors. This is called "cancellation propagation".

• If a task is canceled after it is launched, but before it starts executing, it will never execute and it will propagate cancellation to its successors.

• If the task is running when someone else calls hetcompute::task<>::cancel(), it is up to the task to ignore the cancellation request and continue its execution, or to honor the request via hetcompute::abort_on_cancel(), which aborts the task’s execution and propagates the cancellation to the task’s successors.

• Finally, if a task is canceled after it completes its execution (successfully or not), it does not change its status and it does not propagate cancellation.
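The four cases above can be summarized as a small state model. This is a sketch in standard C++ under stated assumptions, not the SDK's implementation: toy_state, toy_cancel_task, and their members are hypothetical names, and the abort_on_cancel() member only stands in for hetcompute::abort_on_cancel().

```cpp
#include <cassert>
#include <vector>

// Hypothetical task states, named after the cases in the list above.
enum class toy_state { created, launched, running, completed, canceled };

struct toy_cancel_task
{
    toy_state state = toy_state::created;
    bool cancel_requested = false;              // set when a running task is asked to cancel
    std::vector<toy_cancel_task*> successors;   // control-dependent successors

    void then(toy_cancel_task& succ) { successors.push_back(&succ); }

    // Marks this task canceled and propagates the cancellation to its successors.
    void cancel_and_propagate()
    {
        if (state == toy_state::completed || state == toy_state::canceled)
            return;
        state = toy_state::canceled;
        for (toy_cancel_task* s : successors)
            s->cancel_and_propagate();
    }

    void cancel()
    {
        switch (state)
        {
        case toy_state::created:    // canceled before launch
        case toy_state::launched:   // canceled after launch, before execution
            cancel_and_propagate();
            break;
        case toy_state::running:    // the task itself decides whether to honor the request
            cancel_requested = true;
            break;
        case toy_state::completed:  // no status change, no propagation
        case toy_state::canceled:
            break;
        }
    }

    // Stand-in for hetcompute::abort_on_cancel(): honors a pending request.
    // Returns true if the task aborted.
    bool abort_on_cancel()
    {
        if (state == toy_state::running && cancel_requested)
        {
            cancel_and_propagate();
            return true;
        }
        return false;
    }
};
```

In this model a running task stays in the running state after cancel() and only transitions once it reaches its abort_on_cancel() check, which matches the cooperative behavior described in the third bullet.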

Example 1: Canceling a task before launching it:

 1 #include <cassert>
 2 #include <iostream>
 3 #include <hetcompute/hetcompute.hh>
 4
 5 int
 6 main()
 7 {
 8     hetcompute::runtime::init();
 9     auto t1 = hetcompute::create_task([] { assert(false); });
10
11     auto t2 = hetcompute::create_task([] { assert(false); });
12
13     // Create control dependency.
14     t1->then(t2);
15
16     // Cancel t1, which propagates cancellation to t2
17     t1->cancel();
18
19     // Launch t2. Does nothing, t2 got canceled via cancellation propagation
20     t2->launch();
21
22     // Returns immediately, t2 is canceled.
23     try
24     {
25         t2->wait_for();
26     }
27     catch (const hetcompute::canceled_exception& e)
28     {
29         std::cout << e.what() << ": t2 was canceled" << std::endl;
30     }
31     catch (...)
32     {
33         // Never reached
34     }
35
36     hetcompute::runtime::shutdown();
37     return 0;
38 }

In the example above, a control dependency is created between two tasks, t1 and t2. Notice that, if either task executes, it will raise an assertion. In line 17, t1 is canceled, which causes t2 to be canceled as well. In line 20, t2 is launched, but it does not matter as it will not execute because it was canceled when t1 propagated its cancellation.

Example 2: Canceling a task after launching it, but before it executes:

 1 #include <cassert>
 2
 3 #include <hetcompute/hetcompute.hh>
 4
 5 int
 6 main()
 7 {
 8     hetcompute::runtime::init();
 9     auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!\n"); });
10
11     auto t2 = hetcompute::create_task([] { assert(false); });
12
13     auto t3 = hetcompute::create_task([] { assert(false); });
14
15     // Create dependencies
16     t1->then(t2)->then(t3);
17
18     // Launch t2. It cannot execute as yet because t1 has not been launched.
19     t2->launch();
20
21     // Cancel t2, which propagates cancellation to t3
22     t2->cancel();
23
24     // Launch t1. It will execute because no one canceled it.
25     t1->launch();
26
27     // Returns after t1 completes execution
28     t1->wait_for();
29     hetcompute::runtime::shutdown();
30
31     return 0;
32 }

In the example above, three tasks are created and chained: t1, t2, and t3. In line 19, t2 is launched, but it cannot execute because its predecessor has not yet executed. In line 22, t2 is canceled, which means that it will never execute. Because t3 is t2’s successor, it is also canceled – if t3 had a successor, it would also be canceled.

Example 3: Canceling a task while it executes:

 1 #include <cassert>
 2 #include <iostream>
 3 #include <hetcompute/hetcompute.hh>
 4 #include <unistd.h>
 5 int
 6 main()
 7 {
 8     hetcompute::runtime::init();
 9
10     auto t = hetcompute::create_task([] {
11         while (1)
12         {
13             hetcompute::abort_on_cancel();
14             HETCOMPUTE_ILOG("Waiting to be canceled.\n");
15             usleep(100);
16         }
17         assert(false); // This will never fire.
18     });
19
20     // Launch t.
21     t->launch();
22
23     // Wait for 2 seconds.
24     sleep(2);
25
26     // Cancel task. Returns immediately.
27     t->cancel();
28
29     try
30     {
31         // Wait for the task.
32         t->wait_for();
33     }
34     catch (const hetcompute::canceled_exception& e)
35     {
36         std::cout << e.what() << " thrown" << std::endl;
37     }
38     catch (...)
39     {
40         // Never reached.
41     }
42
43     hetcompute::runtime::shutdown();
44     return 0;
45 }

In the example above, task t will never finish unless it is canceled. t is launched in line 21. After launching the task, the main thread blocks for 2 seconds in line 24 to ensure that t is scheduled and prints its messages. In line 27, Qualcomm HetCompute is asked to cancel the task, which should be running by now.

hetcompute::task<>::cancel() returns immediately after it marks the task as "pending for cancellation". This means that t might still be executing after t->cancel() returns. That is why t->wait_for() is called in line 32, to ensure that we wait for t to complete its execution. Remember: a task does not know whether someone has requested its cancellation unless it calls hetcompute::abort_on_cancel() during its execution.

Example 4: Canceling a completed task.

 1 #include <hetcompute/hetcompute.hh>
 2 #include <unistd.h>
 3 int
 4 main()
 5 {
 6     hetcompute::runtime::init();
 7     auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World from t1!\n"); });
 8
 9     auto t2 = hetcompute::create_task([] {
10         while (1)
11         {
12             hetcompute::abort_on_cancel();
13             HETCOMPUTE_ILOG("Hello World from t2!\n");
14             usleep(100);
15         }
16     });
17
18     // Create dependencies.
19     t1->then(t2);
20
21     // Launch tasks.
22     t1->launch();
23
24     // Wait for t1 to complete.
25     t1->wait_for();
26
27     // Cancel t1.
28     // Because it has already completed, it does not propagate its cancellation.
29     t1->cancel();
30
31     // If the two lines below are uncommented the wait_for will never return.
32     // t2->launch();
33     // t2->wait_for();
34
35     hetcompute::runtime::shutdown();
36     return 0;
37 }

In the example above, t1 is launched after a dependency is set up between t1 and t2. In line 29, t1 is canceled after it has completed (the program waited for it in line 25), so t1->cancel() has no effect and does not propagate to t2. Thus, nobody cancels t2, and if lines 32 and 33 were uncommented, t2->wait_for() in line 33 would never return.

See Also

hetcompute::abort_on_cancel(), hetcompute::task<>::wait_for()

9.5.1.5.2.2 bool hetcompute::task<>::canceled ( ) const

Use this method to check whether a task is canceled. If the task was canceled – via cancellation propagation, hetcompute::group::cancel() or hetcompute::task<>::cancel() – before it started executing, hetcompute::task<>::canceled() returns true.

If the task was canceled – via hetcompute::group::cancel() or hetcompute::task<>::cancel() – while it was executing, then hetcompute::task<>::canceled() returns true only if the task is no longer executing and it exited via hetcompute::abort_on_cancel().

Finally, if the task completed successfully, hetcompute::task<>::canceled() always returns false.

Returns

true – The task has transitioned to a canceled state.
false – The task has not yet transitioned to a canceled state.

Example:

#include <cassert>
#include <iostream>
#include <unistd.h>

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    auto t = hetcompute::create_task([] {
        while (true)
        {
            hetcompute::abort_on_cancel();
            HETCOMPUTE_ILOG("Hello World!\n");
            usleep(1); // Sleep for one microsecond.
        }
    });

    auto g = hetcompute::create_group();

    // It will never fire.
    assert(t->canceled() == false);

    // Launch task.
    g->launch(t);

    // It will never fire.
    assert(t->canceled() == false);

    // Sleep for 10 microseconds.
    usleep(10);

    // It will never fire.
    assert(t->canceled() == false);

    // Cancel both the task and the group.
    t->cancel();
    g->cancel();

    // Might be false if the task has not executed abort_on_cancel() yet.
    // Might also be true if the task has already executed abort_on_cancel().
    HETCOMPUTE_ILOG("t->canceled() = %d", t->canceled() == true);

    try
    {
        // Wait for the task to transition to canceled state.
        t->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << "threw " << e.what() << " due to task cancellation " << std::endl;
    }
    catch (...)
    {
        // Never reached.
    }

    // It will never fire.
    assert(t->canceled() == true);

    hetcompute::runtime::shutdown();
    return 0;
}

See Also

hetcompute::task<>::cancel(), hetcompute::group::cancel()

9.5.1.5.2.3 void hetcompute::task<>::finish_after ( )

Specifies that the current task should be deemed to finish only after the task on which this method is invoked finishes. This method returns immediately.
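The completion rule can be pictured with a small model in standard C++. It is a sketch under stated assumptions rather than the SDK's mechanism: toy_finish_node and its members are hypothetical names. In the SDK the call is made on the awaited task from inside the current task; the model has no notion of a current task, so the awaiting node is named explicitly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: a node is "finished" once its own body has run and
// every node it registered through finish_after() is finished as well.
struct toy_finish_node
{
    bool body_done = false;                        // the node's own function has returned
    std::vector<const toy_finish_node*> awaited;   // nodes this node must outlast

    // Registers a node that must finish first. Returns immediately and never blocks.
    void finish_after(const toy_finish_node& other) { awaited.push_back(&other); }

    bool finished() const
    {
        if (!body_done)
            return false;
        for (const toy_finish_node* n : awaited)
        {
            if (!n->finished())
                return false;
        }
        return true;
    }
};
```

The point of the model is that the registering node never blocks: its body returns right away, and only its reported completion is deferred until the awaited node completes.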

Example

#include <algorithm>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <iterator>
#include <sstream>
#include <vector>

#include <hetcompute/hetcompute.hh>

// Parallel mergesort using recursive fork-join parallelism.
// hetcompute::task<>::finish_after allows easy expression of the parallelism in the
// algorithm in a non-blocking manner, yielding better performance than
// blocking parallelization using hetcompute::task<>::wait_for.

const size_t GRANULARITY = 8192;

// Asynchronous mergesort, to be invoked in a task
template <typename Iterator, typename Compare>
void
mergesort(Iterator begin, Iterator end, Compare cmp)
{
    size_t n = std::distance(begin, end);
    if (n <= GRANULARITY)
    {
        std::sort(begin, end, cmp);
    }
    else
    {
        auto middle = begin;
        std::advance(middle, n / 2);
        auto left  = hetcompute::launch([=] { mergesort(begin, middle, cmp); });
        auto right = hetcompute::launch([=] { mergesort(middle, end, cmp); });
        auto merge = hetcompute::create_task([=] { std::inplace_merge(begin, middle, end, cmp); });
        // The left subtree and right subtree tasks must finish before the merge
        // task can execute
        left->then(merge);
        right->then(merge);
        merge->launch();
        // mergesort(begin, end, cmp) logically finishes after the merge task
        // finishes
        merge->finish_after();
    }
}

int
main(int argc, const char* argv[])
{
    hetcompute::runtime::init();
    std::vector<long> input;
    size_t n_def = 1 << 16;
    size_t n = n_def;

    if (argc >= 2)
    {
        std::istringstream istr(argv[1]);
        istr >> n;
    }

    // Create a random array of integers
    for (size_t i = 0; i < n; i++)
    {
        input.push_back(rand());
    }

    // Launch mergesort inside a task since it has an asynchronous interface (due
    // to use of hetcompute::task::finish_after)
    auto t = hetcompute::launch([&] { mergesort(input.begin(), input.end(), std::less<long>()); });
    t->wait_for();

    if (!std::is_sorted(input.begin(), input.end()))
    {
        std::cerr << "parallel mergesorting failed\n";
    }

    hetcompute::runtime::shutdown();
    return 0;
}

Exceptions

api_exception If invoked from outside a task or from within a hetcompute::pfor_each, or if ’task’ points to null.

Note

If exceptions are disabled by the application, this API terminates the application if the pointer to the task is nullptr, or if it is invoked from outside a task or from within a hetcompute::pfor_each.

See Also

hetcompute::group::finish_after()

9.5.1.5.2.4 bool hetcompute::task<>::is_bound ( ) const

Use this method to check whether all the task parameters are bound. Returns true if the task has no parameters. Remember that only bound tasks can be launched.

Returns

true – All the task parameters are bound. If the task has none, is_bound() always returns true.
false – At least one of the task parameters is not bound.
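The return rule, including the case of a task without parameters, can be sketched with a fixed-size table of flags. The type toy_binding_state and its members are hypothetical and exist only for this illustration; the SDK's own bookkeeping is not documented here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of per-parameter binding state for a task with
// Arity parameters. All slots start unbound.
template <std::size_t Arity>
struct toy_binding_state
{
    std::array<bool, Arity> bound{};

    // Marks one parameter as bound.
    void bind(std::size_t index)
    {
        assert(index < Arity);
        bound[index] = true;
    }

    // True when every parameter is bound. For Arity == 0 the range is empty,
    // so std::all_of returns true, matching the rule stated above.
    bool is_bound() const
    {
        return std::all_of(bound.begin(), bound.end(), [](bool b) { return b; });
    }
};
```

A two-parameter instance reports unbound until both slots are set, while a zero-parameter instance reports bound from the start.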

9.5.1.5.2.5 void hetcompute::task<>::launch ( )

Launches task.

This method informs the Qualcomm HetCompute runtime that the task is ready to execute as soon as there is an available hardware context and after all its predecessors (both data- and control-dependent) have executed.

Example

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create task t1.
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World!\n"); });

    // ...
    // Set up dependencies if needed.
    // ...

    // t1 is ready, launch it.
    t1->launch();

    // Wait for t1 to finish.
    t1->wait_for(); // Will not return until t1 finishes.
    hetcompute::runtime::shutdown();
    return 0;
}

See Also

hetcompute::group::launch(Code)

9.5.1.5.2.6 task_ptr& hetcompute::task<>::then ( task_ptr<> & succ )

Creates a control dependency between two tasks.

The Qualcomm HetCompute runtime ensures that succ starts executing only after this has completed its execution, regardless of how many hardware execution contexts are available in the device. Use this method to create task dependency graphs.

Note: The programmer is responsible for ensuring that there are no cycles in the task graph.
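Since the runtime is not described as checking for cycles, an application that builds dependency graphs dynamically may want to validate them before launching. The following is a minimal sketch using a standard depth-first search over an adjacency list. It does not use the SDK: toy_graph, toy_detail, and has_cycle are hypothetical names, and tasks are represented only by their indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// successors[i] lists the tasks that must run after task i,
// that is, the targets of i->then(j) calls.
using toy_graph = std::vector<std::vector<std::size_t>>;

namespace toy_detail
{
    enum class mark { unvisited, in_progress, done };

    // Returns true if a cycle is reachable from node.
    inline bool visit(const toy_graph& g, std::size_t node, std::vector<mark>& marks)
    {
        marks[node] = mark::in_progress;
        for (std::size_t succ : g[node])
        {
            if (marks[succ] == mark::in_progress)
                return true; // back edge: succ is on the current path
            if (marks[succ] == mark::unvisited && visit(g, succ, marks))
                return true;
        }
        marks[node] = mark::done;
        return false;
    }
}

// Returns true if the dependency graph contains at least one cycle.
inline bool has_cycle(const toy_graph& g)
{
    std::vector<toy_detail::mark> marks(g.size(), toy_detail::mark::unvisited);
    for (std::size_t i = 0; i < g.size(); ++i)
    {
        if (marks[i] == toy_detail::mark::unvisited && toy_detail::visit(g, i, marks))
            return true;
    }
    return false;
}
```

A chain such as t1->then(t2)->then(t3) corresponds to the graph {{1}, {2}, {}} and is reported as acyclic, while adding an edge from the last task back to the first makes has_cycle return true.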

If succ has already been launched, hetcompute::task<>::then() will throw an api_exception. This is because it makes little sense to add a predecessor to a task that might already be running. On the other hand, if this successfully completed execution, no dependency will be created, and hetcompute::task<>::then() returns immediately. If this gets canceled (or if it is canceled in the future), succ will be canceled as well due to cancellation propagation.

Do not call this member if succ is nullptr. This would cause a fatal error.

Parameters

succ Successor task. Cannot be nullptr.

Exceptions

api_exception If succ has already been launched.

Note

If exceptions are disabled by the application, this API terminates the application if the successor has already been launched.

Example

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // Create group.
    auto g = hetcompute::create_group("Hello World Group");

    // Create tasks t1 and t2.
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World! from task t1\n"); });

    auto t2 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World! from task t2\n"); });

    // Create dependency between t1 and t2.
    t1->then(t2);

    // Launch both t1 and t2 into g.
    g->launch(t1);
    g->launch(t2);

    // Wait until t1 and t2 finish.
    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

Output:
Hello World! from task t1
Hello World! from task t2

No other output is possible because of the dependency between t1 and t2.

9.5.1.5.2.7 hc_error hetcompute::task<>::wait_for ( )

Waits for the task to complete execution.

This method does not return until the task completes its execution. It returns immediately if the task has already finished.

If hetcompute::task<>::wait_for() is called from within a task, Qualcomm HetCompute context-switches the task and finds another one to run. If called from outside a task (i.e., the main thread), Qualcomm HetCompute blocks the thread until the task completes.

This method is a safe point. Safe points are Qualcomm HetCompute API methods where the following property holds:

The thread on which the task executes before the API call might not be the same as the thread on which the task executes after the API call.

Example

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create task t1.
    auto t1 = hetcompute::create_task([] { HETCOMPUTE_ILOG("Hello World!\n"); });

    // ...
    // Set up dependencies if needed.
    // ...

    // t1 is ready, launch it.
    t1->launch();

    // Wait for t1 to finish.
    t1->wait_for(); // Will not return until t1 finishes.
    hetcompute::runtime::shutdown();
    return 0;
}

Exceptions

hetcompute::canceled_exception If the task or any task on which it is dependent was canceled.

Note

If exceptions are disabled by the application, the API returns hc_error::HC_TaskCanceled in the above case.

Exceptions

hetcompute::aggregate_exception If the task or any tasks on which it is dependent threw two or more exceptions.

Note

If exceptions are disabled by the application, the API returns hc_error::HC_TaskAggregateFailure in the above case.

Exceptions

hetcompute::gpu_exception If the task or any tasks on which it is dependent encountered a runtime error on the GPU.

Note

If exceptions are disabled by the application, the API returns hc_error::HC_TaskGpuFailure in the above case.

Exceptions

hetcompute::hexagon_exception If the task or any tasks on which it is dependent encountered a runtime error on the Hexagon DSP.

Note

If exceptions are disabled by the application, the API returns hc_error::HC_TaskDspFailure in the above case.

Exceptions

Any other exception that may be thrown by the task or any tasks on which it is dependent.
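When exceptions are disabled, the caller has to inspect the value returned by wait_for() instead of catching exceptions. The sketch below shows one way to map the error codes named in the notes above to messages. It deliberately avoids the SDK headers: toy_hc_error is a local stand-in for hc_error, the four failure names are taken from the notes above, and HC_Success is a hypothetical placeholder for the success value, whose real name is not given in this section.

```cpp
#include <cassert>
#include <string>

// Local stand-in for the SDK's hc_error type (assumption for illustration).
enum class toy_hc_error
{
    HC_Success,               // hypothetical name for "no error"
    HC_TaskCanceled,
    HC_TaskAggregateFailure,
    HC_TaskGpuFailure,
    HC_TaskDspFailure
};

// Translates an error code into a human-readable description.
inline std::string describe(toy_hc_error e)
{
    switch (e)
    {
    case toy_hc_error::HC_Success:
        return "task completed";
    case toy_hc_error::HC_TaskCanceled:
        return "task or a predecessor was canceled";
    case toy_hc_error::HC_TaskAggregateFailure:
        return "two or more exceptions were raised";
    case toy_hc_error::HC_TaskGpuFailure:
        return "runtime error on the GPU";
    case toy_hc_error::HC_TaskDspFailure:
        return "runtime error on the Hexagon DSP";
    }
    return "unknown error";
}

// Convenience predicate for call sites that only care about success.
inline bool task_succeeded(toy_hc_error e)
{
    return e == toy_hc_error::HC_Success;
}
```

A call site would then test the result of wait_for() once and branch on it, rather than wrapping the call in a try block.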

See Also

hetcompute::group::wait_for()

9.5.1.6 class hetcompute::task_ptr< ReturnType >

template<typename ReturnType>class hetcompute::task_ptr< ReturnType >

Smart pointer to a task object, i.e., hetcompute::task<ReturnType>, similar to std::shared_ptr.

Template Parameters

ReturnType Return type of the task function

Public Types

• using return_type = ReturnType

• using task_type = task< return_type >

Public member functions

• task_ptr ()

Default constructor. Constructs a task_ptr<ReturnType> with no task<ReturnType>.

• task_ptr (std::nullptr_t)

Constructor from nullptr. Constructs a task_ptr<ReturnType> with no task<ReturnType>.

• task_ptr (task_ptr< return_type > const &other)

Copy constructor. Constructs a task_ptr<ReturnType> that manages the same task as other.

• task_ptr (task_ptr< return_type > &&other)

Move constructor. Move-constructs a task_ptr<ReturnType> that manages the same task as other.

• ∼task_ptr ()

Destructor.

• task_type ∗ get () const

Returns pointer to managed task.

• template<typename T >

task_ptr & operator%= (T &&op)

Compound assignment operator %= with value operand.

• template<typename T >

task_ptr & operator&= (T &&op)

Compound assignment operator &= with value operand.

• template<typename T >

task_ptr & operator∗= (T &&op)

Compound assignment operator ∗= with value operand.

• template<typename T >

task_ptr & operator+= (T &&op)

Compound assignment operator += with value operand.

• template<typename T >

task_ptr & operator-= (T &&op)

Compound assignment operator -= with value operand.

• task_type ∗ operator-> () const

Dereference operator. Returns pointer to managed task.

• template<typename T >

task_ptr & operator/= (T &&op)

Compound assignment operator /= with value operand.

• task_ptr & operator= (task_ptr< return_type > const &other)

Assignment operator. Assigns the task managed by other to ∗this.

• task_ptr & operator= (task_ptr< return_type > &&other)

Assignment operator. Resets ∗this.

• template<typename T >

task_ptr & operator∧= (T &&op)

Compound assignment operator ∧= with value operand.

• template<typename T >

task_ptr & operator|= (T &&op)

Compound assignment operator |= with value operand.

• void swap (task_ptr< return_type > &other)

Exchanges managed tasks between ∗this and other.

9.5.1.6.1 Member Typedef Documentation

9.5.1.6.1.1 template<typename ReturnType > using hetcompute::task_ptr< ReturnType >::return_type= ReturnType

Return type of the task function.

9.5.1.6.1.2 template<typename ReturnType > using hetcompute::task_ptr< ReturnType >::task_type =task<return_type>

Task object type.

9.5.1.6.2 Constructors and Destructors

9.5.1.6.2.1 template<typename ReturnType > hetcompute::task_ptr< ReturnType >::task_ptr ( )

Constructs a task_ptr<ReturnType> that manages no task<ReturnType>. task_ptr<ReturnType>::get returns nullptr.

9.5.1.6.2.2 template<typename ReturnType > hetcompute::task_ptr< ReturnType >::task_ptr (std::nullptr_t )

Constructs a task_ptr<ReturnType> that manages no task<ReturnType>. task_ptr<ReturnType>::get returns nullptr.

9.5.1.6.2.3 template<typename ReturnType > hetcompute::task_ptr< ReturnType >::task_ptr (task_ptr< return_type > const & other )

Constructs a task_ptr<ReturnType> object that manages the same task<ReturnType> as other. If other points to nullptr, the newly built object also points to nullptr.

Parameters

other Task pointer to copy.

9.5.1.6.2.4 template<typename ReturnType > hetcompute::task_ptr< ReturnType >::task_ptr (task_ptr< return_type > && other )

Constructs a task_ptr<ReturnType> object that manages the same task as other and resets other. If other points to nullptr, the newly built object also points to nullptr.

Parameters

other Task pointer to move from.

9.5.1.6.2.5 template<typename ReturnType > hetcompute::task_ptr< ReturnType >::∼task_ptr ( )

Destructor.

9.5.1.6.3 Member Function Documentation

9.5.1.6.3.1 template<typename ReturnType > task_type∗ hetcompute::task_ptr< ReturnType >::get () const

Returns pointer to the managed task. Remember that the lifetime of the task is defined by the lifetime of the task_ptr<ReturnType> objects managing it. If all task_ptr<ReturnType> objects managing a task t go out of scope, all task<ReturnType>∗ pointing to t may be invalid.

Returns

Pointer to managed task object.
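Because task_ptr is described as similar to std::shared_ptr, the ownership rules for copying, moving, and raw pointers obtained from get() can be illustrated with std::shared_ptr itself. The sketch below is an analogy only and does not use the SDK: toy_payload and the three helper functions are hypothetical names, and it assumes that task_ptr follows the same shared-ownership rules as std::shared_ptr.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Placeholder for the managed object; stands in for a task.
struct toy_payload
{
    int value = 0;
};

// Copying shares ownership: both pointers manage the same object.
inline long owners_after_copy()
{
    auto a = std::make_shared<toy_payload>();
    auto b = a; // copy: a and b manage the same object
    assert(a.get() == b.get());
    return a.use_count();
}

// Moving transfers ownership and resets the source pointer.
inline bool source_empty_after_move()
{
    auto a = std::make_shared<toy_payload>();
    auto b = std::move(a);
    return a.get() == nullptr && b.get() != nullptr;
}

// A raw pointer obtained from get() does not extend the object's lifetime.
inline bool object_destroyed_when_last_owner_leaves()
{
    std::weak_ptr<toy_payload> observer;
    {
        auto a = std::make_shared<toy_payload>();
        observer = a;
        toy_payload* raw = a.get(); // valid only while some owner exists
        raw->value = 1;
    } // the last owner is gone here, so the object is destroyed
    return observer.expired();
}
```

The practical consequence is the same as stated above: a raw pointer returned by get() should not be stored beyond the lifetime of the smart pointers that manage the object.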

9.5.1.6.3.2 template<typename ReturnType > template<typename T > task_ptr& hetcompute::task_ptr< ReturnType >::operator%= ( T && op )

Compound assignment operator %= with value operand.

See Also

template<typename T> task_ptr& operator+=(T&& op2)

9.5.1.6.3.3 template<typename ReturnType > template<typename T > task_ptr& hetcompute::task_ptr< ReturnType >::operator&= ( T && op )

Compound assignment operator &= with value operand.

See Also

template<typename T> task_ptr& operator+=(T&& op2)

9.5.1.6.3.4 template<typename ReturnType > template<typename T > task_ptr& hetcompute::task_ptr< ReturnType >::operator∗= ( T && op )

Compound assignment operator ∗= with value operand.

See Also

template<typename T> task_ptr& operator+=(T&& op2)

9.5.1.6.3.5 template<typename ReturnType > template<typename T > task_ptr& hetcompute::task_ptr< ReturnType >::operator+= ( T && op )

Compound assignment operator += with value operand.

Creates a new task whose return value is the result of applying this operator to the return value of the original task pointed to by this task_ptr and the operand on the right side of the operator.

The new task is data dependent on the original task and, if the operand on the right side is also a task, on that task as well.

The runtime launches the new task automatically once the data is ready.

This task_ptr releases its reference to the old task and points to the newly created task that holds the result value.

Note: The operator must be applicable to the return value of the current task and the operand (if the operand is also a task, its return value is used).

Parameters

op Operand on the right side of this operator.

Returns

A new task whose return value is the result of this operator and can be pointed to by this shared pointer (same type of return value).

#include <iostream>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // Create and launch a task that returns 73.
    auto t = hetcompute::launch([]() { return 73; });

    // Create a task whose return value is t's return value + 27.8.
    // The new task is still pointed to by t.
    // The new task is data dependent on the original task,
    // and the return type stays the same (type coercion).
    t += 27.8;

    // Wait for t to finish and display the return value.
    std::cout << "The return value of t is: " << t->copy_value() << std::endl;

    hetcompute::runtime::shutdown();
}
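The rebinding behavior of the compound assignment operators can be modeled without the SDK. In the sketch below a task is a lazily evaluated function and a task pointer is a std::shared_ptr to it. All names (toy_fn, toy_task_ptr, toy_create, add_assign) are hypothetical, and evaluation is on demand rather than scheduled by a runtime. The model also assumes that the result is converted back to the original return type, as the comment in the example above suggests.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical model: a task is a lazily evaluated function returning int,
// and a "task pointer" is a shared_ptr to that function.
using toy_fn = std::function<int()>;
using toy_task_ptr = std::shared_ptr<toy_fn>;

inline toy_task_ptr toy_create(toy_fn fn)
{
    return std::make_shared<toy_fn>(std::move(fn));
}

// Models t += operand: t is rebound to a new task that depends on the old one.
template <typename T>
toy_task_ptr& add_assign(toy_task_ptr& t, T operand)
{
    toy_task_ptr previous = t; // the new task keeps the original alive as its data dependency
    t = toy_create([previous, operand] {
        // The sum is converted back to the original return type (int).
        return static_cast<int>((*previous)() + operand);
    });
    return t;
}
```

In this model, a task returning 73 combined with 27.8 yields 100 when evaluated, because the sum 100.8 is truncated on conversion to int. Whether the SDK truncates in the same way is not stated in this section.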

9.5.1.6.3.6 template<typename ReturnType > template<typename T > task_ptr& hetcompute::task_ptr< ReturnType >::operator-= ( T && op )

Compound assignment operator -= with value operand.

See Also

template<typename T> task_ptr& operator+=(T&& op2)

9.5.1.6.3.7 template<typename ReturnType > task_type∗ hetcompute::task_ptr< ReturnType>::operator-> ( ) const

Returns pointer to managed task. Do not call this member function if ∗this manages no task.

Exceptions

api_exception If task pointer is nullptr.

Note

If exceptions are disabled in the application, the application terminates if the task pointer is nullptr.

Returns

Pointer to managed task object.

9.5.1.6.3.8 template<typename ReturnType > template<typename T > task_ptr& hetcompute::task_ptr< ReturnType >::operator/= ( T && op )

Compound assignment operator /= with value operand.

See Also

template<typename T> task_ptr& operator+=(T&& op2)

9.5.1.6.3.9 template<typename ReturnType > task_ptr& hetcompute::task_ptr< ReturnType>::operator= ( task_ptr< return_type > const & other )

Assigns the task managed by other to ∗this. If, before the assignment, ∗this was the last task_ptr<ReturnType> pointing to a task t, then the assignment will cause t to be destroyed. If other manages no object, ∗this will also not manage an object after the assignment.

Parameters

other Task pointer to copy.

Returns

∗this.
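Because task_ptr is described as similar to std::shared_ptr, the ownership rules for copy assignment can be illustrated with std::shared_ptr itself. This is a sketch of the analogous standard-library behavior, not of the HetCompute implementation; `tracked` and both functions are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload that counts destructions; it stands in for a task object.
struct tracked
{
    explicit tracked(int& counter) : destroyed(counter) {}
    ~tracked() { ++destroyed; }
    int& destroyed;
};

// Assigning over the last owner destroys the previously managed object.
int destroyed_after_copy_assign()
{
    int destroyed = 0;
    auto a = std::make_shared<tracked>(destroyed); // sole owner of the first object
    auto b = std::make_shared<tracked>(destroyed); // sole owner of the second object
    a = b; // the first object loses its last owner and is destroyed here
    return destroyed; // a and b still share the second object at this point
}

// Assigning an empty pointer leaves the target empty as well.
bool empty_after_assigning_empty()
{
    int destroyed = 0;
    auto a = std::make_shared<tracked>(destroyed);
    std::shared_ptr<tracked> empty;
    a = empty;
    return a == nullptr && destroyed == 1;
}
```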

9.5.1.6.3.10 template<typename ReturnType > task_ptr& hetcompute::task_ptr< ReturnType>::operator= ( task_ptr< return_type > && other )

Resets ∗this so that it manages no object. If, before the assignment, ∗this was the last task_ptr<ReturnType> pointing to a task t, then the assignment will cause t to be destroyed. If other manages no object, ∗this will also not manage an object after the assignment.

Returns

∗this.

9.5.1.6.3.11 template<typename ReturnType > template<typename T > task_ptr& hetcompute::task_ptr< ReturnType >::operator^= ( T && op )

Compound assignment operator ^= with value operand.

See Also

template<typename T> task_ptr& operator+=(T&& op2)

9.5.1.6.3.12 template<typename ReturnType > template<typename T > task_ptr& hetcompute::task_ptr< ReturnType >::operator|= ( T && op )

Compound assignment operator |= with value operand.

See Also

template<typename T> task_ptr& operator+=(T&& op2)

9.5.1.6.3.13 template<typename ReturnType > void hetcompute::task_ptr< ReturnType >::swap (task_ptr< return_type > & other )

Exchanges managed tasks between ∗this and other.

Parameters

other Task pointer to exchange with.

9.5.1.7 class hetcompute::task_ptr< ReturnType(Args...)>

template<typename ReturnType, typename... Args> class hetcompute::task_ptr< ReturnType(Args...)>

Smart pointer to a task object, i.e., hetcompute::task<ReturnType(Args...)>, similar to std::shared_ptr.

Template Parameters

ReturnType Return type of the task function.

Args... The type for the task parameters.

Public Types

• using args_tuple = typename task_type::args_tuple

• using return_type = typename task_type::return_type

• using size_type = typename task_type::size_type

• using task_type = task< ReturnType(Args...)>

Public member functions

• task_ptr ()

Default constructor. Constructs a task_ptr<ReturnType> with no task<ReturnType>.

• task_ptr (std::nullptr_t)

Default constructor. Constructs a task_ptr<ReturnType> with no task<ReturnType>.

• task_ptr (task_ptr< ReturnType(Args...)> const &other)

Copy constructor. Constructs a task_ptr<ReturnType> that manages the same task as other.

• task_ptr (task_ptr< ReturnType(Args...)> &&other)

Move constructor. Move-constructs a task_ptr<ReturnType> that manages the same task as other.

• ∼task_ptr ()

Destructor.

• task_type ∗ get () const

Returns pointer to managed task.

• task_type ∗ operator-> () const

Dereference operator. Returns pointer to managed task.

• task_ptr & operator= (task_ptr< ReturnType(Args...)> const &other)

Assignment operator. Assigns the task managed by other to ∗this.

• task_ptr & operator= (task_ptr< ReturnType(Args...)> &&other)

Assignment operator. Resets ∗this.

• void swap (task_ptr< ReturnType(Args...)> &other)

Exchanges managed tasks between ∗this and other.

Static Public Attributes

• static HETCOMPUTE_CONSTEXPR_CONST size_type arity = task_type::arity

9.5.1.7.1 Member Typedef Documentation

9.5.1.7.1.1 template<typename ReturnType , typename... Args> using hetcompute::task_ptr<ReturnType(Args...)>::args_tuple = typename task_type::args_tuple

Tuple for the task object parameter types.

9.5.1.7.1.2 template<typename ReturnType , typename... Args> using hetcompute::task_ptr<ReturnType(Args...)>::return_type = typename task_type::return_type

Return type of the task function.

9.5.1.7.1.3 template<typename ReturnType , typename... Args> using hetcompute::task_ptr<ReturnType(Args...)>::size_type = typename task_type::size_type

Unsigned integral type.

9.5.1.7.1.4 template<typename ReturnType , typename... Args> using hetcompute::task_ptr<ReturnType(Args...)>::task_type = task<ReturnType(Args...)>

Task object type.

9.5.1.7.2 Constructors and Destructors

9.5.1.7.2.1 template<typename ReturnType , typename... Args> hetcompute::task_ptr< ReturnType(Args...)>::task_ptr ( )

Constructs a task_ptr<ReturnType> that manages no task<ReturnType>. task_ptr<ReturnType>::get returns nullptr.

9.5.1.7.2.2 template<typename ReturnType , typename... Args> hetcompute::task_ptr< ReturnType(Args...)>::task_ptr ( std::nullptr_t )

Constructs a task_ptr<ReturnType> that manages no task<ReturnType>. task_ptr<ReturnType>::get returns nullptr.

9.5.1.7.2.3 template<typename ReturnType , typename... Args> hetcompute::task_ptr< ReturnType(Args...)>::task_ptr ( task_ptr< ReturnType(Args...)> const & other )

Constructs a task_ptr<ReturnType> object that manages the same task<ReturnType> as other. If other points to nullptr, the newly built object also points to nullptr.

Parameters

other Task pointer to copy.

9.5.1.7.2.4 template<typename ReturnType , typename... Args> hetcompute::task_ptr< ReturnType(Args...)>::task_ptr ( task_ptr< ReturnType(Args...)> && other )

Constructs a task_ptr<ReturnType> object that manages the same task as other and resets other. If other points to nullptr, the newly built object also points to nullptr.

Parameters

other Task pointer to move from.

9.5.1.7.2.5 template<typename ReturnType , typename... Args> hetcompute::task_ptr< ReturnType(Args...)>::∼task_ptr ( )

Destructor.

9.5.1.7.3 Member Function Documentation

9.5.1.7.3.1 template<typename ReturnType , typename... Args> task_type∗ hetcompute::task_ptr<ReturnType(Args...)>::get ( ) const

Returns pointer to the managed task. Remember that the lifetime of the task is defined by the lifetime of the task_ptr<ReturnType> objects managing it. If all task_ptr<ReturnType> objects managing a task t go out of scope, all task<ReturnType>∗ pointing to t may be invalid.

Returns

Pointer to managed task object.
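The lifetime caveat for get() is the same one that applies to std::shared_ptr::get(). The sketch below uses only the standard library as an analogy (it does not call HetCompute); a std::weak_ptr is used to observe, without dereferencing anything, that the object is gone once its last owner leaves scope.

```cpp
#include <cassert>
#include <memory>

// Returns true if the managed object was destroyed while a raw pointer
// obtained from get() was still in scope.
bool raw_pointer_outlives_owners()
{
    std::weak_ptr<int> observer;
    int* raw = nullptr;
    {
        auto owner = std::make_shared<int>(42);
        raw = owner.get(); // non-owning: does not extend the lifetime
        observer = owner;  // lets us detect destruction safely
    } // the last owner goes out of scope; the int is destroyed here

    // `raw` still holds the old address but must not be dereferenced.
    (void)raw;
    return observer.expired();
}
```

A raw pointer obtained from get() is therefore best used only while a managing smart pointer is known to be alive.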

9.5.1.7.3.2 template<typename ReturnType , typename... Args> task_type∗ hetcompute::task_ptr<ReturnType(Args...)>::operator-> ( ) const

Returns pointer to managed task. Do not call this member function if ∗this manages no task.

Returns

Pointer to managed task object.

9.5.1.7.3.3 template<typename ReturnType , typename... Args> task_ptr& hetcompute::task_ptr<ReturnType(Args...)>::operator= ( task_ptr< ReturnType(Args...)> const & other )

Assigns the task managed by other to ∗this. If, before the assignment, ∗this was the last task_ptr<ReturnType> pointing to a task t, then the assignment will cause t to be destroyed. If other manages no object, ∗this will also not manage an object after the assignment.

Parameters

other Task pointer to copy.

Returns

∗this.

9.5.1.7.3.4 template<typename ReturnType , typename... Args> task_ptr& hetcompute::task_ptr<ReturnType(Args...)>::operator= ( task_ptr< ReturnType(Args...)> && other )

Resets ∗this so that it manages no object. If, before the assignment, ∗this was the last task_ptr<ReturnType> pointing to a task t, then the assignment will cause t to be destroyed. If other manages no object, ∗this will also not manage an object after the assignment.

Returns

∗this.

9.5.1.7.3.5 template<typename ReturnType , typename... Args> void hetcompute::task_ptr<ReturnType(Args...)>::swap ( task_ptr< ReturnType(Args...)> & other )

Exchanges managed tasks between ∗this and other.

Parameters

other Task pointer to exchange with.

9.5.1.7.4 Member Data Documentation

9.5.1.7.4.1 template<typename ReturnType , typename... Args> HETCOMPUTE_CONSTEXPR_CONST size_type hetcompute::task_ptr< ReturnType(Args...)>::arity = task_type::arity [static]

Number of parameters.
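How members such as return_type, args_tuple, and arity can be derived from a function signature like ReturnType(Args...) is a standard C++ template technique. The following sketch shows one way to do it; `signature_traits` is a hypothetical trait written for illustration and is not the SDK's internal implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical trait (not part of HetCompute): the primary template is
// left undefined so that only function signatures are accepted.
template <typename Signature>
struct signature_traits;

// Partial specialization that decomposes R(Args...).
template <typename R, typename... Args>
struct signature_traits<R(Args...)>
{
    using return_type = R;
    using args_tuple  = std::tuple<Args...>;
    using size_type   = std::size_t;
    static constexpr size_type arity = sizeof...(Args); // number of parameters
};

static_assert(signature_traits<int(float, char)>::arity == 2, "two parameters");
static_assert(signature_traits<void()>::arity == 0, "no parameters");
static_assert(std::is_same<signature_traits<int(float, char)>::args_tuple,
                           std::tuple<float, char>>::value,
              "args_tuple lists the parameter types in order");
```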

9.5.1.8 class hetcompute::task_ptr< void >

template<> class hetcompute::task_ptr< void >

Smart pointer to a task object, i.e., hetcompute::task<void>, similar to std::shared_ptr.

Public Types

• using task_type = task< void >

Public member functions

• task_ptr ()

Default constructor. Constructs a task_ptr<void> with no task<void>.

• task_ptr (std::nullptr_t)

Default constructor. Constructs a task_ptr<void> with no task<void>.

• task_ptr (task_ptr< void > const &other)

Copy constructor. Constructs a task_ptr<void> that manages the same task as other.

• task_ptr (task_ptr< void > &&other)

Move constructor. Move-constructs a task_ptr<void> that manages the same task as other.

• ∼task_ptr ()

• task_type ∗ get () const

Returns pointer to managed task.

• task_type ∗ operator-> () const

Dereference operator. Returns pointer to managed task.

• task_ptr & operator= (task_ptr< void > const &other)

Assignment operator. Assigns the task managed by other to ∗this.

• task_ptr & operator= (task_ptr< void > &&other)

Assignment operator. Resets ∗this.

• void swap (task_ptr< void > &other)

Exchanges managed tasks between ∗this and other.

Protected Member Functions

• task_ptr (::hetcompute::internal::task ∗t,::hetcompute::internal::task_shared_ptr::ref_policy policy)

9.5.1.8.1 Member Typedef Documentation

9.5.1.8.1.1 using hetcompute::task_ptr< void >::task_type = task<void>

Task object type.

9.5.1.8.2 Constructors and Destructors

9.5.1.8.2.1 hetcompute::task_ptr< void >::task_ptr ( )

Constructs a task_ptr<void> that manages no task<void>. task_ptr<void>::get returns nullptr.

9.5.1.8.2.2 hetcompute::task_ptr< void >::task_ptr ( std::nullptr_t )

Constructs a task_ptr<void> that manages no task<void>. task_ptr<void>::get returns nullptr.

9.5.1.8.2.3 hetcompute::task_ptr< void >::task_ptr ( task_ptr< void > const & other )

Constructs a task_ptr<void> object that manages the same task<void> as other. If other points to nullptr, the newly built object also points to nullptr.

Parameters

other Task pointer to copy.

9.5.1.8.2.4 hetcompute::task_ptr< void >::task_ptr ( task_ptr< void > && other )

Constructs a task_ptr<void> object that manages the same task as other and resets other. If other points to nullptr, the newly built object also points to nullptr.

Parameters

other Task pointer to move from.

9.5.1.8.2.5 hetcompute::task_ptr< void >::∼task_ptr ( )

Destructor.

9.5.1.8.3 Member Function Documentation

9.5.1.8.3.1 task_type∗ hetcompute::task_ptr< void >::get ( ) const

Returns pointer to the managed task. Remember that the lifetime of the task is defined by the lifetime of the task_ptr<void> objects managing it. If all task_ptr<void> objects managing a task t go out of scope, all task<void>∗ pointing to t may be invalid.

Returns

Pointer to managed task object.

9.5.1.8.3.2 task_type∗ hetcompute::task_ptr< void >::operator-> ( ) const

Returns pointer to managed task. Do not call this member function if ∗this manages no task.

Returns

Pointer to managed task object.

9.5.1.8.3.3 task_ptr& hetcompute::task_ptr< void >::operator= ( task_ptr< void > const & other )

Assigns the task managed by other to ∗this. If, before the assignment, ∗this was the last task_ptr<void> pointing to a task t, then the assignment will cause t to be destroyed. If other manages no object, ∗this will also not manage an object after the assignment.

Parameters

other Task pointer to copy.

Returns

∗this.

9.5.1.8.3.4 task_ptr& hetcompute::task_ptr< void >::operator= ( task_ptr< void > && other )

Resets ∗this so that it manages no object. If, before the assignment, ∗this was the last task_ptr<void> pointing to a task t, then the assignment will cause t to be destroyed. If other manages no object, ∗this will also not manage an object after the assignment.

Returns

∗this.

9.5.1.8.3.5 void hetcompute::task_ptr< void >::swap ( task_ptr< void > & other )

Exchanges managed tasks between ∗this and other.

Parameters

other Task pointer to exchange with.

9.5.1.9 class hetcompute::task_ptr<>

template<> class hetcompute::task_ptr<>

Smart pointer to a task object, i.e., hetcompute::task<>, similar to std::shared_ptr.

Public Types

• using task_type = task<>

Public member functions

• task_ptr ()

Default constructor. Constructs a task_ptr<> with no task<>.

• task_ptr (std::nullptr_t)

Default constructor. Constructs a task_ptr<> with no task<>.

• task_ptr (task_ptr const &other)

Copy constructor. Constructs a task_ptr<> that manages the same task as other.

• task_ptr (task_ptr &&other)

Move constructor. Move-constructs a task_ptr<> that manages the same task as other.

• ∼task_ptr ()

• task_type ∗ get () const

Returns the pointer to the managed task.

• operator bool () const

Checks whether pointer is not nullptr.

• task_type ∗ operator-> () const

Dereference operator. Returns pointer to managed task.

• task_ptr & operator= (task_ptr const &other)

Assignment operator. Assigns the task managed by other to ∗this.

• task_ptr & operator= (std::nullptr_t)

Assignment operator. Resets ∗this.

• task_ptr & operator= (task_ptr &&other)

Move-assignment operator. Move-assigns the task managed by other to ∗this.

• void reset ()

Resets the pointer to the managed task.

• void swap (task_ptr &other)

Exchanges managed tasks between ∗this and other.

• bool unique () const

Returns if this is the only task_ptr managing the underlying task.

• size_t use_count () const

Returns the number of task_ptr<> objects managing the same object (including ∗this).

9.5.1.9.1 Member Typedef Documentation

9.5.1.9.1.1 using hetcompute::task_ptr<>::task_type = task<>

Task object type.

9.5.1.9.2 Constructors and Destructors

9.5.1.9.2.1 hetcompute::task_ptr<>::task_ptr ( )

Constructs a task_ptr<> that manages no task<>. task_ptr<>::get returns nullptr.

9.5.1.9.2.2 hetcompute::task_ptr<>::task_ptr ( std::nullptr_t )

Constructs a task_ptr<> that manages no task<>. task_ptr<>::get returns nullptr.

9.5.1.9.2.3 hetcompute::task_ptr<>::task_ptr ( task_ptr<> const & other )

Constructs a task_ptr<> object that manages the same task<> as other. If other points to nullptr, the newly built object also points to nullptr.

Parameters

other Task pointer to copy.

9.5.1.9.2.4 hetcompute::task_ptr<>::task_ptr ( task_ptr<> && other )

Constructs a task_ptr<> object that manages the same task as other and resets other. If other points to nullptr, the newly built object also points to nullptr.

Parameters

other Task pointer to move from.

9.5.1.9.2.5 hetcompute::task_ptr<>::∼task_ptr ( )

Default destructor.

9.5.1.9.3 Member Function Documentation

9.5.1.9.3.1 task_type∗ hetcompute::task_ptr<>::get ( ) const

Returns pointer to the managed task. Remember that the lifetime of the task is defined by the lifetime of the task_ptr<> objects managing it. If all task_ptr<> objects managing a task t go out of scope, all task<>∗ pointing to t may be invalid.

Returns

Pointer to managed task object.

9.5.1.9.3.2 hetcompute::task_ptr<>::operator bool ( ) const [explicit]

Checks whether ∗this manages a task.

Returns

true – The pointer is not nullptr (∗this manages a task).

false – The pointer is nullptr (∗this does not manage a task).
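The [explicit] qualifier on operator bool matters in practice: the pointer can be tested in a condition, but it does not silently convert to an integer. The sketch below demonstrates that language rule with a hypothetical `handle` class written for illustration; it is not a HetCompute type.

```cpp
#include <cassert>
#include <memory>

// Hypothetical minimal handle, used only to show what `explicit` permits.
class handle
{
public:
    handle() = default;
    explicit handle(int v) : value_(std::make_shared<int>(v)) {}

    // Usable in conditions, but not as an implicit conversion to bool or int.
    explicit operator bool() const { return value_ != nullptr; }

private:
    std::shared_ptr<int> value_;
};

// Counts how many of the two handles manage an object.
int count_managed(const handle& a, const handle& b)
{
    int n = 0;
    if (a) { ++n; } // contextual conversion to bool is allowed
    if (b) { ++n; }
    // int bad = a; // would not compile: no implicit conversion exists
    return n;
}
```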

9.5.1.9.3.3 task_type∗ hetcompute::task_ptr<>::operator-> ( ) const

Returns pointer to managed task. Do not call this member function if ∗this manages no task.

Returns

Pointer to managed task object.

9.5.1.9.3.4 task_ptr& hetcompute::task_ptr<>::operator= ( task_ptr<> const & other )

Assigns the task managed by other to ∗this. If, before the assignment, ∗this was the last task_ptr<> pointing to a task t, then the assignment will cause t to be destroyed. If other manages no object, ∗this will also not manage an object after the assignment.

Parameters

other Task pointer to copy.

Returns

∗this.

9.5.1.9.3.5 task_ptr& hetcompute::task_ptr<>::operator= ( std::nullptr_t )

Resets ∗this so that it manages no object. If, before the assignment, ∗this was the last task_ptr<> pointing to a task t, then the assignment will cause t to be destroyed. ∗this will also not manage an object after the assignment.

Returns

∗this.

9.5.1.9.3.6 task_ptr& hetcompute::task_ptr<>::operator= ( task_ptr<> && other )

Move-assigns the task managed by other to ∗this. other will manage no task after the assignment.

If, before the assignment, ∗this was the last task_ptr<> pointing to a task t, then the assignment will cause t to be destroyed. If other manages no object, ∗this will also not manage an object after the assignment.

Parameters

other Task pointer to move from.

Returns

∗this.
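The move-assignment rules described here match those of std::shared_ptr, so they can be checked with the standard type. This is an analogy only and makes no claim about the SDK's implementation; the function name is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns true if move assignment behaves as described: the source ends up
// empty, the previously managed object is destroyed, and the target takes over.
bool move_assign_leaves_source_empty()
{
    auto target = std::make_shared<int>(1);
    auto source = std::make_shared<int>(2);
    std::weak_ptr<int> old_target = target; // observes the object target owned first

    target = std::move(source);

    return source == nullptr           // source no longer manages anything
           && old_target.expired()     // the previous object lost its last owner
           && *target == 2             // target now manages source's object
           && target.use_count() == 1; // ownership moved, it was not shared
}
```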

9.5.1.9.3.7 void hetcompute::task_ptr<>::reset ( )

Resets pointer to managed task. If ∗this was the last task_ptr<> pointing to a task t, then reset() causes t to be destroyed.

Returns

Pointer to managed task object.

9.5.1.9.3.8 void hetcompute::task_ptr<>::swap ( task_ptr<> & other )

Exchanges managed tasks between ∗this and other.

Parameters

other Task pointer to exchange with.

9.5.1.9.3.9 bool hetcompute::task_ptr<>::unique ( ) const

Returns whether this is the only task_ptr object managing the underlying task.

Returns

A boolean indicating whether this task_ptr uniquely manages the underlying task.

9.5.1.9.3.10 size_t hetcompute::task_ptr<>::use_count ( ) const

Returns the number of task_ptr<> objects managing the same object (including ∗this).

#include <atomic>
#include <cassert>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    std::atomic<bool> running(false);
    std::atomic<bool> finish(false);

    // Create task t.
    auto t = hetcompute::create_task([&running, &finish] {
        running = true;
        while (!finish)
        {
        };
    });

    // t's use_count should be 1.
    HETCOMPUTE_ILOG("After construction: t.use_count() = %zu\n", t.use_count());

    // Copy-construct t2 from t. t and t2's use_count is 2.
    auto t2 = t;
    HETCOMPUTE_ILOG("After copy-construction: t2.use_count() = %zu\n", t2.use_count());

    // get() returns a raw pointer and leaves the use_count unchanged.
    auto t3 = t.get();
    HETCOMPUTE_ILOG("After calling t.get(). t.use_count() = %zu\n", t.use_count());

    // t's use_count should be 2.
    HETCOMPUTE_ILOG("After t->wait_for: t.use_count() = %zu\n", t.use_count());

    assert(t3 != nullptr);
    HETCOMPUTE_UNUSED(t3);
    hetcompute::runtime::shutdown();
    return 0;
}

Output

After construction: t.use_count() = 1
After copy-construction: t2.use_count() = 2
After calling t.get(). t.use_count() = 2
After t->wait_for: t.use_count() = 2

Returns

Total number of task_ptr<> objects that point to the same task.

9.5.2 Typedef Documentation

9.5.2.1 template<typename Fn > using hetcompute::collapsed_task_type = typedef typename ::hetcompute::internal::task_factory<Fn>::collapsed_task_type

Collapsed task type.

9.5.2.2 template<typename Fn > using hetcompute::non_collapsed_task_type = typedef typename ::hetcompute::internal::task_factory<Fn>::non_collapsed_task_type

Non-collapsed task type.

9.5.3 Function Documentation

9.5.3.1 void hetcompute::abort_on_cancel ( )

HetCompute uses cooperative multitasking. Therefore, it cannot abort an executing task without help from the task. In HetCompute, each executing task is responsible for periodically checking whether it should abort. Thus, tasks call hetcompute::abort_on_cancel() to test whether they, or any of the groups to which they belong, have been canceled. If so, hetcompute::abort_on_cancel() does not return. Instead, it throws hetcompute::abort_task_exception, which the HetCompute runtime catches. The runtime then transitions the task to a canceled state and propagates cancellation to the task’s successors, if any.

Because hetcompute::abort_on_cancel() does not return if the task has been canceled, we recommend that you use RAII to allocate and deallocate the resources used inside a task. If using RAII in your code is not an option, surround hetcompute::abort_on_cancel() with try – catch, and call throw from within the catch block after the cleanup code.
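The RAII recommendation can be illustrated without the SDK. In the sketch below every name is hypothetical: `check_cancel()` plays the role of abort_on_cancel(), `abort_signal` stands in for the exception it throws, and the try/catch in `run_task_body` stands in for the runtime catching that exception. The point is that the guard's destructor releases the resource during stack unwinding, so no explicit cleanup code is needed at the cancellation point.

```cpp
#include <atomic>
#include <cassert>
#include <exception>

// Hypothetical names throughout; none of these are HetCompute APIs.
struct abort_signal : std::exception {};

std::atomic<bool> cancel_requested{false};
int open_resources = 0;

// Plays the role of abort_on_cancel(): returns normally unless canceled.
void check_cancel()
{
    if (cancel_requested.load()) { throw abort_signal(); }
}

// RAII guard: the destructor also runs during stack unwinding.
struct resource_guard
{
    resource_guard() { ++open_resources; }
    ~resource_guard() { --open_resources; }
};

// Returns true if the body was aborted by a cancellation request.
// A negative cancel_at means cancellation is never requested.
bool run_task_body(int iterations, int cancel_at)
{
    cancel_requested = false;
    try
    {
        resource_guard guard; // acquired once, released on any exit path
        for (int i = 0; i < iterations; ++i)
        {
            if (i == cancel_at) { cancel_requested = true; } // simulate an external cancel
            check_cancel();
        }
        return false;
    }
    catch (const abort_signal&)
    {
        return true; // a runtime would mark the task as canceled here
    }
}
```

After either outcome `open_resources` is back to zero, which is the property the RAII advice is meant to guarantee.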

Exceptions

abort_task_exception If called from a task that has been canceled via hetcompute::cancel() or that belongs to a canceled group.

api_exception If called from outside a task.

Note

If exceptions are disabled in the application, calling this function from outside a task terminates the application. Another caveat when using abort_on_cancel with exceptions disabled is that the application code can get sandwiched between functions that are able to handle exceptions, resulting in improper cleanup in the function where exceptions are disabled.

Example 1

#include <unistd.h> // usleep
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // Create task
    auto t = hetcompute::create_task([] {
        size_t num_iters = 0;
        while (1)
        {
            HETCOMPUTE_ILOG("Task has executed %zu iterations!", num_iters);

            // Check whether the task needs to stop execution.
            // Without abort_on_cancel() the task would never
            // return
            hetcompute::abort_on_cancel();

            usleep(30);
            num_iters++;
        }
    });

    // Create group g
    auto g = hetcompute::create_group("example group");

    // Launch t into g.
    g->launch(t);
    // We don't use t after launch(), so we can reset the shared pointer
    t.reset();

    // Wait for the task to execute a few iterations
    usleep(200);

    // Cancel group g, and wait for t to complete
    g->cancel();

    try
    {
        g->wait_for();
    }
    catch (const hetcompute::canceled_exception&)
    {
        // Do nothing
    }
    catch (...)
    {
        // Do nothing
    }

    hetcompute::runtime::shutdown();
    return 0;
}

Output

Task has executed 1 iterations!
Task has executed 2 iterations!
...
Task has executed 47 iterations!

Example 2

#include <cassert>
#include <iostream>
#include <unistd.h> // usleep

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    auto t = hetcompute::create_task([] {
        while (1)
        {
            try
            {
                hetcompute::abort_on_cancel();
            }
            catch (const hetcompute::abort_task_exception&)
            {
                //..do cleanup
                throw;
            }
            catch (...)
            {
                //..do cleanup
                throw;
            }
            // HETCOMPUTE_ILOG("Waiting to be canceled.\n");
            usleep(10);
        }
        assert(false); // This will never fire
    });

    // Launch t
    t->launch();

    // Wait for 20 micro-seconds.
    usleep(20);

    // Cancel task. Returns immediately.
    t->cancel();

    try
    {
        // Wait for the task to complete.
        t->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << e.what() << " thrown" << std::endl;
    }
    catch (...)
    {
        // Never reached
    }

    hetcompute::runtime::shutdown();
    return 0;
}

9.5.3.2 void hetcompute::abort_task ( )

Use this method from within a running task to immediately abort it and all its successors. hetcompute::abort_task() never returns. Instead, it throws hetcompute::abort_task_exception, which the HetCompute runtime catches. The runtime then transitions the task to a canceled state and propagates cancellation to the task’s successors, if any.

Exceptions

abort_task_exception If called from a task that has been canceled via hetcompute::cancel() or a task that belongs to a canceled group.

api_exception If called from outside a task.

Note

If exceptions are disabled in the application, calling this function from outside a task terminates the application.

Example

#include <cassert>
#include <iostream>
#include <unistd.h> // sleep

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    auto t1 = hetcompute::create_task([] {
        int i = 0;
        while (true)
        {
            HETCOMPUTE_ILOG("Hello World %d\n", i);
            sleep(1);
            i++;
            if (i == 10)
            {
                hetcompute::abort_task();
            }
        }
        // This will never fire
        assert(false);
    });

    auto t2 = hetcompute::create_task([] {
        // This will never fire
        assert(false);
    });

    t1 >> t2;

    // Launch tasks
    t1->launch();
    t2->launch();

    try
    {
        // Wait for t1 to complete.
        t1->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << e.what() << " thrown when syncing with t1" << std::endl;
    }
    catch (...)
    {
        // Never reached
    }

    try
    {
        // Returns immediately, t2 is canceled.
        t2->wait_for();
    }
    catch (const hetcompute::canceled_exception& e)
    {
        std::cout << e.what() << " thrown when syncing with t2" << std::endl;
    }
    catch (...)
    {
        // Never reached
    }

    hetcompute::runtime::shutdown();
    return 0;
}

Output

Hello World!
Hello World!
..
Hello World!
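The way an aborted task causes its successors to be canceled without running can be modeled in a few lines of standard C++. This is a deliberately simplified, single-threaded sketch under stated assumptions: `run_chain`, `abort_signal`, and `state` are hypothetical names, the tasks form a linear dependency chain, and the real runtime's scheduling is not represented.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical names; this models the idea only, not the HetCompute runtime.
struct abort_signal {};
enum class state { completed, canceled };

// Runs task bodies in dependency order (each one depends on the previous one).
// Once a body aborts, it and every later body are recorded as canceled, and
// the later bodies are never executed.
std::vector<state> run_chain(const std::vector<std::function<void()>>& chain)
{
    std::vector<state> result;
    bool canceled = false;
    for (const auto& body : chain)
    {
        if (!canceled)
        {
            try
            {
                body();
                result.push_back(state::completed);
                continue;
            }
            catch (const abort_signal&)
            {
                canceled = true; // cancellation now propagates to successors
            }
        }
        result.push_back(state::canceled);
    }
    return result;
}
```

In the example above, t2 corresponds to a body that comes after the aborting one in the chain: its assertion never fires because the body is never run.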

9.5.3.3 template<typename Task > hetcompute::internal::by_data_dep_t<Task&&> hetcompute::bind_as_data_dependency ( Task && t )

Explicitly bind a hetcompute::task_ptr<...> or hetcompute::task<...>∗ as data dependency. The type of the return value of the hetcompute::task_ptr<...> or the hetcompute::task<...>∗ should match the corresponding parameter type for the task to bind.

Parameters

t a hetcompute::task_ptr or hetcompute::task∗ which has the return type information.

9.5.3.4 template<typename Task > hetcompute::internal::by_value_t<Task&&> hetcompute::bind_by_value ( Task && t )

Explicitly bind a hetcompute::task_ptr<...> or hetcompute::task<...>∗ by value. The type of the hetcompute::task_ptr<...> or the hetcompute::task<...>∗ should match the corresponding parameter type for the task to bind.

Parameters

t a hetcompute::task_ptr or hetcompute::task∗, which has the return type information.

9.5.3.5 template<typename BlockingFunction , typename CancelFunction > void hetcompute::blocking ( BlockingFunction && bf, CancelFunction && cf )

Used to enclose user-code that blocks on external activity and needs to be cancelable when an enclosing task gets canceled.

A function/functor containing the blocking code bf is executed immediately. If cancellation is asynchronously requested for the enclosing task while bf is currently executing, the cancellation handler function/functor cf is asynchronously executed. Once bf completes, blocking throws hetcompute::canceled_exception if task cancellation was requested.

If cancellation of the task had already been requested prior to the execution of blocking, blocking immediately throws hetcompute::canceled_exception without executing bf or cf.

The programmer must write bf and cf to satisfy the following requirements:

1. cf can be safely called concurrently with bf.

2. The blocked work inside bf must somehow be able to unblock and resume execution when signalled to do so by cf.

Example

bf may block on a network access, causing its thread to sleep. cf writes special data into the network handle, causing bf to unblock.

auto t = hetcompute::create_task([&x, &handle]() {
    hetcompute::blocking([&]() { x = network_fetch(handle); },  // blocking code
                         [&]() { write_spurious(handle); }      // cancellation handler
    );
    // throws hetcompute::canceled_exception if encapsulating task is canceled

    do_whole_bunch_of_work(x, ...);
});

Note: It is not required that the blocking construct be enclosed in a task. Without an enclosing task, bf will execute as a normal function and cf will never be invoked.

Parameters

bf    Function/functor/lambda with signature void(void) that encapsulates blocking work.

cf    Function/functor/lambda with signature void(void) that is capable of canceling blocked work in bf.
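
The two requirements on bf and cf can be illustrated without the SDK. The following is a minimal sketch in standard C++ only; cancelable_channel, fetch, cancel, deliver, and run_fetch are hypothetical names invented for this illustration and are not part of the HetCompute API. fetch plays the role of the blocking work inside bf, and cancel plays the role of cf: it may run concurrently with fetch and wakes it up.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a blocking resource such as a network handle.
// Not part of the HetCompute SDK.
class cancelable_channel
{
public:
    // Plays the role of the blocking work in bf: sleeps until a value is
    // delivered or until cancel() wakes it. Returns false if no value arrived.
    bool fetch(int& out)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return _has_value || _canceled; });
        if (!_has_value)
        {
            return false;
        }
        out = _value;
        return true;
    }

    // Plays the role of cf: safe to call concurrently with fetch().
    void cancel()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _canceled = true;
        }
        _cv.notify_all();
    }

    // Normal completion path: hands a value to the blocked fetch().
    void deliver(int value)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _value     = value;
            _has_value = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex              _mutex;
    std::condition_variable _cv;
    int                     _value     = 0;
    bool                    _has_value = false;
    bool                    _canceled  = false;
};

// Blocks one thread in fetch() and unblocks it from the calling thread,
// either by canceling or by delivering a value. Returns what fetch() returned.
bool run_fetch(bool cancel_instead_of_deliver, int& out)
{
    cancelable_channel channel;
    bool               got_value = false;
    std::thread        blocked([&] { got_value = channel.fetch(out); });
    if (cancel_instead_of_deliver)
    {
        channel.cancel();
    }
    else
    {
        channel.deliver(42);
    }
    blocked.join();
    return got_value;
}
```

In terms of hetcompute::blocking, a bf built on such a resource would call fetch and a cf would call cancel. Throwing hetcompute::canceled_exception after bf returns remains the job of blocking itself and is not modelled here.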

9.5.3.6 template<typename Code , typename... Args> collapsed_task_type<Code> hetcompute::create_task ( Code && code, Args &&... args )

Create a collapsed task out of Code and (optionally) bind all arguments.

Template Parameters

Code       Type of the work for this task.
Args...    Parameter types of the task.

Parameters

code    The work for the task. code can be Qualcomm HetCompute kernels (CPU, GPU, or DSP), a lambda expression, a function object, or a function pointer.

args    Arguments used to bind to the task (only supported by CPU tasks). If left empty, no arguments will be bound to the task.

Returns

hetcompute::task_ptr<ReturnType(Params...)>, the task_ptr with full function signature.

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create a task out of a lambda and bind the argument.
    auto t1 = hetcompute::create_task([](int x) { return x; }, 27);
    // Launch t1.
    t1->launch();
    // Wait for t1 to finish and show the return value.
    HETCOMPUTE_ILOG("t1->copy_value() = %d", t1->copy_value()); // Expect 27;

    // Create a task out of a lambda and bind the argument later.
    auto t2 = hetcompute::create_task([](int x) { return x; });
    // Bind the argument before launch.
    t2->bind_all(42);
    // Launch t2.
    t2->launch();
    // Wait for t2 to finish and show the return value.
    HETCOMPUTE_ILOG("t2->copy_value() = %d", t2->copy_value()); // Expect 42;

    // Create a cpu kernel out of a lambda.
    auto cpu_kn = hetcompute::create_cpu_kernel([](int x) { return x; });
    // Create a task out of a cpu kernel and bind the argument.
    auto t3 = hetcompute::create_task(cpu_kn, 73);
    // Launch t3.
    t3->launch();
    // Wait for t3 to finish and show the return value.
    HETCOMPUTE_ILOG("t3->copy_value() = %d", t3->copy_value()); // Expect 73;

    // Create a collapsed task.
    // typeof(t4) = hetcompute::task_ptr<int(int)>
    auto t4 = hetcompute::create_task(
        [](int x) {
            // Create a task.
            // typeof(t) = hetcompute::task_ptr<int(int)>
            auto t = hetcompute::create_task([](int y) { return y; }, x);
            return t;
        },
        168);
    // Launch t4.
    t4->launch();
    // Wait for t4 to finish and show the return value.
    HETCOMPUTE_ILOG("t4->copy_value() = %d", t4->copy_value()); // Expect 168;

    hetcompute::runtime::shutdown();
    return 0;
}

See Also

hetcompute::task<ReturnType(Args...)>::bind_all

9.5.3.7 template<typename Code , typename... Args> non_collapsed_task_type<Code> hetcompute::create_task ( do_not_collapse_t , Code && code, Args &&... args )

Create a non-collapsed task out of Code and (optionally) bind all arguments.

Template Parameters

Code       Type of the work for this task.
Args...    Parameter types of the task.

Parameters

code    The work for the task. code can be Qualcomm HetCompute kernels (CPU, GPU, or DSP), a lambda expression, a function object, or a function pointer.

args    Arguments used to bind to the task (only supported by CPU tasks). If left empty, no arguments will be bound to the task.

Returns

hetcompute::task_ptr<ReturnType(Params...)>, the task_ptr with full function signature.

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // Create a non-collapsed task.
    // typeof(t) = hetcompute::task_ptr<hetcompute::task_ptr<int(int)>(int)>
    auto t = hetcompute::create_task(
        hetcompute::do_not_collapse,
        [](int x) {
            // Create a task.
            // typeof(tt) = hetcompute::task_ptr<int(int)>
            auto tt = hetcompute::create_task([](int y) { return y; }, x);
            return tt;
        },
        271);

    // Launch t.
    t->launch();

    // Wait for t to finish and get the return value.
    auto tt = t->copy_value();

    // Launch tt.
    tt->launch();

    // Wait for tt to finish and show the return value.
    HETCOMPUTE_ILOG("tt->copy_value() = %d", tt->copy_value()); // Expect 271;

    hetcompute::runtime::shutdown();

    return 0;
}

See Also

hetcompute::task<ReturnType(Args...)>::bind_all
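
The difference between the collapsed and non-collapsed task types shown in the two create_task examples can be modelled at the type level. The sketch below is only an illustration under stated assumptions: task_handle is a hypothetical stand-in for hetcompute::task_ptr, and collapse and collapses_to are hypothetical helpers. It does not reproduce how the SDK computes collapsed_task_type or non_collapsed_task_type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for hetcompute::task_ptr<Signature>. It carries
// only type information and is not part of the HetCompute SDK.
template <typename Signature>
struct task_handle;

template <typename R, typename... Args>
struct task_handle<R(Args...)>
{
    using return_type = R;
};

// collapse<Signature>::type models the handle type of a collapsed task.
// By default a signature maps to a handle with the same signature.
template <typename Signature>
struct collapse
{
    using type = task_handle<Signature>;
};

// If the work itself returns a task handle, the collapsed task exposes the
// inner task's return type while keeping the outer parameter types.
template <typename R, typename... InnerArgs, typename... Args>
struct collapse<task_handle<R(InnerArgs...)>(Args...)>
{
    using type = typename collapse<R(Args...)>::type;
};

// Convenience check: does signature From collapse to handle type To?
template <typename From, typename To>
constexpr bool collapses_to()
{
    return std::is_same<typename collapse<From>::type, To>::value;
}

// Work of type task_handle<int(int)>(int), as in the examples above:
// non-collapsed handle: task_handle<task_handle<int(int)>(int)>
// collapsed handle:     task_handle<int(int)>
static_assert(collapses_to<task_handle<int(int)>(int), task_handle<int(int)>>(),
              "a task returning a task collapses to the inner return type");
```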

9.5.3.8 template<typename ReturnType , typename... Args> ::hetcompute::task_ptr<ReturnType> hetcompute::create_value_task ( Args &&... args )

Create a value task.

The task must still be launched; once launched, it returns immediately.

Template Parameters

ReturnType    Return type of the task.
Args...       Parameter types of the task.

Parameters

args    Arguments used to construct an object of type ReturnType.

Returns

hetcompute::task_ptr<ReturnType> whose return value is an object of type ReturnType constructed with args...

#include <hetcompute/hetcompute.hh>

// User-defined type
struct point2d
{
    // Member variables
    int _x;
    int _y;

    // first constructor
    explicit point2d(int x) : _x(x), _y(0) {}

    // second constructor
    point2d(int x, int y) : _x(x), _y(y) {}
};

int
main()
{
    hetcompute::runtime::init();

    // Create a value task that returns an object of built-in type (int) with value 2.
    auto t = hetcompute::create_value_task<int>(2);
    // Launch t.
    t->launch();
    // Wait for t to finish.
    t->wait_for();
    HETCOMPUTE_ILOG("t->copy_value() = %d", t->copy_value()); // Expect 2;

    int x = 5;
    // Create a value task that returns a point2d constructed by the first constructor.
    auto t1 = hetcompute::create_value_task<point2d>(x);
    // Launch t1.
    t1->launch();
    // Wait for t1 to finish.
    t1->wait_for();

    HETCOMPUTE_ILOG("t1->copy_value()._x = %d", t1->copy_value()._x); // Expect 5;
    HETCOMPUTE_ILOG("t1->copy_value()._y = %d", t1->copy_value()._y); // Expect 0;

    int y = 6;
    x     = 7;
    // Create a value task that returns a point2d constructed by the second constructor.
    auto t2 = hetcompute::create_value_task<point2d>(x, y);
    // Launch t2.
    t2->launch();
    // Wait for t2 to finish.
    t2->wait_for();

    HETCOMPUTE_ILOG("t2->copy_value()._x = %d", t2->copy_value()._x); // Expect 7;
    HETCOMPUTE_ILOG("t2->copy_value()._y = %d", t2->copy_value()._y); // Expect 6;

    hetcompute::runtime::shutdown();

    return 0;
}
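
The way args select a constructor of ReturnType can be sketched in standard C++ alone. This is an illustration only: make_value_thunk is a hypothetical helper that returns a deferred callable instead of a task, and it says nothing about how create_value_task is implemented. point2d repeats the user-defined type from the example above.

```cpp
#include <cassert>
#include <functional>

// Same user-defined type as in the example above.
struct point2d
{
    int _x;
    int _y;

    explicit point2d(int x) : _x(x), _y(0) {}
    point2d(int x, int y) : _x(x), _y(y) {}
};

// Hypothetical analogue of a value task: stores copies of the arguments and
// constructs a ReturnType from them only when the returned callable is invoked.
template <typename ReturnType, typename... Args>
std::function<ReturnType()> make_value_thunk(Args... args)
{
    return [args...]() { return ReturnType(args...); };
}
```

Invoking the callable corresponds loosely to launching the value task and reading its value: one argument selects the first point2d constructor, two arguments select the second.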

9.5.3.9 void hetcompute::finish_after ( ::hetcompute::task<> * task )

Specifies that the task invoking this function should be deemed to finish only after task finishes. This method returns immediately.

If the invoking task is multi-threaded, the programmer must ensure that concurrent calls to finish_after from within the task are properly synchronized.

Do not call this function if task is nullptr. It would cause a fatal error.

Parameters

task    Task after which the invoking task is deemed to finish. Cannot be nullptr.

Exceptions

api_exception    If invoked from outside a task or from within a hetcompute::pfor_each, or if task points to null.

9.5.3.10 template<typename Code , typename... Args> collapsed_task_type<Code> hetcompute::launch ( Code && code, Args &&... args )

Create a collapsed task out of Code, bind all arguments, if any exist (mandatory), and launch the task.

Template Parameters

Code       Type of the work for this task.
Args...    Parameter types of the task.

Parameters

code    The work for the task. code can be Qualcomm HetCompute kernels (CPU, GPU, or DSP), a lambda expression, a function object, or a function pointer.

args    Arguments used to bind to the task (only supported by CPU tasks). If left empty, no arguments will be bound to the task.

Returns

hetcompute::task_ptr<ReturnType(Params...)>, the task_ptr with full function signature.

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create a task out of a lambda, bind the argument, and launch.
    auto t1 = hetcompute::launch([](int x) { return x; }, 27);
    // Wait for t1 to finish and show the return value.
    HETCOMPUTE_ILOG("t1->copy_value() = %d", t1->copy_value()); // Expect 27;

    // Create a cpu kernel out of a lambda.
    auto cpu_kn = hetcompute::create_cpu_kernel([](int x) { return x; });
    // Create a task out of a cpu kernel, bind the argument and launch.
    auto t2 = hetcompute::launch(cpu_kn, 73);
    // Wait for t2 to finish and show the return value.
    HETCOMPUTE_ILOG("t2->copy_value() = %d", t2->copy_value()); // Expect 73;

    // Create a collapsed task, bind the argument, and launch.
    // typeof(t3) = hetcompute::task_ptr<int(int)>
    auto t3 = hetcompute::launch(
        [](int x) {
            // Create a task.
            // typeof(t) = hetcompute::task_ptr<int(int)>
            auto t = hetcompute::create_task([](int y) { return y; }, x);
            return t;
        },
        168);

    // Wait for t3 to finish and show the return value.
    HETCOMPUTE_ILOG("t3->copy_value() = %d", t3->copy_value()); // Expect 168;

    hetcompute::runtime::shutdown();
    return 0;
}

See Also

hetcompute::create_task(Code&&, Args&&...)

9.5.3.11 template<typename Code , typename... Args> non_collapsed_task_type<Code> hetcompute::launch ( do_not_collapse_t , Code && code, Args &&... args )

Create a non-collapsed task out of Code, bind all arguments, if any exist (mandatory), and launch the task.

Template Parameters

Code       Type of the work for this task.
Args...    Parameter types of the task.

Parameters

code    The work for the task. code can be Qualcomm HetCompute kernels (CPU, GPU, or DSP), a lambda expression, a function object, or a function pointer.

args    Arguments used to bind to the task (only supported by CPU tasks). If left empty, no arguments will be bound to the task.

Returns

hetcompute::task_ptr<ReturnType(Params...)>, the task_ptr with full function signature.

#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();
    // Create a non-collapsed task, bind the argument, and launch.
    // typeof(t) = hetcompute::task_ptr<hetcompute::task_ptr<int(int)>(int)>
    auto t = hetcompute::launch(hetcompute::do_not_collapse,
                                [](int x) {
                                    // Create a task, bind the argument and launch.
                                    // typeof(tt) = hetcompute::task_ptr<int(int)>
                                    auto tt = hetcompute::launch([](int y) { return y; }, x);
                                    return tt;
                                },
                                271);

    // Wait for t to finish and get the return value.
    auto tt = t->copy_value();

    // Launch tt.
    tt->launch();
    // Wait for tt to finish and show the return value.
    HETCOMPUTE_ILOG("tt->copy_value() = %d", tt->copy_value()); // Expect 271;

    hetcompute::runtime::shutdown();
    return 0;
}

See Also

hetcompute::create_task(do_not_collapse_t, Code&&, Args&&...)

9.5.3.12 bool hetcompute::operator!= ( ::hetcompute::task_ptr<> const & t, std::nullptr_t )

Compares task t to nullptr.

Returns

true – The pointer is not nullptr (*this manages a task).
false – The pointer is nullptr (*this does not manage a task).

9.5.3.13 bool hetcompute::operator!= ( std::nullptr_t , ::hetcompute::task_ptr<> const & t )

Compares task t to nullptr.

Returns

true – The pointer is not nullptr (*this manages a task).
false – The pointer is nullptr (*this does not manage a task).

9.5.3.14 bool hetcompute::operator!= ( ::hetcompute::task_ptr<> const & a, ::hetcompute::task_ptr<> const & b )

Compares task a to task b.

Returns

true – Task a is not the same as task b.
false – Task a is the same as task b.

9.5.3.15 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() % std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator% ( const ::hetcompute::task_ptr< T1 > & t1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator % for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(const ::hetcompute::task_ptr<T1>& t1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.16 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() % std::declval<T2>())> hetcompute::operator% ( const ::hetcompute::task_ptr< T1 > & t1, T2 && op2 )

Algebraic binary operator % for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.17 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<T1>() % std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator% ( T1 && op1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator % for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.18 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() & std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator& ( const ::hetcompute::task_ptr< T1 > & t1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator & for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(const ::hetcompute::task_ptr<T1>& t1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.19 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() & std::declval<T2>())> hetcompute::operator& ( const ::hetcompute::task_ptr< T1 > & t1, T2 && op2 )

Algebraic binary operator & for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.20 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<T1>() & std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator& ( T1 && op1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator & for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.21 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() * std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator* ( const ::hetcompute::task_ptr< T1 > & t1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator * for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(const ::hetcompute::task_ptr<T1>& t1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.22 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() * std::declval<T2>())> hetcompute::operator* ( const ::hetcompute::task_ptr< T1 > & t1, T2 && op2 )

Algebraic binary operator * for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.23 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<T1>() * std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator* ( T1 && op1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator * for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.24 template<typename T > inline ::hetcompute::task_ptr<typename ::hetcompute::task_ptr<T>::return_type> hetcompute::operator+ ( const ::hetcompute::task_ptr< T > & t )

Algebraic unary operator + for task.

See Also

template<typename T> inline ::hetcompute::task_ptr<typename ::hetcompute::task_ptr<T>::return_type> operator-(const ::hetcompute::task_ptr<T>& t);

9.5.3.25 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator+ ( const ::hetcompute::task_ptr< T1 > & t1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator + for tasks.

Create a new task whose return value will be the result of this operator applied to the return values of task t1 and task t2.

The new task will be data dependent on tasks t1 and t2.

The new task will be launched automatically by the runtime once the data are ready.

Note: the operator must be applicable to the return values of task t1 and task t2.

Parameters

t1    First task operand (should have return value).
t2    Second task operand (should have return value).

Returns

A new task whose return value is the result of this operator.

#include <iostream>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // create and launch tasks that return 73 and 27.8
    auto t1 = hetcompute::launch([]() { return 73; });
    auto t2 = hetcompute::launch([]() { return 27.8; });

    // create a task whose return value is t1's return value + t2's return value
    // t is data dependent on t1 and t2
    // t's return type will be the same as t2's (type promotion for +)
    auto t = t1 + t2;

    // wait for t to finish and display the return value
    std::cout << "The return value of t is: " << t->copy_value() << std::endl;

    hetcompute::runtime::shutdown();
}
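
The return type named in the signature is the type of the expression formed by applying + to the two operand types. The following sketch isolates that deduction in standard C++; sum_result_t and add_values are hypothetical names used only for this illustration and do not exist in the SDK.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Type of the expression a + b for operands of types A and B. This mirrors
// the decltype(std::declval<...>() + std::declval<...>()) pattern in the
// operator signature. Hypothetical helper, not part of the SDK.
template <typename A, typename B>
using sum_result_t = decltype(std::declval<A>() + std::declval<B>());

// Adds two plain values and returns the deduced result type.
template <typename A, typename B>
sum_result_t<A, B> add_values(A a, B b)
{
    return a + b;
}

// int + double is promoted to double, matching the example above in which
// the combined task has the same return type as the task returning 27.8.
static_assert(std::is_same<sum_result_t<int, double>, double>::value,
              "int + double yields double");
```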

9.5.3.26 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<T2>())> hetcompute::operator+ ( const ::hetcompute::task_ptr< T1 > & t1, T2 && op2 )

Algebraic binary operator + for tasks.

Create a new task whose return value will be the result of this operator applied to the return value of task t1 and operand op2.

The new task will be data dependent on task t1.

The new task will be launched automatically by the runtime once the data is ready.

Note: the operator must be applicable to the return value of task t1 and operand op2.

Parameters

t1     Task operand (should have return value).
op2    Value operand.

Returns

A new task whose return value is the result of this operator.

#include <iostream>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // create and launch a task that returns -73
    auto t1 = hetcompute::launch([]() { return -73; });

    // create a task whose return value is t1's return value + 100
    // t is data dependent on t1
    auto t = t1 + 100;

    // wait for t to finish and display the return value
    std::cout << "The return value of t is: " << t->copy_value() << std::endl;

    hetcompute::runtime::shutdown();
}

9.5.3.27 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator+ ( T1 && op1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator + for tasks.

Create a new task whose return value will be the result of this operator applied to the return value of operand op1 and task t2.

The new task will be data dependent on task t2.

The new task will be launched automatically by the runtime once the data is ready.

Note: the operator must be applicable to operand op1 and the return value of task t2.

Parameters

op1    Value operand.
t2     Task operand (should have return value).

Returns

A new task whose return value is the result of this operator.

#include <iostream>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // create and launch a task that returns -73
    auto t2 = hetcompute::launch([]() { return -73; });

    // create a task whose return value is 100 + t2's return value
    // t is data dependent on t2
    auto t = 100 + t2;

    // wait for t to finish and display the return value
    std::cout << "The return value of t is: " << t->copy_value() << std::endl;

    hetcompute::runtime::shutdown();
}
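
The three operator+ overloads (task and task, task and value, value and task) can be modelled with a small value-producing handle in standard C++. This is a sketch under stated assumptions: lazy is a hypothetical stand-in for a task handle, its value is computed on demand rather than scheduled by a runtime, and value operands are restricted to arithmetic types to keep overload resolution simple. None of this reflects the SDK's implementation.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a task handle that yields a value of type T.
template <typename T>
struct lazy
{
    std::function<T()> compute;
    T copy_value() const { return compute(); }
};

// handle + handle: the result depends on both operands.
template <typename T1, typename T2>
auto operator+(const lazy<T1>& a, const lazy<T2>& b)
    -> lazy<decltype(std::declval<T1>() + std::declval<T2>())>
{
    using R = decltype(std::declval<T1>() + std::declval<T2>());
    return lazy<R>{[a, b]() -> R { return a.copy_value() + b.copy_value(); }};
}

// handle + value: the result depends on the handle only.
template <typename T1, typename V,
          typename = typename std::enable_if<std::is_arithmetic<V>::value>::type>
auto operator+(const lazy<T1>& a, V b)
    -> lazy<decltype(std::declval<T1>() + std::declval<V>())>
{
    using R = decltype(std::declval<T1>() + std::declval<V>());
    return lazy<R>{[a, b]() -> R { return a.copy_value() + b; }};
}

// value + handle: the mirrored overload.
template <typename V, typename T2,
          typename = typename std::enable_if<std::is_arithmetic<V>::value>::type>
auto operator+(V a, const lazy<T2>& b)
    -> lazy<decltype(std::declval<V>() + std::declval<T2>())>
{
    using R = decltype(std::declval<V>() + std::declval<T2>());
    return lazy<R>{[a, b]() -> R { return a + b.copy_value(); }};
}
```

With a handle t1 that yields -73, both t1 + 100 and 100 + t1 produce a handle that yields 27, which corresponds to the value computed by the two SDK examples above.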

9.5.3.28 template<typename T > inline ::hetcompute::task_ptr<typename ::hetcompute::task_ptr<T>::return_type> hetcompute::operator- ( const ::hetcompute::task_ptr< T > & t )

Algebraic unary operator - for tasks.

Create a new task whose return value will be the result of this operator applied to the return value of task t.

The new task will be data dependent on task t.

The new task will be launched automatically by the runtime once the data is ready.

Note: the operator must be applicable to the return value of task t.

Parameters

t    Task operand (should have return value).

Returns

A new task whose return value is the result of this operator.

#include <iostream>
#include <hetcompute/hetcompute.hh>

int
main()
{
    hetcompute::runtime::init();

    // create a task that returns 73
    auto t = hetcompute::create_task([]() { return 73; });
    // launch the task
    t->launch();

    // create a task whose return value is the negation of the return value of t
    // t1 is data dependent on t
    auto t1 = -t;

    // wait for t1 to finish and display the return value
    std::cout << "The return value of t1 is: " << t1->copy_value() << std::endl;

    hetcompute::runtime::shutdown();
}

9.5.3.29 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() - std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator- ( const ::hetcompute::task_ptr< T1 > & t1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator - for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(const ::hetcompute::task_ptr<T1>& t1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.30 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() - std::declval<T2>())> hetcompute::operator- ( const ::hetcompute::task_ptr< T1 > & t1, T2 && op2 )

Algebraic binary operator - for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.31 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<T1>() - std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator- ( T1 && op1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator - for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.32 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() / std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator/ ( const ::hetcompute::task_ptr< T1 > & t1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator / for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(const ::hetcompute::task_ptr<T1>& t1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.33 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() / std::declval<T2>())> hetcompute::operator/ ( const ::hetcompute::task_ptr< T1 > & t1, T2 && op2 )

Algebraic binary operator / for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.34 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<T1>() / std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator/ ( T1 && op1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator / for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)
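The return types in these operator signatures are written as decltype over std::declval expressions, so the resulting task carries whatever type the underlying operator yields for the two operand types. The following minimal sketch uses only the C++ standard library to show that deduction on plain values. The names quotient_t and divide_values are hypothetical and are not part of the HetCompute API.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical alias: the type produced by applying '/' to a T1 and a T2.
// It mirrors the decltype(std::declval<...>() / std::declval<...>()) pattern
// used in the operator signatures.
template <typename T1, typename T2>
using quotient_t = decltype(std::declval<T1>() / std::declval<T2>());

// Hypothetical helper: divides two plain values and returns the deduced type.
template <typename T1, typename T2>
quotient_t<T1, T2> divide_values(T1 a, T2 b)
{
    return a / b;
}

// int / int stays int, while int / double promotes to double.
static_assert(std::is_same<quotient_t<int, int>, int>::value,
              "int / int yields int");
static_assert(std::is_same<quotient_t<int, double>, double>::value,
              "int / double yields double");
```

By the same reasoning, mixing an operand of one arithmetic type with a task returning another would be expected to give a task whose return type follows the usual C++ arithmetic conversions.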

9.5.3.35 bool hetcompute::operator== ( task_ptr<> const & t, std::nullptr_t )

Compares task t to nullptr.

Returns

true – The pointer is nullptr (t does not manage a task). false – The pointer is not nullptr (t manages a task).

9.5.3.36 bool hetcompute::operator== ( std::nullptr_t , task_ptr<> const & t )

Compares task t to nullptr.

Returns

true – The pointer is nullptr (t does not manage a task). false – The pointer is not nullptr (t manages a task).

9.5.3.37 bool hetcompute::operator== ( ::hetcompute::task_ptr<> const & a, ::hetcompute::task_ptr<> const & b )

Compares task a to task b.

Returns

true – Task a is the same as task b. false – Task a is not the same as task b.

9.5.3.38 inline ::hetcompute::task_ptr& hetcompute::operator>> ( ::hetcompute::task_ptr<> & pred, ::hetcompute::task_ptr<> & succ )

Sets a control dependency from task pred to task succ.

See Also

hetcompute::task<>::then
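A control dependency means that the successor cannot start until the predecessor has finished. The sketch below models that idea with a hypothetical node type and a free operator>>; it does not use the HetCompute task types or scheduler. Returning the successor so that dependencies can be chained is an assumption of this sketch, as is the helper run_in_dependency_order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task node used only for this illustration.
struct node
{
    explicit node(int i) : id(i) {}
    int id;
    std::vector<node*> successors;
    int pending = 0;  // predecessors that have not finished yet
};

// Records that succ may start only after pred has finished.
// Assumption: returning succ allows chaining such as a >> b >> c.
inline node& operator>>(node& pred, node& succ)
{
    pred.successors.push_back(&succ);
    ++succ.pending;
    return succ;
}

// Runs each node once all of its predecessors have finished and
// returns the node ids in the order they were run.
inline std::vector<int> run_in_dependency_order(std::vector<node*> const& nodes)
{
    std::vector<node*> ready;
    for (node* n : nodes)
        if (n->pending == 0) ready.push_back(n);

    std::vector<int> order;
    for (std::size_t i = 0; i < ready.size(); ++i)
    {
        node* n = ready[i];
        order.push_back(n->id);
        for (node* s : n->successors)
            if (--s->pending == 0) ready.push_back(s);
    }
    return order;
}
```

In this model, the order in which nodes are handed to the runner does not matter; only the recorded dependencies decide the execution order.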

9.5.3.39 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() ^ std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator^ ( const ::hetcompute::task_ptr< T1 > & t1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator ^ for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(const ::hetcompute::task_ptr<T1>& t1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.40 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() ^ std::declval<T2>())> hetcompute::operator^ ( const ::hetcompute::task_ptr< T1 > & t1, T2 && op2 )

Algebraic binary operator ^ for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.41 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<T1>() ^ std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator^ ( T1 && op1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator ^ for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.42 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() | std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator| ( const ::hetcompute::task_ptr< T1 > & t1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator | for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(const ::hetcompute::task_ptr<T1>& t1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.43 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<typename ::hetcompute::task_ptr<T1>::return_type>() | std::declval<T2>())> hetcompute::operator| ( const ::hetcompute::task_ptr< T1 > & t1, T2 && op2 )

Algebraic binary operator | for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.44 template<typename T1 , typename T2 > inline ::hetcompute::task_ptr<decltype(std::declval<T1>() | std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> hetcompute::operator| ( T1 && op1, const ::hetcompute::task_ptr< T2 > & t2 )

Algebraic binary operator | for tasks.

See Also

template<typename T1, typename T2> inline ::hetcompute::task_ptr<decltype(std::declval<T1>() + std::declval<typename ::hetcompute::task_ptr<T2>::return_type>())> operator+(T1&& op1, const ::hetcompute::task_ptr<T2>& t2)

9.5.3.45 template<typename T > inline ::hetcompute::task_ptr<typename ::hetcompute::task_ptr<T>::return_type> hetcompute::operator~ ( const ::hetcompute::task_ptr< T > & t )

Algebraic unary operator ~ for task.

See Also

template<typename T> inline ::hetcompute::task_ptr<typename::hetcompute::task_ptr<T>::return_type> operator-(const ::hetcompute::task_ptr<T>& t);

9.5.4 Variable Documentation

9.5.4.1 const do_not_collapse_t hetcompute::do_not_collapse {}

Object of type do_not_collapse_t.

10 Buffers Reference API

The Qualcomm HetCompute buffers API provides the user with a runtime-managed heterogeneous data structure. Tasks on the CPU, GPU and Hexagon devices can share data using a Qualcomm HetCompute buffer. The following categories provide the API reference for buffers and related functionality.

Qualcomm® Snapdragon™ Heterogeneous Compute SDK Buffers Reference API

10.1 Heterogeneous Compute Device Types

Classes

• class hetcompute::device_set

Captures a set of device types. More...

Enumerations

• enum hetcompute::device_type { cpu_big = HETCOMPUTE_DEVICE_TYPE_CPU_BIG, cpu_little = HETCOMPUTE_DEVICE_TYPE_CPU_LITTLE, cpu = HETCOMPUTE_DEVICE_TYPE_CPU_BIG | HETCOMPUTE_DEVICE_TYPE_CPU_LITTLE, gpu = HETCOMPUTE_DEVICE_TYPE_GPU, dsp = HETCOMPUTE_DEVICE_TYPE_DSP }

The system devices capable of executing HetCompute tasks.

Functions

• std::string hetcompute::to_string (device_type d)

Converts device_type to string.

10.1.1 Class Documentation

10.1.1.1 class hetcompute::device_set

Captures a set of device types.

Supports addition and removal of device_types from the set. Supports set union and set subtraction with another device_set object.

Public member functions

• device_set ()

Default constructor produces empty set.

• device_set (std::initializer_list< device_type > device_list)

Constructor with initialization.

• device_set & add (device_type d)

Add a device_type to the device_set.

• device_set & add (device_set const &other)

Set union with another device_set.

• bool empty () const

Checks whether the device set is empty.

• HETCOMPUTE_DEFAULT_METHOD (device_set(device_set const &))

• HETCOMPUTE_DEFAULT_METHOD (device_set &operator=(device_set const &))

• HETCOMPUTE_DEFAULT_METHOD (device_set(device_set &&))

• HETCOMPUTE_DEFAULT_METHOD (device_set &operator=(device_set &&))

• device_set & negate ()

Negate the device_set.

• bool on_cpu () const

Query if cpu is part of the device_set.

• bool on_cpu_big () const

Query if cpu big core is part of the device_set.

• bool on_cpu_little () const

Query if cpu LITTLE core is part of the device_set.

• bool on_dsp () const

Query if dsp is part of the device_set.

• bool on_gpu () const

Query if gpu is part of the device_set.

• device_set & remove (device_type d)

Remove a device_type from the device_set.

• device_set & remove (device_set const &other)

Set subtraction with another device_set.

• std::string to_string () const

Convert the device_set to a string representation.

Friends

• hetcompute_device_set_t internal::get_raw_device_set_t (device_set const &d)

10.1.1.1.1 Constructors and Destructors

10.1.1.1.1.1 hetcompute::device_set::device_set ( )

Default constructor produces empty set.

10.1.1.1.1.2 hetcompute::device_set::device_set ( std::initializer_list< device_type > device_list )

Constructor with initialization.

Parameters

device_list An initializer list of device_type elements.

Example:

hetcompute::device_set ds{hetcompute::cpu, hetcompute::dsp};

10.1.1.1.2 Member Function Documentation

10.1.1.1.2.1 device_set& hetcompute::device_set::add ( device_type d )

Add a device_type to the device_set.

Parameters

d device_type to add (no effect if already present).

Returns

Reference to the updated device_set.

10.1.1.1.2.2 device_set& hetcompute::device_set::add ( device_set const & other )

Set union with another device_set.

Parameters

other Another device_set.

Returns

Reference to the updated device_set.

Example:

hetcompute::device_set a{hetcompute::cpu};

hetcompute::device_set b{hetcompute::gpu};

a.add(b);

assert(true == a.on_cpu());

assert(true == a.on_gpu());

assert(false == a.on_dsp());

10.1.1.1.2.3 bool hetcompute::device_set::empty ( ) const

Checks whether the device set is empty.

Returns

true if device_set has no devices, false otherwise.

10.1.1.1.2.4 hetcompute::device_set::HETCOMPUTE_DEFAULT_METHOD ( device_set(device_set const &) )

Copy constructor.

10.1.1.1.2.5 hetcompute::device_set::HETCOMPUTE_DEFAULT_METHOD ( device_set & operator =(device_set const &) )

Copy assignment.

10.1.1.1.2.6 hetcompute::device_set::HETCOMPUTE_DEFAULT_METHOD ( device_set(device_set &&))

Move constructor.

10.1.1.1.2.7 hetcompute::device_set::HETCOMPUTE_DEFAULT_METHOD ( device_set & operator =(device_set &&) )

Move assignment.

10.1.1.1.2.8 device_set& hetcompute::device_set::negate ( )

Negate the device_set.

Returns

Reference to the updated device_set.

Example:

hetcompute::device_set a{hetcompute::cpu, hetcompute::gpu};

a.negate();

assert(false == a.on_cpu());

assert(false == a.on_gpu());

assert(true == a.on_dsp());

10.1.1.1.2.9 bool hetcompute::device_set::on_cpu ( ) const

Query if cpu is part of the device_set.

Returns

true if cpu present, false otherwise.

10.1.1.1.2.10 bool hetcompute::device_set::on_cpu_big ( ) const

Query if cpu big core is part of the device_set.

Returns

true if cpu big core present, false otherwise.

10.1.1.1.2.11 bool hetcompute::device_set::on_cpu_little ( ) const

Query if cpu LITTLE core is part of the device_set.

Returns

true if cpu LITTLE core present, false otherwise.

10.1.1.1.2.12 bool hetcompute::device_set::on_dsp ( ) const

Query if dsp is part of the device_set.

Returns

true if dsp present, false otherwise.

10.1.1.1.2.13 bool hetcompute::device_set::on_gpu ( ) const

Query if gpu is part of the device_set.

Returns

true if gpu present, false otherwise.

10.1.1.1.2.14 device_set& hetcompute::device_set::remove ( device_type d )

Remove a device_type from the device_set.

Parameters

d device_type to remove (no effect if not present).

Returns

Reference to the updated device_set.

10.1.1.1.2.15 device_set& hetcompute::device_set::remove ( device_set const & other )

Set subtraction with another device_set.

Parameters

other Another device_set.

Returns

Reference to the updated device_set.

Example:

hetcompute::device_set a{hetcompute::cpu, hetcompute::gpu};

hetcompute::device_set b{hetcompute::gpu, hetcompute::dsp};

a.remove(b);

assert(true == a.on_cpu());

assert(false == a.on_gpu());

assert(false == a.on_dsp());
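The set operations documented for device_set can be pictured as operations on a bitmask, which is consistent with cpu being defined as the bitwise OR of cpu_big and cpu_little. The following is a minimal stand-in written against the C++ standard library only, not the SDK's implementation. The bit values, the namespace sketch, the class name device_set_model, and the choice that on_cpu() reports true when any CPU cluster is present are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

namespace sketch {

// Hypothetical bit values; the real HETCOMPUTE_DEVICE_TYPE_* constants
// are not reproduced here.
enum device_type : std::uint32_t
{
    cpu_big    = 1u << 0,
    cpu_little = 1u << 1,
    cpu        = cpu_big | cpu_little,
    gpu        = 1u << 2,
    dsp        = 1u << 3
};

// Stand-in for hetcompute::device_set, modelled as a bitmask.
class device_set_model
{
public:
    device_set_model() = default;
    device_set_model(std::initializer_list<device_type> devices)
    {
        for (device_type d : devices) add(d);
    }

    device_set_model& add(device_type d) { bits_ |= d; return *this; }
    device_set_model& add(device_set_model const& o) { bits_ |= o.bits_; return *this; }
    device_set_model& remove(device_type d) { bits_ &= ~static_cast<std::uint32_t>(d); return *this; }
    device_set_model& remove(device_set_model const& o) { bits_ &= ~o.bits_; return *this; }
    // Complement with respect to the devices known to this model.
    device_set_model& negate() { bits_ = ~bits_ & (cpu | gpu | dsp); return *this; }

    bool empty() const { return bits_ == 0; }
    // Assumption: true when any CPU cluster is in the set.
    bool on_cpu() const { return (bits_ & cpu) != 0; }
    bool on_gpu() const { return (bits_ & gpu) != 0; }
    bool on_dsp() const { return (bits_ & dsp) != 0; }

private:
    std::uint32_t bits_ = 0;
};

// Replays the documented add(), negate() and remove() examples on the model.
inline bool documented_examples_hold()
{
    device_set_model u{cpu};
    u.add(device_set_model{gpu});
    bool add_ok = u.on_cpu() && u.on_gpu() && !u.on_dsp();

    device_set_model n{cpu, gpu};
    n.negate();
    bool negate_ok = !n.on_cpu() && !n.on_gpu() && n.on_dsp();

    device_set_model r{cpu, gpu};
    r.remove(device_set_model{gpu, dsp});
    bool remove_ok = r.on_cpu() && !r.on_gpu() && !r.on_dsp();

    return add_ok && negate_ok && remove_ok;
}

}  // namespace sketch
```

Because each mutating member returns a reference to the set, calls can be chained, for example ds.add(sketch::gpu).remove(sketch::dsp) on the model.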

10.1.1.1.2.16 std::string hetcompute::device_set::to_string ( ) const

Convert the device_set to a string representation.

Returns

std::string representation of the devices present.

Example:

hetcompute::device_set a{hetcompute::cpu, hetcompute::gpu};

assert(a.to_string() == "cpu gpu ");

10.1.2 Enumeration Type Documentation

10.1.2.1 enum hetcompute::device_type

The system devices capable of executing HetCompute tasks.

10.1.3 Function Documentation

10.1.3.1 std::string hetcompute::to_string ( device_type d )

Converts device_type to string.

Parameters

d device_type (e.g., cpu, cpu_big, cpu_little, gpu, or dsp).

Returns

std::string (e.g., "cpu", "cpu_big", "cpu_little", "gpu", or "dsp").

10.2 Buffers

Classes

• class hetcompute::buffer_const_iterator< T >

Const random access iterator over a buffer. More...

• class hetcompute::buffer_iterator< T >

Random access iterator over a buffer. More...

• class hetcompute::buffer_ptr< T >

Pointer to an underlying runtime-managed buffer data structure. More...

• struct hetcompute::in< BufferPtr >

Indicates that a buffer parameter is input-only (read-only) for a kernel. More...

• struct hetcompute::inout< BufferPtr >

Indicates that a buffer parameter is used both as an input and an output (read-write) by a kernel. More...

• struct hetcompute::out< BufferPtr >

Indicates that a buffer parameter is output-only (write-invalidate) for a kernel. More...

• class hetcompute::scope_acquire_ro< T >

Scope guard for read-only acquire of a buffer by the host code. More...

• class hetcompute::scope_acquire_rw< T >

Scope guard for read-write acquire of a buffer by the host code. More...

• class hetcompute::scope_acquire_wi< T >

Scope guard for write-invalidate acquire of a buffer by the host code. More...

Functions

• template<typename T >

buffer_ptr< T > hetcompute::create_buffer (size_t num_elems, device_set const &likely_devices)

Creates a buffer of datatype T of the requested size.

• template<typename T >

buffer_ptr< T > hetcompute::create_buffer (T *preallocated_ptr, size_t num_elems, device_set const &likely_devices)

Creates a buffer of datatype T of the requested size from a pre-allocated pointer.

• template<typename T >

buffer_ptr< T > hetcompute::create_buffer (memregion const &mr, size_t num_elems, device_set const &likely_devices)

Creates a buffer of datatype T of the requested size from a hetcompute::memregion.

• template<typename T >

bool hetcompute::operator!= (::hetcompute::buffer_ptr< T > const &b,::std::nullptr_t)

• template<typename T >

bool hetcompute::operator!= (::std::nullptr_t,::hetcompute::buffer_ptr< T > const &b)

• template<typename T >

bool hetcompute::operator!= (::hetcompute::buffer_ptr< T > const &b1,::hetcompute::buffer_ptr< T > const &b2)

• template<typename T >

bool hetcompute::operator== (::hetcompute::buffer_ptr< T > const &b,::std::nullptr_t)

• template<typename T >

bool hetcompute::operator== (::std::nullptr_t,::hetcompute::buffer_ptr< T > const &b)

• template<typename T >

bool hetcompute::operator== (::hetcompute::buffer_ptr< T > const &b1,::hetcompute::buffer_ptr< T > const &b2)

10.2.1 Class Documentation

10.2.1.1 class hetcompute::buffer_const_iterator

template<typename T> class hetcompute::buffer_const_iterator< T >

Const random access iterator over a buffer.

See Also

hetcompute::buffer_ptr::const_iterator, hetcompute::buffer_ptr::cbegin(), hetcompute::buffer_ptr::cend()

Public member functions

• buffer_const_iterator (buffer_iterator< T > const &it)

• HETCOMPUTE_DEFAULT_METHOD (buffer_const_iterator(buffer_const_iterator const &))

• HETCOMPUTE_DEFAULT_METHOD (buffer_const_iterator &operator=(buffer_const_iterator const &))

• HETCOMPUTE_DEFAULT_METHOD (buffer_const_iterator(buffer_const_iterator &&))

• HETCOMPUTE_DEFAULT_METHOD (buffer_const_iterator &operator=(buffer_const_iterator &&))

• bool operator!= (buffer_const_iterator const &it) const

• T const & operator* () const

• buffer_const_iterator operator+ (size_t offset)

• buffer_const_iterator & operator++ ()

• buffer_const_iterator operator++ (int)

• buffer_const_iterator & operator+= (size_t offset)

• buffer_const_iterator operator- (size_t offset)

• int operator- (buffer_const_iterator const &it)

• buffer_const_iterator & operator-- ()

• buffer_const_iterator operator-- (int)

• buffer_const_iterator & operator-= (size_t offset)

• bool operator< (buffer_const_iterator const &it) const

• bool operator<= (buffer_const_iterator const &it) const

• bool operator== (buffer_const_iterator const &it) const

• bool operator> (buffer_const_iterator const &it) const

• bool operator>= (buffer_const_iterator const &it) const

• T const & operator[ ] (size_t n) const

10.2.1.2 class hetcompute::buffer_iterator

template<typename T> class hetcompute::buffer_iterator< T >

Random access iterator over a buffer.

See Also

hetcompute::buffer_ptr::iterator, hetcompute::buffer_ptr::begin(), hetcompute::buffer_ptr::end()

Public member functions

• HETCOMPUTE_DEFAULT_METHOD (buffer_iterator(buffer_iterator const &))

• HETCOMPUTE_DEFAULT_METHOD (buffer_iterator &operator=(buffer_iterator const &))

• HETCOMPUTE_DEFAULT_METHOD (buffer_iterator(buffer_iterator &&))

• HETCOMPUTE_DEFAULT_METHOD (buffer_iterator &operator=(buffer_iterator &&))

• bool operator!= (buffer_iterator const &it) const

• T & operator* () const

• buffer_iterator operator+ (size_t offset)

• buffer_iterator & operator++ ()

• buffer_iterator operator++ (int)

• buffer_iterator & operator+= (size_t offset)

• buffer_iterator operator- (size_t offset)

• int operator- (buffer_iterator const &it)

• buffer_iterator & operator-- ()

• buffer_iterator operator-- (int)

• buffer_iterator & operator-= (size_t offset)

• bool operator< (buffer_iterator const &it) const

• bool operator<= (buffer_iterator const &it) const

• bool operator== (buffer_iterator const &it) const

• bool operator> (buffer_iterator const &it) const

• bool operator>= (buffer_iterator const &it) const

• T & operator[ ] (size_t n) const

10.2.1.3 class hetcompute::buffer_ptr

template<typename T> class hetcompute::buffer_ptr< T >

Pointer to an underlying runtime-managed buffer data structure, if != nullptr.

Similar to std::shared_ptr, the buffer pointer can be assigned to another buffer pointer, compared for equality/inequality against another buffer pointer or against nullptr, and provides ref-counted access. However, HetCompute does not expose the underlying buffer as an API object. Therefore, there is no support for getting the "address-of" the buffer to assign to a buffer pointer.

The underlying buffer is ref-counted: it is automatically deallocated when there are no longer any buffer pointers to it. Once a buffer is created via a hetcompute::create_buffer() call, the user controls the lifetime of the buffer by controlling the lifetime of the buffer pointer objects pointing to it.

See Also

Shared pointers http://en.cppreference.com/w/cpp/memory/shared_ptr
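The lifetime rule described for buffer_ptr can be illustrated with std::shared_ptr itself, since the text draws that comparison. The sketch below uses a std::vector<int> as a stand-in for the hidden runtime-managed buffer and a std::weak_ptr to observe when the storage is released. The alias storage and the function storage_freed_with_last_pointer are hypothetical names, and no HetCompute calls are involved.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for the hidden, runtime-managed buffer: plain host storage.
using storage = std::vector<int>;

// Returns true if the storage stays alive while any owning pointer exists
// and is released exactly when the last owning pointer is destroyed.
inline bool storage_freed_with_last_pointer()
{
    std::weak_ptr<storage> observer;  // watches the storage without owning it
    {
        std::shared_ptr<storage> first = std::make_shared<storage>(10);
        observer = first;
        {
            std::shared_ptr<storage> second = first;  // second pointer, same storage
            if (first.use_count() != 2) return false;
        }  // second destroyed, storage still alive
        if (observer.expired()) return false;
    }  // first destroyed, storage released
    return observer.expired();
}
```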

Public Types

• typedef buffer_const_iterator< T > const_iterator

Random access iterator providing immutable access to the buffer data.

• using data_type = T

• typedef buffer_iterator< T > iterator

Random access iterator providing mutable access to the buffer data.

Public member functions

• buffer_ptr ()

Create a buffer_ptr with no underlying buffer storage created.

• buffer_ptr (buffer_ptr< typename std::remove_const< T >::type > const &other)

Copy constructor: creates a new buffer_ptr pointing to the same underlying buffer as other.

• void acquire_ro () const

Acquires the underlying buffer for read-only access by the host code.

• bool acquire_rw () const

Attempts to acquire the underlying buffer for read-write access by the host code.

• bool acquire_wi () const

Attempts to acquire the underlying buffer for write-invalidate access by the host code.

• T & at (size_t index) const

Indexed lookup of buffer data with array-bounds and host-access allocation checks.

• iterator begin () const

Iterator to the start of the buffer data.

• const_iterator cbegin () const

Const iterator to the start of the buffer data.

• const_iterator cend () const

Const iterator to the end of the buffer data.

• iterator end () const

Iterator to the end of the buffer data.

• void * host_data () const

Gets a pointer to the host accessible data of the underlying buffer, allocating if necessary.

• buffer_ptr & operator= (buffer_ptr< typename std::remove_const< T >::type > const &other)

Copy assignment: points to the underlying buffer of other.

• T & operator[ ] (size_t index) const

Indexed lookup of buffer data.

• size_t release () const

Decrements the host acquire count, releasing the buffer from host access when the count goes to zero.

• void * saved_host_data () const

Fast lookup of a saved pointer to the host accessible data of the underlying buffer.

• size_t size () const

The number of elements of datatype T in the underlying buffer pointed to by this buffer_ptr.

• std::string to_string () const

Gets a string with basic information about the buffer_ptr.

• template<hetcompute::graphics::image_format img_format, int dims>

buffer_ptr & treat_as_texture (hetcompute::graphics::image_size< dims > const &is)

Allows this buffer_ptr to be passed to a gpu_kernel where a hetcompute::graphics::texture_ptr parameter was expected.

10.2.1.3.1 Member Typedef Documentation

10.2.1.3.1.1 template<typename T> typedef buffer_const_iterator<T> hetcompute::buffer_ptr< T >::const_iterator

Random access iterator providing immutable access to the buffer data.

10.2.1.3.1.2 template<typename T> using hetcompute::buffer_ptr< T >::data_type = T

Data type of the buffer elements.

10.2.1.3.1.3 template<typename T> typedef buffer_iterator<T> hetcompute::buffer_ptr< T >::iterator

Random access iterator providing mutable access to the buffer data.

10.2.1.3.2 Constructors and Destructors

10.2.1.3.2.1 template<typename T> hetcompute::buffer_ptr< T >::buffer_ptr ( )

Create a buffer_ptr with no underlying buffer storage created. Tests equal to nullptr.

Example:

hetcompute::buffer_ptr<int> b;

assert(b == nullptr);

10.2.1.3.2.2 template<typename T> hetcompute::buffer_ptr< T >::buffer_ptr ( buffer_ptr< typename std::remove_const< T >::type > const & other )

Copy constructor: creates a new buffer_ptr pointing to the same underlying buffer as other. A buffer_ptr<const T> instance may be constructed from an instance of buffer_ptr<T>.

Parameters

other An existing buffer pointer.

Example:

hetcompute::buffer_ptr<int> b = hetcompute::create_buffer<int>(10);

hetcompute::buffer_ptr<int> x(b);

hetcompute::buffer_ptr<const int> y(b);
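The constructor parameter type buffer_ptr<typename std::remove_const<T>::type> is what lets a pointer to const elements be built from a pointer to mutable elements while both share one buffer. A minimal sketch of that pattern follows, using a hypothetical handle<T> class over std::shared_ptr and std::vector. It is not the HetCompute implementation, and the helper const_view_shares_storage exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical ref-counted handle over shared storage.
template <typename T>
class handle
{
    using element = typename std::remove_const<T>::type;
    using storage = std::vector<element>;

public:
    handle() = default;
    explicit handle(std::size_t n) : data_(std::make_shared<storage>(n)) {}

    // For non-const T this is the ordinary copy constructor.
    // For const T it additionally accepts a handle to the non-const type.
    handle(handle<element> const& other) : data_(other.data_) {}

    // Element access: T& is const-qualified when T is const.
    T& operator[](std::size_t i) const { return (*data_)[i]; }

private:
    template <typename U> friend class handle;
    std::shared_ptr<storage> data_;
};

// Mirrors the documented example: x and y refer to the same storage as b,
// and y exposes it as read-only.
inline bool const_view_shares_storage()
{
    handle<int> b(10);
    handle<int> x(b);
    handle<const int> y(b);
    b[3] = 42;  // write through the mutable handle
    return x[3] == 42 && y[3] == 42;
}
```

In this sketch, writing through y is rejected at compile time because y[3] has type const int&.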

10.2.1.3.3 Member Function Documentation

10.2.1.3.3.1 template<typename T> void hetcompute::buffer_ptr< T >::acquire_ro ( ) const

Acquires the underlying buffer for read-only access by the host code. The host code may read the existing contents of the buffer after this call, until the host code releases access using the release() method.

The call will block until any conflicting operations complete (e.g., a task concurrently performing read-write access to the buffer), after which the buffer is acquired for access by the host code and the call unblocks. However, if the buffer has already been acquired for the host code by a preceding acquire_*(), the call will return immediately.

The host code may recursively acquire the buffer using a combination of acquire_ro(), acquire_wi() and acquire_rw() calls. The first acquire_*() establishes the access type (read-only, write-invalidate, or read-write) of the buffer for the host code. Subsequent recursive acquire_*() calls will succeed only if they are compatible with the previously established access type.

Subsequent recursive acquire_wi() and acquire_rw() calls will return with failure if the first recursive acquisition was acquire_ro(), as the access type of these calls is incompatible with the established read-only access. However, any subsequent acquire_*() recursive calls will succeed if the first acquisition was either write-invalidate or read-write. When the established access type is write-invalidate, subsequent recursive read-only or read-write acquisitions are considered to get access to any data written to the buffer after the original write-invalidate. When the established access type is read-write, a subsequent recursive write-invalidate does not destroy any prior data, as there is no additional synchronization required between device memories to access the latest data.

The host code releases the buffer only once the number of release() calls equals the number of successful recursive acquire_*() calls.

Note that access by concurrent threads of the host code is also considered recursive, even when the acquire-release calls do not properly nest across threads. The first acquire by any one thread establishes the host access type for all threads of the host code, until the host code releases.

HetCompute disallows concurrent access to a buffer when the buffer is being modified. The acquisition will be blocked when a concurrent task/pattern has acquired the buffer for read-write or write-invalidate access. In rare situations, the acquisition may also be blocked when a concurrent task/pattern has read-only access but HetCompute is unable to synchronize the buffer data for host access until the concurrent task/pattern completes.

See Also

hetcompute::create_buffer()
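The recursive acquire rules described above can be summarized as a small state machine: the first acquisition fixes the access type, later write-capable acquisitions fail only when that type is read-only, and the buffer is released when the acquire count returns to zero. The following single-threaded sketch models just those rules. It omits blocking on conflicting tasks and any data synchronization. The class host_access_model and its members are hypothetical, and having release() return the number of outstanding acquisitions is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>

// Single-threaded model of the host-side acquire rules.
class host_access_model
{
public:
    enum class access { none, read_only, write_invalidate, read_write };

    // Read-only is compatible with every established access type,
    // so this call always succeeds.
    void acquire_ro()
    {
        establish(access::read_only);
        ++count_;
    }

    bool acquire_rw() { return acquire_writable(access::read_write); }
    bool acquire_wi() { return acquire_writable(access::write_invalidate); }

    // Assumption: returns the number of acquisitions still outstanding.
    std::size_t release()
    {
        if (count_ > 0 && --count_ == 0) established_ = access::none;
        return count_;
    }

    access established() const { return established_; }

private:
    // Only the first acquisition establishes the access type.
    void establish(access requested)
    {
        if (count_ == 0) established_ = requested;
    }

    // Write-capable acquisitions fail under an established read-only access.
    bool acquire_writable(access requested)
    {
        if (count_ > 0 && established_ == access::read_only) return false;
        establish(requested);
        ++count_;
        return true;
    }

    access established_ = access::none;
    std::size_t count_ = 0;
};
```

In the model, a failed acquire_rw() or acquire_wi() does not increment the count, which matches the statement that only successful acquisitions need a matching release().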

10.2.1.3.3.2 template<typename T> bool hetcompute::buffer_ptr< T >::acquire_rw ( ) const

Attempts to acquire the underlying buffer for read-write access by the host code. Returns true if successful, false on failure to acquire for read-write due to a prior read-only acquisition by the host code. If successful, the host may read the prior contents of the buffer and update the contents until the host code releases access using the release() method.

The call will block until any conflicting operations complete (e.g., a task concurrently performing read-write access to the buffer), after which the buffer is acquired for access by the host code and the call unblocks. However, if the buffer has already been acquired for the host code by a preceding acquire_*(), the call will return immediately.

The host code may recursively acquire the buffer using a combination of acquire_ro(), acquire_wi() and acquire_rw() calls. The first acquire_*() establishes the access type (read-only, write-invalidate, or read-write) of the buffer for the host code. Subsequent recursive acquire_*() calls will succeed only if they are compatible with the previously established access type. Subsequent recursive acquire_wi() and acquire_rw() calls will return with failure if the first recursive acquisition was acquire_ro(), as the access type of these calls is incompatible with the established read-only access. However, any subsequent recursive acquire_*() calls will succeed if the first acquisition was either write-invalidate or read-write.

When the established access type is write-invalidate, subsequent recursive read-only or read-write acquisitions are considered to get access to any data written to the buffer after the original write-invalidate. When the established access type is read-write, a subsequent recursive write-invalidate does not destroy any prior data, as there is no additional synchronization required between device memories to access the latest data.

The host code releases the buffer only when the number of release() calls equals the number of successful recursive acquire_*() calls.
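The compatibility and counting rules above can be summarized as a small state machine. The following is a minimal sketch under stated assumptions, not the SDK implementation: host_acquire_model and access_type are hypothetical names, blocking on concurrent tasks is not modeled, and std::logic_error stands in for the SDK's own exception type.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical names: this models only the documented host-side rules.
enum class access_type { none, ro, wi, rw };

class host_acquire_model {
public:
    bool acquire_ro() { return acquire(access_type::ro); }
    bool acquire_wi() { return acquire(access_type::wi); }
    bool acquire_rw() { return acquire(access_type::rw); }

    // Returns the number of acquisitions still outstanding after this call.
    std::size_t release() {
        if (count_ == 0) throw std::logic_error("buffer is not acquired by the host");
        if (--count_ == 0) established_ = access_type::none;
        return count_;
    }

    access_type established() const { return established_; }

private:
    bool acquire(access_type requested) {
        if (count_ == 0) {
            established_ = requested;   // the first acquire fixes the access type
        } else if (established_ == access_type::ro && requested != access_type::ro) {
            return false;               // wi/rw are incompatible with read-only
        }
        ++count_;                       // only successful acquires are counted
        return true;
    }

    access_type established_ = access_type::none;
    std::size_t count_ = 0;
};

// Walks through the cases named in the text; returns true if all behave as described.
inline bool example_recursive_acquire() {
    host_acquire_model m;
    bool ok = m.acquire_ro();           // establishes read-only
    ok = ok && !m.acquire_rw();         // rejected, and not counted
    ok = ok && m.acquire_ro();          // compatible with read-only
    ok = ok && m.release() == 1;
    ok = ok && m.release() == 0;        // host access fully released
    ok = ok && m.acquire_wi();          // establishes write-invalidate
    ok = ok && m.acquire_ro() && m.acquire_rw();   // any later acquire succeeds
    ok = ok && m.release() == 2 && m.release() == 1 && m.release() == 0;
    assert(m.established() == access_type::none);
    return ok;
}
```

Note that a failed acquire does not need a matching release() in this model, which mirrors the statement that only successful recursive acquisitions are counted.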


Note that access by concurrent threads of the host code is also considered recursive, even when the acquire-release calls do not properly nest across threads. The first acquire by any one thread establishes the host access type for all threads of the host code, until the host code releases.

HetCompute disallows concurrent access to a buffer when the buffer is being modified. The acquisition will be blocked when a concurrent task/pattern has acquired the buffer for read-write or write-invalidate access. In rare situations, the acquisition may also be blocked when a concurrent task/pattern has read-only access but HetCompute is unable to synchronize the buffer data for host access until the concurrent task/pattern completes.

See Also

hetcompute::create_buffer()

10.2.1.3.3.3 template<typename T> bool hetcompute::buffer_ptr< T >::acquire_wi ( ) const

Attempts to acquire the underlying buffer for write-invalidate access by the host code. Returns true if successful, false on failure to acquire for write-invalidate due to a prior read-only acquisition by the host code. If successful, the prior contents of the buffer are lost after this call. The host code may write the buffer data (and read back what it wrote) after this call, until the host code releases access using the release() method.

The call will block until any conflicting operations complete (e.g., a task concurrently performing read-write access to the buffer), after which the buffer is acquired for access by the host code and the call unblocks. However, if the buffer has already been acquired for the host code by a preceding acquire_*(), the call will return immediately.

The host code may recursively acquire the buffer using a combination of acquire_ro(), acquire_wi() and acquire_rw() calls. The first acquire_*() establishes the access type (read-only, write-invalidate, or read-write) of the buffer for the host code. Subsequent recursive acquire_*() calls will succeed only if they are compatible with the previously established access type. Subsequent recursive acquire_wi() and acquire_rw() calls will return with failure if the first recursive acquisition was acquire_ro(), as the access type of these calls is incompatible with the established read-only access. However, any subsequent recursive acquire_*() calls will succeed if the first acquisition was either write-invalidate or read-write.

When the established access type is write-invalidate, subsequent recursive read-only or read-write acquisitions are considered to get access to any data written to the buffer after the original write-invalidate. When the established access type is read-write, a subsequent recursive write-invalidate does not destroy any prior data, as there is no additional synchronization required between device memories to access the latest data.

The host code releases the buffer only when the number of release() calls equals the number of successful recursive acquire_*() calls.

Note that access by concurrent threads of the host code is also considered recursive, even when the acquire-release calls do not properly nest across threads. The first acquire by any one thread establishes the host access type for all threads of the host code, until the host code releases.

HetCompute disallows concurrent access to a buffer when the buffer is being modified. The acquisition will be blocked when a concurrent task/pattern has acquired the buffer for read-write or write-invalidate access. In rare situations, the acquisition may also be blocked when a concurrent task/pattern has read-only access but HetCompute is unable to synchronize the buffer data for host access until the concurrent task/pattern completes.


See Also

hetcompute::create_buffer()

10.2.1.3.3.4 template<typename T> T& hetcompute::buffer_ptr< T >::at ( size_t index ) const

Same as operator[] but also checks if the buffer data has been previously made host accessible via this buffer_ptr and performs array bounds check. However, does not guarantee that the buffer data is currently host accessible. The programmer must ensure that the buffer will not be concurrently accessed by tasks that may invalidate the host accessible data (for example, by not launching any task that accesses a buffer_ptr to this buffer until the host access is complete).

See saved_host_data() for host access criteria.

Parameters

index The index to the element to lookup inside the buffer data.

Exceptions

hetcompute::api_exception if data is not host-accessible.

std::out_of_range if index exceeds array bounds.

Note

If exceptions are disabled by application, the API will terminate the application if data is not host-accessible.

Returns

A reference to the indexed element.

See Also

saved_host_data()
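The difference between the unchecked operator[] and the checked at() can be illustrated with a toy handle. This is a sketch under stated assumptions, not SDK code: checked_view, make_host_accessible() and not_host_accessible are hypothetical names, with not_host_accessible standing in for the SDK's exception type.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for the SDK's exception type (assumption for this sketch).
struct not_host_accessible : std::logic_error {
    not_host_accessible() : std::logic_error("data was never made host accessible") {}
};

// Toy handle: host_ stays null until the data has been made host accessible.
template <typename T>
class checked_view {
public:
    explicit checked_view(std::size_t n) : size_(n) {}

    // Models a prior host_data()/acquire_*() call: allocate and save the pointer.
    void make_host_accessible() {
        storage_.resize(size_);
        host_ = storage_.data();
    }

    // Unchecked lookup: the caller must already have ensured host access.
    T& operator[](std::size_t i) const { return host_[i]; }

    // Checked lookup: verifies the saved pointer first, then the bounds.
    T& at(std::size_t i) const {
        if (host_ == nullptr) throw not_host_accessible();
        if (i >= size_) throw std::out_of_range("index exceeds buffer bounds");
        return host_[i];
    }

private:
    std::size_t size_;
    std::vector<T> storage_;
    T* host_ = nullptr;
};

// Returns the number of checks that fired (expected: both of them).
inline int example_checked_access() {
    checked_view<int> v(4);
    int caught = 0;
    try { v.at(0); } catch (const not_host_accessible&) { ++caught; }
    v.make_host_accessible();
    v.at(3) = 7;                                   // in bounds, writable
    try { v.at(4); } catch (const std::out_of_range&) { ++caught; }
    assert(v[3] == 7);
    return caught;
}
```

As in the text, neither check proves that the data is current; they only catch a pointer that was never obtained and an index outside the buffer.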

10.2.1.3.3.5 template<typename T> iterator hetcompute::buffer_ptr< T >::begin ( ) const

Get iterator to the start of the buffer data. Allows mutable access to the buffer data.

Returns

Iterator to the start of the buffer data.


10.2.1.3.3.6 template<typename T> const_iterator hetcompute::buffer_ptr< T >::cbegin ( ) const

Get const iterator to the start of the buffer data. Restricts to immutable access.

Returns

Const iterator to the start of the buffer data.

10.2.1.3.3.7 template<typename T> const_iterator hetcompute::buffer_ptr< T >::cend ( ) const

Get const iterator to the end of the buffer data. Restricts to immutable access.

Returns

Const iterator to the end of the buffer data.

10.2.1.3.3.8 template<typename T> iterator hetcompute::buffer_ptr< T >::end ( ) const

Get iterator to the end of the buffer data. Allows mutable access to the buffer data.

Returns

Iterator to the end of the buffer data.

10.2.1.3.3.9 template<typename T> void* hetcompute::buffer_ptr< T >::host_data ( ) const

Gets a pointer to the host accessible data of the underlying buffer, allocating the host accessible storage if necessary. Note that this call does not ensure that the buffer data is currently host accessible. For example, data updates by a concurrent task on the underlying buffer may not be visible yet via the host accessible data pointer.

Returns

nullptr, if this buffer_ptr is nullptr.

!=nullptr, if this buffer_ptr points to a valid buffer.

See Also

saved_host_data() for fast lookup of a previously queried pointer to the host accessible data.

acquire_ro(), acquire_wi(), acquire_rw(), release() to allow the buffer to be read or written by the host code, in addition to querying a pointer to the host accessible data.

Unlike the acquire calls, which may sometimes block when there is concurrent task access to the buffer, this method can be called at any time without blocking to determine the pointer to the host accessible data.


10.2.1.3.3.10 template<typename T> buffer_ptr& hetcompute::buffer_ptr< T >::operator= (buffer_ptr< typename std::remove_const< T >::type > const & other )

Copy assignment: points to the underlying buffer of other. A buffer_ptr<const T> instance may be assigned to from an instance of buffer_ptr<T>. If this buffer_ptr was the last one pointing to its underlying buffer, the underlying buffer will get deallocated once the buffer_ptr is copy-assigned and points to a different underlying buffer.

Parameters

other An existing buffer pointer.

Example:

hetcompute::buffer_ptr<int> b = hetcompute::create_buffer<int>(10);
hetcompute::buffer_ptr<int> x;
hetcompute::buffer_ptr<const int> y;
x = b;
y = b;
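The std::remove_const parameter type is what lets a pointer-to-const handle be assigned from a pointer-to-mutable one. The following sketch shows the mechanism with a hypothetical handle class built on std::shared_ptr; it assumes reference-counted ownership for illustration and says nothing about how buffer_ptr is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical reference-counted handle, used only to show the assignment rule.
template <typename T>
class handle {
public:
    using value_type = typename std::remove_const<T>::type;

    handle() = default;
    handle(handle const&) = default;
    explicit handle(std::size_t n)
        : buf_(std::make_shared<std::vector<value_type>>(n)) {}

    // Takes handle<value_type>: handle<const int> = handle<int> compiles,
    // while handle<int> = handle<const int> does not.
    handle& operator=(handle<value_type> const& other) {
        buf_ = other.buf_;   // drops the old buffer if this was its last owner
        return *this;
    }

    long owners() const { return buf_.use_count(); }
    std::weak_ptr<std::vector<value_type>> watch() const { return buf_; }

private:
    template <typename U> friend class handle;
    std::shared_ptr<std::vector<value_type>> buf_;
};

inline bool example_assignment() {
    handle<int> b(10);
    handle<int> x;
    handle<const int> y;
    x = b;                          // same buffer, mutable view
    y = b;                          // same buffer, const view
    const bool shared = (b.owners() == 3);

    handle<int> last(4);            // sole owner of a second buffer
    auto observer = last.watch();
    last = b;                       // the second buffer loses its last owner
    assert(observer.expired());
    return shared && b.owners() == 4;
}
```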

10.2.1.3.3.11 template<typename T> T& hetcompute::buffer_ptr< T >::operator[ ] ( size_t index ) const

If the buffer data is host accessible or being accessed as a CPU task parameter, it performs an array index lookup. Undefined behavior for host accesses if the programmer has not previously ensured that the buffer data is host accessible.

See saved_host_data() for host access criteria.

Parameters

index The index to the element to lookup inside the buffer data.

Returns

A reference to the indexed element.

See Also

at()

saved_host_data()

10.2.1.3.3.12 template<typename T> size_t hetcompute::buffer_ptr< T >::release ( ) const

Decrements the host acquire count, releasing the buffer from host access when the count goes to zero. release() needs to be called once for every successful recursive call to acquire_*(), after which the buffer is released from access by the host code. The host code may not read or write the buffer contents after the final release() call, until the host code acquires the buffer again.

The release() call never blocks.

The call returns the number of recursive acquisitions remaining to be released before the host code will release access to the buffer. That is, the host code releases access when the return value of this call is 0.

Exceptions

hetcompute::api_exception if called when the buffer is not currently acquired by the host code.

Note

If exceptions are disabled by application, terminates the application when the buffer is not currently acquired by the host code.

10.2.1.3.3.13 template<typename T> void* hetcompute::buffer_ptr< T >::saved_host_data ( ) const

Fast lookup of a saved pointer to the host accessible data of the underlying buffer. The pointer may be saved by either a previous host_data() or acquire_*() call.

Note that this call does not ensure that the buffer data is currently host accessible via this buffer_ptr. For example, data updates by a concurrent task on the underlying buffer may not be visible yet via the host accessible data pointer.

Returns

nullptr, if

i) the host accessible data pointer has not previously been queried via this buffer_ptr

ii) this buffer_ptr is a nullptr.

!=nullptr, if

See Also

host_data(), acquire_ro(), acquire_wi(), acquire_rw(), release() for explicit host synchronization.
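The relationship between the allocating query and the fast lookup can be pictured with a toy class. This is only a sketch under stated assumptions: lazy_host_view is a hypothetical name, its host_data() is non-const for simplicity (the SDK methods are const), and no device synchronization is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of "allocate on first query, then reuse the saved pointer".
class lazy_host_view {
public:
    explicit lazy_host_view(std::size_t bytes) : bytes_(bytes) {}

    // Like host_data(): allocates host storage if necessary and saves the pointer.
    void* host_data() {
        if (saved_ == nullptr) {
            storage_.resize(bytes_);
            saved_ = storage_.data();
        }
        return saved_;
    }

    // Like saved_host_data(): returns whatever was saved and never allocates.
    void* saved_host_data() const { return saved_; }

private:
    std::size_t bytes_;
    std::vector<unsigned char> storage_;
    void* saved_ = nullptr;
};

inline bool example_saved_pointer() {
    lazy_host_view v(64);
    const bool empty_before = (v.saved_host_data() == nullptr);
    void* p = v.host_data();        // first query allocates and saves
    assert(p != nullptr);
    return empty_before && v.saved_host_data() == p && v.host_data() == p;
}
```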

10.2.1.3.3.14 template<typename T> size_t hetcompute::buffer_ptr< T >::size ( ) const

The number of elements of datatype T in the underlying buffer pointed to by this buffer_ptr.

10.2.1.3.3.15 template<typename T> std::string hetcompute::buffer_ptr< T >::to_string ( ) const

Gets a string with basic information about the buffer_ptr.

Returns

String with basic information about the buffer_ptr.


10.2.1.3.3.16 template<typename T> template<hetcompute::graphics::image_format img_format, int dims> buffer_ptr& hetcompute::buffer_ptr< T >::treat_as_texture ( hetcompute::graphics::image_size< dims > const & is )

Allows this buffer_ptr to be interpreted as a texture of a given format, dimensionality and size when passed as an argument to a gpu_kernel expecting a hetcompute::graphics::texture_ptr. The interpretation applies to the current buffer_ptr, not to the buffer as a whole. That is, multiple buffer_ptrs to the same buffer may simultaneously be interpreted as textures of different formats, dimensions and sizes.

Template Parameters

img_format The texture image format to interpret this buffer_ptr as.

dims The texture dimensions to interpret this buffer_ptr as.

Parameters

is The texture image size to interpret this buffer_ptr as.

Returns

This buffer_ptr.

Throws an assertion if the HetCompute library is built without GPU support.

10.2.1.4 struct hetcompute::in

template<typename BufferPtr> struct hetcompute::in< BufferPtr >

Use in a kernel parameter declaration to indicate that a buffer parameter will be input-only (read-only) for the kernel.

10.2.1.5 struct hetcompute::inout

template<typename BufferPtr> struct hetcompute::inout< BufferPtr >

Use in a kernel parameter declaration to indicate that a buffer parameter will be used both as an input and an output (read-write) by the kernel.

10.2.1.6 struct hetcompute::out

template<typename BufferPtr> struct hetcompute::out< BufferPtr >

Use in a kernel parameter declaration to indicate that a buffer parameter will be output-only (write-invalidate) for the kernel.

10.2.1.7 class hetcompute::scope_acquire_ro


template<typename T> class hetcompute::scope_acquire_ro< T >

Scope guard for read-only acquire of a buffer by the host code.

Example: instead of writing

void f() {
    b.acquire_ro();
    if(...) {
        b.release();
        return;
    }
    ...
    b.release();
}

write

void f() {
    scope_acquire_ro<decltype(b)::data_type> guard(b);
    if(...) {
        return;
    }
    ...
}
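The guard works because the release happens in a destructor, so every exit path from the scope releases exactly once. The sketch below demonstrates that pattern with hypothetical stand-ins (counting_buffer, scoped_ro); it is not the SDK class, and the stand-in buffer only counts acquisitions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a buffer handle: it only counts host acquisitions.
struct counting_buffer {
    mutable std::size_t acquired = 0;
    bool acquire_ro() const { ++acquired; return true; }
    std::size_t release() const {
        assert(acquired > 0);
        return --acquired;
    }
};

// Acquire in the constructor, release in the destructor. Copying and moving
// are disabled so that exactly one release matches each acquire.
class scoped_ro {
public:
    explicit scoped_ro(counting_buffer const& b) : b_(b) { b_.acquire_ro(); }
    ~scoped_ro() { b_.release(); }
    scoped_ro(scoped_ro const&) = delete;
    scoped_ro& operator=(scoped_ro const&) = delete;
    scoped_ro(scoped_ro&&) = delete;
    scoped_ro& operator=(scoped_ro&&) = delete;

private:
    counting_buffer const& b_;
};

// Reports how many acquisitions are held at the point of return.
inline std::size_t held_inside(counting_buffer const& b, bool early) {
    scoped_ro guard(b);
    if (early) {
        return b.acquired;      // only `guard` is alive here
    }
    scoped_ro nested(b);        // a recursive acquire from the same host code
    return b.acquired;          // both guards are alive here
}
```

After held_inside() returns by either path, the count in the stand-in buffer is back to zero, which is the property the guard classes are meant to provide.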

See Also

buffer_ptr::acquire_ro();

Public member functions

• scope_acquire_ro (hetcompute::buffer_ptr< T > const &b)

• HETCOMPUTE_DELETE_METHOD (scope_acquire_ro())

• HETCOMPUTE_DELETE_METHOD (scope_acquire_ro(scope_acquire_ro const &))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_ro &operator=(scope_acquire_ro const&))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_ro(scope_acquire_ro &&))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_ro &operator=(scope_acquire_ro &&))

10.2.1.8 class hetcompute::scope_acquire_rw

template<typename T> class hetcompute::scope_acquire_rw< T >

Scope guard for read-write acquire of a buffer by the host code.

Example: instead of writing

void f() {
    b.acquire_rw();
    if(...) {
        b.release();
        return;
    }
    ...
    b.release();
}

write

void f() {
    scope_acquire_rw<decltype(b)::data_type> guard(b);
    if(...) {
        return;
    }
    ...
}

See Also

buffer_ptr::acquire_rw();

Public member functions

• scope_acquire_rw (hetcompute::buffer_ptr< T > const &b)

• HETCOMPUTE_DELETE_METHOD (scope_acquire_rw())

• HETCOMPUTE_DELETE_METHOD (scope_acquire_rw(scope_acquire_rw const &))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_rw &operator=(scope_acquire_rw const&))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_rw(scope_acquire_rw &&))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_rw &operator=(scope_acquire_rw &&))

10.2.1.9 class hetcompute::scope_acquire_wi

template<typename T> class hetcompute::scope_acquire_wi< T >

Scope guard for write-invalidate acquire of a buffer by the host code.

Example: instead of writing

void f() {
    b.acquire_wi();
    if(...) {
        b.release();
        return;
    }
    ...
    b.release();
}

write

void f() {
    scope_acquire_wi<decltype(b)::data_type> guard(b);
    if(...) {
        return;
    }
    ...
}

See Also

buffer_ptr::acquire_wi();

Public member functions

• scope_acquire_wi (hetcompute::buffer_ptr< T > const &b)

• HETCOMPUTE_DELETE_METHOD (scope_acquire_wi())


• HETCOMPUTE_DELETE_METHOD (scope_acquire_wi(scope_acquire_wi const &))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_wi &operator=(scope_acquire_wi const&))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_wi(scope_acquire_wi &&))

• HETCOMPUTE_DELETE_METHOD (scope_acquire_wi &operator=(scope_acquire_wi &&))

10.2.2 Function Documentation

10.2.2.1 template<typename T > buffer_ptr<T> hetcompute::create_buffer ( size_t num_elems, device_set const & likely_devices )

Creates a buffer of datatype T of the requested size. Fatal error if num_elems is 0.

Template Parameters

T User data type for buffer.

Parameters

num_elems Number of elements of type T in buffer. Must be larger than zero.

likely_devices Optional, default is an empty hetcompute::device_set. Allows the programmer to convey advance knowledge of which device-types may access this buffer, thereby allowing Qualcomm HetCompute to internally determine an optimal storage and data-transfer policy for this buffer. This information is only used as a hint to guide internal storage-allocation and data-transfer optimizations, and is allowed to be partial or incorrect.

Returns

A buffer pointer to the created buffer.

The optional parameters allow for the following variants to this call.

hetcompute::create_buffer(size_t num_elems);

hetcompute::create_buffer(size_t num_elems, hetcompute::device_set const& likely_devices);

10.2.2.2 template<typename T > buffer_ptr<T> hetcompute::create_buffer ( T * preallocated_ptr, size_t num_elems, device_set const & likely_devices )

Creates a buffer of datatype T of the requested size from a pre-allocated pointer. The pre-allocated pointer provides initial storage and potentially initial data for the buffer. Fatal error if num_elems is 0.

Template Parameters

T User data type for buffer.


Parameters

preallocated_ptr Pointer to pre-allocated contiguous memory of size at least num_elems * sizeof(T) bytes.

num_elems Number of elements of type T in buffer. Must be larger than zero.

likely_devices Optional, default is an empty hetcompute::device_set. Allows the programmer to convey advance knowledge of which device-types may access this buffer, thereby allowing Qualcomm HetCompute to internally determine an optimal storage and data-transfer policy for this buffer. This information is only used as a hint to guide internal storage-allocation and data-transfer optimizations, and is allowed to be partial or incorrect.

Returns

A buffer pointer to the created buffer.

The optional parameter allows for the following variants to this call.

create_buffer(T* preallocated_ptr, size_t num_elems);

create_buffer(T* preallocated_ptr, size_t num_elems, device_set const& likely_devices);

10.2.2.3 template<typename T > buffer_ptr<T> hetcompute::create_buffer (memregion const & mr, size_t num_elems, device_set const & likely_devices)

Creates a buffer of datatype T of the requested size from a hetcompute::memregion. The hetcompute::memregion provides initial storage and potentially initial data for the buffer. Fatal error if both num_elems == 0 and mr.get_num_bytes() == 0, or if (num_elems * sizeof(T)) > mr.get_num_bytes().

Template Parameters

T User data type for buffer.

Parameters

mr A memory region of the desired type allocated by the user.

num_elems Optional, default = 0. If 0, the number of elements is determined by the capacity of mr, computed as mr.get_num_bytes() / sizeof(T). If non-zero, specifies the number of elements of type T in buffer, which must fit in the capacity of mr.

likely_devices Optional, default is an empty hetcompute::device_set. Allows the programmer to convey advance knowledge of which device-types may access this buffer, thereby allowing Qualcomm HetCompute to internally determine an optimal storage and data-transfer policy for this buffer. This information is only used as a hint to guide internal storage-allocation and data-transfer optimizations, and is allowed to be partial or incorrect.


Returns

A buffer pointer to the created buffer.

See Also

hetcompute::memregion

The optional parameters allow for the following variants to this call.

hetcompute::create_buffer(hetcompute::memregion const& mr);

hetcompute::create_buffer(hetcompute::memregion const& mr, size_t num_elems);

hetcompute::create_buffer(hetcompute::memregion const& mr, hetcompute::device_set const& likely_devices);

hetcompute::create_buffer(hetcompute::memregion const& mr, size_t num_elems, hetcompute::device_set const& likely_devices);
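The sizing rules for the memregion variant reduce to a short piece of arithmetic. The helper below is a sketch of those documented rules only: resolve_num_elems is a hypothetical name, it takes the region size in bytes rather than a memregion, and it throws std::invalid_argument where the SDK reports a fatal error so that the rules can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: returns the element count a buffer of T would have for
// a region of `region_bytes` bytes and a requested `num_elems` (0 = derive it).
template <typename T>
std::size_t resolve_num_elems(std::size_t region_bytes, std::size_t num_elems = 0) {
    if (num_elems == 0 && region_bytes == 0) {
        throw std::invalid_argument("both num_elems and the region size are zero");
    }
    if (num_elems == 0) {
        return region_bytes / sizeof(T);          // capacity of the region
    }
    // Same condition as (num_elems * sizeof(T)) > region_bytes, written as a
    // division so that the multiplication cannot overflow.
    if (num_elems > region_bytes / sizeof(T)) {
        throw std::invalid_argument("num_elems does not fit in the region");
    }
    return num_elems;
}

inline void example_sizes() {
    // A 64-byte region holds 64 / sizeof(double) doubles when num_elems is omitted.
    assert(resolve_num_elems<double>(64) == 64 / sizeof(double));
}
```

For instance, with a region of 10 * sizeof(int) bytes, omitting num_elems yields 10 elements, requesting 4 yields 4, and requesting 11 is rejected.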


10.3 Memory Regions

Classes

• class hetcompute::glbuffer_memregion

Creates inter-operability with an OpenGL buffer. More...

• class hetcompute::ion_memregion

Allocates ION memory on platforms that support it. More...

• class hetcompute::main_memregion

Allocates aligned memory from the platform main memory. More...

• class hetcompute::memregion

Base class for all mem-regions. More...

• class hetcompute::svm_memregion

Allocates memory from the OpenCL 2.0 SVM Memory region. More...

Functions

• hetcompute::glbuffer_memregion::glbuffer_memregion (GLuint id)

Constructor, wraps an existing OpenGL buffer to allow inter-operability with Qualcomm HetCompute.

• hetcompute::ion_memregion::ion_memregion (size_t sz, bool cacheable=true)

Constructor, allocates ION memory.

• hetcompute::ion_memregion::ion_memregion (void ∗ptr, size_t sz, bool cacheable)

Constructor, uses allocated ION memory.

• hetcompute::ion_memregion::ion_memregion (void ∗ptr, int fd, size_t sz, bool cacheable)

Constructor, uses allocated ION memory, with the associated file descriptor.

• hetcompute::main_memregion::main_memregion (size_t sz, size_t alignment=s_default_alignment)

Constructor, allocates aligned memory.

• hetcompute::main_memregion::main_memregion (void ∗ptr, size_t sz)

Constructor, uses user-allocated memory.

• hetcompute::memregion::memregion (internal::internal_memregion ∗int_mr)

• hetcompute::svm_memregion::svm_memregion (size_t sz, cl_svm_mem_flags flags=CL_MEM_READ_WRITE)

Constructor, allocates coarse-grained SVM memory.

• hetcompute::svm_memregion::svm_memregion (void *ptr, size_t sz, cl_svm_mem_flags flags=CL_MEM_READ_WRITE)

Constructor, uses user-allocated memory.

• hetcompute::memregion::~memregion ()

Destructor.

• int hetcompute::ion_memregion::get_fd () const

Gets the file descriptor associated with the pointer to the allocated ION memory.

• GLuint hetcompute::glbuffer_memregion::get_id () const

Gets the id of the wrapped OpenGL buffer.

• size_t hetcompute::memregion::get_num_bytes () const

Get the size of the mem-region in bytes.

• void ∗ hetcompute::main_memregion::get_ptr () const

Gets a pointer to the allocated memory.

• void ∗ hetcompute::ion_memregion::get_ptr () const

Gets a pointer to the allocated ION memory.

• void ∗ hetcompute::svm_memregion::get_ptr () const

Gets a pointer to the allocated memory.

• hetcompute::memregion::HETCOMPUTE_DELETE_METHOD (memregion())

• hetcompute::memregion::HETCOMPUTE_DELETE_METHOD (memregion(memregion const &))

• hetcompute::memregion::HETCOMPUTE_DELETE_METHOD (memregion &operator=(memregion const &))

• hetcompute::memregion::HETCOMPUTE_DELETE_METHOD (memregion(memregion &&))

• hetcompute::main_memregion::HETCOMPUTE_DELETE_METHOD (main_memregion())

• hetcompute::main_memregion::HETCOMPUTE_DELETE_METHOD (main_memregion(main_memregion const &))

• hetcompute::main_memregion::HETCOMPUTE_DELETE_METHOD (main_memregion &operator=(main_memregion const &))

• hetcompute::main_memregion::HETCOMPUTE_DELETE_METHOD (main_memregion(main_memregion &&))

• hetcompute::ion_memregion::HETCOMPUTE_DELETE_METHOD (ion_memregion())

• hetcompute::ion_memregion::HETCOMPUTE_DELETE_METHOD (ion_memregion(ion_memregion const &))

• hetcompute::ion_memregion::HETCOMPUTE_DELETE_METHOD (ion_memregion &operator=(ion_memregion const &))

• hetcompute::ion_memregion::HETCOMPUTE_DELETE_METHOD (ion_memregion(ion_memregion &&))

• hetcompute::glbuffer_memregion::HETCOMPUTE_DELETE_METHOD (glbuffer_memregion())

• hetcompute::glbuffer_memregion::HETCOMPUTE_DELETE_METHOD (glbuffer_memregion(glbuffer_memregion const &))


• hetcompute::glbuffer_memregion::HETCOMPUTE_DELETE_METHOD (glbuffer_memregion &operator=(glbuffer_memregion const &))

• hetcompute::glbuffer_memregion::HETCOMPUTE_DELETE_METHOD (glbuffer_memregion(glbuffer_memregion &&))

• hetcompute::svm_memregion::HETCOMPUTE_DELETE_METHOD (svm_memregion())

• hetcompute::svm_memregion::HETCOMPUTE_DELETE_METHOD (svm_memregion(svm_memregion const &))

• hetcompute::svm_memregion::HETCOMPUTE_DELETE_METHOD (svm_memregion &operator=(svm_memregion const &))

• hetcompute::svm_memregion::HETCOMPUTE_DELETE_METHOD (svm_memregion(svm_memregion &&))

• bool hetcompute::ion_memregion::is_cacheable () const

Returns whether the ION memregion is cacheable.

Variables

• internal::internal_memregion ∗ hetcompute::memregion::_int_mr

• static constexpr size_t hetcompute::main_memregion::s_default_alignment = 4096

The default alignment needed for the allocation to be page aligned.

Friends

• class hetcompute::memregion::::hetcompute::internal::memregion_base_accessor

10.3.1 Class Documentation

10.3.1.1 class hetcompute::glbuffer_memregion

Creates inter-operability with an OpenGL buffer. The user may have an external OpenGL buffer. Qualcomm HetCompute may access the OpenGL buffer once inter-operability has been set up using an instance of this class. A derived class of hetcompute::memregion.

See Also

hetcompute::memregion

Public member functions

• glbuffer_memregion (GLuint id)

Constructor, wraps an existing OpenGL buffer to allow inter-operability with Qualcomm HetCompute.

• GLuint get_id () const

Gets the id of the wrapped OpenGL buffer.

• HETCOMPUTE_DELETE_METHOD (glbuffer_memregion())

• HETCOMPUTE_DELETE_METHOD (glbuffer_memregion(glbuffer_memregion const &))


• HETCOMPUTE_DELETE_METHOD (glbuffer_memregion &operator=(glbuffer_memregion const &))

• HETCOMPUTE_DELETE_METHOD (glbuffer_memregion(glbuffer_memregion &&))

• HETCOMPUTE_DELETE_METHOD (glbuffer_memregion &operator=(glbuffer_memregion &&))

Additional Inherited Members

10.3.1.2 class hetcompute::ion_memregion

Allocates ION memory on platforms that support it. The ION memory can be allocated as cacheable or non-cacheable. A derived class of hetcompute::memregion.

See Also

hetcompute::memregion

Public member functions

• ion_memregion (size_t sz, bool cacheable=true)

Constructor, allocates ION memory.

• ion_memregion (void ∗ptr, size_t sz, bool cacheable)

Constructor, uses allocated ION memory.

• ion_memregion (void ∗ptr, int fd, size_t sz, bool cacheable)

Constructor, uses allocated ION memory, with the associated file descriptor.

• int get_fd () const

Gets the file descriptor associated with the pointer to the allocated ION memory.

• void ∗ get_ptr () const

Gets a pointer to the allocated ION memory.

• HETCOMPUTE_DELETE_METHOD (ion_memregion())

• HETCOMPUTE_DELETE_METHOD (ion_memregion(ion_memregion const &))

• HETCOMPUTE_DELETE_METHOD (ion_memregion &operator=(ion_memregion const &))

• HETCOMPUTE_DELETE_METHOD (ion_memregion(ion_memregion &&))

• HETCOMPUTE_DELETE_METHOD (ion_memregion &operator=(ion_memregion &&))

• bool is_cacheable () const

Returns whether the ION memregion is cacheable.

Additional Inherited Members

10.3.1.3 class hetcompute::main_memregion

Allocates aligned memory from the platform main memory. The default alignment is 4096 bytes to get page-aligned allocation. A derived class of hetcompute::memregion.
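To make the alignment requirement concrete, the sketch below obtains a 4096-byte-aligned address from an over-allocated byte vector using std::align. It only illustrates what page alignment means for a pointer; carve_aligned and is_aligned are hypothetical helpers, and nothing here reflects how main_memregion actually allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Returns a pointer inside `backing` that is aligned to `alignment` bytes and
// has at least `sz` usable bytes behind it, or nullptr if it does not fit.
// `alignment` must be a power of two, as std::align requires.
inline void* carve_aligned(std::vector<unsigned char>& backing,
                           std::size_t sz, std::size_t alignment = 4096) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    backing.resize(sz + alignment);             // over-allocate to leave slack
    void* p = backing.data();
    std::size_t space = backing.size();
    return std::align(alignment, sz, p, space); // aligned pointer, or nullptr
}

// True if the address is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A pointer that passes is_aligned(p, 4096) starts on a 4 KiB boundary, which is the property the default alignment value is chosen to provide.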


See Also

hetcompute::memregion

Public member functions

• main_memregion (size_t sz, size_t alignment=s_default_alignment)

Constructor, allocates aligned memory.

• main_memregion (void ∗ptr, size_t sz)

Constructor, uses user-allocated memory.

• void ∗ get_ptr () const

Gets a pointer to the allocated memory.

• HETCOMPUTE_DELETE_METHOD (main_memregion())

• HETCOMPUTE_DELETE_METHOD (main_memregion(main_memregion const &))

• HETCOMPUTE_DELETE_METHOD (main_memregion &operator=(main_memregion const&))

• HETCOMPUTE_DELETE_METHOD (main_memregion(main_memregion &&))

• HETCOMPUTE_DELETE_METHOD (main_memregion &operator=(main_memregion &&))

Static Public Attributes

• static constexpr size_t s_default_alignment = 4096

The default alignment needed for the allocation to be page aligned.

Additional Inherited Members

10.3.1.4 class hetcompute::memregion

Base class for all mem-regions. The only common feature across the mem-regions is that they all have a size in bytes. The user constructs a mem-region of the appropriate type to allocate the corresponding type of specialized device-memory (hetcompute::main_memregion and hetcompute::ion_memregion) or to create inter-operability with data from an external framework (hetcompute::glbuffer_memregion).

Mem-regions provide RAII semantics:

• The specialized memory is allocated or the interop created when the user constructs the mem-region object of the appropriate type.

• The user keeps the allocated memory or the interop alive by keeping the mem-region object alive.

Note

The base class hetcompute::memregion is not user-constructible. The user instead constructs an object of a derived class of hetcompute::memregion that provides the desired allocation or interop functionality.
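The RAII behaviour described above can be illustrated without the SDK. The following is a minimal stand-alone sketch, not HetCompute code: aligned_region is a hypothetical class that mimics the documented shape of hetcompute::main_memregion (aligned allocation in the constructor, release in the destructor, deleted copy and move operations). It assumes a POSIX platform that provides posix_memalign; the is_aligned helper is likewise an illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h> // posix_memalign, free (POSIX)

// Hypothetical stand-in for a mem-region; NOT part of the HetCompute SDK.
class aligned_region
{
public:
    static constexpr std::size_t s_default_alignment = 4096;

    // Acquire the memory when the object is constructed.
    explicit aligned_region(std::size_t sz, std::size_t alignment = s_default_alignment)
        : _ptr(nullptr), _num_bytes(sz)
    {
        assert((alignment & (alignment - 1)) == 0 && "alignment must be a power of two");
        if (posix_memalign(&_ptr, alignment, sz) != 0)
        {
            throw std::bad_alloc();
        }
    }

    // Release the memory when the object is destroyed.
    ~aligned_region() { free(_ptr); }

    // Like the documented mem-regions, the sketch is neither copyable nor movable.
    aligned_region(aligned_region const&) = delete;
    aligned_region& operator=(aligned_region const&) = delete;
    aligned_region(aligned_region&&) = delete;
    aligned_region& operator=(aligned_region&&) = delete;

    void*       get_ptr() const { return _ptr; }
    std::size_t get_num_bytes() const { return _num_bytes; }

private:
    void*       _ptr;
    std::size_t _num_bytes;
};

// True if the address held by the region is a multiple of `alignment`.
inline bool is_aligned(aligned_region const& r, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(r.get_ptr()) % alignment == 0;
}
```

Keeping an aligned_region object in scope keeps its allocation alive, which is the same ownership pattern the mem-region classes document.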

See Also

hetcompute::main_memregion
hetcompute::svm_memregion
hetcompute::ion_memregion
hetcompute::glbuffer_memregion

Public member functions

• ~memregion ()

Destructor.

• size_t get_num_bytes () const

Get the size of the mem-region in bytes.

• HETCOMPUTE_DELETE_METHOD (memregion())

• HETCOMPUTE_DELETE_METHOD (memregion(memregion const &))

• HETCOMPUTE_DELETE_METHOD (memregion &operator=(memregion const &))

• HETCOMPUTE_DELETE_METHOD (memregion(memregion &&))

• HETCOMPUTE_DELETE_METHOD (memregion &operator=(memregion &&))

Protected Member Functions

• memregion (internal::internal_memregion *int_mr)

Protected Attributes

• internal::internal_memregion * _int_mr

Friends

• class ::hetcompute::internal::memregion_base_accessor

10.3.1.5 class hetcompute::svm_memregion

Allocates memory from the OpenCL 2.0 SVM memory region. A derived class of hetcompute::memregion.

See Also

hetcompute::memregion

Public member functions

• svm_memregion (size_t sz, cl_svm_mem_flags flags=CL_MEM_READ_WRITE)

Constructor, allocates coarse-grained SVM memory.

• svm_memregion (void *ptr, size_t sz, cl_svm_mem_flags flags=CL_MEM_READ_WRITE)

Constructor, uses user-allocated memory.

• void * get_ptr () const

Gets a pointer to the allocated memory.

• HETCOMPUTE_DELETE_METHOD (svm_memregion())

• HETCOMPUTE_DELETE_METHOD (svm_memregion(svm_memregion const &))

• HETCOMPUTE_DELETE_METHOD (svm_memregion &operator=(svm_memregion const &))

• HETCOMPUTE_DELETE_METHOD (svm_memregion(svm_memregion &&))

• HETCOMPUTE_DELETE_METHOD (svm_memregion &operator=(svm_memregion &&))

Additional Inherited Members

10.3.2 Function Documentation

10.3.2.1 hetcompute::glbuffer_memregion::glbuffer_memregion ( GLuint id )[explicit]

Constructor, wraps an existing OpenGL buffer to allow interoperability with Qualcomm HetCompute. The size of the OpenGL buffer is automatically determined and set as the size of the mem-region.

Parameters

id The GLuint id of an existing OpenGL buffer.

10.3.2.2 hetcompute::ion_memregion::ion_memregion ( size_t sz, bool cacheable = true ) [explicit]

Constructor, allocates ION memory.

Parameters

sz Size of the allocation in bytes.
cacheable Optional, cacheable if true, non-cacheable if false. Default is cacheable.

10.3.2.3 hetcompute::ion_memregion::ion_memregion ( void * ptr, size_t sz, bool cacheable )

Constructor, uses allocated ION memory. The user is responsible for ensuring the lifetime of the ION memory, and handling the deallocation. The lifetime of the user-allocated memory MUST exceed any use of the memory via the memregion object (say, if the memregion is used by a buffer).

Parameters

ptr Pointer to the externally allocated region. The block at ptr of size sz bytes must be fully contained within an existing HetCompute ion_memregion.
sz Size of the allocation in bytes.
cacheable true if ptr points to a cacheable ion region, false otherwise.

10.3.2.4 hetcompute::ion_memregion::ion_memregion ( void * ptr, int fd, size_t sz, bool cacheable )

Constructor, uses allocated ION memory, with the associated file descriptor. The user is responsible for ensuring the lifetime of the ION memory, and handling the deallocation. The lifetime of the user-allocated memory MUST exceed any use of the memory via the memregion object (say, if the memregion is used by a buffer). This variant is more flexible as it enables the construction of an ion_memregion using ion memory that was (a) allocated by another process, or (b) allocated by the same process without using a hetcompute::ion_memregion.

Parameters

ptr Pointer to the externally allocated region.
fd File descriptor associated with the allocated ion pointer.
sz Size of the allocation in bytes.
cacheable true if ptr points to a cacheable ion region, false otherwise.

10.3.2.5 hetcompute::main_memregion::main_memregion ( size_t sz, size_t alignment = s_default_alignment ) [explicit]

Constructor, allocates aligned memory.

Parameters

sz Size of the allocation in bytes.
alignment Optional, desired alignment. Default is page aligned.

10.3.2.6 hetcompute::main_memregion::main_memregion ( void * ptr, size_t sz )

Constructor, uses user-allocated memory. The user is responsible for ensuring the lifetime of the memory, and handling the deallocation. The lifetime of the user-allocated memory MUST exceed any use of the memory via the memregion object (say, if the memregion is used by a buffer).

Parameters

ptr Pointer to the externally allocated region.
sz Size of the allocation in bytes.
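The lifetime rule for user-allocated memory can be sketched as follows. This is a stand-alone illustration rather than SDK code: borrowed_region is a hypothetical non-owning wrapper, used here only to show that the caller's allocation should be declared before, and is therefore destroyed after, the object that refers to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning wrapper; NOT part of the HetCompute SDK.
// It records the pointer and size but never frees the memory.
class borrowed_region
{
public:
    borrowed_region(void* ptr, std::size_t sz) : _ptr(ptr), _num_bytes(sz)
    {
        assert(ptr != nullptr && "caller must supply valid memory");
    }

    borrowed_region(borrowed_region const&) = delete;
    borrowed_region& operator=(borrowed_region const&) = delete;

    void*       get_ptr() const { return _ptr; }
    std::size_t get_num_bytes() const { return _num_bytes; }

private:
    void*       _ptr;
    std::size_t _num_bytes;
};

// Writes through the wrapper and reads the result back from the caller's storage.
inline int write_through_region()
{
    std::vector<int> storage(16, 0); // user-owned memory, declared first
    borrowed_region  mr(storage.data(), storage.size() * sizeof(int));

    static_cast<int*>(mr.get_ptr())[3] = 42; // use the memory via the region

    // Locals are destroyed in reverse order of declaration: `mr` first,
    // then `storage`, so the memory outlives every use made through `mr`.
    return storage[3];
}
```

Declaring the backing storage first is one simple way to satisfy the requirement; any scheme that keeps the memory valid for as long as the wrapper is in use would do.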

10.3.2.7 hetcompute::svm_memregion::svm_memregion ( size_t sz, cl_svm_mem_flags flags = CL_MEM_READ_WRITE ) [explicit]

Constructor, allocates coarse-grained SVM memory.

Parameters

sz Size of the allocation in bytes.

10.3.2.8 hetcompute::svm_memregion::svm_memregion ( void * ptr, size_t sz, cl_svm_mem_flags flags = CL_MEM_READ_WRITE )

Constructor, uses user-allocated memory. The user is responsible for ensuring the lifetime of the memory, and handling the deallocation. The lifetime of the user-allocated memory MUST exceed any use of the memory via the memregion object (say, if the memregion is used by a buffer).

Parameters

ptr Pointer to the externally allocated region.
sz Size of the allocation in bytes.

10.3.2.9 int hetcompute::ion_memregion::get_fd ( ) const

Gets the file descriptor associated with the pointer to the allocated ION memory.

Returns

The file descriptor associated with the allocated ION memory.

10.3.2.10 GLuint hetcompute::glbuffer_memregion::get_id ( ) const

Gets the id of the wrapped OpenGL buffer.

Returns

The id of the wrapped OpenGL buffer.

10.3.2.11 size_t hetcompute::memregion::get_num_bytes ( ) const

Get the size of the mem-region in bytes. Applies to all derived mem-region classes.

Returns

Size of the mem-region in bytes.

10.3.2.12 void* hetcompute::main_memregion::get_ptr ( ) const

Gets a pointer to the allocated memory.

Returns

A pointer to the allocated memory.

10.3.2.13 void* hetcompute::ion_memregion::get_ptr ( ) const

Gets a pointer to the allocated ION memory.

Returns

A pointer to the allocated ION memory.

10.3.2.14 void* hetcompute::svm_memregion::get_ptr ( ) const

Gets a pointer to the allocated memory.

Returns

A pointer to the allocated memory.

10.3.2.15 bool hetcompute::ion_memregion::is_cacheable ( ) const

Returns whether the ION memregion is cacheable.

Returns

True if the ION memregion is cacheable; false otherwise.

10.3.3 Variable Documentation

10.3.3.1 constexpr size_t hetcompute::main_memregion::s_default_alignment = 4096 [static]

The default alignment needed for the allocation to be page aligned.

11 Graphics Reference API

11.1 Texture APIs

Typedefs

• template<addressing_mode addr_mode, filter_mode fil_mode>

using hetcompute::graphics::sampler_ptr = ::hetcompute::internal::hetcompute_shared_ptr<internal::sampler_cl< addr_mode, fil_mode >>

• template<image_format img_format, int dims>

using hetcompute::graphics::texture_ptr = ::hetcompute::internal::hetcompute_shared_ptr<internal::texture_cl< img_format, dims >>

Functions

• template<image_format img_format, int dims>

texture_ptr< img_format, dims > hetcompute::graphics::create_derivative_texture (texture_ptr< img_format, dims > &parent_texture, extended_format_plane_type derivative_plane_type, bool read_only)

Create HetCompute single-plane derivative texture from a multi-plane parent texture. The parent texture is created with create_texture(...) using ION memory.

• template<addressing_mode addr_mode, filter_mode fil_mode>

sampler_ptr< addr_mode, fil_mode > hetcompute::graphics::create_sampler (bool normalized_coords)

Create HetCompute sampler with create_sampler(...).

• template<image_format img_format, int dims, typename T >

texture_ptr< img_format, dims > hetcompute::graphics::create_texture (image_size< dims > const &is, T *host_ptr)

Create Qualcomm HetCompute texture with create_texture(...).

• template<image_format img_format, int dims>

texture_ptr< img_format, dims > hetcompute::graphics::create_texture (image_size< dims > const &is, ion_memregion const &ion_mr, bool read_only=false)

Create HetCompute texture with create_texture(...) using ion memory.

• bool hetcompute::graphics::is_supported (image_format img_format)

Test if given image format is supported by current platform and context at runtime.

• template<image_format img_format, int dims>

void * hetcompute::graphics::map (texture_ptr< img_format, dims > &tp)

Map data from GPU to CPU with map(...).

• template<image_format img_format, int dims>

void hetcompute::graphics::unmap (texture_ptr< img_format, dims > &tp)

Unmap data from CPU to GPU with unmap(...).

11.1.1 Function Documentation

11.1.1.1 template<image_format img_format, int dims> texture_ptr<img_format, dims> hetcompute::graphics::create_derivative_texture ( texture_ptr< img_format, dims > & parent_texture, extended_format_plane_type derivative_plane_type, bool read_only )

The following OpenCL QCOM extensions are adopted for this feature:

Extract Derivative Image Plane: cl_qcom_extract_image_plane
QCOM Supported Compressed Image: cl_qcom_compressed_image
QCOM Other Non-Conventional Images [NV12, TP10]: cl_qcom_other_image

Create HetCompute single-plane derivative texture from a multi-plane parent HetCompute texture. The parent texture is created with create_texture(...) using ION memory.

Template Parameters

img_format HetCompute image format.
dims Image dimensions.

Parameters

parent_texture Multi-plane parent texture created using create_texture(...).
derivative_plane_type Type of child plane (Y or UV).
read_only Indicates if the created derivative texture should be RO/RW.

Returns

a pointer to the created derivative texture

Note

img_format should match the image format of parent texture

11.1.1.2 template<addressing_mode addr_mode, filter_mode fil_mode> sampler_ptr<addr_mode, fil_mode> hetcompute::graphics::create_sampler ( bool normalized_coords )

Template Parameters

addr_mode Addressing mode.
fil_mode Filtering mode.

Parameters

normalized_coords Whether to use normalized coordinates for pixel access.

Returns

A pointer to the created sampler.

11.1.1.3 template<image_format img_format, int dims, typename T > texture_ptr<img_format, dims> hetcompute::graphics::create_texture ( image_size< dims > const & is, T * host_ptr )

Template Parameters

img_format Qualcomm HetCompute image format.
dims Image dimensions.

Parameters

is Image dimension size.
host_ptr Host pointer to image data in CPU. host_ptr must not be nullptr.

Returns

A pointer to the created texture.

11.1.1.4 template<image_format img_format, int dims> texture_ptr<img_format, dims> hetcompute::graphics::create_texture ( image_size< dims > const & is, ion_memregion const & ion_mr, bool read_only = false )

Template Parameters

img_format HetCompute image format.
dims Image dimensions.

Parameters

is Image dimension size.
ion_mr Instance of ion_memregion.

Returns

a pointer to the created texture

11.1.1.5 bool hetcompute::graphics::is_supported ( image_format img_format )

Parameters

img_format Qualcomm HetCompute image format.

Returns

True if this image format is supported by the current device.

11.1.1.6 template<image_format img_format, int dims> void* hetcompute::graphics::map ( texture_ptr< img_format, dims > & tp )

Template Parameters

img_format Qualcomm HetCompute image format.
dims Image dimensions.

Parameters

tp Qualcomm HetCompute texture pointer.

Returns

A pointer to image data in CPU.

11.1.1.7 template<image_format img_format, int dims> void hetcompute::graphics::unmap ( texture_ptr< img_format, dims > & tp )

Template Parameters

img_format Qualcomm HetCompute image format.
dims Image dimensions.

Parameters

tp Qualcomm HetCompute texture pointer.

11.2 Texture Data Types

Classes

• struct hetcompute::graphics::image_size< dims >

• struct hetcompute::graphics::image_size< 1 >

• struct hetcompute::graphics::image_size< 2 >

• struct hetcompute::graphics::image_size< 3 >

Enumerations

• enum hetcompute::graphics::addressing_mode { ADDRESS_NONE, ADDRESS_CLAMP_TO_EDGE, ADDRESS_CLAMP, ADDRESS_REPEAT }

• enum hetcompute::graphics::extended_format_plane_type { ExtendedFormatYPlane, ExtendedFormatUVPlane }

• enum hetcompute::graphics::filter_mode { FILTER_NEAREST, FILTER_LINEAR }

• enum hetcompute::graphics::image_format : int { first, RGBAsnorm_int8 = first, RGBAunorm_int8, RGBAsigned_int8, RGBAunsigned_int8, RGBAunorm_int16, RGBA_float, RGBA_half, ARGBsnorm_int8, ARGBunorm_int8, ARGBsigned_int8, ARGBunsigned_int8, BGRAsnorm_int8, BGRAunorm_int8, BGRAsigned_int8, BGRAunsigned_int8, RGsnorm_int8, RGunorm_int8, RGsigned_int8, RGunsigned_int8, RGunorm_int16, RG_float, RG_half, INTENSITYsnorm_int8, INTENSITYsnorm_int16, INTENSITYunorm_int8, INTENSITYunorm_int16, INTENSITY_float, LUMINANCEsnorm_int8, LUMINANCEsnorm_int16, LUMINANCEunorm_int8, LUMINANCEunorm_int16, LUMINANCE_float, Rsnorm_int8, Runorm_int8, Rsigned_int8, Runsigned_int8, Runorm_int16, R_float, R_half, NV12unorm_int8, P010unorm_int10, TP10unorm_int10, TiledNV12unorm_int8, TiledP010unorm_int10, TiledTP10unorm_int10, CompressedNV12unorm_int8, CompressedNV124Runorm_int8, CompressedP010unorm_int10, CompressedTP10unorm_int10, last = CompressedTP10unorm_int10 }

11.2.1 Class Documentation

11.2.1.1 struct hetcompute::graphics::image_size

template<int dims> struct hetcompute::graphics::image_size< dims >

HetCompute image dimension description.

11.2.1.2 struct hetcompute::graphics::image_size< 1 >

template<> struct hetcompute::graphics::image_size< 1 >

Qualcomm HetCompute 1D image dimension description.

Data fields

Type Field Description
size_t _width

11.2.1.3 struct hetcompute::graphics::image_size< 2 >

template<> struct hetcompute::graphics::image_size< 2 >

Qualcomm HetCompute 2D image dimension description.

Data fields

Type Field Description
size_t _height Height.
size_t _width Width.

11.2.1.4 struct hetcompute::graphics::image_size< 3 >

template<> struct hetcompute::graphics::image_size< 3 >

Qualcomm HetCompute 3D image dimension description.

Data fields

Type Field Description
size_t _depth Depth.
size_t _height Height.
size_t _width Width.
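As a rough illustration of how the dimension-specific specializations fit together, the sketch below re-declares stand-in image_size types with the documented fields and adds a num_pixels helper. The namespace sketch, the declaration order of the fields, and the helper are assumptions made for this example only; the real definitions live in the SDK headers.

```cpp
#include <cstddef>

namespace sketch // stand-in namespace; NOT hetcompute::graphics
{
    template <int dims>
    struct image_size; // primary template, specialized per dimension count

    template <>
    struct image_size<1>
    {
        std::size_t _width;
    };

    template <>
    struct image_size<2>
    {
        std::size_t _width;
        std::size_t _height;
    };

    template <>
    struct image_size<3>
    {
        std::size_t _width;
        std::size_t _height;
        std::size_t _depth;
    };

    // Hypothetical helpers: total number of pixels described by a size.
    inline std::size_t num_pixels(image_size<1> const& s) { return s._width; }
    inline std::size_t num_pixels(image_size<2> const& s) { return s._width * s._height; }
    inline std::size_t num_pixels(image_size<3> const& s) { return s._width * s._height * s._depth; }
} // namespace sketch
```

Because the declaration order of the real fields is not stated here, it is safer to assign fields by name (for example, s._width = 640; s._height = 480;) than to rely on brace initialization.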

11.2.2 Enumeration Type Documentation

11.2.2.1 enum hetcompute::graphics::addressing_mode [strong]

Supported image addressing modes in Qualcomm HetCompute. Each mode can be mapped to an OpenCL sampler addressing mode.

11.2.2.2 enum hetcompute::graphics::extended_format_plane_type [strong]

Qualcomm HetCompute supported extended format derivative types.

11.2.2.3 enum hetcompute::graphics::filter_mode [strong]

Supported image filter modes in Qualcomm HetCompute. Each mode can be mapped to an OpenCL sampler filter mode.

11.2.2.4 enum hetcompute::graphics::image_format : int [strong]

Supported image formats in Qualcomm HetCompute. Each format can be mapped to an OpenCL image format and pixel channel.
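The first and last enumerators make it possible to walk over every format, for example to probe platform support. The sketch below uses a deliberately shortened stand-in enum and a stub predicate: sketch::image_format, is_supported_stub, and supported_formats are hypothetical, and the stub's answer is arbitrary. With the SDK, such a loop would presumably call hetcompute::graphics::is_supported instead.

```cpp
#include <vector>

namespace sketch // stand-in namespace; NOT hetcompute::graphics
{
    // Shortened stand-in for the documented image_format enumeration.
    enum class image_format : int
    {
        first,
        RGBAsnorm_int8 = first,
        RGBAunorm_int8,
        RGBA_float,
        NV12unorm_int8,
        last = NV12unorm_int8
    };

    // Stub predicate: pretends that everything except NV12 is supported.
    inline bool is_supported_stub(image_format f)
    {
        return f != image_format::NV12unorm_int8;
    }

    // Collects every format in [first, last] that the predicate accepts.
    inline std::vector<image_format> supported_formats()
    {
        std::vector<image_format> result;
        for (int i = static_cast<int>(image_format::first);
             i <= static_cast<int>(image_format::last); ++i)
        {
            image_format const f = static_cast<image_format>(i);
            if (is_supported_stub(f))
            {
                result.push_back(f);
            }
        }
        return result;
    }
} // namespace sketch
```

The loop relies on the enumerators between first and last having consecutive values, which holds for the stand-in above and appears to hold for the documented enumeration, since no explicit values other than first and last are listed.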

12 Data Structures Reference API

Qualcomm HetCompute provides a set of concurrent data structures that are optimized for performance using internal Qualcomm HetCompute primitives.

12.1 Bounded Lock-Free Queue

Classes

• class hetcompute::bounded_lfqueue< T >

Typedefs

• typedef internal::blfq::blfq_size_t< T,(sizeof(size_t) >= sizeof(T))> hetcompute::bounded_lfqueue< T >::container_type

• typedef T hetcompute::bounded_lfqueue< T >::value_type

Functions

• hetcompute::bounded_lfqueue< T >::bounded_lfqueue (size_t log_size)

• hetcompute::bounded_lfqueue< T >::HETCOMPUTE_DELETE_METHOD (bounded_lfqueue(bounded_lfqueue const &))

• hetcompute::bounded_lfqueue< T >::HETCOMPUTE_DELETE_METHOD (bounded_lfqueue(bounded_lfqueue &&))

• hetcompute::bounded_lfqueue< T >::HETCOMPUTE_DELETE_METHOD (bounded_lfqueue &operator=(bounded_lfqueue const &))

• bool hetcompute::bounded_lfqueue< T >::pop (value_type &r)

• bool hetcompute::bounded_lfqueue< T >::push (value_type const &v)

12.1.1 Class Documentation

12.1.1.1 class hetcompute::bounded_lfqueue

template<typename T> class hetcompute::bounded_lfqueue< T >

A Bounded Lock-Free FIFO Queue.

Note: The size of the queue is bounded at creation time.

Public Types

• typedef internal::blfq::blfq_size_t< T,(sizeof(size_t) >= sizeof(T))> container_type

• typedef T value_type

Public member functions

• bounded_lfqueue (size_t log_size)

• HETCOMPUTE_DELETE_METHOD (bounded_lfqueue(bounded_lfqueue const &))

• HETCOMPUTE_DELETE_METHOD (bounded_lfqueue(bounded_lfqueue &&))

• HETCOMPUTE_DELETE_METHOD (bounded_lfqueue &operator=(bounded_lfqueue const &))

• HETCOMPUTE_DELETE_METHOD (bounded_lfqueue &operator=(bounded_lfqueue &&))

• bool pop (value_type &r)

• bool push (value_type const &v)

12.1.2 Function Documentation

12.1.2.1 template<typename T > hetcompute::bounded_lfqueue< T >::bounded_lfqueue ( size_t log_size ) [explicit]

Constructs the Bounded Lock-Free Queue, given the log (base 2) of the maximum number of entries it can contain.

Parameters

log_size Log (base 2) of the maximum number of entries the queue can contain.

12.1.2.2 template<typename T > bool hetcompute::bounded_lfqueue< T >::pop (value_type & r )

Pop from the queue, placing the popped value in the result.

Parameters

r The object to store the popped value in, if successful.

Returns

True if the pop was successful; false if the queue was empty.

Note: The contents of r are not modified if the pop is unsuccessful.

12.1.2.3 template<typename T > bool hetcompute::bounded_lfqueue< T >::push (value_type const & v )

Push value into the queue.

Parameters

v Value to be pushed into the queue.

Returns

True if the push was successful; false if the queue was full.
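The constructor argument and the return values of push and pop can be illustrated with a small stand-in. The class below, bounded_ring (a hypothetical name), is a plain single-threaded ring buffer. It is not lock-free and not the SDK implementation; it only mirrors the documented contract: capacity of 2^log_size entries, push fails when the queue is full, and pop fails when the queue is empty and leaves its argument untouched.

```cpp
#include <cstddef>
#include <vector>

// Single-threaded stand-in mirroring the documented push/pop contract.
// NOT lock-free and NOT part of the HetCompute SDK.
template <typename T>
class bounded_ring
{
public:
    typedef T value_type;

    explicit bounded_ring(std::size_t log_size)
        : _slots(std::size_t(1) << log_size), _head(0), _count(0)
    {
    }

    // Returns false without storing v if the queue is full.
    bool push(value_type const& v)
    {
        if (_count == _slots.size())
        {
            return false;
        }
        // The capacity is a power of two, so masking wraps the index.
        _slots[(_head + _count) & (_slots.size() - 1)] = v;
        ++_count;
        return true;
    }

    // Returns false and leaves r unmodified if the queue is empty.
    bool pop(value_type& r)
    {
        if (_count == 0)
        {
            return false;
        }
        r     = _slots[_head];
        _head = (_head + 1) & (_slots.size() - 1);
        --_count;
        return true;
    }

private:
    std::vector<T> _slots; // 2^log_size entries
    std::size_t    _head;  // index of the oldest element
    std::size_t    _count; // number of stored elements
};
```

For instance, bounded_ring<int> q(2) holds at most four values; a fifth push returns false until a pop frees a slot. The stand-in requires T to be default-constructible, which is a limitation of the sketch and not a statement about the SDK class.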

12.2 Unbounded Lock-Free Queue

Classes

• class hetcompute::lfqueue< T >

Typedefs

• typedef internal::lfq::lfq< T > hetcompute::lfqueue< T >::container_type

• typedef T hetcompute::lfqueue< T >::value_type

Functions

• hetcompute::lfqueue< T >::lfqueue (size_t log_size)

• hetcompute::lfqueue< T >::HETCOMPUTE_DELETE_METHOD (lfqueue(lfqueue const &))

• hetcompute::lfqueue< T >::HETCOMPUTE_DELETE_METHOD (lfqueue(lfqueue &&))

• hetcompute::lfqueue< T >::HETCOMPUTE_DELETE_METHOD (lfqueue &operator=(lfqueue const &))

• bool hetcompute::lfqueue< T >::pop (value_type &r)

• bool hetcompute::lfqueue< T >::push (value_type const &v)

12.2.1 Class Documentation

12.2.1.1 class hetcompute::lfqueue

template<typename T> class hetcompute::lfqueue< T >

Unbounded Lock-Free FIFO queue that is capable of dynamically growing and shrinking.

Public Types

• typedef internal::lfq::lfq< T > container_type

• typedef T value_type

Public member functions

• lfqueue (size_t log_size)

• HETCOMPUTE_DELETE_METHOD (lfqueue(lfqueue const &))

• HETCOMPUTE_DELETE_METHOD (lfqueue(lfqueue &&))

• HETCOMPUTE_DELETE_METHOD (lfqueue &operator=(lfqueue const &))

• HETCOMPUTE_DELETE_METHOD (lfqueue &operator=(lfqueue &&))

• bool pop (value_type &r)

• bool push (value_type const &v)

12.2.2 Function Documentation

12.2.2.1 template<typename T> hetcompute::lfqueue< T >::lfqueue ( size_t log_size ) [explicit]

Constructs the Unbounded Lock-Free Queue, given the log (base 2) of the size of the static array within each node.

Parameters

log_size Log (base 2) of the maximum number of entries in each node.

12.2.2.2 template<typename T> bool hetcompute::lfqueue< T >::pop ( value_type &r )

Pop from the queue, placing the popped value in the result.

Parameters

r The object to store the popped value in, if successful.

Returns

True if the pop was successful; false if the queue was empty.

Note: The contents of r are not modified if the pop is unsuccessful.

12.2.2.3 template<typename T> bool hetcompute::lfqueue< T >::push ( value_type const & v )

Push value into the queue. Since the queue is capable of growing, a push always succeeds.

Parameters

v Value to be pushed into the queue.

Returns

Always true.
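The growing and shrinking behaviour can be pictured with a single-threaded stand-in. chunked_queue below is a hypothetical class that is neither lock-free nor the SDK implementation. It assumes, following the constructor description, that storage is organized as nodes of 2^log_size entries, which the sketch appends on demand and releases once drained; node_count is an extra accessor added only to make this visible.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Single-threaded stand-in; NOT lock-free and NOT part of the HetCompute SDK.
template <typename T>
class chunked_queue
{
public:
    typedef T value_type;

    explicit chunked_queue(std::size_t log_size)
        : _node_capacity(std::size_t(1) << log_size), _head(0)
    {
    }

    // Always succeeds: a new node is appended when the last one is full.
    bool push(value_type const& v)
    {
        if (_nodes.empty() || _nodes.back().size() == _node_capacity)
        {
            _nodes.push_back(std::vector<T>());
            _nodes.back().reserve(_node_capacity);
        }
        _nodes.back().push_back(v);
        return true;
    }

    // Returns false and leaves r unmodified if the queue is empty.
    bool pop(value_type& r)
    {
        if (_nodes.empty())
        {
            return false;
        }
        r = _nodes.front()[_head++];
        if (_head == _nodes.front().size())
        {
            _nodes.pop_front(); // every stored entry of this node was read: release it
            _head = 0;
        }
        return true;
    }

    // Number of nodes currently allocated (illustration only).
    std::size_t node_count() const { return _nodes.size(); }

private:
    std::size_t                 _node_capacity; // entries per node (2^log_size)
    std::size_t                 _head;          // next unread index in the front node
    std::deque<std::vector<T> > _nodes;
};
```

With log_size = 1, five pushes occupy three nodes, and popping all five values releases them again, so the footprint follows the number of queued elements.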

13 Data Sharing and Storage Reference API

13.1 Data Sharing Synchronization

The primitives defined in this chapter allow concurrent access to data.

13.2 Scheduler Storage

Using scheduler storage requires including the following header file:

#include <hetcompute/schedulerstorage.hh>

Classes

• class hetcompute::scheduler_storage_ptr< T, Allocator >

13.2.1 Class Documentation

13.2.1.1 class hetcompute::scheduler_storage_ptr

template<typename T, class Allocator = std::allocator<T>> class hetcompute::scheduler_storage_ptr< T, Allocator >

Scheduler-local storage allows sharing of information across tasks on a per-scheduler basis, much as thread-local storage does for threads. A scheduler_storage_ptr<T> stores a pointer-to-T (T*). In contrast to task_storage_ptr, the contents are persistent across tasks. In contrast to thread_storage_ptr, the contents are guaranteed not to be changed while a task is suspended. To maintain these guarantees, the runtime system is free to create new objects of type T whenever needed.

See Also

task_storage_ptr
thread_storage_ptr

Example

#include <algorithm>
#include <iterator>

#include <hetcompute/hetcompute.hh>

template <size_t N>
struct image_scratchpad
{
    image_scratchpad() { std::fill(std::begin(edge_image), std::end(edge_image), 0); }
    char edge_image[N];
};

namespace
{
    const hetcompute::scheduler_storage_ptr<image_scratchpad<4096>> image_buffers;
}; // namespace

int
main()
{
    hetcompute::runtime::init();
    int const N = 200;

    auto g = hetcompute::create_group();
    for (int i = 1; i < N; ++i)
    {
        g->launch([i] {
            // fill image buffer, which is reused across tasks
            for (auto& slot : image_buffers->edge_image)
                slot = i & 0xff;
            hetcompute::internal::yield(); // context-switch, we expect SLS to survive this
            // check contents
            for (auto const& slot : image_buffers->edge_image)
            {
                if (slot != char(i & 0xff))
                {
                    HETCOMPUTE_ILOG("mismatch at position %d", i);
                }
            }
        });
    }
    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

#include <hetcompute/hetcompute.hh>

namespace
{
    const hetcompute::scheduler_storage_ptr<size_t> s_sls_state;
}; // namespace

int
main()
{
    hetcompute::runtime::init();
    auto g = hetcompute::create_group();

    for (size_t i = 0; i < 200; ++i)
    {
        g->launch([i] {
            size_t c = ++*s_sls_state;
            // values for c are consecutive on a per-scheduler basis
            (void)c;
        });
    }

    g->wait_for();

    hetcompute::runtime::shutdown();
    return 0;
}

#include <hetcompute/hetcompute.hh>

namespace
{
    const hetcompute::scheduler_storage_ptr<size_t> s_sls_state;
}; // namespace

int
main()
{
    hetcompute::runtime::init();
    auto g = hetcompute::create_group();
    auto t = hetcompute::create_task([] {});

    for (size_t i = 0; i < 200; ++i)
    {
        g->launch([=] {
            size_t c1 = ++*s_sls_state;
            t->launch();
            t->wait_for();
            size_t c2 = ++*s_sls_state;
            if (c1 + 1 != c2)
            {
                HETCOMPUTE_ILOG("error: mismatch");
            }
        });
    }

    g->wait_for();
    hetcompute::runtime::shutdown();

    return 0;
}

Public Types

• typedef Allocator allocator_type

Static Public Member Functions

• static void * get_specific (internal::storage_key key)

• static int key_create (internal::storage_key *key, void(*dtor)(void *))

• static int set_specific (internal::storage_key key, void const *value)

Friends

• class scoped_storage_ptr<::hetcompute::scheduler_storage_ptr, T, Allocator >

Additional Inherited Members

13.3 Scoped Storage

Using scoped storage requires including the following header file:

#include <hetcompute/scopedstorage.hh>

Classes

• class hetcompute::scoped_storage_ptr< Scope, T, Allocator >

13.3.1 Class Documentation

13.3.1.1 class hetcompute::scoped_storage_ptr

template<template< class, class > class Scope, typename T, class Allocator> class hetcompute::scoped_storage_ptr< Scope, T, Allocator >

Scoped storage allows sharing of information on a per-task basis, much as thread-local storage does for threads. A scoped_storage_ptr<Scope,T,Allocator> stores a pointer-to-T (T*) for a given scope Scope<T,Allocator>, which defines lifetime, persistence, and so on. Allocator controls the allocation of T objects.

Public Types

• typedef T const * pointer_type

• typedef Scope< T, Allocator > scope_type

Public member functions

• scoped_storage_ptr ()

• T * get () const

• HETCOMPUTE_DELETE_METHOD (scoped_storage_ptr(scoped_storage_ptr const &))

• HETCOMPUTE_DELETE_METHOD (scoped_storage_ptr &operator=(scoped_storage_ptr const &))

• HETCOMPUTE_DELETE_METHOD (scoped_storage_ptr &operator=(scoped_storage_ptr const &) volatile)

• HETCOMPUTE_DELETE_METHOD (scoped_storage_ptr &operator=(T * const &))

• operator bool () const

• operator pointer_type () const

• bool operator! () const

• bool operator!= (T * const &other)

• T & operator* () const

• T * operator-> () const

• bool operator== (T * const &other)

13.3.1.1.1 Constructors and Destructors

13.3.1.1.1.1 template<template< class, class > class Scope, typename T, class Allocator> hetcompute::scoped_storage_ptr< Scope, T, Allocator >::scoped_storage_ptr ( )

Exceptions

hetcompute::tls_exception

If scoped_storage_ptr could not be reserved.

Note

If exceptions are disabled, an error is logged if key creation is unsuccessful.

13.3.1.1.2 Member Function Documentation

13.3.1.1.2.1 template<template< class, class > class Scope, typename T, class Allocator> T* hetcompute::scoped_storage_ptr< Scope, T, Allocator >::get ( ) const

Returns

Stored pointer value; a new object of type T is created and stored, if it has not been stored before.

13.3.1.1.2.2 template<template< class, class > class Scope, typename T, class Allocator> hetcompute::scoped_storage_ptr< Scope, T, Allocator >::operator bool ( ) const [explicit]

Casting operator to bool (constantly true).

13.3.1.1.2.3 template<template< class, class > class Scope, typename T, class Allocator> hetcompute::scoped_storage_ptr< Scope, T, Allocator >::operator pointer_type ( ) const

Casting operator to T* pointer type.

13.3.1.1.2.4 template<template< class, class > class Scope, typename T, class Allocator> bool hetcompute::scoped_storage_ptr< Scope, T, Allocator >::operator! ( ) const

Returns

Constantly false.

13.3.1.1.2.5 template<template< class, class > class Scope, typename T, class Allocator> T& hetcompute::scoped_storage_ptr< Scope, T, Allocator >::operator* ( ) const

Returns

Reference to value pointed to by stored pointer; a new object of type T is created and stored, if it has not been stored before.

13.3.1.1.2.6 template<template< class, class > class Scope, typename T, class Allocator> T* hetcompute::scoped_storage_ptr< Scope, T, Allocator >::operator-> ( ) const

Returns

Stored pointer value; a new object of type T is created and stored, if it has not been stored before.
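
The create-on-first-access behavior documented for get(), operator* and operator-> can be pictured with a small stand-in class. The following is only a sketch under stated assumptions, not SDK code: lazy_thread_ptr and bump are hypothetical names, the scope is a plain C++ thread_local variable, and, unlike the SDK class, all lazy_thread_ptr<T> objects for a given T share a single slot per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for illustration only; not the SDK's scoped_storage_ptr.
template <typename T>
class lazy_thread_ptr
{
public:
    // Returns the stored pointer; a value-initialized T is created on the
    // first access made by the calling thread.
    T* get() const
    {
        // Simplification: one slot per thread and per T, shared by all instances.
        thread_local std::unique_ptr<T> slot;
        if (!slot)
        {
            slot.reset(new T());
        }
        return slot.get();
    }

    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }

    // get() never yields nullptr, so both conversions are constant.
    explicit operator bool() const { return true; }
    bool operator!() const { return false; }
};

// Hypothetical helper: increments the calling thread's counter.
inline std::size_t bump(const lazy_thread_ptr<std::size_t>& p)
{
    return ++*p;
}
```

Because the object is created on demand, callers in this sketch never need a null check before dereferencing, which mirrors the "constantly true" and "constantly false" conversions documented above.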

13.4 Task Storage

Using task storage requires including the following header file:

#include <hetcompute/taskstorage.hh>

Classes

• class hetcompute::task_storage_ptr< T >

13.4.1 Class Documentation

13.4.1.1 class hetcompute::task_storage_ptr

template<typename T> class hetcompute::task_storage_ptr< T >

Task-local storage enables allocation of task-specific data, much as thread-local storage does for threads. The value of a task_storage_ptr is local to a task. A task_storage_ptr<T> stores a pointer-to-T (T*).

See Also

thread_storage_ptr

Example

    #include <hetcompute/hetcompute.hh>

    namespace
    {
        hetcompute::task_storage_ptr<int> storage;
    }; // namespace

    void func();

    void
    func()
    {
        HETCOMPUTE_ILOG("%d", *storage);
        ++*storage;
    }

    int
    main()
    {
        hetcompute::runtime::init();
        auto g = hetcompute::create_group();
        for (int i = 0; i < 10; ++i)
        {
            g->launch([i] {
                int v = i;
                storage = &v;
                func();
                if (v != i + 1)
                {
                    HETCOMPUTE_ILOG("error");
                }
                func();
                if (v != i + 2)
                {
                    HETCOMPUTE_ILOG("error");
                }
            });
        }
        g->wait_for();
        hetcompute::runtime::shutdown();
        return 0;
    }

Public Types

• typedef T const * pointer_type

Public member functions

• task_storage_ptr ()

• task_storage_ptr (T * const &ptr)

• task_storage_ptr (T * const &ptr, void(* dtor)(T *))

• T * get () const

• HETCOMPUTE_DELETE_METHOD (task_storage_ptr(task_storage_ptr const &))

• HETCOMPUTE_DELETE_METHOD (task_storage_ptr &operator=(task_storage_ptr const &))

• HETCOMPUTE_DELETE_METHOD (task_storage_ptr &operator=(task_storage_ptr const &) volatile)

• operator bool () const

• operator pointer_type () const

• bool operator! () const

• bool operator!= (T * const &other) const

• T & operator* () const

• T * operator-> () const

• task_storage_ptr & operator= (T * const &ptr)

• bool operator== (T * const &other) const

13.4.1.1.1 Constructors and Destructors

13.4.1.1.1.1 template<typename T> hetcompute::task_storage_ptr< T >::task_storage_ptr ( )

Exceptions

hetcompute::tls_exception

If task_storage_ptr could not be reserved.

Note

If exceptions are disabled in the application, an error is logged if task_storage_ptr could not be reserved.

13.4.1.1.1.2 template<typename T> hetcompute::task_storage_ptr< T >::task_storage_ptr ( T * const & ptr ) [explicit]

Exceptions

hetcompute::tls_exception

If task_storage_ptr could not be reserved.

Parameters

ptr Initial value of task-local storage.

13.4.1.1.1.3 template<typename T> hetcompute::task_storage_ptr< T >::task_storage_ptr ( T * const & ptr, void(*)(T *) dtor )

Exceptions

hetcompute::tls_exception

If task_storage_ptr could not be reserved.

Parameters

ptr Initial value of task-local storage.

dtor Destructor function.

13.4.1.1.2 Member Function Documentation

13.4.1.1.2.1 template<typename T> T* hetcompute::task_storage_ptr< T >::get ( ) const

Returns

Stored pointer value.

13.4.1.1.2.2 template<typename T> hetcompute::task_storage_ptr< T >::operator bool ( ) const [explicit]

Casting operator to bool.

13.4.1.1.2.3 template<typename T> hetcompute::task_storage_ptr< T >::operator pointer_type ( ) const

Casting operator to T* pointer type.

13.4.1.1.2.4 template<typename T> bool hetcompute::task_storage_ptr< T >::operator! ( ) const

Returns

True if stored pointer is nullptr.

13.4.1.1.2.5 template<typename T> T& hetcompute::task_storage_ptr< T >::operator* ( ) const

Returns

Reference to value pointed to by stored pointer.

Note: No checking for nullptr is performed.

13.4.1.1.2.6 template<typename T> T* hetcompute::task_storage_ptr< T >::operator-> ( ) const

Returns

Stored pointer value.

13.4.1.1.2.7 template<typename T> task_storage_ptr& hetcompute::task_storage_ptr< T >::operator= ( T * const & ptr )

Assignment operator, stores T*.

Parameters

ptr Pointer value to store.

13.5 Thread Storage

Using thread storage requires including the following header file:

#include <hetcompute/threadstorage.hh>

Classes

• class hetcompute::thread_storage_ptr< T, Allocator >

13.5.1 Class Documentation

13.5.1.1 class hetcompute::thread_storage_ptr

template<typename T, class Allocator = std::allocator<T>> class hetcompute::thread_storage_ptr< T, Allocator >

Thread-local storage allows sharing of information across tasks on a per-thread basis. A thread_storage_ptr<T> stores a pointer-to-T (T*). In contrast to task_storage_ptr, the contents are persistent across tasks. In contrast to scheduler_storage_ptr, the thread-local storage may be accessed by other tasks while a task is suspended.

See Also

task_storage_ptr

scheduler_storage_ptr

Example

    #include <hetcompute/hetcompute.hh>

    namespace
    {
        const hetcompute::thread_storage_ptr<size_t> s_tls_state;
    }; // namespace

    int
    main()
    {
        hetcompute::runtime::init();
        auto g = hetcompute::create_group("test");
        auto t = hetcompute::create_task([] {});

        for (size_t i = 0; i < 200; ++i)
        {
            g->launch([=] {
                size_t* p1 = s_tls_state.get();
                t->launch();
                t->wait_for();
                size_t* p2 = s_tls_state.get();
                // cannot assume that p1 == p2
                (void)p1;
                (void)p2;
            });
        }

        g->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

Public Types

• typedef Allocator allocator_type

Static Public Member Functions

• static void * get_specific (internal::storage_key key)

• static int key_create (internal::storage_key * key, void(* dtor)(void *))

• static int set_specific (internal::storage_key key, void const * value)

Friends

• class scoped_storage_ptr<::hetcompute::thread_storage_ptr, T, Allocator >

Additional Inherited Members

14 Exceptions Reference API

This chapter describes all exceptions thrown by the Qualcomm HetCompute runtime system.

Exceptions can be disabled in the library by compiling the application with the compile-time flag -DHETCOMPUTE_DISABLE_EXCEPTIONS=1. A general caveat to disabling exceptions in the HetCompute library is that not all APIs return an error; some may terminate the application on API-level errors, such as a failed input parameter check or an unmet precondition for executing the API.

Qualcomm® Snapdragon™ Heterogeneous Compute SDK Exceptions Reference API

14.1 Exceptions

Classes

• class hetcompute::abort_task_exception

• class hetcompute::aggregate_exception

• class hetcompute::api_exception

• class hetcompute::canceled_exception

• class hetcompute::dsp_exception

• class hetcompute::error_exception

• class hetcompute::gpu_exception

• class hetcompute::hetcompute_exception

• class hetcompute::tls_exception

14.1.1 Class Documentation

14.1.1.1 class hetcompute::abort_task_exception

Exception thrown to abort the current task.

See Also

hetcompute::abort_on_cancel()

hetcompute::abort_task()

Public member functions

• abort_task_exception (std::string msg="aborted task")

• virtual const char * what () const HETCOMPUTE_NOEXCEPT

14.1.1.1.1 Member Function Documentation

14.1.1.1.1.1 virtual const char* hetcompute::abort_task_exception::what ( ) const [virtual]

Returns exception description.

Returns

C string describing the exception.

Implements hetcompute::hetcompute_exception.

14.1.1.2 class hetcompute::aggregate_exception

Aggregate exception encapsulating all exceptions thrown in a task graph leading up to a point-of-use of a task/group, e.g., wait_for, copy_value, or move_value.

Example

    auto g = hetcompute::create_group();
    g->launch([]{
        std::string().at(1); // throws std::out_of_range exception
    });
    g->launch([]{
        std::string().at(1); // throws std::out_of_range exception
    });
    try {
        // Point-of-use of group g
        g->wait_for();
    } catch (hetcompute::aggregate_exception& e) {
        while (e.has_next()) {
            try {
                e.next(); // throws
            } catch (const std::out_of_range&) {
                // Do something
            } catch (...) {
                // Not reached
            }
        }
    } catch (...) {
        // Not reached
    }

Public member functions

• aggregate_exception (std::vector< std::exception_ptr > * exceptions)

• aggregate_exception (const aggregate_exception &other)

• aggregate_exception (aggregate_exception &&other)

• bool has_next () const

• void next ()

• aggregate_exception & operator= (const aggregate_exception &other)

• aggregate_exception & operator= (aggregate_exception &&other)

• virtual const char * what () const HETCOMPUTE_NOEXCEPT

14.1.1.2.1 Member Function Documentation

14.1.1.2.1.1 bool hetcompute::aggregate_exception::has_next ( ) const

Returns whether the aggregate_exception contains more exceptions.

Returns

true if it contains more exceptions, false otherwise.

14.1.1.2.1.2 void hetcompute::aggregate_exception::next ( )

Throws the next exception contained in the aggregate_exception.

Note

Does nothing if there are no more exceptions to be thrown.
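
The has_next()/next() protocol can be tried out without the runtime by using a small stand-in built on std::exception_ptr. This is a minimal sketch under stated assumptions, not the SDK implementation: mini_aggregate, capture_out_of_range and count_out_of_range are hypothetical names, and the stand-in takes its exceptions by value rather than through the pointer parameter of the documented constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; not hetcompute::aggregate_exception.
class mini_aggregate
{
public:
    explicit mini_aggregate(std::vector<std::exception_ptr> exceptions)
        : _exceptions(std::move(exceptions)), _index(0)
    {
    }

    // True while unconsumed exceptions remain.
    bool has_next() const { return _index < _exceptions.size(); }

    // Rethrows the next stored exception; does nothing once all are consumed.
    void next()
    {
        if (has_next())
        {
            std::rethrow_exception(_exceptions[_index++]);
        }
    }

private:
    std::vector<std::exception_ptr> _exceptions;
    std::size_t                     _index;
};

// Captures the std::out_of_range thrown by std::string().at(1).
inline std::exception_ptr capture_out_of_range()
{
    try
    {
        (void)std::string().at(1);
    }
    catch (...)
    {
        return std::current_exception();
    }
    return std::exception_ptr();
}

// Drains the aggregate and counts the std::out_of_range exceptions it held.
inline int count_out_of_range(mini_aggregate& e)
{
    int n = 0;
    while (e.has_next())
    {
        try
        {
            e.next(); // throws the next stored exception
        }
        catch (const std::out_of_range&)
        {
            ++n;
        }
        catch (...)
        {
            // Other exception types are ignored in this sketch.
        }
    }
    return n;
}
```

Each call to next() consumes one stored exception, so the drain loop always terminates, and a further call to next() after the loop is a no-op, matching the note above.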

14.1.1.2.1.3 virtual const char* hetcompute::aggregate_exception::what ( ) const [virtual]

Returns exception description.

Returns

C string describing the exception.

Implements hetcompute::hetcompute_exception.

14.1.1.3 class hetcompute::api_exception

Represents a misuse of the Qualcomm HetCompute API, for example, invalid values passed to a function. Should cause termination of the application (future releases will behave differently).

Public member functions

• api_exception (std::string msg, const char * filename, int lineno, const char * funcname)

14.1.1.4 class hetcompute::canceled_exception

Exception thrown to indicate that a task/group was canceled. Thrown at points-of-use such as wait_for, copy_value, and move_value.

Example

    auto t = hetcompute::create_task([]{...});
    t->cancel();
    t->launch();
    try {
        // Point-of-use of task t
        t->wait_for();
    } catch (const hetcompute::canceled_exception&) {
        // Do something
    } catch (...) {
        // Not reached
    }

Public member functions

• HETCOMPUTE_DEFAULT_METHOD (canceled_exception(const canceled_exception &))

• HETCOMPUTE_DEFAULT_METHOD (canceled_exception(canceled_exception &&))

• HETCOMPUTE_DEFAULT_METHOD (canceled_exception &operator=(const canceled_exception &)&)

• HETCOMPUTE_DEFAULT_METHOD (canceled_exception &operator=(canceled_exception &&)&)

• virtual const char * what () const HETCOMPUTE_NOEXCEPT

14.1.1.4.1 Member Function Documentation

14.1.1.4.1.1 virtual const char* hetcompute::canceled_exception::what ( ) const [virtual]

Returns exception description.

Returns

C string describing the exception.

Implements hetcompute::hetcompute_exception.

14.1.1.5 class hetcompute::dsp_exception

Thrown by the HetCompute runtime if there is a problem with a DSP kernel.

Public member functions

• HETCOMPUTE_DEFAULT_METHOD (dsp_exception(const dsp_exception &))

• HETCOMPUTE_DEFAULT_METHOD (dsp_exception(dsp_exception &&))

• HETCOMPUTE_DEFAULT_METHOD (dsp_exception &operator=(const dsp_exception &)&)

• HETCOMPUTE_DEFAULT_METHOD (dsp_exception &operator=(dsp_exception &&)&)

• virtual const char * what () const HETCOMPUTE_NOEXCEPT

14.1.1.5.1 Member Function Documentation

14.1.1.5.1.1 virtual const char* hetcompute::dsp_exception::what ( ) const [virtual]

Returns exception description.

Returns

C string describing the exception.

Implements hetcompute::hetcompute_exception.

14.1.1.6 class hetcompute::error_exception

Superclass of all HetCompute-generated exceptions that indicate internal or programmer errors.

Public member functions

• error_exception (std::string msg, const char * filename, int lineno, const char * fname)

• virtual const char * file () const HETCOMPUTE_NOEXCEPT

• virtual const char * function () const HETCOMPUTE_NOEXCEPT

• virtual int line () const HETCOMPUTE_NOEXCEPT

• virtual const char * message () const HETCOMPUTE_NOEXCEPT

• virtual const char * type () const HETCOMPUTE_NOEXCEPT

• virtual const char * what () const HETCOMPUTE_NOEXCEPT

14.1.1.6.1 Member Function Documentation

14.1.1.6.1.1 virtual const char* hetcompute::error_exception::what ( ) const [virtual]

Returns exception description.

Returns

C string describing the exception.

Implements hetcompute::hetcompute_exception.

14.1.1.7 class hetcompute::gpu_exception

Thrown by the HetCompute runtime if there is a problem with a GPU kernel.

Public member functions

• HETCOMPUTE_DEFAULT_METHOD (gpu_exception(const gpu_exception &))

• HETCOMPUTE_DEFAULT_METHOD (gpu_exception(gpu_exception &&))

• HETCOMPUTE_DEFAULT_METHOD (gpu_exception &operator=(const gpu_exception &)&)

• HETCOMPUTE_DEFAULT_METHOD (gpu_exception &operator=(gpu_exception &&)&)

• virtual const char * what () const HETCOMPUTE_NOEXCEPT

14.1.1.7.1 Member Function Documentation

14.1.1.7.1.1 virtual const char* hetcompute::gpu_exception::what ( ) const [virtual]

Returns exception description.

Returns

C string describing the exception.

Implements hetcompute::hetcompute_exception.

14.1.1.8 class hetcompute::hetcompute_exception

Superclass of all Qualcomm HetCompute-generated exceptions.

Public member functions

• virtual ~hetcompute_exception () HETCOMPUTE_NOEXCEPT

• virtual const char * what () const HETCOMPUTE_NOEXCEPT=0

14.1.1.8.1 Constructors and Destructors

14.1.1.8.1.1 virtual hetcompute::hetcompute_exception::~hetcompute_exception ( ) [virtual]

Destructor.

14.1.1.8.2 Member Function Documentation

14.1.1.8.2.1 virtual const char* hetcompute::hetcompute_exception::what ( ) const [pure virtual]

Returns exception description.

Returns

C string describing the exception.

Implemented in hetcompute::dsp_exception, hetcompute::gpu_exception, hetcompute::aggregate_exception, hetcompute::canceled_exception, hetcompute::abort_task_exception, and hetcompute::error_exception.

14.1.1.9 class hetcompute::tls_exception

Indicates that the thread TLS has been misused or become corrupted. Should cause termination of the application.

Public member functions

• tls_exception (std::string msg, const char * filename, int lineno, const char * funcname)

14.2 ErrorCodes

Enumerations

• enum hetcompute::hc_error : int { first, hetcompute::hc_error::HC_Success = first, hetcompute::hc_error::HC_TaskGpuFailure, hetcompute::hc_error::HC_TaskDspFailure, hetcompute::hc_error::HC_TaskCanceled, hetcompute::hc_error::HC_TaskAggregateFailure, hetcompute::hc_error::HC_TaskGenericError, hetcompute::hc_error::HC_GroupCanceled, hetcompute::hc_error::HC_GroupAggregateFailure, hetcompute::hc_error::HC_GroupGenericError, last = HC_GroupGenericError }

14.2.1 Enumeration Type Documentation

14.2.1.1 enum hetcompute::hc_error : int [strong]

Error codes returned by HetCompute SDK APIs. These error codes apply only if the application has disabled exceptions. If the application has enabled exceptions, the HetCompute library throws exceptions instead of returning error codes.

Enumerator

HC_Success HetCompute API completed successfully.

HC_TaskGpuFailure Returned by the HetCompute runtime if there is a problem with a GPU kernel. First of the error codes pertaining to task execution.

HC_TaskDspFailure Returned by the HetCompute runtime if there is a problem with a DSP kernel.

HC_TaskCanceled Returned by the HetCompute runtime if the task was canceled.

HC_TaskAggregateFailure Returned by the HetCompute runtime if there were multiple failures in a task graph.

HC_TaskGenericError Indicates any error not categorized by the above task failures.

HC_GroupCanceled Returned by the HetCompute runtime to indicate that the group was canceled. First of the error codes pertaining to group execution.

HC_GroupAggregateFailure Returned by the HetCompute runtime if there were multiple failures in groups. One possible scenario is that multiple tasks within a group encountered runtime errors.

HC_GroupGenericError Indicates any error not categorized by the above group failures.
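
When exceptions are disabled, callers typically need to classify a returned code. The sketch below illustrates one way to do that. It is not SDK code: the demo namespace, the local mirror of the enumeration, and the helpers is_task_error, is_group_error and describe are all hypothetical, and the numeric values are an assumption that follows the declaration order shown above. Real code would use hetcompute::hc_error from the SDK headers.

```cpp
#include <cassert>
#include <cstring>

namespace demo
{
    // Local mirror of the documented enumerator list (assumption: values
    // follow declaration order, starting at 0).
    enum class hc_error : int
    {
        first,
        HC_Success = first,
        HC_TaskGpuFailure,
        HC_TaskDspFailure,
        HC_TaskCanceled,
        HC_TaskAggregateFailure,
        HC_TaskGenericError,
        HC_GroupCanceled,
        HC_GroupAggregateFailure,
        HC_GroupGenericError,
        last = HC_GroupGenericError
    };

    // True for the codes that pertain to task execution.
    inline bool is_task_error(hc_error e)
    {
        return e >= hc_error::HC_TaskGpuFailure && e <= hc_error::HC_TaskGenericError;
    }

    // True for the codes that pertain to group execution.
    inline bool is_group_error(hc_error e)
    {
        return e >= hc_error::HC_GroupCanceled && e <= hc_error::last;
    }

    // Short human-readable text for logging.
    inline const char* describe(hc_error e)
    {
        switch (e)
        {
        case hc_error::HC_Success:               return "success";
        case hc_error::HC_TaskGpuFailure:        return "task GPU failure";
        case hc_error::HC_TaskDspFailure:        return "task DSP failure";
        case hc_error::HC_TaskCanceled:          return "task canceled";
        case hc_error::HC_TaskAggregateFailure:  return "task aggregate failure";
        case hc_error::HC_TaskGenericError:      return "task generic error";
        case hc_error::HC_GroupCanceled:         return "group canceled";
        case hc_error::HC_GroupAggregateFailure: return "group aggregate failure";
        case hc_error::HC_GroupGenericError:     return "group generic error";
        }
        return "unknown";
    }
} // namespace demo
```

The range checks rely only on the relative order of the enumerators, so they would keep working if the underlying values were offset, as long as the task and group codes stay contiguous.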

15 Affinity Management API

The Qualcomm HetCompute affinity API enables the programmer to request which CPU cores should execute tasks.

Using the affinity API requires including the following header file:

#include <hetcompute/affinity.hh>

To get a detailed description of all the APIs, please follow the following link:

Note

The current version only sets the affinity for the CPU cores; the GPU and the Hexagon DSP are excluded.

Qualcomm® Snapdragon™ Heterogeneous Compute SDK Affinity Management API

15.1 Affinity Settings

Classes

• struct hetcompute_affinity_settings_t

• class hetcompute::affinity::settings

Typedefs

• typedef void(* hetcompute_func_ptr_t )(void * data)

Enumerations

• enum hetcompute::affinity::cores { hetcompute::affinity::cores::all, hetcompute::affinity::cores::big, hetcompute::affinity::cores::little, hetcompute::affinity::cores::prime }

• enum hetcompute_affinity_cores_t { hetcompute_affinity_cores_all = 0, hetcompute_affinity_cores_big, hetcompute_affinity_cores_little, hetcompute_affinity_cores_prime }

• enum hetcompute_affinity_mode_t { hetcompute_affinity_mode_allow_local_setting = 0, hetcompute_affinity_mode_override_local_setting }

• enum hetcompute_affinity_pin_threads_t { hetcompute_affinity_pin_threads_false = 0, hetcompute_affinity_pin_threads_true }

• enum hetcompute::affinity::mode { hetcompute::affinity::mode::allow_local_setting, hetcompute::affinity::mode::override_local_setting }

Functions

• hetcompute::affinity::settings::settings (cores cores_attribute, bool pin_threads, mode md=::hetcompute::affinity::mode::allow_local_setting)

• hetcompute::affinity::settings::~settings ()

• template<typename Function , typename... Args> void hetcompute::affinity::execute (hetcompute::affinity::settings desired_aff, Function &&f, Args... args)

• settings hetcompute::affinity::get ()

• cores hetcompute::affinity::settings::get_cores () const

• mode hetcompute::affinity::settings::get_mode () const

• hetcompute::affinity::settings hetcompute::internal::affinity::get_non_local_affinity_settings ()

• bool hetcompute::affinity::settings::get_pin_threads () const

• void hetcompute_affinity_execute (const hetcompute_affinity_settings_t desired_aff, hetcompute_func_ptr_t f, void * args)

• hetcompute_affinity_settings_t hetcompute_affinity_get ()

• void hetcompute_affinity_reset ()

• void hetcompute_affinity_set (const hetcompute_affinity_settings_t as)

• bool hetcompute::internal::soc::is_big_little_cpu ()

• bool hetcompute::internal::soc::is_this_big_core ()

• bool hetcompute::affinity::settings::operator!= (const settings &rhs) const

• bool hetcompute::affinity::settings::operator== (const settings &rhs) const

• void hetcompute::affinity::reset ()

• void hetcompute::affinity::settings::reset_pin_threads ()

• void hetcompute::affinity::set (const settings as)

• void hetcompute::affinity::settings::set_cores (cores cores_attribute)

• void hetcompute::affinity::settings::set_mode (mode md)

• void hetcompute::affinity::settings::set_pin_threads ()

15.1.1 Class Documentation

15.1.1.1 struct hetcompute_affinity_settings_t

Data fields

Type                                 Field         Description
hetcompute_affinity_cores_t          cores         Group of cores to set the affinity to
hetcompute_affinity_mode_t           mode
hetcompute_affinity_pin_threads_t    pin_threads   Pin threads to individual cores

15.1.1.2 class hetcompute::affinity::settings

Affinity settings class

This class is used to define the affinity conditions desired by the programmer

Public member functions

• settings (cores cores_attribute, bool pin_threads, mode md=::hetcompute::affinity::mode::allow_local_setting)

• ~settings ()

• cores get_cores () const

• mode get_mode () const

• bool get_pin_threads () const

• bool operator!= (const settings &rhs) const

• bool operator== (const settings &rhs) const

• void reset_pin_threads ()

• void set_cores (cores cores_attribute)

• void set_mode (mode md)

• void set_pin_threads ()

15.1.2 Typedef Documentation

15.1.2.1 typedef void(* hetcompute_func_ptr_t)(void * data)

Function pointer type to pass to hetcompute_affinity_execute()

See Also

hetcompute_affinity_execute()

15.1.3 Enumeration Type Documentation

15.1.3.1 enum hetcompute::affinity::cores [strong]

C++ Affinity API

See Also

include/hetcompute/affinity.h for the C Affinity API

Enumeration type to select the cores where to apply affinity settings in a big-little system. In homogeneous systems, all is always used.

Enumerator

all Use all SoC cores to set the affinity

big Set all threads to be eligible to run in the big cluster of the SoC

little Set all threads to be eligible to run in the LITTLE cluster of the SoC

prime Set all threads to be eligible to run in the prime cluster of the SoC

15.1.3.2 enum hetcompute_affinity_cores_t

C Affinity API

See Also

include/hetcompute/affinity.hh for the C++ Affinity API

Enumerator

hetcompute_affinity_cores_all Use all SoC cores to set the affinity

hetcompute_affinity_cores_big Set all threads to be eligible to run in the big cluster of the SoC

hetcompute_affinity_cores_little Set all threads to be eligible to run in the LITTLE cluster of the SoC

hetcompute_affinity_cores_prime Set all threads to be eligible to run in the prime cluster of the SoC

15.1.3.3 enum hetcompute_affinity_mode_t

Enumerator

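(enumerator name missing from this listing) Set a default affinity for all CPU tasks for which affinity was not specified. For example, if the user sets the affinity in allow_local_setting mode to big, the big cores will execute all tasks except those marked as little

hetcompute_affinity_mode_override_local_setting Set the affinity for all CPU tasks regardless of local task/scope settings. For example, if the user sets the affinity to little in override_local_setting mode, the little cores will execute all CPU tasks including those marked as big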

15.1.3.4 enum hetcompute_affinity_pin_threads_t

Enumerator

(enumerator name missing from this listing) Do not pin threads to individual cores

hetcompute_affinity_pin_threads_true Pin threads to individual cores

15.1.3.5 enum hetcompute::affinity::mode [strong]

Enumeration type to select the affinity mode in big-little systems. In homogeneous systems, mode, like cores, is ignored.

Enumerator

allow_local_setting Set a default affinity for all CPU tasks for which affinity was not specified. For example, if the user sets the affinity in allow_local_setting mode to big, the big cores will execute all tasks except those marked as little

override_local_setting Set the affinity for all CPU tasks regardless of local task/scope settings. For example, if the user sets the affinity to little in override_local_setting mode, the little cores will execute all CPU tasks including those marked as big
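The interaction between the global cores setting, the mode, and a per-kernel request can be summarized in a few lines. The following is a minimal standalone sketch, not SDK code: the enums are local stand-ins that reuse the documented enumerator names, and local_request and effective_cores are hypothetical names introduced only for this illustration.

```cpp
// Local stand-ins that reuse the documented enumerator names so the sketch
// compiles without the SDK headers.
enum class cores { all, big, little, prime };
enum class mode { allow_local_setting, override_local_setting };

// Hypothetical: an optional per-kernel request such as set_big()/set_little().
struct local_request
{
    bool  present; // false when the kernel carries no affinity attribute
    cores value;   // meaningful only when present is true
};

// Hypothetical helper: the cores a CPU task ends up eligible for.
cores effective_cores(cores global, mode md, local_request local)
{
    if (md == mode::override_local_setting)
        return global;                           // local requests are ignored
    return local.present ? local.value : global; // local request wins if given
}
```

With a global setting of big in allow_local_setting mode, a kernel marked little resolves to little and an unmarked kernel resolves to big. With a global setting of little in override_local_setting mode, every kernel resolves to little, including one marked big.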

15.1.4 Function Documentation

15.1.4.1 hetcompute::affinity::settings::settings ( cores cores_attribute, bool pin_threads, mode md = ::hetcompute::affinity::mode::allow_local_setting ) [explicit]

Constructor with cores and pin_threads arguments

Parameters

in cores_attribute Type of cores: hetcompute::affinity::cores::all, hetcompute::affinity::cores::big, hetcompute::affinity::cores::little, or hetcompute::affinity::cores::prime

in pin_threads If true, enable pinning

in md Operation mode: hetcompute::affinity::mode::allow_local_setting or hetcompute::affinity::mode::override_local_setting

15.1.4.2 hetcompute::affinity::settings::~settings ( )

Destructor

15.1.4.3 template<typename Function, typename... Args> void hetcompute::affinity::execute ( hetcompute::affinity::settings desired_aff, Function && f, Args... args )

Execute function/lambda/function-object with args, while enforcing desired affinity (big/LITTLE/all).

Note

Call returns upon completion of function f

Parameters

desired_aff desired affinity for execution

f function to execute; may be lambda/function/function-object

args arguments to pass to function f for execution
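The shape of such a call (apply a setting, run the callable to completion, return) can be sketched without the SDK. Everything below is a stand-in: current_cores, scoped_cores, and execute_with are hypothetical names, and the global state only simulates a runtime-wide setting. Whether the SDK restores the previous affinity after f returns is not stated in this section; the restore step is an assumption of the sketch.

```cpp
#include <utility>

// Local stand-in for the documented cores enumeration.
enum class cores { all, big, little, prime };

// Hypothetical stand-in for the runtime-wide affinity state.
inline cores& current_cores()
{
    static cores c = cores::all;
    return c;
}

// Applies a setting for the lifetime of the object and puts the previous
// one back afterwards, even if the callable throws.
class scoped_cores
{
public:
    explicit scoped_cores(cores desired) : saved_(current_cores())
    {
        current_cores() = desired;
    }
    ~scoped_cores() { current_cores() = saved_; }

    scoped_cores(const scoped_cores&) = delete;
    scoped_cores& operator=(const scoped_cores&) = delete;

private:
    cores saved_;
};

// Runs f(args...) synchronously while the desired setting is in effect.
template <typename Function, typename... Args>
void execute_with(cores desired, Function&& f, Args&&... args)
{
    scoped_cores guard(desired);
    std::forward<Function>(f)(std::forward<Args>(args)...);
}
```

Because the call is synchronous, a lambda passed as f can safely capture local variables of the caller by reference.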

15.1.4.4 settings hetcompute::affinity::get ( )

Return the current affinity settings.

15.1.4.5 cores hetcompute::affinity::settings::get_cores ( ) const

Return the cores affinity member

15.1.4.6 mode hetcompute::affinity::settings::get_mode ( ) const

Return the mode member

15.1.4.7 bool hetcompute::affinity::settings::get_pin_threads ( ) const

Return the pin affinity member

Returns

true – Device threads are pinned.

false – Device threads are not pinned.

15.1.4.8 void hetcompute_affinity_execute ( const hetcompute_affinity_settings_t desired_aff, hetcompute_func_ptr_t f, void * args )

Execute function with args, while enforcing desired affinity (big/LITTLE/all).

Note

Call returns upon completion of function f

Parameters

desired_aff desired affinity for execution

f function to execute

args arguments to pass to function f for execution
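Since the C entry point accepts a single void * argument, several values are usually bundled into one struct and recovered inside the callback. The sketch below shows that pattern in isolation. The typedef has the same shape as hetcompute_func_ptr_t but is redeclared locally under a different name; scale_args, scale_callback, and run_synchronously are hypothetical, and run_synchronously only invokes the callback without applying any affinity.

```cpp
// Same shape as the documented typedef, redeclared locally so the sketch
// builds without the SDK headers.
typedef void (*func_ptr_t)(void* data);

// Hypothetical argument bundle: everything the callback needs travels
// behind the single void* parameter.
struct scale_args
{
    const int* input;
    int*       output;
    int        count;
    int        factor;
};

// Callback matching func_ptr_t: recover the typed bundle, then do the work.
void scale_callback(void* data)
{
    scale_args* a = static_cast<scale_args*>(data);
    for (int i = 0; i < a->count; ++i)
        a->output[i] = a->input[i] * a->factor;
}

// Stand-in for a synchronous execute call: it invokes f(args) and returns
// when f is done. It does not change any thread affinity.
void run_synchronously(func_ptr_t f, void* args)
{
    f(args);
}
```

Because the documented call returns only upon completion of f, the argument bundle can live on the caller's stack for the duration of the call.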

15.1.4.9 hetcompute_affinity_settings_t hetcompute_affinity_get ( )

Return the current affinity settings.

15.1.4.10 void hetcompute_affinity_reset ( )

Reset the thread pool affinity so that all Qualcomm HetCompute threads can run in any core of the system.

15.1.4.11 void hetcompute_affinity_set ( const hetcompute_affinity_settings_t as )

Set the affinity of all Qualcomm HetCompute runtime pool threads.

A successful call to this function will set the affinity of all runtime threads to the requested settings.

The HetCompute affinity settings control three knobs:

• cores: Set where to run the task.

• pinning: Set whether threads can migrate among cores.

• mode: Set whether kernel affinity attributes are honored. When mode equals normal (allow_local_setting), the set_big() and set_little() kernel attributes are respected; when mode equals force (override_local_setting), they are ignored. Force mode can be useful when programmers want to guarantee that certain cores are not used, e.g., to leave the little cores for an audio library in a game. The default mode for HetCompute is normal.

For example, to run all Qualcomm HetCompute threads on the big cores of the SoC while allowing kernel attributes, call this function with a settings object whose cores, pin, and mode equal big, false, and normal, respectively. To enable pinning in a system with homogeneous cores, call it with all and true; the mode is then normal, since that is the default.

Parameters

in as Affinity settings object.

Example 1 Setting the affinity with force and normal modes

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        auto fn = [](int i) { HETCOMPUTE_ILOG("Function executed with specified affinity on arg %d", i); };
        auto aff_settings =
            hetcompute::affinity::settings(hetcompute::affinity::cores::big, false,
                                           hetcompute::affinity::mode::allow_local_setting);
        // In a big.LITTLE SoC, function fn executes on a big core.
        hetcompute::affinity::execute(aff_settings, fn, 42);

        auto g = hetcompute::create_group(__FUNCTION__);

        auto k_wout_attrib = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("Task without kernel affinity attribute."); });

        auto k_with_attrib = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("Task with kernel affinity attribute"); });
        k_with_attrib.set_little();

        // k_with_attrib kernel will run in a LITTLE core
        g->launch(k_with_attrib);

        // k_wout_attrib can run in any core
        g->launch(k_wout_attrib);

        g->wait_for();

        // Set the affinity to the LITTLE cores without pinning in
        // allow_local_setting mode
        hetcompute::affinity::set(
            hetcompute::affinity::settings(hetcompute::affinity::cores::little, false,
                                           hetcompute::affinity::mode::allow_local_setting));

        // k_wout_attrib task will run in a LITTLE core because the kernel has no
        // individual affinity specification
        g->launch(k_wout_attrib);

        // Set the affinity to the big cores with pinning in override_local_setting
        // mode by reading the current affinity and then updating the different fields
        auto affinity = hetcompute::affinity::get();

        // Update the cores from LITTLE to big
        affinity.set_cores(hetcompute::affinity::cores::big);

        // Enable thread pinning
        affinity.set_pin_threads();

        // Update the mode from allow_local_setting to override_local_setting in the
        // settings
        affinity.set_mode(hetcompute::affinity::mode::override_local_setting);

        // Update the affinity with the modified affinity object
        hetcompute::affinity::set(affinity);

        // The second run of k_with_attrib will run on a big core because the
        // affinity mode is override_local_setting and global affinity settings are
        // obeyed
        g->launch(k_with_attrib);

        g->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

15.1.4.12 bool hetcompute::internal::soc::is_this_big_core ( )

Is the core on which the calling thread is running a big core?

Returns

true if big core, false if LITTLE core

15.1.4.13 bool hetcompute::affinity::settings::operator!= ( const settings & rhs ) const

Inequality operator to check for different settings objects

Returns

true – Compared settings are different.

false – Compared settings are equal.

15.1.4.14 bool hetcompute::affinity::settings::operator== ( const settings & rhs ) const

Equality operator to compare settings objects

Returns

true – Compared settings are equal.

false – Compared settings are different.

15.1.4.15 void hetcompute::affinity::reset ( )

Reset the thread pool affinity so that all Qualcomm HetCompute threads can run in any core of the system.

15.1.4.16 void hetcompute::affinity::settings::reset_pin_threads ( )

Reset the pin affinity member

15.1.4.17 void hetcompute::affinity::set ( const settings as )

Set the affinity of all Qualcomm HetCompute runtime pool threads.

A successful call to this function will set the affinity of all runtime threads to the requested settings.

The HetCompute affinity settings control three knobs:

• cores: Set where to run the task.

• pinning: Set whether threads can migrate among cores.

• mode: Set whether kernel affinity attributes are honored. When mode equals normal (allow_local_setting), the set_big() and set_little() kernel attributes are respected; when mode equals force (override_local_setting), they are ignored. Force mode can be useful when programmers want to guarantee that certain cores are not used, e.g., to leave the little cores for an audio library in a game. The default mode for HetCompute is normal.

For example, to run all Qualcomm HetCompute threads on the big cores of the SoC while allowing kernel attributes, call this function with a settings object whose cores, pin, and mode equal big, false, and normal, respectively. To enable pinning in a system with homogeneous cores, call it with all and true; the mode is then normal, since that is the default.

Parameters

in as Affinity settings object.

Example 1 Setting the affinity with force and normal modes

    #include <hetcompute/hetcompute.hh>

    int
    main()
    {
        hetcompute::runtime::init();

        auto fn = [](int i) { HETCOMPUTE_ILOG("Function executed with specified affinity on arg %d", i); };
        auto aff_settings =
            hetcompute::affinity::settings(hetcompute::affinity::cores::big, false,
                                           hetcompute::affinity::mode::allow_local_setting);
        // In a big.LITTLE SoC, function fn executes on a big core.
        hetcompute::affinity::execute(aff_settings, fn, 42);

        auto g = hetcompute::create_group(__FUNCTION__);

        auto k_wout_attrib = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("Task without kernel affinity attribute."); });

        auto k_with_attrib = hetcompute::create_cpu_kernel([] { HETCOMPUTE_ILOG("Task with kernel affinity attribute"); });
        k_with_attrib.set_little();

        // k_with_attrib kernel will run in a LITTLE core
        g->launch(k_with_attrib);

        // k_wout_attrib can run in any core
        g->launch(k_wout_attrib);

        g->wait_for();

        // Set the affinity to the LITTLE cores without pinning in
        // allow_local_setting mode
        hetcompute::affinity::set(
            hetcompute::affinity::settings(hetcompute::affinity::cores::little, false,
                                           hetcompute::affinity::mode::allow_local_setting));

        // k_wout_attrib task will run in a LITTLE core because the kernel has no
        // individual affinity specification
        g->launch(k_wout_attrib);

        // Set the affinity to the big cores with pinning in override_local_setting
        // mode by reading the current affinity and then updating the different fields
        auto affinity = hetcompute::affinity::get();

        // Update the cores from LITTLE to big
        affinity.set_cores(hetcompute::affinity::cores::big);

        // Enable thread pinning
        affinity.set_pin_threads();

        // Update the mode from allow_local_setting to override_local_setting in the
        // settings
        affinity.set_mode(hetcompute::affinity::mode::override_local_setting);

        // Update the affinity with the modified affinity object
        hetcompute::affinity::set(affinity);

        // The second run of k_with_attrib will run on a big core because the
        // affinity mode is override_local_setting and global affinity settings are
        // obeyed
        g->launch(k_with_attrib);

        g->wait_for();

        hetcompute::runtime::shutdown();
        return 0;
    }

15.1.4.18 void hetcompute::affinity::settings::set_cores ( cores cores_attribute )

Set the cores affinity member

Parameters

in cores_attribute Type of desired cores.

15.1.4.19 void hetcompute::affinity::settings::set_mode ( mode md )

Set the mode member

When mode is force, all tasks will be executed by the CPUs set by the current affinity settings. When mode is normal, CPU tasks will run by default on the same set of CPUs, except when the user has manually set the kernel affinity with set_big() or set_little().

15.1.4.20 void hetcompute::affinity::settings::set_pin_threads ( )

Set the pin affinity member

When the pin is set, each hetcompute thread will be pinned to a single core
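This section does not describe how the SDK pins its threads. As an illustration of what pinning a thread to a single core means at the operating-system level, the following sketch uses the Linux calls sched_getaffinity and sched_setaffinity (as provided by glibc). It is an assumption-laden example rather than the SDK's implementation, and both helper names are hypothetical.

```cpp
#include <sched.h> // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux)

// Pin the calling thread to the lowest-numbered CPU it is currently allowed
// to use. Returns that CPU index, or -1 on failure.
int pin_calling_thread_to_first_allowed_cpu()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // A pid of 0 refers to the calling thread.
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
        if (CPU_ISSET(cpu, &allowed))
        {
            cpu_set_t single;
            CPU_ZERO(&single);
            CPU_SET(cpu, &single);
            return sched_setaffinity(0, sizeof(single), &single) == 0 ? cpu : -1;
        }
    }
    return -1;
}

// Number of CPUs the calling thread may run on, or -1 on failure.
int allowed_cpu_count()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Once a thread is pinned this way, the scheduler no longer migrates it among cores, which is the trade-off the pinning knob controls.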

16 Miscellaneous

16.1 Interoperability

16.2 Legacy

Functions

• void hetcompute::runtime::init ()

• void hetcompute::runtime::shutdown ()

16.2.1 Function Documentation

16.2.1.1 void hetcompute::runtime::init ( )

Starts up HetCompute SDK’s runtime.

Initializes HetCompute internal data structures, tasks, schedulers, and thread pools.

16.2.1.2 void hetcompute::runtime::shutdown ( )

Shuts down HetCompute SDK’s runtime.

Shuts down the runtime. It returns only when all running tasks have finished.
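Because init() and shutdown() must be paired, one option is to tie them to a scope. The sketch below is a generic guard and not part of the SDK: scoped_runtime and the demo namespace are hypothetical, and the demo functions stand in for the SDK calls so the example builds on its own. Assuming init and shutdown are plain free functions, as the signatures listed above suggest, the SDK pair could be passed as &hetcompute::runtime::init and &hetcompute::runtime::shutdown.

```cpp
// Calls a start hook on construction and a stop hook on destruction, so the
// two calls stay paired on every exit path of the enclosing scope.
class scoped_runtime
{
public:
    typedef void (*hook_t)();

    scoped_runtime(hook_t start, hook_t stop) : stop_(stop) { start(); }
    ~scoped_runtime() { stop_(); }

    scoped_runtime(const scoped_runtime&) = delete;
    scoped_runtime& operator=(const scoped_runtime&) = delete;

private:
    hook_t stop_;
};

// Stand-ins for the SDK calls so the sketch builds without the SDK.
namespace demo
{
    int live_runtimes = 0;
    void init() { ++live_runtimes; }
    void shutdown() { --live_runtimes; }
}
```

Since the documented shutdown() returns only when all running tasks have finished, the guard's destructor would block for the same duration.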

17 Class Documentation

17.1 cpu_kernel Class Reference

The documentation for this class was generated from the following file:

• /het-compute-sdk/core/include/hetcompute/cpukernel.hh

17.2 dsp_kernel Class Reference

The documentation for this class was generated from the following file:

• /het-compute-sdk/core/include/hetcompute/dspkernel.hh

17.3 HetComputeApp::features Class Reference

Static Public Member Functions

• static bool supportException ()

The documentation for this class was generated from the following file:

• /het-compute-sdk/core/include/hetcompute/runtime.hh

17.4 hetcompute::beta::pattern::pipeline< UserData > Class Template Reference

Heterogeneous Pipeline class.

Public Types

• using context = typename parent_type::context

Context type for the pipeline.

Public member functions

• pipeline ()

Constructor.

• pipeline (pipeline const &other)

Copy constructor.

• pipeline (pipeline &&other)

Move constructor.

• virtual ~pipeline ()

Destructor.

• template<typename... Args> std::enable_if<!internal::pipeline_utility::check_gpu_kernel< Args...>::has_gpu_kernel, void >::type add_stage (Args &&...args)

Add a CPU stage.

• template<typename... Args> std::enable_if< internal::pipeline_utility::check_gpu_kernel< Args...>::has_gpu_kernel, void >::type add_stage (Args &&...args)

Add a GPU stage.

• pipeline & operator= (pipeline const &other)

Copy assignment operator.

• pipeline & operator= (pipeline &&other)

Move assignment operator.

template<typename... UserData> class hetcompute::beta::pattern::pipeline< UserData >

Heterogeneous Pipeline class.

Template Parameters

UserData The type for the pipeline context data, or empty, e.g., hetcompute::pattern::pipeline<size_t> or hetcompute::pattern::pipeline<>.

17.4.1 Member Typedef Documentation

17.4.1.1 template<typename... UserData> using hetcompute::beta::pattern::pipeline< UserData >::context = typename parent_type::context

Context type for the pipeline.

17.4.2 Constructors and Destructors

17.4.2.1 template<typename... UserData> hetcompute::beta::pattern::pipeline< UserData >::pipeline ( )

Constructor.

17.4.2.2 template<typename... UserData> virtual hetcompute::beta::pattern::pipeline< UserData >::~pipeline ( ) [virtual]

Destructor.

Reimplemented from hetcompute::pattern::pipeline< UserData...>.

17.4.2.3 template<typename... UserData> hetcompute::beta::pattern::pipeline< UserData >::pipeline ( pipeline< UserData > const & other )

Copy constructor.

17.4.2.4 template<typename... UserData> hetcompute::beta::pattern::pipeline< UserData >::pipeline ( pipeline< UserData > && other )

Move constructor.

17.4.3 Member Function Documentation

17.4.3.1 template<typename... UserData> template<typename... Args> std::enable_if<!internal::pipeline_utility::check_gpu_kernel<Args...>::has_gpu_kernel, void>::type hetcompute::beta::pattern::pipeline< UserData >::add_stage ( Args &&... args )

Add a CPU stage.

Parameters

args The features of the cpu stage.

See Also

template<typename... Args> void add_cpu_stage(Args&&... args)

17.4.3.2 template<typename... UserData> template<typename... Args> std::enable_if<internal::pipeline_utility::check_gpu_kernel<Args...>::has_gpu_kernel, void>::type hetcompute::beta::pattern::pipeline< UserData >::add_stage ( Args &&... args )

Add a GPU stage.

Parameters

args The features of the gpu stage.

See Also

template<typename... Args> void add_gpu_stage(Args&&... args)
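The two add_stage overloads above differ only in their std::enable_if condition, so the compiler selects one based on whether any argument is a GPU kernel. The following standalone sketch shows that dispatch technique. The kernel types, the has_gpu_kernel trait, and the stage_kind return value are hypothetical stand-ins; the documented overloads return void, and the SDK's internal check_gpu_kernel trait is not reproduced here.

```cpp
#include <type_traits>

// Hypothetical kernel types used only for this sketch.
struct cpu_kernel_like {};
struct gpu_kernel_like {};

// Trait: true when at least one type in the pack is gpu_kernel_like.
template <typename... Args>
struct has_gpu_kernel;

template <>
struct has_gpu_kernel<> : std::false_type {};

template <typename First, typename... Rest>
struct has_gpu_kernel<First, Rest...>
    : std::integral_constant<bool,
          std::is_same<typename std::decay<First>::type, gpu_kernel_like>::value ||
          has_gpu_kernel<Rest...>::value> {};

// Returned only so the chosen overload is observable in this sketch.
enum class stage_kind { cpu, gpu };

// Participates in overload resolution when no argument is a GPU kernel.
template <typename... Args>
typename std::enable_if<!has_gpu_kernel<Args...>::value, stage_kind>::type
add_stage(Args&&...)
{
    return stage_kind::cpu;
}

// Participates in overload resolution when some argument is a GPU kernel.
template <typename... Args>
typename std::enable_if<has_gpu_kernel<Args...>::value, stage_kind>::type
add_stage(Args&&...)
{
    return stage_kind::gpu;
}
```

Exactly one of the two conditions holds for any argument list, so a call is never ambiguous; std::decay strips references so that lvalue and rvalue kernels are treated alike.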

17.4.3.3 template<typename... UserData> pipeline& hetcompute::beta::pattern::pipeline< UserData >::operator= ( pipeline< UserData > const & other )

Copy assignment operator.

17.4.3.4 template<typename... UserData> pipeline& hetcompute::beta::pattern::pipeline< UserData >::operator= ( pipeline< UserData > && other )

Move assignment operator.

The documentation for this class was generated from the following file:

• /het-compute-sdk/core/include/hetcompute/pipeline.hh

17.5 hetcompute::internal::pointkernel::pointkernel< RT, Args> Class Template Reference

The documentation for this class was generated from the following file:

• /het-compute-sdk/core/include/hetcompute/pfor_each.hh

17.6 stage_input_base Class Reference

The documentation for this class was generated from the following file:

• /het-compute-sdk/core/include/hetcompute/pipelinedata.hh

17.7 hetcompute::internal::task_factory< X, Y, Z > Struct Template Reference

The documentation for this struct was generated from the following file:

• /het-compute-sdk/core/include/hetcompute/cpukernel.hh

17.8 hetcompute::internal::task_factory_dispatch< X, Y > Struct Template Reference

The documentation for this struct was generated from the following file:

• /het-compute-sdk/core/include/hetcompute/cpukernel.hh

80-P2432-1 B MAY CONTAIN U.S. AND INTERNATIONAL EXPORT CONTROLLED INFORMATION 449

Page 450: Snapdragon™ Heterogeneous Compute SDK...Qualcomm® Snapdragon™ Heterogeneous Compute SDK Documentation and Interface Specification 80-P2432-1 B August 28, 2019 Qualcomm® Snapdragon™

Index∼dsp_kernel

hetcompute::dsp_kernel< int(∗)(Args...)>, 264∼hetcompute_exception

hetcompute::hetcompute_exception, 429∼pipeline

hetcompute::beta::pattern::pipeline, 447hetcompute::pattern::pipeline, 204

∼pipeline_contexthetcompute::pipeline_context< UserData >,

212hetcompute::pipeline_context<>, 213

∼pipeline_context_basehetcompute::pipeline_context_base, 214

∼settingsAffinity Settings, 436

∼stage_inputhetcompute::stage_input, 220

∼task_ptrhetcompute::task_ptr< ReturnType >, 319hetcompute::task_ptr< ReturnType(Args...)>,

325hetcompute::task_ptr< void >, 328hetcompute::task_ptr<>, 331

abort_on_cancelTasks, 335

abort_taskTasks, 337

acquire_rohetcompute::buffer_ptr, 372

acquire_rwhetcompute::buffer_ptr, 373

acquire_wihetcompute::buffer_ptr, 374

addhetcompute::device_set, 362hetcompute::group, 232, 233

add_stagehetcompute::beta::pattern::pipeline, 448

addressing_modeTexture Data Types, 401

Affinity Management API, 432Affinity Settings, 433∼settings, 436all, 435allow_local_setting, 436big, 435cores, 435execute, 437

get, 437get_cores, 437get_mode, 437get_pin_threads, 437hetcompute_affinity_cores_big, 435hetcompute_affinity_cores_little, 435hetcompute_affinity_cores_prime, 435hetcompute_affinity_mode_override_local_-

setting,436

hetcompute_affinity_pin_threads_true, 436hetcompute_affinity_cores_t, 435hetcompute_affinity_execute, 437hetcompute_affinity_get, 438hetcompute_affinity_mode_t, 435hetcompute_affinity_pin_threads_t, 436hetcompute_affinity_reset, 438hetcompute_affinity_set, 438hetcompute_func_ptr_t, 435is_this_big_core, 439little, 435mode, 436operator==, 440override_local_setting, 436prime, 435reset, 440reset_pin_threads, 440set, 440set_cores, 442set_mode, 442set_pin_threads, 442settings, 436

allAffinity Settings, 435

allow_local_settingAffinity Settings, 436

args_tuplehetcompute::task< ReturnType(Args...)>, 302hetcompute::task_ptr< ReturnType(Args...)>,

324arity

hetcompute::task< ReturnType(Args...)>, 304hetcompute::task_ptr< ReturnType(Args...)>,

326at

hetcompute::buffer_ptr, 375

beginhetcompute::buffer_ptr, 375

80-P2432-1 B MAY CONTAIN U.S. AND INTERNATIONAL EXPORT CONTROLLED INFORMATION 450

Page 451: Snapdragon™ Heterogeneous Compute SDK...Qualcomm® Snapdragon™ Heterogeneous Compute SDK Documentation and Interface Specification 80-P2432-1 B August 28, 2019 Qualcomm® Snapdragon™

Qualcomm® Snapdragon™ Heterogeneous Compute SDK INDEX

hetcompute::range_base, 289, 290big

Affinity Settings, 435bind_all

hetcompute::task< ReturnType(Args...)>, 302bind_as_data_dependency

Tasks, 339bind_by_value

Tasks, 339blocking

Tasks, 339Bounded Lock-Free Queue, 404

bounded_lfqueue, 405pop, 405push, 405

bounded_lfqueueBounded Lock-Free Queue, 405

buffer_ptrhetcompute::buffer_ptr, 372

Buffers, 367create_buffer, 382, 383

Buffers Reference API, 359

cancelhetcompute::group, 234hetcompute::task<>, 306

cancel_pipelinehetcompute::pipeline_context_base, 214

canceledhetcompute::group, 236hetcompute::task<>, 309

cbeginhetcompute::buffer_ptr, 375

cendhetcompute::buffer_ptr, 376

clKernels, 272

collapsed_task_typeTasks, 335

const_iteratorhetcompute::buffer_ptr, 371

contexthetcompute::beta::pattern::pipeline, 447hetcompute::pattern::pipeline, 204

copy_valuehetcompute::task< ReturnType >, 300

coresAffinity Settings, 435

cpu_kernel, 446hetcompute::cpu_kernel, 260

hetcompute::cpu_kernel<FReturnType(FArgs...)>, 262

create_bufferBuffers, 382, 383

create_cpu_kernelKernels, 269

create_derivative_textureTexture APIs, 397

create_dsp_kernelKernels, 270

create_gpu_kernelKernels, 270–272

create_groupGroups, 250–252

create_pdivide_and_conquerhetcompute::pattern::pdivide_and_conquerer,

188Parallel Divide-and-Conquer, 188

create_pfor_eachParallel For Loop, 165

create_preducehetcompute::pattern::preducer, 178Parallel Reduction, 178

create_pscan_inclusivehetcompute::pattern::pscan, 184Parallel Scan, 185

create_psorthetcompute::pattern::psorter, 196Parallel Sorting, 196

create_ptransformhetcompute::pattern::ptransformer, 171Parallel Transformation, 172

create_samplerTexture APIs, 397

create_taskhetcompute::pattern::pipeline, 205, 206Tasks, 340, 341

create_textureTexture APIs, 398

create_value_taskTasks, 342

datahetcompute::index_base, 275

Data Sharing and Storage Reference API, 409Data Sharing Synchronization, 410Data Structures Reference API, 403data_type

hetcompute::buffer_ptr, 371device_set

80-P2432-1 B MAY CONTAIN U.S. AND INTERNATIONAL EXPORT CONTROLLED INFORMATION 451

Page 452: Snapdragon™ Heterogeneous Compute SDK...Qualcomm® Snapdragon™ Heterogeneous Compute SDK Documentation and Interface Specification 80-P2432-1 B August 28, 2019 Qualcomm® Snapdragon™

Qualcomm® Snapdragon™ Heterogeneous Compute SDK INDEX

hetcompute::device_set, 361device_type

Heterogeneous Compute Device Types, 365dims

hetcompute::range_base, 290disable_sliding_window

hetcompute::pattern::pipeline, 209do_not_collapse

Tasks, 357dsp_kernel, 446

hetcompute::dsp_kernel< int(∗)(Args...)>, 264

emptyhetcompute::device_set, 362

enable_sliding_windowhetcompute::pattern::pipeline, 209

endhetcompute::buffer_ptr, 376hetcompute::range_base, 290

ErrorCodesHC_GroupAggregateFailure, 431HC_GroupCanceled, 431HC_GroupGenericError, 431HC_Success, 431HC_TaskAggregateFailure, 431HC_TaskCanceled, 431HC_TaskDspFailure, 431HC_TaskGenericError, 431HC_TaskGpuFailure, 431

ErrorCodes, 431hc_error, 431

Exceptions, 424Exceptions Reference API, 423execute

Affinity Settings, 437extended_format_plane_type

Texture Data Types, 401

filter_modeTexture Data Types, 401

finish_afterGroups, 252hetcompute::group, 236hetcompute::task<>, 311Tasks, 344

getAffinity Settings, 437hetcompute::group_ptr, 247hetcompute::scoped_storage_ptr, 415hetcompute::task_ptr< ReturnType >, 319

hetcompute::task_ptr< ReturnType(Args...)>,325

hetcompute::task_ptr< void >, 328hetcompute::task_ptr<>, 331hetcompute::task_storage_ptr, 419

get_chunk_sizehetcompute::pattern::tuner, 224

get_cl_kernel_binaryhetcompute::gpu_kernel, 268

get_coresAffinity Settings, 437

get_cpu_loadhetcompute::pattern::tuner, 224

get_datahetcompute::pipeline_context< UserData >,

212get_degree_of_concurrency

hetcompute::parallel_stage, 203get_doc

hetcompute::pattern::tuner, 224get_dsp_load

hetcompute::pattern::tuner, 224get_fd

Memory Regions, 393get_first_elem_iter_id

hetcompute::stage_input, 220get_gpu_load

hetcompute::pattern::tuner, 224get_id

Memory Regions, 393get_iter_id

hetcompute::pipeline_context_base, 216get_iter_lag

hetcompute::iteration_lag, 200get_iter_rate_curr

hetcompute::iteration_rate, 201get_iter_rate_pred

hetcompute::iteration_rate, 202get_ith_element

hetcompute::stage_input, 220get_max_stage_iter

hetcompute::pipeline_context_base, 216get_mode

Affinity Settings, 437get_name

hetcompute::group, 238get_num_bytes

Memory Regions, 393get_pin_threads

Affinity Settings, 437

80-P2432-1 B MAY CONTAIN U.S. AND INTERNATIONAL EXPORT CONTROLLED INFORMATION 452

Page 453: Snapdragon™ Heterogeneous Compute SDK...Qualcomm® Snapdragon™ Heterogeneous Compute SDK Documentation and Interface Specification 80-P2432-1 B August 28, 2019 Qualcomm® Snapdragon™

Qualcomm® Snapdragon™ Heterogeneous Compute SDK INDEX

get_ptrMemory Regions, 393, 394

get_sizehetcompute::sliding_window_size, 219

get_stage_idhetcompute::pipeline_context_base, 216

get_typehetcompute::serial_stage, 218

glKernels, 272

glbuffer_memregionMemory Regions, 391

gpu_kernelhetcompute::gpu_kernel, 266, 267

Graphics Reference API, 395group_ptr

hetcompute::group_ptr, 246Groups, 230

create_group, 250–252finish_after, 252intersect, 253operator&, 254

HC_GroupAggregateFailureErrorCodes, 431

HC_GroupCanceledErrorCodes, 431

HC_GroupGenericErrorErrorCodes, 431

HC_SuccessErrorCodes, 431

HC_TaskAggregateFailureErrorCodes, 431

HC_TaskCanceledErrorCodes, 431

HC_TaskDspFailureErrorCodes, 431

HC_TaskGenericErrorErrorCodes, 431

HC_TaskGpuFailureErrorCodes, 431

has_iter_limithetcompute::pipeline_context_base, 216

has_nexthetcompute::aggregate_exception, 425

has_profilehetcompute::pattern::tuner, 225

hc_errorErrorCodes, 431

HetComputeApp::features, 446

hetcompute::abort_task_exception, 424hetcompute::affinity::settings, 434hetcompute::aggregate_exception, 424hetcompute::api_exception, 426hetcompute::beta::call_tuple, 257hetcompute::beta::call_tuple< Dim, gpu_kernel<

Args...> >, 258hetcompute::beta::cl_t, 258hetcompute::beta::gl_t, 265hetcompute::bounded_lfqueue, 404hetcompute::buffer_const_iterator, 368hetcompute::buffer_iterator, 369hetcompute::buffer_ptr, 370hetcompute::canceled_exception, 426hetcompute::cpu_kernel, 258hetcompute::cpu_kernel< FReturnType(FArgs...)>,

260hetcompute::device_set, 360hetcompute::do_not_collapse_t, 299hetcompute::dsp_exception, 427hetcompute::dsp_kernel, 263hetcompute::dsp_kernel< int(∗)(Args...)>, 263hetcompute::error_exception, 427hetcompute::glbuffer_memregion, 387hetcompute::gpu_exception, 428hetcompute::gpu_kernel, 265hetcompute::graphics::image_size, 400hetcompute::graphics::image_size< 1 >, 400hetcompute::graphics::image_size< 2 >, 401hetcompute::graphics::image_size< 3 >, 401hetcompute::group, 231hetcompute::group_ptr, 245hetcompute::hetcompute_exception, 429hetcompute::in, 379hetcompute::index, 273hetcompute::index< 1 >, 273hetcompute::index< 2 >, 273hetcompute::index< 3 >, 273hetcompute::index_base, 274hetcompute::inout, 379hetcompute::ion_memregion, 388hetcompute::iteration_lag, 199hetcompute::iteration_rate, 201hetcompute::lfqueue, 407hetcompute::local, 268hetcompute::main_memregion, 388hetcompute::memregion, 389hetcompute::out, 379hetcompute::parallel_stage, 202hetcompute::pattern::pdivide_and_conquerer, 187

80-P2432-1 B MAY CONTAIN U.S. AND INTERNATIONAL EXPORT CONTROLLED INFORMATION 453

Page 454: Snapdragon™ Heterogeneous Compute SDK...Qualcomm® Snapdragon™ Heterogeneous Compute SDK Documentation and Interface Specification 80-P2432-1 B August 28, 2019 Qualcomm® Snapdragon™

Qualcomm® Snapdragon™ Heterogeneous Compute SDK INDEX

hetcompute::pattern::pfor, 162
hetcompute::pattern::pfor< hetcompute::internal::pointkernel::pointkernel< RT,PKType...>, T2 >, 163
hetcompute::pattern::pfor< T1, void >, 164
hetcompute::pattern::pipeline, 203
hetcompute::pattern::preducer, 177
hetcompute::pattern::pscan, 184
hetcompute::pattern::psorter, 195
hetcompute::pattern::ptransformer, 170
hetcompute::pattern::tuner, 223
hetcompute::pipeline_context< UserData >, 211
hetcompute::pipeline_context<>, 212
hetcompute::pipeline_context_base, 213
hetcompute::range, 280
hetcompute::range< 1 >, 280
hetcompute::range< 2 >, 282
hetcompute::range< 3 >, 285
hetcompute::range_base, 289
hetcompute::scheduler_storage_ptr, 411
hetcompute::scope_acquire_ro, 379
hetcompute::scope_acquire_rw, 380
hetcompute::scope_acquire_wi, 381
hetcompute::scoped_storage_ptr, 414
hetcompute::serial_stage, 217
hetcompute::sliding_window_size, 218
hetcompute::stage_input, 219
hetcompute::svm_memregion, 390
hetcompute::task< ReturnType >, 299
hetcompute::task< ReturnType(Args...)>, 301
hetcompute::task< void >, 304
hetcompute::task<>, 305
hetcompute::task_ptr< ReturnType >, 316
hetcompute::task_ptr< ReturnType(Args...)>, 322
hetcompute::task_ptr< void >, 326
hetcompute::task_ptr<>, 329
hetcompute::task_storage_ptr, 417
hetcompute::thread_storage_ptr, 421
hetcompute::tls_exception, 429
hetcompute_affinity_cores_big
    Affinity Settings, 435
hetcompute_affinity_cores_little
    Affinity Settings, 435
hetcompute_affinity_cores_prime
    Affinity Settings, 435
hetcompute_affinity_mode_override_local_setting
    Affinity Settings, 436
hetcompute_affinity_pin_threads_true
    Affinity Settings, 436
hetcompute_affinity_settings_t, 434
hetcompute::abort_task_exception
    what, 424
hetcompute::aggregate_exception
    has_next, 425
    next, 425
    what, 426
hetcompute::beta::pattern::pipeline
    ~pipeline, 447
    add_stage, 448
    context, 447
    operator=, 448, 449
    pipeline, 447, 448
hetcompute::beta::pattern::pipeline< UserData >, 446
hetcompute::buffer_ptr
    acquire_ro, 372
    acquire_rw, 373
    acquire_wi, 374
    at, 375
    begin, 375
    buffer_ptr, 372
    cbegin, 375
    cend, 376
    const_iterator, 371
    data_type, 371
    end, 376
    host_data, 376
    iterator, 372
    operator=, 376
    release, 377
    saved_host_data, 378
    size, 378
    to_string, 378
    treat_as_texture, 378
hetcompute::canceled_exception
    what, 427
hetcompute::cpu_kernel
    cpu_kernel, 260
hetcompute::cpu_kernel< FReturnType(FArgs...)>
    cpu_kernel, 262
hetcompute::device_set
    add, 362
    device_set, 361
    empty, 362
    negate, 363
    on_cpu, 363
    on_cpu_big, 363
    on_cpu_little, 364
    on_dsp, 364
    on_gpu, 364

    remove, 364, 365
    to_string, 365
hetcompute::dsp_exception
    what, 427
hetcompute::dsp_kernel< int(*)(Args...)>
    ~dsp_kernel, 264
    dsp_kernel, 264
    operator=, 265
hetcompute::error_exception
    what, 428
hetcompute::gpu_exception
    what, 428
hetcompute::gpu_kernel
    get_cl_kernel_binary, 268
    gpu_kernel, 266, 267
    is_cl, 268
    is_gl, 268
hetcompute::group
    add, 232, 233
    cancel, 234
    canceled, 236
    finish_after, 236
    get_name, 238
    intersect, 238
    launch, 238, 240–242
    wait_for, 244
hetcompute::group_ptr
    get, 247
    group_ptr, 246
    operator bool, 247
    operator->, 247
    operator=, 247, 248
    reset, 248
    swap, 248
    unique, 249
    use_count, 249
hetcompute::hetcompute_exception
    ~hetcompute_exception, 429
    what, 429
hetcompute::index_base
    data, 275
    index_base, 274
    operator<, 276
    operator<=, 276
    operator>, 277
    operator>=, 278
    operator+, 275
    operator+=, 275
    operator-, 276
    operator-=, 276
    operator=, 277
    operator==, 277
hetcompute::internal::pointkernel::pointkernel< RT,Args >, 449
hetcompute::internal::task_factory< X, Y, Z >, 449
hetcompute::internal::task_factory_dispatch< X, Y >, 449
hetcompute::iteration_lag
    get_iter_lag, 200
    iteration_lag, 200
    operator=, 200
hetcompute::iteration_rate
    get_iter_rate_curr, 201
    get_iter_rate_pred, 202
    iteration_rate, 201
    operator=, 202
hetcompute::parallel_stage
    get_degree_of_concurrency, 203
    operator=, 203
    parallel_stage, 202, 203
hetcompute::pattern::pdivide_and_conquerer
    create_pdivide_and_conquer, 188
hetcompute::pattern::pipeline
    ~pipeline, 204
    context, 204
    create_task, 205, 206
    disable_sliding_window, 209
    enable_sliding_window, 209
    is_valid, 209
    operator=, 210
    pipeline, 204, 205
    run, 210
hetcompute::pattern::preducer
    create_preduce, 178
hetcompute::pattern::pscan
    create_pscan_inclusive, 184
hetcompute::pattern::psorter
    create_psort, 196
hetcompute::pattern::ptransformer
    create_ptransform, 171
hetcompute::pattern::tuner
    get_chunk_size, 224
    get_cpu_load, 224
    get_doc, 224
    get_dsp_load, 224
    get_gpu_load, 224
    has_profile, 225
    is_serial, 225
    is_static, 225
    set_chunk_size, 225

    set_cpu_load, 225
    set_dsp_load, 226
    set_dynamic, 226
    set_gpu_load, 226
    set_max_doc, 227
    set_profile, 227
    set_serial, 227
    set_static, 227
    tuner, 223
hetcompute::pipeline_context< UserData >
    ~pipeline_context, 212
    get_data, 212
hetcompute::pipeline_context<>
    ~pipeline_context, 213
hetcompute::pipeline_context_base
    ~pipeline_context_base, 214
    cancel_pipeline, 214
    get_iter_id, 216
    get_max_stage_iter, 216
    get_stage_id, 216
    has_iter_limit, 216
    stop_pipeline, 217
hetcompute::range< 1 >
    index_to_linear, 281
    linear_to_index, 282
    linearized_distance, 282
    range, 281
    size, 282
hetcompute::range< 2 >
    index_to_linear, 284
    linear_to_index, 285
    linearized_distance, 285
    range, 283, 284
    size, 285
hetcompute::range< 3 >
    index_to_linear, 288
    linear_to_index, 288
    linearized_distance, 288
    range, 287
    size, 288
hetcompute::range_base
    begin, 289, 290
    dims, 290
    end, 290
    length, 291
    num_elems, 291
    stride, 291
hetcompute::scoped_storage_ptr
    get, 415
    operator bool, 415
    operator pointer_type, 415
    operator*, 415
    operator->, 415
    scoped_storage_ptr, 415
hetcompute::serial_stage
    get_type, 218
    operator=, 218
    serial_stage, 217, 218
hetcompute::sliding_window_size
    get_size, 219
    operator=, 219
    sliding_window_size, 219
hetcompute::stage_input
    ~stage_input, 220
    get_first_elem_iter_id, 220
    get_ith_element, 220
    input_type, 220
    size, 221
hetcompute::task< ReturnType >
    copy_value, 300
    move_value, 300
    return_type, 300
hetcompute::task< ReturnType(Args...)>
    args_tuple, 302
    arity, 304
    bind_all, 302
    launch, 304
    return_type, 302
    size_type, 302
hetcompute::task<>
    cancel, 306
    canceled, 309
    finish_after, 311
    is_bound, 312
    launch, 313
    size_type, 306
    then, 313
    wait_for, 314
hetcompute::task_ptr< ReturnType >
    ~task_ptr, 319
    get, 319
    operator*=, 319
    operator|=, 322
    operator^=, 322
    operator+=, 320
    operator->, 321
    operator-=, 320
    operator/=, 321
    operator=, 321
    operator%=, 319

    operator&=, 319
    return_type, 318
    swap, 322
    task_ptr, 318
    task_type, 318
hetcompute::task_ptr< ReturnType(Args...)>
    ~task_ptr, 325
    args_tuple, 324
    arity, 326
    get, 325
    operator->, 325
    operator=, 325, 326
    return_type, 324
    size_type, 324
    swap, 326
    task_ptr, 324
    task_type, 324
hetcompute::task_ptr< void >
    ~task_ptr, 328
    get, 328
    operator->, 328
    operator=, 329
    swap, 329
    task_ptr, 327, 328
    task_type, 327
hetcompute::task_ptr<>
    ~task_ptr, 331
    get, 331
    operator bool, 332
    operator->, 332
    operator=, 332, 333
    reset, 333
    swap, 333
    task_ptr, 331
    task_type, 331
    unique, 333
    use_count, 334
hetcompute::task_storage_ptr
    get, 419
    operator bool, 419
    operator pointer_type, 419
    operator*, 420
    operator->, 420
    operator=, 420
    task_storage_ptr, 418, 419
hetcompute_affinity_cores_t
    Affinity Settings, 435
hetcompute_affinity_execute
    Affinity Settings, 437
hetcompute_affinity_get
    Affinity Settings, 438
hetcompute_affinity_mode_t
    Affinity Settings, 435
hetcompute_affinity_pin_threads_t
    Affinity Settings, 436
hetcompute_affinity_reset
    Affinity Settings, 438
hetcompute_affinity_set
    Affinity Settings, 438
hetcompute_func_ptr_t
    Affinity Settings, 435
Heterogeneous Compute Device Types, 360
    device_type, 365
    to_string, 366
host_data
    hetcompute::buffer_ptr, 376
image_format
    Texture Data Types, 401
index_base
    hetcompute::index_base, 274
index_to_linear
    hetcompute::range< 1 >, 281
    hetcompute::range< 2 >, 284
    hetcompute::range< 3 >, 288
Indices, 273
init
    Legacy, 445
input_type
    hetcompute::stage_input, 220
Interoperability, 444
intersect
    Groups, 253
    hetcompute::group, 238
ion_memregion
    Memory Regions, 391, 392
is_bound
    hetcompute::task<>, 312
is_cacheable
    Memory Regions, 394
is_cl
    hetcompute::gpu_kernel, 268
is_gl
    hetcompute::gpu_kernel, 268
is_serial
    hetcompute::pattern::tuner, 225
is_static
    hetcompute::pattern::tuner, 225
is_supported
    Texture APIs, 398

is_this_big_core
    Affinity Settings, 439
is_valid
    hetcompute::pattern::pipeline, 209
iteration_lag
    hetcompute::iteration_lag, 200
iteration_rate
    hetcompute::iteration_rate, 201
iterator
    hetcompute::buffer_ptr, 372
Kernels, 256
    cl, 272
    create_cpu_kernel, 269
    create_dsp_kernel, 270
    create_gpu_kernel, 270–272
    gl, 272
launch
    hetcompute::group, 238, 240–242
    hetcompute::task< ReturnType(Args...)>, 304
    hetcompute::task<>, 313
    Tasks, 344, 345
Legacy, 445
    init, 445
    shutdown, 445
length
    hetcompute::range_base, 291
lfqueue
    Unbounded Lock-Free Queue, 408
linear_to_index
    hetcompute::range< 1 >, 282
    hetcompute::range< 2 >, 285
    hetcompute::range< 3 >, 288
linearized_distance
    hetcompute::range< 1 >, 282
    hetcompute::range< 2 >, 285
    hetcompute::range< 3 >, 288
little
    Affinity Settings, 435
main_memregion
    Memory Regions, 392
map
    Texture APIs, 399
Memory Regions, 385
    get_fd, 393
    get_id, 393
    get_num_bytes, 393
    get_ptr, 393, 394
    glbuffer_memregion, 391
    ion_memregion, 391, 392
    is_cacheable, 394
    main_memregion, 392
    s_default_alignment, 394
    svm_memregion, 392
Miscellaneous, 443
mode
    Affinity Settings, 436
move_value
    hetcompute::task< ReturnType >, 300
negate
    hetcompute::device_set, 363
next
    hetcompute::aggregate_exception, 425
non_collapsed_task_type
    Tasks, 335
num_elems
    hetcompute::range_base, 291
on_cpu
    hetcompute::device_set, 363
on_cpu_big
    hetcompute::device_set, 363
on_cpu_little
    hetcompute::device_set, 364
on_dsp
    hetcompute::device_set, 364
on_gpu
    hetcompute::device_set, 364
operator bool
    hetcompute::group_ptr, 247
    hetcompute::scoped_storage_ptr, 415
    hetcompute::task_ptr<>, 332
    hetcompute::task_storage_ptr, 419
operator pointer_type
    hetcompute::scoped_storage_ptr, 415
    hetcompute::task_storage_ptr, 419
operator<
    hetcompute::index_base, 276
operator<=
    hetcompute::index_base, 276
operator>
    hetcompute::index_base, 277
operator>>
    Tasks, 355
operator>=
    hetcompute::index_base, 278
operator*
    hetcompute::scoped_storage_ptr, 415

    hetcompute::task_storage_ptr, 420
    Tasks, 348, 349
operator*=
    hetcompute::task_ptr< ReturnType >, 319
operator|
    Tasks, 356, 357
operator|=
    hetcompute::task_ptr< ReturnType >, 322
operator~
    Tasks, 357
operator^
    Tasks, 355, 356
operator^=
    hetcompute::task_ptr< ReturnType >, 322
operator+
    hetcompute::index_base, 275
    Tasks, 349–351
operator+=
    hetcompute::index_base, 275
    hetcompute::task_ptr< ReturnType >, 320
operator-
    hetcompute::index_base, 276
    Tasks, 352, 353
operator->
    hetcompute::group_ptr, 247
    hetcompute::scoped_storage_ptr, 415
    hetcompute::task_ptr< ReturnType >, 321
    hetcompute::task_ptr< ReturnType(Args...)>, 325
    hetcompute::task_ptr< void >, 328
    hetcompute::task_ptr<>, 332
    hetcompute::task_storage_ptr, 420
operator-=
    hetcompute::index_base, 276
    hetcompute::task_ptr< ReturnType >, 320
operator/
    Tasks, 354
operator/=
    hetcompute::task_ptr< ReturnType >, 321
operator=
    hetcompute::beta::pattern::pipeline, 448, 449
    hetcompute::buffer_ptr, 376
    hetcompute::dsp_kernel< int(*)(Args...)>, 265
    hetcompute::group_ptr, 247, 248
    hetcompute::index_base, 277
    hetcompute::iteration_lag, 200
    hetcompute::iteration_rate, 202
    hetcompute::parallel_stage, 203
    hetcompute::pattern::pipeline, 210
    hetcompute::serial_stage, 218
    hetcompute::sliding_window_size, 219
    hetcompute::task_ptr< ReturnType >, 321
    hetcompute::task_ptr< ReturnType(Args...)>, 325, 326
    hetcompute::task_ptr< void >, 329
    hetcompute::task_ptr<>, 332, 333
    hetcompute::task_storage_ptr, 420
operator==
    Affinity Settings, 440
    hetcompute::index_base, 277
    Tasks, 355
operator%
    Tasks, 347
operator%=
    hetcompute::task_ptr< ReturnType >, 319
operator&
    Groups, 254
    Tasks, 348
operator&=
    hetcompute::task_ptr< ReturnType >, 319
override_local_setting
    Affinity Settings, 436
Parallel Divide-and-Conquer, 187
    create_pdivide_and_conquer, 188
    pdivide_and_conquer, 189, 191
    pdivide_and_conquer_async, 193, 194
Parallel For Loop, 161
    create_pfor_each, 165
    pfor_each, 165–167
    pfor_each_async, 167, 168
Parallel Reduction, 177
    create_preduce, 178
    preduce, 179, 180
    preduce_async, 181, 182
Parallel Scan, 184
    create_pscan_inclusive, 185
    pscan_inclusive, 185
    pscan_inclusive_async, 186
Parallel Sorting, 195
    create_psort, 196
    psort, 196, 197
    psort_async, 197, 198
Parallel Transformation, 170
    create_ptransform, 172
    ptransform, 172–174
    ptransform_async, 174, 175
parallel_stage
    hetcompute::parallel_stage, 202, 203
Patterns Reference API, 160

pdivide_and_conquer
    Parallel Divide-and-Conquer, 189, 191
pdivide_and_conquer_async
    Parallel Divide-and-Conquer, 193, 194
pfor_each
    Parallel For Loop, 165–167
pfor_each_async
    Parallel For Loop, 167, 168
Pipeline, 199
    serial_stage_type, 221
pipeline
    hetcompute::beta::pattern::pipeline, 447, 448
    hetcompute::pattern::pipeline, 204, 205
pop
    Bounded Lock-Free Queue, 405
    Unbounded Lock-Free Queue, 408
preduce
    Parallel Reduction, 179, 180
preduce_async
    Parallel Reduction, 181, 182
prime
    Affinity Settings, 435
pscan_inclusive
    Parallel Scan, 185
pscan_inclusive_async
    Parallel Scan, 186
psort
    Parallel Sorting, 196, 197
psort_async
    Parallel Sorting, 197, 198
ptransform
    Parallel Transformation, 172–174
ptransform_async
    Parallel Transformation, 174, 175
push
    Bounded Lock-Free Queue, 405
    Unbounded Lock-Free Queue, 408
range
    hetcompute::range< 1 >, 281
    hetcompute::range< 2 >, 283, 284
    hetcompute::range< 3 >, 287
Ranges, 280
release
    hetcompute::buffer_ptr, 377
remove
    hetcompute::device_set, 364, 365
reset
    Affinity Settings, 440
    hetcompute::group_ptr, 248
    hetcompute::task_ptr<>, 333
reset_pin_threads
    Affinity Settings, 440
return_type
    hetcompute::task< ReturnType >, 300
    hetcompute::task< ReturnType(Args...)>, 302
    hetcompute::task_ptr< ReturnType >, 318
    hetcompute::task_ptr< ReturnType(Args...)>, 324
run
    hetcompute::pattern::pipeline, 210
s_default_alignment
    Memory Regions, 394
saved_host_data
    hetcompute::buffer_ptr, 378
Scheduler Storage, 411
Scoped Storage, 414
scoped_storage_ptr
    hetcompute::scoped_storage_ptr, 415
serial_stage
    hetcompute::serial_stage, 217, 218
serial_stage_type
    Pipeline, 221
set
    Affinity Settings, 440
set_chunk_size
    hetcompute::pattern::tuner, 225
set_cores
    Affinity Settings, 442
set_cpu_load
    hetcompute::pattern::tuner, 225
set_dsp_load
    hetcompute::pattern::tuner, 226
set_dynamic
    hetcompute::pattern::tuner, 226
set_gpu_load
    hetcompute::pattern::tuner, 226
set_max_doc
    hetcompute::pattern::tuner, 227
set_mode
    Affinity Settings, 442
set_pin_threads
    Affinity Settings, 442
set_profile
    hetcompute::pattern::tuner, 227
set_serial
    hetcompute::pattern::tuner, 227
set_static
    hetcompute::pattern::tuner, 227

settings
    Affinity Settings, 436
shutdown
    Legacy, 445
size
    hetcompute::buffer_ptr, 378
    hetcompute::range< 1 >, 282
    hetcompute::range< 2 >, 285
    hetcompute::range< 3 >, 288
    hetcompute::stage_input, 221
size_type
    hetcompute::task< ReturnType(Args...)>, 302
    hetcompute::task<>, 306
    hetcompute::task_ptr< ReturnType(Args...)>, 324
sliding_window_size
    hetcompute::sliding_window_size, 219
stage_input_base, 449
stop_pipeline
    hetcompute::pipeline_context_base, 217
stride
    hetcompute::range_base, 291
svm_memregion
    Memory Regions, 392
swap
    hetcompute::group_ptr, 248
    hetcompute::task_ptr< ReturnType >, 322
    hetcompute::task_ptr< ReturnType(Args...)>, 326
    hetcompute::task_ptr< void >, 329
    hetcompute::task_ptr<>, 333
Task Storage, 417
task_ptr
    hetcompute::task_ptr< ReturnType >, 318
    hetcompute::task_ptr< ReturnType(Args...)>, 324
    hetcompute::task_ptr< void >, 327, 328
    hetcompute::task_ptr<>, 331
task_storage_ptr
    hetcompute::task_storage_ptr, 418, 419
task_type
    hetcompute::task_ptr< ReturnType >, 318
    hetcompute::task_ptr< ReturnType(Args...)>, 324
    hetcompute::task_ptr< void >, 327
    hetcompute::task_ptr<>, 331
Tasks, 292
    abort_on_cancel, 335
    abort_task, 337
    bind_as_data_dependency, 339
    bind_by_value, 339
    blocking, 339
    collapsed_task_type, 335
    create_task, 340, 341
    create_value_task, 342
    do_not_collapse, 357
    finish_after, 344
    launch, 344, 345
    non_collapsed_task_type, 335
    operator>>, 355
    operator*, 348, 349
    operator|, 356, 357
    operator~, 357
    operator^, 355, 356
    operator+, 349–351
    operator-, 352, 353
    operator/, 354
    operator==, 355
    operator%, 347
    operator&, 348
Tasks Reference API, 229
Texture APIs, 396
    create_derivative_texture, 397
    create_sampler, 397
    create_texture, 398
    is_supported, 398
    map, 399
    unmap, 399
Texture Data Types, 400
    addressing_mode, 401
    extended_format_plane_type, 401
    filter_mode, 401
    image_format, 401
then
    hetcompute::task<>, 313
Thread Storage, 421
to_string
    hetcompute::buffer_ptr, 378
    hetcompute::device_set, 365
    Heterogeneous Compute Device Types, 366
treat_as_texture
    hetcompute::buffer_ptr, 378
Tuner, 223
tuner
    hetcompute::pattern::tuner, 223
Unbounded Lock-Free Queue, 407
    lfqueue, 408
    pop, 408

    push, 408
unique
    hetcompute::group_ptr, 249
    hetcompute::task_ptr<>, 333
unmap
    Texture APIs, 399
use_count
    hetcompute::group_ptr, 249
    hetcompute::task_ptr<>, 334
wait_for
    hetcompute::group, 244
    hetcompute::task<>, 314
what
    hetcompute::abort_task_exception, 424
    hetcompute::aggregate_exception, 426
    hetcompute::canceled_exception, 427
    hetcompute::dsp_exception, 427
    hetcompute::error_exception, 428
    hetcompute::gpu_exception, 428
    hetcompute::hetcompute_exception, 429

Bibliography

[1] Sarita V. Adve and Kourosh Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29:66–76, 1995. 152

[2] Gene M. Amdahl. Validity of the single-processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings, volume 30, pages 483–485, Reston, VA, April 1967. 148

[3] Christopher Barton, Calin Cascaval, and José Nelson Amaral. A characterization of shared data access patterns in upc programs. In Proceedings of the 19th international conference on Languages and compilers for parallel computing, LCPC’06, pages 111–125, Berlin, Heidelberg, 2007. Springer-Verlag. 153

[4] Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition, 2005. 155

[5] Calin Cascaval, Seth Fowler, Pablo Montesinos-Ortego, Wayne Piekarski, Mehrdad Reshadi, Behnam Robatmili, Michael Weber, and Vrajesh Bhavsar. Zoomm: a parallel web browser engine for multicore mobile devices. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP ’13, pages 271–280, 2013. 150

[6] Stephanie Coleman and Kathryn S. McKinley. Tile Size Selection Using Cache Organization and Data Layout. In Proceedings of the ACM SIGPLAN Conference on Programming Languages Design and Implementation (PLDI ’95), La Jolla, CA, June 1995. SIGPLAN. 153

[7] Michael J. Flynn. Some computer organizations and their effectiveness. IEEE Transactions on Computers, C-21(9):948–960, Sept. 1972. 149

[8] Benedict R. Gaster and Lee Howes. Can GPGPU programming be liberated from the data-parallel bottleneck? IEEE Computer, pages 42–52, 2012. 150

[9] John L. Gustafson. Reevaluating Amdahl’s law. Commun. ACM, 31(5):532–533, May 1988. 149

[10] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, second edition, 1996. 152

[11] Mark D. Hill and Michael R. Marty. Amdahl’s law in the multicore era. IEEE Computer, July 2008. 149

[12] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maurer, and D. Shippy. Introduction to the Cell multiprocessor. IBM Journal of Research and Development, 49(4.5):589–604, 2005. http://dx.doi.org/10.1147/rd.494.0589. 151

[13] Milind Kulkarni, Martin Burtscher, Rajeshkar Inkulu, Keshav Pingali, and Calin Cascaval. How much parallelism is there in irregular applications? In Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP ’09, pages 3–14, 2009. 153

[14] Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill. Patterns for Parallel Programming. Addison-Wesley, 2013. 151

[15] NEON intrinsics. http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html, Apr 2013. 149

[16] Qualcomm Research Silicon Valley, http://developer.qualcomm.com/snapdragon-heterogeneous-compute-sdk. Qualcomm® Snapdragon™ Heterogeneous Compute SDK User’s Manual, 1.1.0 edition, Oct 2015. 150

[17] Gabriel Rivera and Chau-Wen Tseng. Data transformations for eliminating conflict misses. In Proceedings of the ACM SIGPLAN Conference on Programming Languages Design and Implementation (PLDI ’98), pages 38–49, June 1998. 153

[18] Anne Rogers and Keshav Pingali. Process decomposition through locality of reference. SIGPLAN Notices, 24(7):69–80, July 1989. 152

[19] Josep Torrellas, Monica S. Lam, and John L. Hennessy. Shared Data Placement Optimizations to Reduce Multiprocessor Cache Miss Rates. In ICPP, pages II–266–II–270, 1990. 152

[20] Michael Wolfe. More Iteration Space Tiling. In Proceedings of Supercomputing ’89, pages 655–664, Reno, NV, November 1989. ACM. 153
