cudnn library (1)

8/10/2019 CUDNN Library (1)

1/38


2/38

www.nvidia.comcuDNN Library DU-06702-001_v6.5 | 2


3/38


Chapter 1.INTRODUCTION

NVIDIAcuDNN is a GPU-accelerated library of primitives for deep neural networks.It provides highly tuned implementations of routines arising frequently in DNNapplications:

Convolution forward and backward, including cross-correlation

Pooling forward and backward

Softmax forward and backward

Neuron activations forward and backward:

Rectified linear (ReLU)

Sigmoid

Hyperbolic tangent (TANH)

Tensor transformation functions

cuDNN's convolution routines aim for performance competitive with the fastest GEMM

(matrix multiply) based implementations of such routines while using significantly lessmemory.

cuDNN features customizable data layouts, supporting flexible dimension ordering,striding, and subregions for the 4D tensors used as inputs and outputs to all of itsroutines. This flexibility allows easy integration into any neural network implementationand avoids the input/output transposition steps sometimes necessary with GEMM-basedconvolutions.

cuDNN offers a context-based API that allows for easy multithreading and (optional)interoperability with CUDA streams.


4/38


Chapter 2.GENERAL DESCRIPTION

2.1. Programming ModelThe cuDNN Library exposes a Host API but assumes that for operations using the GPUthe data is directly accessible from the device.

The application must initialize the handle to the cuDNN library context by calling thecudnnCreate()function. Then, the handle is explicitly passed to every subsequentlibrary function call that operate on GPU data. Once the application finishes using thelibrary, it must call the function cudnnDestroy()to release the resources associatedwith the cuDNN library context. This approach allows the user to explicitly controlthe library setup when using multiple host threads and multiple GPUs. For example,the application can use cudaSetDevice()to associate different devices with differenthost threads and in each of those host threads it can initialize a unique handle to the

cuDNN library context, which will use the particular device associated with thathost thread. Then, the cuDNN library function calls made with different handle willautomatically dispatch the computation to different devices. The device associatedwith a particular cuDNN context is assumed to remain unchanged between thecorresponding cudnnCreate()and cudnnDestroy()calls. In order for the cuDNNlibrary to use a different device within the same host thread, the application must set thenew device to be used by calling cudaSetDevice()and then create another cuDNNcontext, which will be associated with the new device, by calling cudnnCreate().

2.2. Thread SafetyThe library is thread safe and its functions can be called from multiple host threads,even with the same handle. When multiple threads share the same handle, extreme careneeds to be taken when the handle configuration is changed because that change willaffect potentially subsequent cuDNN calls in all threads. It is even more true for thedestruction of the handle. So it is not recommended that multiple threads share the samecuDNN handle.


5/38

General Description


2.3. ReproducibilityBy design, most of cuDNN API routines from a given version generate the same bit-wise results at every run when executed on GPUs with the same architecture andthe same number of SMs. However, bit-wise reproducibility is not guaranteed acrossversions, as the implementation of a given routine may change. With the current release,the following routines do not guarantee reproducibility because they use atomic addoperations:

cudnnConvolutionBackwardFilter

cudnnConvolutionBackwardData

2.4. RequirementscuDNN supports NVIDIA GPUs of compute capability 3.0 and higher and requires an

NVIDIA Driver compatible with CUDA Toolkit 6.5.


6/38


Chapter 3.CUDNN DATATYPES REFERENCE

This chapter describes all the types and enums of the cuDNN library API.

3.1. cudnnHandle_tcudnnHandle_tis a pointer to an opaque structure holding the cuDNN library context.The cuDNN library context must be created using cudnnCreate()and the returnedhandle must be passed to all subsequent library function calls. The context should bedestroyed at the end using cudnnDestroy(). The context is associated with only oneGPU device, the current device at the time of the call to cudnnCreate(). Howevermultiple contexts can be created on the same GPU device.

3.2. cudnnStatus_tcudnnStatus_tis an enumerated type used for function status returns. All cuDNNlibrary functions return their status, which can be one of the following values:

Value Meaning

CUDNN_STATUS_SUCCESS The operation completed successfully.

CUDNN_STATUS_NOT_INITIALIZED The cuDNN library was not initialized properly.

This error is usually returned when a call to

cudnnCreate()fails or when cudnnCreate()has not been called prior to calling another cuDNN

routine. In the former case, it is usually due

to an error in the CUDA Runtime API called by

cudnnCreate()or by an error in the hardwaresetup.

CUDNN_STATUS_ALLOC_FAILED Resource allocation failed inside the cuDNN

library. This is usually caused by an internal

cudaMalloc()failure.

To correct: prior to the function call, deallocate

previously allocated memory as much as possible.


7/38

cuDNN Datatypes Reference


Value Meaning

CUDNN_STATUS_BAD_PARAM An incorrect value or parameter was passed to the

function.

To correct: ensure that all the parameters being

passed have valid values.

CUDNN_STATUS_ARCH_MISMATCH The function requires a feature absent fromthe current GPU device. Note that cuDNN only

supports devices with compute capabilities greater

than or equal to 3.0.

To correct: compile and run the application on a

device with appropriate compute capability.

CUDNN_STATUS_MAPPING_ERROR An access to GPU memory space failed, which isusually caused by a failure to bind a texture.

To correct: prior to the function call, unbind any

previously bound textures.

Otherwise, this may indicate an internal error/bug

in the library.

CUDNN_STATUS_EXECUTION_FAILED The GPU program failed to execute. This is usuallycaused by a failure to launch some cuDNN kernel

on the GPU, which can occur for multiple reasons.

To correct: check that the hardware, an

appropriate version of the driver, and the cuDNN

library are correctly installed.

Otherwise, this may indicate a internal error/bug

in the library.

CUDNN_STATUS_INTERNAL_ERROR An internal cuDNN operation failed.

CUDNN_STATUS_NOT_SUPPORTED The functionality requested is not presently

supported by cuDNN.

CUDNN_STATUS_LICENSE_ERROR The functionality requested requires some licenseand an error was detected when trying to check

the current licensing. This error can happen if

the license is not present or is expired or if the

environment variable NVIDIA_LICENSE_FILE is not

set properly.

3.3. cudnnTensor4dDescriptor_t

cudnnCreateTensor4dDescriptor_t is a pointer to an opaque structure holdingthe description of a generic 4D dataset. cudnnCreateTensor4dDescriptor()is used to create one instance, and cudnnSetTensor4dDescriptor()orcudnnSetTensor4dDescriptorEx() must be used to initialize this instance.


8/38



3.4. cudnnFilterDescriptor_tcudnnFilterDescriptor_tis a pointer to an opaque structure holding the descriptionof a filter dataset. cudnnCreateFilterDescriptor()is used to create one instance,and cudnnSetFilterDescriptor()must be used to initialize this instance.

3.5. cudnnConvolutionDescriptor_tcudnnConvolutionDescriptor_tis a pointer to an opaque structure holding thedescription of a convolution operation. cudnnCreateFilterDescriptor()is used tocreate one instance, and cudnnSetFilterDescriptor()must be used to initialize thisinstance.

3.6. cudnnPoolingDescriptor_tcudnnPoolingDescriptor_tis a pointer to an opaque structure holding thedescription of a pooling operation. cudnnCreatePoolingDescriptor()is used tocreate one instance, and cudnnSetPoolingDescriptor()must be used to initializethis instance.

3.7. cudnnDataType_tcudnnDataType_tis an enumerated type indicating the data type to which a tensordescriptor or filter descriptor refers.

Value Meaning

CUDNN_DATA_FLOAT The data is 32-bit single-precision floating point

(float).

CUDNN_DATA_DOUBLE The data is 64-bit double-precision floating point

(double).

3.8. cudnnTensorFormat_tcudnnTensorFormat_tis an enumerated type used bycudnnSetTensor4dDescriptor()to create a tensor with a pre-defined layout.

Value Meaning

CUDNN_TENSOR_NCHW This tensor format specifies that the data is laid

out in the following order: image, features map,

rows, columns. The strides are implicitly defined

in such a way that the data are contiguous in

memory with no padding between images, feature

maps, rows, and columns; the columns are the


9/38



Value Meaning

inner dimension and the images are the outermost

dimension.

CUDNN_TENSOR_NHWC This tensor format specifies that the data is laid

out in the following order: image, rows, columns,

features maps. The strides are implicitly defined insuch a way that the data are contiguous in memory

with no padding between images, rows, columns,

and features maps; the feature maps are the

inner dimension and the images are the outermost

dimension.

3.9. cudnnAddMode_tcudnnAddMode_tis an enumerated type used by cudnnAddTensor4d()to specify howa bias tensor is added to an input/output tensor.

Value Meaning

CUDNN_ADD_IMAGEor CUDNN_ADD_SAME_HW In this mode, the bias tensor is defined as one

image with one feature map. This image will be

added to every feature map of every image of the

input/output tensor.

CUDNN_ADD_FEATURE_MAPor

CUDNN_ADD_SAME_CHW

In this mode, the bias tensor is defined as one

image with multiple feature maps. This image

will be added to every image of the input/output

tensor.

CUDNN_ADD_SAME_C In this mode, the bias tensor is defined as oneimage with multiple feature maps of dimension

1x1; it can be seen as an vector of feature maps.Each feature map of the bias tensor will be added

to the corresponding feature map of all height-by-

width pixels of every image of the input/output

tensor.

CUDNN_ADD_FULL_TENSOR In this mode, the bias tensor has the same

dimensions as the input/output tensor. It will be

added point-wise to the input/output tensor.

3.10. cudnnConvolutionMode_t

cudnnConvolutionMode_tis an enumerated type used bycudnnSetConvolutionDescriptor() to configure a convolution descriptor. Thefilter used for the convolution can be applied in two different ways, correspondingmathematically to a convolution or to a cross-correlation. (A cross-correlation isequivalent to a convolution with its filter rotated by 180 degrees.)


10/38



Value Meaning

CUDNN_CONVOLUTION In this mode, a convolution operation will be done

when applying the filter to the images.

CUDNN_CROSS_CORRELATION In this mode, a cross-correlation operation will be

done when applying the filter to the images.

3.11. cudnnConvolutionPath_tcudnnConvolutionPath_tis an enumerated type used by the helper routinecudnnGetOutputTensor4dDim()to select the results to output.

Value Meaning

CUDNN_CONVOLUTION_FWD cudnnGetOutputTensor4dDim()will returndimensions related to the output tensor of the

forward convolution.

CUDNN_CONVOLUTION_WEIGHT_GRAD cudnnGetOutputTensor4dDim()will return thedimensions of the output filter produced while

computing the gradients, which is part of the

backward convolution.

CUDNN_CONVOLUTION_DATA_GRAD cudnnGetOutputTensor4dDim()will return the

dimensions of the output tensor produced while

computing the gradients, which is part of the

backward convolution.

3.12. cudnnAccumulateResult_tcudnnAccumulateResult_tis an enumerated type used bycudnnConvolutionForward(), cudnnConvolutionBackwardFilter() andcudnnConvolutionBackwardData() to specify whether those routines accumulatetheir results with the output tensor or simply write them to it, overwriting the previousvalue.

Value Meaning

CUDNN_RESULT_ACCUMULATE The results are accumulated with (added to the

previous value of) the output tensor.

CUDNN_RESULT_NO_ACCUMULATE The results overwrite the output tensor.

3.13. cudnnSoftmaxAlgorithm_tcudnnSoftmaxAlgorithm_tis used to select an implementation of the softmaxfunction used in cudnnSoftmaxForward()and cudnnSoftmaxBackward().


11/38


12/38


13/38


Chapter 4.CUDNN API REFERENCE

This chapter describes the API of all the routines of the cuDNN library.

4.1. cudnnCreatecudnnStatus_t cudnnCreate(cudnnHandle_t *handle)

This function initializes the cuDNN library and creates a handle to an opaquestructure holding the cuDNN library context. It allocates hardware resources onthe host and device and must be called prior to making any other cuDNN librarycalls. The cuDNN library context is tied to the current CUDA device. To use thelibrary on multiple devices, one cuDNN handle needs to be created for each device.For a given device, multiple cuDNN handles with different configurations (e.g.,different current CUDA streams) may be created. Because cudnnCreateallocatessome internal resources, the release of those resources by calling cudnnDestroywill

implicitly call cudaDeviceSynchronize; therefore, the recommended best practiceis to call cudnnCreate/cudnnDestroyoutside of performance-critical code paths.For multithreaded applications that use the same device from different threads, therecommended programming model is to create one (or a few, as is convenient) cuDNNhandle(s) per thread and use that cuDNN handle for the entire life of the thread.

Return Value Meaning

CUDNN_STATUS_SUCCESS The initialization succeeded.

CUDNN_STATUS_NOT_INITIALIZED CUDA Runtime API initialization failed.

CUDNN_STATUS_ALLOC_FAILED The resources could not be allocated.

4.2. cudnnDestroycudnnStatus_t cudnnDestroy(cudnnHandle_t handle)

This function releases hardware resources used by the cuDNN library. This functionis usually the last call with a particular handle to the cuDNN library. BecausecudnnCreateallocates some internal resources, the release of those resources by


14/38

cuDNN API Reference


calling cudnnDestroywill implicitly call cudaDeviceSynchronize; therefore,the recommended best practice is to call cudnnCreate/cudnnDestroyoutside ofperformance-critical code paths.


CUDNN_STATUS_SUCCESSThe cuDNN context destruction was successful.

CUDNN_STATUS_NOT_INITIALIZED The library was not initialized.

4.3. cudnnSetStreamcudnnStatus_t cudnnSetStream(cudnnHandle_t handle, cudaStream_t streamId)

This function sets the cuDNN library stream, which will be used to execute allsubsequent calls to the cuDNN library functions with that particular handle. If thecuDNN library stream is not set, all kernels use the default (NULL) stream. In particular,this routine can be used to change the stream between kernel launches and then to reset

the cuDNN library stream back toNULL.


CUDNN_STATUS_SUCCESS The stream was set successfully.

4.4. cudnnGetStreamcudnnStatus_t cudnnGetStream(cudnnHandle_t handle, cudaStream_t *streamId)

This function gets the cuDNN library stream, which is being used to execute all calls tothe cuDNN library functions. If the cuDNN library stream is not set, all kernels use the

defaultNULLstream.


CUDNN_STATUS_SUCCESS The stream was returned successfully.

4.5. cudnnCreateTensor4dDescriptorcudnnStatus_t cudnnCreateTensor4dDescriptor(cudnnTensor4dDescriptor_t*tensorDesc)

This function creates a Tensor4D descriptor object by allocating the memory needed to

hold its opaque structure.Return Value Meaning

CUDNN_STATUS_SUCCESS The object was created successfully.



15/38


16/38

cuDNN API Reference


This function initializes a previously created Tensor4D descriptor object, similarly tocudnnSetTensor4dDescriptorbut with the strides explicitly passed as parameters.This can be used to lay out the 4D tensor in any order or simply to define gaps betweendimensions.

At present, some cuDNN routines have limited support for strides; forexample,wStride==1is sometimes required. Those routines will return

CUDNN_STATUS_NOT_SUPPORTED if a Tensor4D object with an unsupported stride is

used. cudnnTransformTensor4dcan be used to convert the data to a supported

layout.

Param In/out Meaning

tensorDesc input/

output

Handle to a previously created tensor descriptor.

datatype input Data type.

n input Number of images.

c input Number of feature maps per image.

h input Height of each feature map.

w input Width of each feature map.

nStride input Stride between two consecutive images.

cStride input Stride between two consecutive feature maps.

hStride input Stride between two consecutive rows.

wStride input Stride between two consecutive columns.

The possible error values returned by this function and their meanings are listed below.


CUDNN_STATUS_SUCCESS The object was set successfully.

CUDNN_STATUS_BAD_PARAM At least one of the parameters n,c,h,wor

nStride,cStride,hStride,wStrideis negative

or dataTypehas an invalid enumerant value.

4.8. cudnnGetTensor4dDescriptorcudnnStatus_tcudnnGetTensor4dDescriptor( cudnnTensor4dDescriptor_t tensorDesc, cudnnDataType_t *dataType, int*n, int*c, int*h, int*w, int*nStride, int*cStride, int*hStride, int*wStride )


17/38

cuDNN API Reference


This function queries the parameters of the previouly initialized Tensor4D descriptorobject.


tensorDesc input Handle to a previously insitialized tensor descriptor.

datatype output Data type.

n output Number of images.

c output Number of feature maps per image.

h output Height of each feature map.

w output Width of each feature map.

nStride output Stride between two consecutive images.

cStride output Stride between two consecutive feature maps.

hStride output Stride between two consecutive rows.

wStride output Stride between two consecutive columns.



CUDNN_STATUS_SUCCESS The operation succeeded.

4.9. cudnnDestroyTensor4dDescriptorcudnnStatus_t cudnnDestroyTensor4dDescriptor(cudnnTensor4dDescriptor_t

tensorDesc)

This function destroys a previously created Tensor4D descriptor object.


CUDNN_STATUS_SUCCESS The object was destroyed successfully.

4.10. cudnnTransformTensor4dcudnnStatus_tcudnnTransformTensor4d( cudnnHandle_t handle, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t destDesc, void *destData )

This function copies the data from one tensor to another tensor with a differentlayout. Those descriptors need to have the same dimensions but not necessarily thesame strides. The input and output tensors must not overlap in any way (i.e., tensorscannot be transformed in place). This function can be used to convert a tensor with anunsupported format to a supported one.


18/38

cuDNN API Reference



handle input Handle to a previously created cuDNN context.

srcDesc input Handle to a previously initialized tensor descriptor.

srcData input Pointer to data of the tensor described by the srcDescdescriptor.

destDesc input Handle to a previously initialized tensor descriptor.

destData output Pointer to data of the tensor described by the destDescdescriptor.



CUDNN_STATUS_SUCCESS The function launched successfully.

CUDNN_STATUS_BAD_PARAM The dimensions n,c,h,wor the dataTypeof thetwo tensor descriptors are different.

CUDNN_STATUS_EXECUTION_FAILED The function failed to launch on the GPU.

4.11. cudnnAddTensor4dcudnnStatus_tcudnnAddTensor4d( cudnnHandle_t handle, cudnnAddMode_t mode, constvoid *alpha, cudnnTensor4dDescriptor_t biasDesc, constvoid *biasData, cudnnTensor4dDescriptor_t srcDestDesc, void *srcDestData )

This function adds the scaled values of one tensor to another tensor. Themodeparametercan be used to select different ways of performing the scaled addition. The amountof data described by thebiasDescdescriptor must match exactly the amount of dataneeded to perform the addition. Therefore, the following conditions must be met:

Except for the CUDNN_ADD_SAME_Cmode, the dimensions h,wof the two tensorsmust match.

In the case of CUDNN_ADD_IMAGEmode, the dimensions n,cof the bias tensor mustbe 1.

In the case of CUDNN_ADD_FEATURE_MAPmode, the dimension nof the bias tensormust be 1 and the dimension cof the two tensors must match.

In the case of CUDNN_ADD_FULL_TENSORmode, the dimensions n,cof the two

tensors must match. In the case of CUDNN_ADD_SAME_Cmode, the dimensions n,w,hof the bias tensor

must be 1 and the dimension cof the two tensors must match.



biasDesc input Handle to a previously initialized tensor descriptor.


19/38

cuDNN API Reference



mode input Addition mode that describe how the addition is performed.

alpha input Scalar factor to be applied to every data element of the bias tensor before

it is added to the output tensor.

srcData input Pointer to data of the tensor described by thebiasDescdescriptor.

srcDestDesc input/

output

Handle to a previously initialized tensor descriptor.

srcDestData input/

output

Pointer to data of the tensor described by the srcDestDescdescriptor.



CUDNN_STATUS_SUCCESS The function executed successfully.

CUDNN_STATUS_BAD_PARAM The dimensions n,c,h,wof the bias tensor referto an amount of data that is incompatible with the

modeparameter and the output tensor dimensionsor the dataTypeof the two tensor descriptors are

different.


4.12. cudnnCreateFilterDescriptorcudnnStatus_t cudnnCreateFilterDescriptor(cudnnFilterDescriptor_t *filterDesc)

This function creates a filter descriptor object by allocating the memory needed to holdits opaque structure,




4.13. cudnnSetFilterDescriptorcudnnStatus_tcudnnSetFilterDescriptor( cudnnFilterDescriptor_t filterDesc,

cudnnDataType_t dataType, intk, intc, inth, intw )

This function initializes a previously created filter descriptor object. Filters layout mustbe contiguous in memory.


20/38

cuDNN API Reference



filterDesc input/

output

Handle to a previously created filter descriptor.

datatype input Data type.

k input Number of output feature maps.

c input Number of input feature maps.

h input Height of each filter.

w input Width of each filter.




CUDNN_STATUS_BAD_PARAM At least one of the parameters k,c,h,wisnegative or dataTypehas an invalid enumerantvalue.

4.14. cudnnGetFilterDescriptorcudnnStatus_tcudnnGetFilterDescriptor( cudnnFilterDescriptor_t filterDesc, cudnnDataType_t *dataType, int*k, int*c, int*h, int*w )

This function queries the parameters of the previouly initialized filter descriptor object.


filterDesc input Handle to a previously created filter descriptor.

datatype output Data type.

k output Number of output feature maps.

c output Number of input feature maps.

h output Height of each filter.

w output Width of each filter.





21/38

cuDNN API Reference


4.15. cudnnDestroyFilterDescriptorcudnnStatus_t cudnnDestroyFilterDescriptor(cudnnFilterdDescriptor_t filterDesc)

This function destroys a previously created Tensor4D descriptor object.



4.16. cudnnCreateConvolutionDescriptorcudnnStatus_t cudnnCreateConvolutionDescriptor(cudnnConvolutionDescriptor_t*convDesc)

This function creates a convolution descriptor object by allocating the memory needed tohold its opaque structure,




4.17. cudnnSetConvolutionDescriptorcudnnStatus_tcudnnSetConvolutionDescriptor( cudnnConvolutionDescriptor_t convDesc, cudnnTensor4dDescriptor_t inputTensorDesc, cudnnFilterDescriptor_t filterDesc, intpad_h, intpad_w, intu, intv, intupscalex, intupscaley, cudnnConvolutionMode_t mode )

This function initializes a previously created convolution descriptor object, accordingto an input tensor descriptor and a filter descriptor passed as parameter. This functionassumes that the tensor and filter descriptors corresponds to the formard convolutionpath and checks if their settings are valid. That same convolution descriptor can bereused in the backward path provided it corresponds to the same layer.


convDesc input/

output

Handle to a previously created convolution descriptor.

inputTensorDesc input Input tensor descriptor used for that layer on the forward path.

filterDesc input Filter descriptor used for that layer on the forward path.


22/38


23/38

cuDNN API Reference


This function initializes a previously created convolution descriptor object. It is similarto cudnnSetConvolutionDescriptorbut every parameter of the convolution must bepassed explicitly.


convDesc input/

output

Handle to a previously created convolution descriptor.

n input Number of images.

c input Number of input feature maps.

h input Height of each input feature map.

w input Width of each input feature map.

k input Number of output feature maps.

r input Height of each filter.

s input Width of each filter.

pad_h input zero-padding height: number of rows of zeros implicitly concatenated onto

the top and onto the bottom of input images.

pad_w input zero-padding width: number of columns of zeros implicitly concatenated

onto the left and onto the right of input images.

u input Vertical filter stride.

v input Horizontal filter stride.

upscalex input Upscale the input in x-direction.

upscaley input Upscale the input in y-direction.

mode input Selects between CUDNN_CONVOLUTIONand CUDNN_CROSS_CORRELATION.




CUDNN_STATUS_BAD_PARAM At least one of the following conditions are met:

One of the parameters u,vis negative.

The parametermodehas an invalid enumerantvalue.

CUDNN_STATUS_NOT_SUPPORTED The parameter upscalexor upscaleyis not 1.


24/38

cuDNN API Reference


4.19. cudnnGetOutputTensor4dDimcudnnStatus_tcudnnGetOutputTensor4dDim( constcudnnConvolutionDescriptor_t convDesc, cudnnConvolutionPath_t path, int*n, int*c, int*h, int*w )

This function returns the dimensions of a convolution's output, given the convolutiondescriptor and the direction of the convolution. This function can help to setup theoutput tensor and allocate the proper amount of memory prior to launch the actualconvolution.


convDesc input Handle to a previously created convolution descriptor.

path input Enumerant to specify the direction of the convolution.

n output Number of output images.

c output Number of output feature maps per image.

h output Height of each output feature map.

w output Width of each output feature map.



CUDNN_STATUS_BAD_PARAM Thepathparameter has an invalid enumerantvalue.


4.20. cudnnDestroyFilterDescriptorcudnnStatus_t cudnnDestroyConvolutionDescriptor(cudnnConvolutionDescriptor_tconvDesc)

This function destroys a previously created convolution descriptor object.




25/38

cuDNN API Reference


4.21. cudnnConvolutionForwardcudnnStatus_tcudnnConvolutionForward( cudnnHandle_t handle, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnFilterDescriptor_t filterDesc, constvoid *filterData, cudnnConvolutionDescriptor_t convDesc, cudnnTensor4dDescriptor_t destDesc, void *destData, cudnnAccumulateResult_t accumulate )

This function executes convolutions or cross-correlations over srcusing the specifiedfilters, returning results in dest.




srcData input Data pointer to GPU memory associated with the tensor descriptor

srcDesc.

filterDesc input Handle to a previously initialized filter descriptor.

filterData input Data pointer to GPU memory associated with the filter descriptor

filterDesc.

convDesc input Previously initialized convolution descriptor.

destDesc input Handle to a previously initialized tensor descriptor.

destData input/

output

Data pointer to GPU memory associated with the tensor descriptor

destDescthat carries the result of the convolution.

accumulate input Enumerant that specifies whether the convolution accumulates with or

overwrites the output tensor.



CUDNN_STATUS_SUCCESS The operation was launched successfully.

CUDNN_STATUS_MAPPING_ERROR An error occured during the texture binding of thefilter data.



26/38

cuDNN API Reference


4.22. cudnnConvolutionBackwardBiascudnnStatus_tcudnnConvolutionBackwardBias( cudnnHandle_t handle, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t destDesc, void *destData, cudnnAccumulateResult_t accumulate )

This function computes the convolution gradient with respect to the bias, which is thesum of every element belonging to the same feature map across all of the images of theinput tensor. Therefore, the number of elements produced is equal to the number offeatures maps of the input tensor.



srcDesc input Handle to the previously initialized input tensor descriptor.


srcDesc.

destDesc input Handle to the previously initialized output tensor descriptor.

destData output Data pointer to GPU memory associated with the output tensor descriptor

destDesc.







One of the parameters n,h,wof the output

tensor is not 1.

The numbers of feature maps of the input

tensor and output tensor differ.

The dataTypeof the two tensor descriptors

are different.

CUDNN_STATUS_NOT_SUPPORTED At least one of the following conditions are met: The width stride of the input tensor is not 1.

The height stride and the width of the input

tensor differ.

The feature map stride of the output tensor is

not 1.


27/38

cuDNN API Reference


4.23. cudnnConvolutionBackwardFiltercudnnStatus_tcudnnConvolutionBackwardFilter( cudnnHandle_t handle, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t diffDesc, constvoid *diffData, cudnnConvolutionDescriptor_t convDesc, cudnnFilterDescriptor_t gradDesc, void *gradData, cudnnAccumulateResult_t accumulate )

This function computes the convolution gradient with respect to the filter coefficients.





srcDesc.

diffDesc input Handle to the previously initialized input differential tensor descriptor.

diffData input Data pointer to GPU memory associated with the input differential tensor

descriptor diffDesc.


gradDesc input Handle to a previously initialized filter descriptor.

gradData input/

output

Data pointer to GPU memory associated with the filter descriptor

gradDescthat carries the result.






CUDNN_STATUS_NOT_SUPPORTED The requested operation is not currently supportedin cuDNN. Your diffDesc is likely not in NCHW

format.

CUDNN_STATUS_MAPPING_ERROR An error occurs during the texture binding of thefilter data.



28/38

cuDNN API Reference


4.24. cudnnConvolutionBackwardDatacudnnStatus_tcudnnConvolutionBackwardData( cudnnHandle_t handle, cudnnFilterDescriptor_t filterDesc, constvoid *filterData, cudnnTensor4dDescriptor_t diffDesc, constvoid *diffData, cudnnConvolutionDescriptor_t convDesc, cudnnTensor4dDescriptor_t gradDesc, void *gradData, cudnnAccumulateResult_t accumulate );

This function computes the convolution gradient with respect to the ouput tensor.



filterDesc input Handle to a previously initialized filter descriptor.

filterData input Data pointer to GPU memory associated with the filter descriptor

filterDesc.

diffDesc input Handle to the previously initialized input differential tensor descriptor.

diffData input Data pointer to GPU memory associated with the input differential tensor

descriptor diffDesc.


gradDesc input Handle to the previously initialized output tensor descriptor.

gradData input/

output

Data pointer to GPU memory associated with the output tensor descriptor

gradDescthat carries the result.






CUDNN_STATUS_NOT_SUPPORTED The requested operation is not currently supportedin cuDNN. Your diffDesc is likely not in NCHW

format.

CUDNN_STATUS_MAPPING_ERROR An error occurs during the texture binding of the

filter data or the input differential tensor data



29/38

cuDNN API Reference


4.25. cudnnSoftmaxForwardcudnnStatus_tcudnnSoftmaxForward( cudnnHandle_t handle, cudnnSoftmaxAlgorithm_t algorithm, cudnnSoftmaxMode_t mode, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t destDesc, void *destData )

This routine computes the softmax function.



algorithm input Enumerant to specify the softmax algorithm.

mode input Enumerant to specify the softmax mode.



srcDesc.



destDesc.





The dimensions n,c,h,wof the input tensor

and output tensors differ.

The datatypeof the input tensor and output

tensors differ.

The parameters algorithmormodehave aninvalid enumerant value.



30/38

cuDNN API Reference


4.26. cudnnSoftmaxBackwardcudnnStatus_tcudnnSoftmaxBackward( cudnnHandle_t handle, cudnnSoftmaxAlgorithm_t algorithm, cudnnSoftmaxMode_t mode, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t srcDiffDesc, constvoid *srcDiffData, cudnnTensor4dDescriptor_t destDiffDesc, void *destDiffData )

This routine computes the gradient of the softmax function.



algorithm input Enumerant to specify the softmax algorithm.

mode input Enumerant to specify the softmax mode.



srcDesc.

srcDiffDesc input Handle to the previously initialized input differential tensor descriptor.

srcDiffData input Data pointer to GPU memory associated with the tensor descriptor

srcDiffData.

destDiffDesc input Handle to the previously initialized output differential tensor descriptor.

destDiffData output Data pointer to GPU memory associated with the output tensor descriptordestDiffDesc.





The dimensions n,c,h,wof the srcDesc,

srcDiffDescand destDiffDesctensorsdiffer.

The strides nStride, cStride, hStride,wStrideof the srcDescand srcDiffDesctensors differ.

The datatypeof the three tensors differs.



31/38

cuDNN API Reference


4.27. cudnnCreatePoolingDescriptorcudnnStatus_t cudnnCreatePoolingDescriptor( cudnnPoolingDescriptor_t*poolingDesc )

This function creates a pooling descriptor object by allocating the memory needed tohold its opaque structure,




4.28. cudnnSetPoolingDescriptorcudnnStatus_t

cudnnSetPoolingDescriptor( cudnnPoolingDescriptor_t poolingDesc, cudnnPoolingMode_t mode, intwindowHeight, intwindowWidth, intverticalStride, inthorizontalStride )

This function initializes a previously created pooling descriptor object.


poolingDesc input/

output

Handle to a previously created pooling descriptor.

mode input Enumerant to specify the pooling mode.

windowHeight input Height of the pooling window.

windowWidth input Width of the pooling window.

verticalStride input Pooling vertical stride.

horizontalStride input Pooling horizontal stride.




CUDNN_STATUS_BAD_PARAM At least one of the parameterswindowHeight,windowWidth, verticalStride,

horizontalStrideis negative ormodehas an

invalid enumerant value.


32/38

cuDNN API Reference


4.29. cudnnGetPoolingDescriptorcudnnStatus_tcudnnGetPoolingDescriptor( cudnnPoolingDescriptor_t poolingDesc, cudnnPoolingMode_t *mode, int*windowHeight, int*windowWidth, int*verticalStride, int*horizontalStride )

This function queries a previously created pooling descriptor object.


poolingDesc input Handle to a previously created pooling descriptor.

mode output Enumerant to specify the pooling mode.

windowHeight output Height of the pooling window.

windowWidth output Width of the pooling window.

verticalStride output Pooling vertical stride.

horizontalStride output Pooling horizontal stride.




4.30. cudnnDestroyPoolingDescriptorcudnnStatus_t cudnnDestroyPoolingDescriptor( cudnnPoolingDescriptor_tpoolingDesc )

This function destroys a previously created pooling descriptor object.



4.31. cudnnPoolingForwardcudnnStatus_tcudnnPoolingForward( cudnnHandle_t handle, cudnnPoolingDescriptor_t poolingDesc, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t destDesc, void *destData )

This function computes pooling of input values (i.e., the maximum or average of severaladjacent values) to produce an output with smaller height and/or width.


33/38

cuDNN API Reference




poolingDesc input Handle to a previously initialized pooling descriptor.



srcDesc.



destDesc.





The dimensions n,cof the input tensor and

output tensors differ.

The datatypeof the input tensor and output

tensors differs.

CUDNN_STATUS_NOT_SUPPORTED ThewStrideof input tensor or output tensor is

not 1.


4.32. cudnnPoolingBackwardcudnnStatus_tcudnnPoolingBackward( cudnnHandle_t handle, cudnnPoolingDescriptor_t poolingDesc, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t srcDiffDesc, constvoid *srcDiffData, cudnnTensor4dDescriptor_t destDesc, constvoid *destData, cudnnTensor4dDescriptor_t destDiffDesc, void *destDiffData )

This function computes the gradient of a pooling operation.



poolingDesc input Handle to the previously initialized pooling descriptor.



34/38

cuDNN API Reference




srcDesc.


srcDiffData input Data pointer to GPU memory associated with the tensor descriptorsrcDiffData.


destData input Data pointer to GPU memory associated with the output tensor descriptor

destDesc.


destDiffData output Data pointer to GPU memory associated with the output tensor descriptor

destDiffDesc.





The dimensions n,c,h,wof the srcDescand

srcDiffDesctensors differ.

The strides nStride, cStride, hStride,wStrideof the srcDescand srcDiffDesctensors differ.

The dimensions n,c,h,wof the destDescand destDiffDesctensors differ.

The stridesnStride, cStride,hStride, wStrideof the destDescand

destDiffDesctensors differ.

The datatypeof the four tensors differ.

CUDNN_STATUS_NOT_SUPPORTED ThewStrideof input tensor or output tensor isnot 1.


4.33. cudnnActivationForward

cudnnStatus_tcudnnActivationForward( cudnnHandle_t handle, cudnnActivationMode_t mode, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t destDesc, void *destData )

This routine applies a specified neuron activation function element-wise over each inputvalue.


35/38

cuDNN API Reference




mode input Enumerant to specify the activation mode.



srcDesc.



destDesc.




CUDNN_STATUS_BAD_PARAM The parametermodehas an invalid enumerantvalue.

CUDNN_STATUS_NOT_SUPPORTED At least one of the following conditions are met:

The dimensions n,c,h,wof the input tensorand output tensors differ.

The datatypeof the input tensor and outputtensors differs.


4.34. cudnnActivationBackwardcudnnStatus_tcudnnActivationBackward( cudnnHandle_t handle, cudnnActivationMode_t mode, cudnnTensor4dDescriptor_t srcDesc, constvoid *srcData, cudnnTensor4dDescriptor_t srcDiffDesc, constvoid *srcDiffData, cudnnTensor4dDescriptor_t destDesc, constvoid *destData, cudnnTensor4dDescriptor_t destDiffDesc, void *destDiffData )

This routine computes the gradient of a neuron activation function.



mode input Enumerant to specify the activation mode.



36/38

cuDNN API Reference




srcDesc.


srcDiffData input Data pointer to GPU memory associated with the tensor descriptorsrcDiffData.


destData input Data pointer to GPU memory associated with the output tensor descriptor

destDesc.


destDiffData output Data pointer to GPU memory associated with the output tensor descriptor

destDiffDesc.




CUDNN_STATUS_BAD_PARAM The parametermodehas an invalid enumerant

value.

CUDNN_STATUS_NOT_SUPPORTED At least one of the following conditions are met:

The dimensions n,c,h,wof the four tensorsdiffer.

The strides nStride, cStride, hStride,

wStrideof the input tensor and the input

differential tensor differ.

The strides nStride, cStride, hStride,

wStrideof the output tensor and the output

differential tensor differ.

The datatypeof the four tensors differs.



37/38


Chapter 5.ACKNOWLEDGMENTS

Some of the cuDNN library routines were derived from code developed by theUniversity of Tennessee and are subject to the Modified Berkeley Software DistributionLicense as follows:

Copyright (c) 2010 The University of Tennessee.

All rights reserved.

Redistribution and use in source and binary forms, with or withoutmodification, are permitted provided that the following conditions aremet: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer listed in this license in the documentation and/or other materials provided with the distribution. * Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOTLIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FORA PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHTOWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOTLIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANYTHEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USEOF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


38/38

cudnn library (1)

Documents