
Australian National University

COMP4560 Advanced Computing Project

FLAC Decoding with GPU Acceleration

Author: Qin Tian

Supervisor: Dr. Eric McCreath

Semester 1, 2017


Acknowledgement

I would like to express my sincere gratitude to my supervisor, Dr. Eric McCreath, for his support, patience, and encouragement throughout the project. His technical advice was essential to the completion of the project. He has also taught me a great deal about doing academic research, which helped me in designing the approach, performing the experiments, collecting the data, evaluating the results, and structuring the presentation and contents of this report. I am grateful for Eric's patience and support when I met challenges and bottlenecks in my project; without them, I could not have achieved the current project outcome.

I would also like to express my gratitude to Prof. Weifa Liang, who regularly taught us the methods and skills for carrying out a research project as well as for delivering a presentation and report.

My thanks also go to my friends Jiajin Huang and Yang Wang, who were also doing projects related to the FLAC audio format. The interesting and inspiring discussions with them expanded my understanding of FLAC codecs and my skills in using different software tools.


Abstract

FLAC (Free Lossless Audio Codec) is one of the common lossless audio codecs, which can compress audio data without any loss of quality. Serial FLAC decoding has little room for improvement in time performance, since FLAC already decodes quickly enough for audio playback. However, for audio editing and complex software applications, better time performance would be beneficial. In this report, two methods of FLAC decoding are implemented and compared: a serial version and a parallel version. The parallel FLAC decoding method accelerates decoding by moving the residual synthesis process to a Graphics Processing Unit (GPU), where different threads work on different sections of the audio concurrently. The results show that the approach improves time efficiency by about 30% compared with the corresponding serial version of FLAC decoding. The time for memory allocation and transfer in the parallel decoding and the optimal number of threads are also discussed in the report.

Key words: Audio Codecs, FLAC Decoding, GPU acceleration


Table of Contents

1 Introduction

2 Background
2.1 FLAC Audio Codec
2.2 FLAC Architecture
2.2.1 Blocking
2.2.2 Interchannel Decorrelation
2.2.3 Prediction
2.2.4 Residual Coding
2.3 GPU Acceleration

3 Approach and Implementation
3.1 Sequential decoding program
3.2 Parallel decoding program
3.3 Runtime Environment
3.4 Comparison Experiments

4 Analysis and Evaluation

5 Conclusion

6 Limitations and Future Work


1 Introduction

The primary goal of the project is to accelerate FLAC decoding using a GPU. The project has 10 stages, as shown in Figure 1. During the first and second stages, the FLAC format and audio decoding were researched. During the third stage, a sequential FLAC decoder was constructed, which took a large share of the project time because of the following challenges of FLAC decoding:

• Variable block size: If the block size were fixed, then after decoding the first frame, the start of the second frame could be located by adding the block size to the offset of the first frame, and the headers of the following frames could be found just as easily. However, since the size varies from block to block in the audio stream, the sync code is needed to locate the start of each frame.

• Insufficient sync code: The sync code at the start of the frame header is a fixed bit pattern made up almost entirely of 1 bits, which may also appear in the middle of the audio stream, increasing the complexity of locating frames.

• Complicated hierarchy of the FLAC format: As illustrated in Figure 2, each information section of the FLAC audio data can be divided into one or more blocks of detailed information, which are decoded by going through the bits of the section and finding the corresponding meanings (e.g. in the frame header, the block-size bits 0010 to 0101 represent 576 × 2^(n−2) samples, where n is the value of the bits, i.e. 576, 1152, 2304 or 4608 samples).
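
The frame-location problem described above can be sketched in C++ as follows. This is a minimal illustration, not the project's decoder: the function names are hypothetical, and it assumes the sync pattern given in the FLAC format description (fourteen bits 11111111111110, followed by a reserved 0 bit and a one-bit blocking-strategy flag), so a candidate frame start is the byte 0xFF followed by 0xF8 or 0xF9. The second function maps the block-size bits 0010 to 0101 to a sample count.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns byte offsets where a FLAC frame *might* start.
// Assumption: sync code 11111111111110 + reserved 0 bit + blocking-strategy
// bit, i.e. byte 0xFF followed by 0xF8 or 0xF9.
std::vector<std::size_t> findSyncCandidates(const std::vector<std::uint8_t>& data) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        if (data[i] == 0xFF && (data[i + 1] & 0xFE) == 0xF8) {
            offsets.push_back(i);  // only a candidate: the pattern can occur in audio data
        }
    }
    return offsets;
}

// Maps the 4-bit block-size code 0010..0101 to 576, 1152, 2304 or 4608 samples.
// Returns 0 for codes outside that range (they use other encodings).
unsigned blockSizeFromCode(unsigned code) {
    if (code < 2 || code > 5) return 0;
    return 576u << (code - 2);
}
```

Because every hit is only a candidate, a real decoder would still have to confirm it, for example by validating the remaining header fields and the header checksum, before treating it as a frame boundary.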

Next, during Stages 4 and 5, the GPU acceleration approach and the CUDA tools were explored. The proposed O(lg n) algorithm was then researched and attempted in the sixth and seventh stages. However, due to the complexity of the algorithm and the scope of the project, the attempt to implement the O(lg n) algorithm was paused in favour of another GPU acceleration approach, in which the signal synthesis part of FLAC decoding is converted into a parallel version. Finally, several experiments were performed on the sequential and the concurrent decoding programs, and the results are recorded and analyzed in the report.

The report begins with an explanation of the FLAC audio codec, the FLAC architecture and GPU acceleration based on CUDA, followed by a description of the approach and implementation, which also covers the comparison experiments. It then analyzes and evaluates the time performance of the programs with various groups of input audio data. The report ends with conclusions based on the evaluation outcomes, together with a clarification of the project limitations and possible future work.

2 Background

This section describes the advantages of the FLAC audio codec and the architecture of FLAC, followed by an explanation of GPU acceleration and the CUDA workflow, which are applied to FLAC decoding.

2.1 FLAC Audio Codec

Similar to the compression and decompression of files by ZIP, an audio codec handles the encoding and decoding of audio streams. There are many common audio formats such as MP3, OGG, and AAC, but many of them are lossy, meaning that part of the information in the audio stream is discarded to improve compression.

FLAC is a lossless audio coding format developed by Josh Coalson from 2001 to 2009. It has become one of the popular audio codecs because of several advantages [2]:

• Lossless: This is one of the most important features of FLAC. Unlike many audio codecs, which improve compression by reducing the quality of the audio data, FLAC can achieve 50-60 percent compression while keeping all the information in the audio stream [2]. In other words, FLAC encodes audio data with a good compression ratio and without any loss of quality.

• Free: The FLAC format is fully published and open source. A free reference implementation, which users can customize, is available on the website of the Xiph.Org Foundation [2].

• Fast: FLAC decodes quickly, which makes it fast enough for audio playback and makes real-time decoding easy to achieve. For audio editing and complex software applications, further improvement in time efficiency would be beneficial.

• Other features: Other advantages include hardware support, flexible metadata, seekability, streamability, convenient CD archiving and error resistance.

With these notable features, it is worthwhile to study FLAC decoding and to improve its time performance further, which may enhance its popularity.


Figure 1: Project Stages


Figure 2: The Hierarchy of FLAC Stream [2]

2.2 FLAC Architecture

Before implementing FLAC decoding, it is necessary to understand how FLAC encodes audio streams. Similar to many audio coders, a FLAC encoder consists of four architectural components: blocking, interchannel decorrelation, prediction and residual coding [2]. After going through these four processes, the raw audio information is encoded into the FLAC format.

2.2.1 Blocking

During the blocking stage, the audio data is split into blocks, and each block is encoded separately as a frame. Each block contains several subblocks, one for each channel. For example, an audio stream with left, center and right channels is divided into blocks with three subblocks each, while a mono stream is split into blocks with only one subblock each. As Figure 2 shows, after blocking, a FLAC stream is created beginning with the 32-bit string "fLaC", followed by the STREAMINFO metadata block, other metadata blocks, and the audio frames.

Figure 3: FLAC Architecture [2]

2.2.2 Interchannel Decorrelation

After blocking the audio, FLAC performs interchannel decorrelation on audio that has multiple channels (e.g. stereo streams), because there is usually a certain amount of correlation between the channels. Mono streams do not need this process. There are four operation modes for interchannel decorrelation:

• Independent: The left and right channels are coded separately.

• Mid-side: The average of the two channels and the difference between the left and right channels are coded, i.e. (Left + Right)/2 and Left − Right.

• Left: The left channel and the difference between the left and right channels are coded, i.e. Left and Left − Right.

• Right: The right channel and the difference between the left and right channels are coded, i.e. Right and Left − Right.

Since the frames of a stereo stream can have different channel assignments, the FLAC encoder tends to choose the best representation for each frame. This architectural component is not discussed further here, since the implementation of the approach is based on mono audio.
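
Although the project itself only handles mono audio, the mid-side mode listed above can be illustrated with a short C++ sketch. It assumes integer samples and that the halving in (Left + Right)/2 rounds down; the function names are hypothetical. The bit discarded by the halving always equals the lowest bit of Left − Right, which is why the decoder can still recover both channels exactly.

```cpp
#include <cstdint>
#include <utility>

// floor(x / 2) for any sign of x (plain '/' would round towards zero).
std::int32_t floorHalf(std::int32_t x) {
    return (x - (x & 1)) / 2;
}

// Encoder side: returns {mid, side} = {floor((L + R) / 2), L - R}.
std::pair<std::int32_t, std::int32_t> toMidSide(std::int32_t left, std::int32_t right) {
    return {floorHalf(left + right), left - right};
}

// Decoder side: returns {left, right}.
// L + R and L - R have the same parity, so the lost bit is (side & 1).
std::pair<std::int32_t, std::int32_t> fromMidSide(std::int32_t mid, std::int32_t side) {
    const std::int32_t sum = mid * 2 + (side & 1);  // exact value of L + R
    return {floorHalf(sum + side), floorHalf(sum - side)};
}
```

For example, Left = 5 and Right = 2 give mid = 3 and side = 3, and decoding those two values returns 5 and 2 again.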


2.2.3 Prediction

Next comes the prediction stage, an important process that contributes to FLAC's compression ratio. Each sample is predicted from a certain number of previous samples (i.e. warm-up samples). Then, instead of encoding the whole sample, only its prediction error is encoded, which needs fewer bits per sample; thus, the data is compressed.

FLAC adopts four types of modeling for audio signals [2]:

• Verbatim: No prediction and no compression. The audio signal is passed directly to the next process and encoded.

• Constant: Used for silent sections, where the signal is constant and can be run-length encoded.

• Fixed Linear Predictor: A fixed linear predictor of up to 4th order is applied.

• FIR Linear Predictor: Unlike the Fixed Linear Predictor, the FIR Linear Predictor supports linear prediction of up to 32nd order.

The prediction signal $s'[n]$ is calculated with the formula below, where $p$ is the prediction order and $a_k$ are the prediction coefficients. The way FLAC calculates the prediction coefficients is not discussed here. For FLAC decoding, the coefficients of the input FLAC audio can be decoded directly from its SubFrameLPC information block, as indicated in Figure 2.

$$s'[n] = \sum_{k=1}^{p} a_k \, s[n-k]$$
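
The prediction formula can be written as a small C++ function. This is a simplified sketch with a hypothetical name: it uses plain integer coefficients exactly as in the formula and leaves out the coefficient quantization and shift that a real FLAC subframe carries.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Predicts sample n from the p previous samples:
//   s'[n] = sum over k = 1..p of a_k * s[n - k]
// coeffs[k - 1] holds a_k; the caller must ensure n >= coeffs.size().
std::int64_t predictSample(const std::vector<std::int32_t>& s,
                           const std::vector<std::int32_t>& coeffs,
                           std::size_t n) {
    std::int64_t acc = 0;
    for (std::size_t k = 1; k <= coeffs.size(); ++k) {
        acc += static_cast<std::int64_t>(coeffs[k - 1]) * s[n - k];
    }
    return acc;
}
```

With the second-order coefficients a_1 = 2 and a_2 = −1 and the samples 1, 3, 5, 7, the function predicts 5 for n = 2 and 7 for n = 3, so a linear ramp is predicted exactly.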

2.2.4 Residual Coding

After the modeling process, the prediction signal is obtained, and the residual (i.e. the prediction error) can be calculated with the formula below. The prediction error $e[n]$ is the result of subtracting the prediction signal from the original one.

$$e[n] = s[n] - s'[n] = s[n] - \sum_{k=1}^{p} a_k \, s[n-k]$$
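
As an illustration of the encoder side, the sketch below computes the residuals of one block in C++. The function name is hypothetical, the first p samples are treated as warm-up samples that are stored unchanged, and coefficient quantization is again left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns e[n] = s[n] - sum_{k=1..p} a_k * s[n - k] for n = p .. s.size() - 1.
// The first p samples (warm-up) produce no residual and are stored verbatim.
std::vector<std::int32_t> computeResiduals(const std::vector<std::int32_t>& s,
                                           const std::vector<std::int32_t>& coeffs) {
    const std::size_t p = coeffs.size();
    std::vector<std::int32_t> e;
    for (std::size_t n = p; n < s.size(); ++n) {
        std::int64_t pred = 0;
        for (std::size_t k = 1; k <= p; ++k) {
            pred += static_cast<std::int64_t>(coeffs[k - 1]) * s[n - k];
        }
        e.push_back(static_cast<std::int32_t>(s[n] - pred));
    }
    return e;
}
```

For the samples 1, 3, 5, 8, 10 with coefficients a_1 = 2 and a_2 = −1, the residuals are 0, 1 and −1, which are much smaller than the samples themselves and therefore cheaper to encode.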


Figure 4: GPU Acceleration [3]

The residuals are then encoded using Rice coding, which is a subset of Golomb coding. To produce a Rice code, an integer parameter m (a power of two) is chosen, and the residual is divided by it, giving a quotient q and a remainder r. The codeword is q in unary format followed by r in binary format. For example, suppose the residual is n = 13 and the parameter is m = 4. Then q = floor(n/m) = 3, which is 0001 or 1110 in unary code. The remainder is r = n mod m = 1, which is 01 when written in log2(m) = 2 bits, so the Rice code is 000101 or 111001.
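
The Rice coding step can be sketched in C++ as follows. The function names are hypothetical, the bits are kept in a text string for readability rather than packed, and the unary part uses q zeros terminated by a 1. The sketch assumes a non-negative input and a parameter m = 2^k with k below 32; the remainder is always written in exactly k bits, and the mapping of signed residuals to non-negative values is not covered.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Rice-encodes a non-negative value with divisor m = 2^k:
// quotient in unary (q zeros, then a 1), remainder in k binary bits.
std::string riceEncode(std::uint32_t value, unsigned k) {
    const std::uint32_t q = value >> k;
    std::string bits(q, '0');
    bits.push_back('1');
    for (unsigned i = k; i-- > 0;) {
        bits.push_back(((value >> i) & 1u) ? '1' : '0');
    }
    return bits;
}

// Decodes one codeword produced by riceEncode with the same k.
std::uint32_t riceDecode(const std::string& bits, unsigned k) {
    std::size_t pos = 0;
    std::uint32_t q = 0;
    while (pos < bits.size() && bits[pos] == '0') {  // count the unary zeros
        ++q;
        ++pos;
    }
    ++pos;  // skip the terminating '1'
    std::uint32_t r = 0;
    for (unsigned i = 0; i < k && pos < bits.size(); ++i, ++pos) {
        r = (r << 1) | (bits[pos] == '1' ? 1u : 0u);
    }
    return (q << k) | r;
}
```

With value 13 and k = 2 (m = 4), the encoder produces 000101, and decoding that string with k = 2 returns 13.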

2.3 GPU Acceleration

As Figure 4 shows, a CPU consists of only a few cores, which makes it best suited to sequential processing. In contrast, a GPU contains a large number of cores that can execute tasks concurrently. A GPU is usually used together with a CPU to accelerate applications: the compute-intensive part of the code is moved to the GPU, and the remaining part is executed on the CPU [3].

The Compute Unified Device Architecture (CUDA), developed by NVIDIA, offers a general-purpose interface for programming the GPU [4]. In the CUDA processing flow (Figure 5), under the direction of the CPU, the data is copied from main memory to GPU memory, the GPU threads process the data in parallel, and the result is copied back to main memory.


Figure 5: Processing flow on CUDA [4]

3 Approach and Implementation

This section discusses the approach and the workflows of both the sequential and the parallel decoding programs, followed by a description of the runtime environment and the comparison experiments.

3.1 Sequential decoding program

The processes of the sequential decoding program are illustrated in Figure 6. After the FLAC audio is loaded into the serial decoder, the 32-bit "fLaC" marker at the beginning of the audio stream is checked. Then the StreamInfo block, which contains the basic information about the audio (e.g. min/max frame size, sample rate, total samples), is decoded. After that, the decoder goes through each frame and decodes the frame header and the SubframeLPC block, including the warm-up samples and coefficients. The residuals in each Rice partition are then decoded, and the samples are restored from them based on the coefficients and the LPC order.

The most time-consuming part of the program is the frame-by-frame loop, whose time cost is likely to increase significantly with larger input audio, where more frames need to be decoded.
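
The first two steps of this workflow, checking the marker and decoding the StreamInfo block, can be sketched in C++ as below. This is not the project's code: the names are hypothetical, and the field widths are an assumption taken from the published FLAC format (a 4-byte metadata block header, then 16-bit min/max block size, 24-bit min/max frame size, 20-bit sample rate, 3-bit channel count minus one, 5-bit sample size minus one and 36-bit total samples, followed by a 16-byte MD5 signature).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct StreamInfo {
    std::uint32_t minBlockSize, maxBlockSize;
    std::uint32_t minFrameSize, maxFrameSize;
    std::uint32_t sampleRate, channels, bitsPerSample;
    std::uint64_t totalSamples;
};

// Reads `count` bits, most significant bit first, starting at bit offset `pos`.
std::uint64_t readBits(const std::vector<std::uint8_t>& d, std::size_t& pos, unsigned count) {
    std::uint64_t v = 0;
    for (unsigned i = 0; i < count; ++i, ++pos) {
        v = (v << 1) | ((d[pos / 8] >> (7 - pos % 8)) & 1u);
    }
    return v;
}

// Returns false if the marker or the first metadata block is not as expected.
bool parseStreamInfo(const std::vector<std::uint8_t>& d, StreamInfo& out) {
    if (d.size() < 42) return false;  // 4 marker + 4 header + 34 body bytes
    if (d[0] != 'f' || d[1] != 'L' || d[2] != 'a' || d[3] != 'C') return false;
    std::size_t pos = 32;
    readBits(d, pos, 1);                        // last-metadata-block flag
    if (readBits(d, pos, 7) != 0) return false; // block type 0 = STREAMINFO
    readBits(d, pos, 24);                       // body length in bytes
    out.minBlockSize = static_cast<std::uint32_t>(readBits(d, pos, 16));
    out.maxBlockSize = static_cast<std::uint32_t>(readBits(d, pos, 16));
    out.minFrameSize = static_cast<std::uint32_t>(readBits(d, pos, 24));
    out.maxFrameSize = static_cast<std::uint32_t>(readBits(d, pos, 24));
    out.sampleRate = static_cast<std::uint32_t>(readBits(d, pos, 20));
    out.channels = static_cast<std::uint32_t>(readBits(d, pos, 3)) + 1;
    out.bitsPerSample = static_cast<std::uint32_t>(readBits(d, pos, 5)) + 1;
    out.totalSamples = readBits(d, pos, 36);
    return true;                                // the MD5 signature is not read here
}
```

The bit reader is needed because several fields, such as the 20-bit sample rate, do not end on byte boundaries.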


Figure 6: Processing flow of sequential decoding program

Figure 7: Processing flow of parallel decoding program

3.2 Parallel decoding program

Figure 7 shows the workflow of the parallel decoding program, the first seven processes of which are identical to those of the sequential decoder. Instead of synthesizing the signal sample by sample for each frame serially, the parallel decoder copies the data, including the warm-up samples, coefficients and LPC orders, from main memory to device memory, where each CUDA block works on one frame and each thread inside the block restores several samples of the audio stream. That is, the signal synthesis tasks are moved to the many threads of the GPU and executed concurrently. As mentioned in the description of the FLAC architecture, the samples are stored as residuals, which are the result of subtracting the predictions from the original signal. Therefore, the signal is restored with the following formula:

$$s[n] = e[n] + s'[n] = e[n] + \sum_{k=1}^{p} a_k \, s[n-k]$$
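
The restoration formula and the frame-level decomposition can be sketched in plain C++. This is a serial CPU illustration, not the project's CUDA kernel: the type and function names are hypothetical, the warm-up vector is assumed to hold exactly p samples, and the way samples are divided among the threads of a block is not modeled. The loop over frames in restoreAll stands in for the CUDA blocks, since no frame depends on another.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Frame {
    std::vector<std::int32_t> warmup;     // first p samples, stored verbatim
    std::vector<std::int32_t> coeffs;     // a_1 .. a_p
    std::vector<std::int32_t> residuals;  // e[n] for n >= p
};

// Restores one frame with s[n] = e[n] + sum_{k=1..p} a_k * s[n - k].
std::vector<std::int32_t> restoreFrame(const Frame& f) {
    const std::size_t p = f.coeffs.size();
    std::vector<std::int32_t> s(f.warmup);
    for (std::size_t i = 0; i < f.residuals.size(); ++i) {
        const std::size_t n = p + i;
        std::int64_t pred = 0;
        for (std::size_t k = 1; k <= p; ++k) {
            pred += static_cast<std::int64_t>(f.coeffs[k - 1]) * s[n - k];
        }
        s.push_back(static_cast<std::int32_t>(f.residuals[i] + pred));
    }
    return s;
}

// Each iteration is independent of the others, so it could run as its own
// unit of parallel work (one CUDA block per frame in the report's design).
std::vector<std::vector<std::int32_t>> restoreAll(const std::vector<Frame>& frames) {
    std::vector<std::vector<std::int32_t>> out(frames.size());
    for (std::size_t b = 0; b < frames.size(); ++b) {
        out[b] = restoreFrame(frames[b]);
    }
    return out;
}
```

A frame with warm-up samples 1 and 3, coefficients 2 and −1, and residuals 0, 1, −1 is restored to 1, 3, 5, 8, 10.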

After completing the signal synthesis process, the parallel decoder copies the results back to main memory and finishes the audio decoding.

Compared with the sequential decoder, the concurrent version saves running time through the parallel execution of the eighth process of the serial FLAC decoder (Figure 6).

However, it introduces the overhead of memory transfer between host and device as well as the time for memory allocation on the GPU, which is discussed in the evaluation section of the report. The running time of the concurrent version is calculated with the equations below.

$$\text{RunningTime} = \text{OverallRunningTime} - \text{MemoryAllocationOverhead} - \text{MemoryTransferOverhead}$$

$$\text{MemoryTransferOverhead} = \text{Time(HostToDevice)} + \text{Time(DeviceToHost)}$$

The CUDA library provides the cudaMalloc() function for memory allocation and the cudaMemcpy() function for copying data between host and device (Figure 8). Their time cost is deducted from the total running time before it is compared with the time performance of the serial version. The overall time cost including the memory transfer overhead is also analyzed.

3.3 Runtime Environment

The programs are run on a machine equipped with a GPU, which is accessed remotely over SSH using a key. The basic information about the machine is as follows:


Figure 8: Overhead of Memory Allocation and Transfer

Operating System: Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-75-generic x86_64)

Hardware: 4.00GHz Intel Core i7 with Nvidia GeForce GTX 1080

3.4 Comparison Experiments

Three experiments were performed on the decoding programs. In the first experiment, a small audio file of only 71KB was loaded into the decoding programs, and 10 runs were performed on each of the sequential and parallel versions. The running times were recorded in milliseconds. The comparison between the two programs gives a quick view of the performance of GPU acceleration for a relatively small audio file.

In the second experiment, a group of eight audio files with sizes from 87KB to 1565KB, increasing by nearly 200KB at a time, was input to the programs. With each audio file, each of the two versions was run ten times, and the average running time of the ten runs was calculated for each program. As a result, each of the eight input FLAC files has two average running times in ms (one for the CPU version and one for the GPU version).

In the third experiment, only the parallel decoder was executed, to explore the effect of the number of threads on decoding time. For each input audio file, the number of threads was set to 64, 128 and 256, and again ten runs were performed for each of the three thread counts. The average running time was also calculated in ms.
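
The repeated-run measurement used in these experiments can be sketched with a small C++ timing helper. The helper and its name are hypothetical and only illustrate the averaging; the report does not state how the project measured time.

```cpp
#include <chrono>
#include <functional>

// Runs `task` `runs` times and returns the mean wall-clock time per run in
// milliseconds. Returns 0.0 if `runs` is not positive.
double meanRunTimeMs(const std::function<void()>& task, int runs) {
    using Clock = std::chrono::steady_clock;
    double totalMs = 0.0;
    for (int i = 0; i < runs; ++i) {
        const Clock::time_point start = Clock::now();
        task();
        const Clock::time_point stop = Clock::now();
        totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return runs > 0 ? totalMs / runs : 0.0;
}
```

A call such as meanRunTimeMs(decodeOnce, 10), where decodeOnce is a placeholder for one decoding run, would give the ten-run average. When the task launches GPU work, it would need to wait for the device to finish (for example with cudaDeviceSynchronize()) before returning, because kernel launches are asynchronous.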


Figure 9: FLAC Format [2]

Figure 10: Decode the Stream Information of FLAC Audios

4 Analysis and Evaluation

As indicated in Figure 9, after decoding the StreamInfo block (the 3rd process in Figures 6 and 7), the basic stream information about the eight audio files is printed out, including the number of frames, the min/max frame size, the sample size, the sample rate and the total number of samples. The table in Figure 10 shows that the number of frames and the number of samples increase significantly with audio size and length.

The time performance of serial and parallel decoding for a 71KB audio file is compared in Figure 11. The running time of the serial version ranges from 10.81 to 11.82 milliseconds, while the parallel one takes 7.52 to 8.74 milliseconds. The result suggests that the GPU approach accelerates FLAC decoding even when the input audio file is small.


Figure 11: Running Time Comparison for Decoding A 71KB FLAC Audio

Figure 12: Running Time Comparison with Different Sizes of FLAC Audios

Figure 13: Decode the Stream Information of FLAC Audios

As shown in Figure 12, the time cost of the serial approach increases significantly with larger FLAC files, while the parallel approach grows more moderately.

However, when the memory transfer overhead is considered, Figure 13 shows that the total running time of the parallel decoding program is much larger than the average time cost of its serial equivalent. Figure 14, which is identical to Figure 12 but with an extra series, clearly shows that the memory transfer overhead is around 540ms and does not change as the input data size increases. This means that the total time cost of the parallel decoder (including the overhead) can be less than that of the sequential one only when the input audio is large enough that the time saved relative to the CPU version exceeds the memory transfer overhead.
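
As a rough worked estimate of that break-even point, assume (hypothetically) that the parallel compute time stays near 70% of the serial time $T_{\text{ser}}$, in line with the roughly 30% average saving reported in this project, and that the overhead stays near 540 ms:

$$0.7\,T_{\text{ser}} + 540\ \text{ms} < T_{\text{ser}} \iff 0.3\,T_{\text{ser}} > 540\ \text{ms} \iff T_{\text{ser}} > 1800\ \text{ms}$$

Under these assumptions, the parallel decoder would only win overall for inputs whose serial decoding takes longer than about 1.8 s. Since the parallel running time grows more moderately than the serial one in Figure 12, the relative saving may increase with file size, in which case the real threshold would be lower.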

Furthermore, Figure 15 shows the time performance of the parallel decoding for the different audio file sizes and different numbers of threads inside a CUDA block. The result does not show any impact from changing the number of threads: the three groups of running times appear to be identical for each of the eight input audio files.

Figure 14: Running Time Comparison with Memory Transfer Time

Figure 15: Running Time Comparison with Different Numbers of Threads

5 Conclusion

In conclusion, applying GPU acceleration to the signal synthesis process of FLAC decoding improves the time performance by nearly 30% on average. Since the largest audio file used in the experiments is 1.57MB, the improvement may be greater when larger audio data is used. Because memory transfer between the CPU and GPU has a roughly fixed time cost, the approach is recommended for decoding applications with large audio inputs.

6 Limitations and Future Work

Some aspects of the work done in the project can be improved further. First, the parallel part of the program is limited, and some compute-intensive portions of FLAC decoding, such as the Rice decoding of the residuals for each frame, are not implemented on the GPU in the project. In the future, decoding whole frames concurrently from start to end needs to be explored and is likely to give good acceleration. Second, only a single GPU (NVIDIA GeForce GTX 1080) is used in the project, without comparison with other hardware. In future work, the decoding performance of a single GPU, multiple GPUs and different types of GPU can be compared and evaluated.

References

[1] Gert-Jan Van Den Braak and Henk Corporaal. R-GPU: A reconfigurable GPU architecture. ACM Trans. Archit. Code Optim., 13(1):12:1–12:24, March 2016.

[2] Josh Coalson. What is FLAC? Xiph.Org Foundation, 2011-2014.

[3] NVIDIA Corporation. What is GPU-accelerated computing?

[4] Tom Williams. Parallel processing platform opens bridge to high performance embedded systems. RTC Magazine, 2014.

[5] Michael Wolfe. Compilers and more: GPU architecture and applications. HPC Wire, 2008.

Appendix I - Project Information
Description: This project would explore the parallel decoding of FLAC using a GPU by using the many cores of the GPU to work on different sections of the data with the objective of improving decoding time performance.
Assessment: Report: 50%, Artefact: 40%, Presentation: 10%

INDEPENDENT STUDY CONTRACT
Note: Enrolment is subject to approval by the projects co-ordinator

SECTION A (Students and Supervisors)

UniID: u5833064

SURNAME: Tian FIRST NAMES: Qin

PROJECT SUPERVISOR (may be external): Dr Eric McCreath

COURSE SUPERVISOR (a RSCS academic): ________________________________________________

COURSE CODE, TITLE AND UNIT: COMP4560 Advanced Computing Project

SEMESTER S1 S2 YEAR: Summer session 2016/2017 (6u) and Semester 1 2017 (6u)

PROJECT TITLE:

O(lg n) FLAC decoding using a GPU

LEARNING OBJECTIVES: The student would gain a good understanding of binary formats, particularly the FLAC audio format, and GPGPU software development, with a focus on looking at performance and scaling relating to the decoding of a FLAC audio file. More generally the project would strengthen the programming and problem solving abilities along with research skill associated with exploring approaches and ideas and then implementing, testing and evaluating these approaches and ideas.

Also it is expected that the student would gain general skills relating to: writing a report, and giving a seminar.

Research School of Computer Science Form updated Jun-12

PROJECT DESCRIPTION: The project will explore the FLAC lossless audio format. This format has grown in popularity as a lossless audio format because the format is: “free”, relatively simple, and effective for compressing audio data. This project would explore the parallel decoding of FLAC using a GPU. The approach would work using the many cores of the GPU to work on different sections of the data with the objective of improving decoding time performance. This would involve implementing an O(lg n) algorithm (where n is the number of samples and we assume we also have n processors). The challenge is that the format is inherently serial, as the byte length of each frame is unknown and varies between frames. Fortunately, at the beginning of every frame there is a marker which signifies the beginning of that frame (noting that this marker sometimes also appears within the middle of a frame, which makes things a little more complex). The approach would do most of the work on the GPU, dividing the FLAC file up into sections and having different threads work concurrently on different sections. The challenge would be decoding and combining the results concurrently. The performance and the performance bottlenecks will be evaluated for the proposed approach; in particular, memory transfer time and synchronization costs will be analysed.

Given that the serial approach for FLAC decoding is fast, there is only a small room for improvement, and this would only be worthwhile for long recordings. Such performance improvement would be most useful for audio editing or trans-coding software, as the simple serial approaches are easily fast enough for audio playing.

The project report will contain:
+ An introduction to the topic.
+ A background section which describes the FLAC format.
+ A section which provides a background to GPU computing.
+ A description of the algorithm for decoding.
+ A description of the implementation.
+ An experimental chapter which describes the hardware used for evaluation, the experiments done, and the results tabulated/graphed.
+ A conclusion/discussion/limitations/future work chapter.

ASSESSMENT (as per course's project rules web page, with the differences noted below):

Assessed project components | % of mark | Due date | Evaluated by
Report: name style: _____________________________ (e.g. research report, software description...) | 50% | |
Artefact: name kind: ____________________________ (e.g. software, user interface, robot...) | 40% | |
Presentation | 10% | |

MEETING DATES (IF KNOWN): During the summer session every few days, and then during semester 1 2017 weekly.

STUDENT DECLARATION: I agree to fulfil the above defined contract:

………………………………………………….. ………………………..Signature Date

SECTION B (Supervisor):I am willing to supervise and support this project. I have checked the student's academic record and believe this student can complete the project.

………………………………………………….. ………………………..Signature Date

REQUIRED DEPARTMENT RESOURCES: + Most of the development can be done on the student's laptop.

SECTION C (Course coordinator approval)

………………………………………………….. ………………………..Signature Date

SECTION D (Projects coordinator approval)

………………………………………………….. ………………………..Signature Date

Appendix II - README File
Nine self-made audio files named by size in kilobytes
Sequential signal restore function
Parallel signal restore function
The FLAC decoder main function
Appendix III - Produced Programs
Note: The parallel part is written as CUDA code. The FLAC decoder is written in C and C++. Some of the functions are based on the serial audio decoding example (Java program) provided by the supervisor.
Runtime Environment: Operating System: Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-75-generic x86_64). Hardware: 4.00GHz Intel Core i7 with Nvidia GeForce GTX 1080