Map/Reduce Jobs for Video Conversion
Group 12
Karun Kumar Y
Kiran Kumar N
Santosh GSK
Problem Statement
Given a video file in HDFS, the task is to convert that file to a specified format.
The converted file should be stored back in HDFS.
The conversion should be done using the Map/Reduce framework.
Throughput should be maximized.
Approach
An input file is given containing the paths of video files and the required format conversion for each video file.
Each video file is loaded from HDFS onto the local file system.
The conversion to the required format is done on the local system.
Open-source libraries and video converters are used.
The converted files are stored back in HDFS.
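The slides do not show the layout of the input file. As a minimal sketch, assuming each line holds an HDFS path followed by the target format separated by whitespace (the function name `parse_job_lines` is hypothetical), the file could be read like this:

```python
from typing import List, Tuple


def parse_job_lines(text: str) -> List[Tuple[str, str]]:
    """Return (hdfs_path, target_format) pairs, one per non-empty line.

    Assumed line layout (not stated in the slides): "<HDFS path> <format>".
    """
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last run of whitespace so the path may contain spaces.
        path, fmt = line.rsplit(None, 1)
        # Normalize ".FLV" / "FLV" / "flv" to "flv".
        jobs.append((path, fmt.lower().lstrip(".")))
    return jobs
```

A line with only one field raises `ValueError`, which is probably preferable to silently skipping a video.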
Commands
copyToLocal(SourceAddress, DestAddress)
◦ This command copies the video file from HDFS to the local file system.
copyFromLocal(SourceAddress, DestAddress)
◦ This command copies the converted video file from the local file system to HDFS.
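The slides do not say whether these copies were issued through the `hadoop fs` shell or through Hadoop's Java API. A small sketch, assuming the shell form, builds the two commands as argument lists; the helper names are hypothetical. In both commands the source comes first and the destination second.

```python
from typing import List


def copy_to_local_cmd(hdfs_src: str, local_dst: str) -> List[str]:
    """Command that copies a file from HDFS to the local file system."""
    return ["hadoop", "fs", "-copyToLocal", hdfs_src, local_dst]


def copy_from_local_cmd(local_src: str, hdfs_dst: str) -> List[str]:
    """Command that copies a local file into HDFS."""
    return ["hadoop", "fs", "-copyFromLocal", local_src, hdfs_dst]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `hadoop` client is installed.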
Video Conversion
FFmpeg software is used for video conversion
FFmpeg is a command-line tool
It is composed of a collection of free software/open source libraries.
It is developed under GNU/Linux, but it can be compiled under most operating systems.
Command to convert a video file
◦ ffmpeg -i <input video file> <output video file with designated format>
Ex: ffmpeg -i video.avi video.flv
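FFmpeg picks the output container from the output file's extension, so the command for a given target format can be derived from the input name. A minimal sketch (the helper `ffmpeg_cmd` is hypothetical, and it assumes the output should sit next to the input):

```python
import os
from typing import List


def ffmpeg_cmd(input_path: str, target_format: str) -> List[str]:
    """Build 'ffmpeg -i <input> <output>' with the extension swapped."""
    base, _old_ext = os.path.splitext(input_path)
    output_path = f"{base}.{target_format}"
    return ["ffmpeg", "-i", input_path, output_path]


print(" ".join(ffmpeg_cmd("video.avi", "flv")))
# prints: ffmpeg -i video.avi video.flv
```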
Map task
In a map task, after a line is read from the input file:
◦ The video file is copied from the HDFS location specified in the input file to the local file system.
◦ The video converter then converts the video to the format specified in the input file.
◦ The converted video file is copied from the local file system back to HDFS.
There is no need for a Reduce task.
[Figure: Map Task. A video file moves from HDFS to the Local File System, through the Video Converter, and back to HDFS.]
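The three steps of the map task can be sketched as one function per input line. This is an illustration, not the presenters' code: the function names are hypothetical, the commands assume the `hadoop fs` shell and the `ffmpeg` binary are on the path, and the converted file is assumed to be written next to the source file in HDFS (the slides do not give the output location). The `run` parameter lets the command runner be swapped out, for example to log commands instead of executing them.

```python
import os
import posixpath
import subprocess
import tempfile
from typing import Callable, List, Optional


def run_checked(cmd: List[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def map_one_line(hdfs_path: str, target_format: str,
                 run: Callable[[List[str]], None] = run_checked,
                 workdir: Optional[str] = None) -> str:
    """Copy in, convert, copy out; return the HDFS path of the result."""
    workdir = workdir or tempfile.mkdtemp(prefix="vidconv-")
    name = posixpath.basename(hdfs_path)
    stem = posixpath.splitext(name)[0]
    local_in = os.path.join(workdir, name)
    local_out = os.path.join(workdir, f"{stem}.{target_format}")
    # Assumption: the output goes into the same HDFS directory as the input.
    hdfs_out = posixpath.join(posixpath.dirname(hdfs_path),
                              f"{stem}.{target_format}")

    run(["hadoop", "fs", "-copyToLocal", hdfs_path, local_in])    # HDFS -> local
    run(["ffmpeg", "-i", local_in, local_out])                    # convert
    run(["hadoop", "fs", "-copyFromLocal", local_out, hdfs_out])  # local -> HDFS
    return hdfs_out
```

Because every line is handled independently and the result is written straight to HDFS, nothing is left for a reducer to aggregate.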
Improving Throughput
The input file is very small.
◦ It contains just the file paths.
◦ But the computation for each line is expensive, since it requires copying and converting a video file.
◦ So the solution is to split the input file across several map tasks.
Contd.
Solution
- The input file is split based on its size in bytes.
- The input split size is calculated from the size of the input file and the maximum number of map tasks allowed in the cluster.
- The input file is distributed equally among the map tasks.
- Each split is assigned to one map task.
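The slides give no formula for the split size. One plausible reading, sketched here as an assumption, is to divide the input file size by the maximum number of map tasks and round up, so that the number of splits never exceeds that maximum. The function names are hypothetical. In Hadoop itself, a computed value like this would typically be supplied through a split-size setting such as `mapreduce.input.fileinputformat.split.maxsize`; which property the presenters used is not stated.

```python
import math
from typing import List, Tuple


def split_size_bytes(input_file_size: int, max_map_tasks: int) -> int:
    """Bytes per split so that at most max_map_tasks splits are produced."""
    if max_map_tasks < 1:
        raise ValueError("max_map_tasks must be at least 1")
    return max(1, math.ceil(input_file_size / max_map_tasks))


def split_ranges(input_file_size: int, max_map_tasks: int) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges, one per map task."""
    step = split_size_bytes(input_file_size, max_map_tasks)
    return [(start, min(step, input_file_size - start))
            for start in range(0, input_file_size, step)]


# A 1000-byte list of paths on a cluster that allows 4 map tasks:
print(split_ranges(1000, 4))
# prints: [(0, 250), (250, 250), (500, 250), (750, 250)]
```

The ranges are byte offsets, not line boundaries; a line-oriented reader assigns a line that straddles a boundary to a single split.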
Challenges faced
When a map task processes even a few video files, it can take a very long time to complete.
◦ So a long time passes before the Reporter reports any progress.
◦ Meanwhile, the framework receives no report from the task and may kill it.
◦ The framework therefore needs to be notified of the task's status.
Solution
• To keep the framework updated on the map task, a separate thread is implemented.
• It updates a counter at every interval.
• The interval is specified by the user.
• The counter information is reported to the framework.
• The thread finishes with the map task.
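The notifier thread can be sketched in a few lines. This is an illustration of the pattern, not the presenters' implementation: the class and function names are hypothetical, and the `report` callback stands in for whatever call reaches the framework (in Hadoop's older `mapred` API that would likely be something like `Reporter.incrCounter` or `Reporter.progress()`).

```python
import threading
import time
from typing import Callable


class ProgressNotifier(threading.Thread):
    """Background thread that bumps a counter at a fixed interval."""

    def __init__(self, report: Callable[[int], None], interval_seconds: float):
        super().__init__(daemon=True)
        self._report = report
        self._interval = interval_seconds  # chosen by the user
        self._done = threading.Event()
        self.ticks = 0

    def run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._done.wait(self._interval):
            self.ticks += 1
            self._report(self.ticks)

    def stop(self) -> None:
        self._done.set()
        self.join()


def run_with_heartbeat(task: Callable[[], object],
                       report: Callable[[int], None],
                       interval_seconds: float):
    """Run task() while the notifier reports; stop it when the task ends."""
    notifier = ProgressNotifier(report, interval_seconds)
    notifier.start()
    try:
        return task()
    finally:
        notifier.stop()  # the thread finishes with the map task
```

Stopping the thread in a `finally` block means it also ends if the conversion fails, so a crashed task does not keep reporting progress.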
Thank you