January 27, 2014 Sam Siewert
Computer and Machine Vision
Lecture Week 3
Outline of Week 3
Processing Images and Moving Pictures – High-Level View and Computer Architecture for It
Linux Platforms for Computer/Machine Vision
I/O, Memory and Processing Challenges
Old School Moving Picture Media and Cameras
NTSC OTA (1941, 1953 color, 2009 dead)
– Analog, Interlaced, Continuous Broadcast Transmission or CCTV (Closed Circuit TV)
– Coax Cable or Tuner with Immediate CRT Display
– No Buffers, No Routing, No De-mux
– No Compression
Analog Cable
AM/FM OTA
Film Projectors
Modern Digital Cameras
Camera Link
– High Frame Rates
– High Data Rates and Resolutions
– Industry Standard for Machine Vision Automation
– E.g. Inspection Systems
– E.g. Sony, IDT, National Instruments
SD-SDI and HD-SDI
– Standard and High Definition Serial Digital Interface
– Standard for Studios, Broadcast
Digital Cinema
– Red Camera
– 1080p, 2K, 4K Resolutions and Much Higher
– Automated Digital Delivery and Projection
Webcams and Mobile Phone Cameras
– Very Low Cost
– Proprietary
– Performance Varies Dramatically
Differences
Analog vs Digital Encoding for Transmission
– Digital Allows for Image Processing
– Adds Latency
– Requires Compression for Packet Switched Networks and Storage
Routed (Diversely), Buffered
Compressed (MPEG, JPEG) to Lower Bit-rates
Multiplexed (Shares Transmission Carrier for Audio, Video, Channels)
Transported by IP (Large Packets)
Continuous Transmission – Analog or Constant Bit-Rate / Frame-Rate
E.g. UAV Latency and Jitter
"Verification of Video Frame Latency Telemetry for UAV Systems Using a Secondary Optical Method", Sam Siewert, Muhammad Ahmad, Kevin Yao
NTSC (Analog TV)
AM Video to CRT
FM Audio
Chroma Added Later
Odd/Even Lines (Interlaced)
29.97 FPS (30 before color)
Vertical Blanking (CRT Retrace Time, Closed Captioning)
525 Lines, 262.5 per Field, 59.94 Fields per Second (60 before color)
http://en.wikipedia.org/wiki/File:Ntsc_channel.svg
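The NTSC numbers above are tied together by simple arithmetic, which the following minimal sketch works through. It assumes the exact color frame rate is 30000/1001 (the 29.97 FPS figure rounded); the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define NTSC_LINES_PER_FRAME  525
#define NTSC_FIELDS_PER_FRAME 2   /* odd and even interlaced fields */

/* Frames per second: 30 for monochrome, 30000/1001 (about 29.97) for color. */
double ntsc_frame_rate(int color)
{
    assert(color == 0 || color == 1);
    return color ? 30000.0 / 1001.0 : 30.0;
}

/* Fields per second: two fields per frame. */
double ntsc_field_rate(int color)
{
    return NTSC_FIELDS_PER_FRAME * ntsc_frame_rate(color);
}

/* Lines per field: 525 / 2 = 262.5. */
double ntsc_lines_per_field(void)
{
    return (double)NTSC_LINES_PER_FRAME / NTSC_FIELDS_PER_FRAME;
}

/* Horizontal line rate in Hz: lines per frame times frames per second. */
double ntsc_line_rate_hz(int color)
{
    return NTSC_LINES_PER_FRAME * ntsc_frame_rate(color);
}

/* Print the derived timing for monochrome and color NTSC. */
void ntsc_print_timing(void)
{
    for (int color = 0; color <= 1; color++) {
        printf("%s: %.3f frames/s, %.3f fields/s, %.1f lines/field, %.2f Hz line rate\n",
               color ? "color" : "mono",
               ntsc_frame_rate(color), ntsc_field_rate(color),
               ntsc_lines_per_field(), ntsc_line_rate_hz(color));
    }
}
```

Calling ntsc_print_timing() from a main() shows that the drop from 30 to 29.97 FPS also lowers the field rate and the horizontal line rate by the same factor of 1000/1001.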
Linux in Computer Vision
Embedded Solutions
– Texas Instruments OMAP (Beagle xM, Bone)
– Numerous ARM SoCs (NVIDIA, Qualcomm, Broadcom, …)
Scalable Solutions
– Multi-Core (Xeon Phi)
– Vector Processing
– CUDA, OpenCL GPU and GP-GPU
Computer and Machine Vision is I/O, Memory and Processing Intensive
Camera Interfaces
CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) Detector
– Integration Time for Photo-sensitive Elements in Array (to Build up Charge)
– Read-out Time to Sample Elements in Array
Luminance and Chroma Analog to Digital Conversion
Double Buffer for Read-out + Processing
Frame Capture
– http://www.cse.uaa.alaska.edu/~ssiewert/a485_doc/Frame-Capture-Chips/
– Host Interface over PCI Bus or USB
Digital Video Transport QoS
Latency
– To Tune in a Program, Turn-on
– To Deliver a Video Frame or Audio PCM Sample
– To Start, FF, REW, Start-Over, Pause
Bandwidth
– Resolution, Lossy/Lossless Compression, High Motion
– Pixel Encoding for Color
– Frame Rate
– Constant Bit-rate Transport?
– Variable Bit-rate Transport and Encoding?
Jitter
– Decode and Presentation Rates
– Elasticity in Decode to Presentation Buffering Necessary
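To make the bandwidth bullets concrete, here is a minimal sketch of the uncompressed bit-rate arithmetic: resolution times pixel encoding times frame rate. The function names and the example link speed are assumptions for illustration, not values from the lecture.

```c
#include <assert.h>

/* Uncompressed video bit rate in megabits per second (1 Mbps = 1e6 bits/s). */
double raw_video_mbps(int width, int height, int bits_per_pixel, double fps)
{
    assert(width > 0 && height > 0 && bits_per_pixel > 0 && fps > 0.0);
    return (double)width * height * bits_per_pixel * fps / 1.0e6;
}

/* Compression ratio needed to fit a raw stream into a given link rate. */
double required_compression_ratio(double raw_mbps, double link_mbps)
{
    assert(link_mbps > 0.0);
    return raw_mbps / link_mbps;
}
```

For example, raw_video_mbps(1920, 1080, 24, 30.0) is about 1493 Mbps, so carrying it over a hypothetical 10 Mbps link would need roughly 149:1 compression, which is why lossy encoding is the norm for packet-switched transport.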
Linux System Options
(Linux for Image Processing, Camera Interfacing and Computer Vision)
Processing Outline
Many-Core Linux Host(s)
– Intel Atom
– ARM
– Xeon
GP-GPU Vector Processing PCI-E Co-Processors
– NVIDIA Tesla/Fermi
– AMD ATI
NPTL – Native POSIX Thread Library
NPTL Example Code Walkthrough
Conceptual View of RT Resources
Three-Space View of Utilization Requirements
– CPU Margin?
– IO Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?
Upper Right Front Corner – Low-Margin
Origin – High-Margin
Mobile – Must Consider Battery Life Too (Power)
[Figure: three-axis utilization space with axes CPU-Utility, IO-Utility, Memory-Utility]
Processing – Initial Focus
Processing and Scaling Frame Transformation, Encode, Decode is Critical
Memory for Buffering (Frame Transformations, CPU Integrated or GPU Offloaded – e.g. Linux VDPAU)
I/O for Networking (Transport)
I/O for Storage (On-Demand, Post, Non-Linear Editing)
Flynn’s Computer Architecture Taxonomy

              | Single Instruction                             | Multiple Instruction
Single Data   | SISD (Traditional Uni-processor)               | MISD (Voting schemes and active-active controllers)
Multiple Data | SIMD (e.g. SSE 4.2, GP-GPU, Vector Processing) | MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
GPC has gone MIMD with SIMD Instruction Sets and SIMD Offload (GP-GPU)
NUMA vs. UMA (Trend away from UMA to NUMA or MCH vs. IOH)
SMP with One OS (Shared Memory, CPU-balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible)
MIMD - Single Program Multi-Data vs. Multi-Program Multi-Data
Computer and Machine Vision
Treated as a Real-time and/or Interactive System
– Requires Predictable Response (By Deadline)
– Rate Monotonic
– Earliest Deadline First
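Predictable response by deadline is usually checked with a utilization test before any code runs. The sketch below is one way to do that, under stated assumptions: services are periodic and independent with deadline equal to period, the EDF check is the classic total-utilization test (U <= 1), and the Rate Monotonic check uses the hyperbolic bound (product of (U_i + 1) <= 2), which is a sufficient but not necessary condition and is swapped in here because it needs no math library. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One periodic service; deadline is assumed equal to the period. */
typedef struct {
    double wcet;    /* C_i: worst-case execution time */
    double period;  /* T_i: release period */
} service_t;

/* Total CPU utilization U = sum of C_i / T_i. */
double total_utilization(const service_t *s, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(s[i].period > 0.0);
        u += s[i].wcet / s[i].period;
    }
    return u;
}

/* EDF on one CPU: the set is feasible when U <= 1. */
int edf_feasible(const service_t *s, size_t n)
{
    return total_utilization(s, n) <= 1.0;
}

/* Rate Monotonic sufficient test (hyperbolic bound).
 * Returns 1 when the set is guaranteed feasible under RM.
 * Returns 0 when the bound is exceeded, which is inconclusive:
 * the set may still be feasible and needs an exact analysis. */
int rm_hyperbolic_bound_ok(const service_t *s, size_t n)
{
    double product = 1.0;
    for (size_t i = 0; i < n; i++)
        product *= (s[i].wcet / s[i].period) + 1.0;
    return product <= 2.0;
}
```

For three services with 10 ms of work each at periods of 30, 40 and 60 ms, U is 0.75 and the product is about 1.94, so both checks pass. A set that passes the EDF check but not the hyperbolic bound is a candidate for dynamic-priority scheduling or for an exact fixed-priority analysis.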
CPU Scheduling Taxonomy
[Figure: taxonomy tree rooted at Execution Scheduling]
Execution Scheduling
– Global-MP: Distributed, Asymmetric (AMP), Symmetric (SMP OS), SMT (Micro-Parallel)
  (Preemptive, Non-Preemptive Subtree Under Each Global-MP Leaf)
– Local-Uniprocessor: Preemptive, Non-Preemptive
Preemptive: Fixed-Priority, Hybrid, Dynamic-Priority
Non-Preemptive: Cooperative, Batch
Batch: FCFS, SJN
Fixed-Priority: Rate Monotonic, Deadline Monotonic
Remaining leaf labels in the figure: Co-Routine, Continuation Function, Heuristic, EDF/LLF, RR Timeslice (desktop), Multi-Frequency Executives, Static, Dynamic, Dataflow
Response Latency
[Figure: response timeline along a Time axis. Events: Event Sensed, Interrupt, Dispatch, Preemption, Dispatch, Completion (IO Queued), Actuation (IO Completion). Intervals: Input-Latency, Dispatch-Latency, Execution, Interference, Execution, Output-Latency.]
Components: Ci WCET, Input/Output Latency, Interference Time
Response Time = TimeActuation – TimeSensed (From Release to Response)
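The response-time definition above can be measured directly by timestamping the sensed event and the actuation. The following is a minimal sketch, assuming a POSIX system with clock_gettime() and CLOCK_MONOTONIC; the helper names and the callback-based measurement are hypothetical, chosen only to illustrate the subtraction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Elapsed time (stop - start) in seconds between two timestamps. */
double timespec_delta_sec(const struct timespec *start, const struct timespec *stop)
{
    return (double)(stop->tv_sec - start->tv_sec)
         + (double)(stop->tv_nsec - start->tv_nsec) / 1.0e9;
}

/* Timestamp before and after a service routine and return
 * Response Time = TimeActuation - TimeSensed in seconds.
 * Dispatch latency, interference and execution all fall inside this window. */
double measure_response_sec(void (*service)(void))
{
    struct timespec sensed, actuated;

    assert(service != NULL);
    clock_gettime(CLOCK_MONOTONIC, &sensed);    /* event sensed */
    service();                                  /* work, including any preemption */
    clock_gettime(CLOCK_MONOTONIC, &actuated);  /* actuation complete */
    return timespec_delta_sec(&sensed, &actuated);
}

/* A response is on time when it finishes at or before the deadline. */
int deadline_met(double response_sec, double deadline_sec)
{
    return response_sec <= deadline_sec;
}
```

A monotonic clock is used so that wall-clock adjustments do not distort the measured interval. For a 30 FPS pipeline, a per-frame deadline of 1/30 s would be a natural value to pass to deadline_met().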
SIMD Vector Instructions
Intel MMX, SSE 1, 2, 3, 4.x
Code Generation Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement)
– http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms/
PSF (Point Spread Function)
Offload, Co-Proc, Vector Proc
1. GPU (Graphics Processing Units)
– Evolved for Consumer CGI and Games
  Physics Engines
  3D Rendering + Texture (4D Vector Operations)
  Game Engines and Simulation
  HD Output: HDMI, HD-SDI, Headless GP-GPU
– Higher End Used for Digital Cinema / Post Production, Broadcast
  PNY Quadro FX
  NVIDIA CUDA for Post
– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. - http://www.elementaltechnologies.com/
2. Built-In SIMD Instruction Set Extensions
– Intel SSE
GP-GPU, What Is It?
Ideal for Large Bitwise, Integer, and Floating Point Vector Math
Flynn’s Taxonomy SIMD
Architecture often leverages GP-GPU Co-Processors or Cell for MPMD
              | Single Instruction/Prog                                                         | Multiple Instruction
Single Data   | SISD (Traditional Uni-processor)                                                | MISD (Voting schemes and active-active controllers)
Multiple Data | SIMD (SSE 4.2, Vector Processing); SPMD (Single Program Multiple Data), GP-GPU | MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
SSE – Streaming SIMD Extensions
128-bit registers known as XMM0 through XMM7 (XMM0 through XMM15 in 64-bit mode)
Large Operands and Operators (Multi-Word)
E.g. 128-bit XOR of Two Operands
Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)
– E.g. 4 Component Vector addition
– 4 Single Precision Pixel Multiply and Accumulate in Single Instruction
vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;
16 operations to load 2 operands, add, store

movaps xmm0,address-of-v1          ;xmm0=v1.w | v1.z | v1.y | v1.x
addps xmm0,address-of-v2           ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
movaps address-of-vec_res,xmm0
3 SSE operations to load, add, store
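The same four-component add can be written in C with compiler intrinsics instead of hand-written assembly. This is a minimal sketch, not the lecture's code: the vec4_t type and vec4_add() name are hypothetical, the unaligned load/store intrinsics (_mm_loadu_ps, _mm_storeu_ps) are used so the structs need no 16-byte alignment (movaps would require it), and a scalar fallback is compiled when the target has no SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Four packed single-precision components, laid out contiguously. */
typedef struct {
    float x, y, z, w;
} vec4_t;

/* res = v1 + v2, one SIMD add when SSE is available. */
void vec4_add(const vec4_t *v1, const vec4_t *v2, vec4_t *res)
{
    assert(v1 != NULL && v2 != NULL && res != NULL);
#if defined(__SSE__)
    __m128 a = _mm_loadu_ps(&v1->x);            /* a = v1.w | v1.z | v1.y | v1.x */
    __m128 b = _mm_loadu_ps(&v2->x);            /* b = v2.w | v2.z | v2.y | v2.x */
    _mm_storeu_ps(&res->x, _mm_add_ps(a, b));   /* four adds in one instruction */
#else
    /* Scalar fallback: one add per component. */
    res->x = v1->x + v2->x;
    res->y = v1->y + v2->y;
    res->z = v1->z + v2->z;
    res->w = v1->w + v2->w;
#endif
}
```

With 16-byte-aligned data, _mm_load_ps and _mm_store_ps map to the movaps form shown on the slide.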
Scheduling Parallel/Cluster HW
MIMD
– OS SMP threading, provides load balancing, affinity operations, routable interrupts (e.g. MSI-X), e.g. NPTL
– RTOS AMP is most often used in Embedded Systems
MPMD
– OpenCL, CUDA, DirectCompute (DirectX extension)
– Intel OpenMP, Linux Cluster, MPI
How Does NPTL Work?
No Thread Manager or M-on-N Mapping
– Previous POSIX Threading Model
– Manager Becomes Bottleneck
– Two-Level Scheduling Not Deterministic
– Many Pthreads (M) to N Kernel Threads Still an Issue
– O(n) Scheduling for each Manager
Direct Mapping of User to Kernel Thread or 1-to-1
– User Space Pthread Maps Directly onto Kernel Thread (Requires Root privilege)
– Deterministic (Non-Determinism due to Kernel Preemptability Issues)
– O(1) Scheduling
Scheduling Policies Selectable Similar to RTOS Tasking
Linux NPTL Scheduling Policies
Fixed Priority Preemptive
– SCHED_FIFO – This is Priority Preemptive
– SCHED_RR – This is Fair, but at Kernel Level
– SCHED_OTHER – This is OS default and should not be used for real-time services
POSIX Threads have
– Policy (FIFO, RR, OTHER)
– Priority (RT min to RT max)
– Creation (Fork)
– Join (Wait for thread completion at rendezvous)
– Synchronization Methods
  Semaphores
  Message Queues
– Asynchronous Communication Methods
  Signals
  Queued Signals
POSIX RT Extensions Include
– Virtual Timer Services
– Signals Tied to Timer Services
– Priority Inversion Protection (Availability on Linux TBD)
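The "RT min to RT max" priority range is platform specific, so it is worth querying rather than hard-coding. The sketch below uses the POSIX calls sched_get_priority_min() and sched_get_priority_max(); the prio_range_t type and the helper names are hypothetical and only illustrate the lookup.

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Priority range reported by the OS for one scheduling policy. */
typedef struct {
    int min;
    int max;
} prio_range_t;

/* Query the valid priority range for a policy such as SCHED_FIFO. */
prio_range_t policy_priority_range(int policy)
{
    prio_range_t r;
    r.min = sched_get_priority_min(policy);
    r.max = sched_get_priority_max(policy);
    return r;
}

/* Pick a priority 'offset' levels below the maximum, clamped to the minimum. */
int priority_below_max(int policy, int offset)
{
    prio_range_t r = policy_priority_range(policy);
    int prio;

    assert(offset >= 0);
    prio = r.max - offset;
    return prio < r.min ? r.min : prio;
}

/* Print the range for the three policies named on the slide. */
void print_policy_ranges(void)
{
    const int policies[] = { SCHED_FIFO, SCHED_RR, SCHED_OTHER };
    const char *names[] = { "SCHED_FIFO", "SCHED_RR", "SCHED_OTHER" };

    for (int i = 0; i < 3; i++) {
        prio_range_t r = policy_priority_range(policies[i]);
        printf("%-11s min=%d max=%d\n", names[i], r.min, r.max);
    }
}
```

Querying the range needs no special privilege; actually switching a thread or process to SCHED_FIFO or SCHED_RR typically does.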
July 7, 2004 Sam Siewert
NPTL Coding
Code Walk-through
Thread Scheduling Policy
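/* Request explicit SCHED_FIFO scheduling instead of inheriting the creator's policy. */
pthread_attr_init(&rt_sched_attr);
pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

/* Query the SCHED_FIFO priority range and run just below the maximum. */
rt_max_prio = sched_get_priority_max(SCHED_FIFO);
rt_min_prio = sched_get_priority_min(SCHED_FIFO);
rt_param.sched_priority = rt_max_prio-1;

/* Switch the calling process to SCHED_FIFO (fails without sufficient privilege). */
rc=sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);
if (rc < 0)
    perror("sched_setscheduler");

/* Report the contention scope of the attribute object. */
pthread_attr_getscope(&rt_sched_attr, &scope);
if(scope == PTHREAD_SCOPE_SYSTEM)
    printf("PTHREAD SCOPE SYSTEM\n");
else if (scope == PTHREAD_SCOPE_PROCESS)
    printf("PTHREAD SCOPE PROCESS\n");
else
    printf("PTHREAD SCOPE UNKNOWN\n");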
Thread Creation and Join
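rc = pthread_create(&main_thread, &main_sched_attr, testThread, (void *)0);
if (rc)
{
    /* pthread_create() returns the error number and does not set errno,
       so report it with strerror() (needs <string.h>) rather than perror(). */
    printf("ERROR; pthread_create() rc is %d (%s)\n", rc, strerror(rc));
    exit(-1);
}

/* Wait for the thread to finish, then release the attribute object. */
pthread_join(main_thread, NULL);

rc = pthread_attr_destroy(&rt_sched_attr);
if (rc != 0)
    printf("attr destroy: %s\n", strerror(rc));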
Issues Beyond Policy and Feasibility
Throughput
Latency
How do they Differ?
E.g. Frame Rate vs. Time to First Frame
Digital Video (Quick Reminders)
Simple Encode/Decode is Processing Intensive
GPU Co-Processors Can Offload CPU
Example with MPlayer and VDPAU (Video Decode and Presentation API for Unix) for Linux
Core Loading with Mplayer VDPAU MPEG Decode
(Load balancing and offload)
Dual-Core SW Decode (Load balancing)
Discussion – What Does the Eye See?
Ewald Hering (1872), Opponent Colors (R/G, Y/B)
Color Models
– RGB Cube
– HSV - Hue/Saturation/Value
  Hue – Similarity to R, G, Y, B
  Saturation – Color vs. Brightness
  Value – Low=Black, High=Color
– Red and Green Opponent Colors – Can’t See Both Simultaneously
– Yellow and Blue Opponent Colors
– Luminance (Candela/Square-Meter) – Light Passing Through Area Forming a Solid Angle in a Direction
  Candela (Luminous Intensity) = Lumens/Steradian
  More Precise than “Brightness”
– Chrominance (“CrCb” or “UV” in YCrCb or YUV)
  U = Blue - Luminance (Y)
  V = Red - Luminance (Y)
– Wavelength Spectrum - ROYGBIV
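The luminance and color-difference definitions above can be sketched as a small conversion routine. This assumes the classic analog YUV weighting (ITU-R BT.601 luma coefficients with U and V scale factors of 0.492 and 0.877) and inputs normalized to the range 0 to 1; digital YCbCr uses different offsets and scaling. The yuv_t type and rgb_to_yuv() name are hypothetical.

```c
#include <assert.h>

/* Luminance plus two color-difference (chrominance) components. */
typedef struct {
    double y;  /* luminance */
    double u;  /* scaled (Blue - Y) */
    double v;  /* scaled (Red - Y) */
} yuv_t;

/* Convert normalized RGB (each component in 0..1) to analog-style YUV. */
yuv_t rgb_to_yuv(double r, double g, double b)
{
    yuv_t out;

    assert(r >= 0.0 && r <= 1.0);
    assert(g >= 0.0 && g <= 1.0);
    assert(b >= 0.0 && b <= 1.0);

    /* Luma is a weighted sum; green dominates because the eye is most sensitive to it. */
    out.y = 0.299 * r + 0.587 * g + 0.114 * b;

    /* Chroma is the difference between a primary and the luma, then scaled. */
    out.u = 0.492 * (b - out.y);
    out.v = 0.877 * (r - out.y);
    return out;
}
```

Any gray input (r = g = b) gives U and V of approximately zero, which is why a monochrome receiver can use Y alone and why chroma could be added to NTSC later without breaking black-and-white sets.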
RGB Cube – http://en.wikipedia.org/wiki/File:RGB_Cube_Show_lowgamma_cutout_b.png
HSV Cylinder – http://en.wikipedia.org/wiki/File:HSV_color_solid_cylinder_alpha_lowgamma.png
Frame Analysis and Image Processing
Resources for Raw Frame Data
– GIMP (GNU Image Manipulation Program) – Single Frame Analysis and Transforms
– Octave – Similar to MATLAB
– Irfanview – Simple Viewer includes PPM
– OpenCV (C/C++ and Python API)
Single Frame Viewing and Analysis
– http://www.irfanview.com/
– http://www.gimp.org/downloads/
Image Processing Libraries
– http://cimg.sourceforge.net/
– http://opencv.org/
Practice with Linux
GIMP PPM and JPEG Frame Analysis
FFMPEG MPEG-4 DV to Frames
Sobel Image Transformation Real-Time – http://www.cse.uaa.alaska.edu/~ssiewert/a485_code/capture-transformer/
Sobel Image Transformation Batch Mode
FFMPEG Re-encoding
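For readers who want to see what a Sobel transformation does to a frame before opening the course code, here is a minimal sketch. It is not the capture-transformer implementation: it assumes an 8-bit grayscale frame in row-major order, approximates the gradient magnitude as |Gx| + |Gy| to avoid a square root, and writes zero to the one-pixel border. The function name sobel_gray() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Apply the 3x3 Sobel operator to an 8-bit grayscale image.
 * 'in' and 'out' are separate width*height buffers in row-major order. */
void sobel_gray(const uint8_t *in, uint8_t *out, int width, int height)
{
    assert(in != NULL && out != NULL && in != out);
    assert(width >= 3 && height >= 3);

    /* Border pixels have no full 3x3 neighborhood; leave them at zero. */
    memset(out, 0, (size_t)width * (size_t)height);

    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            const uint8_t *r0 = in + (y - 1) * width + x;  /* row above */
            const uint8_t *r1 = in + y * width + x;        /* current row */
            const uint8_t *r2 = in + (y + 1) * width + x;  /* row below */

            /* Horizontal gradient: right column minus left column, weights 1-2-1. */
            int gx = (r0[1] + 2 * r1[1] + r2[1]) - (r0[-1] + 2 * r1[-1] + r2[-1]);
            /* Vertical gradient: bottom row minus top row, weights 1-2-1. */
            int gy = (r2[-1] + 2 * r2[0] + r2[1]) - (r0[-1] + 2 * r0[0] + r0[1]);

            int mag = abs(gx) + abs(gy);
            out[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

A PPM frame extracted with FFMPEG would first need its RGB pixels reduced to one luminance byte each; the loop above is also a natural candidate for the SIMD and GP-GPU offload discussed earlier, since every output pixel is independent.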