TRANSCRIPT
V-PANE: Virtual Perspectives Augmenting Natural Experiences
Kerry Moffitt, Scientist
The views, opinions and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Distribution Statement “A” (Approved for Public Release, Distribution Unlimited). DARPA DISTAR #27938
GTC 2017
Contents
• Background: DARPA, GXV-T
• V-PANE Overview and Architecture
• Geometry
• Video
• Ongoing Work
DARPA Ground X-Vehicle Technologies (GXV-T)
Goal: Improved vehicle survivability and mobility
Development Areas:
• Increased agility
• Enhanced mobility
• Crew augmentation
• Signature management
Source: http://www.darpa.mil/program/ground-x-vehicle-technologies
Objective: Develop the next generation of ground-vehicle human-machine interfaces that fuse real-time sensor feeds with video data projected onto a 3D geometric model of the environment. The program aims to develop a fully user-controlled, multiple-perspective, live virtual representation of the vehicle’s surroundings.
V-PANE Overview
IED
Boomerang Shot Detection
Networked IED Report
ATAK Route Plan
Secondary View & Touchscreen
Synthetic 1st-Person View
Popular Press: Wired Magazine, National Defense, American Security Today, Product Design and Development
High-Level V-PANE Vision
LWIR
Color
Point Cloud
Video Image Array LWIR Video
Imagery Projected onto 3D Geometry
Ground X-Vehicle
Boomerang Shot Detection
Static Imagery & Maps
3D View Renderer
Gunner View & Controls
Driver/Commander View & Controls
SA System (e.g., ATAK)
Path Plan
Lidar
Additional Onboard Sensors
Future
‘Live’ 3D Model
1. The real-time fusion of:
A. Lidar point clouds into a 3D model and
B. Multiple 2D video streams onto that model and
C. Other 2D or 3D threat, position or mapping data
2. The real-time rendering of that model with video into 2D displays from multiple perspectives
3. The real-time control of the multiple perspectives
V-PANE User Capabilities
360˚ Visualization
Arbitrary Visualization Perspectives
Multi-Spectral Image Fusion
Location Probing
Cue, Slew, Track a Location
Fused Semantic Information
Real-time Route Analysis
Object Detection
Range, Bearing, Elevation
Slope (pitch/roll)
Position
Reconnaissance
After-Action Reports
Obstacles
Obstructions
Slope
Shot Reports
Blue Force Locations
Routing
Offline Viewing
Include Pre-existing Data
Dynamic Controls
People
Vehicles
Sensor Array
Real-Time Scanning, Modeling, Rendering
• Lidar + Video + IMU = The World
V-PANE Workstation Architecture
Sensors: Lidar (2 x Velodyne HDL-32E), IMU/GPS (1 NovAtel, 1 VectorNav), Cameras (5 BMD HD, 2 LWIR NTSC)
Interfaces: USB, LAN
Video grabbers: 2 x BMD Q2, 1 IOI 4K SDI
Processors: CPU1, CPU2
GPUs:
• P100: Fusion
• P100: Raycasting
• P100: Video Frusta Projection
• M6000: Video Cache, Rendering
• K80: Compression
Storage: SSD
V-PANE Data Pipeline
Processing and Recording in a Single Server – Maximum-Load Analysis with PEX 8747 PCIe Switches
Sensors: Lidar (2 x Velodyne HDL-32E), IMU/GPS (1 NovAtel, 1 VectorNav), Cameras (5 BMD HD, 2 LWIR NTSC)
Interfaces: USB, LAN
Video grabbers: 2 x BMD Q2, 1 IOI 4K SDI
Processors: CPU1, CPU2, P100 (Fusion), P100 (Raycasting), P100 (Video Frusta Projection), M6000 (Video Cache, Rendering), K80 (Compression)
Storage: SSD
Link types: Inter-GPGPU (P2P), Inter-GPGPU (non-P2P), CPU-Driven, QPI, DisplayPort
Data rates (figure labels):
• Position, Orientation: 800 Hz, 2 x 40 KB/s
• Raw IMU: 80 KB/s
• Raw Lidar: 1400k pts/s, 3 MB/s
• Processed Lidar: 18 MB/s
• Live Video: 2.9 GB/s
• HD Video: 500 MB/s
• Compressed Video: 500 MB/s
• Pixel Positions: 3 GB/s
• Depths, Indices: 4 GB/s
• Video: 3.3 GB/s
• Video, Voxels: 600 MB/s; 1.4 GB/s
• Voxels: 100 MB/s; 1 GB/s; 5 GB/s
NB:
1 - P2P links require no transfer to or from CPU
2 - Non-P2P links count twice
3 - QPI link hits both CPUs
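As a rough cross-check on the video figures in the pipeline, the raw bandwidth of one 1080p60 stream can be worked out from its dimensions. The sketch below assumes 4 bytes per pixel, which is an assumption for illustration and not a format stated on the slide; under that assumption the result is close to the 500 MB/s "HD Video" label.

```python
def stream_bandwidth_bytes(width, height, bytes_per_pixel, fps):
    """Uncompressed bandwidth of one video stream in bytes per second."""
    return width * height * bytes_per_pixel * fps

# Assumption: 4 bytes per pixel (for example RGBA); the slide does not state the format.
hd_1080p60 = stream_bandwidth_bytes(1920, 1080, 4, 60)
print(hd_1080p60 / 1e6, "MB/s")
```

With 2 bytes per pixel (for example 4:2:2 YUV) the same stream would need about half of that, so the pixel format matters when sizing PCIe links.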
Lidar Projection: Geometry Fusion
• Lidar lasers (1) reflect back from surfaces in the world (2), sampling depth
• During each fusion update, every voxel reverse-projects (3) to lidar focal point to determine distance from voxel to surface
• Depth samples stored as rectangular array with fixed distance between samples (4) to optimize lookup (this requires resampling fixed-angle-delta lidar data)
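The reverse-projection step can be sketched as follows. This is a minimal illustration, not the V-PANE implementation: the function names, the truncation distance, and the flat-wall depth lookup are all hypothetical, and the real system reads depth from a resampled rectangular array on the GPU.

```python
import math

def signed_distance(voxel, origin, depth_along_ray, truncation=0.5):
    """Signed distance from a voxel centre to the surface the lidar measured.

    Positive values lie between the sensor and the surface, negative values
    lie behind it. The clamp to +/- truncation is an assumption of this sketch.
    Returns None when no depth sample exists along the voxel's ray.
    """
    dx, dy, dz = (v - o for v, o in zip(voxel, origin))
    voxel_range = math.sqrt(dx * dx + dy * dy + dz * dz)
    if voxel_range == 0.0:
        return None
    direction = (dx / voxel_range, dy / voxel_range, dz / voxel_range)
    measured = depth_along_ray(direction)  # reverse-project to the lidar focal point
    if measured is None:
        return None
    return max(-truncation, min(truncation, measured - voxel_range))

def wall_depth(direction, wall_x=5.0):
    """Hypothetical depth lookup: a flat wall at x = wall_x seen from the origin."""
    if direction[0] <= 0.0:
        return None
    return wall_x / direction[0]

origin = (0.0, 0.0, 0.0)
print(signed_distance((4.8, 0.0, 0.0), origin, wall_depth))  # just in front of the wall
print(signed_distance((5.2, 0.0, 0.0), origin, wall_depth))  # just behind the wall
```

The sign change between neighbouring voxels marks where the surface lies.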
Beam Modeling
Figure labels: 100 m, 2.32 m, 1.33°
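The three numbers on this slide are consistent with simple trigonometry: two beams separated by 1.33° are about 2.32 m apart at 100 m. Reading them as range, gap, and angular spacing is an interpretation of the figure labels, and the function name below is hypothetical.

```python
import math

def beam_gap_m(range_m, beam_spacing_deg):
    """Distance between adjacent lidar beams at a given range."""
    return range_m * math.tan(math.radians(beam_spacing_deg))

print(round(beam_gap_m(100.0, 1.33), 2))  # 2.32
```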
Wobbler
• Fill in the vertical gaps between lasers
Ray Casting
• Over pixels in current view
– Project into voxel array
– Find nearest zero-crossing
Raycast: Determine 3D Point per Pixel
• For every pixel to be rendered on screen (1), project from virtual camera through pixel (2) into voxel space, to find intersection point with nearest surface on that ray (3)
• Output is a 3-space point per pixel to be rendered
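A minimal sketch of the per-pixel ray cast, assuming a fixed step size and a callable `sdf` that stands in for the voxel-array lookup. Both are illustrative choices and not details from the talk.

```python
def raycast(origin, direction, sdf, step=0.1, max_range=100.0):
    """March along a ray and return the first zero-crossing of sdf, or None.

    sdf(point) is a stand-in for sampling the voxel array: positive in free
    space, negative behind a surface. direction should be a unit vector.
    """
    t_prev = 0.0
    d_prev = sdf(origin)
    for i in range(1, int(max_range / step) + 1):
        t = i * step
        point = tuple(o + t * k for o, k in zip(origin, direction))
        d = sdf(point)
        if d_prev > 0.0 >= d:
            # Linear interpolation between the two samples that bracket the surface.
            t_hit = t_prev + (t - t_prev) * d_prev / (d_prev - d)
            return tuple(o + t_hit * k for o, k in zip(origin, direction))
        t_prev, d_prev = t, d
    return None

def wall(p):
    """Hypothetical scene: a wall at x = 5."""
    return 5.0 - p[0]

print(raycast((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), wall))   # 3D point on the wall
print(raycast((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), wall))  # None: the ray never hits
```

One such call per screen pixel yields the 3-space point per pixel that the next stage consumes.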
Video Projection: Color Index per Pixel
• For every pixel to be rendered on screen (1), reverse-project from 3D point to real-world camera frame (2), to find intersection point with image captured in that frame (3)
• Any given scene may involve 100s of frames
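The reverse projection from a 3D point to a captured frame can be illustrated with a pinhole camera model. This sketch assumes the point is already expressed in the camera's coordinate frame with +z pointing forward, and the intrinsics (`fx`, `fy`, `cx`, `cy`) are made-up values; lens undistortion and frame selection are left out.

```python
def project_to_pixel(point_cam, fx, fy, cx, cy, width, height):
    """Project a camera-frame 3D point to integer pixel coordinates.

    Returns None if the point is behind the camera or outside the image.
    """
    x, y, z = point_cam
    if z <= 0.0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    if 0.0 <= u < width and 0.0 <= v < height:
        return (int(u), int(v))
    return None

# Hypothetical intrinsics for a 1920 x 1080 camera.
K = dict(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0, width=1920, height=1080)
print(project_to_pixel((0.0, 0.0, 10.0), **K))  # (960, 540)
print(project_to_pixel((1.0, 0.0, 10.0), **K))  # (1060, 540)
```

A point that projects to None for one frame may still be visible in another, which is why many frames can contribute to a single rendered view.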
V-PANE – Data Processing Frequency and Latency
Processors: Grabber GPU 1, GPGPU 1, GPGPU 2, GPGPU 3
Stages: Video Capture, Geometry Fusion, Copy, Transfer Voxels, Ray Cast, Video Project, Render
Stage times (figure labels): < 50 ms, < 16 ms, < 100 ms
Update Frequency
• Geometry: 20 Hz
• Video Capture: 60 Hz
• Rendering: 60 Hz
Latency
• Geometry: < 148 ms
• Video: < 132 ms
Profiling Geometry Fusion and Ray Casting
Geometry Fusion Cycle Time: < 50 ms
Ray Casting: < 16 ms
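The cycle-time targets follow from the update rates: a stage that runs at a given frequency has to finish within one period. A small sketch of that arithmetic (the function name is hypothetical):

```python
def period_ms(update_hz):
    """Time available per cycle at a given update frequency."""
    return 1000.0 / update_hz

print(period_ms(20))            # 50.0 ms per geometry fusion cycle
print(round(period_ms(60), 1))  # 16.7 ms per video capture or render cycle
```

A geometry fusion cycle under 50 ms keeps up with 20 Hz, and a ray cast under 16 ms fits inside one 60 Hz frame.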
Image Processing in V-PANE
Columns: CPU (Host), GPU (CUDA, OpenGL)
From Live Cameras: Copy Frame, YUV to Pinned Host Memory, cudaMemcpy, Color Convert, Undistort, [Cache], Fusion Projection, Build Frame, Render
From Disk: Copy Frame, DXT5 to Pinned Host Memory, Upload, Decompress, Undistort, [Cache], Fusion Projection, Build Frame
Compression: Copy Bits, Compress, Copy Bits, Download
Thread Activity (Video I/O)
Main Thread (M6000): Dequeue, Decompress, Process, Upload, Render, VBLANK, …
Video Load/Receive Threads: Enqueue, Enqueue, …
Compression Thread (K80): Dequeue, Process, Upload, Compress, Store, …
Cycle: 16.6 ms (labelled 17 ms)
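The enqueue/dequeue pattern between the video receive threads and the thread that renders can be sketched with a bounded queue. This is a generic producer/consumer illustration in Python, not the V-PANE code: the queue depth, the sentinel, and the use of a separate consumer thread in place of the main thread are all assumptions.

```python
import queue
import threading

def run_pipeline(frame_count, queue_depth=4):
    """Pass frame ids from a receive thread to a render thread, in order."""
    frames = queue.Queue(maxsize=queue_depth)  # bounded, so capture cannot run far ahead
    rendered = []

    def receive():
        for frame_id in range(frame_count):
            frames.put(frame_id)   # Enqueue
        frames.put(None)           # sentinel: no more frames

    def render():
        while True:
            frame_id = frames.get()  # Dequeue
            if frame_id is None:
                break
            rendered.append(frame_id)  # stand-in for decompress, upload, render

    workers = [threading.Thread(target=receive), threading.Thread(target=render)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return rendered

print(run_pipeline(5))  # [0, 1, 2, 3, 4]
```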
Video Latency: Best Observed Case
Single 1080p60 Stream
Time axis (milliseconds): 0 … 25, 50, 75, 100
Span: from Event in World to Visible on Display
Rows: Camera Capture, Grabber Capture, Host Processing, Display Hardware
Legend: Hardware *, Software
Threads: Capture Thread, Render Thread
Signals: Captured Signal, Output Signal
Steps: DMA, Upld, Wait for VB, Draw, Swap
Cycle: 17 ms
After Capture, Render Can Upload
* Total latency and host processing latency are observed; hardware latency is inferred
LWIR Integration
Ongoing Work: Object Detection
• Image classification
Ongoing Work: Level of Detail
• Add a second voxel array:
– 10x range, to 1 km radius
– 1/10th voxel resolution per dimension
• Ray caster falls back to it per pixel when there is no hit in the primary (hi-res) voxels
• Requires only 0.1% compute to update given 100 m lidar range
Figure labels: 2,000 m, 200 m
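The 0.1% figure follows from the geometry: within the 100 m lidar range, voxels that are ten times larger per dimension are a thousand times fewer. The sketch below works this through with a 20 cm primary voxel, which is an assumed value for illustration (the slides leave voxel size open).

```python
def voxels_per_side(extent_cm, voxel_cm):
    """Number of voxels along one edge of a cubic array."""
    return extent_cm // voxel_cm

PRIMARY_VOXEL_CM = 20                     # assumption, not a stated design value
COARSE_VOXEL_CM = 10 * PRIMARY_VOXEL_CM   # 1/10th resolution per dimension
LIDAR_EXTENT_CM = 200 * 100               # 200 m across (100 m radius)
COARSE_EXTENT_CM = 2000 * 100             # 2,000 m across (1 km radius)

primary_total = voxels_per_side(LIDAR_EXTENT_CM, PRIMARY_VOXEL_CM) ** 3
coarse_total = voxels_per_side(COARSE_EXTENT_CM, COARSE_VOXEL_CM) ** 3
coarse_in_lidar_range = voxels_per_side(LIDAR_EXTENT_CM, COARSE_VOXEL_CM) ** 3

print(primary_total, coarse_total)            # both arrays hold the same voxel count
print(coarse_in_lidar_range / primary_total)  # 0.001, i.e. 0.1%
```

Only the coarse voxels inside lidar range can receive new samples, so the second array costs about a thousandth of the primary update.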
Ongoing Work and Challenges
• Voxel cache, Voxel LOD
• GPS/altitude
• Occlusion testing for video projection
– Bandwidth vs. render quality
• Timestamps
– GPS from IMU and lidar, but not from cameras
– Off by 100 ms = 2.5 m
• What if video but no geometry? (Skybox)
• Voxel precision
– Image quality vs. geometry update rate (20 cm voxels? 10? 5?)
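The timestamp bullet ("Off by 100 ms = 2.5 m") can be read as position error equals speed times timing offset. On that reading, which is an inference and not something the slide states, the implied vehicle speed is 25 m/s, or 90 km/h.

```python
def implied_speed_mps(position_error_m, time_offset_s):
    """Speed at which a given timing offset produces a given position error."""
    return position_error_m / time_offset_s

speed = implied_speed_mps(2.5, 0.100)
print(round(speed, 3), "m/s")       # 25.0 m/s
print(round(speed * 3.6, 3), "km/h")  # 90.0 km/h
```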
The End
Questions?