RISEdb: a Novel Indoor Localization Dataset
Carlos Sanchez-Belenguer, Erik Wolfart, Alvaro Casado-Coscolla and Vitor Sequeira
European Commission, Joint Research Centre (JRC)
Via Enrico Fermi 2749, Ispra (VA), Italy
Email: [name.surname]@ec.europa.eu
Objective
• Generation of a new indoor localization dataset for training and benchmarking indoor localization systems.
• Requirements:
  • Long sequences of geo-referenced images.
  • Reliable ground-truth poses.
  • Large variety of indoor buildings.
  • Accurate 3D models for each building.
  • Acquisitions performed under different conditions:
    • Different illumination.
    • Different furniture distribution over time.
  • Additional geo-referenced data acquired from a smartphone.
Acquisition platform
• MLSP backpack (Mobile Laser Scanning Platform), equipped with LiDAR and inertial sensors for SLAM applications.
• Two working modes:
• Mapping: the user moves freely inside the area to be mapped and the system automatically produces a globally consistent, high-resolution point cloud (the map).
• Tracking: the user loads a reference map and the system reports its pose in real time within the reference frame defined by the map.
Approach
• Use the MLSP backpack to generate a reference map for each building of the dataset.
• Add new sensors to the platform and use it as an indoor GPS to localize the individual readings of each device (see the pose-interpolation sketch after this list).
• Main benefits:
  • Accurate positioning (centimetre accuracy).
  • Positioning provided by an independent system (in contrast with SfM-based datasets).
  • Fully automatic pipeline (i.e. no need to process data manually).
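The "indoor GPS" idea can be illustrated with a simple pose look-up: once a sensor is time-calibrated, every reading it produces can be geo-referenced by interpolating the MLSP pose at the reading's timestamp and composing it with the sensor's extrinsic calibration. The sketch below is illustrative only and assumes the trajectory is available as timestamped positions and quaternions; the helper name `interpolate_pose` and all values are hypothetical, not part of the released tools.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Hypothetical MLSP trajectory: timestamps (s), positions (m) and orientations (quaternions).
traj_t = np.array([0.0, 0.1, 0.2, 0.3])
traj_p = np.array([[0.00, 0.0, 0.0],
                   [0.10, 0.0, 0.0],
                   [0.20, 0.0, 0.0],
                   [0.30, 0.1, 0.0]])
traj_R = Rotation.from_quat([[0.0, 0.0, 0.0, 1.0]] * 4)

slerp = Slerp(traj_t, traj_R)  # spherical interpolation of orientations

def interpolate_pose(t):
    """Backpack position and orientation at an arbitrary timestamp t (illustrative helper)."""
    p = np.array([np.interp(t, traj_t, traj_p[:, i]) for i in range(3)])
    return p, slerp(t)

# Geo-reference a sensor reading taken at t = 0.15 s (after time calibration);
# composing the result with the sensor's extrinsic calibration gives the sensor pose.
pos, rot = interpolate_pose(0.15)
print(pos, rot.as_quat())
```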
Updated acquisition platform
• Spherical camera:
  • Garmin VIRB 360.
  • Mounted on the sensor head.
  • Connected to the MLSP's main computer.
• Data:
  • Equirectangular spherical images.
  • Full HD 1080p resolution.
  • 15 Hz.
  • JPEG with quality level of 95%.
Updated acquisition platform
• Stereo camera:
  • Stereolabs ZED 2 camera.
  • Mounted on the sensor head.
  • Connected to a wearable computer (custom design based on an NVIDIA Jetson TX2).
• Data:
  • 720p stereo pairs.
  • 15 Hz.
  • Lossless format.
  • Additional sensor data (IMU, barometer, temperature…).
  • Off-the-shelf accurate visual odometry (see the sketch below).
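Because the ZED's built-in visual odometry is later used as the stereo camera's trajectory for time calibration, a minimal sketch of how such poses are typically retrieved is shown below. It assumes the ZED SDK Python bindings (pyzed) and their positional-tracking API; the settings are illustrative and this is not the authors' acquisition software.

```python
import pyzed.sl as sl

# Open the ZED 2 at 720p / 15 Hz (illustrative settings).
zed = sl.Camera()
init_params = sl.InitParameters()
init_params.camera_resolution = sl.RESOLUTION.HD720
init_params.camera_fps = 15
if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("could not open the ZED camera")

# Enable the SDK's built-in positional tracking (visual odometry).
zed.enable_positional_tracking(sl.PositionalTrackingParameters())

runtime = sl.RuntimeParameters()
pose = sl.Pose()
while zed.grab(runtime) == sl.ERROR_CODE.SUCCESS:
    # Camera pose in the world frame defined at tracking start.
    zed.get_position(pose, sl.REFERENCE_FRAME.WORLD)
    t = pose.get_translation(sl.Translation()).get()
    print(pose.timestamp.get_milliseconds(), t)
```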
Updated acquisition platform
• Smartphone:
  • ASUS ZenFone AR.
  • Mounted on the main body of the backpack.
  • Custom-made on-board acquisition software.
• Data:
  • WiFi access points available and signal intensity.
  • Phone cell towers in range.
  • Magnetic field, gyros and accelerometers.
  • Noise level.
  • Ambient temperature.
  • Air pressure.
  • Ambient light.
Sensor calibration
• The MLSP reports poses in its own reference frame:
  • Temporal: the internal clock of the backpack's computer.
  • Spatial: the center of the horizontal laser.
• The additional sensors acquire data autonomously (easy to integrate as many sensors as needed). We need to perform two calibrations:
  • Temporal: convert sensor timestamps into the MLSP's time reference.
  • Spatial: extrinsic calibration of the sensors wrt the MLSP reference frame.
Extrinsic calibration
• Manually define correspondences between 3D LiDAR points and image pixels for each camera.
• Solve the well-known PnP (Perspective-n-Point) problem (see the sketch below).
• Performed only once, since the cameras are rigidly attached to the sensor head.
• Smartphone: nominal values are used.
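A hedged sketch of this step, assuming OpenCV's solvePnP is used to recover the camera pose from the manually picked 3D–2D correspondences (the exact solver used by the authors is not stated); all point coordinates and intrinsics below are placeholders.

```python
import numpy as np
import cv2

# Manually picked correspondences: 3D LiDAR points (metres) and their image pixels.
# Placeholder values for illustration only.
object_points = np.array([[1.2, 0.4, 2.5],
                          [0.8, -0.3, 3.1],
                          [-0.5, 0.7, 2.2],
                          [1.5, -0.9, 4.0],
                          [-1.1, 0.2, 3.6],
                          [0.3, 1.1, 2.9]], dtype=np.float64)
image_points = np.array([[640.0, 360.0],
                         [512.5, 410.2],
                         [800.1, 300.7],
                         [450.3, 500.9],
                         [900.4, 420.6],
                         [700.2, 250.3]], dtype=np.float64)

# Pinhole intrinsics of the camera (placeholder values).
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume lens distortion is already compensated

# Solve the PnP problem: rotation/translation bringing LiDAR points into the camera frame.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)  # camera-from-LiDAR rotation matrix
print("R =\n", R, "\nt =", tvec.ravel())
```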
Time calibration
• Define a 1D reference signal from the Z-rotation (yaw) component of the MLSP's reported trajectory (Z axis aligned with gravity).
• For each sensor, compute its trajectory based on the data it acquired:
  • Stereo camera: off-the-shelf visual odometry.
  • Spherical camera: no need for time calibration, since it is recorded and timestamped by the MLSP's computer (the time reference).
  • Smartphone: gyro/accelerometer fusion.
• Extract the yaw signal from each trajectory.
• Correlate each signal with the backpack's (see the sketch after this list).
• Fully automatic process.
• Performed once per sensor/acquisition.
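A minimal sketch of the correlation step, assuming both yaw signals have been unwrapped and resampled to a common rate; the lag that maximises the cross-correlation gives the shift between the sensor clock and the MLSP clock. The function name and values are illustrative, not the authors' implementation.

```python
import numpy as np

def estimate_time_offset(yaw_ref, yaw_sensor, rate_hz):
    """Delay (s) of the sensor yaw signal with respect to the MLSP reference signal.

    Both signals are assumed unwrapped (np.unwrap), resampled to the same
    rate `rate_hz` and overlapping in time.
    """
    a = yaw_ref - np.mean(yaw_ref)            # remove the constant heading offset
    b = yaw_sensor - np.mean(yaw_sensor)
    corr = np.correlate(b, a, mode="full")    # cross-correlation over all lags
    lag = np.argmax(corr) - (len(a) - 1)      # lag in samples; positive = sensor delayed
    return lag / rate_hz

# Toy example: the "sensor" signal is the reference delayed by 2 s, sampled at 10 Hz.
rate = 10.0
t = np.arange(0.0, 60.0, 1.0 / rate)
yaw_ref = np.sin(0.2 * t) + 0.1 * np.sin(1.3 * t)
yaw_sensor = np.sin(0.2 * (t - 2.0)) + 0.1 * np.sin(1.3 * (t - 2.0))
print(estimate_time_offset(yaw_ref, yaw_sensor, rate))  # ~2.0
```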
Results – Mapping
• 5 different types of buildings mapped:
  • Office building: 108x76x21 metres (bd100).
  • Conference building: 45x34x8 metres (auditorium).
  • Workshop: 63x23x12 metres (as3ml).
  • Exhibition building: 20x53x4 metres (visitors).
  • Restaurant: 46x85x5 metres (mensa).
• A 1 cm-resolution point cloud generated for each building (drift-free and globally consistent).
Results – Mapping demo
Results – Acquisition
• 30 sequences acquired.
• More than 6 hours recorded.
• 20.7 km walked inside the buildings.
• More than 1 million geo-referenced images.
Full dataset available at https://data.jrc.ec.europa.eu/collection/id-0111
Results – Acquisition demo
• Footage and ground-truth trajectory for the spherical camera.
Results – Time calibration
• Time calibration example for one acquisition:
  • Notice how the visual odometry from the stereo camera (ZED) matches the reference trajectory provided by the MLSP almost perfectly.
  • The smartphone trajectory contains more than enough information to perform the right correlation and, thus, the time calibration.
Results – Calibration
• 2D feature projection error wrt distance analysis:
  • Detect and track visual features from the video sequence.
  • Project the 2D coordinates into the point cloud using the ground-truth poses.
  • Compute the projection error wrt distance (see the sketch after this list).
• Features between 2 and 5 metres away: error below 1 cm.
• Overall average error: 3.35 cm.
• Overall median error: 2.20 cm.
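As a hedged illustration of this evaluation (the authors' exact error metric is not detailed here), the sketch below assumes each tracked feature has been ray-cast into the point cloud from several ground-truth camera poses, and summarises the spread of the resulting 3D points per distance bin. All inputs and names are hypothetical.

```python
import numpy as np

def error_vs_distance(track_points, track_distances, bin_edges_m):
    """Mean 3D spread (m) of per-track point-cloud intersections, binned by distance.

    track_points: list of (k_i, 3) arrays, the cloud intersections of one tracked
        feature ray-cast from k_i ground-truth camera poses (hypothetical input).
    track_distances: mean camera-to-feature distance (m) for each track.
    bin_edges_m: distance bin edges, e.g. [0, 2, 5, 10, 20].
    """
    errors = np.array([np.linalg.norm(p - p.mean(axis=0), axis=1).mean()
                       for p in track_points])
    dists = np.asarray(track_distances)
    bins = np.digitize(dists, bin_edges_m)
    return {int(b): float(errors[bins == b].mean()) for b in np.unique(bins)}

# Toy usage with two fake feature tracks.
tracks = [np.array([[1.00, 2.00, 3.0], [1.01, 2.00, 3.0], [1.00, 2.02, 3.0]]),
          np.array([[5.00, 1.00, 7.0], [5.05, 1.00, 7.0]])]
print(error_vs_distance(tracks, [3.2, 8.5], [0, 2, 5, 10, 20]))
```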
Results – Calibration demo
• Point cloud projection over the image plane of the stereo camera using the ground-truth poses
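A minimal sketch of how such a projection can be reproduced, assuming a pinhole model for one stereo image: the ground-truth pose brings world points into the camera frame and the intrinsic matrix maps them onto pixels. The intrinsics and pose below are placeholders, not the dataset's calibration values.

```python
import numpy as np

# Placeholder pinhole intrinsics for a 1280x720 image.
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])

# Ground-truth camera pose: world-from-camera rotation R_wc and position t_wc (placeholders).
R_wc = np.eye(3)
t_wc = np.array([2.0, 1.0, 1.5])

def project_cloud(points_w):
    """Project Nx3 world-frame cloud points onto the image; returns pixels and depths."""
    # World -> camera: invert the ground-truth pose (R_wc.T @ (p - t) per point).
    p_c = (points_w - t_wc) @ R_wc
    p_c = p_c[p_c[:, 2] > 0.1]               # keep points in front of the camera
    uv = (K @ p_c.T).T
    uv = uv[:, :2] / uv[:, 2:3]              # perspective division
    return uv, p_c[:, 2]

# Example: project a few synthetic cloud points.
cloud = np.array([[2.5, 1.2, 4.0], [1.0, 0.5, 6.0], [3.0, 2.0, 3.0]])
pixels, depth = project_cloud(cloud)
print(pixels, depth)
```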
Results – Smartphone data
Conclusion
• Fully automatic acquisition pipeline.
• Time calibration algorithm that allows integrating independent sensors with proprioceptive capabilities.
• New public dataset with large amounts of data:
  • High-resolution reference point clouds.
  • Long continuous sequences from two types of cameras, covering large environments.
  • Accurate ground-truth poses.
  • Heterogeneous set of indoor environments.
  • Changes over time and different lighting conditions.
Thanks!
• Questions?
• Dedicated ICPR poster session:
  • Track 3: Computer Vision, Robotics and Intelligent Systems.
  • PS T3.6.
  • January 13th @ 17:00 CET.
• Email: [email protected]