This is my personal page
|Project:||Multi-Descriptor Optimizations across the 2D/3D domains|
|Mentors:||Radu B. Rusu, Joseph Djusgash|
I am a student at the Technical University of Vienna under the supervision of Dr. Hannes Kaufmann.
My research interests include computer vision, machine learning, and multi-modal fusion.
The goal of this project is to investigate multi-descriptor types as an optimal alternative to a single descriptor type across the 2D and 3D domains. A common scenario involves environmental changes, as in outdoor trail following or indoor navigation, where a single descriptor type is no longer effective because texture or range data is lacking or has changed. A multi-descriptor approach is expected to yield better results in such scenarios by using the best-performing descriptors where a single descriptor would simply fail.
This project consists of two main components. The first is a calibration/benchmarking step that captures scene properties and vital statistics of a set of feature descriptors. The second consists of one or more objective functions that enable the automatic selection of the feature descriptor types to apply, based on the initial calibration/benchmarking step and the desired outcome, such as precision, speed, or compute resources. The initial step could be repeated periodically in a background thread to recalibrate the feature descriptors’ scene-dependent vital statistics.
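A minimal sketch of how such an objective function might rank descriptor types from calibration statistics. The field names (`inlier_rate`, `seconds_per_frame`, `memory_mb`), the weights, and all numbers are hypothetical placeholders chosen for illustration; they are not values or interfaces from the project.

```python
from dataclasses import dataclass


@dataclass
class DescriptorStats:
    """Calibration output for one descriptor type (all fields hypothetical)."""
    name: str
    inlier_rate: float        # fraction of correct correspondences, 0..1
    seconds_per_frame: float  # execution time
    memory_mb: float          # proxy for compute resources


def score(stats, w_precision, w_speed, w_memory):
    """Weighted objective: reward precision, penalize time and memory."""
    return (w_precision * stats.inlier_rate
            - w_speed * stats.seconds_per_frame
            - w_memory * stats.memory_mb / 1000.0)


def select_descriptors(calibration, weights, top_k=1):
    """Return the names of the top_k descriptor types under the given weights."""
    ranked = sorted(calibration, key=lambda s: score(s, *weights), reverse=True)
    return [s.name for s in ranked[:top_k]]


# Made-up calibration numbers, for illustration only.
calibration = [
    DescriptorStats("FPFH", inlier_rate=0.60, seconds_per_frame=0.20, memory_mb=50),
    DescriptorStats("SHOT", inlier_rate=0.75, seconds_per_frame=0.50, memory_mb=400),
    DescriptorStats("BRISK", inlier_rate=0.55, seconds_per_frame=0.02, memory_mb=10),
]

precision_first = select_descriptors(calibration, weights=(1.0, 0.1, 0.1))
speed_first = select_descriptors(calibration, weights=(0.2, 2.0, 0.1))
```

Changing only the weights changes the selection, which is the point of separating the calibration step from the objective: the same statistics can serve a precision-oriented or a speed-oriented goal.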
Although some of the research precedes this code sprint, as it is part of my thesis and interests, the PCL TOCS has given us the opportunity to enhance this work through the guidance of the mentors, and to contribute our work to the wider open source community.
Literature review to provide research-level numbers and the required parameters for selected key point detectors and feature descriptors.
- Key point detector selection: SIFT, HARRIS, BRISK
- Non-max suppression / redundant key point removal
- 2D feature descriptors: SIFT, SURF, BRISK or BRIEF
- 3D feature descriptors: FPFH, SHOT, C-SHOT
Descriptor calibration/benchmarking: Build a framework that is able to capture vital statistics of the selected descriptor types:
- 2-Way and multi-descriptor matching rates.
- L2-distance, L2-ratio, and uniqueness of the top 2 kNN correspondences.
- True positive/Inliers rate based on a simulated ground truth.
- RANSAC-based evaluation as a complement to the simulated ground truth approach.
- Background threading to be invoked periodically.
- Obtain the best key point correspondences based on the multi-descriptor approach described above.
- Select one or more descriptor type(s) that satisfy the desired performance characteristics (precision, compute resources, execution time).
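The 2-way matching rate listed among the statistics above can be illustrated with a small sketch. It assumes that "2-way" means a reciprocity check (a source keypoint and a target keypoint must each be the other's nearest neighbor in descriptor space); the function names and the toy descriptors are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)), key=lambda j: l2(query, candidates[j]))


def two_way_matches(source, target):
    """Keep (i, j) only if j is i's nearest target AND i is j's nearest source."""
    matches = []
    for i, desc in enumerate(source):
        j = nearest(desc, target)
        if nearest(target[j], source) == i:
            matches.append((i, j))
    return matches


def two_way_matching_rate(source, target):
    """Fraction of source keypoints that survive the reciprocity check."""
    return len(two_way_matches(source, target)) / len(source)


# Toy 2-element descriptors, for illustration only.
source = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2]]
target = [[0.1, 0.0], [1.0, 1.1]]

matches = two_way_matches(source, target)
rate = two_way_matching_rate(source, target)
```

Here the third source descriptor picks the second target as its nearest neighbor, but that target prefers the second source descriptor, so the pair is dropped and two of three source keypoints remain.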
Verification step: using a selected set of point clouds.
It is my pleasure to share the successful completion of this Toyota Code Sprint in this final blog post. In this project, homography estimation based on multi-modal, multi-descriptor correspondence sets was explored, which inspired the introduction of the multi-descriptor voting approach (MDv). The proposed MDv approach achieved a consistent accuracy in the 0.0X range, a level of consistency better than that of single-type state-of-the-art descriptors, including SIFT. In the process, a framework for analyzing and evaluating single and multi-descriptor performance was developed and used to validate the robustness of MDv against homography estimations based on a single descriptor type, as well as those based on RANSAC registration of best-K multi-descriptor correspondence sets. The code and dataset for this project are hosted on https://github.com/mult-desc/md, with dependencies on both PCL 1.7 and OpenCV 2.4.6.
An in-depth report follows, detailing the project’s accomplishments as well as design and validation considerations.
Correspondence rejection classes implement methods that help eliminate correspondences based on specific criteria such as distance, median distance, normal similarity measure, or RANSAC, to name a few. A couple of additional filters I’ve experimented with include a uniqueness measure and Lowe’s ratio measure, as in “Distinctive image features from scale-invariant keypoints”, D.G. Lowe, 2004. I’ve also explored the tradeoffs between implementing the filters within CorrespondenceEstimation itself and implementing them as external CorrespondenceRejection classes. The former is computationally more efficient if the rejection process is done in one pass, while the latter allows for scene-specific sequential filter banks.
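The idea of a sequential filter bank built from external rejectors can be sketched as follows. This is plain Python for illustration, not the PCL API: the function names, the dictionary layout, and the distance values are assumptions. The 0.8 ratio threshold is the value suggested in Lowe's paper.

```python
def reject_by_distance(correspondences, max_distance):
    """Drop correspondences whose best-match distance exceeds max_distance."""
    return [c for c in correspondences if c["d1"] <= max_distance]


def reject_by_ratio(correspondences, max_ratio=0.8):
    """Lowe's ratio test: keep a match only if the best distance is clearly
    smaller than the second-best distance (d1 / d2 below max_ratio)."""
    return [c for c in correspondences
            if c["d2"] > 0 and c["d1"] / c["d2"] < max_ratio]


def apply_filter_bank(correspondences, filters):
    """Run rejectors one after another; each sees the previous one's output."""
    for f in filters:
        correspondences = f(correspondences)
    return correspondences


# Each entry: source index, target index, distance to the nearest (d1)
# and second-nearest (d2) target descriptor. Values are made up.
candidates = [
    {"src": 0, "tgt": 4, "d1": 0.10, "d2": 0.50},  # distinctive match
    {"src": 1, "tgt": 2, "d1": 0.30, "d2": 0.32},  # ambiguous: ratio ~0.94
    {"src": 2, "tgt": 7, "d1": 0.90, "d2": 2.00},  # distinctive but too far
]

kept = apply_filter_bank(candidates, [
    lambda cs: reject_by_distance(cs, max_distance=0.5),
    lambda cs: reject_by_ratio(cs, max_ratio=0.8),
])
kept_sources = [c["src"] for c in kept]
```

Keeping the rejectors as separate callables costs one pass per filter, but the list can be reordered or swapped per scene, which is the flexibility the external-class design trades efficiency for.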
A quick reference guide to the available correspondence rejection classes follows, with remarks extracted from the source code.
Given my current work on optimizing correspondence estimation across the uv/xyz domains, it is worth providing a topology of the available correspondence estimation classes in PCL. For a high-level treatment of the registration API, please refer to the registration tutorial.
Correspondence estimation attempts to match keypoints in a source cloud to keypoints in a target cloud based on some similarity measure, feature descriptors in our case. Although applying scene-relevant descriptor parameters and correspondence thresholds may reduce erroneous matches, outliers persist and degrade pose estimation. This is due to the implied assumption that a corresponding target keypoint exists for each source keypoint. The difficulty of estimating model- or scene-specific descriptor parameters is another factor.
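The implied assumption can be made concrete with a brute-force nearest-neighbor sketch. This is an illustrative stand-in written in plain Python, not PCL's implementation; the function name and toy descriptors are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_correspondences(source, target):
    """For every source descriptor, return (source_idx, target_idx, distance)
    of its nearest target descriptor. A match is always produced."""
    result = []
    for i, desc in enumerate(source):
        j = min(range(len(target)), key=lambda k: l2(desc, target[k]))
        result.append((i, j, l2(desc, target[j])))
    return result


# Toy descriptors: the last source keypoint has no real counterpart
# in the target, yet it still receives a match.
source = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
target = [[0.0, 0.1], [1.0, 0.1]]

correspondences = estimate_correspondences(source, target)
matched_targets = [j for _, j, _ in correspondences]
outlier_distance = correspondences[2][2]
```

The third source descriptor is paired with a target even though nothing in the target resembles it. Its large distance hints at the problem, but choosing a threshold that separates such outliers from valid matches is scene dependent, which is why rejection steps are applied afterwards.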
A quick reference guide to the available correspondence estimation classes follows, with remarks extracted from the source code.
As mentioned in the roadmap, one of the goals is to implement a framework that captures vital statistics of selected descriptors and correspondence types. These vital statistics would then be analyzed by one or more objective function(s) to enable scene-based optimizations.
The first milestone, a metrics framework for descriptor evaluation, is now complete, and its output is in line with the characteristics cited in the Rublee et al. ICCV 2011 paper, among other publications.
Specifically, the framework computes the intended vital statistics, including 2-way and multi-descriptor matching and inlier rates. The filter banks include L2-distance, L2-ratio, and a uniqueness measure. A simulated ground truth is also implemented and is generated at runtime. The framework has been applied to local 3D descriptors (FPFH33, SHOT352, and SHOT1344) across a range of downsampling leaf sizes (0.01-0.07) and a range of in-plane rotations (0-90 degrees). A sample of the results is illustrated in the bar graphs below, which reflect the various metrics computed at a 30-degree simulated rotation and at two levels of downsampling: 0.01 for the top bar graph and 0.07 for the next one. In total, 1680 rates were generated for further analysis by the objective function(s). A link is included below to a sample extended output for other 3D descriptors. The next step is to extend the framework to support 2D descriptors.
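A rough sketch of how an inlier rate could be computed against a simulated ground truth. It assumes the target is generated by applying a known in-plane rotation (about the z axis) to the source keypoints, so the true position of every source keypoint in the target is known. The function names, the tolerance, and the sample points are hypothetical and not taken from the framework.

```python
import math


def rotate_in_plane(point, degrees):
    """Rotate an (x, y, z) point about the z axis by the given angle."""
    t = math.radians(degrees)
    x, y, z = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            z)


def inlier_rate(source_pts, target_pts, correspondences, degrees, tolerance):
    """Fraction of correspondences (i, j) for which the rotated source
    keypoint i lands within `tolerance` of target keypoint j."""
    if not correspondences:
        return 0.0
    inliers = 0
    for i, j in correspondences:
        expected = rotate_in_plane(source_pts[i], degrees)
        if math.dist(expected, target_pts[j]) <= tolerance:
            inliers += 1
    return inliers / len(correspondences)


# The target is the source rotated by a known 30 degrees, so the true
# correspondence of source keypoint i is target keypoint i.
source_pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5), (2.0, 2.0, 1.0), (-1.0, 0.5, 0.2)]
target_pts = [rotate_in_plane(p, 30) for p in source_pts]

# Hypothetical matcher output: three correct matches and one wrong one.
correspondences = [(0, 0), (1, 1), (2, 3), (3, 3)]
rate = inlier_rate(source_pts, target_pts, correspondences, 30, tolerance=0.01)
```

Because the transformation is known, no manual labeling is needed, and the same check can be repeated for each rotation angle and leaf size in the sweep.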
The extended output for other 3D descriptors follows:
The project has started.