PCL Developers blog

All blog posts for all code sprints

SimpleTree - a PCL tool for geometrical tree modeling
Saturday, November 28, 2015
  • Introduction

Geometrical models of trees in the field of computational forestry are nowadays referred to as Quantitative Structure Models (QSMs). The capability of those models exceeds pure volume estimation of trees. If the volume of a tree is multiplied by density values, the above ground biomass (AGB) of the tree can be derived non-destructively.

Instead of being limited to predicting the AGB, like statistical or voxel based approaches, QSMs also give insight into the internal biomass distribution of a tree.

The geometrical building blocks of such models are commonly cylinders fitted into terrestrial laser scan (TLS) point clouds.
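As a rough illustration of how a cylinder based model leads to a biomass estimate, the sketch below sums cylinder volumes and multiplies by a wood density. The function names and the density value are assumptions made for this example; they are not taken from SimpleTree.

```python
import math


def cylinder_volume(radius_m, length_m):
    """Volume of a single cylinder in cubic meters."""
    return math.pi * radius_m ** 2 * length_m


def above_ground_biomass(cylinders, wood_density_kg_m3):
    """Sum the volume of all (radius, length) cylinders and scale by density.

    The density is a hypothetical, species dependent input supplied by the user.
    """
    total_volume = sum(cylinder_volume(r, l) for r, l in cylinders)
    return total_volume * wood_density_kg_m3


# Example: a stem segment and two branch segments, density assumed to be 500 kg/m^3.
model = [(0.10, 2.0), (0.03, 1.0), (0.02, 0.5)]
print(round(above_ground_biomass(model, 500.0), 3), "kg")
```

Because every cylinder keeps its own volume, the same sum can be restricted to a subset of cylinders (for example only the branches) to look at the internal biomass distribution.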

SimpleTree is an open source tool to build QSMs from TLS clouds (SimpleTree homepage). The released version is based on PCL 1.8.0.

  • The cylinder fitting method

Typically, cylinders are fitted into point clouds with either RANSAC or NLS fitting routines. NLS needs initial estimates of the cylinder parameters, and RANSAC tends to produce erroneous cylinders if the point cloud is not segmented beforehand.

The method implemented in SimpleTree relies on search spheres to produce an initial tree model, which is enriched afterwards. A sphere whose center point is located on the tree skeleton and whose radius is larger than that of the underlying branch or stem segment will cut the point cloud. All points located in the epsilon neighbourhood of the sphere surface represent one or more circular cross sectional areas.


A circle can be fitted into this sub point cloud with RANSAC (Module sample_consensus). The sphere center point, the circle center point and the circle radius serve as parameters of a preliminary detected cylinder.
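The circle fit can be pictured with a small stand-alone RANSAC sketch. It is not the PCL sample_consensus code: it assumes the cross section points have already been projected into a 2D plane, and the threshold, iteration count and function names are illustrative choices.

```python
import math
import random


def circle_from_three_points(p1, p2, p3):
    """Return (cx, cy, r) of the circle through three 2D points, or None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def ransac_circle(points, threshold=0.01, iterations=200, seed=0):
    """Keep the three-point circle that has the most points within `threshold`."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iterations):
        model = circle_from_three_points(*rng.sample(points, 3))
        if model is None:
            continue
        cx, cy, r = model
        inliers = sum(
            1 for x, y in points if abs(math.hypot(x - cx, y - cy) - r) <= threshold
        )
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Together with the sphere center, the returned circle center and radius would give the start point, end point and radius of a preliminary cylinder.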


If the circle is enlarged and transformed into a 3D sphere, the procedure can be repeated recursively.


To get fast access to the epsilon neighbourhood of a sphere, a PCL search structure (Module octree) is used. All points contained in a search sphere have to be removed as soon as the sphere has been used, to prevent the algorithm from jumping back and forth infinitely. If the epsilon neighbourhood of a sphere contains multiple cross sectional areas, which occurs at branch junctions, a clustering (Module segmentation) has to be performed. Into each cluster a circle is fitted;


the largest circle is processed first (marked in green), while the others are stored in a FIFO queue to be processed later (marked in yellow).


By the nature of the algorithm, a detected cylinder's end point coincides with another cylinder's start point, so a tree data structure can be used to store the cylinders. Several statistical post processing procedures, not related to PCL, adjust the tree structure and the cylinder parameters. The fit quality is further improved by another RANSAC routine: points are spatially allocated to their nearest cylinder, and a 3D RANSAC cylinder fit is performed on each of these sub groups only. Before cylinder RANSAC:


and after RANSAC:


The final models are highly accurate, and the tree data structure allows the extraction of diameter classes


or the extraction of tree components like stem and branches:


Other possible parameters to extract are described in Hackenberg et al. 2015b.

  • Crown representation

Two comprehensive crown representations are also computed with PCL. The crown can be modelled as a convex hull (Module surface)


or a concave hull


For now, only the convex hull's volume is written to the output files.

  • ICP to align clouds of different years

ICP (Module registration) can be used to align scans of the same tree taken at different times. In the example, one cloud represents the unpruned tree and the second cloud was taken after pruning.





The initial alignment (semi automatic) is described in Hackenberg et al. 2015b.

  • References

The figures are taken from two peer-reviewed publications presenting the method and the software:

Hackenberg, J.; Morhart, C.; Sheppard, J.; Spiecker, H.; Disney, M. Highly Accurate Tree Models Derived from Terrestrial Laser Scan Data: A Method Description. Forests 2014, 5, 1069-1105.

Hackenberg, J.; Spiecker, H.; Calders, K.; Disney, M.; Raumonen, P. SimpleTree —An Efficient Open Source Tool to Build Tree Models from TLS Clouds. Forests 2015, 6, 4245-4294.

HRCS Stereo-based Road Area Detection final report
Wednesday, November 19, 2014

The Honda Research Institute code sprint has finished. All code has been committed, and the final report is attached below.

Improved inputs and results ( Third phase )
Friday, August 22, 2014
  • Introduction

    In this phase, the focus was on integrating the pcl::KinfuTracker class and the OpenCV libraries in order to provide better input clouds for the Rigid/Non-Rigid Registration methods and thus obtain better results.

  • Approach

    By using the pcl::KinfuTracker class, it was possible to obtain an almost full scan of the heads of the subjects like the ones presented below:

    _images/sub1_front.png _images/sub2_front.png

    Using the pcl::KinfuTracker had the disadvantage that unwanted objects were also scanned during the procedure. However, by using the cv::CascadeClassifier, the program was able to pinpoint the position of the head and move the statistical model to a favorable position so that the Rigid Registration method could fully align the model. (The sphere represents the center of the face.)

  • Results

    Obtaining better input point clouds made it possible to better analyze the efficiency of these types of registration. As stated in the previous post, the accuracy of the result depends on the regularizing weight of the Non-Rigid Registration. However, there is one more parameter to take into account: the training set of face meshes was registered on a scale ten times smaller than the one PCL point clouds are stored in. Intuitively, this means that when the program reads the database it should register the values as ten times smaller; however, the subjects used for testing this program did not have heads of exactly the same size.

    Below is an example of what happens when the scale is not set correctly:


    This result was obtained because, as shown in the following picture, the chin of the model was matched with the neck of the target, even though the rest of the face seems to be in position:


    Once the scale was properly established the correct result was obtained:

  • Future work

    • A method should be implemented so that it would not be necessary to set the right scale, no matter what kind of person is being scanned (child/adult)
    • The pcl::KinfuTracker class has not been officially released, thus further maintenance of the program is required
    • The Registration methods presented so far have to be compared to other methods like the ones already implemented in PCL
Final Example “Final example” (Fourth post)
Monday, August 18, 2014
  • Introduction

Once we have the segmentation and the functions to compute the measures of the segmented object candidates, we need to create an example where we analyze all segments and decide which ones are objects and which are not. In addition, objects whose measures are most similar can be grouped into a category.

  • FrameWork

This tutorial gives an example of an implementation of the algorithm presented in:

Andrej Karpathy, Stephen Miller and Li Fei-Fei, “Object Discovery in 3D Scenes via Shape Analysis”, International Conference on Robotics and Automation (ICRA), 2013


As the authors say in the abstract: “We present a method for discovering object models from 3D meshes of indoor environments. Our algorithm first decomposes the scene into a set of candidate mesh segments and then ranks each segment according to its “objectness”, a quality that distinguishes objects from clutter”.

In this example we build the whole framework: first we obtain the segmented objects, then we calculate some geometric measures to classify the “objectness” of the segmented candidates. After that we save all the “good” segments (the ones the algorithm thinks are an object) and the measures associated with each object. This can be used by the Recurrence step to improve the classification of the objects.

We begin with the graph-based segmentation that was explained before. We over-segment the meshes to obtain different segments for each scene. This segmentation has to be done for all scenes that we want to use for Object Discovery.

Once we have the segmentation, we need to select the object-like segments. To do so, we apply some tests to discard the incorrect segments.

-Minimum and maximum number of points. If the segment does not have enough points or is too big, we discard it.

-Next we look at the shape and apply two tests. If the segment is very flat or thin, we discard it. We also look at the size along the 3 axes; if this size is under or over a threshold, the segment is rejected as well.

-The next test concerns the different geometrical measures that were explained in the second post of this blog. We compute all of these measures for each segment and store the values. They will be used in the next test.

-Because we over-segment with different thresholds, the same segment will probably appear more than once. For that reason we apply another test to each scene. This test, called non-maximum suppression, looks for all pairs of segments whose intersection over union is more than 80%. For such a pair we compare the values of the measures; the segment with the highest value is kept and the other is discarded.
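The non-maximum suppression test can be sketched as follows. This is only an illustration, not the code written for PCL: it assumes each segment is given as a collection of point indices and that the measures have already been combined into a single score per segment.

```python
def intersection_over_union(a, b):
    """IoU of two segments given as collections of point indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def non_max_suppression(segments, scores, iou_threshold=0.8):
    """Return the indices of the segments that survive suppression.

    Segments are visited from highest to lowest score; a segment is dropped
    when it overlaps an already kept segment by more than `iou_threshold`.
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(
            intersection_over_union(segments[i], segments[j]) <= iou_threshold
            for j in kept
        ):
            kept.append(i)
    return sorted(kept)


segments = [range(0, 10), range(0, 9), range(20, 30)]
scores = [0.9, 0.5, 0.4]
print(non_max_suppression(segments, scores))  # → [0, 2]
```

In the example the second segment overlaps the first with an intersection over union of 0.9 and has the lower score, so it is discarded.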

This is the last test. Now we show the results for a couple of scenes.

Scene 1

Initial scene

Object-like segments present in the scene

Initial scene

Object-like segments present in the scene


We store all the segments, and the measures for each segment, for a large number of scenes. In the next and final step we add another measure, the recurrence.

More snapshots of some final results from different scenes:

_images/0.png _images/11.png _images/21.png _images/31.png

Last Step: Recurrence

We are going to compute the recurrence on the previous results, so we need the results of the object discovery. The recurrence consists of analyzing all the segments in all the scenes, because segments that are commonly found in other scenes are more likely to be an object than a segmentation artifact. The paper explains that the best way to measure the distance to the top k most similar segments is to consider only the segments whose extent along the principal directions is within 25% of the query segment, and to measure the Euclidean distance between their normalized shape measures. This is the best option with regard to computation time. Other approaches such as ICP, RANSAC, local descriptors or global descriptors could also be used. In this case, for every object found in the previous segmentation process, we obtain this score and find the 10 most similar objects.

After applying the recurrence we obtain the following results:
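A minimal sketch of this similarity search is shown below. The dictionary layout (`extent`, `measures`) and the exact form of the 25% test are assumptions made for the example, not the data structures of the implementation described in this post.

```python
import math


def similar_extent(a, b, tolerance=0.25):
    """True if the extents along each principal direction differ by at most 25%."""
    return all(abs(x - y) <= tolerance * max(x, y) for x, y in zip(a, b))


def top_k_similar(query, candidates, k=10):
    """Return up to k (distance, index) pairs, nearest first.

    Only candidates with a similar extent are compared; the distance is the
    Euclidean distance between the normalized shape measures.
    """
    scored = []
    for index, candidate in enumerate(candidates):
        if similar_extent(query["extent"], candidate["extent"]):
            distance = math.dist(query["measures"], candidate["measures"])
            scored.append((distance, index))
    scored.sort()
    return scored[:k]
```

A recurrence score could then be derived from the returned distances, for example from their average, with small values indicating that a segment recurs in other scenes.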

1st Query for an object


This case is really good (the implementation works!!!) and makes the paper easier to understand. After all the work, we obtain a possible object in a scene (a cup), and with the aid of all the other scenes we get a fairly confident measure for classifying this object. The larger the number of scenes, the better the quality of this score: more scenes, more quality.

2nd Query for an object


3rd Query for an object


4th Query for an object


Finally, I think that I accomplished the main goal of the GSoC program, which was to implement the algorithm in the PCL format.

To use this program and read the tutorials, we have to wait until a possible pull request with the source code and the 2 classes that I created is accepted into the PCL website and the PCL library.

That’s all folks!!!

P.S. I will update the post with links if I get the pull request for the code that I created during GSoC.

Different Feature representation of surface patches and analysis
Monday, August 18, 2014

This project aims at getting features for each surface patch and grouping the patches together based on a machine learning technique. I have been experimenting with features that could represent the surface patches belonging to an object and how they relate to inter- and intra-object surface patches. Below are some of them, with their descriptions and plots.

For all of the analysis below, please refer to the image below and to the segments derived from basic region growing, which does not make use of RGB color and is based purely on surface curvature.


ColorHistogram: Binning colors is a common way to get appearance cues. It can be done in two ways: one captures the color channels independently, the other captures a dependent binning. In this section I show the independent binning of values on three different channels. Independent means that, at each pixel/point in the scene, we look at the R, G and B values separately and increment the bin each value belongs to. The plots below show what the histograms look like for the RGB and HSV color spaces.

_images/color_rgb.png _images/color_hsv.png

ColorHistogram 3D: Binning colors dependently means having a matrix with RGB in 3 dimensions and incrementing the value of the (r, g, b) bin only if that combination is satisfied. This gives a 3D histogram. The images below show the 3D histogram flattened to a 1D histogram for the RGB, HSV and YUV color spaces. This binning style is taken from [1].
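The difference between the two binning styles can be seen in a short sketch. It assumes 8-bit colors given as (r, g, b) tuples and an arbitrary number of bins per channel; neither the function names nor the bin count come from the feature code described here.

```python
def bin_index(value, bins):
    """Map an 8-bit channel value (0..255) to a bin index."""
    return min(value * bins // 256, bins - 1)


def independent_histogram(colors, bins=4):
    """Three per-channel histograms concatenated: 3 * bins entries."""
    hist = [0] * (3 * bins)
    for color in colors:
        for channel, value in enumerate(color):
            hist[channel * bins + bin_index(value, bins)] += 1
    return hist


def joint_histogram(colors, bins=4):
    """One 3D histogram flattened to 1D: bins ** 3 entries."""
    hist = [0] * (bins ** 3)
    for r, g, b in colors:
        rb, gb, bb = bin_index(r, bins), bin_index(g, bins), bin_index(b, bins)
        hist[(rb * bins + gb) * bins + bb] += 1
    return hist


colors = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]
print(independent_histogram(colors, bins=2))  # → [1, 2, 3, 0, 2, 1]
print(joint_histogram(colors, bins=2))        # → [0, 1, 0, 0, 2, 0, 0, 0]
```

The independent histogram grows linearly with the number of bins, while the joint histogram grows cubically but keeps the information about which channel values occur together.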

_images/colorh_3d_rgb.png _images/colorh_3d_hsv.png _images/colorh_3d_yuv.png

Verticality: Verticality represents how the surface patch is oriented with respect to the camera viewpoint. The histogram is built by binning the angles between the normals and the direction in which the camera is pointing (i.e. the positive z axis for any point cloud from a Kinect). Below is the histogram plot for this feature. This feature is not so useful for this segmentation project; it is aimed more at object discovery and saliency related problems, where it helps distinguish a surface patch from its peers. This is implemented based on [2], which was adopted from [3].
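One possible way to build such a histogram is sketched below. The number of bins and the choice to bin the angle over the full range from 0 to pi are assumptions for this example.

```python
import math


def verticality_histogram(normals, bins=9):
    """Histogram of the angle between each normal and the camera axis (+z)."""
    hist = [0] * bins
    for nx, ny, nz in normals:
        length = math.sqrt(nx * nx + ny * ny + nz * nz)
        if length == 0.0:
            continue  # skip degenerate normals
        cos_angle = max(-1.0, min(1.0, nz / length))
        angle = math.acos(cos_angle)  # in [0, pi]
        hist[min(int(angle / math.pi * bins), bins - 1)] += 1
    return hist
```

A patch facing the camera axis contributes to the first or last bin, while a patch seen edge-on contributes to the middle bins.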


Dimensionality Compactness: This shows how compact a surface patch is. One can do a PCA of a surface patch and derive a local frame of reference. Creating a bounding box in this frame gives the 3 dimensions along which the patch extends the most, and the range along each dimension is computed. Once these ranges (xrange, yrange, zrange) are sorted into min_range, mid_range and max_range, two ratios are computed: 1) min_range / max_range and 2) mid_range / max_range. Below is a plot of this histogram for the above mentioned segments. This is implemented based on [2], which was adopted from [3].
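The two ratios can be computed as in the sketch below, which assumes the points have already been transformed into the local PCA frame, so only the bounding box ranges and the ratios are shown. The function names are illustrative.

```python
def bounding_box_ranges(points):
    """Extent of the axis aligned bounding box of points in the local frame."""
    return tuple(
        max(p[axis] for p in points) - min(p[axis] for p in points)
        for axis in range(3)
    )


def compactness_ratios(xrange_, yrange_, zrange_):
    """Return (min_range / max_range, mid_range / max_range)."""
    min_range, mid_range, max_range = sorted((xrange_, yrange_, zrange_))
    if max_range == 0:
        return 0.0, 0.0  # degenerate patch consisting of a single point
    return min_range / max_range, mid_range / max_range


patch = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2), (0.5, 0.25, 0.1)]
print(compactness_ratios(*bounding_box_ranges(patch)))  # → (0.2, 0.5)
```

A flat patch yields a first ratio close to 0, while a compact, ball-like patch yields two ratios close to 1.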


Perspective scores: This is the ratio of the area projected in the image to the maximum area spread by the region in 3D. The pixel_range below means the bounding box range in the particular direction of the image pixels. xrange, yrange and zrange are the dimensions of the 3D bounding box of the surface patch. Note that PCA should not be done here, as we are comparing the 3D surface patch with its appearance in the image. This is implemented based on [2], which was adopted from [3]. Below are the elements of the histogram.

  1. pixel_x_range (in meters) / xrange

  2. pixel_x_range (in meters) / yrange

  3. pixel_x_range (in meters) / zrange

  4. pixel_y_range (in meters) / xrange

  5. pixel_y_range (in meters) / yrange

  6. pixel_y_range (in meters) / zrange

  7. diameter_of_bounding_box_pixels (in meters) / diameter_of_cuboid_bounding_box_in_3d

  8. area_of_bounding_box (in square meters) / area_of_the_largest_two_dimensions_in_3d


Contour Compactness:

This is the ratio of the perimeter to the area of the region. It is computed as the ratio of the number of boundary points of a segment to the total number of points in the region. This is implemented based on [2], which was adopted from [3].



[1] A. Richtsfeld, T. Mörwald, J. Prankl, M. Zillich and M. Vincze: Learning of Perceptual Grouping for Object Segmentation on RGB-D Data; Journal of Visual Communication and Image Representation (JVCI), Special Issue on Visual Understanding and Applications with RGB-D Cameras, July 2013.

[2] Karthik Desingh, K Madhava Krishna, Deepu Rajan, C V Jawahar, “Depth really Matters: Improving Visual Salient Region Detection with Depth”, BMVC 2013.

[3] Alvaro Collet Romea, Siddhartha Srinivasa, and Martial Hebert, “Structure Discovery in Multi-modal Data : a Region-based Approach”, ICRA 2011.

These features will go into the features module of the pcl_trunk pretty soon. I am currently working on the relational features, which tell how two surfaces are related to each other. The next blog post should be on that.

We are almost at the end stage of GSoC, but this project has a long way to go! I hope to keep pushing stuff until the entire pipeline is up on PCL.

Implementation of the shape generator finished
Wednesday, August 13, 2014

I finished the work on the shape generator module. It became quite a versatile tool. Here is an overview of what one can do:

  • Create simple shapes (Sphere, Cylinder, Cone, Wedge, Cuboid, Full Torus, Partial Torus).
  • Create any type of polygon shape (just give it a list of vertices and edges). If you want to use the more advanced features like cutting, the polygon needs to be convex.
  • The list of simple shapes can be easily extended using convex polygons.
  • Combine any number of shapes/objects into more complex objects.
  • Create cavities, holes, tunnels by cutting one shape/object by another.
  • Everything you generate will have normal information attached to it.
  • Full control over the labeling of shapes. Label each shape individually, label a group of shapes, give the full object one label .... This allows one to tailor groundtruth to what you actually want to benchmark.
  • One class for reading assembly instructions from human readable text files (recipes). You can access all features of the generator except the general polygon class. Recipes can also call/include other recipes. This makes it possible to create complex objects and reuse them in different settings. Another example would be the creation of several objects which can easily be arranged in various scenes.

Here are some examples of how we can create a table top scenario consisting of a table, a cup on top and a saw which cuts into the table using recipes (shown on the right side). First we make and compile a recipe file for a table leg:


Next we will combine four legs plus a cylindrical plate to a full table (we set the variable Label to keep, which preserves different labels for the object’s parts):


If we set Label to unique it will tell the generator to give only the table parts unique labels (not the parts the parts consist of :) )


This labeling would make sense if you want to test for instance part segmentation algorithms like LCCP.

Now let us combine the table and two objects. We have multiple options for the labeling now: First: Each object gets its own label:


Second: Give the parts of the objects their own label:


Or keep the labels of the parts and parts of parts unique :) :


Just to mention: All points have normal information attached:

Wednesday, August 13, 2014

Initially, the main objective of my project was to quickly obtain skeleton information using the GPU People module and to focus on the construction of the classification framework to recognize human actions. As it turned out that strong modifications of the gpu/people module were necessary in order to obtain the desired output of skeleton positions, the main direction of this project has moved more towards extending and improving the gpu/people module. In the end, a simple action recognition framework was implemented, using a K-Nearest-Neighbour classifier on the positions and angles of selected joints, and RGB-D data of people performing the defined actions was collected for training.

The project had the following major contributions:

  • Calculation of the skeleton joint positions and their visualization (in the new People App the skeleton is visualized by default)
  • Using the Ground Plane People Detector in order to segment the body before the body labeling is applied. Without this modification, the pose detector has big problems with large surfaces (walls, floor ...) and works robustly only at close range. Besides this, the speed of the detector is increased, as most of the voxels obtain very high depth values
  • Tracking of the skeleton joints (see my previous post). An Alpha-Beta filter is applied to the measured joint positions.
  • Offline skeleton detection from the recorded LZF files.
  • Classification of skeleton data sequences with K-Nearest-Neighbours

The source code of the new version of the gpu/people module can be downloaded here: https://github.com/AlinaRoitberg/pcl/tree/master/gpu/people

Building the app requires PCL installation with GPU enabled. For more information, see the PCL information on compiling with GPU and the tutorial for the initial /gpu/people module

The new application can be executed in the same way as the initial PeopleApp, while new functions can be activated with the following additional flags:

-tracking <bool> activate the skeleton tracking
-alpha <float> set tracking parameter
-beta <float> set tracking parameter
-lzf <path_to_folder_with_pclzf_files> use recorded PCLZF files instead of live data
-lzf_fps <int> fps for replaying the lzf-files (default: 10)
-segment_people <path_to_svm_file> activates body segmentation

Tree files for people detection: https://github.com/PointCloudLibrary/data/tree/master/people/results

SVM file for body segmentation (Ground Plane People Detector): https://github.com/PointCloudLibrary/pcl/tree/master/people/data

Example of execution (active segmentation, no tracking, using live data):

./pcl_people_app -numTrees 3 -tree0 ../tree_files/tree_20.txt -tree1 ../tree_files//tree_20_1.txt -tree2 ../tree_files/tree_20_2.txt -segment_people ../svm_path/trainedLinearSVMForPeopleDetectionWithHOG.yaml

Example of execution (active segmentation and tracking, using recorded PCLZF data)

./pcl_people_app -numTrees 3 -tree0 ../tree_files/tree_20.txt -tree1 ../tree_files//tree_20_1.txt -tree2 ../tree_files/tree_20_2.txt -segment_people ../svm_path/trainedLinearSVMForPeopleDetectionWithHOG.yaml -lzf /path_to_lzf_files/ -lzf_fps 10 -tracking 1

Take a look at the video demonstrating the modifications of the skeleton tracker (sorry for the format:) ): http://youtu.be/GhCrK3zjre0

Further improvements of the skeleton detection, data collection and activity recognition framework
Tuesday, August 12, 2014

Joint tracking

Although the body segmentation solved the problems with full body visibility, the resulting joint positions were still not good enough. Besides the Gaussian noise, big “jumps” occurred if a blob with a completely wrong position was suddenly labelled as the corresponding body part.

To solve this, I implemented a simple tracking function (Alpha-Beta-Filter) on top of the joint calculation. The filter estimates the new position based on the predicted (calculated from the previous position and velocity) and measured position. The weight of the measured position is given by the parameter alpha, while beta shows the weight of the velocity update.
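The update rule of an alpha-beta filter can be sketched for a single coordinate as follows. This is a generic textbook form, not the code added to the gpu/people module; the class name, the time step handling and the parameter values in the usage example are assumptions.

```python
class AlphaBetaFilter:
    """Alpha-beta filter for one coordinate of a joint position."""

    def __init__(self, alpha, beta, initial_position, initial_velocity=0.0):
        self.alpha = alpha
        self.beta = beta
        self.position = initial_position
        self.velocity = initial_velocity

    def update(self, measured_position, dt):
        # Predict from the previous position and velocity.
        predicted = self.position + self.velocity * dt
        residual = measured_position - predicted
        # Blend prediction and measurement; alpha weights the measurement.
        self.position = predicted + self.alpha * residual
        # beta weights how strongly the residual corrects the velocity.
        self.velocity += (self.beta / dt) * residual
        return self.position


f = AlphaBetaFilter(alpha=0.5, beta=0.1, initial_position=0.0)
print(f.update(1.0, dt=1.0))  # → 0.5
```

A 3D joint would use one such filter per coordinate. Since `dt` enters both the prediction and the velocity correction, the filter behaviour depends on the frame rate.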

Activation of tracking, alpha and beta parameters can be set in the PeopleApp.

The parameters should be chosen wisely. The current default values worked well for me, but they also depend on the frame rate, which is not easily predictable as it depends on the GPU.

Other changes to the skeleton calculation

Integration of tracking improved the results; however, problems still occurred if the voxel labelling failed. Unfortunately, this happened a lot with the hands if they were very close to the rest of the body (and could hardly be distinguished).

That’s why I added some correction based on the position of the elbows and the forearms. I estimate the expected position based on the positions of those body parts and if the measured position is too far away, the predicted result is used.

Besides, I discovered the global variable AREA_THRES2 (in people_detector.cpp); increasing it from 100 to 200 improved the labelling. Increasing it reduces the noise, but also raises the probability that small blobs (like hands) will be missed.

Data collection

As I have mentioned in my previous post, I wanted to collect the skeleton information from the recorded Kinect data in order to use the full framerate. To do so, I extended the people app with the option of using recorded PCLZF videos instead of the live Kinect data.

The positions of the skeleton joints can be stored in a txt file with the following flag: -w <bool val>

The data format is defined as follows: SeqNumber Timestamp Pos1_x Pos1_y Pos1_z Pos2_x Pos2_y Pos2_z
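A line in this format could be read with a few lines of Python. The sketch assumes whitespace separated fields and keeps the timestamp as a string, since its exact representation is not specified here; the function name is hypothetical.

```python
def parse_skeleton_line(line):
    """Split one record into (sequence number, timestamp, list of joint positions)."""
    fields = line.split()
    if len(fields) < 2 or (len(fields) - 2) % 3 != 0:
        raise ValueError("expected SeqNumber Timestamp followed by x y z triples")
    seq_number = int(fields[0])
    timestamp = fields[1]  # kept as text; the timestamp format is an unknown here
    coords = [float(value) for value in fields[2:]]
    joints = [tuple(coords[i:i + 3]) for i in range(0, len(coords), 3)]
    return seq_number, timestamp, joints


print(parse_skeleton_line("7 1407850000 0.1 0.2 1.5 -0.3 0.4 1.6"))
```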

I recorded the Kinect data of ten people performing the activities (defined in my previous post) multiple times. Afterwards I segmented the PCLZF files, with each directory containing one execution of an action, and ran the people detector on each of them (I had a lot of fun with it :)). As copying the files and running the skeleton detector really took me “forever”, I had to stick with a part of the data and focus on the activity detection framework. The current framework contains segmented training data from 6 participants (269 recordings altogether). I will of course add the rest of the data, but I doubt that this will happen before the end of GSoC.

Another file format was defined for the training data, which includes all recorded skeletons, annotation of the action and UserID.

Classification Framework

I implemented a simple K-Nearest-Neighbours algorithm to classify a skeleton sequence. As all sequences have different lengths and the number of frames is high, one has to find a way to calculate a good feature vector. In the implemented framework, classification happens in the following way:

  • The skeleton data that was read is divided into a fixed number of segments
  • For each segment the joint positions are calculated as the mean value over all frames in the segment
  • For each resulting skeleton (one per segment) further feature selection takes place: the decision which joint positions and angles to use
  • All joint positions are calculated in relation to the Neck position (and not the sensor)
  • The height of the person is estimated (difference between the head and the feet in the first or last segment) and the positions are normalized accordingly
  • Mean and variance of each dimension of the feature vector is calculated over the whole training set and the data is normalized accordingly (this is important in KNN, to let all dimensions contribute in the same way).
  • The feature vectors can be classified with K-NN (1-NN by default).
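The steps above can be sketched in a few functions: averaging over a fixed number of segments, normalizing each dimension over the training set, and 1-NN classification. The sketch leaves out joint selection, the Neck-relative coordinates and the height normalization, and its function names are assumptions rather than the framework's API.

```python
import math


def segment_means(frames, num_segments):
    """Average per-frame feature lists over a fixed number of segments.

    Returns one flat feature vector (num_segments * dims entries), so
    sequences of different lengths become comparable.
    """
    n = len(frames)
    features = []
    for s in range(num_segments):
        start, end = s * n // num_segments, (s + 1) * n // num_segments
        chunk = frames[start:end] or [frames[min(start, n - 1)]]
        for d in range(len(chunk[0])):
            features.append(sum(frame[d] for frame in chunk) / len(chunk))
    return features


def fit_normalizer(vectors):
    """Mean and standard deviation of every dimension over the training set."""
    dims, count = len(vectors[0]), len(vectors)
    means = [sum(v[d] for v in vectors) / count for d in range(dims)]
    stds = []
    for d in range(dims):
        variance = sum((v[d] - means[d]) ** 2 for v in vectors) / count
        stds.append(math.sqrt(variance) or 1.0)  # avoid division by zero
    return means, stds


def normalize(vector, means, stds):
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]


def classify_1nn(sample, training_vectors, labels):
    """Label of the training vector closest to the sample."""
    distances = [math.dist(sample, t) for t in training_vectors]
    return labels[distances.index(min(distances))]
```

Leave-One-Out cross validation can be built on top of this by classifying each training vector against all the others and counting the predicted versus annotated labels in a confusion matrix.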

The application can classify a new sample as well as perform Leave-One-Out cross validation on the training set and print out the confusion matrix.

Currently I am getting the following results:


The recognition rate of 34% might not sound that great. However one should consider the high number of actions and the fact that the skeleton data has some problems with joint positions, especially with the hands, which makes the recognition very challenging. The people detector also has some severe problems with unusual poses.

Further data processing, feature selection and more complex classification methods might improve the performance significantly in the future.

Second implementation of Code “Segmentation and creation of object candidates” (Third post)
Monday, August 11, 2014
  • Introduction

The implementation of the Efficient Graph-Based Segmentation is based on:

Pedro F. Felzenszwalb and Daniel P. Huttenlocher, “Efficient Graph-Based Image Segmentation”, International Journal of Computer Vision, Volume 59, Number 2, September 2004
  • Work

Due to license problems, I had to re-implement the segmentation code. I also needed the same code to work on meshes and point clouds. To use it on images, we need to create a graph based on the image. To improve the results, the paper recommends smoothing the image before the segmentation; I use the Gaussian filter implemented in PCL for this purpose. After the code was written, and with access to the original code, I ran some tests to check whether the segmentation was good.

The first test was the segmentation without the Gaussian filter, for each case. After that, I segmented the same picture with the Gaussian filter applied.
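For readers who do not know the paper, the core merging rule of the graph-based segmentation can be sketched with a union-find structure. This is a compact illustration of the published algorithm, not the re-implemented PCL code; the edge list format and the function name are assumptions.

```python
def segment_graph(num_vertices, edges, k):
    """Felzenszwalb-Huttenlocher style merging.

    edges: list of (weight, a, b). Returns a component label for every vertex.
    Two components are merged when the connecting edge is no heavier than the
    internal difference of either component plus k / component size.
    """
    parent = list(range(num_vertices))
    size = [1] * num_vertices
    internal = [0.0] * num_vertices  # largest edge weight inside each component

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for weight, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        if weight <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = weight  # edges arrive sorted, so this is the maximum
    return [find(v) for v in range(num_vertices)]


# A 1D "image" with intensities 10, 11, 12, 100, 101 and neighbour edges.
edges = [(1, 0, 1), (1, 1, 2), (88, 2, 3), (1, 3, 4)]
labels = segment_graph(5, edges, k=5)
print(labels[0] == labels[2], labels[2] == labels[3])  # → True False
```

For an image, the vertices would be the pixels and the edge weights the intensity differences between neighbouring pixels after smoothing; `k` plays the role of the algorithm threshold.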

  • Results

Without Gauss Filter(Original Code)


As we can see in this image, there are a lot of segments, but if we compare it with the next image, from the adapted code, the segments are the same in both. The colors are different because the colorization is generated randomly.

Without Gauss Filter(Adapted Code)


Then a test with a sigma of 0.5 for the Gaussian filter and a threshold of 500 for the algorithm was run on the same image for both cases.

With Gauss Filter (Original Code)


With Gauss Filter (Adapted Code)


Here we can see that there are some differences. After some research in the code, I found that there are minimal differences when we compute the weights for the graph, which suggests that the smoothing of the image is different in the two cases. The difference is small, a few decimals, but it creates a different result. I did not investigate further, because it is not part of the GSoC task.

  • Point Cloud segmentation

Once we have the graph segmentation working, the next step is to make the algorithm work with point clouds. The only thing needed for that is to create a new graph based on the difference of normals or curvature.

As we now know that the graph segmentation part works, we only need to check whether the graph is generated correctly. To do so, I apply the segmentation to a mesh and return the point cloud, which contains some segments. Each segment gets a unique color so we can see whether it is correct.
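One way to build such a graph is to weight each edge by the angle between the normals of the two connected points. The sketch below assumes unit length normals and a given list of neighbour pairs (for example the edges of the mesh); the weighting actually used in the implementation may differ.

```python
import math


def normal_difference_edges(normals, neighbor_pairs):
    """Return (weight, a, b) edges, weighted by the angle between unit normals."""
    edges = []
    for a, b in neighbor_pairs:
        dot = sum(x * y for x, y in zip(normals[a], normals[b]))
        dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
        edges.append((math.acos(dot), a, b))
    return edges


normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
print(normal_difference_edges(normals, [(0, 1), (1, 2)]))
```

Edges inside a smooth surface get weights near zero, while edges crossing a crease get large weights, so a graph-based segmentation tends to cut along the creases.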


The original mesh


The mesh segmented


The results are correct: this mesh contains a lot of different objects, and these objects are well segmented, so each segment can be associated with one possible object in the scene. The different measures will be applied to each of these segments to test whether it is a possible object or not.

3. Some real results! (a.k.a. Not hacks!)
Monday, August 11, 2014

In this post I will show you some results I got yesterday of our demo code working in a slightly cluttered scenario and fitting superquadrics to household objects.

Here is the environment we wish to recognize:


Here are the fitting results:


Here is a bit more detail with the colored pointclouds:


And here the fitted superquadrics alone for a better view:


The code used for the results shown in the rest of this post can be found here.

So, how does this code work? A general overview would go like this:

  1. Connect your Kinect to your PC :0}
  2. A snapshot of a pointcloud (captured by OpenNI grabber) is taken (demo.cpp)
  3. The pointcloud captured is segmented (table + objects) in clusters (demo.cpp)
  4. The clusters (one-view) are mirrored based on [Bohg2011] (tabletop_symmetry/mindGapper.cpp)
  5. The mirrored clouds are then fitted to a SQ (SQ_fitter_test1.cpp)
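As background for the last step, superquadric fitting is commonly formulated around the inside-outside function, which is 1 on the surface, below 1 inside and above 1 outside. The sketch below shows that standard function for a superquadric in its canonical pose; it is not taken from SQ_fitter_test1.cpp, and the parameter names are the usual textbook ones (a, b, c for the axis lengths, e1 and e2 for the shape exponents).

```python
def superquadric_inside_outside(point, a, b, c, e1, e2):
    """Inside-outside function of a superquadric centered at the origin."""
    x, y, z = (abs(v) for v in point)
    xy_term = (x / a) ** (2.0 / e2) + (y / b) ** (2.0 / e2)
    return xy_term ** (e2 / e1) + (z / c) ** (2.0 / e1)


# With e1 = e2 = 1 and a = b = c = 1 the superquadric is the unit sphere.
print(superquadric_inside_outside((1.0, 0.0, 0.0), 1, 1, 1, 1, 1))  # → 1.0
print(superquadric_inside_outside((0.0, 0.0, 0.5), 1, 1, 1, 1, 1))  # → 0.25
```

A least squares optimizer such as levmar or ceres would typically adjust the size, shape and pose parameters so that this function is close to 1 for all points of the mirrored cloud.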


  1. Using mirrored pointclouds helps the fitting process substantially.
  2. I have implemented the optimization process using two different libraries: levmar and ceres. While ceres has to be installed from an external repo, levmar can just be added to the code (sq_fitting/levmar). It only depends on lapack and it is lightweight. I still have to test how the two compare in processing time. The results I am attaching were obtained using the levmar optimization process.
  3. As you may have noticed, the milk carton was not fitted. I got a spurious result there because the mirroring code did not work well with that object.
  4. Fitting time (still in debug mode, not optimized and with warning printouts here and there) is < 2 seconds on my old laptop.
  5. You might notice that for one of the objects (the blue water container) the fitting produces a SQ longer than needed (it crosses the table). How to limit it to be shorter?
  6. For objects like the tuna can, in which the “bigger axis” (assumed to be Z) is horizontal, the fitting is not exactly as it should be (the revolution axis should point UP, but it is horizontal). How to deal with objects that are wider than they are tall?

Things to figure out:

  1. Levmar or Ceres? So far I am more inclined towards levmar since it is smaller and - other than lapack, which should be installed on most Linux machines - it can be modified. I know Ceres is fairly popular but I am not sure if we need all that power.
  2. Figure out the “wide” objects issue.
  3. Improve my mirror code for cases like the milk carton with bad viewing angles.
[Bohg2011] Bohg, Jeannette, et al. “Mind the gap - robotic grasping under incomplete observation.” Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011.
First implementation of Code “Objectness measures” (Second post)
Wednesday, July 30, 2014
  • Introduction

I begin with the main part of the code, which is probably the easiest to implement. In this part I write the code that computes these measures for each point cloud. The input is a point cloud (or a set of point clouds) and the output is a vector with the values of the resulting measures. To simplify the code and its use in other applications, this part is kept separate from the segmentation, which is also useful for anyone who wants to use a segmentation other than the one proposed in the paper.

  • “Objectness measures”

We have 5 different measures:

  • Compactness: This measure looks at how compact an object is. It searches for the minimum bounding sphere that contains all the surface of the mesh of the point cloud.
  • Symmetry: This measure analyzes the reflective symmetry along the three principal axes, based on the eigenvalues of the scatter matrix.
  • Smoothness: This measure evaluates the quality of the points of the segments: if the points of a segment are uniformly spread, it scores high; if they are unevenly spread, it scores low.
  • Local Convexity: This measure determines the convexity of each polygon edge in the mesh of the point cloud, and scores each segment by the percentage of its edges which are convex.
  • Global Convexity: This measure is generated from a convex hull: the mean distance between the points and the convex hull is recorded to create this measure.
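
As an illustration of how one of these measures can be computed, here is a minimal sketch of the local convexity score. It assumes that the mesh has already been reduced to a list of edges, each shared by two faces with known outward normals and centroids; the `SharedEdge` struct and the particular convexity test are assumptions made for this example and are not taken from the project code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& u, const Vec3& v) {
  return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

// One mesh edge shared by faces A and B (hypothetical layout):
// outward unit normals and centroids of the two faces.
struct SharedEdge {
  Vec3 normal_a, centroid_a;
  Vec3 normal_b, centroid_b;
};

// The edge is treated as convex when each face's centroid lies on or
// below the plane of the other face (assumed test).
bool isConvex(const SharedEdge& e) {
  const Vec3 ab = {e.centroid_b[0] - e.centroid_a[0],
                   e.centroid_b[1] - e.centroid_a[1],
                   e.centroid_b[2] - e.centroid_a[2]};
  return dot(e.normal_a, ab) <= 0.0 && dot(e.normal_b, ab) >= 0.0;
}

// Local convexity score: fraction of edges that are convex, in [0, 1].
double localConvexity(const std::vector<SharedEdge>& edges) {
  if (edges.empty()) return 0.0;
  std::size_t convex = 0;
  for (const SharedEdge& e : edges)
    if (isConvex(e)) ++convex;
  return static_cast<double>(convex) / static_cast<double>(edges.size());
}
```

The corner of a box (top face and side face) counts as convex, while the corner between a floor and a wall seen from inside the room counts as concave.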
  • “Future Work”

As future work, I need to implement the segmentation, which will provide the data for measuring the objects and testing this code.

Also, there is one measure missing, one that uses all the segments in different scenes to make a better classification.

This is the “recurrence”. It depends on the number of scenes and segments: it needs the segmentation first, and then analyzes the number of similar segments among the objects in all scenes. Objects with similar segments should be in the same category.

GRSD Descriptor computation and analysis
Monday, July 28, 2014

The Global Radius-based Surface Descriptor (GRSD) concatenates the RSD descriptor, as discussed in the previous post, to represent the complete object. GRSD gives a good description of the 3D shape of the object. Below is a set of objects and their GRSD descriptors, i.e. histograms. I have used University of Washington’s “Large scale RGBD dataset” for the experiments.

For an object whose surface is planar but has 2 different planes in the view
_images/box_image.png _images/box.png
For an object whose surface is planar but has 1 plane in the view
_images/plane_image.png _images/plane.png
For an object whose surface is spherical
_images/sphere_image.png _images/sphere.png
For an object whose surface is cylindrical but doesn’t have any planar surface in view
_images/cylinder_image.png _images/cylinder.png

It can be seen that all the descriptors are different from each other. Planes and box surfaces are similar, as the surface characteristics are similar in this case. Both GRSD and RSD are pushed into the pcl-trunk for people to use. The test files for these two features are also included in the trunk to show their basic usage.

Currently working on the NURBS for small surface patches. Since NURBS are already available in PCL we will be looking at how to tailor the same for our needs. After this we plan to work on the features that compute the relationship between the surface patches.

Object discovery in KINFU DATA (First post)
Sunday, July 20, 2014
  • Introduction

The principal goal of this project is to implement the algorithm and app developed in:

- Fei-Fei, L., Karpathy, A., Miller, S. “Object discovery Via Shape Analysis”.

This automatic object discovery will be useful for robotics or autonomous vehicles, and it will be able to find different classes of objects. This new approach compares different parameters of an object to classify it into a class and to differentiate the objects. The parameters that define an object are based on the “Objectness measures” part of the paper. This is the core of the app and its most important part. It is also probably a good place to start coding, because it is quite easy to implement.

Rigid and Non-Rigid Transformation ( Second phase )
Thursday, July 17, 2014
  • Introduction

    In the previous phase, it was presented how to obtain a statistical model from a set of face-meshes. The next step in our project is to “match” the mean face of the database, with the face of a random person, like the one in the picture below:


    The matching is done by alternately applying the following methods.

  • Rigid Registration

    This method is very similar to the Iterative Closest Point (ICP) algorithm, because the goal is to estimate a rotation matrix and a translation vector that move the average face to an optimal position, near the face captured by the Kinect. Basically, it is required to minimize the error \epsilon =  \sum ||\vec {y} - (R \cdot \vec{x} + \vec{t})||^2 and this is done by calculating the solution of this system in the least-squares sense. In order to calculate this solution, the system is first linearized using the Jacobian matrix.
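
The error being minimized can be written down directly. The sketch below only evaluates \epsilon for a given R and \vec{t} over already matched point pairs; it does not perform the linearized solve. The type aliases and function names are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // rotation matrix stored row by row

// Applies the rigid transform R * x + t to one model point.
Vec3 transformPoint(const Mat3& R, const Vec3& t, const Vec3& x) {
  Vec3 out{};
  for (std::size_t r = 0; r < 3; ++r)
    out[r] = R[r][0] * x[0] + R[r][1] * x[1] + R[r][2] * x[2] + t[r];
  return out;
}

// epsilon = sum_i || y_i - (R * x_i + t) ||^2, where model[i] (x_i) is
// assumed to be already matched with scan[i] (y_i).
double alignmentError(const Mat3& R, const Vec3& t,
                      const std::vector<Vec3>& model,
                      const std::vector<Vec3>& scan) {
  assert(model.size() == scan.size());
  double err = 0.0;
  for (std::size_t i = 0; i < model.size(); ++i) {
    const Vec3 p = transformPoint(R, t, model[i]);
    for (std::size_t k = 0; k < 3; ++k) {
      const double d = scan[i][k] - p[k];
      err += d * d;
    }
  }
  return err;
}
```

Each registration iteration should make this value smaller; once it stops decreasing, the rigid alignment has converged.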

    Of course this process is applied iteratively, and below are presented a few stages of positioning of the model over the scan:

    _images/debug_1.png _images/debug_2.png _images/debug_4.png _images/debug_5.png
  • Non-Rigid Registration

    Once the model is roughly aligned, we need to modify the shape of the model to match the face from the scan. For this we make use of the eigenvectors computed in the previous phase and we calculate the optimal solution of this system: \vec {y} = P \cdot \vec{d} + \vec{model}, where P is the matrix of eigenvectors, \vec{model} is the current form of the model and \vec{d} is the vector of basis coefficients that need to be determined.

    However, there is one more constraint to be applied, and that is to minimize the sum \sum_i \frac{d_i}{\sigma_i}, where \sigma_i is the eigenvalue of the corresponding eigenvector. Therefore, to the Jacobian matrix of this system, we need to add a diagonal matrix with \frac{1}{\sigma_i} on the diagonal, multiplied by a certain weight.
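
One way to read this is as a stacked least-squares problem. The formulation below is an interpretation, not taken from the project code: it assumes the penalty enters in squared form, and the weight symbol w is introduced here for illustration.

```latex
\min_{\vec{d}} \;
\left\| \vec{y} - \vec{model} - P \cdot \vec{d} \right\|^2
+ w^2 \sum_i \left( \frac{d_i}{\sigma_i} \right)^2
\quad\Longleftrightarrow\quad
\begin{bmatrix} P \\ w \cdot \mathrm{diag}\!\left( 1/\sigma_i \right) \end{bmatrix}
\cdot \vec{d}
\;\approx\;
\begin{bmatrix} \vec{y} - \vec{model} \\ \vec{0} \end{bmatrix}
```

The extra rows are the weighted diagonal matrix appended to the Jacobian: a large w pulls every d_i toward zero (little deformation), and a small w lets the data term dominate.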

    The purpose of this regularization is to determine to what degree the face should be deformed. The eigenvectors are stored in the P matrix in decreasing order according to their eigenvalues, and their position in this sorting order determines whether they have a greater or a smaller influence on the shaping of the model. When the model is mostly overlapping with the face in the scan, more information can be drawn about the final figure, hence the weight specified above should be smaller. On the other hand, if the model is not yet aligned with the scan, the deformation should be smaller and thus the weight should be bigger. Below you can see how the model looks for several values of the weight:

    _images/weight1.png _images/weight2.png

    Notice that the shaping process tends to return the same effect if the weight of the regularizing constraint exceeds a certain value.

  • Results

    As mentioned above, these functions are applied alternately a few times, and the following results were obtained:


    The above picture was obtained after one iteration and the following one after 10:


    Also, below you can observe the precision of this method, the black figure representing the final version of the model and the green one representing the point cloud of the face:

Implementation of the shape generator
Tuesday, July 08, 2014

The work on the lccp implementation is mainly done and the pull request is awaiting its approval. Thanks a lot for the help of the pcl community (especially Sergey and Victor). Thanks to your comments the lccp algorithm is now in a much better shape. Talking about shapes: The next milestone in this GSOC project is the implementation of a shape generator, which can be used to create various labeled scenes that serve as unit tests or benchmarks for part and/or object segmentation algorithms. I have written a big part of the code already. Now the question is: To which module of pcl should this generator go? I am thinking about putting it into the geometry module. Any comment on this is more than welcome! The next image shows an assembled animal-like object which has been generated from simple geometric shapes (mainly geons).

2. Implementation of local-based approach for stereo matching
Saturday, June 28, 2014


In this post, I will briefly describe the current state of the stereo module and the new features added.

Currently, the stereo module encompasses two local matching algorithms: 1. Block-based algorithm, which is programmed using the Box-Filtering algorithm proposed in [McDonnell81]. 2. Adaptive Cost 2-pass Scanline Optimization, presented in [Wang06]. Both methods use the Sum of Absolute Differences (SAD) as the dissimilarity measure.

As mentioned in the previous blog post, the first objective of the present project is to implement the local-based approach proposed in [Min1], for dense correspondence estimation in a pair of grayscale rectified images with an efficient cost aggregation step. Additionally, the cost aggregation step is based on the method presented in [Yoon06], where the weighting function uses a similarity measure based on the color and spatial distances.


In order to do so, a new class CompactRepresentationStereoMatching was created in the stereo module. This class inherits from class GrayStereoMatching, which in turn inherits from class StereoMatching, since some pre and post-processing methods are re-implemented. The new class has five member functions with public access: setRadius, setFilterRadius, setNumDispCandidates, setGammaS and setGammaC, which set three data members of type int (radius, filter_radius and num_disp_candidates) and two of type double (gamma_c and gamma_s) with private access, as well as implementing the virtual method compute_impl.

radius corresponds to the radius of the cost aggregation window, with default value equal to 5.

filter_radius corresponds to the radius of the box filter used for the computation of the likelihood function. The default value is 5.

num_disp_candidates is the number of the subset of the disparity hypotheses used for the cost aggregation. The default value is 60.

gamma_s is the spatial bandwidth used for cost aggregation based on adaptive weights. The default value is 15.

gamma_c is the color bandwidth used for cost aggregation based on adaptive weights. The default value is 25.

Similarly to the previous methods, the current class is based on the SAD matching function, and it estimates the per-pixel cost efficiently using the Box-Filtering algorithm.

To test the algorithm, the Middlebury stereo benchmark (http://vision.middlebury.edu/stereo/) dataset is going to be used.

[McDonnell81] McDonnell, M. J. “Box-filtering techniques”. Computer Graphics and Image Processing 17.1, 65-70, 1981.
[Wang06] Wang, Liang, et al. “High-quality real-time stereo using adaptive cost aggregation and dynamic programming.” 3D Data Processing, Visualization, and Transmission, Third International Symposium on. IEEE, 2006.
[Yoon06] K.-J. Yoon and I.-S. Kweon. “Locally Adaptive Support-Weight Approach for Visual Correspondence Search”. In Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 924–931, 2005.
Bayesian Approach for fitting Geometric Models
Thursday, June 26, 2014

Hola! This is my first blog post and I’ll start directly with a question:

Why should one still be using the RANSAC paradigm in its vanilla form for geometric model segmentation in these days of modern multi-core computer systems?

I’ll argue that on a multi-core system, a simple Monte-Carlo simulation based alternative, Parallel Sampling Consensus (PaSAC), combined with a model selection step, outperforms vanilla RANSAC in most cases. However, RANSAC may be better if the data is quite clean, i.e. contains very few outliers. In this case RANSAC can attain the required probability of accepting a sampled model within very few iterations, and thus does not run the maximum number of iterations. In all other cases, I believe PaSAC is far better, at least in terms of speed. As simple as it is, below is a sketch of this algorithm:

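// draw the samples; the iterations do not depend on each other
for (int i = 0; i < max_iterations; i++)
{
        sample[i] = get_sample(); // similar to sampling from a proposal distribution
        weight[i] = likelihood_estimation(sample[i]); // compute the weight of the proposal
}
// weight normalization [optional]; needs <algorithm>, assumes weight is a plain array
max_weight = *std::max_element(weight, weight + max_iterations);
for (int i = 0; i < max_iterations; i++)
        likelihood[i] = weight[i] / max_weight;
// model selection: get the MAP sample, i.e. the sample with the highest weight
map_sample = get_sample_with_max_weight(sample, weight);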

As always, simple things first and a bit of theory. I’ll take plane segmentation as a running example to demonstrate the performance boost you can get on a multi-core system. The above algorithm draws so-called independent and identically distributed (i.i.d.) samples. These samples, combined with their weights, give us an approximation of a distribution over planes. This idea of drawing i.i.d. samples to approximate a probability distribution function (pdf) is the core of Monte-Carlo simulation methods and is one of the main thrusts of my project. Having the pdf, we can easily make inferences. For the geometric model segmentation problem, the major goal is to estimate the maximum a posteriori (MAP) sample. This is just the sample with the highest weight. Also of interest is the minimum mean square error (MMSE) estimator, which is particularly important in tracking applications and uses the optional weight normalization stage in the code snippet above. On modern multi-core computer systems, we can run the sampling loop of the above algorithm in parallel, since all the samples are i.i.d. and there is no coupling from one sample to the other within the loop, as there is in the RANSAC case.
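
To make the plane example concrete, here is a minimal, self-contained sketch of the sampling and MAP selection steps. It is not the code of the mcmc module: the `Plane` struct, the function names, the per-iteration seeding and the MSAC-style weight (threshold squared minus squared distance, summed over inliers) are all assumptions made for illustration. The loop is written sequentially, but since every iteration uses its own random generator and writes only to its own slot, it could be handed to a thread pool as is.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Vec3 = std::array<double, 3>;

struct Plane { Vec3 n; double d; };  // points x on the plane satisfy n.x + d = 0

// Plane through three points; returns false for (near-)collinear triples.
bool planeFromPoints(const Vec3& a, const Vec3& b, const Vec3& c, Plane& out) {
  const Vec3 u = {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
  const Vec3 v = {c[0] - a[0], c[1] - a[1], c[2] - a[2]};
  Vec3 n = {u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]};
  const double len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
  if (len < 1e-12) return false;
  for (double& x : n) x /= len;
  out.n = n;
  out.d = -(n[0] * a[0] + n[1] * a[1] + n[2] * a[2]);
  return true;
}

// MSAC-style weight (assumed form): inliers closer to the plane count more.
double msacWeight(const Plane& p, const std::vector<Vec3>& cloud, double threshold) {
  double w = 0.0;
  for (const Vec3& q : cloud) {
    const double dist = std::fabs(p.n[0] * q[0] + p.n[1] * q[1] + p.n[2] * q[2] + p.d);
    if (dist < threshold) w += threshold * threshold - dist * dist;
  }
  return w;
}

// Draws max_iterations independent plane hypotheses and returns the MAP one.
Plane pasacPlane(const std::vector<Vec3>& cloud, int max_iterations,
                 double threshold, unsigned seed) {
  assert(cloud.size() >= 3 && max_iterations > 0);
  std::vector<Plane> sample(static_cast<std::size_t>(max_iterations));
  std::vector<double> weight(static_cast<std::size_t>(max_iterations), -1.0);
  // No iteration reads what another one wrote, so this loop is parallelizable.
  for (int i = 0; i < max_iterations; ++i) {
    const std::size_t k = static_cast<std::size_t>(i);
    std::mt19937 rng(seed + static_cast<unsigned>(i));  // independent generator
    std::uniform_int_distribution<std::size_t> pick(0, cloud.size() - 1);
    const std::size_t ia = pick(rng);
    const std::size_t ib = pick(rng);
    const std::size_t ic = pick(rng);
    if (planeFromPoints(cloud[ia], cloud[ib], cloud[ic], sample[k]))
      weight[k] = msacWeight(sample[k], cloud, threshold);  // degenerate: stays -1
  }
  // Model selection: the MAP sample is the one with the highest weight.
  std::size_t best = 0;
  for (std::size_t k = 1; k < weight.size(); ++k)
    if (weight[k] > weight[best]) best = k;
  return sample[best];
}
```

Unlike RANSAC, there is no early exit and no shared "best so far" state inside the loop, which is exactly what makes the iterations independent.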

At the moment, I’ve made a copy of the Sample Consensus Module, renamed it to mcmc module and implemented an mcmc segmentation class for most of the popular model types (Plane, Cylinder, Sphere, Line, ...) already defined in the sample consensus and segmentation modules. I parallelized the PaSAC algorithm using the thread-pooling concept of boost.asio and boost.thread.

The figures below show the performance analysis on three datasets obtained from dense image matching and LiDAR. I found it a bit more interesting to test these algorithms on huge data sets containing millions of 3D points.

Dataset “Dublin” with over 8 million 3D points from LiDAR.

_images/dublincloud.png _images/dublin.png

Dataset “Facade” from dense image matching and downsampled to 5 million points.

_images/facadecloud.png _images/facade.png

Dataset “Building” from dense image matching and downsampled to 200000 points.

_images/buildingcloud.png _images/building.png

The runtime performance analysis of MSAC from pcl vs. PaSAC was made for the plane segmentation example. For these results, a DELL Precision 650 with 8 cores, running Windows 7 and VS2010, was used. All time measurements were done using pcl::console::TicToc. An MSAC-based likelihood function was used for PaSAC. Notice the effect of increasing the inlier threshold on the runtime.

In my next post, I’ll discuss my implementation of the three major Markov Chain Monte Carlo (MCMC) algorithms, namely Importance Sampling (very similar to PaSAC), Metropolis-Hastings (MH) and the more general reversible jump MCMC (rjMCMC). All discussions will be focused on how to use the MCMC algorithms in the sequentially evolving data scenario, i.e. tracking (yes, we are aware of the Tracking library in PCL), and on fitting competing models (e.g. fitting curves using splines with an unknown number and locations of knots or control points, etc.). Also, since mcmc samples are drawn from the model space rather than from the data space as in RANSAC, it might be challenging to just extend the existing SampleConsensusModel class.

RSD Feature computation and analysis
Thursday, June 26, 2014

RSD Feature is a local feature histogram that describes the surface local to a query point. There is a PCL implementation of it available in the features folder. With the help of my mentor I understood the algorithm by which this feature is obtained. To verify that it works correctly, we took a real object whose radius is known and ran the RSD computation on the entire point cloud of the object. This gives the RSD Feature Histogram for all the points in the pointcloud. We can also get the min and max radius of the local surface patch around each point in the pointcloud. I tried various combinations of parameters to see how the computed radius varies. Below is the object used, which has a radius of 3.5 cm (0.035 m).


Below are some of the params chosen and their corresponding effect on the min and max radius in the local surface patch of each point.

For Normal Radius search = 0.03

Max_radius = 0.7 (maximum radius after which everything is plane)

RSD_radius search = 0.03


For Normal Radius search = 0.03

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03


For Normal Radius search = 0.02

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03 - This is found to be good way for generating histograms


I tried to do MLS smoothing on the point cloud data and then compute the RSD feature, which makes the normal computation better and results in consistency over all the points on the object surface.

For Normal Radius search = 0.03

Max_radius = 0.7 (maximum radius after which everything is plane)

RSD_radius search = 0.03


For Normal Radius search = 0.03

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03


For Normal Radius search = 0.02

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03 - This is found to be good way for generating histograms


Now I tested out how the actual feature looks like at a point on the sphere to check if it matches with the histogram in the paper. The same is compared between raw point cloud from the kinect and MLS smoothened point cloud. Below is the result of the same.


It was really hard to fix the previous image so that it shows the histograms with values and at a good resolution. So below are snapshots of the spherical and cylindrical surfaces.

Cylindrical Surface:


Spherical Surface:


The next post will have the details of the GRSD results and how they differentiate the characteristics of two surfaces. GRSD code from the author will be integrated into the PCL code base. We also plan to categorize the pipeline into modules that fit into the PCL code base as features, surface and segmentation sections. This information will be posted in the next post.

Modifications of the gpu/people module and first steps for data collection
Wednesday, June 25, 2014

As the first weeks of GSOC are over I am going to summarize my progress so far. The goal of my project is to apply machine learning techniques on collected skeletal data for activity recognition. Most research in this area has been using Nite or Kinect SDK for skeleton tracking. As PCL already has a pose detector available, we want to try using it to collect skeletal information, which however, requires some modifications of the gpu/people module, which was the major focus of my work until now.

First steps

Before the actual start of coding it was important to research the existing methods for action recognition. We decided to implement a novel approach published by Ferda Ofli, Rizwan Chaudhry, Gregorij Kurillo, René Vidal, Ruzena Bajcsy - “Sequence of the Most Informative Joints (SMIJ): A new representation for human skeletal action recognition”. In the presented paper, skeleton joints are ranked based on their variance and the sequence of the resulting rankings is used as the input feature vector for classification.
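
The ranking step can be sketched as follows. This is only an illustration of "rank joints by variance" for a single temporal window, not the implementation from the paper or from this project; the input layout (one scalar signal per joint and frame, for example a joint angle) and the function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// frames[t][j] holds one scalar value for joint j at time t (assumed layout).
std::vector<double> jointVariances(const std::vector<std::vector<double>>& frames) {
  assert(!frames.empty());
  const std::size_t joints = frames.front().size();
  const double n = static_cast<double>(frames.size());
  std::vector<double> mean(joints, 0.0), var(joints, 0.0);
  for (const auto& f : frames)
    for (std::size_t j = 0; j < joints; ++j) mean[j] += f[j] / n;
  for (const auto& f : frames)
    for (std::size_t j = 0; j < joints; ++j) {
      const double d = f[j] - mean[j];
      var[j] += d * d / n;
    }
  return var;
}

// Returns joint indices ordered from most to least informative
// (highest variance first) for one temporal window.
std::vector<std::size_t> rankJoints(const std::vector<std::vector<double>>& frames) {
  const std::vector<double> var = jointVariances(frames);
  std::vector<std::size_t> order(var.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&var](std::size_t a, std::size_t b) { return var[a] > var[b]; });
  return order;
}
```

Running `rankJoints` on consecutive windows of a recording and concatenating the resulting index lists gives the kind of ranking sequence that is then fed to the classifier.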

The next step was to make a draft of activities that we are planning to recognize. We decided to stick with the set of actions proposed by the IAS-Lab of University of Padua, which includes: check watch, cross arms, kick, get up, pick up, point, punch, scratch head, sit down, stand, throw from bottom up, throw over head, turn around, walk and wave.

Trying out the pcl/gpu/people module

Obtaining skeleton data with the gpu/people module was not as easy as it seemed at first sight. After the challenge of compiling the source with GPU enabled and making it run with Openni, the detector worked with the following configuration:

  • Ubuntu 14.04 LTS
  • CUDA-toolkit 5.5.22
  • Openni 1.5.4
  • Avin2 Sensor Kinect driver v 5.1.2

The tree files necessary for the detection were not provided in trunk and should be downloaded here: https://github.com/PointCloudLibrary/data/tree/master/people/results .

The pose detector runs on the RGB and Depth data obtained from an RGB-D sensor (we use Microsoft Kinect) and produces a colour labelling of each pixel, depicting the corresponding body part. After testing the detector, the following observations should be mentioned:

  • The people detector worked very well in certain positions, the best case includes frontal orientation, near-range with no walls or ground visible.
  • Despite the picture in the tutorial, the current implementation does not provide positions of the skeletal joints, which was also an issue for discussion. Consequently, obtaining the joint position is one of the challenges of this project.
  • The program has problems with large surfaces, mostly the walls and the floor, which are often labelled as the human. This issue also occurs in the official demo video. As our project requires full body visibility, it is necessary to fix this problem (especially when it comes to the floor).
  • In unusual positions, especially while turning, the body segmentation is usually correct, but the labelling often fails.

Trying out the pcl/gpu/people pose detector with one tree (left), three trees (middle), problems with large surfaces (right)

_images/3.png _images/1.png _images/2.png

Extending the pose detector to estimate the positions of the joints

At the beginning it was necessary to browse through the code in order to understand how the program works. The detection happens by calling the process() function of the PeopleDetector class. At the end of the function, the variable sorted2 (with type BlobMatrix) contains the data of each body-part-label.

There are 26 labels (currently 24 used) all together: Lfoot, Lleg, Lknee, Lthigh, Rfoot, Rleg, Rknee, Rthigh, Rhips, Lhips, Neck, Rarm, Relbow, Rforearm, Rhand, Larm, Lelbow, Lforearm, Lhand, FaceLB, FaceRB, FaceLT, FaceRT, Rchest, Lchest, Rshoulder, Lshoulder.

The labels Rshoulder and Lshoulder exist but are currently not implemented in the detection.

The PeopleDetector class was extended with an array containing the coordinates of the joints, and finding a way to calculate those coordinates was a major challenge. The first idea was to simply use the mean values of the corresponding blobs. In spite of the simplicity of this approach, the results were satisfying. The second idea was to use the buildTree() function, which estimates the optimal blob-tree starting from the neck, and then recursively browse through the child-blobs and use their mean values. The buildTree() function uses the “ideal” lengths of the limbs to estimate the optimal blobs (those optimal values are defined here). I also want to thank Koen Buys for giving me tips on the calculation.

As we are interested in the position of the limbs, using the mean values of the blobs is not always appropriate. For example, we are more interested in the upper border of the hip, which is connected to the torso, instead of the hip centroid. Besides this, we are also interested in the shoulder position, which was not implemented. The elbow label was also a special case as it usually has very small area and is often not detected. Consequently, I made some additional modifications to estimate those positions, which are described below.


  • Basic idea: Use the highest part of the chest blob as the shoulder position
  • The point cloud of the left/right chest blob is extracted.
  • The maximum Y-value of this point cloud is calculated
  • All the points of the chest-blob that have a Y-value close to the maximum (chosen threshold: 10 cm) are taken, their 3D mean value is calculated and used as the shoulder position
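
The shoulder steps above can be sketched in a few lines. This is an illustration only, not the code added to PeopleDetector: the point type, the function name and the assumption that coordinates are in metres (so that 10 cm becomes 0.10) are made up for this example, and it follows the text in taking the maximum Y-value as the top of the chest.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Point3 = std::array<double, 3>;  // x, y, z, assumed to be in metres

// Mean of all chest-blob points whose Y-value lies within `band`
// of the blob's maximum Y-value.
Point3 estimateShoulder(const std::vector<Point3>& chest_blob, double band = 0.10) {
  assert(!chest_blob.empty());
  double max_y = chest_blob.front()[1];
  for (const Point3& p : chest_blob) max_y = std::max(max_y, p[1]);

  Point3 sum = {0.0, 0.0, 0.0};
  std::size_t count = 0;  // at least 1: the maximum point always qualifies
  for (const Point3& p : chest_blob) {
    if (max_y - p[1] <= band) {
      for (std::size_t k = 0; k < 3; ++k) sum[k] += p[k];
      ++count;
    }
  }
  for (double& v : sum) v /= static_cast<double>(count);
  return sum;
}
```

Averaging over a band instead of taking the single highest point makes the estimate less sensitive to isolated mislabelled pixels at the blob border.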


  • If an elbow-blob already exists, nothing is done: the mean value is used.
  • Otherwise: The elbow is the point of the arm (upper arm) blob, which has the largest distance from the shoulder
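
The elbow rule can be sketched the same way. Again this is a hypothetical illustration with made-up names, not the project code: an empty `elbow_blob` stands for "no elbow blob was detected".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Point3 = std::array<double, 3>;

double squaredDistance(const Point3& a, const Point3& b) {
  double s = 0.0;
  for (std::size_t k = 0; k < 3; ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
  return s;
}

Point3 meanPoint(const std::vector<Point3>& blob) {
  assert(!blob.empty());
  Point3 m = {0.0, 0.0, 0.0};
  for (const Point3& p : blob)
    for (std::size_t k = 0; k < 3; ++k) m[k] += p[k] / static_cast<double>(blob.size());
  return m;
}

// If an elbow blob exists, use its mean; otherwise fall back to the point
// of the upper-arm blob that is farthest away from the shoulder.
Point3 estimateElbow(const std::vector<Point3>& elbow_blob,
                     const std::vector<Point3>& upper_arm_blob,
                     const Point3& shoulder) {
  if (!elbow_blob.empty()) return meanPoint(elbow_blob);
  assert(!upper_arm_blob.empty());
  Point3 farthest = upper_arm_blob.front();
  for (const Point3& p : upper_arm_blob)
    if (squaredDistance(p, shoulder) > squaredDistance(farthest, shoulder)) farthest = p;
  return farthest;
}
```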


  • The mean of the “lowest points” (within a certain threshold) of the hip-blob is used, not the mean of the whole blob. This modification was made because the blob itself covers the whole torso.

In general the quality of the joint position depends directly on the quality of labelling. As no tracking is implemented yet, the movement of the joints is not continuous.

Skeleton visualization: “good” example with “checking watch” action (left), labelling fails when the floor is visible (right)

_images/4.png _images/5.png

Using Ground Plane People Detector for body segmentation

As mentioned before, the original People Pose Detector has some problems with large surfaces, especially the floor. We tried to solve this problem by combining the original People Pose Detector with Ground Plane People Detector (implemented by my mentor, Matteo Munaro), to segment the body cluster before the actual labelling.

In the resulting application, three points of the ground plane are selected first, after which the Ground Plane People Detector removes the ground plane and estimates the point cloud belonging to the human. The points of the cluster are then transformed to the depth image, setting all other depth pixels to very high values.

Some additional corrections were added to improve the segmentation results (depth filtering, extending the legs, as too many ground floor points are removed). Additionally, the RGB and Depth calibration (8 pixel shift) is done as proposed by Lingzhu Xiang.

Using the Ground Plane People Detector improves the performance significantly if the full body visibility is required as it completely solves the large-surfaces-problem.

We also had to decide what to do if the Ground Plane People Detector does not detect the human (meaning that none of the detected clusters had a confidence over the defined threshold). In this case we use the segmentation from the last frame in which the person was detected.

Pose detector with full body visibility without (left) and with (right) segmentation.

_images/8.png _images/9.png

Examples of activities: cross arms, check watch, kick, point, turn, wave

_images/cross_arms.png _images/watch.png _images/kick.png _images/point.png _images/turn.png _images/wave.png

Storing the data

I am currently working on completing the framework for data collection. Storing skeletal information (3-D position of the joints) in TXT files is already implemented. People Pose Detector already includes optional storage of depth and RGB-data as PNG images. However, we decided to store the RGB and Depth images with a more efficient method using the lzf-format (thanks to Lingzhu Xiang for the tip). Another idea I am working on right now is to run the people pose detector offline on the stored images to use the full speed.

Statistical Face Model ( First phase )
Sunday, June 22, 2014
  • Introduction

    The goal of this project is to implement a program that will modify the expressions of several scanned faces according to the facial expressions captured by a RGBD camera.

    The first step is to create a statistical model based on a training database of faces. The training set used so far was the one provided by the FaceWarehouse project and it consisted of 3D meshes stored in .obj files. For further information, please consult the following link: http://gaps-zju.org/facewarehouse/

  • Approach

    For each face in the training set, a column vector S was created containing the coordinates of every vertex of the mesh. Afterwards, the average vector and the covariance matrix were calculated. Normally, the covariance matrix would be calculated as \frac{1}{m} \sum_{i=1}^{m} (S_i - \overline{S}) \cdot (S_i - \overline{S})^t, however one should note that this matrix is 34530 by 34530 and that only the most significant eigenvectors are required to compute the statistical model. To speed up the calculations, a matrix T was formed by joining the (S_i - \overline{S}) vectors as columns, and the eigenvectors of T^t \cdot T were calculated. It is important to note that the size of T^t \cdot T is determined by the number of faces and that the eigenvectors of the covariance matrix can be obtained by left multiplying T to the eigenvectors of T^t \cdot T. Once the eigenvectors are calculated, the statistical model is obtained according to the formula: S_{model} = \overline{S} + \sum_{i=1}^{m-1} \alpha_i \cdot s_i , where \alpha_i is the weight of an eigenvector, determined by multiplying a random number in the range [-2,2] with the corresponding eigenvalue. The final results of this phase are presented below. The average face is:


    And the model is:


    As you can see, the model obtained is a bit flattened compared to the mean face. That is because the majority of the faces in the training set are a bit rounded; however, this project needs a model that takes several types of faces into consideration, and this is why we need to consider the covariance of the samples in the database.
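
The eigenvector shortcut used in this approach follows from a short identity. The symbols C (covariance matrix), v and \lambda (an eigenvector and eigenvalue of T^t \cdot T) are introduced here for convenience; T holds the centered vectors as columns.

```latex
C = \frac{1}{m} \, T \cdot T^t ,
\qquad
T = \left[ \, S_1 - \overline{S}, \; \dots, \; S_m - \overline{S} \, \right]

T^t \cdot T \cdot v = \lambda \, v
\;\Longrightarrow\;
T \cdot T^t \cdot (T \cdot v) = \lambda \, (T \cdot v)
\;\Longrightarrow\;
C \cdot (T \cdot v) = \frac{\lambda}{m} \, (T \cdot v)
```

So every eigenvector v of the small m by m matrix yields an eigenvector T \cdot v of the large covariance matrix (to be normalized afterwards), and only an m by m eigenproblem has to be solved instead of a 34530 by 34530 one.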

  • Future Steps

    • For this model, only the vertices of the faces were used, however the texture coordinates also need to be taken into consideration. Unfortunately, the database does not provide any information about the colors as of yet. Once the data is available the model needs to be adapted for this feature
    • Once the statistical model is fully configured, a 3D registration algorithm must be applied to project the facial expression of a testing sample to the model.
  • References

    T. Vetter and V. Blanz, A Morphable Model For The Synthesis Of 3D Faces, Max-Planck-Institut, Tubingen, Germany

1. Reduction of computational redundancy in cost aggregation in stereo matching.
Saturday, June 21, 2014


A stereo image pair can be used to estimate the depth of a scene. To do so, it is necessary to perform pixel matching and find the correspondences in both images. Different methods for stereo correspondence have been proposed and they are classified in two classes:

  • Correlation-based algorithms: Produce a dense set of correspondences.
  • Feature-based algorithms: Produce a sparse set of correspondences.

Additionally, correlation-based algorithms are usually classified into two main groups: local (window-based) and global algorithms. However, some methods do not fit into either group and sit somewhere in between.

The current work is based on correlation-based algorithms, more specifically local, window-based methods, intended for applications where a dense and fast output is required.

The inputs to the algorithm are two calibrated images, i.e. the camera geometry is known. The images are also rectified in order to limit the correspondence problem to a 1D search.


The general methodology for local stereo vision approaches can be summarized as follows. An energy cost is computed for every pixel p = (x,y) by using the reference (left) image and the d-shifted right image:

(1) e(p,d) = \min \left(|I_{l}(x,y)-I_{r}(x-d,y)|, \sigma \right)

Then, the aggregated cost is computed by an adaptive weighted sum of the per-pixel cost over the neighborhood N(p):

(2) E(p,d) = \dfrac{\displaystyle \sum_{q \in N(p)}w(p,q)e(q,d)}{\displaystyle \sum_{q \in N(p)}w(p,q)}

Finally, a Winner-Takes-All method is used to find the best of all the disparity hypotheses:

(3) d(p) = \arg\min_{d} \{ E(p,d), d \in [ 0,..,D-1 ] \}
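
The three steps above can be sketched on a single scanline. This is a minimal illustration, not the sprint's implementation: the weights w(p,q) are all set to 1 (a plain box window), pixels whose match would fall outside the image are assumed to get the maximal cost \sigma, and all function names are made up for this example.

```python
def truncated_cost(left, right, x, d, sigma):
    """Equation (1) on one scanline: truncated absolute difference."""
    if x - d < 0:
        # Assumption: no pixel to compare against, so use the maximal cost.
        return sigma
    return min(abs(left[x] - right[x - d]), sigma)


def aggregate(left, right, x, d, sigma, radius):
    """Equation (2) with uniform weights w(p, q) = 1 over a 1D window."""
    window = range(max(0, x - radius), min(len(left), x + radius + 1))
    total = sum(truncated_cost(left, right, q, d, sigma) for q in window)
    return total / len(window)


def winner_takes_all(left, right, num_disparities, sigma=20, radius=1):
    """Equation (3): per pixel, keep the disparity with the lowest aggregated cost."""
    return [
        min(range(num_disparities),
            key=lambda d: aggregate(left, right, x, d, sigma, radius))
        for x in range(len(left))
    ]


# Toy scanline: the right image is the left image shifted by 2 pixels.
left = [10, 10, 50, 90, 30, 70, 10, 10]
right = left[2:] + [10, 10]
disparities = winner_takes_all(left, right, num_disparities=4)
```

Note that `aggregate` is evaluated once per pixel and per hypothesis d, which is exactly the repetition that makes the conventional approach expensive.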

This whole process is complex and time-consuming, since it is repeated for every hypothesis d. A representation of the conventional approaches can be observed in the next figure [Min1].


Min et al. [Min1] introduced a new methodology to reduce the complexity by finding a compact representation of the per-pixel likelihood, assuming that low values do not provide informative support. In this case, only a pre-defined number of disparity candidates per pixel are selected for the cost aggregation step. The subset of disparity hypotheses corresponds to the local maxima in the profile of the likelihood function, which is pre-filtered to reduce the noise, as shown in the following example:


The disparity hypothesis estimation and cost aggregation processes proposed by Min et al. are depicted in the next figure, where Sc is the subset of disparity hypotheses, with size Dc:
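
The candidate selection can be sketched as follows. This is only an illustration of the idea of keeping the strongest local maxima of a smoothed profile: the 3-tap box filter stands in for whatever pre-filter is actually used, and the function names and the toy likelihood values are assumptions made for this example.

```python
def smooth(profile):
    """3-tap box filter, a stand-in for the noise-reducing pre-filter."""
    out = []
    for i in range(len(profile)):
        window = profile[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out


def candidate_disparities(likelihood, max_candidates):
    """Return up to max_candidates disparities located at local maxima."""
    s = smooth(likelihood)
    last = len(s) - 1
    peaks = [
        d for d in range(len(s))
        if (d == 0 or s[d] > s[d - 1]) and (d == last or s[d] >= s[d + 1])
    ]
    peaks.sort(key=lambda d: s[d], reverse=True)  # strongest peaks first
    return sorted(peaks[:max_candidates])


# Per-pixel likelihood over 10 disparity hypotheses (made-up values).
likelihood = [0.1, 0.2, 0.9, 0.3, 0.1, 0.1, 0.6, 0.7, 0.2, 0.1]
subset = candidate_disparities(likelihood, max_candidates=2)
```

Here `subset` plays the role of Sc and `max_candidates` the role of Dc: cost aggregation would then run only over these hypotheses instead of all D of them.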

[Min1] Min, D., Lu, J., & Do, M. N. “A revisit to cost aggregation in stereo matching: How far can we reduce its computational redundancy?” In IEEE International Conference on Computer Vision (ICCV), 2011 (pp. 1567-1574).
1. Uniform sampling for superquadrics
Friday, June 20, 2014

In this post I will talk about superquadric uniform sampling. You might be wondering why you should care about sampling. Well, there are 3 reasons:

  • Synthetic data: To test our superquadric fitting algorithm (next section) we will start by using canonical superquadrics with some added noise as a baseline. Once a fitting algorithm works with these complete pointclouds, we will be able to proceed with more complex shapes (that are not necessarily superellipsoids but close enough to attempt to fit them).
  • Debugging purposes: Visualization of our results.
  • Beautiful math: The math of this part is simple, yet elegant. If you want to know the details, peek at the code (linked below) or read the paper mentioned later in this entry.

From last time, you might remember that superellipsoids can be expressed with the explicit equations:

x = a{\cos(\theta)}^{\epsilon_{1}}{\cos(\gamma)}^{\epsilon_{2}} \\
y = b{\cos(\theta)}^{\epsilon_{1}}{\sin(\gamma)}^{\epsilon_{2}} \\
z = c{\sin(\theta)}^{\epsilon_{1}}

Now, let’s say that we want to generate pointclouds of different superellipsoids. For now, let’s also assume we only care about canonical superellipsoids (no translation and no rotation). A first idea would probably be to just sample the values of \theta and \gamma at uniform steps to generate the 3D samples. Let’s see what happens if we use this simple approach:


You might notice that the results for the sphere and the oval shape are reasonably well distributed; however the results for the cylinder and the box are far from even. In general, the naive approach of uniformly sampling \theta and \gamma would work well for high values of \epsilon_{1} and \epsilon_{2} (where high means closer to 1 than to 0). For lower values (such as for the box and cylinder cases where \epsilon_{1}=0.1) the performance of naively sampling the angles would degrade.
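
The naive approach can be sketched directly from the explicit equations. This is an illustration, not the code in the fork mentioned below: the helper `signed_pow` (sign-preserving power, needed so that fractional exponents work for negative cosines and sines) and the sampling resolution are assumptions of this example.

```python
import math


def signed_pow(value, exponent):
    """sign(value) * |value| ** exponent, valid for fractional exponents."""
    return math.copysign(abs(value) ** exponent, value)


def sample_naive(a, b, c, e1, e2, n_theta=9, n_gamma=16):
    """Sample the explicit equations at uniform steps of theta and gamma."""
    points = []
    for i in range(n_theta):
        theta = -math.pi / 2 + math.pi * i / (n_theta - 1)
        for j in range(n_gamma):
            gamma = -math.pi + 2 * math.pi * j / n_gamma
            x = a * signed_pow(math.cos(theta), e1) * signed_pow(math.cos(gamma), e2)
            y = b * signed_pow(math.cos(theta), e1) * signed_pow(math.sin(gamma), e2)
            z = c * signed_pow(math.sin(theta), e1)
            points.append((x, y, z))
    return points


def implicit(point, a, b, c, e1, e2):
    """Left-hand side of the implicit superellipsoid equation (1 on the surface)."""
    x, y, z = point
    xy = abs(x / a) ** (2 / e2) + abs(y / b) ** (2 / e2)
    return xy ** (e2 / e1) + abs(z / c) ** (2 / e1)


# A box-like superellipsoid: every sample lies on the surface, but for such
# small exponents the samples bunch up instead of spreading evenly.
box = sample_naive(0.025, 0.08, 0.18, 0.1, 0.1)
```

Every generated point satisfies the implicit equation, so the samples are correct; the problem described above is purely about how unevenly they are spread over the surface.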

The following figure shows a more reasonable sampling:


The samples above were obtained with a simple technique proposed by Pilu and Fisher [PiluFisher95]. These authors noticed that in order to obtain a uniform sampling, the 3D distance between samples has to be constant; however, a uniform sampling distance does not correspond to uniform angle steps.

An implementation of the algorithm proposed by Pilu and Fisher can be found in my PCL fork while the source example sampleSQ.cpp generates a pointcloud for a superellipsoid defined by user input parameters.
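
To see the difference between uniform angle steps and uniform distances, here is a brute-force sketch on one quadrant of a 2D superellipse. It is not the closed-form angle update from the paper and not the code in the fork: it simply walks a very dense polyline and keeps a point every fixed arc length. All names and the choice of exponent 0.5 are assumptions of this example.

```python
import math


def superellipse_point(a, b, eps, t):
    """First-quadrant point for t in [0, 1] (theta = t * pi / 2).

    cos(theta) is written as sin(pi/2 - theta) so that both end points
    land exactly on the axes.
    """
    x = a * math.sin(0.5 * math.pi * (1.0 - t)) ** eps
    y = b * math.sin(0.5 * math.pi * t) ** eps
    return (x, y)


def spacings(points):
    """Distances between consecutive points."""
    return [math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(points, points[1:])]


def spread(points):
    """Ratio between the largest and the smallest spacing (1 means uniform)."""
    s = spacings(points)
    return max(s) / min(s)


def resample_equal_distance(a, b, eps, n_points, n_dense=20000):
    """Walk a dense polyline and emit a point every total_length / (n_points - 1)."""
    dense = [superellipse_point(a, b, eps, i / n_dense) for i in range(n_dense + 1)]
    steps = spacings(dense)
    target = sum(steps) / (n_points - 1)
    result, travelled = [dense[0]], 0.0
    for point, step in zip(dense[1:], steps):
        travelled += step
        if travelled >= target and len(result) < n_points - 1:
            result.append(point)
            travelled -= target  # carry the overshoot into the next interval
    result.append(dense[-1])
    return result


naive = [superellipse_point(1.0, 1.0, 0.5, i / 10) for i in range(11)]
even = resample_equal_distance(1.0, 1.0, 0.5, 11)
```

Comparing `spread(naive)` with `spread(even)` shows the effect: uniform angle steps give spacings that differ by a factor of several, while the resampled points are nearly equidistant.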

[PiluFisher95] Pilu, Maurizio, and Robert B. Fisher. “Equal-distance sampling of superellipse models.” DAI RESEARCH PAPER (1995).
2. Fitting superquadrics (a.k.a. The Horror of Box Constrained Non-Linear Optimization)
Friday, June 20, 2014

In this post we will see some initial results for fitting superquadrics to full pointclouds. Let’s quickly start with the math so you have a good idea of how the code works.

I will remind you again of the superellipsoid equation (yes, I will keep bringing up the equation again and again so it stays forever in your brain):

(1) \left( \left(\dfrac{x}{a}\right)^{\frac{2}{\epsilon_{2}}} + \left(\dfrac{y}{b}\right)^{\frac{2}{\epsilon_{2}}} \right) ^{\frac{\epsilon_{2}}{\epsilon_1} } + \left(\dfrac{z}{c}\right)^{\frac{2}{\epsilon_{1}}} = 1

We will call the expression on the left F(x,y,z). A 3D point (x_{i},y_{i},z_{i}) will belong to the canonical superquadric defined by the parameters (a,b,c,\epsilon_{1}, \epsilon_{2}) if F(x_{i},y_{i},z_{i}) = 1. To have a general superquadric, we must consider the translation and rotation terms, hence the general form of equation (1) is:

(2) F(x,y,z) =  \left[ \left(\dfrac{ n_{x}x + n_{y}y + n_{z}z -t_{x}n_{x}-t_{y}n_{y} - t_{z}n_{z} }{a}\right)^{\frac{2}{\epsilon_{2}}} + \left(\dfrac{ o_{x}x + o_{y}y + o_{z}z -t_{x}o_{x}-t_{y}o_{y} - t_{z}o_{z} }{b}\right)^{\frac{2}{\epsilon_{2}}} \right] ^{\frac{\epsilon_{2}}{\epsilon_1} } + \left(\dfrac{ a_{x}x + a_{y}y + a_{z}z -t_{x}a_{x}-t_{y}a_{y} - t_{z}a_{z} }{c}\right)^{\frac{2}{\epsilon_{1}}} = 1

where \mathbf{t} is the translation of the superellipsoid center with respect to a given global frame and (\mathbf{n},\mathbf{o},\mathbf{a}) are the column vectors of the rotation matrix of the superellipsoid (again, with respect to some defined global axes). In fact, to be completely rigorous, we should express F(.) as a function of both the point being evaluated and the superellipsoid parameters being used: F(\mathbf{x}; \Lambda) where \Lambda = (a,b,c,\epsilon_{1}, \epsilon_{2},t_{x}, t_{y}, t_{z}, \psi, \theta, \gamma).
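
Each numerator in the general equation is just a dot product of one rotation column with (\mathbf{x} - \mathbf{t}), which a short sketch makes explicit. The function below is an illustration only: it takes the rotation as its three columns rather than as the angles (\psi, \theta, \gamma), since the angle convention is not stated here, and it uses absolute values so that fractional powers stay real (an assumption of this example).

```python
def inside_outside(point, scale, eps, rotation_columns, translation):
    """Evaluate F for a superellipsoid with the given pose.

    scale = (a, b, c), eps = (e1, e2), rotation_columns = (n, o, a_axis).
    """
    a, b, c = scale
    e1, e2 = eps
    n, o, a_axis = rotation_columns
    # Point relative to the superellipsoid center: x - t.
    d = [p - t for p, t in zip(point, translation)]
    # Coordinates in the superellipsoid frame: n.(x - t), o.(x - t), a.(x - t).
    xl = sum(ni * di for ni, di in zip(n, d))
    yl = sum(oi * di for oi, di in zip(o, d))
    zl = sum(ai * di for ai, di in zip(a_axis, d))
    xy = abs(xl / a) ** (2 / e2) + abs(yl / b) ** (2 / e2)
    return xy ** (e2 / e1) + abs(zl / c) ** (2 / e1)


# Ellipsoid rotated by 90 degrees about Z and moved to (-0.6, 0.2, 0.0).
columns = ((0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
center = (-0.6, 0.2, 0.0)
tip_of_first_axis = (-0.6, 0.24, 0.0)   # center + 0.04 * n
value = inside_outside(tip_of_first_axis, (0.04, 0.08, 0.06), (1.0, 1.0), columns, center)
```

F is 1 for points on the surface, below 1 inside and above 1 outside, so `value` is 1 up to rounding for the tip of the first axis.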

In order to find the superellipsoid that best fits a given full pointcloud (composed of 3D points \mathbf{x}_{i} with i \in [1,k]), we need to minimize the error between each point and equation (2). In its most basic form, we could try to minimize this expression:

\min_{\Lambda} \sum_{i=1}^{k} \left( F(\mathbf{x}_{i}; \Lambda) - 1 \right ) ^{2}

Some wise people suggested a couple of modifications to the basic version above and came up with this:

\min_{\Lambda} \sum_{i=1}^{k} \left( \sqrt{abc} \left( F^{\epsilon_{1}}(\mathbf{x}_{i}; \Lambda) - 1 \right) \right ) ^{2}

The \sqrt{abc} factor makes sure that the superellipsoid obtained is the smallest possible. The additional exponent \epsilon_{1} improves the convergence time.
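
As a small worked example of the error being minimized, the sketch below evaluates it for a canonical superellipsoid (no rotation or translation, to keep it short). It takes the per-point residual as \sqrt{abc}(F^{\epsilon_{1}} - 1), so that a point lying exactly on the surface contributes zero; the function names are assumptions of this example, and no optimizer is included.

```python
import math


def canonical_f(point, a, b, c, e1, e2):
    """Implicit function F of a canonical superellipsoid."""
    x, y, z = point
    xy = abs(x / a) ** (2 / e2) + abs(y / b) ** (2 / e2)
    return xy ** (e2 / e1) + abs(z / c) ** (2 / e1)


def fitting_error(points, a, b, c, e1, e2):
    """Sum over the cloud of (sqrt(abc) * (F ** e1 - 1)) ** 2."""
    weight = math.sqrt(a * b * c)
    return sum(
        (weight * (canonical_f(p, a, b, c, e1, e2) ** e1 - 1.0)) ** 2
        for p in points
    )


# Six points on the axes of a sphere of radius 0.25.
cloud = [(0.25, 0, 0), (-0.25, 0, 0), (0, 0.25, 0),
         (0, -0.25, 0), (0, 0, 0.25), (0, 0, -0.25)]
exact = fitting_error(cloud, 0.25, 0.25, 0.25, 1.0, 1.0)
too_big = fitting_error(cloud, 0.30, 0.30, 0.30, 1.0, 1.0)
```

The true parameters give zero error, while an oversized sphere gives a positive one. A bounded non-linear least squares solver would search the 11 parameters for the minimum of this sum.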

For now, I will not go into more detail on the math, since I am running out of time to write this entry. However, there are a few things that you should remember from this post:

  • Our goal is to solve a non-linear least squares problem.
  • We have 11 parameters to find.
  • Our parameters are bounded, which means that they have upper and lower limits. This constrains the type of algorithms we can use to optimize our solution.
  • The denser the pointcloud is, the more terms the sum above contains.

As of now, we have implemented the method described in [Duncan13] to fit segmented, full pointclouds to superellipsoids. In this post we will present some initial results obtained for 4 test cases (sphere, box, cylinder and oval shape). We tested them in 3 scenarios:

  • Canonical superellipsoids with no noise.
  • General superellipsoids (rotation and translation added) no noise.
  • General superellipsoids with noise (up to 5% of the length of the smallest principal axis).

Let’s start with the initial test cases:

Case 0: Parameters used
Parameter  Sphere  Ellipsoid  Cylinder  Box
a          0.25    0.04       0.05      0.025
b          0.25    0.08       0.05      0.08
c          0.25    0.06       0.1       0.18
e1         1       0.75       0.25      0.1
e2         1       0.75       1.0       0.1

The pointclouds are shown in the following figure and are also available in my repository.


The code used for the results shown in the rest of this post can be found here.

For the first test case we generated fully sampled pointclouds by using the sampling code we presented in our previous post. To initialize the minimizer, we used the pointcloud’s bounding box information for the superellipsoid dimensions and global transform. For all cases we used an initial value of 0.5 for \epsilon_{1} and 1.0 for \epsilon_{2} (these values are in the middle of the allowed range).
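
A bounding-box initialization can be sketched as follows. Whether the original code uses an axis-aligned or an oriented bounding box is not stated here; this sketch assumes an axis-aligned one, so it only provides the dimensions and the translation and leaves the rotation at identity. The function name and the dictionary layout are made up for this example.

```python
def initial_guess(points):
    """Axis-aligned bounding box -> initial (a, b, c), center and shape exponents."""
    lows = [min(p[i] for p in points) for i in range(3)]
    highs = [max(p[i] for p in points) for i in range(3)]
    half_extents = tuple((hi - lo) / 2 for lo, hi in zip(lows, highs))
    center = tuple((hi + lo) / 2 for lo, hi in zip(lows, highs))
    return {
        "abc": half_extents,        # initial a, b, c
        "translation": center,      # initial t_x, t_y, t_z
        "e1": 0.5,                  # initial exponents quoted in the text
        "e2": 1.0,
    }


cloud = [(0.4, 0.7, -0.1), (0.6, 0.9, 0.1), (0.5, 0.8, 0.0)]
guess = initial_guess(cloud)
```

For the toy cloud above, the guess is a superellipsoid with half-extents of about 0.1 centered at about (0.5, 0.8, 0.0).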

Results for the fitting are shown in the following table. It can be seen that the fitting works pretty well, which is kind of expected since this is the most basic case.

Case 0: Results
Parameter  Sphere  Ellipsoid  Cylinder  Box
a          0.247   0.039      0.0499    0.025
b          0.247   0.079      0.049     0.079
c          0.247   0.059      0.099     0.179
e1         0.99    0.753      0.271     0.10
e2         0.99    0.753      0.97      0.13

For case 1, we modified the test pointclouds by applying a transformation to them. Details of the transformations for each case are shown below (the parameters a,b,c,\epsilon_{1},\epsilon_{2} remain constant, so we do not repeat them).

Case 1: Parameters
Parameter  Sphere  Ellipsoid  Cylinder  Box
x          0.5     -0.6       -0.4      -0.1
y          0.8     0.2        0.7       0.3
z          0.0     0.0        0.3       0.5
roll       0.0     0.2        0.6       0.0
pitch      0.0     0.5        0.9       0.0
yaw        0.3     0.3        0.8       0.8

The results are shown below. We observed that the parameters that remain constant keep approximately the same fitted values as in Case 0, so we won’t repeat them in the next table.

Case 1: Results
Parameter  Sphere  Ellipsoid  Cylinder  Box
x          0.5     -0.6       -0.4      -0.1
y          0.8     0.199      0.69      0.3
z          0.0     0.0        0.29      0.49
roll       0.0     0.199      -0.117    0.0
pitch      0.0     0.49       1.02      0.0
yaw        0.0     0.29       -0.055    -2.34

We can observe that the translation values are well fitted, while the same is not the case for the rotation values. In the next post we should discuss some ideas to fix that.

Finally, we added noise to the pointclouds. The values used are shown in the following table (meaning that a uniform disturbance between [-\delta,\delta] was randomly applied to each point in the pointcloud).

Case 2: Parameters
Parameter   Sphere  Ellipsoid  Cylinder  Box
\delta      0.01    0.002      0.0025    0.0015
Percentage  4%      5%         5%        6%
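
The percentages in the table are consistent with \delta divided by the smallest of a, b and c from Case 0, which the sketch below reproduces. It also shows one way to apply the disturbance, assuming it is drawn independently for each coordinate of each point (the post only says it is applied to each point). The function names are made up for this example.

```python
import random


def add_uniform_noise(points, delta, seed=0):
    """Add a disturbance drawn uniformly from [-delta, delta] to every coordinate."""
    rng = random.Random(seed)
    return [tuple(v + rng.uniform(-delta, delta) for v in p) for p in points]


def noise_percentage(delta, a, b, c):
    """Noise amplitude relative to the smallest of the three axis parameters."""
    return 100.0 * delta / min(a, b, c)


# (delta, a, b, c) per shape, taken from the Case 2 and Case 0 tables.
shapes = {
    "Sphere": (0.01, 0.25, 0.25, 0.25),
    "Ellipsoid": (0.002, 0.04, 0.08, 0.06),
    "Cylinder": (0.0025, 0.05, 0.05, 0.1),
    "Box": (0.0015, 0.025, 0.08, 0.18),
}
percentages = {name: round(noise_percentage(*values)) for name, values in shapes.items()}

noisy = add_uniform_noise([(0.25, 0.0, 0.0), (0.0, 0.25, 0.0)], delta=0.01)
```

With these inputs, `percentages` contains 4, 5, 5 and 6 for the sphere, ellipsoid, cylinder and box respectively.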

The pointclouds are shown in the following figure:


The final parameters are shown below:

Case 2: Results
Parameter  Sphere  Ellipsoid  Cylinder  Box
a          0.24    0.039      0.049     0.025
b          0.245   0.079      0.049     0.08
c          0.249   0.0601     0.102     0.19
e1         1       0.76       0.35      0.33
e2         0.91    0.71       0.92      0.1
x          0.49    -0.59      -0.39     -0.1
y          0.8     0.19       0.70      0.3
z          0.0     0.0        0.30      0.49
roll       0.0     0.19       0.95      0.0
pitch      0.0     0.50       0.47      0.0
yaw        -2.8    0.30       1.34      -2.3

I should probably put a parallel table with all the original values so you can compare them more easily. In any case, a couple of observations:

  • Rotation final values are the ones with the biggest errors.
  • Noise levels that exceed 5% are not acceptable: the fitted values of \epsilon_{1} and \epsilon_{2} vary significantly, altering the shape of the object. The other parameters remain reasonably accurate.
  • We are not using any loss function to alleviate the effect of the outliers. Here that did not prove particularly important, but when we do experiments with data that is not synthetically obtained (real pointclouds of objects) it will probably matter.
  • A crucial requirement for a good fit is the initialization of the Z axis of the object (revolution axis).

[Duncan13] Duncan, Kester, et al. “Multi-scale superquadric fitting for efficient shape and pose recovery of unknown objects.” Robotics and Automation (ICRA), 2013 IEEE International Conference on. IEEE, 2013.
Dataset collection and curiosities in Kinect calibration
Friday, June 20, 2014

Difficult poses for the current PCL people detector

This project aims to improve the people detection capability in PCL. To validate the improvement we need measurements and data as proof. Therefore the first step of the project is to collect and establish datasets of relevant human poses. PCL’s people detector assumes that humans will be in upright poses rather than arbitrary ones, and the multiple stages of detection in its implementation, including segmentation, clustering and classification, rely on this assumption, which makes it susceptible to the varied human poses found in the real world. Thus I intend to collect various representative poses that deviate from the upright pose. As discussed with my mentor Matteo, those include:

  • Crouching
  • Hands up
  • Sitting
  • Dancing/running
  • Lying on sofa or floor

These poses are not equal. Crouching, dancing, and running pose challenges to the image-based human classifier, which currently assumes a sample window containing a full-body upright pose. Sitting likely involves occlusion of some part of the body and makes segmentation more difficult. Hands up can introduce errors or even false segments in the current head-based subclustering, because it determines each subcluster by its highest local maximum, which could erroneously be a hand. Lying on a sofa or the floor completely defeats the current segmentation approach, because the person is no longer an independent object to be segmented. These are the types of data we will collect and evaluate on.

Methods to collect data

There are different ways to collect point cloud data, but as suggested in OpenNI and PCL Recording Codes, the most efficient capturing method in PCL is pcl_openni_image (or rather the only method that works, because the other ones in PCL either fail outright or explode in I/O), which grabs RGBD frames in the compact pclzf format (samples). Use pcl_openni_image '#1' to test the Kinect configuration and pcl_openni_image '#1' -visualize 0 to save batches of frames. This produces a compact data stream of 400KB per frame, or 12MB/second, and the point clouds are reconstructed later, when the data is used. You can visualize the data with pcl_image_grabber_viewer -pclzf -dir captured_frames/ -fps 30, or programmatically replace pcl::OpenNIGrabber with pcl::ImageGrabber, as the image grabber viewer does. The other way to collect data is to save raw frames and camera information in ROS bags, play back the image frames to reconstruct the point cloud on the fly, and feed the depth-registered point cloud topic to PCL via pcl_ros.

Problems with depth registration

The pclzf format seems fine until the frames are actually reconstructed into point clouds, which reveals a large bias in depth registration:


Depth registration is the first step of data collection. PCL’s people detector makes use of an image-based people classifier, which relies on a correctly color-registered point cloud. Somehow there are some peculiarities in how we perform the depth registration and the calibration prior to that. The image above shows that the color of the background is mapped onto points that belong to the person.

My guess was that this might be caused by wrong calibration, or by loss of calibration during the capturing and reconstruction process. So after redoing the intrinsic calibration of the RGB and IR cameras and the extrinsic calibration as described in the openni_launch tutorials and this Kinect calibration tutorial, I tried the other method with ROS openni_launch depth_registration:=true publish_tf:=true/false. The problem with depth registration persists: OpenNI seems to ignore my calibration parameters and frame transforms no matter what, and only uses its built-in factory default parameters:


It turns out PCL does not perform any internal calibration at all and relies on OpenNI providing correct depth registration, and there is no way of updating OpenNI’s calibration parameters. Yes, you can calibrate the Kinect all you want, but there is no way to make use of the result in the existing code base. This thread about what is happening behind OpenNI depth registration, with pointers inside, is a good read.

We can still do the whole depth registration process manually, as in pointcloud_utils.cpp, but unfortunately the data captured by pcl_openni_image is already depth registered somehow with wrong extrinsic calibration. To avoid spending too much time on perfecting data calibration, I decided to make a minimal fix to extrinsic error in the depth registration in pclzf data by shifting the RGB layer along X axis, in pseudo-code like this:

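for (x = 0; x + 8 < width; x++)  // stop 8 columns early so that x + 8 stays inside the image
  for (y = 0; y < height; y++)
    cloud(x, y).rgb = cloud(x + 8, y).rgb;  // take the color from 8 pixels to the right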

The result looks mostly acceptable, with some residual “borders” around the person, and can be improved later on:


Note the green box to the right of the person.

Datasets and first results

I collected datasets of various poses: standing, crouching, sitting, and lying on a sofa. One standing dataset looks like this (selected thumbnails):


There is also this result demonstrating the failure mode of the current head based subclustering method:


These extra erroneous segments (green boxes) are caused by the hands up pose, in which the current clustering method recognizes the hands as heads. This is what I will improve.

The following video shows the first results on all the datasets collected so far. Green boxes are segmented clusters and red boxes are people detections. The minimum height is set to 0.1, so you will see lots of green boxes; this is for evaluating the current performance of the segmentation.

Some major observations for the results:

  • Segmentation is very good and almost never misses the true person clusters.
  • Hands up pose likely generates a bunch of false positive clusters, as explained above.
  • Segmentation totally fails in the sofa dataset.
  • The classifier has some room for improvement.

So this is about everything for this time. Next time I will write about dataset annotation and performance improvement of some first techniques that I will implement.

0. Whetting your appetite: Why superquadrics?
Thursday, June 19, 2014

Superquadrics are a family of geometric shapes that can represent a wide array of diverse primitives using a small number of parameters. As an example, look at the figure below: the first row depicts 5 common household objects. The second row shows the superquadrics that most closely resemble them.


Superquadrics were initially introduced in the computer graphics community by Alan Barr [Barr81], but they were later adopted by the robotics community as an effective modelling tool to approximate object shapes. In general, superquadrics include superellipsoids and supertoroids, but for most practical uses we only care about superellipsoids. These can be expressed with the following formula:

(1) \left( \left(\dfrac{x}{a}\right)^{\frac{2}{\epsilon_{2}}} + \left(\dfrac{y}{b}\right)^{\frac{2}{\epsilon_{2}}} \right) ^{\frac{\epsilon_{2}}{\epsilon_1} } + \left(\dfrac{z}{c}\right)^{\frac{2}{\epsilon_{1}}} = 1

As can be seen, superellipsoids in their canonical form can be expressed by 5 parameters:

  • a,b,c: Scaling factors along principal axes
  • \epsilon_{1} : Shape factor of the superellipsoid cross section in a plane orthogonal to XY containing the axis Z.
  • \epsilon_{2} : Shape factor of the superellipsoid cross section in a plane parallel to XY.

If a general transformation is considered, then the total number of parameters required to define a superellipsoid is 11 (the 6 additional ones being the rotation and translation degrees of freedom (x,y,z,\rho,\psi,\theta)).

Expression (1) shows the implicit equation of the superellipsoids. Its parametric solution can be expressed as:

x = a{\cos(\theta)}^{\epsilon_{1}}{\cos(\gamma)}^{\epsilon_{2}} \\
y = b{\cos(\theta)}^{\epsilon_{1}}{\sin(\gamma)}^{\epsilon_{2}} \\
z = c{\sin(\theta)}^{\epsilon_{1}}

with \theta \in [-\pi/2, \pi/2] and \gamma \in [-\pi,\pi]. In our next post we will learn how to generate pointclouds for superellipsoids (which is not as simple as just sampling \theta and \gamma!). Stay tuned :)

[Barr81] Barr, Alan H. “Superquadrics and angle-preserving transformations.” IEEE Computer graphics and Applications 1.1 (1981): 11-23.
First post - LCCP algorithm implementation
Monday, June 16, 2014

Hello everybody, this is my first blog post for this project. Over the last few weeks I have been busy implementing the LCCP algorithm in PCL. The algorithm can be used to split a point cloud into regions which are isolated from all other regions by concave boundaries. It turns out that this is a highly valuable segmentation, as it often retrieves (bottom-up!) nameable parts like handles, heads and so on. Robotic applications in particular may find this useful. Together with Jeremie I will introduce it at this year’s CVPR 2014 conference (S. C. Stein, M. Schoeler, J. Papon, F. Woergoetter: Object Partitioning using Local Convexity). We will have a poster next Tuesday afternoon. So if you are around, you are more than welcome to visit us there. In the meantime I hope I get the pull request through, so that everybody interested can play around with the algorithm. It will be located in the segmentation module. There is also an example, pcl_example_lccp_segmentation. To get you interested, the following image shows an example of the segmentation. As you can see, all parts can be easily named.


That’s it for the first post. Hope to see some of you at CVPR. Stay tuned for more to come.

Richtsfeld’s code and results (First post)
Monday, June 16, 2014
  • Introduction

    The goal of this project is to integrate cluttered scene segmentation methods into pcl::segmentation. This sprint involves the implementation of modules from the two publications below.

    • Z.-C. Marton, F. Balint-Benczedi, O. M. Mozos, N. Blodow, A. Kanezaki, L. C. Goron, D. Pangercic, and M. Beetz: Part-based geometric categorization and object reconstruction in cluttered table-top scenes; Journal of Intelligent and Robotic Systems, January 2014
    • A. Richtsfeld, T. Mörwald, J. Prankl, M. Zillich and M. Vincze: Learning of Perceptual Grouping for Object Segmentation on RGB-D Data; Journal of Visual Communication and Image Representation (JVCI), Special Issue on Visual Understanding and Applications with RGB-D Cameras, July 2013

    The authors of these papers have made their code available, and it already uses PCL to some extent. In our implementation we will aim to make the modules interoperable.

  • Richtsfeld’s code and results

    As a first step, Richtsfeld’s code was analyzed. Below is the high-level picture of the structure of his code base, along with comments on the functionality of its parts.


    His code was tested on some scenes that were grabbed by me using Kinect. Below are the snapshots of the same.

    [Figures: three snapshots of the test scenes (scene_01.png, scene_02.png, scene_03.png)]

    His code works on organized pointcloud of type PointXYZRGB.

    Code is yet to be tested for quantitative results with our annotated dataset.

  • Brainstorming

    These are some of the discussions I had with my mentor Zoltan and the summary is listed below.

    • Zoltan’s work classifies the relations based on features computed on groups of segments, with 1 to 8 elements per group.

    • Richtsfeld’s work classifies the relations based on segment pairs.

    • Richtsfeld’s work is limited to organized pointclouds and cannot handle a cloud that is fused out of many pointclouds of a scene, say, through registration.

    • Richtsfeld’s work computes features which are inspired by Gestalt principles. There are some other features that are worth testing. These are the features used for structure discovery in pointcloud data. The features are listed below, and more details on them are available in the publication - Collet, Alvaro, Siddhartha S. Srinivasa, and Martial Hebert. “Structure discovery in multi-modal data: a region-based approach.” Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011.

    • Additional features to test are:

      • Shape Model
      • Self Continuity
      • Contour compactness
      • Pair continuity
      • Verticality
      • Concavity
      • Projection
      • Alignment
      • Surface compatibility
Results on the Castro dataset
Monday, March 31, 2014

We’ve implemented almost everything that was planned, and now we want to present our results.

First of all, please watch this video with results on the whole Castro dataset. The road is marked in red; the left and right road borders are indicated by green and blue lines respectively.

The main problem is the presence of holes in the road surface. This is caused by holes in the input disparity maps. We decided not to inpaint them, because we have no information about the scene at those points. Instead, we compute the quality of the labeling only at points with known disparity. This allows us to evaluate the results of our method independently of the quality of the method that generates the disparity maps.

Our method has a significant advantage in situations where one or both curbs are clearly visible (with corresponding sidewalks). You can compare the result of the previous sprint’s method (left) with the result of our method (right) on the same frame (which has two clearly visible sidewalks).


Next, I’m going to show you the numerical results. Precision is the ratio of correctly detected road points to all points detected as road; recall is the fraction of true road points that were detected. Only points with known disparity are taken into account.
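
The two measures can be sketched as below. This is an illustration under the definitions given above, not the evaluation code used for the sprint; the flat boolean lists and the function name are assumptions of this example.

```python
def precision_recall(predicted_road, true_road, known_disparity):
    """Precision and recall over pixels with known disparity.

    All three arguments are flat lists of booleans, one entry per pixel.
    """
    tp = fp = fn = 0
    for pred, truth, known in zip(predicted_road, true_road, known_disparity):
        if not known:
            continue  # pixels without disparity are ignored
        tp += pred and truth          # road detected as road
        fp += pred and not truth      # non-road detected as road
        fn += truth and not pred      # road that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


predicted = [True, True, True, False, False, True]
truth = [True, True, False, True, False, True]
known = [True, True, True, True, True, False]
precision, recall = precision_recall(predicted, truth, known)
```

In this toy frame the last pixel has no disparity and is skipped; of the remaining five, two are correct detections, one is a false detection and one road pixel is missed, so both precision and recall are 2/3.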

Final Report
Friday, January 24, 2014

It is a pleasure to share the successful completion of this Toyota Code Sprint in this final blog post. In this project, homography estimation based on multi-modal, multi-descriptor correspondence sets has been explored, and it inspired the introduction of the multi-descriptor voting approach (MDv). The proposed MDv approach achieved a consistent accuracy in the 0.0X range, a level of consistency that is better than that of approaches based on single-type state-of-the-art descriptors, including SIFT. In the process, a framework for analyzing and evaluating single- and multi-descriptor performance has been developed and employed to validate the robustness of MDv, as compared with homography estimations based on a single descriptor type, as well as those based on RANSAC registration of best-K multi-descriptor correspondence sets. The code and dataset for this project are hosted on https://github.com/mult-desc/md, with dependencies on both PCL 1.7 and OpenCV 2.4.6.

What follows is an in-depth report detailing the project’s accomplishments, as well as design and validation considerations:

Click here for a high resolution version of the report.
Edge Weights Revisited: Introducing the Curvature Term
Wednesday, November 27, 2013

In the previous blog post I described my attempts to find a good balance between the contributions of the three terms (distance, normal, and color) to edge weight computation. As often happens, as soon as I was done with the evaluation and the blog post, I realized that there is another type of information that could be considered: curvature. And indeed, it proved to have a very positive effect on the performance of the random walker segmentation.

Let me begin by exposing a problem associated with edge weights computed using the normal term alone (i.e. depending only on the angular distance between the normals of the vertices). Consider the following scene (left):

[Figure: Voxelized point cloud (left) and a close-up view of the graph edges in the region where the tall and round boxes touch (right). The edges are colored according to their weights (from dark blue for small weights to dark red for large weights).]
Most of the edges in the boundary region (right) are dark blue, however there are a number of red edges with quite large weights. This sort of boundary is often referred to as a “weak boundary” and, not surprisingly, has a negative effect on the performance of many segmentation algorithms. You can imagine that a boundary like this is a disaster for the flood-fill segmentation, because the flood will happily propagate through it. Luckily, the random walker algorithm is known for its robustness against weak boundaries:

[Figure: Segmentations produced by the random walker algorithm using three different choices of seeds.]

In the first two cases one of the seeds is very close to the weak boundary, whereas another one is far away. In the third case there are multiple green seeds placed along the boundary, however a single purple seed is able to “resist” them from its remote corner.

This robustness has limits, of course. In the figure below the “table seed” is placed in the rear of the table, far from the boundaries with the boxes. The box segments managed to “spill” on the table through the weak boundaries:

[Figure: Segmentation failure when a seed is placed too far from a weak boundary.]

One way to address this problem is to make the sigma of the normal term smaller, thereby penalizing differences in the normals’ orientations more. Enabling the distance term might also help, because the edges that contribute to the boundary weakness are often diagonal and therefore longer than average. The figure below (left) shows the graph edges in the same boundary region with new weights, computed using a decreased normal term sigma (10% of the original one) and with the distance term enabled. (The overall shift of the edge colors towards blue is due to the distance term.)

[Figure: Graph edges with weights computed with a smaller normal term sigma and enabled distance term. Close-up view of the region where the tall and round boxes touch (left) and top-down view at the rear of the table (right).]

Still, there are several edges with relatively large weights in the boundary region, but there are no large-weight paths connecting vertices on both sides of the boundary anymore. We got rid of the weak boundary, but this came at a price. Although the table itself is flat, the cloud that we get from the Kinect is not, and the further from the camera the more wavy it is. The image on the right shows a top-down view at the rear of the table, where the waves are particularly large. The edges that belong to the cavities between the “waves” were heavily penalized and virtually disappeared. Random walkers will have a hard time traversing this part of the table on their way to the boxes!

Having considered all of this, I came to the conclusion that some additional geometrical feature is needed to improve the weighting function. Curvature was the first candidate, especially since we get it for free anyway when estimating voxel normals (via PCA of the covariance matrix of the voxel neighborhood). I added one more exponential term to the weighting function:


where d_4(\cdot) is simply a product of the voxel curvatures. Similarly to how it is done in the normal term, the product is additionally multiplied by a small constant if the angle between the voxels is convex (in order not to penalize convex boundaries).
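
A sketch of such a term is shown below. The exact expression used in the implementation is not reproduced here, so the form exp(-d_4 / \sigma_4), the parameter names and the value of the convexity constant are all assumptions made for illustration; only the definition of d_4 as a product of curvatures, scaled down for convex angles, comes from the description above.

```python
import math


def curvature_term(curvature_i, curvature_j, sigma, convex, convex_discount=0.01):
    """Hypothetical curvature factor of an edge weight: exp(-d4 / sigma).

    d4 is the product of the two voxel curvatures, multiplied by a small
    constant when the angle between the voxels is convex.
    """
    d4 = curvature_i * curvature_j
    if convex:
        d4 *= convex_discount  # do not penalize convex boundaries
    return math.exp(-d4 / sigma)


flat = curvature_term(0.0, 0.0, sigma=0.001, convex=False)
concave_edge = curvature_term(0.1, 0.1, sigma=0.001, convex=False)
convex_edge = curvature_term(0.1, 0.1, sigma=0.001, convex=True)
```

With this form, an edge between two flat voxels keeps a weight factor of 1, a concave edge between two high-curvature voxels is strongly penalized, and the same edge at a convex angle is only mildly affected.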

The figure below demonstrates the edge weights computed using the new term alone:

[Figure: Graph edges with weights computed using only the new curvature term. Close-up view of the region where the tall and round boxes touch (left) and top-down view at the rear of the table (right).]

The boundary between the boxes is perfectly strong, whereas the weights in the rear of the table are not penalized too much. The seeding that resulted in a segmentation failure before no longer causes problems. I used the set of random seedings described in the last post to find the best sigmas for the extended weighting function. Then I generated 50 new seedings to test and compare the performance of the old (without the curvature term) and the new (with the curvature term) weighting functions. The figure below summarizes the distributions of under-segmentation errors:


The performance improved significantly in terms of stability, quality, and number of failures (in fact, there are no failures at all).

Edge Weights for Random Walker Segmentation
Tuesday, November 19, 2013

The random walker segmentation algorithm requires that the data are modeled as a weighted graph, and the choice of edge weighting function has a great impact on the performance of the algorithm. In this blog post I will describe the weighting function and parameters that I ended up using.

Before talking about the weights of the edges between vertices, let’s discuss the vertices themselves. As mentioned in the previous blog posts, I have a pre-processing step where the input cloud is voxelized using OctreePointCloudAdjacency. Voxelization serves three purposes:

  • Data down-sampling. The number of voxels is smaller than the number of points in the original cloud.
  • Data smoothing. The normal orientation and color of a voxel are averaged over the points of the original cloud that belong to it.
  • Establishing adjacency relations. The regular grid structure of the octree naturally defines a 26-neighborhood for each voxel.

The voxels consequently become vertices of the graph, and each of them is connected with its neighbors by an edge. Each voxel has several properties: 3D position, normal orientation, and color, which may be used in edge weight computation.

As mentioned in the very first blog post on random walker segmentation, originally I used the edge weighting function of Lai et al.

Later on I introduced several modifications. Now for a pair of vertices v_i and v_j the weight is defined as:

w_{ij} = \exp{\left\{-\frac{d_1(v_i,v_j)}{\sigma_1}\right\}}\cdot\exp{\left\{-\frac{d_2(v_i,v_j)}{\sigma_2}\right\}}\cdot\exp{\left\{-\frac{d_3(v_i,v_j)}{\sigma_3}\right\}},

where d_1(\cdot), d_2(\cdot), and d_3(\cdot) are the Euclidean, angular, and color differences between voxels, and the sigmas are used to balance their contributions. Compared to the weighting function of Lai et al., the color term was added, and scaling by mean values was removed.
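The weighting function above can be sketched as follows. The concrete difference measures are assumptions: the angular difference is taken as one minus the dot product of unit normals, and the color difference as the Euclidean distance between RGB triples in [0, 1]. The `Voxel` and `Sigmas` types are hypothetical as well.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Hypothetical voxel record: centroid, unit normal, RGB color in [0, 1].
struct Voxel {
  Vec3 position;
  Vec3 normal;
  Vec3 color;
};

// One sigma per term of the weighting function.
struct Sigmas {
  double distance;
  double angular;
  double color;
};

double euclidean(const Vec3& a, const Vec3& b) {
  double sum = 0.0;
  for (int k = 0; k < 3; ++k) sum += (a[k] - b[k]) * (a[k] - b[k]);
  return std::sqrt(sum);
}

double dot(const Vec3& a, const Vec3& b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// w_ij = exp(-d_1 / sigma_1) * exp(-d_2 / sigma_2) * exp(-d_3 / sigma_3)
double edgeWeight(const Voxel& a, const Voxel& b, const Sigmas& s) {
  assert(s.distance > 0.0 && s.angular > 0.0 && s.color > 0.0);
  const double d1 = euclidean(a.position, b.position);  // Euclidean term
  const double d2 = 1.0 - dot(a.normal, b.normal);      // angular term (assumed form)
  const double d3 = euclidean(a.color, b.color);        // color term (assumed form)
  return std::exp(-d1 / s.distance) * std::exp(-d2 / s.angular) *
         std::exp(-d3 / s.color);
}
```

Each factor lies in (0, 1], so the weight equals 1 only for identical voxels, and a smaller sigma makes the corresponding difference count more heavily.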

I devised the following procedure in order to find appropriate values for the sigmas. I took a scene with known ground truth segmentation and generated 50 random proper seedings. Here by a “proper seeding” I mean a set of seeds where each seed belongs to a distinct ground truth segment, and each ground truth segment has a single seed (see example in the figure below on the left). For each of these seedings I ran random walker and computed the under-segmentation error (which was defined in one of the earlier blog posts, see example in the figure below on the right). Then I analyzed the distributions of errors resulting from different sigma values.

_images/test47-proper-seeding.png _images/test47-undersegmentation-error.png
Voxelized point cloud with
one of the randomly
generated proper seedings
used in the experiments

Under-segmentation error of
the segmentation produced by
random walker from the
given seeds (erroneous voxels
painted red)
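Generating a proper seeding as defined above amounts to drawing one random voxel from every ground truth segment. The following is a sketch under the assumption that the ground truth is available as one integer label per voxel; the function name and data layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Returns a map from ground truth label to the index of its single seed.
// ground_truth[i] is the segment label of voxel i.
std::map<int, std::size_t> properSeeding(const std::vector<int>& ground_truth,
                                         std::mt19937& rng) {
  assert(!ground_truth.empty());

  // Group voxel indices by the segment they belong to.
  std::map<int, std::vector<std::size_t>> members;
  for (std::size_t i = 0; i < ground_truth.size(); ++i)
    members[ground_truth[i]].push_back(i);

  // Draw exactly one voxel uniformly at random from each segment.
  std::map<int, std::size_t> seeds;
  for (const auto& entry : members) {
    const std::vector<std::size_t>& indices = entry.second;
    std::uniform_int_distribution<std::size_t> pick(0, indices.size() - 1);
    seeds[entry.first] = indices[pick(rng)];
  }
  return seeds;
}
```

Calling the function 50 times with the same generator would produce 50 seedings of the kind used in the experiments.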

Note that the ground truth itself is not perfect, because it is often impossible to tell apart the points at the boundary of two objects. Consequently, the ground truth segmentation is somewhat random at the boundaries, and we should neither expect nor strive for a segmentation algorithm that reproduces it exactly. The under-segmentation error displayed above has 1039 erroneous voxels, and this is pretty much the best performance we could expect from a segmentation algorithm with this ground truth.

Let’s begin by examining the influence of the angular term. In this experiment I set Euclidean sigma to the value of voxel resolution and color sigma to 0.1. Below is a plot of under-segmentation error distributions for different choices of angular sigma (note that larger sigmas correspond to less influence):


Each distribution is visualized using a boxplot. The three main features are the position of the red bar (median of the distribution), size of the box (50% of the values fall inside the box), and the number and positions of pluses (outliers). The first one gives an idea of the average performance of the algorithm. The second one expresses the segmentation stability with respect to the seed choice (with a smaller box meaning better stability). The third one indicates segmentation failures. Indeed, a significant deviation of the under-segmentation error means that the output segmentation has large mis-labeled regions, which may be deemed a failure.
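The three boxplot features can be computed directly from a list of under-segmentation errors. The sketch below assumes the common Tukey convention (values further than 1.5 box heights from the box are outliers) and linear interpolation for the quartiles; the plotting tool used for the figures may use slightly different rules.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct BoxStats {
  double q1;
  double median;
  double q3;
  std::vector<double> outliers;
};

// Quantile of sorted data, linearly interpolated between order statistics.
double quantile(const std::vector<double>& sorted, double q) {
  const double pos = q * static_cast<double>(sorted.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
  return sorted[lo] + (pos - static_cast<double>(lo)) * (sorted[hi] - sorted[lo]);
}

BoxStats boxStats(std::vector<double> errors) {
  assert(!errors.empty());
  std::sort(errors.begin(), errors.end());

  BoxStats s;
  s.q1 = quantile(errors, 0.25);
  s.median = quantile(errors, 0.5);
  s.q3 = quantile(errors, 0.75);

  // Assumed outlier rule: more than 1.5 * IQR away from the box.
  const double iqr = s.q3 - s.q1;
  for (double e : errors)
    if (e < s.q1 - 1.5 * iqr || e > s.q3 + 1.5 * iqr) s.outliers.push_back(e);
  return s;
}
```

In the terms used above, `median` is the red bar, `q3 - q1` is the box size (stability), and `outliers` are the pluses (candidate failures).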

Clearly, the median values of the distributions are almost the same. The differences are very small and, due to the discussed properties of the under-segmentation error, cannot be used to draw conclusions about which sigmas are better. The box sizes, however, vary significantly. The sigmas from 0.1 to 1.1 yield the most stable performance. The number of failures is also lower for those sigmas. This evaluation does not provide enough information to choose any particular sigma in this range, so for now I settled on 0.2 (it is the second most stable, but yields fewer failures than 0.1).

In order to explore the influence of the color term, I set Euclidean sigma to the value of voxel resolution again and angular sigma to 0.2:


Note the last column, which shows the error distribution when the color term is removed completely. We see that sigmas from 0.1 to 0.15 provide slightly more stable results. Unfortunately it is not visible in the plot, but the number of pluses on the 10^{4} row is smaller for these sigmas. So I chose 0.15 as a result of this evaluation.

As for the distance sigma, it turned out to have very little influence on the results. In most cases the introduction of the distance term does not change the output at all. Still, in a few cases it helps to avoid complete segmentation failure. It turned out that setting this sigma to the voxel resolution value gives the best results.

Finally, let me demonstrate the segmentations produced with the chosen sigmas. Among the 50 random seedings only 5 resulted in segmentation failure:

5 failed segmentations

Clearly, failures happened when a seed was placed either exactly on the boundary between two objects (#2), or on the outermost voxels of an object (#1, #3, #4, #5).

The remaining 45 seedings yielded good segmentations. Below are 15 of them (selected randomly):

15 successful segmentations

Refactoring and Speeding Up Random Walker Segmentation
Sunday, November 10, 2013

In the past weeks I decided to put the “spectral thing” on hold and turned back to the random walker segmentation. In this blog post I will talk about refactoring of the random walker algorithm and my experiments with different linear solvers. In the follow-up post I will explore how the segmentation results depend on seed placement, as well as discuss edge weighting functions and the choice of parameters for them.

First of all, I refactored and cleaned up my implementation. Recall that the random walker algorithm could be applied to cluster any kind of data as long as there is a way to model it using a weighted graph. I decided that it makes sense to have a generic templated implementation of the algorithm which would work with any weighted graph. An obvious choice to represent graphs in C++ is to use the primitives available in the Boost Graph Library (BGL). This library is very generic, feature-rich, and flexible, though at the expense of a rather steep learning curve. I took their implementation of the Boykov-Kolmogorov max-flow algorithm as an example of how to design the interface for a generic graph-based algorithm. In my case the public interface is just one templated function:

template<class Graph,
         class EdgeWeightMap,
         class VertexColorMap>
bool  // return type is not shown in the post; a success flag is assumed here
randomWalkerSegmentation(Graph& g,
                         EdgeWeightMap weights,
                         VertexColorMap colors);

The user has to provide a graph, a property map that associates a weight to each edge of the graph, and a property map that contains initial vertex colors. (I adopted the term “colors” instead of “labels” because BGL has a pre-defined vertex property type with this name.) The output of the algorithm (i.e. label assignment) is written back to the color map. Internally the function instantiates a class, which does all the boring work of constructing and solving a system of linear equations, as well as interpreting its solution as a label assignment.
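To make the "system of linear equations" concrete, here is a toy version of the computation on a plain edge list. It is a sketch only: instead of assembling a sparse matrix and calling a solver, it solves the same equations with Gauss-Seidel sweeps (each unlabeled vertex repeatedly takes the weighted average of its neighbors). The function name, the `Edge` struct, and the use of -1 for unlabeled vertices are hypothetical and unrelated to the BGL-based interface above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Edge {
  std::size_t u;
  std::size_t v;
  double weight;
};

// colors[i] >= 0 marks a seed with that label; -1 marks an unlabeled vertex.
// On return, every unlabeled vertex connected to a seed carries the label
// with the highest first-arrival probability.
void randomWalkerToy(const std::vector<Edge>& edges, std::vector<int>& colors,
                     int num_labels, int sweeps = 1000) {
  const std::size_t n = colors.size();
  assert(num_labels > 0);

  std::vector<std::vector<std::pair<std::size_t, double>>> adjacency(n);
  for (const Edge& e : edges) {
    assert(e.u < n && e.v < n && e.weight > 0.0);
    adjacency[e.u].push_back(std::make_pair(e.v, e.weight));
    adjacency[e.v].push_back(std::make_pair(e.u, e.weight));
  }

  // prob[l][i]: probability that a walker starting at vertex i reaches a
  // seed with label l before any other seed. Seed entries stay fixed.
  std::vector<std::vector<double>> prob(num_labels, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    if (colors[i] < 0) continue;
    assert(colors[i] < num_labels);
    prob[colors[i]][i] = 1.0;
  }

  // Gauss-Seidel sweeps over the unlabeled vertices.
  for (int s = 0; s < sweeps; ++s) {
    for (std::size_t i = 0; i < n; ++i) {
      if (colors[i] >= 0 || adjacency[i].empty()) continue;
      double degree = 0.0;
      for (const auto& nb : adjacency[i]) degree += nb.second;
      for (int l = 0; l < num_labels; ++l) {
        double sum = 0.0;
        for (const auto& nb : adjacency[i]) sum += nb.second * prob[l][nb.first];
        prob[l][i] = sum / degree;
      }
    }
  }

  // Write the most probable label back to the color map.
  for (std::size_t i = 0; i < n; ++i) {
    if (colors[i] >= 0) continue;
    double best_prob = 0.0;
    for (int l = 0; l < num_labels; ++l)
      if (prob[l][i] > best_prob) {
        best_prob = prob[l][i];
        colors[i] = l;
      }
  }
}
```

On a four-vertex path with seeds at both ends and a weak middle edge, the two inner vertices end up with the label of the seed on their side of the weak edge.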

While this generic graph segmentation function might be useful for someone, the general audience will be interested in a class that implements a complete point cloud segmentation pipeline. This class should take care of converting an input cloud into a weighted graph, segmenting it, and turning the random walker output into a labeled point cloud. At the moment it is not clear to me how the first step should be designed. Indeed, there are multiple ways to represent a point cloud as a weighted graph, both in terms of topology and edge weights. Currently I voxelize the input cloud and use the 26-neighborhood to establish edges between nodes. Alternatively, one may work with the full point cloud and use some fixed-radius neighborhood. One more option might be to generate a mesh and work with it. Exploration of these possibilities remains future work.
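The 26-neighborhood topology mentioned above can be sketched on integer grid coordinates as follows. This is an illustration with hypothetical names that stores occupied cells in a `std::set`; it does not use the octree that the actual pre-processing step relies on.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Cell = std::array<int, 3>;  // integer voxel coordinates on a regular grid

// All offsets to the 26 cells that share a face, edge, or corner with a cell.
std::vector<Cell> neighborOffsets26() {
  std::vector<Cell> offsets;
  for (int dx = -1; dx <= 1; ++dx)
    for (int dy = -1; dy <= 1; ++dy)
      for (int dz = -1; dz <= 1; ++dz)
        if (dx != 0 || dy != 0 || dz != 0) offsets.push_back(Cell{dx, dy, dz});
  assert(offsets.size() == 26);
  return offsets;
}

// Graph edges between occupied cells; each edge is reported once
// (the lexicographically smaller cell comes first).
std::vector<std::pair<Cell, Cell>> adjacencyEdges(const std::set<Cell>& occupied) {
  const std::vector<Cell> offsets = neighborOffsets26();
  std::vector<std::pair<Cell, Cell>> edges;
  for (const Cell& c : occupied) {
    for (const Cell& o : offsets) {
      const Cell nb{c[0] + o[0], c[1] + o[1], c[2] + o[2]};
      if (c < nb && occupied.count(nb) > 0) edges.emplace_back(c, nb);
    }
  }
  return edges;
}
```

A fixed-radius neighborhood on the raw cloud would replace the offset loop with a radius search, while the rest of the graph construction would stay the same.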

The second issue that I addressed recently was the performance of the algorithm. The main computational effort is spent on solving a sparse system of linear equations, where the number of equations is determined by the number of unlabeled vertices (i.e. basically the size of the whole point cloud). For example, the typical size of voxelized scenes from the OSD dataset that I use in my experiments is about 30000 vertices. Originally, I used the ConjugateGradient solver of Eigen, because it is “recommended for large symmetric problems”. The time needed to segment a typical point cloud with this solver is about 1 second on my three-year-old i5 laptop. I decided to try other options available in Eigen. In particular, I tested BiCGSTAB with Diagonal and IncompleteLUT preconditioners, as well as the SimplicialLLT, SimplicialLDLT, and SimplicialCholesky solvers. The figure below shows the runtime of these solvers with respect to the problem size. (Only one of the Simplicial*** solvers is plotted as they demonstrated very similar performance.)


The computation time depends linearly on the problem size for all solvers; however, SimplicialLDLT has a much smaller growth rate. For a typical 30 thousand vertex problem it needs about 200 ms. What’s more, it can solve for multiple right-hand sides at the same time, whereas ConjugateGradient and BiCGSTAB cannot. This means that as the number of labels (i.e. desired segments) grows, the computational time does not increase.
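The reason a direct solver copes well with many labels is that the expensive factorization is done once and then reused for every right-hand side. The sketch below shows this with a small dense LDL^T factorization written from scratch; it is not Eigen's sparse SimplicialLDLT, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// A = L * D * L^T with unit lower-triangular L and diagonal D.
struct Ldlt {
  Matrix L;
  std::vector<double> d;
};

// Factorize a symmetric positive definite matrix (done once).
Ldlt factorize(const Matrix& A) {
  const std::size_t n = A.size();
  Ldlt f{Matrix(n, std::vector<double>(n, 0.0)), std::vector<double>(n, 0.0)};
  for (std::size_t j = 0; j < n; ++j) {
    double dj = A[j][j];
    for (std::size_t k = 0; k < j; ++k) dj -= f.L[j][k] * f.L[j][k] * f.d[k];
    assert(dj > 0.0);  // fails if A is not positive definite
    f.d[j] = dj;
    f.L[j][j] = 1.0;
    for (std::size_t i = j + 1; i < n; ++i) {
      double s = A[i][j];
      for (std::size_t k = 0; k < j; ++k) s -= f.L[i][k] * f.L[j][k] * f.d[k];
      f.L[i][j] = s / dj;
    }
  }
  return f;
}

// Solve A x = b by cheap substitutions (done once per right-hand side).
std::vector<double> solve(const Ldlt& f, std::vector<double> b) {
  const std::size_t n = b.size();
  assert(n == f.d.size());
  for (std::size_t i = 0; i < n; ++i)  // forward: L y = b
    for (std::size_t k = 0; k < i; ++k) b[i] -= f.L[i][k] * b[k];
  for (std::size_t i = 0; i < n; ++i) b[i] /= f.d[i];  // diagonal: D z = y
  for (std::size_t i = n; i-- > 0;)  // backward: L^T x = z
    for (std::size_t k = i + 1; k < n; ++k) b[i] -= f.L[k][i] * b[k];
  return b;
}
```

In the random walker setting each label contributes one right-hand side, so adding labels only adds substitution passes, which are much cheaper than the factorization.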

In fact, Eigen offers some more options such as CholmodSupernodalLLT, which is a wrapper for the SuiteSparse package, and SparseLU, which uses the techniques from the SuperLU package. Unfortunately, the former complained that the matrices that I provide are not positive definite (though they actually are), and the latter is a very recent addition that is only available in Eigen 3.2 (which I do not have at the moment).

Taking into account the evaluation results, I switched to the SimplicialLDLT solver in my random walker implementation.

First results
Monday, October 28, 2013

We have implemented an algorithm which processes frames independently (i.e. without any connection to the previous frame). Also, for now we assume that both curbs (left and right) are present in the scene.

Below you can see a projection of the labeled DEM onto the left image. Green points correspond to the left sidewalk, blue points to the right one. Red points mark the road surface. The algorithm couldn’t find the right curb in this image, so the right side of the road was labeled incorrectly. The good news is that the left curb was detected correctly.


However, our goal is to label the road in the image, not in the DEM. So, if we mark each pixel with the label of the corresponding DEM cell, we get the following labeling of the road surface:


You can see a lot of holes in the road area. They are caused by holes in the disparity map. We decided not to fill them, because someone or something may be situated there (we have no information).

A disparity map of this frame is shown below. Points without disparity are marked with red.

TOCS 2.0 - Superquadrics
Friday, October 25, 2013

I am pleased to announce that I just finished the second Toyota Code Sprint. The topic we tackled this time was superquadrics applied for Computer Vision tasks such as object modeling and object detection.

The code we developed was not yet integrated into PCL (it does use PCL for all the processing), but lies in a separate repository which you can find here: https://github.com/aichim/superquadrics .

An extensive report that presents some of the theory behind the concepts and algorithms we used in the project, as well as implementation details and results can be found at the end of this post.

At the end of this work, we present a set of ideas that can be used to extend the project in subsequent code sprints:

  • performance evaluation of the superquadric-based algorithms compared to other state-of-the-art object modeling and object detection approaches and integrating my code into PCL if the results are satisfactory
  • explore further possibilities of object modeling inside point clouds. This includes techniques different from superquadrics (see the report for references), or improving superquadric-based techniques (see supertoroids, deformable superquadrics)
  • in this code sprint we explored one approach for multipart object segmentation using superquadrics. This technique is valuable for point cloud compression, but not very efficient nor robust. More work in this direction can bring interesting results.
  • more robust and efficient object fitting using superquadrics - right now we use only the 3d location of the points, but the quality of the fitting can be improved by using normal and/or curvature information.
  • considering that most of the scans of objects will not cover the complete sphere of possible views, we should think about how to fit only partial superquadrics

Recognition results
Friday, October 25, 2013

Hello everybody. For the last few weeks I have been trying to train an SVM for car recognition. For this purpose I used some clouds that I had: clouds of the city of Enschede, Netherlands, that I had manually labeled earlier. The training set consists of 401 clouds of cars and 401 clouds of other objects (people, trees, signs etc.). As for the classifier, I used the Support Vector Machine from the libSVM library.

During the training I used 5-fold cross validation and a grid search in order to find the best values of gamma (the parameter of the Gaussian kernel) and the soft margin C. The best accuracy achieved during cross validation was 91.2718% with gamma and C equal to 2^{-5} and 2^{13} respectively.
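The grid search itself is a simple double loop over powers of two. The sketch below leaves the actual cross validation to a callback, since that part depends on the SVM library; the callback, the struct, and the exponent ranges used in the example are assumptions, not values reported in the post.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct GridResult {
  double c;
  double gamma;
  double accuracy;
};

// cross_validate(C, gamma) is a hypothetical callback that returns the
// k-fold cross validation accuracy for one parameter pair.
GridResult gridSearch(const std::function<double(double, double)>& cross_validate,
                      int log2c_min, int log2c_max,
                      int log2g_min, int log2g_max, int step) {
  assert(step > 0 && log2c_min <= log2c_max && log2g_min <= log2g_max);
  GridResult best{0.0, 0.0, -1.0};
  for (int lc = log2c_min; lc <= log2c_max; lc += step) {
    for (int lg = log2g_min; lg <= log2g_max; lg += step) {
      const double c = std::ldexp(1.0, lc);      // 2^lc
      const double gamma = std::ldexp(1.0, lg);  // 2^lg
      const double accuracy = cross_validate(c, gamma);
      if (accuracy > best.accuracy) best = GridResult{c, gamma, accuracy};
    }
  }
  return best;
}
```

For example, `gridSearch(cv, -5, 15, -15, 3, 2)` would evaluate C in {2^-5, 2^-3, ..., 2^15} and gamma in {2^-15, 2^-13, ..., 2^3}, a grid that contains the pair reported above.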

The model obtained after training was then used for recognition. The test set consists of 401 cars and 401 other objects. Training and test sets were taken randomly from different scanned streets. The best accuracy achieved so far on the test set is 90.7731% (728 correctly recognized objects out of 802).

As for descriptors, I used a combination of the RoPS feature and some global features such as the height and width of the oriented bounding box. The RoPS feature was calculated at the center of mass of the cloud with a support radius big enough to include all the points of the given cloud.

Since RoPS is better suited to local feature extraction, I believe that using it with ISM and Hough Transform voting will result in higher accuracy.

An Algorithm for Spectral Clustering of Supervoxel Graphs
Saturday, October 12, 2013

In the several previous posts I tried to provide some insight in how the spectral clustering technique may be applied to the point cloud processing domain. In particular, I have demonstrated different visualizations of eigenvectors and also did a manual analysis of one particular scene. In this blog post I will (finally) describe my algorithm that does automatic analysis of eigenvectors, which leads to unsupervised supervoxel clustering.

The input of the clustering algorithm is \Phi, a set of the first k eigenvectors of the graph Laplacian. Each eigenvector \phi_k has n elements that correspond to the supervoxels in the original problem space. The task of the algorithm is to determine the number of clusters that the data points form in the subspace spanned by the eigenvectors and, of course, assign points to these clusters.

The key insight drawn from previous examinations and discussions of the eigenvectors is that the clusters are linearly separable in one-dimensional subspaces spanned by the eigenvectors. In other words, for every pair of clusters there exists at least one eigenvector so that in its subspace these clusters are linearly separable. Based on this premise I built an algorithm which is a pretty straightforward instance of divisive hierarchical clustering approach. It starts with a single cluster that contains all the data points and recursively splits it in a greedy manner. The following pseudo-code summarizes the algorithm:


The interesting parts are, of course, the FindBestSplit and SplitQuality functions. But to get this straight, let me first define what a “split” is. A split is a tuple (\phi_k, t), where \phi_k is the eigenvector along whose subspace the split occurs, and t is a threshold value. The points that have their corresponding elements in the eigenvector less than t go to the first cluster, and the remaining go to the second. For example, the figure below shows with a red line a split (\phi_1, 0.41) on the left and a split (\phi_2, 0.93) on the right:

Example splits in the subspaces spanned by the
eigenvectors \phi_1 and \phi_2
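Applying a split as defined above is a one-pass partition of the cluster. In the sketch below the eigenvectors are stored as `eigenvectors[k][i]` (element i of eigenvector k) and a cluster is a list of supervoxel indices; both layouts and all names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A split (phi_k, t): index of the eigenvector and the threshold value.
struct Split {
  std::size_t eigenvector;
  double threshold;
};

using Cluster = std::vector<std::size_t>;  // supervoxel indices

// Points whose element in phi_k is less than t go to the first cluster,
// the remaining points go to the second.
std::pair<Cluster, Cluster> applySplit(
    const std::vector<std::vector<double>>& eigenvectors,
    const Cluster& cluster, const Split& split) {
  assert(split.eigenvector < eigenvectors.size());
  const std::vector<double>& phi = eigenvectors[split.eigenvector];
  std::pair<Cluster, Cluster> parts;
  for (std::size_t i : cluster) {
    assert(i < phi.size());
    (phi[i] < split.threshold ? parts.first : parts.second).push_back(i);
  }
  return parts;
}
```

The divisive procedure would call this on the current cluster and then recurse into both halves.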

Which of these two splits is better? Intuitively, the one on the left is more promising than the one on the right. But how to define the split quality? My initial approach was to use the difference between the points immediately above and below the splitting line as the measure. This worked to some extent, but was not good enough. Then I switched to a measure based on the relative densities of the bands above and below the splitting line. Consider the figure below:

Example splits with three bands highlighted. The
“split” band is shown in red, the “top” and “bottom”
bands are shown in yellow

The “split” band is the region of low density around the splitting line. The “top” and “bottom” bands are the high density regions immediately above and below the “split” band. I will omit the details of how these bands are computed, because I will likely modify the implementation in the future. The quality of the split is defined as \frac{\min(D(top), D(bottom))}{D(split)}, where D(\cdot) is the density of the corresponding band.

With this quality measure at hand, the FindBestSplit function simply iterates over all available one-dimensional subspaces (i.e. over all eigenvectors) and finds the split with the highest quality.
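Since the band computation is deliberately left out of the post, the following sketch invents a simple one purely for illustration: all three bands have the fixed width 2h, with the split band centered on the threshold, and one virtual point is added to the split band so that the ratio stays finite. Candidate thresholds are the midpoints between consecutive sorted values. None of these choices come from the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Split {
  std::size_t eigenvector;
  double threshold;
  double quality;
};

// Number of values in [lo, hi) divided by the band width.
double density(const std::vector<double>& values, double lo, double hi) {
  const auto count = std::count_if(values.begin(), values.end(),
                                   [lo, hi](double v) { return v >= lo && v < hi; });
  return static_cast<double>(count) / (hi - lo);
}

// min(D(top), D(bottom)) / D(split) with hypothetical fixed-width bands.
double splitQuality(const std::vector<double>& values, double t, double h) {
  assert(h > 0.0);
  const double top = density(values, t + h, t + 3.0 * h);
  const double bottom = density(values, t - 3.0 * h, t - h);
  // One virtual point keeps the denominator positive (assumed smoothing).
  const double split = density(values, t - h, t + h) + 1.0 / (2.0 * h);
  return std::min(top, bottom) / split;
}

// Try every eigenvector and every candidate threshold, keep the best split.
Split findBestSplit(const std::vector<std::vector<double>>& eigenvectors, double h) {
  Split best{0, 0.0, 0.0};
  for (std::size_t k = 0; k < eigenvectors.size(); ++k) {
    std::vector<double> sorted = eigenvectors[k];
    std::sort(sorted.begin(), sorted.end());
    for (std::size_t i = 0; i + 1 < sorted.size(); ++i) {
      const double t = 0.5 * (sorted[i] + sorted[i + 1]);
      const double q = splitQuality(sorted, t, h);
      if (q > best.quality) best = Split{k, t, q};
    }
  }
  return best;
}
```

A greedy divisive procedure would call `findBestSplit` on the current cluster and stop splitting once the best quality drops below some threshold.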

The performance of the algorithm is excellent on simple scenes:

_images/test13.png _images/test47.png
_images/test13-ssc-clusters.png _images/test47-ssc-clusters.png
Spectral supervoxel clustering of simple table-top scenes

And is rather good (though definitely not perfect) on more cluttered ones:

_images/test55.png _images/test60.png
_images/test55-ssc-clusters.png _images/test60-ssc-clusters.png
Spectral supervoxel clustering of cluttered table-top scenes

For example, the green book in the first scene is split into two clusters. In the second scene two small boxes in the center of the image are erroneously merged into one cluster. I think these issues are closely related to the split quality measure and the threshold associated with it. There are definitely ways to improve these, and I plan to work on them in the future.

Analyzing Eigenvectors “By Hand”
Saturday, October 05, 2013

Before presenting the clustering algorithm as promised in the last blog post, I decided to motivate it by showing how the eigenvectors may be analyzed “by hand”. Hopefully, this will also provide more intuition about what these eigenvectors actually are and how they are related to the data.

As a reminder, the problem I am trying to solve is about segmenting a set of supervoxels into meaningful components. Here a meaningful component means a subset of supervoxels that are close to each other in the Euclidean sense, and are separated from the rest by a sharp change in orientation. If we view each supervoxel as a point then the problem is about clustering points in a d-dimensional space. (Currently d=6 since supervoxels have 3 Euclidean coordinates plus 3 coordinates of the normal vector; however, additional dimensions, e.g. color, may be added later.) The difficulty of the problem comes from the fact that the components may have arbitrary irregular shape in these dimensions. Therefore I want to map these points to some other space where the components will correspond to tight clusters, perhaps even linearly separable. The current idea is to use a subspace spanned by the first few eigenvectors of the graph Laplacian of the original data.

In the last blog post I provided a visualization of the eigenvectors in the original problem space. For convenience, here are the first four eigenvectors again:

The first 4 eigenvectors in the original problem space

I want to perform clustering in a subspace though, so it is helpful to develop an intuition about what the data look like in it. The figure below demonstrates the data points (that is, supervoxels), projected on each of the first four eigenvectors. (Here and in what follows the data is whitened, i.e. de-meaned and scaled to have unit variance. Additionally, the values are sorted in increasing order.)

Data points in subspaces spanned by the first 4 eigenvectors
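The pre-processing applied to each eigenvector before plotting (de-meaning, scaling to unit variance, sorting) can be sketched as below. The function name is hypothetical, and the population variance (division by n) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// De-mean, scale to unit variance, and sort the elements of one eigenvector.
std::vector<double> whitenAndSort(std::vector<double> values) {
  assert(values.size() > 1);
  const double n = static_cast<double>(values.size());

  double mean = 0.0;
  for (double v : values) mean += v;
  mean /= n;

  double variance = 0.0;
  for (double v : values) variance += (v - mean) * (v - mean);
  variance /= n;  // population variance (assumed)
  assert(variance > 0.0);

  const double scale = 1.0 / std::sqrt(variance);
  for (double& v : values) v = (v - mean) * scale;

  std::sort(values.begin(), values.end());
  return values;
}
```

After this step, groups of supervoxels with nearly equal eigenvector elements show up as flat segments in the sorted curve, and gaps between clusters show up as jumps.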

Obviously, in each of these subspaces (except for the second) the data is linearly separable in two clusters. What is not obvious, however, is how many clusters there will be in the combined subspace. The next figure shows data points in subspaces spanned by two different pairs of eigenvectors:

Data points in subspaces spanned by first and third (left)
and fourth and third eigenvectors (right)

Now it becomes evident that there are at least three clusters. Will there be more if we consider the subspace spanned by all these three eigenvectors? It turns out there will, see the point cloud below:

Unfortunately, we have just reached the limit of how many dimensions can be conveniently visualized. For this particular data set it is enough, though: no more clusters appear if we consider additional eigenvectors. Summarizing, in the subspace spanned by the first four eigenvectors the data points form four tight and well-separated (linearly) clusters. And these clusters actually correspond to the three boxes and the table in the original problem space.

Now the only thing left is to develop an algorithm which would do this kind of analysis automatically!

Correspondence Rejection: A Quick Reference Guide
Friday, October 04, 2013

Correspondence rejection classes implement methods that help eliminate correspondences based on specific criteria such as distance, median distance, normal similarity measure or RANSAC, to name a few. A couple of additional filters I’ve experimented with include a uniqueness measure and Lowe’s ratio measure as in “Distinctive image features from scale invariant keypoints”, D.G. Lowe, 2004. I’ve also explored the tradeoffs of implementing the filters within CorrespondenceEstimation itself, or as external CorrespondenceRejection classes. The former is computationally more efficient if the rejection process is done in one pass, while the latter allows for scene-specific sequential filter banks.
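As an illustration of one such filter, Lowe's ratio measure keeps a correspondence only if its best match is clearly closer than the second-best one. The sketch below is standalone and does not use the PCL rejection classes; the `Match` struct is hypothetical, and the default threshold of 0.8 is the value suggested in Lowe's paper.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: a source keypoint, its nearest target keypoint, and
// the descriptor distances to the nearest and second-nearest targets.
struct Match {
  int source;
  int target;
  double best_distance;
  double second_distance;
};

// Keep a match only if best_distance / second_distance < max_ratio.
std::vector<Match> ratioTest(const std::vector<Match>& matches,
                             double max_ratio = 0.8) {
  assert(max_ratio > 0.0 && max_ratio <= 1.0);
  std::vector<Match> kept;
  for (const Match& m : matches)
    if (m.best_distance < max_ratio * m.second_distance) kept.push_back(m);
  return kept;
}
```

Because the filter only needs the two smallest distances per source keypoint, it can be applied during estimation in a single pass or afterwards as a separate rejection step.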

What follows is a quick reference guide to the available correspondence rejection classes with remarks extracted from the source code.

Eigenvectors and Spectral Clustering
Thursday, October 03, 2013

I concluded the last blog post by noting that it seems to be possible to segment objects based on analysis of the eigenvectors of Laplacian constructed from the point cloud. This time I will provide a visual interpretation of eigenvectors and then describe the problem of their analysis.

Let me start with a quick note on eigenvector computation. As mentioned before, they are obtained through eigendecomposition of the Laplacian that represents the surface. In the beginning I used the SelfAdjointEigenSolver of the Eigen library. It runs in \operatorname{O}\left(n^3\right) time (where n is the number of points), which obviously does not scale well. Later I switched to SLEPc. It can limit the computation to a desired number of first eigenpairs and therefore does the job much faster, but still seems to have polynomial time. Therefore I decided to execute supervoxel clustering as a pre-processing step and then compute the distances over the supervoxel adjacency graph, which has a dramatically smaller size than the original point cloud.

Now let’s turn to the eigenvectors themselves. The figure below demonstrates the voxelized point cloud of a simple scene (left) and the supervoxel adjacency graph (right), where the adjacency edges are colored according to their weights (from dark blue for small weights to dark red for large weights):

_images/test13-voxels.png _images/test13-supervoxels-adjacency.png
Voxelized point cloud (voxel
size 0.006 m)

Supervoxel adjacency graph
(seed size 0.025 m), colored
according to edge weight

Eigendecomposition of Laplacian of this weighted adjacency graph yields a set of pairs \left\{\lambda_{k},\phi_{k}\right\}. Each eigenvector \phi_{k} has as many elements as there are vertices in the graph. Therefore it is possible to visualize an eigenvector by painting each supervoxel according to its corresponding element in the vector. The figure below shows the first 9 eigenvectors which correspond to the smallest eigenvalues:

First 9 eigenvectors of the graph Laplacian

The first eigenvector clearly separates the scene into two parts: the third box and everything else. In the second eigenvector the table is covered with gradient, but the first and third boxes have (different) uniform colors and, therefore, stand out. The third eigenvector highlights the second box, and so on.

I have examined quite a number of eigenvectors of different scenes, and I think the following common pattern exists. The first few eigenvectors tend to break the scene into several regions with uniform colors and sharp edges. In the next eigenvectors gradients begin to emerge. Typically, a large part of the scene would be covered with a gradient, and a smaller part (corresponding to a distinct component of the graph) would have some uniform color.

The overall goal is to figure out the number of distinct components of the graph (that is, objects in the scene) and segment them out. As I admitted before, it is clear that the eigenvectors capture all the information needed to do this, so the question is how to extract it. In fact, this problem has already received a lot of attention from researchers under the name of “Spectral Clustering”. (Yeah, I made quite a detour through all these distance measures on 3D surfaces before I came to know it). The standard approach is described in the following paper:

In a nutshell, the original problem space typically has many dimensions (in our case the dimensions are Euclidean coordinates, normal orientations, point colors, etc.). The clusters may have arbitrary irregular shapes, so they can be separated neither linearly nor with hyper-spheres, which renders standard techniques like K-means inapplicable. The good news is that in the subspace spanned by the first few eigenvectors of the Laplacian of the graph (constructed from the original data) the data points form tight clusters, and thus K-means can be used. This effect is evident in the eigenvectors that I demonstrated earlier. Unfortunately, the number of clusters still needs to be known. There exists literature that addresses automatic selection of the number of clusters; however, I have not seen any simple and reliable method so far.

In the next blog post I will describe a simple algorithm that I have developed to analyze the eigenvectors and demonstrate the results.

Correspondence Estimation: A Quick Reference Guide
Saturday, September 28, 2013

With my current work on optimizing correspondence estimation across the uv/xyz domains, it is worth providing a topology of the available correspondence estimation classes in PCL. For a high-level treatment of the registration API, please refer to the registration tutorial.

Correspondence estimation attempts to match keypoints in a source cloud to keypoints in a target cloud, based on some similarity measure, feature descriptors in our case. Although applying scene-relevant descriptor parameters and correspondence thresholds may reduce erroneous matches, outliers persist and affect pose estimation. This is due to the implied assumption that for each source keypoint, a corresponding target keypoint exists. The difficulty in estimating model or scene-specific descriptor parameters is another factor.

What follows is a quick reference guide to the available correspondence estimation classes with remarks extracted from the source code.

Computation of Distance Measures
Thursday, September 26, 2013

Last time I wrote about distance measures on 3D surfaces, though I did not give any details about how they are computed. In this blog post I will give a formal definition, followed by two important properties that simplify the computation and provide insights that might help to solve the ultimate goal: identification of distinct objects in a scene.

Given a mesh (or a point cloud) that represents a surface, a discretization of the Laplace-Beltrami operator (LBO) is constructed. This discretization is a sparse symmetric matrix of size n \times n, where n is the number of vertices (points) in the surface. The non-zero entries of this matrix are the negated weights of the edges between adjacent vertices (points) and also vertex degrees. This matrix is often referred to as Laplacian. Eigendecomposition of Laplacian consists of pairs \left\{\lambda_{k},\phi_{k}\right\}, where 0=\lambda_{0}<\lambda_{1}\leq\dotso are eigenvalues, and \phi_{0},\phi_{1},\dotsc are corresponding eigenvectors.
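The matrix described above (negated edge weights off the diagonal, vertex degrees on the diagonal) can be assembled from an edge list as sketched below. A dense matrix is used only to keep the example short, whereas the real matrix is sparse; the `Edge` struct and function name are hypothetical, and the weights are taken as given.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Edge {
  std::size_t u;
  std::size_t v;
  double weight;
};

using Matrix = std::vector<std::vector<double>>;

// Graph Laplacian: L[u][v] = -w_uv for adjacent vertices and
// L[u][u] = sum of the weights of all edges incident to u.
Matrix graphLaplacian(std::size_t n, const std::vector<Edge>& edges) {
  Matrix L(n, std::vector<double>(n, 0.0));
  for (const Edge& e : edges) {
    assert(e.u < n && e.v < n && e.u != e.v);
    L[e.u][e.v] -= e.weight;
    L[e.v][e.u] -= e.weight;
    L[e.u][e.u] += e.weight;
    L[e.v][e.v] += e.weight;
  }
  return L;
}
```

Every row of the resulting matrix sums to zero, which is why the constant vector is an eigenvector with eigenvalue \lambda_{0}=0.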

The diffusion distance is defined in terms of the eigenvalues and eigenvectors of Laplacian as follows:

\mathcal{D}_t(x,y)^2 = \sum_{k=1}^{\infty}e^{-2\lambda_{k}t}\left(\phi_{k}(x)-\phi_{k}(y)\right)^2

The biharmonic distance bears a strong resemblance to it:

\mathcal{B}(x,y)^2 = \sum_{k=1}^{\infty}\lambda_{k}^{-2}\left(\phi_{k}(x)-\phi_{k}(y)\right)^2

Here \phi_{k}(x) means x-th element of eigenvector \phi_{k}. Both distances have a similar structure: a sum over all eigenpairs, where summands are differences between corresponding elements of eigenvectors scaled by some function of eigenvalues. There are two properties of these distances that I would like to stress.

Firstly, the summands form a decreasing sequence. The figure below illustrates this point with the eigenvalues of the Laplacian of a typical point cloud:


In the left image the first hundred eigenvalues (except for \lambda_0, which is always zero) are plotted. Note that the values are normalized (i.e. divided) by \lambda_1. The magnitudes of the eigenvalues increase rapidly. On the right the multipliers of both diffusion and biharmonic distances are plotted (also computed with normalized eigenvalues). The diffusion distance multiplier is plotted for several choices of the parameter t. Clearly, only the first few terms in the summation are needed to approximate either of the distances. An important consequence is that there is no need to solve the eigenproblem completely; it suffices to find a limited number of eigenpairs with small eigenvalues.
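The effect of truncating the sums can be checked on a toy case. The sketch below is not the code used for the figures: it assumes an unweighted path graph, whose Laplacian eigenpairs are known in closed form, and the function names are made up for illustration.

```python
import math

def path_graph_eigenpairs(n):
    """Closed-form Laplacian eigenpairs of an unweighted path graph with n vertices."""
    pairs = []
    for k in range(n):
        lam = 2.0 - 2.0 * math.cos(math.pi * k / n)
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)  # unit-norm eigenvectors
        phi = [scale * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
        pairs.append((lam, phi))
    return pairs  # sorted by increasing eigenvalue, pairs[0] has lambda_0 = 0

def diffusion_distance_sq(pairs, x, y, t, num_terms=None):
    """Squared diffusion distance using the first num_terms non-trivial eigenpairs."""
    terms = pairs[1:] if num_terms is None else pairs[1:1 + num_terms]
    return sum(math.exp(-2.0 * lam * t) * (phi[x] - phi[y]) ** 2 for lam, phi in terms)

def biharmonic_distance_sq(pairs, x, y, num_terms=None):
    """Squared biharmonic distance using the first num_terms non-trivial eigenpairs."""
    terms = pairs[1:] if num_terms is None else pairs[1:1 + num_terms]
    return sum((phi[x] - phi[y]) ** 2 / lam ** 2 for lam, phi in terms)

pairs = path_graph_eigenpairs(50)
full = biharmonic_distance_sq(pairs, 10, 40)
truncated = biharmonic_distance_sq(pairs, 10, 40, num_terms=10)
print(truncated / full)  # fraction of the full sum captured by 10 of 49 terms
```

On this toy graph ten eigenpairs already account for almost all of the squared biharmonic distance. Real point cloud Laplacians have no closed form and would need a sparse eigensolver restricted to the smallest eigenvalues.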

Secondly, the distance between two points x and y depends on the difference between their corresponding elements in eigenvectors \phi_{k}(x) and \phi_{k}(y). The figure below demonstrates the (sorted) elements of a typical eigenvector:


One may see that there are groups of elements with the same value. For example, there are about one hundred elements with value close to 0.05. The pair-wise distances between the points that correspond to these elements will therefore be close to zero. In other words, plateaus in eigenvector graphs correspond to sets of points that nearly coincide under these distances, and such sets may be interpreted as objects.

Summing up, it seems like it should be possible to identify distinct objects in a point cloud by analyzing the eigenvectors of the Laplacian (even without explicitly computing any of the distance measures). Moreover, only the first few eigenvectors are relevant, so it is not necessary to solve the eigenproblem entirely.
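One simple way to pick out such plateaus is to sort the entries and cut wherever two consecutive values are further apart than some tolerance. This is only an illustrative sketch; the helper name and the `gap` threshold are assumptions, not part of the actual implementation.

```python
def split_into_plateaus(values, gap):
    """Group indices whose sorted eigenvector entries are separated by at most `gap`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > gap:
            groups.append([])  # a jump in value starts a new plateau
        groups[-1].append(cur)
    return groups

# Toy eigenvector with three plateaus (values are made up).
phi = [0.05, -0.10, 0.051, -0.099, 0.049, 0.20]
plateaus = split_into_plateaus(phi, gap=0.02)
print(plateaus)
```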

Experimenting with half
Saturday, September 21, 2013

Today I started experimenting with half (http://half.sourceforge.net/), a header-only C++ library that provides an IEEE 754 conformant 16-bit half-precision floating point type.

The results are mixed: I ended up with noticeably lighter binary files (50% lighter, as expected), but there is a loss of precision when I convert them back to ASCII.

The file outdoor.ptx when encoded to binary is just 62M vs 124M.

The conversion back to ASCII is not so great though (probably because of rounding). Below are 10 lines from the original ASCII file and the corresponding lines generated by converting the binary file back.

Original ASCII file:

X Y Z intensity
0.004745 1.044357 -2.114578 0.006226
0.004745 1.046707 -2.112625 0.006714
0.004745 1.049637 -2.111862 0.006409
0 0 0 0.500000
0.004776 1.057053 -2.113419 0.006088
0.004776 1.060349 -2.113327 0.006683
0.004807 1.064133 -2.114212 0.007370
0.004807 1.068130 -2.115555 0.007156
0.004807 1.072067 -2.116714 0.006760
0.004837 1.075150 -2.116165 0.006790

Using half precision float:

X Y Z intensity
0.00474167 1.04395 -2.11328 0.00622559
0.00474167 1.0459 -2.11133 0.00671387
0.00474167 1.04883 -2.11133 0.00640869
0 0 0 0.5
0.00477219 1.05664 -2.11328 0.00608444
0.00477219 1.05957 -2.11328 0.00667953
0.00480652 1.06348 -2.11328 0.00737
0.00480652 1.06738 -2.11523 0.00715256
0.00480652 1.07129 -2.11523 0.00675964
0.00483322 1.07422 -2.11523 0.00678635

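As you can see, the differences start at the 6th decimal position for the small X and intensity values, and already at the 3rd or 4th decimal position for the larger Y and Z values (e.g. -2.114578 becomes -2.11328), since half precision carries only about three significant decimal digits. It could be worth trying different rounding options to see if it helps.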
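The size of the loss can be reproduced without the C++ library. The sketch below uses Python's `struct` format `'e'` (IEEE 754 binary16) as a stand-in for half. Note that `struct` rounds to nearest, so some results may differ in the last bit from the listing above, which looks as if it was produced with a different rounding mode.

```python
import struct

def to_half_and_back(value):
    """Round-trip a float through IEEE 754 binary16 using struct format 'e'."""
    return struct.unpack('<e', struct.pack('<e', value))[0]

for original in (0.004745, 1.044357, -2.114578, 0.006226):
    restored = to_half_and_back(original)
    relative_error = abs(restored - original) / abs(original)
    print(original, restored, relative_error)
```

For values in the normal range the relative error stays below 2^-11 (about 0.05%), which is why the absolute error grows with the magnitude of the coordinate.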

Enhancing LZF + JP2K reading/writing time
Friday, September 20, 2013

By using OpenMP parallelism directives I achieved better results for ASCII reading and LZF + JP2K PTX file encoding. Here I show the improved results.

The experimental protocol is very similar to the one used earlier: 10 runs in a row of the leica_ascii2binary tool with the LZF + JP2K encoding option. Each time we measure the ASCII file reading time and the writing time.

File          Nb of points   ASCII reading (ms)   LZF + JP2K writing (ms)
indoor.ptx    10997760       208685.6             16995.2
outdoor.ptx   8062080        134010.5             X

The figure below compares ASCII reading times with and without OpenMP for both files. Note the 10 second gap achieved through the use of OpenMP.


LZF + JP2K encoding times with and without OpenMP are compared next. The plot shows a gain of almost 3 seconds during encoding.

ROPS code and tutorial
Thursday, September 19, 2013

Hello everybody. I’d like to thank Yulan Guo, one of the authors of the RoPS feature, for his help. I’ve tested my implementation against his and got the same results. I have also tested my implementation for memory leaks with VLD and it works fine: no memory leaks were detected. The code is now ready for commit. As always, I have written a tutorial about using the code. All that is left is to discuss where to place the implemented code.

Distance Measures on 3D Surfaces
Tuesday, September 17, 2013

In recent weeks I have been developing an approach which would allow automatic selection of seed points. I decided to proceed with methods which use a graph-based representation of point clouds and, as a matter of fact, are closely related to random walks on those graphs. I still do not have any solid results, though there are some interesting outputs that I would like to share in this blog post.

In the domain of mesh segmentation, or more generally 3D shape analysis, there is a fundamental problem of measuring distances between points on a surface. The most trivial and intuitive is geodesic distance, which encodes the length of the shortest path along the surface between two points. It has a number of drawbacks, the most important being its sensitivity to perturbations of the surface. For example, introducing a hole along the shortest path between two points, or a small topological shortcut between them, may induce an arbitrarily large change in the distance. This is an undesired property, especially considering the noisy Kinect data that we work with.

A more sophisticated distance measure, diffusion distance, is based on the mathematical study of heat conduction and diffusion. Suppose we have a graph that represents a shape (it may be constructed exactly the same way as for the Random Walker segmentation). Imagine that a unit amount of heat is applied at some vertex. The heat will flow across the edges and the speed of its diffusion will depend on the edge weights. After time t has passed, the initial unit of heat will be somehow distributed among the other vertices. The Heat Kernel (H_t) encodes this distribution. More specifically, H_t(i, j) is the amount of heat accumulated after time t at vertex j if the heat was applied at vertex i. Based on this kernel the diffusion distance between each pair of points is defined. Importantly, the distance depends on the time parameter and captures either local or global shape properties.
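The heat flow described above can be simulated directly on a tiny graph. The following sketch approximates the heat kernel by repeated small explicit time steps instead of a matrix exponential or an eigendecomposition. It also assumes the common definition of diffusion distance as the Euclidean distance between rows of H_t. The graph, step count, and function names are illustrative.

```python
def heat_kernel(weights, t, steps=1000):
    """Approximate H_t = exp(-t L), L = D - W, by `steps` explicit Euler steps."""
    n = len(weights)
    dt = t / steps
    # One small step of the heat equation: P = I - dt * L.
    step = [[1.0 - dt * sum(weights[i]) if i == j else dt * weights[i][j]
             for j in range(n)] for i in range(n)]
    kernel = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        kernel = [[sum(kernel[i][k] * step[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return kernel

def diffusion_distance(kernel, x, y):
    """Euclidean distance between the heat distributions started at x and at y."""
    return sum((a - b) ** 2 for a, b in zip(kernel[x], kernel[y])) ** 0.5

# Path graph 0-1-2-3-4 with unit edge weights (symmetric, zero diagonal).
path = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
H = heat_kernel(path, t=1.0)
print([round(h, 3) for h in H[0]])  # heat left at each vertex after starting at vertex 0
print(diffusion_distance(H, 0, 1), diffusion_distance(H, 0, 4))
```

Each row of H sums to one because heat is conserved, and increasing t spreads the heat further, which is how the time parameter trades local for global shape properties.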

Another distance measure, commute-time distance, is the average time it takes a random walker to go from one vertex to the other and come back. Finally, biharmonic distance was proposed most recently in:

This distance measure is non-parametric (does not depend on e.g. time) and is claimed to capture both local and global properties of the shape.

The figure below demonstrates biharmonic, commute-time, and heat diffusion distance maps computed with respect to the point marked with a red circle:

_images/test56-crop-voxels.png _images/test56-crop-hd01.png
_images/test56-crop-bd.png _images/test56-crop-hd05.png
_images/test56-crop-ctd.png _images/test56-crop-hd10.png
Left column: voxelized point cloud (top), biharmonic distance (middle), commute-time distance (bottom).
Right column: heat diffusion distance for several choices of the time parameter.

In each image the points with smallest distance are painted in dark blue, and the points with largest distances are dark red. The absolute values of distances are very different in all cases.

I think these distance maps could be used to infer the number of distinct objects in the scene. Indeed, the points that belong to the same object tend to be equidistant from the source point, so different objects correspond to different blobs of homogeneous points. Finding objects is thus the same as finding modes of the distribution of distances, which could be accomplished with the Mean-Shift algorithm.

As for the particular choice of distance measure, biharmonic distance and heat diffusion distance with a large time parameter intuitively seem to be better than the others, but this needs more careful examination.

ROPS Progress
Thursday, September 12, 2013

Hello everybody. I have finished implementing the ROPS feature. The next step is to write to the authors and ask them for some sample data and precomputed features, so that I can compare the results. After that I am planning to test the ROPS feature for object recognition. For this purpose I am going to use the Implicit Shape Model algorithm from PCL.

Performance analysis
Friday, September 06, 2013

This part of the project is purely analytical: I compare the compression rate and speed of several compression methods.

Below are compression rates on the test dataset. Tests were run on a personal laptop powered by an i7 CPU M 620 @ 2.67GHz. To be fair, I only compare lossless compression rates.

File          ASCII   binary   LZF    LZF + JP2K
indoor.ptx    480M    210M     169M   134M
outdoor.ptx   212M    124M     57M    X

For PTX files with RGB data the joint LZF + JP2K compression is the most efficient.

The image below shows the file sizes for each encoding.


The main issue, though, is that the JP2K compression is not fast: it takes almost 12s on my laptop for the indoor.ptx dataset, but I believe this is acceptable given the gain in file size. I also measured the time taken by the image_to_j2k command line tool to convert the ASCII PGM image generated by copying the RGB data into a J2K image, and it is roughly the same. This indicates that it is an intrinsic OpenJPEG issue.

The table below lists ASCII reading times followed by writing times for each encoding on the given dataset. The times indicated are averages over 10 runs, expressed in ms.

Tests were run by invoking command line tool leica_ascii2binary each time with the appropriate flag:

  • 0 binary conversion;
  • 1 binary LZF compression;
  • 2 binary LZF + JP2K compression.
File          Nb of points   ASCII reading (ms)   binary (ms)   LZF (ms)   LZF + JP2K (ms)
indoor.ptx    10997760       293994.1             1371.2        7123.7     19347.5
outdoor.ptx   8062080        134010.5             726.1         2765.3     X
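As a quick worked example, the indoor.ptx figures from the two tables in this post can be turned into a compression ratio and per-point timings. The dictionary keys are just illustrative names.

```python
# Values copied from the indoor.ptx rows of the size and timing tables above.
indoor = {
    "points": 10997760,
    "ascii_mb": 480,
    "lzf_jp2k_mb": 134,
    "ascii_read_ms": 293994.1,
    "lzf_jp2k_write_ms": 19347.5,
}

compression_ratio = indoor["lzf_jp2k_mb"] / indoor["ascii_mb"]
read_ms_per_point = indoor["ascii_read_ms"] / indoor["points"]
write_ms_per_point = indoor["lzf_jp2k_write_ms"] / indoor["points"]

print(round(compression_ratio, 2))   # compressed size relative to ASCII
print(round(read_ms_per_point, 4))   # ASCII reading, ms per point
print(round(write_ms_per_point, 4))  # LZF + JP2K writing, ms per point
```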

Encoding times are plotted in the graph below for better visualization.

Fixing Bugs in Segmentation using Random Walks
Sunday, September 01, 2013

This is a follow-up post to the segmentation using random walks. It turned out that my initial implementation had several bugs which significantly worsened the performance. In this blog post I will describe them and show the outputs produced by the fixed algorithm.

The segmentation results that I demonstrated last time exhibit two (undesirable) features: vast regions with random label assignment and “zones of influence” around seed points. My first intuitive explanation was that since the edge weights are always less than 1, a random walker cannot get arbitrarily far from its starting point, therefore if a seed is reasonably far, the probability of getting there is very small. Surprisingly, I did not find any mention or discussion of this effect in the literature. Moreover, while carefully re-reading “Random Walks for Image Segmentation” I noticed that the algorithm outputs for each point and label pair not the probability that a random walker started from that point will reach the label, but rather the probability that it will first reach that label (i.e. earlier than other labels).

I decided to visualize edge weights and the probabilities I get with my implementation. In the following experiment I labeled three points, one on the table, and the other two on the boxes (see the left figure):

_images/test1-pointcloud-seeds.png _images/test1-edge-weights.png _images/test1-potentials-1.png
Left: color point cloud with three labeled points. Middle: graph edges colored by weight. Right: probabilities that a random walker will first reach the top right label.

The figure in the middle shows all edges along which random walkers move. Each edge is visualized by its middle point colored according to edge weight (using “jet” color map where 0 is dark blue and 1 is dark red). The weights do make sense, as the edges in the planar regions are mostly reddish (i.e. weight is close to 1), whereas on the surface boundaries they are bluish (i.e. weight is close to 0). Moreover, it is clear that concave and convex boundaries have different average weights.

The figure on the right shows all points colored according to the probability that a random walker started from that point will first reach the labeled point on the box in the back. The probability images for the other labels are similar, thus for the majority of points the probability of first reaching any of the labels is close to zero. Clearly, this cannot be right, because some label has to be reached first anyway, and therefore the probabilities at each point should sum up to one. This observation triggered a long search for bugs in my implementation, and in the end I discovered and fixed two issues.

As I mentioned before, the solution for random walker segmentation is obtained by solving a system of linear equations, where coefficients matrix consists of edge weights and their sums, arranged in a certain order. The system is huge (there are as many equations as there are unlabeled points), but sparse (the number of non-zero coefficients in a row depends on the number of edges incident to a point). The first issue was due to a bug (or a feature?) of OctreePointCloudAdjacency class that I used to determine neighborhoods of points. In a nutshell, it is a specialized octree, where leaves store pointers to their direct neighbors (in terms of 26-connectivity). For some reason, a leaf always stores a pointer to itself in the list of neighbors. This caused bogus self-edges in my graph which themselves did not show up in the matrix (because diagonal elements are occupied by sums of weights of incident edges), however treacherously contributed to those sums, thus invalidating them.
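The effect of such a self-edge can be reproduced on a toy graph. The sketch below uses a 5-node path graph with unit weights and seeds at both ends. It solves for the sum of label probabilities at each node with Gauss-Seidel sweeps (a stand-in for the sparse solver), optionally adding a bogus self-edge weight to the diagonal only. All names and parameters are illustrative.

```python
def label_probability_sums(num_nodes, seeds, bogus_self_weight, sweeps=200):
    """Sum over labels of first-arrival probabilities on a unit-weight path graph."""
    # Every seed has total probability 1, so the sum over labels solves the
    # same linear system with all seed values set to 1.
    total = [1.0 if n in seeds else 0.0 for n in range(num_nodes)]
    for _ in range(sweeps):
        for n in range(num_nodes):
            if n in seeds:
                continue
            neighbors = [m for m in (n - 1, n + 1) if 0 <= m < num_nodes]
            # The bug: the self-edge inflates the diagonal (sum of weights)
            # but has no matching off-diagonal entry.
            diagonal = len(neighbors) + bogus_self_weight
            total[n] = sum(total[m] for m in neighbors) / diagonal
    return total

correct = label_probability_sums(5, seeds={0, 4}, bogus_self_weight=0.0)
buggy = label_probability_sums(5, seeds={0, 4}, bogus_self_weight=1.0)
print(correct)
print(buggy)
```

With the inflated diagonal the probabilities no longer sum to one and decay quickly away from the seeds, which matches the near-zero probability maps described above.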

The second issue was more subtle. For the system to have a solution it has to be non-singular. In terms of the graph it means that it either has to be connected, or should contain at least one seed in every connected component. From the beginning I had a check to enforce this requirement, however I did not take into account that the weighting function may assign zero weight to an edge! In the pathological case all the edges incident to a point may happen to be “weightless”, thus resulting in an all-zero row in the system.
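A check that accounts for weightless edges can be sketched as a breadth-first search from the seeds that only follows edges with positive weight. Any node left unvisited would produce a singular system. The data layout (an edge dictionary keyed by node pairs) is an assumption made for this example.

```python
from collections import deque

def unreachable_from_seeds(num_nodes, edges, seeds):
    """Return nodes not connected to any seed through positive-weight edges."""
    adjacency = {n: [] for n in range(num_nodes)}
    for (i, j), weight in edges.items():
        if weight > 0.0:  # zero-weight edges do not connect anything
            adjacency[i].append(j)
            adjacency[j].append(i)
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return sorted(set(range(num_nodes)) - visited)

# Node 2 has edges, but both of them are weightless.
edges = {(0, 1): 1.0, (1, 2): 0.0, (2, 3): 0.0, (3, 4): 1.0}
isolated = unreachable_from_seeds(5, edges, seeds=[0, 4])
print(isolated)
```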

Below are the results obtained using the random walker algorithm after I fixed the described issues:

_images/test55-rwc-fixed-manual.png _images/random-walker-potentials.gif
Left: random walk segmentation with overlaid seed points. Right: probabilities that a random walker will first reach each of the labels.

As before, I labeled a single point in each object. (Actually, a single point in each disjoint part of each object. That is why there are multiple labeled points on the table.) The scene is perfectly segmented. On the right is an animated GIF with a sequence of frames which show probabilities (potentials) for each label. In most cases the separation between object and background is very sharp, but in two cases (the small box on top of the book and the standing box on the right) some non-zero probabilities “spill” outside the object. Nevertheless, this does not affect the final segmentation since the potentials for the correct labels are higher.

I am very happy with the results, however one should remember that they were obtained using manual selection of labels. This code sprint is aimed at an automatic segmentation, so next I plan to consider different strategies of turning this into a fully autonomous method.

Improved compression and writing speed
Thursday, August 29, 2013

I implemented two compression methods:

  • LZF: mainly a rewrite of the PCDWriter::writeBinaryCompressed method;
  • JP2K + LZF: uses LZF to compress XYZ and intensity information, while JP2K is used to compress RGB data.

This choice is motivated by the need for a basis of comparison and also by the fact that RGB data won’t be compressed efficiently by LZF since it is a dictionary-based algorithm.

As for JP2K, it is an improvement on JPEG: a wavelet-based compression algorithm which claims higher compression rates with almost no data loss.

The implementation I am using is the one provided by OpenJPEG. As version 1.3 seems to be the most common I picked it to run the tests.

I spent the past few weeks trying to improve the data read/write speed by using Leica-centric point types, which led to better results.

In the coming weeks I will mostly be running tests and trying to improve compression performance.

For now the lossless compression ratio is 0.27 using LZF + JP2K, ASCII data reading takes 0.021 ms/point, and LZF + JP2K data writing takes 0.001 ms/point.

Segmentation using Random Walks
Wednesday, August 28, 2013

In the last few weeks I have been working on two things in parallel. On one hand, I continued to improve supervoxel segmentation, and a description of this is due in a later blog post. On the other hand, I started to look into an alternative approach to point cloud segmentation which uses random walkers. In this blog post I will discuss this approach and show my initial results.

The random walker algorithm was proposed for interactive image segmentation in:

The input is an image where several pixels are marked (by the user) with different labels (one for each object to be segmented out) and the output is a label assignment for all the remaining pixels. The idea behind the algorithm is rather intuitive. Think of a graph where the nodes represent image pixels. The neighboring pixels are connected with edges. Each edge is assigned a weight which reflects the degree of similarity between pixels. As it was mentioned before, some of the nodes are marked with labels. Now take an unlabeled node and imagine that a random walker is released from it. The walker randomly hops to one of the adjacent nodes with probability proportional to the edge weight. In the process of this random walk it will occasionally visit labeled nodes. The one that is most likely to be visited first determines the label of the node where the walker started.

Luckily, one does not have to simulate random walks from each node to obtain the probabilities of arriving at labeled nodes. The probabilities may be calculated analytically by solving a system of linear equations. The matrix of coefficients consists of edge weights and their sums, arranged in a certain order. The author did a great job explaining why this is so and how exactly the system should be constructed, so those who are interested are referred to the original paper.
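To make the linear system concrete, here is a minimal sketch on a five-node path graph. Instead of assembling and solving a sparse matrix it uses Gauss-Seidel sweeps, which enforce the same equations: the probability at an unlabeled node is the weighted average of its neighbors' probabilities. The graph, weights, and function name are made up for illustration.

```python
def first_arrival_probabilities(weights, seeds, sweeps=200):
    """weights: {(i, j): w} undirected edges; seeds: {node: label}.
    Returns {node: {label: probability of reaching that label first}}."""
    nodes = sorted({n for edge in weights for n in edge})
    labels = sorted(set(seeds.values()))
    neighbors = {n: [] for n in nodes}
    for (i, j), w in weights.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    prob = {n: {l: 1.0 if seeds.get(n) == l else 0.0 for l in labels} for n in nodes}
    for _ in range(sweeps):
        for n in nodes:
            if n in seeds:
                continue  # seed probabilities are fixed boundary conditions
            total = sum(w for _, w in neighbors[n])
            for l in labels:
                prob[n][l] = sum(w * prob[m][l] for m, w in neighbors[n]) / total
    return prob

# Path 0-1-2-3-4 with a weak (dissimilar) edge between nodes 2 and 3.
weights = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.1, (3, 4): 1.0}
prob = first_arrival_probabilities(weights, seeds={0: "A", 4: "B"})
segmentation = {n: max(p, key=p.get) for n, p in prob.items()}
print(prob[2], segmentation)
```

Nodes 1 and 2 end up with label A and node 3 with label B, so the segment boundary falls on the weak edge, as intended.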

Originally, the algorithm was applied to segment 2D images, however it could be used for any other data as long as there is a way to model it using a weighted graph. For us, of course, 3D point clouds are of a particular interest. The following paper describes how random walker segmentation could be applied for meshes or point clouds:

The authors construct the graph from a point cloud in the following way. Each point p_i in the cloud becomes a node v_i in the graph and the normal vector n_i is computed for it.

For a pair of nodes v_i and v_j two distances are defined:

  • Euclidean distance between points d_1(v_i,v_j) = ||p_i-p_j||^2
  • Angular distance between normals d_2(v_i,v_j) = \frac{\eta}{2}||n_i-n_j||^2

In the angular distance \eta is a coefficient which depends on the relative concavity between points. For the convex case it is set to 0.2, effectively discounting the difference, whereas for the concave case it is equal to 1.0.

Using K-nearest neighbors search the neighborhood N(v_i) of a node is established. An edge is created between the node and each other node in the neighborhood. The weight of the edge depends on the two distances and is defined as:

w_{ij} = \exp\left\{-\frac{d_1(v_i,v_j)}{\sigma_1\bar{d_1}}\right\}\cdot\exp\left\{-\frac{d_2(v_i,v_j)}{\sigma_2\bar{d_2}}\right\}

\sigma_1 and \sigma_2 are used to balance the contributions of different distances. \bar{d_1} and \bar{d_2} in the denominators stand for the mean values of the distances over the whole point cloud.
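The weighting function can be written down directly from the definitions above. In this sketch the convexity test is not implemented and is passed in as a flag, and the default values of sigma_1 and sigma_2 are placeholders rather than values taken from the paper or from my implementation.

```python
import math

def edge_weight(p_i, p_j, n_i, n_j, convex, mean_d1, mean_d2, sigma1=1.0, sigma2=1.0):
    """Edge weight w_ij from squared Euclidean and angular distances."""
    d1 = sum((a - b) ** 2 for a, b in zip(p_i, p_j))
    eta = 0.2 if convex else 1.0  # convex normal differences are discounted
    d2 = 0.5 * eta * sum((a - b) ** 2 for a, b in zip(n_i, n_j))
    return math.exp(-d1 / (sigma1 * mean_d1)) * math.exp(-d2 / (sigma2 * mean_d2))

# Two neighboring voxels 6 mm apart whose normals differ by 90 degrees.
args = ((0.0, 0.0, 0.0), (0.006, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
w_convex = edge_weight(*args, convex=True, mean_d1=3.6e-5, mean_d2=0.1)
w_concave = edge_weight(*args, convex=False, mean_d1=3.6e-5, mean_d2=0.1)
print(w_convex, w_concave)
```

For the same pair of points and normals, the concave case yields a much smaller weight, so random walkers are less likely to cross concave boundaries.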

In my implementation I decided to voxelize the point cloud as is done in SupervoxelSegmentation. I also reused the OctreePointCloudAdjacency class contributed by Jeremie Papon to establish the neighborhood relations between voxels. The weight is computed exactly as proposed by Lai et al. Finally, I use the Sparse module of Eigen to solve the linear system.

I mentioned before, but did not stress, that this segmentation approach is semi-automatic. This means that the user has to provide a set of labeled points. My current implementation either accepts user input or generates labeled points uniformly using the same approach as SupervoxelSegmentation does for seed generation. There are smarter ways of producing an initial labeling and I plan to consider this issue later.

Here are some initial results that I got using the random walk segmentation algorithm. The big red dots show the locations of labeled points:

_images/test55-voxels.png _images/test55-rwc-few-manual.png
Top left: voxelized point cloud (voxel size 0.006 m). Top right: random walk segmentation (with one manually labeled point per object).
_images/test55-rwc-many-manual.png _images/test55-rwc-uniform.png
Bottom left: random walk segmentation (with multiple manually labeled points in each object). Bottom right: random walk segmentation (labeled points are uniformly selected).

In the first case (top right) I manually labeled each object in the scene with one point. Many of the object boundaries are correctly labeled, however there are vast regions with totally random labels. This is especially evident in the upper right corner, where there is just one seed. A circular region around it is correctly labeled with a single color, however the points outside of it are all randomly colored. According to my understanding, this happens because the edge weights are always less than 1, so a random walker cannot get arbitrarily far from the starting point. Thus if there are no labeled points in the vicinity of a point, then all the computed probabilities are nearly zero and label assignment happens randomly (because of numerical imprecision).

In the second case (bottom left) I manually labeled each object in the scene with multiple points guided by an intuition about how large the “zones of influence” are around each labeled point. The resulting segmentation is rather good.

In the third case (bottom right) the labeled points were selected uniformly from a voxel grid with size 10 cm. As a result I got an over-segmentation that resembles the ones produced by SupervoxelSegmentation.

I think the initial results are rather promising. In the future I plan to work on the weighting function as it seems to be the key component for the random walk segmentation. I would like to understand if and how it is possible to vary the size of the “zone of influence” around labeled points.

Labeling of the Castro dataset is finished
Tuesday, August 27, 2013

We are pleased to report that labeling of the Castro dataset (6440 frames) is finished. Here are some examples of labeled images:


We also tested an algorithm developed by Alex Trevor in the previous HRI code sprint. This algorithm segments points using their normal distribution, which makes it very sensitive to noise.

By default, this algorithm computes the disparity for a stereo pair using its own dense matching method, implemented by Federico Tombari. I additionally tested it using disparity maps precomputed by HRI. Here are the typical results (left: disparity computed with Federico Tombari’s method, right: disparity precomputed by HRI):


You can see that Federico Tombari’s method is friendlier to the normal-based algorithm. However, it is not good enough for describing a scene: there are a lot of false positives.

The HRI disparity maps contain some noise, and many pixels have no valid disparity. Sometimes there are no segments that resemble a road, and in many frames the road was not found at all.

This algorithm has thresholds for disparity and does not mark as “road” any point that does not satisfy them. I did not take this into account because it would make these results not completely correct. Therefore, 50% recall would be a very good result.

The goal is to find all pixels that belong to the road. The overall results are shown in the image below (precision is the ratio of correctly detected road points to all detected pixels, recall is the percentage of road points that were detected):
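The two measures can be computed per frame as follows. This is a minimal sketch that assumes both the detection and the ground truth are given as sets of pixel coordinates labeled as road; the function name is illustrative.

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of road detection from two sets of pixel coordinates."""
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

detected = {(0, 0), (0, 1), (1, 1), (2, 2)}
ground_truth = {(0, 0), (0, 1), (1, 0), (1, 1), (3, 3), (4, 4)}
precision, recall = precision_recall(detected, ground_truth)
print(precision, recall)
```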

Under-segmentation Error
Monday, August 19, 2013

I concluded the previous blog post with a claim that the improved refinement procedure yields a better segmentation. That conclusion was based purely on visual inspection and rang a bell, reminding me that it is time to start thinking about quantitative evaluation metrics and to prepare a tool set to perform evaluations. In this blog post I will describe my first steps in this direction.

Let us start with a prerequisite for any evaluation, ground truth data. So far I have been using scenes from Object Segmentation Database (OSD). This dataset contains a ground truth annotation for each point cloud, however I am not completely satisfied with its quality. Consider the following point cloud (left) and the provided annotation (middle):

_images/test1-pointcloud.png _images/test1-osd-annotation.png _images/test1-my-annotation.png
Left: color point cloud. Middle: original ground truth annotation. Right: improved ground truth annotation.

Some of the points in the close vicinity of the object boundaries are wrongly annotated. For example, almost all points on the left edge of the standing box are annotated as belonging to the table. I hypothesize that the annotation was obtained automatically using some color image segmentation algorithm. (Thus it is not really “wrong”; it is correct in terms of the color data.) As we are working with 3D point clouds, the spatial relations derived from the depth data should have larger weight than color, and the annotation should be revised to respect the geometrical boundaries of the objects. Unfortunately, there is no tool in PCL that would allow one to load a point cloud and edit the labels. It cost me quite a few hours of work to put together a simple editor based on PCLVisualizer that could do that. An example of a refined annotation is shown in the figure above (right).

The authors of SupervoxelSegmentation used two metrics to evaluate how nicely the supervoxels adhere to the object boundaries: under-segmentation error and boundary recall. These metrics were introduced in:

I decided to begin with the under-segmentation error. Formally, given a ground truth segmentation into regions g_1,\dots,g_M and a supervoxel segmentation into supervoxels s_1,\dots,s_K, the under-segmentation error for region g_i is defined as:

E_{i} = \frac{\left[\sum_{\left\{s_j | s_j \bigcap g_i \neq\emptyset\right\}}Size(s_j)\right] - Size(g_i)}{Size(g_i)}

Simply put, it takes a union of all the supervoxels that overlap with a given ground truth segment and measures how much larger its total size is than the size of the segment. The authors of SupervoxelSegmentation use a slightly modified version which summarizes the overall error in a single number:

E = \frac{1}{N}\left[\sum^{M}_{i=1}\left(\sum_{\left\{s_j | s_j \bigcap g_i \neq\emptyset\right\}}Size(s_j)\right) - N\right],

where N is the total number of voxels in the scene. In practice, this error sums up the sizes of all the supervoxels that cross the ground truth boundaries.
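The overall error E can be computed from two per-voxel label arrays. The sketch below assumes that Size counts voxels and that both segmentations are given as flat lists of labels, one entry per voxel; the function name is made up.

```python
from collections import Counter

def undersegmentation_error(ground_truth, supervoxels):
    """Overall under-segmentation error E for per-voxel label lists."""
    n = len(ground_truth)
    size = Counter(supervoxels)
    # Distinct (region, supervoxel) pairs that share at least one voxel.
    overlaps = set(zip(ground_truth, supervoxels))
    total = sum(size[s] for _, s in overlaps)
    return (total - n) / n

ground_truth = [0, 0, 0, 0, 1, 1, 1, 1]
supervoxels = ["a", "a", "a", "b", "b", "c", "c", "c"]
error = undersegmentation_error(ground_truth, supervoxels)
print(error)
```

Supervoxel "b" overlaps both regions, so its two voxels are counted twice, giving E = (10 - 8) / 8.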

In my opinion, this definition biases error with the average supervoxel size. Consider the supervoxel segmentation (left):

_images/test1-010-supervoxels.png _images/test1-010-2overlaps.png _images/test1-010-my-error.png
Left: supervoxel segmentation (seed size 0.1 m). Middle: supervoxels that cross ground truth boundaries (red). Right: erroneous voxels according to the proposed error definition (red).

Clearly, the segmentation is rather good in terms of border adherence. The only failure is in the right corner of the lying box, where the green supervoxel “bleeds” onto the table. The middle image shows which voxels will be counted towards the error, i.e. those supervoxels that cross the ground truth boundaries. In fact, every supervoxel in the close vicinity of an edge is counted. The reason is simple: the edge between two surfaces in a typical point cloud obtained with an RGB-D camera is ambiguous. The ground truth segmentation is therefore random to some extent, and the chance that the supervoxel boundary will exactly coincide with the ground truth segment boundary is close to zero. Consequently, the error always includes all the supervoxels around the boundary, and the larger they are on average, the larger the error is. The “real” error (the “bled” green supervoxel) is completely obscured by it.

I decided to use a modified definition. Whenever a supervoxel crosses ground truth boundaries, it is split into two (or more) parts. I assume that the largest part is correctly segmented, and only count the smaller parts as erroneous. The figure above (right) demonstrates which voxels would be counted towards the error in this case. Still, there is some “noise” around boundaries, but it does not obscure the “real” error. I also do not like that the error in the original definition is normalized by the total point cloud size. I think that the normalization should be related to the number of objects or the amount (length) of object boundaries. I will consider these options in the future, but for now I just count the number of erroneous voxels and do not divide it by anything.
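The modified definition is easy to express in code. As a sketch under the same assumptions as before (flat per-voxel label lists, made-up function name): for each supervoxel, the largest part lying within a single ground truth region is considered correct and everything else is counted as erroneous.

```python
from collections import Counter

def modified_undersegmentation_error(ground_truth, supervoxels):
    """Number of voxels outside the largest ground truth part of their supervoxel."""
    parts = Counter(zip(supervoxels, ground_truth))
    largest_part = {}
    for (supervoxel, _), count in parts.items():
        largest_part[supervoxel] = max(largest_part.get(supervoxel, 0), count)
    return len(ground_truth) - sum(largest_part.values())

ground_truth = [0, 0, 0, 0, 1, 1, 1, 1]
supervoxels = ["a", "a", "a", "b", "b", "c", "c", "c"]
erroneous = modified_undersegmentation_error(ground_truth, supervoxels)
print(erroneous)
```

Here only one of the two voxels of supervoxel "b" is counted, instead of the whole supervoxel being counted once per overlapping region.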

I would like to conclude with the results of evaluating simple and improved supervoxel refinement procedures using the described error metric. The figure below shows the voxels considered as erroneous (in red) in supervoxel segmentations obtained with 2 rounds of simple refinement (left) and 2 rounds of improved refinement (right):

_images/test56-use-simpleref.png _images/test56-use-splitref.png
Left: under-segmentation error (seed size 0.1 m, with simple refinement). Right: under-segmentation error (seed size 0.1 m, with improved refinement).

The under-segmentation error without refinement is 15033. Simple refinement reduces it to 11099 after the first iteration and 10755 after the second. Improved refinement reduces it to 9032 and 8373 respectively. I think the numbers do agree with the intuition, so I will continue using this metric in future.

Simple Features
Monday, August 19, 2013

Hello everybody. I finally finished the code for simple features. I’ve implemented a pcl::MomentOfInertiaEstimation class which computes descriptors based on eccentricity and moment of inertia. This class can also extract axis-aligned and oriented bounding boxes of the cloud. But keep in mind that the extracted OBB is not the minimal possible bounding box.

The idea of the feature extraction method is as follows. First of all, the covariance matrix of the point cloud is calculated and its eigenvalues and eigenvectors are extracted. You can assume that the resulting eigenvectors are normalized and always form a right-handed coordinate system (the major eigenvector represents the X-axis and the minor eigenvector represents the Z-axis). In the next step an iterative process takes place. On each iteration the major eigenvector is rotated. The rotation order is always the same and is performed around the other eigenvectors, which provides invariance to rotation of the point cloud. Henceforth, we will refer to this rotated major vector as the current axis.


For every current axis the moment of inertia is calculated. Moreover, the current axis is also used for the eccentricity calculation. For this purpose the current axis is treated as the normal vector of a plane and the input cloud is projected onto that plane. After that, the eccentricity is calculated for the obtained projection.
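For illustration, the moment of inertia of a cloud about an axis can be sketched with the textbook definition, treating every point as a unit mass and passing the axis through the centroid. This is not the code of the class itself, and the function name is made up.

```python
def moment_of_inertia(points, axis):
    """Moment of inertia of unit point masses about `axis` through the centroid."""
    n = len(points)
    centroid = [sum(p[d] for p in points) / n for d in range(3)]
    norm = sum(a * a for a in axis) ** 0.5
    unit = [a / norm for a in axis]
    total = 0.0
    for p in points:
        r = [p[d] - centroid[d] for d in range(3)]
        along = sum(r[d] * unit[d] for d in range(3))
        # Squared distance to the axis = |r|^2 minus the squared component along it.
        total += sum(c * c for c in r) - along * along
    return total

cloud = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(moment_of_inertia(cloud, (1.0, 0.0, 0.0)))  # points lie on the axis
print(moment_of_inertia(cloud, (0.0, 0.0, 1.0)))  # points are 1 unit from the axis
```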

_images/projected_cloud.png _images/moment_of_inertia.png
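The two per-axis operations described above can be sketched in a few lines. This is only an illustration, not the code of pcl::MomentOfInertiaEstimation: it assumes unit mass per point and an axis passing through the cloud centroid, and it stops at the projection step (the eccentricity formula itself is not given in the post, so it is not reproduced here). All function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def moment_of_inertia(points, axis):
    """Sum of squared point-to-axis distances (assumption: unit masses,
    axis through the centroid)."""
    c, u = centroid(points), _unit(axis)
    total = 0.0
    for p in points:
        d = _sub(p, c)
        total += _dot(d, d) - _dot(d, u) ** 2   # squared distance to the axis
    return total


def project_onto_plane(points, normal):
    """Project points onto the plane through the centroid whose normal
    is the current axis."""
    c, n = centroid(points), _unit(normal)
    projected = []
    for p in points:
        offset = _dot(_sub(p, c), n)            # signed distance to the plane
        projected.append(tuple(p[k] - offset * n[k] for k in range(3)))
    return projected
```

For two points at (-1, 0, 0) and (1, 0, 0), the moment about the X-axis is zero (both points lie on the axis), while the moment about the Z-axis is 2.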

The implemented class also provides methods for getting the AABB and OBB. The oriented bounding box is computed as the AABB along the eigenvectors.
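"AABB along the eigenvectors" can be read as: express every point in the frame spanned by the eigenvectors, then take the per-coordinate minimum and maximum. The sketch below follows that reading. It assumes the three eigenvectors are already available as an orthonormal triple (the hypothetical `axes` argument) and does not compute them.

```python
def aabb(points):
    """Axis-aligned bounding box as (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def obb_extents(points, center, axes):
    """Bounding box of the points in the local frame (center, axes).

    `axes` is assumed to hold three orthonormal vectors, e.g. the
    eigenvectors of the covariance matrix computed elsewhere.
    """
    local = []
    for p in points:
        d = tuple(p[k] - center[k] for k in range(3))
        # Coordinates of the point along each of the three axes.
        local.append(tuple(sum(d[k] * axis[k] for k in range(3)) for axis in axes))
    return aabb(local)
```

Because the box is tied to the eigenvector directions, it is tight only along those directions, which is consistent with the remark that the OBB is not the minimal possible bounding box.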

Supervoxel Refinement with Splitting
Monday, August 12, 2013

Last time I mentioned that the simple refinement procedure does not always lead to good results. In this blog post I will discuss some of the problems and show an improved refinement procedure that addresses them.

Here is another scene with a cluttered table from the Object Segmentation Database (OSD):

_images/test56.png _images/voxels1.png
Color image of the scene
Fragment of the voxelized point cloud (voxel size 6 mm)

I will concentrate on the stack with two books and two boxes in the foreground. The figures below demonstrate the results of supervoxel segmentation without and with one round of simple refinement:

_images/supervoxels-no-refine.png _images/supervoxels-1-refine.png
Supervoxel segmentation (seed size 0.1 m) without (left) and
with (right) refinement. Supervoxel centroids are overlaid.
Some supervoxels are enumerated on the left image.

There are many deficiencies in the segmentation output if no refinement is used. #3 is split into two large parts and #8 has a small disjoint region. #2, #5, and #9 each cover two distinct surfaces. The simple refinement improves segmentation in some respects (#1 and #4 became better, #8 no longer has a disjoint part); however, it fails to “join” #3 and, what’s more, splits #5 and #7. Performing more rounds of the simple refinement does not help: e.g. after 5 iterations all three supervoxels remain disjoint.

Obviously, the simple refinement can only help if after segmentation the supervoxel centroid moved to a new spot which is better than the original seed. Unfortunately for supervoxel #3 the centroid ends up somewhere in between the two parts, therefore the seed does not move significantly, and the same result is reproduced over and over again.

The simple refinement consists of creating new seeds from the centroids of supervoxels and running the segmentation algorithm again. I decided to improve it by splitting disjoint supervoxels prior to re-seeding. In each supervoxel I compute connected components (in terms of Euclidean distance). Each connected component that is not too small (has at least 15 points) defines a new seed voxel.
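The splitting step described above can be sketched as follows. This is not the code used for the experiments: the distance threshold `max_dist` is a hypothetical parameter, and taking the centroid of each component as the new seed position is an assumption (the post only says that each sufficiently large component defines a new seed voxel). The minimum size of 15 points is taken from the text.

```python
import math
from collections import deque


def connected_components(points, max_dist):
    """Group point indices so that points closer than max_dist are linked."""
    unvisited = set(range(len(points)))
    components = []
    while unvisited:
        start = unvisited.pop()
        queue, component = deque([start]), [start]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_dist]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                component.append(j)
        components.append(component)
    return components


def split_into_seeds(supervoxel_points, max_dist, min_points=15):
    """Return one seed per connected component with at least min_points."""
    seeds = []
    for component in connected_components(supervoxel_points, max_dist):
        if len(component) < min_points:
            continue                      # too small: contributes no seed
        n = len(component)
        # Assumption: the new seed is placed at the component centroid.
        seeds.append(tuple(sum(supervoxel_points[i][k] for i in component) / n
                           for k in range(3)))
    return seeds
```

A supervoxel that is already connected yields a single seed, so for such supervoxels this reduces to the simple refinement. The brute-force neighbour search is quadratic; a real implementation would use a spatial index.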

Below is an animated GIF where frames show supervoxels after each round of the improved refinement procedure. The number in the upper right corner gives the round number (0 means before refinement):

5 rounds of supervoxel refinement with splitting

In the first iteration the yellow supervoxel is successfully split into two parts. As mentioned before, this refinement iteration degrades segmentation by making two other supervoxels disjoint. But this is fixed in the next iteration of refinement when they are also split. In the last two iterations the number of supervoxels does not change anymore, they only re-shape slightly.

The obtained supervoxel segmentation is far from perfect. Nevertheless, it was significantly improved by the updated refinement procedure.

Simple Supervoxel Refinement
Friday, August 09, 2013

In this blog post I will discuss one of the problems with supervoxel segmentation and a simple refinement procedure that resolves it.

Let us consider exactly the same part of the scene as in the previous blog post. For reference here are (again) the voxelized point cloud and object segmentation that I am getting:

_images/voxels.png _images/test55-010-clusters1.png
Voxelized point cloud (voxel size 0.006 m)
Object segmentation (seed size 0.1 m) with overlaid edge graph

Below is the output of supervoxel segmentation. It is exactly the same as in the previous post, though the colors are different (because they are chosen randomly on each execution). Additionally, the seeds from which the supervoxels evolved and their centroids are visualized:

Supervoxel segmentation (seed size 0.1 m) with overlaid
supervoxel seeds (red), supervoxel centroids (blue), and
edge graph

Last time I focused attention on the green and pink (formerly cyan and blue) supervoxels in the central part of the image. The problem I mentioned is that the pink supervoxel split the green one into two disjoint parts. Another problem I did not talk about is that both supervoxels extend over the edge between the book spine and cover. This is undesirable because spine and cover are two distinct surfaces with different geometrical properties.

According to my understanding, the cause of this undesired segmentation is unlucky initial seeds. Both problematic supervoxels evolved from seeds which are on (or almost on) the edge between spine and cover. Right from the beginning both supervoxels started to absorb voxels from both surfaces. Only towards the end the pink supervoxel started to converge to the cover surface and the green one to the spine surface. But since the number of iterations is limited, the supervoxels never reached convergence. (Please see here how the supervoxels evolved.)

The described problems have a heavy impact on the final object segmentation. Firstly, the green supervoxel is considered to be adjacent to the dark green supervoxel in the center because of the tiny disjoint part we discussed last time. The orientations of these two supervoxels are similar (they both are vertical surfaces), therefore an edge is established and they are merged into the same object. (Unfortunately it is impossible to see in the image, but actually there is no edge from the dark green supervoxel to the pink one, but rather an edge from it to the green one, and then from the green one to the pink supervoxel.) Secondly, there is an edge between the pink supervoxel and the brown one below it because they are adjacent and have similar orientations. Were the supervoxel segmentation better, they would not touch each other and would not be connected. This explains why in the final object segmentation the table surface is merged with the book and the box on top of it.

I see two ways to address this problem. The first is to come up with a smart seeding algorithm which would ensure that the seeds do not land on the edges between surfaces and at the same time guarantee an even distribution of seeds over the space. The second is to introduce a refinement step which would post-process the supervoxels output by the segmentation algorithm.

In fact, there already exists a function SupervoxelSegmentation::refineSupervoxels(). It has to be invoked explicitly after the initial segmentation is obtained. The function simply uses the centroids of supervoxels as seeds and runs the segmentation algorithm once again. So, in a sense, it does implement the “smart seeding” approach. The improvement is massive. Below are the results of supervoxel and object segmentation when a single round of refinement is used:

_images/test55-010-refined-supervoxels.png _images/test55-010-refined-clusters.png
Supervoxel segmentation (seed size 0.1 m, with refinement)
Object segmentation (seed size 0.1 m, with refinement)
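The control flow of the simple refinement (segment, take the centroids as new seeds, segment again) can be illustrated with a toy stand-in. In the sketch below the real supervoxel growth is replaced by plain nearest-seed assignment, which is a strong simplification and an assumption of this example; only the re-seeding loop mirrors what the post describes. It does not call or reproduce SupervoxelSegmentation::refineSupervoxels().

```python
import math


def segment(points, seeds):
    """Toy segmentation (assumption): assign each point to its nearest seed."""
    clusters = [[] for _ in seeds]
    for p in points:
        k = min(range(len(seeds)), key=lambda s: math.dist(p, seeds[s]))
        clusters[k].append(p)
    return clusters


def centroid(cluster):
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))


def refine(points, seeds, rounds=1):
    """Re-seed from cluster centroids and segment again, `rounds` times."""
    clusters = segment(points, seeds)
    for _ in range(rounds):
        # An empty cluster keeps its old seed so the seed count is stable.
        seeds = [centroid(c) if c else s for c, s in zip(clusters, seeds)]
        clusters = segment(points, seeds)
    return seeds, clusters
```

With seeds that start between two groups of points, one round is enough to move each seed into the middle of "its" group, which is the effect the refinement relies on.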

The final result is very good: no object is merged with any other one or with the table. Unfortunately, I do not get equally good segmentations of other cluttered scenes. In the next blog post I will demonstrate them and discuss how this simple refinement step could be improved.

Final Results
Thursday, August 08, 2013

Below I report a summary of the tracking framerates for the main approaches I tested (measured on an Intel i7-3630QM processor at 2.4-3.4GHz with 4 cores and 8 threads). These framerates were measured while publishing input images at 50 fps.

  • original approach in run_from_kinect_nodelet.launch: 23.8 fps
  • approach in run_kinect_nodelet_CodeSprint.launch (with HogSvmPCL node): 33 fps
  • approach in run_kinect_CodeSprint_bis.launch (with PCL’s people detector): 25 fps.

In the figure below, I report Detection Error Trade-off (DET) curves which compare the main tracking approaches contributed to the human_tracker project in terms of False Positives Per Frame (x axis) and False Rejection Rate (y axis). They have been obtained by varying the minimum confidence parameter for the detector using HOG+SVM. The best working point for these curves is located at the bottom-left corner (with FRR = 0% and FPPF = 0). For visualization purposes, the curves are reported in logarithmic scale.


It can be seen that the approaches developed during this Code Sprint (red and green curves) produce considerably fewer false positives per frame than the original code (blue curve). Moreover, they are also faster than the original approach.

ROS node with PCL’s People Detector
Wednesday, August 07, 2013

As the final step of the code sprint, I created a ROS node which performs people detection with the ground based people detector present in PCL 1.7 (GroundBasedPeopleDetectionApp).

Instead of RGB and disparity images, this node takes as input an XYZRGB pointcloud, performs people detection and outputs a message containing the detected ROIs, like the other detection nodes of the human_tracker project. This node is called ground_based_people_detector (GPD) and it is used in run_kinect_CodeSprint_bis.launch of the System_Launch package. In this launch file, the detection cascade is composed of the GPD and the HaarDispAda nodes. The GPD node produces very good detections which are persistent in time and well centered on people, while the HaarDispAda node removes some false positives. The framerate is about 25 fps, thus slightly lower than the approach in run_kinect_nodelet_CodeSprint.launch.

In the figure below, an example of the output of the nodes launched by run_kinect_CodeSprint_bis.launch is reported:

How Supervoxels Grow
Tuesday, August 06, 2013

As I mentioned in the previous blog post, I started to zoom in on the areas that cause problems for the segmentation/clustering algorithms. This made me explore how supervoxels actually grow, and in this short post I would like to share the insights gained.

Here is a part of the scene I used in the previous blog post observed from a slightly different viewpoint. On the left is the voxelized point cloud and on the right is the result of supervoxel segmentation with the default seed size (click on the image to see full-resolution version):

_images/voxels.png _images/supervoxels.png
Voxelized point cloud (voxel size 0.006 m) Supervoxel segmentation (seed size 0.1 m)

One thing that caught my attention in this segmentation is the cyan supervoxel in the central part of the image. If you examine it carefully you will notice that it consists of two disjoint parts: a big blob to the south-west of the center and a few voxels that are exactly in the center of the image. The distance between these two parts is quite large compared to the seed size, so I became curious about how exactly this segmentation came about. Even though the tutorial and the paper provide a detailed explanation of the process, I decided to visualize it to gain a better understanding.

Below is an animated GIF where frames show supervoxels after each of the 28 iterations made by the algorithm to segment the input point cloud:

Supervoxel growth (seed size 0.1 m)

The wavefront of the blue supervoxel chases the wavefront of the cyan one. At some point in time it breaks it into two parts and eventually “kills” the left one. The right one survives, presumably because its voxels are on the vertical surface of the box, which is similar to the rest of the cyan cluster. Therefore the blue supervoxel (which is mostly “horizontal”) cannot seize them.

Although this particular case does not seem to be much of a problem, I could imagine a situation when a supervoxel is split into two or more chunks that are equally large. The centroid and normal of such a “broken” supervoxel will make no geometrical sense. Therefore reasoning about them in further processing steps will also make no sense and lead to random results.

How to cure this problem is an open question. We could require that supervoxels stay connected during expansion, though it might be computationally expensive to enforce. Alternatively, we could add a post-processing step which will split disjoint supervoxels into several smaller supervoxels. I put this issue on the todo-list and will come back to it later.

Supervoxel Segmentation
Saturday, August 03, 2013

In the previous blog post I briefly mentioned the SupervoxelSegmentation algorithm that has recently become available in PCL. Today I would like to discuss it in greater detail.

In a nutshell, SupervoxelSegmentation generates an over-segmentation of a 3D point cloud into small spatially compact regions (supervoxels) in which all points possess similar local low-level features (such as color, normal orientation). The main properties of the algorithm are that the supervoxels are evenly distributed across 3D space and (in most cases) do not cross object boundaries. A detailed explanation of the algorithm can be found in this tutorial and in the original paper:

Let’s consider a scene with a cluttered table from the Object Segmentation Database (OSD):

_images/test55.png _images/test55-voxels.png
Color image Voxelized point cloud (voxel size 6 mm)

Here are the results of supervoxel segmentation with two different seed sizes (0.1 m, which is the default, and 0.03 m):

_images/test55-010-supervoxels.png _images/test55-010-supervoxels-adjacency.png
Supervoxel segmentation (seed size 0.1 m) The same with overlaid adjacency graph
_images/test55-003-supervoxels.png _images/test55-003-supervoxels-adjacency.png
Supervoxel segmentation (seed size 0.03 m) The same with overlaid adjacency graph

The over-segmented point clouds look like patchwork sheets. The smaller the seed size, the smaller the patches are. The white meshes represent the adjacency graph of the supervoxels. It is not immediately obvious which seed size would result in a better final object segmentation. On one hand, smaller patches are more likely to reflect true object boundaries. Furthermore, in order to segment small objects the patch size should not be greater than the smallest object we wish to segment. On the other hand, smaller patches mean more edges and more time to process them. Depending on the asymptotic complexity of the merging algorithm that could become a crucial consideration.

The next step after supervoxel segmentation is to merge supervoxels to obtain final object segmentation. To begin with, I decided to implement the algorithm proposed in the following paper:

The basic idea is to assign each edge in the adjacency graph a weight that expresses the difference (or dissimilarity) between the pair of supervoxels that it connects. The algorithm starts by sorting the edges in non-decreasing weight order and also creates disjoint sets, where each set contains supervoxels belonging to the same object. In the beginning each supervoxel gets its own set. Then the algorithm iterates over all edges and, for each edge, unites the two sets that contain the supervoxels it connects if a certain condition holds. The condition I use is that the weight of the edge should be small compared to the weights of the edges between the supervoxels that are already in the sets. This algorithm is actually a modification of Kruskal’s algorithm for finding a minimum spanning tree in a connected weighted graph.
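The merging loop described above can be sketched with a union-find structure. The post does not spell out the exact merge condition, so the one used here is an assumption: an edge of weight w unites two sets when w does not exceed the largest edge already inside either set plus a size-dependent slack k / size, in the style of graph-based image segmentation. The parameter `k` and all names are hypothetical.

```python
def merge_supervoxels(num_nodes, edges, k=1.0):
    """Kruskal-style merging of supervoxels.

    edges: iterable of (weight, i, j) tuples over node indices.
    Returns a list giving, for each node, the representative of its set.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes   # largest edge weight merged into each set

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for w, i, j in sorted(edges):           # non-decreasing weight order
        a, b = find(i), find(j)
        if a == b:
            continue
        # Assumed condition: the edge must be small compared to the edges
        # already inside both sets (plus a slack that shrinks as sets grow).
        if w <= min(internal[a] + k / size[a], internal[b] + k / size[b]):
            parent[b] = a
            size[a] += size[b]
            internal[a] = max(internal[a], internal[b], w)
    return [find(x) for x in range(num_nodes)]
```

With edges (0.1, 0, 1), (0.1, 2, 3), (0.9, 1, 2) and k = 0.5, nodes 0 and 1 end up in one set and nodes 2 and 3 in another; the heavy edge between them is rejected.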

Defining a good difference function for a pair of supervoxels is crucial for the performance. It should consider all the available information about supervoxels, including their size, area, border, orientation, color, and so on. At the moment I use a relatively simple function which only depends on the orientation of the supervoxels. More specifically, if two supervoxels have centroids C_i and C_j and average normals N_i and N_j, then the difference is:

D_{i,j} = \begin{cases} 1-\left|N_{i}\cdot N_{j}\right| & \text{if } \left(C_{i}-C_{j}\right)\cdot N_{i}<0\\ 0 & \text{otherwise} \end{cases}

In other words, if the supervoxels are relatively concave to each other, then the dissimilarity grows with the angle between their normals (it is one minus the absolute value of the cosine of that angle). Otherwise it is zero, i.e. relatively convex supervoxels are always similar.
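The difference function above translates directly into code. The sketch below assumes that the average normals are unit vectors, so that their dot product is the cosine of the angle between them; the function name is made up for illustration.

```python
def dissimilarity(c_i, c_j, n_i, n_j):
    """D_ij for two supervoxels with centroids c_i, c_j and normals n_i, n_j.

    Assumption: n_i and n_j are unit vectors.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    diff = tuple(a - b for a, b in zip(c_i, c_j))
    if dot(diff, n_i) < 0:                  # relatively concave pair
        return 1.0 - abs(dot(n_i, n_j))
    return 0.0                              # relatively convex pair
```

For example, with C_i at the origin, N_i = (1, 0, 0), C_j = (1, 0, 0) and N_j = (0, 0, 1), the pair is concave with perpendicular normals, so the dissimilarity takes its maximum value of 1. Moving C_j to (-1, 0, 0) makes the pair convex and the value drops to 0.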

Here are the results of running the algorithm on the supervoxel segmentations that were demonstrated before. The figures on the right show the edges that the algorithm kept (used to unite sets) on top of the object clusters:

_images/test55-010-clusters.png _images/test55-010-clusters-edges.png
Object segmentation (seed size 0.1 m) The same with overlaid edge graph
_images/test55-003-clusters.png _images/test55-003-clusters-edges.png
Object segmentation (seed size 0.03 m) The same with overlaid edge graph

There are a number of problems in both cases. For the large supervoxel size case the book at the bottom is united with the box on top of it, and the second book is segmented into two parts and merged with the table and the box on top of it. Also the saucepan in the back is joined with the table surface. The segmentation with more fine-grained supervoxels exposes different problems. Here the tetra-pak is joined with the box it stands on, and also the box next to it is merged with the table surface. Moreover, there are several single-supervoxel “objects” that were not joined to any other cluster. Finally, both cases share two other problems: the table is split into many pieces, and the bowl on the left is split into two parts.

I think these initial results are rather good, especially considering the fact that a very simple dissimilarity measure is used. In the next blog post I plan to zoom in on the problematic areas and discuss how the dissimilarity function could be improved to solve the observed problems.

Table-top Object Segmentation
Wednesday, July 31, 2013

Hi! This is my first blog post in the scope of the “Segmentation/Clustering of Objects in Cluttered Environments” code sprint. Today I would like to discuss the problem of table-top object segmentation and the tools that PCL currently has to offer.

Consider a situation where a robot observes a table with several objects standing on top of it. Assuming that the table surface is flat and is large relative to the objects’ size, and also that the objects do not touch each other (either standing side-by-side or on top of each other), the following standard simple pipeline makes it possible to break the scene into objects:

  1. Detect dominant plane.
  2. Find a polygon (convex hull) that encloses the points that belong to the detected plane.
  3. Extract the points that are above the found polygon (i.e. that are supported by the dominant plane).
  4. Perform Euclidean clustering of the extracted points.
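Steps 2 and 3 of the pipeline above can be illustrated with a small pure-Python sketch. It is a toy, not PCL code: it assumes that step 1 has already been done and that the dominant plane is z = 0 with the normal pointing up, and the tolerance and height limits are hypothetical parameters. Step 4 (Euclidean clustering of the returned points) is left out.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points_2d):
    """Monotone-chain convex hull, vertices in counter-clockwise order."""
    pts = sorted(set(points_2d))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_convex(hull, p):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(_cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def points_above_table(cloud, plane_tol=0.01, max_height=0.5):
    """Keep points that sit above the table polygon (plane assumed at z = 0)."""
    table = [(x, y) for x, y, z in cloud if abs(z) <= plane_tol]   # plane inliers
    hull = convex_hull(table)                                      # step 2
    return [(x, y, z) for x, y, z in cloud                         # step 3
            if plane_tol < z <= max_height and inside_convex(hull, (x, y))]
```

The height limit matters: without it, anything hanging over the table (a lamp, the ceiling) would be treated as an object supported by the plane.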

PCL offers components to solve each of the mentioned subtasks. After putting them together and tweaking several involved parameters, one could get an object segmentation system that works reasonably well under the aforementioned assumptions. The problem, though, is that those assumptions are far too restrictive.

A typical real-world table with objects on top of it will be more challenging. Just have a look at your desk. The main properties are: a) the number of objects is unknown and is potentially very large; b) the objects do not stand in isolation, but rather touch and occlude each other; c) there might be several supporting planes, for example shelves or large objects that support smaller objects.

Unfortunately, there is nothing in PCL that could bring you anywhere close to segmenting such a scene. The available components include:

The first two methods are supposed to break a scene into foreground and background. Therefore unless one knows the number and approximate locations of all objects, these are of no use.

CRFSegmentation, as far as I understand, represents an unfinished work and also requires an initial guess (to which cluster it belongs) for each point in the dataset.

The next five modules implement region-growing segmentation. SeededHueSegmentation requires seeds and considers only Euclidean distance and difference in hue. RegionGrowing considers Euclidean distance and normal orientations, whereas RegionGrowingRGB works with differences in RGB color instead of normals. ConditionalEuclideanClustering may use any user-defined comparison function to judge whether two points belong to one segment. Finally, OrganizedConnectedComponentsSegmentation is similar to the previous one, but is crafted for organized (Kinect-style) point clouds. According to my experiments, none of these modules is able to come up with a reasonable segmentation of a real-world table-top scene.

SupervoxelClustering is the latest addition to PCL. As its name suggests, it can be used to split (oversegment) a point cloud into a set of supervoxels. The algorithm uses geometrical structure, normals, and RGB values of points. The results I get are very nice and object boundaries are typically preserved (i.e. a single cluster does not span multiple objects); however, further processing is required to merge clusters into objects.

To conclude, there are tools in PCL that one can use to segment trivial table-top scenes. However, there is no module that could segment a complex real-world table-top scene out-of-the-box. Therefore, design of such a module will be the main focus of this code sprint.

Milestone 1: Metrics Framework for 3D descriptors
Monday, July 29, 2013

As mentioned in the roadmap, one of the goals is to implement a framework that captures vital statistics of selected descriptors and correspondence types. These vital statistics would then be analyzed by one or more objective function(s) to enable scene based optimizations.

The first milestone, a metrics framework for descriptor evaluation, is now complete, and its output is in line with the characteristics cited in the Rublee et al. ICCV 2011 paper, among other publications.

Specifically, the framework computes the intended vital statistics including: 2-Way and multi-descriptor matching and inlier rates. The filter banks include L2-distance, L2-ratio, and uniqueness measure. A simulated ground truth is also implemented and is generated during runtime. The framework has been applied to local 3D descriptors (FPFH33, SHOT352, and SHOT1344) across a range of downsampling leaf-sizes (0.01-0.07) and across a range of in-plane (0-90 degrees) rotations. A sample of the results is illustrated in the bar graphs below, which reflect the various metrics, computed at a 30 degree simulated rotation and at 2 levels of downsampling: 0.01 for the top bar graph and 0.07 for the next one. In total, 1680 rates were generated for further analysis by the objective function(s). A link is included below to a sample extended output for other 3D descriptors. Next step: to extend the framework to support 2D descriptors.

_images/fpfh01deg30.png _images/fpfh07deg30.png

The extended output for other 3D descriptors follows, [click to enlarge]:

_images/shot01deg30.png _images/shot07deg30.png _images/cshot01deg30.png _images/cshot07deg30.png
Detection Confidence Visualization
Friday, July 26, 2013

In order to better analyze detection results with the roiViewer package, I added a confidence field to the RoiRect message. It could be used to store the score computed by a detection node for every Roi.

I also modified the roiViewer so that it could display that confidence if the show_confidence parameter is set to true.

Major Improvements to Accuracy and Framerate
Wednesday, July 24, 2013

To reduce the latency and increase the framerate of the detection cascade, I tried to reduce the detection nodelets to only two:

  • Consistency nodelet
  • HogSvm nodelet

The framerate dropped considerably (from 28.2 to 9.8 fps) because the HogSvm nodelet was too slow to process all the ROIs output by the Consistency nodelet.

For this reason, I implemented a new nodelet, called HogSvmPCL, which exploits the HOG descriptor and the pre-trained Support Vector Machine in PCL 1.7 for detecting whole persons in RGB image patches. This new node is much faster than HogSvm while being very accurate in classification. By using the reduced cascade with HogSvmPCL in place of HogSvm (Consistency + HogSvmPCL), the tracking framerate tripled with respect to the Consistency + HogSvm approach (from 9.8 to 30 fps). The FRR worsened by about 4 percentage points (from 18.24% to 22.48%) with respect to the default code from SwRI, while the FPPF improved by about 35% (from 0.58 to 0.38).

I also trained an SVM targeted to recognize upper bodies in order to be more robust when the lower part of a person is occluded or out of the image. This approach led to a further reduction of false positives of about 15%, while maintaining the same false rejection rate and framerate. From the launch files, the wholebody or halfbody classifier can be selected by setting the mode parameter and specifying the path to the classifier file with classifier_file.

In order to further improve accuracy, I also exploited disparity information by adding to the detection cascade the HaarDispAda nodelet, which uses Haar features on the disparity and Adaboost as a classifier. With respect to the detection cascade composed only of the Consistency and the HogSvmPCL nodelets, the accuracy considerably improved. In particular, the FRR decreased by 0.5% and the FPPF decreased by 55%.

The nodes of this cascade (Consistency + HogSvmPCL + HaarDispAda + ObjectTracking) can be launched with run_kinect_nodelet_CodeSprint.launch.

Spectrolab Viewer
Wednesday, July 24, 2013

The Spectrolab Viewer has been completed. The viewer streams data from a Spectroscan 3D, records point clouds, and plays the point clouds back. Additionally, it supports multiple renderers. It can color points based on the x, y, z coordinates, with black and white intensity coloring, or with a fused Z range and intensity coloring. The fused rendering provides maximal visibility for all of the structures in the point cloud. An example of a Spectroscan 3D point cloud and the fused Z/intensity rendering can be seen in the image and movie below.


The Spectrolab Viewer can directly record PCD files which can later be viewed with the viewer or any other PCL tool. Below is an example PCD file recorded by the Spectroscan 3D, shown in the PCL web viewer. You can clearly see the vehicle and person being scanned in the point cloud.

This viewer completes the PCL Spectrolab code sprint. This sprint has created open source tools to stream, save, and view data from Spectrolab’s in-development Spectroscan 3D. When it hits the market, it will be ready to operate with all of the open source tools PCL has to offer.

Pointcloud Publisher Node Added
Tuesday, July 16, 2013

A node for creating XYZRGB pointclouds from RGB and disparity images has been contributed to the human_tracker repository. This node is called pointcloudPublisher and performs the following steps:

  • reading images and camera parameters from folder
  • pointcloud creation
  • depth to rgb registration
  • pointcloud publishing to /camera/depth_registered/points topic, which is the standard pointcloud topic for OpenNI devices.
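The pointcloud creation step listed above can be sketched with the standard pinhole stereo relations (depth = focal length × baseline / disparity). This is not the pointcloudPublisher code: the parameter names are hypothetical, the images are plain nested lists, and the sketch assumes the disparity and RGB images are already registered to each other, so the registration step is not shown.

```python
def disparity_to_point(u, v, disparity, fx, fy, cx, cy, baseline):
    """Back-project pixel (u, v) with the given disparity to a 3D point.

    fx, fy: focal lengths in pixels; cx, cy: principal point;
    baseline: stereo baseline in metres. Returns None for invalid disparity.
    """
    if disparity <= 0:
        return None
    z = fx * baseline / disparity
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def make_xyzrgb_cloud(disparity_image, rgb_image, fx, fy, cx, cy, baseline):
    """Build a list of (x, y, z, r, g, b) tuples from two aligned images."""
    cloud = []
    for v, row in enumerate(disparity_image):
        for u, d in enumerate(row):
            point = disparity_to_point(u, v, d, fx, fy, cx, cy, baseline)
            if point is not None:
                cloud.append(point + tuple(rgb_image[v][u]))
    return cloud
```

Pixels with zero or negative disparity carry no depth information and are skipped, so the resulting cloud is unorganized.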

This node makes it possible to apply PCL’s people detector, which works on pointclouds, even if only RGB and disparity images are available. Later on, I will contribute a ROS node which exploits this detector.

Improvements to PCL’s people Module and SwRI’s Consistency Node
Friday, July 12, 2013

Since the last post, some updates have been provided both to PCL and ROS-Industrial repositories.

For PCL, some work had to be done in order to release the new version of PCL: 1.7. This version has been directly integrated also in the forthcoming ROS Hydro. In particular, a new and optimized version of the HOG code has been provided. This new version exploits SSE2 operations to obtain a major speed-up, but a version of that code which does not exploit SSE has also been provided, in order to make it work on every machine. Also some Windows compilation bugs have been fixed for the new release.

As for the human_tracker code, the Consistency node has been updated. I noticed that the Consistency node output too many highly overlapping ROIs, and this caused a major loss in the frame rate of the detection cascade. Thus, a procedure for removing overlapping ROIs, which was already used in the HaarAda node, has also been implemented in the Consistency node (in the launch file this function can be activated/deactivated with the RemoveOverlappingRois parameter). As a result, the framerate increased by 8-20% (from 23.8 to 28.2 fps) while maintaining the same accuracy.
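One possible way to remove highly overlapping ROIs is sketched below. It is not the procedure from the HaarAda or Consistency node, whose details are not given here. The assumptions are: a ROI is an (x, y, width, height) tuple, overlap is measured as the intersection area divided by the area of the ROI being tested, and larger ROIs are kept in preference to smaller ones.

```python
def intersection_area(a, b):
    """Area shared by two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(0, w) * max(0, h)


def remove_overlapping_rois(rois, max_overlap=0.8):
    """Drop every ROI that overlaps an already kept ROI by more than
    max_overlap of its own area (assumed overlap definition)."""
    kept = []
    # Assumption: visit larger ROIs first so that they take precedence.
    for roi in sorted(rois, key=lambda r: r[2] * r[3], reverse=True):
        area = roi[2] * roi[3]
        if all(intersection_area(roi, k) <= max_overlap * area for k in kept):
            kept.append(roi)
    return kept
```

With a threshold of 0.8, a ROI that lies entirely inside a larger one is dropped, while a ROI that only touches a corner of it survives.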

In the figure below, the output of all the detection nodes and of the tracking node is reported while performing the removal of the ROIs overlapping more than 80% with another ROI in the Consistency node:

Plane based compression of point clouds
Thursday, July 11, 2013

Basically the approach fits a set of geometric primitives approximating the input point cloud. Although higher order proxies such as spheres or cylinders could be considered we focus here on planes. First, most man made environments are essentially piecewise planar. Second, curved objects allow piecewise linear approximations at the cost of a higher number of proxies.

Principles of our approach: i) Estimation of an oriented plane is done via a weighted PCA incorporating the closeness of points and the local complexity, using curvature information obtained from the eigenvalues of the PCA. We define an input point to be an inlier whenever its distance to the plane is less than some threshold, it is visible on the positive side of the plane, and the angle between the line of sight and the normal is bounded by some threshold. ii) A proxy consists of an oriented 3D plane and a connected subset of inliers encoding the spatial extent. Two extraction methods will be tested: first a RANSAC-based approach and second a region growing approach.
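The three-part inlier test above can be written down as a short predicate. The sketch reflects one reading of the definition and is not the project's implementation: the plane is given as a unit normal n and offset d with n·x + d = 0, "visible on the positive side" is interpreted as the sensor lying on the positive side of the plane, and the line of sight is the ray from the point to the sensor. The names `viewpoint`, `max_dist` and `max_angle_deg` are hypothetical.

```python
import math


def is_inlier(point, normal, d, viewpoint, max_dist, max_angle_deg):
    """Check the three inlier conditions for an oriented plane n.x + d = 0.

    Assumption: `normal` is a unit vector.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # 1) The point must be close to the plane.
    if abs(dot(normal, point) + d) >= max_dist:
        return False
    # 2) The sensor must see the positive side of the plane (interpretation).
    if dot(normal, viewpoint) + d <= 0:
        return False
    # 3) The angle between the line of sight and the normal must be bounded.
    sight = tuple(v - p for v, p in zip(viewpoint, point))
    length = math.sqrt(dot(sight, sight))
    if length == 0:
        return False
    cos_angle = dot(normal, sight) / length
    return cos_angle >= math.cos(math.radians(max_angle_deg))
```

The third condition rejects points seen at a grazing angle, where a small error in the measured range translates into a large uncertainty about which plane the point belongs to.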

Currently I am implementing the region growing extraction. First results and images will be reported next week.

We expect very high compression with the proposed approach, since a single plane can approximate a large number of points. Of course the compression is lossy and one cannot expect to recover the original point cloud. I further believe that the integration of colour information can be done without too much overhead.

Tuesday, July 09, 2013

At first, we decided to implement some simple features such as:

  • width and height of the AABB (axis aligned bounding box)
  • center of mass
  • volume of the convex hull
  • area of the projection of the cloud
  • eccentricity
  • moment of inertia

I’ve already started implementing them. After this step I will implement some more complex descriptors (e.g. 3D SURF, RoPS - Rotational Projection Statistics). And finally I’m going to use machine learning methods for the object recognition.

Start labeling
Monday, July 08, 2013

To evaluate the developed algorithm we need ground truth, so we decided to outsource manual labelling. For this purpose a highly efficient tool was developed. A person can easily solve this task; however, there are some difficulties.

First of all, what should be done if there are several separate roads in the scene? The solution is to mark as “road” only the pixels of the road that the vehicle is on. Below is an example of such a frame (left) and the labeling for it (right).


How should an image be labeled if two roads that were separate in the previous frames merge in the current frame? We decided to mark the pixels of the first road, plus those pixels of the second road that lie above the horizontal line drawn through the end of the curb that separates the roads. Here is an example:


We reduced the manual labelling time tenfold compared with our initial version, so we can now obtain enough labelled data in reasonable time. All results will be publicly available later.

Accuracy and Framerate Evaluation
Friday, June 28, 2013

In this post, I describe the method I chose for evaluating people tracking algorithms.

In order to compare the different approaches I will develop for this code sprint, I implemented some Matlab files which compute quantitative indices from the tracking output written to a CSV file. First, I made the tracking algorithm also write the people bounding boxes inside the image to file, so that they can be compared with the ground truth using the PASCAL rule commonly used for evaluating object detectors (The PASCAL Visual Object Classes (VOC) Challenge. Everingham, M. , Van Gool, L. , Williams, C. K. I. , Winn, J. and Zisserman, A. International Journal of Computer Vision (2010)).

Then, I implemented functions for reading the CSV file with the tracking output, comparing tracking bounding boxes with ground truth bounding boxes and computing two indices:

  • False Rejection Rate (FRR, %): 100*miss/(TP + miss) = % miss
  • False Positives Per Frame (FPPF): FP/frames.
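
The two indices can be written down directly from the formulas above. The following is a minimal C++ sketch, not the Matlab files used for the evaluation; the TrackingCounts struct and the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical container for the counts accumulated over a sequence.
struct TrackingCounts {
  int true_positives;   // TP: ground-truth people matched by a track
  int misses;           // ground-truth people with no matching track
  int false_positives;  // FP: tracks with no matching ground-truth person
  int frames;           // number of evaluated frames
};

// False Rejection Rate in percent: 100 * miss / (TP + miss).
double falseRejectionRate(const TrackingCounts& c) {
  assert(c.true_positives + c.misses > 0);
  return 100.0 * c.misses / (c.true_positives + c.misses);
}

// False Positives Per Frame: FP / frames.
double falsePositivesPerFrame(const TrackingCounts& c) {
  assert(c.frames > 0);
  return static_cast<double>(c.false_positives) / c.frames;
}
```

For example, 80 true positives, 20 misses and 50 false positives over 100 frames give an FRR of 20% and an FPPF of 0.5.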

Given that tracking results can be considered good even if the computed bounding box is a bit off with respect to the person center, a threshold of 0.3 has been used in the PASCAL rule, instead of the standard 0.5 threshold.
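
The PASCAL rule compares the area of intersection of two boxes with the area of their union. A minimal sketch of the test with the relaxed 0.3 threshold could look like this; the Box layout (top-left corner plus size) and the function names are assumptions made for this example, not the format of the tracker's CSV output.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical box layout: top-left corner plus size, in pixels.
struct Box { double x, y, width, height; };

// Area of intersection divided by area of union (0 when the boxes do not overlap).
double overlapRatio(const Box& a, const Box& b) {
  assert(a.width > 0 && a.height > 0 && b.width > 0 && b.height > 0);
  const double iw = std::min(a.x + a.width, b.x + b.width) - std::max(a.x, b.x);
  const double ih = std::min(a.y + a.height, b.y + b.height) - std::max(a.y, b.y);
  if (iw <= 0.0 || ih <= 0.0) return 0.0;
  const double inter = iw * ih;
  return inter / (a.width * a.height + b.width * b.height - inter);
}

// A track counts as a correct detection when the overlap exceeds the threshold.
bool isMatch(const Box& track, const Box& truth, double threshold = 0.3) {
  return overlapRatio(track, truth) > threshold;
}
```

Two 10x10 boxes shifted by 5 pixels overlap by one third, so they match at the 0.3 threshold but would be rejected at the standard 0.5.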

For evaluating the framerate, the rostopic hz command has been used for measuring the publishing rate of the tracking topic (/human_tracker_data). All the framerates reported in this blog have been computed with an Intel i7-3630QM processor at 2.4-3.4GHz (4 cores, 8 threads).

The original code provided by SwRI produced the following results:

  • FRR: 18.24%
  • FPPF: 0.58
  • framerate: 23.8 fps
Hello Everybody!
Friday, June 28, 2013

The project “Fast 3D cluster recognition of pedestrians and cars in uncluttered scenes” has been started!

Project started!
Thursday, June 20, 2013

The project “Part-based 3D recognition of pedestrians and cars in cluttered scenes” has been started!

Progress on stereo-based road area detection
Wednesday, June 05, 2013

A few words about the project: the goal is to detect a drivable area (continuously flat area, segmented by a height gap such as curb). As an input we have two rectified images from the cameras on the car’s roof and a disparity map. An example of such images is below.


The point cloud, that was computed from them:

The point cloud is converted into the Digital Elevation Map (DEM) format to simplify the task. DEM is a grid in column-disparity space with heights associated to each node. A projection of DEM onto the left image is illustrated below.
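
To make the data structure more concrete, here is a minimal sketch of a DEM as a grid over (column, disparity) with one height per node. It assumes that each node stores the mean height of the points that fall into it and that empty nodes are marked with NaN; both choices, as well as the Sample and Dem types and the buildDem function, are assumptions of this example and not necessarily what the project code does.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical input sample: image column, disparity and height of one 3D point.
struct Sample { double column, disparity, height; };

struct Dem {
  int cols, rows;              // number of column bins and disparity bins
  std::vector<double> height;  // row-major; NaN where no point fell into the node
  double at(int c, int r) const { return height[r * cols + c]; }
};

Dem buildDem(const std::vector<Sample>& samples, int cols, int rows,
             double column_step, double disparity_step) {
  assert(cols > 0 && rows > 0 && column_step > 0.0 && disparity_step > 0.0);
  std::vector<double> sum(cols * rows, 0.0);
  std::vector<int> count(cols * rows, 0);
  for (const Sample& s : samples) {
    const int c = static_cast<int>(std::floor(s.column / column_step));
    const int r = static_cast<int>(std::floor(s.disparity / disparity_step));
    if (c < 0 || c >= cols || r < 0 || r >= rows) continue;  // outside the grid
    sum[r * cols + c] += s.height;
    ++count[r * cols + c];
  }
  Dem dem{cols, rows,
          std::vector<double>(cols * rows, std::numeric_limits<double>::quiet_NaN())};
  for (int i = 0; i < cols * rows; ++i)
    if (count[i] > 0) dem.height[i] = sum[i] / count[i];  // mean height per node
  return dem;
}
```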


On the following image you can see that despite the low resolution of the DEM it is still possible to distinguish the road from the sidewalk.

A front view of the DEM in 3D space (nodes without corresponding points, i.e. nodes onto which no point of the disparity map projects, are marked in red):


DEM as a point cloud:

As a starting point we are going to implement the algorithm described in: J. Siegemund, U. Franke, and W. Forstner, “A temporal filter approach for detection and reconstruction of curbs and road surfaces based on conditional random fields,” in Proc. IEEE Intelligent Vehicles Symp., 2011, pp. 637-642.

All source code related to the project can be found here.

Project - Multi-Descriptor Optimizations across the 2D/3D Domains
Monday, June 03, 2013

The project has started.

Spectroscan3D Driver Implemented
Friday, May 24, 2013

The first milestone of the PCL Spectrolab Code Sprint has been completed. I received the Spectroscan 3D two weeks ago and I have used it to write a driver for the camera. In addition, I have created a simple 3D viewer for the camera which streams the data to a PCLVisualizer. All of the code for the driver (and the rest of the code sprint) can be found here. There are no Spectroscan3D units commercially available right now, but by the time I am done there will be a good set of open source tools to visualize and manipulate the data.

Here is an example scan from the camera.


An amplitude colorized view of an interior door.


An amplitude image of the interior door scan.

The Spectroscan3D driver is split into two components. The first component encapsulates the communication with the lidar camera. It speaks the lidar’s UDP protocol and handles receiving the raw range/amplitude images from the scanner. This component is meant to be independent of PCL, so that future developers can use it with other software systems as well. The second component wraps the driver in a PCL grabber interface and transforms the range image into a point cloud. You can register for either PointXYZ or PointXYZI point cloud events from the scanner.

The next stage of the code sprint will be writing a visualizer capable of controlling the camera like a video camera. It will be able to watch a live stream, record the stream, or play back an old stream from disk. Stay tuned for more updates!

Image Publisher Node Added
Thursday, May 23, 2013

To make it possible to perform detection and tracking on images stored in a folder, I added a new package (imagePublisher) which reads rgb and disparity images from a folder and publishes them to the standard rgb and disparity topics published by a Kinect/Xtion. This node also reads camera and projector parameters, such as baseline, optical center and focal length, from file (cameraInfo.yml) and publishes them to the /camera/depth_registered/camera_info and /camera/projector/camera_info topics.

I also provided a launch file (run_from_dir_nodelet.launch) for performing tracking from folder which uses the imagePublisher node to create the input stream for the detection cascade. A publishing framerate (replayRate) can be chosen to produce streams at the desired rate and a delay (delayStart) can be introduced before the stream starts.

Overview of the human_tracker Code
Tuesday, May 14, 2013

Today I will give an overview of the human_tracker code.

The code is a collection of ROS packages which perform detection, tracking, visualization and labeling. There are five detection packages (Consistency, HaarAda, HaarDispAda, HaarSvm and HogSvm), which implement different approaches that are then fused in a detection cascade to improve results. The tracking algorithm, instead, is implemented in a single node (object_tracking). The visualization package (roi_viewer) plots people bounding boxes onto the rgb image. The labeling procedure is implemented in the labeler package and lets you annotate bounding boxes in the image to create a ground truth.

Together with the source code, the human_tracker repository also contains some pre-trained classifiers and a documentation folder. In order to run the code on your machine, you need to set an environment variable, NIST_CLASSIFIERS, which points to the folder containing the pre-trained classifiers.

In the System_Launch package, there are some launch files for performing tracking live from a Kinect/Xtion (run_kinect_nodelet.launch) or from a pre-recorded ROS bag (run_from_bag_kinect.launch).

As the output of the tracking algorithm, the 3D positions and velocities of people are written to a topic (/human_tracker_data) and, optionally, to a CSV file.

I report here some considerations which are useful when working with the human_tracker algorithms:

  • the camera should not be inclined more than 15° with respect to the ground plane because, in the consistency node, this assumption is used to discard part of the image;
  • the tracking node automatically estimates the ground plane equation and exploits it for working on people positions in terms of ground plane coordinates;
  • a new thread is used for every track and the maximum number of threads/tracks is set to 5. Thus, it is important to use a machine with enough threads (6-7) for running the code at the maximum framerate.

In the figure below, the output of all the detection nodes and of the tracking node is reported:

Hello World!
Tuesday, May 14, 2013

The project “Stereo-based road area detection” has been started!

Spectrolab PCL Code Sprint : First post
Sunday, May 05, 2013

This is the first post of the Spectrolab PCL Code Sprint. Spectrolab is a Boeing company that has developed a new LIDAR camera. This camera is meant for industrial and robotics uses in both outdoor and indoor environments. It uses a scanning time of flight laser to generate a 256x128 range image at 5-6 hertz. It is still in development right now, but it promises to be a valuable tool for robotics and perception work. Spectrolab has sponsored integration of the camera into PCL so that when it hits the market, it can take advantage of the wealth of tools and knowledge within the PCL community.

As a part of the code sprint, I will be developing a new PCL grabber driver for the Spectrolab camera and a 3D viewer interface. As I complete the driver and software, I will post screen shots and code explanations to this blog.

Leica proper data structures
Thursday, May 02, 2013

The reading speed is currently slower than the target of the project. My aim is to speed it up by using proper data types.

First, we propose a new file structure for PTX where the coordinates can be separated from the image data. Header contains three extra fields:

  • data_type: indicates whether it is
    1. ascii
    2. binary
    3. binary compressed
  • image_offset: if the image data is separated from the coordinates, then it starts at this position in the file. image_offset should be set to -1 if no RGB data is stored.
  • image_encoding: indicates how the image is stored
    1. bgr8 indicates a binary pixel map
    2. jp2k indicates a JPEG2000 compressed data image
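
The three extra header fields could be held in a small structure such as the one sketched here. This is only an illustration of the proposal above; the type and function names (DataType, PtxExtraHeader, hasImage, isValid, imageStart) and the choice of a long for the offset are hypothetical and are not taken from the leica branch.

```cpp
#include <cassert>
#include <string>

// The three storage modes listed for data_type.
enum class DataType { Ascii, Binary, BinaryCompressed };

// Hypothetical container for the extra PTX header fields.
struct PtxExtraHeader {
  DataType data_type;
  long image_offset;           // position of the image data, -1 if no RGB data is stored
  std::string image_encoding;  // "bgr8" (binary pixel map) or "jp2k" (JPEG2000)
};

bool hasImage(const PtxExtraHeader& h) { return h.image_offset >= 0; }

// A header is consistent if it either stores no image (offset exactly -1)
// or names one of the two encodings from the proposal.
bool isValid(const PtxExtraHeader& h) {
  if (!hasImage(h)) return h.image_offset == -1;
  return h.image_encoding == "bgr8" || h.image_encoding == "jp2k";
}

// Position at which a reader would start decoding the image block.
long imageStart(const PtxExtraHeader& h) {
  assert(hasImage(h));
  return h.image_offset;
}
```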

Second, leica::PointCloud class inherits from pcl::PointCloud with an additional transformation matrix.

Third, sensor_msgs::PTXCloudData inherits from sensor_msgs::PointCloud2 with extra fields:

  1. image_offset
  2. image_encoding
  3. image_step

Finally, adapted point types:

  1. PointXYZI without extra padding;
  2. PointXYZIRGB specific to leica contains both color and intensity data.
PTX format reader
Wednesday, May 01, 2013

Created class PTXReader to read content of leica PTX ASCII file and store data into a PCL cloud. Reader is pushed into https://github.com/nizar-sallem/pcl leica branch.

Friday, April 26, 2013

Started the data compression project.

SwRI/NIST Code Sprint : First post
Thursday, April 25, 2013

Hi! This is the first post for the SwRI/NIST Code Sprint.

This code sprint deals with developing algorithms for human detection and tracking, out of 2D camera imagery fused with 3D point cloud data. The Southwest Research Institute (SwRI) just contributed a novel algorithm for human detection and tracking to ROS-Industrial’s repository on GitHub.

At this link, you can find the source code of the human_tracker project. For this code sprint, I am requested to improve accuracy and/or framerate of the existing code. All my updates will be inserted into the develop branch of the repository.

The human_tracker project aims at developing software able to track people while meeting a number of constraints which have been defined by NIST. These requirements can be summarized as follows:

  • track all people in the distance range 1-5m
  • tracking when the system is moving up to 3m/s and 1.5rad/sec
  • nominal update rate: 15+/-2 Hz (30Hz desired)
  • ID will persist for no less than 5 seconds
  • track people in the height range 153-190cm
  • track people up to 20% occluded
RobotEye Viewer
Monday, April 15, 2013
Here’s a video demonstrating the RobotEye Viewer application with the Ocular Robotics RE05 lidar sensor. See my previous blog post for a description of the application.
Final report and code committed
Friday, March 01, 2013

Over the last two weeks I finished the NPP implementation of the Haar Cascade detector; it is now fully ported to PCL. I have also included Alex Trevor’s organized multi-plane segmentation in the people API, so that people can do background subtraction based on the planes found in the image. This week I wrote the final report summarizing the work done as well as how to use the module.

RobotEye Viewer
Thursday, February 28, 2013

I’ve finished developing a new pcl::Grabber called the pcl::RobotEyeGrabber, and a visualization application called RobotEye Viewer. The grabber uses the boost asio library to do asynchronous I/O on a UDP socket using a separate thread. The RobotEye sensor sends lidar data in UDP data packets which are converted by the grabber into pcl::PointCloud data. The RobotEye Viewer is a Qt application that uses the QVTKWidget to embed a pcl::visualization::PCLVisualizer visualization window.

I developed the code using MacOSX and Ubuntu Linux. I ssh into a machine that is located at the Ocular Robotics lab and has a live sensor connected to it. With this machine, I was able to run the RobotEye Viewer application remotely using the RE05 lidar sensor. Ocular Robotics set up a webcam that points at the RobotEye sensor so I can see it in action as I run the application remotely. This setup worked out very well. Here’s a screenshot of the application; stay tuned for a video.

Screenshot of RobotEye Viewer application
Tutorial of the Dinast Grabber Framework
Thursday, February 07, 2013

The PCL Dinast Grabber Framework

In PCL 1.7 we offer a new driver for Dinast cameras that makes use of the generic grabber interface, which has been present since PCL 1.0. This tutorial shows, in a nutshell, how to set up the PCL grabber to obtain data from the cameras.

So far it has been tested with the IPA-1110, Cyclopes II and the IPA-1002 ng T-Less NG, but it is meant to work properly with the rest of the Dinast devices, since the manufacturer specifications have been taken into account.


Small example

As the Dinast Grabber implements the generic grabber interface, its usage is very similar to that of other PCL grabbers. In applications you can find a small example that contains the code required to set up a pcl::PointCloud<pcl::PointXYZI> callback to a Dinast camera device.

Here you can see a screenshot of the PCL Cloud Viewer showing the data from a cup lying on a table obtained through the Dinast Grabber interface:


And this is a video of the PCL Cloud Viewer showing the point cloud data corresponding to a face:

The Dinast Grabber currently offers the following callback signature, as it is the only data type currently available from Dinast devices:

  • void (const boost::shared_ptr<const pcl::PointCloud<pcl::PointXYZI> >&)

The code

The code from apps/src/dinast_grabber_example.cpp will be used for this tutorial:

 #include <iostream>
 #include <pcl/common/time.h>
 #include <pcl/point_types.h>
 #include <pcl/io/dinast_grabber.h>
 #include <pcl/visualization/cloud_viewer.h>

 template <typename PointType>
 class DinastProcessor
 {
   public:

     typedef pcl::PointCloud<PointType> Cloud;
     typedef typename Cloud::ConstPtr CloudConstPtr;

     DinastProcessor(pcl::Grabber& grabber) : interface(grabber), viewer("Dinast Cloud Viewer") {}

     // Called by the grabber for every new cloud
     void
     cloud_cb_ (CloudConstPtr cloud_cb)
     {
       static unsigned count = 0;
       static double last = pcl::getTime ();
       // Print the average framerate every 30 frames
       if (++count == 30)
       {
         double now = pcl::getTime ();
         std::cout << "Average framerate: " << double(count)/double(now - last) << " Hz" <<  std::endl;
         count = 0;
         last = now;
       }
       if (!viewer.wasStopped())
         viewer.showCloud (cloud_cb);
     }

     int
     run ()
     {
       boost::function<void (const CloudConstPtr&)> f =
         boost::bind (&DinastProcessor::cloud_cb_, this, _1);

       boost::signals2::connection c = interface.registerCallback (f);

       interface.start ();

       while (!viewer.wasStopped())
       {
         boost::this_thread::sleep (boost::posix_time::seconds (1));
       }

       interface.stop ();

       return (0);
     }

     pcl::Grabber& interface;
     pcl::visualization::CloudViewer viewer;
 };

 int
 main ()
 {
   pcl::DinastGrabber grabber;
   DinastProcessor<pcl::PointXYZI> v (grabber);
   v.run ();
   return (0);
 }

The explanation

At first, when the constructor of DinastProcessor gets called, the Grabber and CloudViewer Classes are also initialized:

DinastProcessor(pcl::Grabber& grabber) : interface(grabber), viewer("Dinast Cloud Viewer") {}

In the run function, we first create the callback and register it:

boost::function<void (const CloudConstPtr&)> f =
  boost::bind (&DinastProcessor::cloud_cb_, this, _1);

boost::signals2::connection c = interface.registerCallback (f);

We create a boost::bind object with the address of the callback cloud_cb_, and we pass a reference to our DinastProcessor and the argument placeholder _1. The bind result is then converted to a boost::function object which is templated on the callback function type, in this case void (const CloudConstPtr&). The resulting function object is then registered with the DinastGrabber interface.

The registerCallback call returns a boost::signals2::connection object, which we do not use in this example. However, if you want to interrupt or cancel one or more of the registered data streams, you can disconnect the callback without stopping the whole grabber:

boost::signals2::connection c = interface.registerCallback (f);

// ...

if (c.connected ())
  c.disconnect ();

After the callback is set up we start the interface. Then we loop until the viewer is stopped. Finally the interface is stopped, although this is not actually needed since the destructor takes care of that.

In the callback function cloud_cb_ we just do some framerate calculations and show the obtained point cloud through the CloudViewer.

Testing the code

We will test the grabber with the previous example. Save the whole code to a file called dinast_grabber.cpp at your preferred location. Then add this as a CMakeLists.txt file:

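cmake_minimum_required(VERSION 2.8 FATAL_ERROR)

project(dinast_grabber)

find_package(PCL 1.7 REQUIRED)

include_directories(${PCL_INCLUDE_DIRS})
link_directories(${PCL_LIBRARY_DIRS})
add_definitions(${PCL_DEFINITIONS})

add_executable (dinast_grabber dinast_grabber.cpp)
target_link_libraries (dinast_grabber ${PCL_LIBRARIES})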

Then proceed as with a usual CMake build:

$ mkdir build
$ cd build
$ cmake ..
$ make

If everything went as expected you should now have a binary to test your Dinast device. Go ahead, run it and you should be able to see the point cloud data from the camera:

$ ./dinast_grabber


Q: When I run the application I get an error similar to this one:

$ ./dinast_grabber
libusb: 0.000000 error [op_open] libusb couldn't open USB device /dev/bus/usb/002/010: Permission denied.
libusb: 0.009155 error [op_open] libusb requires write access to USB device nodes.

Where the last numbers of the /dev/bus/usb/... might vary.

A: This means you do not have permission to access the device. You can do a quick fix on the permissions of that specific device:

$ sudo chmod 666 /dev/bus/usb/002/010

Or you can make this change permanent for all future Dinast devices by writing a rule for udev. On Debian-like systems this is usually done by writing this:

# make dinast device mount with writing permissions (default is read only for unknown devices)
SUBSYSTEM=="usb", ATTR{idProduct}=="1402", ATTR{idVendor}=="18d1", MODE:="0666", OWNER:="root", GROUP:="video"

to a file like /etc/udev/rules.d/60-dinast-usb.rules.

If you still have problems you can always use the users mailing list: pcl-users@pointclouds.org to find some extra help.


With this new grabber a new kind of short-range sensor is available through the PCL Grabber interface. It is now a breeze to connect to Dinast devices and obtain data from them, as you do with the rest of the devices supported by PCL.

If you have any development suggestions on these or new devices you can contact us through pcl-developers@pointclouds.org.

Ubuntu and Win7 - VLCS HDL Viewer Test
Wednesday, January 23, 2013

Time has passed quickly since VLCS started. Keven announced “pcl_hdl_simple_viewer” in Dec. 2012. It now supports live mode (network) and recorded file mode (PCAP file) with several visualization formats (XYZ, XYZI, XYZRGB). He mainly develops under Linux-Fedora. I have tested and supported it under Linux-Ubuntu and Windows 7.

For the Windows implementation, hdl_simple_viewer needs WinPcap.

Unfortunately, PCAP is not stable under 64-bit Windows. It took me a long time to find this out, so I set up a 32-bit development environment.

Ubuntu and Windows test images are shown below.

  • Recording file mode(PCAP file) - road.pcap from velodyne website.

    Ubuntu XYZ / XYZI / XYZRGB

    Windows XYZ / XYZI / XYZRGB

  • Live mode (Live mode also provides three visualization formats).

    Ubuntu / windows

RobotEye lidar scan animation
Saturday, January 05, 2013
Ocular Robotics sent me some lidar scans to get started with. The data is stored in binary files with azimuth, elevation, range, and intensity fields. I wrote some code to load the binary data and convert it to the pcl::PointCloud data structure. From there, I saved the data in the pcd file format and opened it with ParaView using the PCL Plugin for ParaView. I used ParaView’s animation controls to create a video of the lidar scan:
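
The conversion from azimuth, elevation and range to Cartesian coordinates can be sketched as follows. This is not the loader I wrote for the scans: the LidarReturn and CartesianPoint types are hypothetical, and the sketch assumes angles in degrees, azimuth measured in the XY plane from the +X axis and elevation measured from the XY plane towards +Z. The actual RobotEye conventions and units may differ.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical record for one lidar return, mirroring the four fields in the files.
struct LidarReturn { double azimuth_deg, elevation_deg, range_m, intensity; };

// Hypothetical output point (x, y, z plus intensity).
struct CartesianPoint { double x, y, z, intensity; };

CartesianPoint toCartesian(const LidarReturn& r) {
  assert(r.range_m >= 0.0);
  const double deg2rad = std::acos(-1.0) / 180.0;
  const double az = r.azimuth_deg * deg2rad;
  const double el = r.elevation_deg * deg2rad;
  const double ground = r.range_m * std::cos(el);  // range projected onto the XY plane
  return {ground * std::cos(az), ground * std::sin(az),
          r.range_m * std::sin(el), r.intensity};
}
```

Each converted point could then be appended to a pcl::PointCloud of an XYZI point type before saving it to a pcd file.
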
Velodyne Laser Code Sprint (VLCS) Update
Wednesday, December 05, 2012

I’m pleased to announce that PCL now supports the Velodyne High Definition Laser (HDL) -32 and -64 lasers. The interface is provided as a Grabber (HDL_Grabber) and accepts packets from the network (live) or PCAP file (recorded). Two sample programs, pcl_hdl_grabber and pcl_hdl_viewer_simple are provided to demonstrate the capabilities of the system. libpcap-devel is required to build the PCAP reading portions of code.

Sample PCAP Files are provided by Velodyne: http://velodyne.com/lidar/doc/Sample%20sets/HDL-32/

The image below came from the Building-Side.pcap.

ORCS Kickoff
Monday, November 26, 2012

I’m excited to start the PCL - Ocular Robotics code sprint. This week I’ll have a kickoff meeting over Skype with engineers from Ocular Robotics. In the meantime, I’m reading about the RE0x sensors and updating my PCL build directories!

HRCS Stereo Segmentation Final Report
Tuesday, November 13, 2012

The Honda Research Code Sprint for ground segmentation from stereo has been completed. PCL now includes tools for generating disparity images and point clouds from stereo data courtesy of Federico Tombari, as well as tools for segmenting a ground surface from such point clouds from myself. Attached is a report detailing the additions to PCL and the results, as well as a video overview of the project. There is a demo available in trunk apps, as pcl_stereo_ground_segmentation.

Update on Dinast cameras work
Wednesday, October 03, 2012

It has been quite a long time since my last post so I will give a full update on all the developments that I have taken care of in this time.

After a first implementation of the pcl::DinastGrabber for Dinast cameras (IPA-1002, IPA-1110, IPA-2001) I did some testing with multiple cameras. For that I mounted two of them on a robotics arm. After calibration, I got the combination of two pointclouds in one single view. The following picture shows the result of me standing in front of the two cameras.


Also here is the setup of the cameras on the robotics manipulator:


Then we used the RRT implementation with the robotic arm for planning purposes. The calibrated cameras were also used to build a collision shield around the robot. First RRT was run to get the path to the goal. While the robot was moving, collision checking was performed using the information obtained from the cameras. When an object was detected, the robot was stopped and RRT was run again in order to obtain a new path to the goal avoiding the detected object. The picture below shows the replanning part of the test, while trying to avoid a box that was in the path to the goal.

URCS Final Report
Sunday, September 30, 2012

Over the weekend, I finished the URCS final report. I would like to thank my mentors, Jacob Schloss at Urban Robotics, Radu, and Julius, who all provided me a lot of patient support and inspiration on this project. Justin Rosen on TRCS has been a great collaborator and deserves a lot of credit for the success of the outofcore integration.

I would like to draw some attention to a few items related to outofcore that I explain in this document in further detail. First of all, even though the URCS is drawing to a close, outofcore is still very much under development. The API is not 100% set, though significant strides have been made in restructuring the internals for long term stability and integration with PCL. Most importantly, all functionality is available for PointCloud2-based insertion and query methods now! These should be the preferred methods of the interface.

Currently, for the average user, the pipeline will be:

  • Construct an outofcore octree with pcl_outofcore_process from a set of PCD files
  • Use one of the two available outofcore viewers for visualization.
  • When building LOD, remember the number of internal root nodes can grow exponentially the deeper the tree gets. As a consequence, building LOD can take quite a long time if you are inserting tens or hundreds of millions of points into a tree that is very deep.
  • pcl_outofcore_process computes the size of the bounding box of the ENTIRE set of PCD files. That said, if you want to update the tree later, it is your responsibility to make sure the points fall within the original bounding box. If you want to set an arbitrarily large bounding box, the source of pcl_outofcore_process is easy to modify for this case.

For developers, I should emphasize two things to ensure compatibility with future changes that will be introduced into outofcore. If anyone has opinions on this, we would certainly like to entertain some discussion on the pcl-developers@ mailing list.

  • The classes are still templated, but this will change. Because rendering is dynamic, please use the PointCloud2 methods of the outofcore octree. It is admittedly a little confusing because the interfaces are still in a single class with overloaded methods. The easiest way to handle this while it is in a state of flux is via a typedef:
typedef OutofcoreOctreeBase<OutofcoreOctreeDiskContainer<pcl::PointXYZ>, pcl::PointXYZ> OutofcoreBase;

Then, FORGET that it is templated on pcl::PointXYZ; it is not important. You can use whatever Point Type you would like packed into a PointCloud2 data structure. You can easily convert your PointCloud to a PointCloud2 by using toROSMsg.

  • I also emphasize that Outofcore, while an octree, is NOT related to pcl_octree at this time. I have added some methods to make their interfaces somewhat similar, but please beware of a false sense of consistency.

The final report should contain enough information to get started developing with outofcore. I hope it also provides a sense of where the code is coming from and where the library is heading in the context of its future place in the PCL library.

Frustum Culling
Sunday, September 30, 2012

Filters points lying within the frustum of the camera. The frustum is defined by pose and field of view of the camera. The parameters to this method are horizontal FOV, vertical FOV, near plane distance and far plane distance. I have added this method in the filters module. The frustum and the filtered points are shown in the images below.
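
The geometric test behind such a filter can be sketched in a few lines. This is not the code added to the filters module: the Vec3 and Frustum types and the function names are hypothetical, and the sketch assumes that the points have already been transformed into the camera frame, with the camera at the origin looking along +X.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical point type, not a PCL point type.
struct Vec3 { double x, y, z; };

// The four parameters named above.
struct Frustum {
  double hfov_deg;   // horizontal field of view, in degrees
  double vfov_deg;   // vertical field of view, in degrees
  double near_dist;  // near plane distance
  double far_dist;   // far plane distance
};

// Assumes p is expressed in the camera frame, camera at the origin looking along +X.
bool insideFrustum(const Vec3& p, const Frustum& f) {
  assert(f.near_dist >= 0.0 && f.far_dist > f.near_dist);
  if (p.x < f.near_dist || p.x > f.far_dist) return false;
  const double deg2rad = std::acos(-1.0) / 180.0;
  const double half_w = p.x * std::tan(0.5 * f.hfov_deg * deg2rad);
  const double half_h = p.x * std::tan(0.5 * f.vfov_deg * deg2rad);
  return std::fabs(p.y) <= half_w && std::fabs(p.z) <= half_h;
}

// Keeps only the points that lie within the frustum.
std::vector<Vec3> cullToFrustum(const std::vector<Vec3>& cloud, const Frustum& f) {
  std::vector<Vec3> kept;
  for (const Vec3& p : cloud)
    if (insideFrustum(p, f)) kept.push_back(p);
  return kept;
}
```

Handling an arbitrary camera pose then amounts to applying the inverse of the pose to each point before the test.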

_images/frustum-culling-1.png _images/frustum-culling-2.png _images/frustum-culling-3.png
Final report and code committed
Friday, September 28, 2012

This week I committed the code to perform face detection in PCL trunk and wrote the final report summarizing the work done as well as how to use the module.

Final report
Monday, September 24, 2012

As a final blog post for this Trimble Code Sprint, I am attaching the final report I have written for the sponsors.

Some things on object recognition
Tuesday, September 04, 2012

In the last months I have been working on a new meta-global descriptor called OUR-CVFH (http://rd.springer.com/chapter/10.1007/978-3-642-32717-9_12) that, as you can imagine, is an extension to CVFH, which is in turn an extension to VFH (I am not very original at dubbing things). I have also committed some tools and pipelines into pcl/apps/3d_rec_framework (still unstable and not very well documented).

Tomorrow we are having a TV demo in the lab where we are showing recent work on recognition/classification and grasping of unknown objects. So, as usually happens, I had to finish some things for it, and I would like to show how with OUR-CVFH it is possible to do scale invariant recognition and 6DOF pose estimation + scale. The training objects are in this case downloaded from 3d-net.org (unit scale, whatever unit is) and they usually do not fit the test objects accurately.


Apart from this, I have also extended OUR-CVFH to use color information and integrate it in the histogram. Basically, the reference frame obtained in OUR-CVFH is used to create color distributions depending on the spatial distribution of the points. To test the extension, I did some evaluations on the Willow Garage ICRA 11 Challenge dataset, obtaining excellent results (about 99% precision and recall). The training dataset is composed of 35 objects and the test set of 40 sequences totalling 435 object instances. A 3D recognition pipeline based on SIFT (keypoints projected to 3D) obtains about 70% on this dataset (even though the objects present texture most of the time). Combining SIFT with SHOT and merging the hypotheses together gets about 84%, and the most recent paper on this dataset (Tang et al. from ICRA 2012) obtains about 90% recall at 99% precision. If you are not familiar with the dataset, here are some screenshots and the respective overlaid recognition and pose estimation.

_images/T_02_01.png _images/T_05_01.png _images/T_06_01.png

The color extension to OUR-CVFH and the hypotheses verification stage are not yet in PCL but I hope to commit them as soon as possible, probably after ICRA deadline and before ECCV. You can find the Willow ICRA challenge test dataset in PCD format at http://svn.pointclouds.org/data/ICRA_willow_challenge.

Random Forest changes
Sunday, September 02, 2012

I am back from “holidays”, conferences, etc. and today I started dealing with some of the concerns I pointed out in the last email, mainly regarding the memory footprint required to train. The easiest way to deal with that is to do bagging on each tree so that the training samples used at each tree are loaded before start and dismissed after training a specific tree. I implemented that by adding an abstract DataProvider class to the random forest implementation which is specialized depending on the problem. Then, when a tree is trained and a data provider is available, the tree requests training data to the provider, trains and discards the samples.
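
The idea can be sketched as follows. Only the class name DataProvider comes from the description above; the method name getDataForTree, the BaggingProvider specialization, the Sample placeholder and the trainForest loop are hypothetical and are not the interface committed to PCL. The sketch draws each bag by sampling with replacement and returns the largest bag held in memory, to show that only one bag is alive at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Placeholder for one training sample; a real sample would hold image patches.
struct Sample { int id; };

// Abstract provider: a tree asks it for data, trains, then discards the data.
class DataProvider {
 public:
  virtual ~DataProvider() = default;
  virtual std::vector<Sample> getDataForTree(std::size_t tree_index) = 0;
};

// Hypothetical specialization that draws a random bag (with replacement).
class BaggingProvider : public DataProvider {
 public:
  BaggingProvider(std::size_t total_samples, std::size_t samples_per_tree, unsigned seed)
      : total_(total_samples), per_tree_(samples_per_tree), rng_(seed) {
    assert(total_ > 0);
  }

  std::vector<Sample> getDataForTree(std::size_t /*tree_index*/) override {
    std::uniform_int_distribution<std::size_t> pick(0, total_ - 1);
    std::vector<Sample> bag;
    bag.reserve(per_tree_);
    for (std::size_t i = 0; i < per_tree_; ++i)
      bag.push_back(Sample{static_cast<int>(pick(rng_))});  // real code would load from disk
    return bag;
  }

 private:
  std::size_t total_;
  std::size_t per_tree_;
  std::mt19937 rng_;
};

// Requests one bag per tree; returns the largest bag size seen.
std::size_t trainForest(DataProvider& provider, std::size_t num_trees) {
  std::size_t peak = 0;
  for (std::size_t t = 0; t < num_trees; ++t) {
    std::vector<Sample> bag = provider.getDataForTree(t);
    peak = std::max(peak, bag.size());
    // ... train tree t on bag (omitted) ...
  }  // bag is destroyed here, before the next tree requests its data
  return peak;
}
```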

I also realized that most of the training data I have for faces contains a lot of NaNs, except for the parts containing the face itself and other parts of the body (which are usually located in the center of the image). So, to further reduce the data in memory, the specialization of the data provider crops the Kinect frames, discarding regions with only NaN values.

With these two simple tricks, I am able to train each tree in the forest with 2000 random training samples (from each sample 10 positive and 10 negative patches are extracted) requiring only 3GB of RAM. In case more training data is needed or the training samples become bigger, one might use a similar trick to design an out-of-core implementation where the data is not requested at tree level but at node level and only indices are kept in memory.
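
One simple way to do such a crop is to find the smallest rectangle that contains all non-NaN depth values. The sketch below assumes a row-major float depth image; the Rect type and the validRegion function are hypothetical names for this example, and the actual data provider may crop differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical crop rectangle; width == 0 means the frame held no valid depth.
struct Rect { int x, y, width, height; };

// Smallest rectangle containing every non-NaN value of a row-major depth image.
Rect validRegion(const std::vector<float>& depth, int width, int height) {
  assert(width > 0 && height > 0);
  assert(depth.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
  int min_x = width, min_y = height, max_x = -1, max_y = -1;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      if (std::isnan(depth[static_cast<std::size_t>(y) * width + x])) continue;
      min_x = std::min(min_x, x);
      max_x = std::max(max_x, x);
      min_y = std::min(min_y, y);
      max_y = std::max(max_y, y);
    }
  }
  if (max_x < 0) return {0, 0, 0, 0};  // every value was NaN
  return {min_x, min_y, max_x - min_x + 1, max_y - min_y + 1};
}
```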

I also found some silly bugs and now I am retraining... let’s see what comes out.

Wrap-up posting for 3D edge detection
Tuesday, August 28, 2012

My primary goal for GSOC‘12 was to design and implement a 3D edge detection algorithm for organized point clouds. Various edges are detected from geometric shape (boundary, occluding, occluded, and high curvature edges) or photometric texture (rgb edges). These edges can be applied to registration, tracking, etc. The following code shows how to use the organized edge detection:

pcl::OrganizedEdgeFromRGBNormals<pcl::PointXYZRGBA, pcl::Normal, pcl::Label> oed;
oed.setInputNormals (normal);
oed.setInputCloud (cloud);
oed.setDepthDisconThreshold (0.02); // 2cm
oed.setMaxSearchNeighbors (50);
pcl::PointCloud<pcl::Label> labels;
std::vector<pcl::PointIndices> label_indices;
oed.compute (labels, label_indices);

pcl::PointCloud<pcl::PointXYZRGBA>::Ptr occluding_edges (new pcl::PointCloud<pcl::PointXYZRGBA>),
        occluded_edges (new pcl::PointCloud<pcl::PointXYZRGBA>),
        boundary_edges (new pcl::PointCloud<pcl::PointXYZRGBA>),
        high_curvature_edges (new pcl::PointCloud<pcl::PointXYZRGBA>),
        rgb_edges (new pcl::PointCloud<pcl::PointXYZRGBA>);

pcl::copyPointCloud (*cloud, label_indices[0].indices, *boundary_edges);
pcl::copyPointCloud (*cloud, label_indices[1].indices, *occluding_edges);
pcl::copyPointCloud (*cloud, label_indices[2].indices, *occluded_edges);
pcl::copyPointCloud (*cloud, label_indices[3].indices, *high_curvature_edges);
pcl::copyPointCloud (*cloud, label_indices[4].indices, *rgb_edges);

For more information, please refer to the following code in PCL trunk:

It was a great pleasure to be one of the GSOC participants. I hope that my small contribution will be useful to PCL users. Thanks to Google and PCL for the nice opportunity and kind support. Lastly, thanks to Alex Trevor for mentoring me.

Out of core: Pending API Improvements
Sunday, August 26, 2012

Over the past two weeks, I have made some considerable changes to the outofcore API:

  1. Renamed all of the out of core classes to meet the PCL class naming convention
    1. octree_base to OutofcoreOctreeBase
    2. octree_base_node to OutofcoreOctreeBaseNode
    3. octree_disk_container to OutofcoreOctreeDiskContainer
    4. octree_ram_container to OutofcoreOctreeRamContainer
  2. Renamed some public and private methods in OutofcoreOctreeBase, as well as pulled unused code, renamed protected methods, etc...
    1. printBB->printBoundingBox
    2. intersectsWithBB->intersectsWithBoundingBox
    3. pointWithinBB->pointInBoundingBox
    4. getBB->getBoundingBox
    5. withinBB->inBoundingBox
  3. Moved all Boost dependencies into a separate outofcore boost header
  4. Encapsulated the metadata of the OutofcoreOctreeBaseNode class into a separate class handling the JSON I/O. This should allow flexibility with metadata format (and the possibility of seamlessly switching to XML/YAML, etc...). This functionality lives in a separate class called OutofcoreOctreeNodeMetadata.

A few changes I am currently working on are:

  1. Implementing depth first and breadth first iterators, similar to the implementation in pcl_octree by Julius
  2. Improving the readability of the doxygen documentation
  3. Parameterizing the LOD building algorithm (customization via pcl::Filter input)

Benchmarking PNG Image dumping for PCL
Thursday, August 23, 2012

Here is the result we got from PNG dumping benchmarking: 640x480 (16 bits) depth map + 640 x 480 (24 bits) color image.

08-23 20:57:43.830: I/PCL Benchmark:(10552): Number of Points: 307200, Runtime: 0.203085 (s)
08-23 20:57:54.690: I/PCL Benchmark:(10552): Number of Points: 307200, Runtime: 0.215253 (s)

If we are dumping the result to the /mnt/sdcard/, we are getting:

08-23 21:02:23.890: I/PCL Benchmark:(14839): Number of Points: 307200, Runtime: 0.332639 (s)
08-23 21:02:40.410: I/PCL Benchmark:(14839): Number of Points: 307200, Runtime: 0.328380 (s)

There is a significant overhead (about 0.1 second!) with the SD Card I/O.

We shall verify these numbers with a faster SD card. However, /mnt/sdcard seems to be mounted on an internal SD card on the Tegra 3 dev board that I have no access to. I have already tried opening the back of the board.

Also, I have tried different compression levels, and level 3 seems to give the best trade-off between compression ratio and speed. More plots will follow to justify my observations.

Code Testing
Tuesday, August 21, 2012

Have gone through the whole setup again and replicated on Ubuntu 12.04 + Mac OSX 10.8 environments. The README file is now updated to reflect what is needed to have the environment setup.

In the end, it was only 4 scripts, and maybe we can automate these completely. We have also added the scripts for dumping the images stored on the SD card. Check out all the .sh files in the directory; they may save you hours.

Cloud Manipulations!
Monday, August 20, 2012

Just a quick update to let you know that you can now manipulate clouds using the mouse. This allows you to do some pretty interesting things when combined with other tools, like Euclidean Clustering.

For instance, in the image below, there are three frames. The first (left-most) shows the original point cloud. I use the Euclidean clustering tool on it, which allows me to pull out the hand as a separate cloud, and select it (now shown in red in the middle). I can then use the mouse manipulator to move the hand as I please, and, for instance, have it pick up the red peg (right most frame).


Everything is coming together; the core of the app is now in place... basically I just need to implement the rest of the PCL tutorials as plugins, and you can do basically everything PCL can do... with the exception of working with movies.

Drag and Drop(dnd) Support and More Filters
Sunday, August 19, 2012

Drag-and-drop support has been added inside the scene tree: users can drag point cloud items and drop them onto a different render window, which helps when inspecting the point clouds. It would be nice to add more dnd support, for example dragging a point cloud directly from the file system into the modeler app. There are many such nice features, and I will add them after GSOC. New filters can be added very easily to the current framework. However, an unknown exception seems to be thrown from the file save module, which crashes the app. I will analyse the problem and finish the file/project load/save work.

A new point type to handle monochrome images is now available on the trunk!
Saturday, August 18, 2012

I’ve developed a new point type to handle monochrome images in the most effective way. It contains only one field, named intensity, of type uint8_t. Consequently, I’ve updated the pcl::ImageViewer so that it is able to manage the point clouds related to the new point type, named pcl::Intensity. Finally, I’ve changed the PNG2PCD converter by adding more functionalities: now the user can choose if the written cloud should be based on pcl::RGB or on pcl::Intensity. For more information, please see the documentation related to each single class or file.
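To make the memory argument concrete, here is a small self-contained sketch of a one-field monochrome point. IntensityPoint and RGBPoint are hypothetical stand-ins (not the real pcl::Intensity and pcl::RGB definitions), and the integer Rec. 601 luma weights used for the conversion are an assumption for illustration, not necessarily what the PNG2PCD converter does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a monochrome point: a single 8-bit intensity field.
struct IntensityPoint
{
  std::uint8_t intensity;
};

// Hypothetical stand-in for a colour point.
struct RGBPoint
{
  std::uint8_t r, g, b;
};

// Convert one colour point to grey using integer Rec. 601 weights (an assumption),
// rounding to the nearest integer.
IntensityPoint toIntensity (const RGBPoint& p)
{
  const unsigned v = (299u * p.r + 587u * p.g + 114u * p.b + 500u) / 1000u;
  IntensityPoint out = { static_cast<std::uint8_t> (v) };
  return out;
}

// Convert a whole image, keeping the pixel order.
std::vector<IntensityPoint> toIntensityImage (const std::vector<RGBPoint>& rgb)
{
  std::vector<IntensityPoint> out;
  out.reserve (rgb.size ());
  for (std::size_t i = 0; i < rgb.size (); ++i)
    out.push_back (toIntensity (rgb[i]));
  return out;
}
```

With a single uint8_t field, a 640x480 monochrome image needs one byte per pixel in this sketch.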

Lossless Image Dumping with libpng + Android and Tegra 3
Friday, August 17, 2012

Today, I’ve added the support of libpng + libzlib for the Tegra 3 project and so we can dump the raw images from the Kinect (or any OpenNI supported devices) onto the SDCard for post-processing or debugging. After hours of fiddling with the parameters and hacking away on the code, now we can capture and compress 4-6 images per second (2-3x 24-bit RGB image + 2-3x 16-bit depth image) on a Tegra 3. I believe these libraries are already NEON optimized and thus we shall be getting the best performance from them. Here is the little magic that gives me the best performance so far.

// Write header (16 bit colour depth, greyscale)
//fine tuned parameter for speed!
png_set_filter(png_ptr, PNG_FILTER_TYPE_BASE, PNG_FILTER_SUB);
png_set_compression_level(png_ptr, 1); //1 is Z_BEST_SPEED in zlib.h!
png_set_compression_strategy(png_ptr, 3); //3 is Z_RLE

As a next step, if time permits, I will use the PCL library compression code instead. Using libpng, however, has taught me where the critical paths are and how we should handle the data. Right now, I am sure that I am not introducing any overhead from data copying or manipulation, since I was handling the raw data pointers the whole time.

For the longest time, I have had trouble getting any performance out from the Tegra 3, mainly because of the floating point operations! Again, avoid these operations at all cost unless we have a more powerful processor!

Here is a screenshot of some of the images that were dumped from my Kinect in real-time!

Tutorials, bug with pcd_viewer, PCLVisualizer and important additions
Thursday, August 16, 2012

Wrote tutorials for the 2D classes. Got another weird bug, found by Radu: Plotter is not working in pcd_viewer on point picking. Still struggling with it.

Tried to make PCLVisualizer cleaner and readable. Removed some unnecessary function calls. Didn’t commit yet.

Added some important functionality to Plotter and a separate vtkCommand event handler.

Placement for ICP Registration
Wednesday, August 15, 2012

The initial position and orientation can be tuned in a dialog triggered by double-clicking the point cloud item ICP registration, as shown in the following snapshot. The ICP parameters are also exposed to give more control to the users. Next, I will add more workers, add project support, and document the functions.

Handling templated clouds, Double Dispatch in cloud_composer
Monday, August 13, 2012

So I finally decided to bite the bullet and add support for the templated classes to the GUI, rather than just using the sensor_msgs::PointCloud2. After some discussion on the dev boards and in irc (thanks to everyone for their input, especially Radu), I decided to put together a system which allows use of the templated classes even though point type is unknown at compile time. This is inherently kind of difficult, since templates and run-time polymorphism are two opposing concepts.

Before I get into the technical stuff, let me just say that the consequence of everything that follows is that for every CloudItem, we maintain both a PointCloud2 and a templated PointCloud<> object. Because of the way the model was designed to enable undo/redo, these are always synchronized automatically, since they can only be accessed through the CloudItem interface.

Everything is centered on the CloudItem class, which inherits from QStandardItem. For those who haven’t been following, the application is built around these types of items, which are stored in a ProjectModel object. The GUI is essentially a bunch of different views for displaying/editing this model. Anyways, clouds are always loaded from file as binary blobs (PointCloud2 objects). The header of the PointCloud2 object is then parsed to determine what the underlying data looks like. By doing this, we can figure out which template PointType we need and, using a nasty switch statement, instantiate the appropriate PointCloud<T> object.

We then take the pointer to the PointCloud<T> and store it in a QVariant in the CloudItem (we also store the PointType in an enum). When we fetch this QVariant, we can cast the pointer back to the appropriate template type using the enum and a little macro. This means one can write a tool class which deduces the template type and calls the appropriate templated worker function.

One interesting thing popped up when I was doing this. The tools use run-time polymorphism (virtual functions) to determine what work function to call. That is, I manipulate base-class Tool pointers, and let the v-table worry about what type of Tool the object actually is. A problem arises with templated types though, since virtual function templates are a no-no.

To get around this, I worked out a double dispatch system - the visitor design pattern. This allows me to determine what code to execute based on the run-time types of two different objects (rather than just one, which could be handled by a vtable). The core idea here is that we first do a run time look up in the vtable to determine what tool is being executed, passing it a reference to the CloudItem. We then take the QVariant, deduce the PointType of the cloud it references, then execute the tool using the appropriate template type.
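The two-step dispatch can be sketched without Qt or PCL. In this illustration a void pointer plays the role of the QVariant, and Cloud, PointTypeId, dispatch and CountTool are all hypothetical names; the real cloud_composer code may look quite different.

```cpp
#include <string>

// Hypothetical point types and a cloud template standing in for pcl::PointCloud<T>.
struct PointXYZ {};
struct PointXYZRGB {};
template <typename PointT> struct Cloud { int size; };

enum PointTypeId { XYZ, XYZRGB };

// Stand-in for CloudItem: a type-erased pointer plus an enum recording the real type.
struct CloudItem
{
  void* cloud;      // plays the role of the QVariant
  PointTypeId type;
};

class Tool
{
public:
  virtual ~Tool () {}
  // First dispatch: the vtable selects the concrete tool.
  virtual std::string run (const CloudItem& item) const = 0;
};

// Second dispatch: switch on the stored enum and call the tool's templated worker.
template <typename ToolT>
std::string dispatch (const ToolT& tool, const CloudItem& item)
{
  switch (item.type)
  {
    case XYZ:    return tool.work (*static_cast<Cloud<PointXYZ>*> (item.cloud));
    case XYZRGB: return tool.work (*static_cast<Cloud<PointXYZRGB>*> (item.cloud));
  }
  return "unsupported";
}

inline const char* typeName (const Cloud<PointXYZ>&) { return "XYZ"; }
inline const char* typeName (const Cloud<PointXYZRGB>&) { return "XYZRGB"; }

class CountTool : public Tool
{
public:
  std::string run (const CloudItem& item) const override { return dispatch (*this, item); }

  // Templated worker: cannot be virtual, so it is reached through dispatch().
  template <typename PointT>
  std::string work (const Cloud<PointT>& cloud) const
  {
    return std::string ("count<") + typeName (cloud) + ">:" + std::to_string (cloud.size);
  }
};
```

The caller only ever holds a Tool pointer and a CloudItem; the switch in dispatch() is the single place where a new point type has to be registered.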

I’m sorry if none of that made sense... I’ll draw up some diagrams of how the whole application functions in a week or so, which should help a lot. I’m now finishing up the code which allows manipulation of vtk actors with the mouse to be applied back into the model (with undo/redo as well).

tl;dr: Templates now work even though you don’t know the PointType at compile-time. Mouse manipulation of actors in the PCLVisualizer render window now works, and will shortly be properly propagated back to the model.

Testing the PCL DinastGrabber
Saturday, August 11, 2012

Cameras just arrived! After some minor troubles with the shipping the cameras have finally arrived, so here is a picture of what I got:


I got two DINAST CYCLOPES II cameras and some related hardware and software. They will be used for the application that generates point clouds from multiple cameras. These cameras obtain short-range 3D data up to 80 cm at a resolution of 320x240. After some code tuning, the PCL Grabber works properly and gets the 3D and 2D data from the cameras. Here is a video of the PCL DINAST Camera Grabber using a CYCLOPES II camera:

Supervised Segmentation
Friday, August 10, 2012

The supervised segmentation is a two-step process consisting of a training phase and a segmentation phase. In the training phase we extract the objects from the scene. We use the FPFH features as classifiers and as a prior assignment of the unary potentials of the CRF. We compute the FPFH histogram features for all points in one object. To reduce computation and feature comparisons in the recognition step, we use a k-means clustering algorithm and cluster the features into 10 classes. The training objects can be seen in the following image.


In the segmentation and recognition step we use the learned features to assign prior probabilities to a new scene. The prior assignment of the most likely label can be seen in the following image. As you can see in the image, many of the points of the objects we want to segment and recognize are not labeled correctly. This is because the FPFH features of those points are too far from the learned ones. However, as a first initial estimate, FPFH features are well suited. The advantage of using these features is that they only capture the geometry and not the color information. With this, the training data set can be much smaller.


As a second step, to refine the assignment, we use the fully connected CRF. The following image shows the segmentation and labeling after 10 iterations.

Openni interface for 3D edge detection
Friday, August 10, 2012

An OpenNI interface for 3D edge detection has been added to PCL trunk. Once it starts, you can show and hide each edge type by pressing the corresponding number:

  • 1: boundary edges (blue)
  • 2: occluding edges (green)
  • 3: occluded edges (red)
  • 4: high curvature edges (yellow)
  • 5: rgb edges (cyan)

The high curvature and rgb edges are disabled by default to keep the frame rate high, but you can easily enable these two edge types if you want to test them.

A new tool for PNG to PCD conversions is now available on the trunk!
Friday, August 10, 2012

I’ve developed a simple utility that enables the user to convert a PNG input file into a PCD output file. The converter takes as input both the name of the input PNG file and the name of the PCD output file. It finally performs the conversion of the PNG file into the PCD file by creating a:


point cloud. Now, the PNG2PCD converter is available on the trunk version of PCL under the tools directory.

Cloud Commands: Now with 99.9% Less Memory Leakage!
Wednesday, August 08, 2012

Today I reworked the Cloud Command classes and their undo/redo functionality so that everything gets properly deleted when the time comes. There are two different scenarios where a command needs to get deleted:

  • When the undo stack reaches its limit (set to 10 commands atm), we need to delete the command at the bottom of the stack (first in)
  • If we eliminate the existence of a command by undoing it, then pushing another command on the stack

These two cases have different deletes though. In the first case we need to delete the original items, since we’ve replaced them with the output of the command. In the latter case, we need to delete the new items the command generated, and leave the original ones alone.

Additionally, different commands need to delete data which is structured in different ways. For instance, a split command (such as Euclidean Clustering) needs to delete one original item or many created items, while a merge command needs to delete many original items or a single created item... and so forth...
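The two deletion scenarios can be illustrated with a toy undo stack. This is a sketch under simplifying assumptions: items are plain strings, "deleting" just records the name in a list, and Command and UndoStack are hypothetical names rather than the actual Cloud Command classes.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical command that remembers the items it consumed and the items it produced.
struct Command
{
  std::vector<std::string> original_items; // inputs replaced by the command
  std::vector<std::string> new_items;      // outputs generated by the command
};

class UndoStack
{
public:
  explicit UndoStack (std::size_t limit) : limit_ (limit), index_ (0) {}

  void push (const Command& cmd)
  {
    // Scenario 2: commands that were undone can no longer be redone,
    // so the items they generated are deleted and the originals are kept.
    while (commands_.size () > index_)
    {
      freeItems (commands_.back ().new_items);
      commands_.pop_back ();
    }
    commands_.push_back (cmd);
    ++index_;
    // Scenario 1: the oldest command falls off the bottom of the stack,
    // so its original items can never be restored and are deleted.
    if (commands_.size () > limit_)
    {
      freeItems (commands_.front ().original_items);
      commands_.pop_front ();
      --index_;
    }
  }

  bool undo ()
  {
    if (index_ == 0)
      return false;
    --index_;
    return true;
  }

  std::size_t size () const { return commands_.size (); }
  const std::vector<std::string>& freed () const { return freed_; }

private:
  void freeItems (const std::vector<std::string>& items)
  {
    freed_.insert (freed_.end (), items.begin (), items.end ());
  }

  std::size_t limit_;
  std::size_t index_; // number of commands currently applied
  std::deque<Command> commands_;
  std::vector<std::string> freed_;
};
```

A split command would carry one original item and many new items, a merge command the opposite; the stack logic stays the same because each command owns both lists.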

Oh, on a side note, I’m not sure what I should have the app do when one tries to merge two Cloud items which have different fields. For now, I just say “Nope”... but there may be a way to combine them that makes sense. Unfortunately this currently doesn’t exist for PointCloud2 objects in PCL afaik. Anyone feel like adding some cases to pcl::concatenatePointCloud(sensor_msgs::PointCloud2&, sensor_msgs::PointCloud2&, sensor_msgs::PointCloud2&) that checks for different but compatible field types? For instance, concatenating rgb and rgba clouds could just give a default alpha value for all the rgb cloud points. This would be pretty easy to code if one did it inefficiently by converting to templated types, concatenating, then converting back...
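The "inefficient but easy" route mentioned above could look roughly like this once the clouds are in templated form. The point structs and the concatenate function are hypothetical, self-contained stand-ins and not PCL API; the default alpha of 255 is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical point layouts standing in for the templated PCL point types.
struct PointXYZRGB  { float x, y, z; std::uint8_t r, g, b; };
struct PointXYZRGBA { float x, y, z; std::uint8_t r, g, b, a; };

// Promote an rgb point to rgba by attaching the given alpha value.
PointXYZRGBA promote (const PointXYZRGB& p, std::uint8_t alpha)
{
  PointXYZRGBA q = { p.x, p.y, p.z, p.r, p.g, p.b, alpha };
  return q;
}

// Concatenate an rgb cloud and an rgba cloud into one rgba cloud.
// Points coming from the rgb cloud get default_alpha.
std::vector<PointXYZRGBA> concatenate (const std::vector<PointXYZRGB>& rgb,
                                       const std::vector<PointXYZRGBA>& rgba,
                                       std::uint8_t default_alpha = 255)
{
  std::vector<PointXYZRGBA> out;
  out.reserve (rgb.size () + rgba.size ());
  for (std::size_t i = 0; i < rgb.size (); ++i)
    out.push_back (promote (rgb[i], default_alpha));
  out.insert (out.end (), rgba.begin (), rgba.end ());
  return out;
}
```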

Bugs and additional features
Tuesday, August 07, 2012

Fixing bugs takes time. PCLPlotter was behaving weirdly in pcd_viewer. A window was appearing right after the creation of an object of the 2D classes (Plotter and Painter2D), even without a call to the plot/spin/display functions. Thus, I had to move vtkRenderWindowInteractor::Initialize() and therefore vtkRenderWindowInteractor::AddObserver() to the display-triggering calls (plot/spin/display).

Added other small functionalities like setTitle* in Plotter.

Merge Cloud Command, Create New Cloud from Selection
Tuesday, August 07, 2012

Just a quick update on what I’ve added.

The frustum selection now connects back to the model through an event it invokes. It can search for points if needed, but right now I’m sanitizing all clouds (no NaNs) so there’s a 1-1 correspondence from vtk to PCL. Of course I have to figure out what cloud the vtk point belongs to, but that can be done using the CloudActorMap from PCLVisualizer.

I’ve added the last undo/redo command, merging. I also added a tool which creates a new cloud from the current selection. There are now two “selectors”; a green one for points selected in the VTK window, and a red selector for points selected in the cloud browser dock. Right now all commands applied to selections work on both, but that may change in the future.

Fair tests on the Stanford dataset 2
Sunday, August 05, 2012

I’ve finally completed the section Detectors evaluation: repeatability and time performances and it is now available for consulting.

New outofcore refactoring, cleaning of API
Saturday, August 04, 2012

I have been working on cleaning up the outofcore library for its initial release. I still need to finish updating the documentation, add the examples to the doxygen mainpage (which is currently blank), and write a tutorial on how to use the out of core libraries. There is still quite a bit of unused code to pull, and a lot of refactoring to get the code fully to PCL’s style standards. I am still debating whether it makes sense to remove the templating. I committed some refactoring this afternoon, and will continue to do so concurrently while I am preparing the code for final testing. I have also started writing a final report to wrap up the code sprint. There has been some growing interest in outofcore features in PCL on the mailing lists lately, so I hope to have the code base fully useable soon.

Fair tests on the Stanford dataset 2
Saturday, August 04, 2012

In what follows I’ll show the main results of the tests on the Stanford dataset. The complete results will be given in Detectors evaluation: repeatability and time performances. The fairness of the comparison is ensured by fixing properly some common parameters among all the detectors. ISS has clearly the best relative repeatability among all the PCL detectors that have been under testing. With regards to the legend, the same considerations made in the previous blog post apply this time.

reading code on MS
Saturday, August 04, 2012

Tomorrow I am going on vacation for two weeks. I just hope I will be able to take the time to finish the API for MeanShift and get it up and running by the deadline. In my spare time I am checking out some other existing implementations of MeanShift, namely the Edison library and the one in OpenCV, as well as some existing Matlab implementations. I’ve also written a first script doing segmentation based on color, but the results were not exactly what I was hoping for.

Axes Widgets, Signal Multiplexer, InteractorStyleSwitch, Rectangular Frustum Selection
Friday, August 03, 2012

Crunch time is here, so I’ll just give a short update on the things I’ve added:

There’s an axes widget now that shows the orientation of the camera, as in ParaView.

I added a signal multiplexer class which greatly simplifies the architecture. This class allows you to specify connections without having to specify the receiving object when you make the connections. You then can switch receiving objects, and all the connections are adjusted automatically. For me, this means that I have all my GUI actions connected to the multiplexer, and then I just tell the multiplexer which project is currently visible. The GUI actions are then routed to that project, and all the other projects just idle. It also allows me to update the GUI state automatically to project state when projects are switched. Why a class like this isn’t included in Qt itself is kind of a mystery to me.
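The multiplexer idea can be shown without Qt. The sketch below uses std::function slots keyed by an action name; Multiplexer, Project and the method names are hypothetical, and the real class works on Qt signals and slots instead.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical receiver standing in for a project.
struct Project
{
  int saves;
  Project () : saves (0) {}
};

// Connections are declared once against the multiplexer, not against a receiver.
template <typename Receiver>
class Multiplexer
{
public:
  Multiplexer () : current_ (nullptr) {}

  // Register what an action should do, without naming the receiving object.
  void connect (const std::string& action, std::function<void (Receiver&)> slot)
  {
    slots_[action] = slot;
  }

  // Switch the receiving object; all registered connections follow automatically.
  void setCurrent (Receiver* receiver) { current_ = receiver; }

  // Route an action to the current receiver; returns false if nothing handled it.
  bool trigger (const std::string& action)
  {
    typename std::map<std::string, std::function<void (Receiver&)> >::iterator it = slots_.find (action);
    if (current_ == nullptr || it == slots_.end ())
      return false;
    it->second (*current_);
    return true;
  }

private:
  Receiver* current_;
  std::map<std::string, std::function<void (Receiver&)> > slots_;
};
```

GUI actions would call trigger(), and switching the visible project is a single setCurrent() call; projects that are not current simply never receive anything.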

I added an InteractorStyleSwitch class which allows you to switch which interactor style PCLVisualizer is using. This means that with one function call we can switch from the normal PCLVisualizer camera interactor style to one which lets us select sub-clouds with the mouse.

Which brings us to the last addition, the ability to select subclouds using a rectangular frustum selector. As seen in the images below, this works, and it works on multiple clouds in the scene. What it doesn’t do yet is take the selected vtkPolyData points and figure out what indices (and what clouds) they correspond to in the original PCL clouds contained in the Project Model.

That’s for tomorrow... plus the ability to split off the selected points into a new CloudItem (with undo/redo of course).


Stay classy, PCL blog readership.

Fair tests on the Kinect dataset
Thursday, August 02, 2012

In what follows I’ll show the main results of the tests on the kinect dataset. The complete results are given in Detectors evaluation: repeatability and time performances. The fairness of the comparison is ensured by fixing properly some common parameters among all the detectors. NARF and ISS are clearly the two best PCL detectors among the ones tested.

I think a brief explanation of the graph legend is required. First, the Harris 3D detector is tested with all its possible response methods, so the abbreviations HA, NO, LO, TO, and CU respectively refer to the response methods HARRIS, NOBLE, LOWE, TOMASI, and CURVATURE. The abbreviation ISS - WBE refers to the execution of the ISS 3D detector without the border extraction, while ISS - BE refers to the execution of the detector with the extraction of boundary points. Finally, the abbreviations 1 th, 2 th, 3 th and 4 th stand for 1, 2, 3 and 4 threads and are related to the OpenMP optimization of the code.

PCL Android (milestones)
Wednesday, August 01, 2012

The lack of hardware accelerated libraries for Android is the key bottleneck I’ve been facing. After spending many hours on NEON assembly and other tools, I finally came across this...

Ne10: A New Open Source Library to Accelerate your Applications with NEON http://blogs.arm.com/software-enablement/703-ne10-a-new-open-source-library-to-accelerate-your-applications-with-neon/


Next I’ll verify the actual speedup we can get with such a library, and see how we can accelerate some of the PCL calls with it. With NEON + multithreading, I am looking for a 10x speedup on Tegra 3.

More to come next... Update: I’ve added Ne10 to the project tree and have it compiled. We should be ready to verify the theoretical speedup we can obtain with the new hardware accelerated libraries. =)

News on tests
Wednesday, August 01, 2012

While developing the evaluator for the ISS 3D detector, I realized that I set the harris 3D and 6D normal estimation radius to:

multiplier * cloud_resolution

where multiplier is set to 6, 9, 12 and 15 in turn. Instead of this setting, I should have set the normal estimation radius to:

4 * cloud_resolution

in order to obtain fair tests. The previous tests are valid, but the reader should take this consideration into account. While testing the ISS 3D detector I re-ran the tests to obtain the desired fairness. I have also decided to collect these final tests in a specific blog page, so that the user can immediately reach the results without looking for them in all my posts. I have just completed the evaluation on the Kinect dataset and I will post it soon both on the blog and on Detectors evaluation: repeatability and time performances.

starting work on mean shift segmentation
Wednesday, August 01, 2012

I have started working on implementing mean shift. I am going to try to keep the API as modular as possible, thinking of the fact that mean shift can be used for a variety of things among which segmentation is only one. First I am just writing a script like program to check if the algorithm works, and then I’ll implement it in the API. (segmentation will be based on color space and on normals at a first try)

Additional features and spin*() functions
Tuesday, July 31, 2012

As I posted before, my work now is to improve and add features to my two classes PCLPlotter and PCLPainter2D. I always had in mind to add spin*() functions (spinOnce(time) and spin()), which are part of all the existing visualization classes (like PCLVisualizer, PCLHistogramVisualizer, etc.), to my classes. Frankly, I did not understand these functions much from the documentation and the code, perhaps because I had no knowledge of vtk’s event handling (vtkCommand and all). All I knew was that these functions somehow start the interactor.

So, I finally understood those functions after getting familiar with vtkCommand and going through their implementation. I find the names kind of confusing. spinOnce(t) runs the interactor event loop for time t. spinOnce sounds like spinning (looping) one time, which is confusing. spin() runs the interactor event loop indefinitely using spinOnce(t), thereby providing the ability to update the scene over time. But the following is the verbatim description of spin() in the documentation: “Calls the interactor and runs an internal loop.”. Either I am missing something or the documentation is misleading!
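The relationship between the two functions, as I understand it from the description above, can be modelled with a toy class. This is not the PCL or vtk implementation: ToyInteractor is hypothetical, event handling is replaced by a counter, and the update callback parameter of spin() is added here only to make the loop observable.

```cpp
#include <chrono>
#include <functional>

// Toy model of an interactor; handling an event is replaced by a counter.
class ToyInteractor
{
public:
  ToyInteractor () : stopped_ (false), events_processed_ (0) {}

  // Run the event loop for roughly ms milliseconds, then return to the caller.
  void spinOnce (int ms)
  {
    const std::chrono::steady_clock::time_point deadline =
        std::chrono::steady_clock::now () + std::chrono::milliseconds (ms);
    do
    {
      ++events_processed_; // stand-in for processing pending GUI events
    } while (std::chrono::steady_clock::now () < deadline);
  }

  // Run until stop() is called. The loop is built from repeated spinOnce() calls,
  // so the scene can be updated between two time slices.
  void spin (int slice_ms, const std::function<void ()>& update)
  {
    stopped_ = false;
    while (!stopped_)
    {
      spinOnce (slice_ms);
      update ();
    }
  }

  void stop () { stopped_ = true; }
  long eventsProcessed () const { return events_processed_; }

private:
  bool stopped_;
  long events_processed_;
};
```

A real event loop would block on the windowing system instead of busy-waiting on the clock; the sketch only shows that spinOnce(t) is a bounded time slice and spin() is an open-ended loop over such slices.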

Apart from the above, I was stuck for most of the time figuring out the usage of RepeatingTimer. The repeating timer event is caught and the timer is destroyed the very first time it fires! A SingleShotTimer suits this purpose very well. I did not understand the use of RepeatingTimer, so I used SingleShotTimer in my spin* methods and it works as it should.

Other than spin*() functions, I added other features about which I will post in the next blog. I would also like to comment here on the general design of the “visualizer”s in pcl. Unfortunately, blogging takes time and I will post about them in the next few days, one by one.

ICP Registration
Tuesday, July 31, 2012

ICP registration is added. The input point clouds and the result can be shown in same or different render windows for better inspection, as indicated by the following snapshot.


Initial positions of the input point clouds are important for the registration to converge. I will implement two types of tools for the initial placement of the input point clouds: setting corresponding points and directly tuning the orientation/position of the point clouds. The tuning will be accomplished by interactively adjusting the orientation/position parameters or by draggers(if it’s not very complicated to implement draggers with vtk...).

Better interface for various point cloud types
Tuesday, July 31, 2012

I have modified the previous code so that it finds possible edges from various point cloud types:


OrganizedEdgeFromRGB and OrganizedEdgeFromNormals are derived from OrganizedEdgeBase. OrganizedEdgeFromRGBNormals is then derived from both OrganizedEdgeFromRGB and OrganizedEdgeFromNormals.

Tool Input Checking
Monday, July 30, 2012

So I implemented a way of showing what inputs are required for a tool and if the currently selected items are a valid input for the tool. It’s a pretty simple system really; if the selected item(s) is valid input for a tool, it becomes enabled, and you can use it. If the selected item(s) don’t match what is needed by a tool, the tool is greyed out, and you get a tooltip which says why you can’t use the tool. In the case of the image below, that’s because calculating FPFH features requires normals.
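The check itself can be very small. Here is a sketch assuming that both the tool requirements and the selected item are described by sets of field names; ToolRequirements, CheckResult and checkInput are hypothetical names, and the tooltip wording is made up for illustration.

```cpp
#include <set>
#include <string>

// Hypothetical description of what a tool needs from the selected item.
struct ToolRequirements
{
  std::string name;
  std::set<std::string> required_fields;
};

// Result shown in the GUI: enabled or greyed out, plus the tooltip text.
struct CheckResult
{
  bool enabled;
  std::string tooltip;
};

// Compare the fields of the selected item against the requirements of the tool.
CheckResult checkInput (const ToolRequirements& tool, const std::set<std::string>& available_fields)
{
  CheckResult result = { true, "" };
  for (std::set<std::string>::const_iterator it = tool.required_fields.begin ();
       it != tool.required_fields.end (); ++it)
  {
    if (available_fields.count (*it) != 0)
      continue;
    // First missing field starts the message, later ones are appended.
    result.tooltip += result.enabled ? tool.name + " requires: " : ", ";
    result.tooltip += *it;
    result.enabled = false;
  }
  return result;
}
```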


Now I’m working on mouse selections in the QVTKWidget/PCLVisualizer view. Things like rectangular-drag and point by point.

Then I’ll add registration, the merge command, and the ability to manually shift point clouds (using the keyboard, and perhaps the mouse).

Then I’ll add as many additional tools as I can manage before GSoC ends. It’s crunch-time; it’s going to be tough to implement all the features I would like to by the middle of August. This has turned out to be a pretty large project for one person... Fortunately the end of GSoC won’t be the end of this App; I’ll be contributing one day a week this Autumn/Winter to continued development of it.

PCL DinastGrabber and RRT demo
Monday, July 30, 2012

During this time I added the pcl::DinastGrabber interface to the PCL subversion repository along with a grabber demo example. It compiles, but I cannot say it works since I still do not have the sensors for testing. I have not messed around much with the code since I would first like to test that it works with the cameras. As soon as they are here I will clean up the code so it looks nice and meets the PCL C++ Programming Style Guide. Meanwhile, I have implemented a simple Rapidly Exploring Random Tree (RRT) demo with some visualization and recorded some of its performance in a short video. It first shows a 2D RRT with no visualization, which you can see gets to the goal pretty fast. Then I added visualization, and it takes some more time. Finally I show the 3D RRT with visualization, which takes quite a long time to reach the goal (even the 3D version with no visualization takes a long time). The goal is at x=100, y=100, z=100 and the starting point is at x=0, y=0, z=0; for the 2D version the z dimension is always set to 0. The blue dot is the starting point and the red one the goal; points added to the tree are in green and the white lines represent the edges. Here is the video:
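For readers who have not seen an RRT before, here is a compact sketch of the basic algorithm without collision checking or visualization. It is not the demo code: Node, dist and buildRRT are hypothetical names, the step size and iteration limit are free parameters, and the nearest-node search is a plain linear scan.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Node
{
  double x, y, z;
  int parent; // index of the parent node, -1 for the root
};

double dist (const Node& a, const Node& b)
{
  return std::sqrt ((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) + (a.z - b.z) * (a.z - b.z));
}

// Grow a tree from the origin towards random samples in [0, extent]^3 until a node
// lands within one step of the goal corner or max_iters samples have been drawn.
// With planar set, z stays 0 everywhere, which gives the 2D variant.
std::vector<Node> buildRRT (double extent, double step, int max_iters, unsigned seed, bool planar)
{
  std::mt19937 rng (seed);
  std::uniform_real_distribution<double> coord (0.0, extent);

  std::vector<Node> tree;
  const Node start = { 0.0, 0.0, 0.0, -1 };
  const Node goal = { extent, extent, planar ? 0.0 : extent, -1 };
  tree.push_back (start);

  for (int it = 0; it < max_iters; ++it)
  {
    const Node sample = { coord (rng), coord (rng), planar ? 0.0 : coord (rng), -1 };

    // Find the tree node closest to the random sample.
    std::size_t nearest = 0;
    for (std::size_t i = 1; i < tree.size (); ++i)
      if (dist (tree[i], sample) < dist (tree[nearest], sample))
        nearest = i;

    // Move from that node towards the sample by at most one step.
    const double d = dist (tree[nearest], sample);
    if (d < 1e-9)
      continue;
    const double s = std::min (1.0, step / d);
    const Node added = { tree[nearest].x + s * (sample.x - tree[nearest].x),
                         tree[nearest].y + s * (sample.y - tree[nearest].y),
                         tree[nearest].z + s * (sample.z - tree[nearest].z),
                         static_cast<int> (nearest) };
    tree.push_back (added);

    // Close enough: connect the goal and stop.
    if (dist (added, goal) <= step)
    {
      Node last = goal;
      last.parent = static_cast<int> (tree.size ()) - 1;
      tree.push_back (last);
      break;
    }
  }
  return tree;
}
```

Calling buildRRT (100.0, 5.0, 2000, 7u, true) grows the planar variant towards x=100, y=100, and passing false as the last argument gives the 3D variant with the goal at x=100, y=100, z=100. Following the parent indices from the last node back to index 0 gives the path.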

I expect the sensors to arrive this week so I can get hands-on with some more coding. I will also be adding some collision checking to the RRT algorithms.

ARM Optimization
Sunday, July 29, 2012

Oftentimes we ignore the importance of writing efficient source code. On a mobile platform, every bit of computation matters. Imagine you have a video player that can only achieve 10 fps, while the competitors are running at 60fps. Such differences may decide whether an application succeeds or fails.

To get started, these past weeks I’ve gathered some NEON material and written some small functions that are optimized with the NEON instruction set. In fact it is surprisingly difficult at times due to the lack of documentation and examples (maybe I’ve not tried hard enough?)

It wasn’t very difficult to have NEON Intrinsics code compiled and run on Tegra 3 after all.

All we need to do is add #include <arm_neon.h> and compile with the -mfloat-abi=softfp -mfpu=neon options.

With the native c code, we can perform a simple array sum (i.e., adding all elements in an array) in about 0.034 second.

07-30 02:46:03.170: I/PCL Benchmark:(1426): Number of Points: 65536, 65536, Runtime: 0.034658 (s)

With NEON, we get about 2x the performance.

07-30 02:48:04.070: I/PCL Benchmark:(2392): Number of Points: 65536, 65536, Runtime: 0.015879 (s)

// Plain C version: add the elements one at a time
int16_t sum = 0;
for (; size != 0; size -= 1)
  sum += array[size - 1];
return sum;

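// NEON version: keep four running totals (one per lane), combine them at the end
int16x4_t acc = vdup_n_s16(0);
int32x2_t acc1;
int64x1_t acc2;
assert((size % 4) == 0);
for (; size != 0; size -= 4)
{
  int16x4_t vec;
  vec = vld1_s16(array);      // load four 16-bit values
  array += 4;
  acc = vadd_s16(acc, vec);   // add them lane by lane to the totals
}
acc1 = vpaddl_s16(acc);       // pairwise add: 4 x 16 bit -> 2 x 32 bit
acc2 = vpaddl_s32(acc1);      // pairwise add: 2 x 32 bit -> 1 x 64 bit
return (int)vget_lane_s64(acc2, 0);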

Code Example Source:

Reference Links:

The next step would be optimizing the floating point operations, which seems to be a rather difficult task. It looks promising, though: if I can spread the work over 3 cores together with NEON, we can get a 4-6x speed-up, and thus a 5 fps application would run smoothly at 30 fps. That’s a big improvement for sure.
Design considerations and a short example
Saturday, July 28, 2012

Underlying containers

The first important design choice for the mesh implementation is the data structure in which the individual mesh elements (vertices, half-edges, faces) are stored. The three most suitable containers are the vector, deque and list (arrays are only good for meshes with a known size). The list has the advantage that pointers to its elements are not invalidated when elements are inserted or erased. This makes it possible to store pointers (iterators) to the connected neighbors directly in each mesh element. No further steps are required if elements are removed from the mesh. The disadvantage is a slower traversal through the list which is a very common operation, for example when the mesh is transformed or rendered.

The vector and deque provide very similar functionality while using different memory management. For both it is not possible to directly store iterators in the mesh elements because these are invalidated when elements are inserted or erased in the middle of the container. Therefore it is necessary to replace the iterators with indices to the connected neighbors. Since indices are not dereferenceable one has to call a method of the mesh to access the element at the given index.

Another problem is that if one element is removed in the middle of the container all subsequent indices have to be shifted and all elements that store an index to a shifted index have to be changed accordingly. This is a very time consuming operation. Therefore it is better to mark the mesh elements as deleted instead of erasing them. When the user has finished all desired deletions a clean-up method has to be called that adapts all indices in one run.
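A minimal sketch of this mark-then-clean-up idea is shown below. It is not the mesh implementation itself: the Vertex, HalfEdge and Mesh structs and the compact and cleanUp functions are hypothetical and store only one connectivity index each, and the sketch assumes that the caller never leaves a live half-edge pointing at a deleted vertex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex
{
  int  outgoing_half_edge;  // index into Mesh::half_edges, -1 if isolated
  bool deleted;             // marked for removal, erased later by cleanUp
};

struct HalfEdge
{
  int  terminating_vertex;  // index into Mesh::vertices
  bool deleted;
};

struct Mesh
{
  std::vector<Vertex>   vertices;
  std::vector<HalfEdge> half_edges;
};

// Removes all elements marked as deleted in a single pass.
// Returns a table mapping each old index to its new index (-1 if removed).
template <class ElementT> std::vector<int>
compact (std::vector<ElementT>& elements)
{
  std::vector<int> new_index (elements.size (), -1);
  std::size_t write = 0;
  for (std::size_t read = 0; read < elements.size (); ++read)
  {
    if (elements[read].deleted)
      continue;
    new_index[read] = static_cast<int> (write);
    elements[write++] = elements[read];
  }
  elements.erase (elements.begin () + write, elements.end ());
  return new_index;
}

// Adapts all stored indices in one run after the user finished deleting.
void
cleanUp (Mesh& mesh)
{
  const std::vector<int> new_vertex    = compact (mesh.vertices);
  const std::vector<int> new_half_edge = compact (mesh.half_edges);

  for (Vertex& v : mesh.vertices)
    if (v.outgoing_half_edge >= 0)
      v.outgoing_half_edge = new_half_edge[v.outgoing_half_edge];

  for (HalfEdge& he : mesh.half_edges)
  {
    he.terminating_vertex = new_vertex[he.terminating_vertex];
    assert (he.terminating_vertex >= 0);  // live half-edges need live vertices
  }
}
```

Marking an element costs constant time, and the clean-up touches every element once, so many deletions followed by one cleanUp call are much cheaper than shifting all indices after each individual erase.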

In experiments that I did for my diploma thesis I noticed that the list is much slower compared to a vector for my intended application. I have to do further experiments comparing the vector with the deque so for now I will stick with the vector.

Topology & geometry

Although in a mesh the geometry is stored together with the topology I want to keep them separated for the implementation. I understand the mesh class as a container itself (a quite complex one) which provides methods for accessing and changing the topology while the user is responsible for the geometry: The mesh class should make no assumptions on, for example, the existence of a point position, normal, color, or any other possible data because they are irrelevant for the topology. It is even possible to define a mesh without any geometry at all. This might not be very useful for an application but it has the advantage that the definition of arbitrary user data is very easy:

// Define the mesh traits
struct MeshTraits
{
  typedef pcl::PointXYZ         VertexData;
  typedef pcl::geometry::NoData HalfEdgeData;
  typedef int                   EdgeData;
  typedef pcl::Normal           FaceData;

  typedef boost::false_type IsManifold;
};

// Define the mesh
typedef pcl::geometry::PolygonMesh <MeshTraits> Mesh;
Mesh mesh;

The example defines a non-manifold polygon mesh which stores a PointXYZ for the vertices, nothing for the half-edges, an integer for the edges and a normal for the faces. The data is stored in a pcl::PointCloud <T> and can be retrieved by the following methods:

Mesh::VertexDataCloud&   cloud = mesh.getVertexDataCloud ();
Mesh::HalfEdgeDataCloud& cloud = mesh.getHalfEdgeDataCloud (); // Note: Empty for the example above
Mesh::EdgeDataCloud&     cloud = mesh.getEdgeDataCloud ();
Mesh::FaceDataCloud&     cloud = mesh.getFaceDataCloud ();

The topology is accessed through the mesh. Each method must be given the respective index:

// Vertex connectivity
HalfEdgeIndex ind = mesh.getOutgoingHalfEdgeIndex (vertex_index);
HalfEdgeIndex ind = mesh.getIncomingHalfEdgeIndex (vertex_index);

// Half-edge connectivity
VertexIndex   ind = mesh.getTerminatingVertexIndex (half_edge_index);
VertexIndex   ind = mesh.getOriginatingVertexIndex (half_edge_index);
HalfEdgeIndex ind = mesh.getOppositeHalfEdgeIndex  (half_edge_index);
HalfEdgeIndex ind = mesh.getNextHalfEdgeIndex      (half_edge_index);
HalfEdgeIndex ind = mesh.getPrevHalfEdgeIndex      (half_edge_index);
FaceIndex     ind = mesh.getFaceIndex              (half_edge_index);
FaceIndex     ind = mesh.getOppositeFaceIndex      (half_edge_index);

// Face connectivity
HalfEdgeIndex ind = mesh.getInnerHalfEdgeIndex (face_index);
HalfEdgeIndex ind = mesh.getOuterHalfEdgeIndex (face_index);


The mesh provides several circulators:

  • VertexAroundVertexCirculator (clockwise)
  • OutgoingHalfEdgeAroundVertexCirculator (clockwise)
  • IncomingHalfEdgeAroundVertexCirculator (clockwise)
  • FaceAroundVertexCirculator (clockwise)
  • VertexAroundFaceCirculator (counterclockwise)
  • InnerHalfEdgeAroundFaceCirculator (counterclockwise)
  • OuterHalfEdgeAroundFaceCirculator (counterclockwise)
  • FaceAroundFaceCirculator (counterclockwise)

Incrementing the circulators around the vertex circulates clockwise while incrementing the circulators around the face circulates counterclockwise when looking at the mesh from the outside. The reason is that these operations don’t need to access the previous half-edge which might become an optional part of the mesh in the future. The circulators around the face can also be used to move along the boundary of the mesh.

Circulators are different compared to iterators in the aspect that they don’t have a distinct begin and end position. It is valid to circulate through a sequence endlessly. Usually one wants to access all elements only once. This can be achieved by the following procedure:

// NOTE: {...} stands for any of the circulators and (...) stands for the respective input

{...}Circulator circ     = mesh.get{...}Circulator (...);
{...}Circulator circ_end = circ;

do
{
  // do something
} while (++circ != circ_end);
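To see why the loop is written as do-while, here is a small self-contained sketch. RingCirculator and visitOnce are hypothetical stand-ins that walk a closed ring of indices; they are not the mesh circulators, but they are incremented and compared in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical circulator over a closed ring of indices (no begin, no end).
class RingCirculator
{
  public:
    RingCirculator (const std::vector<int>& ring, std::size_t pos)
      : ring_ (&ring), pos_ (pos)
    {
      assert (!ring.empty () && pos < ring.size ());
    }

    // Advance to the next element, wrapping around at the end of the storage.
    RingCirculator& operator ++ ()
    {
      pos_ = (pos_ + 1) % ring_->size ();
      return (*this);
    }

    int operator * () const { return ((*ring_)[pos_]); }

    bool operator != (const RingCirculator& other) const
    {
      return (ring_ != other.ring_ || pos_ != other.pos_);
    }

  private:
    const std::vector<int>* ring_;
    std::size_t             pos_;
};

// Visits every element exactly once, starting at an arbitrary position.
std::vector<int>
visitOnce (const std::vector<int>& ring, std::size_t start)
{
  std::vector<int> visited;
  RingCirculator circ (ring, start);
  const RingCirculator circ_end = circ;
  do
  {
    visited.push_back (*circ);
  } while (++circ != circ_end);
  return (visited);
}
```

A for or while loop that tests circ != circ_end before the first step would stop immediately, because the circulator starts at its own end position. The do-while form processes the first element before the comparison is made.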


I added a short example to pcl-trunk/examples/geometry/example_half_edge_mesh.cpp. It defines a non-manifold mesh and creates a simple manifold topology which is traversed in different ways. Then it deletes two faces resulting in a non-manifold topology. If you change to a manifold mesh further faces are deleted in order to keep the mesh manifold.

Please let me know if you find any bugs. And since the implementation of the mesh is still evolving I am also very happy about comments regarding the API.

ISS is available on trunk now!
Saturday, July 28, 2012

Now, ISS is available on the trunk and it is properly documented. In the section how to use the ISS 3D keypoint detector of this blog I will post some code snippets useful to the user who wants to exploit the ISS detector. Currently, the ISS detector is under testing. It will be tested for different configurations:

  1. using 1 thread and disabling the boundary estimation.
  2. using 1 thread and enabling the boundary estimation.
  3. using 1, 2, 3 and 4 threads (4 is the maximum number of threads allowed on my system) and choosing the better of the two configurations described in 1. and 2.

The tests will show both the repeatability results and the time performances of the detector. The results related to the configurations 1. and 2. can be compared with the results already obtained for the other detectors tested at the beginning of my GSoC work.

Coming soon: the test results for the ISS detector.

First pictures from the ISS detector
Saturday, July 28, 2012

In what follows I’ll show some snapshots related to the behaviour of the ISS detector. They have been obtained by not setting the border radius.

Snapshots from the Kinect dataset.

  • Model:

  • Scene:

  • Results:

    • Absolute repeatability: 8
    • Relative repeatability: 0.195122

Snapshots from the Stanford 1 dataset.

  • Model:

  • Scene:

  • Results:

    • Absolute repeatability: 413
    • Relative repeatability: 0.769088
New Plugins, Deleting, Properties, a Screenie
Friday, July 27, 2012

Another quick pre-sleep update... Voxel Grid Downsample and Statistical Outlier Removal have been added, which means the cloud modification undo/redo functionality works now. Delete undo/redo command has also been added. The only remaining command type to be added is the merge command, as would be needed for registration. This is up next.

Then on to selection tools... which may need to be implemented directly, rather than as plugins, since plugins have no way to interact with the VTK render windows (and they shouldn’t, since their processing is done in non-GUI threads). Adding additional tools now is really quite fast... though I think at some point in the coming days I’m going to polish the plugin interface a little bit, as well as clean up the items, mainly by giving them a templated way to get/set data that doesn’t rely on Qt’s awkward UserRole enum system.

I’ll leave you with a little screenshot of what it looks like now.


Oh, one other thing, if you’re fooling around in the GUI, and something goes horribly wrong, please submit a bug report. The whole app is starting to become a little more complex these days, so ugly bugs that I haven’t found are bound to start cropping up.

Feel free to submit feature requests too... but I make no guarantees as to how long it might be before they can be implemented!

Initial results on (new) SSD
Thursday, July 26, 2012

I have been able to reformulate the mathematical framework used in SSD for implementation in PCL, without having to explicitly allocate primal and dual graphs defined over the octree. Over the past few weeks, I have implemented this extended formulation and have done many experiments to figure out the range of regularization parameters involved in the optimization. The results look very good.

There are 3 independent tasks in the implementation:

  1. Given a point cloud with normals, construct an octree (Julius’ extended octree does this)
  2. Given an octree, estimate scalar field per leaf (My extended SSD will do this)
  3. Given a scalar field defined over octree leaves, either render the volume or extract isolevel zero as a polygonal mesh (the latter requires an implementation of the Dual Marching Cubes algorithm)

SSD with standard implementation (left) vs. proposed implementation (right). Given that the results obtained by using new formulation are quite comparable to ones obtained by using the standard implementation, I now proceed to incorporate this implementation into PCL.

fixed problem
Wednesday, July 25, 2012

I managed to fix the region growing, so now the method does not fail if the seed point is on a plane parallel to the table top. Below you can see some screen shots of the

_images/screenshot-1344673807.png _images/screenshot-1344674049.png _images/screenshot-1344674128.png
Simplify Scene Tree and Point Cloud Storage
Tuesday, July 24, 2012

After some effort on building the scene tree with model/view, I gave up and just built it with QTreeWidget, which is simpler and works for the desired UI interactions. Making a PCL 2.x PointCloud also turned out to be more work than I expected, so I gave up on that as well; I found that pcl::PointCloud<pcl::PointSurfel> meets the requirements, and I just store the point cloud in it for now. The storage will be upgraded to the PCL 2.x point cloud type when it’s ready. After I removed the overkill code, it became much easier to make progress, and I will move on to the registration part next.

Clustering, Splitting clouds, Selection highlighting
Tuesday, July 24, 2012

Just a quick update before I go to sleep... Euclidean clustering now works; this means that the cloud splitting command & undo/redo functionality is implemented now too. If you load a cloud, run clustering, it will split it into clusters and whatever remains from the original. Basic (ugly) highlighting now works too; the cloud selected in the browser on the left gets “highlighted” in the view by turning red... which is terrible. I want to do a more subtle highlighting effect, maybe some sort of glow, or tinting, but my attempts are failures so far. I played around with adding a colored ambient/diffuse light to the selected cloud actor (in pclVisualizer) but none of the lighting changes I made had any effect. I suspect I’m making some sort of noobish vtk mistake. I think I need to read through a quick vtk tutorial tomorrow morning.

Sunday, July 22, 2012

I added all the transform functionality in the Painter2D class now. So, one can perform transform operations like:





. . .

Since applying a transform is a real-time operation, it is necessary to keep track of when it is called. I solved this issue in a similar way to how I tackled the underlying vtkPen and vtkBrush: I store a transformation matrix for each figure and update it with the current transform stored in the painter class as state. As I said before, the implementation of this 2D class was not as straightforward as I had thought. In fact it has been full of tricks.

Adding this functionality more or less completes the Painter2D class and my proposed work for the gsoc. For the rest of the period I will try to improve these two classes (PCLPlotter and PCLPainter2D) and add more features on request. So, if you think something should be added just email me; I will add if I think it is feasible ;-) I will also probably be assigned more work by Alex.

To sanitize or not to sanitize?
Saturday, July 21, 2012

So I’ve stumbled upon a design issue that I’m not sure how to handle. The problem is that some functions in PCL (such as FPFH) require sanitized input, namely, they can’t handle NaNs. This means I have to use either a passthrough filter or the removeNAN filter at some point. It might seem natural to do this within the plugin that needs sanitized inputs, but then the result won’t really correspond to its input cloud, i.e., it will have fewer points. To get around this, I’m just sanitizing all input clouds when they are loaded, but this isn’t a satisfactory solution either, since it means you break up organized clouds (and modify the cloud automatically, which the user might not want).
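To make the trade-off concrete, the sketch below shows one way a tool could sanitize its input privately and still return a result that lines up with the original cloud. It is only an illustration and not what the application does: Point, sanitize and scatterBack are hypothetical names, and plain std::vector stands in for the PCL cloud types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point
{
  float x, y, z;
};

inline bool
isFinitePoint (const Point& p)
{
  return (std::isfinite (p.x) && std::isfinite (p.y) && std::isfinite (p.z));
}

// Copies the finite points to 'clean' and remembers where each one came from.
void
sanitize (const std::vector<Point>& input,
          std::vector<Point>& clean,
          std::vector<std::size_t>& indices)
{
  clean.clear ();
  indices.clear ();
  for (std::size_t i = 0; i < input.size (); ++i)
  {
    if (!isFinitePoint (input[i]))
      continue;
    clean.push_back (input[i]);
    indices.push_back (i);
  }
}

// Expands per-point results computed on the clean cloud back to the original
// size. Points that were dropped get NaN, so the layout of the input is kept.
std::vector<float>
scatterBack (const std::vector<float>& results,
             const std::vector<std::size_t>& indices,
             std::size_t original_size)
{
  assert (results.size () == indices.size ());
  std::vector<float> full (original_size, std::numeric_limits<float>::quiet_NaN ());
  for (std::size_t i = 0; i < results.size (); ++i)
    full[indices[i]] = results[i];
  return (full);
}
```

With this approach the input cloud is never modified and an organized cloud keeps its layout, at the cost of one extra copy and an index table per tool invocation.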

So, what should I do here? Should plugins specify that they require sanitized input, and have their tool icon greyed out unless the selected input cloud is valid? What would be the best way to notify the user about why a tool is greyed out? Tooltips which give the reason it is disabled when you hover over the tool icon?

This seems like a sensible solution to me, but please, if anyone actually reads these blogs, let me know if you have a better idea, or if my solution seems foolish to you.

Oh, and more tools are coming, I promise, I’m just trying to make the plugin specification as complete as possible before I start implementing more tools... and I keep discovering things it is missing.

Working hard on ISS refinement
Saturday, July 21, 2012

Some time has passed since my last post. My recent and current activities mainly concern the ISS detector. First of all, I completed the detector so that it has basic functionality (i.e., it works!). In order to be sure that it really works, I developed a simple evaluation framework at the same time. This framework looks like the basic frameworks I developed at the beginning of the GSoC code sprint; it allows the user both to compute the repeatability of the detector based on a pair of files (model + scene) and to visualize the resulting keypoints. My current work is the refinement of the ISS detector, and I’m now focusing particularly on its time performance. After a brief analysis of the time performance of the detector, I refined it by using low-cost data structures. Now I’m working on the introduction of OpenMP directives in order to further speed up the computation when 2 or more cores are available. The detector will be available in the trunk once it is complete and optimized.

Stay tuned with my roadmap: it always shows my current activities even if I’m not posting so much.

Changes to displaying, interaction with models
Wednesday, July 18, 2012

I haven’t been posting much on the blog; I’ll try to keep up with this better. I’ve done some rewriting of how things are displayed. Now the item classes define their behavior with respect to the various view classes. This means that a CloudItem defines its own paint function, which the View window (which uses PCLVisualizer) just calls when it needs to paint the CloudItem (i.e., when it changes or when it’s added/removed). Also, changes made to properties in the inspector now propagate through the models and update all the other views. For now, I’m only making properties editable which don’t require another call to the actual tool functionality, just ones that change how things look. For instance, you can change the scale and level of Normals, but you can’t change the radius, since that would require recalculating them, which is an expensive operation. This may change in the future, but for now, if you want to change the radius, you just have to calculate a new set by running the tool again (with a different radius).

Warming up
Wednesday, July 18, 2012

First blog post just to get in touch with the blogging system and write a bit about my first steps. I started spending some time reading documentation mostly related to pcl and the code sprint project. I am currently working on the pcl::DinastGrabber interface for the DINAST cameras. Two Cyclopes sensors from DINAST will be shipped to me in the next few days. As soon as they arrive I will test the pcl grabber code and get started with the realtime multiple sensors point cloud map generation. In the meantime I will also start with some basic rapidly exploring random tree (RRT) reading and coding.

Tuesday, July 17, 2012

OK, so the PCLPainter2D class is now available in the trunk. Currently it allows the user to draw all the 2D primitives. The usage is the same as discussed in the design:

PCLPainter2D painter;


. . .


The implementation is also exactly the same as discussed in the previous blog post. I’m storing the drawing information in a data structure which is a vector of a class Figure2D. Later I use this information to re-implement paint() of the contextItem class (PCLPainter2D).

fixing things
Tuesday, July 17, 2012

This past week I have been fine-tuning and rewriting the region growing part of the segmentation method in order to fix the erroneous segmentation shown in my last post.

Progress on head detection and pose estimation (II)
Tuesday, July 17, 2012

I have continued working on the face detection method and added the pose estimation part, including the clustering step mentioned in my last post. See the video for some results from our implementation (it is a bit slow at the beginning due to the video recording software, then it gets better).

I fixed several bugs lately, and even though the results are starting to look pretty good I am not yet completely satisfied. First, during training I was facing some problems regarding what to do with patches where the features are invalid (division by zero). I ended up using a tree with three branches, which worked better, although I am not yet sure which classification measure should be used in that case (working on that). The other open points are the use of normal features, which can be computed very fast on organized data with newer PCL versions, and a modification of the way the random forest is trained. Right now it requires all training data to be available in memory (one integral image for each training frame, or even four of them if normals are used). This ends up taking a lot of RAM and restricts the amount of training data that can be used to train the forest.

Outofcore Octree Updates and Changes for PointCloud2
Monday, July 16, 2012

I have been a bit quiet on my blog, but have finally checked in a lot of improvements to the OOC libraries, smoothing out construction of the trees with PointCloud2 data types. This allows us to compile OOC code without having to know the point type in the PCD file. I have finally finished implementing the PointCloud2 interface. I have not added some auxiliary functionality yet such as buildLOD (after insertion of data to leaves), but LOD can be built automatically on insertion using addPointCloud_and_genLOD.

OOC has also been enabled to build in trunk by default as it is approaching more stability. Justin is working on having the out of core visualization engine running on VTK while I am supporting lingering performance issues with OOC construction and query, and doing some final refactoring of the code and library. With the integration of PointCloud2, point clouds can be created with any XYZ-based data type by first using toROSMsg, then inserting the PointCloud2 to the OOC octree. Once Justin and I have a working out-of-core visualization pipeline (he’s handling the heavy lifting with rendering), there is still much more we can do to add to the capabilities of the library.

Representation of manifold and non-manifold meshes
Saturday, July 14, 2012

In this blog post I will talk about the representation of a mesh and how it is affected by the manifold property. Probably the most convenient way for the user to generate a mesh is to first add the vertices and then connect them by faces while the edges are created automatically. This way the mesh class can ensure that it holds only valid data. For example, if the user tries to add the same face twice then only the first insertion must be executed. Unconnected (isolated) vertices can be removed afterwards without changing the rest of the topology. The face is specified by a sequential list of vertices where the vertices are connected in a closed loop.

Manifold mesh

In a manifold mesh faces may be added only between boundary half-edges. We can make use of the property that each boundary vertex in a manifold mesh has exactly one incoming and one outgoing boundary half-edge. If a vertex does not lie on the boundary then all incoming and outgoing half-edges are not on the boundary as well. Therefore it is useful to ensure that the outgoing half-edge of a boundary vertex is a boundary half-edge. This way we can easily check if a face may be added without the need to circulate through all half-edges that are connected to each vertex in the new face. We can also check in constant time if an edge between two vertices is new or already contained in the mesh.

There are two possible problems that might be a concern for certain applications. The first problem is only related to a triangle mesh: If there are several separate components in the mesh then it is not possible to join them by only one triangle without creating non-manifold vertices (illustrated in the image below): The dashed triangle may not be added because the encircled vertex would become non-manifold. This problem can be circumvented by adding triangles in pairs in such cases. If this is not desired then one should consider using a quad or polygon mesh or a mesh that can handle non-manifold vertices.

Combining two separated components

The second problem appears when faces are removed from the mesh: The deletion of a face can result in non-manifold vertices. For example, if face number 5 is deleted from the mesh in the next image (left) the encircled vertex becomes non-manifold. This problem can be solved by deleting the neighboring faces until the vertex becomes manifold again (right): Starting from an edge in the deleted face (number 5) we circulate around the vertex and remove all faces (number 4 and 3) until the boundary is reached. This, however, can result in further non-manifold vertices (encircled) which have to be removed iteratively until the whole mesh becomes manifold again. These operations are part of the implementation of the manifold mesh so the user does not have to worry about them. Again, if this behavior is not desired one should use a mesh that can handle non-manifold vertices.

Removing faces

Mesh with non-manifold vertices

There was a very nice description of how to handle non-manifold vertices online; however, the site is currently down (please let me know if you see it online again). Although my implementation of the mesh is different in several aspects (mainly due to a different API) the basic idea is the same.

In a mesh with non-manifold vertices it can be no longer guaranteed that every boundary vertex has exactly one incoming and outgoing boundary half-edge. However it is still useful to store one of the boundary half-edges as the outgoing half-edge for boundary vertices. This way it is possible to check if a vertex is on the boundary or not but it becomes necessary to circulate around all neighboring half-edges in order to check if two boundary vertices are already connected.

The problem with non-manifold vertices is that the current connectivity around them might be incompatible with the insertion of a new face. However all possible consistent cases are topologically correct and the mesh does not know beforehand which faces are meant to be inserted next. Therefore it is necessary to make the half-edges adjacent before the insertion of a new face. The next image gives an example how the half-edges around a non-manifold vertex can be reconnected (the connectivity between the half-edges is represented as a continuous line). The insertion of a new face between the half-edges 3-0 and 0-2 (image on the left), for example can be done without any further intervention. However for an insertion of a new face between the half-edges 1-0 and 0-8 it becomes necessary to reconnect the half-edges around vertex 0 in such a way that the half-edges 1-0 and 0-8 become adjacent, as shown in the image on the right. Although in this case the operation resulted in other non-adjacent half-edges these can be corrected during the insertion of further faces.

Make adjacent with consistent orientation

The next image shows an invalid configuration that happens for certain situations. For example if we would try to insert a face between the edges 3-0 and 0-4 then the half-edges around vertex 0 would be reconnected in such a way that the closed loop of half-edges is split up into two loops. The problem is that vertex 0 stores only one outgoing half-edge which can point into only one of the loops. The other loop is then “unknown” by the vertex because it can’t be reached anymore by circulating around the vertex. This configuration is not supported by the half-edge data structure and must be avoided. Luckily all of these operations can be hidden inside the mesh implementation so the user does not have to worry about them.

Non-representable non-manifold vertex.

Mesh with non-manifold edges

The linked page describes how the half-edge data structure can be modified in order to handle non-manifold edges and non-orientable meshes:

The basic idea is to introduce virtual triangles between non-manifold edges. Virtual triangles are not exposed to the user but they allow keeping the mesh manifold internally. They also allow modeling non-orientable surfaces.

My implementation currently does not support virtual triangles. I think that the implementation of a non-orientable mesh with non-manifold vertices and edges would become really messy if it is not done right and I would like to test the mesh with non-manifold vertices first before continuing with more elaborate representations. Furthermore it is also worth considering other mesh data structures which might be suited better for representing arbitrary meshes. If there is a need for it I can have a more in-depth look into virtual triangles after my code sprint. But for now I want to keep things simple. In my next post I will give a short API overview of my current implementation and add everything to the trunk.

PCL Performance Benchmark on Tegra 3 (Android 4.0)
Thursday, July 12, 2012

I’ve rebuilt the PCL libraries using the build script (see pcl_binary/ in the svn repository) on Ubuntu 12.04 and the compilation works with a few hiccups. First, we need to turn off the ENABLE_EXAMPLE flag due to a dependency problem. Second, we have to compile with make -j 1. Otherwise, everything ran smoothly.

I noticed there isn’t any performance benchmark of PCL on Tegra 3, so I’ve done some simple testing. I believe it is important to see how many floating point operations we can do per second with the Tegra 3 architecture. float vs double? float vs int? Also, compiling the library in ARM vs Thumb mode may make a difference. Here I will provide a quick but thorough summary of what we can achieve with the Tegra 3 under different settings.

For simplicity, I’ve first benchmarked the passthrough filter by averaging the runtime of the filter over ten trials. This filter should scale linearly with the number of points, and so far the benchmark results are consistent with that.

// Create the filtering object
pcl::PassThrough < pcl::PointXYZ > pass;
pass.setFilterLimits(0.0, 1.0);
07-12 21:37:50.070: I/PCL Benchmark:(2785): Number of Points: 10000, Runtime: 0.002583 (s)
07-12 21:37:50.190: I/PCL Benchmark:(2785): Number of Points: 10000, Runtime: 0.002652 (s)
07-12 21:41:14.330: I/PCL Benchmark:(3614): Number of Points: 100000, Runtime: 0.036954 (s)
07-12 21:41:14.880: I/PCL Benchmark:(3614): Number of Points: 100000, Runtime: 0.038295 (s)
07-12 21:39:49.130: I/PCL Benchmark:(3344): Number of Points: 1000000, Runtime: 0.397860 (s)
07-12 21:39:53.720: I/PCL Benchmark:(3344): Number of Points: 1000000, Runtime: 0.392162 (s)
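The measurement scheme described above (average the runtime over a number of trials) can be sketched as follows. This is not the benchmark code that produced the log: passThroughZ is a hypothetical stand-in for the PCL filter that works on a plain std::vector, and Point and averageRuntime are likewise names chosen for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Point
{
  float x, y, z;
};

// Stand-in for a passthrough filter: keeps the points with min_z <= z <= max_z.
std::vector<Point>
passThroughZ (const std::vector<Point>& cloud, float min_z, float max_z)
{
  std::vector<Point> kept;
  kept.reserve (cloud.size ());
  for (std::size_t i = 0; i < cloud.size (); ++i)
    if (cloud[i].z >= min_z && cloud[i].z <= max_z)
      kept.push_back (cloud[i]);
  return (kept);
}

// Runs the filter 'trials' times and returns the mean wall-clock time in
// seconds. The number of points kept by the last run is written to 'kept'.
double
averageRuntime (const std::vector<Point>& cloud, int trials, std::size_t& kept)
{
  assert (trials > 0);
  typedef std::chrono::steady_clock Clock;
  double total_seconds = 0.0;
  for (int t = 0; t < trials; ++t)
  {
    const Clock::time_point begin = Clock::now ();
    const std::vector<Point> filtered = passThroughZ (cloud, 0.0f, 1.0f);
    const Clock::time_point end = Clock::now ();
    total_seconds += std::chrono::duration<double> (end - begin).count ();
    kept = filtered.size ();  // use the result so the work is not optimized away
  }
  return (total_seconds / trials);
}
```

Calling averageRuntime with clouds of 10000, 100000 and 1000000 points and ten trials each would give figures comparable in spirit to the log above. A monotonic clock is used so that adjustments of the system time do not distort the measurement.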

With this information, we can start optimizing our work by reducing the bottlenecks in each of these filters. But let’s recompile in ARM mode and see if it makes a world of difference.

Now that I know what the Tegra 3 is capable of, it is time to design a simple 3D application. Augmented reality, perhaps? A few segmentation algorithms should do the trick. What can we achieve with the current hardware?

Working in parallel on new detectors: GSS and ISS
Wednesday, July 11, 2012

Last week I started working on GSS, but I stopped this week since I need some feedback from Alex and he may be on vacation now. In order not to lose much time, I decided to switch to the implementation of ISS. After reading the related paper and getting familiar with the code already implemented, I defined the input and output parameters of the detector and the operations that have to be done in order to compute the keypoints. Then I prepared a skeleton class, and I’m currently filling and extending it with the implementation of new methods.

What were my other activities during the past and the current week? Bug solving and user support.

RGB Edges
Tuesday, July 10, 2012

I added RGB edge detection, which is 2D Canny edge detection on the RGB channels, to pcl::OrganizedEdgeDetection. Right now the class only accepts point types that have RGB channels, but this will change so that the edge types available depend on the given point cloud type. For example, ‘occluding’, ‘occluded’, and ‘boundary’ edges can be detected from any XYZ point type, while ‘high curvature’ and ‘rgb’ edges can be obtained from Normal and RGB point types, respectively.

The following images show the detected occluding (green), occluded (red), boundary (blue), high curvature (yellow), and rgb (cyan) edges:

_images/screenshot-1341964342.png _images/screenshot-1341964387.png

Segmentation using color and normals features
Tuesday, July 10, 2012

The following two pictures show the input data set. On the left side you see the point cloud captured using an Asus camera. On the right side you see the hand-labeled data set. I will show different segmentation results using different features and different levels of noise when setting the labels as unary potentials. One challenging part of the image (red circle) is the boundary between the box and the table. The box in the lower right corner has (almost) the same color as the table.


In the following image sequence you’ll see the segmentation using only color information. The input labels are assigned with 50% noise, meaning each unary potential is assigned a random label with 50% probability. From left to right, the results after x iterations are shown, where x is one of [0, 1, 3, 5, 10, 15]. Notice that when using only color information the table label grows into the box (red circle).

_images/noisy50_it-0_it-1_it-3_noNormal.png _images/noisy50_it-5_it-10_it-15_noNormal.png
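The noise model described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the code used for the experiments; the function name addLabelNoise, the seed parameter, and the choice to draw the random label uniformly over all labels (so it may coincide with the true label) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Replaces each label, with probability `noise`, by a label drawn uniformly
// from [0, numLabels). The drawn label may coincide with the original one.
std::vector<int> addLabelNoise(const std::vector<int>& labels, int numLabels,
                               double noise, unsigned seed)
{
  assert(numLabels > 0 && noise >= 0.0 && noise <= 1.0);
  std::mt19937 rng(seed);
  std::bernoulli_distribution corrupt(noise);
  std::uniform_int_distribution<int> randomLabel(0, numLabels - 1);

  std::vector<int> noisy(labels);
  for (std::size_t i = 0; i < noisy.size(); ++i) {
    if (corrupt(rng))
      noisy[i] = randomLabel(rng);
  }
  return noisy;
}
```

With noise = 0.5 this corresponds to the 50% setting above; a fixed seed keeps runs repeatable when comparing feature sets.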

In the next image sequence we use only the normals as features. One can see that normals by themselves are very powerful. However, we will also see that using only normal information has its limitations as well. The number of iterations per image and the noise level are kept the same.

_images/noisy50_it-0_it-1_it-3_Normal.png _images/noisy50_it-5_it-10_it-15_Normal.png

Lastly Color + Normal features are used for segmentation. Notice that using color and normal features has extremely fast convergence. After only 5 iterations we have a very acceptable result.

_images/noisy50_it-0_it-1_it-3_Normal+Color.png _images/noisy50_it-5_it-10_it-15_Normal+Color.png

Segmentation using color and normals features
Tuesday, July 10, 2012

In the second segmentation experiment I wanted to push the algorithm to the limit. For this I made the unary potentials extremely noisy: each potential is assigned a random label with 80% probability. The first image sequence shows the segmentation result from left to right after different numbers of iterations [0, 1, 3, 5, 10, 15]. For the first test we again use only color features. We can see that with only color features the algorithm performs poorly, which is not surprising. Changing the weights might help a little bit; however, to make it a fair comparison I kept the weights and the standard deviations of the Gaussian kernels constant.

_images/noisy80_it-0_it-1_it-3_noNormal.png _images/noisy80_it-5_it-10_it-15_noNormal.png

Next we use only the normals as features. Using the normals gives surprisingly good results. The background is labeled almost perfectly, as are the objects on the table. The table itself, however, remains unlabeled. At this point I have no good explanation for why this is the case. Further investigation might be interesting.

_images/noisy80_it-0_it-1_it-3_Normal.png _images/noisy80_it-5_it-10_it-15_Normal.png

Lastly we use Color + Normals features. I did not expect such a good result. The only parts that seem to be mislabeled are the table legs.

_images/noisy80_it-0_it-1_it-3_Normal+Color.png _images/noisy80_it-5_it-10_it-15_Normal+Color.png

Design of the 2D painter class
Monday, July 09, 2012

In this post I will discuss the design I have in mind for the 2D painter class. The aim is to have a very simple interface (just like PCLPlotter) that lets the user add figures through simple add*() methods and, in the end, call a display() method to show the canvas. Something like the following is desirable:

PCLPainter2D painter;


painter.addLine(0,0, 5,0);

. . .


The underlying implementation of PCLPainter2D in the above design will not be as straightforward as that of PCLPlotter, where we have an instance of vtkChartXY and vtkContextView inside the class. There, the only job was to convert the plot data (correspondences) to a format (vtkPlot) understood by vtkChartXY. That is, we had a direct mapping in terms of functionality from vtkChartXY to PCLPlotter (with a difference in the type of data they process and an additional “view” object in PCLPlotter). The problem with the above design is that we don’t have any vtkContextItem class that shares the properties of the Painter2D class. Instead, 2D drawing in VTK works in the following way. The VTK user first needs to:

  1. Make a subclass of vtkContextItem
  2. Re-implement (override) Paint() of vtkContextItem. (shown in the figure)

It would be really nice to have a vtkContextItem class which cuts off the overhead of subclassing and allows the user to draw directly through function calls. Unfortunately, none of the existing vtkContextItem classes (vtkChart, vtkPlot, vtkAxis, etc.) behaves that way. So, before directly writing a Painter class for PCL, it may be wise to write something like a vtkPainter2D class for VTK and extend it to PCL. In this way it can be used to avoid subclassing in both VTK and PCL, and its rendering could be further optimized in the future.

Now, the steps for creating “vtkPainter2D” (or PCLPainter2D) which would be a subclass of vtkContextItem are roughly the following:

  1. Store the information about the 2D primitives in some data structures on every add*() call.
  2. Implement Paint() using those data structures.
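The two steps above can be sketched without any VTK dependency. In this illustration RecordingContext is a hypothetical stand-in for the drawing context that VTK passes to Paint(), and Painter2DSketch with its paint() method is an assumed name for the proposed class; only addLine() is taken from the desired interface shown earlier.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the 2D context handed to Paint(). It only
// records what was drawn, so the sketch compiles without VTK.
struct RecordingContext {
  struct Line { float x1, y1, x2, y2; };
  std::vector<Line> drawn;
  void drawLine(float x1, float y1, float x2, float y2)
  {
    const Line line = {x1, y1, x2, y2};
    drawn.push_back(line);
  }
};

// Step 1: add*() only stores primitives. Step 2: paint() replays them.
class Painter2DSketch {
public:
  void addLine(float x1, float y1, float x2, float y2)
  {
    const RecordingContext::Line line = {x1, y1, x2, y2};
    lines_.push_back(line);
  }

  // Plays the role of the overridden Paint(): invoked on every redraw.
  bool paint(RecordingContext& context) const
  {
    for (std::size_t i = 0; i < lines_.size(); ++i)
      context.drawLine(lines_[i].x1, lines_[i].y1, lines_[i].x2, lines_[i].y2);
    return true;
  }

private:
  std::vector<RecordingContext::Line> lines_;
};
```

Because the primitives are kept in the object, the same figure can be redrawn any number of times without the caller repeating its add*() calls.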

These things have already been discussed with Marcus. It would be nice to hear your comments and suggestions.

More plugins, more fun
Monday, July 09, 2012

So I’ve finally gotten my satellite internet connection up and running here in France, so now I can start committing again every day. I’m finished with the FPFH plugin, with display of histograms in the inspector view. Now I need to work on a splitting plugin, segmentation, so I’ll have to spend a little more time in getting the undo/redo stuff working for that. I think I need to do some more thinking about the structure of how the items are working as well. Right now I’m still using a QStandardItem subclass, where it probably makes more sense to subclass from the QAbstractItem directly and implement some things myself. I’ll probably spend the next day working on that, along with the segmentation plugin. The plan is to have the following plugins working by the end of the week: Normals, FPFH, Euclidean Segmentation, Plane Segmentation, ICP for registering two clouds.

ShadowPoints filter
Sunday, July 08, 2012

This filter removes the ghost points that appear on the edges. This is done by thresholding the dot product of the normal at a point with the point itself. Points that obey the thresholding criteria are retained. This completes the port of libpointmatcher to PCL. Now, I will be writing examples for all the techniques I added.
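The thresholding step can be illustrated with a small standard-library sketch. It is not the PCL implementation: the Vec3 type and the function name are made up for the example, the sensor is assumed to sit at the origin, and normalizing the point so that the test becomes a cosine, as well as keeping points whose absolute cosine is at or above the threshold, are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns the indices of the retained points. A point is kept when the
// absolute cosine between its (unit) normal and the viewing ray through the
// point is at least `threshold`; surfaces seen at a grazing angle, where
// ghost points typically appear, are dropped.
std::vector<std::size_t> filterShadowPoints(const std::vector<Vec3>& points,
                                            const std::vector<Vec3>& normals,
                                            float threshold)
{
  assert(points.size() == normals.size());
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < points.size(); ++i) {
    const float length = std::sqrt(dot(points[i], points[i]));
    if (length == 0.0f)
      continue;  // the viewing ray is undefined at the sensor origin
    const float cosine = std::fabs(dot(normals[i], points[i])) / length;
    if (cosine >= threshold)
      kept.push_back(i);
  }
  return kept;
}
```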

Sunday, July 08, 2012

I am adding snapshots of pcl_plotter in action, showing examples of the functionalities I discussed in my previous posts. I still don’t have a good internet connection in my new apartment, but I am uploading them over my limited cellphone connection anyway.

Most of them are plots of a given function, such as a polynomial, a rational function, or a user-defined custom function. The last two snapshots compare PCLPlotter with PCLHistogramVisualizer.

  • Plot from polynomial
_images/x2.png _images/x2x3.png
  • Plot from Rational function: y = 1/x
  • Plot from an arbitrary Rational function: y = (3x^2 + x + 2)/(6x^5 + 5x^4 + 4x^3 + 3x^2 + 2x + 1)
  • Plot from user-defined callback function (eg taken: step, abs, and identity)
  • Comparison between PCLHistogramVisualizer and PCLPlotter
  • Multiple Histogram view in PCLPlotter

adding normals and more
Sunday, July 08, 2012

I’ve finally managed to add the normals to the region growing algorithm, and the results are promising. I’ve made adding a point to the region conditional on the angle between the seed point’s normal and the current point’s normal. The results are shown in the screenshot below.


Adding this condition both helped and hurt. Although growing now stops when we reach the table top, parts of the object that are parallel to it don’t get added either. This was expected, of course. Still... better than the first try.
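The normal condition can be sketched as a simple angle test. This is an illustration under assumptions, not the code from the sprint: the Normal type, the function name normalsAgree, and the maximum-angle parameter are made up, and both normals are assumed to have unit length.

```cpp
#include <cmath>

struct Normal { float x, y, z; };

// Returns true when the angle between two unit normals is at most
// maxAngleRad, i.e. when the candidate point may be added to the region.
bool normalsAgree(const Normal& seed, const Normal& candidate, float maxAngleRad)
{
  float cosine = seed.x * candidate.x + seed.y * candidate.y + seed.z * candidate.z;
  // Clamp against rounding errors so that acos stays well defined.
  if (cosine > 1.0f) cosine = 1.0f;
  if (cosine < -1.0f) cosine = -1.0f;
  return std::acos(cosine) <= maxAngleRad;
}
```

The drawback mentioned above follows directly from this test: with a seed on a vertical side of a box, both the table top and the top face of the box have a normal at roughly 90 degrees to the seed normal, so both are rejected.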

Thanks to a friend of mine I found out about this theory:

The interesting part for me is that, according to this theory, every object can be broken down into primitive pieces. There are only 32 kinds of these primitive shapes and all of them are convex, meaning that complex objects can be separated into these parts at their concavities.

The following is an extract from the book From Fragments to Objects - Segmentation and Grouping in Vision T.F. Shipley and P.J. Kellman (Editors) 2001 Elsevier Science B.V. All rights reserved.

“The simplest shapes are convex shapes—whose outlines have positive curvature throughout (see, e.g., Rosin, 2000, for the role of convexity in parsing). If the outline of a shape has regions of negative curvature, especially if these regions contain salient negative minima of curvature, this usually indicates that the shape can be further parsed to give simpler subshapes. [...] Three main geometrical factors determine the perceptual salience of a part (Hoffman & Singh, 1997): (1) its protrusion, (2) its relative area, and (3) the strength of its boundaries. Salience of a part increases as its protrusion, relative area, or boundary strength increases. In this section we briefly consider protrusion and relative area, and then discuss in more detail the strength of part boundaries. We restrict attention to 2D shapes; the theory for 3D shapes is more complex and discussed elsewhere (Hoffman & Singh, 1997).”

“Hypothesis of normalized curvature: The salience of a part boundary increases as the magnitude of normalized curvature at the boundary increases.”

“Hypothesis of Turning Angle: The salience of a negative-minimum boundary increases as the magnitude of the turning angle around the boundary increases.”

Based on these theories I introduced another constraint to my region growing: when adding a point, the angle between the line connecting that point to the fixation point and the point’s normal is checked. If this angle is concave, it means that the point currently being verified belongs to the object. Doing so gave the following results (blue points are the ones segmented out, and the green patch on the objects is the neighborhood of the fixation point):

_images/scene_1.png _images/scene_2.png _images/scene_3.png _images/scene_4.png _images/scene_bad_2.png
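The post does not spell out the exact test, so the following sketch shows only one possible reading of the extra constraint. The assumption here is that a point is accepted when its normal does not point toward the fixation point, which is what one expects for points on a convex object that contains the fixation point. The Vec3 type and the function name are hypothetical.

```cpp
struct Vec3 { float x, y, z; };

// Returns true when the normal at `point` makes an angle of at least 90
// degrees with the line from `point` to the fixation point, i.e. when the
// normal faces away from (or is perpendicular to) that line.
bool facesAwayFromFixation(const Vec3& point, const Vec3& normal,
                           const Vec3& fixation)
{
  const Vec3 toFixation = {fixation.x - point.x,
                           fixation.y - point.y,
                           fixation.z - point.z};
  const float d = normal.x * toFixation.x + normal.y * toFixation.y +
                  normal.z * toFixation.z;
  return d <= 0.0f;
}
```

Under this reading, a point on top of a box with the fixation point inside the box passes, while a table point next to the box, whose upward normal leans toward the fixation point, is rejected.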

As can be observed, introducing that extra condition improved the results a lot. The last scene is not a good result. My next step will be to investigate the cause of that erroneous segmentation. As a final observation, it was clear to me from the beginning that this whole method depends on having a good fixation point. If this is not the case, the algorithm does not work at the moment. One of my next steps will be to investigate the method recently proposed by the authors of the original paper for automatic fixation point estimation.

With this occasion I would like to thank Zoltan-Cs. Marton for coming up with the idea of using convex vs. concave angles, it helped me a lot:). THX m8:)

Plot from rational functions and user defined callback function
Saturday, July 07, 2012

Added two functionalities:

  1. plot from rational functions which are the ratio of polynomials. Plot of 1/x looks nice.
  2. plot from a user defined callback depicting the relation between Y and X axis. The function should be continuous.
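As an illustration of the first item, the sketch below samples a rational function given by two coefficient vectors. It is not the PCLPlotter code: the function names, the coefficient ordering (index i multiplies x^i), and the policy of skipping samples where the denominator is close to zero are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Horner evaluation; coeffs[i] multiplies x^i (the ordering is an assumption).
double evalPoly(const std::vector<double>& coeffs, double x)
{
  double y = 0.0;
  for (std::size_t i = coeffs.size(); i-- > 0;)
    y = y * x + coeffs[i];
  return y;
}

// Samples p(x)/q(x) at `samples` evenly spaced abscissae in [xMin, xMax] and
// skips those where |q(x)| < eps, e.g. the pole of 1/x at x = 0.
std::vector<std::pair<double, double> > sampleRational(
    const std::vector<double>& p, const std::vector<double>& q,
    double xMin, double xMax, std::size_t samples, double eps = 1e-9)
{
  assert(samples >= 2 && xMax > xMin);
  std::vector<std::pair<double, double> > out;
  const double step = (xMax - xMin) / static_cast<double>(samples - 1);
  for (std::size_t i = 0; i < samples; ++i) {
    const double x = xMin + step * static_cast<double>(i);
    const double denominator = evalPoly(q, x);
    if (std::fabs(denominator) < eps)
      continue;
    out.push_back(std::make_pair(x, evalPoly(p, x) / denominator));
  }
  return out;
}
```

A plain polynomial is the special case where q holds the single coefficient 1.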

Snapshots coming!

Using Kinfu Large Scale to generate a textured mesh
Friday, July 06, 2012

We added a tutorial that describes the pipeline of KinFu Large Scale in order to generate a textured mesh. We hope that this tutorial is very helpful and encourages people to share their experience with the application. We are interested also in hearing your feedback and impressions on KinFu Large Scale.

The tutorial can be found in the Tutorials section of pointclouds.org

Francisco and Raphael

Plot from polynomial
Friday, July 06, 2012

Added the functionality to plot from a polynomial. This struck me as a useful functionality for a plotter class to have. The user needs to provide a vector that stores the coefficients of the polynomial, together with a range. PCLPlotter will plot the polynomial and display it on the screen.

I still don’t have a good internet connection. I will post the snapshots later.

I am very happy that the training period of my job will be over by the end of next week. It won’t be as hectic after that, as the office hours will go back to normal (currently it’s like 12-14 hours :()

Tegra 3 + Android + OpenNI + PCL’s VoxelGrid filter Sample
Friday, July 06, 2012

I’ve collected some statistics and screenshots of our new sample app that demonstrates voxel grid filtering using the PCL library. The performance isn’t something I would be very proud of, i.e., only ~2 fps for all of the processing with ~0.3 million points (307200) as input. However, it is quite usable if we are capturing something steady, perhaps to be used for 3D reconstruction in real-time.

Here are some screenshots of the sample app. It shows the RGB image, the depth image, and the 3D point cloud data, all rendered with OpenGL ES2. The downsampling gives us at least a 50% reduction in the number of points.

_images/voxel_pcl_sample_july_6.jpg _images/voxel_pcl_sample_july_6_2.jpg

Here is a little video demo of the application (running at 2fps).

We also collected some simple statistics on the performance of the algorithm, both the runtime and the compression ratio we can achieve.
07-05 21:46:59.150: I/Render Loop:(10204): Display loop 0.543398 (s)
07-05 21:46:59.730: I/PCL FILTER TESTING:(10204): Original: 307200, Filtered: 75208, Ratio: 0.244818, Sum 564766.304984

In some cases, the voxel grid filter can reduce the number of points to only a small fraction. I have seen cases where the ratio is below 10% for flat surfaces. We have only touched the surface of the PCL library, but I can see that a number of applications can be built using it. Possibilities are just limitless. ;)

TODO: we need multithreading to utilize the quad core on the Tegra 3. It seems to be an easy task. CPU #1: OpenNI engine; CPU #2: main thread with GUI; CPU #3 & #4: PCL and other processing engines. :) That way we will be fully utilizing all four cores on the Tegra 3. Also, I wonder if we can use float instead of double for all operations, and also turn off the THUMB mode! Obviously, we need to squeeze more performance out of everything. NEON optimization? Anyone?
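To make the ratio in the log concrete: 75208 / 307200 ≈ 0.2448, so roughly a quarter of the input points survive the filter in that frame. The idea behind the filter can be sketched with the standard library alone. This is a simplified illustration, not PCL's VoxelGrid implementation: the Point and Accumulator types, the function name, and the use of std::map for the voxel lookup are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

struct Point { float x, y, z; };

struct Accumulator {
  double x, y, z;
  std::size_t count;
  Accumulator() : x(0.0), y(0.0), z(0.0), count(0) {}
};

// Replaces all points that fall into the same cubic voxel of edge length
// `leaf` by their centroid.
std::vector<Point> voxelGridDownsample(const std::vector<Point>& cloud, float leaf)
{
  assert(leaf > 0.0f);
  std::map<std::tuple<int, int, int>, Accumulator> voxels;
  for (std::size_t i = 0; i < cloud.size(); ++i) {
    const Point& p = cloud[i];
    const std::tuple<int, int, int> key = std::make_tuple(
        static_cast<int>(std::floor(p.x / leaf)),
        static_cast<int>(std::floor(p.y / leaf)),
        static_cast<int>(std::floor(p.z / leaf)));
    Accumulator& a = voxels[key];
    a.x += p.x;
    a.y += p.y;
    a.z += p.z;
    ++a.count;
  }

  std::vector<Point> filtered;
  filtered.reserve(voxels.size());
  for (std::map<std::tuple<int, int, int>, Accumulator>::const_iterator it =
           voxels.begin(); it != voxels.end(); ++it) {
    const Accumulator& a = it->second;
    const double n = static_cast<double>(a.count);
    const Point centroid = {static_cast<float>(a.x / n),
                            static_cast<float>(a.y / n),
                            static_cast<float>(a.z / n)};
    filtered.push_back(centroid);
  }
  return filtered;
}
```

Flat surfaces compress well under this scheme because many neighboring depth samples land in the same voxel.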

PCLHistogramVisualizer rewrite
Thursday, July 05, 2012

All important functionalities of PCLHistogramVisualizer are now incorporated in the PCLPlotter class. They are rewritten here so that this single class can take responsibility for all plotting-related functionality. The signatures of the PCLHistogramVisualizer functions are retained for now, so that one can use the PCLPlotter class directly with the previous signatures and get a similar result. I will post snapshots in the evening when I get a good internet connection.

I made some changes in pcd_viewer so that it can use this Plotter class instead of HistogramViewer. As the function signatures are the same, the changes are minor, apart from the fact that it now uses the unified Plotter class. I will discuss with the community before committing the new pcd_viewer.

I really want to get some feedback about this Plotter class. Please update the trunk, use the class to plot anything you like, and tell me if there is something you would like to see added.

Reading up on mean shift
Thursday, July 05, 2012

Since I was a little swamped with work this week, I have not yet had time to try out how adding the normals changes my region growing, so I did some reading in my spare time about mean shift segmentation and existing implementations. Since the original papers on mean shift are not the easiest to understand, I started searching for other articles on the topic to clarify the steps involved in segmenting. I’ve found this article to be the most helpful:

SamplingSurfaceNormal filter
Wednesday, July 04, 2012

This filter recursively divides the data into grids until each grid contains a maximum of N points. Normals are computed on each grid. Points within each grid are sampled randomly and the computed normal is assigned to these points. This is a port from libpointmatcher.
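The recursive subdivision can be sketched as follows. This covers only the partitioning step; normal estimation per cell and the random sampling are left out. Splitting at the median of the axis with the largest extent is an assumption of this sketch (the post does not say how a cell is divided), and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };

inline float coordinate(const Point& p, int axis)
{
  return axis == 0 ? p.x : (axis == 1 ? p.y : p.z);
}

// Recursively splits indices[first, last) until every cell holds at most
// maxPoints points, and appends the resulting cells to `cells`.
void splitCells(const std::vector<Point>& cloud, std::vector<std::size_t>& indices,
                std::size_t first, std::size_t last, std::size_t maxPoints,
                std::vector<std::vector<std::size_t> >& cells)
{
  const std::size_t n = last - first;
  if (n <= maxPoints) {
    if (n > 0)
      cells.push_back(std::vector<std::size_t>(indices.begin() + first,
                                               indices.begin() + last));
    return;
  }
  // Pick the axis along which the cell is widest.
  int bestAxis = 0;
  float bestExtent = -1.0f;
  for (int axis = 0; axis < 3; ++axis) {
    float lo = coordinate(cloud[indices[first]], axis);
    float hi = lo;
    for (std::size_t i = first; i < last; ++i) {
      const float c = coordinate(cloud[indices[i]], axis);
      lo = std::min(lo, c);
      hi = std::max(hi, c);
    }
    if (hi - lo > bestExtent) {
      bestExtent = hi - lo;
      bestAxis = axis;
    }
  }
  // Median split: both halves are non-empty because n >= 2 here.
  const std::size_t mid = first + n / 2;
  std::nth_element(indices.begin() + first, indices.begin() + mid,
                   indices.begin() + last,
                   [&cloud, bestAxis](std::size_t a, std::size_t b) {
                     return coordinate(cloud[a], bestAxis) <
                            coordinate(cloud[b], bestAxis);
                   });
  splitCells(cloud, indices, first, mid, maxPoints, cells);
  splitCells(cloud, indices, mid, last, maxPoints, cells);
}

std::vector<std::vector<std::size_t> > partitionCloud(
    const std::vector<Point>& cloud, std::size_t maxPoints)
{
  assert(maxPoints >= 1);
  std::vector<std::size_t> indices(cloud.size());
  for (std::size_t i = 0; i < indices.size(); ++i)
    indices[i] = i;
  std::vector<std::vector<std::size_t> > cells;
  splitCells(cloud, indices, 0, indices.size(), maxPoints, cells);
  return cells;
}
```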

Moving to the future PointCloud type
Wednesday, July 04, 2012

I didn’t choose PointT as the core data structure, since different algorithms will interact with PCLModeler requesting different point cloud types. Instead, I decided to use PointCloud2 because it is a container of many fields. But now I have found that it’s not easy to support some functions, for example updating a specified field or adding fields, and the current code is messy because of the accommodation for PointCloud2. According to this thread, PointCloud2 is going to be deprecated, and the proposed point cloud type in PCL 2.x is much more friendly for PCLModeler. So I will drop PointCloud2 and re-design the core based on the proposed data structure.

Final performances results for the Stanford dataset 2
Wednesday, July 04, 2012

In what follows, the time performances related to all the considered detectors have been collected with regards to the Stanford dataset 2 and graphically visualized to foster a better understanding. In our framework, time performances refer to the keypoint extraction time of each detector and different scales have been taken into account: 6, 9, 12, 15 * scene_resolution are considered. Results are given below.

Final repeatability results for the Stanford dataset 2
Wednesday, July 04, 2012

I just got the results of the tests on the Stanford dataset. All the tests (both on the Kinect and the Stanford dataset) have been executed on a 2nd generation Intel® Core™ i5 processor with a speed of 2.4 GHz (3 GHz if TurboBoost technology is enabled). The results regarding the repeatability measure confirm the superiority of the NARF detector over the others. The repeatability results are graphically shown below:

Changes to API
Tuesday, July 03, 2012

Based on some discussions on how people would want to use the 2D module, there have been some changes in the API for this module. Images are now represented in the point cloud format itself. 2D filters will extend the pcl::Filter interface and 2D keypoints will extend the pcl::Keypoint interface. This will lead to a code structure more consistent with the rest of PCL.

This module does operations only on the RGB/Intensity channels of Organized Point Clouds. It ignores all x,y,z information for now. There are lots of features in PCL which deal with the x,y,z data. Now that the 2D module works with the same data-types, the user could use these existing features for processing the x,y,z information and the 2D module to process the RGB/Intensity information.

I’ve been focusing on designing and implementing this new API in the past few days. I’m also converting the code I wrote earlier to comply with this new API.

Hard working days
Tuesday, July 03, 2012

While being in Toulouse I had to sacrifice some of the time devoted to PCL, so this week I’m going to work hard to make up for it. Currently, I am testing the detectors on a synthetic dataset (the well-known Stanford dataset). Unfortunately, since this dataset does not contain RGB information, the only detectors under test are Harris 3D, NARF and uniform sampling. Indeed, those three detectors are characterized by a shape-based saliency measure. As for the tests executed on the Kinect-based dataset, Harris 3D is evaluated with regard to all the possible response methods. While the tests are executing, I’ve decided today to get ahead on my roadmap and started to learn about the 3DGSS detector. In particular, the reference paper I’ve read is:

Coming soon: the final evaluation results on the synthetic dataset.

Still looking for a PhD position
Tuesday, July 03, 2012

Finally, I decided to decline the offer by the LAAS-CNRS and I’m still looking for a PhD position. Any suggestion about it?

Half-edge data structure
Monday, July 02, 2012

The half-edge data structure is a popular representation for a mesh in which each edge is split up into two half-edges with opposite directions. The main topology information is stored in the half-edges. The vertices and faces are not explicitly connected to each other because they can be reached through the respective half-edges. If one wants to find, for example, all faces to which a vertex is connected, then it is necessary to go through all neighboring half-edges and refer to the faces through them. Compared to explicitly storing references from each vertex or face to all of their immediate neighbors, this has the advantage that the storage space per vertex, half-edge and face is constant, while it is still possible to access the neighborhood without an exhaustive search through the whole mesh. Another advantage is that the orientation of the faces is represented in the surrounding half-edges.

Half-edge connectivity:


The image above illustrates the connectivity information related to a specific half-edge (red), which explicitly stores indices to

  • the opposite half-edge (also called pair or twin)
  • the next half-edge
  • the previous half-edge (this is actually not a requirement of the half-edge data structure but it is useful for the implementation)
  • the terminating vertex
  • the face it belongs to

Other elements in the mesh can be reached implicitly (gray)

  • originating vertex = opposite half-edge -> terminating vertex
  • opposite face = opposite half-edge -> face

By using a convention in the implementation even the opposite half-edge can be accessed implicitly in constant time and its storage space can be saved: If half-edges are always added in pairs then their respective indices come right after each other. Given a half-edge index we can then derive the opposite index simply by checking if it is even or odd.
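The pairing convention can be written down in a few lines. The function names below are made up for this sketch; the only assumption beyond the text is that the first half-edge of a pair gets the even index.

```cpp
#include <cstddef>

// Half-edges are stored in pairs (0,1), (2,3), (4,5), ... so the opposite of
// an even index is the next index and the opposite of an odd index is the
// previous one. No opposite index needs to be stored.
inline std::size_t oppositeHalfEdge(std::size_t halfEdge)
{
  return (halfEdge % 2 == 0) ? halfEdge + 1 : halfEdge - 1;
}

// Both half-edges of a pair belong to the same (undirected) edge.
inline std::size_t edgeOfHalfEdge(std::size_t halfEdge)
{
  return halfEdge / 2;
}
```

The same result can be obtained by flipping the lowest bit of the index (halfEdge ^ 1).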

Vertex connectivity:


The image above illustrates the connectivity information related to a specific vertex (red), which explicitly stores an index to

  • one of its outgoing half-edges

The corresponding incoming half-edge can be reached implicitly (gray)

  • incoming half-edge = outgoing half-edge -> opposite half-edge

Although only one of the outgoing half-edges is stored the others can be reached by circulating around the vertex. Given any of the outgoing half-edges the next outgoing half-edge can be reached by

  • outgoing half-edge = outgoing half-edge -> previous half-edge -> opposite half-edge (counter-clockwise)
  • outgoing half-edge = outgoing half-edge -> opposite half-edge -> next half-edge (clockwise)

This procedure has to be continued until a full loop around the vertex has been completed. Similarly it is possible to use the incoming half-edge for the circulation

  • incoming half-edge = incoming half-edge -> opposite half-edge -> previous half-edge (counter-clockwise)
  • incoming half-edge = incoming half-edge -> next half-edge -> opposite half-edge (clockwise)

With a slight modification this allows us to access the vertices or faces as well (only shown in counter-clockwise order)

  1. vertex = outgoing half-edge -> terminating vertex
  2. outgoing half-edge = outgoing half-edge -> previous half-edge -> opposite half-edge


  1. face = outgoing half-edge -> face
  2. outgoing half-edge = outgoing half-edge -> previous half-edge -> opposite half-edge

This procedure is continued until a full loop is completed. Using these basic operations it is possible to find all neighbors around a vertex, which is also called the one-ring neighborhood.
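The counter-clockwise circulation can be sketched on a small closed mesh. This is a standard-library illustration rather than the data structure developed in the sprint: the struct layout, the builder function, and the restriction to closed, consistently oriented triangle meshes (so that every half-edge has a valid previous half-edge) are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct HalfEdge {
  int next;    // next half-edge in the same face
  int prev;    // previous half-edge in the same face
  int vertex;  // terminating vertex
  int face;    // face the half-edge belongs to
};

struct Mesh {
  std::vector<HalfEdge> halfEdges;
  std::vector<int> outgoing;  // per vertex: index of one outgoing half-edge
};

// Half-edges are added in pairs, so the opposite differs only in the last bit.
inline int opposite(int halfEdge) { return halfEdge ^ 1; }

// Builds the connectivity of a closed triangle mesh from its face list.
Mesh buildClosedTriangleMesh(int numVertices,
                             const std::vector<std::array<int, 3> >& faces)
{
  Mesh mesh;
  mesh.outgoing.assign(static_cast<std::size_t>(numVertices), -1);
  std::map<std::pair<int, int>, int> lookup;  // (from, to) -> half-edge index
  for (std::size_t f = 0; f < faces.size(); ++f) {
    int he[3];
    for (int k = 0; k < 3; ++k) {
      const int from = faces[f][k];
      const int to = faces[f][(k + 1) % 3];
      assert(from >= 0 && from < numVertices);
      auto it = lookup.find(std::make_pair(from, to));
      if (it == lookup.end()) {
        // First time this edge is seen: allocate both half-edges as a pair.
        const int idx = static_cast<int>(mesh.halfEdges.size());
        const HalfEdge forward = {-1, -1, to, -1};
        const HalfEdge backward = {-1, -1, from, -1};
        mesh.halfEdges.push_back(forward);
        mesh.halfEdges.push_back(backward);
        lookup[std::make_pair(from, to)] = idx;
        lookup[std::make_pair(to, from)] = idx + 1;
        he[k] = idx;
      } else {
        he[k] = it->second;
      }
      mesh.halfEdges[he[k]].face = static_cast<int>(f);
      mesh.outgoing[from] = he[k];
    }
    for (int k = 0; k < 3; ++k) {
      mesh.halfEdges[he[k]].next = he[(k + 1) % 3];
      mesh.halfEdges[he[k]].prev = he[(k + 2) % 3];
    }
  }
  return mesh;
}

// One-ring neighborhood: outgoing -> previous -> opposite until the loop closes.
std::vector<int> oneRingVertices(const Mesh& mesh, int vertex)
{
  std::vector<int> ring;
  const int start = mesh.outgoing[vertex];
  int he = start;
  do {
    ring.push_back(mesh.halfEdges[he].vertex);
    he = opposite(mesh.halfEdges[he].prev);
  } while (he != start);
  return ring;
}
```

On a tetrahedron, for example, the one-ring of any vertex consists of the other three vertices.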

Face connectivity


The image above illustrates the connectivity information related to a specific face (red), which explicitly stores an index to

  • one of the inner half-edges

The corresponding outer half-edge can be reached implicitly (gray)

  • outer half-edge = inner half-edge -> opposite half-edge

As for the vertices the other inner half-edges can be reached by circulating around the face

  • inner half-edge = inner half-edge -> next half-edge (counter-clockwise)
  • inner half-edge = inner half-edge -> previous half-edge (clockwise)

Each outer half-edge or vertex is referenced from its corresponding inner half-edge.
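Circulating around a face only needs the next index and the terminating vertex. The sketch below uses a reduced HalfEdge struct with just those two fields; the struct and function names are assumptions made for the illustration.

```cpp
#include <cstddef>
#include <vector>

struct HalfEdge {
  std::size_t next;  // next half-edge in the same face
  int vertex;        // terminating vertex
};

// Collects the vertices of a face by following next pointers, starting at
// the inner half-edge stored in the face, until the loop closes.
std::vector<int> faceVertices(const std::vector<HalfEdge>& halfEdges,
                              std::size_t innerHalfEdge)
{
  std::vector<int> vertices;
  std::size_t he = innerHalfEdge;
  do {
    vertices.push_back(halfEdges[he].vertex);
    he = halfEdges[he].next;
  } while (he != innerHalfEdge);
  return vertices;
}
```

With the pairing convention described earlier, the outer half-edge belonging to each visited inner half-edge is obtained by flipping the lowest bit of its index.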


Boundaries can be represented by an invalid face index in the half-edge. It is important that both half-edges are kept in the mesh because otherwise it would no longer be possible to access all neighbors. A half-edge with an invalid face index is called a boundary half-edge. Accordingly, a vertex that is connected to a boundary half-edge is called a boundary vertex.

One can circulate around the boundary with

  • boundary half-edge = boundary half-edge -> next half-edge (clockwise)
  • boundary half-edge = boundary half-edge -> previous half-edge (counter-clockwise)

This is the same as circulating through the inner half-edges of a face with the only difference that the direction (clockwise, counter-clockwise) is reversed.

If the mesh has a boundary then one has to be very careful when accessing the faces through the half-edges in order to avoid dereferencing an invalid face index.
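A guarded face access and the boundary circulation can be sketched as follows. The value -1 for the invalid face index, the reduced struct, and the function names are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kInvalidFace = -1;  // assumed marker for "no face" (boundary)

struct HalfEdge {
  std::size_t next;  // next half-edge in the same face or boundary loop
  int face;          // face index, or kInvalidFace for a boundary half-edge
};

inline bool isBoundary(const std::vector<HalfEdge>& halfEdges, std::size_t he)
{
  return halfEdges[he].face == kInvalidFace;
}

// Returns the valid faces on both sides of the edge that `he` belongs to,
// checking each face index before it is used.
std::vector<int> facesOfEdge(const std::vector<HalfEdge>& halfEdges, std::size_t he)
{
  std::vector<int> faces;
  const std::size_t both[2] = {he, he ^ static_cast<std::size_t>(1)};
  for (int k = 0; k < 2; ++k) {
    if (!isBoundary(halfEdges, both[k]))
      faces.push_back(halfEdges[both[k]].face);
  }
  return faces;
}

// Counts the half-edges of the boundary loop that contains `start`.
std::size_t boundaryLoopLength(const std::vector<HalfEdge>& halfEdges,
                               std::size_t start)
{
  assert(isBoundary(halfEdges, start));
  std::size_t length = 0;
  std::size_t he = start;
  do {
    ++length;
    he = halfEdges[he].next;
  } while (he != start);
  return length;
}
```

For a single triangle, the three outer half-edges form one boundary loop of length three, and every edge has exactly one valid face.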


I will talk about manifoldness in my next blog post.

Region growing based only on boundaries.
Monday, July 02, 2012

For the approximation of the boundaries I used the class implemented by our fellow GSOC student Changhyun Choi.

Because the active segmentation method is most likely to be used in a robotic application, the scenes involved, as described in the paper, are table top scenes containing several objects. I chose the scene below for testing because there are multiple objects on the table, some occluding others. As this is the first scene I will run the algorithm on, I did not want it to be too complex (e.g. an extremely cluttered scene).


After implementing and testing the first version of the region growing I got the results shown in the screenshot below. Just to clarify: everything that is blue gets segmented out. The green points on the large box are the points near the “fixation point”. This version of region growing is based only on the borders, and since there are no borders where the box touches the table, growing does not stop. To get around this problem I will take the normals of the points into consideration as well when growing.

Android + Openni Autoscript
Sunday, July 01, 2012

Thanks to Radu and others, we have finalized the autoscript for compiling the OpenNI for Android. These are all integrated to the Android sample project. Also, we have compiled the PCL for android, and in the coming weeks we shall have a complete sample project. These can be used as our standard template for development of PCL + OpenNI on Android.

Happy Canada Day.

Final performances results for the dataset based on kinect data
Saturday, June 30, 2012

In what follows, the time performances related to all the considered detectors have been collected and graphically visualized to foster a better understanding. In our framework, time performances refer to the keypoint extraction time of each detector and different scales have been taken into account: 6, 9, 12, 15 * scene_resolution are considered. Results are given below.

Intermediate results
Friday, June 29, 2012

I have graduated from my university. Now I have a master’s degree and enough time to work on the project. I worked out and implemented in Matlab a surface editing method based on Laplacian coordinates. This transformation is the basis of the method that I have to implement in the Code Sprint. The implemented algorithm, Laplacian Mesh Editing, consists of the following steps:

  1. Compute the Laplacian operator of the mesh from the relative positions of the surface points.
  2. Fix points whose position should not change - “static anchors”.
  3. Choose the points whose position should change and specify their offset from the original position - “handle anchors”.
  4. Based on the available information, construct the normal equations. The static anchors get a large weight and the handle anchors a small weight.
  5. The new coordinates of the surface are calculated based on the method of least squares.
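One common way to write steps 1 to 5 as a single least-squares problem is given below. The notation is an assumption and does not come from the Matlab code: V holds the original vertex positions, L is the Laplacian operator, δ = LV are the Laplacian coordinates, S and H are the index sets of static and handle anchors, w_s and w_h are their weights, and d_k is the offset specified for handle k.

```latex
\begin{aligned}
E(V') &= \lVert L V' - \delta \rVert_F^2
        + w_s^2 \sum_{j \in S} \lVert v'_j - v_j \rVert^2
        + w_h^2 \sum_{k \in H} \lVert v'_k - (v_k + d_k) \rVert^2 ,
        \qquad \delta = L V, \\[4pt]
A &= \begin{bmatrix} L \\ w_s C_S \\ w_h C_H \end{bmatrix}, \qquad
b = \begin{bmatrix} \delta \\ w_s V_S \\ w_h (V_H + D_H) \end{bmatrix}, \qquad
A^{\top} A \, V' = A^{\top} b .
\end{aligned}
```

Here C_S and C_H are selection matrices that pick the rows of V' belonging to the static and handle anchors, and the last equation is the normal-equation system from step 4, whose solution gives the new coordinates of step 5.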

The results of the program are presented below.

_images/Man.png _images/Circ.png

Now I proceed to implement the algorithm from article “Template Deformation for Point Cloud Fitting”.

Final repeatability results for the dataset based on kinect data
Friday, June 29, 2012

I’m currently visiting the LAAS-CNRS in Toulouse, to see if it could be a good place for me to pursue PhD studies. Meanwhile, I’ve finished testing the detectors on the Kinect-based dataset. Again, I want to thank Bastian Steder for helping me with the NARF detector. Results are graphically shown below:

Segmenting around a fixation point
Friday, June 29, 2012

As promised in a previous post, I took the time to create a basic flow chart of the steps involved in segmenting around a fixation point. The steps in the upper part of the chart (getting a boundary map, setting the input cloud, etc.) need to be implemented by whoever wants to use the ActiveSegmentation class (an example of this will shortly be available in pcl/examples); the steps in the lower part are implemented in the class. It is left to the user’s discretion to choose an appropriate boundary map. I chose to do it this way to have more flexibility, and because many others are working on edge/boundary detection at the moment. My tests will be based on the already implemented Boundary detection (which works for ordered and unordered point clouds alike) and on mapping 2D edge detections to the 3D cloud.

  • Flow Chart Active Segmentation

In my next post I will share some preliminary results as well.

Mesh basics
Wednesday, June 27, 2012

It’s been a while since my last blog post. I have been moving into a new home and started with a job. I hope I can find the right balance between this and the code sprint. Before starting with the half-edge data structure I would like to introduce a few mesh basics, just to make the terminology clear and consistent. Here are two links that should provide more in-depth information:

A polygon mesh is a data structure where the geometry is stored along with the topology (connectivity information). The points or nodes inside a mesh are called vertices and hold the geometry. An edge is a connection between two vertices. Three or more edges that form a closed loop are called a face, i.e. a triangle, quadrilateral, pentagon, ..., polygon. The edges and faces define the topology. A mesh that consists only of triangles is called a triangle-mesh, a mesh that consists only of quadrilaterals is called a quad-mesh, and so on.

The mesh has no boundary if each edge is connected to exactly two faces. If any edge is connected to only one face then the mesh has a boundary, shown in the next image (taken from the Geometry Processing Algorithms lecture, Basics).

Mesh with boundary

If any edge is connected to more than two faces (left) or if two or more faces are connected by only one vertex (middle, right) then the mesh becomes non-manifold.

Non-manifold configurations
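The edge-based part of these definitions can be checked by counting how many faces share each edge. The sketch below is an illustration with made-up names; it covers only the edge criterion (left case) and does not detect faces that touch at a single vertex.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct EdgeStats {
  std::size_t boundary;     // edges connected to exactly one face
  std::size_t interior;     // edges connected to exactly two faces
  std::size_t nonManifold;  // edges connected to more than two faces
};

// Counts, for a triangle list, how many faces each undirected edge belongs to.
EdgeStats classifyEdges(const std::vector<std::array<int, 3> >& triangles)
{
  std::map<std::pair<int, int>, std::size_t> faceCount;
  for (std::size_t t = 0; t < triangles.size(); ++t) {
    for (int k = 0; k < 3; ++k) {
      const int a = triangles[t][k];
      const int b = triangles[t][(k + 1) % 3];
      ++faceCount[std::make_pair(std::min(a, b), std::max(a, b))];
    }
  }

  EdgeStats stats = {0, 0, 0};
  for (std::map<std::pair<int, int>, std::size_t>::const_iterator it =
           faceCount.begin(); it != faceCount.end(); ++it) {
    if (it->second == 1)
      ++stats.boundary;
    else if (it->second == 2)
      ++stats.interior;
    else
      ++stats.nonManifold;
  }
  return stats;
}
```

A mesh has no boundary when the boundary count is zero, and the edge criterion for manifoldness holds when the non-manifold count is zero.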

Another property to consider when working with meshes is the orientation of the faces. If the faces in the mesh have a consistent orientation (all clockwise or all counter-clockwise) then the mesh is orientable (left). A non-orientable configuration is shown in the right. An example for a non-orientable surface is the Möbius strip.

Current UI and Functions
Tuesday, June 26, 2012

The current UI is shown in the following image. There are 3 groups of functions to be implemented: filters, registration and surface reconstruction. I’ve implemented one filter. More filters can be added quickly, but I will do that later, once the whole framework is more stable. I will implement one function for each group, then add more. I am working on Poisson reconstruction now, and registration comes next.



  • The point clouds can be rendered either in the main window or in any other dockable window. This is quite useful for registration: for example, the user can put all the frames in the main window and each frame in a separate dockable window, interact with the dockable windows, and see how well the frames align in the main window.
  • The render windows and clouds are organized in the scene explorer, where contextual menus are supported, so the user can easily access the available functions for the elements in the scene.
  • The user can turn on/off some channels when rendering the clouds.
First results
Tuesday, June 26, 2012

Here, I show the first results coming from testing all the detectors but NARF. NARF has been under investigation a little bit more, but thanks to Bastian Steder the issues I have encountered should be solved.

The absolute repeatability graph:


The relative repeatability graph:


The time performances graph (related to a 6mr scale):

Kinfu Large Scale available
Monday, June 25, 2012

We have pushed the first implementation of KinFu LargeScale to PCL trunk. It can be found under $PCL-TRUNK/gpu/kinfu_large_scale.

We will write a more complete tutorial for the application. For now, here are some recommendations for those who want to try it out:

Make smooth movements at the time of shifting in order to avoid losing track.

Use in an environment with enough features to be detected. Remember ICP gets lost in co-planar surfaces, or big planar surfaces such as walls, floor, roof.

When you are ready to finish execution, press ‘L’ to extract the world model and shift once more. Kinfu will stop at this point and save the world model as a point cloud named world.pcd. The generated point cloud is a TSDF cloud.


In order to obtain a mesh from the generated world model (world.pcd), run “./bin/process_kinfuLS_output world.pcd”. This should generate a set of meshes (.ply) which can then be merged in Meshlab or similar.


Francisco and Raphael

Short notification
Sunday, June 24, 2012

After thinking about it and discussing it, I have decided to go ahead and implement a region growing method based on a boundary map and the fixation point. Calculating and setting the boundary map will be left up to the user; the method segments the region around the fixation point that is enclosed by a boundary. I will also shortly add an example of how this is done in pcl/examples.
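The idea of growing a region from the fixation point until a boundary is hit can be illustrated on a 2D grid. The sketch below is plain Python and not the ActiveSegmentation implementation: it assumes the boundary map is a grid of 0/1 cells and uses 4-connectivity, whereas the PCL class works on point clouds and their neighbourhoods. The name grow_region is hypothetical.

```python
from collections import deque


def grow_region(boundary_map, seed):
    """Collect all cells reachable from the seed without crossing a boundary.

    boundary_map: list of rows, truthy cells mark boundary cells.
    seed: (row, col) of the fixation point.
    Returns a set of (row, col) cells; empty if the seed lies on a boundary.
    """
    rows, cols = len(boundary_map), len(boundary_map[0])
    if boundary_map[seed[0]][seed[1]]:
        return set()

    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and not boundary_map[nr][nc]:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

If the boundary around the fixation point is not closed, the region leaks into the rest of the grid, which is why the quality of the user-supplied boundary map matters.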

Progress on head detection and pose estimation
Friday, June 22, 2012

Hi again!

It has been some time since my last post. I was on vacation for some days, then sick, and afterwards busy catching up on everything after the inactivity. Anyway, I have resumed work on head detection + pose estimation, reimplementing the approach from Fanelli at ETH. I implemented the regression part of the approach, so that the trees provide information about the head location and orientation, and made some improvements to the previous code. I used the purity criterion to activate regression, which seemed the most straightforward option.


The red spheres show the predicted head locations after filtering out sliding windows that reach leaves with high variance and are therefore not accurate. As you can see, there are several red spheres at non-head locations. Nevertheless, the approach relies on a final bottom-up clustering to isolate the different heads in the image. The size of the clusters makes it possible to threshold head detections and, eventually, remove outliers.

I hope to commit a working (and complete) version quite soon together with lots of other stuff regarding object recognition.

Tests launched today!
Friday, June 22, 2012

This week I lost some time debugging my repeatability calculator. Since I had misunderstood how the keypoint repeatability should be computed, I’ve decided to devote a little section of my blog to this topic. You can find how to compute the absolute and relative repeatability here. After solving these issues, I spent some time defining more suitable parameters for the detectors under consideration. For this purpose, I used the frameworks for simple keypoint detection evaluation, since they make it possible to visualize the results and so to detect errors immediately. I have also changed a few things in the visualization.

Now, the display of the model shows:

  • all the keypoints extracted from the model by applying a specific detector (green color)
  • all the model keypoints that are not occluded in the scene (red color)
  • all the model keypoints that are repeatable (blue color).

The display of the scene, in turn, shows:

  • all the keypoints extracted from the scene by applying a specific detector (fuchsia color)
  • all the scene keypoints that are not occluded in the model (green color).

Tests have been executed on synthetic data. Here, I post two screenshots related to the Harris3D detectors:

  • model view
  • scene view

Finally, today I’ve launched the first evaluation tests, so in the following days I will post the available results.

First functional plugin!
Friday, June 22, 2012

The normal estimation plugin works as you might expect it would: it calculates normals. It uses the undo/redo framework and the work_queue system, so you can undo/redo adding normals to your project as much as you want, and calculations are done in a separate thread, so the GUI doesn’t lock up while it’s thinking.

I’ll add progress bars soon, but that begs the question, how can I estimate progress for pcl functions? I can emit progress updates BETWEEN individual PCL calls in a plugin (such as between KD tree calculation and normal estimation in the normal estimation plugin) but getting real timing info would require putting some sort of macro in the functions themselves.
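The coarse variant described above, emitting an update between individual calls, could look roughly like the following. This is a minimal sketch in plain Python rather than Qt/PCL code; run_stages, the stage names and the callback signature are all hypothetical.

```python
def run_stages(stages, on_progress):
    """Run named stages in order and report progress after each one.

    stages: list of (name, callable) pairs, e.g. a KD tree build followed
            by the normal estimation itself.
    on_progress: callback receiving (stage_name, fraction_done).
    Progress is only as fine-grained as the number of stages.
    """
    results = []
    total = len(stages)
    for index, (name, func) in enumerate(stages, start=1):
        results.append(func())
        on_progress(name, index / total)
    return results
```

With only two stages the bar would jump from 0% to 50% to 100%, which shows the limitation: anything smoother needs progress reporting from inside the functions themselves.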

Another consideration is how tools should be activated. Right now I have a button which you click, which runs the selected tool. This of course is only temporary, but I’m not sure what the best replacement is. For selector tools, it’s pretty easy, but for things like adding normals or segmentation, what’s the most intuitive way of activating the tools?

High Curvature Edges
Friday, June 22, 2012

During this week, I have implemented ‘high curvature edge’ estimation. Here, the high curvature edges are defined as ridge or valley edges that are not on objects’ boundaries. These sharp edges could be useful in registration, recognition, and tracking applications. I first tried the curvature values obtained from normal estimation, but it turned out that a simple thresholding scheme using curvature values does not work very well. Especially, it was hard to get clean and thin edges on these high curvature regions. I noticed that non-maximum suppression and hysteresis thresholding are required. So I employed a canny edge implementation (pcl::pcl_2d::edge::canny()). Instead of using the RGB values available in the given organized point cloud, I used ‘normal_x’ and ‘normal_y’ images, since high gradient responses on these normal images correspond to high curvature regions.
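To illustrate why hysteresis thresholding helps compared to a single threshold, here is a minimal sketch in plain Python. It is not the PCL Canny implementation: it assumes the gradient response is already available as a 2D list and skips non-maximum suppression entirely. The function name is hypothetical.

```python
from collections import deque


def hysteresis_threshold(response, low, high):
    """Keep strong responses and weak responses connected to strong ones.

    response: 2D list of gradient magnitudes.
    Cells >= high are edge seeds; cells >= low are kept only if they are
    8-connected (directly or through other kept cells) to a seed.
    Returns a 2D list of booleans.
    """
    rows, cols = len(response), len(response[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if response[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))

    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc]
                        and response[nr][nc] >= low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges
```

A weak response next to a strong one is kept, while an isolated weak response of the same magnitude is discarded, which is what removes the speckle a single low threshold would produce.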

Following images show the detected occluding (green), occluded (red), boundary (blue), and high curvature (yellow) edges:

_images/screenshot-1340407069.png _images/screenshot-1340407079.png _images/screenshot-1340407086.png
Qt Model/View and Thread
Thursday, June 21, 2012

Now the point clouds and render windows are managed by the scene tree, and thread support has been added for the workers. The QThread documentation seems to be a mess, and there is a lot of discussion about it; in the end I adopted what seems to be the best practice for using QThread (http://mayaposch.wordpress.com/2011/11/01/how-to-really-truly-use-qthreads-the-full-explanation/).

A Quick Update on Plugins
Thursday, June 21, 2012
Work is progressing on the plugin tool, undo/redo command, and work queue. After spending quite a bit of time thinking about how the architecture would work, I had a basic idea of how the components would connect together. So, after making the interface and basic headers, I began writing the first plugin tool (normal_estimation). As I went along, I realized that certain parts of the design were impractical or inefficient (such as having commands handle all memory and preventing plugins from reading from the model items directly, or having the work queue dispatch multiple worker threads at once). Overall though, the design is what I showed in my last post, and things are coming together well, even if somewhat slower than I would have hoped. Once the first plugin is finished, things should progress quickly in getting the other tools working, since, for the most part, I’ll just be implementing the tutorials as plugins.
Quick update
Thursday, June 21, 2012

I’m trying to push harder this week to get a working implementation of the active segmentation approach. I started inserting my existing code parts into the PCL API, created the base classes and added the basic functionality to them. I’m planning on creating a flowchart to illustrate how things work, but life keeps getting in the way :). Anyway, I am having a little trouble deciding which way to go once I have the fixation point and the boundary map. I could do this the way it was done in the original publication and implement a mapping between the 2D image and the point cloud, or I could try implementing it using region growing with the fixation point as the seed. I will have to decide on this, or maybe try out both and see where that leads me.

XML interface for people
Wednesday, June 20, 2012

I’ve added an XML interface to load/save person-specific configuration files. The generic XML file is also saved in trunk. This will allow people to tune the kinematic chain to themselves to improve the tracking results. This XML interface is already being extended to a v0.2, which will also include the directed acyclic graph (DAG) description of the kinematic chain tree, allowing users to reconfigure it and place the root at a different position. The XML interface is now part of the PersonAttribs class and can be used in the examples with the -XML tag. For the summer I’ll have a student to help me with the training process, as we now have access to the AWS cluster to enhance our training facilities. We will also be looking into training on GPU. If anybody has good knowledge about how to achieve this (RDF training in CUDA) and is willing to point this out to me or to help out, please send me an email!

Results from implementations
Wednesday, June 20, 2012

There was a PCL tutorial at CVPR recently, where some of the upcoming features in PCL were also presented. This included some of the work that I have been doing. Following are some of the results that I obtained :

Edge Detection :


Morphological Operations :


Harris Corners :

First segmentation results
Wednesday, June 20, 2012

I implemented the following two papers and modified them to handle n-dimensional data inputs given as a point cloud.

  • Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, Philipp Krähenbühl, Vladlen Koltun
  • Fast high-dimensional filtering using the permutohedral lattice, A. Adams, J. Baek, and M. A. Davis.

The input for the segmentation algorithm is a point cloud with XYZRGB (will be extended further) ordered and unordered. To convert the input point cloud into a Conditional Random Field I am using a modified voxel grid for scene discretization. Each cell with a measurement becomes a node in the graphical model. For now the edge potential features incorporate position as well as color information.

As a first step I am using the algorithm for supervised segmentation, with a hand-labeled input scene. In the following picture the input cloud (left) and the labeled cloud (right) can be seen.


The labels are used to initialize the unary potentials of the CRF as follows. A point with an associated label gets a probability of 0.3 that the label is correct. Furthermore, I assign a randomly chosen wrong label to 10% of the points.
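A possible reading of this initialization is sketched below in plain Python. Several things here are assumptions and not taken from the implementation: that the remaining probability mass is spread uniformly over the other labels, that the unary energy is the negative log-probability (as is common for dense CRFs), and the helper names themselves.

```python
import math
import random


def unary_energies(label, num_labels, p_label=0.3):
    """Negative log-probabilities for one labeled point.

    The given label gets probability p_label; the rest is shared equally
    among the other labels (an assumption). With p_label = 0.3 the given
    label is only the most likely one if there are at least 4 labels.
    """
    p_other = (1.0 - p_label) / (num_labels - 1)
    return [-math.log(p_label if k == label else p_other)
            for k in range(num_labels)]


def corrupt_labels(labels, num_labels, fraction=0.1, seed=0):
    """Replace a fraction of the labels with a randomly chosen wrong label."""
    rng = random.Random(seed)
    noisy = list(labels)
    count = int(round(fraction * len(labels)))
    for i in rng.sample(range(len(labels)), count):
        wrong = [k for k in range(num_labels) if k != labels[i]]
        noisy[i] = rng.choice(wrong)
    return noisy
```

The corrupted labels play the role of the noisy input that the CRF inference is expected to clean up again.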

In the next image you can see on the left the noisy point cloud initialized with the unary energies. On the right you can see the result after segmentation.

Modified Voxel Grid Filter for PointXYZRGBL
Wednesday, June 20, 2012

To be able to construct a Conditional Random Field from a point cloud, I am using a voxel grid to discretize the sensor input. This makes it possible to work in grid cells rather than Euclidean distances. The problem I was facing, described below, made it necessary to extend the current voxel grid implementation with a new subclass inherited from pcl::VoxelGrid<PointT>. The current implementation of the voxel grid filter does a linear interpolation over the position and the other PointCloud fields of all points lying in the same grid cell. This is problematic for the point type PointXYZRGBL, which assigns a label to every point in the cloud. Interpolating these labels can produce wrong values, since they are just simple unsigned integer values. In my implementation I replaced the interpolation with a voting scheme for the field ‘label’: if several points lie in the same grid cell, the label with the highest number of occurrences wins.
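The voting scheme can be sketched in a few lines of plain Python. This is an illustration only, not the pcl::VoxelGrid subclass: the function name is hypothetical, points are simple (x, y, z, label) tuples, color is left out, and the position is reduced to the per-cell average.

```python
import math
from collections import Counter, defaultdict


def voxel_grid_with_label_vote(points, leaf_size):
    """Downsample labeled points: average positions, majority-vote labels.

    points: iterable of (x, y, z, label) tuples.
    Returns one (x, y, z, label) tuple per occupied grid cell.
    """
    cells = defaultdict(list)
    for x, y, z, label in points:
        key = (math.floor(x / leaf_size),
               math.floor(y / leaf_size),
               math.floor(z / leaf_size))
        cells[key].append((x, y, z, label))

    result = []
    for key in sorted(cells):
        members = cells[key]
        n = len(members)
        cx = sum(p[0] for p in members) / n
        cy = sum(p[1] for p in members) / n
        cz = sum(p[2] for p in members) / n
        # Averaging labels 1, 1 and 5 would give 2.33, which is not a
        # valid label; the most frequent label is used instead. On a tie
        # the label seen first in the cell wins.
        label = Counter(p[3] for p in members).most_common(1)[0][0]
        result.append((cx, cy, cz, label))
    return result
```

Three points with labels 1, 1 and 5 in the same cell end up as one point with label 1, whereas averaging would have produced a label that does not exist in the scene.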


On the left side you can see the behaviour of the modified voxel grid filter. On the right side, which shows the standard voxel grid filter, you can see that the labels are wrong due to the interpolation.

Face detection in PCL
Tuesday, June 19, 2012

I’ve added the FaceDetector class framework and committed it. In the background I’ve also started porting the NPP face detector implementation from Nvidia to PCL. However, as there is still some uncertainty about its license, I’m waiting for a reply from Nvidia before I commit it to trunk. This includes a GPU Viola-Jones implementation from Anton Obukhov, “Haar Classifiers for Object Detection with CUDA”, explained in the GPU Computing Gems book.

Kinfu Large Scale
Monday, June 18, 2012

We have been working on the implementation of Kinfu’s extension to large areas for several weeks now. This week we will start with the integration of the code to the latest trunk. However, it will be for now pushed as a separate module named Kinfu Large Scale. The reason behind this is to keep the functionality of the current KinFu, but at the same time to make available the large scale capabilities to those interested in exploring them.

The diagrams below show the distribution of classes in KinFu Large Scale, as well as a flowchart of the behaviour in the application.

_images/07.png _images/08.png

We will post an update on the integration status by the end of this week.

Francisco and Raphael

Monday, June 18, 2012

I completed the first draft of pcl::visualization::PCLPlotter which is available in the trunk. So, you can update your trunk to use it and give feedback and suggestions.

Using this class one can plot graphs from given point correspondences. Several types of plots are supported, including line, bar and point plots.

It also includes an API for plotting a histogram from given data. This provides functionality similar to the histogram plotting function (hist) of MATLAB.
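For readers unfamiliar with hist-style plotting, the data preparation behind such a plot amounts to counting values in equal-width bins. The sketch below is plain Python and does not use the PCLPlotter API; the function name and the default of 10 bins are assumptions chosen for illustration.

```python
def histogram(data, nbins=10):
    """Count values in nbins equal-width bins between min and max.

    Returns (bin_centers, counts). The maximum value is placed in the
    last bin.
    """
    lo, hi = min(data), max(data)
    # Guard against a zero bin width when all values are identical.
    width = (hi - lo) / nbins if hi > lo else 1.0
    counts = [0] * nbins
    for value in data:
        index = min(int((value - lo) / width), nbins - 1)
        counts[index] += 1
    centers = [lo + (i + 0.5) * width for i in range(nbins)]
    return centers, counts
```

The bin centers and counts are what a bar plot would then display on the x and y axes.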

The types of plots that can be created using this class are almost the same as those possible with its building-block VTK classes. There are very few restrictions, and no knowledge of VTK and its pipeline is required, which makes it powerful. I am adding some snapshots of the output of the plotter class to show its range.

_images/sinecos.png _images/sinecos1.png _images/histuniform.png _images/group.png _images/smoothred.png
Android + PCL?
Monday, June 18, 2012

Having a hard time replicating the result here.


My next task is to update the instructions in the mobile_apps directory and compile a binary that I can use for my Android development. The mobile_apps directory will then be updated with our new code for Android. Most likely the first thing we will see is a port of the OpenNI + PCL sample to Android. After that, simple tracking will be added to it.

I really wonder what the performance is like on Tegra 3. That will be my first report, with some sort of side-by-side comparison. A few optimizations will be added once I can get the code to compile properly.

Parameter and Worker
Monday, June 18, 2012

It seems vtk doesn’t have ready to use draggers...so I leave it for now. Instead, I implemented a rough support for invoking workers and setting up the parameters for the workers. I will polish it and add support for more workers.

Outofcore Updates
Sunday, June 17, 2012

I am still straightening out some I/O and insertion features with PointCloud2 and the lzf compression. I am hoping to have that part completely functional this week so Justin can finish integrating the API changes with his visualizer. I have not started changing the structure of the classes to reduce the amount of templating–I will hold off on that until we have the outofcore system up and running. Stay tuned for updates.

Gaussian kernel convolution
Sunday, June 17, 2012

I’ve added a GPU Gaussian kernel convolution. This makes it possible to add a system model for the human in order to perform sensor fusion on the pixel labels. The prediction step will make use of the Gaussian uncertainty. I’ve also added the Gaussian kernel generation function.
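As a CPU-side illustration of what kernel generation and convolution involve, here is a minimal 1D sketch in plain Python. It is not the GPU code from the people library; the function names, the 3-sigma default radius and the clamped border handling are assumptions.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1D Gaussian kernel of length 2 * radius + 1."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # covers about 99.7% of the mass
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_1d(signal, kernel):
    """Convolve a signal with a symmetric kernel; borders are clamped."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out
```

Because the 2D Gaussian is separable, an image can be blurred by applying such a 1D pass along the rows and then along the columns.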

Hectic Times
Sunday, June 17, 2012

Several things happened in the past weeks. It has been very busy and hectic.

Firstly, I got relocated to Chennai (about 1500 km away from my home) for the job I am starting after my undergraduate studies. I never expected the joining date to be this early. Usually, joining starts around mid July for most of the companies recruiting from our college. But it seems that I got a little (un)?lucky! Whatever the case may be, I will make sure that this work on PCL goes smoothly and gets completed.

Before the relocation, I pushed myself hard to figure out the path, approach, and things I will use to get my API classes done. Browsing over vtk classes and getting help from Marcus, I made a concrete logical design (the way VTK objects should interact in the PCL API) before moving to this new place.

Finally, after relocation, coding in the night, I completed the PCLPlotter class which is now available in the trunk under visualization. I will ask the PCL community about what else they want from the class. I will post about the features of this ‘plotter’ class in my coming blog.

A framework for keypoint detection evaluation
Saturday, June 16, 2012

I spent the last two weeks working on a framework for keypoint detection evaluation in PCL. First, I’ve developed simple evaluators, one for each detector under consideration. The evaluators take as input a pair of files (model + scene) and first set the right ground truth rotation and translation. For each detector, keypoints are extracted from both the model and the scene cloud based on a set of typical parameters. Some parameters are kept fixed for each detector in order to ensure the fairness of the tests. In particular, the detectors are tested at various scales, where the scale is expressed as a multiple of the model resolution. Finally, the results are displayed by means of the PCLVisualizer: all keypoints, repeatable keypoints, and keypoints that are not occluded in the scene are shown. I show here the visual results achieved by the NARF keypoint detector on data acquired by means of the Microsoft Kinect sensor:

  • model
  • scene
  • Absolute repeatability: 7

  • Relative repeatability: 0.146

The objective of this week was to extend the simple framework evaluators to run on entire datasets. The values computed refer to the average absolute and relative repeatability. Recently, a new idea has come up: the repeatability measures will be accompanied by an analysis of the detectors’ time performance, computed from the extraction time of the keypoints. Over the weekend, I plan to define some final parameters of the detectors and to add the time performance analysis to the frameworks. Finally, my roadmap is constantly updated with my recent advances.
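For orientation, a commonly used definition of the two measures can be sketched as follows. This is an assumption and not necessarily the exact procedure used in the framework: a model keypoint counts as repeatable if, after applying the ground truth transform, a scene keypoint lies within a distance epsilon; the absolute repeatability is the number of such keypoints, and the relative repeatability divides it by the number of model keypoints that are not occluded in the scene. All names are hypothetical.

```python
import math


def repeatability(model_keypoints, scene_keypoints, transform, epsilon):
    """Return (absolute, relative) repeatability for one model/scene pair.

    model_keypoints: model keypoints assumed to be visible (not occluded)
                     in the scene, as (x, y, z) tuples.
    scene_keypoints: keypoints extracted from the scene.
    transform: callable applying the ground truth rotation and translation
               to a model point.
    epsilon: maximum distance for two keypoints to count as the same.
    """
    repeatable = 0
    for keypoint in model_keypoints:
        moved = transform(keypoint)
        if any(math.dist(moved, s) <= epsilon for s in scene_keypoints):
            repeatable += 1
    relative = repeatable / len(model_keypoints) if model_keypoints else 0.0
    return repeatable, relative
```

Averaging these two values over all model/scene pairs of a dataset gives the per-detector figures that the dataset-level evaluators report.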

Coming soon: tables and graphs about this work.

Progress on Active Segmentation
Saturday, June 16, 2012

As I mentioned in my first entry, I have decided to first implement/adapt the Active Segmentation with Fixation approach developed by A. Mishra. As a first step I took the time to thoroughly study the works of the aforementioned author, and found that there are several improvements to the first approach. Details on this can be found on the author’s home page.

For those who are not familiar with how active segmentation based on fixation works, and don’t want or don’t have the time to read the initial paper, here is a short description. The reasoning behind segmenting based on a fixation point originates from the way humans perceive objects in their surroundings. The method proposes an approach where we do not segment the whole image into several regions and then reason about what these regions might be, but segment only the region which contains our fixation point. So, in short, segmenting an object from the scene implies finding a “fixation point” and finding the contour of the object that encloses our point of interest. The main steps of the segmentation algorithm are (these steps are for RGB images using monocular cues):

  1. Obtain the boundary map
  2. Improve the boundary map using monocular cues.
  3. Set fixation points (points chosen must be part of the object we want to segment out).
  4. Convert to polar coordinates around the fixation point
  5. Search for shortest path/ apply graph-cut algorithm in order to find the contour of the object.
  6. Convert back to Cartesian space, points found on the “left” of the shortest path in polar space belong to our object of interest.
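Steps 4 and 6 are plain coordinate changes around the fixation point. A minimal 2D sketch in Python follows; the function names are hypothetical and the real algorithm resamples a whole image into polar space rather than converting a handful of points.

```python
import math


def to_polar(points, fixation):
    """Convert (x, y) points to (radius, angle) around the fixation point."""
    fx, fy = fixation
    return [(math.hypot(x - fx, y - fy), math.atan2(y - fy, x - fx))
            for x, y in points]


def to_cartesian(polar_points, fixation):
    """Convert (radius, angle) pairs back to (x, y) coordinates."""
    fx, fy = fixation
    return [(fx + r * math.cos(theta), fy + r * math.sin(theta))
            for r, theta in polar_points]
```

In polar space a closed contour around the fixation point becomes a curve running across all angles, so everything at a smaller radius than that curve belongs to the enclosed region; this is why the contour search can be posed as a shortest-path or graph-cut problem.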

Later works improve on this method, among other things by adding an automated fixation strategy for finding interest points and by using depth information. As stated in the work of A. Mishra, object boundaries can be found by checking for depth discontinuities, but these boundary points often do not correspond to the true boundaries of the objects. To find the true boundaries, color and texture cues are recommended. While the purpose of my project is the implementation of the segmentation algorithm, if I find the time for it, I will pursue the implementation of a fixation strategy.

Implementation-wise, I have created the basic API of the segmentation based on the other segmentation algorithms already implemented in PCL. I played around with existing implementations of boundary detection and realized that I will probably be able to use the code of my fellow GSOC developer Changhyun Choi. I am looking forward to seeing how his different implementations of edge detection will influence the performance of the segmentation.

I will be back with more information soon (mostly on implementation issues).

Multiple render window, scene tree and context menu
Friday, June 15, 2012

PCL Modeler is now able to render a point cloud in a specified render window, the scene tree keeps track of the objects in the scene, and a context menu makes functions more accessible for specified objects. The basic “view” related things are ready, and I will implement draggers to set the initial positions for registration.

A Closer Look at the Framework
Friday, June 15, 2012

Well, I’ve spent the last week or so putting the model/view stuff together into a useable GUI, which is available in the trunk under apps/cloud_composer... but before you run off and try to use it, let me warn you... it doesn’t do anything (other than show you clouds). This is because the next major step in the project, the tool plugins, is just getting started. More about that later though; first let’s go over the current state of the Model/View structure.

Let’s go over the components:

  • Project Model: This is the primary model “container” of the app. It can contain an unlimited number of clouds, which are stored in a tree structure. The top level of the tree (the children of the root item) consists of clouds. The children of the clouds are things like normals, point feature histograms, labels, calculated models, etc. All of these are stored in their own items, which are derived from the abstract base class ComposerItem. They can represent any sort of data, but shouldn’t actually contain the data; they simply store pointers to it. Currently the pointers are stored in QVariant objects, allowing me to use the standard Qt get/set data functions, but I think this may change in the future, as I’ll probably need finer control over what the set/get data functions are doing. In any case, along with pointer(s) to the data, the ComposerItems have another important member: a QStandardItemModel, which contains its properties. These are things like height/width for a cloud, or radius for normals.

These properties can be editable or not (though if they are, the plugin which defined the item will need to specify slots which can handle what happens when a property changes).

  • CloudBrowser - This is the main tree viewer for the project model, showing what clouds are in the project, and what items those clouds contain. Selecting an item or cloud here will bring it up in the CloudInspector.
  • CloudInspector - This shows the properties of an item (which is a model itself, contained within the item) in a tree view. It allows editing of fields if possible. Since the properties are a model themselves, one can easily specify that widgets should be shown here for a property, such as a QSlider for a parameter which has a range.
  • CloudViewer - This is a tabbed view which shows the projects currently open as different tabs. Each tab contains its own QVTKWidget and PCLVisualizer. When a new tab is selected, the current model for the cloud_composer application is switched. This makes it very simple to switch back and forth between projects, and ensures that all the views are updated correctly whenever a switch occurs.
  • UndoView - This is a QUndoView showing the current undo/redo stack; it lets one jump back/forward in time by clicking. Many PCL operations aren’t reversible (and take a long time to compute), so we store clouds in the stack, and undo/redo just amounts to switching the pointers in the cloud_item(s) which were modified.
  • Tools - This is still just a blank toolbox, but it will contain icons for the different plugins. More on plugins after the image.

So that’s the plugin framework in a nutshell. None of that XML stuff I mentioned before, though that may come at a later date (i.e. after GSoC). Let’s go over what happens when a user clicks on a tool icon.

  1. Clicking on the tool once causes the tool parameter model to be displayed in the tool parameter view.
  2. Clicking on the tool again causes the tool action to trigger, unless it is something like a drag selector, in which case clicking in the cloud view triggers the action.
  3. The action triggering causes the tool’s factory to create a new instance of the tool and a new instance of one of the command types (merge, split, modify, delete, create). These command objects are all defined in the GUI code, not inside of individual plugins. This is very important; plugins do not actually interact with the model, only commands do. This makes things like undo/redo and safe multithreading feasible.
  4. The cloud command is sent to the work queue, which is responsible for spawning threads where tools do their processing. It maintains a reference count of what items from the project model are currently being processed, and pops commands off the stack until the top command depends on data currently being worked on. The command object is responsible for making a copy of the item being worked on, and sending the copy and the tool object off to their thread to go do work.
  5. When a tool returns with a result, the command will push the old value onto the undo/redo stack, and send the resulting data back to the project model.
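The division of labour between tools and commands can be sketched in plain Python. This is an illustration of the idea only, not the cloud_composer code: the class names are hypothetical, the model is a plain dict, and threading and the work queue are left out. The tool only ever sees a copy, the command is the only object that touches the model, and undo/redo swaps references instead of recomputing anything.

```python
class ModifyCommand:
    """Applies a tool to a copy of a model item and remembers both versions."""

    def __init__(self, model, item_name, tool):
        self.model = model
        self.item_name = item_name
        self.tool = tool
        self.old_value = None
        self.new_value = None

    def execute(self):
        self.old_value = self.model[self.item_name]
        # The tool works on a copy and never sees the model itself.
        self.new_value = self.tool(list(self.old_value))
        self.model[self.item_name] = self.new_value

    def undo(self):
        self.model[self.item_name] = self.old_value

    def redo(self):
        self.model[self.item_name] = self.new_value


class UndoStack:
    """Minimal undo/redo stack driving the commands."""

    def __init__(self):
        self.done = []
        self.undone = []

    def push(self, command):
        command.execute()
        self.done.append(command)
        self.undone.clear()

    def undo(self):
        if self.done:
            command = self.done.pop()
            command.undo()
            self.undone.append(command)

    def redo(self):
        if self.undone:
            command = self.undone.pop()
            command.redo()
            self.done.append(command)
```

Because the old cloud is kept alive by the command, undoing an expensive operation is just a reference swap, at the cost of holding both versions in memory.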

That’s about it for now... I’ll let you know when I’ve got a demo tool up and running; I’m starting with normal estimation. I’m sure some of this architecture will be changed as I come across things that aren’t possible, or think of better ways to do things. If any of you see something of that nature right now (won’t work, or a better way) please let me know! Also, you get a cookie for having managed to read all the way through this.

Continue my work from NVSC.
Thursday, June 14, 2012

I believe today I will be starting updating the blog on the GSoC side, and continue my work on the Tegra 3 + Android + OpenNI + PCL etc...

See http://www.pointclouds.org/blog/nvcs/raymondlo84/index.php for my previous posts on OpenNI ports and other tricks.

Now the leading issue is to get PCL running on my Tegra 3 tablet. I hope I can get decent performance out of the box.

Design CUDA functions
Thursday, June 14, 2012

Design CUDA functions based on PF on PCL and Kinfu.

  • ParticleXYZRPY, PointXYZRGB -> should be changed
  • Octree for search nearest point.
  • Random number generator for sampling.
  • Reduction step for ‘update’ function
  • After complete one period, I need to think about more general PF.
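To make the sampling and reduction items more concrete, here is a minimal CPU sketch of the two operations in plain Python. It is not the PCL tracking code and not CUDA: the function names are hypothetical, particles are arbitrary objects or numbers, and the resampling shown is plain multinomial resampling.

```python
import bisect
import itertools
import random


def normalize(weights):
    """Scale weights so they sum to 1 (the sum is the reduction step)."""
    total = sum(weights)
    return [w / total for w in weights]


def resample(particles, weights, rng):
    """Draw len(particles) particles with probability proportional to weight."""
    cumulative = list(itertools.accumulate(normalize(weights)))
    cumulative[-1] = 1.0  # guard against floating point rounding
    return [particles[bisect.bisect_right(cumulative, rng.random())]
            for _ in particles]


def weighted_mean(values, weights):
    """Weighted average of scalar particle values, another reduction."""
    norm = normalize(weights)
    return sum(v * w for v, w in zip(values, norm))
```

On a GPU the sums would be computed with a parallel reduction and the random numbers would come from a per-thread generator, which is what the list above is planning for.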
Preview: Finger Tracking + Android 4.0 + Tegra 3 + Kinect + OpenNI + OpenCV
Monday, June 11, 2012

In the coming tutorial, we will show you how to implement the ‘well-known’ finger tracking algorithm on Kinect + Android + OpenNI + Tegra3 + OpenCV. The algorithm is indeed very simple and computationally efficient. We can achieve real-time performance easily on Tegra 3!

We will be using the range camera (depth map) to create a contour of our hand. Then, we will perform a simple polynomial approximation of the contour, and extract the convex hull (the peaks) as our fingertips. All of these operations can happen in real-time on the Tegra 3 @ 25fps! That’s pretty impressive.
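The convex hull step can be illustrated independently of OpenCV. The sketch below uses Andrew’s monotone chain algorithm in plain Python; it is a stand-in for illustration, not the routine used in the tutorial, and the function name is hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order.

    points: iterable of (x, y) tuples. Collinear points on the hull
    boundary are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

On a hand contour, the hull vertices are the outermost points, which is why they are good fingertip candidates; contour points far inside the hull correspond to the gaps between fingers.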

The requirements are as follows:

1. Tegra 3 or Tegra 2 Android device.
2. Microsoft Kinect or Asus Xtion (not tested)
3. NVIDIA NDK + OpenNI binary (see my previous tutorials)
4. OpenCV binary (will be provided below)
5. and 10 mins of your time :)

First of all, the method provided here is simplified for real-time usage, and we can get better results if we have a better way of extracting the hand location and filtering the depth map (see the Limitation section at the end). Right now, we are thresholding on a range, i.e., we assume that the hand of a person will be positioned closer to the range camera.
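The range thresholding can be sketched as follows in plain Python. The details are assumptions for illustration: depth values are taken to be in millimetres, a value of 0 is treated as a missing reading, and the width of the band behind the nearest point (150 mm here) is an arbitrary choice. The function name is hypothetical.

```python
def segment_nearest(depth_map, band_mm=150):
    """Mark pixels within band_mm of the closest valid depth reading.

    depth_map: 2D list of depth values, 0 meaning "no reading".
    Returns a 2D list of 0/1 values with 1 marking the assumed hand.
    """
    valid = [d for row in depth_map for d in row if d > 0]
    if not valid:
        return [[0] * len(row) for row in depth_map]
    nearest = min(valid)
    return [[1 if 0 < d <= nearest + band_mm else 0 for d in row]
            for row in depth_map]
```

The resulting binary mask is the input for contour extraction; it fails as soon as something other than the hand is closest to the camera, which is the limitation mentioned above.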

Depth Segmentation and Contour Extraction
Bounding Boxes
Simple Gesture recognition - finger and grasp detection
Tegra 3 Tablet + Microsoft Kinect + My custom 12V battery
Hand Tracking with 3D PointCloud Data rendering in real-time on Tegra3.
Source code:
PCL svn repository (in about two weeks)
What’s next?
Use the OpenNI hand tracking to create several interfaces. Right now we have the point cloud data visualization on Android controlled by the Kinect at 15-20fps (with about 2 million dots).
Label histograms finished and working great
Monday, June 11, 2012

I’ve finished the probabilistic label histograms in the people library. These are now part of the RDF detector class and will be added to future detector classes as well, which will allow for easy probabilistic merging of the different detectors in order to improve the tracking results. The downside is that calculating them takes a lot of time, so I will look into computing them with NPP in the future and reorganising them from an AOS to an SOA layout, with the SOA being image oriented.
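The difference between the two layouts can be shown with a tiny example in plain Python. This is only an illustration of the data reorganisation, not the people library code; the function names are hypothetical and each histogram is a plain list with one probability per label.

```python
def aos_to_soa(pixel_histograms, num_labels):
    """Turn one histogram per pixel (AOS) into one plane per label (SOA).

    pixel_histograms: list over pixels, each a list of num_labels values.
    Returns a list over labels, each a list with one value per pixel,
    i.e. one "probability image" per label.
    """
    return [[hist[k] for hist in pixel_histograms] for k in range(num_labels)]


def soa_to_aos(label_planes):
    """Inverse conversion: one plane per label back to one histogram per pixel."""
    return [list(values) for values in zip(*label_planes)]
```

In the SOA layout each label plane is a contiguous image, so image-oriented primitives can process one label at a time with coalesced memory access.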

First few weeks in GSoC
Monday, June 11, 2012

I’ve been working on the project “Organized Point Cloud Data Operations”. I’m adding some functionality for 2D images in PCL. These functions can be used on the RGB information or any single-channel information contained in organized point clouds. I use a datatype vector< vector< float> > to represent images. This choice makes it easy to integrate it with the PCL datatypes. Also, because it uses such simple data-types and does not have any PCL-specific dependencies right now, one can just pluck this 2D library from PCL and use it anywhere.

I found that there were some image write functions in the pcl_io module, but no read functions! So, I created some simple image read/write functions using VTK, which I will later place in pcl_io.

So far, I’ve implemented things like convolution, edge detection and morphological operations. I’ve compared the outputs to those obtained from the OpenCV implementations, and I used some synthetic images to check for corner cases. Everything seems to be working properly! I have yet to write gTests that do a pixel-to-pixel comparison.
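To illustrate how convolution looks on the vector< vector< float> > representation mentioned above, here is a minimal sketch. It is not the code of the 2D module: the function name convolve and the zero-padding border policy are assumptions made for this example.

```cpp
#include <cassert>
#include <vector>

// Same image representation as described in the post.
typedef std::vector<std::vector<float> > Image;

// Hypothetical helper (not the PCL 2D module API): 2D convolution where
// pixels outside the image are treated as zero.
// Assumption: both the image and the kernel are rectangular and non-empty.
Image convolve(const Image& in, const Image& kernel)
{
  assert(!in.empty() && !kernel.empty() && !kernel[0].empty());
  const int rows = static_cast<int>(in.size());
  const int cols = static_cast<int>(in[0].size());
  const int krows = static_cast<int>(kernel.size());
  const int kcols = static_cast<int>(kernel[0].size());
  const int cy = krows / 2;  // kernel centre row
  const int cx = kcols / 2;  // kernel centre column

  Image out(rows, std::vector<float>(cols, 0.0f));
  for (int y = 0; y < rows; ++y)
  {
    for (int x = 0; x < cols; ++x)
    {
      float sum = 0.0f;
      for (int ky = 0; ky < krows; ++ky)
      {
        for (int kx = 0; kx < kcols; ++kx)
        {
          // The kernel is flipped, so this is convolution, not correlation.
          const int sy = y + cy - ky;
          const int sx = x + cx - kx;
          if (sy >= 0 && sy < rows && sx >= 0 && sx < cols)
            sum += in[sy][sx] * kernel[ky][kx];
        }
      }
      out[y][x] = sum;
    }
  }
  return out;
}
```

Because the sketch depends only on the standard library, it can be compared against another implementation pixel by pixel without pulling in any PCL types.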

A lot of energy actually went into adapting to the PCL style of coding, writing code comments and documenting my work. Initially I used uncrustify to check for styling errors before committing. Then I found a PCL styling file for Eclipse which made life much easier. Still adapting to the “PCL style” of coding which is quite different to my default style. I guess it will take some time :)

Will be adding a lot more functionality to this module in the coming few weeks. Hope to have some pretty pictures to add in the next blog entry.

Exams and stuff
Sunday, June 10, 2012

It’s been a while since I wrote a blog post because I’m currently very occupied with TA duties and correcting the related exams. In the process we started discussing how to redesign the PCL GPU sublibrary for the next releases of PCL. The goal is to remove the PCL CUDA sublibrary by PCL 2.x and keep only the PCL GPU sublibrary. I’m also thinking about a possible redesign of the DeviceMemory and DeviceArray structures that are currently being used. Feel free to contact me or Anatoly for pointers on this.

PointCloud2 in outofcore
Saturday, June 09, 2012

Added insertion for PointCloud2 today. Justin and I have decided on a few architectural changes to the outofcore classes. I will eliminate the templating on the classes, leaving templating only on the necessary methods for insertion and query (similar to the PCDWriter implementation). This will require a major overhaul, but streamlines the interface a lot. There is also some work to do on insertion with building the LOD using PCL downsampling techniques given the change in internal data storage. With the addition of PointCloud2 for I/O to the class, we have more flexibility with the fields, and Justin’s visualizer does not need to know the point type at compile time.

Hello NVCS
Saturday, June 09, 2012

Hi, this is my first blog post for NVCS so I would like to introduce myself briefly. My name is Martin Sälzle and I recently graduated from the Technische Universität München. I have been working with the Kinect for my diploma thesis on 3D in-hand scanning of small, texture-less objects. The work is based mainly on [Weise2009] with some personal flavors. I also concentrated on the implementation without performing an active loop closure because I first wanted to get the registration pipeline as robust as possible. A bundle adjustment would surely help in the future. Here is the scanning setup (left) and some results (right):

Kinect in-hand scanner & scanned objects

For an evaluation I downloaded the model of the Stanford Bunny (left) and 3D printed it at www.shapeways.com (middle). I scanned the real world object in again with the Kinect (right) and compared it to the original model.

Stanford Bunny

I registered the scanned model to the original model and colored the original model according to the scanning error (Euclidean distance between corresponding points). A histogram is shown on the right.


My first task is to integrate the code into PCL. I am currently implementing the half-edge data structure because I reached the limits of the face-vertex mesh I used in my thesis. In the next post I will talk about that in more detail. Due to licensing issues we can’t use CGAL or OpenMesh in PCL.

[Weise2009] Weise, Wismer, Leibe, Van Gool. In-hand scanning with online loop closure. In ICCV 2009, p. 1630–1637.
Occluding & Occluded Boundary Edges
Friday, June 08, 2012

For the last couple of weeks, I have been developing a boundary edge detection (pcl::OrganizedEdgeDetection) for organized point clouds. It mainly searches for depth discontinuities using a threshold value that is linearly adapted with respect to the depth values. Since the point cloud is organized, the operation is quite efficient: while pcl::BoundaryEstimation takes several seconds, my unoptimized code takes about 70 ms. It also returns edge labels: occluding, occluded, and boundary (i.e. neither occluding nor occluded) edges.

With the Kinect and similar sensors, ‘nan’ points often exist between occluding and occluded edges, so my algorithm searches for corresponding points across the ‘nan’ area. This search is done in an organized fashion, so it isn’t very time consuming.

The following images show the detected occluding (green), occluded (red), and boundary (blue) edges:
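A rough sketch of the idea, reduced to a single image row, may help. It is not the pcl::OrganizedEdgeDetection implementation: the names labelRow, rel_threshold and max_gap are hypothetical, the threshold is assumed to scale with the nearer of the two depths, and the nearer point is labelled occluding while the farther one is labelled occluded.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum EdgeLabel { NONE = 0, OCCLUDING = 1, OCCLUDED = 2 };

// Hypothetical helper: labels depth discontinuities along one row of an
// organized cloud. NaN depths are skipped, bridging at most max_gap of them.
std::vector<EdgeLabel> labelRow(const std::vector<float>& depth,
                                float rel_threshold,
                                std::size_t max_gap)
{
  assert(rel_threshold > 0.0f);
  std::vector<EdgeLabel> labels(depth.size(), NONE);
  for (std::size_t i = 0; i < depth.size(); ++i)
  {
    if (std::isnan(depth[i]))
      continue;
    // Find the next valid neighbour across the NaN area.
    std::size_t j = i + 1;
    while (j < depth.size() && j - i <= max_gap + 1 && std::isnan(depth[j]))
      ++j;
    if (j >= depth.size() || j - i > max_gap + 1)
      continue;
    // The threshold grows linearly with depth (assumed: the nearer depth).
    const float nearer = depth[i] < depth[j] ? depth[i] : depth[j];
    if (std::fabs(depth[j] - depth[i]) > rel_threshold * nearer)
    {
      const bool i_is_near = depth[i] < depth[j];
      labels[i] = i_is_near ? OCCLUDING : OCCLUDED;
      labels[j] = i_is_near ? OCCLUDED : OCCLUDING;
    }
  }
  return labels;
}
```

Running the same scan along columns would catch horizontal discontinuities as well; the sketch leaves that out to stay short.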

_images/screenshot-1338586968.png _images/screenshot-1338586994.png _images/screenshot-1338587035.png
Back on track
Wednesday, June 06, 2012

After a busy couple of weeks, I am back to work on the out of core library. Justin, Julius, Jacob, Radu, and I have been discussing some pending changes to get outofcore performing at the appropriate level. Julius has provided some excellent feedback, and I think we will have some good demos soon.

Summarizing the OOC interface as it currently stands, remaining tasks on the OOC side fall into the following categories:

  1. OOC Interface (octree_base/octree_base_node) Responsible for recursively traversing the top level in-memory octree
    1. point/region insertion methods
      • addDataToLeaf
      • addPointCloud
      • addDataToLeaf_and_genLOD
      • addPointCloud_and_genLOD
      • TODO: Need some tools for building point clouds from directory of PCDs
      • TODO: Input support for PointCloud2
      • TODO: Improve the speed of tree-building (slow with zlib compression)
    2. frustum/box/region requests
      • queryBBIntersects
      • queryBBIncludes
      • queryBBIncludesSubsample
      • TODO: add PointCloud2 query support (almost done)
    3. Parameterization
      • container type
      • downsampling
      • compression (lossy, lossless)
      • depth/BB resolution
      • TODO: work out the interface for controlling these parameters; cross compatibility, etc...
  2. Encoding and Decoding of compressed data (Lossy/Lossless)
    • I have already added zlib compression into PCD containers
    • TODO: look into lossy compression based on the PCL Octree compression
    • TODO: Delay write for faster construction
  3. File I/O
    • Added some additional debug output to the PCDReader methods

Roadmap for the next few days:

  • Finish adding support for PointCloud2 Queries
  • Add support for PointCloud2 as input

Roadmap for the next couple of weeks:

  • Finish improvements to OOC construction (support of containers/point types, PointCloud2, caching, etc...)
  • Work with Julius on adding lossy-compression features
  • Clean up templating of interface class
  • Clean up construction of octree for speed
  • Abstract the hierarchy for easier modification of parameters
  • Make tools for OOC tree construction more flexible
Laplacian mesh editing: MATLAB realization
Monday, June 04, 2012

I have barely worked on the project over the last two weeks because I was writing the report for my master’s thesis, which is now almost complete. I am resuming work on the project: next I will work on a MATLAB implementation of the surface deformation.

Pointclouds from outdoor stereo cameras
Monday, June 04, 2012

I recently received stereo data and disparity maps to work with for this project, so I wrote a tool to convert the disparity maps to PCD files. The provided disparity data has been smoothed somewhat, which I think might be problematic for our application. For this reason, I also produced disparities using OpenCV’s semi-global block matching algorithm, which produces quite different results. You can see an example here:
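For readers unfamiliar with the conversion, the standard pinhole stereo relation behind such a tool is Z = f * B / d, with X and Y obtained by back-projecting the pixel. The sketch below illustrates that relation only; it is not the tool itself, and the names (disparityToPoint, focal_px, baseline_m, cx, cy) are assumptions for a rectified camera pair.

```cpp
#include <cassert>

struct Point3 { float x, y, z; };

// Hypothetical helper: converts one pixel (u, v) with a disparity value into
// a 3D point. Assumptions: rectified stereo pair, focal length in pixels,
// baseline in metres, principal point (cx, cy) in pixels.
// Returns false for non-positive (invalid) disparities.
bool disparityToPoint(float u, float v, float disparity,
                      float focal_px, float baseline_m,
                      float cx, float cy, Point3& out)
{
  assert(focal_px > 0.0f && baseline_m > 0.0f);
  if (!(disparity > 0.0f))
    return false;  // no depth can be recovered for this pixel
  out.z = focal_px * baseline_m / disparity;  // depth from disparity
  out.x = (u - cx) * out.z / focal_px;        // back-project the column
  out.y = (v - cy) * out.z / focal_px;        // back-project the row
  return true;
}
```

Since depth is inversely proportional to disparity, a fixed disparity error produces a depth error that grows quadratically with distance, which is one reason distant stereo points are so much noisier than Kinect data.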


Above is the left image of the input scene. Note the car in the foreground, the curb, and the more distant car on the left of the image.


Above is a top-down view of a point cloud generated by OpenCV’s semi-global block matching. The cars and curb are visible, though there is quite a bit of noise.


Above is an image using the provided disparities, which included some smoothing. The curb is no longer visible, and there is also an odd “ridge” in the groundplane starting at the front of the car. I think this will be problematic for groundplane segmentation. Both approaches seem to have some advantages and disadvantages, so I’ll keep both sets of PCDs around for testing. Now that I have PCD files to work with, I’m looking forward to using these with my segmentation approach. Prior to using stereo data, I developed segmentation for use on Kinect. I think the main challenge in applying this approach to stereo data will be dealing with the reduced point density and greatly increased noise. I’ll post more on this next time.

Model/View Framework for PCL
Sunday, June 03, 2012

So, I’ve been working on the basic GUI, and I’ve decided to go with a classic model/view architecture, using the Qt framework. In this architecture, the data the user sees is encapsulated inside of a model, which can then be viewed using various GUI elements. This decouples the data objects from the viewing interface presented to the user, which increases flexibility and reuse. So, the architecture looks something like this:


The core idea here is that we have the CloudModel as the core object, maintaining references to the multiple clouds it may contain - multiple being necessary to allow things like registration, or segmentation of points into distinct clouds to be manipulated. The CloudModel maintains information about the clouds, which can be viewed in treeform in the CloudBrowser. This will look very much like the Pipeline Browser in Paraview. Additionally, clicking on an element in the browser will display further information about the selected element in the CloudInspector. Things like number of points, datatype, etc... It will also allow the adjustment of properties; say you have normals selected, it will allow you to adjust the radius used to calculate them.

There’s also the CloudToolSelector, which is an interface to the plugins which provide PCL functionality. As I said in my previous post, I’m still on the fence on how to implement the plugins. Ideally, I’d like them to be generated automatically based on XML description files, but it remains to be seen how difficult that will be, and if it is even possible due to the templated nature of all the PCL functions.

Finally, there’s the CloudViewer, which implements a QAbstractItemView, and contains a QVTKWidget - PCLVisualizer. The eventual plan is to have this be a tabbed view, with tabs switching between cloud projects, i.e., switching which model is being viewed. That will come later though, let’s get it working with one project first...

In any case, I’ll push this basic framework (minus the tools) to the SVN in the coming days. Let me know what you think, and if anyone out there sees any flaws in this architecture, please let me know. This is my first foray into the model/view world, and I’d appreciate finding out if I’m doing something wrong sooner rather than later!

Training pipeline
Thursday, May 31, 2012

I finally found the time to start making the training pipeline code public. The Makefiles of the public version still need to be adapted, and there will most likely be multiple changes and API breaks within the next weeks, but the code can be found here: http://svn.pointclouds.org/people/. There is also a chance that this will move to trunk/people in the future once it is fully in the PCL namespace.

GUI for Manipulating Point Clouds
Wednesday, May 30, 2012

Hello everyone, I just wanted to give a belated introduction to this project, and a quick status update on what I’ve been up to. To begin with, the goal of this project is to develop a GUI which acts as a graphical means of using the various modules of the PCL. The basic idea is to develop something similar to Paraview (without the distributed part, that may come later). Basically one can load multiple clouds, or capture them from an OpenNI device, and then apply PCL functions to analyze them, modify them, or merge them. The interface is a pretty standard Qt design, with docks on each side containing tools, a list of clouds in the current window, and a bottom dock with text output. PCL calls are performed in separate threads of course.

I have the basic application layout done, with basic functionality - loading/saving clouds and viewing them using the PCLVisualizer class. I’ll be pushing it to the server as soon as I get back from this review meeting in Denmark. I’d like to apologize for the slow start here, I haven’t been home in 3 weeks now thanks to conferences and meetings, and so all I’ve really been able to do is read.

On that note, I’d like to discuss what I’ve been reading, and what I intend to do with it. Let’s start with what I, and I assume the community, want. Namely, a GUI application which is easy to maintain and extend as the PCL code behind it evolves:

  • Changes in underlying algorithms should have no effect on the GUI.
  • Changes to module interfaces should require as little change in GUI code as possible.
  • Adding new functionality shouldn’t require editing the application code - when a programmer adds a new function, they should be able to add it to the GUI with minimal hassle and without the possibility of breaking the rest of the app.

This leads us to a few conclusions. First of all, we need to isolate PCL functionality into a set of plugins. This could be one plugin per module, or one plugin per tool (ie FPFH calculation, SACModel calculation, Outlier removal, etc...), or any level of granularity in between. Next, the interface for these plugins should be as simple as possible, while still remaining flexible enough to allow all of the PCL functionality to pass through it. Finally, when someone adds a function to PCL, they should be able to add it as a tool in the GUI with minimal, if any coding. In my mind, this leaves us with two options:

  • A standard plugin system, where we define an interface, and then code a class for each tool, which performs the desired calls to PCL.
  • A meta-compiler system, where tools are specified in an XML format, and we parse it either at run-time (to determine what to do when a tool is selected) or at compile time (to generate code which is used when the tool is selected).

The second option is obviously more complicated, but would be much easier to maintain in the long run, since the only code would be the compiler/parser. The XML specification of how to use the PCL would be relatively simple, which would make updating and adding tools as simple as changing/adding a few lines of XML (copied from a general template). In the first option, a new tool (or changes to a module’s interface) would require editing the code of the plugin class. This means (imho) that tools would be much more prone to breaking. So, what am I reading?
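To make the first option more concrete, here is a minimal sketch of what a standard plugin interface could look like. Everything in it is hypothetical: Cloud is a stand-in for a real point cloud type, and Tool, ToolRegistry and ScaleTool are illustrative names, not classes of the application.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Placeholder for a real point cloud type; purely illustrative.
struct Cloud { std::vector<float> values; };

// Hypothetical tool interface: one class per tool wraps the PCL calls.
class Tool
{
public:
  virtual ~Tool() {}
  virtual std::string name() const = 0;
  virtual Cloud apply(const Cloud& input) const = 0;
};

// The GUI only talks to the registry, never to concrete tools.
class ToolRegistry
{
public:
  void add(const std::shared_ptr<Tool>& tool)
  {
    assert(tool);
    tools_[tool->name()] = tool;
  }
  // Returns nullptr when no tool with that name is registered.
  const Tool* find(const std::string& name) const
  {
    std::map<std::string, std::shared_ptr<Tool> >::const_iterator it = tools_.find(name);
    return it == tools_.end() ? nullptr : it->second.get();
  }
private:
  std::map<std::string, std::shared_ptr<Tool> > tools_;
};

// Example tool: scales every value; stands in for a real PCL operation.
class ScaleTool : public Tool
{
public:
  explicit ScaleTool(float factor) : factor_(factor) {}
  std::string name() const { return "scale"; }
  Cloud apply(const Cloud& input) const
  {
    Cloud out = input;
    for (std::size_t i = 0; i < out.values.size(); ++i)
      out.values[i] *= factor_;
    return out;
  }
private:
  float factor_;
};
```

The maintenance cost discussed above shows up in classes like ScaleTool: each one has to be written and kept in sync with the module it wraps, which is exactly what the XML approach tries to avoid.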

  • Di Gennaro, Davide. Advanced C++ Metaprogramming, 2012.

Which is kind of blowing my mind. I’m feeling more confident about templates by the day, but I’m also beginning to think this may be overkill for the project. On the other hand, I’m not terribly interested in programming another standard plugin interface. That would mean I was basically spending the entire summer writing implementations which call PCL functions... which would be prone to breaking, and would require quite a bit of babysitting to keep the application running properly. I know how to do that, there’s nothing new there, and I’d just be making a clone of many other things which are already out there; just with a PCL backend. The XML version would be pretty novel (at least for me - Paraview does something somewhat similar), and would result in an application that would be very easy to extend as PCL evolves. On the other hand, the XML version is higher risk- it could result in a month of coding which fails miserably, followed by a frantic month of implementing the standard plugin class version.

Now if you’ve made it through all the text, I’d like to ask what do you guys think? Any suggestions, or advice? As I said, I’ll be at this review meeting until the end of the week, so I won’t be starting any serious coding of the plugin mechanism until next week. I would really appreciate a discussion of this over the next few days.

Oh, and what’s a good name for this beast? I’ve come up with the following two:

  • Cloud Composer
  • Cloud Jockey

Anyone have anything better?

Studying particle filter on PCL
Wednesday, May 30, 2012

The particle filter (PF) in PCL is a little bit different from the general algorithm.

The PF in PCL consists of:
  • resample - update each particle
  • weight - calculate each particle’s weight based on the coherence between the reference cloud and the query cloud
  • update - update the representative state of the particles

In a general PF, the motion is given by the system with some uncertainty, but the PF in PCL uses the camera motion calculated in the previous frame.
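The three steps can be sketched with a toy one-dimensional filter. This is not the PCL tracking API: the Particle struct, the Gaussian weight standing in for the cloud coherence, and the function names are all assumptions made for illustration, and the motion argument plays the role of the motion estimated in the previous frame.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Particle { float x; float weight; };

// weight: a Gaussian likelihood stands in for the coherence between the
// reference cloud and the query cloud (an assumption of this sketch).
void weightParticles(std::vector<Particle>& ps, float observation, float sigma)
{
  float sum = 0.0f;
  for (std::size_t i = 0; i < ps.size(); ++i)
  {
    const float d = ps[i].x - observation;
    ps[i].weight = std::exp(-d * d / (2.0f * sigma * sigma));
    sum += ps[i].weight;
  }
  assert(sum > 0.0f);
  for (std::size_t i = 0; i < ps.size(); ++i)
    ps[i].weight /= sum;  // normalise so the weights sum to one
}

// update: the representative state is taken as the weighted mean.
float representativeState(const std::vector<Particle>& ps)
{
  float mean = 0.0f;
  for (std::size_t i = 0; i < ps.size(); ++i)
    mean += ps[i].weight * ps[i].x;
  return mean;
}

// resample: draw particles in proportion to their weights, then move each
// one by the motion estimated in the previous frame.
std::vector<Particle> resample(const std::vector<Particle>& ps, float motion,
                               std::mt19937& rng)
{
  std::vector<float> w;
  for (std::size_t i = 0; i < ps.size(); ++i)
    w.push_back(ps[i].weight);
  std::discrete_distribution<std::size_t> pick(w.begin(), w.end());
  std::vector<Particle> out;
  for (std::size_t i = 0; i < ps.size(); ++i)
  {
    Particle p = ps[pick(rng)];
    p.x += motion;
    p.weight = 1.0f / static_cast<float>(ps.size());
    out.push_back(p);
  }
  return out;
}
```

A real tracker would also add noise to each resampled particle; it is omitted here to keep the sketch short.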

PCL Datasets
Wednesday, May 30, 2012

I finally convinced myself I had something to share and say. During the first week of the coding period I decided to tackle issue #682. The current datasets, when accessed online, are a list (xml) generated directly through the svn. While it works and is very fast, it is a bit simplistic and improvements can be made, for example utilizing the WebGL PCD viewer, auto-loading the readme files etc.

Long story short,


A lot of features have been added so far, including, binary compressed PCD support for the WebGL Viewer, search/filtering results, mirroring controls between pointclouds, loading the readme file instantly, drag n drop PCD files from your desktop etc - among others. This mini project will hopefully be online soon, and a detailed post with instructions and included features will be made then. You can preview it if you go to issue #682 and feedback would be greatly appreciated.

Apart from this, I have installed PCL, ran a couple of examples, and tested image encoding (jpeg/png) of depth/rgb data. Lossy image compression algorithms (jpeg) are optimized for images and not depth so unfortunately while the compression rate is phenomenal a lot of unwanted noise is added. Entropy encoding does not suffer from this of course. The advantage of using images and videos lies in the browser, see browsers support images and video natively (the internet would be very ascii-art-like if they didn’t) so no overhead is added in the Javascript for decoding, plus it is quite trivial to import images in the GLSL shaders for very fast analysis. I am still performing tests to find an optimal solution.

Fine reads,

Julius Kammerl “Development and Evaluation of Point Cloud Compression for the Point Cloud Library” Institute for Media Technology, TUM, Germany - May 2011,

Fabrizio Pece, Jan Kautz, Tim Weyrich “Adapting Standard Video Codecs for Depth Streaming” Department of Computer Science, University College London, UK, Joint Virtual Reality Conference of EuroVR - EGVE (2011)

Trimming the bunny
Tuesday, May 29, 2012

I’ve added NURBS curve fitting to the example of surface fitting. The curve is fitted to the point-cloud in the parametric domain of the NURBS surface (left images). During triangulation only vertices inside the curve are treated, borderline vertices are clamped to the curve (right images).

Training pipeline
Tuesday, May 29, 2012

I had a great time presenting my work at ICRA. A small bug was still present in the public version: from time to time the background was displayed as the correct kinematic chain. This happened because the result was displayed without checking that the correct tree had been built, and it is now solved in the current trunk version.

2D Drawing APIs in PCL and VTK
Sunday, May 27, 2012

I played with PCLVisualizer and was surprised to see that it already contains an API (simple function calls like addCircle) to draw primitives like circles, cubes, spheres, etc. It supports many other visualization functions; no wonder it is very powerful, much more than the simple CloudViewer.

I also played with vtkContext2D and vtkChart and got overwhelmed by the number of APIs (which bypass the ‘unnecessary’ VTK pipeline) for 2D drawing and charts. The next steps seem clear:

  • Extend these classes to form PCL flavored classes for 2D drawing and making charts.
  • Either make a new ‘visualizer’ by subclassing vtkContextView or make a way to connect those extended classes to PCLVisualizer.
Tests and report
Saturday, May 26, 2012

I’m working on a smart clustering algorithm which will improve the segmentation accuracy of a cloud based on the region growing algorithm. In the meantime I’m finishing the SVM classifier and I’ll be ready to put it into PCL very soon.

During this period I also had to work on some projects at my workplace, so I haven’t spent much time on the final report of the automated noise filtering plugin. I am giving myself one week as a deadline to finish it.

Kintinuous: Spatially Extended KinectFusion
Saturday, May 26, 2012

Thomas Whelan and John McDonald, two of our collaborators from the National University of Ireland in Maynooth, have been feverishly working away on extensions and improvements to PCL’s implementation of KinectFusion - KinFu.

Two of the major limitations of KinectFusion have been overcome by the algorithm, which Tom calls Kintinuous:

  • By constructing a 3D cyclic buffer, which is continually emptied as the camera translates, the original restriction to a single cube (e.g. 5x5x5m) has been removed
  • Operation in locations where there are insufficient constraints to reliably carry out ICP (KinectFusion’s initial step) is now possible by using visual odometry (FOVIS).

Tom has created some amazing mesh reconstructions of apartments, corridors, lecture theaters and even outdoors (at night) - which can be produced in real time. Below is a video overview showing some of these reconstructions.

We are hoping to integrate Kintinuous into a full SLAM system (iSAM) with loop-closures to create fully consistent 3D meshes of entire buildings. More details including the output PCD files and a technical paper are available here.

Friday, May 25, 2012

I finished the preliminary report for ANF, which I will make available once Mattia is also finished, so that we can start to evaluate the work of this sprint with our mentors.

In the meanwhile I have been struggling with getting my PC ready for PCL developer use again. For the next few weeks I will be finishing some other projects for school and will also be working on my PCL to do list:

  • Implement the MixedPixel class in PCL.
  • Work on the filters module clean up.
  • Finalize the LUM class implementation.
Decision Forests and Depth Image Features
Friday, May 25, 2012

Hi again!

Because of ICRA and the preparations for the conference I was quite inactive the last couple of weeks. This week I had to finish some school projects and yesterday I resumed work on face detection. Because the BoW approach using SHOT was “slow”, I decided to give Decision Forests and features extracted directly from the depth image a try (similar to http://www.vision.ee.ethz.ch/~gfanelli/head_pose/head_forest.html). Although not yet finished, I am already able to generate face responses over the depth map and the results look like this:


Basically, the map accumulates at each pixel the number of sliding windows that include that pixel and have a probability of being a face higher than 0.9. This runs in real time thanks to the use of integral images for evaluating the features. For the machine learning part, I am using the decision forest implementation available in the ML module of PCL.
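The reason integral images make this fast is that the sum over any rectangle costs four lookups, independent of the rectangle size. The sketch below shows the general technique on a plain vector-of-vectors image; the names integralImage and boxSum are illustrative and this is not the code used for the face detector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Table;

// Builds a (rows + 1) x (cols + 1) summed-area table where entry [y][x]
// holds the sum of all pixels above and to the left of (y, x).
// Assumption: the image is rectangular and non-empty.
Table integralImage(const std::vector<std::vector<float> >& img)
{
  assert(!img.empty());
  const std::size_t rows = img.size();
  const std::size_t cols = img[0].size();
  Table table(rows + 1, std::vector<double>(cols + 1, 0.0));
  for (std::size_t y = 0; y < rows; ++y)
    for (std::size_t x = 0; x < cols; ++x)
      table[y + 1][x + 1] = img[y][x] + table[y][x + 1] + table[y + 1][x] - table[y][x];
  return table;
}

// Sum over the half-open rectangle [y0, y1) x [x0, x1) using four lookups.
double boxSum(const Table& table, std::size_t y0, std::size_t x0,
              std::size_t y1, std::size_t x1)
{
  assert(y0 <= y1 && x0 <= x1);
  return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0];
}
```

A depth feature that compares the average depth of two rectangles then needs only eight table lookups per evaluation, however large the rectangles are.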

Evolve from PCLVisualizer
Friday, May 25, 2012

PCLVisualizer encapsulates many useful functions for point cloud rendering. I planned to inherit from it, but then found that something has to be changed in the parent class to support what I want, so for now I just copied PCLVisualizer to PCLModeler and will let it evolve from there as I progress. It’s a simpler and faster solution since I only have to deal with PCLModeler. The two may be refactored to remove duplicate code if they do share a lot in the end.

Another thing I want to mention is that the UI can sometimes behave very differently on different platforms (Windows and Linux in my case), and it’s very painful to tune the code and then switch (restart!!) between them to make sure the code behaves consistently. Two computers are required in this case: store the source code on one computer, make two builds, code on one platform (Windows in my case) and check the results instantly on the other computer!

‘Modern C++’
Friday, May 25, 2012

I was browsing the code of PCL, hoping for good old C++ with lots of pointers and clean readable inherited classes like that in VTK and Qt (example). But instead, I found something like this and a light bulb went off in my head.

So, I understood that I must read up on this ‘modern C++’ style, which heavily uses templates, ‘smart pointers’, boost, etc., before I proceed further to play with the pcl code. After doing some googling, I found the following book to be the best and most recommended for people like me:

  • C++ Templates: The Complete Guide / Vandevoorde & Josuttis

I have been reading this book and checking up boost, stl and pcl code since my last update.

Finally, I can now roughly understand pcl code and have started experimenting and exploring pcl::visualization::PCLVisualizer about which I will update soon.

OpenNI + Android FAQ
Thursday, May 24, 2012

Recently, we have received several emails about the OpenNI and Android development. Here we will address some of the most frequently asked questions and some of the future directions about this project.


Q: What tablet have we used?

Ans: We’ve used the Tegra 2 and Tegra 3 Development Boards provided by NVIDIA. http://developer.nvidia.com/tegra-development-kits. I can safely assume that as long as your tablet is as capable as Tegra 2, the OpenNI library should be able to grab the depth map and color images at 640*480 resolution at full speed. However, I have experienced some hiccups with the OpenNI drivers; in particular, the color image sometimes gets corrupted.

Q: What are the minimum OS requirements?

Ans: We’ve tested everything from Android 3.0 to Android 4.x. However, I noticed that I get better performance on Android 3.x. That’s something I’m currently working on.

Q: How can I tell whether my tablet works with the Kinect?

Ans: Try dmesg under ‘adb shell’. You should be able to see that the Microsoft Kinect is recognized as a USB hub with camera, motor, and audio devices. If you cannot see any USB devices attached, I believe we need to look into that further.

Q: I’m having trouble getting the ‘multitouch’ sample code compiled. What should I do?

Ans: We have noticed a problem with the NDK package from NVIDIA (http://developer.nvidia.com/tegra-resources): some dependencies are missing in the latest SDK. I’m currently working on this, basically rewriting part of the code to conform with the “Android Application Lifecycle in Practice: A Developer’s Guide” (LINK: http://developer.nvidia.com/sites/default/files/akamai/mobile/docs/android_lifecycle_app_note.pdf). Please check this blog again in the future for a more self-contained package.

Q: I’m getting errors when compiling the OpenNI library. What should I do?

Ans: We would highly recommend others to use the binary (as provided in the last tutorial) for the first time. I will provide a sample script for recompiling in the future.

Q: How do I push the .so libraries to the device?

Ans: Please read the push_lib.sh file in the package. We’ve provided a few comments on how to remount the /system directory for read-write.

Q: Who’s raymondlo84?

Ans: I’m a Ph.D. student studying ECE at UofT. I’m currently working with PCL and NVIDIA’s summer code program. Feel free to let us know what you think about the Android project.

Thank you.

Volume shifting with Kinfu
Thursday, May 24, 2012

Hello all,

This week, we focus on accumulating the world as we shift our volume around. The next video shows a point cloud we generated with multiple shifts. The TSDF data that is shifted out of the cube is compressed before sending it to CPU; this decreases the required bandwidth when transmitting the data.

The saved vertices are only those close to the zero-crossings (the isosurface). The saved vertices include the TSDF value for later use (raycasting, marching cubes, reloading to GPU). In the video, the two-colored point cloud represents positive (pink) and negative (blue) TSDF values.

We are now implementing a class that will manage the observed world. Each time the volume is shifted, new observations will be sent to the world manager, which will update the known world and allow quick access to some parts of it.
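As a small illustration of what a zero-crossing means here, the sketch below scans one row of TSDF values and reports where the sign changes, using linear interpolation between the two neighbouring voxels. It is a generic CPU sketch, not the Kinfu GPU code, and the function name zeroCrossings is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the positions (in voxel units) where the TSDF
// changes sign along one row of voxels. Each position is linearly
// interpolated between the two voxels that bracket the surface.
std::vector<float> zeroCrossings(const std::vector<float>& tsdf)
{
  std::vector<float> crossings;
  for (std::size_t i = 0; i + 1 < tsdf.size(); ++i)
  {
    const float a = tsdf[i];
    const float b = tsdf[i + 1];
    const bool sign_change = (a > 0.0f && b < 0.0f) || (a < 0.0f && b > 0.0f);
    if (sign_change)
    {
      assert(a != b);  // guaranteed by the strict sign change
      crossings.push_back(static_cast<float>(i) + a / (a - b));
    }
  }
  return crossings;
}
```

The voxels on either side of each reported position are the ones that carry the surface information, which is why only values near the crossings need to be kept.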


The following is a large point cloud generated and saved with our current implementation.


Francisco and Raphael

Testing existing boundary estimation in PCL
Thursday, May 24, 2012

As a first step for 3D edge detection, I would like to focus on edges arising from geometric shape (e.g. depth discontinuities or high curvature regions). Before writing any code, I tested the existing functions in PCL, namely pcl::NormalEstimation and pcl::BoundaryEstimation. Since the boundary estimation is designed for unorganized point clouds, it takes several seconds to process a Kinect frame. For efficiency, I will design a boundary estimation for organized point clouds (i.e. Kinect frames).

The following images are the curvature and boundary images from pcl::NormalEstimation and pcl::BoundaryEstimation.

  • Curvature

  • Boundaries

Fitting the bunny
Tuesday, May 22, 2012

The NURBS fitting functions are documented within the header files. I’ve also added a test and example file in examples/surface where you can test the algorithms and try fitting some pcd files (e.g. test/bunny.pcd). The result should look like the image below.

Coming up next:

  • Trimming of the bunny using the B-Spline curve fitting algorithm.
Testing functionalities for the existing keypoint detectors in PCL
Tuesday, May 22, 2012

I am currently developing a test program in order to visually compare the keypoints extracted by the existing detectors in PCL. The program looks like a tutorial, and the characteristic parameters of each detector can be set when the program is executed. I chose to introduce this possibility since I have noted that in most cases the detector parameters depend on the features and the structure of the input point cloud.

In what follows, I show the keypoints extracted by the:

  • NARF keypoint detector

  • SIFT keypoint detector

  • keypoint detector based on the uniform sampling technique


while the examples related to the Harris 2D, 3D and 6D detectors are still under development.

NURBS fitting algorithms integrated
Monday, May 21, 2012

I’ve integrated all the NURBS fitting stuff (curves and surfaces) using PDM (point-distance-minimization), TDM (tangent-distance-minimization) and SDM (squared-distance-minimization). For this I’ve put openNURBS 5 into the repository as well. A good comparison of the fitting techniques PDM, TDM and SDM is given in “Fitting B-spline curves to point clouds by squared distance minimization” by W. Wang, H. Pottmann, and Y. Liu (http://www.geometrie.tuwien.ac.at/ig/sn/2006/wpl_curves_06/wpl_curves_06.html)

Coming up next:

  • Consistent documentation and code cleaning.
  • Examples for better understanding of the usage.
  • Conversion of NURBS to polygon meshes.
UI framework
Monday, May 21, 2012

After playing with the VTK code for a while, I am getting used to it. I’ve created a simple Qt-based UI which supports loading/saving point cloud and project files, plus some utility functions. I will clean up the code a bit, submit the diff for mentor review and then make an initial commit.

Getting started
Monday, May 21, 2012

These past weeks I have been reading up on the two segmentation methods described in the papers (Mean-shift and Active Segmentation in Robotics) and in the meanwhile checking out the segmentation API and the available tutorials in PCL. I worked on setting up my environment and making everything work. My next steps will involve doing some research on the adaptation of Active Segmentation to 3D data as well as creating some draft classes and outlining the main steps of the method.

Initial arrangements
Wednesday, May 16, 2012

I was having difficulty accessing repositories with svn+ssh, as I was behind a proxy server at my university with port 22 blocked. So here is the solution: get a direct connection! (Well, not exactly, but you have to get port 22 open somehow to get ssh working without the server’s help.) Now I am at home with everything working perfectly :).

Setting up environment
Wednesday, May 16, 2012

I prefer Windows and Visual Studio for development. I also prepared Ubuntu and development tools (cmake, code, build, subversion tools).

I’m reading papers and texts about particle filters.

  • Min-An Chao; Chun-Yuan Chu; Chih-Hao Chao; An-Yeu Wu, “Efficient parallelized particle filter design on CUDA,” Signal Processing Systems (SiPS), 2010 IEEE Workshop on Digital Object Identifier, pp.299-304


  • Analysing particle filter and CUDA API on PCL
  • Writing draft headers for particle filter
Quick Tutorial: A simple C++ OpenNIWrapper for using OpenNI on Android Devices
Monday, May 14, 2012
Important: In this tutorial, we assume that you have already installed the OpenNI shared libraries on your device as previously discussed. If you haven’t done so, please follow our previous post and adb push the binary libraries to the proper location. Additionally, you are required to perform
mount -o devmode=0666 -t usbfs none /proc/bus/usb

every time you reboot your device. The example code provided below has been tested on the NVIDIA Tegra 3 dev board and Microsoft Kinect only. If you would like to use Xtion or other range sensors, you may need to compile the drivers and make appropriate updates to the modules.xml file. For more information, see our previous post.

To get started, we will first introduce the simple C++ wrapper we have written for handling the OpenNI calls. In this example, we will only be handling the image buffers: the color image buffer (24-bit RGB image) and the depth image buffer (16-bit single channel). I believe we can also record audio with the Kinect (?), but we have not verified that yet. Here is the header from our OpenniWrapper.h.

Sample Usage

To get started, you can download the header and cpp file here
http://openvidia.svn.sourceforge.net/viewvc/openvidia/tegra_kinect/jni/OpenniWrapper.cpp?view=log http://openvidia.svn.sourceforge.net/viewvc/openvidia/tegra_kinect/jni/OpenniWrapper.h?view=log

Our wrapper consists of three main function calls: start(), release(), and WaitAndUpdate(). By default, the OpenNI wrapper will initialize the depth map and RGB image and run in a separate thread (i.e., non-blocking) after start(). To obtain the depth map or RGB images, we simply call WaitAndUpdate() (blocking call) and then provide a pointer which stores our depth and RGB image. The OpenniWrapper does not perform any memory allocation, and it is our responsibility to handle the malloc and free.

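#include "OpenniWrapper.h"
#include <cstdlib>

OpenniWrapper *openni_driver;

int main(){
    openni_driver = new OpenniWrapper();

    //initialize the driver (assumes start() reports failure with a false return value)
    if(!openni_driver->start()){
        //something is wrong. see log files
        return 1;
    }

    int width = openni_driver->getWidth();
    int height = openni_driver->getHeight();
    unsigned char *rgb_buffer = (unsigned char*)malloc(width*height*3*sizeof(unsigned char));
    unsigned short *depth_buffer = (unsigned short*)malloc(width*height*sizeof(unsigned short));

    //typically called once per frame; the calls that fill rgb_buffer and depth_buffer are not shown in this excerpt
    openni_driver->WaitAndUpdate(); //blocking call
    process_buffers(); //can be multithreaded

    //release the resources
    openni_driver->release();
    free(rgb_buffer);
    free(depth_buffer);
    return 0;
}
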
To compile the source with the OpenNI libraries, we need to add the OpenNI headers and the shared library paths to the Android.mk file. In particular, I have added


LOCAL_C_INCLUDES += openni_kinect_include/include

You can find the openni_kinect_include/include directory in the SVN repository below. Or you can simply create that directory by copying the headers from the OpenNI source directly.

CPU usage and FPS:
12539 2 60% R 14 532164K 65048K fg app_39 com.nvidia.devtech.multi
I/Render Loop:(12539): Display loop 0.014692 (s) per frame –> over 60fps for rendering, the capture loop is threaded.

OpenNI Binary (Kinect Only)


Sample Source Code


Reading days
Monday, May 14, 2012

My primary activities in the last weeks were setting up the working environment, doing some basic testing on the new 3D object recognition tutorial, and fixing feature #644 while solving some related bugs. Beyond these tasks, I have also been doing some important reading. I list the papers below, since they could be useful to anyone approaching the object recognition field:

  • Salti, S.; Tombari, F.; Stefano, L.D.; , “A Performance Evaluation of 3D Keypoint Detectors,” 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2011 International Conference on , vol., no., pp.236-243, 16-19 May 2011
  • Yu Zhong; , “Intrinsic shape signatures: A shape descriptor for 3D object recognition,” Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on , vol., no., pp.689-696, Sept. 27 2009-Oct. 4 2009
  • Tombari, F.; Di Stefano, L.; , “Object Recognition in 3D Scenes with Occlusions and Clutter by Hough Voting,” Image and Video Technology (PSIVT), 2010 Fourth Pacific-Rim Symposium on , vol., no., pp.349-355, 14-17 Nov. 2010

One final remark, which is useful to users approaching the 3D object recognition tutorial: to test the tutorial with a more extensive dataset, you can refer to http://vision.deis.unibo.it/SHOT/ (scroll down until you reach the dataset section).

Laplacian mesh editing: MATLAB realization
Sunday, May 13, 2012

I am now implementing an algorithm for 3D mesh editing based on the Laplacian matrix. I decided to do it in MATLAB first to verify my idea. If it works, I will implement it in C++.

Adding a new filtering technique
Saturday, May 12, 2012

I am working on a new filtering technique that divides the point cloud into boxes that have similar densities and approximates each box by its centroid. This will be completed soon. In parallel, I am also working on the registration tutorial that I mentioned in the previous blog post.
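
The centroid approximation can be sketched as follows. This is a simplified illustration, not the filter being developed: it uses boxes of a fixed size (the leaf parameter, an assumption of this example) rather than boxes of similar density, and the names Point and boxCentroids are hypothetical.

```cpp
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

struct Point { float x, y, z; };

// Replaces all points that fall into the same cubic box (edge length `leaf`)
// by their centroid. Boxes have a fixed size here; a density-adaptive
// subdivision is not modelled in this sketch.
std::vector<Point> boxCentroids(const std::vector<Point>& cloud, float leaf)
{
  struct Acc { double x = 0, y = 0, z = 0; int n = 0; };
  std::map<std::tuple<long, long, long>, Acc> boxes;
  for (const Point& p : cloud) {
    const auto key = std::make_tuple(static_cast<long>(std::floor(p.x / leaf)),
                                     static_cast<long>(std::floor(p.y / leaf)),
                                     static_cast<long>(std::floor(p.z / leaf)));
    Acc& a = boxes[key];
    a.x += p.x; a.y += p.y; a.z += p.z; ++a.n;
  }
  std::vector<Point> out;
  out.reserve(boxes.size());
  for (const auto& kv : boxes) {
    const Acc& a = kv.second;
    out.push_back({static_cast<float>(a.x / a.n),
                   static_cast<float>(a.y / a.n),
                   static_cast<float>(a.z / a.n)});
  }
  return out;
}
```

The output contains one point per occupied box, ordered by box index.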

Code for Min Cut Segmentation
Friday, May 11, 2012

Hi everybody. I have committed the code for min cut segmentation. At the moment only basic functionality can be used. I wanted to make a generalization for adding points that are known to be points of the background. Right now this option is turned off. It works fine but I just wanted to run more tests for this functionality.

So my next step will be to test this option. After that I will start to write tutorials for the code that I’ve added (RegionGrowing, RegionGrowingRGB, MinCutSegmentation). Exams are nearing, so I will have less time to work on the TRCS, but this is temporary. I hope to finish the additional functionality I mentioned before the exams begin.

Testing, reporting and continuously learning new things
Friday, May 11, 2012

In the process of reporting for the sprint, Mattia and I have been working more in-depth on getting test results from the system and have been training a classifier. There have also been a few improvements to the system, such as a new feature that should help distinguish leaves.

I have furthermore been learning up on function pointers, functors, boost::bind, boost::function and lambda functions. They will be useful for the filters module clean up.

Volume shifting with Kinfu
Friday, May 11, 2012

After a few weeks wrapping our minds around KinFu and CUDA, we have included the shifting functionality while scanning. Using the cyclic buffer technique (introduced in our previous post), we are able to shift the cube in the world.

Since we shift in slices, some information about the scene is kept in memory. This information is used to keep track of the camera pose even after the cube has been shifted.

At this point, the data that is being ‘shifted out’ is currently lost, because we are clearing the TSDF volume slice to make space for the new information.

The next step is to extract the information from the TSDF volume before clearing it. This will allow us to compress it and save it to disk, or to a world model being saved to GPU/CPU memory.

We have some ideas on how to perform this compression and indexing, and we will explore them in the coming days.

At this point we are cleaning the code and adding useful comments. We want to push this to the trunk soon.

This video shows the shifting for a cube with volume size of 3 meters. The grid resolution is 512 voxels per axis.

This video shows the shifting for a cube with volume size of 1 meter. The grid resolution is 512 voxels per axis.

Francisco & Raphael

Setting up code structure
Friday, May 11, 2012

I’ve prepared cmake files for placing PCLModeler under apps/ and set up a very simple UI for me to try and learn vtkPolyData. I think pcl_visualizer is a good reference for learning it.

Reimplementing SSD
Monday, May 07, 2012

I am currently reimplementing SSD using PCL. It looks like I need additional datastructures defined over the octree, such as primal and dual graphs. I am investigating if the current octree version supports those graphs. Otherwise, I will implement these. As timing for this project, I am planning to deliver the software (and a report for the sponsors) by June 18.

Getting familiar to VTK
Monday, May 07, 2012

I am new to VTK, and after some digging I have to agree that VTK’s reputation for a steep learning curve is deserved. However, it is really versatile and certainly up to the needs of this project. I am still not very clear about the VTK pipeline, but I think it should be OK after I read more code examples.

Setting up development environment
Sunday, May 06, 2012

I prefer to develop under Windows then double check and commit under Linux, and I’ve managed to get pcl compiled for both. Well, there are some errors for some modules, but I will leave it for now.

Last modifications for the final program
Saturday, May 05, 2012

Over the last weeks I’ve also been busy with some work in my town. I just finished my part of the “final noise removal system” and I aim to finish the report in the next few days. Our work is finally returning promising results, but we still need feedback from our mentors to improve the solution.

Friday, May 04, 2012

During the last two weeks I have been working on the report for this sprint, which is becoming bigger and taking more time than I anticipated. I am also aiming to finish the filters module clean up which can be followed here: http://dev.pointclouds.org/issues/614.

More results of segmentation
Wednesday, May 02, 2012

Hi everybody. I have run some tests and the results are terrific. The algorithm works well even for very noisy point clouds. Here are some results:

There were some problems with the min cut algorithm. The best known is the algorithm proposed by Yuri Boykov, Olga Veksler and Ramin Zabih (based on their article “Fast Approximate Energy Minimization via Graph Cuts”), but it has some license constraints, so I used it only for testing at the beginning of the work. My code uses the boykov_kolmogorov_max_flow algorithm from the Boost Graph Library. At the beginning I tried to use push_relabel_max_flow from BGL, but it behaves a little strangely: for the same unary and binary potentials (data and smooth costs), push_relabel_max_flow gives worse results than boykov_kolmogorov_max_flow does. So I’ve decided to go with the latter.

Right now I’m going to make some last changes in the code to fit the adopted rules.

I forgot to mention that there are some problems with the Online 3D point cloud viewer. I have already written to the authors about it. The problem appears only in Mozilla Firefox; it works fine with Google Chrome. So I hope everybody is able to see my point clouds.

Point cloud streaming to mobile devices with real-time visualization
Wednesday, May 02, 2012

The following video demonstrates point cloud streaming to mobile devices with real-time visualization. The point clouds are streamed from a desktop server to mobile devices over wi-fi and 4G networks. Point clouds are captured using the PCL OpenNI grabber with a Microsoft Kinect sensor. Visualization on the mobile devices is performed inside KiwiViewer using the VES and Kiwi mobile visualization framework.

The rendering frame rate is decoupled from the streaming rate. In the video, the app streams roughly ten clouds per second. In a follow up blog post I’ll describe the implementation of the current code and list future development goals. In the meantime, you can reach me on the PCL developers mailing list or the VES mailing list. For more information about VES and Kiwi, the mobile visualization framework used in the demonstration please see the VES wiki page

point cloud streaming app on mobile phone
3D Face Detection
Wednesday, May 02, 2012

Here are some results regarding 3D face detection:

To speed things up I switched to normal estimation based on integral images (20fps). I found and fixed some bugs regarding the OMP version of SHOT (the SHOT description was being OMP’ed but the reference frame computation was not) and now I am getting almost 1fps for the whole recognition pipeline. Was not convinced about the results with RNN so I decided to give Kmeans a try (no conclusion yet regarding this).

The code is far from perfect (there are a few hacks here and there) but the results are starting to look decent. Regarding further speed-ups, we should consider moving to the GPU; however, porting SHOT might not be the lightest task. The next steps are stabilizing the code and doing some commits, and then we will see whether we go for the GPU or move forward to pose detection and hypotheses verification.

Friday, April 27, 2012

It’s been some time since our last post, and we have been learning a whole lot about CUDA and Kinfu itself.

Two big points have been addressed so far:

# We analyzed the application from a complete perspective, to try to determine which are its strong points and its improvement areas. What we found when analyzing Kinfu looks somewhat like this figure:


The first impression is that Kinfu has clearly defined modules, which are reflected in the code distribution among the different files. However, the application requires that the functionalities and information shared between these modules are tightly coupled. It can be seen in several parts of the code that the parameters just keep cascading down the calls all the way from kinfu_app.cpp (highest level) to the GPU kernel calls (e.g. raycasting or integration operations). The main reason for this constant copying of parameters and the long parameter lists is precisely that the modules are separated from each other.

This might sound basic to experienced CUDA users, but we find it good to recall it here, as this was one of our main “aha moments”.

For example, if one were to declare a variable in internal.h, which is a header file included in all modules (via device.hpp), what you would obtain is a copy of this variable for every CUDA module, instead of getting only one of them accessible from all the modules. Once again, this is the result of having the modules compiled independently.

After discussing with Anatoly, it has been defined that in the mid-term all the internal functionalities of Kinfu will be consolidated into a single TsdfVolume class which will contain all operations for integration, ICP, raycasting, and memory handling. This will result in a more readable code, avoid long parameter lists between operations, while keeping the real-time performance of the current implementation. In other words, at a higher level the code will be clearer and more concise, while at low level it will have the same performances.

# We have been working on the solution to tackle the scalability limitations in Kinfu. The expected behavior can be described as follows: Whenever the scene is being scanned, the user may approach the borders of the current cube. At that moment, part of the existing information about the cube must be compressed and stored to either GPU memory, Host RAM, or HDD to be saved. For now, the latter two are not in scope.

The cube that we are reconstructing is shifted, but it is partially overlapped with the previous cube. In other words, we will see a scene that is partially empty, but still contains part of the information of the previous cube. The origin of the TsdfVolume will be shifted as well, depending on where we approached the border of the initial scene. We believe that adding this overlapped information will help to estimate the pose of the camera once the new cube is loaded. This shift is handled via a circular 3D buffer implementation in the TsdfVolume class.
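
The circular buffer idea can be sketched on the CPU as follows. This is only an illustration under simplifying assumptions, not the TsdfVolume code: the class CyclicVolume and its methods are hypothetical names, the volume stores plain floats, and only a shift along x is shown (y and z would follow the same pattern).

```cpp
#include <cstddef>
#include <vector>

// Cyclic voxel grid: shifting the cube moves an origin offset instead of copying voxels.
class CyclicVolume {
public:
  explicit CyclicVolume(int n)
    : n_(n), ox_(0), oy_(0), oz_(0),
      data_(static_cast<std::size_t>(n) * n * n, 0.0f) {}

  // Access by logical coordinates (0..n-1) relative to the current cube origin.
  float& at(int x, int y, int z) { return data_[index(x, y, z)]; }

  // Shift the cube by dx voxels along x (0 <= dx <= n). The slices that leave
  // the cube are cleared and then reused for the newly exposed region.
  void shiftX(int dx)
  {
    for (int s = 0; s < dx; ++s)
      for (int y = 0; y < n_; ++y)
        for (int z = 0; z < n_; ++z)
          at(s, y, z) = 0.0f;  // slice being shifted out
    ox_ = (ox_ + dx) % n_;     // advance the origin, wrapping around
  }

private:
  // Maps logical coordinates to a position in the flat storage.
  std::size_t index(int x, int y, int z) const
  {
    const int px = (x + ox_) % n_, py = (y + oy_) % n_, pz = (z + oz_) % n_;
    return (static_cast<std::size_t>(pz) * n_ + py) * n_ + px;
  }

  int n_, ox_, oy_, oz_;
  std::vector<float> data_;
};
```

After a shift, the voxels that overlap with the previous cube are still addressable at their new logical coordinates without any memory copy, while the cleared slices appear at the far end of the cube.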

For clarity, we show this process in the figure below.


Francisco & Raphael

Pressing On
Wednesday, April 25, 2012

Back from vacation and getting into some of the more exciting stuff that is required to get our out-of-core viewer up and fully functional. There has been quite a bit of talk on implementation and this week I plan on adding three important features:

  • Frustum Culling - Providing a way to query the octree for nodes on disk given the camera frustum. This will provide a good start in determining what data needs to be streamed in.
  • Threading - All data processing needs to happen in a separate thread as to not block the main UI.
  • Caching - Implement some type of Least Recently Used(LRU)/Least Frequently Used(LFU) cache to store streamed data and discard the less relevant.
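
As a rough idea of what the LRU variant of such a cache could look like, here is a minimal sketch. It is not the viewer's implementation; the class name LruCache, the string keys (standing in for node identifiers) and the string payloads (standing in for streamed point data) are all assumptions of this example.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Least-recently-used cache keyed by a node identifier.
class LruCache {
public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

  // Returns a pointer to the cached value and marks it as most recently used,
  // or nullptr if the key is not cached.
  const std::string* get(const std::string& key)
  {
    auto it = index_.find(key);
    if (it == index_.end()) return nullptr;
    items_.splice(items_.begin(), items_, it->second);  // move to the front
    return &it->second->second;
  }

  // Inserts or updates an entry, evicting the least recently used one when full.
  void put(const std::string& key, const std::string& value)
  {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = value;
      items_.splice(items_.begin(), items_, it->second);
      return;
    }
    if (capacity_ == 0) return;
    if (items_.size() == capacity_) {
      index_.erase(items_.back().first);
      items_.pop_back();
    }
    items_.emplace_front(key, value);
    index_[key] = items_.begin();
  }

private:
  using Item = std::pair<std::string, std::string>;
  std::size_t capacity_;
  std::list<Item> items_;  // front = most recently used
  std::unordered_map<std::string, std::list<Item>::iterator> index_;
};
```

The list keeps the usage order and the hash map gives constant-time lookup, so both get and put avoid scanning the cache.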

I’ve spent most of my time recently learning and understanding VTK and getting the new mapper working with Radu to get this integrated into the pcd_viewer. The current viewer doesn’t handle large datasets and this integration will allow for much heavier data sets in realtime.

The new VTK classes are far from finished and will require quite a bit of work to handle all the data VTK can throw at it. I’ve sent Marcus, a core VTK developer, my work in progress in hopes to get some help on proper and stable integration with VTK. Here are some of the todo items still to be addressed.

vtkVertexBufferObject - Vertex Buffer Object Wrapper

  • Needs to support both VTKs float and double data types
  • Needs to support indices via GetVerts and *vtkIdType
  • First implementation supports vertices, indices and colors. Need to add all of VTKs attributes, i.e. normals, textures, ...

vtkVertexBufferObjectMapper - Vertex Buffer Object Rendering Wrapper

  • Need the ability to set vertex, fragment and geometry shader (Currently uses simple shaders)
  • The current mapper has vertex, indices and colors VBOs. Need to support all of VTKs attributes, i.e. normals, textures, ...
  • Determining whether to pull point/cell data and if colors are present on the data passed in.
  • Handle all VTK data types to make sure calls to glVertexAttribPointer are correct and consistent.
Laplacian matrix
Wednesday, April 25, 2012

Last week, I wrote a program for calculating the Laplacian matrix with uniform and cotangent weights (described in the article Laplacian Mesh Optimization). Now I want to write a program for smoothing based on the Laplacian matrix.
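
For the uniform-weight case, the construction can be sketched as below. This is an illustrative dense version, not the program mentioned above; the function name uniformLaplacian is hypothetical, and the sign convention L = I - D^{-1} A (each row sums to zero) is an assumption of this example. Cotangent weights are not covered.

```cpp
#include <array>
#include <cstddef>
#include <set>
#include <vector>

// Builds the dense uniform-weight Laplacian L = I - D^{-1} A of a triangle mesh,
// where A is the vertex adjacency matrix and D the diagonal matrix of vertex degrees.
std::vector<std::vector<double>> uniformLaplacian(
    std::size_t vertex_count,
    const std::vector<std::array<std::size_t, 3>>& triangles)
{
  // Collect the one-ring neighbours of every vertex from the triangle edges.
  std::vector<std::set<std::size_t>> neighbours(vertex_count);
  for (const auto& t : triangles)
    for (int i = 0; i < 3; ++i) {
      const std::size_t a = t[i], b = t[(i + 1) % 3];
      neighbours[a].insert(b);
      neighbours[b].insert(a);
    }

  std::vector<std::vector<double>> L(vertex_count,
                                     std::vector<double>(vertex_count, 0.0));
  for (std::size_t i = 0; i < vertex_count; ++i) {
    if (neighbours[i].empty()) continue;  // isolated vertex: leave its row at zero
    L[i][i] = 1.0;
    const double w = 1.0 / static_cast<double>(neighbours[i].size());
    for (std::size_t j : neighbours[i]) L[i][j] = -w;
  }
  return L;
}
```

Multiplying this matrix by the vertex positions gives, for each vertex, its offset from the average of its neighbours. A real mesh would call for a sparse matrix instead of the dense one used here.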

Porting OpenNI to Android 4.0 + Microsoft Kinect + Tegra 2 + Sample Code
Wednesday, April 25, 2012

In this tutorial, we will show you how to compile the OpenNI shared libraries and install them on your Android devices. We will also show you how to extract the depth map information from the Microsoft Kinect with the SensorKinect driver (note: the OpenNI framework does not come with any drivers! Instead, it dynamically loads the modules at runtime!). Before we start, I assume that we have already installed the following packages. I have tested this setup on Mac OS X 10.6.8, and I believe Linux users should not have any problems replicating it. Any Windows users? :)

System Requirements:

First, let’s get the OpenNI sources from the Git repository

cd ~/NVPACK/android-ndk-r7b/sources
cd OpenNI/Platform/Android/jni

If everything goes well, we will see the following files in this directory.

ls ~/NVPACK/android-ndk-r7b/sources/OpenNI/Platform/Android/libs/armeabi-v7a/
Sample-SimpleRead libOpenNI.so libnimRecorder.so niReg
Sample-SimpleSkeleton libnimCodecs.so libusb.so
libOpenNI.jni.so libnimMockNodes.so niLicense

Now, we will compile the SensorKinect Driver.

export NDK_MODULE_PATH=$HOME/NVPACK/android-ndk-r7b/sources/OpenNI/Platform/Android/jni
mkdir ~/NVPACK/openni
cd ~/NVPACK/openni
cd SensorKinect
git checkout faf4994fceba82e6fbd3dad16f79e4399be0c184
cd Platform/Android/jni

Again, if everything goes well, the ndk-build will create another set of .so shared libraries files in this directory

ls ~/NVPACK/openni/SensorKinect/Platform/Android/libs/armeabi-v7a
libOpenNI.so libXnDDK.so libXnDeviceSensorV2.so libusb.so
libXnCore.so libXnDeviceFile.so libXnFormats.so

Finally, we are ready to push these libraries on the Android device. One problem is the /system directory is read-only. Therefore, we have to remount this directory first by

adb shell
mount -o remount rw /system
mkdir /data/ni

Then, we push the packages to the device with these commands (these are our previously compiled .so files!)

adb push libOpenNI.so /system/lib
adb push libOpenNI.jni.so /system/lib
adb push libXnCore.so /system/lib
adb push libXnDDK.so /system/lib
adb push libXnDeviceFile.so /system/lib
adb push libXnDeviceSensorV2.so /system/lib
adb push libXnFormats.so /system/lib
adb push libusb.so /system/lib
adb push libnimCodecs.so /system/lib
adb push libnimRecorder.so /system/lib
adb push libnimMockNodes.so /system/lib

At runtime, the OpenNI framework will look for these shared libraries. To tell OpenNI which modules to load, we need to push this .xml file to this directory. It is very important that all these files are put in the proper directories!

adb push data/in/modules.xml /data/ni

Here is example modules.xml file we have used.

<Module path="/system/lib/libnimMockNodes.so" />
<Module path="/system/lib/libnimCodecs.so" />
<Module path="/system/lib/libnimRecorder.so" />
<Module path="/system/lib/libXnDeviceSensorV2.so" configDir="/data/ni/" />
<Module path="/system/lib/libXnDeviceFile.so" configDir="/data/ni/" />

As we can see from the modules.xml, the XnDeviceSensorV2 will look for its configuration files in /data/ni/, so we edit GlobalDefaultsKinect.ini and push it there.

vim ~/NVPACK/openni/SensorKinect/Data/GlobalDefaultsKinect.ini
#and we will need to set the flag UsbInterface to 1.
#assume you are in that directory
adb push GlobalDefaultsKinect.ini /data/ni/

Lastly, we will copy the SamplesConfig.xml to the /data/ni/ directory as well. You can find this file from the sample code in OpenNI or from our svn repository.

adb push SamplesConfig.xml /data/ni/

That’s it! Now we have the latest OpenNI + Kinect driver compiled and installed on our Android device! In the upcoming tutorial, I will explain how we can use these drivers with the NDK, with sample code! More tricks are coming up (e.g., mount -o devmode=0666 -t usbfs none /proc/bus/usb)!

If you would like to check whether the modules are installed successfully, you can try running niReg on your Android device.

adb push niReg /system/bin

and run

root@android:/ # niReg -l (under adb shell)

and you should see something similar to...

. . .
/system/lib/libXnDeviceSensorV2.so (compiled with OpenNI
Device: PrimeSense/SensorV2/
Depth: PrimeSense/SensorV2/
Image: PrimeSense/SensorV2/
IR: PrimeSense/SensorV2/
Audio: PrimeSense/SensorV2/
. . .

Hirotaka Niisato (http://www.hirotakaster.com/) has also provided excellent tutorials on how to compile OpenNI on Android! I would like to thank him for providing some of the basic scripts for compiling the libraries! ;) However, I had trouble running the sample code he provided, so I have ported Sample-NiSimpleRead to run on Android instead. ;) Stay tuned!

Here is the complete script we have used: copy and paste these and that will get you started.

#adb shell
#mkdir /data/ni
#mount -o remount rw /system
#mount -o devmode=0666 -t usbfs none /proc/bus/usb

adb push system/lib/libOpenNI.so /system/lib
adb push system/lib/libOpenNI.jni.so /system/lib
adb push system/lib/libXnCore.so /system/lib
adb push system/lib/libXnDDK.so /system/lib
adb push system/lib/libXnDeviceFile.so /system/lib
adb push system/lib/libXnDeviceSensorV2.so /system/lib
adb push system/lib/libXnFormats.so /system/lib
adb push system/lib/libusb.so /system/lib
adb push system/lib/libnimCodecs.so /system/lib
adb push system/lib/libnimRecorder.so /system/lib
adb push system/lib/libnimMockNodes.so /system/lib
adb push modules.xml /data/ni/
adb push GlobalDefaultsKinect.ini /data/ni/
adb push SamplesConfig.xml /data/ni/
Wednesday, April 25, 2012

Today I’m recording a one-person test dataset which also contains Vicon measurement data. I’ll use the plug-in-gait model from Vicon to have something to compare the behavior of our algorithm against. As a first step this will just be a single person, in order to see how this data is best recorded and synchronised before running a more extensive experiment later on. In the meanwhile, with the help of Anatoly’s CUDA experience, the peopleDetector now runs at a stable 6-8fps, which can be considered adequate for online use. My hope is that bringing in the prior fusion will only help speed up this process.

Debugging and varying weights
Tuesday, April 24, 2012

Hi everybody. I have finally finished the work on the code for min cut segmentation. It took so long because of some bugs. The simplest mistakes are always hard to find. Anyway, the code is ready and I even have some segmentation results. The algorithm requires point clouds with the floor plane and similar structures already removed, so first of all I deleted the floor plane. You can see the result in the point cloud viewer.

Right now I am trying to find the best unary and binary potentials (edge weights), because those that were mentioned in the article are not good enough. I hope that by the end of this week I will be able to find them and upload the resulting code.

Face Detection with Bag Of (Geometric) Words
Tuesday, April 24, 2012

Yesterday and today, I have been working on face detection and went ahead implementing a Bag Of Words approach. I will summarize briefly the steps I followed:

  1. Generate some synthetic training data of faces (as explained in past posts) and some data of other kinds of objects.
  2. Compute a 3D feature (I tested FPFH and SHOT features) on the generated views.
  3. Codebook generation. I implemented a naive version of RNN clustering (Leibe 05). Compared to K-means, RNN does not require the user to input the length of the codebook. It is controlled by a similarity threshold indicating when two clusters are similar enough to be merged together.
  4. For the training views belonging to faces, compute the Bag Of Words (Sivic, Zisserman 03). Roughly, this captures which codewords are activated by faces and are not activated by other objects.
  5. For recognition, compute the desired features on the Kinect cloud and find the nearest neighbour word in the codebook.
  6. Run 2D sliding windows (after backprojecting the feature positions if needed) and compute the BoW of the features falling into each sliding window.
  7. Compute the similarity between this BoW and the BoWs from the training faces. For now I take the max and save it.
  8. Sort candidates by similarity and post-process as much as you like.
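
Steps 5 to 7 can be sketched as follows. This is a simplified illustration, not the code used for the results: features are plain float vectors, the codebook is assumed to be non-empty, and histogram intersection is used as the similarity measure only as an assumption of this example (the post does not say which measure is used). All names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

using Feature = std::vector<float>;

// Index of the codeword closest to `f` (squared Euclidean distance).
std::size_t nearestWord(const Feature& f, const std::vector<Feature>& codebook)
{
  std::size_t best = 0;
  float best_d = std::numeric_limits<float>::max();
  for (std::size_t w = 0; w < codebook.size(); ++w) {
    float d = 0.0f;
    for (std::size_t k = 0; k < f.size(); ++k) {
      const float diff = f[k] - codebook[w][k];
      d += diff * diff;
    }
    if (d < best_d) { best_d = d; best = w; }
  }
  return best;
}

// Normalised histogram of codeword activations for the features in one window.
std::vector<float> bagOfWords(const std::vector<Feature>& features,
                              const std::vector<Feature>& codebook)
{
  std::vector<float> hist(codebook.size(), 0.0f);
  for (const Feature& f : features) hist[nearestWord(f, codebook)] += 1.0f;
  if (!features.empty())
    for (float& h : hist) h /= static_cast<float>(features.size());
  return hist;
}

// Histogram intersection; 1 means identical normalised histograms.
float similarity(const std::vector<float>& a, const std::vector<float>& b)
{
  float s = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) s += std::min(a[i], b[i]);
  return s;
}
```

A window's score would then be the maximum of similarity() over the training-face histograms.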

Right now, I am not doing any post-processing and am just visualizing the first 5 candidates (the sliding windows with the highest similarity). You can see some results in the following images. Red points indicate computed features (that are not found in the visualized candidates) and green spheres those that vote for a face within the candidates list. The next step will be some optimizations to speed up detection. It currently takes between 3 and 6 s depending on the amount of data to be processed (ignoring points far away from the camera and downsampling).

_images/face_detection3.png _images/face_detection4.png _images/face_detection5.png
Correspondence rejection based on computing an optimal inlier ratio
Sunday, April 22, 2012

This method extends the existing outlier rejection method ‘CorrespondenceRejectionTrimmed’, which is based on the fraction of inliers. The optimal inlier ratio is computed internally instead of using a user-specified value. The method is described in the paper ‘Outlier Robust ICP for Minimizing Fractional RMSD’ by Jeff M. Phillips et al. I am adding the test code in the test folder. This wraps up the porting of correspondence rejection methods from libpointmatcher to PCL. Next I am planning to add some filtering methods. Before that I will be adding tutorials on using the registration pipeline in PCL.
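
The idea of choosing the inlier ratio by minimizing a fractional RMSD can be sketched as below. This is a standalone illustration, not the PCL or libpointmatcher code; the function name optimalInlierRatio is hypothetical, and the exponent lambda is left as a caller-supplied parameter since its value is an assumption here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the inlier fraction f in (0, 1] that minimises
//   FRMSD(f) = (1 / f^lambda) * sqrt(mean of the k smallest squared distances),
// with k = f * n, evaluated at every k = 1..n.
double optimalInlierRatio(std::vector<double> distances, double lambda)
{
  if (distances.empty()) return 0.0;
  std::sort(distances.begin(), distances.end());
  const double n = static_cast<double>(distances.size());
  double best_f = 1.0;
  double best_cost = std::numeric_limits<double>::max();
  double sum_sq = 0.0;  // running sum of the k smallest squared distances
  for (std::size_t k = 1; k <= distances.size(); ++k) {
    sum_sq += distances[k - 1] * distances[k - 1];
    const double f = static_cast<double>(k) / n;
    const double cost = std::sqrt(sum_sq / static_cast<double>(k)) / std::pow(f, lambda);
    if (cost < best_cost) { best_cost = cost; best_f = f; }
  }
  return best_f;
}
```

The 1/f^lambda factor penalizes small inlier sets, so the minimum is reached where adding further correspondences would raise the residual faster than the penalty drops. Correspondences beyond the chosen fraction would then be rejected as outliers.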

Sunday, April 22, 2012
This is an overview of things I will focus on the next week:
  • Add a probabilistic DeviceArray into the RDF CUDA code
  • Expand each tree vote into a more probabilistic 1/NR_TREES = 25% vote in this DeviceArray
  • Add a discretisation function (in a CUDA kernel) for this array, giving the final votes

This expansion allows the detector to do sensor fusion based on data that can be fetched from other detectors (like face-detector). A more theoretical explanation to this will follow.
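
On the CPU, the voting and discretisation steps could look roughly like this. It is only a sketch of the idea, not the planned CUDA code: NR_LABELS, the function names and the multiplicative fusion with a prior are assumptions of this example.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t NR_TREES = 4;   // gives the 1/NR_TREES = 25% vote
constexpr std::size_t NR_LABELS = 3;  // hypothetical number of labels

using Probabilities = std::array<float, NR_LABELS>;

// Each tree adds 1/NR_TREES to the label it predicts for this pixel.
Probabilities accumulateVotes(const std::array<std::size_t, NR_TREES>& tree_labels)
{
  Probabilities p{};  // zero-initialised
  for (std::size_t label : tree_labels) p[label] += 1.0f / NR_TREES;
  return p;
}

// Element-wise fusion with a prior from another detector, followed by the
// discretisation step: pick the label with the highest fused score.
std::size_t discretise(const Probabilities& votes, const Probabilities& prior)
{
  std::size_t best = 0;
  float best_score = -1.0f;
  for (std::size_t l = 0; l < NR_LABELS; ++l) {
    const float score = votes[l] * prior[l];
    if (score > best_score) { best_score = score; best = l; }
  }
  return best;
}
```

With a uniform prior this reduces to a plain majority vote; a non-uniform prior from another detector can change the winning label.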

Porting SSD
Sunday, April 22, 2012

On Monday (4/16), I had a short discussion with my mentor about how to improve surface reconstruction code in PCL. We started porting our SSD software based on the work SSD: Smooth Signed Distance Surface Reconstruction. The code hasn’t been pushed to the trunk yet. Once this is ready, it will provide a base for us to make further improvements. I am going to make another blog post soon to share our improvement plans.

First blog entry
Saturday, April 21, 2012

This is my first blog post. For the past two weeks I’ve been integrating the code for run-time RDF labeling into PCL together with Anatoly. The code is now arranged in the PeopleDetector class and all the CUDA code has been converted to use the DeviceArray structures. For the Seeded Hue Segmentation (SHS) and Euclidean Labeled Clustering (ELEC), a custom function was implemented specifically for our use case. For interested users: please keep using the original version in trunk/segmentation and don’t use the one in gpu/people, as that one is currently written specifically for our use case.

Completing modules for PCL
Friday, April 20, 2012

Over the last few days I’ve been working on two methods I would like to put into PCL. The first concerns the contour extraction of objects in a cloud; the second is the Support Vector Machine. Two issues have been opened for them:

Starting tomorrow I’ll work on the system that Florentinus and I are setting up. I will have to find a smart way to easily train a classifier.

Publication and face detection
Friday, April 20, 2012

This week I have been working on a publication which took away most of my time. However, we found some time to chat with Federico about face detection. We decided to try next week a 3D features classification approach based on a bag of words model. Such an approach should be able to gracefully deal with intraclass variations and deliver regions of interest with high probability of containing faces on which we can focus the most expensive recognition and pose estimation stage.

Finishing up segmentation
Thursday, April 19, 2012

I have been finishing up the segmentation steps of the system. The following is a typical result when used on the Trimble outdoor sets using the default parameters:

These results are downsampled from the original and the ground segmentation has already taken place. The clusters marked as red are not passed to the SVM classifier. They are either too small and will be marked as isolated and removed without the need for classification, or they are too large in which case they will be classified as background and are never removed.
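The size-based triage described above can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the system: the function name `triage_clusters` and the thresholds `min_size` and `max_size` are hypothetical, and clusters are assumed to be lists of point indices.

```python
def triage_clusters(clusters, min_size, max_size):
    """Split clusters into isolated, background and SVM candidates by point count.

    min_size and max_size are hypothetical thresholds chosen for illustration.
    """
    isolated, background, candidates = [], [], []
    for cluster in clusters:
        if len(cluster) < min_size:
            isolated.append(cluster)      # too small: removed without classification
        elif len(cluster) > max_size:
            background.append(cluster)    # too large: kept as background
        else:
            candidates.append(cluster)    # passed on to the classifier
    return isolated, background, candidates


clusters = [[0], [1, 2, 3], list(range(10, 40))]
isolated, background, candidates = triage_clusters(clusters, min_size=2, max_size=20)
```

Only the `candidates` list would reach the SVM; the other two lists correspond to the clusters drawn in red.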

I upgraded the system with a very basic over-segmentation detection step that performs reasonably well. Furthermore, most of my time was spent getting the parameterization right: making all parameters depend on as few global parameters as possible while still allowing the clustering behaviour to be varied widely and intuitively. Since we are having a chat with the mentors soon, I will discuss these topics in the report that I will write for that chat, which is what I will be working on for the next few days.

Correspondence rejection based on the orientation of normals
Wednesday, April 18, 2012

As mentioned in my previous blog post, I added a new correspondence rejection method based on the orientation of the normals. A threshold on the dot product of the normals at corresponding points is used as the criterion for correspondence rejection. I am adding sample code demonstrating the usage of the new classes in the test folder. It should be in the trunk after the code review. I will be adding a couple of new rejection methods over the next week to wrap up the port of correspondence rejection methods from libpointmatcher.

Improved segmentation step
Tuesday, April 17, 2012

I have worked on improving the segmentation steps of the system. Initially I wanted to do a separate sub clustering after the euclidean clustering. However, the version I have now performs the advanced segmentation simultaneously with the euclidean clustering. In the future I might add this as a new class to PCL since it could be useful for any type of conditional region growing.

For our purposes this conditional region growing currently checks distance, intensity difference and curvature difference with respect to the candidate points, which results in better clustering than before. However, the balance between over- and under-segmentation is still as tricky as before. Ideally, the trees are clustered into separate leaves so that the SVM has a lot of different training candidates from one set. It is still nearly impossible to get this clustering without over-segmenting the non-noise parts of the scene. I did manage to condense all the parameters for the segmentation step into one parameter, indicating clustering aggressiveness, so that users can tweak this balance themselves.
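A rough sketch of such a conditional region growing, in plain Python with a brute-force neighbour search, could look as follows. It is an illustration under assumptions rather than the actual implementation: the function name and the three thresholds `max_dist`, `max_di` and `max_dc` are hypothetical, and a real version would use a k-d tree or octree for the neighbour search.

```python
import math
from collections import deque


def conditional_region_growing(points, intensity, curvature, max_dist, max_di, max_dc):
    """Label points so that a point joins a cluster only if all three conditions hold."""
    n = len(points)
    labels = [-1] * n
    cluster = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(n):  # brute force; a spatial index would replace this loop
                if labels[j] != -1:
                    continue
                close = math.dist(points[i], points[j]) <= max_dist
                similar_intensity = abs(intensity[i] - intensity[j]) <= max_di
                similar_curvature = abs(curvature[i] - curvature[j]) <= max_dc
                if close and similar_intensity and similar_curvature:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    return labels


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (5.0, 0.0, 0.0)]
intensity = [10.0, 11.0, 50.0, 10.0]
curvature = [0.01, 0.01, 0.01, 0.01]
labels = conditional_region_growing(points, intensity, curvature,
                                    max_dist=0.15, max_di=5.0, max_dc=0.05)
```

The third point is spatially adjacent to the second but differs strongly in intensity, so it starts its own cluster. Plain Euclidean clustering would have merged the first three points.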

An idea I still want to investigate is to use a pyramid or staged system that will only further sub-segment under certain conditions. I think these conditions will need to be more complex than what I am using now (intensity, curvature and location), although making them too complex could limit the applicability of the system.

Out of core node data to PCD
Monday, April 16, 2012

Over the past week, I finished changing the data stored at each node from a dumped binary C-struct to a PCD file. This should help with debugging, and will make manipulating the data at each node, as well as pre-rendering, easier. I need to clean up the code a bit more before I commit to the repository.

In addition to cleaning up before committing, a few things that remain outstanding in this change are:

  • block reading of point cloud files
  • efficient appending to PCD files
  • a more detailed consideration of how to take advantage of compression methods
  • tending to the insertion and query methods of the octree_ram container class for use with PCD files

Principal Component Analysis for edges and contours extraction
Monday, April 16, 2012

PCA (principal component analysis) is a simple method for extracting information from a set of data. Geometrically, its objective is to express the data in the reference axes that best highlight its structure.

The idea was to assign a meaning to each point depending on its position, using PCA. For each point I extracted the k nearest neighbors, computed the PCA (about the centroid) and analyzed the eigenvalues. While the eigenvectors give the directions of the axes along which the data extend, the eigenvalues give the extent of the data along those axes.

From a geometrical point of view, three eigenvalues of equal magnitude represent a uniform distribution over a solid; a distribution limited to two axes represents a plane, while a distribution concentrated on a single eigenvalue represents a line. Applying this to a point cloud, points close to the edges have one eigenvalue much larger than the others. For the inner points, assuming a uniform point density, the eigenvalues are more evenly distributed across the three eigenvectors.

Normalizing the three eigenvalues so that they sum to 1, at a contour or corner of an object the largest eigenvalue accounts for more than 2/3 (about 67%) of the total spread of the points. The results are shown below.
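The 2/3 test can be sketched without any linear algebra library, since only the largest eigenvalue and the sum of all eigenvalues (the trace of the covariance matrix) are needed. The sketch below is plain Python and uses power iteration to approximate the largest eigenvalue; this is a stand-in chosen for brevity, not necessarily how the filter was implemented. The function name, the start vector and the iteration count are assumptions.

```python
import math


def largest_eigenvalue_share(neighbours, iterations=100):
    """Return lambda_max / (lambda_1 + lambda_2 + lambda_3) for a point neighbourhood."""
    n = len(neighbours)
    centroid = [sum(p[k] for p in neighbours) / n for k in range(3)]
    # 3x3 covariance matrix of the neighbourhood about its centroid.
    cov = [[sum((p[a] - centroid[a]) * (p[b] - centroid[b]) for p in neighbours) / n
            for b in range(3)] for a in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]  # equals the sum of the eigenvalues
    if trace == 0.0:
        return 0.0
    v = [1.0, 0.9, 0.8]  # arbitrary start vector (assumed not orthogonal to the main axis)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        lam = norm  # for a unit vector v, |cov v| converges to the largest eigenvalue
    return lam / trace


line = [(float(t), 0.0, 0.0) for t in range(10)]
cube = [(float(x), float(y), float(z)) for x in range(3) for y in range(3) for z in range(3)]
line_share = largest_eigenvalue_share(line)
cube_share = largest_eigenvalue_share(cube)
line_is_edge = line_share > 2.0 / 3.0
cube_is_edge = cube_share > 2.0 / 3.0
```

Points on a line put all of the spread on one axis, so the share is close to 1 and the test fires. A uniformly filled cube spreads evenly over three axes, so the share is close to 1/3 and the test does not fire.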

_images/1.jpg _images/2.jpg

Description of the algorithm
Monday, April 16, 2012

The steps of the chosen algorithm:


User-program interaction:

  1. The user must choose the most appropriate template for the input point cloud.
  2. The user must identify and mark pairs of corresponding points on the template and the point data, and define a local frame for every marked point (Fig. 1 (a, b)).

The program:

  1. From the selected correspondences, we compute the initial deformation of the template. We compute Laplacian coordinates of the template and estimate local rotations for every pair of corresponding points.
  2. Make the initial deformation (Fig. 1 (c)).
  3. We estimate a global scaling factor for the template and apply it to the Laplacian coordinates, because the template and the input data may be (and generally are) scaled differently, which may distort the resulting shape in an unacceptable way. Make a new deformation (Fig. 1 (d)).
  4. An iterative process moves the template closer to the data points, guided by local correspondences which are established from simple heuristics. This local matching is motivated by iterative closest point (ICP) algorithms for finding (rigid) transformations for shape registration (Fig. 1 (e-g)).
  5. We improve the remaining regions of the deformed template for which no counterparts exist in the data (Fig.1 (h)).
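As a small illustration of the quantities in steps 1 and 3, the sketch below computes uniform-weight Laplacian coordinates (each vertex minus the mean of its neighbours) and a global scale factor in plain Python. The scale estimator shown here, the ratio of mean pairwise distances between the marked points, is only an assumed stand-in and not necessarily the estimator used in the paper; all function names are hypothetical.

```python
import math


def laplacian_coordinates(vertices, neighbours):
    """Uniform-weight Laplacian coordinates: vertex minus the mean of its 1-ring."""
    deltas = []
    for i, v in enumerate(vertices):
        ring = neighbours[i]
        mean = [sum(vertices[j][k] for j in ring) / len(ring) for k in range(3)]
        deltas.append(tuple(v[k] - mean[k] for k in range(3)))
    return deltas


def global_scale(template_marks, data_marks):
    """Assumed stand-in: ratio of mean pairwise distances between marked points."""
    def mean_pairwise(points):
        dists = [math.dist(p, q) for a, p in enumerate(points) for q in points[a + 1:]]
        return sum(dists) / len(dists)
    return mean_pairwise(data_marks) / mean_pairwise(template_marks)


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
deltas = laplacian_coordinates(vertices, neighbours)

scale = global_scale([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                     [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)])
scaled_deltas = [tuple(scale * c for c in d) for d in deltas]
```

Scaling the Laplacian coordinates before solving for the deformed vertex positions keeps the local detail of the template proportional to the size of the data.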

In the first phase of implementing the described algorithm, I will write code to compute the initial approximation and estimate the Laplacian coordinates.

Object recognition framework (Global Pipeline II)
Saturday, April 14, 2012

Hi again, first I need to correct myself regarding my last post, where I claimed that this week I would be working on face detection. The experiments and tests I was doing on CVFH ended up taking most of my time, but the good news is that I am getting good results and the changes increased the descriptiveness and robustness of the descriptor.

Mainly, a unique and repeatable coordinate frame is built at each CVFH smooth cluster of an object (I also slightly modified how the clusters are computed), enabling a spatial description of the object with respect to this coordinate frame. The other good news is that this coordinate frame is also repeatable under roll rotations and thus can replace the camera roll histogram, which in some situations was not accurate or discriminative enough and yielded several roll hypotheses that needed further post-processing and inevitably slowed down the recognition.

These are some results using the first 10 nearest neighbours, pose refinement with ICP and hypotheses verification using the greedy approach. The recognition time varies between 500ms and 1s per object, where approx 70% of the time is spent on pose refinement. The training set contains 16 objects.


Here are a couple of scenes without the pose refinement stage, where it can be observed that the pose obtained by aligning the reference frame is accurate enough for the hypotheses verification to select a good hypothesis. In this case, the recognition time varies between 100ms and 300ms per object.


I am pretty enthusiastic about the modifications and believe that with some GPU optimizations (mainly regarding nearest neighbour searches for ICP and hypotheses verification) a real-time (or at least near real-time) implementation should be possible.

Regarding the local pipeline, I implemented a new training data source for registered views obtained with a depth device. In this case, the local pipeline can be used as usual without needing 3D meshes of the objects to recognize. The input is represented as pcd files (segmented views of the object) together with a transformation matrix that aligns a view to a common coordinate frame. This makes it easy to train objects in our environment (Kinect + calibration pattern) and allows the use of RGB/texture cues (if available in the sensor) that were not available using 3D meshes. The next image shows an example of a quick experiment where four objects were scanned from different viewpoints using a Kinect and placed into a scene with some clutter in order to be recognized.


The red points represent the overlaid model after being recognized using SHOT, geometric correspondence grouping, SVD, ICP and Papazov’s verification. The downside of not having a 3D mesh is that the results do not look so pretty :) Notice that such an input could also be used to train the global pipeline. Anyway, I will be doing a “massive” commit later next week with all these modifications. GPU optimizations will be postponed for a while, but help is welcome after the commits.

First Automated Noise Filtering program
Thursday, April 12, 2012

Yesterday, I sent to Jorge the first revision of the program that implements the pipeline designed for the removal of vegetation and ghost points. The program, designed by me and Florentinus, segments the image into large groups which are then classified.

The biggest problem encountered is related to the number of samples needed for optimal training of the classifier. When the cloud is divided into large groups, there are too few samples to train it properly. We are therefore thinking about a more refined method for recognizing the problem areas.

Since Florentinus will be busy this week, I will concentrate on implementing a filtering algorithm based on Principal Component Analysis. With this approach it is possible to filter lines (1D), flat surfaces (2D) and solid objects (3D). More details will come up soon.

Thursday, April 12, 2012

Hi, I’m still working on a paper for submission next week and another one due in three weeks. After that I’ll clean up the code and commit it to PCL.

Wrapping up - Reports and Presentation
Thursday, April 12, 2012

As a final blog post for this Toyota Code Sprint, I am attaching the final report I have written for the sponsors.

The presentation slides associated with the report:

And the midterm report:

Changes in the outofcore code
Monday, April 09, 2012

I was able to fix the issue reported by Justin in his March 22nd posting about construction of large out-of-core trees. There was an issue in construction of octrees from inputs that were too large (the TRCS point sets were too large). However, if the clouds are broken up into pieces, or come initially from a set of smaller point clouds, there doesn’t seem to be any issue. For now, I refactored some of the reading/writing constants used, and that seems to have fixed it.

Justin and I have been discussing modifying the PCD reader to iterate through PCD files out-of-core to avoid loading the cloud entirely into memory. This goes for both reading and writing PCD files. I think this is a good idea, especially since this software may be running on a busy server at Urban Robotics, in addition to encoding/decoding the octree data. I’m almost done with the change from C-struct dumped binary data files (ending in “.oct_dat”) at each node to a PCD file. This will provide a lot of convenience for debugging, as well as a simple way to save in binary compressed (LZF) format, and give us access to PCL algorithms at the leaves.

Radu, Julius and I have been chatting about compression within the out-of-core point cloud. As Radu pointed out to me, the chief issue here that we need to be wary of is I/O speed, particularly in reading and decoding the compressed data from disk, since this is supposed to speed up rendering/visualization of enormous data sets. Speed of writing to disk isn’t currently a primary concern for optimization, though it has its place. Construction of the out-of-core tree is usually an off-line pre-processing step. Julius is going to help us determine how we could use the compression from the existing octree for fast decoding at variable densities.

As a final note, I’ve added the abstract parent class for the disk and ram container classes. This is another step toward refactoring the code base and standardizing the interface throughout.

Implementation progress
Sunday, April 08, 2012

I haven’t posted for a while. This week I didn’t get any new results; I mainly worked on the SVM learning machine implementation, following the PCL standards. Like Florentinus, I’m working on the same program for automated filtering. I’ve been testing new features for the classifier, such as:

  • PCA.
  • Point density within a cluster.

After the feature extraction, my steps are twofold: a first training procedure for the classifier and a classification.

In the first case, since the generated clusters are pretty big, they are displayed one by one, asking for user feedback on each. The collected data are then used to generate a .model file, which is loaded for future predictions.

Among the improvements, I found a way to calculate a score percentage which indicates how much more likely a cluster is to belong to one class rather than another.

Added new correspondence rejection method based on median distance of the correspondences
Sunday, April 08, 2012

I ported a new correspondence rejection method from libpointmatcher. This method operates on a new thresholding criterion: the median of the distances between corresponding points, multiplied by a factor, is used as the threshold for rejecting correspondences. The user can set the desired factor. A discussion of this method can be found in the paper “Simultaneous Localization and Mapping with Active Stereo Vision” by Diebel, J., Reutersward, K., Thrun, S., Davis, J. and Gupta. Next I will be adding a rejection method based on the normals at the corresponding points.
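The thresholding rule is short enough to sketch in plain Python. This is an illustration, not the ported class: the function name, the tuple layout `(source_index, target_index, distance)` and the default factor of 2.0 are assumptions.

```python
import statistics


def reject_by_median_distance(correspondences, factor=2.0):
    """Keep correspondences whose distance is at most factor * median distance."""
    if not correspondences:
        return []
    threshold = statistics.median(d for _, _, d in correspondences) * factor
    return [c for c in correspondences if c[2] <= threshold]


correspondences = [(0, 5, 0.10), (1, 6, 0.12), (2, 7, 0.11), (3, 8, 0.90)]
kept = reject_by_median_distance(correspondences, factor=2.0)
```

Here the median distance is 0.115, so the threshold is 0.23 and the last correspondence (distance 0.90) is rejected. Because the threshold follows the median, it adapts as the alignment improves between ICP iterations.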

The chosen algorithm
Saturday, April 07, 2012

On Wednesday this week we chose the algorithm to implement: Template Deformation for Point Cloud Fitting. I think it will be the first iteration towards a more general algorithm: template matching using 2D + 3D.

I’m now studying the algorithm and thinking about its implementation. I will consult with my mentor next week. After that I will describe the basic implementation steps here and begin the implementation.

BGL usage
Friday, April 06, 2012

Finally! I have figured out how to use push_relabel_max_flow. The documentation of the BGL is not the best (ordinary users won’t understand how all this works). But I have managed to run a few simple examples and it looks great.

So right now I will change the code a little so that it would work with the real point clouds. It won’t take much time so I hope that in a few days I will be able to post some results of the segmentation using PCL web viewer.

Full implementation
Thursday, April 05, 2012

As we are implementing the ANF system, the pipeline has been slightly modified:


The ground and wall segmentation are now both performed in the planar segmentation block which can do this iteratively as long as large enough planes are being detected.

The object segmentation has been split into two blocks:

  • Euclidean clustering will only use point location data to do segmentation and the parameters will be set to focus on not having over-segmentation. This will automatically make the step more prone to under-segmentation but this will be handled by the sub segmentation block. This step will also limit cluster sizes, automatically classifying isolated points.
  • The sub segmentation block will further divide the clusters found by Euclidean clustering and will use additional information to divide them into exactly 1 object per cluster as well as it can.

Lastly, a feature estimation block is added that will be gathering the information needed to feed the SVM classifier. The sub segmentation block may be combined or interchanged with the feature estimator depending on what information will be needed to do the segmentation properly.

The main interface for this pipeline and all of the stages are implemented with the exception of the sub-segmenter. What remains is refining some stages and most importantly: keeping the “automatedness” of the system. Each stage has its own parameters that need to be set dynamically depending on other information of the cloud. At the moment this is working for most cases. For those that it doesn’t: I hope to build a feedback loop so that the system can rectify its own parameters. If that doesn’t work out I will need to translate it to an intuitive parameter that needs to be input by the user.

I have updated my roadmap incorporating the latest developments.

Texturing KinFu outputs
Thursday, April 05, 2012

In parallel with the SRCS sprint, we have been working on texturing the meshes generated with KinFu.

The code has been pushed to trunk under surface/texture_mapping.

So far, textures are not blended and a face is attached to the first texture that sees it entirely.
In the future, we wish to come up with simple heuristics that will select textures more efficiently (like the closest and most-facing one) and/or blend them together.

A first result can be seen here:

In this video, 5 meshes have been textured and manually aligned to form the full room. Hopefully, the possibility to stitch volumes in KinFu (see our previous entry) will allow us to scan the room as one big mesh and skip the alignment process.

Raphael & Francisco

Object recognition framework (Global Pipeline)
Thursday, April 05, 2012

This last week I have continued working on the recognition framework, focusing on the global pipeline. The global pipeline requires segmentation to hypothesize about objects in the scene. Each object is then encoded using a global feature (those currently available in PCL are VFH, CVFH, ESF, ...) and matched against a training set whose objects (their partial views) have been encoded using the same feature. The candidates obtained from the matching stage are post-processed with the Camera Roll Histogram (CRH) to obtain a full 6DOF pose. Finally, the pose can be refined and the best candidate selected by means of a hypothesis verification stage. I will also integrate Alex’s work regarding real time segmentation and euclidean clustering into the global pipeline (see http://www.pointclouds.org/news/new-object-segmentation-algorithms.html).

In summary, I committed the following things to PCL:

  1. KissFFT library to perform real or complex FFTs. KissFFT has been added to common/fft and therefore is available to all pcl modules.
  2. The Camera Roll Histogram feature and matching stage. The first can be found under pcl_features and the second one in pcl_recognition. Both contain examples on how to use the KissFFT library.
  3. A greedy hypotheses verification stage based on model inliers and outliers (in pcl_recognition).

These are some results using CVFH, CRH, ICP and the greedy hypotheses verification:

_images/cvfh_crh.png _images/cvfh_crh6.png

I have also been playing a bit with CVFH to solve some mirror invariances and, in general, to increase the descriptive power of the descriptor. The main challenge so far has been finding a semi-global, unique and repeatable reference frame. I hope to finish this extension at the beginning of next week and be able to clean up the global pipeline so I can commit it. Regarding the main topic of the sprint, we will try some fast face detectors based on depth to efficiently retrieve regions of interest with high probability of containing faces. Another interesting approach that we will definitely try can be found here: http://www.vision.ee.ethz.ch/~gfanelli/head_pose/head_forest.html

Full implementation
Wednesday, April 04, 2012

Mattia and I are now working on a full implementation of the ANF system we have been doing for this sprint. We are writing the code in the same format as the tools in PCL’s trunk/tools/ with the only difference being that we moved all stages of the pipeline to their own separate implementation file so that we could work on it simultaneously. The pipeline overview and the interface between the different stages is nearly finished. Tomorrow I will post a more elaborate blog update with an improved graphical overview of the pipeline and update of my roadmap.

Volume stitching 101
Wednesday, April 04, 2012

At this point, it is possible to detect when the sensor is reaching the border of the cube along the x-axis. The implementation for Y and Z still remains, but we have to think of a smart/elegant way to determine when these boundaries have been passed.

Since there is a VOLUME_SIZE in internal.h, a VOLUME_SHIFT_THRESHOLD has been included as well. The latter represents the distance to the border of the cube that will trigger the volume shifting.
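The border test can be sketched as follows, in plain Python rather than the KinFu C++/CUDA code. The numeric values are illustrative assumptions (not taken from internal.h), the function name is hypothetical, and treating all three axes identically is just one possible way to extend the x-axis check.

```python
VOLUME_SIZE = 3.0             # assumed cube edge length in meters (illustrative value)
VOLUME_SHIFT_THRESHOLD = 0.5  # assumed distance to the border that triggers a shift


def faces_needing_shift(camera_position, volume_origin,
                        volume_size=VOLUME_SIZE, threshold=VOLUME_SHIFT_THRESHOLD):
    """Return (axis, direction) pairs for every cube face the camera is too close to."""
    shifts = []
    for axis, name in enumerate("xyz"):
        local = camera_position[axis] - volume_origin[axis]  # position inside the cube
        if local < threshold:
            shifts.append((name, -1))   # close to the lower face on this axis
        elif local > volume_size - threshold:
            shifts.append((name, +1))   # close to the upper face on this axis
    return shifts


near_border = faces_needing_shift((2.7, 1.5, 1.5), (0.0, 0.0, 0.0))
centered = faces_needing_shift((1.5, 1.5, 1.5), (0.0, 0.0, 0.0))
```

An empty result means the camera is far enough from every face and no shift is needed.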

Volume shifting is toggled by pressing the s key while running Kinfu. It would be nice to have it as parameter in command line.

Thinking about the following steps, the question arises whether the camera pose is the only reference to the global coordinates, because then shifting the camera would make us lose any reference to the world whatsoever.

We got some interesting results by using the cube reset at the time of reaching the threshold. The link is at the bottom of this post.

For now we are saving the last pose before doing the shift. This functionality could be similar to a video we saw on youtube as well which also refers to volume stitching.

By stitching the volumes using the saved transform, post-processing of the whole should be possible, although this is not yet ideal with respect to memory usage.

Francisco & Raphael

vtkVBOPolyDataMapper Commit - WIP
Tuesday, April 03, 2012

I’ve committed a working version of my vtkVBOPolyDataMapper that the outofcore_viewer is now using. This is the working version from my previous post.

I started breaking out the VBO functionality into the separate class vtkVertexBufferObject that’ll be used by the mapper, vtkVBOPolyDataMapper. I’m running into a few issues I need to resolve before committing and getting feedback from Marcus at VTK and the PCL group. The interface for the new vtkVertexBufferObject should be quite simple and handle the basic VTK objects. Will post another update once I’ve got this all working.

Min-Cut Segmentation
Monday, April 02, 2012

Hi everybody. I have written the code for building the graph. Right now it uses the triangulation algorithm from PCL. But I intend to write my own code for this later, because triangulation from PCL requires normals and their calculation can slow down the graph building. My variant will simply find the K nearest points and add edges between the initial point and its neighbours, as suggested in the article. Constructing the graph this way can result in several connected components. But this problem can be solved easily by connecting points from different components with an edge.
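A minimal sketch of this graph construction in plain Python is shown below. The neighbour search is brute force, and the choice to join components through their closest pair of points is an assumption made for illustration; the post only says that components are connected with an edge. Function names are hypothetical.

```python
import math


def knn_graph(points, k):
    """Undirected edges (i, j) with i < j between each point and its k nearest points."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def connect_components(points, edges):
    """Add edges until the graph is connected (closest pair between components)."""
    n = len(points)
    parent = list(range(n))

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    edges = set(edges)
    while True:
        inside = [i for i in range(n) if find(i) == find(0)]
        outside = [i for i in range(n) if find(i) != find(0)]
        if not outside:
            return edges
        a, b = min(((i, j) for i in inside for j in outside),
                   key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
        edges.add((min(a, b), max(a, b)))
        parent[find(a)] = find(b)


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
graph = knn_graph(points, k=1)
connected = connect_components(points, graph)
```

With k=1 the two distant pairs of points form separate components, and the second function adds a single bridging edge between them.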

Right now I am inspecting Boost Graph Library for the necessary algorithms. I have found several algorithms for finding the maximum flow in the graph, but I think that I will use push_relabel_max_flow algorithm because it is the fastest one. I have never used BGL algorithms for finding maximum flow, so right now I am trying to run some simple examples for small graphs.
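To show how a maximum flow yields a segmentation, here is a small plain-Python sketch. Note that it uses the Edmonds-Karp algorithm (shortest augmenting paths found by breadth-first search) as a compact stand-in; it is not push-relabel and not the BGL API. The function name and the toy capacities are hypothetical.

```python
from collections import deque


def min_cut_source_side(n, capacity, source, sink):
    """Return (max_flow, nodes still reachable from the source in the residual graph)."""
    residual = [[0] * n for _ in range(n)]
    for (u, v), c in capacity.items():
        residual[u][v] += c

    def bfs_parents():
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if parent[sink] == -1:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    object_side = {v for v, p in enumerate(parent) if p != -1}
    return flow, object_side


# Node 0 is the object terminal, node 3 the background terminal, 1 and 2 are points.
capacity = {(0, 1): 3, (1, 2): 1, (2, 3): 3}
max_flow, object_side = min_cut_source_side(4, capacity, source=0, sink=3)
```

The weak edge between points 1 and 2 is the one that gets cut, so point 1 ends up on the object side and point 2 on the background side.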

One more important thing is that we have decided to generalize the algorithm. It will allow the user to specify points that belong to the background or the object, rather than only one point that belongs to the object as in the base algorithm. The idea is based on the interactive application that was described in the article.

Pre-Filtering progresses
Monday, April 02, 2012

During the pre-filtering, Florentinus and I want to take advantage of the structured nature of the point cloud to make a pre-segmentation of the cloud. The latter is based on searching for adjacent pixels with similar intensity characteristics. The first step is therefore to generate an image like the one shown below:


The NaN points are shown in red. The points are very scattered, but a pre-segmentation would immediately highlight the individual positions of the leaves. Among the algorithms studied, I considered GrabCut as implemented in OpenCV (GrabCut is an image segmentation method based on graph cuts) and a Graph-Based Segmentation discussed in this paper.

The method implemented in OpenCV is very powerful and segments the image by separating the background from the foreground. The result is not very useful in our case and, consequently, I haven’t investigated it further. The second method proved to be very powerful! It is based on the simple proximity of similar tonalities and a simple result is shown in this figure:


Different colors represent different segments. Besides highlighting many details, the algorithm is very fast and the resulting segmentation could have a great practical implication.

After a chat with Jorge, Florentinus and I decided to finish a first “release” of our filtering pipeline by the end of this week. Therefore, I will leave aside the pre-filtering (which will take some time to adapt to our requirements) and spend more time optimizing the steps that have already been tested.

Integrating libpointmatcher with PCL
Monday, April 02, 2012

I will be integrating some modules from libpointmatcher (https://github.com/ethz-asl/libpointmatcher) into PCL. In the next couple of weeks new methods will be ported from libpointmatcher to the CorrespondenceRejection module in PCL. Documentation and results for these methods will also be added.

General proposed outline and some results on loading TSDF
Monday, April 02, 2012

Last Friday we had a further discussion with Anatoly on how to extend KinFu for large areas. After two weeks of code and solution exploration, we have set the broad strategy going forward. In general terms, the goal is to implement as much functionality as possible within the GPU. This means that we will minimize the information exchange between GPU and CPU, since the PCIe bus is a well-known bottleneck. Three main steps have been identified as well:

  1. Implement a demo that allows traveling within an office without tracking failures. When the Kinect goes out of the volume, the volume changes its physical position to fit the camera frustum again, so that camera tracking can continue without resetting. We agree that filtered out data is dropped at this stage [1].
  2. Implement a repacking procedure of the volume. When the volume is shifted, the information of the cube is extracted and compressed within GPU for later use. The new area must be initialized with some values (TBD).
  3. Develop smart heuristics to decide when the volume should shift, and the consequences of such a shift.

Therefore, the tasks starting from today are [1]:

  1. Familiarize with all KinFu sources.
  2. Implement integration/ray-casting that takes into consideration physical volume position.
  3. Implement volume content repacking to be consistent with 3D world after shift

Last week we were also experimenting with loading the TSDF from the filesystem. We created a small application to load data from a file to the GPU, perform raycasting and generate a screenshot.

The figure below shows the stored TSDF point cloud. This is already in KinFu trunk.


The result that we get is shown below. We used three camera poses (also stored in files). These are the results:


For now, this code will not be included in PCL because it needs some clean-up and matching to the coding standards. Furthermore it is part of the second step so it will be explored afterwards.

Francisco & Raphael


[1] Anatoly Baksheev, Minutes of meeting, March 30, 2012.

Curve fitting results
Saturday, March 31, 2012

Below I posted some results of the B-Spline curve fitting algorithm. I haven’t found any existing code that can handle point-clouds like those in the image. The reasons for this are heavy clutter and noise inside and at the boundary (c,d), strong turns and concavities and regions where the boundary is quite close to another one (a,b).

After finishing the work I’ll have a talk with Radu to discuss implementation issues regarding NURBS, which will define how the curve fitting algorithms will be implemented.

Best features for noise recognition
Friday, March 30, 2012

Florentinus and I decided to divide the job of removing noisy points along a pipeline of this type.


(by courtesy of Florentinus)

The work done during these days was aimed at improving the classification stage of the pipeline. This mainly consists of searching for features that characterize the errors and of making the SVM learning machine as flexible as possible. This required a great effort in analyzing the code and surveying the feature extraction methods already present in PCL.

The feature tested most is VFH, which outputs 308 features for each cluster analyzed. Unfortunately, the results were really bad. The goal for the future is to find features that describe the surroundings of a cluster together with its intrinsic properties.

In the coming days I will also work to make SVM (based on libsvm) compatible with the other libraries of PCL.

FoVis Visual Odometry Library released
Friday, March 30, 2012

Just a quick note that the visual odometry library we have been using within our particle filter - called FoVis (Fast Visual Odometry) - has been released. It supports both stereo cameras (including Pt Grey Bumblebee) and depth cameras (MS Kinect, Asus Pro Live) and achieves 30Hz using a single CPU core. I haven’t used another VO system, but it’s highly recommended.

FoVis was developed by Nick Roy’s group here at MIT. More details can be found in this paper:

Visual Odometry and Mapping for Autonomous Flight Using an RGB-D Camera. Albert S. Huang, Abraham Bachrach, Peter Henry, Michael Krainin, Daniel Maturana, Dieter Fox, and Nicholas Roy Int. Symposium on Robotics Research (ISRR), Flagstaff, Arizona, USA, Aug. 2011

You can download the paper and the library from here.

Object clustering
Friday, March 30, 2012

I am testing the new embedded pcd viewer and showing the results of the objects clustering I have been working on.

The viewpoint of this pcd viewer does not auto-correct, which is annoying. When pressing R in the embedded pcd viewer, it resets the viewpoint to the one described in the .pcd file instead of calculating an optimal one. This means that you need to manually open the .pcd file and modify the viewpoint there so that the initial view is already valid.

Edit: I spent 1.5 hours trying to do this... this quaternion system sucks big time. I will just ask Radu to update the embedded viewer :)

As for the actual work: I implemented a basic ground, wall and object segmenter. The ground and wall surfaces are removed using SAC segmentation with a plane model. These steps are really fast and leave about one fourth of the points for the remaining object clustering. There I apply an octree representation and a Euclidean clustering to get to the clustering as shown in the picture. The octree is more or less used like a voxel grid filter at the moment and speeds up the Euclidean clustering significantly. It also holds the density information, which will be useful to pass on to the classifier. For the benchmark data set there were no issues with objects being very close to one another. Next week I will improve the Euclidean clustering to also take varying point densities into account and be able to distinguish adjacent objects.
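The way a voxel-style reduction both thins the cloud and records density can be sketched in plain Python. This illustrates the idea only and is not the PCL octree: the function name and the `leaf_size` parameter are hypothetical, and each occupied voxel is represented by the centroid of its points together with its point count.

```python
import math
from collections import defaultdict


def voxel_downsample(points, leaf_size):
    """Return one (centroid, point_count) entry per occupied voxel."""
    voxels = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / leaf_size) for c in p)  # integer voxel index
        voxels[key].append(p)
    result = []
    for key in sorted(voxels):
        members = voxels[key]
        centroid = tuple(sum(p[k] for p in members) / len(members) for k in range(3))
        result.append((centroid, len(members)))  # the count is the density information
    return result


points = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.1, 0.1)]
reduced = voxel_downsample(points, leaf_size=1.0)
counts = [count for _, count in reduced]
```

Clustering then runs on the reduced set of centroids, while the counts can be carried along as a per-cluster density feature for the classifier.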

Introduce a joint probability distribution for estimating changes
Friday, March 30, 2012

Recently, I came across a paper in which a statistical model was proposed to detect changes between two range scans. I plan to study it thoroughly and implement it.

Currently, PCL can detect geometrical and intensity differences. The general criterion is the Euclidean distance. I wish to improve this by adopting the model described in this paper.

I am contacting the author about the mathematical derivations in the paper and will post the progress later.

Face recognition based on 3D meshes (II)
Thursday, March 29, 2012

Hi again, I integrated the generation of training data for faces into the recognition framework and use the standard recognition pipeline based on SHOT features, Geometric Consistency grouping + RANSAC, SVD to estimate the 6DOF pose and the hypotheses verification from Papazov. The results are pretty cool and encouraging...


That’s me in the first image (should go to the hairdresser...) and the next image is Hannes, a colleague from our lab.


The CAD model used in this case was obtained from http://face.turbosquid.com/, which contains some free 3D meshes of faces. Observe that despite the training geometry being slightly different from that of the recognized subjects, the model is aligned quite well to the actual face. Notice also the amount of noise in the first image.

I am having interesting conversations with Radu and Federico about how to proceed, so I will post a new entry soon with a concrete roadmap.

Outofcore Octree Update
Wednesday, March 28, 2012

Recently I’ve been investigating the issues with the outofcore octree reported by Justin. I have not been able to track down the reason that we cannot yet handle insertion of very large pcd files.

I have also been re-implementing the outofcore containers (ram and disk) to inherit a common abstract interface class. While I am working on this, I am cleaning up the binary serialization code, and plan to transition the binary point data at each node to PCL’s binary format. This will also give us easy access to the LZF compression of the point data.

As we go along, I have been adding unit tests to monitor bugs/features. These still are not being run by the build server, but I hope to see the outofcore code base sufficiently stable to enable by default soon.

Current pipeline idea
Wednesday, March 28, 2012

Here is a graphical overview of how the pipeline of the ANF system may be turning out:


This pipeline is fine-tuned for larger scale environments such as the Trimble data sets. For smaller scale environments the ground and wall segmentation steps could be omitted.

The process from left to right can also be seen as an interactive information gathering process where the information up to that point is being used to refine the search for new information. This is useful for both the speed and accuracy of the final results.

  • The ground segmentation step will use very basic analysis of the scene and should be a fast step, removing a lot of points so that the other steps are processing faster as well. This step is likely going to be based on iterative plane fitting.
  • Similarly, the wall segmentation will also remove a lot of points, easing further processing steps. It will however be more difficult to distinguish walls from objects of interest so slightly more advanced analysis is required. This step is likely going to be based on a region growing algorithm only passing very large regions.
  • The object segmentation step is going to determine how the remaining points in the point cloud are making up objects. An important challenge is that it needs to be able to detect when two objects are very near to one another, where a region growing algorithm would fail. The step will also be gathering more information like point density, normals and curvature to use in its analysis.
  • For each of the segmented objects the classifier will determine what type of object it actually is. It will calculate features such as VFH and use almost all of the information already gathered. This step is already implemented by using a Support Vector Machine learning algorithm and is working quite accurately.
  • The final step is very specific to the application. For our current purposes we just need to remove the trees and moving objects from the scene.

So the idea is to do the information gathering as late as possible in order to optimize speed (late in the pipeline means fewer points to apply the algorithm to). But don’t move it too late: earlier in the pipeline is better for accuracy.

For the next few days I will be focusing on the object segmentation step here. More specifically: I will investigate normalized cut clustering.

SVM ready. Next step: pre-filtering
Tuesday, March 27, 2012

In the last few days I have been working to improve the implementation and performance of the machine learning SVM. By providing a pre-labeled training set, I managed to get a performance rating of about 95%.

The trained classifier shows good performance as highlighted in the following screenshots:

_images/12.png _images/22.png _images/32.png

Over the next few days I will deal with the extraction of segments of the cloud in order to reduce the number of points on which to perform the classification.

Florentinus, meanwhile, is working on finding the best features to be extracted in a cluster to improve performance and minimize mistakes.

Soon I will also test a k-nearest-neighbour classifier based on a kd-tree representation.

Face recognition based on 3D meshes (1)
Tuesday, March 27, 2012

Last week I worked on a small framework for 3D object recognition/classification. It can be found on trunk under apps/3d_rec_framework but be aware that it is far from finished.

This is related to the 3D face orientation project, as face detection and orientation estimation might be approached using a classic object recognition approach: a training set of the objects to be detected (faces in this case) is available and salient features are computed on the training data. During recognition, the same features can be computed on the input depth image / point cloud and matched against the training features, yielding point-to-point correspondences from which a 3D pose can be estimated, usually by means of RANSAC-like approaches.

I am a big fan of using 3D meshes or CAD models for object recognition for many reasons, so I decided to do a small experiment with this for face detection. I took a random mesh of a face from the Princeton Shape Benchmark and aligned it as depicted in the first image. Yaw, pitch and roll are usually used to define a coordinate system for faces.


Because we would like to recognize a face from several orientations, the next step consists of simulating how the mesh would look when seen from different viewpoints using a depth sensor. So, basically we can discretize the yaw, pitch, roll space and render the mesh from the same viewpoint after transforming it with yaw, pitch, roll rotations.


The small window with the red background is a VTK window used to render the mesh, and the point cloud (red points overlapped with the mesh) is obtained by reading VTK’s depth buffer. The partial view is obtained with a yaw of -20° and a pitch of 10°. The next image is a screenshot of a multi-viewport display when varying the yaw in a range of [-45°,45°].


And maybe more interesting, varying pitch from [-45°,45°] with a 10° step. The points are colored according to their z-value.
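
The discretization of the pose space can be sketched as follows. This is an illustration in plain Python, not the code used for the renderings above: the mapping of yaw, pitch and roll to the z, y and x axes and the composition order are assumptions, and the helper names are invented for the example.

```python
import math
from itertools import product

def angle_range(lo_deg, hi_deg, step_deg):
    """Inclusive list of angles from lo_deg to hi_deg in steps of step_deg."""
    count = int((hi_deg - lo_deg) // step_deg) + 1
    return [lo_deg + k * step_deg for k in range(count)]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """R = Rz(yaw) * Ry(pitch) * Rx(roll); this axis convention is assumed."""
    y, p, r = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
    rz = [[math.cos(y), -math.sin(y), 0.0],
          [math.sin(y), math.cos(y), 0.0],
          [0.0, 0.0, 1.0]]
    ry = [[math.cos(p), 0.0, math.sin(p)],
          [0.0, 1.0, 0.0],
          [-math.sin(p), 0.0, math.cos(p)]]
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(r), -math.sin(r)],
          [0.0, math.sin(r), math.cos(r)]]
    return matmul(matmul(rz, ry), rx)

# Yaw and pitch in [-45, 45] degrees with a 10 degree step, roll fixed at 0.
yaws = angle_range(-45, 45, 10)
pitches = angle_range(-45, 45, 10)
poses = [(yaw, pitch, 0) for yaw, pitch in product(yaws, pitches)]
transforms = [rotation_matrix(*pose) for pose in poses]
```

Each matrix in `transforms` would be applied to the mesh before rendering it from the fixed viewpoint, so every generated partial view comes with a known orientation label.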


Basically, this makes it easy to generate training data with known pose/orientation information, which represents an interesting opportunity to solve our task. The idea would be to have a big training dataset so that variability among faces (scale, traits, ...) is captured. The same would apply to tracking applications where a single person is to be tracked: a mesh of the face could be generated in real time (using KinFu) and used as the only input for the recognizer. This is probably the next thing I am going to try, using FPFH or SHOT for the feature matching stage.

Ill last week
Monday, March 26, 2012

Hi, unfortunately I was ill last week and I’m struggling a little bit with paper deadlines right now. I’ll post some results of the paper this week and will integrate the stuff right after the deadline.

SVM ready. Next step: pre-filtering
Saturday, March 24, 2012

The SVM classifier is implemented and ready to be used and tested in our automated noise filtering processing chain. It is of course based on libsvm, and I created classes for training, testing and using the classification algorithm.

An interesting chat with Florentinus highlighted some new methods that are worth testing. They came up after reading this paper (more info in Florentinus’ blog).

In the next few days I want to test the classifier for cluster recognition. Then I’ll start thinking about a pre-filtering process based on organized grid representations of Trimble’s datasets.

New papers, New ideas
Thursday, March 22, 2012

After reading some more papers on segmentation, classification and recognition, Mattia and I had another talk on the ANF system. We are now investigating and adapting ideas from this paper, which has a lot of similarities with what we are trying to achieve. We are thinking of splitting up the work as follows: I will work on the “information gathering” and Mattia will work on the “information processing”. For instance, the machine learning based classification will be done by Mattia and I will work on providing the classifier with enough features to classify on.

The new information gathering that I will be doing will likely belong in the pre-processor that I was working on already. However, the information that is useful to extract often requires other information and again a step-like system would develop. For example, the Trimble data sets start out with x, y, z and intensity information. This can then be used to calculate normals and curvatures. With this, simple features can be computed. After that, more sophisticated features, etc.

I am currently investigating the features module of PCL more in-depth and am already finding a lot of useful things for this sprint. The eventual ANF system will probably turn out to use and interact with almost all modules in PCL :)

First blog entry
Thursday, March 22, 2012

This is Raphael’s and Francisco’s first blog entry. We have had a discussion with Anatoly to introduce ourselves, as well as to brainstorm on potential solutions to get us closer to the goal. On Friday the 30th we will have a more concrete discussion about the most promising solution(s). It will also help to determine more concrete tasks, since for now we are in an exploratory stage.

Francisco & Raphael

New Commits and More on Visualization
Thursday, March 22, 2012

I’ve updated the pcl_outofcore_process runtime to dump numerous pcd files into an octree. There are currently a few issues with this tool that need to be resolved:

  • numpts isn’t written to the json file
  • Given that numpts isn’t written it’s hard to tell if all points specified within the pcd files are getting written to disk
  • Should we store bounding box info within the header of a pcd file so an entire cloud doesn’t require parsing?
  • Should we write a pcd reader that can iterate through the points so the entire cloud doesn’t have to be loaded into memory? This’ll be useful when loading up clouds in the millions+
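
The last two questions can be illustrated together with a small sketch: a generator that yields one point at a time from an ASCII PCD stream, and a single pass that accumulates the point count and bounding box without holding the cloud in memory. This is plain Python for illustration only, not part of pcl_outofcore; it assumes `DATA ascii`, a COUNT of 1 for every field, and x, y, z as the first three fields, and the function names are invented.

```python
import io

def iter_pcd_ascii_points(stream):
    """Yield one point (tuple of floats) at a time from an ASCII PCD stream."""
    lines = iter(stream)
    fields = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, rest = line.partition(" ")
        if key == "FIELDS":
            fields = rest.split()
        elif key == "DATA":
            if rest.strip() != "ascii":
                raise ValueError("this sketch only handles DATA ascii")
            break
    if fields is None:
        raise ValueError("no FIELDS entry found in header")
    for line in lines:
        values = line.split()
        if values:
            yield tuple(float(v) for v in values[:len(fields)])

def count_and_bounding_box(points):
    """Single pass over an iterable of points; uses the first three fields."""
    count = 0
    lo = [float("inf")] * 3
    hi = [float("-inf")] * 3
    for p in points:
        count += 1
        for axis in range(3):
            lo[axis] = min(lo[axis], p[axis])
            hi[axis] = max(hi[axis], p[axis])
    return count, tuple(lo), tuple(hi)

# Tiny in-memory stand-in for a .pcd file.
SAMPLE_PCD = """# .PCD v0.7 - Point Cloud Data file format
VERSION 0.7
FIELDS x y z
SIZE 4 4 4
TYPE F F F
COUNT 1 1 1
WIDTH 3
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 3
DATA ascii
0.0 1.0 2.0
-1.0 0.5 3.0
2.0 -2.0 0.0
"""
count, bbox_min, bbox_max = count_and_bounding_box(
    iter_pcd_ascii_points(io.StringIO(SAMPLE_PCD)))
```

With a real file, `open(path)` would take the place of the `io.StringIO` object. Storing the resulting bounding box in a header would avoid this pass entirely, which is the trade-off raised in the third bullet.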

I’ve committed the octree_viewer from my last post with very basic VTK functionality. This required some additional methods that already existed within the pcl_octree. However, I’ve noticed that the voxels displayed aren’t square, which I think they should be. I’ll have to look into this further.

On to the more exciting things!

I’ve been in talks with Radu and Marcus Hanwell, one of the VTK developers, about how we should move forward with our visualization tools. The current version of VTK is based on OpenGL display lists, a very old technology that was deprecated in OpenGL 3.0 and removed in 3.1 (with support via the ARB_compatibility extension); a compatibility profile was introduced in 3.2. That being said, there are classes within VTK that use the newer technology, just not the pieces we’re interested in.

Why is this important? Well, because writing an out-of-core viewer doesn’t make much sense with display lists and requires a more dynamic implementation.

As VTK 6 evolves various newer OpenGL features will be integrated, vertex buffer objects (VBOs) among them. Until then, I plan on helping Marcus get these new features in faster by helping prototype and possibly develop the classes required. The good news is the VTK guys now have immediate testers with a simple test case, billions of points!

I’ve started to hack together a new VTK mapper vtkVBOPolyDataMapper to replace the vtkPolyDataMapper. I have a very barebones version working. It’s got a long way to go, makes quite a few assumptions and’ll need some love to work generically in the VTK framework. I’ll post more on this when I update the octree_viewer with the newer functionality.

Overview of Computation
Wednesday, March 21, 2012

So I’ve been doing some analysis to determine the overall benefit of the improvements mentioned below within our application. It will give a feel for the impact of various improvements in future.

High Level Timing

First let’s look at the high level computing usage. We did several tests and looked at elapsed time for different numbers of particles and also looked at (1) doing the likelihood evaluation on the GPU (using the shader) or (2) on the CPU (as previously).

high level computation usage

The data in this case was at 10Hz so real-time performance amounts to the sum of each set of bars being less than 0.1 seconds. For a low number of particles, the visual odometry library, FoVis, represents a significant amount of computation - more than 50 percent for 100 particles. However, as the number of particles increases, the balance of computation shifts to the likelihood function.

The visual odometry and likelihood function can be shifted to different threads and overlapped to avoid latency. We haven’t explored it yet.

Other components such as particle propagation and resampling are very small fractions of the overall computation and are practically negligible.

In particular you can see that the cost of the likelihood increases much more quickly for the CPU evaluation.

Low Level Timing of Likelihood

The major elements in the likelihood are the rendering of the model view and the scoring of the simulated depths - again using the results of these same runs.

low level computation usage

Basically the cost of rendering is fixed - a direct function of the (building) model complexity and the number of particles. We insert the ENTIRE model into OpenGL for each iteration - so there will be a good saving to be had by determining the possibly visible set of polygons to optimize this. This requires clustering the particles and extra caching at launch time but could result in an order of magnitude improvement.

It’s very interesting to see that the cost of doing the scoring is so significant here. Hordur put a lot of work into adding OpenGL Shading Language (GLSL) support. The effect is that for large numbers of particles, e.g. 900, scoring takes only 5% of the available time (5% of 0.1 seconds). Doing this on the CPU would take 30%.
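
To make the budget arithmetic explicit: at 10Hz each frame has 0.1 seconds available, so the quoted shares correspond to roughly 5 ms (GPU scoring) and 30 ms (CPU scoring) per frame. A small Python sketch of that conversion, with helper names chosen only for this example:

```python
def frame_budget_s(rate_hz):
    """Time available per frame, in seconds, at the given data rate."""
    return 1.0 / rate_hz

def share_to_ms(share, rate_hz):
    """Convert a fraction of the frame budget into milliseconds."""
    return share * frame_budget_s(rate_hz) * 1000.0

def is_real_time(stage_times_s, rate_hz):
    """True if the summed stage times fit within one frame period."""
    return sum(stage_times_s) <= frame_budget_s(rate_hz)

gpu_scoring_ms = share_to_ms(0.05, 10.0)  # about 5 ms per frame
cpu_scoring_ms = share_to_ms(0.30, 10.0)  # about 30 ms per frame
```

`is_real_time` expresses the criterion used for the bar charts: the stages of one iteration have to sum to less than the frame period.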

For the simplified model, this would be very important as the scoring will remain as is, but the render time will fall a lot.

NOTE: I think that some of these runs are slightly unreliable. For portions of operation there was a characteristic chirp in computation. I think this was the GPU being clocked down or perhaps my OpenGL-based viewer soaking up the GPU. I’ll need to re-run to see.

Testing WebGL
Wednesday, March 21, 2012

Test of WebGL and pcl:


My first step
Wednesday, March 21, 2012

Hello everybody! This is my first blog entry as a participant of PCL-SRCS. I filled out information about myself. We are now refining the algorithm that will be implemented in the PCL-SRCS.

Filled in my information at PCL-SRCS blog
Wednesday, March 21, 2012

This is my first blog entry to test the blogging system.

Studying machine learning uses
Tuesday, March 20, 2012

Recently I have been working hard on understanding and configuring the best-known machine learning algorithms. The purpose is to use Support Vector Machines to generate a weighting value for clusters in a cloud, then to group the ones with similarities and finally remove leaves and ghosts.

I also thought about using Artificial Neural Networks and started to implement the learning algorithm. But after I discussed it with Jorge and Federico, they pointed me toward more sophisticated and better performing approaches, such as SVMs.

Results will come up soon.

Tree segmentation progress
Tuesday, March 20, 2012

I have been discussing work on ANF with Mattia. Mattia will be building the more sophisticated segmentation system based on machine learning. I will be working on a more basic segmentation system and will focus on the interaction between different classes and levels in the hierarchy of the complete ANF system. I have updated my roadmap incorporating the latest developments.

Meanwhile, the TreeSegmentation class I was working on has improved further:

_images/071.png _images/081.png

It now also uses a very basic shape and size classification. Note that the results are actually in the form of a weighting for each point; the above screenshot depicts the points above a certain threshold for this weighting.

I am not comfortable with the current method I use though. I want to look at an octree implementation where I can zoom through different levels of resolution and also make use of density information. Hopefully this will provide more accurate and faster results for the shape and size classification.

Testing blog system
Tuesday, March 20, 2012

I have been on vacation for a couple of days and will be flying back to Vienna tomorrow. I used some spare time to update my personal info and get familiar with the blogging system.

Color-based segmentation
Monday, March 19, 2012

Hi everybody. I have committed the code for the Region Growing algorithm that uses point color. I also refactored the base algorithm, so now it works faster. Right now I’m going to implement the segmentation algorithm based on the graph cut.

Fitting NURBS curves (real data)
Monday, March 19, 2012

Fitting a NURBS curve to real data in a generic way is quite tricky. I’ve developed an algorithm to fit NURBS curves in an error-adaptive manner, with respect to knot insertion and concavity filling. This means that in regions where the error (distance from the curve to the closest point) is high, control points are added by knot insertion and simultaneously the concavity of the control point is increased, bending the NURBS curve inwards. Not modeling this concavity constraint would lead to bridging of narrow gaps.

Please have a look at the video “Curve fitting” on my homepage: http://users.acin.tuwien.ac.at/tmoerwald/?site=4#nurbs

Features Analysis
Saturday, March 17, 2012

Moving forward on my journey to noise removal, I’m facing the problem of identifying point clusters for noise extraction. The problem is not easy at all, and in this blog post I show the frequency histograms meant to compare different features for the cluster identification. Moreover, I want to find ranges of values for these features in order to mark a cluster with a matching percentage score.

The next step is the use of a good method to build a classifier. I really would like to implement Artificial Neural Networks. I know that it’s not the shortest and easiest way, but it’s probably the most powerful and gratifying.

  • Frequency Histograms representing the intensity distribution:
  • Frequency Histograms representing the cardinality distribution:
  • Frequency Histograms representing the Standard Deviation distribution of normal vectors inside the clusters:
  • Frequency Histograms representing the Standard Deviation distribution of curvatures inside the clusters:
  • Frequency Histograms representing the distribution of the eigenvalues calculated from the covariance matrices of each cluster:

Code revision and future plan
Saturday, March 17, 2012

Following the feedback from Gordon, I revised the code I wrote. I will add more functionality to handle RGB point clouds, as well as simultaneous spatial and intensity changes.

ANF progress
Friday, March 16, 2012

I wrote a very basic pre-processor for the ANF system. The idea is to gather commonly used data that the majority of the other steps of the ANF system will want to use anyway. For the Trimble data sets it currently only performs normal estimation and appends this information to the point clouds, resulting in PointXYZINormal type clouds.

At the moment I am still working on the TreeSegmentation class, which will use almost all of the fields of PointXYZINormal. The classification steps for intensity and curvature are already finished; what remains are the steps for shape and size classification of clusters of points.

Chats about Outofcore Octrees
Wednesday, March 14, 2012

Justin, Radu, Julius, Jacob and I have been discussing outofcore octrees. Some particularly interesting points arose in the conversation regarding the method of serialization. The UR constructs the octree in a depth-first manner, storing the point data in the leaves of the tree. If the LODs are generated, the folders (which contain internal node data) can be read (and rendered) in a breadth-first manner, providing a successively more detailed octree as deeper nodes are read (see Justin’s blog).
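
The breadth-first reading order can be sketched on a toy tree. The nested-dictionary layout below is purely hypothetical and does not reflect the actual on-disk folder structure; it only illustrates why visiting nodes level by level yields successively finer levels of detail.

```python
from collections import deque

def breadth_first_levels(root):
    """Return node names grouped by depth, coarsest level first."""
    levels = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == len(levels):
            levels.append([])
        levels[depth].append(node["name"])
        for child in node.get("children", []):
            queue.append((child, depth + 1))
    return levels

# Hypothetical three-level tree standing in for the serialized octree.
tree = {"name": "root", "children": [
    {"name": "a", "children": [{"name": "a0"}, {"name": "a1"}]},
    {"name": "b"},
]}
levels = breadth_first_levels(tree)
```

A viewer could render everything in `levels[0]` first and refine the display as each deeper level is read.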

Julius encodes his outofcore octree using a method similar to the Nested Octrees in the Scheiblauer and Wimmer paper [2]. Each file serialized is itself a sub-octree.

Currently, I am investigating approaches to serialization for octrees. I’m studying two papers in particular that I’ve found useful:

[2] Claus Scheiblauer and Michael Wimmer, “Out-of-Core Selection and Editing of Huge Point Clouds.” Computers and Graphics, April 2011.

LUM class unit test
Wednesday, March 14, 2012

I intended to create a suitable unit test for the LUM class, however, I stumbled upon particular cases where the LUM class fails to give proper results. I ended up spending the last two days searching for the cause but was unable to find it. In order to satisfy http://dev.pointclouds.org/issues/623, I will now make the LUM class instantiate through a .cpp file instead.

Tomorrow I will continue with the ANF project again.

Studying existing segmentation procedures
Tuesday, March 13, 2012

We try to decrease the number of elements in a point cloud by collecting the points into groups called clusters. For this purpose, the choice of the distance measure that links points into a cluster is very important. The most common types are:

  • Euclidean takes into account the direction and magnitude of vectors: \sqrt{\sum_i (x_i-y_i)^2}
  • Squared Euclidean accentuates the distance between entities and tends to give more weight to outliers: \sum_i (x_i-y_i)^2
  • Manhattan is greater than or equal to Euclidean, but the clusters it produces might appear less compact: \sum_i |x_i-y_i|
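
The three measures can be written down directly. The following is a minimal Python sketch in which points are plain tuples of coordinates; the function names are chosen for this example.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Summed squared coordinate differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def manhattan(x, y):
    """Summed absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

p, q = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)
distances = (euclidean(p, q), squared_euclidean(p, q), manhattan(p, q))
```

For this pair the Manhattan distance (7) exceeds the Euclidean distance (5), and squaring (25) exaggerates the gap further, which is why the squared form weights outliers more heavily.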

Once the method is decided, the strategy of classification can be hierarchical or non-hierarchical. A non-hierarchical method fixes the number of clusters a priori (like K-Means or Self Organizing Map). A hierarchical method starts from a number of clusters equal to the number of points and progressively reduces the number of clusters by combining the calculated ones based on a “closeness” criterion.

Following these steps, the points are organized into single entities to which we want to attribute a physical meaning. Noise removal requires distinguishing ghost points and vegetation from the useful data. The idea is then to assign labels associated with a classification score. To do this we will train a classifier with a large number of training datasets, and analyze some features like:

  • cardinality;
  • eigenvalue decomposition of the covariance matrix;
  • intensity;
  • curvature;
  • normal variance.

Obviously, the cloud will be full of classification errors and mismatches. So, we will introduce a linking policy with which a leaf or a ghost cluster must be surrounded by clusters of the same type to be removed. This further analysis has the goal of making the method more robust and flexible. To do this we need to define the “distance” between clusters and different criteria like:

  • *Local Measures*
    • Minimum or Nearest-Neighbour Method: the distance between two groups is the minimum distance between all pairs of points belonging to the first and second group. The criterion generates groups with the shape of “extended chains” of scattered elements.
    • Maximum or Furthest-Neighbour Method: the measure of distance between two groups is the maximum distance between all pairs of points belonging to the first and second group. The criterion generates compact groups with very close elements.
  • *Global Measures*
    • Within Groups Clustering Method: it considers the average of all the distances between pairs of points in the first and second group.
    • Centroid: the distance between two groups is determined by the distance of their centers of gravity.
    • Ward: the clusters are aggregated so that the variance increment in the new group is the minimum possible (every time you add a connection the variance increases; we want to minimize this increase).
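
The cluster-to-cluster distances listed above can be sketched in plain Python as follows. Clusters are lists of coordinate tuples and the function names are invented for this example; the Ward function uses the standard expression for the increase in the within-cluster sum of squares caused by a merge, n_a n_b / (n_a + n_b) times the squared centroid distance.

```python
import math

def centroid(cluster):
    """Mean position of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))

def single_linkage(a, b):
    """Minimum (nearest-neighbour) distance between two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    """Maximum (furthest-neighbour) distance between two clusters."""
    return max(math.dist(p, q) for p in a for q in b)

def average_linkage(a, b):
    """Average of all pairwise distances between the two clusters."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def centroid_distance(a, b):
    """Distance between the centers of gravity of the two clusters."""
    return math.dist(centroid(a), centroid(b))

def ward_increase(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * centroid_distance(a, b) ** 2

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (4.0, 0.0)]
measures = (single_linkage(A, B), complete_linkage(A, B),
            average_linkage(A, B), centroid_distance(A, B),
            ward_increase(A, B))
```

An agglomerative scheme would repeatedly merge the pair of clusters that minimizes whichever of these measures has been chosen.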

Fitting NURBS curves
Tuesday, March 13, 2012

One of the geometric operations on NURBS is trimming, that is, cutting out holes or defining a boundary which has a different polynomial degree than the NURBS itself. Trimming is then nothing else but defining regions on the NURBS where it is visible or invisible. In our case we need it for defining the boundary of arbitrary surfaces and furthermore for intersecting NURBS of different polynomial degree.

To apply such boundaries to real data we want to fit NURBS curves to a point-cloud boundary. In the image below you see a test case of a boundary with some outliers in the middle (think of typical Kinect data). The curve is initialised with PCA with a radius equal to the first eigenvalue (iteration 0), and then successively refined and fitted. The fitting weight of the points (blue) is influenced by the distance (green) to the curve (red) using the Gaussian function, so that close points have a high influence and vice versa.
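
The distance-based weighting can be illustrated with a short Python sketch. The width parameter `sigma` is an assumption introduced for the example (the post does not state the value used), and the function name is invented.

```python
import math

def gaussian_weight(distance, sigma):
    """Weight in (0, 1]: 1 on the curve, decaying smoothly with distance."""
    return math.exp(-(distance * distance) / (2.0 * sigma * sigma))

# Hypothetical distances of four boundary points to the current curve.
distances = [0.0, 0.05, 0.1, 0.5]
weights = [gaussian_weight(d, sigma=0.1) for d in distances]
```

Points far from the curve, such as the outliers in the middle, receive weights close to zero and therefore barely pull on the fit.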

To account for higher frequencies in the measurement, the degree of freedom of the curve is increased by knot insertion (iterations 1 and 2). After a few iterations the curve approximates the data quite nicely (iteration 5).


The radius of the fitted curve is 0.5 with an overlaid sinusoid of peak amplitude 0.05.

Coming up next: Fitting of NURBS curves in the parametric domain of NURBS surfaces and trimming (on real data).

Sunday, March 11, 2012

I finally managed to put Misha’s Poisson reconstruction implementation in PCL. It now works properly and passes the unit tests, with the same behavior as the original application. Furthermore, we are adapting it to the PCL coding style.

A new pcl::surface base class has been introduced, due to some confusion communicated via the pcl-users mailing lists. Now, there is the CloudSurfaceProcessing class, which represents the base class for algorithms that take a point cloud as an input and produce a new output cloud that has been modified towards a better surface representation. These types of algorithms include surface smoothing, hole filling, cloud upsampling etc.

In this category, we have the MovingLeastSquares algorithm with the additional features we mentioned in previous blog posts, and the all-new BilateralUpsampling algorithm, based on the bilateral filter (for more details about the workings of the algorithm, please see the previous post). Suat was kind enough to help me by modifying the OpenNIGrabber such that it will output 1280x1024 PointXYZRGB clouds when the Kinect is set to high-resolution RGB -> this means that every second row and column contains NaN depth values. And here is where the BilateralUpsampling comes in, using the RGB info to fill in the missing depth data, as visualized in the following example image:

Automated Segmentation and ANF
Friday, March 09, 2012

We are currently focusing on “binary noise”, i.e. noise that is defined by the (binary) existence of points. For this type of noise, the challenges in Automated Noise Filtering can completely boil down to challenges in Automated Segmentation; if the AS is performing ideally, the only further step to get to ANF is to apply the ExtractIndices filter in PCL. Hence I have added my latest work to the segmentation module in PCL http://docs.pointclouds.org/trunk/group__segmentation.html.

Currently there are the base class AutomatedSegmentation and one derived class AutomatedTreeSegmentation. Each derived class from AutomatedSegmentation is going to represent one step in the total system and focus on one particular type of entity to segment from the scene. These different steps can then be used in succession to get to a complete ANF system. However, I aim to build these classes so that they can interact with one another in more complex ways (like an iterative design or combination with registration -> SRAM). More information on the classes can be found in their documentation as soon as docs.pointclouds updates.

Also, each of these classes/steps is built up from different substeps. For instance, the AutomatedTreeSegmentation performs intensity classification and curvature classification as substeps. I am still considering whether it would be worthwhile to separate these substeps into their own classes somehow. These substeps, too, may need to interact in more complex ways than just successive application.

I am hoping to converse with Mattia or other people who are interested to see if this is the most interesting/useful implementation of an ANF system. If you are reading this and have suggestions or feedback (both positive and negative) about this, don’t hesitate to drop me a line: f.j.florentinus@student.tue.nl.

Meanwhile I will continue working on the implementation of AutomatedTreeSegmentation since it is not finished. I will also spend time on http://dev.pointclouds.org/issues/614 and other PCL related clean up things.

NURBS fitting on real data
Thursday, March 08, 2012

I’ve made some experiments with cylindrical NURBS and NURBS patches on real data. I’ve uploaded two videos to my homepage: http://users.acin.tuwien.ac.at/tmoerwald/?site=4.

Latest Result for Change Detection on Intensity Values
Thursday, March 08, 2012

I continue working on http://dev.pointclouds.org/issues/630.

I am still using Gordon’s data, but this time I did more trimming and converting work. I added one more data member to pcl::SegmentDifferences which holds a user-specified threshold for comparing the intensity values.

So far, the new code has been working fine.






Visualization with VTK
Wednesday, March 07, 2012

I’ve started to wrap my mind around VTK and the PCL Visualizer. I wrote an application similar to the octree_viewer using straight VTK. The following is a processed outofcore cloud with 4 LODs.

Vegetation ANF
Tuesday, March 06, 2012

The grand idea of the ANF system as discussed during last week is going to take a while to fully take shape. Following Jorge’s suggestion, I am going to focus on the subproblem of vegetation first (trees in particular) and implement an easy system. The system will already be a multi-step system where the first two steps are:

1 Intensity classification

Trees generally have a low intensity rating compared to most flat surface entities. Hopefully this step does not need any input parameters, i.e. all laser scanner type data sets have this property (TODO). This step is performed first for it is a very fast step and will leave a smaller exploration space for the other steps.

2 Curvature classification

Leaves generally have a high curvature. This step will likely also need to analyze the size of the areas that have high curvatures and maybe their shape too. It could be useful to implement the size/shape detection as a separate step in ANF so other classifiers can also make use of this (TODO). This step has some parameters that need to be set but it is likely possible that in the end this can be done fully automatically.

An interesting global input parameter / cost function would be the number of trees in the point cloud. This is hopefully easy for a user to determine, and it allows for an iterative design where the ANF system iterates and re-adjusts its internal parameters until it arrives at the appropriate answer.

The system will also start using the weighting system where each step applies weighting to the points for how likely that point is part of a tree or not.
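The weighting idea can be sketched as follows. This is only an illustration under assumptions: the WeightedPoint record, the function names and the linear blending rule are hypothetical and are not part of the ANF code or of PCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-point record; only the fields this sketch needs.
struct WeightedPoint
{
  float intensity;    // laser return intensity
  float curvature;    // local curvature estimate
  float tree_weight;  // current belief that the point belongs to a tree, in [0, 1]
};

// One classification step: blend the step's vote into every point's weight.
// `vote` returns a value in [0, 1]; `influence` in [0, 1] limits how far
// this step may move the weight (assumed blending rule).
template <typename Vote>
void applyStep (std::vector<WeightedPoint>& cloud, Vote vote, float influence)
{
  assert (influence >= 0.0f && influence <= 1.0f);
  for (WeightedPoint& p : cloud)
    p.tree_weight = (1.0f - influence) * p.tree_weight + influence * vote (p);
}

// Final decision after all steps: indices of points whose weight reaches the threshold.
std::vector<std::size_t> selectTreePoints (const std::vector<WeightedPoint>& cloud, float threshold)
{
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < cloud.size (); ++i)
    if (cloud[i].tree_weight >= threshold)
      indices.push_back (i);
  return indices;
}
```

An intensity step and a curvature step would each call applyStep with their own vote function, and only the final call to selectTreePoints turns the accumulated weights into a binary decision.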

I have also been working on http://dev.pointclouds.org/issues/614 since my last blog post.

New test results
Monday, March 05, 2012

Hi everybody. I have tested the Region Growing algorithm that uses points color. So here are the results.

_images/office4_original.png _images/office4_dist-10_rcdist-5_pcdist-6_pnum-600_.png _images/office2_original.png _images/office2_dist-10_rcdist-5_pcdist-6_pnum-200_88-segments_1067-sec_.png _images/office3_original.png _images/office3_dist-10_rcdist-5_pcdist-6_pnum-200_75-segments_1889-sec_.png _images/office1_original.png _images/office1_dist-10_rcdist-5_pcdist-6_pnum-200_87-segments_1247-sec__.png

There are some more pictures in my folder in higher resolution. You can find them in “trcsweb\source\velizhev\images\COLOR”

pcl::SegmentDifferences Profiling Result
Sunday, March 04, 2012

After talking with Radu, I decided to use Very Sleepy to profile the performance of pcl::SegmentDifferences. The computing time on the Trimble data was extremely large; the program used up the memory before it could run to completion.

The basic statistics show:

Filename: D:PCL_TRCSpcl_sshbinTRCS_test_debug.exe

Duration: 31549.642000s

Date: Sat Mar 03 23:29:50 2012

Samples: 3619112

I wish I could have made a figure, but I haven’t found a tool to convert the result into a reasonable graph, and the output .csv file was not generated properly. So here I just show a screenshot, from which you can clearly see which parts are the most time consuming.

Clustering process
Sunday, March 04, 2012

After using a Region Growing clustering process based on the Euclidean distance, I obtained a good result which will definitely help with the recognition of leaves on trees. In the image below, different colors mean different clusters. The next step is to use a classifier to distinguish good clusters from noisy ones.

Cylindrical NURBS
Sunday, March 04, 2012

UPDATE: The inversion of points for NURBS cylinders is now fixed and takes the same time as for NURBS patches

This week I implemented fitting of periodic NURBS, namely cylinders. Unfortunately the inversion of points (i.e. finding the closest distance from a point to the NURBS surface) takes much longer than for NURBS patches, for a reason I haven’t found yet. In the image below you can see some data points in blue. The NURBS cylinder is initialized using PCA. Then for each point the distance to the closest point on the NURBS surface is calculated (red lines), and the linear equation for minimizing this distance is assembled and solved. As usual a regularisation term models the ‘stiffness’ of the NURBS (magenta).

Some successful result for the leaves removal filter
Friday, March 02, 2012

Following the last chat meeting with Jorge, Radu and Federico (see Florentinus’ blog), I got to experiment with new ideas.

The proposed filtering step is based on the covariance matrix of the point coordinates within a spherical neighborhood of radius R. Using an EVD (eigenvalue decomposition), the filtering policy is based on:

s <= \frac{\lambda_{min}}{\lambda_{max}}

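where s is a user-defined constant (in my case 0.04) and \lambda_{min} and \lambda_{max} are the smallest and largest eigenvalues of the covariance matrix. All the points satisfying the previous constraint are deleted from the cloud.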
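A minimal sketch of this test, assuming the neighborhood of a point has already been collected. The names Point3, covariance, eigenvalueRatio and isLeafNoise are hypothetical, and the closed-form eigenvalue solution for symmetric 3x3 matrices stands in for whatever EVD routine is actually used:

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

struct Point3 { double x, y, z; };

// Upper triangle of the 3x3 covariance matrix: {xx, xy, xz, yy, yz, zz}.
std::array<double, 6> covariance (const std::vector<Point3>& pts)
{
  std::array<double, 6> c = {{0, 0, 0, 0, 0, 0}};
  if (pts.empty ())
    return c;
  double mx = 0, my = 0, mz = 0;
  for (const Point3& p : pts) { mx += p.x; my += p.y; mz += p.z; }
  const double n = static_cast<double> (pts.size ());
  mx /= n; my /= n; mz /= n;
  for (const Point3& p : pts)
  {
    const double dx = p.x - mx, dy = p.y - my, dz = p.z - mz;
    c[0] += dx * dx; c[1] += dx * dy; c[2] += dx * dz;
    c[3] += dy * dy; c[4] += dy * dz; c[5] += dz * dz;
  }
  for (double& v : c) v /= n;
  return c;
}

// lambda_min / lambda_max of a symmetric 3x3 matrix (trigonometric closed form).
double eigenvalueRatio (const std::array<double, 6>& c)
{
  const double q = (c[0] + c[3] + c[5]) / 3.0;
  const double p1 = c[1] * c[1] + c[2] * c[2] + c[4] * c[4];
  const double p2 = (c[0] - q) * (c[0] - q) + (c[3] - q) * (c[3] - q)
                  + (c[5] - q) * (c[5] - q) + 2.0 * p1;
  if (p2 <= 0.0)
    return q > 0.0 ? 1.0 : 0.0;  // multiple of the identity: all eigenvalues equal
  const double p = std::sqrt (p2 / 6.0);
  const double b00 = (c[0] - q) / p, b11 = (c[3] - q) / p, b22 = (c[5] - q) / p;
  const double b01 = c[1] / p, b02 = c[2] / p, b12 = c[4] / p;
  const double det = b00 * (b11 * b22 - b12 * b12)
                   - b01 * (b01 * b22 - b12 * b02)
                   + b02 * (b01 * b12 - b11 * b02);
  const double r = std::max (-1.0, std::min (1.0, det / 2.0));
  const double phi = std::acos (r) / 3.0;
  const double two_pi_3 = 2.0943951023931953;
  const double l_max = q + 2.0 * p * std::cos (phi);
  const double l_min = std::max (0.0, q + 2.0 * p * std::cos (phi + two_pi_3));
  return l_max > 0.0 ? l_min / l_max : 0.0;
}

// True if the point should be deleted, i.e. s <= lambda_min / lambda_max.
bool isLeafNoise (const std::vector<Point3>& neighborhood, double s)
{
  return s <= eigenvalueRatio (covariance (neighborhood));
}
```

Points on a line or a flat surface give a ratio near zero and are kept, while scattered neighborhoods such as foliage give a ratio closer to one and are removed.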

The second step of filtering uses a simple RadiusOutlierRemoval filter iterated twice. The results are shown in figure:


Applied to the global cloud, the method showed minimal loss of non-noisy points but a high processing time (just over an hour).

_images/24.png _images/34.png

This solution is therefore an excellent candidate for the removal of vegetation. In the next study I will try to segment the image and apply the filter only in the areas which are marked as “vegetation”. Hopefully, this will minimize the loss of details not meant to be filtered. Once I get good results I’ll optimize the algorithm to reduce the computation time.

Chat with Alexander and Mattia
Friday, March 02, 2012

Today I had a chat with Alexander and Mattia on segmentation. Alexander explained his work on Non-Associative Markov Networks for 3D Point Cloud Classification: http://graphics.cs.msu.ru/en/node/546. This type of classification could be very useful for our ANF system since we are working with quite specific noise types. These noise types would probably be adequately distinguishable through the machine learning that is used in this classification method. Alexander adds that the current implementation uses OpenCV, which would add a new dependency if it was implemented into PCL as such.

While I am gathering information like this and thinking of how to combine it into one big ANF system, I will also be working on the following roadmap entries for the next couple of days:

  • Bugfix, patch and enhance the PCL filters module.
    • Make all filters that are about point removal derive from FilterIndices instead of Filter.
    • Move the getRemovedIndices() system from Filter to FilterIndices.
    • Implement the getRemovedIndices() system for all the derived filters.
    • Implement the MixedPixel class in PCL.

Chat with Jorge, Radu, Federico and Mattia
Thursday, March 01, 2012

Last Monday I had a very useful meeting with Jorge, Radu, Federico and Mattia about the TRCS and ANF. A quick summary of the conversation:

  • The system setup for TRCS-ANF is more clear now:
    • The automated noise filtering system will perform analysis on the scene and use the results of the analysis to determine which filtering to apply and which parameters to use.
    • The system could have any number of these analysis and filtering steps, where each step has a particular focus. Each step should minimize the removal/alteration of non-noise points, which could limit that step’s ability to tackle noise. Hence the idea of having multiple steps: they widen the overall applicability of the system.
    • Each step would have at most one settable parameter, like a slider that ranges from 0 to 1, indicating the “aggressiveness” of that step.
    • Ideally the number of sliders of the ANF system would approach zero. This would likely only happen if an adequate cost function can be devised. The cost function could also be used to enhance the performance of some of the steps in the system by allowing an iterative design.
    • The system can still be built in various ways, largely depending on which types of noise need to be tackled. For now we will focus on the noise of the Trimble data sets, namely: vegetation and moving objects noise.
  • Brainstorm on the analysis type steps:
    • Use segmentation to distinguish between different entities (both noise and non-noise). For instance:
    • Use properties of points and determine new properties based on surrounding points:
    • Apply a very fast, simple conditional analysis that passes points that are definitely not noise or filters points that definitely are noise. The first steps of the system should be fast and simple like this. As the subset of unclassified points decreases, the complexity of the steps increases.
    • Instead of binary removal of points, apply weights to points, describing the certainty that the point is noise. Different steps in the pipeline alter this weight accordingly. At the end of the (sub-)pipeline apply hysteresis and/or use one of the main system sliders to determine the actual removal.
    • If possible: use change detection to further analyze the scene. Most interesting option: combine this with the registration process of the total system using an iterative design. Allows to link the intermediate results of noise weighting across different point clouds. Also ensures a symbiotic relation between registration and filtering: SRAM (Simultaneous Registration And Modeling).
  • Brainstorm on the filtering type steps:

Implementing all of the ideas above could become a huge project. Tomorrow I will discuss with Mattia how to properly split up the system into subsystems and determine priorities for each of the subsystems. The results from that conversation can be found on my roadmap.

PCL’s Point Clouds in Outofcore Octrees
Wednesday, February 29, 2012

Over the weekend I committed a lot more refactoring and documentation for the outofcore octree code. It is now wrapped in the pcl::outofcore namespace, and I added support for a member function to the pcl::outofcore::octree_base class to copy data from a point cloud to an outofcore octree.

The outofcore library contains four classes:

  • octree_base
  • octree_base_node
  • octree_disk_container
  • octree_ram_container

Users interact with the outofcore octree entirely through the public interface of the octree_base class, which is templated by type of tree (disk v. ram) and the Point Type, and manages the nodes of the octree. The header files for the outofcore files are:

#include <pcl/outofcore/outofcore.h>
#include <pcl/outofcore/outofcore_impl.h>

For an out-of-core octree, use the octree_disk_container, which will create a root directory for the tree, and up to eight directories, labeled 0-7 within each subsequent directory. Using the octree_ram_container constructs or loads an octree entirely in main memory, which is similar to the Octree already in PCL.

Each directory represents a node of the octree, a cubic axis-aligned region of space whose dimensions are specified by a “bounding box”. All nodes at the same depth, but in different branches, represent disjoint regions of 3D space. On disk, each node is a folder containing:

  • a JSON metadata file
  • up to eight child-node directories labeled 0-7
  • an optional data file of points subsampled from its children. However, if the node is a leaf, then it contains the highest “Level of Detail,” i.e. all the points falling within that bounding box, whether or not the intermediate LOD point sets were computed.

The bounding box of the root node must be specified upon the first creation of the octree. This represents the region of 3D space which contains the entire out-of-core point cloud. Resizing an octree dynamically upon insertion of a point, or set of points, that do not fall within this bounding box is an expensive task, though not impossible (cf. Scheiblauer and Wimmer’s strategy in [1] ). However, it currently is not implemented, meaning any point that is inserted that does not fall within the root’s bounding box will be discarded.
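To illustrate how a point maps to the nested 0-7 directories, here is a small sketch. It is not the outofcore implementation: BoundingBox and nodePath are hypothetical names, and the labeling convention (bit 0 for x, bit 1 for y, bit 2 for z) is an assumption made for the example.

```cpp
#include <vector>

// Axis-aligned bounding box of a node (hypothetical helper type).
struct BoundingBox
{
  double min[3];
  double max[3];
};

// Computes the child labels (0-7) from the root down to `depth` for a point.
// Returns false, leaving `path` empty, if the point lies outside the root
// box, mirroring the "discard" behaviour described above.
bool nodePath (BoundingBox box, const double point[3], int depth, std::vector<int>& path)
{
  path.clear ();
  for (int a = 0; a < 3; ++a)
    if (point[a] < box.min[a] || point[a] >= box.max[a])
      return false;

  for (int level = 0; level < depth; ++level)
  {
    int child = 0;
    for (int a = 0; a < 3; ++a)
    {
      const double mid = 0.5 * (box.min[a] + box.max[a]);
      if (point[a] >= mid)
      {
        child |= 1 << a;   // upper half along this axis (assumed bit layout)
        box.min[a] = mid;
      }
      else
        box.max[a] = mid;
    }
    path.push_back (child);  // one directory level per entry
  }
  return true;
}
```

With a root box of [0, 8) on every axis, the point (1, 5, 7) at depth 2 yields the labels 6 and 4 under this convention, i.e. a relative directory path of 6/4 below the root.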

For building up octrees, I’ve added support for the addition of an arbitrary number of PointCloud::Ptr inputs via octree_base’s public member function:

addPointCloud(PointCloudConstPtr cloud)

Justin has written a command line tool for constructing an out-of-core octree from a PCD file. Once this tool has been used to create an out-of-core octree stored entirely on disk, the following code sample can be used to load and manipulate the out-of-core data.

At this point, Justin and I are looking forward to focusing on analysis of query, insertion, scalability and addition of features/algorithms for the outofcore data octree. I will also be adding examples to the trunk later this week.

[1] Claus Scheiblauer and Michael Wimmer, “Out-of-Core Selection and Editing of Huge Point Clouds.” Computers and Graphics, April 2011.

Add intensity differences detection to pcl::Segmentation module
Tuesday, February 28, 2012

I have opened up a new issue at http://dev.pointclouds.org/issues/630 and am still working on it.

Urban Robotics Octree Framework
Tuesday, February 28, 2012

The following diagram shows the general framework PCL received from Urban Robotics for use in the out-of-core project. The diagram is broken up into three parts:

  • Creation

    • Points of type PointT are added to the octree_base data structure. This data structure is in charge of managing the underlying child nodes and subsequently divided data.
    • As the points are subdivided, octree_base_nodes are created containing a random subsample or LOD of the points that are contained within each node (branch). These nodes are in charge of managing the bounding box and metadata on disk and hold payload data read from and written to disk, but they don’t handle the lower-level reads/writes.
    • Once a max depth or leaf node is reached, a container type is created to manage disk or RAM access. These are currently the only types of containers available within the framework.
    • The disk containers handle the low-level disk I/O.
  • Directory Structure

    • At the top level of the directory structure lives a .octree file containing the octree depth or LOD, number of points at each LOD and various other bits of meta data. This maps to the octree_base.
    • Each directory from the top level root directory maps to an octree_base_node. Each node directory contains a .oct_idx file providing the node’s bounding box and LOD data. Leaf nodes have no children (child directories) and are found at the max depth of the tree, providing access to the original payload data (not a subsample).
  • Query

    • When reading or querying the tree the octree_base provides an interface to the underlying data structure.
    • Querying the tree is accomplished by providing a bounding box that intersects with the underlying octree_base_nodes. These octree_base_nodes provide access to the point data via containers or filepaths containing the binary point data.

Stephen and I have been documenting and refactoring the underlying code and are at a point where we can start investigating some of the more interesting features to be implemented.

In addition I’ve started to commit tools that’ll be useful in the processing of pcd files for use in the framework.

New commit
Monday, February 27, 2012

Hi everybody. I’ve made it. Not without help from Radu, but I finally committed the code. I found out how those gTests work and wrote some for my class. I also solved the problem with line endings (MSVC was using CR+LF instead of LF).

There was one more interesting thing about the code that I wrote. I am using a vector of lists to store the indices of segmented points; it looks like std::vector<std::list<int>>. I had built the library on my own PC without problems, so I was very surprised when Radu told me that errors occurred during compilation. The cause was a missing space: in C++03, GCC parses the closing “>>” of nested templates as the right-shift operator, so it has to be written as “> >”.
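For reference, a portable spelling of such a type looks like the sketch below; SegmentIndices and countSegmentedPoints are hypothetical names used only for this example. C++11 and later accept “>>” here, but the spaced form compiles everywhere.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Note the space in "> >": a C++03 compiler would otherwise read ">>"
// as the right-shift operator and reject the declaration.
typedef std::vector<std::list<int> > SegmentIndices;

// Total number of point indices stored over all segments.
std::size_t countSegmentedPoints (const SegmentIndices& segments)
{
  std::size_t total = 0;
  for (std::size_t i = 0; i < segments.size (); ++i)
    total += segments[i].size ();
  return total;
}
```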

Right now I’m going to prepare the second variant of the RegionGrowing algorithm, taking into account all the difficulties that I met on my way. I think it will be easier and much faster because now I have more experience.

submitted pcl/surface/nurbs
Monday, February 27, 2012

I’ve submitted the templated version of NURBS fitting to pcl trunk and tested it with a small program.

OpenGL/GPU Optimizations
Monday, February 27, 2012

In this post I’m going to talk about some GPU optimizations which have increased the speed of the likelihood function evaluation. Hordur Johannsson was the one who did most of this OpenGL magic.

Evaluating Individual Measurement to Model Associations

This is the primary approach of this method. Essentially, by using the Z-buffer and an assumption of a generative ray-based cost function, we can evaluate the likelihood along the ray rather than the distance in Euclidean space. This was already discussed in an earlier post below; I mention it here only for context.

Computing Per-pixel Likelihood Functions on the GPU using a Shader

The likelihood function we are currently evaluating is a normalized Gaussian with an added noise floor:

pixel lh equation

Previously this was computed on the CPU by transferring the depth buffer back to the CPU from the GPU. We have instead implemented a GPU shader to compute this on the GPU.

Currently we look up this function from a pre-computed look-up table. Next we want to evaluate it functionally, to explore optimizing the function’s shape as well as things like image decimation. (Currently we decimate the depth image to 20x15 pixels.)

Summing the Per-pixel Likelihood on the GPU

Having evaluated the log likelihood per pixel, the next step is to combine them into a single log likelihood per particle by log summation:

particle lh equation

This can be optimized by parallel summation of the pixel images: e.g. from 32x32 to 16x16 and so on down to 1x1 (the final particle likelihood). In addition, there may be some accuracy improvement from summing similarly sized values and so reducing floating point rounding errors: e.g. (a+b) + (c+d) instead of ((a+b)+c) + d.
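The summation order can be illustrated on the CPU with a short recursive sketch. This is not the GPU code: pairwiseSum and particleLogLikelihood are hypothetical names, and the per-pixel log likelihoods are assumed to be available as a flat array.

```cpp
#include <cstddef>
#include <vector>

// Sums values[first, last) by splitting the range in half at every level,
// i.e. (a+b) + (c+d) rather than ((a+b)+c) + d. This is the same tree
// shape that a 32x32 -> 16x16 -> ... -> 1x1 reduction produces.
double pairwiseSum (const std::vector<double>& values, std::size_t first, std::size_t last)
{
  if (last == first)
    return 0.0;
  if (last - first == 1)
    return values[first];
  const std::size_t mid = first + (last - first) / 2;
  return pairwiseSum (values, first, mid) + pairwiseSum (values, mid, last);
}

// Log likelihood of one particle from its per-pixel log likelihoods.
double particleLogLikelihood (const std::vector<double>& pixel_log_lh)
{
  return pairwiseSum (pixel_log_lh, 0, pixel_log_lh.size ());
}
```

Each level of the recursion corresponds to one halving of the image, and the two halves are independent, which is what makes the scheme suitable for parallel evaluation.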

We’re currently working on this but the speed up is unlikely to be as substantial as the improvement in the previous section.

Single Transfer of Model to GPU

In addition to the above: previously we used a mixed polygon model. We have now switched to a model made up only of triangles. This allows us to buffer the model on the GPU once and then transmit only the indices of the model triangles that should be rendered in a particular iteration (currently all of the triangles).

Future work will look at determining the set of potentially visible polygons, perhaps using a voxel-based index grid. This is more complex as it requires either implicit or explicit clustering of the particle set.

We are also now using an off-screen buffer, which allows us to render virtual images without being limited by the machine’s screen resolution.

Putting it all together

We’ve benchmarked the improvements using a single-threaded application which carries out particle propagation (including Visual Odometry), rendering, likelihood scoring and resampling. The test log file was 94 seconds long at about 10Hz (about 950 frames in total). We are going to focus on the increased number of particles for real-time operation with all modules being fully computed.

The biggest improvement we found was using the triangle model. This resulted in about a 4 times increase in processing speed. Yikes!

particle lh equation

Using the triangle model, we then tested with 100, 400 and 900 particles the effect of computing the likelihood on the GPU using a shader:

particle lh equation

This results in a 20-40% improvement, although we would like to collect more sample points to verify this. The result is that we can now achieve real-time performance with 1000 particles. For the log file we didn’t observe significant improvement in accuracy beyond about 200 particles, so the next step is to start working with more aggressive motion and data with motion blur. There, being able to support extra particles will become important.

In my next post I’ll talk about how that computation is distributed across the various elements of the application.

Filters module clean up
Monday, February 27, 2012

For the last couple of days I have been working on http://dev.pointclouds.org/issues/614. I have expanded the FilterIndices base class with the following new systems:

  • FilterNegative
  • RemovedIndices
  • KeepOrganized
  • Filtering for NaN and Inf

The last of these is not actually part of the base class; the derived classes may implement it if they so choose. The reason for doing so would be to give meaning to the difference between FilterNegative and RemovedIndices. FilterNegative only inverts the conditions of point removal for the real points. RemovedIndices also keeps track of the points removed because of NaN or Inf.
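The distinction can be sketched with a simple range filter on a single field. This is an illustration only and not the FilterIndices code: FilterResult and filterRange are hypothetical names, and the closed interval test is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Indices that pass the filter and indices that were removed.
struct FilterResult
{
  std::vector<int> kept;
  std::vector<int> removed;
};

// Keeps values inside [lo, hi]; `negative` inverts that condition for
// finite values only. NaN and Inf values are always removed, and they
// show up in `removed` regardless of `negative`.
FilterResult filterRange (const std::vector<float>& values, float lo, float hi, bool negative)
{
  FilterResult result;
  for (std::size_t i = 0; i < values.size (); ++i)
  {
    const float v = values[i];
    const int index = static_cast<int> (i);
    if (!std::isfinite (v))
    {
      result.removed.push_back (index);  // not affected by `negative`
      continue;
    }
    const bool inside = (v >= lo && v <= hi);
    if (inside != negative)
      result.kept.push_back (index);
    else
      result.removed.push_back (index);
  }
  return result;
}
```

Running the filter twice with `negative` toggled swaps the finite points between `kept` and `removed`, while the NaN and Inf indices stay in `removed` both times.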

For the next couple of days I will upgrade the filters that can use these new systems to do so.

Joint Bilateral Upsampling for the Kinect
Sunday, February 26, 2012

Time for a new blog post. Lately, I have been working on image-based approaches for solving our smoothing and surface reconstruction problems. A straightforward but very effective method that I have wanted to implement for a long time is the one in:

  • Johannes Kopf, Michael Cohen, Dani Lischinski, and Matt Uyttendaele - Joint Bilateral Upsampling, ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007)

The idea behind it is to use the RGB image in order to enhance the depth image, in a joint bilateral filtering, based on the following formula:

\tilde{S}_p = \frac{1}{k_p} \sum_{q_d \in \Omega} S_{q_d} \, f(||p_d - q_d||) \, g(||\tilde{I}_p - \tilde{I}_q||)

where, in our case, S is the depth image and \tilde{I} is the RGB image.
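A compact, single-channel sketch of this formula is given below. It is not the PCL implementation: Image and jointBilateralUpsample are hypothetical names, both f and g are assumed to be Gaussians, the guide image is assumed to be an exact integer multiple of the depth resolution, and registration between the two images is taken for granted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major single-channel image (hypothetical helper type).
struct Image
{
  int width, height;
  std::vector<float> data;
  float at (int x, int y) const { return data[static_cast<std::size_t> (y) * width + x]; }
};

// depth: low-resolution S; guide: high-resolution intensity image I~.
// window: half-size of the neighbourhood Omega in low-resolution pixels.
Image jointBilateralUpsample (const Image& depth, const Image& guide, int factor,
                              int window, float sigma_spatial, float sigma_range)
{
  assert (factor >= 1 && guide.width == depth.width * factor
          && guide.height == depth.height * factor);
  Image out {guide.width, guide.height, std::vector<float> (guide.data.size (), 0.0f)};
  for (int y = 0; y < guide.height; ++y)
    for (int x = 0; x < guide.width; ++x)
    {
      const float xd = static_cast<float> (x) / factor;  // p_d, low-res coordinates of p
      const float yd = static_cast<float> (y) / factor;
      const int cx = x / factor, cy = y / factor;
      float sum = 0.0f, k = 0.0f;                        // k is the normaliser k_p
      for (int qy = cy - window; qy <= cy + window; ++qy)
        for (int qx = cx - window; qx <= cx + window; ++qx)
        {
          if (qx < 0 || qy < 0 || qx >= depth.width || qy >= depth.height)
            continue;
          const float ds2 = (xd - qx) * (xd - qx) + (yd - qy) * (yd - qy);
          const float di = guide.at (x, y) - guide.at (qx * factor, qy * factor);
          const float w = std::exp (-ds2 / (2.0f * sigma_spatial * sigma_spatial))
                        * std::exp (-di * di / (2.0f * sigma_range * sigma_range));
          sum += w * depth.at (qx, qy);
          k += w;
        }
      // Fall back to the nearest low-res sample if all weights underflowed.
      out.data[static_cast<std::size_t> (y) * guide.width + x] =
          k > 0.0f ? sum / k : depth.at (cx, cy);
    }
  return out;
}
```

The range term g is what keeps depth values from bleeding across color edges: low-resolution samples whose guide intensity differs strongly from the target pixel receive almost no weight.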

The nice thing about it is the fact that we can use the 15Hz mode of the Kinect in order to produce high resolution (1280x1024 px) RGB images and normal (640x480 px) depth images. By using this method, we can obtain 1280x960 depth images. I faced some problems with the registration of depth to high-res RGB images, so the results below show just the case of 640x480 depth and color images.

_images/low_res_color.png _images/low_res_random.png

As you can see, there are a lot of new points in the filtered point cloud (168152 vs 141511 in the input), no noisy points, and their positions are coherent with their neighbors.

Just as a teaser, an example of the high-resolution image-based upsampling (will come back with more results once we solve the registration problem mentioned above):


Also, in the meantime, I have spent a frustratingly large amount of time fixing the Poisson implementation we had in PCL. It seems that there was a memory error in the version a Google Summer of Code participant introduced in PCL, so in the next days we shall try to bring the newer version in.

Integral Images with point clouds
Saturday, February 25, 2012

I got back on track after my graduation holidays. I’m currently studying the Integral Images approach and how to use the technique for the point cloud. Nice results are expected from Tuesday.

Results for correspondence estimation based on normal shooting
Thursday, February 23, 2012

For every point in the source cloud, the normal and the K nearest points in the target cloud are computed. Among these K points, the point that has the smallest distance to the normal (point-to-line distance in 3D, http://mathworld.wolfram.com/Point-LineDistance3-Dimensional.html) is considered the corresponding point. The result for a synthetic test dataset is shown below: two parallel planes that differ in their y coordinates are created, and correspondences are shown by connecting lines. Implementation files and a test program are available in the trunk.
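The selection step can be sketched as follows, assuming the K nearest target points have already been found by a nearest-neighbor search. Vec3, pointToLineDistance and normalShootingMatch are hypothetical names and not the classes committed to the trunk.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec3 { double x, y, z; };

inline Vec3 sub (const Vec3& a, const Vec3& b) { return Vec3 {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 cross (const Vec3& a, const Vec3& b)
{
  return Vec3 {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline double norm (const Vec3& a) { return std::sqrt (a.x * a.x + a.y * a.y + a.z * a.z); }

// Distance from p to the line through `origin` with direction `dir`:
// |(p - origin) x dir| / |dir|.
double pointToLineDistance (const Vec3& p, const Vec3& origin, const Vec3& dir)
{
  return norm (cross (sub (p, origin), dir)) / norm (dir);
}

// Index of the candidate closest to the line along the source normal,
// or -1 if there are no candidates.
int normalShootingMatch (const Vec3& source, const Vec3& normal, const std::vector<Vec3>& candidates)
{
  int best = -1;
  double best_distance = std::numeric_limits<double>::max ();
  for (std::size_t i = 0; i < candidates.size (); ++i)
  {
    const double d = pointToLineDistance (candidates[i], source, normal);
    if (d < best_distance)
    {
      best_distance = d;
      best = static_cast<int> (i);
    }
  }
  return best;
}
```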

Points on the planes shown in white dots. Correspondence shown by red lines.

_images/normal-shooting-1.png _images/normal-shooting-2.png

Filters module analysis
Thursday, February 23, 2012

The most eligible PCL filters for the specific noise removal in the Trimble data sets have already been discussed in previous blog posts by me and Mattia. The filters described in this blog post are not really suitable to be independently tested on the Trimble data sets, but a quick summary would be useful. While I was analyzing these filters I stumbled upon minor bugs, API inconsistencies and typos. The following analysis will summarize functionality, time complexity and possible improvements.

1 Filter

Function: Base class for almost all filters. Inherits from PCLBase, which manages a point cloud.
Runtime: N/A
Notes: Currently manages removed_indices, which is only useful for filters that are about point removal, and only a few of those filters actually use this functionality. Better to move this functionality to the FilterIndices base class.

2 FilterIndices

Function: Base class for filters that are about point removal. Inherits from Filter; the added functionality is being able to forward the indices of filtered points instead of points themselves.
Runtime: N/A
Notes: Some filters still inherit from Filter that could easily be upgraded to inherit from this class.

3 Clipper3D

Function: Base class for BoxClipper3D and PlaneClipper3D.
Runtime: N/A
Notes: Not officially part of any API module.

4 ExtractIndices

Function: Extracts a set of indices from a point cloud as a separate point cloud.
Runtime: O(n), iterates through all indices once, not performing any real computations.
Notes: Uses setNegative instead of the inherited getRemovedIndices for inversion purposes. May be upgraded to inherit from FilterIndices instead of Filter.

5 PassThrough

Function: Pass certain elements of a point cloud based on constraints for one particular dimension. Can act on any dimension of any PointT, not just spatial dimensions. Only acts on one dimension at a time though.
Runtime: O(n), iterates through all indices once, not performing any real computations.
Notes: Has setFilterLimitsNegative and getRemovedIndices for inversion purposes. May be upgraded to inherit from FilterIndices instead of Filter.

6 CropBox

Function: Pass certain elements of a point cloud based on constraints for their spatial dimensions. The constraint area is always a box but can be scaled, rotated and translated to any extent.
Runtime: O(n), iterates through all indices once, performing minor computations.
Notes: Does not use the inherited getRemovedIndices method and has no inversion system.

7 CropHull

Function: Pass certain elements of a point cloud based on constraints for their spatial dimensions. The constraint area is defined by a polygon structure.
Runtime: O(n \cdot p), iterates through all indices and through all polygon points, performing medium computations.
Notes: Uses setCropOutside instead of the inherited getRemovedIndices for inversion purposes.

8 PlaneClipper3D

Function: Check points against constraints for their spatial dimensions. The constraint area is a half-space, defined by a plane in 3D. Can be used on separate points, lines, planar polygons and point clouds.
Runtime: O(n), iterates through all indices once, performing minor computations.
Notes: Not part of any API module.

9 BoxClipper3D

Function: Check points against constraints for their spatial dimensions. The constraint area is always a box but can be scaled, rotated and translated to any extent. Can be used on separate points as well as point clouds.
Runtime: O(n), iterates through all indices once, performing minor computations.
Notes: Not part of any API module. Two virtual methods are not implemented. The point cloud implementation is almost identical to the CropBox functionality.

10 ProjectInliers

Function: Project certain elements of a point cloud to a predefined model. The possible models are defined in the sample_consensus module. Only moves points, does not remove points.
Runtime: O(n), iterates through all indices once, performing medium computations.

11 RandomSample

Function: Downsamples a point cloud with uniform random sampling.
Runtime: O(n), iterates through all indices once, performing minimal computations.
Notes: Does not use the inherited getRemovedIndices method. Does not use the inherited setIndices method.

For the next few days I will be tackling some of the typos and minor issues and will be adding the “bigger” issues on dev.pointclouds.

Source code
Wednesday, February 22, 2012

Hi everybody. I have committed the source code of the RegionGrowing algorithm. Right now it is disabled, because it still needs a few fixes, but everyone who is interested can look at it.

I wrote one more variant of this algorithm. It takes the color of the points into account. It also has a good approach for controlling under- and over-segmentation. The idea is very simple: if a segment has fewer points than the user wants, the algorithm finds the nearest neighbouring segment and merges the two together. I want to test it a few more times. The detailed description can be found in the article “Color-based segmentation of point clouds” by Qingming Zhan, Yubin Liang and Yinghui Xiao.
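The merging rule could look roughly like the sketch below. It is an illustration under assumptions, not the committed algorithm: Segment and mergeSmallSegments are hypothetical names, and “nearest” is taken to mean the smallest distance between segment centroids, which may differ from the criterion used in the article.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical segment record: point indices plus the centroid of those points.
struct Segment
{
  std::vector<int> indices;
  double cx, cy, cz;
};

// Merges every segment with fewer than min_size points into its nearest
// neighbour (smallest centroid distance) until no undersized segment is left.
void mergeSmallSegments (std::vector<Segment>& segments, std::size_t min_size)
{
  bool merged = true;
  while (merged && segments.size () > 1)
  {
    merged = false;
    for (std::size_t i = 0; i < segments.size () && !merged; ++i)
    {
      if (segments[i].indices.size () >= min_size)
        continue;
      std::size_t best = i;
      double best_d = std::numeric_limits<double>::max ();
      for (std::size_t j = 0; j < segments.size (); ++j)
      {
        if (j == i)
          continue;
        const double dx = segments[j].cx - segments[i].cx;
        const double dy = segments[j].cy - segments[i].cy;
        const double dz = segments[j].cz - segments[i].cz;
        const double d = dx * dx + dy * dy + dz * dz;
        if (d < best_d) { best_d = d; best = j; }
      }
      if (best == i)
        continue;  // no usable neighbour found
      Segment& target = segments[best];
      const Segment& tiny = segments[i];
      const double nt = static_cast<double> (target.indices.size ());
      const double ns = static_cast<double> (tiny.indices.size ());
      if (nt + ns > 0.0)
      {
        // Point-count weighted centroid of the merged segment.
        target.cx = (nt * target.cx + ns * tiny.cx) / (nt + ns);
        target.cy = (nt * target.cy + ns * tiny.cy) / (nt + ns);
        target.cz = (nt * target.cz + ns * tiny.cz) / (nt + ns);
      }
      target.indices.insert (target.indices.end (), tiny.indices.begin (), tiny.indices.end ());
      segments.erase (segments.begin () + static_cast<std::ptrdiff_t> (i));
      merged = true;  // indices shifted, so restart the scan
    }
  }
}
```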

One more interesting thing about the committed code: during testing I found it too slow. I was very surprised when my profiler said that the problem was in std::vector<bool>. I didn’t know that it packs booleans one per bit, which costs speed when accessing elements. Anyway, I solved this problem by simply changing the value type.
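One common way to do this is to store one byte per flag, as in the sketch below. The choice of unsigned char and the names PointFlags and markProcessed are assumptions made for the example; the post does not say which type was used.

```cpp
#include <cstddef>
#include <vector>

// One byte per flag. Unlike std::vector<bool>, which packs flags into bits
// and returns proxy objects, operator[] here yields a plain reference.
typedef std::vector<unsigned char> PointFlags;

// Marks the given point indices as processed and returns how many of them
// were not marked before.
std::size_t markProcessed (PointFlags& processed, const std::vector<int>& indices)
{
  std::size_t newly_marked = 0;
  for (std::size_t i = 0; i < indices.size (); ++i)
  {
    unsigned char& flag = processed[static_cast<std::size_t> (indices[i])];
    if (!flag)
    {
      flag = 1;
      ++newly_marked;
    }
  }
  return newly_marked;
}
```

The trade-off is eight times the memory per flag in exchange for element access without bit masking.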

My next step is to write some unit tests for the algorithm and make a tutorial.

Global optimization
Wednesday, February 22, 2012

Last week I worked on global optimization of objects with several fitted NURBS surfaces. As mentioned in earlier posts, there are several problems when fitting several NURBS to C^1 continuous regions of an object, like overlapping, gaps and other intersection and alignment combinations.

Until now I was fitting the C^1 continuous regions sequentially, each in a linear equation. The key idea of global optimization of NURBS is to assemble all NURBS fitting equations of one object into one system of linear equations. This allows relationships between NURBS to be defined, like the closing boundary constraint, which basically states that a point on one NURBS lies on a point on another NURBS. This is especially visible in the ‘Boxes’ videos available at http://users.acin.tuwien.ac.at/tmoerwald/?site=4.

The points for closing the boundary between NURBS are also used for trimming them. Since those points are by definition the outline of the sub-point-cloud of the C^1 continuous region they are simply added to the triangulation algorithm.

For triangulation the established Delaunay Triangulation is used (http://www.sintef.no/Projectweb/Geometry-Toolkits/TTL/). The textures are captured from the RGB image by simply projecting them into the camera plane. This causes the problem that the texture of an occluded area is the same as that of the occluder. To solve this problem I want to implement a z-buffer to check which surface is the nearest.

Coming up next:

  • Multiple views for complete models.

Moving Least Squares Upsampling Methods (cntd)
Sunday, February 19, 2012

In the last period, I have concentrated on coming up with new and better upsampling methods for the Moving Least Squares algorithm. Also, a lot of issues with the ones presented last time were solved.

Everything was committed to trunk (along with a complete interface and documentation) and should be included in the next PCL release. The upsampling methods are the following:

  • NONE - no upsampling will be done, only the input points will be projected to their own MLS surfaces
  • SAMPLE_LOCAL_PLANE - the local plane of each input point will be sampled in a circular fashion using the upsampling_radius and the upsampling_step parameters
  • RANDOM_UNIFORM_DENSITY - the local plane of each input point will be sampled using a uniform random distribution such that the density of points is constant throughout the cloud - given by the desired_num_points_in_radius parameter
  • VOXEL_GRID_DILATION - the input cloud will be inserted into a voxel grid with voxels of size voxel_size; this voxel grid will be dilated dilation_iteration_num times and the resulting points will be projected to the MLS surface of the closest point in the input cloud; the result is a point cloud with filled holes and a constant point density.
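As an illustration of SAMPLE_LOCAL_PLANE, the sketch below generates the (u, v) offsets in the local plane at which new points would be created. It is an assumption about the sampling pattern (a regular grid clipped to a disc), not the committed code; sampleLocalPlane is a hypothetical name and the projection of each offset onto the MLS surface is left out.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Offsets (u, v) in the local plane of one input point: a regular grid with
// spacing upsampling_step, clipped to a disc of radius upsampling_radius.
std::vector<std::pair<double, double> > sampleLocalPlane (double upsampling_radius,
                                                          double upsampling_step)
{
  assert (upsampling_radius >= 0.0 && upsampling_step > 0.0);
  std::vector<std::pair<double, double> > offsets;
  // Small epsilon guards against ratios such as 0.3 / 0.1 rounding down.
  const int n = static_cast<int> (std::floor (upsampling_radius / upsampling_step + 1e-9));
  for (int i = -n; i <= n; ++i)
    for (int j = -n; j <= n; ++j)
    {
      const double u = i * upsampling_step;
      const double v = j * upsampling_step;
      if (u * u + v * v <= upsampling_radius * upsampling_radius)
        offsets.push_back (std::make_pair (u, v));
    }
  return offsets;
}
```

With a radius of 1.0 and a step of 0.5 this produces 13 offsets per input point, which shows how quickly the output cloud grows as the step shrinks relative to the radius.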

A quick timing analysis shows us that the running times are well within the 2 minutes as mentioned in the project requirements. Important to note is the fact that the bulk of the time (~35s) is spent on computing the MLS surface, and about 1-3s is spent on the actual upsampling. Thus, we can conclude that the quality improvements of the upsampling are well worth the additional ~5% increase in execution time.

Upsampling method Time(s) Resulting # points
NONE 35 256.408

A more conclusive test would be to take a Kinect cloud of a wall at a distance where the noise is accentuated (~3m) and try to fit a plane in each of the resulting upsampled clouds. In order to make the experiment more realistic, we took the picture of the wall at an angle, such that the quantization effects would increase along the wall. The numeric results are the following:

Upsampling method Cloud # points % inliers
original 275.140 81.3
NONE 275.140 81.1
SAMPLE_LOCAL_PLANE 2.201.120 81.2

Unfortunately these numerical values do not represent the actual quality of the fit, because of the varying point density across the cloud in the different upsampling methods (i.e., the parts of the wall closer to the sensor had a larger density and precision in the original cloud, and as points get farther from the sensor, the sparsity and noise increase; BUT in VOXEL_GRID_DILATION and RANDOM_UNIFORM_DENSITY, the density is constant across the cloud, meaning that the noisy part of the wall has the same amount of points as the more precise part).

As such, in order to analyze the quality of the fit, we do a visual analysis of the inliers/outliers ratio, as shown in the following pictures:

Original cloud and its plane inliers


NONE cloud and its plane inliers


SAMPLE_LOCAL_PLANE cloud and its plane inliers


RANDOM_UNIFORM_DENSITY cloud and its plane inliers


VOXEL_GRID_DILATION and its plane inliers


The conclusion is that the VOXEL_GRID_DILATION method behaves best, as it has the fewest holes of all the options.

So this is a wrap-up for the MLS smoothing. Next, I shall be looking into image-based hole-filling and how this can be applied to our problem. This will involve some experiments using MATLAB and adding some code into the PCL I/O framework.

Bilateral filter analysis
Friday, February 17, 2012

The current bilateral filter in PCL acts only on the intensity values of the points in a point cloud and is therefore not interesting for the desired noise removal in the Trimble data sets. However, analysis of this implementation will help in understanding a new implementation and gives an indication for the time complexity that can be expected.

The current version has O(n \cdot m) complexity where n is the number of points to iterate through and m the average number of neighbors found for each point.

For each point pair it will calculate weights based on distance and intensity difference using Gaussian kernels. It uses this information to only alter the intensity values of the points. For more information: C. Tomasi and R. Manduchi. Bilateral Filtering for Gray and Color Images. In Proceedings of the IEEE International Conference on Computer Vision, 1998.

The neighbors are found by a radius search (currently only tested with kdtree search) where the radius is 2 \, \sigma_s. The searching is the most time consuming part of the algorithm and the parameter \sigma_s greatly determines runtime.
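The following Python sketch illustrates the filter as described above. It is an illustration under assumptions, not the PCL code: a brute-force radius search stands in for the kd-tree search, the kernel is taken to be the standard Gaussian exp(-x^2 / (2 sigma^2)), and the function name bilateral_intensity is made up for this example.

```python
import math

def bilateral_intensity(points, intensities, sigma_s, sigma_r):
    """Return smoothed intensities; the point coordinates are left untouched."""
    radius = 2.0 * sigma_s  # search radius of 2 * sigma_s, as stated in the post
    smoothed = []
    for p, i_p in zip(points, intensities):
        weighted_sum = 0.0
        weight_total = 0.0
        # Brute-force neighbour search: O(n) per point, O(n * n) overall.
        for q, i_q in zip(points, intensities):
            dist = math.dist(p, q)
            if dist > radius:
                continue
            w_spatial = math.exp(-(dist ** 2) / (2.0 * sigma_s ** 2))
            w_range = math.exp(-((i_p - i_q) ** 2) / (2.0 * sigma_r ** 2))
            weighted_sum += w_spatial * w_range * i_q
            weight_total += w_spatial * w_range
        # The point itself always contributes a weight of 1, so no division by zero.
        smoothed.append(weighted_sum / weight_total)
    return smoothed
```

With a small sigma_r, neighbours whose intensity differs strongly from the query point receive almost no weight, which is why edges are preserved; with a very large sigma_r the filter approaches a purely spatial Gaussian blur.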

\boldsymbol{\sigma_s} Time (s)
1 21
5 68
10 203
15 426
25 1116

Increasing \sigma_s also increases the area of effect of the smoothing as can be seen in the following pictures:

original, \sigma_s = 5:

_images/031.png _images/041.png

\sigma_s = 15, \sigma_s = 25:

_images/051.png _images/061.png

The above results all have a \sigma_r of 1000. Reducing this number reduces the kernel effect and gives seemingly similar results as reducing \sigma_s. The value of \sigma_r has no effect on the computation time.

This bilateral filter could already be considered useful for the Trimble data sets since the intensity does have some noise in it. With \sigma_s = 15 the noise is significantly removed whilst not reducing detail on edges. The runtime is however very long. Hopefully a new searching algorithm would significantly reduce this. Furthermore it can be noted that the code can easily exploit computing parallelism since there is no conditional branching and few dependencies.

In conclusion: The filter is very powerful and elegant, requiring few input parameters that are also easily understood. The upgrade to a spatial bilateral filter for 3D will likely be worthwhile, although the drawback (for now) will be its runtime.

Preliminary conversion from UR’s point type to PCL’s PointT
Thursday, February 16, 2012

I’ve started to make the change to PCL’s templated point type. It required minimal refactoring since the code is already templated, and very few operations are done on the point itself except accessing the fields (which are fortunately mostly the same as PCL’s). To accommodate the Urban Robotics point data structure, we’ll need to use the PointXYZRGBNormal data type, but for the time being, I jettisoned three additional fields (error, cameraCount and traidID) and have been testing with PointXYZ. The extra fields are currently just payload and are never used within the library. We may need to develop some custom point types for the additional payload fields in the long run.

Once I made the change, I had to rewrite some of the unit tests, as they relied on the operator== in the URCS point struct. This was a minor change, but now I can construct and query octrees out of core with the pcl::PointXYZ. I’d still like to benchmark HDD performance for queries, since performance seems to deteriorate quite quickly on my slow hard drive. I’m also planning to use the concepts in the existing octree unit tests as a basis to do a more exhaustive test of the out-of-core octree interface.

[==========] Running 6 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 6 tests from PCL
[ RUN      ] PCL.Outofcore_Octree_Pointcloud_Test
[       OK ] PCL.Outofcore_Octree_Pointcloud_Test (3448 ms)
[ RUN      ] PCL.Bounding_Box
[       OK ] PCL.Bounding_Box (0 ms)
[ RUN      ] PCL.Point_Query
[       OK ] PCL.Point_Query (215 ms)
[ RUN      ] PCL.Octree_Build
[       OK ] PCL.Octree_Build (700 ms)
[ RUN      ] PCL.Octree_Build_LOD
[       OK ] PCL.Octree_Build_LOD (56 ms)
[ RUN      ] PCL.Ram_Tree
[       OK ] PCL.Ram_Tree (42 ms)
[----------] 6 tests from PCL (4461 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 1 test case ran. (4461 ms total)
[  PASSED  ] 6 tests.

Tonight and tomorrow I hope to start working on changing the container type to PointCloud pointers. Justin and I have also been discussing ways to remove the JSON dependency. Once the basic port is done, I’ll be focusing on improving the interface, improving features and improving overall performance of the out-of-core octree. Finally, I hope to help develop supporting algorithms for Justin’s out-of-core octree visualization tools.

New filtering idea (work in progress...)
Thursday, February 16, 2012

Among all algorithms the filter StatisticalOutlierRemoval is definitely the best, although this has many faults such as the elimination of good portions of the cloud. Thanks to the suggestions of Jorge and Federico, I spent some time considering optimizations and finding out how far we can improve the algorithm.

The studied subjects are two:

  1. Optimizing the search for adjacent points with the study of integral images.
  2. Optimizing the search of noisy points.

The last three days I was focused on the second point. First, it is important to consider how the StatisticalOutlierRemoval works:

  • For every point in the cloud, the average Euclidean distance from the point to a set of N neighboring points is calculated.
  • Then the variance and standard deviation of all these mean distances are estimated, resulting in a bell-shaped distribution similar to a Gaussian.
  • The algorithm iterates over the cloud points again and deletes all those that fall outside a certain variance range.
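The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not the PCL class: the neighbour search is brute force, and the rejection rule (keep a point if its mean distance is at most mean + std_mul * standard deviation) as well as the parameter names mean_k and std_mul are assumptions made for the example.

```python
import math
import statistics

def statistical_outlier_removal(points, mean_k, std_mul):
    """Keep points whose mean neighbour distance is not unusually large.

    Needs at least two points. Brute force: O(N * N log N) for the sorting.
    """
    mean_dists = []
    for i, p in enumerate(points):
        # Step 1: mean distance from p to its mean_k nearest neighbours.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:mean_k]
        mean_dists.append(sum(nearest) / len(nearest))

    # Step 2: global mean and standard deviation of those mean distances.
    mu = statistics.fmean(mean_dists)
    sigma = statistics.stdev(mean_dists)

    # Step 3: drop every point that lies above the (assumed) one-sided threshold.
    threshold = mu + std_mul * sigma
    return [p for p, d in zip(points, mean_dists) if d <= threshold]
```

For example, a regular 3 x 3 grid of points plus one point far away from it loses only the distant point when called with mean_k=2 and std_mul=1.0.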

A major drawback is that this algorithm does not take into account the direction of the calculated distances. A good idea is to change the previous filter by introducing the covariance matrix and the eigenvalue decomposition:

  • For every cloud point, the vector of mean distances (u) within a fixed radius R is calculated.
  • The covariance matrix is calculated using the X_i measurements (along the x, y and z axes) and the expected value u.
  • The eigenvalue decomposition of the covariance matrix is performed and the smallest eigenvalue is taken into account (NB: the largest eigenvalue corresponds to the axis that has the greatest variance; it is therefore the variance of the first principal component. The other eigenvalues are the variances along the other “weak” directions. A large eigenvalue indicates that the query point is only weakly connected to other points along a direction, and vice versa for a small eigenvalue. Taking the smallest eigenvalue, we can assess the best “connection” level with the rest of the point cloud).
  • Variance and standard deviation are extracted from the eigenvalues, resulting in a bell-shaped distribution similar to a Gaussian.
  • The algorithm iterates over the cloud points again and deletes all those that fall outside a certain variance range.

From a theoretical standpoint, the changes should remove only the points that are weakly connected to the rest of the cloud along the three directions x, y and z. After various tests, and despite the validity of the theory, the attached picture shows that the real results are not promising: points are removed almost exclusively from the “good” part of the cloud.


The ghost points are still present and the points are particularly affected in the corners (see the building facade and the car).

Diving into the UR Out-of-core Octree Code
Wednesday, February 15, 2012

Thanks to Justin Rosen, the Urban Robotics out-of-core octree code is now in the trunk, cleaned up quite a bit from its original state, and compiling in the PCL framework (but we haven’t changed the data structures to PCL’s yet). I have been studying the code and starting to determine what else will need to be done as far as integration goes. I have to thank Justin and Radu for their warm welcome, and for getting me up to speed over the weekend.

So far, I have run the unit tests that Justin converted from Boost to Google unit testing framework, and experimented with the limitations of the code. Currently, I’m working on a combination of two things. I’m working on refactoring the code to work with PCL’s templating and allow point clouds, rather than vector<PointT> as inputs. The transition is going smoothly so far. I can still build the octrees on disk with the unit tests, but there are some issues with querying the points / bounding boxes that I’m still looking into. I’ll have a more detailed update as soon as I get it straightened out.

For tracking down bugs in the PCL in general, I’ve set up a double-buffered installation of PCL trunk and whatever revision I’m interested in, and set up a build environment for test programs to build and run against the different versions. Now that I’m getting settled in, I’m hoping to establish a more specific roadmap by the end of the week, and add some literature review for out-of-core and octree background reading.

Final week for this quarter
Wednesday, February 15, 2012

My final week of this quarter is approaching, and I am under a little more pressure. Anyway, I am planning to play with more data from Trimble and try to gain some ideas from the noise removal project.

Quantitative filter analysis using benchmarks
Wednesday, February 15, 2012

Today I finished the quantitative filter analysis benchmark algorithm, which takes the resulting cloud of a filter, compares it against the target benchmark cloud using octree and returns the following result:

Error = \dfrac{noise_{known}-noise_{removed}}{noise_{known}} \; + \; \dfrac{noise_{added}}{\frac{1}{10} \, desired} \; + \; \dfrac{desired_{removed}}{\frac{1}{10} \, desired}

The first term ranges from 0 to 1, denoting the amount of noise remaining, i.e. how good the filter is at removing the noise we want it to remove. The second term increases if additional noise is being generated because of the filter. The third term increases if non-noise/desired points are being removed because of the filter. If the resulting sum of these terms becomes equal to or greater than 1, the filter is deemed useless. Because of this interrelationship the last two terms are scaled with a percentage of the total desired points in the cloud. This value (currently 10%) may still be changed after further analysis. Also, the check for removed points currently does not take point location into account. For instance, if the desired points were uniformly downsampled to 90% of the original, the error would be 1 although the actual result would not be that useless.
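As a small worked example, the error formula can be written as a Python function. The function and argument names are chosen here for illustration; only the formula itself and the 10% scale come from the post.

```python
def filter_error(noise_known, noise_removed, noise_added,
                 desired, desired_removed, scale=0.1):
    """Evaluate the benchmark error; a value of 1 or more marks a useless filter."""
    remaining = (noise_known - noise_removed) / noise_known  # ranges from 0 to 1
    added = noise_added / (scale * desired)                  # noise created by the filter
    lost = desired_removed / (scale * desired)               # desired points removed
    return remaining + added + lost

# The downsampling case from the text: all known noise is removed, nothing is
# added, but 10% of the 1000 desired points are removed as well.
downsampled = filter_error(noise_known=100, noise_removed=100, noise_added=0,
                           desired=1000, desired_removed=100)

# A filter that removes half of the noise and touches nothing else.
half = filter_error(noise_known=100, noise_removed=50, noise_added=0,
                    desired=1000, desired_removed=0)
```

The first call evaluates to 1 (up to floating-point rounding), matching the downsampling case discussed above, and the second to 0.5.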

For the next few days I will be finishing up on the PCL filter testing using these new metrics.

RGB-D Localization Software Framework
Tuesday, February 14, 2012

An overview of the system we are developing is illustrated at the bottom of this post. Sensor output is illustrated in lemon color, although only RGB-D data is considered in this project. Modules we have been actively developing are colored blue. The module in orange is our Incremental Motion Estimation algorithm. This module currently utilizes a feature-based visual odometry algorithm. One alternative to this approach would be Steinbrucker et al. (ICCV 2011), which was developed specifically for RGB-D.

System Diagram

We have also illustrated in pink three modules which would naturally augment the point cloud-based localization system by providing redundancy and recovery methods. We have been developing these modules independently of PCL. They are illustrated here to demonstrate the flexibility of an SMC-based approach to localization.

Some typical VO performance for motion along a corridor. The boxes are 5m in size. We used a Kinect and the Freenect driver for this.

Typical VO Performance

I’m currently finalizing a particle filter which uses Eigen Isometry3d and boost. My old version used GSL for random number generation, which I want to move away from.

Quantitative filter analysis using benchmarks
Tuesday, February 14, 2012

Today I finished the benchmark target point cloud that has its noise manually removed. After discussing with Jorge, it was decided that we are currently only focussing on removal of points and not smoothing of points.

_images/011.png _images/021.png

Next I will be working on the comparison algorithm that will return a number describing the success of noise removal. I will coordinate with Shaohui since this is a particular case of change detection.

Benchmark of the computation time
Monday, February 13, 2012

I have prepared a detailed report for the analysis of the computational speed of the filtering methods RadiusOutlierRemoval and StatisticalOutlierRemoval. To do that, I performed the tests on two point clouds: A with 3.7 million points and B with 24.4 million points. The laptop I used runs Linux Kubuntu 11.04 and has an Intel Core Duo P8600 2.4GHz and 4GB RAM.


RadiusOutlierRemoval: a for loop iterates through all N elements of the cloud (complexity O(N)). Within the same cycle, all points within a radius r are found to check whether the query point has a sufficient number of neighbors. Assuming that the search for the points within a radius r (‘RadiusSearch’) is brute-force (O(N) for unordered point clouds), the computational complexity of this method is O(N·N).
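The nested-loop structure behind the O(N·N) estimate can be sketched as follows. This is an illustration of the brute-force case assumed in the analysis, not the PCL class; the function and parameter names are made up for the example.

```python
import math

def radius_outlier_removal(points, radius, min_neighbors):
    """Keep points that have at least min_neighbors other points within radius."""
    kept = []
    for i, p in enumerate(points):        # outer loop over all N points
        count = 0
        for j, q in enumerate(points):    # brute-force radius search: O(N)
            if i != j and math.dist(p, q) <= radius:
                count += 1
        if count >= min_neighbors:
            kept.append(p)
    return kept

# Four points forming a small cluster plus one isolated point.
cloud = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.0, 0.01, 0.0),
         (0.01, 0.01, 0.0), (1.0, 1.0, 1.0)]
filtered = radius_outlier_removal(cloud, radius=0.02, min_neighbors=2)
```

Here the isolated point is dropped because no other point lies within its search radius, while the four clustered points are kept.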

The table has been constructed by varying the searching radius of the points, and the results are expressed in seconds:

Radius (mm) Cloud A (sec) Cloud B (sec)
10 12s 24s
20 24s 43s
50 93s 151s
100 320s 530s
200 1569s 2200s

By increasing the searching radius, the computation time grows very fast. This means that the search algorithm totally affects the filtering speed of this methodology.



StatisticalOutlierRemoval: a for loop iterates through all the elements of the point cloud (O(N)). Then, still within the same cycle, the method ‘nearestKSearch’ searches for the meanK points closest to the query point (O(N·meanK)). After that, the average distance is calculated to obtain the Gaussian distribution (O(N)). Finally, the filtering is completed by the variance analysis (O(N)). Thus, the computational complexity is approximately O(N·(N·meanK) + N + N).

Num of neigh Cloud A (sec) Cloud B (sec)
10 12s 25s
20 17s 33s
50 33s 65s
100 186s 134s
200 1600s 406s


The above results show that the search algorithms are the most time consuming part of the classes. Therefore it’s very important to develop a spherical ordered search algorithm in order to optimize any type of filter operation that requires the searching of surrounding points: ‘SphericalOrganizedNeighbor’.

Urban Robotics Octree Refactor and Documentation
Monday, February 13, 2012

I’ve started to dig through Urban’s code a bit more, refactoring where possible. The code is now broken out into a few more manageable pieces. I’ve also started commenting and documenting the portions I’ve walked through.

Radu and I welcomed Stephen Fox today, who is working on the URCS code sprint. Stephen’s been brought up to speed and we’ll now have two minds diving into the world of out-of-core visualization.

I committed to the trunk a pcl_outofcore module. It’s unclear to me at the moment if this will eventually be rolled into the existing octree module.

Chat with Radu, Federico and Mattia
Monday, February 13, 2012

Last Friday I had a very useful chat with Radu, Federico and Mattia about the TRCS and ANF. Unfortunately Jorge was not able to be present.

A quick summary of the conversation:

  • For the time being there are 3 tasks at hand that Mattia and I can work on:
    • Test current PCL filters on Trimble data sets.
    • Bugfix and patch/modify filters that do not perform as they should.
    • Research and brainstorm for new filter implementations.
  • For further filter testing during this TRCS, a more quantifiable error metric is useful. Radu suggested to create a target point cloud that has the noise manually removed and compare that target with the filter results. This needs to be further discussed with Jorge since he will know best what noise needs to be removed. Another topic of further discussion relates to currently defining noise solely as point removal, though point cloud smoothing is also interesting. Alexandru-Eugen Ichim is currently working on an algorithm also used for point cloud smoothing: http://www.pointclouds.org/blog/tocs/aichim/index.php.
  • Radu mentioned that shadow point removal is easily implementable and already being used on their PR2. Related papers: http://researchcommons.waikato.ac.nz/bitstream/handle/10289/3828/Mixed%20Pixel%20Return%20Separation.pdf and http://www.robotic.de/fileadmin/robotic/fuchs/TOFCamerasFuchsMay2007.pdf.
  • The current PCL bilateral filter only changes intensity. A new filter implementation based on the bilateral filter would act on the 3D coordinates. Federico is very knowledgeable in this field. A link that came up during the chat: http://people.csail.mit.edu/sparis/bf/
  • Federico mentioned the topic of integral images in 3D, which would be useful for the 3D bilateral filter and for fast filtering. Mattia has shown interest in working on this implementation.
  • For those filters that use a searching algorithm, which are the more powerful filters, the searching is currently the most time consuming aspect. Michael, Radu and Suat are discussing the possibility for adding a SphericalOrganizedNeighbor search, useful for LIDAR scans.
  • For vegetation removal; remove areas with high curvature; analyze neighborhoods differently.
  • For LIDAR scans; take note that point cloud density is not uniform.
  • For exploiting GPU parallelism; implementations will stay in trunk till PCL 2.0

I have updated my roadmap and will start work on the new error metric; creating the benchmark target (segment from Statues_1.pcd) that has its noise manually removed.

Closing the gaps
Monday, February 13, 2012

Hi, I committed the code for NURBS fitting to svn last week. There’s still some clean-up necessary but the code should work fine!

I’m now starting with optimization methods for reconstructed NURBS surfaces: After fitting a set of NURBS to a point cloud, a couple of things have to be done to make a nice CAD model out of it. E.g. gaps have to be closed, intersecting NURBS need to be trimmed, ...

To deal with these issues I’m working on a global optimization approach, where I solve a linear system in the least-squares sense, so that the NURBS still approximate the point cloud while meeting the constraints mentioned above (formulated as weak constraints).
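One common way to write such a problem, given here only as a sketch and not necessarily the formulation used in the code: let x collect the NURBS control points, let A x = b express that the surfaces pass through the data points, and let C_k x = d_k be the k-th constraint (for example, closing a gap) with weight w_k. All of these symbols are assumptions introduced for illustration.

```latex
\min_{x} \; \lVert A x - b \rVert^2 + \sum_{k} w_k \, \lVert C_k x - d_k \rVert^2
\quad\Longrightarrow\quad
\Bigl( A^\top A + \sum_{k} w_k \, C_k^\top C_k \Bigr) x
  = A^\top b + \sum_{k} w_k \, C_k^\top d_k
```

Because the constraints enter only as weighted penalty terms, they are satisfied approximately rather than exactly; larger weights w_k pull the solution closer to the constraints at the cost of a looser fit to the point cloud.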

Trimble data
Sunday, February 12, 2012

Hi everybody. I finally managed to test the algorithm on the Trimble datasets. So here are some results.

_images/place_1.png _images/place_2.png _images/facade_1.png _images/facade_2.png _images/elephant_1.png _images/elephant_2.png _images/statues_1.png _images/statues_2.png
Finding the best method to filter vegetation
Sunday, February 12, 2012

A first analysis of the filters available in PCL showed that only two can be considered almost valid for the removal of “ghost points” and vegetation. Though a good setting of the parameters can return good results for the removal of shadows and moving objects, the results were not as satisfactory for the removal of vegetation. As a point of reference I took a cloud in which the trees are covered with many leaves. The goal was to minimize the leaves, which appear as small groups of flying points in a PointCloud.

Results with RadiusOutlierRemoval (on the right the original cloud, on the left the filtered one):

_images/17.png _images/25.png

Results with StatisticalOutlierRemoval (on the right the original cloud, on the left the filtered one):

_images/35.png _images/42.png

In conclusion I can say that neither of the filters is actually able to offer an intelligent removal of vegetation without damaging the “good” background of the point cloud.

Started my experiments
Sunday, February 12, 2012

I have started validating the existing registration algorithms on Trimble data. I am beginning with validating Point to Plane ICP on the entire dataset. It’s going to take a few hours for the algorithm to finish given the size of the dataset. On the implementation side, I have implemented CorrespondenceEstimation based on normal shooting. A new class CorrespondenceEstimationNormalShooting should be available in the trunk shortly. The reference to this method is “Efficient Variants of ICP” as mentioned in my previous blog post.

PCL Visualizer
Sunday, February 12, 2012

Spent some time with Radu going over some of the issues related to the current PCL Visualizer. We were able to reduce memory usage and improve rendering speed for larger datasets. In the current implementation we had multiple copies of the cloud used by PCL and VTK.

For the time being we updated the visualizer to run in immediate mode which should speed things up significantly, while taking a hit during the creation of the display list. This won’t work for applications which require a more interactive session, i.e. filtering.

Going Live with the PCL-URCS Developer Blog
Saturday, February 11, 2012

Greetings to the PCL community. My name is Stephen Fox, and I am the new developer who has just come aboard for the Urban Robotics Code Sprint. Stay tuned for updates on the project as we move forward with the integration of Urban Robotics’s out-of-core octree implementation!

Preliminary Result for Change Detection on Intensity Values
Saturday, February 11, 2012

I just came back to NY from a conference in CA. Currently, I am coding in order to test some ideas. This time, I am looking at the intensity difference between two corresponding points in two inputs. To find the corresponding pairs, I adopted a nearest-neighbour search similar to the one used in pcl::SegmentDifferences. This time, I tried to ignore the location difference, so if the distance between two matched points was too large, the pair was excluded from any further processing.
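A minimal Python sketch of this matching scheme is shown below. It is an illustration only: points are represented as (x, y, z, intensity) tuples, the nearest-neighbour search is brute force, and the function name and the max_dist parameter are assumptions, not part of the actual code.

```python
import math

def intensity_changes(reference, target, max_dist):
    """Return (xyz, |intensity difference|) for every spatially matched target point."""
    changes = []
    for t in target:
        # Brute-force nearest neighbour of t in the reference cloud.
        nearest = min(reference, key=lambda r: math.dist(r[:3], t[:3]))
        if math.dist(nearest[:3], t[:3]) > max_dist:
            continue  # too far apart: treated as a location change and skipped
        changes.append((t[:3], abs(t[3] - nearest[3])))
    return changes

reference = [(0.0, 0.0, 0.0, 10.0), (1.0, 0.0, 0.0, 10.0)]
target = [(0.0, 0.0, 0.01, 10.0),   # same place, same intensity
          (1.0, 0.0, 0.01, 90.0),   # same place, intensity changed
          (5.0, 5.0, 5.0, 50.0)]    # no spatial match in the reference
result = intensity_changes(reference, target, max_dist=0.05)
```

Only the first two target points produce an entry; the second one stands out with a large intensity difference, which is how the papers on the floor would show up.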

I actually care more about the two papers on the floor in the target. The result is the intensity difference between the two inputs. We can see that the papers were successfully detected because they have higher values. We can also see that some other major differences were detected. Please ignore the noise in the background this time.






Moving Least Squares Upsampling Methods
Wednesday, February 08, 2012

With some very good advice from Zoltan and a lot of hacking, we now have 3 upsampling methods for the MLS algorithm.


No additional points are created here. The input pointcloud is projected to its own MLS surface. This is exactly what we previously tested and presented in a recent blog post.



For each point, sample its local plane by creating points inside a circle with fixed radius and fixed step size. Then, using the polynomial that was fitted, compute the normal at that position and add the displacement along the normal. To reject noisy points, we increased the threshold for the number of points we need in order to estimate the local polynomial fit. This guarantees that points with a ‘weak neighborhood’ (i.e., noise) do not appear in the output.
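The sampling pattern described above can be sketched in Python as follows. This is a simplified illustration: the local frame (u_axis, v_axis, normal) and the height function standing in for the fitted polynomial are assumed to be given, and the displacement is applied along the plane normal rather than along a recomputed per-sample normal.

```python
def sample_local_plane(radius, step):
    """Return (u, v) offsets on a regular grid that fall inside a circle."""
    n = int(radius / step)
    offsets = []
    for a in range(-n, n + 1):
        for b in range(-n, n + 1):
            u, v = a * step, b * step
            if u * u + v * v <= radius * radius:
                offsets.append((u, v))
    return offsets

def upsample_point(origin, u_axis, v_axis, normal, height, radius, step):
    """Place samples on the local plane and displace them along the normal.

    height(u, v) is a hypothetical stand-in for the fitted polynomial.
    """
    samples = []
    for u, v in sample_local_plane(radius, step):
        h = height(u, v)
        samples.append(tuple(o + u * a + v * b + h * n
                             for o, a, b, n in zip(origin, u_axis, v_axis, normal)))
    return samples

# A paraboloid-shaped local surface around the origin, with the plane z = 0.
new_points = upsample_point((0.0, 0.0, 0.0),
                            (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
                            lambda u, v: u * u + v * v,
                            radius=1.0, step=0.5)
```

Every input point produces the same number of samples, regardless of how dense its neighbourhood already is.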

And a few results:

_images/mls_slp_table_bottles.png _images/mls_slp_table_tupperware.png _images/mls_slp_door_handle.png

The first picture is the reconstruction of coke bottles on a table. Please notice the correction for the quantization effects. The table surface is now planar and the objects look ‘grippable’. The second picture is a similar scenario, now using tupperware. The last picture shows how well the door handle is reconstructed. We conclude that visually, the results are very good.

An immediate problem is that this method adds the same amount of new samples to all points, not taking into account the local point density. An improvement we can make on this approach is to filter it with a voxel grid in order to have a uniform point density. Of course, this operation is superfluous, and is useful just for memory saving (see picture below for comparison between upsampled and upsampled + voxel grid).



Take as input a desired point density within a neighborhood with a fixed radius. For each point, based on the density of its vicinity, add more points on the local plane using a random number generator with uniform distribution. We then apply the same procedure as for method 2 (local plane sampling) to project the point to the MLS surface.
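As a rough illustration of the random sampling step, the Python sketch below draws points uniformly over a disc on the local plane. Two things are assumptions made for this example and not taken from the implementation: the rule for how many points to add (the shortfall relative to the desired count) and the helper names.

```python
import math
import random

def points_to_add(neighbors_in_radius, desired_num_points_in_radius):
    """Hypothetical rule: add as many points as are missing from the desired count."""
    return max(0, desired_num_points_in_radius - neighbors_in_radius)

def random_points_in_disc(count, radius, rng):
    """Draw count (u, v) offsets uniformly distributed over a disc."""
    samples = []
    for _ in range(count):
        # Taking the square root keeps the density uniform over the disc area;
        # without it the samples would cluster around the center.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        samples.append((r * math.cos(theta), r * math.sin(theta)))
    return samples

rng = random.Random(0)  # fixed seed so the example is repeatable
missing = points_to_add(neighbors_in_radius=3, desired_num_points_in_radius=10)
offsets = random_points_in_disc(missing, radius=0.05, rng=rng)
```

Each (u, v) offset would then be mapped into 3D and projected onto the MLS surface in the same way as the grid samples of the local plane method.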


The results are satisfying. Compared to method 2 (local plane sampling), we no longer need to apply the expensive voxel grid filter. An issue might be that, because we generate the points using a random number generator, the output point cloud looks a bit messy (compared to method 2, where the points are generated on a grid determined by the step size), but the surface is still well preserved. Also, the time performance is poor because of the RNG.


This method makes sense theoretically, but in practice we are having serious issues optimizing it to fit into main memory. The idea behind it is to take each point pair within a fixed radius neighborhood and to sample the line connecting these two points. Ideally, this would fill up any small holes inside the cloud. The downside is that it also creates a lot of additional points in already dense areas. Handling this elegantly is something we need to think about.

Finding the best method to filter scattered ghost points
Tuesday, February 07, 2012

For the purpose of filtering ghost points, Florentinus and I decided to take the dataset Statues_1 as reference. Inspecting that scene, we marked some points as possible ghost noise, highlighted in the following images:


The inspected methods are:

  • PassThrough< PointT >
  • RadiusOutlierRemoval< PointT >
  • ConditionalRemoval< PointT >
  • StatisticalOutlierRemoval< PointT >
  • VoxelGrid< PointT >
  • ApproximateVoxelGrid< PointT >

It turned out that only two methods can be successfully used for our purpose: the RadiusOutlierRemoval and the StatisticalOutlierRemoval.

RadiusOutlierRemoval: the user specifies a number of neighbors which every point must have within a specified radius to remain in the PointCloud. Because the filter is based on a fixed radius, many points on the object contours are deleted. With parameters that give a good filtering result, the “good” objects are also affected.


StatisticalOutlierRemoval: for each point, it computes the mean distance from it to a specified number of neighbors. By assuming that the resulting distribution is Gaussian with a mean and a standard deviation, all points whose mean distances are outside an interval defined by the global distances mean and standard deviation are trimmed from the dataset. This is so far the best method for ghost deletion; however, it depends strongly on the parameters, and the filter often removes portions of small “good” objects.

Hello Everybody
Sunday, February 05, 2012

This is my first blog post for TRCS and I am pretty excited about this project. I was getting the hang of the registration pipeline in PCL and understanding the various modules over the last couple of days. I read the paper “Efficient Variants of ICP” which I recommend reading for people interested in understanding the variants of ICP. This should give a good idea on customizing the different modules of the PCL registration pipeline to suit your dataset. Also it helps in understanding the registration pipeline in PCL. We had a discussion on the modules where new algorithms can be added and we have shortlisted some for now. I will be working on pairwise registration to begin with.

Testing PCL filters on Trimble data sets
Friday, February 03, 2012

Mattia and I have extracted a noisy segment from Statues_1.pcd that we will use as a benchmark to test the different filters in PCL. We have also constructed a timing benchmark that allows us to measure the algorithm’s speed more or less independent of platform. The desirable noise reduction and undesirable deformation are currently measured by our own (subjective) grading. In order to speed up the work, the NaNs were removed from the extracted segment, which in the process also lost its organized structure.

Mesh Construction Methods
Thursday, February 02, 2012

In this blog post, we shall inspect the mesh construction methods available in PCL.

Marching Cubes

The algorithm was first presented 25 years ago in:

  • William E. Lorensen, Harvey E. Cline: Marching Cubes: A high resolution 3D surface construction algorithm. In: Computer Graphics, Vol. 21, Nr. 4, July 1987

In PCL, it is implemented in the MarchingCubes class with the variants MarchingCubesGreedy and MarchingCubesGreedyDot. The ‘greedy’ comes from the way the voxelization is done. Starting from a point cloud, we create a voxel grid in which we mark voxels as occupied if a point is close enough to the center. Obviously, this allows us to create meshes with a variable number of vertices (i.e., subsample or upsample the input cloud). We are interested in the performance of this algorithm with the noisy Kinect data. Time-wise, the algorithm ran in about 2-3 seconds for a 640x480 cloud.

The following figure shows the results for various leaf sizes (from left to right, bottom to top: leaf size of 0.5 cm, 1 cm, 3 cm, and 6 cm, respectively):


And a close-up on the highest resolution cloud:


We conclude that the results are not satisfactory, as the upsampling is ‘artificial’ and does not inherit the properties of the underlying surface. Furthermore, there is no noise-removal mechanism and the blocking artifacts are disturbing.

Naive Algorithm for Organized Point Cloud Triangulation

This algorithm is implemented in the OrganizedFastMesh class in PCL. The idea behind is very simple: it takes each point in the inherent 2D grid of the Kinect clouds and triangulates it with its immediate neighbors in the grid. One can quickly understand that NaN points (points that were not captured by the sensor) will result in holes in the mesh. This is a mesh construction method and will output a mesh with exactly the same vertices as the input cloud. It does not take care of noise or NaN values in any way.
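The idea can be sketched in Python as follows. This is not the OrganizedFastMesh code: the diagonal used to split each grid cell into two triangles and the function names are assumptions made for illustration.

```python
import math

def is_valid(point):
    """A point is valid if none of its coordinates is NaN."""
    return not any(math.isnan(c) for c in point)

def triangulate_organized(cloud, width, height):
    """Triangulate a row-major organized cloud; returns triples of vertex indices."""
    triangles = []
    for row in range(height - 1):
        for col in range(width - 1):
            a = row * width + col   # top-left corner of the grid cell
            b = a + 1               # top-right
            c = a + width           # bottom-left
            d = c + 1               # bottom-right
            # Each cell yields up to two triangles; a NaN corner leaves a hole.
            if all(is_valid(cloud[i]) for i in (a, b, c)):
                triangles.append((a, b, c))
            if all(is_valid(cloud[i]) for i in (b, d, c)):
                triangles.append((b, d, c))
    return triangles

nan = float("nan")
# A 3 x 3 organized cloud whose center point was not captured by the sensor.
grid = [(float(x), float(y), 1.0) for y in range(3) for x in range(3)]
grid[4] = (nan, nan, nan)
mesh = triangulate_organized(grid, width=3, height=3)
```

Only the two triangles that do not touch the missing center point survive, which illustrates how NaN points turn into holes in the mesh. The vertices are referenced by index, so the mesh uses exactly the input points.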

A screenshot of the output can be seen in the following figure. Visually, the result is decent, considering that the processing time is extremely small - just a single pass through all the points of the cloud.

First Testing Result based on Current Octree-based Change Detector
Thursday, February 02, 2012

I set up an initial testing project for myself a while ago, and I have finally got great testing datasets. Trimble provided us with more than 2 GB of data, and Gordon collected several point clouds of his own garage. The Trimble data contains a lot of NaN values, which could be removed with certain PCL functionality, but I am kind of ‘lazy’ and have other options :D, so I decided to start with two of Gordon’s data files, described below.

  • reference.pcd: baseline scan
  • change.pcd: items on the floor moved a few cm, door moved, item moved from floor to table.

I have been studying how the octree-based change detector works. Basically, a double-buffer structure and an XOR of bit patterns are used to decide which points are in the target but not in the reference. If you want to know which points are in the reference but not in the target, you just need to switch their roles.
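The effect of this comparison can be sketched without an octree. The snippet below swaps the double-buffered octree and the XOR on bit patterns for a flat set of voxel keys, so it shows the same "in the target but not in the reference" logic, not the PCL data structure. The function names and the resolution value are assumptions.

```python
import math

def voxel_key(point, resolution):
    """Integer voxel coordinates of a point at the given resolution."""
    return tuple(math.floor(c / resolution) for c in point)

def detect_changes(reference, target, resolution):
    """Return the target points that fall into voxels the reference leaves empty."""
    ref_voxels = {voxel_key(p, resolution) for p in reference}
    return [p for p in target if voxel_key(p, resolution) not in ref_voxels]

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.01, 0.0, 0.0), (1.0, 0.5, 0.0)]

appeared = detect_changes(reference, target, resolution=0.25)
# Switching the roles gives the points that disappeared instead.
disappeared = detect_changes(target, reference, resolution=0.25)
```

The resolution plays the role of the octree leaf size: movements smaller than one voxel are not reported.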

Here are some first simple testing results. The most time-consuming part is not the change detection stage, but loading the data.






Start the real work
Wednesday, February 01, 2012

Everything is ready to start testing the PCL filters on Trimble data. I plan to talk with Florentinus, Jorge and Radu to define the “noise” and the expected accuracy of the filters.

First results
Tuesday, January 31, 2012

A few days ago I was thinking about how best to organize the code, its structure and the dependencies between classes. Because there will be several segmentation algorithms, I have decided that all of them should inherit from a base class named Segmentation (or something like that). Based on these thoughts I have written the code and even managed to test it on some of the datasets I had.

Synthetic clouds

_images/Cube.png _images/26.png

Noisy synthetic clouds

_images/6.png _images/44.png

Real clouds

_images/Cars.png _images/53.png

Point cloud that I found on the local forum

_images/Office_1.png _images/Office_2.png

I hope that at the end of this week I will be able to commit the code. But right now I’m gonna run more tests and will give the code a proper form.

Testing PCL filters on Trimble data sets
Tuesday, January 31, 2012

I am currently downloading data sets that Jorge provided for further testing. For the next couple of days I will test the currently existing PCL filters on them and analyze the type of noise in the sets. I will also attempt to set up a more detailed roadmap for the next phase of the sprint.

Urban Robotics Octree Unit Tests
Tuesday, January 31, 2012

I’ve successfully compiled and run Urban Robotics’ octree code and unit tests. In doing so I created a new library named pcl_outofcore (a name which is likely to change), which gives me a test bed for compiling their code.

There were minor code changes including switching the unit tests from Boost to GoogleTest:

[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from PCL
[ RUN      ] PCL.Octree_Build
[       OK ] PCL.Octree_Build (1584 ms)
[ RUN      ] PCL.Bounding_Box
[       OK ] PCL.Bounding_Box (5 ms)
[ RUN      ] PCL.Point_Query
[       OK ] PCL.Point_Query (356 ms)
[ RUN      ] PCL.Ram_Tree
[       OK ] PCL.Ram_Tree (482 ms)
[----------] 4 tests from PCL (2427 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (2427 ms total)
[  PASSED  ] 4 tests.

This created a tree of files on disk which represents the octree. The depth of the directory structure is the depth of the tree. Each directory represents a branch or tree node:

├── 0
│   ├── ...
├── 1
│   ├── ...
├── 2
│   ├── ...
├── 3
│   ├── ...
├── 4
│   ├── ...
├── 5
│   ├── ...
├── 6
│   ├── ...
├── 7
│   ├── ...
├── ade37c05-a2bb-4da4-8768-0aaa4f67a0e7_node.oct_dat
├── tree_test2.oct_idx

Within each directory (node) we’ll find a JSON formatted metadata index file (oct_idx), binary point data (oct_dat) and multiple directories which are the node’s children:

  "version":      2,
  "bbmin":        [0, 0, 0],
  "bbmax":        [1, 1, 1],
  "bin":  "ade37c05-a2bb-4da4-8768-0aaa4f67a0e7_node.oct_dat"

I’ve also updated my roadmap which now contains a bit more detail on where I’ll be going with a refactor of the current codebase.

Change Detection based on Hausdorff Distance
Friday, January 27, 2012

I have started looking at how to do change detection based on the Hausdorff distance. Basically, what I wish to do is look for the nearest neighbour in the reference point cloud of each point in the target point cloud and then calculate the Euclidean distance. A threshold specified by the user would then segment out the differences. The whole idea is pretty similar to what we have now in pcl::getPointCloudDifference. I want to add this functionality to the pcl::octree module.
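As a rough sketch of the idea, the snippet below does a brute-force nearest-neighbour search in plain Python. A real implementation would use a kd-tree or octree for the search, and the function names and threshold value here are assumptions for illustration only.

```python
import math

def nearest_distance(point, cloud):
    """Euclidean distance from point to its nearest neighbour in cloud."""
    return min(math.dist(point, q) for q in cloud)

def changed_points(reference, target, threshold):
    """Target points whose nearest reference neighbour is farther than threshold."""
    return [p for p in target if nearest_distance(p, reference) > threshold]

def directed_hausdorff(reference, target):
    """Largest nearest-neighbour distance from the target to the reference."""
    return max(nearest_distance(p, reference) for p in target)

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.01, 0.0), (1.0, 0.0, 0.3)]

changes = changed_points(reference, target, threshold=0.05)
worst = directed_hausdorff(reference, target)
```

The directed Hausdorff distance is simply the largest of the per-point distances, so the thresholded list and the single distance value come out of the same nearest-neighbour pass.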

Holiday + NURBS integration finished
Friday, January 27, 2012

I’ll be on my snowboard for the next week, so don’t be surprised if I don’t respond until Monday 6th of February.

I now have NURBS fully functional and integrated into PCL. Since I’m on holiday next week, I will commit it to the SVN repo when I come back, so that I’m available in case of problems.

TOCS Dataset collection now complete!
Friday, January 27, 2012

We have managed to collect all the datasets required by Toyota. For a complete description, please visit the following link (also accessible from my main page).

Programming-wise, we have spent time fixing bugs and beautifying the pcl_surface module. After I finish my exams next week, I shall start looking into implementing some new algorithms.

My first blog entry
Tuesday, January 24, 2012

Hi everybody. Today I finally figured out how to post blog entries. I also was able to install python and Sphinx on my PC. So from now on I don’t have to commit the rst files to see the result.

Tuesday, January 24, 2012

A little late I also figured out the blogging system. Had a couple of exams last week, the last one yesterday, so I’m now ready to start.

Regarding my plan for the next couple of days/weeks: I’m implementing NURBS basis functions so that afterwards I can modify my NURBS fitting functions to remove the dependency on openNURBS.

Update on progress on change detection project
Monday, January 23, 2012

I am still running and hacking the current example of the octree-based detector. So far, I have not been quite sure where I should go with it. Radu and Gordon gave me some suggestions on methods based on

  • change detection in geometry space (e.g., XYZ)
  • change detection in color space (e.g., RGB)
  • combination of both

I was looking for published literature on octree-based methods. Now, following the new suggestions, I would like to change my direction and find something new and interesting.

Setting up my workspace
Monday, January 23, 2012

Today I wrote a small program meant to test the filter functions. It makes it possible to compare a point cloud before and after the filtering operation; moreover, parameters can be changed without recompiling the program.

Familiarizing with filters module in PCL
Monday, January 23, 2012

I have not spent a lot of time on TRCS since my last update; I am currently finishing up some work here and will not be working full-time on TRCS during the upcoming week. I have discussed initial approaches for ANF with Mattia and Jorge and have slightly reworked my roadmap. Currently I am working on:

  • Get to know the filters module in PCL; the code structuring, the different filters in there, how to use them, when to use them, basic understanding of their workings.

Urban Robotics Octree Road Map Updates
Monday, January 23, 2012

With the availability of Urban Robotics’ octree-based point cloud format I’ll be doing the initial integration of their work into PCL. I’ve made a few updates to my roadmap related to Urban’s octree format as well as some additional visualization tasks related to OpenGL and VTK.

In addition to Urban’s code I’ve started doing quite a bit of research, trying to find the most relevant references related to the topic of out-of-core visualization. This topic spans a large problem set where octrees play a key role, but are not the be-all and end-all solution.

The list of references on my blog is sure to grow as the sprint moves forward, but let’s get started with some of my initial findings:

Quantifying Performance
Friday, January 20, 2012

So as to quantify the benefit of using our depth image simulation mode for localization, we need to do some benchmarking. There are a number of parameters we need to test:

  • number of particles
  • accuracy and/or completeness of the model
  • downsampling rate of the incoming imagery
  • Framerate required for successful tracking

And some metrics:

  • Achievable framerate [fps]
  • Mean error [in meters]
  • Percentage of frames within some distance of the true location

Where error is measured using LIDAR-based ground truth.
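For illustration, the two accuracy metrics could be computed as in the following plain-Python sketch. The function name, the 2D poses and the 0.5 m distance are assumptions picked for the example, not the benchmarking code itself.

```python
import math

def tracking_metrics(estimated, ground_truth, max_dist):
    """Return (mean error in meters, percentage of frames within max_dist).

    estimated and ground_truth are equally long lists of positions, one per frame.
    """
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    mean_error = sum(errors) / len(errors)
    pct_within = 100.0 * sum(err <= max_dist for err in errors) / len(errors)
    return mean_error, pct_within

estimated = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 2.0)]
ground_truth = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (3.0, 0.0)]

mean_error, pct_within = tracking_metrics(estimated, ground_truth, max_dist=0.5)
```

Reporting both numbers is useful because a single badly lost frame inflates the mean error while barely changing the percentage.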

In addition to this we also developed a second type of likelihood function. This method is essentially equivalent to ICP without the iteration: each pixel’s location is compared to the nearest point in Euclidean space.

In this figure we try to illustrate some examples of where the scores would be substantially different. So we have an RGB-D sensor sensing two depth samples (purple) something like the end of a corridor (green, from above):

Illustration of ranging issue

For the depth image simulation method (blue, ray-to-plane) the likelihood would be formed by comparing the difference in depth (blue star). For the ICP-type method (red, point-to-plane) the distance can be very different. For the lower case the distances are approximately the same. For the upper case, the association is very different. Also, the ICP-type method requires searching over the entire set of planes for the explicit correspondence, which makes it quite expensive.

We wanted to test what effect this choice has on accuracy. Below are some figures showing the results of extensive experimentation. The figures were produced with my 2-year-old 4-core 2.53GHz Pentium Core2 with a 32-core Nvidia Quadro 1700M. Each result is the average of 20 independent Monte Carlo runs. Total testing is equivalent to 16 hours of runtime.

First of all, error broadly converges to about the same accuracy as the number of particles increases:

Error for both methods with varying number of particles Failure rate for both methods with varying number of particles

The 50cm figure is largely because we aim to maintain a multi-modal estimate, so as to be more robust when tracking. To get the optimal performance for each frame you could use ICP with the most likely particle as the starting condition. We haven’t done that here.

This figure shows the timing performance as the number of particles increases:

Framerate for both methods with varying number of particles

Our implementation of the nearest plane lookup was pretty slow. However, our target framerate of 10Hz was achieved with 100 particles for the depth image likelihood function (ray-to-plane). As you saw above, for 100 particles the error has typically converged, so that has been sufficient to run in real time with the type of motion you see in the figure below.

For all of these figures we decimated the original RGB-D image by a factor of 32. Next we would like to look at what effect that has - especially now that my new laptop has 256 GPU cores.

Additionally we want to look at sub-dividing the full model, so that only the planes near the particle poses [within 20m] are passed to OpenGL for testing. This is likely to give us a substantial performance improvement. I believe that 1000 particles at 10Hz should be achievable.

Interestingly the approach developed by Ryohei Ueda during his internship at PCL/Willow Garage is very close to what we do here: the tracking paradigm is essentially inverted. Later in this project it would be interesting to apply the depth image simulation method to his tracking algorithm and see if it can be sped up. Given that the object models he was using are much smaller than a building model, it should.

My first blog entry
Wednesday, January 18, 2012

Getting familiar with Sphinx blog system.

First SVN connection
Wednesday, January 18, 2012

I started learning how to use Subversion and Sphinx. Tomorrow I will meet some PhD students of my university; I would like to collaborate with some of them and get access to some tools (cameras) to test the algorithms I will develop.

First blog entry
Wednesday, January 18, 2012

Today I managed to get Sphinx running and have been updating my personal and roadmap pages. For the remainder of this week I will be working on the following entries of my roadmap:

  • Get familiar with the blogging system, committing code, my mentor(s), my co-worker(s) and the other people of the TRCS.
  • Gather information on the latest/best/most interesting work in the field of ANF.

hello world!
Tuesday, January 17, 2012

yo just print like "hello world!" bro

VTK Smoothing Algorithms and Other Updates
Tuesday, January 17, 2012

For the VTK smoothing tests, we took the raw clouds, triangulated them using the OrganizedFastMesh triangulation with the TRIANGLE_ADAPTIVE_CUT option, and then fed this to the 3 smoothing algorithms from the VTK library.

The first one to be tested is MeshSmoothingLaplacianVTK with the default parameters recommended by VTK, but with an increase in the number of iterations from 20 to 100.

Bed_sheets Dataset

Here, the results are satisfactory: in both cases, the quantization artifacts are reduced, although they are still visible.

_images/mesh_smoothing_laplacian_bed_sheets_1.png _images/mesh_smoothing_laplacian_bed_sheets_2.png

Also, if we look at the corresponding mesh, the reconstruction after smoothing looks more natural, with a better surface curvature.


Bottles and Tupperware Datasets

In this case, the Laplacian smoothing does not work well anymore. The quantization and the high noise level are still present in both the bottles and tupperware datasets. The main reason for this is the fact that the objects of interest were quite far away from the sensor and the quantization artifacts are quite accentuated (i.e., there are large gaps between the points belonging to the same object).

_images/mesh_smoothing_laplacian_bottles.png _images/mesh_smoothing_laplacian_tupperware.png

The mesh subdivision schemes provided by the VTK library are not of great use for our scenarios, as they just split up the triangles in the mesh and inherit their artifacts. Furthermore, these schemes are highly dependent on the quality of the initial triangulation, which in our case comes from the simple OrganizedFastMesh and does not yield excellent results. They basically just resample points on the triangles present in the input mesh, without taking into consideration any more complex information about the vertex neighborhood.


Another thing we tried was to combine the simple subdivision with the previous laplacian smoothing, and the results are visually decent, as shown in the next figure. Again, we inherit the problems of the subdivision scheme (the holes caused by the incorrect triangulation).


In the meantime, I have worked on solving some issues with Zoltan Marton’s Greedy Projection Triangulation. Two trac issues regarding this were solved, but its current state does not allow us to reconstruct Kinect scans - once we solve this, I will do benchmarking on the gp3 algorithm too. Other time-consuming fixes were done for OrganizedFastMesh.

A direction we definitely need to look into is algorithms that also add points during reconstruction. The original MLS and GP3 papers do mention this possibility, but it has not been implemented in PCL yet. It is clear so far that we do not yet have the Holy Grail of smoothing.

Point Cloud Smoothing Benchmarks - MovingLeastSquares
Thursday, January 05, 2012

Now that we have collected part of our datasets of interest (there are still some objects missing from our collection; we will get them next week), we proceed to test our available smoothing algorithms. Please note that these tests use only real sensor data of scanned objects that are rather irregular, so we do not have any ground truth for our benchmarks. As such, we will limit ourselves to a visual inspection of the results. This inspection will look mostly into sensor artifacts that we might have in the clouds after the algorithms were applied (please see the problem description page for more details) or artifacts caused by the algorithm itself (issues such as over-smoothing).

Bed_sheets Dataset

One of the best algorithms we currently have in the PCL library is the MovingLeastSquares implementation. We ran this algorithm on the bed_sheets dataset and tweaked the parameters to see the situations it creates.

The first image, from left to right:

  • input cloud bed_sheets/style_1/frame_00050.pcd
  • MLS-smoothed with parameters:
    • search_radius: 0.05
    • sqr_gauss_param: 0.0025
    • processing time: ~19 seconds
  • MLS-smoothed with parameters:
    • search_radius: 0.03
    • sqr_gauss_param: 0.0009
    • processing time: ~46 seconds.

The results seem satisfactory, in general. MLS removes some of the quantization effects (note that the bed was at about 1.5-2m away from the camera), although the slices are still clearly visible. Due to the fact that the details in some wrinkles were lost using a 5 cm smoothing radius, we also tried a 3 cm radius, which seemed to reduce the over-smoothing effect.

The second image, left to right:

  • input cloud bed_sheets/style_2/frame_00050.pcd
  • MLS-smoothed with parameters:
    • search_radius: 0.05
    • sqr_gauss_param: 0.025
    • processing time: ~46 seconds
  • MLS-smoothed with parameters:
    • search_radius: 0.05
    • sqr_gauss_param: 0.0025
    • use_polynomial_fit: 1
    • polynomial_order: 2
    • processing time: ~73 seconds

Here, we show that the usage of polynomial fitting in the MLS algorithm is useful for preserving sharp edges. One can see that the image in the middle is over-smoothed with the 5 cm radius, but the ridges are preserved in the third image.

Tupperware Dataset

We applied MLS to the tupperware dataset and obtained the following results.

Both images, from left to right:

  • input cloud tupperware/multiple/frame_00050.pcd
  • MLS-smoothed with parameters:
    • search_radius: 0.03
    • sqr_gauss_param: 0.0009
    • use_polynomial_fit: 1
    • polynomial_order: 2
    • processing time: ~11 seconds
  • MLS-smoothed with parameters:
    • search_radius: 0.05
    • sqr_gauss_param: 0.0025
    • use_polynomial_fit: 1
    • polynomial_order: 2
    • processing time: ~22 seconds

On the one hand, MovingLeastSquares seems to group points together and form visible ‘long holes’. This is due to the heavy quantization errors introduced by the sensor - the table and the curtains in the back are at about 2.5-4m from the camera.


On the other hand, it clearly improves the shape of the objects. The second figure shows a top-down view of the table. The tupperware seems much smoother and more grippable, without loss of information.

Glasses Dataset

In the list of objects we are interested in, there are transparent glasses/mugs. Unfortunately, the PrimeSense technology proves incapable of recording ANY depth for the points corresponding to the glasses, as shown in the following image. There is nothing a surface reconstruction algorithm can do in order to recreate the points on the glasses, so we shall discard this dataset in our following benchmarks.


Bottles Dataset

As expected, the transparent parts of the plastic bottles have not been recorded by the depth sensor.

The image below, from left to right:

  • input cloud bottles/set_1/frame_00050.pcd
  • MLS-smoothed with parameters:
    • search_radius: 0.03
    • sqr_gauss_param: 0.0009
    • use_polynomial_fit: 1
    • polynomial_order: 2
    • processing time: ~19 seconds
  • MLS-smoothed with parameters:
    • search_radius: 0.05
    • sqr_gauss_param: 0.0025
    • use_polynomial_fit: 1
    • polynomial_order: 2
    • processing time: ~45 seconds

The result is very satisfactory. MLS does NOT add any points in the reconstruction, but one can notice the very good silhouette of the bottles, as compared to the very noisy input.

Point Cloud Smoothing Project DATASETS
Wednesday, January 04, 2012

As required by Toyota, we started recording a series of typical household scenes. This first post shows the first 23 recordings we did using an Asus Xtion Pro camera. One can easily download them with the following command:

svn co http://svn.pointclouds.org/data/Toyota

Those datasets are mainly meant to represent realistic situations that a personal robot might face in an undirected human environment. All of the scenes are recorded starting from a distance of about 3-4 meters from the main subject and getting close and rotating around it, in order to simulate the behavior of a robot and to capture most of the artifacts that the PrimeSense cameras present.

These are split into the following categories:

  • Bed Sheets - 3 styles of bed sheets in bedrooms:

    • bed_sheets/style_1/ - 152 frames

    • bed_sheets/style_2/ - 205 frames

    • bed_sheets/style_3/ - 240 frames

  • Bottles - 2 layouts on a table in the kitchen

    • bottles/set_1/ - 180 frames

    • bottles/set_2/ - 260 frames

  • Door Handles - 5 styles of indoor/outdoor door handles

    • door_handles/style_1/ - 200 frames

    • door_handles/style_/ - 330 frames

    • door_handles/style_3/ - 232 frames

    • door_handles/style_4/ - 199 frames

    • door_handles/style_5/ - 256 frames

  • Glasses - one recording for opaque mugs and one for transparent glasses in the kitchen

    • glasses/opaque/ - 246 frames

    • glasses/transparent/ - 364 frames

  • Keyboards - 4 different laptop keyboards on an office desk

    • keyboards/laptop_1 - 249 frames

    • keyboards/laptop_2 - 220 frames

    • keyboards/laptop_3 - 157 frames

    • keyboards/laptop_4 - 221 frames

  • Shoes - 2 recordings

    • shoes/single/ - 275 frames

    • shoes/multiple/ - 200 frames

  • Tupperware - 3 recordings of tupperware on the kitchen table

    • tupperware/single/ - 358 frames

    • tupperware/multiple/ - 337 frames

    • tupperware/stacked/ - 286 frames

  • Other - 2 other recordings I found interesting for the point cloud smoothing problem

    • other/small_windows/ - 262 frames

    • other/textured_wall/ - 219 frames

PCL Surface Architecture Updates
Friday, December 30, 2011

With the help of Michael and Radu, we have made a few changes to the pcl::surface module. We have now structured it by adding three base classes which differentiate between algorithms with distinct purposes:

  • MeshConstruction - reconstruction algorithms that always preserve the original input point cloud data and simply construct the mesh on top (i.e. vertex connectivity)
    • input: point cloud
    • output: PolygonMesh using the input point cloud as the vertex set
    • examples: ConcaveHull, ConvexHull, OrganizedFastMesh, GreedyProjectionTriangulation
  • SurfaceReconstruction - reconstruction methods that generate a new surface or create new vertices in locations different than the input point cloud
    • input: point cloud
    • output: PolygonMesh with a different underlying vertex set
    • examples: GridProjection, MarchingCubes, MovingLeastSquares, SurfelSmoothing
  • MeshProcessing - methods that modify an already existent mesh structure and output a new mesh
    • input: PolygonMesh
    • output: PolygonMesh with possibly different vertices and different connectivity
    • examples: EarClipping, MeshSmoothingLaplacianVTK, MeshSmoothingWindowedSincVTK, MeshSubdivisionVTK

Please notice the new classes ending with VTK. We already had these implemented in PCL before, but in quite a simple state. They are now fully usable and documented.

The recording of the required datasets is in progress, and they will be tested with most of the algorithms mentioned above.

Also, a new Poisson implementation is underway.

Back in Action and new Project requirements
Wednesday, December 21, 2011

I have not been too active lately due to intense school activities (exams and end of semester projects/presentations). I am now ready to continue with my TOCS assignments.

A couple of weeks ago, some discussions took place between Toyota and PCL representatives and my project got a bit clearer. I am going to spend the following days creating a database of recordings of different household items and household-specific scenes. Next, I shall apply all the current algorithms we have in PCL for surface smoothing and reconstruction and report back with the results of a qualitative analysis of the output.

Bugs removed and global alignment verified
Friday, December 09, 2011

After some prodding from Christian at Willow, we fixed a few bugs with our coordinate frames. Thanks! Applying the camera transform to the point clouds now results in perfect registration of two views. Here are three views of a teapot without noise or quantization:

Simulated Data

And with noise and quantization:

Simulated Data

Basically the bugs came about when we inverted the Z axis when reading the depth buffer, making our system left handed. (Stop hating on us leftys!) These are the coordinate frames we’re using now.

  • OpenGL: +X right, +Y up, +Z backwards out of the screen
  • Computer Vision and PCL: +X right, +Y down, +Z into the view
  • Robotics: +X right, +Y forward, +Z up
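Read literally, the three conventions above are related by simple axis swaps and sign flips. The following plain-Python sketch spells them out; the function names are made up for illustration, and the robotics frame is taken exactly as listed above (+X right, +Y forward, +Z up).

```python
def opengl_to_cv(p):
    """OpenGL (+X right, +Y up, +Z backward) -> CV/PCL (+X right, +Y down, +Z forward)."""
    x, y, z = p
    return (x, -y, -z)

def cv_to_robotics(p):
    """CV/PCL (+X right, +Y down, +Z forward) -> robotics as listed (+X right, +Y forward, +Z up)."""
    x, y, z = p
    return (x, z, -y)

# A point 1 m to the right, 2 m up and 3 m in front of the camera.
p_gl = (1, 2, -3)
p_cv = opengl_to_cv(p_gl)
p_robot = cv_to_robotics(p_cv)
```

Both conversions flip or swap two axes at once, so each is a pure rotation and all three frames stay right handed. Negating only the Z axis, as in the bug described above, is a reflection and turns the system left handed.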

We’re just about ready to add shader-based cost functions.

Started implementing 3DGSS Features
Friday, December 02, 2011

As mentioned in the roadmap, one of the steps would be to implement a method that would help find edges in the depth images. The one I started looking into was proposed by John Novatnack and Ko Nishino in “Scale-Dependent 3D Geometric Features”. The final goal of the paper is to have scale-dependent 3D feature descriptors. But on the way, they compute edges and corners in 3D.

The novelty of the approach is that they compute the scale-space of a dense and regular 2D representation of the surface using the normals of the scan. Technically, they create the Gaussian pyramid of the “normal images” and also the first and second derivative (i.e. Laplacian) of the levels of this pyramid.

Just like in 2D computer vision, the edges are found by looking for the zero-crossings of the Laplacian of the normal maps at different scales (+ some thresholding on the corresponding first derivative).
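The zero-crossing test itself is easy to show on a toy example. The sketch below works on a 1D scalar signal at a single scale, so it only illustrates the "sign change of the second derivative plus a threshold on the first derivative" rule; it is not the normal-map scale-space from the paper, and the function name and threshold are assumptions.

```python
def laplacian_zero_crossings(signal, min_gradient):
    """Return indices i where an edge lies between signal[i] and signal[i + 1].

    An edge is reported where the discrete second derivative changes sign
    and the local first derivative is at least min_gradient in magnitude.
    """
    n = len(signal)
    # second[k] is the second derivative at signal index k + 1
    second = [signal[i - 1] - 2 * signal[i] + signal[i + 1] for i in range(1, n - 1)]
    edges = []
    for k in range(len(second) - 1):
        i = k + 1  # signal index that second[k] belongs to
        if second[k] * second[k + 1] < 0:
            if abs(signal[i + 1] - signal[i]) >= min_gradient:
                edges.append(i)
    return edges

profile = [0, 0, 0, 1, 3, 4, 4, 4]  # a smoothed step
strong_edges = laplacian_zero_crossings(profile, min_gradient=1.0)
no_edges = laplacian_zero_crossings(profile, min_gradient=3.0)
```

The threshold on the first derivative is what suppresses zero-crossings caused by small ripples rather than real edges.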

An example of the pyramid of normal maps:


Edges found in an example scene:


More results will be posted once this is finished.

Added PLY support (including colour) to simulator
Thursday, December 01, 2011

We are getting pretty close to having a complete RGB-D simulator integrated into PCL. Below you can see some figures showing:

  • Top Right: a view of the model (complete with garish default colors)
  • Top Left: the depth image from OpenGL’s depth buffer
  • Bottom: the same information in the PointCloud viewer (including color)

Simulated Data Simulated Data 2

Note the disparity-based quantization and the Gaussian noise. A fully realistic simulator will be much more complicated though! The images correspond to a 3D model of our Stata Center building which we have, at about this location:

Third floor view at MIT's Stata Center

In addition here is the RangeImage using Bastian Steder’s work - which we’ve integrated:

Range Image created using pcl->RangeImage library

For some reason it appears very small, hence the low resolution (to be fixed). We haven’t done much else, but feature extraction, segmentation and registration should be possible, and it could be useful for unit testing and stochastic

We have a GLUT-based application that takes a single .ply file as input, and the user can use a mouse to ‘drive around’ and take shots. For the really interested, you can try out our range-test program in pcl/simulation. Perhaps it’s useful to people who have had problems using OpenNI. We are currently re-writing it using VTK.

Here’s the sample ply file to use from MIT’s Stata Center:


We’ve been talking with Alex about combining efforts towards a library for point cloud simulation. Hordur Johannsson has been looking at using OpenGL shaders to compare these types of simulated views with real data, to give a measure of the match between two images.

NOTE: the models were generated by students working with Professor Seth Teller. More details and models here:


Virtual Scanner Improvements
Saturday, November 26, 2011

In the past days, I have been doing some research in the literature regarding point cloud upsampling and smoothing, trying to find some approaches that might be suitable with the PrimeSense cameras. Will produce a blogpost regarding this as soon as I have done some conclusive experiments.

Until then, the following figure shows the current status of the Virtual Scanner application. It now has a GUI written in VTK, where the user can load VTK-compatible objects, freely manipulate a camera and produce 3D scans of the scene. The scanned cloud is shown live in another window.


I have tried to make the artifacts of the output cloud similar to the ones produced by the Kinect. The solution for the quantization artifacts was suggested by Suat and it consists of the following:

  • the depth of a pixel is defined by Z = f * b / d, where f is the focal length in pixels (measured at 575 pixels for the Kinect), b is the baseline (7.5 cm) and d is the disparity measured in pixels.
  • the Kinect quantizes the disparity in steps of 1/8 of a pixel.
  • Gaussian noise is added before quantizing.
  • an example of such an artifact:
    • consider a pixel with a disparity of d_2 = 5 px ⇒ Z_2 = 8.625 m
    • the next disparity value is d_1 = 5.125 px ⇒ Z_1 = 8.415 m
    • and the previous one was d_3 = 4.875 px ⇒ Z_3 = 8.8461 m
    • the difference between Z_1 and Z_2 is 21 cm; it increases to 22.1 cm between Z_2 and Z_3 and will continue to increase at larger distances
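The numbers in this example can be reproduced with a few lines of plain Python. The constants come from the list above; the function names, the noise_px parameter and the choice of rounding to the nearest step (instead of truncating) are assumptions made for this sketch.

```python
import math

FOCAL_PX = 575.0            # focal length in pixels (from the text)
BASELINE_M = 0.075          # baseline of 7.5 cm (from the text)
DISPARITY_STEP = 1.0 / 8.0  # the Kinect quantizes disparity to 1/8 px

def depth_from_disparity(d):
    """Z = f * b / d, with d in pixels and Z in meters."""
    return FOCAL_PX * BASELINE_M / d

def quantize_depth(z, noise_px=0.0):
    """Simulate the sensor: depth -> noisy disparity -> quantized disparity -> depth.

    noise_px is a disparity offset in pixels, e.g. drawn with random.gauss.
    Rounding to the nearest step is an assumption of this sketch.
    """
    d = FOCAL_PX * BASELINE_M / z + noise_px
    d_q = round(d / DISPARITY_STEP) * DISPARITY_STEP
    if d_q <= 0:
        return math.nan  # too far away to yield a valid disparity
    return depth_from_disparity(d_q)

z1 = depth_from_disparity(5.125)
z2 = depth_from_disparity(5.0)
z3 = depth_from_disparity(4.875)
step_near = z2 - z1   # about 0.21 m
step_far = z3 - z2    # about 0.221 m
snapped = quantize_depth(8.5)  # 8.5 m snaps to the d = 5.125 px level
```

Because Z is inversely proportional to d, equal steps in disparity map to depth steps that grow roughly with the square of the distance. This is why distant surfaces appear as widely spaced slices.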

There are still some interface issues to be solved, and this will be committed to trunk soon.

Results from analyzing MLS smoothing
Saturday, November 19, 2011

I started off by doing some experiments with one of the smoothing algorithms we already have implemented in the PCL library: Moving Least Squares smoothing.

First, the influence of the search radius on the smoothing output was analyzed (without fitting a polynomial). The following figure shows the results, from bottom to top, left to right: original Kinect cloud, MLS with search radii of 0.01, 0.02 and 0.05; color coding by curvature.


The best result, which smooths the wall to a plane and keeps the shapes of the objects, is obtained with a search_radius of 0.02 (= 2 cm). A radius of 0.01 does not perfectly smooth the wall, and 0.05 eliminates the depth of the small figure on the desk.

The time performance was looked into and the collected data is presented in the following table. Two different search approaches were used: the kdtree implementation from FLANN and the search::OrganizedNeighbor class using a window-based approach (approximate method).

MLS search_radius   KdTreeFLANN   OrganizedNeighbor   Time improvement
0.01                5.6 s         3 s                 46 %
0.02                19.1 s        10.8 s              43 %
0.05                107 s         61.8 s              42 %
0.07                201.3 s       117.3 s             42 %

So, the immediate conclusion is that MLS is definitely not suited for real-time application. It would be a viable option as a post-processing step for the registration pipeline we mentioned in the roadmap.

Next, we varied the order of the polynomial to be fitted. The following figure shows the results: MLS with polynomial fitting of orders 0, 1, 2, and 3 with a constant search radius of 0.02 (ordered left to right, bottom to top).


The differences in the results are rather subtle: some fine details tend to be preserved with higher-order polynomial fitting. But these fine details are mostly due to noise, and the extra time one has to pay for the additional polynomial fitting is not really worth the small improvements, as the following table shows:

MLS polynomial order   Time     Increase relative to order 0
0                      18.4 s   0 %
1                      19.4 s   5 %
2                      22.5 s   22 %
3                      25.8 s   40 %

Adding a simulated RGB-D sensor to PCL
Thursday, November 17, 2011

The first major step is to port over our work for generating simulated range images from a global world view. It uses OpenGL to render simulated image views in much the same way as it would for a gaming application. The implementation has a few extra bells and whistles, which mean that arrays of (smaller) views can be read efficiently. This allowed us to achieve hundreds of simulated views at tens of frames per second.

When this is fully working, there is a test program which will allow the user to “drive” around a simulated world and generate a log of RGB-D data.

Left: simulated depth images; right: real measured depth image

The maps we were using previously used a pretty funky file format: basically each plane was read from an individual PCD file, so literally thousands of files were read in to build the map. The next step in our work is to enable support for obj, vtk and ply file types. Thankfully PCL already has good support for reading these files.

Global Point Cloud Localization: Project Intro
Wednesday, November 16, 2011

Our contribution to the Toyota Code Sprint will be focused on Global Point Cloud Localization.

We’ve been working on this problem previously using RGB-D/Kinect sensors. You can see an overview of our work over on the main part of pointcloud