PCL Developers blog

Google Summer Of Code

PCL has joined the Google Summer of Code program for the third time. 12 students are ready to take the challenge to push PCL even further this summer.


For a complete list of all the present and past PCL code sprints please visit http://www.pointclouds.org/blog.

Latest 15 blog updates

Improved inputs and results ( Third phase )
Friday, August 22, 2014
  • Introduction

    In this phase, the focus was to integrate the pcl::KinfuTracker class and the OpenCV libraries in order to provide better input clouds for the Rigid/Non-Rigid Registration methods and thus to obtain better results.

  • Approach

    By using the pcl::KinfuTracker class, it was possible to obtain an almost full scan of the heads of the subjects like the ones presented below:

    ../_images/sub1_front.png ../_images/sub2_front.png

    Using the pcl::KinfuTracker had the disadvantage that unwanted objects were also scanned during the procedure, however by using the cv::CascadeClassifier, the program was able to pin-point the position of the head and move the statisitcal model to a favorable position so that the Rigid Registration method could fully align the model. ( The sphere represents the center of the face)

  • Results

    Obtaining better input point-clouds, allowed to better analyze the efficiency of these types of registrations. As stated in the previous post, the accuracy of the result depends on the regularizing weight of the Non-Rigid Registration, however there is one more parameter to take into account. The training set of face meshes was registered on a ten times smaller scale than the PCL point clouds are stored in. Intuitively, this means that when the program reads the database it should register the values as ten times smaller, however the subjects used for testing this program did not have heads of exactly the same size.

    Below, you have an example of what happens when the scale is not set right:


    This result was obtained because, as it is presented in the folowing picture, the chin of the model was matched with the neck of the target, even though the rest of the face seems to be in position:


    Once the scale was properly established the correct result was obtained:

  • Feature work

    • A method should be implemented so that it would not be necessary to set the right scale, no matter what kind of person is being scanned (child/adult)
    • The pcl::KinfuTracker class has not been officially released, thus further maintenance of the prgram is required
    • The Registration methods presented so far have to be compared to other methods like the ones already implemented in PCL
Final Example “Final example” (Fourth post)
Monday, August 18, 2014
  • Introduction

Once we have the segmentation, and the functions to make the measures of the segmented candidates to objects, we need to create a example where we analyze all segments and we decide which ones are objects or not. Also the objects that have more similarities in the measures, we can classify into a category.

  • FrameWork

This tutorial gives an example of to implementation of the algorithm present in:

“Andrej Karpathy and Stephen Miller, and Li Fei-Fei”, “Object Discovery in 3D Scenes via Shape Analysis”, “International Conference on Robotics and Automation (ICRA)”, “2013”


As the autors said in the abstract: “We present a method for discovering object models from 3D meshes of indoor enviromments. Our algorithm first descomposes the scene into a set of candidate mesh segments and then ranks each segment according to its “objectness” a quality that distinguishes objects from clutter”.

In this example we will make all the framework to obtain the segmented objects at first and then calculate some geometric measures to make a clasification of the “objectness of the segmented candidates. After that we save all the “good” (the segments that the algorithm think are a object) and the measures asociated to each object. This can be used by the Recurrence to improve the clasification of the objects.

We begin with the based graph segmentation that was explained before. We made a over-segmentation of the meshes. To obtain diferent segments for each scene. We need to make this segmentations in all scenes that we want tto use for Object Discovery.

After we have the segmentation, we need to select the object-like segmentes. For that we apply some test to discard the not correct segments.

-Min and Max number of points. If the segment dont have the engouh points or is really bigger, we discard this object.

-After we take a look about the shape, we made two test, one about the shape, if is really flat or thin we discard this segment. Also we look into the size in the 3 axes. If this size is under or over a threshold is also rejected the segment.

-The next test is about diferent geometrical measures. This meauses where explained in the second post of this blog. We compute all this measures for each segment and we store this value. This will be used in the next test.

-Due to, we make a over-segmentation with different thresholds, probably will have the same segment more than one time. For that we make another test for each scene. This test name non-max supresion consist in look into all the segments with a result of more than 80% for a intersection ove union. If they have more than this, we compare the value of the measures, the object with the hightst value of this measure is keep and the other is discarded.

This is the last test. Now we show the results for a couple of scenes.

Scene 1

Initial scene

Object-like segments presents in the scene

Initial scene

Object-like segments presents in the scene


We store for a great number of scenes, all the segments and measures for each segment. Finally in the next step we add another measure, the recurrence.

More snapshots of some final results from different scenes:

../_images/0.png ../_images/11.png ../_images/21.png ../_images/31.png

Last Step: Recurrence

We are going to compute the recurrence in the previous results. For that we need to have the results of the object discovery. This recurrence consist in analyze all the segments in all the scenes, because segments that are commonly found in other scenes are more likely to be an object rather than a segmentation artifact. For that in the paper explains that, that the best way to measure the distance between the top k most similar segments is to find the objects with a 25% of extent along principal directions in size and measure the euclidean distance between their normalized shape measures. This is the best option, taking care the computacion time. We can use in the case other approachs like icp, ransac or local descriptors, global descriptors. In this case, for all objects founded in the segmentation proccess from before, we are going to obtain this score and find the 10 most similar objects.

After aplying the Recurrence we show the next results:

1st Query for an object


This case is really good (the implementation works!!!), and make more easy to understand the paper. After all the work, we obtain a possible object in a scene ( a cup) also with the aid of all scenes, give us a measure quite confident to clasify this objects. How much big is the number of scenes, this score improves its quality. This means, more scenes more quality.

2st Query for an object


3st Query for an object


4st Query for an object


Finally I think that I accomplish the main goal of the GSoC program. That was implement the algorithm in the PCL format.

To use this program and to look into the tutorials to we need to wait to be acepted a possible pull request with the source code, the 2 class that I create into the PCL website and in the PCL library.

that’s all folks!!!

P.S. I will update the post with links if I get the pull the request of the code that I created during GSoC.

Different Feature representation of surface patches and analysis
Monday, August 18, 2014

This project aims at getting features for each surface patch and group them to gether based on a machine learning technique. I have been experimenting on the features that could represent the surface patches belonging to an object and how they relate with the inter and intra object surface patches. Below are some of them, with their description and plots.

For all the below analysis please look at the image below and the segments derived out of basic region growing which doesn’t make use of RGB color and purely based on surface curvature.


ColorHistogram: Binning colors is pretty normal thing to get the appearance cues. But they can be done in two ways. One that captures the color channels independently and the other that captures a dependent binning. In this section I am showing the independent binning of values on three different channels. Independent means at a pixel/point in scene we check RGB values different and increment the bin where these values belong to. Below is the plot that shows how the histograms look like for RGB and HSV color space.

../_images/color_rgb.png ../_images/color_hsv.png

ColorHistogram 3D: Binning colors dependently means having a matrix with RGB in 3 dimensions and increment the value of the (r, g, b) bin only if that combination is satified. This gives a 3D histogram. Below images show the 3D histogram concatenated to a 1D histogram for RGB, HSV and YUV color space. This binning style is referred from [1].

../_images/colorh_3d_rgb.png ../_images/colorh_3d_hsv.png ../_images/colorh_3d_yuv.png

Verticality: Verticality actually represents how the surface patch is orientated with respect to the camera viewpoint. Histogram is developed by binning the difference of angles between the normals and the direction in which the camera is pointing to (i.e the positive z axis for any point cloud from Kinect). Below is the histogram plot for this feature. However this is not so useful to this segmentation project but more aimed towards the object discovery and saliency related problems to distinguish the a surface patch from its peers. This is implemented based on [2] which was adopted from [3].


Dimentionality Compactness: This actually shows how compact a surface patch is. One could do a PCA of a surface patch and derive a local frame of reference. This when followed by creating a bounding box gives the 3 dimensions in which the patch ranges the max. This bounding box will have 3 dimensions and their ranges are computed. Once this ranges (xrange, yrange, zrange) are sorted into min_range, mid_range and max_range, two ratios are computed. 1) min_range / max_range and 2) mid_range / max_range. Below is plot of this histogram for the above mentioned segments. This is implemented based on [2] which was adopted from [3].


Perspective scores: This is the ratio of the area projected in the image to the maximum area spread by the region in 3D. The pixel_range below means the bounding box range in the particular direction of the image pixels. xrange, yrange and zrange are the dimentions of the 3D bounding box of the surface patch. Note that PCA shouldn’t done here as we are comparing the 3D surface patch with its appearance on the image. This is implemented based on [2] which was adopted from [3]. Below are the elements of the histogram.

  1. pixel_x_range (in meters) / xrange

  2. pixel_x_range (in meters) / yrange

  3. pixel_x_range (in meters) / zrange

  4. pixel_y_range (in meters) / xrange

  5. pixel_y_range (in meters) / yrange

  6. pixel_y_range (in meters) / zrange

  7. diameter_of_bounding_box_pixels (in meters)/ diameter_of_cuboid_bounding_box_in_3d

  8. area_of_bounding_box (in metersquare) / area_of_the_largest_two_dimentions_in_3d


Contour Compactness:

This is the ratio of the perimeter to the area of the region. This is the ratio of number of boundary points computed for a segment to the total number of points in the region. This is implemented based on [2] which was adopted from [3].



[1] A.Richtsfeld, T. Mörwald, J. Prankl, M. Zillich and M. Vincze: Learning of Perceptual Grouping for Object Segmentation on RGB-D Data; Journal of Visual Communication and Image Representation (JVCI), Special Issue on Visual Understanding and Applications with RGB-D Cameras, July 2013.

[2] Karthik Desingh, K Madhava Krishna, Deepu Rajan, C V Jawahar, “Depth really Matters: Improving Visual Salient Region Detection with Depth”, BMVC 2013.

[3] Alvaro Collet Romea, Siddhartha Srinivasa, and Martial Hebert, “Structure Discovery in Multi-modal Data : a Region-based Approach”, ICRA 2011.

These features will go into the features module of the pcl_trunk pretty soon. Currently working on the relational features which tells how two surfaces are related to each other. Next blog post should be on that.

Almost close to the end stage of GSoC, but this project has long way to go! Hope to keep pushing stuff till the entire pipeline is up on PCL.

Wednesday, August 13, 2014

Initially, the main objective of my project was to quickly obtain skeleton information using the GPU People module and focus the construction of the classification framework to recognize human actions. As it turned out that strong modifications of the gpu/people module were necessary in order to obtain the desired output of skeleton positions, the main direction of this project has moved more to extending and improving the gpu/people module. In the end, a simple action recognition framework was implemented, using K-Nearest-Neirghbour classifier on the positions and angles of selected joints, and RGB-D data of people performing the defined actions was collected for training.

The project had following major contributions:

  • Calculation of the skeleton joint positions and their visualization (in the new People App the skeleton is visualized by default)
  • Using Ground Plane People Detector in order to segment the body before the body labeling is applied. Without this modifications, the pose detector has big problems with large surfaces (walls, floor ...) and works robustly only in near-range. Besides this, the speed of the detector is increased as most of the voxels obtain very high depth depth values
  • Tracking of the skeleton joints (see my previous post). Alpa-Beta filter is applied on the measured joint positions.
  • Offline skeleton detection from the recorded LZF files.
  • Classification of skeleton data sequences with K-Nearest-Neighbours

The source code of the new version of the gpu/people module can be downloaded here: https://github.com/AlinaRoitberg/pcl/tree/master/gpu/people

Building the app requires PCL installation with GPU enabled. For more information, see the PCL information on compiling with GPU and the tutorial for the initial /gpu/people module

The new application can be executed in the same way as the initial PeopleApp, while new functions can be actuvated with folowing additional flags:

-tracking <bool> activate the skeleton tracking
-alpha <float> set tracking parameter
-beta <float> set tracking parameter
-lzf <path_to_folder_with_pclzf_files>
-lzf_fps <int> fps for replaying the lzf-files (default: 10)
 <path_to_svm_file> activates body segmentation

Tree files for people detection: https://github.com/PointCloudLibrary/data/tree/master/people/results

SVM file for body segmentation (Ground Plane People Detector): https://github.com/PointCloudLibrary/pcl/tree/master/people/data

Example of execution (active segmentation, no tracking, using live data):

./pcl_people_app -numTrees 3 -tree0 ../tree_files/tree_20.txt -tree1 ../tree_files//tree_20_1.txt -tree2 ../tree_files/tree_20_2.txt -segment_people ../svm_path/trainedLinearSVMForPeopleDetectionWithHOG.yaml

Example of execution (active segmentation and tracking, using recorded PCLZF data)

./pcl_people_app -numTrees 3 -tree0 ../tree_files/tree_20.txt -tree1 ../tree_files//tree_20_1.txt -tree2 ../tree_files/tree_20_2.txt -segment_people ../svm_path/trainedLinearSVMForPeopleDetectionWithHOG.yaml -lzf /path_to_lzf_files/ -lzf_fps 10 -tracking 1

Take a look at the video demonstrating the modifications of the skeleton tracker (sorry for the format:) ): http://youtu.be/GhCrK3zjre0

Implementation of the shape generator finished
Wednesday, August 13, 2014

I finished the work on the shape generator module. It became quite a versatile tool. Here is an overview what one can do:

  • Create simple shapes (Sphere, Cylinder, Cone, Wedge, Cuboid, Full Torus, Partial Torus).
  • Create any type of polygon shape (Just give it a list of vertices and edges). If you want to use the more advanced features like cutting the polygon needs to be convex.
  • The list of simple shapes can be easily extended using convex polygon.
  • Combine any number of shapes/objects into more complex objects.
  • Create cavities, holes, tunnels by cutting one shape/object by another.
  • Everything you generate will have normal information attached to it.
  • Full control over the labeling of shapes. Label each shape individually, label a group of shapes, give the full object one label .... This allows one to tailor groundtruth to what you actually want to benchmark.
  • One class for reading assembly instructions from human readable text files (recipes). You can access all features of the generator except the general polygon class. Recipes can also call/include other recipes. This makes it possible to create complex objects and reuse them in different settings. Another example would be the creation of several objects which can easily be arranged in various scenes.

Here are some examples how we can create a table top scenario consisting of a table, a cup on top and a saw which cuts into the table using recipes (shown on the right side). First we make and compile a recipe file for a table leg:


Next we will combine four legs plus a cylindrical plate to a full table (we set the variable Label to keep, which preserves different labels for the object’s parts):


If we set Label to unique it will tell the generator to give only the table parts unique labels (not the parts the parts consist of :) )


This labeling would make sense if you want to test for instance part segmentation algorithms like LCCP.

Now let us combine the table and two objects. We have multiple options for the labeling now: First: Each object gets its own label:


Second: Give the parts of the objects their own label:


Or keep the labels of the parts and parts of parts unique :) :


Just to mention: All points have normal information attached:

Further improvements of the skeleton detection, data collection and activity recognition framework
Tuesday, August 12, 2014

Joint tracking

Although the body segmentation has solved the problems with full body visibility the resulting joint positions were still not sufficient. Besides the Gaussian noise, big “jumps” occured if suddenly a blob with a compeletly wrong position was labelled as the corresponding body part.

To solve this, I implemented a simple tracking function (Alpha-Beta-Filter) on top of the joint calculation. The filter estimates the new position based on the predicted (calculated from the previous position and velocity) and measured position. The weight of the measured position is given by the parameter alpha, while beta shows the weight of the velocity update.

Activation of tracking, alpha and beta parameters can be set in the PeopleApp.

The parameters should be chosen wisely, current default values worked good for me, but the also depend on the frame rate, which is not easily predictable, as it depends on the GPU.

Other changes to the skeleton calculation

Integration of tracking improved the results, however, problems still occured if the voxel labelling failed. Unfortunately, this happed a lot with the hands if they were very close to the rest of the body (and could be hardly distinguished).

That’s why I added some correction based on the position of the elbows and the forearms. I estimate the expected position based on the positions of those body parts and if the measured position is too far away, the predicted result is used.

Besides I discovered the global variable AREA_THRES2 (in people_detector.cpp) and increasing it from 100 to 200 improved the labelling. Increasing it reduces the noise, while making the probability that small blobs (like hands) will be missed higher.

Data collection

As I have mentioned in my previous post, I wanted to collect the skeleton information from the recorded Kinect data in order to use the full framerate. To do so, I extended the people app with the option of using recorded PCLZF videos instead of the live Kinect data.

The positions of skeleton joints can be stored in a txt file with following flag: -w <bool val>

The data format is defined as following: SeqNumber Timestamp Pos1_x Pos1_y Pos1_z Pos2_x Pos2_y Pos2_z

I recorded the Kinect data of ten people performing the activities (defined in my previous post) multiple times. Afterwards I segmented the PCLZF-files with each directory containing the one execution of an action and run the people detector on each of them (I had a lot of fun with it:)). As copying the files and running the skeleton detector really took me “forever” I had to stick with a part of data and focus on the activity detection framework. The current framework contains segmented training data from 6 participants (269 recordings all together) and I will of course add the rest of the data, but I doubt that it would happen before the end of GSOC.

Another file format was defined for the training data, which includes all recorded skeletons, annotation of the action and UserID.

Classification Framework

I implemented a simple K-Nearest-Neighbours algorithm to classify a skeleton sequence. As all sequences have different lengths and the number of frames is high, one should find a way to calculate a good feature vector. In the implemented framework, classification happens in following way:

  • Read skeleton data is divided into a fixed number of segments
  • For each segment the joint positions are calculated as the mean value over all frames in the segment
  • For each resulting skeleton (one per segment) further feature selection takes place: the decision which joint positions and angles to use
  • All joint positions are calculated in relation to the Neck position (and not the sensor)
  • The height of the person is estimated (difference between the head and the feet in the first or last segment) and the positions are normalized accordingly
  • Mean and variance of each dimension of the feature vector is calculated over the whole training set and the data is normalized accordingly (this is important in KNN, to let all dimensions contribute in the same way).
  • The feature vectors can be classified with K-NN (1-NN by default).

The application can classify a new sample as well as perform Leave-One-Out cross validation on the training set and print out the confusion matrix.

Currently I am getting following results:


The recognition rate of 34% might not sound that great. However one should consider the high number of actions and the fact that the skeleton data has some problems with joint positions, especially with the hands, which makes the recognition very challenging. The people detector also has some severe problems with unusual poses.

3. Some real results! (a.k.a. Not hacks!)
Monday, August 11, 2014

In this post I will show you some results I got yesterday of our demo code working in a slightly cluttered scenario and fitting superquadrics to household objects.

Here is the environment we wish to recognize:


Here are the fitting results:


Here a bit more of detail with the colored pointclouds:


And here the fitted superquadrics alone for a better view:


The code used for the results shown in the rest of this post can be found here .

So, how does this code work? A general overview would go like this:

  1. Connect your Kinect to your PC :0}
  2. A snapshot of a pointcloud (captured by OpenNI grabber) is taken (demo.cpp)
  3. The pointcloud captured is segmented (table + objects) in clusters (demo.cpp)
  4. The clusters (one-view) are mirrored based on [Bohg2011] (tabletop_symmetry/mindGapper.cpp)
  5. The mirrored clouds are then fitted to a SQ (SQ_fitter_test1.cpp)


  1. Using mirrored pointclouds helps substantially to the fitting process.
  2. I have implemented the optimization process using two different libraries: levmar and ceres. While ceres have to be installed from a external repo, levmar can just be added to the code (sq_fitting/levmar). It only depends on lapack and it is lightweight. I have to test how both compare in processing times. The results I am attaching were obtained using the levmar optimization process.
  3. If you noticed, the milk carton was not fitted. I got an spurious result there given that the mirroring code did not work well with that object.
  4. Fitting times (still in debug mode, not optimized and with warning printouts here and there) is < 2 seconds in my old laptop.
  5. You might notice that for one of the objects (the blue water container) the fitting produces a SQ longer than needed (cross the table). How to limit it to be shorter?
  6. For objects like the tuna can, in which the “bigger axis” (assumed to be Z) is horizontal, the fitting is not exactly as it should be (the revolution axis should be UP, but it is horizontal. How to deal with objects that are wider than taller?

Things to figure out:

  1. Levmar or Ceres? So far I am more inclined towards levmar since it is smaller and - other than lapack, which should be installed on most Linux machines - it can be modified. I know Ceres is fairly popular but I am not sure if we need all that power.
  2. Figure out the “wide” objects issue.
  3. Improve my mirror code for cases like the milk carton with bad viewing angles.
[Bohg2011]Bohg, Jeannette, et al. “Mind the gap-robotic grasping under incomplete observation.” Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011.
Second implementation of Code “Segmentation and creation of object candidates” (Third post)
Monday, August 11, 2014
  • Introduction

The implementation of the Efficient Graph-Based Segmentation, base on:

Efficient Graph-Based Image Segmentation Pedro F. Felzenszwalb and Daniel P. Huttenlocher International Journal of Computer Vision, Volume 59, Number 2, September 2004
  • Work

Due to license problems, I had to re-implement the code of the segmentation, also I need to implement the same code to use in Mesh and Point Clouds. To use in images, we need to create a graph, based on a image. To improve the results, In the paper recommend to smooth the image before the segmentation. I use the gauss filter implemented in PCL, for this purpose. After the code was written, and with the access to original code, I made some test to check if the segmentation was good.

The first test ,was about the segmentation without the gauss filter applied for each case and after, I made the same picture with the gauss filter .

  • Results

Without Gauss Filter(Original Code)


As we can see in this image, there are a lot of segments, but if we compare this image with the next image, the adapted code is the same segments in both of them. The colors are different, because the colorization is generated random.

Without Gauss Filter(Adapted Code)


Then a test, with a sigma for the Gaussian filter of 0.5, and a threshold for the algorithm of 500 was made in the same image for both cases.

With Gauss Filter (Original Code)


With Gauss Filter (Adapted Code)


Here we can make that, there are some differences. After some research in the code. I found that, when we compute the weight for the graph, there are some minimal differences, this gives us the idea, that the smooth of the image, is different in both cases. Is small difference, some decimals, but this creates a different result. I didn?t make more research about it, because is not part of the GSoC task.

  • Point Cloud segmentation

Once we have the graph segmentation working, the next step is made the algorithm to use with Point Clouds. For that, the only thing that is needed to use, is create a new graph based on the difference of normal or curvature.

First implementation of Code “Objectness measures” (Second post)
Wednesday, July 30, 2014
  • Introduction

I begin with main part of the code, and probably the most easy to implement. In this part I write the code to make this measures for each Point Cloud. The input in the code is a point cloud (or set of pointclouds) and the output a vector with the values of the resultant measure. To simplify the code part and the use of this for other aplications,this part is separated from the segmentation, also if anyone want to use other segmentation that the proposed in the paper is usefull.

  • “Objectnes measures”

We have 5 different measures:

  • Compactness: This measure looks into how compact is a object. Search the minimum bounding sphere that contains all the surface of the mesh of the pointcloud.
  • Symmetry: This measure analyze the reflective symmetry along the three principal axes, based on the eigenvalues of the scatter matrix.
  • Smoothness: This measure evaluate the quality of the points of the segments, if the segments have points uniform spread around it, this will score high, if have spread points this will be have a low score.
  • Local Convexity: This measure determine the convexity of each polygon edge in the mesh of the point cloud,and score each segment by the percentage of its edges which are convex.
  • Global Convexity: This measure is generated by a convex hull, and after that a mean distance between the points and the convex hull isrecorded to create this measure.
  • “Future Work”

In the future work, I need to implement the segmentation to give data to measure the objects and test this code.

GRSD Descriptor computation and analysis
Monday, July 28, 2014

Global Radius-based surface descriptor concatenates the RSD descriptor as discussed in the previous post to represent the complete object. GRSD gives a good description of the 3D shape of the object. Below are the set of objects and its GRSD descriptors i.e. histograms. I have used University of Washington’s “Large scale RGBD dataset” for the experiments.

For an object whose surface is planar but has 2 different planes in the view
../_images/box_image.png ../_images/box.png
For an object whose surface is planar but has 1 planes in the view
../_images/plane_image.png ../_images/plane.png
For an object whose surface is spherical
../_images/sphere_image.png ../_images/sphere.png
For an object whose surface is cylinderical but doesn’t have any planar surface in view
../_images/cylinder_image.png ../_images/cylinder.png

It can be seen that all the descriptors are different from eachother. Planes and box surfaces are similar as the surface characteristics are similar in this case. Both GRSD and RSD are pushed into the pcl-trunk for people to use. The test files for these two features are also included in the trunk for the basic usage of the same.

Currently working on the NURBS for small surface patches. Since NURBS are already available in PCL we will be looking at how to tailor the same for our needs. After this we plan to work on the features that compute the relationship between the surface patches.

Object discovery in KINFU DATA (First post)
Sunday, July 20, 2014
  • Introduction

The principal goal of this project will be implement the algorithm and app developed in:

-Fei-Fei, L., Karpathy, A., Miller, S.”Object discovery Via Shape Analysis”.

This automatic object discovery will be useful for robotics or autonomous vehicles and it will be able to find different class of objects. This new approach compares different parameters of an object to classify it into a class and diffence the objects. The parameters that defined a object is based on the: “Objectness measures” part in the paper. This is the core of the app and the most important part. And probably a good part to begin to code because is quite easy to code.

Rigid and Non-Rigid Transformation ( Second phase )
Thursday, July 17, 2014
  • Introduction

    In the previous phase, it was presented how to obtain a statistical model from a set of face-meshes. The next step in our project is to “match” the mean face of the database, with the face of a random person, like the one in the picture below:


    The matching is done by applying alternatively the following methods.

  • Rigid Registration

    This method is very similar to the Iterative Closest Point Cloud algorithm, because the goal is to estimate a rotation matrix and a translation vector that would move the average face to an optimal position, near the face of the kinect. Basically, it is required to minimize the error \epsilon =  \sum ||\vec {y} - (R \cdot \vec{x} + \vec{t})||^2 and this is done by calculating the solution of this system in the least square sense. In order to calculate this solution, the system is first linearized using the Jacobian matrix.

    Of course this process is applied iteratively, and below are presented a few stages of positioning of the model over the scan:

    ../_images/debug_1.png ../_images/debug_2.png ../_images/debug_4.png ../_images/debug_5.png
  • Non-Rigid Registration

    Once the model is roughly aligned, we need to modify the shape of the model to match the face from the scan. For this we make use of the eigenvectors computed in the previous phase and we calculate the optimal solution of this system: \vec {y} = P \cdot \vec{d} + \vec{model}, where P is the matrix of eigenvectors, \vec{model} is the current form of the model and \vec{d} is the vector of basis coefficients that need to be determined.

    However, there is on more constraint to be applied and that is to minimize the sum \sum_i \frac{d_i}{\sigma_i}, where \sigma_i is the eigenvalue of the corresponding eigenvector. Therefore, to the Jacobian matrix of this system, we need to add a diagonal matrix with \frac{1}{\sigma_i} on the diagonal and multiplied by a certain weight.

    The purpose of this regualrization is to determine to what degree the face should be deformed. The eigenvectors are stored in the P matrix in decreasing order according to their eigenvalues and their position in this sorting order determines whether they have a greater or a smaller influence on the shaping of the model. When the model is mostly overlapping with the face in the scan, more information can be drawn about the final figure, hence the weight specified above should be smaller . On the other hand, if the model is not yet aligned with the scan, the deforming should be smaller and thus the weight should be bigger. Below you can see how the model looks for several values of the weight:

    ../_images/weight1.png ../_images/weight2.png

    Notice that the shaping process tends to return the same effect if the weight of the regularizing constraint exceeds a certain value.

  • Results

    As mentioned above, these functions are applied alternatively for a few number of times, and the following results were obtained:


    The above picture was obtained after one iteration and the following one after 10:


    Also, below you can observe the precision of this method, the black figure representing the final version of the model and the green one representing the point cloud of the face:

Implementation of the shape generator
Tuesday, July 08, 2014

The work on the lccp implementation is mainly done and the pull request is awaiting its approvel. Thanks a lot for the help of the pcl community (especially Sergey and Victor). Thanks to your comments the lccp algorithm is now in a much better shape. Talking about shapes: The next milestone in this GSOC project is the implementation of a shape generator which can be used to create various labeled scenes which can be used to create unit tests or benchmarks for part and/or object segmentation algorithms. I have written a big part of the code already. Now the question is: To which module of pcl should this generator go? I think about putting it into the geometry module. Any comment on this is more than welcome! The next image shows an assembled animal-like object which has been generated from simple geometric shapes (mainly geons).

2. Implementation of local-based approach for stereo matching
Saturday, June 28, 2014


In this post, I will briefly describe the current state of the stereo module and the new features added.

Currently, the stereo module encompass two matching local-based algorithms: 1. Block-based algorithm, which is programed using the Box-Filtering algorithm proposed in [McDonnell81]. 2. Adaptive Cost 2-pass Scanline Optimization, presented in [Wang06]. Both methods use the Sum of Absolute Differences (SAD) as the dissimilarity measure.

As mentioned in the previous blog, the first objective of the present project is to implement the local-based approach proposed in [Min1], for dense correspondence estimation in a pair of grayscale rectified images with an efficient cost aggregation step. Additionally, the cost aggregation step in based on the method presented in [Yoon06], where the weighting function uses a similarity measure based on the color and spatial distances.


In order to do so, a new class CompactRepresentationStereoMatching was created in the stereo module. This class inherits from class GrayStereoMatching, which in turns inherits from class StereoMatching, since some pre and post-processing methods are re-implemented. The new class has five member functions with public access: setRadius, set FilterRadius and setNumDispCandidates, setGammaS, setGammaC, which set three data members of type int (radius, filter_radius and num_disp_candidates) and two of type double (gamma_c and gamma_s) with private access, as well as implementing the virtual method compute_impl.

radius corresponds to the radius of the cost aggregation window, with default value equal to 5.

filter_radius corresponds to the radius of the box filter used for the computation of the likelihood function. The default value is 5.

num_disp_candidates is the number of the subset of the disparity hypotheses used for the cost aggregation. The default value is 60.

gamma_c is the spatial bandwidth used for cost aggregation based on adaptive weights. The default value is 15.

gamma_s is the color bandwidth used for cost aggregation based on adaptive weights. The default value is 25.

Similarly to the previous methods, the current class is based on the SAD matching function, and it estimates the per-pixel cost efficiently using the Box-Filtering algorithm.

To test the algorithm, the Middlebury stereo benchmark (http://vision.middlebury.edu/stereo/) dataset is going to be used.

[McDonnell81]McDonnell, M. J. “Box-filtering techniques”. Computer Graphics and Image Processing 17.1, 65-70, 1981.
[Wang06]Wang, Liang, et al. “High-quality real-time stereo using adaptive cost aggregation and dynamic programming.” 3D Data Processing, Visualization, and Transmission, Third International Symposium on. IEEE, 2006.
[Yoon06]K.-J. Yoon and I.-S. Kweon. “Locally Adaptive Support-Weight Approach for Visual Correspondence Search”. In Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 924–931, 2005.
RSD Feature computation and analysis
Thursday, June 26, 2014

RSD Feature is a local feature histogram that describes the surface local to a query point. There is pcl implementation for this that is available in the features folder. With the help of my mentor I understood the algorithm by which this feature is obtained. To very if this is working perfectly we took a real object whose radius is known and generated the RSD computation on the entire point cloud of the object. This gives RSD Feature Histogram for all the points in the pointcloud. We can also get the min and max radius of the local surface patch around each point in the pointcloud. I generated various combination of parameters to know how the radius computed varies. Below is the object used which has a radius of 3.5cm which is 0.035m


Below are some of the params chosen and their corrresponding effect on the min and max radius in the local surface patch of each point. For Normal Radius search = 0.03 Max_radius = 0.7 (maximum radius after which everything is plane) RSD_radius search = 0.03


For Normal Radius search = 0.03

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03


For Normal Radius search = 0.02

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03 - This is found to be good way for generating histograms


I tried to do MLS smoothing on the point cloud data and then compute the RSD feature which makes the normal computation better and resulting in consistency over all the points on the object surface.

For Normal Radius search = 0.03

Max_radius = 0.7 (maximum radius after which everything is plane)

RSD_radius search = 0.03


For Normal Radius search = 0.03

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03


For Normal Radius search = 0.02

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03 - This is found to be good way for generating histograms


Now I tested out how the actual feature looks like at a point on the sphere to check if it matches with the histogram in the paper. The same is compared between raw point cloud from the kinect and MLS smoothened point cloud. Below is the result of the same.


It was really hard to fix the previous image that it can show the histograms with values and good resolution. So below is the snapshot of the spherical and cylinderical surfaces.

Cylinderical Surface:


Spherical Surface:


Next post will have the details of how GRSD results are and how they differentiate the characteristics of two surfaces. GRSD code from the author will be integrated into the PCL code base. We also plan to categorize the pipeline into modules that fit into the PCL code base as features, surface and segmentation sections. These information will be posted in the next post.