PCL Developers blog

Code Sprints

Q: What is a code sprint?

A code sprint is an accelerated research and development program, where we pair talented developers with senior researchers and engineers for 3-6 months of extended programming work. The model is inspired by the Google Summer of Code initiative, and is meant to create open source solutions for interesting 2D/3D computer perception problems. Each code sprint is financially supported by a different institution (see below).

Code sprints are made possible through the involvement of our host organization, Open Perception. Please see our mission page for more information.

Q: What does this blog represent?

The Point Cloud Library (PCL) developer blog is a great way to keep track of daily updates in PCL-sponsored code sprints. Our developers write about their experiences and progress while working on exciting PCL projects.

Active Sprints

The following shows the current list of ongoing code sprints. Click on any of the logos below to go to the respective sprint blog page.

_images/toyota_logo.png _images/leica_logo.png _images/hri_logo.png _images/gsoc1412.png _images/simpletree.png

Completed Sprints

The list of code sprints which have been completed is:

_images/swri_logo.png _images/spectrolab_logo.png _images/velodyne_logo.png _images/ocular_logo.png _images/dinast_logo.png _images/nvidia_logo.png _images/trimble_logo.png _images/sandia_logo.png _images/hri_logo.png _images/urban_logo.png _images/gsoc1213.png _images/toyota_logo.png _images/gsoc2011_logo.png

We would like to thank our financial sponsors for their generous support. For details on how to start a new code sprint or contribute to PCL, please visit http://www.pointclouds.org/about and our organization’s web site at http://www.openperception.org/get-involved/.

Latest 15 blog updates

SimpleTree - a PCL tool for geometrical tree modeling
Saturday, November 28, 2015
  • Introduction

Geometrical models of trees in the field of computational forestry are nowadays referred to as Quantitative Structure Models (QSMs). The capabilities of these models go beyond pure volume estimation: if the volume of a tree is multiplied by density values, the above-ground biomass (AGB) of the tree can be derived non-destructively.

Unlike statistical or voxel-based approaches, which are limited to predicting the AGB, QSMs also give insight into the internal biomass distribution of a tree.

The geometrical building blocks of such models are commonly cylinders fitted into terrestrial laser scan (TLS) point clouds.

SimpleTree is an open source tool to build QSMs from TLS clouds (SimpleTree homepage). The released version is based on PCL 1.8.0.

  • The cylinder fitting method

Typically, cylinders are fitted into point clouds with either RANSAC or nonlinear least squares (NLS) fitting routines. NLS needs initial estimates of the cylinder parameters, and RANSAC tends to produce erroneous cylinders if the point cloud has not been segmented beforehand.

The method implemented in SimpleTree relies on search spheres to produce an initial tree model, which is enriched afterwards. A sphere whose center lies on the tree skeleton and whose radius is larger than that of the underlying branch or stem segment cuts the point cloud. All points located in the epsilon neighbourhood of the sphere surface represent one or more circular cross-sectional areas.
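The selection criterion for the epsilon neighbourhood can be sketched as follows. This is a minimal illustration, assuming plain `std::array` points and a hypothetical helper `sphereShell`; it is not SimpleTree's code. It uses a linear scan, whereas SimpleTree uses a PCL octree. With PCL, one option would be an octree radius search with radius r + epsilon, followed by discarding the points closer than r - epsilon.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Point3 = std::array<double, 3>;

// Hypothetical helper (not part of PCL or SimpleTree): returns the indices of
// all points whose distance to the sphere center lies within +/- epsilon of
// the sphere radius, i.e. the epsilon neighbourhood of the sphere surface.
// A linear scan is used for clarity; SimpleTree uses a PCL octree instead.
std::vector<std::size_t> sphereShell(const std::vector<Point3>& cloud,
                                     const Point3& center, double radius,
                                     double epsilon) {
  std::vector<std::size_t> shell;
  for (std::size_t i = 0; i < cloud.size(); ++i) {
    const double dx = cloud[i][0] - center[0];
    const double dy = cloud[i][1] - center[1];
    const double dz = cloud[i][2] - center[2];
    const double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (std::fabs(dist - radius) <= epsilon) {
      shell.push_back(i);
    }
  }
  return shell;
}
```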


A circle can be fitted into this sub-cloud with RANSAC (module sample_consensus). The sphere center point, the circle center point and the circle radius serve as the parameters of a preliminarily detected cylinder.
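The following RANSAC circle fit in 3D is a minimal, self-contained sketch. PCL's sample_consensus module provides a 3D circle model (`pcl::SACMODEL_CIRCLE3D`), but the post does not say which model SimpleTree uses. All names below (`ransacCircle3D`, `circleFromThreePoints`, and the rest) are hypothetical. Each iteration builds the circumcircle of three random points and counts the points whose distance to that circle is below a threshold.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
static Vec3 add(const Vec3& a, const Vec3& b) { return {a[0] + b[0], a[1] + b[1], a[2] + b[2]}; }
static Vec3 scale(const Vec3& a, double s) { return {a[0] * s, a[1] * s, a[2] * s}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
static Vec3 cross(const Vec3& a, const Vec3& b) {
  return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}

struct Circle3D {
  Vec3 center{};
  Vec3 normal{};  // unit normal of the circle plane
  double radius = 0.0;
  std::size_t inliers = 0;
};

// Circumcircle of three points; returns false if they are (nearly) collinear.
static bool circleFromThreePoints(const Vec3& A, const Vec3& B, const Vec3& C, Circle3D& out) {
  const Vec3 a = sub(A, C), b = sub(B, C);
  const Vec3 axb = cross(a, b);
  const double n2 = dot(axb, axb);
  if (n2 < 1e-18) return false;
  // center = C + ((|a|^2 b - |b|^2 a) x (a x b)) / (2 |a x b|^2)
  const Vec3 offset = cross(sub(scale(b, dot(a, a)), scale(a, dot(b, b))), axb);
  out.center = add(C, scale(offset, 1.0 / (2.0 * n2)));
  const Vec3 r = sub(A, out.center);
  out.radius = std::sqrt(dot(r, r));
  out.normal = scale(axb, 1.0 / std::sqrt(n2));
  return true;
}

// Euclidean distance from p to the circle (in-plane and out-of-plane parts).
static double distanceToCircle(const Vec3& p, const Circle3D& c) {
  const Vec3 w = sub(p, c.center);
  const double h = dot(w, c.normal);
  const Vec3 inPlane = sub(w, scale(c.normal, h));
  const double radial = std::sqrt(dot(inPlane, inPlane)) - c.radius;
  return std::sqrt(radial * radial + h * h);
}

// Sample 3 points, build the circle, count inliers, keep the best candidate.
Circle3D ransacCircle3D(const std::vector<Vec3>& pts, double threshold,
                        int iterations, unsigned seed) {
  Circle3D best;
  if (pts.size() < 3) return best;
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, pts.size() - 1);
  for (int it = 0; it < iterations; ++it) {
    const std::size_t i = pick(rng), j = pick(rng), k = pick(rng);
    if (i == j || j == k || i == k) continue;
    Circle3D candidate;
    if (!circleFromThreePoints(pts[i], pts[j], pts[k], candidate)) continue;
    for (const Vec3& p : pts) {
      if (distanceToCircle(p, candidate) <= threshold) ++candidate.inliers;
    }
    if (candidate.inliers > best.inliers) best = candidate;
  }
  return best;
}
```

The sphere center, together with the fitted circle center and radius, would then define the preliminary cylinder.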


If the circle is enlarged and turned into a 3D sphere, the procedure can be repeated recursively.


To get fast access to the epsilon neighbourhood of a sphere, a PCL search structure (module octree) is used. All points contained in a search sphere have to be removed as soon as the sphere has been used, to prevent the algorithm from jumping back and forth infinitely. If the epsilon neighbourhood of a sphere contains multiple cross-sectional areas (this occurs at branch junctions), a clustering step (module segmentation) has to be performed, and a circle is fitted into each cluster.


The largest circle is processed first (marked in green), while the others are stored in a FIFO queue to be processed later (marked in yellow).


By the nature of the algorithm, the end point of a detected cylinder coincides with the start point of another cylinder, so an informatics tree structure can be used to store the cylinders. Several non-PCL statistical post-processing procedures then adjust the tree structure and the cylinder parameters. The fit quality is further improved by another RANSAC routine: points are spatially allocated to their nearest cylinder, and a 3D RANSAC cylinder fit is performed on each of these subgroups only. Before cylinder RANSAC:


and after RANSAC:


The final models are highly accurate and the informatics tree structure allows extraction of diameter classes


or the extraction of tree components like stem and branches:


Other possible parameters to extract are described in Hackenberg et al. 2015b.
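The allocation step mentioned above, where each point is assigned to its nearest cylinder before the 3D RANSAC refit, could look roughly like the sketch below. This is an assumption-laden illustration, not SimpleTree's code. The `Cylinder` struct and both functions are hypothetical. The distance is measured from the closest point on the finite axis segment, so the end caps are only approximated.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical cylinder representation: axis from start to end plus radius.
struct Cylinder {
  Vec3 start;
  Vec3 end;
  double radius;
};

// Distance from p to the cylinder's lateral surface, measured from the closest
// point on the finite axis segment (end caps are not treated separately).
double distanceToCylinder(const Vec3& p, const Cylinder& c) {
  Vec3 axis{}, rel{};
  for (int i = 0; i < 3; ++i) {
    axis[i] = c.end[i] - c.start[i];
    rel[i] = p[i] - c.start[i];
  }
  const double len2 = axis[0] * axis[0] + axis[1] * axis[1] + axis[2] * axis[2];
  const double proj = rel[0] * axis[0] + rel[1] * axis[1] + rel[2] * axis[2];
  const double t = len2 > 0.0 ? std::clamp(proj / len2, 0.0, 1.0) : 0.0;
  double d2 = 0.0;
  for (int i = 0; i < 3; ++i) {
    const double diff = p[i] - (c.start[i] + t * axis[i]);
    d2 += diff * diff;
  }
  return std::fabs(std::sqrt(d2) - c.radius);
}

// For every point, the index of the nearest cylinder (SIZE_MAX if there is none).
std::vector<std::size_t> allocateToCylinders(const std::vector<Vec3>& cloud,
                                             const std::vector<Cylinder>& cylinders) {
  std::vector<std::size_t> owner(cloud.size(), std::numeric_limits<std::size_t>::max());
  for (std::size_t i = 0; i < cloud.size(); ++i) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < cylinders.size(); ++c) {
      const double d = distanceToCylinder(cloud[i], cylinders[c]);
      if (d < best) {
        best = d;
        owner[i] = c;
      }
    }
  }
  return owner;
}
```

Each group of points sharing an owner index would then be passed to its own cylinder RANSAC fit.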

  • Crown representation

Two comprehensive crown representations are also computed with PCL. The crown can be modelled as a convex hull (Module surface)


or a concave hull


For now, only the convex hull’s volume is written to the output files.
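`pcl::ConvexHull` can report this volume itself (via `setComputeAreaVolume(true)` and `getTotalVolume()`). The sketch below only illustrates the underlying formula for a closed, consistently oriented triangle mesh such as a hull. It applies the divergence theorem and sums the signed volumes of the tetrahedra formed by the origin and each face. The function name is hypothetical, and the absolute value assumes that all faces share one orientation.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Triangle = std::array<std::size_t, 3>;  // vertex indices

// Volume of a closed, consistently oriented triangle mesh (e.g. a convex hull):
// sum of the signed volumes of the tetrahedra (origin, a, b, c).
double closedMeshVolume(const std::vector<Vec3>& vertices,
                        const std::vector<Triangle>& triangles) {
  double sixTimesVolume = 0.0;
  for (const Triangle& t : triangles) {
    const Vec3& a = vertices[t[0]];
    const Vec3& b = vertices[t[1]];
    const Vec3& c = vertices[t[2]];
    // a . (b x c) is six times the signed volume of tetrahedron (0, a, b, c)
    sixTimesVolume += a[0] * (b[1] * c[2] - b[2] * c[1]) +
                      a[1] * (b[2] * c[0] - b[0] * c[2]) +
                      a[2] * (b[0] * c[1] - b[1] * c[0]);
  }
  return std::fabs(sixTimesVolume) / 6.0;
}
```

Because the surface is closed, the result does not depend on where the origin lies relative to the crown.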

  • ICP to align clouds from different years

ICP (module registration) can be used to align scans of the same tree taken at different times. In the example, one cloud shows the unpruned tree and the second cloud was taken after pruning.





The initial alignment (semi-automatic) is described in Hackenberg et al. 2015b.

  • References

The figures are taken from two peer-reviewed publications presenting the method and the software:

Hackenberg, J.; Morhart, C.; Sheppard, J.; Spiecker, H.; Disney, M. Highly Accurate Tree Models Derived from Terrestrial Laser Scan Data: A Method Description. Forests 2014, 5, 1069-1105.

Hackenberg, J.; Spiecker, H.; Calders, K.; Disney, M.; Raumonen, P. SimpleTree —An Efficient Open Source Tool to Build Tree Models from TLS Clouds. Forests 2015, 6, 4245-4294.

HRCS Stereo-based Road Area Detection final report
Wednesday, November 19, 2014

The Honda Research Institute code sprint has finished. All code has been committed, and the final report is attached below.

Improved inputs and results (Third phase)
Friday, August 22, 2014
  • Introduction

    In this phase, the focus was to integrate the pcl::KinfuTracker class and the OpenCV libraries in order to provide better input clouds for the Rigid/Non-Rigid Registration methods and thus to obtain better results.

  • Approach

    By using the pcl::KinfuTracker class, it was possible to obtain an almost full scan of the heads of the subjects like the ones presented below:

    _images/sub1_front.png _images/sub2_front.png

    Using the pcl::KinfuTracker had the disadvantage that unwanted objects were also scanned during the procedure, however by using the cv::CascadeClassifier, the program was able to pin-point the position of the head and move the statisitcal model to a favorable position so that the Rigid Registration method could fully align the model. ( The sphere represents the center of the face)

  • Results

    Better input point clouds made it possible to analyze the efficiency of these types of registration more thoroughly. As stated in the previous post, the accuracy of the result depends on the regularizing weight of the Non-Rigid Registration; however, there is one more parameter to take into account. The training set of face meshes was registered at a scale ten times smaller than the one the PCL point clouds are stored in. Intuitively, this means the program should compensate for this factor of ten when it reads the database; however, the subjects used to test the program did not have heads of exactly the same size.

    Below, you have an example of what happens when the scale is not set right:


    This result was obtained because, as shown in the following picture, the chin of the model was matched with the neck of the target, even though the rest of the face seems to be in position:


    Once the scale was properly established the correct result was obtained:

  • Future work

    • A method should be implemented that makes setting the scale by hand unnecessary, no matter what kind of person is being scanned (child/adult)
    • The pcl::KinfuTracker class has not been officially released, so further maintenance of the program is required
    • The registration methods presented so far have to be compared with other methods, such as the ones already implemented in PCL
Final Example “Final example” (Fourth post)
Monday, August 18, 2014
  • Introduction

Once we have the segmentation and the functions that compute measures on the segmented object candidates, we need an example that analyzes all segments and decides which ones are objects. Objects whose measures are similar can then be classified into the same category.

  • Framework

This tutorial gives an example implementation of the algorithm presented in:

Andrej Karpathy, Stephen Miller, and Li Fei-Fei, “Object Discovery in 3D Scenes via Shape Analysis”, International Conference on Robotics and Automation (ICRA), 2013.


As the authors say in the abstract: “We present a method for discovering object models from 3D meshes of indoor environments. Our algorithm first decomposes the scene into a set of candidate mesh segments and then ranks each segment according to its ‘objectness’, a quality that distinguishes objects from clutter.”

In this example we build the whole framework: first we obtain the segmented objects, then we compute several geometric measures to rank the “objectness” of the segmented candidates. After that we save all the “good” segments (those the algorithm considers to be objects) together with the measures associated with each of them. These can then be used by the recurrence step to improve the classification of the objects.

We begin with the graph-based segmentation explained earlier, over-segmenting the meshes to obtain different segments for each scene. This segmentation has to be run on all scenes we want to use for object discovery.

After we have the segmentation, we need to select the object-like segments. To do so, we apply several tests that discard unsuitable segments:

-Minimum and maximum number of points: if a segment has too few points or is far too large, it is discarded.

-Shape: we run two shape tests. If a segment is very flat or thin, it is discarded. We also look at its size along the 3 axes; if this size is below or above a threshold, the segment is rejected as well.

-Geometric measures: the measures explained in the second post of this blog are computed and stored for each segment. They are used in the next test.

-Non-maximum suppression: because we over-segment with different thresholds, the same segment will probably appear more than once. We therefore run one more test per scene, called non-maximum suppression, which looks for segments whose intersection over union exceeds 80%. For such segments we compare the values of their measures; the segment with the highest value is kept and the other is discarded.
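The non-maximum suppression test can be sketched as a greedy loop over segments sorted by their measure. This is a minimal sketch, assuming each segment is stored as a sorted, duplicate-free list of point indices plus one score. `Segment`, `intersectionOverUnion` and `nonMaxSuppression` are hypothetical names, not the classes from the GSoC pull request.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical segment record: sorted, duplicate-free point indices and the
// objectness measure used to decide which duplicate survives.
struct Segment {
  std::vector<int> indices;
  double score;
};

double intersectionOverUnion(const std::vector<int>& a, const std::vector<int>& b) {
  std::vector<int> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(common));
  const double unionSize = static_cast<double>(a.size() + b.size() - common.size());
  return unionSize > 0.0 ? static_cast<double>(common.size()) / unionSize : 0.0;
}

// Greedy non-maximum suppression: visit segments by decreasing score and keep
// a segment only if its IoU with every already kept segment is <= threshold.
std::vector<std::size_t> nonMaxSuppression(const std::vector<Segment>& segments,
                                           double iouThreshold = 0.8) {
  std::vector<std::size_t> order(segments.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t x, std::size_t y) {
    return segments[x].score > segments[y].score;
  });
  std::vector<std::size_t> kept;
  for (std::size_t i : order) {
    bool duplicate = false;
    for (std::size_t k : kept) {
      if (intersectionOverUnion(segments[i].indices, segments[k].indices) > iouThreshold) {
        duplicate = true;
        break;
      }
    }
    if (!duplicate) kept.push_back(i);
  }
  return kept;
}
```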

This is the last test. Now we show the results for a couple of scenes.

Scene 1

Initial scene

Object-like segments present in the scene

Initial scene

Object-like segments present in the scene


For a large number of scenes, we store all the segments and the measures of each segment. In the final step we add one more measure: the recurrence.

More snapshots of some final results from different scenes:

_images/0.png _images/11.png _images/21.png _images/31.png

Last Step: Recurrence

We now compute the recurrence on the previous results, which requires the output of the object discovery step. The recurrence analyzes all segments across all scenes, because segments that are commonly found in other scenes are more likely to be objects than segmentation artifacts. According to the paper, the best way to find the top k most similar segments, taking computation time into account, is to consider only segments whose extents along the principal directions agree within 25% and to measure the Euclidean distance between their normalized shape measures. Other approaches, such as ICP, RANSAC, or local or global descriptors, could be used instead. Here, for every object found in the segmentation process above, we compute this score and find the 10 most similar objects.
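A possible shape for the top-k search is sketched below. It rests on our own reading of the 25% rule: every principal extent of a candidate must lie within 25% of the query's extent. It also assumes that extents are sorted consistently and that shape measures are already normalized. All names are hypothetical, and the code does not reproduce how the paper turns these neighbours into a recurrence score.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-segment descriptor (names are illustrative only).
struct SegmentDescriptor {
  std::array<double, 3> extents;  // extents along the principal directions
  std::vector<double> measures;   // normalized shape measures (same length for all)
};

// Every extent of the candidate must lie within tolerance * (query extent)
// of the corresponding query extent.
static bool similarExtents(const SegmentDescriptor& q, const SegmentDescriptor& c, double tolerance) {
  for (std::size_t d = 0; d < 3; ++d) {
    if (std::fabs(c.extents[d] - q.extents[d]) > tolerance * q.extents[d]) return false;
  }
  return true;
}

static double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(sum);
}

// Indices of the (at most) k segments most similar to segment `query`.
std::vector<std::size_t> mostSimilarSegments(const std::vector<SegmentDescriptor>& all,
                                             std::size_t query, std::size_t k = 10,
                                             double tolerance = 0.25) {
  std::vector<std::pair<double, std::size_t>> candidates;
  for (std::size_t i = 0; i < all.size(); ++i) {
    if (i == query || !similarExtents(all[query], all[i], tolerance)) continue;
    candidates.emplace_back(euclidean(all[query].measures, all[i].measures), i);
  }
  std::sort(candidates.begin(), candidates.end());  // closest first
  if (candidates.size() > k) candidates.resize(k);
  std::vector<std::size_t> result;
  for (const auto& c : candidates) result.push_back(c.second);
  return result;
}
```

The extent filter is cheap and prunes most pairs before any distance is computed, which matches the computation-time argument above.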

After applying the recurrence, we obtain the following results:

1st Query for an object


This case is really good (the implementation works!!!) and makes the paper easier to understand. After all the work, we obtain a possible object in a scene (a cup), and with the aid of all the other scenes we get a fairly confident measure for classifying it. The larger the number of scenes, the better this score becomes: more scenes, more quality.

2nd Query for an object


3rd Query for an object


4th Query for an object


Finally, I think I accomplished the main goal of the GSoC program: implementing the algorithm in PCL.

To use this program and its tutorials, we need to wait until a possible pull request with the source code (the 2 classes I created) is accepted into the PCL library and the tutorials onto the PCL website.

That’s all folks!!!

P.S. I will update the post with links if the pull request with the code I created during GSoC gets accepted.

Different Feature representation of surface patches and analysis
Monday, August 18, 2014

This project aims at computing features for each surface patch and grouping the patches together based on a machine learning technique. I have been experimenting with features that could represent the surface patches belonging to an object, and with how they relate to inter- and intra-object surface patches. Below are some of them, with their description and plots.

For all of the analyses below, please refer to the image below and to the segments derived from basic region growing, which does not use RGB color and is based purely on surface curvature.


ColorHistogram: Binning colors is a common way to obtain appearance cues. It can be done in two ways: one captures the color channels independently, the other uses a dependent (joint) binning. In this section I show the independent binning of values on three different channels. Independent means that, for each pixel/point in the scene, we look at the R, G and B values separately and increment, in each channel, the bin that the value belongs to. The plots below show how the histograms look for the RGB and HSV color spaces.

_images/color_rgb.png _images/color_hsv.png
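The independent binning can be sketched in a few lines. The sketch assumes 8-bit RGB input and normalizes each channel histogram to sum to 1. The function name is hypothetical. An HSV version would bin converted values in the same way, and the conversion is not shown.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using RGB = std::array<std::uint8_t, 3>;

// Hypothetical helper: independent (per-channel) colour histogram. Each channel
// gets `bins` bins over [0, 255]; the three histograms are concatenated as
// [R bins | G bins | B bins] and each one is normalized to sum to 1.
std::vector<double> independentColorHistogram(const std::vector<RGB>& colors, std::size_t bins) {
  std::vector<double> hist(3 * bins, 0.0);
  if (colors.empty() || bins == 0) return hist;
  for (const RGB& c : colors) {
    for (std::size_t ch = 0; ch < 3; ++ch) {
      hist[ch * bins + c[ch] * bins / 256] += 1.0;  // bin index of this channel value
    }
  }
  for (double& h : hist) h /= static_cast<double>(colors.size());
  return hist;
}
```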

ColorHistogram 3D: Binning colors dependently means keeping a 3D matrix indexed by R, G and B, and incrementing the (r, g, b) bin only when that exact combination occurs. This gives a 3D histogram. The images below show the 3D histogram flattened into a 1D histogram for the RGB, HSV and YUV color spaces. This binning style is taken from [1].

_images/colorh_3d_rgb.png _images/colorh_3d_hsv.png _images/colorh_3d_yuv.png
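For comparison with the independent version, the joint binning can be sketched as follows. It assumes 8-bit input and a hypothetical function name. The cube of bins is stored directly in its flattened 1D order, which corresponds to the concatenated form shown in the plots.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using RGB = std::array<std::uint8_t, 3>;

// Hypothetical helper: joint ("dependent") colour histogram. Each channel is
// quantized into binsPerChannel levels and only the bin of the exact (r, g, b)
// combination is incremented. The cube is stored flattened as
// ((r * B) + g) * B + b and normalized to sum to 1.
std::vector<double> jointColorHistogram(const std::vector<RGB>& colors,
                                        std::size_t binsPerChannel) {
  const std::size_t B = binsPerChannel;
  std::vector<double> hist(B * B * B, 0.0);
  if (colors.empty() || B == 0) return hist;
  for (const RGB& c : colors) {
    const std::size_t r = c[0] * B / 256;
    const std::size_t g = c[1] * B / 256;
    const std::size_t b = c[2] * B / 256;
    hist[(r * B + g) * B + b] += 1.0;
  }
  for (double& h : hist) h /= static_cast<double>(colors.size());
  return hist;
}
```

The joint histogram has B³ bins instead of 3B, so small bin counts per channel are typical.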

Verticality: Verticality represents how the surface patch is oriented with respect to the camera viewpoint. The histogram is built by binning the angles between the normals and the direction in which the camera is pointing (i.e. the positive z axis for any point cloud from the Kinect). Below is the histogram plot for this feature. This feature is not very useful for this segmentation project; it is aimed more at object discovery and saliency-related problems, to distinguish a surface patch from its peers. This is implemented based on [2], which was adopted from [3].
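The verticality histogram could be computed as shown in this sketch. It is not the implementation based on [2]. It bins the angle between each normal and the viewing direction (+z by default) over [0, pi]. The function name and the choice of angular range are assumptions. Normals that are not unit length are normalized, and zero-length normals are skipped.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical helper: histogram of the angle between each normal and the
// viewing direction (default +z, as for Kinect clouds), binned over [0, pi]
// and normalized to sum to 1. Zero-length normals are skipped.
std::vector<double> verticalityHistogram(const std::vector<Vec3>& normals,
                                         std::size_t bins,
                                         const Vec3& viewDir = {0.0, 0.0, 1.0}) {
  std::vector<double> hist(bins, 0.0);
  const double pi = std::acos(-1.0);
  const double viewLen = std::sqrt(viewDir[0] * viewDir[0] + viewDir[1] * viewDir[1] +
                                   viewDir[2] * viewDir[2]);
  if (bins == 0 || viewLen == 0.0) return hist;
  std::size_t valid = 0;
  for (const Vec3& n : normals) {
    const double len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    if (len == 0.0) continue;
    const double cosAngle = std::clamp(
        (n[0] * viewDir[0] + n[1] * viewDir[1] + n[2] * viewDir[2]) / (len * viewLen), -1.0, 1.0);
    const double angle = std::acos(cosAngle);  // in [0, pi]
    const std::size_t bin =
        std::min(bins - 1, static_cast<std::size_t>(angle / pi * static_cast<double>(bins)));
    hist[bin] += 1.0;
    ++valid;
  }
  if (valid > 0) {
    for (double& h : hist) h /= static_cast<double>(valid);
  }
  return hist;
}
```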


Dimensionality Compactness: This shows how compact a surface patch is. A PCA of the surface patch gives a local frame of reference, and a bounding box in this frame gives the extents of the patch along its 3 principal directions. Once these ranges (xrange, yrange, zrange) are sorted into min_range, mid_range and max_range, two ratios are computed: 1) min_range / max_range and 2) mid_range / max_range. Below is the plot of this histogram for the segments mentioned above. This is implemented based on [2], which was adopted from [3].
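The two ratios could be computed as follows, assuming the three orthonormal principal axes have already been computed (for example, from `pcl::PCA::getEigenVectors()`). The function name is hypothetical. The sketch projects each point onto each axis to obtain the ranges, then sorts them and forms the two ratios.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical helper. `axes` are the three orthonormal principal directions of
// the patch (e.g. PCA eigenvectors); the extent of the patch along each axis
// gives xrange, yrange and zrange. Returns
// {min_range / max_range, mid_range / max_range}.
std::array<double, 2> dimensionalityCompactness(const std::vector<Vec3>& points,
                                                const std::array<Vec3, 3>& axes) {
  if (points.empty()) return {0.0, 0.0};
  std::array<double, 3> ranges{};
  for (std::size_t a = 0; a < 3; ++a) {
    double lo = std::numeric_limits<double>::infinity();
    double hi = -lo;
    for (const Vec3& p : points) {
      const double t = p[0] * axes[a][0] + p[1] * axes[a][1] + p[2] * axes[a][2];
      lo = std::min(lo, t);
      hi = std::max(hi, t);
    }
    ranges[a] = hi - lo;
  }
  std::sort(ranges.begin(), ranges.end());  // min_range, mid_range, max_range
  if (ranges[2] <= 0.0) return {0.0, 0.0};
  return {ranges[0] / ranges[2], ranges[1] / ranges[2]};
}
```

A flat patch yields a first ratio near 0, while a compact blob yields both ratios near 1.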


Perspective scores: This is the ratio of the area projected in the image to the maximum area spanned by the region in 3D. pixel_x_range and pixel_y_range below are the ranges of the image bounding box along the respective pixel direction. xrange, yrange and zrange are the dimensions of the 3D bounding box of the surface patch. Note that PCA should not be done here, as we are comparing the 3D surface patch with its appearance in the image. This is implemented based on [2], which was adopted from [3]. Below are the elements of the histogram.

  1. pixel_x_range (in meters) / xrange

  2. pixel_x_range (in meters) / yrange

  3. pixel_x_range (in meters) / zrange

  4. pixel_y_range (in meters) / xrange

  5. pixel_y_range (in meters) / yrange

  6. pixel_y_range (in meters) / zrange

  7. diameter_of_bounding_box_pixels (in meters) / diameter_of_cuboid_bounding_box_in_3d

  8. area_of_bounding_box (in square meters) / area_of_the_largest_two_dimensions_in_3d
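The eight elements map directly onto a small function. This sketch assumes that the image bounding-box ranges have already been converted to meters (the conversion is not shown) and that all 3D ranges are non-zero. It reads the "area of the largest two dimensions" as the product of the two largest 3D ranges. The function name is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Hypothetical helper computing the eight perspective-score elements listed
// above. pixelXRange / pixelYRange: image bounding-box ranges already in
// meters; xRange, yRange, zRange: 3D bounding-box dimensions (must be > 0).
std::array<double, 8> perspectiveScores(double pixelXRange, double pixelYRange,
                                        double xRange, double yRange, double zRange) {
  const double pixelDiagonal = std::hypot(pixelXRange, pixelYRange);
  const double cuboidDiagonal = std::sqrt(xRange * xRange + yRange * yRange + zRange * zRange);
  std::array<double, 3> dims{xRange, yRange, zRange};
  std::sort(dims.begin(), dims.end());
  const double largestFaceArea = dims[1] * dims[2];  // product of the two largest dimensions
  return {pixelXRange / xRange, pixelXRange / yRange, pixelXRange / zRange,
          pixelYRange / xRange, pixelYRange / yRange, pixelYRange / zRange,
          pixelDiagonal / cuboidDiagonal,
          (pixelXRange * pixelYRange) / largestFaceArea};
}
```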


Contour Compactness:

This is the ratio of the perimeter to the area of the region, computed as the ratio of the number of boundary points of a segment to the total number of points in the segment. This is implemented based on [2], which was adopted from [3].



[1] A. Richtsfeld, T. Mörwald, J. Prankl, M. Zillich and M. Vincze: Learning of Perceptual Grouping for Object Segmentation on RGB-D Data; Journal of Visual Communication and Image Representation (JVCI), Special Issue on Visual Understanding and Applications with RGB-D Cameras, July 2013.

[2] Karthik Desingh, K Madhava Krishna, Deepu Rajan, C V Jawahar, “Depth really Matters: Improving Visual Salient Region Detection with Depth”, BMVC 2013.

[3] Alvaro Collet Romea, Siddhartha Srinivasa, and Martial Hebert, “Structure Discovery in Multi-modal Data : a Region-based Approach”, ICRA 2011.

These features will go into the features module of pcl_trunk pretty soon. I am currently working on the relational features, which tell how two surfaces are related to each other. The next blog post should be about those.

GSoC is almost over, but this project has a long way to go! I hope to keep pushing things until the entire pipeline is in PCL.

Implementation of the shape generator finished
Wednesday, August 13, 2014

I finished the work on the shape generator module. It became quite a versatile tool. Here is an overview of what one can do:

  • Create simple shapes (Sphere, Cylinder, Cone, Wedge, Cuboid, Full Torus, Partial Torus).
  • Create any type of polygon shape (just give it a list of vertices and edges). If you want to use the more advanced features, such as cutting, the polygon needs to be convex.
  • The list of simple shapes can easily be extended using convex polygons.
  • Combine any number of shapes/objects into more complex objects.
  • Create cavities, holes, tunnels by cutting one shape/object by another.
  • Everything you generate will have normal information attached to it.
  • Full control over the labeling of shapes. Label each shape individually, label a group of shapes, give the full object one label .... This allows one to tailor groundtruth to what you actually want to benchmark.
  • One class for reading assembly instructions from human readable text files (recipes). You can access all features of the generator except the general polygon class. Recipes can also call/include other recipes. This makes it possible to create complex objects and reuse them in different settings. Another example would be the creation of several objects which can easily be arranged in various scenes.

Here are some examples of how we can create a table-top scenario consisting of a table, a cup on top and a saw cutting into the table, using recipes (shown on the right side). First, we write and compile a recipe file for a table leg:


Next, we combine four legs plus a cylindrical plate into a full table (we set the variable Label to keep, which preserves different labels for the object’s parts):


If we set Label to unique it will tell the generator to give only the table parts unique labels (not the parts the parts consist of :) )


This labeling would make sense if you want to test for instance part segmentation algorithms like LCCP.

Now let us combine the table and two objects. We have multiple options for the labeling now: First: Each object gets its own label:


Second: Give the parts of the objects their own label:


Or keep the labels of the parts and parts of parts unique :) :


Just to mention: All points have normal information attached:

Wednesday, August 13, 2014

Initially, the main objective of my project was to quickly obtain skeleton information using the GPU People module and to focus on building a classification framework to recognize human actions. As it turned out that the gpu/people module needed strong modifications to output the desired skeleton positions, the main direction of this project moved towards extending and improving the gpu/people module. In the end, a simple action recognition framework was implemented, using a K-Nearest-Neighbour classifier on the positions and angles of selected joints, and RGB-D data of people performing the defined actions was collected for training.

The project made the following major contributions:

  • Calculation of the skeleton joint positions and their visualization (in the new People App the skeleton is visualized by default)
  • Using the Ground Plane People Detector to segment the body before the body labeling is applied. Without this modification, the pose detector has big problems with large surfaces (walls, floor ...) and works robustly only at near range. Besides this, the detector is faster, as most of the voxels obtain very high depth values
  • Tracking of the skeleton joints (see my previous post). An Alpha-Beta filter is applied to the measured joint positions.
  • Offline skeleton detection from the recorded LZF files.
  • Classification of skeleton data sequences with K-Nearest-Neighbours

The source code of the new version of the gpu/people module can be downloaded here: https://github.com/AlinaRoitberg/pcl/tree/master/gpu/people

Building the app requires PCL installation with GPU enabled. For more information, see the PCL information on compiling with GPU and the tutorial for the initial /gpu/people module

The new application can be executed in the same way as the initial PeopleApp, while the new functions can be activated with the following additional flags:

-tracking <bool> activate the skeleton tracking
-alpha <float> set tracking parameter
-beta <float> set tracking parameter
-lzf <path_to_folder_with_pclzf_files> use recorded PCLZF files instead of live data
-lzf_fps <int> fps for replaying the lzf-files (default: 10)
-segment_people <path_to_svm_file> activates body segmentation

Tree files for people detection: https://github.com/PointCloudLibrary/data/tree/master/people/results

SVM file for body segmentation (Ground Plane People Detector): https://github.com/PointCloudLibrary/pcl/tree/master/people/data

Example of execution (active segmentation, no tracking, using live data):

./pcl_people_app -numTrees 3 -tree0 ../tree_files/tree_20.txt -tree1 ../tree_files/tree_20_1.txt -tree2 ../tree_files/tree_20_2.txt -segment_people ../svm_path/trainedLinearSVMForPeopleDetectionWithHOG.yaml

Example of execution (active segmentation and tracking, using recorded PCLZF data):

./pcl_people_app -numTrees 3 -tree0 ../tree_files/tree_20.txt -tree1 ../tree_files/tree_20_1.txt -tree2 ../tree_files/tree_20_2.txt -segment_people ../svm_path/trainedLinearSVMForPeopleDetectionWithHOG.yaml -lzf /path_to_lzf_files/ -lzf_fps 10 -tracking 1

Take a look at the video demonstrating the modifications of the skeleton tracker (sorry for the format:) ): http://youtu.be/GhCrK3zjre0

Further improvements of the skeleton detection, data collection and activity recognition framework
Tuesday, August 12, 2014

Joint tracking

Although the body segmentation solved the problems with full-body visibility, the resulting joint positions were still not accurate enough. Besides Gaussian noise, big “jumps” occurred when a blob at a completely wrong position was suddenly labelled as the corresponding body part.

To solve this, I implemented a simple tracking function (Alpha-Beta filter) on top of the joint calculation. The filter estimates the new position from the predicted position (calculated from the previous position and velocity) and the measured position. The weight of the measured position is given by the parameter alpha, while beta gives the weight of the velocity update.
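A textbook alpha-beta filter for one joint looks roughly like the sketch below. It is a minimal illustration and not the code in the modified gpu/people module. The class name is hypothetical, and `dt` is assumed to be the time between frames in seconds, greater than zero.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Minimal alpha-beta filter for one joint position (a sketch; the gpu/people
// implementation may differ). alpha weights the measured position, beta the
// velocity correction; dt is the time between frames and must be > 0.
class AlphaBetaFilter3D {
 public:
  AlphaBetaFilter3D(double alpha, double beta) : alpha_(alpha), beta_(beta) {}

  Vec3 update(const Vec3& measured, double dt) {
    if (!initialized_) {  // first measurement: nothing to predict from yet
      position_ = measured;
      initialized_ = true;
      return position_;
    }
    for (int i = 0; i < 3; ++i) {
      const double predicted = position_[i] + dt * velocity_[i];  // constant-velocity prediction
      const double residual = measured[i] - predicted;
      position_[i] = predicted + alpha_ * residual;
      velocity_[i] += (beta_ / dt) * residual;
    }
    return position_;
  }

 private:
  double alpha_, beta_;
  Vec3 position_{};
  Vec3 velocity_{};
  bool initialized_ = false;
};
```

Because the prediction step multiplies the velocity by `dt`, passing the measured frame interval rather than a fixed value is one way to reduce the dependence on the GPU-dependent frame rate mentioned below.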

Tracking can be activated, and the alpha and beta parameters set, in the PeopleApp.

The parameters should be chosen carefully. The current default values worked well for me, but good values also depend on the frame rate, which is hard to predict because it depends on the GPU.

Other changes to the skeleton calculation

Integrating tracking improved the results; however, problems still occurred when the voxel labelling failed. Unfortunately, this happened a lot with the hands when they were very close to the rest of the body (and could hardly be distinguished).

That’s why I added a correction based on the positions of the elbows and forearms: I estimate the expected hand position from those body parts, and if the measured position is too far away, the predicted result is used.

I also discovered the global variable AREA_THRES2 (in people_detector.cpp); increasing it from 100 to 200 improved the labelling. A higher value reduces noise but increases the probability that small blobs (like hands) are missed.

Data collection

As I mentioned in my previous post, I wanted to extract the skeleton information from recorded Kinect data in order to use the full frame rate. To do so, I extended the people app with the option of using recorded PCLZF videos instead of live Kinect data.

The positions of the skeleton joints can be stored in a txt file with the following flag: -w <bool val>

The data format is defined as follows: SeqNumber Timestamp Pos1_x Pos1_y Pos1_z Pos2_x Pos2_y Pos2_z
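A reader for one line in this format might look like the following sketch. It assumes whitespace-separated values, an integer sequence number, a timestamp readable as a floating-point number, and any number of x/y/z joint triples after the timestamp. These are assumptions, because the post only lists the first two joints. `SkeletonFrame` and `parseSkeletonLine` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one line of the joint file.
struct SkeletonFrame {
  long seq = 0;
  double timestamp = 0.0;
  std::vector<std::array<double, 3>> joints;  // Pos1, Pos2, ... as (x, y, z)
};

// Parses "SeqNumber Timestamp Pos1_x Pos1_y Pos1_z Pos2_x ...". Returns
// std::nullopt if a token is not numeric or the coordinates are not triples.
std::optional<SkeletonFrame> parseSkeletonLine(const std::string& line) {
  std::istringstream in(line);
  SkeletonFrame frame;
  if (!(in >> frame.seq >> frame.timestamp)) return std::nullopt;
  std::vector<double> values;
  double v = 0.0;
  while (in >> v) values.push_back(v);
  // The loop must stop because the line ended, not because of a bad token.
  if (!in.eof() || values.size() % 3 != 0) return std::nullopt;
  for (std::size_t i = 0; i + 2 < values.size(); i += 3) {
    frame.joints.push_back({values[i], values[i + 1], values[i + 2]});
  }
  return frame;
}
```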

I recorded Kinect data of ten people performing the activities (defined in my previous post) multiple times. Afterwards, I split the PCLZF files so that each directory contains one execution of an action, and ran the people detector on each of them (I had a lot of fun with it :)). As copying the files and running the skeleton detector really took “forever”, I had to stick with part of the data and focus on the activity detection framework. The current framework contains segmented training data from 6 participants (269 recordings altogether). I will of course add the rest of the data, but I doubt that this will happen before the end of GSoC.

Another file format was defined for the training data, which includes all recorded skeletons, the action annotation and the UserID.

Classification Framework

I implemented a simple K-Nearest-Neighbours algorithm to classify a skeleton sequence. As the sequences have different lengths and a high number of frames, a good feature vector has to be computed first. In the implemented framework, classification works as follows:

  • The skeleton data read from file is divided into a fixed number of segments
  • For each segment the joint positions are calculated as the mean value over all frames in the segment
  • For each resulting skeleton (one per segment) further feature selection takes place: the decision which joint positions and angles to use
  • All joint positions are calculated in relation to the Neck position (and not the sensor)
  • The height of the person is estimated (difference between the head and the feet in the first or last segment) and the positions are normalized accordingly
  • The mean and variance of each dimension of the feature vector are calculated over the whole training set, and the data is normalized accordingly (this is important in KNN, to let all dimensions contribute in the same way).
  • The feature vectors can be classified with K-NN (1-NN by default).
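The steps above can be sketched as follows. This is a simplified illustration, not the framework's code. It skips the selection of joints and joint angles and uses every joint position. The head-to-foot distance in the first segment serves as the height estimate. The classifier is 1-NN only, with no voting for K > 1. `JointLayout` and its indices are hypothetical, because the real gpu/people joint numbering is not given here.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;
using Skeleton = std::vector<Vec3>;      // one position per joint
using Sequence = std::vector<Skeleton>;  // one skeleton per frame

// Hypothetical joint indices; the real gpu/people numbering may differ.
struct JointLayout { std::size_t neck, head, foot; };

// Feature for a non-empty sequence (numSegments > 0): mean skeleton per
// temporal segment, relative to the neck and divided by the estimated height.
std::vector<double> sequenceFeature(const Sequence& seq, std::size_t numSegments,
                                    const JointLayout& layout) {
  const std::size_t numJoints = seq.front().size();
  std::vector<double> feature;
  double height = 0.0;
  for (std::size_t s = 0; s < numSegments; ++s) {
    const std::size_t begin = s * seq.size() / numSegments;
    const std::size_t end = std::max(begin + 1, (s + 1) * seq.size() / numSegments);
    Skeleton mean(numJoints, Vec3{0.0, 0.0, 0.0});
    for (std::size_t f = begin; f < end; ++f)
      for (std::size_t j = 0; j < numJoints; ++j)
        for (int d = 0; d < 3; ++d) mean[j][d] += seq[f][j][d] / static_cast<double>(end - begin);
    if (s == 0) {  // height estimate: head-to-foot distance in the first segment
      for (int d = 0; d < 3; ++d) height += std::pow(mean[layout.head][d] - mean[layout.foot][d], 2);
      height = std::sqrt(height);
      if (height <= 0.0) height = 1.0;
    }
    for (std::size_t j = 0; j < numJoints; ++j)
      for (int d = 0; d < 3; ++d) feature.push_back((mean[j][d] - mean[layout.neck][d]) / height);
  }
  return feature;
}

// 1-nearest-neighbour classifier with per-dimension z-score normalization.
class NearestNeighbourClassifier {
 public:
  void fit(const std::vector<std::vector<double>>& X, const std::vector<int>& y) {
    const std::size_t n = X.size(), dim = X.front().size();
    mean_.assign(dim, 0.0);
    stddev_.assign(dim, 0.0);
    for (const auto& x : X) for (std::size_t d = 0; d < dim; ++d) mean_[d] += x[d] / n;
    for (const auto& x : X) for (std::size_t d = 0; d < dim; ++d) stddev_[d] += std::pow(x[d] - mean_[d], 2) / n;
    for (double& s : stddev_) s = s > 1e-12 ? std::sqrt(s) : 1.0;  // guard constant dimensions
    train_.clear();
    for (const auto& x : X) train_.push_back(normalize(x));
    labels_ = y;
  }

  int predict(const std::vector<double>& x) const {
    const std::vector<double> q = normalize(x);
    double best = std::numeric_limits<double>::infinity();
    int label = -1;
    for (std::size_t i = 0; i < train_.size(); ++i) {
      double d2 = 0.0;
      for (std::size_t d = 0; d < q.size(); ++d) d2 += std::pow(q[d] - train_[i][d], 2);
      if (d2 < best) { best = d2; label = labels_[i]; }
    }
    return label;
  }

 private:
  std::vector<double> normalize(const std::vector<double>& x) const {
    std::vector<double> out(x.size());
    for (std::size_t d = 0; d < x.size(); ++d) out[d] = (x[d] - mean_[d]) / stddev_[d];
    return out;
  }
  std::vector<double> mean_, stddev_;
  std::vector<std::vector<double>> train_;
  std::vector<int> labels_;
};
```

Leave-one-out cross-validation would call `fit` on all recordings but one and `predict` on the held-out one, repeated for every recording.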

The application can classify a new sample as well as perform Leave-One-Out cross validation on the training set and print out the confusion matrix.

Currently I am getting the following results:


The recognition rate of 34% might not sound that great. However one should consider the high number of actions and the fact that the skeleton data has some problems with joint positions, especially with the hands, which makes the recognition very challenging. The people detector also has some severe problems with unusual poses.

Further data processing, feature selection and more complex classification methods might improve the performance significantly in the future.

Second implementation of Code “Segmentation and creation of object candidates” (Third post)
Monday, August 11, 2014
  • Introduction

This post covers the implementation of Efficient Graph-Based Segmentation, based on:

Pedro F. Felzenszwalb and Daniel P. Huttenlocher, “Efficient Graph-Based Image Segmentation”, International Journal of Computer Vision, Volume 59, Number 2, September 2004
  • Work

Due to license problems, I had to re-implement the segmentation code, and I also needed the same code to work on meshes and point clouds. To use it on images, we need to create a graph based on the image. To improve the results, the paper recommends smoothing the image before the segmentation; I use the Gaussian filter implemented in PCL for this purpose. Once the code was written, and with access to the original code, I ran some tests to check whether the segmentation was correct.
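The core merging rule of the paper is compact enough to sketch. Edges are visited in increasing order of weight. Two components are merged when the connecting edge weight does not exceed either component's internal difference Int(C) plus k/|C|. The sketch below is self-contained and is not the re-implemented PCL code. `Edge`, `DisjointSet` and `segmentGraph` are hypothetical names. The minimum-component-size post-processing and the Gaussian smoothing are omitted. For images, the edges would typically connect neighbouring pixels, weighted by their color difference.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

struct Edge {
  std::size_t a, b;  // vertex indices (pixels, points or mesh vertices)
  double w;          // dissimilarity between the two vertices
};

// Union-find with union by size and path halving.
class DisjointSet {
 public:
  explicit DisjointSet(std::size_t n) : parent_(n), size_(n, 1) {
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
  }
  std::size_t find(std::size_t x) {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];
      x = parent_[x];
    }
    return x;
  }
  std::size_t size(std::size_t root) const { return size_[root]; }
  std::size_t join(std::size_t ra, std::size_t rb) {  // ra, rb must be roots
    if (size_[ra] < size_[rb]) std::swap(ra, rb);
    parent_[rb] = ra;
    size_[ra] += size_[rb];
    return ra;
  }

 private:
  std::vector<std::size_t> parent_, size_;
};

// Returns a component id per vertex.
std::vector<std::size_t> segmentGraph(std::size_t numVertices, std::vector<Edge> edges, double k) {
  std::sort(edges.begin(), edges.end(),
            [](const Edge& x, const Edge& y) { return x.w < y.w; });
  DisjointSet sets(numVertices);
  std::vector<double> threshold(numVertices, k);  // Int(C) = 0 and |C| = 1 initially
  for (const Edge& e : edges) {
    const std::size_t ra = sets.find(e.a);
    const std::size_t rb = sets.find(e.b);
    if (ra == rb) continue;
    if (e.w <= threshold[ra] && e.w <= threshold[rb]) {
      const std::size_t root = sets.join(ra, rb);
      // Edges arrive sorted, so e.w is the largest MST edge inside the new component.
      threshold[root] = e.w + k / static_cast<double>(sets.size(root));
    }
  }
  std::vector<std::size_t> labels(numVertices);
  for (std::size_t v = 0; v < numVertices; ++v) labels[v] = sets.find(v);
  return labels;
}
```

A larger k favours larger components, which is how the threshold of 500 used in the tests below influences the result.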

The first test ran the segmentation without the Gaussian filter in both implementations; afterwards, I segmented the same picture with the Gaussian filter.

  • Results

Without Gauss Filter (Original Code)


As we can see, this image contains a lot of segments, and comparing it with the next image shows that the adapted code produces the same segments. The colors differ because the colorization is generated randomly.

Without Gauss Filter (Adapted Code)


Then the same image was segmented with both implementations, using a sigma of 0.5 for the Gaussian filter and a threshold of 500 for the algorithm.

With Gauss Filter (Original Code)


With Gauss Filter (Adapted Code)


Here we can see some differences. After looking into the code, I found minimal differences in the edge weights computed for the graph, which suggests that the image smoothing differs between the two implementations. The difference is small (a few decimal places), but it produces a different result. I did not investigate further, because this is not part of the GSoC task.

  • Point Cloud segmentation

Once the graph segmentation works, the next step is to make the algorithm work with point clouds. All that is needed is to create a new graph whose edge weights are based on the difference of normals or of curvature.
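One way to build such a graph is sketched below. The sketch assumes that neighbour pairs are already available (for example, from mesh edges or a k-nearest-neighbour search) and that the normals are unit length. It uses the angle between neighbouring normals as the edge weight. The helper name is hypothetical, and a curvature difference would work in the same way. The resulting edges have the same form as those consumed by the graph segmentation.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;

struct Edge {
  std::size_t a, b;  // indices of two neighbouring points / vertices
  double w;          // angle between their normals, in radians
};

// Hypothetical helper: turn neighbour pairs into weighted graph edges, using
// the angle between the two (unit) normals as dissimilarity.
std::vector<Edge> normalAngleEdges(const std::vector<Vec3>& normals,
                                   const std::vector<std::pair<std::size_t, std::size_t>>& neighbours) {
  std::vector<Edge> edges;
  edges.reserve(neighbours.size());
  for (const auto& [i, j] : neighbours) {
    const Vec3& n = normals[i];
    const Vec3& m = normals[j];
    // Clamp guards acos against rounding slightly outside [-1, 1].
    const double c = std::clamp(n[0] * m[0] + n[1] * m[1] + n[2] * m[2], -1.0, 1.0);
    edges.push_back({i, j, std::acos(c)});
  }
  return edges;
}
```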

Since we know that the graph segmentation itself works, we only need to check that the graph is generated correctly. To do so, I applied the segmentation to a mesh and returned a point cloud containing the segments, with a unique color for each segment, so the result can be checked visually.


The original mesh


The segmented mesh


The results are correct: this mesh contains many different objects, and they are well segmented, so each segment can be associated with one possible object in the scene. The different objectness measures will then be applied to each of these segments to decide whether or not it is a plausible object.

3. Some real results! (a.k.a. Not hacks!)
Monday, August 11, 2014

In this post I will show you some results I got yesterday with our demo code, working in a slightly cluttered scenario and fitting superquadrics to household objects.

Here is the environment we wish to recognize:


Here are the fitting results:


Here a bit more of detail with the colored pointclouds:


And here the fitted superquadrics alone for a better view:


The code used for the results shown in the rest of this post can be found here.

So, how does this code work? A general overview would go like this:

  1. Connect your Kinect to your PC :0}
  2. A snapshot of a pointcloud (captured by the OpenNI grabber) is taken (demo.cpp)
  3. The captured pointcloud is segmented into clusters (table + objects) (demo.cpp)
  4. The single-view clusters are mirrored following [Bohg2011] (tabletop_symmetry/mindGapper.cpp)
  5. The mirrored clouds are then fitted with a SQ, i.e. a superquadric (SQ_fitter_test1.cpp)
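Step 4 mirrors each single-view cluster to fill in its unseen back side. The core geometric operation is a reflection across a plane. This sketch assumes the symmetry plane (a point `c` and a normal `n`) is already known. To our understanding, [Bohg2011] searches for that plane among planes perpendicular to the table, and that search is not shown here. `mirrorCloud` is a hypothetical helper, not the function in tabletop_symmetry/mindGapper.cpp.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// Reflect every point across the plane through c with normal n, and return the
// original points followed by their reflected copies (the "mirrored" cloud).
std::vector<Vec3> mirrorCloud(const std::vector<Vec3>& cloud, const Vec3& c, Vec3 n) {
  const double len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
  assert(len > 0.0);
  for (double& v : n) v /= len;  // work with a unit normal
  std::vector<Vec3> out(cloud);
  out.reserve(2 * cloud.size());
  for (const Vec3& p : cloud) {
    // Signed distance of p to the plane, then p' = p - 2 * dist * n.
    const double dist = (p[0] - c[0]) * n[0] + (p[1] - c[1]) * n[1] + (p[2] - c[2]) * n[2];
    out.push_back({p[0] - 2.0 * dist * n[0], p[1] - 2.0 * dist * n[1], p[2] - 2.0 * dist * n[2]});
  }
  return out;
}
```

If the chosen plane is wrong, the reflected half lands in the wrong place and the fit degrades. That is consistent with the milk carton failure mentioned in the notes.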


  1. Using mirrored pointclouds helps the fitting process substantially.
  2. I have implemented the optimization process using two different libraries: levmar and Ceres. While Ceres has to be installed from an external repository, levmar can simply be added to the code (sq_fitting/levmar); it only depends on LAPACK and is lightweight. I still have to compare their processing times. The results attached here were obtained with the levmar optimization.
  3. As you may have noticed, the milk carton was not fitted. I got a spurious result there because the mirroring code did not work well with that object.
  4. The fitting time (still in debug mode, not optimized, and with warning printouts here and there) is under 2 seconds on my old laptop.
  5. You might notice that for one of the objects (the blue water container) the fitting produces a SQ longer than needed (it crosses the table). How can it be limited to be shorter?
  6. For objects like the tuna can, in which the “bigger axis” (assumed to be Z) is horizontal, the fitting is not exactly as it should be: the revolution axis should point up, but it is horizontal. How should we deal with objects that are wider than they are tall?
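The fitting step minimizes a residual built on the superquadric inside-outside function. This sketch uses the standard form F(x, y, z) = ((|x|/a1)^(2/e2) + (|y|/a2)^(2/e2))^(e2/e1) + (|z|/a3)^(2/e1), with points already expressed in the SQ frame (pose parameters omitted). The residual is sqrt(a1 a2 a3) (F^e1 - 1), one common choice attributed to Solina and Bajcsy. The post does not say which residual SQ_fitter_test1.cpp uses, and the names below are hypothetical. A sum of such squared residuals is the kind of cost levmar or Ceres would minimize.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Superquadric size (a1, a2, a3) and shape exponents (e1, e2).
struct SuperquadricParams {
  double a1, a2, a3, e1, e2;
};

// Inside-outside function: < 1 inside, 1 on the surface, > 1 outside.
// The point is assumed to be expressed in the superquadric's own frame.
double insideOutside(const SuperquadricParams& q, double x, double y, double z) {
  assert(q.a1 > 0 && q.a2 > 0 && q.a3 > 0 && q.e1 > 0 && q.e2 > 0);
  const double xy = std::pow(std::fabs(x) / q.a1, 2.0 / q.e2) +
                    std::pow(std::fabs(y) / q.a2, 2.0 / q.e2);
  return std::pow(xy, q.e2 / q.e1) + std::pow(std::fabs(z) / q.a3, 2.0 / q.e1);
}

// Per-point residual sqrt(a1 a2 a3) * (F^e1 - 1); zero when the point lies on the surface.
double radialResidual(const SuperquadricParams& q, double x, double y, double z) {
  return std::sqrt(q.a1 * q.a2 * q.a3) * (std::pow(insideOutside(q, x, y, z), q.e1) - 1.0);
}

// Sum of squared residuals over a cloud: the cost a nonlinear least-squares solver minimizes.
double fitCost(const SuperquadricParams& q, const std::vector<std::array<double, 3>>& pts) {
  double cost = 0.0;
  for (const auto& p : pts) {
    const double r = radialResidual(q, p[0], p[1], p[2]);
    cost += r * r;
  }
  return cost;
}
```

In this parameterization x and y share the exponent e2, so z acts as the distinguished (revolution-like) axis. That is why the initial orientation assigned to z matters for wide objects like the tuna can.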

Things to figure out:

  1. Levmar or Ceres? So far I am more inclined towards levmar, since it is smaller, depends only on LAPACK (which should be installed on most Linux machines), and can be modified. I know Ceres is fairly popular, but I am not sure we need all that power.
  2. Figure out the “wide” objects issue.
  3. Improve my mirroring code for cases like the milk carton with bad viewing angles.
[Bohg2011] Bohg, Jeannette, et al. “Mind the gap - robotic grasping under incomplete observation.” Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011.
First implementation of Code “Objectness measures” (Second post)
Wednesday, July 30, 2014
  • Introduction

I began with the main part of the code, which is probably also the easiest to implement: computing these measures for each point cloud. The input is a point cloud (or a set of point clouds) and the output is a vector with the values of the resulting measure. To keep the code simple and reusable in other applications, this part is separate from the segmentation. This is also useful if anyone wants to use a segmentation other than the one proposed in the paper.

  • “Objectness measures”

We have 5 different measures:

  • Compactness: This measure looks at how compact an object is, by finding the minimum bounding sphere that contains the whole surface of the point cloud's mesh.
  • Symmetry: This measure analyzes the reflective symmetry along the three principal axes, based on the eigenvalues of the scatter matrix.
  • Smoothness: This measure evaluates how the points of a segment are distributed: if the points are spread uniformly over the segment, it scores high; if they are unevenly spread, it scores low.
  • Local Convexity: This measure determines the convexity of each polygon edge in the mesh of the point cloud, and scores each segment by the percentage of its edges that are convex.
  • Global Convexity: This measure is computed from a convex hull: the mean distance between the points and the convex hull is recorded as the measure.
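As an example of how the local convexity measure could be computed, a common test for an edge between two oriented points is the sign of (n_a - n_b) · (p_a - p_b). It is positive on convex surfaces and negative on concave ones. The sketch assumes unit normals and counts near-flat edges as convex. Both choices, and the `isConvexEdge`/`localConvexity` names, are assumptions rather than the code described in this post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;

// An oriented surface sample: position and unit normal.
struct OrientedPoint {
  Vec3 p, n;
};

// Convexity test for the edge (a, b): (n_a - n_b) . (p_a - p_b) is > 0 on convex
// surfaces and < 0 on concave ones; values within eps of zero (near-flat) count as convex.
bool isConvexEdge(const OrientedPoint& a, const OrientedPoint& b, double eps = 1e-9) {
  double s = 0.0;
  for (int i = 0; i < 3; ++i) s += (a.n[i] - b.n[i]) * (a.p[i] - b.p[i]);
  return s > -eps;
}

// Local convexity score of one segment: fraction of its mesh edges that are convex.
double localConvexity(const std::vector<OrientedPoint>& pts,
                      const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  if (edges.empty()) return 0.0;
  std::size_t convex = 0;
  for (const auto& e : edges) {
    assert(e.first < pts.size() && e.second < pts.size());
    if (isConvexEdge(pts[e.first], pts[e.second])) ++convex;
  }
  return static_cast<double>(convex) / static_cast<double>(edges.size());
}
```

On a sphere with outward normals n = p / r, the test value is |p_a - p_b|² / r, which is always positive. Flipping the normals to point inward makes it negative.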
  • “Future Work”

As future work, I need to implement the segmentation, which will provide the data to measure objects and test this code.

There is also one measure missing, which uses all the segments in different scenes to improve the classification.

This is “recurrence”. It depends on the number of scenes and segments: it requires the segmentation first, and then analyzes the number of similar segments across all scenes. Objects with similar segments should be in the same category.

GRSD Descriptor computation and analysis
Monday, July 28, 2014

The Global Radius-based Surface Descriptor (GRSD) concatenates the RSD descriptors discussed in the previous post to represent the complete object, and gives a good description of the object's 3D shape. Below are a set of objects and their GRSD descriptors (histograms). I used the University of Washington’s “Large scale RGBD dataset” for the experiments.

For an object whose surface is planar but has 2 different planes in the view
_images/box_image.png _images/box.png
For an object whose surface is planar but has 1 plane in the view
_images/plane_image.png _images/plane.png
For an object whose surface is spherical
_images/sphere_image.png _images/sphere.png
For an object whose surface is cylindrical but doesn’t have any planar surface in view
_images/cylinder_image.png _images/cylinder.png

All the descriptors are different from each other. The plane and box descriptors are similar, since the surface characteristics are similar in this case. Both GRSD and RSD have been pushed into pcl-trunk for people to use, together with test files that show their basic usage.
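RSD, covered in the previous post, rests on a simple geometric relation. Two points on a circle of radius r whose normals differ by an angle α are d = 2 r sin(α/2) apart, so r can be estimated from d and α. The sketch below is a simplified per-pair version that takes the minimum and maximum radius over a point's neighbours and caps flat pairs at a maximum radius. PCL's `pcl::RSDEstimation` class is the reference implementation and differs in detail, so treat this only as an illustration of the idea. The function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;

// Radius of the circle through two surface points whose unit normals differ by
// angle alpha, from d = 2 r sin(alpha / 2). Flat pairs are capped at maxRadius.
double pairRadius(const Vec3& p, const Vec3& np, const Vec3& q, const Vec3& nq, double maxRadius) {
  assert(maxRadius > 0.0);
  const double dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
  const double d = std::sqrt(dx * dx + dy * dy + dz * dz);
  const double c = std::max(-1.0, std::min(1.0, np[0] * nq[0] + np[1] * nq[1] + np[2] * nq[2]));
  const double chordFactor = 2.0 * std::sin(std::acos(c) / 2.0);
  if (chordFactor < 1e-12) return maxRadius;  // parallel normals: treat as a plane
  return std::min(d / chordFactor, maxRadius);
}

// Simplified RSD-like feature of point p: (min, max) radius over its neighbours.
std::pair<double, double> minMaxRadius(const Vec3& p, const Vec3& np,
                                       const std::vector<Vec3>& nbrs,
                                       const std::vector<Vec3>& nbrNormals, double maxRadius) {
  assert(nbrs.size() == nbrNormals.size());
  double rMin = maxRadius, rMax = 0.0;
  for (std::size_t i = 0; i < nbrs.size(); ++i) {
    const double r = pairRadius(p, np, nbrs[i], nbrNormals[i], maxRadius);
    rMin = std::min(rMin, r);
    rMax = std::max(rMax, r);
  }
  return {rMin, rMax};
}
```

A plane gives min and max both at the cap, a sphere gives equal finite radii, and a cylinder gives a finite minimum with a capped maximum. This is what lets the descriptors tell these surface types apart.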

I am currently working on NURBS for small surface patches. Since NURBS are already available in PCL, we will look at how to tailor them to our needs. After that, we plan to work on features that capture the relationships between surface patches.

Object discovery in KINFU DATA (First post)
Sunday, July 20, 2014
  • Introduction

The principal goal of this project is to implement the algorithm and app developed in:

- Fei-Fei, L., Karpathy, A., Miller, S. “Object Discovery via Shape Analysis”.

This automatic object discovery will be useful for robotics and autonomous vehicles, since it can find different classes of objects. The approach compares different parameters of an object to classify it and to distinguish between objects. The parameters that define an object are based on the “Objectness measures” part of the paper. This is the core of the app and its most important part. It is probably also a good place to start coding, because it is fairly easy to implement.

Rigid and Non-Rigid Transformation ( Second phase )
Thursday, July 17, 2014
  • Introduction

    In the previous phase, we presented how to obtain a statistical model from a set of face meshes. The next step in our project is to “match” the mean face of the database with the face of a random person, like the one in the picture below:


    The matching is done by applying alternatively the following methods.

  • Rigid Registration

    This method is very similar to the Iterative Closest Point (ICP) algorithm: the goal is to estimate a rotation matrix and a translation vector that move the average face to an optimal position, near the face in the Kinect scan. Basically, we need to minimize the error \epsilon = \sum ||\vec{y} - (R \cdot \vec{x} + \vec{t})||^2, which is done by solving this system in the least-squares sense. To compute this solution, the system is first linearized using the Jacobian matrix.

    Of course, this process is applied iteratively. Below are a few stages of positioning the model over the scan:

    _images/debug_1.png _images/debug_2.png _images/debug_4.png _images/debug_5.png
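The linearization of the rigid step can be sketched as follows, assuming small rotations so that R ≈ I + [ω]×. Each residual then becomes linear in the six unknowns (ω, t), with Jacobian [-[x]× | I], and one Gauss-Newton step solves the 6×6 normal equations. This is a hypothetical standalone sketch: it uses plain arrays instead of Eigen and assumes correspondences x_i ↔ y_i are given. In the iterative scheme, the step would be applied to the model and repeated.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;
using Vec6 = std::array<double, 6>;
using Mat6 = std::array<Vec6, 6>;

// Solve A p = b (6x6) by Gaussian elimination with partial pivoting.
Vec6 solve6(Mat6 A, Vec6 b) {
  for (int c = 0; c < 6; ++c) {
    int piv = c;
    for (int r = c + 1; r < 6; ++r)
      if (std::fabs(A[r][c]) > std::fabs(A[piv][c])) piv = r;
    std::swap(A[c], A[piv]);
    std::swap(b[c], b[piv]);
    for (int r = c + 1; r < 6; ++r) {
      const double f = A[r][c] / A[c][c];
      for (int k = c; k < 6; ++k) A[r][k] -= f * A[c][k];
      b[r] -= f * b[c];
    }
  }
  Vec6 p{};
  for (int r = 5; r >= 0; --r) {
    double s = b[r];
    for (int k = r + 1; k < 6; ++k) s -= A[r][k] * p[k];
    p[r] = s / A[r][r];
  }
  return p;
}

// One linearized least-squares step: p = (omega, t) minimizing
// sum ||(y_i - x_i) - (omega x x_i + t)||^2, i.e. R ~ I + [omega]_x.
Vec6 rigidStep(const std::vector<Vec3>& x, const std::vector<Vec3>& y) {
  assert(x.size() == y.size() && x.size() >= 3);
  Mat6 A{};
  Vec6 b{};
  for (std::size_t i = 0; i < x.size(); ++i) {
    const Vec3& p = x[i];
    // Jacobian of (omega x p + t) with respect to (omega, t): [ -[p]_x | I ].
    const double J[3][6] = {{0.0, p[2], -p[1], 1.0, 0.0, 0.0},
                            {-p[2], 0.0, p[0], 0.0, 1.0, 0.0},
                            {p[1], -p[0], 0.0, 0.0, 0.0, 1.0}};
    const Vec3 d = {y[i][0] - p[0], y[i][1] - p[1], y[i][2] - p[2]};
    for (int r = 0; r < 6; ++r) {
      for (int c = 0; c < 6; ++c)
        for (int k = 0; k < 3; ++k) A[r][c] += J[k][r] * J[k][c];
      for (int k = 0; k < 3; ++k) b[r] += J[k][r] * d[k];
    }
  }
  return solve6(A, b);
}
```

For a pure translation the linear model is exact, so a single step recovers it. With rotation, the step is only approximate and is refined by the iterations shown in the images above.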
  • Non-Rigid Registration

    Once the model is roughly aligned, we need to modify the shape of the model to match the face from the scan. For this we make use of the eigenvectors computed in the previous phase and we calculate the optimal solution of this system: \vec {y} = P \cdot \vec{d} + \vec{model}, where P is the matrix of eigenvectors, \vec{model} is the current form of the model and \vec{d} is the vector of basis coefficients that need to be determined.

    However, there is one more constraint to apply: the coefficients should stay small relative to their eigenvalues, i.e. the terms \frac{d_i}{\sigma_i} are minimized in the least-squares sense (\sum_i \left(\frac{d_i}{\sigma_i}\right)^2), where \sigma_i is the eigenvalue of the corresponding eigenvector. Therefore, the Jacobian matrix of this system has to be extended with a diagonal matrix with \frac{1}{\sigma_i} on the diagonal, multiplied by a certain weight.

    The purpose of this regularization is to determine to what degree the face should be deformed. The eigenvectors are stored in the P matrix in decreasing order of their eigenvalues, and their position in this order determines whether they have a greater or a smaller influence on the shape of the model. When the model mostly overlaps the face in the scan, more information can be drawn about the final shape, so the weight specified above should be smaller. On the other hand, if the model is not yet aligned with the scan, the deformation should be smaller, and thus the weight should be bigger. Below you can see how the model looks for several values of the weight:

    _images/weight1.png _images/weight2.png

    Notice that the shaping process tends to produce the same result once the weight of the regularizing constraint exceeds a certain value.
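The regularized non-rigid step can be written as a small linear least-squares problem. Assume the diagonal block w/σ_i is stacked below P with zero targets, as described above. The normal equations then become (PᵀP + w² diag(1/σ_i²)) d = Pᵀ(y - model). The sketch below uses hypothetical `shapeCoefficients` and `solveLinear` helpers and dense matrices for clarity. It makes the effect of the weight easy to check: a larger w shrinks d, and coefficients of eigenvectors with larger σ_i are penalized less.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve A x = b by Gaussian elimination with partial pivoting (A is n x n).
std::vector<double> solveLinear(Matrix A, std::vector<double> b) {
  const std::size_t n = b.size();
  for (std::size_t c = 0; c < n; ++c) {
    std::size_t piv = c;
    for (std::size_t r = c + 1; r < n; ++r)
      if (std::fabs(A[r][c]) > std::fabs(A[piv][c])) piv = r;
    std::swap(A[c], A[piv]);
    std::swap(b[c], b[piv]);
    for (std::size_t r = c + 1; r < n; ++r) {
      const double f = A[r][c] / A[c][c];
      for (std::size_t k = c; k < n; ++k) A[r][k] -= f * A[c][k];
      b[r] -= f * b[c];
    }
  }
  std::vector<double> x(n);
  for (std::size_t i = n; i-- > 0;) {
    double s = b[i];
    for (std::size_t k = i + 1; k < n; ++k) s -= A[i][k] * x[k];
    x[i] = s / A[i][i];
  }
  return x;
}

// Coefficients d minimizing ||P d - r||^2 + w^2 * sum_i (d_i / sigma_i)^2, with r = y - model.
std::vector<double> shapeCoefficients(const Matrix& P, const std::vector<double>& r,
                                      const std::vector<double>& sigma, double w) {
  assert(!P.empty() && P.size() == r.size());
  const std::size_t n = sigma.size();
  Matrix A(n, std::vector<double>(n, 0.0));
  std::vector<double> b(n, 0.0);
  for (std::size_t row = 0; row < P.size(); ++row) {
    assert(P[row].size() == n);
    for (std::size_t i = 0; i < n; ++i) {
      b[i] += P[row][i] * r[row];
      for (std::size_t j = 0; j < n; ++j) A[i][j] += P[row][i] * P[row][j];
    }
  }
  // Regularization rows (w / sigma_i) * d_i = 0 add w^2 / sigma_i^2 to the diagonal.
  for (std::size_t i = 0; i < n; ++i) A[i][i] += (w * w) / (sigma[i] * sigma[i]);
  return solveLinear(A, b);
}
```

With orthonormal eigenvectors and w = 0, d is just the projection of y - model onto the eigenvectors. As w grows, d is pulled towards zero, so the model keeps more of the mean face.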

  • Results

    As mentioned above, these functions are applied alternately a few times, and the following results were obtained:


    The above picture was obtained after one iteration and the following one after 10:


    Also, below you can observe the precision of this method: the black figure represents the final version of the model, and the green one represents the point cloud of the face:

Implementation of the shape generator
Tuesday, July 08, 2014

The work on the LCCP implementation is mostly done, and the pull request is awaiting approval. Many thanks to the PCL community (especially Sergey and Victor) for their help; thanks to your comments, the LCCP algorithm is now in much better shape. Talking about shapes: the next milestone in this GSoC project is a shape generator that can create various labeled scenes, to be used for unit tests or benchmarks of part and/or object segmentation algorithms. I have already written a big part of the code. The question now is: which PCL module should this generator go into? I am thinking about the geometry module. Any comment on this is more than welcome! The next image shows an assembled animal-like object generated from simple geometric shapes (mainly geons).