DriveWorks SDK Reference
3.0.4260 Release
For Test and Development only

Structure from Motion (SFM)
SW Release Applicability: This module is available in both NVIDIA DriveWorks and NVIDIA DRIVE Software releases.

About This Module

The Structure from Motion module reconstructs the 3D structure of the scene given a moving camera rig. This is achieved by means of triangulation, i.e. geometric reasoning based on optics and multiple observations over time. The module assumes a static world with a moving observer, i.e. changes in observation are due only to the motion of the car and not to changes in the 3D position of the feature itself.

The structure is reconstructed as a point cloud and a series of rig poses, i.e. a 3D location for each tracked feature and the rotations and translations of the camera rig with respect to a fixed world reference frame. The module requires as inputs a list of tracked feature points and an initial estimate for the rig pose at each time instant. These inputs can be generated by the 2D tracker and the Egomotion module respectively.

The reconstructor object (dwReconstructorHandle_t) provides three main functionalities: triangulating 3D points from 2D tracked features, refining the rig pose, and predicting the pixel locations of 3D points in future frames (see dwReconstructor_triangulateFeatures(), dwReconstructor_estimatePoseAsync(), and dwReconstructor_predictFeaturePosition()).


Triangulating points is the first step of the algorithm. 2D features must be tracked over several frames until they are observed with a wide-enough baseline to provide a stable triangulation. With the dwReconstructorConfig structure, you specify a baseline suitable for your application.

Determining When the Baseline Is Wide Enough

SFM waits until a feature has been observed in several sequential frames, each of which widens the baseline enough to contribute to a stable triangulation. An additional reprojection check reduces the number of outliers.

A wide-enough baseline is not ensured for the entire rig (i.e. the minRigDistance parameter is not currently used). Instead, the baseline is ensured for each feature being triangulated through the combination of minNewObservationAngleRad and minTriangulationEntries.

Rig distance is not a good measure of triangulation accuracy because far-away features require more distance between observations than near features. The algorithm uses the angle between optical rays as a measure instead. An observation is only added if the angle between the new observation and the observations in the history is above the threshold (minNewObservationAngleRad). Moreover, a feature is only triangulated once the number of observations is above a threshold (minTriangulationEntries). Thus, the effective minimum observation angle before triangulation can be approximated by minTriangulationEntries * minNewObservationAngleRad. This ensures a good-enough baseline for triangulation.

Updating History

The reconstructor object keeps a running history of the tracked features and where they have been observed at different points in time. For every frame, you must update this history by calling dwReconstructor_updateHistory(). The algorithm calculates the observation baseline and only adds entries to the history if they contribute information for triangulation.

Getting Triangulation Information

After updating the history, features can be triangulated by calling dwReconstructor_triangulateFeatures(). The triangulation uses the internal history accumulated over the previous frames. Only features that have accumulated enough information are triangulated. Once a feature is triangulated, if an entry in the history is detected as an outlier, the status for that feature is marked as DW_FEATURE2D_STATUS_INVALID. Each triangulated point is returned as a 3D homogeneous point in world coordinates, where the fourth element is zero if the triangulation is invalid.
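A consumer of the triangulated points would typically check the fourth element before using a point. The following sketch shows one way to do this. Point4f, Point3f, and homogeneous_to_euclidean() are hypothetical names for illustration only and do not correspond to DriveWorks types or functions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a homogeneous and a Euclidean 3D point. */
typedef struct { float x, y, z, w; } Point4f;
typedef struct { float x, y, z; } Point3f;

/* Converts a homogeneous point to Euclidean coordinates.
 * Returns false and leaves *out untouched when w is zero, which is how an
 * invalid triangulation is signaled according to the text above. */
bool homogeneous_to_euclidean(Point4f p, Point3f *out)
{
    if (p.w == 0.0f) {
        return false;
    }
    out->x = p.x / p.w;
    out->y = p.y / p.w;
    out->z = p.z / p.w;
    return true;
}
```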

Pose Refinement

The SFM module requires an initial pose estimate to perform triangulation. The camera rig pose is provided as a dwTransformation3f, i.e. a 4x4 matrix composed of a 3D rotation and translation. The name of the pose argument denotes the direction of the transformation. For example, a pose called rig2World can be used to transform a point in rig coordinates to world coordinates:

    X_world = rig2World * X_rig

where the points X_rig and X_world are in 3D homogeneous coordinates.
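The naming convention can be made concrete with a short C sketch that applies a rig-to-world pose to a homogeneous point and inverts it to obtain the opposite direction. Mat4, Vec4, transform_point(), and invert_rigid() are hypothetical names. The sketch stores the matrix row-major as m[row][col] purely for readability; the memory layout of dwTransformation3f is not assumed here and should be checked in the API reference.

```c
/* Hypothetical 4x4 transform, stored row-major as m[row][col] for this sketch. */
typedef struct { float m[4][4]; } Mat4;
/* Hypothetical homogeneous 3D point (x, y, z, w). */
typedef struct { float v[4]; } Vec4;

/* Computes t * p, e.g. X_world = rig2World * X_rig. */
Vec4 transform_point(const Mat4 *t, Vec4 p)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i) {
        r.v[i] = 0.0f;
        for (int j = 0; j < 4; ++j) {
            r.v[i] += t->m[i][j] * p.v[j];
        }
    }
    return r;
}

/* Inverts a rigid transform [R | t] as [R^T | -R^T t].
 * Given rig2World, the result acts as world2Rig. */
Mat4 invert_rigid(const Mat4 *t)
{
    Mat4 inv = {{{0.0f}}};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            inv.m[i][j] = t->m[j][i];          /* rotation part: transpose */
        }
    }
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            inv.m[i][3] -= inv.m[i][j] * t->m[j][3];   /* translation: -R^T t */
        }
    }
    inv.m[3][3] = 1.0f;
    return inv;
}
```

For instance, with a pose that rotates 90 degrees about the z axis and translates by (10, 0, 0), the rig point (1, 0, 0, 1) maps to the world point (10, 1, 0, 1), and applying the inverted pose maps it back.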

This pose is usually provided through odometry measurements (e.g. using the Egomotion module). However, once enough features have been triangulated, this initial pose estimate can be refined by calling dwReconstructor_estimatePoseAsync(). This function optimizes the pose by minimizing the reprojection error of the 3D points with respect to the tracked features.
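To make the notion of reprojection error concrete, the sketch below computes the sum of squared pixel differences between projected 3D points and their tracked 2D observations. It is only an illustration of the cost being minimized, not the optimizer used by the module. It assumes an ideal pinhole camera without distortion and points already transformed into camera coordinates; Pinhole, Vec3, Pixel, project(), and total_squared_error() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pinhole intrinsics: focal lengths and principal point in pixels. */
typedef struct { double fx, fy, cx, cy; } Pinhole;
/* Hypothetical 3D point in camera coordinates (z points forward). */
typedef struct { double x, y, z; } Vec3;
/* Hypothetical pixel location. */
typedef struct { double u, v; } Pixel;

/* Projects a camera-frame point onto the image. Fails for points at or
 * behind the camera plane. */
bool project(Pinhole k, Vec3 p, Pixel *out)
{
    if (p.z <= 0.0) {
        return false;
    }
    out->u = k.fx * p.x / p.z + k.cx;
    out->v = k.fy * p.y / p.z + k.cy;
    return true;
}

/* Sum of squared reprojection errors over n point/observation pairs.
 * Points that cannot be projected are skipped. */
double total_squared_error(Pinhole k, const Vec3 *pts, const Pixel *obs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        Pixel proj;
        if (!project(k, pts[i], &proj)) {
            continue;
        }
        double du = proj.u - obs[i].u;
        double dv = proj.v - obs[i].v;
        sum += du * du + dv * dv;
    }
    return sum;
}
```

A pose refinement step would search for the pose that makes this sum as small as possible; a lower value means the triangulated points agree better with where the tracker observed the features.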

Feature Prediction

Most 2D trackers benefit greatly from a good prediction of where a previously seen feature will be in the current frame. The SFM module can predict the position of most features given an estimate of the camera rig’s pose (see dwReconstructor_predictFeaturePosition()). The module provides three types of feature prediction, depending on how much is known about the feature.

  • Triangulated points are directly reprojected onto the image using the estimated rig’s pose, the rig to camera transformation, and the camera intrinsics.
  • Features without triangulation that are below the horizon are temporarily assumed to lie on the ground plane and predicted to move according to the corresponding plane-induced homography.
  • Features without triangulation that are above the horizon are temporarily assumed to be very far away from the car. Thus, only the relative rotation between the rig’s previous pose and its current pose is considered. The features are predicted to move according to the corresponding 3D rotation-induced homography.

Relevant Tutorials