The Regressor-based and End-to-End Landmark Detection sample demonstrates how to use the NVIDIA® proprietary deep neural network (DNN) MapNet to perform lane marking detection
and landmark detection on the road. It detects the lane you are in (the ego-lane) and, when present, the left and right adjacent lanes. Landmarks include
vertical poles, intersection markings (e.g., crosswalks), and road markings (e.g., arrows) on the road.
MapNet has been trained on RCB images, and its performance is invariant to RGB-encoded H.264 videos.
This sample can also stream an H.264 or RAW video and compute a multi-class likelihood map of lane markings for each frame. A user-assigned
threshold value binarizes the likelihood map into clusters of lane markings; image post-processing steps are then employed to fit polylines onto the lane clusters
and assign each one a lane position and appearance type. The sample can also operate on live camera input.
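The threshold-then-fit post-processing described above can be sketched as follows. This is an illustrative NumPy sketch only, not the sample's actual DriveWorks implementation; `binarize_and_fit` is a hypothetical helper:

```python
import numpy as np

def binarize_and_fit(likelihood, threshold):
    """Threshold a per-pixel lane-marking likelihood map and fit a
    polyline (here: a 2nd-order polynomial x = f(y)) to the surviving
    pixels. Illustrative only -- the sample's real post-processing also
    clusters pixels per lane and assigns position/appearance types."""
    ys, xs = np.nonzero(likelihood >= threshold)  # pixels above threshold
    if len(xs) < 3:
        return None
    # Fit x as a polynomial in y, a common choice for near-vertical lanes.
    coeffs = np.polyfit(ys, xs, deg=2)
    return np.poly1d(coeffs)

# Toy likelihood map with a straight vertical "lane" at column 5.
lik = np.zeros((10, 10))
lik[:, 5] = 0.9
poly = binarize_and_fit(lik, threshold=0.5)
print(round(float(poly(4.0)), 3))  # ≈ 5.0 for this synthetic lane
```

Real likelihood maps are noisy, so the clustering step between thresholding and fitting matters; here a single clean cluster makes the fit trivial.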
The image datasets used to train MapNet were captured by a Sekonix camera module (SF3324/5) with an AR0231 RCCB sensor.
The camera is mounted high, at the rear-view mirror position. Demo videos are captured at 2.3 MP and down-sampled to 960 x 604.
To achieve the best lane detection performance, adopt a similar camera setup and align the video center vertically with the horizon before recording new videos.
The Regressor-based and End-to-End Landmark Detection sample, sample_landmark_detection_by_regressor,
accepts the following optional parameters.
If none are specified, it performs detection on a supplied pre-recorded video.
./sample_landmark_detection_by_regressor --input-type=[video|camera] --video=[path/to/video] --model-type=[regressor|e2e] --camera-type=[camera] --camera-group=[a|b|c|d] --camera-index=[0|1|2|3] --roi=[x,y,w,h]
Where:
--input-type=[video|camera]
        Defines whether the input comes from a live camera or a recorded video. Live camera is only supported on the NVIDIA DRIVE platform.
        Default value: video

--video=[path/to/video]
        The absolute or relative path of a RAW or H.264 recording. Only applicable if --input-type=video.
        Default value: path/to/data/samples/laneDetection/video_lane.h264

--model-type=[regressor|e2e]
        Specifies which type of MapNet model to use for landmark detection.
        Default value: e2e

--camera-type=[camera]
        A supported AR0231 RCCB sensor. Only applicable if --input-type=camera.
        Default value: ar0231-rccb-bae-sf3324

--camera-group=[a|b|c|d]
        The group to which the camera is connected. Only applicable if --input-type=camera.
        Default value: a

--camera-index=[0|1|2|3]
        Indicates the camera index on the given port.
        Default value: 0

--roi=[x,y,w,h]
        Defines a Region of Interest (ROI) where detections occur: x-coordinate, y-coordinate, width, and height.
        Default value: no ROI
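The --roi=[x,y,w,h] parameter restricts detection to a sub-rectangle of the frame. Its semantics can be illustrated with plain array slicing (`apply_roi` is a hypothetical helper; the sample applies its ROI internally):

```python
import numpy as np

def apply_roi(frame, roi):
    """Crop a frame to an (x, y, w, h) region, mirroring the sample's
    --roi=[x,y,w,h] convention: (x, y) is the top-left corner, (w, h)
    the crop size. Hypothetical helper for illustration only."""
    x, y, w, h = roi
    return frame[y:y + h, x:x + w]

# Frame at the demo-video resolution of 960 x 604 (width x height).
frame = np.zeros((604, 960, 3), dtype=np.uint8)
cropped = apply_roi(frame, (100, 200, 640, 300))
print(cropped.shape)  # (300, 640, 3)
```

Note that image arrays index rows (y) first, so width/height swap order between the ROI tuple and the resulting array shape.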
./sample_landmark_detection_by_regressor --video=<video file.h264>
or
./sample_landmark_detection_by_regressor --video=<video file.raw>
./sample_landmark_detection_by_regressor --input-type=camera --camera-type=<camera_type> --camera-group=<camera_group>
where <camera_type> is a supported RCCB sensor. See List of cameras supported out of the box for the list of supported cameras for each platform.
./sample_landmark_detection_by_regressor --model-type e2e
./sample_landmark_detection_by_regressor --model-type regressor
MapNet creates a window and displays the final landmark polyline outputs overlaid on top of the video.
The polyline colors represent the detected landmark attribute type as follows:
Lane markings:
Vertical Poles:
Intersections:
Road markings:
Numbers displayed on top of landmark detections indicate the track ID for a specific detection.
A track ID corresponds to the same detection (such as a lane) across camera frames, so the same lane detected in multiple frames carries the same track ID.
The letter E is prepended to a track ID to indicate a landmark other than a lane (poles, intersections, road markings), differentiating lane tracks from other landmark tracks.
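The labeling convention above can be summarized in a few lines. This is a hypothetical helper mirroring the described on-screen convention, not code from the sample:

```python
def track_label(track_id, is_lane):
    """Format an on-screen track label: a plain number for lane tracks,
    an 'E'-prefixed number for other landmark tracks (poles,
    intersections, road markings). Illustrative helper only."""
    return str(track_id) if is_lane else "E" + str(track_id)

print(track_label(3, is_lane=True))   # 3
print(track_label(7, is_lane=False))  # E7
```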
For more details, see Landmark Perception.