# Copyright (c) 2019-2020 NVIDIA CORPORATION. All rights reserved.

@page dwx_path_perception_sample Path Perception Sample (PathNet)

@note SW Release Applicability: This sample is available in **NVIDIA DRIVE Software** releases.
@section dwx_path_perception_sample_description Description

The Path Perception sample demonstrates how to use the NVIDIA<sup>®</sup> proprietary deep neural network
(PathNet) to perform path perception on the road. It detects the path the vehicle is in (ego-path), as well
as the left and right adjacent paths when they are present. PathNet has been trained with RCB
images, and its performance is invariant to RGB-encoded H.264 videos.

This sample streams an H.264 or RAW video and computes paths for each frame. The network
directly computes the path vertices and a confidence value for each path.
A user-assigned threshold value sets the minimum confidence for a path to be considered valid.
The sample can also be operated with cameras.
The image datasets used to train PathNet were captured by a Sekonix camera module (SS3323) with
an AR0231 RCCB sensor and a 60 degree field of view. The camera is mounted high, at the rear-view mirror position.
Demo videos are captured at 2.3 MP and down-sampled to 960 x 604.

To achieve the best path perception performance, NVIDIA<sup>®</sup> recommends adopting a similar camera setup and aligning
the video center vertically with the horizon before recording new videos. Detection also performs best
with a 60 degree field-of-view camera.
@section dwx_path_perception_sample_running Running the Sample

The Path Perception sample, sample_path_perception, accepts the following optional parameters. If none are specified, it performs path perception on a pre-recorded video.

    ./sample_path_perception
        --camera-group=[a|b|c|d]
        --camera-index=[0|1|2|3]
        --camera-type=[camera]
        --input-type=[video|camera]
        --slave=[0|1]
        --debugView=[true|false]
        --detectionThreshold=<floating-point number in (0, 1)>
        --enableFovea=[true|false]
        --enableFoveaInTopView=[true|false]
        --fps=<integer in (1, 120)>
        --horizonHeight=<integer in (0, image_height)>
        --precision=[int8|fp16|fp32]
        --rig=[path/to/rig/file]
        --roi.height=<integer in [0, image_height]>
        --roi.width=<integer in [0, image_width]>
        --roi.x=<integer in [0, image_width)>
        --roi.y=<integer in [0, image_height)>
        --showForks=[true|false]
        --temporalSmoothingFactor=<floating-point number in (0, 1)>
        --useCudaGraph=[0|1]
        --video=[path/to/video]
        --windowWidth=<integer window width in pixels>
        --customModelPath=[path/to/custom/model]
where the first five options are only valid on the NVIDIA DRIVE platform:

    --camera-group=[a|b|c|d]
        Specifies the group to which the camera is connected.
        Only applicable if --input-type=camera.

    --camera-index=[0|1|2|3]
        Indicates the camera index on the given port.
        Only applicable if --input-type=camera.

    --camera-type=[camera]
        Specifies a supported AR0231 `RCCB` sensor.
        Only applicable if --input-type=camera.
        Default value: ar0231-rccb-bae-sf3324

    --input-type=[video|camera]
        Defines whether the input is from a live camera or from a recorded video.
        Live camera is only supported on the NVIDIA<sup>®</sup> DRIVE platform.

    --slave=[0|1]
        Setting this parameter to 1 when running the sample on Xavier B allows it to access a camera that
        is being used on Xavier A. Only applicable if --input-type=camera.
and the remaining options are valid on all platforms:

    --debugView=[true|false]
        Whether to show the default view or the debug view, which includes fishbone lines connecting the predicted points of the network.

    --detectionThreshold=<floating-point number in (0, 1)>
        The detection threshold determines the validity of a path generated
        by the network. If no path has a confidence above this value, no paths are displayed.
        By default, the value is 0.5, which provides the best accuracy based on the NVIDIA<sup>®</sup> test data set.
        Decrease the threshold value if path polylines flicker or cover a shorter distance than expected.
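The thresholding step amounts to a simple confidence filter over the network's per-path outputs. The sketch below is illustrative only (plain Python, not the DriveWorks API); the path tuples and the `filter_paths` helper are hypothetical:

```python
# Hypothetical sketch: keep only paths whose network confidence clears
# the detection threshold, as the sample does before rendering.
def filter_paths(paths, detection_threshold=0.5):
    """paths: list of (vertices, confidence) tuples from the network."""
    return [p for p in paths if p[1] > detection_threshold]

paths = [([(0, 0), (0, 10)], 0.92),    # ego path, high confidence
         ([(-3, 0), (-3, 10)], 0.31)]  # left path, below threshold
valid = filter_paths(paths, detection_threshold=0.5)
```

With the default threshold of 0.5, only the first path above survives the filter.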
    --enableFovea=[true|false]
        Enables interleaved fovea-based path prediction mode.

    --enableFoveaInTopView=[true|false]
        In fovea mode (--enableFovea=true), renders fovea paths in the top view instead of full-resolution paths.

    --fps=<integer in (1, 120)>
        Frames per second at which the video is played.

    --horizonHeight=<integer in (0, image_height)>
        The y coordinate of the flat-world horizon.
    --precision=[int8|fp16|fp32]
        Specifies the precision for the PathNet model.

    --rig=[path/to/rig/file]
        Rig file containing all information about vehicle sensors and calibration.
        Default value: path/to/samples/pathDetection/rig.json

    --roi.height=<integer in [0, image_height]>
        The height of the ROI.
        By default, the value is set to 800, which provides the best accuracy based on the NVIDIA<sup>®</sup> test data set.

    --roi.width=<integer in [0, image_width]>
        The width of the ROI.
        By default, the value is set to 1920, which provides the best accuracy based on the NVIDIA<sup>®</sup> test data set.

    --roi.x=<integer in [0, image_width)>
        The top-left x image coordinate of the region of the input frame that is cropped and passed to the network.
        By default, the value is set to 0, which provides the best accuracy based on the NVIDIA<sup>®</sup> test data set.

    --roi.y=<integer in [0, image_height)>
        The top-left y image coordinate of the region of the input frame that is cropped and passed to the network.
        By default, the value is set to 400, which provides the best accuracy based on the NVIDIA<sup>®</sup> test data set.
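Together, the four --roi.* parameters define a single crop rectangle inside the input frame. The sketch below is illustrative only (plain Python, not the DriveWorks API); the 1920 x 1208 frame size is an assumption based on the AR0231 sensor, and `roi_rect` is a hypothetical helper:

```python
# Hypothetical helper: turn the --roi.* parameters into a crop rectangle
# and validate it against the frame size. Parameter defaults match the
# sample's documented defaults (x=0, y=400, width=1920, height=800).
def roi_rect(x=0, y=400, width=1920, height=800,
             image_width=1920, image_height=1208):
    # The crop must lie fully inside the input frame.
    if x < 0 or y < 0 or x + width > image_width or y + height > image_height:
        raise ValueError("ROI exceeds frame bounds")
    return (x, y, x + width, y + height)  # (left, top, right, bottom)

# Default ROI: an 800-row band starting at y=400, spanning the full width.
left, top, right, bottom = roi_rect()
```

With the defaults, the crop excludes the sky above y=400 and the hood area below y=1200, which is why they work well for a horizon-aligned camera.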
    --showForks=[true|false]
        Chooses whether to render and display forking paths.

    --temporalSmoothingFactor=<floating-point number in (0, 1)>
        The temporal smoothing factor is used to take a weighted average of the model predictions from the current
        frame and the immediately preceding frame. The average is computed as
        x'(t) = (1 - temporalSmoothingFactor) * x(t) + temporalSmoothingFactor * x(t-1). This means that the higher
        the factor, the smaller the impact of the current prediction on the final output. A factor of 1 would ignore
        the current prediction entirely, while a factor of 0 would ignore the previous prediction.
        By default, the value is 0.1, which provides the best accuracy based on the NVIDIA<sup>®</sup> test data set.
        Increase the factor value if path polylines flicker.
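The smoothing rule amounts to a two-frame weighted average applied per path vertex. The sketch below is illustrative only (plain Python, not the DriveWorks API); `smooth` is a hypothetical helper implementing exactly the quoted formula:

```python
# Hypothetical helper implementing the documented rule:
#   x'(t) = (1 - f) * x(t) + f * x(t-1)
# `current` and `previous` hold per-vertex coordinates for one path.
def smooth(current, previous, factor=0.1):
    return [(1 - factor) * c + factor * p for c, p in zip(current, previous)]

prev = [100.0, 200.0]   # prediction from frame t-1
curr = [110.0, 190.0]   # prediction from frame t
out = smooth(curr, prev, factor=0.1)  # approximately [109.0, 191.0]
```

With the default factor of 0.1, the output stays close to the current prediction while damping frame-to-frame jitter.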
    --useCudaGraph=[0|1]
        Setting this parameter to 1 runs PathNet DNN inference through CUDA Graphs, if the hardware supports it.
    --video=[path/to/video]
        Specifies the absolute or relative path of the recording.
        Only applicable if --input-type=video.
        Default value: path/to/samples/pathDetection/video_paths.h264

    --windowWidth=<integer window width in pixels>
        Width in pixels of the rendered output window.

    --customModelPath=[path/to/custom/model]
        The folder should contain pathnet.dnn, pathnet_metadata.json, and tensorrt_metadata.json.
        Default value: <empty string>
@subsection dwx_path_perception_sample_examples Examples

#### To run the sample on Linux

    ./sample_path_perception --video=<video file.h264> --detectionThreshold=<floating-point number in (0,1)>

or

    ./sample_path_perception --video=<video file.raw> --detectionThreshold=<floating-point number in (0,1)>

#### To run the sample on an NVIDIA DRIVE platform with cameras

    ./sample_path_perception --input-type=camera --camera-type=<camera_type> --camera-group=<camera_group> --detectionThreshold=<floating-point number in (0,1)>

where `<camera_type>` is a supported `RCCB` sensor.
See @ref supported_sensors for the list of supported cameras for each platform.
@note The Path Perception sample directly resizes video frames to the network
input resolution. Therefore, to get the best performance, use videos with an
aspect ratio similar to the demo video, or set a Region of Interest (ROI) to
perform inference on a sub-window of the full frame.
@section dwx_path_perception_sample_output Output

PathNet creates a window, displays the video, and overlays a collection of polylines for each detected path.
The path center line is displayed as a thick polyline, with its lateral extent shown as thin polylines.

The colors of the polylines represent the detected path position and path attribute, as follows:

- Red: Ego path, alternate color: DarkRed
- Blue: Left adjacent path, alternate color: LightBlue
- Green: Right adjacent path, alternate color: LightGreen
- Dark Red: Ego path fork-left
- Purple: Ego path fork-right
- Dark Green: Right adjacent path fork-right
- Dark Blue: Left adjacent path fork-left
- White: Opposite traffic direction

Note that the alternate colors are not used by default. In cases where multiple overlapping
paths must be rendered, the alternate color set is used for contrast.
@section dwx_path_perception_sample_more Additional Information

For more details, see @ref pathperception_mainsection_pathdetector.